
Fbd-SV-2024: Flying Bird Object Detection Dataset in Surveillance Video

Abstract

A Flying Bird Dataset for Surveillance Videos (FBD-SV-2024) is introduced, tailored for the development and performance evaluation of flying bird detection algorithms in surveillance videos. The dataset comprises 483 video clips, amounting to 28,694 frames in total, of which 23,833 frames contain 28,366 instances of flying birds. The dataset was collected from realistic surveillance scenarios, in which the birds exhibit characteristics such as inconspicuous features in single frames (in some instances), generally small sizes, and shape variability during flight. These attributes pose challenges that must be addressed when developing flying bird detection methods for surveillance videos. Finally, advanced (video) object detection algorithms were evaluated on the proposed dataset, and the results demonstrate that the dataset remains challenging for these algorithms.

Background & Summary

Birds, common animals in our daily lives, can sometimes pose hazards to human production and living activities. Examples include bird strikes at airports1, birds pecking at crops2, and bird droppings causing short circuits and trips in substations3. Such bird-related incidents seriously threaten socio-economic development, so bird repelling is necessary. Traditional bird-repelling methods continuously emit physical signals (such as sound and light) to stimulate birds and thereby drive them away. These methods may be effective initially, but birds gradually adapt to the physical stimuli, and the repelling effect declines3,4. This has prompted the development of the "detection-repelling" model, in which detecting flying bird objects is one of the primary tasks.

Initially, flying bird detection relied primarily on radar5,6. Radar detection of birds offers high accuracy, long range, and minimal sensitivity to weather conditions. However, it also has numerous drawbacks1,7: radar imagery is not intuitive, making it inconvenient for direct human viewing, and the high cost of radar equipment hinders its widespread adoption in fields such as bird prevention for power grids and agriculture. In contrast, computer-vision-based flying bird detection systems built on surveillance cameras offer low cost, ease of deployment, and simple maintenance. With the continuous advancement of deep learning, the accuracy of computer-vision-based object recognition algorithms (including static-image object detection8,9,10,11,12,13,14 and video object detection15,16,17,18,19) has steadily improved, surpassing human visual recognition capabilities in specific domains.
In this process, datasets (collections of input data paired with corresponding desired outputs) play a crucial role in the development of deep learning algorithms, and this is also true for flying bird detection in surveillance videos.

Several bird datasets already exist. Yet, owing to differing tasks or scenarios, the bird objects within them do not possess the characteristics common in surveillance videos (where, in most cases, single-frame image features are inconspicuous, object sizes are small, and shapes vary significantly during flight), rendering them unsuitable for developing flying bird detection algorithms for surveillance videos. In the CUB-200-2011 dataset20, the Birdsnap dataset21, and the NABirds dataset22, bird objects are typically large and clearly visible, making these datasets primarily useful for bird species classification. The Drone2021 bird dataset proposed by Sanae Fujii et al.23 in 2021 and the MVA2023 flying bird dataset introduced by Yuki Kondo et al.24 in 2023 were captured from a drone's perspective; they suit applications such as drone collision avoidance and bird dispersal but do not fully align with the needs of surveillance video detection. The wind farm flying bird dataset established by R. Yoshihashi et al.25 in 2015 and the airport flying bird dataset created by Hongyu Sun and his team26 in 2022, though captured from surveillance camera angles, suffer from limited scene diversity and simple backgrounds in which the flying bird objects are conspicuous.
In summary, datasets that comprehensively reflect the characteristics of flying bird objects in surveillance videos remain relatively scarce, which in turn constrains the advancement of detection methods in this field.

To address these issues, this paper presents a surveillance-video flying bird object dataset (FBD-SV-2024) aimed at facilitating the development and performance evaluation of flying bird object detection algorithms for surveillance videos. The dataset comprises 483 video clips totaling 28,694 image frames, of which 23,833 frames contain 28,366 flying bird objects. A random selection of 83 video clips serves as the test set, while the remaining 400 clips constitute the training set. The video images were captured in real-world surveillance scenarios, making the dataset suitable for the development and performance evaluation of practical flying bird object detection algorithms.

Methods

The creation of the FBD-SV-2024 dataset consists of three main steps: collection of video clips, frame extraction from the videos, and annotation of flying bird objects. The following provides a detailed account of each step.

Collection of Video Clips

We installed a pole on the fifth floor of a building located near a field and mounted on it a spherical camera (model: Hikvision DS-2DE7232IW-A; resolution set to 1280 × 720; frame rate set to 25 fps) that can rotate 360 degrees and supports zoom adjustment. During a six-month period of surveillance video collection (December 2023 to May 2024), we continuously adjusted the camera's angle and focus to capture videos from different perspectives. After collection, we manually screened and clipped the video segments containing flying bird objects for preservation (note that not all frames in every clip contain flying bird objects).
In the end, 483 video clips containing flying bird objects were collected, totaling 28,694 image frames. The clips were numbered and named sequentially from 1 to 483 (bird_1.mp4, bird_2.mp4, …, bird_483.mp4).

Frame Extraction From the Videos

For the subsequent annotation of flying birds and the convenience of model training, the collected video clips were divided into video frames. Two naming conventions were employed. The first appends a sequential frame number, starting from 0, to the video name (e.g., bird_1_000000.jpg, bird_1_000001.jpg, …); this convention is convenient for image-based object detection methods. The second numbers the frames starting from 0 and organizes them within folders named after the corresponding video (e.g., bird_1/000000.jpg, bird_1/000001.jpg, …, with all frames from the same video placed in a folder named after that video); this convention is advantageous for video-based object detection methods.

Annotation of Flying Bird Objects

The open-source tool "labelImg"27 was used to annotate the categories (bird) and bounding boxes of flying bird objects in the images. Annotation was conducted in three rounds, each handled by different individuals in a cross-checking manner (i.e., one person processed a batch of data in one round, and another person handled the same batch in the subsequent round). The first round involved general annotation, the second round checked for missed or incorrect annotations, and the third round refined the bounding boxes. Upon completion, two additional pieces of information were added to the labels: the object's difficulty level and the object's ID within the current video.
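As an illustrative sketch only (not the released scripts; `frame_path` and `extract_frames` are hypothetical helper names, and `extract_frames` assumes the third-party OpenCV package is available), the two frame-naming conventions described above could be generated as follows:

```python
import os

def frame_path(video_stem: str, frame_idx: int, per_video_dirs: bool = False) -> str:
    """Build a frame filename under either naming convention.

    per_video_dirs=False -> 'bird_1_000000.jpg' (image-based detectors)
    per_video_dirs=True  -> 'bird_1/000001.jpg' (video-based detectors)
    """
    if per_video_dirs:
        return f"{video_stem}/{frame_idx:06d}.jpg"
    return f"{video_stem}_{frame_idx:06d}.jpg"

def extract_frames(video_file: str, out_dir: str, per_video_dirs: bool = False) -> int:
    """Split a video into JPEG frames; returns the number of frames written."""
    import cv2  # third-party dependency (opencv-python), assumed available

    stem = os.path.splitext(os.path.basename(video_file))[0]  # e.g. 'bird_1'
    cap = cv2.VideoCapture(video_file)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        path = os.path.join(out_dir, frame_path(stem, idx, per_video_dirs))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        cv2.imwrite(path, frame)
        idx += 1
    cap.release()
    return idx
```

For the actual splitting tools shipped with the dataset, refer to the 'scripts' directory and its readme.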
The difficulty level represents recognition complexity, dividing all flying bird objects into four levels from 0 to 3: 0 (easy), 1 (moderate), 2 (difficult), and 3 (very difficult). The level was determined subjectively by human annotators and thus serves as a reference only. Additionally, video frames without flying bird objects were also given label files, but these contain only basic image information (such as image name and image size) and no object-related fields. Table 1 summarizes the primary information in the label files used for object detection and video object detection, respectively.

After annotation was completed, 83 video clips were randomly selected as the test set, and the remaining 400 video clips were used as the training set. Table 2 provides specific information on the training and test sets.

Characteristics of Flying Bird Objects in the FBD-SV-2024 Dataset

Through observation and analysis, the flying bird objects in the collected dataset were found to exhibit inconspicuous single-frame features, sparse distribution, generally small size, and varying shapes during flight. These characteristics are analyzed one by one below.

The Features of Flying Bird Objects in Single-Frame Images Are Not Obvious in Some Cases

Owing to the complexity of the background environment in surveillance videos, some flying bird objects blend seamlessly into the environment in single-frame images. In such cases, the human eye struggles to detect them from a single frame alone. In the dataset, cases where the single-frame features of a flying bird object are not apparent (manual difficulty level of 2 or 3) account for 36.7% of all flying bird objects. Fig. 1 showcases some examples (red boxes indicate the bounding boxes of the flying bird objects): panel (a) shows objects annotated with difficulty level 2, and panel (b) shows objects annotated with difficulty level 3.

Although the features of these flying bird objects are not evident in single-frame images, observing consecutive frames still reveals their presence, as shown in Fig. 2, which presents screenshots of the corresponding objects from Fig. 1 across 5 consecutive frames (highlighted by the yellow dashed boxes in Fig. 1). Therefore, when training a flying bird object detection model on this dataset, it is recommended to exploit the information of flying bird objects across consecutive frames.

The Distribution of Flying Bird Objects Is Relatively Sparse

We counted the number of images containing different numbers of flying bird objects; the statistics are shown in Fig. 3. In images containing flying bird objects, the count is generally low: most images contain only one or two flying bird objects, and no image contains more than four. Moreover, 4,861 images in the dataset contain no flying bird objects at all. The overall distribution of flying bird objects is therefore relatively sparse, which may lead to a severe imbalance between positive and negative samples; this issue should be fully considered when developing flying bird object detection methods for surveillance videos.

The Size of Flying Bird Objects Is Generally Small

Fig.
4(a) illustrates a schematic diagram of a surveillance camera capturing flying bird objects (in an ideal scenario without ground, perching sites, or obstructions). As the figure shows, the camera's imaging space resembles a quadrangular pyramid, which this paper divides into three regions: I, II, and III. Flying bird objects in Region I are close to the camera, occupy more pixels in the video frame, and can be considered large objects [e.g., the flying bird object at point P, with its imaging shown in Fig. 4(b)]. Flying bird objects in Regions II and III are farther from the camera, occupy fewer pixels, and appear smaller than objects of the same physical size in Region I [e.g., the flying bird object at point Q, with its imaging shown in Fig. 4(c)]. By the geometry of a quadrangular pyramid, the spatial volumes of Regions II and III are much larger than that of Region I. Therefore, if flying bird objects were uniformly distributed within the imaging space (in reality their distribution is influenced by factors such as perching sites and obstructions; the assumption is made only for a rough analysis of size characteristics), the probability of an object lying in Regions II or III would be much higher than in Region I. Consequently, flying bird objects in surveillance videos theoretically tend to be small.

Furthermore, this paper analyzes the size distribution of bird objects within the dataset (Fig. 5 presents a bar chart of the size distribution, and Fig. 6 shows a scatter plot of object sizes). From Fig. 5, the sizes of flying bird objects are primarily distributed between 10 × 10 and 70 × 70 pixels, accounting for approximately 94% of all flying bird objects; Fig. 7 showcases examples in this size range (the yellow dashed box before enlargement measures 80 × 80 pixels). Statistically, flying bird objects smaller than 32 × 32 pixels constitute approximately 49.99% of all objects, those between 32 × 32 and 96 × 96 pixels account for approximately 48.48%, and objects larger than 96 × 96 pixels make up only about 1.53%. These statistics confirm that the dataset's flying bird objects are generally small, so special attention should be given to small-scale objects when developing detection methods for flying bird objects in surveillance videos.

The Appearance of Flying Birds Varies Greatly During Flight

Most flying birds must flap their wings continuously to generate sufficient lift and balance their gravitational and lift forces, which is precisely why flying birds are non-rigid objects whose appearances change constantly in a periodic (or non-periodic) manner. Fig. 8 demonstrates the various states of some flying bird objects during their flight in the dataset.
As can be seen from the figure, the appearance of a flying bird varies constantly throughout its flight. Furthermore, in surveillance videos the appearance of most flying bird objects is asymmetric and irregular.

Because the appearance of flying bird objects changes significantly during flight, tasks that associate object bounding boxes across consecutive video frames (such as object tracking) should not rely too heavily on the Intersection over Union (IoU) between boxes when designing association-matching algorithms. Given the asymmetric and irregular appearance of flying bird objects, if an anchor-free deep learning approach is adopted for detection, a dynamic label-assignment method is recommended when training the detection model.

Data Records

The FBD-SV-202428 dataset has been shared via figshare (https://figshare.com/s/1ca0193680f894a65371), Baidu Disk (https://pan.baidu.com/s/1sw7bv4BeiMnHWyH4BNutYg?pwd=48w4), and Kaggle (https://www.kaggle.com/datasets/swjtuziwei/fbd-sv-2024); the relevant data processing code is also provided (https://github.com/Ziwei89/FBD-SV-2024_github).

The directory structure of the dataset is shown in Fig. 9. Under the main directory 'FBD-SV-2024' there are four subdirectories: 'videos', 'images', 'labels', and 'VID'. 'videos' contains all the video clips in the dataset. The images in the 'images' directory correspond to the labels in the 'labels' directory, with each image sample being independent and suitable for developing image-based object detection algorithms. The 'VID' directory contains two further subdirectories, 'images' and 'labels', where the images belong to specific video clips and are used for developing video object detection algorithms. Note that, owing to the large volume of image data, the image files are not uploaded directly; instead, the 'FBD-SV-2024' directory includes a 'scripts' subdirectory with script tools for splitting the video files into images. For their specific usage, please refer to the 'readme' file in the 'FBD-SV-2024' directory.

Technical Validation

In this section, state-of-the-art object detection algorithms are evaluated on the proposed dataset, with the aim of demonstrating that the dataset remains challenging even for these advanced algorithms and of providing a comparative reference for developing flying bird object detection algorithms for surveillance videos using this dataset. Specifically, the experiments include image-based object detection methods (YOLOV5L8, YOLOV6L10, YOLOXL9, YOLOV8L12, YOLOV9E13, YOLOV10L14, and SSD29), video-based object detection methods (FGFA15, SELSA16, and Temporal RoI Align17), and our previous works on flying bird object detection in surveillance videos (FBOD-BMI30 and FBOD-SV31). The experimental platform, implementation details, evaluation metrics, and experimental results are introduced and analyzed below.

Experimental Platforms

The hardware platform is a desktop computer with an Intel Core i7-12700 CPU, 32 GB of RAM, and an NVIDIA GeForce RTX 3090 graphics card with 24 GB of video memory. The software platform comprises Ubuntu 22.04, Python 3.10.6, PyTorch 1.11.0, and CUDA 11.3.

Implementation Details

The (video) object detection algorithms YOLOV5L8, YOLOV6L10, YOLOXL9, YOLOV8L12, YOLOV9E13, YOLOV10L14, SSD29, FGFA15, SELSA16, and Temporal RoI Align17 utilized their respective open-source codes.
Among them, YOLOV5L8, YOLOV6L10, YOLOV8L12, YOLOV9E13, and YOLOV10L14 employ the open-source framework provided by Ultralytics (https://github.com/ultralytics/ultralytics); YOLOXL9 and SSD29 use the mmdetection32 framework; and FGFA15, SELSA16, and Temporal RoI Align17 use the mmtracking33 framework. All models were trained from scratch on the training set without pre-trained weights, all data augmentation methods provided by the corresponding frameworks were applied during training, and evaluations were conducted on the test set.

Evaluation Metrics

This paper adopts the Average Precision (AP) metric from Pascal VOC 200734, a commonly used metric for evaluating object detection algorithms. Specifically, it reports AP50 (the subscript 50 indicates that a detection is counted as a true positive when its IoU with the ground truth is at least 50%, i.e., the IoU threshold is set to 50%), AP75 (the subscript 75 has the analogous meaning), and AP (the mean of the AP values computed at IoU thresholds from 50% to 95% in steps of 5%).

Experimental Results

The quantitative experimental results are shown in Table 3. In terms of AP50, FBOD-SV31 performs best, achieving 71.9%: 9.9 percentage points higher than YOLOXL9 (62.0%), the most accurate image-based method; 31.9 points higher than SELSA16 (40.0%), the most accurate video-based method; and 2.7 points higher than FBOD-BMI30. It is worth noting that the image-based methods generally outperform the video-based methods in terms of AP50.
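To make the true-positive criterion concrete, the following sketch matches score-ranked detections to ground-truth boxes at a configurable IoU threshold. This is an illustrative simplification of the matching step underlying AP50/AP75 (the function names and greedy matching strategy are our assumptions, not the exact Pascal VOC reference implementation):

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def match_detections(dets, gts, iou_thr=0.5):
    """Greedily match score-sorted detections to ground-truth boxes.

    A detection counts as a true positive when its best still-unmatched
    ground truth overlaps it with IoU >= iou_thr. Returns (tp, fp, fn).
    """
    dets = sorted(dets, key=lambda d: d["score"], reverse=True)
    used = [False] * len(gts)
    tp = fp = 0
    for d in dets:
        best, best_iou = -1, iou_thr
        for i, g in enumerate(gts):
            if not used[i]:
                o = iou(d["box"], g)
                if o >= best_iou:
                    best, best_iou = i, o
        if best >= 0:
            used[best] = True
            tp += 1
        else:
            fp += 1
    return tp, fp, len(gts) - tp
```

Sweeping a score threshold over such matches yields the precision-recall curve from which AP is computed; raising `iou_thr` from 0.5 to 0.75 turns the AP50 criterion into the AP75 one.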
In addition, regarding detection efficiency, the video-based object detection methods have the slowest running speeds. In Table 3, APS, APM, and APL denote the detection performance on flying bird objects with sizes of up to 32 × 32 pixels, 32 × 32 to 48 × 48 pixels, and 48 × 48 pixels and above, respectively.

Meanwhile, this paper selects several advanced detection methods (YOLOXL9, YOLOV8L12, SSD29, SELSA16, Temporal RoI Align17, and FBOD-SV31) and computes their confusion matrices on the FBD-SV-2024 test set, shown in Fig. 10 with the true-positive IoU threshold set to 0.2 and the true-positive score threshold set to 0.3. These confusion matrices reveal the performance characteristics of each method. YOLOXL9, YOLOV8L12, and SSD29 exhibit significant missed detections, i.e., some objects are not identified at all. FBOD-SV31 has a relatively low missed-detection rate but a higher false-positive rate (non-objects incorrectly identified as objects). SELSA16 and Temporal RoI Align17 suffer from both severe missed detections and a high number of false positives.

Image-based object detection methods rely primarily on the appearance features of objects in still images. However, in single frames of surveillance video, some flying bird objects exhibit inconspicuous appearance features and small sizes, which are prone to being lost during feature extraction, leading to detection failure. Although video-based methods utilize information from multiple frames, they typically extract intermediate features from single frames first, followed by feature aggregation.
For flying bird objects with inconspicuous features, features are easily lost during single-frame feature extraction, producing incorrect features at the aggregation stage and reducing detection accuracy (training a detection model on such incorrect features may also lead to overfitting).

Two methods specifically designed for detecting flying bird objects in surveillance videos, FBOD-BMI30 and FBOD-SV31, mitigate the problem of inconspicuous single-frame features by aggregating flying bird object features before feature extraction. They also consider and address challenges such as the small size and irregular shape of flying bird objects, achieving significant results. However, both methods aggregate information only over n consecutive frames. When a flying bird object with inconspicuous features remains momentarily stationary within those n frames, i.e., provides no additional spatiotemporal variation, this aggregation strategy may still produce incorrect features. There is therefore still room for improving the detection performance of these two methods.

In summary, current image-based methods, video-based methods, and methods specifically designed for flying bird objects in surveillance videos still face numerous challenges on the dataset presented in this paper: their quantitative evaluation metrics are low, and they produce many false positives and missed detections at inference time. We therefore suggest that, when developing detection methods for flying bird objects in surveillance videos, emphasis be placed on effectively utilizing the information of flying bird objects across all available historical video frames, while also fully considering the imbalance between positive and negative samples and the challenges posed by small objects, irregular shapes, and varying appearances during flight.
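To make the multi-frame suggestion concrete, a detector that aggregates temporal information could be fed windows of consecutive frames rather than single images. The following generic sliding-window helper is a minimal sketch of that interface (the function name and window-based design are illustrative assumptions, not the FBOD-SV implementation):

```python
from collections import deque
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def sliding_windows(frames: Iterable[T], n: int) -> Iterator[List[T]]:
    """Yield every window of n consecutive frames from a frame stream.

    A temporally aggregating detector would run once per window rather
    than once per frame; streams shorter than n yield nothing.
    """
    window: deque = deque(maxlen=n)
    for frame in frames:
        window.append(frame)
        if len(window) == n:
            yield list(window)
```

For example, with n = 3 a 5-frame clip yields the windows [0, 1, 2], [1, 2, 3], and [2, 3, 4], so each prediction can draw on the spatiotemporal variation of a moving bird across adjacent frames.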

Code availability

The FBD-SV-2024 dataset28 has been shared via figshare (https://figshare.com/s/1ca0193680f894a65371), Baidu Disk (https://pan.baidu.com/s/1sw7bv4BeiMnHWyH4BNutYg?pwd=48w4), and Kaggle (https://www.kaggle.com/datasets/swjtuziwei/fbd-sv-2024); the relevant data processing code is provided at https://github.com/Ziwei89/FBD-SV-2024_github.

References

1. Wu, T., Luo, X. & Xu, Q. A new skeleton based flying bird detection method for low-altitude air traffic management. Chinese Journal of Aeronautics 31, 2149–2164, https://doi.org/10.1016/j.cja.2018.01.018 (2018).
2. Enos, J. K., Ward, M. P. & Hauber, M. E. A review of the scientific evidence on the impact of biologically salient frightening devices to protect crops from avian pests. Crop Protection 148, 105734, https://doi.org/10.1016/j.cropro.2021.105734 (2021).
3. Gao, W., Wu, Y., Hong, C., Wai, R.-J. & Fan, C.-T. Rcvnet: A bird damage identification network for power towers based on fusion of RF images and visual images. Advanced Engineering Informatics 57, 102104, https://doi.org/10.1016/j.aei.2023.102104 (2023).
4. Cinkler, T., Nagy, K., Simon, C., Vida, R. & Rajab, H. Two-phase sensor decision: Machine-learning for bird sound recognition and vineyard protection. IEEE Sensors Journal 22, 11393–11404, https://doi.org/10.1109/JSEN.2021.3134817 (2022).
5. Hoffmann, F., Ritchie, M., Fioranelli, F., Charlish, A. & Griffiths, H. Micro-Doppler based detection and tracking of UAVs with multistatic radar. In 2016 IEEE Radar Conference (RadarConf), 1–6, https://doi.org/10.1109/RADAR.2016.7485236 (2016).
6. Jahangir, M., Baker, C. J. & Oswald, G. A. Doppler characteristics of micro-drones with L-band multibeam staring radar. In 2017 IEEE Radar Conference (RadarConf), 1052–1057, https://doi.org/10.1109/RADAR.2017.7944360 (2017).
7. Li, J., Shimasaki, K. & Ishii, I. Long-distance avian identification approach based on high-frame-rate video. In 2023 IEEE 19th International Conference on Automation Science and Engineering (CASE), 1–7, https://doi.org/10.1109/CASE56687.2023.10260444 (2023).
8. Ultralytics contributors. You only look once version 5 (YOLOv5). https://github.com/ultralytics/yolov5 (2021).
9. Ge, Z., Liu, S., Wang, F., Li, Z. & Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv preprint arXiv:2107.08430 (2021).
10. Li, C. et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv preprint arXiv:2209.02976 (2022).
11. Wang, C.-Y., Bochkovskiy, A. & Liao, H.-Y. M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), https://doi.org/10.1109/cvpr52729.2023.00721 (2023).
12. Ang, G. J. N. et al. A novel application for real-time arrhythmia detection using YOLOv8. arXiv preprint arXiv:2305.16727 (2023).
13. Wang, C.-Y., Yeh, I.-H. & Liao, H.-Y. M. YOLOv9: Learning what you want to learn using programmable gradient information. In European Conference on Computer Vision (ECCV), 1–21 (2024).
14. Wang, A., Chen, H., Liu, L., Chen, K., Lin, Z. & Han, L. YOLOv10: Real-time end-to-end object detection. Advances in Neural Information Processing Systems 37, 107984–108011 (2024).
15. Zhu, X., Wang, Y., Dai, J., Yuan, L. & Wei, Y. Flow-guided feature aggregation for video object detection. In 2017 IEEE International Conference on Computer Vision (ICCV), 408–417, https://doi.org/10.1109/ICCV.2017.52 (2017).
16. Wu, H., Chen, Y., Wang, N. & Zhang, Z.-X. Sequence level semantics aggregation for video object detection. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 9216–9224, https://doi.org/10.1109/ICCV.2019.00931 (2019).
17. Gong, T. et al. Temporal RoI Align for video object recognition. In The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21), 1442–1450 (2021).
18. Han, L., Wang, P., Yin, Z., Wang, F. & Li, H. Class-aware feature aggregation network for video object detection. IEEE Transactions on Circuits and Systems for Video Technology 32, 8165–8178, https://doi.org/10.1109/TCSVT.2021.3094533 (2022).
19. Xu, C., Zhang, J., Wang, M., Tian, G. & Liu, Y. Multilevel spatial-temporal feature aggregation for video object detection. IEEE Transactions on Circuits and Systems for Video Technology 32, 7809–7820, https://doi.org/10.1109/TCSVT.2022.3183646 (2022).
20. Wah, C., Branson, S., Welinder, P., Perona, P. & Belongie, S. Caltech-UCSD Birds-200-2011. Tech. Rep. CNS-TR-2011-001, California Institute of Technology (2011).
21. Berg, T. et al. Birdsnap: Large-scale fine-grained visual categorization of birds. In 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019–2026, https://doi.org/10.1109/CVPR.2014.259 (2014).
22. Van Horn, G. et al. Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 595–604, https://doi.org/10.1109/CVPR.2015.7298658 (2015).
23. Fujii, S., Akita, K. & Ukita, N. Distant bird detection for safe drone flight and its dataset. In 2021 17th International Conference on Machine Vision and Applications (MVA), 1–5, https://doi.org/10.23919/MVA51890.2021.9511386 (2021).
24. Kondo, Y. et al. MVA2023 small object detection challenge for spotting birds: Dataset, methods, and results. In 2023 18th International Conference on Machine Vision and Applications (MVA), 1–11, https://doi.org/10.23919/MVA57639.2023.10215935 (2023).
25. Yoshihashi, R., Kawakami, R., Iida, M. & Naemura, T. Bird detection and species classification with time-lapse images around a wind farm: Dataset construction and evaluation. Wind Energy 20, 1983–1995 (2017).
26. Sun, H. et al. AirBirds: A large-scale challenging dataset for bird strike prevention in real-world airports. In Wang, L., Gall, J., Chin, T.-J., Sato, I. & Chellappa, R. (eds.) Computer Vision – ACCV 2022, 409–424 (Springer Nature Switzerland, Cham, 2023).
27. Tzutalin. LabelImg. https://github.com/HumanSignal/labelImg (2015).
28. Sun, Z.-W. et al. FBD-SV-2024. figshare https://doi.org/10.6084/m9.figshare.27377595.v1 (2024).
29. Liu, W. et al. SSD: Single shot multibox detector. In 2016 European Conference on Computer Vision (ECCV) (2016).
30. Sun, Z.-W., Hua, Z.-X., Li, H.-C. & Zhong, H.-Y. Flying bird object detection algorithm in surveillance video based on motion information. IEEE Transactions on Instrumentation and Measurement 73, 1–15, https://doi.org/10.1109/TIM.2023.3334348 (2024).
31. Sun, Z.-W., Hua, Z.-X., Li, H.-C. & Li, Y. A flying bird object detection method for surveillance video. IEEE Transactions on Instrumentation and Measurement 73, 1–14, https://doi.org/10.1109/TIM.2024.3435183 (2024).
32. Chen, K. et al. MMDetection: Open MMLab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155 (2019).
33. MMTracking contributors. MMTracking: OpenMMLab video perception toolbox and benchmark. https://github.com/open-mmlab/mmtracking (2020).
34. Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J. & Zisserman, A. The Pascal Visual Object Classes (VOC) challenge. International Journal of Computer Vision 88, 303–338, https://doi.org/10.1007/s11263-009-0275-4 (2010).

Acknowledgements

We would like to thank Zi-Wei Sun, Ze-Xi Hua, Zhi-Peng Qi, Xiang Li, Yan Li, and Jin-Chi Zhang for their work in the construction and production of the dataset, and Zi-Wei Sun, Ze-Xi Hua, and Heng-Chao Li for their work in the writing of the dataset description paper.

Author information

Authors and Affiliations
Southwest Jiaotong University, School of Information Science and Technology, Chengdu, 611756, China
Zi-Wei Sun, Ze-Xi Hua, Heng-Chao Li, Zhi-Peng Qi, Xiang Li, Yan Li & Jin-Chi Zhang

Contributions
Zi-Wei Sun took overall responsibility for the construction of the dataset, participating in data collection, annotation, and processing, and also contributed to the writing of the paper. Ze-Xi Hua provided financial and equipment support during the dataset construction process and engaged in revising the paper. Heng-Chao Li was involved in revising the paper. Zhi-Peng Qi, Xiang Li, Yan Li, and Jin-Chi Zhang participated in the data annotation work.

Corresponding authors
Correspondence to Ze-Xi Hua or Heng-Chao Li.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access
This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Cite this article

Sun, Z.-W., Hua, Z.-X., Li, H.-C. et al. FBD-SV-2024: Flying Bird Object Detection Dataset in Surveillance Video. Sci Data 12, 530 (2025). https://doi.org/10.1038/s41597-025-04872-6

Received: 11 November 2024
Accepted: 20 March 2025
Published: 29 March 2025

