
An Underwater Organism Image Dataset and a Lightweight Module Designed for Object Detection Networks

Published: 07 February 2024
Abstract

    Long-term monitoring and recognition of underwater organisms are of great significance in marine ecology, fisheries science, and many other disciplines. Traditional techniques in this field, whether based on manual fishing or on sonar, are flawed: manual fishing is time-consuming and unsuitable for scientific research, while sonar suffers from low acoustic image accuracy and large echo errors. In recent years, the rapid development of deep learning and its excellent performance on computer vision tasks have made vision-based solutions feasible. However, research in this area remains insufficient in two main respects. First, to our knowledge, there is still a lack of large-scale datasets of underwater organism images with accurate annotations. Second, given the limited hardware resources of underwater devices, an underwater organism detection algorithm that is both accurate and lightweight enough for real-time inference is still lacking. To help fill these research gaps, we established the Multiple Kinds of Underwater Organisms (MKUO) dataset, which provides accurate bounding-box annotations with taxonomic information and consists of 10,043 annotated images covering eighty-four underwater organism categories. On this benchmark, we evaluated a series of existing object detection algorithms to obtain their accuracy and complexity indicators as a baseline for future reference. In addition, we propose a novel lightweight module, the Sparse Ghost Module, designed especially for object detection networks. By substituting our proposed convolution for the standard one, network complexity can be significantly reduced and inference speed greatly improved without obvious loss of detection accuracy.
To make our results reproducible, the dataset and the source code are available online at https://cslinzhang.github.io/MKUO-and-Sparse-Ghost-Module/.
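The abstract does not spell out the Sparse Ghost Module's internals, but the GhostNet-style parameter accounting behind such convolution substitutions is easy to sketch. The snippet below is an illustrative count only, assuming a hypothetical ghost-style convolution that produces a fraction of the output channels with an ordinary convolution and synthesizes the rest with cheap depthwise operations; the function names, split ratio `s`, and kernel sizes are assumptions, not the paper's actual design.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def ghost_conv_params(c_in: int, c_out: int, k: int, s: int = 2, d: int = 3) -> int:
    """Weights in a ghost-style convolution (illustrative, not the paper's design):
    a primary k x k convolution produces c_out // s "intrinsic" feature maps,
    and cheap d x d depthwise operations generate the remaining
    (s - 1) * (c_out // s) maps."""
    m = c_out // s                  # intrinsic channels from the primary conv
    primary = c_in * m * k * k      # ordinary convolution weights
    cheap = (s - 1) * m * d * d     # depthwise "cheap operation" weights
    return primary + cheap


if __name__ == "__main__":
    std = standard_conv_params(64, 128, 3)   # 73728 weights
    ghost = ghost_conv_params(64, 128, 3)    # 37440 weights
    print(f"standard: {std}, ghost: {ghost}, ratio: {std / ghost:.2f}")
```

With the default split ratio of 2, the weight count is roughly halved, which is the kind of complexity reduction the abstract attributes to replacing the standard convolution.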



Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 5
May 2024, 650 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3613634
Editor: Abdulmotaleb El Saddik

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 07 February 2024
      Online AM: 11 January 2024
      Accepted: 08 January 2024
      Revised: 19 November 2023
      Received: 23 February 2023
      Published in TOMM Volume 20, Issue 5


      Author Tags

      1. Benchmark dataset
      2. object detection
3. lightweight module

      Qualifiers

      • Research-article

      Funding Sources

      • National Natural Science Foundation of China
      • Shanghai Science and Technology Innovation Plan
      • Shuguang Program of Shanghai Education Development Foundation and Shanghai Municipal Education Commission
      • Fundamental Research Funds for the Central Universities
