DOI: 10.1145/3581783.3611811
research-article

Unite-Divide-Unite: Joint Boosting Trunk and Structure for High-accuracy Dichotomous Image Segmentation

Published: 27 October 2023

Abstract

High-accuracy Dichotomous Image Segmentation (DIS) aims to pinpoint category-agnostic foreground objects in natural scenes. The main challenge in DIS is to identify the dominant area with high accuracy while also rendering detailed object structure. However, directly using a general encoder-decoder architecture may result in an oversupply of high-level features and neglect the shallow spatial information needed to partition meticulous structures. To fill this gap, we introduce a novel Unite-Divide-Unite Network (UDUN) that restructures and bipartitely arranges complementary features to simultaneously boost the effectiveness of trunk and structure identification. UDUN has several strengths. First, a dual-size input is fed into the shared backbone to produce more holistic and detailed features while keeping the model lightweight. Second, a simple Divide-and-Conquer Module (DCM) decouples multiscale low- and high-level features into our structure decoder and trunk decoder to obtain structure and trunk information, respectively. Moreover, we design a Trunk-Structure Aggregation module (TSA) in our union decoder that performs cascade integration for uniform high-accuracy segmentation. As a result, UDUN performs favorably against state-of-the-art competitors in all six evaluation metrics on the overall DIS-TE set, achieving a weighted F-measure of 0.772 and an HCE of 977. With 1024×1024 input, our model enables real-time inference at 65.3 fps with ResNet-18. The source code is available at https://github.com/PJLallen/UDUN.
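The unite-divide-unite data flow described in the abstract can be illustrated at the level of feature-map shapes. The sketch below is not the authors' implementation (see the repository linked above for that); it only tracks shapes in plain Python. The stage strides and channel widths are the standard ResNet-18 values. The low-resolution input size (`low_size=512`), the `split_stage` threshold that separates "low-level" from "high-level" features, the single-channel output mask, and all function and class names are assumptions made for illustration.

```python
from dataclasses import dataclass

# (output stride, channels) of the four residual stages of a standard ResNet-18.
RESNET18_STAGES = [(4, 64), (8, 128), (16, 256), (32, 512)]


@dataclass(frozen=True)
class Feature:
    source: str    # "high" or "low": which of the two input sizes produced it
    stage: int     # backbone stage index, 1 (shallow) to 4 (deep)
    channels: int
    height: int
    width: int


def backbone_shapes(source, size):
    """Feature shapes a ResNet-18-style backbone yields for a square input."""
    return [
        Feature(source, i + 1, channels, size // stride, size // stride)
        for i, (stride, channels) in enumerate(RESNET18_STAGES)
    ]


def unite(high_size=1024, low_size=512):
    """First 'unite': both input sizes pass through the same shared backbone.

    low_size is a hypothetical value chosen for this sketch.
    """
    return backbone_shapes("high", high_size) + backbone_shapes("low", low_size)


def divide(features, split_stage=2):
    """'Divide': route low-level features to the structure branch and
    high-level features to the trunk branch.

    split_stage is a hypothetical threshold, not taken from the paper.
    """
    structure = [f for f in features if f.stage <= split_stage]
    trunk = [f for f in features if f.stage > split_stage]
    return structure, trunk


def union_output_shape(high_size=1024):
    """Second 'unite': the union decoder merges both branches into one map.

    Assumes a single-channel mask at the high-resolution input size.
    """
    return (1, high_size, high_size)


if __name__ == "__main__":
    feats = unite()
    structure, trunk = divide(feats)
    print("structure branch:", [(f.source, f.stage, f.height) for f in structure])
    print("trunk branch:    ", [(f.source, f.stage, f.height) for f in trunk])
    print("output mask:     ", union_output_shape())
```

Under these assumptions, the structure branch receives the larger, shallower maps that retain spatial detail, while the trunk branch receives the smaller, deeper maps that carry semantic context.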

Supplemental Material

MP4 File
This video provides a comprehensive introduction to UDUN, a novel Unite-Divide-Unite Network for high-accuracy dichotomous image segmentation. The video's structure is as follows: We start with an overview of the High-accuracy Dichotomous Image Segmentation (DIS) task and present our unique insights and motivations. Next, we look in depth at the overall architecture of our UDUN network and provide a thorough explanation of its workflow. Then, we conduct a comprehensive comparison with other cutting-edge models, presenting both quantitative and qualitative results to highlight the superior performance of our method. Finally, we showcase a series of ablation studies to demonstrate the effectiveness of each component and illustrate real-world applications of the UDUN network in relevant scenarios.

    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN: 9798400701085
    DOI: 10.1145/3581783

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. dichotomous image segmentation
    2. fully convolutional network
    3. high-resolution detection

    Qualifiers

    • Research-article

    Conference

    MM '23
    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa ON, Canada

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Cited By

    • (2024) Bilateral Reference for High-Resolution Dichotomous Image Segmentation. CAAI Artificial Intelligence Research. DOI: 10.26599/AIR.2024.9150038 (9150038). Online publication date: Dec-2024
    • (2024) Segment Anything with Precise Interaction. Proceedings of the 32nd ACM International Conference on Multimedia. DOI: 10.1145/3664647.3681470 (3790-3799). Online publication date: 28-Oct-2024
    • (2024) Multi-View Aggregation Network for Dichotomous Image Segmentation. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). DOI: 10.1109/CVPR52733.2024.00376 (3921-3930). Online publication date: 16-Jun-2024
    • (2024) Cross-Domain Facial Expression Recognition by Combining Transfer Learning and Face-Cycle Generative Adversarial Network. Multimedia Tools and Applications. DOI: 10.1007/s11042-024-18713-y. Online publication date: 11-Mar-2024
    • (2024) Boundary-aware dichotomous image segmentation. The Visual Computer. DOI: 10.1007/s00371-024-03295-5, 40:12 (9051-9062). Online publication date: 26-Feb-2024
