Mirror Segmentation via Semantic-aware Contextual Contrasted Feature Learning

Published: 17 February 2023

    Abstract

    Mirrors are everywhere in our daily lives. Existing computer vision systems do not consider mirrors, and hence may get confused by the reflected content inside a mirror, resulting in severe performance degradation. However, separating the real content outside a mirror from the reflected content inside it is non-trivial. The key challenge is that mirrors typically reflect content similar to their surroundings, making it very difficult to differentiate the two. In this article, we present a novel method to segment mirrors from a single RGB image. To the best of our knowledge, this is the first work to address the mirror segmentation problem with a computational approach. We make the following contributions: First, we propose a novel network, called MirrorNet+, for mirror segmentation, by modeling both contextual contrasts and semantic associations. Second, we construct the first large-scale mirror segmentation dataset, which consists of 4,018 pairs of images containing mirrors and their corresponding manually annotated mirror masks, covering a variety of daily-life scenes. Third, we conduct extensive experiments to evaluate the proposed method and show that it outperforms related state-of-the-art detection and segmentation methods. Fourth, we further validate the effectiveness and generalization capability of the proposed semantic-aware contextual contrasted feature learning by applying MirrorNet+ to other vision tasks, i.e., salient object detection and shadow detection. Finally, we provide some applications of mirror segmentation and analyze possible future research directions. Project homepage: https://mhaiyang.github.io/TOMM2022-MirrorNet+/index.html.
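
    The abstract's central idea is to model contextual contrasts (how a mirror region differs from its surroundings) together with semantic associations. As a rough illustration only, and not the authors' MirrorNet+ implementation, the PyTorch sketch below shows one plausible way a contextual-contrast feature can be computed: a dilated context branch with a large receptive field is subtracted from a local branch, so pixels whose appearance deviates from their surroundings respond strongly. All module names, channel widths, and dilation rates are illustrative assumptions, not values from the paper.

```python
# Hypothetical contextual-contrast block (illustrative sketch, not the paper's code).
import torch
import torch.nn as nn


class ContextualContrastBlock(nn.Module):
    """Contrast a local feature against a dilated, large-receptive-field context feature."""

    def __init__(self, channels: int, dilation: int = 4):
        super().__init__()
        # Local branch: plain 3x3 convolution (small receptive field).
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Context branch: dilated 3x3 convolution (large receptive field).
        self.context = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The contrast is the difference between local and contextual responses;
        # it is large where a region looks unlike its surroundings.
        return self.local(x) - self.context(x)


if __name__ == "__main__":
    feats = torch.randn(1, 64, 56, 56)             # dummy backbone feature map
    contrast = ContextualContrastBlock(64)(feats)  # same spatial size as input
    print(contrast.shape)                          # torch.Size([1, 64, 56, 56])
```

    In a complete model, several such blocks at multiple dilation rates would typically be combined with semantic features from a pretrained backbone; that integration is beyond the scope of this sketch.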


      Published In

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 2s
      April 2023
      545 pages
      ISSN:1551-6857
      EISSN:1551-6865
      DOI:10.1145/3572861
      Editor: Abdulmotaleb El Saddik

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 17 February 2023
      Online AM: 05 November 2022
      Accepted: 25 September 2022
      Revised: 27 August 2022
      Received: 22 July 2022
      Published in TOMM Volume 19, Issue 2s

      Author Tags

      1. Mirror segmentation
      2. contextual contrast
      3. semantic association
      4. dataset
      5. reflection
      6. deep neural network

      Qualifiers

      • Research-article

      Funding Sources

      • National Key Research and Development Program of China
      • National Natural Science Foundation of China
      • Innovation Technology Funding of Dalian
      • Research Grants Council of Hong Kong
      • Strategic Research Grant from City University of Hong Kong

      Article Metrics

      • Downloads (Last 12 months): 256
      • Downloads (Last 6 weeks): 24

      Cited By

      • (2024) Distraction-aware camouflaged object segmentation. SCIENTIA SINICA Informationis 54(3), 653. DOI: 10.1360/SSI-2022-0138. Online publication date: 14-Mar-2024.
      • (2024) ADRNet-S*: Asymmetric depth registration network via contrastive knowledge distillation for RGB-D mirror segmentation. Information Fusion 108, 102392. DOI: 10.1016/j.inffus.2024.102392. Online publication date: Aug-2024.
      • (2024) Key points trajectory and multi-level depth distinction based refinement for video mirror and glass segmentation. Multimedia Tools and Applications. DOI: 10.1007/s11042-024-19627-5. Online publication date: 20-Jun-2024.
      • (2023) UTLNet: Uncertainty-Aware Transformer Localization Network for RGB-Depth Mirror Segmentation. IEEE Transactions on Multimedia 26, 4564–4574. DOI: 10.1109/TMM.2023.3323890. Online publication date: 11-Oct-2023.
      • (2023) Camouflaged Object Segmentation with Omni Perception. International Journal of Computer Vision 131(11), 3019–3034. DOI: 10.1007/s11263-023-01838-2. Online publication date: 1-Nov-2023.
