
Hierarchical and Progressive Image Matting

Published: 06 February 2023


Most matting research relies on advanced semantics to achieve high-quality alpha mattes, and a direct combination of low-level features is usually explored to complement alpha details. However, we argue that appearance-agnostic integration can only provide biased foreground (FG) details and that alpha mattes require the aggregation of features from different levels for better pixel-wise opacity perception. In this article, we propose an end-to-end hierarchical and progressive attention matting network (HAttMatting++), which can better predict the opacity of the FG from single RGB images without additional input. Specifically, we use channel-wise attention (CA) to distill pyramidal features and employ spatial attention (SA) at different levels to filter appearance cues. This progressive attention mechanism can estimate alpha mattes from adaptive semantics and semantics-indicated boundaries. We also introduce a hybrid loss function that fuses structural similarity, mean square error, adversarial loss, and sentry supervision to guide the network toward a better overall FG structure. In addition, we construct a large-scale and challenging image matting dataset comprising 59,000 training images and 1,000 test images (a total of 646 distinct FG alpha mattes), which can further improve the robustness of our hierarchical and progressive aggregation model. Extensive experiments demonstrate that the proposed HAttMatting++ can capture sophisticated FG structures and achieve state-of-the-art performance with single RGB images as input.
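To make the idea of a hybrid loss more concrete, the following is a minimal sketch in Python of how a mean square error term and a structural similarity term might be combined for a predicted alpha matte. It is an illustration under stated assumptions, not the authors' implementation: the function names, the weights `w_mse` and `w_ssim`, and the single-window ("global") form of SSIM are all hypothetical choices, and the adversarial and sentry supervision terms named in the abstract are omitted because this page does not describe them.

```python
from statistics import fmean


def mse(pred, target):
    """Mean square error between two flattened alpha mattes."""
    return fmean((p - t) ** 2 for p, t in zip(pred, target))


def global_ssim(pred, target, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM computed over the whole matte as a single window.

    Assumption: alpha values lie in [0, 1], so the dynamic range is 1 and
    the usual stabilising constants reduce to 0.01**2 and 0.03**2.
    Practical SSIM implementations average over local windows instead.
    """
    mu_p, mu_t = fmean(pred), fmean(target)
    var_p = fmean((p - mu_p) ** 2 for p in pred)
    var_t = fmean((t - mu_t) ** 2 for t in target)
    cov = fmean((p - mu_p) * (t - mu_t) for p, t in zip(pred, target))
    numerator = (2 * mu_p * mu_t + c1) * (2 * cov + c2)
    denominator = (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2)
    return numerator / denominator


def hybrid_loss(pred, target, w_mse=1.0, w_ssim=1.0):
    """Weighted sum of MSE and (1 - SSIM); the weights are hypothetical."""
    return w_mse * mse(pred, target) + w_ssim * (1.0 - global_ssim(pred, target))


# Toy flattened alpha mattes (illustrative values only).
alpha_true = [0.0, 0.2, 0.8, 1.0]
alpha_pred = [0.1, 0.3, 0.7, 0.9]
print(hybrid_loss(alpha_pred, alpha_true))
```

The pixel-wise MSE term penalises per-pixel opacity errors, while the `1 - SSIM` term penalises differences in overall structure; combining them is one common way to balance detail accuracy against structural consistency.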



Information & Contributors


Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 2
March 2023
540 pages
  • Editor: Abdulmotaleb El Saddik


Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 February 2023
Online AM: 11 June 2022
Accepted: 23 May 2022
Revised: 11 May 2022
Received: 29 August 2021
Published in TOMM Volume 19, Issue 2



Author Tags

  1. Image matting
  2. alpha matte
  3. hierarchical
  4. progressive
  5. attention


  • Research-article
  • Refereed

Funding Sources

  • National Natural Science Foundation of China
  • Innovation Technology Funding of Dalian
  • Open Research Fund of Beijing Key Laboratory of Big Data Technology for Food Safety
  • National Key Research and Development Program of China


Cited By

  • (2024) ViTMatte. Information Fusion 103:C. DOI: 10.1016/j.inffus.2023.102091. Online publication date: 4-Mar-2024.
  • (2024) Multi-guided-based image matting via boundary detection. Computer Vision and Image Understanding 243:C. DOI: 10.1016/j.cviu.2024.103998. Online publication date: 1-Jun-2024.
  • (2024) Deep image matting with cross-layer contextual information propagation. Neural Computing and Applications 36:12 (6809-6825). DOI: 10.1007/s00521-024-09431-5. Online publication date: 20-Feb-2024.
  • (2023) Cascading Blend Network for Image Inpainting. ACM Transactions on Multimedia Computing, Communications, and Applications 20:1 (1-21). DOI: 10.1145/3608952. Online publication date: 25-Aug-2023.
  • (2023) Dual-context aggregation for universal image matting. Multimedia Tools and Applications 83:17 (53119-53137). DOI: 10.1007/s11042-023-17517-w. Online publication date: 15-Nov-2023.
