DOI: 10.1145/3503161.3548044
research-article

Cross-Domain 3D Model Retrieval Based On Contrastive Learning And Label Propagation

Published: 10 October 2022 Publication History

Abstract

In this work, we tackle unsupervised image-based 3D model retrieval, where the goal is to retrieve the unlabeled 3D models that are most visually similar to a 2D query image. Because of the challenging modality gap between 2D images and 3D models, existing mainstream methods adopt domain-adversarial techniques to reduce the gap, but these cannot guarantee the category-level alignment that is important for retrieval performance. Recent methods address category-level alignment by aligning the class centers of 2D images and 3D models. However, two main issues remain: 1) the category-level alignment is too coarse, and 2) the category prediction for unlabeled 3D models is not accurate. To overcome the first problem, we use contrastive learning for fine-grained category-level alignment across domains, which pulls prototypes and samples with the same semantic information closer and pushes those with different semantic information apart. To provide reliable semantic predictions for contrastive learning and to address the second issue, we propose a consistent decision for the pseudo labels of 3D models based on both the trained image classifier and label propagation. Experiments are carried out on the MI3DOR and MI3DOR-2 datasets, and the results demonstrate the effectiveness of our proposed method.
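
The consistent decision for pseudo labels can be illustrated with a small sketch. The abstract only states that the decision uses both the trained image classifier and label propagation; the concrete rule below (keep a pseudo label only when the two predictions agree), the kNN cosine graph, the iterative propagation update, and all function names and parameters (`propagate_labels`, `consistent_pseudo_labels`, `k`, `alpha`, `iters`) are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def propagate_labels(feats_labeled, labels, feats_unlabeled, num_classes,
                     k=3, alpha=0.9, iters=20):
    """Iterative label propagation over a kNN cosine-similarity graph.

    Assumed setup: labeled image features seed the graph and labels
    spread to the unlabeled 3D model features.
    """
    feats = np.vstack([feats_labeled, feats_unlabeled])
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = feats @ feats.T
    np.fill_diagonal(sim, 0.0)  # no self-loops

    # Keep the k most similar neighbours per node, drop negative
    # similarities, then symmetrise the graph.
    n = sim.shape[0]
    idx = np.argsort(-sim, axis=1)[:, :k]
    rows = np.arange(n)[:, None]
    w = np.zeros_like(sim)
    w[rows, idx] = np.clip(sim[rows, idx], 0.0, None)
    w = np.maximum(w, w.T)

    # Symmetric normalisation: S = D^{-1/2} W D^{-1/2}.
    d = w.sum(axis=1)
    d[d == 0] = 1.0
    s = w / np.sqrt(np.outer(d, d))

    # One-hot seeds for labeled nodes, zeros for unlabeled nodes.
    y = np.zeros((n, num_classes))
    y[np.arange(len(labels)), labels] = 1.0

    f = y.copy()
    for _ in range(iters):
        f = alpha * (s @ f) + (1.0 - alpha) * y
    return f[len(labels):].argmax(axis=1)

def consistent_pseudo_labels(classifier_probs, propagated):
    """Keep a pseudo label only where both predictors agree; -1 otherwise."""
    cls_pred = classifier_probs.argmax(axis=1)
    return np.where(cls_pred == propagated, cls_pred, -1)
```

In this sketch, samples marked `-1` would simply be left out of the contrastive objective until both predictors agree on them; that handling is likewise an assumption.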

Supplementary Material

MP4 File (MM22-fp1232.mp4)
We propose a novel cross-domain 3D model retrieval method based on contrastive learning and label propagation to tackle unsupervised image-based 3D model retrieval. We perform fine-grained semantic alignment via category-level and sample-level contrastive learning. We also improve the prediction accuracy for unlabeled 3D models by using the consensus of the image classifier and label propagation. Experiments are carried out on two commonly used datasets, and the results demonstrate the effectiveness of our proposed method.
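
As a rough illustration of category-level and sample-level contrastive alignment, the sketch below uses a standard InfoNCE loss over class prototypes. The loss form, the temperature value, and the helper names (`class_prototypes`, `info_nce`) are assumptions chosen for illustration; the page does not specify the exact objective used in the paper.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def class_prototypes(feats, labels, num_classes):
    """Mean feature per class; assumes every class has at least one sample."""
    return np.stack([feats[labels == c].mean(axis=0) for c in range(num_classes)])

def info_nce(anchors, candidates, positive_idx, temperature=0.1):
    """InfoNCE loss: anchor i is pulled towards candidates[positive_idx[i]]
    and pushed away from every other candidate."""
    logits = l2_normalize(anchors) @ l2_normalize(candidates).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(anchors)), positive_idx].mean()
```

Under these assumptions, a category-level term would compare 3D model prototypes against image prototypes with `positive_idx = np.arange(num_classes)`, while a sample-level term would compare individual 3D model features against image prototypes with the pseudo labels as `positive_idx`.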

Cited By

  • (2025) Adaptive CLIP for open-domain 3D model retrieval. Information Processing & Management 62:2 (103989). DOI: 10.1016/j.ipm.2024.103989. Online publication date: Mar-2025
  • (2024) Cross-Modal Contrastive Learning with a Style-Mixed Bridge for Single Image 3D Shape Retrieval. ACM Transactions on Multimedia Computing, Communications, and Applications 20:12 (1-24). DOI: 10.1145/3689645. Online publication date: 30-Aug-2024
  • (2023) CoCa: A Connectivity-Aware Cascade Framework for Histology Gland Segmentation. Proceedings of the 31st ACM International Conference on Multimedia (1598-1606). DOI: 10.1145/3581783.3613779. Online publication date: 26-Oct-2023


    Published In

    MM '22: Proceedings of the 30th ACM International Conference on Multimedia
    October 2022
    7537 pages
    ISBN:9781450392037
    DOI:10.1145/3503161
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. 3D model retrieval
    2. contrastive learning
    3. domain adaptation

    Qualifiers

    • Research-article

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Article Metrics

    • Downloads (Last 12 months): 55
    • Downloads (Last 6 weeks): 6
    Reflects downloads up to 25 Jan 2025
