DOI: 10.1145/3595916.3626447
research-article
Open access

MontageNet: Annotated Dataset of Furniture Components in Real-World Images

Published: 01 January 2024

Abstract

Indoor scene understanding is a widely studied topic in machine learning. Furniture is the most common object category in indoor scenes, just as vehicles are in street scenes. Every object is a combination of functional components, and dismantling and reassembling those components is an important way for industrial manufacturing to improve efficiency and reduce costs. In the context of indoor scene understanding, we focus on building a dataset of real-world furniture images annotated with part labels. Building a large-scale furniture dataset is very challenging. First, existing datasets contain too few real images of furniture and consist mostly of 3D model images, yet the diversity of real-world furniture far exceeds that of 3D models, and real images help improve the computational speed of model training. Second, most published furniture data is poorly aligned with its annotations, and even fewer resources provide component segmentation labels on real furniture images. MontageNet provides a rich resource for research on part-level 3D shape analysis, semantic understanding, instance segmentation, and 3D reconstruction. Our accompanying empirical studies provide an in-depth analysis of the dataset's characteristics and evaluate several state-of-the-art methods on our benchmarks.

        Published In

        MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia
        December 2023
        745 pages
        ISBN:9798400702051
        DOI:10.1145/3595916
        This work is licensed under a Creative Commons Attribution International 4.0 License.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. 2D Real image
        2. Component annotation
        3. Dataset

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        MMAsia '23
        MMAsia '23: ACM Multimedia Asia
        December 6 - 8, 2023
        Tainan, Taiwan

        Acceptance Rates

        Overall Acceptance Rate 59 of 204 submissions, 29%
