Learning Multi-Subset of Classes for Fine-Grained Food Recognition

Published: 24 October 2022

Abstract

Food image recognition is a complex computer vision task because of the large number of fine-grained food classes. Fine-grained recognition tasks focus on learning subtle discriminative details to distinguish similar classes. In this paper, we introduce a new method, based on Multi-Subsets learning, to improve the classification of classes that are difficult to discriminate. Using a pre-trained network, we organize the classes into multiple subsets with a clustering technique. We then embed these subsets in a multi-head model structure, which has three distinguishable parts. First, several shared blocks learn a generalized representation of the data. Second, multiple specialized blocks each focus on a specific subset of classes that are difficult to distinguish. Lastly, a fully connected layer weights the different subsets in an end-to-end manner by combining the neuron outputs. We validated the proposed method using two recent state-of-the-art vision transformers on three public food recognition datasets. Our method learned the confused classes better and outperformed the state of the art on all three datasets.
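
The three-part structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify layer types or sizes, so plain dense layers with random, untrained weights stand in for the transformer blocks, and every name here (MultiSubsetModel, shared, heads, fusion) and the example subsets are hypothetical. It only shows how the outputs of per-subset heads could be concatenated and combined by one fully connected layer.

```python
import random


def init_linear(n_in, n_out, rng):
    """Return (weights, bias) for a dense layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, weights, bias):
    """Dense layer y = W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


class MultiSubsetModel:
    """Hypothetical sketch: shared block -> per-subset heads -> fusion layer."""

    def __init__(self, n_features, n_hidden, subsets, n_classes, seed=0):
        rng = random.Random(seed)
        self.subsets = subsets
        # Part 1: a shared block that learns a general representation
        # (assumption: a single dense layer stands in for several blocks).
        self.shared = init_linear(n_features, n_hidden, rng)
        # Part 2: one specialized head per subset of hard-to-distinguish
        # classes, with one output neuron per class in that subset.
        self.heads = [init_linear(n_hidden, len(s), rng) for s in subsets]
        # Part 3: a fully connected layer that weights the concatenated
        # head outputs and maps them to scores over all classes.
        n_concat = sum(len(s) for s in subsets)
        self.fusion = init_linear(n_concat, n_classes, rng)

    def forward(self, x):
        hidden = relu(linear(x, *self.shared))
        head_outputs = []
        for head in self.heads:
            head_outputs.extend(linear(hidden, *head))
        return linear(head_outputs, *self.fusion)


# Example: 6 classes grouped into 3 overlapping subsets (made-up grouping).
model = MultiSubsetModel(
    n_features=8, n_hidden=4, subsets=[[0, 1, 2], [2, 3], [4, 5, 0]], n_classes=6
)
scores = model.forward([0.5] * 8)
print(len(scores))  # one score per class
```

Because the fusion layer sees every head's outputs at once, a training procedure could learn how much weight each subset deserves for each class, which is the role the abstract assigns to the final fully connected layer.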

    Published In

    MADiMa '22: Proceedings of the 7th International Workshop on Multimedia Assisted Dietary Management
    October 2022
    97 pages
    ISBN:9781450395021
    DOI:10.1145/3552484

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. fine-grained food recognition
    2. hierarchical learning
    3. learning subsets of classes

    Qualifiers

    • Research-article

    Conference

    MM '22
    Acceptance Rates

    MADiMa '22 Paper Acceptance Rate 9 of 10 submissions, 90%;
    Overall Acceptance Rate 16 of 24 submissions, 67%

    Article Metrics

    • Downloads (Last 12 months): 156
    • Downloads (Last 6 weeks): 25
    Reflects downloads up to 25 Dec 2024

    Cited By

    • (2024) A Lightweight Hybrid Model with Location-Preserving ViT for Efficient Food Recognition. Nutrients 16:2 (200). DOI: 10.3390/nu16020200. Online publication date: 8-Jan-2024
    • (2024) Nutritional composition analysis in food images: an innovative Swin Transformer approach. Frontiers in Nutrition 11. DOI: 10.3389/fnut.2024.1454466. Online publication date: 14-Oct-2024
    • (2024) Lightweight Food Recognition via Aggregation Block and Feature Encoding. ACM Transactions on Multimedia Computing, Communications, and Applications 20:10 (1-25). DOI: 10.1145/3680285. Online publication date: 22-Jul-2024
    • (2024) Lightweight Food Image Recognition With Global Shuffle Convolution. IEEE Transactions on AgriFood Electronics 2:2 (392-402). DOI: 10.1109/TAFE.2024.3386713. Online publication date: Sep-2024
    • (2024) LOFI: LOng-tailed FIne-Grained Network for Food Recognition. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (3750-3760). DOI: 10.1109/CVPRW63382.2024.00379. Online publication date: 17-Jun-2024
    • (2024) Fine grained food image recognition based on swin transformer. Journal of Food Engineering 380 (112134). DOI: 10.1016/j.jfoodeng.2024.112134. Online publication date: Nov-2024
    • (2023) Deep ensemble-based hard sample mining for food recognition. Journal of Visual Communication and Image Representation 95:C. DOI: 10.1016/j.jvcir.2023.103905. Online publication date: 1-Sep-2023
