DOI: 10.1145/3394171.3414031
research-article

ISIA Food-500: A Dataset for Large-Scale Food Recognition via Stacked Global-Local Attention Network

Published: 12 October 2020

Abstract

Food recognition has received increasing attention in the multimedia community because of its real-world applications, such as diet management and self-service restaurants. A large-scale ontology of food images is urgently needed, both for developing advanced large-scale food recognition algorithms and for providing a benchmark dataset for them. To encourage further progress in food recognition, we introduce ISIA Food-500, a dataset with 500 categories drawn from a Wikipedia list and 399,726 images. It is a more comprehensive food dataset that surpasses existing popular benchmark datasets in category coverage and data volume. Furthermore, we propose a stacked global-local attention network for food recognition, which consists of two sub-networks. One sub-network first uses hybrid spatial-channel attention to extract more discriminative features, and then aggregates these multi-scale discriminative features from multiple layers into a global-level representation (e.g., texture and shape information about food). The other generates attentional regions (e.g., ingredient-relevant regions) from different regions via cascaded spatial transformers, and further aggregates these multi-scale regional features from different layers into a local-level representation. These two types of features are finally fused into a comprehensive representation for food recognition. Extensive experiments on ISIA Food-500 and two other popular benchmark datasets demonstrate the effectiveness of the proposed method, which can therefore be considered a strong baseline. The dataset, code and models can be found at http://123.57.42.89/FoodComputing-Dataset/ISIA-Food500.html.
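The two-branch idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the attention gates are parameter-free stand-ins for learned modules, a fixed crop replaces the learned (and cascaded) spatial transformers, and every function name and the toy feature maps are hypothetical. It only shows how per-layer global features and per-layer regional features might be pooled and concatenated into one representation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def global_avg_pool(fmap):
    """Average each channel (an H x W grid) down to a single number."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]

def hybrid_attention(fmap):
    """Reweight a C x H x W feature map with a channel gate and a spatial gate.

    Assumption: both gates are parameter-free here; a real network learns them.
    """
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    channel_gate = [sigmoid(m) for m in global_avg_pool(fmap)]
    spatial_gate = [[sigmoid(sum(fmap[k][i][j] for k in range(c)) / c)
                     for j in range(w)] for i in range(h)]
    return [[[fmap[k][i][j] * channel_gate[k] * spatial_gate[i][j]
              for j in range(w)] for i in range(h)] for k in range(c)]

def crop(fmap, top, left, size):
    """Fixed square crop, standing in for a learned spatial transformer."""
    return [[row[left:left + size] for row in ch[top:top + size]] for ch in fmap]

def global_branch(layers):
    """Attend to each layer, pool it, and concatenate across layers."""
    feats = []
    for fmap in layers:
        feats.extend(global_avg_pool(hybrid_attention(fmap)))
    return feats

def local_branch(layers, regions):
    """Pool one region per layer and concatenate across layers."""
    feats = []
    for fmap, (top, left, size) in zip(layers, regions):
        feats.extend(global_avg_pool(crop(fmap, top, left, size)))
    return feats

def fuse(layers, regions):
    """Concatenate global-level and local-level features."""
    return global_branch(layers) + local_branch(layers, regions)

# Toy feature maps from two hypothetical layers: 2 x 4 x 4 and 3 x 2 x 2.
shallow = [[[float(i + j + k) for j in range(4)] for i in range(4)] for k in range(2)]
deep = [[[float(i * j - k) for j in range(2)] for i in range(2)] for k in range(3)]

representation = fuse([shallow, deep], [(1, 1, 2), (0, 0, 1)])
print(len(representation))  # 5 global features + 5 local features = 10
```

Concatenation is used for the fusion step only because it is the simplest choice; the abstract does not say which fusion operator the paper uses.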

Supplementary Material

MP4 File (3394171.3414031.mp4)

      Published In

      MM '20: Proceedings of the 28th ACM International Conference on Multimedia
      October 2020
      4889 pages
      ISBN:9781450379885
      DOI:10.1145/3394171
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. benchmark
      2. deep learning
      3. food datasets
      4. food recognition

      Qualifiers

      • Research-article

      Funding Sources

      • Meituan-Dianping Group
      • National Natural Science Foundation

      Conference

      MM '20
      Acceptance Rates

      Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
