Just Accepted

Lightweight Food Recognition via Aggregation Block and Feature Encoding

Authors:

Shuqiang Jiang

ACM Transactions on Multimedia Computing, Communications and Applications

Accepted on 13 July 2024

https://doi.org/10.1145/3680285

Online AM: 22 July 2024

Abstract

Food image recognition has recently received considerable attention in the multimedia field owing to its potential implications for health. Because ingredients are dispersed across food images, recognition places higher demands on a neural network's ability to extract long-range information, which has led to deeper and more complex models. Nevertheless, lightweight food image recognition is essential for deployment on end devices and for sustainable server-side scaling. To address this issue, we present Aggregation Feature Net (AFNet), a lightweight network that effectively captures both global and local features from food images. In AFNet, we develop a novel convolution based on a residual model that encodes global features through row-wise and column-wise information integration. Merging this aggregation block with classic local convolution yields a framework that serves as the backbone of the network. Because the aggregation block uses parameters efficiently, we construct a lightweight food image recognition network with fewer layers and a smaller scale, assisted by a new type of activation function. Experimental results on four popular food recognition datasets demonstrate that our approach achieves state-of-the-art performance, with higher accuracy and fewer FLOPs and parameters. For example, compared with the current state-of-the-art model MobileViTv2, AFNet achieves 88.4% top-1 accuracy on the ETHZ Food-101 dataset, which is 1.4% higher at a similar number of parameters and FLOPs. The source code will be provided in supplementary materials.
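
As a rough illustration of what row-wise and column-wise information integration with a residual connection could look like, the sketch below averages each row and each column of a single-channel feature map and adds both summaries back to every position. This is not the AFNet aggregation block, whose exact design is not given in the abstract. The function name `row_col_aggregate`, the use of plain averaging, and the toy 3x4 input are all assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def row_col_aggregate(x: Matrix) -> Matrix:
    """Hypothetical row/column aggregation with a residual connection.

    Illustration only: each output value is the input value plus the mean
    of its row and the mean of its column, so every position receives
    context from the full width and height of the map.
    """
    height, width = len(x), len(x[0])
    # One summary value per row (integrates information along the width).
    row_ctx = [sum(row) / width for row in x]
    # One summary value per column (integrates information along the height).
    col_ctx = [sum(x[i][j] for i in range(height)) / height for j in range(width)]
    # Residual form: keep the local value and add the two global summaries.
    return [
        [x[i][j] + row_ctx[i] + col_ctx[j] for j in range(width)]
        for i in range(height)
    ]


# Toy single-channel feature map (assumed values, not data from the paper).
feature_map = [
    [0.0, 1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0, 7.0],
    [8.0, 9.0, 10.0, 11.0],
]
aggregated = row_col_aggregate(feature_map)
```

In this sketch a position is influenced by distant positions in the same row or column after a single step, which is the kind of long-range mixing the abstract attributes to the aggregation block. A real block would presumably use learned weights rather than fixed averages.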

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Visual content-based indexing and retrieval

Information & Contributors

Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Just Accepted

ISSN:1551-6857

EISSN:1551-6865

Copyright © 2024 Copyright held by the owner/author(s).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Online AM: 22 July 2024

Accepted: 13 July 2024

Revised: 01 July 2024

Received: 30 July 2023

Qualifiers

Research-article
