Egocentric Image Captioning for Privacy-Preserved Passive Dietary Intake Monitoring

Qiu, Jianing; Lo, Frank P. -W.; Gu, Xiao; Jobarteh, Modou L.; Jia, Wenyan; Baranowski, Tom; Steiner-Asiedu, Matilda; Anderson, Alex K.; McCrory, Megan A; Sazonov, Edward; Sun, Mingui; Frost, Gary; Lo, Benny

doi:10.1109/TCYB.2023.3243999

Computer Science > Computer Vision and Pattern Recognition

arXiv:2107.00372 (cs)

[Submitted on 1 Jul 2021 (v1), last revised 1 Mar 2023 (this version, v2)]

Abstract: Camera-based passive dietary intake monitoring can continuously capture a subject's eating episodes, recording rich visual information such as the type and volume of food being consumed, as well as the subject's eating behaviours. However, no existing method incorporates these visual clues to provide a comprehensive context of dietary intake from passive recording (e.g., whether the subject is sharing food with others, what food the subject is eating, and how much food is left in the bowl). Moreover, privacy is a major concern when egocentric wearable cameras are used for capturing. In this paper, we propose a privacy-preserved secure solution (i.e., egocentric image captioning) for dietary assessment with passive monitoring, which unifies food recognition, volume estimation, and scene understanding. By converting images into rich text descriptions, nutritionists can assess individual dietary intake based on the captions instead of the original images, reducing the risk of privacy leakage from images. To this end, an egocentric dietary image captioning dataset has been built, consisting of in-the-wild images captured by head-worn and chest-worn cameras in field studies in Ghana. A novel transformer-based architecture is designed to caption egocentric dietary images. Comprehensive experiments have been conducted to evaluate the effectiveness of the proposed architecture and to justify its design for egocentric dietary image captioning. To the best of our knowledge, this is the first work that applies image captioning to dietary intake assessment in real-life settings.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2107.00372 [cs.CV]
	(or arXiv:2107.00372v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2107.00372
Journal reference:	IEEE Transactions on Cybernetics, 2023
Related DOI:	https://doi.org/10.1109/TCYB.2023.3243999

Submission history

From: Jianing Qiu
[v1] Thu, 1 Jul 2021 11:16:44 UTC (11,406 KB)
[v2] Wed, 1 Mar 2023 08:20:17 UTC (26,088 KB)