Lifelog Image Retrieval Based on Semantic Relevance Mapping

Authors:

Ana Garcia Del Molino,

Vigneshwaran Subbaraju,

Joo-Hwee Lim

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 17, Issue 3

Article No.: 92, Pages 1 - 18

https://doi.org/10.1145/3446209

Published: 22 July 2021

Abstract

Lifelog analytics is an emerging research area with technologies embracing the latest advances in machine learning, wearable computing, and data analytics. However, state-of-the-art technologies are still inadequate to distill voluminous multimodal lifelog data into high-quality insights. In this article, we propose a novel semantic relevance mapping (SRM) method to tackle the problem of lifelog information access. We formulate lifelog image retrieval as a series of mapping processes where a semantic gap exists for relating basic semantic attributes with high-level query topics. The SRM serves both as a formalism to construct a trainable model to bridge the semantic gap and an algorithm to implement the training process on real-world lifelog data. Based on the SRM, we propose a computational framework of lifelog analytics to support various applications of lifelog information access, such as image retrieval, summarization, and insight visualization. Systematic evaluations are performed on three challenging benchmarking tasks to show the effectiveness of our method.
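
One way to picture the mapping described in the abstract is a trainable function from basic attribute scores to a relevance score for a query topic. The sketch below is illustrative only and is not the authors' SRM: it assumes each image is described by a hypothetical vector of concept-detector scores (`cup`, `laptop`, `tree`), swaps in plain logistic regression as the trainable mapping, and uses made-up toy data. All names (`train_mapping`, `rank_images`, the image file names) are assumptions introduced for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relevance(weights: Sequence[float], bias: float, attrs: Sequence[float]) -> float:
    """Map an attribute-score vector to a topic relevance in (0, 1)."""
    return sigmoid(sum(w * a for w, a in zip(weights, attrs)) + bias)


def train_mapping(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Fit the mapping with batch gradient descent on the logistic loss."""
    n = len(samples)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for attrs, y in zip(samples, labels):
            err = relevance(weights, bias, attrs) - y
            for j, a in enumerate(attrs):
                grad_w[j] += err * a
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def rank_images(
    weights: Sequence[float], bias: float, images: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (image, relevance) pairs sorted from most to least relevant."""
    scored = [(name, relevance(weights, bias, attrs)) for name, attrs in images.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data (assumed): attribute order is [cup, laptop, tree];
# label 1 means relevant to a hypothetical topic "working at a desk with coffee".
TRAIN_ATTRS = [[0.9, 0.8, 0.1], [0.8, 0.9, 0.0], [0.1, 0.2, 0.9], [0.0, 0.1, 0.8]]
TRAIN_LABELS = [1, 1, 0, 0]

if __name__ == "__main__":
    w, b = train_mapping(TRAIN_ATTRS, TRAIN_LABELS)
    archive = {"desk.jpg": [0.85, 0.9, 0.05], "park.jpg": [0.05, 0.1, 0.95]}
    for name, score in rank_images(w, b, archive):
        print(f"{name}: {score:.3f}")
```

In this toy setting, retrieval reduces to scoring every image in the archive with the learned mapping and sorting by relevance. A real system would need far richer attributes and training data than this sketch assumes.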

Index Terms

Lifelog Image Retrieval Based on Semantic Relevance Mapping
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 17, Issue 3

August 2021

443 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3476118

Editor:
Alberto Del Bimbo
University of Firenze, Italy

Copyright © 2021 Association for Computing Machinery.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 July 2021

Accepted: 01 December 2020

Revised: 01 November 2020

Received: 01 June 2020

Published in TOMM Volume 17, Issue 3

Qualifiers

Research-article
Refereed
