research-article

Lifelog Image Retrieval Based on Semantic Relevance Mapping

Published: 22 July 2021

Abstract

    Lifelog analytics is an emerging research area whose technologies draw on recent advances in machine learning, wearable computing, and data analytics. However, state-of-the-art technologies are still inadequate for distilling voluminous multimodal lifelog data into high-quality insights. In this article, we propose a novel semantic relevance mapping (SRM) method to tackle the problem of lifelog information access. We formulate lifelog image retrieval as a series of mapping processes in which a semantic gap separates basic semantic attributes from high-level query topics. The SRM serves both as a formalism for constructing a trainable model that bridges the semantic gap and as an algorithm for implementing the training process on real-world lifelog data. Based on the SRM, we propose a computational framework of lifelog analytics to support various applications of lifelog information access, such as image retrieval, summarization, and insight visualization. Systematic evaluations on three challenging benchmarking tasks show the effectiveness of our method.
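
    The abstract does not give the SRM formulation itself, so the following is only a minimal illustrative sketch of the general idea of a trainable mapping from basic attribute scores to the relevance of a high-level query topic. It swaps in plain logistic regression as a stand-in and is not the authors' model. The attribute names, the toy scores, and the functions relevance, train, and rank are all hypothetical.

```python
import math

def relevance(weights, bias, attrs):
    """Map an attribute-score vector to a topic relevance in (0, 1)."""
    z = bias + sum(w * a for w, a in zip(weights, attrs))
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, n_attrs, lr=0.5, epochs=500):
    """Fit the mapping by stochastic gradient descent on the logistic loss.

    examples: list of (attrs, label) pairs, label 1 = relevant to the topic.
    """
    weights, bias = [0.0] * n_attrs, 0.0
    for _ in range(epochs):
        for attrs, label in examples:
            err = relevance(weights, bias, attrs) - label
            weights = [w - lr * err * a for w, a in zip(weights, attrs)]
            bias -= lr * err
    return weights, bias

def rank(images, weights, bias):
    """Return image ids sorted by decreasing relevance to the topic."""
    return sorted(images,
                  key=lambda i: relevance(weights, bias, images[i]),
                  reverse=True)

# Hypothetical attribute order: [cup, laptop, tree].
# Toy labels for an assumed topic such as "drinking coffee".
train_set = [
    ([0.9, 0.1, 0.0], 1),
    ([0.8, 0.7, 0.1], 1),
    ([0.1, 0.9, 0.0], 0),
    ([0.0, 0.1, 0.9], 0),
]
weights, bias = train(train_set, n_attrs=3)

# Unseen images described only by their (made-up) attribute scores.
images = {
    "img_a": [0.85, 0.2, 0.0],
    "img_b": [0.05, 0.3, 0.8],
    "img_c": [0.1, 0.95, 0.05],
}
ranking = rank(images, weights, bias)
print(ranking)
```

    In this toy setting the learned weights play the role of the bridge across the semantic gap: the topic is never detected directly, only inferred from lower-level attribute scores.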


    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 3
    August 2021
    443 pages
    ISSN: 1551-6857
    EISSN: 1551-6865
    DOI: 10.1145/3476118

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 July 2021
    Accepted: 01 December 2020
    Revised: 01 November 2020
    Received: 01 June 2020
    Published in TOMM Volume 17, Issue 3

    Author Tags

    1. Lifelog
    2. image retrieval
    3. summarization
    4. semantic mapping

    Qualifiers

    • Research-article
    • Refereed
