DOI: 10.1145/3643489.3661130
Research article, open access

Voxento-Pro: An Advanced Voice Lifelog Retrieval Interaction for Multimodal Lifelogs

Published: 18 June 2024

Abstract

We present Voxento-Pro, an advanced version of the Voxento interactive voice-based lifelog retrieval system, developed to participate in the seventh ACM Lifelog Search Challenge, LSC'24, at ICMR'24 in Thailand. Voxento-Pro introduces a conversational query methodology that utilises OpenAI's Assistant API, and employs OpenAI's Whisper technology for state-of-the-art speech recognition and synthesis. This new version features a more natural interaction mechanism, which enhances the user's experience. In addition, the user interface (UI) was redesigned to include a new chat interface and other components. The backend retrieval API was rebuilt with a new technology to support fast and efficient API interactions. Processing of the lifelog data identified about 20% of the images as non-important and filled in 27% of the missing data using Geocoding APIs.

Published In

LSC '24: Proceedings of the 7th Annual ACM Workshop on the Lifelog Search Challenge
June 2024
128 pages
ISBN: 9798400705502
DOI: 10.1145/3643489
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. lifelog
  2. interactive retrieval
  3. voice interaction
  4. conversational search

Qualifiers

  • Research-article

Funding Sources

  • Insight Centre for Data Analytics
  • Vistamilk SFI Research Centre
  • Ministry of Education in Saudi Arabia

Conference

LSC '24