Will VISIONE Remain Competitive in Lifelog Image Search?

Authors: Giuseppe Amato, Paolo Bolettieri, Fabrizio Falchi, Claudio Gennaro, Nicola Messina, Lucia Vadicamo, Claudio Vairo

LSC '24: Proceedings of the 7th Annual ACM Workshop on the Lifelog Search Challenge

Pages 58 - 63

https://doi.org/10.1145/3643489.3661122

Published: 18 June 2024

Abstract

VISIONE is a versatile video retrieval system supporting diverse search functionalities, including free-text, similarity, and temporal searches. Its first-place finish at the 2024 Video Browser Showdown (VBS) highlights its effectiveness. Originally designed for analyzing, indexing, and searching diverse video content, VISIONE can also be adapted to images from lifelog cameras because it relies on frame-based representations and retrieval mechanisms.

In this paper, we present an overview of VISIONE's core characteristics and the adjustments made to accommodate lifelog images. These adjustments primarily concern result visualization within the GUI, such as grouping images by date or hour to suit lifelog imagery. While the GUI has been updated, the core search engine and visual content analysis components remain unchanged from the version presented at VBS 2024. Specifically, metadata such as local time, GPS coordinates, and concepts associated with images are neither indexed nor used by the system. Instead, the system relies solely on the visual content of the images; date and time information is extracted from their filenames and used exclusively within the GUI for visualization purposes.
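To make the filename-based grouping concrete, the following is a minimal Python sketch of how date and hour could be recovered from image filenames and used to group results for display. It is not VISIONE's implementation. The filename layout (YYYYMMDD_HHMMSS followed by an arbitrary suffix) and the names FILENAME_PATTERN, parse_timestamp, and group_by_date_and_hour are assumptions introduced only for illustration; the abstract does not specify the actual format.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed (hypothetical) filename layout, not specified in the paper:
# YYYYMMDD_HHMMSS followed by any suffix, e.g. "20190103_081530_000.jpg".
FILENAME_PATTERN = re.compile(r"^(\d{8})_(\d{6})")


def parse_timestamp(filename):
    """Return the datetime encoded in a filename, or None if it cannot be parsed."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H%M%S")
    except ValueError:
        # Digits were present but do not form a valid date/time.
        return None


def group_by_date_and_hour(filenames):
    """Group filenames as {ISO date: {hour: [filenames]}} for display purposes."""
    groups = defaultdict(lambda: defaultdict(list))
    # Lexicographic order equals chronological order for the assumed layout.
    for name in sorted(filenames):
        timestamp = parse_timestamp(name)
        if timestamp is None:
            continue  # Skip files that do not follow the assumed layout.
        groups[timestamp.date().isoformat()][timestamp.hour].append(name)
    return {day: dict(hours) for day, hours in groups.items()}
```

In this sketch the timestamps only decide how results are bucketed for visualization; they play no role in ranking, which mirrors the separation between the GUI and the search engine described above.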

Our objective is to evaluate the system's performance within the Lifelog Search Challenge, emphasizing reliance on visual content analysis without additional metadata.

Index Terms

Information systems → Information retrieval

Published In

LSC '24: Proceedings of the 7th Annual ACM Workshop on the Lifelog Search Challenge

June 2024

128 pages

ISBN:9798400705502

DOI:10.1145/3643489

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

H2020 LEIT Information and Communication Technologies
NextGenerationEU PNRR

Conference

LSC '24: 7th Annual ACM Workshop on the Lifelog Search Challenge

June 10, 2024

Phuket, Thailand
