An Image Retrieval System for Video

Bolettieri, Paolo; Carrara, Fabio; Debole, Franca; Falchi, Fabrizio; Gennaro, Claudio; Vadicamo, Lucia; Vairo, Claudio

doi:10.1007/978-3-030-32047-8_29

Paolo Bolettieri¹²,
Fabio Carrara¹²,
Franca Debole¹²,
Fabrizio Falchi¹²,
Claudio Gennaro¹²,
Lucia Vadicamo¹² &
…
Claudio Vairo¹²

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 11807))

Included in the following conference series:

International Conference on Similarity Search and Applications

1155 Accesses
2 Citations

Abstract

Since the 1970’s the Content-Based Image Indexing and Retrieval (CBIR) has been an active area. Nowadays, the rapid increase of video data has paved the way to the advancement of the technologies in many different communities for the creation of Content-Based Video Indexing and Retrieval (CBVIR). However, greater attention needs to be devoted to the development of effective tools for video search and browse. In this paper, we present Visione, a system for large-scale video retrieval. The system integrates several content-based analysis and retrieval modules, including a keywords search, a spatial object-based search, and a visual similarity search. From the tests carried out by users when they needed to find as many correct examples as possible, the similarity search proved to be the most promising option. Our implementation is based on state-of-the-art deep learning approaches for content analysis and leverages highly efficient indexing techniques to ensure scalability. Specifically, we encode all the visual and textual descriptors extracted from the videos into (surrogate) textual representations that are then efficiently indexed and searched using an off-the-shelf text search engine using similarity functions.

Work partially supported by the AI4EU project (EC-H2020 - Contract n. 825619).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 59.99; Price excludes VAT (USA)

Softcover Book: USD 74.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

VISIONE at VBS2019

Deep Learning Based Semantic Video Indexing and Retrieval

Efficient text-based query based on multi-level and deep-semantic multimedia indexing and retrieval

Article 30 November 2023

Notes

References

Amato, G., et al.: VISIONE at VBS2019. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, W.-H., Vrochidis, S. (eds.) MMM 2019, Part II. LNCS, vol. 11296, pp. 591–596. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-05716-9_51
Chapter Google Scholar
Amato, G., Falchi, F., Gennaro, C., Rabitti, F.: Searching and annotating 100M images with YFCC100M-HNfc6 and MI-File. In: Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing, CBMI 2017, pp. 26:1–26:4. ACM (2017)
Google Scholar
Amato, G., Falchi, F., Gennaro, C., Vadicamo, L.: Deep permutations: deep convolutional neural networks and permutation-based indexing. In: Amsaleg, L., Houle, M.E., Schubert, E. (eds.) SISAP 2016. LNCS, vol. 9939, pp. 93–106. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46759-7_7
Chapter Google Scholar
Awad, G., Snoek, C.G.M., Smeaton, A.F., Quénot, G.: Trecvid semantic indexing of video: a 6-year retrospective. ITE Trans. Media Technol. Appl. 4(3), 187–208 (2016)
Article Google Scholar
Fellbaum, C., Miller, G.: WordNet: An Electronic Lexical Database. Language, Speech, and Communication. MIT Press, Cambridge (1998)
Book Google Scholar
Gennaro, C., Amato, G., Bolettieri, P., Savino, P.: An approach to content-based image retrieval based on the lucene search engine library. In: Lalmas, M., Jose, J., Rauber, A., Sebastiani, F., Frommholz, I. (eds.) ECDL 2010. LNCS, vol. 6273, pp. 55–66. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15464-5_8
Chapter Google Scholar
Gordo, A., Almazán, J., Revaud, J., Larlus, D.: End-to-end learning of deep visual representations for image retrieval. Int. J. Comput. Vis. 124(2), 237–254 (2017)
Article MathSciNet Google Scholar
Jia, Y., et al.: Caffe: convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093 (2014)
Jiang, Y.G., Wu, Z., Wang, J., Xue, X., Chang, S.F.: Exploiting feature and class relationships in video categorization with regularized deep neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 40(2), 352–364 (2018)
Article Google Scholar
Lokoč, J., Kovalčík, G., Souček, T.: Revisiting SIRET video retrieval tool. In: Schoeffmann, K., et al. (eds.) MMM 2018. LNCS, vol. 10705, pp. 419–424. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-73600-6_44
Chapter Google Scholar
Lokoč, J., Bailer, W., Schöffmann, K., Münzer, B., Awad, G.: On influential trends in interactive video retrieval: Video browser showdown 2015–2017. IEEE Trans. Multimedia 20(12), 3361–3376 (2018)
Article Google Scholar
Lokoč, J., et al.: Interactive search or sequential browsing? A detailed analysis of the video browser showdown 2018. ACM Trans. Multimedia Comput. Commun. Appl. 15(1), 29:1–29:18 (2019)
Article Google Scholar
Niraimathi, D.S.: Color based image segmentation using classification of k-NN with contour analysis method. Int. Res. J. Eng. Technol. 3, 1169–1177 (2016)
Google Scholar
Redmon, J., Farhadi, A.: Yolov3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
Rossetto, L., Schuldt, H., Awad, G., Butt, A.A.: V3C – a research video collection. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, W.-H., Vrochidis, S. (eds.) MMM 2019. LNCS, vol. 11295, pp. 349–360. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-05710-7_29
Chapter Google Scholar
Thomee, B., et al.: YFCC100M: the new data in multimedia research. Commun. ACM 59(2), 64–73 (2016)
Article Google Scholar
Tolias, G., Sicre, R., Jégou, H.: Particular object retrieval with integral max-pooling of CNN activations. arXiv preprint arXiv:1511.05879 (2015)
Truong, T.-D., et al.: Video search based on semantic extraction and locally regional object proposal. In: Schoeffmann, K., et al. (eds.) MMM 2018. LNCS, vol. 10705, pp. 451–456. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-73600-6_49
Chapter Google Scholar
Zhou, B., Lapedriza, A., Khosla, A., Oliva, A., Torralba, A.: Places: A 10 million image database for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 40(6), 1452–1464 (2017)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Institute of Information Science and Technologies, Italian National Research Council (CNR), Via G. Moruzzi 1, Pisa, Italy
Paolo Bolettieri, Fabio Carrara, Franca Debole, Fabrizio Falchi, Claudio Gennaro, Lucia Vadicamo & Claudio Vairo

Authors

Paolo Bolettieri
View author publications
You can also search for this author in PubMed Google Scholar
Fabio Carrara
View author publications
You can also search for this author in PubMed Google Scholar
Franca Debole
View author publications
You can also search for this author in PubMed Google Scholar
Fabrizio Falchi
View author publications
You can also search for this author in PubMed Google Scholar
Claudio Gennaro
View author publications
You can also search for this author in PubMed Google Scholar
Lucia Vadicamo
View author publications
You can also search for this author in PubMed Google Scholar
Claudio Vairo
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Franca Debole .

Editor information

Editors and Affiliations

ISTI-CNR, Pisa, Italy
Giuseppe Amato
ISTI-CNR, Pisa, Italy
Claudio Gennaro
New Jersey Institute of Technology, Newark, NJ, USA
Vincent Oria
University of Novi Sad, Novi Sad, Serbia
Miloš Radovanović

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Bolettieri, P. et al. (2019). An Image Retrieval System for Video. In: Amato, G., Gennaro, C., Oria, V., Radovanović , M. (eds) Similarity Search and Applications. SISAP 2019. Lecture Notes in Computer Science(), vol 11807. Springer, Cham. https://doi.org/10.1007/978-3-030-32047-8_29

Download citation

DOI: https://doi.org/10.1007/978-3-030-32047-8_29
Published: 23 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32046-1
Online ISBN: 978-3-030-32047-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

An Image Retrieval System for Video

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

VISIONE at VBS2019

Deep Learning Based Semantic Video Indexing and Retrieval

Efficient text-based query based on multi-level and deep-semantic multimedia indexing and retrieval

Notes

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

An Image Retrieval System for Video

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

VISIONE at VBS2019

Deep Learning Based Semantic Video Indexing and Retrieval

Efficient text-based query based on multi-level and deep-semantic multimedia indexing and retrieval

Notes

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation