Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

VISIONE at VBS2019

  • Conference paper
  • First Online:
MultiMedia Modeling (MMM 2019)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 11296))

Included in the following conference series:

Abstract

This paper presents VISIONE, a tool for large–scale video search. The tool can be used for both known-item and ad-hoc video search tasks since it integrates several content-based analysis and retrieval modules, including a keyword search, a spatial object-based search, and a visual similarity search. Our implementation is based on state-of-the-art deep learning approaches for the content analysis and leverages highly efficient indexing techniques to ensure scalability. Specifically, we encode all the visual and textual descriptors extracted from the videos into (surrogate) textual representations that are then efficiently indexed and searched using an off-the-shelf text search engine.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    https://github.com/elastic/elasticsearch.

  2. 2.

    https://pjreddie.com/darknet/yolo/.

  3. 3.

    http://github.com/BVLC/caffe/wiki/Model-Zoo.

References

  1. Amato, G., Falchi, F., Gennaro, C., Rabitti, F.: Searching and annotating 100M images with YFCC100M-HNfc6 and MI-file. In: Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing, CBMI 2017, Florence, Italy, 19–21 June 2017, pp. 26:1–26:4 (2017). https://doi.org/10.1145/3095713.3095740

  2. Amato, G., Falchi, F., Gennaro, C., Vadicamo, L.: Deep permutations: deep convolutional neural networks and permutation-based indexing. In: Amsaleg, L., Houle, M.E., Schubert, E. (eds.) SISAP 2016. LNCS, vol. 9939, pp. 93–106. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46759-7_7

    Chapter  Google Scholar 

  3. Awad, G., Snoek, C.G.M., Smeaton, A.F., Quénot, G.: TRECVid semantic indexing of video: a 6-year retrospective (2016)

    Google Scholar 

  4. Cobârzan, C., et al.: Interactive video search tools: a detailed analysis of the video browser showdown 2015. Multimedia Tools Appl. 76(4), 5539–5571 (2017). https://doi.org/10.1007/s11042-016-3661-2

    Article  Google Scholar 

  5. Fellbaum, C. (ed.): WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)

    MATH  Google Scholar 

  6. Gennaro, C., Amato, G., Bolettieri, P., Savino, P.: An approach to content-based image retrieval based on the lucene search engine library. In: Lalmas, M., Jose, J., Rauber, A., Sebastiani, F., Frommholz, I. (eds.) ECDL 2010. LNCS, vol. 6273, pp. 55–66. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15464-5_8

    Chapter  Google Scholar 

  7. Gordo, A., Almazán, J., Revaud, J., Larlus, D.: End-to-end learning of deep visual representations for image retrieval. Int. J. Comput. Vis. 124(2), 237–254 (2017)

    Article  MathSciNet  Google Scholar 

  8. Jia, Y., et al.: Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093 (2014)

  9. Jiang, Y.G., Wu, Z., Wang, J., Xue, X., Chang, S.F.: Exploiting feature and class relationships in video categorization with regularized deep neural networks. IEEE Trans. Patt. Anal. Mach. Intell. 40(2), 352–364 (2018). https://doi.org/10.1109/TPAMI.2017.2670560

    Article  Google Scholar 

  10. Lokoc, J., Bailer, W., Schoeffmann, K., Muenzer, B., Awad, G.: On influential trends in interactive video retrieval: video browser showdown 2015–2017. IEEE Trans. Multimedia 20(12), 3361–3376 (2018). https://doi.org/10.1109/TMM.2018.2830110

    Article  Google Scholar 

  11. Lokoč, J., Kovalčík, G., Souček, T.: Revisiting SIRET video retrieval tool. In: Schoeffmann, K., et al. (eds.) MMM 2018. LNCS, vol. 10705, pp. 419–424. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-73600-6_44

    Chapter  Google Scholar 

  12. Lokoč, J., Souček, T., Kovalčik, G.: Using an interactive video retrieval tool for lifelog data. In: Proceedings of the 2018 ACM Workshop on the Lifelog Search Challenge, LSC 2018, pp. 15–19. ACM, New York (2018). https://doi.org/10.1145/3210539.3210543

  13. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv (2018)

    Google Scholar 

  14. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. (IJCV) 115(3), 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y

    Article  MathSciNet  Google Scholar 

  15. Thomee, B., et al.: YFCC100M: the new data in multimedia research. Commun. ACM 59(2), 64–73 (2016). https://doi.org/10.1145/2812802

    Article  Google Scholar 

  16. Tolias, G., Sicre, R., JĂ©gou, H.: Particular object retrieval with integral max-pooling of CNN activations. arXiv preprint arXiv:1511.05879 (2015)

  17. Truong, T.D., et al.: Video search based on semantic extraction and locally regional object proposal. In: Schoeffmann, K., et al. (eds.) MMM 2018. LNCS, vol. 10705, pp. 451–456. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-73600-6_49

    Chapter  Google Scholar 

  18. Zhou, B., Lapedriza, A., Khosla, A., Oliva, A., Torralba, A.: Places: a 10 million image database for scene recognition. IEEE Trans. Patt. Anal. Mach. Intell. 40, 1452–1464 (2017)

    Article  Google Scholar 

Download references

Acknowledgements

This work was partially funded by “Smart News: Social sensing for breaking news”, CUP CIPE D58C15000270008, by VISECH ARCO-CNR, CUP B56J17001330004, and by “Automatic Data and documents Analysis to enhance human-based processes” (ADA), CUP CIPE D55F17000290009. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU used for this research.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Lucia Vadicamo .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Amato, G. et al. (2019). VISIONE at VBS2019. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, WH., Vrochidis, S. (eds) MultiMedia Modeling. MMM 2019. Lecture Notes in Computer Science(), vol 11296. Springer, Cham. https://doi.org/10.1007/978-3-030-05716-9_51

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-05716-9_51

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05715-2

  • Online ISBN: 978-3-030-05716-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics