tutorial

Video Indexing, Search, Detection, and Description with Focus on TRECVID

Authors:

Vinh-Tiep Nguyen,

Georges Quénot,

Shin'ichi Satoh

ICMR '17: Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval

Pages 3 - 4

https://doi.org/10.1145/3078971.3079044

Published: 06 June 2017

Abstract

There has been tremendous growth in video data over the last decade. People are using mobile phones and tablets to take, share, or watch videos more than ever before. Video cameras are almost everywhere around us in the public domain (e.g., stores, streets, and public facilities). Efficient and effective retrieval methods are critically needed in many applications. The goal of TRECVID is to encourage research in content-based video retrieval by providing large test collections, uniform scoring procedures, and a forum for organizations interested in comparing their results. In this tutorial, we present and discuss some of the most important and fundamental content-based video retrieval problems: recognizing predefined visual concepts; searching videos for complex ad-hoc user queries; searching a video dataset by image or video examples to retrieve specific objects, persons, or locations; detecting events; and, finally, bridging the gap between vision and language by looking into how systems can automatically describe videos in natural language. A review of the state of the art, current challenges, and future directions, along with pointers to useful resources, will be presented by teams that regularly participate in TRECVID. Each team will present one of the following tasks:

Semantic INdexing (SIN)

Zero-example (0Ex) Video Search (AVS)

Instance Search (INS)

Multimedia Event Detection (MED)

Video to Text (VTT)

Index Terms

Video Indexing, Search, Detection, and Description with Focus on TRECVID
1. Information systems
  1. Information retrieval

Published In

ICMR '17: Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval

June 2017

524 pages

ISBN:9781450347013

DOI:10.1145/3078971

General Chairs:
Bogdan Ionescu, University Politehnica of Bucharest, Romania
Nicu Sebe, University of Trento, Italy

Program Chairs:
Jiashi Feng, National University of Singapore, Singapore
Martha Larson, Radboud University & Delft University of Technology, The Netherlands
Rainer Lienhart, University of Augsburg, Germany
Cees Snoek, University of Amsterdam & Qualcomm Research Netherlands, The Netherlands

Copyright © 2017 Owner/Author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Tutorial

Conference

ICMR '17

ICMR '17: International Conference on Multimedia Retrieval

June 6 - 9, 2017

Bucharest, Romania

Acceptance Rates

ICMR '17 Paper Acceptance Rate 33 of 95 submissions, 35%;

Overall Acceptance Rate 254 of 830 submissions, 31%
