Tutorial
DOI: 10.1145/3078971.3079044

Video Indexing, Search, Detection, and Description with Focus on TRECVID

Published: 06 June 2017

Abstract

There has been tremendous growth in video data over the last decade. People are using mobile phones and tablets to record, share, and watch videos more than ever before, and video cameras are almost everywhere in public spaces (e.g., stores, streets, and public facilities). Efficient and effective retrieval methods are therefore critically needed in many applications. The goal of TRECVID is to encourage research in content-based video retrieval by providing large test collections, uniform scoring procedures, and a forum for organizations interested in comparing their results. In this tutorial, we present and discuss some of the most important and fundamental content-based video retrieval problems: recognizing predefined visual concepts; searching videos for complex ad-hoc user queries; searching a video dataset by image or video examples to retrieve specific objects, persons, or locations; detecting events; and, finally, bridging the gap between vision and language by examining how systems can automatically describe videos in natural language. Several teams that regularly participate in TRECVID will review the state of the art, current challenges, and future directions, along with pointers to useful resources. Each team will present one of the following tasks:
Semantic INdexing (SIN)
Zero-example (0Ex) Ad-hoc Video Search (AVS)
Instance Search (INS)
Multimedia Event Detection (MED)
Video to Text (VTT)
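The ranked-retrieval tasks above (for example SIN, AVS, and INS) are typically scored in TRECVID with variants of (mean) average precision. As a rough illustration of what such a scoring procedure computes, the sketch below implements plain non-interpolated average precision (AP) and its mean over queries (MAP) from a ranked list of shot IDs and a set of relevance judgments. It is a simplified sketch, not NIST's scoring tool: TRECVID often estimates AP from sampled judgments (inferred AP), which is not implemented here. The function names and the dict-based input format are assumptions made for illustration.

```python
from typing import Dict, Sequence, Set


def average_precision(ranked_ids: Sequence[str], relevant: Set[str]) -> float:
    """Non-interpolated AP of one ranked result list (hypothetical helper).

    ranked_ids: shot identifiers in system rank order, best first.
    relevant:   identifiers judged relevant for the query.
    """
    if not relevant:
        return 0.0
    seen: Set[str] = set()
    hits = 0
    precision_sum = 0.0
    for rank, shot_id in enumerate(ranked_ids, start=1):
        if shot_id in seen:
            continue  # ignore duplicate submissions of the same shot
        seen.add(shot_id)
        if shot_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant shot's rank
    # Dividing by all relevant shots means unretrieved relevant shots contribute 0.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, Sequence[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of per-query AP over every query that has judgments (hypothetical helper)."""
    scores = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    runs = {"q1": ["s1", "s2", "s3", "s4"], "q2": ["s5", "s9"]}
    qrels = {"q1": {"s1", "s3"}, "q2": {"s9"}}
    print(average_precision(runs["q1"], qrels["q1"]))  # (1/1 + 2/3) / 2
    print(mean_average_precision(runs, qrels))
```

For query q1, the relevant shots appear at ranks 1 and 3, so AP is (1/1 + 2/3) / 2 = 5/6. Averaging with q2 (AP = 1/2) gives a MAP of 2/3. A query with no submitted results still counts toward the mean with AP = 0, which matches the usual convention of averaging over all judged queries.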


    Published In

    ICMR '17: Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval
    June 2017
    524 pages
    ISBN:9781450347013
    DOI:10.1145/3078971
    • General Chairs: Bogdan Ionescu, Nicu Sebe
    • Program Chairs: Jiashi Feng, Martha Larson, Rainer Lienhart, Cees Snoek
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. instance search
    2. multimedia event detection
    3. semantic indexing
    4. trecvid
    5. video description
    6. video search

    Qualifiers

    • Tutorial

    Conference

    ICMR '17

    Acceptance Rates

    ICMR '17 paper acceptance rate: 33 of 95 submissions (35%)
    Overall acceptance rate: 254 of 830 submissions (31%)

    Article Metrics

    • Downloads (last 12 months): 4
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 18 Aug 2024

    Cited By

    • (2022) LVTIA. Information Processing and Management: an International Journal 59:2. DOI: 10.1016/j.ipm.2021.102802. Online publication date: 1-Mar-2022.
    • (2021) Dual Encoding for Video Retrieval by Text. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1-1. DOI: 10.1109/TPAMI.2021.3059295. Online publication date: 2021.
    • (2021) SEA: Sentence Encoder Assembly for Video Retrieval by Textual Queries. IEEE Transactions on Multimedia 23, 4351-4362. DOI: 10.1109/TMM.2020.3042067. Online publication date: 2021.
    • (2019) Deep Learning for Video Retrieval by Natural Language. Proceedings of the 1st International Workshop on Fairness, Accountability, and Transparency in MultiMedia, 2-3. DOI: 10.1145/3347447.3350565. Online publication date: 15-Oct-2019.
    • (2019) Analysis of Evolutionary Behavior in Self-Learning Media Search Engines. 2019 IEEE International Conference on Big Data (Big Data), 643-650. DOI: 10.1109/BigData47090.2019.9006191. Online publication date: Dec-2019.
    • (2019) Camera localization for a human-pose in 3D space using a single 2D human-pose image with landmarks. Multimedia Tools and Applications 78:3, 3587-3608. DOI: 10.1007/s11042-018-6789-4. Online publication date: 1-Feb-2019.
    • (2018) A deep learned method for video indexing and retrieval. Proceedings of the 26th Pacific Conference on Computer Graphics and Applications: Short Papers, 85-88. DOI: 10.2312/pg.20181287. Online publication date: 8-Oct-2018.
