Learning query-class dependent weights in automatic video retrieval

Authors:

Alexander G. Hauptmann

MULTIMEDIA '04: Proceedings of the 12th annual ACM international conference on Multimedia

Pages 548 - 555

https://doi.org/10.1145/1027527.1027661

Published: 10 October 2004

Abstract

Combining retrieval results from multiple modalities plays a crucial role in video retrieval systems, especially in automatic video retrieval systems that operate without user feedback or query expansion. However, most current systems either use query-independent combination or rely on explicit user weighting. In this work, we propose using query-class dependent weights within a hierarchical mixture-of-experts framework to combine multiple retrieval results. We first classify each user query into one of four predefined categories and then aggregate the retrieval results with query-class associated weights, which can be learned efficiently from the development data and generalize easily to unseen queries. Our experimental results demonstrate that performance with query-class dependent weights considerably surpasses that with query-independent weights.
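The two-step idea in the abstract (classify the query, then fuse modality scores with class-specific weights) can be illustrated with a minimal sketch. It is not the paper's implementation: the class names (`class_a` to `class_d`), the modality names (`text`, `image`), the weight values, and the functions `fuse` and `rank` are all hypothetical, and the hierarchical mixture-of-experts framework is reduced to a plain weighted sum with fixed weights rather than learned ones.

```python
from typing import Dict, List

# Hypothetical per-class modality weights. In the paper these are learned
# from development data; the values here are made up for illustration.
CLASS_WEIGHTS: Dict[str, Dict[str, float]] = {
    "class_a": {"text": 0.8, "image": 0.2},
    "class_b": {"text": 0.2, "image": 0.8},
    "class_c": {"text": 0.5, "image": 0.5},
    "class_d": {"text": 0.6, "image": 0.4},
}


def fuse(query_class: str,
         modality_scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Combine per-modality shot scores using the weights of one query class."""
    weights = CLASS_WEIGHTS[query_class]
    fused: Dict[str, float] = {}
    for modality, scores in modality_scores.items():
        # A modality with no weight for this class contributes nothing.
        w = weights.get(modality, 0.0)
        for shot_id, score in scores.items():
            fused[shot_id] = fused.get(shot_id, 0.0) + w * score
    return fused


def rank(fused: Dict[str, float]) -> List[str]:
    """Return shot ids by descending fused score, ties broken by id."""
    return sorted(fused, key=lambda shot_id: (-fused[shot_id], shot_id))


# Toy retrieval scores for two shots from two modalities.
scores = {
    "text": {"s1": 0.9, "s2": 0.2},
    "image": {"s1": 0.1, "s2": 0.8},
}
print(rank(fuse("class_a", scores)))  # text-heavy class
print(rank(fuse("class_b", scores)))  # image-heavy class
```

The same retrieval scores produce different rankings depending on the query class, which is the effect a single query-independent weight vector cannot reproduce.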

Index Terms

Learning query-class dependent weights in automatic video retrieval
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published In

MULTIMEDIA '04: Proceedings of the 12th annual ACM international conference on Multimedia

October 2004

1028 pages

ISBN:1581138938

DOI:10.1145/1027527

General Chairs: Henning Schulzrinne (Columbia University), Nevenka Dimitrova (Philips Research)

Program Chairs: Angela Sasse (UCL), Sue Moon (KAIST), Rainer Lienhart (U Augsburg)

Copyright © 2004 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

MM04

MM04: 2004 12th Annual ACM International Conference on Multimedia

October 10 - 16, 2004

NY, New York, USA

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics

Total Citations: 95
Total Downloads: 675
Downloads (Last 12 months): 9
Downloads (Last 6 weeks): 0

Reflects downloads up to 25 Feb 2025

Citations

Cited By

Cao B, Xia Y, Ding Y, Zhang C, Hu Q, Salakhutdinov R, Kolter Z, Heller K, Weller A, Oliver N, Scarlett J, Berkenkamp F (2024) Predictive dynamic fusion. Proceedings of the 41st International Conference on Machine Learning. 10.5555/3692070.3692288 (5608-5628). Online publication date: 21-Jul-2024
https://dl.acm.org/doi/10.5555/3692070.3692288
Han Z, Zhang C, Fu H, Zhou J (2023) Trusted Multi-View Classification With Dynamic Evidential Fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45:2. 10.1109/TPAMI.2022.3171983 (2551-2566). Online publication date: 1-Feb-2023
https://doi.org/10.1109/TPAMI.2022.3171983
Hou Z, Ngo C, Chan W, Shen H, Zhuang Y, Smith J, Yang Y, Cesar P, Metze F, Prabhakaran B (2021) CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval. Proceedings of the 29th ACM International Conference on Multimedia. 10.1145/3474085.3475281 (3900-3908). Online publication date: 17-Oct-2021
https://dl.acm.org/doi/10.1145/3474085.3475281
Hu Y, Liu M, Su X, Gao Z, Nie L (2021) Video Moment Localization via Deep Cross-Modal Hashing. IEEE Transactions on Image Processing 30. 10.1109/TIP.2021.3073867 (4667-4677). Online publication date: 2021
https://doi.org/10.1109/TIP.2021.3073867
Alam A, Ullah I, Lee Y (2020) Video Big Data Analytics in the Cloud: A Reference Architecture, Survey, Opportunities, and Open Research Issues. IEEE Access 8. 10.1109/ACCESS.2020.3017135 (152377-152422). Online publication date: 2020
https://doi.org/10.1109/ACCESS.2020.3017135
Jiang B, Huang X, Yang C, Yuan J, El Saddik A, Del Bimbo A, Zhang Z, Hauptmann A, Candan K, Bertini M, Xie L, Wei X (2019) Cross-Modal Video Moment Retrieval with Spatial and Language-Temporal Attention. Proceedings of the 2019 on International Conference on Multimedia Retrieval. 10.1145/3323873.3325019 (217-225). Online publication date: 5-Jun-2019
https://dl.acm.org/doi/10.1145/3323873.3325019
Mithun N, Li J, Metze F, Roy-Chowdhury A (2019) Joint embeddings with multimodal cues for video-text retrieval. International Journal of Multimedia Information Retrieval 8:1. 10.1007/s13735-018-00166-3 (3-18). Online publication date: 12-Jan-2019
https://doi.org/10.1007/s13735-018-00166-3
Liu M, Wang X, Nie L, Tian Q, Chen B, Chua T, Boll S, Mu Lee K, Luo J, Zhu W, Byun H, Wen Chen C, Lienhart R, Mei T (2018) Cross-modal Moment Localization in Videos. Proceedings of the 26th ACM international conference on Multimedia. 10.1145/3240508.3240549 (843-851). Online publication date: 15-Oct-2018
https://dl.acm.org/doi/10.1145/3240508.3240549
Mithun N, Li J, Metze F, Roy-Chowdhury A, Aizawa K, Lew M, Satoh S (2018) Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval. Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval. 10.1145/3206025.3206064 (19-27). Online publication date: 5-Jun-2018
https://dl.acm.org/doi/10.1145/3206025.3206064
Su Y, Wang H, Jing P, Xu C (2017) A spatial-temporal iterative tensor decomposition technique for action and gesture recognition. Multimedia Tools and Applications 76:8. 10.1007/s11042-015-3090-7 (10635-10652). Online publication date: 1-Apr-2017
https://dl.acm.org/doi/10.1007/s11042-015-3090-7
