DOI: 10.1145/1027527.1027661
Article

Learning query-class dependent weights in automatic video retrieval

Published: 10 October 2004

Abstract

Combining retrieval results from multiple modalities plays a crucial role in video retrieval systems, especially automatic video retrieval systems that operate without user feedback or query expansion. However, most current systems either use query-independent combination or rely on explicit user weighting. In this work, we propose using query-class dependent weights within a hierarchical mixture-of-experts framework to combine multiple retrieval results. We first classify each user query into one of four predefined categories and then aggregate the retrieval results with the weights associated with that query class, which can be learned efficiently from development data and generalize readily to unseen queries. Our experimental results demonstrate that retrieval with query-class dependent weights considerably outperforms retrieval with query-independent weights.
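
As a rough illustration of the aggregation step described in the abstract, the sketch below combines per-modality retrieval scores using weights looked up by query class. The class labels (`class_a` to `class_d`), the modality names, the weight values, and the function names are all hypothetical placeholders and are not taken from the paper. The query is assumed to have been classified already, and learning the weights from development data is not shown.

```python
from typing import Dict, List

# Hypothetical per-class modality weights (illustrative values only,
# not the weights learned in the paper).
CLASS_WEIGHTS: Dict[str, Dict[str, float]] = {
    "class_a": {"text": 0.8, "image": 0.2},
    "class_b": {"text": 0.5, "image": 0.5},
    "class_c": {"text": 0.3, "image": 0.7},
    "class_d": {"text": 0.6, "image": 0.4},
}


def fuse_scores(
    query_class: str,
    modality_scores: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Linearly combine per-modality scores with the weights of one query class.

    modality_scores maps modality name -> {shot id -> retrieval score}.
    A shot missing from a modality contributes 0 for that modality.
    """
    weights = CLASS_WEIGHTS[query_class]
    fused: Dict[str, float] = {}
    for modality, scores in modality_scores.items():
        w = weights.get(modality, 0.0)  # unknown modalities get zero weight
        for shot_id, score in scores.items():
            fused[shot_id] = fused.get(shot_id, 0.0) + w * score
    return fused


def rank(fused: Dict[str, float]) -> List[str]:
    """Return shot ids by descending fused score, ties broken by id."""
    return sorted(fused, key=lambda shot_id: (-fused[shot_id], shot_id))


if __name__ == "__main__":
    example = {
        "text": {"s1": 1.0, "s2": 0.0},
        "image": {"s1": 0.0, "s2": 1.0},
    }
    for query_class in CLASS_WEIGHTS:
        print(query_class, rank(fuse_scores(query_class, example)))
```

The same modality scores can produce different rankings depending on the query class, which is the behaviour a single query-independent weight vector cannot provide.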

    Published In

    MULTIMEDIA '04: Proceedings of the 12th annual ACM international conference on Multimedia
    October 2004
    1028 pages
    ISBN:1581138938
    DOI:10.1145/1027527

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. learning
    2. modality fusion
    3. query class
    4. video retrieval

    Qualifiers

    • Article

    Conference

    MM04

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
