tutorial

Zero-Example Event Search using MultiModal Pseudo Relevance Feedback

Authors:

Teruko Mitamura,

Alexander G. Hauptmann

ICMR '14: Proceedings of International Conference on Multimedia Retrieval

Pages 297 - 304

https://doi.org/10.1145/2578726.2578764

Published: 01 April 2014

Abstract

We propose a novel method, MultiModal Pseudo Relevance Feedback (MMPRF), for event search in video that requires no search examples from the user. Pseudo Relevance Feedback has shown great potential in retrieval tasks, but previous work is limited to unimodal tasks with only a single ranked list. To tackle the event search task, which is inherently multimodal, the proposed MMPRF takes advantage of multiple modalities and multiple ranked lists to enhance event search performance in a principled way. The approach is unique in that it leverages not only semantic features but also non-semantic low-level features for event search in the absence of training data. Evaluated on the TRECVID MEDTest dataset, the approach improves the baseline by up to 158% in terms of mean average precision. It also contributed significantly to the CMU team's final submission in TRECVID-13 Multimedia Event Detection.
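The abstract does not spell out the MMPRF algorithm, so the following is only a minimal sketch of the general idea it builds on: generic pseudo relevance feedback over several ranked lists. It is not the authors' method. The fusion by averaging, the Rocchio-style update, the function names (`fuse_ranked_lists`, `prf_rerank`), the parameters (`k_pos`, `k_neg`, `weight`), and all scores and feature vectors are assumptions made up for illustration.

```python
from statistics import mean

def fuse_ranked_lists(score_lists):
    """Average each video's score across modalities (simple late fusion; an assumption)."""
    videos = score_lists[0].keys()
    return {v: mean(s[v] for s in score_lists) for v in videos}

def centroid(vectors):
    """Component-wise mean of equally sized feature vectors."""
    return [mean(vec[i] for vec in vectors) for i in range(len(vectors[0]))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def prf_rerank(score_lists, features, k_pos=2, k_neg=2, weight=0.5):
    """Rerank videos using pseudo labels taken from the fused initial ranking."""
    fused = fuse_ranked_lists(score_lists)
    ranked = sorted(fused, key=fused.get, reverse=True)
    # No user examples: treat the top of the list as pseudo-positives
    # and the bottom of the list as pseudo-negatives.
    pos = [features[v] for v in ranked[:k_pos]]
    neg = [features[v] for v in ranked[-k_neg:]]
    # Rocchio-style direction in the low-level feature space.
    direction = [p - n for p, n in zip(centroid(pos), centroid(neg))]
    new_scores = {
        v: (1 - weight) * fused[v] + weight * dot(direction, features[v])
        for v in fused
    }
    return sorted(new_scores, key=new_scores.get, reverse=True)

# Toy data (invented): two semantic ranked lists and 2-D low-level features.
text_scores = {"a": 0.9, "b": 0.8, "c": 0.2, "d": 0.1, "e": 0.4, "f": 0.0}
concept_scores = {"a": 0.8, "b": 0.6, "c": 0.3, "d": 0.1, "e": 0.2, "f": 0.1}
low_level = {
    "a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.8, 0.2],
    "d": [0.0, 1.0], "e": [0.2, 0.8], "f": [0.1, 0.9],
}

fused = fuse_ranked_lists([text_scores, concept_scores])
initial = sorted(fused, key=fused.get, reverse=True)
reranked = prf_rerank([text_scores, concept_scores], low_level)
print(initial)
print(reranked)
```

In this toy data, video `c` scores poorly in both semantic lists but is close to the pseudo-positives in the low-level feature space, so the feedback step moves it above `e`. This mirrors, in a very simplified form, the abstract's point that low-level features can help even when no training data is available.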

Index Terms

Zero-Example Event Search using MultiModal Pseudo Relevance Feedback
1. Information systems
  1. Information retrieval
  2. Information systems applications

Information

Published In

ICMR '14: Proceedings of International Conference on Multimedia Retrieval

April 2014

564 pages

ISBN:9781450327824

DOI:10.1145/2578726

Conference Chairs: Mohan Kankanhalli (National University of Singapore), Stefan Rueger (The Open University, UK), R. Manmatha (A9.com, USA)

General Chairs: Joemon Jose (University of Glasgow, UK), Keith van Rijsbergen (University of Glasgow, UK)

Copyright © 2014 Owner/Author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

In-Cooperation

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 April 2014

Qualifiers

Tutorial
Research
Refereed limited

Conference

ICMR '14

ICMR '14: International Conference on Multimedia Retrieval

April 1 - 4, 2014

Glasgow, United Kingdom

Acceptance Rates

ICMR '14 Paper Acceptance Rate 21 of 111 submissions, 19%;

Overall Acceptance Rate 254 of 830 submissions, 31%

Bibliometrics & Citations

Article Metrics

Total Citations: 40
Total Downloads: 229
Downloads (last 12 months): 10
Downloads (last 6 weeks): 0

Reflects downloads up to 27 Jan 2025

Citations

Cited By

Wu J, Ngo C, Chan W, Hou Z (2023). (Un)likelihood Training for Interpretable Embedding. ACM Transactions on Information Systems 42:3 (1-26). DOI: 10.1145/3632752. Online publication date: 13-Nov-2023.
https://dl.acm.org/doi/10.1145/3632752
Zhang H, Ngo C (2019). A Fine Granularity Object-Level Representation for Event Detection and Recounting. IEEE Transactions on Multimedia 21:6 (1450-1463). DOI: 10.1109/TMM.2018.2884478. Online publication date: 22-May-2019.
https://dl.acm.org/doi/10.1109/TMM.2018.2884478
Cappallo S, Svetlichnaya S, Garrigues P, Mensink T, Snoek C (2019). New Modality. IEEE Transactions on Multimedia 21:2 (402-415). DOI: 10.1109/TMM.2018.2862363. Online publication date: 1-Feb-2019.
https://dl.acm.org/doi/10.1109/TMM.2018.2862363
Gong M, Li H, Meng D, Miao Q, Liu J (2019). Decomposition-Based Evolutionary Multiobjective Optimization to Self-Paced Learning. IEEE Transactions on Evolutionary Computation 23:2 (288-302). DOI: 10.1109/TEVC.2018.2850769. Online publication date: Apr-2019.
https://doi.org/10.1109/TEVC.2018.2850769
Markatopoulou F, Galanopoulos D, Tzelepis C, Mezaris V, Patras I, Vrochidis S, Huet B, Chang E, Kompatsiaris I (2019). Concept‐Based and Event‐Based Video Search in Large Video Collections. Big Data Analytics for Large‐Scale Multimedia Search (31-60). DOI: 10.1002/9781119376996.ch2. Online publication date: 15-Mar-2019.
https://doi.org/10.1002/9781119376996.ch2
Men X, Zhou F, Li X, Bao H, Ip H, Seidel H, Sheffer A, Fu H, Ghosh A, Kopf J (2018). A deep learned method for video indexing and retrieval. Proceedings of the 26th Pacific Conference on Computer Graphics and Applications: Short Papers (85-88). DOI: 10.2312/pg.20181287. Online publication date: 8-Oct-2018.
https://dl.acm.org/doi/10.2312/pg.20181287
Guo L (2018). Self-Paced Learning with Statistics Uncertainty Prior. IEICE Transactions on Information and Systems E101.D:3 (812-816). DOI: 10.1587/transinf.2017EDL8169. Online publication date: 2018.
https://doi.org/10.1587/transinf.2017EDL8169
Chen Z, Xu Z, Zhang Y, Gu X (2018). Query-Free Clothing Retrieval via Implicit Relevance Feedback. IEEE Transactions on Multimedia 20:8 (2126-2137). DOI: 10.1109/TMM.2017.2785253. Online publication date: Aug-2018.
https://doi.org/10.1109/TMM.2017.2785253
Chesneau N, Alahari K, Schmid C (2018). Learning From Web Videos for Event Classification. IEEE Transactions on Circuits and Systems for Video Technology 28:10 (3019-3029). DOI: 10.1109/TCSVT.2017.2764624. Online publication date: 1-Oct-2018.
https://dl.acm.org/doi/10.1109/TCSVT.2017.2764624
Ntalianis K, Doulamis A, Tsapatsoulis N, Mastorakis N (2018). Social Relevance Feedback Based on Multimedia Content Power. IEEE Transactions on Computational Social Systems 5:1 (109-117). DOI: 10.1109/TCSS.2017.2766250. Online publication date: Mar-2018.
https://doi.org/10.1109/TCSS.2017.2766250
