Article

Robust scene recognition using language models for scene contexts

Authors:

Koichi Shinoda,

Takahiro Mochizuki

MIR '06: Proceedings of the 8th ACM international workshop on Multimedia information retrieval

Pages 99 - 106

https://doi.org/10.1145/1178677.1178693

Published: 26 October 2006

Abstract

We propose a robust scene recognition framework that uses scene context information for multimedia content. Multimedia content consists of scene sequences, some of which are more likely to occur than others. We take a statistical approach to this scene context information: a hidden Markov model (HMM) models each scene, and an n-gram language model represents the context among scenes. We evaluated the proposed method in scene recognition experiments on 16 scenes in video data from 25 baseball games. The proposed method significantly improved recognition results compared with the same method without scene context information.
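The combination of per-scene models with a scene-level language model can be illustrated with a small decoding sketch. This is not the authors' implementation, and the abstract does not give these details. The sketch assumes the video is already segmented, uses per-segment log-likelihoods as stand-ins for HMM scores, uses a bigram with add-one smoothing as the n-gram, and searches with Viterbi. The function names `train_bigram` and `decode`, the `lm_weight` parameter, the three scene labels, and all numbers are hypothetical.

```python
import math
from collections import Counter

def train_bigram(sequences, scenes):
    """Add-one-smoothed bigram log-probabilities log P(cur | prev).

    "<s>" is a hypothetical start-of-sequence marker.
    """
    pair_counts, prev_counts = Counter(), Counter()
    for seq in sequences:
        for prev, cur in zip(["<s>"] + seq[:-1], seq):
            pair_counts[(prev, cur)] += 1
            prev_counts[prev] += 1
    v = len(scenes)
    return {
        (prev, cur): math.log((pair_counts[(prev, cur)] + 1) / (prev_counts[prev] + v))
        for prev in ["<s>"] + scenes
        for cur in scenes
    }

def decode(segment_scores, bigram, scenes, lm_weight=1.0):
    """Viterbi search for the scene sequence that maximizes the sum of
    per-segment scene scores and weighted bigram log-probabilities."""
    # best[s] = (score of the best path ending in scene s, that path)
    best = {
        s: (segment_scores[0][s] + lm_weight * bigram[("<s>", s)], [s])
        for s in scenes
    }
    for scores in segment_scores[1:]:
        new_best = {}
        for cur in scenes:
            prev = max(scenes, key=lambda p: best[p][0] + lm_weight * bigram[(p, cur)])
            total = best[prev][0] + lm_weight * bigram[(prev, cur)] + scores[cur]
            new_best[cur] = (total, best[prev][1] + [cur])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]

# Toy data (assumed for illustration only).
SCENES = ["pitch", "hit", "replay"]
training = [
    ["pitch", "hit", "replay"],
    ["pitch", "pitch", "hit", "replay"],
    ["pitch", "hit", "replay", "pitch"],
]
lm = train_bigram(training, SCENES)

# Stand-in log-likelihoods per segment; the second segment is ambiguous.
segment_scores = [
    {"pitch": -1.0, "hit": -5.0, "replay": -5.0},
    {"pitch": -5.0, "hit": -2.0, "replay": -1.8},
    {"pitch": -5.0, "hit": -5.0, "replay": -1.0},
]

print(decode(segment_scores, lm, SCENES))                 # with scene context
print(decode(segment_scores, lm, SCENES, lm_weight=0.0))  # without scene context
```

With these toy numbers, the second segment slightly favors `replay` on its own, but the bigram makes `pitch`, `hit`, `replay` the more plausible sequence. Setting `lm_weight=0.0` disables the context term and reduces the search to picking the best-scoring scene for each segment independently.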

Index Terms

Robust scene recognition using language models for scene contexts
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Video segmentation
      2. Computer vision tasks
        Scene understanding
        Video summarization

Published In

MIR '06: Proceedings of the 8th ACM international workshop on Multimedia information retrieval

October 2006

344 pages

ISBN:1595934952

DOI:10.1145/1178677

General Chairs: James Z. Wang (The Pennsylvania State University), Nozha Boujemaa (INRIA Rocquencourt, France), Yixin Chen (The University of Mississippi)

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

MM06: The 14th ACM International Conference on Multimedia 2006

October 26 - 27, 2006

Santa Barbara, California, USA

Cited By

Shirahama K, Grzegorzek M (2018). Towards large-scale multimedia retrieval enriched by knowledge about human interpretation. Multimedia Tools and Applications 75:1, pp. 297-331. Online publication date: 31-Dec-2018.
https://dl.acm.org/doi/10.1007/s11042-014-2292-8
Inoue N, Shinoda K, Hua K, Rui Y, Steinmetz R, Hanjalic A, Natsev A, Zhu W (2014). n-gram Models for Video Semantic Indexing. Proceedings of the 22nd ACM international conference on Multimedia, pp. 777-780. Online publication date: 3-Nov-2014.
https://dl.acm.org/doi/10.1145/2647868.2654961
Shirahama K, Grzegorzek M, Uehara K, Kankanhalli M, Rueger S, Manmatha R, Jose J, van Rijsbergen K (2014). Multimedia Event Detection Using Hidden Conditional Random Fields. Proceedings of International Conference on Multimedia Retrieval, pp. 9-16. Online publication date: 1-Apr-2014.
https://dl.acm.org/doi/10.1145/2578726.2578742
Shirahama K, Grzegorzek M, Uehara K (2014). Weakly supervised detection of video events using hidden conditional random fields. International Journal of Multimedia Information Retrieval 4:1, pp. 17-32. Online publication date: 28-Sep-2014.
https://doi.org/10.1007/s13735-014-0068-6
Shinoda K, Ishihara K, Furui S, Mochizuki T (2008). Automatic score scene detection for baseball video. Proceedings of the 3rd international conference on Large-scale knowledge resources: construction and application, pp. 226-240. Online publication date: 3-Mar-2008.
https://dl.acm.org/doi/10.5555/1787800.1787825
Shinoda K, Ishihara K, Furui S, Mochizuki T (2008). Automatic Score Scene Detection for Baseball Video. Large-Scale Knowledge Resources. Construction and Application, pp. 226-240. Online publication date: 2008.
https://doi.org/10.1007/978-3-540-78159-2_21
Ando R, Shinoda K, Furui S, Mochizuki T, Sebe N, Worring M (2007). A robust scene recognition system for baseball broadcast using data-driven approach. Proceedings of the 6th ACM international conference on Image and video retrieval, pp. 186-193. Online publication date: 9-Jul-2007.
https://dl.acm.org/doi/10.1145/1282280.1282312
