Abstract
Highlight detection is a fundamental step in semantics-based video retrieval and personalized sports video browsing. In this paper, an effective soccer video event detection method based on hidden Markov models (HMMs) and a hierarchical video analysis framework is proposed. Soccer video shots are classified into four coarse mid-level semantics: global, medium, close-up and audience. Global and local motion information is used to refine these coarse mid-level semantics. The sequential soccer video is then segmented into event clips. For each event clip, the temporal transitions of the mid-level semantics and the overall features of the clip are fused using HMMs to determine the event type. The highlight detection performance of dynamic Bayesian networks (DBN), conditional random fields (CRF) and the proposed HMM-based approach is compared. The average F-score of our highlight detection approach (covering goal, shoot, foul and placed kick events) is 82.92%, which outperforms DBN and CRF by 9.85% and 11.12%, respectively. The effects of the number of hidden states, the overall features and the refinement of mid-level semantics on event detection performance are also discussed.
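The abstract says only that HMMs fuse the temporal transitions of the mid-level semantics to decide an event clip's type. A common way to use HMMs for this kind of decision is to train one HMM per event type and label a clip with the event whose model assigns its observation sequence the highest likelihood. The sketch below shows just that decision step with a scaled forward algorithm. It assumes discrete observations drawn from the four coarse view labels. The class `DiscreteHMM`, the function `classify_clip` and every parameter value in `MODELS` are hypothetical illustrations, not the paper's trained models, and the clip's overall features are not modeled.

```python
import math
from typing import Dict, List, Sequence

# Observation alphabet: the four coarse mid-level semantics (view labels).
VIEWS = ["global", "medium", "close-up", "audience"]
SYMBOL = {v: i for i, v in enumerate(VIEWS)}


class DiscreteHMM:
    """Hypothetical discrete-observation HMM with parameters (pi, A, B)."""

    def __init__(self, pi: Sequence[float], A: Sequence[Sequence[float]],
                 B: Sequence[Sequence[float]]):
        self.pi, self.A, self.B = pi, A, B
        self.n = len(pi)  # number of hidden states

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """Scaled forward algorithm: returns log P(obs | model)."""
        if not obs:
            raise ValueError("observation sequence must be non-empty")
        # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
        alpha = [self.pi[i] * self.B[i][obs[0]] for i in range(self.n)]
        log_p = 0.0
        for t, o in enumerate(obs):
            if t > 0:
                # Induction: alpha_t(j) = [sum_i alpha_{t-1}(i) * a_ij] * b_j(o_t)
                alpha = [sum(alpha[i] * self.A[i][j] for i in range(self.n)) * self.B[j][o]
                         for j in range(self.n)]
            scale = sum(alpha)
            if scale == 0.0:
                return float("-inf")  # sequence impossible under this model
            alpha = [a / scale for a in alpha]  # rescale to avoid underflow
            log_p += math.log(scale)  # log P(O) = sum of log scale factors
        return log_p


def classify_clip(views: List[str], models: Dict[str, DiscreteHMM]) -> str:
    """Return the event whose HMM gives the clip's view sequence the highest likelihood."""
    obs = [SYMBOL[v] for v in views]
    return max(models, key=lambda event: models[event].log_likelihood(obs))


# Made-up two-state models: state 0 = play in progress, state 1 = reaction/aftermath.
MODELS = {
    "goal": DiscreteHMM(
        pi=[1.0, 0.0],
        A=[[0.6, 0.4], [0.0, 1.0]],
        B=[[0.7, 0.2, 0.1, 0.0], [0.05, 0.15, 0.5, 0.3]],
    ),
    "foul": DiscreteHMM(
        pi=[1.0, 0.0],
        A=[[0.5, 0.5], [0.1, 0.9]],
        B=[[0.6, 0.3, 0.08, 0.02], [0.1, 0.45, 0.43, 0.02]],
    ),
}

if __name__ == "__main__":
    print(classify_clip(["global", "global", "close-up", "audience", "audience"], MODELS))  # -> goal
    print(classify_clip(["global", "medium", "medium", "close-up"], MODELS))  # -> foul
```

Rescaling the forward vector at every step keeps long clips from underflowing, and the log-likelihood is recovered as the sum of the log scale factors. In a real system the per-event parameters would be estimated from labelled clips (for example with Baum-Welch), and the number of hidden states is one of the factors the abstract says was studied.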
Acknowledgement
This work was supported by the National Natural Science Foundation of China (No. 60903121), Chinese Center University Foundation XJTU-HRT-002, and Microsoft Research Foundation FY11-RES-THEME-052. The authors especially thank Wenjun Zeng of the Computer Science Department, University of Missouri, for proofreading the paper and for helpful discussions.
About this article
Cite this article
Qian, X., Wang, H., Liu, G. et al. HMM based soccer video event detection using enhanced mid-level semantic. Multimed Tools Appl 60, 233–255 (2012). https://doi.org/10.1007/s11042-011-0817-y
DOI: https://doi.org/10.1007/s11042-011-0817-y