Abstract
Representing video events is an essential step for a wide range of visual applications. In this paper, we propose the event sketch, a high-level event representation that depicts the dynamic properties of video events composed of the actions of semantic objects. We show that this representation enables a novel sketch-based video retrieval (SBVR) system, which, to the best of our knowledge, has not been considered before. In this system, users draw the evolution of an event (e.g., the spatiotemporal layouts and behaviors of semantic objects) on a board and retrieve from a database the events whose semantic objects evolve similarly. To do this, event sketches are constructed for both the user queries and the database videos, and are compared under a novel graph-matching scheme based on data-driven Markov chain Monte Carlo (DDMCMC). To test our approach, we collect a new dataset of goal events in real soccer videos, which consists of actions of multiple players and shows large variability in how the events evolve. Experiments on this dataset and on the publicly available CAVIAR dataset demonstrate the effectiveness of the proposed approach.
Cite this article
Zhang, Y., Chen, X., Lin, L. et al. High-level representation sketch for video event retrieval. Sci. China Inf. Sci. 59, 072103 (2016). https://doi.org/10.1007/s11432-015-5494-4