research-article

Video Scene Detection Using Compact Bag of Visual Word Models

Published: 01 January 2018

Abstract

Segmenting a video into shots is the first step in video indexing and search. Video shots are mostly very short and give little meaningful insight into the visual content. Grouping shots with similar visual content gives a better understanding of the video scene; this grouping is known as scene boundary detection, or video segmentation into scenes. In this paper, we propose a model for segmenting video into visual scenes using the bag of visual words (BoVW) model. The video is first divided into shots, each of which is then represented by a set of key frames. Key frames are in turn represented by BoVW feature vectors that are short and compact compared with classical BoVW implementations. Two variations of the BoVW model are used: (1) the classical BoVW model and (2) the Vector of Locally Aggregated Descriptors (VLAD), an extension of the classical BoVW model. Shot similarity is computed from the distances between key-frame feature vectors within a sliding window of length L = 4, rather than by comparing each shot against a very long list of shots, as in previous practice. Experiments on cinematic and drama videos show the effectiveness of the proposed framework. In the proposed model, the BoVW representation is a 25000-dimensional vector, whereas VLAD is only a 2048-dimensional vector. BoVW achieves a segmentation accuracy of 0.90, whereas VLAD achieves 0.83.
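The pipeline described in the abstract (key frames encoded as BoVW histograms, then shots compared inside a sliding window of length L = 4) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not give the distance measure, the linking rule, or any threshold, so the following are assumptions made only for illustration: Euclidean distance, the minimum over key-frame pairs as the shot distance, a hypothetical `threshold` parameter, and the rule that two similar shots inside the window pull every shot between them into the same scene. The function names are hypothetical as well.

```python
from math import sqrt


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bovw_histogram(descriptors, vocabulary):
    """Count the nearest visual word for each local descriptor, then L1-normalize."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        nearest = min(range(len(vocabulary)), key=lambda i: sq_dist(d, vocabulary[i]))
        hist[nearest] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def shot_distance(shot_a, shot_b):
    """Assumed rule: smallest distance over all key-frame pairs of two shots."""
    return min(sqrt(sq_dist(u, v)) for u in shot_a for v in shot_b)


def scene_boundaries(shots, window=4, threshold=0.5):
    """Return indices of shots that start a new scene (shot 0 is implicit).

    Each shot is a list of key-frame feature vectors. A shot is compared
    only with the `window` shots before it. If it is close enough to one of
    them, that shot, the current shot, and everything in between are kept in
    one scene (assumed linking rule; `threshold` is a hypothetical parameter).
    """
    n = len(shots)
    linked = [False] * n  # linked[i]: no scene boundary directly before shot i
    for b in range(1, n):
        for a in range(max(0, b - window), b):
            if shot_distance(shots[a], shots[b]) <= threshold:
                for i in range(a + 1, b + 1):
                    linked[i] = True
                break  # the earliest match already covers the widest span
    return [i for i in range(1, n) if not linked[i]]


if __name__ == "__main__":
    # Toy shots with one key frame each, encoded as 3-word histograms.
    toy_shots = [[[1, 0, 0]], [[0, 1, 0]], [[1, 0, 0]], [[0, 0, 1]], [[0, 0, 1]]]
    print(scene_boundaries(toy_shots))  # [3]
```

In the toy input, shots 0 and 2 match, so shots 0 to 2 form one scene even though shot 1 looks different; shots 3 and 4 form a second scene. Restricting comparisons to the window keeps the cost linear in the number of shots, which is the motivation the abstract gives for avoiding comparisons against very long lists of shots.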



Published In

Advances in Multimedia, Volume 2018, Issue 2018
453 pages
ISSN: 1687-5680
EISSN: 1687-5699
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Publisher

Hindawi Limited

London, United Kingdom
