research-article

Video Scene Detection Using Compact Bag of Visual Word Models

Authors:

Muhammad Haroon,

Sher Muhammad Daudpota,

Maheen Bakhtyar,

Varsha Devi

Academic Editor: Deepu Rajan

Advances in Multimedia, Volume 2018

https://doi.org/10.1155/2018/2564963

Published: 01 January 2018

Abstract

Segmenting a video into shots is the first step in video indexing and search. Video shots are usually very short and give little meaningful insight into the visual content. Grouping shots with similar visual content gives a better understanding of the video; this grouping is known as scene boundary detection, or video segmentation into scenes. In this paper, we propose a model for segmenting video into visual scenes using the bag of visual words (BoVW) model. The video is first divided into shots, and each shot is represented by a set of key frames. Each key frame is in turn represented by a BoVW feature vector that is short and compact compared with classical BoVW model implementations. Two variations of the BoVW model are used: (1) the classical BoVW model and (2) the Vector of Locally Aggregated Descriptors (VLAD), an extension of the classical BoVW model. Shot similarity is computed from the distances between key-frame feature vectors within a sliding window of length L = 4, rather than by comparing each shot against a very long list of shots, as in previous practice. Experiments on cinematic and drama videos show the effectiveness of the proposed framework. In the proposed model, the BoVW representation is a 25000-dimensional vector and the VLAD representation is only a 2048-dimensional vector. BoVW achieves a segmentation accuracy of 0.90, whereas VLAD achieves 0.83.
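The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the Euclidean distance, the threshold value, and the rule "start a new scene when a shot is far from all of the previous L shots" are assumptions made for illustration, and each shot is assumed to be reduced to a single feature vector. The codebook would normally come from clustering local descriptors such as SIFT. As an example of how the sizes work out, 16 centroids with 128-dimensional descriptors would give a 2048-dimensional VLAD vector, although the abstract does not state the actual configuration.

```python
import numpy as np

def _nearest_words(descriptors, codebook):
    # Index of the closest codebook centroid for every local descriptor.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bovw_histogram(descriptors, codebook):
    """Classical BoVW: L1-normalized histogram of visual-word counts."""
    descriptors = np.asarray(descriptors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    words = _nearest_words(descriptors, codebook)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)

def vlad_encode(descriptors, codebook):
    """VLAD: per-word sum of residuals, flattened and L2-normalized."""
    descriptors = np.asarray(descriptors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    words = _nearest_words(descriptors, codebook)
    v = np.zeros_like(codebook)
    # Accumulate (descriptor - assigned centroid) into that centroid's row.
    np.add.at(v, words, descriptors - codebook[words])
    v = v.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def scene_boundaries(shot_vectors, window=4, threshold=0.5):
    """Hypothetical grouping rule: shot i opens a new scene when it is
    farther than `threshold` from each of the previous `window` shots."""
    shots = np.asarray(shot_vectors, dtype=float)
    boundaries = []
    for i in range(1, len(shots)):
        start = max(0, i - window)
        dists = np.linalg.norm(shots[start:i] - shots[i], axis=1)
        if dists.min() > threshold:
            boundaries.append(i)
    return boundaries

# Toy data: two centroids in 2-D and three local descriptors.
codebook = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [10.0, 11.0]]
hist = bovw_histogram(descriptors, codebook)
vlad = vlad_encode(descriptors, codebook)

# Six toy shot vectors forming three visually distinct groups.
shots = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [9, 9]]
cuts = scene_boundaries(shots, window=4, threshold=1.0)
```

The window keeps the number of comparisons per shot fixed at L regardless of video length, and it lets alternating shots, as in a dialogue with two camera angles, fall back into the same scene as long as a similar shot appears within the last L shots.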

Cited By

Jose J, Rajkumar S, Ghalib M, Shankar A, Sharma P, Khosravi M (2022) Efficient Shot Boundary Detection with Multiple Visual Representations. Mobile Information Systems 2022. doi:10.1155/2022/4195905. Online publication date: 1-Jan-2022
https://dl.acm.org/doi/10.1155/2022/4195905

Index Terms

Video Scene Detection Using Compact Bag of Visual Word Models
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
  2. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Cluster analysis
2. Information systems
  1. Information retrieval
    1. Document representation
    2. Search engine architectures and scalability
      1. Search engine indexing
  2. Information systems applications
    1. Multimedia information systems
      1. Multimedia databases

Index terms have been assigned to the content through auto-classification.

Published In

Advances in Multimedia, Volume 2018, Issue 2018

453 pages

ISSN: 1687-5680

EISSN: 1687-5699

Copyright © 2018 Muhammad Haroon et al.

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Publisher

Hindawi Limited

London, United Kingdom
