VOL. 19, NO. 3, 2020, 54-60
www.elektrika.utm.my
ISSN 0128-4428

Keyframe Extraction Techniques: A Review


Bashir Olaniyi Sadiq1, Bilyamin Muhammad2*, Muhammad Nasir Abdullahi3, Gabriel Onuh1, Ali Abdulhakeem Muhammed1 and Adeogun Emmanuel Babatunde2

1 Department of Computer Engineering, Ahmadu Bello University Zaria, Kaduna State, Nigeria.
2 Department of Computer Engineering, Kaduna Federal Polytechnic, Kaduna State, Nigeria.
3 Department of Informatics and Automation, Technische Universität Ilmenau, Thuringia, Germany.
* Corresponding author: bsmuhd@gmail.com, Tel: +2347064627804

Abstract: Video is audiovisual data comprising a large number of frames. Analyzing and processing such a large amount of data is difficult for many applications. There is therefore a need for an effective video management scheme that organizes this huge volume of video frames and provides easy access to the video content in less time. Keyframe extraction is the first step in video browsing, indexing, and retrieval. Many techniques exist for extracting keyframes; however, some of the present techniques come with one or more limitations. In this paper, a brief review of the existing techniques is presented, and the merits and demerits of each technique are stated.

Keywords: Keyframe Extraction, Shot Boundary Detection, Shot Transitions, Video Hierarchy
© 2020 Penerbit UTM Press. All rights reserved
Article History: received 9 June 2020; accepted 6 December 2020; published 26 December 2020.

1. INTRODUCTION

The rapid increase in broadband data connections and digital video capturing devices has resulted in digital videos being widely utilized [1]. However, this growth in the availability of digital videos has not been accompanied by an increase in their accessibility [2]. When certain frames of interest need to be reviewed, the user has to browse the whole video, and because of the huge number of frames in the video, the process becomes difficult and time consuming [3]. To address this problem, an effective content-based video indexing, browsing, and retrieval (CBVIBR) approach is required [4].
Video summarization (also known as video abstraction) is the process of presenting an abstract view and comprehensible analysis of a full-length video within the shortest period of time [5]. Video skimming and keyframe selection are the two basic methods for video abstraction [6]. Video skimming is a method of summarization that extracts frames together with their matching soundtracks from a given video file, while keyframe extraction (also known as static video summarization or representative frames) is an efficient method that produces a more condensed version of the full-length video [7].
Keyframe extraction is a prerequisite for CBVIBR. The main idea of keyframe extraction is to manage a large amount of video data by selecting a unique set of representative frames while preserving the essential activities of the original video, thereby simplifying video analysis and processing [8]. Some of the areas touched by the development of keyframe extraction techniques are e-learning, news broadcasts, home videos, sports, and movies, among others [9].
In general, the performance of a keyframe extraction technique is based on its ability to correctly extract unique keyframes from a given video. However, effects such as gradual transitions between successive frames have proven to be a challenging task for many keyframe extraction techniques, as a gradual transition normally spans one or more seconds of video depending on the editing effects used. In addition, the presence of sudden illuminance changes can also affect the efficiency of extracting a keyframe from a video sequence.
The rest of the paper is structured as follows: Section 2 presents the video hierarchy. Section 3 presents video transitions. Section 4 presents the video summarization system, shot boundary detection techniques, and keyframe extraction techniques. The metrics used for evaluating the performance of these techniques are discussed in Section 5, while conclusions are drawn in Section 6.

2. VIDEO HIERARCHY
A video hierarchy is the overall structure of a video, which comprises scenes, shots, and frames. A story comprises a number of scenes that capture a sequence of events; hence it is made up of interrelated shots recorded at different camera positions [10]. A shot is the smallest unit of temporal visual information that contains a sequence of interrelated frames captured uninterruptedly by a single camera [8]. These frames represent certain related actions or events in time and space. Figure 1 gives an illustration of a video hierarchy.

Figure 1. Video Hierarchy

3. VIDEO TRANSITIONS
A transition is the frontier between multiple video shots [8]. The Video Editing Process (VEP) is employed to merge multiple shots to generate a video during the Video Production Process (VPP) [11]. These VEPs allow the generation of various transition effects. The main types of shot transitions are shown in Figure 2.

Figure 2. Types of Shot Transition [12] (shot transitions divide into abrupt transitions and gradual transitions; gradual transitions comprise dissolve, fade in/out, and wipe)

3.1 Abrupt Transition

An abrupt (cut) transition is a sudden change that occurs between two successive video shots without any video editing process [13]. Figure 3 shows a sudden change between the last frame of the current shot and the first frame of the subsequent shot (i.e. between frames 3 and 4).

Figure 3. Abrupt transition

3.2 Gradual Transition

A gradual (soft) transition occurs when two successive shots are combined using the video editing process (VEP) during the course of production. It may span two or more video frames that contain truncated information and are visually interdependent [14]. When detecting shot boundaries in a video file containing soft transitions, the result of the operation might not be efficiently achieved. This is attributed to the high visual content similarity between the consecutive frames involved in the VEP [8]. Gradual transitions are classified into three types, namely dissolve, fade in/out, and wipe transitions.

3.2.1 Dissolve Soft Transition (DST)
DST is the process in which the pixel intensity values gradually diminish from the current shot while the pixel intensity values of the next shot gradually appear [15]. In DST, two or more frames may have different pixel intensity values but contain the same visual information, as shown in Figure 4 [12]. The figure depicts only one frame (i.e. the 2nd frame) that is utilized in the dissolve transition.

Figure 4. Dissolve transition

3.2.2 Fade in/out Soft Transition (FST)
FST is the type of transition that is usually applied in movies to start a scene smoothly. In a fade-in transition, one or more end frames of a shot are directly replaced by a fixed-intensity frame, and the pixel intensity values of the next shot gradually appear from a completely dark sequence [16]. Figure 5 shows an example of a fade-in transition with 4 frames involved in the transition (n = 1-4).

Figure 5. Fade-in Transition

In the fade-out transition, only frames at the end of the current shot are involved in the transition process; no frames from the next shot take part. It is usually applied at the end of a movie scene. Figure 6 illustrates a fade-out transition involving 3 frames (n = 1-3).

Figure 6. Fade-out Transition

3.2.3 Wipe Soft Transition (WST)
WST is the process in which the current shot pixels are progressively superseded by the corresponding pixels from the next shot following an organized spatial pattern [17]. Figure 7 shows the gradual substitution of column pixels from left to right across the 10 frames involved (n = 1054-1080).

Figure 7. Wipe transition

4. VIDEO ABSTRACTION
The rapid growth in network infrastructure, together with the use of advanced digital video technologies, has necessitated video abstraction technologies to manage the huge volumes of video data generated by numerous multimedia applications [4]. They allow users to access and retrieve the relevant contents of a video easily without viewing the entire video.


The primary role of video summarization is to save and improve the storage capacity of the video contents efficiently, and also to decrease the amount of data required when streaming or downloading video content from the web [1]. Figure 8 shows a block diagram of a video summarization system. The system consists of the shot boundary detection module, where the video frames are partitioned into shots, and the keyframe extraction module, where the representative frames are identified and selected.

Figure 8. Block Diagram of Video Abstraction System [9] (input video -> shot boundary detection process -> keyframe extraction process -> summarized video)

4.1 Shot Boundary Detection Techniques (SBD)
Shot boundary detection (also known as temporal video segmentation) is the process of segmenting video frames into a number of shots by determining the frontier between successive video shots, in order to make video analysis and processing easy [8]. The basic idea of shot boundary detection techniques is to find the dissimilarities of visual content. The variations between successive images are computed and compared with a threshold. A transition is then detected if a significant change occurs between the shots, as shown in Figure 9.

Figure 9. Shot Boundary Detection [12]

The SBD algorithms comprise three core elements: frame representation, dissimilarity measure, and thresholding [18]. The major challenge faced by these SBD techniques is finding transitions in the presence of sudden illuminance changes and large camera/object movement, which leads to the extraction of false keyframes [19]. Some of the existing SBD techniques are:

4.1.1 Pixel-Based Technique (PBT)
In this type of SBD method, the difference between two successive images is determined by directly comparing their pixel values using equation (1) [20]. PBT is computationally simple and best suited for the detection of abrupt transitions [21]. However, a slight change in camera movement will result in multiple identical images becoming dissimilar. In addition, a minor variation in illuminance (flash light) results in false detection [3].

\[ \frac{\sum_{x=1}^{X}\sum_{y=1}^{Y} \left| k(m,x,y) - k(m-1,x,y) \right|}{X \cdot Y} > \tau \qquad (1) \]

where k(m, x, y) and k(m-1, x, y) are the intensity values of the current and previous images at pixel (x, y), X and Y are the frame dimensions, and τ is the threshold value. A boundary is detected if the summation of the differences exceeds the threshold value [22].
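As a concrete illustration of equation (1), the sketch below computes the normalized pixel difference between two grayscale frames with NumPy and flags a boundary when it exceeds a threshold. The function names, the grayscale assumption, and the threshold value of 30 are illustrative choices, not values taken from the reviewed techniques.

```python
import numpy as np

def pixel_difference(prev_frame: np.ndarray, curr_frame: np.ndarray) -> float:
    """Mean absolute pixel difference between two grayscale frames, as in equation (1)."""
    diff = np.abs(curr_frame.astype(np.float64) - prev_frame.astype(np.float64))
    return diff.sum() / diff.size  # normalize by the number of pixels (X * Y)

def is_boundary(prev_frame: np.ndarray, curr_frame: np.ndarray, tau: float = 30.0) -> bool:
    """Declare a shot boundary when the normalized difference exceeds the threshold tau."""
    return pixel_difference(prev_frame, curr_frame) > tau

# Example with synthetic frames: a dark frame followed by a bright frame triggers a boundary.
if __name__ == "__main__":
    dark = np.full((240, 320), 20, dtype=np.uint8)
    bright = np.full((240, 320), 200, dtype=np.uint8)
    print(is_boundary(dark, bright))  # True
```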
4.1.2 Histogram-Based Technique (HBT)
HBT is the most common and widely used approach for detecting shot transitions [23]. A transition is detected if the variation between the histograms of two successive frames is greater than a threshold value, as shown in equation (2) [20]. HBT is computationally simple and can detect abrupt, fade, and dissolve transitions efficiently [23]. Although HBT is insensitive to minor camera movement, it is sensitive when the movement is large [3]. Furthermore, the approach is sensitive to sudden illuminance changes and does not integrate the spatial distribution information of the colors, which can result in two different frames having the same histograms [21].

\[ \frac{\sum_{c \in \text{channels}} \sum_{b=1}^{\text{bins}} \left| H(n,c,b) - H(n-1,c,b) \right|}{2 \cdot N_{\text{pixels}} \cdot N_{\text{channels}}} > \tau \qquad (2) \]

where H(n, c, b) and H(n-1, c, b) represent the value of bin b in color channel c for the histograms of two consecutive frames, and τ is the threshold value.
established. A transition is then detected if a significant
change occurs between the shots as shown in figure 9. 4.1.3 Statistical-Based Technique (SBT)
The SBT approach involves computing the histogram
difference of consecutive frames and finding the threshold
value by calculating the mean and standard deviation of the
histogram differences. Then, a comparison is established
between the absolute histogram differences of the frames
and a threshold value. The histogram difference is
computed using equation 3 [24].

\ [𝐻 𝑖,𝑎 −𝐻(𝑖+1,𝑎)]²
𝑅 𝑖, 𝑖 + 1 = N]0 (3)
𝐻(𝑖,𝑎)

The mean and standard deviation of the histogram


differences are computed using equation 4 and 5 [24].
Figure 9. Shot Boundary Detection [12]


\[ \text{Mean} = \frac{\sum_{i=1}^{M-1} R(i, i+1)}{M-1} \qquad (4) \]

\[ \text{STD} = \sqrt{\frac{\sum_{i=1}^{M-1} \left[ R(i, i+1) - \text{Mean} \right]^2}{M-1}} \qquad (5) \]

The threshold value is determined using equation (6) [24].

\[ \tau = \text{Mean} + \text{STD} \times C \qquad (6) \]

where R(i, i+1) is the histogram difference between the ith (current) and (i+1)th (next) frames, H(i, a) and H(i+1, a) are the histograms of the color channels for consecutive frames, M is the total number of frames, and C is a pre-specified constant.
A shot transition is detected if the difference between the successive frames is greater than the threshold value. This approach has a high computational time due to the statistical calculations involved [3].
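The statistical thresholding of equations (3)-(6) could be sketched as follows, assuming grayscale frames and a pre-specified constant C of 3; the histogram bin count and the small epsilon added to avoid division by zero are illustrative choices.

```python
import numpy as np

def chi_square_differences(frames, bins: int = 64) -> np.ndarray:
    """R(i, i+1) from equation (3) for every consecutive pair of grayscale frames."""
    hists = [np.histogram(f, bins=bins, range=(0, 256))[0].astype(np.float64) + 1e-6
             for f in frames]  # epsilon avoids division by zero in empty bins
    return np.array([np.sum((hists[i] - hists[i + 1]) ** 2 / hists[i])
                     for i in range(len(frames) - 1)])

def detect_boundaries(frames, c: float = 3.0):
    """Flag frame pairs whose difference exceeds tau = Mean + STD * C (equations 4-6)."""
    diffs = chi_square_differences(frames)
    tau = diffs.mean() + diffs.std() * c
    return [i for i, d in enumerate(diffs) if d > tau]
```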
4.1.4 Edge-Based Technique (EBT)
This type of technique detects a boundary when the positions of the edges in the current frame show a large difference from those of the next frame [8]. The edge change ratio (ECR) is utilized to find the edge changes using equation (7) [20]. In EBT, transitions are detected by looking for a large edge change ratio [20]. Although EBTs detect abrupt transitions more accurately and can eliminate false positives resulting from flash light occurrences, they are less reliable than the HBTs in terms of performance and computational time [25]. Some of the edge detectors employed for the detection of shot boundaries are the Roberts, Sobel, Prewitt, Laplacian of Gaussian, and Canny edge detection techniques [26].

\[ ECR_n = \max\left( \frac{X_n^{\text{in}}}{\sigma_n}, \frac{X_{n-1}^{\text{out}}}{\sigma_{n-1}} \right) \qquad (7) \]

where σ_n and σ_{n-1} are the numbers of edge pixels in the current and previous frames, and X_n^in and X_{n-1}^out are the numbers of edge pixels entering and leaving in two successive images.
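A possible sketch of the edge change ratio in equation (7) is given below, using Canny edge maps and a dilation step to decide which edge pixels count as entering or exiting; the Canny thresholds and the dilation size are assumptions for the example, not values taken from the cited works.

```python
import cv2
import numpy as np

def edge_change_ratio(prev_gray: np.ndarray, curr_gray: np.ndarray, dilate_px: int = 5) -> float:
    """ECR_n = max(X_in / sigma_n, X_out / sigma_{n-1}) for two 8-bit grayscale frames."""
    prev_edges = cv2.Canny(prev_gray, 100, 200) > 0
    curr_edges = cv2.Canny(curr_gray, 100, 200) > 0
    kernel = np.ones((dilate_px, dilate_px), np.uint8)
    prev_dilated = cv2.dilate(prev_edges.astype(np.uint8), kernel) > 0
    curr_dilated = cv2.dilate(curr_edges.astype(np.uint8), kernel) > 0

    sigma_curr = max(int(curr_edges.sum()), 1)  # edge pixels in the current frame
    sigma_prev = max(int(prev_edges.sum()), 1)  # edge pixels in the previous frame
    entering = np.logical_and(curr_edges, ~prev_dilated).sum()  # edge pixels that appeared
    exiting = np.logical_and(prev_edges, ~curr_dilated).sum()   # edge pixels that vanished
    return max(entering / sigma_curr, exiting / sigma_prev)
```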
These extracted key frames are the representative frames
4.1.5 Machine Learning Based Technique
Recently, the use of machine learning techniques in the area of video processing has received tremendous attention from multimedia industries and academia. This is a result of their significant capability to extract high-level features from video frames [27]. However, they are computationally expensive because a well-trained network model is required. Machine learning based techniques are classified into two main classes, namely supervised and unsupervised. The unsupervised learning technique is the most common and widely used approach, usually employed when prior information about the dataset is unknown, while a supervised learning system can learn only those tasks it is trained for [28]. A method based on the assumption that a hierarchy assists in making decisions by reducing the number of unspecified transitions was proposed to detect abrupt transitions between successive shots [29].
A Support Vector Machine (SVM) learning approach was trained to classify frames as abrupt transitions or non-abrupt ones through the use of information theory to find the variation between consecutive frames [30]. Similarly, interpretable tags learned by a convolutional neural network (CNN) were used to predict the position at which a transition occurs in the video sequence [31].

4.2 Keyframe Extraction Techniques
Keyframe extraction is an efficient method used to clearly express the important contents of a video file by extracting a set of representative frames and removing the duplicated ones from the original video [6]. The extracted keyframes are expected to represent and provide comprehensive visual information of the whole video [7]. The keyframe approach is employed to reduce the computational burden and the amount of data needed for video processing, so as to make indexing, retrieval, storage organization, and recognition of video data more convenient and efficient [32]. These techniques can be classified into three main classes, namely shot-based, sampling-based, and clustering-based techniques [33].

4.2.1 Sampling-Based Technique
This type of method selects representative frames by uniformly or randomly sampling the video frames from the original video, without giving importance to the video content [33]. The concept of this technique is to choose every kth frame from the original video, where the value of k is determined by the duration of the video. A usual choice of duration for a summarized video is 5% to 15% of the whole video. For 5% summarization, every 20th frame is selected as a keyframe, while for 15% summarization, every 7th frame is selected as a keyframe [34]. The extracted keyframes may not represent all the content of the original video, and sampling may also produce redundant frames having similar contents [35].
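A minimal sketch of uniform sampling is shown below: with a 5% summary ratio, roughly every 20th frame is kept, in line with the example above. The OpenCV capture loop and the function name are illustrative assumptions.

```python
import cv2

def sample_keyframes(video_path: str, ratio: float = 0.05):
    """Uniformly sample keyframes: a summary ratio of 0.05 keeps roughly every 20th frame."""
    step = max(int(round(1 / ratio)), 1)
    cap = cv2.VideoCapture(video_path)
    keyframes, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:  # keep every step-th frame regardless of content
            keyframes.append(frame)
        index += 1
    cap.release()
    return keyframes
```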


4.2.2 Shot-Based Technique
In this approach, an efficient SBD method that detects the shot boundaries/transitions is applied first. After segmenting the video frames into shots, the keyframe extraction process is performed. Different works in the literature have discussed different techniques for the selection of keyframes. The traditional approach is to select the first and last frames of the candidate shot as the keyframes [35]. These extracted keyframes are the representative frames of the shots, which in turn produce a summary of the original video in a more condensed manner [36].

4.2.3 Clustering-Based Technique
Clustering is an unsupervised learning approach that finds sets of similar data points and groups them together. In this method, frames within a video file having similar visual contents are partitioned into a number of clusters, and from each cluster the frame that is nearest to the center of the candidate cluster is extracted as a keyframe [6]. The frame similarities are determined by features such as color histograms, texture, saliency maps, and motion [37]. The main drawback of the clustering-based technique is that it is difficult to determine the number of clusters in a given video file before performing the clustering operation [38].
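As an illustration of the clustering-based idea, the sketch below clusters per-frame color histograms with k-means and keeps the frame nearest each cluster center; the feature choice, the number of clusters, and the use of scikit-learn are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_keyframes(frames, n_clusters: int = 5, bins: int = 32):
    """Cluster per-frame color histograms and keep the frame closest to each cluster center."""
    features = np.array([
        np.concatenate([np.histogram(f[:, :, c], bins=bins, range=(0, 256), density=True)[0]
                        for c in range(3)])
        for f in frames
    ])
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(features)
    keyframe_ids = []
    for k in range(n_clusters):
        members = np.where(km.labels_ == k)[0]
        dists = np.linalg.norm(features[members] - km.cluster_centers_[k], axis=1)
        keyframe_ids.append(int(members[np.argmin(dists)]))  # frame nearest the cluster center
    return sorted(keyframe_ids)
```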
4.2.4 Other Keyframe Extraction Approaches
The demerit of various existing techniques in the area of keyframe extraction is the high computational time in the process of extracting the keyframes. In addition, most of them fail to select unique frames from a video sequence, which results in more redundant frames and thus increases the time required for video analysis and processing. In this regard, an AdaBoost classifier was trained to extract representative frames from vehicle surveillance footage. The algorithm was implemented in two modules. This proposed method has a high computational time due to the well-trained model needed [39].
A novel approach for shot transition detection and selection of representative images using Eigen values was presented [40]. In this approach, a data matrix was first created for all the successive frames in the original video. A covariance matrix was then calculated to determine the dissimilarities between the intensity levels of successive images. To reduce the computational burden, a modified approach for calculating the covariance matrix was utilized that avoids recalculating the whole matrix whenever a new image is added to the data matrix. The calculated covariance matrix was then utilized to determine the Eigen values. The minimum Eigen value selected was used to determine the variations between the frames, and a comparison was established between the minimum Eigen value and a predefined threshold. If the Eigen value exceeds the threshold, then the previous image is considered a transition point and the current image is selected as the representative frame.
Higher order color moments were used to extract keyframes from a video sequence by partitioning the video frames into M x N block shots [41]. From each shot, the frames with the largest mean and standard deviation values are selected as the representative frames. Another method based on the bitwise exclusive-or (XOR) logical operation was presented to select keyframes by the dissimilarity between two successive images [42].
5. EVALUATION METRICS

To validate the performance of keyframe extraction techniques, several evaluation metrics are utilized, namely compression ratio, precision and recall, F-measure, and computational cost [32].

5.1 Compression Ratio
The compression ratio (CR) is used to determine the compactness of the technique with respect to the extracted keyframes. CR is computed using equation (8) [7].

\[ CR = \left( 1 - \frac{N_k}{N_f} \right) \times 100\% \qquad (8) \]

where N_f is the total number of frames in the original video and N_k is the total number of extracted keyframes.

5.2 Precision and Recall

Precision is also known as the positive predictive value [43]. It is the ratio of the number of keyframes extracted accurately to the total number of keyframes extracted by the technique from the video sequence. In other words, it measures the accuracy of a keyframe extraction technique, and it is computed using equation (9) [6].

\[ \text{Precision} = \frac{N_a}{N_k} \times 100\% \qquad (9) \]

Recall, also known as sensitivity, is computed using equation (10) [43].

\[ \text{Recall} = \frac{N_a}{N_a + N_m} \times 100\% \qquad (10) \]

where N_a is the number of keyframes extracted accurately, N_k is the total number of keyframes extracted by the technique from the video sequence, and N_m is the number of missed extractions.

5.3 F-Measure
F-measure (also known as F-score) evaluates the performance of an algorithm by merging both precision and recall into one metric using the harmonic mean. The F-score is computed using equation (11) [8].

\[ F = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (11) \]

5.4 Computational Time
Computational cost is the time (measured in seconds) taken by the technique to extract the keyframes.
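The evaluation metrics of equations (8)-(11) can be computed directly, as in the small helper below; the counts used in the example call are made up purely for illustration.

```python
def compression_ratio(n_extracted: int, n_total_frames: int) -> float:
    """Equation (8): CR = (1 - Nk / Nf) * 100%."""
    return (1 - n_extracted / n_total_frames) * 100

def precision(n_accurate: int, n_extracted: int) -> float:
    """Equation (9): fraction of extracted keyframes that are correct, in percent."""
    return n_accurate / n_extracted * 100

def recall(n_accurate: int, n_missed: int) -> float:
    """Equation (10): fraction of true keyframes that were actually extracted, in percent."""
    return n_accurate / (n_accurate + n_missed) * 100

def f_measure(p: float, r: float) -> float:
    """Equation (11): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Illustrative numbers: 40 keyframes extracted out of 2000 frames, 32 correct, 8 true keyframes missed.
if __name__ == "__main__":
    p, r = precision(32, 40), recall(32, 8)
    print(compression_ratio(40, 2000), p, r, f_measure(p, r))
```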
6. CONCLUSION

In this paper, a brief review of keyframe extraction techniques was carried out. The video structure, transition types, video abstraction, and the metrics employed for measuring the performance of the techniques were discussed, and the advantages and disadvantages of each technique were stated. Although the performance of these techniques is acceptable, keyframe extraction still faces some challenges due to gradually transitioned frames, camera operations (zooming, tilting, or panning), and sudden illuminance changes (flashlights) in the video sequence. Addressing these problems will improve the performance of keyframe extraction techniques.

REFERENCES

[1] C. Sujatha and U. Mudenagudi, "A study on keyframe extraction methods for video summary," in 2011 International Conference on Computational Intelligence and Communication Networks, 2011, pp. 73-77: IEEE.
[2] J. Li, Y. Ding, Y. Shi, and W. Li, "A divide and rule scheme for shot boundary detection based on SIFT," International Journal of Digital Content Technology and its Applications, vol. 4, pp. 202-214, 2010.
[3] J. H. Yuan, H. Y. Wang, and B. Zhang, "A formal study of shot boundary detection," IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 2, pp. 168-186, February 2007.
[4] G. Kumar, Naveen, V. Reddy, and S. S. Kumar, "Video shot boundary detection and key frame extraction for video retrieval," in Proceedings of the Second International Conference on Computational Intelligence and Informatics, 2018, pp. 557-567: Springer.


[5] S. Santini, "Who needs video summarization anyway?," in International Conference on Semantic Computing (ICSC 2007), 2007, pp. 177-184: IEEE.
[6] A. Paul, K. Milan, J. Kavitha, J. Rani, and P. Arockia, "Key-frame extraction techniques: A review," Recent Patents on Computer Science, vol. 11, no. 1, pp. 3-16, 2018.
[7] H. Gharbi, S. Bahroun, and E. Zagrouba, "A novel key frame extraction approach for video summarization," in VISIGRAPP (3: VISAPP), 2016, pp. 148-155.
[8] S. H. Abdulhussain, A. R. Ramli, M. I. Saripan, B. M. Mahmmod, S. A. R. Al-Haddad, and W. A. Jassim, "Methods and challenges in shot boundary detection: a review," Entropy, vol. 20, no. 4, p. 214, 2018.
[9] M. Furini, F. Geraci, M. Montangero, and M. Pellegrini, "STIMO: STIll and MOving video storyboard for the web scenario," Multimedia Tools and Applications, vol. 46, no. 1, p. 47, 2010.
[10] C. Liu, D. Wang, J. Zhu, and B. Zhang, "Learning a contextual multi-thread model for movie/TV scene segmentation," IEEE Transactions on Multimedia, vol. 15, no. 4, pp. 884-897, 2013.
[11] O. Küçüktunç, U. Güdükbay, and Ö. Ulusoy, "Fuzzy color histogram-based video segmentation," Computer Vision and Image Understanding, vol. 114, no. 1, pp. 125-134, 2010.
[12] I. A. Zedan, K. M. Elsayed, and E. Emary, "News videos segmentation using dominant colors representation," in Advances in Soft Computing and Machine Learning in Image Processing: Springer, 2018, pp. 89-109.
[13] X. Ling, O. Yuanxin, L. Huan, and X. Zhang, "A method for fast shot boundary detection based on SVM," in 2008 Congress on Image and Signal Processing, 2008, vol. 2, pp. 445-449: IEEE.
[14] X. Jiang, T. Sun, J. Liu, J. Chao, and W. Zhang, "An adaptive video shot segmentation scheme based on dual-detection model," Neurocomputing, vol. 116, pp. 102-111, 2013.
[15] K. Choroś, "Reduction of faulty detected shot cuts and cross dissolve effects in video segmentation process of different categories of digital videos," in Transactions on Computational Collective Intelligence V: Springer, 2011, pp. 124-139.
[16] Z. Cernekova, I. Pitas, and C. Nikou, "Information theory-based shot cut/fade detection and video summarization," IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 1, pp. 82-91, 2005.
[17] Y. Kawai, H. Sumiyoshi, and N. Yagi, "Shot boundary detection at TRECVID 2007," in TRECVID, 2007: Citeseer.
[18] J. Yuan et al., "A formal study of shot boundary detection," IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 2, pp. 168-186, 2007.
[19] L. Xue, C. Li, H. Li, and Z. Xiong, "A general method for shot boundary detection," in Proceedings of the International Conference on Multimedia and Ubiquitous Engineering, pp. 394-397, 2008.
[20] K. Wu, "Simple implementations of video segmentation, key frame extraction and browsing," 2011.
[21] H. H. Yu and W. Wolf, "A hierarchical multiresolution video shot transition detection scheme," Computer Vision and Image Understanding, vol. 75, no. 1/2, pp. 196-213, 1999.
[22] M.-S. Lee, Y.-M. Yang, and S.-W. Lee, "Automatic video parsing using shot boundary detection and camera operation analysis," Pattern Recognition, vol. 34, no. 3, pp. 711-719, 2001.
[23] C. Vora, B. K. Yadav, and S. Sengupta, "Comprehensive survey on shot boundary detection techniques," International Journal of Computer Applications, vol. 140, pp. 24-30, 2016.
[24] Kathiriya, Dhaval S. Pipalia, Gaurav B. Vasani, Alpesh J. Thesiya, and D. J. Varanva, "Χ2 (chi-square) based shot boundary detection and key frame extraction for video," International Journal of Engineering and Science, vol. 2, no. 2, pp. 17-21, 2013.
[25] A. Dailianas, R. B. Allen, and P. England, "Comparison of automatic video segmentation algorithms," in Integration Issues in Large Commercial Media Delivery Systems, 1996, vol. 2615, pp. 2-16: International Society for Optics and Photonics.
[26] S. Bhardwaj and A. Mittal, "A survey on various edge detector techniques," Procedia Technology, vol. 4, pp. 220-226, 2012.
[27] E. Nishani and B. Çiço, "Computer vision approaches based on deep learning and neural networks: Deep neural networks for video analysis of human pose estimation," in Proceedings of the 2017 6th Mediterranean Conference on Embedded Computing (MECO), Bar, Montenegro, 11-15 June 2017, pp. 1-4.
[28] N. J. Janwe and K. K. Bhoyar, "Video key-frame extraction using unsupervised clustering and mutual comparison," International Journal of Image Processing, vol. 10, no. 2, pp. 73-84, 2016.
[29] C. G. Chávez, F. Precioso, M. Cord, S. Phillip-Foliguet, and A. d. A. Araújo, "Shot boundary detection by a hierarchical supervised approach," pp. 197-200, 2007.
[30] J. Bi, X. Liu, and B. Lang, "A novel shot boundary detection based on information theory using SVM," International Congress on Image and Signal Processing, pp. 512-516, 2011.
[31] W. Tong, L. Song, X. Yang, H. Qu, and R. Xie, "CNN-based shot boundary detection and video annotation," in 2015 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting, 2015, pp. 1-5: IEEE.
[32] C. V. Sheena and N. Narayanan, "Key-frame extraction by analysis of histograms of video frames using statistical methods," Procedia Computer Science, vol. 70, pp. 36-40, 2015.


[33] M. Asim, N. Almaadeed, S. Al-Máadeed, A. Bouridane, and A. Beghdadi, "A key frame based video summarization using color features," in 2018 Colour and Visual Computing Symposium (CVCS), 2018, pp. 1-6: IEEE.
[34] S. Jadon and M. Jasim, "Video summarization," EasyChair 2516-2314, 2019.
[35] S. M. Tirupathamma, "Key frame based video summarization using frame difference," International Journal of Innovative Computer Science & Engineering, vol. 4, no. 03, pp. 160-165, 2017.
[36] P. Kaur and R. Kumar, "Analysis of video summarization techniques," International Journal for Research in Applied Science & Engineering Technology (IJRASET), vol. 6, no. 01, 2018.
[37] X. Li, B. Zhao, and X. Lu, "Key frame extraction in the summary space," IEEE Transactions on Cybernetics, vol. 48, no. 6, pp. 1923-1934, 2017.
[38] C. Huang and H. Wang, "A novel key-frames selection framework for comprehensive video summarization," IEEE Transactions on Circuits and Systems for Video Technology, 2018.
[39] J. Yuan, W. Wang, W. Yang, and M. Zhang, "Keyframe extraction using AdaBoost," in Proceedings 2014 IEEE International Conference on Security, Pattern Analysis, and Cybernetics (SPAC), 2014, pp. 91-94: IEEE.
[40] V. Benni, R. Dinesh, P. Punitha, and V. Rao, "Keyframe extraction and shot boundary detection using eigen values," International Journal of Information and Electronics Engineering, vol. 5, no. 1, p. 40, 2015.
[41] P. Jadhava and D. Jadhav, "Video summarization using higher order color moments," in Proceedings of the International Conference on Advanced Computing Technologies and Applications (ICACTA), 2015, vol. 45, pp. 275-281.
[42] B. Rashmi and H. Nagendraswamy, "Shot-based keyframe extraction using bitwise-XOR dissimilarity approach," in International Conference on Recent Trends in Image Processing and Pattern Recognition, 2016, pp. 305-316: Springer.
[43] A. S. Murugan, K. S. Devi, A. Sivaranjani, and P. Srinivasan, "A study on various methods used for video summarization and moving object detection for video surveillance applications," Multimedia Tools and Applications, vol. 77, no. 18, pp. 23273-23290, 2018.
