Computer Science and Information Technologies
Vol. 5, No. 2, July 2024, pp. 130~139
ISSN: 2722-3221, DOI: 10.11591/csit.v5i2.pp130-139
Journal homepage: http://iaesprime.com/index.php/csit
Video shot boundary detection based on frames objects
comparison and scale-invariant feature transform technique
Noor Khalid Ibrahim, Zinah Sadeq Abduljabbar
Department of Computer Science, College of Science, Mustansiriyah University, Baghdad, Iraq
Article Info ABSTRACT
Article history:
Received Dec 12, 2023
Revised Feb 24, 2024
Accepted Mar 4, 2024
Video is the most popular source of data on the Internet and carries a large
amount of information. Video structure analysis, which relies on content-based
video indexing and retrieval, aims to automate the management, indexing, and
retrieval of videos. Recognizing shot changes is essential to video analysis
because shot boundary detection (SBD) is a preliminary stage in the indexing,
browsing, and retrieval of video material. This work proposes an SBD system
with three stages. In the first stage, multiple frames are read in temporal
sequence and transformed into grayscale images. Redundant frames within the
same shot are then discarded based on a comparison of correlation values, which
reduces processing time and computational complexity. In the second stage, a
candidate transition is identified by comparing the objects of successive
frames and analyzing the differences between the objects using the standard
deviation metric. In the last stage, the cut transition is decided by matching
key points using the scale-invariant feature transform (SIFT). The proposed
system achieved an average F-score of 0.97 while reducing time consumption.
Keywords:
Frames correlation
Object comparison
Shot boundary
Video analysis
Video segmentation
This is an open access article under the CC BY-SA license.
Corresponding Author:
Noor Khalid Ibrahim
Department of Computer Science, College of Science, Mustansiriyah University
Baghdad, Iraq
Email: noor.kh20@uomustansiriyah.edu.iq
1. INTRODUCTION
The vast amount of video content on the internet makes it challenging to develop effective indexing
and search strategies for managing video data. Content-based video retrieval is emerging as a trend in video
retrieval systems, while conventional methods like video compression and summarizing aim for minimal
storage requirements and maximum visual and semantic accuracy [1]. Given that video is the most
sophisticated sort of multimedia data, it includes information about the target's mobility within the scene as
well as information about the objective world changing with time [2].
Video segmentation can broadly be divided into two tasks: video object
(foreground/background) segmentation and video semantic segmentation [3]. Video segmentation, also known
as shot boundary detection (SBD), involves breaking the video up into meaningful scenes so that the essential
feature(s) of each scene can be found through analysis [4]. A cut is a sudden change in the shot that takes place
inside a single frame. A fade is a gradual alteration in brightness that often begins or ends with a completely
dark frame. Frames inside the transition show one image overlaid on the other during a dissolve, which happens
as the images of the first shot go darker and the images of the second shot get brighter [1]. The primary
difficulties in shot boundary recognition are movements of the camera and objects since these can significantly
change the video content, producing an effect akin to transition effects and leading to inaccurate shot transition
detection [5].
Numerous studies have addressed video segmentation. Shao et al. [6] combined an HSV (hue, saturation,
value) color histogram with histogram of oriented gradients (HOG) features to detect abrupt shot changes
in videos. The work in [3] proposes a shot boundary detection approach based on
the scale-invariant feature transform (SIFT). Using a top-down search strategy, the first phase of this
approach compares the ratio of matched SIFT features for each RGB channel of the video frames to
locate transitions; this overview stage gives the locations of the boundaries. In the second phase, a moving
average is computed to determine the type of transition.
The research in [7] proposed a multi-modal visual features-based SBD framework in which the behavior
of the visual representation is analyzed with respect to the discontinuity signal. It uses a candidate segment
selection strategy that does not compute a threshold; instead, it uses the cumulative moving average of
the discontinuity signal to determine the shot boundary locations while disregarding non-boundary video frames.
Transition detection is then carried out structurally to differentiate between candidate segments that are
cut transitions and those that are gradual transitions, including fade in/out and logo occurrence.
In [8] the proposed temporal video segment representation formalizes video scenes as temporal
motion change data, determining motion modifications and cuts between scenes through optical flow character
changes. This reduces the issue to an optical flow-based cut detection problem, enhancing a pixel-based
representation. The proposed video segment representation divides temporal video segment points into cuts
and non-cuts.
The study in [9] proposed a video segmentation model based on the bag of visual words (BoVW) model,
which splits the video into shots and keyframes. The BoVW model is employed in two
variants: the traditional BoVW and an extension known as the vector of locally aggregated descriptors
(VLAD). Keyframe feature vectors inside a sliding window of length L are used to calculate similarity. The
study in [10] presents a feature fusion and clustering technique (FFCT) for video shot boundary
detection, which converts interval frames into grayscale images, extracts fingerprint and speeded-up
robust features (SURF), fuses them, and clusters them using the K-means algorithm. Linear discriminant
analysis (LDA) is introduced for cluster mapping, and features are chosen using density computation based
on frame correlation.
In [2], a novel algorithm for camera boundary detection based on SIFT features was introduced.
The proposed method involves the analysis of multiple frames of images in a sequential manner. Initially, the
images are converted into grayscale and divided into blocks. Subsequently, the dynamic texture of the film is
computed, and the correlation between the dynamic texture of adjacent frames and the matching degree of
SIFT features is determined. Based on these matching results, pre-detection outcomes are obtained.
Idan et al. [11] proposed a fast video processing method for SBD. To reduce computing costs and
disturbances, the proposed SBD framework makes use of candidate segment selection with frame active area
and separable moments. Inequality criteria and adaptive threshold are used to exclude non-transition frames
and maintain candidate segments. Cut transition detection is done using machine learning statistics.
A practical SBD method was presented in [12], which uses average edge information for
gradual transition detection and gradient and color information for abrupt transition detection. Processing only
transition regions yields an average edge frame and reduces computational complexity. The method proposed
in [5] comprises two distinct stages. In the initial stage, projection features were employed to differentiate
between non-boundary transitions and candidate transitions that potentially encompass abrupt boundaries.
Consequently, only the candidate transitions were retained for further analysis in the subsequent stage. This
approach enhances the speed of shot detection by narrowing the detection scope. An
effective SBD approach with several invariant properties was presented in [13]. With the right mix of
invariant features, such as edge change ratio (ECR), color layout descriptor (CLD), and scale-invariant feature
transform (SIFT) key point descriptors, the accuracy level of SBD was increased.
According to the literature, many approaches have been developed to address shot boundary
detection in videos, relying on a variety of techniques to handle the challenges of
SBD. The SBD system proposed here operates in three stages to improve performance and
reduce the effect of object and camera motion. In the first stage, redundant frames within the same
shot are discarded based on correlation value comparison, which reduces processing time and
computational complexity. In the second stage, candidate transitions are determined by comparing the objects
of sequential frames. In the final stage, the cut transition decision is made based on SIFT key point
matching. The proposed method aims to accurately find the boundary frame of a shot with a cut transition
between consecutive shots. The rest of the paper is organized as follows: section 2 explains the proposed
method, section 3 presents the experimental results and analysis, and section 4 concludes the paper.
2. SBD PROPOSED METHOD
The proposed SBD system operates in three stages. In the first stage, multiple frames are
read in temporal sequence and transformed into grayscale images, and the number of redundant frames
within the same shot is reduced based on correlation value comparison. In the second stage, a candidate
transition is identified by comparing the objects of successive frames, which are extracted using the
proposed frame object extraction method. In the last stage, the cut transition is decided by matching
key points using the SIFT approach. The details of these stages are explained as follows:
2.1. Reduces redundancy stage
As the first step, the frames of the input video are extracted, converted into grayscale, and
resized to 256×256. The frames are then pre-processed to improve their quality:
noise is removed by the Wiener filter [14], and contrast is enhanced by histogram equalization [15].
The resulting frames are normalized to the range [0, 1].
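The grayscale conversion and normalization steps can be sketched as follows. This is only an illustration under stated assumptions: the BT.601 luminance weights and the division by 255 are choices made here and are not specified in the paper, and resizing, Wiener filtering, and histogram equalization are omitted.

```python
def to_gray(rgb_frame):
    """Convert rows of (R, G, B) tuples to luminance.

    The BT.601 weights used here are an assumption, not taken from the paper.
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_frame]


def normalize(gray_frame, max_value=255.0):
    """Scale intensities to [0, 1] by dividing by the maximum code value (assumption)."""
    return [[v / max_value for v in row] for row in gray_frame]


# Hypothetical 2x2 RGB frame used only for illustration.
frame = [[(255, 255, 255), (0, 0, 0)],
         [(255, 0, 0), (0, 0, 255)]]
norm = normalize(to_gray(frame))
```

In practice an image processing library would handle these steps on full frames; the pure-Python form is used here only to keep the sketch self-contained.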
Consecutive frames within one shot are highly similar, and performing the SBD process on
every pair of frames would be very time-consuming and computationally complex. To reduce this time and
complexity, redundant frames within one shot are discarded based on their correlation
value. The correlation value (r) of frames Fr(i) and Fr(i+1) is compared with a threshold value (Th) identified
experimentally: if r is below Th, frame Fr(i) is passed to the next stage; otherwise, frame Fr(i) is discarded,
as shown in (1). The correlation value is calculated as in (2) [16].
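$$
Fr(i)\;
\begin{cases}
\text{passed to next stage}, & r < Th \\
\text{discarded}, & \text{otherwise}
\end{cases}
\tag{1}
$$

$$
r = \frac{\sum_i (x_i - x_m)(y_i - y_m)}{\sqrt{\sum_i (x_i - x_m)^2}\,\sqrt{\sum_i (y_i - y_m)^2}}
\tag{2}
$$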
where $x_i$ denotes the intensity of the ith pixel of the first image, $y_i$ denotes the intensity of the ith
pixel of the second image, and $x_m$ and $y_m$ are the mean intensities of the first and second images,
respectively.
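The rule in (1) and the correlation in (2) can be sketched as below. This is a minimal illustration, not the authors' implementation: frames are represented as flat lists of normalized intensities, the threshold value 0.9 is hypothetical (the paper determines Th experimentally and does not state it here), and treating a constant frame as uncorrelated is an assumption.

```python
import math


def pearson_correlation(x, y):
    """Correlation value r of equation (2) for two equally sized pixel lists."""
    x_m = sum(x) / len(x)
    y_m = sum(y) / len(y)
    num = sum((xi - x_m) * (yi - y_m) for xi, yi in zip(x, y))
    den = (math.sqrt(sum((xi - x_m) ** 2 for xi in x))
           * math.sqrt(sum((yi - y_m) ** 2 for yi in y)))
    # A constant frame has zero variance; returning 0.0 here is an assumption.
    return num / den if den else 0.0


def reduce_redundancy(frames, th):
    """Rule (1): return the indices i for which r(Fr(i), Fr(i+1)) < th."""
    return [i for i in range(len(frames) - 1)
            if pearson_correlation(frames[i], frames[i + 1]) < th]


# Hypothetical 4-pixel frames: frames 0-1 and 2-3 are near duplicates.
frames = [
    [0.10, 0.20, 0.80, 0.90],
    [0.10, 0.25, 0.80, 0.85],
    [0.90, 0.70, 0.20, 0.10],
    [0.85, 0.70, 0.20, 0.15],
]
kept = reduce_redundancy(frames, th=0.9)  # hypothetical threshold
```

Only the frame pair that straddles the content change falls below the threshold, so only that frame is passed on to the candidate selection stage.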
2.2. Selection of candidate transition stage
Candidate transition selection is based on a comparison of the objects of consecutive frames,
that is, of the frame image content. The image content is extracted using the proposed extraction
method shown in Figure 1. As seen in the figure, the objects of the frame are extracted
in two steps: generating the feature template and extracting the objects. These steps are detailed as
follows:
Figure 1. Frame objects extraction flowchart
2.2.1. First step (generate features template)
For each consecutive frame passed to this stage, a feature template is generated by extracting
multiple features from the frame image and combining them. These features must allow the objects
of a frame image to be extracted accurately. In the proposed extraction method, the first feature is
texture, which gives information about the local variability of pixel intensity values and is obtained
using the standard deviation (SD) filter [17] over the 3-by-3 neighborhood around the corresponding
pixel. The second feature is the luminance of the processed frames, represented by channel L* in the
L*a*b* color space [18]. The L*a*b* space typically depicts colors in a way that matches human vision.
Additionally, because the RGB representation includes a transition color between blue and green, the
L*a*b* color representation compensates for the unevenness in the color distribution of the RGB color
model [19]. For this reason, L*a*b* is used, and its L* value is taken as a feature. These two feature
matrices are then merged with the edges of the frame detected by the Canny operator, which can
recognize object boundaries in an image, to create the feature template. The SD is calculated as
follows [20]:
$$
\mu_j = \frac{1}{N} \sum_{i=1}^{N} x_{ji}
\tag{3}
$$

$$
\sigma_j = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_{ji} - \mu_j)^2}
\tag{4}
$$
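A sketch of the texture feature follows, applying (3) and (4) to the 3-by-3 neighborhood of every pixel. The border handling (edge replication by clamping indices) is an assumption, since the paper does not state how image borders are treated, and the small test images are hypothetical.

```python
import math


def std_filter_3x3(img):
    """Local standard deviation over each 3-by-3 neighborhood, per (3) and (4)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Assumption: out-of-range indices are clamped (edge replication).
            vals = [img[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            mu = sum(vals) / len(vals)                                        # equation (3)
            out[r][c] = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals))  # equation (4)
    return out


# Hypothetical 4x4 images: a flat region and a vertical step edge.
flat = [[0.5] * 4 for _ in range(4)]
edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
flat_sd = std_filter_3x3(flat)
edge_sd = std_filter_3x3(edge)
```

The flat image yields zero everywhere, while the step edge produces non-zero responses only in the two columns adjacent to the step, which is the local variability the texture feature is meant to capture.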
2.2.2. Second step (objects extracted)
The k-means algorithm [21] is applied to the created template to extract the objects from the
successive frames. K-means gathers the data into k groups and consists of two stages: in the first,
the centroids are initialized, and in the second, each data point is assigned to the cluster of its
closest centroid. Because of its ease of use and quick calculation, k-means clustering is widely used
in clustering processes [22], which is why it was chosen for this phase. Consequently, the frame image
objects are identified by the proposed extraction method using the generated feature template and the
k-means technique.
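The two k-means stages described above can be sketched on per-pixel feature vectors as follows. This is an illustrative sketch only: the feature values are hypothetical, the number of clusters (k=2) is assumed, and initializing the centroids from the first k points is a choice made here for determinism rather than something the paper specifies.

```python
def kmeans(points, k, iters=20):
    """Plain k-means: initialize centroids, then alternate assignment and update."""
    # Assumption: centroids start at the first k points (deterministic, not from the paper).
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment: each point joins the cluster of its closest centroid.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update: move each centroid to the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical per-pixel feature vectors: (texture SD, L* value, edge flag).
pixels = [
    (0.02, 0.10, 0.0), (0.03, 0.12, 0.0), (0.01, 0.11, 0.0),  # background-like
    (0.40, 0.80, 1.0), (0.38, 0.85, 1.0), (0.42, 0.78, 0.0),  # object-like
]
labels, centroids = kmeans(pixels, k=2)
```

Pixels sharing a label form one region of the object image; on this toy input the background-like and object-like pixels end up in separate clusters.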
The similarity of frames is measured by comparing their objects. The object images of consecutive
frames are each divided into 8×8 blocks, and the entropy value of each block is calculated. These
entropy values are arranged into a vector of length 64, which serves as the similarity measurement
vector, as shown in Figure 2. The standard deviation (sd) of the differences between the two
entropy vectors of the object images of consecutive frames is then calculated; when this value is close
to zero, a normal transition is identified. When sd exceeds a threshold (Thr) determined experimentally,
the frame becomes an abrupt transition candidate; otherwise, a normal transition is detected, as shown in (5).
The entropy value is determined as in (6) [23]. The candidate frames are then passed to the third stage,
where the abrupt transition decision is made.
Figure 2. Construct similarity measurement vector of object image
$$
Fr_i\;
\begin{cases}
\text{abrupt transition candidate}, & sd > Thr \\
\text{normal transition}, & \text{otherwise}
\end{cases}
\tag{5}
$$

Let $Fr_i$ represent the video frame with index $i$.
$$
H_r = -\sum g_r^{-k} \log_2\left(g_r^{-k}\right)
\tag{6}
$$

where $g_r^{-k}$ denotes the distribution of the assumed color space.
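The candidate selection rule in (5) can be sketched as follows: each object image is split into an 8×8 grid of blocks, a Shannon entropy is computed per block, and the standard deviation of the differences between the two 64-element vectors is compared with the threshold. The quantization into 8 intensity bins, the tiny 16×16 test images, and the threshold value are all assumptions made for this illustration.

```python
import math
from collections import Counter


def block_entropy(block, levels=8):
    """Shannon entropy (base 2) of the quantized values in one block."""
    # Assumption: intensities in [0, 1] are quantized into `levels` bins.
    counts = Counter(min(int(v * levels), levels - 1) for row in block for v in row)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_vector(img, grid=8):
    """Split an image into grid x grid blocks and return one entropy per block."""
    bh, bw = len(img) // grid, len(img[0]) // grid
    return [
        block_entropy([row[bx * bw:(bx + 1) * bw] for row in img[by * bh:(by + 1) * bh]])
        for by in range(grid) for bx in range(grid)
    ]


def transition_sd(img_a, img_b, grid=8):
    """Standard deviation of the element-wise differences of two entropy vectors."""
    diffs = [a - b for a, b in zip(entropy_vector(img_a, grid), entropy_vector(img_b, grid))]
    mu = sum(diffs) / len(diffs)
    return math.sqrt(sum((d - mu) ** 2 for d in diffs) / len(diffs))


def is_candidate(img_a, img_b, thr):
    """Rule (5): abrupt transition candidate when sd > thr."""
    return transition_sd(img_a, img_b) > thr


# Hypothetical 16x16 object images: a flat one and one with a textured left half.
flat = [[0.2] * 16 for _ in range(16)]
half = [[(0.9 if (r + c) % 2 else 0.1) if c < 8 else 0.2 for c in range(16)]
        for r in range(16)]
```

With a hypothetical threshold of 0.1, `is_candidate(flat, flat, 0.1)` is false and `is_candidate(flat, half, 0.1)` is true, because only the left half of the second image changes its block entropies.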
2.3. Transition decision stage
Dividing a video sequence into shots correctly depends largely on selecting an appropriate
method. The scale-invariant feature transform (SIFT) was introduced by Lowe [24]. SIFT features
are used in this stage to determine the frame transition and its boundary because, given an image as input,
the SIFT descriptor generates a large number of local feature vectors that are invariant to image scaling and
rotation. SIFT is capable of precisely correlating two images [13]. In an abrupt transition, the
matching degree of the SIFT features between neighboring frames is low, so the frames are recognized as
belonging to different shots; SIFT matching also helps to discern moving objects in successive frames.
3. EXPERIMENTAL RESULTS AND ANALYSIS
Eight distinct videos from the standard TRECVid 2001 test data, made available by the Open
Video Project at https://open-video.org, are used to assess the proposed method.
These videos are referred to as Vid1 through Vid8, and a description of them is
provided in Table 1. The ground truth is determined by human observation of abrupt changes. The
chosen videos contain a variety of disturbances, including lighting variations, viewpoint shifts, scaling,
zooming, rotation, and more.
Table 1. Description of input videos
Input video | Video name | Duration (s) | Number of frames | Abrupt transitions
Vid1 | Free-for-all race at Charter Oak Park (Historical) | 26 | 853 | 3
Vid2 | New Indians, Segment 101 (Documentary) | 131 | 3953 | 14
Vid3 | New Indians, Segment 01 (Documentary) | 56 | 1687 | 15
Vid4 | Winning Aerospace, Segment 02 (Documentary) | 65 | 1970 | 11
Vid5 | Hidden Fury, segment 10 (Documentary) | 33 | 1002 | 1
Vid6 | Hurricanes, Segment 05 (Documentary) | 115 | 3448 | 32
Vid7 | The Miracle of Water, segment 05 (Documentary) | 83 | 2314 | 1
Vid8 | Winning Aerospace, Segment 04 (Documentary) | 110 | 3318 | 18
3.1. Reduces redundancy stage
Frames extracted from the same shot of the input video are highly similar, and performing feature
extraction to extract objects from every frame image is time-consuming and computationally complex.
The redundancy reduction stage therefore reduces the execution time, as seen in Table 2 and Figure 3.
For instance, the execution time was 111.42 seconds when the second stage was applied to all of the
frames of Vid1, that is, without discarding similar frames, compared with 44.41 seconds when the
redundant frames of Vid1 were discarded first. The table lists the corresponding execution times for
the other videos.
3.2. Selection of candidate transition stage
Based on the motion of the object and/or the camera, shots may be categorized into four types: static
objects with static cameras, static objects with dynamic cameras, dynamic objects with static cameras, and
dynamic objects with dynamic cameras [25]. Candidate transition selection is based on a comparison
of the objects of consecutive frames. The objects are extracted from the frame images by applying the
k-means technique to the feature template, which combines texture, edges, and the L*
value of the L*a*b* color space.
After frames Fri and Fri+1 pass the first stage based on their correlation value, this stage selects
potential cut transition frames by examining the standard deviation of the differences between the
vectors created from the frame object blocks. Table 3 and Figure 4 show how the block
size of the frame object image was determined empirically: Figure 4(a) shows the effect of block size on
execution time and Figure 4(b) shows its effect on F-score. Vid3, Vid4, and Vid8 are
taken as examples in the table to demonstrate that block size affects execution time and accuracy. Based on
these results, 8×8 blocks (a block size of 32×32) are more appropriate for this study.
Table 2. Execution time comparison
Video | Execution time in sec. (with reduction) | Execution time in sec. (without reduction)
Vid1 | 44.41 | 111.42
Vid2 | 224.94 | 785.83
Vid3 | 99.75 | 314.75
Vid4 | 129.07 | 363.44
Vid5 | 70.178 | 177.52
Vid6 | 320.34 | 566.40
Vid7 | 111.47 | 421.15
Vid8 | 201.23 | 679.02
Figure 3. Comparison in execution time
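The relative time saving implied by Table 2 can be worked out directly from the reported execution times. The sketch below uses only the values in the table; the derived percentages are computed here and are not figures quoted by the authors.

```python
# Execution times in seconds from Table 2: (with reduction, without reduction).
times = {
    "Vid1": (44.41, 111.42),
    "Vid2": (224.94, 785.83),
    "Vid3": (99.75, 314.75),
    "Vid4": (129.07, 363.44),
    "Vid5": (70.178, 177.52),
    "Vid6": (320.34, 566.40),
    "Vid7": (111.47, 421.15),
    "Vid8": (201.23, 679.02),
}

# Percentage of execution time saved by discarding redundant frames.
saving = {vid: 100.0 * (without - with_red) / without
          for vid, (with_red, without) in times.items()}

for vid, pct in saving.items():
    print(f"{vid}: {pct:.1f}% less time with redundancy reduction")
```

On these values the saving ranges from roughly 43% (Vid6) to roughly 74% (Vid7).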
Figure 4. Block size effect, (a) on execution time and (b) on F-score
Table 3. The effect of block size within 256*256 frame size
Video | 4*4 blocks (64*64 block size): execution time in sec. | 4*4 blocks: F-score | 8*8 blocks (32*32 block size): execution time in sec. | 8*8 blocks: F-score
Vid3 | 183.58 | 0.95 | 99.75 | 0.96
Vid4 | 148.92 | 0.90 | 129.07 | 1.00
Vid8 | 311.36 | 0.90 | 201.23 | 0.97
The frame object extraction is illustrated using the sample frames shown in Figure 5; the steps of the
extraction method for these frames are shown in Figure 6. The combined features
(texture, frame edges, and the L* value of the L*a*b* color space) recovered from frames i and i+1 form
the feature template for each frame. The frame objects are then extracted using the k-means
approach for the frame similarity comparison. If identical objects are found in two consecutive frames,
the frames likely belong to the same shot; if not, a cut transition is possible. Similarity discovery based
on object comparison can address the significant problem of object and camera movement because the
frame object is recognized where it is expected to be in the image of succeeding frames.
Figure 5. Transition examples
Figure 6. Example of consecutive frames objects extracted
The proposed object extraction method was assessed before being adopted in the proposed SBD
algorithm. Table 4 and Figure 7 report the information content, measured by the entropy value, as an
indication of the accuracy of the objects extracted by the proposed method. Frames from several of the
videos used were selected as samples for this evaluation. Based on the analysis of these results, the
proposed object extraction operation was adopted in this stage of the proposed SBD algorithm.
Figure 7. Object extraction accuracy using entropy
Table 4. Object extraction evaluation using entropy measure (Ent)
Vid2 Vid3 Vid5 Vid6 Vid7 Vid8
Fr. 397 398 262 263 432 432 82 83 568 569 1375 1376 517
Ent 0.926 0.634 0.890 0.985 0.660 0.969 0.998 0.989 0.968 0.992 0.979 0.956 0.884
3.3. Transition decision stage
SIFT features are adopted in this stage for the shot transition decision because they are unaffected
by rotation and zoom, can efficiently reflect the local variation of moving objects, and can be used to
characterize the image objectively [2]. SIFT key points are detected and features are extracted from the
candidate frames produced by the previous stage, and then feature matching is performed. In feature
matching, the two feature matrices of frame i and frame i+1 are matched using a distance calculation,
which results in a p-by-1 vector, where p is the number of detected key points.
The shot transition decision is made from the matched features: when the matching degree of the SIFT
features between the frames is low, neighboring frames are recognized as belonging to different shots.
Figure 8(a) shows key point matching for frames in the same shot, and Figure 8(b) for frames from
different shots.
Figure 8. Frames shots feature key points matching, (a) frames in the same shot and (b) frames in a
different shot
As seen in the figure, due to comparable visual features, the similarity matching between two frames
in the same shot is typically high. Frames from diverse shots, however, lack visual uniformity. They therefore
have either little or no similarity matching.
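The decision logic can be sketched with a simple nearest-neighbor matcher over descriptor vectors. This is a hedged illustration rather than the authors' procedure: the descriptors below are hypothetical 4-dimensional stand-ins for real SIFT descriptors, and both the distance limit `max_dist` and the decision threshold `min_degree` are assumed values that the paper does not give.

```python
import math


def match_degree(desc_a, desc_b, max_dist=0.5):
    """Fraction of descriptors in desc_a whose nearest descriptor in desc_b is close.

    `max_dist` is a hypothetical distance limit, not a value from the paper.
    """
    if not desc_a or not desc_b:
        return 0.0
    matched = 0
    for d in desc_a:
        nearest = min(math.dist(d, e) for e in desc_b)  # Euclidean distance
        if nearest <= max_dist:
            matched += 1
    return matched / len(desc_a)


def is_cut(desc_a, desc_b, min_degree=0.3):
    """Declare a cut when the matching degree is low (`min_degree` is hypothetical)."""
    return match_degree(desc_a, desc_b) < min_degree


# Hypothetical descriptors for frame i, a frame from the same shot, and a frame after a cut.
frame_i = [(0.1, 0.9, 0.3, 0.5), (0.8, 0.2, 0.6, 0.4), (0.5, 0.5, 0.1, 0.9)]
same_shot = [(0.12, 0.88, 0.31, 0.5), (0.79, 0.22, 0.6, 0.41), (0.9, 0.9, 0.9, 0.9)]
after_cut = [(3.0, 3.0, 3.0, 3.0), (4.0, 1.0, 5.0, 2.0)]
```

On this toy data two of the three descriptors of frame i find a close match in the same-shot frame, while none match the frame after the cut, so only the latter pair is declared a cut.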
Recall and precision are the key performance metrics typically employed in the SBD process. The F1
score, which is the harmonic mean of precision and recall, is used in
this paper's evaluation along with these metrics [2]. The metrics are computed as
follows [5]:
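$$
R = \frac{\mathit{true}}{\mathit{true} + \mathit{miss}}
\tag{7}
$$

$$
P = \frac{\mathit{true}}{\mathit{true} + \mathit{false}}
\tag{8}
$$

$$
F\text{-}score = \frac{2 \cdot P \cdot R}{P + R}
\tag{9}
$$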
where true denotes the number of correctly detected transitions, false the number of incorrectly detected
transitions, and miss the number of missed transitions. Table 5 reports the accuracy of the proposed
SBD algorithm in terms of these metrics.
Table 5. Efficiency of the proposed method
Video Recall Precision F-score
Vid1 1.00 1.00 1.00
Vid2 1.00 1.00 1.00
Vid3 0.93 1.00 0.96
Vid4 1.00 1.00 1.00
Vid5 1.00 1.00 1.00
Vid6 0.87 0.96 0.91
Vid7 1.00 1.00 1.00
Vid8 0.94 0.94 0.94
Average 0.96 0.98 0.97
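Equations (7) to (9) can be evaluated as in the short sketch below. The counts passed in the example are hypothetical and chosen only to show the arithmetic; the per-video counts behind Table 5 are not listed in the paper.

```python
def sbd_metrics(true, false, miss):
    """Recall (7), precision (8), and F-score (9) from detection counts."""
    recall = true / (true + miss) if (true + miss) else 0.0
    precision = true / (true + false) if (true + false) else 0.0
    if precision + recall == 0:
        return recall, precision, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return recall, precision, f_score


# Hypothetical counts: 14 correct detections, no false detections, 1 missed transition.
recall, precision, f_score = sbd_metrics(true=14, false=0, miss=1)
print(f"R={recall:.4f} P={precision:.4f} F={f_score:.4f}")
```

With these counts the recall is 14/15, the precision is 1, and the F-score is 28/29, showing how a single missed cut lowers the F-score even when no false detections occur.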
4. CONCLUSION
The proposed SBD approach combines the comparison of frame image objects with scale-invariant
feature transform (SIFT) features and discards the redundant frames of the same shot. The system is
implemented in three stages. First, redundant frames are discarded based on
their correlation value, which reduces computational complexity and time consumption. Second, candidate
shot transitions and boundaries are identified based on object comparison using the proposed extraction
method; this stage can identify objects where they are expected to be in the image of subsequent frames.
Finally, SIFT features are used to decide which of these candidate frames are cut transitions. The results
indicate that this approach reduces false positives by using SIFT key point matching, which is independent
of the scale and rotation of the image. The method yields a 97% F1 score while requiring less
time and complexity.
ACKNOWLEDGEMENTS
The authors thank the Department of Computer Science, College of Science, Mustansiriyah University,
Baghdad, Iraq, for supporting the present work.
REFERENCES
[1] Z. El Khattabi, Y. Tabii, and A. Benkaddour, “Video shot boundary detection using the scale invariant feature transform and RGB
color channels,” International Journal of Electrical & Computer Engineering (2088-8708), vol. 7, no. 5, 2017.
[2] L. Kong, “SIFT feature-based video camera boundary detection algorithm,” Complexity, vol. 2021, pp. 1–11, 2021.
[3] T. Zhou, F. Porikli, D. J. Crandall, L. Van Gool, and W. Wang, “A survey on deep learning technique for video segmentation,”
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 6, pp. 7099–7122, 2022.
[4] D. M. Thounaojam, T. Khelchandra, K. M. Singh, and S. Roy, “A genetic algorithm and fuzzy logic approach for video shot
boundary detection,” Computational intelligence and neuroscience, vol. 2016, 2016.
[5] E. Hato, “Temporal video segmentation using optical flow estimation,” Iraqi Journal of Science, pp. 4181–4194, 2021.
[6] H. Shao, Y. Qu, and W. Cui, “Shot boundary detection algorithm based on HSV histogram and HOG feature,” in 2015 International
Conference on Advanced Engineering Materials and Technology, Atlantis Press, pp. 951–957, 2015.
[7] S. Tippaya, S. Sitjongsataporn, T. Tan, M. M. Khan, and K. Chamnongthai, “Multi-modal visual features-based video shot boundary
detection,” IEEE Access, vol. 5, pp. 12563–12575, 2017, doi: 10.1109/ACCESS.2017.2717998.
[8] S. Akpinar and F. Alpaslan, “A novel optical flow-based representation for temporal video segmentation,” Turkish Journal of
Electrical Engineering and Computer Sciences, vol. 25, no. 5, pp. 3983–3993, 2017.
[9] M. Haroon, J. Baber, I. Ullah, S. M. Daudpota, M. Bakhtyar, and V. Devi, “Video scene detection using compact bag of visual word
models,” Advances in Multimedia, vol. 2018, pp. 1–9, 2018.
[10] F.-F. Duan and F. Meng, “Video shot boundary detection based on feature fusion and clustering technique,” IEEE Access, vol. 8,
pp. 214633–214645, 2020.
[11] Z. N. Idan, S. H. Abdulhussain, B. M. Mahmmod, K. A. Al-Utaibi, S. A. R. Al-Hadad, and S. M. Sait, “Fast shot boundary detection
based on separable moments and support vector machine,” IEEE Access, vol. 9, pp. 106412–106427, 2021.
[12] N. Kumar, “Shot boundary detection framework for video editing via adaptive thresholds and gradual curve point,” Turkish Journal
of Computer and Mathematics Education (TURCOMAT), vol. 12, no. 11, pp. 3820–3828, 2021.
[13] J. T. Jose, S. Rajkumar, M. R. Ghalib, A. Shankar, P. Sharma, and M. R. Khosravi, “Efficient shot boundary detection with multiple
visual representations,” Mobile Information Systems, vol. 2022, 2022.
[14] K. A. Akintoye, N. A. F. B. Ismial, N. Z. S. B. Othman, M. S. M. Rahim, and A. H. Abdullah, “Composite median Wiener filter
based technique for image enhancement,” Journal of Theoretical & Applied Information Technology, vol. 96, no. 15, 2018.
[15] S. H. Majeed and N. A. M. Isa, “Adaptive entropy index histogram equalization for poor contrast images,” IEEE Access, vol. 9, pp.
6402–6437, 2020, doi: 10.1109/ACCESS.2020.3048148.
[16] A. M. Neto, A. C. Victorino, I. Fantoni, D. E. Zampieri, J. V. Ferreira, and D. A. Lima, “Image processing using Pearson’s
correlation coefficient: Applications on autonomous robotics,” in 2013 13th International Conference on Autonomous Robot
Systems, IEEE, pp. 1–6, 2013.
[17] N. K. Ibrahim, A. H. Al-Saleh, and A. S. A. Jabar, “Texture and pixel intensity characterization-based image segmentation with
morphology and watershed techniques,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 31, no. 3, pp.
1464–1477, 2023. doi: 10.11591/ijeecs.v31.i3.
[18] N. khalid, “Hybrid features of mask generated with gabor filter for texture analysis and sobel operator for image regions
segmentation using K-Means technique,” Journal La Multiapp, vol. 3, no. 5, pp. 250–258, 2022, doi:
10.37899/journallamultiapp.v3i5.743.
[19] X. Zheng, Q. Lei, R. Yao, Y. Gong, and Q. Yin, “Image segmentation based on adaptive K-means algorithm,” EURASIP Journal
on Image and Video Processing, vol. 2018, no. 1, pp. 1–10, 2018.
[20] U. Petronas, “Mean and standard deviation features of color histogram using laplacian filter for content-based image retrieval,”
Journal of Theoretical and Applied Information Technology, vol. 34, no. 1, pp. 1–7, 2011.
[21] R. Sammouda and A. El-Zaart, “An optimized approach for prostate image segmentation using K-means clustering algorithm with
elbow method,” Computational Intelligence and Neuroscience, vol. 2021, 2021.
[22] N. Dhanachandra and Y. J. Chanu, “A new approach of image segmentation method using K-means and kernel based subtractive
clustering methods,” International Journal of Applied Engineering Research, vol. 12, no. 20, pp. 10458–10464, 2017.
[23] N. M. Kwok, Q. P. Ha, and G. Fang, “Effect of color space on color image segmentation,” in 2009 2nd International Congress on
Image and Signal Processing, IEEE, pp. 1–5, 2009.
[24] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60,
no. 2, pp. 91–110, 2004.
[25] S. H. Abdulhussain, A. R. Ramli, M. I. Saripan, B. M. Mahmmod, S. A. R. Al-Haddad, and W. A. Jassim, “Methods and challenges
in shot boundary detection: a review,” Entropy, vol. 20, no. 4, p. 214, 2018.
BIOGRAPHIES OF AUTHORS
Noor Khalid Ibrahim is a lecturer at the College of Science, Mustansiriyah
University, Iraq. She received the B.Sc. degree in computer science from the Department of
Computer, College of Science, Mustansiriyah University, Iraq. She received a master's degree in
computer science in 2015, with specialization in multimedia. Her research area is image
processing. She can be contacted at email: noor.kh20@uomustansiriyah.edu.iq.
Zinah Sadeq Abduljabbar is a lecturer at the College of Science, Mustansiriyah
University, Iraq. She received the B.Sc. degree in computer science from the Department of Computer Science,
College of Science, Mustansiriyah University, Iraq. She received a master's degree in computer
science in 2014, with a specialization in multimedia. She can be contacted at email:
zinahsadeq@uomustansiriyah.edu.iq.

Video shot boundary detection based on frames objects comparison and scale-invariant feature transform technique

Computer Science and Information Technologies
Vol. 5, No. 2, July 2024, pp. 130~139
ISSN: 2722-3221, DOI: 10.11591/csit.v5i2.pp130-139
Journal homepage: http://iaesprime.com/index.php/csit
Noor Khalid Ibrahim, Zinah Sadeq Abduljabbar
Department of Computer Science, College of Science, Mustansiriyah University, Baghdad, Iraq
Article history: Received Dec 12, 2023; Revised Feb 24, 2024; Accepted Mar 4, 2024

ABSTRACT
Video is the most popular source of data on the Internet and carries a large amount of information. Automating the administration, indexing, and retrieval of movies is the goal of video structure analysis, which uses content-based video indexing and retrieval. Video analysis requires the ability to recognize shot changes, since video shot boundary recognition is a preliminary stage in the indexing, browsing, and retrieval of video material. This work proposes a shot boundary detection (SBD) system with three stages. In the first stage, multiple images are read in temporal sequence and transformed into grayscale images. Based on a correlation value comparison, the number of redundant frames within the same shot is decreased, which reduces the time and computational complexity of the later stages. In the second stage, a candidate transition is identified by comparing the objects of successive frames and analyzing the differences between the objects using the standard deviation metric. In the last stage, the cut transition is decided by matching key points using the scale-invariant feature transform (SIFT). The proposed system achieved an F-score of 0.97 while minimizing time consumption.
Keywords: Frames correlation; Object comparison; Shot boundary; Video analysis; Video segmentation
This is an open access article under the CC BY-SA license.
Corresponding Author: Noor Khalid Ibrahim, Department of Computer Science, College of Science, Mustansiriyah University, Baghdad, Iraq. Email: noor.kh20@uomustansiriyah.edu.iq

1. INTRODUCTION
The vast amount of video content on the internet makes it challenging to develop effective indexing and search strategies for managing video data. Content-based video retrieval is emerging as a trend in video retrieval systems, while conventional methods such as video compression and summarization aim for minimal storage requirements and maximum visual and semantic accuracy [1]. Video is the most sophisticated sort of multimedia data: it includes information about the target's mobility within the scene as well as information about how the objective world changes with time [2]. Video segmentation can be roughly divided into two modules: video object (foreground/background) segmentation and video semantic segmentation [3]. Video segmentation, also known as shot boundary detection (SBD), involves breaking the video up into meaningful scenes so that the essential feature(s) of each scene can be found through analysis [4]. A cut is a sudden change in the shot that takes place inside a single frame. A fade is a gradual alteration in brightness that often begins or ends with a completely dark frame. During a dissolve, the frames inside the transition show one image overlaid on the other, as the images of the first shot grow darker and the images of the second shot grow brighter [1]. The primary
difficulties in shot boundary recognition are camera and object movements, since these can significantly change the video content, producing an effect akin to a transition and leading to inaccurate shot transition detection [5].

Numerous studies have addressed video segmentation. Shao et al. [6] combined a hue, saturation, value (HSV) color histogram with histogram of oriented gradients (HOG) features to detect abrupt shot changes in videos. The work in [3] proposes a shot boundary detection approach based on the scale-invariant feature transform (SIFT). Using a top-down search strategy, the initial phase of this approach locates transitions by comparing the ratio of matched SIFT features for each RGB channel of the video frames. The locations of the boundaries are shown in the overview stage. Second, a moving average is computed to determine the type of transition. The research in [7] used a multi-modal visual features-based SBD framework in which the behavior of the visual representation is analyzed with respect to the discontinuity signal. It uses a candidate segment selection strategy that does not compute a threshold; instead, it uses the cumulative moving average of the discontinuity signal to determine the shot boundary locations while disregarding non-boundary video frames. Transition detection is carried out structurally to differentiate between a candidate segment that is a cut transition and one that is a gradual transition, including fade in/out and logo occurrence. In [8], the proposed temporal video segment representation formalizes video scenes as temporal motion change data, determining motion changes and cuts between scenes through changes in optical flow characteristics.
This reduces the issue to an optical flow-based cut detection problem, enhancing a pixel-based representation. The proposed video segment representation divides temporal video segment points into cuts and non-cuts. In [9], the suggested video segmentation model is based on the bag of visual words (BoVW) model, which splits the video into shots and keyframes. The BoVW model is employed in two variants: the traditional BoVW and an extension known as the vector of linearly aggregated descriptors (VLAD). Keyframe feature vectors inside a sliding window of length L are used to calculate similarity. The study in [10] presents a feature fusion and clustering technique (FFCT) for video shot boundary detection, which converts interval frames into grayscale images, extracts fingerprint and speeded-up robust features, fuses them, and clusters them using a K-means algorithm. Linear discriminant analysis (LDA) is introduced for cluster mapping, and features are chosen using density computation based on frame correlation. In [2], a novel algorithm for camera boundary detection based on SIFT features was introduced. The method analyzes multiple frames sequentially. Initially, the images are converted into grayscale and divided into blocks. Subsequently, the dynamic texture of the film is computed, and the correlation between the dynamic texture of adjacent frames and the matching degree of SIFT features is determined. Pre-detection outcomes are obtained from these matching results. Idan et al. [11] proposed a fast video processing method for SBD. To reduce computing costs and disturbances, their SBD framework uses candidate segment selection with the frame active area and separable moments. Inequality criteria and an adaptive threshold are used to exclude non-transition frames and retain candidate segments.
Cut transition detection is done using machine learning statistics. In [12], a practical SBD method was presented that uses average edge information for gradual transition detection and gradient and color information for abrupt transition detection. Processing only the transition regions yields an average edge frame and reduces computational complexity. The method proposed in [5] comprises two distinct stages. In the initial stage, projection features are employed to differentiate between non-boundary transitions and candidate transitions that potentially contain abrupt boundaries. Only the candidate transitions are retained for further analysis in the subsequent stage. This approach speeds up shot detection by narrowing the detection scope. In [13], an effective SBD approach with several invariant properties was presented. With the right mix of invariant features, such as the edge change ratio (ECR), color layout descriptor (CLD), and SIFT key point descriptors, the accuracy of SBD was increased.

According to the literature, many applications have been created to address shot boundary detection in videos, using various techniques to handle the challenges of SBD. The proposed SBD system works in three stages to improve performance and reduce the problem of object and camera motion. In the first stage, redundant frames within the same shot are reduced based on a correlation value comparison, which lowers time consumption and computational complexity. In the second stage, a candidate transition is determined by comparing the objects of sequential frames. In the final stage, the cut transition decision is made based on SIFT key point matching. The proposed method aims to accurately find the boundary frame of a shot with a cut transition between consecutive shots.
The rest of the paper is organized as follows: section 2 explains the proposed method, section 3 presents the experimental results and analysis, and section 4 concludes the paper.
2. SBD PROPOSED METHOD
The proposed SBD system works in three stages. In the first stage, multiple images are read in temporal sequence and transformed into grayscale images, and the number of redundant frames within the same shot is decreased based on a correlation value comparison. In the second stage, a candidate transition is identified by comparing the objects of successive frames, using the proposed method to extract frame image objects. In the last stage, the cut transition is decided by matching key points using the SIFT approach. The details of these stages are explained as follows.

2.1. Reduces redundancy stage
As the first step, the frames of the input video are extracted, converted into grayscale, and resized to 256*256. Pre-processing operations are applied to these frames to improve their quality: noise is removed with the Wiener filter [14], and contrast is enhanced by histogram equalization [15]. The resulting frames are normalized to the range [0, 1]. Within one shot, consecutive frames are highly similar, and running the SBD process on every pair of frames would be very time-consuming and computationally complex. To minimize this time and complexity, the redundant frames within one shot are reduced based on their correlation value. The correlation value (r) of frames Fr(i) and Fr(i+1) is computed and compared with an experimentally determined threshold (Th): if r is below Th, frame Fr(i) is passed to the next stage; otherwise, frame Fr(i) is discarded, as shown in (1). The correlation value is calculated as in (2) [16].
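The filtering rule described above, formalized in (1) and (2), can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: frames are nested lists of grayscale values, the function names `pearson_correlation` and `reduce_redundancy` are hypothetical, and the threshold passed in is a placeholder rather than the experimentally determined Th of the paper.

```python
import math

def pearson_correlation(frame_a, frame_b):
    """Pearson correlation r between two equally sized grayscale frames."""
    xs = [p for row in frame_a for p in row]
    ys = [p for row in frame_b for p in row]
    if not xs or len(xs) != len(ys):
        raise ValueError("frames must be non-empty and of equal size")
    x_m = sum(xs) / len(xs)
    y_m = sum(ys) / len(ys)
    num = sum((x - x_m) * (y - y_m) for x, y in zip(xs, ys))
    den = (math.sqrt(sum((x - x_m) ** 2 for x in xs))
           * math.sqrt(sum((y - y_m) ** 2 for y in ys)))
    # Assumption: a constant frame (zero variance) is treated as uncorrelated.
    return num / den if den > 0 else 0.0

def reduce_redundancy(frames, th):
    """Indices i of frames Fr(i) that pass to the next stage, i.e. r < th."""
    kept = []
    # The last frame has no successor, so it is not tested here.
    for i in range(len(frames) - 1):
        if pearson_correlation(frames[i], frames[i + 1]) < th:
            kept.append(i)
    return kept

# Illustrative data only: b is a brightened copy of a, c reverses the pattern.
a = [[0.1, 0.2], [0.3, 0.4]]
b = [[0.2, 0.3], [0.4, 0.5]]
c = [[0.4, 0.3], [0.2, 0.1]]
print(reduce_redundancy([a, b, c], th=0.9))
```

With these toy frames, the pair (a, b) is almost perfectly correlated and is treated as redundant, while the pair (b, c) is strongly anti-correlated, so only index 1 is kept.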
$$Fr(i) = \begin{cases} \text{passed to next stage}, & r < Th \\ \text{discarded}, & \text{otherwise} \end{cases} \qquad (1)$$

$$r = \frac{\sum_i (x_i - x_m)(y_i - y_m)}{\sqrt{\sum_i (x_i - x_m)^2}\,\sqrt{\sum_i (y_i - y_m)^2}} \qquad (2)$$

where $x_i$ denotes the intensity of the i-th pixel of the first image, $y_i$ denotes the intensity of the i-th pixel of the second image, and $x_m$ and $y_m$ are the mean intensities of the first and second images, respectively.

2.2. Selection of candidate transition stage
Candidate transition selection is based on a comparison of the objects of consecutive frames, that is, on the frame image content. The image content is extracted with the proposed extraction method shown in Figure 1. As seen in the figure, the objects of a frame are extracted in two steps: generating the feature template and extracting the objects. These steps are detailed as follows.

Figure 1. Frame objects extraction flowchart

2.2.1. First step (generate features template)
For each consecutive frame passed to this stage, a feature template is generated by extracting and combining multiple features from the frame image. The selected features must be able to extract the objects of a frame image accurately. In the proposed extraction method, the first feature is the texture characteristic, which captures the local variability of the pixel intensity values and is obtained with the standard deviation (SD) filter [17] over the 3-by-3 neighborhood around the corresponding pixel. The grayscale luminance value of these processed frames
is represented by the L* channel of the L*a*b* color space [18] and is used as the second feature. The L*a*b* space typically appears able to depict colors as they are perceived by human vision. Additionally, because the RGB representation includes a transition color between blue and green, the L*a*b* color representation compensates for the unevenness of the color distribution in the RGB color model [19]. For this reason, L*a*b* is taken into account through its L* value. These two feature matrices are then merged with the frame edges detected by the Canny operator, which is able to recognize object boundaries in an image, to create the feature template. The SD is calculated as follows [20]:

$$\mu_j = \frac{1}{N} \sum_{i=1}^{N} x_{ji} \qquad (3)$$

$$\sigma_j = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_{ji} - \mu_j)^2} \qquad (4)$$

2.2.2. Second step (objects extracted)
The k-means algorithm [21] is applied to the created template to extract the objects from the successive frames. K-means groups the data into k clusters and consists of two stages: in the first, the centroids are initialized, and in the second, each data point is assigned to the cluster of its closest centroid. Because of its ease of use and quick calculation, the k-means clustering approach is widely used in clustering processes [22], which is why it was chosen for this phase. Consequently, the frame image objects are identified by the proposed extraction method using the generated feature template and the k-means technique.
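The two steps above can be illustrated with a simplified Python sketch. It is not the authors' implementation: it builds the template from only two features per pixel (the local standard deviation of (3) and (4), and the pixel intensity standing in for the L* value), leaves out the Canny edge map, and uses a plain Lloyd's k-means with a deterministic initialization chosen here for reproducibility. All function names are hypothetical.

```python
import math

def local_std(image, radius=1):
    """Standard deviation of each pixel's neighbourhood (3x3 for radius=1).
    Border pixels use only the neighbours that lie inside the image."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            vals = [image[rr][cc]
                    for rr in range(max(0, r - radius), min(h, r + radius + 1))
                    for cc in range(max(0, c - radius), min(w, c + radius + 1))]
            mu = sum(vals) / len(vals)
            out[r][c] = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals))
    return out

def kmeans(points, k=2, iters=20):
    """Plain Lloyd's k-means on feature tuples; returns one label per point."""
    # Assumption: centroids start at evenly spaced points after ordering the
    # points by the sum of their features (a deterministic, ad hoc choice).
    ordered = sorted(points, key=sum)
    centroids = [ordered[(len(ordered) - 1) * j // max(k - 1, 1)] for j in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels

def extract_object_mask(image, k=2):
    """Cluster per-pixel (texture, intensity) features into k regions."""
    sd = local_std(image)
    h, w = len(image), len(image[0])
    points = [(sd[r][c], image[r][c]) for r in range(h) for c in range(w)]
    labels = kmeans(points, k)
    return [labels[r * w:(r + 1) * w] for r in range(h)]

# Toy frame: dark left half, bright right half.
frame = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(4)]
for row in extract_object_mask(frame):
    print(row)
```

On this toy frame the two clusters coincide with the dark and bright halves. On real frames, the result depends strongly on the choice of k, on feature scaling, and on the initialization, none of which the sketch tunes.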
The similarity of frames is measured by comparing their objects. The object images of consecutive frames are divided into 8×8 blocks, and the entropy value of each block is calculated. These entropy values are arranged into a vector of length 64, which serves as the similarity measurement vector, as explained in Figure 2. The standard deviation of the differences between the two entropy vectors of the object images of consecutive frames is then calculated. When the standard deviation is close to zero, a normal transition is identified. According to an experimentally determined threshold (Thr), the frame is selected as an abrupt transition candidate; otherwise, a normal transition is detected, as shown in (5). The entropy value is determined as in (6) [23]. The candidate frames are then passed to the third stage, where the abrupt transition decision is made.

Figure 2. Construct similarity measurement vector of object image

$$Fr_i = \begin{cases} \text{abrupt transition candidate}, & sd > Thr \\ \text{normal transition}, & \text{otherwise} \end{cases} \qquad (5)$$

where $Fr_i$ represents the video frame with index i.
$$H_r = -\sum_{k} g_r^{-k} \log_2\left(g_r^{-k}\right) \qquad (6)$$

where $g_r^{-k}$ denotes the distribution of the assumed color space.

2.3. Transition decision stage
Dividing a video sequence into shots correctly depends mostly on selecting the right method. Lowe introduced the scale-invariant feature transform (SIFT) [24]. SIFT features are used in this stage to determine the frame transition and its boundary because, given an image as input, the SIFT descriptor generates a wide range of local feature vectors that are independent of image scaling and rotation. SIFT is capable of precisely correlating two images [13]. In abrupt transitions, when the matching degree of the SIFT features between frames is low, neighboring frames are recognized as belonging to different shots, which helps to better discern moving objects in successive frames.

3. EXPERIMENTAL RESULTS AND ANALYSIS
Eight distinct videos from the standard TRECVid 2001 test data, made available by the Open Video Project at https://open-video.org, are used to assess the suggested method. These videos are referred to as Vid1 through Vid8. A comprehensive description of the input videos is provided in Table 1. The ground truth is determined from abrupt changes as observed by people. The chosen videos contain a variety of disturbances, including lighting variations, viewpoint shifts, scaling, zooming, rotation, and more.

Table 1. Description of input videos
Input video  Video name                                          Duration (s)  Frames  Abrupt transitions
Vid1         Free-for-all race at Charter Oak Park (Historical)  26            853     3
Vid2         New Indians, Segment 101 (Documentary)              131           3953    14
Vid3         New Indians, Segment 01 (Documentary)               56            1687    15
Vid4         Winning Aerospace, Segment 02 (Documentary)         65            1970    11
Vid5         Hidden Fury, segment 10 (Documentary)               33            1002    1
Vid6         Hurricanes, Segment 05 (Documentary)                115           3448    32
Vid7         The Miracle of Water, segment 05 (Documentary)      83            2314    1
Vid8         Winning Aerospace, Segment 04 (Documentary)         110           3318    18

3.1. Reduces redundancy stage
When the frames of the input video are extracted, frames within the same shot are highly similar, so extracting objects from every frame image is time-consuming and computationally complex. Reducing the redundant frames therefore lowers the execution time, as seen in Table 2 and Figure 3. For instance, the execution time was 111.4 seconds when the second stage was applied to all of Vid1's frames, that is, without reducing similar frames, as opposed to 44.41 seconds when Vid1 first passed through the redundancy reduction stage. The table reports the corresponding execution times for the other videos.

3.2. Selection of candidate transition stage
Based on the motion of the object and/or the camera, shots may be categorized into four types: static objects with static cameras, dynamic objects with static cameras, static objects with dynamic cameras, and dynamic objects with dynamic cameras [25]. Candidate transition selection is based on a comparison of the objects of consecutive frames. The objects are extracted from the frame images using the feature template, which combines texture, edges, and the L* value of the L*a*b* color space and is passed to the k-means technique.
After frames Fri and Fri+1 pass the first stage based on their correlation value, this stage selects potential cut transition frames by examining the standard deviation of the differences between the vectors built from the frames' object blocks. Table 3 and Figure 4 describe how the block size of the frame object image was determined empirically: Figure 4(a) shows the effect of block size on execution time, and Figure 4(b) shows its effect on the F-score. Vid3, Vid4, and Vid8 are taken as examples in the table to demonstrate that block size affects both execution time and accuracy. A grid of 8*8 blocks, each of size 32*32, was found to be more appropriate for this study.
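The block-wise comparison used in this stage can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes object images are nested lists with values in [0, 1], quantizes them to 256 grey levels to build each block's histogram, and uses the standard Shannon entropy of that histogram. The names `block_entropy`, `entropy_vector`, and `is_cut_candidate`, as well as the threshold value, are hypothetical.

```python
import math
from collections import Counter

def block_entropy(block, levels=256):
    """Shannon entropy (in bits) of the grey-level histogram of one block."""
    counts = Counter(min(int(v * (levels - 1) + 0.5), levels - 1)
                     for row in block for v in row)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def entropy_vector(image, grid=8):
    """Entropies of the grid x grid blocks of an image, in row-major order."""
    h, w = len(image), len(image[0])
    bh, bw = h // grid, w // grid
    vec = []
    for br in range(grid):
        for bc in range(grid):
            block = [row[bc * bw:(bc + 1) * bw]
                     for row in image[br * bh:(br + 1) * bh]]
            vec.append(block_entropy(block))
    return vec

def is_cut_candidate(obj_a, obj_b, thr, grid=8):
    """Return (candidate flag, sd) where sd is the standard deviation of the
    element-wise differences between the two entropy vectors."""
    va, vb = entropy_vector(obj_a, grid), entropy_vector(obj_b, grid)
    diffs = [a - b for a, b in zip(va, vb)]
    mu = sum(diffs) / len(diffs)
    sd = math.sqrt(sum((d - mu) ** 2 for d in diffs) / len(diffs))
    return sd > thr, sd

# Toy 16x16 object images (2x2 blocks): flat vs. striped on the left half.
flat = [[0.0] * 16 for _ in range(16)]
striped = [[1.0 if (c < 8 and c % 2 == 0) else 0.0 for c in range(16)]
           for _ in range(16)]
print(is_cut_candidate(flat, flat, thr=0.1))
print(is_cut_candidate(flat, striped, thr=0.1))
```

For the 256*256 frames of the paper, `grid=8` gives the 32*32 blocks and the 64-element vectors described in the text; the small images here only keep the example quick to run.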
Table 2. Time consuming comparison
Videos  Execution time in sec. (with reduction)  Execution time in sec. (without reduction)
Vid1    44.41                                    111.42
Vid2    224.94                                   785.83
Vid3    99.75                                    314.75
Vid4    129.07                                   363.44
Vid5    70.178                                   177.52
Vid6    320.34                                   566.40
Vid7    111.47                                   421.15
Vid8    201.23                                   679.02

Figure 3. Comparison in execution time

Figure 4. Block size effect, (a) on execution time and (b) on F-score

Table 3. The effect of block size within 256*256 frame size
        4*4 blocks (64*64 block size)      8*8 blocks (32*32 block size)
Videos  Execution time in sec.  F-score    Execution time in sec.  F-score
Vid3    183.58                  0.95       99.75                   0.96
Vid4    148.92                  0.90       129.07                  1.00
Vid8    311.36                  0.90       201.23                  0.97

To illustrate the frame object extraction, Figure 6 shows the steps of the extraction method applied to the sample frames of Figure 5. The combined features (texture, frame edges, and the L* value of the L*a*b* color space) recovered from frames i and i+1 form the feature template of each frame. The frame objects are then extracted with the k-means approach for the frame similarity comparison. If identical objects are found in two consecutive frames, the frames likely belong to the same shot; if not, a cut transition is a possibility. The significant problem of object and camera movements can be addressed by similarity discovery based on object comparison, because the frame object is recognized where it should be in the images of succeeding frames.
Figure 5. Transition examples

Figure 6. Example of consecutive frames objects extracted

The proposed object extraction method was assessed before being adopted in the proposed SBD algorithm. Table 4 and Figure 7 report the information content of the extracted objects, measured by the entropy value, which is taken as an indicator of the accuracy of the proposed extraction method. Sample frames from several of the test videos were selected for this evaluation. Based on the analysis of these results, the proposed object extraction operation was adopted in this stage of the proposed SBD algorithm.

Figure 7. Object extraction accuracy using entropy

Table 4. Object extraction evaluation using entropy measure (Ent)
Videos: Vid2, Vid3, Vid5, Vid6, Vid7, Vid8
Fr.  397    398    262    263    432    432    82     83     568    569    1375   1376   517
Ent  0.926  0.634  0.890  0.985  0.660  0.969  0.998  0.989  0.968  0.992  0.979  0.956  0.884
3.3. Transition decision stage
SIFT features are adopted in this stage for the shot transition decision because they remain unaffected by rotation and zoom, reflect the local variation of moving objects efficiently, and can be used to characterize the image impartially [2]. SIFT key points are detected and features are extracted from the candidate frames produced by the previous stage, and then feature matching is performed. In feature matching, the two feature matrices of frame i and frame i+1 are matched using a distance calculation, which results in a p-by-1 vector, where p is the number of detected key points. The shot transition decision is made from the matched features: when the matching degree of the SIFT features between the frames is low, neighboring frames are recognized as belonging to different shots. Figure 8(a) shows key point matching for frames in the same shot, and Figure 8(b) for frames from different shots.

Figure 8. Frames shots feature key points matching, (a) frames in the same shot and (b) frames in a different shot

As seen in the figure, the similarity matching between two frames in the same shot is typically high because of their comparable visual features. Frames from different shots, however, lack visual uniformity and therefore have little or no similarity matching. Recall and precision are the key performance metrics typically employed in the SBD process. The F1 score, which is the harmonic mean of precision and recall, is used in this paper's evaluation along with these metrics [2].
The following formulas can be used to compute these metrics [5]:

$$R = \frac{true}{true + miss} \qquad (7)$$

$$P = \frac{true}{true + false} \qquad (8)$$

$$F\text{-}score = \frac{2 \cdot P \cdot R}{P + R} \qquad (9)$$

where true denotes accurate transition detection, false denotes inaccurate transition detection, and miss denotes missed transition detection. Table 5 reports the accuracy of the proposed SBD algorithm with these metrics.

Table 5. Efficiency of the proposed method
Video    Recall  Precision  F-score
Vid1     1.00    1.00       1.00
Vid2     1.00    1.00       1.00
Vid3     0.93    1.00       0.96
Vid4     1.00    1.00       1.00
Vid5     1.00    1.00       1.00
Vid6     0.87    0.96       0.91
Vid7     1.00    1.00       1.00
Vid8     0.94    0.94       0.94
Average  0.96    0.98       0.97
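As a small worked example of (7) to (9), the following Python sketch computes the three metrics from detection counts. The function name `sbd_scores` and the counts in the usage line are hypothetical and chosen only for illustration; they are not taken from the experiments reported in Table 5.

```python
def sbd_scores(true, miss, false):
    """Recall, precision and F-score from detection counts, as in (7)-(9).

    true  -- correctly detected transitions
    miss  -- transitions that were not detected
    false -- detections that are not real transitions
    """
    recall = true / (true + miss) if (true + miss) else 0.0
    precision = true / (true + false) if (true + false) else 0.0
    if precision + recall == 0:
        return recall, precision, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return recall, precision, f_score

# Hypothetical counts: 9 correct detections, 1 missed cut, no false alarms.
r, p, f = sbd_scores(true=9, miss=1, false=0)
print(round(r, 2), round(p, 2), round(f, 2))
```

Returning zero when a denominator vanishes is a convention adopted here for convenience; the paper does not state how such cases are handled.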
4. CONCLUSION
The suggested SBD approach has been realized by comparing frame image objects and using SIFT features, after discarding the redundant frames of the same shot. The proposed system is implemented in three stages. First, the redundant frames are reduced based on their correlation value, which reduces computational complexity and time consumption. Second, the candidate shot transition and boundary are identified by object comparison using the proposed extraction method; this stage can identify whether objects are where they should be in the images of subsequent frames. The last stage then uses SIFT features to decide which of these candidate frames are cut transitions. The research demonstrates that this approach minimizes false positives by using SIFT key point matching, which is independent of the scale and rotation of the image. The method yields an F1 score of 97%, a high result, while requiring less time and complexity.

ACKNOWLEDGEMENTS
The authors thank the Department of Computer Science, College of Science, Mustansiriyah University, Baghdad, Iraq, for supporting this work.

REFERENCES
[1] Z. El Khattabi, Y. Tabii, and A. Benkaddour, “Video shot boundary detection using the scale invariant feature transform and RGB color channels,” International Journal of Electrical & Computer Engineering (2088-8708), vol. 7, no. 5, 2017. [2] L. Kong, “SIFT feature-based video camera boundary detection algorithm,” Complexity, vol. 2021, pp. 1–11, 2021. [3] T. Zhou, F. Porikli, D. J. Crandall, L. Van Gool, and W. Wang, “A survey on deep learning technique for video segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 6, pp. 7099–7122, 2022. [4] D. M. Thounaojam, T. Khelchandra, K. M. Singh, and S.
Roy, “A genetic algorithm and fuzzy logic approach for video shot boundary detection,” Computational intelligence and neuroscience, vol. 2016, 2016. [5] E. Hato, “Temporal video segmentation using optical flow estimation,” Iraqi Journal of Science, pp. 4181–4194, 2021. [6] H. Shao, Y. Qu, and W. Cui, “Shot boundary detection algorithm based on HSV histogram and HOG feature,” in 2015 International Conference on Advanced Engineering Materials and Technology, Atlantis Press, pp. 951–957, 2015. [7] S. Tippaya, S. Sitjongsataporn, T. Tan, M. M. Khan, and K. Chamnongthai, “Multi-modal visual features-based video shot boundary detection,” IEEE Access, vol. 5, pp. 12563–12575, 2017, doi: 10.1109/ACCESS.2017.2717998. [8] S. Akpinar and F. Alpaslan, “A novel optical flow-based representation for temporal video segmentation,” Turkish Journal of Electrical Engineering and Computer Sciences, vol. 25, no. 5, pp. 3983–3993, 2017. [9] M. Haroon, J. Baber, I. Ullah, S. M. Daudpota, M. Bakhtyar, and V. Devi, “Video scene detection using compact bag of visual word models,” Advances in Multimedia, vol. 2018, pp. 1–9, 2018. [10] F.-F. Duan and F. Meng, “Video shot boundary detection based on feature fusion and clustering technique,” IEEE Access, vol. 8, pp. 214633–214645, 2020. [11] Z. N. Idan, S. H. Abdulhussain, B. M. Mahmmod, K. A. Al-Utaibi, S. A. R. Al-Hadad, and S. M. Sait, “Fast shot boundary detection based on separable moments and support vector machine,” IEEE Access, vol. 9, pp. 106412–106427, 2021. [12] N. Kumar, “Shot boundary detection framework for video editing via adaptive thresholds and gradual curve point,” Turkish Journal of Computer and Mathematics Education (TURCOMAT), vol. 12, no. 11, pp. 3820–3828, 2021. [13] J. T. Jose, S. Rajkumar, M. R. Ghalib, A. Shankar, P. Sharma, and M. R. Khosravi, “Efficient shot boundary detection with multiple visual representations,” Mobile Information Systems, vol. 2022, 2022. [14] K. A. Akintoye, N. A. F. B. Ismial, N. Z. S. B. 
Othman, M. S. M. Rahim, and A. H. Abdullah, “Composite median Wiener filter based technique for image enhancement,” Journal of Theoretical & Applied Information Technology, vol. 96, no. 15, 2018. [15] S. H. Majeed and N. A. M. Isa, “Adaptive entropy index histogram equalization for poor contrast images,” IEEE Access, vol. 9, pp. 6402–6437, 2020, doi: 10.1109/ACCESS.2020.3048148. [16] A. M. Neto, A. C. Victorino, I. Fantoni, D. E. Zampieri, J. V. Ferreira, and D. A. Lima, “Image processing using Pearson’s correlation coefficient: Applications on autonomous robotics,” in 2013 13th International Conference on Autonomous Robot Systems, IEEE, pp. 1–6, 2013. [17] N. K. Ibrahim, A. H. Al-Saleh, and A. S. A. Jabar, “Texture and pixel intensity characterization-based image segmentation with morphology and watershed techniques,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 31, no. 3, pp. 1464–1477, 2023, doi: 10.11591/ijeecs.v31.i3. [18] N. Khalid, “Hybrid features of mask generated with gabor filter for texture analysis and sobel operator for image regions segmentation using K-Means technique,” Journal La Multiapp, vol. 3, no. 5, pp. 250–258, 2022, doi: 10.37899/journallamultiapp.v3i5.743. [19] X. Zheng, Q. Lei, R. Yao, Y. Gong, and Q. Yin, “Image segmentation based on adaptive K-means algorithm,” EURASIP Journal on Image and Video Processing, vol. 2018, no. 1, pp. 1–10, 2018. [20] U. Petronas, “Mean and standard deviation features of color histogram using laplacian filter for content-based image retrieval,” Journal of Theoretical and Applied Information Technology, vol. 34, no. 1, pp. 1–7, 2011. [21] R. Sammouda and A. El-Zaart, “An optimized approach for prostate image segmentation using K-means clustering algorithm with elbow method,” Computational Intelligence and Neuroscience, vol. 2021, 2021. [22] N. Dhanachandra and Y. J.
Chanu, “A new approach of image segmentation method using K-means and kernel based subtractive clustering methods,” International Journal of Applied Engineering Research, vol. 12, no. 20, pp. 10458–10464, 2017. [23] N. M. Kwok, Q. P. Ha, and G. Fang, “Effect of color space on color image segmentation,” in 2009 2nd International Congress on Image and Signal Processing, IEEE, pp. 1–5, 2009. [24] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, pp. 91–110, 2004. [25] S. H. Abdulhussain, A. R. Ramli, M. I. Saripan, B. M. Mahmmod, S. A. R. Al-Haddad, and W. A. Jassim, “Methods and challenges in shot boundary detection: a review,” Entropy, vol. 20, no. 4, p. 214, 2018.
BIOGRAPHIES OF AUTHORS
Noor Khalid Ibrahim is a lecturer at the College of Science, Mustansiriyah University, Iraq. She received the B.Sc. degree in computer science from the Department of Computer Science, College of Science, Mustansiriyah University, Iraq, and holds a master's degree in computer science (2015) with a specialization in multimedia. Her research area is image processing. She can be contacted at email: noor.kh20@uomustansiriyah.edu.iq.

Zinah Sadeq Abduljabbar is a lecturer at the College of Science, Mustansiriyah University, Iraq. She received the B.Sc. degree in computer science from the Department of Computer Science, College of Science, Mustansiriyah University, Iraq, and holds a master's degree in computer science (2014) with a specialization in multimedia. She can be contacted at email: zinahsadeq@uomustansiriyah.edu.iq.