Modified Binary Search Algorithm For Duplicate Elements
Abstract: In computer science, efficiently searching for an item in a large data set is a challenging task. A naive search strategy starts from the beginning of the list for every value, so it performs many comparisons and consumes a lot of time. This searching time can be reduced by avoiding a search from the start for every value. Binary search is based on this concept, and it performs very well compared with other algorithms because of its logarithmic time complexity. However, the limitation of the Binary Search (BS) technique is that it can only be used to search for one element in a given list. This paper extends the BS algorithm to overcome this limitation: the Duplicate Element Binary Search (DEBS) algorithm is developed for duplicate elements in a given list, with the same time complexity as the classical BS algorithm. Applications of these two algorithms are considered together with the book database in an e-library system. This system is implemented using the Java programming language and MySQL server.

Keywords: Analysis of Algorithm, Binary Search, Duplicate Element, Time Complexity

I. INTRODUCTION
Searching is one of the most fundamental operations in the field of computing. In computer science, search algorithms, i.e. the algorithms used to find a particular item in a set, are generally divided into uninformed ones (used on unsorted lists) and informed ones (used on already sorted lists), which apply knowledge about the structure of the search space to reduce the amount of time spent searching [7]. Searching is one of the most time-consuming processes in many processing systems. There are many types of searching techniques, such as Linear Search, Binary Search (BS), Depth-First Search (DFS), Best-First Search (BFS), and so on. Among them, Binary Search is a popular and useful technique for practical applications due to its logarithmic time complexity. The time complexity of the BS algorithm is O(logN). Because the time complexity is logarithmic, the algorithm exhibits significant improvements in computation time for very large lists [3]. The logarithmic behavior of the BS algorithm requires the data set to be arranged in ascending or descending order. However, it can be used to search for only one element in a given list. This is the limitation of this algorithm.

Some database applications require searching for duplicate elements. A database may include repeated values (e.g. a student name in a student database or a book title in a book database in an e-library) with different properties in its records. The elements of the list are not necessarily all unique. If one searches for a value that occurs multiple times in the list, the index returned will be that of the first-encountered equal element. To find all equal elements, an upward and downward search can be carried out from the initial result, stopping each search when the element is no longer equal [7]. This research extends the classical BS technique as Duplicate Element Binary Search (DEBS) for duplicate elements in solving computational problems. The proposed search algorithm also works iteratively, with two index limits that progressively narrow the search range. These two algorithms, which have the same time complexity, are demonstrated on the book database in an e-library system.

The rest of this paper is organized as follows. Section II presents the related work for this research. Section III presents a description of the proposed algorithm: it describes the Binary Search (BS) algorithm, the Duplicate Element Binary Search (DEBS) algorithm and the analysis of the algorithms. Section IV describes the implementation of the proposed algorithm. Section V closes the paper with the conclusion.

II. RELATED WORKS
M. Archibald studied “Average Depth in a Binary Search Tree with Repeated Keys”. Random sequences from the alphabet {1, ..., r} are examined, where repeated letters are allowed. Binary search trees are formed from these sequences and the average left-going depth of the first ‘1’ is found. Next, the right-going depth of the first ‘r’ is examined, and finally a merge (or ‘shuffle’) operator is used to obtain the average depth of an arbitrary node, which can be expressed in terms of the left-going and right-going depths. That paper examines various parameters of these trees and gives an average case analysis under two standard probabilistic models (probability and multiset) [1].

A. Tarek proposed “A New Approach for Multiple Element Binary Search in Database Applications”. In that paper, the Multi-Key Binary Search (MKBS) algorithm and the Multi-Key Binary Insertion Search (MKBIS) algorithm are developed based on the classical binary search algorithm. MKBS searches for m different keys in a list of n different list elements. MKBIS can be used to insert multiple elements into a sorted list. Both the MKBS and the MKBIS algorithms are used for extracting records from different layers within the structure as well as for inserting multiple records. Applications of the proposed algorithms are considered together with a model Employee Database Management program with improved efficiency [3].

S. Korteweg developed a new dictionary structure supporting binary search. This dictionary structure can be implemented without a penalty in memory usage but does not support vocabularies. Due to the O(logN) time of Binary Search, the system can reduce searching time by implementing the binary search dictionary structure [2].

R. Nowak studied a generalization of the classic binary search problem. The classic problem can be viewed as determining the correct one-dimensional, binary-valued
© http://ijccer.org e-ISSN: 2321-4198 p-ISSN: 2321-418X Page 77
Phyu Phyu Thwe , et al International Journal of Computer and Communication Engineering Research [Volume 2, Issue 2 March 2014]
threshold function from a finite class of such functions based on queries taking the form of point samples of the function. The generalized problem extends binary search techniques to multi-dimensional threshold functions, which arise in machine learning and pattern classification. It identifies geometrical conditions on the pair (specific query space X, hypothesis space H) that guarantee that Generalized Binary Search determines the correct hypothesis in O(log|H|) queries. Extensions to handle noise are also discussed [4].

As all of this research shows, the Binary Search algorithm is fast compared with other search algorithms and very useful for searching elements in database applications. Many studies achieve rapid searching with a variant or extension of the Binary Search algorithm. In this paper, we propose an efficient algorithm that modifies the Binary Search algorithm to find the duplicate elements in a sorted list.

[Figure: binary search halving the range between low, mid and high from n items to n/2 and n/4 items, depending on whether key<item[mid] or key>item[mid]]
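For reference, the classical binary search that this paper modifies can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the authors' code: the class name ClassicBinarySearchSketch, the method search and the sample array are hypothetical, and the list is assumed to be sorted in ascending order. It shows the limitation described in the introduction: when the key is duplicated, only one matching index is returned.

```java
final class ClassicBinarySearchSketch {

    // Returns the index of one element equal to key, or -1 if key is absent.
    // Assumption: `sorted` is arranged in ascending order.
    static int search(int[] sorted, int key) {
        int low = 0;
        int high = sorted.length - 1;
        while (low <= high) {
            // Written this way to avoid int overflow of (low + high).
            int mid = low + (high - low) / 2;
            if (sorted[mid] == key) {
                return mid;          // stops at the first match it probes
            } else if (sorted[mid] < key) {
                low = mid + 1;       // key can only be in the upper half
            } else {
                high = mid - 1;      // key can only be in the lower half
            }
        }
        return -1;                   // the range became empty: key not found
    }

    public static void main(String[] args) {
        int[] data = {1, 3, 3, 3, 5, 8};   // hypothetical sample with a repeated value
        System.out.println(search(data, 3)); // prints 2, one of the three matches
        System.out.println(search(data, 4)); // prints -1
    }
}
```

The value 3 occurs at indices 1, 2 and 3, but the search reports only index 2, which is why a separate step is needed to recover the remaining duplicates.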
Algorithm duplicate element binary search (A[0…n-1], key)
    low = middle
    if (A[high] = key) then
        return high
    end if
    while (low <= high) do
        middle = (low + high) / 2
        if (A[middle] > key and A[middle-1] = key) then
            return middle-1
        else if (A[middle] = key) then
            low = middle + 1
        else if (A[middle] > key and A[middle-1] != key) then
            high = middle - 1
        end if
    end while
    return -1

findFirstIndex(A[0…n-1], key, middle, low)
    high = middle - 1
    if (A[low] = key) then
        return low
    end if
    while (low <= high) do
        middle = (low + high) / 2
        if (A[middle] < key and A[middle+1] = key) then
            return middle+1
        else if (A[middle] = key) then
            high = middle - 1
        else if (A[middle] < key and A[middle+1] != key) then
            low = middle + 1
        end if
    end while
    return -1

Figure 2: Duplicate Element Binary Search algorithm

C. Analysis of Algorithms
With each test that fails to find a match at the probed position, the search continues with one or other of the two sub-intervals, each at most half the size. If the number of items N is odd, then both sub-intervals contain (N-1)/2 elements, while if N is even, the two sub-intervals contain N/2-1 and N/2 elements.

In the Binary Search algorithm, if the original number of items is N, then after the first iteration there will be at most N/2 items remaining, then at most N/4 items, at most N/8 items, and so on. In the worst case, when the value is not in the list, the algorithm must continue iterating until the span has been made empty; this takes at most ⌊log₂(N) + 1⌋ iterations. Thus binary search is a logarithmic algorithm and executes in O(logN) time [5]. Compared with linear search, whose worst-case behavior is N iterations, binary search is substantially faster as N grows large [8].

In the Duplicate Element Binary Search algorithm, when the value is not in the list, the algorithm behaves like binary search. Thus the algorithm takes at most (log₂ N + 1) iterations, and so O(logN) time, the same as the original binary search. If the middle value matches the search key, the algorithm calls findLastIndex on the right sub-array and findFirstIndex on the left sub-array. On each side of the array, at most N/2 items remain, then at most N/4 items, at most N/8 items, and so on. So there are 2log₂ N + 1 iterations for searching on both sides. Therefore, the running time complexity of the proposed
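The idea analysed in this section, an ordinary binary search followed by a first-index search on the left part and a last-index search on the right part, can be sketched in Java as follows. This is a simplified illustration under stated assumptions rather than a transcription of the paper's pseudocode: the class name DuplicateElementBinarySearchSketch, the method searchRange, the helper signatures and the sample data are hypothetical, the list is assumed to be in ascending order, and the helpers use a plain boundary search instead of the paper's neighbour comparisons.

```java
import java.util.Arrays;

final class DuplicateElementBinarySearchSketch {

    // Returns {firstIndex, lastIndex} of key in ascending array a,
    // or null if key does not occur.
    static int[] searchRange(int[] a, int key) {
        int low = 0;
        int high = a.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            if (a[mid] == key) {
                // Everything left of low is smaller than key and everything
                // right of high is larger, so both boundaries lie in [low, high].
                int first = findFirstIndex(a, key, low, mid);
                int last = findLastIndex(a, key, mid, high);
                return new int[] {first, last};
            } else if (a[mid] < key) {
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return null;
    }

    // Precondition: a[high] == key. Finds the smallest index in [low, high] holding key.
    static int findFirstIndex(int[] a, int key, int low, int high) {
        while (low < high) {
            int mid = low + (high - low) / 2;
            if (a[mid] == key) {
                high = mid;       // first occurrence is at mid or to its left
            } else {
                low = mid + 1;    // a[mid] < key, so look to the right
            }
        }
        return low;
    }

    // Precondition: a[low] == key. Finds the largest index in [low, high] holding key.
    static int findLastIndex(int[] a, int key, int low, int high) {
        while (low < high) {
            int mid = low + (high - low + 1) / 2; // upper middle, so the range always shrinks
            if (a[mid] == key) {
                low = mid;        // last occurrence is at mid or to its right
            } else {
                high = mid - 1;   // a[mid] > key, so look to the left
            }
        }
        return low;
    }

    public static void main(String[] args) {
        int[] data = {1, 3, 3, 3, 5, 8};   // hypothetical sample with a repeated value
        System.out.println(Arrays.toString(searchRange(data, 3))); // prints [1, 3]
    }
}
```

Each of the three loops halves its range on every iteration, so the total number of iterations stays logarithmic in N, in line with the bound discussed in this section.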
Figure 7: Searching with Author Name by using DEBS

If the user selects the binary search algorithm option, the book searching system will use BS to find the desired book. The searching process is the same as that of DEBS, but it can find only one result for the desired book. The search results of binary search are illustrated in Figure 8 and Figure 9.

Figure 8: Searching with Book Title by using BS

V. CONCLUSION
The Binary Search algorithm is fast compared with other search algorithms and very useful for searching elements in database applications. But this algorithm cannot search for duplicate elements in the list. In this paper, a Duplicate Element Binary Search (DEBS) algorithm capable of searching with duplicate elements is proposed. The algorithm is explored through the implementation of a book database application in the Java programming language. The proposed search algorithm works iteratively on the sorted list with two index limits to reduce the search range. Although the DEBS algorithm, unlike the BS algorithm, searches for duplicate elements, it has the same time complexity as BS. Thus, the proposed algorithm is an efficient improvement over binary search.

ACKNOWLEDGMENT
First of all, the author is highly grateful to Dr. Myint Thein, the Pro-Rector of the Mandalay Technological University, for his permission to complete this paper. The author wants to express her gratitude to Dr. Lai Lai Win Kyi for her help and advice on this topic and for her excellent guidance and valuable suggestions. The author is deeply thankful to Dr. Aung Myint Aye, Head of Department, and all teachers from the Department of Information Technology at Mandalay Technological University for their overall support during the writing of this paper.

REFERENCES
[1] M. Archibald, J. Clement: “Average depth in a binary search tree with repeated keys”, DMTCS proc. AG, 2006, 309-320.
[2] Siem Korteweg and Hans Nieuwenhuyzen: “Binary Search”, The Journal of Forth Application and Research, Volume 2, Number 4, 1984.
[3] A. Tarek: “A New Approach for Multiple Element Binary Search in Database Applications”, International Journal of Computers, Issue 4, Volume 1, 2007.
[4] R. Nowak: “Generalized Binary Search”, 46th Annual Allerton Conference, 23-26 Sept. 2008.
[5] Binary Search Algorithm, available at: http://en.wikipedia.org/binary_search_algorithm.htm, (April 2011).
[6] P. Kumar: Quadratic Search: A New and Fast Searching Algorithm (An extension of classical Binary search strategy), International Journal of Computer Applications (0975-8887), Volume 65-No.14, March 2013.
[7] S. Shepurin, Algorithms: search and sorting (from Wikipedia and MSDN), 2009.
[8] M. Nosrati, R. Karimi, H. Allah Hasanvand, “Short Communication on Basic Searching Algorithms”, World Applied Programming, Vol(2), Issue (5), ISSN: 2222-2510, May 2012, 325-329.