International Journal of Computer & Communication Engineering Research (IJCCER)

Volume 2 - Issue 2 March 2014

Modified Binary Search Algorithm for Duplicate Elements

Phyu Phyu Thwe¹, Lai Lai Win Kyi²
¹,² Department of Information Technology
Mandalay Technological University, Mandalay, Myanmar
¹ phyu2thwe@gmail.com, ² laelae83@gmail.com

Abstract: In computer science, efficiently searching for an item in a large data set is a challenging task. A search strategy that starts from the beginning of the list for every value performs many comparisons and consumes a lot of time. This searching time can be reduced by not searching from the start for each value. Binary search is based on this concept, and it performs very well compared with other algorithms because of its logarithmic time complexity. The limitation of the Binary Search (BS) technique is that it can only be used to search for one element in a given list. This paper therefore extends the BS algorithm to overcome this limitation: the Duplicate Element Binary Search (DEBS) algorithm is developed for duplicate elements in a given list with the same time complexity as the classical BS algorithm. Applications of these two algorithms are considered together with the book database of an e-library system. The system is implemented using the Java programming language and MySQL server.

Keywords: Analysis of Algorithm, Binary Search, Duplicate Element, Time Complexity

I. INTRODUCTION
Searching is one of the most fundamental operations in the field of computing. In computer science, search algorithms, i.e. the algorithms used to find a particular item from a set, are generally divided into uninformed ones (used on unsorted lists) and informed ones (used on already sorted lists), which apply knowledge about the structure of the search space to reduce the amount of time spent searching [7]. Searching is one of the most time-consuming processes of many processing systems. There are many types of searching techniques such as Linear Search, Binary Search (BS), Depth-First Search (DFS), Best-First Search (BFS), and so on. Among them, Binary Search is a popular and useful technique for practical applications due to its logarithmic time complexity. The time complexity of the BS algorithm is O(logN). Because the time complexity is logarithmic, the algorithm exhibits significant improvements in computation time for very large lists [3]. The logarithmic behavior of the BS algorithm requires the data set to be arranged in ascending or descending order. However, it can be used to search for only one element in a given list. This is the limitation of this algorithm.
Some database applications require searching for duplicate elements. A database may include repeated values (e.g. student names in a student database or book titles in the book database of an e-library) with different properties in its records. The elements of the list are not necessarily all unique. If one searches for a value that occurs multiple times in the list, the index returned will be that of the first-encountered equal element. To find all equal elements, an upward and a downward search can be carried out from the initial result, stopping each search when the element is no longer equal [7]. This research extends the classical BS technique as Duplicate Element Binary Search (DEBS) for duplicate elements in solving computational problems. The proposed search algorithm is also iterative, with two index limits that progressively narrow the search range. The two algorithms are demonstrated on the book database of an e-library system with the same time complexity.
The rest of this paper is organized as follows. Section II presents the related work for this research. Section III presents a description of the proposed algorithm: it describes the Binary Search (BS) algorithm, the Duplicate Element Binary Search (DEBS) algorithm and the analysis of the algorithms. Section IV describes the implementation of the proposed algorithm. Section V closes the paper with the conclusion.

II. RELATED WORKS
M. Archibald studied “Average Depth in a Binary Search Tree with Repeated Keys”. Here, random sequences from the alphabet {1, …, r} are examined where repeated letters are allowed. Binary search trees are formed from these sequences and the average left-going depth of the first ‘1’ is found. Next, the right-going depth of the first ‘r’ is examined, and finally a merge (or ‘shuffle’) operator is used to obtain the average depth of an arbitrary node, which can be expressed in terms of the left-going and right-going depths. That paper examines various parameters of these trees and gives an average case analysis under two standard probabilistic models (probability and multiset) [1].
A. Tarek proposed “A New Approach for Multiple Element Binary Search in Database Applications”. In that paper, the Multi-key Binary Search (MKBS) algorithm and the Multi-key Binary Insertion Search (MKBIS) algorithm are developed based on the classical binary search algorithm. MKBS searches for m different keys in a list of n different list elements. MKBIS can be used to insert multiple elements into a sorted list. Both the MKBS and the MKBIS algorithms are used for extracting records from different layers within the structure as well as for inserting multiple records. Applications of the proposed algorithms are considered together with a model Employee Database Management program with improved efficiency [3].
S. Korteweg developed a new dictionary structure supporting binary search. This dictionary structure can be implemented without a penalty in memory usage but does not support vocabularies. Due to the O(logN) time for Binary Search, the system can reduce searching time by implementing the binary search dictionary structure [2].
R. Nowak studied a generalization of the classic binary search problem. The classic problem can be viewed as determining the correct one-dimensional, binary-valued
© http://ijccer.org e-ISSN: 2321-4198 p-ISSN: 2321-418X Page 77
Phyu Phyu Thwe , et al International Journal of Computer and Communication Engineering Research [Volume 2, Issue 2 March 2014]

threshold function from a finite class of such functions based on queries taking the form of point samples of the function. The generalized problem extends binary search techniques to multi-dimensional threshold functions, which arise in machine learning and pattern classification. It identifies geometrical conditions on the pair (specific query space X, hypothesis space H) that guarantee that Generalized Binary Search determines the correct hypothesis in O(log|H|) queries. Extensions to handle noise are also discussed [4].
Across all of this research, the Binary Search algorithm is faster than other search algorithms and very useful for searching elements in database applications. Many studies achieve rapid searching with a variant or extension of the Binary Search algorithm. In this paper, we propose an efficient algorithm that modifies the Binary Search algorithm to find the duplicate elements in a sorted list.

III. DESCRIPTION OF THE PROPOSED ALGORITHM

The proposed algorithm is based on the BS algorithm and finds duplicate values that occur more than once in a database. This section discusses the Binary Search Algorithm, the Duplicate Element Binary Search Algorithm and the Analysis of Algorithms.

A. Binary Search Algorithm

In computer science, a binary search or half-interval search algorithm finds the position of a specified value (the input “key”) within a sorted array. In each step, the algorithm compares the input key with the key of the middle element of the array. If the keys match, a matching element has been found and its index or position is returned. Otherwise, if the key is less than the middle element's key, the algorithm repeats its action on the sub-array to the left of the middle element or, if the input key is greater, on the sub-array to the right. If the remaining array to be searched is reduced to zero and the key cannot be found in the array, a special "Not found" indication is returned. A binary search halves the number of items to check with each iteration, so locating an item (or determining its absence) takes logarithmic time [5]. A binary search is an example of a divide and conquer search algorithm.

Algorithm binary search (A [0…n-1], key)

    low = 0
    high = n - 1
    while (low <= high) do
        middle = (low + high) / 2
        if A[middle] = key then
            return middle
        else if A[middle] > key then
            high = middle - 1
        else
            low = middle + 1
        end if
    end while
    return -1

Figure 1: Comparison of Binary Search (the search range between low and high shrinks from n items to n/2 items when key < item[mid], then to n/4 items when key > item[mid], and so on, until key = item[mid] after log₂ n steps)

Binary search requires a more complex program than linear search, and thus for small N it may run slower than simple linear search [6]. For large N, BS is faster than linear search.

B. Duplicate Element Binary Search Algorithm

The Duplicate Element Binary Search (DEBS) algorithm modifies the binary search operations to find duplicate elements in a sorted list. The algorithm finds the first and last occurrences of the key, stores their indexes in an array and returns it. The search starts at the midpoint of the array. If the key is less than or greater than the middle element, the algorithm proceeds exactly as binary search does. If the key matches the middle element, there can be duplicate keys on both sides of the midpoint. Therefore, the algorithm searches the sub-array to the right for the last occurrence. It first checks whether the element at the last index is equal to the key; if it is not, the algorithm finds the midpoint of this sub-array and repeats its action. Then it searches the sub-array to the left for the first occurrence. It checks whether the element at the first index is equal to the key; if it is not, the algorithm proceeds as it did for the right sub-array. Finally, the indexes of the first and last occurrences are returned in an array. If the remaining sub-array to be searched is reduced to zero and the key is not found in the array, a special “Not found” indication is returned.
This algorithm is useful for finding where duplicate items are in a sorted array. Assume that a user wants to find book information (book title, author name, edition, description) by author name, and that one or more books were written by that author. DEBS can perform the searching process for duplicate elements, whereas the BS algorithm supports searching for one element only. This is the weak point of BS. As a result, DEBS is developed to overcome this weak point without changing the time complexity.
In the example of Figure 2, the key is G and the sorted array is A[0…15] = A B C D E F G G G G G H I J K L.
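The search described in this subsection can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `DebsSketch`, the use of a `char[]` array, the `{first, last}` order of the result and the `{-1, -1}` value for "Not found" are all choices made here for the example.

```java
import java.util.Arrays;

// Illustrative sketch of Duplicate Element Binary Search (DEBS); all names are assumptions.
final class DebsSketch {

    // Returns {first, last} indexes of key in the sorted array a, or {-1, -1} if absent.
    static int[] search(char[] a, char key) {
        int low = 0;
        int high = a.length - 1;
        while (low <= high) {
            int middle = low + (high - low) / 2; // same midpoint as (low + high) / 2, without overflow
            if (a[middle] > key) {
                high = middle - 1;
            } else if (a[middle] < key) {
                low = middle + 1;
            } else {
                int last = findLastIndex(a, key, middle, high);
                int first = findFirstIndex(a, key, middle, low);
                return new int[] {first, last};
            }
        }
        return new int[] {-1, -1};
    }

    // Searches a[middle..high] for the last occurrence; a[middle] == key on entry.
    static int findLastIndex(char[] a, char key, int middle, int high) {
        int low = middle;
        if (a[high] == key) {
            return high;
        }
        while (low <= high) {
            int mid = low + (high - low) / 2;
            if (a[mid] == key) {
                low = mid + 1;        // the last occurrence is at mid or further right
            } else if (a[mid - 1] == key) {
                return mid - 1;       // a[mid] > key and its left neighbour matches
            } else {
                high = mid - 1;       // a[mid] > key, keep looking to the left
            }
        }
        return -1; // not expected when a[middle] == key on entry
    }

    // Searches a[low..middle-1] for the first occurrence; a[middle] == key on entry.
    static int findFirstIndex(char[] a, char key, int middle, int low) {
        int high = middle - 1;
        if (a[low] == key) {
            return low;
        }
        while (low <= high) {
            int mid = low + (high - low) / 2;
            if (a[mid] == key) {
                high = mid - 1;       // the first occurrence is at mid or further left
            } else if (a[mid + 1] == key) {
                return mid + 1;       // a[mid] < key and its right neighbour matches
            } else {
                low = mid + 1;        // a[mid] < key, keep looking to the right
            }
        }
        return -1; // not expected when a[middle] == key on entry
    }

    public static void main(String[] args) {
        char[] a = "ABCDEFGGGGGHIJKL".toCharArray();
        System.out.println(Arrays.toString(search(a, 'G')));
    }
}
```

For the key G on the sixteen-letter array used in Figure 2, the first probe lands on index 7, the right-hand search returns index 10 and the left-hand search returns index 6, so the sketch prints `[6, 10]`.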

Algorithm duplicate element binary search (A [0…n-1], key)

    low = 0
    high = n - 1
    while (low <= high) do
        middle = (low + high) / 2
        if A[middle] > key then
            high = middle - 1
        else if A[middle] < key then
            low = middle + 1
        else if A[middle] = key then
            lastOccurrence = findLastIndex(A[0…n-1], key, middle, high)
            firstOccurrence = findFirstIndex(A[0…n-1], key, middle, low)
            return result[lastOccurrence, firstOccurrence]
        end if
    end while
    return -1

findLastIndex(A[0…n-1], key, middle, high)

    low = middle
    if (A[high] = key) then
        return high
    end if
    while (low <= high) do
        middle = (low + high) / 2
        if (A[middle] > key and A[middle-1] = key) then
            return middle - 1
        else if (A[middle] = key) then
            low = middle + 1
        else if (A[middle] > key and A[middle-1] != key) then
            high = middle - 1
        end if
    end while
    return -1

findFirstIndex(A[0…n-1], key, middle, low)

    high = middle - 1
    if (A[low] = key) then
        return low
    end if
    while (low <= high) do
        middle = (low + high) / 2
        if (A[middle] < key and A[middle+1] = key) then
            return middle + 1
        else if (A[middle] = key) then
            high = middle - 1
        else if (A[middle] < key and A[middle+1] != key) then
            low = middle + 1
        end if
    end while
    return -1

Figure 2: Duplicate Element Binary Search algorithm (example with key = G, where key = A[mid] on the first probe of A[0…15]; the last occurrence is found in the right sub-array G G G G H I J K L, where key != A[high] and key = A[mid-1]; the first occurrence is found in the left sub-array A B C D E F G, where key != A[low] and key != A[mid+1] on the first probe, then key = A[mid+1] on the sub-array E F G)

C. Analysis of Algorithms
With each test that fails to find a match at the probed position, the search continues with one or other of the two sub-intervals, each at most half the size. If the number of items N is odd, both sub-intervals contain (N-1)/2 elements, while if N is even the two sub-intervals contain N/2-1 and N/2 elements.
In the Binary Search Algorithm, if the original number of items is N, then after the first iteration there will be at most N/2 items remaining, then at most N/4 items, at most N/8 items, and so on. In the worst case, when the value is not in the list, the algorithm must continue iterating until the span has been made empty: this takes at most ⌊log₂(N) + 1⌋ iterations. Thus binary search is a logarithmic algorithm and executes in O(logN) time [5]. Compared with linear search, whose worst-case behavior is N iterations, binary search is substantially faster as N grows large [8].
In the Duplicate Element Binary Search Algorithm, when the value is not in the list, the algorithm behaves exactly like binary search. It therefore takes at most (log₂ N + 1) iterations, which is O(logN), the same as the original binary search. If the middle value matches the search key, the algorithm calls findLastIndex on the right sub-array and findFirstIndex on the left sub-array. On each side of the array, at most N/2 items remain, then at most N/4 items, at most N/8 items and so on. So searching on both sides takes 2 log₂ N + 1 iterations. Therefore, the running time complexity of the proposed algorithm is O(logN). Thus, the proposed algorithm is an efficient improvement over binary search.
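The worst-case bound quoted above can be checked empirically with a small Java sketch. It instruments a classical binary search with an iteration counter and compares the count for a missing key against ⌊log₂ N⌋ + 1. The class name `IterationCount`, the helper `bound` and the choice of test sizes are assumptions made for this illustration.

```java
// Illustrative sketch: counts loop iterations of classical binary search; names are assumptions.
final class IterationCount {

    // Runs binary search for key in the sorted array a and returns the number of loop iterations.
    static int iterations(int[] a, int key) {
        int low = 0;
        int high = a.length - 1;
        int count = 0;
        while (low <= high) {
            count++;
            int middle = low + (high - low) / 2;
            if (a[middle] == key) {
                break;                 // found: stop counting
            } else if (a[middle] > key) {
                high = middle - 1;
            } else {
                low = middle + 1;
            }
        }
        return count;
    }

    // floor(log2(n)) + 1 for n >= 1, computed from the position of the highest set bit.
    static int bound(int n) {
        return (31 - Integer.numberOfLeadingZeros(n)) + 1;
    }

    public static void main(String[] args) {
        for (int n : new int[] {16, 1000, 1_000_000}) {
            int[] a = new int[n];
            for (int i = 0; i < n; i++) {
                a[i] = i;              // sorted values 0..n-1
            }
            // The key n is larger than every element, so the search fails after the maximum number of steps.
            System.out.println("N=" + n + " iterations=" + iterations(a, n) + " bound=" + bound(n));
        }
    }
}
```

For N = 16 the loop runs 5 times, which equals ⌊log₂ 16⌋ + 1, whereas a linear search for the same missing key would inspect all 16 elements.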

IV. DESIGN AND IMPLEMENTATION OF THE PROPOSED SYSTEM

This section presents the design and implementation of the proposed system. The system is implemented so that users can search for the desired books in an e-library by using the classical BS algorithm and the DEBS algorithm.

A. Design of the Proposed System

The design of the book searching system is illustrated in Figure 3. In this system, the book database is first created. It is composed of each book’s information such as Book ID, Book Title, Edition number, Author name and Description. The system allows the users to choose between the following two search functions.
1. Searching with book title: Users input the name of the book which they want to search for.
2. Searching with author name: Users input the name of the author who wrote the book which they want to search for.
The user can also select the Binary Search algorithm for searching one item in the lists; otherwise the duplicate element binary search algorithm is selected.

Figure 3: System design for Duplicate Element Binary Search (flowchart: Start; input book Title or Author Name; Duplicate Element Binary Search or Binary Search on the DB; Display Result; End)

B. Implementation of the proposed system

This section describes the implementation results. To implement the proposed system, the book list database is built by using MySQL server. A collection of objects is created in which to find key values or to recognize that the key values do not exist in the database. In real applications, the list of items is often a set of records (e.g. student records, book lists) and the list is implemented as an array object. The searching process is done by using BS and DEBS. The whole system is implemented by using the Java programming language.
When the system starts, the Home page is displayed as shown in Figure 4. On this page, the user must select the desired algorithm.

Figure 4: Home page of the proposed system

After the desired algorithm has been selected, the searching window appears as shown in Figure 5. Here, the users can choose one of two options (book title and author name) for the searching process. The search result is shown as a table containing the book id, title, edition, author name and other description of the book.

Figure 5: Search Frame of book searching system

If the user selects the duplicate element binary search algorithm option, the book searching system uses DEBS to find the desired book. After the users have selected the book title option, they must enter the desired book name in the text box. The duplicate book title results are displayed with their different attributes, which is the advantage of DEBS.

Figure 6: Searching with Book Title by using DEBS

On the other hand, when the users choose the author name option, they can find the desired books by typing the author name. The duplicate results are shown as illustrated in Figure 7.
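As an illustration of the author-name search described in this section, the following Java sketch looks up every record by a given author in an array of book records sorted by author name. It is not the system's actual code, which the paper does not list: the `Book` class, its fields, the sample data and the method names are hypothetical. It also swaps in the standard lower-bound and upper-bound formulation of binary search in place of the paper's findFirstIndex and findLastIndex procedures, which gives the same range of duplicates in O(logN) comparisons.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only: Book, its fields and the sample data are hypothetical.
final class BookSearchSketch {

    static final class Book {
        final int id;
        final String title;
        final int edition;
        final String author;

        Book(int id, String title, int edition, String author) {
            this.id = id;
            this.title = title;
            this.edition = edition;
            this.author = author;
        }
    }

    // Index of the first book whose author is >= the given name (books.length if none).
    static int lowerBound(Book[] books, String author) {
        int low = 0;
        int high = books.length;
        while (low < high) {
            int mid = low + (high - low) / 2;
            if (books[mid].author.compareTo(author) < 0) {
                low = mid + 1;
            } else {
                high = mid;
            }
        }
        return low;
    }

    // Index of the first book whose author is > the given name (books.length if none).
    static int upperBound(Book[] books, String author) {
        int low = 0;
        int high = books.length;
        while (low < high) {
            int mid = low + (high - low) / 2;
            if (books[mid].author.compareTo(author) <= 0) {
                low = mid + 1;
            } else {
                high = mid;
            }
        }
        return low;
    }

    // All books by the author, as a view over the array, which must be sorted by author name.
    static List<Book> findByAuthor(Book[] sortedByAuthor, String author) {
        int first = lowerBound(sortedByAuthor, author);
        int end = upperBound(sortedByAuthor, author);
        return Arrays.asList(sortedByAuthor).subList(first, end);
    }

    // Hypothetical sample records, sorted by author before searching.
    static Book[] sampleBooks() {
        Book[] books = {
            new Book(1, "Title One", 1, "Author B"),
            new Book(2, "Title Two", 2, "Author A"),
            new Book(3, "Title Three", 1, "Author B"),
            new Book(4, "Title Four", 3, "Author C"),
        };
        Arrays.sort(books, Comparator.comparing((Book b) -> b.author));
        return books;
    }

    public static void main(String[] args) {
        for (Book b : findByAuthor(sampleBooks(), "Author B")) {
            System.out.println(b.id + " " + b.title + " (edition " + b.edition + ") by " + b.author);
        }
    }
}
```

Returning a sub-list view rather than a pair of indexes is a design choice of this sketch: it lets the caller display all duplicate records directly, and an empty list plays the role of the "Not found" indication.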


Figure 7: Searching with Author Name by using DEBS

If the user selects the binary search algorithm option, the book searching system uses BS to find the desired book. The searching process is the same as that of DEBS, but it can find only one result for the desired book. The search results of binary search are illustrated in Figure 8 and Figure 9.

Figure 8: Searching with Book Title by using BS

Figure 9: Searching with Author Name by using BS

V. CONCLUSION
The Binary Search algorithm is faster than other search algorithms and very useful for searching elements in database applications. But this algorithm cannot search for duplicate elements in the list. In this paper, a duplicate element binary search (DEBS) algorithm capable of performing search with duplicate elements is proposed. This algorithm is explored through the implementation of a book database application in the Java programming language. The proposed algorithm searches the sorted list iteratively with two index limits to reduce the search range. Although the DEBS algorithm, unlike the BS algorithm, searches for duplicate elements, it has the same time complexity as BS. Thus, the proposed algorithm is an efficient improvement over binary search.

ACKNOWLEDGMENT
First of all, the author is highly grateful to Dr. Myint Thein, the Pro-Rector of the Mandalay Technological University, for his permission to complete this paper. The author wants to express her gratitude to Dr. Lai Lai Win Kyi for her help and advice on this topic and for her excellent guidance and valuable suggestions. The author is deeply thankful to Dr. Aung Myint Aye, Head of Department, and all teachers from the Department of Information Technology at Mandalay Technological University for their overall support during the writing of this paper.

REFERENCES
[1] M. Archibald, J. Clement: “Average depth in a binary search tree with repeated keys”, DMTCS proc. AG, 2006, 309-320.
[2] Siem Korteweg and Hans Nieuwenhuyzen: “Binary Search”, The Journal of Forth Application and Research, Volume 2, Number 4, 1984.
[3] A. Tarek: “A New Approach for Multiple Element Binary Search in Database Applications”, International Journal of Computers, Issue 4, Volume 1, 2007.
[4] R. Nowak: “Generalized Binary Search”, 46th Annual Allerton Conference, 23-26 Sept. 2008.
[5] Binary Search Algorithm, available at: http://en.wikipedia.org/binary_search_algorithm.htm, (April 2011).
[6] P. Kumar: “Quadratic Search: A New and Fast Searching Algorithm (An extension of classical Binary search strategy)”, International Journal of Computer Applications (0975-8887), Volume 65-No.14, March 2013.
[7] S. Shepurin: “Algorithms: search and sorting (from Wikipedia and MSDN)”, 2009.
[8] M. Nosrati, R. Karimi, H. Allah Hasanvand: “Short Communication on Basic Searching Algorithms”, World Applied Programming, Vol(2), Issue (5), ISSN: 2222-2510, May 2012, 325-329.
