ASH Search Binary Search Optimization
Ashar Mehmood
School of Electrical Engineering and Computer Science (SEECS)
National University of Sciences and Technology (NUST)
Islamabad, 44000, Pakistan
3.3 Algorithmic point of view
The presented algorithm works in a single loop. Each iteration performs the following set of operations (a code sketch of the full loop is given at the end of this subsection):

- Calculate the average variation between the elements of the array.
- Calculate the estimated index of the target element on the basis of the calculated average variation.
- Check whether the element at the estimated index is equal to the target element.
- If the element at the estimated index is not equal to the target element, update the starting and ending index of the array to converge towards the target element:
  - If the target element is greater than the element at the estimated index, the target element lies between the estimated index and the ending index. Thus, the new starting index is estimated_index + 1.
  - If the target element is smaller than the element at the estimated index, the target element lies between the starting index and the estimated index. Thus, the new ending index is estimated_index - 1.
- Discard 50% of the current array from the right or the left side, such that the target element remains in the updated array and the estimations become more accurate.

Consider the case of a vector containing 30 elements coming with different variations, in ascending order:

arr = {21, 27, 35, 58, 59, 60, 67, 69, 85, 95, 120, 151, 152, 157, 160, 166, 174, 181, 192, 197, 204, 209, 219, 225, 229, 235, 241, 248, 251, 263}

Suppose we want to find the position of element 67, where the difference of the boundary elements (arr[end] - arr[start]), the difference of the boundary indexes (end - start) and the difference of the target element from the starting element (s_num - arr[start]) are represented by Δ (Delta), Ω (Omega) and µ (mu) respectively.

In the first iteration:
start ← 0
end ← t_elem - 1 = 29
est_var ← Δ/Ω = 8.34483
est_n ← (µ/est_var) + start = 6

After comparing the element at index 6 with the target element, it is noted that they are equal. Thus, in this case the proposed algorithm finds the element in the first iteration.

Now suppose we want to find 192.

In the first iteration:
start ← 0
end ← t_elem - 1 = 29
est_var ← Δ/Ω = 8.34483
est_n ← (µ/est_var) + start = 20

The element at index 20 is 204, which is greater than 192, so the proposed algorithm changes the starting and ending index to make the estimation more accurate and find the correct index of element 192. Thus,
start ← start + 0.5·Ω = 9
end ← est_n - 1 = 19
(the halving step start ← start + 0.5·Ω is applied to the already shortened window, i.e. with Ω = end - start after end has been set to est_n - 1).

In the second iteration:
est_var ← Δ/Ω = 10.2
est_n ← (µ/est_var) + start = 19

The element at index 19 is 197, which is still greater than 192, so the proposed algorithm again changes the starting and ending index. Thus,
start ← start + 0.5·Ω = 13
end ← est_n - 1 = 18

In the third iteration:
est_var ← Δ/Ω = 7
est_n ← (µ/est_var) + start = 18

At index 18 the element is 192, which is equal to the target element. Thus, the proposed algorithm finds the target element in 3 iterations.

With the help of the average variation and the index estimation formula, we end up with an estimated index for the target element. Notice that after the element at the estimated index is compared with the target element in the second iteration, the starting and ending index are updated. Two types of update are performed in the proposed algorithm: the first is based on the estimated index, and the second simply cuts the array from the left or right side. The purpose of both updates is to converge to the target element quickly.

On the basis of the estimated index:

s_num > arr[est_n]: If the target element is greater than the element at the estimated index, then to acquire a more accurate estimate we have to start from the index right after the current estimated index. Therefore, "start" becomes "est_n + 1" (since arr[est_n] != s_num, there is no need to keep est_n in the new window).

s_num < arr[est_n]: If the target element is smaller than the element at the estimated index, then to acquire a more accurate estimate we have to update the ending index. Since the target element is smaller than the element at the estimated index, it is better to take "est_n - 1" as the ending index for the next iteration, which gives a better estimation in the next iteration.

Now suppose there is an array whose elements come with very different variations, such as arr = {16, 81, 256, 625, 1296, 2401, 4096, 6561, 10000, 14641, 14642, 14643, 14644, 20736, 83521, 104976}. The difference between the first two elements is 65 and the difference between the next two is 175. Similarly, the difference between the 3rd and 4th elements is 369, and so on. But the difference between the 10th and 11th elements is just 1, and the same holds for the 11th and 12th and for the 12th and 13th. The variations among the last four elements are 6092, 62785 and 21455. With this much variation in the variations between the elements of the array, the average variation would not be good enough to give an accurate estimated index. This is where the second type of update of the starting and ending index comes in.

After assigning "start" or "end" to "est_n + 1" or "est_n - 1", the proposed algorithm further checks whether the right or the left half of the current array can be discarded such that the target element does not lie in the discarded part. After this validation, it discards that half of the array. This is essentially a fast way of converging towards the targeted element: as the array becomes smaller, the variation in the variations between its elements becomes smaller, so we get closer to the target element and the index estimation becomes better.
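The loop described above can be summarized in the following Python sketch. This is not the author's original implementation; in particular, the rounding of the estimated index and the exact form of the half-discard step are assumptions inferred from the worked examples, which the sketch reproduces.

def ash_search(arr, s_num):
    """Search the sorted list arr for s_num; return its index or -1."""
    start, end = 0, len(arr) - 1
    while start <= end:
        if s_num < arr[start] or s_num > arr[end]:
            return -1                          # target cannot lie in this window
        if start == end:
            return start if arr[start] == s_num else -1
        omega = end - start                    # Ω: index span of the window
        delta = arr[end] - arr[start]          # Δ: value span of the window
        if delta == 0:                         # all values in the window are equal
            return start
        est_var = delta / omega                # average variation per index
        mu = s_num - arr[start]                # µ: value offset of the target
        est_n = start + round(mu / est_var)    # estimated index of the target
        if arr[est_n] == s_num:
            return est_n
        if s_num > arr[est_n]:                 # target lies right of the estimate
            start = est_n + 1
        else:                                  # target lies left of the estimate
            end = est_n - 1
        if start > end:
            return -1
        # Second update: discard the half of the remaining window that
        # provably cannot contain the target.
        mid = start + (end - start) // 2
        if arr[mid] <= s_num:
            start = mid
        else:
            end = mid
    return -1

On the 30-element array above, ash_search(arr, 67) returns index 6 on the first estimate, and ash_search(arr, 192) returns index 18 with the window shrinking from 0..29 to 9..19 to 13..18, exactly as in the walk-through.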
Now consider an array containing 1000 elements increasing randomly, where the random function for this test is given as:

Next_element = Previous_element + (random() * 1000 + 1)

This means the random number is from 1 to 1000 and every next element is the sum of the previous element and the generated random number. The results of the different algorithms are as follows:

Table 3.0. Randomly generated elements

Algo.    TS    LI   MI   EI  TI     MaxI  AvgI
Interp.  1000  991  4    5   2754   7     2
Bin.     1000  -    -    -   8987   10    8
Pres.    1000  994  4    2   2613   5     2

Table 3.1. Randomly generated elements (Operational Analysis)

Algo.    LO   MO  EO  TO     MaxO  AvgO
Interp.  934  58  8   22217  53    22
Bin.     -    -   -   39991  48    39
Pres.    940  52  8   22714  47    22

It can be noticed that the average number of iterations and operations that binary search takes to find an element is very large compared to interpolation search and the presented algorithm. This is because binary search does not consider the nature of the data. Both interpolation search and the presented approach consider the variations between the data, so they converge to the target element in far fewer iterations and operations. Even in this case, the presented algorithm has the minimum MaxI and MaxO.

Now consider the case where interpolation search fails to produce an efficient result. This is the case where the data is equally distributed but at the end there are a few elements with a very large variation from their previous element (outlier elements), or where some equally distributed data is mixed with data coming with very different variations (clusters of elements). For this test we assumed an extreme instance of this case, represented as array = {1, 2, 3, 4, 5, 6, ..., 997, 998, 999, 1000000999}. The results of the different algorithms are as follows:

Table 4.1. Outlier elements (Operational Analysis)

Algo.    LO   MO   EO  TO       MaxO  AvgO
Interp.  5    995  0   4494511  8990  4494
Bin.     -    -    -   39991    48    39
Pres.    795  192  13  28782    110   28

The reason interpolation search fails to search efficiently in this case is the large variation between the 999th and 1000th elements. When some elements have a much larger variation between them than the other elements, the average variation becomes biased towards that large variation (it comes out very large and gives an estimated index very far from the actual index), which affects the search for all the other elements (i.e. all elements except the outliers). The presented algorithm, on the other hand, does not fail in this case because it also reduces the search space by half from one side, which makes the next estimation more accurate; due to this reduction of the search space it usually converges to the searched element in fewer operations. Looking closely at the operational analysis, the MaxO of the presented algorithm is greater than that of binary search; this does not matter much, because most of the time the presented approach performs much better than binary search (see AvgO), and the case where there is only a single outlier with such a large variation from the previous elements is very rare. The results get better when we consider real-life situations, which usually contain not just one outlier but several outliers or clusters of elements.

Now consider the case where the elements increase exponentially. This is the case where the variation of each pair of elements differs by a large amount from the variation of the other pairs. As a result, the average variation comes out very disturbed, which causes a very wrong prediction of the index of the target element. Thus, interpolation search fails to perform efficient searching in this case too. However, the presented algorithm again works much better than interpolation search, because it shortens the search space with each iteration, giving a better average variation and therefore a better prediction of the target element's index. The exponential function for this test is given below:

Next_element = i * i * i * i

Here 'i' is the index number, i.e. every element of the array is the 4th power of its index. In this test there is an array of 1000 elements, e.g. array = {1, 16, 81, 256, 625, ..., 996005996001, 1000000000000}.
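For reference, the three synthetic test sets used above (randomly increasing elements, a single outlier, and exponentially increasing elements) can be generated as in the following sketch. The function names and the use of randint are illustrative assumptions; the sizes and generating formulas follow the descriptions in the text.

import random

def random_increasing(n=1000):
    # Next_element = Previous_element + (random() * 1000 + 1):
    # each gap between consecutive elements is a random value from 1 to 1000.
    arr, prev = [], 0
    for _ in range(n):
        prev += random.randint(1, 1000)
        arr.append(prev)
    return arr

def single_outlier(n=1000):
    # Equally distributed data with one far-away outlier at the end,
    # i.e. {1, 2, 3, ..., 999, 1000000999}.
    return list(range(1, n)) + [1000000999]

def fourth_powers(n=1000):
    # Next_element = i*i*i*i: the i-th element (1-based) is the 4th power
    # of its index, i.e. {1, 16, 81, ..., 996005996001, 1000000000000}.
    return [i ** 4 for i in range(1, n + 1)]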
The results of the different algorithms are as follows:

Table 5.0. Exponentially increasing elements

Algo.    TS    LI   MI   EI  TI     MaxI  AvgI
Interp.  1000  230  730  40  41262  177   41
Bin.     1000  -    -    -   8987   10    8
Pres.    1000  933  32   35  5556   10    5

Table 5.1. Exponentially increasing elements (Operational Analysis)

Algo.    LO   MO   EO  TO      MaxO  AvgO
Interp.  118  880  2   370358  1592  370
Bin.     -    -    -   48978   60    50
Pres.    330  648  22  54506   109   54

When the elements increase exponentially, the variation between the elements of the array also increases and, as a result, the average variation becomes biased towards the large variations. Due to this bias (towards the large variations between the elements at the end of the array), the index estimation is very wrong for elements near the beginning of the array. Interpolation search then takes many iterations to find the target element because, after predicting a very wrong index, it moves forward in a linear fashion: if it predicts index 2 but the actual index is 28, in the next iteration it predicts index 3, then index 4, and so on (the same happens in the "Outlier elements" test set). The presented algorithm, however, works well in this case because it reduces the search space with each iteration, so a better average variation comes out and the prediction of the target element's index is more accurate. In the operational analysis of test sets 3 and 4 it can also be noticed that the maximum number of operations (MaxO) taken by the presented algorithm exceeds the maximum number of operations taken by binary search, which shows that the efficiency of the presented approach is slightly lower than binary search when the elements have very different variations. This does not have a significant effect, however: even in these cases the presented algorithm usually takes fewer operations than binary search, as the AvgO column of the operational analysis shows, and such cases are rare.

Moreover, some further tests have been run on the presented algorithm, and in these too the presented approach works better than the other search algorithms. One of these tests is string searching. For this test, CMUdict[16], the Carnegie Mellon University Pronouncing Dictionary (an open-source machine-readable pronunciation dictionary for North American English that contains over 134,000 words and their pronunciations), is used. To perform this test, a hashing function is used to convert each string/word into a unique number according to its characters. Each word/string is then searched; about 132,905 searches were made, out of which in 132,524 searches the presented algorithm works better than binary search, in 178 searches it works the same as binary search, and in the remaining searches the presented algorithm takes a few more iterations than binary search. It was noticed that the presented approach takes a maximum of log n steps to find any word/string (17 iterations at most) and on average it takes log(log n) steps (only 4 or 5 iterations), which is a huge advantage over other search algorithms.

7. COMPLEXITY ANALYSIS AND ITS COMPARISON
In the best case, all three algorithms have constant time complexity. The presented algorithm estimates the position of the target element on the basis of the estimated variation between the elements of the array. If the elements of the array increase with equal variation, i.e. the data is equally distributed, the average variation comes out very accurate and the presented algorithm finds the target element in constant time. For example, the array = {2, 4, 5, 8, 9, 11, 12, 14, 16, 18, 20, 22, 23, 25, 27} contains equally distributed data; every search made on this array completes in constant time. Thus, the time complexity of the presented algorithm in the best case is O(1).

In the average case, the performance of the presented algorithm is slightly better than interpolation search and much better than binary search, because in the average case the presented algorithm estimates the position of the target element on the basis of an average variation that is not very disturbed (the elements come with random variations). Thus, it converges to the target element in far fewer iterations than binary search. Since we use the same approach as interpolation search, we can say that the average-case complexity of the presented algorithm is the same as that of interpolation search, O(log(log n)), where 'n' is the number of elements in the data structure/array. Note that the search-space reduction in the presented algorithm follows both techniques, i.e. the reduction used in interpolation search and the one used in binary search; but in the average case the reduction is dominated by the interpolation-style one (even in the first iteration the average variation is accurate enough to move very close to the target element, so the binary-style reduction is hardly needed). Binary search, on the other hand, is not affected by variations or any other factor: it is not concerned with how the elements of the data increase, it simply probes the middle element and discards half of the elements. Thus, its average-case time complexity is O(log n).

The most interesting thing happens in the worst case. Binary search keeps the same complexity as in the average case, O(log n), but interpolation search fails to search efficiently in the worst case. This is because of elements coming with very different variations; in other words, there is a large difference between the variations of the pairs of elements, or there is a cluster of elements in the search space, which causes very inaccurate index estimation. That is why interpolation search takes almost 'n' iterations in the worst case, i.e. its worst-case complexity is O(n), which is comparatively very bad. The presented algorithm, using its index estimation formula together with its search-space reduction technique, overcomes this problem and finds the target element in fewer iterations. The worst-case complexity of the presented approach is not known precisely; however, it is evaluated to be O(log n) + O(log(log n)), which eventually becomes O(log n).
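As a quick check of the best-case claim, the equally distributed example array above can be plugged into the ash_search sketch given earlier. Under that sketch's rounding assumption, the very first estimated index already equals the true index for every element of this particular array, so each search finishes in a single iteration:

arr = [2, 4, 5, 8, 9, 11, 12, 14, 16, 18, 20, 22, 23, 25, 27]
start, end = 0, len(arr) - 1
est_var = (arr[end] - arr[start]) / (end - start)    # average variation = 25/14

# The first estimate hits the true index for every element of this array,
# so each of these searches ends in the first iteration (constant time).
for i, value in enumerate(arr):
    est_n = start + round((value - arr[start]) / est_var)
    assert est_n == i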
Comparison of the different search algorithms' complexities is as follows:

Fig 2: Different cases comparison

In Figure 2 it can be noticed that the presented algorithm is a drastic improvement over binary and interpolation search. In the worst case, interpolation search totally fails to produce a result efficiently, but the presented algorithm works well. The main advantage of the presented algorithm is in the average case and in some of the worst-case scenarios. The grey shaded region represents the cases where the proposed approach works more efficiently than both interpolation and binary search; these cases include randomly increasing elements, clusters of elements, outliers, etc. in the search space. In the remaining cases the presented approach works more or less the same as binary search. Thus, it is better to use the proposed approach than the other algorithms in any case.

8. ADVANTAGES AND DISADVANTAGES
The presented algorithm can be used in any scenario and eliminates the need to analyze the given problem to look for the apt algorithm. It works well in all scenarios and has better efficiency than the best search algorithms in the sorted-array domain: fewer iterations are needed to find the target element, no extra space is needed, and the execution time is also better than that of the other algorithms. There are no real cons of the presented approach other than that it is slightly more costly per iteration (evaluating the estimation formula and the search-space reduction makes it more costly). However, even after including these extra costs in the algorithm's time complexity, it works very well in the best and average cases and performs almost identically to binary search in the worst case. Another disadvantage is that it only works on numbers: to find a string in a batch of strings, the array of strings must first be converted into an array of numbers using some hashing function.

10. ACKNOWLEDGMENT
The author thanks Dr. Muhammad Ali Tahir (Assistant Professor, Department of Computing, SEECS) for his guidance and inspiration.

11. REFERENCES
[1] D. E. Knuth, "The Art of Computer Programming", Vol. 3: Sorting and Searching, Addison Wesley, 1973.
[2] F. Plavec, Z. G. Vranesic, S. D. Brown, "On Digital Search Trees: A Simple Method for Constructing Balanced Binary Trees", in Proceedings of the 2nd International Conference on Software and Data Technologies (ICSOFT '07), Vol. 1, Barcelona, Spain, July 2007, pp. 61-68.
[3] W. W. Peterson, "Addressing for Random-Access Storage", IBM Journal of Research & Development, doi: 10.1147/rd.12.0130, 1957.
[4] B. Shneiderman, "Jump Searching: A Fast Sequential Search Technique", Communications of the ACM, Vol. 21, NY, USA, October 1978, pp. 831-834, doi: 10.1145/359619.359623.
[5] P. Kaewprapha, T. Tansarn, N. Puttarak, "Network localization using tree search algorithm: A heuristic search via graph properties", 13th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), 2016.
[6] P. Kumar, "Quadratic Search: A New and Fast Searching Algorithm (An extension of classical Binary search strategy)", International Journal of Computer Applications, Vol. 65, Hamirpur, Himachal Pradesh, India, March 2013.
[7] H. Schmid, "Decimal Computation (1st ed.)", John Wiley & Sons Inc., NY, USA, 1974.
[8] J. L. Bentley, A. C. Yao, "An almost optimal algorithm for unbounded searching", Information Processing Letters, Vol. 5, Issue 3, pp. 82-87, doi: 10.1016/0020-0190(76)90071-5, ISSN 0020-0190, 1976.
[9] B. Chazelle, L. J. Guibas, "Fractional cascading: A data structuring technique", Algorithmica, Vol. 1, Issue 1-4, pp. 133-162, November 1986.
[11] J. Han, M. Kamber, J. Pei, "Data Mining: Concepts and Techniques (3rd ed.)", ISBN: 978-0-12-381479-1, June 2011.
[12] V. Korepin, Y. Xu, "Binary Quantum Search", International Journal of Modern Physics B, Vol. 21, pp. 5187-5205, doi: 10.1117/12.717282, May 2007.
[13] A. Pelc, "Searching with known error probability", Theoretical Computer Science, pp. 185-202, doi: 10.1016/0304-3975(89)90077-7, 1989.
[14] R. L. Rivest, A. R. Meyer, D. J. Kleitman, "Coping with errors in binary search procedures", STOC '78: Proceedings of the 10th Annual ACM Symposium on Theory of Computing, pp. 227-232, San Diego, California, USA, doi: 10.1145/800133.804351, 1978.
[15] D. E. Ferguson, "Fibonaccian searching", Communications of the ACM, Vol. 3, Issue 12, NY, USA, doi: 10.1145/367487.367496, 1960.
[16] K. Lenzo, "The CMU Pronouncing Dictionary", Speech at CMU. Retrieved Aug 13, 2018 from http://www.speech.cs.cmu.edu/cgi-bin/cmudict.