Chapter Three Searching and Sorting Algorithms Mod 2015
Simple Searching and Sorting Algorithms
• Why study searching and sorting algorithms rather than
other algorithms?
Simple Searching Algorithms
• Examples of searching: an application program might
– Retrieve a record in a database matching a certain criterion
(e.g. Name = "Scott")
– Retrieve or Search a student record from a database
– Retrieve or search a specific document from the Internet
– Searching a file from the hard disk
– Retrieve a bank account record, credit record
– Deleting a record from a database (requires locating the
required item from a database)
– Find the maximum of a set of values
– Find the minimum of a set of values
– Find the median of a set of values
• We will consider two searching algorithms:
– Sequential search
– Binary search
Sequential Searching
• Also called linear search or serial search
Sequential Search of a student database
#   name          student #   major     credit hrs.
1   John Smith    58567       physics   36
2   Paula Jones   36794       history   125
3   Led Belly     85674       music     72
.   .             .           .         .
n   Chuck Bin     43687       math      89
Algorithm to search the database by student #:
1. ask user to enter studentNum to search for
2. set i to 1
3. set found to 'no'
4. while i <= n and found = 'no' do
5.    if studentNum = student_i then set found to 'yes'
      else increment i by 1
6. if found = 'no' then print "no such student"
   else <student found at array index i>
What are this algorithm's time requirements? Hint: focus on the loop body
Sequential Searching
• Algorithm for sequential search
– Steps through the list from the beginning one item at a time
looking for the desired item
– The search stops when the item (that is, the key) is found or
when the search has examined each item without success
(that is, until the end of the list is reached)
Implementation of sequential search
• The same algorithm for sequential search, written in the C++
programming language:
int Sequential_Search(int list[], int n, int key) {
    int location = -1;            // -1 means "not found"
    for (int i = 0; i < n; i++) {
        if (list[i] == key) {     // key found: record its index
            location = i;
            break;
        }
    }
    return location;
}
Analysis of Sequential Search
• How do we estimate the complexity of this algorithm?
Analysis of Sequential Search
• A useful metric would be general (That is, applicable to any (search)
algorithm)
• One such metric is the number of main steps the algorithm will
require to finish
Time requirements for sequential search
• Best case (minimum amount of work):
– The target is the first element of the list, so only one
comparison is made: Tbest(n) = O(1)
Time requirements for sequential search
• Worst case (maximum amount of work):
– For the serial search, the worst case running time occurs
either when the target is not in the list or the target is the
last element of the list
– In both cases n comparisons are made: Tworst(n) = O(n)
Time requirements for sequential search
• The running time grows linearly with the size of the problem n:
time = m*n + b (a straight line, y = mx + b)
• Example: searching a list of 150,000 records at 20,000
comparisons per second:
– Average case: (150,000 / 2) comparisons * (1 second / 20,000
comparisons) = 3.75 seconds
– Worst case: 150,000 comparisons * (1 second / 20,000
comparisons) = 7.5 seconds
Bad news for searching phone book, IRS database, ...
Time requirements for sequential search
• Average case (expected amount of work)
– One way of developing an expression for the average
running time of serial search is based on all the targets that
are actually in the list
– Suppose (to be concrete) the list has ten elements
– So, there are ten possible targets
– A search for the target that occurs at the first location
requires just one comparison (or array access)
– A search for the target that occurs at the second location
requires two comparisons (or array accesses)
– And so on, through the final target, which requires ten
comparisons (or array accesses)
– In all there are ten possible targets, which require 1, 2, 3, 4,
5, 6, 7, 8, 9, 10 comparisons (or array accesses), respectively
Time requirements for sequential search
• Average case (expected amount of work )
– The average of all these searches is:
(1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10) / 10 = 5.5
– Generalization
• The average-case running time of the serial search can be
written as an expression with part of the expression being
the size of the list (array)
• Using n as the size of the array, this expression for the
average-case amount of work is:
(1 + 2 + ... + n) / n = (n + 1) / 2
– Taverage(n) = O(n)
Sequential search - Analysis
• Summary: Tbest(n) = O(1), Taverage(n) = O(n), Tworst(n) = O(n)
Pros and Cons of Sequential Search Algorithm
• The principal strength of linear search is that it doesn't require
the elements of the list to be in any particular order, only that
we can step through the list
• It is easy to implement
• It is easy to analyze
• It is fine to use if you are searching a small list (or array) a few
times
Binary search
• When the values of a list are in sorted order, there are better
searches than sequential search for determining whether a
particular value is in the list
• For example, when you look up a name in the phone book, you
do not start at the beginning and scan through until you find the
name
• You use the fact that names are listed in sorted order and use
some intelligence to jump quickly to the right page and then
start scanning
Binary Search
– Binary search is one such example: a better search than
sequential search for determining whether a particular value
is in a sorted list
Binary Search
• How it works
– A flag called found is set to false, meaning NOT found
– The middle position of the current search range is calculated
– The target is compared with the middle element: if they are
equal the search is over; if the target is smaller, the search
continues in the lower half; otherwise it continues in the
upper half
– Each comparison has the effect of discarding about half of the
remaining array
– If the value does not exist, the loop will terminate after the
array has been subdivided so many times that the last position
is a smaller integer than the first position, which of course is
impossible in an array
– That is, the search ends when the target item is found or
the values of first and last cross over, so that last < first,
indicating that no list items are left to check
Pseudocode for Binary Search
1. Locate the midpoint of the array to search
2. Compare the target with the element at the midpoint:
i. If they match, return the midpoint index
ii. If the target is smaller, make the lower half the array to search
iii. If the target is larger, make the upper half the array to search
3. Loop back to step 1 until the target is found or the array to
search is empty, in which case return -1
• Implementation of binary search algorithm in C++
int Binary_Search(int list[], int n, int key) {
    int left = 0;
    int right = n - 1;
    int mid, index;
    int found = 0;
    do {
        mid = (left + right) / 2;
        if (key == list[mid])
            found = 1;           // target located at position mid
        else {
            if (key < list[mid])
                right = mid - 1; // discard the upper half
            else
                left = mid + 1;  // discard the lower half
        }
    } while (found == 0 && left <= right);
    if (!found)
        index = -1;              // key is not in the list
    else
        index = mid;
    return index;
}
Binary Search
• Remark
– It might appear that this algorithm changes the list, because each
time through the while loop half of the list is "discarded", so that
at the end the list may contain only one element
– Rather than changing the list, the algorithm changes the part of
the list to search
– The best-case analysis does not reveal much: the target is found
at the middle on the first comparison, so Tbest(n) = O(1)
Analysis of Binary Search
• Worst case
– The worst case occurs when the target is not in the list, so the
search range must be halved until it is empty
– Generalization
• If n is the size of the list to be searched and C is the
number of comparisons to do so in the worst case, then
C ≈ log2 n
n              C
100,000        16
200,000        17
400,000        18
800,000        19
1,600,000      20
Sequential Vs Binary search
• One can estimate from a graph of comparisons versus list size
which one performs better
Sorting Algorithms
• Another very common operation is sorting a list of values or
data items
• Example
– Sorting a list of students in ascending order
– Sorting file search results by name, date modified, type
Time requirements for binary search
In the worst case (i.e. when studentNum is not in the list called student),
how many times will the list be halved?
n = 16:
1st iteration: 16/2 = 8
2nd iteration: 8/2 = 4
3rd iteration: 4/2 = 2
4th iteration: 2/2 = 1
The number of times a number n can be cut in half and not go below 1
is log2 n. Said another way: log2 n = m is equivalent to 2^m = n
In the average case and the worst case, binary search is O(log2 n)
This is a major improvement.
Number of comparisons needed in the worst case:
n              sequential search O(n)   binary search O(log2 n)
100            100                      7   (2^7 = 128)
150,000        150,000                  18  (2^18 = 262,144)
20,000,000     20,000,000               25  (2^25 is about 33,000,000)
In terms of seconds, at 20,000 comparisons per second:
sequential search: 150,000 comparisons * (1 second / 20,000
comparisons) = 7.5 seconds
vs.
binary search: 18 comparisons * (1 second / 20,000 comparisons)
< .001 seconds
Conclusion
• The design of an algorithm can make the difference
between useful and impractical.
• Efficient algorithms for searching require lists that are
sorted.
• Next time we’ll see good and bad ways to do that, too.
Sorting: Analysis and Complexity
Sorting:
•one of most frequently performed tasks
•intensively studied
•many classic algorithms
•there remain unsolved problems
•new algorithms are still being developed
•refinements are very important for special cases
•good for illustrating analysis and complexity issues
•motivate use of file processing (when main memory is too
small)
•amenable to parallel implementation
Analysis and abstraction
Why not just measure their execution time?
Because this depends, in many cases, on the specific input array values:
•number of records,
•size of keys and records
•allowable range of key values
•the amount of (dis)order in original input
Traditional to measure number of comparisons: machine and type
independent, but when records are very large then their movement (in the
swap) may take a significant amount of time which would not be taken
into account using the comparison count. In these cases, it is better to
measure the number of swaps.
When comparisons and swaps take roughly the same amount of constant
time then it is best to measure/count both … this is particularly true when
records and keys are of constant size.
Question: there are always special cases … can you think of any?
Terminology and notation
For now, input is assumed to be a collection of records in an array (we will
look at linked list implementations later)
Each record must contain (at least) one sort key
The key can be of any type so long as there is a linear ordering relation
Assume that there is a function/method key that returns the key for a record
A typical comparison between records R and S will look like:
if (R.key() <= S.key()) …
We will also assume that there is a swap method which is called as follows:
swap(array, x,y)
An additional requirement must be placed on the handling of duplicates:
•can the algorithm cope with duplicate keys?
•how should it cope? An algorithm is stable if it maintains the
ordering of records with the same key.
Three O(n^2) Sorting
Algorithms
Selection, Bubble and Insertion Sort
•Exchange Sorts
•Simple to understand/code
•Unacceptably slow for large arrays
•Some cases where they are acceptable
•You have already seen them informally
•We will now code them in C++
Internal and external sorting
• Internal sorting:
•The process of sorting is done in main memory
•The number of elements is relatively small (less than millions)
•The input fits into main memory
•The main advantage of this type of sorting is that memory is
directly addressable, which boosts the performance of the
sorting process
• Some of the algorithms that are internal are:
• Bubble Sort
• Insertion Sort
• Selection Sort
• Shell Sort
• Heap Sort
External sorting:
•Cannot be performed in main memory due to the large input
size, i.e., the input is too large to fit into main memory
• Sorting is done on disk or tape
• It is more device-dependent than internal sorting
•Some of the algorithms that are external are:
• Simple algorithm - uses the merge routine from merge sort
– Multiway merge
– Polyphase merge
– Replacement selection
• Assumptions:
Our sorting is comparison based.
Each algorithm will be passed an array containing N
elements.
N is the number of elements passed to our sorting
algorithm
The operators that will be used are "<", ">", and "=="
Simple sorting algorithms
•The following are simple sorting algorithms
used to sort small-sized lists.
Insertion Sort
Selection Sort
Bubble Sort
Bubble Sort
•Bubble sort is the simplest algorithm to implement and the
slowest algorithm on very large inputs.
•The basic idea is: loop through the list comparing adjacent pairs
of elements, and whenever two elements are out of order with
respect to each other, interchange them.
•Target: pull the least remaining element up to the front of the
list, like a rising bubble.
• The process of sequentially traversing through all or part of the
list is known as a pass.
• Implementation: this algorithm needs two nested loops. The
outer loop controls the number of passes through the list, and
the inner loop controls the number of adjacent comparisons.
• Example: suppose the following list of numbers is stored in an
array a:
Index     0   1   2   3   4   5   6   7   8
Element   7  20  15   3  72  13  11  32   9
Index  Initial  Pass1  Pass2  Pass3  Pass4  Pass5
0      7        3      3      3      3      3
1      20       7      7      7      7      7
2      15       20     9      9      9      9
3      3        15     20     11     11     11
4      72       9      15     20     13     13
5      13       72     11     15     20     15
6      11       13     72     13     15     20
7      32       11     13     72     32     32
8      9        32     32     32     72     72
Each pass pulls the smallest remaining element to the front;
passes 6 to 8 make no further changes.
Implementation:
void bubble_sort(int list[], int n)
{
    int i, j, temp;
    for (i = 0; i < n; i++) {
        for (j = n - 1; j > i; j--) {
            if (list[j] < list[j - 1]) {
                temp = list[j];          // swap adjacent elements
                list[j] = list[j - 1];
                list[j - 1] = temp;
            }
        } // end of inner loop
    } // end of outer loop
} // end of bubble_sort
Analysis of Bubble Sort
•How many comparisons?
(n-1)+(n-2)+…+1 = O(n2)
•How many swaps?
at most (n-1)+(n-2)+…+1 = O(n2)
•Space?
• In-place algorithm
•Conclusion: bubble sort is terrible in all cases
• If there is no swap in the ith pass, it implies that the elements
are already sorted (sorting is done early). Thus there is no need
to continue with the remaining passes.
Modified bubble sort:
void mBubbleSort(int a[], int n)
{
    int i = 0;
    int swapped = 1;
    int temp;
    while (i < n - 1 && swapped) {
        swapped = 0;              // assume this pass makes no swap
        for (int j = n - 1; j > i; j--) {
            if (a[j] < a[j - 1]) {
                temp = a[j];
                a[j] = a[j - 1];
                a[j - 1] = temp;
                swapped = 1;      // a swap occurred: keep going
            }
        }
        i++;
    }
}
Analysis:
in the worst case the number of comparisons is still 1+2+..+(n-1) = O(n^2),
and the number of swaps is, on average, half the comparisons, also O(n^2)
but in the best case (input already sorted) only one pass of n-1
comparisons is made, which is O(n)
Conclusion: plain bubble sort is terrible in all cases, but the modified
version does well on input that is already (nearly) sorted
Insertion sort
•Insertion sort adds one element to the list at a time, placing it in
the proper place.
•Cards analogy
•Sequentially process a list of records.
•Each record is inserted in turn at the correct position within a
sorted list composed of records already processed.
Basic Idea:
•Find the location for an element and move all others up, and insert the
element.
•The process involved in insertion sort is as follows :
1.The left most value can be said to be sorted relative to itself. Thus,
we don’t need to do anything.
2.Check to see if the second value is smaller than the first one. If it
is, swap these two values. The first two values are now relatively
sorted.
3.Next, we need to insert the third value into the relatively sorted
portion so that after insertion, the portion will still be relatively
sorted.
4. Remove the third value first. Slide the second value to make
room for insertion. Insert the value in the appropriate position.
5.Now the first three are relatively sorted.
6. Do the same for the remaining items in the list.
void insertion_sort(int a[], int n) {
    int temp;
    int j;
    for (int i = 1; i < n; i++) {
        temp = a[i];                  // value to insert
        for (j = i; j > 0; j--) {
            if (temp < a[j - 1]) {
                a[j] = a[j - 1];      // slide larger values up
            }
            else
                break;
        }
        a[j] = temp;                  // insert at the right place
        cout << "Output after pass:" << i << endl;
        display(a, n);                // display is assumed to print the array
    }
}
Analysis
•How many comparisons?
1+2+3+…+(n-1)= O(n2)
•How many swaps?
1+2+3+…+(n-1)= O(n2)
•How much space?
• In-place algorithm
Insertion Sort Illustrated
initial  x=1  x=2  x=3  x=4  x=5
  10       5    1    1    1    1
   5      10    5    5    5    4
   1       1   10    8    8    5
   8       8    8   10   10    8
  13      13   13   13   13   10
   4       4    4    4    4   13
After pass x, the first x+1 elements form the SORTED PART of the
array; the rest is the UNSORTED PART.
Implementation Notes:
we appear to use only the array space of the original input array, but this can
only be verified by analysis of the swap routine.
for each element x the number of swaps is between 0 and x (inclusive)
Analysis of Insertion Sort
Two nested for loops
•Outer loop is always executed n-1 times (where n is the size of the array)
•Inner for loop is more difficult to analyse: the number of iterations depends
on the input array.
Worst case analysis:
each record must go to the top of the array (when the input is originally in
reverse order). In this case, the number of comparisons = 1+2+3+…+(n-1).
This has complexity O(n^2)
Best case analysis:
keys are sorted and only 1 comparison is needed each time through the loop.
In this case, the number of comparisons = n-1
This has complexity O(n)
Conclusion: when we know that the disorder in a list is slight, insertion sort
is a good choice for re-sorting; otherwise insertion sort is too slow.
Selection Sort
Basic Idea:
•Loop through the array from i=0 to n-1.
•Select the smallest element in the array from position i to n-1.
•Swap this value with the value at position i.
Implementation of selection sort in C++
void selection_sort(int a[], int n)
{
    int i, j, smallest, temp;
    for (i = 0; i < n; i++) {
        smallest = i;                 // index of the smallest so far
        for (j = i + 1; j < n; j++) {
            if (a[j] < a[smallest])
                smallest = j;
        } // end of inner loop
        temp = a[smallest];           // swap smallest into position i
        a[smallest] = a[i];
        a[i] = temp;
    } // end of outer loop
} // end of selection_sort
ANALYSIS of Selection Sort
•Selection sort is like a bubble sort, except that rather than
repeatedly swapping adjacent elements, we remember the
position of the element to be selected and do the swap at the
end of the inner loop.
How many comparisons?
•(n-1)+(n-2)+…+1 = O(n2)
•How many swaps?
•n = O(n)
•How much space?
• In-place algorithm
Illustrating Selection Sort
initial  x=1  x=2  x=3  x=4  x=5
  10       1    1    1    1    1
   5       5    4    4    4    4
   1      10   10    5    5    5
   8       8    8    8    8    8
  13      13   13   13   13   10
   4       4    5   10   10   13
After pass x, the first x elements form the ORDERED TOP of the
array; the rest is the UNORDERED BOTTOM.
Exchange Sort Analysis --- A summary
                Insertion   Bubble      Selection
Comparisons
 best case      O(n)        O(n^2)      O(n^2)
 average case   O(n^2)      O(n^2)      O(n^2)
 worst case     O(n^2)      O(n^2)      O(n^2)
Swaps
 best case      0           0           O(n)
 average case   O(n^2)      O(n^2)      O(n)
 worst case     O(n^2)      O(n^2)      O(n)
The crucial bottleneck, with these exchange sorts, is that only adjacent items
are compared.