Unit IV
Sorting: O notation – efficiency of sorting – bubble sort – quick sort – selection sort – heap sort
– insertion sort – shell sort – merge sort – radix sort.
O NOTATION
Example:
Let's take an example of Big-O. Say that f(n) = 2n + 8, and g(n) = n². Find a constant n0 so
that 2n + 8 <= n² for all n >= n0. The number 4 works here, giving us 16 <= 16. For any number n greater than
4, this will still work. Since we're trying to generalize this for large values of n, and small values
of n (1, 2, 3) aren't important, we can say that f(n) generally grows more slowly than g(n); that is, f(n) is bounded
above by g(n), and will always be less than it.
To find the upper bound - the Big-O time - assuming we know that f(n) is equal to (exactly) 2n +
8, we can take a few shortcuts. For example, we can remove all constants from the runtime;
eventually, at some value of c, they become irrelevant. This makes f(n) = 2n. Also, for
convenience of comparison, we remove constant multipliers; in this case, the 2. This makes f(n)
= n. It could also be said that f(n) runs in O(n) time; that lets us put a tighter (closer) upper bound
onto the estimate.
O(n): printing a list of n items to the screen, looking at each item once.
O(ln n): also "log n", taking a list of items, cutting it in half repeatedly until there's only one item
left.
O(n²): taking a list of n items, and comparing every item to every other item.
O(1)
An algorithm with this running time is said to have "constant" running time. Basically, this
means the algorithm always takes about the same amount of time, regardless of the size of the
input. To state it technically, if an algorithm will never perform more than a certain number of
steps, no matter how large the input gets, then that algorithm is considered to have a constant
running time. For example, an algorithm which consists of performing exactly 7 multiplications
has a constant running time. Although constant time is the best running time an algorithm can
have, that algorithm could still be considered bad if the total amount of time to run the algorithm
were too large.
Some examples of O(1) algorithms include: inserting an element onto the front of a linked list,
popping from or pushing onto a stack, and retrieving the nth element of an array.
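As a minimal illustration (the fixed stack capacity and the names below are arbitrary choices), each of these operations performs a fixed number of steps no matter how large the surrounding data is:

#include <stdio.h>

int stack[100];                 /* fixed-capacity array stack */
int top = -1;                   /* -1 means the stack is empty */

void push(int x) { stack[++top] = x; }    /* O(1): one increment, one write */
int  pop(void)   { return stack[top--]; } /* O(1): one read, one decrement  */

int main(void)
{
    int a[5] = {10, 20, 30, 40, 50};
    printf("%d\n", a[3]);       /* retrieving the nth element: O(1) */
    push(7);
    printf("%d\n", pop());      /* prints 7 */
    return 0;
}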
O(n)
An algorithm which runs in O(n) is said to have a "linear" running time. This basically means
that the amount of time to run the algorithm is proportional to the size of the input. To be
technical, an algorithm which never performs more than a certain number of steps for each
element in the input has a linear running time. For example, an algorithm which sums the total
of a list of numbers has a linear running time, because the number of additions required is the
same as the number of elements.
Some examples of O(n) algorithms include searching through an unordered list, incrementing
every element of an array, and calculating Fibonacci numbers using dynamic programming.
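As a minimal illustration, summing a list performs exactly one addition per element, so the work grows linearly with n:

#include <stdio.h>

int sum(const int a[], int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)   /* the loop body runs exactly n times */
        total += a[i];
    return total;
}

int main(void)
{
    int a[] = {3, 1, 4, 1, 5};
    printf("%d\n", sum(a, 5));    /* prints 14 */
    return 0;
}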
O(n²)
An algorithm with this running time is said to have "quadratic" running time. This means that
whenever you increase the size of the input by some factor, the running time increases by the
square of that factor. For example, if you double the size of the input of a quadratic algorithm, then the
running time will quadruple.
Some sorting algorithms, such as insertion sort and bubble sort, have quadratic running times.
O(lg n)
An algorithm with O(lg n) running time is said to have "logarithmic" running time. This
means that as the size of the input grows, the running time grows only with the logarithm of the
input size. For example, if you increase the input size of an O(lg n) algorithm by a factor of 1024,
the running time grows by only about 10 extra steps (since lg 1024 = 10). This running time is better than
O(n), but not as good as O(1). As the input size gets large, however, the behavior becomes
comparable to O(1) in many circumstances. Algorithms which search through ordered lists or
binary trees, as well as operations on heaps, generally have logarithmic running times.
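As an illustration of a logarithmic algorithm, the following C sketch searches an ordered array; each comparison halves the remaining range, so at most about lg n comparisons are made (the function name is an arbitrary choice):

int binary_search(const int a[], int n, int key)
{
    int low = 0, high = n - 1;
    while (low <= high)
    {
        int mid = low + (high - low) / 2;   /* middle of the current range */
        if (a[mid] == key)
            return mid;                     /* found */
        else if (a[mid] < key)
            low = mid + 1;                  /* discard the left half */
        else
            high = mid - 1;                 /* discard the right half */
    }
    return -1;                              /* not present */
}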
O(n lg n)
An algorithm of this order increases in running time proportionately to the size of
the input times the logarithm of the size of the input. Technically speaking, an algorithm which,
when given an input of size n, never performs more than cn lg n steps has a running time of
O(n lg n). This running time is better than O(n²) but not quite as good as O(n).
The fastest comparison-based sorting algorithms, including mergesort and quicksort, have O(n lg n) running times.
SORTING
The two main classifications of sorting based on the source of data are
a. Internal sorting
b. External sorting
In addition, independently of this classification, a sort may or may not be stable.
External sorting
External sorting is a process of sorting in which large blocks of data stored in storage
devices are moved to the main memory and then sorted, i.e., a sort is external if the records
that it is sorting are in auxiliary storage.
Internal sorting
Internal sorting is a process of sorting the data in the main memory, i.e., a sort is internal if
the records that it is sorting are in main memory.
Stable sort
A sorting technique is called stable if for all records i and j such that k[i] equals k[j], if
r[i] precedes r[j] in the original file, r[i] precedes r[j] in the sorted file. That is, a stable sort
keeps records with the same key in the same relative order that they were in before the sort.
Various factors to be considered in deciding a sorting algorithm
a. Programming time
b. Execution time of the program
c. Memory needed for program environment
Bubble Sort:
In this sorting method, to arrange elements in ascending order, we begin with the 0th element
and compare it with the 1st element.
If it is found to be greater than the 1st element, the two are interchanged.
In this way all the elements (excluding the last) are compared with their next element and are
interchanged if required. On completing the first iteration, the largest element is placed at
the last position. After all the iterations, the list becomes a sorted list.
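The following is a minimal C sketch of the method just described (the function name is illustrative):

void bubble_sort(int a[], int n)
{
    /* after pass i, the (i+1)th largest element is in its final place */
    for (int i = 0; i < n - 1; i++)
        for (int j = 0; j < n - 1 - i; j++)   /* compare adjacent pairs */
            if (a[j] > a[j + 1])
            {
                int temp = a[j];              /* interchange */
                a[j] = a[j + 1];
                a[j + 1] = temp;
            }
}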
The best case involves performing one pass, which requires n-1 comparisons.
Consequently, the best case is O(n). The worst case performance of the bubble sort is
n(n-1)/2 comparisons and n(n-1)/2 exchanges. The average case is more difficult to analyze
than the other cases. It can be shown that the average case analysis is O(n²). The average
number of passes is approximately n − 1.25√n; for n = 10 the average number of passes is about 6.
The average numbers of comparisons and exchanges are both O(n²).
Selection Sort
Selection sort is the easiest method of sorting. To sort the data in ascending order, the 0th
element is compared with all other elements; whenever the 0th element is found to be greater
than the compared element, the two are interchanged. In this way, after the first iteration the
smallest element is placed at the 0th position. The procedure is then repeated from the 1st element.
First iteration
23 15 29 11 1    compare 23 and 15: exchange
15 23 29 11 1    compare 15 and 29: no exchange
15 23 29 11 1    compare 15 and 11: exchange
11 23 29 15 1    compare 11 and 1: exchange
1  23 29 15 11

Second iteration
1 23 29 15 11    compare 23 and 29: no exchange
1 23 29 15 11    compare 23 and 15: exchange
1 15 29 23 11    compare 15 and 11: exchange
1 11 29 23 15

Third iteration
1 11 29 23 15    compare 29 and 23: exchange
1 11 23 29 15    compare 23 and 15: exchange
1 11 15 29 23

Fourth iteration
1 11 15 29 23    compare 29 and 23: exchange
1 11 15 23 29

Sorted Array: 1 11 15 23 29
In this algorithm, the search for the record with the next smallest key is called a pass.
There are n-1 such passes required in order to perform the sort, because each pass
places one record into its proper location.
During the first pass, in which the record with the smallest key is found, n-1
records are compared. In general, for the ith pass of the sort, n-i comparisons are required.
The total number of comparisons is therefore the sum
(n-1) + (n-2) + ... + 1 = n(n-1)/2
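A minimal C sketch of this exchange-style selection sort, matching the trace above (the function name is illustrative):

void selection_sort(int a[], int n)
{
    for (int i = 0; i < n - 1; i++)       /* n-1 passes */
        for (int j = i + 1; j < n; j++)   /* n-i comparisons in pass i+1 */
            if (a[i] > a[j])
            {
                int temp = a[i];          /* interchange */
                a[i] = a[j];
                a[j] = temp;
            }
}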
INSERTION SORT
In insertion sort, the first iteration starts with a comparison of the 1st element with the 0th element.
In the second iteration the 2nd element is compared with the 0th and 1st elements. In general, in every
iteration an element is compared with all the elements before it.
If at some point it is found that the element can be inserted at a position, space is created
for it by shifting the other elements one position to the right, and the element is inserted at the
suitable position.
23 15 29 11 1
First iteration: insert 15 before 23, no other change
15 23 29 11 1
Second iteration: 29 stays in place
15 23 29 11 1
Third iteration: insert 11 at the front
11 15 23 29 1
Fourth iteration: insert 1 at the front
Sorted Array: 1 11 15 23 29
In the worst case, the algorithm makes (i+1) comparisons before making the insertion. Hence
the computing time for the insertion is O(i), and the overall worst case time is O(n²). The average
case time is also O(n²).
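A minimal C sketch of insertion sort as described (the function name is illustrative):

void insertion_sort(int a[], int n)
{
    for (int i = 1; i < n; i++)
    {
        int key = a[i];                  /* element to be inserted */
        int j = i - 1;
        while (j >= 0 && a[j] > key)     /* shift larger elements right */
        {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;                  /* insert at the open position */
    }
}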
SHELL SORT
The shell sort, named after its developer Donald L. Shell in 1959, is an extension of the
insertion sort, which has the limitation that it compares only consecutive elements and
interchanges them by only one position. Smaller elements that are far away from their correct
position require many passes through the sort to be properly inserted.
The shell sort overcomes this limitation, and gains speed over insertion sort, by comparing
elements that are at a specific distance from each other, and interchanging them if necessary. The
shell sort divides the list into smaller sublists, and then sorts the sublists separately using the
insertion sort. This is done by considering the input list as h-sorted: the method splits the
input list into h independent sorted files. The procedure of h-sorting is an insertion sort that considers
only every hth element (starting anywhere). The value of h is initially large and is repeatedly
decremented until it reaches 1. When h is equal to 1, a regular insertion sort is performed on the
list, but by then the list of data is guaranteed to be almost sorted. Using the above procedure with
any sequence of values of h that ends in 1 will produce a sorted list.
ALGORITHM:
void shellsort(int a[], int n)
{
    int d, i, j, temp;
    for (d = n / 2; d > 0; d /= 2)              /* diminishing increment */
    {
        for (i = d; i < n; i++)                 /* insertion sort with gap d */
        {
            temp = a[i];
            for (j = i; j >= d && a[j - d] > temp; j -= d)
                a[j] = a[j - d];                /* shift larger elements */
            a[j] = temp;
        }
    }
}
Shell sort is the method of choice for many sorting applications because it has acceptable
running time for moderately large arrays (containing more than 5000 elements) and requires
only a very small amount of code for its operation. The shell sort is also referred to as the
diminishing increment sort, because the distance between compared elements continuously
decreases in each pass.
MERGE SORT
Merging is the process of combining two sorted lists into a single sorted list. To perform
the merge, the leading elements of both sorted lists are compared, and the smaller of the
two is placed in the third list.
Given an unsorted array, the array is divided into two arrays, say x and y, each of which can
be sorted with any sorting algorithm. The 0th element of the first array x is compared with
the 0th element of the second array y, and the smaller of the two is moved to the third array.
The element that remains is then compared with the next element of the other array, the
smaller of the two is again moved to the third array, and so on.
The same procedure is repeated till the end of one of the arrays is reached. The
remaining elements from the other array are then placed directly into the third list, as they are
already in sorted order.
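A minimal C sketch of the merge step just described, assuming x and y are already sorted and z has room for all m + k elements (the names are illustrative):

void merge(const int x[], int m, const int y[], int k, int z[])
{
    int i = 0, j = 0, t = 0;
    while (i < m && j < k)                   /* move the smaller element */
        z[t++] = (x[i] <= y[j]) ? x[i++] : y[j++];
    while (i < m) z[t++] = x[i++];           /* rest of x, already sorted */
    while (j < k) z[t++] = y[j++];           /* rest of y, already sorted */
}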
The merge sort also uses the divide-and-conquer rule for its operation. The most
popular method for sorting on external storage devices is merge sort. This method consists
of two distinct phases:
i) First, segments of the input list are sorted using a good internal sorting method. These
sorted segments, known as runs, are written on to external storage as they are
generated.
ii) Second, the runs generated in phase one are merged together in a merge-tree pattern
until only one run is left.
26 5 77 1 61 11 59 15 48 19
5 , 26 1 , 77 11, 61 15 , 59 19 , 48
1 , 5 , 26 , 77 11 , 15 , 59 , 61 19 ,48
1 , 5 , 11 , 15 , 26 , 59 , 61 , 77 19 , 48
1 , 5 , 11 , 15 , 19 , 26 , 48 , 59 , 61 , 77
Because the simple merge function requires only the leading records of the two runs being
merged to be present in memory at one time, it is possible to merge very large runs together.
Time and space complexity of merge sort:
The merge step runs in O(n) time, where n denotes the sum of the sizes of the two
subtables to be merged; the complete merge sort performs O(log n) levels of merging and
therefore runs in O(n log n) time.
RADIX SORT
Radix sort is the method that many people intuitively use when alphabetizing a large
list of names.
Specifically, the list of names is first sorted according to the first letter of each name;
that is, the names are arranged in 26 classes.
Intuitively, one might want to sort numbers on the most significant digit, but radix sort
works counter-intuitively by sorting on the least significant digit first.
On the first pass the numbers are sorted on the least significant digit and combined into an array.
Then on the second pass the numbers are sorted again on the second least significant
digit and combined into an array, and so on.
The following example shows how radix sort operates on seven 3-digit numbers.
Input    1st Pass   2nd Pass   3rd Pass
329      720        720        329
457      355        329        355
657      436        436        436
839      457        839        457
436      657        355        657
720      329        457        720
355      839        657        839
In the above example, the first column is the input. The remaining columns show the list after
successive sorts on increasingly significant digit positions.
The code for Radix sort assumes that each element in the n-element array A has d
digits, where digit 1 is the lowest-order digit and d is the highest-order digit.
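A hedged C sketch of this least-significant-digit radix sort on non-negative integers, using a stable counting pass per digit (the function name and the C99 variable-length scratch array are illustrative choices, not from the text):

void radix_sort(int a[], int n, int d)
{
    int out[n];               /* scratch buffer; assumes n >= 1 */
    int exp = 1;              /* 1, 10, 100, ... selects the current digit */
    for (int pass = 0; pass < d; pass++, exp *= 10)
    {
        int count[10] = {0};
        for (int i = 0; i < n; i++)            /* count each digit value */
            count[(a[i] / exp) % 10]++;
        for (int b = 1; b < 10; b++)           /* prefix sums give the */
            count[b] += count[b - 1];          /* bucket end offsets   */
        for (int i = n - 1; i >= 0; i--)       /* stable placement     */
            out[--count[(a[i] / exp) % 10]] = a[i];
        for (int i = 0; i < n; i++)            /* copy back for next pass */
            a[i] = out[i];
    }
}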
Time and space complexity of radix sort:
The average case for radix sort is O(m+n) per pass, where n is the number of elements and m
the number of buckets; the worst case is also O(m+n) per pass, giving O(d(m+n)) overall for
d digit positions.
QUICK SORT
STEPS:
1. If the number of elements in S is 0 or 1, then return.
2. Compare the first two elements and choose the larger as the pivot. In case the first two elements
are equal, compare the second and the third and choose the larger; continue this until the
last element is reached. In case all the elements have the same value, a pivot cannot be
chosen: return.
3. Swap A[l] and A[r]. The function interchange(a, i, j) does this.
4. Advance the 'l' pointer until an element greater than or equal to the pivot is found.
5. Decrement the 'r' pointer until an element less than the pivot is found.
6. Check whether l < r. If so, swap a[l] and a[r] and repeat from step 4.
7. If l > r, partition the list into 2 parts (up to A[r], and after A[r]).
8. Follow the above mentioned steps for each partition. A C sketch of these steps is given after the list.
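Below is a hedged C sketch of these steps (the names findpivot, partition and quicksort are illustrative; the pivot is the larger of the first two differing keys, as in step 2):

/* Return the larger of the first two differing keys in a[lo..hi],
   or -1 if all keys are equal (keys are assumed non-negative). */
int findpivot(int a[], int lo, int hi)
{
    for (int i = lo + 1; i <= hi; i++)          /* step 2 */
        if (a[i] != a[lo])
            return (a[i] > a[lo]) ? a[i] : a[lo];
    return -1;                                  /* all keys equal */
}

int partition(int a[], int l, int r, int pivot)
{
    while (l <= r)
    {
        while (a[l] < pivot) l++;               /* step 4: stop at >= pivot */
        while (a[r] >= pivot) r--;              /* step 5: stop at <  pivot */
        if (l < r)                              /* step 6: swap and repeat  */
        {
            int t = a[l]; a[l] = a[r]; a[r] = t;
        }
    }
    return l;                                   /* start of the right part */
}

void quicksort(int a[], int lo, int hi)
{
    if (lo >= hi) return;                       /* step 1 */
    int pivot = findpivot(a, lo, hi);
    if (pivot == -1) return;                    /* all keys equal: sorted */
    int k = partition(a, lo, hi, pivot);        /* step 7 */
    quicksort(a, lo, k - 1);                    /* step 8: left partition  */
    quicksort(a, k, hi);                        /* step 8: right partition */
}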
HEAP SORT
The binary heap data structure is an array that can be viewed as a complete binary tree.
Each node of the binary tree corresponds to an element of the array. The array is completely
filled on all levels except possibly the lowest.
Heaps are represented in level order, going from left to right. For example, a heap with root
25, children 13 and 17, and lowest level 5, 8, 3 corresponds to the array [25, 13, 17, 5, 8, 3].
The root of the tree is A[1]. Given the index i of a node, the indices of its parent, left child
and right child can be computed as follows:
PARENT (i)
return floor(i/2)
LEFT (i)
return 2i
RIGHT (i)
return 2i + 1
For example, consider a heap whose array holds 20 at the root (index 1), with children 14
(index 2) and 17 (index 3), where 6 is the right child of 14 (index 5) and 4 is the right child
of 17 (index 7).
To go from the 20 to the 6: the index of the 20 is 1. To find the index of the left child,
calculate 1 * 2 = 2; this takes us to 14. Now, to go right, calculate 2 * 2 + 1 = 5; this takes
us to the 6.
To go from 4 to 20: 4's index is 7. To go to the parent, calculate floor(7 / 2) = 3, which
takes us to the 17. Now, to get 17's parent, calculate floor(3 / 2) = 1, which takes us to the 20.
Structure Property
A heap is a binary tree that is completely filled, with the possible exception of the bottom
level, which is filled from left to right. Such a tree is known as a complete binary tree.
Heap Order Property
The property that allows operations to be performed quickly is the heap order property.
Min-heap: If the minimum element has to be found quickly, the smallest element should
be at the root. Since the subtrees must also be min-heaps, every node should be smaller than
all of its descendants.
Max-heap: If the maximum has to be found quickly, the largest element should be at the
root. Since the subtrees must also be max-heaps, every node should be larger than all of its
descendants.
A heap of height h has the minimum number of elements when it has just one node at the
lowest level.
The levels above the lowest level form a complete binary tree of height h-1 with 2^h - 1 nodes.
Hence the minimum number of nodes possible in a heap of height h is 2^h.
Clearly a heap of height h has the maximum number of elements when its lowest level is
completely filled.
In this case the heap is a complete binary tree of height h and hence has 2^(h+1) - 1 nodes.
A tree in which the heap order property holds but the structure property does not hold is
not a heap.
Algorithm
int A[100];   /* heap stored in A[1..n]; A[0] is unused; capacity is illustrative */
int n;

void swap(int *x, int *y)
{
    int t = *x;
    *x = *y;
    *y = t;
}

/* Sift A[first] down until A[first..last] satisfies the max-heap order. */
void pushdown(int first, int last)
{
    int r = first;
    while (r <= last / 2)
    {
        if (last == 2 * r)                     /* r has only a left child */
        {
            if (A[r] < A[2 * r])
                swap(&A[r], &A[2 * r]);
            r = last;                          /* bottom reached: stop */
        }
        else if (A[r] < A[2 * r] && A[2 * r] >= A[2 * r + 1])
        {
            swap(&A[r], &A[2 * r]);            /* left child is the larger */
            r = 2 * r;
        }
        else if (A[r] < A[2 * r + 1] && A[2 * r + 1] >= A[2 * r])
        {
            swap(&A[r], &A[2 * r + 1]);        /* right child is the larger */
            r = 2 * r + 1;
        }
        else
            r = last;                          /* heap order holds: stop */
    }
}

void heapsort(void)
{
    for (int i = n / 2; i >= 1; i--)           /* phase 1: build the heap */
        pushdown(i, n);
    for (int i = n; i >= 2; i--)               /* phase 2: sort the heap */
    {
        swap(&A[1], &A[i]);                    /* move the maximum to the end */
        pushdown(1, i - 1);                    /* restore heap on A[1..i-1] */
    }
}
We must first create the heap and then sort the heap.
The worst case at each step involves performing a number of comparisons given by the
depth of the tree. The total number of comparisons is O(n log2 n); the average case is also
O(n log2 n).
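A hypothetical driver for the heapsort above, using the earlier example array [25, 13, 17, 5, 8, 3] (this main function is illustrative):

#include <stdio.h>

int main(void)
{
    int data[] = {25, 13, 17, 5, 8, 3};
    n = 6;
    for (int i = 0; i < n; i++)
        A[i + 1] = data[i];        /* the heap is 1-indexed */
    heapsort();
    for (int i = 1; i <= n; i++)
        printf("%d ", A[i]);       /* prints 3 5 8 13 17 25 */
    return 0;
}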
2. What are the two main classifications of sorting based on the source of data?
Internal sorting and external sorting.