
DS Unit 4


IT-T33 DATA STRUCTURES UNIT IV

Unit IV
Sorting: O notation – efficiency of sorting – bubble sort – quick sort – selection sort – heap sort
– insertion sort – shell sort – merge sort – radix sort.

O NOTATION

BIG OH (O) NOTATION


Big ‘oh’: the function f(n) = O(g(n)) iff there exist positive constants c and n0 such that
f(n) ≤ c*g(n) for all n ≥ n0. The Big oh (O) notation gives an upper bound for a function f(n)
to within a constant factor: f(n) is bounded above by a constant multiple of g(n). When f(n) is
the running time of an algorithm, this upper bound is typically used for the worst case: the
algorithm does not consume more than this computing time.

Example:

Let's take an example of Big-O. Say that f(n) = 2n + 8 and g(n) = n^2. Can we find a constant n0
so that 2n + 8 <= n^2 for all n >= n0? The number 4 works here, giving us 16 <= 16, and the
inequality still holds for any n greater than 4. Since we are generalizing to large values of n,
and small values (1, 2, 3) are not important, we can say that f(n) grows no faster than g(n);
that is, f(n) is bounded above by g(n), so f(n) = O(n^2).
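
The inequality in this example can also be checked numerically. The following is only a minimal sketch: the function name bound_holds and its parameters are illustrative, and a check over a finite range supports the claim but is not a proof.

```c
#include <stdbool.h>

/* Returns true if 2n + 8 <= c * n^2 for every n in [n0, limit].
   (Illustrative helper; the name and parameters are assumptions.) */
bool bound_holds(long long c, long long n0, long long limit)
{
    for (long long n = n0; n <= limit; n++) {
        if (2 * n + 8 > c * n * n)
            return false;   /* the bound fails at this n */
    }
    return true;
}
```

With c = 1 the check passes from n0 = 4 onward, but fails if the range starts at n0 = 3, because 2*3 + 8 = 14 is greater than 9.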
To find the upper bound (the Big-O time), assuming we know that f(n) is exactly 2n + 8, we can
take a few shortcuts. First, we can drop additive constants from the runtime, because for large
enough n they become irrelevant. This makes f(n) = 2n. Also, for convenience of comparison, we
drop constant multipliers, in this case the 2. This makes f(n) = n. So f(n) also runs in O(n)
time, which is a tighter (closer) upper bound than O(n^2).
O(n): printing a list of n items to the screen, looking at each item once.
O(log n): taking a list of items and cutting it in half repeatedly until there is only one item
left.
O(n^2): taking a list of n items and comparing every item to every other item.

II YEAR/ III SEM

OMEGA (Ω) NOTATION


Omega: the function f(n) = Ω(g(n)) iff there exist positive constants c and n0 such that
f(n) ≥ c*g(n) for all n ≥ n0. The Omega (Ω) notation gives a lower bound for a function f(n)
to within a constant factor: f(n) is bounded below by a constant multiple of g(n). When f(n) is
the running time of an algorithm, this lower bound is typically used for the best case: the
algorithm consumes at least this much computing time.

Theta (Θ) Notation


Theta: the function f(n) = Θ(g(n)) iff there exist positive constants c1, c2 and n0 such that
c1*g(n) ≤ f(n) ≤ c2*g(n) for all n ≥ n0. The Theta (Θ) notation gives both a lower and an upper
bound for a function f(n) to within constant factors: f(n) is bounded above and below by constant
multiples of g(n). The function therefore grows at exactly the rate of g(n), neither faster nor
slower.


Some common running times:

O(1)
An algorithm with this running time is said to have "constant" running time. Basically, this
means the algorithm always takes about the same amount of time, regardless of the size of the
input. To state it technically, if an algorithm will never perform more than a certain number of
steps, no matter how large the input gets, then that algorithm is considered to have a constant
running time. For example, an algorithm which consists of performing exactly 7 multiplications
has a constant running time. Although constant time is the best running time an algorithm can
have, that algorithm could still be considered bad if the total amount of time to run the algorithm
were too large.
Some examples of O(1) algorithms include: inserting an element onto the front of a linked list,
popping from or pushing onto a stack, and retrieving the nth element of an array.

O(n)
An algorithm which runs in O(n) is said to have a "linear" running time. This basically means
that the amount of time to run the algorithm is proportional to the size of the input. To be
technical, an algorithm which never performs more than a certain number of steps for each
element in the input has a linear running time. For example, an algorithm which sums the total
of a list of numbers has a linear running time, because the number of additions required is the
same as the number of elements.
Some examples of O(n) algorithms include searching through an unordered list, incrementing
every element of an array, and calculating Fibonacci numbers using dynamic programming.

O(n^2)
An algorithm with this running time is said to have "quadratic" running time. This means that
whenever you increase the size of the input by a factor of k, the running time increases by a
factor of k^2. For example, if you double the size of the input of a quadratic algorithm, then the
running time will quadruple.
Some sorting algorithms, such as insertion sort and bubble sort, have quadratic running times.

O(lg n)
An algorithm with O(lg n) running time is said to have "logarithmic" running time. This
means that the running time grows in proportion to the logarithm of the input size: multiplying
the input size by a factor of k adds only about lg k steps. For example, if you increase the input
size of an O(lg n) algorithm by a factor of 1024, the running time increases by only about 10
additional steps, since lg(1024n) = lg n + 10. This running time is better than O(n), but not as
good as O(1). Because the logarithm grows so slowly, however, the behavior is comparable to O(1)
in many practical circumstances. Algorithms which search through ordered lists or balanced binary
search trees, as well as insertions and deletions on heaps, generally have logarithmic running times.
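
As a hedged illustration of logarithmic running time, the sketch below searches an ordered array by halving the range and counts the probes it makes. The function name binary_search and the steps output parameter are illustrative assumptions.

```c
/* Binary search over the sorted array a[0..n-1].
   Returns the index of key, or -1 if key is absent.
   *steps receives the number of elements examined. */
int binary_search(const int a[], int n, int key, int *steps)
{
    int lo = 0, hi = n - 1;
    *steps = 0;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;   /* middle of the remaining range */
        (*steps)++;
        if (a[mid] == key)
            return mid;
        if (a[mid] < key)
            lo = mid + 1;               /* discard the lower half */
        else
            hi = mid - 1;               /* discard the upper half */
    }
    return -1;
}
```

For an array of 1024 elements no search examines more than 11 elements, whereas a linear scan may examine all 1024.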

O(n lg n)
The running time of an algorithm of this order grows in proportion to the size of the input
times the logarithm of the size of the input. Technically speaking, an algorithm which, when
given an input of size n, never performs more than c*n*lg n steps for some constant c has a
running time of O(n lg n). This running time is better than O(n^2) but not quite as good as O(n).
The fastest comparison-based sorting algorithms, including merge sort, heap sort and (on average)
quicksort, have O(n lg n) running times.

SORTING

The two main classifications of sorting based on the source of data are
a. Internal sorting
b. External sorting
In addition, a sorting technique may or may not be stable.
Internal sorting
Internal sorting is the process of sorting data that resides in main memory, i.e., a sort is
internal if the records that it is sorting are in main memory.
External sorting
External sorting is the process of sorting in which large blocks of data stored on storage
devices are moved to main memory and then sorted, i.e., a sort is external if the records
that it is sorting are in auxiliary storage.
Stable sort
A sorting technique is called stable if, for all records i and j such that k[i] equals k[j],
whenever r[i] precedes r[j] in the original file, r[i] also precedes r[j] in the sorted file.
That is, a stable sort keeps records with the same key in the same relative order that they
were in before the sort.
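
Stability can be observed by attaching a tag to each record and sorting on the key only. The sketch below is an illustration under assumptions: the struct record layout and the function name are hypothetical, and it uses the insertion sort method described later in this unit.

```c
/* A record with a sort key and a tag that shows its original position. */
struct record {
    int key;
    char tag;
};

/* Insertion sort on key. The strict '>' comparison never moves a record
   past another record with an equal key, which keeps the sort stable. */
void stable_sort_by_key(struct record r[], int n)
{
    for (int i = 1; i < n; i++) {
        struct record cur = r[i];
        int j = i - 1;
        while (j >= 0 && r[j].key > cur.key) {
            r[j + 1] = r[j];    /* shift larger keys one place right */
            j--;
        }
        r[j + 1] = cur;
    }
}
```

Sorting the records (2,a) (1,b) (2,c) (1,d) this way gives (1,b) (1,d) (2,a) (2,c): records with equal keys keep their original relative order.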
Various factors to be considered in deciding a sorting algorithm
a. Programming time
b. Execution time of the program
c. Memory needed for program environment

Bubble Sort:
• In this sorting method, to arrange elements in ascending order, we begin with the 0th element
and compare it with the 1st element.

• If it is found to be greater than the 1st element, then the two are interchanged.

• In this way each element (excluding the last) is compared with its next element and
interchanged if required. On completing the first iteration, the largest element is placed at
the last position. Each later iteration places the next largest element, so after all the
iterations the list is sorted.
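
The steps above can be sketched in C as follows. This is a minimal sketch, not a definitive implementation: the swapped flag, which stops the sort after a pass with no interchanges, is an addition that the steps do not spell out (the one-pass best case relies on it), and the function name and return value are illustrative.

```c
/* Sorts a[0..n-1] in ascending order and returns the number of passes made. */
int bubble_sort(int a[], int n)
{
    int passes = 0;
    int swapped = 1;    /* assumed early-exit flag, not part of the steps above */

    /* after each pass the largest remaining element sits at index 'last' */
    for (int last = n - 1; last > 0 && swapped; last--) {
        swapped = 0;
        for (int j = 0; j < last; j++) {
            if (a[j] > a[j + 1]) {
                int temp = a[j];        /* interchange neighbours */
                a[j] = a[j + 1];
                a[j + 1] = temp;
                swapped = 1;
            }
        }
        passes++;
    }
    return passes;
}
```

On an already sorted array the function makes a single pass; on 23 15 29 11 1 it makes four passes.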


Time and space complexity of bubble sorting:


The best case (an already sorted list, with the algorithm stopping as soon as a pass makes no
exchanges) involves performing one pass, which requires n-1 comparisons. Consequently, the best
case is O(n). The worst case performance of the bubble sort is $\frac{n(n-1)}{2}$ comparisons
and $\frac{n(n-1)}{2}$ exchanges, i.e., O(n^2). The average case is more difficult to analyze
than the other cases. It can be shown that the average case is also O(n^2). The average
number of passes is approximately $n - 1.25\sqrt{n}$; for n = 10 the average number of passes
is about 6. The average numbers of comparisons and exchanges are both O(n^2). Bubble sort works
in place and needs only a constant amount of extra memory.

Selection Sort
• Selection sort is the easiest method of sorting. To sort the data in ascending order, the 0th
element is compared with all other elements. If the 0th element is found to be greater than the
compared element, then they are interchanged. In this way, after the first iteration, the
smallest element is placed at the 0th position. The procedure is then repeated for the 1st
element, and so on.

• Eg. Array before sorting: 23 15 29 11 1

First iteration
23 15 29 11 1     compare 23 and 15: exchange
15 23 29 11 1     compare 15 and 29: no exchange
15 23 29 11 1     compare 15 and 11: exchange
11 23 29 15 1     compare 11 and 1: exchange
1 23 29 15 11

Second iteration
1 23 29 15 11     compare 23 and 29: no exchange
1 23 29 15 11     compare 23 and 15: exchange
1 15 29 23 11     compare 15 and 11: exchange
1 11 29 23 15

Third iteration
1 11 29 23 15     compare 29 and 23: exchange
1 11 23 29 15     compare 23 and 15: exchange
1 11 15 29 23

Fourth iteration
1 11 15 29 23     compare 29 and 23: exchange
1 11 15 23 29

Sorted Array: 1 11 15 23 29

Time and space complexity of selection sorting:

In the algorithm, the search for the record with the next smallest key is called a pass.
There are n-1 such passes required in order to perform the sort, because each pass
places one record into its proper location.

During the first pass, in which the record with the smallest key is found, n-1
records are compared. In general, for the ith pass of the sort, n-i comparisons are required.
The total number of comparisons is therefore the sum

$$\sum_{i=1}^{n-1} (n-i) = \frac{1}{2}\,n(n-1)$$

Therefore, the number of comparisons is proportional to n^2, i.e., O(n^2).
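
The comparison count can be confirmed with a short sketch of the exchange-based procedure described in this section. The function name and the returned counter are illustrative assumptions; many textbooks instead remember the index of the smallest element and perform only one exchange per pass.

```c
/* Sorts a[0..n-1] in ascending order by comparing a[i] with every later
   element and interchanging when a[i] is larger.
   Returns the number of key comparisons performed. */
long selection_sort(int a[], int n)
{
    long comparisons = 0;
    for (int i = 0; i < n - 1; i++) {          /* pass i fixes position i */
        for (int j = i + 1; j < n; j++) {
            comparisons++;
            if (a[i] > a[j]) {
                int temp = a[i];
                a[i] = a[j];
                a[j] = temp;
            }
        }
    }
    return comparisons;
}
```

For the five-element example array the function returns 10 comparisons, which equals n(n-1)/2 for n = 5.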

INSERTION SORT

• Insertion sort is performed by inserting each element at its appropriate position among the
elements before it.

• In insertion sort, the first iteration starts with the comparison of the 1st element with the
0th element.

• In the second iteration the 2nd element is compared with the 0th and 1st elements. In general,
in every iteration an element is compared with the elements before it.

• If at some point it is found that the element can be inserted at a position, then space is
created for it by shifting the other elements one position to the right, and the element is
inserted at the suitable position.

• This procedure is repeated for all the elements in the array.

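Eg: unsorted array 23 15 29 11 1

First iteration (insert 15):     15 23 29 11 1
Second iteration (insert 29):    15 23 29 11 1     no exchange
Third iteration (insert 11):     11 15 23 29 1
Fourth iteration (insert 1):     1 11 15 23 29

Sorted Array: 1 11 15 23 29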

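Time and space complexity of insertion sorting:

In the worst case the algorithm makes at most (i+1) comparisons before inserting the ith
element. Hence the computing time for that insertion is O(i), and summing over all n elements
gives an overall worst case time of O(n^2). The average case time is also O(n^2). The sort is
done in place, so only a constant amount of extra memory is needed.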
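
A minimal sketch of insertion sort, under the assumption of a 0-based int array, is given below. The function name and the returned shift counter are illustrative; the counter is included only to make the quadratic worst case visible.

```c
/* Sorts a[0..n-1] in ascending order.
   Returns the number of elements shifted one position to the right. */
long insertion_sort(int a[], int n)
{
    long shifts = 0;
    for (int i = 1; i < n; i++) {
        int key = a[i];             /* element to be inserted */
        int j = i - 1;
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];        /* create space by shifting right */
            j--;
            shifts++;
        }
        a[j + 1] = key;             /* insert at the suitable position */
    }
    return shifts;
}
```

A reverse-ordered array of 5 elements needs 10 shifts, i.e. n(n-1)/2, while an already sorted array needs none.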

SHELL SORT (DIMINISHING INCREMENT SORT):

The shell sort, named after its developer Donald L. Shell, who published it in 1959, is an
extension of the insertion sort. Insertion sort has the limitation that it compares only
consecutive elements and moves an element by only one position at a time. Small elements that
are far from their final position therefore require many passes through the sort before they
are properly inserted.

The shell sort overcomes this limitation, and gains speed over insertion sort, by comparing
elements that are a specific distance from each other and interchanging them if necessary. The
shell sort divides the list into smaller sublists and then sorts the sublists separately using
the insertion sort. This is done by making the input list h-sorted: the list is treated as h
independent sublists, each consisting of every hth element (starting anywhere), and the h-sort
procedure is an insertion sort applied to each such sublist. The value of h is initially high
and is repeatedly decreased until it reaches 1. When h is equal to 1, a regular insertion sort
is performed on the list, but by then the list of data is guaranteed to be almost sorted. Using
the above procedure for any sequence of values of h that ends in 1 will produce a sorted list.

ALGORITHM:
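void shell_sort(int a[], int n)
{
    int d, i, j, temp;

    /* d is the increment; it is halved after each pass until it reaches 1 */
    for (d = n / 2; d > 0; d /= 2)
    {
        /* insertion sort on the sublists formed by every d-th element */
        for (i = d; i < n; i++)
        {
            temp = a[i];
            /* shift larger elements of the same sublist d places to the right */
            for (j = i; j >= d && a[j - d] > temp; j -= d)
                a[j] = a[j - d];
            a[j] = temp;
        }
    }
}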
Shell sort is the method of choice for many sorting applications because it has acceptable
running time even for moderately large arrays (on the order of 5000 elements) and requires
only a very small amount of code. The shell sort is also referred to as diminishing increment
sort, because the increment between the elements compared decreases in each pass.


MERGE SORT
• Merging is the process of combining two sorted lists into a single sorted list. To perform
the merge, the two sorted lists are compared element by element, and the smaller of the two
current elements is placed in a third list.
• Given an unsorted array, the array is divided into two arrays, say x and y, each of which can
be sorted with any sorting algorithm. The 0th element of x is compared with the 0th element
of y, and the smaller of the two is moved to the third array.
• The comparison then continues with the next element of the array from which the element was
taken and the current element of the other array. For example, if the 0th element of x was
moved, the 1st element of x is compared with the 0th element of y, and again the smaller one
is placed in the third array.
• The same procedure is repeated till the end of one of the arrays is reached. The remaining
elements of the other array are then placed directly into the third list, as they are
already in sorted order.
• The merge algorithm also uses the divide and conquer rule for its operation. The most
popular method for sorting on external storage devices is merge sort. This method consists
of two distinct phases:
i) First, segments of the input list are sorted using a good internal sorting method. These
sorted segments, known as runs, are written onto external storage as they are generated.
ii) Second, the runs generated in phase one are merged together in a merge-tree pattern
until only one run is left.

26   5   77   1   61   11   59   15   48   19

[5, 26]   [1, 77]   [11, 61]   [15, 59]   [19, 48]

[1, 5, 26, 77]   [11, 15, 59, 61]   [19, 48]

[1, 5, 11, 15, 26, 59, 61, 77]   [19, 48]

[1, 5, 11, 15, 19, 26, 48, 59, 61, 77]

Because the simple merge function requires only the leading records of the two runs being
merged to be present in memory at one time, it is possible to merge large runs together.
Time and space complexity of merge sort:

• The timing performance of a single merge is O(n), where n denotes the sum of the sizes of
the two sub tables to be merged. Since about log2 n levels of merging are needed, merge sort
as a whole takes O(n log2 n) time. Merging arrays also requires O(n) additional space for the
output list.
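
The merge step and a sort built on it can be sketched as follows. This is a hedged illustration: it is an in-memory, top-down recursive version rather than the run-based external method described above, and the names merge and merge_sort, the temporary buffer, and the error return value are assumptions.

```c
#include <stdlib.h>
#include <string.h>

/* Merges sorted x[0..nx-1] and sorted y[0..ny-1] into z[0..nx+ny-1]. */
void merge(const int x[], int nx, const int y[], int ny, int z[])
{
    int i = 0, j = 0, k = 0;
    while (i < nx && j < ny)                  /* move the smaller front element */
        z[k++] = (x[i] <= y[j]) ? x[i++] : y[j++];
    while (i < nx)                            /* copy what is left of x */
        z[k++] = x[i++];
    while (j < ny)                            /* copy what is left of y */
        z[k++] = y[j++];
}

/* Sorts a[0..n-1]; returns 0 on success, -1 if memory allocation fails. */
int merge_sort(int a[], int n)
{
    if (n < 2)
        return 0;
    int mid = n / 2;
    if (merge_sort(a, mid) != 0 || merge_sort(a + mid, n - mid) != 0)
        return -1;
    int *tmp = malloc((size_t)n * sizeof *tmp);   /* the "third list" */
    if (tmp == NULL)
        return -1;
    merge(a, mid, a + mid, n - mid, tmp);
    memcpy(a, tmp, (size_t)n * sizeof *tmp);
    free(tmp);
    return 0;
}
```

Taking the element from x when the two front elements are equal keeps the merge stable.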

RADIX SORT

• Radix sort is a simple method that many people intuitively use when alphabetizing a large
list of names.
• Specifically, the list of names is first sorted according to the first letter of each name,
that is, the names are arranged in 26 classes.
• Intuitively, one might likewise want to sort numbers on the most significant digit first.
• But radix sort, counter-intuitively, sorts on the least significant digit first.
• On the first pass, all the numbers are sorted on the least significant digit and combined
into an array.
• Then, on the second pass, all the numbers are sorted again on the second least significant
digit and combined into an array, and so on. Each pass must keep numbers with the same digit
in the order they already had.
• The following example shows how radix sort operates on seven 3-digit numbers.
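INPUT: 329 457 657 839 436 720 355

Bucket   1st Pass     2nd Pass          3rd Pass
0        720
1
2                     720, 329
3                     436, 839          329, 355
4                                       436, 457
5        355          355, 457, 657
6        436                            657
7        457, 657                       720
8                                       839
9        329, 839

List after 1st pass: 720 355 436 457 657 329 839
List after 2nd pass: 720 329 436 839 355 457 657
List after 3rd pass: 329 355 436 457 657 720 839

• In the above example, the first row is the input. The remaining columns show the contents of
the buckets during successive sorts on increasingly significant digit positions; reading the
buckets from 0 to 9 gives the list after each pass.
• Radix sort assumes that each element in the n-element array A has d digits, where digit 1 is
the lowest-order digit and digit d is the highest-order digit.
Time and space complexity of radix sort:
• Taking m as the number of buckets and n as the number of elements, each pass takes O(m+n)
time, so for a fixed number of digits the average case for radix sort is O(m+n); the worst
case is also O(m+n). With d digits the total is O(d(m+n)).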
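
One possible way to code the passes is sketched below. It is an illustration under assumptions: keys are non-negative ints in base 10, a counting step takes the place of physical buckets, and the function name radix_sort and its error return value are hypothetical.

```c
#include <stdlib.h>

/* Least-significant-digit radix sort for non-negative ints, base 10.
   Returns 0 on success, -1 if memory allocation fails. */
int radix_sort(int a[], int n)
{
    if (n < 2)
        return 0;

    int max = a[0];                       /* largest key decides the pass count */
    for (int i = 1; i < n; i++)
        if (a[i] > max)
            max = a[i];

    int *out = malloc((size_t)n * sizeof *out);
    if (out == NULL)
        return -1;

    for (long long exp = 1; max / exp > 0; exp *= 10) {
        int count[10] = {0};
        for (int i = 0; i < n; i++)       /* size of each bucket */
            count[(a[i] / exp) % 10]++;
        for (int d = 1; d < 10; d++)      /* end position of each bucket */
            count[d] += count[d - 1];
        for (int i = n - 1; i >= 0; i--)  /* backwards scan keeps the pass stable */
            out[--count[(a[i] / exp) % 10]] = a[i];
        for (int i = 0; i < n; i++)       /* combine back into the array */
            a[i] = out[i];
    }
    free(out);
    return 0;
}
```

Applied to 329 457 657 839 436 720 355, the three passes produce the same intermediate lists as the example above.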


QUICK SORT

• The divide-and-conquer approach can be used to arrive at an efficient sorting method
different from merge sort. In merge sort, the file a[1:n] was divided at its midpoint into
sub arrays which were independently sorted and later merged.
• In quick sort, the division into 2 sub arrays is made so that the sorted sub arrays do not
need to be merged later.
• This is accomplished by rearranging the elements in a[1:n] such that a[i] <= a[j] for all i
between 1 and m and all j between (m+1) and n, for some m, 1 <= m <= n.
• Thus the elements in a[1:m] and a[m+1:n] can be independently sorted.
• No merge is needed. This rearranging is referred to as partitioning.
• Function partition( ) of the algorithm accomplishes an in-place partitioning of the elements
a[m:p-1] (here m and p denote the bounds of the sub array being partitioned).
• It is assumed that a[p] ≥ a[m] and that a[m] is the partitioning element. If m=1 and p-1=n,
then a[n+1] must be defined and must be greater than or equal to all elements in a[1:n].
• The assumption that a[m] is the partition element is merely for convenience; other
choices for the partitioning element than the first item in the set are better in practice.
• As its name implies, quicksort is in practice among the fastest known sorting algorithms. Its
average running time is O(n log n).
• It is very fast, mainly due to a very tight and highly optimized inner loop.
• It has O(n^2) worst-case performance, but this can be made exponentially unlikely with a
little effort.
Divide: Split the array into two sub arrays, based on the pivot element, such that every element
in the left sub array is less than or equal to every element in the right sub array.
Elements smaller than the pivot go to the left sub array and elements larger than the
pivot go to the right sub array.
Conquer: Recursively sort the two sub arrays.
Combine: No further work is needed. Since every element of the left sub array precedes every
element of the right sub array, the sorted sub arrays together already form the sorted list.


STEPS:
1. If the number of elements in S is 0 or 1, then return.
2. Compare the first two elements. Choose the larger as the pivot. In case the first two elements
are the same, compare the second and the third and choose the larger. Continue this till the
last element is reached. In case all the elements have the same value, a pivot cannot be
chosen. Return.
3. Set the ‘l’ pointer to the first element and the ‘r’ pointer to the last element of the list.
4. Advance the ‘l’ pointer till an element greater than or equal to the pivot element is found.
5. Decrement the ‘r’ pointer until an element less than the pivot is found.
6. Check whether l<r. If so, swap A[l] and A[r] (the function interchange (a, l, r) does this)
and repeat from step 4.
7. If l>r, partition the list into 2 parts (up to A[r] and after A[r]).
8. Follow the above mentioned steps for each partition.
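
The steps can be sketched in C as below. This is a minimal sketch under assumptions, not the notes' own code: indices are 0-based, the helper names find_pivot and partition are illustrative, and elements equal to the pivot are sent to the right part so that, with the larger of two distinct keys as pivot, neither part is ever empty.

```c
/* Step 2: index of the larger of the first two distinct keys in a[i..j],
   or -1 if all keys are equal. */
static int find_pivot(const int a[], int i, int j)
{
    for (int k = i + 1; k <= j; k++) {
        if (a[k] > a[i]) return k;
        if (a[k] < a[i]) return i;
    }
    return -1;
}

/* Steps 3-7: rearranges a[i..j] so that keys < pivot come first.
   Returns the index where the right part (keys >= pivot) begins. */
static int partition(int a[], int i, int j, int pivot)
{
    int l = i, r = j;
    for (;;) {
        while (a[l] < pivot) l++;     /* stop at a key >= pivot */
        while (a[r] >= pivot) r--;    /* stop at a key < pivot */
        if (l > r)
            break;
        int temp = a[l];              /* interchange the misplaced pair */
        a[l] = a[r];
        a[r] = temp;
    }
    return l;
}

/* Sorts a[i..j] in ascending order. */
void quick_sort(int a[], int i, int j)
{
    if (i >= j)
        return;                       /* step 1: 0 or 1 element */
    int p = find_pivot(a, i, j);
    if (p < 0)
        return;                       /* all keys equal: already sorted */
    int k = partition(a, i, j, a[p]);
    quick_sort(a, i, k - 1);          /* step 8 */
    quick_sort(a, k, j);
}
```

A call such as quick_sort(a, 0, n - 1) sorts the whole array. The scans in partition need no bounds checks because each part of the array always contains at least one key smaller than the pivot and one key not smaller than it.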

Time and space complexity of quick sorting:

• The worst case behaviour of this algorithm is O(n^2). The time required to position a
record in a file of size n is O(n). If T(n) is the time taken to sort a file of n records, then
when the file splits roughly into two equal parts each time a record is positioned correctly
we have
T(n) ≤ cn + 2T(n/2), for some constant c
     ≤ cn + 2(cn/2 + 2T(n/4))
     ≤ 2cn + 4T(n/4)
     .
     .
     ≤ cn log2 n + nT(1) = O(n log2 n)
The best and average computing time for quick sort is therefore O(n log2 n). The recursion
also needs stack space, which is O(log2 n) when the splits are balanced.


HEAP SORT

• The binary heap data structure is an array that can be viewed as a complete binary tree.
• Each node of the binary tree corresponds to an element of the array. The tree is completely
filled on all levels except possibly the lowest.

• Heaps are represented in level order, going from left to right. For example, the array
[25, 13, 17, 5, 8, 3] represents the heap whose root is 25, with children 13 and 17; the
children of 13 are 5 and 8, and the child of 17 is 3.
• The root of the tree is A[1], and given the index i of a node, the indices of its parent,
left child and right child can be computed as follows:

PARENT (i)
return floor(i/2)
LEFT (i)
return 2i
RIGHT (i)
return 2i + 1

For example, the heap with root 20 is represented by the array [20, 14, 17, 8, 6, 9, 4, 1].

• Go from the 20 to the 6 first. The index of the 20 is 1. To find the index of the left child,
calculate 1 * 2 = 2. This takes us to the 14.
• Now, to go right, calculate 2 * 2 + 1 = 5. This takes us to the 6.
• Try going from the 4 to the 20. The 4's index is 7. To go to the parent, calculate
floor(7 / 2) = 3, which takes us to the 17.
• Now, to get the 17's parent, calculate floor(3 / 2) = 1, which takes us to the 20.
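
The index arithmetic can be written directly in C. In this sketch the heap occupies A[1..n] and index 0 is left unused, matching the 1-based indices used in this section; the function names are illustrative assumptions.

```c
/* Index arithmetic for a heap stored in A[1..n] (A[0] is unused). */
int heap_parent(int i) { return i / 2; }      /* integer division = floor */
int heap_left(int i)   { return 2 * i; }
int heap_right(int i)  { return 2 * i + 1; }

/* Returns 1 if A[1..n] satisfies the max-heap order property, else 0. */
int is_max_heap(const int A[], int n)
{
    for (int i = 2; i <= n; i++)
        if (A[heap_parent(i)] < A[i])
            return 0;                         /* child larger than its parent */
    return 1;
}
```

For the array [20, 14, 17, 8, 6, 9, 4, 1], stored from index 1, heap_right(heap_left(1)) is 5, the position of the 6, and heap_parent(7) is 3, the position of the 17.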

Structure Property
A heap is a binary tree that is completely filled, with the possible exception of the bottom
level, which is filled from left to right. Such a tree is known as a complete binary tree. Figure
shows an example.
Heap Order Property
The property that allows operations to be performed quickly is the heap order property.
Min-heap: If the minimum element has to be found quickly, the smallest element should
be at the root. Since the subtrees should also be min-heaps, every node should be smaller
than or equal to all of its descendants.
Max-heap: If the maximum has to be found quickly, the largest element should be at the
root. Since the subtrees should also be max-heaps, every node should be larger than or equal
to all of its descendants.
A heap of height h has the minimum number of elements when it has just one node at the
lowest level.

• The levels above the lowest level form a complete binary tree of height h-1 with 2^h - 1 nodes.
• Hence the minimum number of nodes possible in a heap of height h is 2^h.
• Clearly a heap of height h has the maximum number of elements when its lowest level is
completely filled.
• In this case the heap is a complete binary tree of height h and hence has 2^(h+1) - 1 nodes.
A tree in which the heap order property holds is still not a heap if the structure property
does not hold.


Algorithm
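/* Sorts A[1..n] in ascending order using a max-heap; A[0] is not used. */
void swap(int *x, int *y)
{
    int t = *x;
    *x = *y;
    *y = t;
}

/* Pushes A[first] down until A[first..last] satisfies the heap order property. */
void pushdown(int A[], int first, int last)
{
    int r = first;                    /* current position of the element */
    while (r <= last / 2)
    {
        if (last == 2 * r)
        {
            /* r has only a left child */
            if (A[r] < A[2 * r])
                swap(&A[r], &A[2 * r]);
            r = last;                 /* ends the loop in either case */
        }
        else if (A[r] < A[2 * r] && A[2 * r] >= A[2 * r + 1])
        {
            swap(&A[r], &A[2 * r]);
            r = 2 * r;
        }
        else if (A[r] < A[2 * r + 1] && A[2 * r + 1] >= A[2 * r])
        {
            swap(&A[r], &A[2 * r + 1]);
            r = 2 * r + 1;
        }
        else
            r = last;                 /* heap order already holds */
    }
}

void heapsort(int A[], int n)
{
    int i;
    for (i = n / 2; i >= 1; i--)      /* build the heap */
        pushdown(A, i, n);
    for (i = n; i >= 2; i--)
    {
        swap(&A[1], &A[i]);           /* move the largest element to the end */
        pushdown(A, 1, i - 1);
    }
}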

Time and space complexity of heap sort:

• The worst case analysis is easier than the average case.

• The depth of a complete binary tree of n nodes is ⌊log2 n⌋.
• We must first create the heap and then sort the heap.
• The worst case at each step involves performing a number of comparisons which is
given by the depth of the tree.
• The number of comparisons is therefore O(n log2 n) in the worst case. The average case can
be shown to be O(n log2 n) as well. Heap sort works in place and needs only a constant
amount of extra memory.

PART A (QUESTIONS WITH ANSWERS)

1. What is meant by sorting?


Ordering the data in an increasing or decreasing fashion according to some relationship among
the data items is called sorting.

2. What are the two main classifications of sorting based on source of data?
• Internal sorting
• External sorting

3. Define external sorting.


External sorting is a process of sorting in which large blocks of data stored in storage devices are
moved to the main memory and then sorted.

4. Define internal sorting.


Internal sorting is a process of sorting the data which reside in the main memory.

5. What are the various factors to be considered in deciding a sorting algorithm?


• Programming time
• Execution time of program
• Memory needed for program environment.

6. What is the main idea in bubble sort?


The basic idea underlying bubble sort is to pass through the file sequentially several times. Each
pass consists of comparing each element in the file with its successor(X[i] and X[i+1]) and
interchanging the two elements if they are not in proper order.

7. What is the idea behind insertion sort?


The main idea of insertion sort is to insert, in the ith pass, the ith element A(i) into its
rightful place among A(1), A(2), ..., A(i).

8. Define selection sort.


The main idea behind the selection sort is to find the smallest element among A(J), A(J+1), ...,
A(n) and then interchange it with A(J). This process is then repeated for each value of J.

9. Define shell sort.


Instead of sorting the entire array at once, shell sort first divides the array into smaller
segments, which are then separately sorted using the insertion sort. Shell sort is also called
diminishing increment sort.


10. What is the purpose of quick sort?


The purpose of the quick sort is to move a data item in the correct direction, just far enough
for it to reach its final place in the array. Quick sort reduces unnecessary swaps and moves an
item a greater distance in one move.

11. What is the average efficiency of heap sort?


The average efficiency of heap sort is O(n log2 n), where n is the number of elements sorted.

12. Name some of the external sorting methods.


• Polyphase merging
• Oscillation sorting
• Merge sorting

13. When is a sorting method said to be stable?


A sorting method is said to be stable if two data items of matching values are guaranteed not
to be rearranged with respect to each other as the algorithm progresses.

14. Define radix sort.


Radix sort is a sorting algorithm that sorts integers by processing individual digits, grouping
together keys whose digits share the same significant position and value.

15. What is the main idea behind merge sort?


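The main idea behind merge sort is to divide the list into smaller lists, sort them, and then
merge the sorted lists into a single sorted list. Merge sort is a comparison based sorting
algorithm. Most implementations produce a stable sort, which means that the implementation
preserves the input order of equal elements in the sorted output.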
