Sorting
Consider the problem of sorting a list x1, x2, ..., xn: arrange the elements so that they (or some key fields in them) are in ascending order,
    x1 <= x2 <= ... <= xn
or in descending order,
    x1 >= x2 >= ... >= xn
Some O(n^2) sorting schemes are easy to understand and to implement, but they are not very efficient, especially for large data sets. They fall into three categories: selection sorts, exchange sorts, and insertion sorts.
Selection Sort
Basic idea: make a number of passes through the list or a part of the list and, on each pass, select one element to be correctly positioned. For example, on each pass through a sublist, the smallest element in this sublist might be found and then moved to its proper location.
Suppose the following list is to be sorted into ascending order:
    67, 33, 21, 84, 49, 50, 75
Scan the list to locate the smallest element, 21, which is found in position 3. Interchange it with the first element, which properly positions the smallest element at the beginning of the list:
    21, 33, 67, 84, 49, 50, 75
In all subsequent scans, the first element need not be examined.
Selection Sort
Continue the sort by scanning the sublist of elements from position 2 on to find the smallest element. Exchange it with the second element (itself in this case), which properly positions the next-to-smallest element in position 2. In all subsequent scans, the first two elements need not be examined.
Continue in this manner, locating the smallest element in the sublist of elements from position i on and interchanging it with the ith element, until the sublist consists only of the last two elements. That final pass either exchanges them or not, and thus completes the sort.
    21, 33, 49, 84, 67, 50, 75    (pass 3: 49 interchanged with 67)
    21, 33, 49, 50, 67, 84, 75    (pass 4: 50 interchanged with 84)
    21, 33, 49, 50, 67, 84, 75    (pass 5: 67 is already in place)
    21, 33, 49, 50, 67, 75, 84    (pass 6: 75 interchanged with 84)
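The passes described above can be sketched in C++ as follows. This is a minimal sketch, not a definitive implementation: the function name SelectionSort and the variable smallPos are illustrative, and the array is assumed to hold its data in positions 1 through n (position 0 unused) so that indices match the positions used in the text.

```cpp
#include <utility>  // std::swap

// Selection sort sketch: sorts x[1..n] into ascending order.
// Assumption: x[0] is unused, matching the 1-based positions in the text.
template <typename ElementType>
void SelectionSort(ElementType x[], int n)
{
    for (int i = 1; i < n; ++i)
    {
        // Locate the smallest element in the sublist x[i..n]
        int smallPos = i;
        for (int j = i + 1; j <= n; ++j)
            if (x[j] < x[smallPos])
                smallPos = j;
        // Interchange it with the ith element (skipped if already in place)
        if (smallPos != i)
            std::swap(x[i], x[smallPos]);
    }
}
```

Called as SelectionSort(x, 7) on the example list stored in x[1..7], the first pass finds 21 in position 3 and interchanges it with 67.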
Exchange Sort
Exchange sorts systematically interchange pairs of elements that are out of order until eventually no such pairs remain, at which point the list is sorted. One example of an exchange sort is bubble sort: very inefficient, but quite easy to understand.
Consider again the list
    67, 33, 21, 84, 49, 50, 75
On the first pass, compare the first two elements, 67 and 33, and interchange them because they are out of order:
    33, 67, 21, 84, 49, 50, 75
Next compare the second and third elements, 67 and 21, and interchange them, yielding:
    33, 21, 67, 84, 49, 50, 75
Bubble Sort
Next we compare 67 and 84 but do not interchange them (they are already in order):
    33, 21, 67, 84, 49, 50, 75
Next, 84 and 49 are compared and interchanged:
    33, 21, 67, 49, 84, 50, 75
Then 84 and 50 are compared and interchanged:
    33, 21, 67, 49, 50, 84, 75
Finally, 84 and 75 are compared and interchanged:
    33, 21, 67, 49, 50, 75, 84
The first pass through the list is now complete. It is guaranteed that on this pass the largest element in the list will sink to the end of the list, since it will be moved past all smaller elements. Notice also that some of the smaller items have bubbled up toward their proper positions nearer the front of the list.
Scan the list again, leaving out the last item (it is already in its proper position).
On the first pass through the list, n - 1 comparisons and, in the worst case, n - 1 interchanges are made, and only the largest element is correctly positioned. On the next pass, the sublist consisting of the first n - 1 elements is scanned; there are n - 2 comparisons and at most n - 2 interchanges, and the next largest element sinks to position n - 1. Continue until the sublist consisting of the first two elements is scanned.
This gives a total of
    (n - 1) + (n - 2) + ... + 1 = n(n - 1) / 2
comparisons and, in the worst case, as many interchanges, so the worst-case computing time for bubble sort is O(n^2).
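The comparison count can be checked with a small sketch of bubble sort that tallies its comparisons. The name BubbleSort, the returned counter, and the 1-based layout (x[0] unused) are assumptions made for illustration; this basic version always performs every pass and does not stop early when a pass makes no interchanges.

```cpp
#include <utility>  // std::swap

// Bubble sort sketch: sorts x[1..n] into ascending order and returns
// the number of comparisons made. Assumption: x[0] is unused.
template <typename ElementType>
int BubbleSort(ElementType x[], int n)
{
    int comparisons = 0;
    // After each pass the largest element of x[1..last] is in position last
    for (int last = n; last > 1; --last)
    {
        for (int j = 1; j < last; ++j)
        {
            ++comparisons;
            if (x[j + 1] < x[j])          // adjacent pair out of order
                std::swap(x[j], x[j + 1]);
        }
    }
    return comparisons;
}
```

For the seven-element example list this makes 6 + 5 + 4 + 3 + 2 + 1 = 21 comparisons, which agrees with n(n - 1) / 2 for n = 7.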
Insertion Sort
Insertion sorts are based on the process of repeatedly inserting a new element into an already sorted list. At the ith stage, xi is inserted into its proper place among the already sorted x1, x2, ..., xi-1. Compare xi with each of these elements, starting from the right end, and shift them to the right as necessary. Use array position 0 to store a copy of xi; this copy acts as a sentinel that prevents falling off the left end in these right-to-left scans.
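A minimal sketch of this scheme, with the copy in position 0 acting as the sentinel, might look like the following. The name InsertionSort is illustrative, and the caller is assumed to supply an array with n + 1 slots so that x[0] is free for the copy.

```cpp
// Insertion sort sketch: sorts x[1..n] into ascending order.
// Assumption: x[0] is a scratch slot that the caller does not use for data.
template <typename ElementType>
void InsertionSort(ElementType x[], int n)
{
    for (int i = 2; i <= n; ++i)
    {
        x[0] = x[i];                 // copy of xi; also stops the scan at the left end
        int j = i;
        // Shift larger elements one place right, scanning right to left.
        // When j - 1 reaches 0 the test compares x[0] with itself and fails.
        while (x[0] < x[j - 1])
        {
            x[j] = x[j - 1];
            --j;
        }
        x[j] = x[0];                 // insert xi into its proper place
    }
}
```

Because the scan always stops at position 0 at the latest, the inner loop needs no separate bounds check.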
Heaps
A heap is a binary tree with the following properties:
1. Left-complete: each level of the tree is completely filled, except possibly the bottom level, where the nodes are in the leftmost positions.
2. Heap-ordered: the data item stored in each node is greater than or equal to the data items stored in its children.
[Figure: two example binary trees on the values 12, 14, 22, 24, 28, one labeled "Not a heap" and one labeled "Heap"]
Heaps
To implement a heap, an array or a vector can be used most effectively. Simply number the nodes in the heap from top to bottom, number the nodes on each level from left to right, and store the data in the ith node in the ith location of the array. The completeness property of a heap guarantees that these data items will be stored in consecutive locations at the beginning of the array.
If heap is the name of the array or vector used, the items in the previous heap are stored as follows:
    heap[1] = 24, heap[2] = 14, heap[3] = 28, heap[4] = 12, heap[5] = 22
In an array implementation, it is easy to find the children of a given node: the children of the ith node are at locations 2*i and 2*i + 1. Similarly, the parent of the ith node is in location i / 2 (integer division).
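The index arithmetic makes the heap-order property easy to test directly on the array. The following sketch uses a hypothetical helper, IsHeap, which is not part of the original material; it assumes the items are stored in positions 1 through n with position 0 unused.

```cpp
// Heap-order check sketch for an array-based binary tree stored in heap[1..n].
// Assumption: heap[0] is unused, so the parent of node i is at i / 2.
template <typename ElementType>
bool IsHeap(const ElementType heap[], int n)
{
    for (int i = 2; i <= n; ++i)
    {
        // Every node other than the root must not exceed its parent
        if (heap[i / 2] < heap[i])
            return false;
    }
    return true;
}
```

Left-completeness needs no separate check here, because storing the items in consecutive locations 1 through n already forces that shape.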
Heapsort
1. Consider array x as a complete binary tree and use the Heapify algorithm to convert this tree to a heap.
2. For i = n down to 2:
   a. Interchange x[1] and x[i], thus putting the largest element in the sublist x[1], ..., x[i] at the end of the sublist.
   b. Apply the PercolateDown algorithm to convert the binary tree corresponding to the sublist stored in positions 1 through i - 1 of x back into a heap.
In PercolateDown, the number of items in the subtree considered at each stage is one-half the number of items in the subtree at the preceding stage. Thus, its worst-case computing time is O(log2 n).
The Heapify algorithm executes PercolateDown n/2 times, so its worst-case computing time is O(n log2 n).
Heapsort executes Heapify one time and PercolateDown n - 1 times; consequently, its worst-case computing time is O(n log2 n).
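The text names PercolateDown and Heapify without showing them, so the following is only one plausible sketch of what they might look like. The parameter lists (a root position r and a size n) are assumptions, as is the 1-based layout with x[0] unused.

```cpp
#include <utility>  // std::swap

// Sketch: restore heap order in the subtree rooted at position r,
// considering only the items in x[1..n]. Assumes both subtrees of r
// are already heaps.
template <typename ElementType>
void PercolateDown(ElementType x[], int r, int n)
{
    int c = 2 * r;                        // left child of r
    while (c <= n)
    {
        if (c < n && x[c] < x[c + 1])     // choose the larger child
            ++c;
        if (!(x[r] < x[c]))               // heap order holds; stop
            break;
        std::swap(x[r], x[c]);            // move the larger child up
        r = c;
        c = 2 * r;
    }
}

// Sketch: convert x[1..n] into a heap by percolating down from the last
// node that has a child (position n / 2) back to the root.
template <typename ElementType>
void Heapify(ElementType x[], int n)
{
    for (int r = n / 2; r >= 1; --r)
        PercolateDown(x, r, n);
}
```

Each trip through the loop in PercolateDown descends one level of the tree, which is where the O(log2 n) bound comes from.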
Heapsort
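template <typename ElementType>
void HeapSort(ElementType x[], int n)
{
    Heapify(x, n);                      // convert x[1..n] into a heap
    for (int index = n; index > 1; index--)
    {
        // Put the largest remaining element at the end of the sublist
        ElementType temp = x[1];
        x[1] = x[index];
        x[index] = temp;
        // Restore the heap in positions 1 through index - 1
        // (assumes PercolateDown(x, r, n) takes the root position r
        //  and the number of items n still in the heap)
        PercolateDown(x, 1, index - 1);
    }
}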
Quicksort
Quicksort is a more efficient exchange sorting scheme than bubble sort: because a typical exchange involves elements that are far apart, fewer interchanges are required to correctly position an element.
Quicksort uses a divide-and-conquer strategy: a recursive approach to problem solving in which the original problem is partitioned into simpler subproblems, each of which is considered independently. Subdivision continues until the subproblems obtained are simple enough to be solved directly.
Choose some element, called a pivot. Perform a sequence of exchanges so that all elements that are less than this pivot are to its left and all elements that are greater than the pivot are to its right. This divides the (sub)list into two smaller sublists, each of which may then be sorted independently in the same way.
Quicksort
1. If the list has 0 or 1 elements, return. // the list is sorted
Else do:
2. Pick an element in the list to use as the pivot.
3. Split the remaining elements into two disjoint groups:
       SmallerThanPivot = {all elements < pivot}
       LargerThanPivot = {all elements > pivot}
4. Return the list rearranged as:
       Quicksort(SmallerThanPivot), pivot, Quicksort(LargerThanPivot).
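The four steps translate almost literally into a version that builds the two groups as separate vectors. This is a sketch for illustration rather than an efficient implementation: the name QuicksortLists is hypothetical, the first element is taken as the pivot, and elements equal to the pivot are placed in the first group, a case the steps above do not address.

```cpp
#include <cstddef>
#include <vector>

// List-based quicksort sketch following the four steps above.
// Assumptions: pivot = first element; elements equal to the pivot
// are grouped with the smaller ones.
template <typename ElementType>
std::vector<ElementType> QuicksortLists(const std::vector<ElementType>& list)
{
    if (list.size() <= 1)                 // step 1: already sorted
        return list;

    const ElementType pivot = list.front();            // step 2
    std::vector<ElementType> smaller, larger;           // step 3
    for (std::size_t i = 1; i < list.size(); ++i)
    {
        if (!(pivot < list[i]))
            smaller.push_back(list[i]);
        else
            larger.push_back(list[i]);
    }

    // Step 4: Quicksort(smaller), pivot, Quicksort(larger)
    std::vector<ElementType> result = QuicksortLists(smaller);
    result.push_back(pivot);
    const std::vector<ElementType> sortedLarger = QuicksortLists(larger);
    result.insert(result.end(), sortedLarger.begin(), sortedLarger.end());
    return result;
}
```

The in-place Split() approach avoids the extra vectors this version allocates at every level of recursion.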
Quicksort Example
Given the list 75, 70, 65, 84, 98, 78, 100, 93, 55, 61, 81, 68 to sort, select, arbitrarily, the first element, 75, as the pivot.
Search from the right for an element <= 75 and from the left for an element > 75. Swap these two elements, and repeat until the two searches meet at the same element.
    75, 70, 65, 84, 98, 78, 100, 93, 55, 61, 81, 68    (right search stops at 68, left search at 84)
    75, 70, 65, 68, 98, 78, 100, 93, 55, 61, 81, 84    (after the swap; right search stops at 61, left at 98)
    75, 70, 65, 68, 61, 78, 100, 93, 55, 98, 81, 84    (after the swap; right search stops at 55, left at 78)
    75, 70, 65, 68, 61, 55, 100, 93, 78, 98, 81, 84    (after the swap; the searches now meet at 55)
Done: swap the element where the searches met, 55, with the pivot.
Quicksort Example
The previous Split operation placed the pivot 75 so that all elements to its left are <= 75 and all elements to its right are > 75. The pivot 75 is now in its final position:
    55, 70, 65, 68, 61, 75, 100, 93, 78, 98, 81, 84
The sublists on either side of 75 still need to be sorted, independently:
    55, 70, 65, 68, 61
    100, 93, 78, 98, 81, 84
Quicksort performance:
O(n log n) if the pivot results in sublists of approximately the same size.
O(n^2) in the worst case (for example, a list that is already in order or in reverse order), when Split() repeatedly produces one empty sublist.
Quicksort
#include <utility>  // std::swap

template <typename ElementType>
void Split(ElementType x[], int first, int last, int& pos)
{
    ElementType pivot = x[first];   // pivot element
    int left = first,               // index for left search
        right = last;               // index for right search
    while (left < right)
    {
        // Search from right for element <= pivot
        while (x[right] > pivot)
            right--;
        // Search from left for element > pivot
        while (left < right && x[left] <= pivot)
            left++;
        // Interchange elements if searches haven't met
        if (left < right)
            std::swap(x[left], x[right]);
    }
    // End of searches; place pivot in correct position
    pos = right;
    x[first] = x[pos];
    x[pos] = pivot;
}
Quicksort
template <typename ElementType>
void Quicksort(ElementType x[], int first, int last)
{
    int pos;                          // final position of pivot
    if (first < last)                 // list has more than one element
    {
        // Split into two sublists
        Split(x, first, last, pos);
        // Sort left sublist
        Quicksort(x, first, pos - 1);
        // Sort right sublist
        Quicksort(x, pos + 1, last);
    }
    // else list has 0 or 1 element and requires no sorting
}

This function is called with a statement of the form
    Quicksort(x, 1, n);
Quicksort Improvement I
Quicksort is a recursive function, so a stack of activation records must be maintained by the system to manage the recursion. The deeper the recursion is, the larger this stack will become. The depth of the recursion and the corresponding overhead can be reduced by sorting the smaller sublist at each stage first.
Another improvement aimed at reducing the overhead of recursion is to use an iterative version of Quicksort(). To do so, use a stack to store the first and last positions of the sublists that would otherwise be sorted recursively.
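Both ideas can be combined in one sketch: an explicit stack holds the bounds of sublists still to be sorted, and at each stage the smaller sublist is handled first while the larger one is deferred. The names QuicksortIterative and SplitRange are hypothetical; SplitRange simply repeats the partitioning logic of Split() and returns the pivot position so that the block is self-contained.

```cpp
#include <stack>
#include <utility>  // std::pair, std::swap

// Partition x[first..last] around x[first]; returns the pivot's final position.
template <typename ElementType>
int SplitRange(ElementType x[], int first, int last)
{
    ElementType pivot = x[first];
    int left = first, right = last;
    while (left < right)
    {
        while (pivot < x[right])                      // from right: element <= pivot
            --right;
        while (left < right && !(pivot < x[left]))    // from left: element > pivot
            ++left;
        if (left < right)
            std::swap(x[left], x[right]);
    }
    x[first] = x[right];
    x[right] = pivot;
    return right;
}

// Iterative quicksort sketch: explicit stack of (first, last) pairs.
template <typename ElementType>
void QuicksortIterative(ElementType x[], int first, int last)
{
    std::stack<std::pair<int, int>> pending;
    pending.push({first, last});
    while (!pending.empty())
    {
        int lo = pending.top().first;
        int hi = pending.top().second;
        pending.pop();
        while (lo < hi)
        {
            int pos = SplitRange(x, lo, hi);
            if (pos - lo < hi - pos)
            {
                pending.push({pos + 1, hi});   // defer the larger right sublist
                hi = pos - 1;                  // sort the smaller left one now
            }
            else
            {
                pending.push({lo, pos - 1});   // defer the larger left sublist
                lo = pos + 1;                  // sort the smaller right one now
            }
        }
    }
}
```

Since the sublist handled immediately is never more than half the size of the current one, the number of deferred pairs on the stack grows only logarithmically with n, even when the partitions are badly unbalanced.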
Quicksort Improvement II
An arbitrarily chosen pivot, such as the first element, gives a poor partition for nearly sorted lists (or lists in reverse order): virtually all the elements go into either SmallerThanPivot or LargerThanPivot, all through the recursive calls. Quicksort then takes quadratic time to do essentially nothing at all.
One common method for selecting the pivot is the median-of-three rule: select the median of the first, middle, and last elements in each sublist as the pivot. Often the list to be sorted is already partially ordered, and in that case the median-of-three rule will select a pivot closer to the middle of the sublist than the first-element rule will.
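One way to apply the median-of-three rule without changing the partitioning code is to move the median into the first position before splitting, since Split() takes x[first] as its pivot. The helper below is a hypothetical sketch under that assumption; its name, MedianOfThreeToFront, is not from the original material.

```cpp
#include <utility>  // std::swap

// Median-of-three sketch: order x[first], x[mid], x[last] among themselves,
// then move the median into x[first] so a first-element partition uses it.
template <typename ElementType>
void MedianOfThreeToFront(ElementType x[], int first, int last)
{
    int mid = first + (last - first) / 2;
    if (x[mid] < x[first])
        std::swap(x[mid], x[first]);
    if (x[last] < x[first])
        std::swap(x[last], x[first]);
    if (x[last] < x[mid])
        std::swap(x[last], x[mid]);
    // Now x[first] <= x[mid] <= x[last]; the median is at mid
    std::swap(x[first], x[mid]);
}
```

Calling this just before Split(x, first, last, pos) in each recursive step would make an already ordered sublist split near its middle instead of producing an empty sublist.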
Mergesort
Sorting schemes are either
    internal: designed for data items stored in main memory, or
    external: designed for data items stored in secondary memory.
The previous sorting schemes were all internal sorting algorithms: they required direct access to list elements (not possible for sequential files) and made many passes through the list (not practical for files).
Mergesort can be used both as an internal and as an external sort. The basic operation in mergesort is merging, that is, combining two lists that have previously been sorted so that the resulting list is also sorted.
Mergesort
For example:
    File1: 15 20 25 35 45 60 65 70
    File2: 10 30 40 50 55
Pair by pair, compare the smallest unmerged element in File1, call it x, with the smallest unmerged element in File2, call it y. If x < y, copy x from File1 to the "merged" file, File3; else copy y from File2 to File3.
After the first two steps:
    File3: 10
    File3: 10 15
Mergesort
Continuing in the same way with File1 (15 20 25 35 45 60 65 70) and File2 (10 30 40 50 55), File3 grows by one element per step:
    File3: 10 15 20
    File3: 10 15 20 25
    File3: 10 15 20 25 30
    File3: 10 15 20 25 30 35
    File3: 10 15 20 25 30 35 40
    File3: 10 15 20 25 30 35 40 45
Etc.
Mergesort
1. Open File1 and File2 for input, File3 for output.
2. Read the first element x from File1 and the first element y from File2.
3. Repeat the following until the end of either File1 or File2 is reached:
   If x < y
      a. Write x to File3.
      b. Read a new x value from File1.
   Else
      a. Write y to File3.
      b. Read a new y value from File2.
4. If the end of File1 was encountered, copy any remaining elements from File2 into File3.
   Else // end of File2 was encountered
   copy the rest of File1 into File3.
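The merge steps above can be sketched with C++ streams standing in for the three files. The names MergeFiles and MergeExample are hypothetical, the elements are assumed to be whitespace-separated integers, and string streams are used in the example only so that it runs without real files; std::ifstream and std::ofstream objects could be passed in the same way.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Merge sketch: file1 and file2 hold ascending integers; the merged
// sequence is written to file3, separated by single spaces.
void MergeFiles(std::istream& file1, std::istream& file2, std::ostream& file3)
{
    int x = 0, y = 0;
    bool haveX = static_cast<bool>(file1 >> x);   // step 2
    bool haveY = static_cast<bool>(file2 >> y);
    while (haveX && haveY)                        // step 3
    {
        if (x < y)
        {
            file3 << x << ' ';
            haveX = static_cast<bool>(file1 >> x);
        }
        else
        {
            file3 << y << ' ';
            haveY = static_cast<bool>(file2 >> y);
        }
    }
    // Step 4: copy whatever remains in the file that is not exhausted
    while (haveX) { file3 << x << ' '; haveX = static_cast<bool>(file1 >> x); }
    while (haveY) { file3 << y << ' '; haveY = static_cast<bool>(file2 >> y); }
}

// Runs the merge on the File1 and File2 contents from the example.
std::string MergeExample()
{
    std::istringstream file1("15 20 25 35 45 60 65 70");
    std::istringstream file2("10 30 40 50 55");
    std::ostringstream file3;
    MergeFiles(file1, file2, file3);
    return file3.str();
}
```

Each input is read strictly front to back and only one element per file is held in memory at a time, which is what makes merging suitable for sequential files.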