
Sorting and Searching - Problem Solving With Algorithms and Data Structures

The document discusses different searching algorithms like sequential search and binary search. It explains how sequential search works by comparing each item in a list to find a target item. Binary search improves on this by repeatedly dividing the search space in half. The document analyzes the number of comparisons required in the best, average and worst cases for both algorithms.

Uploaded by

satya1401
Copyright
© Attribution Non-Commercial (BY-NC)

Sorting and Searching Problem Solving with Algorithms and Data Str...

http://interactivepython.org/courselib/static/pythonds/SortSearch/search...

To be able to explain and implement sequential search and binary search.
To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort, and shell sort.
To understand the idea of hashing as a search technique.
To introduce the map abstract data type.
To implement the map abstract data type using hashing.

We will now turn our attention to some of the most common problems that arise in computing, those of searching and sorting. In this section we will study searching. We will return to sorting later in the chapter. Searching is the algorithmic process of finding a particular item in a collection of items. A search typically answers either True or False as to whether the item is present. On occasion it may be modified to return where the item is found. For our purposes here, we will simply concern ourselves with the question of membership. In Python, there is a very easy way to ask whether an item is in a list of items. We use the in operator.
    >>> 15 in [3,5,2,4,1]
    False
    >>> 3 in [3,5,2,4,1]
    True

Even though this is easy to write, an underlying process must be carried out to answer the question. It turns out that there are many different ways to search for the item. What we are interested in here is how these algorithms work and how they compare to one another.

When data items are stored in a collection such as a list, we say that they have a linear or sequential relationship. Each data item is stored in a position relative to the others. In Python lists, these relative positions are the index values of the individual items. Since these index values are ordered, it is possible for us to visit them in sequence. This process gives rise to our first searching technique, the sequential search. Figure 1 shows how this search works. Starting at the first item in the list, we simply move from item to item, following the underlying sequential ordering until we either find what we are looking for or run out of items. If we run out of items, we have discovered that the item we were searching for was not present.


10/24/2013 10:21 AM


Figure 1: Sequential Search of a List of Integers

The Python implementation for this algorithm is shown in CodeLens 1. The function needs the list and the item we are looking for and returns a boolean value as to whether it is present. The boolean variable found is initialized to False and is assigned the value True if we discover the item in the list.

    def sequentialSearch(alist, item):
        pos = 0
        found = False

        # Visit each position in order until the item is found
        # or the list is exhausted.
        while pos < len(alist) and not found:
            if alist[pos] == item:
                found = True
            else:
                pos = pos+1

        return found

    testlist = [1, 2, 32, 8, 17, 19, 42, 13, 0]
    print(sequentialSearch(testlist, 3))
    print(sequentialSearch(testlist, 13))

CodeLens: 1 Sequential Search of an Unordered List (search1)
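The introduction notes that a search may, on occasion, be modified to return where the item is found rather than just True or False. A minimal sketch of that variation follows. The function name sequential_search_index and the convention of returning -1 for a missing item are assumptions made for illustration; they do not come from the original text.

```python
def sequential_search_index(alist, item):
    """Return the index of the first occurrence of item, or -1 if it is absent.

    Hypothetical variant of sequentialSearch: same left-to-right scan,
    but it reports the position instead of a boolean.
    """
    for pos, value in enumerate(alist):
        if value == item:
            return pos
    return -1


testlist = [1, 2, 32, 8, 17, 19, 42, 13, 0]
print(sequential_search_index(testlist, 13))  # 7
print(sequential_search_index(testlist, 3))   # -1
```

Returning -1 mirrors a common convention, but raising an exception or returning None would work equally well; the choice only affects how callers test for "not found".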

Analysis of Sequential Search


To analyze searching algorithms, we need to decide on a basic unit of computation. Recall that this is typically the common step that must be repeated in order to solve the problem. For searching, it makes sense to count the number of comparisons performed. Each comparison may or may not discover the item we are looking for. In addition, we make another assumption here. The list of items is not ordered in any way. The items have been placed randomly into the list. In other words, the probability that the item we are looking for is in any particular position is exactly the same for each position of the list.


If the item is not in the list, the only way to know it is to compare it against every item present. If there are n items, then the sequential search requires n comparisons to discover that the item is not there. In the case where the item is in the list, the analysis is not so straightforward. There are actually three different scenarios that can occur. In the best case we will find the item in the first place we look, at the beginning of the list. We will need only one comparison. In the worst case, we will not discover the item until the very last comparison, the nth comparison.

What about the average case? On average, we will find the item about halfway into the list; that is, we will compare against n/2 items. Recall, however, that as n gets large, the coefficients, no matter what they are, become insignificant in our approximation, so the complexity of the sequential search is O(n). Table 1 summarizes these results.

Table 1: Comparisons Used in a Sequential Search of an Unordered List

Case                  Best Case   Worst Case   Average Case
item is present       1           n            n/2
item is not present   n           n            n
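The counts for an unordered list can be checked empirically. The sketch below instruments a sequential search to count comparisons; the helper name count_comparisons is hypothetical. Under the stated assumption that each position is equally likely, the exact average for a present item is (n + 1)/2, which the analysis approximates as n/2.

```python
def count_comparisons(alist, item):
    """Sequential search that returns the number of comparisons performed."""
    comparisons = 0
    for value in alist:
        comparisons += 1
        if value == item:
            break
    return comparisons


testlist = [1, 2, 32, 8, 17, 19, 42, 13, 0]
n = len(testlist)

# Search once for every item that is present and average the counts.
average_present = sum(count_comparisons(testlist, x) for x in testlist) / n
print(average_present)                 # 5.0, which is (n + 1) / 2 for n = 9

# An item that is absent forces a scan of the whole list.
print(count_comparisons(testlist, 3))  # 9, which is n
```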

We assumed earlier that the items in our collection had been randomly placed so that there is no relative order between the items. What would happen to the sequential search if the items were ordered in some way? Would we be able to gain any efficiency in our search technique? Assume that the list of items was constructed so that the items were in ascending order, from low to high. If the item we are looking for is present in the list, the chance of it being in any one of the n positions is still the same as before. We will still have the same number of comparisons to find the item. However, if the item is not present there is a slight advantage. Figure 2 shows this process as the algorithm looks for the item 50. Notice that items are still compared in sequence until 54. At this point, however, we know something extra. Not only is 54 not the item we are looking for, but no other elements beyond 54 can work either since the list is sorted. In this case, the algorithm does not have to continue looking through all of the items to report that the item was not found. It can stop immediately. CodeLens 2 shows this variation of the sequential search function.

Figure 2: Sequential Search of an Ordered List of Integers


    def orderedSequentialSearch(alist, item):
        pos = 0
        found = False
        stop = False

        # Stop early once an element larger than item is seen:
        # in a sorted list, item cannot appear after it.
        while pos < len(alist) and not found and not stop:
            if alist[pos] == item:
                found = True
            else:
                if alist[pos] > item:
                    stop = True
                else:
                    pos = pos+1

        return found

    testlist = [0, 1, 2, 8, 13, 17, 19, 32, 42,]


CodeLens: 2 Sequential Search of an Ordered List (search2)

Table 2 summarizes these results. Note that in the best case we might discover that the item is not in the list by looking at only one item. On average, we will know after looking through only n/2 items. However, this technique is still O(n). In summary, a sequential search is improved by ordering the list only in the case where we do not find the item.

Table 2: Comparisons Used in Sequential Search of an Ordered List

Case               Best Case   Worst Case   Average Case
item is present    1           n            n/2
item not present   1           n            n/2

Self Check

srch-1: Suppose you are doing a sequential search of the list [15, 18, 2, 19, 18, 0, 8, 14, 19, 14]. How many comparisons would you need to do in order to find the key 18?
a) 5
b) 10
c) 4
d) 2

srch-2: Suppose you are doing a sequential search of the ordered list [3, 5, 6, 8, 11, 12, 14, 15, 17, 18]. How many comparisons would you need to do in order to find the key 13?
a) 10
b) 5
c) 7
d) 6

It is possible to take greater advantage of the ordered list if we are clever with our comparisons. In the sequential search, when we compare against the first item, there are at most n - 1 more items to look through if the first item is not what we are looking for. Instead of searching the list in sequence, a binary search will start by examining the middle item. If that item is the one we are searching for, we are done. If it is not the correct item, we can use the ordered nature of the list to eliminate half of the remaining items. If the item we are searching for is greater than the middle item, we know that the entire lower half of the list as well as the middle item can be eliminated from further consideration. The item, if it is in the list, must be in the upper half.

We can then repeat the process with the upper half. Start at the middle item and compare it against what we are looking for. Again, we either find it or split the list in half, thereby eliminating another large part of our possible search space. Figure 3 shows how this algorithm can quickly find the value 54. The complete function is shown in CodeLens 3.

Figure 3: Binary Search of an Ordered List of Integers


    def binarySearch(alist, item):
        first = 0
        last = len(alist)-1
        found = False

        # Each pass compares against the middle of alist[first..last]
        # and discards the half that cannot contain item.
        while first<=last and not found:
            midpoint = (first + last)//2
            if alist[midpoint] == item:
                found = True
            else:
                if item < alist[midpoint]:
                    last = midpoint-1
                else:
                    first = midpoint+1

        return found


CodeLens: 3 Binary Search of an Ordered List (search3)

Before we move on to the analysis, we should note that this algorithm is a great example of a divide and conquer strategy. Divide and conquer means that we divide the problem into smaller pieces, solve the smaller pieces in some way, and then reassemble the whole problem to get the result. When we perform a binary search of a list, we first check the middle item. If the item we are searching for is less than the middle item, we can simply perform a binary search of the left half of the original list. Likewise, if the item is greater, we can perform a binary search of the right half. Either way, this is a recursive call to the binary search function passing a smaller list. CodeLens 4 shows this recursive version.


    def binarySearch(alist, item):
        # Base case: an empty list cannot contain item.
        if len(alist) == 0:
            return False
        else:
            midpoint = len(alist)//2
            if alist[midpoint]==item:
                return True
            else:
                # Recurse on the half that could still contain item.
                if item<alist[midpoint]:
                    return binarySearch(alist[:midpoint],item)
                else:
                    return binarySearch(alist[midpoint+1:],item)

    testlist = [0, 1, 2, 8, 13, 17, 19, 32, 42,]
    print(binarySearch(testlist, 3))
    print(binarySearch(testlist, 13))


CodeLens: 4 A Binary Search--Recursive Version (search4)
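For comparison, Python's standard library offers binary search through the bisect module. The sketch below is not the chapter's implementation; it swaps in bisect_left to perform the halving and then checks whether the item actually sits at the returned position. The wrapper name bisect_search is hypothetical.

```python
from bisect import bisect_left


def bisect_search(alist, item):
    """Membership test on a sorted list using the standard library's bisect_left."""
    # bisect_left returns the leftmost index at which item could be inserted
    # while keeping alist sorted; item is present only if it is already there.
    pos = bisect_left(alist, item)
    return pos < len(alist) and alist[pos] == item


testlist = [0, 1, 2, 8, 13, 17, 19, 32, 42]
print(bisect_search(testlist, 3))   # False
print(bisect_search(testlist, 13))  # True
```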

Analysis of Binary Search


To analyze the binary search algorithm, we need to recall that each comparison eliminates about half of the remaining items from consideration. What is the maximum number of comparisons this algorithm will require to check the entire list? If we start with n items, about n/2 items will be left after the first comparison. After the second comparison, there will be about n/4. Then n/8, n/16, and so on. How many times can we split the list? Table 3 helps us to see the answer.

Table 3: Tabular Analysis for a Binary Search

Comparisons   Approximate Number of Items Left
1             n/2
2             n/4
3             n/8
...
i             n/2^i
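The repeated halving in Table 3 can be reproduced with a few lines of code. This is a rough sketch using integer division, so the counts are approximate in the same sense as the table; the function name halvings is hypothetical. For a positive integer n the result equals the base-2 logarithm of n rounded down.

```python
def halvings(n):
    """Count how many times n items can be halved before one item remains."""
    count = 0
    while n > 1:
        n = n // 2      # discard about half of the remaining items
        count += 1
    return count


for n in (16, 1000, 1000000):
    print(n, halvings(n))
# 16 4
# 1000 9
# 1000000 19
```

Even for a million items, only about twenty halvings are needed, which is why the growth is described as logarithmic.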

When we split the list enough times, we end up with a list that has just one item. Either that is the item we are looking for or it is not. Either way, we are done. The number of comparisons necessary to get to this point is i where n/2^i = 1. Solving for i gives us i = log n. The maximum number of comparisons is logarithmic with respect to the number of items in the list. Therefore, the binary search is O(log n).

One additional analysis issue needs to be addressed. In the recursive solution shown above, the recursive call,
binarySearch(alist[:midpoint],item)

uses the slice operator to create the left half of the list that is then passed to the next invocation (similarly for the right half as well). The analysis that we did above assumed that the slice operator takes constant time. However, we know that the slice operator in Python is actually O(k), where k is the length of the slice. This means that the binary search using slice will not perform in strict logarithmic time. Luckily this can be remedied by passing the list along with the starting and ending indices. The indices can be calculated as we did in CodeLens 3. We leave this implementation as an exercise.

Even though a binary search is generally better than a sequential search, it is important to note that for small values of n, the additional cost of sorting is probably not worth it. In fact, we should always consider whether it is cost effective to take on the extra work of sorting to gain searching benefits. If we can sort once and then search many times, the cost of the sort is not so significant. However, for large lists, sorting even once can be so expensive that simply performing a sequential search from the start may be the best choice.
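One possible shape for the index-passing remedy is sketched below. It is only one way to approach the exercise, not the book's solution: the function name binary_search_indices and the default-argument handling of first and last are assumptions made for illustration. Because no slices are created, each call does a constant amount of work.

```python
def binary_search_indices(alist, item, first=0, last=None):
    """Recursive binary search over alist[first..last] without slicing."""
    if last is None:
        last = len(alist) - 1
    # Base case: the search range is empty, so item is not present.
    if first > last:
        return False
    midpoint = (first + last) // 2
    if alist[midpoint] == item:
        return True
    if item < alist[midpoint]:
        # Continue in the lower half, excluding the midpoint.
        return binary_search_indices(alist, item, first, midpoint - 1)
    # Continue in the upper half, excluding the midpoint.
    return binary_search_indices(alist, item, midpoint + 1, last)


testlist = [0, 1, 2, 8, 13, 17, 19, 32, 42]
print(binary_search_indices(testlist, 3))   # False
print(binary_search_indices(testlist, 13))  # True
```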

Self Check

srch-3: Suppose you have the following sorted list [3, 5, 6, 8, 11, 12, 14, 15, 17, 18] and are using the recursive binary search algorithm. Which group of numbers correctly shows the sequence of comparisons used to find the key 8?
a) 11, 5, 6, 8
b) 12, 6, 11, 8
c) 3, 5, 6, 8
d) 18, 12, 6, 8

srch-4: Suppose you have the following sorted list [3, 5, 6, 8, 11, 12, 14, 15, 17, 18] and are using the recursive binary search algorithm. Which group of numbers correctly shows the sequence of comparisons used to search for the key 16?
a) 11, 14, 17
b) 18, 17, 15
c) 14, 17, 15
d) 12, 17, 15

In previous sections we were able to make improvements in our search algorithms by taking advantage of information about where items are stored in the collection with respect to one another. For example, by knowing that a list was ordered, we could search in logarithmic time using a binary search. In this section we will attempt to go one step further by building a data structure that can be searched in O(1) time. This concept is referred to as hashing. In order to do this, we will need to know even more about where the items might be when we go to look for them in the collection. If every item is where it should be, then the search can use a single comparison to discover the presence of an item. We will see, however, that this is typically not the case.

A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. For example, we will have a slot named 0, a slot named 1, a slot named 2, and so on. Initially, the hash table contains no items so every slot is empty. We can implement a hash table by using a list with each element initialized to the special Python value None. Figure 4 shows a hash table of size m = 11. In other words, there are m slots in the table, named 0 through 10.

Figure 4: Hash Table with 11 Empty Slots

The mapping between an item and the slot where that item belongs in the hash table is called the hash function. The hash function will take any item in the collection and return an integer in the range of slot names, between 0 and m-1. Assume that we have the set of integer items 54, 26, 93, 17, 77, and 31. Our first hash function, sometimes referred to as the remainder method, simply takes an item and divides it by the table size, returning the remainder as its hash value (h(item) = item % 11). Table 4 gives all of the hash values for our example items. Note that this remainder method (modulo arithmetic) will typically be present in some form in all hash functions, since the result must be in the range of slot names.

Table 4: Simple Hash Function Using Remainders

Item   Hash Value
54     10
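The remainder method is easy to try out directly. The sketch below builds an empty table of 11 slots as a list of None values and computes h(item) = item % 11 for the six example items. The names remainder_hash, table_size, and hash_table are chosen for illustration and are not taken from the text.

```python
def remainder_hash(item, table_size):
    """Remainder method: the hash value is item modulo the table size."""
    return item % table_size


table_size = 11
hash_table = [None] * table_size   # every slot starts out empty

items = [54, 26, 93, 17, 77, 31]
for item in items:
    print(item, remainder_hash(item, table_size))
# 54 10
# 26 4
# 93 5
# 17 6
# 77 0
# 31 9
```

Every result falls between 0 and 10, so each hash value names a valid slot in the table.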
