Data Structures
Objectives
After studying this unit, you should be able to:
Understand the concept of data types.
Discuss arrays and their types.
Explain pointers.
1.1 Introduction
The power of a programming language depends, among other things, on the range of
different types of data it can handle.
Inside a digital computer, at the lowest level, all data and instructions are stored
using only binary digits (0 and 1). Thus, the decimal number 65 is stored as its binary
equivalent: 0100 0001. The character “A” is also stored as 0100 0001, the binary
equivalent of 65 (the ASCII code of “A”). Both stored values are the same, yet they
represent different types of values. How is that possible?
Actually, the interpretation of a stored value depends on the type of the variable in
which the value is stored, even though the bit pattern itself is just 0100 0001. Thus, if
0100 0001 is stored in an integer type variable, it will be interpreted as the integer value
65, whereas if it is stored in a character type variable, it will represent “A”.
Therefore, the way a value stored in a variable is interpreted is known as its data
type. In other words, data type of a variable is the type of data it can store.
1.2 Data Type
Every computer language has its own set of data types it supports. Also, the size of the
data types (number of bytes necessary to store the value) varies from language to
language. Besides, it is also hardware platform dependent.
C has a rich set of data types that is capable of catering to all the programming
requirements of an application. The C-data types may be classified into two categories:
Primary and Composite data types as shown below.
void Array
char Pointer
int Structure
float Union
double Enum, etc.
In addition to these data types, C also has data type qualifiers – short, long, signed
and unsigned. Thus an integer type may be declared in C as short int, int, unsigned int
or long int. The range of values and the size of these qualified data types depend on
the compiler and the hardware platform.
1.3 Array
An array is a group of data items of same data type that share a common name.
Ordinary variables are capable of holding only one value at a time. If we want to store
more than one value at a time in a single variable, we use arrays.
An array is a collective name given to a group of similar variables. Each member in
the group is referred to by its position in the group. Arrays are allotted the memory in a
strictly contiguous fashion. The simplest array is one dimensional array which is a list of
variables of same data type. An array of one dimensional arrays is called a two
dimensional array.
As C performs no bounds checking, care should be taken to ensure that the array
indices are within the declared limits. Also, indexing in C begins from 0 and not from 1.
Array Declaration
Arrays are defined in the same manner as ordinary variables, except that each array
name must be accompanied by the size specification.
The general form of array declaration is:
data-type array-name [size];
data-type specifies the type of array, size is a positive integer number or symbolic
constant that indicates the maximum number of elements that can be stored in the
array.
e.g.: float height [50];
This declaration declares an array named height containing 50 elements of type
float.
Note: The compiler will interpret the first element as height[0]. In C, the array
elements are indexed from 0 to size - 1.
Array Initialization
The elements of an array can be initialized in the same way as the ordinary variables,
when they are declared. Given below are some examples which show how the arrays
are initialized.
static int num [6] = {2, 4, 5, 45, 12};
static int n [ ] = {2, 4, 5, 45, 12};
static float press [ ] = {12.5, 32.4, -23.7, -11.3};
In these examples note the following points:
(a) Until the array elements are given specific values, they contain garbage values.
(b) If the array is initialized where it is declared, its storage class must be either static
or extern. If the storage class is static, all the elements are initialized to 0 by default.
(c) If the array is initialized where it is declared, mentioning the dimension of the array
is optional.
2. Program to read in a one dimensional character array, convert all the elements to
upper case, and then write out the converted array
#include <stdio.h>
#include <ctype.h>
#define SIZE 80

int main()
{
    char letter[SIZE];
    int count;

    for (count = 0; count < SIZE; ++count)      /* read in the line */
        letter[count] = getchar();
    for (count = 0; count < SIZE; ++count)      /* write out the line in upper case */
        putchar(toupper(letter[count]));
    return 0;
}
It is sometimes convenient to define an array size in terms of a symbolic constant
rather than a fixed integer quantity. This makes it easier to modify a program that
utilizes an array, since all references to the maximum array size can be altered simply
by changing the value of the symbolic constant.
1.3.2 Strings
Just as a group of integers can be stored in an integer array, a group of characters can
be stored in a character array, or “string”. A string constant is a one-dimensional array of
characters terminated by null character (‘\0’). This null character ‘\0’ (ASCII value 0) is
different from ‘0’ (ASCII value 48).
The terminating null character is important because it is the only way the function
that works with string can know where the string ends.
e.g.: static char name[] = {'K', 'R', 'I', 'S', 'H', '\0'};
This example shows the declaration and initialization of a character array. The array
elements of a character array are stored in contiguous locations with each element
occupying one byte of memory.
Note:
1. Contrary to the numeric array where a 5 digit number can be stored in one array
cell, in the character arrays only a single character can be stored in one cell. So in
order to store an array of strings, a 2-dimensional array is required.
2. As the scanf() function with %s is not capable of receiving a multi-word string, such
strings should be entered using gets() (see the sketch below).
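As an illustration, the following sketch reads a multi-word string into a character array. It uses fgets() rather than gets(): fgets() behaves like gets() here but limits how many characters are read, which avoids overflowing the array. The example is illustrative and not taken from the text.

#include <stdio.h>
#include <string.h>

int main()
{
    char name[40];

    printf("Enter your full name: ");
    /* fgets reads the whole line, spaces included, unlike scanf("%s", ...) */
    if (fgets(name, sizeof name, stdin) != NULL)
    {
        name[strcspn(name, "\n")] = '\0';   /* remove the trailing newline */
        printf("Hello, %s\n", name);
    }
    return 0;
}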
In memory, whether it is a one-dimensional or a two-dimensional array, the array
elements are stored in one continuous chain.
Consider, for example, a two-dimensional array of students that contains roll
numbers in one column and marks in the other; its rows are stored one after another in
memory.
e.g.:
1. Program that stores roll number and marks obtained by a student side by side in a
matrix
#include <stdio.h>

int main()
{
    int stud[4][2];
    int i;

    for (i = 0; i <= 3; i++)
    {
        printf("\n Enter roll no. and marks: ");
        scanf("%d %d", &stud[i][0], &stud[i][1]);
    }
    for (i = 0; i <= 3; i++)
        printf("%d %d\n", stud[i][0], stud[i][1]);
    return 0;
}
There are two parts to the program: in the first part, through a for loop, we read in
the values of roll number and marks, whereas in the second part, through another for
loop, we print out these values.
2. Program to print a multiplication table, using two dimensional array.
#include <stdio.h>
#define ROWS 5
#define COLUMNS 5

int main()
{
    int row, column, product[ROWS][COLUMNS];
    int i, j;

    printf("MULTIPLICATION TABLE\n");
    printf("    ");
    for (j = 1; j <= COLUMNS; j++)
        printf("%4d", j);
    printf("\n");
    for (i = 0; i < ROWS; i++)
    {
        row = i + 1;
        printf("%2d  ", row);
        for (j = 1; j <= COLUMNS; j++)
        {
            column = j;
            product[i][j - 1] = row * column;
            printf("%4d", product[i][j - 1]);
        }
        printf("\n");
    }
    return 0;
}
3. Accept character string and find its length.
Note: We will solve this question with a loop instead of using the library function
strlen().

#include <stdio.h>

int main()
{
    char name[20];
    int i, len;

    printf("\n Enter the name: ");
    scanf("%s", name);
    for (i = 0; name[i] != '\0'; i++)
        ;                                   /* empty body: the loop only counts characters */
    len = i;
    printf("\n Length of the string is %d", len);
    return 0;
}
1.4 Pointer
A memory variable is merely a symbolic reference given to a memory location. Now let
us consider that an expression in a C program is as follows:
int a = 10, b = 5, c;
c = a + b;
The above expression implies that a, b and c are the variables which can hold the
integer data. Now from the above mentioned statement let us assume that the variable
‘a’ occupies the address 3000 in the memory, ‘b’ occupies 3020 and the variable ‘c’
occupies 3040 in the memory. Then the compiler will generate the machine instruction
to transfer the data from the location 3000 and 3020 into the CPU, add them and
transfer the result to the location 3040 referenced as c. Hence we can conclude that
every variable holds two values:
Address of the variable in the memory (l-value)
Value stored at that memory location referenced by the variable. (r-value)
A pointer is simply a data type in the C programming language whose special
characteristic is that it holds the address of some other memory location as its r-value.
C programming language provides ‘&’ operator to extract the address of any object.
These addresses can be stored in the pointer variable and can be manipulated.
The syntax for declaring a pointer variable is,
<data type> *<identifier>;
Example:
int n;
int *ptr;/* pointer to an integer*/
The following statement assigns the address location of the variable n to ptr, and ptr
is a pointer to n.
ptr=&n;
Since a pointer variable points to a location, the content of that location is obtained
by prefixing the pointer variable by the unary operator * (also called the indirection or
dereferencing operator) like, *<pointer_variable>.
Example:
#include <stdio.h>

int main()
{
    int a = 10, *ptr;

    ptr = &a;                                 /* ptr points to the location of a */
    printf("a = %d, *ptr = %d\n", a, *ptr);   /* both print 10 */
    return 0;
}
As the memory addresses are numbers, they can be assigned to some other
variable. Let ptr be the variable which holds the address of variable num. We can
access the value of num by the variable ptr. Thus, we can say “ptr points to num”.
Diagrammatically, it can be shown as:
Variable    Value stored    Address in memory
num         197             1001
ptr         1001            2341
Note that ptr is itself a variable therefore it will also be stored at a location in the
memory having some address (2341 in above case). Here we say that – ptr is a pointer
variable which is currently pointing to an integer type variable num which holds the
value 197.
Indirection Operation - *
Since a pointer type variable contains the address of another variable, the value stored
in the target variable can be obtained using this address. The value stored in a variable
can be referred to through a pointer variable pointing to it by using the indirection
operator (*).
For example, consider the following code.
int x = 109;
int *p;
p = &x;
Then the expression
*p
represents the value 109.
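The indirection operator can also be used to write into the target variable through the pointer. The following short program is a sketch combining both uses.

#include <stdio.h>

int main()
{
    int x = 109;
    int *p = &x;

    printf("x = %d, *p = %d\n", x, *p);   /* both print 109 */

    *p = 200;                             /* writes into x through the pointer */
    printf("x is now %d\n", x);           /* prints 200 */

    return 0;
}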
1.6 Summary
An array is a group of memory locations related by the fact that they all have the same
name and the same data type. An array with more than one dimension is called a
multidimensional array. The size of an array must be a positive number. If an array is
declared without a size and is initialized to a series of values, it is implicitly given the
size of the number of initializers. Array subscripts always start at 0, and the last
element's subscript is always one less than the size of the array; e.g., an array with 10
elements contains elements 0 to 9. The size of an array must be a constant.
Pointers are often passed to a function as arguments by reference. This allows data
items within the calling function to be accessed, altered by the called function, and then
returned to the calling function in the altered form. There is an intimate relationship
between pointers and arrays as an array name is really a pointer to the first element in
the array. Access to the elements of an array using pointers is enabled by adding the
respective subscript to the pointer value (i.e. the address of the zeroth element) and
preceding the resulting expression with the indirection operator.
Objectives
After studying this unit, you should be able to:
Describe internal and external sorting.
Explain various simple sorting schemes like quick sort, merge sort, etc.
Compare various sorting algorithms
Define and explain order statistics
Describe linear and binary search.
2.1 Introduction
In this lesson we cover four topics. The first topic, internal sorting, discusses the
definition of internal sorting with an example. The next topic presents various simple
sorting schemes such as bubble sort, insertion sort, heap sort, quick sort, merge sort
and bin sort, with their algorithms and one example each. In the following section we
compare the various sorting algorithms. The last section introduces the concept of order
statistics; the topics discussed there are the ith order statistic, the median, the selection
problem, minimum and maximum, and simultaneous minimum and maximum.
2.2 Sorting
A sorting algorithm is an algorithm which puts the elements of a list in a certain order.
The order can be either numerical or lexicographical. Efficient sorting is important for
optimizing the use of other algorithms which require their input data to be in sorted lists.
Data is often taken to be in an array, which allows random access, rather than a list,
which only allows sequential access, though often algorithms can be applied with
suitable modification to either type of data.
Sorting algorithms are prevalent in introductory computer science classes, where
the abundance of algorithms for the problem provides a gentle introduction to a variety
of core algorithm concepts, such as big O notation, divide and conquer algorithms, data
structures such as heaps and binary trees, randomized algorithms, best, worst and
average case analysis, time-space trade-offs, and upper and lower bounds.
Types of Sorting
There are two types of sorting: internal and external.
Internal Sorting: Here all the data to be sorted is available in the high-speed main
memory of the computer. An internal sort is a sorting process which takes place entirely
inside the main memory of a computer. This is only possible whenever the data which is
needed to be sorted is small enough, so that it can be held in the main memory. For
sorting larger datasets, it may be necessary to hold only a small amount (chunk) of data
in memory at a time, since it won’t all fit. The rest of the data is normally held on some
larger, but slower medium, like a hard-disk. Any reading or writing of data to and from
this slower medium can slow the sorting process considerably. This issue has
implications for different sort algorithms.
Example:
Bubble sort
Insertion sort
Quick sort
Two-way merge sort
Heap sort
Consider a Bubble sort, where adjacent records are swapped in order to get them
into the right order, so that records appear to “bubble” up and down through the data
space. If this has to be done in chunks, then when we have sorted all the records in
chunk 1, we move on to chunk 2, but we find that some of the records in chunk 1 need
to “bubble through” chunk 2, and vice versa (i.e., there are records in chunk 2 that
belong in chunk 1, and records in chunk 1 that belong in chunk 2 or later chunks). This
will cause the chunks to be read and written back to disk many times as records cross
over the boundaries between them, resulting in a considerable degradation of
performance. If the data can all be held in memory as one large chunk, then this
performance hit is avoided.
On the other hand, some algorithms handle external sorting rather better. A Merge
sort breaks the data up into chunks, sorts the chunks by some other algorithm (maybe
bubble sort or Quick sort) and then recombines the chunks two by two so that each
recombined chunk is in order. This approach minimises the number of reads and writes
of data-chunks from disk, and is a popular external sort method.
External Sorting: These sorting methods are employed when the data to be sorted
is too large to fit in primary memory.
Characteristics of External Sorting
During the sort, some of the data must be stored externally. Typically the data will
be stored on tape or disk.
The cost of accessing data is significantly greater than either bookkeeping or
comparison costs.
There may be severe restrictions on access. For example, if tape is used, items
must be accessed sequentially.
Example: Bubble sort of the array A = 10 2 5 12 7 9 (positions 1 to 6, n = 6).

Pass 1 (successive exchanges):
2 10 5 12 7 9
2 5 10 12 7 9
2 5 10 7 12 9
2 5 10 7 9 12        the 6th (nth) location is now sorted

Pass 2 (successive exchanges):
2 5 7 10 9 12
2 5 7 9 10 12        the 5th ((n - 1)th) location is now sorted

Pass 3: no exchanges occurred, so all elements are correctly placed. The sorted array is

A: 2 5 7 9 10 12
for (j = 1; j < n; j++)
{
    for (i = 0; i < n - j; i++)     /* after pass j, the last j elements are in place */
    {
        if (a[i] > a[i + 1])
        {
            temp = a[i];
            a[i] = a[i + 1];
            a[i + 1] = temp;
        }
    }
}
Consider A = 10 2 5 12 7 (here n = 5).

After pass j = 1:  2 5 10 7 12
After pass j = 2:  2 5 7 10 12
After pass j = 3:  2 5 7 10 12   (no exchanges)
After pass j = 4:  2 5 7 10 12   (no exchanges)

The sorted array is
A: 2 5 7 10 12
The worst-case running time, expressed in terms of the individual statement costs
c1, …, c7, is

w(n) = c1·n + c2(n − 1) + c3(n − 1) + c4·Σ(j=2 to n) j + c5·Σ(j=2 to n) (j − 1)
       + c6·Σ(j=2 to n) (j − 1) + c7(n − 1)

where

Σ(j=2 to n) (j − 1) = 1 + 2 + 3 + … + (n − 1) = n(n − 1)/2
Heaps
Heap sort introduces another algorithm design technique: using a data structure, called
a "heap", to manage information during the execution of the algorithm. Not only is the
heap data structure useful for heap sort, but it also makes an efficient priority queue.
The term “heap” was originally coined in the context of heap sort, but it has since
come to refer to “garbage-collected storage,” such as the programming languages Lisp
and Java provide. Our heap data structure is not garbage-collected storage, and
whenever we refer to heaps in this book, we shall mean the structure defined in this
chapter.
The (binary) heap data structure is an array object that can be viewed as a nearly
complete binary tree, as shown in following figure. Each node of the tree corresponds to
an element of the array that stores the value in the node. The tree is completely filled on
all levels except possibly the lowest, which is filled from the left up to a point.
[Figure: a heap containing the ten elements 16, 14, 10, 8, 7, 9, 3, 2, 4, 1, drawn (a) as a
binary tree with the array index shown at each node, and (b) as the array
A = 16 14 10 8 7 9 3 2 4 1.]
Figure 1.2: A heap viewed as (a) a binary tree and (b) an array
An array A that represents a heap is an object with two attributes: length[A], which
is the number of elements in the array, and heap-size[A], the number of elements in the
heap stored within array A. That is, although A[1 .. length[A]] may contain valid
numbers, no element past A[heap-size[A]], where heap-size[A] ≤ length[A], is an
element of the heap.
The root of the tree is A[1], and given the index i of a node, the indices of its parent
PARENT(i), left child LEFT(i), and right child RIGHT(i) can be computed simply:
PARENT(i): return ⌊i/2⌋
LEFT(i): return 2i
RIGHT(i): return 2i + 1
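In a C implementation that stores the heap in an ordinary 0-based array, the same index arithmetic is usually written as below; this is a sketch, shifted by one with respect to the 1-based formulas above.

/* Index arithmetic for a binary heap kept in a 0-based C array.
   (The formulas in the text use 1-based indices.) */
static int parent(int i)      { return (i - 1) / 2; }
static int left_child(int i)  { return 2 * i + 1; }
static int right_child(int i) { return 2 * i + 2; }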
There are two kinds of binary heaps: max-heaps and min-heaps. In both kinds, the
values in the nodes satisfy a heap property, the specifics of which depend on the kind of
heap. In a max-heap, the max-heap property is that for every node i other than the root,
A[PARENT(i)] ≥ A[i]
that is, the value of a node is at most the value of its parent. Thus, the largest element
in a max-heap is stored at the root, and the sub tree rooted at a node contains values
no larger than that contained at the node itself. A min-heap is organized in the opposite
way; the min-heap property is that for every node i other than the root,
A[PARENT(i)] ≤ A[i]
[Figure: the action of HEAPIFY. In (a) the value 4 at node i violates the max-heap
property; in (b) it has been exchanged with its larger child, 14; in (c) it has been
exchanged again, with 8, and the max-heap property is restored.]
We can characterize the running time of HEAPIFY on a node of height h as O(h).
Building a Heap
We can use the procedure HEAPIFY in a bottom-up manner to convert an array
A[1 .. n], where n = length[A], into a heap. It can be shown that the elements in the
subarray A[⌊n/2⌋ + 1 .. n] are all leaves of the tree, and so each is a 1-element heap to
begin with. The procedure BUILD-HEAP goes through the remaining nodes of the tree
and runs HEAPIFY on each one.
Algorithm BUILD-HEAP(A)
{
    heap-size[A] ← length[A]
    for (i ← ⌊length[A]/2⌋ downto 1)
        HEAPIFY(A, i)
}
Following figure shows an example of the action of BUILD-HEAP.
Input array: A = 4 1 3 2 16 9 10 14 8 7
[Figure: panels (a) to (f) show BUILD-HEAP running HEAPIFY on nodes 5, 4, 3, 2 and 1
in turn, transforming the array above into the max-heap 16 14 10 8 7 9 3 2 4 1.]
The total cost of BUILD-HEAP can therefore be bounded as

Σ(h=0 to ⌊lg n⌋) ⌈n/2^(h+1)⌉ · O(h) = O( n · Σ(h=0 to ∞) h/2^h ) = O(n)
[Figure: the operation of heap sort on the max-heap 16 14 10 8 7 9 3 2 4 1. Panels (a)
to (j) show the heap after each step: the root (the current maximum) is exchanged with
the last element of the heap, the heap size is reduced by one, and HEAPIFY is called on
the new root. Panel (k) shows the resulting sorted array A = 1 2 3 4 7 8 9 10 14 16.]
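The sample runs that follow compile a program heap.c whose source is not reproduced in the text. A minimal heap sort in C along the lines of HEAPIFY and BUILD-HEAP (0-based indices; a sketch, not the original program) could look like this:

#include <stdio.h>

static void swap(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Sift a[i] down until the max-heap property holds for the subtree rooted at i */
static void heapify(int a[], int heap_size, int i)
{
    int left = 2 * i + 1, right = 2 * i + 2, largest = i;

    if (left < heap_size && a[left] > a[largest])
        largest = left;
    if (right < heap_size && a[right] > a[largest])
        largest = right;
    if (largest != i)
    {
        swap(&a[i], &a[largest]);
        heapify(a, heap_size, largest);
    }
}

static void build_heap(int a[], int n)
{
    int i;
    for (i = n / 2 - 1; i >= 0; i--)
        heapify(a, n, i);
}

static void heap_sort(int a[], int n)
{
    int i;
    build_heap(a, n);
    for (i = n - 1; i > 0; i--)
    {
        swap(&a[0], &a[i]);   /* move the current maximum to the end */
        heapify(a, i, 0);     /* restore the heap on the remaining i elements */
    }
}

int main()
{
    int a[] = {6, 5, 3, 1, 8, 7, 2};
    int n = sizeof a / sizeof a[0], i;

    heap_sort(a, n);
    printf("The sorted array is:");
    for (i = 0; i < n; i++)
        printf(" %d", a[i]);
    printf("\n");
    return 0;
}

Running this sketch on the data of the first sample run (6, 5, 3, 1, 8, 7, 2), build_heap produces the heap 8 6 7 1 5 3 2 and the final output is 1 2 3 5 6 7 8, matching the transcript below.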
$ cc heap.c
$ a.out
Average case
Enter no of elements: 7
Enter the no’s: 6
5
3
1
8
7
2
Heap array: 8 6 7 1 5 3 2
The sorted array is: 1 2 3 5 6 7 8
Complexity:
Best case = Avg case = Worst case = O(n logn)
$ a.out
/* Best case
Enter no of elements: 7
Enter the no’s: 12
10
8
9
7
4
2
Heap array: 12 10 8 9 7 4 2
The sorted array is: 2 4 7 8 9 10 12
Complexity:
Best case = Avg case = Worst case = O(n logn)
$ a.out
/* Worst case
Enter no of elements: 7
Enter the no’s: 5
7
12
6
9
10
14
Heap array: 14 9 12 5 6 7 10
The sorted array is: 5 6 7 9 10 12 14
Complexity:
Best case = Avg case = Worst case = O(n logn)
*/
2.3.4 Quick Sort
This is the most widely used internal sorting algorithm. Its popularity lies in its ease of
implementation, moderate use of resources and acceptable behaviour in a variety of
sorting situations. The basis of quick sort is the divide and conquer strategy, i.e. divide
the problem into sub-problems until the sub-problems are small enough to be solved
directly. In its basic form, it was invented by C.A.R. Hoare in 1960.
Algorithm quicksort(a, low, high)
{
    if (high > low)
    {
        q = partition(a, low, high)
        quicksort(a, low, q-1)
        quicksort(a, q+1, high)
    }
}

Algorithm partition(a[], lo, hi)
{
    // lo is the lower index, hi is the upper index
    piv = a[lo]
    i = lo
    j = hi
    // partition
    while (i < j)
    {
        while (a[i] <= piv and i < hi) i++
        while (a[j] > piv) j--
        if (i < j)
            exchange a[i] and a[j]    // swapping
    }
    exchange a[lo] and a[j]           // place the pivot in its final position
    return j
}
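A minimal C rendering of the two routines above, keeping the first element as the pivot, might look as follows; the swap helper and the driver main are illustrative additions, not code from the text.

#include <stdio.h>

static void swap(int a[], int i, int j)
{
    int temp = a[i];
    a[i] = a[j];
    a[j] = temp;
}

/* Partition a[lo..hi] around the pivot a[lo]; return the pivot's final index */
static int partition(int a[], int lo, int hi)
{
    int piv = a[lo], i = lo, j = hi;

    while (i < j)
    {
        while (i < hi && a[i] <= piv) i++;
        while (a[j] > piv) j--;
        if (i < j)
            swap(a, i, j);
    }
    swap(a, lo, j);          /* place the pivot in its final position */
    return j;
}

static void quicksort(int a[], int lo, int hi)
{
    if (hi > lo)
    {
        int q = partition(a, lo, hi);
        quicksort(a, lo, q - 1);
        quicksort(a, q + 1, hi);
    }
}

int main()
{
    int a[] = {50, 40, 20, 60, 80, 100, 45, 70, 105, 30, 90, 75};
    int n = sizeof a / sizeof a[0], i;

    quicksort(a, 0, n - 1);
    for (i = 0; i < n; i++)
        printf("%d ", a[i]);
    printf("\n");
    return 0;
}

With the example list used below, the first call to partition places 50 in its final position, leaving 45, 40, 20, 30 on its left and the remaining, larger elements on its right.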
Example: Given the list 50, 40, 20, 60, 80, 100, 45, 70, 105, 30, 90, 75, the first
element, 50, is taken as the pivot. Partitioning places 50 in its final position, with all
smaller elements to its left and all larger elements to its right, giving two sub-lists.
Applying the method recursively to these two sub-lists, we get the sorted array:
20, 30, 40, 45, 50, 60, 70, 75, 80, 90, 100, 105
if (index1 < index2)
{
    /* Swapping operation */
    temp = array[index1];
    array[index1] = array[index2];
    array[index2] = temp;
}
}

int main()
{
    /* Declaring variables */
    int array[100], n, i;
    getch();
    return 0;
}
Merge sort is the best method for sorting linked lists in random order. The total
computing time is O(n log2 n).
The disadvantage of using merge sort is that it requires two arrays of the same size
and space for the merge phase. That is, to sort a list of size n, it needs space for 2n
elements.
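The extra space comes from the merge step, which copies the two sorted runs into an auxiliary array before copying them back. The sketch below illustrates this in C; the function and variable names are illustrative, not taken from the text.

#include <stdio.h>
#include <string.h>

/* Merge the sorted runs a[lo..mid] and a[mid+1..hi] using the auxiliary array tmp.
   The need for tmp is why merge sort requires extra space proportional to n. */
static void merge(int a[], int tmp[], int lo, int mid, int hi)
{
    int i = lo, j = mid + 1, k = lo;

    while (i <= mid && j <= hi)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i <= mid)
        tmp[k++] = a[i++];
    while (j <= hi)
        tmp[k++] = a[j++];

    /* copy the merged run back into the original array */
    memcpy(&a[lo], &tmp[lo], (size_t)(hi - lo + 1) * sizeof(int));
}

static void merge_sort(int a[], int tmp[], int lo, int hi)
{
    if (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;
        merge_sort(a, tmp, lo, mid);
        merge_sort(a, tmp, mid + 1, hi);
        merge(a, tmp, lo, mid, hi);
    }
}

int main()
{
    int a[] = {9, 4, 8, 1, 6, 3}, tmp[6];
    int n = 6, i;

    merge_sort(a, tmp, 0, n - 1);
    for (i = 0; i < n; i++)
        printf("%d ", a[i]);
    printf("\n");
    return 0;
}

The auxiliary array tmp has the same size as a, which is exactly the additional n elements of space mentioned above.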
To insert a new item in the table, we hash the key to determine which list the item
goes on, and then insert the item at the beginning of the list. For example, to insert 11,
we divide 11 by 8 giving a remainder of 3. Thus, 11 goes on the list starting at
hashTable[3]. To find a number, we hash the number and chain down the correct list to
see if it is in the table. To delete a number, we find the number and remove the node
from the linked list.
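A minimal C sketch of this chaining scheme, using 8 buckets and the remainder on division by 8 as the hash function (the structure and function names are illustrative):

#include <stdio.h>
#include <stdlib.h>

#define TABLE_SIZE 8

struct node {
    int key;
    struct node *next;
};

static struct node *hashTable[TABLE_SIZE];     /* one list head per bucket */

static int hash(int key)
{
    return key % TABLE_SIZE;      /* e.g. 11 % 8 == 3, so 11 goes on hashTable[3] */
}

/* Insert the item at the beginning of its bucket's list */
static void insert(int key)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        exit(1);
    n->key = key;
    n->next = hashTable[hash(key)];
    hashTable[hash(key)] = n;
}

/* Chain down the correct list to see whether the key is in the table */
static int find(int key)
{
    struct node *p;
    for (p = hashTable[hash(key)]; p != NULL; p = p->next)
        if (p->key == key)
            return 1;
    return 0;
}

int main()
{
    insert(11);
    insert(19);
    insert(3);
    printf("11 %s\n", find(11) ? "found" : "not found");
    printf("12 %s\n", find(12) ? "found" : "not found");
    return 0;
}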
In the example above, the first column is the input; the remaining columns show the list
after successive sorts on increasingly significant digit positions. The code for Radix sort
assumes that each element in the n-element array A has d digits, where digit 1 is the
lowest-order digit and d is the highest-order digit.
RADIX_SORT (A, d)
for i ← 1 to d do
use a stable sort to sort A on digit i
// counting sort will do the job
Analysis
The running time depends on the stable sort used as the intermediate sorting algorithm.
When each digit is in the range 1 to k, and k is not too large, COUNTING_SORT is the
obvious choice. With counting sort, each pass over the n d-digit numbers takes O(n + k)
time. There are d passes, so the total time for Radix sort is O(d(n + k)) = O(dn + kd).
When d is constant and k = O(n), Radix sort runs in linear time.
/*
 * Main
 */
#include <iostream>
using namespace std;

void radixsort(int arr[], int n);   /* defined elsewhere in the program (see sketch below) */

int main()
{
    int arr[] = {170, 45, 75, 90, 802, 24, 2, 66};
    int n = sizeof(arr)/sizeof(arr[0]);

    radixsort(arr, n);
    for (int i = 0; i < n; i++)
        cout << arr[i] << " ";
    return 0;
}
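The radixsort() function called above is not reproduced in the text. A minimal sketch of what it might look like, an LSD (least-significant-digit-first) radix sort that applies a stable counting sort to each decimal digit, is given below; the helper names and the fixed buffer size are illustrative assumptions, and together with the main() above these definitions form a complete program.

#include <string.h>

/* Largest value in a[0..n-1]; its number of digits decides how many passes are needed */
static int get_max(const int a[], int n)
{
    int i, mx = a[0];
    for (i = 1; i < n; i++)
        if (a[i] > mx)
            mx = a[i];
    return mx;
}

/* Stable counting sort of a[] on the decimal digit selected by exp (1, 10, 100, ...).
   Assumes n <= 100, matching the small arrays used in this unit. */
static void counting_sort_by_digit(int a[], int n, int exp)
{
    int output[100];
    int count[10] = {0};
    int i;

    for (i = 0; i < n; i++)
        count[(a[i] / exp) % 10]++;
    for (i = 1; i < 10; i++)
        count[i] += count[i - 1];              /* prefix sums give final positions */
    for (i = n - 1; i >= 0; i--)               /* traverse backwards to keep the sort stable */
        output[--count[(a[i] / exp) % 10]] = a[i];
    memcpy(a, output, (size_t)n * sizeof(int));
}

void radixsort(int a[], int n)
{
    int exp, mx = get_max(a, n);
    for (exp = 1; mx / exp > 0; exp *= 10)     /* one pass per decimal digit */
        counting_sort_by_digit(a, n, exp);
}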
2.5.2 A Median
A median, informally, is the "halfway point" of the set. When n is odd, the median is
unique, occurring at i = (n + 1)/2. When n is even, there are two medians, occurring at
i = n/2 and i = n/2 + 1. Thus, regardless of the parity of n, medians occur at
i = ⌊(n + 1)/2⌋ and i = ⌈(n + 1)/2⌉.
Algorithm MEDIAN (A, n)    // A is assumed to be sorted in ascending order
{
if (n mod 2 = 0)
print A[n/2] and A[n/2 + 1]
else
print A[(n+1)/2]
}
It requires only 1 comparison, thus time complexity is O(1).
The selection problem can be solved in O(n lg n) time, since we can sort the
numbers using heapsort or merge sort and then simply index the ith element in the
output array.
In fact, only 3⌊n/2⌋ comparisons are necessary to find both the minimum and the
maximum.
Algorithm MINIMAX (A)
{
    min = A[1]
    max = A[1]
    for (i = 2 to length[A])
    {
        if (min > A[i]) min = A[i]
        if (max < A[i]) max = A[i]
    }
}
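The algorithm above makes two comparisons for every element after the first, about 2n in total. The 3⌊n/2⌋ bound mentioned earlier is achieved by processing the elements in pairs: compare the two elements of a pair with each other, then compare the smaller one with the current minimum and the larger one with the current maximum. A minimal C sketch (0-based array; the names are illustrative):

#include <stdio.h>

/* Find the minimum and maximum of a[0..n-1] using about 3*n/2 comparisons */
static void minimax(const int a[], int n, int *min, int *max)
{
    int i;

    if (n % 2 == 1) {                 /* odd n: the first element starts both extremes */
        *min = *max = a[0];
        i = 1;
    } else {                          /* even n: the first pair initialises them */
        if (a[0] < a[1]) { *min = a[0]; *max = a[1]; }
        else             { *min = a[1]; *max = a[0]; }
        i = 2;
    }

    for (; i + 1 < n; i += 2) {       /* three comparisons per remaining pair */
        int small, large;
        if (a[i] < a[i + 1]) { small = a[i];     large = a[i + 1]; }
        else                 { small = a[i + 1]; large = a[i];     }
        if (small < *min) *min = small;
        if (large > *max) *max = large;
    }
}

int main()
{
    int a[] = {7, 2, 9, 4, 1, 8, 6};
    int mn, mx;

    minimax(a, 7, &mn, &mx);
    printf("min = %d, max = %d\n", mn, mx);   /* prints min = 1, max = 9 */
    return 0;
}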
2.6 Searching
The process of finding a particular element of an array is called "searching". If the item
is not present in the array, then the search is unsuccessful. There are two types of
search: linear search and binary search.
Linear Search
The linear search compares each element of the array with the search key until the
search key is found. To determine that a value is not in the array, the program must
compare the search key to every element in the array. It is also called “Sequential
Search” because it traverses the data sequentially to locate the element.
/* This program uses linear search in an array to find the LOCATION of the given
Key value */
/* This program is an example of Linear Search */
#include <iostream>
using namespace std;

const int N = 10;

int LinearSearch(int [], int);      // Function prototype

int main()
{
    int A[N] = {9, 4, 5, 1, 7, 78, 22, 15, 96, 45}, Skey, LOC;

    cout << " Enter the Search Key\n";
    cin >> Skey;
    LOC = LinearSearch(A, Skey);    // call the function
    if (LOC == -1)
        cout << " The search key is not in the array\n Un-Successful Search\n";
    else
        cout << " The search key " << Skey << " is at location " << LOC << endl;
    return 0;
}

int LinearSearch(int b[], int skey) // function definition
{
    int i;
    for (i = 0; i <= N - 1; i++)
        if (b[i] == skey)
            return i;
    return -1;
}
Binary Search
It is useful for large, sorted arrays. The binary search algorithm can only be used
with sorted array and eliminates one half of the elements in the array being searched
after each comparison. The algorithm locates the middle element of the array and
compares it to the search key. If they are equal, the search key is found and array
subscript of that element is returned. Otherwise the problem is reduced to searching
one half of the array. If the search key is less than the middle element of the array, the
first half of the array is searched; otherwise, the second half is searched. If the search
key is not the middle element of the specified sub-array, the algorithm is repeated on
one quarter of the original array. The search continues until the sub-array consists of
one element that is equal to the search key (a successful search). But if the search key
is not found in the array, then the value of END
of new selected range will be less than the START of new selected range. This will be
explained in the following example:
The sorted array A (N = 10):

Index:  0   1   2   3   4   5   6   7   8   9
A:      3   5   9   11  15  17  22  25  37  68

Search-Key = 22

Start = 0
End = 9
Mid = int((Start + End)/2) = int((0 + 9)/2) = 4      A[4] = 15 < 22
_________________
Start = 4 + 1 = 5
End = 9
Mid = int((5 + 9)/2) = 7                             A[7] = 25 > 22
_________________
Start = 5
End = 7 – 1 = 6
Mid = int((5 + 6)/2) = 5                             A[5] = 17 < 22
_________________
Start = 5 + 1 = 6
End = 6
Mid = int((6 + 6)/2) = 6                             A[6] = 22
Found at location 6
Successful Search

Search-Key = 8

Start = 0
End = 9
Mid = int((Start + End)/2) = int((0 + 9)/2) = 4      A[4] = 15 > 8
_________________
Start = 0
End = 4 – 1 = 3
Mid = int((0 + 3)/2) = 1                             A[1] = 5 < 8
_________________
Start = 1 + 1 = 2
End = 3
Mid = int((2 + 3)/2) = 2                             A[2] = 9 > 8
_________________
Start = 2
End = 2 – 1 = 1
End is < Start
Un-Successful Search
Algorithm: (Binary Search)
Here A is a sorted Linear Array with N elements and SKEY is a given item of
information to search. This algorithm finds the location of SKEY in A and if successful, it
returns its location otherwise it returns -1 for unsuccessful.
BinarySearch (A, SKEY)
1. [Initialize segment variables.]
Set START=0, END=N-1 and MID=INT((START+END)/2).
2. Repeat Steps 3 and 4 while START ≤ END and A[MID]≠SKEY.
3. If SKEY< A[MID]. Then
Set END=MID-1.
Else Set START=MID+1.
[End of If Structure.]
4. Set MID=INT((START+END)/2).
   [End of Step 2 loop.]
5. If A[MID] = SKEY, then return MID.
   Else return -1.
   [End of If Structure.]
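A direct C rendering of this algorithm for a 0-based sorted array, returning the index of SKEY or -1 exactly as described above, might look like this:

/* Return the index of skey in the sorted array a[0..n-1], or -1 if it is absent */
int BinarySearch(const int a[], int n, int skey)
{
    int start = 0, end = n - 1;

    while (start <= end)
    {
        int mid = (start + end) / 2;
        if (a[mid] == skey)
            return mid;                 /* successful search */
        else if (skey < a[mid])
            end = mid - 1;              /* continue in the first half */
        else
            start = mid + 1;            /* continue in the second half */
    }
    return -1;                          /* END has become less than START: unsuccessful */
}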
2.7 Summary
Computer systems are often used to store large amounts of data from which individual
records must be retrieved according to some search criterion. Thus the efficient storage
of data to facilitate fast searching is an important issue. We’re interested in the average
time, the worst-case time and the best possible time.
However, we will generally be most concerned with the worst-case time as
calculations based on worst-case times can lead to guaranteed performance
predictions. Conveniently, the worst-case times are generally easier to calculate than
average times.
If there are n items in our collection - whether it is stored as an array or as a linked
list - then it is obvious that in the worst case, when there is no item in the collection with
the desired key, then n comparisons of the key with keys of the items in the collection
will have to be made.
To simplify analysis and comparison of algorithms, we look for a dominant operation
and count the number of times that dominant operation has to be performed. In the
case of searching, the dominant operation is the comparison, since the search requires
n comparisons in the worst case, we say this is a O(n) (pronounce this “big-Oh-n” or
“Oh-n”) algorithm. The best case in which the first comparison returns a match -
requires a single comparison and is O(1). The average time depends on the probability
that the key will be found in the collection - this is something that we would not expect to
know in the majority of cases. Thus in this case, as in most others, estimation of the
average time is of little utility. If the performance of the system is vital, i.e. it’s part of a
life-critical system, then we must use the worst case in our design calculations as it
represents the best guaranteed performance.
2.8 Check Your Progress
Multiple Choice Questions
1. The worst case occurs in the linear search algorithm when ……………….
(a) Item is somewhere in the middle of the array
(b) Item is not in the array at all
(c) Item is the last element in the array
(d) Item is the last element in the array or item is not there at all
2. If the number of records to be sorted is small, then ……………… sorting can be
efficient.
(a) Merge
(b) Heap
(c) Selection
(d) Bubble
3. The complexity of sorting algorithm measures the ……………… as a function of the
number n of items to be sorted.
(a) average time
(b) running time
(c) average-case complexity
(d) case-complexity
4. Which of the following is not a limitation of binary search algorithm?
(a) must use a sorted array
(b) requirement of sorted array is expensive when a lot of insertion and deletions
are needed
(c) there must be a mechanism to access middle element directly
(d) binary search algorithm is not efficient when there are more than 1500 data elements.
5. The Average case occurs in linear search algorithm ……………….
(a) when item is somewhere in the middle of the array
(b) when item is not in the array at all
(c) when item is the last element in the array
(d) Item is the last element in the array or item is not there at all
6. Binary search algorithm cannot be applied to ……………….
(a) sorted linked list
(b) sorted binary trees
(c) sorted linear array
(d) pointer array
7. Complexity of linear search algorithm is ………………
(a) O(n)
(b) O(logn)
(c) O(n2)
(d) O(n logn)