
Design and Analysis of Algorithms

Chapter 01
Introduction and Elementary Data
Structures

Outline
◼ Review of Basic Data structures

◼ Stack, Queue, Linked List, Tree and Graphs

◼ Algorithm and its Properties

◼ Analysis of Algorithms

◼ Heap and Heap Sort

◼ Hashing

Review of Basic Data Structures
◼ A data structure is defined by
1) the logical arrangement of data elements, combined
with
2) the set of operations we need to access the elements.
◼ There are two types of data structures:
◼ Linear: Elements form a sequence or linear list.
◼ Ex: Arrays, Stacks, Queues and Linked Lists.
◼ Non-linear: Elements do not form a sequence.
◼ Ex: Trees and Graphs.
Abstract Data Type (ADT)
◼ A data type consists of a collection of values together with a
set of basic operations on these values.
◼ A data type is an abstract data type if the programmers who
use the type do not have access to the details of how the values
and operations are implemented.
◼ Simply put, an ADT is a type defined in terms of its data items
and associated operations, not its implementation.
◼ All pre-defined types such as int, double, … are abstract data
types.
The Stack ADT
◼ Stack = a list where insert and remove
operations take place only at one end, the “top”
◼ Insertions and deletions follow the last-in first-
out (LIFO) scheme
◼ Main stack operations:
◼ push(e): inserts element e at the top of the stack
◼ pop(): removes and returns the top element of the
stack (last inserted element)
◼ top(): returns a reference to the top element without
removing it
◼ Auxiliary stack operations:
◼ size(): returns the number of elements in the stack
◼ empty(): returns a Boolean value indicating whether the
stack is empty

Contd.
◼ Attempting the execution of an operation of ADT may
sometimes cause an error condition, called an exception
◼ Exceptions are said to be “thrown” by an operation that cannot
be executed
◼ In the Stack ADT, operations pop and top cannot be performed
if the stack is empty
◼ Attempting the execution of pop or top on an empty stack
throws an EmptyStackException

FIGURE: Various examples of stacks


Stack Operations - Examples

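The worked examples on this slide are figures that did not survive extraction. As a substitute, here is a minimal sketch of an array-based stack in C; the type name Stack, the fixed capacity MAX, and the omitted overflow/underflow checks are illustrative assumptions, not part of the original slides.

#define MAX 100

typedef struct {
    int data[MAX];   /* element storage */
    int top;         /* index of top element; initialize to -1 (empty) */
} Stack;

void push(Stack *s, int e) { s->data[++s->top] = e; }    /* insert at the top */
int  pop(Stack *s)         { return s->data[s->top--]; } /* remove and return top */
int  top(const Stack *s)   { return s->data[s->top]; }   /* peek without removing */
int  size(const Stack *s)  { return s->top + 1; }
int  empty(const Stack *s) { return s->top == -1; }

/* Usage: after push(1), push(2), push(3), pop() returns 3 — LIFO order. */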
The Queue ADT
◼ Queue = a list where insert takes place at the rear end, but remove
takes place at the front end.

◼ Insertions and deletions follow the first-in first-out (FIFO) scheme

◼ Main queue operations:


◼ enqueue(object o): inserts element o at the rear of the queue
◼ dequeue(): removes and returns the element at the front of the queue

◼ Auxiliary queue operations:


◼ front(): returns the element at the front without removing it
◼ size(): returns the number of elements stored
◼ isEmpty(): returns a Boolean value indicating whether no elements are
stored
◼ Attempting the execution of dequeue or front on an empty queue
throws an EmptyQueueException
Queue Operations - Examples

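As with the stack slide, the queue examples here were figures lost in extraction. A minimal circular-array queue sketch in C follows; the names Queue and QMAX and the omitted full/empty checks are assumptions for illustration.

#define QMAX 100

typedef struct {
    int data[QMAX];
    int front, rear, count;   /* initialize all three to 0 */
} Queue;

void enqueue(Queue *q, int o) {        /* insert at the rear */
    q->data[q->rear] = o;
    q->rear = (q->rear + 1) % QMAX;    /* wrap around the array end */
    q->count++;
}

int dequeue(Queue *q) {                /* remove from the front */
    int o = q->data[q->front];
    q->front = (q->front + 1) % QMAX;
    q->count--;
    return o;
}

int isEmpty(const Queue *q) { return q->count == 0; }

/* Usage: enqueue 1, 2, 3, then dequeue() returns 1 — FIFO order. */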
Linked List
◼ A Linked List is a series of connected nodes
◼ Each node in a linked list contains
◼ Data: stores relevant information
◼ Link: stores the address of the next node
Figure 1: Structure of a node
◼ A linked list is called "linked" because each node in the series
has a pointer that points to the next node in the list.
◼ A linked list can grow or shrink in size as the program runs
◼ A Linked List can be
◼ Singly Linked List: each node has one pointer, to the next node in the list
◼ Doubly Linked List: each node has two pointers, to the successor and
predecessor nodes in the list

FIGURE: myList — a singly linked list with nodes a → b → c → d
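A possible C rendering of the node structure in Figure 1, with a front-insertion helper; the function name insertFront is an illustrative assumption.

#include <stdlib.h>

typedef struct Node {
    int data;            /* stores relevant information */
    struct Node *link;   /* stores the address of the next node */
} Node;

/* insert a new node at the front of the list and return the new head */
Node *insertFront(Node *head, int value) {
    Node *n = malloc(sizeof(Node));
    n->data = value;
    n->link = head;
    return n;
}

/* Inserting d, c, b, a at the front in that order builds a -> b -> c -> d. */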
Tree
◼ A tree T is a collection of nodes
◼ T can be empty
◼ (recursive definition) If not empty, a tree T consists of
◼ a (distinguished) node r (the root),
◼ and zero or more nonempty subtrees T1, T2, ...., Tk, each of whose
roots are connected by a directed edge from r.
◼ If a tree is a collection of N nodes, then it has N-1 edges.

FIGURE: a root node r with subtrees T1, T2, …, Tk
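Since each node may have any number of children, one common C representation (an assumption here, not from the slides) stores a first-child pointer and a next-sibling pointer:

typedef struct TreeNode {
    int data;
    struct TreeNode *firstChild;    /* root of subtree T1 */
    struct TreeNode *nextSibling;   /* links the roots of T2, ..., Tk */
} TreeNode;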
Graph
◼ A graph G = (V,E) is composed of:
V: set of vertices
E: set of edges connecting the vertices in V
◼ An edge e = (u,v) is a pair of vertices
◼ Example:

V = {a, b, c, d, e}
E = {(a,b), (a,c), (a,d), (b,e), (c,d), (c,e), (d,e)}

FIGURE: the example graph drawn on vertices a–e
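One way to store this example in C is an adjacency matrix; the enum mapping of vertex names to indices and the helper addEdge are illustrative assumptions.

enum { A_, B_, C_, D_, E_, NV };   /* vertices a..e mapped to 0..4 */
int adj[NV][NV];                   /* adj[u][v] = 1 iff edge (u,v) is in E */

void addEdge(int u, int v) { adj[u][v] = adj[v][u] = 1; }  /* undirected */

/* addEdge(A_, B_); addEdge(A_, C_); addEdge(A_, D_); addEdge(B_, E_);
   addEdge(C_, D_); addEdge(C_, E_); addEdge(D_, E_); builds the graph above. */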
Algorithm
The word “algorithm” comes from the name of the Persian author
“Abu Jafar Mohammad ibn Musa al Khowarizmi”.
An algorithm is a finite set of instructions to be followed to
solve a particular computational problem.
Properties:
In addition, all algorithms must satisfy the following properties:
i. Input: Zero or more quantities are externally supplied.
ii. Output: At least one quantity is produced.
iii. Definiteness: Each instruction must be clear and unambiguous.
iv. Finiteness: If we trace out the instructions of an algorithm, then
for all cases, the algorithm terminates after a finite number of steps.
v. Effectiveness: Every step in the algorithm must be basic enough that
a person can, in principle, carry it out using only pencil and paper.
Contd…
◼ In formal computer science, one distinguishes between an
algorithm and a program.
◼ A program does not necessarily satisfy condition (iv).
◼ One important example of such a program for a computer is its operating
system which never terminates (except for system crashes) but continues in
a wait loop until more jobs are entered.
◼ An algorithm can be described in many ways.
◼ A natural language such as English can be used but we must be very
careful that the resulting instructions are definite (condition iii).
◼ Programming Language
◼ Pseudocode = natural language + Programming Language
◼ Flowcharts
◼ It places each processing step in a "box" and uses arrows to

indicate the next step.


Algorithm - Example
Write an algorithm to change a numeric grade (integer) to a letter grade.
Algorithm “LetterGrade”
Input: One number
1. if (the number is between 90 and 100, inclusive)
then
1.1 Set the grade to “A”
End if
2. if (the number is between 80 and 89, inclusive)
then
2.1 Set the grade to “B”
End if
3. if (the number is between 70 and 79, inclusive)
then
3.1 Set the grade to “C”
End if



Contd…
4. if (the number is between 60 and 69, inclusive)
then
4.1 Set the grade to “D”
End if
5. If (the number is less than 60)
then
5.1 Set the grade to “F”
End if
6. Return the grade
End

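One direct C translation of the pseudocode above (the function name letterGrade is an assumption; note that inputs outside 0–100 also fall through to 'F' in this sketch):

char letterGrade(int number) {
    char grade = 'F';                                    /* step 5 */
    if (number >= 90 && number <= 100) grade = 'A';      /* step 1 */
    else if (number >= 80 && number <= 89) grade = 'B';  /* step 2 */
    else if (number >= 70 && number <= 79) grade = 'C';  /* step 3 */
    else if (number >= 60 && number <= 69) grade = 'D';  /* step 4 */
    return grade;                                        /* step 6 */
}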
Analysis of Algorithms
◼ When is a Program/Algorithm said to be better than
another?
◼ Problem : Sorting data in ascending order.
◼ Problem Instance: e.g. sorting data (2 3 9 5 6 8)
◼ Algorithms: Insertion Sort, Bubble Sort, Merge sort, Quick sort,
Selection sort, etc.
◼ What are the measures for Comparison?
◼ Space Required (Space Complexity)
◼ amount of memory program occupies
◼ usually measured in bytes, KB or MB
◼ Time Required (Time Complexity)
◼ execution time
◼ usually measured by the number of key operations executed

Space Complexity
◼ Space complexity is defined as the amount of memory a
program needs to run to completion.
◼ Some of the reasons for studying the Space Complexity
◼ If the program is to run on multi user system, it may be required to
specify the amount of memory to be allocated to the program.
◼ We may be interested to know in advance whether sufficient
memory is available to run the program.
◼ There may be multiple solutions, each having different space
requirements.
◼ May define an upper bound on the data that the program can handle.

Contd…
◼ The space needed by each of the algorithms consists of the
following components.
1. Fixed Space Requirements (C), independent of the
characteristics (Ex: number, size) of the inputs and outputs.
◼ Instruction space (Space needed to store the executable version of the program)
◼ Data space for simple variables, constants and fixed-size structural variables
(arrays and structures)
2. Variable Space Requirements (SP(I)), depends on the instance
characteristic I
◼ number, size, values of inputs and outputs associated with I
◼ recursive stack space(space needed by recursive function)
◼ return address (from where it has to resume after completion of the called
function)
◼ formal parameters and local variables,
◼ Therefore, the space requirement S(P) of any algorithm P may be
written as S(P) = C + SP(I).
Space Complexity - Examples
Program: Simple arithmetic function

float sum(float a, float b, float c)
{
    return a + b + c;
}

Space needed by a, b and c = 3 * 4 = 12 bytes, so S(P) = 12.

Program: Iterative function for summing a list of numbers

float sum(float list[ ], int n)
{
    float tsum = 0;
    int i;
    for (i = 0; i < n; i++)
        tsum += list[i];
    return tsum;
}

◼ Every instance needs to store the array list[ ] and n.
◼ Space needed to store n = 2 bytes.
◼ Space needed to store list[ ] = space for n floating-point numbers.
◼ Space needed to store i and tsum = 6 bytes.
◼ Hence S(P) = n + 8.
Contd…
Program: Recursive function for summing all the elements of an array

float rsum(float a[ ], int n)
{
    if (n <= 0)
        return 0;
    else
        return rsum(a, n - 1) + a[n - 1];
}

S(P) = space per recursive call * depth of recursion, so S(P) = 4 * n

Assumptions: space needed for one recursive call of the program

Type                                 Number of bytes
parameter: float a[ ]                passed as an address, not copied per call
parameter: integer n                 2
return address (used internally)     2 (unless a far address)
TOTAL per recursive call             4

With a depth of recursion of n, the total stack space is 4 * n.
Time Complexity
◼ Time complexity is the amount of computer time a program
or an algorithm needs to run to completion.
◼ Why do we care about time complexity?
◼ Some computers require upper limits for program execution times.
◼ Some programs require a real-time response.
◼ If there are many solutions to a problem, typically we’d like to
choose the quickest.
◼ The exact time will depend on
◼ the implementation of the algorithm,
◼ programming language,
◼ the optimizing capabilities of the compiler used,
◼ the CPU speed,
◼ other hardware characteristics/specifications,
◼ the amount of data input to the algorithm, and so on.
Contd…
◼ The time T(P) taken by an algorithm P is the sum of its
compile time and its run time (execution time)
◼ Compile time (C), independent of instance characteristics
◼ Run (execution) time TP(I)
◼ Therefore, the time requirement T(P) of any algorithm P
may be written as T(P) = C + TP(I)
◼ e.g. TP(n) = ca·ADD(n) + cs·SUB(n) + cl·LDA(n) + cst·STA(n),
where ca, cs, cl and cst are the times for one addition, subtraction,
load and store, and ADD(n), SUB(n), LDA(n) and STA(n) count those
operations on an instance of size n

◼ How do we measure?
1. Count a particular operation (operation counts)
2. Count the number of steps (step counts)
3. Asymptotic complexity
Operation Count
◼ Use operation count to measure runtime cost.
◼ pick one operation that is performed most often (e.g., ADD, SUB,
MUL, DIV, CMP, LOAD, or STORE)
◼ count the number of times that operation is performed!
//Insertion Sort: insert t = a[i] into the sorted prefix a[0:i-1]
for (int i = 1; i < n; i++) {
    int t = a[i];
    int j;
    for (j = i - 1; j >= 0 && t < a[j]; j--)
        a[j + 1] = a[j];
    a[j + 1] = t;
}

◼ How many comparisons are made?


◼ The number of compares depends on a[ ] and t as well as on n.
◼ total compares in worst case = 1+2+3+…+(n-1)
= n (n-1)/2
Step Count
◼ The operation-count method omits accounting for the time
spent on all but the chosen operation.
◼ The step-count method accounts for all the time spent in all
parts of the program.
◼ A program step is loosely defined to be a syntactically or
semantically meaningful segment of a program for which the
execution time is independent of the instance characteristics.
◼ Examples
◼ a+b+b*c+(a+b)/(a-b) → one step;
◼ comments → zero steps;
◼ while (<expr>) do → step count equal to the number of times <expr> is
executed.
◼ for i=<expr> to <expr1> do → step count equal to number of times
<expr1> is checked.
Step Count - Methods
◼ Introduce variable count into programs.

◼ Tabular method
◼ Determine the total number of steps contributed by each statement:
steps per execution × frequency
◼ add up the contribution of all statements

Step Count – Example1
// Iterative function to sum a list of numbers
float sum(float list[ ], int n)
{
    float tempsum = 0; count++; /* for assignment */
    int i;
    for (i = 0; i < n; i++) {
        count++;            /* for the for loop */
        tempsum += list[i]; count++; /* for assignment */
    }
    count++; /* last execution of for */
    count++; /* for return */
    return tempsum;
}

2n+3 steps
Step Count – Example2
// Iterative function to sum a list of numbers
Statement                           s/e   Frequency   Total steps
float sum(float list[ ], int n)      0        0            0
{                                    0        0            0
  float tempsum = 0;                 1        1            1
  int i;                             0        0            0
  for (i = 0; i < n; i++)            1      n+1          n+1
    tempsum += list[i];              1        n            n
  return tempsum;                    1        1            1
}                                    0        0            0
Total                                                    2n+3
Step Count – Example3
// Recursive function to sum a list of numbers
Statement                                 s/e   Frequency   Total steps
float rsum(float list[ ], int n)           0        0            0
{                                          0        0            0
  if (n)                                   1      n+1          n+1
    return rsum(list, n-1) + list[n-1];    1        n            n
  return list[0];                          1        1            1
}                                          0        0            0
Total                                                          2n+2
Step Count – Example4
                                             s/e   frequency
for (int i = 1; i < n; i++)                   1      n-1
{                                             0       0
  // insert a[i] into a[0:i-1]                0       0
  int t = a[i];                               1      n-1
  int j;                                      0       0
  for (j = i - 1; j >= 0 && t < a[j]; j--)    1    n(n-1)/2
    a[j + 1] = a[j];                          1    n(n-1)/2
  a[j + 1] = t;                               1      n-1
}                                             0       0

Total step count (worst case)
= (n-1) + 0 + 0 + (n-1) + 0 + n(n-1)/2 + n(n-1)/2 + (n-1)
= n² + 2n − 3
Asymptotic Complexity
◼ Two important reasons to determine operation and step
counts
1. To compare the time complexities of two programs that compute
the same function
2. To predict the growth in run time as the instance characteristic
changes
◼ Neither of the two yields a very accurate measure
◼ Operation counts: focus on “key” operations and ignore all others
◼ Step counts: the notion of a step is itself inexact
◼ Asymptotic complexity provides meaningful statements
about the time and space complexities of a program.

Asymptotic Notations Properties
◼ Categorize algorithms based on asymptotic growth rate e.g.
linear, quadratic, polynomial, exponential
◼ Ignore small constant and small inputs

◼ Estimate upper bound and lower bound on growth rate of

time complexity function


◼ Describe running time of algorithm as n grows to ∞.

◼ Describes behavior of function within the limit.

Limitations
◼ not always useful for analysis on fixed-size inputs.

◼ All results are for sufficiently large inputs.

Asymptotic Notation
◼ Describes the behavior of the time or space complexity for
large instance characteristic.

◼ Big Oh (O) notation provides an upper bound for the


function f.

◼ Omega (Ω) notation provides a lower-bound.

◼ Theta (Θ) notation is used when an algorithm can be
bounded both from above and below by the same
function.
Big Oh (O) Notation
◼ The asymptotic complexity is a function f(n) that forms an
upper bound for T(n) for large n.
◼ In general, just the order of the asymptotic complexity is of
interest, i.e., if it is a linear, quadratic, exponential function.
◼ The order is denoted by a complexity class using the Big Oh
(O) notation.
Definition:
f(n) = O(g(n)) (read as “f(n) is Big Oh of g(n)”)
if there exist positive constants c and n0 such that
f(n) ≤ cg(n) for all n, n ≥ n0.

Big Oh Examples
Example 1: Prove that 2n² ∈ O(n³)
Proof:
Assume that f(n) = 2n², and g(n) = n³
f(n) ∈ O(g(n)) ?
Now we have to find the existence of c and n0:
f(n) ≤ c·g(n) ⟹ 2n² ≤ c·n³ ⟹ 2 ≤ c·n
if we take c = 1 and n0 = 2, OR
c = 2 and n0 = 1, then
2n² ≤ c·n³ for all n ≥ n0
Hence f(n) ∈ O(g(n)), with c = 1 and n0 = 2
Contd…
Example 2: Prove that n² ∈ O(n²)
Proof:
Assume that f(n) = n², and g(n) = n²
Now we have to show that f(n) ∈ O(g(n))
Since
f(n) ≤ c·g(n) ⟹ n² ≤ c·n² ⟹ 1 ≤ c, so take c = 1, n0 = 1
Then
n² ≤ c·n² for c = 1 and n ≥ 1
Hence, n² ∈ O(n²), where c = 1 and n0 = 1
Omega (Ω) Notation
▪ Again, only the order of the lower bound is considered,
namely if it is a linear, quadratic, exponential or some other
function.
▪ This order is given by a function class using the Omega (Ω)
notation.
Definition:
f(n) = Ω(g(n)) (read as “f(n) is omega of g(n)”) if there exist
positive constants c and n0 such that f(n) ≥ cg(n) for all n, n ≥ n0.

Omega Examples
Example 1: Prove that 5n² ∈ Ω(n)
Proof:
Assume that f(n) = 5n², and g(n) = n
f(n) ∈ Ω(g(n)) ?
We have to find the existence of c and n0 s.t.
c·g(n) ≤ f(n) ∀n ≥ n0
c·n ≤ 5n² ⟺ c ≤ 5n
if we take c = 5 and n0 = 1, then
c·n ≤ 5n² ∀n ≥ n0
And hence f(n) ∈ Ω(g(n)), for c = 5 and n0 = 1
Contd…
Example 2: Prove that 5n + 10 ∈ Ω(n)
Proof:
Assume that f(n) = 5n + 10, and g(n) = n
f(n) ∈ Ω(g(n)) ?
We have to find the existence of c and n0 s.t.
c·g(n) ≤ f(n) ∀n ≥ n0
c·n ≤ 5n + 10 certainly holds if c·n ≤ 5n, i.e. if c ≤ 5
if we take c = 5 and n0 = 1, then
c·n ≤ 5n + 10 ∀n ≥ n0
And hence f(n) ∈ Ω(g(n)), for c = 5 and n0 = 1
Theta (Θ) Notation
◼ Used when the function f can be bounded both from
above and below by the same function g.
Definition:
f(n) = Θ(g(n)) (read as “f(n) is theta of g(n)”)
if there exist positive constants c1, c2 and n0 such that
c1g(n) ≤ f(n) ≤ c2g(n) for all n, n ≥ n0.

Theta Examples
Example 1: Prove that 2n² + 3n + 6 ∉ Θ(n³)
Proof: Let f(n) = 2n² + 3n + 6, and g(n) = n³
we have to show that f(n) ∉ Θ(g(n))
On the contrary, assume that f(n) ∈ Θ(g(n)), i.e.
there exist some positive constants c1, c2 and n0
such that: c1·g(n) ≤ f(n) ≤ c2·g(n)
c1·g(n) ≤ f(n) ≤ c2·g(n) ⟹ c1·n³ ≤ 2n² + 3n + 6 ≤ c2·n³
⟹ c1·n ≤ 2 + 3/n + 6/n² ≤ c2·n (dividing by n²)
⟹ c1·n ≤ 11 for all n ≥ max(n0, 1), i.e. n ≤ 11/c1,
which cannot hold for arbitrarily large n, a contradiction
Hence f(n) ∉ Θ(g(n)), i.e. 2n² + 3n + 6 ∉ Θ(n³)
Contd…
Example 2: Prove that a·n² + b·n + c = Θ(n²) where a, b, c
are constants and a > 0
Proof
If we take c1 = (1/4)·a, c2 = (7/4)·a and
n0 = 2·max(|b|/a, √(|c|/a)),
then it can be easily verified that
0 ≤ c1·g(n) ≤ f(n) ≤ c2·g(n), ∀n ≥ n0, with c1 = (1/4)·a, c2 = (7/4)·a
Hence f(n) ∈ Θ(g(n)), i.e. a·n² + b·n + c = Θ(n²)
Hence any polynomial of degree 2 (with a > 0) is of order Θ(n²)
Best Case, Worst Case and Average Case
◼ When we analyze an algorithm it depends on the input data,
there are three cases :
1. Best Case:
▪ The lower bound on the running time for any input of given
size.
2. Average Case
▪ Assume all inputs of a given size are equally likely.
3. Worst Case
▪ An upper bound on the running time.

The Heap Data Structure
◼ A heap is a complete binary tree with the following two
properties:
◼ Structural property: all levels are full, except possibly the last
one, which is filled from left to right
◼ Order (heap) property: for any node x, Parent(x) ≥ x
FIGURE: a max-heap with root 8, children 7 and 4, and leaves 5 and 2

From the heap property, it follows that:
“The root is the maximum element of the heap!”

A heap is a binary tree that is filled in order

Array Representation of Heaps
◼ A heap can be stored as an
array A.
◼ Root of tree is A[1]
◼ Left child of A[i] = A[2i]
◼ Right child of A[i] = A[2i + 1]
◼ Parent of A[i] = A[⌊i/2⌋]
◼ Heapsize[A] ≤ length[A]
◼ The elements in the subarray
A[(⌊n/2⌋+1) .. n] are leaves

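These index relations translate directly into C macros. A sketch, keeping the slides' 1-based indexing (so A[0] is unused); integer division already gives the floor:

#define PARENT(i) ((i) / 2)
#define LEFT(i)   (2 * (i))
#define RIGHT(i)  (2 * (i) + 1)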
Heap Types
◼ Max-heaps (largest element at root), have the max-heap
property:
◼ for all nodes i, excluding the root:
A[PARENT(i)] ≥ A[i]

◼ Min-heaps (smallest element at root), have the min-heap


property:
◼ for all nodes i, excluding the root:
A[PARENT(i)] ≤ A[i]

Max Heap – Example
Max-heap as an array:
index: 1  2  3  4  5  6  7  8  9  10
value: 26 24 20 18 17 19 13 12 14 11

Max-heap as a binary tree:

              26
        24          20
     18    17    19    13
   12  14  11

Last row filled from left to right.
Adding/Deleting Nodes
◼ New nodes are always inserted at the bottom level (left to
right)
◼ Nodes are removed from the bottom level (right to left)

Operations on Heaps
◼ Maintain/Restore the max-heap property
◼ MAX-HEAPIFY
◼ Create a max-heap from an unordered array
◼ BUILD-MAX-HEAP
◼ Sort an array in place
◼ HEAPSORT

Maintaining the Heap Property
◼ Suppose a node is smaller than a child
◼ Left and Right subtrees of i are max-
heaps
◼ To eliminate the violation:
◼ Exchange with larger child
◼ Move down the tree
◼ Continue until node is not smaller than
children

Example
MAX-HEAPIFY(A, 2, 10)

A[2] violates the heap property

exchange A[2] ↔ A[4]

A[4] violates the heap property

exchange A[4] ↔ A[9]

Heap property restored
Maintaining the Heap Property
◼ Assumptions:
◼ Left and Right subtrees of i are max-heaps
◼ A[i] may be smaller than its children

Alg: MAX-HEAPIFY(A, i, n)
1. l ← LEFT(i)
2. r ← RIGHT(i)
3. if l ≤ n and A[l] > A[i]
4.    then largest ← l
5.    else largest ← i
6. if r ≤ n and A[r] > A[largest]
7.    then largest ← r
8. if largest ≠ i
9.    then exchange A[i] ↔ A[largest]
10.        MAX-HEAPIFY(A, largest, n)
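A C sketch of the pseudocode above, keeping 1-based indexing (A[1..n] valid, A[0] unused); the function name maxHeapify is an assumption:

void maxHeapify(int A[], int i, int n) {
    int l = 2 * i, r = 2 * i + 1;     /* LEFT(i), RIGHT(i) */
    int largest = i;
    if (l <= n && A[l] > A[i])
        largest = l;
    if (r <= n && A[r] > A[largest])
        largest = r;
    if (largest != i) {
        int tmp = A[i];               /* exchange A[i] <-> A[largest] */
        A[i] = A[largest];
        A[largest] = tmp;
        maxHeapify(A, largest, n);    /* continue floating the value down */
    }
}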
Building a Heap
◼ Convert an array A[1 … n] into a max-heap (n =
length[A])
◼ The elements in the subarray A[(⌊n/2⌋+1) .. n] are leaves
◼ Apply MAX-HEAPIFY on elements between 1 and ⌊n/2⌋

Alg: BUILD-MAX-HEAP(A)
1. n = length[A]
2. for i ← ⌊n/2⌋ downto 1
3.    do MAX-HEAPIFY(A, i, n)

Example: A = 4 1 3 2 16 9 10 14 8 7

FIGURE: the tree after each call MAX-HEAPIFY(A, i, n) for i = 5, 4, 3, 2, 1;
the resulting max-heap is A = 16 14 10 8 7 9 3 2 4 1
Heapsort
◼ Goal:
◼ Sort an array using heap representations

◼ Idea:
◼ Build a max-heap from the array
◼ Swap the root (the maximum element) with the last element in
the array
◼ “Discard” this last node by decreasing the heap size
◼ Call MAX-HEAPIFY on the new root
◼ Repeat this process until only one node remains

Example: A = [7, 4, 3, 1, 2]

FIGURE: after each exchange of the root with the last element of the heap,
the calls MAX-HEAPIFY(A, 1, 4), MAX-HEAPIFY(A, 1, 3), MAX-HEAPIFY(A, 1, 2)
and MAX-HEAPIFY(A, 1, 1) restore the heap on the remaining elements
Alg: HEAPSORT(A)
1. BUILD-MAX-HEAP(A)
2. for i ← length[A] downto 2
3. do exchange A[1] ↔ A[i]
4. MAX-HEAPIFY(A, 1, i - 1)

Heap sort: Analysis


◼ Running time
◼ worst case is Θ(N log N)
◼ average case is also O(N log N)
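Putting BUILD-MAX-HEAP and HEAPSORT together in C, reusing the maxHeapify sketch from above (1-based arrays again; the function names are assumptions):

void buildMaxHeap(int A[], int n) {
    for (int i = n / 2; i >= 1; i--)   /* A[n/2+1 .. n] are already leaves */
        maxHeapify(A, i, n);
}

void heapSort(int A[], int n) {
    buildMaxHeap(A, n);
    for (int i = n; i >= 2; i--) {
        int tmp = A[1]; A[1] = A[i]; A[i] = tmp;  /* move the maximum to the end */
        maxHeapify(A, 1, i - 1);                  /* restore the heap on A[1..i-1] */
    }
}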
Hashing
◼ Hashing is an alternative way of storing data that aims to greatly
improve the efficiency of search operations.
◼ With hashing, when adding a new data element, the key itself is
used to directly determine the location to store the element.
◼ Therefore, when searching for a data element, instead of searching
through a sequence of key values to find the location of the data we
want, the key value itself can be used to directly determine the
location in which the data is stored.
◼ This means that the search time is reduced from O(n), as in
sequential search, or O(log n), as in binary search, to O(1), or
constant complexity.
◼ Regardless of the number of elements stored, the search time is the
same.

Contd.
◼ How can we determine the position to store a data element using only
its key value?
◼ We need to find a function h that can transform a key value K (e.g. an
integer, a string, etc.) into an index into a table used for storing data.
◼ The function h is called a hash function. If h transforms different keys
into different indices it is called a perfect hash function. (A non-perfect
hash function may transform two different key values into the same
index.)
◼ Main idea: use arithmetic operations (hash function) to transform
keys into table locations
◼ the same key is always hashed to the same location!
◼ such that insert and search are both directed to the same location in O(1) time!

Hash Table
◼ The Hash table is an array of buckets, where each bucket contains items
assigned by a hash function
◼ h: hash function
◼ maps the universe U of keys into the slots of a hash table T[0,1,...,m-1]
◼ an element of key k hashes to slot h(k)
◼ h(k) is the hash value of key k
◼ The ideal hashing case is if a pair p has the key k and h is the hash
function, then p is stored in position h(k) of the table.
◼ With hashing, an element of key k is stored in T[h(k)]

Hash Function
◼ The division method
◼ h(k) = k mod m
◼ e.g. m=12, k=100, h(k)=4
Requires only a single division operation (quite fast)

◼ It’s a good practice to set the table size m to be a prime number

◼ Good values for m: primes not too close to exact powers of 2


◼ e.g. the hash table is to hold 2000 numbers, and we don’t mind an average of 3
numbers being hashed to the same entry
◼ choose m=701

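The division method is a one-liner in C (a sketch; the function name h and the unsigned types are assumptions):

unsigned h(unsigned k, unsigned m) {
    return k % m;    /* h(k) = k mod m, e.g. h(100) with m = 12 gives 4 */
}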
Hashing Example
◼ In a text editor, to speed up search, we build a hash table and
hash each word into the table.
◼ Let hash table size (M) = 16
◼ Let hash function (h( )) = (sum all characters) mod 16
◼ by “sum all characters” we mean sum the ASCII (or UTF-8)
representation of the character
◼ for example, h(“He”) = (72+101)%16 = 13
◼ Let sample text be the following N=13 words: “He was well
educated and from his portrait a shrewd observer might divine”

Example Contd.

FIGURE: the 13 sample words hashed into the 16-entry table
Collision and Collision Resolution
◼ Collision occurs when the hash function maps two or more items—
all having different search keys—into the same bucket
◼ What to do when there is a collision?
◼ Collision-resolution scheme:
◼ assigns distinct locations in the hash table to items involved in a collision
▪ Two schemes exist
▪ Separate Chaining
▪ Open Addressing
▪ Design a good hash function
◼ that is fast to compute and
◼ can minimize the number of collisions
◼ Design a method to resolve the collisions when they occur

Separate Chaining
◼ Is a collision resolution scheme that lets each bucket point to a linked
list of elements.
◼ That is, instead of storing items in the table itself, we use a table of
linked lists: each entry keeps a linked list of the keys that hash to the
same value
Separate Chaining
◼ To insert a key K
◼ Compute h(K) to determine which list to traverse
◼ If T[h(K)] contains a null pointer, initialize this entry to point to a linked list

that contains K alone.


◼ If T[h(K)] is a non-empty list, we add K at the beginning of this list.

◼ To delete a key K
◼ compute h(K), then search for K within the list at T[h(K)]. Delete K if it is
found.
◼ Assume that we will be storing n keys. Then we should make m the
next prime number larger than n.
◼ Therefore, we expect that each search, insertion, and deletion can be
done in constant time.
◼ Disadvantage: Memory allocation in linked list manipulation will slow
down the program.
◼ Advantage: deletion is easy.
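A minimal C sketch of separate chaining with the division hash; the table size 101 and the names ChainNode, chainInsert, chainSearch are illustrative assumptions, and error handling is omitted:

#include <stdlib.h>

#define M 101                    /* a prime table size */

typedef struct ChainNode {
    int key;
    struct ChainNode *next;
} ChainNode;

ChainNode *T[M];                 /* all entries start as NULL pointers */

void chainInsert(int K) {
    int h = K % M;                           /* which list to use */
    ChainNode *n = malloc(sizeof(ChainNode));
    n->key = K;
    n->next = T[h];                          /* add K at the beginning of the list */
    T[h] = n;
}

int chainSearch(int K) {                     /* 1 if K is present, else 0 */
    for (ChainNode *p = T[K % M]; p; p = p->next)
        if (p->key == K) return 1;
    return 0;
}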
Open Addressing
◼ Idea: if there’s a collision, apply another hash function
from a predetermined set of hash functions {h0, h1,
h2, . . .} repeatedly until there’s no collision
◼ To probe: to compare the key of an entry with search
key.
◼ Three common methods for resolving a collision in
open addressing
◼ Linear probing
◼ Quadratic probing
◼ Double hashing
◼ Linear probing:
hi(key) = (h0(key) + i) mod M
i.e. do a linear search from h0(key)
until an empty slot is found
Linear Probing Example
◼ E.g., inserting keys 89, 18, 49, 58, 69 with hash(K) = K mod 10

To insert 58, probe T[8], T[9], T[0], T[1]

To insert 69, probe T[9], T[0], T[1], T[2]
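A C sketch of linear-probing insertion for the example above; the names lpInsert, LP_M and EMPTY, and the assumption that the table never fills, are simplifications:

#define LP_M 10
#define EMPTY (-1)

int table[LP_M];

void lpInit(void) {                  /* mark every slot free before use */
    for (int i = 0; i < LP_M; i++) table[i] = EMPTY;
}

void lpInsert(int K) {
    int i = K % LP_M;                /* h0(K) */
    while (table[i] != EMPTY)        /* probe successive slots ... */
        i = (i + 1) % LP_M;          /* hi(K) = (h0(K) + i) mod M */
    table[i] = K;                    /* ... until an empty one is found */
}

/* Inserting 89, 18, 49, 58, 69 in this order reproduces the probe
   sequences shown above, leaving 49 in T[0], 58 in T[1], 69 in T[2]. */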
Primary Clustering
◼ We call a block of contiguously occupied table entries a cluster
◼ On the average, when we insert a new key K, we may hit the middle of
a cluster. Therefore, the time to insert K would be proportional to
half the size of a cluster. That is, the larger the cluster, the slower the
performance.
◼ Linear probing has the following disadvantages:
◼ Once h(K) falls into a cluster, this cluster will definitely grow in size by one. Thus, this
may worsen the performance of insertion in the future.
◼ If two clusters are separated by only one entry, then inserting one key into a cluster can
merge the two clusters together. Thus, the cluster size can increase drastically from a single
insertion. This means that the performance of insertion can deteriorate drastically after a
single insertion.
◼ Large clusters are easy targets for collisions.

Quadratic Probing
◼ hi(K) = ( hash(K) + i2 ) mod m
◼ E.g., inserting keys 89, 18, 49, 58, 69 with hash(K) = K mod 10

To insert 58, probe T[8], T[9], T[(8+4) mod 10]

To insert 69, probe T[9], T[(9+1) mod 10], T[(9+4) mod 10]
Quadratic Probing
◼ Two keys with different home positions will have different probe
sequences
◼ e.g. m = 101, h(k1) = 30, h(k2) = 29
◼ probe sequence for k1: 30, 30+1, 30+4, 30+9, …
◼ probe sequence for k2: 29, 29+1, 29+4, 29+9, …

◼ If the table size is prime, then a new key can always be inserted if the
table is at least half empty.

◼ Secondary clustering
◼ Keys that hash to the same home position will probe the same alternative cells
◼ Simulation results suggest that it generally causes less than an extra half probe per
search
◼ To avoid secondary clustering, the probe sequence needs to be a function of the
original key value, not the home position
Double Hashing
◼ To alleviate the problem of clustering, the sequence of probes for a key
should be independent of its primary position => use two hash
functions: hash() and hash2()
◼ hi(K) = ( hash(K) + f(i) ) mod m; hash(K) = K mod m
◼ f(i) = i * hash2(K); hash2(K) = R - (K mod R), where R is a prime
smaller than m.
◼ Example: m = 10, R = 7; insert keys 89, 18, 49, 58, 69

To insert 49, hash2(49) = 7, so the 2nd probe is T[(9+7) mod 10]

To insert 58, hash2(58) = 5, so the 2nd probe is T[(8+5) mod 10]

To insert 69, hash2(69) = 1, so the 2nd probe is T[(9+1) mod 10]
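The probe computation as a C sketch (m = 10, R = 7 as in the example; the function names hash2 and probe are assumptions):

int hash2(int K, int R) { return R - (K % R); }   /* never 0, so probes always move */

int probe(int K, int i, int m, int R) {           /* hi(K) = (hash(K) + i*hash2(K)) mod m */
    return (K % m + i * hash2(K, R)) % m;
}

/* probe(49, 1, 10, 7) = (9 + 7) mod 10 = 6, matching the 2nd probe above. */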
Deletion in open addressing
◼ Actual deletion cannot be performed in open addressing hash
tables
◼ otherwise this will isolate records further down the probe sequence

◼ Solution: Add an extra bit to each table entry, and mark a


deleted slot by storing a special value DELETED (tombstone)

Disjoint Set
◼ Keeps track of a set of elements partitioned into a number of
disjoint subsets

◼ Two sets are said to be disjoint if they have no elements in


common

◼ Initially, each element e is a set in itself:


◼ Ex: { {e1}, {e2}, {e3}, {e4}, {e5}, {e6}, {e7}}

Disjoint Set – Example2
◼ student records
s1 = set of students with 0< GPA <= 1
s2 = set of students with 1 < GPA <= 2 etc

◼ Question: Are x and y in the same set?


Yes, if find(x) == find(y).

Disjoint Set Terminology
◼ We identify a set by choosing a representative element of the set. It
doesn’t matter which element we choose, but once chosen, it can’t
change

◼ Two operations of interest:

◼ find (x): determine which set x is in. The return value is the
representative element of that set

◼ union (x, y): make one set out of the sets containing x and y.

◼ Disjoint set algorithms are sometimes called union-find algorithms.

Operations: Union
◼ Union(x, y) – Combine or merge two sets x and y into a
single set
◼ Before:
{{e3, e5, e7} , {e4, e2, e8}, {e9}, {e1, e6}}

◼ After Union(e5, e1):


{{e3, e5, e7, e1, e6} , {e4, e2, e8}, {e9}}

Operations: Find
◼ Determine which set a particular element is in
◼ Useful for determining if two elements are in the same set

◼ Each set has a unique name


◼ name is arbitrary; what matters is that find(a) == find(b)
is true only if a and b are in the same set


◼ one of the members of the set is the "representative" (i.e.

name) of the set


◼ {{e3, e5, e7, e1, e6} , {e4, e2, e8}, {e9}}

Operations: Find
◼ Find(x) – return the name of the set containing x.

◼ Ex: {{e3, e5, e7, e1, e6} , {e4, e2, e8}, {e9}}

◼ Find(e1) = e5

◼ Find(e4) = e8

Up-Trees
◼ A simple data structure for implementing disjoint sets is the up-
tree.

FIGURE: two up-trees, one rooted at H with children A and W,
the other rooted at X with F, B and R below it

H, A and W belong to the same set; H is the representative.
X, B, R and F are in the same set; X is the representative.
Operations in Up-Trees - Find
◼ Find is easy. Just follow pointer to representative element. The
representative has no parent.

find(x) {
    if (parent(x))              // x is not the representative
        return find(parent(x)); // follow the pointer upward
    else
        return x;
}
Union
◼ Union is more complicated.
◼ Make one representative element point to the other.
◼ In the example, some elements are now twice as deep as they
were before
FIGURE: Union(H, X) can either make X point to H (so B, R and F
become one level deeper) or make H point to X (so A and W become
one level deeper)

Time Complexity: O(1)
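A compact array-based version of the up-tree idea in C. Here the representative is marked by parent[x] == x rather than by a null pointer, and the operation is named unionSets because union is a reserved word in C; all names are assumptions:

#define N 100
int parent[N];

void makeSet(int x) { parent[x] = x; }   /* each element starts as its own set */

int find(int x) {
    while (parent[x] != x)               /* follow pointers up to the root */
        x = parent[x];
    return x;
}

void unionSets(int x, int y) {
    int rx = find(x), ry = find(y);
    if (rx != ry)
        parent[rx] = ry;                 /* one representative points to the other */
}

/* After makeSet(0..3), unionSets(0,1), unionSets(2,3):
   find(0) == find(1) and find(2) == find(3), but find(0) != find(2). */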


Self-Review Questions
1. Define Disjoint Set

2. How do you represent disjoint sets? Explain with one


example.

3. Explain the following operations of disjoint set with suitable


examples
a. Union

b. Find

Self-Review Questions
1. Define the following
a. Data Structure
b. Stack
c. Queue
d. Linked List
e. Tree
f. Graph
g. Algorithm
2. Discuss briefly the asymptotic notations used for finding the
complexity of algorithms with suitable examples.
3. What is a heap? Differentiate between a min heap and a max heap.
4. What is a hash function ?
5. What makes a good hash function ?
6. What is a hash table ?
