
Design and Analysis of Algorithms

Chapter 01
Introduction and Elementary Data
Structures

Outline
◼ Review of Basic Data structures

◼ Stack, Queue, Linked List, Tree and Graphs

◼ Algorithm and its Properties

◼ Analysis of Algorithms

◼ Heap and Heap Sort

◼ Hashing

Review of Basic Data Structures
◼ A data structure is defined by
1) the logical arrangement of data elements, combined
with
2) the set of operations we need to access the elements.
◼ There are two types of data structures:
◼ Linear: Elements form a sequence or linear list.
◼ Ex: Arrays, Stacks, Queues and Linked Lists.
◼ Non-linear: Elements do not form a sequence.
◼ Ex: Trees and Graphs.
Abstract Data Type (ADT)
◼ A data type consists of a collection of values together with a
set of basic operations on these values.
◼ A data type is an abstract data type if the programmers who
use the type do not have access to the details of how the values
and operations are implemented.
◼ Simply put, an ADT is a type defined in terms of its data items
and associated operations, not its implementation.
◼ All pre-defined types such as int, double, … are abstract data
types.
The Stack ADT
◼ Stack = a list where insert and remove
operations take place only at one end, the “top”
◼ Insertions and deletions follow the last-in first-
out (LIFO) scheme
◼ Main stack operations:
◼ push(e): inserts element e at the top of the stack
◼ pop(): removes and returns the top element of the
stack (last inserted element)
◼ top(): returns a reference to the top element without
removing it
◼ Auxiliary stack operations:
◼ size(): returns the number of elements in the stack
◼ empty(): returns a Boolean value indicating whether the
stack is empty

Contd.
◼ Attempting the execution of an operation of ADT may
sometimes cause an error condition, called an exception
◼ Exceptions are said to be “thrown” by an operation that cannot
be executed
◼ In the Stack ADT, operations pop and top cannot be performed
if the stack is empty
◼ Attempting the execution of pop or top on an empty stack
throws an EmptyStackException

FIGURE: Various examples of stacks


Stack Operations - Examples

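The worked examples on this slide are figures that did not survive extraction. As a substitute, here is a minimal sketch of an array-based stack in C; the type name Stack, the fixed capacity MAX, and the omitted overflow/underflow checks are illustrative assumptions, not part of the original slides.

#define MAX 100

typedef struct {
    int data[MAX];   /* element storage */
    int top;         /* index of top element; initialize to -1 (empty) */
} Stack;

void push(Stack *s, int e) { s->data[++s->top] = e; }    /* insert at the top */
int  pop(Stack *s)         { return s->data[s->top--]; } /* remove and return top */
int  top(const Stack *s)   { return s->data[s->top]; }   /* peek without removing */
int  size(const Stack *s)  { return s->top + 1; }
int  empty(const Stack *s) { return s->top == -1; }

/* Usage: after push(1), push(2), push(3), pop() returns 3 — LIFO order. */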
The Queue ADT
◼ Queue = a list where insert takes place at the rear end, but remove
takes place at the front end.

◼ Insertions and deletions follow the first-in first-out (FIFO) scheme

◼ Main queue operations:


◼ enqueue(object o): inserts element o at the rear of the queue
◼ dequeue(): removes and returns the element at the front of the queue

◼ Auxiliary queue operations:


◼ front(): returns the element at the front without removing it
◼ size(): returns the number of elements stored
◼ isEmpty(): returns a Boolean value indicating whether no elements are
stored
◼ Attempting the execution of dequeue or front on an empty queue
throws an EmptyQueueException
Queue Operations - Examples

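As with the stack slide, the queue examples here were figures lost in extraction. A minimal circular-array queue sketch in C follows; the names Queue and QMAX and the omitted full/empty checks are assumptions for illustration.

#define QMAX 100

typedef struct {
    int data[QMAX];
    int front, rear, count;   /* initialize all three to 0 */
} Queue;

void enqueue(Queue *q, int o) {        /* insert at the rear */
    q->data[q->rear] = o;
    q->rear = (q->rear + 1) % QMAX;    /* wrap around the array end */
    q->count++;
}

int dequeue(Queue *q) {                /* remove from the front */
    int o = q->data[q->front];
    q->front = (q->front + 1) % QMAX;
    q->count--;
    return o;
}

int isEmpty(const Queue *q) { return q->count == 0; }

/* Usage: enqueue 1, 2, 3, then dequeue() returns 1 — FIFO order. */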
Linked List
◼ A Linked List is a series of connected nodes
◼ Each node in a linked list contains
◼ Data: stores relevant information
◼ Link: stores the address of the next node
Figure 1: Structure of a node
◼ A linked list is called "linked" because each node in the series
has a pointer that points to the next node in the list.
◼ A linked list can grow or shrink in size as the program runs
◼ A Linked List can be
◼ Singly Linked List: each node has one pointer, to the next node in the list
◼ Doubly Linked List: each node has two pointers, to the successor and
predecessor nodes in the list

FIGURE: myList — a singly linked list with nodes a → b → c → d
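A possible C rendering of the node structure in Figure 1, with a front-insertion helper; the function name insertFront is an illustrative assumption.

#include <stdlib.h>

typedef struct Node {
    int data;            /* stores relevant information */
    struct Node *link;   /* stores the address of the next node */
} Node;

/* insert a new node at the front of the list and return the new head */
Node *insertFront(Node *head, int value) {
    Node *n = malloc(sizeof(Node));
    n->data = value;
    n->link = head;
    return n;
}

/* Inserting d, c, b, a at the front in that order builds a -> b -> c -> d. */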
Tree
◼ A tree T is a collection of nodes
◼ T can be empty
◼ (recursive definition) If not empty, a tree T consists of
◼ a (distinguished) node r (the root),
◼ and zero or more nonempty subtrees T1, T2, ...., Tk, each of whose
roots are connected by a directed edge from r.
◼ If a tree is a collection of N nodes, then it has N-1 edges.

FIGURE: a root node r with subtrees T1, T2, …, Tk
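Since each node may have any number of children, one common C representation (an assumption here, not from the slides) stores a first-child pointer and a next-sibling pointer:

typedef struct TreeNode {
    int data;
    struct TreeNode *firstChild;    /* root of subtree T1 */
    struct TreeNode *nextSibling;   /* links the roots of T2, ..., Tk */
} TreeNode;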
Graph
◼ A graph G = (V,E) is composed of:
V: set of vertices
E: set of edges connecting the vertices in V
◼ An edge e = (u,v) is a pair of vertices
◼ Example:

V = {a, b, c, d, e}
E = {(a,b), (a,c), (a,d), (b,e), (c,d), (c,e), (d,e)}

FIGURE: the example graph drawn on vertices a–e
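One way to store this example in C is an adjacency matrix; the enum mapping of vertex names to indices and the helper addEdge are illustrative assumptions.

enum { A_, B_, C_, D_, E_, NV };   /* vertices a..e mapped to 0..4 */
int adj[NV][NV];                   /* adj[u][v] = 1 iff edge (u,v) is in E */

void addEdge(int u, int v) { adj[u][v] = adj[v][u] = 1; }  /* undirected */

/* addEdge(A_, B_); addEdge(A_, C_); addEdge(A_, D_); addEdge(B_, E_);
   addEdge(C_, D_); addEdge(C_, E_); addEdge(D_, E_); builds the graph above. */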
Algorithm
The word “algorithm” comes from the name of the Persian author
“Abu Jafar Mohammad ibn Musa al Khowarizmi”.
An algorithm is a finite set of instructions to be followed to
solve a particular computational problem.
Properties:
In addition, all algorithms must satisfy the following properties:
i. Input: Zero or more quantities are externally supplied.
ii. Output: At least one quantity is produced.
iii. Definiteness: Each instruction must be clear and unambiguous.
iv. Finiteness: If we trace out the instructions of an algorithm, then
for all cases, the algorithm terminates after a finite number of steps.
v. Effectiveness: Every step in the algorithm must be basic enough that
a person can, in principle, carry it out using only pencil and paper.
Contd…
◼ In formal computer science, one distinguishes between an
algorithm and a program.
◼ A program does not necessarily satisfy condition (iv).
◼ One important example of such a program for a computer is its operating
system which never terminates (except for system crashes) but continues in
a wait loop until more jobs are entered.
◼ An algorithm can be described in many ways.
◼ A natural language such as English can be used but we must be very
careful that the resulting instructions are definite (condition iii).
◼ Programming Language
◼ Pseudocode = natural language + Programming Language
◼ Flowcharts
◼ It places each processing step in a "box" and uses arrows to

indicate the next step.


Algorithm - Example
Write an algorithm to change a numeric grade (integer) to a letter grade.
Algorithm “LetterGrade”
Input: One number
1. if (the number is between 90 and 100, inclusive)
then
1.1 Set the grade to “A”
End if
2. if (the number is between 80 and 89, inclusive)
then
2.1 Set the grade to “B”
End if
3. if (the number is between 70 and 79, inclusive)
then
3.1 Set the grade to “C”
End if



Contd…
4. if (the number is between 60 and 69, inclusive)
then
4.1 Set the grade to “D”
End if
5. If (the number is less than 60)
then
5.1 Set the grade to “F”
End if
6. Return the grade
End

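One direct C translation of the pseudocode above (the function name letterGrade is an assumption; note that inputs outside 0–100 also fall through to 'F' in this sketch):

char letterGrade(int number) {
    char grade = 'F';                                    /* step 5 */
    if (number >= 90 && number <= 100) grade = 'A';      /* step 1 */
    else if (number >= 80 && number <= 89) grade = 'B';  /* step 2 */
    else if (number >= 70 && number <= 79) grade = 'C';  /* step 3 */
    else if (number >= 60 && number <= 69) grade = 'D';  /* step 4 */
    return grade;                                        /* step 6 */
}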
Analysis of Algorithms
◼ When is a Program/Algorithm said to be better than
another?
◼ Problem : Sorting data in ascending order.
◼ Problem Instance: e.g. sorting data (2 3 9 5 6 8)
◼ Algorithms: Insertion Sort, Bubble Sort, Merge sort, Quick sort,
Selection sort, etc.
◼ What are the measures for Comparison?
◼ Space Required (Space Complexity)
◼ amount of memory program occupies
◼ usually measured in bytes, KB or MB
◼ Time Required (Time Complexity)
◼ execution time
◼ usually measured by the number of key operations executed

Space Complexity
◼ Space complexity is defined as the amount of memory a
program needs to run to completion.
◼ Some of the reasons for studying the Space Complexity
◼ If the program is to run on multi user system, it may be required to
specify the amount of memory to be allocated to the program.
◼ We may be interested to know in advance whether sufficient
memory is available to run the program.
◼ There may be multiple solutions, each having different space
requirements.
◼ May define an upper bound on the data that the program can handle.

Contd…
◼ The space needed by each of the algorithms consists of the
following components.
1. Fixed Space Requirements (C), independent of the
characteristics (Ex: number, size) of the inputs and outputs.
◼ Instruction space (Space needed to store the executable version of the program)
◼ Data space for simple variables, constants and fixed-size structural variables
(arrays and structures)
2. Variable Space Requirements (SP(I)), depends on the instance
characteristic I
◼ number, size, values of inputs and outputs associated with I
◼ recursive stack space(space needed by recursive function)
◼ return address (from where it has to resume after completion of the called
function)
◼ formal parameters and local variables,
◼ Therefore, the space requirement S(P) of any algorithm P may be
written as S(P) = C + SP(I).
Space Complexity - Examples
Program: Simple arithmetic function

float sum(float a, float b, float c)
{
    return a + b + c;
}

Space needed by a, b and c = 3 * 4 = 12 bytes, so S(P) = 12.

Program: Iterative function for summing a list of numbers

float sum(float list[ ], int n)
{
    float tsum = 0;
    int i;
    for (i = 0; i < n; i++)
        tsum += list[i];
    return tsum;
}

◼ Every instance needs to store the array list[ ] and n.
◼ Space needed to store n = 2 bytes.
◼ Space needed to store list[ ] = space for n floating-point numbers.
◼ Space needed to store i and tsum = 6 bytes.
◼ Hence S(P) = n + 8.
Contd…
Program: Recursive function for summing all the elements of an array

float rsum(float a[ ], int n)
{
    if (n <= 0)
        return 0;
    else
        return rsum(a, n - 1) + a[n - 1];
}

S(P) = space per recursive call * depth of recursion, so S(P) = 4 * n

Assumptions: space needed for one recursive call of the program

Type                                 Number of bytes
parameter: float a[ ]                passed as an address, not copied per call
parameter: integer n                 2
return address (used internally)     2 (unless a far address)
TOTAL per recursive call             4

With a depth of recursion of n, the total stack space is 4 * n.
Time Complexity
◼ Time complexity is the amount of computer time a program
or an algorithm needs to run to completion.
◼ Why do we care about time complexity?
◼ Some computers require upper limits for program execution times.
◼ Some programs require a real-time response.
◼ If there are many solutions to a problem, typically we’d like to
choose the quickest.
◼ The exact time will depend on
◼ the implementation of the algorithm,
◼ programming language,
◼ the optimizing capabilities of the compiler used,
◼ the CPU speed,
◼ other hardware characteristics/specifications,
◼ the amount of data input to the algorithm, and so on.
Contd…
◼ The time T(P) taken by an algorithm P is the sum of its
compile time and its run time (execution time)
◼ Compile time (C), independent of instance characteristics
◼ Run (execution) time TP(I)
◼ Therefore, the time requirement T(P) of any algorithm P
may be written as T(P) = C + TP(I)
◼ e.g. TP(n) = ca·ADD(n) + cs·SUB(n) + cl·LDA(n) + cst·STA(n),
where ca, cs, cl and cst are the times for one addition, subtraction,
load and store, and ADD(n), SUB(n), LDA(n) and STA(n) count those
operations on an instance of size n

◼ How do we measure?
1. Count a particular operation (operation counts)
2. Count the number of steps (step counts)
3. Asymptotic complexity
Operation Count
◼ Use operation count to measure runtime cost.
◼ pick one operation that is performed most often (e.g., ADD, SUB,
MUL, DIV, CMP, LOAD, or STORE)
◼ count the number of times that operation is performed!
//Insertion Sort: insert t = a[i] into the sorted prefix a[0:i-1]
for (int i = 1; i < n; i++) {
    int t = a[i];
    int j;
    for (j = i - 1; j >= 0 && t < a[j]; j--)
        a[j + 1] = a[j];
    a[j + 1] = t;
}

◼ How many comparisons are made?


◼ The number of compares depends on a[ ] and t as well as on n.
◼ total compares in worst case = 1+2+3+…+(n-1)
= n (n-1)/2
Step Count
◼ The operation-count method omits accounting for the time
spent on all but the chosen operation.
◼ The step-count method accounts for all the time spent in all
parts of the program.
◼ A program step is loosely defined to be a syntactically or
semantically meaningful segment of a program for which the
execution time is independent of the instance characteristics.
◼ Examples
◼ a+b+b*c+(a+b)/(a-b) → one step;
◼ comments → zero steps;
◼ while (<expr>) do → step count equal to the number of times <expr> is
executed.
◼ for i=<expr> to <expr1> do → step count equal to number of times
<expr1> is checked.
Step Count - Methods
◼ Introduce variable count into programs.

◼ Tabular method
◼ Determine the total number of steps contributed by each statement:
steps per execution × frequency
◼ add up the contribution of all statements

Step Count – Example1
// Iterative function to sum a list of numbers
float sum(float list[ ], int n)
{
    float tempsum = 0; count++; /* for assignment */
    int i;
    for (i = 0; i < n; i++) {
        count++;            /* for the for loop */
        tempsum += list[i]; count++; /* for assignment */
    }
    count++; /* last execution of for */
    count++; /* for return */
    return tempsum;
}

2n+3 steps
Step Count – Example2
// Iterative function to sum a list of numbers
Statement                           s/e   Frequency   Total steps
float sum(float list[ ], int n)      0        0            0
{                                    0        0            0
  float tempsum = 0;                 1        1            1
  int i;                             0        0            0
  for (i = 0; i < n; i++)            1      n+1          n+1
    tempsum += list[i];              1        n            n
  return tempsum;                    1        1            1
}                                    0        0            0
Total                                                    2n+3
Step Count – Example3
// Recursive function to sum a list of numbers
Statement                                 s/e   Frequency   Total steps
float rsum(float list[ ], int n)           0        0            0
{                                          0        0            0
  if (n)                                   1      n+1          n+1
    return rsum(list, n-1) + list[n-1];    1        n            n
  return list[0];                          1        1            1
}                                          0        0            0
Total                                                          2n+2
Step Count – Example4
                                             s/e   frequency
for (int i = 1; i < n; i++)                   1      n-1
{                                             0       0
  // insert a[i] into a[0:i-1]                0       0
  int t = a[i];                               1      n-1
  int j;                                      0       0
  for (j = i - 1; j >= 0 && t < a[j]; j--)    1    n(n-1)/2
    a[j + 1] = a[j];                          1    n(n-1)/2
  a[j + 1] = t;                               1      n-1
}                                             0       0

Total step count (worst case)
= (n-1) + 0 + 0 + (n-1) + 0 + n(n-1)/2 + n(n-1)/2 + (n-1)
= n² + 2n − 3
Asymptotic Complexity
◼ Two important reasons to determine operation and step
counts
1. To compare the time complexities of two programs that compute
the same function
2. To predict the growth in run time as the instance characteristic
changes
◼ Neither of the two yields a very accurate measure
◼ Operation counts: focus on “key” operations and ignore all others
◼ Step counts: the notion of a step is itself inexact
◼ Asymptotic complexity provides meaningful statements
about the time and space complexities of a program.

Asymptotic Notations Properties
◼ Categorize algorithms based on asymptotic growth rate e.g.
linear, quadratic, polynomial, exponential
◼ Ignore small constant and small inputs

◼ Estimate upper bound and lower bound on growth rate of

time complexity function


◼ Describe running time of algorithm as n grows to ∞.

◼ Describes behavior of function within the limit.

Limitations
◼ not always useful for analysis on fixed-size inputs.

◼ All results are for sufficiently large inputs.

Asymptotic Notation
◼ Describes the behavior of the time or space complexity for
large instance characteristic.

◼ Big Oh (O) notation provides an upper bound for the


function f.

◼ Omega (Ω) notation provides a lower-bound.

◼ Theta (Θ) notation is used when an algorithm can be
bounded both from above and below by the same
function.
Big Oh (O) Notation
◼ The asymptotic complexity is a function f(n) that forms an
upper bound for T(n) for large n.
◼ In general, just the order of the asymptotic complexity is of
interest, i.e., if it is a linear, quadratic, exponential function.
◼ The order is denoted by a complexity class using the Big Oh
(O) notation.
Definition:
f(n) = O(g(n)) (read as “f(n) is Big Oh of g(n)”)
if there exist positive constants c and n0 such that
f(n) ≤ cg(n) for all n, n ≥ n0.

Big Oh Examples
Example 1: Prove that 2n² ∈ O(n³)
Proof:
Assume that f(n) = 2n², and g(n) = n³
f(n) ∈ O(g(n)) ?
Now we have to find the existence of c and n0:
f(n) ≤ c·g(n) ⟹ 2n² ≤ c·n³ ⟹ 2 ≤ c·n
if we take c = 1 and n0 = 2, OR
c = 2 and n0 = 1, then
2n² ≤ c·n³ for all n ≥ n0
Hence f(n) ∈ O(g(n)), with c = 1 and n0 = 2
Contd…
Example 2: Prove that n² ∈ O(n²)
Proof:
Assume that f(n) = n², and g(n) = n²
Now we have to show that f(n) ∈ O(g(n))
Since
f(n) ≤ c·g(n) ⟹ n² ≤ c·n² ⟹ 1 ≤ c, so take c = 1, n0 = 1
Then
n² ≤ c·n² for c = 1 and n ≥ 1
Hence, n² ∈ O(n²), where c = 1 and n0 = 1
Omega (Ω) Notation
▪ Again, only the order of the lower bound is considered,
namely if it is a linear, quadratic, exponential or some other
function.
▪ This order is given by a function class using the Omega (Ω)
notation.
Definition:
f(n) = Ω(g(n)) (read as “f(n) is omega of g(n)”) if there exist
positive constants c and n0 such that f(n) ≥ cg(n) for all n, n ≥ n0.

Omega Examples
Example 1: Prove that 5n² ∈ Ω(n)
Proof:
Assume that f(n) = 5n², and g(n) = n
f(n) ∈ Ω(g(n)) ?
We have to find the existence of c and n0 s.t.
c·g(n) ≤ f(n) ∀n ≥ n0
c·n ≤ 5n² ⟺ c ≤ 5n
if we take c = 5 and n0 = 1, then
c·n ≤ 5n² ∀n ≥ n0
And hence f(n) ∈ Ω(g(n)), for c = 5 and n0 = 1
Contd…
Example 2: Prove that 5n + 10 ∈ Ω(n)
Proof:
Assume that f(n) = 5n + 10, and g(n) = n
f(n) ∈ Ω(g(n)) ?
We have to find the existence of c and n0 s.t.
c·g(n) ≤ f(n) ∀n ≥ n0
c·n ≤ 5n + 10 certainly holds if c·n ≤ 5n, i.e. if c ≤ 5
if we take c = 5 and n0 = 1, then
c·n ≤ 5n + 10 ∀n ≥ n0
And hence f(n) ∈ Ω(g(n)), for c = 5 and n0 = 1
Theta (Θ) Notation
◼ Used when the function f can be bounded both from
above and below by the same function g.
Definition:
f(n) = Θ(g(n)) (read as “f(n) is theta of g(n)”)
if there exist positive constants c1, c2 and n0 such that
c1g(n) ≤ f(n) ≤ c2g(n) for all n, n ≥ n0.

Theta Examples
Example 1: Prove that 2n² + 3n + 6 ∉ Θ(n³)
Proof: Let f(n) = 2n² + 3n + 6, and g(n) = n³
we have to show that f(n) ∉ Θ(g(n))
On the contrary, assume that f(n) ∈ Θ(g(n)), i.e.
there exist some positive constants c1, c2 and n0
such that: c1·g(n) ≤ f(n) ≤ c2·g(n)
c1·g(n) ≤ f(n) ≤ c2·g(n) ⟹ c1·n³ ≤ 2n² + 3n + 6 ≤ c2·n³
⟹ c1·n ≤ 2 + 3/n + 6/n² ≤ c2·n (dividing by n²)
⟹ c1·n ≤ 11 for all n ≥ max(n0, 1), i.e. n ≤ 11/c1,
which cannot hold for arbitrarily large n, a contradiction
Hence f(n) ∉ Θ(g(n)), i.e. 2n² + 3n + 6 ∉ Θ(n³)
Contd…
Example 2: Prove that a·n² + b·n + c = Θ(n²) where a, b, c
are constants and a > 0
Proof
If we take c1 = (1/4)·a, c2 = (7/4)·a and
n0 = 2·max(|b|/a, √(|c|/a)),
then it can be easily verified that
0 ≤ c1·g(n) ≤ f(n) ≤ c2·g(n), ∀n ≥ n0, with c1 = (1/4)·a, c2 = (7/4)·a
Hence f(n) ∈ Θ(g(n)), i.e. a·n² + b·n + c = Θ(n²)
Hence any polynomial of degree 2 (with a > 0) is of order Θ(n²)
Best Case, Worst Case and Average Case
◼ When we analyze an algorithm it depends on the input data,
there are three cases :
1. Best Case:
▪ The lower bound on the running time for any input of given
size.
2. Average Case
▪ Assume all inputs of a given size are equally likely.
3. Worst Case
▪ An upper bound on the running time.

The Heap Data Structure
◼ A heap is a complete binary tree with the following two
properties:
◼ Structural property: all levels are full, except possibly the last
one, which is filled from left to right
◼ Order (heap) property: for any node x, Parent(x) ≥ x
FIGURE: a max-heap with root 8, children 7 and 4, and leaves 5 and 2

From the heap property, it follows that:
“The root is the maximum element of the heap!”

A heap is a binary tree that is filled in order

Array Representation of Heaps
◼ A heap can be stored as an
array A.
◼ Root of tree is A[1]
◼ Left child of A[i] = A[2i]
◼ Right child of A[i] = A[2i + 1]
◼ Parent of A[i] = A[⌊i/2⌋]
◼ Heapsize[A] ≤ length[A]
◼ The elements in the subarray
A[(⌊n/2⌋+1) .. n] are leaves

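These index relations translate directly into C macros. A sketch, keeping the slides' 1-based indexing (so A[0] is unused); integer division already gives the floor:

#define PARENT(i) ((i) / 2)
#define LEFT(i)   (2 * (i))
#define RIGHT(i)  (2 * (i) + 1)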
Heap Types
◼ Max-heaps (largest element at root), have the max-heap
property:
◼ for all nodes i, excluding the root:
A[PARENT(i)] ≥ A[i]

◼ Min-heaps (smallest element at root), have the min-heap


property:
◼ for all nodes i, excluding the root:
A[PARENT(i)] ≤ A[i]

Max Heap – Example
Max-heap as an array:
index: 1  2  3  4  5  6  7  8  9  10
value: 26 24 20 18 17 19 13 12 14 11

Max-heap as a binary tree:

              26
        24          20
     18    17    19    13
   12  14  11

Last row filled from left to right.
Adding/Deleting Nodes
◼ New nodes are always inserted at the bottom level (left to
right)
◼ Nodes are removed from the bottom level (right to left)

Operations on Heaps
◼ Maintain/Restore the max-heap property
◼ MAX-HEAPIFY
◼ Create a max-heap from an unordered array
◼ BUILD-MAX-HEAP
◼ Sort an array in place
◼ HEAPSORT

Maintaining the Heap Property
◼ Suppose a node is smaller than a child
◼ Left and Right subtrees of i are max-
heaps
◼ To eliminate the violation:
◼ Exchange with larger child
◼ Move down the tree
◼ Continue until node is not smaller than
children

Example
MAX-HEAPIFY(A, 2, 10)

A[2] violates the heap property

exchange A[2] ↔ A[4]

A[4] violates the heap property

exchange A[4] ↔ A[9]

Heap property restored
Maintaining the Heap Property
◼ Assumptions:
◼ Left and Right subtrees of i are max-heaps
◼ A[i] may be smaller than its children

Alg: MAX-HEAPIFY(A, i, n)
1. l ← LEFT(i)
2. r ← RIGHT(i)
3. if l ≤ n and A[l] > A[i]
4.    then largest ← l
5.    else largest ← i
6. if r ≤ n and A[r] > A[largest]
7.    then largest ← r
8. if largest ≠ i
9.    then exchange A[i] ↔ A[largest]
10.        MAX-HEAPIFY(A, largest, n)
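A C sketch of the pseudocode above, keeping 1-based indexing (A[1..n] valid, A[0] unused); the function name maxHeapify is an assumption:

void maxHeapify(int A[], int i, int n) {
    int l = 2 * i, r = 2 * i + 1;     /* LEFT(i), RIGHT(i) */
    int largest = i;
    if (l <= n && A[l] > A[i])
        largest = l;
    if (r <= n && A[r] > A[largest])
        largest = r;
    if (largest != i) {
        int tmp = A[i];               /* exchange A[i] <-> A[largest] */
        A[i] = A[largest];
        A[largest] = tmp;
        maxHeapify(A, largest, n);    /* continue floating the value down */
    }
}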
Building a Heap
◼ Convert an array A[1 … n] into a max-heap (n =
length[A])
◼ The elements in the subarray A[(⌊n/2⌋+1) .. n] are leaves
◼ Apply MAX-HEAPIFY on elements between 1 and ⌊n/2⌋

Alg: BUILD-MAX-HEAP(A)
1. n = length[A]
2. for i ← ⌊n/2⌋ downto 1
3.    do MAX-HEAPIFY(A, i, n)

Example: A = 4 1 3 2 16 9 10 14 8 7

FIGURE: the tree after each call MAX-HEAPIFY(A, i, n) for i = 5, 4, 3, 2, 1;
the resulting max-heap is A = 16 14 10 8 7 9 3 2 4 1
Heapsort
◼ Goal:
◼ Sort an array using heap representations

◼ Idea:
◼ Build a max-heap from the array
◼ Swap the root (the maximum element) with the last element in
the array
◼ “Discard” this last node by decreasing the heap size
◼ Call MAX-HEAPIFY on the new root
◼ Repeat this process until only one node remains

Example: A = [7, 4, 3, 1, 2]

FIGURE: after each exchange of the root with the last element of the heap,
the calls MAX-HEAPIFY(A, 1, 4), MAX-HEAPIFY(A, 1, 3), MAX-HEAPIFY(A, 1, 2)
and MAX-HEAPIFY(A, 1, 1) restore the heap on the remaining elements
Alg: HEAPSORT(A)
1. BUILD-MAX-HEAP(A)
2. for i ← length[A] downto 2
3. do exchange A[1] ↔ A[i]
4. MAX-HEAPIFY(A, 1, i - 1)

Heap sort: Analysis


◼ Running time
◼ worst case is Θ(N log N)
◼ average case is also O(N log N)
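Putting BUILD-MAX-HEAP and HEAPSORT together in C, reusing the maxHeapify sketch from above (1-based arrays again; the function names are assumptions):

void buildMaxHeap(int A[], int n) {
    for (int i = n / 2; i >= 1; i--)   /* A[n/2+1 .. n] are already leaves */
        maxHeapify(A, i, n);
}

void heapSort(int A[], int n) {
    buildMaxHeap(A, n);
    for (int i = n; i >= 2; i--) {
        int tmp = A[1]; A[1] = A[i]; A[i] = tmp;  /* move the maximum to the end */
        maxHeapify(A, 1, i - 1);                  /* restore the heap on A[1..i-1] */
    }
}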
Hashing
◼ Hashing is an alternative way of storing data that aims to greatly
improve the efficiency of search operations.
◼ With hashing, when adding a new data element, the key itself is
used to directly determine the location to store the element.
◼ Therefore, when searching for a data element, instead of searching
through a sequence of key values to find the location of the data we
want, the key value itself can be used to directly determine the
location in which the data is stored.
◼ This means that the search time is reduced from O(n), as in
sequential search, or O(log n), as in binary search, to O(1), or
constant complexity.
◼ Regardless of the number of elements stored, the search time is the
same.

Contd.
◼ How can we determine the position to store a data element using only
its key value?
◼ We need to find a function h that can transform a key value K (e.g. an
integer, a string, etc.) into an index into a table used for storing data.
◼ The function h is called a hash function. If h transforms different keys
into different indices it is called a perfect hash function. (A non-perfect
hash function may transform two different key values into the same
index.)
◼ Main idea: use arithmetic operations (hash function) to transform
keys into table locations
◼ the same key is always hashed to the same location!
◼ such that insert and search are both directed to the same location in O(1) time!

Hash Table
◼ The Hash table is an array of buckets, where each bucket contains items
assigned by a hash function
◼ h: hash function
◼ maps the universe U of keys into the slots of a hash table T[0,1,...,m-1]
◼ an element of key k hashes to slot h(k)
◼ h(k) is the hash value of key k
◼ The ideal hashing case is if a pair p has the key k and h is the hash
function, then p is stored in position h(k) of the table.
◼ With hashing, an element of key k is stored in T[h(k)]

Hash Function
◼ The division method
◼ h(k) = k mod m
◼ e.g. m=12, k=100, h(k)=4
Requires only a single division operation (quite fast)

◼ It’s a good practice to set the table size m to be a prime number

◼ Good values for m: primes not too close to exact powers of 2


◼ e.g. the hash table is to hold 2000 numbers, and we don’t mind an average of 3
numbers being hashed to the same entry
◼ choose m=701

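The division method is a one-liner in C (a sketch; the function name h and the unsigned types are assumptions):

unsigned h(unsigned k, unsigned m) {
    return k % m;    /* h(k) = k mod m, e.g. h(100) with m = 12 gives 4 */
}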
Hashing Example
◼ In a text editor, to speed up search, we build a hash table and
hash each word into the table.
◼ Let hash table size (M) = 16
◼ Let hash function (h( )) = (sum all characters) mod 16
◼ by “sum all characters” we mean sum the ASCII (or UTF-8)
representation of the character
◼ for example, h(“He”) = (72+101)%16 = 13
◼ Let sample text be the following N=13 words: “He was well
educated and from his portrait a shrewd observer might divine”

Example Contd.

FIGURE: the 13 sample words hashed into the 16-entry table
Collision and Collision Resolution
◼ Collision occurs when the hash function maps two or more items—
all having different search keys—into the same bucket
◼ What to do when there is a collision?
◼ Collision-resolution scheme:
◼ assigns distinct locations in the hash table to items involved in a collision
▪ Two schemes exist
▪ Separate Chaining
▪ Open Addressing
▪ Design a good hash function
◼ that is fast to compute and
◼ can minimize the number of collisions
◼ Design a method to resolve the collisions when they occur

Separate Chaining
◼ Is a collision resolution scheme that lets each bucket point to a linked
list of elements.
◼ That is, instead of storing items in the table itself, we use a table of
linked lists: each entry keeps a linked list of the keys that hash to the
same value
Separate Chaining
◼ To insert a key K
◼ Compute h(K) to determine which list to traverse
◼ If T[h(K)] contains a null pointer, initialize this entry to point to a linked list

that contains K alone.


◼ If T[h(K)] is a non-empty list, we add K at the beginning of this list.

◼ To delete a key K
◼ compute h(K), then search for K within the list at T[h(K)]. Delete K if it is
found.
◼ Assume that we will be storing n keys. Then we should make m the
next prime number larger than n.
◼ Therefore, we expect that each search, insertion, and deletion can be
done in constant time.
◼ Disadvantage: Memory allocation in linked list manipulation will slow
down the program.
◼ Advantage: deletion is easy.
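A minimal C sketch of separate chaining with the division hash; the table size 101 and the names ChainNode, chainInsert, chainSearch are illustrative assumptions, and error handling is omitted:

#include <stdlib.h>

#define M 101                    /* a prime table size */

typedef struct ChainNode {
    int key;
    struct ChainNode *next;
} ChainNode;

ChainNode *T[M];                 /* all entries start as NULL pointers */

void chainInsert(int K) {
    int h = K % M;                           /* which list to use */
    ChainNode *n = malloc(sizeof(ChainNode));
    n->key = K;
    n->next = T[h];                          /* add K at the beginning of the list */
    T[h] = n;
}

int chainSearch(int K) {                     /* 1 if K is present, else 0 */
    for (ChainNode *p = T[K % M]; p; p = p->next)
        if (p->key == K) return 1;
    return 0;
}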
Open Addressing
◼ Idea: if there’s a collision, apply another hash function
from a predetermined set of hash functions {h0, h1,
h2, . . .} repeatedly until there’s no collision
◼ To probe: to compare the key of an entry with search
key.
◼ Three common methods for resolving a collision in
open addressing
◼ Linear probing
◼ Quadratic probing
◼ Double hashing
◼ Linear probing:
hi(key) = (h0(key) + i) mod M
i.e. do a linear search from h0(key)
until an empty slot is found
Linear Probing Example
◼ E.g., inserting keys 89, 18, 49, 58, 69 with hash(K) = K mod 10

To insert 58, probe T[8], T[9], T[0], T[1]

To insert 69, probe T[9], T[0], T[1], T[2]
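A C sketch of linear-probing insertion for the example above; the names lpInsert, LP_M and EMPTY, and the assumption that the table never fills, are simplifications:

#define LP_M 10
#define EMPTY (-1)

int table[LP_M];

void lpInit(void) {                  /* mark every slot free before use */
    for (int i = 0; i < LP_M; i++) table[i] = EMPTY;
}

void lpInsert(int K) {
    int i = K % LP_M;                /* h0(K) */
    while (table[i] != EMPTY)        /* probe successive slots ... */
        i = (i + 1) % LP_M;          /* hi(K) = (h0(K) + i) mod M */
    table[i] = K;                    /* ... until an empty one is found */
}

/* Inserting 89, 18, 49, 58, 69 in this order reproduces the probe
   sequences shown above, leaving 49 in T[0], 58 in T[1], 69 in T[2]. */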
Primary Clustering
◼ We call a block of contiguously occupied table entries a cluster
◼ On the average, when we insert a new key K, we may hit the middle of
a cluster. Therefore, the time to insert K would be proportional to
half the size of a cluster. That is, the larger the cluster, the slower the
performance.
◼ Linear probing has the following disadvantages:
◼ Once h(K) falls into a cluster, this cluster will definitely grow in size by one. Thus, this
may worsen the performance of insertion in the future.
◼ If two clusters are separated by only one entry, then inserting one key into a cluster can
merge the two clusters together. Thus, the cluster size can increase drastically from a single
insertion. This means that the performance of insertion can deteriorate drastically after a
single insertion.
◼ Large clusters are easy targets for collisions.

Quadratic Probing
◼ hi(K) = ( hash(K) + i2 ) mod m
◼ E.g., inserting keys 89, 18, 49, 58, 69 with hash(K) = K mod 10

To insert 58, probe T[8], T[9], T[(8+4) mod 10]

To insert 69, probe T[9], T[(9+1) mod 10], T[(9+4) mod 10]
Quadratic Probing
◼ Two keys with different home positions will have different probe
sequences
◼ e.g. m = 101, h(k1) = 30, h(k2) = 29
◼ probe sequence for k1: 30, 30+1, 30+4, 30+9, …
◼ probe sequence for k2: 29, 29+1, 29+4, 29+9, …

◼ If the table size is prime, then a new key can always be inserted if the
table is at least half empty.

◼ Secondary clustering
◼ Keys that hash to the same home position will probe the same alternative cells
◼ Simulation results suggest that it generally causes less than an extra half probe per
search
◼ To avoid secondary clustering, the probe sequence needs to be a function of the
original key value, not the home position
Double Hashing
◼ To alleviate the problem of clustering, the sequence of probes for a key
should be independent of its primary position => use two hash
functions: hash() and hash2()
◼ hi(K) = ( hash(K) + f(i) ) mod m; hash(K) = K mod m
◼ f(i) = i * hash2(K); hash2(K) = R - (K mod R), where R is a prime
smaller than m.
◼ Example: m = 10, R = 7; insert keys 89, 18, 49, 58, 69

To insert 49, hash2(49) = 7, so the 2nd probe is T[(9+7) mod 10]

To insert 58, hash2(58) = 5, so the 2nd probe is T[(8+5) mod 10]

To insert 69, hash2(69) = 1, so the 2nd probe is T[(9+1) mod 10]
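The probe computation as a C sketch (m = 10, R = 7 as in the example; the function names hash2 and probe are assumptions):

int hash2(int K, int R) { return R - (K % R); }   /* never 0, so probes always move */

int probe(int K, int i, int m, int R) {           /* hi(K) = (hash(K) + i*hash2(K)) mod m */
    return (K % m + i * hash2(K, R)) % m;
}

/* probe(49, 1, 10, 7) = (9 + 7) mod 10 = 6, matching the 2nd probe above. */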
Deletion in open addressing
◼ Actual deletion cannot be performed in open addressing hash
tables
◼ otherwise this will isolate records further down the probe sequence

◼ Solution: Add an extra bit to each table entry, and mark a


deleted slot by storing a special value DELETED (tombstone)

Disjoint Set
◼ Keeps track of a set of elements partitioned into a number of
disjoint subsets

◼ Two sets are said to be disjoint if they have no elements in


common

◼ Initially, each element e is a set in itself:


◼ Ex: { {e1}, {e2}, {e3}, {e4}, {e5}, {e6}, {e7}}

Disjoint Set – Example2
◼ student records
s1 = set of students with 0< GPA <= 1
s2 = set of students with 1 < GPA <= 2 etc

◼ Question: Are x and y in the same set?


Yes, if find(x) == find(y).

Disjoint Set Terminology
◼ We identify a set by choosing a representative element of the set. It
doesn’t matter which element we choose, but once chosen, it can’t
change

◼ Two operations of interest:

◼ find (x): determine which set x is in. The return value is the
representative element of that set

◼ union (x, y): make one set out of the sets containing x and y.

◼ Disjoint set algorithms are sometimes called union-find algorithms.

Operations: Union
◼ Union(x, y) – Combine or merge two sets x and y into a
single set
◼ Before:
{{e3, e5, e7} , {e4, e2, e8}, {e9}, {e1, e6}}

◼ After Union(e5, e1):


{{e3, e5, e7, e1, e6} , {e4, e2, e8}, {e9}}

Operations: Find
◼ Determine which set a particular element is in
◼ Useful for determining if two elements are in the same set

◼ Each set has a unique name


◼ name is arbitrary; what matters is that find(a) == find(b)
is true only if a and b are in the same set


◼ one of the members of the set is the "representative" (i.e.

name) of the set


◼ {{e3, e5, e7, e1, e6} , {e4, e2, e8}, {e9}}

Operations: Find
◼ Find(x) – return the name of the set containing x.

◼ Ex: {{e3, e5, e7, e1, e6} , {e4, e2, e8}, {e9}}

◼ Find(e1) = e5

◼ Find(e4) = e8

Up-Trees
◼ A simple data structure for implementing disjoint sets is the up-
tree.

FIGURE: two up-trees, one rooted at H with children A and W,
the other rooted at X with F, B and R below it

H, A and W belong to the same set; H is the representative.
X, B, R and F are in the same set; X is the representative.
Operations in Up-Trees - Find
◼ Find is easy. Just follow pointer to representative element. The
representative has no parent.

find(x) {
    if (parent(x))              // x is not the representative
        return find(parent(x)); // follow the pointer upward
    else
        return x;
}
Union
◼ Union is more complicated.
◼ Make one representative element point to the other.
◼ In the example, some elements are now twice as deep as they
were before
FIGURE: Union(H, X) can either make X point to H (so B, R and F
become one level deeper) or make H point to X (so A and W become
one level deeper)

Time Complexity: O(1)
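A compact array-based version of the up-tree idea in C. Here the representative is marked by parent[x] == x rather than by a null pointer, and the operation is named unionSets because union is a reserved word in C; all names are assumptions:

#define N 100
int parent[N];

void makeSet(int x) { parent[x] = x; }   /* each element starts as its own set */

int find(int x) {
    while (parent[x] != x)               /* follow pointers up to the root */
        x = parent[x];
    return x;
}

void unionSets(int x, int y) {
    int rx = find(x), ry = find(y);
    if (rx != ry)
        parent[rx] = ry;                 /* one representative points to the other */
}

/* After makeSet(0..3), unionSets(0,1), unionSets(2,3):
   find(0) == find(1) and find(2) == find(3), but find(0) != find(2). */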


Self-Review Questions
1. Define Disjoint Set

2. How do you represent disjoint sets? Explain with one


example.

3. Explain the following operations of disjoint set with suitable


examples
a. Union

b. Find

Self-Review Questions
1. Define the following
a. Data Structure
b. Stack
c. Queue
d. Linked List
e. Tree
f. Graph
g. Algorithm
2. Discuss briefly the asymptotic notations used for finding the
complexity of algorithms with suitable examples.
3. What is a heap? Differentiate between a min heap and a max heap.
4. What is a hash function ?
5. What makes a good hash function ?
6. What is a hash table ?
