
01 - Advanced Data Structure

The document discusses advanced data structures including hashing, trees, disjoint-sets, binary search trees, and others. It provides details on: - Hashing uses a hash function to directly find items in a table using a search key. Collisions can be resolved with chaining or open addressing. - Trees are non-linear, hierarchical data structures that can represent calculations. Binary trees restrict nodes to at most two children. - Terms like nodes, edges, leaves, height are defined for describing tree structures. Binary trees are explored further including full and complete binary trees.

Uploaded by

Nguyễn Hoàn
Copyright
© © All Rights Reserved

ADVANCED DATA STRUCTURE

Advanced Tech. P – Feb. 2017


Contents

Hashing 3
Tree 11
Disjoint-Sets 38
Binary Search Tree 56
AVL Tree 66
Heap 88
Treap 98
Trie 111
Segment Tree 136
Binary Indexed Tree 143

2
HASHING
Problem statement: finding a file directly by its name
Most file systems place no limit on the number of files that a
directory can contain, so each directory may hold a very large
number of files.
Consider the case where the following tasks are performed
frequently.
List the files existing in a directory
Follow the directory path
Determine whether a specific file exists
If each of these tasks has to examine the directory entries one by one,
performance can drop.

What is the solution?

4
Hashing

A method of finding a specific item directly, by computing its
location from the search key when the item is to be searched

Hash function
A function converting the search key
into the location (hash address) of the item

Hash table
A table storing the items at the addresses returned by the hash
function

5
Process of search by hashing
First calculate the address by inputting the search key into a hash
function, and then move to the hash table corresponding to the
address calculated
Search succeeds when the wanted item is at the corresponding
address, fails otherwise

Hash function
Hash table
Search key Hash address


6
Collision
The case where returned hash addresses are the same when different
search keys are applied to a hash function
Even if the hash function distributes hash addresses evenly, collision
may be inevitable as the number of data stored in the hash table
increases.

Collision: Different names indicate the same hash address

7
Solutions to collision
Open addressing

Chaining

Chaining
Method of enabling the data with one or more key values to be
stored in each bucket by changing the structure of the hash table

Linked list is utilized for storing multiple key values in a bucket

8
Ex: (figure)
A collision occurs between the hash addresses of John Smith and
Sandra Dee.

Starting from the bucket where John Smith is stored, create a list
node for Sandra Dee, link it to that bucket, and store the value in it.
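
Chaining as described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `ChainedHashTable`, its methods and the table size are hypothetical, Python's built-in `hash` stands in for the hash function, and a Python list stands in for the linked list of each bucket.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining (illustrative sketch)."""

    def __init__(self, size=8):
        # One chain per bucket; a Python list stands in for the linked list.
        self.buckets = [[] for _ in range(size)]

    def _address(self, key):
        # Hash function: converts the search key into a bucket index.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._address(key)]
        for i, (stored_key, _) in enumerate(bucket):
            if stored_key == key:          # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))        # new key: extend the chain

    def get(self, key):
        # Only the chain at the computed address has to be scanned.
        for stored_key, value in self.buckets[self._address(key)]:
            if stored_key == key:
                return value
        return None


table = ChainedHashTable(size=1)           # size 1 forces every key to collide
table.put("John Smith", "record 1")
table.put("Sandra Dee", "record 2")
print(table.get("Sandra Dee"))             # record 2
```

With a single bucket every key collides, so both entries end up in the same chain and are still found.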

9
Open Addressing
If a collision occurs because there is no empty space at the address
calculated by the hash function, check whether the next slot
is empty.
If it is empty, store the item for the search key there.
If it is not, keep probing the following slots until an empty one is found.

Ex: (figure on the right)
John Smith and Sandra Dee collide,
so Sandra Dee is stored in the next slot.
Ted Baker should have been stored at 153,
but is pushed back and stored at 154.
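
The example above can be reproduced with a small linear-probing sketch. The names `LinearProbingTable` and `hash_fn` are hypothetical, the table size is arbitrary, and the address 152 for John Smith and Sandra Dee is an assumption inferred from the slide (only 153 and 154 are stated). Deletion is left out because it needs extra care with open addressing.

```python
class LinearProbingTable:
    """Open addressing with linear probing (illustrative sketch)."""

    def __init__(self, size=8, hash_fn=hash):
        self.slots = [None] * size
        self.hash_fn = hash_fn             # injectable so collisions can be forced

    def _probe(self, key):
        # Yield the home address first, then the following slots, wrapping around.
        start = self.hash_fn(key) % len(self.slots)
        for step in range(len(self.slots)):
            yield (start + step) % len(self.slots)

    def put(self, key, value):
        for i in self._probe(key):
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return i                   # address actually used
        raise RuntimeError("hash table is full")

    def get(self, key):
        for i in self._probe(key):
            if self.slots[i] is None:      # an empty slot ends the search
                return None
            if self.slots[i][0] == key:
                return self.slots[i][1]
        return None


# Assumed hash addresses, chosen to match the slide's example.
addresses = {"John Smith": 152, "Sandra Dee": 152, "Ted Baker": 153}
table = LinearProbingTable(size=256, hash_fn=lambda name: addresses[name])
for name in ("John Smith", "Sandra Dee", "Ted Baker"):
    print(name, "->", table.put(name, None))
```

Under these assumptions the three names land at 152, 153 and 154, so Ted Baker is displaced by a collision he was not part of.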

10
TREE
Problem statement: Calculator

Express the mathematical formula 2 + 3 * 4 as the following tree, and
calculate the formula by traversing the tree.

    +
   / \
  2   *
     / \
    3   4
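
As a minimal sketch of the calculator idea, the tree for 2 + 3 * 4 can be evaluated by visiting both subtrees before applying the operator stored in the node. The `Node` class, the `OPS` table and the `evaluate` function are hypothetical names introduced for this illustration.

```python
import operator

# Operators an internal node may hold (assumed set for this sketch).
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value    # a number in a leaf, an operator symbol otherwise
        self.left = left
        self.right = right


def evaluate(node):
    # A leaf holds an operand.
    if node.left is None and node.right is None:
        return node.value
    # An internal node: evaluate both subtrees, then apply the operator.
    return OPS[node.value](evaluate(node.left), evaluate(node.right))


tree = Node("+", Node(2), Node("*", Node(3), Node(4)))
print(evaluate(tree))  # 14
```

Because 3 * 4 sits lower in the tree, it is evaluated before the addition, which is how the tree encodes operator precedence.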

12
Tree

A tree is an undirected, connected graph with no cycles.
There is exactly one path between any two nodes (or vertices).
Each node has at most one parent node.
Each node can have any number of child nodes, including none.
Non-linear structure
Data structure with a 1:n relation among elements
Hierarchical data structure having a hierarchical
relation among elements

[Figure: example tree with nodes 0 to 6]

13
Definition of Tree
A tree is a finite set composed of one or more nodes, satisfying the
following conditions.
1. Among the nodes, the node having no parent is called the root.
2. The other nodes are separated into n (>= 0) disjoint sets T1, ..., Tn.
In turn, each of the sets T1, ..., Tn is a tree in itself, which
is called a subtree of the root (recursive definition).

Tree T (subtrees T1, T2, T3 are rooted at B, C, D; nodes without children, such as E, are leaf nodes)

            A
        /   |   \
      B     C     D
     / \    |   / | \
    E   F   G  H  I  J
        |
        K
14
Terms on Tree
Node: Element of tree, which is often called the vertex.
Node of the tree T - A,B,C,D,E,F,G,H,I,J,K

Edge: Line connecting nodes.
Connects a parent node and a child node

Root node: The node from which a tree starts
Root node of the tree T - A

[Figure: tree T]

15
Sibling nodes: Child nodes of a common parent node
B, C and D are sibling nodes.
Ancestor node: All nodes existing on the path along the edges to the
root node
Ancestor nodes of K: F, B, A
Subtree: A tree generated when the edge connected to the parent
node is broken
Descendent node: Nodes at the lower level of the subtree
Descendent nodes of B – E, F, K
A

B C D

E F G H I J

16
Degree
Degree of a node: The number of child nodes connected to the node
Degree of B = 2, degree of C = 1
Degree of a tree: The largest number among the degrees of nodes in a tree
Degree of tree T = 3
Leaf node (terminal node): The node with degree of 0, i.e., the node without a child node
Height
Height of a node: The number of edges on the path from the root to the node, i.e., the level
of the node (many textbooks call this quantity the depth)
Height of B = 1, height of F = 2
Height of tree: The largest value among the heights of nodes in a tree, i.e., the maximum
level
Height of tree T = 3

Nodes          Level  Height
A              0      0
B C D          1      1
E F G H I J    2      2
K              3      3  (height of tree)
17
Binary Tree

Special form of a tree in which every node has exactly 2 subtrees, a left one and a right one, either of which may be empty
A tree in which each node can have a maximum of 2 child nodes
Left child node
Right child node

Example of binary tree

A A A
A

B B B C

D E

18
Binary tree - Features

The maximum number of nodes at level i is 2^i.
The minimum number of nodes that a binary tree with height h can
have is h + 1, and the maximum number is 2^{h+1} - 1.

1 Level 0

2 3 Level 1

4 5 6 7 Level 2

8 9 10 11 12 13 14 15 Level 3

19
Binary Tree - Types
Full Binary Tree
A binary tree in which every level is completely filled with nodes
A binary tree having the maximum number of nodes, 2^{h+1} - 1, when the
height is h
When the height is 3, 2^{3+1} - 1 = 15 nodes
Starting from 1 at the root, the nodes are numbered by position,
level by level and from left to right, up to 2^{h+1} - 1
1

2 3

4 5 6 7

8 9 10 11 12 13 14 15

20
Complete Binary Tree
A binary tree of height h with n nodes in which positions 1 to n
of the full binary tree numbering are all occupied, with no gaps
(provided 2^h ≤ n < 2^{h+1} - 1)

Ex) Complete binary tree with 10 nodes

              1
        2           3
     4     5     6     7
    8 9  10

21
Skewed Binary Tree
The binary tree having child node skewed in one direction while
having the minimum number of nodes for height h

Left skewed binary tree

Right skewed binary tree

1 1

2 2

3 3

4 4

22
Binary Tree - Traversal

Traversal refers to visiting all nodes of a tree with no
duplication. Unlike in a linear structure, it is impossible to know which
node precedes another node, since trees have a non-linear structure.
Therefore, special methods are needed.

23
Traversal: Visiting nodes of a tree systematically
3 basic traversal methods (V: visit the root, L: left subtree, R: right subtree)
Preorder traversal: VLR

The root node is visited before its child nodes.

Inorder traversal: LVR

Nodes are visited in the order of the left subtree, the root and then the right subtree.

Postorder traversal: LRV

Descendant nodes are visited before the root node.

24
Preorder traversal
Operation method

① Visit the current node n, and process. : V

② Traverse the left subtree of the current node n. : L

③ Traverse the right subtree of the current node n. : R

Preorder traversal algorithm

preorder_traverse (TREE T)
    IF T is not null
        visit(T)
        preorder_traverse(T.left)
        preorder_traverse(T.right)

25
Example of Preorder traversal

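Tree: A is the root (T0); B and C are the children of A; D and E are the
children of B; F and G are the children of C; H and I are the children of E.
T1 is the subtree rooted at B, T2 the subtree rooted at C, T3 the subtree rooted at E.

Order 1: T0 -> T1 -> T2
Order 2: A -> B D (T3) -> C F G
Total order: A B D E H I C F G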

26
Inorder traversal
Executing method

① Traverse the left subtree of the current node n. : L

② Visit the current node n, and process. : V

③ Traverse the right subtree of the current node n. : R

Inorder traversal algorithm


inorder_traverse (TREE T)
    IF T is not null
        inorder_traverse(T.left)
        visit(T)
        inorder_traverse(T.right)

27
Example of inorder traversal

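[Figure: the same tree as in the preorder example, with root A (T0) and subtrees T1 (rooted at B), T2 (rooted at C) and T3 (rooted at E)]

Order 1: T1 -> T0 -> T2
Order 2: D B (T3) -> A -> F C G
Total order: D B H E I A F C G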

28
Postorder traversal
Execution method

① Traverse the left subtree of the current node n. : L

② Traverse the right subtree of the current node n. : R

③ Visit the current node n, and process. : V

Postorder traversal algorithm

postorder_traverse (TREE T)
    IF T is not null
        postorder_traverse(T.left)
        postorder_traverse(T.right)
        visit(T)

29
Example of postorder traversal

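[Figure: the same tree as in the preorder example, with root A (T0) and subtrees T1 (rooted at B), T2 (rooted at C) and T3 (rooted at E)]

Order 1: T1 -> T2 -> T0
Order 2: D (T3) B -> F G C -> A
Total order: D H I E B F G C A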
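
The three traversal algorithms can be checked against the example tree with a short Python sketch. The `Node` class and the generator functions are hypothetical names, and the tree is rebuilt here from the orders listed in the examples (A is the root, B and C its children, D and E under B, F and G under C, H and I under E).

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def preorder(t):
    if t is not None:
        yield t.key                      # V
        yield from preorder(t.left)      # L
        yield from preorder(t.right)     # R


def inorder(t):
    if t is not None:
        yield from inorder(t.left)       # L
        yield t.key                      # V
        yield from inorder(t.right)      # R


def postorder(t):
    if t is not None:
        yield from postorder(t.left)     # L
        yield from postorder(t.right)    # R
        yield t.key                      # V


tree = Node("A",
            Node("B", Node("D"), Node("E", Node("H"), Node("I"))),
            Node("C", Node("F"), Node("G")))

print(" ".join(preorder(tree)))   # A B D E H I C F G
print(" ".join(inorder(tree)))    # D B H E I A F C G
print(" ".join(postorder(tree)))  # D H I E B F G C A
```

The only difference between the three functions is the position of the visit step relative to the two recursive calls.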

30
Expression of tree

Expression of binary trees using arrays


Assign an index to each node in a binary tree as follows
The index of the root is given as 1
For nodes at level n, indices are assigned from left to right, starting
from 2^n and ending at 2^{n+1} - 1.

Level 0:  A(1)
Level 1:  B(2) C(3)
Level 2:  D(4) E(5) F(6) G(7)
Level 3:  H(8) I(9) J(10)

31
Properties of the node index
What is the index of the parent node of the node with index i? floor(i/2)
What is the index of the left child node of the node with index i? 2*i
What is the index of the right child node of the node with index i? 2*i+1
What is the starting index of the node indices at level n? 2^n

Ex) Tree with nodes A to M stored at indices 1 to 13 (index 0 is unused). For node E at index 5:
Index of parent node = 2 (B)
Index of left child node = 10 (J)
Index of right child node = 11 (K)

32
Expression of binary tree by using arrays
Node index is used as the index of the array
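What is the size of the array for a binary tree of height h?
What is the maximum number of nodes at level i? 2^i
Hence the maximum number of nodes is 1 + 2 + 4 + 8 + ... + 2^h = \sum_{i=0}^{h} 2^i = 2^{h+1} - 1,
so an array of size 2^{h+1} is enough when index 0 is left unused.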
1
A
2 3
B C
4 5 6 7
D E F G
8 9 10
H I J

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

- A B C D E F G H I J - - - - -
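
The index rules can be tried out directly on the array shown above. This is a minimal sketch; the helper names `parent`, `left`, `right` and `level_start` are hypothetical, and `None` marks the unused array elements.

```python
def parent(i):
    return i // 2          # floor(i / 2)


def left(i):
    return 2 * i


def right(i):
    return 2 * i + 1


def level_start(n):
    return 2 ** n          # first index used at level n


# Array for the 10-node tree of height 3: 2^(3+1) = 16 slots, index 0 unused.
tree = [None, "A", "B", "C", "D", "E", "F", "G", "H", "I", "J",
        None, None, None, None, None]

i = tree.index("E")                      # E is stored at index 5
print(tree[parent(i)])                   # B
print(tree[left(i)], tree[right(i)])     # J None
print(level_start(3), tree[level_start(3)])  # 8 H
```

No pointers are stored at all: the position in the array alone determines the parent and the children of a node.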

33
Expression of binary tree by using arrays

1 1

A A
2 3
B B
4 7
C C
8 15
D D

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

- A B - C - - - D - - - - - - -

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

- A - B - - - C - - - - - - - D

34
Disadvantages of expressing binary tree by using arrays
In the case of skewed binary trees, memory space is wasted on
the elements of the array that are never used.

When a new node is inserted into the tree or an existing node is
deleted from it, it is difficult to change the size of the array,
making the operation inefficient.

35
Expression of tree – Linked List

In order to supplement the shortcomings of binary tree


expressions using arrays, tree can be represented by using
linked lists.
Expression of a binary tree by using a linked list
Since all nodes in a binary tree have maximum of 2 child nodes, the
tree is implemented by using nodes of simply linked list with a
certain structure

Left Data Right


• •

Left child node Right child node

36
Expression of complete binary tree by using linked lists

● A ●

● B ● ● C ●

● D ● ● E ● ● F ● ● G ●

null H null null I null null J null null K null null L null

37
DISJOINT-SETS
Disjoint-sets

Disjoint-Sets are sets that have no elements in common; in
other words, the intersection of any two of the sets is empty.
Each set is identified through a specific element belonging to
the set. This element is called the representative.
Method of representing Disjoint-Sets
Linked list
Tree
Operations of Disjoint-Sets
Make-Set( x )
Find-Set( x )
Union( x , y )

39
Disjoint-Sets examples
Make-Set(x)

Make-Set(y) y
x
Make-Set(a)
a
Make-Set(b) b
Union(x, y)

Union(a, b)

Find-Set(y) return x (representative)

Find-Set(b) return a (representative)

Union(x, a)

40
Representation of Disjoint-sets – Linked
list
Elements in the same set are managed by a linked list.
The front element of the linked list is set as the representative
of the set.
Each element has a link indicating the representative of the
set.
rep tail rep tail rep tail

a d e b f c

41
Examples of linked list operations
Find-Set(e) return a

Find-Set(f) return b

Union(a, b)

rep tail rep tail

a d e b f c

42
Disjoint-Sets expression - tree
Each disjoint set is represented as a tree.
Each child node points to its parent node, and the root node
is the representative of the set.

a b c

d e f

43
Operation examples
Make-Set(a) ~ Make-Set(f)

a b c d e f

Union(c, d), Union(e, f)

a b c e

d f

44
Union(d, f)

a b c
e
d
f

Find-Set(d) return c

Find-Set(e) return c

45
Illustration of storing the trees that express Disjoint-Sets
in an array

[Figure: a and b are single-node trees; c is the root of a tree with children d and e; f is the child of e]

Index   0   1   2   3   4   5
Vertex  a   b   c   d   e   f
Parent  -1  -1  -1  2   2   4

46
Operations on Disjoint-Sets

Make-Set( x ): Operation generating a new set whose only member is x

Make-Set(x)
    p[x] ← x

Find-Set( x ): Operation finding the representative of the set containing x

Find-Set(x)
    IF x == p[x] : RETURN x
    ELSE : RETURN Find-Set(p[x])

Union( x , y ): Operation unifying the two sets containing x and y

Union(x, y)
    p[Find-Set(y)] ← Find-Set(x)

47
Problems

e
Find-Set(b)
d

b
f
a

b c d e

48
Methods for enhancing the efficiency of operation

Union by using rank


Each node stores, under the name of rank, the height of the subtree
rooted at that node.

When unifying two sets, the tree with the lower rank is attached to the
tree with the higher rank.

Path compression
While performing Find-Set, change the parent pointers so that every
node visited on the way points directly to the root.

49
Examples of Union using ranks

2 2
1
a e a
+ =
0 0 1 0 1
1 0
d b d e
b f g

0 0 0
0
c c f g

50
Examples of increasing ranks in the Union using ranks

2 3
2
a e a
+ =
0 1 0 1 0 2
1
d b d e
b f g

0 1 0
0 0
c h c f g

0
h

51
Example of Path Compression

a a

Find-Set(h)
b d b d e h

c e c f g

f g h

52
Make_Set() operation
Make_Set( x ): Operation generating a new set whose only member is x
p[x]: Stores the parent of node x
rank[x]: Stores the rank of the tree whose root
node is x

Make_Set(x)
    p[x] ← x
    rank[x] ← 0

53
Find_Set operation
Find_Set( x ): Operation finding the set containing x

Find_Set(x)
    IF x ≠ p[x] // when x is not the root
        p[x] ← Find_Set(p[x])
    RETURN p[x]

While following the path from the given node to the root, the Find_Set
operation updates the parent of every node on that path to be the root.

54
Union operation
Union( x , y ): Operation unifying two sets including x and y

Union(x, y)
    Link( Find_Set(x), Find_Set(y) )

Link(x, y)
    IF rank[x] > rank[y] // rank is the height of the tree
        p[y] ← x
    ELSE
        p[x] ← y
        IF rank[x] == rank[y]
            rank[y]++
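
The Make_Set, Find_Set, Union and Link pseudocode above translates almost line by line into Python. The sketch below is one possible rendering, not a reference implementation: the class name `DisjointSets` and the dictionary-based storage are assumptions, and the early return when both roots are equal is an addition that the pseudocode does not contain (without it, uniting a set with itself would raise its rank).

```python
class DisjointSets:
    """Disjoint sets with union by rank and path compression (sketch)."""

    def __init__(self):
        self.p = {}       # p[x]: parent of x
        self.rank = {}    # rank[x]: rank of the tree rooted at x

    def make_set(self, x):
        self.p[x] = x
        self.rank[x] = 0

    def find_set(self, x):
        if x != self.p[x]:                       # x is not the root
            self.p[x] = self.find_set(self.p[x])  # path compression
        return self.p[x]

    def union(self, x, y):
        self._link(self.find_set(x), self.find_set(y))

    def _link(self, x, y):
        if x == y:            # already in the same set (guard added here)
            return
        if self.rank[x] > self.rank[y]:
            self.p[y] = x
        else:
            self.p[x] = y
            if self.rank[x] == self.rank[y]:
                self.rank[y] += 1


sets = DisjointSets()
for v in "abcdef":
    sets.make_set(v)
sets.union("c", "d")
sets.union("e", "f")
sets.union("d", "f")
print(sets.find_set("d") == sets.find_set("e"))  # True
print(sets.find_set("a") == sets.find_set("c"))  # False
```

Which element ends up as the representative depends on the ranks, so it can differ from the earlier figures that use the simple Union without ranks.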

55
BINARY SEARCH TREE
Binary Search Tree
Data structure for performing search operations efficiently
Every element has a unique key.
key(left subtree) < key(root node) < key(right subtree)
The left subtree and right subtree are also binary search trees.
Values sorted in increasing order can be obtained through
inorder traversal.

[Figure: BST with root 8; every key in the left subtree (3, 2, 5) is smaller than the root,
and every key in the right subtree (10, 14, 11, 16) is larger than the root]
57
Operations of binary search tree
Search operation
Start search at root
Compare the key value x to be searched to the key value k of
the root node.
x == k: Success in search
x < k: Perform search operation on the left subtree of the
root node.
x > k: Perform search operation on the right subtree of the
root node.
Perform the search operation repeatedly on subtrees through
traversal.
Search is failed when there is no subtree to perform the search.

58
Search operation
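Search 13 in the tree with root 9, where 4 and 12 are the children of 9, 3 and 6 the
children of 4, 15 the right child of 12, and 13 and 17 the children of 15

1. 13 > 9: go to the right subtree
2. 13 > 12: go to the right subtree
3. 13 < 15: go to the left subtree
4. 13 = 13: the search succeeds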

59
Insertion operation
1) Search operation is performed first

Confirm through search whether an element equal to the one being
inserted is already in the tree, since insertion is impossible
in that case.
The location at which the search fails becomes the place
of insertion.

2) Insert the element at the place where the search has failed.

The following example shows insertion of 5 into the tree of the search example.

1. 5 < 9: go to the left subtree
2. 5 > 4: go to the right subtree
3. 5 < 6: the left subtree of 6 is empty, so the search fails
4. Insert 5 as the left child node of 6
60
Binary Search Tree – Operation exercise

Deletion operation
Consider the algorithms for deletion operation.
Delete 13, 12 and 9 sequentially from the following tree.

[Figure: root 9; children of 9: 4 and 12; children of 4: 3 and 6; right child of 12: 15; children of 15: 13 and 17]

61
Solution

Delete 13: the node to be deleted is a leaf node (degree 0)

1. Search for 13
2. Delete the node

62
Delete 12: the node to be deleted has one child node (degree 1)

1. Search for 12
2. Delete the node
3. Follow-up measure: move the child node 15 (with its subtree) into the place of 12

63
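Delete 9: the node to be deleted has two child nodes (degree 2)

1. Search for 9
2. Find a candidate to replace it: the largest key in the left subtree or the smallest key in the right subtree
3. Move the candidate into the place of 9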

64
Binary Search Tree - Performance
The time for search, insertion and deletion is proportional to
the height of the tree.
O(h), h: Height of BST

In the average case
When the binary tree has been built in a balanced way
O(log n)

In the worst case
When the tree is a skewed binary tree, skewed in one direction
O(n)
The time complexity is the same as that of sequential search.

65
AVL TREE
AVL Tree
The AVL tree is named after its two Soviet inventors, Georgy
Adelson-Velsky and Evgenii Landis, who published it in their 1962
paper "An algorithm for the organization of information".
AVL tree is a self-balancing Binary Search Tree (BST) where the
difference between heights of left and right subtrees cannot be more
than one for all nodes.
Lookup, insertion, and deletion all take O(log n) time in both the
average and worst cases, where n is the number of nodes in the tree
prior to the operation.
Insertions and deletions may require the tree to be rebalanced by one
or more tree rotations.

67
AVL Tree example

9 5

4 14 3 6

3 6 12 15 2 4

68
Not AVL Tree example

10 8

3 15 6 10

1 6 11 4 5

12

69
AVL Tree operation - Insertion
Perform standard BST insert for new node w.
Starting from w, travel up and find the first unbalanced node. Let z be
the first unbalanced node, y be the child of z that comes on the path
from w to z and x be the grandchild of z that comes on the path from
w to z.
Re-balance the tree by performing appropriate rotations on the
subtree rooted at z. There are 4 possible cases to
handle, as x, y and z can be arranged in 4 ways.
Left Left : y is left child of z and x is left child of y
Left Right : y is left child of z and x is right child of y
Right Right : y is right child of z and x is right child of y
Right Left : y is right child of z and x is left child of y
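
The four cases can be sketched in a recursive insertion routine. This is an illustrative sketch, not the slides' own code: all names are hypothetical, and the stored `height` counts nodes (an empty subtree has height 0, a leaf has height 1), which differs from the edge-counting convention used earlier but gives the same balance differences.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1          # counted in nodes: a leaf has height 1


def height(n):
    return n.height if n else 0


def update(n):
    n.height = 1 + max(height(n.left), height(n.right))


def rotate_right(z):
    y = z.left
    z.left, y.right = y.right, z
    update(z)
    update(y)
    return y                     # y is the new root of this subtree


def rotate_left(z):
    y = z.right
    z.right, y.left = y.left, z
    update(z)
    update(y)
    return y


def insert(node, key):
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    else:
        return node              # duplicate keys are ignored
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:              # node is z, left-heavy
        if key > node.left.key:  # Left Right: first rotate y to the left
            node.left = rotate_left(node.left)
        return rotate_right(node)    # Left Left (or second step of Left Right)
    if balance < -1:             # node is z, right-heavy
        if key < node.right.key:     # Right Left: first rotate y to the right
            node.right = rotate_right(node.right)
        return rotate_left(node)     # Right Right (or second step of Right Left)
    return node


root = None
for key in (12, 10, 22, 6, 18, 25, 16, 20, 15):   # last key triggers Left Left at 22
    root = insert(root, key)
print(root.right.key, root.right.left.key, root.right.right.key)  # 18 16 22
```

Because the recursion unwinds from the inserted node towards the root, the first node found with a balance outside -1..1 is exactly the node called z above.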

70
AVL Tree operation - Insertion
Left Left case (Insert 15)
12 12

Unbalanced node
Right rotation
10 22 10 18

6 18 25 6 16 22

16 20 15 20 25

15 Balanced

71
AVL Tree operation - Insertion
Left Right case (Insert 19)
12 12

Unbalanced node Unbalanced node


1. Left rotation
10 22 10 22

6 18 25 6 20 25

16 20 18

19 16 19

72
AVL Tree operation - Insertion
Left Right case (cont.)
12 12

Unbalanced node
2. Right rotation
10 20 10 22

6 18 22 6 20 25

16 19 25 18

Balanced 16 19

73
AVL Tree operation - Insertion
Right Right case (Insert 30)
12 12

Unbalanced node
Left rotation
10 22 10 25

6 18 25 6 22 26

24 26 18 24 30

30 Balanced

74
AVL Tree operation - Insertion
Right Left case (Insert 23)
12 12

Unbalanced node Unbalanced node


1. Right rotation
10 22 10 22

6 18 25 6 18 24

24 26 23 25

23 26

75
AVL Tree operation - Insertion
Right Left case (cont.)
12 12

Unbalanced node
2. Left rotation
10 24 10 22

6 22 25 6 18 24

18 23 26 23 25

Balanced 26

76
AVL Tree operation - Deletion
Perform standard BST delete for the node w.
Starting from w, travel up and find the first unbalanced node. Let z be
the first unbalanced node, y be the larger height child of z, and x be
the larger height child of y. Note that the definitions of x and y are
different from insertion.
Re-balance the tree by performing appropriate rotations on the
subtree rooted at z. There are 4 possible cases to
handle, as x, y and z can be arranged in 4 ways.
Left Left : y is left child of z and x is left child of y
Left Right : y is left child of z and x is right child of y
Right Right : y is right child of z and x is right child of y
Right Left : y is right child of z and x is left child of y

77
AVL Tree operation - Deletion
Left Left case (Delete 22)

15 15 Unbalanced node

Delete 22
8 22 8 23

6 10 18 25 6 10 18 25

3 7 11 23 3 7 11

5 5

78
AVL Tree operation - Deletion
Left Left case (cont.)

8 15 Unbalanced node

Right rotation
6 15 8 23

3 7 10 23 6 10 18 25

5 11 18 25 3 7 11

Balanced
5

79
AVL Tree operation - Deletion
Left Right case (Delete 22)

15 15 Unbalanced node

Delete 22
8 22 8 23

6 12 18 25 6 12 18 25

3 11 14 23 3 11 14

13 13

80
AVL Tree operation - Deletion
Left Right case (cont.)

15 Unbalanced node 15 Unbalanced node

1. Left rotation
12 23 8 23

8 14 18 25 6 12 18 25

6 11 13 3 11 14

3 13

81
AVL Tree operation - Deletion
Left Right case (cont.)

15 Unbalanced node 12

2. Right rotation
12 23 8 15

8 14 18 25 6 11 14 23

6 11 13 3 13 18 25

Balanced
3

82
AVL Tree operation - Deletion
Right Right case (Delete 8)

15 15 Unbalanced node

Delete 8
8 22 11 22

6 10 18 25 6 10 18 25

11 16 20 24 30 16 20 24 30

50 50

83
AVL Tree operation - Deletion
Right Right case (cont.)

22 15 Unbalanced node

Left rotation
15 25 11 22

11 18 24 30 6 10 18 25

6 10 16 20 50 16 20 24 30

Balanced 50

84
AVL Tree operation - Deletion
Right Left case (Delete 8)

15 15 Unbalanced node

Delete 8
8 22 11 22

6 10 18 25 6 10 18 25

11 16 20 30 16 20 30

21 21

85
AVL Tree operation - Deletion
Right Left case (cont.)

15 Unbalanced node 15 Unbalanced node

1. Right rotation
11 18 11 22

6 10 16 22 6 10 18 25

20 25 16 20 30

21 21
30

86
AVL Tree operation - Deletion
Right Left case (cont.)

15 Unbalanced node 18

2. Left rotation
11 18 15 22

6 10 16 22 11 16 20 25

20 25 6 10 21 30

21 Balanced
30

87
HEAP
Heap

Data structure for finding the node with the largest or smallest
key value among the nodes in a complete binary tree
Max. heap
Complete binary tree for finding the node with the largest key
value
Key value of parent node > Key value of child node
Root node: Node with the largest key value
Min. heap
Complete binary tree for finding the node with the smallest key value
Key value of parent node < Key value of child node
Root node: Node with the smallest key value

89
Heap example
33

31 27

21 22 18 23

14 19 Max. heap 15

16 18

26 29 25 22

33 37 Min. heap

90
Examples of trees which are not heaps
Explain why they are not heaps.

8 8

6 5 11 7

4 5 1 4 11 14

Tree 2
Tree 1

91
Heap operation - Insertion

Insert 17
1 1

20 20

2 3 2 3

15 19 15 19
4 5 6 4 5 6 7

4 13 11 4 13 11

Heap before Extension of places


insertion to be inserted
1

20

2 3

15 19
4 5 6 7

4 13 11 17

Store inserted element at the


extended place 92
Insert 23
1 1 Exchange
places
20 20

2 3 2 3
Exchange
15 19 places 15 23
4 5 6 7 4 5 6 7

4 13 11 23 4 13 11 19

1) (insertion node 23 > parent node 19): Exchange places
2) (insertion node 23 > parent node 20): Exchange places

23

2 3

15 20
4 5 6 7

4 13 11 19

3) The place is finalized because there is no
parent node left to be compared

93
Heap operation - Deletion

Only the element of the root node can be deleted from a heap.
The element of the root node is deleted and returned.
The maximum or minimum value is obtained, depending on
the type of the heap.
Compare with priority queues.

94
Examples of deletions in heap
Deletion of
element of root 1 Change 1

10 places 10

2 3 2 3 2 3

15 19 15 19 15 19
4 5 6 4 5 6 4 5

4 13 11 4 13 4 13

1) Deletion of element of root
2) Deletion of last node
3) (insertion node 10 < child node 19): Change places

19

2 3

15 10
4 5

4 13

4) Place is finalized
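
Insertion and deletion on a max heap stored in an array can be sketched as follows. The class name `MaxHeap` and its method names are hypothetical; index 0 of the list is left unused so that the children of index i sit at 2i and 2i+1, as in the array representation of binary trees.

```python
class MaxHeap:
    """Array-based max heap, 1-indexed (illustrative sketch)."""

    def __init__(self):
        self.a = [None]                      # index 0 is unused

    def insert(self, key):
        # Store the key at the next free place, then move it up.
        self.a.append(key)
        i = len(self.a) - 1
        while i > 1 and self.a[i] > self.a[i // 2]:
            self.a[i], self.a[i // 2] = self.a[i // 2], self.a[i]
            i //= 2

    def delete_max(self):
        if len(self.a) == 1:
            raise IndexError("heap is empty")
        top = self.a[1]
        last = self.a.pop()                  # remove the last node
        n = len(self.a) - 1                  # number of remaining elements
        if n >= 1:
            self.a[1] = last                 # move it to the root
            i = 1
            while 2 * i <= n:
                child = 2 * i
                if child + 1 <= n and self.a[child + 1] > self.a[child]:
                    child += 1               # pick the larger child
                if self.a[i] >= self.a[child]:
                    break
                self.a[i], self.a[child] = self.a[child], self.a[i]
                i = child
        return top


heap = MaxHeap()
for key in (20, 15, 19, 4, 13, 11):
    heap.insert(key)
heap.insert(23)
print(heap.a[1:])          # [23, 15, 20, 4, 13, 11, 19]
print(heap.delete_max())   # 23
print(heap.a[1:])          # [20, 15, 19, 4, 13, 11]
```

Inserting 23 reproduces the two exchanges of the insertion example (with 19, then with 20); both operations touch at most one node per level, hence O(log N).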

95
Use of Heap 1
Two representative uses of a heap are the implementation of a
priority queue and sorting.
The most effective way of implementing a priority queue is to use a
heap.
The time complexity of adding or deleting one node is O(log N), and the
maximum/minimum value can be read in O(1).
The maintenance cost is low compared with keeping the data completely sorted.
The form of a tree can be easily implemented through arrays.
Parent or child nodes can be easily found through an O(1) operation.
The child nodes of the node at location N are located at 2N and 2N+1.
The location of addition/deletion can be easily determined through the start
and end indices of the data, due to the characteristics of the complete binary tree.

96
Use of Heap 2

Heap sort sorts values by using the heap data structure.
2 steps for sorting:
1. Insert the values into a heap one by one.
2. Delete values one by one from the heap; with a min heap they come out in increasing order.

Time complexity of heap sort
N node insertion operations + N node deletion operations
Insertion and deletion operations are both performed in O(log N).
Therefore, the total sort is O(N log N).

Heap sort is useful for sorting data stored in arrays.
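
The two steps can be sketched with Python's standard `heapq` module, which maintains a min heap inside a list. The function name `heap_sort` is hypothetical; this variant builds a separate heap rather than sorting the array in place.

```python
import heapq


def heap_sort(values):
    heap = []
    for v in values:                 # step 1: N insertions, O(log N) each
        heapq.heappush(heap, v)
    # step 2: N deletions of the minimum, O(log N) each
    return [heapq.heappop(heap) for _ in range(len(heap))]


print(heap_sort([20, 15, 19, 4, 13, 11]))  # [4, 11, 13, 15, 19, 20]
```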

97
TREAP
Treap
First introduced in 1989 by Aragon and Seidel.
The idea is to use randomization and the binary heap property to
maintain balance with high probability (a "tree - heap").
Every node of a Treap maintains two values.
Key: follows standard BST ordering (left is smaller and right is greater).
Priority: randomly assigned value that follows the Max (Min) Heap property.
Like an AVL Tree, a Treap uses rotations, here to maintain the heap property
during insertion and deletion.

99
Why Treap?
High probability of O( lg N ) performance for any set of input
Priorities are chosen randomly when node is inserted
High probability of avoiding O(N) operation
Code is less complex
Perhaps the simplest of BSTs that try to improve performance
Priorities don’t have to be updated to keep tree balanced
Non-recursive implementation possible

100
Treap example

Key Priority
(BST) (Max Heap)

60|9

40|8 80|7

20|5 50|3 70|4 90|6

10|1 30|2

101
Treap operation - Insertion
1. Create a new node with key equal to x and priority equal to a
random value.
2. Perform standard BST insert.
3. Use rotations to make sure that the inserted node's priority follows
the max heap property.
If the new node is inserted in the left subtree and the root of the left subtree has a higher
priority than its parent, perform a right rotation.
If the new node is inserted in the right subtree and the root of the right subtree has a higher
priority than its parent, perform a left rotation.
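
The insertion steps can be sketched recursively. All names below are hypothetical, and the optional `priority` argument is an addition for this sketch only: it allows the fixed priorities of the slides' example to be replayed, whereas a real insertion would draw the priority at random.

```python
import random


class Node:
    def __init__(self, key, priority):
        self.key = key
        self.priority = priority
        self.left = None
        self.right = None


def rotate_right(n):
    l = n.left
    n.left, l.right = l.right, n
    return l                      # left child becomes the subtree root


def rotate_left(n):
    r = n.right
    n.right, r.left = r.left, n
    return r                      # right child becomes the subtree root


def insert(node, key, priority=None):
    if node is None:              # step 1 and 2: new node at the BST position
        return Node(key, random.random() if priority is None else priority)
    if key < node.key:
        node.left = insert(node.left, key, priority)
        if node.left.priority > node.priority:    # step 3: restore max heap
            node = rotate_right(node)
    elif key > node.key:
        node.right = insert(node.right, key, priority)
        if node.right.priority > node.priority:
            node = rotate_left(node)
    return node


root = None
for key, prio in ((60, 9), (40, 8), (80, 7), (20, 5), (50, 3),
                  (70, 4), (90, 6), (10, 1), (30, 2)):
    root = insert(root, key, prio)

root = insert(root, 15, 7)        # the "Insert 15|7" example
sub = root.left.left
print(sub.key, sub.left.key, sub.right.key, sub.right.right.key)  # 15 10 20 30
```

With the priorities of the example, the new node is rotated up twice (a left rotation at 10, then a right rotation at 20) and stops below 40|8.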

102
Treap operation - Insertion

Insert 15|7 (key = 15, priority = 7)

60|9 60|9

40|8 80|7 Insert 15 (BST) 40|8 80|7

20|5 50|3 70|4 90|6 20|5 50|3 70|4 90|6

10|1 30|2 10|1 30|2

15|7

103
Treap operation - Insertion

Insert 15|7 (cont.)

60|9 60|9

40|8 80|7 Left rotation 40|8 80|7

20|5 50|3 70|4 90|6 20|5 50|3 70|4 90|6

15|7 30|2 10|1 30|2

10|1 15|7

104
Treap operation - Insertion

Insert 15|7 (cont.)

60|9 60|9

40|8 80|7 Right rotation 40|8 80|7

20|5 50|3 70|4 90|6 15|7 50|3 70|4 90|6

15|7 30|2 10|1 20|5

10|1 30|2

Treap
105
Treap operation - Deletion

1. If the node is a leaf, delete it.
2. If the node has one NULL child and one non-NULL child, replace
the node with the non-NULL child.
3. If both children of the node are non-NULL, find the child with the
higher priority.
If the priority of the right child is greater, perform a left rotation at the node.
If the priority of the left child is greater, perform a right rotation at the node.
Repeat from step 1 until the node has been removed.

106
Treap operation - Deletion

Delete key = 40

60|9 60|9

40|8 80|7 Right rotation 20|5 80|7

20|5 50|3 70|4 90|6 10|1 40|8 70|4 90|6

10|1 30|2 30|2 50|3

107
Treap operation - Deletion

Delete key = 40 (cont.)

60|9 60|9

20|5 80|7 Left rotation 20|5 80|7

10|1 50|3 70|4 90|6 10|1 40|8 70|4 90|6

40|8 30|2 50|3

30|2

108
Treap operation - Deletion

Delete key = 40 (cont.)

60|9 60|9

20|5 80|7 Replace 20|5 80|7

10|1 50|3 70|4 90|6 10|1 50|3 70|4 90|6

40|8 30|2

30|2

109
Treap Summary

Operations
Search in expected O(log n) time
Insert in expected O(log n) time
Delete in expected O(log n) time
but worst case O(n)
Memory use
O(1) per node
about the cost of AVL trees
Very simple to implement, little overhead – less than AVL
trees

110
TRIE
Trie

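A trie is a tree for expressing a set of strings.
No string in the set is assumed to be a prefix of another string.

Ex) { "aeef", "ad", "bbfe", "bbfg", "c" }

[Figure: trie with edges a, b and c leaving the root; below a: e-e-f and d; below b: b-f, followed by e and g]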

112
Each edge is labeled with one character.
Edges leaving the same node do not have the same label.
Each string corresponds to a leaf node.

[Figure: the same trie, with the path a-e-e-f from the root to a leaf spelling "aeef"]
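
A trie for the example set can be sketched with nested nodes that map each edge label to a child. The names `TrieNode`, `insert` and `contains` are hypothetical. The membership test relies on the assumption stated above that no string is a prefix of another, so a stored string always ends at a leaf.

```python
class TrieNode:
    def __init__(self):
        self.children = {}        # edge label (one character) -> child node


def insert(root, word):
    node = root
    for ch in word:
        # Reuse the edge if it exists, otherwise create it.
        node = node.children.setdefault(ch, TrieNode())


def contains(root, word):
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False          # no edge with this label
    return not node.children      # stored strings end at leaf nodes


root = TrieNode()
for word in ("aeef", "ad", "bbfe", "bbfg", "c"):
    insert(root, word)

print(contains(root, "bbfg"))  # True
print(contains(root, "bbf"))   # False: ends at an internal node
print(sorted(root.children))   # ['a', 'b', 'c']
```

"bbfe" and "bbfg" share the edges b, b and f and branch only at the last character, which is where a trie saves space.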

113
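When all suffixes of a string are expressed as a trie:
If the string is A = a_0 a_1 a_2 ... a_{n-1},
the suffix starting at location i is
A_i = a_i a_{i+1} a_{i+2} ... a_{n-1}, where n = length(A)

Ex) Suffix trie of "abac" (nodes numbered 1 to 10, node 1 is the root)
1 -a-> 2 -b-> 3 -a-> 4 -c-> 5    (abac)
2 -c-> 6                         (ac)
1 -b-> 7 -a-> 8 -c-> 9           (bac)
1 -c-> 10                        (c)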
114
String operation using suffix trie
Substring test

Is "ba" a substring of "abac"?

Finding the longest common prefix of two suffixes

What is the longest common prefix of "abac" and "ac"?

Finding the kth suffix sorted in lexicographic order

What is the third suffix of "abac"?

115
Substring test through trie
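A given string is tested for being a substring of A by starting at the
root and following the edges that match it character by
character; it is a substring exactly when every character can be matched.

Ex) In the suffix trie of "abac":
"bac" (O): the path b, a, c exists from the root
"aca" (X): after a, c there is no edge labeled a

[Figure: suffix trie of "abac"]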
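
The substring test can be sketched with a suffix trie built from nested dictionaries. The function names are hypothetical, and the construction inserts every suffix character by character, which takes quadratic time and space and is only meant for small examples.

```python
def build_suffix_trie(text):
    root = {}                                # node = dict: character -> child
    for i in range(len(text)):               # insert the suffix starting at i
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
    return root


def is_substring(trie, pattern):
    node = trie
    for ch in pattern:                       # follow one edge per character
        if ch not in node:
            return False
        node = node[ch]
    return True


def count_nodes(node):
    return 1 + sum(count_nodes(child) for child in node.values())


trie = build_suffix_trie("abac")
print(is_substring(trie, "bac"))  # True
print(is_substring(trie, "aca"))  # False
print(count_nodes(trie))          # 10
```

Every substring is a prefix of some suffix, which is why walking down from the root is enough; the node count of 10 matches the numbered nodes of the figure.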

116
Longest common prefix of two suffixes
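1. Select the leaf nodes at which the two suffixes end.

2. Find their nearest common ancestor.

3. The path from the root to that ancestor spells the longest common prefix.

Ex) In the suffix trie of "abac", the suffixes "abac" and "ac" end at nodes 5 and 6.
Their nearest common ancestor is node 2, so the longest common prefix is "a".

[Figure: suffix trie of "abac"]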

117
Finding kth suffix sorted in lexicographic order
When the strings are generated through depth-first search, following the edges in
alphabetical order, they come out sorted in lexicographic order.
The generated strings themselves are not stored in memory; only their
starting indices are stored.

index   0 1 2 3
string  a b a c

A = a_0 a_1 a_2 ... a_{n-1}
A_i = a_i a_{i+1} a_{i+2} ... a_{n-1}, n = length(A)

Suffixes in lexicographic order: abac = A_0, ac = A_2, bac = A_1, c = A_3
Stored indices: (0, 2, 1, 3)

[Figure: suffix trie of "abac"]

118
Compressed trie
Compress nodes and edges into a substring

a c c
a
b
e
d b d bbf
eef
e
f

f e g e g

119
Suffix Trees

Compact expression of trie, which includes all suffixes of a


string.
Suffix tree had been first introduced by Weiner (1973), and
then suffix array was known for reducing spatial complexity.
Algorithms required for string operations can be rapidly
implemented.

Application of suffix trees


Exact string matching
Substring matching
Longest common substring of 2 strings

120
Attributes of the suffix tree of string S having length of s:

① There are leaf nodes numbered from 1 to s.
② Every internal node except the root has at least 2 child nodes.
③ Each edge is labeled with a substring of the string S.
④ Two edges leaving the same node cannot have labels starting
with the same letter.
⑤ Concatenating the edge labels on the path from
the root to leaf node i gives the suffix
S[i..s], for i = 1, ..., s.

121
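The following is the suffix tree for the string S = “xabxac”.

position   1 2 3 4 5 6
S          x a b x a c

Suffixes:
S1 = xabxac
S2 = abxac
S3 = bxac
S4 = xac
S5 = ac
S6 = c

Suffix tree (edge labels, with the leaf number in parentheses):
root
  xa
    bxac   (1)
    c      (4)
  a
    bxac   (2)
    c      (5)
  bxac     (3)
  c        (6)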

122
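Suffix tree of S = “xabxa”. The 4th and 5th suffixes (xa and a) are prefixes
of other suffixes, so they are not represented by leaf nodes.

position   1 2 3 4 5
S          x a b x a

Suffixes:
S1 = xabxa
S2 = abxa
S3 = bxa
S4 = xa
S5 = a

Suffix tree (edge labels; numbers in parentheses mark where a suffix ends):
root
  xa       (4, not a leaf)
    bxa    (1)
  a        (5, not a leaf)
    bxa    (2)
  bxa      (3)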

123
To handle the case where a suffix is a prefix of another suffix, a special
character ($) is appended to the end of the string S.
$ is called the termination character. It does not occur elsewhere in S and
is treated as the smallest character when strings are compared.

To store edge labels efficiently, only the start and end indices (i, j) of
the substring S[i..j] are stored instead of the substring itself.

124
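Suffix tree of S = “xabxa$”

position   1 2 3 4 5 6
S          x a b x a $

Suffixes:
S1 = xabxa$
S2 = abxa$
S3 = bxa$
S4 = xa$
S5 = a$
S6 = $

Suffix tree (edge labels, with the leaf number in parentheses):
root
  xa
    bxa$   (1)
    $      (4)
  a
    bxa$   (2)
    $      (5)
  bxa$     (3)
  $        (6)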

125
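Generation of suffix tree - Trivial algorithm
Example: “abab$”. The suffixes are inserted one at a time, from the longest
to the shortest.

Insert the longest suffix abab$:
root
  a - b - a - b - $

Insert suffix bab$:
root
  a - b - a - b - $
  b - a - b - $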

126
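Insert suffix ab$ (the existing path a - b is reused, then a new edge $
branches off):
root
  a - b
    a - b - $
    $
  b - a - b - $

Insert suffix b$ (the existing edge b is reused, then a new edge $ branches
off):
root
  a - b
    a - b - $
    $
  b
    a - b - $
    $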

127
Insert suffix $:
root
  a - b
    a - b - $
    $
  b
    a - b - $
    $
  $

Insert the starting point of the suffix to a leaf node:
root
  a - b
    a - b - $   (1)
    $           (3)
  b
    a - b - $   (2)
    $           (4)
  $             (5)

128
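The time for adding the i-th suffix is O(n - i + 1). Therefore, the total
generation time is O(n^2):

$\sum_{i=1}^{n} O(n - i + 1) = \sum_{i=1}^{n} O(i) = O(n^2)$

Faster construction algorithms:
Ukkonen in 1995: O(n log n) (O(n) for a constant-size alphabet)
Martin Farach in 1997: O(n)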

129
Suffix Array

An array of the starting positions of the suffixes of a text, listed in
lexicographic order of the suffixes. It was introduced by Manber and Myers
in 1990.
It is somewhat slower than the suffix tree, but it uses memory more
efficiently.
It is used in various areas, including text indexing, data compression, and
bioinformatics.

s = abab
Suffixes sorted in lexicographic order: ab, abab, b, bab
Suffix array: 3 1 4 2

130
Complexity of suffix array
① A suffix array takes O(n) memory.
② The array can be generated in O(n log n).
③ Whether a pattern P occurs in a text T can be decided in O(|P| + log n)
(using LCP information).
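Pattern search on a suffix array can be sketched with a plain binary search. Note that this simpler variant takes O(|P| log n) per query, not the O(|P| + log n) stated above, which needs additional LCP information. The function names sortedSuffixStarts and occurs are assumptions of this example, positions are 0-based, and the array is built here by direct sorting for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// 0-based suffix array built by sorting the suffixes directly
// (simple, but slower than the O(n log n) construction).
std::vector<int> sortedSuffixStarts(const std::string& t) {
    std::vector<int> sa(t.size());
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&t](int a, int b) {
        return t.compare(a, std::string::npos, t, b, std::string::npos) < 0;
    });
    return sa;
}

// Binary search: compare P with the first |P| characters of the middle suffix.
bool occurs(const std::string& t, const std::string& p) {
    std::vector<int> sa = sortedSuffixStarts(t);
    int lo = 0, hi = static_cast<int>(sa.size());
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        int cmp = t.compare(sa[mid], p.size(), p);
        if (cmp == 0) return true;       // P is a prefix of this suffix
        if (cmp < 0) lo = mid + 1;       // suffix is smaller: search the right half
        else hi = mid;                   // suffix is larger: search the left half
    }
    return false;
}
```

For T = "banana", occurs(T, "nan") is true and occurs(T, "nab") is false.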

Advantages of suffix array


① The generation method is simpler than that of the suffix tree.
② It can be implemented with little memory.
- It is composed of 2 arrays of linear size and is typically known to use
about a fourth of the memory of the suffix tree.

131
Generation of suffix array
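string S = banana$

The suffixes S_i are sorted in lexicographic order; A[i] is the starting
position of the i-th smallest suffix.

 i   S_i        A[i]   S_A[i]
 1   banana$     7     $
 2   anana$      6     a$
 3   nana$       4     ana$
 4   ana$        2     anana$
 5   na$         1     banana$
 6   a$          5     na$
 7   $           3     nana$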
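The table can be reproduced with a short sketch that sorts the starting positions by comparing the suffixes directly. This is the simplest possible construction, not the O(n log n) one: each comparison may take O(n), so the worst case is O(n^2 log n). The function name buildSuffixArray is an assumption of this example; positions are 1-based to match the table.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Returns the 1-based starting positions of the suffixes of s
// in lexicographic order of the suffixes.
std::vector<int> buildSuffixArray(const std::string& s) {
    std::vector<int> sa(s.size());
    std::iota(sa.begin(), sa.end(), 1);              // positions 1..n
    std::sort(sa.begin(), sa.end(), [&s](int a, int b) {
        // Compare the suffix starting at a with the suffix starting at b.
        return s.compare(a - 1, std::string::npos,
                         s, b - 1, std::string::npos) < 0;
    });
    return sa;
}
```

In ASCII, '$' is smaller than every letter, so it behaves as the smallest character here. For "banana$" the result is 7 6 4 2 1 5 3, and for "abab" it is 3 1 4 2.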

132
LCP array

The Longest Common Prefix (LCP) array is an auxiliary data structure of the
suffix array.

The LCP array stores the length of the longest common prefix of each pair of
consecutive suffixes in the sorted suffix array.

The array is used for traversing suffix arrays and for carrying out pattern
matching efficiently.

133
string S = banana$

 i   S_i        A[i]   S_A[i]     LCP[i]
 1   banana$     7     $           0
 2   anana$      6     a$          0
 3   nana$       4     ana$        1
 4   ana$        2     anana$      3
 5   na$         1     banana$     0
 6   a$          5     na$         0
 7   $           3     nana$       2

LCP[4] = 3
The longest common prefix of S_A[3] = S[4..7] = ana$ and
S_A[4] = S[2..7] = anana$ is ana.
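The LCP column can be computed with a direct sketch that compares each suffix with its predecessor in sorted order. This naive version takes O(n^2) in the worst case; linear-time methods exist but are not shown here. The function name buildLcpArray is an assumption of this example. The suffix array is passed with 1-based positions as in the table, and the returned vector is 0-based, so element i-1 corresponds to LCP[i].

```cpp
#include <cassert>
#include <string>
#include <vector>

// s:  the text
// sa: 1-based starting positions of the suffixes in sorted order
std::vector<int> buildLcpArray(const std::string& s, const std::vector<int>& sa) {
    int n = static_cast<int>(s.size());
    int m = static_cast<int>(sa.size());
    std::vector<int> lcp(m, 0);       // the first suffix has no predecessor: 0
    for (int i = 1; i < m; ++i) {
        int a = sa[i - 1] - 1;        // 0-based start of the previous suffix
        int b = sa[i] - 1;            // 0-based start of the current suffix
        int len = 0;
        while (a + len < n && b + len < n && s[a + len] == s[b + len]) ++len;
        lcp[i] = len;
    }
    return lcp;
}
```

For "banana$" with the suffix array 7 6 4 2 1 5 3, this yields 0 0 1 3 0 0 2.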

134
Suffix tree and suffix array
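string
position   1 2 3 4 5 6
S          a b c a b $

The value of each leaf node of the suffix tree is the starting point of its
suffix.

Starting point (leaf value)   Suffix
1                             abcab$
4                             ab$
5                             b$
2                             bcab$
3                             cab$
6                             $

[Figure: suffix tree of abcab$ whose leaf nodes store the starting points
1, 4, 5, 2, 3, 6, next to the array of starting points.]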

135
SEGMENT TREE
Segment Tree
Segment Tree is a tree data structure for storing intervals, or
segments.
It allows querying which of the stored segments contain a given
point.
It is, in principle, a static structure; that is, its structure cannot be
modified once it is built.
A similar data structure is the Interval Tree.

137
Structure of a Segment Tree
The root of T will represent the whole array A[0, N-1].
Each leaf of T will represent a single element A[i], 0 <= i < N.
The internal nodes of T represent the intervals A[i, j], 0 <= i < j < N.
Each node is shown as vertex:[interval].

level 0:   1:[0-7]
level 1:   2:[0-3]   3:[4-7]
level 2:   4:[0-1]   5:[2-3]   6:[4-5]   7:[6-7]
level 3:   8:[0-0]   9:[1-1]   10:[2-2]   11:[3-3]   12:[4-4]   13:[5-5]   14:[6-6]   15:[7-7]

How many vertices does T have if the array has N elements?


138
Building a Segment Tree
Since a segment tree is a binary tree, we can use a simple linear array to
represent it: the children of node i are nodes 2*i and 2*i+1.
In almost any segment tree problem, we need to think about what to store in
each node of the segment tree.

// Build a segment tree in which each node stores the sum of A[start..end].
// A[] is the input array and T[] is the tree array (the root is node 1).
// Both are assumed to be declared elsewhere; 4*N entries are enough for T.
void build(int node, int start, int end) {
    if (start == end) {
        // Leaf node will have a single element
        T[node] = A[start];
    }
    else {
        int mid = (start + end) / 2;
        // Recurse on the left child
        build(2*node, start, mid);
        // Recurse on the right child
        build(2*node+1, mid+1, end);
        // Internal node will have the sum of both of its children
        T[node] = T[2*node] + T[2*node+1];
    }
}
139
Querying the Segment Tree
To answer a query on a given range, we need to distinguish 3 cases:
1. The range represented by a node is completely outside the given range.
2. The range represented by a node is completely inside the given range.
3. The range represented by a node is partially inside and partially outside
the given range.
// Find the sum of [l, r]
int query(int node, int start, int end, int l, int r)
{
    // case 1: no overlap, this node contributes nothing
    if (r < start || end < l)
        return 0;
    // case 2: the node's range lies inside [l, r], use its stored sum
    if (l <= start && end <= r)
        return T[node];
    // case 3: partial overlap, combine the answers of both children
    int mid = (start + end) / 2;
    int p1 = query(2*node, start, mid, l, r);
    int p2 = query(2*node+1, mid+1, end, l, r);
    return (p1 + p2);
}
140
Structure of a Segment Tree
Example
A[8] = {8, 2, 6, 1, 9, 10, 3, 6}

Each node is shown as sum [range].

45 [0-7]
17 [0-3]   28 [4-7]
10 [0-1]   7 [2-3]   19 [4-5]   9 [6-7]
8 [0-0]   2 [1-1]   6 [2-2]   1 [3-3]   9 [4-4]   10 [5-5]   3 [6-6]   6 [7-7]

141
Querying the Segment Tree
Example
A[8] = {8, 2, 6, 1, 9, 10, 3, 6}
The sum of [0, 6] is:
sum(0, 3) + sum(4, 5) + sum(6, 6) = 17 + 19 + 3 = 39.

The nodes [0-3], [4-5] and [6-6] lie completely inside the query range.
Each node is shown as sum [range].

45 [0-7]
17 [0-3]   28 [4-7]
10 [0-1]   7 [2-3]   19 [4-5]   9 [6-7]
8 [0-0]   2 [1-1]   6 [2-2]   1 [3-3]   9 [4-4]   10 [5-5]   3 [6-6]   6 [7-7]
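The example can be checked with a self-contained sketch that wraps the build and query routines in one type. The struct name SegmentTree and the helper sum are assumptions of this example. A tree over N elements has 2N - 1 nodes, but with the 2*node indexing an array of 4*N entries is a simple safe bound, which is what is allocated here.

```cpp
#include <cassert>
#include <vector>

struct SegmentTree {
    int n;                 // number of elements
    std::vector<int> a;    // input array
    std::vector<int> t;    // tree array, node 1 is the root

    explicit SegmentTree(const std::vector<int>& values)
        : n(static_cast<int>(values.size())), a(values), t(4 * values.size(), 0) {
        if (n > 0) build(1, 0, n - 1);
    }

    void build(int node, int start, int end) {
        if (start == end) {
            t[node] = a[start];                  // leaf: a single element
            return;
        }
        int mid = (start + end) / 2;
        build(2 * node, start, mid);
        build(2 * node + 1, mid + 1, end);
        t[node] = t[2 * node] + t[2 * node + 1]; // sum of both children
    }

    int query(int node, int start, int end, int l, int r) const {
        if (r < start || end < l) return 0;           // completely outside
        if (l <= start && end <= r) return t[node];   // completely inside
        int mid = (start + end) / 2;                  // partial overlap
        return query(2 * node, start, mid, l, r) +
               query(2 * node + 1, mid + 1, end, l, r);
    }

    // Sum of a[l..r], 0-based and inclusive.
    int sum(int l, int r) const { return query(1, 0, n - 1, l, r); }
};
```

With A = {8, 2, 6, 1, 9, 10, 3, 6}, sum(0, 6) returns 39 and the root stores 45.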

142
BINARY INDEXED TREE
Binary Indexed Tree
Binary Indexed Tree (Fenwick Tree) is a data structure used
to efficiently calculate and update cumulative frequency tables, or
prefix sums.
This structure was proposed by Peter Fenwick in 1994 to improve
the efficiency of arithmetic coding compression algorithms.
The idea of the Binary Indexed Tree (BIT) is based on the fact that every
positive integer can be represented as a sum of powers of 2.
Every node of a BIT stores the sum of n elements, where n is a power of 2.
parent[v] = v+1, if v is odd
parent[v] = 2*parent[v/2], if v is even

144
BIT example
BIT with 16 nodes (node 16 is the root):

node     1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
parent   2   4   4   8   6   8   8  16  10  12  12  16  14  16  16

parent[v] = v + (v & -v)

Why?
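One way to see why: in two's complement arithmetic, v & -v keeps only the lowest set bit of v. Node v stores the sum of that many elements ending at position v, so the next node whose range also covers position v is reached by adding the lowest set bit. The sketch below compares the closed form with the recursive definition; the function names are assumptions of this example.

```cpp
#include <cassert>

// Lowest set bit of v (v > 0), e.g. 6 = 110b -> 2 = 010b.
int lowestSetBit(int v) { return v & -v; }

// Closed form: parent[v] = v + (v & -v).
int parentOf(int v) { return v + lowestSetBit(v); }

// Recursive definition: v+1 if v is odd, 2*parent[v/2] if v is even.
int parentRecursive(int v) {
    return (v % 2 == 1) ? v + 1 : 2 * parentRecursive(v / 2);
}
```

Both forms agree because, for even v = 2u, the lowest set bit of v is twice the lowest set bit of u, so v + (v & -v) = 2 * (u + (u & -u)).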
145
Example

We have an array arr[0 . . . n-1]. We should be able to:

1. Find the sum of the first i elements.
2. Change the value of a specified element of the array: arr[i] += x,
0 <= i <= n-1.

Simple solutions:
- Run a loop from 0 to i-1 to calculate the sum of the elements. To update a
value, do arr[i] += x. The first operation takes O(n) time and the second
operation takes O(1) time.
- Create another array and store the sum from the start to i at the i-th
index of this array. The sum can be calculated in O(1) time, but the update
operation takes O(n) time.

Both Segment Tree and BIT take O(log n) time for both operations.

The advantages of BIT over Segment Tree are that it requires less space and
is very easy to implement.

146
Updating a BIT
1. Initialize index as index+1 (the BIT is 1-indexed).
2. While index is smaller than or equal to n:
   - Add value to BIT[index].
   - Go to the parent of BIT[index]: index += index & -index.

Initial state (all BIT entries are 0):
index   1   2   3   4   5   6   7   8
arr     3   5   1   3   2   6   8   2
BIT     0   0   0   0   0   0   0   0

147
Updating a BIT
After adding arr[1] = 3 (nodes 1 -> 2 -> 4 -> 8 are updated):
index   1   2   3   4   5   6   7   8
arr     3   5   1   3   2   6   8   2
BIT     3   3   0   3   0   0   0   3

148
Updating a BIT
After adding arr[2] = 5 (nodes 2 -> 4 -> 8 are updated):
index   1   2   3   4   5   6   7   8
arr     3   5   1   3   2   6   8   2
BIT     3   8   0   8   0   0   0   8

149
Updating a BIT
After adding arr[3] = 1 (nodes 3 -> 4 -> 8 are updated):
index   1   2   3   4   5   6   7   8
arr     3   5   1   3   2   6   8   2
BIT     3   8   1   9   0   0   0   9

150
Updating a BIT
After adding arr[4] = 3 (nodes 4 -> 8 are updated):
index   1   2   3   4   5   6   7   8
arr     3   5   1   3   2   6   8   2
BIT     3   8   1  12   0   0   0  12

151
Updating a BIT
After adding arr[5] = 2 (nodes 5 -> 6 -> 8 are updated):
index   1   2   3   4   5   6   7   8
arr     3   5   1   3   2   6   8   2
BIT     3   8   1  12   2   2   0  14

152
Updating a BIT
After adding arr[6] = 6 (nodes 6 -> 8 are updated):
index   1   2   3   4   5   6   7   8
arr     3   5   1   3   2   6   8   2
BIT     3   8   1  12   2   8   0  20

153
Updating a BIT
After adding arr[7] = 8 (nodes 7 -> 8 are updated):
index   1   2   3   4   5   6   7   8
arr     3   5   1   3   2   6   8   2
BIT     3   8   1  12   2   8   8  28

154
Updating a BIT
After adding arr[8] = 2 (only node 8 is updated):
index   1   2   3   4   5   6   7   8
arr     3   5   1   3   2   6   8   2
BIT     3   8   1  12   2   8   8  30
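The update procedure can be sketched as follows. The function names update and buildBit are assumptions of this example; the BIT is stored in a vector of size n+1 whose entry 0 is unused, and the caller passes a 0-based array position, as in the steps of the procedure.

```cpp
#include <cassert>
#include <vector>

// Add value to the element at 0-based position index.
// bit has n + 1 entries; bit[0] is unused.
void update(std::vector<int>& bit, int index, int value) {
    int n = static_cast<int>(bit.size()) - 1;
    // Start at index + 1, then keep moving to the parent: index += index & -index.
    for (index += 1; index <= n; index += index & -index) {
        bit[index] += value;
    }
}

// Build a BIT by adding the elements of arr one at a time.
std::vector<int> buildBit(const std::vector<int>& arr) {
    std::vector<int> bit(arr.size() + 1, 0);
    for (int i = 0; i < static_cast<int>(arr.size()); ++i) {
        update(bit, i, arr[i]);
    }
    return bit;
}
```

For arr = {3, 5, 1, 3, 2, 6, 8, 2}, the entries 1 to 8 of the result are 3 8 1 12 2 8 8 30. Each update touches at most about log2(n) + 1 nodes.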

155
Querying the BIT
1. Initialize sum as 0 and index as index+1.
2. While index is greater than 0:
   - Add BIT[index] to sum.
   - Go to the largest node smaller than index that is not a descendant of
     index: index -= index & -index.
3. Return sum.

156
Querying the BIT
Find the sum from 1 -> 6.

index   1   2   3   4   5   6   7   8
arr     3   5   1   3   2   6   8   2
BIT     3   8   1  12   2   8   8  30

157
Querying the BIT
Find the sum from 1 -> 6.

index = 6
Sum = Sum + BIT[6] = 0 + 8 = 8
158
Querying the BIT
Find the sum from 1 -> 6.

index = 6 - (6 & -6) = 4
Sum = Sum + BIT[4] = 8 + 12 = 20
159
Querying the BIT
Find the sum from 1 -> 6.

index = 4 - (4 & -4) = 0 => stop

Sum = 20
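The query walk can be sketched as follows. The function name prefixSum is an assumption of this example. It takes a 0-based array position, so the sum from 1 -> 6 in the 1-based numbering of the slides corresponds to prefixSum(bit, 5). The BIT contents are copied from the final state of the update example, with an unused entry 0 in front.

```cpp
#include <cassert>
#include <vector>

// Sum of the elements at 0-based positions 0..index.
// bit has n + 1 entries; bit[0] is unused.
int prefixSum(const std::vector<int>& bit, int index) {
    int sum = 0;
    // Start at index + 1, then repeatedly remove the lowest set bit.
    for (index += 1; index > 0; index -= index & -index) {
        sum += bit[index];
    }
    return sum;
}
```

With bit = {0, 3, 8, 1, 12, 2, 8, 8, 30}, prefixSum(bit, 5) visits nodes 6 and 4 and returns 8 + 12 = 20.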
160

161
