01 - Advanced Data Structure
Hashing
Tree
Disjoint-Sets
Binary Search Tree
AVL Tree
Heap
Treap
Trie
Segment Tree
Binary Indexed Tree
HASHING
Presentation of the problem: Finding a file directly by its name
Most file systems do not limit the number of files that a directory can contain. Therefore, a large number of files can exist in each directory.
Consider the case where the following tasks are performed frequently:
List the files in a directory
Follow a directory path
Determine whether a specific file exists
If each of these tasks has to scan every entry of a large directory, performance drops.
Hashing
Hash function
A function that converts a search key into the location of the item
Hash table
A table that stores the items at the addresses returned by the hash function
Process of search by hashing
First, compute an address by passing the search key to the hash function, then go to that address in the hash table.
The search succeeds if the wanted item is at that address and fails otherwise.
(Figure: a search key is passed to the hash function, which returns a hash address in the hash table.)
Collision
The case where a hash function returns the same hash address for different search keys
Even if the hash function distributes hash addresses evenly, collisions become inevitable as the number of items stored in the hash table increases.
Solutions to collisions
Open addressing
Chaining
Chaining
A method that changes the structure of the hash table so that each bucket can store one or more items
Ex: (Figure)
John Smith and Sandra Dee are hashed to the same address, so their entries collide; with chaining, both are stored in the same bucket.
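The chaining idea can be sketched in Python as follows. This is a minimal illustration, not a production hash table: it assumes Python's built-in `hash` as the hash function and a plain list per bucket, and the class and method names (`ChainedHashTable`, `put`, `get`) are hypothetical. A single bucket is used to force the collision from the example.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining: each bucket is a list."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _address(self, key):
        # Hash function: convert the search key into a bucket address.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._address(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already stored: overwrite its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key: add it to the bucket's chain

    def get(self, key):
        for k, v in self.buckets[self._address(key)]:
            if k == key:
                return v
        raise KeyError(key)              # the search fails


table = ChainedHashTable(num_buckets=1)  # one bucket: every key collides
table.put("John Smith", "entry 1")
table.put("Sandra Dee", "entry 2")
print(table.get("Sandra Dee"))           # entry 2
print(len(table.buckets[0]))             # 2
```

Both colliding entries live in the same chain, and a lookup scans only that chain.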
Open Addressing
If a collision occurs because the address calculated by the hash function is already occupied, check whether the next slot is empty.
If it is empty, store the item for the search key there.
If it is not, keep checking the following slots until an empty one appears.
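A minimal sketch of open addressing, assuming linear probing (the "next space" is the next slot, wrapping around at the end of the table) and a fixed-size table without deletion; the names are hypothetical. Integer keys are used so that the collision is easy to predict, since in CPython `hash(n) == n` for small non-negative integers.

```python
class LinearProbingTable:
    """Open addressing with linear probing (sketch: fixed capacity, no deletion)."""

    def __init__(self, capacity=8):
        self.slots = [None] * capacity   # None marks an empty slot

    def put(self, key, value):
        cap = len(self.slots)
        start = hash(key) % cap
        for step in range(cap):
            i = (start + step) % cap     # probe the next slot, wrapping around
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return i                 # slot where the item was stored
        raise OverflowError("hash table is full")

    def get(self, key):
        cap = len(self.slots)
        start = hash(key) % cap
        for step in range(cap):
            i = (start + step) % cap
            if self.slots[i] is None:    # an empty slot ends the search: key absent
                break
            if self.slots[i][0] == key:
                return self.slots[i][1]
        raise KeyError(key)


table = LinearProbingTable(capacity=8)
print(table.put(1, "first"))   # 1
print(table.put(9, "second"))  # 2  (9 % 8 == 1 is taken, so the next slot is used)
print(table.get(9))            # second
```

Supporting deletion would need a "deleted" marker instead of `None`, so that later searches do not stop early.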
TREE
Presentation of the problem: Calculator
(Figure: a calculator expression represented as a tree, with the operator * and the operands 2, 3 and 4.)
Tree
Definition of Tree
A tree is a finite set of one or more nodes that satisfies the following conditions.
1. Exactly one node has no parent; it is called the root.
2. The remaining nodes are partitioned into n (>= 0) disjoint sets T1, …, Tn.
Each of the sets T1, …, Tn is in turn a tree, called a subtree of the root (recursive definition).
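(Figure: tree T with root A. A has children B, C and D; B has children E and F; C has child G; D has children H, I and J; F has child K. The trees rooted at B, C and D are the subtrees T1, T2 and T3 of the root, and E, K, G, H, I and J are leaf nodes.)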
Terms on Tree
Node: an element of a tree, also called a vertex
Nodes of tree T: A, B, C, D, E, F, G, H, I, J, K
Sibling nodes: child nodes of the same parent node
B, C and D are sibling nodes.
Ancestor nodes: all nodes on the path along the edges from a node up to the root
Ancestor nodes of K: F, B, A
Subtree: the tree obtained by cutting the edge that connects a node to its parent
Descendant nodes: all nodes below a node in its subtree
Descendant nodes of B: E, F, K
Degree
Degree of a node: the number of child nodes connected to the node
Degree of B = 2, degree of C = 1
Degree of a tree: the largest degree among the nodes of the tree
Degree of tree T = 3
Leaf node: a node of degree 0, i.e., a node without child nodes
Height
Height of a node: the number of edges on the path from the root to the node, i.e., the level of the node (this quantity is often called the depth)
Height of B = 1, height of F = 2
Height of a tree: the largest height among the nodes of the tree, i.e., the maximum level
Height of tree T = 3
(Figure: A is at level 0 (height 0); B, C and D are at level 1 (height 1); E, F, G, H, I and J are at level 2 (height 2); K is at level 3 (height 3).)
Binary tree - Features
(Figure: a binary tree whose nodes are numbered 1 at level 0, 2 and 3 at level 1, 4 to 7 at level 2, and 8 to 15 at level 3.)
Binary Tree - Types
Full Binary Tree
A binary tree in which every level is completely filled with nodes
It has the maximum number of nodes, $2^{h+1}-1$, for height h.
When the height is 3, it has $2^{3+1}-1 = 15$ nodes.
Numbering the nodes from 1 at the root, level by level and left to right, gives every node a fixed position from 1 to $2^{h+1}-1$.
(Figure: a full binary tree of height 3 with nodes 1 to 15.)
Complete Binary Tree
A binary tree of height h with n nodes whose nodes occupy exactly positions 1 to n of the full binary tree of height h, with no empty positions in between ($2^h \le n \le 2^{h+1}-1$)
(Figure: a complete binary tree of height 3 with nodes 1 to 10.)
Skewed Binary Tree
A binary tree in which all child nodes lean in one direction (all left or all right), so it has the minimum number of nodes, h+1, for height h
(Figure: a left-skewed and a right-skewed binary tree, each with nodes 1 to 4.)
Binary Tree - Traversal
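Traversal: visiting every node of a tree systematically, exactly once
3 basic traversal methods (V: visit the root, L: traverse the left subtree, R: traverse the right subtree)
Preorder traversal: VLR (root, then left subtree, then right subtree)
Inorder traversal: LVR (left subtree, then root, then right subtree)
Postorder traversal: LRV (left subtree, then right subtree, then root)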
Preorder traversal
Algorithm
preorder_traverse (TREE T)
    IF T is not null
        visit(T)
        preorder_traverse(T.left)
        preorder_traverse(T.right)
Example of Preorder traversal
Inorder traversal
Algorithm
inorder_traverse (TREE T)
    IF T is not null
        inorder_traverse(T.left)
        visit(T)
        inorder_traverse(T.right)
Example of inorder traversal
Postorder traversal
Algorithm
postorder_traverse (TREE T)
    IF T is not null
        postorder_traverse(T.left)
        postorder_traverse(T.right)
        visit(T)
Example of postorder traversal
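The three traversal algorithms translate directly into recursive Python functions. The example tree below is hypothetical, and appending to a list stands in for visit(T).

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def preorder(t, out):
    if t is not None:
        out.append(t.data)        # V: visit the root
        preorder(t.left, out)     # L
        preorder(t.right, out)    # R
    return out


def inorder(t, out):
    if t is not None:
        inorder(t.left, out)      # L
        out.append(t.data)        # V
        inorder(t.right, out)     # R
    return out


def postorder(t, out):
    if t is not None:
        postorder(t.left, out)    # L
        postorder(t.right, out)   # R
        out.append(t.data)        # V
    return out


# Hypothetical example tree: A has children B and C; B has D and E; C has F and G.
root = Node("A", Node("B", Node("D"), Node("E")), Node("C", Node("F"), Node("G")))
print(preorder(root, []))   # ['A', 'B', 'D', 'E', 'C', 'F', 'G']
print(inorder(root, []))    # ['D', 'B', 'E', 'A', 'F', 'C', 'G']
print(postorder(root, []))  # ['D', 'E', 'B', 'F', 'G', 'C', 'A']
```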
Expression of tree
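(Figure: a binary tree whose nodes are numbered level by level from left to right: A = 1 at level 0; B = 2 and C = 3 at level 1; D = 4, E = 5, F = 6 and G = 7 at level 2; H = 8, I = 9 and J = 10 at level 3.)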
Properties of the node index
What is the index of the parent node of the node with index i? $\lfloor i/2 \rfloor$
What is the index of the left child node of the node with index i? $2i$
What is the index of the right child node of the node with index i? $2i+1$
What is the starting index of the nodes at level n? $2^n$
(Figure: nodes A to M stored at indices 1 to 13. For E at index 5, the parent index is 2 (B), the left child index is 10 (J) and the right child index is 11 (K).)
Expression of binary tree by using arrays
Node indices are used as the indices of the array.
What is the size of the array for a binary tree of height h?
The maximum number of nodes at level i is $2^i$.
Hence the tree has at most $1 + 2 + 4 + \cdots + 2^h = \sum_{i=0}^{h} 2^i = 2^{h+1}-1$ nodes, stored at indices 1 to $2^{h+1}-1$ (index 0 is not used).
(Figure: the binary tree A to J, numbered 1 to 10.)
Index  0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
Node   -  A  B  C  D  E  F  G  H  I  J  -  -  -  -  -
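A short Python sketch of the index rules, applied to the array for the tree A to J; index 0 is left unused so that the parent of i is i // 2 and the children are 2i and 2i + 1. The helper names are hypothetical.

```python
# Array representation of the tree A..J: the node numbered i is stored at tree[i].
tree = [None, "A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]  # index 0 unused


def parent(i):
    return i // 2            # floor(i / 2)


def left(i):
    return 2 * i


def right(i):
    return 2 * i + 1


def level_start(n):
    return 2 ** n            # first index on level n


def children(tree, i):
    """Data of the existing children of node i (out-of-range or empty slots are skipped)."""
    return [tree[c] for c in (left(i), right(i)) if c < len(tree) and tree[c] is not None]


print(tree[parent(5)])     # B  (parent of E)
print(children(tree, 5))   # ['J']  (left child J at 10; index 11 is beyond the array)
print(children(tree, 4))   # ['H', 'I']
print(level_start(3))      # 8
```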
Expression of binary tree by using arrays
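(Figure: a left-skewed tree with A, B, C, D at indices 1, 2, 4, 8, and a right-skewed tree with A, B, C, D at indices 1, 3, 7, 15.)
Index  0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
Node   -  A  B  -  C  -  -  -  D  -  -  -  -  -  -  -
Index  0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
Node   -  A  -  B  -  -  -  C  -  -  -  -  -  -  -  D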
Disadvantages of expressing binary tree by using arrays
For skewed binary trees, memory is wasted on the many array elements that are never used: a skewed tree of height h occupies only h+1 of its $2^{h+1}-1$ positions.
Expression of tree – Linked List
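Expression of complete binary tree by using linked lists
Each node holds a link to its left child, its data, and a link to its right child.
(Figure: A links to B and C; B links to D and E; C links to F and G; the last level holds H, I, J, K and L, whose links are null.)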
DISJOINT-SETS
Disjoint-sets
Disjoint-Sets examples
Make-Set(x), Make-Set(y), Make-Set(a), Make-Set(b): four sets {x}, {y}, {a}, {b}
Union(x, y): {x, y}, {a}, {b}
Union(a, b): {x, y}, {a, b}
Union(x, a): {x, y, a, b}
Representation of Disjoint-sets – Linked list
Elements in the same set are managed by a linked list.
The first element of the linked list is the representative of the set.
Each element has a link to the representative of the set.
(Figure: three lists, a → d → e, b → f, and c, each with a rep pointer to its first element and a tail pointer to its last element.)
Examples of linked list operations
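Find-Set(e) returns a
Find-Set(f) returns b
Union(a, b): the list b → f is appended after a → d → e, giving a → d → e → b → f, and the links of b and f now point to the representative a.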
Disjoint-Sets expression - tree
A disjoint set is represented as a tree.
Each child node points to its parent node, and the root node is the representative of the set.
(Figure: sets drawn as trees with roots a, b and c.)
Operation examples
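Make-Set(a) ~ Make-Set(f): six single-node trees a, b, c, d, e, f
(Figure: a later state in which d's parent is c and f's parent is e.)
Union(d, f): the root of f's tree (e) is attached under the root of d's tree (c).
(Figure: c is the root with children d and e, and f is a child of e.)
Find-Set(d) returns c
Find-Set(e) returns c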
Storing the trees that represent Disjoint-Sets in an array
(Figure: the trees after Union(d, f): c is the root of d and e, and e is the parent of f.)
Index   0   1   2   3   4   5
Vertex  a   b   c   d   e   f
Parent  -1  -1  -1  2   2   4
A parent value of -1 marks a root.
Operations on Disjoint-Sets
Find-Set(x)
    IF x == p[x] : RETURN x
    ELSE : RETURN Find-Set(p[x])
Union(x, y)
    p[Find-Set(y)] ← Find-Set(x)
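A direct Python translation of Find-Set and Union, using the example array for a to f with roots marked by p[x] == x (as in the pseudocode) instead of -1. Find-Set is written as a loop rather than recursion; it follows the same parent links. The function names are hypothetical.

```python
def find_set(p, x):
    """Follow parent links until reaching the root (p[x] == x), the representative."""
    while p[x] != x:
        x = p[x]
    return x


def union(p, x, y):
    """Attach the root of y's tree under the root of x's tree."""
    p[find_set(p, y)] = find_set(p, x)


names = ["a", "b", "c", "d", "e", "f"]
p = list(range(6))      # Make-Set for every element: each one is its own root
p[3] = 2                # d under c
p[5] = 4                # f under e
union(p, 3, 5)          # Union(d, f): e (root of f) goes under c (root of d)
print(p)                      # [0, 1, 2, 2, 2, 4]
print(names[find_set(p, 5)])  # c
```

The resulting parent array matches the table above, except that roots point to themselves.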
Problems
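(Figure: a tree that has degenerated into a long chain; Find-Set(b) must follow every parent link from b up to the root, so a single Find-Set can take time proportional to the number of elements.)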
Methods for improving the efficiency of the operations
Union by rank
The rank of a tree is an upper bound on its height. When uniting two sets, the tree with the lower rank is attached under the root of the tree with the higher rank.
Path compression
While performing Find-Set, change the pointers so that every node on the path points directly to the root.
Examples of Union using ranks
(Figure: a tree rooted at a with rank 2 is united with a tree rooted at e with rank 1. The lower-rank root e is attached under a, and the rank of a stays 2.)
Examples of increasing ranks in the Union using ranks
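(Figure: a tree rooted at a with rank 2 is united with a tree rooted at e with rank 2. Because the ranks are equal, e is attached under a, and the rank of a increases to 3.)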
Example of Path Compression
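(Figure: before Find-Set(h), the path from h to the root is h → e → d → a. After path compression, e and h point directly to the root a, while c stays under b, and f and g stay under e.)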
Make_Set() operation
Make_Set(x): creates a new set whose only member is x
p[x]: the parent of node x
rank[x]: the rank of the tree whose root node is x
Make_Set(x)
    p[x] ← x
    rank[x] ← 0
Find_Set operation
Find_Set(x): finds the set containing x and returns its representative (the root), compressing the path on the way
Find_Set(x)
    IF x ≠ p[x]                  // when x is not the root
        p[x] ← Find_Set(p[x])
    RETURN p[x]
Union operation
Union(x, y): unites the two sets containing x and y
Union(x, y)
    Link(Find_Set(x), Find_Set(y))
Link(x, y)
    IF rank[x] > rank[y]         // rank: an upper bound on the height of the tree
        p[y] ← x
    ELSE
        p[x] ← y
        IF rank[x] == rank[y]
            rank[y]++
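A minimal Python sketch combining union by rank with path compression, following Make_Set, Find_Set, Union and Link above. One assumption goes beyond the pseudocode: Link returns early when both arguments are the same root, since otherwise uniting two elements of the same set would needlessly increment the rank. The class and method names are hypothetical.

```python
class DisjointSet:
    """Disjoint sets over elements 0..n-1 with union by rank and path compression."""

    def __init__(self, n):
        self.p = list(range(n))     # Make_Set(x): p[x] <- x
        self.rank = [0] * n         #              rank[x] <- 0

    def find_set(self, x):
        if x != self.p[x]:                        # x is not a root
            self.p[x] = self.find_set(self.p[x])  # path compression
        return self.p[x]

    def union(self, x, y):
        self.link(self.find_set(x), self.find_set(y))

    def link(self, x, y):
        if x == y:                  # already in the same set (guard not in the pseudocode)
            return
        if self.rank[x] > self.rank[y]:
            self.p[y] = x
        else:
            self.p[x] = y
            if self.rank[x] == self.rank[y]:
                self.rank[y] += 1


ds = DisjointSet(4)
ds.union(0, 1)
ds.union(2, 3)
ds.union(1, 3)                  # two rank-1 trees: the new root's rank becomes 2
print(ds.find_set(0) == ds.find_set(2))  # True
print(ds.rank[3])                        # 2
```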
BINARY SEARCH TREE
Binary Search Tree
Data structure for performing search operations efficiently
Every element has a unique key.
key(left subtree) < key(root node) < key(right subtree)
The left subtree and the right subtree are also binary search trees.
An inorder traversal visits the keys in increasing order.
(Figure: a binary search tree with root 8; every value in the left subtree is smaller than the root, and every value in the right subtree is larger than the root.)
Operations of binary search tree
Search operation
Start search at root
Compare the key value x to be searched to the key value k of
the root node.
x == k: Success in search
x < k: Perform search operation on the left subtree of the
root node.
x > k: Perform search operation on the right subtree of the
root node.
Repeat the search operation on the chosen subtree in the same way.
The search fails when there is no subtree left to search.
Search operation
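Search 13
(Figure: the tree has root 9 with children 4 and 12; 4 has children 3 and 6; 12 has right child 15, whose children are 13 and 17. Starting at the root: 1) 13 > 9, go right; 2) 13 > 12, go right; 3) 13 < 15, go left; 4) 13 = 13, the search succeeds.)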
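The search procedure as an iterative Python sketch, run on the example tree with root 9; the `Node` class and `bst_search` function are hypothetical names.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def bst_search(root, x):
    """Return the node with key x, or None if the search fails."""
    node = root
    while node is not None:
        if x == node.key:
            return node                                  # success
        node = node.left if x < node.key else node.right
    return None                                          # no subtree left: failure


# Example tree: 9 -> (4 -> 3, 6), (12 -> right child 15 -> 13, 17)
root = Node(9, Node(4, Node(3), Node(6)),
               Node(12, None, Node(15, Node(13), Node(17))))
print(bst_search(root, 13) is not None)  # True
print(bst_search(root, 5))               # None
```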
Insertion operation
1) Search operation is performed first
Search first to check whether an element with the same key already exists in the tree, because insertion is impossible in that case.
The position where the search fails becomes the place of insertion.
(Figure: inserting 5 into the same tree as in the search example. 1) 5 < 9, go left; 2) 5 > 4, go right; 3) 5 < 6, go left, where the search fails; 4) 5 is inserted as the left child of 6.)
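A sketch of insertion that reuses the search path: walking down stops where the search would fail, and the new node is attached there. Duplicate keys raise an error, matching "insertion is impossible". The names are hypothetical; the tree is built by inserting the example's keys in an order that reproduces its shape.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def bst_insert(root, key):
    """Insert key and return the root; raise ValueError if key already exists."""
    new = Node(key)
    if root is None:
        return new
    node = root
    while True:
        if key == node.key:
            raise ValueError(f"key {key} already exists")   # insertion impossible
        if key < node.key:
            if node.left is None:       # the search fails here: insert as left child
                node.left = new
                return root
            node = node.left
        else:
            if node.right is None:      # the search fails here: insert as right child
                node.right = new
                return root
            node = node.right


def inorder(node):
    return inorder(node.left) + [node.key] + inorder(node.right) if node else []


root = None
for k in [9, 4, 12, 3, 6, 15, 13, 17]:  # rebuilds the example tree
    root = bst_insert(root, k)
root = bst_insert(root, 5)
print(root.left.right.left.key)          # 5 (left child of 6)
print(inorder(root))                     # [3, 4, 5, 6, 9, 12, 13, 15, 17]
```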
Binary Search Tree – Operation exercise
Deletion operation
Consider the algorithms for deletion operation.
Delete 13, 12 and 9 sequentially for the following tree.
(Figure: root 9 with children 4 and 12; 4 has children 3 and 6; 12 has right child 15, whose children are 13 and 17.)
Solution
Delete 13: the node to be deleted is a leaf node (degree 0)
1) Search for 13. 2) Delete the node; no follow-up is needed.
Delete 12: the node to be deleted has degree 1
1) Search for 12. 2) Delete the node. 3) Follow-up: move its only child, 15, into its place.
(Figure: the tree becomes root 9 with children 4 and 15; 4 has children 3 and 6, and 15 has right child 17.)
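Delete 9: the node to be deleted has degree 2
1) Search for 9. 2) Find a replacement candidate: the largest key in the left subtree (6) or the smallest key in the right subtree (15). 3) Move the candidate into the place of 9.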
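A recursive Python sketch of the three deletion cases. It assumes the candidate for a degree-2 node is the smallest key in the right subtree (the in-order successor); using the largest key in the left subtree would work equally well. Running it on the exercise (delete 13, 12, then 9) gives one valid result. The names are hypothetical.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def bst_delete(root, key):
    """Delete key from the subtree rooted at root and return the new subtree root."""
    if root is None:
        return None                          # search failed: nothing to delete
    if key < root.key:
        root.left = bst_delete(root.left, key)
    elif key > root.key:
        root.right = bst_delete(root.right, key)
    else:
        # Degree 0 or 1: replace the node by its only child (or by None).
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        # Degree 2: the candidate is the smallest key in the right subtree.
        cand = root.right
        while cand.left is not None:
            cand = cand.left
        root.key = cand.key                  # move the candidate's key into place
        root.right = bst_delete(root.right, cand.key)
    return root


def inorder(node):
    return inorder(node.left) + [node.key] + inorder(node.right) if node else []


root = Node(9, Node(4, Node(3), Node(6)),
               Node(12, None, Node(15, Node(13), Node(17))))
for k in (13, 12, 9):
    root = bst_delete(root, k)
print(root.key, inorder(root))  # 15 [3, 4, 6, 15, 17]
```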
Binary Search Tree - Performance
The time for search, insertion and deletion is proportional to the height of the tree.
O(h), h: height of the BST
Average case, when the binary tree has been built in a balanced way: O(log n)
Worst case, when the tree is skewed: O(n)
AVL TREE
AVL Tree
The AVL tree is named after its two Soviet inventors, Georgy
Adelson-Velsky and Evgenii Landis, who published it in their 1962
paper "An algorithm for the organization of information".
AVL tree is a self-balancing binary search tree (BST) in which the heights of the left and right subtrees of every node differ by at most one.
Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where n is the number of nodes in the tree prior to the operation.
Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.
AVL Tree example
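(Figure: two AVL trees. The first has root 9 with children 4 and 14; 4 has children 3 and 6, and 14 has children 12 and 15. The second has root 5 with children 3 and 6; 3 has children 2 and 4.)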
Not AVL Tree example
(Figure: two binary search trees that violate the AVL condition, one with root 10 and one with root 8.)
AVL Tree operation - Insertion
Perform standard BST insert for new node w.
Starting from w, travel up and find the first unbalanced node. Let z be
the first unbalanced node, y be the child of z that comes on the path
from w to z and x be the grandchild of z that comes on the path from
w to z.
Re-balance the tree by performing the appropriate rotations on the subtree rooted at z. There are 4 possible cases, because x, y and z can be arranged in 4 ways:
Left Left: y is the left child of z and x is the left child of y
Left Right: y is the left child of z and x is the right child of y
Right Right: y is the right child of z and x is the right child of y
Right Left: y is the right child of z and x is the left child of y
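A compact recursive sketch of AVL insertion in Python. Heights are counted in nodes (an empty subtree has height 0), which does not change the balance test. Because the recursion rebalances on the way back up, the first node found unbalanced is z, and comparing the new key with z's child identifies which of the four cases applies. The names are hypothetical; the usage reproduces the Left Left example (insert 15).

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1          # height in nodes; an empty subtree has height 0


def height(n):
    return n.height if n else 0


def update(n):
    n.height = 1 + max(height(n.left), height(n.right))


def balance(n):
    return height(n.left) - height(n.right)


def rotate_right(z):
    y = z.left
    z.left = y.right             # y's right subtree moves under z
    y.right = z
    update(z)
    update(y)
    return y                     # y is the new subtree root


def rotate_left(z):
    y = z.right
    z.right = y.left             # y's left subtree moves under z
    y.left = z
    update(z)
    update(y)
    return y


def avl_insert(node, key):
    # 1) Standard BST insert.
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = avl_insert(node.left, key)
    elif key > node.key:
        node.right = avl_insert(node.right, key)
    else:
        return node              # duplicate keys are ignored
    # 2) On the way back up, check whether this node (z) became unbalanced.
    update(node)
    b = balance(node)
    if b > 1 and key < node.left.key:        # Left Left
        return rotate_right(node)
    if b > 1 and key > node.left.key:        # Left Right
        node.left = rotate_left(node.left)
        return rotate_right(node)
    if b < -1 and key > node.right.key:      # Right Right
        return rotate_left(node)
    if b < -1 and key < node.right.key:      # Right Left
        node.right = rotate_right(node.right)
        return rotate_left(node)
    return node


def preorder(n):
    return [n.key] + preorder(n.left) + preorder(n.right) if n else []


root = None
for k in [12, 10, 22, 6, 18, 25, 16, 20]:   # builds the tree of the Left Left example
    root = avl_insert(root, k)
root = avl_insert(root, 15)
print(preorder(root))  # [12, 10, 6, 18, 16, 15, 22, 20, 25]
```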
AVL Tree operation - Insertion
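Left Left case (Insert 15)
(Figure: before the insertion, 12 has children 10 and 22; 10 has child 6; 22 has children 18 and 25; 18 has children 16 and 20. Inserting 15 as the left child of 16 makes 22 the first unbalanced node. A right rotation at 22 makes 18 the root of that subtree: 18 has children 16 and 22, 16 has child 15, and 22 has children 20 and 25. The tree is balanced.)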
AVL Tree operation - Insertion
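Left Right case (Insert 19)
(Figure: in the same initial tree, 19 is inserted as the left child of 20. The first unbalanced node is 22, with y = 18 and x = 20. Step 1: a left rotation at 18 makes 20 the left child of 22, with 18 as the left child of 20; 18 keeps 16 as its left child and takes 19 as its right child.)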
AVL Tree operation - Insertion
Left Right case (cont.)
(Figure: step 2: a right rotation at the unbalanced node 22 makes 20 the root of that subtree, with children 18 and 22; 18 has children 16 and 19, and 22 has right child 25. The tree is balanced.)
AVL Tree operation - Insertion
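Right Right case (Insert 30)
(Figure: before the insertion, 12 has children 10 and 22; 10 has child 6; 22 has children 18 and 25; 25 has children 24 and 26. Inserting 30 as the right child of 26 makes 22 the first unbalanced node. A left rotation at 22 makes 25 the root of that subtree: 25 has children 22 and 26, 22 has children 18 and 24, and 26 has child 30. The tree is balanced.)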
AVL Tree operation - Insertion
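Right Left case (Insert 23)
(Figure: in the same initial tree, 23 is inserted as the left child of 24. The first unbalanced node is 22, with y = 25 and x = 24. Step 1: a right rotation at 25 makes 24 the right child of 22, with children 23 and 25; 25 keeps 26 as its right child.)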
AVL Tree operation - Insertion
Right Left case (cont.)
(Figure: step 2: a left rotation at the unbalanced node 22 makes 24 the root of that subtree, with children 22 and 25; 22 has children 18 and 23, and 25 has right child 26. The tree is balanced.)
AVL Tree operation - Deletion
Perform standard BST delete for the node w.
Starting from w, travel up and find the first unbalanced node. Let z be
the first unbalanced node, y be the larger height child of z, and x be
the larger height child of y. Note that the definitions of x and y are
different from insertion.
Re-balance the tree by performing the appropriate rotations on the subtree rooted at z. There are 4 possible cases, because x, y and z can be arranged in 4 ways:
Left Left: y is the left child of z and x is the left child of y
Left Right: y is the left child of z and x is the right child of y
Right Right: y is the right child of z and x is the right child of y
Right Left: y is the right child of z and x is the left child of y
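A sketch of AVL deletion in Python, with heights counted in nodes. As in the text, y is z's taller child and x is y's taller child; when y's two children have equal heights the sketch uses the single-rotation case, a common choice. Unlike insertion, a rotation after deletion can reduce the subtree's height, so this version rebalances at every ancestor on the way back to the root, not only the first one. It uses the in-order successor for degree-2 nodes, which reproduces the Left Left example (delete 22, replaced by 23). The names are hypothetical.

```python
def height(n):
    return n.height if n else 0


class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.height = 1 + max(height(left), height(right))


def update(n):
    n.height = 1 + max(height(n.left), height(n.right))


def balance(n):
    return height(n.left) - height(n.right)


def rotate_right(z):
    y = z.left
    z.left, y.right = y.right, z
    update(z)
    update(y)
    return y


def rotate_left(z):
    y = z.right
    z.right, y.left = y.left, z
    update(z)
    update(y)
    return y


def rebalance(z):
    b = balance(z)
    if b > 1:                                  # y = z.left is the taller child
        if balance(z.left) < 0:                # x is y's right child: Left Right
            z.left = rotate_left(z.left)
        return rotate_right(z)                 # Left Left, or step 2 of Left Right
    if b < -1:                                 # y = z.right is the taller child
        if balance(z.right) > 0:               # x is y's left child: Right Left
            z.right = rotate_right(z.right)
        return rotate_left(z)                  # Right Right, or step 2 of Right Left
    return z


def avl_delete(node, key):
    if node is None:
        return None                            # key not found
    if key < node.key:
        node.left = avl_delete(node.left, key)
    elif key > node.key:
        node.right = avl_delete(node.right, key)
    else:
        if node.left is None:
            return node.right
        if node.right is None:
            return node.left
        succ = node.right                      # degree 2: in-order successor
        while succ.left is not None:
            succ = succ.left
        node.key = succ.key
        node.right = avl_delete(node.right, succ.key)
    update(node)
    return rebalance(node)                     # checked at every ancestor


def preorder(n):
    return [n.key] + preorder(n.left) + preorder(n.right) if n else []


# The tree of the Left Left deletion example.
root = Node(15,
            Node(8,
                 Node(6, Node(3, None, Node(5)), Node(7)),
                 Node(10, None, Node(11))),
            Node(22, Node(18), Node(25, Node(23))))
root = avl_delete(root, 22)
print(preorder(root))  # [8, 6, 3, 5, 7, 15, 10, 11, 23, 18, 25]
```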
AVL Tree operation - Deletion
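Left Left case (Delete 22)
(Figure: 15 has children 8 and 22; 8 has children 6 and 10; 6 has children 3 and 7; 3 has right child 5; 10 has right child 11; 22 has children 18 and 25; 25 has left child 23. Deleting 22 replaces it with 23, after which 15 is unbalanced.)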
AVL Tree operation - Deletion
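Left Left case (cont.)
(Figure: y = 8 and x = 6. A right rotation at 15 makes 8 the root: 8 has children 6 and 15; 6 has children 3 and 7, and 3 has right child 5; 15 has children 10 and 23; 10 has right child 11, and 23 has children 18 and 25. The tree is balanced.)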
AVL Tree operation - Deletion
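Left Right case (Delete 22)
(Figure: 15 has children 8 and 22; 8 has children 6 and 12; 6 has child 3; 12 has children 11 and 14; 14 has child 13; 22 has children 18 and 25; 25 has left child 23. Deleting 22 replaces it with 23, after which 15 is unbalanced.)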
AVL Tree operation - Deletion
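Left Right case (cont.)
(Figure: y = 8 and x = 12. Step 1: a left rotation at 8 makes 12 the left child of 15; 12 has children 8 and 14, 8 has children 6 and 11, 6 has child 3, and 14 has child 13.)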
AVL Tree operation - Deletion
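Left Right case (cont.)
(Figure: step 2: a right rotation at the unbalanced node 15 makes 12 the root: 12 has children 8 and 15; 8 has children 6 and 11, and 6 has child 3; 15 has children 14 and 23; 14 has child 13, and 23 has children 18 and 25. The tree is balanced.)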
AVL Tree operation - Deletion
Right Right case (Delete 8)
(Figure: 15 has children 8 and 22; 22 has children 18 and 25; 18 has children 16 and 20; 25 has children 24 and 30; 30 has child 50. Deleting 8 puts 11 in its place, after which 15 is unbalanced.)
AVL Tree operation - Deletion
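Right Right case (cont.)
(Figure: y = 22 and x = 25. A left rotation at 15 makes 22 the root: 22 has children 15 and 25; 15 has children 11 and 18, with 6 and 10 below 11 and 16 and 20 below 18; 25 has children 24 and 30, and 30 has child 50. The tree is balanced.)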
AVL Tree operation - Deletion
Right Left case (Delete 8)
(Figure: 15 has children 8 and 22; 22 has children 18 and 25; 18 has children 16 and 20; 20 has child 21; 25 has child 30. Deleting 8 puts 11 in its place, after which 15 is unbalanced.)
AVL Tree operation - Deletion
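Right Left case (cont.)
(Figure: y = 22 and x = 18. Step 1: a right rotation at 22 makes 18 the right child of 15; 18 has children 16 and 22, 22 has children 20 and 25, 20 has child 21, and 25 has child 30.)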
AVL Tree operation - Deletion
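Right Left case (cont.)
(Figure: step 2: a left rotation at the unbalanced node 15 makes 18 the root: 18 has children 15 and 22; 15 has children 11 and 16, with 6 and 10 below 11; 22 has children 20 and 25; 20 has child 21, and 25 has child 30. The tree is balanced.)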
HEAP
Heap
A complete binary tree for quickly finding the node with the largest or smallest key
Max. heap
A complete binary tree for finding the node with the largest key
Key of each parent node ≥ keys of its child nodes
Root node: the node with the largest key
Min. heap
A complete binary tree for finding the node with the smallest key
Key of each parent node ≤ keys of its child nodes
Root node: the node with the smallest key
Heap example
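(Figure: a max heap with root 33: 33 has children 31 and 27; 31 has children 21 and 22; 27 has children 18 and 23; 21 has children 14 and 19. A min heap with root 15: 15 has children 16 and 18; 16 has children 26 and 29; 18 has children 25 and 22; 26 has children 33 and 37.)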
Examples of trees which are not heaps
Explain why they are not heaps.
(Figure: two binary trees, Tree 1 and Tree 2, each with root 8.)
Heap operation - Insertion
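Insert 17
(Figure: the max heap has 20 at position 1, 15 and 19 at positions 2 and 3, and 4, 13 and 11 at positions 4, 5 and 6. 17 is placed at the next position, 7, as the right child of 19. Since 17 < 19, no exchange is needed.)
Insert 23
(Figure: 23 is placed at position 7. 1) The inserted node 23 > its parent 19: exchange places. 2) The inserted node 23 > its parent 20: exchange places. Result: 23 at position 1, 15 and 20 at positions 2 and 3, and 4, 13, 11 and 19 at positions 4 to 7.)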
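The insertion steps as a Python sketch, assuming a max heap stored in a list with index 0 unused so that the parent of position i is i // 2; the function name is hypothetical.

```python
def heap_push(heap, key):
    """Insert key into a max heap stored in heap[1..n] (heap[0] is unused)."""
    heap.append(key)                          # place the new node at the next position
    i = len(heap) - 1
    while i > 1 and heap[i] > heap[i // 2]:   # larger than its parent: exchange places
        heap[i], heap[i // 2] = heap[i // 2], heap[i]
        i //= 2


heap = [None, 20, 15, 19, 4, 13, 11]
heap_push(heap, 23)
print(heap[1:])   # [23, 15, 20, 4, 13, 11, 19]
```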
Heap operation - Deletion
Examples of deletions in heap
(Figure: 1) Delete the element at the root node. 2) Delete the last node and move its element, 10, to the root. 3) The moved node 10 < its larger child 19: exchange places. 4) The position is finalized: 19 at the root, with children 15 and 10; 15 has children 4 and 13.)
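Deletion of the root as a Python sketch under the same assumptions (max heap in a list, index 0 unused). The starting heap is hypothetical, chosen so that the last node is 10 as in the figure; the function name is also hypothetical.

```python
def heap_pop(heap):
    """Delete and return the root (maximum) of a max heap stored in heap[1..n]."""
    if len(heap) <= 1:
        raise IndexError("pop from an empty heap")
    top = heap[1]
    last = heap.pop()                 # delete the last node
    if len(heap) > 1:                 # if nodes remain, move its element to the root
        heap[1] = last
        i, n = 1, len(heap) - 1
        while 2 * i <= n:             # while node i has at least one child
            c = 2 * i                 # choose the larger child
            if c + 1 <= n and heap[c + 1] > heap[c]:
                c += 1
            if heap[i] >= heap[c]:    # no larger child: the position is finalized
                break
            heap[i], heap[c] = heap[c], heap[i]   # exchange places
            i = c
    return top


heap = [None, 20, 15, 19, 4, 13, 10]  # hypothetical starting heap
print(heap_pop(heap))                 # 20
print(heap[1:])                       # [19, 15, 10, 4, 13]
```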
Use of Heap 1
Two representative uses of a heap are implementing a priority queue and sorting (heap sort).
A heap is the most common efficient way to implement a priority queue.
Adding or deleting one node takes O(log N) time, and the maximum/minimum value can be read in O(1).
It is cheaper to maintain than keeping all the data fully sorted.
The tree can be easily implemented with an array.
Parent and child nodes are found with O(1) operations: the children of the node at position N are at 2N and 2N+1.
Because a heap is a complete binary tree, the position for an addition or deletion is determined directly by the start and end indices of the data.
Use of Heap 2
TREAP
Treap
First introduced in 1989 by Aragon and Seidel.
The idea is to use randomization and the binary heap property to maintain balance with high probability (the name combines “tree” and “heap”).
Every node of a treap maintains two values.
Key: follows standard BST ordering (left is smaller and right is greater).
Priority: a randomly assigned value that follows the max- (or min-) heap property.
Like an AVL tree, a treap uses rotations, here to maintain the max-heap property during insertion and deletion.
Why Treap?
O(log N) performance with high probability for any input sequence
Priorities are chosen randomly when a node is inserted.
O(N) operations are avoided with high probability.
Code is less complex
Perhaps the simplest of the BSTs that try to improve performance
Priorities do not have to be updated to keep the tree balanced.
Non-recursive implementation is possible.
Treap example
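Each node is labeled Key|Priority: keys follow BST order and priorities follow max-heap order.
(Figure: 60|9 is the root with children 40|8 and 80|7; 30|2 and 10|1 are below 40|8.)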
Treap operation - Insertion
1. Create a new node whose key is x and whose priority is a random value.
2. Perform a standard BST insert.
3. Use rotations to make the inserted node's priority satisfy the max-heap property:
If the new node was inserted into the left subtree and the root of the left subtree has a higher priority, perform a right rotation.
If the new node was inserted into the right subtree and the root of the right subtree has a higher priority, perform a left rotation.
Treap operation - Insertion
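(Figures: inserting the node 15|7 into the treap rooted at 60|9. The node is first placed as a BST leaf and then rotated upward while its priority 7 is higher than its parent's; it stops below 40|8, with 10|1 and 30|2 as its children, and the result is again a treap.)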
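A recursive Python sketch of treap insertion with a max heap on priorities. Priorities are random by default; the usage passes fixed priorities so that it reproduces the key|priority labels above, and it assumes a starting shape in which 30|2 is the left child of 40|8 and 10|1 is the left child of 30|2. The names are hypothetical.

```python
import random


class Node:
    def __init__(self, key, priority=None):
        self.key = key
        # Random priority unless one is given (fixed priorities reproduce the slides).
        self.priority = random.random() if priority is None else priority
        self.left = None
        self.right = None


def rotate_right(n):
    child = n.left
    n.left = child.right
    child.right = n
    return child


def rotate_left(n):
    child = n.right
    n.right = child.left
    child.left = n
    return child


def treap_insert(root, key, priority=None):
    if root is None:
        return Node(key, priority)               # new leaf from the standard BST insert
    if key < root.key:
        root.left = treap_insert(root.left, key, priority)
        if root.left.priority > root.priority:   # heap order broken on the left
            root = rotate_right(root)
    else:
        root.right = treap_insert(root.right, key, priority)
        if root.right.priority > root.priority:  # heap order broken on the right
            root = rotate_left(root)
    return root


def preorder(n):
    return [(n.key, n.priority)] + preorder(n.left) + preorder(n.right) if n else []


root = None
for key, pri in [(60, 9), (40, 8), (80, 7), (30, 2), (10, 1)]:
    root = treap_insert(root, key, pri)
root = treap_insert(root, 15, 7)
print(preorder(root))
# [(60, 9), (40, 8), (15, 7), (10, 1), (30, 2), (80, 7)]
```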
Treap operation - Deletion
Delete key = 40
(Figures: deleting 40|8 from the treap rooted at 60|9; afterwards 30|2 takes its place below 60|9.)
Treap Summary
Operations
Search, insert and delete each take expected O(log n) time, but O(n) in the worst case.
Memory use
O(1) per node, about the same as AVL trees
Very simple to implement, with less overhead than AVL trees
TRIE
Trie
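Trie for the set of strings { “aeef”, “ad”, “bbfe”, “bbfg”, “c” }
(Figure: the root has edges a, b and c. Below a, one path continues e, e, f and another ends with d; below b, the path continues b, f and then branches into e and g.)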
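Each edge corresponds to (is labeled with) one character.
Edges leaving the same node do not have the same label.
Each string corresponds to a leaf node (when no string is a prefix of another).
(Figure: the path from the root to a leaf spells out a string, e.g., a, e, e, f for “aeef”.)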
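A minimal trie in Python, using one dictionary of children per node (each dictionary key is an edge label). It adds an end-of-string flag, an assumption beyond the slides, so that strings that are prefixes of other strings can also be stored. The names are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}      # edge label (one character) -> child node
        self.is_end = False     # True if a stored string ends at this node


def trie_insert(root, s):
    node = root
    for ch in s:
        if ch not in node.children:        # labels leaving a node stay distinct
            node.children[ch] = TrieNode()
        node = node.children[ch]
    node.is_end = True


def trie_contains(root, s):
    node = root
    for ch in s:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_end


root = TrieNode()
for word in ["aeef", "ad", "bbfe", "bbfg", "c"]:
    trie_insert(root, word)
print(sorted(root.children))        # ['a', 'b', 'c']
print(trie_contains(root, "bbfg"))  # True
print(trie_contains(root, "bbf"))   # False (a prefix, not a stored string)
```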
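Suffix trie: all suffixes of a string expressed as a trie
If the string is $A = a_0 a_1 a_2 \cdots a_{n-1}$, the suffix trie contains every suffix $a_i a_{i+1} \cdots a_{n-1}$ ($0 \le i < n$).
(Figure: the suffix trie of A = “abac”, with nodes numbered 1 to 10; the root (node 1) has edges a, b and c.)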
String operation using suffix trie
Substring test
Substring test through trie
To test whether a given string is a substring of A, follow the edges from the root, character by character. The string is a substring of A exactly when every character can be followed.
Ex) A = “abac”
“bac” (O): the path b, a, c exists.
“aca” (X): after a and c, there is no edge labeled a.
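A naive Python sketch of building the suffix trie and running the substring test, using nested dictionaries as nodes. It inserts every suffix, so it uses O(n^2) space; the node count for “abac” matches the 10 numbered nodes in the figure. The function names are hypothetical.

```python
def build_suffix_trie(a):
    """Insert every suffix a[i:] into a trie whose nodes are nested dicts."""
    root = {}
    for i in range(len(a)):
        node = root
        for ch in a[i:]:
            node = node.setdefault(ch, {})   # follow the edge, creating it if needed
    return root


def is_substring(trie, p):
    """p is a substring of A exactly when its characters can be followed from the root."""
    node = trie
    for ch in p:
        if ch not in node:
            return False
        node = node[ch]
    return True


def count_nodes(node):
    return 1 + sum(count_nodes(child) for child in node.values())


trie = build_suffix_trie("abac")
print(is_substring(trie, "bac"))  # True
print(is_substring(trie, "aca"))  # False
print(count_nodes(trie))          # 10
```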
Longest common prefix of two suffixes
1. Select the nodes corresponding to the last characters of the two suffixes.
Ex) For the suffixes “abac” and “ac” of A = “abac”, the longest common prefix is “a”.
Finding kth suffix sorted in lexicographic order
When strings are generated through depth-first search that visits child edges in alphabetical order, they come out sorted in lexicographic order.
The generated strings are not stored in memory; only their starting indices are stored.
$A = a_0 a_1 a_2 \cdots a_{n-1}$, $A_i = a_i a_{i+1} a_{i+2} \cdots a_{n-1}$, $n = \mathrm{length}(A)$
Ex) A = “abac” (index 0 1 2 3: a b a c)
abac = A_0, ac = A_2, bac = A_1, c = A_3
Stored indices: (0, 2, 1, 3)
Compressed trie
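Chains of nodes are compressed so that each edge is labeled with a substring instead of a single character.
(Figure: the trie of { “aeef”, “ad”, “bbfe”, “bbfg”, “c” } and its compressed form, whose root has edges a, bbf and c; a branches into eef and d, and bbf branches into e and g.)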
Suffix Trees
Attributes of the suffix tree of a string S of length s:
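The following is the suffix tree for the string S = “xabxac” (positions 1 to 6).
Suffixes: S1 = xabxac, S2 = abxac, S3 = bxac, S4 = xac, S5 = ac, S6 = c
(Figure: the root has edges xa, a, bxac and c. The xa edge branches into bxac (leaf 1) and c (leaf 4); the a edge branches into bxac (leaf 2) and c (leaf 5); bxac leads to leaf 3 and c to leaf 6.)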
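Suffix tree of S = “xabxa” (positions 1 to 5): the 4th and 5th suffixes (xa and a) are prefixes of other suffixes, so they are not represented by leaf nodes.
Suffixes: S1 = xabxa, S2 = abxa, S3 = bxa, S4 = xa, S5 = a
(Figure: suffixes 1, 2 and 3 end at leaves, while xa (4) and a (5) end partway along the paths of other suffixes.)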
To handle the case where a suffix is a prefix of another suffix, a special character ($) is added to the end of the string S.
$ is called the termination character and has the smallest value when comparing strings.
To store edge labels efficiently, a label that is the substring T[i...j] can be stored as its start and end indices (i, j).
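Suffix tree of S = “xabxa$” (positions 1 to 6)
Suffixes: S1 = xabxa$, S2 = abxa$, S3 = bxa$, S4 = xa$, S5 = a$, S6 = $
(Figure: every suffix now ends at its own leaf. The root has edges xa, a, bxa$ and $; xa branches into bxa$ (leaf 1) and $ (leaf 4); a branches into bxa$ (leaf 2) and $ (leaf 5); bxa$ is leaf 3 and $ is leaf 6.)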
Generation of suffix tree - Trivial algorithm
Example: S = “abab$”. The suffixes abab$, bab$, ab$, b$ and $ are inserted one at a time, from the longest to the shortest.
(Figures: the tree after inserting ab$, after inserting b$, and after inserting $.)
The time for adding the ith suffix is O(n - i + 1). Therefore, the total generation time is
$\sum_{i=1}^{n} O(n-i+1) = \sum_{i=1}^{n} O(i) = O(n^2)$
Suffix Array
s = abab
Suffixes sorted in lexicographic order: ab, abab, b, bab
Suffix array (1-based starting positions): 3 1 4 2
Complexity of suffix array
① A suffix array uses O(n) memory.
② The array can be generated in O(n log n) time.
③ Whether a pattern P occurs in a text T can be determined in O(|P| + log n) time (binary search over the suffix array using LCP information).
Generation of suffix array
string S = banana$
i   S_i       A[i]   S_A[i] (sorted)
1   banana$   7      $
2   anana$    6      a$
3   nana$     4      ana$
4   ana$      2      anana$
5   na$       1      banana$
6   a$        5      na$
7   $         3      nana$
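A naive Python sketch that reproduces the banana$ table by sorting starting positions by their suffixes. It is for illustration only: comparing whole suffixes makes it slower than O(n log n) construction algorithms. Positions are 1-based to match the table; the function name is hypothetical.

```python
def suffix_array(s):
    """Return the 1-based starting positions of the suffixes of s in lexicographic order.

    Naive sketch: sorting with whole-suffix comparisons is O(n^2 log n) in the worst case.
    """
    return sorted(range(1, len(s) + 1), key=lambda i: s[i - 1:])


s = "banana$"                   # '$' (ASCII 36) sorts before every lowercase letter
sa = suffix_array(s)
print(sa)                       # [7, 6, 4, 2, 1, 5, 3]
print([s[i - 1:] for i in sa])  # ['$', 'a$', 'ana$', 'anana$', 'banana$', 'na$', 'nana$']
print(suffix_array("abab"))     # [3, 1, 4, 2]
```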
LCP array
string S = banana$
i   S_i       A[i]   S_A[i]    LCP[i]
1   banana$   7      $         0
2   anana$    6      a$        0
3   nana$     4      ana$      1
4   ana$      2      anana$    3
5   na$       1      banana$   0
6   a$        5      na$       0
7   $         3      nana$     2
LCP[i] is the length of the longest common prefix of the suffixes A[i-1] and A[i].
LCP[4] = 3: the common prefix of the suffixes A[3] = S[4, 7] = ana$ and A[4] = S[2, 7] = anana$ is ana.
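A direct Python sketch that computes the LCP array by comparing neighboring suffixes character by character (O(n^2) in the worst case; Kasai's algorithm computes the same array in O(n)). Python lists are 0-based, so LCP[i] in the table is `lcp[i - 1]` here. The function name is hypothetical.

```python
def lcp_array(s, sa):
    """lcp[k] = length of the longest common prefix of the suffixes at sa[k-1] and sa[k].

    sa holds 1-based start positions; lcp[0] is 0 because there is no previous suffix.
    """
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = s[sa[k - 1] - 1:], s[sa[k] - 1:]
        n = 0
        while n < min(len(a), len(b)) and a[n] == b[n]:
            n += 1
        lcp[k] = n
    return lcp


s = "banana$"
sa = [7, 6, 4, 2, 1, 5, 3]       # the suffix array from the table
print(lcp_array(s, sa))          # [0, 0, 1, 3, 0, 0, 2]
```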
Suffix tree and suffix array
string S = abcab$ (positions 1 to 6: a b c a b $)
SEGMENT TREE
Segment Tree
Segment Tree is a tree data structure for storing intervals, or
segments.
It allows querying which of the stored segments contain a given
point.
It is, in principle, a static structure; that is, its structure cannot be
modified once it is built.
A similar data structure is the Interval Tree.
Structure of a Segment Tree
The root of T will represent the whole array A[0, N-1].
Each leaf of T will represent a single element A[i], 0 <= i < N.
The internal nodes of T represent the intervals A[i, j], 0 <= i < j < N.
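(Figure: vertex 1 covers [0-7]; vertices 2 and 3 cover [0-3] and [4-7]; vertices 4 to 7 cover [0-1], [2-3], [4-5] and [6-7]; vertices 8 to 15 cover the single elements [0-0] to [7-7]. The children of vertex v are 2v and 2v+1.)
(Figure: the same tree storing interval sums for A = {8, 2, 6, 1, 9, 10, 3, 6}: [0-7] = 45; [0-3] = 17, [4-7] = 28; [0-1] = 10, [2-3] = 7, [4-5] = 19, [6-7] = 9; the leaves hold 8, 2, 6, 1, 9, 10, 3, 6.)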
Querying the Segment Tree
Example
A[8] = {8, 2, 6, 1, 9, 10, 3, 6}
The sum of [0, 6] is:
sum(0, 3) + sum(4, 5) + sum(6, 6) = 17 + 19 + 3 = 39
(Figure: the interval-sum tree with the vertices [0-3], [4-5] and [6-6] selected.)
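A recursive Python sketch of the range-sum segment tree used in the example, with vertex v covering an interval and its children at 2v and 2v + 1 as in the figure. The array of size 4n is a common safe upper bound for this numbering; the class name is hypothetical.

```python
class SegmentTree:
    """Range-sum segment tree; vertex v covers an interval, children are 2v and 2v+1."""

    def __init__(self, a):
        self.n = len(a)
        self.tree = [0] * (4 * self.n)
        self._build(a, 1, 0, self.n - 1)

    def _build(self, a, v, lo, hi):
        if lo == hi:
            self.tree[v] = a[lo]            # leaf: a single element
            return
        mid = (lo + hi) // 2
        self._build(a, 2 * v, lo, mid)
        self._build(a, 2 * v + 1, mid + 1, hi)
        self.tree[v] = self.tree[2 * v] + self.tree[2 * v + 1]

    def query(self, l, r, v=1, lo=0, hi=None):
        """Sum of a[l..r], both ends inclusive."""
        if hi is None:
            hi = self.n - 1
        if r < lo or hi < l:
            return 0                        # no overlap with [l, r]
        if l <= lo and hi <= r:
            return self.tree[v]             # vertex interval lies inside [l, r]
        mid = (lo + hi) // 2
        return (self.query(l, r, 2 * v, lo, mid)
                + self.query(l, r, 2 * v + 1, mid + 1, hi))


st = SegmentTree([8, 2, 6, 1, 9, 10, 3, 6])
print(st.tree[1])        # 45
print(st.query(0, 6))    # 39 = 17 + 19 + 3
```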
BINARY INDEXED TREE
Binary Indexed Tree
Binary Indexed Tree (Fenwick Tree) is a data structure used
to efficiently calculate and update cumulative frequency tables, or
prefix sums.
This structure was proposed by Peter Fenwick in 1994 to improve
the efficiency of arithmetic coding compression algorithms.
The idea of Binary Indexed Tree (BIT) is based on the fact that all
positive integers can be represented as sum of powers of 2.
Each node of the BIT stores the sum of a block of elements whose length is a power of 2.
parent[v] = v+1, if v is odd
parent[v] = 2*parent[v/2], if v is even
BIT example
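(Figure: the BIT for indices 1 to 16 drawn as a tree: 16 is the root with children 8, 12, 14 and 15; 8 has children 4, 6 and 7; 4 has children 2 and 3; 2 has child 1; 6 has child 5; 12 has children 10 and 11; 10 has child 9; 14 has child 13.)
parent[v] = v + (v & -v)
Why?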
Example
Given an array arr[0..n-1], perform two operations:
Compute the sum of the first i elements, arr[0] + … + arr[i-1].
Change the value of a specified element of the array, arr[i] += x, 0 <= i <= n-1.
Simple solutions:
Run a loop from 0 to i-1 to calculate the sum of the elements, and update a value with arr[i] += x. The first operation takes O(n) time and the second takes O(1) time.
Create another array that stores the sum from the start up to i at its ith index. The sum can then be calculated in O(1) time, but an update takes O(n) time.
A segment tree and a BIT perform both operations in O(log n) time.
The advantages of a BIT over a segment tree are that it requires less space and is very easy to implement.
Updating a BIT
Initialize index as index+1.
Do the following while index is smaller than or equal to n.
Add value to BIT[index].
Go to the parent of index: index = index + (index & -index).
(Figure: arr = 3 5 1 3 2 6 8 2 at positions 1 to 8; initially every BIT entry is 0.)
Updating the BIT with each element of arr in turn gives the following BIT[1..8] contents:
after adding arr[0] = 3: 3 3 0 3 0 0 0 3
after adding arr[1] = 5: 3 8 0 8 0 0 0 8
after adding arr[2] = 1: 3 8 1 9 0 0 0 9
after adding arr[3] = 3: 3 8 1 12 0 0 0 12
after adding arr[4] = 2: 3 8 1 12 2 2 0 14
after adding arr[5] = 6: 3 8 1 12 2 8 0 20
after adding arr[6] = 8: 3 8 1 12 2 8 8 28
after adding arr[7] = 2: 3 8 1 12 2 8 8 30
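A Python sketch of the update loop. In two's complement, -v flips every bit above the lowest set bit of v, so v & -v isolates that lowest set bit, which is why parent[v] = v + (v & -v); Python's & on negative integers follows the same two's-complement rule. The function name is hypothetical; building the BIT by updating with each element reproduces the final row above.

```python
def bit_update(bit, index, value):
    """Add value to arr[index] (0-based); bit[1..n] holds the BIT and bit[0] is unused."""
    index += 1                      # the BIT is 1-indexed
    while index < len(bit):         # while index <= n
        bit[index] += value
        index += index & -index     # go to the parent: v + (v & -v)


arr = [3, 5, 1, 3, 2, 6, 8, 2]
bit = [0] * (len(arr) + 1)
for i, x in enumerate(arr):
    bit_update(bit, i, x)
print(bit[1:])   # [3, 8, 1, 12, 2, 8, 8, 30]
```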
Querying the BIT
Initialize sum as 0 and index as index+1.
Do the following while index is greater than 0.
Add BIT[index] to sum.
Go to the largest index smaller than index that is not a child of index, i.e., index = index - (index & -index).
Return sum.
Example: the sum of arr[0..5] with BIT[1..8] = 3 8 1 12 2 8 8 30
index = 5 + 1 = 6: sum = sum + BIT[6] = 8
index = 6 - 2 = 4: sum = sum + BIT[4] = 20
index = 4 - 4 = 0: stop and return 20 (= 3 + 5 + 1 + 3 + 2 + 6)
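The query loop as a Python sketch: subtracting index & -index clears the lowest set bit, moving to the next block to the left. A range sum follows from two prefix sums. The function names are hypothetical; the BIT contents are the ones built for arr = [3, 5, 1, 3, 2, 6, 8, 2].

```python
def bit_prefix_sum(bit, index):
    """Return arr[0] + ... + arr[index] (0-based index); bit[0] is unused."""
    total = 0
    index += 1                      # the BIT is 1-indexed
    while index > 0:
        total += bit[index]
        index -= index & -index     # clear the lowest set bit
    return total


def range_sum(bit, lo, hi):
    """Return arr[lo] + ... + arr[hi] as a difference of two prefix sums."""
    return bit_prefix_sum(bit, hi) - (bit_prefix_sum(bit, lo - 1) if lo > 0 else 0)


bit = [0, 3, 8, 1, 12, 2, 8, 8, 30]   # BIT for arr = [3, 5, 1, 3, 2, 6, 8, 2]
print(bit_prefix_sum(bit, 5))          # 20 (BIT[6] + BIT[4])
print(range_sum(bit, 2, 4))            # 6 (1 + 3 + 2)
```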