
5 B tree


DATA STRUCTURE

DR. ACHAL KAUSHIK

1 10/27/2023
Paper Code(s): CIC-209    L: 4    P: -    C: 4
Paper: Data Structures
Marking Scheme:
1. Teachers Continuous Evaluation: 25 marks
2. Term end Theory Examinations: 75 marks
Instructions for paper setter:
1. There should be 9 questions in the term end examinations question paper.
2. The first (1st) question should be compulsory and cover the entire syllabus. This question should be objective, single line answers or short
answer type question of total 15 marks.
3. Apart from question 1 which is compulsory, rest of the paper shall consist of 4 units as per the syllabus. Every unit shall have two questions
covering the corresponding unit of the syllabus. However, the student shall be asked to attempt only one of the two questions in the unit.
Individual questions may contain up to 5 sub-parts / sub-questions. Each Unit shall have a marks weightage of 15.
4. The questions are to be framed keeping in view the learning outcomes of the course / paper. The standard / level of the questions to be asked
should be at the level of the prescribed textbook.
5. The requirement of (scientific) calculators / log‐tables / data – tables may be specified if required.
Course Objectives :
1. To introduce basics of Data structures (Arrays, strings, linked list etc.)
2. To understand the concepts of Stacks, Queues and Trees, related operations and their implementation
3. To understand sets, heaps and graphs
4. To introduce various Sorting and searching Algorithms
Course Outcomes (CO)
CO 1 To be able to understand difference between structured data and data structure
CO 2 To be able to create common basic data structures and trees
CO 3 To have a knowledge of sets, heaps and graphs
CO 4 To have basic knowledge of sorting and searching algorithms
Course Outcomes (CO) to Programme Outcomes (PO) mapping (scale 1: low, 2: medium, 3: high)
       PO01  PO02  PO03  PO04  PO05  PO06  PO07  PO08  PO09  PO10  PO11  PO12
CO 1    3     2     2     2     3     -     -     -     2     2     2     3
CO 2    3     2     2     2     3     -     -     -     2     2     2     3
CO 3    3     2     2     2     3     -     -     -     2     2     2     3
CO 4    3     2     2     2     3     -     -     -     2     2     2     3
UNIT – I
Overview of data structure, Basics of Algorithm Analysis including Running Time Calculations, Abstract Data Types, Arrays, Arrays and Pointers,
Multidimensional Array, String processing, General Lists and List ADT, List manipulations, Single, double and circular lists. Stacks and Stack ADT,
Stack Manipulation, Prefix, infix and postfix expressions, recursion. Queues and Queue ADT, Queue manipulation.

UNIT – II
Sparse Matrix Representation (Array and Link List representation) and arithmetic (addition, subtraction and multiplication), polynomials and
polynomial arithmetic.
Trees, Properties of Trees, Binary trees, Binary Tree traversal, Tree manipulation algorithms, Expression trees and their usage, binary search trees,
AVL Trees, Heaps and their implementation, Priority Queues, B‐Trees, B* Tree, B+ Tree

UNIT – III
Sorting concept, order, stability, Selection sorts (straight, heap), insertion sort (Straight Insertion, Shell sort), Exchange Sort (Bubble, quicksort),
Merge sort (External Sorting) (Natural merge, balanced merge and polyphase merge). Searching – List search, sequential search, binary search,
hashing methods, collision resolution in hashing.

UNIT – IV
Disjoint sets representation, union find algorithm, Graphs, Graph representation, Graph Traversals and their implementations (BFS and DFS).
Minimum Spanning Tree algorithms, Shortest Path Algorithms

Textbook(s):
1. Richard Gilberg, Behrouz A. Forouzan, “Data Structures: A Pseudocode Approach with C”, 2nd Edition, Cengage Learning, Oct 2004
2. E. Horowitz, S. Sahni, S. Anderson‐Freed, "Fundamentals of Data Structures in C", 2nd Edition, Silicon Press (US), 2007.

References:
1. Mark Allen Weiss, “Data Structures and Algorithm Analysis in C”, 2nd Edition, Pearson, September, 1996
2. Robert Kruse, “Data Structures and Program Design in C”, 2nd Edition, Pearson, November, 1990
3. Seymour Lipschutz, “Data Structures with C (Schaum's Outline Series)”, McGraw-Hill, 2017
4. A. M. Tenenbaum, “Data structures using C”. Pearson Education, India, 1st Edition 2003.
5. Weiss M.A., “Data structures and algorithm analysis in C++”, Pearson Education, 2014.
Data Structure

- A data structure is a way of collecting and organizing data so that operations on that data can be performed efficiently
- In simple language, data structures are structures programmed to store ordered data, so that various operations can be performed on it easily

Multi‐way Trees

- The data structures discussed so far keep their data in internal (main) memory
  - They support internal information retrieval
- Retrieval and manipulation of data stored in external memory need special data structures
  - M-way search tree
  - B-tree
  - B+ tree

M‐way Tree

- Generalized version of the BST
- A BST has one value per node and two sub-trees
- An M-way tree has
  - up to M-1 values per node and up to M sub-trees
  - M is called the degree of the tree (a BST has M = 2)
- Every internal node of an M-way search tree holds pointers to at most M sub-trees and contains at most (M-1) keys, where M > 2

Structure of M‐way tree

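P0 K0 P1 K1 P2 K2 … Pn-1 Kn-1 Pn

- P0, P1, P2, …, Pn are pointers to the node's sub-trees
- K0, K1, K2, …, Kn-1 are the key values of the node
- All key values are stored in ascending order: K0 < K1 < K2 < … < Kn-1
- Every key in the sub-tree at Pi is smaller than Ki, and every key in the sub-tree at Pi+1 is larger than Ki
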
M‐way search tree

- In an M-way search tree
  - a node need not have exactly M-1 values and M sub-trees
  - a node can hold anywhere from 1 to (M-1) values
  - a node can have anywhere from 0 (leaf node) to (i + 1) sub-trees, where i is the number of key values in the node
- M is a fixed upper limit that defines how many key values can be stored in a node

M‐way Search tree

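                      [|15|50|]

    [|5|12|]          [|20|30|]          [|55|80|]

                      [|25|28|]                    [|85|90|]

(25 28 is the middle sub-tree of 20 30; 85 90 is the right sub-tree of 55 80)

- M = 3
- Every sub-tree is also an M-way search tree and follows the same rule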
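
The search rule implied by the ordered keys and sub-tree pointers can be sketched in Python as follows. This is a minimal illustration and not part of the original slides: the `MWayNode` class, the `search` function, and the use of `None` for an empty sub-tree are assumptions made for the example. The keys are those of the M = 3 tree above.

```python
from bisect import bisect_left


class MWayNode:
    """Node of an M-way search tree: sorted keys plus len(keys) + 1 child slots."""

    def __init__(self, keys, children=None):
        self.keys = keys
        # Assumption of this sketch: a leaf is modelled by an empty child list.
        self.children = children or []


def search(node, key):
    """Return True if key is stored in the sub-tree rooted at node."""
    while node is not None:
        i = bisect_left(node.keys, key)       # first position with keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if not node.children:                 # reached a leaf without a match
            return False
        node = node.children[i]               # descend into sub-tree P_i
    return False                              # ran into an empty sub-tree


# The M = 3 example tree (None marks an empty sub-tree).
tree = MWayNode([15, 50], [
    MWayNode([5, 12]),
    MWayNode([20, 30], [None, MWayNode([25, 28]), None]),
    MWayNode([55, 80], [None, None, MWayNode([85, 90])]),
])

print(search(tree, 28), search(tree, 85), search(tree, 40))  # True True False
```

Each step inspects one node and then follows exactly one pointer, so the number of nodes visited is bounded by the height of the tree.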

B‐Tree

- Specialized M-way search tree, due to Rudolf Bayer & Ed McCreight
- Widely used for disk access
- A node of a B-tree of order m can have
  - a maximum of m - 1 keys
  - a maximum of m pointers to its sub-trees
- Storing a large number of keys in a single node keeps the height of the tree relatively small

B‐tree properties

- Similar to a BST
- Has all the properties of an M-way search tree
- Every node has at most m children
- Every node except the root and the leaves has at least ceil(m/2) children
- The root has at least two children, unless it is itself a leaf (terminal node)
- All leaf nodes are at the same level

Insert: B‐tree

- All insertions are done at the leaf level
- Search for the leaf node where the new key value belongs
- If the leaf node is not full (fewer than m-1 key values), then
  - insert the new key into the node, keeping the keys ordered
- If the leaf node is full (m-1 key values), then
  - insert the new key in order into the existing set of keys
  - split the node at its median into two nodes (each split node is about half full)
  - push the median key up into the parent node
  - if the parent node is already full, split the parent node in the same way

B‐tree of order 5
- Insert 8, 9, 39, 4

Before:
                        [|18|45|72|]
[|7|11|]  [|21|27|36|42|]  [|54|63|]  [|81|89|90|]

After:
                               [|36|]
             [|8|18|]                        [|45|72|]
[|4|7|]  [|9|11|]  [|21|27|]      [|39|42|]  [|54|63|]  [|81|89|90|]
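
The insertion steps (insert at the leaf, split an overfull node at its median, push the median up) can be sketched in Python and checked against the order-5 example above. This is an illustrative sketch and not the slides' own code: the class names `BTreeNode` and `BTree`, the helper `levels`, and the choice of `len(keys) // 2` as the median index are assumptions, and keys are assumed to be distinct.

```python
from bisect import bisect_left, insort


class BTreeNode:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []       # empty list means this node is a leaf


class BTree:
    def __init__(self, order):
        self.order = order                   # a node may hold at most order - 1 keys
        self.root = BTreeNode()

    def insert(self, key):
        split = self._insert(self.root, key)
        if split:                            # the root itself was split: add a level
            median, right = split
            self.root = BTreeNode([median], [self.root, right])

    def _insert(self, node, key):
        if not node.children:
            insort(node.keys, key)           # insert in order at the leaf
        else:
            i = bisect_left(node.keys, key)
            split = self._insert(node.children[i], key)
            if split:                        # child was split: absorb its median
                median, right = split
                node.keys.insert(i, median)
                node.children.insert(i + 1, right)
        if len(node.keys) < self.order:      # still at most order - 1 keys
            return None
        mid = len(node.keys) // 2            # split the overfull node at its median
        median = node.keys[mid]
        right = BTreeNode(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
        return median, right


def levels(tree):
    """Return the keys of every node, grouped level by level."""
    out, level = [], [tree.root]
    while level:
        out.append([n.keys for n in level])
        level = [c for n in level for c in n.children]
    return out


# Start from the "before" tree of the example, then insert 8, 9, 39, 4.
tree = BTree(order=5)
tree.root = BTreeNode([18, 45, 72], [
    BTreeNode([7, 11]), BTreeNode([21, 27, 36, 42]),
    BTreeNode([54, 63]), BTreeNode([81, 89, 90]),
])
for k in (8, 9, 39, 4):
    tree.insert(k)

for row in levels(tree):
    print(row)
```

Inserting 39 splits the leaf 21 27 36 42 around 36, and inserting 4 splits first the leftmost leaf (median 8) and then the overfull root (median 36), which is how the tree gains a level.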
Delete: B‐tree

- Deletions are also done (virtually) at the leaf level
- Two cases:
  - the key is in a leaf node
  - the key is in an internal node

Deleting from a leaf node
1. Locate the leaf node that contains the key to be deleted
2. If the leaf node contains more than the minimum number of key values (ceil(m/2) - 1 key values),
   - then delete the key
3. Else, if the deletion leaves the leaf with fewer than the minimum number of keys, then
   - refill the node by taking a key from either the left or the right sibling
   - If the left sibling has more than the minimum number of key values
     - push its largest key up into the parent node and
     - pull the intervening key down from the parent node into the leaf node where the key was deleted
   - Else if the right sibling has more than the minimum number of key values
     - push its smallest key up into the parent node and
     - pull the intervening key down from the parent node into the leaf node where the key was deleted
   - Else (neither sibling can spare a key) merge the leaf, one sibling and the intervening parent key into a single node, and repair the parent in the same way if it now has too few keys
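
The borrowing step can be sketched in Python on plain lists. This is a simplified illustration, not the slides' own code: the function name `btree_leaf_delete`, its return strings, and the list-of-lists representation of one parent with its leaves are assumptions, and the merge case is only reported, not carried out. The sample data is the intermediate state reached in the order-5 deletion example of this deck just before 180 is deleted (201 having already been replaced by its successor 243).

```python
def btree_leaf_delete(parent_keys, leaves, i, key, order):
    """Delete key from leaves[i]; if the leaf underflows, borrow through the parent.

    parent_keys[j] is the key that sits between leaves[j] and leaves[j + 1].
    Returns 'ok', 'borrowed-left', 'borrowed-right' or 'merge-needed'.
    """
    minimum = (order + 1) // 2 - 1            # ceil(order / 2) - 1 keys
    leaf = leaves[i]
    leaf.remove(key)
    if len(leaf) >= minimum:
        return "ok"
    if i > 0 and len(leaves[i - 1]) > minimum:
        leaf.insert(0, parent_keys[i - 1])    # pull the intervening key down
        parent_keys[i - 1] = leaves[i - 1].pop()     # push left sibling's largest key up
        return "borrowed-left"
    if i + 1 < len(leaves) and len(leaves[i + 1]) > minimum:
        leaf.append(parent_keys[i])           # pull the intervening key down
        parent_keys[i] = leaves[i + 1].pop(0)        # push right sibling's smallest key up
        return "borrowed-right"
    return "merge-needed"                     # both siblings are at the minimum


parent = [117, 243]
leaves = [[111, 114], [151, 180], [256, 333, 450]]
result = btree_leaf_delete(parent, leaves, 1, 180, order=5)
print(result, parent, leaves)
# borrowed-right [117, 256] [[111, 114], [151, 243], [333, 450]]
```

The borrowed key never moves directly between siblings: it always travels through the parent, which keeps the ordering between the parent key and both leaves intact.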
Deleting an internal node

- Promote the successor or predecessor of the key being deleted into the position of the deleted key
- The successor or predecessor is always in a leaf node
- Processing then continues as if that key had been deleted from the leaf node

B‐tree of order 5
- Delete 93, 201, 180, 72

Before:
                                  [|108|]
           [|63|81|]                               [|117|201|]
[|36|45|]  [|72|79|]  [|90|93|101|]    [|111|114|]  [|151|180|]  [|243|256|333|450|]

After:
                      [|81|108|117|256|]
[|36|45|63|79|]  [|90|101|]  [|111|114|]  [|151|243|]  [|333|450|]

Example: B‐tree

- B-tree of order 5
- Insert: 3, 14, 7, 1, 8, 5, 11, 17, 13, 6, 23, 12, 20, 26, 4, 16, 18, 24, 25, 19

Example: B‐tree

- B-tree of order 4
- Insert: 3, 14, 7, 1, 8, 5, 11, 17, 13, 6, 23, 12, 20, 26, 4, 16, 18, 24, 25, 19

2‐3 tree

- B-tree of order 3
- Each internal node other than the root has either two or three sub-trees
- The root may have zero, two or three sub-trees
- Complete tree:

             [|33|72|]
[|17|23|]    [|42|68|]    [|85|96|]

2‐3 tree

 2‐3 tree with empty entries


[|33|72|]

[|14|21|] [| 44|65|] [|94| |]

[|11|13|] [|17| |] [|24|28|] [|35|42|] [|46|58|] [|68| |] [|85| |] [|96|98|]

23 10/27/2023
2‐3‐4 tree

- B-tree of order 4
- Each internal node can have two, three or four sub-trees

                           [|42|]
        [|16|21|]                       [|57|78|91|]
[|11|] [|17|19|] [|21|23|24|]    [|45|] [|63|65|] [|85|90|] [|95|]

B* tree

- In a B-tree, up to 50% of the entries in a node may be empty
- A B* tree sets the minimum fill at two-thirds
- When a node overflows, data are redistributed among its siblings
- Splitting occurs only when all the siblings are full
- When nodes are split
  - the data from two full siblings are divided among those two nodes and one new node,
  - so that all three nodes end up about two-thirds full

B* tree

Before:
             [|21|78|]
[|8|10|]  [|25|32|53|76|]  [|93|99|]

- Add 30

After:
               [|25|78|]
[|8|10|21|]  [|30|32|53|76|]  [|93|99|]
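
The "redistribute before splitting" rule can be sketched in Python using the Add 30 example above. This is a simplified illustration under stated assumptions: the function name `bstar_insert` and its return strings are hypothetical, only one key is moved per overflow (which is enough to reproduce the example, whereas an implementation may rebalance the siblings more evenly), and the two-into-three split is only reported.

```python
from bisect import insort


def bstar_insert(parent_keys, children, i, key, max_keys):
    """Insert key into children[i]; on overflow, shift one key to a sibling with room.

    parent_keys[j] is the key that sits between children[j] and children[j + 1].
    """
    node = children[i]
    insort(node, key)
    if len(node) <= max_keys:
        return "inserted"
    if i > 0 and len(children[i - 1]) < max_keys:
        children[i - 1].append(parent_keys[i - 1])   # parent key moves down to the left
        parent_keys[i - 1] = node.pop(0)             # node's smallest key moves up
        return "redistributed-left"
    if i + 1 < len(children) and len(children[i + 1]) < max_keys:
        children[i + 1].insert(0, parent_keys[i])    # parent key moves down to the right
        parent_keys[i] = node.pop()                  # node's largest key moves up
        return "redistributed-right"
    return "split-needed"                            # all adjacent siblings are full


parent = [21, 78]
children = [[8, 10], [25, 32, 53, 76], [93, 99]]
outcome = bstar_insert(parent, children, 1, 30, max_keys=4)
print(outcome, parent, children)
# redistributed-left [25, 78] [[8, 10, 21], [30, 32, 53, 76], [93, 99]]
```

No node is created here, which is the point of the B* rule: a split is postponed for as long as a sibling has free space.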

REVISE: B‐tree

- Reading a whole block takes as much time as reading part of a block
- A block can hold a large number of keys/pointers (128-1024)
- The order m is therefore often very large (m = 128-1024)
  - a single node can contain 127-1023 keys
  - and 128-1024 pointers to child nodes
- Why large m?
  - Disk access is slow, so as much data as possible should be fetched in one disk access
  - Disks are block oriented, so a single node can occupy an entire block
  - A large value of m minimizes the height of the tree
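
To make the effect of a large m concrete, the following Python sketch counts how many levels a completely full tree of order m needs to hold a given number of keys. The function name `min_levels` and the figure of one million keys are assumptions chosen for illustration; real trees are rarely completely full, so actual heights are somewhat larger.

```python
def min_levels(n_keys, order):
    """Fewest levels a tree of the given order needs to hold n_keys keys.

    A full tree has order**(level) nodes on each level (root = level 0),
    and every node holds order - 1 keys.
    """
    levels, capacity, nodes_on_level = 0, 0, 1
    while capacity < n_keys:
        capacity += nodes_on_level * (order - 1)
        nodes_on_level *= order
        levels += 1
    return levels


for m in (2, 128, 1024):
    print(m, min_levels(1_000_000, m))
# 2 20
# 128 3
# 1024 2
```

With one node per disk block, a lookup among a million keys costs at least 20 block reads for m = 2 but only 2 or 3 for m in the 128-1024 range.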

B + ‐ Tree

- Disks work by reading and writing whole blocks of data at once
  - typically 512 bytes or four kilobytes
- A node of a binary search tree uses only a small fraction of a block
  - so it makes sense to look for a structure that fits more neatly into a disk block
- B+ tree: each node stores up to m references to children and up to m - 1 keys

B + ‐ Tree

- Every leaf is at the same distance from the root
- Searching for any of the 11 values (all listed on the bottom level)
  - involves loading three nodes from the disk
  - (the root block, a second-level block, and a leaf)

B + tree

- Every node has one more reference (child) than it has keys
- All leaves are at the same distance from the root
- The root has at least two children
- Every non-leaf, non-root node has at least ceil(m / 2) children
- Each leaf contains at least floor(m / 2) keys
- Every key from the table appears in a leaf, in left-to-right sorted order
- For every non-leaf node N, with k being the number of keys in N:
  - all keys in the first child's subtree are less than N's first key;
  - all keys in the i-th child's subtree (2 ≤ i ≤ k) are between the (i − 1)-th key of N and the i-th key of N;
  - all keys in the last, (k + 1)-th, child's subtree are greater than or equal to N's k-th key

B + tree

- m = 4
- Each leaf has at least two keys, and
- each internal node has at least two children (and thus at least one key)

B + tree: Insertion algorithm

- Descend to the leaf where the key fits
- If the node has an empty space
  - insert the key/reference pair into the node
- If the node is already full,
  - split it into two nodes,
  - distributing the keys evenly between the two nodes
  - If the node is a leaf,
    - take a copy of the minimum value in the second of these two nodes and
    - repeat this insertion algorithm to insert it into the parent node
  - If the node is a non-leaf,
    - exclude the middle value during the split and
    - repeat this insertion algorithm to insert this excluded value into the parent node

B + Tree: Insertion
Data page full   Index page full   Action
(leaf)           (internal)
NO               NO                Place the record in sorted position in the appropriate leaf page
YES              NO                1. Split the leaf page
                                   2. Place the middle key in the index page in sorted order
                                   3. The left leaf page contains records with keys below the middle key
                                   4. The right leaf page contains records with keys equal to or greater than the middle key
YES              YES               1. Split the leaf page
                                   2. Records with keys < middle key go to the left leaf page
                                   3. Records with keys >= middle key go to the right leaf page
                                   4. Split the index page
                                   5. Keys < middle key go to the left index page
                                   6. Keys > middle key go to the right index page
                                   7. The middle key goes to the next (higher-level) index page
                                   If the next-level index page is full, continue splitting the index pages

B + tree (m = 4)

**** Insert 5 ****
[|5|]

**** Insert 9 ****
[|5|9|]

**** Insert 1 ****
[|1|5|9|]

**** Insert 3 ****
[|5|]
[|1|3|] [|5|9|]

**** Insert 4 ****
[|5|]
[|1|3|4|] [|5|9|]

**** Insert 59 ****
[|5|]
[|1|3|4|] [|5|9|59|]

**** Insert 65 ****
[|5|59|]
[|1|3|4|] [|5|9|] [|59|65|]

**** Insert 45 ****
[|5|59|]
[|1|3|4|] [|5|9|45|] [|59|65|]

**** Insert 89 ****
[|5|59|]
[|1|3|4|] [|5|9|45|] [|59|65|89|]

**** Insert 29 ****
[|5|29|59|]
[|1|3|4|] [|5|9|] [|29|45|] [|59|65|89|]

**** Insert 68 ****
[|59|]
[|5|29|] [|68|]
[|1|3|4|] [|5|9|] [|29|45|] [|59|65|] [|68|89|]

**** Insert 108 ****
[|59|]
[|5|29|] [|68|]
[|1|3|4|] [|5|9|] [|29|45|] [|59|65|] [|68|89|108|]

**** Insert 165 ****
[|59|]
[|5|29|] [|68|108|]
[|1|3|4|] [|5|9|] [|29|45|] [|59|65|] [|68|89|] [|108|165|]

**** Insert 298 ****
[|59|]
[|5|29|] [|68|108|]
[|1|3|4|] [|5|9|] [|29|45|] [|59|65|] [|68|89|] [|108|165|298|]

**** Insert 219 ****
[|59|]
[|5|29|] [|68|108|219|]
[|1|3|4|] [|5|9|] [|29|45|] [|59|65|] [|68|89|] [|108|165|] [|219|298|]

**** Insert 569 ****
[|59|]
[|5|29|] [|68|108|219|]
[|1|3|4|] [|5|9|] [|29|45|] [|59|65|] [|68|89|] [|108|165|] [|219|298|569|]

**** Insert 37 ****
[|59|]
[|5|29|] [|68|108|219|]
[|1|3|4|] [|5|9|] [|29|37|45|] [|59|65|] [|68|89|] [|108|165|] [|219|298|569|]

**** Insert 47 ****
[|59|]
[|5|29|45|] [|68|108|219|]
[|1|3|4|] [|5|9|] [|29|37|] [|45|47|] [|59|65|] [|68|89|] [|108|165|] [|219|298|569|]
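
The B+ tree insertion rules can be replayed in Python against the m = 4 walk-through above. This is a sketch under stated assumptions and not the slides' own code: the class names `Node` and `BPlusTree`, the helper `levels`, and the split point `len(keys) // 2` are choices made for the example (they happen to reproduce every tree in the walk-through), keys are assumed distinct, and the leaf-to-leaf links and record references are left out.

```python
from bisect import bisect_right, insort


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children             # None for a leaf


class BPlusTree:
    def __init__(self, order):
        self.order = order                   # a node may hold at most order - 1 keys
        self.root = Node([])

    def insert(self, key):
        split = self._insert(self.root, key)
        if split:                            # the root was split: add a new root
            sep, right = split
            self.root = Node([sep], [self.root, right])

    def _insert(self, node, key):
        if node.children is None:            # leaf
            insort(node.keys, key)
            if len(node.keys) < self.order:
                return None
            mid = len(node.keys) // 2
            right = Node(node.keys[mid:])
            node.keys = node.keys[:mid]
            return right.keys[0], right      # COPY the right node's minimum upward
        i = bisect_right(node.keys, key)     # keys equal to a separator go right
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        sep, right = split
        node.keys.insert(i, sep)
        node.children.insert(i + 1, right)
        if len(node.keys) < self.order:
            return None
        mid = len(node.keys) // 2
        sep = node.keys[mid]                 # the middle key MOVES up (excluded from both halves)
        right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
        return sep, right


def levels(tree):
    """Return the keys of every node, grouped level by level."""
    out, level = [], [tree.root]
    while level:
        out.append([n.keys for n in level])
        level = [c for n in level for c in (n.children or [])]
    return out


tree = BPlusTree(order=4)
for k in (5, 9, 1, 3, 4, 59, 65, 45, 89, 29, 68, 108, 165, 298, 219, 569, 37, 47):
    tree.insert(k)

for row in levels(tree):
    print(row)
```

The only difference between the two split branches is what happens to the separating key: a leaf split keeps it in the right leaf and copies it upward, while an internal split removes it from both halves.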

B + tree example: insertion

- Order: m = 4
- Insert: 1, 4, 16, 25, 9, 20, 13, 15, 10, 11, 12

B + tree: Deletion

- Start at the root and go down to the leaf node containing the key K
- Let n be a node on the path from the root to that leaf that contains K
- A. If n is the root, remove K
  - a. If the root has more than one key, done
  - b. If the root has only K
    - i) If any of its child nodes can lend a key
      - borrow a key from the child and adjust the child links
    - ii) Otherwise merge the child nodes; the merged node becomes the new root

B + tree

- B. If n is an internal node, remove K
  - i) If n still has at least ceil(m/2) children (ceil(m/2) - 1 keys), done!
  - ii) If n has fewer than ceil(m/2) - 1 keys,
    - If a sibling can lend a key,
      - borrow a key from the sibling and adjust the keys in n and the parent node
      - adjust the child links
    - Else
      - merge n with its sibling
      - adjust the child links

B + tree

- C. If n is a leaf node, remove K
  - i) If n still has at least ceil(m/2) elements, done!
    - In case the smallest key was deleted, push up the next key
  - ii) If n has fewer than ceil(m/2) elements
    - If a sibling can lend a key
      - borrow a key from the sibling and adjust the keys in n and its parent node
    - Else
      - merge n and its sibling
      - adjust the keys in the parent node
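
Case C can be sketched in Python for one parent and its leaf children. This is an illustration under stated assumptions rather than the slides' own code: the function name `bplus_leaf_delete` and the list-of-lists representation are hypothetical, the key is assumed to be present, the parent is assumed to have at least two leaves, and repairing the parent or higher levels (cases A and B) is not attempted. The sample data comes from the m = 4 deletion walk-through in this deck.

```python
from bisect import bisect_right


def bplus_leaf_delete(parent_keys, leaves, key, min_keys):
    """Delete key from the leaf that holds it, then repair an underflow.

    parent_keys[i - 1] is the separator for leaves[i], i.e. a copy of its smallest key.
    """
    i = bisect_right(parent_keys, key)               # index of the leaf holding key
    leaf = leaves[i]
    leaf.remove(key)
    if len(leaf) < min_keys:
        if i > 0 and len(leaves[i - 1]) > min_keys:
            leaf.insert(0, leaves[i - 1].pop())      # borrow largest key of left sibling
        elif i + 1 < len(leaves) and len(leaves[i + 1]) > min_keys:
            leaf.append(leaves[i + 1].pop(0))        # borrow smallest key of right sibling
            parent_keys[i] = leaves[i + 1][0]        # right sibling has a new smallest key
        elif i > 0:
            leaves[i - 1].extend(leaf)               # merge this leaf into its left sibling
            del leaves[i], parent_keys[i - 1]
            return
        else:
            leaf.extend(leaves[i + 1])               # merge the right sibling into this leaf
            del leaves[i + 1], parent_keys[i]
            return
    if i > 0:
        parent_keys[i - 1] = leaf[0]                 # separator follows the leaf's smallest key


# Delete 298: the last leaf underflows and borrows 65 from its left sibling.
keys_a = [51, 59, 165]
leaves_a = [[45, 47, 49], [51, 53, 57], [59, 61, 65], [165, 298]]
bplus_leaf_delete(keys_a, leaves_a, 298, min_keys=2)
print(keys_a, leaves_a)
# [51, 59, 65] [[45, 47, 49], [51, 53, 57], [59, 61], [65, 165]]

# Delete 3: no sibling can lend, so the leaf is merged with its right sibling.
keys_b = [5, 29]
leaves_b = [[3, 4], [5, 9], [29, 37, 43]]
bplus_leaf_delete(keys_b, leaves_b, 3, min_keys=2)
print(keys_b, leaves_b)
# [29] [[4, 5, 9], [29, 37, 43]]
```

Unlike the B-tree case, a borrowed key moves directly between the leaves, and the parent only receives an updated copy of the smallest key of the affected leaf.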

Deletion

Data page below   Index page below   Action
fill factor       fill factor
NO                NO                 Delete the record from the leaf page
                                     Arrange the keys in ascending order to fill the void
                                     If the key of the deleted record appears in the index page, use the next key to replace it
YES               NO                 Combine the leaf page and its sibling
                                     Change the index page to reflect the change
YES               YES                1. Combine the leaf page and its sibling
                                     2. Adjust the index page to reflect the change
                                     3. Combine the index page with its sibling
                                     Continue combining index pages until you reach a page with the correct fill factor or you reach the root page

B + tree: Delete (m=4)

[|45|]
[|5|29|] [|51|59|165|]
[|1|3|4|] [|5|9|] [|29|37|43|] [|45|47|49|] [|51|53|57|] [|59|61|65|] [|165|298|]

---- Delete 298 ----
[|45|]
[|5|29|] [|51|59|65|]
[|1|3|4|] [|5|9|] [|29|37|43|] [|45|47|49|] [|51|53|57|] [|59|61|] [|65|165|]

---- Delete 45 ----
[|47|]
[|5|29|] [|51|59|65|]
[|1|3|4|] [|5|9|] [|29|37|43|] [|47|49|] [|51|53|57|] [|59|61|] [|65|165|]

---- Delete 1 ----
[|47|]
[|5|29|] [|51|59|65|]
[|3|4|] [|5|9|] [|29|37|43|] [|47|49|] [|51|53|57|] [|59|61|] [|65|165|]

---- Delete 3 ----
[|47|]
[|29|] [|51|59|65|]
[|4|5|9|] [|29|37|43|] [|47|49|] [|51|53|57|] [|59|61|] [|65|165|]

---- Delete 47 ----
[|43|]
[|29|] [|51|59|65|]
[|4|5|9|] [|29|37|] [|43|49|] [|51|53|57|] [|59|61|] [|65|165|]

---- Delete 53 ----
[|43|]
[|29|] [|51|59|65|]
[|4|5|9|] [|29|37|] [|43|49|] [|51|57|] [|59|61|] [|65|165|]

---- Delete 37 ----
[|43|]
[|9|] [|51|59|65|]
[|4|5|] [|9|29|] [|43|49|] [|51|57|] [|59|61|] [|65|165|]

---- Delete 165 ----
[|43|]
[|9|] [|51|59|]
[|4|5|] [|9|29|] [|43|49|] [|51|57|] [|59|61|65|]

---- Delete 57 ----
[|43|]
[|9|] [|51|61|]
[|4|5|] [|9|29|] [|43|49|] [|51|59|] [|61|65|]

---- Delete 49 ----
[|43|]
[|9|] [|61|]
[|4|5|] [|9|29|] [|43|51|59|] [|61|65|]

Example: deletion

- Order: 4
- Delete: 13, 15, 1

B + trees vs. B tree

- B+ trees store data pointers ONLY in leaf nodes
  - so the interior nodes of a B+ tree can fit more keys in a block of memory than those of a B-tree
- The leaf nodes of a B+ tree are linked
  - a linear scan of all keys requires just one pass through all the leaf nodes
  - a B-tree would require a traversal of every level in the tree
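
The benefit of linked leaves for scans can be sketched in Python. This is a minimal illustration with assumed names (`Leaf`, `link_leaves`, `range_scan`); for brevity it starts from the leftmost leaf, whereas a complete implementation would first descend from the root to the leaf holding the lower bound. The leaf contents are those of the final tree in the deletion walk-through of this deck.

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None          # link to the leaf on the right


def link_leaves(key_lists):
    """Build a chain of leaves from left to right and return the leftmost one."""
    leaves = [Leaf(keys) for keys in key_lists]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves[0]


def range_scan(leaf, low, high):
    """Yield every key k with low <= k <= high by walking the leaf chain."""
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return            # keys are sorted, so nothing further can match
            if key >= low:
                yield key
        leaf = leaf.next


head = link_leaves([[4, 5], [9, 29], [43, 51, 59], [61, 65]])
print(list(range_scan(head, 9, 59)))   # [9, 29, 43, 51, 59]
```

The scan never revisits an interior node: once the starting leaf is known, only the `next` links are followed.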
