B+ Tree Rules
1. The B+-tree index structure is the most widely used of several index
structures that maintain their efficiency despite insertion and deletion of
data.
2. A B+ Tree combines features of ISAM (Indexed Sequential Access Method)
and B Trees. It contains index pages and data pages. The data pages always
appear as leaf nodes in the tree. The root node and intermediate nodes are
always index pages. These features are similar to ISAM. Unlike ISAM,
overflow pages are not used in B+ trees.
3. The index pages in a B+ tree are constructed through the process of inserting
and deleting records. Thus, B+ trees grow and contract like their B Tree
counterparts. The contents and the number of index pages reflect this
growth and shrinkage.
4. A B+ tree index takes the form of a balanced tree in which every path from
the root of the tree to a leaf of the tree is of the same length.
The letter B stands for balanced, reflecting this property. This balance
property ensures good performance for lookup, insertion and deletion.
5. Data records are only stored in the leaves.
6. Internal nodes store just key values.
7. Keys are used to direct a search to the proper leaf.
8. If a target key is less than a key in an internal node, then the pointer just to
its left side is followed.
9. If a target key is greater than or equal to a key in an internal node, the
pointer just to its right side is followed.
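Rules 7 to 9 can be sketched in Python as follows. This is only a minimal
illustration, not a full B+ tree: the function name child_index and the node
layout (a sorted list of keys with len(keys) + 1 child pointers) are
assumptions made for the example.

```python
from bisect import bisect_right

def child_index(keys, target):
    """Return the index of the child pointer to follow in an internal node.

    Assumed layout: `keys` is sorted and the node has len(keys) + 1
    children, with child i sitting just to the left of keys[i].
    """
    # bisect_right counts the keys that are <= target, so a target equal
    # to a key is sent to the pointer on that key's right side (rule 9),
    # and a target smaller than a key stays on its left side (rule 8).
    return bisect_right(keys, target)

keys = [7, 13]                 # hypothetical internal node
print(child_index(keys, 5))    # 0: 5 < 7, leftmost pointer
print(child_index(keys, 7))    # 1: 7 >= 7, pointer just right of 7
print(child_index(keys, 20))   # 2: 20 >= 13, rightmost pointer
```

Repeating this choice at every internal node, from the root downwards, leads
the search to the single leaf that may contain the target key.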
10. B+ Trees and B Trees use a "fill factor" to control growth and
shrinkage. A 50% fill factor is the minimum for any B+ or B tree. In
our example, we use the smallest page structure, which means that our B+ tree
conforms to the following guidelines.
11. A root node with n pointers must have at least 2 children and, unlike other
non-leaf nodes, can have fewer than ⌈n/2⌉ children.
12. If a leaf node has n pointers, then the number of keys it can hold is
(n-1).
13. In a B+ tree, a leaf node with n pointers must have at least
⌈(n-1)/2⌉ keys and can have at most (n-1) keys.
a. If n = 3, then the minimum number of keys a leaf node must have is
⌈(3-1)/2⌉ = 1 and the maximum number of keys is (3-1) = 2.
b. If n = 4, then the minimum number of keys a leaf node must have is
⌈(4-1)/2⌉ = 2 and the maximum number of keys is (4-1) = 3.
c. If n = 5, then the minimum number of keys a leaf node must have is
⌈(5-1)/2⌉ = 2 and the maximum number of keys is (5-1) = 4.
d. If n = 6, then the minimum number of keys a leaf node must have is
⌈(6-1)/2⌉ = 3 and the maximum number of keys is (6-1) = 5.
14. In a B+ tree, a non-leaf node other than the root with n pointers
must have between ⌈n/2⌉ and n children.
a. A non-leaf node other than the root with 3 pointers must have between
⌈3/2⌉ and 3 children, i.e., it must have 2 to 3 children.
b. A non-leaf node other than the root with 4 pointers must have between
⌈4/2⌉ and 4 children, i.e., it must have 2 to 4 children.
c. A non-leaf node other than the root with 5 pointers must have between
⌈5/2⌉ and 5 children, i.e., it must have 3 to 5 children.
d. A non-leaf node other than the root with 6 pointers must have between
⌈6/2⌉ and 6 children, i.e., it must have 3 to 6 children.
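The occupancy limits in rules 13 and 14 can be checked with a short Python
sketch. The function names leaf_key_bounds and nonleaf_child_bounds are
hypothetical helpers introduced only for this illustration.

```python
from math import ceil

def leaf_key_bounds(n):
    """(min, max) number of keys in a leaf node with n pointers (rule 13)."""
    return ceil((n - 1) / 2), n - 1

def nonleaf_child_bounds(n):
    """(min, max) number of children of a non-leaf, non-root node (rule 14)."""
    return ceil(n / 2), n

for n in (3, 4, 5, 6):
    print(n, leaf_key_bounds(n), nonleaf_child_bounds(n))
# 3 (1, 2) (2, 3)
# 4 (2, 3) (2, 4)
# 5 (2, 4) (3, 5)
# 6 (3, 5) (3, 6)
```

The printed pairs match cases a to d listed under rules 13 and 14.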
15. The B+ tree contains a relatively small number of levels.
a. The level below the root has at least 2*⌈n/2⌉ values.
b. The next level has at least 2*⌈n/2⌉*⌈n/2⌉ values, and so on.
16. If there are K search-key values in the file, the tree height is no more
than ⌈log_⌈n/2⌉(K)⌉, i.e., the logarithm of K to the base ⌈n/2⌉, rounded up.
17.The ranges of values in each leaf do not overlap, except if there are duplicate
search-key values, in which case a value may be present in more than one
leaf. Specifically, if Li and Lj are leaf nodes and i < j, then every search-key
value in Li is less than or equal to every search-key value in Lj.
18. The non-leaf nodes of the B+ tree form a multilevel (sparse) index on the
leaf nodes. The structure of non-leaf nodes is the same as that for leaf nodes,
except that all pointers are pointers to tree nodes. A non-leaf node may hold
up to n pointers, and must hold at least ⌈n/2⌉ pointers. The number of
pointers in a node is called the fanout of the node. Non-leaf nodes are also
referred to as internal nodes.
19. We shall see that the B+ tree structure imposes performance overhead on
insertion and deletion, and adds space overhead. The overhead is acceptable
even for frequently modified files, since the cost of file reorganization is
avoided. Furthermore, since nodes may be as much as half empty (if they
have the minimum number of children), there is some wasted space. This
space overhead, too, is acceptable given the performance benefits of the
B+ tree structure.
20. A search can be done for a specific key value in a B+ tree.
21. A search can be done for a range of values in a B+ tree.
22. Insertion and deletion can be easily performed.
23. While processing a query in a B+ tree, we traverse a path in the tree
from the root to some leaf node. If the number of records in the file is N and
the number of pointers in a node is n, then the maximum number of nodes
to be accessed for a query such as a lookup is ⌈log_⌈n/2⌉(N)⌉.
24. In practice, only a few nodes need to be accessed. Typically, a node is made
to be the same size as a disk block, which is typically 4 kilobytes. With a
search-key size of 12 bytes, and a disk-pointer size of 8 bytes, n is around
200. Even with a more conservative estimate of 32 bytes for the search-key
size, n is around 100. With n = 100, if we have 1 million search-key values in
the file, a lookup requires only Ceiling(log50(1,000,000)) = 4 nodes to be
accessed. Thus, at most four blocks need to be read from disk for the lookup.
The root node of the tree is usually heavily accessed and is likely to be in the
buffer, so typically only three or fewer blocks need to be read from disk.
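The arithmetic in rule 24 can be reproduced with the Python sketch below. The
helper names fanout and max_nodes_on_lookup are assumptions made for this
illustration, and the fanout estimate simply divides the block size by the
size of one key plus one pointer, ignoring any per-node header.

```python
def fanout(block_bytes, key_bytes, pointer_bytes):
    """Rough number of key/pointer pairs that fit in one disk block."""
    return block_bytes // (key_bytes + pointer_bytes)

def max_nodes_on_lookup(n, num_keys):
    """Smallest h with ceil(n/2)**h >= num_keys, i.e. the rounded-up log.

    Integer arithmetic is used instead of math.log to avoid floating-point
    rounding at exact powers. Assumes n >= 3 so that the base is at least 2.
    """
    base = (n + 1) // 2          # ceil(n/2), minimum fanout of a non-root node
    height, reach = 0, 1
    while reach < num_keys:
        reach *= base
        height += 1
    return height

print(fanout(4096, 12, 8))                  # 204, "around 200"
print(fanout(4096, 32, 8))                  # 102, "around 100"
print(max_nodes_on_lookup(100, 1_000_000))  # 4
```

With n = 100 the base is 50, and 50**3 = 125,000 is still below one million
while 50**4 = 6,250,000 exceeds it, which is why four nodes suffice.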
Now let us insert 10. The node overflows because it already holds 3 keys, so
it must be split. It is split as a leaf node, since it is the initial node and
not a root. After inserting 10 using Rule 1, we get the following.
These nodes need a parent node to point to them, so 7 is copied
into the parent node.
Now insert 13 and 16. After 13 is inserted into the right leaf node, that node
is full, so it needs to be split further; the modified leaf nodes are
as follows.
After this we need to add 20 and 22. After the key value 20 is added, the
leaf node is full, so adding the key value 22 requires a split, as
follows.
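The leaf splits in this walk-through can be sketched in Python as follows.
This is a simplified illustration under stated assumptions: the function name
insert_into_leaf is hypothetical, leaves are plain sorted lists, and the
starting keys 1 and 4 are invented for the example, since the text above names
only 7 and 10. On overflow the keys are divided in half and the smallest key
of the new right leaf is copied (not moved) into the parent.

```python
from bisect import insort

def insert_into_leaf(leaf, key, max_keys):
    """Insert key into a sorted leaf, splitting it if it overflows.

    Returns (left, right, separator); right and separator are None when
    the key fits without a split.
    """
    keys = list(leaf)
    insort(keys, key)                  # keep the leaf sorted
    if len(keys) <= max_keys:
        return keys, None, None
    mid = (len(keys) + 1) // 2         # left leaf keeps the extra key, if any
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]       # right[0] is copied up to the parent

# Assumed starting leaf [1, 4, 7] with at most 3 keys per leaf.
left, right, separator = insert_into_leaf([1, 4, 7], 10, max_keys=3)
print(left, right, separator)          # [1, 4] [7, 10] 7
```

Because the separator is copied rather than moved, key 7 remains in the right
leaf, in keeping with the rule that data records are stored only in the leaves.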