
Binary Tree

A binary tree is a tree data structure where each node has at most two children, referred to as left and right. Binary trees can be used to implement binary search trees and binary heaps. They are commonly represented using nodes that contain data and references to left and right child nodes. There are different types of binary trees including full, complete, balanced, and degenerate binary trees. Binary trees have various properties related to their structure, representation, traversal, and number of possible configurations.

Uploaded by

Timothy Sawe
Copyright © All Rights Reserved

In computer science, a binary tree is a tree data structure in which each node has at most two children. The node above a given node is known as its parent, and the child nodes are called the left child and the right child. In type theory, a binary tree with nodes of type A is defined inductively as TA = μα. 1 + A × α × α. Binary trees are commonly used to implement binary search trees and binary heaps.

[Figure: a simple binary tree of size 9 and height 3, with a root node whose value is 2; the tree is neither sorted nor balanced.]

Definitions for rooted trees


- A directed edge refers to the link from the parent to the child (the arrows in the picture of the tree).
- The root node of a tree is the node with no parent. There is at most one root node in a rooted tree.
- A leaf node has no children.
- The depth of a node n is the length of the path from the root to the node. The set of all nodes at a given depth is sometimes called a level of the tree. The root node is at depth zero (or one [1]).
- The height of a tree is the length of the path from the root to the deepest node in the tree. A (rooted) tree with only one node (the root) has a height of zero (or one [2]).
- Siblings are nodes that share the same parent node.
- If a path exists from node p to node q, where node p is closer to the root node than q, then p is an ancestor of q and q is a descendant of p.
- The size of a node is the number of descendants it has, including itself.
- The in-degree of a node is the number of edges arriving at that node.
- The out-degree of a node is the number of edges leaving that node.
- The root is the only node in the tree with in-degree 0.
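As a concrete illustration of these definitions, here is a minimal sketch in C; the node type and function names are illustrative, not part of the original text:

```c
#include <stddef.h>

/* Hypothetical minimal node type used for illustration. */
struct node {
    int val;
    struct node *left, *right;
};

/* Height: length of the longest root-to-leaf path. A single-node
   tree has height 0; the empty tree is given height -1. */
int height(struct node *t) {
    if (t == NULL) return -1;
    int hl = height(t->left), hr = height(t->right);
    return 1 + (hl > hr ? hl : hr);
}

/* Size of a node: the number of descendants it has, including itself. */
int size(struct node *t) {
    if (t == NULL) return 0;
    return 1 + size(t->left) + size(t->right);
}
```

For a single-node tree both functions agree with the definitions above: height 0 and size 1.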

2. Types of binary trees


- A rooted binary tree is a rooted tree in which every node has at most two children.
- A full binary tree (sometimes proper binary tree, 2-tree, or strictly binary tree) is a tree in which every node other than the leaves has two children.
- A perfect binary tree is a full binary tree in which all leaves are at the same depth or same level. [3] (This is ambiguously also called a complete binary tree.)
- A complete binary tree is a binary tree in which every level, except possibly the last, is completely filled, and all nodes are as far left as possible. [4]
- An infinite complete binary tree is a tree with ℵ₀ levels, where for each level d the number of existing nodes at level d is equal to 2^d. The cardinal number of the set of all nodes is ℵ₀. The cardinal number of the set of all paths is 2^ℵ₀ or, equivalently, assuming the axiom of choice, the cardinality of the continuum. (See Continuum hypothesis.) The infinite complete binary tree essentially describes the structure of the Cantor set; the unit interval on the real line (of cardinality 2^ℵ₀) is the continuous image of the Cantor set; this tree is sometimes called the Cantor space.
- A balanced binary tree is one where the depths of all the leaves differ by at most 1. Balanced trees have a predictable depth (how many nodes are traversed from the root to a leaf, root counting as node 0 and subsequent nodes as 1, 2, ..., depth). This depth is equal to the integer part of log₂(n), where n is the number of nodes in the balanced tree. Example 1: a balanced tree with 1 node, log₂(1) = 0 (depth = 0). Example 2: a balanced tree with 3 nodes, log₂(3) ≈ 1.58 (depth = 1). Example 3: a balanced tree with 5 nodes, log₂(5) ≈ 2.32 (depth = 2).
- A rooted complete binary tree can be identified with a free magma.
- A degenerate tree is a tree where each parent node has only one associated child node. This means that in a performance measurement, the tree will behave like a linked list data structure.
[Figure: a rooted tree, with the top node as its root.]

3. Properties of binary trees


- The number of nodes n in a perfect binary tree can be found using this formula: n = 2^(h+1) − 1, where h is the height of the tree.
- The number of nodes n in a complete binary tree is minimum 2^h and maximum 2^(h+1) − 1, where h is the height of the tree.
- The number of nodes n in a perfect binary tree can also be found using this formula: n = 2L − 1, where L is the number of leaf nodes in the tree.
- The number of leaf nodes L in a perfect binary tree can be found using this formula: L = 2^h, where h is the height of the tree.
- The number of NULL links in a complete binary tree of n nodes is (n + 1).
- The number of leaf nodes in a complete binary tree of n nodes is ⌈n/2⌉.
- For any non-empty binary tree with n0 leaf nodes and n2 nodes of degree 2, n0 = n2 + 1. [5]

Note that this terminology often varies in the literature, especially with respect to the meanings of "complete" and "full".
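These formulas are easy to check in code. The following sketch assumes the height convention used above (a single-node tree has height 0); the function names are illustrative:

```c
/* Nodes in a perfect binary tree of height h: 2^(h+1) - 1. */
int perfect_nodes(int h)  { return (1 << (h + 1)) - 1; }

/* Leaves in a perfect binary tree of height h: 2^h. */
int perfect_leaves(int h) { return 1 << h; }

/* Leaves in a complete binary tree of n nodes: ceil(n/2). */
int complete_leaves(int n) { return (n + 1) / 2; }
```

Note that perfect_nodes(h) equals 2 * perfect_leaves(h) - 1, matching the n = 2L − 1 identity above.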

4. Definition in graph theory

Graph theorists use the following definition: A binary tree is a connected acyclic graph such that
the degree of each vertex is no more than three. It can be shown that in any binary tree of two or
more nodes, there are exactly two more nodes of degree one than there are of degree three, but
there can be any number of nodes of degree two. A rooted binary tree is such a graph that has
one of its vertices of degree no more than two singled out as the root.

With the root thus chosen, each vertex will have a uniquely defined parent, and up to two
children; however, so far there is insufficient information to distinguish a left or right child. If we
drop the connectedness requirement, allowing multiple connected components in the graph, we
call such a structure a forest.

Another way of defining binary trees is a recursive definition on directed graphs. A binary tree is
either:

- A single vertex.
- A graph formed by taking two binary trees, adding a vertex, and adding an edge directed from the new vertex to the root of each binary tree.

This also does not establish the order of children, but does fix a specific root node.
5. Combinatorics

The groupings of pairs of nodes in a tree can be represented as pairs of letters, surrounded by parentheses. Thus, (a b) denotes the binary tree whose left subtree is a and whose right subtree is b. Strings of balanced pairs of parentheses may therefore be used to denote binary trees in general. The set of all possible strings consisting entirely of balanced parentheses is known as the Dyck language.

Given n nodes, the total number of ways in which these nodes can be arranged into a binary tree is given by the Catalan number C_n. For example, C_2 = 2 declares that (a 0) and (0 a) are the only binary trees possible that have two nodes, and C_3 = 5 declares that ((a 0) 0), ((0 a) 0), (0 (a 0)), (0 (0 a)), and (a b) are the only five binary trees possible that have 3 nodes. Here 0 represents a subtree that is not present.
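The Catalan numbers can be computed with the standard recurrence C_0 = 1, C_(n) = Σ C_i · C_(n−1−i); a short sketch in C:

```c
/* Catalan numbers count the binary trees on n nodes.
   Computed via the recurrence C_0 = 1, C_i = sum_{j<i} C_j * C_{i-1-j}. */
long catalan(int n) {
    long c[n + 1];          /* C99 variable-length array; fine for small n */
    c[0] = 1;
    for (int i = 1; i <= n; i++) {
        c[i] = 0;
        for (int j = 0; j < i; j++)
            c[i] += c[j] * c[i - 1 - j];
    }
    return c[n];
}
```

This reproduces the counts quoted above: two trees on two nodes, five on three.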

The ability to represent binary trees as strings of symbols and parentheses implies that binary
trees can represent the elements of a magma. Conversely, the set of all possible binary trees,
together with the natural operation of attaching trees to one-another, forms a magma, the free
magma.

Given a string representing a binary tree, the operators to obtain the left and right subtrees are
sometimes referred to as car and cdr.

6. Methods for storing binary trees

Binary trees can be constructed from programming language primitives in several ways.

6. 1. Nodes and references

In a language with records and references, binary trees are typically constructed by having a tree
node structure which contains some data and references to its left child and its right child.
Sometimes it also contains a reference to its unique parent. If a node has fewer than two children,
some of the child pointers may be set to a special null value, or to a special sentinel node.

In languages with tagged unions such as ML, a tree node is often a tagged union of two types of
nodes, one of which is a 3-tuple of data, left child, and right child, and the other of which is a
"leaf" node, which contains no data and functions much like the null value in a language with
pointers.

6. 2. Ahnentafel list

Binary trees can also be stored as an implicit data structure in arrays, and if the tree is a complete binary tree, this method wastes no space. In this compact arrangement, if a node has an index i, its children are found at indices 2i + 1 (for the left child) and 2i + 2 (for the right), while its parent (if any) is found at index ⌊(i − 1)/2⌋ (assuming the root has index zero). This method benefits from more compact storage and better locality of reference, particularly during a preorder traversal. However, it is expensive to grow and wastes space proportional to 2^h − n for a tree of height h with n nodes.

A binary tree can also be represented in the form of an array as well as an adjacency linked list. In the array case, each node is simply placed at its computed index; the relationship between parents and children is not stored explicitly but is implicit in the index arithmetic. In a linked list representation, by contrast, the relationship between parent and children is stored directly. In the array representation, nodes are accessed by calculating the index. This method is used in languages like FORTRAN that lack dynamic memory allocation. Inserting a new node into an array-implemented binary tree is awkward, but is easily done when the binary tree is implemented with linked nodes.
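The index arithmetic for this array layout is just a few expressions; a sketch assuming the root at index 0 (function names are illustrative):

```c
/* Ahnentafel layout with the root at index 0:
   children of node i live at 2i+1 and 2i+2, its parent at (i-1)/2. */
int left_child(int i)  { return 2 * i + 1; }
int right_child(int i) { return 2 * i + 2; }
int parent(int i)      { return (i - 1) / 2; }  /* meaningless for the root */
```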

7. Methods of iterating over binary trees

Often, one wishes to visit each of the nodes in a tree and examine the value there. There are
several common orders in which the nodes can be visited, and each has useful properties that are
exploited in algorithms based on binary trees.

7. 1. Pre-order, in-order, and post-order traversal

Main article: Tree traversal

Pre-order, in-order, and post-order traversal visit each node in a tree by recursively visiting each node in the left and right subtrees of the root. If the root node is visited before its subtrees, this is pre-order; if after, post-order; if between, in-order. In-order traversal is useful in binary search trees, where it visits the nodes in increasing order of their values.

7. 2. Depth-first order

In depth-first order, we always attempt to visit the node farthest from the root that we can, but
with the caveat that it must be a child of a node we have already visited. Unlike a depth-first
search on graphs, there is no need to remember all the nodes we have visited, because a tree
cannot contain cycles. Pre-order is a special case of this. See depth-first search for more
information.

7. 3. Breadth-first order
Contrasting with depth-first order is breadth-first order, which always attempts to visit the node
closest to the root that it has not already visited. See Breadth-first search for more information.
Also called a level-order traversal.
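A level-order traversal can be sketched with an explicit queue. The node type, the array-backed queue, and the buffer-size assumption here are all illustrative:

```c
#include <stddef.h>

/* Hypothetical minimal node type used for illustration. */
struct node { int val; struct node *left, *right; };

/* Breadth-first (level-order) traversal using a simple array-backed
   queue; writes visited values into out and returns how many were
   visited. A sketch only: assumes the tree has at most max nodes. */
int level_order(struct node *root, int *out, int max) {
    struct node *queue[max];
    int head = 0, tail = 0, n = 0;
    if (root) queue[tail++] = root;
    while (head < tail) {
        struct node *cur = queue[head++];   /* dequeue node closest to root */
        out[n++] = cur->val;
        if (cur->left)  queue[tail++] = cur->left;
        if (cur->right) queue[tail++] = cur->right;
    }
    return n;
}
```

Nodes are visited level by level, left to right, exactly as described above.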

8. Encodings

8. 1. Succinct encodings

A succinct data structure is one which takes the absolute minimum possible space, as established by information-theoretical lower bounds. The number of different binary trees on n nodes is C_n, the nth Catalan number (assuming we view trees with identical structure as identical). For large n, this is about 4^n; thus we need at least about log₂(4^n) = 2n bits to encode a tree. A succinct binary tree therefore would occupy only about 2 bits per node.

One simple representation which meets this bound is to visit the nodes of the tree in preorder,
outputting "1" for an internal node and "0" for a leaf. [1] If the tree contains data, we can simply
simultaneously store it in a consecutive array in preorder. This function accomplishes this:

function EncodeSuccinct(node n, bitstring structure, array data) {
    if n = nil then
        append 0 to structure;
    else
        append 1 to structure;
        append n.data to data;
        EncodeSuccinct(n.left, structure, data);
        EncodeSuccinct(n.right, structure, data);
}

The string structure has only 2n + 1 bits in the end, where n is the number of (internal) nodes; we don't even have to store its length. To show that no information is lost, we can convert the output back to the original tree like this:

function DecodeSuccinct(bitstring structure, array data) {
    remove first bit of structure and put it in b
    if b = 1 then
        create a new node n
        remove first element of data and put it in n.data
        n.left = DecodeSuccinct(structure, data)
        n.right = DecodeSuccinct(structure, data)
        return n
    else
        return nil
}

More sophisticated succinct representations allow not only compact storage of trees but even
useful operations on those trees directly while they're still in their succinct form.
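The encoding pseudocode translates almost directly into C. This sketch uses a character buffer for the bitstring and assumes both buffers are large enough:

```c
#include <stddef.h>

/* Hypothetical minimal node type used for illustration. */
struct node { int val; struct node *left, *right; };

/* Preorder succinct encoding sketch: emits '1' for each node and '0'
   for each nil pointer into bits[], and the node data into data[].
   Output counters are passed by pointer. */
void encode_succinct(struct node *n, char *bits, int *nb, int *data, int *nd) {
    if (n == NULL) {
        bits[(*nb)++] = '0';
        return;
    }
    bits[(*nb)++] = '1';
    data[(*nd)++] = n->val;
    encode_succinct(n->left, bits, nb, data, nd);
    encode_succinct(n->right, bits, nb, data, nd);
}
```

For a tree of n nodes the bit buffer ends up holding exactly 2n + 1 characters, matching the bound stated above.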

8. 2. Encoding general trees as binary trees

There is a one-to-one mapping between general ordered trees and binary trees, which in particular is used by Lisp to represent general ordered trees as binary trees. To convert a general ordered tree to a binary tree, we only need to represent the general tree in left-child right-sibling fashion; the result of this representation will automatically be a binary tree, if viewed from a different perspective. Each node N in the ordered tree corresponds to a node N' in the binary tree; the left child of N' is the node corresponding to the first child of N, and the right child of N' is the node corresponding to N's next sibling, that is, the next node in order among the children of the parent of N. This binary tree representation of a general ordered tree is sometimes also referred to as a left-child right-sibling binary tree (LCRS tree), a doubly chained tree, or a filial-heir chain.

One way of thinking about this is that each node's children are in a linked list, chained together
with their right fields, and the node only has a pointer to the beginning or head of this list,
through its left field.
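A minimal sketch of this left-child right-sibling structure in C (the type and field names are illustrative):

```c
#include <stddef.h>

/* Left-child right-sibling sketch: each node stores its first child
   (the "left" field) and its next sibling (the "right" field). */
struct lcrs { int val; struct lcrs *child, *sibling; };

/* Count the children of a node by walking its sibling chain. */
int num_children(struct lcrs *n) {
    int count = 0;
    for (struct lcrs *c = n->child; c != NULL; c = c->sibling)
        count++;
    return count;
}
```

This shows the point made above: a node holds only the head of its children's linked list, yet any number of children remains reachable through the sibling chain.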

For example, in the tree on the left, A has the 6 children {B,C,D,E,F,G}. It can be converted into
the binary tree on the right.

The binary tree can be thought of as the original tree tilted sideways, with the black left edges
representing first child and the blue right edges representing next sibling. The leaves of the tree
on the left would be written in Lisp as:

(((N O) I J) C D ((P) (Q)) F (M))

which would be implemented in memory as the binary tree on the right, without any letters on
those nodes that have a left child.

Building a binary tree in C


The following program shows how to build a binary tree in a C program. It uses dynamic memory allocation, pointers, and recursion. A binary tree is a very useful data structure, since it allows efficient insertion, searching, and deletion in a sorted list. As such a tree is essentially a recursively defined structure, recursive programming is the natural and efficient way to handle it.

tree         ::= empty | node left-branch right-branch
left-branch  ::= tree
right-branch ::= tree
#include <stdlib.h>
#include <stdio.h>

struct tree_el {
    int val;
    struct tree_el *right, *left;
};

typedef struct tree_el node;

void insert(node **tree, node *item) {
    if (!(*tree)) {
        *tree = item;
        return;
    }
    if (item->val < (*tree)->val)
        insert(&(*tree)->left, item);
    else if (item->val > (*tree)->val)
        insert(&(*tree)->right, item);
}

void printout(node *tree) {
    if (!tree) return;            /* guard: an empty tree prints nothing */
    printout(tree->left);
    printf("%d\n", tree->val);
    printout(tree->right);
}

int main(void) {                  /* main must return int, not void */
    node *curr, *root = NULL;
    int i;

    for (i = 1; i <= 10; i++) {
        curr = malloc(sizeof(node));
        curr->left = curr->right = NULL;
        curr->val = rand();
        insert(&root, curr);
    }

    printout(root);
    return 0;
}

Heaps

What is a heap?

A heap is a complete binary tree, each of whose nodes contains a key which is greater
than or equal to the key in each of its children. Actually, this is technically a "maximum
heap"; if we replace "greater than or equal to" with "less than or equal to", we get the
definition of a "minimum heap".

Note that this use of the term heap has absolutely nothing to do with the other meaning
of the word heap, which in that context was another term for "free store". Two words
that have the same spelling or pronunciation but different meanings are called
homonyms. Here are some other examples in which the sound is again the same, but at
least the spelling is different: so and sew; do, dew, due; grate and great.
Thus we may say that a heap satisfies two properties:

1. A "shape property" (that is, it's a complete binary tree)


2. An "order property" (the value in a node is "optimal" with respect to the values in all
nodes below it)

What are some uses for heaps?

- Heaps are ideal for implementing priority queues, which should not be surprising if you just think about the definition of a heap for a moment. For one thing, we can regard the root element as being the one of "highest priority", since this will be either the "largest" value (in the case of a maximum heap) or the "smallest" value (in the case of a minimum heap).
- Heaps also give us another sorting algorithm, called heapsort, which you should compare with selection sort, and for which the pseudocode (for sorting the values in a heap) looks like this:

while not finished (while heap not empty)
    Remove the root element and put it in its place
    Re-heap the remaining elements

This is an O(n log n) algorithm, and, unlike quicksort, it is guaranteed not to degenerate to O(n²). However, the algorithm begs a couple of questions:

- How do you get a heap in the first place?
- How do you "re-heap" after an insertion or deletion has been applied to a heap?

In order to deal with these questions we introduce a new way of representing binary trees.

Non-linked representation of binary trees

Study the binary tree shown below, and the array (or vector) that immediately follows it
and contains the same values:
Note that although this binary tree happens to be a heap, it could be any kind of binary
tree. That is, we could use this kind of representation for any of our binary trees; it's just
that it turns out to be particularly convenient for heaps. You should make the following
observations:

- If we place the values from the tree into a vector via a level-order traversal, then we have the following pattern:
  - The children of the value at index 0 are at indices 1 and 2.
  - The children of the value at index 1 are at indices 3 and 4.
  - The children of the value at index 2 are at indices 5 and 6.
  - The children of the value at index 3 are at indices 7 and 8.
  - ... and so on, or, in general ...
  - The children of the value at index i are at indices 2i+1 and 2i+2.

- And, going the other way, the parent of the value at index k is at index (k-1)/2.


- Also, since our example is a heap, it is of course a complete binary tree. This means that there are no "gaps" (i.e., "missing values" or "empty spots") in the vector representation. If we use this form of representation for some other kind of binary tree and there are such spots, we can use a special symbol (say '~', for character values) to mark them.

Now that we know what a heap is, and how we are going to represent it, it's time for the
usual questions:

- Supposing we have an arbitrary vector of values, how do we turn it into a heap?
- Once we have a heap, how do we add a new value to the heap, while ensuring that the structure retains its heap properties (shape and order)?
- Once we have a heap, how do we delete a value from the heap, while ensuring that the structure retains its heap properties (shape and order)?

Though it seems natural to ask the above questions in the order given, it turns out to be
convenient to answer them in the opposite order.

- Deletion So, note first that the element we delete from a heap is always the root element, which simplifies our discussion of heap deletion. Why should this be? Well, the whole rationale of the heap structure is to have the "optimal" value (maximum or minimum, say) at the root and hence "easily accessible". The idea behind deletion is thus to delete the root element and then make sure that what's left behind is again a heap, so that the "next most optimal element" will be in the root position of the revised heap.

The way we perform deletion is to first overwrite the root value with the "most
remote" value in the tree (last value in the vector). This retains the shape
property, but destroys the order property. So, we have to move this value down
through the tree by exchanging it with one of its children until we reach a point
where the order property, and hence "heapness", has been restored. This
process is called "re-heaping down", and its pseudocode looks like this:

Algorithm ReHeapDown
--------------------
if currentNode is not a leaf
Set maxChild to index of child of currentNode with larger value
if value at currentNode < value at maxChild
Swap value at currentNode with value at maxChild
ReHeapDown starting at maxChild
- Insertion When inserting into a heap, we begin by adding the new element as the rightmost element on the bottom row of the tree (i.e., as the last element of the vector). This preserves the shape property, but again (in all likelihood) destroys the order property. This time, we need to move the value up the tree until it reaches the point where the order property, and hence "heapness", is restored. This process is called "re-heaping up", and its pseudocode looks like this:
Algorithm ReHeapUp
------------------
if currentNode is not the root
    Set parentNode to index of parent of currentNode
    if value at currentNode > value at parentNode
        Swap value at currentNode with value at parentNode
        ReHeapUp starting at parentNode
Building Starting with a complete binary tree (i.e., a tree that has the appropriate
shape for a heap) and "building" it into a heap (i.e., giving it the "order property" as well)
is a "bottom up" process. And this process is based on the following observation: If we
have two trees, a "left" one that is full, and a "right" one that is complete, and both are
already heaps of the same height, then joining them at a root node in that left-right order
will only require a "re-heap" down to restore "heapness". Thus, the pseudocode for
building a heap looks like this:

Algorithm BuildHeap
-------------------
for each index from first non-leaf back up to root (in reverse level order)
    ReHeapDown starting at that index
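The ReHeapDown and BuildHeap algorithms can be sketched over an int array in the vector representation described above (maximum heap, children of index i at 2i+1 and 2i+2); the function names are illustrative:

```c
/* ReHeapDown for a maximum heap stored in an array of n ints:
   move the value at index i down, swapping with its larger child,
   until the order property holds below it. */
void reheap_down(int *a, int n, int i) {
    while (2 * i + 1 < n) {
        int max_child = 2 * i + 1;                 /* left child */
        if (max_child + 1 < n && a[max_child + 1] > a[max_child])
            max_child++;                           /* right child is larger */
        if (a[i] >= a[max_child])
            break;                                 /* order restored */
        int tmp = a[i]; a[i] = a[max_child]; a[max_child] = tmp;
        i = max_child;
    }
}

/* BuildHeap: re-heap down from the last non-leaf back up to the root. */
void build_heap(int *a, int n) {
    for (int i = n / 2 - 1; i >= 0; i--)
        reheap_down(a, n, i);
}
```

After build_heap runs, the largest value sits at index 0 and every parent is at least as large as its children, which is exactly the order property.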

Binary Trees

Binary Trees Defined

A binary tree is a tree that may have no nodes (empty tree) or if there are nodes, the following
rules apply.

- There is a root node.
- Each node, except the root, must have only 1 parent.
- Each node may have at most 2 children, called the left child and the right child.

For a node in a binary tree, the node beginning with its left child and below is its left sub-tree. The node beginning with its right child and below is its right sub-tree. Notice that a binary tree can have no nodes; let's define the depth of the empty tree as -1. The depth of a node is the depth of its parent plus 1, with the root node having depth zero. In a full binary tree, every leaf node has the same depth. In a complete binary tree, the levels above the deepest form a full tree, and the nodes at the deepest level are arranged in order beginning at the leftmost position. The following are examples of full and complete binary trees. All full binary trees are also complete.
Nodes of a complete binary tree are numbered from level 1 to the highest level and from left to
right. The depth of a complete binary tree with n nodes can be computed as follows:

depth = ⌊log₂(n)⌋ + 1

Of course, binary trees may not be complete. They can even be skewed left or right. The
following are also binary trees.
The maximum number of nodes on level i of a binary tree is 2^i, and the maximum number of nodes in a binary tree of depth k is 2^(k+1) − 1. A full binary tree of depth k is a binary tree with 2^(k+1) − 1 nodes.

Binary Tree Representation

Since nodes are numbered from 0 to n-1, we can use a one-dimensional array to store the nodes. The size of the array would be n for a complete binary tree. If a complete binary tree with n nodes is represented sequentially, then, for any node with index i, 0 ≤ i < n, we know the following.

1. The parent of i is at ⌊(i − 1)/2⌋ if i ≠ 0. If i = 0, then there is no parent.

2. The left child of i is at 2i + 1 if 2i + 1 < n. If 2i + 1 ≥ n, then there is no left child.

3. The right child of i is at 2i + 2 if 2i + 2 < n. If 2i + 2 ≥ n, then there is no right child.


Given a binary tree that is not complete, we can compute the maximum number of nodes for a
tree of that depth. This would then be the size of the array to hold the tree. For sparsely
populated trees such as skewed trees, an array representation may waste space. For complete
trees and densely populated trees, an array representation may be adequate. Let's examine the
tree below and create an array representation.

The root node, A, is always at level zero and is always at position zero, 0, in the array. Node B is
the left child of A; the computed position of B is 1 ((2 x 0) + 1), for C, the left child of B, the
position is 3, and the position of D is 7. The array representation can be visualized as shown
below with the top row as the index and the 2nd row as values.

Index: 0  1  2  3  4  5  6  7  8  9  10  11  12  13  14
Value: A  B  -  C  -  -  -  D  -  -  -   -   -   -   -

(- marks an unused position.)

Notice that most positions in the array will not be used. Such trees should instead be represented using binary tree nodes that hold the value of the object and a reference to its left child and right child. If this methodology were used, only space for four objects would be allocated, plus additional space for references.

Binary Tree Traversals

Trees are very important data structures, and binary trees have very special uses in computer
science. Traversing a tree may be compared to an iterator for a list. While a list iterator begins at
the head of the list and goes (traverses) to the next object in order until it gets to the end of the
list, trees may be traversed using different methodologies. These tree traversal methods are
Inorder, Preorder, and Postorder. Notice that the prefixes in, pre, and post refer to the order in
which the root node is accessed in reference to its left and right subtrees. Inorder means that the
left subtree is traversed first, then the root node, followed by the right subtree. Preorder means
that the root is processed before its subtrees, e.g., root is first, then the left subtree, followed by
right subtree. Postorder means the root node is handled last -- left subtree, right subtree, root.
These methods are described recursively. The following are algorithms for inorder, preorder, and
postorder tree traversals.

Inorder Tree Traversal Algorithm

Input Parameter: A Binary Tree Node

Output: None

Process

if input node is not null

call inorder(node left child)

print node value

call inorder(node right child)

end if

Preorder Tree Traversal Algorithm

Input Parameter: A Binary Tree Node

Output: None

Process

if input node is not null

print node value

call preorder(node left child)

call preorder(node right child)

end if

 
Postorder Tree Traversal Algorithm

Input Parameter: A Binary Tree Node

Output: None

Process

if input node is not null

call postorder(node left child)

call postorder(node right child)

print node value

end if

The algorithms state that the node is to be printed. However, your implementation may call for
placing the nodes in a queue and returning the queue to a calling program. Can you think of why
the nodes would be placed in a queue? Practice traversing trees using the algorithms above.

2. Binary Search Tree

Binary search trees (BSTs) enable you to search a collection of objects (each with a real or integer value) quickly to determine if a given value exists in the collection.

Basically, a binary search tree is a node-weighted, rooted binary ordered tree. That
collection of adjectives means that each node in the tree might have no child, one left
child, one right child, or both left and right child. In addition, each node has an object
associated with it, and the weight of the node is the value of the object.

The binary search tree also has the property that each node's left child and descendants
of its left child have a value less than that of the node, and each node's right child and its
descendants have a value greater or equal to it.
[Figure: a binary search tree.]

The nodes are generally represented as a structure with four fields, a pointer to the
node's left child, a pointer to the node's right child, the weight of the object stored at this
node, and a pointer to the object itself. Sometimes, for easier access, people add pointer
to the parent too.
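A search following the BST order property might be sketched like this (the node type is illustrative and omits the object pointer described above):

```c
#include <stddef.h>

/* Hypothetical BST node for illustration; a full version would also
   carry a pointer to the stored object. */
struct bst { int key; struct bst *left, *right; };

/* Search: smaller keys are in the left subtree, larger-or-equal keys
   in the right, so each comparison discards a whole subtree. This
   takes O(height) steps. */
struct bst *bst_search(struct bst *t, int key) {
    while (t != NULL && t->key != key)
        t = (key < t->key) ? t->left : t->right;
    return t;    /* NULL if the key is not present */
}
```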

Why are Binary Search Tree useful?

Given a collection of n objects, a binary search tree takes only O(height) time to find an object; assuming the tree is not badly unbalanced, O(height) is O(log n). In addition, unlike just keeping a sorted array, inserting and deleting objects takes only O(log n) time as well. You can also retrieve all the keys in a binary search tree in sorted order by traversing it with an O(n) in-order traversal.

Variations on Binary Trees

There are several variants that ensure that the trees never become badly unbalanced. Splay trees, red-black trees, B-trees, and AVL trees are some of the more common examples. They are all much more complicated to code, and random trees are generally good, so it's generally not worth the effort.

Tips: If you're concerned that the tree you created might be bad (it's being created by
inserting elements from an input file, for example), then randomly order the elements
before insertion.

3. Dictionary / Hash Table

A dictionary, or hash table, stores data with a very quick way to do lookups. Let's say
there is a collection of objects and a data structure must quickly answer the question: 'Is
this object in the data structure?' (e.g., is this word in the dictionary?). A hash table does
this in less time than it takes to do binary search.

The idea is this: find a function that maps the elements of the collection to an integer between 1 and x (where x, in this explanation, is larger than the number of elements in your collection). Keep an array indexed from 1 to x, and store each element at the position the function maps it to. Then, to determine if something is in your collection, just plug it into the function and see whether or not that position is empty. If it is not, check the element there to see if it is the same as the one you're looking for.

For example, presume the function is defined over 3-character words, and is (first letter + (second letter * 3) + (third letter * 7)) mod 11 (A=1, B=2, etc.), and the words are 'CAT', 'CAR', and 'COB'. Under this letter encoding, the function maps 'CAT' to 3, 'CAR' to 0, and 'COB' to 7, so the hash table would look like this:

0: CAR
1
2
3: CAT
4
5
6
7: COB
8
9
10

Now, to see if 'BAT' is in there, plug it into the hash function to get 2. That position in the
hash table is empty, so 'BAT' is not in the collection. 'ACT', on the other hand, hashes to
7, so the program must check whether that entry, 'COB', is the same as 'ACT' (it is not,
so 'ACT' is not in the dictionary either). On the other hand, if the search input is 'CAR',
'CAT', or 'COB', the lookup finds it and returns true.
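The worked example can be reproduced directly. A sketch in Python, using the letter values A=1, B=2, and so on, as described (not ASCII codes):

```python
def toy_hash(word):
    """Hash a 3-character word: first + second*3 + third*7, mod 11 (A=1, B=2, ...)."""
    a, b, c = (ord(ch) - ord('A') + 1 for ch in word)
    return (a + b * 3 + c * 7) % 11

# An 11-slot table, indexed 0 to 10, as in the diagram.
table = [None] * 11
for w in ('CAT', 'CAR', 'COB'):
    table[toy_hash(w)] = w

def contains(word):
    """A word is present only if its slot holds that exact word."""
    return table[toy_hash(word)] == word
```

Here `contains('BAT')` finds slot 2 empty, and `contains('ACT')` finds slot 7 holding 'COB', so both return false, exactly as in the walkthrough.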

Why are Hash Tables useful?

Hash tables enable programs, at a small cost in memory, to perform lookups in
nearly constant time. Generally, the program only has to evaluate the function and then
possibly compare the looked-up element against a single entry in the table.

Collision Handling

This glossed over a slight problem. What happens if two entries map to
the same value (e.g., adding both 'ACT' and 'COB')? This is called a collision. There
are a couple of ways to handle collisions, but this document will focus on one method,
called chaining.

Instead of keeping one entry at each position, maintain a linked list of all entries with the
same hash value. Whenever an element is added, find its position and add it to the
beginning (or tail) of that list. With both 'ACT' and 'COB' in the table, it would
look something like this:

0: CAR
1
2
3: CAT
4
5
6
7: COB -> ACT
8
9
10

Now, to check for an entry, every element in its linked list must be examined before
concluding that the entry is not in the collection. This, of course, reduces the efficiency of the
hash table, but it is often quite acceptable.
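Chaining can be sketched with a list of lists; the toy hash function from the example is repeated so the snippet stands alone:

```python
def toy_hash(word):
    # The same toy function as the example: first + second*3 + third*7, mod 11 (A=1, B=2, ...).
    a, b, c = (ord(ch) - ord('A') + 1 for ch in word)
    return (a + b * 3 + c * 7) % 11

buckets = [[] for _ in range(11)]   # one chain (here, a Python list) per slot

def add(word):
    """Append the word to the chain at its hash slot."""
    buckets[toy_hash(word)].append(word)

def contains(word):
    """The whole chain must be scanned before declaring the word absent."""
    return word in buckets[toy_hash(word)]

for w in ('CAR', 'CAT', 'COB', 'ACT'):
    add(w)
```

After these insertions slot 7 holds the chain `['COB', 'ACT']`, matching the `7: COB -> ACT` diagram above.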

Avoiding Collisions

There are two basic things to consider in order to avoid collisions. The first is easy:
use a large hash table. Assuming a reasonable hash function, this reduces collisions
simply for probabilistic reasons.

The more subtle, and often forgotten, way to avoid collisions is to pick a good hash
function. For example, taking the three-letter prefix as the hash value for a dictionary
would be very bad: under this hash function, the prefix 'CON' alone would account for a
huge number of entries. Pick a function under which two elements are unlikely to map to
the same value:

1. Compute a relatively large value and take it mod the size of your table (this works
especially well if your table size is a prime).

2. Primes are your friends. Multiply by them.

3. Try to have small changes map to completely different locations.

4. You don't want two small changes to cancel each other out in your mapping
function (a transposition, for example).

5. This is a whole field of study, and you could construct a 'perfect hash function'
that gives no collisions at all, but for your purposes that's entirely too much
work. Pick something that seems fairly random and see if it works; it probably
will.
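One common shape that follows these guidelines is a polynomial hash: multiply by a prime at each step and use a prime table size. The constants 31 and 101 below are illustrative choices, not mandated by the text:

```python
TABLE_SIZE = 101   # a prime table size (guideline 1)

def string_hash(s):
    """h = h*31 + character code at each step; multiplying by a prime
    (guideline 2) sends transposed strings to different slots (guideline 4)."""
    h = 0
    for ch in s:
        h = (h * 31 + ord(ch)) % TABLE_SIZE
    return h
```

Because each character is weighted by a different power of 31, transpositions such as 'CAT' versus 'CTA' generally land in different slots, unlike the simple prefix scheme criticized above.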

Hash Table Variations

It is often quite useful to store more information than just the value. One example is
when searching a small subset of a large space: if a hash table is used to record the
locations visited, you may want to store some data associated with each location
alongside it in the table.

Even a small hash table can improve runtime by drastically reducing your search space.
For example, keeping a dictionary hashed by the first letter means that if you wanted to
search for a word, you would only be looking at words that have the same first letter.
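The first-letter idea can be sketched in a few lines; the word list here is made up for illustration:

```python
from collections import defaultdict

words = ['apple', 'axe', 'bat', 'cat', 'cob']   # hypothetical dictionary
by_first = defaultdict(list)
for w in words:
    by_first[w[0]].append(w)   # bucket words by their first letter

def lookup(word):
    """Only words sharing the first letter are ever compared."""
    return word in by_first[word[0]]
```

Even this crude 26-bucket "hash" cuts the search space to roughly 1/26 of the dictionary per lookup, on average.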

Binary Search Trees

Introduction

An important special kind of binary tree is the binary search tree (BST). In a BST, each node
stores some information including a unique key value, and perhaps some associated data. A
binary tree is a BST iff, for every node n in the tree:

 All keys in n's left subtree are less than the key in n, and
 all keys in n's right subtree are greater than the key in n.

Note: if duplicate keys are allowed, then nodes with values that are equal to the key in node n can be
either in n's left subtree or in its right subtree (but not both). In these notes, we will assume that
duplicates are not allowed.
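The 'for every node' condition can be checked by passing an allowed key range down the tree; note that comparing each node with only its immediate children is not sufficient. A sketch:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def is_bst(node, low=float('-inf'), high=float('inf')):
    """True iff every key in this subtree lies strictly between low and high."""
    if node is None:
        return True
    if not (low < node.key < high):
        return False
    # The left subtree inherits node.key as its upper bound,
    # the right subtree inherits it as its lower bound.
    return (is_bst(node.left, low, node.key) and
            is_bst(node.right, node.key, high))
```

For example, a tree with root 6 whose left child 4 has right child 7 passes every parent-child comparison, but `is_bst` rejects it because 7 sits in the left subtree of 6.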

Here are some BSTs in which each node just stores an integer key:

These are not BSTs:

In the left one, 5 is not greater than 6; in the right one, 6 is not greater than 7.

Note that more than one BST can be used to store the same set of key values. For example, both
of the following are BSTs that store the same set of integer keys:
The reason binary-search trees are important is that the following operations can be implemented
efficiently using a BST:

 insert a key value
 determine whether a key value is in the tree
 remove a key value from the tree
 print all of the key values in sorted order

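A minimal sketch of the first, second, and last of these operations (removal is more involved and omitted here); this is ordinary textbook BST code, not taken verbatim from these notes:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert a key, keeping the BST ordering; duplicates are ignored."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def contains(root, key):
    """Walk down the tree, going left or right by comparing keys."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

def inorder(root):
    """Left subtree, node, right subtree: yields the keys in sorted order."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)
```

Each `insert` and `contains` follows a single root-to-leaf path, so both cost time proportional to the tree's height, and `inorder` prints the keys in sorted order in O(n).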