Data Structures Notes
DATA STRUCTURES
& ALGORITHMS
2. SOME DEFINITIONS
Data
any fact, number, character, state
Information
data placed within a context
data with meaning
Algorithm
a finite sequence of instructions, each with a clear meaning,
which lead to the solution of a problem
Data Type
the data type of a variable is the set of values that the variable may assume
Data Structure
collection of variables, possibly of different data types, connected in various ways
data structures are used to represent the mathematical model of an ADT
3. AN EXAMPLE: RATIONAL NUMBER
Type rational;
Function makeRational(a,b: integer): rational;
- teamwork
- prototyping
- modular programming approach
integer
values: whole numbers, positive and negative
+, -, *, DIV, MOD
ABS(-4) -> 4
real
values: floating point numbers
+, -, *, /
ROUND(3.6) -> 4
TRUNC(3.6) -> 3
char
values: alphabet, special chars, control chars
ORD(ch) -> ordinal number
CHR(num) -> character (refer to ascii table for details)
boolean
By using subrange types, you protect your programs against unexpected values.
3. ENUMERATED TYPES
A simple data type for which all possible values are listed as identifiers.
4. ARRAY TYPES
An array is a structured data type for which all members are of the same type.
Each value of an array variable is identified by its "index".
Example
type string = ARRAY[0..255] OF char;
table = ARRAY [1..10] OF integer;
var myTable: table;
index     1   2   3   4   5   6   7   8   9  10
myTable   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?
x := 5;
myTable[6] := 13;
myTable[1] := x;
myTable[x] := 4 * x;
myTable[3 * x] := 2; (* index out of range error *)
myTable[2] := myTable[1] * 6;
myTable[10] := myTable[1 * 6];
index     1   2   3   4   5   6   7   8   9  10
myTable   5  30   ?   ?  20  13   ?   ?   ?  13
today:   day = ?   month = ?   year = ?
today.day := 14;
today.month := 5;
today.year := 2001;
today:   day = 14   month = 5   year = 2001
Idea
Start comparing keys at beginning of table and move towards the end of the table until element is
found or the end of the table is reached.
Example: searching for 43
1 2 3 4 5 6 7 8 9 10
myTable 5 14 23 30 40 43 56 61 77 99
Algorithm
The following function returns the index of the cell where myKey is found, or 0 if myKey is not
present.
function SeqSearch(T: table; myKey: integer): integer;
var index: integer;
begin
index := 1;
while (index < max) and (T[index] <> myKey) do
index := index + 1;
if T[index] = myKey
then seqSearch := index
else seqSearch := 0;
end;
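The function above can be tried out in a small complete program. This is a minimal sketch: the program name and the typed constant holding the example table are assumptions for illustration (typed constants are a Turbo Pascal / Free Pascal extension, not standard Pascal).

```pascal
program SeqSearchDemo;

const max = 10;
type  table = array[1..max] of integer;

{ the example table from the text, written as a typed constant }
const myTable: table = (5, 14, 23, 30, 40, 43, 56, 61, 77, 99);

function SeqSearch(T: table; myKey: integer): integer;
var index: integer;
begin
  index := 1;
  { stop at the last cell, so that T[index] never goes out of range }
  while (index < max) and (T[index] <> myKey) do
    index := index + 1;
  if T[index] = myKey
    then SeqSearch := index
    else SeqSearch := 0;
end;

begin
  writeln(SeqSearch(myTable, 43));   { prints 6 }
  writeln(SeqSearch(myTable, 50));   { prints 0: 50 is not in the table }
end.
```

Finding 43 takes six comparisons of keys here, one for each cell from 1 to 6.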
Idea
take middle-element of the table & compare
if keys do not match then
if your key is smaller
then Search the left half of the table (*using binary search *)
else Search the right half of the table (*using binary search *)
Algorithm
The following function returns the index of the cell where myKey is found, or 0 if myKey is not
present.
function BinSearch(T: table; myKey: integer): integer;
var low, high, mid: integer;
found: boolean;
begin
low := 1;
high := max;
found := false;
repeat
mid := (low + high) DIV 2;
if T[mid] = myKey
then found := true
else if T[mid] < myKey
then low := mid + 1
else high := mid - 1;
until found OR (low > high);
if found
then BinSearch := mid
else BinSearch := 0;
end;
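To see how quickly the search interval shrinks, the function can be made to print every probe. The sketch below assumes the sorted example table from the sequential search section; the writeln inside the loop is added purely for illustration.

```pascal
program BinSearchTrace;

const max = 10;
type  table = array[1..max] of integer;
const myTable: table = (5, 14, 23, 30, 40, 43, 56, 61, 77, 99);

function BinSearch(T: table; myKey: integer): integer;
var low, high, mid: integer;
    found: boolean;
begin
  low := 1;
  high := max;
  found := false;
  repeat
    mid := (low + high) div 2;
    { illustration only: show the interval and the cell being probed }
    writeln('low=', low, ' high=', high, ' mid=', mid, ' T[mid]=', T[mid]);
    if T[mid] = myKey
      then found := true
      else if T[mid] < myKey
        then low := mid + 1
        else high := mid - 1;
  until found or (low > high);
  if found
    then BinSearch := mid
    else BinSearch := 0;
end;

begin
  writeln('index: ', BinSearch(myTable, 43));
end.
```

For the key 43 the function probes cells 5, 8 and 6 and then returns 6: three comparisons instead of the six needed by the sequential search.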
4. COMPARING ALGORITHMS
"Sorting" means to arrange a sequence of data items so that the values of their key-fields form a
proper sequence.
Sorting of data is an extremely important and frequently executed task.
Efficient algorithms for sorting data are needed.
2. BUBBLE SORT
The idea
Go through the whole array, comparing adjacent elements and reversing two elements whenever
they are out of order.
Observe that after one pass, the smallest element will be in place.
If we repeat this process N-1 times then the whole array will be sorted!
An example
1 2 3 4 5 6 7 8 9 10
11 76 4 14 40 9 66 61 5 12
Pass 1
1 2 3 4 5 6 7 8 9 10
4 11 76 5 14 40 9 66 61 12
Pass 2
1 2 3 4 5 6 7 8 9 10
4 5 11 76 9 14 40 12 66 61
Pass 3
1 2 3 4 5 6 7 8 9 10
4 5 9 11 76 12 14 40 61 66
Pass 4
1 2 3 4 5 6 7 8 9 10
4 5 9 11 12 76 14 40 61 66
Pass 5
1 2 3 4 5 6 7 8 9 10
4 5 9 11 12 14 76 40 61 66
Pass 6
1 2 3 4 5 6 7 8 9 10
4 5 9 11 12 14 40 76 61 66
Pass 7
1 2 3 4 5 6 7 8 9 10
4 5 9 11 12 14 40 61 76 66
Pass 8
1 2 3 4 5 6 7 8 9 10
4 5 9 11 12 14 40 61 66 76
Pass 9
1 2 3 4 5 6 7 8 9 10
4 5 9 11 12 14 40 61 66 76
The Algorithm
We assume the following declarations:
const max = 10;
type table = array[1..max] of integer;
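One possible bubble sort that matches the passes shown in the example is sketched below. It is not necessarily the procedure from the original course: the names swap, bubbleSort and start are assumptions, and the inner loop is chosen to run from the end of the table towards the front because the example moves the smallest element into place first.

```pascal
program BubbleSortDemo;

const max = 10;
type  table = array[1..max] of integer;
const start: table = (11, 76, 4, 14, 40, 9, 66, 61, 5, 12);
var   myTable: table;
      i: integer;

procedure swap(var a, b: integer);
var temp: integer;
begin
  temp := a;  a := b;  b := temp;
end;

procedure bubbleSort(var T: table);
var pass, index: integer;
begin
  for pass := 1 to max - 1 do
    { walk from the end of the table down to cell pass + 1, so that }
    { the smallest remaining element ends up in T[pass]             }
    for index := max downto pass + 1 do
      if T[index] < T[index - 1] then
        swap(T[index], T[index - 1]);
end;

begin
  myTable := start;
  bubbleSort(myTable);
  for i := 1 to max do
    write(myTable[i], ' ');
  writeln;       { prints 4 5 9 11 12 14 40 61 66 76 }
end.
```

Each pass makes one comparison fewer than the previous one, because cells 1 to pass - 1 are already in their final position.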
3. INSERTION SORT
The Idea
If we insert an element in its correct position in a list of x sorted elements, we end up with a list
of x + 1 sorted elements.
Observe that 1 element is always sorted, and so, after inserting one other element we have 2
sorted elements.
If we now keep on inserting elements then after N-1 insertions, our whole list will be sorted.
An example
1 2 3 4 5 6 7 8 9 10
11 76 4 14 40 9 66 61 5 12
1 2 3 4 5 6 7 8 9 10
11 76 4 14 40 9 66 61 5 12
1 2 3 4 5 6 7 8 9 10
4 11 76 14 40 9 66 61 5 12
Pass 3: we insert 14
1 2 3 4 5 6 7 8 9 10
4 11 14 76 40 9 66 61 5 12
Pass 4: we insert 40
1 2 3 4 5 6 7 8 9 10
4 11 14 40 76 9 66 61 5 12
Pass 6
1 2 3 4 5 6 7 8 9 10
4 9 11 14 40 66 76 61 5 12
Pass 7
1 2 3 4 5 6 7 8 9 10
4 9 11 14 40 61 66 76 5 12
Pass 8
1 2 3 4 5 6 7 8 9 10
4 5 9 11 14 40 61 66 76 12
Pass 9
1 2 3 4 5 6 7 8 9 10
4 5 9 11 12 14 40 61 66 76
The Algorithm
We assume the following declarations:
const max = 10;
type table = array[1..max] of integer;
We need a procedure to insert one element into a list of already sorted elements:
{ the procedure insert puts element T[index] in its correct }
{ place between elements T[1] and T[index - 1] }
procedure insert(var t: table; index: integer);
begin
while (index > 1) and (t[index] < t[index-1]) do
begin
swap(T[index], T[index-1]);
index := index - 1;
end;
end;
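With this procedure the sort itself is a single loop that inserts elements 2 to max one after the other. A minimal sketch, assuming a swap procedure and a driver named insertionSort (both hypothetical names); the text's insert is renamed insertElement here only to avoid a clash with the built-in string procedure Insert found in some Pascal dialects.

```pascal
program InsertionSortDemo;

const max = 10;
type  table = array[1..max] of integer;
const start: table = (11, 76, 4, 14, 40, 9, 66, 61, 5, 12);
var   myTable: table;
      i: integer;

procedure swap(var a, b: integer);
var temp: integer;
begin
  temp := a;  a := b;  b := temp;
end;

{ the procedure insert from the text, under another name }
procedure insertElement(var t: table; index: integer);
begin
  while (index > 1) and (t[index] < t[index - 1]) do
  begin
    swap(t[index], t[index - 1]);
    index := index - 1;
  end;
end;

procedure insertionSort(var t: table);
var index: integer;
begin
  { t[1] on its own is sorted; insert the other max - 1 elements }
  for index := 2 to max do
    insertElement(t, index);
end;

begin
  myTable := start;
  insertionSort(myTable);
  for i := 1 to max do
    write(myTable[i], ' ');
  writeln;       { prints 4 5 9 11 12 14 40 61 66 76 }
end.
```

Note that the while condition relies on the compiler not evaluating t[index - 1] once index > 1 is false, which is the default behaviour in Turbo Pascal and Free Pascal.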
The Idea
Find the smallest element in a list of unsorted elements and swap it with the element in position
1.
Now, find the smallest element in the rest of the list of unsorted elements and swap it with the
element in position 2.
If we repeat this N-1 times, our list will be completely sorted.
An Example
1 2 3 4 5 6 7 8 9 10
11 76 4 14 40 9 66 61 5 12
1 2 3 4 5 6 7 8 9 10
4 76 11 14 40 9 66 61 5 12
1 2 3 4 5 6 7 8 9 10
4 5 11 14 40 9 66 61 76 12
1 2 3 4 5 6 7 8 9 10
4 5 9 14 40 11 66 61 76 12
1 2 3 4 5 6 7 8 9 10
4 5 9 11 40 14 66 61 76 12
Pass 5
1 2 3 4 5 6 7 8 9 10
4 5 9 11 12 14 66 61 76 40
Pass 6
1 2 3 4 5 6 7 8 9 10
4 5 9 11 12 14 66 61 76 40
Pass 7
1 2 3 4 5 6 7 8 9 10
4 5 9 11 12 14 40 61 76 66
Pass 8
1 2 3 4 5 6 7 8 9 10
4 5 9 11 12 14 40 61 76 66
Pass 9
1 2 3 4 5 6 7 8 9 10
4 5 9 11 12 14 40 61 66 76
The Algorithm.
We assume the following declarations:
const max = 10;
type table = array[1..max] of integer;
We need a function that finds the smallest element in a part of the table:
{ the function findSmallest returns the index of the }
{ smallest element to be found in cells T[index] .. T[max]}
function findSmallest(t: Table; index: integer): integer;
var currentSmall, x: integer;
begin
currentSmall := index;
for x := index + 1 to max do
if t[x] < t[currentSmall]
then currentSmall := x;
findSmallest := currentSmall;
end;
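Using findSmallest, the sort itself can be sketched as one loop of max - 1 passes. The names swap and selectionSort and the typed constant start are assumptions made for this illustration.

```pascal
program SelectionSortDemo;

const max = 10;
type  table = array[1..max] of integer;
const start: table = (11, 76, 4, 14, 40, 9, 66, 61, 5, 12);
var   myTable: table;
      i: integer;

procedure swap(var a, b: integer);
var temp: integer;
begin
  temp := a;  a := b;  b := temp;
end;

{ findSmallest, as given in the text }
function findSmallest(t: table; index: integer): integer;
var currentSmall, x: integer;
begin
  currentSmall := index;
  for x := index + 1 to max do
    if t[x] < t[currentSmall]
      then currentSmall := x;
  findSmallest := currentSmall;
end;

procedure selectionSort(var t: table);
var index, small: integer;
begin
  for index := 1 to max - 1 do
  begin
    small := findSmallest(t, index);   { smallest of t[index] .. t[max] }
    swap(t[index], t[small]);          { put it in position index }
  end;
end;

begin
  myTable := start;
  selectionSort(myTable);
  for i := 1 to max do
    write(myTable[i], ' ');
  writeln;       { prints 4 5 9 11 12 14 40 61 66 76 }
end.
```

When the smallest element is already in position index (as in passes 6 and 8 of the example), the element is simply swapped with itself.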
A pointer variable doesn't store 'data' directly; it only points to where 'data' is stored.
A pointer variable is a variable whose value indicates another variable.
We say that a pointer variable points to an anonymous variable. That is, a variable which does not
have a name. The only way to access the anonymous variable is through the use of the pointer.
2. DECLARING POINTERS
Note:
intPtr is a variable of type pointer.
intPtr points to another variable; this anonymous variable, which cannot be referred to by
name, is of type integer.
intPtr actually contains the address of the anonymous variable
intPtr 12
3. ALLOCATING MEMORY: NEW
NEW(pointerVariable)
The system will search for a free part of memory which is big enough to hold an
anonymous variable and it will store the address of that variable in
pointerVariable
Example
NEW(intPtr);
an anonymous variable of type integer is created
intPtr points to that anonymous variable
the value of the anonymous variable is undefined
intPtr ?
pointerVariable^
This refers to the (anonymous) variable pointed to by pointerVariable
Example
program happyPointers;
var ip1, ip2: ^integer;
begin
ip1^ := 10; (* Error: ip1^ undefined *)
new(ip1); ip1 ?
NEW(ip1) ip1 82
ip2
?
END.
Through the use of pointers we can allocate, but also de-allocate memory at run-time.
The procedure DISPOSE allows us to free a portion of memory that is used by a dynamic variable.
DISPOSE(pointerVariable)
The Pascal system will free the memory that was in use by the anonymous variable
pointed to by pointerVariable.
After this, the value for pointerVariable is undefined.
Example
program happyPointers;
var ip1, ip2: ^integer;
begin
new(ip1); ip1 ?
dispose(ip1); ip1 ?
new(ip1); ip1 ?
dispose(ip1); ip1 ?
ip2 (* DANGLING POINTER!!! We did not dispose ip2, but it is no longer pointing to a dynamic variable *)
6. THE NIL VALUE
Every pointer can be assigned a special NIL value to indicate it does not point to any dynamic
variable.
It is a good practice to assign this NIL-value to pointers which are not pointing to any dynamic
variable.
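The life cycle of a dynamic variable (NEW, use through ^, DISPOSE, NIL) can be put together in one short program. This is a sketch only; the type name intPointer and the second pointer named other are assumptions for illustration.

```pascal
program PointerDemo;

type intPointer = ^integer;
var  intPtr, other: intPointer;

begin
  intPtr := nil;         { intPtr does not point to anything yet }
  new(intPtr);           { create an anonymous integer variable }
  intPtr^ := 12;         { store a value in the anonymous variable }
  other := intPtr;       { two pointers now point to the same variable }
  writeln(other^);       { prints 12 }

  dispose(intPtr);       { free the anonymous variable ... }
  intPtr := nil;         { ... and mark both pointers as unused; }
  other := nil;          { without this, other would be a dangling pointer }

  if intPtr = nil then
    writeln('intPtr is nil');
end.
```

Testing a pointer against NIL before using pointerVariable^ is only reliable if every unused pointer really has been set to NIL, as done here.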
begin
NEW(ptr1) ptr1
Data structures which have their size defined at compile time, and which cannot have their size
changed at run time, are called static data structures.
The size of a dynamic data structure is not fixed. It can be changed at run time.
begin
head ?
tail ?
NEW(head); head
head^.name := 'Nyasha';
head^.sex := male;
head^.age := 9;
head^.next := nil;
tail := head;
[Figure: head and tail both point to the single node (Nyasha, male, 9).]
tail := tail^.next;
[Figure: head points to the node (Nyasha, male, 9); tail points to a second node (Paida, female, 5).]
new(tail^.next);
tail := tail^.next;
tail^.name := 'Mazvita';
tail^.sex := female;
tail^.age := 14;
tail^.next := nil;
[Figure: head -> (Nyasha, male, 9) -> (Paida, female, 5) -> (Mazvita, female, 14); tail points to the last node.]
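The steps above can be gathered into one program. The record declaration is not shown in the text, so the type names person, personPtr and sexType, and the helper append, are assumptions made for this sketch.

```pascal
program FamilyList;

type
  sexType   = (male, female);
  personPtr = ^person;
  person    = record
                name: string;
                sex:  sexType;
                age:  integer;
                next: personPtr;
              end;

var head, tail, temp: personPtr;

{ add a new node at the end of the list and let tail point to it }
procedure append(n: string; s: sexType; a: integer);
begin
  if head = nil then
  begin
    new(head);               { first node: head and tail coincide }
    tail := head;
  end
  else
  begin
    new(tail^.next);         { hang a new node behind the last one }
    tail := tail^.next;
  end;
  tail^.name := n;
  tail^.sex  := s;
  tail^.age  := a;
  tail^.next := nil;         { the new node is the end of the list }
end;

begin
  head := nil;
  tail := nil;
  append('Nyasha', male, 9);
  append('Paida', female, 5);
  append('Mazvita', female, 14);

  temp := head;
  while temp <> nil do
  begin
    writeln(temp^.name, ' ', temp^.age);
    temp := temp^.next;
  end;
end.
```

The list grows one node per call; nothing in the declarations limits how many persons can be stored.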
A linked list consists of a sequence of data-items. Each data-item in the list is related, or identified,
by its position relative to the other elements.
In a linear single linked list, the only data-item accessible from an element is the next element.
Example
List1: 2, 5, 10, 8, 3, 7
List2: 2, 5, 8, 10, 3, 7
4. IMPLEMENTATION OF LISTS
l := nil; l
l 5 10 2
l 3 5 10 2
l 5 10 2
pos
l 5 10 7 2
l 5 10 2
5 10 2
l
Doing this, we create GARBAGE !!!
Proper solution:
l 5 10 2
Traversing a list.
Example 1: Printing the contents of a list.
l 5 10 2
temp := l;
while temp <> nil do begin
writeln(temp^.info);
temp := temp^.next;
end;
l 5 10 2
pos
Observe: we need a pointer to the previous element!
l -> 5 -> 10 -> 2    (prev points to the node holding 5, pos to the node holding 10)
prev^.next := pos^.next;
DISPOSE(pos);
l -> 5 -> 2
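A small program that builds the list 5, 10, 2 and then deletes the node holding 10 might look as follows. It is a sketch under assumptions: the type names list and node and the helpers insertFront and printList do not come from the text; only the fields info and next and the two deletion statements do.

```pascal
program ListDelete;

type list = ^node;
     node = record
              info: integer;
              next: list;
            end;

var l, prev, pos: list;

{ put a new node holding x in front of the list }
procedure insertFront(var l: list; x: integer);
var p: list;
begin
  new(p);
  p^.info := x;
  p^.next := l;
  l := p;
end;

{ l is a value parameter, so the caller's pointer is not changed }
procedure printList(l: list);
begin
  while l <> nil do
  begin
    write(l^.info, ' ');
    l := l^.next;
  end;
  writeln;
end;

begin
  l := nil;
  insertFront(l, 2);
  insertFront(l, 10);
  insertFront(l, 5);
  printList(l);               { prints 5 10 2 }

  prev := l;                  { the node holding 5 }
  pos := prev^.next;          { the node holding 10 }
  prev^.next := pos^.next;    { unlink pos from the list ... }
  dispose(pos);               { ... and free it, so no garbage is created }
  printList(l);               { prints 5 2 }
end.
```

Deleting the first node is a special case, because it has no previous element: there the list pointer l itself has to be changed.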
Contiguous List
Dynamic List
Comparison
Dynamic data structures have the obvious advantage that their size can grow and shrink at run
time.
- The variable staticResults can hold exactly 100 integers.
- If we store only 5 integers in the variable, space for 100 integers is
  still taken up.
- Storing 101 integers in the variable is impossible.
- The size of dynamicResults is limited only by the memory size of your
  machine/system.
5 8 9
l 5 10 2
5 8 9
Recursive definition:
Factorial(n) = 1 when n = 0
Factorial(n) = n x Factorial(n-1) when n > 0
Factorial(0) = 1
Program Example;
procedure P1;
begin
...
end;
procedure P2;
begin
...
p1;
...
end;
procedure P3;
begin
...
p2;
...
end;
begin
p3;
end.
subprogram call
- halt the current process
- pass parameters
- start execution of subprogram
end of subprogram
- go back to that point in the program where the subprogram was called from
- return result (functions!)
- resume execution.
3. RECURSIVE SUBPROGRAMS
function fac(n: integer): integer;
begin
if n = 0
then fac := 1
else fac := n * fac(n-1);
end;
A call fac(4) activates this function five times, each activation with its own copy of n:
n <- 4
n <- 3
n <- 2
n <- 1
n <- 0
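A complete program around the recursive factorial function could look like this. It is a sketch: the program name and the loop are assumptions, and the loop stops at 5 because in some Pascal dialects integer is only 16 bits wide, so fac(8) would already overflow.

```pascal
program FactorialDemo;

function fac(n: integer): integer;
begin
  if n = 0
    then fac := 1                  { the trivial case stops the recursion }
    else fac := n * fac(n - 1);    { every call works on a smaller n }
end;

var i: integer;

begin
  for i := 0 to 5 do
    writeln(i, '! = ', fac(i));    { the last line printed is 5! = 120 }
end.
```

Evaluating fac(4) creates five activations (n = 4, 3, 2, 1, 0); the multiplications are carried out while the calls return, in the order 1 * 1, 2 * 1, 3 * 2, 4 * 6.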
The definition:
fib(n) = n when n = 0 or n = 1
fib(n) = fib(n-1) + fib(n-2) when n > 1
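The definition translates almost literally into a recursive function. A minimal sketch (the program name and the test loop are assumptions):

```pascal
program FibDemo;

function fib(n: integer): integer;
begin
  if (n = 0) or (n = 1)
    then fib := n
    else fib := fib(n - 1) + fib(n - 2);
end;

var i: integer;

begin
  for i := 0 to 10 do
    write(fib(i), ' ');
  writeln;        { prints 0 1 1 2 3 5 8 13 21 34 55 }
end.
```

This version is short but wasteful: fib(n - 2) is computed again inside the call fib(n - 1), so the number of calls grows very quickly with n.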
An exercise:
Write an iterative version of fibo
b x a = a + a + a + ... + a
The definition:
a x b = a when b = 1
a x b = a + a x (b - 1) when b > 1
The function:
function multiply(a, b: integer): integer;
begin
if (b = 1)
then multiply := a
else multiply := a + multiply(a, b-1);
end;
An exercise:
Write a recursive version of multiply which works for all integer numbers (including zero and negative numbers)
The function:
{ table = array[1..max] of integer, as declared earlier; Pascal does not }
{ allow an array type to be written out inside a parameter list }
function binSearch(a: table;
                   key: integer;
                   low, high: integer): integer;
var mid: integer;
begin
  if low > high
    then binSearch := 0
    else begin
      mid := (low + high) div 2;
      if a[mid] = key
        then binSearch := mid
        else if key < a[mid]
          then binSearch := binSearch(a, key, low, mid - 1)
          else binSearch := binSearch(a, key, mid + 1, high)
    end;
end;
The problem:
A B C
???
Some ideas:
Suppose we can move N-1 disks from one peg to another.
...then the problem is solved...
!!!
So we have a solution for N disks in terms of a solution of N-1 disks, and we have
the trivial case of 1 disk.
If we keep on using this same strategy then the whole problem will come down to
trivial cases:
N disks
N-1 disks & trivial case
N-2 disks & a trivial case
N-3 disks & a trivial case
...
1 disk = a trivial case
The solution:
program TowersOfHanoi;
var nrOfDisks: integer;
procedure towers(...);
begin
...
end;
begin
write('How many disks need to be moved? ');
readln(nrOfDisks);
towers(nrOfDisks, 'A', 'B', 'C');
end.
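The body of towers is left open above. One common way to write it is sketched below; the parameter names source, target and spare, their order, and the fixed number of 3 disks are assumptions and need not match the intended solution.

```pascal
program TowersSketch;

{ move n disks from peg source to peg target, using peg spare as a help }
procedure towers(n: integer; source, target, spare: char);
begin
  if n = 1 then
    { the trivial case: a single disk can be moved directly }
    writeln('move disk 1 from ', source, ' to ', target)
  else
  begin
    towers(n - 1, source, spare, target);   { clear the way }
    writeln('move disk ', n, ' from ', source, ' to ', target);
    towers(n - 1, spare, target, source);   { put the n-1 disks back on top }
  end;
end;

begin
  towers(3, 'A', 'C', 'B');    { prints the 7 moves needed for 3 disks }
end.
```

Each extra disk doubles the work and adds one move, so N disks take 2^N - 1 moves.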
Counting nodes in a linked list.
A stack is an ordered collection of items into which new items may be inserted and from which items
may be deleted at one end, called the TOP of the stack.
A stack is a dynamic data structure. The two basic operations on stacks are push and pop.
2. AN EXAMPLE.
Operation       Stack s (bottom ... top)    tos    Result
(initial)       10 20 55 40                 40
push(s, 80)     10 20 55 40 80              80
push(s, 25)     10 20 55 40 80 25           25
x := pop(s)     10 20 55 40 80              80     x = 25
x := pop(s)     10 20 55 40                 40     x = 80
x := pop(s)     10 20 55                    55     x = 40
The function stacktop returns a copy of the top element without removing it.
x := STACKTOP(s)    is equivalent to    x := POP(s);
                                        PUSH(s, x);
var x: integer;
s: stack;
str: string;
readln(str);
for x := 1 to length(str) do
push(s, str[x]);
(* the stack s now holds, from top to bottom: e l p p A *)
x := 1;
while not(emptyStack(s)) do begin
str[x] := pop(s);
x := x + 1;
end;
writeln(str); (* prints elppA *)
DEFINITION
type stack;
elementType;
DEFINITION
APPLICATION 1
type stack;
elementType;
Precedence rules (from highest to lowest)
( )     $ (exponentiation)     * and /     + and -
Examples
Observation
When scanning a postfix expression from left to right, we always read the operands before
we read the operator, this means that,
when we reach an operator, the operands are readily available, actually,
the last operands that we read will be the first ones to be used, hence,
our idea to use a stack!
Idea
- we scan the expression from left to right
- if we get an operand,
  we push it on the stack
- if we get an operator,
  we pop the two operands from the stack,
  apply the operator and
  push the result on the stack
Example
Postfix expression: 6 2 3 + - 3 8 2 / + * 2 $ 3 +

Step  Symbol  Stack after the step (bottom ... top)
 1    6       6
 2    2       6 2
 3    3       6 2 3
 4    +       6 5
 5    -       1
 6    3       1 3
 7    8       1 3 8
 8    2       1 3 8 2
 9    /       1 3 4
10    +       1 7
11    *       7
12    2       7 2
13    $       49
14    3       49 3
15    +       52
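The idea can be turned into a small evaluator. The sketch below makes several assumptions that are not in the text: operands are single digits, the stack is an array inside a record, division is integer division, and there are no checks for a malformed expression or a full stack. All names (stack, push, pop, power, evaluate) are chosen for illustration.

```pascal
program PostfixEval;

const maxStack = 50;
type  stack = record
                items: array[1..maxStack] of longint;
                top: integer;
              end;

procedure push(var s: stack; x: longint);
begin
  s.top := s.top + 1;
  s.items[s.top] := x;
end;

function pop(var s: stack): longint;
begin
  pop := s.items[s.top];
  s.top := s.top - 1;
end;

function power(base, exponent: longint): longint;
var r, i: longint;
begin
  r := 1;
  for i := 1 to exponent do
    r := r * base;
  power := r;
end;

function evaluate(expr: string): longint;
var s: stack;
    i: integer;
    opnd1, opnd2: longint;
    ch: char;
begin
  s.top := 0;
  for i := 1 to length(expr) do
  begin
    ch := expr[i];
    if (ch >= '0') and (ch <= '9') then
      push(s, ord(ch) - ord('0'))        { operand: push its value }
    else if ch <> ' ' then
    begin
      opnd2 := pop(s);                   { the operand read last is popped first }
      opnd1 := pop(s);
      case ch of
        '+': push(s, opnd1 + opnd2);
        '-': push(s, opnd1 - opnd2);
        '*': push(s, opnd1 * opnd2);
        '/': push(s, opnd1 div opnd2);
        '$': push(s, power(opnd1, opnd2));
      end;
    end;
  end;
  evaluate := pop(s);                    { the result is the only item left }
end;

begin
  writeln(evaluate('6 2 3 + - 3 8 2 / + * 2 $ 3 +'));   { prints 52 }
end.
```

The order of the two pops matters for - , / and $: the second operand is on top of the stack, so it has to be popped first.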
An exercise
Write a function to evaluate a prefix expression.
Idea
- we scan the infix expression from left to right
- if we get an operand,
  we place it in the postfix expression
- if we get an operator,
  we pop all operators with higher (or same) precedence from the stack and put them in the
  postfix string
  we push the operator on the stack
Infix expression: 3 + 5 * 6 - 7 * 2

Step  Symbol  Operator stack after the step (bottom ... top)
 1    3       (empty)
 2    +       +
 3    5       +
 4    *       + *
 5    6       + *
 6    -       -
 7    7       -
 8    *       - *
 9    2       - *

Postfix expression:
3 5 6 * + 7 2 * -
begin
prcd := assignValue(op1) >= assignValue(op2);
end;
Infix expressions with brackets:
When an opening bracket is reached, we start evaluation of a fresh sub-expression.
So, to accommodate for brackets:
Example
Infix expression: ( ( 1 - ( 2 + 3 ) ) * 4 ) $ ( 5 + 6 )

Step  Symbol  Operator stack after the step (bottom ... top)
 1    (       (
 2    (       ( (
 3    1       ( (
 4    -       ( ( -
 5    (       ( ( - (
 6    2       ( ( - (
 7    +       ( ( - ( +
 8    3       ( ( - ( +
 9    )       ( ( -
10    )       (
11    *       ( *
12    4       ( *
13    )       (empty)
14    $       $
15    (       $ (
16    5       $ (
17    +       $ ( +
18    6       $ ( +
19    )       $

Postfix string: 1 2 3 + - 4 * 5 6 + $
Exercise
program Example;
var x, y: integer;
begin
x := 5; y := 10;
writeln(Three(x, y, false));
end.
Subprogram Call:
an activation record is pushed on the run-time stack
the return address is the address of the current instruction
parameters are 'initialised', local variables are not initialised
Subprogram Ends:
an activation record is popped from the run-time stack
parameters and local variables are freed
program execution resumes at the return address found in that record
Recursive subprograms.
Each recursive call involves a new activation record to be pushed on the run-time stack!
A queue is a dynamic data structure consisting of an ordered collection of items into which new
items may be inserted at one end, called the REAR of the queue, and from which items may be
deleted at the other end, called the FRONT of the queue.
We use DELETE or DEQUEUE to remove an element from the front of the queue.
x := delete(q) -> delete and return the element at the front of the queue.
11. AN EXAMPLE.
myQueue A B C D
front rear
INSERT(myQueue, E);
myQueue A B C D E
front rear
myQueue B C D E
front rear
myQueue C D E
front rear
INSERT(myQueue, F);
myQueue C D E F
front rear
DEFINITION
type queue;
elementType;
1 2 3 4 5 ... 97 98 99 100
myQueue 5 8
insert(myQueue, 6)
insert(myQueue, 4)
Although only 4 cells of the array are used, an overflow will be generated.
myQueue 5 8 4
1 2 3 4 5 ... 97 98 99 100
myQueue 10 5 8 4
front <- 98
rear <- 1
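The wrap-around shown above (front at cell 98, rear at cell 1) is what a circular array implementation produces. The sketch below is one way to write it; the record layout, the count field used to tell a full queue from an empty one, and the names enqueue and dequeue are assumptions. dequeue assumes the queue is not empty.

```pascal
program CircularQueue;

const maxQueue = 100;
type  queue = record
                items: array[1..maxQueue] of integer;
                front, rear, count: integer;
              end;

var q: queue;
    i, x: integer;

procedure initQueue(var q: queue);
begin
  q.front := 1;
  q.rear := maxQueue;       { the first insertion will land in cell 1 }
  q.count := 0;
end;

procedure enqueue(var q: queue; x: integer);
begin
  if q.count = maxQueue then
    writeln('queue overflow')
  else
  begin
    q.rear := q.rear mod maxQueue + 1;    { after cell 100 comes cell 1 }
    q.items[q.rear] := x;
    q.count := q.count + 1;
  end;
end;

function dequeue(var q: queue): integer;
begin
  dequeue := q.items[q.front];
  q.front := q.front mod maxQueue + 1;
  q.count := q.count - 1;
end;

begin
  initQueue(q);
  { move front to cell 98 by passing 97 dummy items through the queue }
  for i := 1 to 97 do enqueue(q, 0);
  for i := 1 to 97 do x := dequeue(q);

  enqueue(q, 5);  enqueue(q, 8);  enqueue(q, 4);  enqueue(q, 10);
  writeln(q.front, ' ', q.rear);          { prints 98 1 }
  for i := 1 to 4 do
    write(dequeue(q), ' ');
  writeln;                                { prints 5 8 4 10 }
end.
```

An overflow is now reported only when all 100 cells are really in use.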
Buffers
A queue is an ideal data structure to provide a buffer between (computer-) devices that work at
different speeds.
- Typing: characters are entered into a queue and are removed from the queue when they are
ready to be processed.
d i r
c:> dir
- Print SPOOLing
Print jobs are stored in a queue. Whilst the print spooler is taking care of the print jobs, you
continue to work on your machine.
In some environments, different users have different priorities.
An ASCENDING PRIORITY QUEUE consists of an ordered collection of items into which items may be
inserted arbitrarily and from which only the smallest element can be removed.
Note that both structures require items to have a field on which they can be sorted
insert(dpq, 3);                   (* dpq holds: 3 *)
insert(dpq, 7);                   (* dpq holds: 7 3 *)
insert(dpq, 2);                   (* dpq holds: 7 3 2 *)
write(remove(dpq)); -> 7          (* dpq holds: 3 2 *)
write(remove(dpq)); -> 3          (* dpq holds: 2 *)
insert(dpq, 9);                   (* dpq holds: 9 2 *)
insert(dpq, 1);                   (* dpq holds: 9 2 1 *)
write(remove(dpq)); -> 9          (* dpq holds: 2 1 *)
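A descending priority queue can be sketched with an unsorted array: inserting is cheap, and removing searches for the largest item. This is only one of several possible implementations; the names pqueue, pqInsert and pqRemove are assumptions, and pqRemove assumes the queue is not empty.

```pascal
program DescendingPQ;

const maxItems = 100;
type  pqueue = record
                 items: array[1..maxItems] of integer;
                 count: integer;
               end;

var dpq: pqueue;

procedure pqInsert(var pq: pqueue; x: integer);
begin
  pq.count := pq.count + 1;
  pq.items[pq.count] := x;          { just add x at the end }
end;

function pqRemove(var pq: pqueue): integer;
var i, largest: integer;
begin
  largest := 1;
  for i := 2 to pq.count do
    if pq.items[i] > pq.items[largest] then
      largest := i;
  pqRemove := pq.items[largest];
  pq.items[largest] := pq.items[pq.count];   { fill the gap with the last item }
  pq.count := pq.count - 1;
end;

begin
  dpq.count := 0;
  pqInsert(dpq, 3);
  pqInsert(dpq, 7);
  pqInsert(dpq, 2);
  writeln(pqRemove(dpq));     { prints 7 }
  writeln(pqRemove(dpq));     { prints 3 }
  pqInsert(dpq, 9);
  pqInsert(dpq, 1);
  writeln(pqRemove(dpq));     { prints 9 }
end.
```

Keeping the array sorted instead would make removal a single step, at the price of shifting items on every insertion.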
In a model each object and action of the real world has its counterpart in the computer program.
Event-driven simulation:
Actions, or events, occur over a period of time. The simulation proceeds as events are generated
and they have their effect on the simulated situation. The generated events are stored in a queue-
structure, waiting to be processed.
Definition
A BINARY TREE is a finite set of elements that is either empty or is partitioned into three disjoint
subsets. The first subset contains a single element called the root of the tree. The other two subsets
are themselves binary trees, called the left and the right sub-trees of the original tree. Each element
of a binary tree is called a node of the tree.
Example
R E
S D G U
O Y B
In a binary search tree all the elements in the left sub-tree of a node n have values less than the
contents of n and all elements in the right sub-tree of n have values bigger than or equal to the
contents of n.
Example.
Insert the following elements in an initially empty binary search tree.
70, 50, 30, 80, 60, 90, 55, 76, 20, 85
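Inserting these elements can be sketched with a recursive procedure that follows the definition: smaller values go to the left, bigger or equal values to the right. The type declarations and the names insertNode and inOrder are assumptions made for this illustration.

```pascal
program BstDemo;

const n = 10;
      data: array[1..n] of integer = (70, 50, 30, 80, 60, 90, 55, 76, 20, 85);

type tree = ^node;
     node = record
              info: integer;
              left, right: tree;
            end;

var t: tree;
    i: integer;

procedure insertNode(var t: tree; x: integer);
begin
  if t = nil then
  begin
    new(t);                   { empty place found: create the node here }
    t^.info := x;
    t^.left := nil;
    t^.right := nil;
  end
  else if x < t^.info then
    insertNode(t^.left, x)
  else
    insertNode(t^.right, x);  { bigger than or equal: right sub-tree }
end;

{ in-order traversal: left sub-tree, node, right sub-tree }
procedure inOrder(t: tree);
begin
  if t <> nil then
  begin
    inOrder(t^.left);
    write(t^.info, ' ');
    inOrder(t^.right);
  end;
end;

begin
  t := nil;
  for i := 1 to n do
    insertNode(t, data[i]);
  inOrder(t);
  writeln;       { prints 20 30 50 55 60 70 76 80 85 90 }
end.
```

An in-order traversal of a binary search tree always visits the values in ascending order, which makes it a convenient check on the insertions.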
Post-order traversal
Exercise
What output will be produced when the following binary search tree is traversed using pre-order,
post-order and in-order traversals?
program removeDuplicates;
uses trees;
begin
write('Enter number: ');
readln(number);
t := createNode(number);
while number <> sentinel do
begin
write('Enter number: ');
readln(number);
if number <> sentinel then begin
aux := t;
head := t;
while (number <> getInfo(aux)) and not(emptyTree(head)) do
begin
aux := head;
if number < getInfo(head)
then head := getLeft(head)
else head := getRight(head);
end;
if number = getInfo(aux)
then writeln(number, ' is a duplicate')
else if number < getInfo(aux)
then setLeft(aux, createNode(number))
else setRight(aux, createNode(number))
end;
end;
end.
unit trees;
interface
implementation
begin
end.
bt = 1
index   1   2   3   4   5   6   7   8   9  10
info   25   6  15  50  40  30  14  20  30  75
left    3   0   2   0   6   0   0   0   0   0
right   5   0   8  10   4   0   0   0   0   0
How to find a free node? Use a special value for left or right!
Construct a free-list.
Example:
index   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
info   25  15  40   6  20  30  50   -   -   -   -   -   -   -  75   -
Advantage: we can go to a node's father.
Disadvantage: use of space in the array.
A heterogeneous binary tree is a tree in which the information field of different nodes might have a
different type.
3 + 4 * (6 - 7) / 5 + 1
evaluate(expTree):
case expTree^.infoKind of
operand: evaluate := expTree^.info
operator: begin
opnd1 := evaluate(getLeft(expTree));
opnd2 := evaluate(getRight(expTree));
oper := getInfo(expTree^.info);
evaluate := calculate(oper, opnd1, opnd2);
end;
end;
Observation:
lots of pointers in a tree have the nil value
traversing a tree is a recursive process
Idea: use some of those nil pointers as guides for traversing the tree
Example
b c
d e f
g h i
A tree is a finite nonempty set of elements in which one element is called the root and the remaining
elements are partitioned into m >= 0 disjoint subsets, each of which itself is a tree.
The degree of a node is defined as its number of sons.
Example:
Implementation:
using arrays:
const maxSons = 20;
type tree = ^node;
node = record
info: integer;
sons: array[1..maxSons] of tree;
end;
using pointers:
type tree = ^node;
node = record
info: integer;
sons: tree;
brother: tree;
end;
Merging is the process of combining two or more sorted files into a third sorted file.
A sequence of x sorted numbers on a file is called a run of length x.
Example
[10 18 25 27 43 46 55 75]
Merge sort is an external sorting method that runs in O(n log2 n) time; it can also be used on tables and on any
other sequential data structure, such as linked lists.
Radix sort is another external sorting method based on the values of the actual digits in a number
(key).
Example:
123 45 670 320 523 36 13 605 102 425 671 11
Empty queues:
670 320 671 11 102 123 523 13 45 605 425 36
Empty queues:
102 605 11 13 320 123 523 425 36 45 670 671
Empty queues:
11 13 36 45 102 123 320 425 523 605 670 671
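The three passes of the example can be reproduced with ten queues, one per digit. In the sketch below each queue is simply a row of a two-dimensional array; this layout and all variable names are assumptions, and the number of passes is fixed at 3 because the keys have at most three digits.

```pascal
program RadixSortDemo;

const n = 12;
type  list = array[1..n] of integer;
const start: list = (123, 45, 670, 320, 523, 36, 13, 605, 102, 425, 671, 11);

var a: list;
    queues: array[0..9, 1..n] of integer;   { one queue per digit 0..9 }
    size: array[0..9] of integer;           { number of items in each queue }
    pass, divisor, i, d, k: integer;

begin
  a := start;
  divisor := 1;                              { 1: units, 10: tens, 100: hundreds }
  for pass := 1 to 3 do
  begin
    for d := 0 to 9 do
      size[d] := 0;                          { empty queues }

    { distribute: append every key to the queue of its current digit }
    for i := 1 to n do
    begin
      d := (a[i] div divisor) mod 10;
      size[d] := size[d] + 1;
      queues[d, size[d]] := a[i];
    end;

    { collect: empty the queues in the order 0, 1, ..., 9 }
    k := 0;
    for d := 0 to 9 do
      for i := 1 to size[d] do
      begin
        k := k + 1;
        a[k] := queues[d, i];
      end;

    for i := 1 to n do
      write(a[i], ' ');
    writeln;
    divisor := divisor * 10;
  end;
end.
```

The three lines printed are the three lists of the example. The method works because each queue keeps its items in arrival order, so the ordering achieved by earlier passes is preserved within a queue.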
Hashing is a way of organising information that tries to optimise the speed for searching.
Its aim is to come up with a searching algorithm that runs in O(1) time.
Example.
We want to store information on 200 parts, each part identified by a key which is a number between
0 and 999999.
hash function
key -> index
hash of key
Knowing that the index will be a number between 0 and 999 we can declare our table as follows:
For storing and retrieving purposes we now use the hash function.
index   key
3       110003
4       010004
5       569005
7       100007
996     563996
997     103997
A problem arises when two records hash to the same index; we call this a collision or a hash
clash.
hash(110003) = 3
hash(583003) = 3
Example:
Example:
rehash(x) = (x + 1) mod 1000    (linear rehashing)
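Storing and retrieving with linear rehashing can be sketched as follows. The hash function key mod 1000 is inferred from the example values, and the rest is an assumption for illustration: the marker -1 for an empty cell, the names store and find, and the simplification that the table never becomes completely full.

```pascal
program HashDemo;

const tableSize = 1000;
      empty = -1;                      { marks a cell that holds no key }
type  hashTable = array[0..tableSize - 1] of longint;

var t: hashTable;
    i: integer;

function hash(key: longint): integer;
begin
  hash := key mod tableSize;
end;

function rehash(x: integer): integer;
begin
  rehash := (x + 1) mod tableSize;     { linear rehashing: try the next cell }
end;

procedure store(var t: hashTable; key: longint);
var index: integer;
begin
  index := hash(key);
  while t[index] <> empty do           { collision: keep rehashing }
    index := rehash(index);
  t[index] := key;
end;

{ returns the index of key, or -1 if key is not in the table }
function find(var t: hashTable; key: longint): integer;
var index: integer;
begin
  index := hash(key);
  while (t[index] <> empty) and (t[index] <> key) do
    index := rehash(index);
  if t[index] = key
    then find := index
    else find := -1;
end;

begin
  for i := 0 to tableSize - 1 do
    t[i] := empty;
  store(t, 110003);
  store(t, 583003);                    { collides with 110003 at index 3 }
  writeln(find(t, 110003));            { prints 3 }
  writeln(find(t, 583003));            { prints 4 }
  writeln(find(t, 100007));            { prints -1 }
end.
```

Searching follows exactly the same sequence of cells as storing did, which is why the colliding key 583003 is found again in cell 4.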
Example:
A final remark
The design of a good hash function is a complex task. It should minimise the number of collisions,
and, at the same time, use as little space as possible.
midsquare
key * key and then take the middle few digits
folding
break number into parts and combine them (adding/or-ing)
division method
mod
A graph consists of a set of nodes (or vertices) and a set of arcs (or edges). Each arc in a graph is
specified by a pair of nodes.
G nodes {a, b, c, d, e}
edges { (a, b), (a, d), (a, e), (b, c),
(c, e), (d, c), (d, e), (e, e)}
If the pairs of nodes that make up the arcs are ordered pairs, the graph is said to be a directed
graph (or digraph).
G nodes {a, b, c, d, e}
edges { <a, b>, <a, d>, <a, e>, <b, c>,
<c, e>, <d, c>, <d, e>, <e, e> }
A node n is incident to an arc x if n is one of the two nodes in the ordered pair of nodes that comprise
x. We also say that x is incident to n.
A path from a node to itself is called a cycle. If a graph contains a cycle it is cyclic, otherwise it is
acyclic.
A directed acyclic graph is called a dag.
Example.
G nodes = {a, b, c, d, e, f}
arcs = { <A, B>, <A, C>, <B, A>, <C, C>, <D, A>,
<D, C>, <D, F>, <D, D>, <F, C>, <F, D> }
B C
D E
Adjacency matrix.
1 2 3 4 5
1 F T F T F
2 F F F F F
3 F T T F T
4 F T F F F
5 F F T F F
arc = record
adj: boolean;
weight: integer;
end;
adjMatrix = array[1..maxNodes, 1..maxNodes] of arc;
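With these declarations, the 5-node example matrix shown above can be filled and read back as follows. The procedure join, the value 5 for maxNodes and the weight of 0 given to every arc are assumptions made for this sketch.

```pascal
program GraphDemo;

const maxNodes = 5;
type  arc = record
              adj: boolean;
              weight: integer;
            end;
      adjMatrix = array[1..maxNodes, 1..maxNodes] of arc;

var g: adjMatrix;
    i, j: integer;

{ add an arc from fromNode to toNode }
procedure join(var g: adjMatrix; fromNode, toNode: integer);
begin
  g[fromNode, toNode].adj := true;
end;

begin
  { start with a graph without arcs }
  for i := 1 to maxNodes do
    for j := 1 to maxNodes do
    begin
      g[i, j].adj := false;
      g[i, j].weight := 0;
    end;

  { the T entries of the example matrix }
  join(g, 1, 2);  join(g, 1, 4);
  join(g, 3, 2);  join(g, 3, 3);  join(g, 3, 5);
  join(g, 4, 2);
  join(g, 5, 3);

  { print, for every node, the nodes adjacent to it }
  for i := 1 to maxNodes do
  begin
    write(i, ':');
    for j := 1 to maxNodes do
      if g[i, j].adj then
        write(' ', j);
    writeln;
  end;
end.
```

Testing whether two nodes are adjacent is a single array access, but the matrix always takes maxNodes x maxNodes cells, however few arcs the graph has.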
1: 2 -> 4
2: (no adjacent nodes)
3: 2 -> 3 -> 5
4: 2
5: 3
A PERT graph is a weighted directed acyclic graph in which there is exactly one source and one sink.
The weights represent a time value.
Example.
bytes blocks
The unit of transfer between main storage and secondary storage is a block.
Disadvantage: slow
Advantage: can be implemented on any medium
Index area.
This area holds the index, which is usually split in different levels.
Example:
The first level index contains the highest key found on every cylinder. There is one first level index.
Key Cylinder-Index
150 1
300 2
570 3
. .
. .
. .
40000 249
50000 250
Key Surface-Index
20 1
50 2
75 3
. .
. .
. .
140 9
150 10
The third level index contains the highest key found on every sector. There is one third level index
for every cylinder/surface.
Key Sector-Index
1 1
5 2
7 3
. .
. .
. .
12 9
20 10
The index is kept on disk! When working with the file, relevant parts of the index are read in main
memory.
When overflow areas become full, access time becomes slower and the need for re-organising the
file arises.
Random files give the fastest (direct) access time to records but have the big disadvantage that
records cannot be accessed sequentially.
The techniques for these files are the hashing techniques in which record keys are translated into
disk addresses.
The blocks (sectors) act as buckets.
1 INFORMATION.
STATIC IMPLEMENTATION
node = RECORD
item: article;
next: ^node;
END;
stock = ^node;
DYNAMIC IMPLEMENTATION
2 ALGORITHMS.
Adding articles
Deleting articles *
Changing price of an article *
Changing description of article *
Update the stock *
Printing out the stock in an ordered way
* involves a search!
we prefer a binary search
only possible with tables
only possible when data is sorted
A. Initializing
STATIC DYNAMIC
NO SIGNIFICANT DIFFERENCE
B. Adding articles
STATIC DYNAMIC
OVERFLOW
SHIFTING
C. Deleting articles
STATIC DYNAMIC
SEARCH
SHIFTING
D. Changing price/description
STATIC DYNAMIC
SEARCH
E. Update stock
STATIC DYNAMIC
SEARCH
STATIC DYNAMIC
NO SIGNIFICANT DIFFERENCE
Based on algorithms
Static implementation will generally be faster
Based on storage
POINTER OVERHEAD
for each item there is a pointer which takes up memory space!
Example: assume 1 pointer takes up 4 bytes.
assume 1 article takes up 300 bytes.
assume there is static space reserved for 10000
articles of which 9500 are actually used.
STATIC:   10000 x 300 = 3000000 bytes taken from memory
           9500 x 300 = 2850000 bytes really used
            500 x 300 =  150000 bytes wasted
DYNAMIC:   9500 x 304 = 2888000 bytes used
           9500 x 4   =   38000 bytes of this are pointer overhead
The idea:
To sort an array A containing N elements...
Note that A[index] will remain in this position when the whole array is sorted.
If we repeat this process for the elements in the left and right subarrays, then
we end up with a completely sorted array.
An example:
1 2 3 4 5 6 7 8
30 60 50 40 20 80 75 35
myTable
Let's take the first element and rearrange myTable
1 2 3 4 5 6 7 8
20 30 50 40 60 80 75 35
myTable
Left part is OK (1 element)
1 2 3 4 5 6 7 8
20 30 35 40 50 60 75 80
myTable
1 2 3 4 5 6 7 8
20 30 35 40 50 60 75 80
myTable
1 2 3 4 5 6 7 8
20 30 35 40 50 60 75 80
myTable
3 4 5 6 7 8
50 40 60 80 75 35
3 4 5 6 7 8
50 40 60 80 75 35
3 4 5 6 7 8
50 40 60 80 75 35
3 4 5 6 7 8
50 40 35 80 75 60
3 4 5 6 7 8
50 40 35 80 75 60
3 4 5 6 7 8
50 40 35 80 75 60
-> when low > high then swap T[pivot] with T[high]
3 4 5 6 7 8
35 40 50 80 75 60
Round up:
Declarations:
Using quicksort:
quickSort(myTable, 1, max);
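A quicksort that follows the steps of the example (first element as pivot, low and high moving towards each other, pivot swapped with T[high] once low > high) is sketched below. The declarations and the procedure body are assumptions, since the original ones are not shown; only the call quickSort(myTable, 1, max) comes from the text.

```pascal
program QuickSortDemo;

const max = 8;
type  table = array[1..max] of integer;
const start: table = (30, 60, 50, 40, 20, 80, 75, 35);
var   myTable: table;
      i: integer;

procedure swap(var a, b: integer);
var temp: integer;
begin
  temp := a;  a := b;  b := temp;
end;

procedure quickSort(var T: table; first, last: integer);
var pivot, low, high: integer;
begin
  if first < last then
  begin
    pivot := first;
    low := first + 1;
    high := last;
    while low <= high do
    begin
      { look from the left for an element bigger than the pivot }
      while (low <= high) and (T[low] <= T[pivot]) do
        low := low + 1;
      { look from the right for an element not bigger than the pivot }
      while (low <= high) and (T[high] > T[pivot]) do
        high := high - 1;
      if low < high then
        swap(T[low], T[high]);
    end;
    swap(T[pivot], T[high]);          { the pivot is now in its final place }
    quickSort(T, first, high - 1);    { sort the left subarray }
    quickSort(T, high + 1, last);     { sort the right subarray }
  end;
end;

begin
  myTable := start;
  quickSort(myTable, 1, max);
  for i := 1 to max do
    write(myTable[i], ' ');
  writeln;       { prints 20 30 35 40 50 60 75 80 }
end.
```

After the first partitioning step the table reads 20 30 50 40 60 80 75 35, as in the example; the recursive calls then handle cell 1 and cells 3 to 8 separately.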
Time analysis:
BubbleSort O(N^2)
SelectionSort O(N^2)
InsertionSort O(N^2)