Data Structures
N.K. Tiwari
Director
Bansal Institute of Science & Technology
Bhopal (MP)
Jitendra Agrawal
Assistant Professor
Department of Computer Science & Engineering
Rajiv Gandhi Proudyogiki Vishwavidyalaya
Bhopal (MP)
Shishir K. Shandilya
Dean (Academics) and Professor & Head
Department of Computer Science & Engineering
Bansal Institute of Research & Technology
Bhopal (MP)
Published by
I.K. International Publishing House Pvt. Ltd.
S-25, Green Park Extension
Uphaar Cinema Market
New Delhi 110 016 (India)
E-mail: info@ikinternational.com
Website: www.ikbooks.com
ISBN: 978-93-84588-92-2
© 2016 I.K. International Publishing House Pvt. Ltd.
Preface
This is an introductory book for data structures as a core subject recommended for
beginners. This book focuses on data structures and algorithms for manipulating them.
Data structures for storing information in tables, lists, trees, queues and stacks are covered.
As a subject, Data Structures is suitable for B.E./B.Tech students of Computer
Science & Engineering and for M.C.A. students. It is also useful to working software
professionals and programmers for understanding commonly used data structures and
algorithm techniques. Familiarity with C programming is assumed of all readers. To
understand the material in this book one should be comfortable enough in a programming
language to work with variables, arithmetic expressions, if-else conditions, loops,
subroutines, pointers, structures, and recursion.
The purpose of this book is to cover all the important aspects of the subject. An attempt
has also been made to illustrate the working of algorithms with self-explanatory examples.
Outline
The book is organized in ten chapters; each chapter includes problems and programming
examples.
N.K. Tiwari
Jitendra Agrawal
Shishir K. Shandilya
Contents
Preface
1. Introduction
1.1 Information
1.2 Basic Terminologies
1.3 Common Structures
1.4 Abstract Data Type
1.5 Specification
1.6 Layered Software
1.7 Data Structure
1.8 Algorithms
2. Array
2.1 Introduction
2.2 Uses
2.3 Array Definition
2.4 Representation of Array
2.5 Ordered List
2.6 Sparse Matrices
2.7 Storage Pool
2.8 Garbage Collection
3. Recursion
3.1 Introduction
3.2 Recursion
3.3 Tower of Hanoi
3.4 Backtracking
4. Stack
4.1 Definition and Examples
4.2 Data Structure of Stack
4.3 Disadvantages of Stack
4.4 Applications of Stack
4.5 Expressions (Polish Notation)
1
INTRODUCTION
In computer science, a data structure is a way of storing data in a computer so that it can
be used efficiently. Often a carefully chosen data structure will allow a more efficient
algorithm to be used. The choice of the data structure often begins with the choice of an
abstract data type. A well-designed data structure allows a variety of critical
operations to be performed using as few resources, in both execution time and
memory space, as possible. After the data structures are chosen, the algorithms
to be used often become relatively obvious. Sometimes things work in the opposite
direction: data structures are chosen because certain key tasks have algorithms that work
best with a particular data structure.
This insight has given rise to many formalized design methods and programming
languages in which data structures, rather than algorithms, are the key organizing factors.
Most languages feature some sort of a module system, allowing data structures to be
safely reused in different applications by hiding their verified implementation details
behind controlled interfaces. Object-oriented programming languages such as C++ and
Java in particular use objects for this purpose. Since data structures are so crucial to
professional programs, many of them enjoy extensive support in the standard libraries of
modern programming languages and environments, such as the C++ Standard Template
Library, the Java API, and the Microsoft .NET Framework.
1.1 INFORMATION
Computer science is fundamentally the study of information. Information is
associated with an attribute or a set of attributes of a situation or an object; for example,
the number of students in a class, the length of a hall, or the make of a computer. To
explain and transmit these abstract properties, they are represented in some way, and
these representations convey the knowledge or information. As a result of frequent and
well-understood use, these representations have come to be accepted as being the
information they convey.
The basic unit of information is the data; information is a collection of data. When data
is processed or organized, it gives meaningful and logical knowledge, and it becomes
information. For example, a student's record may contain data such as a first
name and a last name. Data can be numerical, character, symbol or any other kind of
information.
A data type consists of a domain (a set of values), and a set of operations. A data type is
a term which refers to kinds of data that variables may hold in the programming language.
The data is stored in the memory at some location. By using the name of the variable, one
can access the data from that memory location easily. For example in C, the data types
are int (integer value), float (floating point value), char (character), double (real value of
large range) etc.
Data types are divided into the following categories: built-in data types (primitive data)
and user-defined data types (non-primitive data). Generally, a programming language
supports a set of built-in data types and allows a user to define new types, which are
called user-defined data types.
1. Built-in data type: These are basic data that are directly operated upon by machine
instructions. They have different representations on different computers. Examples are
int, float, char and double, which are defined by the programming language itself.
2. User-defined data type: These are more sophisticated data, which are derived from
the primitive data. User-defined data emphasize structuring a group of
homogeneous or heterogeneous data items. With the set of built-in data types a user
can define his own data types such as arrays, lists, stacks, queues, files, etc.
Example: Consider the data type fraction. How can we specify the domain and operations
that define fractions? It seems straightforward to name the operations; fractions are
numbers, so all the normal arithmetic operations apply, such as addition, multiplication
and comparison. In addition, there might be some fraction-specific operations, such as
normalization of a fraction by removing common terms from its numerator and
denominator. For example, if we normalize 6/9 we'd get 2/3.
But how do we specify the domain for fractions, i.e. the set of possible values for a
fraction?
Structural and Behavioral Definitions
There are two different approaches to specifying a domain: we can give a structural
definition or a behavioral definition. Let us see what these two are like.
Structural Definition of the domain for fraction
The value of a fraction is made of three parts (or components):
A sign, which is either + or −
A numerator, which may be any non-negative integer
A denominator, which may be any positive integer (not zero, not negative).
This is called a structural definition because it defines the values of the type fraction
by imposing an internal structure on them (they have three parts). The parts themselves
have specific types, and there may be further constraints. For example, we could have
insisted that a fraction's numerator and denominator have no common divisor (in that case
we wouldn't need the normalization operation; 6/9 would not be a fraction by this
definition).
Behavioral definition of the domain for fraction
The alternative approach for defining the set of values for fractions does not impose any
internal structure on them. Instead, it just adds an operation that creates fractions out of
other things, such as
CREATE_FRACTION (N, D)
where N is any integer and D is any non-zero integer.
The values of the type fraction are defined to be the values that are produced by this
function for any valid combination of inputs.
The parameter names were chosen to suggest its intended behavior:
CREATE_FRACTION (N, D) should return a value representing the fraction N/D (N for
numerator and D for denominator).
CREATE_FRACTION could, in principle, be any function at all. How do we guarantee that
CREATE_FRACTION (N, D) actually returns the fraction N/D?
The answer is that we have to constrain the behavior of this function by relating it to the
other operations on fractions. For example, one of the key properties of multiplication is:
NORMALIZE ((N/D) * (D/N)) = 1/1
This turns into a constraint on CREATE_FRACTION:
NORMALIZE (CREATE_FRACTION (N, D) * CREATE_FRACTION (D, N)) =
CREATE_FRACTION (1, 1)
CREATE_FRACTION cannot be just any function; its behavior is highly constrained,
because we can write down a lot of constraints like this.
In this type of definition, the domain of a data type (the set of permissible values) plays
an almost negligible role. Any set of values will do, as long as we have an appropriate set
of operations to go along with it.
An ADT is useful for handling a data type correctly. An ADT always states what is to be
done, but not how it is to be done. Note that in the example above we have only given the
operations; how they are to be implemented is not given. Thus, an ADT gives only an
abstract representation of the data structure.
In a real application, we would like to experiment with many different implementations,
in order to find the implementation that is most efficient in terms of memory and speed
for our specific application. And, if our application changes, we would like to have the
freedom to change the implementation so that it is the best for the new application.
Equally important, we would like our implementation to give us simple implementations
of the operations. It is not always obvious from the outset how to get the simplest
implementation; so, again, we need to have the freedom to change our implementation.
What is the cost we must pay in order to change the implementation? We have to find
and change every line of code that depends upon the specific details of the implementation
(e.g. available operations, naming conventions, details of syntax; for example, the two
implementations of fractions given above differ in how you refer to the components: one
uses the dot notation for structures, the other the bracketed index notation for arrays). This
can be very expensive and can run a high risk of introducing bugs.
Programming with Abstract Data Types
By organizing our program this way, i.e. by using abstract data types, we can change
implementations extremely quickly: all we have to do is re-implement three very trivial
functions, no matter how large our application is.
In general terms, an abstract data type is a specification of values and operations that
has two properties:
It specifies everything you need to know in order to use the data type.
It makes absolutely no reference to the manner in which the data type will be
implemented.
When we use an abstract data type, our program divides into two pieces as shown in
Figure 1.1.
The application: the part that uses the abstract data type.
The implementation: the part that implements the abstract data type.
These two pieces are completely independent. It should be possible to take the
implementation developed for one application and use it for a completely different
application with no changes.
If programming is done in teams, the implementers and application writers can work
completely independently once the specification is set.
1.5 SPECIFICATION
Let us now look in detail at how we specify an abstract data type. We will use stack as an
example.
The data structure stack is based on the everyday notion of a stack, such as a stack of
books, a stack of plates or a stack of folded towels. The defining property of a stack is that
you can only access the top element of the stack. All the other elements are underneath the
top one, and they can't be accessed except by removing all the elements above them one
at a time.
The notion of a stack is extremely useful in computer science, and it has many
applications. It is so widely used that microprocessors often are stack-based or at least
provide hardware implementations of the basic stack operations.
We will briefly consider some of the applications later. First, let us see how we can
define, or specify, the abstract concept of a stack. The main point to notice here is how we
specify everything needed in order to use stacks, without any mention of how the
stacks will be implemented.
Pre- & Postconditions
Preconditions
These are properties about the inputs that are assumed by an operation. If they are satisfied
by the inputs, the operation is guaranteed to work properly. If the preconditions are not
satisfied, the behavior of the operation is unspecified. It might work properly (by chance),
it might return an incorrect answer, or it might crash.
Postconditions
These specify the effects of an operation. They are the only things that you may assume
to have been done by the operation. They are only guaranteed to hold if the preconditions
are satisfied.
Note: The definition of the values of type stack makes no mention of an upper bound on
the size of a stack. Therefore, the implementation must support stacks of any size. In
practice, there is always an upper bound: the amount of computer storage available. This
limit is not explicitly mentioned, but is understood; it is an implicit precondition on all
operations that there is storage available, as needed. Sometimes this is made explicit, in
which case it is advisable to add an operation that tests whether there is sufficient storage.
2. Programs won't bomb mysteriously; errors will be detected (and reported) at the
earliest possible moment. This is not true when the user checks preconditions, because
the user is human and occasionally might forget to check, or might think that checking
was unnecessary when in fact it was needed.
3. Most important of all, if we ever change the specification, and wish to add, delete, or
modify preconditions, we can do this easily, because the precondition occurs in
exactly one place in our program.
There are arguments on both sides. This textbook specifies that procedures should signal
an error if their preconditions are not satisfied. This means that these procedures must
check their own preconditions. That is what our model solution will do too. We thereby
sacrifice some efficiency for a high degree of maintainability and robustness.
This illustrates an important, general idea: the idea of layered software. In Figure 1.2
there are two layers: the application layer and the implementation layer. The critical
point, the property that makes these truly separated layers, is that the functionality of the
upper layer and the code that implements that functionality are completely independent of
the code of the lower layer. Furthermore, the functionality of the lower layer is completely
described in the specification.
We have already discussed how this arrangement permits very rapid, bug-free changes to
the code implementing an abstract data type. But this is not the only advantage.
Reusability
Another great advantage is that the abstract data type (implemented in the lower layer) can
be readily reused: nothing in it depends critically on the application layer (neither its
functionality nor its coding details). An abstract type like stack has extremely diverse
uses in computer science, and the same well-specified, efficient implementation can be
used for all of them (although always keep in mind that there is no universal, optimally
efficient implementation, so efficiency gains by re-implementation are always possible).
Abstraction in Software Engineering
Libraries of abstract data types are a very effective way of extending the set of data types
provided by a programming language, which themselves constitute a layer of abstraction,
the so-called virtual machine, above the actual data types supported by the hardware. In
fact, in an ordinary programming environment there are several such layers of software.
Thus traversing, inserting, printing and searching would be the operations required to
perform these tasks on the elements. Thus, the data object (integer elements) and the set
of operations together form the data structure Array.
Basic Operation of Data Structures
The data or elements appearing in our data structures are processed by means of certain
operations. In fact, the particular data structure that one chooses for a given situation
depends largely on the frequency with which specific operations are performed.
The following four operations play a major role in data processing on data structures:
1. Traversing: Accessing each record exactly once so that certain items in the record
may be processed. This accessing and processing is sometimes called visiting the
record.
2. Inserting: Adding a new record to the structure.
3. Deleting: Removing a record from the structure.
4. Searching: Finding the location of a record with a given key value, or finding the
locations of all records which satisfy one or more conditions.
Sometimes two or more of the operations may be used in a given situation. The following
two operations are used in special situations:
1. Sorting: Arranging the records in some logical order.
2. Merging: Combining the records in two different sorted files into a single sorted file.
Classification of Data Structures
Data structures are normally divided into two broad categories. Figure 1.3 shows various
types of data structures. Linear data structures are those in which the data is arranged in a
straight sequence, i.e. consecutively or in a list; for example, arrays, stacks, queues and
lists. Non-linear data structures are those in which the data is not arranged in a sequence
but rather in a hierarchical manner; for example, trees and graphs.
1.8 ALGORITHMS
An algorithm is composed of a finite set of steps, each of which may require one or more
operations. The program development process consists of the following steps:
1. Feasibility study.
2. Requirement analysis and problem specification.
3. Design.
4. Coding.
5. Debugging.
6. Testing.
7. Maintenance.
Let us discuss each step one by one.
Feasibility study
In the feasibility study, the problem is analyzed to decide whether it is feasible to develop
a program for the given problem statement. Only if we find that it is really essential to
develop a computer program for the given problem will the further steps be carried out.
Requirement analysis and problem specification
In this step, the programmer has to find out the essential requirements for solving the
given problem. For that, the programmer has to communicate with the users of his
software. The programmer then has to decide what inputs are needed for the program, in
which form the inputs are to be given, the order of the inputs, and what kind of output
should be generated. Thus, the total requirement for the program has to be analyzed. It is
also essential to analyze what the possible errors in the program could be. After deciding
the total requirements for solving the problem, one can make the problem statement
specific.
Design
Once the requirement analysis is done, the design can be prepared using the problem
specification document. In this phase of development, a layout for developing the
program has to be decided. In this step, the algorithm has to be designed for the most
suitable data structure. Then an appropriate programming language has to be selected
for implementing the given algorithm. The design of the algorithm and the selection of
data structures are the two key issues in this phase.
Coding
When the design of the program is ready, coding becomes a simpler job. If we have
already decided on the language of implementation, we can start writing the code simply
by breaking the problem into small modules. If we write functions for these modules and
interface them in the desired order, the code gets ready. The final step in coding is
producing well-documented, well-formed code.
Debugging
In this phase we compile the code and check for errors. If any error is found, we try to
eliminate it. Debugging needs a complete scan of the program.
Testing
In the testing phase, a certain set of data is given to the program as input. The program
should show the desired results as output. The output should vary according to the
input of the program. For wrong input, the program should terminate or display an
error message; it should not go into an infinite loop.
Maintenance
Once the code is ready and tested properly, if the user later requires some modifications
in the code, those modifications should be easy to carry out. If the programmer has to
rewrite the code, it is because of poor design of the program. Modularity in the code has
to be maintained.
Documentation
Documentation is not a separate step in the program development process; it is required
at every step. Documentation means providing help or a manual which will help the user
make use of the code in the proper way. It is a good practice to maintain some kind of
document for every phase of the development process.
We have already discussed the fundamentals of algorithms. Writing an algorithm is an
essential step in the program development process. The efficiency of the algorithm is
directly related to the efficiency of the program: if the algorithm is efficient, then the
program becomes efficient.
Analysis of Programs
The analysis of a program does not mean simply checking that the program works, but
checking whether it works for all possible situations. The analysis also involves checking
that the program works efficiently, in the following sense:
1. The program requires a small amount of storage space.
2. The program gets executed in a small amount of time.
Time and space are the factors which determine the efficiency of the program. The time
required for execution of the program cannot be computed in terms of seconds because of
the following factors:
1. The hardware of the machine.
2. The amount of time required by each machine instruction.
3. The amount of time required by the compiler to translate the instructions.
4. The instruction set.
Hence, we will assume that time required by the program to execute means the total
number of times the statements get executed.
Complexity of an Algorithm
The analysis of algorithms is a major task in computer science. In order to compare
algorithms, there must be some criteria to measure the efficiency of an algorithm. An
algorithm can be evaluated by a variety of criteria, most commonly the rate of growth of
the time or space required to solve larger and larger instances of a problem.
The three cases one usually investigates in complexity theory are as follows:
1. Worst case: The worst case time complexity is the function defined by the maximum
amount of time needed by an algorithm for an input of size n. Thus, it is the
function defined by the maximum number of steps taken on any instance of size n.
2. Average case: The average case time complexity is the execution of an algorithm
having typical input data of size n. Thus, it is the function defined by the average
number of steps taken on any instance of size n.
3. Best case: The best case time complexity is the minimum amount of time that an
algorithm requires for an input of size n. Thus, it is the function defined by the
minimum number of steps taken on any instance of size n.
Space Complexity: The space complexity of a program is the amount of memory it needs
to run to completion. The space needed by a program is the sum of the following
components:
A fixed part that includes space for the code, space for simple variables and fixed-size
component variables.
A variable part that consists of the space needed by component variables whose size is
dependent on the particular problem.
The space requirement S (P) of any algorithm P may therefore be written as
S (P) = c + Sp
where c is a constant and Sp denotes instance characteristics.
Time Complexity: The time complexity of an algorithm is the amount of computer time it
needs to run to completion. The time T(P) taken by a program P is the sum of the
compilation time and the run (or execution) time. The compilation time does not depend
on the instance characteristics. We assume that a compiled program will be run several
times without recompilation, so we concern ourselves with just the run time of a program.
This run time is denoted by Tp (instance characteristics).
If we knew the characteristics of the compiler to be used, we could proceed to determine
the number of additions, subtractions, multiplications, divisions, compares, stores and so
on, that would be made by the code for P. Then we could obtain an expression of the form
Tp(n) = ca ADD(n) + cs SUB(n) + cm MUL(n) + …
where n denotes the instance characteristics; ADD(n), SUB(n), MUL(n) and so on denote
the numbers of additions, subtractions and multiplications performed; and ca, cs, cm and
so on denote the time needed for one addition, subtraction and multiplication, respectively.
Efficiency of algorithms
If we have two algorithms that perform the same task, and the first one has a computing
time of O(n) and the second of O(n²), then we usually prefer the first one.
The reason for this is that as n increases, the time required for the execution of the second
algorithm grows far faster than the time required for the execution of the first. The
common growth functions, in increasing order of growth, are:
log2 n < n < n log2 n < n² < n³ < 2ⁿ
Notice how the times O(n) and O(n log2 n) grow much more slowly than the others. For
large data sets, algorithms with a complexity greater than O(n log2 n) are often
impractical. The slowest algorithm here is the one having the time complexity 2ⁿ.
Example: Consider F(n) = 2n + 2 and g(n) = n². If n = 2 then,
F(n) = 2n + 2
= 2(2) + 2
F(n) = 6
and g(n) = n²
= (2)²
g(n) = 4
i.e. F(n) > g(n)
If n = 3 then,
F(n) = 2n + 2
= 2(3) + 2
F(n) = 8
and g(n) = n²
= (3)²
g(n) = 9
i.e. F(n) < g(n)
Hence, we can conclude that for n > 2 we obtain
F(n) < g(n)
Thus, the big O notation always gives an upper bound on the running time.
Omega Notation
Omega notation is denoted by Ω. This notation is used to represent the lower bound of
an algorithm's running time. Using the omega notation we can denote the minimum
amount of time taken by an algorithm.
Definition
A function F(n) is said to be in Ω(g(n)) if F(n) is bounded below by some positive
constant multiple of g(n), such that
F(n) ≥ c * g(n) for all n ≥ n0
It is denoted as F(n) ∈ Ω(g(n)). The following graph illustrates the curve for the Ω
notation.
Example: Consider F(n) = 2n² + 5 and g(n) = 7n. If n = 0,
F(n) = 2(0)² + 5
= 5
g(n) = 7(0)
= 0, i.e. F(n) > g(n)
But if n = 1,
F(n) = 2(1)² + 5
= 7
g(n) = 7(1)
= 7, i.e. F(n) = g(n)
If n = 2,
F(n) = 2(2)² + 5
= 13
g(n) = 7(2)
= 14, i.e. F(n) < g(n)
If n = 3,
F(n) = 2(3)² + 5
= 23
g(n) = 7(3)
= 21, i.e. F(n) > g(n)
Hence, we can conclude that for n ≥ 3 we obtain F(n) ≥ c * g(n). It can be
represented as 2n² + 5 ∈ Ω(n).
Thus, the Ω notation always gives a lower bound on the running time.
Θ Notation
The theta notation is denoted by Θ. By this method, the running time is bounded between
an upper bound and a lower bound.
Definition
Let F(n) and g(n) be two non-negative functions. There are two positive constants,
namely c1 and c2, such that
c1 * g(n) ≤ F(n) ≤ c2 * g(n) for all n ≥ n0
Thus, we can say that
F(n) ∈ Θ(g(n))
2
ARRAY
2.1 INTRODUCTION
In computer programming, an array (also known as a vector or list) is one of the simplest
data structures. An array is a non-primitive, linear data structure. An array
holds a series of data elements, usually of the same size and data type. Individual elements
are accessed by an index using a consecutive range of integers, as opposed to an
associative array. Some arrays are multi-dimensional, i.e. they are indexed by a fixed
number of integers, for example by a quadruple of four integers. Generally, one- and
two-dimensional arrays are the most common.
The fundamental data types are char, int, float, and double. Although these types are very
useful, they are constrained by the fact that a variable of these types can store only one
value at any given time. Therefore, they can be used to handle limited amounts of data. In
many applications, however, we need to handle a large volume of data in terms of reading,
processing and printing. To process such large amounts of data, we need a powerful data
type that would facilitate efficient storing, accessing and manipulation of data items. C
supports a derived data type known as Array that can be used for such applications.
Most programming languages have arrays as a built-in data type. Some programming
languages (such as Fortran, C, C++, and Java) generalize the available operations and
functions to work transparently over arrays as well as scalars, providing higher-level
manipulation than most other languages, which require loops over all the individual
members of the arrays.
2.2 USES
Although useful in their own right, arrays also form the basis for several more complex
data structures, such as heaps, hash tables and lists, and can represent strings, stacks and
queues. They also play a minor role in many other data structures. All of these
applications benefit from the compactness and locality of arrays.
One of the disadvantages of an array is that it has a single fixed size; although its size
can be altered in many environments, this is an expensive operation. Dynamic arrays are
arrays which automatically perform this resizing as late as possible, when the programmer
attempts to add an element to the end of the array and there is no more space. To amortize
the high cost of resizing over a long period of time, such arrays grow by a large amount
when they do resize; until the array must be expanded again, adding elements simply uses
more of this reserved space.
In the C programming language, one-dimensional character arrays are used to store
null-terminated strings, so called because the end of the string is indicated with a special
reserved character called the null character.
An array index can begin with the number 0, that is, A[0] is allowed. For example, if we
want to represent a set of five numbers, say (12, 23, 33, 45, 54), by an array variable A,
then we may declare it as
int A[5];
and the computer reserves five consecutive storage locations as shown below:
Now let us see how to handle this array. We will write a simple C++ program in which
we simply store the elements and then print them.
#include<iostream.h>
#include<conio.h>
main( )
{
  int a[5];
  clrscr( );
  cout<< "Enter the elements which you want to store" <<endl;
  for (int i = 0; i < 5; i++)
  {
    cin>> a[i];
  }
  cout<< "The stored elements in the array are" <<endl;
  for (int i = 0; i < 5; i++)
  {
    cout<< a[i] <<endl;
  }
  getch( );
}
Two-dimensional arrays are called matrices in mathematics and tables in business
applications. Hence, two-dimensional arrays are also called matrix arrays.
A two-dimensional m × n array A is a collection of m * n data elements such that each
element is specified by a pair of integers I and J, called subscripts, with
1 ≤ I ≤ m and 1 ≤ J ≤ n
The element of A with first subscript I and second subscript J will be denoted by
A[I, J]
There is a standard way of drawing a two-dimensional m × n array A where the elements
of A form a rectangular array with m rows and n columns and where the element A[I, J]
appears in row I and column J. One such two-dimensional array with 3 rows and 4
columns is shown in Figure 2.2.
A [3, 4] where m = 3 is number of rows, and n = 4 is number of columns.
To calculate the address of an arbitrary element A[I, J], first compute the
address of the first element of row I and then add the quantity J * size. Therefore, the
address of A[I, J] is:
Base(A) + (I * n + J) * size
For example, the array A[3, 4] is stored as in Figure 2.3. The base address is 200. Here
m = 3, n = 4 and size = 1. Then the address of A[1, 2] is computed as
= 200 + (1 * 4 + 2) * 1
= 206
Column major representation
If the elements are stored in a column-wise manner then it is called column major
representation. It means that the complete first column is stored and then the complete
second column is stored and so on.
Example: If we want to store elements 10, 20, 30, 40, 50, 60, 70, 80, 90,100,110,120 then
the elements will be filled up in a column-wise manner as follows (consider the array A
[3,4]).
For example, the array A [3, 4] is stored as in Figure 2.3. The base address is 200. Here
m = 3, n = 4 and size = 1. Then the address of A [1, 2] is computed as
= 200 + (3 * (2 - 0) + (1 - 0)) * 1
= 207
Example 2.1: Consider the integer array int A [3,4] declared. If the base address is 1000,
find the address of the element A [2, 3] with row major and column major representation
of array.
Solution:
Row major representation
Given that base address = 1000, the size of an integer = 2 bytes, m = 3, n = 4, I = 2, J = 3.
Then A [2, 3] = Base (A) + (I * n + J) * size
= 1000 + ( 2 * 4 + 3) * 2
= 1022
Column major representation
Given that base address = 1000, the size of an integer = 2 bytes, m = 3, n = 4, I = 2, J = 3,
L1 = 0, L2 = 0.
Then A [2, 3] = base address + (m * (J - L2) + (I - L1)) * size
= 1000 + (3 * (3 - 0) + (2 - 0)) * 2
= 1022
We will write a simple C++ program in which we store elements in a two-dimensional
array, print those stored elements, and then perform matrix addition.
Program:
#include<iostream.h>
#include<conio.h>
void main()
{
int a[3][3],b[3][3],c[3][3],i,j;
clrscr();
cout<<"Enter the value of first matrix"<<endl;
for(i=0;i<=2;i++)
{ for(j=0;j<=2;j++)
{ cin>>a[i][j];
}
}
cout<<"First matrix is"<<endl;
for(i=0;i<=2;i++)
{ for(j=0;j<=2;j++)
{ cout<<a[i][j]<<" ";
}
cout<<endl;
}
cout<<"Enter second matrix"<<endl;
for(i=0;i<=2;i++)
{ for(j=0;j<=2;j++)
{ cin>>b[i][j];
}
}
cout<<"Second is"<<endl;
for(i=0;i<=2;i++)
{ for(j=0;j<=2;j++)
{ cout<<b[i][j]<<" ";
}
cout<<endl;
}
cout<<"Addition of matrix"<<endl;
for(i=0;i<=2;i++)
{ for(j=0;j<=2;j++)
{ c[i][j]=a[i][j]+b[i][j];
}
}
for(i=0;i<=2;i++)
{ for(j=0;j<=2;j++)
{ cout<<c[i][j]<<" ";
}
cout<<endl;
}
getch();
}
Output of the Program
Enter the value of first matrix
1
2
3
4
5
6
7
8
9
First matrix is
1 2 3
4 5 6
7 8 9
Enter second matrix
14
10
12
13
11
25
23
26
22
Second is
14 10 12
13 11 25
23 26 22
Addition of matrix
15 12 15
17 16 31
30 34 31
In the above C++ code, the for loop is used to store the elements in an array. By this the
elements will be stored from location 0 to n-1. Similarly, for retrieval of elements again a
for loop is used.
int i, m, n, a[10][3];
cout << "How many rows and columns? ";
cin >> m;
cin >>n;
for ( i =0; i <m; i++)
2.5.1 Polynomials
One classic example of an ordered list is a polynomial. A polynomial is a sum of terms,
each consisting of a variable, a coefficient and an exponent.
Various operations which can be performed on a polynomial are:
1. Addition of two polynomials
2. Multiplication of two polynomials.
3. Evaluation of polynomials.
An array structure can be used to represent the polynomial.
Representation of array polynomial using single dimensional array
For representing a single-variable polynomial one can make use of one-dimensional array.
In a single-dimensional array the index of an array will act as the exponent and the
coefficient can be stored at that particular index which can be represented as follows:
Example: 3x^4 + 5x^3 + 7x^2 + 10x - 19
This polynomial can be stored in single dimensional array.
3
RECURSION
3.1 INTRODUCTION
Recursion is a programming technique that allows the programmer to express operations
in terms of themselves. In C++, this takes the form of a function that calls itself. A useful
way to think of recursive functions is to imagine them as a process being performed where
one of the instructions is to repeat the process. This makes it sound very similar to a loop
because it repeats the same code, and in some ways it is similar to looping. On the other
hand, recursion makes it easier to express ideas in which the result of the recursive call is
necessary to complete the task. It must be possible for the process to sometimes be
completed without the recursive call. One simple example is the idea of building a wall
that is ten feet high. If I want to build a ten-foot-high wall, I will first build a nine-foot-high
wall and then add an extra foot of bricks. Conceptually, this is like saying the
build-wall function takes a height and, if that height is greater than one, first calls itself to
build a lower wall, and then adds one foot of bricks.
3.2 RECURSION
Recursion is a programming technique in which the function calls itself repeatedly for
some input. Recursion is a process of doing the same task again and again for some
specific input.
Recursion is:
A way of thinking about problems.
A method for solving problems.
Related to mathematical induction.
A method is recursive if it can call itself, either directly:
void f( ) {
f( );
}
or indirectly:
void f( ) {
g( );
}
void g( ) {
f( );
}
A recursion is said to be direct if a subprogram calls itself. It is indirect if there is a
sequence of more than one subprogram call which eventually calls the first subprogram:
such as a function f calls a function g, which in turn calls the function f.
We can trace this computation in the same way that we trace any sequence of function
calls.
factorial(6)
  factorial(5)
    factorial(4)
      factorial(3)
        factorial(2)
          factorial(1)
            return 1
          return 2*1 = 2
        return 3*2 = 6
      return 4*6 = 24
    return 5*24 = 120
  return 6*120 = 720
Our factorial( ) implementation exhibits the two main components that are required for
every recursive function.
The base case returns a value without making any subsequent recursive call. It does this
for one or more special input values for which the function can be evaluated without
recursion. For factorial( ), the base case is N = 1.
The reduction step is the central part of a recursive function. It relates the function at one
(or more) inputs to the function evaluated at one (or more) other inputs. Furthermore, the
sequence of parameter values must converge to the base case. For factorial(), the reduction
step is N*factorial(N - 1) and N decreases by one for each call, so the sequence of
parameter values converges to the base case of N = 1.
A Factorial program in C++
#include<iostream.h>
#include<conio.h>
void main()
{
int n,fact;
int rec(int);
clrscr();
cout<<"Enter the number :-> ";
cin>>n;
fact=rec(n);
cout<<endl<<"Factorial Result is :: "<<fact<<endl;
getch();
}
int rec(int x)
{
int f;
if(x==1)
return(x);
else
{
f=x*rec(x-1);
return(f);
}
}
Output of Program
Enter the number :-> 6
Factorial Result is :: 720
0 1 1 2 3 5 8 13 21 34
Each number in this sequence is the sum of two preceding elements. The series can be
formed in this way:
0thelement + 1stelement = 0 + 1 = 1
1stelement + 2ndelement = 1 + 1 = 2
2ndelement + 3rdelement = 1 + 2 = 3 so on.
Following the definition:
fibo(n) = if (n = 0) then 0
if (n = 1) then 1
else
fibo(n-1) + fibo(n-2)
We can define the Fibonacci sequence by the recursive function
int fibo( int n )
{
if ( (n == 0) || (n == 1) ) return n;
else
return fibo(n-1) + fibo(n-2);
}
Output of Program
Enter the total elements in the series: 6
The Fibonacci series is:
0 1 1 2 3 5
void head(int n)
{
if(n == 0)
return;
else
{
head(n-1);
print(n);
}
}
void tail(int n)
{
if(n == 1)
return;
else
{
print(n);
tail(n-1);
}
}
A function with a single recursive call at the beginning of its path uses head recursion.
The factorial function shown earlier uses head recursion: the first
thing it does, once it determines that recursion is needed, is call itself with the decremented
parameter.
A function with a single recursive call at the end of a path uses tail recursion. Most
examples of head and tail recursion can easily be converted into a loop, and most loops can
be naturally converted into head or tail recursion.
Iteration                                           Recursion
1. It is a process of executing a statement or      It is the technique of defining anything
   a set of statements until some specified         in terms of itself.
   condition is satisfied.
2. The iterative methods are more efficient         Recursive methods are less efficient
   because of better execution speed.               because of the overhead of repeated
                                                    function calls.
3. It is simple to implement.                       It is complex to implement.
Figure 3.1
The solution of this problem is very simple. The solution can be stated as
1. Move top n-1 disks from A to B using C as auxiliary.
2. Move the remaining disk from A to C.
3. Move the n-1 disks from B to C using A as auxiliary.
The above is a recursive algorithm: to carry out steps 1 and 3, apply the same algorithm
again for n - 1. The entire procedure takes a finite number of steps, since at some point the
algorithm will be required for n = 1. This step, moving a single disc from peg A to peg B,
is trivial.
We can convert it to
Move disk 1 from A to B.
Move disk 2 from A to C.
Move disk 1 from B to C.
Figure 3.2
Figure 3.3
Figure 3.4
Figure 3.5
Actually, we have moved n - 1 disks from peg A to C. In the same way we can move the
remaining disks from A to C.
Code for Program of Tower of Hanoi in C++
#include <iostream.h>
#include <conio.h>
void tower(int a,char from,char aux,char to){
if(a==1){
cout<<"\t\tMove disc 1 from "<<from<<" to "<<to<<"\n";
return;
}
else{
tower(a-1,from,to,aux);
cout<<"\t\tMove disc "<<a<<" from "<<from<<" to "<<to<<"\n";
tower(a-1,aux,from,to);
}
}
void main(){
clrscr();
int n;
cout<<"\n\t\t*****Tower of Hanoi*****\n";
cout<<"\t\tEnter number of discs : ";
cin>>n;
cout<<"\n\n";
tower(n,'A','B','C');
getch();
}
Output of Program
*****Tower of Hanoi*****
Enter number of discs: 2
Move disk 1 from A to B.
Move disk 2 from A to C.
Move disk 1 from B to C.
3.4 BACKTRACKING
Backtracking is a technique used to solve problems with a large search space that
systematically tries and eliminates possibilities. The name backtrack was first coined by
D.H. Lehmer in the 1950s. A standard example of backtracking would be going through a
maze. At some point in a maze, you might have two options of which direction to go. One
strategy would be to try going through portion A of the maze. If you get stuck before you
find your way out, then you backtrack to the junction. At this point in time you know
that portion A will NOT lead you out of the maze, so you then start searching in portion B.
Clearly, at a single junction you could have even more than two choices. The backtracking
strategy says to try each choice, one after the other; if you ever get stuck, backtrack to
the junction and try the next choice. If you try all choices and never find a way out, then
there is no solution to the maze.
In this example we drew a picture of a tree. The tree is an abstract model of the possible
sequences of choices we could make. There is also a data structure called a tree, but
usually we don't have a data structure to tell us what choices we have (if we do have an
actual tree data structure, backtracking on it is called depth-first tree searching.).
The backtracking algorithm:
Here is the algorithm (in pseudocode) for doing backtracking from a given node n:
boolean solve(Node n) {
if n is a leaf node {
if the leaf is a goal node, return true
else return false
} else {
for each child c of n {
if solve(c) succeeds, return true
}
return false
}
}
Notice that the algorithm is expressed as a Boolean function. This is essential to
understanding the algorithm. If solve(n) is true, that means that node n is part of a
solution; that is, node n is one of the nodes on a path from the root to some goal node. We
say that n is solvable. If solve(n) is false, then there is no path that includes n to any goal
node.
How does this work?
If any child of n is solvable, then n is solvable.
If no child of n is solvable, then n is not solvable.
Hence, to decide whether any non-leaf node n is solvable (part of a path to a goal node),
all you have to do is test whether any child of n is solvable. This is done recursively, on
each child of n. In the above code, this is done by the lines
for each child c of n {
if solve(c) succeeds, return true
}
return false
Eventually, the recursion will bottom out at a leaf node. If the leaf node is a goal node,
it is solvable. If the leaf node is not a goal node, it is not solvable. This is our base case. In
the above code, this is done by the lines
if n is a leaf node {
if the leaf is a goal node, return true
else return false
}
The backtracking algorithm is simple but important. You should understand it
thoroughly. Another way of stating it is as follows:
To search a tree:
1. If the tree consists of a single leaf, test whether it is a goal node.
2. Otherwise, search the subtrees until you find one containing a goal node, or until you
have searched them all unsuccessfully.
4
STACK
One of the most useful concepts of data structure in computer science is that of stack. In
this chapter, we shall define stack, algorithm and procedures for insertion and deletion and
see why stack plays such a prominent role in the area of programming. We shall also
describe prefix, postfix and infix expression. The stack method of expression evaluation
was first proposed by early German computer scientist F.L. Bauer, who received the IEEE
Computer Society Pioneer Award in 1988 for his work on computer stacks.
top of the stack as more and more elements are inserted into and deleted from the stack.
The declarations in C are as follows.
# define size 100
int stack [size];
int top = -1;
In the above declaration, the stack is nothing but an array of integers. And the most
recent index of that array will act as the top.
The stack is of the size 100. As we insert the numbers, the top will get incremented. The
elements will be placed from 0th position in the stack.
The stack can also be used in a database. For example, if we want to store marks of all
students of third semester we can declare the structure of the stack as follows:
# define size 60
typedef struct student
{
int rollno;
char name [30];
float marks;
} stud;
stud S1 [size];
int top = -1;
The above stack will look like this
Thus, we can store the data about the whole class in our stack. The above declaration
means creation of a stack.
2. PUSH: This operation inserts a new element at the top of the stack. Each time a new
element is inserted in the stack, the top is incremented by one before the element is
placed on the stack.
3. POP: This operation deletes the element at the top of the stack. After every pop
operation the top is decremented by one.
4. EMPTY: This operation checks whether the stack is empty or not. It returns
true if the stack is empty and false otherwise.
5. TOP: This operation returns the top element of the stack.
6. PEEP: This operation extracts the information stored at some location in a
stack.
Thus stackfull is a Boolean function: if the stack is full it returns 1, otherwise it returns 0.
The push operation inserts a new element at the top of the stack. Each
time a new element is inserted in the stack, the top is incremented by one before the
element is placed on the stack. The function is as follows:
void push (int item)
{
top= top + 1;
stack [top] = item;
}
The push function takes the parameter item, which is the element we want to
insert onto the stack; pushing means placing the element on top of the stack. In a fuller
version of the function, we first check whether the stack is full. Only if the stack is not
full can the insertion of the element be achieved by the push operation.
A push operation can be shown by following Figure 4.5. The checked version of push is:
void push ( )
{
int item;
if (top = = (size-1))
{
cout << "The stack is full" << endl;
}
else
{
cout << "Enter the element to be pushed" << endl;
cin >> item;
top = top + 1;
S [top] = item;
}
}
#include<iostream.h>
#include<conio.h>
#define Maxsize 100
void push( );
int pop( );
void traverse( );
int top=-1;
int stack[Maxsize];
void main()
{
int choice;
char ch;
do
{
clrscr ( );
cout <<"1.Push"<<endl;
cout <<"2.Pop"<<endl;
cout <<"3.Traverse"<<endl;
cout <<"Enter your choice"<<endl;
cin >> choice;
switch(choice)
{
case 1: push();
break;
case 2:
cout <<"The deleted element is"<<endl<<pop( );
break;
case 3:
traverse();
break;
default:
cout<<"Wrong choice"<<endl;
}
cout<<"Do you wish to continue? Press Y"<<endl;
cin >>ch;
}
while (ch=='Y'|| ch=='y');
}
void push( )
{
int item;
if(top==(Maxsize-1))
{
cout<<"Stack is full";
}
else
{
cout<<"Enter the element to be inserted"<<endl;
cin>>item;
top=top+1;
stack[top]=item;
}
}
int pop()
{
int item;
if(top==-1)
{
cout<<"The stack is empty"<<endl;
}
else
{
item=stack[top];
top=top-1;
}
return (item);
}
void traverse( )
{
int i;
if(top==-1)
{
cout<<"The stack is empty";
}
else
{
for(i=top;i>=0;i--)
{
cout<<"Traverse the Element = "<<stack[i];
cout<<endl;
}
}
}
Output
1. Push
2. Pop
3. Traverse
Enter your Choice 1
Enter the element to be inserted
19 21 23
1. Push
2. Pop
3. Traverse
Enter your Choice 3
Traverse the Element= 19 21 23
binary operators. Unary operators are + and -, and binary operators are +, -, *, /
and exponentiation. In general, there are three types of expressions:
1. Infix Expression
2. Postfix Expression
3. Prefix Expression
One of the applications of a stack is conversion of expressions. First of all, let us see
these expressions with the help of examples:
1. Infix Expression:
When the operators exist between two operands then the expression is called an infix
expression.
Infix expression = operand1 operator operand2
For example: 1. (A+B)
2. (A+B) * (C-D)
2. Prefix Expression:
When the operators are written before their operands then the expression is called a prefix
expression.
Prefix expression = operator operand1 operand2
For example: 1. (+AB)
2. * +AB CD
3. Postfix Expression:
When the operators are written after their operands then the expression is called a postfix
expression.
Postfix expression = operand1 operand2 operator
For example: 1. (AB+)
2. AB + CD *
[Table: step-by-step stack contents in one column and the postfix string produced so far in the other, while converting an infix expression to postfix.]
[Table: step-by-step stack contents and the prefix string produced while converting an infix expression to prefix by scanning it from right to left.]
[Table: converting a postfix expression to the infix expression (a + b) * (c + d); each time an operator is read, two operands are popped and combined into an infix sub-expression.]
[Table: converting a postfix expression to prefix; each time an operator is read, two operands OP1 and OP2 are popped and the operator is concatenated in front of them.]
[Table: evaluating a postfix expression with a stack; operands are pushed, each operator pops two operands and pushes the result, and the final value left on the stack is 20.]
[Table: converting a number to another base by repeatedly dividing and pushing each remainder onto the stack.]
To reverse a string, push all the characters of the string onto the stack and then pop all
the characters from the stack and print them. For example, if the input string is
P R O G R A M \0
then push all the characters onto the stack till \0 is encountered.
Top
M
A
R
G
O
R
P
Now if we pop each character from the stack and print it we get,
M A R G O R P
5
QUEUE
5.1 INTRODUCTION
A queue is a linear data structure in which additions are made only at one end of the list
and deletions are made at the other end. The queue can be formally defined as an ordered
collection of elements that has two ends named front and rear. At the front end one can
delete elements and at the rear end one can insert elements. This is a first in first out
(FIFO) list, since an element, once added to the rear of the list, can be removed only after
all the earlier additions have been removed.
Example:
When a receptionist makes a list of the names of patients who arrive to see a doctor,
adding each new name at the bottom of the list and crossing the top name off the list as a
patient is called in, her list of names has the structure of a queue. The word queue is also
used in many other everyday examples. A typical example is a queue of people waiting
for railway tickets at a ticket counter at a railway station. Any new person joins at one
end of the queue, which can be called the rear end. The person at the other end gets a
ticket first; that end can be called the front end of the queue.
Figure 5.1 represents the queue of a few elements.
A deque is a linear list where additions and deletions may take place at either end of the
list, but never in the middle. A deque which is both input-restricted and output-restricted
must be either a stack or a queue.
Figure 5.2
In this case, the beginning of the array will become the front for the queue and the last
location of the array will act as rear for the queue. The total number of elements present in
the queue is
Rear - Front + 1
Let us consider that there are only 10 elements in the queue at present, as shown in Figure
5.3 (a). When we remove an element from the queue, we get the resulting queue as shown
in Figure 5.3 (b) and when we insert an element in the queue we get the resulting queue as
shown in Figure 5.3 (c). When an element is removed from the queue, the value of the
front pointer is increased by 1 i.e.,
Front = Front + 1
Similarly, when an element is added to the queue the value of the rear pointer is
increased by 1 i.e.,
Rear = Rear + 1
If rear < front then there will be no element in the queue or the queue will always be
empty.
A queue is a collection of elements in which an element can be inserted from one end,
called rear, and elements get deleted from the other end, called front.
Operation
Q_full( ) checks whether a queue is full or not.
Q_Empty( ) checks whether a queue is empty or not.
Q_insert ( ) inserts an element in the queue at the rear end.
Q_delete ( ) deletes an element from the queue at the front end.
Thus, the ADT for a queue gives the abstract for what has to be implemented, which are
the various operations on the queue. But it never specifies how to implement these
operations.
{
cout << "Queue empty";
return;
}
for(int i=front+1;i<=rear;i++)
cout <<queue1[i]<<" ";
}
};
main()
{
int ch;
queue qu;
while(1)
{
cout <<"\n1.Insert 2.Delete 3.Display 4.Exit\nEnter ur choice";
cin >> ch;
switch(ch)
{
case 1: cout <<"Enter the element";
cin >> ch;
qu.insert(ch);
break;
case 2: qu.delet(); break;
case 3: qu.display();break;
case 4: exit(0);
}
}
return (0);
}
Output
1.Insert 2.Delete 3.Display 4. Exit
Enter ur choice1
enter the element21
inserted21
1.Insert 2.Delete 3.Display 4.Exit
Enter ur choice1
We have deleted the elements 10, 20 and 30, which means simply that the front pointer is
shifted ahead. We always consider a queue from the front to the rear. Now if we try to
insert any more elements, it won't be possible, as it is going to give a queue full
message. Although there is space occupied by elements 10, 20 and 30 (these are the
deleted elements), we cannot utilize it, because the queue is nothing but a linear array.
This brings us to the concept of a circular queue. The main advantage of a circular queue
is that we can utilize the space of the queue fully. A circular queue is shown in Figure 5.5.
A circular queue has a front and rear to keep the track of the elements to be deleted and
inserted. The following assumptions are made:
1. The front will always be pointing to the first element.
2. If front = rear, the queue is empty.
3. When a new element is inserted into the queue the rear is incremented by one (Rear =
Rear + 1).
4. When an element is deleted from the queue the front is incremented by one (Front =
Front +1).
Insertion in a circular queue will be the same as with a linear queue, but it is required to
keep a track of front and rear with some extra logic. If a new element is to be inserted in
the queue, the position of the element to be inserted will be calculated using the relation:
Rear = (Rear + 1) % MAXSIZE
If we add an element 30 to the queue the rear is calculated as follows:
Rear = (Rear + 1) % MAXSIZE
= (2 + 1) % 5
= 3
The deletion method for a circular queue also requires some modification as compared to
a linear queue. The position of the front will be calculated by the relation:
Front = (Front + 1) % MAXSIZE
In D-queue, if the element is inserted at the front-end then the front is decreased by
1. If it is inserted at the rear-end then the rear is increased by 1. If the element is deleted
from the front-end, then the front is increased by 1. If the element is deleted from the
rear-end, then the rear is decreased by 1. When the front is equal to the rear before
deletion, then front and rear are both set to NULL to indicate that the queue is empty.
ADT for D-queue
Instances:
Deq[MAX] is a finite collection of elements in which the elements can be inserted from
both the ends, rear and front. Similarly, the elements can be deleted from both the ends,
front and rear.
Precondition
The front and rear should be within the maximum size MAX.
Before an insertion operation, whether the queue is full or not is checked.
Before a deletion operation, whether the queue is empty or not is checked.
Operation
1. Create ( ): The D-queue is created by declaring the data structure for it.
2. Insert_rear ( ): This operation is used for inserting the element from the rear end.
3. Delete_front ( ): This operation is used for deleting the element from the front end.
4. Insert_front ( ): This operation is used for inserting the element from the front end.
5. Delete_rear ( ): This operation is used for deleting the element from the rear end.
6. Display ( ): The elements of the queue can be displayed from the front to the rear end.
Algorithm for DQEmpty
1. [check for empty Deque]
If (front = = 0 and rear = = -1)
Then print deque is empty
2. [finished]
Return
Algorithm for DQFull
1. [check for full Deque]
If (front = = 0 and rear = = MAX-1)
Then print deque is full
2. [finished]
Return
3. Delete ( ): If the priority queue is an ascending priority queue, then the smallest
element is deleted each time.
4. Display ( ): The elements of the queue are displayed from the front to the rear.
Applications of Priority Queue
In network communication, a priority queue is used to manage limited bandwidth for
transmission.
In simulation modeling, a priority queue is used to manage the discrete events.
6
LIST
6.1 LIMITATIONS OF STATIC MEMORY
Static memory allocation is done by arrays. In arrays the elements are stored sequentially.
The elements can be accessed sequentially as well as randomly when we use arrays. But
there are some drawbacks or limitations of using arrays as given below.
1. Once the elements are stored sequentially, it becomes very difficult to insert the
element in between or to delete the middle elements. This is because, if we insert
some element in between then we will have to shift down the adjacent elements.
Similarly, if we delete some element from an array, then a vacant space gets created in
the array. And we do not desire such vacant spaces in between in the arrays. Thus
shifting of elements is time consuming and is not logical. The ultimate result is that
the use of array makes the overall representation time and space inefficient.
2. Use of array requires determining the array size prior to its use. There are some
chances that the pre-decided size of the array might be larger than the requirement.
Similarly it might be possible that the size of array may be less than the required one.
This results in either wastage of memory or shortage of memory. Hence, another data
structure has come up which is known as linked list. This is basically a dynamic
implementation.
6.2 LISTS
Lists, like arrays, are used to store ordered data. A list is a linear sequence of data objects
of the same type. Real-life events such as people waiting to be served at a bank counter or
at a railway reservation counter may be implemented using list structures. In computer
science, lists are extensively used in database management systems, in process
management systems, in operating systems, in editors, etc.
We shall discuss singly, doubly and circularly linked lists, and their
implementation using arrays and pointers.
In computer science, a list is usually defined as an instance of an abstract data type
(ADT) formalizing the concept of an ordered collection of entities. For example, a singly
linked list with 3 integer values is shown in Figure 6.1.
In practice, lists are usually implemented using arrays or linked lists of some sort, as lists
share certain properties with arrays and linked lists. Informally, the term list is sometimes
used synonymously with linked list.
6.3 CHARACTERISTICS
Lists have the following properties:
The size and contents of lists may or may not vary at runtime, depending on the
implementations.
Random access over lists may or may not be possible, depending on the
implementation.
In mathematics, sometimes equality of lists is defined simply in terms of object
identity: two lists are equal if and only if they are the same object.
In modern programming languages, equality of lists is normally defined in terms of
structural equality of the corresponding entries, except that if the lists are typed then
the list types may also be relevant.
In a list, there is a linear order (called followed by or next) defined on the elements.
Every element (except for one called the last element) is followed by one other
element, and no two elements are followed by the same element.
Note that the link field of the last node consists of NULL which indicates the end of the
list.
Node
Step 2: We start filling the data in each node at data field and assigning the next pointer to
the next node.
Here & is the address-of symbol. So the above figure can be interpreted as: the
next pointer of n1 is pointing to the node n2. Then we will start filling the data in each
node and again set the next pointer to the next node. Continuing this we will get:
Step 4: Now to print the data in a linked list we will use printf ("\n %d", temp->data);
desire and if some nodes are not required we can de-allocate them.
In the above example s1 is the pointer to the structure s. In the malloc function one
parameter is passed because the syntax of malloc is
malloc (size)
where size means how many bytes have to be allocated. The size can be obtained by the
function sizeof, where the syntax of sizeof is
sizeof (datatype)
When we finish using the memory, we must return it back. The function free in C is
used to free storage of a dynamically allocated variable.
The format for free is
free (pointer variable);
For example, the statement
free (s1); // deallocates the memory
We know that the list can be represented using arrays. In this section we will discuss in
detail how exactly a list can be represented using arrays. Basically, list is a collection of
elements. To show the list using arrays we will have data and link fields in the array. The
array can be created as shown in Figure 6.5.
struct node
{
int data;
int next;
}a[10];
Consider a list of 10, 20, 30, 40, and 50. We can store it in arrays as:
The next field of each node gives the index of the next element. The next field in the
last node is -1; -1 is taken as the end of the list.
With this concept various operations that can be performed on the list using array:
1. Creation of list
2. Insertion of any element in the list
3. Deletion of any element in the list
4. Display of list
5. Searching of particular element in the list
Let us see a C program based on it.
/* Implementation of various List operations using arrays */
# include <stdio.h>
# include <conio.h>
# include <stdlib.h>
# include <string.h>
struct node
{
int data;
int next;
} a[10];
void main ( )
{
char ans;
int i, head, choice;
int Create ( );
void Display ( );
void Insert ( );
void Delete ( );
void Search ( );
do
{
clrscr ( );
printf("\n Main Menu");
printf("\n1 Creation ");
printf("\n2 Display");
printf("\n3 Insertion of element in the list");
printf("\n4 Deletion of element from the list");
printf("\n5 Searching of element from the list");
printf("\n6 Exit");
printf("\n Enter your choice ");
scanf("%d", &choice );
switch (choice)
{
case 1:
for (i = 0; i <10; i++)
{
a[i]. data = -1; // this for loop initialize the data field of list to -1
}
head = Create ( );
break;
case 2:
Display (head);
break;
case 3:
Insert ( );
break;
case 4:
Delete ( );
break;
case 5:
Search ( );
break;
case 6:
exit (0);
}
printf("\n Do you wish to go to the main menu? ");
ans = getch ( );
}
while (ans == 'Y' || ans == 'y');
getch ( );
}
int Create ( ) // function to create a node
{
int head, i;
printf(\n Enter the index for first node );
scanf (%d , &i);
head = i;
while ( i != -1)
{
printf(\n Enter the data and index of the first element );
}
if (a[i + 1].data = = -1) // next location is empty
{
a[i+1].next = a[i].next;
a[i].next = i +1;
a[[i+1].data = new_data;
}
}
void Delete ( ) /* function to delete a node */
{
    int i, temp, current, new_next;
    printf ("\n Enter the node to be deleted ");
    scanf ("%d", &temp);
    for (i = 0; i < 10; i++)
    {
        if (a[i].data == temp)
        {
            if (a[i].next == -1)
            {
                a[i].data = -1;
            }
            current = i;
            new_next = a[i].next;
        }
    }
    for (i = 0; i < 10; i++)
    {
        if (a[i].next == current)
        {
            a[i].next = new_next;
            a[current].data = -1;
        }
    }
}
void Search ( ) /* function to search for a node */
{
    int i, temp, flag = 0;
    printf ("\n Enter the node to be searched ");
    scanf ("%d", &temp);
    for (i = 0; i < 10; i++)
    {
        if (a[i].data == temp)
        {
            flag = 1;
            break;
        }
    }
    if (flag == 1)
        printf ("\n The node %d is present in the list ", temp);
    else
        printf ("\n The node is not present");
}
Output of Program
Main Menu
1. Creation
2. Display
3. Insertion of element in the list
4. Deletion of element from the list
5. Searching of element from the list
6. Exit
Enter your choice 1
Enter the index for first node 4
6. Exit
Enter your choice 5
Enter the node to be searched 40
The 40 node is present in the list
Do you wish to go to main menu?
Main Menu
1. Creation
2. Display
3. Insertion of element in the list
4. Deletion of element from the list
5. Searching of element from the list
6. Exit
Enter your choice 6
List implementation using arrays is usually not preferred for two main reasons:
1. The fixed size of the array limits the number of nodes in the list. Memory may be
wasted when the list has few elements, or the list may grow beyond the array so that
some elements cannot be stored.
2. Insertion and deletion of elements in an array are complicated.
(Figure: a newly created node New with data 20 and next = NULL.)
Step 2:
if (flag == TRUE)
{
    head = New;
    temp = head; /* mark this node as temp because head's address will be preserved in
                    head and we can move temp as required */
    flag = FALSE;
}
(Figure: the node with data 20 and next = NULL, now referred to by New, head and temp.)
Step 3: Once the head node of a linked list is created, we can extend the linked list by
attaching subsequent nodes. Suppose we want to insert a node with value 25. A New node
gets created after invoking get_node ( ):
(Figure: head/temp points to the node 20 -> NULL; the New node holds 25 with next = NULL.)
Step 4: If the user wants to enter more elements, say the value 30, the scenario will be
the same: a New node gets created after invoking get_node ( ):
(Figure: the existing list headed by head/temp, with the New node holding the new value
and next = NULL, attached at the end.)
Suppose we want to delete node 25. First we search for the node containing 25 using the
search (*head, key) routine and mark the node to be deleted as temp. Then we will obtain:
prev->next = temp->next;
Now we free the temp node using the free function, and the linked list no longer contains 25.
To locate a node, suppose key = 30 and we want the node containing the value 30. We
compare temp->data with the key value; if there is no match, we mark the next node as temp.
        break;
    case 3: printf ("Enter the element you want to search ");
        scanf ("%d", &val);
        break;
    case 4: head = insert (head);
        break;
    case 5: dele (&head);
        break;
    case 6: exit (0);
    default: clrscr ( );
        printf ("Invalid choice, try again");
        getch ( );
    }
}
while (choice != 6);
}
node *create ( )
{
    node *temp, *New, *head;
    int val, flag;
    char ans = 'y';
    node *get_node ( );
    temp = NULL;
    flag = TRUE;
    do
    {
        printf ("\nEnter the element: ");
        scanf ("%d", &val);
        New = get_node ( );
        if (New == NULL)
            printf ("\n Memory is not allocated");
        else
            found = TRUE;
    }
    if (found == TRUE)
    {
        printf ("\nThe element is present in the list\n");
        getch ( );
        return temp;
    }
    else
    {
        printf ("\nThe element is not present in the list\n");
        getch ( );
        return NULL;
    }
}
node *insert (node *head)
{
    int choice;
    node *insert_head (node *);
    void insert_after (node *);
    void insert_last (node *);
    printf ("\n 1. Insert a node as a head node");
    printf ("\n 2. Insert a node as a last node");
    printf ("\n 3. Insert a node at intermediate position in the linked list");
    printf ("\n Enter your choice for insertion of node ");
    scanf ("%d", &choice);
    switch (choice)
    {
    case 1: head = insert_head (head);
        break;
    case 2: insert_last (head);
        break;
    case 3: insert_after (head);
        break;
    }
    return head;
}
node *insert_head (node *head)
{
    node *New, *temp;
    New = get_node ( );
    printf ("\nEnter the element which you want to insert ");
    scanf ("%d", &New->data);
    if (head == NULL)
        head = New;
    else
    {
        temp = head;
        New->next = temp;
        head = New;
    }
    return head;
}
void insert_last (node *head)
{
    node *New, *temp;
    New = get_node ( );
    printf ("\nEnter the element which you want to insert ");
    scanf ("%d", &New->data);
    if (head == NULL)
        head = New;
    else
    {
        temp = head;
        while (temp->next != NULL)
            temp = temp->next;
        temp->next = New;
        New->next = NULL;
    }
}
void insert_after (node *head)
{
    int key;
    node *New, *temp;
    New = get_node ( );
    printf ("\nEnter the element which you want to insert ");
    scanf ("%d", &New->data);
    if (head == NULL)
    {
        head = New;
    }
    else
    {
        printf ("\n Enter the element after which you want to insert the node ");
        scanf ("%d", &key);
        temp = head;
        do
        {
            if (temp->data == key)
            {
                New->next = temp->next;
                temp->next = New;
                return;
            }
            else
                temp = temp->next;
        }
        while (temp != NULL);
    }
}
node *get_prev (node *head, int val)
{
    node *temp, *prev;
    int flag;
    temp = head;
    if (temp == NULL)
        return NULL;
    flag = FALSE;
    prev = NULL;
    while (temp != NULL && !flag)
    {
        if (temp->data != val)
        {
            prev = temp;
            temp = temp->next;
        }
        else
            flag = TRUE;
    }
    if (flag)
        return prev;
    else
        return NULL;
}
void dele (node **head)
{
    node *temp, *prev;
    int key;
    temp = *head;
    if (temp == NULL)
    {
        printf ("\n The list is empty\n");
        getch ( );
        clrscr ( );
        return;
    }
    clrscr ( );
    printf ("\n Enter the element you want to delete: ");
    scanf ("%d", &key);
    temp = search (*head, key);
    if (temp != NULL)
    {
        prev = get_prev (*head, key);
        if (prev != NULL)
        {
            prev->next = temp->next;
            free (temp);
        }
        else
        {
            *head = temp->next;
            free (temp);
        }
        printf ("\n The element is deleted\n");
        getch ( );
        clrscr ( );
    }
}
Output
Program to perform various operations on linked list
1. Create
2. Display
3. Search for an item
4. Insert an element in a list
5. Delete an element from list
6. Quit
Enter your Choice ( 1-6) 1
Enter the element: 10
Do you want to enter more elements?(y/n) y
Enter the element: 20
Do you want to enter more elements?(y/n) y
Enter the element: 30
Do you want to enter more elements?(y/n) y
Enter the element: 40
Do you want to enter more elements?(y/n) n
The Singly linked list is created
Program to perform various operations on linked list
1. Create
2. Display
3. Search for an item
4. Insert an element in a list
5. Delete an element from list
6. Quit
Enter your Choice ( 1-6) 2
10 20 30 40 NULL
Program to perform various operations on linked list
1. Create
2. Display
3. Search for an item
4. Insert an element in a list
5. Delete an element from list
6. Quit
Enter your Choice ( 1-6) 3
Enter the element you want to search 30
The element is present in the list
Program to perform various operations on linked list
1. Create
2. Display
3. Search for an item
4. Insert an element in a list
5. Delete an element from list
6. Quit
Enter your Choice ( 1-6) 4
1. Insert a node as a head node
2. Insert a node as a last node
3. Insert a node at intermediate position in the linked list
Enter your choice for insertion of node 1
Enter the element which you want to insert 9
Program to perform various operations on linked list
1. Create
2. Display
3. Search for an item
4. Insert an element in a list
5. Delete an element from list
6. Quit
Enter your Choice ( 1-6) 2
9 10 20 30 40 NULL
Sr. No.   Array                                                       Linked List
1.        Any element can be accessed randomly with the help of       Any element can be accessed by
          the index of the array.                                     sequential access only.
2.
3.
4.
When we traverse a circular list, we must be careful as there is a possibility of getting
into an infinite loop if we are unable to detect the end of the list. To do that we must
look for the starting node. We can keep an external pointer to the starting node and use
this external pointer as a stop sign. An alternative method is to place a header node as
the first node of a circular list. This header node may contain a special value in its info
field that cannot be the valid contents of a list in the context of the problem. If a circular
list is empty then the external pointer will point to NULL.
Various operations that can be performed on circular linked list are:
1. Creation of a circular linked list.
2. Insertion of a node in a circular linked list
3. Deletion of any node from a linked list
4. Display of a circular linked list
1. Creation of circular linked list
First we will allocate memory for the New node using the function get_node ( ). There is
one variable flag whose purpose is to check whether the first node has been created or
not. When the flag is 1 (set), the first node has not yet been created. Therefore, after
creation of the first node we reset the flag (set it to 0).
Initially, the variable head indicates the starting node. Suppose we have taken element
10 and flag = 1, then:
head = New;
New->next = head;
flag = 0;
Now as flag = 0, we can further create nodes and attach them as follows. When we have
taken element 20:
temp = head;
while (temp->next != head)
    temp = temp->next;
temp->next = New;
New->next = head;
2. Insertion of a node in circular linked list
For inserting a new node in the circular linked list, there are 3 cases:
(i) Inserting a node as a head node
(ii) Inserting a node as a last node
(iii) Inserting a node at an intermediate position
(i) If we want to insert a New node as a head node then:
(Figure: the New node holds 20 with next = NULL; it is linked in front of the existing
head, and the last node's next is updated to point to New.)
(ii) If we want to insert a New node as a last node, consider the circular linked list given
below:
(Figure: the New node holds 50 with next = NULL; it is attached after the current last
node, and its next is set back to the head.)
node *create (node **lastnode)
{
    node *temp, *firstnode;
    int info;
    *lastnode = NULL;
    firstnode = NULL;
    printf ("\n Enter the data: ");
    scanf ("%d", &info);
    while (info != -999)
    {
        temp = (node *) malloc (sizeof (node));
        temp->item = info;
        temp->next = NULL;
        if (firstnode == NULL)
            firstnode = temp;
        else
            (*lastnode)->next = temp;
        (*lastnode) = temp;
        scanf ("%d", &info);
    }
    if (firstnode != NULL)
        temp->next = firstnode;
    return (firstnode);
}
void display (node *first, node *last)
{
    do
    {
        printf ("\t %d", first->item);
        first = first->next;
    } while (last->next != first);
    return;
}
void insert (node **first, node **last)
{
    node *newnode;
    node *temp;
    int newitem, pos, i;
    printf ("\n Enter the new item: ");
    scanf ("%d", &newitem);
    printf ("\n Position of insertion: ");
    scanf ("%d", &pos);
    if (((*first) == NULL) || (pos == 1))
    {
        newnode = (node *) malloc (sizeof (node));
        newnode->item = newitem;
        newnode->next = *first;
        *first = newnode;
        if ((*last) != NULL)
            (*last)->next = *first;
    }
    else
    {
        i = 1;
        temp = *first;
        while ((i < (pos - 1)) && ((temp->next) != (*first)))
        {
            i++;
            temp = temp->next;
        }
        newnode = (node *) malloc (sizeof (node));
        if (temp->next == (*first))
            *last = newnode;
        newnode->item = newitem;
        newnode->next = temp->next;
        temp->next = newnode;
    }
}
void delet (node **first, node **last)
{
    node *temp;
    node *prev;
    int target;
    printf ("\n Data to be deleted: ");
    scanf ("%d", &target);
    if (*first == NULL)
        printf ("\n List is empty");
    else if ((*first)->item == target)
    {
        if ((*first)->next == *first)
            *first = *last = NULL;
        else
        {
            *first = (*first)->next;
            (*last)->next = *first;
            printf ("\n Circular list\n");
            display (*first, *last);
        }
    }
    else
    {
        temp = *first;
        prev = NULL;
        while ((temp->next != (*first)) && ((temp->item) != target))
        {
            prev = temp;
            temp = temp->next;
        }
        if (temp->item != target)
        {
            printf ("\n Element not found");
        }
        else
        {
            if (temp == *last)
                *last = prev;
            prev->next = temp->next;
            printf ("\n CIRCULAR LIST");
            display (*first, *last);
        }
    }
}
void main ( )
{
    node *start, *end;
    int choice;
    clrscr ( );
    printf ("\n CIRCULAR LINKED LIST");
    printf ("\n ");
    do
    {
        choice = menu ( );
        switch (choice)
        {
        case 1:
            printf ("\n Type -999 to stop");
            start = create (&end);
1.CREATE
2.INSERT
3.DELETE
4.EXIT
Enter your choice: 2
Enter the new item: 40
Position of insertion: 2
Circular list
10 40 20 30
MAIN MENU
1.CREATE
2.INSERT
3.DELETE
4.EXIT
Enter your choice: 3
Data to be deleted: 20
Circular List
10 40 30
MAIN MENU
1.CREATE
2.INSERT
3.DELETE
4.EXIT
Enter your choice: 3
Data to be deleted: 60
Element not found
Advantages of circular linked list over singly linked list
In a circular linked list the next pointer of the last node points to the head node. Hence
we can move from the last node to the head node of the list very efficiently, and accessing
any node is much faster than in a singly linked list.
A header linked list is a linked list which always contains a special node, called the
header node, at the beginning of the list. Sometimes such an extra node needs to be kept
at the front of the list. This node does not represent any data of the linked list, but it
may contain some useful information about the list, such as the total number of nodes,
the address of the last node, or some other specific information.
The following are two kinds of header list:
A grounded header list is a header list where the last node contains the NULL pointer.
A circular header list is a header list where the last node points back to the header node.
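As a sketch of this idea (the field names and the function below are our own, not from the text), a grounded header list can keep its node count in the header node's data field, which is updated on every insertion:

```c
#include <stdlib.h>

/* A grounded header list: the header node stores bookkeeping data
   (here, a node count) rather than list data; the last node's next is NULL. */
struct hnode {
    int data;            /* node count when used as the header */
    struct hnode *next;
};

/* Insert a value just after the header, updating the count kept in the header. */
void hlist_push_front (struct hnode *header, int value)
{
    struct hnode *n = (struct hnode *) malloc (sizeof (struct hnode));
    n->data = value;
    n->next = header->next;
    header->next = n;
    header->data++;      /* the header's data field tracks the length */
}
```

With this layout the length of the list is available in O(1) time from the header, instead of requiring a full traversal.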
For example:
Figure 6.7
typedef struct node
{
    int data;
    struct node *prev;
    struct node *next;
} dnode;
The linked representation of a doubly linked list is
Step 2: For further addition of nodes a New node is created.
(Figure: the Start/dummy node NULL|10|NULL and the New node NULL|20|NULL.)
dummy->next = New;
New->prev = dummy;
Step 3: For further addition of nodes a New node is created and attached in the same way.
Step 2: If we want to delete any node other than the first node, say the node with value
20, we mark it as the temp node.
        head = var;
        head->previous = NULL;
        head->next = NULL;
        last = head;
    }
    else
    {
        temp = var;
        temp->previous = NULL;
        temp->next = head;
        head->previous = temp;
        head = temp;
    }
}
}
void insert_end (int value)
{
    struct node *var, *temp;
    var = (struct node *) malloc (sizeof (struct node));
    var->data = value;
    if (head == NULL)
    {
        head = var;
        head->previous = NULL;
        head->next = NULL;
        last = head;
    }
    else
    {
        last = head;
        while (last != NULL)
        {
            temp = last;
            last = last->next;
        }
        last = var;
        temp->next = last;
        last->previous = temp;
        last->next = NULL;
    }
}
int insert_after (int value, int loc)
{
    struct node *temp, *temp1, *var;
    var = (struct node *) malloc (sizeof (struct node));
    var->data = value;
    if (head == NULL)
    {
        head = var;
        head->previous = NULL;
        head->next = NULL;
    }
    else
    {
        temp = head;
        while (temp != NULL && temp->data != loc)
        {
            temp = temp->next;
        }
        if (temp == NULL)
        {
            printf ("\n%d is not present in list ", loc);
        }
        else
        {
            temp1 = temp->next;
            temp->next = var;
            var->previous = temp;
            var->next = temp1;
            if (temp1 != NULL)
                temp1->previous = var;
        }
    }
    last = head;
    while (last->next != NULL)
    {
        last = last->next;
    }
    return 0;
}
int delete_from_end ( )
{
    struct node *temp;
    temp = last;
    if (temp->previous == NULL)
    {
        free (temp);
        head = NULL;
        last = NULL;
        return 0;
    }
    printf ("\nData deleted from list is %d\n", last->data);
    last = temp->previous;
    last->next = NULL;
    free (temp);
    return 0;
}
int delete_from_middle (int value)
{
    struct node *temp, *var;
    temp = head;
    while (temp != NULL)
    {
        if (temp->data == value)
        {
            if (temp->previous == NULL)
            {
                free (temp);
                head = NULL;
                last = NULL;
                return 0;
            }
            else
            {
                var->next = temp->next;
                if (temp->next != NULL)
                    temp->next->previous = var;
                free (temp);
                return 0;
            }
        }
        else
        {
            var = temp;
            temp = temp->next;
        }
    }
    return 0;
}
{
    printf ("Enter the value you want to delete ");
    scanf ("%d", &value);
    delete_from_middle (value);
    display ( );
    break;
}
case :
{
    display ( );
    break;
}
case :
{
    exit (0);
    break;
}
}
}
printf ("\n\n%d", last->data);
display ( );
getch ( );
}
Sr. No.   Singly Linked List                                  Doubly Linked List
1.        A collection of nodes; each node has one data       A collection of nodes; each node has one data field,
          field and one next link field.                      one previous link field and one next link field.
2.        For example:  Data | Next                           For example:  Previous | Data | Next
3.                                                            The elements can be accessed using both the previous
                                                              link as well as the next link.
4.        No extra field is required; hence a node takes      One extra field is required to store the previous link;
          less memory in SLL.                                 hence a node takes more memory in DLL.
To represent a term of a polynomial in the variables x and y, each node consists of four
sequentially allocated fields. The first two fields represent the powers of the variables x
and y respectively. The third and fourth fields represent the coefficient of the term in the
polynomial and the address of the next term in the polynomial. For example:
The polynomial 4x^4y^2 + 2x^3y - x^2 + 3 is represented as a linked list in Figure 6.10.
Polynomial Arithmetic
Addition of two polynomials
Multiplication of two polynomials
Evaluation of a polynomial
delete at the end of the function) is time consuming and error prone. Hence automatic
memory management is done.
2. Reusability of the memory can be achieved with the help of garbage collection.
Disadvantage
1. The execution of the program is paused or stopped during the process of garbage
collection.
Thus we have learned a dynamic data structure represented by a linear organization.
7
TREE
7.1 INTRODUCTION
In the previous chapters we have studied some linear data structures such as arrays,
stacks, queues, and linked lists. Now we will study some non-linear data structures such
as trees and graphs. Trees are one of the most important data structures in computer
science. Trees are basically used to represent data objects in a hierarchical manner.
7.3 TERMINOLOGIES
Consider a tree as shown in Figure 7.3. The tree has 14 nodes. Node A is a root node.
The number of sub-trees of a node is referred to as its degree. Thus the degree of node A
is 3. Similarly the degree of node E is 1, and of L is 0. The degree of a tree is the
maximum degree of any node in the tree. The degrees of various nodes are given below:
Nodes having a degree of zero are known as terminal nodes or leaf nodes, and the nodes
other than these are known as non-terminal nodes or non-leaf nodes.
The degree of the tree shown in Figure 7.3 is 3.
NODES
DEGREES
sub-trees of the original tree. A left or right sub-tree can be empty. The distinction
between a binary tree and a tree is that there is no tree having zero nodes, but there is an
empty binary tree.
The binary tree BT may also have zero nodes, and can be defined recursively as:
An empty tree is a binary tree.
A distinguished node (unique node) is known as the root node.
The remaining nodes are divided into two disjoint sets L and R, where L is a left
sub-tree and R is a right sub-tree, such that these are binary trees once again. Some
binary trees are shown in Figure 7.4.
distinctions are made, is called an ordered tree, and data structures built on them are called
ordered tree data structures.
An ordered tree is a rooted tree in which the children of each vertex are assigned an
order. For example, consider this tree:
If this is a family tree, there may be no significance to left and right. In that case the
tree is unordered, and we could redraw the tree exchanging sub-trees without affecting its
meaning. On the other hand, there may be some significance to left and right: perhaps the
left child is younger than the right, or (as is the case here) the left child has the name that
occurs earlier in alphabetical order. Then the tree is ordered and we are not free to move
the sub-trees around.
Figure 7.9
It can be observed that the number of edges in the i-th child of the root is (ni - 1), where
ni is the number of nodes in that child.
The total number of edges in all the C children of the root is therefore the sum of (ni - 1)
for i = 1 to C.
Also, the original tree contains C edges from the root to its C children. Thus, the total
number of edges in the tree is:
(n1 - 1) + (n2 - 1) + ... + (nC - 1) + C = (n1 + n2 + ... + nC) = n - 1
Thus, the above lemma is proved for any tree.
Lemma 2: The maximum number of nodes on level l of a binary tree is 2^l, l >= 0.
Proof: The proof is by induction on l.
Induction base
On level l = 0, the root node is the only node; hence the maximum number of nodes
present at level l = 0 is 2^0, which is 1.
Induction Hypothesis
Assume that the maximum number of nodes on level i, 0 <= i < l, is 2^i.
Induction step
By the induction hypothesis, the maximum number of nodes at level l - 1 is 2^(l-1).
Also, a binary tree has the property that each node can have a maximum of two children.
Thus the maximum number of nodes on level l is twice the maximum number on level
l - 1, which gives 2 * 2^(l-1) = 2^l.
Thus, the above lemma is proved.
Lemma 3: The maximum number of nodes in a binary tree of height h is 2^(h+1) - 1,
h >= 0.
Proof: The proof is by induction on h.
Induction base
For h = 0 the tree contains only the root node, and 2^(0+1) - 1 = 1.
Induction Hypothesis
Assume that the lemma holds for every height k, 0 <= k < h.
Induction step
By Lemma 2, the maximum number of nodes on level l is 2^l. Thus, the maximum
number of nodes in a binary tree of height h is the sum of the per-level maxima:
2^0 + 2^1 + ... + 2^h = 2^(h+1) - 1
Thus, the above lemma is proved.
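Lemmas 2 and 3 can be checked numerically: summing the per-level maxima 2^l for l = 0..h must equal the closed form 2^(h+1) - 1. A small C sketch (our own, not from the text):

```c
/* Sum the per-level maxima 2^l (Lemma 2) for levels 0..h. */
long max_nodes_by_levels (int h)
{
    long total = 0;
    for (int l = 0; l <= h; l++)
        total += 1L << l;   /* at most 2^l nodes on level l */
    return total;
}

/* Closed form from Lemma 3: 2^(h+1) - 1. */
long max_nodes_formula (int h)
{
    return (1L << (h + 1)) - 1;
}
```

For instance, a binary tree of height 3 can hold at most 1 + 2 + 4 + 8 = 15 nodes, matching 2^4 - 1.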
The sequential representation consumes more space for representing a binary tree, but for
representing a complete binary tree it proves to be efficient, as no space is wasted.
2. Linked List Representation: In this representation each node of a binary tree
consists of three parts where:
Consider a binary tree and binary arithmetic expression tree shown in Figure 7.14 (a) and
(b).
Pre-order traversal:
Figure 7.14 (a): ABDECFG
Figure 7.14 (b): *+/ABCD
2. In-order Traversal (LNR): The in-order traversal of a binary tree is as follows:
First, traverse the left sub-tree in in-order.
Second, process the root node.
Lastly, traverse the right sub-tree in in-order.
If the tree has an empty sub-tree the traversal is performed by doing nothing. That means
a tree having NULL sub-tree is considered to be completely traversed when it is
encountered. The algorithm for the in-order traversal in a binary tree is given below:
Algorithm In-order (Node): The pointer variable Node stores the address of the root
node.
Step 1: Is empty?
If (empty [Node]) then
Print "Empty tree" and return
Step 2: Traverse the left sub-tree
If (Lchild [Node] ≠ NULL) then
Call In-order (Lchild [Node])
Step 3: Process the root node
If (Node ≠ NULL) then
Output: (Data [Node])
Step 4: Traverse the right sub-tree
If (Rchild [Node] ≠ NULL) then
Call In-order (Rchild [Node])
Step 5: Return at the point of call
Exit
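The algorithm above translates directly into a recursive C routine. The node layout and function below are a sketch (the names are ours, not from the text); the traversal appends each visited key to an output array and returns the new count:

```c
#include <stddef.h>

/* Minimal binary tree node (field names are illustrative). */
struct tnode {
    int data;
    struct tnode *lchild, *rchild;
};

/* In-order: traverse left sub-tree, process the root, traverse right.
   Visited keys are appended to out[] starting at position n. */
int inorder (struct tnode *node, int *out, int n)
{
    if (node == NULL)            /* an empty sub-tree is fully traversed */
        return n;
    n = inorder (node->lchild, out, n);
    out[n++] = node->data;       /* process the root node */
    return inorder (node->rchild, out, n);
}
```

Swapping the order of the three actions in the body gives pre-order (root first) and post-order (root last) with no other changes.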
Consider a binary tree and binary arithmetic expression tree shown in Figure 7.15 (a) and
(b).
In-order traversal:
Figure 7.15 (a): DBEAFCG
Figure 7.15 (b): A/B+C*D
3. Post-order Traversal (LRN): The post-order traversal of a binary tree is as follows:
First, traverse the left sub-tree in post-order.
Second, traverse the right sub-tree in post-order.
Lastly, process the root node.
If the tree has an empty sub-tree the traversal is performed by doing nothing. That means
a tree having NULL sub-tree is considered to be completely traversed when it is
encountered. The algorithm for the post-order traversal in a binary tree is given below:
Algorithm Post-order (Node):
The pointer variable Node stores the address of the root node.
Step 1: Is empty?
If (empty [Node]) then
Print "Empty tree" and return
Step 2: Traverse the left sub-tree
If (Lchild [Node] ≠ NULL) then
Call Post-order (Lchild [Node])
Step 3: Traverse the right sub-tree
If (Rchild [Node] ≠ NULL) then
Call Post-order (Rchild [Node])
Step 4: Process the root node
If (Node ≠ NULL) then
Output: (Data [Node])
Post-order traversal:
Figure 7.16 (a): DEBFGCA
Figure 7.16 (b): AB/C+D*
Step 1: The last node in the post-order (left, right, root) sequence is the root node. In the
above example A is the root node. Now locate A in the in-order sequence; the sequence to
the left of A indicates the left sub-tree and the sequence to the right of A indicates the
right sub-tree.
Step 2: For the alphabets H, D, I, B, E observe the post-order and in-order sequences:
Post-order: H I D E B
In-order: H D I B E
Here B is the parent node; therefore pictorially the tree will be as shown in the figure below.
Step 4: Now we will solve for the right sub-tree of root A with the alphabets F, C, G.
Observe both the sequences:
Post-order: F G C
In-order: F C G
C is the parent node, F is the left child and G is the right child. So finally the tree will be
as shown in the figure below.
If the left link of a node P is NULL then this link is replaced by the address of the
in-order predecessor of P. Similarly, if a right link is NULL then this link is replaced by
the address of the in-order successor, the node which would come after node P. Internally,
a thread and a pointer are both addresses. These can be distinguished by the assumption
that a normal pointer is represented by positive addresses and threads are represented by
negative addresses. Figure 7.18 shows a threaded binary tree where normal pointers and
threads are shown by solid lines and dashed lines respectively.
It is to be noted that with a little modification in the structure of a binary tree node we
can get the threaded tree structure, distinguishing threads from normal pointers by adding
two extra one-bit fields: lchildthread and rchildthread.
also,
Advantages
1. The in-order traversal of a threaded tree is faster than its unthreaded version.
2. With a threaded tree representation, it may be possible to generate the successor or
predecessor of any arbitrarily selected node without having to incur the overhead of
using a stack.
Disadvantages
1. Threaded trees are unable to share common sub-trees.
2. If negative addressing is not permitted in the programming language being used, two
additional fields are required to distinguish between the thread and structural links.
3. Insertions and deletions from a threaded tree are time consuming, since both thread
and structural links must be maintained.
item to be searched.
Step 1: Checking, is it empty?
If (R = 0), then
Print: "Empty tree"
Return 0
Step 2: K is equal to the key value at the root
If (R[data] = K)
Print: "Search is successful"
Return (R[data])
Step 3: K is less than the key value at the root
If (K < R[data])
Return (BST search (R[lchild], K))
Step 4: K is greater than the key value at the root
If (K > R[data])
Return (BST search (R[rchild], K))
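The four steps above map onto a short recursive C function. This is a sketch with illustrative names, returning a pointer to the matching node, or NULL for an unsuccessful search:

```c
#include <stddef.h>

/* BST node (field names are illustrative). */
struct bnode {
    int data;
    struct bnode *lchild, *rchild;
};

/* Recursive BST search: go left for smaller keys, right for larger ones;
   reaching NULL means the key is absent from the tree. */
struct bnode *bst_search (struct bnode *r, int k)
{
    if (r == NULL)
        return NULL;                       /* empty tree: unsuccessful search */
    if (k == r->data)
        return r;                          /* successful search */
    if (k < r->data)
        return bst_search (r->lchild, k);  /* search the left sub-tree */
    return bst_search (r->rchild, k);      /* search the right sub-tree */
}
```

In the worked example that follows (K = 13, root 18), the call chain visits 18, then 9, then finds 13 in the right sub-tree of 9.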
Example: Given the binary search tree, see Figure 7.20. Suppose we have to search a data
item having key K = 13, then searching of the data item can be done by using the
searching algorithm as follows.
Solution
Step 1: Initially
K = 13
R[data] = 18
(K < R[data]), so,
Left sub-tree to be searched
Step 2: K = 13
R[data] = 9
(K > R[data]), so,
So, the node becomes the root node as the tree is empty.
Step 2: Insertion 30
Checking with the root node 30 > 5
So, it is inserted at right of the root node.
Step 3: Insertion 2
Checking with the root node 2 < 5
So, it is inserted at the left of the root node
Step 4: Insertion 40
Checking with root node 40 > 5,
So, it is inserted at the right sub-tree of the root node,
Checking with the root node of the right sub-tree 40 > 30
From the above tree, we want to delete the node having the value 8. Then we will set the
right pointer of its parent node as NULL that is the right pointer of the node having the
value 9 is set to NULL.
If we want to delete the node 15, then we will simply copy node 18 in place of 15 and then
set the node free. If the deleted node has a right child, then the right child pointer value is
assigned to the corresponding child pointer of its parent; similarly, if the deleted node has
a left child, then the left child pointer value is assigned to the corresponding child pointer
of its parent.
We want to delete the node having the value 6. We will then find out the in-order
successor of node 6. The in-order successor will be simply copied at location of node 6.
That means copy 7 at the position where value of the node is 6. Set the left pointer of 9 as
NULL. This completes the deletion procedure.
of the ten possible children of a given node. But if the key is a set of characters, it
determines one of the twenty-six possible children of a given node. In this search tree, the
leaf node is represented by a special symbol Ek, which indicates end of key. The node
structure of a digital search tree is as follows:
Each node consists of three fields
Symbol key
Child, pointer to the first sub-tree
Csib, child sibling which is a pointer to the next sibling.
In Figure 7.27, a forest is represented as a set of data items from the given sets:
S = {111, 199, 153, 1672, 27, 245, 2221, 310, 389, 3333}
Binary tree representation method is not the only method to represent digital search tree.
If binary tree representation is not used, then for n symbols in each position of the key,
each node in a tree contains n pointers to the corresponding symbols. In such type of tree
representation, the node pointer is associated with a symbol value based at its position in
the node. This implementation of digital search tree is known as trie search tree, where
Trie is derived from the word retrieval.
so on. The root node thus has a path connecting it to any other node in the tree. If a node
has no children, we call it a leaf node, since intuitively it is at the edge of the tree. A
sub-tree is the portion of the tree that can be reached from a certain node, considered as a
tree itself. In red-black trees, the leaves are assumed to be null or empty.
As red-black trees are also binary search trees, they must satisfy the constraint that every
node contains a value greater than or equal to all the nodes in its left sub-tree, and less
than or equal to all nodes in its right sub-tree. This makes it quick to search the tree for a
given value.
Properties
A red-black tree is a binary search tree where each node has a color attribute the value of
which is either red or black. In addition to the ordinary requirement imposed on binary
search trees, we add the following conditions to any valid red-black tree:
Every node is colored either red or black.
The root is black.
Every leaf (nil node, also known as an external node) is colored black.
Both children of every red node are black.
All paths from any given node to its leaf nodes contain the same number of black
nodes.
One such type of Red-black tree is shown in Figure 7.28.
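A possible C layout for such a node, with the color attribute as an enum, might look as follows (a sketch with illustrative names; NULL children stand for the black external nil leaves):

```c
#include <stddef.h>

/* Node layout for a red-black tree: an ordinary BST node plus a color attribute. */
enum color { RED, BLACK };

struct rbnode {
    int key;
    enum color color;
    struct rbnode *left, *right, *parent;
};

/* NULL children stand for the black external (nil) leaves, so a check
   of the red-red condition treats NULL as black. */
int is_black (const struct rbnode *n)
{
    return n == NULL || n->color == BLACK;
}
```

A helper like is_black lets the property "both children of every red node are black" be verified uniformly, without special-casing the external leaves.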
As compared to a simple binary tree, balanced search trees are more efficient because
insertion or deletion of nodes in this data structure requires O(log n) time. These balanced
structures allow performing various dictionary operations such as insertions and deletions.
In a balanced tree, as items are inserted and deleted, the tree is restructured to keep the
nodes balanced and the search paths uniform.
AVL TREE
Adelson-Velskii and Landis in 1962 introduced a binary tree structure that is balanced
with respect to the heights of its sub-trees. The tree can be kept balanced, and because of
this, retrieval of any node can be done in O(log n) time, where n is the total number of
nodes. The tree is called an AVL tree after the names of these scientists.
An empty tree is height balanced. If T is a non-empty binary tree with TL and TR as its
left and right sub-trees, then T is height balanced if and only if
TL and TR are height balanced, and
|hL - hR| <= 1, where hL and hR are the heights of TL and TR.
The idea of balancing a tree is based on calculating the balance factor of a node.
Balance Factor
The balance factor BF(T) of a node T in a binary tree is defined as hL - hR, where hL
and hR are the heights of the left and right sub-trees of T.
For any node in an AVL tree the balance factor BF(T) is -1, 0, or 1.
The AVL tree follows the property of a binary search tree. In fact, AVL trees are basically
binary search trees with balance factors of -1, 0, or 1. If, after an insertion, the balance
factor of any node becomes other than -1, 0 or 1, then the AVL property is said to be
violated.
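The balance factor can be computed directly from the definition. In this sketch (our own names, not from the text) an empty sub-tree is given height -1, so that a single node has height 0 and BF(T) = hL - hR comes out as expected:

```c
#include <stddef.h>

/* AVL node (field names are illustrative). */
struct avlnode {
    int data;
    struct avlnode *left, *right;
};

/* Height of a sub-tree: an empty tree has height -1 here so that a
   single node gets height 0. */
int height (const struct avlnode *t)
{
    if (t == NULL)
        return -1;
    int hl = height (t->left), hr = height (t->right);
    return 1 + (hl > hr ? hl : hr);
}

/* Balance factor BF(T) = hL - hR; a node satisfies the AVL property
   when this is -1, 0, or 1. */
int balance_factor (const struct avlnode *t)
{
    return height (t->left) - height (t->right);
}
```

A real AVL implementation usually caches the height (or the balance factor itself) in each node instead of recomputing it, but this recursive form mirrors the definition above.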
Insertion
There are four different cases when rebalancing is required after insertion of a new
element or node.
1. An insertion of a new node into the left sub-tree of left child (LL).
2. An insertion of a new node into the right sub-tree of left child (LR).
3. An insertion of a new node into the left sub-tree of right child (RL).
4. An insertion of a new node into the right sub-tree of right child (RR).
The modifications done on an AVL tree in order to rebalance it are called rotations. The
classification of rotations is shown in Figure 7.31.
Insertion in an AVL search tree is a binary search tree. Thus, the insertion of the data
item having key K in an AVL search tree is same as performed in a binary search tree.
The insertion of the data item with key K is performed at the leaf, in which three cases
arise.
If the data item with K is inserted into an empty AVL search tree, then the node with
key K is set to be the root node. In this case the tree is balanced.
If the tree contains only a single node, the root node, then the insertion of node with
key K depends upon the value of K. If K is less than the key value of the root then
it is appended to the left of the root. Otherwise, for a greater value of K it is
appended to right of the root. In this case the tree is height balanced.
If an AVL search tree already contains a number of nodes (and is height balanced), then
care has to be taken when inserting a data item with the key K so that after the
insertion the tree remains height balanced.
We have noticed that insertion may unbalance the tree, so rebalancing is performed to
make it balanced again. The rebalancing is accomplished by performing
one of four kinds of rotations. The rotation is characterized by the nearest
ancestor of the inserted node whose balance factor becomes ±2.
(1) Left-Left (LL) Rotation: Consider the AVL search tree shown in Figure 7.32. After
inserting the node with the value 15 the tree becomes unbalanced. By performing an
LL rotation the tree becomes balanced again, as shown in Figure 7.33.
(2) Right-Right (RR) Rotation: Consider the AVL search tree shown in Figure 7.34. After
inserting the node with the value 75 the tree becomes unbalanced. By performing an
RR rotation the tree becomes balanced again, as shown in Figure 7.35.
(3) Left-Right (LR) Rotation: Consider the AVL search tree shown in Figure 7.36. After
inserting the node with the value 25 the tree becomes unbalanced. By performing an
LR rotation the tree becomes balanced again, as shown in Figure 7.37.
(4) Right-Left (RL) Rotation: Consider the AVL search tree shown in Figure 7.38. After
inserting the node with the value 25 the tree becomes unbalanced. By performing an
RL rotation the tree becomes balanced again, as shown in Figure 7.39.
Example: The creation of an AVL search tree is illustrated for the given set of values:
20, 30, 40, 50, 60, 57, 56, 55.
Solution Insertion of 20
No balancing required because BF = 0
Insertion of 30
No balancing required
Insertion of 60
No balancing required
Deletion
For deletion of any particular node from an AVL tree, the tree has to be reconstructed in
order to preserve the AVL property, and various rotations are needed to be applied for
balancing the tree.
Algorithm for deletion
The deletion algorithm is more complex than the insertion algorithm.
1. Search for the node which is to be deleted.
2. (A) If the node to be deleted is a leaf node, then simply remove it by setting its parent's link to NULL.
(B) If the node to be deleted is not a leaf node, i.e. the node has one or two children, then
the node must be swapped with its in-order successor. Once the node is swapped, we
can remove it.
3. Now we have to traverse back up the path towards the root, checking the balance
factor of every node along the path. If we encounter unbalancing in some sub-tree then
balance that sub-tree using an appropriate single or double rotation.
The deletion algorithm takes O (log n) time to delete any node.
Searching
Searching for a node in an AVL tree is very simple. As an AVL tree is basically a
binary search tree, the algorithm used for searching a node in a binary search tree is the
same one used in an AVL tree.
Searching for a node takes O(log n) time.
Figure 7.40
For example, consider the tree given in Figure 7.40. This is a weight balanced tree, which is
organized according to the number of accesses.
The rules for putting a node in a weight balanced tree are expressed recursively as
follows:
1. The first node of tree or sub-tree is the node with the highest count of number of
times it has been accessed.
2. The left sub-tree of the tree is composed of nodes with values lexically less than the
first node.
3. The right sub-tree of the tree is composed of nodes with values lexically higher than the
first node.
7.12 B-TREES
Working with a large number of data elements is inconvenient when considering
primary storage (RAM). Instead, for large collections of data, only a small portion is
maintained in primary storage and the rest resides in secondary storage; when
required, it is accessed from the secondary storage. Secondary storage, such as a
magnetic disk, is slower in accessing data than primary storage.
B-Trees are balanced trees; a specialized multiway (m-way) tree is used to store the
records on a disk. Each node has a number of sub-trees. The height of the tree is kept
relatively small so that only a small number of nodes must be read from the disk to retrieve
an item. The goal of B-trees is to provide fast access to the data: B-trees try to minimize
disk accesses, as disk accesses are expensive.
Multiway search tree
A multiway search tree of order m is an ordered tree where each node has at most m
children. If there are n children in a node then the node contains (n - 1) keys.
A B-tree is of order m if it satisfies the following conditions:
1. The root node has at least two children.
2. Except for the root node, each node has at most m children and at least m/2 (rounded
up) children.
3. All leaf nodes are at the same level. There should be no empty sub-tree
above the level of the leaf nodes.
4. If the order of the tree is m, then each node is allowed at most m - 1 keys.
1. Insertion
If the node into which a record is inserted overflows, because there is an upper bound on
the size of a node, splitting is required.
The node is split into three parts. The middle record is passed upward and inserted into
the parent, leaving two children behind where there was one before. The splitting may
propagate up the tree, because the parent into which a record is passed up from its split
child node may itself overflow. Therefore, it may also split. If the root is required to be
split, a new root is created with just two children, and the tree grows taller by one level.
As an example, we will construct a B-tree of order 5 using the following numbers:
3, 14, 7, 1, 8, 5, 11, 17, 13, 6, 23, 12, 20.
Order 5 means at most 4 keys are allowed in a node; each internal node must have at least
3 non-empty children and each leaf node must contain at least 2 keys.
Step 1: Insert 3, 14, 7, 1 as follows.
1 3 7 14
Step 2: Insert the next element 8. We then need to split the node 1, 3, 7, 14 at its median.
Hence,
here 1 and 3 are < 7 so they go to the left branch, and 8 and 14 are > 7 so they go to the
right branch.
Step 3: Insert 5, 11, 17 which can be easily inserted in a B-tree.
Step 4: Insert the next element 13. If we insert 13, the leaf node will have 5 keys,
which is not allowed. Hence 8, 11, 13, 14, 17 is split and the median key 13 is moved
up.
2. Deletion
As in the insertion method, the record to be deleted is first searched for. If the record is in
a terminal (leaf) node, the deletion is simple: the record along with an appropriate pointer is
deleted. If the record is not in a terminal node, it is replaced by a copy of its successor,
which is the record with the next higher value.
Consider a B-tree.
Now we want to delete 20. Since 20 is not in a leaf node, we find its successor, which is
23. Hence 23 is moved up to replace 20.
Next we delete 18. Deletion of 18 from the corresponding node leaves the node with
only one key, which is not allowed in a B-tree of order 5. The sibling node to the immediate
right has an extra key. In such a case we can borrow a key from the parent and move the
spare key of the sibling up.
3. Searching
The search operation on a B-tree is similar to a search on a binary search tree. Instead of
choosing between a left and a right child as in a binary tree, a B-tree makes an m-way choice.
Consider a B-tree as given below:
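That m-way choice can be sketched as follows (the node layout and names are mine; this is an illustration, not the book's implementation):

```cpp
#include <cstddef>
#include <vector>

struct BTreeNode {
    std::vector<int> keys;               // sorted keys, at most m - 1 of them
    std::vector<BTreeNode*> children;    // empty for a leaf, keys.size() + 1 otherwise
};

// Instead of a two-way (left/right) choice, pick among keys.size() + 1 branches.
bool btreeSearch(const BTreeNode *node, int key) {
    if (!node) return false;
    std::size_t i = 0;
    while (i < node->keys.size() && key > node->keys[i]) ++i;   // find the branch
    if (i < node->keys.size() && node->keys[i] == key) return true;
    if (node->children.empty()) return false;                   // reached a leaf
    return btreeSearch(node->children[i], key);
}
```

Searching an order-5 tree of height h thus costs at most h node reads, which is exactly what makes B-trees attractive on disk.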
Step 2:
We will encode each of the branches of the Huffman tree, working from top to bottom. If
we follow a left branch we encode it as 0, and if we follow a right branch we encode it as
1. Reading off the path from the root to each symbol's leaf gives that symbol's code word.
(The Huffman trees built in the remaining steps and the resulting tables of symbols and
code words appear in the accompanying figures.)
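The construction behind such code tables, repeatedly merging the two lowest-frequency trees until one tree remains and then labelling left branches 0 and right branches 1, can be sketched as follows (all names here are mine, and the frequencies used below are illustrative, not taken from the text):

```cpp
#include <map>
#include <queue>
#include <string>
#include <vector>

struct HuffNode {
    char symbol;                 // meaningful only at a leaf
    int freq;
    HuffNode *left, *right;
};

struct HuffCmp {                 // orders the priority queue as a min-heap on frequency
    bool operator()(const HuffNode *a, const HuffNode *b) const {
        return a->freq > b->freq;
    }
};

HuffNode *buildHuffman(const std::map<char, int> &freqs) {
    std::priority_queue<HuffNode*, std::vector<HuffNode*>, HuffCmp> pq;
    for (auto &p : freqs)
        pq.push(new HuffNode{p.first, p.second, nullptr, nullptr});
    while (pq.size() > 1) {                      // merge the two cheapest trees
        HuffNode *a = pq.top(); pq.pop();
        HuffNode *b = pq.top(); pq.pop();
        pq.push(new HuffNode{'\0', a->freq + b->freq, a, b});
    }
    return pq.top();
}

// Walk the tree, appending 0 for a left branch and 1 for a right branch.
void readCodes(const HuffNode *n, const std::string &prefix,
               std::map<char, std::string> &codes) {
    if (!n->left && !n->right) { codes[n->symbol] = prefix; return; }
    readCodes(n->left, prefix + "0", codes);
    readCodes(n->right, prefix + "1", codes);
}
```

Because every symbol sits at a leaf, no code word is a prefix of another, which is what allows the encoded stream to be decoded unambiguously.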
8
GRAPH THEORY
8.1 INTRODUCTION
In the previous chapter we have studied the non-linear data structure tree. Now we
introduce another non-linear data structure, graphs. With tree data structure, the main
restriction is that every tree has a unique root node. If we remove this restriction we get a
more complex data structure i.e. graph. In graph there is no root node at all and so we will
get introduced to a more complex data structure. In computer science graphs are used in a
wide range. There are many theorems on graphs. The study of graphs in computer science
is known as graph theory.
One of the first results in graph theory appeared in Leonhard Euler's paper on the seven
bridges of Königsberg, published in 1736. It is also regarded as one of the first topological
results in geometry, since it does not depend on any measurements. In 1845, Gustav
Kirchhoff published his circuit laws for calculating the voltage and current in electric
circuits.
In 1852, Francis Guthrie posed the four color problem, which asks if it is possible to
color, using only four colors, any map of countries in such a way as to prevent two
bordering countries from having the same color. This problem, which was solved only a
century later, in 1976, by Kenneth Appel and Wolfgang Haken, can be considered the birth
of graph theory. While trying to solve it, mathematicians invented many fundamental
graph-theoretic terms and concepts.
Structures that can be represented as graphs are everywhere, and many practical
problems can be represented by graphs. The link structure of a website can be
represented by a graph, such that the vertices are the web pages available at the website
and there is a directed edge from page X to page Y if and only if X contains a link to Y.
Networks have many uses in the practical side of graph theory, for example in network
analysis (to model and analyze traffic networks or to discover the shape of the internet).
The difference between a tree and a graph is that a tree is a connected graph having no
circuits, while a graph can have circuits. A loop may be a part of a graph but a loop does
not take place in a tree.
We could have written the edge as (1, 5) or (5, 1); the ordering of vertices is not significant
in an undirected graph.
Undirected Graph: A graph is called an undirected graph when the edges of a graph are
unordered pairs. If the edges in a graph are undirected or two-way then the graph is
known as an undirected graph.
By an unordered pair of vertices we mean that the order in which Vi and Vj occur in the
pair (Vi, Vj) is immaterial for describing the edge. Thus the pairs (Vi, Vj) and (Vj,
Vi) both represent the same edge connecting the vertices Vi and Vj. Figure 8.3 shows an
undirected graph.
Set of vertices V = {V1, V2, V3, V4}
Set of edges E = {e1, e2, e3, e4}
Here e1 may be written as (V1, V2) or (V2, V1); both represent the same edge.
Subgraph: A subgraph G′ of the graph G is a graph such that the set of vertices of G′ is a
subset of the set of vertices of G and the set of edges of G′ is a subset of the set of edges of G.
The graph shown in Figure 8.5 is a sub-graph.
Multigraph: A graph which contains a pair of nodes joined by more than one edge is
called a multigraph and such edges are called parallel edges. An edge having the same
vertex as both its end vertices is called a self-loop (or a loop). The graph shown in Figure
8.7 is a multigraph.
A graph that has neither self-loops nor parallel edges is called a simple graph.
Degree: In a graph, the degree is defined for a vertex. The degree of a vertex Vi, denoted
degG(Vi), is the total number of edges incident on Vi. Note that a self-loop on a given
vertex is counted twice. (An edge having the same vertex as both its end vertices is called
a self-loop.)
Consider Figure 8.8.
As we have observed, in an undirected graph each edge contributes two to the total
degree. For a graph G with e edges and n vertices V1, V2, …, Vn, the number of edges is
half the sum of the degrees of all vertices:
deg(V1) + deg(V2) + … + deg(Vn) = 2e
Again, it can easily be seen that for any directed graph the sum of all in-degrees is
equal to the sum of all out-degrees, and each sum is equal to the number of edges in the
graph G, thus:
indeg(V1) + … + indeg(Vn) = outdeg(V1) + … + outdeg(Vn) = e
Null Graph: If a graph contains an empty set of edges and a non-empty set of vertices, the
graph is known as a null graph.
The graph shown in Figure 8.10 is a null graph.
Graph Isomorphism
Two graphs, G = (V, E) and G′ = (V′, E′), are said to be isomorphic if there exists a
one-to-one correspondence between their vertices and between their edges such that the
incidence relationship is preserved. Suppose that an edge ek has end vertices Vi and Vj
in G; then the corresponding edge ek′ in G′ must be incident on the vertices Vi′ and Vj′
that correspond to Vi and Vj respectively.
Two isomorphic graphs are shown in the figure below.
Isomorphic Properties
Both the graphs G and G′ have the same number of vertices.
Both the graphs G and G′ have the same number of edges.
Both the graphs G and G′ have the same degree sequences.
Consider a graph G of n vertices and the matrix M. If there is an edge present between
vertices Vi and Vj then M[i][j] = 1, else M[i][j] = 0. Note that for an undirected graph, if
M[i][j] = 1 then M[j][i] is also 1. Some graphs represented by adjacency matrices are shown here.
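The mapping just described can be sketched in C++ (a minimal illustration; the class and method names are my own):

```cpp
#include <vector>

struct MatrixGraph {
    std::vector<std::vector<int>> m;     // m[i][j] = 1 iff there is an edge Vi-Vj
    explicit MatrixGraph(int n) : m(n, std::vector<int>(n, 0)) {}
    void addUndirectedEdge(int i, int j) { m[i][j] = 1; m[j][i] = 1; }  // keep symmetric
    bool hasEdge(int i, int j) const { return m[i][j] == 1; }
    int degree(int i) const {            // row sum gives the degree of Vi
        int d = 0;
        for (int v : m[i]) d += v;
        return d;
    }
};
```

Note that the matrix always takes n × n cells regardless of how many edges exist, which is the space problem the adjacency list below addresses.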
We have seen how a graph can be represented using an adjacency matrix, for which we
used the array data structure. But the problems associated with arrays remain in the
adjacency matrix, so a more flexible data structure is needed, and for that we turn to a
linked data structure for the creation of a graph. The representation in which a graph is
created with linked lists is called an adjacency list.
In this representation, a graph is stored as a linked structure. We will represent a graph
using an adjacency list. This adjacency list stores information about only those edges that
exist. The adjacency list contains a directory and a set of linked lists. This representation is
also known as node directory representation. The directory contains one entry for each
node of the graph. Each entry in the directory points to a linked list that represents the
nodes that are connected to that node. The directory represents the nodes and linked lists
represent the edges.
Each node of the linked list has three fields: the first is the node identifier, the second is an
optional weight field which contains the weight of the edge, and the third is the link to the
next node:
Nodeid | Next    or    Nodeid | Weight | Next
Figure 8.14 represents the linked list representation of the directed graph as given in
Figure 8.13.
Figure 8.14 Linked list representation of the graph given in Figure 8.13.
An undirected graph of order N with E edges requires N entries in the directory and 2 ×
E linked list entries. The adjacency list representation of Figure 8.15 is shown in Figure
8.16.
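A sketch of this node-directory layout follows; vectors stand in for the hand-built linked lists, which is an implementation choice of this sketch, and the names are mine:

```cpp
#include <vector>

struct AdjEntry {
    int nodeid;      // identifier of the adjacent node
    int weight;      // optional weight field; 1 for an unweighted graph
};

struct ListGraph {
    std::vector<std::vector<AdjEntry>> directory;   // one list per node of the graph
    explicit ListGraph(int n) : directory(n) {}
    void addDirectedEdge(int from, int to, int w = 1) {
        directory[from].push_back(AdjEntry{to, w});
    }
    void addUndirectedEdge(int a, int b, int w = 1) {  // 2 list entries per edge
        addDirectedEdge(a, b, w);
        addDirectedEdge(b, a, w);
    }
};
```

Only the edges that actually exist are stored, so the space used is proportional to N + E rather than N × N.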
procedure traverse (v);
begin
mark v as visited;
for each vertex w adjacent to v do
if not visited (w) then
traverse (w);
end;
For example, consider the graph shown in Figure 8.18, which is visited in depth first
traversal starting from vertex A.
begin
open := [Start]; closed := [ ];
while open ≠ [ ] do
begin
remove the leftmost state from open, call it x;
if x is a goal then return (success)
else begin
generate children of x;
put x on closed;
put children on right end of open;
end
end;
return (failure)
end
For example, consider the tree shown in Figure 8.19. The open and closed lists
maintained by BFS are shown below:
Closed = [ ]
Closed = [A]
Open = [E,C,D];
Closed = [A,B]
Closed = [A,B,E,C,D]
Open = [G,H,I,J];
Closed = [A,B,E,C,D,F]
Open = [H,I,J,K];
Closed = [A,B,E,C,D,F,G]
Open = [I,J,K];
Closed = [A,B,E,C,D,F,G,H]
Open = [J,K];
Closed = [A,B,E,C,D,F,G,H,I]
Open = [K];
Closed = [A,B,E,C,D,F,G,H,I,J]
Open = [ ];
Closed = [A,B,E,C,D,F,G,H,I,J,K]
To understand DFS, consider Figure 8.20. The open and closed lists maintained by DFS are
shown below:
Closed = [ ]
Closed = [A]
Open = [D,E,C];
Closed = [A,B]
Closed = [A,B,D,H]
Open = [E,C];
Closed = [A,B,D,H,I]
Open = [J,C];
Closed = [A,B,D,H,I,E]
Open = [C];
Closed = [A,B,D,H,I,E,J]
Open = [F,G];
Closed = [A,B,D,H,I,E,J,C]
Open = [K,G];
Closed = [A,B,D,H,I,E,J,C,F]
Open = [G];
Closed = [A,B,D,H,I,E,J,C,F,K]
Open = [L];
Closed = [A,B,D,H,I,E,J,C,F,K,G]
Open = [ ];
Closed = [A,B,D,H,I,E,J,C,F,K,G,L]
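Both traces follow the same loop; the only difference is which end of the open list receives the children. A sketch built on that observation (the function name and the child-list encoding of the tree are my own):

```cpp
#include <algorithm>
#include <deque>
#include <vector>

// Returns the closed list. BFS appends children to the right end of open
// (a queue); DFS pushes them on the left end (a stack).
std::vector<int> traverse(const std::vector<std::vector<int>> &children,
                          int start, bool breadthFirst) {
    std::deque<int> open;
    std::vector<int> closed;
    open.push_back(start);
    while (!open.empty()) {
        int x = open.front();           // remove the leftmost state from open
        open.pop_front();
        if (std::find(closed.begin(), closed.end(), x) != closed.end())
            continue;                   // already expanded (relevant for graphs)
        closed.push_back(x);            // put x on closed
        const std::vector<int> &kids = children[x];
        if (breadthFirst)
            open.insert(open.end(), kids.begin(), kids.end());
        else
            open.insert(open.begin(), kids.begin(), kids.end());
    }
    return closed;
}
```

On the small tree 0 → {1, 2}, 1 → {3, 4}, the BFS closed list comes out level by level while the DFS closed list dives down the leftmost branch first.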
Advantages of BFS
1. BFS will not get trapped on dead-end paths. This contrasts with DFS, which may
follow a single unfruitful path for a long time before the path actually terminates in a
state that has no successor.
2. If there is a solution then BFS guarantees to find it. Furthermore if there are multiple
solutions then a minimal solution will be found.
Disadvantage of BFS
The full tree explored so far has to be stored in memory.
Advantages of DFS
1. DFS requires less memory since only the nodes on the current path are stored. This
contrasts with BFS, where all of the tree generated so far must be stored.
2. By chance, DFS may find a solution without examining much of the search space at
all. This contrasts with BFS, in which all parts of the tree must be examined to level n
before any nodes at level n + 1 can be examined.
Disadvantages of DFS
1. DFS may be trapped on dead-end paths, following a single unfruitful path for a
long time before the path actually terminates in a state that has no successor.
2. DFS may find a long path to a solution in one part of the tree, when a shorter path
exists in some other unexpected part of the tree.
from the remaining edges another edge that has the minimum weight, subject to the
condition that this edge does not make any circuit with the previously selected edges. The
whole process continues till n - 1 edges are selected, and these edges form the
desired minimal spanning tree.
Algorithm steps
1. Initialize T = NULL.
2. (Scan the edges of the given set E)
Repeat until n - 1 edges have been selected:
Set edge = the minimum-weight edge remaining in E
Set temp = edge [delete edge from the set E]
3. (Add temp to T if no circuit is obtained)
If temp does not create a cycle with the edges already in T
Then add temp to T [minimum weight edges]
4. (No spanning tree)
If T has fewer than n - 1 edges
Then message = No spanning tree
5. Exit
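The steps above can be sketched as follows. The circuit test of step 3 is done here with a simple union-find structure, which the text does not spell out, so that part is an assumption of this sketch:

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

struct Edge { int u, v, w; };

struct DisjointSet {                       // tracks which partial tree each vertex is in
    std::vector<int> parent;
    explicit DisjointSet(int n) : parent(n) {
        std::iota(parent.begin(), parent.end(), 0);
    }
    int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
    bool unite(int a, int b) {             // false: the edge would create a circuit
        a = find(a); b = find(b);
        if (a == b) return false;
        parent[a] = b;
        return true;
    }
};

// Returns the total weight of the minimum spanning tree, or -1 when fewer
// than n - 1 edges can be chosen (no spanning tree, as in step 4 above).
int kruskalMSTCost(int n, std::vector<Edge> edges) {
    std::sort(edges.begin(), edges.end(),
              [](const Edge &a, const Edge &b) { return a.w < b.w; });
    DisjointSet ds(n);
    int cost = 0, used = 0;
    for (const Edge &e : edges)
        if (ds.unite(e.u, e.v)) { cost += e.w; ++used; }
    return used == n - 1 ? cost : -1;
}
```

Sorting the edges once up front replaces the repeated "pick the minimum remaining edge" of step 2 without changing which edges are chosen.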
Example: Consider a graph G = (V, E, W), an undirected connected weighted graph as
shown in Figure 8.22. Kruskal's algorithm on graph G produces the minimum spanning
tree shown in Figure 8.23.
Solution The process for obtaining the minimum spanning tree using Kruskal's algorithm
is pictorially shown below:
Hence, the minimum cost of the spanning tree of the given graph using Kruskal's algorithm
is
= 2 + 3 + 3 + 5 + 6 + 9 = 28
Jarnik-Prim's Algorithm: In this algorithm, the edge with the minimum weight is chosen
first. Then, among the edges adjacent to the vertices already chosen, whichever edge has
the minimum weight is selected. This process is continued till all the vertices are covered.
The necessary condition is that no circuit should be formed. From Figure 8.24 we will
build the minimum spanning tree.
Example: Consider a graph G = (V, E, W), an undirected connected weighted graph shown
in Figure 8.24. Prim's algorithm on graph G produces the minimum spanning tree shown
in Figure 8.25. The arrows on edges indicate the predecessor pointers and the numeric
label in each vertex is the key value.
Solution The process for obtaining the minimum spanning tree using Prim's algorithm is
pictorially shown below:
Figure 8.25
Hence, the minimum cost of the spanning tree of the given graph using Prim's algorithm is
= 5 + 9 + 3 + 2 + 3 + 6 = 28
Algorithm
In Prim's algorithm an arbitrary node is chosen initially as the root node. The nodes of
the graph are then appended to the tree one at a time until all nodes of the graph are
included. The node added to the tree at each point is the node adjacent to a node of the
tree by an arc of minimum weight. That arc becomes the tree arc connecting the new node
to the tree. When all the nodes of the graph have been added to the tree, a minimum
spanning tree has been constructed for the graph.
In Kruskal's algorithm, by contrast, the nodes of the graph are initially considered as n
distinct partial trees with one node each. At each step of the algorithm, two distinct partial
trees are connected into a single partial tree by an edge of the graph. When only one
partial tree exists, it is a minimum spanning tree.
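The node-at-a-time growth just described can be sketched as follows; adjacency-matrix input with 0 meaning "no edge", and growth from vertex 0 as the arbitrary root, are assumptions of this sketch:

```cpp
#include <climits>
#include <vector>

// Grows the tree from vertex 0, at each step attaching the non-tree
// vertex reachable by the minimum-weight arc from the tree so far.
int primMSTCost(const std::vector<std::vector<int>> &w) {
    int n = (int)w.size();
    std::vector<bool> inTree(n, false);
    std::vector<int> key(n, INT_MAX);    // cheapest known arc into each vertex
    key[0] = 0;                          // arbitrary root
    int cost = 0;
    for (int step = 0; step < n; ++step) {
        int u = -1;
        for (int v = 0; v < n; ++v)      // pick the cheapest fringe vertex
            if (!inTree[v] && (u == -1 || key[v] < key[u])) u = v;
        inTree[u] = true;
        cost += key[u];
        for (int v = 0; v < n; ++v)      // update the arcs leaving u
            if (!inTree[v] && w[u][v] != 0 && w[u][v] < key[v])
                key[v] = w[u][v];
    }
    return cost;
}
```

Where Kruskal's loop merges partial trees, this loop always has exactly one growing tree, which is the contrast drawn in the paragraph above.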
two nodes (the source and destination nodes). We can simply obtain the minimum cost; but
by using the shortest path algorithm we can obtain the minimum distance between two
nodes. In our laboratories we have a local area network connecting all the computers.
Before designing a LAN we should always find the shortest paths, and thereby we can
obtain economical networking.
A solution to the shortest path problem is sometimes called a pathing algorithm. The most
important algorithms for solving this problem are:
Dijkstra's algorithm: Solves the single-source problem if all edge weights are greater
than or equal to zero. Without worsening the run time, this algorithm can in fact
compute the shortest paths from a given start point to all other nodes.
Bellman-Ford algorithm: Solves the single-source problem even if edge weights may
be negative.
A* algorithm: A heuristic algorithm for single-source shortest paths.
Floyd-Warshall algorithm: Solves the all-pairs shortest path problem.
Johnson's algorithm: Solves the all-pairs shortest path problem, and may be faster
than Floyd-Warshall on sparse graphs.
Graphs may be weighted or unweighted. Based on this distinction, let us discuss the
shortest path algorithms.
1. Unweighted shortest path: In an unweighted graph the length of a path is simply the
number of edges travelled from the source to the destination, and the unweighted
shortest path algorithm finds a path minimizing this number.
Example: Consider the graph given in Figure 8.26.
Path 1: V1 V2 V3 V10 (3 edges)
Path 2: V1 V4 V5 V6 V10 (4 edges)
Path 3: V1 V7 V8 V9 V10 (4 edges)
Out of these, path 1, i.e. V1 V2 V3 V10, is the shortest one as it consists of only 3 edges
from V1 to V10.
2. Dijkstra's shortest path algorithm: Dijkstra's shortest path algorithm finds
the shortest path from some source node to some other destination node. The
source node, from where we start measuring the distance, is called the start
node and the destination node is called the end node. In this algorithm we start from
the start node and find the distances of all paths from it to the neighboring nodes.
Among those, the nearest node is selected. This process of finding the nearest node is
repeated till the end node is reached; the path so obtained is the shortest path.
Since at each step the nearest of the candidate nodes is chosen, this is a greedy algorithm.
One more point: the shortest path need not include all the vertices of the graph, and
therefore it does not form a spanning tree.
Example: Find the shortest distance between a and z for the graph shown in Figure
8.27.
The shortest distance between a and z is computed for the given graph using Dijkstra's
algorithm as follows:
P = set of nodes which have already been selected
T = remaining nodes
Step 1: v = a
P = {a}, T = {b, c, d, e, f, z}
distance (b) = min {old distance (b), distance (a) + w (a, b)}
dist (b) = min {∞, 0 + 22}
dist (b) = 22
dist (c) = 16
dist (d) = 8 (minimum)
dist (e) = ∞
dist (f) = ∞
dist (z) = ∞
So the minimum node, node d, is selected into P.
Step 2: v = d
dist (z) = min {23, 21 + 2} = 23
Now the target vertex for the shortest path is z. Hence the length of the shortest
path from vertex a to z is 23.
The shortest path in the given graph is {a, d, f, z}.
Algorithm for shortest path
Algorithm ShortestPaths (v, cost, dist, n)
// dist[j], 1 <= j <= n, is set to the length of the shortest
// path from vertex v to vertex j in a digraph G with n vertices;
// dist[v] is set to zero. G is represented by its cost adjacency
// matrix cost[1 : n, 1 : n].
{
    for i := 1 to n do
    {   // initialize S
        S[i] := false; dist[i] := cost[v, i];
    }
    S[v] := true; dist[v] := 0.0; // put v in S
    for num := 2 to n - 1 do
    {
        // determine n - 1 paths from v
        choose u from among those vertices not in S
            such that dist[u] is minimum;
        S[u] := true; // put u in S
        for (each w adjacent to u with S[w] = false) do
            if (dist[w] > dist[u] + cost[u, w]) then
                dist[w] := dist[u] + cost[u, w];
    }
}
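A runnable version of the procedure above (a sketch with my own names; 0-based indices and a cost matrix in which 0 means "no edge" are assumptions of this sketch):

```cpp
#include <climits>
#include <vector>

// dist[j] becomes the length of the shortest path from start to j;
// unreachable vertices keep INT_MAX.
std::vector<int> dijkstra(const std::vector<std::vector<int>> &cost, int start) {
    int n = (int)cost.size();
    std::vector<bool> inS(n, false);     // S = vertices already selected
    std::vector<int> dist(n, INT_MAX);
    dist[start] = 0;
    for (int step = 0; step < n; ++step) {
        int u = -1;
        for (int j = 0; j < n; ++j)      // choose u not in S with minimum dist
            if (!inS[j] && (u == -1 || dist[j] < dist[u])) u = j;
        if (dist[u] == INT_MAX) break;   // remaining vertices are unreachable
        inS[u] = true;                   // put u in S
        for (int v = 0; v < n; ++v)      // dist[v] = min(dist[v], dist[u] + w(u, v))
            if (!inS[v] && cost[u][v] != 0 && dist[u] + cost[u][v] < dist[v])
                dist[v] = dist[u] + cost[u][v];
    }
    return dist;
}
```

Each iteration of the outer loop performs exactly the worked example's step: select the nearest unselected node, then relax the distances of its neighbours.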
9
SORTING AND SEARCHING
9.1 INTRODUCTION
Sorting and searching operations play a very important role in various applications,
most of which are database applications involving a large amount of data. Consider a
payroll system for a multinational company having several departments, each
department having many employees. If we want to see the salary of a particular
employee, it will be very difficult for us to examine each and every employee record. If
the records are organized according to the employee Id, i.e. arranged in either ascending
(increasing) or descending (decreasing) order, then searching for the desired data
becomes an easy task.
Another application of systematic arrangement of data is university student
records. In any university there are many colleges, each having several courses and
several departments, and each department has many students. If we want to see the result
of a particular student, it will be very difficult, so we organize students' data according to
the students' enrolment numbers. Another example is a telephone directory, where the
phone numbers are stored along with persons' names, and the surnames are arranged in
alphabetical order. So to find a person's telephone number, you just search by the
surname. Imagine how difficult it would be if the telephone directory had a
non-systematic arrangement of numbers. The above examples are based on two
techniques: sorting and searching.
Sorting is a systematic arrangement of the data. Systematic arrangement means based on
some key the data should be arranged in an ascending or descending order.
Before learning the sorting techniques let us understand some basic terminology which is
used in sorting.
9.2.1.1 Order
Sorting is a technique by which the list of elements is arranged in the order we expect.
The sorting order is the arrangement of the elements in some specific manner. Usually
sorting is of two types:
Descending Order: It is the sorting order in which the elements are arranged in the form
of high to low value. In other words elements are in a decreasing order.
Example: 15, 35, 45, 25, 55, 10
can be arranged in descending order after applying some sorting methods as
55, 45, 35, 25, 15, 10
Ascending Order: It is the sorting order in which the elements are arranged in the form of
low to high value. In other words elements are in an increasing order.
Example: 15, 35, 45, 25, 55, 10
can be arranged in ascending order after applying some sorting methods as
10, 15, 25, 35, 45, 55
9.2.1.2 Efficiency and passes
One of the major issues in the sorting algorithms is its efficiency. If we can efficiently sort
the records then that adds value to the sorting algorithm. We generally denote the
efficiency of a sorting algorithm in terms of time complexity. The time complexities are
given in terms of big-O notations.
Commonly, algorithms have O(n2) or O(n log n) time complexities. Sorting
techniques such as bubble sort, insertion sort, selection sort and shell sort have time
complexity O(n2), while techniques such as merge sort, quick sort and heap sort have time
complexity O(n log n). Efficiency also depends on the number of records to be sorted.
The efficiency of a sorting algorithm indicates how much time the algorithm takes to sort
the elements.
Sorting the elements into some specific order proceeds through a series of arrangements
of the elements. The phases in which the elements move to acquire their proper positions
are called passes.
Example: 10, 30, 20, 50, 40
Pass 1: 10, 20, 30, 50, 40
Pass 2: 10, 20, 30, 40, 50
In the above example we can see that the data gets sorted in two definite passes.
Applying the logic of comparing each element with its adjacent element gives us the
result in two passes.
Sorting is an important activity, and every time we insert or delete data we may need to
sort the remaining data again. Various sorting algorithms have been developed, such as:
Bubble sort
Insertion sort
Selection sort
Merge sort
Quick sort
Heap sort
Radix sort
Pass 1:
In this pass each element will be compared with its neighboring element.
Starting array: 45 55 35 90 70 30 (positions A0 A1 A2 A3 A4 A5)
Compare A[0] = 45 and A[1] = 55. Is 45 > 55? No, so no interchange.
Compare A[1] = 55 and A[2] = 35. Is 55 > 35? True, so interchange: A[1] = 35 and A[2] = 55.
45 35 55 90 70 30
Compare A[2] = 55 and A[3] = 90. Is 55 > 90? No, so no interchange.
Compare A[3] = 90 and A[4] = 70. Is 90 > 70? True, so interchange: A[3] = 70 and A[4] = 90.
45 35 55 70 90 30
Compare A[4] = 90 and A[5] = 30. Is 90 > 30? True, so interchange: A[4] = 30 and A[5] = 90.
45 35 55 70 30 90
After the first pass the array holds the elements partially sorted: 45 35 55 70 30 90.
Pass 2:
Compare A[0] = 45 and A[1] = 35. Is 45 > 35? True, so interchange: A[0] = 35 and A[1] = 45.
35 45 55 70 30 90
Compare A[1] = 45 and A[2] = 55. No interchange.
Compare A[2] = 55 and A[3] = 70. No interchange.
Compare A[3] = 70 and A[4] = 30. Is 70 > 30? True, so interchange: A[3] = 30 and A[4] = 70.
35 45 55 30 70 90
After the second pass: 35 45 55 30 70 90.
Pass 3:
Compare A[0] = 35 and A[1] = 45. No interchange.
Compare A[1] = 45 and A[2] = 55. No interchange.
Compare A[2] = 55 and A[3] = 30. Is 55 > 30? True, so interchange: A[2] = 30 and A[3] = 55.
35 45 30 55 70 90
After the third pass: 35 45 30 55 70 90.
Pass 4:
Compare A[0] = 35 and A[1] = 45. No interchange.
Compare A[1] = 45 and A[2] = 30. Is 45 > 30? True, so interchange: A[1] = 30 and A[2] = 45.
35 30 45 55 70 90
After the fourth pass: 35 30 45 55 70 90.
Pass 5:
Compare A[0] = 35 and A[1] = 30. Is 35 > 30? True, so interchange: A[0] = 30 and A[1] = 35.
30 35 45 55 70 90
Finally, at the end of the last pass the array holds all the elements fully sorted:
30 35 45 55 70 90
A0 A1 A2 A3 A4 A5
Since the comparison positions look like bubbles, it is called bubble sort.
Algorithm of Bubble Sort
Step 1: Read the total number of elements say n.
Step 2: Store the elements in an array.
Step 3: Set the initial element i = 0.
Step 4: Compare the adjacent elements.
Step 5: Repeat step 4 for all n elements.
Step 6: Increment the value of i by 1 and repeat step 4, 5 for i < n.
Step 7: Print the sorted list of elements.
Step 8: Stop.
Program for sorting the elements by bubble sort algorithm
#include <iostream>
using namespace std;
int main()
{
    int a[100], n, i, j, temp;
    cout << "How many elements do you want to sort = ";
    cin >> n;
    cout << endl << "Enter the elements of the array" << endl;
    for (i = 0; i <= n - 1; i++)
        cin >> a[i];
    for (i = 0; i < n - 1; i++)            // passes
        for (j = 0; j < n - 1 - i; j++)    // compare adjacent elements
            if (a[j] > a[j + 1])
            {
                temp = a[j]; a[j] = a[j + 1]; a[j + 1] = temp;
            }
    cout << "Sorted list:" << endl;
    for (i = 0; i < n; i++)
        cout << a[i] << " ";
    return 0;
}
The complexity of sorting depends on the number of comparisons. The number of passes
necessary may vary from 1 to (n - 1), but the number of comparisons required in a pass
does not depend on the data: for the ith pass, the number of comparisons required is (n - i).
In the best case, the bubble sort performs only one pass, which gives O(n) complexity.
The number of comparisons required is then obviously (n - 1). This case arises when the
given list is already sorted.
In the worst case the list is in reverse order, every pass performs its maximum number of comparisons, and the total is (n − 1) + (n − 2) + ... + 1 = n(n − 1)/2, which gives O(n²) complexity.
Example: Consider the list 30, 70, 20, 50, 40, 10, to be sorted by insertion sort. The passes are as follows:
Pass 1: Compare A[1] > A[0], i.e. 70 > 30. True, so the positions of the elements remain the same.
30 70 20 50 40 10
A0 A1 A2 A3 A4 A5
Pass 2: Compare A[2] > A[1], i.e. 20 > 70. False, so interchange the positions of the elements. Then A[1] > A[0], i.e. 20 > 30. False, so interchange again.
20 30 70 50 40 10
A0 A1 A2 A3 A4 A5
Pass 3: Compare A[3] > A[2], i.e. 50 > 70. False, so interchange the positions of the elements. Then A[2] > A[1], i.e. 50 > 30. True, so the positions remain the same.
20 30 50 70 40 10
A0 A1 A2 A3 A4 A5
Pass 4: Compare A[4] > A[3], i.e. 40 > 70. False, so interchange the positions of the elements. Then A[3] > A[2], i.e. 40 > 50. False, so interchange again. Then A[2] > A[1], i.e. 40 > 30. True, so the positions remain the same.
20 30 40 50 70 10
A0 A1 A2 A3 A4 A5
Pass 5: Compare A[5] > A[4], i.e. 10 > 70. False, so interchange the positions of the elements. Then A[4] > A[3], i.e. 10 > 50. False, so interchange. Then A[3] > A[2], i.e. 10 > 40. False, so interchange. Then A[2] > A[1], i.e. 10 > 30. False, so interchange. Finally A[1] > A[0], i.e. 10 > 20. False, so interchange.
10 20 30 40 50 70
A0 A1 A2 A3 A4 A5
Finally, at the end of the last pass the array holds all the elements in sorted order:
10 20 30 40 50 70
A0 A1 A2 A3 A4 A5
Algorithm of Insertion Sort
Step 1: Read the total number of elements, say n.
Step 2: Store the elements in an array.
Step 3: Set the initial element i = 1.
Step 4: Take the key = a[i] and compare it with the elements to its left; shift every element greater than the key one position to the right,
Else
Insert the key into the array.
Step 5: Repeat step 4 for all n elements.
Step 6: Increment the value of i by 1 and repeat step 4, 5 for i < n.
Step 7: Print the sorted list of elements.
Step 8: Stop.
Program for sorting the elements by insertion sort algorithm
#include <iostream>
using namespace std;

int main()
{
    int a[100], n, i, j, temp;
    cout << "How many elements do you want to sort? ";
    cin >> n;
    cout << "\nEnter the elements of the array\n";
    for (i = 0; i < n; i++)
        cin >> a[i];
    cout << "Elements before sorting are\n";
    for (i = 0; i < n; i++)
        cout << a[i] << endl;
    for (i = 1; i < n; i++)       // insert a[i] into the sorted part a[0..i-1]
    {
        temp = a[i];
        j = i - 1;
        while (j >= 0 && a[j] > temp)
        {
            a[j + 1] = a[j];      // shift larger elements one position right
            j = j - 1;
        }
        a[j + 1] = temp;
    }
    cout << "Elements after sorting are\n";
    for (i = 0; i < n; i++)
        cout << a[i] << endl;
    return 0;
}
Output of the program
How many elements do you want to sort? 6
Enter the elements of the array
30 70 20 50 40 10
Elements before sorting are
30
70
20
50
40
10
Elements after sorting are
10
20
30
40
50
70
Analysis
When an array of elements is almost sorted, it is the best case. The best case time complexity of insertion sort is O(n).
If an array is randomly arranged, it results in the average case time complexity, which is O(n²).
If the list of elements is arranged in descending order and we want to sort the elements in ascending order, it results in the worst case time complexity, which is O(n²).
Example: Consider the list 70, 45, 25, 50, 90, 20, to be sorted by selection sort.

70 45 25 50 90 20
A0 A1 A2 A3 A4 A5

Pass 1: Scan A[0] to A[5] for the smallest element; it is 20 at A[5].
Now swap A[0] with the smallest element. Then we get the array list,

20 45 25 50 90 70
A0 A1 A2 A3 A4 A5

Pass 2: Scan A[1] to A[5] for the smallest element; it is 25 at A[2].
Now swap A[1] with the smallest element. Then we get the array list,

20 25 45 50 90 70
A0 A1 A2 A3 A4 A5

Pass 3: Scan A[2] to A[5] for the smallest element; it is 45, already at A[2], so the array is unchanged.

Pass 4: Scan A[3] to A[5] for the smallest element; it is 50, already at A[3], so the array is unchanged.

Pass 5: Scan A[4] to A[5] for the smallest element; it is 70 at A[5].
Now swap A[4] with the smallest element. Then we get the final sorted array,

20 25 45 50 70 90
A0 A1 A2 A3 A4 A5
Elements after sorting are
20
25
45
50
70
90
Analysis
The number of comparisons made by selection sort does not depend on the initial order of the elements: every pass scans the whole unsorted part of the array. Hence the best, average and worst case time complexity of selection sort is O(n²); only the number of swaps varies with the input.
Advantage
Selection sort is faster than bubble sort.
If an item is in its correct final position, it will never be moved.
Selection sort has better predictability: its worst case time differs little from its best case time.
Pass 3: [1 2 3 4 5 6 7 8]
Sorted element: 1 2 3 4 5 6 7 8
Merging two or more lists
Merging is the process of combining two or more sorted files into a third sorted file. Let A be a sorted list containing X elements and B be a sorted list containing Y elements. Then the operation that combines the elements of A and B into a new sorted list C with Z = X + Y elements is called merging.
Compare the smallest elements of A and B and put the smaller into the new list C. The process is repeated until either list A or list B is empty. Then place the remaining elements of A (or of B) in C. The new list C contains the sorted elements, and its length equals the total number of elements of lists A and B.
Algorithm
Given two sorted lists A and B consisting of X and Y elements respectively, this algorithm merges the two lists and produces a new sorted list C. Variables Pa and Pb keep track of the locations of the smallest unprocessed elements in A and B. Variable Pc refers to the location in C to be filled.
Step 1: Set Pa = 1;
Pb = 1;
Pc =1;
Step 2: loop comparisons
Repeat while (Pa <= X and Pb <= Y)
If (A[Pa] < B[Pb]) then
Set C[Pc] =A[Pa]
Set Pc = Pc + 1
Set Pa = Pa + 1
else
C[Pc] = B[Pb]
Set Pc = Pc + 1
Set Pb = Pb + 1
Step 3: Append C with the remaining elements of A (or B)
If (Pa > X) then
Repeat for i = 0, 1, 2, ..., Y − Pb
Set C[Pc + i] = B[Pb + i]
End loop
Else
Repeat for i = 0, 1, 2, ..., X − Pa
Set C[Pc + i] = A[Pa + i]
End loop
Step 4: Finished.
Example: Consider two sorted lists A and B as follows:
A: 1 5
10 20 25
B: 7 14 21 28 35
The process of merging and sorting is illustrated below, producing a new sorted list C.
Initially: Pa = 1;
Pb = 1;
Pc =1;
Step 1: Compare A[Pa] and B[Pb] or (A[1] and B[1])
A[Pa] < B[Pb], (1 < 7) so put 1 in C[Pc]
A: 1 5
10 20 25
B: 7 14 21 28 35
C: 1
Pa = Pa + 1
Pa = 2
Pb = 1
Pc = Pc + 1
Pc = 2
Step 2: Compare A[Pa] and B[Pb] or (A[2] and B[1])
A[Pa] < B[Pb], (5 < 7) so put 5 in C[Pc]
Pa = Pa + 1
Pa = 3
Pb = 1
Pc = Pc + 1
Pc = 3
Step 3: Compare A[Pa] and B[Pb] or (A[3] and B[1])
A[Pa] > B[Pb], (10 > 7) so put 7 in C[Pc]
Pa = 3
Pb = Pb + 1
Pb = 2
Pc = Pc + 1
Pc = 4
Step 4: Compare A[Pa] and B[Pb] or (A[3] and B[2])
A[Pa] < B[Pb], (10 < 14) so put 10 in C[Pc]
Pa = Pa + 1
Pa = 4
Pb = 2
Pc = Pc + 1
Pc = 5
Step 5: Compare A[Pa] and B[Pb] or (A[4] and B[2])
A[Pa] > B[Pb], (20 > 14) so put 14 in C[Pc]
Pa = 4
Pb = Pb + 1
Pb = 3
Pc = Pc + 1
Pc = 6
Step 6: Compare A[Pa] and B[Pb] or (A[4] and B[3])
A[Pa] < B[Pb], (20 < 21) so put 20 in C[Pc]
Pa = Pa + 1
Pa = 5
Pb = 3
Pc = Pc + 1
Pc = 7
Step 7: Compare A[Pa] and B[Pb] or (A[5] and B[3])
A[Pa] > B[Pb], (25 > 21) so put 21 in C[Pc]
Pa = 5
Pb = Pb + 1
Pb = 4
Pc = Pc + 1
Pc = 8
Step 8: Compare A[Pa] and B[Pb] or (A[5] and B[4])
A[Pa] < B[Pb], (25 < 28) so put 25 in C[Pc]
Pa = Pa + 1
Pa = 6
Pb = 4
Pc = Pc + 1
Pc = 9
Step 9: Append the elements of B to C
As Pa > X, put all the remaining elements of B in C, incrementing Pb and Pc by 1 each time until list B is also empty.
Pa = 6
Pb = Pb + 1
Pb = 5
Pc = Pc + 1
Pc = 10
Pa = 6
Pb = Pb + 1
Pb = 6
Pc = Pc + 1
Pc = 11
Now Pb > Y, which shows that B is also exhausted. Finally we have a new sorted list C as follows:
C = 1, 5, 7, 10, 14, 20, 21, 25, 28, 35
Analysis
Merge sort always divides the list in half and merges the halves, so its behaviour does not depend on the initial order of the elements. The best case time complexity of merge sort is O(n log2 n).
If an array is randomly arranged, the average case time complexity is likewise O(n log2 n).
If the list of elements is arranged in descending order and we want to sort the elements in ascending order, the worst case time complexity is still O(n log2 n).
Example: Consider a list 25, 10, 35, 5, 60, 12, 58, 18, 49, 19 we have to sort the list using
quick sort techniques.
Solution: We use the first number, 25, as the pivot. Beginning with the last number, 19, we scan from right to left, comparing each number with 25 and stopping at the first number having a value less than 25. The first number visited that has a value less than 25 is 19. Thus, exchange both of them.
A0 A1 A2 A3 A4 A5 A6 A7 A8 A9
19 10 35 5 60 12 58 18 49 25
Scanning from left to right, the first number visited that has a value greater than 25 is 35.
Thus, exchange both of them.
A0 A1 A2 A3 A4 A5 A6 A7 A8 A9
19 10 25 5 60 12 58 18 49 35
Scanning from right to left, the first number visited that has a value less than 25 is 18.
Thus, exchange both of them.
A0 A1 A2 A3 A4 A5 A6 A7 A8 A9
19 10 18 5 60 12 58 25 49 35
Scanning from left to right, the first number visited that has a value greater than 25 is 60.
Thus, exchange both of them.
A0 A1 A2 A3 A4 A5 A6 A7 A8 A9
19 10 18 5 25 12 58 60 49 35
Scanning from right to left, the first number visited that has a value less than 25 is 12.
Thus, exchange both of them.
A0 A1 A2 A3 A4 A5 A6 A7 A8 A9
19 10 18 5 12 25 58 60 49 35
Thus 25 is correctly placed in its final position, and we get two sublists, Sublist1 and Sublist2. Sublist1 holds values less than 25, while Sublist2 holds greater values.
Beginning with the last number, 12, scanning from the right to left, comparing each
number with 19 and stopping at the first number having a value less than 19. The first
number visited that has a value less than 19 is 12. Thus, exchange both of them.
A0 A1 A2 A3 A4
12 10 18 5 19
Now, 19 is correctly placed in its final position. Therefore, we sort the remaining Sublist1 beginning with 12. We scan the list from right to left. The first number having a value less than 12 is 5. We interchange 5 and 12 to obtain the list.
A0 A1 A2 A3
5 10 18 12
Beginning with 5 we scan the list from left to right. The first number having a value
greater than 12 is 18. We interchange 12 and 18 to obtain the list.
A0 A1 A2 A3
5 10 12 18
Beginning with 58 we scan the list right to left. The first number having a value less than
58 is 35. We interchange 58 and 35 and obtain the list.
A6 A7 A8 A9
35 60 49 58
Beginning with 35 we scan the list from left to right. The first number having a value
greater than 58 is 60. We interchange 58 and 60 to obtain the list.
A6 A7 A8 A9
35 58 49 60
Beginning with 60 we scan the list from right to left. The first number having a value less than 58 is 49. We interchange 58 and 49 to obtain the list.
A6 A7 A8 A9
35 49 58 60
Now every element is in its final position, so the completely sorted list is:
5 10 12 18 19 25 35 49 58 60
Algorithm for partition
Step 1: Initialization
pivot = A[low]
i = low
j = high + 1
Step 2: Checking
While (i <= j) do
While (A[i] <= pivot) do
i = i + 1
While (A[j] >= pivot) do
j = j - 1
If (i <= j) then
swap(A[i], A[j])
swap(A[low], A[j])
return j
Program for sorting the elements by Quick sort algorithm
#include <iostream>
#include <cstdlib>
using namespace std;

int Partition(int low, int high, int arr[]);
void Quick_sort(int low, int high, int arr[]);

int main()
{
    int *a, n, low, high, i;
    cout << "/*********Quick Sort Algorithm Implementation*********/\n";
    cout << "Enter number of elements: ";
    cin >> n;
    a = new int[n];
    for (i = 0; i < n; i++)          // fill the array with random values
        a[i] = rand() % 100;
    cout << "Initial Order of elements: ";
    for (i = 0; i < n; i++)
        cout << a[i] << " ";
    cout << "\n";
    low = 0;
    high = n - 1;
    Quick_sort(low, high, a);
    cout << "Final Array After Sorting: ";
    for (i = 0; i < n; i++)
        cout << a[i] << " ";
    cout << "\n";
    delete[] a;
    return 0;
}

/* Function for partitioning the array */
int Partition(int low, int high, int arr[])
{
    int high_vac, low_vac, pivot;
    pivot = arr[low];                 // the first element acts as the pivot
    while (high > low)
    {
        high_vac = arr[high];
        while (pivot < high_vac)      // scan from the right for a value <= pivot
        {
            if (high <= low)
                break;
            high--;
            high_vac = arr[high];
        }
        arr[low] = high_vac;
        low_vac = arr[low];
        while (pivot > low_vac)       // scan from the left for a value >= pivot
        {
            if (high <= low)
                break;
            low++;
            low_vac = arr[low];
        }
        arr[high] = low_vac;
    }
    arr[low] = pivot;                 // place the pivot in its final position
    return low;
}

void Quick_sort(int low, int high, int arr[])
{
    int Piv_index;
    if (low < high)
    {
        Piv_index = Partition(low, high, arr);
        Quick_sort(low, Piv_index - 1, arr);
        Quick_sort(Piv_index + 1, high, arr);
    }
}
Output
/*********Quick Sort Algorithm Implementation***************/
Enter number of elements: 8
Initial Order of elements: 50 30 10 90 80 20 40 70
Final Array After Sorting: 10 20 30 40 50 70 80 90
Analysis
When the pivot is chosen such that the array gets divided in the middle, it gives the best case. The best case time complexity of quick sort is O(n log2 n).
If an array is randomly arranged, it results in the average case time complexity, which is O(n log2 n).
The worst case for quick sort occurs when the pivot is the minimum or maximum of all the elements in the list. This results in worst case time complexity of O(n²).
Complete binary tree: A complete binary tree is a binary tree in which every level, except possibly the last, is completely filled, i.e. level i contains 2^i nodes.
For example:
The heap must be either a max heap (i.e. every parent is greater than all its children nodes) or a min heap (i.e. every parent node is smaller than all its children nodes).
Heap sort is a sorting method discovered by J.W.J. Williams. It works in two stages: heap construction and processing the heap.
Heap construction: A heap is a tree data structure in which every parent node must be either greater than or less than its children nodes; such heaps are called max heaps and min heaps respectively.
Now we scan the tree from the bottom up, checking the parental property at each node, in order to build a max heap.
list[i] = temp
Return.
Step 1: In the first pass, sort the elements according to the units digits.
Units Digits Elements
0
1 321, 361
2
3 143, 423, 543
4
5
6 366
7
8 128, 348, 538
9
Elements after the first pass: 321, 361, 143, 423, 543, 366, 128, 348, 538
Step 2: In the second pass, sort the elements according to the tens digits.
Tens Digits Elements
0
1
2 321, 423, 128
3 538
4 143, 543, 348
5
6 361, 366
7
8
9
Elements after the second pass: 321, 423, 128, 538, 143, 543, 348, 361, 366
Step 3: In the third and final pass, sort the elements according to the hundreds digits.
Hundreds Digits Elements
0
1 128, 143
2
3 321, 348, 361, 366
4 423
5 538, 543
6
7
8
9
Elements after the third pass: 128, 143, 321, 348, 361, 366, 423, 538, 543.
Thus, the final list sorted by the radix sort method is:
128, 143, 321, 348, 361, 366, 423, 538, 543.
Algorithm for Radix sort
1. Read the total number of elements in the array.
2. Store the unsorted elements in the array.
3. Now sort the elements digit by digit.
4. Sort the elements according to the units digit, then the tens digit, then the hundreds, and so on.
5. Continue until the elements have been sorted on the most significant digit.
6. Store the sorted element in the array and print them.
7. Stop.
9.4 SEARCHING
The technique for finding a particular or desired data element that has been stored with a specific identification is referred to as searching. In daily life, most people spend time searching for their keys; here we use the key as the identification of the data which has to be searched.
While searching, we are asked to find the record that contains other information associated with the key. For example, given a name we are asked to find the telephone number, or given an account number we are asked to find the balance in that account. Such a key is called an internal key or an embedded key. There may be a separate table of keys that includes pointers to the records, in which case it may be necessary to store the records in secondary storage. Searching where most of the table is kept in secondary storage is called external searching. Searching where the table to be searched is stored entirely in main memory is called internal searching.
There are two searching methods: linear search and binary search.
Algorithm for linear search
Repeat for i = 0, 1, ..., n - 1
{
If (target == k[i])
{
Print: successful search;
Exit;
}
Else
i++;
}
If no match is found then
Print: unsuccessful search;
Exit
Example: Given a set containing 6 data items:
25, 30, 13, 20, 37, 26
A0 A1 A2 A3 A4 A5
25 30 13 20 37 26
From the set we have to search the data item target = 13. The sequential search is as
follows:
Step 1: Compare target with A0; here i = 0 (as 13 ≠ 25), so i++
Step 2: Compare target with A1; here i = 1 (as 13 ≠ 30), so i++
Step 3: Compare target with A2; here i = 2 (as 13 = 13)
The search is successful and it requires 3 comparisons.
Program for linear search algorithm
#include <iostream>
using namespace std;

int main()
{
    int array[10] = {20, 40, 100, 80, 10, 60, 50, 90, 30, 70};  // fill the array
    cout << "Enter the number you want to find (from 10 to 100)" << endl;
    int key;
    cin >> key;
    int pos = -1;                      // -1 means "not found"
    for (int i = 0; i < 10; i++)       // scan the array from left to right
    {
        if (array[i] == key)           // match found
        {
            pos = i;
            break;                     // break out of the for loop
        }
    }
    if (pos != -1)
        cout << "Your number is at subscript position " << pos << ".\n";
    else
        cout << "Sorry, I could not find your number in this array." << endl;
    return 0;
}
Output
Enter the number you want to find (from 10 to 100)
10
Your number is at subscript position 4
Analysis
Worst case: O(n)
Average case: O(n)
Best case: O(1)
Advantages of Linear Search
It is a simple and easy method.
It is efficient for small lists.
No sorting of items is required.
Disadvantages of Linear Search
It is not suitable for a large list of elements.
The data items are arranged in the following manner along with their respective keys:
A1 A2 A3 A4 A5 A6 A7 A8
5 10 15 20 25 30 35 40
#include <iostream>
using namespace std;

int bsearch(int AR[], int N, int VAL);

int main()
{
    int AR[100], n, i, val, found;
    cout << "Enter number of elements you want to insert ";
    cin >> n;
    cout << "Enter the elements in ascending order\n";
    for (i = 0; i < n; i++)
    {
        cin >> AR[i];
    }
    cout << "\nEnter the number you want to search ";
    cin >> val;
    found = bsearch(AR, n, val);
    if (found == 1)
        cout << "\nItem found";
    else
        cout << "\nItem not found";
    return 0;
}
int bsearch(int AR[], int N, int VAL)
{
int Mid,Lbound=0,Ubound=N-1;
while(Lbound<=Ubound)
{
Mid=(Lbound+Ubound)/2;
if(VAL>AR[Mid])
Lbound=Mid+1;
else
if(VAL<AR[Mid])
Ubound=Mid-1;
else
return 1;
}
return 0;
}
Output
SAMPLE RUN # 1
Enter number of elements you want to insert 5
10
TABLES
10.1 INTRODUCTION
In this chapter, we examine the simplest of all data types: the table. The values in a table, like the values in a sorted list, have two parts, a key and a data part. As the specification shows, there are only three non-trivial operations: insert, delete and retrieve. The retrieve operation takes a key and returns a Boolean indicating whether the table contains a value with that key; if the Boolean is true, it also returns the appropriate data part.
10.2 EXAMPLES
A familiar example is a telephone book. A value is an entry for a person or business. The key is the person's name; the data part is the other information (address and phone number). Another example is a tax table issued with the income tax guide. The key is the amount of taxable income; the data parts include the amounts of federal and provincial tax you must pay.
However, these examples are actually sorted lists, not tables in the pure sense. The difference is that in a list the elements are arranged in a sequence: there is a first element, a second one, and so on, and for every element (except the last) there is a unique next element.
In a table, there is no order given to the elements; there is no notion of next. Tables with no particular order arise fairly often in everyday life. A very familiar example is a table for converting between two kinds of units, such as metric units (of measure) and English units. The key is the unit of measure that you currently have; the data is the unit in the other system and the conversion formula. There is no particular order given to the entries in this table. Although it happens that the entry for kilograms is written directly after the entry for meters, this is an arbitrary ordering with no intrinsic meaning. An abstract table type reflects the fact that, in general, there is no intrinsic order among the entries of a table.
A table most closely resembles the abstract type collection. Indeed, there is only one important difference between the two: while we have an operation for traversing a collection (MAP), there is no such operation for tables, which means there is no way to enumerate the contents of a table. You can look up individual entries with the retrieve operation (e.g. you can find out how to convert grams to kilograms), but there is no operation that will list all the values in a table. Indeed, there is not even an operation reporting how many values a table contains.
relatively efficiently. A heap would be good for insertion and deletion but terrible for retrieval. In most applications, retrieval is the principal operation: you build up a table initially with a sequence of insertions and then do a large number of retrievals; deletion is usually rare. The importance of retrieval makes heaps a poor way to implement tables.
The best choices are binary search trees (especially if balanced) or B-trees, giving O(log N) insertion, deletion and retrieval. We will look at a technique called hashing that aims to perform these operations in constant time. That may seem impossible, but hashing does indeed come very close to achieving this goal.
We can access any position of an array in constant time. We think of the subscript as the key, and the value stored in the array as the data: given the key, we can access the data in constant time. For example, suppose we want to store the details of the students in a class. We could use an array of size 100, say, and assign each student a particular position in the array. We tell this number to the student, calling it his/her student number, and use it as a subscript into the array.
This is the basic idea behind a hash table. The only flaw in the strategy is the need to assign, and keep track of, each student's number. In practice, we usually do not control the key values: the set of possible keys is given to us as part of the problem, and we must accommodate it. To carry on with our example, suppose that circumstances forced us to use some part of each student's personal data as the key, say the student's social insurance number, as an array subscript, storing the information at the position it indexes.
The set of possible key values is very large, and might even be unbounded: imagine that the student's name was used as the key; there is effectively an infinite number of different names.
The set of actual key values, however, is quite small.
To get constant-time operations, we must use an array to store the information. The array cannot possibly be large enough to have a different position for every possible key. And, in any case, we must be able to accommodate keys of types (such as real numbers or strings) that are not legitimate (in C) as array subscripts.
10.4 HASHING
The search techniques discussed so far are based exclusively on comparing keys. The organization of the file and the order in which the keys are inserted affect the number of keys that must be examined before the desired one is found. If the location of a record within the table depends only on the value of its key, and not on the locations of other keys, we can retrieve each key in a single access. The most efficient way to achieve this is to store each record at a specific offset from the base address of the table. This suggests the use of arrays: if the record keys are integers, the keys themselves can serve as indices into the array, giving a one-to-one correspondence between keys and array indices.
A perfect relationship between the key value and the location of an element is not easy to establish or maintain. Consider an institute that uses its students' five-digit ID numbers as the primary key. The range of key values is then from 00000 to 99999, and it is clearly impractical to set up an array of 1,00,000 elements when only 100 are needed. What if we keep the array size down to what we actually need (an array of 100 elements) and just use the last two digits of the key to identify each student? For instance, the record of student 53374 is in student record [74].
Position Key Record
00 31300
01 49001
02 52202
...
99 01999
Hashing is an approach to convert a key into an integer within a limited range. This key
to address transformation is known as hashing function which maps the key space (K) into
an address space (A). Thus, a hash function H produces a table address where the record
may be located for the given key value (K).
Hashing function can be denoted as:
H : K A
Ideally, no two keys should be converted into the same address. Unfortunately, no hash function can guarantee this; the situation in which two keys map to the same address is called a collision. For example, the hash function in the preceding example is h(key) = key % 100. The function key % 100 can produce any integer between 0 and 99, depending on the value of the key.
A hash table is used for storing and retrieving data very quickly. Insertion of data into the hash table is based on the key value, so every entry in the hash table is associated with some key. For example, for storing an employee record in the hash table, the employee ID works as the key.
Using the hash key, the required piece of data can be found in the hash table with a few key comparisons. The searching time then depends upon the size of the hash table.
A dictionary can be represented effectively using a hash table: we place the dictionary entries (key and value pairs) in the hash table using the hash function.
Position Key
0 496800
1
2 7421002
...
998 7886998
999 1245999
10.5 COLLISION
The hash function is a function that returns the hash key with which a record can be placed in the hash table. It helps us place the record at an appropriate position in the hash table so that we can later retrieve the record directly from that location. The function needs to be designed very carefully: it should not return the same hash key address for two different records, as this is undesirable in hashing.
The situation in which the hash function returns the same hash key for more than one
record is called collision and the two identical hash keys returned for different records are
called synonyms.
When there is no room for a new pair in a hash table such a situation is called an
overflow. Sometimes when we handle collision it may lead to overflow conditions.
Collision and overflow show poor hash functions.
Example: Consider a hash function. H(key) = key % 10 having the hash table of size 10.
The record keys to be placed are 131, 44, 43, 78, 19, 36, 57 and 77
0
1 131
2
3 43
4 44
5
6 36
7 57
8 78
9 19
Now if we try to place 77 in the hash table then we get the hash key to be 7, and at index
7 the record key 57 is in place already. This situation is called collision. From the index 7
if we look for next vacant position at subsequent indices 8, 9 then we find that there is no
room to place 77 in the hash table. This situation is called an overflow.
Characteristics of Good Hashing Function
1. The hash function should be simple to compute.
2. Number of collisions should be less while placing the record in the hash table. Ideally
no collision should occur. Such a function is called a perfect hash function.
3. Hash functions should produce such a key which will get distributed uniformly over
an array.
4. The function should depend upon every bit of the key. Thus the hash function that
simply extracts the portion of a key is not suitable.
10.6.1 Chaining
Chaining is a collision handling method which introduces an additional field with the data, i.e. a chain. A separate chain is maintained for colliding data: when a collision occurs, a linked list (chain) is maintained at the home bucket.
Chaining involves maintaining two tables in memory. First, as before, there is a table which contains the records, except that now each record has an additional field, Link, used so that all records with the same hash address H may be linked together to form a linked list. Second, there is a hash address table, which contains pointers to the linked lists in the record table.
Chaining hash tables have advantages over open addressed hash tables in that the
removal operation is simple, and resizing the table can be postponed for a much longer
time because performance degrades more gracefully even when every slot is used.
Example: Consider the keys to be placed in their home buckets are: 3, 4, 61, 131, 24, 9, 8,
7, 97, 21
We will apply a hash function as:
H(key) = key % D
where D is size of table. (Here D = 10) The hash table will be:
10.6.2 Linear Probing
Example: Consider the keys to be placed in their home buckets: 3, 4, 61, 131, 21, 24, 9, 8, 7
We will apply a hash function. We will use the division hash function. That means the
keys are placed using the formula:
H(key) = key % tablesize
H(key) = key % 10
For instance the element 61 can be placed at:
H(key) = 61 % 10
= 1
Index 1 will be the home bucket for 61. Continuing in this fashion we will place 3, 4, 8,
7.
0 Null
1 61
2 Null
3 3
4 4
5 Null
6 Null
7 7
8 8
9 9
Now the next key to be inserted is 131. According to the hash function
H(key) = 131 % 10
H(key) = 1
But the index 1 location is already occupied by 61, i.e. a collision occurs. To resolve this collision we move down linearly to the next empty location. Therefore 131 will be placed at index 2, 21 at index 5 and 24 at index 6.
0 Null
1 61
2 131
3 3
4 4
5 21
6 24
7 7
8 8
9 9
We can see that a chain is maintained for the numbers that hash to location 1. When the first such number, 131, comes, we place it at index 1. Next comes 21, but a collision occurs, so by linear probing we place 21 at index 2, and the chain is maintained by writing 2 in the chain table at index 1. Similarly, 61 comes next; by linear probing we place 61 at index 5, and the chain is maintained at index 2. Thus any element whose hash key is 1 is stored by linear probing at an empty location, but a chain is maintained so that traversing the hash table will be efficient.
The drawback of this method is in finding the next empty location: we ignore the fact that the element which actually belongs to that empty location can no longer obtain it, which means the logic of the hash function gets disturbed.
3 31 1
6
7
8
9
Now the next element is 2. The hash function indicates the hash key 2. We have already stored element 21 at index 2, but we also know that 21 does not belong at the position where it is currently placed. Hence we replace 21 by 2 and update the chain table accordingly. See the table:
Index Data Chain
0
131
31
21
The value −1 in the hash table and chain table indicates an empty location.
The advantage of this method is that the meaning of the hash function is preserved, but each time some logic is needed to test whether an element is at its proper position.
4
5 65
6 87
7 27
8 17
9 49
6
7 37
8
9 49
H1 (45) = 45 % 10 = 5
H1 (22) = 22 % 10 = 2
H1 (49) = 49 % 10 = 9
Now if 17 is to be inserted, then:
H1 (17) = 17 % 10 = 7
H2(key) = M − (key mod M)
Here M is a prime number smaller than the size of the table. A prime number smaller than the table size of 10 is 7; hence M = 7.
H2(17) = 7 − (17 mod 7) = 7 − 3 = 4
That means we have to insert the element 17 at 4 places (jumps) from index 7. Therefore, 17 will be placed at index (7 + 4) mod 10 = 1.
Now to insert 55:
H1(55) = 55 % 10 = 5
H2(55) = 7 − (55 mod 7) = 7 − 6 = 1
That means we have to take one jump from index 5, placing 55 at index 6. Finally, the hash table will be:
0 90
1 17
2 22
3
4
5 45
6 55
7 37
8
9 49
10.6.7 Rehashing
Rehashing is a technique in which the table is resized: the size of the table is doubled by creating a new table, and it is preferable for the total size of the table to be a prime number. Situations in which rehashing is required include:
When the table is completely full
With quadratic probing, when the table is half full
When insertions fail due to overflow
In such situations, we will have to transfer entries from the old table to the new table by
recomputing their positions using suitable hash functions.
Consider that we have to insert the elements 37, 90, 55, 22, 17, 49 and 87. The table size
is 10 and we will use the hash function
H (key) = key mod table size
37 % 10 = 7
90 % 10 = 0
55 % 10 = 5
22 % 10 = 2
17 % 10 = 7 (collision, resolved by probing)
49 % 10 = 9
87 % 10 = 7 (collision, resolved by probing)
Now this table is almost full, and if we try to insert more elements, collisions will occur
and eventually further insertions will fail. Hence we will rehash by doubling the table size.
The old table size is 10, so the new size should be double, i.e. 20. But 20 is not a prime
number, so we prefer a table size of 23. Now the hash function will be
H (key) = key mod 23
37 % 23 = 14
90 % 23 = 21
55 % 23 = 9
22 % 23 = 22
17 % 23 = 17
49 % 23 = 3
87 % 23 = 18
Index Data
0
1
2
3 49
4
5
6
7
8
9 55
10
11
12
13
14 37
15
16
17 17
18 87
19
20
21 90
22 22
Index
A
Abstract data type (ADT)
Adjacency list, representation of
Adjacency matrix
properties of
representation of
ADT (abstract data type)
array as
libraries of
programming with
reusability of
Algorithm
complexity notations
complexity of time
efficiency of
for DQ Full
for insert front
implementation of
Almost complete binary tree
Array
analysis of
definition of
disadvantages of
limitations of
representation of
uses of
Array polynomial
representation of
Ascending priority queue
Design
Digital binary search tree
Dijkstra's shortest path algorithm
Directed graph
Documentation
Domain for fraction
Double hashing
Doubly circular linked list
Doubly linked list
Down pointer
D-queue (double ended queue)
ADT for
input restricted
output-restricted
Dynamic memory management
linked list and
Dynamic memory
allocation in C
Dynamic tree table
E
Efficiency
Eight queens problem
Extended binary tree
External sorting
F
Factorial function
Feasibility study
Fibonacci function
Free functions
G
Garbage collection and compaction
Malloc function
Matrices
Merge sort
Minimal spanning tree
Minimum cost spanning tree
Minimum spanning tree
Multigraph
Multiway search tree
N
Next pointer
Node directory
representation of
Node, structure of
Non-leaf nodes
Non-terminal nodes
Null graph
O
Omega notation
One-dimensional array
Open addressing
Operations
Optimal binary search tree
Order
Ordered list
operation on
Ordered trees
P
Parallel edges
Pass
Polish notation
Polynomials
Postconditions
Postfix expression
evaluation of
Postfix to infix expression
Postfix to prefix expression
Preconditions
Prefix expression
Prim's algorithm
Priority queue
applications of
ADT for
Problem specification
Programs
analysis of
Q
Quadratic probing
Queue structure
Queue
applications of
as ADT
in C++
operations on
static implementation of
Quick sort
algorithm for
working of
R
Radix sort
algorithm for
Recursion
Recursive functions
Tree
common operations on
definition of
uses for
Two-dimensional arrays
U
Undirected graph
Unweighted shortest path
User defined data type
W
Weight balanced tree