Data Structures Using C Compress
Data Structures Using C
Samir Kumar Bandyopadhyay
Kashi Nath Dey
PEARSON
Copyright © 2009 Dorling Kindersley (India) Pvt. Ltd.
Licensees of Pearson Education in South Asia
No part of this eBook may be used or reproduced in any manner whatsoever without the publisher’s
prior written consent.
This eBook may or may not include all assets that were part of the print version. The publisher
reserves the right to remove any material in this eBook at any time.
ISBN 9788131722381
eISBN 9789332501362
Head Office: A-8(A), Sector 62, Knowledge Boulevard, 7th Floor, NOIDA 201 309, India
Registered Office: 11 Local Shopping Centre, Panchsheel Park, New Delhi 110 017, India
PREFACE
This book aims to cater to beginners who wish to learn C and data structures under the same umbrella.
While teaching C and data structures, we felt the need for a balanced book on the subject. In fact, this was the
main impetus for writing such a book.
The book is designed for a one-semester course or a one-year course. It is suitable for courses based
on algorithms and data structures. The prerequisite for using this text is elementary to middle level
knowledge of C programming.
Algorithms in the book are presented in a way that readers can easily understand the method of
solving problems. Concepts are illustrated through examples. All programs in the text are tested. Each
chapter ends with exercises containing questions of varied difficulty levels.
Chapter 1 deals with basic data representation techniques. Chapter 2 concentrates on abstract data
types and structures together with the concepts of implementing a data structure. Chapter 3 covers array
data structures, the simplest and one of the best-known linear data structures, and their implementation
details. As applications of array data structures, Chapter 4 deals with string processing and data matching
techniques. Chapter 5 introduces the concept of pointers in C. Why pointers play a key role in algorithm
implementation, and how and when to use them, are discussed in detail. Stacks and queues are covered in
Chapter 6. Though they are special types of lists, this chapter deals only with their array implementation.
Expression evaluation is discussed as an application of stacks. A rudimentary program for skill testing in
multiplication is presented as an application of queues.
Chapter 7 covers recursion, a problem-solving technique. Lists are defined in Chapter 8. This chapter
elaborates on the concepts of linked lists and their implementation techniques using both arrays and pointers.
Linked list manipulation and list searching are also covered in the chapter. A word indexing program is
explained and presented as an application in both array and linked versions. Chapter 9 brings forth
different variants of linked lists. Linked implementations of stacks and queues are presented in this chapter.
It also focuses on major application areas of linked lists. Chapter 10 discusses details of internal sorting
as well as some external sorting algorithms. Given a large number of internal sorting techniques, one
must choose the best alternative for a particular problem.
Chapter 11 deals with various searching methodologies. The concept of trees is introduced in
Chapter 12. It starts with the general tree and then concentrates on binary trees, tree traversal techniques,
binary search trees, AVL trees, and B-trees. Chapter 13 describes graphs. The coverage of graph algorithms
completes the basic understanding of data structure. The chapter discusses only fundamental graph
algorithms.
It is possible that C language will be replaced by a better language in the near future. However, we
think our techniques will remain with our readers.
We acknowledge the help of Dr. S. SenSarma, Reader of the Department of Computer Science and
Engineering, University of Calcutta, for taking an active interest in different ways. We also thank our
colleagues, students, and other members of the Department of Computer Science and Engineering,
University of Calcutta, for providing the right environment during the preparation of the manuscript.
We thank the members of our family without whose help this book could not have been written.
3. ARRAYS
3.1 Linear Arrays
3.2 Arrays in C
3.3 Initializing Arrays
3.4 Insertion and Deletion
3.5 Multidimensional Arrays
3.6 Row-major and Column-major Order
Exercises
5. POINTERS
5.1 Introduction
5.2 Fundamentals and Defining Pointers
5.3 Type Specifiers and Scalars for Pointers
5.4 Operations Using Pointers
5.5 Passing Pointers to Functions
5.6 Pointers and Arrays, Pointer Arithmetic
5.7 Pointers and Two-dimensional Arrays
5.8 Array of Pointers
5.9 Pointers to Pointers
5.10 Pointers to Functions
5.11 Command Line Arguments
Exercises
7. RECURSION
7.1 Basic Concepts of Recursion
7.2 Recursion Implementation
7.3 The Tower of Hanoi
7.4 Time and Space Requirements
7.5 Recursion vs Iteration
7.6 Examples
7.7 Cost of Recursion
Exercises
8. LISTS
8.1 Sequential Lists
8.2 Linked Lists
10. SORTING
10.1 Introduction
10.2 Sorting Techniques
10.3 Sorting on Multiple Keys
Exercises
12. TREES
12.1 Fundamental Terminologies
12.2 Binary Trees
12.3 Traversals of Binary Tree
12.4 Threaded Binary Tree
12.5 Binary Search Trees
12.6 AVL Trees
12.7 B-Trees
Exercises
INDEX
FUNDAMENTALS OF DATA REPRESENTATION
Data structure is the study of concrete implementations of frequently occurring abstract data
types. An abstract data type is a set, together with a collection of operations on the elements of
the set. There are several terms we need to define carefully before we proceed to different types
of data structures such as arrays, stacks, linked lists, and so on.
The meaning of data representation is introduced in Section 1.1. In Section 1.2 there are
definitions of data types, data object, and data structure. The notion of abstraction is very im-
portant in computing. We are particularly interested in its application to data stored in a digital
computer. In Section 1.3 we will introduce the concept of data abstraction and abstract data
types.
A data type is an abstract concept defined by a set of logical properties. Once such an
abstract data type is defined, it is important to know how to implement it in a machine. Section
1.4 describes the system-defined data types. Section 1.5 will highlight the concepts of primitive
data structure. C language is used in this book since it is used globally and continues to grow in
popularity.
DATA TYPE
A data type is a collection of values along with a set of operations defined on those values. The
essence of a type is that it attempts to identify qualities common to a group of individuals or
objects that distinguish it as an identifiable class or kind. In a programming language, the data
type of a variable is the set of values that the variable may assume. The basic data types vary
from language to language.
Let us look at two classes of data types. A simple, or basic, data type is made up of values
that cannot be decomposed. In C, they are int (for integers), float (for reals), char (for
characters), and so on. A composite data type, also called a data structure, is one in which the
elements of the data type can be decomposed into either simple data types or other composite data
types. Examples of composite types include the familiar array and structure in the C language. In
a data structure the values of the data type are decomposable, and we must therefore be aware of
their internal construction. There are two essential ingredients to any object that can be
decomposed: it must have component elements, and it must have structure, the rules for relating or
fitting the elements together.
The operations of a structured data type might not only act on the values of the data type,
they might also act on component elements of the data structure. We now present the formal
definitions of some terms that must be known to the readers.
In a programming language, a data type is a term that refers to the nature of the data which
variables hold. In C the data types are int, float, char, short, unsigned, double,
long, and so on. These are built-in data types, and typedef in C can be used to give names to new
data types built from them.
Data object refers to a set of elements, say F. For example, the data object 'float' refers to
F = {0, ±.5, ±.6, ...}.
Similarly, the data type 'int' in the C language refers to the data object integers, that is, a variable
of int data type can hold only integer-type data objects.
We are not only interested in the content of data objects but we also need to know the
way they are related. A data structure is a data type whose values are composed of component
elements that are related by some structure. Since a data structure is a data type, it must have a
set of operations on its values. Further, there may be operations that act on its component
elements. We can write programs using the operations defined on the data and its structure. We
imagine an abstract data type in our program and we can do so, being concerned with neither
how the data will be represented in the computer nor the details of the code that implements the
operations.
In the next section, we will present data abstraction and abstract data types.
structure. Thus we attempt to bring the power of abstraction to bear on the study of data struc-
tures. In order to do that, we provide a template to view and discuss each data structure. This
template is called an abstract data structure and consists of three basic components: (i) specifi-
cation, (ii) representation, and (iii) implementation.
Our approach throughout the book is to implement data structures using modules. These
modules here act as black boxes. The user has no direct access to the data structure since the
data structure and the algorithms are encapsulated within the module. The integrity of the data
structure is protected because the user gets control over it through operations that are sepa-
rately specified and implemented. The implementation is done very carefully and in such a way
as to assure preservation of the integrity of the data structure. This is an advantage to users for
designing a software system.
Another advantage is maintainability. Implementation independence frees the user from
non-functional details. The implementation may be changed with no effect on the way in which
the program executes. It may, however, affect the performance, that is, time, space, and main-
tainability. For example, if we change the basic technique used to perform an operation—with-
out changing the operation performed—then the user may see a change in the performance for
the module containing that operation but will not see any change in the results produced. The
user is protected from changes in the way in which operations are implemented.
If an abstract data structure is to be of more than mere theoretical interest, it must be
implemented. The user still deals with the abstract conception of the structure; indeed,
the purpose of abstract data structuring is to guarantee that the user need deal with no more than
the abstraction. The implementor, however, must face the problems of representation and implementation.
As stated earlier, the abstraction can be treated as the functional specification of a black box.
The implementor must design the box in such a way that memory space is not wasted and the
operations are performed simply and efficiently. The implementor must be familiar with the
physical data type and the virtual data type. We often implement our data type using some
high-level language. For example, in the C language we might define the variables A, B, and C as integer,
real, and character data types:
int A;
float B;
char C[100];
We call the above virtual data types. Eventually any structure is stored in physical
memory to be operated on by a physical machine, that is, a computer. The actual physical operations
that the machine can perform are limited to those in its machine language. We will call a data
type at this level a physical data type. Thus abstract data types are implemented with virtual
data types. Virtual data types are translated into physical data types.
In summary, we have understood the basic idea of an abstract data type. Many different
modules can be written that implement the same abstract type. Advantages of abstract data
types are highlighted.
SYSTEM-DEFINED DATA TYPE
In defining an abstract data type as a mathematical concept, we do not consider the
implementation issue. Often no implementation, hardware or software, can model a mathematical
concept completely. For example, an arbitrarily large integer cannot be represented due to the finite
size of the machine's memory. Thus, it is not the data type 'integer' that is represented by the
hardware but rather the data type 'integer between a and b', where a and b are the minimum
and maximum integers representable by that machine. Once a representation has been chosen
for objects of a particular data type and routines have been written to operate on those
representations, the programmer is free to use that data type to solve a problem. The programmer need
not worry about how the computer is designed and what circuitry is used to execute each
instruction. The programmer needs to know only what instructions are available and how those
instructions can be used. The programmer must know about the data types which are available
in the system.
A data type is an abstract concept defined by a set of logical properties. We can define a limitless
number of data types but system considerations are necessary before implementing the data
types. A hardware implementation needs proper circuitry to perform the requisite
operations, and a software implementation includes specifications of how such data types are to be
manipulated.
Every computer system has a set of 'native' data types. It is the programmer's responsi-
bility to know what data types are available in the system and how they are stored in memory.
C has the 'usual' simple data types: characters, integers, and numbers with fractional
components. In addition, C allows variants of some of these types. Simple types include char,
int, float, and double. These types differ in the sort of information they contain and in
the amount of storage space allocated to them on different systems, ranging from 1 to 8 bytes.
C's character data type is known as char. C allocates 1 byte for storing a character, so we
can store at most 256 values. The integer data type, int, is used to represent whole numbers
within a specified range of values. On a 16-bit implementation, variables of type int are stored in 2 bytes
and range from -32,768 to 32,767; many current implementations use 4 bytes. Internal representation
of a whole number can be treated as a character or an integer.
The real numbers are represented by the data type float. C typically allocates 4 bytes to
represent variables of floating type. The double type is for double-precision floating-point numbers.
For doubles, C typically allocates 8 bytes as storage space.
In addition to simple types, C provides other types, which are variations of char and
int. These types essentially request a different amount of memory to be allocated for storing a
value. Signed types, where numbers can be positive or negative, are standard in most languages.
C also allows us to declare integers as unsigned, so the sign bit is used as part of the number
rather than as a sign indicator. short and long are requests for versions of the int type for
which different amounts of storage may be allocated. The amount of space allocated for these
variants depends on the implementation. For example, short typically reserves 2 bytes while long requires
at least 4 bytes as storage space. Finally, to represent a number as a hexadecimal constant, 0x or 0X is
placed before the hexadecimal representation, and octal integers start with the digit 0.
in a class, the number of passengers in a train, and so on, are all information items expressible as
integers. Integers are represented in signed magnitude form, signed 1's complement form, and
signed 2's complement form. Although the first form is probably the simplest in concept, other
methods such as 2's complement representation are used in modern computer systems to
simplify the design of computer circuitry. The data type integer provided by C can be viewed as
an abstract data type whose specification is as follows.
(a) It reserves 2 bytes for int, unsigned int, and short int on a typical 16-bit implementation;
the exact sizes are implementation dependent.
(b) The elements are the whole numbers from -maxinteger to maxinteger. The value of
maxinteger is implementation dependent.
(c) Integers are both ordered and linear.
(d) The set of operations is implementation dependent.
Real numbers can be represented by either fixed-point representation, such as 15.75, or
floating-point representation, such as 0.1575 × 10^2. Floating-point representation is the most
common storage structure used for real data. In this representation, the real number is expressed by
a mantissa and an exponent. For example, 0.1575 × 10^2 is expressed with a mantissa 0.1575 and
exponent 2, with a radix 10. The radix and the number of digit positions represented with a
floating-point format vary from one computer to another. Floating-point reals in C can be viewed as
an abstract data type with the following properties.
(a) The values are a finite subset of the real numbers. The actual subset is imple-
mentation dependent.
(b) The structure is ordered.
(c) Typical operations are assignment, arithmetic, relational, and so on.
(d) Four to eight bytes are required to store real numbers.
Usually, the sign is the first bit, that is, the MSB (most significant bit) in a floating-point
representation, and by convention 0 denotes a positive number and 1 denotes a negative
number (this is also true in the case of integer-number representation). The biased exponent is an
expression of the exponent in a form of notation called excess notation. In a seven-bit field, we
can express non-negative integers in the range 0 to 127 (in excess notation), though we are
capable of representing integers in the range -64 to +63 in 2's complement representation. A
floating-point number with an exponent of -25 would have a characteristic of -25 + 64 = 39 in
excess-64 notation. Further, the mantissa part of a floating-point number is expressed in
normalized form, that is, there is no significant digit before the decimal point. For example,
0.4272 is a normalized mantissa whereas 4.272 is not in normalized form.
In addition to a floating-point representation of real numbers, a fixed-point storage
representation is also possible. Real numbers are then stored in a structure similar to that used for integer
numbers. In the C language both notations are available for input and output through different control parameters
(such as %f and %e).
A wide variety of character sets or alphabets are handled by most computers. The two
most widely used codes are the ASCII and the EBCDIC. Characters are used as a primitive data
structure since they are useful in expressing much of the non-numeric information which can be
processed by a computer. Each character is stored as a fixed number of bits in the computer's
memory. A common technique for storing characters in a computer's memory is to store each
character in 1 byte. One byte is a sequence of 8 bits. In many digital computers it is the smallest
unit of information that can be addressed directly. Such machines operate efficiently on indi-
vidual characters.
C allocates 1 byte for storing a character. This means that we can have at most
256 character values. Generally, the byte used to store a character variable is interpreted as
having values ranging from -128 to 127. Depending on the character encoding scheme used,
positive values of a character variable correspond to particular characters. For example, a
character value of 100 corresponds to the character 'd' in the ASCII character set used on most
systems. Negative values generally do not have a 'useful' interpretation. Also, integer numbers
and characters can be used interchangeably with the control parameters %d and %c.
A logical data item is a primitive data structure that can assume the value of either 'true'
or 'false'. C has three classes of operators: arithmetic, relational and logical, and bitwise
operators. The key to the concept of relational and logical operators is the idea of true and false. In C,
true is any value other than 0 and false is 0. Expressions that use relational or logical
operators return 1 for true and 0 for false. The three most common logical operators are
'AND' (&&), 'OR' (||), and 'NOT' (!). If A and B are logical variables, then A && B is true if A and
B both have the value true; otherwise, the result is false. The result of A || B is false if A and B
both have the value false; otherwise the result is true. !A is false if A is true, and !A is true if A is false.
Logical variables are often used to represent complex logical expressions and also as
terminating conditions in loop evaluation. Relational and logical operators always produce a result
that is either 0 or 1, whereas bitwise operators are used to change the values of variables, not to
evaluate true or false conditions.
The storage representation of logical values is dependent upon the compiler and the
machine for which the compiler is designed. One bit is sufficient to represent true or false, but
because of the difficulty most computers have in isolating a single bit, it is common to find an
entire byte used. Most computers cannot address a single bit in their memory but must address at
least a byte. Therefore, 8 bits at a time are fetched into the registers. The single bit representing
the boolean quantity would then have to be isolated by masking out all of the others. Most
designers have chosen to sacrifice some memory space to avoid this complexity and use the
whole byte to represent boolean quantities.
A pointer is a reference to a data structure. It is a word or portion of a word in memory,
which, instead of containing data, contains the address of another word or byte. A pointer is a
single fixed-size data item and it provides a homogeneous method of referencing any data
structure. Pointers permit faster addition and deletion of elements to and from a data structure. In
terms of storage representation, addresses are generally assigned a word or half a word of
storage in most computers. Thus, the larger the number of addresses in the computer, the larger
the amount of storage needed to represent an address. In the next chapter we will describe
the fundamentals of data structures.
EXERCISES
1. What are the two basic data types? How are they defined in C language?
2. Define the term 'data object'. How is it different from abstract data type?
3. How many components are there in an abstract data structure? Explain the term 'maintainability'.
4. Why is system-defined data type different from primitive data types? Explain.
5. Describe the specifications of integer data type.
6. What are the different data types that are available in C language?
7. Are the following system-defined data types? Give reasons for your answers.
(a) Files
(b) Pointers
(c) Enumerated data types
8. Explain why it is not possible to support opaque types in C.
9. Describe one or two situations in your everyday life where you use the idea of abstraction
to simplify large tasks that you need to perform.
10. Describe the nature of pointer data type in C.
2. FUNDAMENTALS OF DATA STRUCTURES: BASIC CONCEPTS
Computer science is primarily concerned with the study of data structures and their
transformation by some techniques. The modern digital computer was invented and intended as a
system that should facilitate and speed up complicated and time-consuming computations. In
the majority of applications its capability to store and retrieve large amounts of information
plays a dominant role in processing information. The information which is available to the
computer consists of a selected set of data relating to a real-world problem, and it is believed that the
desired results can be derived from that set of data. So it is desirable to understand the logical
relationships between the data items in the problem. The possible ways in which the data items
or atoms are logically related define data structures.
We assume that numbers are stored in an array X. We hope that these instructions are suffi-
ciently clear so that the reader grasps our intention.
Algorithm 2.1: Searching a maximum from an array X
Input: An array, X, with n elements.
Output: Finding the largest element, MAX, from the array X.
Step 1: Set MAX=0 /* Initial value of MAX */
Step 2: For j=1 to n do
Step 3:     If (X[j]>MAX) then MAX=X[j]
        end for
Step 4: Stop
Each algorithm in this book is given a number and a title. The title immediately follows
the algorithm number on the same line. Inputs and outputs are described next. The body of the
algorithm consists of a set of numbered steps (each preceded by the word 'Step'). Comments
(similar to C comments) may appear in steps of an algorithm to help the reader in
understanding the details. For example, the remark /* Initial value of MAX */ appears at Step 1. Different
constructs such as for-do-end, if-then, while-do, and so on are used in a manner very similar to a
pseudolanguage. It is important to emphasize that data structures are language independent.
Pseudocode is a general tool that allows notation similar to any high-level language.
An algorithm can be described in many ways. As described earlier, pseudocode can
be used to represent an algorithm. Another way we can express an algorithm is through a
graphical form of notation such as flowcharts. In the case of complex decisions, it is difficult to understand
the decisions either in flowcharts or through pseudocode. A decision table is an alternative
analysis tool for indicating complex relationships and solutions. In view of this, we start our
discussion with the basic concepts of flowcharting for expressing an algorithm.
2.3.1 Flowcharts
A flowchart is a pictorial representation of an algorithm. It serves as a means of recording,
analyzing, and communicating problem information. Programmers often use a flowchart
before writing a program. It is not always mandatory to draw a flowchart. In practice, sometimes,
drawing of the flowchart and writing of code in a high-level language go side by side.
Two kinds of flowcharts are used: the program flowchart and the system flowchart. A program
flowchart (also called a flowchart) shows the detailed processing steps within one computer
program and the sequence in which those steps must be executed. Different symbols are used in
a flowchart to denote the different operations that take place in a program. The terminal symbol
(a rounded box) shows clearly the beginning and ending of the program. The input/output symbol
(a parallelogram) denotes the input/output operation. Any manipulating or processing of data within
the computer is expressed by the processing symbol (a rectangle). In a flowchart the decision symbol
(a diamond) is used to specify a conditional branch or decision-making step. Connector symbols
(small circles) are used in a flowchart to denote exit to or entry from another part of the flowchart.
A system flowchart shows the procedures involved in converting data on input media to
data in output form. Emphasis is placed on the data flow into or out of a computer program,
the forms of input and the forms of output. A system flowchart makes no attempt to depict the
function-oriented processing steps within a program. A system flowchart may be constructed
by the systems analyst as part of the problem definition. However, algorithms in data structures
are always expressed in the form of flowcharts.
A system flowchart for monthly billing is shown in Fig. 2.1(a). To emphasize the distinction
between a system flowchart and a flowchart, a flowchart showing the detailed processing steps
in the monthly billing program is given in Fig. 2.1(b).
In drawing flowcharts we direct our attention to the standard flowcharting symbols and
techniques recommended by the American National Standards Institute (ANSI) and its
international counterpart, the International Organization for Standardization (ISO). These symbols are
used throughout the book and are summarized in Fig. 2.2.
The program flowchart in Fig. 2.1 has one serious drawback; it shows how to compute
the monthly statement for only one customer. Generally, a computer program is written to
perform a particular operation or sequence of operations many times. To provide for this, a
program flowchart can be made to curve back on itself, that is, a sequence of processing steps
can be executed repeatedly on a different set of data. In effect, a program loop is formed. We
now present the modified flowcharts in Fig. 2.3.
begin
    read a salesperson payroll record
    do while there is more data
        multiply sales by commission rate
        if sales is greater than quota
            then add 10% bonus to commission
        endif
        add commission to salary
        write a report line
        read a salesperson payroll record
    enddo
end
Certain words in a pseudocode are significant. 'Input' or 'read' a record means that data
is made available to the computer for processing. The input data is generally in the form of a
record, in which several fields of data pertaining to a person or thing are given as one-line
input or one item. If the data pertained to employee records, a record might contain the
employee identification number, the department to which that employee is assigned, the number
of hours worked, the rate per hour, and the tax deduction.
The word 'set' or 'assign' is often used in a pseudocode to initialize values to a desired
amount. The word 'if' in the pseudocode indicates comparison between two items. Sometimes
the words add, subtract, multiply, or divide appear in a pseudocode, but this is often the choice
of the programmer. Another word used in the pseudocode is 'print' or 'write'. It indicates that
data is to be prepared as output on the printer. Other words, such as do-enddo and dowhile-endwhile,
are also used. We will now illustrate examples of pseudocode.
Example 2.1
begin
    do
        read a record of three numbers
        print elements in record
        compute sum of elements
        print sum
    enddo
end
Example 2.2
begin
    read a record: hours worked, rate, tax
    multiply rate by hours worked and set it to gross pay
    compute net pay = gross pay - tax
    write hours worked, rate, tax, gross pay, net pay
end
We now present another pseudocode, for insertion sort, with a procedure insertion sort. It
takes as a parameter an array A[1]..A[n] containing a sequence of length n which is to be sorted.
Insertion sort works the way many people sort cards. We start with an empty left hand and
cards face down on the table. We then remove one card at a time from the table and insert it into
the correct position in the left hand. To find the correct position for a card, we compare it with
each of the cards already in the hand from right to left.
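Insertion sort
procedure
    J <- 2
    do while J <= n
        Key <- A[J]
        i <- J - 1
        do while i > 0 and A[i] > Key
            A[i + 1] <- A[i]
            i <- i - 1
        enddo
        A[i + 1] <- Key
        J <- J + 1
    enddo
    Return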
The above insertion sort procedure can be graphically described by the flowchart shown
in Fig. 2.5.
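The card-sorting idea described above can also be sketched in C. This is a minimal illustration, not the book's listing: the function name insertion_sort is an assumption, and the array is indexed from 0 as C requires rather than from 1 as in the pseudocode.

```c
#include <stddef.h>

/* Sort a[0..n-1] into nondecreasing order by straight insertion.
   The name insertion_sort is hypothetical, chosen for this sketch. */
void insertion_sort(int a[], size_t n)
{
    for (size_t j = 1; j < n; j++) {
        int key = a[j];             /* the "card" picked up from the table */
        size_t i = j;
        /* Compare with the cards already in hand, from right to left,
           shifting larger ones one place to the right. */
        while (i > 0 && a[i - 1] > key) {
            a[i] = a[i - 1];
            i--;
        }
        a[i] = key;                 /* insert the card at its position */
    }
}
```

A caller would pass the array and its length, for example insertion_sort(v, 6) for a six-element array v.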
In the above example, three conditions are identified but any one of the three conditions
will produce a certain action. If an employee worked less than 10 hours, his or her name should
be deleted from the payroll list, irrespective of whether or not he or she was late more than once
or absent on the day before a holiday, and so on. The decision table is shown in Fig. 2.7. For each
of the three possible conditions, the condition entry in a column may be Y for 'yes' or N for 'no'.
The action entry contains an 'X' for the satisfying conditions.
                                          Rules
Condition Stub                      1  2  3  4  5  6  7  8
1. He or she worked less than       Y  Y  Y  Y  N  N  N  N
   10 hours
2. He or she was late more          Y  Y  N  N  Y  Y  N  N
   than once
3. He or she was absent before      Y  N  Y  N  Y  N  Y  N
   a holiday
Action Stub                         Action Entries
Fig. 2.8 Flowchart for the heavier coin among coins a, b, c, d, e and f
for corrections as it is developed, instead of having to wait until the entire program is com-
pletely coded. As program units get larger, debugging time increases drastically, eventually
becoming the dominant step in the entire programming project. It is definitely to our advantage
to debug and test a program as a collection of small units rather than one large one.
The important point is that having modularised the problem and developed it in a top-
down fashion, we have a choice of ways to approach the implementation. As we work on one
unit, we know what unit we should work on next and which ones can be effectively postponed.
Without this plan of action, we might write code in some less logical order and thus not have
pieces that work together or that can be tested as a unit. The top-down design method, which
develops as a hierarchical set of tasks, defines a natural set of small sub-units that can be indi-
vidually tested, verified, and integrated into the overall solution. In the next subsection we
describe the top-down design method.
We consider a program X that can be subdivided into three independent submodules, X1,
X2, and X3. These submodules X1, X2, and X3 are further broken down into independent
submodules, and so on. This process is repeated until we obtain modules which are small enough
to be understood and coded quite easily. We can represent it pictorially as shown in Fig. 2.10.
Top-down design is not the one-step process shown in Fig. 2.10(a). The decomposition
and simplification of a task is performed over and over, first on the original problem and then on
successive sub-units, until finally we are left with tasks so elementary that they need not be
simplified any further. The repeated decomposition and simplification of a task into a collection
of simpler sub-tasks is called stepwise refinement. Each of the tasks X in Fig. 2.10(b) represents
a separate program unit needed to solve the original problem.
The data structures needed for the solution are also developed in a top-down fashion as
we proceed from general descriptions of abstract data types to operations performed on these
data types to their internal implementation. Next we will decide on the internal structure for
our abstract data types and write the internal module that implements the operations. When we
have finished refining our program units and abstract data types, we will have created a
large number of procedures and data structures, defined in terms of the information they
contain and the operations that can be performed on them. Descriptions of these modules and
data structures constitute a major component of the program design document.
Abstraction refers to dealing with an operation from a high-level viewpoint, disregarding
its detailed structure. This helps to hide the design details of the lower-level modules from
the higher-level ones. Only the data and control are specified for communication between the
higher- and lower-level modules.
Procedural abstraction and data abstraction are the fundamental tools for managing the
implementation of large programs; both are used in the top-down design method. With
procedural abstraction we initially think only about the highest-level functions and procedures
needed to solve the problem. The specification of low-level routines is postponed until later.
Each successive refinement of the design adds detail to the developing solution, so the
complexity level increases as we proceed through the design process.
With data abstraction, we initially view a data structure in terms of the external interface
it displays to the user, that is, the operations that can be performed on that structure. Only later
do we begin to concern ourselves with the underlying details of the implementation of that data
type in a given programming language.
Modern program design techniques focus on stepwise refinement. Stepwise refinement
produces software design in a top-down manner. Stepwise refinement is an iterative process.
At each step, the problem to be solved is decomposed into subproblems that are solved
separately. Thus, if P is the statement of the original problem, P1, P2, ..., Pn are the statements
of the sub-problems to be solved iteratively. The following is a description of the
sort-by-straight-selection algorithm by stepwise refinement.
Example 2.3
Step 1
    Let n be the length of the array A to be sorted
    i = 1
    while i < n do
        Place the smallest element at position i
        i = i + 1
    endwhile
Step 2
    Let n be the length of the array A to be sorted
    i = 1
    while i < n do
        j = n
        while j > i do
            if (A[i] > A[j])
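To show where the refinement is heading, here is a small C sketch of straight selection sort. It is an illustration under stated assumptions rather than the book's final refinement: the function name selection_sort is hypothetical, indexing starts at 0, and the smallest remaining element is located first and swapped once per pass.

```c
#include <stddef.h>

/* Straight selection sort of a[0..n-1] (hypothetical name, 0-based). */
void selection_sort(int a[], size_t n)
{
    for (size_t i = 0; i + 1 < n; i++) {
        /* "Place the smallest element at position i" */
        size_t min = i;
        for (size_t j = i + 1; j < n; j++)
            if (a[j] < a[min])
                min = j;
        int t = a[i];               /* swap a[i] with the smallest found */
        a[i] = a[min];
        a[min] = t;
    }
}
```

The outer loop corresponds to Step 1; the inner loop is one possible refinement of "place the smallest element at position i".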
According to a bottom-up strategy, the design process consists of defining modules that
can be iteratively combined together to form sub-systems. This is typical in the case where we
are reusing modules from a library to build a new system, instead of building such a system
from scratch.
Information hiding proceeds mainly bottom-up. It suggests that we should first recognise
what we wish to encapsulate within a module and then provide an abstract interface to define
the module boundaries. The decision of what to hide inside a module may depend on the result
of some top-down design activity.
Therefore, the programmer might choose to solve disjoint parts of the problem directly in a
particular programming language and then combine these parts into a complete program. In
contrast to bottom-up design, a top-down design technique decomposes a problem into logical
subtasks and each subtask is further decomposed until all the tasks are expressed through a
programming language. The advantages that accrue from a top-down design strategy include
not only the management of complexity but also an improved ability to test, validate, and
maintain the software that is ultimately produced.
ANALYSIS OF ALGORITHMS
The analysis of algorithms is a critically important issue in computer science. The data
structures that we will be discussing in this book (i.e. array, stack, queue, etc.) are introduced
not just because they are mathematically interesting but because we claim that they allow us to
develop more efficient algorithms for such tasks as insertion, deletion, searching, sorting,
pattern matching, and others.
Analysing an algorithm has come to mean predicting the resources that the algorithm
requires. We may be concerned with resources such as memory, communication bandwidth,
or gates, but most often it is computational time that we want to measure. Analysing even a
simple algorithm can be a challenge. Suppose we want to execute the statement x = x + 1 in a
C program. We must consider how much time will be required to execute the statement once and
how many times it will be executed. The product of these two quantities will be the total time
taken by the statement. The second quantity is called the frequency count, and it may vary from
one data set to another. It is impossible to determine the time for a single execution exactly
unless we have knowledge about the machine structure, machine cycle time, and the speed of the
translator. In this book we will therefore concentrate on developing only the frequency count
for the statements. In our analysis we want to find the order of magnitude of an algorithm. This
means determining only those statements which may have the greatest frequency count.
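The idea of a frequency count can be made concrete with a short C sketch. The three function names below are assumptions made for this illustration. Each function starts with x equal to 0 and executes x = x + 1 in a different setting, so the value returned is exactly the frequency count of that statement.

```c
/* Frequency count of the statement x = x + 1 in three settings.
   Because x starts at 0, its final value equals the number of
   times the statement was executed. Names are hypothetical. */

long single_statement(void)
{
    long x = 0;
    x = x + 1;                          /* executed once */
    return x;
}

long single_loop(int n)
{
    long x = 0;
    for (int i = 1; i <= n; i++)
        x = x + 1;                      /* executed n times */
    return x;
}

long nested_loops(int n)
{
    long x = 0;
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= n; j++)
            x = x + 1;                  /* executed n * n times */
    return x;
}
```

The counts 1, n, and n * n are of different orders of magnitude; in a program containing all three, the nested loop would dominate the analysis.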
In the case of a sorting method, we looked at both the best case, in which the input array is
already sorted, and the worst case, in which the input array is reverse sorted. The worst-case
running time of an algorithm is an upper bound on the running time for any input. The average
case is often roughly as bad as the worst case. Suppose we randomly choose n numbers and apply
insertion sort. If we work out the resulting average-case running time, it turns out to be a
quadratic function (for example, $ax^2 + bx + c$, for constants a, b, and c) of the input size,
just like the worst-case running time.
When we look at input sizes large enough to make only the order of growth of the running
time relevant, we are studying the asymptotic analysis of algorithms. That is, we are concerned
with how the running time of an algorithm increases with the size of the input in the limit, as the
size of the input increases without bound. In the next subsection we discuss the asymptotic
analysis of an algorithm.
Note that $O(n \log n)$ is better than $O(n^2)$ but not as good as $O(n)$. Similarly, if an
algorithm takes time $O(\log n)$ it is faster, for sufficiently large n, than if it had taken $O(n)$.
For reasonably large problems we always want to select algorithms of the lowest order
possible. If algorithm A is $O[f(n)]$ and B is $O[g(n)]$, then algorithm A is of lower order than B if
$f(n) < g(n)$ for all n greater than some constant K. For example, $O(n^2)$ is lower order than $O(n^3)$
because $n^2 < n^3$ for all $n > 1$. Similarly, $O(n^3)$ is lower order than $O(2^n)$ since $n^3 < 2^n$ for all $n > 9$.
Thus we would definitely want to select an $O(n^2)$ algorithm to solve a problem, instead of
an $O(n^3)$ or $O(2^n)$ algorithm, if such an $O(n^2)$ algorithm existed.
Fig. 2.11 shows the behaviour for algorithms of order $O(2^n)$, $O(n^3/2)$, $O(5n^2)$, and $O(100n)$.
This type of analysis provides the general guidelines we need and thus O notation is the
fundamental technique for describing the efficiency properties of algorithms.
One important system of description used throughout the book is the use of notation of
the form $O[f(n)]$ to describe the time or space requirements of running algorithms. If n is a
parameter that characterizes the size of the input to a given algorithm, and if we say the algorithm
runs to completion in $O[f(n)]$ steps, we mean that the actual number of steps executed is no
more than a constant times $f(n)$ for sufficiently large n.
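The claim that $n^3 < 2^n$ holds for every n beyond some constant K can be checked numerically. The following C sketch does so for values of n up to a chosen limit. The function name and the limit parameter are assumptions of this example, and the limit must stay below 64 so that 2^n fits in an unsigned long long.

```c
/* Return the smallest n0 such that n*n*n < 2^n holds for every n
   with n0 <= n <= limit (requires 1 <= limit <= 62).
   Returns limit + 1 if the inequality fails at n = limit.
   Hypothetical helper written for this illustration. */
int cubic_below_exponential_from(int limit)
{
    int n0 = limit + 1;
    for (int n = limit; n >= 1; n--) {
        unsigned long long cube = (unsigned long long)n * n * n;
        unsigned long long pow2 = 1ULL << n;    /* 2 to the power n */
        if (cube < pow2)
            n0 = n;         /* inequality still holds going downwards */
        else
            break;          /* first failure below the run of successes */
    }
    return n0;
}
```

Scanning downward from limit stops at the first n where the inequality fails, so the result is the threshold above which the exponential stays ahead within the tested range.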
Structure relinquishes control to the next sequential statement. Again, there is one entry
point and one exit point from the structure.
Consider again Fig. 2.12. The entry point to Fig. 2.12 is a selection structure to evaluate
the condition. If the condition is false, an iteration structure is entered. If the condition is true, a
sequence structure is executed. Both the iteration and sequence structure meet at a single point,
which in turn becomes the exit point for the initial selection structure.
We have acquired some familiarity with the basic patterns, which are sufficient for any
program. Certain other combinations have proven to be especially useful. One is the
combination of simple sequence and do-while, known as the do-until control structure. Even though
the nested if-then-else structure seems to be useful for any number of conditions, it is difficult to
draw such a flowchart or write such pseudocode. The case structure is a generalization of the
nested if-then-else pattern for a large number of possible conditions. These two structures are
shown in Fig. 2.14.
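In C, these two structures correspond roughly to the do ... while statement and the switch statement. The sketch below is illustrative only. The function names and the grade labels are assumptions. Note that C's do ... while repeats while its condition is true, so a do-until loop is written with the condition negated.

```c
/* do-until: the body runs at least once and repeats until n == 0.
   In C this is written as do ... while (n != 0). */
int count_digits(unsigned int n)
{
    int digits = 0;
    do {
        digits++;
        n /= 10;
    } while (n != 0);
    return digits;
}

/* case structure: one of several alternatives is selected by value,
   replacing a chain of nested if-then-else tests. */
const char *grade_label(char grade)
{
    switch (grade) {
    case 'A': return "excellent";
    case 'B': return "good";
    case 'C': return "fair";
    default:  return "unknown";
    }
}
```

Because the test comes after the body, count_digits(0) still reports one digit, which is the characteristic behaviour of the do-until form.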
In this section, we have used ANSI standard flowcharting symbols to visualize the program
logic set up within the structured programming control structures.
Some feel that structured programming is self-explanatory and need not be accompanied by
flowcharts. These persons usually point out that a flowchart often fails to represent the current
status of a program. To many, the use of pseudocode is a viable alternative to flowcharting. It
permits the programmer to express required program logic unencumbered by programming
language rules and constraints.
After a structured program has been designed, it must be expressed in a programming
language which supports the structured construct. The primary language statements directly
analogous to the control structures we have described are available in a particular language. C
language supports these constructs. In the next chapter, array as a data structure will be intro-
duced.
EXERCISES
1. A function $e^x$ can be approximated by using the following formula:
$e^x = 1 + x + x^2/2! + x^3/3! + \dots + x^k/k!$
Write an algorithm for finding $e^x$, where x is given as input.
2. What is the smallest value of n such that an algorithm whose running time is $100n^2$ runs
faster than an algorithm whose running time is $2^n$ on the same machine?
3. The most common computing times for algorithms are:
$O(1)$
$O(\log_2 n)$
$O(n)$
$O(n \log_2 n)$
$O(n^2)$
$O(n^3)$
$O(2^n)$
Draw a graph for their time complexities in the range $1 < n < 128$ and compare their rate of
growth.
4. Compare the two functions $n^2$ and $2^n/4$ for various values of n. Determine when the second
becomes larger than the first.
5. Given n, a positive integer, determine if n is the sum of all of its divisors; that is, if n is the
sum of all t such that $1 \le t < n$ and t divides n. Draw a flowchart for this problem.
6. Write pseudocode for the insertion sort to sort into nonincreasing instead of nondecreasing
order.
7. What are the main advantages of a decision table over a flowchart?
8. Discuss the merits and demerits of top-down and bottom-up approaches to algorithm de-
sign.
9. Imagine that we have an algorithm p whose time complexity we have analyzed and found
to be $O[\log_2(\log_2 n)]$. What would be the position of that complexity function in the ordered
list of functions given in Exercise 3?
10. Can we write an $O(1)$ algorithm to determine n! for $1 < n < 15$?
3
ARRAYS
In this chapter we will be concerned with non-primitive data structures that are linear. A num-
ber of possible storage representations for these linear structures are available. We will concen-
trate here on array structures only. Others will be discussed in the succeeding chapters.
For example, to create an array called 'A' with ten integer elements, we can declare it as follows.
int A [10];
The indexing of array elements always starts at 0. Thus the above array reserves memory
locations that are referred to by A[0],A[1] ,...,A[9] . This is one of the characteristics of the
C language.
A typical array declaration allocates memory starting from a base address. The array
name is in effect a pointer constant to this base address. To store the elements of the array, the
compiler assigns an appropriate amount of memory starting from the base address. The elements
of an array are accessed using a subscript, also called an index. We can write A[i] to access an
element of the array. More generally, we may write A[expr], where expr is an integral
expression, to access an element of the array. The value of the subscript must lie in the range 0 to
size-1 if the declaration of the array is of the form A[size]. C does not check array subscripts:
a subscript value outside this range is not reported, and the program may silently read or
overwrite unrelated memory, produce wrong results, or crash. This is a common and serious
programming error which must always be avoided.
Arrays of all types are possible, including arrays of arrays. Strings are just arrays of
characters, but they are sufficiently important to be treated separately. We may have an array of any
type: int, char, float, double, another array, a structure, a union, or a pointer to any
type. The size of the array must be a constant. The following example will illustrate the idea:
#define SIZE 25
int A[SIZE + 1];
To illustrate the concept of an array, let us write a program that fills an array, prints out
values, and sums the elements of the array.
Example 3.1
#include <stdio.h>
#define N 5                        /* Size of array */
int main(void)
{
    int A[N];                      /* Space for A[0] ... A[4] is allocated */
    int i, sum = 0;
    for (i = 0; i < N; i++)        /* Initialize the array */
        A[i] = i * i;
    for (i = 0; i < N; i++)        /* Display array elements */
        printf(" A[%d]=%d", i, A[i]);
    for (i = 0; i < N; ++i)        /* Find total */
        sum += A[i];
    printf("\n sum = %d\n", sum);
    return 0;
}
Output: The output of this program is
 A[0]=0 A[1]=1 A[2]=4 A[3]=9 A[4]=16
 sum = 30
Consider an election contested by four candidates. Let us write a program to read the
ballots and count the votes cast for each candidate. Assume that the candidates are numbered
0 to 3 and each ballot is presented as a line of input containing one of these numbers. Assume
that there are n ballot papers.
Example 3.2
}
printf("Candidate-1 %d\n", count0);
printf("Candidate-2 %d\n", count1);
printf("Candidate-3 %d\n", count2);
printf("Candidate-4 %d\n", count3);
printf("Invalid votes %d\n", invalid);
return;
}
You might feel that this solution is rather clumsy. Imagine how much worse it would be
if the program were modified to handle ten or fifty candidates. We observe that the variables
count0, count1, count2, count3, and so on are used in exactly the same way, and this gives
us a clue to a better solution. Let us rename the variable count0 as count[0], count1
as count[1], and so on. The variables now form an array with the single name count. Thus
we can rewrite the program as in Example 3.3.
Example 3.3: The modified program is listed below.
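The array-based idea can be sketched as follows. This is a minimal illustration under stated assumptions and is not the book's listing: the function name tally and the macro CANDIDATES are hypothetical, and the ballots are taken from an array rather than read from input so that the counting logic stands on its own.

```c
#define CANDIDATES 4    /* candidates are numbered 0 to CANDIDATES-1 */

/* Count the votes in ballots[0..n-1]. On return, count[c] holds the
   votes for candidate c; the return value is the number of invalid
   ballots. The ballot value itself is used as the array subscript. */
int tally(const int ballots[], int n, int count[CANDIDATES])
{
    int invalid = 0;
    for (int c = 0; c < CANDIDATES; c++)
        count[c] = 0;
    for (int i = 0; i < n; i++) {
        if (ballots[i] >= 0 && ballots[i] < CANDIDATES)
            count[ballots[i]]++;
        else
            invalid++;
    }
    return invalid;
}
```

Handling ten or fifty candidates now only requires changing CANDIDATES; no new variables or branches are needed.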
lowing definition
static int marks[4]={55,67,92,45};
creates and initializes the array marks as shown in Fig. 3.1 below.
marks 55 67 92 45
Note that we can omit the integer within the brackets if we define and initialize a
static or extern array. The compiler will allocate exactly as many cells as are needed to store the
initial values. Thus the above example can also be written as
static int marks[ ]={55,67,92,45};
If, in a definition, we specify fewer initial values than the number of cells, the cells
beginning with the first will be initialized with the given values. The rest of the cells will be
initialized to zero. It is an error to supply more initial values than there are cells. For example, the
following statement will create and initialize values as shown in Fig. 3.2.
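Both initialization rules can be observed directly in a short C sketch. The array partial and the two accessor functions are assumptions introduced for this illustration; they are not the statement the text refers to.

```c
#include <stddef.h>

static int marks[] = {55, 67, 92, 45};  /* size deduced from the list: 4 cells */
static int partial[6] = {10, 20};       /* hypothetical: last four cells become 0 */

/* Number of cells the compiler allocated for marks. */
size_t marks_cells(void)
{
    return sizeof marks / sizeof marks[0];
}

/* Value stored in cell i of partial (0 <= i < 6). */
int partial_cell(size_t i)
{
    return partial[i];
}
```

Here marks_cells() reports how many cells were reserved when the size is omitted, and partial_cell(i) shows that the cells without an initial value hold zero.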
A set of ten readings is stored in an array Number. Write a program to check whether
each reading is positive or negative. Add all positive readings and store the result in the
variable Sum.
Example 3.4
#include <stdio.h>
int main(void)
{
    static int Number[10] = {50, -30, 20, 70, -25, 19, 18, 78, -225, 719};
    int Sum = 0, i, j = 0;
    for (i = 0; i < 10; i++)
        if (Number[i] > 0)
        {
            Sum += Number[i];      /* add each positive reading */
            j++;                   /* count the positive readings */
        }
    printf("Total positive numbers: %d \t Sum: %d", j, Sum);
    return 0;
}
The symbolic name 'Number' here denotes the name of the array. The array is initialized
with ten values. The 'for' loop contains an 'if' statement to test for the positive values in the
array. Finally the 'Sum' will hold result and j will store the total number of positive values in the
given array.
Linear[i+1] = Linear[i];   /* shift one cell towards the end */
i--;                       /* move towards the front so no value is overwritten */
}
Linear[K] = Item;
N++;
/* Print the array */
for (i = 0; i <= N; i++)
printf("\n Linear[%d] = %d", i, Linear[i]);
}
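Since only the tail of the listing survives here, the following self-contained C sketch shows one way to insert an item into a linear array. The function name insert_at and its parameters are assumptions. It treats *n as the number of elements in use and capacity as the declared size of the array.

```c
/* Insert item at index k of a[0..*n-1], shifting later elements one
   cell towards the end. Returns 1 on success, 0 if the array is full
   or k is out of range. Hypothetical helper for illustration. */
int insert_at(int a[], int *n, int capacity, int k, int item)
{
    if (*n >= capacity || k < 0 || k > *n)
        return 0;
    /* Start from the last element and work backwards, so each value
       is copied before its cell is overwritten. */
    for (int i = *n - 1; i >= k; i--)
        a[i + 1] = a[i];
    a[k] = item;
    (*n)++;
    return 1;
}
```

Shifting from the back is essential: copying forwards from index k would duplicate a[k] into every later cell.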
Example 3.6
Example 3.7: Suppose an array of 10 integer values is given. Find the content of the array on
execution of the following program.
int A[10], i;
for (i = 0; i < 10; i++)
scanf("%d", &A[i]);
for (i = 0; i < 9; i++)
A[i + 1] = A[i];
for (i = 0; i < 10; i++)
printf("\n A[%d] = %d", i, A[i]);
return;
}
The arrays discussed so far are called one-dimensional arrays, since each element in the array is
referenced by a single subscript. A vector in mathematics can be represented by a one-dimen-
sional array, and a matrix by a two-dimensional array. Most programming languages allow
two-dimensional and three-dimensional arrays by two and three subscripts. In C language we
can use the higher number of dimensions for an array. The general form of a two-dimensional
array declaration in C is as follows:
type_specifier Variable_name[Rows] [Columns];
For example, the following is a two-dimensional array A of integer type with m rows and
n columns.
int A[m][n];
It is assumed here that m and n have already been defined with the #define compiler
directive. Each element is specified by a pair of integers (such as j, k) called subscripts, with the
property that
$0 \le j \le m - 1$ and $0 \le k \le n - 1$
There is a standard way of representing a m x n two-dimensional array A where the
elements of A form a rectangular array with m number of rows and n number of columns.
The element A[j] [k] appears in row j and column k. Fig. 3.3 shows the array A which has
3 rows and 4 columns.
                    Columns
            0        1        2        3
Rows   0  A[0][0]  A[0][1]  A[0][2]  A[0][3]
       1  A[1][0]  A[1][1]  A[1][2]  A[1][3]
       2  A[2][0]  A[2][1]  A[2][2]  A[2][3]
We illustrate the procedure for reading and writing a two-dimensional array with the
following example program given in Example 3.8.
Example 3.8
#include <stdio.h>
#define ROW 5
#define COLUMN 6
int main(void)
{
    int A[ROW][COLUMN], i, j;
    /* Read the array A */
    for (i = 0; i < ROW; i++)
        for (j = 0; j < COLUMN; j++)
            scanf("%d", &A[i][j]);
    /* Print the array elements */
    for (i = 0; i < ROW; i++)
    {
        for (j = 0; j < COLUMN; j++)
            printf(" A[%d][%d]=%d", i, j, A[i][j]);
        printf("\n");
    }
    return 0;
}
As mentioned earlier, C language allows arrays with more than two dimensions. For
example, a three-dimensional integer array is defined as
int x[3][2][5];
An array element x[1][0][3] specifies the second plane, the first row, and the fourth column.
For example, an array of marks of a student in a particular class might be
indexed as
int A[class_no][roll_no][marks];
If we want to refer to an element of the array A, three subscripts are required. The
number of elements in an array is the product of the ranges of all its dimensions. The array x
contains 3x2x5 = 30 elements. In general, if we multiply together the numbers within the brackets
in the definition of an array, we obtain the total number of memory locations necessary
for storing all the elements.
In general, an n-dimensional array A is denoted by A[S1][S2]...[Sn] and its subscript limits by
$0 \le S_1 \le U_1,\ 0 \le S_2 \le U_2,\ \dots,\ 0 \le S_n \le U_n$, where $U_1$, $U_2$, and so on indicate upper
bounds on the subscripts. The array will be stored in memory in a sequence of memory locations.
Specifically, the programming languages will store the array A either in row-major order or in
column-major order. In row-major order, the last subscript changes first, the next-to-last
subscript second, and so on. In case of column-major order, the first subscript varies most rapidly,
then the second subscript, and so on. Suppose that B is an n-dimensional array declared by:
int B[I1][I2] ... [In];
where I1, I2, and so on are declared as the ranges of the first dimension, second dimension, and so
on. base(B) is the address of the first element of the array B, that is, B[0][0]...[0]. Assume
that the array B is stored in row-major order and each element of B reserves m bytes. Thus the
address of B[i1][i2]...[in] is written as follows:
$\mathrm{base}(B) + m \times [\, i_1 I_2 I_3 \cdots I_n + i_2 I_3 \cdots I_n + \dots + i_{n-1} I_n + i_n \,]$
The number of elements in the array B is the product $I_1 \times I_2 \times \dots \times I_n$. For example, an
array X with seven subscripts is declared as
int X[7][15][3][5][8][2][2];
The number of elements array X contains is 7x15x3x5x8x2x2 = 50,400. If an integer reserves
two bytes, as the text assumes (the size of an int depends on the compiler), the array X occupies
50,400x2 = 100,800 bytes.
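The row-major address calculation can be sketched in C as a function that returns the offset in elements; multiplying by the element size m and adding base(B) gives the address. The function name row_major_offset and its parameters are assumptions of this sketch. The loop evaluates the sum of products in nested (Horner) form, which yields the same value with fewer multiplications.

```c
#include <stddef.h>

/* Offset, in elements, of B[index[0]][index[1]]...[index[n-1]] in an
   array whose dimension ranges are extents[0..n-1], stored in
   row-major order. Hypothetical helper for illustration.
   Computes i1*I2*...*In + i2*I3*...*In + ... + in. */
size_t row_major_offset(const size_t extents[], const size_t index[],
                        size_t n)
{
    size_t offset = 0;
    for (size_t d = 0; d < n; d++)
        offset = offset * extents[d] + index[d];
    return offset;
}
```

For an array declared as int B[3][2][5], the element B[1][0][3] lies (1*2 + 0)*5 + 3 = 13 elements past B[0][0][0], which matches the layout a C compiler uses.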
A  (1,1) (1,2) (1,3) (1,4) (2,1) (2,2) (2,3) (2,4) (3,1) (3,2) (3,3) (3,4)
     1     2     3     4     5     6     7     8     9    10    11    12
Fig. 3.4 Linear storage of data in row-major order
According to this arrangement, the first row would take up the first four locations in the
list allocated for the array, the second row the second four locations, and so on. This
arrangement is called row-major order. The element in the 3rd row and 2nd column will, in fact, be
located in the 10th position within the list. In this two-dimensional array, the element in the
ith row and jth column is found at the following offset from the start of the list. The offset is
counted from 0, as in C language, so it is one less than the position:
4 * (i-1) + (j-1)
In general, if N is the number of columns in the array, then the offset of the entry in the ith
row and jth column is given as
N * (i-1) + (j-1)
Similarly, Fig. 3.5 indicates the approach for storing elements in the column-major order.
A (1,1) (2,1) (3,1) (1,2) (2,2) (3,2) (1,3) (2,3) (3,3) (1,4) (2,4) (3,4)
1 2 3 4 5 6 7 8 9 10 11 12
Fig. 3.5 Linear storage of data in column-major order
To access the entry in the ith row and jth column of a two-dimensional array stored in
column-major order, the transformation N1 x (j-1) + (i-1) is required, where N1 represents the
number of rows in the array.
Let us consider a problem. An array A has 25 rows and 4 columns. Suppose the
two-dimensional array is stored in column-major order.
Then the offset of the element in the 2nd row and 4th column may be computed as
25 x (4-1) + (2-1) = 75 + 1 = 76
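The two transformations can be written as a pair of small C functions. The function names are assumptions of this sketch. Rows and columns are numbered from 1, as in the figures, and the result is an offset counted from 0.

```c
/* Offset (counted from 0) of the element in row i, column j, with
   i and j counted from 1. Hypothetical helpers for illustration. */

/* Row-major: ncols is the number of columns in the array. */
int row_major_pos(int i, int j, int ncols)
{
    return ncols * (i - 1) + (j - 1);
}

/* Column-major: nrows is the number of rows in the array. */
int col_major_pos(int i, int j, int nrows)
{
    return nrows * (j - 1) + (i - 1);
}
```

For a 3 x 4 array, row_major_pos(3, 2, 4) evaluates to 9, which is the 10th position in the list. With 25 rows, col_major_pos(2, 4, 25) evaluates to 25 x 3 + 1 = 76.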
Note that C language stores two-dimensional arrays using row-major order. An
n-dimensional ($P_1 \times P_2 \times \dots \times P_n$) array C is a collection of $P_1 \cdot P_2 \cdots P_n$ data elements in which each
element is specified by a list of n integers, say $k_1, k_2, \dots, k_n$, called subscripts, with the
following properties:
$0 \le k_1 < P_1,\ 0 \le k_2 < P_2,\ \dots,\ 0 \le k_n < P_n$
The element of C with subscripts $k_1, k_2, \dots, k_n$ will be denoted by
$C_{k_1 k_2 \dots k_n}$ or C[k1][k2]...[kn]
The array will be stored in memory in consecutive locations. The programming language
will store the array C either in row-major order or in column-major order. In row-major order
the elements are listed in such a way that the last subscript varies first, the next-to-last subscript
varies second, and so on. In the column-major order, the elements are listed so that the first
subscript varies first, the second subscript second, and so on. Suppose C is a three-dimensional
2x4x3 array. Then C array contains 2x4x3=24 elements. Figs. 3.6(a) and Fig. 3.6(b) indicate the
arrangements in column-major order and row-major order, respectively.
C Subscripts        C Subscripts
(1,1,1)             (1,1,1)
(2,1,1)             (1,1,2)
(1,2,1)             (1,1,3)
(2,2,1)             (1,2,1)
   .                   .
   .                   .
   .                   .
(1,4,3)             (2,4,2)
(2,4,3)             (2,4,3)
Number++;
/* Sum of elements above diagonal */
for (j = 1; j < n; j++)
for (i = 0; i < j; i++)
Sum += A[i][j];
/* Product of diagonal elements */
for (i = 0; i < n; i++)
Product *= A[i][i];
printf("\n\n Number = %d \n\n Sum = %d \n\n Product = %d \n", Number, Sum, Product);
}
Sample Input:
            1 2 3
A[3][3] =   4 0 6
            7 8 9
Output:
Number = 8
Sum = 11
Product = 0
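The three quantities in the sample output can be reproduced with a short C sketch. The first part of the listing is not available here, so this version rests on an assumption: Number is taken to be the count of non-zero elements, which is consistent with the sample (one zero among nine entries). The function name matrix_stats and the macro DIM are hypothetical.

```c
#define DIM 3   /* order of the square matrix in the sample */

/* Compute, for a DIM x DIM matrix:
   *number  - count of non-zero elements (assumed meaning of Number),
   *sum     - sum of the elements above the main diagonal,
   *product - product of the diagonal elements. */
void matrix_stats(int a[DIM][DIM], int *number, int *sum, int *product)
{
    int i, j;
    *number = 0;
    *sum = 0;
    *product = 1;
    for (i = 0; i < DIM; i++)
        for (j = 0; j < DIM; j++)
            if (a[i][j] != 0)
                (*number)++;
    for (j = 1; j < DIM; j++)           /* columns right of the diagonal */
        for (i = 0; i < j; i++)         /* rows above the diagonal */
            *sum += a[i][j];
    for (i = 0; i < DIM; i++)
        *product *= a[i][i];
}
```

For the sample matrix the elements above the diagonal are 2, 3, and 6, and the diagonal contains a zero, which accounts for the sum of 11 and the product of 0.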
The program in Example 3.10 prints the elements of a two-dimensional array in row-
major and column-major order.
Example 3.10
main()
{
int Temp[5][4], i, j;
for (j = 0; j < 4; j++)
for (j = 0; j < 4; j++)
for (j = 0; j < 4; j++)
}
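Because the listing above is incomplete, the following C sketch shows the two traversal orders on their own. It is an illustration only: the function names are assumptions, and the elements are copied into a linear array instead of being printed, so that the order of visiting is easy to inspect.

```c
#define ROWS 5
#define COLS 4

/* Copy t into out[] in row-major order: the column subscript
   varies fastest. Hypothetical helper for illustration. */
void list_row_major(int t[ROWS][COLS], int out[ROWS * COLS])
{
    int k = 0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            out[k++] = t[i][j];
}

/* Copy t into out[] in column-major order: the row subscript
   varies fastest. */
void list_column_major(int t[ROWS][COLS], int out[ROWS * COLS])
{
    int k = 0;
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            out[k++] = t[i][j];
}
```

The only difference between the two functions is which loop is outermost; replacing the assignment with a printf call would print the elements in the corresponding order.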
A program to illustrate the idea of a lower-triangular matrix and a tridiagonal matrix is given
in Example 3.11. A square matrix is lower-triangular if all its non-zero elements lie on or below
the main diagonal. In case of a tridiagonal matrix, all elements of the square matrix other than
those on the major diagonal and on the diagonals immediately above and below this one are zero.
For example, an n-square tridiagonal array B has n elements on the diagonal, n-1 elements above,
and n-1 elements below the diagonal. Thus B contains at most 3n-2 non-zero elements.
Example 3.11
i = p;
}
i++;
}
i++;
r++;
l++;
}
if (i == p - n + 1)
printf("\n It is a lower triangular matrix \n");
k = 1; i = 2;
while (i < p - 1)
{
while (k <= 2)
{
if (mat[i])
{
printf("\n\n It is not a tridiagonal matrix.\n");
i = p - n; k = 3;
}
k++, i++;
}
i = i + n - 1; k = 1;
}
k++; i++;
}
i = i + n - 1; k = 1;
}
if (i == p + 1)
printf("\n It is a tridiagonal matrix \n");
}
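The two definitions can also be tested directly on a two-dimensional array. The sketch below is not the book's method, which appears to scan a one-dimensional array mat; the function names and the macro ORDER are assumptions made for this illustration.

```c
#define ORDER 4     /* order of the square matrix, chosen for the sketch */

/* Returns 1 if every element above the main diagonal is zero. */
int is_lower_triangular(int m[ORDER][ORDER])
{
    for (int i = 0; i < ORDER; i++)
        for (int j = i + 1; j < ORDER; j++)
            if (m[i][j] != 0)
                return 0;
    return 1;
}

/* Returns 1 if every element more than one place away from the
   main diagonal is zero. */
int is_tridiagonal(int m[ORDER][ORDER])
{
    for (int i = 0; i < ORDER; i++)
        for (int j = 0; j < ORDER; j++)
            if ((i - j > 1 || j - i > 1) && m[i][j] != 0)
                return 0;
    return 1;
}
```

Both functions test only the positions that must be zero, so a matrix whose permitted positions happen to contain zeros still qualifies.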
We now present more examples on arrays.
Example 3.12: Given a decimal number, find its binary equivalent and the number of 1's in
the binary number.
/* Decimal to binary conversion and */
/* counting of 1's in the binary number */
#include <stdio.h>
#include <math.h>
#define SIZE 20
int main(void)
{
    int result[SIZE], num, temp, count, tag = SIZE - 1;
    printf("\n\n Enter the decimal number: ");
    scanf("%d", &num);
    /* Find the binary number */
    for (count = 0; count < SIZE; ++count)
    {
        temp = num / 2;
        result[count] = num % 2;   /* least significant bit first */
        num = temp;
        if (temp == 0)
        {
            tag = count;           /* index of the most significant bit */
            break;
        }
    }
    /* Print the result */
    printf("\n The binary number is: ");
    temp = 0;                      /* temp now counts the 1's */
    for (count = tag; count >= 0; --count)
    {
        printf("%d", result[count]);
        if (result[count] == 1)
            temp += 1;
    }
    printf("\n\n There are %d 1's in the binary number ", temp);
    return 0;
}
We now explain the above program through the following tables. Assume that the input
is 9.
Table 3.1
num   count   temp   result[count]   tag
9     0       4      1
4     1       2      0
2     2       1      0
1     3       0      1               3
Table 3.2
count   result[count]   result[count] == 1?   temp
3       1               Yes                   0 + 1 = 1
2       0               No                    1
1       0               No                    1
0       1               Yes                   1 + 1 = 2
Example 3.13: Write a program to generate Pascal's triangle for a given number of rows. The
output is given in the following form:
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
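One way to produce this output is to store the triangle in a two-dimensional array, using the fact that each interior entry is the sum of the two entries above it. The following C sketch is an assumption-laden illustration, not the book's program: the names build_pascal, print_pascal, and MAX_ROWS are hypothetical.

```c
#include <stdio.h>

#define MAX_ROWS 10     /* largest triangle this sketch can hold */

/* Fill tri[r][c], for 0 <= c <= r < rows, with Pascal's triangle.
   Cells with c > r are left untouched. */
void build_pascal(int rows, long tri[MAX_ROWS][MAX_ROWS])
{
    for (int r = 0; r < rows && r < MAX_ROWS; r++) {
        tri[r][0] = 1;                  /* first entry of each row */
        tri[r][r] = 1;                  /* last entry of each row */
        for (int c = 1; c < r; c++)
            tri[r][c] = tri[r - 1][c - 1] + tri[r - 1][c];
    }
}

/* Print the first rows rows of the triangle, one row per line. */
void print_pascal(int rows)
{
    long tri[MAX_ROWS][MAX_ROWS];
    if (rows > MAX_ROWS)
        rows = MAX_ROWS;
    build_pascal(rows, tri);
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c <= r; c++)
            printf("%ld ", tri[r][c]);
        printf("\n");
    }
}
```

Calling print_pascal(5) prints the five rows shown above, left-aligned.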
Example 3.15: A pair of positive numbers is said to be AMICABLE if the sum of the divisors of
the first number is equal to the second number and the sum of the divisors of the second
number is equal to the first number. The divisors should include 1 but exclude the number itself.
Write a program to find if a pair of numbers num[0] and num[1], given as input, are AMICABLE
or not AMICABLE.
At the beginning, the program prompts for two inputs. Next, the factors of each number
are found using a 'for' loop. The number whose factors are to be calculated is divided repeatedly
by 2, 3, 4, ..., up to num[i]/2, where num[i] stores the two numbers (i = 0 and i = 1). If there is no
remainder, then the quotient is added to the variable sum[i] inside the second 'for' loop.
The first 'for' loop is used to find sum[i] twice, for i = 0 and i = 1, since two numbers have been
taken as input. Lastly, we test for the amicability of the two numbers. It should be noted that the
sum of factors of a number should exclude the number itself. Also, since 1 is a factor of every
number, sum[i] is assigned a value 1 initially for i = 0 and i = 1.
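The description above can be sketched in C as two small functions. This is an illustration under stated assumptions, not the book's program: the function names are hypothetical, the numbers are passed as arguments instead of being read into num[], and each divisor is added directly, which gives the same total as adding the corresponding quotients.

```c
/* Sum of the divisors of n that are smaller than n (1 included).
   Hypothetical helper; returns 0 for n <= 1, which has no such divisor. */
int sum_proper_divisors(int n)
{
    if (n <= 1)
        return 0;
    int sum = 1;                        /* 1 divides every number */
    for (int d = 2; d <= n / 2; d++)
        if (n % d == 0)
            sum += d;
    return sum;
}

/* Returns 1 if a and b satisfy the amicable condition, 0 otherwise. */
int are_amicable(int a, int b)
{
    return sum_proper_divisors(a) == b && sum_proper_divisors(b) == a;
}
```

The classic pair 220 and 284 passes this test: the divisors of 220 below 220 sum to 284, and those of 284 sum to 220.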
Example 3.16: Write a program to sort the list of 100 vouchers in their increasing order of voucher
numbers. Also trap and list out the in-between missing voucher numbers.
This chapter described the representation of linear data structures by using one of the
methods of sequential storage allocation. Other methods will be discussed later. Although this
method of allocation is suitable for certain applications, there are many other applications where
the sequential allocation method is unacceptable. This chapter first illustrated the
representation of arrays in C language. Next, the different operations on arrays were given and the
process for initializing elements in the array was explained. Finally, multidimensional arrays and
the concept of row-major order and column-major order were introduced. Examples were given
to illustrate the different methods.
Write a C program to generate and print a magic square for a given N, which should be
odd.
11. The following table shows the average rainfall of a city month-wise during a year.
Month                     Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
Rainfall in centimetres   5    3    2    3    4    8    12   12   11   9    3    4
In the previous chapter, we studied the representation of simple data using an array. This representation had the property of storing data of a homogeneous type. One of the primary interests of today's computer processing is string processing, broadly called text processing. Such processing usually involves some type of pattern matching. Pattern matching is the process of finding a pattern within a string of character text. The answer may be (1) whether a match exists or not, (2) the place of (the first) match, (3) the total number of matches, or (4) the total number of matches and where they occur. In this chapter we discuss the fundamentals of string representation, string manipulations, string functions, and pattern-matching algorithms. Three string-matching algorithms (straightforward, Knuth-Morris-Pratt, and Boyer-Moore) are examined and their time complexity is discussed. Algorithms and C programs for different string-processing methods are presented.
mance considerations relating operations to string data structures? How are pattern-matching operations performed? In this chapter we will find those answers. This chapter also explores a number of possible representation choices and presents some well-known algorithms for implementing string-manipulation tasks.
When creating character strings, however, we must append the null character at the end of the string. It marks where the string ends. The following 'for' statement illustrates the process for finding the end of the string.
for (i = 0; string[i] != '\0'; i++)
    ;
Some older programs write NULL in place of '\0' in this test. NULL, however, is a null pointer constant rather than a character, so '\0' is the form to prefer.
As we will see later on, the majority of string-manipulating functions search for the end of the string in the above manner. These routines can also be implemented with pointers instead of character arrays.
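To make the last remark concrete, the following sketch finds the end of a string twice, once with an array index and once with a pointer. The function names length_by_index and length_by_pointer are assumptions made for this illustration.

```c
/* Length of a null-terminated string, found by indexing. */
int length_by_index(const char string[])
{
    int i;

    for (i = 0; string[i] != '\0'; i++)
        ;                         /* empty body: the loop only advances i */
    return i;
}

/* The same search written with a pointer instead of an index. */
int length_by_pointer(const char *string)
{
    const char *p = string;

    while (*p != '\0')            /* stop at the terminating null character */
        p++;
    return (int)(p - string);     /* number of characters stepped over */
}
```

Both functions return the same value for any string; the pointer version simply replaces string[i] by *p.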
The implementation of arrays in C is strongly tied to pointers. In many ways, array indexing is a convenient notation for pointer arithmetic. In C, the name of an array, when used in an expression, is converted to a pointer to the first element of the array. A pointer references a location in memory, and a pointer variable is declared by placing an asterisk (*) before its name. The asterisk is also used to access the value contained in the location referenced by the pointer.
Pointers are convenient for string functions. One advantage of using pointers to character strings is that we need not worry about the dimensions of the arrays that contain the characters. When pointers to character strings are used in functions, only a pointer to the first character in the string needs to be passed, in the same manner as for arrays of characters. Once the pointer is assigned the location of the first character in the string, we can move through the remaining characters of the string by incrementing this pointer.
int i;
for (i = 0; string[i] != '\0'; i++)
    ;
return (i);   /* Number of characters in the string */
}
A substring of a string is any run of consecutive characters taken from it. A substring function requires the following parameters.
(i) The name of the string or the string itself.
(ii) The position of the first character of the substring in the given string.
(iii) The length of the substring or the position of the last character in the substring.
(iv) The string into which the substring is copied.
We now write the substr(S1, K, L, S2) function below:
Example 4.2
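As an illustration of the four parameters listed above, here is one possible sketch of substr. It is not the book's Example 4.2 listing: it assumes that K is a 0-based starting index, that L is a length, and that S2 has room for L + 1 characters.

```c
/* Copy at most L characters of S1, starting at index K (0-based,
   an assumption of this sketch), into S2 and terminate S2. */
void substr(const char S1[], int K, int L, char S2[])
{
    int i, j = 0;

    /* Walk to position K without running past the end of S1. */
    for (i = 0; i < K && S1[i] != '\0'; i++)
        ;
    /* Copy until L characters are taken or S1 ends. */
    while (j < L && S1[i] != '\0')
        S2[j++] = S1[i++];
    S2[j] = '\0';
}
```

With S1 holding "abcdefgh", the call substr(S1, 2, 3, S2) leaves "cde" in S2. If K lies beyond the end of S1, S2 becomes the empty string.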
We now present some more useful functions such as Insert(), Index(), and remove().
Example 4.3
/* Return the position in S1 at which S2 first occurs, or -1 */
int Index(char S1[], char S2[])
{
    int i, j, k;
    for (i = 0; S1[i] != '\0'; i++)
        /* Compare S2 with the part of S1 that starts at position i */
        for (j = i, k = 0; S2[k] == S1[j]; k++, j++)
            if (S2[k+1] == '\0')
                return (i);   /* Substring is found */
    /* Substring is not found */
    return (-1);
}
We now present two more functions, namely, remove(S) and remove_all_blanks(S1). The first function removes all trailing blanks and the second function removes all blanks from the given string.
Example 4.5
Example 4.6
/* Append S1 to S2 */
for (j = 0; (S2[i] = S1[j]) != '\0'; ++i, ++j)
    ;
}
The above function appends the contents of S1 to the contents of S2. We can also utilize the function to append the contents of S2 to the contents of S1 by interchanging the arguments, that is, append(S2, S1) instead of append(S1, S2).
The function remove_all_blanks() deletes all spaces from a string. Some more examples are also given.
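The two blank-removal functions can be sketched as follows. This is an illustration under stated assumptions, not the book's listing: the first function is named remove_trailing_blanks here because a function called remove would clash with the standard library function remove() declared in <stdio.h>, and only the space character is treated as a blank.

```c
#include <string.h>

/* Cut off every blank at the end of S (modifies S in place). */
void remove_trailing_blanks(char S[])
{
    int i = (int)strlen(S) - 1;

    while (i >= 0 && S[i] == ' ') /* step back over trailing blanks */
        i--;
    S[i + 1] = '\0';
}

/* Delete every blank from S by copying the other characters down. */
void remove_all_blanks(char S[])
{
    int i, j = 0;

    for (i = 0; S[i] != '\0'; i++)
        if (S[i] != ' ')
            S[j++] = S[i];
    S[j] = '\0';
}
```

Both functions need a modifiable array, so they should be called with a character array rather than a string constant.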
Example 4.7
Input:
abcdefgh
2 4
Substring from i=2 to m=4 is: cde
Let us now write an algorithm for finding the location where the pattern P is a leftmost substring
of string S.
Algorithm
Input: S and P are two strings with lengths n and m, respectively, each stored in an array. It is assumed that the value of m is less than or equal to n.
Output: If the pattern P is a leftmost substring of string S starting at the kth character of S, this algorithm returns k, otherwise it returns 0.
Step 1: Set k <- 1, j <- 1 and x <- n-m+1
Step 2: If k > x then return 0 and exit.
Step 3: If P[j] = S[k+j-1] and j = m then return k and exit.
Step 4: Repeat Step 5 while P[j] = S[k+j-1] and j < m
Step 5: Set j <- j+1
End of loop.
Step 6: If P[j] ≠ S[k+j-1] then set k <- k+1 and set j <- 1
Step 7: Go to Step 2
Step 8: Exit
It is left to the reader as an exercise.
Pattern matching is an important problem that occurs in different areas of computer science
and information processing. The basic objective of the problem is to detect the occurrence of a
particular string of characters (the pattern) as a substring in a sequence of characters (the input
string). For example, the input string 'computer science' contains the pattern 'science' as a
substring. Many text editors and programming languages (such as C, PASCAL, etc.) have facili-
ties for matching strings. Pattern matching is one of the central and most widely studied prob-
lems in theoretical computer science.
There are three basic approaches for implementing pattern-matching algorithms. Each of them is conceptually simple, but the third is in many cases the fastest known pattern-matching algorithm in both theory and practice.
The first approach, called the brute-force algorithm, is the one that first comes to mind
whenever the pattern-matching problem is addressed. The pattern is placed over the input
string at its extreme left. The pattern and input characters are then scanned to the right for a
mismatch. If a mismatch is found, the pattern is shifted one position to the right and the scan is started again at the new position of the pattern. Whenever a partial match fails, backtracking over the pattern is necessary.
In the second approach, the initial placement of the pattern and the input string is as in the brute-force algorithm, and scanning is done to the right. When a mismatch occurs, however, the pattern is shifted to the right in such a way that the scan can be restarted at the point where the mismatch occurred in the input string. In this case, therefore, no backtracking is required.
In both of these approaches the pattern is scanned from left to right and each input character is checked at least once. The third approach instead achieves great speed by skipping over portions of the input string that cannot possibly contribute to a match.
In the next subsections we will present three approaches for pattern-matching problems. For convenience, we will assume P = p_1 p_2 ... p_m of length m and S = s_1 s_2 ... s_n of length n, where p_i represents the ith character of the pattern and s_j the jth character of the input string.
Note that when we implement the algorithms as C programs, we will use the pattern as P[0], P[1], ..., P[m-1] and the input string as S[0], S[1], ..., S[n-1], since array indices start from 0 in C.
Further, it is straightforward to generalize these algorithms to locate all occurrences of the pattern in the input string.
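As a small illustration of that generalization, the sketch below reports every occurrence of the pattern using the simplest scan, trying each starting position in turn. The function name find_all and its pos and max parameters are assumptions made for this example; positions are 0-based, as in C arrays.

```c
#include <string.h>

/* Store the 0-based start index of each occurrence of P in S into
   pos[] (at most max entries) and return the number of occurrences. */
int find_all(const char S[], const char P[], int pos[], int max)
{
    int n = (int)strlen(S);
    int m = (int)strlen(P);
    int k, j, count = 0;

    if (m == 0)
        return 0;                    /* an empty pattern is not searched for */
    for (k = 0; k + m <= n; k++) {   /* k is the current alignment */
        for (j = 0; j < m && P[j] == S[k + j]; j++)
            ;
        if (j == m) {                /* all m characters matched */
            if (count < max)
                pos[count] = k;
            count++;
        }
    }
    return count;
}
```

Overlapping occurrences are counted separately, because the scan resumes one position to the right of each alignment whether or not it matched.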
When a mismatch occurs, it slides the pattern one character to the right and starts looking for a match by comparing p_1 with s_{k+1}. The following algorithm illustrates the Brute-Force approach.
/* Brute-Force algorithm */
#include <stdio.h>
#include <string.h>
#define MAXPATLEN 80
#define MAXTEXLEN 80

void Brute_Force(char S[], char P[], int m, int n);

int main(void)
{
    char P[MAXPATLEN] = "", S[MAXTEXLEN] = "";
    int m, n;   /* m - length of P, n - length of S */

    printf("\n Enter the text:");
    fgets(S, MAXTEXLEN, stdin);
    S[strcspn(S, "\n")] = '\0';   /* drop the newline kept by fgets */
    printf("\n Enter the pattern to be matched:");
    fgets(P, MAXPATLEN, stdin);
    P[strcspn(P, "\n")] = '\0';
    m = (int)strlen(P);           /* Length of pattern using strlen() */
    n = (int)strlen(S);           /* Length of text using strlen() */
                                  /* strlen() is a C library function */
    Brute_Force(S, P, m, n);      /* Function call */
    return 0;
}

/* Brute-Force function */
void Brute_Force(char S[], char P[], int m, int n)
{
    int i, j, k;
    i = 0;    /* index into the pattern P */
    j = 0;    /* index into the text S */
    k = 0;    /* position in S where the current attempt starts */
    while (i < m && j < n)
    {
        if (P[i] == S[j])
        {
            i++;
            j++;
        }
        else
        {
            k = k + 1;   /* slide the pattern one place to the right */
            j = k;       /* back up in the text to the new start */
            i = 0;
        }
    }
    if (i == m)
        printf("Pattern is found at %d\n", k);
    else
        printf("Pattern is not found\n");
}
S = 'BABCBABCABCAABCABCABCACABC'
P = 'ABCABCACAB'
Fig. 4.2 String and pattern
We will assume that the search begins at the left end of the string S to be searched.
Suppose S = s_1 s_2 ... s_m and P = p_1 p_2 ... p_n, and assume that we are currently determining whether or not there is a match beginning at s_i. If s_i ≠ p_1, then we proceed by comparing s_{i+1} and p_1. Similarly, if s_i = p_1 and s_{i+1} ≠ p_2, then we may proceed by comparing s_{i+1} and p_1. In general, suppose that after starting it is found that p_1 p_2 ... p_k matches the last k characters of s_1 s_2 ... s_i and that s_{i+1} ≠ p_{k+1}.
The best possible place to recover from this failure would be to slide P to the right so that as many of its initial characters as possible match the final characters of s_1 s_2 ... s_i, provided we can continue matching s_{i+1} from this position. Thus, we want to find the longest head of P, p_1 p_2 ... p_j, that is equal to a tail of s_1 s_2 ... s_i and for which s_{i+1} = p_{j+1}. We then find that the first j+1 symbols of P have been matched. We continue further pattern matching from this position. To see how the method works, let us consider the following example.
Example: Consider the pattern P= abaababaabaab. The algorithm A produces Table 4.2.
Let us now apply the Knuth-Morris-Pratt algorithm (Algorithm B) to look for the pattern P in the input string abaababaabacabaababaabaab. Initially, the first 11 characters of P and S (the input string) align successfully. Let i be the pointer into P and j be the pointer into S. When i = j = 12, the algorithm finds a mismatch at the input character c. The first iteration of the inner loop of Algorithm B sets i = h_12 = 7 (from h_i of Table 4.2). This has the effect of shifting the pattern five characters to the right, so position 7 of the pattern is now aligned above position 12 of the input string. At this point p_7 still does not match s_12, so i is set to h_7 = 4. Mismatches continue for i = 4, 2, 1, 0. At this point the inner loop is exhausted without finding a match, and the outer loop then sets i to 1 and j to 13. It is shown in Fig. 4.3.
P:             abaababaabaab   (pointer i)
S: abaababaabacabaababaabaab   (pointer j)
Fig. 4.3 A successful match
/* Knuth-Morris-Pratt */
#include <stdio.h>
#include <string.h>

int h[80], m, n;
void f(char pat[]);

int main(void)
{
    char pat[80] = "", s[80] = "";
    int i, j;

    fgets(s, sizeof s, stdin);
    s[strcspn(s, "\n")] = '\0';       /* drop the newline kept by fgets */
    puts(s);
    fgets(pat, sizeof pat, stdin);
    pat[strcspn(pat, "\n")] = '\0';
    puts(pat);
    m = (int)strlen(pat);
    n = (int)strlen(s);
    i = 1;
    j = 1;
    f(pat);                           /* build the table h */
    while (i <= m && j <= n)
    {
        while (i > 0 && pat[i-1] != s[j-1])
            i = h[i-1];
        i++;
        j++;
    }
    if (i > m)
        printf("yes\n");
    else
        printf("no\n");
    return 0;
}

/* Compute and print the table h for the pattern pat */
void f(char pat[])
{
    int i, j;

    i = 1; j = 0; h[0] = 0;
    while (i <= m)
    {
        while (j > 0 && pat[i-1] != pat[j-1])
            j = h[j-1];
        i++;
        j++;
        if (pat[i-1] == pat[j-1])
            h[i-1] = h[j-1];
        else
            h[i-1] = j;
    }
    for (i = 0; i < m; i++)
        printf(" %3d", h[i]);
    printf("\n");
}
(C) The pattern is moved the distance given by the maximum of (1) and (2) above.
We now explain the essential features of the Boyer-Moore algorithm in terms of S and P.
Suppose that we are checking the characters of P against those of S in right-to-left order.
If all characters of P match those of S beneath, we have found a substring of S to match P.
Initially, we compare p_m (P = p_1 p_2 ... p_m) with s_m (S = s_1 s_2 ... s_m s_{m+1} ... s_n). If s_m occurs nowhere in the pattern, then there cannot be a match for the pattern beginning at any of the first m characters of the input string. We can safely slide the pattern m characters to the right and try to match p_m with s_{2m}. So we avoid m-1 unnecessary character comparisons.
Suppose we have just shifted the pattern to the right and are about to compare p_m with s_k. This is shown in Fig. 4.4.
[Figure: the pattern p_1 ... p_m placed over s_{k-m+1} ... s_k of the input string s_1 ... s_n, with pointer i at p_m and pointer j at s_k]
Fig. 4.4 Shift and compare p_m with s_k
If s_k did not occur in the pattern, we would shift the pattern m positions to the right and start comparing p_m with s_{k+m}.
Case 2: Suppose that the last m-i characters of the pattern match the last m-i characters of the input string ending at position k, that is, p_{i+1} p_{i+2} ... p_m = s_{k-m+i+1} s_{k-m+i+2} ... s_k.
If i = 0, we have found a match; otherwise we have i > 0 and p_i ≠ s_{k-m+i}, and two cases are considered.
If the rightmost occurrence of the character s_{k-m+i} in the pattern is p_{i-g}, then we can simply shift the pattern g positions to the right, so that p_{i-g} and s_{k-m+i} align. The pattern matching starts again by comparing p_m with s_{k+g}, as shown in Fig. 4.5.
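[Figure: the pattern p_1 ... p_{i-g} ... p_m shifted g positions, placed over s_{k-m+1+g} ... s_{k+g} of the input string s_1 ... s_n so that p_{i-g} lies over s_{k-m+i}; pointer i at p_m and pointer j at s_{k+g}]
Fig. 4.6 Compare s_{k+g} with p_m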
If p_{i-g} is to the right of p_i (g < 0), then we would instead shift the pattern one position to the right and resume matching by comparing p_m with s_{k+1}. The first table covers Case 1 and Case 2 in the discussion above.
The first table, called d1, determines how far to slide the pattern to the right when p_i ≠ s_j. It is indexed by characters. The table d1 is a function of the text character C for which the mismatch occurred. For every character C, d1[C] is the largest i such that C = p_i, or m if the character C does not occur in the pattern.
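The role of a character-indexed table can be illustrated with a simplified search that applies only the rightmost-occurrence rule of Cases 1 and 2 and ignores the second table. This is a sketch of that one rule, not the full Boyer-Moore algorithm. The name bad_char_search and the table last are assumptions of the sketch, and last stores 0-based positions with -1 for absent characters, which differs from the convention used for d1 in the text.

```c
#include <string.h>

/* Return the 0-based index of the first occurrence of p in s,
   or -1 if there is none. Uses only the rightmost-occurrence rule. */
int bad_char_search(const char *s, const char *p)
{
    int n = (int)strlen(s);
    int m = (int)strlen(p);
    int last[256];                /* rightmost position of each character */
    int c, i, k, g;

    for (c = 0; c < 256; c++)
        last[c] = -1;             /* -1: character does not occur in p */
    for (i = 0; i < m; i++)
        last[(unsigned char)p[i]] = i;

    k = 0;                        /* p[0] is aligned with s[k] */
    while (k <= n - m) {
        i = m - 1;                /* compare from right to left */
        while (i >= 0 && p[i] == s[k + i])
            i--;
        if (i < 0)
            return k;             /* every character matched */
        g = i - last[(unsigned char)s[k + i]];
        k += (g > 0) ? g : 1;     /* shift g places, or 1 if g <= 0 */
    }
    return -1;
}
```

When the mismatched text character does not occur in the pattern at all, g equals i + 1, so the pattern jumps completely past that character; this is the source of the speed-up described above.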
Case 3: The second table, called d2, is a function of the position in the pattern at which the mismatch occurred. Suppose the suffix p_{i+1} p_{i+2} ... p_m reoccurs as the substring p_{i+1-g} p_{i+2-g} ... p_{m-g} in the pattern and p_i ≠ p_{i-g}. If there is more than one such reoccurrence, we take the rightmost one. In this case a longer shift than in Case 2 may be possible by aligning p_{i+1-g} ... p_{m-g} above s_{k-m+i+1} ... s_k and restarting the scan by comparing p_m with s_{k+g} (Fig. 4.7).
[Fig. 4.7: the pattern p_1 ... p_{i+1-g} ... p_{m-g} ... p_m, with pointer i at p_m]
The table d2 is indexed by positions in the pattern. For every 1 ≤ i ≤ m (m is the length of the pattern P), d2[i] gives the minimum shift g such that when we align p_m over s_{k+g}, the substring p_{i+1-g} ... p_{m-g} of the pattern matches the substring s_{k-m+i+1} ... s_k of the input string, assuming p_i did not match s_{k-m+i}. It is left to the reader as an exercise. Also, write a C program for the algorithm.
In this chapter, we have discussed the string-processing and pattern-matching problems. Algo-
rithms are given for solving string-matching problems that have proven useful for text-editing
and text-processing applications. In the next chapter, pointers in C will be discussed.
EXERCISES
1. Write a C program to count all occurrences of a particular word from a given text.
2. Write a program that prints the user specified last n lines of its input. If n is more than the
number of lines in the input, all the lines should be printed. If n is not specified, a default
number of p lines should be printed.
3. Write a function that replaces the first occurrence of a given substring in a source string by
the specified substitution string.
4. Write a C program that converts a given string to its floating-point equivalent.
5. Write a function check(str, c) to print c if c is in the string pointed to by str. The parameter str is a pointer to a char, and the parameter c is a char. If c is not in str, return 0.
6. Write a C program to print the word 'Computer' in the following way:
Computer
Compute
Comput
Compu
Comp
Com
Co
C
7. Write a C program that copies its input to output, except that it removes trailing blanks,
leading blanks, and tabs from the end of lines and prints only one line from each group of
adjacent identical lines.
8. Write a function to extract a portion of a character string and print the extracted string.
Assume that p characters are extracted, starting with n characters.
9. Write a function that appends a string to a given string. In the resultant string, any upper-
case character in the original string needs to be converted into lowercase, and vice-versa.
10. Briefly describe the merits and demerits of all the pattern-matching algorithms that were
discussed in this chapter.
5
POINTERS
The use of pointers is one of the most powerful features of C. Pointers are simply variables that point to other variables. The relation between a pointer and the variable it points to is established by the fact that the value of the pointer variable is the address of that variable. In this chapter we discuss the basics of pointers and their use.
To be precise, a pointer to c means that the pointer variable that points to c holds the address of variable c, that is, it holds &c. Similarly, a pointer to i holds the address of i (&i) and a pointer to f holds the address of f (&f).
Thus, to get the address of a variable we make use of the & operator, which is a unary operator. C provides the facility to store such an address in another variable, known as a pointer (because such variables hold the address of another variable). Consider three pointer variables: pc, pi, and pf. This means all of them can hold addresses of other variables. Since they are variables, we must define them before their use. We will see the method of defining such variables in a moment. As pc, pi, and pf are pointer variables we can easily make the following assignments:
pc=&c;
pi=&i;
pf =&f ;
Assume the following three variable definitions:
char cc;
int ii;
float ff;
C provides another unary operator, *, to get the contents of an address. More precisely, *pc gives the contents of the address held in pc, and we know that it is a character. So we can easily write statements of the form
cc=*pc;
Similarly, we can write
ii=*pi;
ff=*pf;
This highlights the fact that though the variables pc, pi, and pf are pointers, the data type
of their contents differs. More precisely, pc is a pointer that points to a character, pi is a pointer
pointing to an integer, and pointer pf points to a floating-point value. This distinction of point-
ers should be reflected at the time of defining these variables and we define these variables as
char *pc;
int *pi;
float *pf;
Clearly, the sequence of statements,
pi = &i;
ii = *pi;
is equivalent to the statement
ii = i;
The statements
*pi = *pi + 1;
*pi += 1;
and
++*pi;
are equivalent: each increments (by 1) what pi points to. We should note that pointers never point to anything useful until they are initialized.
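The three increment forms can be checked with a short function. This is only an illustration; the function name bump_three_times is an assumption introduced here.

```c
/* Increment an int three times through a pointer, once with each of
   the three equivalent forms, and return the final value. */
int bump_three_times(int i)
{
    int *pi = &i;     /* pi is initialized before it is dereferenced */

    *pi = *pi + 1;    /* first form  */
    *pi += 1;         /* second form */
    ++*pi;            /* third form  */
    return i;         /* i itself has changed, since pi points to i */
}
```

Calling bump_three_times(5) returns 8, because each statement adds 1 to the variable that pi points to.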
not refer to itself, but refers to the element it points to. The scalar size for a pointer is defined by its type specifier, which is int in this case. We may apply the sizeof operator to the pointed-to type to determine the scalar size of a pointer.
By this we mean that for the pointer definitions
float *pf;
double *pd;
int *pi;
the scalar sizes of pf, pd, and pi are
sizeof(float) = 4
sizeof(double) = 8
and sizeof(int) = 2
respectively on a typical 16-bit compiler. The exact values depend on the machine and the compiler; on most current systems sizeof(int) is 4. The compiler requires the scalar size of a pointer to perform pointer operations properly.
Sometimes we may need to test pointer variables. Relational operators such as >=, <=, >, <, ==, and != may be applied to pointers only when both operands are pointers. For example,
if (pntr1 >= pntr2)
{
}
is acceptable, but the following example,
if (pntr > 50)
{
}
is invalid. The reason is that the numeric constant 50 is not of pointer type. Note that the equality (==) and inequality (!=) operators may also be applied if one of the operands is a null pointer (i.e., NULL or '\0'). This discussion is not yet complete, because pointers in a comparison may point to different data types. In fact, we should not use these tests when the pointers point to different data types. Unfortunately, some compilers do not detect this sort of error, and it may be very difficult to track down the bugs that crop up due to such pointer comparisons.
t = a ;
a = b ;
b = t ;
return;
}
This is because the formal parameters a and b hold the private copy of the values of
corresponding actual parameters. To achieve the goal we may pass the addresses of the vari-
ables to the function. For example, to interchange two integers x and y, the function reference
may be of the form
interchange(&x, &y);
and the function definition may be written as the following:
void interchange(int *pa, int *pb)
{
    int t;
    t = *pa;      /* save what pa points to */
    *pa = *pb;
    *pb = t;
}
As earlier, this function cannot change its arguments themselves, but as the arguments are pointers, it can very well alter the contents of the locations they point to. So by passing pointers as function arguments we can bypass the limitation of the 'call by value' technique.
So the assignments
px = &x[0];
and
px = x;
are equivalent.
Now, if we write the statement
px = px + 1; or px++;
it will advance px by 1, meaning px will now point to the next element of the array, that is, x[1]. More generally, if px points to x[0], then (px+i) will point to the element that is i elements after px, and this is true regardless of the size of the elements of the array. This means that *(px+i) is identical to x[i]. In fact, *(x+i) is the same as x[i], since x yields the address of x[0].
To visualize this fact we need to understand pointer arithmetic a little more clearly. When an integer is added to or subtracted from a pointer, the scalar size of the pointer comes into use. In fact, the integer is scaled by the scalar size. That is, in terms of byte addresses, the compiler treats the expression
*(px+i)
as
*(px + (i * scalar size of px))
= *(px + i * sizeof(int))
= x[i]
This indicates that (px+i) points to the element i places after the first element of the array if px points to the first element of the array, irrespective of the type of the array elements.
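The equivalences stated here can be verified directly. The sketch below is an illustration only: the function names same_elements and step_in_bytes are assumptions, and the casts to char * are used to measure distances in bytes.

```c
#include <stddef.h>

/* Returns 1 if *(px + i), *(x + i) and x[i] agree for every i. */
int same_elements(void)
{
    int x[5] = {10, 20, 30, 40, 50};
    int *px = x;                  /* same as px = &x[0] */
    int i;

    for (i = 0; i < 5; i++)
        if (px + i != &x[i] || *(px + i) != x[i] || *(x + i) != x[i])
            return 0;
    return 1;
}

/* Distance in bytes between px and px + 1, which is the scalar size. */
size_t step_in_bytes(void)
{
    int x[2] = {0, 0};
    int *px = x;

    return (size_t)((char *)(px + 1) - (char *)px);
}
```

On any system step_in_bytes() returns sizeof(int), which shows that adding 1 to an int pointer moves it by one element rather than by one byte.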
We should note that a statement like
x = px;
bound = limit-1;
do
{
    flag = 0;
    for (i = 0; i < bound; i++)
        if (x[i] > x[i+1])
        {
            temp = x[i];
            x[i] = x[i+1];
            x[i+1] = temp;
            flag = i;
        }
    bound = flag;
} while (bound);
return;
}
The version of the program given in Example 5.2 needs little discussion. The program
receives n (the number of elements to sort) from the user.
is new to us. The call to the function calloc(a, b), which is a library function, reserves a memory area for storing a elements, where the size of each element is b bytes, and returns a pointer to this area.
This pointer is of type pointer to void. To convert it to a pointer to integer we write
(int *) calloc(a, b);
which is known as casting. This function is used to allocate storage space dynamically (at run time). Another function that may be used to allocate storage space dynamically is the malloc() function. This also returns a pointer to void. A call to this function looks like
malloc(a)
It reserves a bytes in memory and returns a pointer to this memory area.
Check the first 'for' statement in the example, which reads n integer values from the standard input device and stores them in the locations pointed to by
a, a+1, a+2, ..., a+n-1
The function bubsort() receives the address of the first element and the number of elements to sort. The rest of the program is self-explanatory, except for the function call free(a). This function is used to deallocate the storage space pointed to by a, which was reserved earlier by calloc or malloc.
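The allocate, use, and free cycle described above can be sketched on its own. The function name sum_first_n is an assumption made for this illustration; it also checks the pointer returned by calloc, since the call yields a null pointer when the memory cannot be reserved.

```c
#include <stdlib.h>

/* Allocate n ints with calloc, store 1..n through pointer arithmetic,
   and return their sum. Returns -1 if the allocation fails. */
long sum_first_n(int n)
{
    int *a;
    int i;
    long sum = 0;

    if (n <= 0)
        return 0;
    a = (int *)calloc((size_t)n, sizeof(int));
    if (a == NULL)
        return -1;                /* no memory was reserved */
    for (i = 0; i < n; i++)
        *(a + i) = i + 1;         /* locations a, a+1, ..., a+n-1 */
    for (i = 0; i < n; i++)
        sum += a[i];
    free(a);                      /* release the area reserved by calloc */
    return sum;
}
```

After free(a) the area must not be used again; the pointer a still holds the old address, but that address no longer belongs to the program.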
Example 5.3: Program to print each of the programming languages together with their memory
address.
main ()
{
static char lang[MAX][LEN] = {
"FORTRAN", "BASIC", "COBOL", "PASCAL", "C", "Ada"
} ;
int i ,j ;
00AB FORTRAN
00B2 BASIC
00BC COBOL
00C6 PASCAL
00D0 C
00DA Ada
Note that the above program uses two 'for' loops to display the characters in the array. In the starting pass through the loop, the first printf() displays the address of lang[0][0] and then falls into the j loop, and in this loop it prints "FORTRAN". Then again it executes the outer loop and continues until all the language names are displayed.
Let us now modify the above program a little by removing the inner loop, which is controlled by j, and adding a %s (string) conversion character to the printf() function. After this modification the 'for' statement looks like
for(i=0; i<MAX; i++)
printf ("\n%p%s", &lang[i][0], lang+i);
In this case, the %p in printf () prints the addresses as before but %s in printf ()
prints a string from the given address lang +i. As we know, pointer arithmetic is always scaled
by the scalar size of the data item being pointed to, and lang holds the first address of a two-
dimensional array. Note that lang +i will be scaled as
lang + i*(scalar size of the two dimensional array)
The scalar for a two-dimensional array is the size of each element multiplied by the sec-
ond dimension in the array definition. So, lang +i converts to
lang + i*(sizeof( char)*8)
=lang + 8i
This is why the expression lang+i in the program increases by 8 each time i is incremented
by 1. In general, for a multidimensional array declaration like
type_specifier array_name[d1][d2] ... [dn];
the scalar size is given by
scalar_size = sizeof(type_specifier) * d2 * d3 * ... * dn.
To illustrate this fact, let us consider the declaration
float a [5] [8] [10] [12] ;
The corresponding scalar size is computed as
scalar_size = sizeof(float)*8*10*12
=4*8*10*12
=3840
It is to be noted that the first dimension is not used to determine the scalar size.
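The formula can be checked against the compiler by measuring how far a + 1 lies from a. This is a small sketch; the function name scalar_size_of_a is an assumption, and the value 3840 is obtained only where sizeof(float) is 4.

```c
#include <stddef.h>

float a[5][8][10][12];

/* Number of bytes by which a + 1 lies beyond a. */
size_t scalar_size_of_a(void)
{
    return (size_t)((char *)(a + 1) - (char *)a);
}
```

The result equals sizeof(float) * 8 * 10 * 12, which is also sizeof(a[0]); the first dimension, 5, plays no part in it.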
myarray: a r r a y \0   (myarray is an array name)
The ith element may be referred to as myarray[i]. The name myarray holds the address of the character 'a', the first element of the array. In the latter definition the meaning is completely different. Here the string constant "pointer" is pointed to by the pointer variable yourptr. Pictorially, it may be shown as in Fig. 5.4.
yourptr -> p o i n t e r \0   (yourptr is a pointer variable)
Example 5.4: Program to display the strings pointed by the array elements of an array of pointers.
}
From Fig. 5.5 it is obvious that *aop gives the pointer to the string "BASIC". Since aop is an array, (aop+i) is the address of aop[i], and hence *(aop+2) will give the pointer to "FORTRAN". So *(aop+2)[0] will give the character F, the first character of "FORTRAN". The program in Example 5.6 is written using this concept of pointers and is a combined form of the last two programs.
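The expressions discussed above can be tried on a small array of pointers. In this sketch aop[0] and aop[2] follow the text, while the middle entry "COBOL" and the function name first_char_of are assumptions, since the figure is not reproduced here.

```c
/* aop[0] and aop[2] follow the text; aop[1] is an assumed placeholder. */
const char *aop[] = { "BASIC", "COBOL", "FORTRAN" };

/* First character of the string pointed to by aop[i]. */
char first_char_of(int i)
{
    /* [] binds more tightly than unary *, so this expression is
       read as *((aop + i)[0]), that is, *aop[i]. */
    return *(aop + i)[0];
}
```

Here first_char_of(2) yields 'F' and first_char_of(0) yields 'B'. Writing (*(aop + i))[0] gives the same character and makes the grouping explicit.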
Example 5.6: A pointer version of the programs in Example 5.4 and Example 5.5.
#include <stdio.h>
main()
{
int x, y, operation;
double result, operate(), add(), subtract(), mul(), divide();
argv[0] -> f l u s h \0
argv[1] -> D e a r \0
argv[2] -> I \0
argv[3] -> A l w a y s \0
argv[4] -> R e m e m b e r \0
argv[5] -> Y o u \0
Program code to achieve this is listed below in Example 5.8. As an additional task it also displays the first characters of all these arguments. It treats the array elements as pointers. To be precise, it is a pointer version of the program and is essentially the same as the program code listed in Example 5.6, except for a few changes.
Example 5.8: The C code for the program flush.
while (n-- > 0)
    printf("%c", (*ptrarr++)[0]);   /* first character of each argument */
printf("\n");
return;
}
Another C program is presented in Example 5.9. It receives a date in the format dd-mm-yyyy from the command line and checks whether it is a valid date or not. This program is not only an example of command line arguments, but also covers many aspects relating to pointers in C. Note that the function convert receives a parameter ptrarr, which is a pointer to a pointer to character. This function highlights the way of changing the value of variables by passing pointers to the variables.
Example 5.9: The C code to check the validity of date given in command line.
#include <stdio.h>
int main(int argc, char *argv[])
{
    int d, m, y;
    int leap;
    if (--argc > 0)
    {
        convert(++argv, &d, &m, &y);
        /* leap is 1 for a leap year, 0 otherwise */
        leap = y % 4 == 0 && y % 100 != 0 || y % 400 == 0;
        printf("The date %s is %s \n", *argv,
               (valid(d, m, y, leap)) ? "valid" : "not valid");
    }
    else
        printf("Usage :: VALIDATE <dd-mm-yyyy>\n");
    return 0;
}
convert (char **ptrarr, int *pd, int *pm, int *py)
{
char *curptr, c;
int n;
In this chapter we have not discussed how pointers are related with structures. Lastly,
we make some final observations as follows:
(i) As a single fixed-size data item, pointers provide a homogeneous method of referenc-
ing any data structure regardless of the structures' type or complexity.
(ii) In some instances, pointers permit faster insertion and deletion of elements to and from a data structure.
3. What is the difference between an array name and a variable defined as a pointer?
4. Write a program to read a group of input lines, each containing one word. The program
should print each word that appears in input and the number of times it appeared.
5. Write a function strlast(str1, str2) which returns 1 if the string str2 occurs at the end of the string str1, otherwise the function returns 0.
6. Write a function strsearch which receives two character pointers as arguments to it and
returns a character pointer. The function searches the first string to see whether the second
string appears in it. If it is so, it returns a pointer to where the second string is in the first
string, otherwise, it returns a null pointer.
7. Write a program to read an integer (maximum upto nine digits) and print it in words.
STACKS AND QUEUES
In the earlier chapters we have seen the primitive data structures that are available in C. We have also looked at arrays and strings. The array data structure is implemented with memory as its storage structure, while the string data structure is implemented with arrays as its storage structure. There are many other simple but important data structures. These are simple because they can be implemented using arrays as their storage structures. Other implementations of these data structures are also possible. In this chapter, we consider two such important data structures, stacks and queues.
[Figure: the number 46 is divided by 2 repeatedly, giving 46, 23, 11, 5, 2, 1. The remainders 0, 1, 1, 1, 0, 1 are placed on a stack one after another, starting from an empty stack, and are then taken off from the top, one at a time, until the stack is empty again, producing the binary digits 101110.]
This example clearly shows that to convert a positive decimal integer to its binary equivalent, the remainders (after dividing by 2) are to be taken on a last-generated-first-taken basis, which is exactly how a stack behaves. Now we define a stack formally. A stack is a list or sequence of data items in which all insertions and deletions take place at one end, called the top of the stack. With this background we can now write an algorithm that converts a positive decimal integer to its binary equivalent and displays it. One such algorithm is given below.
Algorithm 6.1: Decimal to binary conversion algorithm
/* Algorithm to convert a decimal positive integer to its equivalent binary form and display it */
create(&stack);
while(number)
{
remainder = number %2;
push(&stack, remainder);
number/=2;
}
p r i n t f ("Equivalent binary representation is") ;
w h i l e (!empty(stack))
{
remainder=pop(&stack);
(printf ("%d", remainder);
}
pu t c h a r ('\n');
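Algorithm 6.1 can be tried out with a small array-based stack. The following is only a sketch under stated assumptions: the capacity STACKLIMIT of 32, the int element type, the choice to pass the stack by pointer to empty, and the helper name to_binary (which collects the digits in a string instead of printing them) are all illustrative and are not taken from the text.

```c
#include <stdio.h>

#define STACKLIMIT 32            /* assumed capacity; one slot per binary digit */

struct stacktype {
    int top;                     /* index of the top element, -1 when empty */
    int item[STACKLIMIT];
};

static void create(struct stacktype *s) { s->top = -1; }
static int  empty(const struct stacktype *s) { return s->top == -1; }

/* Returns 1 on success, 0 if the stack is already full. */
static int push(struct stacktype *s, int x)
{
    if (s->top == STACKLIMIT - 1)
        return 0;
    s->item[++s->top] = x;
    return 1;
}

/* The caller must make sure the stack is not empty. */
static int pop(struct stacktype *s) { return s->item[s->top--]; }

/* Hypothetical helper: writes the binary digits of number (number > 0)
   into out as a string; out needs room for STACKLIMIT + 1 characters. */
void to_binary(unsigned int number, char *out)
{
    struct stacktype stack;
    int n = 0;

    create(&stack);
    while (number) {
        if (!push(&stack, (int)(number % 2)))  /* remainders arrive low digit first */
            break;                             /* stack full: stop collecting */
        number /= 2;
    }
    while (!empty(&stack))
        out[n++] = (char)('0' + pop(&stack));  /* popped high digit first */
    out[n] = '\0';
}
```

With these definitions, calling to_binary(46, buf) leaves the string 101110 in buf, the same digits obtained in the worked example.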
[Figure: the array stack[0], stack[1], ... used as the storage structure, holding the remainders
generated so far with the stack top kept at stack[0].]
The next remainder generated by our algorithm for the integer 46 is 0. To push this remainder
onto the stack top we must first shift the values in stack[0] through stack[3] to stack[1]
through stack[4], and then put the remainder 0 in stack[0]. Clearly, as the stack grows, this
shifting becomes a real overhead. Likewise, when we pop an element from the stack top we must
shift the remaining elements up by one position so that the next element can be popped from
the stack top, which is a further overhead. How can we get rid of this overhead? One simple
solution is to use a variable top to keep track of the array index where the top element of the
stack is stored. In our case, this implementation works as follows. Whenever a remainder is
pushed onto the stack, the variable top is first increased by 1 and the remainder is then stored
in stack[top]. Fig. 6.4(a) shows the stack after the first four remainders have been stored.
When the next remainder 0 is pushed, the variable top is increased to 4 from 3 and then
the remainder is stored in stack [4]. This is shown in Fig. 6.4(b). So in this implementation the
storage structure for the stack is an array that holds the stack elements and there is a variable
top that holds an array index indicating the stack top element. This structure tells us to choose
the following declarations and definitions in C.
#define STACKLIMIT ..... /* Maximum size of stack */
typedef int elemtype; /* Type of item in the stack */
struct stacktype {
    int top; /* Stack top index */
    elemtype item[STACKLIMIT];
};
struct stacktype stack;
Let us revisit the implementation above with the idea that we want to store an integer, a
floating-point number or a string. In such a situation the typedef given above is not
sufficient, because a stack element may be an int, a float or a pointer to a character string.
Here we need the help of the union feature of C. Moreover, we must record what type of
element is stored at each array index of the stack. This suggests revising the declarations
and definitions as follows.
#define STACKLIMIT ......... /* Maximum size of stack */
#define INT 1
#define FLOAT 2
#define CHAR 3
struct stackitem {
    int itemtype;
    union {
        int i;
        float f;
        char *pc;
    } element;
};
struct stacktype {
    int top;
    struct stackitem item[STACKLIMIT];
};
struct stacktype stack;
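To see why the itemtype tag is needed, consider how a stored item has to be read back: the tag is the only way to know which member of the union is valid. The sketch below repeats the declarations with an assumed STACKLIMIT of 10 and adds a hypothetical helper, describe, which is not part of the text and only formats an item according to its tag.

```c
#include <stdio.h>

#define STACKLIMIT 10   /* assumed small capacity for the illustration */
#define INT   1
#define FLOAT 2
#define CHAR  3

struct stackitem {
    int itemtype;               /* INT, FLOAT or CHAR: tells which member is valid */
    union {
        int   i;
        float f;
        char *pc;
    } element;
};

struct stacktype {
    int top;
    struct stackitem item[STACKLIMIT];
};

/* Hypothetical helper: formats one item into buf according to its tag
   and returns buf. */
char *describe(const struct stackitem *it, char *buf, size_t len)
{
    switch (it->itemtype) {
    case INT:   snprintf(buf, len, "int %d", it->element.i);       break;
    case FLOAT: snprintf(buf, len, "float %.1f", it->element.f);   break;
    case CHAR:  snprintf(buf, len, "string %s", it->element.pc);   break;
    default:    snprintf(buf, len, "unknown");                     break;
    }
    return buf;
}
```

Reading element.f from an item whose tag is INT would give a meaningless value, which is why every access goes through the tag first.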
With this background we can now write the C code to implement the four basic stack
operations easily. Here we should notice the fact that when a stack is just created it is empty and
at that time the value of the variable top should be -1 (minus one). This is because when we
push an element to the stack we must increase this top and then store the element to stack.
To create an empty stack we must set the top member of the variable stack to -1. In
fact, this is the only thing that we need to do within the function. So if a pointer to the stack
ptrstk is passed as the argument to function create, then the only statement within the
function should be ptrstk-> top = -1;
The function empty must check whether the top member of the stack which is to be
passed as argument to the function is -1. If that is so, return 1, otherwise return 0. Hence, the
statement within empty is of the form
return ((stack.top == -1)?1:0) ;
The push function must receive two arguments: a pointer to the stack onto which an element
is to be pushed, and the element itself. Within the function we must check whether the stack
is already full; if it is, an error message must be given. Otherwise, we push the element onto
the stack top. We achieve this by first increasing the top member of the stack by 1, and then
copying the element to the top of the stack. The C code to do so is of the form
ptrstk->top++;
ptrstk->item[ptrstk->top] = element;
The pop function is the opposite of the push function. It also requires two arguments: a
pointer to the stack from which an element is to be popped, and a pointer (say ptrelement)
to the location that will receive the popped element. In this case the element is first copied
from the stack top and then the top member of the stack is decreased by 1. The function must
check whether the stack is empty before popping, because popping from an empty stack is an
error. The program code will look like
if (empty(stack))
struct stackitem {
    int itemtype;
    union {
        int i;
        float f;
        char *pc;
    } element;
};
struct stacktype {
    int top;
    struct stackitem item[STACKLIMIT];
};
main()
{
    int number, c, dum;
    struct stackitem info;
    struct stacktype stack;
    do {
        printf ("Enter the positive integer to convert
        scanf ("%d", &number);
        create (&stack);
        while (number)
        {
            info.itemtype = INT;
            info.element.i = number % 2;
            push (&stack, info);
            number >>= 1;
        }
        printf ("The equivalent binary representation is :");
        while (!empty (&stack))
        {
            pop (&stack, &info);
            printf ("%d", info.element.i);
        }

    {
        printf ("Convert : illegal attempt to pop from empty stack\n");
        exit (1);
    }
    /* copy the top item out, then move top down by one */
    *ptrinfo = ptrstk->item[(ptrstk->top)--];
    return;
}
void push (struct stacktype *ptrstk, struct stackitem x)
{
    if (ptrstk->top == STACKLIMIT-1)
    {
        printf ("Convert : illegal attempt to push to full stack\n");
        exit (1);
    }
    /* move top up by one, then store the new item there */
    ptrstk->item[++(ptrstk->top)] = x;
    return;
}
By definition, a list has no upper limit on the number of its members, so a stack should have
no upper limit either. As an abstract data structure, then, a stack has no condition for being
full. But an array is of fixed size, and because we chose an array as the storage structure of
the stack, we have to impose an upper limit on the number of members in the stack. This
implementation is therefore not a completely faithful representation of the stack data
structure. Later, in Chapter 9, we will see an alternative representation of a stack using a
linked list that imposes no such upper limit and is more flexible.
Now the next operator is encountered. It is pushed onto the stack, since the stack top symbol
'(' is assumed to have lower priority than any other operator. The operand 3 is found next and
it is sent to output directly. The output now reads
4 5 3
and the operator just pushed is at the stack top.
Finally, the right parenthesis ')' is encountered. When a ')' is encountered, the symbols
are popped from the stack and sent to output until a '(' is found at the stack top. This '(' is
popped from the stack but not sent to output.
[Figure: output and stack after the symbols down to the '(' have been popped.]
No other symbol is left in the input now. At this stage, the operators are popped from the
stack and sent to output until the stack is empty.
[Figure: the final output, consisting of the operands 4 5 3 followed by the operators popped
from the stack.]
The stack is now empty, and the output shows the RPN expression for the given infix
expression. Notice that although the infix expression contains a set of parentheses, none
appears in the RPN expression. An algorithm that transforms an infix arithmetic expression to
its equivalent RPN expression is presented below in its general form. The algorithm can be
extended to convert logical infix expressions by incorporating the logical operators.
Algorithm 6.2: Infix to RPN conversion algorithm
/* Algorithm to transform an arithmetic infix notation expression to RPN expression */
Remaining infix         RPN output                  Stack (top at right)   Action
8+(((7-5)*(9-4)+6)/4)                               (empty)
+(((7-5)*(9-4)+6)/4)    8                           (empty)                output 8
(((7-5)*(9-4)+6)/4)     8                           +                      push +
((7-5)*(9-4)+6)/4)      8                           + (                    push (
(7-5)*(9-4)+6)/4)       8                           + ( (                  push (
7-5)*(9-4)+6)/4)        8                           + ( ( (                push (
-5)*(9-4)+6)/4)         8 7                         + ( ( (                output 7
5)*(9-4)+6)/4)          8 7                         + ( ( ( -              push -
)*(9-4)+6)/4)           8 7 5                       + ( ( ( -              output 5
*(9-4)+6)/4)            8 7 5 -                     + ( (                  pop and output -, pop (
(9-4)+6)/4)             8 7 5 -                     + ( ( *                push *
9-4)+6)/4)              8 7 5 -                     + ( ( * (              push (
-4)+6)/4)               8 7 5 - 9                   + ( ( * (              output 9
4)+6)/4)                8 7 5 - 9                   + ( ( * ( -            push -
)+6)/4)                 8 7 5 - 9 4                 + ( ( * ( -            output 4
+6)/4)                  8 7 5 - 9 4 -               + ( ( *                pop and output -, pop (
6)/4)                   8 7 5 - 9 4 - *             + ( ( +                pop and output *, push +
)/4)                    8 7 5 - 9 4 - * 6           + ( ( +                output 6
/4)                     8 7 5 - 9 4 - * 6 +         + (                    pop and output +, pop (
4)                      8 7 5 - 9 4 - * 6 +         + ( /                  push /
)                       8 7 5 - 9 4 - * 6 + 4       + ( /                  output 4
end of infix            8 7 5 - 9 4 - * 6 + 4 /     +                      pop and output /, pop (
end of infix            8 7 5 - 9 4 - * 6 + 4 / +   (empty)                pop and output +
Fig. 6.5 Execution of Algorithm 6.2
Figure 6.5 shows that the RPN form of the infix expression is
8 7 5 - 9 4 - * 6 + 4 / +
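The conversion traced in Fig. 6.5 can be sketched in C as follows. This is an illustrative sketch, not the listing of Example 6.2: it assumes single-digit operands, only the operators + - * /, and an operator stack of 64 symbols, and the names priority and infix_to_rpn are hypothetical. The output buffer must hold at least twice the length of the input plus one character.

```c
#include <ctype.h>

#define STACKLIMIT 64            /* assumed capacity of the operator stack */

/* Priority of a symbol; '(' is lowest so that any operator stacks above it. */
static int priority(char op)
{
    switch (op) {
    case '*': case '/': return 2;
    case '+': case '-': return 1;
    default:            return 0;     /* '(' */
    }
}

/* Converts infix to RPN, separating tokens by single spaces.
   Returns 0 on success, -1 on an unexpected character,
   unbalanced parentheses or a full stack. */
int infix_to_rpn(const char *infix, char *rpn)
{
    char stack[STACKLIMIT];
    int top = -1, n = 0;

    for (; *infix != '\0'; infix++) {
        char c = *infix;
        if (isspace((unsigned char)c))
            continue;
        if (isdigit((unsigned char)c)) {          /* operand: straight to output */
            rpn[n++] = c; rpn[n++] = ' ';
        } else if (c == '(') {
            if (top == STACKLIMIT - 1) return -1;
            stack[++top] = c;
        } else if (c == ')') {                    /* unstack down to the matching '(' */
            while (top >= 0 && stack[top] != '(') {
                rpn[n++] = stack[top--]; rpn[n++] = ' ';
            }
            if (top < 0) return -1;               /* no matching '(' */
            top--;                                /* discard the '(' */
        } else if (c == '+' || c == '-' || c == '*' || c == '/') {
            /* pop operators of equal or higher priority first */
            while (top >= 0 && priority(stack[top]) >= priority(c)) {
                rpn[n++] = stack[top--]; rpn[n++] = ' ';
            }
            if (top == STACKLIMIT - 1) return -1;
            stack[++top] = c;
        } else {
            return -1;
        }
    }
    while (top >= 0) {                            /* flush the remaining operators */
        if (stack[top] == '(') return -1;         /* unmatched '(' */
        rpn[n++] = stack[top--]; rpn[n++] = ' ';
    }
    if (n > 0) n--;                               /* drop the trailing space */
    rpn[n] = '\0';
    return 0;
}
```

For the input 8+(((7-5)*(9-4)+6)/4) this sketch produces the string 8 7 5 - 9 4 - * 6 + 4 / +, in agreement with Fig. 6.5.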
Let us use this RPN expression to illustrate the process of evaluating an RPN expression.
The RPN expression is scanned from left to right. When an operator is found, the operator and
the two operands immediately preceding it are removed from the expression and combined to
yield a new sub-expression. The value of this sub-expression is then put back into the RPN
expression at the point from which the operands and the operator were removed, and scanning
resumes. Finally only one operand, and no operator, remains in the RPN expression. This
operand's value is the value of the expression. The process is carried out below on the
expression above; each line shows the expression after one more operator has been applied.
8 7 5 - 9 4 - * 6 + 4 / +
8 2 9 4 - * 6 + 4 / +
8 2 5 * 6 + 4 / +
8 10 6 + 4 / +
8 16 4 / +
8 4 +
12
Evaluating the RPN expression gives 12, which is the value of the original expression. This
method suggests that, in the left-to-right scan, the operands must be stored until an operator
is encountered. At that point the last two operands must be retrieved so that the operator can
be applied to them. The operands are thus retrieved in last-in-first-out order, and hence a
stack should be used to store them. Each time an operand is found it is pushed onto a stack.
When an operator is found, two operands are popped from the stack, the operator is applied to
them and the resulting value is pushed back onto the stack. This process continues until,
finally, one element is left in the stack, which gives the value of the expression.
A formal algorithm that evaluates an RPN expression is presented below.
Algorithm 6.3: Evaluation of an RPN expression
/* Algorithm to evaluate RPN expression */
Step 1: Create an empty stack
        /* stack elements must be of the type of the operands */
Step 2: Repeat the following
        i.  Get next token from RPN expression;
            /* Token may be a constant, variable or
               arithmetic operator */
        ii. If (token is an operand)
                push it onto stack
            else do the following:
                a. pop two elements from stack;
                   (If not available there is an error
                   due to malformed RPN expression)
                b. apply the operator on these two
                   elements;
                c. push the resulting value onto stack;
        until (end of RPN expression);
Step 3: Pop the only value on stack top and send it to
        output as the value of the expression. (In fact,
        there will be only one value in the stack.)
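The three steps of Algorithm 6.3 can be sketched in C as below. The sketch makes several assumptions that are not in the text: operands are non-negative integer constants, tokens are separated by spaces, the stack holds at most 64 values, and the function name evaluate_rpn and its error convention are hypothetical.

```c
#include <ctype.h>

#define STACKLIMIT 64            /* assumed capacity of the operand stack */

/* Evaluates an RPN string and stores the value in *result.
   Returns 0 on success, -1 if the expression is malformed,
   divides by zero or needs more than STACKLIMIT operands. */
int evaluate_rpn(const char *rpn, int *result)
{
    int stack[STACKLIMIT];
    int top = -1;                           /* Step 1: empty stack */

    while (*rpn != '\0') {                  /* Step 2: scan the tokens */
        if (isspace((unsigned char)*rpn)) {
            rpn++;
        } else if (isdigit((unsigned char)*rpn)) {
            int value = 0;                  /* operand: read all its digits */
            while (isdigit((unsigned char)*rpn))
                value = value * 10 + (*rpn++ - '0');
            if (top == STACKLIMIT - 1) return -1;
            stack[++top] = value;
        } else {
            int right, left;
            if (top < 1) return -1;         /* fewer than two operands */
            right = stack[top--];           /* popped first: right operand */
            left  = stack[top--];
            switch (*rpn++) {
            case '+': left += right; break;
            case '-': left -= right; break;
            case '*': left *= right; break;
            case '/': if (right == 0) return -1;
                      left /= right; break;
            default:  return -1;            /* unknown operator */
            }
            stack[++top] = left;            /* push the result back */
        }
    }
    if (top != 0) return -1;                /* Step 3: exactly one value left */
    *result = stack[0];
    return 0;
}
```

Applied to the string 8 7 5 - 9 4 - * 6 + 4 / + the function stores 12 in *result, the value obtained by hand above. Note that the operand popped first is the right-hand operand, which matters for - and /.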
An execution of the above algorithm is given in Fig. 6.6 for illustration. Here, too, an
upward arrow (↑) indicates the current token in the RPN expression. The same RPN
expression is used so that the algorithm is easy to follow.
Example 6.2
struct stackitem {
int itemtype;
union {
int i;
float f;
char c;
} element;
};
struct stacktype {
int top;
struct stackitem item [STACKLIMIT];
};
main()
{
char rpn[SIZE], infix[SIZE];
int value;
switch (operator)
{
info.itemtype = INT;
info.element.i = val1;
push (&opstack, info);
}
token = gettoken(rpn, &index, &tokenop, tokenstr);
}
if (opstack.top != 0)
{ /* if stack does not contain one element only */
printf ("Error in evaluation or malformed RPN
expression\n");
exit(1);
}
pop (&opstack, &info);
return (info.element.i);
}
gettoken (char s[], int *ptri, char *ptrop, char str[])
{
int i, j, token;
char c;
break;
default : if (c >= '0' && c <= '9') {
              /* copy the run of digits into str */
              for (j = 0; (s[i] >= '0' && s[i] <= '9'); i++)
                  str[j++] = s[i];
              str[j] = '\0';
              *ptri = i;
              token = 1;
          } else
              token = -1;
          break;
}
return token;
}
A complete C program that receives an infix expression, converts it to an RPN expression and
then evaluates it is given in Example 6.2. The main program calls two functions. The first,
infix_to_RPN, translates an infix expression to its equivalent RPN expression using
Algorithm 6.2. The second, evaluate, takes the generated RPN expression and evaluates it
using Algorithm 6.3. The operands of the infix expression are assumed to be integers only.
The program can be extended to handle logical expressions as well.
four integers are there in the queue. Now if we try to insert another new integer it will not be
possible to do so unless we shift the queue elements to the left side. Clearly this shifting is a slow
process. Obviously, it would be a good solution to this problem if we can bring the rear to the
beginning of the array instead of setting its value to MAX in this situation. So we must treat our
array as a circular one as in Fig. 6.9.
[Fig. 6.9: the queue array arranged as a circle, with the positions of front and rear marked.]
Let us now look at the whole process from the beginning, keeping in mind that our array
is circular. The initial queue is shown in Fig. 6.10(a), where front and rear have the same
value 0. After inserting 70, 90 and 50 the queue looks as in Fig. 6.10(b): front holds the
index 0 and rear holds the index 3. After two elements are removed from the front of the
queue, it takes the form of Fig. 6.10(c). If we now insert five more integers at the rear of
the queue, say 60, 80, 85, 95 and 99, the queue takes the form of Fig. 6.10(d).
In this last situation front and rear hold the same value 2. This creates a difficulty: with
this implementation we cannot distinguish an empty queue from a full queue, because in both
situations front equals rear. To see this, note that if all six integers are removed from the
front of the queue it takes the form given in Fig. 6.10(e), where front again equals rear.
Note that when front and rear hold the value (MAX - 1 ) the very next value that front and
rear get is 0. This means that after removing an element, the front is changed using the assign-
ment
front = (front +1) % MAX;
Similarly, rear changes after inserting an element to the queue by the assignment
rear = (rear +1) % MAX;
To avoid the anomaly between an empty queue and a full queue, we can introduce a
restriction that a queue implemented using an array of size MAX must not have more than
(MAX - 1 ) elements. Then the status of a queue is full when the condition
((rear + 1) % MAX == front)
is fulfilled. Obviously, the status of the queue is empty if
(rear == front)
is fulfilled.
Summing up the discussion above, to implement a queue we may use a C structure containing a
circular array that stores the queue elements, together with front and rear, which hold the
position of the first element and the position following the last element of the queue,
respectively.
#define MAX ..... /* Maximum size of the queue array */
typedef ......... Q_type; /* the type of element stored in queue */
struct Q_type {
    int front,
        rear;
    Q_type element[MAX];
};
struct Q_type queue;
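The circular-array rules above (advance front and rear modulo MAX, and keep at most MAX - 1 elements) can be put together in a short sketch. The element type int, the value 6 for MAX, the structure tag queuetype and the helpers createQ, emptyQ and fullQ are assumptions made for the illustration; addQ and removeQ follow the names used in this chapter, but their signatures here are not taken from Example 6.3.

```c
#define MAX 6                    /* array size; the queue holds at most MAX - 1 items */

typedef int Q_type;              /* assumed element type */

struct queuetype {
    int front, rear;             /* front: first element; rear: slot after the last */
    Q_type element[MAX];
};

void createQ(struct queuetype *q) { q->front = q->rear = 0; }

int emptyQ(const struct queuetype *q) { return q->rear == q->front; }

int fullQ(const struct queuetype *q) { return (q->rear + 1) % MAX == q->front; }

/* Inserts x at the rear; returns 1 on success, 0 if the queue is full. */
int addQ(struct queuetype *q, Q_type x)
{
    if (fullQ(q))
        return 0;
    q->element[q->rear] = x;
    q->rear = (q->rear + 1) % MAX;   /* wrap around to 0 after MAX - 1 */
    return 1;
}

/* Removes the front element into *x; returns 1 on success, 0 if empty. */
int removeQ(struct queuetype *q, Q_type *x)
{
    if (emptyQ(q))
        return 0;
    *x = q->element[q->front];
    q->front = (q->front + 1) % MAX;
    return 1;
}
```

Because one slot is always left unused, rear == front can only mean that the queue is empty, so the two states are no longer confused. With MAX equal to 6, a fifth insertion succeeds but a sixth is refused.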
A C program that implements all basic queue operations is given in Example 6.3. The
program is a multiplication skill test program that gives you some multiplication problems.
Wrongly answered problems are queued and asked again at the end of the session.
Example 6.3
/* C listing to show all basic queue operations. This is a
multiplication skill test program */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
count=0;
do{
++count;
removeQ (&wrong_queue, &question);
if (!query(question, 2))
wrong1++;
else
score++;
} while (count < wrong);
}
printf("You made %d correct out of %d\n", num - wrong1, num);
printf ("You scored %d points.\n", score);
exit(0);
}
The program above is self-explanatory and involves all the standard queue operations.
On execution of the program, one gets the information on how to use it.
EXERCISES
1. Convert each of the following infix expressions to postfix.
(a) X - Y + Z
(b) X + Y/Z + W
(c) (X+Y)/Z + W
(d) X - (Y - (Z - (W - U)))
2. Convert each of the following postfix expressions to infix.
(a) X Y + Z -
(b) X Y + Z - W U * /
(c) X Y Z W - + *
(d) X Y / Z W / /
3. Write C functions to convert
(a) A prefix string to infix notation
(b) A postfix string to prefix notation
4. Convert the following boolean expressions to Reverse Polish Notation.
(a) X && (Y || !Z)
(b) ( X || ( Y && !Z ) && ( W || U )
(c) ((X < 7) && (X > 9)) || !(X > 0)
(d) (X != Y) && (Z != W)
5. Modify the program in Example 6.2 so that it can accept the binary operator % (mod) and
unary operator - (minus).
6. Write an expression evaluator in C that accepts an infix expression involving logical
operators (&&, ||, !) and relational operators (<, >, <=, >=, ==, !=), converts it to RPN,
and then evaluates it to 1 or 0 depending on whether it is true or false, respectively.
7. A stack is to be implemented so that a stack member should hold a list of integers. Imple-
ment such a stack. Also write the push and pop routines that can be used on such a stack.
8. Instead of a stack, implement a queue so that each element of the queue holds a list of
integers. Write the functions addQ and removeQ for such a queue.
RECURSION
Recursive algorithms are especially useful in manipulating data structures that are themselves
defined recursively. Whenever a data object is defined recursively, it is often easy to describe
algorithms that work on such objects recursively. Not all languages have a recursive facility
(BASIC and COBOL, for example, traditionally did not), but most newer programming languages
do. If our programming language does not allow recursion, that need not matter, because a
recursive program can always be translated into a non-recursive version. The C language
supports recursion directly, and we will show how recursion can be implemented in C. A
discussion of recursion will help readers understand its use in data structures.
In this chapter we introduce recursion, a problem-solving technique often used in computer
science. The basic objective is to provide a variety of examples, along with explanations, for
designing and understanding recursion. We confine ourselves to the basic concept of recursive
algorithms and the way they can be implemented in C.
Suppose we want to compute 3!. From the above discussion, we obtain the following
result
3! = 3 * (3-1)!
   = 3 * 2!
   = 3 * 2 * (2-1)!
   = 3 * 2 * 1!
   = 3 * 2 * 1 * (1-1)!
   = 3 * 2 * 1 * 0!
   = 3 * 2 * 1 * 0 * (0-1)!
   = 3 * 2 * 1 * 0 * (-1)!
Note that the asterisk (*) represents multiplication. This calculation exposes two major
problems: (i) no termination of the recursion is provided and (ii) n! will always evaluate
to zero for n > 0.
To solve these problems, we divide the recursive definition into two cases: a base case and
a general case.
0! = 1 /* The base case */
n! = n * (n-1)!, n > 0 /* The general case */
The base case is the non-recursive part of the definition, which terminates the recursion.
The general case is the recursive part of the definition.
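The two cases translate almost line for line into C. The following sketch borrows the name Factorial_Recursion from the trace shown in this chapter, but its signature is an assumption: the argument is unsigned so that negative values cannot arise, and the result is an unsigned long so that somewhat larger factorials fit.

```c
/* n! from the two-case definition.
   The result overflows for large n; on a 32-bit unsigned long
   values up to 12! are safe. */
unsigned long Factorial_Recursion(unsigned int n)
{
    if (n == 0)                              /* base case: 0! = 1 */
        return 1UL;
    return n * Factorial_Recursion(n - 1);   /* general case: n! = n * (n-1)! */
}
```

Because the base case returns 1 instead of continuing to (0-1)!, the chain of multiplications stops at 0! and no factor of zero is introduced; Factorial_Recursion(3) yields 6.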
We now implement the recursive version of the factorial program in C. For ease of
understanding, we first illustrate the non-recursive version of the factorial program.
Example 7.1
Level 1: Factorial_Recursion(3)
return(3 * Factorial_Recursion(2))
[from General case]
Level 2: Factorial_Recursion(2)
return(2 * Factorial_Recursion(1))
[from General case]
Level 3: Factorial_Recursion(1)
return(1* Factorial_Recursion(0))
[from General case]
void count(int);
int val;
printf("Count to what value?");
scanf("%d", &val);
level =0;
count(val);
}
Let us see how the recursion process works.
The program begins executing in the main function. Suppose this function calls count() with
an argument of 4 (i.e., the value of val is 4). The call count(4) begins its execution while the
main function is on hold. It displays information about the level of recursion. Since the
argument of count (now 4) is greater than 1, the function calls count() with an argument of 3.
count(3) displays information about the level of recursion and, since its parameter is greater
than 1, calls count() with an argument of 2. At this point, main(), count(4) and count(3) are
on hold, and count(2) begins its execution. Just as the two previous calls to count() did,
count(2) displays information about the level of recursion and then checks whether its
argument is greater than 1. Since the condition is true, it calls count(1); main(), count(4),
count(3) and count(2) are now on hold, and count(1) starts its execution. count(1) displays its
information about the recursion level. This time the test condition of the if statement fails.
Control therefore transfers to the next statement in the same function, which is the call to
printf() that displays the value of val on leaving. Note that the last version of count()
called is the first one to finish its work; this is a general feature of recursive function
calls. We can see this in the output given below.
Count to what value? 4
Starting count at level 1: val =4
Starting count at level 2: val =3
Starting count at level 3: val =2
Starting count at level 4: val =1
Leaving count at level 4: val =1
Leaving count at level 3: val =2
Leaving count at level 2: val =3
Leaving count at level 1: val =4
A recursive function must always test whether it can stop before calling another version
of itself. If a recursive call is made, the parameters in some calls should eventually have values
that will make further recursive calls unnecessary.
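A count() function that behaves as described and produces output in the format shown above might look as follows. This is a reconstruction under assumptions, not the book's listing: the global variable level comes from the fragment of the program shown earlier, while max_level is added here only to record how deep the recursion went.

```c
#include <stdio.h>

int level = 0;        /* current depth of recursion */
int max_level = 0;    /* deepest level reached (added for illustration) */

void count(int val)
{
    level++;
    if (level > max_level)
        max_level = level;
    printf("Starting count at level %d: val =%d\n", level, val);
    if (val > 1)                 /* test before calling another version */
        count(val - 1);
    /* reached first by the deepest call, then by each caller in turn */
    printf("Leaving count at level %d: val =%d\n", level, val);
    level--;
}
```

The test val > 1 is what stops the recursion: each call passes a smaller argument, so the argument eventually reaches 1 and no further call is made.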
Suppose we want to write a recursive routine power(x,y) that raises the value contained in x
to the yth power. For example, to calculate 5 raised to the power 4, power(5,4), we are really
solving for 5 * power(5,3). Likewise, power(5,3) is equivalent to 5 * power(5,2). When the
power to which the value is raised reaches 0, the ending value 1 is returned to the calling
function.
If the power() function is called as power(5,2), the following processing is performed:
power(5,2) returns 5 * power(5, 2-1)
    power(5,1) returns 5 * power(5, 1-1)
        power(5,0) returns 1 by definition
    5  ------------- result of power(5,1)
25 ------------- result of power(5,2)
We will now present the recursive routine for the power function.
Example 7.5
/* Power function */
/* Returns x raised to the power y, */
/* or -1 if y is negative */
float power(float x, float y)
{
    if (y < 0)
        return(-1.0);
    else
        if (y == 0) /* Any value raised to zero is one */
            return(1.0);
        else
            return(x * power(x, y - 1));
}
information. The stack holds all the information of every function that has been invoked but not
yet completed. Let us consider a simple program to illustrate the use of the stack.
Example 7.6
/* Recursive call */
#include <stdio.h>
int main(void)
{
    static int x=0;
    x++;
    x<7 ? main() : x++;
    printf("%2d", x);
    return 0;
}
Output: The output of the programme will be: 8 8 8 8 8 8 8
In the above program main() calls itself, which is something programmers do not commonly do.
The integer variable x is declared with storage class static, so it is initialized only once
and retains its changed value between calls. Thus x is zero the first time and, at each
subsequent call, holds the value incremented so far. Each time main() is called, its pending
printf() statement is stored on the stack, where it remains until the function is completed.
The condition x<7 means that main() calls itself six times, so six printf() statements are
stored on the stack. Since functions always complete in the reverse of the order in which they
are invoked, this information is naturally held in a stack (last-in-first-out). Finally, when
the condition x<7 fails, the value of x is incremented again (i.e., x = 7 + 1 = 8). The
information is then popped off the stack and each printf() statement is executed with the
latest value of x. The process is shown in Fig. 7.1.
[Fig. 7.1: the run-time stack of pending printf() statements: (a) Stack, (b) After first call
(x=1), (c) After second call (x=2), (d) After final call (x=7).]
It is always possible to exhaust the space allocated to the run-time stack and get a stack
overflow error. Besides this, there is the time required to store the remaining statements on
the stack when a function is invoked and to release the space when it is completed. Function
invocation involves a great deal of overhead and can be a time-consuming operation.
Therefore, recursive algorithms that involve a great deal of function linkage can sometimes
run more slowly than iterative programs that solve the same problem.
We will now present another recursive function which is often used in string manipulation.
The following routine counts the number of characters contained in a string via a recur-
sive process.
int String_length(char *String)
{
    if (*String == '\0') /* End of the string */
        return(0);
    else
        return(1 + String_length(++String));
        /* Recursive call */
}
The routine examines the character referenced by *String to see if it is the null character
'\0' (the ending condition). If *String is not the null character, the value returned is
return(1 + String_length(++String));
If *String is the null character, the routine returns zero. If the routine is invoked with the
string "Hello", the following processing will take place:
String_length(String); /* Return 1 */
1st time: String points to 'H'
return(1 + String_length(++String));
maintained as they are moved to peg3. The discs are to be stacked from the largest to the
smallest, beginning from the bottom.
While moving the discs from peg1 to peg3, the following rules are to be observed.
(i) Only one disc can be moved at a time.
(ii) No larger disc can be placed on top of a smaller disc.
(iii) An auxiliary peg, peg2, can be used as an intermediate to hold one or more discs
while they are being moved from their source peg (peg1) to their destination peg (peg3).
Let us illustrate the process for N = 1, 2 and 3, and then state the solution in general.
Suppose there is only one disc, that is, N=1. The solution is very simple: merely move the
disc from peg1 to peg3, as shown in Figs. 7.2(a) and 7.2(b).
If there are two discs on peg1, move the top disc from peg1 to peg2 and then move the
second disc from peg1 to peg3. Finally, move the top (first) disc from peg2 to peg3. The
transfer procedure is shown in Figs. 7.3(a) - 7.3(d).
The problem of transferring discs from peg1 to peg3 is somewhat more complex when N is 3.
In this case, move the first two discs from peg1 to peg2 using peg3 as an intermediate. Then
move the third disc from peg1 to peg3. Next, move the two discs from peg2 to peg3 using peg1
as an intermediate. The entire procedure is shown in Fig. 7.4. Fig. 7.5 illustrates the same
problem when N is 4.
For general N, use the technique already established for transferring three discs. First,
move (N-1) discs from peg1 to peg2 using peg3 as an intermediate. Then move the remaining
disc from peg1 to peg3. Then apply the same technique again to move the (N-1) discs from
peg2 to peg3 using peg1 as an intermediate.
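The general procedure also tells us how many single-disc moves are needed: moving N discs takes the moves for (N-1) discs, one move of the remaining disc, and the moves for (N-1) discs again. The short sketch below only counts the moves by following that description; the function name hanoi_moves is hypothetical and is not part of the text.

```c
/* Number of single-disc moves made by the three-step procedure:
   moves(N) = moves(N-1) + 1 + moves(N-1), with moves(0) = 0. */
unsigned long hanoi_moves(unsigned int n)
{
    if (n == 0)
        return 0UL;                           /* nothing to move */
    return 2UL * hanoi_moves(n - 1) + 1UL;    /* two sub-transfers plus one move */
}
```

For N = 3 this gives 7 moves and for N = 4 it gives 15, which agrees with the number of moves shown in Figs. 7.4 and 7.5.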
[Figure: pegs A, B and C with a single disc.]
Fig. 7.2 (a) Initial configuration when N=1 (b) Final configuration when N=1
[Fig. 7.3 (a) - (d): configurations of the two discs on pegs A, B and C when N=2.]
[Figure: pegs A, B and C with three discs, one panel per move:]
(a) Initial placement of discs when N=3
(b) Move top disc from A to C
(c) Move second disc from A to B
(d) Move disc from C to B
(e) Move disc from A to C
(f) Move top disc from B to A
(g) Move disc from B to C
(h) Move disc 1 from A to C
Fig. 7.4 (a) - (h) Initial to final configuration when number of discs is 3
[Figure: pegs A, B and C with four discs, one panel per move:]
(a) Initial configuration of discs when N=4
(b) Move disc from A to B
(c) Move disc from A to C
(d) Move disc from B to C
(e) Move disc from A to B
(f) Move disc from C to A
(g) Move disc from C to B
(h) Move disc from A to B
(i) Move disc from A to C
(j) Move disc from B to C
(k) Move disc from B to A
(l) Move disc from C to A
(m) Move disc from B to C
(n) Move disc from A to B
(o) Move disc from A to C
(p) Move disc from B to C (Final configuration)
Fig. 7.5 (a) - (p) Initial to final configuration when number of discs is 4
Recursion is a powerful tool when used properly, but there are trade-offs. Recursion can
simplify difficult programming tasks, such as the Tower of Hanoi. It is doubtful whether a
programmer could have developed a non-recursive solution to the Tower of Hanoi problem
directly from the problem statement. A non-recursive solution that has to manage its own stack
is more difficult to write and more prone to error; and when the stack cannot be eliminated
from the non-recursive version, the recursive version can be as fast as or faster than the
non-recursive version under a standard compiler.
In general, though, a non-recursive version of a program executes more efficiently in terms
of time and space than a recursive version, because the overhead of entering and exiting a
function on every call is not incurred in the non-recursive routine, and in a non-recursive
program the stack activity can often be eliminated. It is usually possible to convert a
recursive function to a non-recursive function and vice versa; how much effort the conversion
takes depends largely on the knowledge of the programmer. Finally, even though C supports
recursion, a recursive solution to a problem is often more expensive than a non-recursive
solution, both in time and in space. Frequently this expense is a small price to pay for the
simplicity and self-documentation of the recursive solution. We now present the program for
the Tower of Hanoi problem.
Example 7.7
#include <stdio.h>
/* Recursive routine for Tower of Hanoi */
/* Moves N discs from the source peg1 to the */
/* destination peg3, using peg2 as the auxiliary; */
/* N is given as input */
void Tower_of_Hanoi(int N, char peg1, char peg2, char peg3)
{
    if (N < 1)
        printf("\n No disc is present on peg1\n");
    else if (N == 1)
        printf("Move disc from %c to %c\n", peg1, peg3);
    else
    {
        /* move N-1 discs out of the way, onto the auxiliary peg */
        Tower_of_Hanoi(N-1, peg1, peg3, peg2);
        printf("Move disc from %c to %c\n", peg1, peg3);
        /* bring the N-1 discs from the auxiliary peg to the destination */
        Tower_of_Hanoi(N-1, peg2, peg1, peg3);
    }
}
/* Main program */
int main(void)
{
    int N; /* Number of discs to be moved */
    char peg1, peg2, peg3;
    peg1 = 'A';
    peg2 = 'B';
    peg3 = 'C';
    printf("\n Enter the number of discs to be moved:");
    scanf("%d", &N);
    Tower_of_Hanoi(N, peg1, peg2, peg3);
    return 0;
}
Output:
Sample run is given below:
Enter the number of discs to be moved: 3
Move disc from A to C
Move disc from A to B
Move disc from C to B
Move disc from A to C
Move disc from B to A
Move disc from B to C
Move disc from A to C
In the above program, we call the function Tower_of_Hanoi from within the function
Tower_of_Hanoi itself. This is a recursive call. Although such a function is easy to write,
it is somewhat difficult to understand thoroughly. While describing the process of a
recursive call, it will become evident that a stack is essential for implementing it.
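The shape of the recursion also tells us how many single-disc moves the routine makes: moving N discs costs two moves of N-1 discs plus one move of the largest disc. The following is only a small sketch of that count; the function name hanoi_moves is our own and does not appear in the program above.

```c
/* Number of single-disc moves made by the recursive scheme:
   moves(N) = moves(N-1) + 1 + moves(N-1), with moves(0) = 0. */
unsigned long hanoi_moves(int n)
{
    if (n < 1)
        return 0;
    return hanoi_moves(n - 1) + 1 + hanoi_moves(n - 1);
}
```

For three discs this gives the seven moves of the sample run, and for four discs the fifteen moves that separate the sixteen configurations of Fig. 7.5.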
7.6 EXAMPLES
In this section we present more examples to illustrate the design, development, and implemen-
tation of recursive functions. In the subsequent chapters, we will come across examples that are
more easily solved by recursion. We will present here only programs and recursive routines
without describing the construction of programs. The reader will understand the actual process
through the output which will be given after each program.
Example 7.1: The Fibonacci sequence is the sequence of integers
0, 1, 1, 2, 3, 5, 8, 13, 21, ...
Each element in the sequence is the sum of the two preceding elements. We can define the
sequence by the following recursive expressions:
fibo(n) = n if n = 0 or 1
fibo(n) = fibo(n-2) + fibo(n-1) if n >= 2
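The two expressions map almost word for word onto a C function. The following is a minimal sketch under our own assumptions: the name fibo is taken from the definition, while the parameter type, the return type, and the lack of a check for negative n are choices made only for illustration.

```c
/* Recursive Fibonacci, following the definition:
   fibo(n) = n                     for n = 0 or 1
   fibo(n) = fibo(n-2) + fibo(n-1) for n >= 2
   Assumes n is non-negative (not checked here). */
long fibo(int n)
{
    if (n == 0 || n == 1)
        return n;
    return fibo(n - 2) + fibo(n - 1);
}
```

Each call with n >= 2 makes two further calls, so the amount of work grows very quickly with n.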
We will now present the recursive version of the above sequence.
Example 7.8
/* main */
#include <stdio.h>
main( )
{
#include <stdio.h>
#include <string.h>
char A[16];
int i = 0;
int main(void)
{
    int number, base, l, p, q, t;
    printf("\n Enter number and base:");
    scanf("%d, %d", &number, &base);
    Convert(number, base);
    l = strlen(A); /* The length is known only after Convert has filled A */
    for (p = 0, q = l-1; p < q; p++, q--)
    {
        t = A[p];
        A[p] = A[q]; /* Swap A[p] and A[q] */
        A[q] = t;
    }
    printf("\n%s", A);
    return 0;
}
#include <stdio.h>
/* This function returns the number of bits in a word */
/* which are 1 */
int bitcount(unsigned int word)
{
    if (word == 0)
        return 0;
    else
        return 1 + bitcount(word & (word-1));
}
/* Main program */
int main(void)
{
    unsigned int word;
    printf("\n Enter an unsigned word:");
    scanf("%u", &word);
    /* Call the bit count function */
    printf("\n Number of 1's in the word = %d", bitcount(word));
    return 0;
}
Input:
Enter an unsigned word:15
Output:
Number of 1's in the word = 4
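The expression word & (word-1) clears the lowest 1 bit of word, so the recursion goes exactly as deep as there are 1 bits. For comparison, the same idea can be written without recursion. This is only a sketch, and the name bitcount_iter is ours, not part of the program above.

```c
/* Iterative counterpart of bitcount: every pass clears the lowest
   1 bit of word, so the loop body runs once per 1 bit. */
int bitcount_iter(unsigned int word)
{
    int count = 0;
    while (word != 0) {
        word &= word - 1;   /* clear the lowest 1 bit */
        count++;
    }
    return count;
}
```

Here the stack activity of the recursive version is replaced by a single counter.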
Example 7.11: The following example illustrates a function that converts an integer value to its
ASCII representation.
Example 7.12
#include <stdio.h>
#include <string.h>
/* Function to print the contents of the character string */
/* in reverse order, e.g., "This" will be printed as "sihT" */
void print_reverse(char *S)
{
    if (*S != '\0')
    {
        print_reverse(++S);
        putchar(*(--S));
    }
}
/* Main program */
int main(void)
{
    char S[80];
    printf("\n Enter a string:\n");
    if (fgets(S, sizeof S, stdin) != NULL) /* Read a string */
    {
        S[strcspn(S, "\n")] = '\0'; /* Drop the trailing newline */
        print_reverse(S);
    }
    return 0;
}
Input:
Enter a string
This is a String
Output:
gnirtS a si sihT
To understand the cost of recursion, we again rewrite the Fibonacci program (with a
minor modification) to compute the desired Fibonacci number. The program is inefficient after
about the twentieth Fibonacci number.
Example 7.13
EXERCISES
1. Write a non-recursive routine to calculate x*y by using addition only. Assume that both x
and y are non-negative integers.
2. Write a recursive function to evaluate
s(n)=2+4+6+... +n
In Chapter 6, stacks and queues were considered. As a data structure they are a sequence of
data elements where only two types of operations can be performed: insertion and deletion.
Moreover, these operations are restricted to the ends of stacks and queues which are nothing
but special types of lists. But in the case of a general list, there is no restriction. An element can
be inserted into or deleted from any position of the existing list. In fact, a general list may re-
quire some more operations on it. Lists, in general, together with different types of list imple-
mentations are discussed in this chapter in detail.
In general, we use the term 'list' to signify a series of elements. It is to be noted here that for our
purpose the series of elements in a list also maintains an order, that is, first element of the list,
second element of the list, and so on.
As a data structure, a list is a finite sequence of elements. In fact, a list may be an empty
list also. There are several types of operations involved in a list and they greatly depend on the
application. The common operations on list include
(i) create an empty list.
(ii) check if a list is empty.
(iii) traverse the list or a portion of the list.
(iv) insert an element into the list.
(v) delete an element from the list.
By definition, a list is a finite sequence of elements and hence an array may seem to be
most natural as the storage structure of the sequential implementation of list. In this implemen-
tation the list elements are stored as consecutive array elements. This implies that ith list ele-
ment is stored as ith array element.
It is to be noted at this point that an array has a fixed length. Once the array is defined we
are restricting the size (number of elements ) of the list. For the time being let us compromise
with the above fact by defining an array of sufficiently large size so that all the list elements can
be accommodated. Let MAXLEN be the constant that defines the maximum length of the list
(array). Also consider that we inserted n elements to this list (array) sequentially, that is, ith list
element is in the ith element (with index (i-1 )) of the array and n < MAXLEN. Obviously, then
some of the array elements will be empty at its tail. So we should have an auxiliary variable to
maintain the current length of the list.
With this background we may assume the following definitions and declarations.
#define MAXLEN .../*Maximum length of the array(list)*/
struct listtype {
    itemtype item[MAXLEN];
    int len;
};
struct listtype list;
itemtype element;
With the above definitions and declarations the first three list operations mentioned above
can be implemented very easily, but the last two will be a little inefficient.
To create an empty list we can simply set the length of list to zero by using a statement
like
list.len=0;
Now it is trivial to return 1 if the list is empty by using a statement like
return(list.len==0);
which might be used to check if a list is empty.
We can traverse or look through the list elements one by one from the beginning of the
list by passing through all the array elements from the beginning. This may be of the form
for(i=0; i<list.len; i++)
{
process list.item[i];
}
Now, to insert an item into the list we must first decide at what point it is to be inserted.
Generally speaking, we need to know after which member of the list the item is to be inserted.
As we are considering the sequential implementation, we know that the ith member of the list is
stored at array index (i-1). Suppose that our item should become the mth member of the list, so
that it should be stored at index (m-1) of the list array. We must first make room at index
(m-1) by shifting all the members from that index onwards one step to the right, and then put
the item in the vacated place. For shifting the members of a list array one step to the
right we may use the code
for(i=list.len-1; i>=m-1; i--)
    list.item[i+1]=list.item[i];
and finally put the element at its proper place by setting
list.item[m-1]=element;
and increasing the length by
list.len++;
Note that we can safely assume that list.len>=m-1. One more thing must be kept in mind: before
shifting the list members we must be sure that the list array is not full.
Clearly, the number of list members that must be shifted plays a very important role in
measuring the efficiency of this insertion process. If there are n members in the list then both the
worst case and average case complexities of this process are O(n).
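The insertion steps, together with the check that the array is not full, can be collected into one function. The following is only a sketch under stated assumptions: itemtype is taken to be int, MAXLEN is given the arbitrary value 100, and the function name insert_at and its return convention are hypothetical.

```c
#define MAXLEN 100            /* assumed capacity of the list array */
typedef int itemtype;         /* assumed element type */

struct listtype {
    itemtype item[MAXLEN];
    int len;
};

/* Insert element as the m-th member (counting from 1) of the list.
   Returns 1 on success, 0 if the array is full or m is out of range. */
int insert_at(struct listtype *list, int m, itemtype element)
{
    int i;
    if (list->len >= MAXLEN || m < 1 || m > list->len + 1)
        return 0;
    for (i = list->len - 1; i >= m - 1; i--)   /* shift one step right */
        list->item[i + 1] = list->item[i];
    list->item[m - 1] = element;
    list->len++;
    return 1;
}
```

Inserting at m = 1 shifts every member, while inserting at m = len + 1 shifts none, which is where the O(n) worst case and the cheap best case come from.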
The reverse process is deletion. Here if we want to remove the mth member we need to
shift some list members to the left and may use the code
for(i=m-1; i<list.len-1; i++)
    list.item[i]=list.item[i+1];
list.len--;
Here also the measure of efficiency depends on the number of list members to be shifted,
and again both the worst case and average case complexities are O(n), where n is the total
number of members in the list.
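As with insertion, the deletion steps can be wrapped in a function that first checks its argument. This is a sketch only: itemtype is assumed to be int, MAXLEN is set to an arbitrary 100, and the name delete_at is hypothetical.

```c
#define MAXLEN 100            /* assumed capacity of the list array */
typedef int itemtype;         /* assumed element type */

struct listtype {
    itemtype item[MAXLEN];
    int len;
};

/* Remove the m-th member (counting from 1) of the list.
   Returns 1 on success, 0 if there is no m-th member. */
int delete_at(struct listtype *list, int m)
{
    int i;
    if (m < 1 || m > list->len)
        return 0;
    for (i = m - 1; i < list->len - 1; i++)    /* shift one step left */
        list->item[i] = list->item[i + 1];
    list->len--;
    return 1;
}
```

Deleting the last member shifts nothing, whereas deleting the first member shifts all the remaining ones.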
From the above discussion we see that if we need to maintain a huge list where the
frequency of insertion and deletion is very high, this sequential implementation will be
inefficient, since most of the time will be consumed in shifting list members. A better
implementation is discussed in the next section.
In the sequential implementation we have seen that an array may be thought of as a list whose
ordering of elements is implicit. This is obvious because the ith member of the list is stored in
the (i-1)th location of the array. Actually, this implicit ordering is responsible for shifting the
elements of the list to the right or left when we need to insert or delete an element to or from the
list, respectively.
In the linked implementation of lists the ordering of list members (elements) is kept ex-
plicitly. This is possible only when we can
(a) get the first element of the list, and
(b) for any given list element we can determine its successor element.
This suggests a trivial linked-list implementation which requires to maintain the following.
(a) A collection of nodes that stores two items of information:
(i) The list member or element, and
(ii) A pointer that explicitly gives the location of the successor node.
(b) A pointer that holds the location of the first node of the list.
The pointers basically establish a link between the current node and its successor
node. That is why these pointers are termed links in some books. We use the terms
'link' and 'pointer' synonymously.
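In C, a node of this kind is naturally a structure that contains a pointer to a structure of its own type. The sketch below is illustrative only: the names nodetype, info and link follow the notation used in this chapter, while the 20-character information part and the helper list_length are our assumptions.

```c
#include <stddef.h>   /* NULL */

/* One node of the linked list: a list member plus a link to its successor. */
typedef struct node {
    char info[20];        /* the list member (assumed: a short city name) */
    struct node *link;    /* location of the successor node, NULL if none */
} nodetype;

/* Count the nodes reachable from head by following the links. */
int list_length(const nodetype *head)
{
    int n = 0;
    while (head != NULL) {
        n++;
        head = head->link;
    }
    return n;
}
```

Knowing only the pointer to the first node is enough to reach every member, because each node tells us where its successor is.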
As an example, a list of the following names of places Bangalore, Jamshedpur, Pune might
pictorially be represented as
In the above diagram the arrows indicate links. In the above list head is a pointer that
points to the first node, whose information (data) part contains Bangalore. This is actually the
first list member. This node has another part, called the link part, which keeps the
information about the location of its successor (next) node. This link part is basically a pointer
to the successor node. In the successor node of the first node the information part contains
Jamshedpur and the link part points to its successor. Through this link we reach the node where
Pune is in the information part and whose link part shows an electrical ground symbol, indicating
that there are no more nodes, that is, this node is the terminal node. The pointer value at this
node is NULL. For illustration purposes, in future we will denote the information part of a node
pointed to by ptr by ptr->info and the link part by ptr->link.
Now we are in a position to see how the five basic list operations given in Section
8.1 can be implemented. The first list operation is to create an empty list, which is simply to
assign a NULL pointer to the pointer variable head, indicating that head points to no node.
The second operation is to check if a list is empty. This can trivially be implemented by
testing whether head holds the NULL pointer or not. The next list operation is traversing a
linked list. A list traversal is essential when we need to process each list element exactly
once. Consider, for instance, that processing a list element consists of only printing the
element. We do this by initializing an additional pointer variable psntptr (say)
to point to the starting node. This may pictorially be represented as
and then prints 'Jamshedpur'. Now again psntptr is shifted to point the next node by
assign psntptr equal to psntptr->link
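The traversal described here, printing a node and then shifting psntptr to the next node until no node is left, can be sketched in C as follows. The node layout (a 20-character info field) and the function name traverse are assumptions made for this illustration.

```c
#include <stdio.h>

typedef struct node {
    char info[20];        /* assumed width of the information part */
    struct node *link;
} nodetype;

/* Visit each node exactly once, printing its information part.
   Returns the number of nodes visited. */
int traverse(const nodetype *head)
{
    const nodetype *psntptr = head;   /* the present pointer */
    int count = 0;
    while (psntptr != NULL) {
        printf("%s\n", psntptr->info);
        psntptr = psntptr->link;      /* shift to the next node */
        count++;
    }
    return count;
}
```

The loop stops when psntptr becomes NULL, that is, after the terminal node has been processed.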
Here head points to the first node of the list whose list members are the four city names
'Ahmedabad', 'Bangalore', 'Jamshedpur', and 'Pune'.
Let us now consider the second case which seeks to insert a node that will contain the city
name 'Chennai' after the node with the city name 'Bangalore'. Consider also that we have in
hand a pointer prevptr (say) that points to the node with the city name 'Bangalore'. Precisely
speaking, we seek to insert a node with the city name 'Chennai' after the node pointed to by
prevptr. As before, we first do the following:
assign ptr equal to fetchnode();
assign ptr->info equal to 'Chennai'
The status of the node and list may be depicted as
Now to insert the node pointed to by ptr within the list after the node pointed to by prevptr,
we do the following in sequence. Clearly the link part of the node pointed to by prevptr
points to the node that currently follows it, and this node should follow the node pointed to
by ptr after insertion. So the link part of the node pointed to by prevptr is
copied to the link part of the node pointed to by ptr, that is,
assign ptr->link equal to prevptr->link
This status is represented as
After this, the node pointed by ptr should be the next node to the node pointed by
prevptr and we can do it by assigning the link part of the node pointed by prevptr as ptr,
which may be expressed as
assign prevptr->link equal to ptr
This completes the insertion operation as shown below.
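The two assignments, in the order given, are all that is needed to link a new node into the list. A minimal sketch in C follows; the node layout and the function name insert_after are assumptions, and the new node is supposed to have been obtained and filled in already.

```c
#include <stddef.h>   /* NULL */

typedef struct node {
    char info[20];        /* assumed width of the information part */
    struct node *link;
} nodetype;

/* Link the node pointed to by ptr into the list immediately after the
   node pointed to by prevptr. The order of the two steps matters. */
void insert_after(nodetype *prevptr, nodetype *ptr)
{
    ptr->link = prevptr->link;   /* new node takes over the old successor */
    prevptr->link = ptr;         /* prevptr now leads to the new node */
}
```

If the two assignments were swapped, prevptr->link would already point to the new node when it is copied, and the rest of the list would be lost.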
We are now left with the deletion operation on linked list. Here also we need to consider
two different cases as follows.
(i) The node is to be deleted from the beginning of the list, that is, the first node of the linked
list is to be deleted.
(ii) The node to be deleted is situated after a node which is pointed by a given pointer, that is,
the successor of a given node is to be deleted.
Note that here also the head (pointer to the first node of the list) will change in the first
case after deletion and it will remain unchanged in the second case. Consider that presently we
are starting with a list with five elements depicted as below, choose first element to be pointed
by head.
One point should be noted here that after deleting a node it will not appear in the list. So
the space used for the deleted node will not be useful anymore unless we make it free that is, we
should treat the deleted node as an empty node so that we can reuse it whenever needed. For
our purpose, let us assume that we have a function freenode (ptr) which allows the node
pointed by the pointer ptr as a reuseable node.
To delete the first node from the above list we take the help of an additional pointer
variable ptr (say). Firstly, we save the starting pointer head in the variable ptr. Then we
simply set head such that it points to the very next node. This can be done by storing the value
of ptr->link in head. Finally, we call freenode(ptr) to send back the first node (which is
now pointed to by ptr) to the storage pool so that this space can be reused.
More precisely, we perform the following steps in sequence.
assign ptr equal to head
assign head equal to ptr->link
Call the function freenode(ptr):
The step-by-step status of the list is depicted as follows.
In the final list we see that there are four nodes, each containing a city name whose first
node is pointed by the pointer variable head. To illustrate the second case of deletion process,
let us assume that we have another pointer variable prevptr which is pointer to the node with
city name 'Chennai' (previous node of the node to be deleted) and we are to delete the node
which comes immediately after this node in the linked list, that is, the node with city name
'Jamshedpur'. So the current picture of the list looks like
Here also it is better to take the help of an additional pointer variable ptr (say) which will
point to the node to be deleted. This can be achieved by executing the step
assign ptr equal to prevptr->link
From the above figure it is clear that to delete the node pointed to by ptr we should change
the link part of the node pointed to by prevptr so that it points to the node that follows the
node pointed to by ptr. Clearly, this is achieved by executing
assign prevptr->link equal to ptr->link
On execution of the above step the linked list takes the form
The only thing we need now is to return the deleted node to the storage pool which
requires to execute
Call function freenode(ptr)
Finally, we are left with the following list.
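Both deletion cases can be sketched in C as below. This is an illustration under our own assumptions: nodes are taken to come from the C heap, so the freenode() used here simply calls free(), and the names delete_first and delete_after are hypothetical.

```c
#include <stdlib.h>

typedef struct node {
    char info[20];        /* assumed width of the information part */
    struct node *link;
} nodetype;

/* Stand-in for freenode(): here the node goes back to the C heap. */
static void freenode(nodetype *ptr)
{
    free(ptr);
}

/* Case (i): delete the first node. Returns the new head. */
nodetype *delete_first(nodetype *head)
{
    nodetype *ptr = head;
    if (ptr == NULL)
        return NULL;              /* empty list, nothing to delete */
    head = ptr->link;
    freenode(ptr);
    return head;
}

/* Case (ii): delete the successor of the node pointed to by prevptr. */
void delete_after(nodetype *prevptr)
{
    nodetype *ptr = prevptr->link;
    if (ptr == NULL)
        return;                   /* prevptr is the terminal node */
    prevptr->link = ptr->link;
    freenode(ptr);
}
```

Only the first case can change head, which is why delete_first returns the new head while delete_after returns nothing.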
In summary, as we have discussed earlier, it has been found that in a linked list there is
no need of shifting the elements in case of insertion and deletion, as in the case of an array. But
we have not yet seen how a linked list may be implemented. We have just discussed how differ-
ent operations on a linked list work in an abstract representation. The next section deals with
the different implementations of a linked list.
Though pointer-based implementation is good enough for our purpose, for a first-time
C user it might not be very handy. So for a better visualization of the linked structure we choose
to go through the array-based implementation. The readers may skip this if they find it too
trivial for their purpose.
A C function that initializes the storage pool may be of the following form which returns
the index of the first node of the pool.
int initpool() /* Function to initialize the storage pool */
{
    int i, avail;
    avail=0;
    for(i=0; i<SIZE-1; i++)
        nodearray[i].link=i+1;
    nodearray[SIZE-1].link=-1;
    return avail;
}
Having done this we can now write the function fetchnode() that returns the index of
an available node from the storage pool, that is, from the available list. The returned index
may be used to store a list member. The fetchnode() function simply returns the index of the
first node of the available list if it is non-empty and updates the pointer (index) avail to
point to the next node. The function may be written as given below.
int fetchnode() /* Function to return the index of a free node if
                   available, otherwise returns -1 */
{
    int ptr;
    ptr=avail; /* Note that avail is an external variable */
    if (avail==-1)
        printf("Available list is empty, can't return node \n");
    else
        avail = nodearray[avail].link;
    return ptr;
}
Consider that the first city name that is to be inserted into the empty list pointed to by head
is 'Pune'. At the beginning we ask for a free node by calling the fetchnode() function, which
will return an index. Let this index be stored in an auxiliary variable ptr (say). The
information part of the node at the index ptr is then set to the value 'Pune'. Actually, this
node insertion into the empty list pointed to by head is simply adding a node at the front of a
list, which may be coded in C as
int ptr; /*Definition of the auxiliary variable */
ptr = fetchnode();
strcpy(nodearray[ptr].info,"Pune");
nodearray[ptr].link = head;
head = ptr;
After inserting 'Pune' to the list the configuration of nodearray may be depicted as
          Index  info        link
head  --> 0      Pune        -1
avail --> 1                   2
          2                   3
          3                   4
          4                   5
          5                   6
          6                   7
          7                   8
          8                   9
          9                  -1
Now if we want to insert 'Bangalore' to the front of the list we need to execute the same
code, except for the second statement changes to strcpy (nodearray [ptr].info, "Bangalore");
and the configuration of nodearray takes the form
          Index  info        link
          0      Pune        -1
head  --> 1      Bangalore    0
avail --> 2                   3
          3                   4
          4                   5
          5                   6
          6                   7
          7                   8
          8                   9
          9                  -1
In a similar way, if we insert now the city name 'Ahmedabad' at the front of the list the
second statement changes to
strcpy(nodearray[ptr].info,"Ahmedabad");
And now the nodearray looks like
          Index  info        link
          0      Pune        -1
          1      Bangalore    0
head  --> 2      Ahmedabad    1
avail --> 3                   4
          4                   5
          5                   6
          6                   7
          7                   8
          8                   9
          9                  -1
We can easily see that in this implementation the storage structure of our linked list is an
array namely nodearray. Moreover this nodearray is holding two different lists:
(a) The list with city names whose first node is pointed by head.
(b) The list of available nodes which are used whenever necessary. The starting node of
this list is pointed to by avail.
Now if we want to insert the city name 'Surat' at the end of the list, that is, insert after the node
pointed by prevptr which holds the index 0 (say) we take a node from the available list, store
'Surat' to the information part of the node. Clearly, the successor of this node should be the
successor of current prevptr. Moreover the successor of prevptr should be set to the node
we are inserting. A code in C language to implement this is presented below:
int ptr;
ptr = fetchnode();
strcpy(nodearray[ptr].info, "Surat");
nodearray[ptr].link=nodearray[prevptr].link;
nodearray[prevptr].link=ptr;
On insertion of the city name 'Chennai' after the node pointed by 1 (value of prevptr)
using the code in the previous page the nodearray becomes
          Index  info        link
          0      Pune         3
          1      Bangalore    4
head  --> 2      Ahmedabad    1
          3      Surat       -1
          4      Chennai      0
avail --> 5                   6
          6                   7
          7                   8
          8                   9
          9                  -1
To get this form the same set of code as above was executed except the second statement
which should be like
strcpy(nodearray[ptr].info, "Chennai");
To insert 'Mumbai' after the node pointed by 4, which should be the value of prevptr
the second statement should be changed as
strcpy(nodearray[ptr].info, "Mumbai");
and the configuration of nodearray becomes as in Fig. 8.2.
Fig. 8.2 Final configuration of nodearray after inserting six city names
At this point our linked list is identified by head and it contains 6 nodes each holding a
different city name. The list is formed by inserting 6 nodes in succession within an empty list.
Some were inserted at the front of the existing list while some were inserted after a specific
given node in the list. The process of insertion is shown in the form of the function insertlist().
Clearly, this function would require three arguments,
(a) pointer (index) to the first node of the existing list.
(b) pointer(index) after which the new node is to be inserted.
and (c) the value (string in this case) that should go to information part of the inserted node.
The function insertlist() for our purpose may be written as in the following, which returns
the head, that is, the pointer to the first node of the list.
int insertlist(int start, int prevptr, char element[])
/* start is the pointer (index) to the first node of the list */
/* prevptr is the pointer (index) to the node after which the node is
   to be inserted. If prevptr is -1, it means that the node is to be
   inserted at the beginning of the list */
/* element holds the value of the node to insert */
{
    int ptr;
    ptr = fetchnode();
    strcpy(nodearray[ptr].info, element);
    if(prevptr == -1) /* Insert at the front of the list */
    {
        nodearray[ptr].link = start;
        start = ptr;
    }
    else /* Insert after the node at prevptr */
    {
        nodearray[ptr].link = nodearray[prevptr].link;
        nodearray[prevptr].link = ptr;
    }
    return start;
}
traverse_list(int head)
{
int current;
current = head;
while (current!= -1)
{
printf("%s \n", nodearray[current].info);
current = nodearray[current].link;
}
return;
}
Now we turn to the deletion operation. To delete a node from the list we
need to know the pointer to its previous node. That is, we can delete a node from a list when we
know the pointer to the node which precedes the node to be deleted. Considering the present
configuration of nodearray as in Fig. 8.2, let us try to delete the node after the node at
index 2, i.e., the value of the pointer prevptr (say) is 2. This means that we want
to delete the node with city name 'Bangalore' from the list pointed to by head. A code in C to
delete the node may be written as in the following:
int ptr; /* An auxiliary pointer variable */
ptr = nodearray[prevptr].link;
nodearray[prevptr].link = nodearray[ptr].link;
The first instruction of the above code simply stores the link part of the node at prevptr in
the variable ptr. The second sets the link part of the node at prevptr to the link part of the
node pointed to by ptr, that is, to the link part of the node to be deleted. Now nodearray
changes to
          Index  info        link
          0      Pune         3
          1      Bangalore    4
head  --> 2      Ahmedabad    4
          3      Surat       -1
          4      Chennai      5
          5      Mumbai       0
avail --> 6                   7
          7                   8
          8                   9
          9                  -1
This configuration of nodearray shows that the first node of the list pointed by head
holds the city name 'Ahmedabad', its next node is at 4 which holds 'Chennai'. Next to it is the
node at 5 holding the city name 'Mumbai', then the node at 0 which contains 'Pune' and then the
final node at 3 (it is final node, because its link part shows -1) which holds 'Surat'. Clearly, the
node with city name 'Bangalore' is deleted from the list pointed by head. The area of deleted
node is not useable now. It is possible to reuse the area of the deleted node if and only if this
node is returned to the storage pool of available nodes. This can simply be done by inserting the
deleted node to the front of the list of available nodes. Let this task be achieved by calling a
function freenode () that receives the pointer to the (deleted) node which is to be freed. So the
call to the function as
freenode(ptr);
is to be done after executing the above two instructions to complete the deletion task properly.
Now our nodearray takes the form
          Index  info        link
          0      Pune         3
avail --> 1      Bangalore    6
head  --> 2      Ahmedabad    4
          3      Surat       -1
          4      Chennai      5
          5      Mumbai       0
          6                   7
          7                   8
          8                   9
          9                  -1
This configuration of nodearray shows that any call to the function fetchnode() will
return the node at index 1, which is a deleted node, and hence the deleted node is reusable now.
Note that though physically the node at index 1 still holds the city name 'Bangalore', this is
treated as a junk value and is no longer relevant.
Let us now delete a node which is at the front of the list pointed to by head. Clearly, the
node to be deleted has no preceding node. Such a situation can be handled very
simply by executing the following C code. The code below also returns the deleted node to the
storage pool of available nodes.
int ptr; /* Auxiliary pointer */
ptr = head; /* Store the current head to ptr */
head = nodearray[ptr].link; /* Change head to point to
                               the next node */
freenode(ptr); /* Send back deleted node to storage pool of
                  available nodes */
On execution of the above code our nodearray will take the form
The function deletelist() is presented below. It returns the pointer to the starting node
of the list from which a node is deleted. The function requires two arguments, namely, (a) the
pointer (index) to the first node of the list from which the node is to be deleted and (b) the
pointer to the node that precedes the node to be deleted.
Note that if no node precedes the node to be deleted, that is, if the first node is to be
deleted then it is indicated to the function by passing the second argument as -1. The function
will appear as given below:
int deletelist(int start, int prevptr)
{
    int ptr;
    if (prevptr == -1)
    {
        ptr = start;
        start = nodearray[ptr].link;
    } else {
        ptr = nodearray[prevptr].link;
        nodearray[prevptr].link = nodearray[ptr].link;
    }
    avail = freenode(avail, ptr); /* avail is the external variable that
                                     heads the list of available nodes */
    return start;
}
The function freenode() requires two arguments also. They are
(a) pointer to the first node of the list of available nodes; and
(b) the pointer to the free node that is to be attached to the front of the list of available
nodes.
The function freenode() returns the pointer to the beginning of the list of available
nodes and is shown below
int freenode(int avail, int ptr)
/* avail is the pointer (index) to the first node of the list of
   available nodes */
/* ptr is the pointer (index) to the freed node */
{
    nodearray[ptr].link = avail;
    avail = ptr;
    return avail;
}
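To see how the pieces of the array-based implementation fit together, the routines can be collected into one self-contained sketch. It reuses the names nodearray, avail, initpool, fetchnode, freenode, insertlist and deletelist from this section, but the pool size of 10, the 20-character information part, the constant NIL for -1, and the simplified one-argument freenode() that updates the external avail directly are assumptions made here for illustration.

```c
#include <string.h>

#define SIZE 10            /* assumed pool size, as in the tables above */
#define NIL  (-1)          /* marks "no node" */

struct nodetype {
    char info[20];         /* assumed width of the information part */
    int link;
};

static struct nodetype nodearray[SIZE];
static int avail = NIL;    /* first node of the available list */

void initpool(void)
{
    int i;
    for (i = 0; i < SIZE - 1; i++)
        nodearray[i].link = i + 1;
    nodearray[SIZE - 1].link = NIL;
    avail = 0;
}

int fetchnode(void)        /* returns NIL when the pool is exhausted */
{
    int ptr = avail;
    if (avail != NIL)
        avail = nodearray[avail].link;
    return ptr;
}

void freenode(int ptr)     /* attach the node to the front of the available list */
{
    nodearray[ptr].link = avail;
    avail = ptr;
}

/* Insert element after prevptr, or at the front when prevptr is NIL.
   Returns the head; the list is left unchanged if the pool is empty. */
int insertlist(int start, int prevptr, const char *element)
{
    int ptr = fetchnode();
    if (ptr == NIL)
        return start;
    strncpy(nodearray[ptr].info, element, sizeof nodearray[ptr].info - 1);
    nodearray[ptr].info[sizeof nodearray[ptr].info - 1] = '\0';
    if (prevptr == NIL) {
        nodearray[ptr].link = start;
        start = ptr;
    } else {
        nodearray[ptr].link = nodearray[prevptr].link;
        nodearray[prevptr].link = ptr;
    }
    return start;
}

/* Delete the node after prevptr, or the first node when prevptr is NIL.
   Returns the head; nothing happens if there is no node to delete. */
int deletelist(int start, int prevptr)
{
    int ptr;
    if (prevptr == NIL) {
        ptr = start;
        if (ptr == NIL)
            return start;
        start = nodearray[ptr].link;
    } else {
        ptr = nodearray[prevptr].link;
        if (ptr == NIL)
            return start;
        nodearray[prevptr].link = nodearray[ptr].link;
    }
    freenode(ptr);
    return start;
}
```

Starting from initpool() and an empty list (head equal to NIL), inserting 'Pune', 'Bangalore' and 'Ahmedabad' at the front, then 'Surat' after index 0, 'Chennai' after index 1 and 'Mumbai' after index 4 reproduces the link values tabulated in this section, and deletelist(head, 2) removes 'Bangalore' and leaves avail at index 1.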
read the next word and repeat the search, otherwise, find after which node this word is to be
inserted within the list and then insert it at its proper place. Continue reading and searching till
the end of the document and finally print the list. Clearly, the above method is not efficient
because each time we read a word we need to search the list from the beginning and on an
average the number of comparisons is half the current length of the list. Evidently the list will
become lengthy within a short span of time because we are continuously adding nodes to the
list whenever we find a new word.
In order to reduce the search time we can split the list into a number of smaller lists.
For this particular problem, it is quite natural to use twenty-six (26) lists, one for each
letter of the alphabet, that is, we may construct one list for all the words starting with a
given letter. In other words, there will be a list for all words starting with 'A', a list for
all words starting with the letter 'B', and so on. A program that constructs an index from a
stored document (ASCII file) is given in Example 8.1. The data structure used is a linear linked
list while the storage structure is array-based.
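Choosing the list for a word amounts to turning its first letter into a number between 0 and 25. The following is a small sketch of that step only; the function name bucket_of is hypothetical, and the code assumes that the letters 'A' to 'Z' are contiguous, as they are in ASCII.

```c
#include <ctype.h>

/* Index (0..25) of the list that holds the words starting with the
   same letter as word, or -1 if word does not start with a letter.
   Assumes 'A'..'Z' are contiguous, as in ASCII. */
int bucket_of(const char *word)
{
    unsigned char first = (unsigned char)word[0];
    if (!isalpha(first))
        return -1;
    return toupper(first) - 'A';
}
```

With this numbering the words beginning with 'A' go to list 0, those beginning with 'S' to list 18, and those beginning with 'T' to list 19.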
Example 8.1
int free;
nodetype node[POOLSIZE];
main()
{
    int i, predptr, list[26];
    char word[MAXWORD];
    initpool();
    for (i=0; i<26; i++)
        createlist(&list[i]);
    while(getword(word)) /* returns word with len */
        if (!search(list[word[0]-'A'], word, &predptr))
            insertlist(&list[word[0]-'A'], word, predptr);
    for(i=0; i<26; i++)
        if (!emptylist(list[i]))
            traverselist(list[i]);
    return;
}
initpool()
{
int i;
getnode()
{
int ptr;
ptr = free;
if (free != Null)
free = node[free].link;
else {
printf("No available nodes\n");
printf("Storage pool is empty\n");
ptr = Null;
}
return ptr;
}
createlist(int *ptrlist)
{
*ptrlist = Null;
return;
}
emptylist(int list)
{
return(list == Null);
}
traverselist(int list)
{
int curptr;
curptr = list;
while (curptr != Null)
{
printf("%s\t", node[curptr].info);
curptr = node [curptr].link;
}
putchar('\n');
return;
}
curptr = list;
*predptr = Null;
while (curptr != Null)
if((found = strcmp(node[curptr].info,item) ) >= 0)
break;
else {
*predptr=curptr;
curptr = node[curptr].link;
}
return ((found == 0) ? TRUE : FALSE);
}
temp = getnode();
strcpy (node[temp].info,item);
if (pred == Null)
{
node[temp].link=*ptrlist;
*ptrlist=temp;
} else {
node[temp].link=node[pred].link;
node[pred].link=temp;
}
return;
}
while(!isalpha(c=getchar()))
if (c == EOF)
return i;
do {
s[i++]=c;
c = getchar();
} while(isalpha(c));
s[i]='\0';
return i;
}
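The search step is the heart of this program: it must report whether a word is already present and, if it is not, after which node the word belongs so that the list stays in alphabetical order. A self-contained sketch of such a search is given below. It is not the listing of Example 8.1: here the node array is passed as a parameter, and the constant NIL, the 16-character information part and the exact signature are assumptions made for illustration.

```c
#include <string.h>

#define NIL (-1)           /* marks "no node" */

struct nodetype {
    char info[16];         /* assumed maximum word length */
    int link;
};

/* Search a sorted array-based linked list for item.
   Returns 1 if item is found, 0 otherwise. In both cases *predptr
   receives the index of the node that precedes the place where item
   is (or belongs); NIL means the front of the list. */
int search(const struct nodetype node[], int list, const char *item, int *predptr)
{
    int curptr = list;
    int cmp = -1;
    *predptr = NIL;
    while (curptr != NIL) {
        cmp = strcmp(node[curptr].info, item);
        if (cmp >= 0)
            break;                 /* found item or passed its place */
        *predptr = curptr;
        curptr = node[curptr].link;
    }
    return curptr != NIL && cmp == 0;
}
```

Because the lists are kept sorted, the search can stop as soon as it meets a word that is not smaller than the one being looked for.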
For the document
TWINKLE TWINKLE LITTLE STAR
HOW I WONDER WHAT YOU ARE
UP ABOVE THE WORLD SO HIGH
LIKE A DIAMOND IN THE SKY
WHEN THE BLAZING SUN IS GONE
WHEN THERE NOTHING SHINES UPON
THEN YOU SHOW YOUR LITTLE LIGHT
TWINKLE TWINKLE ALL THE NIGHT
the above program creates an array of (twenty six) linked lists which may be picturized in
Fig. 8.3.
Fig. 8.3 (lists that are not shown are empty)
list[0]  -> A -> ABOVE -> ALL -> ARE
list[1]  -> BLAZING
list[3]  -> DIAMOND
list[6]  -> GONE
list[7]  -> HIGH -> HOW
list[8]  -> I -> IN -> IS
list[11] -> LIGHT -> LIKE -> LITTLE
list[13] -> NIGHT -> NOTHING
list[18] -> SHINES -> SHOW -> SKY -> SO -> STAR -> SUN
list[19] -> THE -> THEN -> THERE -> TWINKLE
list[20] -> UP -> UPON
list[22] -> WHAT -> WHEN -> WONDER -> WORLD
list[24] -> YOU -> YOUR
The output of the program for the given document is the following list of words.
A ABOVE ALL ARE
BLAZING
DIAMOND
GONE
HIGH HOW
I IN IS
LIGHT LIKE LITTLE
NIGHT NOTHING
SHINES SHOW SKY SO STAR SUN
THE THEN THERE TWINKLE
UP UPON
WHAT WHEN WONDER WORLD
YOU YOUR
The above code is used to insert a node at the front of a given list (a list is given when its
head is given). But we may also want to insert a node within a given list, after some given node.
For example, we may want to insert a node with the city name 'Surat' after the node with city name
'Pune', which is pointed to by a pointer, say prevptr. That is, we have the values of the
pointers head and prevptr. We also have an array city filled with the string 'Surat'.
Our objective is to insert a node, with information part stored in the city array, into the
list pointed by head after the node pointed by prevptr. To achieve this a C code may be
written as
nodetype *ptr;
ptr=(nodetype *)malloc(sizeof(nodetype));
strcpy(ptr->info,city);
ptr->link=prevptr->link;
prevptr->link=ptr;
On execution of this code the list takes the form
If prevptr points to the node with city name 'Bangalore' and the city array holds the
string 'Chennai', the execution of the above code will change the linked list as
Thus we have seen that we can either insert a node at the beginning of a given list or
insert a node after a given node within a list. This insertion process is shown in the form of a
C function insertnode ().
Obviously, this function would require three arguments, namely
(a) a pointer to the starting node of the existing list say start;
(b) a pointer to the node after which the new node is to be inserted, say prevptr. Insertion
at the front of the list is indicated by setting the value of prevptr to
a NULL pointer before calling the function;
(c) the information part of the new node.
The function should return the head, the pointer to the first node of the list, because
insertion of a node at the front of a list changes the head of the list.
The function insertnode () is presented below:
ptr->link = start;
start = ptr;
} else {
/* insert after node pointed by prevptr */
ptr->link = prevptr->link;
prevptr->link = ptr;
}
return start; /* the head of the list, which changes on insertion at the front */
}
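Since only the tail of insertnode() is shown above, the two insertion cases can be put together as the following minimal sketch. The node layout (a 20-character info array), the const qualifier and the out-of-memory behaviour are assumptions made for illustration, not part of the original listing.

```c
#include <stdlib.h>
#include <string.h>

typedef struct list_node {
    char info[20];              /* information part (size is an assumption) */
    struct list_node *link;     /* link to the successor node */
} nodetype;

/* Insert a new node holding item into the list whose head is start.
   If prevptr is NULL the node goes at the front; otherwise it goes
   just after the node pointed to by prevptr. Returns the head of the
   list, which changes only for an insertion at the front. If memory
   cannot be allocated the list is returned unchanged. */
nodetype *insertnode(nodetype *start, nodetype *prevptr, const char *item)
{
    nodetype *ptr = malloc(sizeof *ptr);
    if (ptr == NULL)
        return start;
    strncpy(ptr->info, item, sizeof ptr->info - 1);
    ptr->info[sizeof ptr->info - 1] = '\0';
    if (prevptr == NULL) {
        /* insert at the front of the list */
        ptr->link = start;
        start = ptr;
    } else {
        /* insert after the node pointed to by prevptr */
        ptr->link = prevptr->link;
        prevptr->link = ptr;
    }
    return start;
}
```

A caller would typically write head = insertnode(head, prevptr, city); so that a change of head is not lost.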
Now to print all the list members of a list we need to traverse through all the nodes of the
list from the first node to the last node of the list. A simple list traversal algorithm may be
presented as the following pseudo code.
A list traversal function traverse_list () is presented below which prints all our list
members. The function receives the list head as its argument and returns nothing.
traverse_list(nodetype *head)
{
nodetype *current;
current = head;
while (current != NULL)
{
printf("%s \n", current->info);
current = current->link;
}
return;
}
Like the insert operation the deletion of a node from a list is either of the following two
types:
(i) We may delete a node from the beginning of a given list, that is, the head of the list is to
be deleted. After deletion the next to head will be the new head of the existing list.
(ii) We may delete a node which is situated after a given node.
As in the array implementation, the pointer implementation to delete a node from the
front of list pointed by head may be written as
nodetype *ptr; /* Auxiliary pointer variable */
ptr = head; /* Save current head to ptr */
head = ptr->link; /* Change head to point to next node */
free(ptr); /* Deallocate the allocated memory area pointed by
ptr */
Consider now that we want to delete a node from a list, whose front node is pointed by
head, which is situated after the node pointed by prevptr in the list as in the figure given
below.
[Figure: a list of five nodes numbered 1 to 5, with prevptr pointing to node 3]
In the above figure there are five nodes in the list and they are numbered. The pointer
prevptr points to the node number 3. We want to delete the node numbered 4 which appears
just after the node pointed by prevptr. The following C code segment may be used to achieve
the above task.
nodetype *ptr; /* Auxiliary pointer variable */
ptr=prevptr->link; /* Save the pointer to the node to delete in ptr
*/
prevptr->link = ptr->link ; /* Set the next node to prevptr as the
next node to the node to delete */
free(ptr); /* Deallocate the allocated memory area pointed by ptr */
Combining the above two C segments, the function deletenode() to delete a node from
a list may be written as presented below. The function receives two arguments, one of which is
the list head and the other one is a pointer to the previous node of the node to be deleted. This
pointer is NULL, if the node is to be removed from the beginning of the list. The function
returns a pointer to the head of the list.
nodetype *deletenode(nodetype *start, nodetype *prevptr)
/* start is the pointer to the list head */
/* prevptr is the pointer to the preceding node to the node to delete */
{
nodetype *ptr;
if (prevptr == NULL)
{
/* delete from front of the list */
ptr = start;
start = ptr->link;
} else { /*delete the successor of the node pointed by prevptr */
ptr = prevptr->link;
prevptr->link = ptr->link;
}
free(ptr);
return start;
}
main()
{
nodeptr list[26], prevptr;
char word[MAXWORD];
int i;
for(i=0; i<26; i++)
createlist(list + i);
while(getword(word)) /* returns word with len */
if (!search(list[word[0]-'A'], word, &prevptr))
insertnode(&list[word[0]-'A'], word, prevptr);
for(i=0; i<26; i++)
if (!emptylist(list[i]))
traverselist(list[i]);
return 0;
}
createlist(nodeptr *ptrlist)
{
*ptrlist = NULL;
return;
}
emptylist(nodeptr list)
{
return(list == NULL);
}
traverselist(nodeptr list)
{
nodeptr curptr;
curptr = list;
while (curptr!=NULL)
{
printf("%s\t", curptr->info);
curptr = curptr->link;
}
putchar('\n');
return;
}
{
nodeptr curptr;
int found = -1; /* nonzero, so that an empty list reports "not found" */
curptr = head;
*prevptr = NULL;
while(curptr!=NULL)
if ((found=strcmp(curptr->info, item)) >= 0)
break;
else {
*prevptr = curptr;
curptr = curptr->link;
}
return ((found == 0)? TRUE : FALSE);
}
temp = (nodeptr)malloc(sizeof(nodetype));
strcpy(temp->info, item);
if (prevptr == NULL)
{
temp->link = *headptr;
*headptr = temp;
} else {
temp->link = prevptr->link;
prevptr->link = temp;
}
return;
}
getword(char s[])
{
int c, i = 0;
while(!isalpha(c=getchar()))
if (c == EOF)
return i;
do {
s[i++]=c;
c=getchar();
} while (isalpha(c));
s[i] = '\0';
return i;
}
EXERCISES
1. What are the main advantages of linked lists over arrays in representing a group of items?
2. Write a program to reverse a linked list so that the last element becomes the first one, the
last but one becomes the second element and so on.
3. Write a program to find the sum of integers in a singly linked list.
4. Develop a line-oriented text editor that assigns a number to each line of text and maintains
the lines in a linked list by line number order.
5. Write a program that takes as input a polynomial as (coefficient, exponent) pairs, in any
order. The polynomial is stored in a linked list of (coefficient, exponent) pairs. Write a
program to print the polynomials in descending order.
6. Write a function binary_search that accepts two parameters, an array of pointers to a group
of stored numbers, and a single number. The function should use binary search to return a
pointer to the single number if it is in the group. If the number is not present in the group,
return the value NULL.
7. Write a linked list program that cyclically permutes the elements of a given sequence. For
example, if the response is
3 4 2 1 5
the program prints
5 3 4 2 1
8. Write a function that removes the first node in a linked list with a given value.
9. Write a function that takes a pointer to a linked list and reverses the order of the nodes by
simply altering the pointers.
If the original list were
5 3 7 1
the function will return
1 7 3 5
10. Write a function multiply(p, q) to multiply two long positive integers represented by singly
linked lists.
11. Write a C program to split a linked list into two lists, in such a manner that the first linked
list contains the odd numbered nodes and second linked list contains the even numbered.
12. A palindrome is a word or line that reads the same forwards or backwards.
Given a linked list of words, write a C function to create a palindrome list from it by
concatenating its reverse list to the given list.
9
LINKED LISTS—VARIANTS
In the previous chapter we considered linear linked lists. They are linear in their structure in the
sense that list elements must be processed in a sequential manner from first to last. This is
because the linked lists that we have seen have a head node which is directly accessible and
each node contains an information part together with a link part that allows us to move to its
successor node, if it exists.
But in some cases, it may be convenient to use different kinds of accesses. In fact, other
kinds of links between the nodes may also be helpful. In this chapter we consider some of these
variants of linked lists such as linked stacks, linked queues, lists with fixed head, circular linked
list, and so on.
create(stacktype **stackptr)
{
*stackptr = (stacktype *) NULL;
return;
}
ptr=(stacktype *)malloc(sizeof(stacktype));
strcpy(ptr->info, element);
ptr->link = stack;
stack=ptr;
return stack;
}
It should be noted that this push function does not check whether the stack is full, which
was checked in the push function of the array-based stack implementation. This is because the
pointer implementation of stack does not impose any limit on the size of the stack and hence can
never be full. Of course, the stack size is limited to the available memory in this case. But the
pop function must check for the empty-stack condition in this implementation also, because the
stack may have no element at a point of time and if we try to pop a stack element at that time it
is an error condition. The code for the pop() function is given below.
if (empty(stack))
{
printf("Stack underflow !\n");
exit(1); /* Exit with error condition */
} else {
strcpy(element, stack->info);
ptr = stack;
stack = stack->link;
free(ptr);
}
}
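Because the headers of push() and pop() are missing from the listings above, the linked stack can be summarised as the following sketch. The signatures are assumptions: here both functions return the new top of the stack, and pop() copies the removed element into a caller-supplied buffer of at least 20 characters.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct stack_node {
    char info[20];               /* information part (size is an assumption) */
    struct stack_node *link;     /* link to the node below */
} stacktype;

/* A linked stack is empty when its top pointer is NULL. */
int empty(const stacktype *stack)
{
    return stack == NULL;
}

/* Push element and return the new top. No "stack full" test is needed;
   only the allocation itself can fail. */
stacktype *push(stacktype *stack, const char *element)
{
    stacktype *ptr = malloc(sizeof *ptr);
    if (ptr == NULL) {
        fprintf(stderr, "Out of memory\n");
        exit(1);
    }
    strncpy(ptr->info, element, sizeof ptr->info - 1);
    ptr->info[sizeof ptr->info - 1] = '\0';
    ptr->link = stack;
    return ptr;
}

/* Pop the top element into element and return the new top.
   Popping an empty stack is treated as an error, as in the text. */
stacktype *pop(stacktype *stack, char *element)
{
    stacktype *ptr;
    if (empty(stack)) {
        fprintf(stderr, "Stack underflow!\n");
        exit(1);
    }
    strcpy(element, stack->info);
    ptr = stack;
    stack = stack->link;
    free(ptr);
    return stack;
}
```

Returning the new top keeps the caller's pointer up to date with a plain assignment such as stack = pop(stack, buf); passing a stacktype ** instead would be an equally reasonable design.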
The linked implementation of a queue is a simple modification of that of a stack. We recall that
a queue is nothing but a list in which elements are removed only at one end called the front or
head of the queue and elements are inserted only at the other end called the rear or tail of the
queue. From the above definition it is very natural that a linked list will become a queue if the
deletion occurs at the front of the list and any insertion is done at the end of the list. The deletion
operation will then be very similar to the pop operation of the linked stack but the insertion
operation will require a list traversal to get the pointer to the last node(element) of the list after
which the insertion will take place. It is possible to avoid this list traversal if we keep the pointer
to the last node of the list in a variable. So the implementation of a linked queue needs a
declaration of the form
typedef struct queue_node{
char info[20]; /* Information part*/
struct queue_node *link; /*Link part*/
} nodetype;
typedef struct {
nodetype *qfront, *qrear;
} queuetype;
queuetype queue;
We have already seen that the operations commonly used on a queue include the
following.
(i) Create an empty queue.
(ii) Check whether a queue is empty.
(iii) Insert an element at the end (i.e. at the rear) of the queue.
(iv) Delete an element from the front of the queue.
The function create () to create an empty queue may be simply written as
create(queuetype *qptr)
{
qptr->qfront = qptr->qrear = (nodetype *)NULL;
return;
}
The function to check if the queue is empty or not may be implemented by using the
function
emptyQ(queuetype queue)
{
return(queue.qfront==NULL);
}
The operation to insert an element at the end of the queue is basically inserting a node in
the linked list (linked queue) after the node pointed by qrear of the queue. But this needs special
attention because if the queue is an empty queue the insertion should take place at the front of
the queue. The deletion operation can be implemented just like the pop() function of a linked
stack discussed in the last section. The function insertQ() to add a node to the rear of the
queue may be written as
insertQ(queuetype *Qptr, char element[])
{
nodetype *ptr;
ptr=(nodetype *)malloc(sizeof(nodetype));
strcpy(ptr->info, element);
ptr->link = NULL; /* the new node is always the last node */
if(emptyQ(*Qptr))
{
Qptr->qfront = ptr;
Qptr->qrear = ptr;
}
else
{
Qptr->qrear->link=ptr;
Qptr->qrear = ptr;
}
return;
}
Similarly, the function deleteQ() to delete a node from the front of the queue may be written as
deleteQ(queuetype *Qptr)
{
nodetype *ptr;
if(emptyQ(*Qptr))
{
printf("Attempt to delete node from empty queue\n");
exit(1);
} else {
ptr=Qptr->qfront;
Qptr->qfront = ptr->link;
if(Qptr->qfront==NULL)
Qptr->qrear=NULL;
free(ptr);
}
return;
}
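The queue operations described in this section can be collected into one self-contained sketch. The names createQ() and the buffer argument of deleteQ() are assumptions made for illustration; the declarations follow the queuetype structure given in the text.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct queue_node {
    char info[20];               /* information part */
    struct queue_node *link;     /* link part */
} nodetype;

typedef struct {
    nodetype *qfront, *qrear;    /* first and last nodes of the queue */
} queuetype;

void createQ(queuetype *qptr)
{
    qptr->qfront = qptr->qrear = NULL;
}

int emptyQ(const queuetype *qptr)
{
    return qptr->qfront == NULL;
}

/* Insert element at the rear; no traversal is needed because
   qrear already points to the last node. */
void insertQ(queuetype *qptr, const char *element)
{
    nodetype *ptr = malloc(sizeof *ptr);
    if (ptr == NULL) {
        fprintf(stderr, "Out of memory\n");
        exit(1);
    }
    strncpy(ptr->info, element, sizeof ptr->info - 1);
    ptr->info[sizeof ptr->info - 1] = '\0';
    ptr->link = NULL;
    if (emptyQ(qptr))
        qptr->qfront = ptr;          /* first node: it is also the front */
    else
        qptr->qrear->link = ptr;
    qptr->qrear = ptr;
}

/* Remove the front node, copying its information into element
   (a buffer of at least 20 characters supplied by the caller). */
void deleteQ(queuetype *qptr, char *element)
{
    nodetype *ptr;
    if (emptyQ(qptr)) {
        fprintf(stderr, "Attempt to delete node from empty queue\n");
        exit(1);
    }
    ptr = qptr->qfront;
    strcpy(element, ptr->info);
    qptr->qfront = ptr->link;
    if (qptr->qfront == NULL)        /* the queue has become empty */
        qptr->qrear = NULL;
    free(ptr);
}
```

Copying the information out before the node is freed avoids handing the caller a pointer to released memory.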
create(nodetype **head)
{
*head = (nodetype *)malloc(sizeof(nodetype));
(*head)->link = NULL;
return;
}
The call create (&head) where head is declared as a pointer to nodetype will create an
empty list as below.
[Figure: head points to the dummy head node, whose link part is NULL]
As the dummy head node has an undefined information part, this area may be used to
store the information regarding the type of the list members. For example, the above list may be
presented as
In such an implementation every node has a predecessor node and so the insertnode()
function will require only two arguments.
(i) A pointer to the previous node (say prevptr) after which the node has to be inserted.
(ii) The element, that is, the information part of the node to be inserted, say the array element.
Now the function insertnode () for this implementation may be written as
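insertnode(nodetype *prevptr, char element[])
{
nodetype *ptr;
ptr = (nodetype *)malloc(sizeof(nodetype));
strcpy(ptr->info, element);
ptr->link = prevptr->link;
prevptr->link = ptr;
return;
}
Similarly, the function deletenode() to delete the node after the one pointed to by prevptr may be written as
deletenode(nodetype *prevptr)
{
nodetype *ptr;
ptr = prevptr->link;
prevptr->link = ptr->link;
free(ptr);
return;
}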
The function traverse_list () to print all the list members will change a little, and is
presented below.
traverse_list(nodetype *head)
{
nodetype *current;
current = head->link;
while(current != NULL)
{
printf("%s\n", current->info);
current=current->link;
}
return;
}
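The effect of the fixed (dummy) head node can be seen in the following sketch, in which neither insertion nor deletion needs a special case for the front of the list and the head pointer never changes. The helper length() and the return values are assumptions added for illustration.

```c
#include <stdlib.h>
#include <string.h>

typedef struct list_node {
    char info[20];              /* unused in the dummy head node */
    struct list_node *link;
} nodetype;

/* Create an empty list: a dummy head node with no successor.
   Returns NULL if memory cannot be allocated. */
nodetype *create(void)
{
    nodetype *head = malloc(sizeof *head);
    if (head != NULL) {
        head->info[0] = '\0';
        head->link = NULL;
    }
    return head;
}

/* Insert element after the node pointed to by prevptr, which may be
   the dummy head itself. Returns 1 on success and 0 otherwise. */
int insertnode(nodetype *prevptr, const char *element)
{
    nodetype *ptr = malloc(sizeof *ptr);
    if (ptr == NULL)
        return 0;
    strncpy(ptr->info, element, sizeof ptr->info - 1);
    ptr->info[sizeof ptr->info - 1] = '\0';
    ptr->link = prevptr->link;
    prevptr->link = ptr;
    return 1;
}

/* Delete the successor of the node pointed to by prevptr, if there is one. */
void deletenode(nodetype *prevptr)
{
    nodetype *ptr = prevptr->link;
    if (ptr == NULL)
        return;
    prevptr->link = ptr->link;
    free(ptr);
}

/* Count the list members, skipping the dummy head node. */
int length(const nodetype *head)
{
    int n = 0;
    const nodetype *current;
    for (current = head->link; current != NULL; current = current->link)
        n++;
    return n;
}
```

Inserting at the front is simply insertnode(head, element), since the dummy head is the predecessor of the first real node.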
It is clear that every node in a circular linked list has a predecessor and a successor.
So, as in the case of linked lists with a fixed head node, here also no special considerations are
required for insertion and deletion of nodes, except when the list is empty or has a single node.
An empty circular linked list may be created by
simply setting cllist to NULL, and to check if a circular linked list is empty we need to check if
cllist is NULL.
A function insertcllist() for inserting an element into a circular linked list whose
head is pointed to by cllist, after the node pointed to by prevptr, is shown below.
{
ptr->link=ptr;
cllist=ptr;
}
else
{
ptr->link=prevptr->link;
prevptr->link=ptr;
}
return cllist;
}
A single-node circular linked list points to itself, as shown in the figure below. So we
need to give special attention when the insertion takes place in a circular list with no elements.
[Figure: a single node 'Bangalore' whose link points back to the node itself]
if(empty(cllist))
{
printf("Attempt to delete node from empty list \n");
exit(1);
}
else
{
ptr = prevptr -> link ;
if(ptr == prevptr) /* Single node circular list */
cllist = NULL;
else
prevptr->link = ptr->link;
free(ptr);
return cllist;
Obviously, we need to give special attention to the case where we want to delete from a
single-node circular list. The function traverse_cllist() to traverse a circular list is a simple
modification of the function to traverse a singly linked list and is of the following form.
traverse_cllist(nodetype *cllist)
{
nodetype *current;
if (!empty(cllist))
{
current = cllist;
do
{
printf("%s\n", current->info);
current = current->link;
} while (current!=cllist);
}
return;
}
Note that in this case a do-while loop is used instead of a while loop, because the traversal
starts and ends at the same node. If we implement a
queue as a circular linked list, it may be advantageous to maintain one pointer cllist to the tail
node instead of the head node. Then to maintain a queue we need only one pointer cllist (the
rear) to access both the front (cllist->link) and the rear (cllist) of the queue. Some other
applications may use a circular linked list with a fixed head node to simplify their algorithms.
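The idea of keeping a single pointer to the tail of a circular list can be sketched as follows. The function names cq_insert() and cq_delete() are hypothetical; both return the new value of the rear pointer, with NULL denoting an empty queue.

```c
#include <stdlib.h>
#include <string.h>

typedef struct cl_node {
    char info[20];
    struct cl_node *link;
} nodetype;

/* Insert element at the rear of the queue. cllist points to the rear
   node (NULL if the queue is empty); the front is cllist->link.
   Returns the new rear, or the old one if allocation fails. */
nodetype *cq_insert(nodetype *cllist, const char *element)
{
    nodetype *ptr = malloc(sizeof *ptr);
    if (ptr == NULL)
        return cllist;
    strncpy(ptr->info, element, sizeof ptr->info - 1);
    ptr->info[sizeof ptr->info - 1] = '\0';
    if (cllist == NULL) {
        ptr->link = ptr;             /* a single node points to itself */
    } else {
        ptr->link = cllist->link;    /* the new rear points to the front */
        cllist->link = ptr;
    }
    return ptr;
}

/* Delete the front node, copying its information into element.
   Returns the new rear (NULL when the queue becomes empty).
   An empty queue is returned unchanged and element is not touched. */
nodetype *cq_delete(nodetype *cllist, char *element)
{
    nodetype *front;
    if (cllist == NULL)
        return NULL;
    front = cllist->link;
    strcpy(element, front->info);
    if (front == cllist)
        cllist = NULL;               /* single-node queue */
    else
        cllist->link = front->link;
    free(front);
    return cllist;
}
```

Both operations take constant time, and the queue needs one pointer instead of the qfront and qrear pair used earlier.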
These lists are also known as symmetrically linked lists. It is very clear that with such lists
it is possible to move in either direction through the list by keeping only one pointer. Though
traversal in both directions is possible, it is achieved at the cost of extra space for the
predecessor links. As a variant of this, as in the singly linked list, we can introduce a fixed head node for
the doubly linked list also. This will simplify the basic operations on lists. Furthermore, for easy
access to either end of the list we can make this doubly linked list with dummy head node
circular as well. With such modifications our doubly linked list will take the following form.
The pointer implementation of doubly linked list requires the following declarations.
typedef struct listnode{
char info[20]; /* Information part */
struct listnode *plink, *slink;
/* Predecessor & successor links */
} nodetype;
nodetype *dllist;
An empty doubly linked list may be created by using the function
createD(nodetype **nodeptr)
{
*nodeptr = (nodetype *) malloc(sizeof(nodetype));
(*nodeptr)->plink = (*nodeptr)->slink = *nodeptr; /* circular: the head of an empty list points to itself */
return;
}
A call to createD (&dllist) will create an empty doubly linked list which may be
depicted as
dllist
The function emptyD() to check if a doubly linked list is empty may be written as
follows.
emptyD(nodetype *nodeptr)
{
return (nodeptr->plink == nodeptr);
}
In fact, we may equally return the value of the expression nodeptr->slink == nodeptr.
Inserting a node into a doubly linked list after the node pointed by prevptr needs a little
care. Consider the portion of such a doubly linked list as depicted below which contains the city
names 'Bangalore' and 'Surat'. Let prevptr point to the node with city name 'Bangalore' and we
need to insert the node with city name 'Chennai' after it.
Let ptr point to the node to be inserted. For insertion, the order of the link adjustments is
as follows. First, we need to set the predecessor and successor links of the node to be inserted.
That is, (i) ptr->plink = prevptr; and (ii) ptr->slink = prevptr->slink;
[Figure: the node pointed to by ptr being inserted after prevptr, with the link changes shown as dashed lines]
The above two instructions may be executed in either order. They set the
link fields of the node we are inserting. Next, the existing links are reset so that the node
pointed to by ptr is inserted at its proper place: (iii) prevptr->slink =
ptr; and (iv) ptr->slink->plink = ptr;
These reset instructions may also come in either order, but the set instructions must be
executed before the reset instructions; otherwise the old value of prevptr->slink is lost.
The order of these instructions is marked in the figure with dashed lines. The above discussion
leads to the following function insertdllist() to insert a node into a doubly linked list.
ptr=(nodetype *) malloc(sizeof(nodetype));
strcpy(ptr->info, element);
ptr->plink = prevptr;
ptr->slink = prevptr->slink;
prevptr->slink = ptr;
ptr->slink->plink = ptr;
return;
}
The deletion operation is simpler and only requires resetting the slink part of the
predecessor of the node to delete and the plink part of its successor. The function
deletedllist() may be written as
deletedllist(nodetype *nodeptr)
{
/* nodeptr points to the node to delete */
nodeptr->plink->slink = nodeptr->slink;
nodeptr->slink->plink = nodeptr->plink;
free(nodeptr);
return;
}
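The four link adjustments for insertion and the two for deletion can be tried out with the following sketch of a circular doubly linked list with a dummy head node. It assumes that an empty list is one whose head points to itself in both directions; the return values are also assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

typedef struct listnode {
    char info[20];                   /* information part */
    struct listnode *plink, *slink;  /* predecessor and successor links */
} nodetype;

/* Create an empty list: a dummy head whose links point to itself. */
nodetype *createD(void)
{
    nodetype *head = malloc(sizeof *head);
    if (head == NULL)
        exit(1);
    head->info[0] = '\0';
    head->plink = head->slink = head;
    return head;
}

int emptyD(const nodetype *head)
{
    return head->slink == head;
}

/* Insert element after the node pointed to by prevptr and
   return a pointer to the new node. */
nodetype *insertdllist(nodetype *prevptr, const char *element)
{
    nodetype *ptr = malloc(sizeof *ptr);
    if (ptr == NULL)
        exit(1);
    strncpy(ptr->info, element, sizeof ptr->info - 1);
    ptr->info[sizeof ptr->info - 1] = '\0';
    ptr->plink = prevptr;            /* (i)   set links of the new node */
    ptr->slink = prevptr->slink;     /* (ii)                            */
    prevptr->slink = ptr;            /* (iii) reset the existing links  */
    ptr->slink->plink = ptr;         /* (iv)                            */
    return ptr;
}

/* Delete the node pointed to by nodeptr (which must not be the head). */
void deletedllist(nodetype *nodeptr)
{
    nodeptr->plink->slink = nodeptr->slink;
    nodeptr->slink->plink = nodeptr->plink;
    free(nodeptr);
}
```

Because the list is circular, ptr->slink in step (iv) is never NULL, so the same code works at either end of the list.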
[Figure: the two polynomial lists pointed to by uptr and vptr, and the initially empty sum list fptr]
cient parts of these nodes are added. If this sum is nonzero, a node is created with coefficient
part as this sum, exponent part as the common exponents of the nodes pointed by u and v. This
created node is then attached at the end of the list pointed by fptr. That is, the created node is
attached to fptr after the node pointed by f, the last node. But if this sum is zero, then no node is
added to fptr, instead u and v are advanced to point to their successors.
On comparison of the exponent in the nodes pointed by u and v if we find that they are
different a node is created which is a copy of the node containing the smaller exponent with its
link part set to NULL and inserted at the end of the list pointed by fptr. Then the pointers (u or
v) which point to the node with smaller exponent and f are advanced to point the successor
node of the corresponding list.
The above process is continued till u or v becomes NULL. The changes at each step in the
lists pointed by uptr, vptr, and fptr are shown below.
[Figures for Steps 1 to 4: the lists pointed to by uptr, vptr and fptr after each step, with the current positions of u, v and f marked]
When the end of one of the lists (uptr or vptr) is reached, the remaining nodes of the other
list are attached at the end of fptr, to get the linked representation of sum polynomial (pointed
by fptr).
[Figure: the final state of the lists pointed to by uptr, vptr and fptr, with the remaining nodes attached at the end of fptr]
Actual C functions for such polynomial operations are not written here and are left to the
reader. We can build algorithms for other polynomial operations, such as multiplication of two
polynomials and evaluation of a polynomial for a given value of x, with a little care.
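As a starting point for the reader, the addition procedure described above might be sketched as follows. The type polytype, the member names coef and expo, and the helper attach() are assumptions made for illustration; each polynomial is taken to be a singly linked list of terms in increasing order of exponent.

```c
#include <stdlib.h>

typedef struct poly_node {
    int coef;                    /* coefficient part */
    int expo;                    /* exponent part */
    struct poly_node *link;
} polytype;

/* Append a term to the list whose head is *headp and whose
   last node is *lastp (both NULL for an empty list). */
void attach(polytype **headp, polytype **lastp, int coef, int expo)
{
    polytype *ptr = malloc(sizeof *ptr);
    if (ptr == NULL)
        exit(1);
    ptr->coef = coef;
    ptr->expo = expo;
    ptr->link = NULL;
    if (*lastp == NULL)
        *headp = ptr;
    else
        (*lastp)->link = ptr;
    *lastp = ptr;
}

/* Return a new list representing the sum of the polynomials u and v. */
polytype *polyadd(const polytype *u, const polytype *v)
{
    polytype *fptr = NULL, *f = NULL;
    while (u != NULL && v != NULL) {
        if (u->expo == v->expo) {
            int sum = u->coef + v->coef;
            if (sum != 0)                /* a zero term is not stored */
                attach(&fptr, &f, sum, u->expo);
            u = u->link;
            v = v->link;
        } else if (u->expo < v->expo) {  /* copy the smaller exponent */
            attach(&fptr, &f, u->coef, u->expo);
            u = u->link;
        } else {
            attach(&fptr, &f, v->coef, v->expo);
            v = v->link;
        }
    }
    /* one list is exhausted: copy what remains of the other */
    for (; u != NULL; u = u->link)
        attach(&fptr, &f, u->coef, u->expo);
    for (; v != NULL; v = v->link)
        attach(&fptr, &f, v->coef, v->expo);
    return fptr;
}
```

For instance, adding 7 - 7x^51 + 5x^99 and 2 + 9x^33 + 7x^51 - 11x^77 with this sketch gives 9 + 9x^33 - 11x^77 + 5x^99; the x^51 terms cancel and no node is created for them.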
manipulate positive integers which are larger than this size is not possible in a straightforward
manner. Rather, first we need to choose a data structure that can be used to represent a
large integer.
It seems very natural to choose a linked list as the data structure to represent a large
integer, because there is no upper bound on the number of digits in the integer. For simplicity,
we consider large positive integers only. The processing of such linked list representation of
large positive integers will require a frequent back and forth movement through the linked list.
So a circular doubly linked list representation would be a better choice in this case. To make the
list operations simpler we choose the circular doubly linked list with a fixed head node to represent a
large positive integer. Each node in this linked list, except possibly the first, will store a three-digit
integer (say), which corresponds to a group of three consecutive digits in the large
positive integer. The first node may contain an integer with fewer than three digits.
For example, the positive integer 72,165,834,982 is represented as the following circular
linked list with fixed head pointed to by operand
To create this doubly linked list for the above integer the input should be in a group of
three digits separated by blanks as shown below.
72 165 834 982
We need a declaration of the form given below to create such lists.
typedef struct list_node{
int info;
struct list_node *plink, *slink;
} nodetype;
nodetype *nodeptr;
To create a circular doubly linked list with fixed head pointed by nodeptr we first create
a node of nodetype pointed by nodeptr as follows.
nodeptr = (nodetype *) malloc(sizeof(nodetype));
nodeptr->plink = nodeptr->slink = nodeptr;
This may be depicted as
The first part of the large integer (in this case 72) is read and stored in a variable, say num.
A new node is then created whose info member holds the value of num.
This may be done by
ptr=(nodetype *)malloc(sizeof(nodetype));
ptr->info = num;
ptr is the pointer to this created node. Now this node is inserted as the last node of the circular
doubly linked list pointed by nodeptr. This may be achieved by executing the following code:
ptr->plink = nodeptr->plink;
ptr->slink = nodeptr;
nodeptr->plink->slink = ptr;
nodeptr->plink = ptr;
The following figure shows the state of the list.
nodeptr
The same process is repeated by reading the next part of the large integer and is continued
until we finish with all the parts of the integer that have come in the input. A function to read
such a long integer, forming a circular doubly linked list with fixed head, is given below. This
function readint() returns a pointer to nodetype which is the pointer to the head of the linked
list representing the large integer.
nodetype *readint()
{
nodetype *nodeptr,*ptr;
int num;
nodeptr = (nodetype *)malloc(sizeof(nodetype));
nodeptr->plink = nodeptr->slink = nodeptr;
while(scanf("%d", &num) != EOF)
{
ptr = (nodetype *)malloc(sizeof(nodetype));
ptr->info = num;
ptr->plink = nodeptr->plink;
ptr->slink = nodeptr;
nodeptr->plink->slink = ptr;
nodeptr->plink = ptr;
}
return nodeptr;
}
Consider the following circular doubly linked lists with head nodes pointed to by operand1
and operand2 representing two large integers 2,583,647 and 72,165,834,982.
To add the integers represented by the above doubly linked lists we traverse the lists
from right to left (starting from the end of the list), add the two three-digit integers in the
corresponding nodes and the carry digit from the previous pair of nodes processed, to get the sum and
the new carry digit. We then create a node to store this sum and insert it at the front of another circular
doubly linked list with fixed head that will represent the total of the two given large integers.
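The right-to-left traversal with carry can be condensed into the following sketch. The function names newlist(), append(), attach_front() and addint() are hypothetical, and the two operands are handled in a single loop rather than in two separate loops; the node type follows the declaration given above.

```c
#include <stdlib.h>

#define SIZE 1000                /* each node holds a group of three digits */

typedef struct list_node {
    int info;
    struct list_node *plink, *slink;
} nodetype;

/* Create an empty circular doubly linked list with a fixed head node. */
nodetype *newlist(void)
{
    nodetype *head = malloc(sizeof *head);
    if (head == NULL)
        exit(1);
    head->info = 0;
    head->plink = head->slink = head;
    return head;
}

/* Insert a node holding num just after the node pointed to by prev. */
void insert_after(nodetype *prev, int num)
{
    nodetype *ptr = malloc(sizeof *ptr);
    if (ptr == NULL)
        exit(1);
    ptr->info = num;
    ptr->plink = prev;
    ptr->slink = prev->slink;
    prev->slink = ptr;
    ptr->slink->plink = ptr;
}

void append(nodetype *head, int num)       { insert_after(head->plink, num); }
void attach_front(nodetype *head, int num) { insert_after(head, num); }

/* Add the integers represented by the lists a and b, group by group,
   starting from the least significant groups at the ends of the lists. */
nodetype *addint(const nodetype *a, const nodetype *b)
{
    nodetype *total = newlist();
    const nodetype *p = a->plink, *q = b->plink;
    int carry = 0;
    while (p != a || q != b) {
        int sum = carry;
        if (p != a) { sum += p->info; p = p->plink; }
        if (q != b) { sum += q->info; q = q->plink; }
        attach_front(total, sum % SIZE);
        carry = sum / SIZE;
    }
    if (carry)
        attach_front(total, carry);
    return total;
}
```

For 2,583,647 and 72,165,834,982 the groups are added as 647 + 982 = 1629 (store 629, carry 1), 583 + 834 + 1 = 1418 (store 418, carry 1), 2 + 165 + 1 = 168 and finally 72, giving 72,168,418,629.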
For a better understanding we present a C program in Example 9.1 that
adds two large integers given as input and prints the result. The integers are represented as
doubly linked lists.
Example 9.1
#include <stdio.h>
#include <stdlib.h>
#define SIZE 1000
#define LEN 3
typedef struct list_node{
int info;
struct list_node *plink, *slink;
} nodetype;
nodetype *operand1, *operand2, *total, *readint(), *addnodeint();
main()
{
printf ("Enter two integers in group of %d \n",LEN);
printf ("\nseparating each group by space :\n");
printf ("First integer :");
operand1 = readint();
print_dlist(operand1);
printf ("Second integer :");
operand2 = readint();
print_dlist(operand2);
total = addnodeint(operand1, operand2);
printf("Sum of these two integers = ");
print_dlist(total);
return 0;
}
nodetype *readint()
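print_dlist(nodetype *nodeptr)
{
/* function to traverse the circular doubly linked list with fixed
head node pointed by nodeptr and prints the information contents of
all the nodes */
nodetype *ptr;
ptr=nodeptr->slink;
while (ptr !=nodeptr)
{
/* every group after the first is padded to LEN digits, so that
1,005 is not printed as 1,5 */
if (ptr == nodeptr->slink)
printf("%d", ptr->info);
else
printf(",%0*d", LEN, ptr->info);
ptr = ptr->slink;
}
putchar ('\n');
return;
}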
{
/* add integers in nodes pointed by temp1 & temp2 */
sum = temp1->info + temp2->info + carry;
carry = sum / SIZE;
attach (total, sum%SIZE);
temp1 = temp1->plink;
temp2 = temp2->plink;
}
ptr = (temp1 == ptr1) ? temp2 : temp1;
headptr = (ptr == temp1) ? ptr1 : ptr2; /* select list to continue with */
while (ptr != headptr)
{
sum = ptr->info + carry;
carry = sum / SIZE;
attach (total, sum%SIZE);
ptr = ptr->plink;
}
if (carry)
attach (total, carry);
return total;
}
nodetype *temp;
EXERCISES
1. A doubly linked list is a list in which each element contains a pointer to the previous
element as well as to the next element in the list. There is also a pointer head to the leftmost
element in the list, and a pointer tail to the rightmost element. Both head->prev and
tail->next are set to NULL. Write a C program that creates, destroys and prints such a list.
2. Given a doubly linked list, write a function that removes a node from the list and inserts it
in the front.
3. Write a program that converts a linear singly linked list into a circular linked list.
4. Write functions to perform each of the following operations for circular lists.
(a) Append an element to the end of the list.
(b) Delete the last element from the list.
5. Write algorithms and C routines to perform each of the following operations for doubly
linked circular lists.
(a) Concatenate two lists.
(b) Delete the nth element from a list.
(c) Delete the last element from a list.
(d) Make a second copy of the list.
6. Write a C function mult(x, y) to multiply two long integers represented by doubly linked
circular lists.
7. Write a routine to merge two circular lists A and B to produce a resultant list C. You need
not create a new list; the nodes of the old lists should now appear in the concatenated list.
8. Write a function for a doubly linked circular list which reverses the direction of the links.
9. Using the doubly linked list structure, write a routine back(n), which moves you backward
by n nodes in the list.
10. How can a polynomial in three variables (x, y & z) be represented by a circular list? Write
functions to do the following:
(a) Add two such polynomials.
(b) Multiply two such polynomials.
SORTING
10.1 INTRODUCTION
Sorting is a fundamental operation in computer science. A good deal of effort in computing is
related to making data available to users in some particular order. The concept of an ordered set
of data is one that has considerable impact on our daily lives. For example, lists of names are
frequently printed in alphabetical order and mailing labels are often printed in pin-code order.
Sorting refers to arranging data in some given order, such as increasing or decreasing, with
numeric data or alphabetically, with character data.
In this chapter we are concerned with rearranging the data so that it is in sorted order.
There are two important and largely disjoint categories related to sorting data—internal sorting
and external sorting. Internal sorting takes place in the main memory of a computer, where we
can use the random access capability of the main memory to take advantage in various ways.
External sorting is necessary when the data to be sorted is too large to fit in the main memory.
Many different sorting algorithms have been invented, and we will describe some of the
common sorting techniques and the advantages and disadvantages of one technique over the
other.
10.2 SORTING TECHNIQUES
10.2.1 Insertion Sort
The sorting method that we shall consider first is called 'insertion sort'. Insertion sort works the
way we might put a hand of cards in order. The hand is scanned for the first card that is lower
than the one to the left. When such a case is found, the smaller card is picked out and moved to
or inserted at the correct location. Fig. 10.1 illustrates the process for five elements, each of
which is an integer. The input data, an array of five integers, is shown in Fig. 10.1(a).
Fig. 10.1 Insertion sort. The integers that are known to be sorted at the beginning of each step are
underlined
In the insertion sort, the first two numbers are compared. If the one on the right is smaller
than the one on the left, the second number is inserted in front of the first number.
The first number slides over and takes the place of the second number. The process goes
on by scanning to the right until a smaller number is found. It is inserted at the correct location,
and the rest of the numbers slide one position to the right. This process is repeated until the end
of the list is reached. The list is then sorted. The basic operation is thus the insertion of a single
element into a sequence of sorted elements so that the resulting sequence is still sorted.
Consider the figure containing an array of five integers. When the fifth element, 235, is
considered by itself, it is a sorted list of length one. The transition from Figs. 10.1(a)-10.1(b)
consists of inserting 46 in the list of elements that is already sorted. Since 46 is less than 235, the
insertion of 46 is at top of the list and the sorted segment of the list now has a length of two.
Next, since 162 lies between 46 and 235, it is inserted between them by moving 46 up to make room.
The sorted subset of the list has now grown to a length of three. This is shown in Fig. 10.1(c).
Fig. 10.1(d) is obtained by inserting 205 into the list of elements that is already sorted.
This is accomplished by moving 46 and 162 up to make room. Finally, Fig. 10.1(e) is obtained by
inserting 390 into the list of elements (of length four) that is already sorted.
Algorithm 10.1 implements insertion sort as we just described it.
Algorithm 10.1: Insertion sort
Input: An array A with n elements, A[1], ..., A[n]; the extra location A[n+1] serves as a sentinel
Output: The array A sorted into ascending order
Step 1: for k = n-1 to 1 by -1 do
Step 2:     j = k+1
Step 3:     S = A[k]
Step 4:     A[n+1] = S
Step 5:     while (S > A[j]) do
Step 6:         A[j-1] = A[j]
Step 7:         j = j+1
            end
Step 8:     A[j-1] = S
        end
Step 9: Stop
The program for insertion sort is as follows.
Example 10.1
    for (i = 0; i < n; i++)
        printf("%3d", x[i]);
    printf("\n");
    /* insert x[i] into the sorted part x[0..i-1] by exchanging it leftwards */
    for (i = 1; i < n; i++)
        for (j = i; j > 0 && x[j] < x[j-1]; j--)
        {
            t = x[j];
            x[j] = x[j-1];
            x[j-1] = t;
        }
    printf("\n\n Output data: \n \n");
    for (i = 0; i < n; i++)
        printf("%3d", x[i]);
    printf("\n");
}
Input data:
42 34 56 23 78 90
Output data:
23 34 42 56 78 90
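Since the listing above survives only in part, the following is a minimal self-contained sketch of the same idea. It is not the book's program: the function name insertion_sort and the 0-based indexing are assumptions made for illustration, and elements are shifted rather than exchanged pairwise.

```c
#include <stddef.h>

/* Sort a[0..n-1] into ascending order by straight insertion.
   (Illustrative name and signature; not taken from the text.) */
void insertion_sort(int a[], size_t n)
{
    for (size_t i = 1; i < n; i++) {
        int key = a[i];              /* element to insert into a[0..i-1] */
        size_t j = i;
        /* Shift every larger element one place to the right. */
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;                  /* drop the element into the gap */
    }
}
```

Calling insertion_sort(x, 6) on the input data shown above should reproduce the output data of Example 10.1.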
390     45     45     45     45
205    205    182    182    182
182    182    205    205    205
 45    390    390    390    235
235    235    235    235    390
Fig. 10.2 Selection sort (smallest element after each step is underlined)
The smallest element, 45, is selected and exchanged with the first element, 390. This
produces a sorted list of length one. Next, the smallest element among the second through the last
element is selected and then exchanged with the second integer in the list. This produces a
sorted list of length two. In the next step the transition is accomplished by selecting the smallest
integer among the third through the last element in the array. This element is then exchanged
with the third element in the sequence. The sorted list is now of length three. Finally, the smallest
integer is selected among the fourth and fifth elements and that element is exchanged with the
fourth element. The entire list is now sorted.
The following algorithm implements the selection sort as we just described.
Algorithm 10.2: Selection sort
Input: An array A with n elements, A[1], ..., A[n]
Output: The array A sorted into ascending order
Step 1: if (n > 1) then do
Step 2:     for k = 1 to n-1 do
Step 3:         small = k
Step 4:         for j = k+1 to n do
Step 5:             if (A[j] < A[small]) then do
Step 6:                 small = j
                    end
                end
Step 7:         X = A[k]
Step 8:         A[k] = A[small]
Step 9:         A[small] = X
            end
        end
Step 10: stop
The program for selection sort is as follows.
Example 10.2
            if (x[j] < x[small])
                small = j;
        /* exchange the smallest remaining element with x[i] */
        t = x[i];
        x[i] = x[small];
        x[small] = t;
    }
    printf("\n\n Output data:\n\n");
    for (i = 0; i < 6; i++)
        printf("%3d", x[i]);
    printf("\n");
}
Input data:
6 5 4 3 2 1
Output data:
1 2 3 4 5 6
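Because only the tail of the listing above is available, here is a minimal complete sketch of selection sort along the lines of Algorithm 10.2. The function name selection_sort and the 0-based indexing are illustrative assumptions, not the book's code.

```c
#include <stddef.h>

/* Sort a[0..n-1] into ascending order by repeated selection of the minimum.
   (Illustrative name and signature; not taken from the text.) */
void selection_sort(int a[], size_t n)
{
    for (size_t i = 0; i + 1 < n; i++) {
        size_t small = i;                 /* index of smallest element seen so far */
        for (size_t j = i + 1; j < n; j++)
            if (a[j] < a[small])
                small = j;
        if (small != i) {                 /* at most one exchange per pass */
            int t = a[i];
            a[i] = a[small];
            a[small] = t;
        }
    }
}
```

Each pass performs at most one exchange, which is why the method needs only O(n) moves in total.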
In Fig. 10.3, the first pass is made over the data given in Fig. 10.3(a). In Fig. 10.3(a) 390 is
compared with 206. They are interchanged since 206 is smaller. The result is shown in Fig. 10.3(b).
Next, 390 is compared with 162, and they are exchanged. This is shown in Fig. 10.3(c).
The process is then repeated and finally the result is seen in Fig. 10.3(e).
The first pass moves the largest element to the nth position, forming a sorted list of length
one. The second pass moves the second largest element to the (n-1)th position. In general, after
the ith pass, the i largest elements will be in the last i positions, so pass i+1 need only consider
the first n-i elements. In the meantime the small elements move slowly, or bubble, towards the top.
So this sorting method is referred to as bubble sort.
If no exchange is made during one pass over the data, the data are sorted and the process
terminates. The following algorithm shows how an exchange or bubble sort can be implemented.
Algorithm 10.3: Bubble sort
Input: An array A with n elements, A[1], ..., A[n]
Output: The array A sorted into ascending order
Step 1: K = n
Step 2: while (K > 1) do
Step 3:     for j = 1 to K-1 do
Step 4:         if (A[j] > A[j+1]) then do
Step 5:             p = A[j]
Step 6:             A[j] = A[j+1]
Step 7:             A[j+1] = p
                end
            end
Step 8:     K = K-1
        end
Step 9: stop
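The early-termination rule described in the text (stop as soon as a pass makes no exchange) is not part of Algorithm 10.3 as stated. A minimal sketch that adds it is given here; the function name bubble_sort, the exchanged flag and the returned pass count are assumptions introduced for illustration.

```c
#include <stddef.h>

/* Bubble sort on a[0..n-1] with early exit.
   Returns the number of passes actually made.
   (Illustrative name and return value; not taken from the text.) */
size_t bubble_sort(int a[], size_t n)
{
    size_t passes = 0;
    for (size_t k = n; k > 1; k--) {          /* a[k..n-1] is already in place */
        int exchanged = 0;
        for (size_t j = 0; j + 1 < k; j++) {
            if (a[j] > a[j + 1]) {            /* adjacent pair out of order */
                int p = a[j];
                a[j] = a[j + 1];
                a[j + 1] = p;
                exchanged = 1;
            }
        }
        passes++;
        if (!exchanged)                       /* no exchange: data are sorted */
            break;
    }
    return passes;
}
```

On data that are already sorted the function makes a single pass of n-1 comparisons and stops.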
The program for bubble sort is given below. Note that, in this listing, each pass selects the
smallest remaining element and exchanges it with X[i], rather than exchanging adjacent pairs.
Example 10.3
/* Bubble sort program (each pass moves the smallest remaining element into X[i]) */
#include <stdio.h>
#define n 10
int X[] = {12, 15, 5, 9, 4, 7, 3, 19, 4, 20};
int main(void)
{
    int i, j, k, temp, min;
    /* Unsorted list */
    printf("\n\t Unsorted List \n\n\n");
    for (i = 0; i < n; i++)
        printf("%4d", X[i]);
    printf("\n\n\n");
    for (i = 0; i < n-1; i++)
    {
        min = i;
        for (j = i+1; j < n; j++)
            if (X[j] < X[min])
                min = j;
        temp = X[i];
        X[i] = X[min];
        X[min] = temp;
        printf("\n pass = %4d \n\n", i+1);
        for (k = 0; k < n; k++)
            printf("%3d", X[k]);
        printf("\n");
    }
    /* Sorted list */
    printf("\n\n\n\t Sorted List \n\n");
    for (i = 0; i < n; i++)
        printf("%3d", X[i]);
    return 0;
}
Unsorted List
12 15 5 9 4 7 3 19 4 20
pass = 1
3 15 5 9 4 7 12 19 4 20
pass = 2
3 4 5 9 15 7 12 19 4 20
pass = 3
3 4 4 9 15 7 12 19 5 20
pass = 4
3 4 4 5 15 7 12 19 9 20
pass = 5
3 4 4 5 7 15 12 19 9 20
pass = 6
3 4 4 5 7 9 12 19 15 20
pass = 7
3 4 4 5 7 9 12 19 15 20
pass = 8
3 4 4 5 7 9 12 19 15 20
pass = 9
3 4 4 5 7 9 12 15 19 20
Sorted List
3 4 4 5 7 9 12 15 19 20
Complexity of selection sort The selection sort algorithm consists of two loops, an outer loop
and an inner loop, each implemented with a 'for' statement. The first pass makes
n-1 comparisons to find the smallest element, the second pass makes n-2 comparisons to find
the second smallest element, and so on. Thus the number of comparisons, f(n), is the same in the
worst case as in every other case:
f(n) = (n-1) + (n-2) + ... + 2 + 1 = n(n-1)/2 = O(n^2)
The number of interchanges and assignments is independent of the data being sorted.
The selection sort always requires O(n^2) comparisons no matter how the data are initially
ordered, but it never requires more than O(n) moves. Selection sort therefore performs best
when the items are large, so that moves are expensive. It requires fewer moves than an insertion
sort, and suits cases where comparing items is not the dominant activity.
The following table presents a comparative study of the three algorithms discussed so far.
Table 10.1
Method            Worst Case              Average Case
Bubble sort       n(n-1)/2 = O(n^2)       n(n-1)/2 = O(n^2)
Insertion sort    n(n-1)/2 = O(n^2)       n(n-1)/4 = O(n^2)
Selection sort    n(n-1)/2 = O(n^2)       n(n-1)/2 = O(n^2)
    gap = n;    /* the first span equals n, as in the sample run below */
    for (; gap > 0; gap = gap/2)
    {
        printf("\n\n\t\t Span Size = %4d \n\n", gap);
        /* exchange elements that are gap positions apart and out of order */
        for (i = gap; i < n; i++)
            for (j = i - gap; j >= 0; j = j - gap)
                if (x[j] > x[j+gap])
                {
                    temp = x[j];
                    x[j] = x[j+gap];
                    x[j+gap] = temp;
                }
        for (k = 0; k < n; k++)
            printf("%4d", x[k]);
    }
    printf("\n\n\t\t Sorted data \n\n");
    for (i = 0; i < n; i++)
        printf("%4d", x[i]);
}
Unsorted Data
9 12 45 7 53 12 78 90 10 65 3 19 47
Span Size = 13
9 12 45 7 53 12 78 90 10 65 3 19 47
Span Size = 6
9 12 10 7 3 12 47 90 45 65 53 19 78
Span Size = 3
7 3 10 9 12 12 47 53 19 65 90 45 78
Span Size = 1
3 7 9 10 12 12 19 45 47 53 65 78 90
Sorted Data
3 7 9 10 12 12 19 45 47 53 65 78 90
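The listing above is incomplete, so a minimal self-contained sketch of shell sort follows. It assumes the halving span sequence n/2, n/4, ..., 1 (which gives the spans 6, 3 and 1 for thirteen elements) and uses a gapped insertion rather than pairwise exchanges; the function name shell_sort is illustrative.

```c
#include <stddef.h>

/* Shell sort on a[0..n-1] with spans n/2, n/4, ..., 1.
   (Illustrative name; the span sequence is an assumption.) */
void shell_sort(int a[], size_t n)
{
    for (size_t gap = n / 2; gap > 0; gap /= 2) {
        /* Insertion sort applied to elements that are gap positions apart. */
        for (size_t i = gap; i < n; i++) {
            int key = a[i];
            size_t j = i;
            while (j >= gap && a[j - gap] > key) {
                a[j] = a[j - gap];
                j -= gap;
            }
            a[j] = key;
        }
    }
}
```

The final pass, with a span of 1, is an ordinary insertion sort, but by then the array is nearly ordered and few elements need to move.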
Complexity of shell sort The main reason why straight insertion is relatively slow is that the
items are moved only one position at a time. Thus the average running time is O(n^2). Shell proposed an
efficient variant in which the insertion process is done in several passes of successive refinement.
For a given input size n, the passes are determined by an 'increment sequence' h_t, h_(t-1),
..., h_1, where h_1 = 1. In the early passes (when the increments are typically large), elements can be
displaced far from their previous positions with only a few comparisons. The later passes "fine
tune" the placement of elements. The last pass, when h = 1, consists of a single straight
insertion sort of the entire array. The resulting expected running time is O(n^(5/3)). For a poor
choice of increments, however, the running time can still be proportional to n^2 in the worst case.
The shell sort is also called the diminishing increment sort because the increment
sequence continually decreases. The method is most efficient if the successive values of h are kept
relatively prime to each other. Knuth has mathematically estimated that, with relatively prime
values of h, the shell sort will execute in an average time proportional to O(n (log2 n)^2).
Each step of quick sort partitions a given list into three disjoint sublists. One of these
sublists is a single element that is in its sorted position. The sublist to the left of the sorted
element contains elements that are less than the sorted element, whereas the sublist to the right
of the sorted element contains elements that are greater than the sorted element. The sorting problem
is thus partitioned into two smaller sorting problems, which are partitioned further in the same way.
Let us illustrate the quick sort with an example. If the initial array is given as
26 57 48 37 12 92 86 33
and suppose we want to place the first element 26 in its proper position, the resulting array is
12 26 57 48 37 92 86 33
Note that the element to the left of 26, namely 12, is less than or equal to 26, and each element
to the right of 26 is greater than or equal to 26.
Since 26 is in its final position the original array has been decomposed into the problem of
sorting two subarrays
(12) and (57, 48, 37, 92, 86, 33)
The entire array is viewed as
12 26 (57, 48, 37, 92, 86, 33)
where parentheses enclose the subarrays that are yet to be sorted. Repeating the process
on the subarray on the right of 26 yields
12 26 (48, 37, 33) 57 (92, 86)
and further repetitions yield
12 26 (37, 33) 48 57 (92, 86)
12 26 (33) 37 48 57 (92, 86)
12 26 33 37 48 57 (92, 86)
12 26 33 37 48 57 (86) 92
12 26 33 37 48 57 86 92
Note that the final array is now sorted. The algorithm for quick sort is presented below.
Algorithm 10.5: Quick sort
Input: An array A of size n containing n distinct numbers, A[1], ..., A[n].
Output: The elements of array A rearranged into ascending order.
Step 1: left = 1, right = n
Step 2: Call function qs(A, left, right)
Step 3: Stop
Function: qs(A, left, right)
Step 1: if (left < right) then do
Step 2:     j = left + 1
Step 3:     K = right
Step 4:     while (j <= K) do
Step 5:         while (j <= right and A[j] <= A[left]) do
Step 6:             j = j + 1
                end
Step 7:         while (A[K] > A[left]) do
Step 8:             K = K - 1
                end
Step 9:         if (j < K) then do
Step 10:            s = A[j]
Step 11:            A[j] = A[K]
Step 12:            A[K] = s
                end
            end
Step 13:    s = A[left]
Step 14:    A[left] = A[K]
Step 15:    A[K] = s
Step 16:    qs(A, left, K-1)     /* Recursive call */
Step 17:    qs(A, K+1, right)    /* Recursive call */
        end
Step 18: Return
The program for quick sort is given below.
Example 10.5
#include <stdio.h>
int X[] = {5,2,4,7,1,4,3,8,9,4,6,7,10,12,2,15};
/* Quick sort program */
void quick(int x[], int l, int r);
int main(void)
{
    int l, r, i;
    l = 0;
    r = 15;
    printf("\n\t\t Unsorted data \n");
    for (i = 0; i < 16; i++)
        printf("%3d", X[i]);
    printf("\n");
    quick(X, l, r);
    printf("\n\t\t Sorted data \n");
    for (i = 0; i < 16; i++)
        printf("%3d", X[i]);
    printf("\n");
    return 0;
}
void quick(int x[], int l, int r)
{
    int i, j, pivot, t;
    if (r > l)
    {
        pivot = x[l];
        i = l + 1;
        j = r;
        do
        {
            /* move i right past elements <= pivot, and j left past elements > pivot */
            while (x[i] <= pivot && i < r) i++;
            while (x[j] > pivot && j > l) j--;
            if (i < j)
            {
                t = x[i];
                x[i] = x[j];
                x[j] = t;
            }
            printf("\n pivot = %d \n", x[i]);
pivot=4
12 2 3 4 4 4 5 9 8 6 7 10 12 7 15
pivot=4
1 2 2 3 4 4 4 5 9 8 6 7 10 12 7 15
pivot=5
1 2 2 3 4 4 4 5 9 8 6 7 10 12 7 15
pivot=9
1 2 2 3 4 4 4 5 9 8 6 7 7 12 10 15
pivot=9
1 2 2 3 4 4 4 5 9 8 6 7 7 12 10 15
pivot=7
1 2 2 3 4 4 4 5 7 7 6 8 9 12 10 15
pivot=7
1 2 2 3 4 4 4 5 7 7 6 8 9 12 10 15
pivot=6
12 2 3 4 4 4 5 6 7 7 8 9 12 10 15
pivot=6
1 2 2 3 4 4 4 5 6 7 7 8 9 12 10 15
pivot=7
12 2 3 4 4 4 5 6 7 7 8 9 12 10 15
pivot=7
1 2 2 3 4 4 4 5 6 7 7 8 9 12 10 15
pivot=8
1 2 2 3 4 4 4 5 6 7 7 8 9 12 10 15
pivot=9
12 2 3 4 4 4 5 6 7 7 8 9 12 10 15
pivot=12
12 2 3 4 4 4 5 6 7 7 8 9 12 10 15
pivot=12
12 2 3 4 4 4 5 6 7 7 8 9 10 12 15
Sorted Data
12 2 3 4 4 4 5 6 7 7 8 9 10 12 15
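As the program listing above breaks off inside the partitioning loop, a minimal complete sketch of quick sort is given here for reference. It takes the first element of each sublist as the pivot, as the example in the text does; the names quick_sort and swap_int are illustrative assumptions and the tracing output is omitted.

```c
/* Exchange two integers. (Illustrative helper; not taken from the text.) */
static void swap_int(int *p, int *q)
{
    int t = *p;
    *p = *q;
    *q = t;
}

/* Sort a[left..right] into ascending order, using a[left] as the pivot. */
void quick_sort(int a[], int left, int right)
{
    if (left >= right)
        return;                       /* zero or one element: nothing to do */
    int pivot = a[left];
    int i = left + 1, j = right;
    while (i <= j) {
        while (i <= right && a[i] <= pivot) i++;   /* skip elements <= pivot */
        while (a[j] > pivot) j--;                  /* stops at a[left] at the latest */
        if (i < j)
            swap_int(&a[i], &a[j]);
    }
    swap_int(&a[left], &a[j]);        /* the pivot reaches its final position j */
    quick_sort(a, left, j - 1);
    quick_sort(a, j + 1, right);
}
```

A single call quick_sort(X, 0, 15) sorts the sixteen-element array used in Example 10.5.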
Complexity of quick sort Both the expected and the minimum number of comparisons required
by quick sort are O(n log2 n). The worst-case behaviour of this algorithm is of the order O(n^2).
However, if we are lucky, then each time an item is correctly positioned the sublist to its
left will be of the same size as that to its right, so that each step leaves two sublists of
size about n/2 to be sorted. The time required to position an item in a list of size n is O(n).
Quick sort is an extremely good general-purpose routine. A big drawback of the method
is its worst-case performance. When the first element is used as the partitioning element, it
requires O(n^2) time to sort a list that is already sorted or nearly sorted. A good way to guard
against this bad behaviour is to choose the partitioning element from a small sample of the
current sublist (say, the median of the first, middle, and last elements) or at random.
This also has the effect of reducing the average number of comparisons.
If the smaller of the two sublists is always processed first after each partition, then the
required stack contains at most log2 n entries. We can simulate the stack with only a constant
amount of space at a slight increase in computing time.
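One common reading of the suggestion to look at the first, middle, and last elements is to take the median of the three as the partitioning element. A minimal sketch of that choice follows; the function name median_of_three is an assumption, and the caller would exchange the returned element with the first element before partitioning.

```c
/* Return the index (left, middle or right) whose element is the median
   of a[left], a[middle] and a[right].
   (Illustrative helper; not taken from the text.) */
int median_of_three(const int a[], int left, int right)
{
    int mid = left + (right - left) / 2;
    int lo = left, hi = right;
    if (a[lo] > a[hi]) {              /* arrange that a[lo] <= a[hi] */
        int t = lo;
        lo = hi;
        hi = t;
    }
    if (a[mid] < a[lo]) return lo;    /* middle element is the smallest */
    if (a[mid] > a[hi]) return hi;    /* middle element is the largest */
    return mid;
}
```

On a list that is already sorted this picks the middle element, so the two sublists come out nearly equal in size instead of maximally unbalanced.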
Note that some care is necessary when one of the two sublists becomes exhausted.
Algorithm 10.6: Merge sort (Recursive version)
Input: Merge sort requires two arrays r and t. Array r holds the data to be sorted and array t is used
for the merging operation.
Output: Array r holds the sorted list in ascending order.
/* Sorting function: merge passes with run length L = 1, 2, 4, ... */
Merge sort(n) /* Array r of size n */
begin
    L = 1
    while (L < n) do
        Merge(L, n, r, t)
        L = 2 * L
        Merge(L, n, t, r)
        L = 2 * L
    end
end
/* Merge function: merges adjacent runs of length L from r into t */
Merge(L, n, r, t)
begin
    k1 = 1
    k2 = L + 1
    q = 1
    repeat
        end1 = k1 + L
        if (end1 > n)
        then end1 = n + 1
        else
        begin
            end2 = k2 + L
            if (end2 > n)
            then end2 = n + 1
            repeat
                if (r[k1] <= r[k2])
                then begin
                    t[q] = r[k1]
                    q = q + 1
                    k1 = k1 + 1
                end
                else
                begin
                    t[q] = r[k2]
                    q = q + 1
                    k2 = k2 + 1
                end
            until (k1 = end1) or (k2 = end2)
        end
        if (k1 < end1)
        then repeat
            t[q] = r[k1]
            q = q + 1
            k1 = k1 + 1
        until k1 = end1
        else repeat
            t[q] = r[k2]
            q = q + 1
            k2 = k2 + 1
        until k2 = end2
        k1 = k2
        k2 = k2 + L
    until (k1 > n)
end
The non-recursive merge-sort program and recursive merge-sort program are given as
follows.
Sorting a list of elements by the method of merge sort
Example 10.6
void merging(int x[], int y[], int z[], int m1, int m2, int m3)
{
    /* static indices keep their values across the recursive calls */
    static int xlower = 0, ylower = 0, zlower = 0;
    int xupper = m1, yupper = m2, zupper = m3, i;
    if (x[xlower] < y[ylower])
    {
        z[zlower] = x[xlower];
        xlower++;
    }
    else
    {
        z[zlower] = y[ylower];
        ylower++;
    }
    zlower++;
    if ((xlower < xupper) && (ylower < yupper))
        merging(x, y, z, m1, m2, m3);
    else
    {
        /* one array is exhausted: copy the remainder of the other */
        while (xlower < xupper) z[zlower++] = x[xlower++];
        while (ylower < yupper) z[zlower++] = y[ylower++];
        printf("\n\n");
        printf("Elements of the sorted list after combining ");
        printf("first and second array \n");
        for (i = 0; i < zupper; i++)
            printf("%d ", z[i]);
    }
}
Enter number of elements in the first sorted array: ===> 6
Enter the elements of the first sorted array:
12 16 22 34 45 50
Enter number of elements in the second sorted array: ===> 5
Enter the elements of the second sorted array:
34 39 42 51 59
The elements of the first sorted array:
12 16 22 34 45 50
The elements of the second sorted array:
34 39 42 51 59
Elements of the sorted list after combining first and second array:
12 16 22 34 34 39 42 45 50 51 59
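The same merging step can be written without recursion or static variables. The following is a minimal sketch under that assumption; the function name merge_sorted and its parameter list are illustrative and do not come from the listing above.

```c
#include <stddef.h>

/* Merge the sorted arrays x[0..nx-1] and y[0..ny-1] into z[0..nx+ny-1].
   (Illustrative name and signature; not taken from the text.) */
void merge_sorted(const int x[], size_t nx, const int y[], size_t ny, int z[])
{
    size_t i = 0, j = 0, k = 0;
    while (i < nx && j < ny)                       /* both arrays still have elements */
        z[k++] = (x[i] <= y[j]) ? x[i++] : y[j++];
    while (i < nx) z[k++] = x[i++];                /* second array exhausted */
    while (j < ny) z[k++] = y[j++];                /* first array exhausted */
}
```

The two trailing loops deal with the case in which one of the two sublists becomes exhausted before the other; at most one of them does any work.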
Example 10.7
/**************************************************************************/
/* MERGING OF TWO SORTED FILES (Using recursion) */
/**************************************************************************/
#include <stdio.h>
void merge(int x[], int y[], int x1, int x2, int c1, int c2, int i);
int main(void)
{
    int i, n1, n2;
    int a[100], b[100];
    printf("Enter the total number of elements in list array \n");
    scanf("%d", &n1);
    printf("\n Enter the elements of list file in ascending order \n");
    for (i = 0; i < n1; i++)
    {
        printf("a[%d] = ", i + 1);
        scanf("%d", &a[i]);
    }
    printf("\n");
    printf("Enter the total number of elements in 2nd array \n");
    scanf("%d", &n2);
    printf("\n Enter the elements of 2nd file in ascending order \n");
    for (i = 0; i < n2; i++)
    {
        printf("b[%d] = ", i + 1);
        scanf("%d", &b[i]);
    }
    printf("\n The list in ascending order \n");
    printf("<--------------------------------------> \n");
    merge(a, b, n1, n2, 0, 0, 0);
    return 0;
}
/* x1, x2: lengths of x and y; c1, c2: elements already taken from x and y;
   i: number of elements printed so far */
void merge(int x[], int y[], int x1, int x2, int c1, int c2, int i)
{
    if (c1 + c2 == x1 + x2)
        return;
    else
    {
        /* take from x when y is exhausted, or when x still has a smaller or equal element */
        if ((c2 == x2) || (c1 < x1 && x[c1] <= y[c2]))
        {
            printf("c[%d] = %d\n", i + 1, x[c1]);
            merge(x, y, x1, x2, c1 + 1, c2, i + 1);
        }
        else
        {
            printf("c[%d] = %d\n", i + 1, y[c2]);
            merge(x, y, x1, x2, c1, c2 + 1, i + 1);
        }
    }
}
Enter the total number of elements in list array
5
Enter the elements of list file in ascending order
a[1] = 2
a[2] = 4
a[3] = 8
a[4] = 11
a[5] = 12
Enter the total number of elements in 2nd array
3
Enter the elements of 2nd file in ascending order
b[1] = 1
b[2] = 3
b[3] = 6
The list in ascending order
<-------------------------------------->
c[1] = 1
c[2] = 2
c[3] = 3
c[4] = 4
c[5] = 6
c[6] = 8
c[7] = 11
c[8] = 12
Complexity of merge sort The merge sort algorithm makes a pass over the entire array each
time the procedure merge is called. This occurs approximately log2 n times in sorting a list of length n.
The procedure merge moves all n of the elements during each pass. So it requires n log2 n moves, no matter
how the data are initially ordered. It will perform better for a linked list than for an array, since
the elements of a linked list can be relinked instead of moved. The
number of comparisons during a pass depends on the order of the data. If sublists of length
one are being merged, then there will be n/2 comparisons. If two sublists that together contain
all n elements are being merged, then as many as n-1 comparisons may be required. The merge sort
therefore requires approximately n log2 n moves and somewhere between (n/2) log2 n and n log2 n
comparisons. It has a very consistent performance, since the effort required depends little
on the initial order of the data. Its drawback when implemented with arrays is that it
requires twice as much memory as any of the other algorithms.
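The extra memory mentioned above shows up directly in an array implementation. The following minimal sketch is a top-down (recursive) merge sort that allocates one scratch array t of the same size as r; the function names merge_sort, sort_range and merge_runs are assumptions made for illustration, and the routine is not the book's Algorithm 10.6.

```c
#include <stdlib.h>
#include <string.h>

/* Merge the sorted runs r[lo..mid-1] and r[mid..hi-1] through the scratch array t. */
static void merge_runs(int r[], int t[], size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        t[k++] = (r[i] <= r[j]) ? r[i++] : r[j++];
    while (i < mid) t[k++] = r[i++];
    while (j < hi)  t[k++] = r[j++];
    memcpy(r + lo, t + lo, (hi - lo) * sizeof r[0]);   /* copy the merged run back */
}

/* Sort r[lo..hi-1] by sorting each half and merging the halves. */
static void sort_range(int r[], int t[], size_t lo, size_t hi)
{
    if (hi - lo < 2)
        return;
    size_t mid = lo + (hi - lo) / 2;
    sort_range(r, t, lo, mid);
    sort_range(r, t, mid, hi);
    merge_runs(r, t, lo, mid, hi);
}

/* Sort r[0..n-1]; returns 0 on success, -1 if the scratch array cannot be allocated. */
int merge_sort(int r[], size_t n)
{
    if (n < 2)
        return 0;
    int *t = malloc(n * sizeof *t);    /* the second array of size n */
    if (t == NULL)
        return -1;
    sort_range(r, t, 0, n);
    free(t);
    return 0;
}
```

Every level of the recursion moves all n elements once, and there are about log2 n levels, which is where the n log2 n moves come from.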
Fig. 10.6 A heap view as a binary tree (a) and an array (b)
In the above figure the number within the circle at each node in the tree is the value
stored at that node. The number written beside a node is the corresponding index in the array. The
tree is completely filled on all levels except possibly the lowest, which is filled from the left up
to a point.
A heap is a binary tree which must satisfy the following two conditions:
(a) The data value that is stored in any node is less than or equal to the value in any of that
node's children. The value stored in the root of a heap is therefore always the smallest value in the
heap. This condition is called the ordering property of heaps.
(b) A heap must be a complete binary tree. This property is known as the structuring
property of heaps.
Suppose that A[1], A[2], ..., A[n] is a sequence of elements. This sequence is a heap if it
satisfies the following two conditions
A[i] <= A[2i] ... (i)
A[i] <= A[2i+1] ... (ii)
for all applicable values of i.
The two conditions will be referred to as the heap conditions. Fig. 10.7 shows three
arrangements of a sequence of elements that form a heap. Each element is an integer and each
sequence contains the same set of integers.
     1   2   3   4   5   6   7   8   9
A   10  20  25  30  40  42  50  52  55
(a)
     1   2   3   4   5   6   7   8   9
A   10  30  20  40  50  25  55  52  42
(b)
     1   2   3   4   5   6   7   8   9
A   10  40  20  50  42  30  25  52  55
(c)
The data in array A in Fig. 10.7(a) are sorted and therefore form a heap; sorted data
always form a heap. As Fig. 10.7(b) and Fig. 10.7(c) show, many other arrangements of the
same data also form a heap. Fig. 10.8 shows binary trees constructed from the heaps.
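The heap conditions can be checked mechanically. The following minimal sketch tests conditions (i) and (ii) for every index that has at least one child; it assumes the elements are stored in a[1..n] with a[0] unused, to match the 1-based indexing of the text, and the function name is_heap is illustrative.

```c
#include <stddef.h>

/* Return 1 if a[1..n] satisfies the heap conditions
   a[i] <= a[2i] and a[i] <= a[2i+1], and 0 otherwise.
   a[0] is unused. (Illustrative name; not taken from the text.) */
int is_heap(const int a[], size_t n)
{
    for (size_t i = 1; 2 * i <= n; i++) {
        if (a[i] > a[2 * i])
            return 0;                           /* condition (i) fails */
        if (2 * i + 1 <= n && a[i] > a[2 * i + 1])
            return 0;                           /* condition (ii) fails */
    }
    return 1;
}
```

Applied to the three arrangements of Fig. 10.7 (each preceded by an unused a[0]), the function should report that every one of them is a heap.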
The heap sort proceeds in two phases. First, the data are put into a heap and, second, the data
are extracted from the heap in sorted order. A heap sort is similar to the selection sort because
both algorithms select and then swap into sorted order successively larger elements. A heap
sort uses a more efficient data structure than selection sort, but a selection sort will be faster than
heap sort for a small set of elements.
Initially in a heap-sort structure the smallest element is on top. Thus, if the elements
forming the heap are stored in the array elements A[1], A[2], ..., A[n], then the element with the
smallest key will be stored as A[1]. What remains is to find the element with the second smallest
key, and so on.
    int i, j, X;
    i = 1;
    j = 2 * i;
    X = A[i];
    while (j <= r)
    {
        /* choose the larger child, if a right child exists */
        if (j < r && A[j] < A[j + 1])
            j++;
        if (X >= A[j]) break;
        A[i] = A[j];
        i = j;
        j = i * 2;
    }
    A[i] = X;
    return;
}
Complexity of heap sort In any sorting method we have a sequence of n integer values, that is,
A1, A2, ..., An, and it is required to sort them in either ascending or descending order. Heap sort
proceeds in two phases: a create-heap phase and a phase that maintains the heap property. The heap-sort
procedure takes time O(n log2 n), since the call to CREATEHEAP takes time O(n) and each of the
n-1 calls that maintain the heap property takes time O(log2 n). The overall time complexity of heap sort
is therefore O(n log2 n) in both the average case and the worst case.
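The two phases can be sketched as follows. This minimal version uses 0-based indexing and keeps the largest element at the root, so that repeated extraction leaves the array in ascending order; the function names sift_down and heap_sort are illustrative assumptions rather than the book's routines.

```c
#include <stddef.h>

/* Restore the heap property for the subtree rooted at i within a[0..n-1],
   where the largest element is kept at the root. */
static void sift_down(int a[], size_t i, size_t n)
{
    int x = a[i];
    size_t j = 2 * i + 1;                       /* left child of i */
    while (j < n) {
        if (j + 1 < n && a[j] < a[j + 1])
            j++;                                /* right child is the larger one */
        if (x >= a[j])
            break;
        a[i] = a[j];                            /* move the larger child up */
        i = j;
        j = 2 * i + 1;
    }
    a[i] = x;
}

/* Sort a[0..n-1] into ascending order. */
void heap_sort(int a[], size_t n)
{
    if (n < 2)
        return;
    /* Phase 1: create the heap, working up from the last parent. */
    for (size_t i = n / 2; i-- > 0; )
        sift_down(a, i, n);
    /* Phase 2: move the root to the end, then repair the smaller heap. */
    for (size_t end = n - 1; end > 0; end--) {
        int t = a[0];
        a[0] = a[end];
        a[end] = t;
        sift_down(a, 0, end);
    }
}
```

Phase 2 makes n-1 calls to sift_down, each of which follows one path from the root towards a leaf, which accounts for the n log2 n bound.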
In the C language, strings are compared with the help of the library function strcmp. This
function compares two strings element by element until a difference is found. If the two strings
are the same, it returns 0; otherwise it returns a positive number or a negative number depending on
whether the first string is lexicographically greater than or less than the second
string. Besides comparing strings, a sorting program also needs to copy them. The following
program illustrates how the quick-sort program can be modified to sort strings.
Example 10.9
#include <stdio.h>
/* Sorting an array of strings */
#define N 10
#define LEN 30
void quicksort(char *a[], int left, int right);
int main(void)
{
    char buffer[N][LEN];    /* storage for the names */
    char *Names[N];         /* pointers that the sort rearranges */
    int i, left, right;
    left = 0; right = N - 1;
    /* Read an array of names */
    for (i = 0; i < N; i++)
    {
        Names[i] = buffer[i];
        scanf("%29s", Names[i]);
    }
    printf("\n\n Unsorted names are given below : \n\n");
    for (i = 0; i < N; i++)
        printf(" %s ", Names[i]);
    /* Call quick sort routine */
    quicksort(Names, left, right);
    /* Printing of sorted names */
    printf("\n\n Sorted names are given below \n\n");
    for (i = 0; i < N; i++)
        printf("\n %s ", Names[i]);
    return 0;
}
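The quick-sort routine called by the program is not shown in the surviving text. A minimal sketch of what such a routine might look like is given here: it compares with strcmp and exchanges the pointers rather than copying the characters. The name quick_sort_strings and the choice of the first element as pivot are assumptions.

```c
#include <string.h>

/* Sort the strings a[left..right] into lexicographic order.
   Only the pointers are exchanged; the characters themselves never move.
   (Illustrative name; not taken from the text.) */
void quick_sort_strings(char *a[], int left, int right)
{
    if (left >= right)
        return;
    char *pivot = a[left];
    int i = left + 1, j = right;
    while (i <= j) {
        while (i <= right && strcmp(a[i], pivot) <= 0) i++;
        while (strcmp(a[j], pivot) > 0) j--;    /* stops at a[left] at the latest */
        if (i < j) {
            char *t = a[i];
            a[i] = a[j];
            a[j] = t;
        }
    }
    a[left] = a[j];                             /* put the pivot in its final place */
    a[j] = pivot;
    quick_sort_strings(a, left, j - 1);
    quick_sort_strings(a, j + 1, right);
}
```

Exchanging pointers costs the same for every pair of strings, whatever their lengths, so the routine avoids the copying step altogether.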
    char *name;
    float salary;
};
struct employee e[100], arr;
int i, k, j;
int n;
char rname[50];
clrscr();
printf("\n Enter the number of employees: ");
scanf("%d", &n);
for (i = 1; i <= n; i++)
{
    printf("\n Enter the id number, name, salary: ");
    scanf("%d %49s %f", &e[i].id, rname, &e[i].salary);
    /* allocate room for the name and its terminating '\0' */
    e[i].name = (char *) malloc(strlen(rname) + 1);
    strcpy(e[i].name, rname);
}
/* Starts the sorting process */
for (i = 1; i <= n; i++)
{
    for (j = 1; j <= (n - 1); j++)
    {
        k = strcmp(e[j].name, e[j + 1].name);
        if (k > 0)
        {
            arr = e[j + 1];
            e[j + 1] = e[j];
            e[j] = arr;
        }
        /* If the two strings are equal, order the records by id */
        if (k == 0)
        {
            if (e[j].id > e[j + 1].id)
            {
                arr = e[j + 1];
                e[j + 1] = e[j];
                e[j] = arr;
            }
        }
    }
}
int eno;
char ename[30];
float esalary;
int egrade;
            /* copy E[j].ename into str */
            while (ch != '\0')
            {
                str[c] = ch;
                c++;
                ch = E[j].ename[c];
            }
            str[c] = '\0';
            /* copy E[j+1].ename into E[j].ename */
            c = 0;
            ch = E[j + 1].ename[c];
            while (ch != '\0')
            {
                E[j].ename[c] = ch;
                c++;
                ch = E[j + 1].ename[c];
            }
            E[j].ename[c] = '\0';
            /* copy str into E[j+1].ename */
            c = 0;
            ch = str[c];
            while (ch != '\0')
            {
                E[j + 1].ename[c] = ch;
                c++;
                ch = str[c];
            }
            E[j + 1].ename[c] = '\0';
            /* Swap for employee no */
            no = E[j].eno;
            E[j].eno = E[j + 1].eno;
            E[j + 1].eno = no;
            /* Swap for employee salary */
            sal1 = E[j].esalary;
            E[j].esalary = E[j + 1].esalary;
            E[j + 1].esalary = sal1;
            /* Swap for employee grade */
            gr1 = E[j].egrade;
            E[j].egrade = E[j + 1].egrade;
            E[j + 1].egrade = gr1;
        }
    }
}
printf("\n The sorted employee list is: ");
for (i = 0; i < n; i++)
{
    printf("\n%d\t%s\t%f\t%d", E[i].eno, E[i].ename,
           E[i].esalary, E[i].egrade);
}
}
Enter how many employees 5
Enter emp-no 7171
Enter emp-name T. Dey
Enter employee salary 6000
Enter the grade 2
Enter emp-no 5911
Enter emp-name P. Roy
Enter employee salary 3500
Enter the grade 3
Enter emp-no 2110
Enter emp-name A. Sharma
Enter employee salary 5000
Enter the grade 2
Enter emp-no 1010
Enter emp-name R. Kumar
Enter employee salary 8500
Enter the grade 1
Enter emp-no 3950
Enter emp-name J. Dutta
Enter employee salary 9700
Enter the grade 1
The sorted employee list is:
2110 A. Sharma 5000.000000 2
3950 J. Dutta 9700.000000 1
5911 P. Roy 3500.000000 3
1010 R. Kumar 8500.000000 1
7171 T. Dey 6000.000000 2
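The field-by-field exchange in the program above can be avoided, because C allows whole structures to be assigned. As a minimal sketch of an alternative, the standard library function qsort can sort the records with a comparison function built on strcmp. The field names follow the listing above; the function names by_name_then_no and sort_employees are assumptions, and the tie-break on employee number is borrowed from the earlier id-based example.

```c
#include <stdlib.h>
#include <string.h>

struct employee {
    int eno;
    char ename[30];
    float esalary;
    int egrade;
};

/* Comparison function for qsort: order by name, then by employee number. */
static int by_name_then_no(const void *p, const void *q)
{
    const struct employee *a = p;
    const struct employee *b = q;
    int k = strcmp(a->ename, b->ename);
    if (k != 0)
        return k;
    return (a->eno > b->eno) - (a->eno < b->eno);
}

/* Sort e[0..n-1] by employee name. (Illustrative name; not taken from the text.) */
void sort_employees(struct employee e[], size_t n)
{
    qsort(e, n, sizeof e[0], by_name_then_no);
}
```

With the five records of the sample run, the resulting order of employee numbers should be 2110, 3950, 5911, 1010, 7171.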
EXERCISES
1. Write a program to sort the following sequence of eight integers
57, 27, 22, 95, 79, 45, 96
using insertion sort. In each case, keep track of the number of exchanges and comparisons
required.
2. What starting conditions are necessary to produce the worst-case behavior of bubble sort?
3. Both insertion sort and bubble sort are particularly deficient with respect to their ability to move
elements long distances. Shell sort attempts to move elements through long distances. It is a
fairly simple matter to modify the insertion sort algorithm so that it becomes a shell sort
algorithm. Write an algorithm to implement this.
4. Mean sort uses mean value of those elements being partitioned. Implement mean sort for
an array containing the following integers:
482, 231, 928, 204,105, 428, 379, 47
5. A sorting algorithm is stable if it preserves the order of elements with equal keys. Which of
the following algorithms are stable: quick sort, heap sort, or merge sort?
6. The algorithm for heap sort places elements in ascending or descending order. What changes
of a C-Program are needed so that the result will be in ascending order if it originally sorts
in descending order or vice-versa?
7. A drawback to merge sort for arrays is its requirement for twice as much memory as any of
the other algorithms. Merge sort moves all elements during each pass. So it requires nlog2n
moves no matter how the data are initially ordered. It will therefore perform better with a
linked list data structure than with an array. Implement merge sort algorithm using linked
structure.
8. It is noted that (3/2) n log2 n is a conservative bound for the number of exchanges and
comparisons required for heap sort algorithm. Run tests on random sequences of integers to
determine the number of exchanges and comparisons actually required. Plot the
experimental results along with the value (3/2) n log2 n for merge sort.
9. What are similarities and differences between selection sort and heap sort?
SEARCHING
11.1 INTRODUCTION
We now consider different methods of searching large amounts of data to find one particular
piece of information. This chapter focuses on four general types of searching, that is, sequential
search, binary search, indexed search, and hashing schemes. The purpose here is not to present an
exhaustive review of all possible search techniques, but to highlight how the main techniques can be
implemented.
We first define some common terms. A table or a file is a collection of elements, each of
which is called a record. Each record consists of a set of fields. One of the fields has a special
meaning and is called the key field. A key that is stored within the record is called an internal key.
Sometimes there is a separate table of keys that includes pointers to the records. Such keys are
called external keys. For each file, there is at least one key that is unique, that is, no two records
have the same key value; it is called a primary key. However, keys need not always be unique.
A key that may be shared by several records is called a secondary key.
A searching method is a process that accepts an argument x and tries to locate a record
whose key value is x. The process may return the entire record or it may return the address of
the record. If the search for a record is unsuccessful, a null value is returned. We start our
discussion with the simplest searching technique, which is known as sequential search.
A search that looks through a list from top to bottom while checking each item for a match
is a sequential search or linear search. This search is applicable to a table organised either as an
array or as a linked list. The item to be matched is called a key. Let us assume that x is an array
of n keys, x[0] through x[n-1], and r an array of records, r[0] through r[n-1], such that x[i] is the
key of r[i]. The search begins at array index 0 and ends either when a match is found or the end
of the array is reached. We want to return the smallest integer i such that x[i] equals key if such
an i exists and -1 otherwise. The program segment may look as follows:
for (i = 0; i < n; i++)
    if (key == x[i])
        return(i);
return(-1);
The program segment is simple enough. It examines each key in turn and returns its index
if it matches the search argument; otherwise -1 is returned.
The sequential search does not depend on the entries in the list being in any particular order. If the
key is the last entry in the list or if the key is not in the list at all, then the algorithm will do
n comparisons for n elements.
The following algorithm illustrates the idea of sequential search.
Algorithm 11.1: Sequential search
Input: An array A with n elements, A[1], ..., A[n], and the item x to be searched for.
Output: The location of x in array A, or 0 if x is not found.
Step 1: Set i = 1
Step 2: While i <= n and A[i] != x do
Step 3:     Set i = i+1
        end while
Step 4: if i > n then set i = 0
        end if
Step 5: stop
The sequential search algorithm terminates with index i equal to the index of the first
occurrence of x in A, if x is present, and equal to 0 otherwise.
The sequential-search algorithm is implemented with a C program as follows:
Example 11.1
#include <stdio.h>
/* Program for sequential search */
#define N 7
int A[] = {20, 17, 18, 7, 5, 6, 19};
int sequential(int key);
int main(void)
{
    int x, y;
    printf("\n\n Enter the item to be searched :\n");
    scanf("%d", &x);
    y = sequential(x);
    if (y == -1)
        printf("\n Element is not found \n");
    else
        printf("\n Element is found and its position is %d\n", y);
    return 0;
}
/* Function sequential: returns the index of key in A, or -1 if it is absent */
int sequential(int key)
{
    int i;
    for (i = 0; i < N; i++)
        if (key == A[i])
            return (i);
    return (-1);
}
A sample run is shown below :
Input: 17
Output: Element is found and its position is 1
Input: 16
Output: Element is not found
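For use on arrays other than the global A, the same search can be written as a function that receives the array and its length. This is a minimal sketch; the name sequential_search and its parameter list are illustrative assumptions.

```c
/* Return the smallest index i with a[i] == key, or -1 if key is absent.
   (Illustrative name and signature; not taken from the text.) */
int sequential_search(const int a[], int n, int key)
{
    for (int i = 0; i < n; i++)
        if (a[i] == key)
            return i;
    return -1;
}
```

With the array of Example 11.1, searching for 17 should give position 1 and searching for 16 should give -1, in agreement with the sample run.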
The most efficient method of searching a sequential table without the use of indices or tables is
the binary search. This searching technique is a divide-and-conquer method applicable to sorted
data items. The data can be in an array or a file. When the data are structures, a key field is used. The
primary limitation of the method is that the data must be in sorted order. Let us first explain
why this method is better than the sequential method.
Consider an array of elements that have been placed in some order. For example,
a dictionary or telephone book may be thought of as an array whose entries are in alphabetical
order. Suppose we want to look up a name in a telephone book to find the telephone number,
or a word in a dictionary. In sequential search, each item of the array is examined
in turn and compared with the item being searched for, until a match occurs. If the list is
unordered, the sequential search may be the only way to find anything in it. Such a method
would never be used in looking for a name in the telephone directory, since it would be far
too slow. Instead, the book is opened to a random page and the names on
that page are examined. Since the names are ordered alphabetically, such an examination
determines whether the search should proceed in the first half or the second half of the book. This is
the basic idea of the binary search.
In the binary search, the array is divided into two parts. The item being searched for is compared
with the item at the middle of the array. If they are equal, the search has been completed
successfully. If the middle element is greater than the item being searched for, the search process
is repeated on the left half of the array, since the item can only appear in the first half. On the
other hand, if the middle item is less than the item being searched for, the process is repeated on
the second half. Each comparison therefore halves the number of elements yet to be searched. For
large arrays, this method is superior to the sequential search, in which each comparison reduces
the number of elements yet to be searched by only one. Let us illustrate the method with the
following example.
Suppose we want to locate the data associated with key value 1649 in the following list:
1129 1203 1211 1519 1609 1649 2821 3575 9279 9289
We begin the search in the middle of the list. In our example, we first examine the key at
position 5. Since the key value 1649 is greater than the key at position 5 (i.e., 1609), we conclude
that the key we want, if it is present at all, must lie between positions 6 and 10. We again take
the middle position, (6 + 10)/2 = 8, and find that the key at position 8 (i.e., 3575) is greater
than the search key, so the search continues between positions 6 and 7. The middle position is now
(6 + 7)/2 = 6 (using integer division), and we find the key 1649 at position 6.
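The steps of this example can be sketched as a small C function. This is a minimal sketch, not the book's own listing: the name binary_search_pos is hypothetical, and positions are counted from 1 so that they agree with the positions used in the example (C array indices start at 0).

```c
/* Binary search on a sorted array a[0..n-1].
   Returns the 1-based position of key (to match the positions used in the
   text), or 0 if key is absent. The function name is an assumption of this sketch. */
int binary_search_pos(const int a[], int n, int key)
{
    int low = 1, high = n;              /* 1-based bounds of the range still to search */
    while (low <= high) {
        int mid = (low + high) / 2;     /* middle position of the range */
        if (a[mid - 1] == key)
            return mid;                 /* match found */
        else if (a[mid - 1] < key)
            low = mid + 1;              /* continue in the right half */
        else
            high = mid - 1;             /* continue in the left half */
    }
    return 0;                           /* range is empty: key is not present */
}
```

For the ten keys listed above and the key 1649, the function probes positions 5, 8 and 6 in turn.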
printf("%3d", x[i]);
printf("\n\n\n");
y = binarySearch(key);
printf("\n\n %d\n", y);
}
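/* Binary search function: returns the index of key in x[0..n-1], or -1.
   Assumes n >= 1 and that x is sorted in ascending order. */
int binarySearch(int key)
{
    int l, r, m;
    l = 0;
    r = n;      /* x[r] is never read, so r starts one past the last element */
    while (r != (l + 1))
    {
        m = (l + r) / 2;
        if (x[m] <= key)
            l = m;
        else
            r = m;
    }
    if (x[l] == key)
    {
        printf("\n Element is found and its position is = %d\n", l);
        return l;
    }
    else
    {
        printf("\n Element is not found \n");
        return -1;
    }
}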
A sample run is given below:
Input:
Enter element to be searched
17
Output:
1 5 7 8 10 12 14 17 18 20 22 23 24
Element is found and its position is = 7
#include <stdio.h>
struct INDEX
{
int kindex;
int*pindex;
};
struct INDEX index[10];
int datafile[25], n, size;
void main ( )
{
int i,j=0, found, x,indxsize,key;
int * low, * high, *temp;
clrscr();
{
printf("\n%5d%17p", index[i].kindex, (void *)index[i].pindex);
}
printf("\n Enter the number to be searched = = = > " ) ;
scanf("%d", &key);
found=0; x=0;
/* Find the first index entry whose key exceeds the search key */
while((x<indxsize) && (found==0))
{
if(index[x].kindex>key)
found=1;
else
x=x+1;
}
if(x== 0)
low = &datafile[0];
else
low = index[x-1].pindex;
if(found ==1)
high = index[x].pindex-1;
else
high=&datafile[n-1] ;
temp = low;
found =0;
while(( temp <= high) && (found == 0))
{
if (*temp== key)
found = 1;
else
temp++;
}
if(found ==1)
printf("\n Found at address %p: Element is %d", (void *)temp, *temp);
else
printf("\n Data not found");
getch();
}
Enter how many elements in the file (Not more than 25) =======>10
Enter the data: In sorted order
12 19 23 35 42 46 55 67 70 85
Enter the size of the index table===>4
The input data file of integers is as follows:
35 1816
67 1824
85 1828
Enter the number to be searched ====> 55
Found at address 1822: Element is 55
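The sample run can be reproduced with a compact sketch of indexed sequential search that uses array positions instead of pointers. The names index_entry and indexed_search are hypothetical, and the index table is assumed to be supplied by the caller in ascending key order; this is an illustration, not the program listed above.

```c
/* One index entry: a key copied from the data file and its position there.
   Both the struct and the function below are assumptions of this sketch. */
struct index_entry {
    int key;    /* key value copied from data[] */
    int pos;    /* position of that key in data[] */
};

/* Indexed sequential search over the sorted array data[0..n-1].
   Returns the position of key in data[], or -1 if it is absent. */
int indexed_search(const int data[], int n,
                   const struct index_entry idx[], int nidx, int key)
{
    int x = 0, low, high, i;

    /* Step 1: find the first index entry whose key exceeds the search key */
    while (x < nidx && idx[x].key <= key)
        x++;

    /* Step 2: the key, if present, lies between the previous entry and this one */
    low  = (x == 0) ? 0 : idx[x - 1].pos;
    high = (x < nidx) ? idx[x].pos - 1 : n - 1;

    /* Step 3: sequential search inside that block only */
    for (i = low; i <= high; i++)
        if (data[i] == key)
            return i;
    return -1;
}
```

With the ten data values of the sample run and index entries for 35, 67 and 85, a search for 55 scans only the block from 35 up to 55 and never looks at 12, 19 or 23.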
Unfortunately, it is also possible that two keys may hash to the same slot, which is called a
collision. Fortunately, there are effective techniques for resolving the conflict created by
collisions.
A good hash function satisfies the assumption of simple uniform hashing, that is, each key is
equally likely to hash to any of the m slots. Most hash functions assume that the keys are natural
numbers. Various techniques are available for generating the hash functions.
In the division method for creating hash functions, we map a key k into one of the m slots by
taking the remainder of k divided by m, that is, h(k) = k mod m. For example, if the hash table has
size 20 and the key value is 119, then h(k) = 19. The method requires only a single division
operation and is therefore fast. Note that for a key value of 139 the hash function h(k) gives the
same value as for the key value 119. This is a collision. To reduce the number of collisions, it
is usually better to choose m to be a prime number.
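The division method is a one-line function in C. The sketch below assumes non-negative integer keys; the name hash_div is hypothetical.

```c
/* Division-method hash: maps a non-negative key k to a slot in 0..m-1.
   The function name is an assumption of this sketch; m must be nonzero. */
unsigned int hash_div(unsigned int k, unsigned int m)
{
    return k % m;   /* remainder of k divided by m */
}
```

For a table of size 20, the keys 119 and 139 both map to slot 19, because they differ by a multiple of 20. With the prime size 23 the same two keys map to slots 4 and 1, respectively.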
In some applications, the key value is not an integer. In such cases the folding technique for
creating a hash function works better than the other techniques. Consider, for example, the
insurance number
567-96-1505
In the shift-folding method, the above number is viewed as three separate numbers to be added:
 567
  96
1505
----
2168
The sum can be treated as the hash position itself, or a further hashing technique such as
division remainder can be applied to it to get a final hash position in the desired range.
The great advantage of the shift-folding method is its ability to transform non-integer keys
into integers suitable for further hashing.
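Shift folding can be sketched in C as follows. The function name fold_shift is hypothetical, and the sketch assumes that the groups of the key are runs of decimal digits separated by any non-digit character.

```c
#include <ctype.h>

/* Shift folding: split the key at every non-digit character and add the
   resulting numeric groups. The function name is an assumption of this sketch. */
unsigned long fold_shift(const char *key)
{
    unsigned long sum = 0, group = 0;
    for (; *key != '\0'; key++) {
        if (isdigit((unsigned char)*key)) {
            group = group * 10 + (unsigned long)(*key - '0');
        } else {
            sum += group;   /* a separator closes the current group */
            group = 0;
        }
    }
    return sum + group;     /* add the final group */
}
```

For the key "567-96-1505" the groups are 567, 96 and 1505, whose sum is 2168. Applying the division method with a table size of 20 would then give the slot 2168 mod 20 = 8.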
We now present the simplest collision-resolution technique, called chaining. In chaining, we put
all the elements that hash to the same slot in a linked list. In a hash table T with m slots that
stores n elements, the average number of elements stored in a chain is n/m. In the worst case, all
n keys hash to the same slot, creating a single list of length n.
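A minimal sketch of chaining is given here, assuming non-negative integer keys, 13 slots and the division method. The names SLOTS, chain_node, table, slot_of, chain_insert and chain_search are hypothetical and chosen only for this illustration.

```c
#include <stdlib.h>

#define SLOTS 13                    /* number of slots (assumed value) */

struct chain_node {
    int data;
    struct chain_node *link;        /* next element in the same slot */
};

struct chain_node *table[SLOTS];    /* file-scope array: every slot starts as NULL */

int slot_of(int key)
{
    return key % SLOTS;             /* division method; assumes key >= 0 */
}

/* Insert key at the head of its chain.
   Returns 0 on success, -1 if memory cannot be allocated. */
int chain_insert(int key)
{
    struct chain_node *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    p->data = key;
    p->link = table[slot_of(key)];
    table[slot_of(key)] = p;
    return 0;
}

/* Search for key. Returns the number of nodes examined up to and including
   the match, or 0 if key is not in the table. */
int chain_search(int key)
{
    int probes = 0;
    struct chain_node *p;
    for (p = table[slot_of(key)]; p != NULL; p = p->link) {
        probes++;
        if (p->data == key)
            return probes;
    }
    return 0;
}
```

Inserting at the head of the chain keeps insertion at constant cost; the price is that the key inserted first ends up deepest in its chain.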
The following program illustrates the idea of searching a hash table using chaining
techniques.
Example 11.4
#include<stdio.h>
#include<stdlib.h>
#include<alloc.h>
#define B 13
struct nodetype
{
int data;
}
ptr->link = root;
}
}
printf("The hash table is as follows:\n");
printf("=======================\n");
display( );
printf(" \n Enter the data to be searched ======>");
scanf("%d", &item);
ptr = search(item);
if(ptr != NULL)
printf("\n Data found: At position %d in hash table [%2d]",
count, hash(item));
else
printf("Data not found");
getch();
}
int hash(int n)
{
int p;
p=n%B;
return(p);
}
void display()
{
    int i;
    for (i = 0; i < B; i++)
    {
        root = arr[i];
        printf("Hash table [%2d] ===>", i);
        while (root != NULL)
        {
            printf("%d -->", root->data);
            root = root->link;
        }
        printf("NULL\n");
    }
}
node *search(int n)
{
int found,i;
node *p, *q;
found =0;
i=hash(n);
p=arr[i];
while(( p !=NULL) &&(!found))
{
count++;
if (p->data ==n)
found =1;
else
p=p->link;
}
if (found==1)
return(p);
else
return(NULL);
}
How many data to be inserted in the hash table =====>20
The hash table is as follows:
EXERCISES
1. What are the advantages and disadvantages of the sequential search algorithm?
2. Binary search, a technique that takes advantage of the sorted order of the list, takes an
amount of work $O(\log_2 n)$. What is the maximum number of probes made by a binary search
in a list of 128 elements?
3. Determine the list size for which binary search becomes more efficient than sequential search.
4. What are the main advantages of indexed sequential search over sequential search?
5. The division hash function H(k) = k mod m is usually a good hash function if m has no
small divisors. Explain why this restriction is placed on m.
6. A perfect hash function is one that causes no collisions. How many probes are needed
to locate an element that has a given key value?
7. When are perfect hashing functions feasible?
8. The open address hashing method attempts to place the second and subsequent keys that hash to
the same table location into some other position in the table that is unoccupied. What are the
main drawbacks of this method?
9. External chaining has a linked list associated with each hash table address. Each element is
added to the linked list at its home address. What are the advantages of external chaining
over the open address hashing technique?
10. With a table size of 50000, after how many insertion operations does hashing with open
addressing display the same behavior as binary search?
TREES
For processing dynamic lists, the linked-list data structure has been seen to be very useful. It
imposes a structure in which an element may have a predecessor and a successor. But many natural
applications require data structures that reflect a hierarchy. This hierarchical nature is not
available in the linked-list data structure that we have studied earlier. The tree data structure,
however, imposes a hierarchical structure on a collection of elements. In this chapter we will
consider trees together with the operations on them and their applications.
12.1
A tree consists of a finite collection of elements called nodes or vertices, one of which is
distinguished as the root, and a finite collection of directed arcs that connect pairs of nodes.
In a non-empty tree the root node has no incoming arcs. Every other node in the tree can be reached
from the root by following a unique sequence of consecutive arcs. A NULL tree is a tree with no
nodes.
The roots of natural trees are in the ground and their branches grow upwards in the air. Although
trees derive their name from such natural trees, computer scientists usually portray tree data
structures upside down, that is, with the root at the top and the branches growing downwards.
For illustration, the conventional method of portraying a tree is shown below. In this
picture the direction of the arcs is not shown but it is assumed that the direction of all the arcs is
downwards.
The tree on the previous page has twelve nodes, with the root node a at the top having no
incoming arcs. Nodes such as i, e, j, k, l, h do not have any outgoing arcs and are called
leaves. The nodes that are directly accessible from a given node are called children of that
node. A node is the parent of its children. For example, d, e, f are the children of b, while b is
the parent of d, e and f. The nodes d, e, f have the common parent b and so they are siblings. If
there is a path from node n1 to node n2 we say n1 is an ancestor of n2 and n2 is a descendant of
n1. For example, c is an ancestor of l and l is a descendant of c. Clearly, a node is an ancestor
and a descendant of itself. An ancestor or descendant of a node that is different from the node
itself is known as a proper ancestor or proper descendant, respectively. So another way of
defining a leaf is as a node with no proper descendant.
The depth or level of a node may be defined as follows: the depth of the root node is 0 and
the depth of any other node in the tree is the depth of its parent plus 1, that is, depth of a node
is actually the length of the unique path from root to that node. The height of a tree is the depth
of the node that is at the largest depth in the tree plus 1.
In the following sections we will discuss a more specific type of tree: the binary tree. This
type of tree has many applications and takes different forms. In a binary tree there are at most
two outgoing arcs from a node. If every intermediate (non-leaf) node in a binary tree has exactly
two non-empty children, then it is a strictly binary tree. If, in addition, all the leaves of a
strictly binary tree are at the same depth d, it is a complete binary tree. The binary tree in
Fig. 12.1 is a complete binary tree of depth 3 and hence also a strictly binary tree.
Clearly, if a binary tree has b nodes at depth d, then it has at most 2b nodes at depth (d+1).
So a complete binary tree of depth d has $2^d$ leaf nodes. Hence the total number of nodes in a
complete binary tree of depth d is given by t where
$$t = 1 + 2^1 + 2^2 + \cdots + 2^d = 2^{d+1} - 1$$
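The count can be checked numerically by adding up the level sizes 1, 2, 4, ..., 2^d, each level holding twice as many nodes as the one above it. The function name complete_tree_nodes is hypothetical.

```c
/* Number of nodes in a complete binary tree of depth d, obtained by summing
   the level sizes 1, 2, 4, ..., 2^d. The function name is an assumption. */
unsigned long complete_tree_nodes(unsigned int d)
{
    unsigned long total = 0, level = 1;   /* level = number of nodes at the current depth */
    unsigned int depth;
    for (depth = 0; depth <= d; depth++) {
        total += level;
        level *= 2;                       /* every node has exactly two children */
    }
    return total;
}
```

For depth 3 the sum is 1 + 2 + 4 + 8 = 15, which agrees with 2^(3+1) - 1.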
Fig. 12.2 Each path marked in a binary tree
Trees in which every node has a maximum of two children are called binary trees. For example,
a binary tree is shown in Fig. 12.2. Each path from the root to a leaf node of this tree corresponds
to a particular outcome of flipping a coin three times in succession. For instance, the path
A-C-F-L represents the outcome THH in the diagram.
A formal recursive definition of a binary tree may be written as follows:
A binary tree is either empty, that is, it is a binary tree with no nodes, or consists of a node
called the 'root' which has two pointers to two different binary trees called the 'left subtree'
and the 'right subtree'.
Note that the left subtree and the right subtree are two different binary trees and hence we
distinguish between them. For example, the binary trees shown in Fig. 12.3 are two distinct binary
trees, in the sense that the root of the binary tree in Fig. 12.3(a) has an empty right subtree
while the root of the binary tree in Fig. 12.3(b) has an empty left subtree. It is also to be noted
that a non-empty binary tree having n nodes has (n-1) arcs (edges).
The above definition suggests a natural implementation of a binary tree using a linked structure.
Each node in the binary tree has two links, one pointing to the left subtree of the node (this link
will be NULL if the left subtree is empty) and the other pointing to its right subtree (NULL if
the right subtree is empty), together with an information part. Since each node of the binary tree
can be reached from the root by following a unique path, a pointer to the root of the binary tree
allows us to access any node of the binary tree. So we make the following declarations to implement
a binary tree:
typedef struct treetype{
int info; /* Information part */
struct treetype *left; /* Pointer to left subtree */
struct treetype *right; /* Pointer to right subtree */
} treenode;
typedef treenode *nodeptr;
nodeptr tree; /* Pointer to the root of the binary tree */
In the above declaration we assumed that the information part of each node in the binary tree
is an integer value. Instead of int, one should substitute whatever type name suits the actual
problem domain. For illustration, consider the following binary tree with integers as the
information part of each node.
In the above picture the pointer that is not linked to a node is a NULL pointer. This style
is followed throughout the chapter.
Binary trees are used to organize data so that they can be accessed very efficiently in many
applications. Most such applications need to be able to walk through all the nodes of the binary
tree, visiting each node exactly once. This walk through the tree is referred to as traversing the
binary tree. Clearly, a complete traversal of a binary tree produces a linear order of the nodes,
and this linear order depends on the traversal algorithm. The definition of a binary tree suggests
that a traversal algorithm requires the following three basic steps:
(a) visit a node (denoted in this book as T),
(b) traverse the left subtree (denoted as L), and
(c) traverse the right subtree (denoted as R).
Obviously, the order in which these steps are performed determines the order in which the nodes
of the tree are visited. Moreover, there are 3! = 6 different orders in which these steps can be
performed. These are given by
LTR
TLR
LRT
RTL
RLT
TRL
The ordering LTR corresponds to the following traversal algorithm:
if (the binary tree is non-empty)
{
traverse left subtree;
visit the root;
traverse right subtree;
}
The last three orders are mirror images of the first three orders in the above list: they simply
traverse the right subtree before the left subtree. The first three orders are the principal
orders of traversal and are named as follows:
LTR — inorder traversal
TLR — preorder traversal
LRT — postorder traversal
It is simple to implement any of these algorithms by writing a recursive function. Before
writing the functions we assume the existence of a function visit(treenode *node) which performs
the desired task (visit) for the node pointed to by the pointer node. For our purpose let us
consider the function visit as in the following, which prints the information part of the node
visited.
void visit(treenode *node)
{
printf ("%d\t", node->info);
}
We also assume that the pointer tree points to the root of the tree. The three traversal
functions inorder, preorder, and postorder are listed below in examples 12.1, 12.2 and
12.3 respectively.
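A minimal sketch of the three recursive traversals follows, assuming the treenode declaration given earlier. In this sketch visit() records each value in an array named order instead of printing it, so that the visiting order can be inspected afterwards; order and norder are helpers assumed only for this illustration.

```c
#include <stddef.h>

typedef struct treetype {
    int info;
    struct treetype *left;
    struct treetype *right;
} treenode;

int order[32];      /* values in the order visited (assumed helper, at most 32 nodes) */
int norder = 0;     /* number of values recorded so far */

void visit(treenode *node)
{
    order[norder++] = node->info;
}

void inorder(treenode *tree)        /* LTR */
{
    if (tree != NULL) {
        inorder(tree->left);
        visit(tree);
        inorder(tree->right);
    }
}

void preorder(treenode *tree)       /* TLR */
{
    if (tree != NULL) {
        visit(tree);
        preorder(tree->left);
        preorder(tree->right);
    }
}

void postorder(treenode *tree)      /* LRT */
{
    if (tree != NULL) {
        postorder(tree->left);
        postorder(tree->right);
        visit(tree);
    }
}
```

For a tree whose root 2 has the left child 1 and the right child 3, the inorder traversal records 1 2 3, the preorder traversal 2 1 3 and the postorder traversal 1 3 2.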
Example 12.1
The binary tree that represents an expression is known as an expression tree. The expression
tree for the above expression is shown in Fig. 12.4.
From Fig. 12.4 it can be seen that all the operands of the expression appear at the leaves of
the binary tree. Moreover, the preorder traversal of this tree gives the expression
+*AB-/CDE
which is the prefix expression. The inorder traversal generates the original infix expression
A * B + C / D - E. It can also be seen that the postorder traversal of the tree generates the
postfix (RPN) expression A B * C D / E - +.
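The prefix and postfix forms can be generated from an expression tree with two short recursive functions. The sketch below assumes single-character operators and operands; the names exprnode, prefix and postfix are hypothetical, and the caller is assumed to supply a buffer large enough for the result.

```c
#include <stddef.h>

typedef struct exprnode {
    char symbol;                        /* operator or operand */
    struct exprnode *left, *right;
} exprnode;

/* Write the preorder (prefix) form of the tree into out.
   Returns a pointer to the terminating '\0'. */
char *prefix(const exprnode *t, char *out)
{
    if (t != NULL) {
        *out++ = t->symbol;             /* visit the root first */
        out = prefix(t->left, out);
        out = prefix(t->right, out);
    }
    *out = '\0';
    return out;
}

/* Write the postorder (postfix) form of the tree into out.
   Returns a pointer to the terminating '\0'. */
char *postfix(const exprnode *t, char *out)
{
    if (t != NULL) {
        out = postfix(t->left, out);
        out = postfix(t->right, out);
        *out++ = t->symbol;             /* visit the root last */
    }
    *out = '\0';
    return out;
}
```

For the tree of A * B + C / D - E, with + at the root, A * B as its left subtree and C / D - E as its right subtree, prefix() produces +*AB-/CDE and postfix() produces AB*CD/E-+.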
At this point we should be a little careful to notice that different binary trees may have the
same traversal sequence. For example, the inorder traversals of the trees in Fig. 12.5(a) and
Fig. 12.5(b) produce the same sequence x y z w. Though the inorder traversal sequences of these
two trees are the same, their preorder traversals, y x w z and w z y x, are not. In fact, there is
only one binary tree corresponding to a given pair of inorder and preorder sequences (assuming the
node values are distinct). The construction of such a binary tree is also interesting. Let us
construct a binary tree for which the inorder traversal sequence is given as x y z w and the
preorder sequence as y x w z.
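The construction can also be expressed recursively: the first symbol of the preorder sequence is the root, and its position in the inorder sequence splits the remaining symbols into the left and right subtrees. The following is a sketch under stated assumptions: the symbols are distinct single characters, the two sequences are consistent with each other, and the names cnode and build are hypothetical.

```c
#include <stdlib.h>

typedef struct cnode {
    char info;
    struct cnode *left, *right;
} cnode;

/* Build the binary tree whose preorder sequence is pre[0..n-1] and whose
   inorder sequence is in[0..n-1]. Assumes distinct symbols and consistent
   sequences. Returns NULL for n <= 0 or if memory cannot be allocated. */
cnode *build(const char *pre, const char *in, int n)
{
    cnode *root;
    int k = 0;

    if (n <= 0)
        return NULL;
    root = malloc(sizeof *root);
    if (root == NULL)
        return NULL;
    root->info = pre[0];                    /* first preorder symbol is the root */
    while (k < n && in[k] != pre[0])        /* locate the root in the inorder sequence */
        k++;
    /* k symbols lie to the left of the root, n-k-1 symbols to its right */
    root->left  = build(pre + 1, in, k);
    root->right = build(pre + 1 + k, in + k + 1, n - k - 1);
    return root;
}
```

Called as build("yxwz", "xyzw", 4), the function yields a root y with left child x and right child w, where w in turn has the left child z.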
From the preorder sequence it is clear that the root of the binary tree contains y and it is of
the form shown on the next page, where T1 and T2 are the left and right subtrees, respectively.
From the above tree it can be said that its inorder sequence is of the following form:
(inorder sequence of subtree T1) y (inorder sequence of subtree T2).
Matching this form against the given inorder sequence tells us that the inorder sequence of
subtree T1 is x and the inorder sequence of subtree T2 is z w. This, in turn, shows that T1 is a
single-node subtree containing x, whereas T2 has more than one node. Since T1 consists of the
single node x, the given preorder sequence tells us that subtree T2 has the preorder sequence w z.
Thus the subtree T2 may be drawn as below, with its root containing w, a left subtree T3 and a
right subtree T4. Obviously its inorder traversal is of the form
THREADED BINARY TREE
Binary tree traversals are important operations in many applications. For an inorder traversal,
as we have seen, one has to recursively traverse the left subtree first, then visit the root, and
finally traverse the right subtree. Because of the recursive nature of the routine, an implicit
stack is maintained which allows it to backtrack whenever necessary. A routine that implements
inorder traversal non-recursively must therefore maintain an explicit stack for backtracking. A
threaded binary tree is a simple modification of the ordinary binary-tree structure that allows
inorder traversal of a binary tree without recursion and without a stack.
In a binary tree the inorder successor of a node M is the node N that appears immediately
after the node M in the inorder traversal of the binary tree. In recursive algorithms this inorder
successor is found by automatic backtracking. As we mentioned earlier, to do this backtracking a
non-recursive algorithm must maintain an explicit stack. A careful look at the binary tree shows
that if the right subtree of a node is non-empty then its inorder successor must be found in this
right subtree; otherwise we need to backtrack to find it. In the latter case, since the right
subtree of the node is empty, that is, the right link of the node is NULL, a threaded tree uses
this link to hold a pointer to the node's inorder successor. We must be careful not to confuse this
link with a link to a right subtree (because there is no right subtree for the node). Such a link
is known as a thread. To identify that a link is a thread and not a link to a right subtree, an
extra logical field is added to each of the tree nodes, as shown pictorially in Fig. 12.6.
The binary tree with threads equivalent to this binary tree is shown below.
A function to implement the inorder traversal in a threaded binary tree may be written as
in example 12.4:
Example 12.4
void thread_in(nodeptr treeptr)
{
    nodeptr ptr1, ptr2;
    ptr1 = treeptr;
    do {
        ptr2 = NULL;
        while (ptr1 != NULL)    /* Go left as far as possible */
        {
            ptr2 = ptr1;
            ptr1 = ptr1->left;
        }
        if (ptr2 != NULL)       /* Non Null treeptr */
        {
            printf("%d\n", ptr2->info);     /* Visit node */
            ptr1 = ptr2->right;
            while (ptr1 != NULL && ptr2->thread)
            {   /* Visit inorder successor through thread */
                printf("%d\n", ptr1->info);
                ptr2 = ptr1;
                ptr1 = ptr1->right;
            }
        }
    } while (ptr2 != NULL);
}
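The same idea can be tried out in a self-contained form. The sketch below is a restructured variant, not the listing of example 12.4: the node type tnode and the function threaded_inorder are hypothetical, a nonzero thread field is assumed to mark a right link that is a thread, and the visited values are stored in an array supplied by the caller instead of being printed.

```c
#include <stddef.h>

typedef struct tnode {
    int info;
    struct tnode *left;
    struct tnode *right;    /* right child, or inorder successor if thread != 0 */
    int thread;             /* nonzero: right is a thread, not a child */
} tnode;

/* Inorder traversal of a right-threaded tree without recursion or a stack.
   Stores the visited values in out[] and returns how many were stored. */
int threaded_inorder(tnode *root, int out[])
{
    int count = 0;
    tnode *p = root;

    while (p != NULL && p->left != NULL)    /* start at the leftmost node */
        p = p->left;

    while (p != NULL) {
        out[count++] = p->info;             /* visit the node */
        if (p->thread) {
            p = p->right;                   /* the thread leads to the successor */
        } else {
            p = p->right;                   /* real right child, or NULL at the end */
            while (p != NULL && p->left != NULL)
                p = p->left;                /* successor is leftmost in the right subtree */
        }
    }
    return count;
}
```

In a tree with root 4, children 2 and 6, and 1 and 3 as the children of 2, the right link of 1 is a thread to 2 and the right link of 3 is a thread to 4. The traversal then visits 1, 2, 3, 4, 6.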
Though the binary tree in Fig. 12.7 looks a little awkward, it has the following property: for
every node in this tree, the root node of its left subtree is either absent (NULL) or holds a
value (name) less than that of the node in lexicographic order, and the root node of its right
subtree is either absent (NULL) or holds a value greater than that of the node. Because of this
property, if we traverse this binary tree using inorder traversal we visit the nodes in the order
Arjuna, Bhima, Bidur, Draupadi, Duryodhana, Ganga, Judhistra, Karna, Krishna, Nakul, Sahadeva,
Sakuni and Viswa, which is nothing but the alphabetic order.
A careful look at Fig. 12.7 leads to the following observations about this binary tree.
• Each node has a distinct value (name), that is, no two nodes in this binary tree have the
same value.
• The value of every node in this binary tree is greater than (in lexicographic order) the
value in its left child (if it exists), and indeed greater than every value in its left subtree.
• The value of every node is less than the value in its right child (if it exists), and indeed
less than every value in its right subtree.
A binary tree having this property is called a binary search tree (BST). Once the informa-
tion is organized in a BST, the search process becomes simpler. For example, if we want to
search whether the name Judhistra is there in the list one may traverse the tree in the following
manner.
Begin the search by traversing the root of the tree.
Is it Judhistra?
Response is no. (It is Nakul)
Is it less than Judhistra?
Response is no. (Means it is greater than Judhistra)
Traverse the left subtree by traversing its root.
Is it Judhistra?
Response is no. (It is Ganga)
Is it less than Judhistra?
Response is yes.
Traverse the right subtree by traversing its root.
Is it Judhistra?
Response is no. (It is Karna)
Is it less than Judhistra?
Response is no. (Means it is greater than Judhistra)
Traverse the left subtree by traversing its root.
Is it Judhistra?
Response is yes.
Obviously, if we try to search for the name Jayadhrata in this BST, the response to the last
comparison will be 'no', and as the name Jayadhrata is less than Judhistra in lexicographic
order, the left subtree of Judhistra is to be searched. Since this left subtree is NULL, we
conclude that Jayadhrata is not present in the BST. Note that checking whether the left subtree is
NULL costs one more comparison. A BST may be implemented as follows:
typedef struct treetype{
char info[15]; /* Node information type */
struct treetype *left;
struct treetype *right;
} treenode;
typedef treenode *bst_ptr;
bst_ptr rootptr; /* Pointer to the root of the BST */
A function to search for a key in a binary search tree may now be implemented as in the
following example 12.5. The function bst_search() searches for a node with a specified
key in a BST. It accepts two arguments:
(i) the pointer rootptr to the root of the BST, and
(ii) an item indicating the node information key.
If the search is successful, the function returns a pointer to the node containing the specified
item. If not successful, it returns a NULL pointer. The function that follows is a recursive
function. Its main drawback is that in some situations it may require a very large stack space.
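A minimal sketch of such a recursive search is shown here, assuming the BST declaration given above with character-string keys compared by strcmp(). It follows the description in the text and is not necessarily identical to the book's own listing.

```c
#include <stddef.h>
#include <string.h>

typedef struct treetype {
    char info[15];              /* Node information type */
    struct treetype *left;
    struct treetype *right;
} treenode;
typedef treenode *bst_ptr;

/* Recursive search: returns a pointer to the node holding item,
   or NULL if item is not present in the BST rooted at rootptr. */
bst_ptr bst_search(bst_ptr rootptr, const char *item)
{
    int c;
    if (rootptr == NULL)
        return NULL;                                /* empty subtree: not found */
    c = strcmp(item, rootptr->info);
    if (c == 0)
        return rootptr;                             /* found */
    else if (c < 0)
        return bst_search(rootptr->left, item);     /* item sorts before this node */
    else
        return bst_search(rootptr->right, item);    /* item sorts after this node */
}
```

Searching for Judhistra along the path Nakul, Ganga, Karna, Judhistra traced earlier makes four calls, one for each node on the path.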
Example 12.5
"Ganga",
"Judhistra",
"Karna",
"Krishna",
"Nakul",
"Sahadeva",
"Sakuni",
"Viswa"
};
We write a function build_bst() to build a BST from this array (data set). The function
should receive the following arguments: a pointer to the pointer to the root node of the BST,
which is possibly a NULL pointer at the beginning; a pointer to the array containing the data set;
and finally the array size, that is, the number of elements in the data set.
The function build_bst() may be implemented as in the following.
if (*rootptr == NULL)
{
newptr = (bst_ptr) malloc(sizeof(treenode));
newptr->left = NULL;
newptr->right = NULL;
strcpy(newptr->info, element);
*rootptr = newptr;
}
else
{
if ((c = strcmp(element, (*rootptr)->info)) == 0)
return; /* Element already present in the BST */
else if (c < 0) /* Insert element to the left subtree */
insert_bst(&(*rootptr)->left, element);
else /* Insert element to the right subtree */
insert_bst(&(*rootptr)->right, element);
}
}
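The pieces above can be put together in a self-contained sketch. The signatures of insert_bst() and build_bst() follow the descriptions in the text; the helper height() and the bounded copy with strncpy() are additions assumed for this illustration.

```c
#include <stdlib.h>
#include <string.h>

typedef struct treetype {
    char info[15];
    struct treetype *left;
    struct treetype *right;
} treenode;
typedef treenode *bst_ptr;

/* Insert element into the BST rooted at *rootptr; duplicates are ignored. */
void insert_bst(bst_ptr *rootptr, const char *element)
{
    int c;
    if (*rootptr == NULL) {
        bst_ptr newptr = malloc(sizeof(treenode));
        if (newptr == NULL)
            return;                             /* out of memory: tree unchanged */
        newptr->left = newptr->right = NULL;
        strncpy(newptr->info, element, sizeof newptr->info - 1);
        newptr->info[sizeof newptr->info - 1] = '\0';
        *rootptr = newptr;
    } else if ((c = strcmp(element, (*rootptr)->info)) < 0) {
        insert_bst(&(*rootptr)->left, element);
    } else if (c > 0) {
        insert_bst(&(*rootptr)->right, element);
    }
}

/* Build a BST from the n strings in data[], inserting them in array order. */
void build_bst(bst_ptr *rootptr, const char *data[], int n)
{
    int i;
    for (i = 0; i < n; i++)
        insert_bst(rootptr, data[i]);
}

/* Height of a tree: 0 if empty, otherwise 1 + height of the taller subtree
   (assumed helper, used only to show the shape of the result). */
int height(bst_ptr t)
{
    int hl, hr;
    if (t == NULL)
        return 0;
    hl = height(t->left);
    hr = height(t->right);
    return 1 + (hl > hr ? hl : hr);
}
```

The shape of the tree depends on the order of insertion. Inserting Nakul, Ganga, Sakuni, Karna, Arjuna in that order gives a tree of height 3, whereas inserting data that is already sorted gives a degenerate tree whose height equals the number of elements.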
With respect to time, an iterative version of this function is more advantageous. An
iterative function insert_bst_iter() that searches a binary search tree and inserts a new
element into the tree if the search is unsuccessful may be implemented as shown in example
12.7. This time let the arguments to the function be the following.
(i) A pointer to the root of the BST, instead of a pointer to pointer to the root as in the case of
the recursive version given earlier.
(ii) The element to be inserted.
The function returns a pointer to the inserted node if the insertion is successful; otherwise it
returns the pointer to the node where the element is already present. It may be written as follows:
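A minimal sketch of an iterative insertion with this interface is given below. It is an illustration under stated assumptions, not the book's listing: memory failure is reported by returning NULL, and because the function receives only a pointer to the root, an insertion into an empty tree cannot update the caller's root, so the caller is assumed to store the returned node as the root in that case.

```c
#include <stdlib.h>
#include <string.h>

typedef struct treetype {
    char info[15];
    struct treetype *left;
    struct treetype *right;
} treenode;
typedef treenode *bst_ptr;

/* Iterative search-and-insert. Returns the node that holds element: the
   existing node if it was already present, otherwise the newly linked node
   (NULL if memory runs out). */
bst_ptr insert_bst_iter(bst_ptr rootptr, const char *element)
{
    bst_ptr ptr1 = rootptr;     /* node being examined */
    bst_ptr ptr2 = NULL;        /* parent of ptr1 */
    bst_ptr newptr;
    int c = 0;

    while (ptr1 != NULL) {
        if ((c = strcmp(element, ptr1->info)) == 0)
            return ptr1;                        /* element already present */
        ptr2 = ptr1;
        ptr1 = (c < 0) ? ptr1->left : ptr1->right;
    }
    newptr = malloc(sizeof(treenode));
    if (newptr == NULL)
        return NULL;
    newptr->left = newptr->right = NULL;
    strncpy(newptr->info, element, sizeof newptr->info - 1);
    newptr->info[sizeof newptr->info - 1] = '\0';
    if (ptr2 != NULL) {                         /* link under the parent found above */
        if (c < 0)
            ptr2->left = newptr;
        else
            ptr2->right = newptr;
    }
    return newptr;
}
```

No stack frames accumulate here: the loop walks down a single path, remembering only the current node and its parent.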
Example 12.7
ptr1 = rootptr;
ptr2 = NULL;
while (ptr1 != NULL)
{
if ((c=strcmp(element, ptr1->info)) == 0)
Fig. 12.9 BST configuration after deletion of 111 from Fig. 12.8
So, to delete a node from a BST for a given information part, we must first search the BST
to get the above two pointers, and only then can we delete the node from the BST. Our algorithm
uses the following variables, whose purposes are listed below.
delptr: pointer to the node to delete.
pptr: pointer to the parent of the node to delete.
auxptr: an auxiliary pointer that will be set to point to the node by which the node pointed to
by delptr will be replaced.
The following function in example 12.8 deletes the node whose information part is given by
val (an integer value) from the BST whose root is pointed to by the argument tree. If such a node
is not found, no node is deleted and the function simply returns, leaving the BST unaltered. If
the node is found in the BST, the function deletes the node and returns the pointer to the root of
the modified BST, which is also a BST.
Example 12.8
bst_ptr bst_delete(bst_ptr tree, int val)
/* tree is the pointer to root of the BST */
/* val is the information part of the node to delete */
{
    bst_ptr delptr, pptr;
    bst_ptr auxptr;     /* An auxiliary pointer */
    /* Initialize */
    delptr = tree;
    pptr = NULL;
    /* Find node with value val (if any). If found set delptr to
       point to the node and pptr to point to its parent (if exists) */
    while (delptr != NULL && delptr->info != val)
    {
        pptr = delptr;
        delptr = (val < delptr->info) ? delptr->left : delptr->right;
    }
    if (delptr == NULL)     /* Val not found */
        return tree;
    /* We assume delptr has at the most one child */
    /* Set auxptr to point to the node by which the node pointed by
       delptr will be replaced */
    auxptr = (delptr->left == NULL) ? delptr->right : delptr->left;
    if (pptr == NULL)       /* Node to be deleted is the root itself */
        tree = auxptr;
    else if (delptr == pptr->left)
        pptr->left = auxptr;
    else
        pptr->right = auxptr;
    free(delptr);
    return tree;
}
The function bst_delete() given above considers only the first two cases of deleting a
node from a BST. The last case, where the node to be deleted has two non-empty children, is
yet to be considered. This case can be reduced to one of the first two cases by replacing the node
to be deleted by its inorder successor (or predecessor) and deleting this inorder successor (or
predecessor) from the BST. The inorder successor (or predecessor) of the value stored in a given
node is the successor (or predecessor) of the node in the inorder traversal of the tree. For
illustration, consider the tree in Fig. 12.10 where we want to delete the node in which the stored
value is 72. Clearly, this node has two non-empty children. Now we need to locate the inorder
successor of this node. A careful look at the inorder traversal process shows that we can reach
the inorder successor of such a node by starting at its right child and then descending through
left children as far as we can. In our case this is the node with value 77, which is pointed to by
isucc in Fig. 12.10. To delete the node with value 72, we first replace the content of this node
by the content of its inorder successor node (that is, the node with value 77). Then we need only
delete this inorder successor. Obviously, this successor node will always have an empty left
subtree, so its deletion falls under one of the first two cases. After replacing the value 72 by
the value of its inorder successor node the tree will look as in Fig. 12.11.
Finally, after deleting this inorder successor the tree takes the form depicted in Fig. 12.12.
Combining this case with the function bst_delete() we wrote earlier, the function is
finally written as in example 12.9.
Example 12.9
bst_ptr bst_delete(bst_ptr tree, int val)
{
    bst_ptr delptr, pptr, auxptr;
    bst_ptr pauxptr, lptr;
    delptr = tree;
    pptr = NULL;
    /* Find node with value val (if any). If found, set delptr to
       point to the node and pptr to point to its parent node */
    while (delptr != NULL && delptr->info != val)
    {
        pptr = delptr;
        delptr = (val < delptr->info) ? delptr->left : delptr->right;
    }
    if (delptr == NULL)     /* Val not found */
        return tree;        /* Tree unchanged */
    /* Set auxptr to point to the node by which the node delptr
       will be replaced */
    if (delptr->left == NULL)
        auxptr = delptr->right;
    else if (delptr->right == NULL)
        auxptr = delptr->left;
    else {  /* delptr has two nonempty children */
        /* Set auxptr to the inorder successor of delptr and
           pauxptr to the parent of auxptr */
        pauxptr = delptr;
        auxptr = delptr->right;
        lptr = auxptr->left;
        while (lptr != NULL)
        {
            pauxptr = auxptr;
            auxptr = lptr;
            lptr = auxptr->left;
        }
        if (pauxptr != delptr)
        {
            /* auxptr is not the right child of delptr: unlink it from its
               parent, then let it adopt the right subtree of delptr */
            pauxptr->left = auxptr->right;
            auxptr->right = delptr->right;
        }
        auxptr->left = delptr->left;
    }
    if (pptr == NULL)
        tree = auxptr;
    else if (pptr->left == delptr)
        pptr->left = auxptr;
    else
        pptr->right = auxptr;
    free(delptr);
    return tree;
}
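The text describes the two-children case as copying the value of the inorder successor into the node and then deleting the successor. A minimal sketch of that variant is given below, assuming integer keys and nodes allocated with malloc(). The names inode, bst_insert and delete_by_copy are hypothetical; bst_insert is included only so that a tree can be built for trying the deletion.

```c
#include <stdlib.h>

typedef struct inode {
    int info;
    struct inode *left, *right;
} inode;

/* Helper (assumed): insert val and return the root; duplicates are ignored. */
inode *bst_insert(inode *tree, int val)
{
    if (tree == NULL) {
        inode *n = malloc(sizeof *n);
        if (n != NULL) {
            n->info = val;
            n->left = n->right = NULL;
        }
        return n;
    }
    if (val < tree->info)
        tree->left = bst_insert(tree->left, val);
    else if (val > tree->info)
        tree->right = bst_insert(tree->right, val);
    return tree;
}

/* Delete val from the BST rooted at tree and return the (possibly new) root.
   A node with two children is overwritten with the value of its inorder
   successor, and the successor, which has no left child, is unlinked instead. */
inode *delete_by_copy(inode *tree, int val)
{
    inode *del = tree, *parent = NULL, *child;

    while (del != NULL && del->info != val) {       /* locate val */
        parent = del;
        del = (val < del->info) ? del->left : del->right;
    }
    if (del == NULL)
        return tree;                                /* val not found */

    if (del->left != NULL && del->right != NULL) {
        inode *succ = del->right;                   /* start at the right child ... */
        parent = del;
        while (succ->left != NULL) {                /* ... then go left as far as possible */
            parent = succ;
            succ = succ->left;
        }
        del->info = succ->info;                     /* copy the successor's value */
        del = succ;                                 /* now remove the successor node */
    }
    child = (del->left != NULL) ? del->left : del->right;   /* at most one child */
    if (parent == NULL)
        tree = child;
    else if (parent->left == del)
        parent->left = child;
    else
        parent->right = child;
    free(del);
    return tree;
}
```

Copying the value keeps the pointer manipulation short, but it is only suitable when the information part is cheap to copy and no other part of the program holds pointers to individual nodes.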
In the earlier sections we have discussed binary trees, threaded binary trees, and binary search
trees. From the above discussions it is easy to understand that in a BST the search time is
O(log n) when the tree is well balanced, but in the worst case (where the BST has degenerated
into a linear chain) the search time is O(n). This suggests that if the tree is kept uniform in
shape the search time for a node will be reduced. The performance of an algorithm using trees
mostly depends on how quickly we can search for a node in the tree, and so the reduction in search
time of a node in a tree is of great importance.
An AVL tree is basically a BST in which the heights of the two subtrees of each node in
the tree never differ by more than one. That is why it is also called a height balanced tree. The
AVL tree is named after the two Russian mathematicians, G. M. Adelson-Velskii and E. M. Landis,
who introduced this kind of tree in 1962.
It is easy to visualize that the height of an ordinary binary tree with n nodes can be O(n)
when the tree degenerates to one side. Hence the search time for a node in such a tree is also
O(n). Obviously, as n increases, the performance degrades. A binary tree of height h has at most
$n = 2^h - 1$ nodes, so a full tree has height $h = \log_2(n+1) = O(\log_2 n)$. The balance
condition keeps the height of every AVL tree with n nodes within $O(\log_2 n)$ as well. Hence the
time to search for a node, traversing the full height of an AVL tree having n nodes from the root
to a leaf node, is $O(\log_2 n)$, which is significantly less than O(n).
An AVL (height balanced) tree must keep information about the balance between the
heights of the left and right subtrees of each node in the tree. The balance of a node is defined as
the difference between height of its left subtree and the height of its right subtree, that is,
balance = height of left subtree - height of right subtree
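As a rough illustration of this definition, the balance of a node can be computed directly from subtree heights. The node type tnode and the function names below are assumptions made for this sketch only. The book's own nodes store the balance instead of recomputing it.

```c
#include <stddef.h>

struct tnode {                 /* hypothetical node type for this sketch */
    int info;
    struct tnode *left, *right;
};

/* Height of a subtree: 0 for an empty tree, 1 for a single node */
int height(const struct tnode *t)
{
    int hl, hr;
    if (t == NULL)
        return 0;
    hl = height(t->left);
    hr = height(t->right);
    return 1 + (hl > hr ? hl : hr);
}

/* balance = height of left subtree - height of right subtree */
int balance_of(const struct tnode *t)
{
    return height(t->left) - height(t->right);
}

/* Returns 1 if every node has balance -1, 0 or +1, otherwise 0 */
int is_height_balanced(const struct tnode *t)
{
    int b;
    if (t == NULL)
        return 1;
    b = balance_of(t);
    return b >= -1 && b <= 1
        && is_height_balanced(t->left)
        && is_height_balanced(t->right);
}
```

Recomputing heights on every call is slow, which is one reason to keep a balance field in each node as the implementation in the text does.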
The operations on a BST and an AVL tree are very much similar except for insertion of a node
into an AVL tree and deletion of a node from an AVL tree. After insertion and deletion of nodes
into and from an AVL tree the balance is maintained. The above discussion suggests the imple-
mentation of an AVL tree as follows.
typedef int data_type;
typedef struct avl {
data_type info;
struct avl *left;
struct avl *right;
int balance;
} avlnode;
typedef avlnode *avlptr;
At this point we must first determine the situations in which the insertion of a node
makes an AVL tree unbalanced. To visualize this, consider the tree in Fig. 12.13. The figure
marks the possible insertion points of an AVL tree with dashed lines. Each node in the figure
is labelled with a letter and holds its balance value before any insertion. At each insertion
point a square box is connected with a dashed line. The value H within a square box indicates
that the AVL tree remains height balanced if an insertion takes place at the corresponding
point, and the value U indicates that the insertion makes the tree unbalanced. A careful look
at Fig. 12.13 shows that the tree becomes unbalanced in only two situations.
[Figure: unbalanced AVL subtrees after insertion; the labels show the newly inserted node,
subtrees T1 and T2 of height n, and subtrees T2 and T3 of height n - 1]
These unbalanced trees must now be rebalanced to maintain the height balance property.
At this point we must also ensure that after rebalancing the inorder traversal of the tree and the
same for the previous unbalanced tree must match. Before rebalancing, let us shift our concen-
tration a little for the time being. We will come back into our main discussion shortly.
Consider the tree rooted at a shown in Fig. 12.15(a). If we apply a left rotation to this
rooted tree at a we will get the tree as shown in Fig. 12.15(b). Notice that the root
of this new tree is changed to c but the inorder sequences of both the trees are the same. An
algorithm to implement this left rotation to a tree rooted at x (pointer to the root of the subtree)
may simply be written as in the following.
y = x->right;
save = y->left;
y->left = x;
x->right = save;
For illustration, the right rotation of the rooted tree in Fig. 12.15(a) is shown in Fig. 12.15(c).
Here the root of the rotated tree is changed to b. But again the inorder sequence of this rotated
tree is same as that of the tree in Fig. 12.15(a).The algorithm to implement the right rotation to a
tree rooted at x may be written as in the following.
y = x->left;
save = y->right;
y->right = x;
x->left = save;
We have seen that both left and right rotations preserve the inorder sequence. So to
rebalance an unbalanced tree we may apply any number of left or right rotations to it, and
the inorder sequence will still be preserved.
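The two rotation fragments can be wrapped as functions that return the new subtree root, which makes it easy to check that the inorder sequence is unchanged. This is a minimal sketch: the node type rnode, the function names, and the single-character node names are assumptions chosen to mirror the tree rooted at a with right child c.

```c
#include <stddef.h>

struct rnode {                 /* hypothetical node type for this sketch */
    char name;
    struct rnode *left, *right;
};

/* Left rotation about x; returns the new root of the subtree */
struct rnode *rot_left(struct rnode *x)
{
    struct rnode *y = x->right;
    struct rnode *save = y->left;
    y->left = x;
    x->right = save;
    return y;
}

/* Right rotation about x; returns the new root of the subtree */
struct rnode *rot_right(struct rnode *x)
{
    struct rnode *y = x->left;
    struct rnode *save = y->right;
    y->right = x;
    x->left = save;
    return y;
}

/* Write the inorder sequence of t into buf starting at position n;
   returns the position after the last name written */
int inorder(const struct rnode *t, char *buf, int n)
{
    if (t == NULL)
        return n;
    n = inorder(t->left, buf, n);
    buf[n++] = t->name;
    return inorder(t->right, buf, n);
}
```

Returning the new root lets the caller reattach the rotated subtree to its parent, a step the four-line fragments leave to the surrounding code.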
Now let us turn our concentration to the main discussion. To maintain the height balance
property of the tree in Fig. 12.14(a) we simply apply a right rotation on the subtree rooted at P.
This right rotation will yield a new subtree rooted at Q shown in Fig. 12.16. The figure also
shows the balance value of the nodes after rotation.
The unbalanced subtree in Fig. 12.14(b) requires a little more attention. Here the new node is
inserted into the right subtree of Q. Let R be the root of the right subtree of Q. Apparently, there
are three different situations for insertion. But they are analogous to each other. These three
situations are
• R itself may be the newly inserted node (arises when n=0).
• New node is inserted into the left subtree of R.
• New node is inserted into the right subtree of R.
We discuss the situation in Fig. 12.14(b) that illustrates the case where the new node is
inserted into the left subtree of R. At first we give a left rotation to the subtree rooted at Q. After
this rotation the subtree rooted at P will look like in Fig. 12.17(a).
Note that after this rotation the inorder sequences of the subtrees in Fig. 12.14(b) and
Fig. 12.17(a) are same. To rebalance, finally we give a right rotation to the subtree rooted at P
shown in Fig. 12.17(a). After this right rotation the final subtree will look like in Fig. 12.17(b).
Here also it should be noted that the inorder sequence of this subtree and that of the subtree in
Fig. 12.14(b) are the same. The other case is very much similar (symmetrical) to the above and is
left to the reader. A C function avl_insert() is presented below. The function searches for
and inserts an element into an AVL tree. The implementation of a node of an AVL tree was
discussed earlier in this section. The function shown in Example 12.10 returns the pointer to
the root of the AVL tree after node insertion.
Example 12.10
avlptr avl_insert (avlptr tree, data_type item)
{
int imbal_dir; /* Imbalance direction */
avlptr p = tree,
y = p, /*y points to the youngest unbalanced ancestor*/
cy = NULL, /* cy points to the child of y */
py = NULL, /* py points to the parent of y */
pp = NULL, /* pp points to the parent of p */
q = NULL; /* q points to the new node */
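/* Search for item; y ends up pointing to the youngest ancestor whose
balance is nonzero (loop head reconstructed from the declarations
above; an assumption) */
while (p != NULL)
{
if (item == p->info)
return(tree); /* item is already in the tree */
q = (item < p->info)? p->left : p->right;
if (q != NULL && q->balance != 0)
{
py = p;
y = q;
}
pp = p;
p = q;
}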
/* Insert new record into the BST */
q = (avlptr) malloc (sizeof (avlnode));
q->info = item;
q->left = q->right = NULL;
q->balance = 0;
if (item < pp->info)
pp->left = q;
else
pp->right = q;
/* At this point the balance between q and y must be adjusted
because of insertion */
p = (item < y->info)? y->left: y->right;
cy = p;
while ( p != q )
if (item < p->info)
{
p->balance = 1;
p = p->left;
}
else {
p->balance = -1;
p = p->right;
}
/* Check if the tree has become unbalanced */
imbal_dir = (item < y->info)? 1 : -1;
if (y->balance == 0)
{ /* Another level is added to the tree and since y is the
youngest ancestor, the tree remains balanced */
y->balance = imbal_dir;
return(tree);
}
if (y->balance != imbal_dir)
{ /* Node is inserted to the opposite direction of imbalance,
so the tree remains balanced */
y->balance = 0;
return(tree);
}
/* Now it has been found that the tree has become unbalanced. Rebal-
ance the tree by applying required rotations and adjusting the bal-
ance of involved nodes */
/* Note that, q is pointing to the inserted node,
y is pointing to its youngest unbalanced ancestor,
py is pointing to the parent of y,
and cy is pointing to the child of y in the direction of
imbalance */
if ( cy->balance == imbal_dir )
{ /* cy and y are unbalanced in same direction */
p = cy;
(imbal_dir == 1)? rotate_right (y) : rotate_left (y);
y->balance = 0;
cy->balance = 0;
}
else { /* cy and y are unbalanced in opposite directions */
if (imbal_dir == 1)
{
p = cy->right;
rotate_left(cy) ;
y->left = p;
rotate_right(y) ;
}
else {
p = cy->left;
rotate_right(cy) ;
y->right = p;
rotate_left(y) ;
}
/* Now adjust balance of nodes involved */
if (p->balance == 0)
{ /* p is the node inserted */
y->balance = 0;
cy->balance = 0;
}
else if (p->balance == imbal_dir)
{
y->balance = -imbal_dir;
cy->balance = 0;
}
else {
y->balance = 0;
cy->balance = imbal_dir;
}
p->balance = 0;
}
if (py == NULL)
tree = p;
else if (y == py->right)
py->right = p;
else
py->left = p;
return(tree);
}
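For comparison, the same rebalancing idea can be sketched recursively. This is not the method of Example 12.10: as an assumption made for brevity, each node here stores its height instead of a balance value, and the type hnode and all function names are hypothetical.

```c
#include <stdlib.h>

typedef struct hnode {          /* hypothetical node that stores a height */
    int info;
    int height;                 /* 1 for a leaf */
    struct hnode *left, *right;
} hnode;

static int ht(const hnode *t) { return t ? t->height : 0; }

/* Recompute the height of t from the heights of its children */
static void fix(hnode *t)
{
    int hl = ht(t->left), hr = ht(t->right);
    t->height = 1 + (hl > hr ? hl : hr);
}

static hnode *rot_r(hnode *x)   /* right rotation about x */
{
    hnode *y = x->left;
    x->left = y->right;
    y->right = x;
    fix(x);
    fix(y);
    return y;
}

static hnode *rot_l(hnode *x)   /* left rotation about x */
{
    hnode *y = x->right;
    x->right = y->left;
    y->left = x;
    fix(x);
    fix(y);
    return y;
}

/* Insert item and return the (possibly new) root of the subtree.
   Duplicates are ignored. */
hnode *avl_insert_rec(hnode *t, int item)
{
    int bal;
    if (t == NULL) {
        t = malloc(sizeof *t);
        if (t == NULL)
            return NULL;                  /* out of memory */
        t->info = item;
        t->height = 1;
        t->left = t->right = NULL;
        return t;
    }
    if (item < t->info)
        t->left = avl_insert_rec(t->left, item);
    else if (item > t->info)
        t->right = avl_insert_rec(t->right, item);
    else
        return t;
    fix(t);
    bal = ht(t->left) - ht(t->right);
    if (bal > 1) {                        /* left side too tall */
        if (item > t->left->info)         /* opposite directions: rotate child first */
            t->left = rot_l(t->left);
        return rot_r(t);
    }
    if (bal < -1) {                       /* right side too tall */
        if (item < t->right->info)
            t->right = rot_r(t->right);
        return rot_l(t);
    }
    return t;
}
```

The two branches correspond to the cases discussed above: a single rotation when the child and its parent lean the same way, and a double rotation when they lean in opposite directions.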
B-trees are a generalization of BSTs in which a node can hold more than one key. They are
balanced search trees that are particularly useful for external searching and work well on
secondary storage devices. If a node of the tree can have a maximum of m children, we say
that the tree is a B-tree of order m. In a B-tree all leaf nodes are at the
same level. If a node is having exactly k non-empty children, then the node contains exactly
(k-1) keys. The leaf nodes and the intermediate nodes are distinguishable. Precisely speaking, a
The basic operations that can be performed on a B-tree are the following: searching for a
key in the B-tree, inserting a key into the B-tree, deleting a key from the B-tree.
The above discussion suggests an implementation of the B-tree node structure as in the
following.
#define ATMOST „ /* Maximum number of keys in a node */
#define ATLEAST „ /* Minimum number of keys in a node = ⌈ATMOST/2⌉ */
typedef int key_data;
typedef struct node {
int n; /* Number of keys in the node */
key_data key[ATMOST+1];
struct node *bough[ATMOST+1];
} btreenode;
typedef btreenode *btreeptr;
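To make the node layout concrete, the following sketch searches a B-tree built from such nodes. It makes several assumptions not confirmed by the text: ATMOST is set to 4, keys are stored in key[1..n] in ascending order, and children in bough[0..n]. The names node_find and tree_find are hypothetical and are not the book's own routines.

```c
#include <stddef.h>

#define ATMOST 4                        /* assumed value for this sketch */

typedef int key_data;
typedef struct node {
    int n;                              /* number of keys in the node */
    key_data key[ATMOST + 1];           /* assumed: keys in key[1..n], ascending */
    struct node *bough[ATMOST + 1];     /* assumed: children in bough[0..n] */
} btreenode;

/* Search one non-empty node for k. Returns 1 and sets *pos to the index
   of k if found; otherwise returns 0 and sets *pos to the bough to follow. */
int node_find(key_data k, const btreenode *ptr, int *pos)
{
    if (k < ptr->key[1]) {
        *pos = 0;
        return 0;
    }
    for (*pos = ptr->n; *pos > 1 && k < ptr->key[*pos]; (*pos)--)
        ;
    return k == ptr->key[*pos];
}

/* Search the whole tree; returns the node holding k, or NULL */
const btreenode *tree_find(key_data k, const btreenode *root, int *pos)
{
    while (root != NULL) {
        if (node_find(k, root, pos))
            return root;
        root = root->bough[*pos];
    }
    return NULL;
}
```

Each step of tree_find either finds the key in the current node or descends one level, so the number of nodes read is at most the height of the tree.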
return ( (k == ptr->key[*pos])? 1 : 0 );
}
}
return ptr;
}
return rootptr;
}
Note that the function btree_insert() calls the function movedown(), which searches for the
key k in the B-tree pointed to by rootptr to find its insertion point. This function takes
four arguments, as given below.
• k, the key to be searched for in the B-tree for insertion.
• ptr, the pointer to the root node of the subtree in which the search takes place.
• pm, pointer to the key to be inserted into a newly created root (in case of splitting).
• pmr, pointer to the new node (the right subtree of pm) after splitting.
The function movedown() returns TRUE when the key pointed to by pm is to be placed in a new
root node and the height of the entire tree increases. When the height of the tree does not
increase, the function returns FALSE. In such a situation the function movedown() inserts the
key into the proper node of the B-tree, if required. movedown() is a recursive function and it
terminates when ptr, the pointer to the root node of the subtree in which the search is taking
place, is NULL. When a recursive call to movedown() returns TRUE, an attempt is made to
insert the key pointed to by pm into the current node. This is straightforward if there is room
for the key in the current node. Otherwise, the current node *ptr splits into *ptr and *pmr and
the middle key is moved upwards through the tree. The function movedown() in turn uses the
following three other functions.
• nodesearch(), to search for a key within a node as shown earlier.
• putkey(), which puts the key into the node *ptr when possible. The function is called
only when *ptr has room for a key.
• split(), which splits the node *ptr into *ptr and *pmr.
The movedown() function may now be written as shown in Example 12.13.
Example 12.13
int movedown(key_data k, btreeptr ptr, key_data *pm, btreeptr *pmr)
{
int b; /* On which bough to continue the search */
if (ptr == NULL)
{
*pm = k;
*pmr = NULL;
return TRUE;
}
else {
if ( !nodesearch(k, ptr, &b))
if (movedown(k, ptr->bough[b], pm, pmr))
Fig. 12.19
as in Fig. 12.19(b). Now let us consider the case where we try to delete a key from a leaf node
which is having minimum number of keys in it. This means that a key deletion from this node
will leave its node with too few keys. Consider that we have to delete the key 436 from its node.
This situation is tackled in the following way. Notice that its successor 462 is in its parent node.
Therefore 462 is moved down to take the place of 436 and is removed (deleted) from its original
place as earlier, that is, its original place is replaced by its successor 468 and 468 is deleted from
its leaf node. The process of deletion is depicted in Fig. 12.19(b). The deletion of the key
299 needs special attention. Notice that the deletion of 299 leaves its leaf node with too few keys.
Moreover, neither of its siblings can spare a key for this leaf. Therefore, the node is combined
with one of its siblings together with the middle key from its parent node as shown with the
loop in Fig. 12.19(c). After combining, the configuration of the B-tree is shown in Fig. 12.19(d). A
careful look at this figure tells us that the effect of combining will leave the parent node with too
few keys, and again its sibling cannot spare a key for it. Therefore, the top three nodes are again
combined to a single node as shown in Fig. 12.19(d) to yield the final B-tree after deletion which
is shown in Fig. 12.19(e).
All the cases of deletion of keys from a B-tree are discussed above in detail. We leave the
development of a detailed program for deleting a key from B-tree to the reader.
EXERCISES
1. For each node in the tree in Fig. 12.7
(i) Name the parent node
(ii) List the children
(iii) List the siblings
(iv) Find the depth and height of the tree
2. Define binary tree and binary search tree and distinguish between them.
3. Give the visiting sequence of vertices of each of the following binary trees under
(i) Preorder
(ii) Inorder
(iii) Postorder traversal
5. (a) Show the tree after inserting the following integers in sequence into an initially empty
binary search tree (BST).
18 6 20 25 32 8 12 16
(b) Show the BST after deleting the root from your BST.
6. Construct a binary tree, given the preorder and inorder sequences as below.
preorder: abceifjdghkl
inorder: eicfjbgdkhla
7. Show that in a binary tree with n nodes, there are (n+1) NULL pointers representing chil-
dren.
8. Show that the number of leaves in a non-empty binary tree is equal to the number of full
nodes plus one, where a full node is a node with two non-empty children.
9. How does a binary search tree (BST) differ from an AVL tree, and in what situation is a BST
inefficient as compared to an AVL tree with respect to searching time?
10. What is a threaded binary tree and what is its main advantage over an ordinary binary tree?
11. Write a function to generate the AVL tree of height h with minimum number of nodes in it.
12. Write a routine to perform deletion from a B-tree.
13
GRAPHS
Graphs are the most common 'abstract' structures encountered in computer science. Any
system that consists of discrete states or sites (called nodes or vertices) and connections
between them can be modeled by a graph. In computer science, mathematics, engineering, and
many other disciplines we often need to model a symmetric relationship between objects. The
objects are represented by nodes and the connections between nodes of a graph are called edges
(in case the connections are between unordered pairs of nodes), or directed edges (in case the
connections are between ordered pairs of nodes). Connections may also carry additional
information as labels or weights related to the interpretation of the graph. Consequently,
there are many types of graphs and many basic notions that capture aspects of the structure of
graphs. Also, many applications require efficient algorithms that essentially operate on graphs.
Graphs occur often in real life, and we encounter them in natural situations. For example,
a road map showing the interstate highway connections between cities is an excellent example
of an undirected graph, since all interstate highways are two-way. We could also add weights
to each edge to indicate the distance in miles between the two cities, producing a weighted
undirected graph.
The sequence of courses that one must take to complete a degree in computer science can
also be represented as a graph. It is a directed graph in which the direction of an edge implies
the specific order in which the courses must be taken.
Another example that is a little more closely related to computer science is the graph that
represents resource allocations. It describes the relationship between a process, that is, a pro-
gram in execution and other resources in the system such as memory, a file, or printer. A re-
source allocation graph is a directed graph with edges going from a process node to a resource
node.
In a computer network, computers are interconnected via high-speed communication
channels such as phone lines, optical fibres, microwave relays, or satellites. We can use a
graph-based representation to determine how to route messages from one node to another and
to find backup routes in case of node failure.
Graphs are frequently applied in diverse areas such as artificial intelligence, cybernetics,
chemical structures of crystals, transportation networks, electrical circuitry, and the analysis of
programming languages.
In this chapter we give graph representation and traversal algorithms. We also describe
the details of the carefully tuned data structures that may be needed to achieve the ultimate
bounds on time and space complexity for the algorithms.
Note that in an undirected graph there can be at most n(n-1)/2 edges for n nodes.
A weighted graph is a graph in which each edge has an associated value. A weighted
edge has a scalar value W associated with it. The weight is a measure of the cost of using this
edge to go from the source node to the destination node.
[Figure: an undirected, weighted edge with weight 15]
A city map showing only the one-way streets is an example of a digraph where the nodes are
the intersections of streets and the edges are the streets. A map of two-way streets is an
example of an undirected graph.
Any sequence of edges of a directed graph such that the terminal node of any edge in the
sequence is the initial node of the edge, if any, appearing next in the sequence defines a path
of the graph. In other words, a simple path through a graph is a sequence of vertices
$V_1, V_2, \ldots, V_k$ such that all vertices, except possibly the first and the last, are
distinct and each pair of nodes $V_i, V_{i+1}$, $i = 1, \ldots, k-1$, is connected by an edge.
Consider Fig. 13.4.
In the above figure, the sequence ABCD is a simple path. DFABCD is also a path. However,
ABE is not a path since there is no edge directly connecting nodes B and E. BCDBC is also
not a path since its nodes are not all distinct (the nodes B and C appear twice in the sequence).
A cycle or circuit is a simple path $V_1, V_2, \ldots, V_k$ exactly as defined above but with
the added requirement that $V_1 = V_k$.
For example, CDC and ABCDFEA are cycles. A cycle is called a simple cycle if no edge
appears more than once, otherwise it is a cycle.
The outdegree of a node in a directed graph is the number of edges exiting from the node;
the indegree of a node is the number of edges entering the node. In the following figures, the
indegree of CALCUTTA is 2 and its outdegree is 1. The indegree and outdegree of NEW DELHI
are 2 and 2, respectively. The indegree and outdegree of a node in a directed graph indicate its
relative importance in that graph. A node whose outdegree is 0 acts primarily as a sink node
and a node whose indegree is 0 is called a source node.
For example, in the figure, KALKA is the source node and PUNE is the sink node. Since
indegree and outdegree cannot apply to a node in an undirected graph, the degree of a node is
defined as the number of edges connected directly to the node. In Fig. 13.6 the degree of each
node is 2.
     V1  V2  V3  V4  V5  V6
V1    0   1   1   0   0   0
V2    0   0   1   0   0   0
V3    0   0   0   1   0   0
V4    1   0   0   0   1   1
V5    0   0   0   1   0   0
V6    0   0   0   0   0   0
Fig. 13.7
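The indegree and outdegree of a node can be read off an adjacency matrix by summing a column or a row. The sketch below uses the matrix of Fig. 13.7 and assumes that row i and column j stand for an edge from vertex i+1 to vertex j+1. The array and function names are illustrative.

```c
#define NV 6   /* number of vertices in Fig. 13.7 */

/* Adjacency matrix of Fig. 13.7; assumed convention: adj[i][j] = 1
   if there is an edge from vertex i+1 to vertex j+1 */
static const int adj[NV][NV] = {
    {0, 1, 1, 0, 0, 0},
    {0, 0, 1, 0, 0, 0},
    {0, 0, 0, 1, 0, 0},
    {1, 0, 0, 0, 1, 1},
    {0, 0, 0, 1, 0, 0},
    {0, 0, 0, 0, 0, 0}
};

/* Outdegree of vertex v (0-based): number of 1s in row v */
int outdegree(int v)
{
    int j, d = 0;
    for (j = 0; j < NV; j++)
        d += adj[v][j];
    return d;
}

/* Indegree of vertex v (0-based): number of 1s in column v */
int indegree(int v)
{
    int i, d = 0;
    for (i = 0; i < NV; i++)
        d += adj[i][v];
    return d;
}
```

Under this convention V6 has outdegree 0 and is therefore a sink node, while V4 has the largest outdegree.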
Similarly, we can draw adjacency matrix of an undirected graph. The adjacency matrix
and the corresponding graph for an undirected graph are shown in the figures below.
[Figure: undirected graph on the vertices V1, V2, V3, V4]
     V1  V2  V3  V4
V1    0   1   1   1
V2    1   0   1   1
V3    1   1   0   1
V4    1   1   1   0
The adjacency matrix for an undirected graph is always symmetric whereas the adjacency
matrix for a directed graph need not be symmetric. In other words, if the graph is
undirected, then A(i, j) = A(j, i) for all i, j.
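The symmetry condition translates directly into a check on a stored matrix. This is a small sketch for a fixed size of four vertices; the constant NVERT and the function name is_symmetric are assumptions.

```c
#define NVERT 4

/* Returns 1 if a[i][j] == a[j][i] for all i, j, otherwise 0.
   Only the entries above the diagonal need to be compared. */
int is_symmetric(int a[NVERT][NVERT])
{
    int i, j;
    for (i = 0; i < NVERT; i++)
        for (j = i + 1; j < NVERT; j++)
            if (a[i][j] != a[j][i])
                return 0;
    return 1;
}
```

A matrix that fails this check cannot be the adjacency matrix of an undirected graph.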
(a) Graph G
[Figure: graph G on the vertices 1 to 5]

(b) Adjacency list of graph G
1 -> 2 -> 5
2 -> 1 -> 5 -> 3 -> 4
3 -> 2 -> 4
4 -> 2 -> 5 -> 3
5 -> 4 -> 1 -> 2

(c) Adjacency matrix of graph G
     1  2  3  4  5
1    0  1  0  0  1
2    1  0  1  1  1
3    0  1  0  1  0
4    0  1  1  0  1
5    1  1  0  1  0

Fig. 13.10
(a) Graph G1
[Figure: directed graph G1 on the vertices 1 to 6]

(b) Adjacency list of graph G1
1 -> 2 -> 4
2 -> 5
3 -> 6 -> 5
4 -> 2
5 -> 4
6 -> 6

(c) Adjacency matrix of graph G1
     1  2  3  4  5  6
1    0  1  0  1  0  0
2    0  0  0  0  1  0
3    0  0  0  0  1  1
4    0  1  0  0  0  0
5    0  0  0  1  0  0
6    0  0  0  0  0  1

Fig. 13.11
In the adjacency list representation, we store all the vertices in a list and then, for each
vertex, we keep a linked list of its adjacent vertices. In the next section, the graph traversal
algorithm will be discussed.
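One possible way to hold such lists in C is an array of list heads indexed by vertex number. The sketch below is only an illustration: the type adjnode, the array adjlist, and the functions add_edge, degree, and adjacent are assumptions, and each undirected edge is stored as two list entries.

```c
#include <stdlib.h>

#define MAXV 5                       /* vertices numbered 1..5 */

struct adjnode {                     /* hypothetical list node */
    int vertex;
    struct adjnode *next;
};

struct adjnode *adjlist[MAXV + 1];   /* adjlist[v] heads the list of v */

/* Add w at the front of the list of v; returns 0 on success,
   -1 if memory cannot be allocated */
static int add_arc(int v, int w)
{
    struct adjnode *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    p->vertex = w;
    p->next = adjlist[v];
    adjlist[v] = p;
    return 0;
}

/* Store an undirected edge as two list entries */
int add_edge(int v, int w)
{
    if (add_arc(v, w) != 0)
        return -1;
    return add_arc(w, v);
}

/* Degree of v = length of its adjacency list */
int degree(int v)
{
    int d = 0;
    const struct adjnode *p;
    for (p = adjlist[v]; p != NULL; p = p->next)
        d++;
    return d;
}

/* Returns 1 if w appears in the list of v, otherwise 0 */
int adjacent(int v, int w)
{
    const struct adjnode *p;
    for (p = adjlist[v]; p != NULL; p = p->next)
        if (p->vertex == w)
            return 1;
    return 0;
}
```

For the graph G of Fig. 13.10, calling add_edge for the edges 1-2, 1-5, 2-3, 2-4, 2-5, 3-4, and 4-5 builds the lists. The storage used grows with the number of edges, not with the square of the number of vertices as for the matrix.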
#include<stdio.h>
#include<stdlib.h>
#include<conio.h>   /* clrscr(), getch(): Turbo C console routines */
void insert (int);
int Remove (void);
int queue[20], rear=-1, graph[7][7], row;
/* Add x at the rear of the queue */
void insert (int x)
{
    rear = rear+1;
    queue[rear] = x;
}
/* Remove and return the front element, shifting the rest forward */
int Remove()
{
    int item, k;
    item = queue[0];
    for(k=0; k<rear; k++)
        queue[k] = queue[k+1];
    rear = rear-1;
    return(item);
}
void main()
{
    int i,j,num,w,visited[10],v,j1;
    int l, vertices[10], count=0, final[10];
    clrscr();
    randomize();
    printf("How many vertices are there ====>");
    scanf("%d", &row);
    printf("\n The graph's adjacency matrix is as follows:\n\n\n");
    printf(" ");
    for(j=0; j<row; j++)
        printf("Vertex %d", j);
    printf("\n");
    /* Fill a random symmetric matrix with a zero diagonal */
    for(i=0; i<row; i++)
    {
        for(j=count; j<row; j++)
        {
            if(i!=j)
            {
                graph[i][j] = random(2);
                graph[j][i] = graph[i][j];
            }
            else graph[i][j] = 0;
        }
        count++;
    }
    for(i=0; i<row; i++)
    {
        printf("Vertex %d", i);
        for(j=0; j<row; j++)
            printf("%8d", graph[i][j]);
        printf(" \n\n");
    }
    for(i=0; i<row; i++)
        visited[i]=0;
    printf("\n Enter the vertex number from which to start ====>");
    scanf("%d", &v);
    visited[v]=1;
    insert(v);
    getch();
    clrscr();
    printf("\n Starting vertex =====> Vertex %d \n\n",v);
    count=1;
    j1=0;
    while(rear>=0)
    {
        v=Remove();
        final[j1]=v;
        j1++;
        l=0;
        /* Collect the vertices adjacent to v */
        for(i=0; i<row; i++)
            if(graph[v][i]==1)
            {
                vertices[l]=i;
                l++;
            }
        for(i=0; i<l; i++)
        {
            w=vertices[i];
            printf("Step %d: Vertex visited ===> Vertex %d\n", count, w);
            if(visited[w]!=1)
            {
                insert(w);
                visited[w]=1;
            }
        }
        printf("Elements in the queue ===>");
        if(rear>=0)
            for(j=0; j<=rear; j++)
                printf("%d", queue[j]);
        else
            printf("EMPTY QUEUE: TRAVERSAL COMPLETE");
        count++;
        printf("\n\n\n");
        getch();
        clrscr();
    }
    printf("\n The vertices that are traversed");
    printf("from vertex %d are as follows:\n",v);
    if(count==2)
        printf("\n Vertex %d: Is an isolated vertex",v);
    else
        for(i=0; i<j1; i++)
            printf("Vertex %d,", final[i]);
    getch();
}
The program produces the following output corresponding to the given input.
How many vertices are there ===> 5
The graph's adjacency matrix is as follows:
         Vertex0  Vertex1  Vertex2  Vertex3  Vertex4
Vertex0     0        1        1        0        0
Vertex1     1        0        0        1        0
Vertex2     1        0        0        1        1
Vertex3     0        1        1        0        0
Vertex4     0        0        1        0        0
Enter the vertex number from which to start ====> 3
Starting vertex ====> Vertex3
Step1: Vertex visited ====> Vertex1
     V1  V2  V3  V4  V5
V1    0   1   1   0   0
V2    0   0   0   0   0
V3    0   1   0   0   0
V4    1   1   1   0   1
V5    0   1   0   0   0
Fig. 13.12
Let us choose V1 as the starting node in the graph. First designate it as the search node and
mark it as visited. The nodes V2 and V3 are adjacent to V1. Next we mark V2 for searching the
possible path. There is no node adjacent to V2, so we move back to V1 to consider the next
possibility, that is, V3. All nodes adjacent to V3 have also been visited, so we return to V1.
The search from V1 is now complete, since all nodes except V4 and V5 have been visited.
We now choose V4 and mark it as visited. Since V5 is the only unvisited node adjacent
to V4, we proceed to V5. V2 is the only node adjacent to V5, and it has already been
visited. We then return to V4 and the total search is complete.
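The walk described above can be reproduced with a short recursive routine on the matrix of Fig. 13.12. This is a minimal sketch: the array and function names are assumptions, vertices are numbered from 0 (so index 0 stands for V1), and neighbours are tried in index order.

```c
#define NV 5                      /* vertices V1..V5 of Fig. 13.12 */

static const int g[NV][NV] = {
    {0, 1, 1, 0, 0},
    {0, 0, 0, 0, 0},
    {0, 1, 0, 0, 0},
    {1, 1, 1, 0, 1},
    {0, 1, 0, 0, 0}
};

static int visited[NV];
int order[NV];                    /* vertices in the order visited (0-based) */
int nvisited;

/* Visit v, then every unvisited vertex adjacent to v, in index order */
static void dfs(int v)
{
    int w;
    visited[v] = 1;
    order[nvisited++] = v;
    for (w = 0; w < NV; w++)
        if (g[v][w] && !visited[w])
            dfs(w);
}

/* Traverse the whole graph, restarting from each vertex not yet reached */
void dfs_all(void)
{
    int v;
    nvisited = 0;
    for (v = 0; v < NV; v++)
        visited[v] = 0;
    for (v = 0; v < NV; v++)
        if (!visited[v])
            dfs(v);
}
```

The recursion backs up automatically at a dead end, and the outer loop in dfs_all supplies the restart at V4 once the search from V1 is exhausted.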
Depth-first search is a generalization of traversing trees in preorder. The starting vertex
may be determined by the problem or may be chosen arbitrarily. While traversing vertices start-
ing from the initial vertex, a dead end may be reached. A dead end is a vertex such that all its
neighbours, that is, vertices adjacent to it, have already been visited. At a dead end we back up
along the last edge traversed and branch out in another direction. The importance of the
depth-first algorithm, developed by J E Hopcroft and R E Tarjan, lies in the fact that it
serves as the basis for many other important algorithms.
In a breadth-first search, vertices are visited in order of increasing distance from the
starting vertex, say V, where distance is simply the number of edges in a shortest path. An
efficient implementation of either method must keep a list of vertices that have been visited
but whose adjacent vertices have not yet been visited. When a depth-first search backs up from
a dead end, it is supposed to branch out from the most recently visited vertex before pursuing
new paths from vertices that were visited earlier. Thus the list of vertices from which some
paths remain to be traversed must be a stack. On the other hand, in a breadth-first search, in
order to ensure that vertices close to V are visited first, the list must be a queue.
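The role of the queue can be seen in a compact breadth-first routine that also records distances. The sketch below is an illustration only: it uses the adjacency matrix from the sample run of the breadth-first program, and the function name bfs and its parameters are assumptions.

```c
#define NV 5

/* Adjacency matrix from the sample run of the breadth-first program */
static const int g[NV][NV] = {
    {0, 1, 1, 0, 0},
    {1, 0, 0, 1, 0},
    {1, 0, 0, 1, 1},
    {0, 1, 1, 0, 0},
    {0, 0, 1, 0, 0}
};

/* Fill dist[] with the number of edges on a shortest path from start
   (-1 if unreachable) and order[] with the visiting order.
   Returns the number of vertices reached. */
int bfs(int start, int dist[NV], int order[NV])
{
    int queue[NV], front = 0, rear = 0;   /* each vertex enters at most once */
    int v, w, n = 0;
    for (v = 0; v < NV; v++)
        dist[v] = -1;
    dist[start] = 0;
    queue[rear++] = start;
    while (front < rear) {
        v = queue[front++];               /* oldest waiting vertex first */
        order[n++] = v;
        for (w = 0; w < NV; w++)
            if (g[v][w] && dist[w] < 0) {
                dist[w] = dist[v] + 1;
                queue[rear++] = w;
            }
    }
    return n;
}
```

Starting from vertex 3, the vertices leave the queue in order of non-decreasing distance. Replacing the queue by a stack would remove that guarantee.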
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>   /* clrscr(), getch(): Turbo C console routines */
struct stack
{
    int data[20];
    int top;
};
void push(struct stack*, int);
int pop(struct stack*);
int graph[10][10], row;
void main()
{
    struct stack S;
    int i, j, count=0, v, visited[15];
    int l, vertices[15], w, final[15], indx=0;
    clrscr();
    randomize();
    printf("How many vertices are there ====>");
    scanf("%d", &row);
    printf("\n The graph's adjacency matrix is as follows:\n\n\n");
    printf(" ");
    for(j=0; j<row; j++)
        printf("Vertex %d", j);
    printf("\n");
    /* Fill a random symmetric matrix with a zero diagonal */
    for(i=0; i<row; i++)
    {
        for(j=count; j<row; j++)
        {
            if(i!=j)
            {
                graph[i][j]=random(2);
                graph[j][i]=graph[i][j];
            }
            else
                graph[i][j]=0;
        }
        count++;
    }
    for(i=0; i<row; i++)
    {
        printf("Vertex %d", i);
        for(j=0; j<row; j++)
            printf("%8d", graph[i][j]);
        printf("\n\n");
    }
    for(i=0; i<20; i++)
        S.data[i]=0;
    S.top=-1;
    printf("Enter the vertex from which traversal starts ===>");
    scanf("%d", &v);
    getch(); clrscr();
    for(i=0; i<row; i++)
        visited[i]=0;
    visited[v]=1;
    push(&S,v);
    printf("\n Starting vertex ====> vertex %d \n\n", v);
    count=1;
    while(S.top>=0)
    {
        v=pop(&S);
        final[indx]=v;
        indx++;
        l=0;
EXERCISES
1. Describe the adjacency matrix representation of graph. What are its advantages and disad-
vantages?
2. What are the main advantages of adjacency list representation of a graph over the adja-
cency matrix representation?
3. Explain the difference between a directed graph and an undirected one.
4. Write a pseudo code to implement the Breadth-first search algorithm.
5. Consider a graph in which the relationship among the nodes is linear. Describe the order in
which the nodes will be processed during Depth-first search that start at one of the nodes
with only one neighbour. What happens if the search starts at one of the nodes with two
neighbours?
6. Both Breadth-first and Depth-first search procedures probe each node in the graph at least
once and each edge twice. Prove that this effort is O(n+e), where n is the number of nodes
and e denotes the number of edges.
7. Write a recursive routine to implement Breadth-first search.
8. Write an algorithm to produce the shortest path from a node m to another node n in an
undirected graph if a path exists, or an indication that no path exists between two nodes.
9. Let D be a directed graph and T_D be the directed graph formed by adding an edge from d1
to d2 whenever d1 and d2 (with no direct edge) are nodes in D and there is a path from d1 to
d2. T_D is the transitive closure of D. Write an algorithm to compute the transitive closure of
a digraph.
10. A topological order is a linear relationship among the nodes of a directed graph such that
each directed edge goes from a node to one of its successors. The basic idea behind topo-
logical sorting is to find a node with no successor, remove it from the graph and add it to a
list. Repeat this process until the graph is empty. Write an algorithm to implement topologi-
cal sorting.
11. Explain briefly the following terms:
a. digraph b. adjacency list c. traversal of a graph