Module-1 (1) - 1
MODULE 1: INTRODUCTION
DATA STRUCTURES
Data may be organized in many different ways. The logical or mathematical model of a
particular organization of data is called a data structure.
A data structure is a storage that is used to store and organize data. It is a way of
arranging data on a computer so that it can be accessed and updated efficiently.
1. Primitive data Structures: Primitive data structures are the fundamental data types which
are supported by a programming language. Basic data types such as integer, real, character
and Boolean are known as primitive data structures. These data types consist of values
that cannot be divided further, and hence they are also called simple data types.
2. Non-Primitive data Structures: Non-primitive data structures are those data structures
which are created using primitive data structures. Examples of non-primitive data
structures are complex numbers, linked lists, stacks, trees, and graphs.
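The eight primitive data types described below follow the sizes and ranges defined by Java; in C,
the sizes of the corresponding types depend on the implementation.
1. Boolean
The Boolean data type represents one bit of information, either true or false, corresponding to
the two truth values of logic and Boolean algebra. The size of the Boolean data type
is virtual machine-dependent.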
Size: Virtual machine dependent
Values: Boolean such as true, false
Default Value: false
2. Byte
The byte data type is an 8-bit signed two’s complement integer. The byte data type is useful for
saving memory in large arrays.
Size: 1 byte (8 bits)
Values: -128 to 127
3. Short
The short data type is a 16-bit signed two’s complement integer. Similar to byte, use a short to save
memory in large arrays, in situations where the memory savings actually matter.
Size: 2 bytes (16 bits)
Values: -32,768 to 32,767
4. Int
It is a 32-bit signed two’s complement integer.
Size: 4 bytes (32 bits)
Values: -2,147,483,648 to 2,147,483,647
5. Long
The range of a long is quite large. The long data type is a 64-bit signed two’s complement integer and is
useful for those occasions where an int type is not large enough to hold the desired value.
Size: 8 bytes (64 bits)
Values: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
6. Float
The float data type is a single-precision 32-bit IEEE 754 floating-point. Use a float (instead of
double) if you need to save memory in large arrays of floating-point numbers.
Size: 4 bytes (32 bits)
Precision: up to 7 decimal digits
7. Double
The double data type is a double-precision 64-bit IEEE 754 floating-point. For decimal values, this
data type is generally the default choice.
Size: 8 bytes (64 bits)
Precision: up to 16 decimal digits
8. Char
The char data type is a single 16-bit Unicode character.
Size: 2 bytes (16 bits)
Values: ‘\u0000’ (0) to ‘\uffff’ (65535)
Based on the structure and arrangement of data, non-primitive data structures are further
classified into
1. Linear Data Structure
2. Non-linear Data Structure
1. Linear Data Structure:
A data structure is said to be linear if its elements form a sequence or a linear list. There are
basically two ways of representing such linear structure in memory.
1. One way is to have the linear relationships between the elements represented by means
of sequential memory location. These linear structures are called arrays.
2. The other way is to have the linear relationship between the elements represented by
means of pointers or links. These linear structures are called linked lists.
The common examples of linear data structure are Arrays, Queues, Stacks, Linked lists
2. Stack: A stack is a linear data structure in which the operations are performed in a particular
order. The order is LIFO (Last In First Out), which is also described as FILO (First In Last Out).
There are many real-life examples of a stack. Consider plates stacked over one
another in a canteen. The plate at the top is the first one to be removed, while the plate
placed at the bottommost position remains in the stack for the longest period of time. So, the
stack follows LIFO (Last In First Out)/FILO (First In Last Out) order.
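The LIFO behaviour described above can be illustrated with a small array-based stack. This is only a sketch: the names Stack, stackInit, stackIsEmpty, stackPush, stackPop and the capacity STACK_CAPACITY are assumptions made for illustration and do not come from these notes.

```c
#include <stdbool.h>

#define STACK_CAPACITY 100   /* assumed fixed capacity for this sketch */

typedef struct {
    int items[STACK_CAPACITY];
    int top;                 /* index of the top element, -1 when empty */
} Stack;

/* Make the stack empty. */
void stackInit(Stack *s) { s->top = -1; }

bool stackIsEmpty(const Stack *s) { return s->top == -1; }

/* Push a value on top; returns false if the stack is full. */
bool stackPush(Stack *s, int value)
{
    if (s->top == STACK_CAPACITY - 1)
        return false;
    s->items[++s->top] = value;
    return true;
}

/* Pop the most recently pushed value into *out; returns false if empty. */
bool stackPop(Stack *s, int *out)
{
    if (stackIsEmpty(s))
        return false;
    *out = s->items[s->top--];
    return true;
}
```

Pushing 1, 2, 3 and then popping three times yields 3, 2, 1: the last plate placed on the pile is the first one taken off.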
3. Queue: This structure is similar to the stack in that the data is stored sequentially. The
difference is that the queue follows the FIFO (First In First Out) rule,
where the first element added is the first to leave the queue. Front and rear are the two terms
used with a queue.
FIFO Principle of Queue:
A queue is like a line of people waiting to purchase tickets, where the first person in line is the first
person served (i.e. first come, first served).
The position of the entry that is ready to be served, that is, the first entry that will be removed
from the queue, is called the front of the queue (sometimes, the head of the queue). Similarly, the
position of the last entry in the queue, that is, the one most recently added, is called the rear (or
the tail) of the queue.
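The front/rear terminology can be made concrete with a small array-based queue. The sketch below assumes a fixed capacity and treats the array as circular; Queue, queueInit, enqueue, dequeue and QUEUE_CAPACITY are illustrative names, not part of these notes.

```c
#include <stdbool.h>

#define QUEUE_CAPACITY 100   /* assumed fixed capacity for this sketch */

typedef struct {
    int items[QUEUE_CAPACITY];
    int front;               /* index of the next element to remove */
    int count;               /* number of elements currently stored */
} Queue;

/* Make the queue empty. */
void queueInit(Queue *q) { q->front = 0; q->count = 0; }

/* Add a value at the rear; returns false if the queue is full. */
bool enqueue(Queue *q, int value)
{
    if (q->count == QUEUE_CAPACITY)
        return false;
    int rear = (q->front + q->count) % QUEUE_CAPACITY;  /* slot after the last entry */
    q->items[rear] = value;
    q->count++;
    return true;
}

/* Remove the value at the front into *out; returns false if empty. */
bool dequeue(Queue *q, int *out)
{
    if (q->count == 0)
        return false;
    *out = q->items[q->front];
    q->front = (q->front + 1) % QUEUE_CAPACITY;
    q->count--;
    return true;
}
```

Elements leave in the order in which they were added, which is the FIFO rule stated above.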
4. Linked List: A linked list is a linear data structure in which the elements are not stored at
contiguous memory locations. The elements in a linked list are linked using pointers.
Tree: A tree data structure consists of various nodes linked together. The structure of a tree is
hierarchical and forms a relationship like that of a parent and a child. The tree is
formed in such a way that there is one connection for every parent-child relationship. Exactly one path
exists from the root to any node in the tree. Various types of trees exist based on their
structures, such as AVL trees, binary trees, binary search trees, etc.
Graphs: Graphs are non-linear data structures which consist of a finite set of
vertices and edges. The vertices or nodes store data and the edges show the
relationships between vertices. The difference between a graph and a tree is that in a graph there are no specific
rules for the connection of nodes. Real-life problems like social networks, telephone networks, etc.
can be represented through graphs.
ARRAYS
An array is defined as an ordered set of similar data items. All the data items of an
array are stored in consecutive memory locations.
The data items of an array are of the same type, and each data item can be accessed using
the same name but a different index value.
An array is a set of pairs, <index, value>, such that each index has a value associated
with it. It can be called a correspondence or a mapping.
Ex: <index, value>
< 0 , 25 > list[0]=25
< 1 , 15 > list[1]=15
< 2 , 20 > list[2]=20
< 3 , 17 > list[3]=17
< 4 , 35 > list[4]=35
Here, list is the name of the array. The data items in list can be accessed using list[0] to
list[4].
Array in C
Declaration: A one-dimensional array in C is declared by adding brackets to the name of a
variable.
Ex: int list[5], *plist[5];
The array list[5] defines 5 integers. In C, arrays start at index 0, so list[0], list[1],
list[2], list[3], list[4] are the names of the five array elements, each of which contains an integer
value.
The array *plist[5] defines an array of 5 pointers to integers, where plist[0], plist[1],
plist[2], plist[3], plist[4] are the five array elements, each of which contains a pointer to an
integer.
Implementation:
When the compiler encounters the array declaration list[5], it allocates five consecutive
memory locations. Each location is large enough to hold a single integer.
The address of the first element of an array is called the base address. Ex: For list[5] the
address of list[0] is called the base address.
To compute the memory address of list[i], the compiler obtains the size of an
int from sizeof(int). If α is the base address, the memory address of list[i] is:
α + i * sizeof(int)
Note: In C, the offset i need not be multiplied by the size of the type to get to the appropriate
element of the array. Hence (list2+i) is equal to &list2[i] and *(list2+i) is equal to list2[i].
When sum is invoked, input=&input[0] is copied into a temporary location and associated
with the formal parameter list
A function can print both the address of the ith element of the array and the value found
at that address.
Output:
Address Content
12344868 0
12344872 1
12344876 2
12344880 3
12344884 4
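A function of the kind described above might look like the following sketch. The names print1 and elementAddress are assumptions made for illustration, and the addresses printed depend on the machine, so they will not match the listing above.

```c
#include <stdio.h>

/* Print the address and content of each of the first n elements of ptr. */
void print1(const int *ptr, int n)
{
    printf("Address    Content\n");
    for (int i = 0; i < n; i++)
        printf("%p %5d\n", (const void *)(ptr + i), *(ptr + i));
}

/* Return the address of element i, computed by pointer arithmetic. */
const int *elementAddress(const int *base, int i)
{
    return base + i;   /* the compiler scales i by sizeof(int) */
}
```

For an array one holding the values 0 through 4, the call print1(one, 5) prints five addresses that are sizeof(int) bytes apart, and elementAddress(one, 3) equals &one[3].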
STRUCTURES
C provides a way to group data that permits the data to vary in type. This mechanism is called the
structure, or struct for short.
A structure (a record) is a collection of data items, where each item is identified by its type
and name.
Syntax: struct
{ data_type member 1;
data_type member 2;
………………………
………………………
data_type member n;
} variable_name;
Ex: struct {
char name[10];
int age;
float salary;
} Person;
The above example creates a structure variable named Person that has three fields:
name = a name that is a character array
age = an integer value representing the age of the person
salary = a float value representing the salary of the individual
Ex: strcpy(Person.name, "james");
Person.age = 10;
Person.salary = 35000;
Type-Defined Structure
The structure definition associated with keyword typedef is called Type-Defined Structure.
Syntax 1: typedef struct
{
data_type member 1;
data_type member 2;
………………………
………………………
data_type member n;
}Type_name;
Where,
typedef is the keyword used at the beginning of the definition; by using typedef, a
user-defined data type can be obtained.
struct is the keyword which tells the compiler that a structure is being defined.
The members are declared with their data_type.
Type_name is not a variable; it is a user-defined data type.
For example, if humanBeing is the Type_name of such a definition, then humanBeing is a user-defined data type, and the statement
humanBeing person1, person2;
declares the variables person1 and person2 to be of type humanBeing.
Structure Operation
Various operations can be performed on structures and structure members.
1. The structures are defined separately, and one is used as a member of the other.
Example:
typedef struct {
int month;
int day;
int year;
}date;
typedef struct {
char name[10];
int age;
float salary;
date dob;
} humanBeing;
humanBeing person1;
A person born on February 11, 1944, would have the values for the date struct set as:
person1.dob.month = 2;
person1.dob.day = 11;
person1.dob.year = 1944;
2. The complete definition of a structure is placed inside the definition of another structure.
Example:
typedef struct {
char name[10];
int age;
float salary;
struct {
----
} date;
} humanBeing;
SELF-REFERENTIAL STRUCTURES
A self-referential structure is one in which one or more of its components is a pointer to itself.
Self-referential structures usually require dynamic storage management routines (malloc and free)
to explicitly obtain and release memory.
Consider as an example:
typedef struct list {
char data;
struct list *link;
} list;
Each instance of the structure list will have two components, data and link.
data: a single character.
link: a pointer to a list structure. The value of link is either the address in
memory of an instance of list or the null pointer.
Consider these statements, which create three structures and assign values to their respective
fields:
Structures item1, item2 and item3 each contain the data item a, b, and c respectively, and the
null pointer. These structures can be attached together by replacing the null link field in item
2 with one that points to item 3 and by replacing the null link field in item 1 with one that points
to item 2.
item1.link = &item2;
item2.link = &item3;
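The statements that create and chain the three structures can be sketched as below. The helper names linkItems and chainLength are assumptions introduced for illustration; the structure itself uses a tag (struct list) so that the link member can refer to the type being defined.

```c
#include <stddef.h>

/* Self-referential node: link points to another node of the same type. */
typedef struct list {
    char data;
    struct list *link;
} list;

/* Store a, b, c in three existing nodes and chain them
   as item1 -> item2 -> item3. */
void linkItems(list *item1, list *item2, list *item3)
{
    item1->data = 'a';
    item2->data = 'b';
    item3->data = 'c';
    item1->link = item2;
    item2->link = item3;
    item3->link = NULL;    /* end of the chain */
}

/* Count the nodes reachable from first by following the link fields. */
int chainLength(const list *first)
{
    int n = 0;
    for (; first != NULL; first = first->link)
        n++;
    return n;
}
```

After linkItems(&item1, &item2, &item3), following the links from item1 visits all three structures, and the null pointer in item3 marks the end.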
Unions:
A union is similar to a structure; it is a collection of data items of similar or dissimilar data types.
Syntax: union{
data_type member 1;
data_type member 2;
………………………
………………………
data_type member n;
}variable_name;
Example:
union{
int children;
int beard;
} u;
Union Declaration:
A union declaration is similar to a structure, but the fields of a union must share their memory
space. This means that only one field of the union is "active" at any given time.
union{
char name;
int age;
float salary;
}u;
The major difference between a union and a structure is that unlike structure members which
are stored in separate memory locations, all the members of union must share the same memory
space. This means that only one field of the union is "active" at any given time.
Example:
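#include <stdio.h>
union job {
    char name[32];
    float salary;
    int worker_no;
} u;
int main(void) {
    printf("Enter name:\n");
    scanf("%31s", u.name);     /* u.name is already an address; read at most 31 characters */
    printf("Enter salary: \n");
    scanf("%f", &u.salary);    /* overwrites the first bytes that held name */
    printf("Displaying\n Name :%s\n", u.name);   /* name is no longer the active member */
    printf("Salary: %.1f", u.salary);
    return 0;
}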
Output:
Enter name: Albert
Enter salary: 45678.90
Displaying
Name: f%gupad (Garbage Value)
Salary: 45678.9
POINTERS
A pointer is a variable which contains the address in memory of another variable.
The two most important operators used with the pointer type are
& - The unary operator & which gives the address of a variable
* - The indirection or dereference operator * gives the content of the object pointed to
by a pointer.
Declaration
int i, *pi;
pi = &i;
Here, &i returns the address of i and assigns it as the value of pi
Null Pointer
The null pointer points to no object or function.
The null pointer is represented by the integer 0.
The null pointer can be used in relational expression, where it is interpreted as false.
3. Pointers can be dangerous when explicit type casts are used to convert between pointer types.
Ex: pi = malloc (sizeof (int));
pf = (float*) pi;
4. On some systems, pointers have the same size as type int. Since int is the default type specifier,
some programmers omit the return type when defining a function. The return type then defaults to
int, which can later be interpreted as a pointer. This has proven to be a dangerous practice on
some computers, and the programmer should define explicit return types for functions.
Pointers to Pointers
A variable which contains the address of a pointer variable is called a pointer-to-pointer.
1. malloc( ):
The function malloc allocates a user-specified amount of memory and returns a pointer to the start of
the allocated memory.
If there is insufficient memory to make the allocation, the returned value is NULL.
Syntax:
data_type *x;
x= (data_type *) malloc(size);
Where,
x is a pointer variable of data_type
size is the number of bytes
2. calloc( ):
The function calloc allocates a user-specified amount of memory, initializes the allocated
memory to 0, and returns a pointer to the start of the allocated memory.
data_type *x;
x= (data_type *) calloc(n, size);
Where,
x is a pointer variable of data_type
n is the number of blocks to be allocated
size is the number of bytes in each block
Ex: int *x;
x= calloc (10, sizeof(int));
The above example is used to define a one-dimensional array of integers. The capacity of this
array is n=10 and x[0:n-1] (x[0] through x[9]) are initially 0.
Macro CALLOC
#define CALLOC(p, n, s)\
if (!((p) = calloc(n, s)))\
{\
fprintf(stderr, "Insufficient memory");\
exit(EXIT_FAILURE);\
}
3. realloc( ):
Before using the realloc( ) function, the memory should have been allocated using
the malloc( ) or calloc( ) functions.
The function realloc( ) resizes memory previously allocated by either malloc or calloc,
which means the size of the memory changes by extending or shrinking the allocated
memory.
If the existing allocated memory can be extended in place, the pointer value will not change.
If the existing allocated memory cannot be extended, the function allocates a new block,
copies the contents of the existing memory block into the new memory block and then
frees the old memory block.
When realloc is able to do the resizing, it returns a pointer to the start of the new block,
and when it is unable to do the resizing, the old block is unchanged and the function
returns the value NULL.
Syntax:
data_type *x;
x= (data_type *) realloc(p, s );
The size of the memory block pointed at by p changes to s. Let oldSize be the previous size of the
block. When s > oldSize, the additional s - oldSize bytes have an unspecified value, and when
s < oldSize, the rightmost oldSize - s bytes of the old block are freed.
Macro REALLOC
#define REALLOC(p, s)\
if (!((p) = realloc(p, s)))\
{\
fprintf(stderr, "Insufficient memory");\
exit(EXIT_FAILURE);\
}
4. free( )
Memory allocated dynamically with malloc( ), calloc( ) or realloc( ) is not released on its own.
The programmer must use free( ) explicitly to release the space.
Syntax:
free(ptr);
This statement causes the space in memory pointed to by ptr to be deallocated.
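The allocate, resize and release cycle described in this section can be sketched as follows. The function names makeSequence and resizeSequence are hypothetical. Unlike the REALLOC macro, this sketch stores the result of realloc in a separate pointer, so the old block is still reachable if the resize fails.

```c
#include <stdlib.h>

/* Allocate an array of n ints holding 0..n-1; returns NULL on failure. */
int *makeSequence(size_t n)
{
    int *list = malloc(n * sizeof *list);
    if (list == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        list[i] = (int)i;
    return list;
}

/* Resize list from oldN to newN ints, zero-filling any new elements.
   On failure the old block is left intact and NULL is returned. */
int *resizeSequence(int *list, size_t oldN, size_t newN)
{
    int *resized = realloc(list, newN * sizeof *resized);
    if (resized == NULL)
        return NULL;          /* the caller still owns list */
    for (size_t i = oldN; i < newN; i++)
        resized[i] = 0;       /* realloc leaves the added bytes unspecified */
    return resized;
}
```

A caller would finish with free(ptr) on whichever pointer currently owns the block, exactly once.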
Linear Array
A linear array is a list of a finite number n of homogeneous data elements such that
a. The elements of the array are referenced respectively by an index set consisting of n
consecutive numbers.
b. The elements of the array are stored respectively in successive memory locations.
The number n of elements is called the length or size of the array. The length or the number
of elements of the array can be obtained from the index set by the formula
Length = UB - LB + 1
When LB = 1,
Length = UB
Where,
UB is the largest index called the Upper Bound
LB is the smallest index, called the Lower Bound
Let LA be a linear array in the memory of the computer. The memory of the computer is
simply a sequence of addressed locations as shown below,
1000
1001
1002
1003
1004
Using the base address of LA, the computer calculates the address of any element of LA by
the formula
LOC (LA[K]) = Base(LA) + w(K - LB)
Where, w is the number of words per memory cell for the array LA.
While writing computer programs, if we find ourselves in a situation where we cannot determine
how large an array to use, then a good solution to this problem is to defer this decision to run
time and allocate the array when we have a good estimate of the required array size.
Example:
int i, n, *list;
printf("Enter the number of numbers to generate:");
scanf("%d", &n);
if (n < 1)
{
fprintf(stderr, "Improper value of n \n");
exit(EXIT_FAILURE);
}
MALLOC(list, n*sizeof(int));
The program fails only when n < 1 or when there is insufficient memory to hold the list of numbers that are
to be sorted.
Two Dimensional Arrays
C uses the array-of-arrays representation to represent a multidimensional array. A two
dimensional array is represented as a one-dimensional array in which each element is itself a
one-dimensional array.
Array-of-arrays representation
int **myArray;
myArray = make2dArray(5,10);
myArray[2][4]=6;
The second line allocates memory for a 5 by 10 two-dimensional array of integers and the
third line assigns the value 6 to the [2][4] element of this array.
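make2dArray is not a standard library function. A minimal sketch of how it might be written for the array-of-arrays representation is given below; zero-initializing the rows with calloc, returning NULL on failure, and the companion free2dArray are assumptions of this sketch.

```c
#include <stdlib.h>

/* Create a rows x cols array of ints using the array-of-arrays layout.
   Returns NULL if any allocation fails. */
int **make2dArray(int rows, int cols)
{
    int **x = malloc((size_t)rows * sizeof *x);     /* one pointer per row */
    if (x == NULL)
        return NULL;
    for (int i = 0; i < rows; i++) {
        x[i] = calloc((size_t)cols, sizeof **x);    /* each row starts as zeros */
        if (x[i] == NULL) {
            while (i > 0)
                free(x[--i]);                       /* undo the rows already made */
            free(x);
            return NULL;
        }
    }
    return x;
}

/* Release an array created by make2dArray. */
void free2dArray(int **x, int rows)
{
    for (int i = 0; i < rows; i++)
        free(x[i]);
    free(x);
}
```

Because x[i] is itself a pointer to a row, the expression myArray[2][4] first selects row 2 and then the element at index 4 within that row.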
ARRAY OPERATIONS
1. Traversing
Let A be a collection of data elements stored in the memory of the computer. Printing
the contents of each element of A, or counting the
number of elements of A with a given property, can be accomplished by traversing A.
Traversing means accessing and processing each element in the array exactly once.
Here LA is a linear array with lower bound LB and upper bound UB. This algorithm
traverses LA, applying an operation PROCESS to each element of LA using a while loop.
1. [Initialize Counter] set K:= LB
2. Repeat step 3 and 4 while K ≤ UB
3. [Visit element] Apply PROCESS to LA [K]
4. [Increase counter] Set K:= K + 1
[End of step 2 loop]
5. Exit
Here LA is a linear array with lower bound LB and upper bound UB. This algorithm
traverses LA, applying an operation PROCESS to each element of LA using a repeat-for loop.
1. Repeat for K = LB to UB
Apply PROCESS to LA [K]
[End of loop]
2. Exit.
Example:
Consider the array AUTO which records the number of automobiles sold each year from 1932
through 1984.
To find the number NUM of years during which more than 300 automobiles were sold,
involves traversing AUTO.
1. [Initialization step.] Set NUM := 0
2. Repeat for K = 1932 to 1984:
If AUTO [K] > 300, then: Set NUM: = NUM + 1.
[End of loop.]
3. Return.
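The counting traversal above translates directly into C. In this sketch the function name countGreater is an assumption, and the array is indexed from 0 rather than by year, so element k corresponds to the year LB + k.

```c
/* Count the elements of a[0..n-1] that are greater than limit,
   visiting each element exactly once. */
int countGreater(const int a[], int n, int limit)
{
    int num = 0;                   /* NUM in the algorithm */
    for (int k = 0; k < n; k++)    /* K runs over every index */
        if (a[k] > limit)
            num++;
    return num;
}
```

With sales figures stored in an array of 53 entries (1932 through 1984), countGreater(sales, 53, 300) would give NUM.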
2. Inserting
Let A be a collection of data elements stored in the memory of the computer.
Inserting refers to the operation of adding another element to the collection A.
Inserting an element at the “end” of the linear array can be easily done provided the
memory space allocated for the array is large enough to accommodate the additional
element.
When an element is inserted in the middle of the array, on average half of the elements
must be moved downwards to new locations to accommodate the new element and keep
the order of the other elements.
Algorithm:
INSERT (LA, N, K, ITEM)
Here LA is a linear array with N elements and K is a positive integer such that K ≤ N. This
algorithm inserts an element ITEM into the Kth position in LA.
3. Deleting
Deleting refers to the operation of removing one element from the collection A.
Deleting an element at the “end” of the linear array can be done without difficulty.
If an element in the middle of the array needs to be deleted, then each subsequent
element must be moved one location upward to fill up the array.
Algorithm
DELETE (LA, N, K, ITEM)
Here LA is a linear array with N elements and K is a positive integer such that K ≤ N. This
algorithm deletes the Kth element from LA and assigns it to ITEM.
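The steps of the deletion algorithm are likewise not listed. A possible C sketch, again with 0-based positions and a hypothetical function name deleteAt, is:

```c
/* Delete the element at 0-based position k of la, which holds *n elements,
   and return it. Requires 0 <= k < *n. */
int deleteAt(int la[], int *n, int k)
{
    int item = la[k];
    for (int j = k; j < *n - 1; j++)
        la[j] = la[j + 1];     /* move elements one position up */
    (*n)--;
    return item;
}
```

Here the loop runs forward from position k, so each element is copied into the slot just vacated by its predecessor.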
4. Sorting
Sorting refers to the operation of rearranging the elements of a list of n
elements so that they are in increasing or decreasing order.
Ex: Suppose A is a list of n numbers. Sorting A refers to the operation of rearranging the
elements of A so they are in increasing order, i.e., so that,
A[1] < A[2] < A[3] < ... < A[N]
Bubble Sort
Suppose the list of numbers A[1], A[2], ... , A[N] is in memory. The bubble sort algorithm
works as follows: on each pass it compares adjacent elements and interchanges them when they are
out of order, so that the largest remaining element moves to the end of the unsorted part.
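The passes of bubble sort can be sketched in C as follows. The function name bubbleSort and the returned comparison count are assumptions added for illustration; the array is indexed from 0.

```c
/* Sort a[0..n-1] into increasing order with bubble sort.
   Returns the number of comparisons made. */
long bubbleSort(int a[], int n)
{
    long comparisons = 0;
    for (int pass = 1; pass < n; pass++) {
        /* After this pass the largest of a[0..n-pass] sits at a[n-pass]. */
        for (int i = 0; i < n - pass; i++) {
            comparisons++;
            if (a[i] > a[i + 1]) {      /* adjacent pair out of order */
                int temp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = temp;
            }
        }
    }
    return comparisons;
}
```

For n = 5 the function makes 4 + 3 + 2 + 1 = 10 comparisons, whatever the initial order of the elements.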
Complexity of the Bubble Sort Algorithm
The time for a sorting algorithm is measured in terms of the number of comparisons f(n). There
are n – 1 comparisons during the first pass, which places the largest element in the last position;
there are n - 2 comparisons in the second step, which places the second largest element in the
next-to-last position; and so on. Thus
$f(n) = (n - 1) + (n - 2) + \cdots + 2 + 1 = \frac{n(n-1)}{2} = \frac{n^2}{2} + O(n) = O(n^2)$
5. Searching
Let DATA be a collection of data elements in memory, and suppose a specific ITEM of
information is given. Searching refers to the operation of finding the location LOC of
ITEM in DATA, or printing some message that ITEM does not appear there.
The search is said to be successful if ITEM does appear in DATA and unsuccessful
otherwise.
Linear Search
Suppose DATA is a linear array with n elements. Given no other information about DATA,
the simplest way to search for a given ITEM in DATA is to compare ITEM with each element of DATA
one by one. That is, first test whether DATA[1] = ITEM, and then test whether DATA[2] =
ITEM, and so on. This method, which traverses DATA sequentially to locate ITEM, is called
linear search or sequential search.
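A linear search over a C array can be sketched as below. The name linearSearch and the convention of returning -1 for an unsuccessful search are assumptions of the sketch; positions are 0-based.

```c
/* Search data[0..n-1] for item by comparing the elements one by one.
   Returns the 0-based index of the first match, or -1 if item is absent. */
int linearSearch(const int data[], int n, int item)
{
    for (int loc = 0; loc < n; loc++)
        if (data[loc] == item)
            return loc;
    return -1;
}
```

In the worst case, when ITEM is the last element or is not present, all n elements are compared.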
Average Case: The average number of comparisons required to find the location of ITEM is
approximately equal to half the number of elements in the array.
$f(n) = \frac{n+1}{2}$
For binary search on a sorted array, by contrast, the running time for the worst case is approximately equal to $\log_2 n$. One can also show that its
running time for the average case is approximately equal to the running time for the worst case.
Two-Dimensional Arrays
The element of A with first subscript J and second subscript K will be denoted by
A_{J,K} or A[J, K]
For a linear array LA with lower bound 1, the computer uses the following formula to find the
address of LA[K] in time independent of K.
LOC (LA[K]) = Base(LA) + w(K - 1)
For an M x N array A, the computer keeps track of Base(A), the address of the first element A[1, 1] of A, and
computes the address LOC(A[J, K]) of A[J, K] using the formula
(Column-major order) LOC(A[J, K]) = Base(A) + w[M(K - 1) + (J - 1)]
(Row-major order) LOC(A[J, K]) = Base(A) + w[N(J - 1) + (K - 1)]
The element of B with subscripts K1, K2, ..., Kn will be denoted by B[K1, K2, ..., Kn]
The programming language will store the array B either in row-major order or in column-
major order.
Let C be such an n-dimensional array. The index set for each dimension of C consists of the
consecutive integers from the lower bound to the upper bound of the dimension. The length Li
of dimension i of C is the number of elements in the index set, and Li can be calculated, as
Li = upper bound - lower bound + 1
For a given subscript Ki, the effective index Ei of Li is the number of indices preceding Ki in
the index set, and Ei can be calculated from
Ei = Ki - lower bound
Then the address LOC(C[K1, K2, ..., KN]) of an arbitrary element of C can be obtained from the
formula (column-major order)
$\text{Base}(C) + w[((\cdots((E_N L_{N-1} + E_{N-1})L_{N-2}) + \cdots + E_3)L_2 + E_2)L_1 + E_1]$
or from the formula (row-major order)
$\text{Base}(C) + w[(\cdots((E_1 L_2 + E_2)L_3 + E_3)L_4 + \cdots + E_{N-1})L_N + E_N]$
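Both formulas are nested multiplications and can be evaluated with a short loop. The sketch below takes the effective indices E1..EN and the lengths L1..LN as 0-based C arrays e[] and len[]; the function names are assumptions made for illustration.

```c
/* Row-major offset of the element with effective indices e[0..n-1]
   in an array whose dimension lengths are len[0..n-1]:
   (...((e[0]*len[1] + e[1])*len[2] + e[2])...)*len[n-1] + e[n-1] */
long rowMajorOffset(const int e[], const int len[], int n)
{
    long offset = 0;
    for (int i = 0; i < n; i++)
        offset = offset * len[i] + e[i];
    return offset;
}

/* Column-major offset: the same recurrence, run from the last dimension. */
long columnMajorOffset(const int e[], const int len[], int n)
{
    long offset = 0;
    for (int i = n - 1; i >= 0; i--)
        offset = offset * len[i] + e[i];
    return offset;
}

/* Address of the element: Base + w * offset. */
long elementLoc(long base, int w, long offset)
{
    return base + (long)w * offset;
}
```

As an illustrative case with made-up values, take lengths (2, 4, 3), effective indices (1, 2, 1), base address 1000 and w = 4. The row-major offset is (1*4 + 2)*3 + 1 = 19, giving address 1076; the column-major offset is (1*4 + 2)*2 + 1 = 13, giving address 1052.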
Polynomial
What is a polynomial
“A polynomial is a sum of terms, where each term has the form ax^e, where x is the variable, a is
the coefficient and e is the exponent.”
The largest (or leading) exponent of a polynomial is called its degree. Coefficients that are
zero are not displayed. The term with exponent equal to zero does not show the variable since
x raised to a power of zero is 1.
Polynomial Representation
One way to represent polynomials in C is to use typedef to create the type polynomial as
below:
Now if a is a variable of type polynomial and n < MAX_DEGREE, the polynomial
A(x) = Σ a_i x^i would be represented as:
a.degree = n
a.coef[i] = a_{n-i}, 0 ≤ i ≤ n
In this representation, the coefficients are stored in order of decreasing exponents, such that
a.coef[i] is the coefficient of x^{n-i} provided a term with exponent n-i exists;
otherwise, a.coef[i] = 0. Although this representation leads to very simple algorithms for most of the
operations, it wastes a lot of space.
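The type definition itself is not shown in these notes. A sketch consistent with the fields a.degree and a.coef[i] used above might be the following; the value chosen for MAX_DEGREE and the helper coefficientOf are assumptions made for illustration.

```c
#define MAX_DEGREE 101   /* assumed: maximum degree plus one */

typedef struct {
    int degree;
    float coef[MAX_DEGREE];  /* coef[i] is the coefficient of x^(degree - i) */
} polynomial;

/* Coefficient of x^e in a, or 0 if the polynomial has no such term. */
float coefficientOf(const polynomial *a, int e)
{
    if (e < 0 || e > a->degree)
        return 0.0f;
    return a->coef[a->degree - e];
}
```

For 3x^2 + 5, degree is 2 and coef holds {3, 0, 5}: the zero coefficient of x still occupies a slot, which is the waste of space mentioned above when most coefficients are zero.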
To preserve space, an alternate representation uses only one global array, terms, to store
all polynomials.
The above figure shows how these polynomials are stored in the array terms. The
index of the first term of A and B is given by startA and startB, while finishA and
finishB give the index of the last term of A and B.
The index of the next free location in the array is given by avail.
For above example, startA=0, finishA=1, startB=2, finishB=5, & avail=6.
Polynomial Addition
A C function is written that adds two polynomials, A and B, to obtain D = A + B.
To produce D(x), padd( ) adds A(x) and B(x) term by term. Starting at
position avail, attach( ) places the terms of D into the array terms.
If there is not enough space in terms to accommodate D, an error message is printed to
the standard error device and the program exits with an error condition.
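void padd(int startA, int finishA, int startB, int finishB, int *startD, int *finishD)
{ /* add A(x) and B(x) to obtain D(x) */
    float coefficient;
    *startD = avail;
    while (startA <= finishA && startB <= finishB)
        switch (COMPARE(terms[startA].expon, terms[startB].expon))
        {
        case -1: /* a expon < b expon */
            attach(terms[startB].coef, terms[startB].expon);
            startB++;
            break;
        case 0: /* equal exponents */
            coefficient = terms[startA].coef + terms[startB].coef;
            if (coefficient)
                attach(coefficient, terms[startA].expon);
            startA++;
            startB++;
            break;
        case 1: /* a expon > b expon */
            attach(terms[startA].coef, terms[startA].expon);
            startA++;
        }
    /* add in remaining terms of A(x) */
    for (; startA <= finishA; startA++)
        attach(terms[startA].coef, terms[startA].expon);
    /* add in remaining terms of B(x) */
    for (; startB <= finishB; startB++)
        attach(terms[startB].coef, terms[startB].expon);
    *finishD = avail - 1;
}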
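padd( ) depends on the global array terms, the index avail, the macro COMPARE and the function attach( ), none of which are shown above. A sketch of these supporting definitions, under the assumption of a fixed array size MAX_TERMS, could be:

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_TERMS 100    /* assumed size of the shared terms array */
#define COMPARE(x, y) (((x) < (y)) ? -1 : ((x) == (y)) ? 0 : 1)

typedef struct {
    float coef;
    int expon;
} term;

term terms[MAX_TERMS];   /* all polynomials are stored in this one array */
int avail = 0;           /* index of the next free location in terms */

/* Add a new term at position avail, or exit if terms is full. */
void attach(float coefficient, int exponent)
{
    if (avail >= MAX_TERMS) {
        fprintf(stderr, "Too many terms in the polynomial\n");
        exit(EXIT_FAILURE);
    }
    terms[avail].coef = coefficient;
    terms[avail++].expon = exponent;
}
```

COMPARE returns -1, 0 or 1, which are exactly the three cases handled by the switch statement in padd( ).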
Analysis of padd( ):
The number of non-zero terms in A and B is the most important factor in analyzing the time
complexity.
Let m and n be the number of non-zero terms in A and B. If m > 0 and n > 0, the while loop is
entered. Each iteration of the loop requires O(1) time. At each iteration, the value of startA or
startB or both is incremented. The iteration terminates when either startA or startB exceeds
finishA or finishB, respectively, so the number of iterations is bounded by m + n - 1. This worst
case occurs when
$A(x) = \sum_{i=0}^{n} x^{2i}$ and $B(x) = \sum_{i=0}^{n} x^{2i+1}$
The time for the remaining two for loops is bounded by O(n + m) because we cannot iterate
the first loop more than m times and the second more than n times. So, the asymptotic
computing time of this algorithm is O(n + m).
SPARSE MATRICES
A matrix contains m rows and n columns of elements as illustrated in below figures. In this
figure, the elements are numbers. The first matrix has five rows and three columns and the
second has six rows and six columns. We write m x n (read "m by n") to designate a matrix
with m rows and n columns. The total number of elements in such a matrix is mn. If m equals
n, the matrix is square.
Important Note:
A sparse matrix can be represented using a one-, two- or three-dimensional array.
When a sparse matrix is represented as a two-dimensional array as shown in
Figure B, much space is wasted.
Example: consider the space requirements necessary to store a 1000 x 1000 matrix that has
only 2000 non-zero elements. The corresponding two-dimensional array requires space for
1,000,000 elements. The better choice is by using a representation in which only the nonzero
elements are stored.
The figure below shows the representation of the matrix in the array a. Here a[0].row contains
the number of rows, a[0].col contains the number of columns and a[0].value contains
the total number of nonzero entries.
Positions 1 through 8 store the triples representing the nonzero entries. The row index is
in the field row, the column index is in the field col, and the value is in the field value.
The triples are ordered by row and within rows by columns.
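The triple representation can be built from an ordinary two-dimensional array by scanning it row by row. In the sketch below the type name term matches the one used by transpose, while the function name toTriples and the fixed column count MAX_COLS are assumptions made for illustration.

```c
#define MAX_COLS 6       /* assumed column count of the dense input */

typedef struct {
    int row;
    int col;
    int value;
} term;

/* Fill a[] with the triple form of the rows x cols matrix m and return the
   number of nonzero entries. a must have room for that many entries plus one. */
int toTriples(int rows, int cols, const int m[][MAX_COLS], term a[])
{
    int count = 0;
    for (int i = 0; i < rows; i++)          /* row by row ... */
        for (int j = 0; j < cols; j++)      /* ... and by column within a row */
            if (m[i][j] != 0) {
                count++;
                a[count].row = i;
                a[count].col = j;
                a[count].value = m[i][j];
            }
    a[0].row = rows;                        /* entry 0 describes the matrix */
    a[0].col = cols;
    a[0].value = count;
    return count;
}
```

Scanning in this order is what produces triples ordered by row and, within each row, by column.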
Fig (a): Sparse matrix stored as triples
        row  col  value
a[0]      6    6      8
 [1]      0    0     15
 [2]      0    3     22
 [3]      0    5    -15
 [4]      1    1     11
 [5]      1    2      3
 [6]      2    3     -6
 [7]      4    0     91
 [8]      5    2     28
Fig (b): Transpose matrix stored as triples
        row  col  value
b[0]      6    6      8
 [1]      0    0     15
 [2]      0    4     91
 [3]      1    1     11
 [4]      2    1      3
 [5]      2    5     28
 [6]      3    0     22
 [7]      3    2     -6
 [8]      5    0    -15
If we process the original matrix by the row indices it is difficult to know exactly where to
place element <j, i, value> in the transpose matrix until we processed all the elements that
precede it.
This can be avoided by using the column indices to determine the placement of elements in
the transpose matrix. This suggests the following algorithm:
The columns within each row of the transpose matrix will be arranged in ascending order.
void transpose(term a[], term b[])
{ /* b is set to the transpose of a */
    int n, i, j, currentb;
    n = a[0].value;        /* total number of elements */
    b[0].row = a[0].col;   /* rows in b = columns in a */
    b[0].col = a[0].row;   /* columns in b = rows in a */
    b[0].value = n;
    if (n > 0)
    {
        currentb = 1;
        for (i = 0; i < a[0].col; i++)   /* transpose by the columns of a */
            for (j = 1; j <= n; j++)     /* find the elements in column i */
                if (a[j].col == i)
                {
                    b[currentb].row = a[j].col;
                    b[currentb].col = a[j].row;
                    b[currentb].value = a[j].value;
                    currentb++;
                }
    }
}
Transpose of a sparse matrix
Each programming language contains a character set that is used to communicate with the
computer. The character set includes the following:
Alphabet: ABCDEFGHIJKLMNOPQRSTUVWXYZ
Digits: 0123456789
Special characters: + - / * ( ) , . $ = ‘ _ (Blank space)
Concatenation: Let S1 and S2 be strings. The string consisting of the characters of S1
followed by the characters of S2 is called the concatenation of S1 and S2.
Ex: ‘THE’ // ‘END’ = ‘THEEND’
‘THE’ // ‘ ’ // ‘END’ = ‘THE END’
Substring: A string Y is called a substring of a string S if there exist strings X and Z such that
S = X // Y // Z
If X is an empty string, then Y is called an initial substring of S, and if Z is an empty string then
Y is called a terminal substring of S.
Ex: ‘BE OR NOT’ is a substring of ‘TO BE OR NOT TO BE’
‘THE’ is an initial substring of ‘THE END’
STRINGS IN C
In C, the strings are represented as character arrays terminated with the null character \0.
Declaration 1:
#define MAX_SIZE 100 /* maximum size of string */
char s[MAX_SIZE] = {"dog"};
char t[MAX_SIZE] = {"house"};
s[0] s[1] s[2] s[3]     t[0] t[1] t[2] t[3] t[4] t[5]
 d    o    g   \0        h    o    u    s    e   \0
The above figure shows how these strings would be represented internally in memory.
Declaration 2:
char s[ ] = {"dog"};
char t[ ] = {"house"};
Using these declarations, the C compiler will allocate just enough space to hold each word
including the null character.
STORING STRINGS
Example: Suppose the input consists of the program. Using a record oriented, fixed length
storage medium, the input data will appear in memory as pictured below.
That is, one can use a linear array POINT which gives the address of each successive record, so
that the records need not be stored in consecutive locations in memory. Inserting a new record
will require only an updating of the array POINT.
The other method is to store strings one after another, either by using some separation marker, such as
the two dollar signs ($$), or by using a pointer giving the location of the string.
These ways of storing strings will save space and are sometimes used in secondary memory
when records are relatively permanent and require little changes.
These types of methods of storage are usually inefficient when the strings and their lengths
are frequently being changed.
Ex: TO BE OR NOT TO BE
Constants
Many programming languages denote string constants by placing the string in either single
or double quotation marks.
Ex: ‘THE END’
“THE BEGINNING”
These are string constants of length 7 and 13 characters, respectively.
Variables
Each programming language has its own rules for forming character variables. These
variables fall into one of three categories:
1. Static: A static character variable is one whose length is defined before the program is
executed and cannot change throughout the program.
2. Semi-static: The length of the variable may vary during the execution of the program
as long as the length does not exceed a maximum value determined by the program
before the program is executed.
3. Dynamic: The length of the variable can change during the execution of the program.
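C has no built-in notion of these three categories, but each can be approximated. The following sketch is an assumption-laden illustration: MAX_LEN, static_title, semi_static_assign and dynamic_assign are hypothetical names chosen for this example.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_LEN 20 /* assumed maximum length for the semi-static case */

/* Static: the length (7 characters) is fixed before the program runs. */
const char static_title[8] = "THE END";

/* Semi-static: the length may vary, but never beyond MAX_LEN.
   Returns 0 on success, -1 if src does not fit. */
int semi_static_assign(char dest[MAX_LEN + 1], const char *src)
{
    if (strlen(src) > MAX_LEN)
        return -1;
    strcpy(dest, src);
    return 0;
}

/* Dynamic: the storage is resized to fit whatever is assigned.
   Returns the (possibly moved) buffer, or NULL if allocation fails,
   in which case the old buffer is left untouched. */
char *dynamic_assign(char *dest, const char *src)
{
    char *p = realloc(dest, strlen(src) + 1);
    if (p != NULL)
        strcpy(p, src);
    return p;
}
```

A caller would start the dynamic case with dynamic_assign(NULL, "a") and release the buffer with free when it is no longer needed.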
STRING OPERATION
Substring
Accessing a substring from a given string requires three pieces of information:
(1) The name of the string or the string itself
(2) The position of the first character of the substring in the given string
(3) The length of the substring or the position of the last character of the substring.
The syntax SUBSTRING(S, K, L) denotes the substring of a string S beginning at position K and having length L.
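Extracting a substring from a start position K and a length L can be sketched in C as follows. Positions are counted from 1, as in the text; the function name substring and the choice to return a newly allocated copy are assumptions made for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of the len characters of s that start
   at 1-based position k. Returns NULL if the requested range does not
   lie inside s or if allocation fails. The caller frees the result. */
char *substring(const char *s, size_t k, size_t len)
{
    size_t n = strlen(s);
    if (k < 1 || k > n + 1 || len > n - (k - 1))
        return NULL;
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, s + (k - 1), len); /* position k is index k - 1 in C */
    out[len] = '\0';
    return out;
}
```

For instance, substring("TO BE OR NOT TO BE", 4, 7) yields the string BE OR N.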
Indexing
Indexing, also called pattern matching, refers to finding the position where a string pattern P
first appears in a given string text T. This operation is called INDEX and is written
INDEX(text, pattern).
If the pattern P does not appear in the text T, then INDEX is assigned the value 0.
The arguments “text” and “pattern” can be either string constants or string variables.
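In C, the INDEX operation can be approximated with the standard library function strstr, which returns a pointer to the first occurrence of one string inside another. The wrapper name index_of below is an assumption; it converts the pointer into the 1-based position used in the text.

```c
#include <string.h>

/* Returns the 1-based position of the first occurrence of pattern in
   text, or 0 if pattern does not appear in text. */
size_t index_of(const char *text, const char *pattern)
{
    const char *hit = strstr(text, pattern);
    return hit ? (size_t)(hit - text) + 1 : 0;
}
```

For T = "HIS FATHER IS THE PROFESSOR", index_of(T, "THE") is 7 because the first match lies inside FATHER, and index_of(T, "THEN") is 0.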
Concatenation
Let S1 and S2 be strings. The concatenation of S1 and S2, denoted by S1 // S2, is the string
consisting of the characters of S1 followed by the characters of S2.
Ex:
(a) Suppose S1 = ‘MARK’ and S2 = ‘TWAIN’; then
S1 // S2 = ‘MARKTWAIN’
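Concatenation can be sketched in C with strcpy and strcat from string.h. The helper name concat is an assumption; it allocates a result large enough for both strings, so that neither input is modified.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated string holding s1 followed by s2,
   or NULL if allocation fails. The caller frees the result. */
char *concat(const char *s1, const char *s2)
{
    size_t n1 = strlen(s1), n2 = strlen(s2);
    char *out = malloc(n1 + n2 + 1); /* +1 for the null character */
    if (out == NULL)
        return NULL;
    strcpy(out, s1); /* copy S1 */
    strcat(out, s2); /* append S2 after it */
    return out;
}
```

Calling concat("MARK", "TWAIN") produces MARKTWAIN; a blank must be concatenated explicitly to obtain MARK TWAIN.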
String length is determined in C using the strlen( ) function, as shown below:
X = strlen("sunrise");
The strlen function returns the integer value 7, which is assigned to the variable X.
Like strcat, strlen is part of string.h, so this header file must be included at the
time of pre-processing.
Pattern matching is the problem of deciding whether or not a given string pattern P appears in
a string text T. It is assumed that the length of P does not exceed the length of T.
Observation of algorithms
P is an R-character string and T is an S-character string.
The algorithm contains two loops, one inside the other. The outer loop runs through each
successive R-character substring W_K = T[K] T[K+1] ... T[K+R-1] of T.
The inner loop compares P with W_K, character by character. If any character does not
match, then control transfers to Step 5, which increases K and then leads to the next
substring of T.
If all the R characters of P do match those of some W_K, then P appears in T and K is the
INDEX of P in T.
If the outer loop completes all of its cycles, then P does not appear in T and so INDEX
= 0.
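The two-loop structure described above can be sketched in C as follows. This is a minimal illustration, not the numbered algorithm the observations refer to; the function name brute_force_index is an assumption, and the result is the 1-based INDEX used in the text.

```c
#include <string.h>

/* Returns the 1-based INDEX of p in t, or 0 if p does not appear. */
size_t brute_force_index(const char *t, const char *p)
{
    size_t s = strlen(t), r = strlen(p);
    if (r > s)
        return 0; /* P cannot be longer than T */
    /* Outer loop: k is the 0-based start of each R-character substring. */
    for (size_t k = 0; k + r <= s; k++) {
        size_t l = 0;
        /* Inner loop: compare P with the substring character by character. */
        while (l < r && p[l] == t[k + l])
            l++;
        if (l == r)
            return k + 1; /* all R characters matched */
    }
    return 0; /* outer loop finished without a match */
}
```

For example, brute_force_index("TO BE OR NOT TO BE", "NOT") returns 10.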
Complexity
The complexity of this pattern matching algorithm is O(n^2), where n is the length of the text T.
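The quadratic bound can be justified by counting character comparisons. A short derivation, assuming n is the length of T, r is the length of P, and C denotes the worst-case number of comparisons (symbols chosen for this sketch):

```latex
\begin{aligned}
C &\le r\,(n - r + 1)
  && \text{at most $r$ comparisons for each of the $n - r + 1$ substrings of $T$} \\
r\,(n - r + 1) &\le \frac{(n+1)^2}{4}
  && \text{the product is largest when } r = \tfrac{n+1}{2} \\
C &= O(n^2)
\end{aligned}
```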
A second pattern matching algorithm is driven by a table. The table used for the pattern P = aaba
is obtained as follows.
Let Qi denote the initial substring of P of length i; hence Q0 = Λ, Q1 = a, Q2 = a^2, Q3
= a^2b, Q4 = a^2ba = P. (Here Q0 = Λ is the empty string, and a^2 stands for aa.)
The rows of the table are labeled by these initial substrings of P, excluding P itself.
The columns of the table are labeled a, b and x, where x represents any character that
doesn't appear in the pattern P.
Let f be the function determined by the table; i.e., let f(Qi, t) denote the entry in the table
in row Qi and column t (where t is any character). This entry f(Qi, t) is defined to be the
largest Q that appears as a terminal substring in the string (Qi t), the concatenation of Qi
and t.
For example,
a^2 is the largest Q that is a terminal substring of Q2a = a^3, so f(Q2, a) = Q2
Λ is the largest Q that is a terminal substring of Q1b = ab, so f(Q1, b) = Q0
a is the largest Q that is a terminal substring of Q0a = a, so f(Q0, a) = Q1
Λ is the largest Q that is a terminal substring of Q3x = a^2bx, so f(Q3, x) = Q0
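The definition of f can be turned into code directly. The sketch below computes f(Qi, t) on demand instead of storing the table, and then scans a text with it. The scanning loop is an assumption about how such a table is normally used; the names next_state and table_index are hypothetical, and a state is represented by the length i of Qi.

```c
#include <string.h>

/* f(Q_i, t): length of the largest initial substring of p that is a
   terminal substring of Q_i followed by t, where Q_i = p[0..i-1] and
   r is the length of p. A result of 0 stands for the empty string Q_0. */
static size_t next_state(const char *p, size_t r, size_t i, char t)
{
    for (size_t j = (i + 1 < r ? i + 1 : r); j > 0; j--) {
        /* Q_j must end in t, and its first j-1 characters must equal
           the last j-1 characters of Q_i. */
        if (p[j - 1] == t && strncmp(p, p + (i - (j - 1)), j - 1) == 0)
            return j;
    }
    return 0;
}

/* Returns the 1-based INDEX of p in t, or 0 if p does not appear.
   This sketch assumes a non-empty pattern. */
size_t table_index(const char *t, const char *p)
{
    size_t r = strlen(p), state = 0;
    if (r == 0)
        return 0;
    for (size_t k = 0; t[k] != '\0'; k++) {
        state = next_state(p, r, state, t[k]);
        if (state == r)           /* reached Q_r = P */
            return k + 2 - r;     /* 1-based start of the match */
    }
    return 0;
}
```

For P = aaba, next_state reproduces the worked entries above: f(Q2, a) = Q2, f(Q1, b) = Q0, f(Q0, a) = Q1 and f(Q3, x) = Q0. Each character of the text is read only once, in contrast to the two-loop algorithm.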