Data Structure


 We know how to write, debug, and run simple programs in the C language.
 The aim has been to design good programs, where a good
program is defined as a program that
 runs correctly
 is easy to read and understand
 is easy to debug
 is easy to modify

 A program should give correct results, but along with that it should also run efficiently.
 A program is said to be efficient when it executes in
minimum time and with minimum memory space.
 In order to write efficient programs we need to apply certain
data management concepts.
 Data structure is a crucial part of data management.
 A data structure is a way of organizing all data items that
considers not only the elements stored but also their
relationship to each other.
 Data Structure is a mathematical/logical model of a
particular organization of data items.

 More formally, let
 D be a set of domains,
 d be a designated/specified domain,
 F be a set of functions, and
 A be a set of axioms/rules.
Then the triple (D, F, A) is called a data structure of the designated domain d.
 Data structures are used in almost every program or software
system.
 Example of data structures are arrays, linked lists, queues,
stacks, binary trees, hash tables etc.
 The primary goal of a program is not to perform calculations or
operations but to store and retrieve information as fast as
possible.
 The application of an appropriate data structure provides the
most efficient solution.
 A solution is said to be efficient if it solves the problem within the
required resource constraints like the total space available to
store the data and the time allowed to perform each subtask.
 And the best solution is the one that requires fewer resources
than known alternatives.
 Moreover, the cost of a solution is the amount of resources (time, space) it consumes.
 Today, computer programmers do not write programs just to solve a problem; they aim to write efficient programs.
 For this, they first analyze the problem to determine the
performance goals that must be achieved and then think of
the most appropriate data structure for that job.
 However, program designers with a poor understanding of
data structure concepts ignore this analysis step and apply
a data structure with which they can work comfortably.
 The applied data structure may not be appropriate for the
problem at hand and therefore may result in poor
performance (like slow speed of operations).
 If a program meets its performance goals with a
data structure that is simple to use, then it makes
no sense to apply another complex data structure
just to exhibit the programmer’s skill.

 When selecting a data structure to solve a problem, the following steps must be performed.
 Analysis of the problem to determine the basic operations that must be supported. For example, basic operations may include inserting, deleting, or searching a data item in the data structure.
 Quantify the resource constraints for each operation.
 Select the data structure that best meets these
requirements.
 In this approach, the first concern is the data and the
operations that are to be performed on them.
 The second concern is the representation of the data, and
the final concern is the implementation of that
representation.
 There are different types of data structures that the C
language supports.
 While one type of data structure may permit new data items to be added only at the beginning, another may allow them to be added at any position.
 While one data structure may allow data items to be accessed only sequentially, another may allow random access of data.
 So, selection of an appropriate data structure for the
problem is a crucial decision and may have a major impact
on the performance of the program.
 Algorithm:- An algorithm is any well-defined computational procedure that takes some value or set of values as input and produces some value or set of values as output, to solve a well-defined computational problem.

 Basically, Algorithm + Data Structure = Program

 So, to develop a program for an algorithm, we should select an appropriate data structure for that algorithm. Therefore, an algorithm and its associated data structure together form a program.
 Data structures are building blocks of a program.
 A program built using improper data structures may not
work as expected.
 So, as a programmer, it is mandatory to choose the most appropriate data structure for a program.
 The term data means a value or set of values.
 It specifies either the value of a variable or a constant (e.g.,
marks of students, name of an employee, address of a
customer, value of pi, etc.).
 While a data item that does not have subordinate data items
is categorized as an elementary item, the one that is
composed of one or more subordinate data items is called a
group item.
 For example, a student’s name may be divided into three
sub-items—first name, middle name, and last name—but
his roll number would normally be treated as a single item.
 A record is a collection of data items. For example, the
name, address, course, and marks obtained are individual
data items.
 But all these data items can be grouped together to form a
record.
 A file is a collection of related records.
 For example, if there are 60 students in a class, then there
are 60 records of the students.
 All these related records are stored in a file.
 Similarly, we can have a file of all the employees working in
an organization, a file of all the customers of a company, a
file of all the suppliers, so on and so forth.
 Moreover, each record in a file may consist of multiple data
items but the value of a certain data item uniquely
identifies the record in the file. Such a data item K is called
a primary key, and the values K1, K2, ... in such a field are
called keys or key values.
 For example, in a student’s record that contains roll
number, name, address, course, and marks obtained, the
field roll number is a primary key.
 Rest of the fields (name, address, course, and marks)
cannot serve as primary keys, since two or more students
may have the same name, or may have the same address (as
they might be staying at the same place), or may be
enrolled in the same course, or have obtained same marks.
 This organization and hierarchy of data is taken further to
form more complex types of data structures.
 Primitive data structures are the fundamental data types which are
supported by a programming language.
 Some basic data types are integer, real, character, and boolean.
 Non-primitive data structures are those data structures which are
created using primitive data structures.
 Examples of such data structures include linked lists, stacks, trees,
and graphs.
 If the elements of a data structure are stored in a linear or
sequential order, then it is a linear data structure.
 Examples include arrays, linked lists, stacks, and queues.
 If the elements of a data structure are not stored in a
sequential order, then it is a non-linear data structure.
 The relationship of adjacency is not maintained between
elements of a non-linear data structure.
 Examples include trees and graphs. C supports a variety of
data structures.
Arrays
 An array is a collection of similar data elements.
 These data elements have the same data type.
 The elements of the array are stored in consecutive memory locations
and are referenced by an index (also known as the subscript).
 Arrays are generally used when we want to store a large amount of data of the same type.
 But they have the following limitations:
 Arrays are of fixed size.
 Data elements are stored in contiguous memory locations, which may not always be available.
 Insertion and deletion of elements can be problematic because of
shifting of elements from their positions.
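 A brief C sketch of the array behaviour described above (the array name marks and its size are illustrative, not from the notes):

#include <stdio.h>

int main(void)
{
    /* Five elements of the same type, stored in consecutive memory
       locations and referenced by an index starting at 0. */
    int marks[5] = {40, 55, 63, 17, 22};

    marks[2] = 70;                      /* direct access via the subscript */

    for (int i = 0; i < 5; i++)         /* sequential access */
        printf("marks[%d] = %d\n", i, marks[i]);

    return 0;
}

 Note that the size 5 is fixed at compile time, which is exactly the first limitation listed above.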
Linked Lists
 A linked list is a very flexible, dynamic data structure in
which elements (called nodes) form a sequential list.
 In contrast to static arrays, a programmer need not worry
about how many elements will be stored in the linked list.
 This feature enables the programmers to write robust
programs which require less maintenance.
 In a linked list, each node is allocated space as it is added
to the list.
 Every node in the list points to the next node in the list.
 Therefore, in a linked list, every node contains the following two types of data:
 the value of the node or any other data that corresponds to that node
 a pointer or link to the next node in the list

 The last node in the list contains a NULL pointer to indicate that it is the end or tail of the list.
 Since the memory for a node is dynamically allocated when
it is added to the list, the total number of nodes that may be
added to a list is limited only by the amount of memory
available.
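 A minimal C sketch of the linked list idea described above, assuming each node stores an int and a pointer to the next node (the type and variable names are illustrative):

#include <stdio.h>
#include <stdlib.h>

/* One node: the data plus a link to the next node in the list. */
struct node {
    int data;
    struct node *next;
};

int main(void)
{
    /* Each node is allocated dynamically as it is added to the list. */
    struct node *third  = malloc(sizeof(struct node));
    struct node *second = malloc(sizeof(struct node));
    struct node *head   = malloc(sizeof(struct node));

    head->data   = 1;  head->next   = second;
    second->data = 2;  second->next = third;
    third->data  = 3;  third->next  = NULL;   /* NULL marks the tail */

    for (struct node *p = head; p != NULL; p = p->next)   /* traverse the list */
        printf("%d ", p->data);

    free(third); free(second); free(head);
    return 0;
}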
Stacks
 A stack is a linear data structure in which insertion and
deletion of elements are done at only one end, which is known
as the top of the stack.
 Stack is called a last-in, first-out (LIFO) structure because the
last element which is added to the stack is the first element
which is deleted from the stack.
 In the computer’s memory, stacks can be implemented using
arrays or linked lists.
 In an array implementation, a variable TOP keeps track of the topmost element, and another variable MAX stores the maximum number of elements that the stack can hold.
 If TOP = -1 (or NULL in a linked implementation), the stack is empty, and if TOP = MAX - 1, the stack is full.
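 A minimal array-based stack sketch in C matching the description above (MAX, top, push, and pop are the conventional names; the error messages are illustrative):

#include <stdio.h>

#define MAX 10                    /* maximum number of elements */

int stack[MAX];
int top = -1;                     /* -1 means the stack is empty */

void push(int val)
{
    if (top == MAX - 1)           /* stack is full */
        printf("Overflow\n");
    else
        stack[++top] = val;       /* insert at the top */
}

int pop(void)
{
    if (top == -1) {              /* stack is empty */
        printf("Underflow\n");
        return -1;
    }
    return stack[top--];          /* delete from the top */
}

int main(void)
{
    push(10);
    push(20);
    printf("%d\n", pop());        /* prints 20: last in, first out */
    printf("%d\n", pop());        /* prints 10 */
    return 0;
}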
Queues
 A queue is a first-in, first-out (FIFO) data structure in which
the element that is inserted first is the first one to be taken
out.
 The elements in a queue are added at one end called the
rear and removed from the other end called the front.
 Like stacks, queues can be implemented by using either
arrays or linked lists.
 Every queue has front and rear variables that point to the
position from where deletions and insertions can be done,
respectively.
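 A minimal array-based queue sketch in C for the description above (a simple linear queue; the names and fixed capacity are illustrative):

#include <stdio.h>

#define MAX 10

int queue[MAX];
int front = -1, rear = -1;        /* -1 means the queue is empty */

void insert(int val)              /* insertion is done at the rear */
{
    if (rear == MAX - 1) {
        printf("Overflow\n");
        return;
    }
    if (front == -1)
        front = 0;
    queue[++rear] = val;
}

int del(void)                     /* deletion is done from the front */
{
    if (front == -1 || front > rear) {
        printf("Underflow\n");
        return -1;
    }
    return queue[front++];
}

int main(void)
{
    insert(1);
    insert(2);
    printf("%d\n", del());        /* prints 1: first in, first out */
    printf("%d\n", del());        /* prints 2 */
    return 0;
}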
Trees
 A tree is a non-linear data structure which consists of a
collection of nodes arranged in a hierarchical order.
 One of the nodes is designated as the root node, and the
remaining nodes can be partitioned into disjoint sets such that
each set is a sub-tree of the root.
 The simplest form of a tree is a binary tree.
 A binary tree consists of a root node and left and right sub-trees,
where both sub-trees are also binary trees.
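 A minimal C sketch of a binary tree node and a three-node tree, following the definition above (the type and function names are illustrative):

#include <stdio.h>
#include <stdlib.h>

/* A binary tree node: data plus pointers to the left and right sub-trees. */
struct tnode {
    int data;
    struct tnode *left;
    struct tnode *right;
};

struct tnode *make_node(int data)
{
    struct tnode *n = malloc(sizeof(struct tnode));
    n->data = data;
    n->left = n->right = NULL;
    return n;
}

int main(void)
{
    struct tnode *root = make_node(1);   /* designated root node */
    root->left  = make_node(2);          /* left sub-tree        */
    root->right = make_node(3);          /* right sub-tree       */

    printf("%d %d %d\n", root->data, root->left->data, root->right->data);

    free(root->left); free(root->right); free(root);
    return 0;
}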
Graphs
 A graph is a non-linear data structure which is a collection
of vertices (also called nodes) and edges that connect these
vertices.
 A graph is often viewed as a generalization of the tree
structure, where instead of a purely parent-to-child
relationship between tree nodes, any kind of complex
relationships between the nodes can exist.
 Note that unlike trees, graphs do not have any root node.
 Rather, every node in the graph can be connected with any other node in the graph.
 When two nodes are connected via an edge, the two nodes
are known as neighbors.
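 One common way to store such a graph in C is an adjacency matrix; the sketch below (illustrative, not from the notes) records an undirected graph with 4 vertices and prints each vertex's neighbours:

#include <stdio.h>

#define V 4                               /* number of vertices */

int main(void)
{
    int adj[V][V] = {0};                  /* adj[i][j] = 1 if i and j are neighbours */

    /* Edges of an undirected graph: 0-1, 0-2, 1-3 */
    adj[0][1] = adj[1][0] = 1;
    adj[0][2] = adj[2][0] = 1;
    adj[1][3] = adj[3][1] = 1;

    for (int i = 0; i < V; i++) {
        printf("vertex %d:", i);
        for (int j = 0; j < V; j++)
            if (adj[i][j])
                printf(" %d", j);
        printf("\n");
    }
    return 0;
}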
 Traversing: It means to access each data item exactly once so that it can be processed.
 Searching: It is used to find the location of one or more data items that satisfy the given constraint. Such a data item may or may not be present in the given collection of data items.
 Inserting: It is used to add new data items to the given list of data items.
 Deleting: It means to remove (delete) a particular data item from the given collection of data items.
 Sorting: Data items can be arranged in some order, such as ascending or descending order, depending on the type of application.
 Merging: Two sorted lists of data items can be combined to form a single sorted list of data items.
Pseudocode
 Pseudocode is an artificial and informal language that helps programmers develop algorithms.
 It is a text-based algorithmic design tool.
 Example:-
If the student's marks are greater than or equal to 40
    print Passed
else
    print Failed
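 For comparison, a C version of this pseudocode might look like the following (the variable name marks and the pass mark of 40 come from the example above):

#include <stdio.h>

int main(void)
{
    int marks;

    scanf("%d", &marks);

    if (marks >= 40)
        printf("Passed\n");
    else
        printf("Failed\n");

    return 0;
}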

Difference between Program and Algorithm:-
 A program is a detailed set of instructions for a computer to carry out, while an algorithm is a detailed sequence of steps for carrying out some process.
 An algorithm is a well-defined sequence of steps that provides a solution for a given problem, while pseudocode is one of the ways to represent an algorithm.
 An algorithm can be written in natural language, but pseudocode is written in a format that is closely related to high-level programming languages.
 Pseudocode does not use the syntax of any specific programming language.
 Pseudocode is an intermediary between an algorithm and the implemented program. It can be based on an algorithm.
 An abstract data type (ADT) is the way we look at a data
structure, focusing on what it does and ignoring how it does
its job.
 For example, stacks and queues are perfect examples of an
ADT.
 We can implement both these ADTs using an array or a
linked list.
 An ADT includes an interface (API), methods, and data.
 An ADT is a set of objects together with a set of operations.
 It is "abstract" in that the implementation of the operations is not specified in the ADT definition.
Example:- ‘List’ (operations: insert, delete, search, sort)
          ‘Stack’ (operations: push, pop, display)
 A data type can be considered “abstract” when it is defined
in terms of operations on it and its implementation is
hidden. You don’t know how an ADT computes, but you know what it computes.
 The end-user is not concerned about the details of how the
methods carry out their tasks.
 They are only aware of the methods that are available to them
and are only concerned about calling those methods and getting
the results.
 For example, when we use a stack or a queue, the user is
concerned only with the type of data and the operations that can
be performed on it.
 They should not be concerned with how the methods work or
what structures are being used to store the data.
 They should just know that to work with stacks, they have push()
and pop() functions available to them.
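 As an illustration of this idea, a stack ADT in C could expose only an interface like the sketch below; callers see push and pop but not the array or linked list used behind them (the header name and function signatures are illustrative, not a standard API):

/* stack.h -- interface of a stack ADT: what it does, not how it does it. */
#ifndef STACK_H
#define STACK_H

typedef struct stack Stack;        /* opaque type: the representation is hidden */

Stack *stack_create(void);         /* create an empty stack             */
void   stack_push(Stack *s, int value);
int    stack_pop(Stack *s);        /* remove and return the top element */
int    stack_is_empty(const Stack *s);
void   stack_destroy(Stack *s);

#endif

 Whether the implementation file uses an array or a linked list can be changed later without touching the code that calls these functions.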
Advantage of using ADTs
 In the real world, programs evolve as a result of new requirements or constraints, so a modification to a program commonly requires a change in one or more of its data structures.
 For example, if you want to add a new field to a student’s record to keep track of more information about each student, then it may be better to replace an array with a linked structure to improve the program’s efficiency.
 In such a scenario, rewriting every procedure that uses the
changed structure is not desirable.
 Therefore, a better alternative is to separate the use of a
data structure from the details of its implementation.
 This is the principle underlying the use of abstract data
types.
 Analyzing an algorithm means determining the amount of
resources (such as time and memory) needed to execute it.

 Basically, complexity is a criterion by which algorithms are judged and their performance is measured.

 Space Complexity:- The space complexity of an algorithm is the amount of memory space it needs to run to completion.
S(x) = C(x) + I(x)
where C(x) is the constant (fixed) space required for the code, simple variables, etc., and I(x) is the instance-dependent space required for variables whose size depends on the particular problem instance.
 Consider the following code:
#include <stdio.h>
int main()
{
    int a = 5, b = 5, c;
    c = a + b;
    printf("%d", c);
    return 0;
}

Three integer variables are used. Let us assume the size of an int is 4 bytes. So, the total space occupied by the above program is 4 * 3 = 12 bytes. Hence, the space complexity of the program is O(1), or constant, as there is no dependency on the input size n.
 Consider the following code:
#include <stdio.h>
int main()
{
    int n, i, sum = 0;
    scanf("%d", &n);
    int arr[n];
    for (i = 0; i < n; i++)
    {
        scanf("%d", &arr[i]);
        sum = sum + arr[i];
    }
    printf("%d", sum);
    return 0;
}
The array consists of n integer elements. So, the space occupied by the array is 4 * n bytes. We also have the integer variables n, i, and sum. Assuming 4 bytes for each variable, the total space occupied by the program is 4n + 12 bytes. Since the highest-order term in 4n + 12 is n, the space complexity is O(n), or linear.
 The time complexity of an algorithm is the amount of time it requires to run to completion.
 Generally, T(x) = C(x) + I(x), where C(x) is the compile time and I(x) is the run time.
 Example:-
int count = 0;
for (int i = 0; i < N; i++)
    for (int j = 0; j < i; j++)
        count++;

Let's see how many times count++ will run.
 When i = 0, it will run 0 times.
 When i = 1, it will run 1 time.
 When i = 2, it will run 2 times, and so on.
 Total number of times count++ will run is 0 + 1 + 2 + ... + (N-1) = N(N-1)/2.
 So the time complexity will be O(N2).
 Worst-case running time:- This denotes the behavior of an
algorithm with respect to the worst possible case of the
input instance.
 The worst-case running time of an algorithm is an upper
bound on the running time for any input.
 Therefore, having the knowledge of worst-case running
time gives us an assurance that the algorithm will never go
beyond this time limit.
 Average-case running time:- The average-case running
time of an algorithm is an estimate of the running time for
an ‘average’ input.
 It specifies the expected behavior of the algorithm when the
input is randomly drawn from a given distribution.
 Average-case running time assumes that all inputs of a
given size are equally likely.
 Best-case running time:- The term ‘best-case performance’
is used to analyze an algorithm under optimal conditions.
For example, the best case for a simple linear search on an
array occurs when the desired element is the first in the
list.
 However, while developing and choosing an algorithm to
solve a problem, we hardly base our decision on the best-
case performance.
 It is always recommended to improve the average
performance and the worst-case performance of an
algorithm.
 Asymptotic notations are terminology that enable us to make meaningful statements about the time and space complexities of an algorithm.
 Asymptotic notation is defined for functions over the natural numbers.
 Example:-
T(n) = Θ(n2)
It describes how T(n) grows in comparison to n2.
 For non-negative functions f(n) and g(n), if there exist an integer n0 and a constant c > 0 such that, for all integers n ≥ n0, f(n) ≤ cg(n), then f(n) is Big-Oh of g(n).
f(n) = O(g(n))
 For non-negative functions f(n) and g(n), if for every constant c > 0 there exists an integer n0 such that, for all integers n ≥ n0, f(n) < cg(n), then f(n) is small-oh of g(n).
f(n) = o(g(n))
 For non-negative functions f(n) and g(n), if there exist an integer n0 and a constant c > 0 such that, for all integers n ≥ n0, cg(n) ≤ f(n), then f(n) is Big-Omega of g(n).
f(n) = Ω(g(n))
 For non-negative functions f(n) and g(n), if for every constant c > 0 there exists an integer n0 such that, for all integers n ≥ n0, cg(n) < f(n), then f(n) is small-omega of g(n).
f(n) = ω(g(n))
 For non-negative functions f(n) and g(n), if there exist an integer n0 and constants c1 > 0 and c2 > 0 such that, for all integers n ≥ n0, 0 ≤ c1g(n) ≤ f(n) ≤ c2g(n), then f(n) is Theta of g(n).
f(n) = Θ(g(n))
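 As a worked illustration (an added example, not from the original notes): f(n) = 3n + 2 is O(n), because 3n + 2 ≤ 4n for all n ≥ 2 (take c = 4, n0 = 2). It is also Ω(n), because 3n ≤ 3n + 2 for all n ≥ 1 (take c = 3, n0 = 1). Since it is both O(n) and Ω(n), we have f(n) = Θ(n).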
Constant Time:-
 When an algorithm is not reliant on the input size n, it is said to
have constant time with order O(1).
 The runtime will always be the same, regardless of the input
size n.
Linear Time:-
 When the running time of an algorithm rises linearly with the
length of the input, it is said to have linear time complexity.
 When a function checks all the values in an input data set, it is
said to have time complexity of order O(n).
Logarithmic Time:-
 When an algorithm reduces the size of the input data in each step (for example, by half), it is said to have logarithmic time complexity (see the loop sketch after this list).
 Searching in binary search trees and binary search are examples of algorithms with logarithmic time complexity, which is of order O(log n).
Quadratic Time:-
 When the execution time of an algorithm rises proportionally to the square of the length of the input, it is said to have quadratic time complexity.
 In general, nested loops fall into the quadratic time complexity order: one loop takes O(n), and if the function contains a loop inside a loop, it takes O(n) * O(n) = O(n2).
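 As a rough illustration of logarithmic time (an added sketch, not from the notes), the loop below halves its counter on every iteration, so the body runs about log2 n times:

#include <stdio.h>

int main(void)
{
    int n = 64, steps = 0;

    /* i is halved on each pass, so the body executes roughly log2(n) times. */
    for (int i = n; i > 1; i = i / 2)
        steps++;

    printf("n = %d, steps = %d\n", n, steps);   /* prints steps = 6 for n = 64 */
    return 0;
}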
 The best algorithm to solve a particular problem is, no doubt, the one that requires the least memory space and takes the least time to complete its execution.
 But practically, designing such an ideal algorithm is not a
trivial task.
 There can be more than one algorithm to solve a particular
problem.
 One may require less memory space, while the other may
require less CPU time to execute.
 Thus, it is not uncommon to sacrifice one thing for the
other.
 Hence, there exists a time–space trade-off among
algorithms.
 So, if space is a big constraint, then one might choose a
program that takes less space at the cost of more CPU time.
 On the contrary, if time is a major constraint, then one
might choose a program that takes minimum time to
execute at the cost of more space.
Linear Search
 In this technique, we access each element of the array one by one, sequentially, and see whether it is the desired element or not.
 A search is unsuccessful if all the elements are accessed and the desired element is not found.
 Algorithm:-
Step 1: SET POS = -1
Step 2: SET I = 1
Step 3: Repeat Step 4 while I <= N
Step 4:     IF A[I] = VAL
                SET POS = I
                PRINT POS
                Go to Step 6
            [END OF IF]
            SET I = I + 1
        [END OF LOOP]
Step 5: IF POS = -1
            PRINT "VALUE IS NOT PRESENT"
        [END OF IF]
Step 6: EXIT

Time Complexity:-
Worst Case → O(n)
Average Case → O(n)
Best Case → O(1)
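 A minimal C sketch of the linear search described above (the function name linear_search and the sample data are illustrative):

#include <stdio.h>

/* Returns the index of val in a[0..n-1], or -1 if it is not present. */
int linear_search(const int a[], int n, int val)
{
    for (int i = 0; i < n; i++)
        if (a[i] == val)
            return i;
    return -1;
}

int main(void)
{
    int a[] = {7, 3, 9, 5, 1};
    int n = sizeof(a) / sizeof(a[0]);
    int pos = linear_search(a, n, 9);

    if (pos == -1)
        printf("VALUE IS NOT PRESENT\n");
    else
        printf("Found at index %d\n", pos);
    return 0;
}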
Binary Search
 It is an extremely efficient algorithm that works on a sorted array.
 This search technique finds the given item in the minimum possible number of comparisons.
 Technique:-
1) First find the middle element of the array.
2) Compare the middle element with the search element.
Case 1:
 If it is the desired element, then the search is successful.
Case 2:
 If the desired element is less than the middle element, then search only the first (left) half of the array.
Case 3:
 If the desired element is greater than the middle element, then search only the second (right) half of the array.
 Algorithm:-
Step 1: SET BEG = lower_bound, END = upper_bound, POS = -1
Step 2: Repeat Steps 3 and 4 while BEG <= END
Step 3:     SET MID = (BEG + END)/2
Step 4:     IF A[MID] = VAL
                SET POS = MID
                PRINT POS
                Go to Step 6
            ELSE IF A[MID] > VAL
                SET END = MID - 1
            ELSE
                SET BEG = MID + 1
            [END OF IF]
        [END OF LOOP]
Step 5: IF POS = -1
            PRINT "VALUE IS NOT PRESENT"
        [END OF IF]
Step 6: EXIT
Time Complexity:-
Best Case → O(1) [the first calculated middle element is the search element]
Worst Case and Average Case → O(log n)
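 A minimal iterative C sketch of the binary search algorithm above (the function name binary_search and the sample sorted array are illustrative):

#include <stdio.h>

/* Returns the index of val in the sorted array a[0..n-1], or -1 if absent. */
int binary_search(const int a[], int n, int val)
{
    int beg = 0, end = n - 1;

    while (beg <= end) {
        int mid = beg + (end - beg) / 2;   /* avoids overflow of (beg + end) */

        if (a[mid] == val)
            return mid;
        else if (a[mid] > val)
            end = mid - 1;                 /* search the left half  */
        else
            beg = mid + 1;                 /* search the right half */
    }
    return -1;
}

int main(void)
{
    int a[] = {1, 3, 5, 7, 9, 11};
    int n = sizeof(a) / sizeof(a[0]);

    printf("%d\n", binary_search(a, n, 7));   /* prints 3  */
    printf("%d\n", binary_search(a, n, 4));   /* prints -1 */
    return 0;
}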
Binary Search Algorithm Advantages-
 The advantages of the binary search algorithm are-
 It eliminates half of the list from further searching by using the result of each comparison.
 It indicates whether the element being searched for is before or after the current position in the list.
 This information is used to narrow the search.
 For large lists of data, it works significantly better than linear search.

Binary Search Algorithm Disadvantages-
 The disadvantages of the binary search algorithm are-
 It works only on a list that is already sorted.
 It employs a recursive approach which requires more stack space.
