Introduction To Data Structure & Algorithm
Chapter Objectives
In this chapter, you will learn what data structures and algorithms are and how they actually
function. At the end of this chapter, students will be able to:
Apply the properties of data structures and algorithms in order to understand how
data is accessed and represented in a computer
Develop a scheme to define data type, abstract data type, and data structure
Develop self-reliance and enough confidence to understand the process of
problem solving, and learn the basic mathematical functions used in analyzing
algorithms
A computer is an electronic machine used for data processing and manipulation. When a
programmer collects data for processing, all of it must be stored in the computer's main memory.
To make the computer work effectively with that data, we need to know how the data is organized and accessed.
As applications grow more complex and data rich, there are three common problems that applications
face nowadays.
Data Search − Consider an inventory of 1 million (10^6) items in a store. If the application has to
search for an item, it must look through all 1 million (10^6) items every time, slowing down the
search. As data grows, searching becomes slower.
Processor speed − Processor speed, although very high, is still limited if the data grows to
billions of records.
Multiple requests − As thousands of users can search data simultaneously on a web server, even
a fast server can fail while searching the data.
To solve the above-mentioned problems, data structures come to the rescue. Data can be organized in a
data structure in such a way that not all items need to be searched, and the required data
can be found almost instantly.
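As a rough illustration, the following C sketch contrasts a plain linear scan with binary search on the same items kept in sorted order; organizing the data this way lets the search skip most of the items. The array contents and function names are only illustrative.

#include <stdio.h>

/* Linear scan: may examine every one of the n items. */
int linear_search(const int a[], int n, int key) {
    for (int i = 0; i < n; i++)
        if (a[i] == key)
            return i;
    return -1;
}

/* Binary search on a sorted array: examines only about log2(n) items. */
int binary_search(const int a[], int n, int key) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] == key)      return mid;
        else if (a[mid] < key)  lo = mid + 1;
        else                    hi = mid - 1;
    }
    return -1;
}

int main(void) {
    int items[] = {3, 7, 11, 19, 24, 31, 42, 56};   /* kept in sorted order */
    int n = sizeof items / sizeof items[0];
    printf("linear: %d, binary: %d\n",
           linear_search(items, n, 42), binary_search(items, n, 42));
    return 0;
}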
What is an Algorithm?
An algorithm is a finite, step-by-step procedure for solving a problem or performing a computation.
There are three cases which are usually used to compare the execution time of various data structure
operations in a relative manner (a small search sketch after the list illustrates all three).
Worst Case − This is the scenario where a particular data structure operation takes the maximum
time it can take. If an operation's worst-case time is ƒ(n), then this operation will not take more
than ƒ(n) time, where ƒ(n) is a function of n.
Average Case − This is the scenario depicting the average execution time of an operation of a
data structure. If an operation takes ƒ(n) time on average, then m operations will take mƒ(n)
time.
Best Case − This is the scenario depicting the least possible execution time of an operation of a
data structure. If the best-case time of an operation is ƒ(n), then the actual operation will take at
least ƒ(n) time; ƒ(n) is a lower bound on its running time.
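As a concrete, hypothetical illustration of the three cases in C, consider searching an array one element at a time: the best case finds the key immediately, the worst case examines all n elements, and the average case examines about n/2 of them.

#include <stdio.h>

/* Linear search used to illustrate best, average and worst cases. */
int linear_search(const int a[], int n, int key) {
    for (int i = 0; i < n; i++)
        if (a[i] == key)
            return i;    /* best case: key at a[0], only 1 comparison      */
    return -1;           /* worst case: key absent, all n comparisons made */
}
/* Average case: if the key is equally likely to be in any position,
   about n/2 comparisons are made.                                    */

int main(void) {
    int a[] = {5, 9, 2, 7, 4};
    printf("%d %d %d\n",
           linear_search(a, 5, 5),    /* best case:  found at index 0  */
           linear_search(a, 5, 4),    /* found at the last position    */
           linear_search(a, 5, 8));   /* worst case: not found, -1     */
    return 0;
}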
From the data structure point of view, the following are some important categories of algorithms −
Search − algorithms to search for an item in a data structure.
Sort − algorithms to sort items in a certain order.
Insert − algorithms to insert an item into a data structure.
Update − algorithms to update an existing item in a data structure.
Delete − algorithms to delete an existing item from a data structure.
Characteristics of an Algorithm
Not all procedures can be called an algorithm. An algorithm should have the following characteristics −
Unambiguous − each step, and its inputs and outputs, should be clear and lead to only one meaning.
Input − an algorithm should have zero or more well-defined inputs.
Output − an algorithm should have one or more well-defined outputs matching the desired result.
Finiteness − an algorithm must terminate after a finite number of steps.
Feasibility − it should be feasible with the available resources.
Independent − its step-by-step directions should be independent of any programming code.
There are no well-defined standards for writing algorithms. Rather, writing an algorithm is problem-
and resource-dependent. Algorithms are never written to support a particular programming language's code.
As we know, all programming languages share basic code constructs like loops (do, for, while), flow
control (if-else), etc. These common constructs can be used to write an algorithm.
We write algorithms in a step-by-step manner, but that is not always the case. Algorithm writing is a
process that is carried out after the problem domain is well defined. That is, we should know the problem
domain for which we are designing a solution.
Example
Problem − Design an algorithm to add two numbers and display the result.
Step 1 − START
Step 2 − declare three integers a, b & c
Step 3 − define values of a & b
Step 4 − add values of a & b
Step 5 − store output of step 4 to c
Step 6 − print c
Step 7 − STOP
Algorithms tell the programmers how to code the program. Alternatively, the algorithm can be written
as −
Step 1 − START ADD
Step 2 − get values of a & b
Step 3 − c ← a + b
Step 4 − display c
Step 5 − STOP
In the design and analysis of algorithms, the second method is usually used to describe an algorithm. It
makes it easy for the analyst to analyze the algorithm while ignoring all unwanted definitions; the analyst
can observe which operations are being used and how the process flows.
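A minimal C sketch of the second form of the algorithm, with sample values assumed for a and b:

#include <stdio.h>

int main(void) {
    int a = 10, b = 20, c;   /* Step 2: get values of a & b (sample values) */
    c = a + b;               /* Step 3: c <- a + b                          */
    printf("%d\n", c);       /* Step 4: display c                           */
    return 0;                /* Step 5: STOP                                */
}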
We design an algorithm to get a solution to a given problem. A problem can be solved in more than one
way.
Algorithm Analysis
The efficiency of an algorithm can be analyzed at two different stages, before implementation and after
implementation: a priori analysis, a theoretical estimate made before coding, and a posteriori analysis,
an empirical measurement made after the implementation runs on actual hardware.
We shall learn about a priori algorithm analysis. Algorithm analysis deals with the execution or running
time of various operations involved. The running time of an operation can be defined as the number of
computer instructions executed per operation.
Algorithm Complexity
Suppose X is an algorithm and n is the size of the input data. The time and space used by the algorithm X are
the two main factors which decide the efficiency of X.
Time Factor − Time is measured by counting the number of key operations such as comparisons
in the sorting algorithm.
Space Factor − Space is measured by counting the maximum memory space required by the
algorithm.
The complexity of an algorithm f(n) gives the running time and/or the storage space required by the
algorithm in terms of n as the size of input data.
Space Complexity
Space complexity of an algorithm represents the amount of memory space required by the algorithm in
its life cycle. The space required by an algorithm is equal to the sum of the following two components −
A fixed part: the space required to store certain data and variables that are independent of
the size of the problem. For example, simple variables and constants used, program size, etc.
A variable part: the space required by variables whose size depends on the size of the problem.
For example, dynamic memory allocation, recursion stack space, etc.
Consider, for example, the simple algorithm SUM(A, B): C ← A + B + 10. Here we have three variables,
A, B, and C, and one constant (10). Hence S(P) = 1 + 3. Now, the actual space depends on the data types
of the given variables and constants, and it is multiplied accordingly.
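A small C sketch (with hypothetical function names) that contrasts the two parts: the first function needs only a fixed amount of space regardless of the problem size, while the second allocates an array whose size grows with n.

#include <stdio.h>
#include <stdlib.h>

/* Fixed part only: three int variables and one constant,
   independent of the problem size.  S(P) = 1 + 3.        */
int sum_constants(int a, int b) {
    int c = a + b + 10;
    return c;
}

/* Fixed part plus a variable part: the array of n ints
   makes the space requirement grow linearly with n.      */
long sum_first_n(int n) {
    int *values = malloc((size_t)n * sizeof *values);   /* variable part: n * sizeof(int) */
    long total = 0;
    if (values == NULL)
        return -1;
    for (int i = 0; i < n; i++) {
        values[i] = i + 1;
        total += values[i];
    }
    free(values);
    return total;
}

int main(void) {
    printf("%d %ld\n", sum_constants(2, 3), sum_first_n(1000));
    return 0;
}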
Time Complexity
Time complexity of an algorithm represents the amount of time required by the algorithm to run to
completion. Time requirements can be defined as a numerical function T(n), where T(n) can be
measured as the number of steps, provided each step consumes constant time.
For example, addition of two n-bit integers takes n steps. Consequently, the total computational time is
T(n) = c ∗ n, where c is the time taken for the addition of two bits. Here, we observe that T(n) grows
linearly as the input size increases.
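A sketch of that n-bit addition in C, with each operand's bits stored in an array (least significant bit first); the single loop runs exactly n times, so T(n) = c * n for some constant c per iteration.

#include <stdio.h>

/* Adds two n-bit binary numbers a and b (index 0 is the least significant bit)
   and stores the (n+1)-bit result in sum.  The loop body runs exactly n times,
   so the running time is T(n) = c * n.                                         */
void add_bits(const int a[], const int b[], int sum[], int n) {
    int carry = 0;
    for (int i = 0; i < n; i++) {
        int s = a[i] + b[i] + carry;
        sum[i] = s % 2;
        carry  = s / 2;
    }
    sum[n] = carry;   /* the final carry becomes the most significant bit */
}

int main(void) {
    int a[4] = {1, 0, 0, 1};   /* 9, stored LSB first  */
    int b[4] = {1, 1, 1, 0};   /* 7, stored LSB first  */
    int sum[5];
    add_bits(a, b, sum, 4);    /* 9 + 7 = 16 = 10000b  */
    for (int i = 4; i >= 0; i--)
        printf("%d", sum[i]);
    printf("\n");
    return 0;
}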
Asymptotic Analysis
Asymptotic analysis is input bound, i.e., if there is no input to the algorithm, it is concluded to
work in constant time. Other than the input, all other factors are considered constant.
Asymptotic analysis refers to computing the running time of any operation in mathematical units
of computation. For example, the running time of one operation may be computed as f(n) and that
of another operation as g(n^2). This means the running time of the first operation will increase
linearly with the increase in n, while the running time of the second operation will increase
quadratically as n increases. Similarly, the running times of both operations will be nearly the
same if n is significantly small.
Big Oh Notation, Ο
The notation Ο(n) is the formal way to express the upper bound of an algorithm's running time. It
measures the worst-case time complexity, that is, the longest amount of time an algorithm can possibly
take to complete.
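As a small illustration of how the notation is read, the first C function below does work proportional to n, so its running time is O(n); the second does work proportional to n * n, so it is O(n^2). The function names are hypothetical.

#include <stdio.h>

/* O(n): the loop body executes n times. */
long long sum_linear(int n) {
    long long total = 0;
    for (int i = 0; i < n; i++)
        total += i;
    return total;
}

/* O(n^2): the inner loop body executes n * n times. */
long long sum_pairs(int n) {
    long long total = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            total += (long long)i * j;
    return total;
}

int main(void) {
    printf("%lld %lld\n", sum_linear(1000), sum_pairs(1000));
    return 0;
}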
STRUCTURED DATA TYPES - collections of data items which are non-atomic. These data types are
“derived” from the simple data types.
Data type - a particular kind of data item, as defined by the values it can take, the programming
language used, or the operations that can be performed on it.
The following simple data types are available in most programming languages as built-in types
(a short declaration sketch follows the list).
a. Integer: a data type which allows values without a fractional part. We use it for whole
numbers.
b. Float: a data type used for storing fractional (real) numbers.
c. Character: a data type used for character values.
d. Pointer: a variable that holds the memory address of another variable is called a pointer.
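A short C sketch declaring one variable of each of the simple types above, including a pointer that holds the address of another variable:

#include <stdio.h>

int main(void) {
    int   count = 65;        /* integer: whole numbers, no fractional part  */
    float price = 10.25f;    /* float: fractional (floating-point) numbers  */
    char  grade = 'A';       /* character: a single character value         */
    int  *ptr   = &count;    /* pointer: holds the memory address of count  */

    printf("%d %f %c %p\n", count, price, grade, (void *)ptr);
    return 0;
}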
• The representation format for each type balances compactness, range, accuracy, ease of
manipulation, and standardization.
• For any language there is a need to provide different storage representations for integers and
floating-point numbers.
• In the case of integers, a variety of sizes allows for the most efficient use of memory for a given
task. For example, the byte type (8-bit) is often useful for I/O tasks, while long types (64-bit) are
needed for representing very large values. In between are the 16-bit short and the 32-bit int
types. The integer types use two's-complement signed representations.
• For floating point there are two widths available. The IEEE 754 floating-point standard is used for
the 32-bit float and 64-bit double types.
• Historically, characters were typically represented with 7-bit ASCII codes, which allowed for 128
characters. A large number of 8-bit encodings exist to provide both standard ASCII and additional
characters beyond the basic 128.
• A Boolean (true/false) type is needed for the many logic operations carried out in almost any
program.
This data representation scheme provides 8 primitive data types in all. These include four types of integers,
two floating-point types, a character type, and a Boolean. Below is a table of the primitive types with their
specifications.
Primitive Data Types

Type    | Values                  | Default | Size                 | Range
byte    | signed integers         | 0       | 8 bits               | -128 to 127
short   | signed integers         | 0       | 16 bits              | -32768 to 32767
int     | signed integers         | 0       | 32 bits              | -2147483648 to 2147483647
long    | signed integers         | 0       | 64 bits              | -9223372036854775808 to 9223372036854775807
float   | IEEE 754 floating point | 0.0     | 32 bits              | +/-1.4E-45 to +/-3.4028235E+38, +/-infinity, +/-0, NaN
double  | IEEE 754 floating point | 0.0     | 64 bits              | +/-4.9E-324 to +/-1.7976931348623157E+308, +/-infinity, +/-0, NaN
char    | Unicode character       | \u0000  | 16 bits              | \u0000 to \uFFFF
boolean | true, false             | false   | 32 bits (1 bit used) | NA
DATA REPRESENTATION
How numbers and other data, such as characters, are represented in memory
Representing data is of great practical importance. Ideally a single memory representation, or
type, could represent all data including numbers, characters and boolean values.
Computer memory and transfer rates, however, are not infinite and designers must strike a
compromise between the widest possible range of values and conserving memory and maximizing
speed.
Also, the data representations must use base 2 as dictated by the underlying binary hardware.
MEMORY REPRESENTATION
Scheme: based on the assignment of a numeric code to each of the characters in the character set.
Standard Coding Schemes:
1. ASCII (American Standard Code for Information Interchange)
2. EBCDIC (Extended Binary Coded Decimal Interchange Code)
Example: in 7-bit ASCII, 'H' is encoded as 72 (100 1000) and 'I' as 73 (100 1001), so the text "HI" is
stored as the codes 72 73.
UNICODE
Unicode provides a unique number for every character, no matter what the platform, no matter
what the program, no matter what the language. The Unicode Standard has been adopted by
such industry leaders as Apple, HP, IBM, JustSystems, Microsoft, Oracle, SAP, Sun, Sybase,
Unisys and many others. Unicode is required by modern standards such as XML, Java,
ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, etc., and is the official way to implement
ISO/IEC 10646. It is supported in many operating systems, all modern browsers, and many other
products. The emergence of the Unicode Standard, and the availability of tools supporting it,
are among the most significant recent global software technology trends. In Unicode, a letter
maps to something called a code point.
Every platonic letter in every alphabet is assigned a magic number by the Unicode consortium,
which is written like this: U+0639. This magic number is called a code point. The U+ means
"Unicode" and the numbers are hexadecimal. U+0639 is the Arabic letter Ain. The English word
"Hello" corresponds to these five code points: U+0048 U+0065 U+006C U+006C U+006F.
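A minimal C sketch (assuming an ASCII-compatible character set, in which each letter of "Hello" maps to the same numeric code point) that prints those code points:

#include <stdio.h>

int main(void) {
    const char *word = "Hello";
    /* For plain ASCII text, each character's Unicode code point equals its ASCII code. */
    for (const char *p = word; *p != '\0'; p++)
        printf("U+%04X ", (unsigned)(unsigned char)*p);
    printf("\n");   /* prints: U+0048 U+0065 U+006C U+006C U+006F */
    return 0;
}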
Three schemes are commonly used for representing signed integers:
#1. Sign magnitude
    The leftmost bit is used as the sign (1 is negative, 0 is positive) and the remaining bits
    are used to store the magnitude.
#2. 2's Complement
    Non-negative integers are represented as in sign-magnitude notation. The
    representation of a negative number −n is obtained by first finding the base-two
    representation of n, complementing it, and then adding one to the result.
#3. Excess or Biased Notation
    The representation of an integer as a string of n bits is formed by adding the bias
    2^(n-1) to the integer and representing the result in base two.
Examples:
Scheme #1 (sign magnitude, 16 bits):
a] int n = 65;
   0 0000000 01000001
   sb  magnitude
b] int n = -65;
   1 0000000 01000001
   sb  magnitude
Scheme #2 (2's complement, 16 bits):
a] int n = 65;
   00000000 01000001
b] int n = -65;
   00000000 01000001    binary of +65
   11111111 10111110    1's complement
 +        1
   11111111 10111111    2's complement representation of -65
Scheme #3 (excess or biased notation):
   n = number of bits used to represent the integer
   2^(n-1) = excess/bias
   For n = 16 bits: excess = 2^(16-1) = 2^15 = 32768
a] int n = 65;
     32768
   +    65
     32833    excess notation of +65
b] int n = -65;
     32768
   +   (-65)
     32703    excess notation of -65
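The worked examples above can be checked with a short C sketch: reinterpreting -65 as an unsigned 16-bit value exposes its 2's-complement bit pattern, and adding the bias 32768 gives the excess (biased) code. This assumes a machine that stores signed integers in 2's complement, as virtually all current hardware does.

#include <stdio.h>
#include <stdint.h>

int main(void) {
    int16_t n = -65;

    /* 2's complement: reinterpreting the 16 bits as unsigned shows the pattern. */
    uint16_t twos = (uint16_t)n;
    printf("two's complement of -65 : 0x%04X\n", (unsigned)twos);   /* 0xFFBF = 11111111 10111111 */

    /* Excess-32768 (biased) notation: add the bias 2^(16-1) = 32768. */
    uint16_t biased = (uint16_t)(n + 32768);
    printf("excess-32768 code of -65: %u\n", (unsigned)biased);     /* 32703 */

    return 0;
}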
3. Float
data type     bits = sign + exponent + mantissa
float         32   =  1   +    8     +   23     (bit 31 | bits 30..23 | bits 22..0)
double        64   =  1   +   11     +   52     (bit 63 | bits 62..52 | bits 51..0)
long double   80   =  1   +   15     +   64     (bit 79 | bits 78..64 | bits 63..0; the leading
                                                 mantissa bit is an explicit integer bit)
Example:
1] float f = 10.25;
   a. 10.25 in binary = 1010.01
   b. normalized: 1.01001 * 2^3   (the leading 1 is dropped; the bits after the point become the mantissa)
   c. biased exponent: 127 + 3 = 130 = 10000010
      sb   biased expo   mantissa
      0    10000010      01001000000000000000000
2] float f = 26.375;
   a. 26.375 in binary = 11010.011
   b. normalized: 1.1010011 * 2^4
   c. biased exponent: 127 + 4 = 131 = 10000011
3] float f = -0.375;
   a. 0.375 in binary = 0.011   (the sign bit is 1 because the value is negative)
   b. normalized: 1.1 * 2^-2
   c. biased exponent: 127 + (-2) = 125 = 01111101
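The hand-computed layout for 10.25 can be checked with a small C sketch that copies the bits of the float into a 32-bit integer and splits out the sign, biased exponent, and mantissa fields (assuming the usual 32-bit IEEE 754 float):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    float f = 10.25f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);          /* reinterpret the float's raw bits */

    uint32_t sign     = bits >> 31;          /* bit 31      */
    uint32_t exponent = (bits >> 23) & 0xFF; /* bits 30..23 */
    uint32_t mantissa = bits & 0x7FFFFF;     /* bits 22..0  */

    /* Expected: sign 0, exponent 130 (10000010), mantissa 01001000...0 (0x240000) */
    printf("sign=%u exponent=%u mantissa=0x%06X\n",
           (unsigned)sign, (unsigned)exponent, (unsigned)mantissa);
    return 0;
}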