
Chapter 1: Introduction to Data Structures

Introduction
A program is written in order to solve a problem. A solution to a problem actually consists of
two things:
 A way to organize the data---------------------------------------------Data Structure
 Sequence of steps to solve the problem -----------------------------Algorithm
The way data are organized in a computer's memory is called a data structure, and the
sequence of computational steps used to solve a problem is called an algorithm. Therefore, a
program is nothing but data structures plus algorithms.
A data structure is a systematic way to organize data in order to use it efficiently. The following
terms are the foundation terms of a data structure.

 Interface − Each data structure has an interface. The interface represents the set of
operations that the data structure supports. An interface only provides the list of supported
operations, the types of parameters they can accept, and the return types of these operations.

 Implementation − The implementation provides the internal representation of a data
structure. It also provides the definitions of the algorithms used in the
operations of the data structure.
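As a minimal sketch of this separation (not taken from the text; the class and member names are hypothetical), an integer stack's interface can be declared as an abstract C++ class listing only the supported operations, while a concrete array-based class supplies the implementation:

// Interface: only the supported operations, their parameters and return types.
class IntStack {
public:
    virtual ~IntStack() {}
    virtual void push(int value) = 0;   // add an element on top
    virtual int  pop()           = 0;   // remove and return the top element
    virtual bool isEmpty() const = 0;   // true if the stack holds no elements
};

// Implementation: internal representation (a fixed-size array) and the
// algorithms behind each operation.
class ArrayIntStack : public IntStack {
    int data[100];   // storage for at most 100 elements
    int top;         // index of the next free slot
public:
    ArrayIntStack() : top(0) {}
    void push(int value) { data[top++] = value; }
    int  pop()           { return data[--top]; }
    bool isEmpty() const { return top == 0; }
};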

Characteristics of a Data Structure


 Correctness − A data structure implementation should implement its interface correctly.

 Time Complexity − The running time (execution time) of the operations of a data
structure must be as small as possible.

 Space Complexity − The memory usage of a data structure operation should be as little
as possible.

Need for Data Structure


As applications are getting more complex and data-rich, there are three common problems that
applications face nowadays.

 Data Search − Consider an inventory of 1 million (10⁶) items in a store. If the
application has to search for an item, it must look through the 1 million (10⁶) items every
time, slowing down the search. As the data grows, the search becomes slower.

 Processor speed − Although processor speed is very high, it becomes a limiting factor when
the data grows to billions of records.

 Multiple requests − As thousands of users can search data simultaneously on a web
server, even a fast server can fail while searching the data.

To solve the above-mentioned problems, data structures come to the rescue. Data can be organized
in a data structure in such a way that not all items need to be searched, and the
required data can be found almost instantly.
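As a brief illustration of this idea (a hypothetical sketch, not from the text): if the items are kept sorted, a binary search inspects only about log₂(n) items instead of scanning all n of them.

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    // One million item codes, kept sorted so they can be searched quickly.
    std::vector<int> items(1000000);
    for (int i = 0; i < 1000000; ++i) items[i] = i * 2;   // even codes only

    int wanted = 123456;

    // Unorganized search: a linear scan may inspect all 1,000,000 items.
    bool foundLinear = false;
    for (int code : items) {
        if (code == wanted) { foundLinear = true; break; }
    }

    // Organized (sorted) data: binary search inspects about 20 items.
    bool foundBinary = std::binary_search(items.begin(), items.end(), wanted);

    std::cout << foundLinear << " " << foundBinary << std::endl;
    return 0;
}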

Execution Time Cases


There are three cases which are usually used to compare the execution times of various data
structures in a relative manner.

 Worst Case − This is the scenario where a particular data structure operation takes the
maximum time it can take. If an operation's worst-case time is ƒ(n), then this operation
will not take more than ƒ(n) time, where ƒ(n) is a function of n.

 Average Case − This is the scenario depicting the average execution time of an
operation of a data structure. If an operation takes ƒ(n) time on average, then m
operations will take mƒ(n) time.

 Best Case − This is the scenario depicting the least possible execution time of an
operation of a data structure. If an operation's best-case time is ƒ(n), then the actual
operation may take more time, but it will never take less than ƒ(n).

A Data Structure is a logical model of a particular organization of data.

There are two types of data structures:
Linear Data Structure: the elements form a sequence (linear lists), e.g. arrays, stacks,
queues, linked lists.
Non-Linear Data Structure: the elements do not form a sequence, e.g. graphs, trees.

Given a problem, the first step to solve the problem is obtaining one’s own abstract view, or
model, of the problem. This process of modeling is called abstraction.
The model defines an abstract view of the problem. This implies that the model focuses only on
problem-related aspects and that the programmer tries to define the properties of the problem.
These properties include
 The data which are affected and
 The operations that are involved in the problem.
With abstraction you create a well-defined entity that can be properly handled. These entities
define the data structure of the program. An entity with the properties just described is called an
abstract data type (ADT).

Abstract Data Type and Abstraction


An ADT consists of an abstract data structure and operations. Put in other terms, an ADT is an
abstraction of a data structure.
The ADT specifies:
1. What can be stored in the Abstract Data Type?
2. What operations can be done on/by the Abstract Data Type?

For example, if we are going to model the employees of an organization:

 This ADT stores employees with their relevant attributes and discards irrelevant attributes.
 This ADT supports hiring, firing, retiring, … operations.
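A minimal C++ sketch of such an employee ADT might look like the following; the class name, attributes and operations are hypothetical, chosen only to mirror the hire/fire operations above:

#include <string>
#include <vector>

// Abstract view of the organization's employees: only the relevant
// attributes (name, id) and the supported operations are exposed.
class EmployeeADT {
public:
    void hire(const std::string& name, int id) {   // add a new employee
        employees.push_back({name, id});
    }
    void fire(int id) {                             // remove an employee by id
        for (std::size_t i = 0; i < employees.size(); ++i)
            if (employees[i].id == id) { employees.erase(employees.begin() + i); return; }
    }
private:
    struct Record { std::string name; int id; };    // internal representation
    std::vector<Record> employees;
};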

An ADT is a set of objects together with a set of operations; ADTs are mathematical abstractions. Objects
such as lists and graphs, along with their operations, can be abstract data types (Booleans and integers,
along with their operations, are also abstract data types).
A data structure is a language construct that the programmer defines in order to implement an
abstract data type.
There are many formalized and standard abstract data types, such as Stacks, Queues, Trees, etc.
Do all characteristics need to be modeled?
Not at all
 It depends on the scope of the model
 It depends on the reason for developing the model

Abstraction is a process of classifying characteristics as relevant and irrelevant for the particular
purpose at hand and ignoring the irrelevant ones.
Applying abstraction correctly is the essence of successful programming
How do data structures model the world or some part of the world?

 The value held by a data structure represents some specific characteristic of the world
 The characteristic being modeled restricts the possible values held by a data structure
 The characteristic being modeled restricts the possible operations to be performed on the data
structure.

Algorithm and Algorithm Analysis


Algorithm
An algorithm is the method of solving a problem. It is a sequence of instructions that act on some
input data to produce some output in a finite number of steps.

“The ends justify the means (algorithm)” … does not always work in CS.
Getting a correct solution late is as bad as getting a wrong solution.

It is a well-defined computational procedure that takes some value or a set of values as input and
produces some value or a set of values as output. Data structures model the static part of the world.
They are unchanging while the world is changing.
In order to model the dynamic part of the world we need to work with algorithms. Algorithms are
the dynamic part of a program's world model.
An algorithm transforms data structures from one state to another state in two ways:
 An algorithm may change the value held by a data structure
 An algorithm may change the data structure itself
The quality of a data structure is related to its ability to successfully model the characteristics of
the world. Similarly, the quality of an algorithm is related to its ability to successfully simulate the
changes in the world.
However, independent of any particular world model, the quality of data structures and algorithms
is determined by their ability to work together well. Generally speaking, correct data structures
lead to simple and efficient algorithms, and correct algorithms lead to accurate and efficient data
structures.
Algorithms basically fall under two broad categories:
Iterative: use loops and conditional statements
Recursive: use a divide-and-conquer strategy
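As a small illustration (not from the original text; the function names are hypothetical), the sum 1 + 2 + … + n can be computed in both styles:

// Iterative: a loop accumulates the sum step by step.
int sumIterative(int n) {
    int sum = 0;
    for (int i = 1; i <= n; i++)
        sum = sum + i;
    return sum;
}

// Recursive: the problem of size n is reduced to a smaller problem of size n-1.
int sumRecursive(int n) {
    if (n <= 0)                        // base case: nothing left to add
        return 0;
    return n + sumRecursive(n - 1);    // recursive case
}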
Properties of Algorithm

• Finiteness: An algorithm must complete after a finite number of steps.
• Definiteness: Each step must be clearly defined, having one and only one interpretation. At each
point in the computation, one should be able to tell exactly what happens next.
• Sequence: Each step must have uniquely defined preceding and succeeding steps. The first step
(start step) and last step (halt step) must be clearly noted.
• Feasibility: It must be possible to perform each instruction.
• Correctness: It must compute the correct answer for all possible legal inputs.
• Language Independence: It must not depend on any one programming language.
• Completeness: It must solve the problem completely.
• Effectiveness: It must be possible to perform each step exactly and in a finite amount of time.
• Efficiency: It must solve the problem with the least amount of computational resources such as time and space.
• Generality: The algorithm should be valid for all possible inputs.
• Input/output: There must be a specified number of input values, and one or more result values.
Analysis of Algorithm
Analysis of algorithms gives us a scientific basis for determining which algorithm should be
chosen to solve a given problem.
However, it does not give us a formula for how many seconds or computer cycles a
particular algorithm will take to solve a problem. Such a formula would not be useful for choosing the
right algorithm, because it involves many variables, such as:
 The type of computer
 The instruction set used by the microprocessor
 The optimizations the compiler performs on the executable code
Algorithm Analysis Concepts
Algorithm analysis refers to the process of determining the amount of computing time and storage
space required by different algorithms. In other words, it's a process of predicting the resource
requirement of algorithms in a given environment.
In order to solve a problem, there are many possible algorithms. One has to be able to choose the best
algorithm for the problem at hand using some scientific method. To classify some data structures and
algorithms as good, we need precise ways of analyzing them in terms of resource requirement. The
main resources are:
 Running Time
 Memory Usage
 Communication Bandwidth
Running time is usually treated as the most important since computational time is the most precious
resource in most problem domains.
There are two approaches to measure the efficiency of algorithms:

• Empirical: programming competing algorithms and trying them on different instances.

• Theoretical: determining mathematically the quantity of resources (execution time,
memory space, etc.) needed by each algorithm.

However, it is difficult to use actual clock time as a consistent measure of an algorithm's
efficiency, because clock time can vary based on many things.
For example:
Specific processor speed
Current processor load
Specific data for a particular run of the program
    o Input size
    o Input properties
Operating environment

Accordingly, we can analyze an algorithm according to the number of operations required, rather
than according to an absolute amount of time involved. This can show how an algorithm's efficiency
changes according to the size of the input.
Complexity Analysis
Complexity Analysis is the systematic study of the cost of computation, measured either in time
units or in operations performed, or in the amount of storage space required.
The goal is to have a meaningful measure that permits comparison of algorithms independent of
operating platform.
There are two things to consider:
Time Complexity: Determine the approximate number of operations required to solve a
problem of size n.
Space Complexity: Determine the approximate memory required to solve a problem of
size n.
Complexity analysis involves two distinct phases:
Algorithm Analysis: analysis of the algorithm or data structure to produce a function T(n)
that describes the algorithm in terms of the operations performed, in order to measure the
complexity of the algorithm.
Order-of-Magnitude Analysis: analysis of the function T(n) to determine the general complexity
category to which it belongs.
There is no generally accepted set of rules for algorithm analysis. However, an exact count of
operations is commonly used.
Analysis Rules:
1. We assume an arbitrary time unit.
2. Execution of one of the following operations takes time 1:
Assignment Operation
Single Input/Output Operation
Single Boolean Operations
Single Arithmetic Operations
Function Return
3. Running time of a selection statement (if, switch) is the time for the condition evaluation + the
maximum of the running times for the individual clauses in the selection.

4. Loops: Running time for a loop is equal to the running time for the statements inside the loop *
number of iterations.
The total running time of a statement inside a group of nested loops is the running time of the
statements multiplied by the product of the sizes of all the loops.
For nested loops, analyze inside out.
Always assume that the loop executes the maximum number of iterations possible.
5. Running time of a function call is 1 for setup + the time for any parameter calculations + the time
required for the execution of the function body.
Examples:

1. int count(){
       int k=0;
       cout<< "Enter an integer";
       cin>>n;
       for (i=0;i<n;i++)
            k=k+1;
       return 0;
   }

   Time Units to Compute
   -------------------------------------------------
   1 for the assignment statement: int k=0
   1 for the output statement.
   1 for the input statement.
   In the for loop:
   1 assignment, n+1 tests, and n increments.
   n loops of 2 units for an assignment and an addition.
   1 for the return statement.
   -------------------------------------------------
   T(n) = 1+1+1+(1+n+1+n)+2n+1 = 4n+6 = O(n)

2. int total(int n)
   {
       int sum=0;
       for (int i=1;i<=n;i++)
            sum=sum+1;
       return sum;
   }

   Time Units to Compute
   -------------------------------------------------
   1 for the assignment statement: int sum=0
   In the for loop:
   1 assignment, n+1 tests, and n increments.
   n loops of 2 units for an assignment and an addition.
   1 for the return statement.
   -------------------------------------------------
   T(n) = 1+(1+n+1+n)+2n+1 = 4n+4 = O(n)

3. void func()
   {
       int x=0;
       int i=0;
       int j=1;
       cout<< "Enter an Integer value";
       cin>>n;
       while (i<n){
            x++;
            i++;
       }
       while (j<n)
       {
            j++;
       }
   }

   Time Units to Compute
   -------------------------------------------------
   1 for the first assignment statement: x=0;
   1 for the second assignment statement: i=0;
   1 for the third assignment statement: j=1;
   1 for the output statement.
   1 for the input statement.
   In the first while loop:
   n+1 tests
   n loops of 2 units for the two increment (addition) operations
   In the second while loop:
   n tests
   n-1 increments
   -------------------------------------------------
   T(n) = 1+1+1+1+1+n+1+2n+n+n-1 = 5n+5 = O(n)

4. int sum (int n)
   {
       int partial_sum = 0;
       for (int i = 1; i <= n; i++)
            partial_sum = partial_sum + (i * i * i);
       return partial_sum;
   }

   Time Units to Compute
   -------------------------------------------------
   1 for the assignment.
   1 assignment, n+1 tests, and n increments.
   n loops of 4 units for an assignment, an addition, and two multiplications.
   1 for the return statement.
   -------------------------------------------------
   T(n) = 1+(1+n+1+n)+4n+1 = 6n+4 = O(n)
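The examples above are all O(n). As an additional, hypothetical illustration of rule 4 for nested loops (this example is not part of the original set), counting every pair of elements uses two nested loops:

int pairs(int n)
{
    int count = 0;                   // 1 for the assignment
    for (int i = 0; i < n; i++)      // 1 assignment, n+1 tests, n increments
        for (int j = 0; j < n; j++)  // executed n times: n*(1 assignment, n+1 tests, n increments)
            count = count + 1;       // n*n loops of 2 units (an assignment and an addition)
    return count;                    // 1 for the return statement
}

T(n) = 1+(1+n+1+n)+n(1+n+1+n)+2n²+1 = 4n²+4n+4 = O(n²)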

Computational and Asymptotic Complexity

The field of complexity analysis is concerned with the study of the efficiency of algorithms,
therefore the first question we must ask ourselves is: what is an algorithm? An algorithm can
be thought of as a set of instructions that specifies how to solve a particular problem. For any
given problem, there are usually a large number of different algorithms that can be used to
solve the problem. All may produce the same result, but their efficiency may vary. In other
words, if we write programs (e.g. in C++) that implement each of these algorithms and run
them on the same set of input data, then these implementations will have different
characteristics. Some will execute faster than others; some will use more memory than
others. These differences may not be noticeable for small amounts of data, but as the size of
the input data becomes large, the differences become significant.

To compare the efficiency of algorithms, a measure of the degree of difficulty of an
algorithm called computational complexity was developed in the 1960’s by Juris Hartmanis
and Richard E. Stearns. Computational complexity indicates how much effort is needed to
execute an algorithm, or what its cost is. This cost can be expressed in terms of execution
time (time efficiency, the most common factor) or memory (space efficiency).

Since time efficiency is the most important, we will focus on this for the moment. When we
run a program on a computer, what factors influence how fast the program runs? One factor
is obviously the efficiency of the algorithm, but a very efficient algorithm run on an old PC
may run slower than an inefficient algorithm run on a Cray supercomputer. Clearly the speed
of the computer the program is run on is also a factor. The amount of input data is another
factor: it will normally take longer for a program to process 10 million pieces of data than
100. Another factor is the language in which the program is written. Compiled languages are
generally much faster than interpreted languages, so a program written in C/C++ may
execute up to 20 times faster than the same program written in BASIC.

It should be clear that we cannot use real-time units such as microseconds to evaluate an
algorithm's efficiency. A better measure is to use the number of operations required to
perform an algorithm, since this is independent of the computer that the program is run on.
Here, an operation can mean a single program statement such as an assignment statement.
Even this measure has problems, since high-level programming language statements do more
than low-level programming language statements, but it will do for now.

We need to express the relationship between the size n of the input data and the number of
operations t required to process the data. For example, if there is a linear relationship
between the size n and the number of operations t (that is, t = c.n where c is a constant), then
an increase in the size of the data by a factor of 5 results in an increase in the number of
operations by a factor of 5. Similarly, if t = log₂ n, then a doubling of n causes t to increase by
1. In other words, in complexity analysis we are not interested in how many microseconds it
will take for an algorithm to execute. We are not even that interested in how many operations
it will take. The important thing is how fast the number of operations grows as the size of the
data grows.

The examples given in the preceding paragraph are simple. In most real-world examples the
function expressing the relationship between n and t would be much more complex. Luckily
it is not normally necessary to determine the precise function, as many of the terms will not
be significant when the amount of data becomes large. For example, consider the function t =
f(n) = n² + 5n. This function consists of two terms, n² and 5n. However, for any n larger than
5 the n² term is the most significant, and for very large n we can effectively ignore the 5n
term. Therefore we can approximate the complexity function as f(n) = n². This simplified
measure of efficiency is called asymptotic complexity and is used when it is difficult or
unnecessary to determine the precise computational complexity function of an algorithm. In
fact it is normally the case that determining the precise complexity function is not feasible, so
the asymptotic complexity is the most common complexity measure used.
The computational complexity of an algorithm is a measure of the cost (usually in execution
time) incurred by applying the algorithm.

The asymptotic complexity of an algorithm is an approximation of the computational
complexity that holds for large amounts of input data.

1. Big-O Notation

The most commonly used notation for specifying asymptotic complexity, that is, for
estimating the rate of growth of complexity functions, is known as big-O notation. Big-O
notation was actually introduced before the invention of computers (in 1894, by Paul
Bachmann) to describe the rate of function growth in mathematics. It can also be applied in
the field of complexity analysis, as we are dealing with functions that relate the number of
operations t to the size of the data n.

Given two positive-valued functions f and g, consider the following definition:

Definition 3: The function f(n) is O(g(n)) if there exist positive numbers c and N such that
f(n) ≤ c.g(n) for all n ≥ N.

This definition states that g(n) is an upper bound on the value of f(n). In other words, in the
long run (for large n) f grows at most as fast as g.

To illustrate this definition, consider the previous example where f(n) = n² + 5n. We showed
in the last section that for large values of n we could approximate this function by the n² term
only; that is, the asymptotic complexity of f(n) is n². Therefore, we can say now that f(n) is
O(n²). In the definition, we substitute n² for g(n), and we see that it is true that f(n) ≤ 2.g(n)
for all n ≥ 5 (i.e. in this case c=2, N=5).

The problem with definition 3 is that it does not tell us how to calculate c and N. In actual
fact, there are usually an infinite number of pairs of values for c and N. We can show this by
solving the inequality from definition 3 and substituting the appropriate terms, i.e.

f(n) ≤ c.g(n)
n² + 5n ≤ c.n²
1 + (5/n) ≤ c

Therefore if we choose N=5, then c=2; if we choose N=6, then c=1.83, and so on. So what
are the 'correct' values for c and N? The answer is that we should determine for
which value of N a particular term in f(n) becomes the largest and stays the largest. In the
above example, the n² term becomes larger than the 5n term for n > 5, so N=5, c=2 is a good
choice.
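As a quick sanity check (a minimal sketch, not part of the original text), the following program prints f(n) and 2.g(n) for small n, showing that n² + 5n ≤ 2n² first holds at n = 5 and continues to hold for larger n:

#include <iostream>

int main() {
    // Compare f(n) = n^2 + 5n against c.g(n) = 2*n^2 for small n.
    for (int n = 1; n <= 10; n++) {
        int f  = n * n + 5 * n;   // the complexity function f(n)
        int cg = 2 * n * n;       // the bound c.g(n) with c = 2
        std::cout << "n=" << n << "  f(n)=" << f << "  2.g(n)=" << cg
                  << "  f(n) <= 2.g(n): " << (f <= cg ? "yes" : "no") << "\n";
    }
    return 0;
}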

Another problem with definition 3 is that there are actually infinitely many functions g(n)
that satisfy the definition. For example, we chose n², but we could also have chosen n³, n⁴, n⁵,
and so on. All of these functions satisfy definition 3. To avoid this problem, the smallest
function g is chosen, which in this case is n².
2. Properties of Big-O Notation

There are a number of useful properties of big-O notation that can be used when estimating
the efficiency of algorithms:

Fact 1: If f(n) is O(h(n)) and g(n) is O(h(n)) then f(n) + g(n) is O(h(n)).

In terms of algorithm efficiency, this fact states that if your program consists of, for example,
one O(n²) operation followed by another independent O(n²) operation, then the final program will also
be O(n²).
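A small, hypothetical C++ sketch of Fact 1 (not from the original text): two independent O(n) phases run one after the other still give an O(n) function overall.

// Two independent O(n) phases in sequence: T(n) = c1*n + c2*n, which is still O(n).
long long sumThenCount(const int* data, int n)
{
    long long sum = 0;
    for (int i = 0; i < n; i++)      // first phase: O(n)
        sum += data[i];

    int positives = 0;
    for (int i = 0; i < n; i++)      // second phase: also O(n)
        if (data[i] > 0) positives++;

    return sum + positives;          // total work is O(n) + O(n) = O(n)
}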

Fact 2: The function a.nᵏ is O(nᵏ) for any a and k.

In other words, multiplying a complexity function by a constant value (a) does not change
the asymptotic complexity.

Fact 3: The function loga n is O(logb n) for any positive numbers a and b ≠ 1

This states that in the context of big-O notation it does not matter what the base of the
logarithmic function is - all logarithmic functions have the same rate of growth. So if a
program is O(log₂ n) it is also O(log₁₀ n). Therefore, from now on we will leave out the base
and just write O(log n).
3. Ω, Θ and Little-o Notations

There exist three other, less common, ways of specifying the asymptotic complexity of
algorithms. We have seen that big-O notation refers to an upper bound on the rate of growth
of a function, where this function can refer to the number of operations required to execute
an algorithm given the size of the input data. There is a similar definition for the lower
bound, called big-omega (Ω) notation.

Definition 4: The function f(n) is Ω(g(n)) if there exist positive numbers c and N such that
f(n) ≥ c.g(n) for all n ≥ N.

This definition is the same as definition 3 apart from the direction of the inequality (i.e. it
uses ≥ instead of ≤). We can say that g(n) is a lower bound on the value of f(n), or, in the
long run (for large n) f grows at least as fast as g.

Ω notation has the same problems as big-O notation: there are many potential pairs of values
for c and N, and there are infinitely many functions that satisfy the definition. When choosing
one of these functions, for Ω notation we should choose the largest function. In other words,
we choose the smallest upper bound (big-O) function and the largest lower bound (Ω)
function. Using the example we gave earlier, to test if f(n) = n² + 5n is Ω(n²) we need to find
a value for c such that n² + 5n ≥ c.n². For c=1 this expression holds for all n ≥ 1.

For some algorithms (but not all), the lower and upper bounds on the rate of growth will be
the same. In this case, a third notation exists for specifying asymptotic complexity, called
theta (Θ) notation.

Definition 5: The function f(n) is Θ(g(n)) if there exist positive numbers c₁, c₂ and N such that
c₁.g(n) ≤ f(n) ≤ c₂.g(n) for all n ≥ N.

This definition states that f(n) is Θ(g(n)) if f(n) is O(g(n)) and f(n) is Ω(g(n)). In other words,
the lower and upper bounds on the rate of growth are the same.

For the same example, f(n) = n² + 5n, we can see that g(n) = n² satisfies definition 5, so the
function n² + 5n is Θ(n²). Actually, we have shown this already by showing that g(n) = n²
satisfies both definitions 3 and 4.

The final notation is little-o notation. You can think of little-o notation as the opposite of Θ
notation.

The function f(n) is o(g(n)) if f(n) is O(g(n)) but f(n) is not Θ(g(n)).

In other words, if a function f(n) is O(g(n)) but not Θ(g(n)), we denote this fact by writing
that it is o(g(n)). This means that f(n) has an upper bound of g(n) but a different lower bound,
i.e. it is not Ω(g(n)).

Complexity Classes

We have seen now that algorithms can be classified using the big-O, Ω and Θ notations
according to their time or space complexities. A number of complexity classes of algorithms
exist, and some of the more common ones are illustrated in Figure 1.

Table 1 gives some sample values for these different complexity classes. We can see from
this table how greatly the number of operations varies once the data becomes large.
As an illustration, if these algorithms were to be run on a computer that can perform 1 billion
operations per second (i.e. 1 GHz), the quadratic algorithm would take 16 minutes and 40
seconds to process 1 million data items, whereas the cubic algorithm would take over 31
years to perform the same processing. The time taken by the exponential algorithm would
probably exceed the lifetime of the universe!
It is obvious that choosing the right algorithm is of crucial importance, especially when
dealing with large amounts of data.
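As a minimal sketch (not part of the original text), the following program reproduces the arithmetic behind these figures: it computes how long each class would take on a machine performing 10⁹ operations per second for n = 1,000,000 items.

#include <cmath>
#include <iostream>

int main() {
    const double n = 1e6;            // one million data items
    const double opsPerSecond = 1e9; // a 1 GHz machine: 10^9 operations per second

    double nlogn     = n * std::log2(n);   // n log n algorithm
    double quadratic = n * n;              // O(n^2) algorithm
    double cubic     = n * n * n;          // O(n^3) algorithm

    std::cout << "n log n:   " << nlogn / opsPerSecond     << " seconds\n";
    std::cout << "quadratic: " << quadratic / opsPerSecond << " seconds\n";  // ~1000 s = 16 min 40 s
    std::cout << "cubic:     " << cubic / opsPerSecond / (3600.0 * 24 * 365)
              << " years\n";                                                  // ~31.7 years
    return 0;
}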

Figure 1 – A comparison of various complexity classes
Complexity Class           Number of operations performed based on size of input data n
Name          Big-O        n=10     n=100    n=1000   n=10,000   n=100,000    n=1,000,000
Constant      O(1)         1        1        1        1          1            1
Logarithmic   O(log n)     3.32     6.64     9.97     13.3       16.6         19.93
Linear        O(n)         10       100      1000     10,000     100,000      1,000,000
n log n       O(n log n)   33.2     664      9970     133,000    1.66 * 10⁶   1.99 * 10⁷
