Introduction to Algorithms and Elementary Data Structures
Chapter One

Introduction
• An Algorithm is a sequence of steps to solve a problem.
• The design and analysis of algorithms is central to computer science and information technology: it is how we devise algorithms for different types of problems and judge how good they are.
• If we have an algorithm for a specific problem, we can implement it in any programming language; in other words, an algorithm is independent of any particular programming language.

Problem Development Steps
• The following steps are involved in solving computational problems:
• Problem definition
• Development of a model
• Specification of an algorithm
• Designing an algorithm
• Checking the correctness of an algorithm
• Analysis of an algorithm
• Implementation of an algorithm
• Program testing
• Documentation

Characteristics of Algorithms
• The main characteristics of algorithms are as follows:
• Algorithms must have a unique name.
• Algorithms should have an explicitly defined set of inputs and outputs.
• Algorithms are well-ordered, with unambiguous operations.
• Algorithms halt in a finite amount of time; an algorithm must not run forever, i.e., it must end at some point.

Pseudocode
• Pseudocode gives a high-level description of an algorithm without the ambiguity of plain text, but also without requiring the syntax of a particular programming language.
• The running time can be estimated in a general manner by using pseudocode to represent the algorithm as a set of fundamental operations, which can then be counted.

Difference between Algorithm and Pseudocode
• An algorithm is a formal definition, with specific characteristics, of a process that could be executed by a Turing-complete machine to perform a specific task.
• Pseudocode, on the other hand, is an informal (often rudimentary) human-readable description of an algorithm that omits many of its granular details.
• Writing pseudocode carries no restriction of style; its only objective is to describe the high-level steps of the algorithm realistically, in natural language.

Algorithm: Insertion-Sort
• Input: A list L of integers of length n
• Output: A sorted list L1 containing the integers present in L
• Step 1: Keep a sorted list L1, which starts off empty.
• Step 2: Perform Step 3 for each element in the original list L.
• Step 3: Insert the element into the correct position in the sorted list L1.
• Step 4: Return the sorted list L1.
• Step 5: Stop.

Pseudocode
for i <- 1 to length(A) - 1
    x <- A[i]
    j <- i
    while j > 0 and A[j-1] > x
        A[j] <- A[j-1]
        j <- j - 1
    A[j] <- x

Algorithm Analysis
• Algorithm analysis is an important part of computational complexity theory, which provides theoretical estimates of the resources an algorithm requires to solve a specific computational problem.
• Analysis of an algorithm is the determination of the amount of time and space resources required to execute it.
• Usually, the efficiency of an algorithm is stated as a function relating the input length to the number of steps taken (time complexity) or to the volume of memory used (space complexity).

Why Analyze Algorithms?
• The main concern of analysis of algorithms is the required time or performance. Generally, we perform the following types of analysis:
• Worst case − the maximum number of steps taken on any instance of size n.
• Best case − the minimum number of steps taken on any instance of size n.
• Average case − the average number of steps taken over instances of size n.
• Amortized − the cost of a sequence of operations on an input of size n, averaged over the sequence.

Fundamentals of Algorithmic Problem Solving
Understanding the Problem
• From a practical perspective, the first thing you need to do before designing an algorithm is to understand the given problem completely.
• Read the problem’s description carefully and ask questions if you have any doubts about the problem; do a few small examples by hand, think about special cases, and ask questions again if needed.
• An input to an algorithm specifies an instance of the problem the algorithm solves.
• It is very important to specify exactly the set of instances the algorithm needs to handle.

Ascertaining the Capabilities of the Computational Device
• Once you completely understand a problem, you need to ascertain the capabilities of the computational device the algorithm is intended for.
• The vast majority of algorithms in use today are still destined to be programmed for a computer closely resembling the von Neumann machine, a computer architecture outlined by the prominent Hungarian-American mathematician John von Neumann (1903–1957).

Choosing between Exact and Approximate Problem Solving
• The next principal decision is to choose between solving the problem exactly or solving it approximately.
• In the former case, the algorithm is called an exact algorithm; in the latter, an approximation algorithm.
• Approximation can be the right choice for two reasons. First, there are important problems that simply cannot be solved exactly for most of their instances; examples include extracting square roots, solving nonlinear equations, and evaluating definite integrals.
• Second, the available algorithms for solving a problem exactly can be unacceptably slow because of the problem’s intrinsic complexity.

Algorithm Design Techniques
• An algorithm design technique (or “strategy” or “paradigm”) is a general approach to solving problems algorithmically that is applicable to a variety of problems from different areas of computing.
• Learning these techniques is of utmost importance: they provide guidance for designing algorithms for new problems, i.e., problems for which there is no known satisfactory algorithm.
• It is not true, of course, that each of these general techniques will necessarily be applicable to every problem you may encounter.

Designing an Algorithm and Data Structures
• While algorithm design techniques do provide a powerful set of general approaches to algorithmic problem solving, designing an algorithm for a particular problem may still be a challenging task.
• Some design techniques can simply be inapplicable to the problem in question.
• Of course, one should pay close attention to choosing data structures appropriate for the operations performed by the algorithm: “Algorithms + Data Structures = Programs” (Niklaus Wirth).

Methods of Specifying an Algorithm
• Using a natural language has an obvious appeal; however, the inherent ambiguity of any natural language makes a succinct and clear description of an algorithm surprisingly difficult.
• Nevertheless, being able to do this is an important skill that you should strive to develop while learning algorithms.
• Pseudocode is a mixture of a natural language and programming-language-like constructs.
• Pseudocode is usually more precise than natural language.

Proving an Algorithm’s Correctness
• Once an algorithm has been specified, you have to prove its correctness.
• That is, you have to prove that the algorithm yields the required result for every legitimate input in a finite amount of time.
• A common technique for proving correctness is mathematical induction, because an algorithm’s iterations provide a natural sequence of steps for such proofs.

Analyzing an Algorithm
• We usually want our algorithms to possess several qualities. After correctness, by far the most important is efficiency. In fact, there are two kinds of algorithm efficiency: time efficiency, indicating how fast the algorithm runs, and space efficiency, indicating how much extra memory it uses.
• Another desirable characteristic of an algorithm is simplicity.
• Unlike efficiency, which can be precisely defined and investigated with mathematical rigor, simplicity, like beauty, is to a considerable degree in the eye of the beholder. Simpler algorithms are easier to understand and easier to program.

Cont’d…
• Yet another desirable characteristic of an algorithm is generality.
• There are, in fact, two issues here: the generality of the problem the algorithm solves and the set of inputs it accepts.
• There are situations, however, where designing a more general algorithm is unnecessary, difficult, or even impossible.
• As to the set of inputs, your main concern should be designing an algorithm that can handle the set of inputs that is natural for the problem at hand.

Coding an Algorithm
• Most algorithms are destined to be ultimately implemented as computer programs.
• Programming an algorithm presents both a peril and an opportunity.
• Of course, implementing an algorithm correctly is necessary but not sufficient: you do not want to diminish your algorithm’s power with an inefficient implementation.
• Modern compilers do provide a certain safety net in this regard, especially when used in their code-optimization mode.
• “As a rule, a good algorithm is a result of repeated effort and rework.” (Levitin)

Asymptotic Analysis
• Asymptotic analysis is the framework for analyzing an algorithm’s efficiency independently of machine and implementation details.
• In asymptotic analysis, we evaluate the performance of an algorithm in terms of input size; we do not measure the actual running time.
• We calculate how the time (or space) taken by an algorithm grows with the input size.
• Asymptotic notation is a way to describe the running time or space complexity of an algorithm as a function of the input size.
• It is commonly used in complexity analysis to describe how an algorithm performs as the size of the input grows. The three most commonly used notations are Big O, Omega, and Theta.

Big O Notation (O)
• This notation provides an upper bound on the growth rate of an algorithm’s running time or space usage.
• It is typically used for the worst-case scenario, i.e., the maximum amount of time or space an algorithm may need to solve a problem.
• For example, if an algorithm’s running time is O(n), the running time grows at most linearly with the input size n.
Mathematical Definition
• O(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ f(n) ≤ c·g(n) for all n ≥ n0 }
• Big-O notation is useful when we only have an upper bound on the time complexity of an algorithm. Often we can find an upper bound simply by inspecting the algorithm.
• Examples:
• { 100, log(2000), 10⁴ } belongs to O(1)
• { n/4, 2n+3, n/100 + log(n) } belongs to O(n)
• { n²+n, 2n², n²+log(n) } belongs to O(n²)

Omega Notation (Ω)
• This notation provides a lower bound on the growth rate of an algorithm’s running time or space usage.
• It is typically used for the best-case scenario, i.e., the minimum amount of time or space an algorithm may need to solve a problem.
• For example, if an algorithm’s running time is Ω(n), the running time grows at least linearly with the input size n.
• Ω(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ c·g(n) ≤ f(n) for all n ≥ n0 }
• Examples:
• { n²+n, 2n², n²+log(n) } belongs to Ω(n²)
• { n/4, 2n+3, n/100 + log(n) } belongs to Ω(n)
• { 100, log(2000), 10⁴ } belongs to Ω(1)

Theta Notation (Θ)
• This notation provides both an upper and a lower bound on the growth rate of an algorithm’s running time or space usage.
• It describes the exact order of growth: the amount of time or space the algorithm needs, up to constant factors.
• For example, if an algorithm’s running time is Θ(n), the running time grows exactly linearly with the input size n.

Cont’d…
• Theta notation encloses the function from above and below. Since it bounds the running time from both sides, it is often the notation used when stating an algorithm’s average-case complexity.
• Let g and f be functions from the set of natural numbers to itself.
The function f is said to be Θ(g) if there are constants c1, c2 > 0 and a natural number n0 such that c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0.

Cont’d…
• For example, consider the expression 3n³ + 6n² + 6000 = Θ(n³). Dropping the lower-order terms is always fine, because there is always a value of n beyond which n³ dominates n², irrespective of the constants involved.

Properties of Asymptotic Notations
1. General Properties
• If f(n) is O(g(n)), then a·f(n) is also O(g(n)), where a is a constant.
• Example:
• f(n) = 2n² + 5 is O(n²);
• then 7·f(n) = 7(2n² + 5) = 14n² + 35 is also O(n²).
• Similarly, this property holds for both Θ and Ω notation.
Cont’d…
2. Transitive Properties
• If f(n) is O(g(n)) and g(n) is O(h(n)), then f(n) is O(h(n)).
• Example:
• If f(n) = n, g(n) = n², and h(n) = n³,
• then n is O(n²) and n² is O(n³), so n is O(n³).
• Similarly, this property holds for both Θ and Ω notation.

3. Symmetric Property
• If f(n) is Θ(g(n)), then g(n) is Θ(f(n)).
• Example:
• If f(n) = n² and g(n) = n²,
• then f(n) = Θ(n²) and g(n) = Θ(n²).
• This property holds only for Θ notation.
Example
• Find the time complexity of the following code snippet:

for (i = 0; i < n; i++) {
    cout << i << " ";
    i++;
}

• The loop runs up to n, but i is incremented twice per iteration, which halves the number of iterations. So the time complexity is O(n/2), which is equivalent to O(n).

Example 2
• Find the time complexity of the following code snippet:

for (i = 0; i < n; i++) {
    for (j = 0; j < n; j++) {
        cout << i << " ";
    }
}

• Both the inner loop and the outer loop execute n times. For a single value of i, j loops n times; for n values of i, j loops a total of n·n = n² times. So the time complexity is O(n²).

Data Structures
• A data structure is a form of storage that is used to store and organize data.
• It is a way of arranging data on a computer so that it can be accessed and updated efficiently.
• A data structure is not only used for organizing data; it is also used for processing, retrieving, and storing data.
• Different basic and advanced types of data structures are used in almost every program or software system, so we must have a good knowledge of data structures.

A Heap
• A Heap is a special tree-based data structure in which the tree is a complete binary tree.
• Operations of the Heap data structure:
• Heapify: the process of creating a heap from an array.
• Insertion: inserting an element into an existing heap, with time complexity O(log N).
• Deletion: deleting the top element of the heap (the highest-priority element), then reorganizing the heap and returning the element, with time complexity O(log N).
• Peek: checking or finding the first (top) element of the heap.

Cont’d…
• Generally, heaps can be of two types:
• Max-Heap: in a Max-Heap, the key at the root node must be the greatest among the keys of all of its children. The same property must be recursively true for all subtrees of that binary tree.
• Min-Heap: in a Min-Heap, the key at the root node must be the minimum among the keys of all of its children. The same property must be recursively true for all subtrees of that binary tree.

Hashing
• Hashing is a popular technique for storing and retrieving data as fast as possible.
• The main reason for using hashing is that it gives near-optimal search performance.
• Basic operations:
• HashTable: creates a new hash table.
• Delete: deletes a particular key-value pair from the hash table.
• Get: searches for a key inside the hash table and returns the value associated with that key.
• Put: inserts a new key-value pair into the hash table.
• DeleteHashTable: deletes the hash table.

Set Data Structure
• In computer science, a set data structure is defined as a data structure that stores a collection of distinct elements.
• It is a fundamental data structure used to store and manipulate a group of objects, where each object is unique. The signature property of a set is that it does not allow duplicate elements.

Cont’d…
• A set can be implemented in various ways, but the most common are:
• Hash-based set: the set is represented as a hash table, where each element is stored in a bucket based on its hash code.
• Tree-based set: the set is represented as a binary search tree, where each node represents an element of the set.

Types of Set Data Structure
• The set data structure can be classified into the following two categories:
1. Unordered Set
• An unordered associative container implemented using a hash table, where keys are hashed into indices of the hash table so that insertion is always randomized.
• All operations on the unordered set take constant time, O(1), on average, which can go up to linear time, O(n), in the worst case, depending on the internally used hash function; in practice they perform very well and generally provide constant-time lookup.
2. Ordered Set
• An ordered set is the common set data structure we are familiar with. It is generally implemented using balanced binary search trees, and it supports O(log n) lookup, insertion, and deletion operations.

A Disjoint-Set Data Structure
• A disjoint-set data structure keeps track of a set of elements partitioned into a number of disjoint (non-overlapping) subsets.
• A union-find algorithm performs two useful operations on such a data structure:
• Find: determine which subset a particular element is in. This can be used to determine whether two elements are in the same subset.
• Union: join two subsets into a single subset. First we have to check whether the two elements already belong to the same subset; if they do, there is nothing to join.