Data Structure Unit 3 Notes

The document provides an overview of queues, explaining their FIFO principle, basic terminologies, and types including simple, double-ended, circular, and priority queues. It discusses memory representation using arrays and linked lists, along with operations on circular queues and deques. Additionally, it covers sorting and searching algorithms, emphasizing their importance in data organization and efficiency.

Unit 3

Queue

Queue is a linear data structure that follows the FIFO (First In First Out) principle: the
first element inserted is the first one to be removed.
FIFO Principle in Queue:
FIFO Principle states that the first element added to the Queue will be the first one to be
removed or processed. So, Queue is like a line of people waiting to purchase tickets, where
the first person in line is the first person served. (i.e. First Come First Serve).
Basic Terminologies of Queue
 Front: Position of the entry in a queue ready to be served, that is, the first entry that
will be removed from the queue, is called the front of the queue. It is also referred to as
the head of the queue.
 Rear: Position of the last entry in the queue, that is, the one most recently added, is
called the rear of the queue. It is also referred to as the tail of the queue.
 Size: Size refers to the current number of elements in the queue.
 Capacity: Capacity refers to the maximum number of elements the queue can hold.

The four types of Queue are: Simple Queue, Double-ended queue, Circular Queue
and Priority Queue.

Representation of Queue in Memory

Like Stacks, Queues can also be represented in memory in two ways.

 Using the contiguous memory like an array


 Using the non-contiguous memory like a linked list

Using the Contiguous Memory like an Array

In this representation the queue is implemented using an array. The variables used in this
case are:

 QUEUE- the name of the array storing queue elements.
 FRONT- the index where the first element is stored in the array representing the queue.
 REAR- the index where the last element is stored in the array representing the queue.
 MAX- the maximum number of elements (maximum count) that can be stored in the array
representing the queue.
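
To make this concrete, here is a minimal sketch of an array-based queue using these variables. The function names insert and del, and the fixed capacity MAX = 5, are assumptions made for the example, not part of the notes.

#include <iostream>
using namespace std;

#define MAX 5        // maximum number of elements the array can hold

int QUEUE[MAX];      // the array storing queue elements
int FRONT = -1;      // index of the first element (-1 means empty)
int REAR = -1;       // index of the last element

void insert(int value) {
  if (REAR == MAX - 1) { cout << "Overflow\n"; return; }
  if (FRONT == -1) FRONT = 0;   // first insertion
  QUEUE[++REAR] = value;
}

int del() {
  if (FRONT == -1 || FRONT > REAR) { cout << "Underflow\n"; return -1; }
  return QUEUE[FRONT++];        // serve from the front
}

int main() {
  insert(10); insert(20); insert(30);
  cout << del() << endl;   // prints 10 (first in, first out)
  cout << del() << endl;   // prints 20
}

Note that once FRONT moves past deleted elements, the slots at the beginning of the array are wasted; this is the limitation the circular queue (covered below) solves.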

Using the Non-Contiguous Memory like a Linked List

In this representation the queue is implemented using the dynamic data structure Linked List.
Using a linked list to create a queue makes it flexible in terms of size and storage: you don’t
have to define the maximum number of elements in the queue.
Two pointers (links) store the addresses of nodes for defining a queue:

 FRONT- the address of the first node of the Linked List storing the Queue.
 REAR- the address of the last node of the Linked List storing the Queue.
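
A matching sketch using a linked list follows; the Node structure and the function names enqueue and dequeue are illustrative assumptions.

#include <iostream>
using namespace std;

struct Node {
  int data;
  Node* next;
};

Node* FRONT = nullptr;   // address of the first node
Node* REAR = nullptr;    // address of the last node

void enqueue(int value) {
  Node* n = new Node{value, nullptr};
  if (REAR == nullptr) FRONT = REAR = n;   // queue was empty
  else { REAR->next = n; REAR = n; }       // link after the last node
}

int dequeue() {
  if (FRONT == nullptr) { cout << "Underflow\n"; return -1; }
  Node* n = FRONT;
  int value = n->data;
  FRONT = FRONT->next;
  if (FRONT == nullptr) REAR = nullptr;    // queue became empty
  delete n;
  return value;
}

int main() {
  enqueue(1); enqueue(2); enqueue(3);
  cout << dequeue() << endl;   // prints 1
  cout << dequeue() << endl;   // prints 2
}

Because nodes are allocated as needed, there is no MAX to define.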

Circular Queue

There was one limitation in the array implementation of a queue: if the rear reaches the last
position of the array, vacant spaces may be left at the beginning that cannot be reused. The
concept of the circular queue was introduced to overcome this limitation.

For example, suppose the rear is at the last position of the queue while the front has moved
past index 0, so only two elements are stored and the three positions at the beginning are
empty. If we try to insert another element, the queue reports that there is no empty space,
even though positions are free. One solution to avoid this wastage of memory is to shift all
the elements to the left and adjust the front and rear accordingly, but this is not a
practically good approach because shifting all the elements consumes a lot of time. The
efficient approach to avoid the wastage of memory is to use the circular queue data structure.

What is a Circular Queue?


A circular queue is similar to a linear queue as it is also based on the FIFO (First In First Out)
principle except that the last position is connected to the first position in a circular queue that
forms a circle. It is also known as a Ring Buffer.

Operations on Circular Queue


The following are the operations that can be performed on a circular queue:

o Front: It is used to get the front element from the Queue.


o Rear: It is used to get the rear element from the Queue.
o enQueue(value): This function is used to insert the new value in the Queue. The new
element is always inserted from the rear end.
o deQueue(): This function deletes an element from the Queue. The deletion in a
Queue always takes place from the front end.
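
A minimal sketch of these operations over a fixed array follows; the size of 5 and the modulo index arithmetic are the usual textbook pattern, shown here as an illustration.

#include <iostream>
using namespace std;

#define SIZE 5

int items[SIZE];
int front = -1, rear = -1;

bool isFull()  { return (front == 0 && rear == SIZE - 1) || (front == rear + 1); }
bool isEmpty() { return front == -1; }

void enQueue(int value) {
  if (isFull()) { cout << "Queue is full\n"; return; }
  if (front == -1) front = 0;
  rear = (rear + 1) % SIZE;     // wrap around to index 0 after the last slot
  items[rear] = value;
}

int deQueue() {
  if (isEmpty()) { cout << "Queue is empty\n"; return -1; }
  int value = items[front];
  if (front == rear) front = rear = -1;    // last element removed
  else front = (front + 1) % SIZE;         // wrap around at the end
  return value;
}

int main() {
  enQueue(1); enQueue(2); enQueue(3); enQueue(4); enQueue(5);
  cout << deQueue() << endl;   // prints 1, freeing the slot at index 0
  enQueue(6);                  // reuses that freed slot instead of overflowing
}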

Applications of Circular Queue


The circular Queue can be used in the following scenarios:

o Memory management: The circular queue provides efficient memory management. As we
have already seen, in a linear queue the memory is not managed very efficiently; in a
circular queue, the memory is managed efficiently by placing elements in locations that
would otherwise remain unused.
o CPU Scheduling: The operating system also uses the circular queue to insert processes
and then execute them.
o Traffic system: In a computer-controlled traffic system, the traffic light is one of the
best examples of a circular queue. Each light is switched ON one by one after a fixed
interval of time: the red light is ON for one minute, then the yellow light for one minute,
and then the green light. After the green light, the red light comes ON again.

Deque and Priority Queue

Deque or Double Ended Queue is a type of queue in which insertion and removal of elements
can either be performed from the front or the rear. Thus, it does not follow FIFO rule (First In
First Out).

Types of Deque

 Input Restricted Deque


In this deque, input is restricted at a single end but allows deletion at both the ends.
 Output Restricted Deque
In this deque, output is restricted at a single end but allows insertion at both the ends.

Operations on a Deque
The operations below use the circular array implementation of a deque. In a circular array,
when the end of the array is reached, we wrap around to the beginning.
In a linear array implementation, once the array is full, no more elements can be inserted.
In each of the operations below, if the array is full, an "overflow" message is reported.
Before performing the following operations, these steps are followed:
1. Take an array (deque) of size n.
2. Set two pointers front = -1 and rear = 0.

1. Insert at the Front

This operation adds an element at the front.

1. Check if the deque is full: if (front == 0 && rear == n - 1) || (front == rear + 1), the
insertion operation cannot be performed (overflow condition).
2. If the deque is empty, reinitialize front = 0.
3. Else, if front == 0, reinitialize front = n - 1 (last index), shifting the front to the end.
4. Else, decrease front by 1.
5. Add the new key into array[front].


2. Insert at the Rear

This operation adds an element to the rear.

1. Check if the deque is full: if it is, the insertion operation cannot be performed
(overflow condition).
2. If the deque is empty, reinitialize rear = 0.
3. Else, if rear == n - 1, reinitialize rear = 0 (first index).
4. Else, increase rear by 1.
5. Add the new key into array[rear].

3. Delete from the Front

This operation deletes an element from the front.

1. Check if the deque is empty: if front == -1, deletion cannot be performed (underflow
condition).
2. If the deque has only one element (i.e. front == rear), set front = -1 and rear = -1.
3. Else, if front is at the last index (i.e. front == n - 1), set front = 0.
4. Else, increase the front by 1 (front = front + 1).

4. Delete from the Rear

This operation deletes an element from the rear.

1. Check if the deque is empty: if front == -1, deletion cannot be performed (underflow
condition).
2. If the deque has only one element (i.e. front == rear), set front = -1 and rear = -1.
3. Else, if rear is at the first index (i.e. rear == 0), reinitialize rear = n - 1.
4. Else, decrease the rear by 1 (rear = rear - 1).

5. Check Empty

This operation checks if the deque is empty. If front = -1, the deque is empty.

6. Check Full

This operation checks if the deque is full. If front = 0 and rear = n - 1 OR front = rear + 1,
the deque is full.
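
Putting the steps above together, here is a condensed circular-array deque sketch. The array size N = 5 and the function names are assumptions chosen to match the step descriptions.

#include <iostream>
using namespace std;

const int N = 5;
int arr[N];
int front = -1, rear = 0;

bool isFull()  { return (front == 0 && rear == N - 1) || (front == rear + 1); }
bool isEmpty() { return front == -1; }

void insertFront(int key) {
  if (isFull()) { cout << "Overflow\n"; return; }
  if (front == -1) { front = 0; rear = 0; }     // deque was empty
  else if (front == 0) front = N - 1;           // shift front to the end
  else front = front - 1;
  arr[front] = key;
}

void insertRear(int key) {
  if (isFull()) { cout << "Overflow\n"; return; }
  if (front == -1) { front = 0; rear = 0; }     // deque was empty
  else if (rear == N - 1) rear = 0;             // wrap rear to the start
  else rear = rear + 1;
  arr[rear] = key;
}

int deleteFront() {
  if (isEmpty()) { cout << "Underflow\n"; return -1; }
  int key = arr[front];
  if (front == rear) { front = -1; rear = -1; } // only one element
  else if (front == N - 1) front = 0;
  else front = front + 1;
  return key;
}

int deleteRear() {
  if (isEmpty()) { cout << "Underflow\n"; return -1; }
  int key = arr[rear];
  if (front == rear) { front = -1; rear = -1; } // only one element
  else if (rear == 0) rear = N - 1;
  else rear = rear - 1;
  return key;
}

int main() {
  insertRear(5); insertRear(10);
  insertFront(3);
  cout << deleteFront() << endl;   // prints 3
  cout << deleteRear() << endl;    // prints 10
}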

A priority queue is a type of queue that arranges elements based on their priority values.
 Each element has a priority associated. When we add an item, it is inserted in a position
based on its priority.
 Elements with higher priority are typically retrieved or removed before elements with
lower priority.
 Ascending Order Priority Queue : In this queue, elements with lower values have
higher priority. For example, with elements 4, 6, 8, 9, and 10, 4 will be dequeued first
since it has the smallest value, and the dequeue operation will return 4.
 Descending order Priority Queue : Elements with higher values have higher priority.
The root of the heap is the highest element, and it is dequeued first. The queue adjusts
by maintaining the heap property after each insertion or deletion.
1) Insertion : If the newly inserted item is of the highest priority, then it is inserted at the
top. Otherwise, it is inserted in such a way that it is accessible after all higher priority items
are accessed.
2) Deletion : We typically remove the highest priority item, which is typically available at
the top. Once we remove this item, the next highest priority item moves to the top.
3) Peek : This operation only returns the highest priority item (which is typically available
at the top) and does not make any change to the priority queue.
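
As a short demonstration of these three operations, the sketch below uses the C++ standard library's std::priority_queue, which is backed by a binary heap; using the library container instead of a hand-written heap is a simplification made for illustration.

#include <iostream>
#include <queue>
#include <vector>
using namespace std;

int main() {
  // Max-heap by default: higher values have higher priority
  // (a descending order priority queue).
  priority_queue<int> pq;

  // Insertion: each item is placed so the highest priority stays on top.
  pq.push(4);
  pq.push(10);
  pq.push(6);

  // Peek: return the highest priority item without changing the queue.
  cout << pq.top() << endl;    // prints 10

  // Deletion: remove the top item; the next highest moves to the top.
  pq.pop();
  cout << pq.top() << endl;    // prints 6

  // An ascending order priority queue (lowest value = highest priority)
  // uses greater<int> as the comparator.
  priority_queue<int, vector<int>, greater<int>> minpq;
  minpq.push(9); minpq.push(4); minpq.push(8);
  cout << minpq.top() << endl; // prints 4
}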

Difference between Priority Queue and Normal Queue


In a normal queue there is no priority attached to elements; the rule of first-in-first-out
(FIFO) is followed. In a priority queue, the elements have a priority, and the elements with
higher priority are served first.

Operations on the above structures using array and linked representations

Sorting and searching

Sorting in data structures is the process of arranging data elements in a specific order
(ascending or descending) based on a particular criterion, crucial for efficient searching,
organization, and analysis.

Why Sorting is Important:


 Efficient Searching:
Sorting enables the use of efficient search algorithms like binary search, which significantly
reduces search time complexity from O(n) to O(log n).
 Data Organization:
Sorting helps organize data in a meaningful way, making it easier to understand and
process.
 Data Analysis:
Sorted data is easier to analyze and identify patterns or trends.
 Performance Enhancement:
Sorting is fundamental for various data processing tasks, including ranking, priority-based
retrieval, and merging data.
Common Sorting Algorithms:
 Bubble Sort:
A simple comparison-based algorithm that repeatedly steps through the list, comparing
adjacent elements and swapping them if they are in the wrong order.
 Selection Sort:
Another comparison-based algorithm that repeatedly finds the minimum (or maximum)
element from the unsorted portion of the list and places it at the beginning.
 Insertion Sort:
Builds the sorted array one item at a time, inserting each element into its correct position
within the sorted portion.
 Merge Sort:
A divide-and-conquer algorithm that recursively divides the list into smaller sublists, sorts
them, and then merges them back together.
 Quick Sort:
A divide-and-conquer algorithm that selects a "pivot" element and partitions the list around
it, placing elements smaller than the pivot before it and elements larger than the pivot after
it.
 Heap Sort:
A comparison-based algorithm that uses a heap data structure to efficiently sort the
elements.
 Counting Sort:
A non-comparison-based algorithm that sorts elements by counting the occurrences of each
element.
 Radix Sort:
A non-comparison-based algorithm that sorts elements by their digits or characters, starting
from the least significant digit/character.
 Shell Sort:
An improvement over insertion sort that sorts elements by comparing elements that are far
apart and then gradually reducing the gap between them.
 Bucket Sort:
A sorting algorithm that divides the input data into a set of buckets and sorts each bucket
individually.

Searching in data structures refers to finding a specific element within a collection of data,
and common algorithms include linear search, binary search, interpolation search, and
algorithms using data structures like binary search trees and hash tables.

1. Linear Search (Sequential Search):


 Concept: Examines each element in the data structure one by one until the target element is
found or the end of the structure is reached.
 Data Structures: Suitable for unsorted arrays or linked lists.
 Time Complexity: O(n) in the worst-case scenario.
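
A minimal linear search sketch (the function name and the -1 "not found" convention are assumptions made for the example):

#include <iostream>
using namespace std;

// Returns the index of key in arr[0..size-1], or -1 if not found.
int linearSearch(int arr[], int size, int key) {
  for (int i = 0; i < size; i++)
    if (arr[i] == key) return i;   // examine elements one by one
  return -1;
}

int main() {
  int data[] = {7, 12, 9, 11, 3};
  cout << linearSearch(data, 5, 11) << endl;   // prints 3
}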

2. Binary Search:
 Concept: Efficiently searches sorted arrays or lists by repeatedly dividing the search interval
in half.
 Data Structures: Works best on sorted arrays or lists.
 Time Complexity: O(log n).
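
A minimal iterative binary search sketch; it assumes the array is already sorted in ascending order:

#include <iostream>
using namespace std;

// Returns the index of key in the sorted array arr[0..size-1], or -1.
int binarySearch(int arr[], int size, int key) {
  int lo = 0, hi = size - 1;
  while (lo <= hi) {
    int mid = lo + (hi - lo) / 2;       // midpoint without overflowing lo + hi
    if (arr[mid] == key) return mid;
    if (arr[mid] < key) lo = mid + 1;   // target is in the right half
    else hi = mid - 1;                  // target is in the left half
  }
  return -1;
}

int main() {
  int data[] = {3, 7, 9, 11, 12};
  cout << binarySearch(data, 5, 9) << endl;   // prints 2
}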

3. Interpolation Search:
 Concept: An improvement over binary search, it estimates the position of the target element
based on its value and the values of the first and last elements in the search interval.
 Data Structures: Works best on uniformly distributed data.
 Time Complexity: O(log log n) in the best case, O(n) in the worst case.
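
A sketch of interpolation search; the position estimate uses the standard formula pos = lo + (key - arr[lo]) * (hi - lo) / (arr[hi] - arr[lo]), and the guard conditions are assumptions added to keep the example safe:

#include <iostream>
using namespace std;

// Returns the index of key in the sorted array arr[0..size-1], or -1.
int interpolationSearch(int arr[], int size, int key) {
  int lo = 0, hi = size - 1;
  while (lo <= hi && key >= arr[lo] && key <= arr[hi]) {
    if (arr[hi] == arr[lo])                 // avoid dividing by zero
      return (arr[lo] == key) ? lo : -1;
    // Estimate the position from the key's value, not just the midpoint.
    int pos = lo + (int)((long long)(key - arr[lo]) * (hi - lo)
                         / (arr[hi] - arr[lo]));
    if (arr[pos] == key) return pos;
    if (arr[pos] < key) lo = pos + 1;
    else hi = pos - 1;
  }
  return -1;
}

int main() {
  int data[] = {10, 20, 30, 40, 50};   // uniformly distributed values
  cout << interpolationSearch(data, 5, 40) << endl;   // prints 3
}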
4. Data Structures for Searching:
 Binary Search Trees (BSTs):
 Concept: A tree-based data structure where each node has a value, and the values in the left
subtree are less than or equal to the node's value, while the values in the right subtree are
greater than the node's value.
 Searching: BSTs allow for efficient searching by traversing the tree based on the values of the
nodes.
 Hash Tables:
 Concept: A data structure that uses a hash function to map keys to values, allowing for fast
lookups.
 Searching: Hash tables provide near-constant time (O(1)) for searching, insertion, and deletion
operations, on average.
 Ternary Search Trees:
 Concept: A variation of a binary tree that can store strings and perform prefix searches
efficiently.
 Searching: Ternary search trees allow for fast searching and retrieval of strings.
 Linked Lists:
 Concept: A linear data structure where elements are stored in nodes, and each node contains a
pointer to the next node.
 Searching: Searching in linked lists involves traversing the list sequentially, which can be slow
for large lists.
 Fibonacci Search:
 Concept: Uses the Fibonacci sequence to divide the array into sections and searches for the
target element.
 Data Structures: Primarily used when the data structure prohibits direct access to elements,
such as in distributed data systems.
 Jump Search:
 Concept: A search algorithm that skips a fixed number of elements (block size) at each step
instead of checking elements one by one like Linear Search.
 Data Structures: Designed for sorted arrays.
 Exponential Search:
 Concept: Combines binary search with a preliminary phase that helps find the range where the
target element lies.
 Data Structures: Useful when the array is unbounded or when the size of the array is
unknown.

Selection sort

The Selection Sort algorithm finds the lowest value in an array and moves it to the front of
the array.

The algorithm looks through the array again and again, moving the next lowest values to the
front, until the array is sorted.

How it works:

1. Go through the array to find the lowest value.


2. Move the lowest value to the front of the unsorted part of the array.
3. Go through the array again as many times as there are values in the array.

Continue reading to fully understand the Selection Sort algorithm and how to implement it
yourself.

Before we implement the Selection Sort algorithm in a programming language, let's manually
run through a short array only one time, just to get the idea.

Step 1: We start with an unsorted array.


[ 7, 12, 9, 11, 3]

Step 2: Go through the array, one value at a time. Which value is the lowest? 3, right?

[ 7, 12, 9, 11, 3]

Step 3: Move the lowest value 3 to the front of the array.

[ 3, 7, 12, 9, 11]

Step 4: Look through the rest of the values, starting with 7. 7 is the lowest value, and already
at the front of the array, so we don't need to move it.

[ 3, 7, 12, 9, 11]

Step 5: Look through the rest of the array: 12, 9 and 11. 9 is the lowest value.

[ 3, 7, 12, 9, 11]

Step 6: Move 9 to the front.

[ 3, 7, 9, 12, 11]

Step 7: Looking at 12 and 11, 11 is the lowest.

[ 3, 7, 9, 12, 11]

Step 8: Move it to the front.

[ 3, 7, 9, 11, 12]

Finally, the array is sorted.

// Selection sort in C++
#include <iostream>
using namespace std;

// function to swap the positions of two elements
void swap(int *a, int *b) {
  int temp = *a;
  *a = *b;
  *b = temp;
}

// function to print an array
void printArray(int array[], int size) {
  for (int i = 0; i < size; i++) {
    cout << array[i] << " ";
  }
  cout << endl;
}

void selectionSort(int array[], int size) {
  for (int step = 0; step < size - 1; step++) {
    int min_idx = step;
    for (int i = step + 1; i < size; i++) {
      // Select the minimum element in each loop.
      // To sort in descending order, change < to > in this line.
      if (array[i] < array[min_idx])
        min_idx = i;
    }
    // put min at the correct position
    swap(&array[min_idx], &array[step]);
  }
}

// driver code
int main() {
  int data[] = {20, 12, 10, 15, 2};
  int size = sizeof(data) / sizeof(data[0]);
  selectionSort(data, size);
  cout << "Sorted array in Ascending Order:\n";
  printArray(data, size);
}

Insertion sort

The Insertion Sort algorithm uses one part of the array to hold the sorted
values, and the other part of the array to hold values that are not sorted yet.

The algorithm takes one value at a time from the unsorted part of the array
and puts it into the right place in the sorted part of the array, until the array
is sorted.

How it works:

1. Take the first value from the unsorted part of the array.
2. Move the value into the correct place in the sorted part of the array.
3. Go through the unsorted part of the array again as many times as there
are values.

Continue reading to fully understand the Insertion Sort algorithm and how to
implement it yourself.

Manual Run Through


Before we implement the Insertion Sort algorithm in a programming
language, let's manually run through a short array, just to get the idea.

Step 1: We start with an unsorted array.

[ 7, 12, 9, 11, 3]

Step 2: We can consider the first value as the initial sorted part of the array.
If it is just one value, it must be sorted, right?
[ 7, 12, 9, 11, 3]

Step 3: The next value 12 should now be moved into the correct position in
the sorted part of the array. But 12 is higher than 7, so it is already in the
correct position.

[ 7, 12, 9, 11, 3]

Step 4: Consider the next value 9.

[ 7, 12, 9, 11, 3]

Step 5: The value 9 must now be moved into the correct position inside the
sorted part of the array, so we move 9 in between 7 and 12.

[ 7, 9, 12, 11, 3]

Step 6: The next value is 11.

[ 7, 9, 12, 11, 3]

Step 7: We move it in between 9 and 12 in the sorted part of the array.

[ 7, 9, 11, 12, 3]

Step 8: The last value to insert into the correct position is 3.

[ 7, 9, 11, 12, 3]

Step 9: We insert 3 in front of all other values because it is the lowest value.

[ 3, 7, 9, 11, 12]

Finally, the array is sorted.

// Insertion sort in C++
#include <iostream>
using namespace std;

// Function to print an array
void printArray(int array[], int size) {
  for (int i = 0; i < size; i++) {
    cout << array[i] << " ";
  }
  cout << endl;
}

void insertionSort(int array[], int size) {
  for (int step = 1; step < size; step++) {
    int key = array[step];
    int j = step - 1;

    // Compare key with each element on the left of it until an element
    // smaller than it is found.
    // For descending order, change key < array[j] to key > array[j].
    while (j >= 0 && key < array[j]) {
      array[j + 1] = array[j];
      --j;
    }
    array[j + 1] = key;
  }
}

// Driver code
int main() {
  int data[] = {9, 5, 1, 4, 3};
  int size = sizeof(data) / sizeof(data[0]);
  insertionSort(data, size);
  cout << "Sorted array in ascending order:\n";
  printArray(data, size);
}

Merge sort

The Merge Sort algorithm is a divide-and-conquer algorithm that sorts an


array by first breaking it down into smaller arrays, and then building the
array back together the correct way so that it is sorted.

Divide: The algorithm starts with breaking up the array into smaller and
smaller pieces until one such sub-array only consists of one element.

Conquer: The algorithm merges the small pieces of the array back together
by putting the lowest values first, resulting in a sorted array.

The breaking down and building up of the array to sort the array is done
recursively.

The Merge Sort algorithm can be described like this:

How it works:

1. Divide the unsorted array into two sub-arrays, half the size of the original.
2. Continue to divide the sub-arrays as long as the current piece of the array
has more than one element.
3. Merge two sub-arrays together by always putting the lowest value first.
4. Keep merging until there are no sub-arrays left.

Viewed from a different perspective, the array is split into smaller and smaller pieces until
it is merged back together. As the merging happens, values from each sub-array are compared
so that the lowest value comes first.
Manual Run Through

Let's try to do the sorting manually, just to get an even better understanding
of how Merge Sort works before actually implementing it in a programming
language.

Step 1: We start with an unsorted array, and we know that it splits in half
until the sub-arrays only consist of one element. The Merge Sort function
calls itself two times, once for each half of the array. That means that the
first sub-array will split into the smallest pieces first.

[ 12, 8, 9, 3, 11, 5, 4]
[ 12, 8, 9] [ 3, 11, 5, 4]
[ 12] [ 8, 9] [ 3, 11, 5, 4]
[ 12] [ 8] [ 9] [ 3, 11, 5, 4]

Step 2: The splitting of the first sub-array is finished, and now it is time to
merge. 8 and 9 are the first two elements to be merged. 8 is the lowest
value, so that comes before 9 in the first merged sub-array.

[ 12] [ 8, 9] [ 3, 11, 5, 4]

Step 3: The next sub-arrays to be merged is [ 12] and [ 8, 9]. Values in


both arrays are compared from the start. 8 is lower than 12, so 8 comes
first, and 9 is also lower than 12.
[ 8, 9, 12] [ 3, 11, 5, 4]

Step 4: Now the second big sub-array is split recursively.

[ 8, 9, 12] [ 3, 11, 5, 4]
[ 8, 9, 12] [ 3, 11] [ 5, 4]
[ 8, 9, 12] [ 3] [ 11] [ 5, 4]

Step 5: 3 and 11 are merged back together in the same order as they are
shown because 3 is lower than 11.

[ 8, 9, 12] [ 3, 11] [ 5, 4]

Step 6: Sub-array with values 5 and 4 is split, then merged so that 4 comes
before 5.

[ 8, 9, 12] [ 3, 11] [ 5] [ 4]
[ 8, 9, 12] [ 3, 11] [ 4, 5]

Step 7: The two sub-arrays on the right are merged. Comparisons are done
to create elements in the new merged array:

1. 3 is lower than 4
2. 4 is lower than 11
3. 5 is lower than 11
4. 11 is the last remaining value

[ 8, 9, 12] [ 3, 4, 5, 11]

Step 8: The two last remaining sub-arrays are merged. Let's look at how the
comparisons are done in more detail to create the new merged and finished
sorted array:

3 is lower than 8:

Before [ 8, 9, 12] [ 3, 4, 5, 11]


After: [ 3, 8, 9, 12] [ 4, 5, 11]

Step 9: 4 is lower than 8:

Before [ 3, 8, 9, 12] [ 4, 5, 11]


After: [ 3, 4, 8, 9, 12] [ 5, 11]

Step 10: 5 is lower than 8:

Before [ 3, 4, 8, 9, 12] [ 5, 11]


After: [ 3, 4, 5, 8, 9, 12] [ 11]

Step 11: 8 and 9 are lower than 11:

Before [ 3, 4, 5, 8, 9, 12] [ 11]


After: [ 3, 4, 5, 8, 9, 12] [ 11]

Step 12: 11 is lower than 12:


Before [ 3, 4, 5, 8, 9, 12] [ 11]
After: [ 3, 4, 5, 8, 9, 11, 12]

The sorting is finished!

// Merge sort in C++
#include <iostream>
#include <vector>
using namespace std;

// Merge two subarrays L and M into arr
void merge(int arr[], int p, int q, int r) {
  // Create L ← A[p..q] and M ← A[q+1..r]
  int n1 = q - p + 1;
  int n2 = r - q;

  // Use std::vector to dynamically allocate arrays
  vector<int> L(n1);
  vector<int> M(n2);

  for (int i = 0; i < n1; i++)
    L[i] = arr[p + i];
  for (int j = 0; j < n2; j++)
    M[j] = arr[q + 1 + j];

  // Maintain current index of sub-arrays and main array
  int i = 0, j = 0, k = p;

  // Until we reach the end of either L or M, pick the smaller among the
  // elements of L and M and place it in the correct position at A[p..r]
  while (i < n1 && j < n2) {
    if (L[i] <= M[j]) {
      arr[k] = L[i];
      i++;
    } else {
      arr[k] = M[j];
      j++;
    }
    k++;
  }

  // When we run out of elements in either L or M,
  // pick up the remaining elements and put them in A[p..r]
  while (i < n1) {
    arr[k] = L[i];
    i++;
    k++;
  }

  while (j < n2) {
    arr[k] = M[j];
    j++;
    k++;
  }
}

// Divide the array into two subarrays, sort them and merge them
void mergeSort(int arr[], int l, int r) {
  if (l < r) {
    // m is the point where the array is divided into two subarrays
    int m = l + (r - l) / 2;

    mergeSort(arr, l, m);
    mergeSort(arr, m + 1, r);

    // Merge the sorted subarrays
    merge(arr, l, m, r);
  }
}

// Print the array
void printArray(int arr[], int size) {
  for (int i = 0; i < size; i++)
    cout << arr[i] << " ";
  cout << endl;
}

// Driver program
int main() {
  int arr[] = {6, 5, 12, 10, 9, 1};
  int size = sizeof(arr) / sizeof(arr[0]);
  mergeSort(arr, 0, size - 1);
  cout << "Sorted array: \n";
  printArray(arr, size);
  return 0;
}
Efficiency of sorting methods
Sorting algorithm efficiency is typically judged by time complexity. Algorithms like Merge
Sort, Quick Sort, and Heap Sort are generally considered efficient, offering O(n log n)
average time complexity, while simpler algorithms like Bubble Sort and Selection Sort have
O(n^2) time complexity.

Common Sorting Algorithms and Their Efficiency:


Bubble Sort:
Simple to understand, but inefficient for large datasets, with a time complexity of O(n^2) in
most cases.
Selection Sort:
Similar to bubble sort in efficiency, also with O(n^2) time complexity.
Insertion Sort:
Relatively efficient for small or nearly sorted datasets, with O(n) time complexity in the
best case and O(n^2) in the worst case.
Merge Sort:
A divide-and-conquer algorithm known for its efficiency, with a time complexity of O(n
log n) in all cases.
Quick Sort:
Another efficient algorithm, often faster in practice than Merge Sort, with an average time
complexity of O(n log n), but a worst-case complexity of O(n^2).
Heap Sort:
Uses a heap data structure to achieve a time complexity of O(n log n) in all cases.
Counting Sort:
An efficient algorithm for sorting integers, with a time complexity of O(n+k), where k is
the range of the input values.
Radix Sort:
A non-comparison-based sorting algorithm that sorts elements based on their digits or
characters, with a time complexity of O(n*k).
Bucket Sort:
Divides the input into buckets and sorts each bucket individually, with an average time
complexity of O(n).
Timsort:
A hybrid sorting algorithm used in Python and Java, known for its efficiency and stability.
Factors Affecting Efficiency:
 Time Complexity: How the execution time grows as the input size increases.
 Space Complexity: How much memory the algorithm requires.
 Input Data: Some algorithms perform better on specific types of data (e.g., nearly sorted
data for insertion sort).
 Algorithm Implementation: The way an algorithm is implemented can also affect its
performance.

Big O notations
Big O notation is a mathematical notation used in computer science to describe the limiting
behavior of a function as its argument tends towards a particular value or infinity. It is
used to analyze the efficiency of algorithms by focusing on how their time and space
requirements grow as the input size grows.

What it is:
 Mathematical Notation:
Big O notation, also known as asymptotic notation, provides a way to classify algorithms
based on how their runtime or space requirements grow with increasing input size.
 Focus on Growth Rate:
It doesn't concern itself with exact execution time or memory usage, but rather the rate at
which these resources scale.
 Upper Bound:
Big O notation typically represents the worst-case scenario, providing an upper bound on
the algorithm's performance.
 Ignoring Constants and Lower-Order Terms:
It focuses on the dominant term in the complexity expression, discarding constants and
lower-order terms.
Why it's important:
 Algorithm Comparison:
Big O notation allows developers to compare the efficiency of different algorithms and
choose the most optimal one for a given task.
 Performance Optimization:
By understanding the time and space complexity of algorithms, developers can identify
potential bottlenecks and optimize code for better performance.
 Predicting Scalability:
It helps predict how an algorithm's performance will degrade as the input size increases.
Common Big O Notations:
 O(1) (Constant Time): The algorithm's runtime/space remains constant regardless of the
input size.
 O(log n) (Logarithmic Time): The runtime/space grows proportionally to the logarithm of
the input size.
 O(n) (Linear Time): The runtime/space grows proportionally to the input size.
 O(n log n): The runtime/space grows proportionally to the product of the input size and its
logarithm.
 O(n^2) (Quadratic Time): The runtime/space grows proportionally to the square of the
input size.
 O(2^n) (Exponential Time): The runtime/space grows exponentially with the input size.
Examples:
 Linear Search: Finding an element in an unsorted array (O(n) - worst case).
 Binary Search: Finding an element in a sorted array (O(log n) - worst case).
 Bubble Sort: Sorting an array (O(n^2) - worst case).
 Merge Sort: Sorting an array (O(n log n) - worst case).
Hash tables

A Hash Table is a data structure designed to be fast to work with.

The reason Hash Tables are sometimes preferred instead of arrays or linked lists is because
searching for, adding, and deleting data can be done really quickly, even for large amounts of
data.

In a Linked List, finding a person "Bob" takes time because we would have to go from one
node to the next, checking each node, until the node with "Bob" is found.

And finding "Bob" in an Array could be fast if we knew the index, but when we only know
the name "Bob", we need to compare each element (like with Linked Lists), and that takes
time.

With a Hash Table however, finding "Bob" is done really fast because there is a way to go
directly to where "Bob" is stored, using something called a hash function.

Building A Hash Table from Scratch

To get the idea of what a Hash Table is, let's try to build one from scratch, to store unique
first names inside it.

We will build the Hash Set in 5 steps:


1. Starting with an array.
2. Storing names using a hash function.
3. Looking up an element using a hash function.
4. Handling collisions.
5. The basic Hash Set code example and simulation.

Step 1: Starting with an array

Using an array, we could store names like this:

my_array = ['Pete', 'Jones', 'Lisa', 'Bob', 'Siri']

To find "Bob" in this array, we need to compare each name, element by element, until we
find "Bob".

If the array was sorted alphabetically, we could use Binary Search to find a name quickly, but
inserting or deleting names in the array would mean a big operation of shifting elements in
memory.

To make interacting with the list of names really fast, let's use a Hash Table for this instead,
or a Hash Set, which is a simplified version of a Hash Table.

To keep it simple, let's assume there are at most 10 names in the list, so the array must be a
fixed size of 10 elements. When talking about Hash Tables, each of these elements is called
a bucket.

my_hash_set = [None,None,None,None,None,None,None,None,None,None]

Step 2: Storing names using a hash function

Now comes the special way we interact with the Hash Set we are making.

We want to store a name directly into its right place in the array, and this is where the hash
function comes in.

A hash function can be made in many ways; it is up to the creator of the Hash Table. A
common way is to convert the value into a number that equals one of the Hash
Set's index numbers, in this case a number from 0 to 9. In our example we will use the
Unicode number of each character, sum them, and do a modulo 10 operation to get
index numbers 0-9.

Example

def hash_function(value):
    sum_of_chars = 0
    for char in value:
        sum_of_chars += ord(char)
    return sum_of_chars % 10

print("'Bob' has hash code:", hash_function('Bob'))

The character "B" has Unicode code point 66, "o" has 111, and "b" has 98. Adding those
together we get 275. Modulo 10 of 275 is 5, so "Bob" should be stored as an array element at
index 5.

The number returned by the hash function is called the hash code.

Unicode number: Everything in our computers is stored as numbers, and the Unicode code
point is a unique number that exists for every character. For example, the character A has
Unicode number (also called Unicode code point) 65.

Modulo: A mathematical operation, written as % in most programming languages
(or mod in mathematics). A modulo operation divides a number by another number
and gives us the resulting remainder. So, for example, 7 % 3 gives us the remainder 1.
(Dividing 7 apples between 3 people means that each person gets 2 apples, with 1 apple to
spare.)

After storing "Bob" where the hash code tells us (index 5), our array now looks like this:

my_hash_set = [None,None,None,None,None,'Bob',None,None,None,None]

We can use the hash function to find out where to store the other names "Pete", "Jones",
"Lisa", and "Siri" as well.

After using the hash function to store those names in the correct position, our array looks like
this:

my_hash_set = [None,'Jones',None,'Lisa',None,'Bob',None,'Siri','Pete',None]

Step 3: Looking up a name using a hash function

We have now established a super basic Hash Set, because we do not have to check the array
element by element anymore to find out if "Pete" is in there, we can just use the hash function
to go straight to the right element!

To find out if "Pete" is stored in the array, we give the name "Pete" to our hash function, we
get back hash code 8, we go directly to the element at index 8, and there he is. We found
"Pete" without checking any other elements.

Example

my_hash_set = [None, 'Jones', None, 'Lisa', None, 'Bob', None, 'Siri', 'Pete', None]

def hash_function(value):
    sum_of_chars = 0
    for char in value:
        sum_of_chars += ord(char)
    return sum_of_chars % 10

def contains(name):
    index = hash_function(name)
    return my_hash_set[index] == name

print("'Pete' is in the Hash Set:", contains('Pete'))

When deleting a name from our Hash Set, we can also use the hash function to go straight to
where the name is, and set that element value to None.

Step 4: Handling collisions

Let's also add "Stuart" to our Hash Set.

We give "Stuart" to our hash function, and we get the hash code 3, meaning "Stuart" should
be stored at index 3.

Trying to store "Stuart" creates what is called a collision, because "Lisa" is already stored at
index 3.

To fix the collision, we can make room for more elements in the same bucket, and solving the
collision problem in this way is called chaining. We can give room for more elements in the
same bucket by implementing each bucket as a linked list, or as an array.

After implementing each bucket as an array, to give room for potentially more than one name
in each bucket, "Stuart" can also be stored at index 3, and our Hash Set now looks like this:

my_hash_set = [
    [None],
    ['Jones'],
    [None],
    ['Lisa', 'Stuart'],
    [None],
    ['Bob'],
    [None],
    ['Siri'],
    ['Pete'],
    [None]
]
Searching for "Stuart" in our Hash Set now means that using the hash function we end up
directly in bucket 3, but then be must first check "Lisa" in that bucket, before we find "Stuart"
as the second element in bucket 3.

Step 5: Hash Set code example and simulation

To complete our very basic Hash Set code, let's have functions for adding and searching for
names in the Hash Set, which is now a two dimensional array.

Run the code example below, and try it with different values to get a better understanding of
how a Hash Set works.

Example

my_hash_set = [
    [None],
    ['Jones'],
    [None],
    ['Lisa'],
    [None],
    ['Bob'],
    [None],
    ['Siri'],
    ['Pete'],
    [None]
]

def hash_function(value):
    return sum(ord(char) for char in value) % 10

def add(value):
    index = hash_function(value)
    bucket = my_hash_set[index]
    if value not in bucket:
        bucket.append(value)

def contains(value):
    index = hash_function(value)
    bucket = my_hash_set[index]
    return value in bucket

add('Stuart')
print(my_hash_set)
print('Contains Stuart:', contains('Stuart'))


Uses of Hash Tables

Hash Tables are great for:


 Checking if something is in a collection (like finding a book in a library).
 Storing unique items and quickly finding them (like storing phone numbers).
 Connecting values to keys (like linking names to phone numbers).

The most important reason why Hash Tables are great for these things is that Hash Tables are
very fast compared to Arrays and Linked Lists, especially for large sets. Arrays and Linked
Lists have time complexity O(n) for search and delete, while Hash Tables have
just O(1) on average.
Hash Set vs. Hash Map

A Hash Table can be a Hash Set or a Hash Map. The next two pages describe these data
structures in more detail.

Here's how Hash Sets and Hash Maps are different and similar:

Uniqueness and storage: In a Hash Set, every element is a unique key. In a Hash Map, every
entry is a key-value pair, with a key that is unique and a value connected to it.

Use case: A Hash Set is used for checking if an element is in the set, like checking if a
name is on a guest list. A Hash Map is used for finding information based on a key, like
looking up who owns a certain telephone number.

Is it fast to search, add and delete elements? Yes, average O(1) for both.

Is there a hash function that takes the key, generates a hash code, and that is the bucket
where the element is stored? Yes, for both.

Hash Table elements are stored in storage containers called buckets.

Every Hash Table element has a part that is unique that is called the key.

A hash function takes the key of an element to generate a hash code.


The hash code says what bucket the element belongs to, so now we can go directly to that
Hash Table element: to modify it, or to delete it, or just to check if it exists. Specific hash
functions are explained in detail on the next two pages.

A collision happens when two Hash Table elements have the same hash code, because that
means they belong to the same bucket. A collision can be solved in two ways.

Chaining is the way collisions are solved in this tutorial, by using arrays or linked lists to
allow more than one element in the same bucket.

Open Addressing is another way to solve collisions. With open addressing, if we want to
store an element but there is already an element in that bucket, the element is stored in the
next available bucket. This can be done in many different ways, but we will not explain open
addressing any further here.

Hashing techniques
There are numerous hashing algorithms, each with distinct advantages and disadvantages.
The most popular algorithms include the following:

o MD5: A widely used hashing algorithm that produces a 128-bit hash value.
o SHA-1: A popular hashing algorithm that produces a 160-bit hash value.
o SHA-256: A more secure hashing algorithm that produces a 256-bit hash value.
What is MD5?
MD5 (message-digest algorithm) is a cryptographic protocol used for authenticating
messages as well as content verification and digital signatures. MD5 is based on a hash
function that verifies that a file you sent matches the file received by the person you sent it to.
Previously, MD5 was used for data encryption, but now it’s used primarily for authentication.
How does MD5 work?
MD5 runs entire files through a mathematical hashing algorithm to generate a signature that
can be matched with an original file. That way, a received file can be authenticated as
matching the original file that was sent, ensuring that the right files get where they need to
go.
The MD5 hashing algorithm converts data into a string of 32 characters. For example, the
word “frog” always generates this hash: 938c2cc0dcc05f2b68c4287040cfcf71. Similarly, a
file of 1.2 GB also generates a hash with the same number of characters. When you send that
file to someone, their computer authenticates its hash to ensure it matches the one you sent.
If you change just one bit in a file, no matter how large the file is, the hash output will be
completely and irreversibly changed. Nothing less than an exact copy will pass the MD5 test.
What is MD5 used for?
MD5 is primarily used to authenticate files. It’s much easier to use the MD5 hash to check a
copy of a file against an original than to check bit by bit to see if the two copies match.
MD5 was once used for data security and encryption, but these days its primary use is
authentication. Because a hacker can create a file that has the exact same hash as an entirely
different file, MD5 is not secure in the event that someone tampers with a file. But if you’re
simply copying a file from one place to another, MD5 will do the job.
How is an MD5 hash calculated?
The MD5 hashing algorithm uses a complex mathematical formula to create a hash. It
converts data into blocks of specific sizes and manipulates that data a number of times. While
this is happening, the algorithm adds a unique value into the calculation and converts the
result into a small signature or hash.
MD5 algorithm steps are incredibly complex for a reason — you cannot reverse this process
and generate the original file from the hash. But the same input will always produce the same
output, also known as the MD5 sum, hash, or the checksum. That’s what makes them so
useful for data validation.
An MD5 hash example looks like this: 0cc175b9c0f1b6a831c399e269772661. That’s the
hash for the letter “a.”
SHA-1 (Secure Hash Algorithm 1) is a cryptographic hash function that produces a 160-bit
hash value (message digest) from an input message of any size, designed by the NSA and
published by NIST. It is a one-way function, meaning it's computationally infeasible to derive
the original message from its hash value.

 Purpose:
SHA-1 is used for verifying data integrity and authenticity, ensuring that data hasn't been
tampered with.
 How it works:
 Input: SHA-1 takes an input message (data) of any length, up to 2^64 bits.
 Padding: The input message is padded to make its length a multiple of 512 bits.
 Processing: The padded message is processed in 80 rounds using a series of logical functions
and bit operations.
 Output: The final hash value is a 160-bit (20-byte) hash digest.
 Weaknesses:
While SHA-1 was once considered secure, it has been found to have vulnerabilities,
including collision attacks, meaning that different inputs can produce the same hash value.
 Alternatives:
Due to these weaknesses, SHA-1 is no longer considered secure for most cryptographic
applications, and newer algorithms like SHA-256 and SHA-3 are recommended.
 Applications:
SHA-1 was widely used in security protocols like TLS, SSL, PGP, SSH, IPsec, and
S/MIME, but its use is now declining.

SHA-256 is a cryptographic hash function that produces a 256-bit (32-byte) hash value from
any input data, widely used for data integrity verification, digital signatures, and blockchain
technology.

Key Features:
 Cryptographic Hash Function:
SHA-256, part of the SHA-2 family, is a cryptographic hash function, meaning it takes an
input (data of any length) and produces a fixed-size output (256-bit hash).
 One-Way Function:
It's designed to be computationally infeasible to reverse the process, meaning you can't
determine the original input from the hash value.
 Fixed-Size Output:
Regardless of the input size, the SHA-256 algorithm always produces a 256-bit hash value.
 Security:
SHA-256 is considered a secure hash function, resistant to collision attacks (where two
different inputs produce the same hash).
 Applications:
 Data Integrity: Verifies that data hasn't been tampered with or corrupted.
 Digital Signatures: Ensures the authenticity and integrity of digital documents.
 Blockchain Technology: Used in cryptocurrencies like Bitcoin for securing transactions and
maintaining the integrity of the blockchain.
 SSL/TLS Certificates: Used to secure web communications by verifying the integrity of
certificates.
 Password Hashing: While not recommended for direct password storage due to its speed,
SHA-256 is sometimes used in combination with other techniques like salting and key
stretching.
How it works:
 SHA-256 takes an input message and processes it through a series of mathematical
operations, including bitwise operations, addition, and shifting.
 These operations are performed in rounds, with the output of each round feeding into the
next.
 The final output is a 256-bit hash value, which is a unique "fingerprint" of the input data.
Why is it important?
 Data Integrity:
By comparing the hash of a file or message before and after transmission or storage, you
can verify that it hasn't been altered.
 Security:
SHA-256 helps protect sensitive data by ensuring its authenticity and integrity, making it
difficult for attackers to tamper with or forge data.
 Blockchain Technology:
SHA-256 is a cornerstone of blockchain technology, ensuring the security and immutability
of transactions.
Collision resolution techniques
When two items hash to the same slot, we must have a systematic method for placing the
second item in the hash table. This process is called collision resolution. As we stated earlier,
if the hash function is perfect, collisions will never occur.

When one or more hash values compete with a single hash table slot, collisions occur. To
resolve this, the next available empty slot is assigned to the current hash value. The most
common methods are open addressing, chaining, probabilistic hashing, perfect hashing and
coalesced hashing techniques.

a) Chaining:
This technique implements a linked list and is the most popular of all the collision
resolution techniques. For example, if three items {50, 85, 92} hash to the same slot, the
slot keeps the first item and a linked list is attached to hold the other two items {85, 92}.
When you use the chaining technique, the insertion or deletion of items in the hash table is
fairly simple and high-performing. Likewise, a chained hash table inherits the pros and cons
of a linked list. Alternatively, chaining can use dynamic arrays instead of linked lists.

b) Open Addressing:
This technique depends on space usage and can be done with linear or quadratic probing
techniques. As the name says, this technique tries to find an available slot to store the
record. It can be done in one of 3 ways:

 Linear probing – here, the probe interval is fixed to 1. It offers the best caching but
suffers badly from clustering.
 Quadratic probing – the probe distance is calculated based on a quadratic
equation. This is considerably a better option as it balances clustering and caching.
 Double hashing – here, the probing interval is fixed for each record by a second
hashing function. This technique has poor cache performance although it does not
have any clustering issues.
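
A minimal linear-probing sketch (the table size, the hash function, and the -1 empty-slot marker are assumptions made for illustration):

#include <iostream>
using namespace std;

const int TABLE_SIZE = 10;
int table_[TABLE_SIZE];          // -1 marks an empty slot

int hashFn(int key) { return key % TABLE_SIZE; }

void insertKey(int key) {
  int idx = hashFn(key);
  // Probe interval of 1: step to the next slot until a free one is found.
  for (int i = 0; i < TABLE_SIZE; i++) {
    int probe = (idx + i) % TABLE_SIZE;
    if (table_[probe] == -1) { table_[probe] = key; return; }
  }
  cout << "Table full\n";
}

bool searchKey(int key) {
  int idx = hashFn(key);
  for (int i = 0; i < TABLE_SIZE; i++) {
    int probe = (idx + i) % TABLE_SIZE;
    if (table_[probe] == key) return true;
    if (table_[probe] == -1) return false;   // empty slot: key is absent
  }
  return false;
}

int main() {
  for (int i = 0; i < TABLE_SIZE; i++) table_[i] = -1;
  insertKey(15);   // hashes to slot 5
  insertKey(25);   // also hashes to slot 5, so it probes into slot 6
  cout << boolalpha << searchKey(25) << endl;   // prints true
}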
Some further hashing techniques that can help in resolving collisions:

c) Probabilistic hashing:
This is memory-based hashing that implements caching. When a collision occurs, either the
old record is replaced by the new or the new record may be dropped. Although this scenario
has a risk of losing data, it is still preferred due to its ease of implementation and high
performance.

d) Perfect hashing:
When the slots are uniquely mapped, the chances of collision are minimal. However, it can
be done where there is a lot of spare memory.

e) Coalesced hashing:
This technique is a combination of the open addressing and chaining methods. A chain of
items is stored in the table when there is a collision; the next available table space is
used to store the items to prevent further collisions.
