
COMP302TH: Data Structure and File Processing

Unit-I
Basic Data Structures: Abstract data structures- stacks, queues, linked
lists and binary trees. Binary trees, balanced trees.

Unit-II
Searching: Internal and external searching, Memory Management:
Garbage collection algorithms for equal sized blocks, storage allocation
for objects with mixed size.

Unit-III
Physical Devices: Characteristics of storage devices such as disks and
tapes, I/O buffering. Basic File System Operations: Create, open, close,
extend, delete, read-block, write-block, protection mechanisms.

Unit-IV
File Organizations: Sequential, indexed sequential, direct, inverted,
multi-list, directory systems, Indexing using B-tree, B+ tree.

Books Recommended:

1. M.T. Goodrich, R. Tamassia and D. Mount, “Data Structures and Algorithms in C++”, John Wiley and Sons, Inc., 2004.
2. T.H. Cormen, C.E. Leiserson, R.L. Rivest and C. Stein, “Introduction to Algorithms”, 2nd Ed., Prentice-Hall of India, 2006.
3. Robert L. Kruse and A.J. Ryba, “Data Structures and Program Design in C++”, Prentice Hall, Inc., NJ, 1998.
4. B. Stroustrup, “The C++ Programming Language”, Addison Wesley, 2004.
5. D.E. Knuth, “Fundamental Algorithms (Vol. I)”, Addison Wesley, 1997.

UNIT-1
Abstract data type in data structure
Before learning about the abstract data type, we should first know what a data
structure is.

What is a data structure?


A data structure is a technique of organizing the data so that the data can be utilized
efficiently. There are two ways of viewing the data structure:

o Mathematical/Logical/Abstract model (view): The data structure is a way of
organizing the data according to some protocols or rules. Modeling these rules
forms the logical/abstract view.
o Implementation: The second part is the implementation, where the rules are
realized in some programming language.

Why data structure?


The following are the advantages of using the data structure:

o These are the essential ingredients used for creating fast and powerful algorithms.
o They help us to manage and organize the data.
o Data structures make the code cleaner and easier to understand.
What is abstract data type?
An abstract data type is an abstraction of a data structure that provides only the
interface to which the data structure must adhere. The interface does not give any
specific details about how something should be implemented or in which
programming language.

In other words, we can say that abstract data types are entities defined by their
data and operations but without implementation details. In this case, we know the
data that we are storing and the operations that can be performed on the data, but
we don't know about the implementation details. The reason for not having
implementation details is that every programming language has a different
implementation strategy; for example, a C data structure is implemented using
structures, while a C++ data structure is implemented using classes and objects.

For example, a List is an abstract data type that can be implemented using a dynamic
array or a linked list. A queue can be implemented as a linked-list-based queue, an
array-based queue, or a stack-based queue. A Map can be implemented using a tree map,
a hash map, or a hash table.
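
As an illustration, a List ADT can be written down as a pure interface in C++ (a minimal sketch; the class and method names are illustrative, not from the syllabus). Any concrete class, array-based or linked-list-based, can supply the implementation details:

// A List ADT as an abstract class: it fixes WHAT operations exist,
// while a dynamic-array or linked-list subclass decides HOW.
class IntList
{
public:
    virtual ~IntList() {}
    virtual void insert(int position, int value) = 0;  // add value at position
    virtual void remove(int position) = 0;             // delete value at position
    virtual int  get(int position) const = 0;          // read value at position
    virtual int  size() const = 0;                     // number of stored elements
};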

Abstract data type model


Before knowing about the abstract data type model, we should know about abstraction
and encapsulation.

Abstraction: It is a technique of hiding the internal details from the user and
showing only the necessary details to the user.

Encapsulation: It is a technique of combining the data and the member functions
into a single unit.
The above figure shows the ADT model. There are two types of functions in the ADT
model: public functions and private functions. The ADT model also contains the data
structures that we are using in a program. In this model, encapsulation is performed
first, i.e., all the data is wrapped in a single unit, the ADT. Then abstraction is
performed, i.e., the operations that can be performed on the data structure are
shown, together with the data structures used in the program.

Let's understand the abstract data type with a real-world example.

Consider a smartphone. We look at its high-level specifications, such as:

o 4 GB RAM
o 2.2 GHz Snapdragon processor
o 5-inch LCD screen
o Dual camera
o Android 8.0

The above specifications of the smartphone are the data, and we can also perform the
following operations on the smartphone:

o call(): We can make a call through the smartphone.
o text(): We can text a message.
o photo(): We can take a photo.
o video(): We can record a video.

The smartphone is an entity whose data (specifications) and operations are given
above. Together, this data and these operations form the abstract/logical view of
the smartphone.

What is Stack? Explain three different applications of stacks with the help of
examples. (HPU BCA question paper)
Or
What is stack in data structure? Explain the working of stack through different
operations performed on the stack.
What is a Stack?
A Stack is a linear data structure that follows the LIFO (Last-In-First-Out) principle.
A stack has only one open end, whereas a queue has two ends (front and rear). It
contains only one pointer, the top pointer, which points to the topmost element of the
stack. Whenever an element is added to the stack, it is added on the top, and an
element can be deleted only from the top. In other words, a stack can be defined as a
container in which insertion and deletion are done from one end, known as the top of
the stack.

Some key points related to stack


o It is called a stack because it behaves like a real-world stack, e.g., a pile of books.
o A Stack is an abstract data type with a pre-defined capacity, which means that it can
store only a limited number of elements.
o It is a data structure that follows some order to insert and delete the elements, and that
order can be described as LIFO or FILO (First-In-Last-Out).
Working of Stack
Stack works on the LIFO pattern. As we can observe in the below figure, there are five
memory blocks in the stack; therefore, the size of the stack is 5.

Suppose we want to store elements in a stack, and let's assume the stack is empty. We
have taken a stack of size 5, as shown below, in which we push the elements one by one
until the stack becomes full.

Our stack is now full, since the size of the stack is 5. We can observe that the stack
gets filled up from the bottom to the top.

When we perform a delete operation on the stack, there is only one way for entry and
exit, as the other end is closed. The stack follows the LIFO pattern, which means that
the value entered first will be removed last. In the above case, the value 5 was
entered first, so it will be removed only after the deletion of all the other elements.

Standard Stack Operations


The following are some common operations implemented on the stack:

o push(): When we insert an element into a stack, the operation is known as a push. If
the stack is full, then the overflow condition occurs.
o pop(): When we delete an element from the stack, the operation is known as a pop. If
the stack is empty, meaning that no element exists in the stack, this state is known as
an underflow state.
o isEmpty(): It determines whether the stack is empty or not.
o isFull(): It determines whether the stack is full or not.
o peek(): It returns the topmost element of the stack without removing it.
o count(): It returns the total number of elements available in a stack.
o change(): It changes the element at the given position.
o display(): It prints all the elements available in the stack.

PUSH operation
The steps involved in the PUSH operation are given below:

o Before inserting an element into a stack, we check whether the stack is full.
o If we try to insert an element when the stack is full, then the overflow condition
occurs.
o When we initialize a stack, we set the value of top to -1 to indicate that the stack is
empty.
o When a new element is pushed onto the stack, first the value of top is incremented,
i.e., top = top + 1, and the element is placed at the new position of top.
o Elements are inserted until we reach the maximum size of the stack.
POP operation
The steps involved in the POP operation are given below:

o Before deleting an element from the stack, we check whether the stack is empty.
o If we try to delete an element from an empty stack, then the underflow condition
occurs.
o If the stack is not empty, we first access the element pointed to by top.
o Once the pop operation is performed, top is decremented by 1, i.e., top = top - 1.
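
The PUSH and POP steps above can be sketched in C++ as follows; this is a minimal array-based stack for illustration, where the capacity MAX and the variable names are assumptions:

#include <iostream>
using namespace std;

#define MAX 5            // capacity of the stack (assumed for illustration)

int stack_arr[MAX];
int top = -1;            // top = -1 means the stack is empty

void push(int value)
{
    if (top == MAX - 1)              // stack full: overflow condition
        cout << "Overflow\n";
    else
        stack_arr[++top] = value;    // increment top, then place the element
}

void pop()
{
    if (top == -1)                   // stack empty: underflow condition
        cout << "Underflow\n";
    else
        cout << "Popped " << stack_arr[top--] << "\n";  // access, then decrement top
}

int main()
{
    push(10); push(20); push(30);
    pop();                           // removes 30, the last element pushed (LIFO)
    return 0;
}
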
Applications of Stack
The following are the applications of the stack:

o Balancing of symbols: Stack is used for balancing symbols. For example, we have the
following program:

#include <iostream>
using namespace std;

int main()
{
    cout << "Hello";
    cout << "javaTpoint";
    return 0;
}

As we know, each program has opening and closing braces; when an opening brace comes,
we push it onto a stack, and when a closing brace appears, we pop the opening brace
from the stack. Therefore, the net count comes out to be zero. If any symbol is left in
the stack, it means there is a syntax error in the program (a sketch of such a checker
is given after this list).

o String reversal: Stack is also used for reversing a string. For example, we want to reverse
a "javaTpoint" string, so we can achieve this with the help of a stack.
First, we push all the characters of the string in a stack until we reach the null character.
After pushing all the characters, we start taking out the character one by one until we
reach the bottom of the stack.
o UNDO/REDO: It can also be used for performing UNDO/REDO operations. For example,
we have an editor in which we write 'a', then 'b', and then 'c'; therefore, the text written in
an editor is abc. So, there are three states, a, ab, and abc, which are stored in a stack.
There would be two stacks in which one stack shows UNDO state, and the other shows
REDO state.
If we want to perform UNDO operation, and want to achieve 'ab' state, then we
implement pop operation.
o Recursion: The recursion means that the function is calling itself again. To maintain the
previous states, the compiler creates a system stack in which all the previous records of
the function are maintained.
o DFS(Depth First Search): This search is implemented on a Graph, and Graph uses the
stack data structure.
o Backtracking: Suppose we have to create a path to solve a maze problem. If, while
moving along a particular path, we realize that we have come the wrong way, then in
order to return to the beginning of the path and create a new path, we have to use the
stack data structure.
o Expression conversion: Stack can also be used for expression conversion. This is one of
the most important applications of stack. The list of expression conversions is given
below:
o Infix to prefix
o Infix to postfix
o Prefix to infix
o Prefix to postfix
o Postfix to infix

o Memory management: The stack manages memory. Memory is assigned in contiguous memory
blocks; this is known as stack memory, because all the variables of a function call are
assigned in it. The memory size assigned to the program is known to the compiler. When
a function is invoked, all its variables are assigned in the stack memory, and when the
function completes its execution, all the variables assigned in the stack are released.
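
As mentioned under balancing of symbols above, here is a sketch of such a checker built on the standard C++ std::stack (a minimal version that checks only (), {} and []):

#include <iostream>
#include <stack>
#include <string>
using namespace std;

// Returns true if every opening bracket has a matching closing bracket.
bool balanced(const string &code)
{
    stack<char> s;
    for (char c : code)
    {
        if (c == '(' || c == '{' || c == '[')
            s.push(c);                       // push opening symbols
        else if (c == ')' || c == '}' || c == ']')
        {
            if (s.empty()) return false;     // closing symbol with no opener
            char open = s.top(); s.pop();    // pop the matching opener
            if ((c == ')' && open != '(') ||
                (c == '}' && open != '{') ||
                (c == ']' && open != '['))
                return false;                // mismatched pair
        }
    }
    return s.empty();                        // leftover symbols mean a syntax error
}

int main()
{
    cout << balanced("int main() { cout<<\"Hello\"; }") << "\n";  // prints 1
    return 0;
}
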
What is Queue ? Discuss its various
applications. (HPU BCA )
What is a Queue?
Queue is the data structure that is similar to the queue in the real world. A queue is a
data structure in which whatever comes first will go out first, and it follows the FIFO
(First-In-First-Out) policy. Queue can also be defined as the list or collection in which the
insertion is done from one end known as the rear end or the tail of the queue, whereas
the deletion is done from another end known as the front end or the head of the
queue.

The real-world example of a queue is the ticket queue outside a cinema hall, where the
person who enters the queue first gets the ticket first, and the person who enters last
gets the ticket last. A similar approach is followed in the queue data structure.

The representation of the queue is shown in the below image -

Now, let's move towards the types of queue.

Types of Queue
There are four different types of queue that are listed as follows -
o Simple Queue or Linear Queue
o Circular Queue
o Priority Queue
o Double Ended Queue (or Deque)

Let's discuss each type of queue.

Simple Queue or Linear Queue


In Linear Queue, an insertion takes place from one end while the deletion occurs from
another end. The end at which the insertion takes place is known as the rear end, and
the end at which the deletion takes place is known as front end. It strictly follows the
FIFO rule.
The major drawback of using a linear queue is that insertion is done only from the rear
end. If the first three elements are deleted from the queue, we cannot insert more
elements even though space is available in the linear queue. In this case, the linear
queue shows the overflow condition, as the rear is pointing to the last element of the
queue.

To know more about the queue in data structure, you can click the link -
https://www.javatpoint.com/data-structure-queue

Circular Queue
In a circular queue, the elements are arranged in a circular fashion: it is similar to
the linear queue except that the last element of the queue is connected to the first
element. It is also known as a ring buffer. The representation of the circular queue is
shown in the below image -

The drawback that occurs in a linear queue is overcome by using the circular queue. If
the empty space is available in a circular queue, the new element can be added in an
empty space by simply incrementing the value of rear. The main advantage of using the
circular queue is better memory utilization.

To know more about the circular queue, you can click the link -
https://www.javatpoint.com/circular-queue

Priority Queue
It is a special type of queue in which every element has a priority associated with it,
and the elements are arranged based on their priority. If some elements occur with the
same priority, they are arranged according to the FIFO principle. The representation of
the priority queue is shown in the below image -
Insertion in priority queue takes place based on the arrival, while deletion in the priority
queue occurs based on the priority. Priority queue is mainly used to implement the CPU
scheduling algorithms.

There are two types of priority queue that are discussed as follows -

o Ascending priority queue - In an ascending priority queue, elements can be inserted in
arbitrary order, but only the smallest element can be deleted first. Suppose an array
with elements 7, 5, and 3 is inserted in that order; the elements are then deleted in
the order 3, 5, 7.
o Descending priority queue - In a descending priority queue, elements can be inserted
in arbitrary order, but only the largest element can be deleted first. Suppose an array
with elements 7, 3, and 5 is inserted in that order; the elements are then deleted in
the order 7, 5, 3.

To learn more about the priority queue, you can click the link -
https://www.javatpoint.com/ds-priority-queue
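
As an illustration of the two orderings described above, the standard C++ std::priority_queue can be used; the element values follow the 7, 5, 3 example in the text:

#include <iostream>
#include <queue>
#include <vector>
#include <functional>
using namespace std;

int main()
{
    // Descending priority queue: the largest element is deleted first.
    priority_queue<int> desc;
    for (int x : {7, 3, 5}) desc.push(x);
    while (!desc.empty()) { cout << desc.top() << " "; desc.pop(); }  // 7 5 3
    cout << "\n";

    // Ascending priority queue: the smallest element is deleted first.
    priority_queue<int, vector<int>, greater<int>> asc;
    for (int x : {7, 5, 3}) asc.push(x);
    while (!asc.empty()) { cout << asc.top() << " "; asc.pop(); }     // 3 5 7
    cout << "\n";
    return 0;
}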

Deque (or, Double Ended Queue)


In a deque or double-ended queue, insertion and deletion can be done at both ends of
the queue, i.e., at the front or the rear. It means that we can insert and delete
elements at both the front and rear ends. A deque can be used as a palindrome checker:
if we read the string from both ends and it is the same, the string is a palindrome.

Deque can be used both as a stack and as a queue, as it allows insertion and deletion
at both ends. A deque can be considered a stack because a stack follows the LIFO
principle, in which insertion and deletion are performed at one end only, and a deque
allows both insertion and deletion at the same end. Used this way, a deque does not
follow the FIFO principle.

The representation of the deque is shown in the below image -

To know more about the deque, you can click the link - https://www.javatpoint.com/ds-
deque

There are two types of deque that are discussed as follows -

o Input restricted deque - As the name implies, in input restricted queue, insertion
operation can be performed at only one end, while deletion can be performed from both
ends.

o Output restricted deque - As the name implies, in output restricted queue, deletion
operation can be performed at only one end, while insertion can be performed from
both ends.

Now, let's see the operations performed on the queue.

Operations performed on queue


The fundamental operations that can be performed on queue are listed as follows -

o Enqueue: The enqueue operation is used to insert an element at the rear end of the
queue. It returns void.
o Dequeue: It performs deletion at the front end of the queue and returns the element
which has been removed from the front end.
o Peek: This is the third operation; it returns the element pointed to by the front
pointer in the queue but does not delete it.
o Queue overflow (isfull): It shows the overflow condition when the queue is completely
full.
o Queue underflow (isempty): It shows the underflow condition when the queue is
empty, i.e., no elements are in the queue.
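
The enqueue and dequeue operations can be sketched in C++ as a circular queue, which also shows how the wasted-space drawback of the linear queue is avoided; the capacity SIZE and the variable names are assumptions for illustration:

#include <iostream>
using namespace std;

#define SIZE 5                             // capacity (assumed for illustration)

int q[SIZE];
int front_ = -1, rear_ = -1;               // -1 means the queue is empty

void enqueue(int value)
{
    if ((rear_ + 1) % SIZE == front_)      // next slot would collide: overflow
        cout << "Queue overflow\n";
    else
    {
        if (front_ == -1) front_ = 0;      // first insertion
        rear_ = (rear_ + 1) % SIZE;        // wrap around the array
        q[rear_] = value;
    }
}

void dequeue()
{
    if (front_ == -1)                      // underflow: nothing to delete
        cout << "Queue underflow\n";
    else
    {
        cout << "Removed " << q[front_] << "\n";
        if (front_ == rear_) front_ = rear_ = -1;  // queue became empty
        else front_ = (front_ + 1) % SIZE;
    }
}

int main()
{
    enqueue(1); enqueue(2); enqueue(3);
    dequeue();                             // removes 1 (FIFO)
    enqueue(4);                            // reuses freed space by wrapping
    return 0;
}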

Linked List
o A linked list can be defined as a collection of objects called nodes that are randomly
stored in memory.
o A node contains two fields, i.e., the data stored at that particular address and a
pointer which contains the address of the next node in memory.
o The last node of the list contains a pointer to null.
Uses of Linked List
o The list is not required to be contiguous in memory. The nodes can reside anywhere in
memory and are linked together to make a list. This achieves optimized utilization of
space.
o The list size is limited only by the memory size and doesn't need to be declared in
advance.
o An empty node cannot be present in a linked list.
o We can store values of primitive types or objects in a singly linked list.

Why use linked list over array?


Till now, we have been using the array data structure to organize a group of elements
stored individually in memory. However, arrays have several advantages and
disadvantages which must be known in order to decide which data structure to use
throughout the program.

Arrays have the following limitations:

1. The size of the array must be known in advance, before using it in the program.
2. Increasing the size of the array is a time-taking process. It is almost impossible to
expand the size of the array at run time.
3. All the elements of the array need to be stored contiguously in memory. Inserting an
element into the array requires shifting all the elements that follow it.

Linked list is the data structure which can overcome all the limitations of an array. Using
linked list is useful because,

1. It allocates the memory dynamically. All the nodes of linked list are non-contiguously
stored in the memory and linked together with the help of pointers.
2. Sizing is no longer a problem since we do not need to define the size at the time of
declaration. The list grows as per the program's demand and is limited only by the
available memory space.

Singly linked list or One way chain


A singly linked list can be defined as a collection of an ordered set of elements. The
number of elements may vary according to the needs of the program. A node in a singly
linked list consists of two parts: a data part and a link part. The data part of the
node stores the actual information to be represented by the node, while the link part
stores the address of its immediate successor.


A one-way chain or singly linked list can be traversed in only one direction. In other
words, each node contains only a next pointer; therefore, we cannot traverse the list
in the reverse direction.

Consider an example where the marks obtained by the student in three subjects are
stored in a linked list as shown in the figure.
In the above figure, the arrows represent the links. The data part of every node
contains the marks obtained by the student in a different subject. The last node in the
list is identified by the null pointer present in the address part of the last node. We
can have as many elements as we require in the data part of the list.

Complexity

For a singly linked list:

o Access: θ(n) average, O(n) worst case
o Search: θ(n) average, O(n) worst case
o Insertion: θ(1) average, O(1) worst case
o Deletion: θ(1) average, O(1) worst case
o Space complexity: O(n) worst case

Operations on Singly Linked List


There are various operations which can be performed on singly linked list. A list of all
such operations is given below.

Node Creation
struct node
{
    int data;              /* data stored at this node */
    struct node *next;     /* address of the next node */
};
struct node *head, *ptr;
ptr = (struct node *)malloc(sizeof(struct node));  /* allocate one node: sizeof(struct node), not sizeof(struct node *) */

Insertion
The insertion into a singly linked list can be performed at different positions. Based on
the position of the new node being inserted, the insertion is categorized into the
following categories.
1. Insertion at beginning: It involves inserting an element at the front of the list. We
just need a few link adjustments to make the new node the head of the list.

2. Insertion at end of the list: It involves insertion at the last position of the
linked list. The new node can be inserted as the only node in the list, or as the last
one; different logic is implemented in each scenario.

3. Insertion after specified node: It involves insertion after the specified node of the
linked list. We need to skip the desired number of nodes in order to reach the node
after which the new node will be inserted.
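
A sketch of the first case, insertion at the beginning, using the node structure defined earlier (a minimal fragment; error handling for a failed malloc is omitted):

#include <cstdlib>

struct node
{
    int data;
    struct node *next;
};

// Insert a new node holding 'value' at the front of the list.
// The address of the head pointer is passed so the caller's head is updated.
void insert_beginning(struct node **head, int value)
{
    struct node *ptr = (struct node *)malloc(sizeof(struct node));
    ptr->data = value;     // store the data
    ptr->next = *head;     // new node points to the old first node
    *head = ptr;           // new node becomes the head of the list
}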

Deletion and Traversing


The deletion of a node from a singly linked list can be performed at different positions.
Based on the position of the node being deleted, the operation is categorized into the
following categories.

1. Deletion at beginning: It involves deleting a node from the beginning of the list.
This is the simplest operation among all; it just needs a few adjustments to the node
pointers.

2. Deletion at the end of the list: It involves deleting the last node of the list. The
list can either be empty or full; different logic is implemented for the different
scenarios.

3. Deletion after specified node: It involves deleting the node after the specified node
in the list. We need to skip the desired number of nodes to reach the node after which
the node will be deleted. This requires traversing through the list.

4. Traversing: In traversing, we simply visit each node of the list at least once in
order to perform some specific operation on it, for example, printing the data part of
each node present in the list.

5. Searching: In searching, we match each element of the list with the given element. If
the element is found at any location, then the location of that element is returned;
otherwise, null is returned.
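
Two of the operations in the table above, deletion at the beginning and traversing, can be sketched as follows (same node structure as before):

#include <cstdio>
#include <cstdlib>

struct node { int data; struct node *next; };

// Deletion at beginning: unlink the first node and free it.
void delete_beginning(struct node **head)
{
    if (*head == NULL) return;       // underflow: list is empty
    struct node *ptr = *head;
    *head = (*head)->next;           // second node becomes the head
    free(ptr);
}

// Traversing: visit each node once and print its data part.
void traverse(struct node *head)
{
    for (struct node *p = head; p != NULL; p = p->next)
        printf("%d ", p->data);
    printf("\n");
}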

Doubly linked list


Doubly linked list is a complex type of linked list in which a node contains a pointer to
the previous as well as the next node in the sequence. Therefore, in a doubly linked list,
a node consists of three parts: node data, pointer to the next node in sequence (next
pointer) , pointer to the previous node (previous pointer). A sample node in a doubly
linked list is shown in the figure.

A doubly linked list containing three nodes having numbers from 1 to 3 in their data
part, is shown in the following image.
In C, structure of a node in doubly linked list can be given as :

struct node
{
    struct node *prev;   /* address of the previous node */
    int data;            /* data stored at this node */
    struct node *next;   /* address of the next node */
};

The prev part of the first node and the next part of the last node will always contain null
indicating end in each direction.

In a singly linked list, we could traverse in one direction only, because each node
contains the address of the next node and has no record of its previous nodes. However,
a doubly linked list overcomes this limitation. Since each node of the list contains the
address of its previous node, we can find all the details about the previous node as
well, by using the previous address stored inside the prev part of each node.
Memory Representation of a doubly linked list
Memory Representation of a doubly linked list is shown in the following image.
Generally, a doubly linked list consumes more space for every node and therefore causes
more expensive basic operations such as insertion and deletion. However, we can easily
manipulate the elements of the list since the list maintains pointers in both directions
(forward and backward).

In the following image, the first element of the list, i.e., 13, is stored at address 1.
The head pointer points to the starting address 1. Since this is the first element added
to the list, the prev part of the node contains null. The next node of the list resides
at address 4; therefore, the first node contains 4 in its next pointer.

We can traverse the list in this way until we find a node containing null (or -1) in its
next part.
Operations on doubly linked list
Node Creation

struct node
{
    struct node *prev;
    int data;
    struct node *next;
};
struct node *head;

All the remaining operations regarding doubly linked list are described in the following
table.

1. Insertion at beginning: Adding the node into the linked list at the beginning.

2. Insertion at end: Adding the node into the linked list at the end.

3. Insertion after specified node: Adding the node into the linked list after the
specified node.

4. Deletion at beginning: Removing the node from the beginning of the list.

5. Deletion at the end: Removing the node from the end of the list.

6. Deletion of the node having given data: Removing the node which is present just after
the node containing the given data.

7. Searching: Comparing each node's data with the item to be searched; the location of
the item is returned if the item is found, else null is returned.

8. Traversing: Visiting each node of the list at least once in order to perform some
specific operation like searching, sorting, display, etc.
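
For example, insertion at the beginning of a doubly linked list must fix up the prev pointers as well; a minimal sketch (the structure is named dnode here only to distinguish it from the singly linked version):

#include <cstdlib>

struct dnode
{
    struct dnode *prev;
    int data;
    struct dnode *next;
};

// Insert a new node at the front; both prev and next links are fixed up.
void insert_beginning(struct dnode **head, int value)
{
    struct dnode *ptr = (struct dnode *)malloc(sizeof(struct dnode));
    ptr->prev = NULL;                // first node has no predecessor
    ptr->data = value;
    ptr->next = *head;               // link forward to the old head
    if (*head != NULL)
        (*head)->prev = ptr;         // old head now points back to the new node
    *head = ptr;
}
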
Circular Doubly Linked List
A circular doubly linked list is a more complex type of data structure in which a node
contains pointers to its previous node as well as the next node. A circular doubly
linked list doesn't contain NULL in any of its nodes. The last node of the list contains
the address of the first node of the list, and the first node of the list also contains
the address of the last node in its previous pointer.

A circular doubly linked list is shown in the following figure.

Since a circular doubly linked list contains three parts in its node structure, it
demands more space per node and more expensive basic operations. However, a circular
doubly linked list provides easy manipulation of the pointers, and searching can be up
to twice as efficient since the list can be traversed from both ends.

Memory Management of Circular Doubly linked list


The following figure shows the way memory is allocated for a circular doubly linked
list. The variable head contains the address of the first element of the list, i.e., 1;
hence the starting node of the list, containing data A, is stored at address 1. Since
each node of the list is supposed to have three parts, the starting node contains the
address of the last node, i.e., 8, and of the next node, i.e., 4. The last node of the
list, stored at address 8 and containing data 6, contains the address of the first node
of the list, i.e., 1. In a circular doubly linked list, the last node is identified by
the address of the first node stored in its next part: the node which contains the
address of the first node is actually the last node of the list.

Operations on circular doubly linked list :


There are various operations which can be performed on a circular doubly linked list.
The node structure of a circular doubly linked list is similar to that of a doubly
linked list. The operations on a circular doubly linked list are described in the
following table.
1. Insertion at beginning: Adding a node to the circular doubly linked list at the
beginning.

2. Insertion at end: Adding a node to the circular doubly linked list at the end.

3. Deletion at beginning: Removing a node of the circular doubly linked list from the
beginning.

4. Deletion at end: Removing a node of the circular doubly linked list from the end.

Binary Tree
A binary tree is a tree in which each node can have at most two children. The name
"binary" itself suggests "two"; therefore, each node can have either 0, 1 or 2 children.

Let's understand the binary tree through an example.

The above tree is a binary tree because each node contains at most two children. The
logical representation of the above tree is given below:

In the above tree, node 1 contains two pointers, a left and a right pointer, pointing to
its left and right child respectively. Node 2 contains both children (left and right
nodes); therefore, it has two pointers. Nodes 3, 5 and 6 are leaf nodes, so all these
nodes contain a NULL pointer in both their left and right parts.

Properties of Binary Tree


o At each level i, the maximum number of nodes is 2^i.
o The height of the tree is defined as the longest path from the root node to a leaf
node. The tree shown above has a height equal to 3. Therefore, the maximum number of
nodes at height 3 is equal to (1+2+4+8) = 15. In general, the maximum number of nodes
possible at height h is (2^0 + 2^1 + 2^2 + ... + 2^h) = 2^(h+1) - 1.
o The minimum number of nodes possible at height h is equal to h + 1.
o If the number of nodes is minimum, then the height of the tree would be maximum.
Conversely, if the number of nodes is maximum, then the height of the tree would be
minimum.

If there are 'n' nodes in the binary tree:

The minimum height can be computed as:

As we know that,

n = 2^(h+1) - 1

n + 1 = 2^(h+1)

Taking log2 on both the sides,

log2(n+1) = h + 1

h = log2(n+1) - 1

The maximum height can be computed as:

As we know that,

n = h + 1

h = n - 1

Types of Binary Tree


There are four types of Binary tree:

o Full/ proper/ strict Binary tree


o Complete Binary tree
o Perfect Binary tree
o Degenerate Binary tree
o Balanced Binary tree

1. Full/ proper/ strict Binary tree

The full binary tree is also known as a strict binary tree. A tree is considered a full
binary tree only if each node contains either 0 or 2 children. Equivalently, a full
binary tree is a tree in which every node other than the leaf nodes has exactly 2
children.

Let's look at the simple example of the Full Binary tree.


In the above tree, we can observe that each node is either containing zero or two
children; therefore, it is a Full Binary tree.

Properties of Full Binary Tree

o The number of leaf nodes is equal to the number of internal nodes plus 1. In the above
example, the number of internal nodes is 5; therefore, the number of leaf nodes is equal
to 6.
o The maximum number of nodes is the same as in any binary tree of height h,
i.e., 2^(h+1) - 1.
o The minimum number of nodes in a full binary tree of height h is 2h + 1.
o The minimum height of the full binary tree is log2(n+1) - 1.
o The maximum height of the full binary tree can be computed as:

n = 2h + 1

n - 1 = 2h

h = (n-1)/2

Complete Binary Tree


A complete binary tree is a tree in which all the levels are completely filled except
possibly the last level. In the last level, all the nodes must be as far left as
possible. In a complete binary tree, the nodes are added from the left.

Let's create a complete binary tree.

The above tree is a complete binary tree because all the nodes are completely filled, and
all the nodes in the last level are added at the left first.

Properties of Complete Binary Tree

o The maximum number of nodes in a complete binary tree is 2^(h+1) - 1.
o The minimum number of nodes in a complete binary tree is 2^h.
o The minimum height of a complete binary tree is log2(n+1) - 1.
o The maximum height of a complete binary tree follows from the minimum node count:
n = 2^h, so h = log2(n).

Perfect Binary Tree

A tree is a perfect binary tree if all the internal nodes have 2 children, and all the leaf
nodes are at the same level.
Let's look at a simple example of a perfect binary tree.

The below tree is not a perfect binary tree because all the leaf nodes are not at the
same level.
Note: Every perfect binary tree is both a complete binary tree and a full binary tree,
but the converse is not true, i.e., not every complete binary tree or full binary tree
is a perfect binary tree.

Degenerate Binary Tree


A degenerate binary tree is a tree in which every internal node has only one
child.

Let's understand the Degenerate binary tree through examples.


The above tree is a degenerate binary tree because all the nodes have only one child. It
is also known as a right-skewed tree as all the nodes have a right child only.
The above tree is also a degenerate binary tree because all the nodes have only one
child. It is also known as a left-skewed tree as all the nodes have a left child only.

Balanced Binary Tree

A balanced binary tree is a tree in which, for every node, the heights of the left and
right subtrees differ by at most 1. For example, AVL and Red-Black trees are balanced
binary trees.

Let's understand the balanced binary tree through examples.


The above tree is a balanced binary tree because the difference between the heights of
the left subtree and right subtree of every node is at most 1 (here, zero).
The above tree is not a balanced binary tree because the difference between the heights
of the left subtree and the right subtree is greater than 1.

Binary Tree Implementation


A binary tree is implemented with the help of pointers. The first node in the tree is
represented by the root pointer. Each node in the tree consists of three parts: data, a
left pointer and a right pointer. To create a binary tree, we first need to create a
node. We define the node as a user-defined structure as shown below:

struct node
{
    int data;                    /* value stored at this node */
    struct node *left, *right;   /* addresses of the left and right children */
};
In the above structure, data is the value, left pointer contains the address of the left
node, and right pointer contains the address of the right node.
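
A small sketch that builds a three-node tree, 1 with children 2 and 3, using this structure and prints it with an inorder traversal (illustrative only):

#include <cstdio>
#include <cstdlib>

struct node
{
    int data;
    struct node *left, *right;
};

struct node *new_node(int value)
{
    struct node *n = (struct node *)malloc(sizeof(struct node));
    n->data = value;
    n->left = n->right = NULL;       // a new node starts as a leaf
    return n;
}

// Inorder traversal: left subtree, then the node, then the right subtree.
void inorder(struct node *root)
{
    if (root == NULL) return;
    inorder(root->left);
    printf("%d ", root->data);
    inorder(root->right);
}

int main()
{
    struct node *root = new_node(1); // the root pointer represents the tree
    root->left  = new_node(2);
    root->right = new_node(3);
    inorder(root);                   // prints: 2 1 3
    return 0;
}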

UNIT-2
Write an algorithm for internal and external searching
techniques. (HPU Bsc 2022)

Searching

Searching is the process of finding a particular element in a list/records/array. If the
element is present in the list, the process is called successful and returns the
location of that element; otherwise, the search is called unsuccessful.

There are two popular search methods that are widely used to search for items in a
list. The choice of algorithm depends upon the arrangement of the list.
* Linear Search
* Binary Search

Internal searching

When all the records to be searched are kept in the main memory then such
searching is termed as internal searching.

External searching

When the number of elements is large and all of them cannot be stored in main memory
but are kept on a secondary storage device, this kind of searching is termed external
searching.

Internal searching vs. external searching:

o Internal searching is done in internal memory; external searching is done in external
memory.
o Internal searching is applied to small collections of data/elements; external searching
is applied to huge collections of data/elements.
o Internal searching does not make use of extra resources; external searching makes use
of extra resources.
o Internal searching requires primary memory such as RAM; external searching requires
external storage such as a hard disk, floppy disk, etc.

Searching Techniques in Data Structure


The most famous techniques of searching in data structures are:

1. Sequential Search
This is the traditional technique for searching an element in a collection of elements.
In this type of search, the elements of the list are traversed one by one to find out
whether the element is present in the list or not. One example of such an algorithm is
linear search. This is a straightforward and basic algorithm.

Suppose ARR is an array of n elements, and we need to find the location LOC of element
ITEM in ARR. For this, LOC is initialized to -1, which indicates that ITEM is not
present in ARR. ITEM is compared with the data at each location of ARR, and once
ITEM == ARR[i] for some i, LOC is updated with that location; ITEM has then been found
in ARR.

Algorithm:
LSEARCH(ARR, N, ITEM, LOC): Here ARR is the array of N elements, ITEM holds the value
we need to search in the array, and the algorithm returns LOC, the location where ITEM
is present in ARR. Initially, we have to set LOC = -1.


1. Set LOC = -1 and i = 1.

2. Repeat while i <= N and ARR[i] != ITEM:

   i = i + 1

3. If i = N + 1, then set LOC = 0 (ITEM is not in the array);
   else set LOC = i.

4. Exit.

Let’s say, below is the ARR with 10 elements, and we need to find whether ITEM = 18 is
present in this array or not.

At the start, LOC = -1.

Step 1: ITEM != 77, thus we move to the next element.

Step 2: ITEM != 56, thus we move to the next element.

Step 3: ITEM != 14, thus we move to the next element.

Step 4: ITEM != 7, thus we move to the next element.

Step 5: ITEM == ARR[5], thus LOC is updated to 5.

The complexity of Sequential Search


Here are the complexities of the linear search given below.

Space complexity

The linear search algorithm does not use any extra space; for an array of n elements,
the total space is O(n), and the auxiliary space is O(1).

Time Complexity

 Worst-case complexity: O(n) - this case occurs when the search element is not present
in the array.

 Best-case complexity: O(1) - this case occurs when the first element is the element to
be searched.

 Average complexity: O(n) - this is the case when the element is present somewhere in
the middle of the array.
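
A sketch of the LSEARCH procedure in C++ (0-based array indexing, returning the index of ITEM or -1 when it is absent; the sample array extends the example above, so the trailing values are assumed):

#include <iostream>
using namespace std;

// Linear search: compare item with every element until found.
// Returns the 0-based index of item, or -1 if it is not present.
int lsearch(const int arr[], int n, int item)
{
    for (int i = 0; i < n; i++)
        if (arr[i] == item)
            return i;
    return -1;
}

int main()
{
    int arr[] = {77, 56, 14, 7, 18, 32, 84, 44, 63, 9};  // sample data (assumed)
    int loc = lsearch(arr, 10, 18);
    cout << "Found 18 at index " << loc << "\n";          // index 4, i.e. the 5th element
    return 0;
}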

2. Binary Search
This is a technique to search for an element in the list using the divide and conquer
approach, and it is used in the case of sorted lists. Instead of comparing the elements
one by one, it directly goes to the middle element of the list, divides the array into
two parts, and decides in which sub-array the element lies.

Suppose ARR is an array of n elements sorted in increasing order. At every step of this
algorithm, the search is confined between BEG and END, the beginning and ending indices
of the current sub-array. The index MID defines the middle index of the array, where

MID = INT((BEG + END) / 2)

At each step, ITEM, the element we need to search in ARR, is compared with ARR[MID]:

 If ITEM = ARR[MID], then LOC = MID, and exit.

 If ITEM < ARR[MID], then ITEM can appear only in the left sub-array; BEG stays the
same, END = MID - 1, and repeat.

 If ITEM > ARR[MID], then ITEM can appear only in the right sub-array; BEG = MID + 1,
END stays the same, and repeat.

After this, MID is again calculated for the respective sub-array. If we don't find ITEM,
the algorithm returns -1; otherwise LOC = MID.

Algorithm:

BSEARCH(ARR, LB, UB, ITEM, LOC): Here ARR is a sorted list of elements, with LB and UB
the lower and upper bounds of the array. ITEM is to be searched in the array, and the
algorithm returns the location LOC, the index at which ITEM is present, else returns
NULL.

1. Set BEG = LB, END = UB and MID = INT((BEG + END) / 2).

2. Repeat steps 3 and 4 while BEG <= END and ARR[MID] != ITEM.

3. If ITEM < ARR[MID], then:

       Set END = MID - 1

   Else:

       Set BEG = MID + 1

4. Set MID = INT((BEG + END) / 2).

5. If ARR[MID] = ITEM, then:

       Set LOC = MID

   Else:

       Set LOC = NULL

6. Exit.

Let’s say here, ITEM = 62.

BEG = 1 and END = 9, hence MID = INT((1+9)/2) = 5.

ARR[MID] = 52.

Step 1: ARR[MID] < ITEM, thus BEG = MID + 1 = 6 and END stays 9. Thus our new sub-array
runs from index 6 to index 9.

Step 2: With BEG = 6 and END = 9, MID is recomputed and the search narrows in the same
way until ARR[MID] = ITEM; in this example, 62 is found at index 6.

Thus LOC = MID = 6.
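
The same BSEARCH steps in C++ (0-based indices; the array must be sorted in increasing order, and the sample values are assumed):

#include <iostream>
using namespace std;

// Binary search on a sorted array: returns the 0-based index of item, or -1.
int bsearch_(const int arr[], int n, int item)
{
    int beg = 0, end = n - 1;
    while (beg <= end)
    {
        int mid = beg + (end - beg) / 2;   // avoids overflow of (beg + end)
        if (arr[mid] == item)
            return mid;                    // found: LOC = mid
        else if (item < arr[mid])
            end = mid - 1;                 // continue in the left sub-array
        else
            beg = mid + 1;                 // continue in the right sub-array
    }
    return -1;                             // not found
}

int main()
{
    int arr[] = {8, 15, 23, 37, 42, 52, 62, 71, 89};   // sorted sample (assumed)
    cout << bsearch_(arr, 9, 62) << "\n";              // prints 6
    return 0;
}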

The complexity of Binary Search


Here are the complexities of the binary search given below.

 Worst Case: O(log n)

 Best Case: O(1)

 Average Case: O(log n)

Conclusion
Searching refers to finding the location of one element in an array of n elements.
There are two types of search: linear search and binary search. The linear search
algorithm is straightforward and has O(n) complexity, whereas binary search is a
high-speed searching algorithm with O(log n) complexity, but it can only be used on a
sorted list of elements. If the size of the array is large, it is preferable to use
binary search instead of linear search; binary search is also used inside many
searching data structures. For small and mid-size arrays, the linear search algorithm
is often preferred.

Interpolation Search
This technique is used if the items to be searched are uniformly distributed between
the first and the last location. It is a simple modification of binary search in the
way MID is calculated:

MID = LOW + (HIGH - LOW) * (ITEM - LIST[LOW]) / (LIST[HIGH] - LIST[LOW])

Advantages

1. If the items are uniformly distributed, the average-case time complexity is
log2(log2(n)).
2. It is considered an improvement over binary search.

Disadvantages

1. The calculation of MID is more complicated, which increases the execution time.
2. If the items are not uniformly distributed, interpolation search can behave very
poorly.
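
A sketch of interpolation search in C++ using the MID formula above (the list must be sorted; the guard against division by zero is an addition for safety, and the sample values are assumed):

#include <iostream>
using namespace std;

// Interpolation search: estimate the probe position from the item's value.
int isearch(const int list[], int n, int item)
{
    int low = 0, high = n - 1;
    while (low <= high && item >= list[low] && item <= list[high])
    {
        if (list[high] == list[low])               // all equal: avoid divide by zero
            return (list[low] == item) ? low : -1;

        // The probe formula from the text, in integer arithmetic.
        int mid = low + (int)((long long)(high - low) * (item - list[low])
                              / (list[high] - list[low]));
        if (list[mid] == item) return mid;
        else if (list[mid] < item) low = mid + 1;
        else high = mid - 1;
    }
    return -1;                                     // item not present
}

int main()
{
    int list_[] = {10, 20, 30, 40, 50, 60, 70, 80};    // uniform data (assumed)
    cout << isearch(list_, 8, 60) << "\n";             // prints 5
    return 0;
}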

Memory Management
In this article, we will understand memory management in detail.

What do you mean by memory management?

Memory is an important part of the computer, used to store data. Its management is
critical to the computer system because the amount of main memory available in a
computer system is very limited. At any time, many processes compete for it. Moreover,
to increase performance, several processes are executed simultaneously. For this, we
must keep several processes in the main memory, so it is even more important to manage
them effectively.

Memory management plays several roles in a computer system.


Following are the important roles in a computer system:

o The memory manager is used to keep track of the status of memory locations, whether
free or allocated. It abstracts primary memory so that software perceives a large block
of memory as allocated to it.
o Memory manager permits computers with a small amount of main memory to execute
programs larger than the size or amount of available memory. It does this by moving
information back and forth between primary memory and secondary memory by using
the concept of swapping.
o The memory manager is responsible for protecting the memory allocated to each
process from being corrupted by another process. If this is not ensured, then the system
may exhibit unpredictable behavior.
o Memory managers should enable sharing of memory space between processes. Thus,
two programs can reside at the same memory location although at different times.

Memory management Techniques:

The Memory management Techniques can be classified into following main


categories:

o Contiguous memory management schemes


o Non-Contiguous memory management schemes

Contiguous memory management schemes:


In a Contiguous memory management scheme, each program occupies a single
contiguous block of storage locations, i.e., a set of memory locations with consecutive
addresses.
Single contiguous memory management schemes:

The Single contiguous memory management scheme is the simplest memory


management scheme used in the earliest generation of computer systems. In this
scheme, the main memory is divided into two contiguous areas or partitions. The
operating systems reside permanently in one partition, generally at the lower memory,
and the user process is loaded into the other partition.

Advantages of Single contiguous memory management schemes:

o Simple to implement.
o Easy to manage and design.
o In a single contiguous memory management scheme, once a process is loaded, it is
given the full processor's time, and no other process can interrupt it.

Disadvantages of Single contiguous memory management schemes:

o Wastage of memory space due to unused memory as the process is unlikely to use all
the available memory space.
o The CPU remains idle, waiting for the disk to load the binary image into the main
memory.
o It can not be executed if the program is too large to fit the entire available main memory
space.
o It does not support multiprogramming, i.e., it cannot handle multiple programs
simultaneously.

Multiple Partitioning:

The single Contiguous memory management scheme is inefficient as it limits computers


to execute only one program at a time resulting in wastage in memory space and CPU
time. The problem of inefficient CPU use can be overcome using multiprogramming that
allows more than one program to run concurrently. To switch between two processes,
the operating systems need to load both processes into the main memory. The
operating system needs to divide the available main memory into multiple parts to load
multiple processes into the main memory. Thus multiple processes can reside in the
main memory simultaneously.

The multiple partitioning schemes can be of two types:


o Fixed Partitioning
o Dynamic Partitioning

Fixed Partitioning

The main memory is divided into several fixed-sized partitions in a fixed partition
memory management scheme or static partitioning. These partitions can be of the same
size or different sizes. Each partition can hold a single process. The number of partitions
determines the degree of multiprogramming, i.e., the maximum number of processes in
memory. These partitions are made at the time of system generation and remain fixed
after that.

Advantages of Fixed Partitioning memory management schemes:

o Simple to implement.
o Easy to manage and design.

Disadvantages of Fixed Partitioning memory management schemes:

o This scheme suffers from internal fragmentation.


o The number of partitions is specified at the time of system generation.

Dynamic Partitioning

The dynamic partitioning was designed to overcome the problems of a fixed


partitioning scheme. In a dynamic partitioning scheme, each process occupies only as
much memory as they require when loaded for processing. Requested processes are
allocated memory until the entire physical memory is exhausted or the remaining space
is insufficient to hold the requesting process. In this scheme the partitions used are of
variable size, and the number of partitions is not defined at the system generation time.

Advantages of Dynamic Partitioning memory management schemes:

o Simple to implement.
o Easy to manage and design.

Disadvantages of Dynamic Partitioning memory management schemes:

o This scheme suffers from external fragmentation.

o Allocation and deallocation of the variable-sized partitions make memory management
more complex.

Non-Contiguous memory management schemes:

In a Non-Contiguous memory management scheme, the program is divided into


different blocks and loaded at different portions of the memory that need not
necessarily be adjacent to one another. This scheme can be classified depending upon
the size of blocks and whether the blocks reside in the main memory or not.

What is paging?

Paging is a technique that eliminates the requirements of contiguous allocation of main


memory. In this, the main memory is divided into fixed-size blocks of physical memory
called frames. The size of a frame should be kept the same as that of a page to
maximize the main memory and avoid external fragmentation.

Advantages of paging:

o Pages reduce external fragmentation.


o Simple to implement.
o Memory efficient.
o Due to the equal size of frames, swapping becomes very easy.
o It is used for faster access of data.

What is Segmentation?

Segmentation is a technique that eliminates the requirements of contiguous allocation


of main memory. In this, the main memory is divided into variable-size blocks of
physical memory called segments. It is based on the way the programmer follows to
structure their programs. With segmented memory allocation, each job is divided into
several segments of different sizes, one for each module. Functions, subroutines, stack,
array, etc., are examples of such modules.

Mark-and-Sweep: Garbage Collection Algorithm

There are many garbage collection algorithms that run in the background, one of which
is mark and sweep.
All the objects which are created dynamically (using new in C++ and Java) are allocated
memory on the heap. If we keep creating objects, we may get an Out Of Memory error,
since at some point it is no longer possible to allocate heap memory to new objects. So
we need to reclaim heap memory by releasing the memory of all those objects which are
no longer referenced by the program (the unreachable objects), so that the space is
made available for subsequent new objects. This memory can be released by the
programmer, but that is an overhead for the programmer; here garbage collection comes
to our rescue, automatically releasing the heap memory of all unreferenced objects.

Mark and Sweep Algorithm

Any garbage collection algorithm must perform two basic operations. First, it should be
able to detect all the unreachable objects; second, it must reclaim the heap space used
by the garbage objects and make that space available again to the program. The Mark and
Sweep algorithm performs these operations in two phases, listed and described below:
 Mark phase
 Sweep phase

Phase 1: Mark Phase

When an object is created, its mark bit is set to 0(false). In the Mark phase, we set the
marked bit for all the reachable objects (or the objects which a user can refer to) to
1(true). Now to perform this operation we simply need to do a graph traversal,
a depth-first search approach would work for us. Here we can consider every object
as a node and then all the nodes (objects) that are reachable from this node (object)
are visited and it goes on till we have visited all the reachable nodes.
 The root is a variable that refers to an object and is directly accessible by a local
variable. We will assume that we have one root only.
 We can access the mark bit for an object by ‘markedBit(obj)’.
Algorithm: Mark phase

Mark(root)
    If markedBit(root) = false then
        markedBit(root) = true
        For each v referenced by root
            Mark(v)

Note: If we have more than one root, then we simply have to call Mark() for all the root
variables.
Phase 2: Sweep Phase

As the name suggests it “sweeps” the unreachable objects i.e. it clears the heap
memory for all the unreachable objects. All those objects whose marked value is set to
false are cleared from the heap memory, for all other objects (reachable objects) the
marked bit is set to true.
The marked bit of every reachable object is then reset to false, since the algorithm
may run again (if required), and the mark phase must then re-mark all the reachable
objects.
Algorithm: Sweep phase

Sweep()
    For each object p in heap
        If markedBit(p) = true then
            markedBit(p) = false
        else
            heap.release(p)
The mark-and-sweep algorithm is called a tracing garbage collector because it traces
out the entire collection of objects that are directly or indirectly accessible by the
program.
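
A toy sketch of both phases in C++, where objects, references, and the heap are simulated with plain structs; a real collector works on the actual object graph of the language runtime:

#include <vector>
#include <cstdio>

struct Object
{
    bool marked = false;
    std::vector<Object*> refs;       // objects this object references
};

// Mark phase: depth-first traversal from the root, setting mark bits.
void mark(Object *obj)
{
    if (obj == nullptr || obj->marked) return;   // already visited: handles cycles
    obj->marked = true;              // reachable: set the mark bit
    for (Object *v : obj->refs)
        mark(v);                     // visit everything reachable from obj
}

// Sweep phase: release unmarked objects, clear marks on survivors.
void sweep(std::vector<Object*> &heap)
{
    std::vector<Object*> survivors;
    for (Object *p : heap)
    {
        if (p->marked) { p->marked = false; survivors.push_back(p); }
        else           { delete p; }  // unreachable: reclaim its memory
    }
    heap = survivors;
}

int main()
{
    std::vector<Object*> heap = { new Object, new Object, new Object };
    Object *root = heap[0];
    root->refs.push_back(heap[1]);   // heap[2] is unreachable garbage
    mark(root);
    sweep(heap);                     // heap now holds 2 live objects
    std::printf("live objects: %zu\n", heap.size());
    return 0;
}
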
Example:
A. All the objects have their marked bits set to false.

B. Reachable objects are marked true

C. Nonreachable objects are cleared from the heap.

Advantages of Mark and Sweep Algorithm are as follows:


 It handles the case with cyclic references, even in the case of a cycle, this
algorithm never ends up in an infinite loop.
 There are no additional overheads incurred during the execution of the algorithm.
Disadvantages of the Mark and Sweep Algorithm are as follows:
 The main disadvantage of the mark-and-sweep approach is that normal program execution
is suspended while the garbage collection algorithm runs.
 Another disadvantage is that, after the Mark and Sweep algorithm has run several times
on a program, reachable objects end up separated by many small unused memory regions,
i.e., the heap becomes fragmented.
Storage Allocation
The different ways to allocate memory are:

1. Static storage allocation


2. Stack storage allocation
3. Heap storage allocation

Static storage allocation


o In static allocation, names are bound to fixed storage locations.
o If memory is created at compile time, then it is created in the static area, and only
once.
o Static allocation does not support dynamic data structures: memory is created at
compile time and deallocated only after program completion.
o A drawback of static storage allocation is that the size and position of data objects
must be known at compile time.
o Another drawback is the restriction on recursive procedures.

Stack Storage Allocation


o In stack storage allocation, storage is organized as a stack.
o An activation record is pushed onto the stack when an activation begins, and it is
popped when the activation ends.
o An activation record contains the locals, so that they are bound to fresh storage in
each activation. The values of locals are discarded when the activation ends.
o It works on a last-in-first-out (LIFO) basis, and this allocation supports recursion.

Heap Storage Allocation


o Heap allocation is the most flexible allocation scheme.
o Allocation and deallocation of memory can be done at any time and at any place,
depending upon the user's requirement.
o Heap allocation is used to allocate memory to variables dynamically and to reclaim
that memory when the variables are no longer used.
o Heap storage allocation supports recursion.

Example:
int fact(int n)
{
    if (n <= 1)
        return 1;
    else
        return n * fact(n - 1);   /* recursive call: each activation gets fresh storage */
}

fact(6);



Buffering in Operating System
The buffer is an area in the main memory used to store or hold the data temporarily.
In other words, buffer temporarily stores data transmitted from one place to another,
either between two devices or an application. The act of storing data temporarily in the
buffer is called buffering.

A buffer may be used when moving data between processes within a computer. Buffers
can be implemented in a fixed memory location in hardware or by using a virtual data
buffer in software, pointing at a location in the physical memory. In all cases, the data in
a data buffer are stored on a physical storage medium.

Most buffers are implemented in software, which typically uses the faster RAM to store
temporary data due to the much faster access time than hard disk drives. Buffers are
typically used when there is a difference between the rate of received data and the rate
of processed data, for example, in a printer spooler or online video streaming.

A buffer often adjusts timing by implementing a queue or FIFO algorithm in memory,


simultaneously writing data into the queue at one rate and reading it at another rate.


Purpose of Buffering
You encounter buffering while watching videos on YouTube or live streams. In a video
stream, a buffer represents the amount of data required to be downloaded before the
video can play to the viewer in real time. A buffer in a computer environment means
that a set amount of data will be stored in order to preload the required data before
it gets used by the CPU.
Computers have many different devices that operate at varying speeds, and a buffer is
needed to act as a temporary placeholder for everything interacting. This is done to
keep everything running efficiently and without issues between all the devices,
programs, and processes running at that time. There are three reasons behind buffering
of data,

1. It helps in matching the speed between two devices between which data is transmitted.
For example, a hard disk has to store a file received from a modem. As we know, the
transmission speed of a modem is slow compared to the hard disk, so the bytes coming
from the modem are accumulated in the buffer space, and when all the bytes of the file
have arrived at the buffer, the entire data is written to the hard disk in a single
operation.
2. It helps the devices with different sizes of data transfer to get adapted to each other. It
helps devices to manipulate data before sending or receiving it. In computer networking,
the large message is fragmented into small fragments and sent over the network. The
fragments are accumulated in the buffer at the receiving end and reassembled to form a
complete large message.
3. It also supports copy semantics. With copy semantics, the version of data in the buffer is
guaranteed to be the version of data at the time of system call, irrespective of any
subsequent change to data in the buffer. Buffering increases the performance of the
device. It overlaps the I/O of one job with the computation of the same job.

Types of Buffering
There are three main types of buffering in the operating system, such as:

1. Single Buffer
In single buffering, only one buffer is used to transfer the data between two devices. The producer produces one block of data into the buffer, and then the consumer consumes the buffer. Only when the buffer is empty does the producer produce the next block of data.

Block oriented device: The following operations are performed in a block-oriented device:

o The system buffer takes the input.
o After taking the input, the block gets transferred to the user space, and then another block is requested.
o Two blocks work simultaneously: while the user processes one block of data, the next block is being read in.
o The OS can swap the processes.
o The OS can record the data of the system buffer to user processes.

Stream oriented device: It performs the following operations:

o Line-at-a-time operation is used for scroll-mode terminals. The user inputs one line at a time, with a carriage return signalling the end of the line.
o Byte-at-a-time operation is used on forms-mode terminals, where each keystroke is significant.

2. Double Buffer

In double buffering, two buffers are used in place of one. The producer produces into one buffer while the consumer simultaneously consumes the other, so the producer does not need to wait for the buffer to be emptied. Double buffering is also known as buffer swapping.

Block oriented: This is how a double buffer works. There are two buffers in the system.

o The driver or controller uses one buffer to store data while waiting for it to be taken by a
higher hierarchy level.
o Another buffer is used to store data from the lower-level module.
o A major disadvantage of double buffering is that it increases the complexity of the process.
o If the process performs rapid bursts of I/O, then double buffering may be insufficient.

Stream oriented: It performs these operations, such as:

o For line-at-a-time I/O, the user process does not need to be suspended for input or output unless the process runs ahead of the double buffer.
o For byte-at-a-time operations, the double buffer offers no advantage over a single buffer of twice the length.

3. Circular Buffer

When more than two buffers are used, the collection of buffers is called a circular buffer, each buffer being one unit in it. The data transfer rate increases with a circular buffer compared to double buffering.

o In this scheme, the data does not pass directly from the producer to the consumer, because a buffer could otherwise be overwritten before its contents are consumed.
o The producer can only fill up to buffer x-1 while the data in buffer x is waiting to be consumed.
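A minimal fixed-size ring (circular) buffer in C, as an illustrative sketch rather than anything from the source: the producer writes at one index, the consumer reads at another, and both wrap around the end of the array.

#define BUF_SIZE 8

typedef struct {
    int data[BUF_SIZE];
    int head;    /* index where the producer writes next */
    int tail;    /* index where the consumer reads next  */
    int count;   /* number of filled slots               */
} RingBuffer;

/* Producer side: returns 1 on success, 0 if the buffer is full. */
int rb_put(RingBuffer *rb, int value) {
    if (rb->count == BUF_SIZE)
        return 0;                          /* producer must wait */
    rb->data[rb->head] = value;
    rb->head = (rb->head + 1) % BUF_SIZE;  /* wrap around */
    rb->count++;
    return 1;
}

/* Consumer side: returns 1 on success, 0 if the buffer is empty. */
int rb_get(RingBuffer *rb, int *value) {
    if (rb->count == 0)
        return 0;                          /* consumer must wait */
    *value = rb->data[rb->tail];
    rb->tail = (rb->tail + 1) % BUF_SIZE;
    rb->count--;
    return 1;
}

This single-threaded sketch ignores locking; a real producer/consumer pair would also need synchronisation between the two sides.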

How Buffering Works


In an operating system, buffering works in the following way:

o Buffering is done to deal effectively with a speed mismatch between the producer and the consumer of a data stream.
o A buffer is created in main memory to accumulate the bytes received from, say, a modem.
o After the data is received in the buffer, it is transferred to disk from the buffer in a single operation.
o This transfer is not instantaneous, so the modem needs another buffer to store additional incoming data.
o When the first buffer is full, a transfer of its data to disk is requested.
o The modem then fills the second buffer with the additional incoming data while the data in the first buffer is transferred to the disk.
o When both buffers have completed their tasks, the modem switches back to the first buffer while the data from the second buffer is transferred to the disk.
o The two buffers decouple the producer and the consumer of the data, thus minimising the time they spend waiting on each other.
o Buffering also accommodates devices that have different data-transfer sizes.

Advantages of Buffer
Buffering plays a very important role in any operating system during the execution of
any process or task. It has the following advantages.

o The use of buffers allows uniform disk access. It simplifies system design.
o The system places no data alignment restrictions on user processes doing I/O. By
copying data from user buffers to system buffers and vice versa, the kernel eliminates
the need for special alignment of user buffers, making user programs simpler and more
portable.
o The use of the buffer can reduce the amount of disk traffic, thereby increasing overall
system throughput and decreasing response time.
o The buffer algorithms help ensure file system integrity.

Disadvantages of Buffer
Buffers are not better in all respects; therefore, they have a few disadvantages, such as:
o It is costly and impractical to make the buffer exactly the size required to hold the number of elements, so the buffer is usually slightly larger, with the rest of the space wasted.
o Buffers have a fixed size at any point in time. When the buffer is full, it must be reallocated with a larger size and its elements moved. Similarly, when the number of valid elements in the buffer is significantly smaller than its size, the buffer must be reallocated with a smaller size and the elements moved to avoid too much waste.
o Use of a buffer requires an extra data copy when reading from and writing to user processes. When transmitting large amounts of data, this extra copy slows down performance.

Operations on the File


A file is a collection of logically related data that is recorded on secondary storage in the form of a sequence of bits. The content of a file is defined by its creator. The various operations that can be performed on a file, such as read, write, open and close, are called file operations. These operations are performed by the user using the commands provided by the operating system. Some common operations are as follows:

1. Create operation:
This operation is used to create a file in the file system. It is the most widely used
operation performed on the file system. To create a new file of a particular type the
associated application program calls the file system. This file system allocates space to
the file. As the file system knows the format of directory structure, so entry of this new
file is made into the appropriate directory.

2. Open operation:

This operation is the most common operation performed on a file. Once a file is created, it must be opened before the file-processing operations can be performed. When the user wants to open a file, they provide the file name; the open system call is invoked, and the file name is passed to the file system.

3. Write operation:

This operation is used to write information into a file. A write system call is issued that specifies the name of the file and the data to be written to the file. The file length is increased by the amount written, and the file pointer is repositioned after the last byte written.

4. Read operation:

This operation reads the contents from a file. A Read pointer is maintained by the OS,
pointing to the position up to which the data has been read.

5. Re-position or Seek operation:

The seek system call re-positions the file pointers from the current position to a specific
place in the file i.e. forward or backward depending upon the user's requirement. This
operation is generally performed with those file management systems that support
direct access files.

6. Delete operation:

Deleting the file not only removes all the data stored inside the file but also frees the disk space occupied by it. To delete the specified file, the directory is searched; when the directory entry is located, all the associated file space and the directory entry are released.

7. Truncate operation:
Truncating deletes the contents of a file while keeping its attributes. The file itself is not removed; only the information stored inside it is released.

8. Close operation:

When the processing of the file is complete, it should be closed so that all the changes made become permanent and all the resources occupied are released. On closing, the operating system deallocates all the internal descriptors that were created when the file was opened.

9. Append operation:

This operation adds data to the end of the file.

10. Rename operation:

This operation is used to rename the existing file.
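A short C sketch of several of these operations using the POSIX calls open/write/lseek/read/close (one common interface; other operating systems expose equivalent calls):

#include <fcntl.h>
#include <unistd.h>

int main(void) {
    /* create + open: make the file and open it for reading and writing */
    int fd = open("demo.txt", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return 1;

    write(fd, "hello", 5);     /* write: file pointer ends up after byte 5  */
    lseek(fd, 0, SEEK_SET);    /* re-position (seek): back to the beginning */

    char buf[5];
    read(fd, buf, 5);          /* read: fetch the five bytes just written   */

    close(fd);                 /* close: make changes permanent, free descriptors */
    unlink("demo.txt");        /* delete: remove the entry and free its space     */
    return 0;
}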

Unit-IV
Define file organization and elaborate three types of file organization. (HPU BSc 2022)
Sequential File Organization
This method is the easiest method for file organization. In this method, files are stored
sequentially. This method can be implemented in two ways:

1. Pile File Method:


o It is quite a simple method. In this method, we store the records in sequence, i.e., one after another, in the order in which they are inserted into the table.
o In the case of updating or deleting a record, the record is searched for in the memory blocks; when it is found, it is marked as deleted and the new (updated) record is inserted.
Insertion of a new record:
Suppose we have records R1, R3, and so on up to R9 and R8 in a sequence (a record is simply a row in a table). If we want to insert a new record R2 into the sequence, it is placed at the end of the file.

2. Sorted File Method:


o In this method, a new record is always inserted at the end of the file, and then the sequence is sorted in ascending or descending order. Sorting of the records is based on a primary key or some other key.
o In the case of modification of any record, the record is updated and the file re-sorted, so that the updated record ends up in the right place.
Insertion of a new record:
Suppose there is a preexisting sorted sequence of four records R1, R3, and so on up to R6 and R7. If a new record R2 has to be inserted into the sequence, it is first inserted at the end of the file, and then the sequence is sorted again.

Pros of sequential file organization


o It is a fast and efficient method for handling huge amounts of data.
o In this method, files can be easily stored on cheaper storage media such as magnetic tape.
o It is simple in design and requires little effort to store the data.
o This method is used when most of the records have to be accessed, as in grade calculation for students or generating salary slips.
o This method is also used for report generation and statistical calculations.

Cons of sequential file organization


o It wastes time, as we cannot jump directly to a required record but must move through the records sequentially.
o The sorted file method takes extra time and space for sorting the records.
Indexed sequential access method (ISAM)
ISAM method is an advanced sequential file organization. In this method, records are
stored in the file using the primary key. An index value is generated for each primary key
and mapped with the record. This index contains the address of the record in the file.

If any record has to be retrieved based on its index value, then the address of the data
block is fetched and the record is retrieved from the memory.
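A toy C sketch of the idea behind ISAM (the structures are hypothetical, for illustration only): a sorted index maps each primary-key value to the address of its data block, so a lookup is a binary search over the index rather than a scan of the file.

typedef struct {
    int  key;     /* primary key value         */
    long block;   /* address of the data block */
} IndexEntry;

/* Binary-search the sorted index; returns the block address, or -1 if absent. */
long isam_lookup(const IndexEntry *index, int n, int key) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (index[mid].key == key)
            return index[mid].block;
        if (index[mid].key < key)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;   /* key not present */
}

Because the index is kept sorted, a range query can locate the first matching key the same way and then simply scan forward.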

Pros of ISAM:
o Since each record has the address of its data block, searching for a record in a huge database is quick and easy.
o This method supports range retrieval and partial retrieval of records. Since the index is based on primary key values, we can retrieve the data for a given range of values. In the same way, partial values can also be easily searched; for example, student names starting with 'JA' can easily be found.

Cons of ISAM
o This method requires extra space in the disk to store the index value.
o When the new records are inserted, then these files have to be reconstructed to maintain
the sequence.
o When the record is deleted, then the space used by it needs to be released. Otherwise,
the performance of the database will slow down.

Hashed File Organisation


Hashed file organisation is also called a direct file organisation.
In this method, a hash function is computed for each record to be stored, and it provides the address of the block in which the record is placed. Any type of mathematical function can be used as a hash function; it can be simple or complex.
The hash function is applied to columns or attributes to get the block address, and the records end up stored at effectively random locations. This is why it is also known as direct or random file organization.
If the hash function is computed on a column that is a key, that column is called the hash key; if it is computed on a non-key column, the column is called a hash column.
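A minimal sketch of a hash function used to compute a block address (assuming a simple modulo scheme; real systems use more elaborate functions plus overflow handling):

#define NUM_BLOCKS 101   /* a prime number of buckets spreads keys more evenly */

/* Map a record's key to the number of the block that stores it. */
unsigned block_address(unsigned key) {
    return key % NUM_BLOCKS;
}

For example, a record with key 4321 would be placed in block 4321 % 101 = 79, and the same computation finds the block again at retrieval time.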

Indexed file organisation


A record key is used in indexed file organisation to locate the data. A record key is a unique ID that identifies a record and can also determine the sequence of the records. The record key is held in a field contained in each record; for example, the record key might be an employee number.
An indexed file can also have alternate indexes, meaning the same set of records can be accessed through different record keys and in different arrangements. For example, records could be accessed through the employee department rather than the employee ID.

Advantages of indexed over hash


The advantages of indexed file organisation over hashed file organisation are as
follows −
 In indexed file organisation, data records can easily be processed in order, unlike hashed file organisation, in which data is stored at scattered locations, making ordered processing difficult.
 Multiple records can be accessed using the same key in indexed file organisation, while this is not the case with hashed file organisation.

Advantages of hash over indexed


The advantages of hashed file organization over indexed file organization are as
follows −
 In hashed file organization, the records need not be sorted after any transaction, while in an indexed file organization, reorganization needs to be done from time to time to get rid of deleted records.
 There is an extra cost to maintain index in indexed file organization, while this is
not the case in hashed file organization.

DBMS - File Structure


Related data and information are stored collectively in file formats. A file is a sequence of records stored in binary format. A disk drive is formatted into several blocks that can store records. File records are mapped onto those disk blocks.

File Organization
File Organization defines how file records are mapped onto disk blocks. We have four
types of File Organization to organize file records −

Heap File Organization


When a file is created using Heap File Organization, the Operating System allocates
memory area to that file without any further accounting details. File records can be
placed anywhere in that memory area. It is the responsibility of the software to manage
the records. Heap File does not support any ordering, sequencing, or indexing on its
own.

Sequential File Organization


Every file record contains a data field (attribute) to uniquely identify that record. In
sequential file organization, records are placed in the file in some sequential order
based on the unique key field or search key. Practically, it is not possible to store all the
records sequentially in physical form.

Hash File Organization


Hash File Organization uses Hash function computation on some fields of the records.
The output of the hash function determines the location of disk block where the records
are to be placed.

Clustered File Organization


Clustered file organization is not considered good for large databases. In this
mechanism, related records from one or more relations are kept in the same disk block,
that is, the ordering of records is not based on primary key or search key.
File Operations
Operations on database files can be broadly classified into two categories −
 Update Operations
 Retrieval Operations
Update operations change the data values by insertion, deletion, or update. Retrieval
operations, on the other hand, do not alter the data but retrieve them after optional
conditional filtering. In both types of operations, selection plays a significant role. Other
than creation and deletion of a file, there could be several operations, which can be
done on files.
 Open − A file can be opened in one of the two modes, read mode or write
mode. In read mode, the operating system does not allow anyone to alter data. In
other words, data is read only. Files opened in read mode can be shared among
several entities. Write mode allows data modification. Files opened in write mode
can be read but cannot be shared.
 Locate − Every file has a file pointer, which tells the current position where the
data is to be read or written. This pointer can be adjusted accordingly. Using find
(seek) operation, it can be moved forward or backward.
 Read − By default, when files are opened in read mode, the file pointer points to
the beginning of the file. There are options where the user can tell the operating
system where to locate the file pointer at the time of opening a file. The very next
data to the file pointer is read.
 Write − The user can choose to open a file in write mode, which enables them to edit its contents; the edit can be a deletion, insertion, or modification. The file pointer can be located at the time of opening or can be dynamically changed if the operating system allows it.
 Close − This is the most important operation from the operating system’s point of
view. When a request to close a file is generated, the operating system
o removes all the locks (if in shared mode),
o saves the data (if altered) to the secondary storage media, and
o releases all the buffers and file handlers associated with the file.
The organization of data inside a file plays a major role here. The process of locating the file pointer at a desired record inside a file varies based on whether the records are arranged sequentially or clustered.

What is a directory?
A directory can be defined as a listing of the related files on the disk. The directory may store some or all of the file attributes.

To get the benefit of different file systems on different operating systems, a hard disk can be divided into a number of partitions of different sizes. The partitions are also called volumes or minidisks.

Each partition must have at least one directory in which all the files of the partition can be listed. A directory entry is maintained for each file in the directory, and it stores all the information related to that file.

A directory can be viewed as a file which contains the metadata of a bunch of files.

Every Directory supports a number of common operations on the file:

1. File Creation
2. Search for the file
3. File deletion
4. Renaming the file
5. Traversing Files
6. Listing of files

Single Level Directory


The simplest method is to have one big list of all the files on the disk. The entire system contains only one directory, which is supposed to list all the files present in the file system. The directory contains one entry for each file present in the file system. This type of directory can be used for a simple system.

Advantages
1. Implementation is very simple.
2. If the sizes of the files are very small then the searching becomes faster.
3. File creation, searching, and deletion are very simple since we have only one directory.

Disadvantages
1. We cannot have two files with the same name.
2. The directory may be very big, so searching for a file may take a long time.
3. Protection cannot be implemented for multiple users.
4. There is no way to group files of the same kind.
5. Choosing a unique name for every file is complex and limits the number of files in the system, because most operating systems limit the number of characters that can be used in a file name.

Two Level Directory


In a two-level directory system, we can create a separate directory for each user. There is one master directory which contains a separate directory dedicated to each user. For each user, there is a different directory at the second level, containing that user's files. The system doesn't let a user enter another user's directory without permission.
Characteristics of two level directory system
1. Each file has a path name of the form /user-name/file-name.
2. Different users can have files with the same name.
3. Searching becomes more efficient, as only one user's list needs to be traversed.
4. However, files of the same kind cannot be grouped into a single directory for a particular user.

Every operating system maintains a variable, PWD, which contains the present directory name (here, the present user name) so that searching can be done appropriately.

Tree Structured Directory


In a tree-structured directory system, any directory entry can be either a file or a subdirectory. The tree-structured directory system overcomes the drawbacks of the two-level directory system: files of a similar kind can now be grouped in one directory.

Each user has their own directory and cannot enter another user's directory. However, a user has permission to read the root's data but cannot write to or modify it. Only the administrator of the system has complete access to the root directory.

Searching is more efficient in this directory structure, and the concept of a current working directory is used. A file can be accessed by two types of path, either relative or absolute. An absolute path is the path of the file with respect to the root directory of the system (for example, /home/user/notes.txt), while a relative path is the path with respect to the current working directory (for example, notes.txt). In tree-structured directory systems, the user is given the privilege to create files as well as directories.

Permissions on the file and directory


A tree-structured directory system may consist of various levels; therefore, a set of permissions is assigned to each file and directory.
The permissions are R, W and X, for reading, writing and execution of the file or directory. The permissions are assigned to three types of users: owner, group and others.

There is an identification bit which differentiates between a directory and a file: for a directory it is d, and for an ordinary file it is a dash (-).

For example, in a Linux listing entry such as drwxr-xr-x, the initial d indicates a directory, followed by the permissions rwx for the owner, r-x for the group, and r-x for others.

Acyclic-Graph Structured Directories


The tree-structured directory system doesn't allow the same file to exist in multiple directories, so sharing is a major concern in tree-structured directory systems. We can provide sharing by making the directory an acyclic graph. In this system, two or more directory entries can point to the same file or subdirectory, and that file or subdirectory is shared between those directory entries.

These kinds of directory graphs can be made using links or aliases, so we can have multiple paths for the same file. Links can be either symbolic (logical) or hard (physical).

If a file gets deleted in an acyclic-graph-structured directory system, then:

1. In the case of a soft link, the file just gets deleted and we are left with a dangling pointer.

2. In the case of a hard link, the actual file is deleted only when all references to it have been deleted.

Inverted Index
An inverted index is an index data structure storing a mapping from content,
such as words or numbers, to its locations in a document or a set of documents.
In simple words, it is a hashmap like data structure that directs you from a word
to a document or a web page.
There are two types of inverted indexes: a record-level inverted index contains a list of references to documents for each word, while a word-level inverted index additionally contains the positions of each word within a document. The latter form offers more functionality but needs more processing power and space to be created.
Suppose we want to search the texts “hello everyone, ” “this article is based on
inverted index, ” “which is hashmap like data structure”. If we index by (text,
word within the text), the index with location in text is:
hello (1, 1)
everyone (1, 2)
this (2, 1)
article (2, 2)
is (2, 3); (3, 2)
based (2, 4)
on (2, 5)
inverted (2, 6)
index (2, 7)
which (3, 1)
hashmap (3, 3)
like (3, 4)
data (3, 5)
structure (3, 6)
The word “hello” is in document 1 (“hello everyone”) starting at word 1, so it has the entry (1, 1), and the word “is” is in documents 2 and 3 at the 3rd and 2nd positions respectively (here the position is based on the word number).
The index may have weights, frequencies, or other indicators.
Steps to build an inverted index:
 Fetch the document.
 Remove stop words: stop words are the most frequently occurring and least useful words in a document, like “I”, “the”, “we”, “is”, “an”.
 Stem the root word: when searching for “cat”, we also want documents in which the word appears as “cats” or “catty”. To relate such words, part of each word is chopped off to obtain the root word; there are standard tools for this, such as Porter's stemmer.
 Record document IDs: if the word is already present, add a reference to the current document to its index entry; otherwise create a new entry. Additional information such as the frequency and location of the word can also be recorded (see the sketch after the example below).
Example:
Words Document
ant doc1
demo doc2
world doc1, doc2
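A small C sketch that builds a record-level inverted index along these lines (illustrative only: it uses fixed-size tables and whitespace tokenization, and skips stop-word removal and stemming):

#include <stdio.h>
#include <string.h>

#define MAX_TERMS 64
#define MAX_DOCS   8

typedef struct {
    char word[32];         /* the indexed term                      */
    int  docs[MAX_DOCS];   /* posting list: ids of matching docs    */
    int  ndocs;            /* number of entries in the posting list */
} Posting;

static Posting inv[MAX_TERMS];
static int nterms = 0;

/* Record one occurrence of word in document doc_id (no overflow checks, for brevity). */
static void add_occurrence(const char *word, int doc_id) {
    for (int i = 0; i < nterms; i++) {
        if (strcmp(inv[i].word, word) == 0) {
            /* word already indexed: append the doc id unless it was just added */
            if (inv[i].docs[inv[i].ndocs - 1] != doc_id)
                inv[i].docs[inv[i].ndocs++] = doc_id;
            return;
        }
    }
    strcpy(inv[nterms].word, word);   /* first occurrence: create a new entry */
    inv[nterms].docs[0] = doc_id;
    inv[nterms].ndocs = 1;
    nterms++;
}

int main(void) {
    char docs[2][40] = { "hello everyone", "hello inverted index" };
    for (int d = 0; d < 2; d++)
        for (char *w = strtok(docs[d], " "); w != NULL; w = strtok(NULL, " "))
            add_occurrence(w, d + 1);

    for (int i = 0; i < nterms; i++) {   /* print each word's posting list */
        printf("%s:", inv[i].word);
        for (int j = 0; j < inv[i].ndocs; j++)
            printf(" doc%d", inv[i].docs[j]);
        printf("\n");
    }
    return 0;
}

Running it prints hello: doc1 doc2, everyone: doc1, inverted: doc2 and index: doc2, which is exactly a record-level index like the table above.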
Advantages of an inverted index are:
 The inverted index allows fast full-text searches, at the cost of increased processing when a document is added to the database.
 It is easy to develop.
 It is the most popular data structure used in document retrieval systems, used on a large scale, for example, in search engines.
An inverted index also has disadvantages:
 Large storage overhead and high maintenance costs on update, delete and
insert.
 Instead of retrieving the data in a decreasing order of expected usefulness,
the records are retrieved in the order in which they occur in the inverted lists.

Explain adjacency multilist. (HPU BSc 2022)

A multi-linked list is a special type of list that contains two or more logical key sequences. Before checking the details of multi-linked lists, recall what a linked list is: a linked list is a data structure that is free from any size restriction as long as heap memory is available. We have seen different types of linked lists, such as the singly linked list, circular linked list, and doubly linked list. Here we will look at the multi-linked list.
In a multi-linked list, each node can have N number of pointers to other nodes.
A multi-linked list is generally used to organize multiple orders of one set of
elements.
Properties of Multi-Linked List:
The properties of a multi-linked list are mentioned below.
 It is an integrated list of related structures.
 All the nodes are integrated using links of pointers.
 Linked nodes are connected with related data.
 Nodes contain pointers from one structure to the other.
Structure of Multi-linked list:
The structure of a multi-linked list depends on the structure of a node. A single
node generally contains two things:
 A list of pointers
 All the relevant data.
Shown below is the structure of a node that contains a single data field and a list of pointers.


typedef struct node {
    int data;                 /* the node's data                  */
    struct node **pointers;   /* array of links to other nodes    */
    int num_pointers;         /* number of links currently in use */
} Node;

Use cases of Multi-Linked Lists:


Some use cases of a multi-linked list are:
 Multiple orders of one set of elements
 Representation of a sparse matrix
 List of List
Multiple orders of one set of elements:
 A multi-linked list is a more general linked list with multiple links from
nodes.
 For example, suppose the task is to maintain a list in multiple orders, age
and name here, we can define a Node that has two references, an age
pointer and a name pointer.
 Then it is possible to maintain one list, where if we follow the name pointer
we can traverse the list in alphabetical order
 And if we try to traverse the age pointer, we can traverse the list by age also.
 This type of node organization may be useful for maintaining a customer list
in a bank where the same list can be traversed in any order (name, age, or
any other criteria) based on the need. For example, suppose my elements
include the name of a person and his/her age. e.g.
(ANIMESH, 19), (SUMIT, 17), (HARDIK, 22), (ISHA, 18)

(Figure: multiple orders of one set of elements)

Inserting into this structure is very much like inserting the same node into two separate lists. In multi-linked lists it is quite common to have back-pointers, i.e. inverses of each of the forward links; in the above example, this would mean that each node had 4 pointers.
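A C sketch of such a node with two independent orderings (the field names are made up for illustration): following next_by_name traverses the list alphabetically, while following next_by_age traverses it by age.

typedef struct person {
    char name[20];
    int  age;
    struct person *next_by_name;   /* alphabetical ordering  */
    struct person *next_by_age;    /* ascending-age ordering */
} Person;

With the sample data above, the name chain runs ANIMESH → HARDIK → ISHA → SUMIT, while the age chain runs SUMIT → ISHA → ANIMESH → HARDIK.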
Representation of Sparse Matrix:
Multi-linked lists are used to store sparse matrices. A sparse matrix is a matrix that has few non-zero values. If we use a normal array to store such a matrix, it ends up wasting a lot of space.

(Figure: a sparse matrix)

The sparse matrix can be represented by using a linked list for every row and
column.
 A node in a multi-linked list has four parts:
 The first part stores the data.
 The second stores the pointer to the next row.
 Third for the pointer to the next column and
 Fourth for storing the coordinate number of the cell in the matrix.

(Figure: representation of a sparse matrix)
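Following the four parts listed above, a sparse-matrix node in C might look like this (the field names are illustrative):

typedef struct mnode {
    int value;                /* the non-zero value stored in the cell */
    struct mnode *next_row;   /* next non-zero node in the same column */
    struct mnode *next_col;   /* next non-zero node in the same row    */
    int row, col;             /* coordinates of the cell in the matrix */
} MatrixNode;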

List of List:
A multi-linked list can be used to represent a list of lists. For example, we can
create a linked list where each node is itself a list and have pointers to other
nodes.
See the structure below:
 It is a 2-dimensional data structure.
 Here each node has three fields:
 The first field stores the data.
 The second field stores a pointer to the child node.
 The third field stores the pointer to the next node.

(Figure: list of lists (multi-level linked list))
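Matching the three fields just described, a list-of-lists node in C might be (field names illustrative):

typedef struct lnode {
    int data;              /* the node's data           */
    struct lnode *child;   /* pointer to the child list */
    struct lnode *next;    /* pointer to the next node  */
} ListNode;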

Advantages of Multi-Linked List:


The advantages of a multi-linked list are:
 The same set of data can be processed in multiple sequences.
 Data is not duplicated anywhere.
 Data of one kind exists only once in the list.
Comparison of Multi-Linked List with Doubly Linked List:
Let’s first see the structure of a node of Doubly Linked List:


typedef struct node {
    int data;            /* the node's data           */
    struct node *prev;   /* link to the previous node */
    struct node *next;   /* link to the next node     */
} Node;

Comparing Doubly linked list and Multi-linked list:


 Unlike a doubly linked list, nodes in a multi-linked list may or may not have an inverse for each pointer.
 A doubly linked list has exactly two pointers, whereas a multi-linked list can have any number of pointers.
 In a doubly linked list the two pointers are exact inverses of each other, but in a multi-linked list this need not be so.
 A doubly linked list is a special case of a multi-linked list.
