Unit-I: COMP302TH: Data Structure and File Processing
Unit-I
Basic Data Structures: Abstract data structures- stacks, queues, linked
lists and binary trees. Binary trees, balanced trees.
Unit-II
Searching: Internal and external searching, Memory Management:
Garbage collection algorithms for equal sized blocks, storage allocation
for objects with mixed size.
Unit-III
Physical Devices: Characteristics of storage devices such as disks and
tapes, I/O buffering. Basic File System Operations: Create, open, close,
extend, delete, read-block, write-block, protection mechanisms.
Unit-IV
File Organizations: Sequential, indexed sequential, direct, inverted,
multi-list, directory systems, Indexing using B-tree, B+ tree.
Books Recommended:
UNIT-1
Abstract data type in data structure
Before discussing the abstract data type, we should first understand what a data
structure is.
o Mathematical/ Logical/ Abstract models/ Views: The data structure is the way of
organizing the data that requires some protocols or rules. These rules need to be
modeled that come under the logical/abstract model.
o Implementation: The second part is the implementation part. The rules must be
implemented using some programming language.
o These are the essential ingredients used for creating fast and powerful algorithms.
o They help us to manage and organize the data.
o Data structures make the code cleaner and easier to understand.
What is abstract data type?
An abstract data type is an abstraction of a data structure that provides only the
interface to which the data structure must adhere. The interface does not give any
specific details about how something should be implemented or in what programming
language.
In other words, we can say that abstract data types are the entities that are definitions of
data and operations but do not have implementation details. In this case, we know the
data that we are storing and the operations that can be performed on the data, but we
don't know about the implementation details. The reason for not having
implementation details is that every programming language has a different
implementation strategy for example; a C data structure is implemented using structures
while a C++ data structure is implemented using objects and classes.
For example, a List is an abstract data type that can be implemented using a dynamic
array or a linked list. A queue can be implemented using a linked list-based queue, an
array-based queue, or a stack-based queue. A Map can be implemented using a tree
map, a hash map, or a hash table.
Abstraction: It is a technique of hiding the internal details from the user and only
showing the necessary details to the user.
o 4 GB RAM
o Snapdragon 2.2 GHz processor
o 5 inch LCD screen
o Dual camera
o Android 8.0
The above specifications of the smartphone are the data, and we can also perform the
following operations on the smartphone:
The smartphone is an entity whose data (specifications) and operations are given
above; together, these data and operations form the abstract or logical view of the
smartphone.
Suppose we want to store the elements in a stack and let's assume that stack is empty.
We have taken the stack of size 5 as shown below in which we are pushing the elements
one by one until the stack becomes full.
Our stack is now full, as the size of the stack is 5. In the above case, we can observe
that the stack gets filled up from the bottom to the top as we enter new elements.
When we perform a delete operation on the stack, there is only one end for entry and
exit, as the other end is closed. It follows the LIFO pattern, which means that the value
entered first will be removed last. In the above case, the value 5 was entered first, so it
will be removed only after the deletion of all the other elements.
o push(): When we insert an element in a stack then the operation is known as a push. If
the stack is full then the overflow condition occurs.
o pop(): When we delete an element from the stack, the operation is known as a pop. If
the stack is empty means that no element exists in the stack, this state is known as an
underflow state.
o isEmpty(): It determines whether the stack is empty or not.
o isFull(): It determines whether the stack is full or not.
o peek(): It returns the topmost element of the stack without removing it.
o count(): It returns the total number of elements available in a stack.
o change(): It changes the element at the given position.
o display(): It prints all the elements available in the stack.
PUSH operation
The steps involved in the PUSH operation are given below:
o Before inserting an element, we check whether the stack is already full; if it is, the
overflow condition occurs.
o If the stack is not full, the top is incremented by 1, i.e., top = top + 1, and the new
element is placed at the position pointed to by top.
o Elements can be inserted until we reach the max size of the stack.
POP operation
The steps involved in the POP operation are given below:
o Before deleting the element from the stack, we check whether the stack is empty.
o If we try to delete the element from the empty stack, then the underflow condition
occurs.
o If the stack is not empty, we first access the element which is pointed to by the top.
o Once the pop operation is performed, the top is decremented by 1, i.e., top=top-1.
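The PUSH and POP steps above can be sketched as a small array-based stack in C. This is a minimal illustration, not a full library; the size 5 matches the example in the text, and the function names follow the operations listed above.

```c
#include <stdio.h>

#define MAX 5          /* capacity, matching the size-5 stack in the text */

int stack[MAX];
int top = -1;          /* top = -1 means the stack is empty */

int isFull(void)  { return top == MAX - 1; }
int isEmpty(void) { return top == -1; }

/* PUSH: check for overflow, increment top, then insert */
void push(int x) {
    if (isFull()) { printf("Overflow\n"); return; }
    stack[++top] = x;
}

/* POP: check for underflow, read the top element, then decrement top */
int pop(void) {
    if (isEmpty()) { printf("Underflow\n"); return -1; }
    return stack[top--];
}

/* PEEK: return the topmost element without removing it */
int peek(void) { return isEmpty() ? -1 : stack[top]; }
```

Pushing 1 through 5 fills the stack from the bottom to the top; a sixth push triggers the overflow condition, and popping returns the elements in reverse (LIFO) order.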
Applications of Stack
The following are the applications of the stack:
o Balancing of symbols: Stack is used for balancing a symbol. For example, we have the
following program:
int main()
{
    cout<<"Hello";
    cout<<"javaTpoint";
}
As we know, each program has opening and closing braces; when an opening brace
comes, we push it onto a stack, and when a closing brace appears, we pop the matching
opening brace from the stack. Therefore, the net count comes out to be zero. If any
symbol is left in the stack at the end, it means that there is a syntax error in the program.
o String reversal: Stack is also used for reversing a string. For example, we want to reverse
a "javaTpoint" string, so we can achieve this with the help of a stack.
First, we push all the characters of the string in a stack until we reach the null character.
After pushing all the characters, we start taking out the character one by one until we
reach the bottom of the stack.
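The push-then-pop reversal described above can be sketched in C. This is a simple illustration assuming the string fits in a fixed 256-character stack.

```c
#include <string.h>

/* reverse s in place: push all characters on a stack, then pop them back */
void reverse(char *s) {
    char st[256];                 /* character stack (assumed large enough) */
    int top = -1;
    size_t n = strlen(s);
    for (size_t i = 0; i < n; i++)
        st[++top] = s[i];         /* push until the null character */
    for (size_t i = 0; i < n; i++)
        s[i] = st[top--];         /* pop back: last pushed comes out first */
}
```

Applying this to "javaTpoint" yields "tniopTavaj", since the characters come off the stack in reverse order.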
o UNDO/REDO: It can also be used for performing UNDO/REDO operations. For example,
we have an editor in which we write 'a', then 'b', and then 'c'; therefore, the text written in
an editor is abc. So, there are three states, a, ab, and abc, which are stored in a stack.
There would be two stacks in which one stack shows UNDO state, and the other shows
REDO state.
If we want to perform UNDO operation, and want to achieve 'ab' state, then we
implement pop operation.
o Recursion: The recursion means that the function is calling itself again. To maintain the
previous states, the compiler creates a system stack in which all the previous records of
the function are maintained.
o DFS(Depth First Search): This search is implemented on a Graph, and Graph uses the
stack data structure.
o Backtracking: Suppose we have to create a path to solve a maze problem. If we are
moving in a particular path, and we realize that we come on the wrong way. In order to
come at the beginning of the path to create a new path, we have to use the stack data
structure.
o Expression conversion: Stack can also be used for expression conversion. This is one of
the most important applications of stack. The list of the expression conversion is given
below:
o Infix to prefix
o Infix to postfix
o Prefix to infix
o Prefix to postfix
o Postfix to infix
o Memory management: The stack manages the memory. The memory is assigned in the
contiguous memory blocks. The memory is known as stack memory as all the variables
are assigned in a function call stack memory. The memory size assigned to the program
is known to the compiler. When a function is called, all its variables are assigned in
the stack memory. When the function completes its execution, all the variables assigned
in the stack are released.
What is a Queue? Discuss its various
applications. (HPU BCA)
What is a Queue?
Queue is the data structure that is similar to the queue in the real world. A queue is a
data structure in which whatever comes first will go out first, and it follows the FIFO
(First-In-First-Out) policy. Queue can also be defined as the list or collection in which the
insertion is done from one end known as the rear end or the tail of the queue, whereas
the deletion is done from another end known as the front end or the head of the
queue.
The real-world example of a queue is the ticket queue outside a cinema hall, where the
person who enters the queue first gets the ticket first, and the person who enters last
gets the ticket last. A similar approach is followed in the queue data structure.
Types of Queue
There are four different types of queue that are listed as follows -
o Simple Queue or Linear Queue
o Circular Queue
o Priority Queue
o Double Ended Queue (or Deque)
Circular Queue
In a circular queue, the nodes are arranged in a circular fashion. It is similar to the linear
queue except that the last element of the queue is connected to the first element. It is
also known as a Ring Buffer, as the two ends are connected to each other. The
representation of circular queue is shown in the below image -
The drawback that occurs in a linear queue is overcome by using the circular queue. If
the empty space is available in a circular queue, the new element can be added in an
empty space by simply incrementing the value of rear. The main advantage of using the
circular queue is better memory utilization.
Priority Queue
It is a special type of queue in which the elements are arranged based on the priority. It
is a special type of queue data structure in which every element has a priority associated
with it. Suppose some elements occur with the same priority, they will be arranged
according to the FIFO principle. The representation of priority queue is shown in the
below image -
Insertion in priority queue takes place based on the arrival, while deletion in the priority
queue occurs based on the priority. Priority queue is mainly used to implement the CPU
scheduling algorithms.
There are two types of priority queue: the ascending priority queue, in which elements
are removed in increasing order of their priority value, and the descending priority
queue, in which elements are removed in decreasing order of their priority value.
Deque can be used both as a stack and as a queue, as it allows insertion and deletion
at both ends. A deque can behave as a stack because a stack follows the LIFO (Last In
First Out) principle, in which insertion and deletion are performed at one end only, and
in a deque it is possible to perform both insertion and deletion at the same end. A
deque does not have to follow the FIFO principle.
o Input restricted deque - As the name implies, in input restricted queue, insertion
operation can be performed at only one end, while deletion can be performed from both
ends.
o Output restricted deque - As the name implies, in output restricted queue, deletion
operation can be performed at only one end, while insertion can be performed from
both ends.
o Enqueue: The Enqueue operation is used to insert the element at the rear end of the
queue. It returns void.
o Dequeue: It performs the deletion from the front-end of the queue. It also returns the
element which has been removed from the front-end. It returns an integer value.
o Peek: This is the third operation that returns the element, which is pointed by the front
pointer in the queue but does not delete it.
o Queue overflow (isfull): It shows the overflow condition when the queue is completely
full.
o Queue underflow (isempty): It shows the underflow condition when the Queue is
empty, i.e., no elements are in the Queue.
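The enqueue and dequeue operations above can be sketched as a circular array queue in C (the circular variant avoids the wasted space of a plain linear queue). This is a minimal sketch; the size 5 and the `front`/`rear` names are chosen for illustration.

```c
#include <stdio.h>

#define QSIZE 5

int q[QSIZE];
int front = -1, rear = -1;     /* -1 marks an empty queue */

int qempty(void) { return front == -1; }
int qfull(void)  { return (rear + 1) % QSIZE == front; }

/* Enqueue: insert at the rear end */
void enqueue(int x) {
    if (qfull()) { printf("Overflow\n"); return; }
    if (qempty()) front = 0;
    rear = (rear + 1) % QSIZE;     /* wrap around: the circular part */
    q[rear] = x;
}

/* Dequeue: remove and return the element at the front end */
int dequeue(void) {
    if (qempty()) { printf("Underflow\n"); return -1; }
    int x = q[front];
    if (front == rear) front = rear = -1;   /* queue became empty */
    else front = (front + 1) % QSIZE;
    return x;
}
```

Elements come out in the order they went in (FIFO), and freed slots at the front are reused when `rear` wraps around.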
Linked List
o Linked List can be defined as collection of objects called nodes that are randomly stored
in the memory.
o A node contains two fields i.e. data stored at that particular address and the pointer
which contains the address of the next node in the memory.
o The last node of the list contains pointer to the null.
Uses of Linked List
o The list is not required to be contiguously present in the memory. A node can reside
anywhere in the memory and be linked together to make a list. This achieves optimized
utilization of space.
o The list size is limited only by the available memory and doesn't need to be declared in
advance.
o An empty node cannot be present in the linked list.
o We can store values of primitive types or objects in the singly linked list.
However, arrays have the following limitations:
1. The size of an array must be known in advance before using it in the program.
2. Increasing size of the array is a time taking process. It is almost impossible to expand the
size of the array at run time.
3. All the elements in the array need to be contiguously stored in the memory. Inserting
any element in the array needs shifting of all its predecessors.
Linked list is the data structure which can overcome all the limitations of an array. Using
linked list is useful because,
1. It allocates the memory dynamically. All the nodes of linked list are non-contiguously
stored in the memory and linked together with the help of pointers.
2. Sizing is no longer a problem since we do not need to define its size at the time of
declaration. List grows as per the program's demand and limited to the available
memory space.
One way chain or singly linked list can be traversed only in one direction. In other words,
we can say that each node contains only next pointer, therefore we can not traverse the
list in the reverse direction.
Consider an example where the marks obtained by the student in three subjects are
stored in a linked list as shown in the figure.
In the above figure, the arrow represents the links. The data part of every node contains
the marks obtained by the student in the different subject. The last node in the list is
identified by the null pointer which is present in the address part of the last node. We
can have as many elements as we require in the data part of the list.
Complexity
Data structure: Singly Linked List
Time complexity (average): Access θ(n), Search θ(n), Insertion θ(1), Deletion θ(1)
Time complexity (worst): Access O(n), Search O(n), Insertion O(1), Deletion O(1)
Space complexity: O(n)
Node Creation
struct node
{
    int data;
    struct node *next;
};
struct node *head, *ptr;
ptr = (struct node *)malloc(sizeof(struct node));
Insertion
The insertion into a singly linked list can be performed at different positions. Based on
the position of the new node being inserted, the insertion is categorized into the
following categories.
1. Insertion at beginning: It involves inserting an element at the front of the list. We just
need a few link adjustments to make the new node the head of the list.
2. Insertion at end of the list: It involves insertion at the last position of the linked list.
The new node can be inserted as the only node in the list or as the last one; different
logic is implemented in each scenario.
3. Insertion after specified node: It involves insertion after the specified node of the
linked list. We need to skip the desired number of nodes in order to reach the node after
which the new node will be inserted.
1. Deletion at beginning: It involves deletion of a node from the beginning of the list.
This is the simplest operation among all; it just needs a few adjustments in the node
pointers.
2. Deletion at the end of the list: It involves deleting the last node of the list. The list can
either be empty or full; different logic is implemented for the different scenarios.
3. Deletion after specified node: It involves deleting the node after the specified node in
the list. We need to skip the desired number of nodes to reach the node after which the
node will be deleted. This requires traversing the list.
4. Traversing: In traversing, we simply visit each node of the list at least once in order to
perform some specific operation on it, for example, printing the data part of each node
present in the list.
5. Searching: In searching, we match each element of the list with the given element. If
the element is found at any location, then the location of that element is returned;
otherwise null is returned.
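As a self-contained sketch, insertion and deletion at the beginning of a singly linked list can be written in C as below. The node structure matches the one defined earlier; the function names are illustrative.

```c
#include <stdlib.h>

struct node {
    int data;
    struct node *next;
};

struct node *head = NULL;

/* Insertion at beginning: the new node becomes the head of the list */
void insert_beg(int x) {
    struct node *ptr = malloc(sizeof(struct node));
    ptr->data = x;
    ptr->next = head;     /* link the new node to the old head */
    head = ptr;
}

/* Deletion at beginning: free the old head, the second node becomes head */
int delete_beg(void) {
    if (head == NULL) return -1;    /* underflow: list is empty */
    struct node *ptr = head;
    int x = ptr->data;
    head = head->next;
    free(ptr);
    return x;
}
```

Both operations need only a few pointer adjustments, which is why they run in θ(1) time, matching the complexity table above.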
A doubly linked list containing three nodes having numbers from 1 to 3 in their data
part, is shown in the following image.
In C, structure of a node in doubly linked list can be given as :
struct node
{
    struct node *prev;
    int data;
    struct node *next;
};
The prev part of the first node and the next part of the last node will always contain null
indicating end in each direction.
In a singly linked list, we could traverse only in one direction, because each node
contains address of the next node and it doesn't have any record of its previous nodes.
However, doubly linked list overcome this limitation of singly linked list. Due to the fact
that, each node of the list contains the address of its previous node, we can find all the
details about the previous node as well by using the previous address stored inside the
previous part of each node.
Memory Representation of a doubly linked list
Memory Representation of a doubly linked list is shown in the following image.
Generally, a doubly linked list consumes more space for every node and therefore makes
basic operations such as insertion and deletion more expensive. However, we can easily
manipulate the elements of the list since the list maintains pointers in both the
directions (forward and backward).
In the following image, the first element of the list, i.e. 13, is stored at address 1. The
head pointer points to the starting address 1. Since this is the first element added
to the list, the prev of this node contains null. The next node of the list resides at
address 4; therefore, the first node contains 4 in its next pointer.
We can traverse the list in this way until we find any node containing null or -1 in its
next part.
Operations on doubly linked list
Node Creation
struct node
{
    struct node *prev;
    int data;
    struct node *next;
};
struct node *head;
All the remaining operations regarding doubly linked list are described in the following
table.
1. Insertion at beginning: Adding the node into the linked list at the beginning.
2. Insertion at end: Adding the node into the linked list at the end.
3. Insertion after specified node: Adding the node into the linked list after the specified
node.
5. Deletion of the node having given data: Removing the node which is present just after
the node containing the given data.
6. Deletion at the end: Removing the node from the end of the list.
7. Searching: Comparing each node's data with the item to be searched and returning
the location of the item in the list if the item is found, else returning null.
8. Traversing: Visiting each node of the list at least once in order to perform some specific
operation like searching, sorting, display, etc.
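Insertion at the beginning of a doubly linked list needs one extra step compared with the singly linked version: the old head's prev pointer must be updated. A minimal sketch (using an illustrative `dnode` name to keep it self-contained):

```c
#include <stdlib.h>

struct dnode {
    struct dnode *prev;
    int data;
    struct dnode *next;
};

struct dnode *head = NULL;

/* Insertion at beginning of a doubly linked list */
void dinsert_beg(int x) {
    struct dnode *ptr = malloc(sizeof(struct dnode));
    ptr->data = x;
    ptr->prev = NULL;             /* new head has no predecessor */
    ptr->next = head;
    if (head != NULL)
        head->prev = ptr;         /* back-link the old head to the new node */
    head = ptr;
}
```

The extra `prev` maintenance is the "more expensive basic operations" trade-off mentioned above, paid in exchange for traversal in both directions.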
Circular Doubly Linked List
A circular doubly linked list is a more complex type of data structure in which each node
contains pointers to its previous node as well as the next node. A circular doubly linked
list doesn't contain NULL in any of its nodes. The last node of the list contains the
address of the first node of the list, and the first node also contains the address of the
last node in its previous pointer.
Due to the fact that a circular doubly linked list contains three parts in its structure
therefore, it demands more space per node and more expensive basic operations.
However, a circular doubly linked list provides easy manipulation of the pointers and the
searching becomes twice as efficient.
1. Insertion at beginning: Adding a node in circular doubly linked list at the beginning.
2. Insertion at end: Adding a node in circular doubly linked list at the end.
3. Deletion at beginning: Removing a node in circular doubly linked list from the
beginning.
4. Deletion at end: Removing a node in circular doubly linked list at the end.
Binary Tree
A binary tree is a tree in which each node can have a maximum of two children. The
name 'binary' itself suggests 'two'; therefore, each node can have either 0, 1 or 2 children.
In the above tree, node 1 contains two pointers, i.e., left and a right pointer pointing to
the left and right node respectively. The node 2 contains both the nodes (left and right
node); therefore, it has two pointers (left and right). The nodes 3, 5 and 6 are the leaf
nodes, so all these nodes contain NULL pointer on both left and right parts.
For the maximum number of nodes n in a binary tree of height h:
n = 2^(h+1) - 1
n + 1 = 2^(h+1)
log2(n + 1) = log2(2^(h+1))
log2(n + 1) = h + 1
h = log2(n + 1) - 1
As we know that, for the minimum number of nodes in a binary tree of height h:
n = h + 1
h = n - 1
The full binary tree is also known as a strict binary tree. The tree can only be considered
as the full binary tree if each node must contain either 0 or 2 children. The full binary
tree can also be defined as the tree in which each node must contain 2 children except
the leaf nodes.
o The number of leaf nodes is equal to the number of internal nodes plus 1. In the above
example, the number of internal nodes is 5; therefore, the number of leaf nodes is equal
to 6.
o The maximum number of nodes is the same as in any binary tree, i.e., 2^(h+1) - 1.
o The minimum number of nodes in the full binary tree is 2*h - 1.
o The minimum height of the full binary tree is log2(n+1) - 1.
o The maximum height of the full binary tree can be computed as:
n = 2*h - 1
n + 1 = 2*h
h = (n + 1)/2
The above tree is a complete binary tree because all the nodes are completely filled, and
all the nodes in the last level are added at the left first.
A tree is a perfect binary tree if all the internal nodes have 2 children, and all the leaf
nodes are at the same level.
Let's look at a simple example of a perfect binary tree.
The below tree is not a perfect binary tree because all the leaf nodes are not at the same
level.
Note: All perfect binary trees are complete binary trees as well as full binary trees, but
vice versa is not true, i.e., not every complete binary tree or full binary tree is a perfect
binary tree.
A balanced binary tree is a tree in which the heights of the left and right subtrees of
every node differ by at most 1. For example, AVL trees and Red-Black trees are balanced
binary trees.
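The "heights differ by at most 1" condition can be checked recursively. Below is a minimal sketch in C (the O(n^2) version, recomputing heights at every node, chosen for clarity rather than efficiency):

```c
#include <stdlib.h>

struct node { int data; struct node *left, *right; };

static int max(int a, int b) { return a > b ? a : b; }

/* height of a tree: -1 for an empty tree, 0 for a single node */
int height(struct node *t) {
    if (t == NULL) return -1;
    return 1 + max(height(t->left), height(t->right));
}

/* balanced: at every node, left and right heights differ by at most 1 */
int isBalanced(struct node *t) {
    if (t == NULL) return 1;
    if (abs(height(t->left) - height(t->right)) > 1) return 0;
    return isBalanced(t->left) && isBalanced(t->right);
}
```

Self-balancing trees such as AVL trees restore exactly this property after every insertion or deletion, which keeps searches at O(log n).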
struct node
{
    int data;
    struct node *left, *right;
};
In the above structure, data is the value, left pointer contains the address of the left
node, and right pointer contains the address of the right node.
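Using this structure, a small tree can be built and traversed. The sketch below is illustrative: `newNode` is an assumed helper, and `inorder` writes the visited values into an array so the result can be inspected (an inorder traversal visits left subtree, root, then right subtree).

```c
#include <stdlib.h>

struct node { int data; struct node *left, *right; };

/* allocate a node; leaf nodes hold NULL on both left and right */
struct node *newNode(int x) {
    struct node *n = malloc(sizeof(struct node));
    n->data = x;
    n->left = n->right = NULL;
    return n;
}

/* inorder traversal (left, root, right); *k counts elements written */
void inorder(struct node *t, int out[], int *k) {
    if (t == NULL) return;
    inorder(t->left, out, k);
    out[(*k)++] = t->data;
    inorder(t->right, out, k);
}
```

For a root 2 with left child 1 and right child 3, the traversal yields 1, 2, 3.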
UNIT-2
Write an algorithm for internal and external searching
techniques. (HPU Bsc 2022)
Searching
There are two popular search methods that are widely used to search for an item in
a list. The choice of algorithm depends upon the arrangement of the list.
* Linear Search
* Binary Search
Internal searching
When all the records to be searched are kept in the main memory then such
searching is termed as internal searching.
External searching
When the number of elements is large and all of them cannot be stored in the main
memory, they are kept on a secondary storage device; this kind of searching is termed
external searching.
Internal searching requires primary memory such as RAM, whereas external searching
requires external storage such as a hard disk or floppy disk.
1. Sequential Search
This is the traditional technique for searching an element in a collection of
elements. In this type of search, all the elements of the list are traversed one
by one to find whether the element is present in the list or not. For example,
suppose we need to search for an element ITEM in an array ARR. For this, LOC is
initially assigned -1, which indicates that ITEM is not present in ARR. ITEM is
compared with the data at each location of ARR, and once ITEM == ARR[N], LOC is
updated with location N+1 (using 1-based positions). Hence we obtain the position
of ITEM in ARR, if it exists.
Algorithm:
LSEARCH(ARR, N, ITEM, LOC). Here ARR is the array of N elements, ITEM holds the
value we need to search in the array, and the algorithm returns LOC, the location
where ITEM is present in ARR. Initially, we have to set LOC = -1.
1. Set i = 1.
2. Repeat step 3 while i <= N.
3. If ARR[i] = ITEM, then set LOC = i and go to step 4; otherwise set i = i + 1.
4. Exit.
Let's say, below is an ARR with 10 elements, and we need to find whether ITEM = 52
is present in it. We traverse the array one by one, comparing each element with 52
until we find it or reach the end.
Space complexity
As the linear search algorithm does not use any extra space, its space complexity
is O(1).
Time complexity
Best case complexity: O(1) - this case occurs when the element to be searched is the
first element of the array.
Worst case complexity: O(n) - this case occurs when the element is at the last
position or is not present in the array at all.
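The sequential search above maps directly to a short C function. This is a minimal sketch using 0-based C indexing (the algorithm text uses 1-based positions):

```c
/* returns the index of item in arr[0..n-1], or -1 when it is absent */
int lsearch(const int arr[], int n, int item) {
    for (int loc = 0; loc < n; loc++)
        if (arr[loc] == item)
            return loc;       /* found: stop at the first match */
    return -1;                /* traversed the whole array: not found */
}
```

The loop inspects at most n elements, giving the O(n) worst case and O(1) best case noted above.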
2. Binary Search
This is a technique to search for an element in the list using the divide and
conquer approach. This type of technique is used in the case of sorted lists:
instead of comparing elements one by one, it directly goes to the middle element
of the list, divides the array into 2 parts, and decides in which half the element
can lie, given that the list is sorted in increasing order. With every step of
this algorithm, the searching is confined within BEG and END, which are the
beginning and ending indices of sub-arrays. The index MID defines the middle
index of the array, where
MID = INT((BEG + END) / 2)
It needs to be checked whether ITEM = ARR[MID], where ITEM is the element that we
need to search.
If ITEM < ARR[MID], then ITEM can appear only in the left sub-array, so END is set
to MID - 1.
If ITEM > ARR[MID], then ITEM can appear only in the right sub-array, so BEG is set
to MID + 1.
After this, MID is again calculated for the respective sub-array; if we don't find
the element, the process repeats until the sub-array becomes empty.
Algorithm:
BSEARCH(ARR, LB, UB, ITEM, LOC). Here, ARR is a sorted list of elements, LB and UB
are the lower and upper bounds of the array, ITEM is the value to be searched, and
the algorithm returns LOC, the index at which ITEM is present (or -1 if it is not
found).
1. Set BEG = LB, END = UB and MID = INT((BEG + END) / 2).
2. Repeat steps 3 and 4 while BEG <= END and ARR[MID] != ITEM.
3. If ITEM < ARR[MID], then:
       Set END = MID - 1.
   Else:
       Set BEG = MID + 1.
4. Set MID = INT((BEG + END) / 2).
5. If ARR[MID] = ITEM, then set LOC = MID, else set LOC = -1.
6. Exit.
Step 1: ARR[MID] < ITEM: thus END = 9 and BEG = MID + 1 = 6. Thus our new
sub-array is the right half. Recomputing MID for this sub-array, we find
ARR[MID] = 52 = ITEM.
Thus LOC = 6
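The BSEARCH steps above can be sketched in C as follows. This is a 0-indexed sketch; the name `bsearch_int` is chosen to avoid clashing with the standard library's `bsearch`.

```c
/* arr must be sorted in ascending order; returns the index of item or -1 */
int bsearch_int(const int arr[], int n, int item) {
    int beg = 0, end = n - 1;
    while (beg <= end) {
        int mid = (beg + end) / 2;        /* MID = INT((BEG + END) / 2) */
        if (arr[mid] == item)
            return mid;
        else if (item < arr[mid])
            end = mid - 1;                /* item can only be in the left half */
        else
            beg = mid + 1;                /* item can only be in the right half */
    }
    return -1;                            /* sub-array became empty: not found */
}
```

Each iteration halves the search range, giving the O(log n) running time that makes binary search preferable to linear search on large sorted arrays.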
Conclusion
Searching refers to finding the location of one element in an array of n
elements. There are 2 types of search: linear and binary search. Linear search
works on both sorted and unsorted lists, whereas binary search can
only be used in the case of a sorted list of elements. In case the size of the array
is large, binary search is preferred as it is much faster.
Interpolation Search
This technique is used if the items to be searched are uniformly distributed between the
first and the last location. This technique is a simple modification in the binary search
when MID is calculated.
Advantages
1. If the items are uniformly distributed, the average case time complexity is log2(log2(n)).
2. It is considered an improvement in binary search.
Disadvantages
1. If the items are not uniformly distributed, the worst case time complexity degrades
to O(n).
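The only change from binary search is how the probe position is computed: instead of the middle index, interpolation search estimates the position from the value being sought. A minimal sketch, assuming a sorted (and ideally uniformly distributed) array:

```c
/* arr must be sorted ascending; works best when values are uniform */
int isearch(const int arr[], int n, int item) {
    int lb = 0, ub = n - 1;
    while (lb <= ub && item >= arr[lb] && item <= arr[ub]) {
        if (arr[ub] == arr[lb])                 /* avoid division by zero */
            return arr[lb] == item ? lb : -1;
        /* probe position interpolated from the value, not the middle */
        int mid = lb + (int)((long)(item - arr[lb]) * (ub - lb)
                             / (arr[ub] - arr[lb]));
        if (arr[mid] == item) return mid;
        else if (arr[mid] < item) lb = mid + 1;
        else ub = mid - 1;
    }
    return -1;
}
```

For a uniformly spaced array like {10, 20, 30, 40, 50}, the first probe for 30 lands exactly on it, illustrating the log2(log2(n)) average-case behaviour mentioned above.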
Memory Management
In this article, we will understand memory management in detail.
Memory is the important part of the computer that is used to store the data. Its
management is critical to the computer system because the amount of main memory
available in a computer system is very limited. At any time, many processes are
competing for it. Moreover, to increase performance, several processes are executed
simultaneously. For this, we must keep several processes in the main memory, so it is
even more important to manage them effectively.
o Memory manager is used to keep track of the status of memory locations, whether it is
free or allocated. It addresses primary memory by providing abstractions so that
software perceives a large memory is allocated to it.
o Memory manager permits computers with a small amount of main memory to execute
programs larger than the size or amount of available memory. It does this by moving
information back and forth between primary memory and secondary memory by using
the concept of swapping.
o The memory manager is responsible for protecting the memory allocated to each
process from being corrupted by another process. If this is not ensured, then the system
may exhibit unpredictable behavior.
o Memory managers should enable sharing of memory space between processes. Thus,
two programs can reside at the same memory location although at different times.
Advantages of the single contiguous memory management scheme:
o Simple to implement.
o Easy to manage and design.
o In a Single contiguous memory management scheme, once a process is loaded, it is
given full processor's time, and no other processor will interrupt it.
Disadvantages of the single contiguous memory management scheme:
o Wastage of memory space due to unused memory, as the process is unlikely to use all
the available memory space.
o The CPU remains idle, waiting for the disk to load the binary image into the main
memory.
o It can not be executed if the program is too large to fit the entire available main memory
space.
o It does not support multiprogramming, i.e., it cannot handle multiple programs
simultaneously.
Multiple Partitioning:
Fixed Partitioning
The main memory is divided into several fixed-sized partitions in a fixed partition
memory management scheme or static partitioning. These partitions can be of the same
size or different sizes. Each partition can hold a single process. The number of partitions
determines the degree of multiprogramming, i.e., the maximum number of processes in
memory. These partitions are made at the time of system generation and remain fixed
after that.
o Simple to implement.
o Easy to manage and design.
Dynamic Partitioning
o Simple to implement.
o Easy to manage and design.
What is paging?
Paging is a memory management technique in which the process address space is
broken into fixed-size blocks called pages, which are mapped onto frames of the same
size in the main memory.
Advantages of paging:
o It eliminates external fragmentation.
o The frames allocated to a process do not have to be contiguous.
What is Segmentation?
Segmentation is a memory management technique in which the process address space
is divided into variable-size logical units called segments, such as the code, stack and
data segments.
There are many garbage collection algorithms that run in the background, one of
which is mark and sweep.
All the objects which are created dynamically (using new in C++ and Java) are
allocated memory in the heap. If we keep creating objects, we might get an Out Of
Memory error, since at some point it is no longer possible to allocate heap memory to
new objects. So we need
referenced by the program (or the unreachable objects) so that the space is made
available for subsequent new objects. This memory can be released by the
programmer itself but it seems to be an overhead for the programmer, here garbage
collection comes to our rescue, and it automatically releases the heap memory for all
the unreferenced objects.
Any garbage collection algorithm must perform 2 basic operations. One, it should be
able to detect all the unreachable objects and secondly, it must reclaim the heap
space used by the garbage objects and make the space available again to the
program. The above operations are performed by Mark and Sweep Algorithm in two
phases as listed and described further as follows:
Mark phase
Sweep phase
Phase 1: Mark Phase
When an object is created, its mark bit is set to 0 (false). In the Mark phase, we set the
marked bit for all the reachable objects (or the objects which a user can refer to) to
1(true). Now to perform this operation we simply need to do a graph traversal,
a depth-first search approach would work for us. Here we can consider every object
as a node and then all the nodes (objects) that are reachable from this node (object)
are visited and it goes on till we have visited all the reachable nodes.
The root is a variable that refers to an object and is directly accessible by a local
variable. We will assume that we have one root only.
We can access the mark bit for an object by ‘markedBit(obj)’.
Algorithm: Mark phase
Mark(root)
If markedBit(root) = false then
markedBit(root) = true
For each v referenced by root
Mark(v)
Note: If we have more than one root, then we simply have to call Mark() for all the root
variables.
Phase 2: Sweep Phase
As the name suggests, it "sweeps" the unreachable objects, i.e. it clears the heap
memory for all the unreachable objects. All those objects whose marked value is set to
false are cleared from the heap memory; for all other objects (the reachable objects),
the marked bit was set to true during the mark phase.
The mark value of all the reachable objects is then reset to false, so that the algorithm
can run again (if required) and the mark phase can once more mark all the reachable
objects.
Algorithm: Sweep Phase
Sweep()
    For each object p in heap
        If markedBit(p) = true then
            markedBit(p) = false
        else
            heap.release(p)
The mark-and-sweep algorithm is called a tracing garbage collector because it traces
out the entire collection of objects that are directly or indirectly accessible by the
program.
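The two phases above can be sketched in Python on a toy object graph. The Obj class, its field names, and the sample objects are illustrative assumptions, not part of any real collector:

```python
# A toy sketch of mark-and-sweep. Each object carries a mark bit
# (markedBit in the pseudocode) and a list of references.

class Obj:
    def __init__(self, name):
        self.name = name
        self.marked = False   # mark bit, initially 0 (false)
        self.refs = []        # objects this object references

def mark(root):
    """Mark phase: depth-first traversal from the root."""
    if not root.marked:
        root.marked = True
        for v in root.refs:
            mark(v)

def sweep(heap):
    """Sweep phase: drop unmarked objects, reset marks for the next cycle."""
    live = []
    for p in heap:
        if p.marked:
            p.marked = False   # reset so the next mark phase starts clean
            live.append(p)
        # unmarked objects are released (heap.release(p))
    return live

# Usage: a and b are reachable from the root; c is garbage.
a, b, c = Obj("a"), Obj("b"), Obj("c")
a.refs.append(b)
heap = [a, b, c]
mark(a)
heap = sweep(heap)
print([p.name for p in heap])  # -> ['a', 'b']
```

Note that sweep also resets the mark bits of the survivors, matching the pseudocode: the next collection cycle starts with every live object unmarked.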
Example: Initially, all the objects have their mark bits set to false.

fact(int n)
{
    if (n <= 1)
        return 1;
    else
        return (n * fact(n - 1));
}
fact(6)
I/O Buffering
A buffer may be used when moving data between processes within a computer. Buffers
can be implemented in a fixed memory location in hardware or by using a virtual data
buffer in software that points at a location in physical memory. In all cases, the data in
a data buffer are stored on a physical storage medium.
Most buffers are implemented in software, which typically uses the faster RAM to store
temporary data due to the much faster access time than hard disk drives. Buffers are
typically used when there is a difference between the rate of received data and the rate
of processed data, for example, in a printer spooler or online video streaming.
Purpose of Buffering
You encounter buffering while watching videos on YouTube or live streams. In a video
stream, a buffer represents the amount of data that must be downloaded before the
video can play for the viewer in real time. In a computer environment, a buffer means
that a set amount of data is stored in order to preload required data before the CPU
uses it.
Computers have many different devices that operate at varying speeds, and a buffer is
needed to act as a temporary placeholder between everything that interacts. This
keeps everything running efficiently and without issues between all the devices,
programs, and processes running at that time. There are three reasons for buffering
data:
1. It helps in matching the speeds of two devices between which data is transmitted. For
example, a hard disk has to store a file received from a modem. The transmission speed
of a modem is slow compared to that of the hard disk, so the bytes coming from the
modem are accumulated in the buffer space, and when all the bytes of the file have
arrived at the buffer, the entire data is written to the hard disk in a single operation.
2. It helps devices with different data-transfer sizes adapt to each other, and it helps
devices manipulate data before sending or receiving it. In computer networking, a large
message is fragmented into small fragments and sent over the network. The fragments
are accumulated in the buffer at the receiving end and reassembled to form the
complete large message.
3. It also supports copy semantics. With copy semantics, the version of data in the buffer is
guaranteed to be the version at the time of the system call, irrespective of any
subsequent changes to the data in the user's buffer. Buffering increases device
performance by overlapping the I/O of one job with the computation of the same job.
Types of Buffering
There are three main types of buffering in the operating system, such as:
1. Single Buffer
In single buffering, only one buffer is used to transfer data between two devices.
The producer produces one block of data into the buffer; after that, the consumer
consumes it. Only when the buffer is empty does the producer produce data again.
Stream-oriented device: The following operations are performed for stream-oriented
devices:
o Line-at-a-time operation is used for scroll-mode terminals. The user inputs one line at a
time, with a carriage return signaling the end of the line.
o Byte-at-a-time operation is used in forms-mode terminals, where each keystroke is
significant.
2. Double Buffer
In double buffering, two buffers are used in place of one. The producer fills one buffer
while the consumer simultaneously consumes the other, so the producer does not need
to wait for the buffer to be emptied. Double buffering is also known as buffer swapping.
Block oriented: This is how a double buffer works. There are two buffers in the system:
o The driver or controller uses one buffer to store data while waiting for it to be taken by a
higher level of the hierarchy.
o The other buffer is used to store data from the lower-level module.
o A major disadvantage of double buffering is that it increases the complexity of the
process.
o If the process performs rapid bursts of I/O, double buffering may be insufficient.
o With line-at-a-time I/O, the user process does not need to be suspended for input or
output unless it runs ahead of the double buffer.
o For byte-at-a-time operations, the double buffer offers no advantage over a single buffer
of twice the length.
3. Circular Buffer
When more than two buffers are used, the collection of buffers is called a circular buffer.
Each buffer is one unit in the circular buffer. The data-transfer rate increases with a
circular buffer compared with double buffering.
o Data does not pass directly from the producer to the consumer, because the data could
change if a buffer were overwritten before being consumed.
o The producer can only fill up to buffer x-1 while the data in buffer x is waiting to be
consumed.
o Buffering deals effectively with a speed mismatch between the producer and the
consumer of a data stream.
o A buffer is created in main memory to accumulate the bytes received from, say, a
modem.
o After the data is received in the buffer, it is transferred from the buffer to disk in a single
operation.
o This transfer is not instantaneous, so the modem needs another buffer to store
additional incoming data.
o When the first buffer is filled, a request is made to transfer its data to disk.
o The modem then fills the second buffer with additional incoming data while the data in
the first buffer is transferred to the disk.
o When both buffers have completed their tasks, the modem switches back to the first
buffer while the data from the second buffer is transferred to disk.
o The two buffers decouple the producer from the consumer of the data, thus minimizing
the time each waits for the other.
o Buffering also accommodates devices that have different data-transfer sizes.
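The circular-buffer rule above (the producer may fill at most x-1 slots so it never overwrites data not yet consumed) can be sketched as follows. The class and method names are our own, chosen only for illustration:

```python
# A minimal ring-buffer sketch with a fixed capacity of x slots.
# One slot is always left empty to distinguish "full" from "empty",
# so the producer can fill at most x - 1 slots.

class CircularBuffer:
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0            # next slot the consumer reads
        self.tail = 0            # next slot the producer writes
        self.capacity = capacity

    def is_full(self):
        return (self.tail + 1) % self.capacity == self.head

    def is_empty(self):
        return self.head == self.tail

    def produce(self, item):
        if self.is_full():
            return False         # producer must wait
        self.buf[self.tail] = item
        self.tail = (self.tail + 1) % self.capacity
        return True

    def consume(self):
        if self.is_empty():
            return None          # consumer must wait
        item = self.buf[self.head]
        self.head = (self.head + 1) % self.capacity
        return item

rb = CircularBuffer(4)           # x = 4, so at most 3 items at once
for byte in [10, 20, 30]:
    rb.produce(byte)
print(rb.produce(40))            # -> False (buffer full at x - 1 items)
print(rb.consume())              # -> 10 (items leave in arrival order)
```

Leaving one slot empty is a common design choice; the alternative is to keep an explicit count of stored items.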
Advantages of Buffer
Buffering plays a very important role in any operating system during the execution of
any process or task. It has the following advantages.
o The use of buffers allows uniform disk access. It simplifies system design.
o The system places no data alignment restrictions on user processes doing I/O. By
copying data from user buffers to system buffers and vice versa, the kernel eliminates
the need for special alignment of user buffers, making user programs simpler and more
portable.
o The use of the buffer can reduce the amount of disk traffic, thereby increasing overall
system throughput and decreasing response time.
o The buffer algorithms help ensure file system integrity.
Disadvantages of Buffer
Buffers are not better in all respects, and they have a few disadvantages, such as:
o It is costly and impractical to have the buffer be the exact size required to hold the
number of elements. Thus, the buffer is slightly larger most of the time, with the rest of
the space being wasted.
o Buffers have a fixed size at any point in time. When the buffer is full, it must be
reallocated with a larger size, and its elements must be moved. Similarly, when the
number of valid elements in the buffer is significantly smaller than its size, the buffer
must be reallocated with a smaller size and elements be moved to avoid too much
waste.
o Use of the buffer requires an extra data copy when reading from and writing to user
processes. When transmitting large amounts of data, the extra copy slows down
performance.
Basic File System Operations
1. Create operation:
This operation is used to create a file in the file system. It is the most widely used
operation performed on the file system. To create a new file of a particular type, the
associated application program calls the file system, which allocates space to the file.
As the file system knows the format of the directory structure, an entry for the new
file is made in the appropriate directory.
2. Open operation:
This is a common operation performed on a file. Once a file is created, it must be
opened before the file processing operations can be performed. To open a particular
file, the user provides its file name to the file system; the operating system invokes the
open system call and passes the file name to the file system.
3. Write operation:
This operation is used to write information into a file. A write system call is issued
that specifies the name of the file and the length of the data to be written. The file
length is then increased by the specified value, and the file pointer is repositioned
after the last byte written.
4. Read operation:
This operation reads the contents of a file. A read pointer is maintained by the OS,
pointing to the position up to which the data has been read.
5. Seek operation:
The seek system call repositions the file pointer from the current position to a specific
place in the file, i.e. forward or backward depending upon the user's requirement. This
operation is generally performed with those file management systems that support
direct-access files.
6. Delete operation:
Deleting a file not only deletes all the data stored inside the file but also frees the disk
space occupied by it. In order to delete the specified file, the directory is searched;
when the directory entry is located, all the associated file space and the directory
entry are released.
7. Truncate operation:
Truncating deletes the contents of a file while keeping its attributes. The file itself is
not deleted; only the information stored inside it is removed.
8. Close operation:
When the processing of the file is complete, it should be closed so that all the changes
made are permanent and all the resources occupied are released. On closing, the OS
deallocates all the internal descriptors that were created when the file was opened.
9. Append operation:
This operation writes data at the end of an existing file; the file pointer is positioned at
the end of the file before the data is written.
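The operations above can be demonstrated with Python's os module, which exposes thin wrappers over the underlying system calls. The file name demo.txt is only an illustration:

```python
# Walking through the basic file operations via the os module.
import os

# Create + open (O_CREAT creates the file if absent; O_TRUNC makes
# this run deterministic by discarding any previous contents).
fd = os.open("demo.txt", os.O_CREAT | os.O_RDWR | os.O_TRUNC)

os.write(fd, b"hello world")        # write: pointer ends after the last byte
os.lseek(fd, 0, os.SEEK_SET)        # seek: reposition pointer to the start
data = os.read(fd, 5)               # read: the first 5 bytes
os.lseek(fd, 0, os.SEEK_END)        # append: position at end of file
os.write(fd, b"!")
os.ftruncate(fd, 5)                 # truncate: keep only the first 5 bytes
os.close(fd)                        # close: release the descriptor

print(data)                         # -> b'hello'
print(os.path.getsize("demo.txt"))  # -> 5
os.remove("demo.txt")               # delete: entry removed, space freed
```

Note how truncate changes the file's length without removing its directory entry, while remove (the delete operation) releases both.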
Unit-IV
Define file organization and elaborate three types of file organization. (HPU BSc 2022)
Sequential File Organization
This method is the simplest method of file organization. In this method, files are stored
sequentially. This method can be implemented in two ways:
o Pile file method: records are stored one after another in the order in which they are
inserted.
o Sorted file method: records are stored in sorted order based on a key; this method
takes more time and space for sorting the records.
Indexed sequential access method (ISAM)
ISAM method is an advanced sequential file organization. In this method, records are
stored in the file using the primary key. An index value is generated for each primary key
and mapped with the record. This index contains the address of the record in the file.
If any record has to be retrieved based on its index value, then the address of the data
block is fetched and the record is retrieved from the memory.
Pros of ISAM:
o Since each record has the address of its data block in this method, searching for a
record in a huge database is quick and easy.
o This method supports range retrieval and partial retrieval of records. Since the index is
based on the primary key values, we can retrieve the data for a given range of values. In
the same way, a partial value can also be easily searched; for instance, student names
starting with 'JA' can be found easily.
Cons of ISAM
o This method requires extra space in the disk to store the index value.
o When the new records are inserted, then these files have to be reconstructed to maintain
the sequence.
o When the record is deleted, then the space used by it needs to be released. Otherwise,
the performance of the database will slow down.
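The ISAM idea (an index mapping each primary key to a record's address) can be sketched in Python. The record layout, keys, and function names here are illustrative assumptions, not a real ISAM implementation:

```python
# A simplified ISAM-style sketch: records live in a "file" (a list),
# and an index maps each primary key to the record's address.

records = [
    (101, "JANE"), (102, "JACK"), (103, "RAVI"), (104, "MEERA"),
]
index = {key: addr for addr, (key, _) in enumerate(records)}

def fetch(key):
    """Direct retrieval: follow the index to the record's address."""
    addr = index.get(key)
    return None if addr is None else records[addr]

def range_fetch(lo, hi):
    """Range retrieval: all records with keys between lo and hi."""
    return [records[index[k]] for k in sorted(index) if lo <= k <= hi]

def prefix_fetch(prefix):
    """Partial retrieval, e.g. names starting with 'JA'."""
    return [r for r in records if r[1].startswith(prefix)]

print(fetch(103))             # -> (103, 'RAVI')
print(range_fetch(101, 102))  # -> [(101, 'JANE'), (102, 'JACK')]
print(prefix_fetch("JA"))     # -> [(101, 'JANE'), (102, 'JACK')]
```

The cons listed above show up even in this sketch: the index is extra space, and inserting or deleting a record means updating both the record list and the index.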
File Organization
File organization defines how file records are mapped onto disk blocks. There are four
common types of file organization for organizing file records: heap, sequential, hash,
and clustered file organization.
What is a directory?
A directory can be defined as a listing of the related files on the disk. The directory may
store some or all of the file attributes.
To get the benefit of different file systems on different operating systems, a hard disk
can be divided into a number of partitions of different sizes. The partitions are also
called volumes or minidisks.
Each partition must have at least one directory in which all the files of the partition can
be listed. A directory entry is maintained for each file in the directory, storing all the
information related to that file.
A directory can be viewed as a file which contains the metadata of a bunch of files.
The following operations can be performed on a directory:
1. File creation
2. Searching for a file
3. File deletion
4. Renaming a file
5. Traversing files
6. Listing of files
Single-Level Directory
In a single-level directory, all the files of the system are contained in one directory.
Advantages
1. Implementation is very simple.
2. If the sizes of the files are very small, then searching becomes faster.
3. File creation, searching, and deletion are very simple since we have only one directory.
Disadvantages
1. We cannot have two files with the same name.
2. The directory may be very big, so searching for a file may take a lot of time.
3. Protection cannot be implemented for multiple users.
4. There is no way to group the same kind of files.
5. Choosing a unique name for every file is a bit complex and limits the number of files in
the system, because most operating systems limit the number of characters used to
construct a file name.
Two-Level Directory
In a two-level directory structure, each user has their own directory. Every operating
system maintains a variable such as PWD, which contains the present directory name
(present user name) so that searching can be done appropriately.
A user cannot enter another user's directory. However, a user has permission to read
the root's data but cannot write to or modify it. Only the administrator of the system
has complete access to the root directory.
Tree-Structured Directory
Searching is more efficient in this directory structure. The concept of a current working
directory is used, and a file can be accessed by two types of path, either relative or
absolute. The absolute path is the path of the file with respect to the root directory of
the system, while the relative path is the path with respect to the current working
directory. In tree-structured directory systems, the user is given the privilege to create
files as well as directories.
There is an identification bit which differentiates between a directory and a file: for a
directory it is d, while for a regular file it is a hyphen (-). In the permission string shown
for a file in a Linux-based system, an initial d indicates that the entry is a directory.
Acyclic-Graph Directory
These kinds of directory graphs can be made using links or aliases, so we can have
multiple paths to the same file. Links can either be symbolic (logical) or hard (physical).
1. In the case of a symbolic link, deleting the original file leaves any remaining symbolic
links dangling.
2. In the case of a hard link, the actual file will be deleted only when all the references to
it get deleted.
Inverted Index
An inverted index is an index data structure storing a mapping from content,
such as words or numbers, to its locations in a document or a set of documents.
In simple words, it is a hashmap like data structure that directs you from a word
to a document or a web page.
There are two types of inverted indexes: a record-level inverted index contains a list of
references to documents for each word, while a word-level inverted index additionally
contains the positions of each word within a document. The latter form offers more
functionality but needs more processing power and space to be created.
Suppose we want to search the texts “hello everyone, ” “this article is based on
inverted index, ” “which is hashmap like data structure”. If we index by (text,
word within the text), the index with location in text is:
hello (1, 1)
everyone (1, 2)
this (2, 1)
article (2, 2)
is (2, 3); (3, 2)
based (2, 4)
on (2, 5)
inverted (2, 6)
index (2, 7)
which (3, 1)
hashmap (3, 3)
like (3, 4)
data (3, 5)
structure (3, 6)
The word "hello" is in document 1 ("hello everyone") starting at word 1, so it has the
entry (1, 1), and the word "is" is in documents 2 and 3 at the 3rd and 2nd positions
respectively (here the position is based on the word count).
The index may have weights, frequencies, or other indicators.
Steps to build an inverted index:
1. Fetch the document.
2. Remove stop words: stop words are the most frequent and least useful words in a
document, such as "I", "the", "we", "is", "an".
3. Stem each word to its root: whenever we search for "cat", we also want documents
where the word appears as "cats" or "catty". To relate these words, a part of each word
read is chopped off to obtain the "root word". There are standard tools for performing
this, such as Porter's Stemmer.
4. Record document IDs: if the word is already present, add a reference to the document
in the index; otherwise, create a new entry. Additional information, such as the
frequency and location of the word, can also be added.
Example:

Word      Documents
ant       doc1
demo      doc2
world     doc1, doc2
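A word-level inverted index over the three example texts above can be built in a few lines. This sketch omits the stop-word removal and stemming steps for brevity:

```python
# Building a word-level inverted index: each word maps to a list of
# (document id, word position) pairs, as in the example above.
import re
from collections import defaultdict

docs = {
    1: "hello everyone",
    2: "this article is based on inverted index",
    3: "which is hashmap like data structure",
}

index = defaultdict(list)
for doc_id, text in docs.items():
    # positions are 1-based word counts within each document
    for pos, word in enumerate(re.findall(r"\w+", text.lower()), start=1):
        index[word].append((doc_id, pos))

print(index["hello"])  # -> [(1, 1)]
print(index["is"])     # -> [(2, 3), (3, 2)]
```

Dropping the position from each pair (keeping only the document ids) turns this into the record-level variant described earlier.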
Advantages of an inverted index:
It allows fast full-text searches, at the cost of increased processing when a
document is added to the database.
It is easy to develop.
It is the most popular data structure used in document retrieval systems,
used on a large scale, for example, in search engines.
An inverted index also has disadvantages:
Large storage overhead and high maintenance costs on update, delete and
insert.
Instead of retrieving the data in decreasing order of expected usefulness,
the records are retrieved in the order in which they occur in the inverted lists.
A multi-linked list is a special type of list that contains two or more logical key
sequences. Before looking at the details of a multi-linked list, recall what a linked
list is: a data structure that is free from any size restriction until the heap memory
is full. We have seen different types of linked lists, such as the singly linked list,
circular linked list, and doubly linked list. Here we will look at the multi-linked list.
In a multi-linked list, each node can have N number of pointers to other nodes.
A multi-linked list is generally used to organize multiple orders of one set of
elements.
Properties of Multi-Linked List:
The properties of a multi-linked list are mentioned below.
It is an integrated list of related structures.
All the nodes are integrated using links of pointers.
Linked nodes are connected with related data.
Nodes contain pointers from one structure to the other.
Structure of Multi-linked list:
The structure of a multi-linked list depends on the structure of a node. A single
node generally contains two things:
A list of pointers
All the relevant data.
Shown below is the structure of a node that contains one data field and a list of
pointers, sketched in C (the number of pointers per node is an assumption):

typedef struct Node {
    int data;                 /* the relevant data */
    struct Node *next[2];     /* one pointer per logical ordering */
} Node;
Inserting into this structure is very much like inserting the same node into two
separate lists. In multi-linked lists it is quite common to have back-pointers, i.e.
inverses of each of the forward links; in the above example, this would mean
that each node had 4 pointers.
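The idea of one set of nodes threaded through two logical orders can be sketched in Python. The field names (next_by_name, next_by_age) and the sample records are illustrative assumptions; a real implementation would splice nodes in place rather than rebuilding the links:

```python
# A multi-linked list keeping one set of records in two logical
# orders at once: alphabetical by name and ascending by age.

class Node:
    def __init__(self, name, age):
        self.name, self.age = name, age
        self.next_by_name = None   # link for the name ordering
        self.next_by_age = None    # link for the age ordering

def insert(nodes, node):
    """Add a node and relink both orderings; returns the two heads."""
    nodes.append(node)
    by_name = sorted(nodes, key=lambda n: n.name)
    by_age = sorted(nodes, key=lambda n: n.age)
    for a, b in zip(by_name, by_name[1:]):
        a.next_by_name = b
    by_name[-1].next_by_name = None
    for a, b in zip(by_age, by_age[1:]):
        a.next_by_age = b
    by_age[-1].next_by_age = None
    return by_name[0], by_age[0]

nodes = []
for name, age in [("Carol", 25), ("Alice", 30), ("Bob", 22)]:
    name_head, age_head = insert(nodes, Node(name, age))

print(name_head.name)  # -> Alice (first alphabetically)
print(age_head.name)   # -> Bob (youngest)
```

Each node is stored once but participates in both sequences, which is exactly what the two pointers per node buy us.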
Representation of Sparse Matrix:
Multi-linked lists are used to store sparse matrices. A sparse matrix is a matrix
that has few non-zero values. If we use a normal array to store such a matrix, it
ends up wasting a lot of space.
A sparse matrix can be represented by using a linked list for every row and
every column.
A node in a multi-linked list has four parts:
The first part stores the data.
The second stores the pointer to the next row.
Third for the pointer to the next column and
Fourth for storing the coordinate number of the cell in the matrix.
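The four-part node described above can be sketched in Python. The class, field names, and the sample matrix are our own illustrative choices:

```python
# Sparse-matrix node with the four parts described above: the data,
# a pointer along its row, a pointer along its column, and the
# coordinates of the cell.

class MatrixNode:
    def __init__(self, value, row, col):
        self.value = value
        self.row, self.col = row, col   # coordinates of the cell
        self.next_in_row = None         # next non-zero cell in this row
        self.next_in_col = None         # next non-zero cell in this column

def build_sparse(matrix):
    """Create one node per non-zero cell, linked row-wise and column-wise."""
    row_tail, col_tail = {}, {}
    nodes = []
    for r, row in enumerate(matrix):
        for c, v in enumerate(row):
            if v == 0:
                continue                # zero cells take no space at all
            node = MatrixNode(v, r, c)
            nodes.append(node)
            if r in row_tail:
                row_tail[r].next_in_row = node
            if c in col_tail:
                col_tail[c].next_in_col = node
            row_tail[r] = node
            col_tail[c] = node
    return nodes

nodes = build_sparse([[0, 5, 8],
                      [3, 0, 0],
                      [0, 6, 7]])
print(nodes[0].next_in_row.value)  # -> 8 (next non-zero in row 0)
print(nodes[0].next_in_col.value)  # -> 6 (next non-zero in column 1)
```

Only the five non-zero cells get nodes; the four zeros cost nothing, which is the space saving over a plain 3x3 array.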
List of List:
A multi-linked list can be used to represent a list of lists. For example, we can
create a linked list where each node is itself a list and has pointers to other
nodes.
See the structure below:
It is a 2-dimensional data structure.
Here each node has three fields:
The first field stores the data.
The second field stores a pointer to the child node.
The third field stores the pointer to the next node.
A sketch of this node in C (assuming the three fields described above):

typedef struct Node {
    int data;                 /* the data */
    struct Node *child;       /* pointer to the child (sub-list) node */
    struct Node *next;        /* pointer to the next node */
} Node;