
DS - Module 5


MODULE - 5

GRAPHS
 Definitions: A graph, G, consists of two sets V and E.
 V is a finite non-empty set of vertices.
 E is a set of pairs of vertices; these pairs are called edges.
V(G) and E(G) will represent the sets of vertices and edges of graph G.
We write G = (V,E) to represent a graph, where V is the set of vertices
and E is the set of edges of the graph G.
 A graph may be either undirected or directed.
If the pair of vertices representing any edge is unordered, then the graph is
said to be an undirected graph. Thus, the pairs (v1, v2) and (v2, v1)
represent the same edge.
 If each edge is represented by a directed pair <v1, v2>, where v1 is the tail
and v2 the head of the edge, the graph is said to be a directed
graph. Therefore <v2, v1> and <v1, v2> represent two different
edges.
Figure below shows three graphs G1, G2 and G3.
 The graphs G1 and G2 are undirected. G3 is a directed graph.
V (G1) = {0,1,2,3}; E(G1) = {(0,1),(0,2),(0,3),(1,2),(1,3),(2,3)}
V (G2) = {0,1,2,3,4,5,6}; E(G2) = {(0,1),(0,2),(1,3),(1,4),(2,5),(2,6)}
V (G3) = {0,1,2}; E(G3) = {<0,1>, <1,2>, <1,0>}.

Note that the edges of a directed graph are drawn with an arrow from the tail to the head.
The graph G2 is also a tree while the graphs G1 and G3 are not.
Restrictions on graphs
1. A graph may not have self edges, that is, an edge from a vertex to itself.
2. A graph may not have multiple occurrences of the same
edge.
3. As a consequence, an n-vertex undirected graph has at most
n(n-1)/2 edges, and an n-vertex directed graph has at most n(n-1) edges.
Terminologies
 Complete: An n-vertex undirected graph with n(n-1)/2 edges is said to be
complete. In directed graph maximum number of edges is n(n-1). G1 is the
complete graph on 4 vertices while G2 and G3 are not complete graphs.
 If (u,v) is an edge in E(G) of an undirected graph, then vertices u and v are
adjacent and the edge (u,v) is incident on vertices u and v.
 In a directed graph, for an edge <u,v> the vertex u is adjacent to v and v is
adjacent from u. The edge <u,v> is incident to u and v.
 A subgraph of G is a graph G' such that V(G') ⊆ V(G) and E(G') ⊆ E(G).
V(G') ⊆ V(G) means that every vertex of G' is also a vertex of G, so V(G') has
at most as many elements as V(G).

 A path from vertex vp to vertex vq in graph G is a sequence of
vertices vp,vi1,vi2, ...,vin,vq such that (vp,vi1),(vi1,vi2), ...,(vin,vq) are
edges in E(G). If G' is directed then the path consists of
<vp,vi1>,<vi1,vi2>, ..., <vin,vq>, edges in E(G').
 The length of a path is the number of edges on it.
 A simple path is a path in which all vertices except possibly the
first and last are distinct.
 A cycle is a simple path in which the first and last vertices are the
same.
 In an undirected graph, G, two vertices v1 and v2 are said to be connected
if there is a path in G from v1 to v2 (since G is undirected, this means there
must also be a path from v2 to v1).
 An undirected graph is said to be connected if for every pair of
distinct vertices u and v in V(G) there is a path from u to v in G.
 A connected component or simply a component of an
undirected graph is a maximal connected subgraph. That is a
connected component is a subgraph in which every two
vertices are connected to each other by a path and which is not
connected to any additional vertices of the super graph.
 Graph with two connected components shown below.
 A tree is a connected acyclic graph (i.e., a connected graph that has no cycles).
 A directed graph G is said to be strongly connected if for every pair
of distinct vertices u, v in V(G) there is a directed path from u to v
and also from v to u.
 The degree of a vertex is the number of edges incident to
that vertex.
 In case G is a directed graph, we define the in-degree of a vertex v to
be the number of edges for which v is the head. The out-degree is
defined to be the number of edges for which v is the tail.
 If di is the degree of vertex i in a graph G with n vertices and e edges,
then e = (d0 + d1 + ... + d(n-1)) / 2, because every edge is counted once in
the degree of each of its two end vertices.
 We refer to a directed graph as a digraph. An undirected graph
will sometimes be referred to simply as a graph.
Graph Representation
 1. Adjacency Matrix
 2. Adjacency list
 The adjacency matrix of G is a 2-dimensional n x n boolean
array say A, with the property that A(i,j) = 1 iff there is an
edge (vi,vj) (<vi,vj> for a directed graph).
A(i,j) = 0 if there is no such edge in G.
Adjacency matrix for a given graph
 The adjacency matrix for an undirected graph is symmetric,
whereas for a directed graph it need not be symmetric.
 The space needed to represent a graph using its adjacency matrix is
n^2 bits. About half this space can be saved in the case of
undirected graphs by storing only the upper or lower triangle
of the matrix.
 For an undirected graph the degree of any vertex i is its row
sum =∑A(i,j) for j=0 to n-1.
 For a directed graph the row sum is the out-degree while the
column sum is the in-degree.
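The row-sum and column-sum rules above can be checked on G1 and G3 with a short sketch. The names `MAXV`, `g1`, `g3`, `row_sum` and `col_sum` are illustrative assumptions, not part of the original slides.

```c
#define MAXV 4 /* assumed capacity; large enough for G1 and G3 */

/* G1: complete undirected graph on vertices 0..3 */
int g1[MAXV][MAXV] = {
    {0, 1, 1, 1},
    {1, 0, 1, 1},
    {1, 1, 0, 1},
    {1, 1, 1, 0},
};

/* G3: digraph with edges <0,1>, <1,2>, <1,0> (only rows/columns 0..2 are used) */
int g3[MAXV][MAXV] = {
    {0, 1, 0, 0},
    {1, 0, 1, 0},
    {0, 0, 0, 0},
    {0, 0, 0, 0},
};

/* Sum of row i: degree (undirected graph) or out-degree (digraph). */
int row_sum(int a[MAXV][MAXV], int n, int i)
{
    int j, s = 0;
    for (j = 0; j < n; j++)
        s += a[i][j];
    return s;
}

/* Sum of column j: in-degree of vertex j in a digraph. */
int col_sum(int a[MAXV][MAXV], int n, int j)
{
    int i, s = 0;
    for (i = 0; i < n; i++)
        s += a[i][j];
    return s;
}
```

For example, `row_sum(g1, 4, 0)` gives the degree of vertex 0 in G1, while `row_sum(g3, 3, 1)` and `col_sum(g3, 3, 1)` give the out-degree and in-degree of vertex 1 in G3.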
Adjacency Lists representation
 In this representation n rows of the adjacency matrix are represented
as n linked lists that is chains
 There is one list for each vertex in G. The nodes in list i represent
the vertices that are adjacent from vertex i. Each node has at least
two fields: VERTEX and LINK. The VERTEX fields contain
the indices of the vertices adjacent from vertex i.
 Each list has a head node. The head nodes are sequential
providing easy random access to the adjacency list for any
particular vertex.
 In the case of an undirected graph with n vertices and e edges,
this representation requires n head nodes and 2e chain nodes.
Adjacency Lists for given graph : Examples

The degree of any vertex in an undirected graph may be determined by
just counting the number of nodes in its adjacency list.
In a digraph the number of list nodes equals the number of edges. The out-degree of any vertex may be
determined by counting the number of nodes on its adjacency list.
Packed Adjacency list
 It is possible to sequentially pack the nodes on the adjacency
lists and eliminate the link fields.
 In this case the adjacency lists may be packed into an integer
array node[n+2e+1].
 Hence initially declare an array node[n+2e+1].
 Set node[n] = n+2e+1.
 node[0] to node[n-1] hold the starting points of the adjacency lists
for the different vertices. That is, in the sequential mapping, node[i]
gives the starting point of the list for vertex i, for 0 ≤ i < n.
 The vertices adjacent from vertex i are stored in locations
node[i], ..., node[i+1]-1 of the node array, where 0 ≤ i < n.
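A minimal sketch of how the packed array could be built from an edge list, assuming an undirected graph with at most `MAXV` vertices. The function names `pack_adjacency` and `packed_degree` and the two-pass construction are illustrative choices, not taken from the slides.

```c
#define MAXV 16 /* assumed upper bound on the number of vertices */

/* Builds the packed adjacency list of an undirected graph with n vertices
 * and e edges. node must have room for n + 2*e + 1 integers. */
void pack_adjacency(int n, int e, const int edges[][2], int node[])
{
    int next_free[MAXV];
    int i, k, pos;

    /* First pass: count the degree of every vertex in node[0..n-1]. */
    for (i = 0; i < n; i++)
        node[i] = 0;
    for (k = 0; k < e; k++) {
        node[edges[k][0]]++;
        node[edges[k][1]]++;
    }

    /* Turn degrees into starting positions; the lists begin at index n+1. */
    pos = n + 1;
    for (i = 0; i < n; i++) {
        int degree = node[i];
        node[i] = pos;
        next_free[i] = pos;
        pos += degree;
    }
    node[n] = pos; /* equals n + 2e + 1 */

    /* Second pass: each edge (u,v) puts v on u's list and u on v's list. */
    for (k = 0; k < e; k++) {
        int u = edges[k][0], v = edges[k][1];
        node[next_free[u]++] = v;
        node[next_free[v]++] = u;
    }
}

/* Degree of vertex i: the length of its packed list. */
int packed_degree(const int node[], int i)
{
    return node[i + 1] - node[i];
}
```

For G2 (n = 7, e = 6) the array has 20 entries, node[0] is 8 and node[7] is 20; the neighbours of vertex i occupy positions node[i] through node[i+1]-1.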
Example for a packed adjacency list
Adjacency Multilists
 In the adjacency list representation of an undirected graph each
edge (u,v) is represented by two entries, one on the list for u and the
other on the list for v.
 In some situations it is necessary to be able to determine the second entry
for a particular edge and mark that edge as already having been
examined. This can be accomplished easily if the adjacency lists are
actually maintained as multilists (i.e., lists in which nodes may be
shared among several lists).
 For each edge there will be exactly one node, but this node will be
in two lists, i.e., the adjacency lists for each of the two nodes it is
incident to.
 The node structure has the fields m, u, v, link1 and link2.
 m is a one-bit mark field that may be used to indicate whether or not
the edge has been examined.
 link1 is the address of the next edge node incident to u that has not been
covered earlier, and link2 is the address of the next edge node incident to v
that has not been covered earlier.
Example (figure: adjacency multilists built from the edge nodes N0 to N5)
Weighted Edges
 The edges of a graph may be assigned weights.
 These weights may represent the distance from one vertex
to another or the cost of going from one vertex to another.
 The adjacency matrix entry a[i][j] holds the weight of the edge from i to j. If vertex j is not
adjacent to vertex i, then a[i][j] = ∞; the diagonal elements are 0. The
matrix is generally called the weight matrix.
 In the adjacency list representation, the weight information is kept in
the chain nodes by including an additional field for the weight.
Elementary graph operations
 Graph Traversal: Given an undirected graph G = (V,E) and a
vertex v in V(G), we are interested in visiting all vertices in G that are
reachable from v (i.e., all vertices connected to v). This may be
performed by two graph traversal methods:
 Depth First Search (DFS)
 Breadth First Search (BFS)
DFS is similar to preorder traversal of binary tree
BFS resembles level order traversal of binary tree
Depth First Search(DFS)
Depth first search of an undirected graph proceeds as follows.
 The start vertex v is visited.
 Next an unvisited vertex w adjacent to v is selected
 Depth first search from w initiated.
 When a vertex u is reached such that all its adjacent vertices have been
visited, we back up to the last vertex visited which may have an
unvisited vertex w adjacent to it and initiate a depth first search from w .
 The search terminates when no unvisited vertex can be reached from
any of the visited one
DFS- Algorithm
 procedure DFS(v)
//Given an undirected graph G = (V,E) with n vertices and an array VISITED[n] initially set to zero,
// this algorithm visits all vertices reachable from v. G and VISITED are global.
VISITED(v) = 1
for each vertex w adjacent to v do
    if VISITED(w) = 0 then
        call DFS(w)
end
end DFS
DFS- Assuming linked adjacency list representation is used for
a given graph. DFS(v)

 Searching begins by visiting the starting vertex v.
 Preserve v by placing it on the stack.
 Select an unvisited vertex w from v's adjacency list and carry out DFS(w).
 When the search reaches a vertex u that has no unvisited vertices on
its adjacency list, remove a vertex from the stack and continue
processing the adjacency list of the vertex now on top of the stack, visiting
its unvisited vertices and placing them on the stack.
 The search terminates when the stack is empty.
 The recursive implementation is given below.
Function : DFS(v) uses global array visited[MAX_VERTICES]
initialized to FALSE
#define FALSE 0
#define TRUE 1
short int visited[MAX_VERTICES];
void dfs(int v)
{ /* DFS of graph beginning at v */
    nodepointer w;
    visited[v] = TRUE;
    /* walk the adjacency list of v and recurse into unvisited neighbours */
    for (w = graph[v]; w; w = w->link)
        if (!visited[w->vertex])
            dfs(w->vertex);
}
Nodes visited order: 0,1,3,7,4,5,2,6
DFS- Assuming adjacency matrix representation is used
for a given graph. DFS(v)
 Initialize visited vector with 0
void dfs(int v)
{
    int w;
    visited[v] = 1;
    printf("%d \n", v);
    /* a[v][w] == 1 means w is adjacent to v */
    for (w = 0; w < n; w++)
        if (a[v][w] == 1 && visited[w] == 0)
            dfs(w);
}
Nodes visited order: 0,1,3,7,4,5,2,6
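The graph figure is not reproduced in this text, so the following self-contained sketch assumes an 8-vertex graph with edges 0-1, 0-2, 1-3, 1-4, 2-5, 2-6, 3-7, 4-7, 5-7 and 6-7, which is consistent with the visiting order stated above. The names `adj`, `dfs_order` and `dfs_from` are illustrative.

```c
#define N 8

/* Assumed adjacency matrix (the original figure is not available). */
static const int adj[N][N] = {
    {0, 1, 1, 0, 0, 0, 0, 0},
    {1, 0, 0, 1, 1, 0, 0, 0},
    {1, 0, 0, 0, 0, 1, 1, 0},
    {0, 1, 0, 0, 0, 0, 0, 1},
    {0, 1, 0, 0, 0, 0, 0, 1},
    {0, 0, 1, 0, 0, 0, 0, 1},
    {0, 0, 1, 0, 0, 0, 0, 1},
    {0, 0, 0, 1, 1, 1, 1, 0},
};

static int visited[N];
int dfs_order[N]; /* vertices in the order they are visited */
static int dfs_count;

static void dfs(int v)
{
    int w;
    visited[v] = 1;
    dfs_order[dfs_count++] = v;
    for (w = 0; w < N; w++)
        if (adj[v][w] && !visited[w])
            dfs(w);
}

/* Runs a DFS from start and returns the number of vertices reached. */
int dfs_from(int start)
{
    int i;
    for (i = 0; i < N; i++)
        visited[i] = 0;
    dfs_count = 0;
    dfs(start);
    return dfs_count;
}
```

Calling `dfs_from(0)` fills `dfs_order` with 0, 1, 3, 7, 4, 5, 2, 6 for this assumed graph.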
Breadth First Search
 Starting from vertex v, it is visited first
 Then unvisited vertices adjacent to v are visited next.
 Repeat the same process.
 A breadth first search beginning at vertex v0 of the graph shown will
visit the vertices in the following order
Vertex v0 and then v1 and v2. Next vertices v3, v4, v5 and v6 will be
visited and finally v7.
 Implementation:
 Visit the starting vertex and place it in an empty queue.
 While Queue is not empty
 Remove vertex from queue
 The unvisited adjacent vertices of that removed vertex are visited
and placed in the queue
When Queue is empty we have visited all the vertices in that
connected graph.
BFS - Procedure
 BFS(v)
//A breadth first search of G is carried out beginning at vertex v. The graph G and
//array VISITED are global and all VISITED[i] are initialized to zero.
Initialize Q to be empty //Q is a queue//
VISITED[v] = 1;
print(v);
ADDQ(v);
while (!Empty(Q))
    v = DELETEQ() //remove the vertex at the front of the queue//
    for all vertices w adjacent to v do
        if VISITED[w] = 0 then
            ADDQ(w); //add w to queue//
            VISITED[w] = 1 //mark w as VISITED//
            print(w);
        endif
    end for
end while
end BFS
BFS- Assuming linked adjacency list
representation is used for a given graph. BFS(v)
 Searching begins by visiting the starting vertex v of
adjacency list.
 Then visit each of vertices on v’s adjacency list.
 Then we visit all the unvisited vertices that are
adjacent to the first vertex on v’s adjacency list, then
second vertex and so on till all vertices are visited.
 This can be implemented efficiently using Queue data
structure
Function : BFS(v) uses global array visited[MAX_VERTICES] initialized to 0
void bfs(int v)
{ /* BFS of given graph G, starting at vertex v; global array visited[n] is
     initialized to 0 and the queue is initially empty */
    nodepointer w;
    printf("%d", v);
    visited[v] = 1;
    AddQ(v);
    while (!emptyQ())
    {
        v = DeleteQ();
        for (w = graph[v]; w; w = w->link)
            if (!visited[w->vertex])
            {
                printf("%d", w->vertex);
                AddQ(w->vertex);
                visited[w->vertex] = 1;
            }
    }
}
BFS- Assuming adjacency matrix representation is used for a
given graph. BFS(v)
 Initialize visited vector with 0
void bfs(int v)
{
    int q[10], front = 0, rear = -1, w, i; /* each vertex is queued at most once, so this needs n <= 10 */
    visited[v] = 1;
    printf("%d \n", v);
    q[++rear] = v;
    while (front <= rear)
    {
        w = q[front++];
        for (i = 0; i < n; i++)
            if (a[w][i] == 1 && visited[i] == 0)
            {
                printf("%d \n", i);
                q[++rear] = i;
                visited[i] = 1;
            }
    }
}
Nodes visited order: 0,1,2,3,4,5,6,7
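As the figure is missing from this text, the sketch below assumes an 8-vertex graph with edges 0-1, 0-2, 1-3, 1-4, 2-5, 2-6, 3-7, 4-7, 5-7 and 6-7, which is consistent with the order stated above. The names `adj`, `bfs_order` and `bfs_from` are illustrative.

```c
#define N 8

/* Assumed adjacency matrix (the original figure is not available). */
static const int adj[N][N] = {
    {0, 1, 1, 0, 0, 0, 0, 0},
    {1, 0, 0, 1, 1, 0, 0, 0},
    {1, 0, 0, 0, 0, 1, 1, 0},
    {0, 1, 0, 0, 0, 0, 0, 1},
    {0, 1, 0, 0, 0, 0, 0, 1},
    {0, 0, 1, 0, 0, 0, 0, 1},
    {0, 0, 1, 0, 0, 0, 0, 1},
    {0, 0, 0, 1, 1, 1, 1, 0},
};

int bfs_order[N]; /* vertices in the order they leave the queue */

/* Breadth first search from start; returns the number of vertices reached. */
int bfs_from(int start)
{
    int visited[N] = {0};
    int queue[N]; /* each vertex enters the queue at most once */
    int front = 0, rear = 0, count = 0, v, w;

    visited[start] = 1;
    queue[rear++] = start;
    while (front < rear) {
        v = queue[front++];
        bfs_order[count++] = v;
        for (w = 0; w < N; w++)
            if (adj[v][w] && !visited[w]) {
                visited[w] = 1; /* mark when queued so it is never queued twice */
                queue[rear++] = w;
            }
    }
    return count;
}
```

Vertices are marked as visited when they enter the queue, so the dequeue order recorded here is the same as the visiting order.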
Sorting and searching
 Sorting refers to the operation of arranging data in some order:
 numerical data in increasing or decreasing order
 character data alphabetically
Searching refers to the operation of finding the location of a given item
in a collection of items.
 INSERTION SORT
 The basic step in this method is to insert a new item or record
into a sorted sequence of i items or records in such a way that the
resulting sequence of size i + 1 is also ordered.
 The algorithm scans the given list stored as the array
A[1],A[2],...,A[N], inserting every element A[K] in its proper
position in the previously sorted subarray A[1],A[2],...,A[K-1] by
comparing A[K] with A[K-1], A[K-2], and so on until an
element A[j] <= A[K] is found.
 Each of the elements A[K-1], A[K-2], ..., A[j+1] is moved one location
forward (A[j] itself stays in place), and A[K] is then
inserted in position j+1 of the array.
Algorithm
 Algorithm: INSERTION SORT(A,N)
Step 1: Set A[0] = - ∞
Step 2: Repeat step 3 to 5 for k=2,3…N
Step 3: set temp= A[k] and PTR=k-1
Step 4: Repeat while A[PTR]>temp
set A[PTR+1]= A[PTR]
set PTR= PTR-1
[End of Loop]
Step 5: set A[PTR+1]=temp
[End of step 2 for loop]
Step 6: return
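The function insertion_sort accomplishes this insertion.
#include <limits.h>
void insertion_sort(int A[], int n)
{ /* sorts A[1..n]; A[0] is reserved for the sentinel */
    int temp, k, PTR;
    A[0] = INT_MIN; /* sentinel standing in for -∞, stops the inner loop */
    for (k = 2; k <= n; k++)
    {
        temp = A[k];
        PTR = k - 1;
        while (A[PTR] > temp)
        {
            A[PTR + 1] = A[PTR];
            PTR = PTR - 1;
        }
        A[PTR + 1] = temp;
    }
    return;
}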
Complexity of Insertion sort
 When the given array is in decreasing order and is to be sorted
in increasing order, the inner loop uses the maximum
number of comparisons. Hence this is the worst case.
The maximum number of comparisons is
C(n) = 1+2+3+...+(n-1) = n(n-1)/2 = O(n^2)
 If the given array is already in increasing order, then the body of the inner while
loop is never executed. Hence there is only one comparison for each of
k = 2, ..., n, that is n-1 comparisons. This is the best case.
Complexity is Ω(n).
 For a randomly arranged array, the average number of
comparisons is C(n) = n(n-1)/4 = Θ(n^2)
Tracing with example: 77, 33, 44, 11, 88, 22, 66, 55
(each row shows the sorted part A[1..K] after pass K)

Pass  A[0]  A[1]  A[2]  A[3]  A[4]  A[5]  A[6]  A[7]  A[8]
K=1   -∞    77
K=2   -∞    33    77
K=3   -∞    33    44    77
K=4   -∞    11    33    44    77
K=5   -∞    11    33    44    77    88
K=6   -∞    11    22    33    44    77    88
K=7   -∞    11    22    33    44    66    77    88
K=8   -∞    11    22    33    44    55    66    77    88
RADIX SORT
 Radix sort is the method used when alphabetizing a large list of
names. (Here the radix is 26, the 26 letters of the alphabet.)
 Specifically, the list of names is first sorted according to the first
letter of each name.
 That is, the names are arranged in 26 classes, where the first class
consists of those names that begin with "A," the second class
consists of those names that begin with "B," and so on.
 During the second pass, each class is alphabetized according to the
second letter of the name. And so on. If no name contains, for
example, more than 12 letters, the names are alphabetized with at
most 12 passes.
 The radix sort is the method used by a card sorter.
 The same concept may be used to sort cards punched with numbers.
Suppose we want to sort 9 cards that are punched as
follows: 348, 143, 361, 423, 538, 128, 321, 543, 366
 Here the radix is 10 (the digits 0,1,2,...,9).
 Sorting is done least significant digit first.
 Sorting may be done considering least significant digit first method

Input for the next pass: 361, 321, 143, 423, 543, 366, 348, 538, 128
Example : contd.

For next pass Input is : 321,423,128,538,143,543,348,361,366

Final output : 128,143,321,348,361,366,423,538,543


 The number of comparisons for algorithm is bounded by
C(n)<=d * s * n
Where d is radix (d=10 for decimal digits)
s is number of digits
n is number of items
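The least-significant-digit-first passes on the nine cards can be sketched as follows. This is a minimal illustration assuming non-negative integers and at most `MAXN` items; `radix_pass` and `radix_sort` are hypothetical names, and the pockets are plain arrays rather than a card sorter's bins.

```c
#define MAXN 32 /* assumed maximum number of items */

/* One pass: distribute a[0..n-1] into 10 pockets by the digit selected by
 * divisor (1 = units, 10 = tens, ...), then collect the pockets in order. */
void radix_pass(int a[], int n, int divisor)
{
    int pocket[10][MAXN];
    int count[10] = {0};
    int i, d, k = 0;

    for (i = 0; i < n; i++) {
        d = (a[i] / divisor) % 10;
        pocket[d][count[d]++] = a[i];
    }
    for (d = 0; d < 10; d++)
        for (i = 0; i < count[d]; i++)
            a[k++] = pocket[d][i];
}

/* Sorts non-negative integers with at most `digits` decimal digits (n <= MAXN). */
void radix_sort(int a[], int n, int digits)
{
    int pass, divisor = 1;
    for (pass = 0; pass < digits; pass++) {
        radix_pass(a, n, divisor);
        divisor *= 10;
    }
}
```

Each pass keeps items with equal digits in their previous relative order, which is what makes the final result sorted after the most significant digit has been processed.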
Address calculation sort: Procedure
 In this method the items or records to be sorted are placed in
sub files chosen by an order-preserving function f,
that is, a function with the
property that if x <= y then f(x) <= f(y).
 Thus all the items or records in one sub file have keys
less than or equal to those of the items or records in the next sub file.
 The items or records are placed in each sub file in the correct
sequence by using any sorting method, such as insertion
sort.
 After all items of the original file have been placed into sub files,
the sub files are concatenated to produce the sorted list.
Example :Original file: 25,57,48,37,12,92,86,33
 Let us create 10 sub files, one for each of ten possible first digit.
 Initially each sub file is empty.
 Array of pointers f[10] is declared.
 f[i] points to the first element in the sub file whose first digit is i.
 Each sub file maintains sorted linked list of original array elements.
 For example, after scanning the element 25, it is placed into the sub
file headed by f[2].
 The concatenation of all sub files gives the sorted list.
 After processing each of elements in original file, the sub files appears
as shown below
Routine to implement address calculation sort, assuming the input
consists of 2-digit numbers and sub files are assigned by the first digit
#define NUMLIST 10
void address_sort(int x[], int n)
{
    int f[10], first, i, j, p, y;
    struct {
        int info;
        int next;
    } node[NUMLIST];
    /* link all nodes into the list of available nodes */
    int avail = 0;
    for (i = 0; i < n - 1; i++)
        node[i].next = i + 1;
    node[n - 1].next = -1;
    /* every sub file starts out empty */
    for (i = 0; i < 10; i++)
        f[i] = -1;
    for (i = 0; i < n; i++)
    {
        y = x[i];
        first = y / 10;
        /* place (not shown here) inserts y in sorted order into the list headed by f[first] */
        place(&f[first], y);
    }
    /* copy the numbers back to the x array */
    i = 0;
    for (j = 0; j < 10; j++)
    {
        p = f[j];
        while (p != -1)
        {
            x[i++] = node[p].info;
            p = node[p].next;
        }
    }
}
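Because the routine above relies on a place function that is not shown, here is a self-contained sketch of the same idea. It assumes keys in the range 0 to 99 and at most `NUMELTS` of them; the `place` helper, its extra parameters and the pool-based node allocation are assumptions made for illustration.

```c
#define NUMELTS 100 /* assumed capacity of the node pool */

struct node {
    int info;
    int next; /* index of the next node, -1 at the end of a list */
};

/* Hypothetical helper: inserts y in ascending order into the list headed by
 * *head, taking the next unused node of the pool via *avail. */
static void place(struct node pool[], int *avail, int *head, int y)
{
    int q = (*avail)++;
    int prev = -1, p = *head;

    pool[q].info = y;
    while (p != -1 && pool[p].info <= y) {
        prev = p;
        p = pool[p].next;
    }
    pool[q].next = p;
    if (prev == -1)
        *head = q;
    else
        pool[prev].next = q;
}

/* Sorts n keys in the range 0..99 (n <= NUMELTS), using the first digit as f. */
void address_sort(int x[], int n)
{
    struct node pool[NUMELTS];
    int f[10];
    int avail = 0, i, j, p;

    if (n > NUMELTS)
        return; /* pool too small; leave x unchanged */
    for (i = 0; i < 10; i++)
        f[i] = -1; /* all sub files start empty */
    for (i = 0; i < n; i++)
        place(pool, &avail, &f[x[i] / 10], x[i]);

    /* Concatenate the sub files back into x. */
    i = 0;
    for (j = 0; j < 10; j++)
        for (p = f[j]; p != -1; p = pool[p].next)
            x[i++] = pool[p].info;
}
```

With the original file 25, 57, 48, 37, 12, 92, 86, 33, the sub file for first digit 3 ends up holding 33 followed by 37, and concatenation yields the sorted list.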
HASHING
Hashing- Introduction
 The search time of algorithms like linear search, binary
search depends on the total number of elements in the
collection of data
 Hashing ( hash addressing) is a searching technique in which
search time is independent of the total number of elements
present
 We assume that there is a file F of n records with a set K of
keys which uniquely determine the records in F.
 We also assume that F is maintained in memory by a table T
of m memory locations and that L is the set of memory
addresses of the locations in T.
 For convenience we assume that the keys and the addresses in
L are integers
 The general idea of hashing is, using the key to determine the
address where the record is to be stored
 The general hash function (H), takes the set K of keys and
maps into the set L of memory addresses.
 This can be represented as, H:K->L
 This hashing function or hash function may not yield distinct
values, it is possible that two different keys k1 and k2 will
yield the same hash address. This situation is called collision
 Some method should be used to resolve the collisions
Hash function
 The two principal criteria used in selecting a hash function
H: K->L are:
 The function H should be very easy and quick to compute.
 The function H should as far as possible, uniformly distribute
the hash addresses throughout the set L so that there are a
minimum numbers of collisions
Popular hash functions
 Division method
 Choose a number m larger than the number n of keys in K
(the number m is usually chosen to be a prime number or a
number without small divisors, since this frequently reduces the
number of collisions).
 The hash function H is defined by
 H(k) = k (mod m) or H(k) = k (mod m) + 1
 Here k (mod m) denotes the remainder when k is divided by
m.
 The second formula is used when we want the addresses to be in
the range from 1 to m rather than 0 to m-1.
Example
 Consider a company where each of the 68 employees is
assigned a unique 4-digit employee number. Suppose L
consists of 100 two-digit addresses: 00, 01, 02, ..., 99.
 Consider the employee codes 3205, 7148, 2345.
 Choose a prime number m close to 99, say m = 97. Then
H(3205) = 3205 mod 97 = 4
H(7148) = 7148 mod 97 = 67
H(2345) = 2345 mod 97 = 17
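The division method is a one-line computation in C. The function names below are illustrative, and non-negative keys are assumed.

```c
/* Division method: addresses in the range 0 .. m-1. */
int hash_division(int k, int m)
{
    return k % m;
}

/* Variant giving addresses in the range 1 .. m. */
int hash_division_from_one(int k, int m)
{
    return k % m + 1;
}
```

With m = 97, `hash_division` reproduces the three addresses computed above.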
 Midsquare method
 The key k is squared. Then the hash function H is defined by
 H(k) = l
 where l is obtained by deleting digits from both ends of k^2.
The same positions of k^2 must be used for all the keys.
Example (using the fourth and fifth digits of k^2, counting from the right):
k:     3205         7148         2345
k^2:   102 72 025   510 93 904   54 99 025
H(k):  72           93           99
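A small sketch of the midsquare computation for this example. It assumes the address is taken from the fourth and fifth digits of k^2 counting from the right, as in the table; `hash_midsquare` is an illustrative name.

```c
/* Midsquare method for 4-digit keys: drop the last three digits of k*k,
 * then keep the next two. long long avoids overflow of the square. */
int hash_midsquare(int k)
{
    long long square = (long long)k * k;
    return (int)((square / 1000) % 100);
}
```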
 Folding method
 The key k is partitioned into a number of parts, k1, k2,…kr,
where each part, except possibly the last, has the same
number of digits as the required address.
 Then the parts are added together, ignoring the last carry.
 That is, H(k) = k1 + k2 + ... + kr, where the leading-digit
carries, if any, are ignored.
 Sometimes, for extra "milling", the even-numbered parts
k2, k4, ... are each reversed before the addition.
 Example
 Chopping the key k into two parts and adding
 H(3205) = 32 + 05 = 37
 H(7148) = 71 + 48 = 119 (ignore leading digit carry 1) =19
 H(2345) = 23 + 45 = 68
 Reversing the second part
 H(3205) = 32 + 50 = 82
 H(7148) = 71 + 84 = 155 (ignore leading digit carry 1) =55
 H(2345) = 23 + 54 = 77
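Both folding variants can be sketched for 4-digit keys split into two 2-digit parts. The function names are illustrative, and the `% 100` step is how this sketch discards the leading-digit carry.

```c
/* Reverses a two-digit part, e.g. 48 -> 84 and 05 -> 50. */
static int reverse_two_digits(int part)
{
    return (part % 10) * 10 + part / 10;
}

/* Folding: add the two halves of a 4-digit key, ignoring the carry. */
int hash_fold(int k)
{
    return (k / 100 + k % 100) % 100;
}

/* Folding with the second part reversed before the addition. */
int hash_fold_reversed(int k)
{
    return (k / 100 + reverse_two_digits(k % 100)) % 100;
}
```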
Collision Resolution Methods
1. Linear probing (open addressing, also called closed hashing)

Address: 1    2    3    4    5    6    7    8    9    10   11
Record:  X    C    Z    A    E    Y    -    B    -    -    D
("-" marks an empty location)
 The efficiency of a hash function with collision resolution is
measured by the average number of key comparisons needed to
find the location of the record with a given key. We denote it by
S(λ) = average number of probes for a successful search
U(λ) = average number of probes for an unsuccessful search
For the table above
S(λ) = (1+1+1+1+2+2+2+3)/8 = 13/8 ≈ 1.6
U(λ) = (7+6+5+4+3+2+1+2+1+1+8)/11 = 40/11 ≈ 3.6
Linear Probing
Let key x be stored in the element of the array whose
address is the array index computed using the hash
function h(x) = x % 15.
Then the keys 35, 129, 36, 47, 25, 2501 are stored in
the hash table t as follows ("-" marks an empty slot):

Index: 0    1    2    3    4    5    6    7    8    9    10   11   12   13   14
Key:   -    -    47   -    -    35   36   -    -    129  25   2501 -    -    -

What do you do in case of a collision?
If the hash table is not full, linear probing tries the
following array elements one after another until it finds an empty slot.
Where do you store 65?
h(65) = 5, but location 5 is already occupied by 35.
The location 6 is already occupied by 36.
The next vacant position is 7, so 65 is
stored there after three attempts (5, 6, 7).

Index: 0    1    2    3    4    5    6    7    8    9    10   11   12   13   14
Key:   -    -    47   -    -    35   36   65   -    129  25   2501 -    -    -
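Where would you store 29? h(29) = 14 is empty, so 29 is stored there in one attempt.

Index: 0    1    2    3    4    5    6    7    8    9    10   11   12   13   14
Key:   -    -    47   -    -    35   36   65   -    129  25   2501 -    -    29

Where would you store 16? h(16) = 1 is empty, so 16 is stored there in one attempt.

Index: 0    1    2    3    4    5    6    7    8    9    10   11   12   13   14
Key:   -    16   47   -    -    35   36   65   -    129  25   2501 -    -    29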
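Where would you store 14? h(14) = 14 is occupied by 29, so the probe wraps around to 0 (two attempts).

Index: 0    1    2    3    4    5    6    7    8    9    10   11   12   13   14
Key:   14   16   47   -    -    35   36   65   -    129  25   2501 -    -    29

Where would you store 99? h(99) = 9; locations 9, 10 and 11 are occupied, so 99 goes to 12 (four attempts).

Index: 0    1    2    3    4    5    6    7    8    9    10   11   12   13   14
Key:   14   16   47   -    -    35   36   65   -    129  25   2501 99   -    29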
Where would you store 127? h(127) = 7 is occupied by 65, so 127 goes to 8 (two attempts).

Index: 0    1    2    3    4    5    6    7    8    9    10   11   12   13   14
Key:   14   16   47   -    -    35   36   65   127  129  25   2501 99   -    29
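The insertions traced above can be reproduced with a short sketch. It assumes non-negative keys and uses -1 as the marker for an empty slot; `table_clear` and `linear_insert` are illustrative names.

```c
#define N 15       /* table size from the example */
#define EMPTY (-1) /* assumed marker for an unused slot */

void table_clear(int t[N])
{
    int i;
    for (i = 0; i < N; i++)
        t[i] = EMPTY;
}

/* Stores key by linear probing from h(key) = key % N.
 * Returns the slot used, or -1 if the table is full. */
int linear_insert(int t[N], int key)
{
    int i, pos;
    for (i = 0; i < N; i++) {
        pos = (key % N + i) % N; /* wraps around past the last slot */
        if (t[pos] == EMPTY) {
            t[pos] = key;
            return pos;
        }
    }
    return -1;
}
```

Inserting 35, 129, 36, 47, 25, 2501 and then 65, 29, 16, 14, 99, 127 in that order places the last six keys in slots 7, 14, 1, 0, 12 and 8.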
Linear Probing
• Leads to the problem of clustering: elements tend
to cluster in dense intervals of the array.
• The search efficiency problem remains.
• Deletion becomes trickier.
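Deletion problem
 H = KEY MOD 10
 Insert 47, 57, 68, 18, 67
 Find 68
 Find 10
 Delete 47
 Find 57

Table after the insertions (linear probing, "-" marks an empty slot):
Index: 0    1    2    3    4    5    6    7    8    9
Key:   18   67   -    -    -    -    -    47   57   68

If deleting 47 simply empties slot 7, then Find 57 starts at slot 7, sees an
empty slot and wrongly reports that 57 is absent.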
Deletion Problem -- SOLUTION
 “Lazy” deletion
 Each cell is in one of 3 possible states:
 active
 empty
 deleted
 For Find or Delete
 only stop the search when the EMPTY state is detected (not DELETED)
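A minimal sketch of lazy deletion with linear probing, using H = KEY MOD 10 from the deletion example. The cell layout and the function names are assumptions; this simple insert reuses the first non-active cell and does not check for duplicate keys.

```c
#define M 10 /* table size from the deletion example */

enum state { EMPTY, ACTIVE, DELETED }; /* EMPTY is 0, so a static table starts empty */

struct cell {
    int key;
    enum state st;
};

static struct cell table[M];

/* Returns the slot holding key, or -1. Only an EMPTY cell ends the search. */
int lazy_find(int key)
{
    int i, pos;
    for (i = 0; i < M; i++) {
        pos = (key % M + i) % M;
        if (table[pos].st == EMPTY)
            return -1;
        if (table[pos].st == ACTIVE && table[pos].key == key)
            return pos;
    }
    return -1;
}

/* Stores key in the first EMPTY or DELETED cell; returns the slot or -1. */
int lazy_insert(int key)
{
    int i, pos;
    for (i = 0; i < M; i++) {
        pos = (key % M + i) % M;
        if (table[pos].st != ACTIVE) {
            table[pos].key = key;
            table[pos].st = ACTIVE;
            return pos;
        }
    }
    return -1;
}

/* Marks the cell as DELETED instead of emptying it; returns the slot or -1. */
int lazy_delete(int key)
{
    int pos = lazy_find(key);
    if (pos >= 0)
        table[pos].st = DELETED;
    return pos;
}
```

After inserting 47, 57, 68, 18, 67 and deleting 47, a search for 57 passes over the DELETED cell at slot 7 and still finds 57 at slot 8.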
Quadratic Probing
Let key x be stored in element h(x) = t of the
array.

Index: 0    1    2    3    4    5    6    7    8    9    10   11   12   13   14
Key:   -    -    47   -    -    35   36   -    -    129  25   2501 -    -    -
65 (?)

What do you do in case of a collision?
If the hash table is not full, attempt to store the key in
array elements (t+1^2)%N, (t+2^2)%N, (t+3^2)%N, ...
until you find an empty slot.
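Where do you store 65? f(65) = t = 5
Attempts: t = 5 (occupied), t+1 = 6 (occupied), t+4 = 9 (occupied), t+9 = 14 (empty).

Index: 0    1    2    3    4    5    6    7    8    9    10   11   12   13   14
Key:   -    -    47   -    -    35   36   -    -    129  25   2501 -    -    65

Where would you store 29? f(29) = t = 14
Attempts: t = 14 (occupied by 65), (t+1) % 15 = 0 (empty).

Index: 0    1    2    3    4    5    6    7    8    9    10   11   12   13   14
Key:   29   -    47   -    -    35   36   -    -    129  25   2501 -    -    65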
Where would you store 16? f(16) = t = 1
Attempts: t = 1 (empty).

Index: 0    1    2    3    4    5    6    7    8    9    10   11   12   13   14
Key:   29   16   47   -    -    35   36   -    -    129  25   2501 -    -    65

Where would you store 14? f(14) = t = 14
Attempts: t = 14 (occupied), (t+1) % 15 = 0 (occupied by 29), (t+4) % 15 = 3 (empty).

Index: 0    1    2    3    4    5    6    7    8    9    10   11   12   13   14
Key:   29   16   47   14   -    35   36   -    -    129  25   2501 -    -    65
Where would you store 99? f(99) = t = 9
Attempts: t = 9 (occupied), t+1 = 10 (occupied), t+4 = 13 (empty).

Index: 0    1    2    3    4    5    6    7    8    9    10   11   12   13   14
Key:   29   16   47   14   -    35   36   -    -    129  25   2501 -    99   65

Where would you store 127? f(127) = t = 7
Attempts: t = 7 (empty).

Index: 0    1    2    3    4    5    6    7    8    9    10   11   12   13   14
Key:   29   16   47   14   -    35   36   127  -    129  25   2501 -    99   65
Quadratic Probing
• Tends to distribute keys better than linear
probing
• Alleviates problem of clustering
• Runs the risk of an infinite loop on insertion,
unless precautions are taken.
• E.g., consider inserting the key 16 into a table
of size 16, with positions 0, 1, 4 and 9 already
occupied.
• Therefore, table size should be prime.
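The table-of-size-16 example can be checked by counting which slots the probe sequence can ever reach. This sketch assumes the probe positions (t + i*i) % size with t = key % size; `quadratic_reach` is an illustrative name.

```c
/* Marks reached[p] = 1 for each slot p visited by the probes
 * (t + i*i) % size, i = 0 .. size-1, where t = key % size.
 * Later probes repeat these positions, so this covers the whole sequence.
 * Returns the number of distinct slots visited; reached needs size entries. */
int quadratic_reach(int key, int size, int reached[])
{
    int i, pos, distinct = 0;
    int t = key % size;

    for (i = 0; i < size; i++)
        reached[i] = 0;
    for (i = 0; i < size; i++) {
        pos = (t + i * i) % size;
        if (!reached[pos]) {
            reached[pos] = 1;
            distinct++;
        }
    }
    return distinct;
}
```

For key 16 and size 16 only slots 0, 1, 4 and 9 are ever probed, so an insertion loops forever if those four are occupied. With a prime size such as 17 the same key reaches 9 distinct slots, that is, more than half of the table.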
Double Hashing
Let key x be stored in element f(x) = t of the array.

Array:
Index: 0    1    2    3    4    5    6    7    8    9    10   11   12   13   14
Key:   -    -    47   -    -    35   36   -    -    129  25   2501 -    -    -
65 (?)

What do you do in case of a collision?
Define a second hash function f2(x) = d. Attempt to
store the key in array elements (t+d)%N, (t+2d)%N,
(t+3d)%N, ...
until you find an empty slot.
Double Hashing
 Typical second hash function
f2(x)=R − ( x % R )
where R is a prime number, R < N
Where do you store 65? f(65) = t = 5
Let f2(x) = 11 − (x % 11), so f2(65) = d = 1
Note: R = 11, N = 15
Attempts: t = 5 (occupied), t+1 = 6 (occupied), t+2 = 7 (empty).
Array:
Index: 0    1    2    3    4    5    6    7    8    9    10   11   12   13   14
Key:   -    -    47   -    -    35   36   65   -    129  25   2501 -    -    -
Where would you store 29? f(29) = t = 14 is empty, so one attempt is enough.

Array:
Index: 0    1    2    3    4    5    6    7    8    9    10   11   12   13   14
Key:   -    -    47   -    -    35   36   65   -    129  25   2501 -    -    29
Where would you store 16? f(16) = t = 1 is empty, so one attempt is enough.
Array:
Index: 0    1    2    3    4    5    6    7    8    9    10   11   12   13   14
Key:   -    16   47   -    -    35   36   65   -    129  25   2501 -    -    29
Where would you store 14? f(14) = t = 14

Let f2(x) = 11 − (x % 11), so f2(14) = d = 8
Attempts: t = 14 (occupied), (t+8) % 15 = 7 (occupied), (t+16) % 15 = 0 (empty).

Array:
Index: 0    1    2    3    4    5    6    7    8    9    10   11   12   13   14
Key:   14   16   47   -    -    35   36   65   -    129  25   2501 -    -    29
Where would you store 99? f(99) = t = 9
Let f2(x) = 11 − (x % 11), so f2(99) = d = 11
Attempts: t = 9 (occupied), (t+11) % 15 = 5 (occupied), (t+22) % 15 = 1 (occupied), (t+33) % 15 = 12 (empty).

Array:
Index: 0    1    2    3    4    5    6    7    8    9    10   11   12   13   14
Key:   14   16   47   -    -    35   36   65   -    129  25   2501 99   -    29
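Where would you store 127? f(127) = t = 7

Let f2(x) = 11 − (x % 11), so f2(127) = d = 5
Attempts: t = 7 (occupied), t+5 = 12 (occupied), (t+10) % 15 = 2 (occupied), (t+15) % 15 = 7 again.
The probe sequence only cycles through 7, 12 and 2, so 127 cannot be stored even
though the table is not full. This happens because d = 5 shares a factor with N = 15;
a prime table size avoids the problem.

Array (unchanged):
Index: 0    1    2    3    4    5    6    7    8    9    10   11   12   13   14
Key:   14   16   47   -    -    35   36   65   -    129  25   2501 99   -    29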
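The double hashing trace can be reproduced with the sketch below, which uses f(x) = x % 15 and f2(x) = 11 − (x % 11) from the slides. The -1 empty marker, the bound of N probes and the function names are assumptions.

```c
#define N 15       /* table size from the example */
#define R 11       /* prime used by the second hash function */
#define EMPTY (-1) /* assumed marker for an unused slot; keys are non-negative */

void dh_clear(int t[N])
{
    int i;
    for (i = 0; i < N; i++)
        t[i] = EMPTY;
}

/* Stores key using probes (t + i*d) % N, i = 0, 1, 2, ...
 * Gives up after N probes and returns -1; otherwise returns the slot used. */
int dh_insert(int t[N], int key)
{
    int start = key % N;
    int d = R - key % R; /* step size, always in the range 1 .. R */
    int i, pos;

    for (i = 0; i < N; i++) {
        pos = (start + i * d) % N;
        if (t[pos] == EMPTY) {
            t[pos] = key;
            return pos;
        }
    }
    return -1;
}
```

With the keys inserted in the order used above, 14 lands in slot 0 and 99 in slot 12, while the call for 127 returns -1 because its probes only ever visit slots 7, 12 and 2.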
Separate Chaining
The keys are 35, 129, 36, 47, 25, 2501, 65.

Let each array element be the head of a chain (h(x) = x % 15; chains not listed are empty):

2:  47
5:  65 -> 35
6:  36
9:  129
10: 25
11: 2501

Where would you store 29, 16, 14, 99, 127?
After inserting 29, 16, 14, 99 and 127 (chains not listed are empty):

1:  16
2:  47
5:  65 -> 35
6:  36
7:  127
9:  99 -> 129
10: 25
11: 2501
14: 14 -> 29

New keys go at the front of the relevant chain.
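A minimal sketch of separate chaining with insertion at the front of the chain, assuming h(x) = x % 15 and non-negative keys. The node type and the names `heads`, `chain_insert` and `chain_find` are illustrative.

```c
#include <stdlib.h>

#define N 15 /* number of chains, matching the example */

struct chain_node {
    int key;
    struct chain_node *next;
};

static struct chain_node *heads[N]; /* all chains start empty (NULL) */

/* Adds key at the front of its chain. Returns 0 on success, -1 if out of memory. */
int chain_insert(int key)
{
    struct chain_node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->key = key;
    n->next = heads[key % N];
    heads[key % N] = n;
    return 0;
}

/* Returns 1 if key is present in its chain, 0 otherwise. */
int chain_find(int key)
{
    const struct chain_node *p;
    for (p = heads[key % N]; p != NULL; p = p->next)
        if (p->key == key)
            return 1;
    return 0;
}
```

Inserting at the front keeps insertion at constant time; a search still has to walk the chain, which is where the worst-case linear search time comes from.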


Separate Chaining: Disadvantages
 Parts of the array might never be used.
 As chains get longer, search time increases to O(n) in the
worst case.
 Constructing new chain nodes is relatively expensive (still
constant time, but the constant is high).
 Is there a way to use the “unused” space in the array instead of
using chains to make more space?
Chaining
 This method involves maintaining two tables in memory
 There is a table T (as before) in memory which contains the
records in F, except that T now has an additional field LINK.
 LINK field is used so that all records in T with the same hash
address h can be linked together to form a linked list
 There is a hash address table LIST which contains pointers to
the linked lists in T
 Suppose a new record R with key k is added to the file F.
 We place R in the first available location in the table T and
then add R to the linked list with pointer LIST[H(k)]
 Example
 Consider the data given below
 Record: A, B, C, D, E, X, Y, Z
 H(k): 4, 8, 2, 11, 4, 11, 5, 1
Static hashing
 In static hashing the keys are distributed in a table (a 1D array) called the
hash table, which is partitioned into b buckets ht[0], ht[1], ..., ht[b-1].
Each bucket is capable of holding s dictionary pairs (or pointers to them), so a bucket is
said to consist of s slots.
 Consider the 10 identifiers acos, define, float, exp, char, atan, ceil,
floor, clock and ctime, which hash into buckets 0, 3, 5, 4, 2, 0, 2, 5, 2 and 2.
 A hash table with 26 buckets and two slots per bucket, with the first 8
identifiers entered, is shown. The next identifiers,
clock and ctime, hash into bucket ht[2], which is already full. A collision resolution
method is needed to handle this overflow.
Dynamic hashing
 Here the number of buckets is not fixed; it can grow or shrink.
 Dynamic hashing is an extendible hashing technique in which
1. The address of the bucket in which a data item is placed is found by
extracting a certain number of bits of its hash value.
2. Each bucket can hold a fixed number of items, the bucket size. If a
bucket receives more items than the bucket size, the bucket is split and
the directory is doubled if necessary.
Example: Keys 1,4,5,7,8,10
bucket size is 2 that is each page or bucket can hold
maximum of 2 data

KEY 1 4 5 7 8 10
h(KEY) 0001 0100 0101 0111 1000 1010

 Step 1: Insert 1, 4 and 5 (the directory is indexed by the last bit of h(KEY))
Bucket [0]: 4
Bucket [1]: 1, 5
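Example
 Step 2: When we insert 7 there is a bucket overflow in
bucket [1], which would have to hold 1, 5, 7. Split the bucket, double the directory and
place the items in the proper buckets using the last two bits:
Bucket [01]: 1, 5
Bucket [11]: 7
After inserting 8 and 10 the buckets are:
Bucket [00]: 4, 8
Bucket [01]: 1, 5
Bucket [10]: 10
Bucket [11]: 7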

The size of the directory depends on the number of bits of h(k) used to index the directory.
When indexing is done using h(k,2) the directory size is 4.
For h(k,5) directory size is 32.
The number of bits used to index directory is called directory depth.
Example based on Extendible Hashing:
Hashing the following
elements: 16,4,6,22,24,10,31,7,9,20,26.
Bucket Size: 3 (Assume)
Hash Function: Suppose the global depth is X. Then the Hash
Function returns X LSBs.
 Solution: First, calculate the binary forms of each of the given
numbers.
16- 10000
4- 00100
6- 00110
22- 10110
24- 11000
10- 01010
31- 11111
7- 00111
9- 01001
20- 10100
26- 11010
 Initially, the global depth and the local depth are both 1, where
 Global depth = number of bits in the directory id.
 Local depth is associated with the buckets and not with the directory.
 Inserting 16:
The binary format of 16 is 10000 and global-depth is 1. The hash function
returns 1 LSB of 10000 which is 0. Hence, 16 is mapped to the directory with
id=0.
 Inserting 4 and 6:
Both 4(100) and 6(110)have 0 in their LSB. Hence, they are hashed as follows:

Initially

 Inserting 22: The binary form of 22 is 10110. Its LSB is 0. The bucket
pointed to by directory 0 is already full. Hence, an overflow occurs.
 Since Local Depth = Global Depth, the bucket splits and directory
expansion takes place. The numbers present in the overflowing bucket
are rehashed after the split.
 The global depth is incremented by 1, so the global depth is now 2. Hence,
16, 4, 6, 22 are rehashed w.r.t. 2 LSBs: 16 (10000) and 4 (100) go to
directory 00; 6 (110) and 22 (10110) go to directory 10.

The bucket that did not overflow remains untouched. But, since the number of
directories has doubled, we now have 2 directories, 01 and 11, pointing to the same
bucket. This is because the local depth of that bucket has remained 1.

 Inserting 24 and 10: 24 (11000) and 10 (1010) are hashed to the
directories with id 00 and 10 respectively. Here, we encounter no
overflow condition.
 Inserting 31, 7, 9: All of these elements [ 31 (11111), 7 (111),
9 (1001) ] have either 01 or 11 as their 2 LSBs. Hence, they are
mapped to the bucket pointed to by both 01 and 11. We do not
encounter any overflow condition here.

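 Inserting 20: Insertion of data element 20 (10100) again causes
an overflow, because the bucket pointed to by directory 00 already
holds 16, 4 and 24.

Since the local depth of the bucket = global depth (2 = 2), directory
expansion (doubling) takes place along with bucket splitting, and the
global depth becomes 3.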

 Inserting 26: The global depth is 3. Hence, the 3 LSBs of 26 (11010)
are considered. Therefore 26 belongs in the bucket pointed to
by directory 010. That bucket already holds 6, 22 and 10, so it overflows.

Since the local depth of the bucket < global depth (2 < 3), the directory is not
doubled; only the bucket is split and its elements are rehashed.
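The insertion steps traced above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the names ExtHash, Bucket, eh_init and eh_insert are hypothetical, keys are assumed to be distinct non-negative integers, the hash function is the key itself (so the X LSBs of the key index the directory), the global depth is capped at MAX_DEPTH, and memory is never freed.

```c
#include <stdlib.h>

#define BUCKET_SIZE 3   /* bucket capacity from the example above */
#define MAX_DEPTH   8   /* assumption: upper limit on the global depth */

typedef struct {
    int local_depth;            /* number of LSBs shared by the keys stored here */
    int count;                  /* number of keys currently stored */
    int keys[BUCKET_SIZE];
} Bucket;

typedef struct {
    int global_depth;               /* number of LSBs used to index dir[] */
    Bucket *dir[1 << MAX_DEPTH];    /* first 2^global_depth entries are in use */
} ExtHash;

static Bucket *new_bucket(int depth)
{
    Bucket *b = malloc(sizeof *b);
    if (b == NULL)
        abort();
    b->local_depth = depth;
    b->count = 0;
    return b;
}

/* Start with global depth 1: two directory entries, two empty buckets. */
void eh_init(ExtHash *t)
{
    t->global_depth = 1;
    t->dir[0] = new_bucket(1);
    t->dir[1] = new_bucket(1);
}

/* Insert a non-negative key; splits buckets and doubles the directory as needed. */
void eh_insert(ExtHash *t, int key)
{
    for (;;) {
        unsigned n = 1u << t->global_depth;          /* directory size */
        Bucket *b = t->dir[(unsigned)key & (n - 1u)];
        if (b->count < BUCKET_SIZE) {                /* room left: store and stop */
            b->keys[b->count++] = key;
            return;
        }
        if (b->local_depth == t->global_depth) {     /* must double the directory */
            if (t->global_depth == MAX_DEPTH)
                abort();                             /* limit of this sketch */
            for (unsigned i = 0; i < n; i++)
                t->dir[n + i] = t->dir[i];           /* new half mirrors old half */
            t->global_depth++;
            n *= 2u;
        }
        /* Split b on its next bit: keys with that bit set move to nb. */
        unsigned bit = 1u << b->local_depth;
        Bucket *nb = new_bucket(++b->local_depth);
        for (unsigned i = 0; i < n; i++)
            if (t->dir[i] == b && (i & bit))
                t->dir[i] = nb;
        int kept = 0;
        for (int i = 0; i < b->count; i++) {
            if ((unsigned)b->keys[i] & bit)
                nb->keys[nb->count++] = b->keys[i];
            else
                b->keys[kept++] = b->keys[i];
        }
        b->count = kept;
        /* loop again: the key is retried against the updated directory */
    }
}
```

Calling eh_init() and then eh_insert() for 16, 4, 6, 22, 24, 10, 31, 7, 9, 20, 26 in that order ends with global depth 3 and the buckets 000: 16, 24; 100: 4, 20; 010: 10, 26; 110: 6, 22; while 001, 011, 101 and 111 all point to one bucket of local depth 1 holding 31, 7, 9.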
Files and organization
Introduction
 Every file contains data which can be organized in a hierarchy
to present a systematic organization.
 The data hierarchy includes data items such as fields, records,
files, and database.
Data field
 A data field is an elementary unit that stores a single fact. A
data field is usually characterized by its type and size.
 Example: student's name is a data field that stores the name
of students.
Record
 A record is a collection of related data fields which is seen as a
single unit from the application point of view.
 Example: The student's record may contain data fields such as name,
address, phone number, roll number, marks obtained, and so
on.
File
 A file is a collection of related records.
 Example: A file of all the employees working in an organization
Directory
 A directory stores information of related files. A directory
organizes information so that users can find it easily
File Attributes
 A file has a list of attributes associated with it that tell the
operating system and the application software about the file
and how it is intended to be used.
File name
 It is a string of characters that stores the name of a file.
 File naming conventions vary from one operating system to
the other
File position
 It is a pointer that points to the position at which the next
read/write operation will be performed
File structure
 It indicates whether the file is a text file or a binary file.
 In the text file, the numbers are stored as a string of
characters.
 A binary file stores numbers in the same way as they are
represented in the main memory
File access methods
 It indicates whether the records in a file can be accessed
sequentially or randomly
 In sequential access mode, records are read one by one
 In random access, records can be accessed in any order
Attributes flag
 A file can have six additional attributes attached to it.
 These attributes are usually stored in a single byte, with each
bit representing a specific attribute.
 If a particular bit is set to '1' then this means that the
corresponding attribute is turned on.
Read-only
 A file marked as read-only cannot be deleted or modified.
Hidden
 A file marked as hidden is not displayed in the directory
listing.
 System
 Volume label
 Directory
 Archive
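Storing the attributes as bits of a single byte can be sketched in C as below. The slides do not say which bit belongs to which attribute; the bit values used here follow the FAT/MS-DOS attribute byte and should be read as an assumption, and the helper names attr_set, attr_clear and attr_is_set are hypothetical.

```c
/* Assumed bit layout (as in the FAT/MS-DOS attribute byte). */
enum {
    ATTR_READ_ONLY    = 0x01,
    ATTR_HIDDEN       = 0x02,
    ATTR_SYSTEM       = 0x04,
    ATTR_VOLUME_LABEL = 0x08,
    ATTR_DIRECTORY    = 0x10,
    ATTR_ARCHIVE      = 0x20
};

/* Turn an attribute on: set its bit to 1. */
unsigned char attr_set(unsigned char attrs, unsigned char flag)
{
    return (unsigned char)(attrs | flag);
}

/* Turn an attribute off: set its bit to 0. */
unsigned char attr_clear(unsigned char attrs, unsigned char flag)
{
    return (unsigned char)(attrs & ~flag);
}

/* Returns 1 if the attribute is turned on, 0 otherwise. */
int attr_is_set(unsigned char attrs, unsigned char flag)
{
    return (attrs & flag) != 0;
}
```

With this layout, a file that is both read-only and hidden has the attribute byte 0x03.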
Text files
 A text file, also known as a flat file or an ASCII file, is
structured as a sequence of lines of alphabets, numerals and
special characters.
 It is possible for humans to read text files which contain
only ASCII text.
 Text files can be manipulated by any text editor, but they do not
provide efficient storage.
Binary files
 A binary file contains any type of data encoded in binary form
for computer storage and processing purposes.
 A binary file is not readable by humans.
 Binary files provide efficient storage of data, but they can
be read only through an appropriate program.
Basic File Operations
 Creating a File
 A file is created by specifying its name and mode. The file may be
opened for writing records that are read from an input device. Once
all the records have been written into the file, the file is closed. The
file is now available for future read/write operations by any program
that has been designed to use it in some way or the other.
 Updating a File
 Updating a file means changing the contents of the file to reflect a
current picture of reality. A file can be updated in the following ways:
 Inserting a new record in the file. For example, if a new student joins
the course, we need to add his record to the STUDENT file.
 Deleting an existing record. For example, if a student quits a course in
the middle of the session, his record has to be deleted from the
STUDENT file.
 Modifying an existing record. For example, if the name of a student
was spelt incorrectly, then correcting the name will be a modification
of the existing record.
 Retrieving from a File
 It means extracting useful data from a given file. Information can
be retrieved from a file either for an inquiry or for report
generation. An inquiry retrieves a low volume of data,
while report generation may retrieve a large volume of data from
the file.
 Maintaining a File
 It involves restructuring or re-organizing the file to improve the
performance of the programs that access this file.
 Restructuring a file keeps the file organization unchanged and
changes only the structural aspects of the file.
 Example: changing the field width or adding/deleting fields.
 File reorganization may involve changing the entire organization of
the file
File organization
 Organization of records means the logical arrangement of records
in the file and not the physical layout of the file as stored on a
storage media
 Sequential Organization
 A sequentially organized file stores the records in the order in
which they were entered.
 Sequential files can be read only sequentially, starting with the first
record in the file.
 Sequential file organization is the most
basic way to organize a large collection
of records in a file
Advantages
 Simple and easy to handle
 No extra overheads involved
 Sequential files can be stored on magnetic disks as well as magnetic tapes
 Well suited for batch-oriented applications
Disadvantages
 Records can be read only sequentially. If the ith record has to be read, then the
i-1 records before it must be read first
 Does not support in-place updates. A new file has to be created and the
original file has to be replaced with the new file that contains the desired
changes
 Cannot be used for interactive applications
Relative File Organization
 If the records are of fixed length and we know the base address of
the file and the length of the record, then any record i can be
accessed using the following formula:
 Address of ith record = base_address + (i–1) * record_length
 Suppose the base address of a file is 1000 and each record
occupies 20 bytes. Then the address of the 5th record is:
1000 + (5-1) * 20
= 1000 + 80
= 1080
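The address formula can be sketched in C as shown below. The function names record_address and read_record are hypothetical; read_record additionally assumes that the file holds only fixed-length records starting at offset 0 and that it was opened in binary mode, so that byte offsets can be passed directly to fseek().

```c
#include <assert.h>
#include <stdio.h>

/* Address of the ith record (i counted from 1) of a file of fixed-length records. */
long record_address(long base_address, long i, long record_length)
{
    assert(i >= 1 && record_length > 0);
    return base_address + (i - 1) * record_length;
}

/* Reads the ith record into buf. Returns 1 on success, 0 on failure. */
int read_record(FILE *fp, long i, void *buf, size_t record_length)
{
    long offset = record_address(0L, i, (long)record_length);
    if (fseek(fp, offset, SEEK_SET) != 0)
        return 0;
    return fread(buf, record_length, 1, fp) == 1;
}
```

record_address(1000, 5, 20) reproduces the worked example and gives 1080.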
 Features
 Provides an effective way to access individual records
 The record number represents the location of the record relative to
the beginning of the file
 Records in a relative file are of fixed length
 Relative files can be used for both random as well as sequential access
 Every location in the table either stores a record or is marked as
FREE
Advantages
 Ease of processing
 If the relative record number of the record that has to be accessed
is known, then the record can be accessed directly
 Random access of records makes access to relative files fast
 Allows deletions and updates in the same file
 Provides random as well as sequential access of records with low
overhead
 New records can be easily added in the free locations based on the
relative record number of the record to be inserted
 Well suited for interactive applications
Disadvantages
 Use of relative files is restricted to disk devices
 Records can be of fixed length only
 For random access of records, the relative record number must be
known in advance
Indexed Sequential File Organization
 Features
 Index table stores the address of the records in the file
 The ith entry in the index table points to the ith record of the file
 Provides fast data retrieval
 Records are of fixed length
 While the index table is read sequentially to find the
address of the desired record, a direct access is made
to the address of the specified record in order to
access it randomly
 Indexed sequential files perform well in situations
where sequential access as well as random access is made to
the data
Advantages
 The key improvement is that the indices are small and can be searched
quickly, allowing the database to access only the records it needs
 Supports applications that require both batch and interactive processing
 Records can be accessed sequentially as well as randomly
 Updates the records in the same file
Disadvantages
 Indexed sequential files can be stored only on disks
 Needs extra space and overhead to store indices
 Handling these files is more complicated than handling sequential files
 Supports only fixed length records
INDEXING
 In the indexing technique, the index table stores the addresses
of the records in the file.
 There are two kinds of indices:
 Ordered indices, which are sorted based on one or more key values
 Hash indices, which are based on the values generated by applying a hash function
 1. Ordered Indices
Indices are used to provide fast random access to records. An index of a
file may be a primary index or a secondary index.
Primary Index
In a sequentially ordered file, the index whose search key specifies the
sequential order of the file is defined as the primary index.
 Example: suppose records of students are stored in a STUDENT file in
sequential order from roll number 1 to roll number 60. If we
want to search for the record with, say, roll number 10, then the index on the
student's roll number is the primary index.
INDEXING
 Secondary Index
 An index whose search key specifies an order different from the sequential order
of the file is called as the secondary index.
 Example: If the record of a student is searched by his name, then the name is a
secondary index. Secondary indices are used to improve the performance of
queries on non-primary keys.
 Dense index
In a dense index, the index table stores the address of every record in the file.
By looking at the dense index, it can be concluded directly whether the record
exists in the file or not.
 Sparse index
In a sparse index, the index table stores the address of only some of the records in
the file.
Sparse indices are small and therefore easy to fit in the main memory.
In a sparse index, to locate a record, first find the entry in the index table with the
largest search key value that is less than or equal to the search key value of
the desired record. Then, start at the record pointed to by that entry and
search forward using the sequential pointers in the
file, until the desired record is obtained.
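The sparse index lookup described above can be sketched in C as follows. This is an illustration only: the types Record and IndexEntry and the function sparse_lookup are hypothetical, an in-memory array sorted by key stands in for the file, array positions stand in for record addresses, and the first index entry is assumed to point to the first record.

```c
#include <stddef.h>

typedef struct { int key; } Record;                 /* other fields omitted */
typedef struct { int key; size_t pos; } IndexEntry; /* pos: position of the record */

/* Returns the position of the record with the given key, or -1 if it is absent. */
long sparse_lookup(const IndexEntry *index, size_t n_index,
                   const Record *records, size_t n_records, int key)
{
    /* Binary search: count the index entries whose key is <= the search key. */
    size_t lo = 0, hi = n_index;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (index[mid].key <= key)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return -1;              /* key is smaller than every indexed key */

    /* Sequential scan from the record the chosen index entry points to. */
    for (size_t i = index[lo - 1].pos; i < n_records && records[i].key <= key; i++)
        if (records[i].key == key)
            return (long)i;
    return -1;
}
```

With records keyed 10, 20, ..., 70 and index entries for keys 10, 40 and 70, a search for 50 starts at the record with key 40 and scans forward one record.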
Hashed Indices
 Hashing is used to compute the address of a record by applying a hash
function to the search key value. If two search keys hash to the same
address, a collision occurs and a collision resolution scheme is
applied to generate a new address.
 Choosing a good hash function is critical to the success of this
technique. By a good hash function, we mean two things.
 1. First, irrespective of the number of search keys, it gives an average-case
lookup time that is a small constant.
 2. Second, the function distributes records uniformly and randomly
among the buckets, where a bucket is defined as a unit of one or more
records.
 The worst hash function is one that maps all the keys to the
same bucket.
 The drawback of using hashed indices includes:
 Though the number of buckets is fixed, the number of records may grow
with time.
 If the number of buckets is too large, storage space is wasted.
 If the number of buckets is too small, there may be too many collisions.
 The following operations are performed in a hashed file
organization.
 1. Insertion
 To insert a record that has ki as its search value, use the hash function h(ki)
to compute the address of the bucket for that record. If the bucket is free,
store the record else use chaining to store the record.
 2. Search
 To search a record having the key value ki, use h(ki) to compute the
address of the bucket where the record is stored. The bucket may contain
one or several records, so check for every record in the bucket to retrieve
the desired record with the given key value.
 3. Deletion
 To delete a record with key value ki, use h(ki) to compute the address of
the bucket where the record is stored. The bucket may contain one or
several records so check for every record in the bucket, and then delete
the record.
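The three operations can be sketched in C with chaining as below. This is a minimal in-memory illustration, not the layout of a real hashed file: the names HashFile, Node, hf_insert, hf_search and hf_delete are hypothetical, the number of buckets is fixed at an assumed NUM_BUCKETS, h(k) = k mod NUM_BUCKETS is just one possible hash function, and keys are assumed to be unique.

```c
#include <stdlib.h>

#define NUM_BUCKETS 8   /* assumption: fixed number of buckets */

typedef struct Node {
    int key;
    int value;              /* stands in for the rest of the record */
    struct Node *next;      /* chain of records that share the bucket */
} Node;

typedef struct { Node *bucket[NUM_BUCKETS]; } HashFile;

static unsigned h(int key) { return (unsigned)key % NUM_BUCKETS; }

/* Insertion: link the record into the chain of bucket h(key). Returns 1 on success. */
int hf_insert(HashFile *t, int key, int value)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    n->key = key;
    n->value = value;
    n->next = t->bucket[h(key)];
    t->bucket[h(key)] = n;
    return 1;
}

/* Search: check every record in bucket h(key). Returns NULL if not found. */
Node *hf_search(const HashFile *t, int key)
{
    for (Node *n = t->bucket[h(key)]; n != NULL; n = n->next)
        if (n->key == key)
            return n;
    return NULL;
}

/* Deletion: find the record in bucket h(key) and unlink it. Returns 1 if deleted. */
int hf_delete(HashFile *t, int key)
{
    Node **pp = &t->bucket[h(key)];
    while (*pp != NULL) {
        if ((*pp)->key == key) {
            Node *dead = *pp;
            *pp = dead->next;
            free(dead);
            return 1;
        }
        pp = &(*pp)->next;
    }
    return 0;
}
```

The table must start with all buckets empty, for example HashFile t = {{NULL}};. Keys 3 and 11 then collide in bucket 3 and are kept on the same chain.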
File Handling in C
 Console-oriented I/O functions use the keyboard as the input device
and the monitor as the output device.
 Examples are printf(), scanf(), getchar(), putchar(),
gets() and puts().
 The problems with console I/O are:
 1. The entire data is lost when either the program is terminated
or the computer is turned off.
 2. When the volume of data to be entered is large, it takes a
lot of time to enter the data.
 3. If the user makes a mistake while entering data, the whole data has
to be re-entered.
 The solution is a file: a file is a place on the disk (not in memory)
where a group of related data is stored. Such files are also called data files.
 There are two ways to perform file operations in C.
1. Low level I/O that uses Unix system calls.
2. High level I/O operations using functions in C's standard
I/O library.
 'C' provides following file management functions,
 Creation of a file
 Opening a file
 Reading a file
 Writing to a file
 Closing a file
Some important file management functions available in 'C':

Function     Purpose
 fopen()    Creates a new file or opens an existing file
 fclose()   Closes a file
 fprintf()  Writes formatted data to a file
 fscanf()   Reads formatted data from a file
 getc()     Reads a single character from a file
 putc()     Writes a single character to a file
 getw()     Reads an integer from a file (not part of standard C; provided by some libraries)
 putw()     Writes an integer to a file (not part of standard C; provided by some libraries)
 fseek()    Sets the position of a file pointer to a specified location
 ftell()    Returns the current position of a file pointer
 rewind()   Sets the file pointer to the beginning of a file
Defining and Opening a file
The general format for declaring and opening a file is:
FILE *fp;
fp = fopen("filename", "mode");
Here, the first statement declares the variable fp as a "pointer to
the data type FILE".
The second statement opens the file named filename for the
purpose given by mode and stores in the file pointer fp the address of the
structure (including the buffer area) allocated for the file. If the file
cannot be opened, fopen() returns NULL.
 • Note: Several files can be open at the same time; an implementation
guarantees at least FOPEN_MAX simultaneously open files.
File Opening Modes
File Mode Description
 r Open an existing file for reading. The contents of the file are
not changed. fopen() returns NULL if the file does not exist.
 w Open a file for writing. If the file does not exist, a new
file is created. If the file already exists, all the data
inside it is discarded (the file is truncated) before it is
opened for writing.
 a Open a file in append mode. Existing content is not changed; new
data is written at the end. The file is created if it does not exist.
 r+ Open an existing file for reading and writing, starting at the beginning
 w+ Open for reading and writing; an existing file is truncated, otherwise a new file is created
 a+ Open for reading and appending; the file is created if it does not exist
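The difference between the "w" and "a" modes can be seen with the following sketch. The helper names write_text and count_chars are hypothetical, and the file name passed to them is up to the caller.

```c
#include <stdio.h>

/* Writes text to the file using the given mode ("w" or "a"). Returns 0 on success. */
int write_text(const char *path, const char *mode, const char *text)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    int ok = (fputs(text, fp) != EOF);
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Returns the number of characters in the file, or -1 if it cannot be opened in "r" mode. */
long count_chars(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    long n = 0;
    while (fgetc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}
```

Writing "abc" with mode "w" and then "de" with mode "a" leaves 5 characters in the file; writing "x" with mode "w" afterwards leaves only 1, because "w" truncates the file.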
Closing a file
 One should always close a file as soon as the operations on it are
over. Closing writes any buffered data to the file and breaks the link
between the file pointer and the file. This prevents accidental damage to the file.
 'C' provides the fclose function to perform the file closing operation.
The syntax of fclose is as follows:
fclose (file_pointer);
 After closing the file, the same file pointer can be used with
other files.
 In 'C' programming, files are automatically closed when the program
terminates normally. Closing a file explicitly with the fclose function is
still good programming practice.
Writing to a File
 In C, when you write to a file, newline characters '\n' must be
explicitly added.
 The stdio library offers the necessary functions to write to a file:
 fputc(ch, file_pointer): It writes the character ch to the file
pointed to by file_pointer.
 fputs(str, file_pointer): It writes the string str to the file pointed to
by file_pointer.
 fprintf(): It is a formatted output function which is used to
write integer, float, char or string values to a file.
Syntax: fprintf(fp, "control_string", list_of_variables);
 putw(num, file_pointer): It writes the integer num to the file pointed
to by file_pointer. (putw is not part of standard C.)
Reading data from a File

 The following functions are dedicated to reading data from a file:
 fgetc(file_pointer): It returns the next character from the
file pointed to by the file pointer. When the end of the file has
been reached, EOF is returned.
 fgets(buffer, n, file_pointer): It reads at most n-1 characters
from the file, stopping after a newline or at the end of the file, and
stores them in buffer with the null character '\0' appended as the
last character.
 fscanf(): It is a formatted input function which is used to read
integer, float, char or string values from a file.
Syntax: fscanf(fp, "control_string", list_of_variables);
 getw(file_pointer): It returns the next integer from the file
pointed to by the file pointer and the file pointer is advanced. (getw is
not part of standard C.)
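A formatted write followed by a formatted read can be sketched as follows. The functions save_student and load_student and the one-line "roll name marks" file layout are assumptions made for this illustration; the name is assumed to contain no spaces and to be shorter than 32 characters.

```c
#include <stdio.h>

/* Writes one line "roll name marks" to the file. Returns 0 on success. */
int save_student(const char *path, int roll, const char *name, float marks)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    int ok = (fprintf(fp, "%d %s %.1f\n", roll, name, marks) > 0);
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Reads the line back with fscanf(). Returns 0 if all three fields were read. */
int load_student(const char *path, int *roll, char name[32], float *marks)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    int fields = fscanf(fp, "%d %31s %f", roll, name, marks);
    fclose(fp);
    return fields == 3 ? 0 : -1;
}
```

The return value of fscanf() is the number of items successfully read, so comparing it with 3 detects a short or malformed line.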
End-Of-File (EOF)
 EOF is a macro defined in <stdio.h> that expands to a negative
integer (commonly -1). Input functions such as fgetc() return it to
indicate that the end of the file has been reached or that a read
error has occurred; it is not a character stored in the file.
 From the keyboard, end of file is signalled by typing Ctrl+Z on
DOS/Windows (character code 26) or Ctrl+D on Unix-like systems.
 • Because EOF lies outside the range of valid characters, the value
returned by fgetc() must be stored in an int, not a char, before it is
compared with EOF.
 • Caution: Continuing to read after EOF without checking for it might
result in garbage data or an infinite loop
situation.
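/* Program to read the content of a file and display it on
the screen */
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *fp;
    char filename[256];
    int c;                  /* int, so that EOF can be told apart from any character */

    printf("Enter filename:\t");
    if (fgets(filename, sizeof filename, stdin) == NULL)
        return 1;
    filename[strcspn(filename, "\n")] = '\0';   /* strip the trailing newline */
    fp = fopen(filename, "r");
    if (fp == NULL)
    {
        printf("\n Cannot open file.");
        return 1;
    }
    printf("\n The content of file is:\n");
    while ((c = fgetc(fp)) != EOF)
        putchar(c);
    fclose(fp);
    return 0;
}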
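/* Program to copy the content of sfile to dfile */
#include <stdio.h>
#include <string.h>

/* Prompts for a file name, reads one line and strips the newline */
static int read_name(const char *prompt, char *name, int size)
{
    printf("%s", prompt);
    if (fgets(name, size, stdin) == NULL)
        return 0;
    name[strcspn(name, "\n")] = '\0';
    return 1;
}

int main(void)
{
    FILE *sfp, *dfp;
    char sfilename[256], dfilename[256];
    int c;                  /* int, not char, so that EOF is detected reliably */

    if (!read_name("Enter source filename:\t", sfilename, sizeof sfilename) ||
        !read_name("\n Enter destination filename:\t", dfilename, sizeof dfilename))
        return 1;
    sfp = fopen(sfilename, "r");
    if (sfp == NULL) {
        printf("\nSource file can't be opened.");
        return 1;
    }
    dfp = fopen(dfilename, "w");
    if (dfp == NULL) {
        printf("\n Destination file cannot be created or opened.");
        fclose(sfp);
        return 1;
    }
    while ((c = fgetc(sfp)) != EOF)
        fputc(c, dfp);
    printf("\n Copied........");
    fclose(dfp);
    fclose(sfp);
    return 0;
}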
Functions used in random access
 1. ftell(): This function takes a file pointer as argument and returns
a number of type long that indicates the current position of
the file pointer within the file.
 This function is useful for saving the current position in a file,
which can be used later in the program.
 Syntax
 n = ftell(fp);
 Here, n would give the relative offset (in bytes) of the current position.
This means that n bytes have already been read (or written).
Random access to files
 2. rewind(): This function takes a file pointer as argument and resets
the current position of the file pointer to the start of the file.
 Syntax: rewind(fp);
What do these statements do?
rewind(fp);
n = ftell(fp);
• Here, n would be assigned 0, because the file position has been set to the start
of the file by rewind().
 • Note: The first byte in the file is numbered 0, the second 1, and so on.
 3. fseek(): This function is used to move the file pointer to a desired
position within a file.
 Syntax : fseek(fp, offset, position);
where fp is a file pointer, offset is a number or variable of type
long, and position is an integer.
• The offset specifies the number of positions (bytes) to be moved
from the location specified by position.
• The position can have one of the following 3 values. The named constants
are defined in <stdio.h> and are the portable way to write them; 0, 1 and 2
are their usual values.
 Value  Constant   Meaning
0       SEEK_SET   Beginning of file
1       SEEK_CUR   Current position
2       SEEK_END   End of file
The offset may be positive, meaning move forwards, or negative, meaning move
backwards.
• Examples:
Statement            Meaning
fseek(fp, 0L, 0);    Move file pointer to beginning of file. (Same as rewind.)
fseek(fp, 0L, 1);    Stay at the current position. (File pointer is not moved.)
fseek(fp, 0L, 2);    Move file pointer past the last character of the file. (Go
                     to the end of file.)
fseek(fp, m, 0);     Move file pointer to the (m+1)th byte in the file.
fseek(fp, m, 1);     Move file pointer forwards by m bytes.
fseek(fp, -m, 1);    Move file pointer backwards by m bytes from the current
                     position.
fseek(fp, -m, 2);    Move file pointer backwards by m bytes from the end.
                     (Positions the file pointer at the mth character from the
                     end.)
 When the operation is successful, fseek() returns 0 (zero).
• If the request cannot be satisfied, for example an attempt to move the
file pointer before the beginning of the file, fseek() returns a non-zero
value (-1 on many implementations).
• It is good practice to check whether an error has occurred or not,
before proceeding further.
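Checking the return values of fseek() and ftell() can be sketched as below. The function name file_size is hypothetical. The sketch assumes a file opened in binary mode on a system where seeking to the end of a file is supported; for text files the value returned by ftell() is not guaranteed to be a byte count.

```c
#include <stdio.h>

/* Returns the size of an open file in bytes, or -1 on error.
   The current file position is restored before returning. */
long file_size(FILE *fp)
{
    long saved = ftell(fp);              /* remember where we are */
    if (saved == -1L)
        return -1L;
    if (fseek(fp, 0L, SEEK_END) != 0)    /* go to the end of the file */
        return -1L;
    long size = ftell(fp);               /* offset of the end = number of bytes */
    if (fseek(fp, saved, SEEK_SET) != 0) /* go back to the saved position */
        return -1L;
    return size;
}
```

Each call is checked before the next one is made, so a failed seek is reported to the caller instead of producing a wrong size.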
/* A program that uses the functions ftell() and fseek() */
#include <stdio.h>

int main(void)
{
    FILE *fp;
    int c;                  /* int, so that EOF can be told apart from a character */
    long n;

    fp = fopen("RANDOM", "w");
    if (fp == NULL)
    {
        printf("\nCannot create file.");
        return 1;
    }
    while ((c = getchar()) != EOF)
        fputc(c, fp);
    printf("\nNo. of characters entered=%ld", ftell(fp));
    fclose(fp);
    fp = fopen("RANDOM", "r");
    if (fp == NULL)
    {
        printf("\nCannot open file.");
        return 1;
    }
    n = 0L;
    /* Read every 5th character; stop when the seek or the read fails */
    while (fseek(fp, n, SEEK_SET) == 0 && (c = fgetc(fp)) != EOF)
    {
        printf("Position of %c is %ld\n", c, n);    /* c is the (n+1)th character */
        n = n + 5L;
    }
    putchar('\n');
    /* Print the file backwards, starting from the last character */
    if (fseek(fp, -1L, SEEK_END) == 0)
    {
        do
        {
            putchar(fgetc(fp));
        } while (!fseek(fp, -2L, SEEK_CUR));
    }
    fclose(fp);
    return 0;
}
