In computing, a distributed file system (DFS) or network file system is any file system that allows access to files from multiple hosts via a computer network. This makes it possible for multiple users on multiple machines to share files and storage resources.
2. • Distributed file system (DFS) – a distributed implementation of the classical time-sharing model of a file system, where multiple users share files and storage resources.
• A DFS manages a set of dispersed storage devices.
• The overall storage space managed by a DFS is composed of different, remotely located, smaller storage spaces.
• There is usually a correspondence between constituent storage spaces and sets of files.
3. • Service – a software entity running on one or more machines that provides a particular type of function to a priori unknown clients.
• Server – the service software running on a single machine.
• Client – a process that can invoke a service using a set of operations that forms its client interface.
• A client interface for a file service is formed by a set of primitive file operations (create, delete, read, write).
• The client interface of a DFS should be transparent, i.e., it should not distinguish between local and remote files.
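To make the client-interface idea concrete, here is a minimal sketch in Python (the class and method names are illustrative assumptions, not taken from any real DFS): the client interface is just the set of primitive operations a client may invoke, and nothing in it reveals whether the files are local or remote.

```python
from abc import ABC, abstractmethod

class FileService(ABC):
    """Client interface for a file service: the set of primitive
    operations a client can invoke. Whether an implementation talks
    to a local disk or to a remote server is hidden behind this
    interface, which is what makes the interface transparent."""

    @abstractmethod
    def create(self, path: str) -> None: ...

    @abstractmethod
    def delete(self, path: str) -> None: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...
```

A local file service and a remote one would both subclass FileService; client code written against the interface cannot tell them apart.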
4. • Introduction
• DFS Issues
• Naming and Transparency
• Remote File Access
• Stateful versus Stateless Service
• File Replication
• Example Systems
• AFS (vs. NFS)
5. A Distributed File System (DFS), as the name suggests, is a file system that is spread across multiple file servers or multiple locations. It allows programs to access and store remote files exactly as they do local ones, letting users reach their files from any computer on the network.
6. The main purpose of the Distributed File System (DFS) is to allow users of physically distributed systems to share their data and resources by using a common file system.
A collection of workstations and mainframes connected by a Local Area Network (LAN) is a typical configuration for a Distributed File System.
A DFS is implemented as a part of the operating system. In a DFS, a namespace is created, and this process is transparent to the clients.
7. Location Transparency –
Location transparency is achieved through the namespace component.
Redundancy –
Redundancy is provided through a file replication component.
In the case of failure or heavy load, these components together improve data availability by allowing data stored in different locations to be logically grouped under one folder, known as the “DFS root”.
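A rough sketch of how a namespace plus replication yields a single logical folder, with entirely hypothetical server and path names: the DFS root maps each logical name to a set of replica locations, and the client library falls back to another replica on failure.

```python
# Hypothetical DFS-root namespace: logical path -> replica locations.
NAMESPACE = {
    "/dfsroot/reports/q1.txt": [
        "server-a:/vol1/reports/q1.txt",  # primary copy
        "server-b:/vol7/reports/q1.txt",  # redundant copy
    ],
}

def fetch_from(location: str) -> bytes:
    """Stand-in for a real remote read over the network."""
    if location.startswith("server-a"):
        raise OSError(f"{location} unreachable")  # simulate the primary being down
    return b"contents fetched from " + location.encode()

def dfs_read(logical_path: str) -> bytes:
    """Resolve a logical name through the namespace and fail over
    between replicas, hiding both location and redundancy."""
    for location in NAMESPACE[logical_path]:
        try:
            return fetch_from(location)
        except OSError:
            continue  # try the next replica
    raise FileNotFoundError(f"no replica of {logical_path} is available")

print(dfs_read("/dfsroot/reports/q1.txt"))  # served by server-b; the caller never knows
```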
9. Structure transparency – The client does not need to know the number or locations of file servers and storage devices. Multiple file servers should be provided for performance, adaptability, and dependability.
Access transparency – Both local and remote files should be accessible in the same manner. The file system should automatically locate an accessed file and deliver it to the client’s side.
Naming transparency – The name of a file should give no hint of the file’s location. Once a name is given to a file, it should not change when the file is transferred from one node to another.
Replication transparency – If a file is replicated on multiple nodes, both the existence of the copies and their locations should be hidden from the clients.
10. User mobility : The system automatically brings the user’s home directory to the node where the user logs in.
Performance : Performance is measured as the average time needed to satisfy client requests. This time covers the CPU time + the time taken to access secondary storage + the network access time (see the small calculation after this list). It is desirable that the performance of a Distributed File System be comparable to that of a centralized file system.
Simplicity and ease of use : The user interface of a file system should be simple, and the number of commands should be small.
High availability : A Distributed File System should be able to continue operating in the face of partial failures such as a link failure, a node failure, or a storage-drive crash.
A highly reliable and adaptable distributed file system should have multiple, independent file servers controlling multiple, independent storage devices.
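To make the performance criterion concrete, here is a small illustration with made-up numbers (the figures are assumptions, not measurements): the response time a client sees is simply the sum of the three components named above.

```python
# Illustrative (made-up) per-request costs, in milliseconds.
cpu_time_ms     = 0.5  # request processing on the server
storage_time_ms = 4.0  # secondary-storage (disk) access
network_time_ms = 1.5  # round trip over the LAN

response_time_ms = cpu_time_ms + storage_time_ms + network_time_ms
print(f"average response time: {response_time_ms} ms")  # -> 6.0 ms
```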
11. Scalability : Since growing a network by adding new machines or joining two networks together is routine, a distributed system will inevitably grow over time. A good distributed file system should therefore be built to scale gracefully as the number of nodes and users grows, and service should not be substantially disrupted by that growth.
High reliability : The likelihood of data loss should be minimized as far as feasible. Users should not feel forced to make backup copies of their files because of the system’s unreliability; rather, the file system should itself create backup copies of key files that can be used if the originals are lost. Many file systems employ stable storage as a high-reliability strategy (see the sketch below).
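One common reading of stable storage is to write every file to two independent storage devices, so that a single device crash loses nothing. A minimal sketch under that assumption (the mount points are hypothetical):

```python
import shutil
from pathlib import Path

# Hypothetical mount points of two independent storage devices.
PRIMARY = Path("/mnt/disk1")
MIRROR = Path("/mnt/disk2")

def stable_write(name: str, data: bytes) -> None:
    """Write the file to the primary device, then copy it to an
    independent mirror; the failure of either single device leaves
    at least one intact copy."""
    primary_path = PRIMARY / name
    primary_path.write_bytes(data)
    shutil.copy2(primary_path, MIRROR / name)
```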
12. Data integrity : A file system is frequently shared by multiple users, and it must guarantee the integrity of data saved in a shared file. That is, concurrent access requests from users competing for the same file must be correctly synchronized by a concurrency control method. Atomic transactions are a high-level concurrency control mechanism for data integrity that file systems often offer to users.
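As a down-to-earth stand-in for the concurrency control described above, here is a sketch using POSIX advisory file locks (the fcntl module is Unix-only, and the file name is hypothetical): each writer takes an exclusive lock, so concurrent appends are serialized instead of interleaving and corrupting the shared file.

```python
import fcntl

def synchronized_append(path: str, line: str) -> None:
    """Append one record to a shared file under an exclusive
    advisory lock, so concurrent writers are serialized."""
    with open(path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # block until we own the lock
        try:
            f.write(line + "\n")
            f.flush()  # make the record durable before releasing the lock
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)

synchronized_append("/tmp/shared.log", "user-42 updated record 7")
```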
Security : A distributed file system should be secure so that its users can trust that their data will be kept private. Security mechanisms must be implemented to safeguard the information in the file system against unwanted and unauthorized access.
Heterogeneity : Heterogeneity in distributed systems is unavoidable as a consequence of large scale. Users of heterogeneous distributed systems can use different computer platforms for different purposes.
14. NFS stands for Network File System. It is a client-server architecture that allows a computer user to view, store, and update files remotely. The NFS protocol is one of several distributed file system standards for Network-Attached Storage (NAS).
CIFS stands for Common Internet File System. CIFS is a dialect of SMB; that is, CIFS is an implementation of the SMB protocol, designed by Microsoft.
SMB stands for Server Message Block. It is a file-sharing protocol invented by IBM. The SMB protocol was created to allow computers to perform read and write operations on files on a remote host over a Local Area Network (LAN). The directories on the remote host that can be accessed via SMB are called “shares”.
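Once an NFS or SMB share has been mounted, ordinary file APIs operate on it exactly as on local storage, which is the point of these protocols. A minimal sketch assuming an export is already mounted at the hypothetical path /mnt/share:

```python
from pathlib import Path

# Assume an NFS or SMB share has already been mounted here by the
# administrator; client code cannot tell it from a local directory.
share = Path("/mnt/share")

(share / "notes.txt").write_text("written over the network\n")
print((share / "notes.txt").read_text())
```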
15. Hadoop is a collection of open-source software utilities. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. The core of Hadoop consists of a storage part, known as the Hadoop Distributed File System (HDFS), and a processing part, the MapReduce programming model.
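The MapReduce model is easiest to see in the classic word-count example. Hadoop Streaming runs the mapper and the reducer as ordinary programs reading stdin and writing stdout; the sketch below puts both halves in one Python script (run with --map for the map phase, otherwise it reduces). The flag name and file layout are illustrative choices, not Hadoop requirements.

```python
import sys
from itertools import groupby

def mapper() -> None:
    """Map phase: emit a (word, 1) pair for every word on stdin."""
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer() -> None:
    """Reduce phase: Hadoop delivers mapper output sorted by key,
    so equal words are adjacent and can be summed with groupby."""
    pairs = (line.rstrip("\n").split("\t") for line in sys.stdin)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        print(f"{word}\t{sum(int(count) for _, count in group)}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["--map"] else reducer()
```

In a real job, HDFS supplies the input splits and stores the output, and the framework sorts the mapper output by key between the two phases, which is what the reducer relies on.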
NetWare is a discontinued computer network operating system developed by Novell, Inc. It primarily used cooperative multitasking to run different services on a personal computer, using the IPX network protocol.
16. There are two ways in which DFS can be implemented:
Standalone DFS namespace
Domain-based DFS namespace
17. Standalone DFS namespace –
It allows only those DFS roots that exist on the local computer and do not use Active Directory. A standalone DFS can only be accessed on the computer on which it is created. It does not provide any fault tolerance and cannot be linked to any other DFS. Standalone DFS roots are rarely encountered because of their limited advantages.
Domain-based DFS namespace –
It stores the configuration of DFS in Active Directory, making the DFS namespace root accessible at \\<domainname>\<dfsroot> or \\<FQDN>\<dfsroot>.
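A small illustration of the two addressing styles (the domain, server, and root names are hypothetical): a domain-based namespace is addressed through the domain rather than through any single server, which is what lets the root outlive an individual machine.

```python
# Hypothetical names, for illustration only.
domain_name = "corp.example.com"
dfs_root = "public"

# Domain-based namespace: addressed via the domain, not one server.
print(rf"\\{domain_name}\{dfs_root}")  # \\corp.example.com\public

# Standalone namespace: tied to the single computer hosting the root.
print(rf"\\fileserver01\{dfs_root}")  # \\fileserver01\public
```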
19. DFS allows multiple users to access and store data.
It allows data to be shared remotely.
It improves file availability, access time, and network efficiency.
It improves the ability to change the size of the data and the ability to exchange data.
A Distributed File System provides transparency of data even if a server or disk fails.
20. In a Distributed File System, nodes and connections need to be secured, so security is a concern.
There is a possibility of losing messages and data in the network while they move from one node to another.
Database connections are complicated in a Distributed File System.
Handling a database is also harder in a Distributed File System than in a single-user system.
There is a chance of overloading if all nodes try to send data at once.