Distributed File System

The document discusses distributed file systems (DFS), which allow files to be shared across multiple computers. A DFS aims to provide the common view of a centralized file system while using a distributed implementation. Key aspects of a DFS include distributed storage, caching files locally for performance, ensuring cache consistency, and different naming schemes for distributed files.

Uploaded by

Defenders

Distributed File System

Distributed Storage
 Storage needs increase almost exponentially – widespread use of e-mail,
photos, videos, logs, …
 Can’t store everything on one large disk. If the disk fails, we lose
everything!
 Solution: Store the user’s information along with “some redundant
information” across many disks.
 If a disk fails, then you still have enough information in the surviving
disks. Bring in a new disk and replace the information lost by the failed
disk ASAP. 
 Simple? No. Today’s large data centers have so many disks that multiple
disk failures are more common! Permanent data loss becomes likely. 
This presentation is about these issues.
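
The idea of storing "some redundant information" and rebuilding a failed disk can be sketched with single XOR parity. This is only one possible redundancy scheme, chosen here for illustration; the slides do not name a scheme, and the disk contents and variable names below are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data disks, each holding one 4-byte block.
data_disks = [b"mail", b"logs", b"pics"]
parity_disk = xor_blocks(data_disks)   # the redundant information

# Disk 1 fails: rebuild its block from the surviving disks plus parity.
survivors = [data_disks[0], data_disks[2], parity_disk]
rebuilt = xor_blocks(survivors)

# Storage efficiency: fraction of raw capacity that holds user data.
efficiency = len(data_disks) / (len(data_disks) + 1)
```

Single parity recovers from one failed disk only. If a second disk fails before the rebuild finishes, the data is lost, which is why multiple simultaneous failures matter in large data centers.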

Distributed Storage: what we care about.
 Performance metrics:
 Storage efficiency: how much redundant information do you
store?
 Saturation throughput: how many I/O requests can the system
handle before it collapses (or delay increases to infinity)?
 Rebuild time: how fast can you replace information lost due to
disk failure?
 Mean time to data loss: under assumptions on failure and usage
models of the system, how long do you expect to run without any
permanent loss of data?
 Encoding/Decoding/Update/Rebuild complexity: the
computation power needed for all these operations; also, how
many bytes of data on how many disks do you have to update if
you just want to update 1 byte of user data?
 Sequential read/write bandwidth: bandwidth the system can
provide for streaming data

Distributed File Systems (DFS)
 A special case of distributed system
 Allows multi-computer systems to share files
 Even when no other IPC or RPC is needed
 Sharing devices
 Special case of sharing files
 E.g.,
 NFS (Sun’s Network File System)
 Windows NT, 2000, XP
 Andrew File System (AFS) & others …



Distributed File Systems (continued)
 One of the most common uses of distributed
computing
 Goal: provide common view of centralized
file system, but distributed implementation.
 Ability to open & update any file on any machine
on network
 All of the synchronization issues and capabilities of
shared local files



DFS Structure
 Service – software entity running on one or more
machines and providing a particular type of function
to a priori unknown clients
 Server – service software running on a single machine
 Client – process that can invoke a service using a set
of operations that forms its client interface
 A client interface for a file service is formed by a set of
primitive file operations (create, delete, read, write)
 Client interface of a DFS should be transparent, i.e.,
not distinguish between local and remote files

Naming of Distributed Files
 Naming – mapping between logical and physical objects.
 A transparent DFS hides the location where in the network the
file is stored.
 Location transparency – file name does not reveal the file’s
physical storage location.
 File name denotes a specific, hidden, set of physical disk blocks.
 Convenient way to share data.
 Could expose correspondence between component units and
machines.
 Location independence – file name does not need to be
changed when the file’s physical storage location changes.
 Better file abstraction.
 Promotes sharing the storage space itself.
 Separates the naming hierarchy from the storage-devices
hierarchy.

DFS – Three Naming Schemes
1. Mount remote directories to local directories
   1. Mounted remote directories can be accessed transparently.
   2. Unix/Linux with NFS; Windows with mapped drives
2. Files named by combination of host name and local name
   1. Guarantees a unique system-wide name
   2. Windows Network Places
3. Total integration of component file systems
   1. A single global name structure spans all the files in the system.
   2. If a server is unavailable, some arbitrary set of directories on
      different machines also becomes unavailable.
   3. Andrew File System



Mounting Remote Directories (NFS)



Mounting Remote Directories (continued)
 Note: names of files are not unique across machines
 As represented by path names, the same file can appear under a different path on each machine
 E.g.,
 Server A sees: /users/steen/mbox
 Client A sees: /remote/vu/mbox
 Client B sees: /work/me/mbox
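
The path example above can be sketched as a per-client mount table that translates a local path into a server path. This is a minimal illustration, not how an NFS client is implemented; the `resolve` function, the table layout, and the host name `server-a` are assumptions.

```python
def resolve(mount_table, local_path):
    """Map a client-local path to (server, server_path) via the longest matching mount point."""
    best = None
    for mount_point, (server, export) in mount_table.items():
        if local_path == mount_point or local_path.startswith(mount_point + "/"):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, server, export)
    if best is None:
        return ("local", local_path)   # not under any mount: an ordinary local file
    mount_point, server, export = best
    return (server, export + local_path[len(mount_point):])

# Each client mounts the same exported directory at a different local directory.
client_a = {"/remote/vu": ("server-a", "/users/steen")}
client_b = {"/work/me": ("server-a", "/users/steen")}

from_a = resolve(client_a, "/remote/vu/mbox")
from_b = resolve(client_b, "/work/me/mbox")
```

Both lookups end at the same server file even though the two clients use different path names for it.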



DFS – File Access Performance
 Reduce network traffic by retaining recently
accessed disk blocks in local cache
 Repeated accesses to the same information can be
handled locally.
 All accesses are performed on the cached copy.
 If needed data not already cached, copy of data
brought from the server to the local cache.
 Copies of parts of file may be scattered in different caches.
 Cache-consistency problem – keeping the cached
copies consistent with the master file.
 Especially on write operations
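
A client-side block cache along these lines might look like the sketch below. The least-recently-used eviction policy, the capacity of two blocks, and the `fake_server` function are assumptions made for illustration; the slides only say that recently accessed blocks are retained.

```python
from collections import OrderedDict

class BlockCache:
    """Client-side cache of (file, block number) -> data with LRU eviction."""

    def __init__(self, fetch_from_server, capacity=2):
        self.fetch = fetch_from_server
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.remote_fetches = 0

    def read(self, file_id, block_no):
        key = (file_id, block_no)
        if key in self.blocks:
            self.blocks.move_to_end(key)      # hit: served locally, no network traffic
            return self.blocks[key]
        data = self.fetch(file_id, block_no)  # miss: copy the block from the server
        self.remote_fetches += 1
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)   # evict the least recently used block
        return data

def fake_server(file_id, block_no):
    # Hypothetical stand-in for a remote read.
    return f"{file_id}:{block_no}".encode()

cache = BlockCache(fake_server)
for block in (0, 0, 1, 0, 2, 1):
    cache.read("report.txt", block)
```

Six reads cause four remote fetches here; the two repeated reads of block 0 are handled locally.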



Where to put File Caches
 In client memory
 Performance speed up; faster access
 Good when local usage is temporary
 Enables diskless workstations
 On client disk
 Good when local usage dominates (e.g., AFS)
 Caches larger files
 Helps protect clients from server crashes



Caching
 We can employ caching to improve system
performance. There are four places in a distributed
system where we can hold data:
 On the server's disk
 In a cache in the server's memory
 In the client's memory
 On the client's disk
 The first two places are not an issue since any interface to
the server can check the centralized cache. It is in the last
two places that problems arise and we have to consider the
issue of cache consistency.

Caching
 Several approaches may be taken:
 write-through
 What if another client reads its own cached copy? All accesses would require
checking with the server first (adds network congestion) or require the server
to maintain state on who has what files cached. Write-through also does not
alleviate congestion on writes.
 delayed writes
 Data can be buffered locally (where consistency suffers) but files can be
updated periodically. A single bulk write is far more efficient than lots of little
writes every time any file contents are modified. Unfortunately the semantics
become ambiguous.
 write on close
 This is admitting that the file system uses session semantics.
 centralized control
 The server keeps track of who has what open in which mode. We would have to
support a stateful system and deal with signaling traffic.

Cache Location: Disk vs. Main Memory
 Advantages of disk caches:
 More reliable
 Cached data kept on disk are still there during recovery and
don’t need to be fetched again
 Advantages of main memory caches:
 Permit workstations to be diskless
 Data can be accessed more quickly
 Performance speedup in bigger memories
 Server caches (used to speed up disk I/O) are in main
memory regardless of where user caches are located; using
main memory caches on the user machine permits a single
caching mechanism for servers and users

File Cache Update Policies
 When does the client update the master file?
 I.e., when is cached data written from the cache to the file?
 Write-through – write data through to disk ASAP
 I.e., following write() or put(), same as on local disks.
 Reliable, but poor performance.
 Delayed-write – data is cached and then written to the server later.
 Write operations complete quickly; some data may be
overwritten in cache, saving needless network I/O.
 Poor reliability
   unwritten data may be lost when client machine crashes
   Inconsistent data
 Variation – scan cache at regular intervals
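
The trade-off between the two policies can be sketched as follows. The class names, the `flush` method, and the single-block workload are hypothetical and exist only to make the difference countable.

```python
class Server:
    def __init__(self):
        self.blocks = {}
        self.writes_received = 0

    def write(self, key, data):
        self.blocks[key] = data
        self.writes_received += 1

class WriteThroughCache:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def write(self, key, data):
        self.cache[key] = data
        self.server.write(key, data)   # every write goes to the server immediately

class DelayedWriteCache:
    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.dirty = set()

    def write(self, key, data):
        self.cache[key] = data         # completes locally; server not contacted yet
        self.dirty.add(key)

    def flush(self):
        # Called later, e.g. by a periodic scan of the cache.
        for key in sorted(self.dirty):
            self.server.write(key, self.cache[key])
        self.dirty.clear()

through_server, delayed_server = Server(), Server()
through = WriteThroughCache(through_server)
delayed = DelayedWriteCache(delayed_server)
for version in (b"v1", b"v2", b"v3"):
    through.write("block0", version)
    delayed.write("block0", version)

unflushed = dict(delayed_server.blocks)  # what the server holds if the client crashed now
delayed.flush()
```

Write-through sends three writes; delayed-write sends one, because the first two versions were overwritten in the cache. Before the flush the server holds nothing, which is the reliability cost.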



DFS – File Consistency
 Is locally cached copy of the data consistent with the
master copy?
 Client-initiated approach
 Client initiates a validity check with server.
 Server verifies local data with the master copy
 E.g., time stamps, etc.
 Server-initiated approach
 Server records (parts of) files cached in each client.
 When server detects a potential inconsistency, it reacts
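
The client-initiated approach can be sketched as a validity check before each use of cached data. A per-file version counter stands in for the time stamp here, which is an assumption made to keep the example deterministic; all class and method names are hypothetical.

```python
class FileServer:
    def __init__(self):
        self.files = {}   # name -> (version, data)

    def store(self, name, data):
        version = self.files.get(name, (0, b""))[0] + 1
        self.files[name] = (version, data)

    def get_version(self, name):
        return self.files[name][0]

    def fetch(self, name):
        return self.files[name]

class ValidatingClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}   # name -> (version, data)
        self.refetches = 0

    def read(self, name):
        cached = self.cache.get(name)
        # Validity check: compare the cached version with the master copy's version.
        if cached is None or cached[0] != self.server.get_version(name):
            self.cache[name] = self.server.fetch(name)
            self.refetches += 1
        return self.cache[name][1]

server = FileServer()
server.store("notes", b"old")
client = ValidatingClient(server)
first = client.read("notes")
second = client.read("notes")    # still valid: served from the cache
server.store("notes", b"new")    # another writer updates the master copy
third = client.read("notes")     # check fails: cached copy is refetched
```

The check itself still costs a round trip on every read, which is the network overhead that the server-initiated approach tries to avoid.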



DFS – Remote Service vs. Caching
 Remote Service – all file actions implemented by
server/service.
 RPC functions
 Use for small memory diskless machines
 Particularly applicable if large amount of write activity
 Cached System
 Many “remote” accesses handled efficiently by the local
cache
 Servers contacted only occasionally
 Reduces server load and network traffic.
 Enhances potential for scalability.
 Reduces total network overhead



DFS – File Server Semantics
 Stateless Service
 Avoids state information in server by making each
request self-contained.
 Each request identifies the file and position in the
file.
 No need to establish and terminate a connection
by open and close operations.
 Poor support for locking or synchronization
among concurrent accesses



DFS – File Server Semantics (continued)
 Stateful Service
 Client opens a file (as in Unix & Windows).
 Server fetches information about file from disk, stores it in
server memory.
 Returns to client a connection identifier unique to client and
open file.
 Identifier used for subsequent accesses until session ends.
 Server must reclaim space used by no longer active clients.
 Increased performance; fewer disk accesses.
 Server retains knowledge about file
 E.g., read ahead next blocks for sequential access
 E.g., file locking for managing writes
 Windows



DFS – Server Semantics Comparison
 Failure Recovery: Stateful server loses all volatile
state in a crash.
 Restore state by recovery protocol based on a dialog with
clients.
 Server needs to be aware of crashed client processes
 orphan detection and elimination.
 Failure Recovery: Stateless server failure and
recovery are almost unnoticeable.
 Newly restarted server responds to self-contained requests
without difficulty.
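
The difference in failure recovery can be sketched with two toy servers. A crash is simulated by replacing the server object, which discards its volatile state; the file contents, class names, and session layout are hypothetical.

```python
FILES = {"mbox": b"hello distributed world"}   # stands in for data on the server's disk

class StatelessServer:
    def read(self, name, offset, count):
        # Self-contained request: the file and position are named every time.
        return FILES[name][offset:offset + count]

class StatefulServer:
    def __init__(self):
        self.sessions = {}   # volatile state: session id -> [file name, current offset]
        self.next_id = 1

    def open(self, name):
        session_id = self.next_id
        self.next_id += 1
        self.sessions[session_id] = [name, 0]
        return session_id

    def read(self, session_id, count):
        name, offset = self.sessions[session_id]   # KeyError if the state was lost
        self.sessions[session_id][1] = offset + count
        return FILES[name][offset:offset + count]

stateless = StatelessServer()
before = stateless.read("mbox", 0, 5)
stateless = StatelessServer()            # simulated crash and restart
after = stateless.read("mbox", 6, 11)    # client simply continues

stateful = StatefulServer()
sid = stateful.open("mbox")
stateful.read(sid, 5)
stateful = StatefulServer()              # simulated crash and restart
try:
    stateful.read(sid, 5)
    session_survived = True
except KeyError:
    session_survived = False             # a recovery protocol would be needed here
```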



Example Distributed File Systems
 NFS – Sun’s Network File System (ver. 3)

 NFS – Sun’s Network File System (ver. 4)

 AFS – the Andrew File System



NFS
 Sun Network File System (NFS) has become the
standard for distributed UNIX file access.
 NFS runs over LANs
 and even over WANs (slowly)
 Any system may be both a client and server

 Basic idea:
 Remote directory is mounted onto local directory
 Remote directory may contain mounted directories within



NFS v3 — A Stateless Service
 Server retains no knowledge of client
 Server crashes invisible to client
 All hard work done on client side
 Every operation provides file handle
 Server caching
 Performance only
 Based on recent usage
 Client caching
 Client checks validity of cached files
 Client responsible for writing out caches
 …



NFS v3 — A Stateless Service (continued)
 …
 No locking! No synchronization!
 Unix file semantics not guaranteed
 E.g., read after write
 Session semantics not even guaranteed
 E.g., open after close



NFS v3 — A Stateless Service (continued)
 Solution: global lock manager
 Separate from NFS
 Typical locking operations
 Lock – acquire lock (non-blocking)
 Lockt – test a lock
 Locku – unlock a lock
 Renew – renew lease on a lock
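
The four locking operations can be sketched as a small lease-based lock manager. This is a simplified illustration, not the actual lock protocol: the `LockManager` class, the 30-second lease, and the explicit `now` argument (used instead of a real clock so the example is deterministic) are all assumptions.

```python
class LockManager:
    def __init__(self, lease_seconds=30):
        self.lease_seconds = lease_seconds
        self.locks = {}   # file name -> (owner, lease expiry time)

    def _holder(self, name, now):
        entry = self.locks.get(name)
        if entry is not None and entry[1] <= now:
            del self.locks[name]   # lease expired: the lock is released implicitly
            return None
        return entry[0] if entry else None

    def lock(self, name, owner, now):
        """Non-blocking acquire: returns False at once if someone else holds the lock."""
        holder = self._holder(name, now)
        if holder is not None and holder != owner:
            return False
        self.locks[name] = (owner, now + self.lease_seconds)
        return True

    def lockt(self, name, now):
        """Test a lock: returns the current holder, or None."""
        return self._holder(name, now)

    def locku(self, name, owner, now):
        """Unlock: only the holder may release."""
        if self._holder(name, now) == owner:
            del self.locks[name]
            return True
        return False

    def renew(self, name, owner, now):
        """Extend the holder's lease."""
        if self._holder(name, now) == owner:
            self.locks[name] = (owner, now + self.lease_seconds)
            return True
        return False

mgr = LockManager(lease_seconds=30)
a_got_it = mgr.lock("mbox", "client-a", now=0)
b_got_it = mgr.lock("mbox", "client-b", now=10)       # refused, does not block
renewed = mgr.renew("mbox", "client-a", now=20)       # lease now runs until t=50
holder_at_40 = mgr.lockt("mbox", now=40)
b_after_expiry = mgr.lock("mbox", "client-b", now=60) # client-a stopped renewing
```

The lease means a crashed client cannot hold a lock forever: once it stops renewing, the lock becomes available again.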



NFS procedures
Procedure   Function
LOOKUP      Returns a file handle and attributes corresponding to a file name in a specified directory.
MKDIR       Create a directory.
RMDIR       Delete a directory.
READDIR     Read a directory. Used by the Unix ls command, for example.
RENAME      Rename a file.
REMOVE      Delete a file.
CREATE      Create a file.
READ        Read from a file, by specifying the file handle, starting offset and max. no. of bytes to read (up to 8192).
WRITE       Write to a file.
GETATTR     Returns the attributes of a file: type of file, permissions, size, owner, last-access time, and so on.
SETATTR     Set the attributes of a file: permissions, owner, group, size, and last-access and last-modification time.
LINK        Create a Unix hard link to a file.
SYMLINK     Create a symbolic link to a file.
READLINK    Returns the name of the file to which the symbolic link points.
STATFS      Returns the status of a file system. Used by the Unix df command, for example.

NFS Implementation
 Remote procedure calls for all operations
 Implemented in Sun ONC (Open Network Computing)
 Network communication is client-initiated
 RPC based on UDP (non-reliable protocol)
 Response to remote procedure call is
acknowledgement
 Lost requests are simply re-transmitted
 As many times as necessary to get a response!
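
Retransmission over an unreliable transport can be sketched as below. No real network is used: `LossyChannel` is a hypothetical stand-in for UDP that drops a fixed number of datagrams, and the `max_attempts` cap is an assumption added so the loop always terminates.

```python
class LossyChannel:
    """Stand-in for UDP: drops the first `drops` datagrams, then delivers."""

    def __init__(self, server, drops):
        self.server = server
        self.drops = drops

    def send(self, request):
        if self.drops > 0:
            self.drops -= 1
            return None              # request or reply lost; the caller sees a timeout
        return self.server(request)

def read_handler(request):
    # A self-contained read: file handle, offset and count travel with the request.
    op, handle, offset, count = request
    data = {"fh1": b"0123456789"}[handle]
    return data[offset:offset + count]

def call(channel, request, max_attempts=10):
    """Send the request again until a response arrives; the response is the acknowledgement."""
    for attempt in range(1, max_attempts + 1):
        reply = channel.send(request)
        if reply is not None:
            return reply, attempt
    raise TimeoutError("no response from server")

reply, attempts = call(LossyChannel(read_handler, drops=2), ("READ", "fh1", 2, 4))
```

Resending is safe in this example because a read with an explicit offset gives the same result however many times the server executes it.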

Summary NFS
 That was version 3 of NFS
 Stateless file system
 High performance, simple protocol
 Based on UDP

 Everything has changed in NFS version 4
 First published in 2000
 Clarifications published in 2003
 Almost complete rewrite of NFS



NFS Version 4
 Stateful file service
 Based on TCP – reliable transport protocol
 More ways to access server
 Compound requests
 I.e., multiple operations in a single RPC request
 More emphasis on security
 Mount protocol integrated with rest of NFS
protocol
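
A compound request can be sketched as a list of operations that the server runs in order within one round trip, stopping at the first failure. The handlers below are simplified stand-ins rather than the real NFSv4 operation set, and `run_compound`, `session`, and the status strings are hypothetical.

```python
def run_compound(operations, handlers):
    """Run operations in order; stop at the first failure and return the results so far."""
    results = []
    for name, args in operations:
        status, value = handlers[name](*args)
        results.append((name, status, value))
        if status != "OK":
            break
    return results

FILES = {"mbox": b"new mail"}
session = {"current_file": None}   # per-request context shared by the operations

def lookup(name):
    if name not in FILES:
        return ("NOENT", None)
    session["current_file"] = name
    return ("OK", None)

def read(offset, count):
    data = FILES[session["current_file"]]
    return ("OK", data[offset:offset + count])

HANDLERS = {"LOOKUP": lookup, "READ": read}

# One request carries both operations, so only one round trip is needed.
ok = run_compound([("LOOKUP", ("mbox",)), ("READ", (0, 3))], HANDLERS)
failed = run_compound([("LOOKUP", ("missing",)), ("READ", (0, 3))], HANDLERS)
```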



NFS Version 4



NFS Version 4 (continued)
 Additional RPC operations
 Long list for managing files, caches, validating versions, etc.
 Also security, permissions, etc.
 Also
 Open() and close().
 With a server crash, some information may have to be recovered



Distributed File Systems — Summary
 Performance is always an issue
 Tradeoff between performance and the semantics of file
operations (especially for shared files).
 Caching of file blocks is crucial in any file system,
distributed or otherwise.
 As memories get larger, most read requests can be serviced
out of file buffer cache (local memory).
 Maintaining coherency of those caches is a crucial design
issue.
 Current research addressing disconnected file
operation for mobile computers.



NFS v3 and v4 compared
NFSv3
 A collection of protocols (file access, mount, lock, status)
 Stateless
 UNIX-centric, but seen in Windows too
 Deployed with weak authentication
 32 bit numeric uids
 Ad-hoc caching
 UNIX permissions
 Works over UDP, TCP
 Needs a-priori agreement on character sets

NFSv4
 One protocol to a single port (2049)
 Lease-based state
 Supports UNIX and Windows file semantics
 Mandates strong authentication
 String-based identities
 Real caching handshake
 Windows-like access
 Bans UDP
 Uses a universal character set for file names

Andrew File System (AFS)
 Completely different kind of file system
 Developed at Carnegie Mellon University
(CMU) to support all student computing.
 Consists of workstation clients and dedicated
file server machines.



Andrew File System (AFS)
 Stateful
 Single name space
 A file has the same name everywhere in the world.
 Lots of local file caching
 On workstation disks
 For long periods of time
 Originally whole files, now 64K file chunks.
 Good for distant operation because of local disk
caching



AFS
 Once a file is cached, all operations are
performed locally.
 On close, if the file is modified, it is replaced
on the server.
 The client assumes that its cache is up to date!
 Server knows about all cached copies of file
 Callback messages from the server saying otherwise.
 …



AFS
 On file open()
 If client has received a callback for file, it must
fetch new copy
 Otherwise it uses its locally-cached copy.
 Server crashes
 Transparent to client if file is locally cached
 Server must contact clients to find state of files
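
The open, close and callback behaviour described for AFS can be sketched as follows. This toy model caches whole files and ignores chunks, disks and crash recovery; the class and method names are hypothetical and do not correspond to the real AFS interfaces.

```python
class AfsServer:
    def __init__(self):
        self.files = {}
        self.callbacks = {}   # file name -> set of clients holding a cached copy

    def fetch(self, client, name):
        self.callbacks.setdefault(name, set()).add(client)
        return self.files[name]

    def store(self, writer, name, data):
        self.files[name] = data
        # Tell every other client that its cached copy is no longer valid.
        for client in self.callbacks.get(name, set()) - {writer}:
            client.break_callback(name)
        self.callbacks[name] = {writer}

class AfsClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.valid = set()    # files whose cached copy is assumed up to date
        self.fetches = 0

    def break_callback(self, name):
        self.valid.discard(name)

    def open(self, name):
        if name not in self.valid:           # callback received (or never cached)
            self.cache[name] = self.server.fetch(self, name)
            self.valid.add(name)
            self.fetches += 1
        return self.cache[name]              # otherwise use the local copy

    def close(self, name, data):
        if data != self.cache.get(name):     # modified: replace the file on the server
            self.cache[name] = data
            self.server.store(self, name, data)

server = AfsServer()
server.files["paper"] = b"draft 1"
alice, bob = AfsClient(server), AfsClient(server)
alice.open("paper")
bob.open("paper")
bob.open("paper")                    # second open is served from bob's cache
fetches_before = bob.fetches
alice.close("paper", b"draft 2")     # server calls back to bob
latest = bob.open("paper")           # bob must fetch a new copy
```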



Network File Sharing
 NFS  AFS
 Low administrative  High administrative
overhead overhead
 Standard UNIX  “Enhanced” backup
backup /restore /restore
 Available for most OS  Limited OS availability
 Distributed  Central administration
administration  replaces standard
 Uses standard utilities utilities

Stateful or stateless design?
 A stateless system is one in which the client sends a
request to a server, the server carries it out, and returns
the result
 Between these requests, no client-specific information is
stored on the server
 A stateful system is one where information about client
connections is maintained on the server
 State may refer to any information that a server stores
about a client: whether a file is open, whether a file is
being modified, cached data on the client, etc.

Stateful or stateless design?
 In a stateless system:
 Each request must be complete — the file has to be fully identified
and any offsets specified.
 If a server crashes and then recovers, no state was lost about client
connections because there was no state to maintain. This creates a
higher degree of fault tolerance.
 No remote open/close calls are needed (they only serve to establish
state).
 There is no server memory devoted to storing per-client data.
 There is no limit on the number of open files on the server; they
aren't "open" since the server maintains no per-client state.
 There are no problems if the client crashes. The server does not
have any state to clean up.

Stateful or stateless design?
 In a stateful file system:
 Requests are shorter (there is less information to send).
 Cache coherence is possible; the server can know which clients are
caching which blocks of a file.
 With shorter requests and caching, one will generally see better
performance in processing the requests.
 File locking is possible; the server can keep state that a certain
client is locking a file (or portion thereof).
 Although the list of stateless advantages is longer, history shows us
that the clear winner is the stateful approach. The ability to
maintain better cache coherence, lock files, and know whether files
are open by remote clients are all incredibly compelling
advantages.
