Distributed File System
Distributed Storage
Storage needs increase almost exponentially – widespread use of e-mail,
photos, videos, logs, …
Can’t store everything on one large disk. If the disk fails, we lose
everything!
Solution: Store the user’s information along with “some redundant
information” across many disks.
If a disk fails, then you still have enough information in the surviving
disks. Bring in a new disk and replace the information lost by the failed
disk ASAP.
Simple? No. Today’s large data centers have so many disks that multiple
simultaneous disk failures are common, and permanent data loss becomes likely.
This presentation is about these issues.
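To make "some redundant information" concrete, here is a minimal sketch that assumes the simplest possible scheme: one XOR parity disk protecting three data disks. The disk contents and the helper name xor_blocks are illustrative only; real systems use larger blocks and often stronger codes.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data disks plus one parity disk (illustrative contents).
data_disks = [b"mail", b"foto", b"logs"]
parity = xor_blocks(data_disks)

# Disk 1 fails: rebuild its contents from the surviving disks and the parity.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)
```

With a single parity disk, any one failed disk can be rebuilt this way. Two simultaneous failures cannot, which is why multiple failures make permanent data loss likely.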
Distributed Storage: what we care about.
Performance metrics:
Storage efficiency: how much redundant information do you
store?
Saturation throughput: how many I/O requests can the system
handle before it collapses (or delay increases to infinity)?
Rebuild time: how fast can you replace information lost due to
disk failure?
Mean time to data loss: under assumptions on failure and usage
models of the system, how long do you expect to run without any
permanent loss of data?
Encoding/Decoding/Update/Rebuild complexity: the
computation power needed for all these operations; also, how
many bytes of data on how many disks do you have to update if
you just want to update 1 byte of user data?
Sequential read/write bandwidth: bandwidth the system can
provide for streaming data
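As a rough illustration of the first and fifth metrics, the sketch below compares 3-way replication with a 4+1 single-parity layout. Both function names are hypothetical. The update-cost formula assumes every redundant unit depends on the data unit being changed, which holds for these two simple schemes.

```python
def storage_efficiency(data_units, redundant_units):
    """Fraction of raw capacity that holds user data."""
    return data_units / (data_units + redundant_units)

def disks_touched_per_update(redundant_units):
    """Disks written when one data unit changes: the unit itself plus
    every redundant unit that depends on it (assumed: all of them)."""
    return 1 + redundant_units

# 3-way replication: 1 data copy + 2 redundant copies.
replication_efficiency = storage_efficiency(1, 2)
replication_update_cost = disks_touched_per_update(2)

# Single parity over 4 data disks: 4 data units + 1 parity unit.
parity_efficiency = storage_efficiency(4, 1)
parity_update_cost = disks_touched_per_update(1)
```

Replication stores more redundancy and touches more disks per update, but it survives two failures where single parity survives only one.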
Distributed File Systems (DFS)
A special case of distributed system
Allows multi-computer systems to share files
Even when no other IPC or RPC is needed
Sharing devices
Special case of sharing files
E.g.,
NFS (Sun’s Network File System)
Windows NT, 2000, XP
Andrew File System (AFS) & others …
Naming of Distributed Files
Naming – mapping between logical and physical objects.
A transparent DFS hides where in the network the file is stored.
Location transparency – file name does not reveal the file’s
physical storage location.
File name denotes a specific, hidden, set of physical disk blocks.
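The mapping from logical names to physical objects can be sketched as a lookup table. The NameService class, the server names, and the block numbers below are all hypothetical; the point is only that the client-visible name carries no location information.

```python
class NameService:
    """Maps a logical file name to its hidden physical location."""

    def __init__(self):
        self._table = {}  # logical name -> (server, disk blocks)

    def bind(self, name, server, blocks):
        self._table[name] = (server, tuple(blocks))

    def resolve(self, name):
        return self._table[name]

names = NameService()
names.bind("/home/alice/report.txt", "server-a", [17, 18, 42])
before = names.resolve("/home/alice/report.txt")

# The file is moved to another server; only the mapping changes.
names.bind("/home/alice/report.txt", "server-b", [5, 6, 7])
after = names.resolve("/home/alice/report.txt")
```

The client uses the same name before and after the move, so nothing in the name reveals where the blocks are stored.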
Caching
Several approaches may be taken:
write-through
What if another client reads its own cached copy? All accesses would require
checking with the server first (adds network congestion) or require the server
to maintain state on who has what files cached. Write-through also does not
alleviate congestion on writes.
delayed writes
Data can be buffered locally (where consistency suffers) but files can be
updated periodically. A single bulk write is far more efficient than lots of little
writes every time any file contents are modified. Unfortunately the semantics
become ambiguous.
write on close
This amounts to admitting that the file system uses session semantics: changes
become visible to other clients only when the file is closed.
centralized control
The server keeps track of who has what open in which mode. We would have to
support a stateful system and deal with signaling traffic.
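The difference between write-through and delayed writes can be sketched with a toy client cache. The Server and ClientCache classes and the policy strings are illustrative assumptions, not part of any real file system API.

```python
class Server:
    def __init__(self):
        self.files = {}
        self.writes_received = 0

    def write(self, name, data):
        self.files[name] = data
        self.writes_received += 1

class ClientCache:
    def __init__(self, server, policy):
        self.server = server
        self.policy = policy  # "write-through" or "delayed"
        self.cache = {}
        self.dirty = set()

    def write(self, name, data):
        self.cache[name] = data
        if self.policy == "write-through":
            self.server.write(name, data)  # every write goes to the server
        else:
            self.dirty.add(name)           # remember it for a later bulk write

    def flush(self):
        """Send one write per dirty file, e.g. periodically or on close."""
        for name in sorted(self.dirty):
            self.server.write(name, self.cache[name])
        self.dirty.clear()

wt_server, dw_server = Server(), Server()
wt = ClientCache(wt_server, "write-through")
dw = ClientCache(dw_server, "delayed")
for version in (b"v1", b"v2", b"v3"):
    wt.write("notes.txt", version)
    dw.write("notes.txt", version)

stale_before_flush = "notes.txt" not in dw_server.files
dw.flush()
```

Three small writes cost three server writes under write-through but only one under delayed writes. Until the flush, however, the server and every other client see no update at all.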
Cache Location: Disk vs. Main Memory
Advantages of disk caches:
More reliable
Cached data kept on disk are still there during recovery and
don’t need to be fetched again
Advantages of main memory caches:
Permit workstations to be diskless
Data can be accessed more quickly
Performance speedup in bigger memories
Server caches (used to speed up disk I/O) are in main
memory regardless of where user caches are located; using
main memory caches on the user machine permits a single
caching mechanism for servers and users
File Cache Update Policies
When does the client update the master file?
I.e. when is cached data written from the cache to the file?
Basic idea:
Remote directory is mounted onto local directory
Remote directory may contain mounted directories within
NFS Implementation
Remote procedure calls for all operations
Implemented in Sun ONC (Open Network Computing)
Network communication is client-initiated
RPC based on UDP (an unreliable protocol)
The response to a remote procedure call serves as the
acknowledgement
Lost requests are simply re-transmitted
As many times as necessary to get a response!
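The retransmission rule can be sketched without real sockets. In this sketch a LossyServer object stands in for a server reached over UDP, and a None reply models a timeout. The class, the request fields, and the file handle "fh1" are hypothetical and are not the actual NFS wire format.

```python
import itertools

class LossyServer:
    """Stand-in for a server behind UDP: the first `drops` exchanges are lost."""

    def __init__(self, drops):
        self.drops = drops
        self.blocks = {("fh1", 0): b"hello"}

    def handle(self, request):
        if self.drops > 0:
            self.drops -= 1
            return None  # request or reply was lost; the client sees a timeout
        return self.blocks[(request["handle"], request["offset"])]

def call_with_retry(server, request):
    """Resend the same request until a reply arrives; the reply is the ack."""
    for attempt in itertools.count(1):
        reply = server.handle(request)
        if reply is not None:
            return reply, attempt

server = LossyServer(drops=2)
reply, attempts = call_with_retry(server, {"handle": "fh1", "offset": 0})
```

Blind retransmission is safe here because the request is idempotent: repeating the same read returns the same data.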
Summary NFS
That was version 3 of NFS
Stateless file system
High performance, simple protocol
Based on UDP
Andrew File System (AFS)
Completely different kind of file system
Stateful or stateless design?
A stateless system is one in which the client sends a
request to a server, the server carries it out, and returns
the result
Between these requests, no client-specific information is
stored on the server
A stateful system is one where information about client
connections is maintained on the server
State may refer to any information that a server stores
about a client: whether a file is open, whether a file is
being modified, cached data on the client, etc.
In a stateless system:
Each request must be complete — the file has to be fully identified
and any offsets specified.
If a server crashes and then recovers, no state was lost about client
connections because there was no state to maintain. This creates a
higher degree of fault tolerance.
No remote open/close calls are needed (they only serve to establish
state).
There is no server memory devoted to storing per-client data.
There is no limit on the number of open files on the server; they
aren't "open" since the server maintains no per-client state.
There are no problems if the client crashes. The server does not
have any state to clean up.
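A minimal sketch of the stateless style, assuming a hypothetical StatelessFileServer whose only data is the stored files themselves. Every request names the file, the offset, and the byte count, so a freshly restarted server can answer it.

```python
class StatelessFileServer:
    def __init__(self, stored_files):
        self._files = stored_files  # file handle -> contents (persistent data)

    def read(self, handle, offset, count):
        """Each request is complete: no open call, no per-client record."""
        return self._files[handle][offset:offset + count]

disk = {"fh1": b"abcdefghij"}  # stands in for data that survives a crash

server = StatelessFileServer(disk)
first = server.read("fh1", 0, 4)

# The server crashes and restarts; only the on-disk data is carried over.
server = StatelessFileServer(disk)
second = server.read("fh1", 4, 4)
```

The client tracks its own offset, so the restart is invisible to it apart from a delay.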
In a stateful file system:
Requests are shorter (there is less information to send).
Cache coherence is possible; the server can know which clients are
caching which blocks of a file.
With shorter requests and caching, one will generally see better
performance in processing the requests.
File locking is possible; the server can keep state that a certain
client is locking a file (or portion thereof).
Although the list of stateless advantages is longer, history shows us
that the clear winner is the stateful approach. The ability to
maintain better cache coherence, lock files, and know whether files
are open by remote clients are all incredibly compelling
advantages.
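By contrast, a stateful server can be sketched as one that keeps an open-file table and a lock table. The StatefulFileServer class and its method names are hypothetical. Requests are shorter because the descriptor stands for the file and the current position.

```python
class StatefulFileServer:
    def __init__(self, stored_files):
        self._files = stored_files
        self._open = {}    # descriptor -> [client, file name, position]
        self._locks = {}   # file name -> client holding the lock
        self._next_fd = 0

    def open(self, client, name):
        fd = self._next_fd
        self._next_fd += 1
        self._open[fd] = [client, name, 0]
        return fd

    def read(self, fd, count):
        """Short request: the server remembers which file and what offset."""
        _, name, pos = self._open[fd]
        data = self._files[name][pos:pos + count]
        self._open[fd][2] = pos + len(data)
        return data

    def lock(self, client, name):
        """Grant the lock unless another client already holds it."""
        holder = self._locks.setdefault(name, client)
        return holder == client

server = StatefulFileServer({"f": b"abcdef"})
fd = server.open("client-A", "f")
first = server.read(fd, 3)
second = server.read(fd, 3)
a_got_lock = server.lock("client-A", "f")
b_got_lock = server.lock("client-B", "f")
```

All of the tables in this sketch live in server memory, so a server crash loses them. That is the fault-tolerance cost the stateless list points to.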