Distributed UNIT 5
UNIT V
File Models, File Accessing Models, File Sharing Semantics, File Caching
Schemes, File Replication, Atomic Transactions, Cryptography,
Authentication, Access control and Digital Signatures.
*****************
Introduction
Distributed file systems support the sharing of information in the
form of files and hardware resources. The goal of a distributed file
service is to enable programs to store and access remote files exactly
as they do local ones. File systems were originally developed for
centralized computer systems and desktop computers, where the file
system was an operating system facility providing a convenient
programming interface to disk storage.
File Models:
Ø Data-Caching Model
This model attempts to reduce the network traffic of the previous
model by caching the data obtained from the server node. This takes
advantage of the locality found in file accesses. A replacement policy
such as LRU (least recently used) is used to keep the cache size bounded.
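The LRU replacement policy mentioned above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the class name LRUBlockCache, the fetch_from_server callable, and the idea of caching fixed "blocks" keyed by number are assumptions made for the example.

```python
from collections import OrderedDict


class LRUBlockCache:
    """Client-side cache of file blocks with least-recently-used eviction."""

    def __init__(self, capacity, fetch_from_server):
        self.capacity = capacity
        # Hypothetical callable: block_id -> data (stands in for a network request).
        self.fetch_from_server = fetch_from_server
        self.blocks = OrderedDict()  # oldest entry first, newest last
        self.server_requests = 0

    def read(self, block_id):
        if block_id in self.blocks:
            # Cache hit: mark the block as most recently used.
            self.blocks.move_to_end(block_id)
            return self.blocks[block_id]
        # Cache miss: go to the server and remember the result.
        self.server_requests += 1
        data = self.fetch_from_server(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            # Evict the least recently used block to keep the cache bounded.
            self.blocks.popitem(last=False)
        return data


# A dict stands in for the file stored on the server node.
server_blocks = {n: f"block-{n}" for n in range(10)}
cache = LRUBlockCache(capacity=2, fetch_from_server=server_blocks.__getitem__)
for block_id in [0, 1, 0, 2, 0, 1]:
    cache.read(block_id)
```

With a capacity of two blocks, the repeated reads of block 0 are served locally, while block 1 is evicted when block 2 arrives and must be fetched again.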
UNIX Semantics:
This enforces an absolute time ordering on all operations and
ensures that every read operation on a file sees the effects of all
previous write operations performed on that file.
ü Cache location
ü Modification Propagation
ü Cache Validation
Cache Location:
This refers to the place where the cached data is stored. Assuming
that the original location of a file is on its server disk, there are three
possible cache locations in a distributed file system:
Ø Client Disk
In this case a cache hit costs one disk access. This is somewhat
slower than having the cache in server main memory. Having the cache
in server main memory is also simpler.
Advantages:
ü Provides reliability against crashes: data cached on disk survives a
crash, whereas modifications to cached data are lost if the cache is
kept in main memory.
ü Large storage capacity.
ü Contributes to scalability and reliability because on a cache hit the
access request can be serviced locally without the need to contact
the server.
Ø Client Main Memory
Advantages:
ü Maximum performance gain.
ü Permits workstations to be diskless.
ü Contributes to reliability and scalability.
Modification Propagation:
When the cache is located on client nodes, a file's data may
simultaneously be cached on multiple nodes. Caches can become
inconsistent when the file data is changed by one of the clients and
the corresponding data cached at other nodes is not changed or
discarded.
There are two design issues involved:
Write-Through Scheme
When a cache entry is modified, the new value is immediately
sent to the server for updating the master copy of the file.
Advantage:
High degree of reliability and suitability for UNIX-like semantics.
This is due to the fact that the risk of updated data getting lost in the
event of a client crash is very low since every modification is
immediately propagated to the server having the master copy.
Disadvantage:
This scheme is suitable only where the ratio of read-to-write
accesses is fairly large, because it does not reduce network traffic
for writes. Every write access has to wait until the data is written to
the master copy on the server. Hence the advantages of data caching
apply only to read accesses, because the server is involved in all
write accesses.
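The write-through behaviour can be illustrated with a small sketch. The names MasterServer and WriteThroughClient are hypothetical, and in-process method calls stand in for network messages; the counters only make the traffic pattern visible.

```python
class MasterServer:
    """Holds the master copy of each file and counts incoming messages."""

    def __init__(self):
        self.master = {}
        self.write_messages = 0
        self.read_messages = 0

    def write(self, name, data):
        self.write_messages += 1
        self.master[name] = data

    def read(self, name):
        self.read_messages += 1
        return self.master[name]


class WriteThroughClient:
    """Client cache that forwards every modification to the server at once."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def write(self, name, data):
        self.cache[name] = data
        # Write-through: the master copy is updated before the write returns.
        self.server.write(name, data)

    def read(self, name):
        if name not in self.cache:
            self.cache[name] = self.server.read(name)
        return self.cache[name]


server = MasterServer()
client = WriteThroughClient(server)
client.write("f", "v1")
client.write("f", "v2")
values = [client.read("f") for _ in range(3)]
```

Both writes reach the server, while all three reads are served from the client cache, which is why the scheme pays off only when reads dominate.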
Periodic write:
The cache is scanned periodically and any cached data that has
been modified since the last scan is sent to the server.
Write on close:
Modifications to cached data are sent to the server when the client
closes the file. This does not help much in reducing network traffic for
files that are open for very short periods or are rarely modified.
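Periodic write and write on close both postpone propagation to the server. A minimal sketch of the two, assuming a hypothetical DelayedWriteClient class, a plain dict standing in for the server's master copies, and a flush method that a timer would call in a real client:

```python
class DelayedWriteClient:
    """Client cache that batches modifications instead of sending each one."""

    def __init__(self, master):
        self.master = master      # dict standing in for the server's master copies
        self.cache = {}
        self.dirty = set()        # names modified since they were last sent
        self.messages_sent = 0

    def write(self, name, data):
        # Only the local cache is updated; nothing is sent yet.
        self.cache[name] = data
        self.dirty.add(name)

    def _send(self, name):
        self.master[name] = self.cache[name]
        self.messages_sent += 1
        self.dirty.discard(name)

    def flush(self):
        """Periodic write: send everything modified since the last scan."""
        for name in sorted(self.dirty):
            self._send(name)

    def close(self, name):
        """Write on close: send the file's modifications when it is closed."""
        if name in self.dirty:
            self._send(name)


master = {}
client = DelayedWriteClient(master)
for draft in ["draft 1", "draft 2", "draft 3"]:
    client.write("report.txt", draft)
before_close = dict(master)       # the server has seen nothing so far
client.close("report.txt")
client.write("notes.txt", "n1")
client.flush()
```

Three writes to report.txt cost a single message at close time. The price is that the modifications not yet sent would be lost if the client crashed before the close or the next scan.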
Client-initiated approach
The client contacts the server and checks whether its locally
cached data is consistent with the master copy. Two approaches may be
used:
Checking before every access:
This defeats the purpose of caching because the server needs to
be contacted on every access.
Periodic checking:
A check is initiated at fixed intervals of time.
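The two client-initiated variants can be sketched together. The source does not say how consistency is tested; the version numbers, the class names VersionedServer and ValidatingClient, and the explicit now argument (used instead of a real clock so the example is deterministic) are all assumptions of this sketch.

```python
class VersionedServer:
    """Master copies tagged with a version number that grows on every write."""

    def __init__(self):
        self.files = {}               # name -> (version, data)
        self.validation_requests = 0

    def write(self, name, data):
        version = self.files.get(name, (0, None))[0] + 1
        self.files[name] = (version, data)

    def version_of(self, name):
        self.validation_requests += 1
        return self.files[name][0]

    def fetch(self, name):
        return self.files[name]


class ValidatingClient:
    def __init__(self, server, check_interval):
        self.server = server
        # None means "check before every access"; a number means periodic checking.
        self.check_interval = check_interval
        self.cache = {}               # name -> (version, data, time of last check)

    def read(self, name, now):
        entry = self.cache.get(name)
        if entry is not None:
            version, data, checked_at = entry
            due = (self.check_interval is None
                   or now - checked_at >= self.check_interval)
            if not due:
                return data           # trusted without contacting the server
            if self.server.version_of(name) == version:
                self.cache[name] = (version, data, now)
                return data
        # Not cached, or the cached copy is out of date: fetch the master copy.
        version, data = self.server.fetch(name)
        self.cache[name] = (version, data, now)
        return data


srv = VersionedServer()
srv.write("f", "old")
reader = ValidatingClient(srv, check_interval=10)
r1 = reader.read("f", now=0)
srv.write("f", "new")                 # another client updates the file
r2 = reader.read("f", now=5)          # inside the interval: stale data is returned
r3 = reader.read("f", now=10)         # check is due: the new version is fetched
```

Periodic checking keeps server traffic low but, as the read at time 5 shows, a client may see stale data until the next check.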
Server-Initiated Approach:
A client informs the file server when opening a file, indicating
whether a file is being opened for reading, writing, or both. The file
server keeps a record of which client has which file open and in what
mode.
The server thus monitors the file usage modes of the different
clients and reacts whenever it detects a potential for inconsistency.
For example, if a file is open for reading, other clients may be allowed
to open it for reading, but opening it for writing cannot be allowed.
Likewise, a new client cannot open a file in any mode if the file is
already open for writing.
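The bookkeeping described above might look like the following sketch. The OpenFileTable class and the mode strings "r", "w" and "rw" are assumptions for illustration; a real server could also react in other ways, for example by queuing the request instead of refusing it.

```python
class OpenFileTable:
    """Server-side record of which client has which file open, and in what mode."""

    def __init__(self):
        self.open_modes = {}          # filename -> {client_id: mode}

    def open(self, client_id, filename, mode):
        holders = self.open_modes.setdefault(filename, {})
        others = [m for c, m in holders.items() if c != client_id]
        wants_write = "w" in mode
        someone_writes = any("w" in m for m in others)
        # Refuse if another client is writing, or if this client wants to
        # write while any other client has the file open.
        if someone_writes or (wants_write and others):
            return False
        holders[client_id] = mode
        return True

    def close(self, client_id, filename):
        self.open_modes.get(filename, {}).pop(client_id, None)


table = OpenFileTable()
results = [
    table.open("A", "f", "r"),        # first reader
    table.open("B", "f", "r"),        # second reader is allowed
    table.open("C", "f", "w"),        # writer refused while readers exist
]
table.close("A", "f")
table.close("B", "f")
results.append(table.open("C", "f", "w"))   # no readers left: writer allowed
results.append(table.open("A", "f", "r"))   # refused: file is open for writing
```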
A replicated file is a file that has multiple copies, with each copy on
a separate file server.
Advantages of Replication:
Increased Availability:
Alternate copies of replicated data can be used when the primary copy
is unavailable.
Increased Reliability:
Due to the presence of redundant data files in the system,
recovery from catastrophic failures (e.g. hard drive crash) becomes
possible.
Read-only replication:
In this case the update problem does not arise. This method is too
restrictive.
Read-Any-Write-All Protocol:
A read operation on a replicated file is performed by reading any
copy of the file and a write operation by writing to all copies of the file.
Before updating any copy, all copies need to be locked, then they are
updated, and finally the locks are released to complete the write.
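The lock, update, release sequence can be sketched with in-process objects standing in for file servers. The Replica class and the read_any and write_all functions are hypothetical names, and acquiring the locks in a fixed order is an added assumption to keep two concurrent writers from deadlocking.

```python
import random
import threading


class Replica:
    """One copy of the replicated file, held by one (simulated) file server."""

    def __init__(self, data=""):
        self.data = data
        self.lock = threading.Lock()


def read_any(replicas):
    # A read may use any single copy, because every copy is kept identical.
    replica = random.choice(replicas)
    with replica.lock:
        return replica.data


def write_all(replicas, data):
    # Lock every copy first. A fixed acquisition order is an assumption of
    # this sketch; it prevents deadlock between concurrent writers.
    ordered = sorted(replicas, key=id)
    for replica in ordered:
        replica.lock.acquire()
    try:
        for replica in ordered:
            replica.data = data       # update all copies
    finally:
        for replica in ordered:
            replica.lock.release()    # release the locks to complete the write


replicas = [Replica("v0") for _ in range(3)]
write_all(replicas, "v1")
value = read_any(replicas)
```

Reads are cheap because they touch one copy, but a write cannot proceed if even one replica is unreachable, which is the main cost of this protocol.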
Examples
Ø Withdrawing money from your account
Ø Making an airline reservation
Ø Making a credit‐card purchase
Ø Registering for a course at WPI
Usually used in the context of databases
Read: Read data from a file, table, etc., on behalf of the transaction.
Example:
Ø Planning a trip involving three flights
Ø Reservation for each flight “commits” individually
Ø Must be undone if entire trip cannot commit
Stable storage:
i.e., write to disk “atomically”.
Log File
i.e., record actions in a log before “committing” them.
Locking Protocols
Serialize Read and Write operations on the same data by separate
transactions.
Begin Transaction
Ø Place a begin entry in log
Write
Ø Write updated data to log
Abort Transaction
Ø Place abort entry in log
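The log operations listed above can be sketched as follows. The LoggedStore class is hypothetical, and a Python list stands in for the log file on stable storage. The commit step is not described in this section, so the commit method below is an assumption: it places a commit entry in the log and only then applies the transaction's logged writes.

```python
class LoggedStore:
    """Data store whose updates are recorded in a log before taking effect."""

    def __init__(self):
        self.log = []    # stands in for a log file on stable storage
        self.data = {}   # the real data, changed only when a transaction commits

    def begin(self, txn):
        self.log.append(("begin", txn))

    def write(self, txn, key, value):
        # The updated data goes to the log, not to the real data.
        self.log.append(("write", txn, key, value))

    def abort(self, txn):
        # Nothing to undo: the logged writes were never applied.
        self.log.append(("abort", txn))

    def commit(self, txn):
        # Assumption of this sketch: log the commit, then apply the writes.
        self.log.append(("commit", txn))
        for entry in self.log:
            if entry[0] == "write" and entry[1] == txn:
                self.data[entry[2]] = entry[3]


store = LoggedStore()

# Transaction 1: a trip whose later flight cannot be booked, so it aborts.
store.begin(1)
store.write(1, "flight-1", "booked")
store.write(1, "flight-2", "booked")
store.abort(1)

# Transaction 2: a single reservation that commits.
store.begin(2)
store.write(2, "seat", "12A")
store.commit(2)
```

Because the aborted transaction's updates existed only in the log, none of its partial bookings are visible in the data afterwards.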
Security Requirements
Ø Confidentiality
ü Protection from disclosure to unauthorized persons
Ø Integrity
ü Maintaining data consistency
Ø Authentication
ü Assurance of identity of person or originator of data
Ø Availability
ü Legitimate users have access when they need it
Ø Access control
ü Unauthorized users are kept out
Modern cryptography:
Hash Functions:
Ø Creates a fixed-length “fingerprint” for a message that is, in practice, unique
Ø Hash has to be protected in some way
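Both points can be illustrated with Python's standard hashlib and hmac modules. The choice of SHA-256, the sample messages, and the shared key are assumptions of this sketch. A keyed hash (HMAC) is shown as one possible way of protecting the hash; the source does not say which mechanism is intended.

```python
import hashlib
import hmac

message = b"transfer 100 to account 42"
altered = b"transfer 900 to account 42"

# A hash function maps a message to a short fixed-length fingerprint.
fingerprint = hashlib.sha256(message).hexdigest()
altered_fingerprint = hashlib.sha256(altered).hexdigest()

# An unprotected hash can simply be recomputed by whoever alters the message.
# Keying the hash with a secret is one way to protect it.
key = b"shared-secret"   # hypothetical key known only to sender and receiver
tag = hmac.new(key, message, hashlib.sha256).digest()


def verify(key, message, tag):
    """Recompute the keyed hash and compare it in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


ok = verify(key, message, tag)
tampered_ok = verify(key, altered, tag)
wrong_key_ok = verify(b"wrong-key", message, tag)
```

Changing a single character of the message gives a completely different fingerprint, and without the key an attacker cannot produce a matching tag for the altered message.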
******************
The following are the important questions from UNIT-V:
******************