DISTRIBUTEDFILESYS
11/21/00
Remote Files
File service vs. file server
University of Pennsylvania
File service interface: the specification of what the file system offers to its clients.
File server: a process that runs on some machine and helps implement the file service.
upload/download model
remote access model
Comparison between the two models
creating and deleting directories
Goals
1 Network transparency: users do not have to be aware of the location of files to access them
location transparency: the name of a file does not reveal any hint of the file's physical storage location.
  /server1/dir1/dir2/X
  server1 can be moved anywhere (e.g., from CIS to SEAS).
location independence: the name of a file does not need to be changed when the file's physical storage location changes.
  The above file X cannot be moved to server2 without a name change, even if server1 is full and server2 is not so full.
2 High availability: system failures or scheduled activities such as backups or the addition of nodes should not make files unavailable
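The difference between location transparency and location independence can be illustrated with a toy name server. This is only a sketch under stated assumptions: the class `NameServer`, its methods, and the `/export/...` paths are hypothetical names made up for illustration, not part of any real file system. The point is that a location-independent name such as `/dir1/dir2/X` survives a move from server1 to server2, because only the mapping changes.

```python
class NameServer:
    """Toy name server (hypothetical): maps a user-visible name to (server, path)."""

    def __init__(self):
        self._table = {}

    def register(self, name, server, path):
        self._table[name] = (server, path)

    def resolve(self, name):
        # Clients ask where a file lives; the name itself carries no location.
        return self._table[name]

    def migrate(self, name, new_server, new_path):
        # Only the mapping changes; the user-visible name stays the same.
        if name not in self._table:
            raise KeyError(name)
        self._table[name] = (new_server, new_path)


ns = NameServer()
ns.register("/dir1/dir2/X", "server1", "/export/dir1/dir2/X")
before = ns.resolve("/dir1/dir2/X")

# server1 is full, so the file is moved; clients keep using the same name.
ns.migrate("/dir1/dir2/X", "server2", "/export/dir1/dir2/X")
after = ns.resolve("/dir1/dir2/X")
```

By contrast, a name like `/server1/dir1/dir2/X` embeds the server, so moving the file to server2 would force every user of the name to change it.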
Architecture
Computation model
file servers -- machines dedicated to storing files and performing storage and retrieval operations (for high performance)
clients -- machines used for computational activities; may have a local disk for caching remote files
name server -- maps user-specified names to stored objects, files and directories
cache manager -- reduces network delay and disk delay
  problem: inconsistency
open, close, read, write, etc.
Design Issues
Naming and name resolution
Semantics of file sharing (Fig 13-4, Fig 13-5)
Stateless versus stateful servers (Fig 13-8)
Caching -- where to store files (Fig 13-9)
provide a single global directory: requires a unique file name for every file; location independent; cannot encompass heterogeneous environments and wide geographical areas
Caching
Four places to store files (Fig. 13-9)
server's disk: slow performance
server caching: in main memory
  cache management issues: how much to cache, replacement strategy
  still slow due to network delay
  used in high-performance web-search engine servers
client caching in main memory
  can be used by diskless workstations
  faster to access from main memory than from disk
  competes with the virtual memory system for physical memory space
  three options (Fig. 13-10)
client cache on a local disk
  large files can be cached
  virtual memory management is simpler
  a workstation can function even when it is disconnected from the network
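As a rough illustration of client caching in main memory, the sketch below keeps a bounded number of file blocks and goes to the server only on a miss. The slides mention a replacement strategy without naming one; least-recently-used (LRU) is assumed here purely for illustration. `BlockCache` and `fake_server_read` are hypothetical names, and the "server" is just a local function standing in for a remote read.

```python
from collections import OrderedDict


class BlockCache:
    """Toy client-side block cache with LRU replacement (an assumed policy)."""

    def __init__(self, capacity, fetch):
        self.capacity = capacity
        self.fetch = fetch            # callable(block_id) -> bytes; stands in for a remote read
        self.blocks = OrderedDict()   # least recently used block first
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)   # mark as most recently used
            self.hits += 1
            return self.blocks[block_id]
        self.misses += 1
        data = self.fetch(block_id)             # network round trip in a real system
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict the least recently used block
        return data


def fake_server_read(block_id):
    # Hypothetical stand-in for the file server.
    return f"block-{block_id}".encode()


cache = BlockCache(capacity=2, fetch=fake_server_read)
for b in [1, 2, 1, 3, 2]:
    cache.read(b)
```

With a capacity of two blocks, only the second read of block 1 is served locally; the capacity competes with everything else that wants physical memory, which is the trade-off noted above.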
2 total network overhead is lower for transmitting big chunks of data (caching) than for a series of responses to specific requests.
3 disk access can be optimized better for large requests than for random disk blocks
4 cache-consistency problem is the major drawback. If there are frequent writes, overhead due to the consistency problem is significant.
5 OS is simpler for remote service.
Cache Consistency
Reflecting changes to local cache to master copy
Reflecting changes to master copy to local caches
[Figure: Copy 1, write, master copy update, Copy 2]
if data is written and then deleted immediately, the data need not be written at all (20-30% of new data is deleted within 30 seconds)
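The saving from delaying writes can be sketched as follows. This is a minimal model, assuming a 30-second delay and an explicit logical clock passed in as `now`; `DelayedWriteCache` and its methods are hypothetical names, and a plain dict stands in for the master copies on the server.

```python
class DelayedWriteCache:
    """Toy delayed-write cache: dirty data reaches the server only after a delay."""

    def __init__(self, server, delay=30):
        self.server = server      # dict standing in for the master copies
        self.delay = delay        # seconds to hold dirty data (assumed value)
        self.dirty = {}           # name -> (data, time of last write)
        self.server_writes = 0

    def write(self, name, data, now):
        # The write is recorded locally; nothing is sent yet.
        self.dirty[name] = (data, now)

    def delete(self, name):
        # Unflushed data is simply dropped and never crosses the network.
        self.dirty.pop(name, None)
        self.server.pop(name, None)

    def flush(self, now):
        # Send every dirty entry that has been held for at least `delay` seconds.
        for name, (data, written_at) in list(self.dirty.items()):
            if now - written_at >= self.delay:
                self.server[name] = data
                self.server_writes += 1
                del self.dirty[name]


server = {}
cache = DelayedWriteCache(server)
cache.write("tmp", b"scratch", now=0)
cache.write("report", b"final", now=5)
cache.delete("tmp")        # deleted before the delay expires
cache.flush(now=40)
```

Only `report` is ever sent to the server. The cost of the delay is that data written in the last few seconds is lost if the client crashes before the flush.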
Cache Coherence
How to maintain consistency between locally cached data and the master data when the data has been modified by another client?
1 Client-initiated approach -- the client checks validity:
    on every access: too much overhead
    on first access to a file (e.g., file open)
    at every fixed time interval
2 Server-initiated approach -- the server records, for each client, the (parts of) files it caches. After the server detects a potential inconsistency, it reacts.
3 Do not allow caching when concurrent-write sharing occurs. Allow many readers. If a client opens a file for writing, inform all the clients to purge their cached data.
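The client-initiated approach with a validity check on file open can be sketched as below. Version numbers are assumed as the validity test (a timestamp would serve equally well); `Server`, `Client`, and their methods are hypothetical names used only for this illustration, and the write by "another client" is simulated by calling the server directly.

```python
class Server:
    """Toy file server holding the master copies with a version per file."""

    def __init__(self):
        self.files = {}   # name -> (version, data)

    def write(self, name, data):
        version = self.files.get(name, (0, None))[0] + 1
        self.files[name] = (version, data)

    def version(self, name):
        return self.files[name][0]

    def read(self, name):
        return self.files[name]


class Client:
    """Toy client that validates its cached copy each time a file is opened."""

    def __init__(self, server):
        self.server = server
        self.cache = {}    # name -> (version, data)
        self.fetches = 0

    def open(self, name):
        cached = self.cache.get(name)
        # Client-initiated check: compare the cached version with the master copy.
        if cached is None or cached[0] != self.server.version(name):
            self.cache[name] = self.server.read(name)
            self.fetches += 1
        return self.cache[name][1]


server = Server()
server.write("X", "v1 contents")
a = Client(server)
a.open("X")
a.open("X")                        # still valid, served from the cache
server.write("X", "v2 contents")   # modification made by another client
latest = a.open("X")               # stale copy detected and refetched
```

Every open still costs one small message to the server, which is why checking on every access is considered too expensive and why the server-initiated approach moves the bookkeeping to the server instead.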
In session semantics, a client closes a modified file.
In UNIX semantics, the server must be notified whenever a file is opened, and the intended mode (read or write mode) must be indicated for every open.
Disable caching when a file is opened in conflicting modes.
Replication
Reasons:
Increase reliability
Improve availability
Balance the servers' workload
how to make replication transparent (Fig. 13-12)
how to keep the replicas consistent
Problems -- mainly with updates
1 a replica is not updated because its server has failed
2 the network is partitioned
Replication Management:
1 weighted voting for reads and writes
2 a current synchronization site for each file group to control access
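Weighted voting for reads and writes can be sketched roughly as follows. The slides give no parameters, so the vote assignment (2, 1, 1) and the quorums R = 2 and W = 3 are assumptions chosen so that R + W exceeds the total number of votes; `Replica`, `gather`, `read`, and `write` are hypothetical names for this illustration only.

```python
class Replica:
    """Toy replica carrying a number of votes and a version-stamped copy."""

    def __init__(self, votes):
        self.votes = votes
        self.version = 0
        self.data = None
        self.up = True


def gather(replicas, quorum):
    # Collect reachable replicas until their votes add up to the quorum.
    chosen, votes = [], 0
    for r in replicas:
        if r.up:
            chosen.append(r)
            votes += r.votes
            if votes >= quorum:
                return chosen
    raise RuntimeError("quorum not reachable")


def read(replicas, r_quorum):
    # Any read quorum overlaps the last write quorum, so the highest
    # version found in the group is the current one.
    group = gather(replicas, r_quorum)
    return max(group, key=lambda r: r.version).data


def write(replicas, w_quorum, data):
    group = gather(replicas, w_quorum)
    version = max(r.version for r in group) + 1
    for r in group:
        r.version, r.data = version, data


replicas = [Replica(2), Replica(1), Replica(1)]
R, W = 2, 3                       # assumed quorums; R + W must exceed the total votes
total_votes = sum(r.votes for r in replicas)

write(replicas, W, "v1")          # reaches the first two replicas (2 + 1 votes)
replicas[0].up = False            # the heavily weighted server fails
value = read(replicas, R)         # the remaining two replicas still form a read quorum
```

With the first server down, reads still succeed but a write cannot collect three votes, so no update can slip past a replica that would later be read as current. The same rule handles a network partition: at most one side can hold a write quorum.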
Security