
Operating Systems: Internals and Design Principles: File Management


Operating Systems: Internals and Design Principles
Chapter 12: File Management
Eighth Edition
By William Stallings
Files
 Data collections created by users

 The File System is one of the most important parts of the OS to a user

 Desirable properties of files:

Long-term existence
• files are stored on disk or other secondary storage and do not disappear when a user logs off

Sharable between processes


• files have names and can have associated access permissions that permit controlled sharing

Structure
• files can be organized into hierarchical or more complex structure to reflect the relationships
among files
File Systems
 Provide a means to store data organized as files as well as a
collection of functions that can be performed on files

 Maintain a set of attributes associated with the file

 Typical operations include:


 Create
 Delete
 Open
 Close
 Read
 Write
File Structure

Four terms are commonly used when discussing files:
• Field
• Record
• File
• Database


Structure Terms
Field
• basic element of data
• contains a single value
• fixed or variable length
Record
• collection of related fields that can be treated as a unit by some application program
• fixed or variable length
File
• collection of similar records
• treated as a single entity
• may be referenced by name
• access control restrictions usually apply at the file level
Database
• collection of related data
• relationships among elements of data are explicit
• designed for use by a number of different applications
• consists of one or more types of files
File Management System
Objectives
 Meet the data management needs of the user
 Guarantee that the data in the file are valid
 Optimize performance
 Provide I/O support for a variety of storage device types
 Minimize the potential for lost or destroyed data
 Provide a standardized set of I/O interface routines to user
processes
 Provide I/O support for multiple users in the case of multiple-
user systems
Minimal User Requirements
Each user:
1. should be able to create, delete, read, write and modify files
2. may have controlled access to other users’ files
3. may control what type of accesses are allowed to the files
4. should be able to restructure the files in a form appropriate to the problem
5. should be able to move data between files
6. should be able to back up and recover files in case of damage
7. should be able to access his or her files by name rather than by numeric identifier
[Figure 12.1 File System Software Architecture. Layers from top to bottom: User Program; access methods (Pile, Sequential, Indexed Sequential, Indexed, Hashed); Logical I/O; Basic I/O Supervisor; Basic File System; Disk Device Driver and Tape Device Driver]


Device Drivers
 Lowest level
 Communicates directly with peripheral devices
 Responsible for starting I/O operations on a device
 Processes the completion of an I/O request
 Considered to be part of the operating system
Basic File System
• Also referred to as the physical I/O level
• Primary interface with the environment outside the computer system
• Deals with blocks of data that are exchanged with disk or tape systems
• Concerned with the placement of blocks on the secondary storage device
• Concerned with buffering blocks in main memory
• Considered part of the operating system


Basic I/O Supervisor
• Responsible for all file I/O initiation and termination
• Control structures that deal with device I/O, scheduling, and file status are maintained
• Selects the device on which I/O is to be performed
• Concerned with scheduling disk and tape accesses to optimize performance
• I/O buffers are assigned and secondary memory is allocated at this level
• Part of the operating system


Logical I/O
• Enables users and applications to access records
• Provides general-purpose record I/O capability
• Maintains basic data about file
Access Method
• Level of the file system closest to the user
• Provides a standard interface between applications and the file systems and devices that hold the data
• Different access methods reflect different file structures and different ways of accessing and processing the data
[Figure 12.2 Elements of File Management. User and program commands (operation, file name) pass through directory management, user access control, file structure, and the access method; file manipulation functions work on records; blocking maps records to physical blocks in main memory buffers; file I/O and disk scheduling move them to physical blocks in secondary storage (disk), supported by free storage management and file allocation. The figure marks which elements are file management concerns and which are operating system concerns.]


File Organization and Access
• File organization is the logical structuring of the records as determined by the way in which they are accessed
• In choosing a file organization, several criteria are important:
  • short access time
  • ease of update
  • economy of storage
  • simple maintenance
  • reliability
• Priority of criteria depends on the application that will use the file
File Organization Types
Five of the common file organizations are:
• The pile
• The sequential file
• The indexed sequential file
• The indexed file
• The direct, or hashed, file
[Figure 12.3 Common File Organizations.
(a) Pile File: variable-length records, variable set of fields, chronological order.
(b) Sequential File: fixed-length records, fixed set of fields in fixed order, sequential order based on key field.
(c) Indexed Sequential File: index levels 1 to n, main file, overflow file.
(d) Indexed File: exhaustive indexes and a partial index over a primary file of variable-length records.]


The Pile
• Least complicated form of file organization
• Data are collected in the order they arrive
• Each record consists of one burst of data
• Purpose is simply to accumulate the mass of data and save it
• Record access is by exhaustive search
[Figure 12.3(a) Pile File: variable-length records, variable set of fields, chronological order]
The Sequential File
• Most common form of file structure
• A fixed format is used for records
• Key field uniquely identifies the record
• Typically used in batch applications
• Only organization that is easily stored on tape as well as disk
[Figure 12.3(b) Sequential File: fixed-length records, fixed set of fields in fixed order, sequential order based on key field]

Indexed Sequential File
• Adds an index to the file to support random access
• Adds an overflow file
• Greatly reduces the time required to access a single record
• Multiple levels of indexing can be used to provide greater efficiency in access
[Figure 12.3(c) Indexed Sequential File: index levels 1 to n, main file, overflow file]
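A minimal sketch of the lookup path an indexed sequential file suggests: a sparse, single-level index narrows the search to a short stretch of the sorted main file, and the overflow file is checked last. The record layout, the `STRIDE` value, and all names here are assumptions made for illustration, not part of the slides.

```python
from bisect import bisect_right

# Hypothetical main file: (key, payload) records sorted by key.
main_file = [(k, f"record-{k}") for k in range(0, 1000, 5)]
# Hypothetical overflow file: later additions, kept unsorted.
overflow_file = [(12, "record-12"), (503, "record-503")]

STRIDE = 20  # assumed: one index entry per 20 main-file records
index_pos = list(range(0, len(main_file), STRIDE))
index_keys = [main_file[i][0] for i in index_pos]

def lookup(key):
    """Return the payload stored under key, or None if it is absent."""
    # Find the last index entry whose key is <= the search key.
    slot = bisect_right(index_keys, key) - 1
    if slot >= 0:
        start = index_pos[slot]
        # Scan sequentially inside the stretch the index entry covers.
        for k, payload in main_file[start:start + STRIDE]:
            if k == key:
                return payload
            if k > key:
                break
    # Not in the main file: fall back to the overflow file.
    for k, payload in overflow_file:
        if k == key:
            return payload
    return None
```

With 200 main-file records the index holds only 10 entries, so a lookup touches at most 20 main-file records instead of scanning all 200.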

Indexed File
• Records are accessed only through their indexes
• Variable-length records can be employed
• Exhaustive index contains one entry for every record in the main file
• Partial index contains entries to records where the field of interest exists
• Used mostly in applications where timeliness of information is critical
• Examples would be airline reservation systems and inventory control systems
[Figure 12.3(d) Indexed File: exhaustive indexes and a partial index over a primary file of variable-length records]
Direct or Hashed File
• Access directly any block of a known address
• Makes use of hashing on the key value
• Often used where:
  • very rapid access is required
  • fixed-length records are used
  • records are always accessed one at a time
• Examples are:
  • directories
  • pricing tables
  • schedules
  • name lists
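As a rough illustration of hashing on the key value, the sketch below maps a key to the address of a fixed-length record slot. The slot count, record size, the choice of CRC-32 as the hash, and the linear probing used for collisions are all assumptions of this example; the slides do not prescribe any of them.

```python
import zlib

RECORD_SIZE = 32   # assumed size of one fixed-length record, in bytes
NUM_SLOTS = 101    # assumed number of record slots in the file

slots = [None] * NUM_SLOTS  # in-memory stand-in for the file's slots

def slot_for(key):
    """Hash the key to its home slot number."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SLOTS

def byte_offset(key):
    """Byte address of the key's home slot within the file."""
    return slot_for(key) * RECORD_SIZE

def put(key, value):
    """Store a record, probing linearly past occupied slots."""
    i = slot_for(key)
    for _ in range(NUM_SLOTS):
        if slots[i] is None or slots[i][0] == key:
            slots[i] = (key, value)
            return i
        i = (i + 1) % NUM_SLOTS
    raise RuntimeError("file is full")

def get(key):
    """Fetch a record by key, or None if it was never stored."""
    i = slot_for(key)
    for _ in range(NUM_SLOTS):
        if slots[i] is None:
            return None
        if slots[i][0] == key:
            return slots[i][1]
        i = (i + 1) % NUM_SLOTS
    return None
```

A lookup computes one address and usually reads one slot, which is why this organization suits single-record access such as a pricing table.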
B-Trees
 A balanced tree structure with all branches of equal length

 Standard method of organizing indexes for databases

 Commonly used in OS file systems

 Provides for efficient searching, adding, and deleting of items


[Figure 12.4 A B-tree Node with k Children: keys Key1, Key2, ..., Key(k-1) separate subtrees Subtree1, Subtree2, Subtree3, ..., Subtree(k-1), Subtree(k)]

B-Tree Characteristics
A B-tree is characterized by its minimum degree d and satisfies the following properties:
• every node has at most 2d - 1 keys and 2d children or, equivalently, 2d pointers
• every node, except for the root, has at least d - 1 keys and d pointers; as a result, each internal node, except the root, is at least half full and has at least d children
• the root has at least 1 key and 2 children
• all leaves appear on the same level and contain no information. This is a logical construct to terminate the tree; the actual implementation may differ.
• a nonleaf node with k pointers contains k - 1 keys
[Figure 12.5 Inserting Nodes into a B-tree
(a) B-tree of minimum degree d = 3. Root keys: 23 51 61 71. Keys at the next level: 2 10 30 32 39 43 44 52 59 60 67 68 73 85 88 96.
(b) Key = 90 inserted. This is a simple insertion into a node. Root keys: 23 51 61 71. Next level: 2 10 30 32 39 43 44 52 59 60 67 68 73 85 88 90 96.
(c) Key = 45 inserted. This requires splitting a node into two parts and promoting one key to the root node. Root keys: 23 39 51 61 71. Next level: 2 10 30 32 43 44 45 52 59 60 67 68 73 85 88 90 96.
(d) Key = 84 inserted. This requires splitting a node into two parts and promoting one key to the root node. This then requires the root node to be split and a new root created. Root key: 51. Second level: 23 39 61 71 88. Third level: 2 10 30 32 43 44 45 52 59 60 67 68 73 84 85 90 96.]
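The key-count properties and the search path can be sketched on the tree of Figure 12.5(a). The dictionary layout is hypothetical, and the grouping of the second-level keys into five nodes is inferred from the root keys rather than read off the figure, so treat it as an assumption. The empty `children` lists stand in for the information-free leaves that terminate the tree.

```python
D = 3  # minimum degree, as in Figure 12.5

# Tree of Figure 12.5(a); the node grouping below is an assumption.
root = {
    "keys": [23, 51, 61, 71],
    "children": [
        {"keys": [2, 10], "children": []},
        {"keys": [30, 32, 39, 43, 44], "children": []},
        {"keys": [52, 59, 60], "children": []},
        {"keys": [67, 68], "children": []},
        {"keys": [73, 85, 88, 96], "children": []},
    ],
}

def search(node, key):
    """Return True if key is stored in the subtree rooted at node."""
    i = 0
    while i < len(node["keys"]) and key > node["keys"][i]:
        i += 1
    if i < len(node["keys"]) and node["keys"][i] == key:
        return True
    if not node["children"]:
        return False
    return search(node["children"][i], key)

def check(node, is_root=True):
    """Check the key-count properties for every node in the subtree."""
    n = len(node["keys"])
    if n > 2 * D - 1:                      # at most 2d - 1 keys
        return False
    if n < (1 if is_root else D - 1):      # root >= 1 key, others >= d - 1
        return False
    kids = node["children"]
    if kids and len(kids) != n + 1:        # k pointers go with k - 1 keys
        return False
    return all(check(child, is_root=False) for child in kids)
```

The second child already holds 2d - 1 = 5 keys, which is consistent with the split that Figure 12.5(c) shows when key 45 arrives.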


Table 12.1 Information Elements of a File Directory
(Table can be found on page 537 in textbook)

Basic Information
• File Name: Name as chosen by creator (user or program). Must be unique within a specific directory.
• File Type: For example: text, binary, load module, etc.
• File Organization: For systems that support different organizations

Address Information
• Volume: Indicates device on which file is stored
• Starting Address: Starting physical address on secondary storage (e.g., cylinder, track, and block number on disk)
• Size Used: Current size of the file in bytes, words, or blocks
• Size Allocated: The maximum size of the file

Access Control Information
• Owner: User who is assigned control of this file. The owner may be able to grant/deny access to other users and to change these privileges.
• Access Information: A simple version of this element would include the user's name and password for each authorized user.
• Permitted Actions: Controls reading, writing, executing, transmitting over a network

Usage Information
• Date Created: When file was first placed in directory
• Identity of Creator: Usually but not necessarily the current owner
• Date Last Read Access: Date of the last time a record was read
• Identity of Last Reader: User who did the reading
• Date Last Modified: Date of the last update, insertion, or deletion
• Identity of Last Modifier: User who did the modifying
• Date of Last Backup: Date of the last time the file was backed up on another storage medium
• Current Usage: Information about current activity on the file, such as process or processes that have the file open, whether it is locked by a process, and whether the file has been updated in main memory but not yet on disk
Operations Performed on a Directory
• To understand the requirements for a file structure, it is helpful to consider the types of operations that may be performed on the directory:
  • Search
  • Create files
  • Delete files
  • List directory
  • Update directory
Two-Level Scheme
• There is one directory for each user and a master directory
• Master directory has an entry for each user directory providing address and access control information
• Each user directory is a simple list of the files of that user
• Names must be unique only within the collection of files of a single user
• File system can easily enforce access restriction on directories
[Figure 12.6 Tree-Structured Directory: a master directory at the root, with subdirectories and files beneath it; each subdirectory may in turn hold further subdirectories and files]


[Figure 12.7 Example of Tree-Structured Directory: the master directory holds the entries System, User_A, User_B, and User_C. Directory "User_B" holds Draw and Word. Directory "Draw" holds a file ABC; directory "Word" holds Unit_A; directory "Unit_A" holds another file ABC. The two files are distinguished by their pathnames, /User_B/Draw/ABC and /User_B/Word/Unit_A/ABC.]


File Sharing
Two issues arise when allowing files to be shared among a number of users:
• management of access rights
• simultaneous access
Access Rights
• None
  • the user would not be allowed to read the user directory that includes the file
• Knowledge
  • the user can determine that the file exists and who its owner is and can then petition the owner for additional access rights
• Execution
  • the user can load and execute a program but cannot copy it
• Reading
  • the user can read the file for any purpose, including copying and execution
• Appending
  • the user can add data to the file but cannot modify or delete any of the file’s contents
• Updating
  • the user can modify, delete, and add to the file’s data
• Changing protection
  • the user can change the access rights granted to other users
• Deletion
  • the user can delete the file from the file system
User Access Rights
Owner
• usually the initial creator of the file
• has full rights
• may grant rights to others
Specific Users
• individual users who are designated by user ID
User Groups
• a set of users who are not individually defined
All
• all users who have access to this system
• these are public files
Record Blocking
• Blocks are the unit of I/O with secondary storage
  • for I/O to be performed, records must be organized as blocks
• Given the size of a block, three methods of blocking can be used:
1) Fixed-Length Blocking: fixed-length records are used, and an integral number of records are stored in a block
   Internal fragmentation: unused space at the end of each block
2) Variable-Length Spanned Blocking: variable-length records are used and are packed into blocks with no unused space
3) Variable-Length Unspanned Blocking: variable-length records are used, but spanning is not employed
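For fixed-length blocking, the number of records per block and the internal fragmentation follow from simple integer arithmetic. The sketch below shows the calculation; the function names and the sample sizes (4096-byte blocks, 300-byte records) are assumptions chosen for illustration.

```python
def fixed_blocking(block_size, record_size):
    """Return (records per block, unused bytes at the end of each block)."""
    per_block = block_size // record_size
    wasted = block_size - per_block * record_size
    return per_block, wasted

def blocks_needed(num_records, block_size, record_size):
    """Number of blocks required to hold num_records fixed-length records."""
    per_block, _ = fixed_blocking(block_size, record_size)
    if per_block == 0:
        raise ValueError("a record does not fit in one block")
    return -(-num_records // per_block)  # ceiling division
```

With the assumed sizes, each block holds 13 records and leaves 196 bytes unused, so 1000 records occupy 77 blocks.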
[Figure 12.8 Record Blocking Methods [WIED87]: two-track layouts of records R1, R2, ... for fixed blocking, variable blocking (spanned), and variable blocking (unspanned). Legend: data; gaps due to hardware design; waste due to block fit to track size; waste due to record fit to block size; waste due to block size constraint from fixed record size]


File Allocation
• On secondary storage, a file consists of a collection of blocks
• The operating system or file management system is responsible for allocating blocks to files
• The approach taken for file allocation may influence the approach taken for free space management
• Space is allocated to a file as one or more portions (a portion is a contiguous set of allocated blocks)
• File allocation table (FAT)
  • data structure used to keep track of the portions assigned to a file
Preallocation vs Dynamic Allocation
• A preallocation policy requires that the maximum size of a file be declared at the time of the file creation request
• For many applications it is difficult to estimate reliably the maximum potential size of the file
  • tends to be wasteful because users and application programmers tend to overestimate size
• Dynamic allocation allocates space to a file in portions as needed


Portion Size
 In choosing a portion size there is a trade-off between efficiency from
the point of view of a single file versus overall system efficiency

 Items to be considered:
1) contiguity of space increases performance, especially for
Retrieve_Next operations, and greatly for transactions
running in a transaction-oriented operating system
2) having a large number of small portions increases the size of
tables needed to manage the allocation information
3) having fixed-size portions simplifies the reallocation of space
4) having variable-size or small fixed-size portions minimizes
waste of unused storage due to overallocation
Alternatives
Two major alternatives:
Variable, large contiguous portions
• provides better performance
• the variable size avoids waste
• the file allocation tables are small
Blocks
• small fixed portions provide greater flexibility
• they may require large tables or complex structures for their allocation
• contiguity has been abandoned as a primary goal
• blocks are allocated as needed
Table 12.2 File Allocation Methods

                                    Contiguous   Chained        Indexed        Indexed
                                                                (blocks)       (variable)
Preallocation?                      Necessary    Possible       Possible       Possible
Fixed or variable size portions?    Variable     Fixed blocks   Fixed blocks   Variable
Portion size                        Large        Small          Small          Medium
Allocation frequency                Once         Low to high    High           Low
Time to allocate                    Medium       Long           Short          Medium
File allocation table size          One entry    One entry      Large          Medium
[Figure 12.9 Contiguous File Allocation: a disk of blocks numbered 0 to 34, with Files A to E each occupying one contiguous run of blocks]

File Allocation Table
File Name   Start Block   Length
File A      2             3
File B      9             5
File C      18            8
File D      30            2
File E      26            3
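With contiguous allocation, one table entry per file is enough to locate any block: the physical block is the start block plus the logical offset. A minimal sketch using the table values of Figure 12.9; the dictionary layout and function name are hypothetical.

```python
# File allocation table of Figure 12.9: name -> (start block, length).
fat = {
    "File A": (2, 3),
    "File B": (9, 5),
    "File C": (18, 8),
    "File D": (30, 2),
    "File E": (26, 3),
}

def physical_block(name, logical_block):
    """Map a file's logical block number to its disk block number."""
    start, length = fat[name]
    if not 0 <= logical_block < length:
        raise IndexError("logical block outside the file")
    return start + logical_block
```

No pointer chasing is involved, which is why contiguous allocation performs well for sequential reads.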


[Figure 12.10 Contiguous File Allocation (After Compaction): the same disk of blocks 0 to 34, with Files A to E packed together from block 0]

File Allocation Table
File Name   Start Block   Length
File A      0             3
File B      3             5
File C      8             8
File D      19            2
File E      16            3


[Figure 12.11 Chained Allocation: a disk of blocks 0 to 34 on which the blocks of File B are scattered and linked by pointers]

File Allocation Table
File Name   Start Block   Length
File B      1             5


[Figure 12.12 Chained Allocation (After Consolidation): the same disk, with the blocks of File B moved to adjacent positions starting at block 0]

File Allocation Table
File Name   Start Block   Length
File B      0             5


[Figure 12.13 Indexed Allocation with Block Portions: a disk of blocks 0 to 34; block 24 is the index block for File B and lists the data blocks 1, 8, 3, 14, 28]

File Allocation Table
File Name   Index Block
File B      24
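With indexed allocation the table entry names an index block, and the index block lists the data blocks in file order. A small sketch with the values shown in Figure 12.13; the two dictionaries are a hypothetical in-memory stand-in for the on-disk structures.

```python
# Figure 12.13: File B's table entry points at index block 24.
fat = {"File B": 24}
# Contents of each index block: data block numbers in file order.
index_blocks = {24: [1, 8, 3, 14, 28]}

def physical_block(name, logical_block):
    """Map a logical block number to a disk block via the index block."""
    index = index_blocks[fat[name]]
    if not 0 <= logical_block < len(index):
        raise IndexError("logical block outside the file")
    return index[logical_block]
```

Any block is reachable with one extra read of the index block, regardless of where the data blocks sit on disk.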


[Figure 12.14 Indexed Allocation with Variable-Length Portions: a disk of blocks 0 to 34; block 24 is the index block for File B and lists portions as (start block, length) pairs]

File Allocation Table
File Name   Index Block
File B      24

Index block 24
Start Block   Length
1             3
28            4
14            1
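When the index holds variable-length portions, locating a logical block means walking the (start, length) pairs until the offset falls inside one of them. A minimal sketch with the portions shown in Figure 12.14; the list name and function name are hypothetical.

```python
# Index block contents from Figure 12.14: (start block, length) per portion.
portions = [(1, 3), (28, 4), (14, 1)]

def physical_block(logical_block):
    """Map a logical block number to a disk block by walking the portions."""
    if logical_block < 0:
        raise IndexError("logical block outside the file")
    remaining = logical_block
    for start, length in portions:
        if remaining < length:
            return start + remaining
        remaining -= length
    raise IndexError("logical block outside the file")
```

Three index entries describe eight blocks here, which illustrates why variable-length portions keep the index smaller than one entry per block.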


Free Space Management
• Just as allocated space must be managed, so must the unallocated space
• To perform file allocation, it is necessary to know which blocks are available
• A disk allocation table is needed in addition to a file allocation table


Bit Tables
• This method uses a vector containing one bit for each block on the disk
• Each entry of a 0 corresponds to a free block, and each 1 corresponds to a block in use
Advantages:
• works well with any file allocation method
• it is as small as possible
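A bit table can be sketched as a byte array with one bit per block. The class below is a hypothetical illustration (its name, its methods, and the first-fit search are assumptions), showing how a run of free blocks is found and marked.

```python
class BitTable:
    """One bit per disk block: 0 = free, 1 = in use."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)

    def in_use(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def _set(self, block, used):
        mask = 1 << (block % 8)
        if used:
            self.bits[block // 8] |= mask
        else:
            self.bits[block // 8] &= ~mask & 0xFF

    def allocate_contiguous(self, count):
        """Mark the first run of `count` free blocks as used; return its start."""
        run = 0
        for block in range(self.num_blocks):
            run = 0 if self.in_use(block) else run + 1
            if run == count:
                start = block - count + 1
                for b in range(start, block + 1):
                    self._set(b, True)
                return start
        raise MemoryError("no run of free blocks of that length")

    def free(self, start, count):
        """Mark `count` blocks beginning at `start` as free again."""
        for b in range(start, start + count):
            self._set(b, False)
```

Because a run of zeros is easy to find, the same table serves contiguous, chained, and indexed allocation; a 35-block disk needs only 5 bytes.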
Chained Free Portions
• The free portions may be chained together by using a pointer and length value in each free portion
• Negligible space overhead because there is no need for a disk allocation table
• Suited to all file allocation methods
Disadvantages:
• leads to fragmentation
• every time you allocate a block you need to read the block first to recover the pointer to the new first free block before writing data to that block
Indexing
• Treats free space as a file and uses an index table as it would for file allocation
• For efficiency, the index should be on the basis of variable-size portions rather than blocks
• This approach provides efficient support for all of the file allocation methods
Free Block List
• Each block is assigned a number sequentially
  • the list of the numbers of all free blocks is maintained in a reserved portion of the disk
• Depending on the size of the disk, either 24 or 32 bits will be needed to store a single block number
  • the size of the free block list is 24 or 32 times the size of the corresponding bit table and must be stored on disk
• There are two effective techniques for storing a small part of the free block list in main memory:
  • the list can be treated as a push-down stack with the first few thousand elements of the stack kept in main memory
  • the list can be treated as a FIFO queue, with a few thousand entries from both the head and the tail of the queue in main memory
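The size comparison between a bit table and a free block list can be checked with a few lines of arithmetic. The disk geometry below (a 16-Gbyte disk with 512-byte blocks, all of them free) is an assumed example, and the function names are hypothetical.

```python
def bit_table_bytes(num_blocks):
    """Bit table: one bit per block on the disk."""
    return (num_blocks + 7) // 8

def free_list_bytes(num_free_blocks, bits_per_entry=32):
    """Free block list: one block number per free block."""
    return num_free_blocks * bits_per_entry // 8

# Assumed example: 16-Gbyte disk, 512-byte blocks, every block free.
num_blocks = (16 * 2**30) // 512          # 2**25 blocks
table = bit_table_bytes(num_blocks)       # 4 Mbytes
list32 = free_list_bytes(num_blocks, 32)  # 128 Mbytes
list24 = free_list_bytes(num_blocks, 24)  # 96 Mbytes
```

The worst-case list is 32 (or 24) times the bit table, which is why only a small part of it is kept in main memory.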
Volumes
• A collection of addressable sectors in secondary memory that an OS or application can use for data storage
• The sectors in a volume need not be consecutive on a physical storage device
  • they need only appear that way to the OS or application
• A volume may be the result of assembling and merging smaller volumes
UNIX File Management
• In the UNIX file system, six types of files are distinguished:
Regular, or ordinary
• contains arbitrary data in zero or more data blocks

Directory
• contains a list of file names plus pointers to associated inodes

Special
• contains no data but provides a mechanism to map physical devices to file names

Named pipes
• an interprocess communications facility

Links
• an alternative file name for an existing file

Symbolic links
• a data file that contains the name of the file it is linked to
Inodes
• All types of UNIX files are administered by the OS by means of inodes
• An inode (index node) is a control structure that contains the key information needed by the operating system for a particular file
• Several file names may be associated with a single inode
  • an active inode is associated with exactly one file
  • each file is controlled by exactly one inode
[Figure 12.15 Structure of FreeBSD inode and File. Inode fields shown: mode, owners (2), timestamps (4), size, direct pointers (labeled direct(0), direct(1), direct(12) in the figure), single indirect, double indirect, triple indirect, block count, reference count, flags (2), generation number, blocksize, extended attr size, extended attribute blocks. Direct pointers lead to data blocks; the indirect pointers lead to one, two, or three levels of pointer blocks that in turn lead to data blocks.]


File Allocation
• File allocation is done on a block basis
• Allocation is dynamic, as needed, rather than using preallocation
• An indexed method is used to keep track of each file, with part of the index stored in the inode for the file
• In all UNIX implementations the inode includes a number of direct pointers and three indirect pointers (single, double, triple)
Table 12.3 Capacity of a FreeBSD File with 4 kByte Block Size

Level             Number of Blocks     Number of Bytes
Direct            12                   48K
Single Indirect   512                  2M
Double Indirect   512 × 512 = 256K     1G
Triple Indirect   512 × 256K = 128M    512G
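The figures in Table 12.3 can be reproduced from the block size and the number of pointers that fit in one block. The 8-byte pointer size below is an assumption; it is the value that yields the 512 pointers per block used in the table.

```python
BLOCK_SIZE = 4096                             # 4-Kbyte blocks, as in Table 12.3
POINTER_SIZE = 8                              # assumed: 8-byte block pointers
PTRS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE   # 512 pointers per block
NUM_DIRECT = 12                               # direct pointers in the inode

# Number of data blocks reachable at each level of the index.
blocks_per_level = {
    "direct": NUM_DIRECT,
    "single indirect": PTRS_PER_BLOCK,
    "double indirect": PTRS_PER_BLOCK ** 2,
    "triple indirect": PTRS_PER_BLOCK ** 3,
}

# Bytes addressable at each level, and the resulting maximum file size.
capacity_bytes = {name: n * BLOCK_SIZE for name, n in blocks_per_level.items()}
max_file_bytes = sum(capacity_bytes.values())
```

The triple indirect level dominates: it alone accounts for 512 Gbytes, while the twelve direct pointers cover only the first 48 Kbytes of a file.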


[Figure 12.16 UNIX Directories and Inodes: a directory lists entries Name1 to Name4, each paired with an inode number i1 to i4 that selects an entry in the inode table]
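The name-to-inode mapping can be observed from user space. The sketch below, which assumes a POSIX-style file system that supports hard links, creates a second name for a file and compares inode numbers; the function name and file names are hypothetical.

```python
import os
import tempfile

def link_demo():
    """Create a file, give it a second name, and compare inode numbers."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "name1")
        second = os.path.join(d, "name2")
        with open(first, "w") as f:
            f.write("hello")
        os.link(first, second)   # new directory entry for the same inode
        a, b = os.stat(first), os.stat(second)
        # Both names should resolve to one inode with a link count of 2.
        return a.st_ino == b.st_ino, a.st_nlink
```

Both directory entries refer to the same inode, so the file's data and attributes exist once however many names point at it.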


Volume Structure
• A UNIX file system resides on a single logical disk or disk partition and is laid out with the following elements:
  • Boot block: contains code required to boot the operating system
  • Superblock: contains attributes and information about the file system
  • Inode table: collection of inodes for each file
  • Data blocks: storage space available for data files and subdirectories
[Figure 12.17 Linux Virtual File System Context. User space: user applications, GNU C library. Kernel space: system call interface; Virtual File System (VFS) with its inode cache and directory cache; individual file systems; buffer cache; device drivers.]


[Figure 12.18 Linux Virtual File System Concept: a user process issues system calls using the VFS user interface; the Linux Virtual File System turns them into VFS system calls; a mapping function to file system X converts these into system calls using the file system X interface; file system X issues disk I/O calls against the files on secondary storage maintained by file system X.]


Primary Object Types in VFS
Superblock Object
• represents a specific mounted file system
Inode Object
• represents a specific file
Dentry Object
• represents a specific directory entry
File Object
• represents an open file associated with a process
Windows File System
 The developers of Windows NT designed a new file system, the New
Technology File System (NTFS) which is intended to meet high-end
requirements for workstations and servers

 Key features of NTFS:


 recoverability
 security
 large disks and large files
 multiple data streams
 journaling
 compression and encryption
 hard and symbolic links
NTFS Volume and File Structure
• NTFS makes use of the following disk storage concepts:
Sector
• the smallest physical storage unit on the disk
• the data size in bytes is a power of 2 and is almost always 512 bytes
Cluster
• one or more contiguous sectors
• the cluster size in sectors is a power of 2
Volume
• a logical partition on a disk, consisting of one or more clusters and used by a file system to allocate space
• can be all or a portion of a single disk or it can extend across multiple disks
• the maximum volume size for NTFS is 2^64 bytes
Table 12.4 Windows NTFS Partition and Cluster Sizes

Volume Size            Sectors per Cluster   Cluster Size
≤ 512 Mbyte            1                     512 bytes
512 Mbyte - 1 Gbyte    2                     1K
1 Gbyte - 2 Gbyte      4                     2K
2 Gbyte - 4 Gbyte      8                     4K
4 Gbyte - 8 Gbyte      16                    8K
8 Gbyte - 16 Gbyte     32                    16K
16 Gbyte - 32 Gbyte    64                    32K
> 32 Gbyte             128                   64K
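Table 12.4 amounts to a lookup from volume size to cluster size. The sketch below encodes it directly; treating each range's upper bound as inclusive is an assumption, since the table does not say which row a boundary value belongs to, and the function name is hypothetical.

```python
MB = 2**20
GB = 2**30
SECTOR_SIZE = 512  # bytes per sector, as in the table's first row

# (upper bound on volume size in bytes, sectors per cluster), from Table 12.4.
_RANGES = [
    (512 * MB, 1), (1 * GB, 2), (2 * GB, 4), (4 * GB, 8),
    (8 * GB, 16), (16 * GB, 32), (32 * GB, 64),
]

def default_cluster_size(volume_bytes):
    """Return the cluster size in bytes that Table 12.4 lists for a volume."""
    for upper, sectors in _RANGES:
        if volume_bytes <= upper:   # assumed: upper bounds are inclusive
            return sectors * SECTOR_SIZE
    return 128 * SECTOR_SIZE        # volumes larger than 32 Gbyte
```

Each doubling of the volume size doubles the cluster size, so the number of clusters to track stays within the same range.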
[Figure 12.19 NTFS Volume Layout, in order on disk: partition boot sector, Master File Table, System Files, File Area]


Master File Table (MFT)
• The heart of the Windows file system is the MFT
• The MFT is organized as a table of 1,024-byte rows, called records
• Each row describes a file on this volume, including the MFT itself, which is treated as a file
• Each record in the MFT consists of a set of attributes that serve to define the file (or folder) characteristics and the file contents
Table 12.5 Windows NTFS File and Directory Attribute Types

Attribute Type: Description
• Standard information: Includes access attributes (read-only, read/write, etc.); time stamps, including when the file was created or last modified; and how many directories point to the file (link count).
• Attribute list: A list of attributes that make up the file and the file reference of the MFT file record in which each attribute is located. Used when all attributes do not fit into a single MFT file record.
• File name: A file or directory must have one or more names.
• Security descriptor: Specifies who owns the file and who can access it.
• Data: The contents of the file. A file has one default unnamed data attribute and may have one or more named data attributes.
• Index root: Used to implement folders.
• Index allocation: Used to implement folders.
• Volume information: Includes volume-related information, such as the version and name of the volume.
• Bitmap: Provides a map representing records in use on the MFT or folder.

Note: Colored rows refer to required file attributes; the other attributes are optional.
[Figure 12.20 Windows NTFS Components. Within the I/O Manager: Log File Service, NTFS Driver, Fault Tolerant Driver, Disk Driver. Outside it: Cache Manager and Virtual Memory Manager. Labeled interactions: log the transaction; read/write the file; flush the log file; write the cache; read/write a mirrored or striped volume; read/write the disk; load data from disk into memory; access the mapped file or flush the cache.]


[Figure 12.21 Typical Directory Tree of Android
/ (root)
  /system (ro), containing bin, etc, lib, usr
  /data (rw)
  /cache (rw)
  /mnt/sdcard, removable storage (rw)
ro: mounted as read only
rw: mounted as read and write]


SQLite
• Most widely deployed SQL database engine in the world
• Based on the Structured Query Language (SQL)
• Designed to provide a streamlined SQL-based database management system suitable for embedded systems and other limited memory systems
• The full SQLite library can be implemented in under 400 KB
• In contrast to other database management systems, SQLite is not a separate process that is accessed from the client application
  • the library is linked in and thus becomes an integral part of the application program
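The in-process design is visible from any language binding. The sketch below uses Python's standard `sqlite3` module with an in-memory database, so no server is started or contacted; the table and column names are hypothetical.

```python
import sqlite3

def demo():
    """Create a table, insert rows, and query them, all inside this process."""
    # ":memory:" keeps the whole database in this process's address space.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE files (name TEXT PRIMARY KEY, size INTEGER)")
        conn.executemany(
            "INSERT INTO files VALUES (?, ?)",
            [("ABC", 1200), ("Unit_A", 300)],
        )
        rows = conn.execute(
            "SELECT name FROM files WHERE size > ? ORDER BY name", (500,)
        ).fetchall()
    finally:
        conn.close()
    return rows
```

Every call above is an ordinary function call into the linked library rather than a request to a separate database process.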
Summary
• File structure
• File management systems
• File organization and access
  • The pile
  • The sequential file
  • The indexed sequential file
  • The indexed file
  • The direct or hashed file
• B-Trees
• File directories
  • Contents
  • Structure
  • Naming
• File sharing
  • Access rights
  • Simultaneous access
• Record blocking
• Secondary storage management
  • File allocation
  • Free space management
  • Volumes
  • Reliability
• UNIX file management
  • Inodes
  • File allocation
  • Directories
  • Volume structure
• Linux virtual file system
  • Superblock object
  • Inode object
  • Dentry object
  • File object
  • Caches
• Windows file system
  • Key features of NTFS
  • NTFS volume and file structure
  • Recoverability
• Android file management
  • File system
  • SQLite