Module II Notes
MODULE – II
UNIX file system: UNIX file basics, Types of files, Structure of a regular file, Directory structure of a UNIX file
system, Navigating the File System, Creating and Managing File Systems, File System Backup, File Permission and
Access, Blocks, Inodes, Superblock, the PATH variable. Allocation of disk: Directories - Inode assignment to a
new file - Allocation of disk blocks. System calls for the file system: Open – Read - Write - Lseek – Close - File
creation - Creation of special files - Changing directory and root - Changing owner and mode – stat and fstat - Pipes –
Dup. Input and Output Redirection: Input Redirection, Output Redirection, Error Redirection, Filter.
3. Special files
a. Block file('b')
e. Socket file('s')
4. Hidden files
1. Regular files
Regular files are the most common files and are used to contain data. Regular files are either
text files or binary files:
Text files : Text files are regular files that contain information stored as ASCII text
and are readable by the user. They contain printable characters.
Binary files : Binary files are regular files that contain information readable by the
computer rather than by the user. Commands and programs are stored in executable binary files. E.g. audio and video
files.
2. Directory files
Directory files contain information that the system needs to access all types of files, but directory
files do not contain the actual file data. As a result, directories occupy less space than a regular file.
Each directory entry represents either a file or a subdirectory.
3. Special files
a. Block file ('b') – Devices that read/write one block at a time. E.g. hard disk
b. Character device file ('c') – Devices that read/write character by character. E.g. serial
modems
c. Named pipe file, or just pipe file ('p') - Also called FIFO files. A pipe is
created by one process to temporarily allow communication with another process.
INODE REPRESENTATION:
➢ Two versions of the inode :
• Disk copy : store the inode information when file is not in use
2. File type. Files may be of type regular, directory, character or block special, or
FIFO (pipes).
7. File size.
● The structure of a file is used to represent the information about, and description of, the file in Linux.
● It gives detailed information about the file, which enables the user to perform operations
based on it.
● The file structure handles the file using inodes.
● In UNIX, the data in a file is not necessarily stored sequentially on disk. The inode stores the disk block
numbers on which the data is present. With a plain list of block numbers, however, a file with data across 1000
blocks would need an inode that stores 1000 block numbers, and the size of the inode
would vary with the size of the file.
To keep the inode a constant size and yet allow large files, indirect addressing is used.
The inode has an array of 13 entries for storing block numbers, although the number of
elements in the array is independent of the storage strategy. The first 10 entries of the array are
"direct addresses", meaning that they store the block numbers of actual data. The 11th entry
is "single indirect": it stores the number of a block that itself holds direct addresses. The
12th entry is "double indirect": it stores the number of a block of single indirect block numbers. And the
13th entry is "triple indirect": it stores the number of a block of double indirect block numbers. This
strategy could be extended to "quadruple" or "quintuple" indirect addressing.
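The capacity that this 13-entry scheme allows can be worked out with a little shell arithmetic. The sketch below assumes a block size of 1024 bytes and 4-byte block numbers, so that one indirect block holds 256 block numbers; both figures are illustrative assumptions, not values stated in these notes.

```shell
#!/bin/sh
# Assumed parameters (illustrative only): 1 KB blocks, 4-byte block numbers.
block_size=1024
ptr_size=4
ptrs=$((block_size / ptr_size))               # block numbers per indirect block

direct=$((10 * block_size))                   # entries 1-10: direct blocks
single=$((ptrs * block_size))                 # entry 11: single indirect
double=$((ptrs * ptrs * block_size))          # entry 12: double indirect
triple=$((ptrs * ptrs * ptrs * block_size))   # entry 13: triple indirect
total=$((direct + single + double + triple))

echo "block numbers per indirect block: $ptrs"
echo "direct:          $direct bytes"
echo "single indirect: $single bytes"
echo "double indirect: $double bytes"
echo "triple indirect: $triple bytes"
echo "maximum file size: $total bytes"
```

Under these assumptions the ten direct entries cover only 10 KB, while the single triple indirect entry covers 16 GB, which is why the inode can stay small and still address very large files.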
• The boot block occupies the beginning of a file system, typically the first sector, and
contains the bootstrap code that is read into the machine to boot, or initialize, the operating
system.
• The super block describes the state of a file system - how large it is, how many files it can
store, where to find free space on the file system, and other information.
• The inode list is a list of inodes that follows the super block in the file system.
• The data blocks start at the end of the inode list and contain file data and administrative
data. An allocated data block can belong to one and only one file in the file system.
• free list – a list of unused blocks
SUPER BLOCK:
The superblock is part of various file systems of UNIX and its
derivatives. It typically includes the following management information about the file system:
➢ A list of free blocks available on the file system - Pointer to free list
The kernel periodically writes the super block to disk if it has been modified, so that it is
consistent with the data in the file system.
• iget – The kernel uses the iget algorithm to allocate a known inode, that is, one whose inode number was
determined previously.
• working inode=root
/ (root) : inode = 2
home : inode = 5
ankit : inode = 31
abc.txt : inode = 12
A system call is the programmatic way in which a computer program requests a service
from the kernel of the operating system it is executed on. This may include hardware-related
services (for example, accessing a hard disk drive), creation and execution of new processes,
and communication between them.
open()
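The open system call allows us to open a file for reading, writing etc. Its syntax is :
fd = open(pathname, flags, modes);
where pathname is a file name, flags indicates the type of open (such as for reading or writing), and
modes gives the file permissions if the file is being created.
The open system call returns an integer called the user file descriptor.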
read()
The read() system call is used to read the file contents. The syntax of the read system call is:
number = read(fd, buffer, count);
where fd is the file descriptor returned by open, buffer is the address of a data structure
in the user process that will contain the read data on successful completion of the call, count is
the number of bytes the user wants to read, and number is the number of bytes actually read.
write()
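The write() system call is used to write data to a file. The syntax of the write system call is:
number = write(fd, buffer, count);
where the meaning of the variables fd, buffer, count, and number is the same as
for the read system call.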
close()
A process closes an open file when it no longer wants to access it. The syntax for the
close system call is:
close(fd);
lseek:
The ordinary use of the read and write system calls provides sequential access to a file,
but processes can use the lseek system call to position the I/O and allow random access to a file.
The syntax for the lseek system call is:
position = lseek(fd, offset, reference);
where fd is the file descriptor identifying the file, offset is a byte offset, and reference
indicates whether offset should be considered from the beginning of the file, from the current
position of the read/write offset, or from the end of the file. The return value, position, is the
byte offset where the next read or write will start.
The system calls stat and fstat allow processes to query the status of files, returning
information such as the file type, file owner, access permissions, file size, number of links,
inode number, and file access times. The syntax for the system calls is:
stat(pathname,statbuffer);
fstat(fd, statbuffer);
where pathname is a file name, fd is a file descriptor returned by a previous open call,
and statbuffer is the address of a data structure in the user process that will contain the status
information of the file on completion of the call. The system calls simply write the fields of the
inode into statbuffer.
stat() – returns the information in the inode for the file named by a string
fstat() - returns the information in the inode for the file named by a file descriptor
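The stat and fstat calls themselves are C interfaces, but a few of the inode fields they return can be seen from the shell, because ls obtains its information through the same family of calls. The sketch below is only an illustration: the file names are made up, and it uses "ls -i" for the inode number and the second column of "ls -l" for the number of links.

```shell
#!/bin/sh
work=$(mktemp -d)                      # scratch directory for the demo
echo "data" > "$work/original"
ln "$work/original" "$work/hardlink"   # second name for the same inode

# ls -i prints the inode number in the first column
inode_a=$(ls -i "$work/original" | awk '{ print $1 }')
inode_b=$(ls -i "$work/hardlink" | awk '{ print $1 }')

# the second column of ls -l is the number of links to the inode
links=$(ls -l "$work/original" | awk '{ print $2 }')

echo "inode of original: $inode_a"
echo "inode of hardlink: $inode_b"
echo "number of links:   $links"
rm -rf "$work"
```

Both names report the same inode number and a link count of 2, which shows that the status information belongs to the inode rather than to a particular file name.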
dup()
The dup() system call copies a file descriptor into the first free slot of the user file
descriptor table, returning the new file descriptor to the user. It works for all file types. The
syntax of the system call is :
newfd = dup(fd);
where fd is the file descriptor being duped and newfd is the new file descriptor that
references the file.
dup2:
The dup2 system call is similar to dup in that it duplicates a file descriptor, but the caller
chooses the number of the copy. If newfd is already open, it is closed first. Syntax for the dup2 system call is:
dup2(oldfd, newfd);
The oldfd is the source file descriptor, which remains open after the call to dup2, and
newfd is the destination file descriptor, which will point to the same file as oldfd after the call
returns. The call returns the value of newfd upon success and a negative value (-1) when
an error occurs.
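The effect of duplicating one descriptor onto another can be observed from the shell. In the sketch below, "2>&1" asks the shell to make file descriptor 2 (standard error) an alias of file descriptor 1 (standard output), which is the same aliasing that dup2(1, 2) performs; shells commonly implement it that way, though that detail is an assumption about the shell, not something stated in these notes. The file names are illustrative.

```shell
#!/bin/sh
work=$(mktemp -d)
status=0

# Descriptor 1 is pointed at all.txt, then descriptor 2 is made a copy of it,
# so the normal message and the error message land in the same file.
{ echo "to stdout"; ls "$work/missing"; } > "$work/all.txt" 2>&1 || status=$?

lines=$(wc -l < "$work/all.txt" | tr -d ' ')
first=$(head -n 1 "$work/all.txt")

echo "exit status of the group: $status"
echo "lines captured: $lines"
cat "$work/all.txt"
rm -rf "$work"
```

The order matters: "> file 2>&1" redirects descriptor 1 first and then copies it, so both streams reach the file.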
Output redirection is a method in which the standard output of a command is redirected to a file.
The “>” sign is used for output redirection. The terminal
does not show the output; instead, it is written to the file. (Sending output to another command as its standard input is done with a pipe.)
The > operator overwrites the content of the file.
date > specifications.txt
cat specifications.txt
Here, the date command’s output is redirected to specifications.txt.
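The overwriting behaviour of ">" is easy to check. The sketch below also uses ">>", the standard shell operator that appends instead of overwriting; it is not covered in the text above and is included only for contrast. The file name out.txt is illustrative.

```shell
#!/bin/sh
work=$(mktemp -d)

echo "first"  >  "$work/out.txt"   # creates out.txt
echo "second" >  "$work/out.txt"   # overwrites: "first" is lost
echo "third"  >> "$work/out.txt"   # appends after "second"

lines=$(wc -l < "$work/out.txt" | tr -d ' ')
first_line=$(head -n 1 "$work/out.txt")
last_line=$(tail -n 1 "$work/out.txt")

cat "$work/out.txt"
rm -rf "$work"
```

The file ends up with two lines, "second" and "third".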
With error redirection, the standard error stream (file descriptor 2) can be redirected and written to a file using the “2>” operator. For
example, “command 2> errorfile” runs command with its error messages sent to errorfile.
If any error occurs, it will not show on the terminal window; rather, it will be stored in the error file.
If the error file already exists, it will be overwritten.
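A minimal sketch of error redirection, using an ls of a file that does not exist to provoke an error message. The names out.txt and err.txt are illustrative.

```shell
#!/bin/sh
work=$(mktemp -d)
status=0

# Standard output goes to out.txt, standard error (descriptor 2) to err.txt.
ls "$work/no_such_file" > "$work/out.txt" 2> "$work/err.txt" || status=$?

out_size=$(wc -c < "$work/out.txt" | tr -d ' ')
if [ -s "$work/err.txt" ]; then err_captured=yes; else err_captured=no; fi

echo "exit status: $status"
echo "bytes on standard output: $out_size"
echo "error message captured in file: $err_captured"
rm -rf "$work"
```

Nothing appears on the terminal from ls itself: the error text is in err.txt and out.txt is empty.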
Pipe | in Linux
• A pipe is a form of redirection (transfer of standard output to some other destination) that is
used in Linux to send the output of one command to another command for further
processing.
• The pipe is used to combine two or more commands, and in this, the output of one
command acts as input to another command, and this command’s output may act as input to
the next command, and so on.
This will sort the given file and print the unique values only.
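A pipeline of the kind described in the sentence above can be sketched as follows. The file fruits.txt and its contents are made up for the illustration.

```shell
#!/bin/sh
work=$(mktemp -d)
printf 'pear\napple\npear\nfig\napple\n' > "$work/fruits.txt"

# sort groups identical lines together; uniq then drops adjacent duplicates
unique=$(sort "$work/fruits.txt" | uniq)

# a third stage can be added: count the unique lines
count=$(sort "$work/fruits.txt" | uniq | wc -l | tr -d ' ')

echo "$unique"
echo "unique lines: $count"
rm -rf "$work"
```

uniq only removes duplicates that are next to each other, which is why it is normally fed from sort.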
A filter is a program that takes plain text (stored in a file or generated by another program) as
standard input, transforms it into a more useful form, and then writes the result to standard output. Some of
the most commonly used filters are described below.
1. cat : Display the text of the file line by line. Syntax: cat filename
2. head: Shows the first n lines of the specified text file. If no number of lines is specified, the first
10 lines are printed by default. Syntax: head -number_of_lines_to_print filename
3. tail: Works like head, but from the other end of the file: tail prints the last n lines (10 by default), in their original order.
Syntax: tail -number_of_lines_to_print filename
4. sort: Sorts the rows alphabetically by default. Syntax : sort filename
5. uniq: Remove duplicate lines. Syntax: uniq [options] [filename]
6. wc: wc command gives the number of lines, words and characters in the data.
Syntax: wc -lwc filename
7. grep : It is a pattern or expression matching command. It searches for a pattern or regular
expression that matches in files or directories and then prints found matches. Syntax: grep [options]
"pattern to be matched" filename
8. sed : sed is a powerful stream editor for filtering and transforming text data. In the command below,
sed replaces the first occurrence of the string ‘is’ on each line of the file with the string ‘was’ and prints the result; the file itself is not changed.
sed 's/is/was/' sample.txt
9. nl : nl is used to number the lines of a file. Syntax: nl filename
10. less : It is used to read the contents of a text file one page (one screen) at a time. It has faster
access because if a file is large, it doesn’t access the complete file, but accesses it page by page.
Syntax : less filename
11. more : It reads files and displays the text one screen at a time. The more command also allows
the user to scroll up and down through the page. Syntax : more filename
12. awk : This command can scan files line by line, split each input line into fields, compare input
lines and fields to patterns, and perform specified actions on matching lines.
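Several of the filters listed above can be tried on one small file. In this sketch the file sample.txt and its three lines are invented for the illustration, and head and tail are called with the "-n" option, the standard spelling of the line count.

```shell
#!/bin/sh
work=$(mktemp -d)
cat > "$work/sample.txt" <<'EOF'
this is line one
that was line two
this is line three
EOF

first=$(head -n 1 "$work/sample.txt")            # first line
last=$(tail -n 1 "$work/sample.txt")             # last line
nlines=$(wc -l < "$work/sample.txt" | tr -d ' ') # number of lines
matches=$(grep -c "is" "$work/sample.txt")       # lines containing "is"
# sed changes the first "is" on each line, even inside a longer word
replaced=$(sed 's/is/was/' "$work/sample.txt" | head -n 1)
# awk: on line 2, print the second field
second_word=$(awk 'NR == 2 { print $2 }' "$work/sample.txt")

echo "head:  $first"
echo "tail:  $last"
echo "wc -l: $nlines"
echo "grep:  $matches matching lines"
echo "sed:   $replaced"
echo "awk:   $second_word"
rm -rf "$work"
```

Note the sed result for the first line, "thwas is line one": the pattern matches the characters "is" wherever they first occur, here inside "this", not the whole word.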
2. Incremental backup
Unlike full backups, incremental backups first look to see whether a file's modification time is more
recent than its last backup time. If it is not, the file has not been modified since the last backup and
can be skipped this time. On the other hand, if the modification date is more recent than the last
backup date, the file has been modified and should be backed up.
Incremental backups are used in conjunction with a regularly-occurring full backup (for example, a
weekly full backup, with daily incrementals).
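The modification-time test behind an incremental backup can be sketched with find. This is only an illustration of the idea, not how any particular backup tool is implemented: it assumes the time of the last backup is recorded as the timestamp of a marker file (here called last_backup, a made-up name), and the dates are set explicitly with touch -t so the result is predictable.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir "$work/data"

touch -t 202401010000 "$work/data/old.txt"   # modified before the last backup
touch -t 202401020000 "$work/last_backup"    # marker: time of the last backup
touch -t 202401030000 "$work/data/new.txt"   # modified after the last backup

# Select only files whose modification time is newer than the marker
changed=$(cd "$work/data" && find . -type f -newer "$work/last_backup")

echo "files to include in the incremental backup:"
echo "$changed"
rm -rf "$work"
```

Only new.txt is selected; old.txt is skipped because it has not changed since the marker time.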
3. Differential backup
Differential backups are similar to incremental backups in that both back up only modified files.
However, differential backups are cumulative -- in other words, with a differential backup, once a
file has been modified it continues to be included in all subsequent differential backups (until the
next full backup).
This means that each differential backup contains all the files modified since the last full backup,
making it possible to perform a complete restoration with only the last full backup and the last
differential backup.
Differential backups normally follow the same strategy as incremental backups:
a single periodic full backup followed by more frequent differential backups.
II. Backup Media
1. Tape
Tape was the first widely-used removable data storage medium. It has the benefits of low media
cost and reasonably-good storage capacity. However, tape has some disadvantages -- it is subject to
wear, and data access on tape is sequential in nature. On the other hand, tape is one of the most
inexpensive mass storage media available, and it has a long history of reliability.
2. Disk
The primary reason for using disk drives as a backup medium would be speed. There is no faster
mass storage medium available. But disk storage is not the ideal backup medium, for a number of
reasons:
• Disk drives are not normally removable.
• Disk drives are expensive.
• Disk drives are fragile.
3. Network
By itself, a network cannot act as backup media. But combined with mass storage technologies, it
can serve quite well. For instance, by combining a high-speed network link with a remote data center
containing large amounts of disk storage, the disadvantages of backing up to disks can be
overcome.
Important aspects of Backup devices :
1. Cost
2. Reliability
3. Availability
4. Speed
5. Usability
III. Backup commands in Linux
a. tar command
It is usually beneficial to compress files before backup. The two most popular tools for
compression of regular files on Linux are gzip/gunzip and bzip2/bunzip2. gzip produces
a file with a .gz extension and bzip2 produces a file with a .bz2 extension.
tar - It is short for Tape Archive and is used to create and extract archive files. An
archive file contains one or more files bundled together for
easier storage and portability. tar by itself does not compress the archive.
The tar command can also compress an archive using gzip or bzip2
compression. To create a compressed archive, the -z (gzip) or -j (bzip2) option is used in conjunction
with the -c option.
$ tar -zcvf archive.tar.gz files_to_compress - This command creates a “.tar.gz” archive
of the specified files.
$ tar -jcvf archive.tar.bz2 files_to_compress - This command creates a “.tar.bz2” archive
of the specified files.
To unzip an archive :
To decompress a ‘.tar.gz’ file − $ tar -zxvf archive.tar.gz
To decompress a ‘.tar.bz2’ file − $ tar -jxvf archive.tar.bz2
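A round trip through a gzip-compressed archive can be sketched as follows. The directory and file names are illustrative. Two options not mentioned above are used: "-t", which lists an archive's contents, and "-C", which makes tar change to the given directory before archiving or extracting.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir "$work/src" "$work/restore"
echo "hello" > "$work/src/a.txt"
echo "world" > "$work/src/b.txt"

# Create a gzip-compressed archive of the two files
tar -zcf "$work/archive.tar.gz" -C "$work/src" a.txt b.txt

# List the archive contents without extracting
listing=$(tar -ztf "$work/archive.tar.gz")

# Extract into a separate directory and compare with the original
tar -zxf "$work/archive.tar.gz" -C "$work/restore"
if cmp -s "$work/src/a.txt" "$work/restore/a.txt"; then same=yes; else same=no; fi

entries=$(echo "$listing" | wc -l | tr -d ' ')
echo "$listing"
echo "restored copy identical: $same"
rm -rf "$work"
```

Listing an archive before extracting it is a cheap way to confirm that a backup contains what was intended.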
b. cpio command (Copy Input and Output)
The cpio command is a tool for creating and extracting archives, or copying files from
one place to another. It is used for processing the archive files like *.cpio or *.tar. This
command can copy files to and from archives.
1. Copy-out Mode: Copy files named in name-list to the archive
$ ls
file1 file2 file3
$ ls | cpio -ov > /home/Tom/backup.cpio
2. Copy-in Mode: Extract files from the archive
$ cpio -iv < /home/Tom/backup.cpio
-i, --extract: Extract files from an archive; runs only in copy-in mode.
-o, --create: Create the archive; runs only in copy-out mode.
c. dd command (Disk/Data Duplicator)
dd is a command-line utility for Linux whose primary purpose is to convert and copy
files.
• To back up the entire hard disk: To copy an entire hard disk to another
hard disk connected to the same system, execute the dd command as shown. In this
dd command example, the UNIX device name of the source hard disk is /dev/sda,
and the device name of the target hard disk is /dev/sdb.
$ dd if=/dev/sda of=/dev/sdb
“if” represents the input file, and “of” represents the output file, so an exact copy of
/dev/sda will be available on /dev/sdb.
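Because dd overwrites its output without asking, it is safer to try it on ordinary files first. The sketch below copies a small image file rather than a disk; the file names are illustrative, and the "bs" (block size) and "count" (number of blocks) operands are standard dd operands not shown in the example above.

```shell
#!/bin/sh
work=$(mktemp -d)

# Build a 4 KB source "disk image" from zero bytes: 4 blocks of 1024 bytes
dd if=/dev/zero of="$work/source.img" bs=1024 count=4 2>/dev/null

# Copy it block by block, exactly as a disk-to-disk copy would
dd if="$work/source.img" of="$work/copy.img" bs=1024 2>/dev/null

size=$(wc -c < "$work/copy.img" | tr -d ' ')
if cmp -s "$work/source.img" "$work/copy.img"; then identical=yes; else identical=no; fi

echo "copy size: $size bytes"
echo "copy identical to source: $identical"
rm -rf "$work"
```

dd reports its transfer statistics on standard error, which is why "2>/dev/null" is used here to keep the output quiet.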
3. Incremental backup
• The numeric argument is the dump level. Level 0 is a full backup; a higher level, such as 2 here, is an
incremental backup that saves the files changed since the most recent dump at a lower level.
• -u updates the /etc/dumpdates file.
[root@localhost ~]# /sbin/dump -2u -f /dev/st0 /dev/sda9
dump: date of this level 2 dump: wed feb 8 22:14:13 2017
dump: date of last level 1 dump: wed feb 8 22:13:06 2017
dump: dumping /dev/sda9 (/boot) to /dev/st0
…
4. Backup History
Backup history can be viewed in the file /etc/dumpdates.
[root@localhost ~]# cat /etc/dumpdates
/dev/sda9 0 wed feb 8 22:10:13 2017 -0800
/dev/sda9 1 wed feb 8 22:13:06 2017 -0800
/dev/sda9 2 wed feb 8 22:14:13 2017 -0800
/dev/sda9 3 wed feb 8 22:15:27 2017 -0800
/dev/sda9 4 wed feb 8 22:15:43 2017 -0800
/dev/sda9 5 wed feb 8 22:15:34 2017 -0800
5. Exit status
Dump exits with zero status on success. Startup errors are indicated with an exit code of 1;
abnormal termination is indicated with an exit code of 3.
restore command
The restore command in a Linux system is used for restoring files from a backup created using dump.
Eg. restore -rf /dev/st0
The above command restores the file system that was backed up to /dev/st0.
While using the command ls -l, each file listed in the output has a ten-character field at the start of
its line. The first character specifies the type of the file, and the remaining nine give its permissions.
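The type character can be picked out of the ls -l output directly. In this sketch a regular file, a directory and a named pipe are created in a scratch directory; file_type is a hypothetical helper defined only for this illustration.

```shell
#!/bin/sh
work=$(mktemp -d)
touch  "$work/regular.txt"
mkdir  "$work/subdir"
mkfifo "$work/fifo"

# Hypothetical helper: print the first character of the ls -l line
file_type() { ls -ld "$1" | cut -c1; }

reg=$(file_type "$work/regular.txt")   # -  regular file
dir=$(file_type "$work/subdir")        # d  directory
fifo=$(file_type "$work/fifo")         # p  named pipe (FIFO)

echo "regular.txt: $reg"
echo "subdir:      $dir"
echo "fifo:        $fifo"
rm -rf "$work"
```

Block and character device files show "b" and "c" in the same position; they are not created here because making device files normally requires root authority.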
Understanding file permissions
File permissions can be defined in two modes, each covering three user categories (user, group, other):
1. Symbolic mode : In this mode each user category [u, g, o] and permission/right [r, w, x] is
defined using a combination of letters, where r stands for read rights, w for write rights and x for
execution rights.
User Denotations
u user/owner
g group
o other
a all
Operator Description
+ adds a permission
- removes a permission
= sets the permissions exactly as given
2. Absolute/Numeric/Octal mode : In this mode the rights are defined using a three-digit octal
number, where each digit represents the rights of one user category. The table below gives the
numbers for all permission types.
Number   File Permission Type       Symbol
0        No permission              ---
1        Execute                    --x
2        Write                      -w-
3        Write and Execute          -wx
4        Read                       r--
5        Read and Execute           r-x
6        Read and Write             rw-
7        Read, Write and Execute    rwx
For example, the command “chmod 764 sample” changes the permissions of the file ‘sample’ to ‘764’ (rwxrw-r--).
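Both modes can be checked against the ls -l output. The sketch below sets the permissions of a scratch file named sample first in octal mode and then in symbolic mode; the symbolic expression u=rw,g=r,o= is an illustrative choice.

```shell
#!/bin/sh
work=$(mktemp -d)
touch "$work/sample"

# Octal mode: 7 = rwx for user, 6 = rw- for group, 4 = r-- for others
chmod 764 "$work/sample"
octal_result=$(ls -l "$work/sample" | cut -c1-10)

# Symbolic mode: set user to rw, group to r, others to nothing
chmod u=rw,g=r,o= "$work/sample"
symbolic_result=$(ls -l "$work/sample" | cut -c1-10)

echo "after chmod 764:         $octal_result"
echo "after chmod u=rw,g=r,o=: $symbolic_result"
rm -rf "$work"
```

The first result is -rwxrw-r-- and the second is -rw-r-----, which corresponds to octal 640.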
umask [-p] [-S] [mask]
where:
• [mask]: The new permissions mask you are applying. By default, the mask is presented as a
numeric (octal) value.
• [-S]: Displays the current mask as a symbolic value.
• [-p]: Displays the current mask along with the umask command, allowing it to be copied and
pasted as a future input.
• The system default permission values are 777 (rwxrwxrwx) for folders and 666 (rw-rw-rw-)
for files.
• The default mask for a non-root user is 002, changing the folder permissions to 775
(rwxrwxr-x), and file permissions to 664 (rw-rw-r--).
• The default mask for a root user is 022, changing the folder permissions to 755 (rwxr-xr-x),
and file permissions to 644 (rw-r--r--).
This shows us that the final permission value is the result of removing the umask bits from the
default permission value (777 or 666). For these masks, that is the same as subtracting the umask value digit by digit.
For example, if you want to change the folder permission value from 777 (read, write, and execute
for all) to 444 (read for all), you need to apply a umask value of 333, since:
777 - 333 = 444
umask [mask]
where [mask]: The mask you want to apply, as either a symbolic or numeric value.
Set a new umask value by using symbolic values with the following syntax:
umask u=#,g=#,o=#
Eg : umask u+rw,g+w,o-r
where # stands for the permission letters (r, w, x) allowed for that user category.
Note: Never use a space after the commas when setting up a symbolic mask value.
Once you calculate the required umask numeric value, set it up by using:
umask [mask]
Eg. umask 242
where [mask]: The numeric value of the mask you want to apply.
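The effect of a mask can be observed by creating a file and a directory under it. The sketch below runs in a subshell so that the changed mask does not leak into the calling shell; all file names are illustrative. It also tries the mask 333 on a file, to show that the mask clears permission bits and is not an ordinary decimal subtraction.

```shell
#!/bin/sh
work=$(mktemp -d)

(
  umask 022
  touch "$work/file_022"     # 666 with the 022 bits cleared -> 644
  mkdir "$work/dir_022"      # 777 with the 022 bits cleared -> 755

  umask 333
  touch "$work/file_333"     # 666 with the 333 bits cleared -> 444
)

file_022=$(ls -ld "$work/file_022" | cut -c1-10)
dir_022=$(ls -ld "$work/dir_022" | cut -c1-10)
file_333=$(ls -ld "$work/file_333" | cut -c1-10)

echo "file, umask 022: $file_022"
echo "dir,  umask 022: $dir_022"
echo "file, umask 333: $file_333"
rm -rf "$work"
```

With mask 333 a new file gets r--r--r-- (444), whereas 666 - 333 would suggest 333. Each octal digit is handled bit by bit: 6 is rw-, the mask digit 3 removes w and x, and r-- (4) is left.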
➢ The cd command allows us to change directories. When you open a terminal, you will be in
your home directory. To move around the file system, you will use cd. Examples:
➢ To navigate into the root directory, use "cd /"
➢ To navigate to your home directory, use "cd" or "cd ~"
➢ To navigate up one directory level, use "cd .."
➢ To navigate to the previous directory (or back), use "cd -"
➢ To navigate through multiple levels of directory at once, specify the full directory path that
you want to go to. For example, use "cd /var/www" to go directly to the www subdirectory
of /var. As another example, "cd ~/Desktop" will move you to the Desktop subdirectory
inside your home directory.
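The navigation commands above can be tried safely inside a scratch directory. In this sketch the var/www tree is created under a temporary directory (it is not the real /var/www), and basename is used so that only the last component of the current directory is shown.

```shell
#!/bin/sh
start=$(pwd)                      # remember where we began
work=$(mktemp -d)
mkdir -p "$work/var/www"

cd "$work/var/www" || exit 1      # go down several levels at once
here=$(basename "$(pwd)")

cd .. || exit 1                   # up one level
up=$(basename "$(pwd)")

cd - > /dev/null || exit 1        # back to the previous directory
back=$(basename "$(pwd)")         # (cd - also prints the directory name)

echo "after cd .../var/www: $here"
echo "after cd ..:          $up"
echo "after cd -:           $back"

cd "$start" || exit 1
rm -rf "$work"
```

The three values printed are www, var and www.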
Changing root ‘chroot’
It is used to change the root directory to a new directory in the Linux/Unix operating
system. The chroot command can be used only by a user operating with root user authority.
Syntax : chroot /path/to/new/root command
To run the ls command with the /tmp directory as the root file system, enter:
mkdir /tmp/bin
cp /bin/ls /tmp/bin
chroot /tmp ls
Change ownership : chown command