CSE-IV UNIX and Shell Programming (10CS44) - Notes
10CS44
I.A. Marks : 25
Total Hours : 52
Hours/Week : 04
Exam Marks: 100
PART A
UNIT 1:
1. The UNIX Operating System, The UNIX Architecture and Command Usage, The File System
6 Hours
UNIT 2:
2. Basic File Attributes
6 Hours
UNIT 3:
3. The Shell, The Process
7 Hours
UNIT 4:
4. More File Attributes
7 Hours
PART B
UNIT 5:
5. Filters Using Regular Expressions
6 Hours
UNIT 6:
6. Essential Shell Programming
6 Hours
UNIT 7:
7. awk - An Advanced Filter
7 Hours
UNIT 8:
8. perl - The Master Manipulator
7 Hours
Text Book
1. UNIX Concepts and Applications, Sumitabha Das, 4th Edition, Tata McGraw
Hill, 2006.
(Chapters 1.2, 2, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 18, 19).
Reference Books
UNIX and Shell Programming, Behrouz A. Forouzan and Richard F. Gilberg, Thomson,
2005.
Unix & Shell Programming, M.G. Venkateshmurthy, Pearson Education, 2005.
Table of Contents

Sl No   Unit description                               Page no
1       Unit 1  The Unix Operating System              1-19
2       Unit 2  Basic File Attributes                  20-34
3       Unit 3  The Shell, The Process                 35-62
4       Unit 4  More file attributes                   63-77
5       Unit 5  Filters using regular expressions      78-89
6       Unit 6  Essential Shell Programming            90-124
7       Unit 7  awk - An Advanced Filter               125-146
8       Unit 8  perl - The Master Manipulator          147-160
UNIT 1
The Unix Operating System, The UNIX Architecture and Command Usage, The File System
6 Hours
Text Book
1. UNIX Concepts and Applications, Sumitabha Das, 4th Edition, Tata
McGraw Hill, 2006.
(Chapters 1.2, 2, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 18, 19).
Reference Books
UNIX and Shell Programming, Behrouz A. Forouzan and Richard F. Gilberg,
Thomson, 2005.
Unix & Shell Programming, M.G. Venkateshmurthy, Pearson Education, 2005.
Objectives
System Utilities
The UNIX operating system allows complex tasks to be performed with a few keystrokes. However, it does not warn the user about the consequences of a command.
Kernighan and Pike (The UNIX Programming Environment) lamented long ago that as
the UNIX system has spread, the fraction of its users who are skilled in its application has
decreased. However, the capabilities of UNIX are limited only by your imagination.
2. Features of UNIX OS
Several features of UNIX have made it popular. Some of them are:
Portable
UNIX can be installed on many hardware platforms. Its widespread use can be traced to
the decision to develop it using the C language.
Multiuser
The UNIX design allows multiple users to concurrently share hardware and software
Multitasking
UNIX allows a user to run more than one program at a time. In fact, more than one
program can be running in the background while a user is working in the foreground.
Networking
While UNIX was developed to be an interactive, multiuser, multitasking system,
networking is also incorporated into the heart of the operating system. Access to another
system uses a standard communications protocol known as Transmission Control
Protocol/Internet Protocol (TCP/IP).
Organized File System
UNIX has a very organized file and directory system that allows users to organize and
maintain files.
Device Independence
UNIX treats input/output devices like ordinary files. The source or destination for file
input and output is easily controlled through a UNIX design feature called redirection.
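A short illustration of redirection (a minimal sketch; the filename listing.txt is just an example):

```shell
# Redirection: '>' sends a command's output to a file instead of
# the terminal
ls > listing.txt

# Because devices are treated as files, the same syntax works with
# device files; here the output is discarded via the null device
ls > /dev/null
```

The same mechanism works in reverse with `<`, which makes a file the source of a command's input.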
Utilities
UNIX provides a rich library of utilities that can be used to increase user productivity.
Ken Thompson then teamed up with Dennis Ritchie, the author of the first C compiler, in
1973. They rewrote the UNIX kernel in C - this was a big step forward in terms of the
system's portability - and released the Fifth Edition of UNIX to universities in 1974. The
Seventh Edition, released in 1978, marked a split in UNIX development into two main
branches: SYSV (System V) and BSD (Berkeley Software Distribution). BSD arose from
the University of California at Berkeley, where Ken Thompson spent a sabbatical year. Its
development was continued by students at Berkeley and other research institutions.
SYSV was developed by AT&T and other commercial companies. UNIX flavors based
on SYSV have traditionally been more conservative, but better supported, than BSD-based flavors.
Until recently, UNIX standards were nearly as numerous as its variants. In the early
days, AT&T published a document called the System V Interface Definition (SVID).
X/OPEN (now The Open Group), a consortium of vendors and users, had one too, in
the X/Open Portability Guide (XPG). In the US, yet another set of standards, named
the Portable Operating System Interface for Computer Environments (POSIX), was
developed at the behest of the Institute of Electrical and Electronics Engineers
(IEEE).
In 1998, X/OPEN and IEEE undertook an ambitious program of unifying the two
standards. In 2001, this joint initiative resulted in a single specification called the
Single UNIX Specification, Version 3 (SUSV3), that is also known as IEEE
1003.1:2001 (POSIX.1). In 2002, the International Organization for Standardization
(ISO) approved SUSV3 and IEEE 1003.1:2001.
Some of the commercial UNIX flavors based on System V are:
IBM's AIX
Hewlett-Packard's HP-UX
SCO's OpenServer Release 5
Silicon Graphics' IRIX
DEC's Digital UNIX
Sun Microsystems' Solaris 2
Conclusion
In this chapter we defined an operating system. We also looked at the history of UNIX and
the features of UNIX that make it a popular operating system. We also discussed the
convergence of different flavors of UNIX into the Single UNIX Specification (SUS) and the
Portable Operating System Interface for Computing Environments (POSIX).
Objectives
[Figure: the layers of the UNIX architecture - users interact with the shell, the shell with the kernel, and the kernel with the hardware through system calls.]
The UNIX architecture comprises two major components, viz., the shell and the kernel. The
kernel interacts with the machine's hardware, and the shell with the user.

The kernel is the core of the operating system. It is a collection of routines written in C. It
is loaded into memory when the system is booted and communicates directly with the
hardware. User programs that need to access the hardware use the services of the kernel
via system calls, and the kernel performs the job on behalf of the user. The kernel is
also responsible for managing the system's memory, scheduling processes and deciding
their priorities.

The shell performs the role of command interpreter. Even though there's only one kernel
running on the system, there could be several shells in action, one for each user who's
logged in. The shell is responsible for interpreting the meaning of metacharacters, if any,
found on the command line before dispatching the command to the kernel for execution.
2. Locating Files
All UNIX commands are single words like ls, cd, cat, etc. These names are in lowercase.
These commands are essentially files containing programs, mainly written in C. Files are
stored in directories, and so are the binaries associated with these commands. You can
find the location of an executable program using the type command:
$ type ls
ls is /bin/ls
This means that when you execute ls command, the shell locates this file in /bin directory
and makes arrangements to execute it.
The Path
The sequence of directories that the shell searches to look for a command is specified in
its own PATH variable. These directories are colon separated. When you issue a
command, the shell searches this list in the sequence specified to locate and execute it.
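The search list can be inspected by printing the PATH variable (the directories shown in the comment are typical values, not fixed ones):

```shell
# Print the colon-separated list of directories the shell searches
echo "$PATH"
# A typical value looks like /usr/local/bin:/usr/bin:/bin
```

If a command's directory is not in this list, the shell reports "command not found" even though the file exists on disk.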
4. Command Structure
UNIX commands take the following general form:
verb [options] [arguments]
where verb is the command name that can take a set of optional options and one or more
optional arguments.
Commands, options and arguments have to be separated by spaces or tabs to enable the
shell to interpret them as words. A contiguous string of spaces and tabs together is called
a whitespace. The shell compresses multiple occurrences of whitespace into a single
whitespace.
Options
An option is preceded by a minus sign (-) to distinguish it from filenames.
Example: $ ls -l
There must not be any whitespace between - and l. Options are also arguments, but are
given a special name because they are predetermined. Options can normally be combined
with only one - sign. i.e., instead of using
$ ls -l -a -t
we can as well use,
$ ls -lat
Because UNIX was developed by people who had their own ideas as to what options
should look like, there will be variations in the options. Some commands use + as an
option prefix instead of -.
Filename Arguments
Many UNIX commands use a filename as argument so that the command can take input
from the file. If a command uses a filename as argument, it will usually be the last
argument, after all options.
Example:
cp file1 file2 file3 dest_dir
rm file1 file2 file3
The command with its options and arguments is known as the command line, which is
considered complete after the [Enter] key is pressed, so that the entire line is fed to the
shell as its input for interpretation and execution.
Exceptions
Some commands in UNIX, like pwd, do not take any options or arguments. Some
commands, like who, may or may not be specified with arguments. The ls command can
run without arguments (ls), with only options (ls -l), with only filenames (ls f1 f2), or
using a combination of both (ls -l f1 f2). Some commands compulsorily take options
(cut). Some commands, like grep and sed, can take an expression as an argument, or a set of
instructions as argument.
Combining Commands
Instead of executing commands on separate lines, where each command is processed and
executed before the next could be entered, UNIX allows you to specify more than one
command in the single command line. Each command has to be separated from the other
by a ; (semicolon).
wc sample.txt ; ls -l sample.txt
You can even group several commands together so that their combined output is
redirected to a file.
(wc sample.txt ; ls -l sample.txt) > newfile
When a command line contains a semicolon, the shell understands that the command on
each side of it needs to be processed separately. Here ; is known as a metacharacter.
Note: When a command overflows into the next line or needs to be split into multiple
lines, just press enter, so that the secondary prompt (normally >) is displayed and you can
enter the remaining part of the command on the next line.
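The behaviour can be seen by leaving a quoted string open across lines (a minimal sketch):

```shell
# When a command is incomplete at the end of a line, the shell shows
# its secondary prompt (PS2, normally >) and waits for the rest
echo "this string continues
on the next line"
```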
When you use the man command, it starts searching the manuals from section 1. If it
locates a keyword in one section, it won't continue the search, even if the keyword occurs
in another section. However, we can provide the section number as an additional argument
to the man command.
For example, passwd appears in section 1 and section 4. If we want to get documentation
of passwd in section 4, we use,
$ man 4 passwd
OR
$ man -s4 passwd (on Solaris)
A man page is divided into a number of compulsory and optional sections. Not every
command needs all sections, but the first three (NAME, SYNOPSIS and
DESCRIPTION) are generally seen in all man pages. NAME presents a one-line
introduction of the command. SYNOPSIS shows the syntax used by the command and
DESCRIPTION provides a detailed description.
The SYNOPSIS follows certain conventions and rules:
If a command argument is enclosed in rectangular brackets, then it is optional;
otherwise, the argument is required.
The ellipsis (a set of three dots) implies that there can be more instances of the
preceding word.
The | means that only one of the options shown on either side of the pipe can be
used.
All the options used by the command are listed in OPTIONS section. There is a separate
section named EXIT STATUS which lists possible error conditions and their numeric
representation.
Note: You can use man command to view its own documentation ($ man man). You can
also set the pager to use with man ($ PAGER=less ; export PAGER). To understand
which pager is being used by man, use $ echo $PAGER.
The following table shows the organization of man documentation.

Section   Subject (SVR4)                Subject (Linux)
1         User programs                 User programs
2         Kernel's system calls         Kernel's system calls
3         Library functions             Library functions
4         Administrative file formats   Special files (in /dev)
5         Miscellaneous                 Administrative file formats
6         Games                         Games
7         Special files (in /dev)       Macro packages and conventions
8         Administration commands       Administration commands
The following table lists the terminal keyboard sequences and their functions (the keys shown are the usual defaults; the actual bindings depend on the terminal settings):

Keystroke            Function
[Ctrl-h]             Erases text
[Ctrl-c] or Delete   Interrupts a command
[Ctrl-d]             Terminates login session or a program that expects its input from keyboard
[Ctrl-s]             Stops scrolling of screen output and locks keyboard
[Ctrl-q]             Resumes scrolling of screen output and unlocks keyboard
[Ctrl-u]             Kills command line without executing it
[Ctrl-\]             Kills running program but creates a core file containing the memory image of the program
[Ctrl-z]             Suspends process and returns shell prompt; use fg to resume job
[Ctrl-j]             Alternative to [Enter]
[Ctrl-m]             Alternative to [Enter]
stty sane            Restores terminal to normal status
Conclusion
In this chapter, we looked at the architecture of UNIX and the division of labor between
two agencies, viz., the shell and the kernel. We also looked at the structure and usage of
UNIX commands. The man documentation will be the most valuable source of
documentation for UNIX commands. Also, the keyboard sequences sometimes won't
work as expected because of different terminal settings; we listed the possible
remedial keyboard sequences for when that happens.
Objectives
Types of files
UNIX Filenames
Directories and Files
Absolute and Relative Pathnames
pwd - print working directory
cd - change directory
mkdir - make a directory
rmdir - remove directory
The PATH environment variable
ls - list directory contents
The UNIX File System
1. Types of files
A simple description of the UNIX system is this:
On a UNIX system, everything is a file; if something is not a file, it is a process.
A UNIX system makes no difference between a file and a directory, since a directory is
just a file containing names of other files. Programs, services, texts, images, and so forth,
are all files. Input and output devices, and generally all devices, are considered to be files,
according to the system.
Most files are just files, called regular files; they contain normal data, for example text
files, executable files or programs, input for or output from a program and so on.
While it is reasonably safe to suppose that everything you encounter on a UNIX system is
a file, there are some exceptions.
Directories: files that are lists of other files.
Special files or Device Files: All devices and peripherals are represented by files. To read
or write a device, you have to perform these operations on its associated file. Most
special files are in /dev.
Links: a system to make a file or directory visible in multiple parts of the system's file
tree.
(Domain) sockets: a special file type, similar to TCP/IP sockets, providing interprocess
networking protected by the file system's access control.
Named pipes: act more or less like sockets and form a way for processes to communicate
with each other, without using network socket semantics.
Directory File
A directory contains no data, but keeps details of the files and subdirectories that it
contains. A directory file contains one entry for every file and subdirectory that it houses.
Each entry has two components namely, the filename and a unique identification number
of the file or directory (called the inode number).
When you create or remove a file, the kernel automatically updates its corresponding
directory by adding or removing the entry (filename and inode number) associated with
the file.
Device File
All the operations on the devices are performed by reading or writing the file representing
the device. It is advantageous to treat devices as files as some of the commands used to
access an ordinary file can be used with device files as well.
Device filenames are found in a single directory structure, /dev. A device file is not really
a stream of characters. It is the attributes of the file that entirely govern the operation of
the device. The kernel identifies a device from its attributes and uses them to operate the
device.
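A listing of a device file makes this concrete (using /dev/null as an example; the exact attributes shown vary across systems):

```shell
# The first character of a long listing identifies the file type;
# 'c' marks a character special file such as /dev/null
ls -l /dev/null
# the listing begins with 'c', e.g. crw-rw-rw- ...
```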
2. Filenames in UNIX
On a UNIX system, a filename can consist of up to 255 characters. Files may or may not
have extensions and can consist of practically any ASCII character except the / and the
Null character. You are permitted to use control characters or other nonprintable
characters in a filename. However, you should avoid using these characters while naming
a file. It is recommended that only the following characters be used in filenames:
Alphabets and numerals.
The period (.), hyphen (-) and underscore (_).
UNIX imposes no restrictions on the extension. In all cases, it is the application that
imposes the restriction, e.g., a C compiler expects C program filenames to end with .c,
and Oracle requires SQL scripts to have the .sql extension.
A file can have any number of dots embedded in its name. A filename can also begin
or end with a dot.
UNIX is case sensitive; cap01, Chap01 and CHAP01 are three different filenames that
can coexist in the same directory.
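This can be verified directly (using the filenames mentioned above; the demonstration assumes a case-sensitive filesystem):

```shell
# Three distinct files whose names differ only in case
touch cap01 Chap01 CHAP01
ls cap01 Chap01 CHAP01 | wc -l
# prints 3 on a case-sensitive filesystem
```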
For example, /home/frank/src is an absolute pathname.
6. cd - change directory
You can change to a new directory with the cd, change directory, command. cd will
accept both absolute and relative path names.
Syntax
cd [directory]
Examples
cd
changes to user's home directory
cd /
changes directory to the system's root
cd .. goes up one directory level
cd ../.. goes up two directory levels
cd /full/path/name/from/root changes directory to absolute path named
(note the leading slash)
cd path/from/current/location changes directory to path relative to current
location (no leading slash)
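A sketch of these forms in action (the directory names are examples):

```shell
cd /tmp       # absolute pathname: note the leading slash
pwd           # prints /tmp
cd ..         # up one level, to /
pwd           # prints /
cd            # no argument: back to the home directory
```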
Common Options
When no argument is used, the listing will be of the current directory. There are many
very useful options for the ls command. A listing of many of them follows. When using
the command, string the desired options together preceded by "-".
-a Lists all files, including those beginning with a dot (.).
-d Lists only names of directories, not the files in the directory
-F Indicates type of entry with a trailing symbol: executables with *, directories with / and
symbolic links with @
-R Recursive list
-u Sorts filenames by last access time
-t Sorts filenames by last modification time
-i Displays inode number
-l Long listing: lists the mode, link information, owner, size and last modification time. If the file is
a symbolic link, an arrow (->) precedes the pathname of the linked-to file.
The mode field is given by the -l option and consists of 10 characters. The first character
is one of the following:
CHARACTER   IF ENTRY IS A
-           plain file
d           directory
b           block-type special file
c           character-type special file
l           symbolic link
s           socket
The next 9 characters are in 3 sets of 3 characters each. They indicate the file access
permissions: the first 3 characters refer to the permissions for the user, the next three for
the users in the Unix group assigned to the file, and the last 3 to the permissions for other
users on the system.
Designations are as follows:
r read permission
w write permission
x execute permission
- no permission
Examples
1. To list the files in a directory:
$ ls
Directory   Content
/bin        Common programs, shared by the system, the system administrator and the users.
/dev        Contains references to all the CPU peripheral hardware, which are represented as files with special properties.
/etc        Most important system configuration files are in /etc; this directory contains data similar to those in the Control Panel in Windows.
/home       Home directories of the common users.
/lib        Library files, includes files for all kinds of programs needed by the system and the users.
/sbin       Programs for use by the system and the system administrator.
/tmp        Temporary space for use by the system, cleaned upon reboot, so don't use this for saving any work!
/usr        Programs, libraries, documentation etc. for all user-related programs.
/var        Storage for all variable and temporary files created by users, such as log files.
Conclusion
In this chapter we looked at the UNIX file system and the different types of files UNIX
understands. We also discussed the commands that are specific to directory files,
viz., pwd, mkdir, cd, rmdir and ls. These commands have no relevance to ordinary or
device files. We also saw file-naming conventions in UNIX. The difference between
absolute and relative pathnames was highlighted next. Finally, we described some of the
important subdirectories contained under root (/).
UNIT 2
Basic File Attributes
6 Hours
Text Book
2. UNIX Concepts and Applications, Sumitabha Das, 4th Edition, Tata McGraw
Hill, 2006.
(Chapters 1.2, 2, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 18, 19).
Reference Books
UNIX and Shell Programming, Behrouz A. Forouzan and Richard F. Gilberg, Thomson,
2005.
Unix & Shell Programming, M.G. Venkateshmurthy, Pearson Education, 2005.
The file type and its permissions are associated with each file. Links indicate the
number of filenames maintained by the system; this does not mean that there are that
many copies of the file. A file is created by its owner, and every owner is attached to a
group owner. The file size in bytes is displayed, followed by the last modification time. If
you change only the permissions or ownership of the file, the modification time remains
unchanged. The last field displays the filename.
For example,
$ ls -l
total 72
-rw-r--r--   1   kumar   metal   ...   chap01
-rw-r--r--   1   kumar   metal   ...   chap02
-rw-rw-rw-   1   kumar   metal   ...   dept.lst
-rw-r--r--   1   kumar   metal   ...   genie.sh
drwxr-xr-x   2   kumar   metal   ...   helpdir
drwxr-xr-x   2   kumar   metal   ...   progs
Directories are easily identified in the listing by the first character of the first
column, which here shows a d. The significance of the attributes of a directory differs a
good deal from an ordinary file. To see the attributes of a directory rather than the files
contained in it, use ls -ld with the directory name. Note that simply using ls -d will not
list all subdirectories in the current directory. Strange though it may seem, ls has no
option to list only directories.
File Ownership
When you create a file, you become its owner. Every owner is attached to a group
owner. Several users may belong to a single group, but the privileges of the group are set
by the owner of the file and not by the group members. When the system administrator
creates a user account, he has to assign these parameters to the user:
The user-id (UID) - both its name and numeric representation
The group-id (GID) - both its name and numeric representation
File Permissions
UNIX follows a three-tiered file protection system that determines a files access
rights. It is displayed in the following format:
Filetype owner (rwx) groupowner (rwx) others (rwx)
For Example:
-rwxr-xr-- 1 kumar metal 20500 may 10 19:21 chap02
rwx   owner (user)
r-x   group owner
r--   others
The first group has all three permissions. The file is readable, writable and
executable by the owner of the file. The second group has a hyphen in the middle slot,
which indicates the absence of write permission by the group owner of the file. The third
group has the write and execute bits absent. This set of permissions is applicable to others.
You can set different permissions for the three categories of users: owner, group
and others. It's important that you understand them, because a little learning here can be a
dangerous thing. A faulty file permission is a sure recipe for disaster.

Changing File Permissions

A file or a directory is created with a default set of permissions, which can be
determined by umask. Let us assume that the file permission for the created file is
-rw-r--r--. Using the chmod command, we can change the file permissions and allow the
owner to execute his file. The command can be used in two ways:
In a relative manner by specifying the changes to the current permissions
In an absolute manner by specifying the final permissions
Relative Permissions
chmod only changes the permissions specified in the command line and leaves the
other permissions unchanged. Its syntax is:
chmod category operation permission filename(s)
chmod takes an expression as its argument which contains:
user category (user, group, others)
operation to be performed (assign or remove a permission)
type of permission (read, write, execute)
Category        Operation      Permission
u - user        + assign       r - read
g - group       - remove       w - write
o - others      = absolute     x - execute
a - all (ugo)
-rw-r--r--   1   kumar   metal   ...   23:38   xstart

chmod u+x xstart

-rwxr--r--   1   kumar   metal   ...   23:38   xstart

The command assigns (+) execute (x) permission to the user (u); other permissions
remain unchanged. To assign execute permission to all three categories, any of the
following equivalent forms may be used:

chmod ugo+x xstart
chmod a+x xstart
chmod +x xstart

-rwxr-xr-x   1   kumar   metal   ...   23:38   xstart
Absolute Permissions
Here, we need not know the current file permissions. We can set all nine
permissions explicitly. A string of three octal digits is used as an expression, with one
octal digit for each category, obtained by adding the octal values of its individual
permissions. If we represent the permissions of each category by one octal digit, this
is how the permission can be represented:
Octal   Permissions   Significance
0       ---           no permissions
1       --x           execute only
2       -w-           write only
3       -wx           write and execute
4       r--           read only
5       r-x           read and execute
6       rw-           read and write
7       rwx           read, write and execute
We have three categories and three permissions for each category, so three octal
digits can describe a file's permissions completely. The most significant digit represents
the user and the least significant one represents others. chmod can use this three-digit
string as the expression.
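For instance, the common setting 755 combines the table's values as follows (the filename myscript.sh is hypothetical):

```shell
# 7 = rwx for the owner, 5 = r-x for the group, 5 = r-x for others
touch myscript.sh
chmod 755 myscript.sh
ls -l myscript.sh
# the mode field reads -rwxr-xr-x
```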
Using relative permissions, we have,

chmod -R a+x shell_scripts

or, using absolute permissions,

chmod -R 755 shell_scripts
This makes all the files and subdirectories found in the shell_scripts directory executable
by all users. When you know the shell metacharacters well, you will appreciate that the *
doesn't match filenames beginning with a dot. The dot is generally safer, but note that
both commands change the permissions of directories also.
Directory Permissions
It is possible that a file cannot be accessed even though it has read permission,
and can be removed even when it is write protected. The default permissions of a
directory are,
rwxr-xr-x (755)
A directory must never be writable by group and others
Example:
mkdir c_progs
ls -ld c_progs
drwxr-xr-x   ...   c_progs
If a directory has write permission for group and others also, be assured that every
user can remove every file in the directory. As a rule, you must not make directories
universally writable unless you have definite reasons to do so.
Changing File Ownership
On BSD and AT&T systems, there are two commands meant to change the
ownership of a file or directory. Let kumar be the owner and metal be the group owner. If
sharma copies a file of kumar, then sharma will become its owner and he can manipulate
its attributes.

chown - changing the file owner, and chgrp - changing the group owner

On BSD, only the system administrator can use chown.
On other systems, only the owner can change both.
chown

Changing ownership requires superuser permission, so use the su command:

ls -l note
-rwxr----x   1   kumar   metal   ...   note

chown sharma note
Once ownership of the file has been given away to sharma, the user file
permissions that previously applied to kumar now apply to sharma. Thus, kumar can no
longer edit note since there is no write privilege for group and others. He cannot get back
the ownership either. But he can copy the file to his own directory, in which case he
becomes the owner of the copy.
chgrp
This command changes the files group owner. No superuser permission is required.
ls -l dept.lst
-rw-r--r--   1   kumar   metal   ...   dept.lst

chgrp dba dept.lst
ls -l dept.lst
-rw-r--r--   1   kumar   dba   ...   dept.lst
Source: Sumitabha Das, UNIX Concepts and Applications, 4th edition, Tata
McGraw Hill, 2006
The vi Editor
To write and edit programs and scripts, we require editors. UNIX provides the vi
editor, created by Bill Joy for the BSD system. Bram Moolenaar improved the vi editor
and called it vim (vi improved) on Linux.
vi Basics
To add some text to a file, we invoke,
vi <filename>
In all probability, the file doesn't exist, and vi presents you a full screen with the
filename shown at the bottom with the qualifier. The cursor is positioned at the top and
all remaining lines of the screen show a ~; they are non-existent lines. The last line is
reserved for commands that you can enter to act on text. This line is also used by the
system to display messages. This is the command mode, the mode where you can
pass commands to act on text using most of the keys of the keyboard. This is the default
mode of the editor, where every key pressed is interpreted as a command to run on text.
You will have to be in this mode to copy and delete text.
For text editing, vi uses 24 of the 25 lines that are normally available in the
terminal. To enter text, you must switch to the input mode. First press the key i, and you
are in this mode, ready to input text. Subsequent key depressions will then show up on the
screen as text input.
After text entry is complete, the cursor is positioned on the last character of the
last line. This is known as the current line, and the character where the cursor is stationed
is the current cursor position. The execute mode is used to handle files and perform
substitution. After the command is run, you are back to the default command mode. If a
word has been misspelled, use ctrl-w to erase the entire word.
Now press the Esc key to revert to command mode. Press it again and you will hear a
beep; a beep in vi indicates that a key has been pressed unnecessarily. Actually, the text
entered has not been saved on disk but exists in some temporary storage called a buffer.
To save the entered text, you must switch to the execute mode (the last line mode).
Invoke the execute mode from the command mode by entering a : (colon), which shows
up in the last line.
The Repeat Factor
vi provides a repeat factor in command and input mode commands. The command
mode command k moves the cursor one line up; 10k moves the cursor 10 lines up.
To undo whenever you make a mistake, press Esc and then u.
COMMAND   FUNCTION
i         inserts text to the left of the cursor
a         appends text to the right of the cursor
I         inserts text at beginning of line
A         appends text at end of line
o         opens line below
O         opens line above
r         replaces a single character
R         replaces with a text (overwrites)
S         replaces entire line
Command          Action
:w               saves file and remains in editing mode
:x               saves and quits editing mode
:wq              saves and quits editing mode
:w <filename>    save as
:w! <filename>   save as, but overwrites existing file
:q               quits editing mode
:q!              quits editing mode by rejecting changes made
:sh              escapes to UNIX shell
:recover         recovers file from a crash
Navigation
A command mode command doesn't show up on screen but simply performs a function.
To move the cursor in the four directions,

k   moves cursor up
j   moves cursor down
h   moves cursor left
l   moves cursor right
Word Navigation
Moving by one character is not always enough. You will often need to move faster
along a line. vi understands a word as a navigation unit, which can be defined in two ways
depending on the key pressed. If your cursor is a number of words away from your
desired position, you can use the word-navigation commands to go there directly. There
are three basic commands:

b   moves back to the beginning of the word
e   moves forward to the end of the word
w   moves forward to the beginning of the word

Example,
5b takes the cursor 5 words back
3w takes the cursor 3 words forward
Moving to Line Extremes
Moving to the beginning or end of a line is a common requirement.
To move to the first character of a line, use 0 or |. A repeat factor used with | moves the
cursor to that column; 30| moves the cursor to column 30. To move to the end of the
line, use $.
Scrolling
[Ctrl-f]   scrolls forward
[Ctrl-b]   scrolls backward
Editing Text
The editing facilities in vi are very elaborate and involve the use of operators, such as,

d   delete
y   yank (copy)
Deleting Text
x
dd
yy
6dd
Moving Text
Moving text involves deleting it (d) and then putting it (p) at the new location.
page 31
10CS44
p and P place text on right and left only when you delete parts of lines. But the same keys
get associated with below and above when you delete complete lines
Copying Text
Copying text (y and p) is achieved as,

yy     copies the current line
10yy   copies the current line and the 9 lines below
Joining Lines
J    joins the current line with the next line
4J   joins the following 3 lines with the current line

Undoing
u    undoes the last editing instruction
U    undoes all changes made to the current line
vim (Linux) lets you undo and redo multiple editing instructions. u behaves
differently here; repeated use of this key progressively undoes your previous actions. You
could even have the original file in front of you. Further, 10u reverses your last 10 editing
actions. The function of U remains the same.
You may overshoot the desired mark when you keep u pressed, in which case use
ctrl-r to redo your undone actions. Further, undoing with 10u can be completely reversed
with 10 ctrl-r. The undoing limit is set by the execute mode command :set undolevels=n,
where n is set to 1000 by default.
Repeating the Last Command
The . (dot) command is used for repeating the last instruction in both editing and
command mode commands
For example:
2dd deletes 2 lines from the current line; to repeat this operation, type . (dot)
Searching for a Pattern
/ search forward
? search backward
/printf
The search begins forward and positions the cursor on the first instance of the
word.
?pattern
Searches backward for the nearest previous instance of the pattern
Repeating the Last Pattern Search
n    repeats the last pattern search in the direction of the original search
Interactive substitution: sometimes you may like to selectively replace a string. In that
case, add the c parameter as the flag at the end:
:1,$s/director/member/gc
Each line is selected in turn, followed by a sequence of carets in the next line, just below
the pattern that requires substitution. The cursor is positioned at the end of this caret
sequence, waiting for your response.
The ex mode is also used for substitution. Both search and replace operations also
use regular expressions for matching multiple patterns.
The features of the vi editor that have been highlighted so far are good enough for a
beginner, who should not proceed any further before mastering most of them. There are
many more functions that make vi a very powerful editor. Can you copy three words or
even the entire file using simple keystrokes? Can you copy or move multiple sections of
text from one file to another in a single file switch? How do you compile your C and Java
programs without leaving the editor? vi can do all this.
Source: Sumitabha Das, UNIX Concepts and Applications, 4th edition, Tata
McGraw Hill, 2006
UNIT 3
3. The Shell, The Process
7 Hours
Text Book
3. UNIX Concepts and Applications, Sumitabha Das, 4th Edition, Tata McGraw
Hill, 2006.
(Chapters 1.2, 2, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 18, 19).
Reference Books
UNIX and Shell Programming, Behrouz A. Forouzan and Richard F. Gilberg, Thomson,
2005.
Unix & Shell Programming, M.G. Venkateshmurthy, Pearson Education, 2005.
The Shell
Introduction
In this chapter we will look at one of the major components of the UNIX architecture - the
shell. The shell acts as both a command interpreter as well as a programming facility. We
will look at the interpretive nature of the shell in this chapter.
Objectives
After the command execution is complete, the prompt reappears and the shell
returns to its waiting role to start the next cycle. You are free to enter another
command.
Wild card        Matches
*                Any number of characters including none
?                A single character
[ijk]            A single character - either an i, j or k
[x-z]            A single character that is within the ASCII range of the characters x and z
[!ijk]           A single character that is not an i, j or k (Not in C shell)
[!x-z]           A single character that is not within the ASCII range of the characters x and z (Not in C shell)
{pat1,pat2}      pat1, pat2, etc. (Not in Bourne shell)
Examples:
To list all files that begin with chap, use
$ ls chap*
To list all files whose filenames are six characters long and start with chap, use
$ ls chap??
Note: Both * and ? operate with some restrictions. For example, the * doesn't match all
files beginning with a . (dot) or the / of a pathname. If you wish to list all hidden
filenames in your directory having at least three characters after the dot, the dot must be
matched explicitly.
$ ls .???*
However, if the filename contains a dot anywhere but at the beginning, it need not be
matched explicitly.
Similarly, these characters don't match the / in a pathname. So, you cannot use
$ cd /usr?local
to change to /usr/local.
- To match all filenames with a single-character extension but not the .c or .o files,
use *.[!co]
- To match all filenames that don't begin with an alphabetic character,
use [!a-zA-Z]*
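These restrictions can be checked quickly in a scratch directory. The following is an illustrative sketch; the directory and the filenames (chap01, note.c, .hidden.rc and so on) are invented for the demo:

```shell
# Create a scratch directory with a few invented filenames
mkdir -p /tmp/wcdemo && cd /tmp/wcdemo
touch chap chap01 chap02 note.c note.o note.h .hidden.rc

echo chap??      # chap01 chap02 -- but not chap (too short)
echo *.[!co]     # note.h -- single-character extension, not .c or .o
echo .???*       # .hidden.rc -- the leading dot matched explicitly
```

echo is used here instead of ls so the expansion done by the shell itself is visible; ls would simply list the same names.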
Matching totally dissimilar patterns
This feature is not available in the Bourne shell. To copy all the C and Java source
programs from another directory, we can delimit the patterns with a comma and then put
curly braces around them.
$ cp $HOME/prog_sources/*.{c,java} .
The Bourne shell requires two separate invocations of cp to do this job.
$ cp /home/srm/{project,html,scripts}/* .
The above command copies all files from three directories (project, html and scripts) to
the current directory.
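A minimal sketch of the same idea, using invented directories under /tmp so it can be tried safely:

```shell
# Brace expansion copies dissimilar patterns in one cp invocation (bash/korn)
mkdir -p /tmp/bd/src /tmp/bd/dst
touch /tmp/bd/src/a.c /tmp/bd/src/b.java /tmp/bd/src/readme.txt
cd /tmp/bd/dst
cp /tmp/bd/src/*.{c,java} .   # copies a.c and b.java, but not readme.txt
ls
```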
Standard error: The file (stream) representing error messages that emanate from the
command or shell, connected to the display.
The standard input can represent three input sources:
The keyboard, the default source.
A file using redirection with the < symbol.
Another program using a pipeline.
The standard output can represent three possible destinations:
The terminal, the default destination.
A file using the redirection symbols > and >>.
As input to another program using a pipeline.
A file is opened by referring to its pathname, but subsequent read and write operations
identify the file by a unique number called a file descriptor. The kernel maintains a table
of file descriptors for every process running in the system. The first three slots are
generally allocated to the three standard streams as,
0 Standard input
1 Standard output
2 Standard error
These descriptors are implicitly prefixed to the redirection symbols.
Examples:
Assuming file2 doesn't exist, the following command redirects the standard output to the file
myOutput and the standard error to the file myError.
$ ls -l file1 file2 1>myOutput 2>myError
To redirect both standard output and standard error to a single file use:
$ ls -l file1 file2 1>myOutput 2>>myOutput OR
$ ls -l file1 file2 1>myOutput 2>&1
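The following sketch reproduces this with an invented file1 (file2 deliberately absent), so the two streams can be inspected separately:

```shell
cd /tmp && echo data > file1 && rm -f file2 myOutput myError
ls -l file1 file2 1>myOutput 2>myError   # ls complains about file2 on descriptor 2
cat myOutput    # the listing of file1
cat myError     # the error message about the missing file2
```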
7. Pipes
With piping, the output of a command can be used as input (piped) to a subsequent
command.
$ command1 | command2
Output from command1 is piped into input for command2.
This is equivalent to, but more efficient than:
$ command1 > temp
$ command2 < temp
$ rm temp
Examples
$ ls -al | more
$ who | sort | lpr
8. Creating a tee
tee is an external command that handles a character stream by duplicating its input. It
saves one copy in a file and writes the other to standard output. It is also a filter and
hence can be placed anywhere in a pipeline.
Example: The following command sequence uses tee to display the output of who and
saves this output in a file as well.
$ who | tee users.lst
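Because the output of who varies from session to session, this sketch feeds tee a fixed line instead; the principle is identical:

```shell
cd /tmp
echo "kumar pts/7 Jan 31 11:15" | tee users.lst   # one copy to standard output...
cat users.lst                                      # ...and one copy saved in the file
```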
9. Command substitution
The shell enables the connecting of two commands in yet another way. While a pipe
enables a command to obtain its standard input from the standard output of another
command, the shell enables one or more command arguments to be obtained from the
standard output of another command. This feature is called command substitution.
Example:
$ echo Current date and time is `date`
Observe the use of backquotes around date in the above command. Here the output of the
command execution of date is taken as argument of echo. The shell executes the enclosed
command and replaces the enclosed command line with the output of the command.
Similarly the following command displays the total number of files in the working
directory.
$ echo "There are `ls | wc -l` files in the current directory"
Observe the use of double quotes around the argument of echo. If you use single quotes
instead, the backquotes are not interpreted by the shell.
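A runnable sketch of the feature; the newer $(...) form, equivalent to backquotes in bash and Korn, is shown alongside:

```shell
now=`date`                  # backquote form of command substitution
echo "Current date and time is $now"
count=$(ls /tmp | wc -l)    # $(...) form
echo "There are $count files in /tmp"
```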
$ base=foo ; ext=.c
$ file=$base$ext
$ echo $file
foo.c
Conclusion
In this chapter we saw the major interpretive features of the shell. The following is a
summary of activities that the shell performs when a command line is encountered at the
prompt.
Parsing: The shell first breaks up the command line into words using spaces
and tabs as delimiters, unless quoted. All consecutive occurrences of a space
or tab are replaced with a single space.
Variable evaluation: All $-prefixed strings are evaluated as variables, unless
quoted or escaped.
Command substitution: Any command surrounded by backquotes is executed
by the shell, which then replaces the standard output of the command into the
command line.
Redirection: The shell then looks for the characters >, < and >> to open the
files they point to.
Wild-card interpretation: The shell then scans the command line for wild-cards (the
characters *, ?, [ and ]). Any word containing a wild-card is replaced by a sorted list of
filenames that match the pattern. The list of these filenames then forms the arguments to
the command.
PATH evaluation: It finally looks for the PATH variable to determine the
sequence of directories it has to search in order to find the associated binary.
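A small sketch showing two of these steps - variable evaluation and wild-card interpretation - and how quoting suppresses them (the directory, file and variable names are invented):

```shell
mkdir -p /tmp/parse && cd /tmp/parse
touch star.txt
name=world
echo $name *      # the shell evaluates the variable and expands the wild card
echo '$name *'    # single quotes suppress both steps
```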
The Process
Introduction
A process is an OS abstraction that enables us to look at files and programs as their time
image. This chapter discusses processes, the mechanism of creating a process, different
states of a process and also the ps command with its different options. A discussion on
creating and controlling background jobs will be made next. We also look at three
commands viz., at, batch and cron for scheduling jobs. This chapter also looks at nice
command for specifying job priority, signals and time command for getting execution
time usage statistics of a command.
Objectives
Process Basics
ps: Process Status
Mechanism of Process Creation
Internal and External Commands
Process States and Zombies
Background Jobs
nice: Assigning execution priority
Processes and Signals
job Control
at and batch: Execute Later
cron command: Running Jobs Periodically
time: Timing Usage Statistics at process runtime
1. Process Basics
UNIX is a multiuser and multitasking operating system. Multiuser means that several
people can use the computer system simultaneously (unlike a single-user operating
system, such as MS-DOS). Multitasking means that UNIX, like Windows NT, can work
on several tasks concurrently; it can begin work on one task and take up another before
the first task is finished.
When you execute a program on your UNIX system, the system creates a special
environment for that program. This environment contains everything needed for the
system to run the program as if no other program were running on the system. Stated in
other words, a process is created. A process is a program in execution. A process is said
to be born when the program starts execution and remains alive as long as the program is
active. After execution is complete, the process is said to die.
The kernel is responsible for the management of the processes. It determines the time and
priorities that are allocated to processes so that more than one process can share the CPU
resources.
Just as files have attributes, so do processes. These attributes are maintained by the
kernel in a data structure known as the process table. Two important attributes of a process
are the PID (a unique process-id allotted by the kernel when the process is born) and the
PPID (the PID of its parent).
POSIX option   BSD option   Significance
-f             f            Full listing showing the PPID of each process
-e or -A       aux          All processes (user and system)
-u user        U user       Processes of user user only
-a                          Processes of all users, excluding processes not associated with a terminal
-l             l            Long listing showing memory-related information
-t term        t term       Processes running on the terminal term
Examples
$ ps
  PID TTY      TIME     CMD
 4245 pts/7    00:00:00 bash
 5314 pts/7    00:00:00 ps
The output shows the header specifying the PID, the terminal (TTY), the cumulative
processor time (TIME) that has been consumed since the process was started, and the
process name (CMD).
$ ps -f
UID     PID    PPID   C  STIME     TTY    TIME  COMMAND
root    14931  136    0  08:37:48  ttys0  0:00  rlogind
sartin  14932  14931  0  08:37:50  ttys0  0:00  -sh
sartin  15339  14932  7  16:32:29  ttys0  0:00  ps -f
A listing of system processes shows entries such as:
TIME   CMD
0:34   sched
41:55  init
0:03   sh
2:47   cron
20:04  vi
(Diagram: init forks a getty for each terminal; getty overlays itself with login, which in
turn overlays itself with the login shell - the fork-exec mechanism behind every login.)
When the system moves to multiuser mode, init forks and execs a getty for every
active communication port.
Each one of these gettys prints the login prompt on the respective terminal and then
goes off to sleep.
When a user tries to log in, getty wakes up and fork-execs the login program to verify
login name and password entered.
On successful login, login fork-execs the process representing the login shell.
init goes off to sleep, waiting for the children to terminate. The processes getty and
login overlay themselves.
When the user logs out, it is intimated to init, which then wakes up and spawns
another getty for that line to monitor the next login.
It is possible for the parent itself to die before the child dies. In such a case, the child
becomes an orphan and the kernel makes init the parent of the orphan. When this
adopted child dies, init waits for its death.
In the following command, the sorted file and any error messages are placed in the file
nohup.out.
$ nohup sort sales.dat &
1252
Sending output to nohup.out
Note that the shell has returned the PID (1252) of the process.
When the user logs out, the child turns into an orphan. The kernel handles such situations
by reassigning the PPID of the orphan to the system's init process (PID 1) - the parent of
all shells. When the user logs out, init takes over the parentage of any process run with
nohup. In this way, you can kill a parent (the shell) without killing its child.
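A sketch of the same command, using an invented sales.dat; output is redirected here, so no nohup.out is created:

```shell
cd /tmp
printf '30\n10\n20\n' > sales.dat
nohup sort sales.dat > sorted.dat 2>/dev/null &
wait $!                # in a real session you would simply log out instead
head -1 sorted.dat     # 10
```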
Additional Points
When you run a command in the background, the shell disconnects the standard input
from the keyboard, but does not disconnect its standard output from the screen. So,
output from the command, whenever it occurs, shows up on screen. It can be confusing if
you are entering another command or using another program. Hence, make sure that both
standard output and standard error are redirected suitably.
Important:
1. You should relegate time-consuming or low-priority jobs to the background.
2. If you log out while a background job is running, it will be terminated.
A high nice value implies a lower priority. A program with a high nice number is friendly
to other programs, other users and the system; it is not an important job. The lower the
nice number, the more important a job is and the more resources it will take without
sharing them.
Example:
$ nice wc -l hugefile.txt
OR
$ nice wc -l hugefile.txt &
The default nice value is set to 10.
We can specify the nice value explicitly with the -n number option, where number is an
offset to the default. If the -n number argument is present, the priority is incremented by
that amount, up to a limit of 20.
Example:
$ nice -n 5 wc -l hugefile.txt &
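A quick runnable sketch; the input file is invented and trivial so the job finishes immediately:

```shell
printf 'b\na\n' > /tmp/nicedemo.txt
nice -n 5 sort /tmp/nicedemo.txt    # runs sort with a raised nice value; prints a, then b
```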
Issuing the kill command sends a signal to a process. The default signal is SIGTERM
signal (15). UNIX programs can send or receive more than 20 signals, each of which is
represented by a number. (Use kill -l to list all signal names and numbers.)
If the process ignores the signal SIGTERM, you can kill it with SIGKILL signal (9) as,
$ kill -9 123
OR
$ kill -s KILL 123
The system variable $! stores the PID of the last background job. You can kill the last
background job without knowing its PID by specifying $ kill $!
Note: You can kill only those processes that you own; you can't kill processes of
other users. To kill all background jobs, enter kill 0.
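A sketch that starts a throwaway background job and kills it via $!:

```shell
sleep 100 &             # a long-running background job
pid=$!
kill $pid               # sends SIGTERM (signal 15) by default
wait $pid 2>/dev/null   # reap the job; the exit status reflects the signal
```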
9. Job Control
A job is a name given to a group of processes that is typically created by piping a series
of commands using the pipeline character. You can use the job control facilities to
manipulate jobs as follows:
1. Relegate a job to the background (bg)
2. Bring it back to the foreground (fg)
3. List the active jobs (jobs)
4. Suspend a foreground job ([Ctrl-z])
5. Kill a job (kill)
The following examples demonstrate the different job control facilities.
Assume a process is taking a long time. You can suspend it by pressing [Ctrl-z].
[1] + Suspended
wc -l hugefile.txt
A suspended job is not terminated. You can now relegate it to background by,
$ bg
You can start more jobs in the background any time:
$ sort employee.dat > sortedlist.dat &
[2]
530
$ grep director emp.dat &
[3]
540
You can see a listing of these jobs using jobs command,
$ jobs
[3] + Running
grep director emp.dat &
[2] - Running
sort employee.dat > sortedlist.dat &
[1]
Suspended
wc -l hugefile.txt
You can bring a job to foreground using fg %jobno OR fg %jobname as,
$ fg %2
OR
$ fg %sort
$ at 1 pm
at> echo "Lunch with Director at 1 PM^G^G" > /dev/term/43
[Ctrl-d]
The above job will display the following message on your screen (/dev/term/43) at 1:00
PM, along with two beeps (^G^G).
Lunch with Director at 1 PM
To see which jobs you scheduled with at, enter at -l. Working with the preceding
examples, you may see the following results:
job 756603300.a at Tue Sep 11 01:00:00 2007
job 756604200.a at Fri Sep 14 14:23:00 2007
The following forms show some of the keywords and operations permissible with the at
command:
at hh:mm                  Schedules job at the hour (hh) and minute (mm) specified, using a 24-hour clock
at hh:mm month day year   Schedules job at the hour (hh), minute (mm), month, day, and year specified
at -l                     Lists scheduled jobs
at now +count time-units  Schedules the job right now plus count number of time-units; time-units can be minutes, hours, days, or weeks
at -r job_id              Cancels the job with the job number matching job_id
To sort a collection of files, print the results, and notify the user named boss that the job
is done, enter the following commands:
$ batch
sort /usr/sales/reports/* | lp
echo "Files printed, Boss!" | mailx -s "Job done" boss
[Ctrl-d]
The system returns the following response:
job 7789001234.b at Fri Sep 7 11:43:09 2007
The date and time listed are the date and time you pressed <Ctrl-d> to complete the batch
command. When the job is complete, check your mail; anything that the commands
normally display is mailed to you. Note that any job scheduled with batch command goes
into a special at queue.
crontab files are stored in the file /var/spool/cron/crontabs/<user> where <user> is the
login-id of the user. Only the root user has access to the system crontabs, while each user
should only have access to his own crontabs.
A typical entry in the crontab file of a user has the following format:
minute hour day-of-month month-of-year day-of-week command
where the time-field options are as follows:
Field           Range
-----------------------------------------------------------------------------
minute          00 through 59 (number of minutes after the hour)
hour            00 through 23 (midnight is 00)
day-of-month    01 through 31
month-of-year   01 through 12
day-of-week     01 through 07 (Monday is 01, Sunday is 07)
-----------------------------------------------------------------------------
The first five fields are time option fields. You must specify all five of these fields. Use
an asterisk (*) in a field if you want to ignore that field.
Examples:
00-10 17 * 3,6,9,12 5 find / -newer .last_time -print > backuplist
In the above entry, the find command will be executed every minute in the first 10
minutes after 5 p.m. every Friday of the months March, June, September and December
of every year.
30 07 * * 01 sort /usr/wwr/sales/weekly | mail -s "Weekly Sales" srm
In the above entry, the sort command will be executed with /usr/wwr/sales/weekly as
argument and the output is mailed to a user named srm at 7:30 a.m. each Monday.
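As a further illustration, a hypothetical crontab entry that runs an invented backup script at 11:30 p.m. on the first day of every month would look like this:

```
30 23 01 * * /home/srm/bin/monthly_backup.sh
```

The asterisks ignore the month-of-year and day-of-week fields, so only the minute, hour and day-of-month constrain the schedule.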
The sum of user time and sys time actually represents the CPU time. This could be
significantly less than the real time on a heavily loaded system.
Conclusion
In this chapter, we saw an important abstraction of the UNIX operating system viz.,
processes. We also saw the mechanism of process creation, the attributes inherited by the
child from the parent process as well as the shell's behavior when it encounters internal
commands, external commands and shell scripts. This chapter also discussed background
jobs, creation and controlling jobs as well as controlling processes using signals. We
finally described three commands viz., at, batch and cron for process scheduling, with a
discussion of time command for obtaining time usage statistics of process execution.
Objectives
The Shell
Environment Variables
Common Environment Variables
Command Aliases (bash and korn)
Command History Facility (bash and korn)
In-Line Command Editing (bash and korn)
Miscellaneous Features (bash and korn)
The Initialization Scripts
The Shell
The UNIX shell is both an interpreter as well as a scripting language. An interactive shell
turns noninteractive when it executes a script.
Bourne Shell - This shell was developed by Steve Bourne. It is the original UNIX shell.
It has strong programming features, but it is a weak interpreter.
C Shell - This shell was developed by Bill Joy. It has improved interpretive features, but
it wasn't suitable for programming.
Korn Shell - This shell was developed by David Korn. It combines the best features of the
Bourne and C shells. It has features like aliases and command history. But it lacks some
features of the C shell.
Bash Shell - This was developed by GNU. It can be considered a superset that combines
the features of the Korn and C shells. More importantly, it conforms to the POSIX
shell specification.
Environment Variables
The environment variables are managed by the shell. As opposed to regular shell
variables, environment variables are inherited by any program you start, including
another shell. New processes are assigned a copy of these variables, which they can read,
modify and pass on in turn to their own child processes.
The set statement displays all variables available in the current shell, but the env command
displays only environment variables. Note that env is an external command and runs in a
child process.
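The inheritance can be sketched in a few lines; LOCAL and GLOBAL are invented names:

```shell
LOCAL=abc                  # plain shell variable - not inherited
export GLOBAL=xyz          # environment variable - inherited by children
sh -c 'echo "GLOBAL=$GLOBAL LOCAL=$LOCAL"'   # the child shell sees only GLOBAL
```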
There is nothing special about the environment variable names. The convention is to use
uppercase letters for naming one.
Variable     Stored information
HISTSIZE     size of the shell history file in number of lines
HOME         path to your home directory
HOSTNAME     name of the host machine
LOGNAME      login name
MAIL         location of your incoming mail folder
MAILCHECK    interval (in seconds) at which the shell checks for incoming mail
MANPATH      paths to search for man pages
PATH         list of directories searched by the shell to locate a command
PS1          primary prompt
PS2          secondary prompt
PWD          current working directory
SHELL        current shell
TERM         terminal type
UID          user ID
USER         user name
CDPATH       list of directories searched by cd when a relative pathname is used
The prompt strings (PS1, PS2): The prompt that you normally see (the $ prompt) is the
shell's primary prompt specified by PS1. PS2 specifies the secondary prompt (>). You
can change the prompt by assigning a new value to these environment variables.
Shell used by the commands with shell escapes (SHELL): This environment variable
specifies the login shell as well as the shell that interprets the command if preceded with
a shell escape.
The Bash and Korn prompt can do much more than displaying such simple information as
your user name, the name of your machine and some indication about the present
working directory. Some examples are demonstrated next.
$ PS1='[$PWD] '
[/home/srm] cd progs
[/home/srm/progs] _
Bash and Korn also support a history facility that treats a previous command as an event
and associates it with a number. This event number is represented as !.
$ PS1='[!] '
$ PS1='[! $PWD] '
[42] _
[42 /home/srm/progs] _
$ PS1='\h> '
saturn> _
Aliases
Bash and Korn support the use of aliases that let you assign shorthand names to frequently
used commands. Aliases are defined using the alias command. Here are some typical
aliases that one may like to use:
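A few representative aliases (the shorthand names here are illustrative; pick your own):

```shell
alias l='ls -ltr'      # listing, most recent file last
alias cp='cp -i'       # prompt before overwriting
alias rm='rm -i'       # prompt before deleting
alias l                # with a name but no =, displays the definition of l
```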
Command History
Bash and Korn support a history feature that treats a previous command as an event and
associates it with an event number. Using this number you can recall previous commands,
edit them if required and reexecute them.
The history command displays the history list showing the event number of every
previously executed command. With bash, the complete history list is displayed, while
korn displays the last 16 commands. You can specify a numeric argument to specify the
number of previous commands to display, as in, history 5 (in bash) or history -5 (korn).
By default, bash stores all previous commands in $HOME/.bash_history and korn stores
them in $HOME/.sh_history. When a command is entered and executed, it is appended to
the list maintained in the file.
2. Tilde Substitution
The ~ acts as a shorthand representation for the home directory. A configuration file
like .profile that exists in the home directory can be referred to both as $HOME/.profile
and ~/.profile.
You can also toggle between the directory you switched to most recently and your current
directory. This is done with the ~- symbol (or simply -, a hyphen). For example, either
of the following commands changes to your previous directory:
cd ~-
OR
cd -
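A sketch of the toggle, using two invented directories:

```shell
mkdir -p /tmp/d1 /tmp/d2
cd /tmp/d1
cd /tmp/d2
cd -          # back to /tmp/d1; the shell prints the directory it changed to
pwd           # /tmp/d1
```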
Such settings are lost when you log out. To make them permanent, use certain startup
scripts. The startup scripts are executed when the user logs in. The initialization scripts in
different shells are listed below:
.profile (Bourne shell)
.profile and .kshrc (Korn shell)
.bash_profile (or .bash_login) and .bashrc (Bash)
.login and .cshrc (C shell)
The Profile
When logging into an interactive login shell, login will do the authentication, set the
environment and start your shell. In the case of bash, the next step is reading the general
profile from /etc, if that file exists. bash then looks for ~/.bash_profile, ~/.bash_login and
~/.profile, in that order, and reads and executes commands from the first one that exists
and is readable. If none exists, /etc/bashrc is applied.
When a login shell exits, bash reads and executes commands from the file,
~/.bash_logout, if it exists.
The profile contains commands that are meant to be executed only once in a session. It
can also be used to customize the operating environment to suit user requirements. Every
time you change the profile, you should either log out and log in again, or execute it
using a special command called . (dot).
$ . .profile
The rc File
Normally the profiles are executed only once, upon login. The rc files are designed to be
executed every time a separate shell is created. There is no rc file in Bourne, but bash and
korn use one. This file is defined by an environment variable BASH_ENV in Bash and
ENV in Korn.
export BASH_ENV=$HOME/.bashrc
export ENV=$HOME/.kshrc
Korn automatically executes .kshrc during login if ENV is defined. Bash merely ensures
that a sub-shell executes this file. If the login shell also has to execute this file then a
separate entry must be added in the profile:
. ~/.bashrc
The rc file is used to define command aliases, variable settings, and shell options. Some
sample entries of an rc file are
alias cp='cp -i'
alias rm='rm -i'
set -o noclobber
set -o ignoreeof
set -o vi
The rc file will be executed after the profile. However, if the BASH_ENV or ENV
variables are not set, the shell executes only the profile.
Conclusion
In this chapter, we looked at the environment-related features of the shells, and found
weaknesses in the Bourne shell. Knowledge of Bash and Korn only supplements your
knowledge of Bourne and doesn't take anything away. It is always advisable to use Bash
or Korn as your default login shell, as it results in a more fruitful experience with their
rich features in the form of aliases, history and in-line command editing.
UNIT 4
4. More file attributes
7 Hours
Text Book
4. UNIX Concepts and Applications, Sumitabha Das, 4th Edition, Tata McGraw
Hill, 2006.
(Chapters 1.2, 2, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 18, 19).
Reference Books
UNIX and Shell Programming, Behrouz A. Forouzan and Richard F. Gilberg, Thomson,
2005.
Unix & Shell Programming, M.G. Venkateshmurthy, Pearson Education, 2005.
File type
File permissions
Number of links
The UID of the owner
The GID of the group owner
File size in bytes
Date and time of last modification
Date and time of last access
Date and time of last change of the inode
An array of pointers that keep track of all disk blocks used by the file
Please note that neither the name of the file nor the inode number is stored in the inode.
To know inode number of a file:
ls -il tulec05
9059 -rw-r--r-- 1 kumar metal 51813 Jan 31 11:15 tulec05
where 9059 is the inode number; no other file can have the same inode number in the
same file system.
Hard Links
The link count is displayed in the second column of the listing. This count is normally 1,
but the following files have two links,
-rwxr-xr-- 2 kumar metal 163 Jul 13 21:36 backup.sh
-rwxr-xr-- 2 kumar metal 163 Jul 13 21:36 restore.sh
All attributes seem to be identical, but the files could still be copies. It's the link count
that suggests that the files are linked to each other. But this can only be
confirmed by using the -i option to ls.
ls -li backup.sh restore.sh
478274 -rwxr-xr-- 2 kumar metal 163 jul 13 21:36 backup.sh
478274 -rwxr-xr-- 2 kumar metal 163 jul 13 21:36 restore.sh
ln: Creating Hard Links
A file is linked with the ln command, which takes two filenames as arguments (like the cp
command). The command can create both a hard link and a soft link and has syntax
similar to the one used by cp. The following command links emp.lst with employee:
ln emp.lst employee
The -i option to ls shows that they have the same inode number, meaning that
they are actually one and the same file:
ls -li emp.lst employee
29518 -rwxr-xr-x 2 kumar metal 915 may 4 09:58 emp.lst
29518 -rwxr-xr-x 2 kumar metal 915 may 4 09:58 employee
The link count, which is normally one for unlinked files, is shown to be two. You
can increase the number of links by adding the third file name emp.dat as:
ln employee emp.dat ; ls -l emp*
29518 -rwxr-xr-x 3 kumar metal 915 may 4 09:58 emp.dat
29518 -rwxr-xr-x 3 kumar metal 915 may 4 09:58 emp.lst
29518 -rwxr-xr-x 3 kumar metal 915 may 4 09:58 employee
You can link multiple files, but then the destination filename must be a directory. A file is
considered to be completely removed from the file system when its link count drops to
zero. ln returns an error when the destination file exists. Use the -f option to force the
removal of the existing link before creation of the new one.
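The whole sequence can be replayed in /tmp (the file name and contents are invented):

```shell
cd /tmp && rm -f emp.lst employee
echo "2233|a.k. shukla|g.m.|sales" > emp.lst
ln emp.lst employee
ls -li emp.lst employee    # same inode number, link count 2 on both
rm employee                # the link count drops back to 1; the data survives
```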
However, Linux uses a fast symbolic link which stores the pathname in the inode itself,
provided it doesn't exceed 60 characters.
The Directory
A directory has its own permissions, owners and links. The significance of the file
attributes changes a great deal when applied to a directory. For example, the size of a
directory is in no way related to the sizes of the files that exist in the directory, but rather
to the number of files housed by it. The higher the number of files, the larger the directory
size. Permission acquires a different meaning when the term is applied to a directory.
ls -l -d progs
drwxr-xr-x 2 kumar metal 320 may 9 09:57 progs
The default permissions are different from those of ordinary files. The user has all
permissions, and group and others have read and execute permissions only. The
permissions of a directory also impact the security of its files. To understand how that can
happen, we must know what permissions for a directory really mean.
Read permission
Read permission for a directory means that the list of filenames stored in that
directory is accessible. Since ls reads the directory to display filenames, if a directory's
read permission is removed, ls won't work. Consider removing the read permission first
from the directory progs,
ls -ld progs
drwxr-xr-x 2 kumar metal 128 jun 18 22:41 progs
chmod -r progs ; ls progs
progs: permission denied
Write permission
We cant write to a directory file. Only the kernel can do that. If that were
possible, any user could destroy the integrity of the file system. Write permission for a
directory implies that you are permitted to create or remove files in it. To try that out,
restore the read permission and remove the write permission from the directory before
you try to copy a file to it.
chmod 555 progs ; ls -ld progs
dr-xr-xr-x 2 kumar metal 128 jun 18 22:41 progs
cp emp.lst progs
cp: cannot create progs/emp.lst: permission denied
The write permission for a directory determines whether we can create or remove
files in it because these actions modify the directory
Whether we can modify a file depends on whether the file itself has write
permission. Changing a file doesn't modify its directory entry
Execute permission
If a single directory in the pathname doesn't have execute permission, then it
can't be searched for the name of the next directory. That's why the execute privilege of
a directory is often referred to as the search permission. A directory has to be searched
for the next directory, so the cd command won't work if the search permission for the
directory is turned off.
chmod 666 progs ; ls ld progs
drw-rw-rw- 2 kumar metal 128 jun 18 22:41 progs
cd progs
permission denied to search and execute it
umask: DEFAULT FILE AND DIRECTORY PERMISSIONS
When we create files and directories, the permissions assigned to them depend on
the system's default setting. The UNIX system has the following default permissions for
all files and directories:
rw-rw-rw- (octal 666) for regular files
rwxrwxrwx (octal 777) for directories
The default is transformed by subtracting the user mask from it to remove one or
more permissions. We can evaluate the current value of the mask by using umask without
arguments:
$ umask
022
This becomes 644 (666 - 022) for ordinary files and 755 (777 - 022) for directories.
umask 000 indicates that we are not subtracting anything, so the default permissions
remain unchanged. Note that changing the system-wide default permission settings is
possible using chmod but not by umask.
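As a quick sketch (the filenames here are illustrative, not from the text), the effect of umask on newly created files and directories can be observed directly:

```shell
# Observe how umask shapes default permissions (sketch; filenames are examples).
umask 022              # subtract write permission for group and others
touch newfile          # created with 644 (666 - 022)
mkdir newdir           # created with 755 (777 - 022)
ls -ld newfile newdir

umask 077              # owner-only access for anything created from now on
touch private          # created with 600 (666 - 077)
ls -l private
```

umask affects only files created after it is set; permissions of existing files stay as they are.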
A UNIX file has three time stamps associated with it. Two of them are:
Time of last file modification, shown by
ls -l
Time of last access, shown by
ls -lu
The access time is displayed when ls -l is combined with the -u option. Knowledge of a
file's modification and access times is extremely important for the system administrator.
Many of the tools used by them look at these time stamps to decide whether a particular
file will participate in a backup or not.
TOUCH COMMAND changing the time stamps
To set the modification and access times to predefined values, we have,
touch options expression filename(s)
touch emp.lst (without options and expression)
Then both times are set to the current time, and the file is created if it doesn't exist.
The touch command (without options but with an expression) can also be used. The expression
consists of MMDDhhmm (month, day, hour and minute).
touch 03161430 emp.lst ; ls -l emp.lst
-rw-r--r-- 1 kumar metal 870 mar 16 14:30 emp.lst
ls -lu emp.lst
-rw-r--r-- 1 kumar metal 870 mar 16 14:30 emp.lst
It is possible to change the two times individually. The -m and -a options change the
modification and access times, respectively:
touch command (with options and expression)
-m for changing modification time
-a for changing access time
touch -m 02281030 emp.lst ; ls -l emp.lst
-rw-r--r-- 1 kumar metal 870 feb 28 10:30 emp.lst
touch -a 01261650 emp.lst ; ls -lu emp.lst
Source: Sumitabha Das, UNIX Concepts and Applications, 4th edition, Tata
McGraw Hill, 2006
SIMPLE FILTERS
Filters are commands which accept data from standard input, manipulate it, and
write the results to standard output. Filters are the central tools of the UNIX tool kit, and
each filter performs a simple function. Some commands use a delimiter, such as the pipe (|) or colon (:);
many filters work well with delimited fields, and some simply won't work without them.
The piping mechanism allows the standard output of one filter to serve as the standard input of
another. Filters read data from standard input when used without a filename as
argument, and from the file otherwise.
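A small sketch of such a pipeline, assuming a pipe-delimited file like the emp.lst described next (the field number chosen here is illustrative):

```shell
# Chain simple filters: each reads standard input and writes standard output.
cut -d "|" -f 3 emp.lst |   # extract the third (designation) field
sort |                      # bring identical designations together
uniq -c                     # count the occurrences of each one
```

Each stage does one small job; the pipe hands its output to the next stage as input.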
The Simple Database
Several UNIX commands are provided for text editing and shell programming.
The file emp.lst serves as a simple personnel database: the details of one employee are
stored in one single line, with six fields separated by five delimiters. The text file is
designed in fixed format. There are 15 lines, and each field is separated by the
delimiter |.
$ cat emp.lst
2233 | a.k.shukla | g.m | sales | 12/12/52 | 6000
9876 | jai sharma | director | production | 12/03/50 | 7000
5678 | sumit chakrobarty | d.g.m. | marketing | 19/04/43 | 6000
2365 | barun sengupta | director | personnel | 11/05/47 | 7800
5423 | n.k.gupta | chairman | admin | 30/08/56 | 5400
1006 | chanchal singhvi | director | sales | 03/09/38 | 6700
6213 | karuna ganguly | g.m. | accounts | 05/06/62 | 6300
1265 | s.n. dasgupta | manager | sales | 12/09/63 | 5600
4290 | jayant choudhury | executive | production | 07/09/50 | 6000
2476 | anil aggarwal | manager | sales | 01/05/59 | 5000
6521 | lalit chowdury | director | marketing | 26/09/45 | 8200
3212 | shyam saksena | d.g.m. | accounts | 12/12/55 | 6000
3564 | sudhir agarwal | executive | personnel | 06/07/47 | 7500
2345 | j. b. sexena | g.m. | marketing | 12/03/45 | 8000
0110 | v.k.agrawal | g.m.| marketing | 31/12/40 | 9000
pr : paginating files
We know that,
cat dept.lst
01|accounts|6213
02|progs|5423
03|marketing|6521
04|personnel|2365
05|production|9876
06|sales|1006
pr command adds suitable headers, footers and formatted text. pr adds five lines of
margin at the top and bottom. The header shows the date and time of last modification of
the file along with the filename and page number.
pr dept.lst
May 06 10:38 1997 dept.lst page 1
01|accounts|6213
02|progs|5423
03|marketing|6521
04|personnel|2365
05|production|9876
06|sales|1006
(blank lines follow, filling the bottom margin)
pr options
The different options for pr command are:
-k prints k (integer) columns
-t suppresses the header and footer
-h uses a header of the user's choice
-d double-spaces the input
-n numbers each line, which helps in debugging
-o n offsets the lines by n spaces, increasing the left margin of the page
pr +10 chap01
starts printing from page 10
pr -l 54 chap01
this option sets the page length to 54
head displaying the beginning of the file
The command displays the top of the file. It displays the first 10 lines of the file,
when used without an option.
head emp.lst
Use tail -f when we are running a program that continuously writes to a file, and we want
to see how the file is growing. We have to terminate this command with the interrupt key.
cut: splitting a file vertically
It is used for splitting a file vertically. head -n 5 emp.lst | tee shortlist selects
the first five lines of emp.lst and saves them to shortlist. We can cut by using the -c option with a
list of column numbers, delimited by a comma (cutting columns).
cut -c 6-22,24-32 shortlist
cut -c -3,6-22,28-34,55- shortlist
The expression 55- indicates column number 55 to end of line. Similarly, -3 is the same
as 1-3.
Most files don't contain fixed-length lines, so we have to cut fields rather than columns
(cutting fields).
-d for the field delimiter
-f for the field list
cut -d "|" -f 2,3 shortlist | tee cutlist1
will display the second and third fields of shortlist and save the output in
cutlist1. Here | is quoted to prevent the shell from interpreting it as the pipeline character.
sort shortlist
This default sorting sequence can be altered by using certain options. We can also sort on
one or more keys (fields) or use a different ordering rule.
sort options
The important sort options are:
-tchar uses char as the field delimiter
-k n sorts on the nth field
-k m,n starts the sort on the mth field and ends it on the nth
-k m.n starts the sort on the nth column of the mth field
-u removes repeated lines
-n sorts numerically
-r reverses the sort order
-f folds lowercase to the equivalent uppercase (case-insensitive sort)
-m list merges sorted files in list
-c checks if the file is sorted
-o flname places the sorted output in the file flname
sort -t"|" -k 2 shortlist
sorts on the second field (name).
sort -t"|" -r -k 2 shortlist
or
sort -t"|" -k 2r shortlist
The sort order can be reversed with the -r option.
sort -t"|" -k 3,3 -k 2,2 shortlist
Sorting on a secondary key is also possible, as shown above.
sort -t"|" -k 5.7,5.8 shortlist
We can also specify a character position within a field to be the beginning of the sort,
as shown above (sorting on columns).
sort -n numfile
When sort acts on numerals, strange things can happen: when we sort a file
containing only numbers, we get a curious result. This can be overridden by the -n (numeric)
option.
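A minimal sketch of the curious result and its fix (the contents of numfile are assumed here for illustration):

```shell
# Lexical vs numeric ordering (sketch).
printf '10\n2\n27\n3\n' > numfile
sort numfile       # lexical comparison: "10" sorts before "2" ('1' < '2')
sort -n numfile    # -n compares numeric values: 2, 3, 10, 27
```

Without -n, sort compares character by character, so any number starting with 1 precedes any number starting with 2, regardless of magnitude.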
UNIT 5
5. Filters using regular expressions
6 Hours
Text Book
5. UNIX Concepts and Applications, Sumitabha Das, 4th Edition, Tata McGraw
Hill, 2006.
(Chapters 1.2, 2, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 18, 19).
Reference Books
UNIX and Shell Programming, Behrouz A. Forouzan and Richard F. Gilberg, Thomson,
2005.
Unix & Shell Programming, M.G. Venkateshmurthy, Pearson Education, 2005.
-F
matches multiple fixed strings
grep -i 'agarwal' emp.lst
grep -v 'director' emp.lst > otherlist
wc -l otherlist will display 11 otherlist (the line count followed by the filename)
grep -n 'marketing' emp.lst
grep -c 'director' emp.lst
grep -c 'director' emp*.lst
will print filenames prefixed to the line count
grep -l 'manager' *.lst
will display filenames only
grep -e 'Agarwal' -e 'aggarwal' -e 'agrawal' emp.lst
will match multiple patterns
grep -f pattern.lst emp.lst
all the above three patterns are stored in the separate file pattern.lst
Basic Regular Expressions (BRE) An Introduction
It is tedious to specify each pattern separately with the -e option. grep uses an
expression of a different type to match a group of similar patterns. If an expression uses
metacharacters, it is termed a regular expression. Some of the characters used by regular
expressions are also meaningful to the shell.
BRE character subset
The basic regular expression character subset uses an elaborate metacharacter set,
overshadowing the shell's wild-cards, and can perform amazing matches.
* matches zero or more occurrences of the previous character
g* matches nothing or g, gg, ggg, etc.
. matches a single character
.* matches nothing or any number of characters
[pqr] matches a single character p, q or r
[c1-c2] matches a single character within the ASCII range represented by c1 and c2
sed -n '$p' emp.lst
prints the last line.
Selecting multiple groups of lines
sed -n '3,$!p' emp.lst
Negating the action (!), just the same as 1,2p.
Using Multiple Instructions (-e and -f)
There is adequate scope for using the -e and -f options whenever sed is used with
multiple instructions.
sed -n -e '1,2p' -e '7,9p' -e '$p' emp.lst
Let us consider,
cat instr.fil
1,2p
7,9p
$p
The -f option directs sed to take its instructions from the file:
sed -n -f instr.fil emp.lst
We can combine and use the -e and -f options as many times as we want:
sed -n -f instr.fil1 -f instr.fil2 emp.lst
sed -n -e '/saxena/p' -f instr.fil1 -f instr.fil2 emp.lst
Context Addressing
We can specify one or more patterns to locate lines:
sed -n '/director/p' emp.lst
We can also specify a comma-separated pair of context addresses to select a group of
lines:
sed -n '/dasgupta/,/saxena/p' emp.lst
Line and context addresses can also be mixed:
sed -n '1,/dasgupta/p' emp.lst
will add two include lines at the beginning of the foo.c file. sed identifies the line without
the \ as the last line of input. The output is redirected to a $$ temporary file. This technique has to be
followed when using the a and c commands also. To insert a blank line after each line of
the file (double-spacing text), we have:
sed 'a\
' emp.lst
Deleting lines (d)
sed '/director/d' emp.lst > olist
or
sed -n '/director/!p' emp.lst > olist
sed also uses regular expressions for the patterns to be substituted. To replace all occurrences
of agarwal, aggarwal and agrawal with simply Agarwal, we have:
sed 's/[Aa]gg*[ar][ar]wal/Agarwal/g' emp.lst
We can also use ^ and $ with their usual meanings. To add the prefix 2 to all emp-ids:
sed 's/^/2/' emp.lst | head -n 1
22233 | a.k.shukla | gm | sales | 12/12/52 | 6000
To add the suffix .00 to all salaries:
sed 's/$/.00/' emp.lst | head -n 1
2233 | a.k.shukla | gm | sales | 12/12/52 | 6000.00
Performing multiple substitutions
sed 's/<I>/<EM>/g
s/<B>/<STRONG>/g
s/<U>/<EM>/g' form.html
An instruction processes the output of the previous instruction, as sed is a stream editor
and works on a data stream:
sed 's/<I>/<EM>/g
s/<EM>/<STRONG>/g' form.html
When a g is used at the end of a substitution instruction, the change is performed
globally along the line. Without it, only the leftmost occurrence is replaced. When there
is a group of instructions to execute, you should place these instructions in a file instead
and use sed with the -f option.
Compressing multiple spaces
sed 's/ *|/|/g' emp.lst | tee empn.lst | head -n 3
2233|a.k.shukla|g.m|sales|12/12/52|6000
9876|jai sharma|director|production|12/03/50|7000
5678|sumit chakrobarty|d.g.m.|marketing|19/04/43|6000
The remembered patterns
Consider the three lines below, which do the same job.
UNIT 6
6. Essential Shell Programming
6 Hours
Text Book
6. UNIX Concepts and Applications, Sumitabha Das, 4th Edition, Tata McGraw
Hill, 2006.
(Chapters 1.2, 2, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 18, 19).
Reference Books
UNIX and Shell Programming, Behrouz A. Forouzan and Richard F. Gilberg, Thomson,
2005.
Unix & Shell Programming, M.G. Venkateshmurthy, Pearson Education, 2005.
Shell Scripts
When groups of commands have to be executed regularly, they should be stored in a file,
and the file itself executed as a shell script or shell program by the user. A shell
program runs in interpretive mode: it is not compiled into a separate executable file as
a C program is; instead, each statement is loaded into memory when it is to be executed.
Hence shell scripts run slower than programs written in high-level languages. .sh is
used as an extension for shell scripts, though the use of the extension is not mandatory.
Shell scripts are executed in a separate child shell process, which may or may not be the same
as the login shell.
Example: script.sh
#! /bin/sh
# script.sh: Sample Shell Script
echo "Welcome to Shell Programming"
echo "Today's date : `date`"
echo "This month's calendar:"
cal `date "+%m 20%y"`
$ chmod +x script.sh
Then invoke the script name as:
$ script.sh
Once this is done, we can see the following output:
Welcome to Shell Programming
Today's date: Mon Oct 8 08:02:45 IST 2007
This month's calendar:
    October 2007
Su Mo Tu We Th Fr Sa
    1  2  3  4  5  6
 7  8  9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31
Running a script with sh
As stated above, the child shell reads and executes each statement in interpretive mode.
We can also explicitly spawn a child shell of our choice with the script name as argument:
sh script.sh
Note: Here the script requires neither execute permission nor an interpreter line.
#! /bin/sh
# emp1.sh: Interactive version, uses read to accept two inputs
#
echo "Enter the pattern to be searched: \c"      # No newline
read pname
echo "Enter the file to be used: \c"
read fname
echo "Searching for pattern $pname from the file $fname"
grep "$pname" $fname
echo "Selected records shown above"
Running the above script, specifying the inputs when the script pauses twice:
$ emp1.sh
Enter the pattern to be searched : director
Enter the file to be used: emp.lst
Searching for pattern director from the file emp.lst
9876    Jai Sharma    Director    Productions
2356    Rohit         Director    Sales
Shell parameter and its significance:
$1, $2 ... : positional parameters representing the command-line arguments
$# : number of arguments specified on the command line
$0 : name of the executed command
$* : complete set of positional parameters as a single string
$@ : each quoted string treated as a separate argument
$? : exit status of the last command
$$ : PID of the current shell
$! : PID of the last background job
The shell provides two operators that allow conditional execution: && and ||.
Usage:
cmd1 && cmd2
cmd1 || cmd2
&& delimits two commands: cmd2 is executed only when cmd1 succeeds. With ||, cmd2 is
executed only when cmd1 fails.
Example1:
$ grep 'director' emp.lst && echo "Pattern found"
Output:
9876    Jai Sharma    Director    Productions
2356    Rohit         Director    Sales
Pattern found
Example 2:
$ grep 'clerk' emp.lst || echo "Pattern not found"
Output:
Pattern not found
Example 3:
grep "$1" $2 || exit 2
echo "Pattern Found Job Over"
The if Conditional
The if statement makes two-way decisions based on the result of a condition. The
following forms of if are available in the shell:

Form 1:
if command is successful
then
execute commands
fi

Form 2:
if command is successful
then
execute commands
else
execute commands
fi

Form 3:
if command is successful
then
execute commands
elif command is successful
then...
else...
fi

If the command succeeds, the statements within if are executed; otherwise, the statements in the else
block are executed (if else is present).
Example:
#! /bin/sh
if grep "^$1" /etc/passwd 2>/dev/null
then
echo "Pattern Found"
else
echo "Pattern Not Found"
fi
Output1:
$ emp3.sh ftp
ftp:*:325:15:FTP User:/Users1/home/ftp:/bin/true
Pattern Found
Output2:
$ emp3.sh mail
Pattern Not Found
While: Looping
To carry out a set of instructions repeatedly, the shell offers three looping features:
while, until and for.
Syntax:
while condition is true
do
Commands
done
The commands enclosed by do and done are executed repeatedly as long as condition is
true.
Example:
#! /bin/sh
answer=y
while [ "$answer" = "y" ]
do
echo "Enter the code and description : \c" > /dev/tty
read code description
echo "$code | $description" >> newlist
echo "Enter any more [Y/N]"
read any
case $any in
Y* | y* ) answer=y ;;
N* | n* ) answer=n ;;
*) answer=y ;;
esac
done
Input:
Enter the code and description : 03 analgestics
Enter any more [Y/N] :y
Enter the code and description : 04 antibiotics
Enter any more [Y/N] : [Enter]
Enter the code and description : 05 OTC drugs
Enter any more [Y/N] : n
Output:
$ cat newlist
03 | analgestics
04 | antibiotics
05 | OTC drugs
Operator Meaning
-eq Equal to
-ne Not equal to
-gt Greater than
-ge Greater than or equal to
-lt Less than
-le Less than or equal to
Operators always begin with a - (hyphen) followed by a two-character word, and are
enclosed on either side by whitespace.
Numeric comparison in the shell is confined to integer values only; decimal values are
simply truncated.
Ex:
$ x=5; y=7; z=7.2
$ test $x -eq $y ; echo $?
1                  (not equal)
$ test $z -eq $y ; echo $?
0                  (true, as 7.2 is truncated to 7)
$ emp31.sh ftp > foo
You didn't enter two arguments
$ emp31.sh henry /etc/passwd > foo
henry not found in /etc/passwd
$ emp31.sh ftp /etc/passwd > foo
ftp:*:325:15:FTP User:/user1/home/ftp:/bin/true
Shorthand for test
[ and ] can be used instead of test. The following two forms are equivalent:
test $x -eq $y
and
[ $x -eq $y ]
String Comparison
The test command is also used for testing strings, with the following set of comparison
operators:
Test            True if
s1 = s2         String s1 equals s2
s1 != s2        String s1 does not equal s2
-n stg          String stg is not a null string
-z stg          String stg is a null string
stg             String stg is assigned and not null
s1 == s2        String s1 equals s2 (Korn and Bash only)
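A short sketch of these string tests in action (the variable names here are illustrative):

```shell
# String tests with test / [ ] (sketch).
name="kumar"
empty=""
[ -n "$name" ] && echo "name is a non-null string"
[ -z "$empty" ] && echo "empty has zero length"
[ "$name" = "kumar" ] && echo "strings are equal"
[ "$name" != "metal" ] && echo "strings are not equal"
```

Quoting the variables matters: an unquoted null string would make the test operator see too few arguments.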
fi
echo "Enter the filename to be used: \c"
read flname
if [ ! -n "$flname" ] ; then
echo "You have not entered the filename" ; exit 2
fi
emp.sh "$pname" "$flname"
else
emp.sh $*
fi
Output1:
$emp1.sh
Enter the string to be searched :[Enter]
You have not entered the string
Output2:
$emp1.sh
Enter the string to be searched :root
Enter the filename to be searched :/etc/passwd
root:x:0:1:Super-user:/:/usr/bin/bash
When we run the script with arguments, emp1.sh bypasses all the above activities and
calls emp.sh to perform all validation checks:
$ emp1.sh jai
You didn't enter two arguments
$ emp1.sh jai emp.lst
9878|jai sharma|director|sales|12/03/56|70000
$ emp1.sh "jai sharma" emp.lst
You didn't enter two arguments
This is because $* treats jai and sharma as separate arguments, and $# makes a wrong
argument count. The solution is to replace $* with "$@" (with quotes) and then run the script.
File Tests
test can be used to test various file attributes, like its type (file, directory or symbolic
link) or its permissions (read, write, execute, SUID, etc.).
Example:
$ ls -l emp.lst
-rw-rw-rw- 1 kumar group ... emp.lst
$ [ -f emp.lst ] ; echo $?
0                  (an ordinary file)
$ [ -x emp.lst ] ; echo $?
1                  (not an executable)
$ [ ! -w emp.lst ] || echo "false that the file is not writable"
false that the file is not writable
Example: filetest.sh
#! /bin/sh
#
if [ ! -e $1 ] ; then
echo "File does not exist"
elif [ ! -r $1 ] ; then
echo "File not readable"
elif [ ! -w $1 ] ; then
echo "File not writable"
else
echo "File is both readable and writable"
fi
Output:
$ filetest.sh emp3.lst
Test        True if
-f file     file exists and is a regular file
-r file     file exists and is readable
-w file     file exists and is writable
-x file     file exists and is executable
-d file     file exists and is a directory
-s file     file exists and has a size greater than zero
-e file     file exists (Korn and Bash only)
-u file     file exists and has its SUID bit set
-k file     file exists and has its sticky bit set
-L file     file exists and is a symbolic link (Korn and Bash only)
f1 -nt f2   f1 is newer than f2 (Korn and Bash only)
f1 -ot f2   f1 is older than f2 (Korn and Bash only)
f1 -ef f2   f1 is linked to f2 (Korn and Bash only)
pattern3) commands3 ;;
esac
case first matches expression with pattern1. If the match succeeds, it executes
commands1, which may be one or more commands. If the match fails, pattern2 is
matched, and so forth. Each command list is terminated with a pair of semicolons, and the
entire construct is closed with esac (reverse of case).
Example:
#! /bin/sh
#
echo "        Menu\n
1. List of files\n2. Processes of user\n3. Today's Date
4. Users of system\n5. Quit\nEnter your option: \c"
read choice
case $choice in
1) ls -l ;;
2) ps -f ;;
3) date ;;
4) who ;;
5) exit ;;
*) echo "Invalid option"
esac
Output
$ menu.sh
Menu
1. List of files
2. Processes of user
3. Today's Date
4. Users of system
5. Quit
Enter your option: 3
Mon Oct 8 08:02:45 IST 2007
Note:
case cannot handle relational and file tests, but it matches strings with compact
code. It is very effective when the string is fetched by command substitution.
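As the note suggests, case pairs naturally with command substitution; a sketch:

```shell
# Match a string produced by command substitution (sketch).
# `date +%a` prints the abbreviated day name (Mon, Tue, ...).
case `date +%a` in
    Sat|Sun) echo "It is the weekend" ;;
    *)       echo "It is a weekday" ;;
esac
```

The string to be matched never has to be stored in a variable first; the command substitution supplies it directly to case.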
$ expr $y / $x
1
$ expr 13 % 5
3
expr is also used with command substitution to assign a variable.
Example1:
$ x=6 y=2 ; z=`expr $x + $y`
$ echo $z
8
Example2:
$ x=5
$ x=`expr $x + 1`
$ echo $x
6
String Handling:
expr is also used to handle strings. For manipulating strings, expr uses two expressions
separated by a colon (:). The string to be worked upon is placed on the left of the colon,
and a regular expression is placed on its right. Depending on the composition of the
expression, expr can perform the following three functions:
1. Determine the length of the string.
2. Extract the substring.
3. Locate the position of a character in a string.
1. Length of the string:
The regular expression .* is used to print the number of characters
matching the pattern.
Example1:
$ expr "abcdefg" : '.*'
7
Example2:
while echo "Enter your name: \c" ; do
read name
if [ `expr "$name" : '.*'` -gt 20 ] ; then
echo "Name is very long"
else
break
fi
done
2. Extracting a substring:
expr can extract a string enclosed by the escaped characters \( and \).
Example:
$ st=2007
$ expr "$st" : '..\(..\)'
07
Output:
$ comc
hello.c compiled successfully.
Other Examples: An infinite/semi-infinite loop
(1)
while true ; do
[ -r $1 ] && break
sleep $2
done
(2)
while [ ! -r $1 ] ; do
sleep $2
done
Example:
for file in ch1 ch2; do
> cp $file ${file}.bak
> echo $file copied to $file.bak
done
Output:
ch1 copied to ch1.bak
ch2 copied to ch2.bak
Sources of list:
List from variables: Series of variables are evaluated by the shell before
executing the loop
Example:
$ for var in $PATH $HOME; do echo $var ; done
Output:
/bin:/usr/bin:/home/local/bin
/home/user1
List from wildcards: Here the shell interprets the wildcards as filenames.
Example:
for file in *.htm *.html ; do
sed 's/strong/STRONG/g
s/img src/IMG SRC/g' $file > $$
mv $$ $file
done
Jai Sharma    Director    Productions
2356    Rohit    Director    Sales
The set statement assigns positional parameters $1, $2 and so on, to its arguments. This is
used for picking up individual fields from the output of a program.
Example 1:
$ set 9876 2345 6213
$
This assigns the value 9876 to the positional parameters $1, 2345 to $2 and 6213 to $3. It
also sets the other parameters $# and $*.
Example 2:
$ set `date`
$ echo $*
Mon Oct 8 08:02:45 IST 2007
Example 3:
$ echo The date today is $2 $3, $6
The date today is Oct 8, 2007
Shift: Shifting Arguments Left
shift transfers the contents of the positional parameters to their immediate lower numbered ones.
This is done as many times as the statement is called. When called once, $2 becomes $1,
$3 becomes $2, and so on.
Example 1:
$ echo $@
$ shift
$ echo $1 $2 $3
Oct 8 08:02:45
$ shift 2                  (shifts 2 places)
$ echo $1 $2 $3
08:02:45 IST 2007
Example 2: emp.sh
#! /bin/sh
case $# in
0|1) echo "Usage: $0 file pattern(s)" ; exit ;;
*) fname=$1
shift
for pattern in "$@" ; do
grep "$pattern" $fname || echo "Pattern $pattern not found"
done ;;
esac
Output:
$emp.sh emp.lst
Usage: emp.sh file pattern(s)
$emp.sh emp.lst Rakesh 1006 9877
9876    Jai Sharma    Director    Productions
2356    Rohit         Director    Sales
Example:
$ set `ls -l chp1`
Output:
-rwxr-xr-x: bad option(s)
Example2:
$ set `grep usr1 /etc/passwd`
The corrections to be made to get the correct output are:
$ set -- `ls -l chp1`
$ set -- `grep usr1 /etc/passwd`
The string MARK is the delimiter. The shell treats every line following the command and
delimited by MARK as input to the command. kumar at the other end will see three lines
of message text, with the date inserted by command substitution. The word MARK itself doesn't show
up.
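The command this paragraph refers to is evidently mail used with a here document; a sketch using cat instead (so it can be run anywhere), with the message text assumed for illustration:

```shell
# Here document (sketch): every line up to the delimiter MARK becomes
# standard input of the command. The text describes the same idiom with mail kumar.
cat << MARK
Our next meeting is on `date`.
Please confirm your availability.
Regards.
MARK
```

Command substitution inside the here document (the `date` backquotes) is expanded before the text reaches the command, which is how the date gets inserted into the message.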
Jai Sharma    Director    Productions
2356    Rohit    Director    Sales
Example: To remove all temporary files named after the PID number of the shell:
trap 'rm $$* ; echo "Program Interrupted" ; exit' HUP INT TERM
trap is a signal handler. Here it first removes all files expanded from $$*, echoes a message,
and finally terminates the script when the signals SIGHUP (1), SIGINT (2) or SIGTERM (15)
are sent to the shell process running the script.
A script can also be made to ignore the signals by using a null command list.
Example:
trap '' 1 2 15
Programs
1)
#!/bin/sh
IFS="|"
while echo "enter dept code:\c" ; do
read dcode
set -- `grep "^$dcode" << limit
01|ISE|22
02|CSE|45
03|ECE|25
04|TCE|58
limit`
case $# in
3) echo "dept name : $2 \n emp-id : $3\n" ;;
*) echo "invalid code" ; continue ;;
esac
done
Output:
$valcode.sh
Enter dept code:88
Invalid code
Enter dept code:02
Dept name : CSE
Emp-id :45
Enter dept code:<ctrl-c>
2)
#!/bin/sh
x=1
while [ $x -le 10 ] ; do
echo $x
x=`expr $x + 1`
done

#!/bin/sh
sum=0
for i in $@ ; do
echo $i
sum=`expr $sum + $i`
done
echo "sum is $sum"
3)
#!/bin/sh
sum=0
for i in `cat list` ; do
echo "string is $i"
x=`expr "$i" : '.*'`
echo "length is $x"
done
4)
This is a non-recursive shell script that accepts any number of arguments and prints them
in reverse order.
For example, if A B C are entered, then the output is C B A.
#!/bin/sh
if [ $# -lt 2 ]; then
echo "please enter 2 or more arguments"
exit
fi
for x in $@
do
y=$x" "$y
done
echo "$y"
Run1:
[root@localhost shellprgms]# sh sh1a.sh 1 2 3 4 5 6 7
7 6 5 4 3 2 1
5)
The following shell script accepts 2 file names, checks if the permissions for these files
are identical, and if they are not identical, outputs each filename followed by its permissions.
#!/bin/sh
if [ $# -lt 2 ]
then
echo "invalid number of arguments"
exit
fi
str1=`ls -l $1|cut -c 2-10`
str2=`ls -l $2|cut -c 2-10`
if [ "$str1" = "$str2" ]
then
echo "the file permissions are the same: $str1"
else
echo " Different file permissions "
echo -e "file permission for $1 is $str1\nfile permission for $2 is $str2"
fi
Run1:
[root@localhost shellprgms]# sh 2a.sh ab.c xy.c
file permission for ab.c is rw-r--r--
file permission for xy.c is rwxr-xr-x
Run2:
12172
7) This shell script accepts valid log-in names as arguments and prints their
corresponding home directories. If no arguments are specified, it prints a suitable error
message.
if [ $# -lt 1 ]
then
echo " Invalid Arguments....... "
exit
fi
for x in "$@"
do
grep -w "^$x" /etc/passwd | cut -d ":" -f 1,6
done
Run1:
[root@localhost shellprgms]# sh 4a.sh root
root:/root
Run2:
[root@localhost shellprgms]# sh 4a.sh
Invalid Arguments.......
8) This shell script finds and displays all the links of a file specified as the first argument
to the script. The second argument, which is optional, can be used to specify the directory
in which the search is to begin. If this second argument is not present, the search
begins in the current working directory.
#!/bin/bash
if [ $# -eq 0 ]
then
echo "Usage:sh 8a.sh[file1] [dir1(optional)]"
exit
fi
if [ -f $1 ]
then
dir="."
if [ $# -eq 2 ]
then
dir=$2
fi
inode=`ls -i $1|cut -d " " -f 2`
echo "Hard links of $1 are"
find $dir -inum $inode -print
9) (Only the output of this example survived in the source: a cal-style calendar for the
month, with the current day's date replaced by **.)
10) This shell script implements terminal locking. It prompts the user for a password and,
after accepting it, prompts for confirmation. If the two match, it locks the terminal and asks
for the password; when the password matches, the terminal is unlocked.
trap "" 1 2 3 5 20
clear
echo -e "\nenter password to lock terminal:"
stty -echo
read keynew
stty echo
echo -e "\nconfirm password:"
stty -echo
read keyold
stty echo
if [ $keyold = $keynew ]
then
echo "terminal locked!"
while [ 1 ]
do
echo "retype the password to unlock:"
stty -echo
read key
if [ $key = $keynew ]
then
stty echo
echo "terminal unlocked!"
stty sane
exit
fi
echo "invalid password!"
done
else
UNIT 7
7. awk An Advanced Filter
7 Hours
Text Book
7. UNIX Concepts and Applications, Sumitabha Das, 4th Edition, Tata McGraw
Hill, 2006.
(Chapters 1.2, 2, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 18, 19).
Reference Books
UNIX and Shell Programming, Behrouz A. Forouzan and Richard F. Gilberg, Thomson,
2005.
Unix & Shell Programming, M.G. Venkateshmurthy, Pearson Education, 2005.
Jai Sharma    Manager    Productions
2356    Rohit     Manager    Sales
5683    Rakesh    Manager    Marketing
In the above example, /manager/ is the selection_criteria which selects lines that are
processed in the action section i.e. {print}. Since the print statement is used without any
field specifiers, it prints the whole line.
Note: If no selection_criteria is used, then action applies to all lines of the file.
Since printing is the default action of awk, any one of the following three forms can be
used:
awk '/manager/' emp.lst
awk '/manager/ { print }' emp.lst
awk '/manager/ { print $0 }' emp.lst
Rohit     Manager       Sales
5683      Rakesh        Manager       Marketing
Rahul     Accountant    Productions
Rakesh    Clerk         Productions
In the above example, comma (,) is used to delimit field specifications to ensure that each
field is separated from the other by a space so that the program produces a readable
output.
Note: We can also specify the number of lines we want using the built-in variable NR as
illustrated in the following example:
Example: awk -F "|" 'NR==2, NR==4 { print NR, $2, $3, $4 }' emp.lst
Output:
2    Jai Sharma    Manager       Productions
3    Rahul         Accountant    Productions
4    Rakesh        Clerk         Productions
R Kumar             Manager
Sunil kumaar        Accountant
Anil Kummar         Clerk
Here, the name and designation have been printed in spaces 20 and 12 characters wide
respectively.
Note: The printf requires \n to print a newline after each line.
Redirecting Standard Output:
The print and printf statements can be separately redirected with the > and | symbols. Any
command or a filename that follows these redirection symbols should be enclosed within
double quotes.
Example1: use of |
printf "%3d %-20s %-12s\n", NR, $2, $3 | "sort"
prints Hello
String concatenation can also be performed. awk does not provide any operator for this;
however, strings can be concatenated by simply placing them side by side.
Example 1: z = "Hello" "World"
print z
Example 3: x = "UNIX"
y = "LINUX"
print x y
Expressions also have true and false values associated with them. A nonempty string or
any positive number has true value.
Example: if(c)
chairman      15000
jai sharma    manager    9000
rohit         manager    8750
rakesh        manager    8500
The above command looks for two strings, only in the third field ($3). The second is
attempted only if (||) the first match fails.
Note: awk uses the || and && logical operators as in C and the UNIX shell.
Example 2: $ awk -F "|" '$3 != "manager" && $3 != "chairman" {
> printf "%-20s %-12s %d\n", $2, $3, $5 }' emp.lst
Output:
Sunil kumaar    Accountant    7000
Anil Kummar     Clerk         6000
Rahul           Accountant    7000
Rakesh          Clerk         6000
The above example illustrates the use of != and && operators. Here all the employee
records other than that of manager and chairman are displayed.
~ and !~ : The Regular Expression Operators:
In awk, special characters, called regular expression operators or metacharacters, can be
used with regular expression to increase the power and versatility of regular expressions.
To restrict a match to a specific field, two regular expression operators ~ (matches)
and !~ (does not match).
Example1: $2 ~ /[cC]ho[wu]dh?ury / || $2 ~ /sa[xk]s ?ena /
Example2: $2 !~ /manager | chairman /
Note:
The operators ~ and !~ work only with field specifiers like $1, $2, etc.
For instance, to locate g.m.s, the following command does not display the expected output,
because the string g.m. is also embedded in d.g.m. and c.g.m.:
$ awk -F"|" '$3 ~ /g.m./ { printf ...
prints fields containing g.m., but also d.g.m. and c.g.m.
To avoid such unexpected output, awk provides two operators, ^ and $, that indicate the
beginning and end of the field respectively. So the above command should be modified
as follows:
$ awk -F"|" '$3 ~ /^g.m/ { printf ...
prints fields containing g.m. only, and not d.g.m. or c.g.m.
The following table depicts the comparison and regular expression matching operators.
Operator    Significance
<           Less than
<=          Less than or equal to
==          Equal to
!=          Not equal to
>=          Greater than or equal to
>           Greater than
~           Matches a regular expression
!~          Does not match a regular expression
Number Comparison:
Awk has the ability to handle numbers (integer and floating point). Relational tests or
comparisons can also be performed on them.
Example: $ awk -F"|" '$5 > 7500 {
> printf "%-20s %-12s %d\n", $2, $3, $5 }' emp.lst
Output:
ganesh               chairman     15000
jai sharma           manager      9000
rohit                manager      8750
rakesh               manager      8500
In the above example, the details of employees getting salary greater than 7500 are
displayed.
Regular expressions can also be combined with numeric comparison.
Example: $ awk -F"|" '$5 > 7500 || $6 ~ /1980$/ {
> printf "%-20s %-12s %d %s\n", $2, $3, $5, $6 }' emp.lst
Output:
ganesh               chairman     15000   30/12/1950
jai sharma           manager      9000    01/01/1980
rohit                manager      8750    10/05/1975
rakesh               manager      8500    20/05/1975
Rahul                Accountant   6000    01/10/1980
Anil                 Clerk        5000    20/05/1980
In the above example, the details of employees getting salary greater than 7500 or whose
year of birth is 1980 are displayed.
Number Processing
Numeric computations can be performed in awk using the arithmetic operators +, -, /,
*, and % (modulus). One of the main features of awk w.r.t. number processing is that it can
handle even decimal numbers, which is not possible in the shell.
Example: $ awk -F"|" '$3 == "manager" {
> printf "%-20s %-12s %d %d\n", $2, $3, $5, $5*0.4 }' emp.lst
Output:
jai sharma           manager      9000   3600
rohit                manager      8750   3500
rakesh               manager      8500   3400
Variables
Awk allows the user to use variables of their choice. You can now print a serial number,
using the variable kount, and apply it to those directors drawing a salary exceeding 6700:
$ awk -F"|" '$3 == "director" && $6 > 6700 {
kount = kount + 1
printf "%3d %-20s %-12s %d\n", kount, $2, $3, $6 }' empn.lst
The initial value of kount was 0 (by default). That's why the first line is correctly
assigned the number 1. awk also accepts the C-style incrementing forms:
kount++
kount += 2
printf "%3d\n", ++kount
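These increment forms can be checked from the shell. The following is an illustrative sketch, not from the text; the variable name kount is simply reused from above:

```shell
# Illustrative sketch: C-style increment forms in awk
# kount goes 1 -> 2 (kount++) -> 4 (kount += 2); ++kount increments before printing
out=$(awk 'BEGIN { kount = 1; kount++; kount += 2; printf "%3d\n", ++kount }')
echo "$out"
```

The "%3d" format right-justifies the result (5) in a field three characters wide.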
THE -f OPTION: STORING awk PROGRAMS IN A FILE
You should hold large awk programs in a separate file and provide them with the
.awk extension for easier identification. Let's first store the previous program in the file
empawk.awk:
$ cat empawk.awk
Observe that this time we haven't used quotes to enclose the awk program. You
can now use awk with the -f filename option to obtain the same output:
awk -F"|" -f empawk.awk empn.lst
THE BEGIN AND END SECTIONS
Awk statements are usually applied to all lines selected by the address, and if there
are no addresses, then they are applied to every line of input. But if you have to print
something before processing the first line, for example a heading, then the BEGIN
section can be used gainfully. Similarly, the END section is useful in printing some totals
after processing is over.
The BEGIN and END sections are optional and take the form
BEGIN {action}
END {action}
These two sections, when present, are delimited by the body of the awk program. You
can use them to print a suitable heading at the beginning and the average salary at the
end. Store this program, in a separate file empawk2.awk
Like the shell, awk also uses the # for providing comments. The BEGIN section
prints a suitable heading, offset by two tabs (\t\t), while the END section prints the
average pay (tot/kount) for the selected lines. To execute this program, use the -f option:
$ awk -F"|" -f empawk2.awk empn.lst
Like all filters, awk reads standard input when the filename is omitted. We can make awk
behave like a simple scripting language by doing all work in the BEGIN section. This is
how you perform floating point arithmetic:
$ awk 'BEGIN { printf "%f\n", 22/7 }'
3.142857
This is something that you can't do with expr. Depending on the version of awk, the
prompt may or may not be returned, which means that awk may still be reading
standard input. Use [Ctrl-d] to return the prompt.
BUILT-IN VARIABLES
Awk has several built-in variables. They are all assigned automatically, though it
is also possible for a user to reassign some of them. You have already used NR, which
signifies the record number of the current line. We'll now have a brief look at some of the
other variables.
The FS Variable: as stated elsewhere, awk uses a contiguous string of spaces as the
default field delimiter. FS redefines this field separator, which in the sample database
happens to be the |. When used at all, it must occur in the BEGIN section so that the body
of the program knows its value before it starts processing:
BEGIN { FS = "|" }
This is an alternative to the -F option, which does the same thing.
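A minimal demonstration of setting FS in the BEGIN section; the input record below is made up for illustration:

```shell
# FS assigned in BEGIN takes effect before the first record is read
out=$(printf '2233|charles harris|g.m.\n' | awk 'BEGIN { FS = "|" } { print $2 }')
echo "$out"   # charles harris
```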
The OFS Variable: when you used the print statement with comma-separated arguments,
each argument was separated from the other by a space. This is awk's default output field
separator, and it can be reassigned using the variable OFS in the BEGIN section:
BEGIN { OFS = "~" }
When you reassign this variable with a ~ (tilde), awk will use this character for delimiting
the print arguments. This is a useful variable for creating lines with delimited fields.
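A quick sketch of OFS in action; the input line here is invented for illustration:

```shell
# OFS replaces the space between comma-separated print arguments
out=$(echo 'charles harris' | awk 'BEGIN { OFS = "~" } { print $1, $2 }')
echo "$out"   # charles~harris
```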
The NF Variable: NF comes in quite handy for cleaning up a database of lines that don't
contain the right number of fields. By using it on a file, say emp.lst, you can locate those
lines not having 6 fields, which have crept in due to faulty data entry:
$ awk 'BEGIN { FS = "|" }
NF != 6 {
print "Record No", NR, "has", NF, "fields" }' empx.lst
The FILENAME Variable: FILENAME stores the name of the current file being
processed. Like grep and sed, awk can also handle multiple filenames in the command
line. By default, awk doesn't print the filename, but you can instruct it to do so:
FUNCTIONS
Awk has several built in functions, performing both arithmetic and string
operations. The arguments are passed to a function in C-style, delimited by commas and
enclosed by a matched pair of parentheses. Even though awk allows use of functions with
and without parentheses (like printf and printf()), POSIX discourages use of functions
without parentheses.
Some of these functions take a variable number of arguments, and one (length) uses no
arguments as a variant form. The functions are adequately explained here so you can
confidently use them in perl, which often uses identical syntax.
There are two arithmetic functions which a programmer will expect awk to offer. int
calculates the integral portion of a number (without rounding off), while sqrt calculates the
square root of a number. awk also has some of the common string handling functions you
can hope to find in any language. They are:
length: it determines the length of its argument, and if no argument is present, the entire
line is assumed to be the argument. You can use length (without any argument) to locate
lines whose length exceeds 1024 characters:
awk -F"|" 'length > 1024' empn.lst
You can use length with a field as well. The following program selects those people who
have short names:
awk -F"|" 'length($2) < 11' empn.lst
index(s1, s2): it determines the position of a string s2 within a larger string s1. This
function is especially useful in validating single-character fields. If a field takes the
values a, b, c, d or e, you can use this function to find out whether this single-character
field can be located within the string abcde:
x = index("abcde", "b")
This returns the value 2.
substr(stg, m, n): it extracts a substring from a string stg. m represents the starting point
of extraction and n indicates the number of characters to be extracted. Because string
values can also be used for computation, the returned string from this function can be
used to select those born between 1946 and 1951:
awk -F"|" 'substr($5, 7, 2) > 45 && substr($5, 7, 2) < 52' empn.lst
2365|barun sengupta|director|personel|11/05/47|7800|2365
3564|sudhir ararwal|executive|personnel|06/07/47|7500|2365
4290|jaynth Choudhury|executive|production|07/09/50|6000|9876
9876|jai sharma|director|production|12/03/50|7000|9876
You can never get this output with either sed or grep, because regular expressions can
never match the numbers between 46 and 51. Note that awk does indeed possess a
mechanism for identifying the type of an expression from its context. It identified the date
field string for using substr and then converted it to a number for making a numeric
comparison.
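This context-driven conversion can be seen in isolation. The sketch below uses a made-up date string:

```shell
# substr() returns a string, but awk converts it to a number in arithmetic context
out=$(awk 'BEGIN { yy = substr("12/03/50", 7, 2); print yy + 0 }')
echo "$out"   # 50
```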
split(stg, arr, ch): it breaks up a string stg on the delimiter ch and stores the fields in an
array arr[]. Here's how you can convert the date field to the format YYYYMMDD:
$ awk -F"|" '{ split($5, ar, "/"); print "19" ar[3] ar[2] ar[1] }' empn.lst
19521212
19501203
19431904
..
You can also do it with sed, but this method is superior because it explicitly picks up the
fifth field, whereas sed would transform the only date field that it finds.
system: you may want to print the system date at the beginning of the report. For running a
UNIX command within awk, you'll have to use the system function. Here are two
examples:
BEGIN {
system("tput clear")    # clears the screen
system("date")          # executes the UNIX date command
}
CONTROL FLOW- THE if STATEMENT:
Awk has practically all the features of a modern programming language. It has
conditional structures (the if statement) and loops (while or for). They all execute a body
of statements depending on the success or failure of the control command. This is simply
a condition that is specified in the first line of the construct.
Function              Description
int(x)                returns the integer value of x
sqrt(x)               returns the square root of x
length                returns the complete length of the line
length(x)             returns the length of x
substr(stg, m, n)     returns a portion of string stg of length n, starting from position m
index(s1, s2)         returns the position of string s2 in string s1
split(stg, arr, ch)   splits string stg into array arr using ch as the delimiter; returns
                      the number of fields
system(cmd)           runs the UNIX command cmd and returns its exit status
The if statement can be used when the && and || are found to be inadequate for
certain tasks. Its behavior is well known to all programmers. The statement here takes the
form:
if (condition is true) {
statement
} else {
statement
}
Like in C, none of the control flow constructs needs to use curly braces if there's
only one statement to be executed. But when there are multiple actions to take, the
statements must be enclosed within a pair of curly braces. Moreover, the control command
must be enclosed in parentheses.
Most of the addresses that have been used so far reflect the logic normally used in
the if statement. In a previous example, you have selected lines where the basic pay
exceeded 7500, by using the condition as the selection criteria:
$6 > 7500 {
An alternative form of this logic places the condition inside the action component
rather than the selection criteria. But this form requires the if statement:
awk -F"|" '{ if ($6 > 7500) printf ...
if can be used with the comparison operators and the special symbols ~ and !~ to match a
regular expression. When used in combination with the logical operators || and &&, awk
programming becomes quite easy and powerful. Some of the earlier pattern matching
expressions are rephrased in the following, this time in the form used by if:
if ( NR >= 3 && NR <= 6 )
if ( $3 == "director" || $3 == "chairman" )
if ( $3 ~ /^g.m/ )
if ( $2 !~ /[aA]gg?[ar]+wal/ )
if ( $2 ~ /[cC]ho[wu]dh?ury|sa[xk]s?ena/ )
To illustrate the use of the optional else statement, let's assume that the dearness
allowance is 25% of basic pay when the latter is less than 6000, and 1000 otherwise. The
if-else structure that implements this logic looks like this:
if ( $6 < 6000 )
da = 0.25 * $6
else
da = 1000
You can even replace the above if construct with a compact conditional structure:
da = $6 < 6000 ? 0.25 * $6 : 1000
This is the form that C and perl use to implement the logic of a simple if-else
construct. The ? and : act as separators of the two actions.
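The conditional structure can be tried out directly; a small sketch with an assumed pay value:

```shell
# The ?: operator assigns 25% of pay as da when pay < 6000, else 1000
out=$(awk 'BEGIN { pay = 5000; da = pay < 6000 ? 0.25 * pay : 1000; print da }')
echo "$out"   # 1250
```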
When you have more than one statement to be executed, they must be bounded by
a pair of curly braces (as in C). For example, if the factors determining the hra and da are
in turn dependent on the basic pay itself, then you need to use braces:
if ( $6 < 6000 ) {
hra = 0.50 * $6
da = 0.25 * $6
} else {
hra = 0.40 * $6
da = 1000
}
LOOPING WITH for:
awk supports two loops: for and while. They both execute the loop body as long
as the control command returns a true value. for has two forms. The easier one
resembles its C counterpart. A simple example illustrates the first form:
for (k=0; k<=9; k+=2)
This form also consists of three components: the first component initializes the value of k,
the second checks the condition with every iteration, while the third sets the increment
used for every iteration. for is useful for centering text, and the following example uses
awk with echo in a pipeline to do that:
$ echo "Income statement\nfor\nthe month of august, 2002\nDepartment : Sales" |
> awk '{ for (k = 1; k < (55 - length($0)) / 2; k++)
> printf "%s", " "
> printf "%s\n", $0 }'
Income statement
for
the month of August, 2002
Department : Sales
The loop here uses the first printf statement to print the required number of spaces (page
width assumed to be 55). The line is then printed with the second printf statement,
which falls outside the loop. This is a useful routine which can be used to center some titles
that normally appear at the beginning of a report.
Using for with an Associative Array:
The second form of the for loop exploits the associative feature of awk's arrays.
This form is also seen in perl, but not in commonly used languages like C and Java.
The loop selects each index of an array:
for ( k in arr )
commands
Here, k is the subscript of the array arr. Because k can also be a string, we can use this
loop to print all environment variables. We simply have to pick up each subscript of the
ENVIRON array:
$ nawk 'BEGIN {
> for ( key in ENVIRON )
> print key "=" ENVIRON[key]
> }'
LOGNAME=praveen
MAIL=/var/mail/Praveen
PATH=/usr/bin::/usr/local/bin::/usr/ccs/bin
TERM=xterm
HOME=/home/praveen
SHELL=/bin/bash
Because the index is actually a string, we can use any field as index. We can even use the
elements of the array as counters. Using our sample database, we can display the count of
the employees, grouped according to designation (the third field). You can use the
string value of $3 as the subscript of the array kount[]:
$ awk -F"|" '{ kount[$3]++ }
> END { for ( desig in kount )
> print desig, kount[desig] }' empn.lst
g.m 4
chairman 1
executive 2
director 4
manager 2
d.g.m 2
The program here analyzes the database to produce a break-up of the employees, grouped on
their designation. The array kount[] takes as its subscripts the non-numeric values g.m.,
chairman, executive, etc. for is invoked in the END section to print the subscript (desig) and
the number of occurrences of the subscript (kount[desig]). Note that you don't need to sort
the input file to print the report!
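The counting idiom above can be sketched with a few inline records (the designations are made up):

```shell
# Each distinct field value becomes a subscript of the associative array
out=$(printf 'manager\nclerk\nmanager\n' |
      awk '{ kount[$1]++ } END { print kount["manager"], kount["clerk"] }')
echo "$out"   # 2 1
```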
LOOPING WITH while
The while loop has a similar role to play; it repeatedly executes the loop body as long
as the control command returns a true value. For example, the previous for loop used for
centering text can be easily replaced with a while construct:
k = 0
while (k < (55 - length($0)) / 2) {
printf "%s", " "
k++
}
print $0
The loop here prints a space and increments the value of k with every iteration. The
condition (k < (55 - length($0))/2) is tested at the beginning of every iteration, and the
loop body is executed only if the test succeeds. In this way, the line is filled with a string
of spaces before the text is printed with print $0.
Note that the length function has been used with an argument ($0), which awk understands
to be the entire line. Since length, in the absence of arguments, uses the entire line
anyway, $0 can be omitted. Similarly, print $0 may also be replaced by simply print.
Programs
1) awk script to delete duplicate lines in a file.
BEGIN { i=1; }
{
    flag=1;
    for (j=1; j<i && flag; j++)
    {
        if (x[j] == $0)
            flag=0;
    }
    if (flag)
    {
        x[i]=$0;
        printf "%s\n", x[i];
        i++;
    }
}
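For comparison, the same de-duplication is often written as a one-liner using an associative array. This is an alternative sketch, not part of the original program:

```shell
# !seen[$0]++ is true only the first time a given line is encountered,
# so each line is printed exactly once, in order of first appearance
out=$(printf 'hello\nworld\nworld\nhello\n' | awk '!seen[$0]++')
echo "$out"
```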
Run1:
[root@localhost shellprgms]$ cat >for7.txt
hello
world
world
hello
this
is
this
Output:
hello
world
this
is
2) Transpose
3) Awk script that folds long lines into 40 columns. Thus any line that exceeds 40
characters must be broken after the 40th, and is to be continued with the residue. The
input is to be supplied through a text file created by the user.
BEGIN { start=1; }
{
    len=length;
    for (i=$0; length(i)>40; len-=40)
    {
        print substr(i,1,40) "\\"
        i=substr(i,41,len);
    }
    print i;
}
Run1:
[root@localhost shellprgms]$ awk -F "|" -f 15.awk sample.txt
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\
aaaaaaaaaaaa
aaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\
aaaaaaaaa
Output:
UNIT 8
8. perl - The Master Manipulator
7 Hours
Text Book
1. UNIX Concepts and Applications, Sumitabha Das, 4th Edition, Tata McGraw
Hill, 2006.
(Chapters 1.2, 2, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 18, 19).
Reference Books
UNIX and Shell Programming, Behrouz A. Forouzan and Richard F. Gilberg, Thomson,
2005.
Unix & Shell Programming, M.G. Venkateshmurthy, Pearson Education, 2005.
Objectives
perl preliminaries
The chop function
Variables and Operators
String handling functions
Specifying filenames in a command line
$_(Default Variable)
$. (Current Line Number) and .. (The Range Operator)
Lists and Arrays
ARGV[]: Command Line Arguments
foreach: Looping Through a List
split: Splitting into a List or Array
join: Joining a List
dec2bin.pl: Converting a Decimal Number to Binary
grep: Searching an Array for a Pattern
Associative Arrays
Regular Expressions and Substitution
File Handling
Subroutines
Conclusion
1. Perl preliminaries
Perl: Perl stands for Practical Extraction and Reporting Language. The language was
developed by Larry Wall. Perl is a popular programming language because of its
powerful pattern matching capabilities and its rich library of functions for arrays, lists
and file handling. Perl is also a popular choice for developing CGI (Common Gateway
Interface) scripts on the WWW (World Wide Web).
Perl is a simple yet useful programming language that provides the convenience of shell
scripts and the power and flexibility of high-level programming languages. Perl programs
are interpreted and executed directly, just as shell scripts are; however, they also contain
control structures and operators similar to those found in the C programming language.
This gives you the ability to write useful programs in a very
short time.
Perl can be downloaded from http://www.perl.com.
A perl program runs in a special interpretive mode; the entire script is compiled
internally in memory before being executed. Script errors, if any, are reported before
execution. Unlike awk, printing isn't perl's default action. Like C, all perl statements end
with a semicolon. Perl statements can either be executed on the command line with the -e
option or placed in .pl files. In Perl, anytime a # character is recognized, the rest of the
line is treated as a comment.
The following is a sample perl script.
#!/usr/bin/perl
# Script: sample.pl - shows the use of variables
#
print("Enter your name: ");
$name = <STDIN>;
print("Enter a temperature in Centigrade: ");
$centigrade = <STDIN>;
$fahr = $centigrade * 9 / 5 + 32;
print "The temperature in Fahrenheit is $fahr\n";
print "Thank you $name for using this program.\n";
There are two ways of running a perl script. One is to assign execute (x) permission to
the script file (chmod +x filename) and run it by specifying the script filename. The other
is to invoke the perl interpreter at the command line followed by the script name. In the
second case, we don't have to use the interpreter line, viz., #!/usr/bin/perl.
Comparison Operators
Perl supports operators similar to those of C for performing numeric comparison. It also
provides operators for performing string comparison, unlike C, where we have to use
either strcmp() or strcmpi() for string comparison. They are listed next.
Numeric comparison   String comparison
==                   eq
!=                   ne
>                    gt
<                    lt
>=                   ge
<=                   le
chop(<STDIN>);
In this case, a line is read from standard input and assigned to default variable $_, of
which the last character (in this case a \n) will be removed by the chop() function.
Note that you can reassign the value of $_, so that you can use the functions of perl
without specifying either $_ or any variable name as argument.
Arrays
Perl allows you to store lists in special variables designed for that purpose. These
variables are called array variables. Note that arrays in perl need not contain similar
types of data. Also, arrays in perl can dynamically grow or shrink at run time.
@array = (1, 2, 3); # Here, the list (1, 2, 3) is assigned to the array variable @array.
Perl uses @ and $ to distinguish array variables from scalar variables, the same name can
be used in an array variable and in a scalar variable:
$var = 1;
@var = (11, 27.1, "a string");
Here, the name var is used in both the scalar variable $var and the array variable @var.
These are two completely separate variables. You retrieve the value of the scalar variable
by specifying $var, and that of the array element at index 1 as $var[1].
Following are some examples of arrays with their descriptions.
@x = (27);             # list containing one element
@y = @x;               # assign one array variable to another
@x = (2, 3, 4);
@y = (1, @x, 5);       # the list (2, 3, 4) is substituted for @x, and the resulting
                       # list (1, 2, 3, 4, 5) is assigned to @y
$len = @y;             # scalar context: the number of elements in @y
$last_index = $#y;     # index of the last element of @y
The current element of the list being used as the counter is stored in a special scalar
variable, which in this case is $temp. This variable is special because it is only defined
for the statements inside the foreach loop.
perl has a for loop as well, whose syntax is similar to C's.
Example:
for($i=0 ; $i < 3 ; $i++) { . . .
$binary_num = join("", @bit_arr);
print("Binary form of $temp is $binary_num\n");
splice(@bit_arr, 0, $#bit_arr + 1);
}
The output of the above script (assuming script name is dec2bin.pl) is,
$ dec2bin.pl 10
Binary form of 10 is 1010
$ dec2bin.pl 8 12 15 10
Binary form of 8 is 1000
Binary form of 12 is 1100
Binary form of 15 is 1111
Binary form of 10 is 1010
$
Here, the s prefix indicates that the pattern between the first / and the second is to be
replaced by the string between the second / and the third.
Here, any character matched by the first pattern is replaced by the corresponding
character in the second pattern.
\w matches a word character, same as [A-Za-z0-9_].
\W doesn't match a word character, same as [^a-zA-Z0-9_].
\s matches any whitespace (any character not visible on the screen); it is
equivalent to [ \r\t\n\f].
perl accepts the IRE and TRE used by grep and sed, except that the curly braces
and parentheses are not escaped.
For example, to locate lines longer than 512 characters using an IRE:
perl -ne 'print if /.{513,}/' filename    # note that we didn't escape the curly braces
19. Subroutines
The use of subroutines results in a modular program. We already know the advantages of
modular approach. (They are code reuse, ease of debugging and better readability).
Frequently used segments of code can be stored in separate sections, known as
subroutines. The general form of defining a subroutine in perl is:
sub procedure_name {
# Body of the subroutine
}
Example: The following is a routine to read a line of input from a file and break it into
words.
sub get_words {
$inputline = <>;
@words = split(/\s+/, $inputline);
}
Note: The subroutine name must start with a letter, and can then consist of any number of
letters, digits, and underscores. The name must not be a keyword.
Precede the name of the subroutine with & to tell perl to call the subroutine.
The following example uses the previous subroutine get_words to count the number of
occurrences of the word the.
#!/usr/bin/perl
$thecount = 0;
&get_words;    # call the subroutine
while ($words[0] ne "") {
    for ($index = 0; $words[$index] ne ""; $index += 1) {
        $thecount += 1 if $words[$index] eq "the";
    }
    &get_words;
}
Return Values
In perl subroutines, the last value seen by the subroutine becomes the subroutine's return
value. That is why we could refer to the array variable @words in the calling
routine.
Conclusion
Perl is a programming language that allows you to write programs that manipulate files,
strings, integers, and arrays quickly and easily. perl is a superset of grep, tr, sed, awk and
the shell. perl also has functions for inter-process communication. perl helps in
developing minimal code for performing complex tasks. The UNIX spirit lives on in perl.
perl is popularly used as a CGI scripting language.