
Introduction to the Linux Environment

R.Krishna Murthy
Chief Research Scientist
Supercomputer Education and Research Centre
Indian Institute of Science

___________________________________________________________________________________
E-mail: arkay.murthy@gmail.com

Phone: 23600653,4,9

Course Objective:
The course aims at providing the participants an overview of the
major features of Linux from an end user's perspective. At the end of
the course the participants will have adequate background to use
Linux for routine computing. This course will provide the requisite
background to pursue an advanced course in program
development in the Linux environment.

Pre-requisites:
The participants are expected to be familiar with the use of
computers in interactive mode, programming in any high-level
language and the use of standard tools like editors, compilers, etc.

Course Coverage:
Lectures, tutorial walk-throughs, reading and hands-on assignments

INTRODUCTION TO LINUX

Linux is a Unix-like operating system designed by Linus Torvalds
An important example of successful Free Software and Open Source development
Introduced in 1991, it is available on multiple hardware platforms, including desktops, servers, embedded systems and supercomputers
Widespread adoption of Linux
Latest stable release: 2.6.21 (Linux kernel), April 2007

The philosophy of Linux development is based on
- Interoperability with other operating systems: adhere to standards where possible (e.g. POSIX) and to open formats
- Portability: originally designed for the i386, Linux is a portable operating system available on multiple platforms
- Community development: largely driven by the user community (Linux user groups)

Linux distributions are responsible for the default configuration of Linux and for the integration of the different packages into a coherent whole

BRIEF HISTORY OF LINUX

Version 0.01   September 1991
Version 1.0    March 1994: supported only single-processor i386 systems
Version 1.2    March 1995: support for multiple architectures (Alpha, SPARC, MIPS)
Version 2.0    June 1996: support for Mac processors and SMP
Version 2.2    January 1999
Version 2.4    January 2001: HP PA-RISC, ISA Plug-and-Play, USB and PC Card support
Version 2.4.6: Bluetooth support
- File system and data storage: LVM, RAID
Version 2.6    current (December 2003 to present)
- Support for integrated microcontrollers (uClinux); more processors from NEC, Hitachi and Motorola; support for Intel's Hyper-Threading
- Integrated ALSA sound drivers
- OS support: improved APIC; more users and groups (from 64K to about 4 billion, i.e. 2^32)
- Device numbers: major devices (from 255 to 4095), minor devices (from 255 to 2^20)
- 64-bit file system support
- Improved responsiveness: fast user-space mutexes (futexes)
- Improved module loader
- Storage support: LVM, XFS file system
- sysfs and procfs
- SCSI support (June 2005)

Latest stable release: 2.6.21, April 2007

OPERATING SYSTEM CONCEPTS

An Operating System (OS) is a systems program which
- Provides services to users and makes the system easy to use
- Allocates, controls and monitors resources (CPU, memory, etc.)
- Helps in sharing the system in a controlled fashion

Major functions of an OS are:
- Process management
- Memory management
- Device and file management
- Command language and user services
- Security and authentication

Types of OS:
Multiprogrammed systems: more than one job is scheduled for execution.
Time-sharing (interactive) systems: several users interact with their jobs on the system simultaneously, and each user seems to have the system to himself. Good interactive response is the main objective, giving better productivity.
Batch systems: users do not interact directly with their jobs. Jobs are submitted in the form of command files for queued processing. Turnaround time is an important metric, useful in the controlled processing of the workload.
Real-time systems: the cost of not meeting response requirements is high; results must be correct and delivered within the stipulated time.
Single-user / multi-user systems
Multi-tasking systems: allow a user to initiate more than one related process asynchronously.

Process: a program in execution. Processes compete for resources.
- The OS deals with the abstraction of computation, i.e. the process
- Processes are usually identified by their process identifier (pid), which is usually numeric
- A program is a static entity; a process is a program together with its execution context
- One program, such as the C compiler, gives rise to different processes when invoked by different users.
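The program/process distinction can be observed directly from the shell. In this sketch the same program, sh, is started twice; each run is a separate process with its own pid, which sh reports through the special parameter $$. The variable names are only illustrative.

```shell
# Start the same program (sh) twice. Each invocation is a new process,
# and $$ expands to the process identifier of the sh that runs it.
first=$(sh -c 'echo "$$"')
second=$(sh -c 'echo "$$"')
echo "first run: pid $first"
echo "second run: pid $second"   # a different pid: a different process
```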

INTRODUCTION TO UNIX
Unix is an interactive, multi-user, multitasking operating system
supporting a hierarchical file system and a command language
selectable on a per-user basis
Developed by Ken Thompson and his associates at Bell Labs in
1969 on a PDP-7

The main goals were to design an OS satisfying the following:
- Simplicity and elegance
- Implementation in a high-level language
- Code reuse
- A computing environment in which to pursue work on programming research comfortably and effectively
- Ability to initiate asynchronous processes
- A command language selectable on a per-user basis
- A high degree of portability
Over 100 subsystems, including a dozen languages

Popularity of Unix
- The high-level language implementation and consequent portability ensured wide availability, from micros to supercomputers
- Unix provided a rich set of operations and commands that supported the building-block approach to problem solving
- Customizable command languages that run as a user process
- Unix supported a large number of useful and innovative tools and utilities, e.g. diff, yacc, lex, awk, sed, etc.

UNIX's Building Block Approach

who
grep pattern file
wc -l
who | grep person
who | grep person | wc -l

Finding unique words in a file

tr
sort
uniq
tr -cs 'A-Za-z' '\012' < file | sort | uniq
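The unique-words pipeline above, tried on a small sample file. This is a sketch: the function name unique_words and the sample text are made up for illustration, and LC_ALL=C is set only so that the sort order is predictable (byte order, uppercase before lowercase).

```shell
# unique_words FILE: print each distinct word of FILE once, in sorted order
unique_words() {
    # -c complements the set (every character that is NOT a letter) and
    # -s squeezes repeats, so each run of non-letters becomes a single
    # newline (\012): the result is one word per line
    tr -cs 'A-Za-z' '\012' < "$1" |
        LC_ALL=C sort |   # byte order, so uppercase words sort first
        uniq              # drop the adjacent duplicates left by sort
}

tmp=$(mktemp -d)
printf 'the cat and the dog\nThe dog, the end.\n' > "$tmp/file"
unique_words "$tmp/file"
```

The output lists one word per line: The, and, cat, dog, end, the. Inserting tr 'A-Z' 'a-z' before sort would fold The and the together.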

STRUCTURE OF UNIX
[Figure: layered structure of Unix: the user, user commands and data; the shell, utilities and applications; transfer of control through the system call interface to the Unix kernel; the hardware interface to the system hardware]

SYSTEM ARCHITECTURE
[Figure: the kernel interface surrounded by comm drivers and protocols, pipes and filters, system calls, system maintenance services, the user command interface, executable programs, a configurable environment and direct access to the hardware]

SHELL
The SHELL is the command language. Users interact with the system
through the shell.

UTILITIES
Executable programs supplied with the system that let users perform a
variety of standard functions

CATEGORIES OF UTILITIES
General operations
- date, who, cal, bc
- changing passwords
System administration
- adding/removing users
- creating and restoring file systems
Application development
- creating, editing, debugging, maintaining and profiling software
File- and process-related utilities

APPLICATIONS
User-created software and third-party purchased software

KERNEL
Forms the core of the operating system, which interacts with the system
hardware to provide a defined set of services

The basic services provided by the Kernel


System initialization
Process Management
Memory Management

File System Management


Communication Facilities
Programming Interface

Unix/Linux Philosophy
Unix Philosophy by Mike Gancarz

Why Linux?
Linux is free
Linux is fully customizable in all its components: the source is
available for customization under the GPL
Linux is available on a wide range of platforms, from PCs to
supercomputers
Linux systems are stable, with a low failure rate
The Linux kernel is small and compact
Linux is compatible with many operating systems
Linux is well supported: it is a lot easier and faster to get
patches than for proprietary systems

Components of a Linux system

The major components of a typical OS are
- The kernel: the core, resident portion of the OS. It provides the execution environment by making a set of services and corresponding interfaces available to applications. The services are available as commands or procedures
- A command language to interact with the system: the shell
- Commands and utilities to facilitate the use of the system
Strictly speaking, Linux refers to the Unix-like kernel; the other
components (commands and utilities) are provided by GNU.
So Linux is sometimes referred to as GNU/Linux

SYSTEM INITIALISATION OVERVIEW

BOOTSTRAPPING
- Hardware configuration check
- Power-on diagnostics
- Loading the Unix kernel
- Initializing in-core data structures and internal tables
- PROCESS 0 (the scheduler and swapper) is created
- The INIT process is forked by process 0. INIT has pid 1 and is the ultimate ancestor of all processes

The INIT process does most of the startup jobs, like
- File system check and mounting
- Starting standard services and network daemons, e.g. cron and NFS daemons
- Enabling user logins

User login process
- Init spawns a getty process on each terminal, which outputs the login prompt
- When the user types a login name, getty overlays itself with the login process, which asks for the password
- On successful entry of the password, the login process is overlaid with the shell, and ownership of the process is changed to that of the user
- On logout, the user's login process terminates, and this is sensed by init
- Init respawns getty and the login prompt is redisplayed

USING THE UNIX SYSTEM

Obtaining a user account
- User identification: name
- Password
- Home directory and shell
- Resource allocation and limits

Using a terminal to interact
- Logging on
- Password validation
- Creation of a process (the shell) attached to the terminal

Importance of terminal settings
- The stty command allows setting of terminal parameters
- The parameters that can be set are listed by stty -a
- Proper settings are necessary for effective interaction, e.g. for vi usage
Logging out of the system

Your identity in the system on logging in
- Process identification
- Group identification
- The identity of the interacting terminal
The id command
The logname command
The ps command
The who command
Interaction with the system is through commands
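A few of the identity commands listed above, as they might be tried at the prompt. This is a sketch and the output depends on the account. logname is left out because it only works when there is a login terminal, and tty reports "not a tty" (with a failing exit status) when standard input is not a terminal, hence the || true.

```shell
# Identity of the current user and of this shell process
id           # uid, gid and group memberships, numeric and by name
id -u        # numeric user id only (0 is the super user)
id -g        # numeric primary group id
echo "$$"    # process id of this shell
tty || true  # terminal attached to standard input, if any
```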

Documentation on Linux
Man pages: the man command
man man
- Describes the man command itself

Structure of man pages
- Heading: name followed by the section number
- Name: name of the command and any related commands described in the man page
- Synopsis: command structure with options
- Description: a short description of the command
- Options: detailed description of the options
- Other sections, like usage and related commands, may be present

Organization of man pages: they are grouped into sections

Some of the sections are
1. User commands
2. System calls
3. Library routines
4. Device-related information
5. File format descriptions
6. Games
7. Miscellaneous
8. System administration
9. Kernel related

Man-related commands: whatis and apropos
- whatis searches for the given name in the man pages and prints related information
- apropos does a keyword search of the man pages and prints information about all pages that match the given keyword
Man pages are displayed with less
The info command: information on every command is not necessarily available through info
xman: GUI-based man pages

STRUCTURE OF COMMAND LINE

A command line is made of one or more distinct elements
- An element is a sequence of non-white-space characters, separated from other elements by delimiters like blank and tab (white-space characters)
- Some characters, called metacharacters, have a special interpretation assigned by the shell
- A command is terminated by a newline

COMMAND LINE FORMAT

command option1 ... optionN arg1 ... argN
- Arguments are the input to the command, on which it acts
- Options modify the action of a command
e.g.: wc file
      wc -l file
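The element and option rules can be checked with a tiny shell function that only reports how many arguments it received. count_args is a hypothetical helper defined here, not a standard command; the wc lines run the same program with and without an option on a scratch file.

```shell
# count_args: print how many arguments (elements) the shell passed
count_args() { echo "$#"; }

count_args one   two three     # blanks delimit elements: 3
count_args "one   two" three   # quotes stop the blanks from delimiting: 2

tmp=$(mktemp -d)
printf 'first line\nsecond line\n' > "$tmp/file"
wc "$tmp/file"       # no option: lines, words and bytes (2 4 23)
wc -l "$tmp/file"    # -l option: line count only (2)
```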

Shell Prompt
The character echoed by the shell to indicate its readiness to
accept the next command
The system permits type-ahead
Correcting typing mistakes
- erase character: #
- kill character (erases the line typed so far): @
Pressing Return sends all typed characters to the OS; hence the
above characters affect only the current line
Terminating an executing command
- interrupt character: usually CTRL-C (or the DEL key)
All the above characters can be reassigned using stty, e.g.
stty erase ^h kill ^x

Miscellaneous control sequences
- CTRL-d: logout or end of file
- CTRL-s: stop screen output
- CTRL-q: resume screen output
Continuing a command on more than one line
$ command ... \
A \ at the end of a line turns off the normal meaning of the newline
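A minimal illustration of continuing a command: the backslash escapes the newline, so the shell reads both physical lines as one command line and echo receives three arguments.

```shell
# The trailing backslash hides the newline from the shell, so the two
# physical lines form the single command: echo one two three
echo one two \
     three
```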

Basic Linux OS Concepts

Everything in Linux is a file; if it is not a file, it is a process
Processes and threads
Process/kernel model
- User processes do not handle the hardware; they use services provided by the kernel.
- Since several processes use the system at the same time, the kernel needs to protect each process against the others and against itself.
- The kernel uses hardware support to enforce privileged use of the OS, as against non-privileged use by ordinary users: the CPU supports operation in modes of varying privilege.
- The Linux kernel supports 2 modes: user mode and kernel mode
- The user invokes kernel services through system calls. A system call transfers control to well-defined entry points in the kernel after validating the call, and switches from non-privileged user mode to privileged kernel mode using special hardware instructions (e.g. SYSENTER/SYSEXIT on x86)
- Kernel-space code is usually executed in the context of a process.
Files and filesystems
- Everything is a file: files, directories, devices, network connections and pipes are all viewed as files
- A file is viewed as a stream of bytes; any structure is imposed by the application. The OS does not impose or enforce any structure
- This uniform view helps substitutability: the same program can work with a file or a device without change
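The substitutability point can be seen with a single program. In this sketch wc -c counts the bytes on its standard input and neither knows nor cares whether that input is a regular file, a device or a pipe. The scratch file exists only for the example.

```shell
tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/file"

# One unchanged program, wc -c (count the bytes on standard input),
# reading from three different kinds of file:
wc -c < "$tmp/file"        # a regular file: 6 bytes
wc -c < /dev/null          # a device special file that is always empty: 0
printf 'hello\n' | wc -c   # a pipe: 6 bytes
```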

File systems: a rooted hierarchy of files. A file system is a unit for
administration and a unit for making a collection of files available for use
- Linux supports a uniform global namespace made of a collection of file systems organized into a hierarchy.
- A file system is made available for use by the process of mounting

FILE SYSTEMS

The available raw disk space is organized into logical structures
called file systems
- A physical disk may hold many file systems, and a file system can be made up of several disks. File systems are made available by mounting.
- Related information is grouped together in entities called files.
- Related files in turn can be grouped into different directories
- The directories are organized hierarchically as a rooted structure for better accessibility and management of the name space
- Directories are also files
- Unlike user files, which contain information stored by the user, directories contain information about other files and directories

FILE CONTENTS AND FILE INFORMATION

File contents refer to the information stored in the file
File information is information about the file (file attributes)

File attributes
- name
- size
- ownership
- access rights / permissions
- access / modification times
- retrieval information
- link count

In Unix, separate space on the file system is allotted for storing the file contents
and the information about the file
Information about a file is stored in a structure called an index node (inode),
except for the name of the file, which is stored in the directory
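Since the name lives in the directory and the other attributes in the inode, renaming a file should leave its inode untouched. A small sketch using ls -i, which prints the inode number; inode_of is a helper defined here for readability and the file names are arbitrary. It assumes both names are on the same file system, where mv is a plain rename.

```shell
tmp=$(mktemp -d)
echo data > "$tmp/a"
ls -li "$tmp/a"    # inode number first, then mode, links, owner, group, size, time, name

# inode_of PATH: the inode number that PATH's directory entry refers to
inode_of() { ls -i "$1" | awk '{print $1}'; }

before=$(inode_of "$tmp/a")
mv "$tmp/a" "$tmp/b"     # a rename rewrites only the directory entry...
after=$(inode_of "$tmp/b")
echo "inode before: $before, after: $after"   # ...so the numbers match
```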

DIFFERENT TYPES OF FILES

Regular files: arbitrary data
Directory files: provide the mapping between filenames and the files themselves
Special files: do not contain data, but provide a mechanism to map
devices to file names
- Reading or writing these files activates the device driver, which controls the movement of data between the physical device and the controlling process
Links: associate different file names with the same file (hard links)
Symbolic links (soft links): data files containing the name of the file they are
supposed to link to
- Indirect addressing
Named pipes
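The file types above can be created and told apart with standard commands: ls -l shows the type as the first character of the mode column (- regular, d directory, l symbolic link, p named pipe, c character device). A sketch in a scratch directory; the file names and the helper type_of are illustrative.

```shell
tmp=$(mktemp -d)
echo data > "$tmp/regular"        # regular file
mkdir "$tmp/dir"                  # directory
ln "$tmp/regular" "$tmp/hard"     # hard link: another name for the same file
ln -s regular "$tmp/soft"         # symbolic link: holds the name "regular"
mkfifo "$tmp/fifo"                # named pipe

# type_of PATH: first character of the ls -l mode column
type_of() { ls -ld "$1" | cut -c1; }

for f in regular dir hard soft fifo; do
    echo "$f: $(type_of "$tmp/$f")"
done
echo "/dev/null: $(type_of /dev/null)"   # a character special file
```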

File Names

All files have names that allow you to identify them.
File names can be formed from almost any group of characters except
slash (/). Slash is used to separate filenames in a pathname.
It is better to avoid punctuation and other metacharacters like *, ?, etc.

Recommended characters for filenames
- Uppercase and lowercase letters
- Underscores and periods
- Numeric characters

Modern Unix implementations allow long filenames (typically up to 255 characters);
older Unix versions allowed only 14 characters

Special filenames
/    root directory
.    current directory
..   parent directory
~    replaced by the shell with the home directory

Filenames starting with a period are called hidden files
There is no concept of a separate file extension, version field or device field as
part of the filename
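Hidden files and the ~ shorthand, tried in a scratch directory (the file names are arbitrary):

```shell
tmp=$(mktemp -d)
touch "$tmp/visible" "$tmp/.hidden"

ls "$tmp"       # visible (names starting with a period are not listed)
ls -a "$tmp"    # also lists .hidden, plus . and ..
echo ~          # the shell replaces ~ with the home directory ($HOME)
```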

Wildcards
Wildcard notation helps in specifying groups of files in a concise form,
without exhaustive enumeration
Wildcard expansion in filenames is done by the shell

Wild pattern   Matches
?              any single character
*              any group of zero or more characters
[ab]           either a or b
[a-z]          any one character between a and z inclusive
[!a-z]         any one character other than a to z

Extended patterns (ksh; bash with extglob enabled)
Wild pattern   Matches
?(abc)         zero or one instance of abc
+(abc)         one or more instances of abc
!(abc)         anything that does not match abc

Examples
*.txt            all files ending with .txt in the current directory
chapter.[0123]   chapter.0, chapter.1, chapter.2, chapter.3
x?(abc)x         xx or xabcx
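The basic wildcard forms can be tried in a scratch directory of empty files. The file names are invented for the example, and each echo shows what the shell expanded the pattern to before echo ran. The extended ?( ), +( ) and !( ) forms are not shown, since they need ksh or bash with extglob enabled.

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch chapter.0 chapter.1 chapter.2 chapter.3 chapter.10 notes.txt todo.txt

echo *.txt            # notes.txt todo.txt
echo chapter.?        # one character after the dot: chapter.0 ... chapter.3
echo chapter.[0123]   # the same four files, given as a set
echo chapter.[!0-1]   # one character that is not 0 or 1: chapter.2 chapter.3
echo chapter.1*       # zero or more characters after 1: chapter.1 chapter.10
```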

Extension:
While the OS does not require any concept of an extension,
some utilities use it to indicate the nature of the file's content.
Example:

Extension   Description
.txt        ASCII text
.tar        tar archive
.c          C source file

PATHNAMES
Absolute pathnames
The sequence of directory names, starting from the root and separated by /,
that leads to a file is called the absolute pathname of that file.
Absolute pathnames begin at the root directory (/)
Relative pathnames
The sequence of directory names, separated by /, starting from the current
directory and leading to the target file is called the relative pathname of the file

[Figure: a typical Unix file system tree: the root directory with /bin, /dev, /etc, /lib, /u, /mnt, /usr and /tmp, and subdirectories such as bin, games, lib, adm, include, skel, man, local, preserve, tmp, spool and ucb]

Common LINUX system directories

Directory   Description
/           This directory is the root directory of the LINUX system, which contains all other files.
/bin        This directory contains binary versions of system application files, such as the Bash program itself.
/dev        This directory contains pseudofiles that represent physical devices like disk drives.
/etc        This directory contains the majority of the system configuration files.
/lib        This directory contains library files needed for system applications.
/opt        This directory contains optional system components or applications.
/tmp        This directory contains temporary files used by system or user applications.
/usr        This directory contains user and non-critical

Consider the typical Unix file system tree shown in the accompanying
figure.
Example
The absolute pathname of the file news is /usr/spool/news
The relative pathname of the file news, if you are currently located in
spool, is news

Relative pathnames provide a convenient way to address files
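The absolute and relative forms of the /usr/spool/news example can be reproduced safely under a scratch directory. The sketch recreates usr/spool/news inside a temporary directory (not the real /usr); every pathname starting with $tmp is absolute, because mktemp -d prints an absolute path.

```shell
tmp=$(mktemp -d)                 # mktemp -d prints an absolute pathname
mkdir -p "$tmp/usr/spool"
echo today > "$tmp/usr/spool/news"

cat "$tmp/usr/spool/news"        # absolute: starts at the root directory
cd "$tmp/usr/spool" && cat news  # relative to the current directory (spool)
cd .. && cat spool/news          # now in usr: relative path through spool
cat ./spool/../spool/news        # . and .. are ordinary path components
pwd                              # absolute pathname of the current directory
```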

Links
The association of filenames with files (inodes) constitutes a link
Links are established in directories
Links are established when files are created, or when an alias is created
for an existing file
The link count of a file represents the number of references to that file
The minimum link count is 1 for an ordinary file
and 2 for a directory (why?)
Every new directory, when created, has 2 entries: . and ..
.   refers to the directory itself
..  refers to the parent directory

[Figure: hard links, where directory entries A and Z refer to the same inode, and a symbolic link, where directory entry B refers to an inode of its own]

Links can be hard or soft

Space for file contents and inodes is limited
The link count is decremented by 1 whenever a file (name) is deleted; if the
link count goes to zero, the file space is released
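Link counts can be watched with ls -l, whose second column is the link count. In this sketch link_count is a helper defined here, not a standard command, and the file names are arbitrary. The directory count of 2 holds on traditional Unix file systems; some file systems (btrfs, for example) report 1 for every directory.

```shell
tmp=$(mktemp -d)
# link_count PATH: the second column of ls -l is the link count
link_count() { ls -ld "$1" | awk '{print $2}'; }

echo data > "$tmp/a"
link_count "$tmp/a"      # 1: one directory entry refers to the inode
ln "$tmp/a" "$tmp/z"     # hard link: a second name for the same inode
link_count "$tmp/a"      # 2
ln -s a "$tmp/b"         # symbolic link: a separate file holding the name "a"
link_count "$tmp/a"      # still 2: symbolic links are not counted

mkdir "$tmp/d"
link_count "$tmp/d"      # usually 2: its entry in $tmp plus its own "."

rm "$tmp/a"              # removing a name decrements the count...
link_count "$tmp/z"      # 1
cat "$tmp/z"             # ...and the data survives while a name remains
[ -e "$tmp/b" ] || echo "b now points to a name that no longer exists"
```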

FILE ATTRIBUTES
The file name is not stored in the inode
File ownership
- User owner
- Group owner
- This facilitates sharing
Size, access and modification times
Retrieval information: specifies which blocks belong to the file

FILE PERMISSIONS
Owner: the uid of the owner of the object is the same as the uid of the accessor
Group: the gid of the object is the same as the gid of the accessor
World: the remaining population

Different access modes (for each category)
- read (r)
- write (w)
- execute (x)

Miscellaneous bits
- set user (or group) id (s)
- sticky bit: save text (file) or prevent removal of files by non-owners (directory) (t)

Specification of permissions: numeric mode
3 digits corresponding to the three categories owner, group, world
The permission for each category is the sum of
Read    - 4
Write   - 2
Execute - 1
Example
read + write is specified by 6

The miscellaneous bits are specified by a 4th digit
4 - set user ID on execution
2 - set group ID on execution
1 - set sticky bit
Then if the permission is d1 d2 d3 d4
d1 represents the miscellaneous bits
d2 d3 d4 - permissions for owner, group and world respectively
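The numeric mode can be computed mechanically from an rwx string. perm_octal below is a hypothetical helper written for illustration: it handles only r, w and x (not the s and t bits) and assumes a well-formed 9-character string. The chmod line applies the same arithmetic in the other direction.

```shell
# perm_octal RWX: convert a 9-character rwx string (e.g. rw-r-----) into
# the 3-digit numeric mode, using read=4, write=2, execute=1 per category
perm_octal() {
    result=
    for start in 1 4 7; do                         # owner, group, world
        triple=$(printf '%s' "$1" | cut -c"$start"-"$((start + 2))")
        digit=0
        case $triple in r??) digit=$((digit + 4)) ;; esac
        case $triple in ?w?) digit=$((digit + 2)) ;; esac
        case $triple in ??x) digit=$((digit + 1)) ;; esac
        result=$result$digit
    done
    echo "$result"
}

perm_octal rw-r--r--     # 644
perm_octal rwxr-x---     # 750

tmp=$(mktemp -d)
touch "$tmp/f"
chmod 640 "$tmp/f"       # owner rw (4+2), group r (4), world nothing (0)
ls -l "$tmp/f"           # mode column: -rw-r-----
```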

DETERMINING THE ACCESS CLASS

if accessor's UID = object's UID
    access class is owner
else if accessor's GID = object's GID
    access class is group
else
    access class is world
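The access-class rule can be sketched as a shell function. access_class is a hypothetical helper, not a system command: it compares the effective user id (id -u) and the user's groups (id -G) with the numeric owner and group that ls -n prints. It follows the rule above with one refinement that Linux also applies: every group the user belongs to is compared, not only the primary one. The super user bypasses these checks altogether, which the sketch does not model.

```shell
# access_class FILE: which permission class applies to the current user
access_class() {
    file_uid=$(ls -ldn "$1" | awk '{print $3}')   # -n: numeric owner id
    file_gid=$(ls -ldn "$1" | awk '{print $4}')   # and numeric group id
    if [ "$(id -u)" = "$file_uid" ]; then
        echo owner
    elif id -G | tr ' ' '\n' | grep -qx "$file_gid"; then
        echo group   # any group the user belongs to, not only the primary one
    else
        echo world
    fi
}

tmp=$(mktemp -d)
touch "$tmp/f"
access_class "$tmp/f"    # owner: we just created the file
```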

FILE PERMISSIONS: access modes r, w, x (read, write, execute)

Access type   File                                 Directory
r             view file contents                   view directory contents
w             modify file contents                 modify directory contents
x             execute file contents as a program   you can cd to it

Command              Min file permission       Min dir permission
cd /home/rk          N/A                       x
ls /home/rk/*.c      none                      r
ls -s /home/rk/*.c   none                      rx
cat text             r                         x
cat >> text          w                         x
program              x (binary), rx (script)   x
rm program           none                      wx

Access Permission   Meaning
No access           Do not allow any activity
x                   Allow work with the programs whose name is known; hides others
rx                  Allow listing contents and working with programs
rwx                 Allow all operations

Access Classes
user (u), group (g), others (o)

Additional Access Modes

Code   Name                     Meaning
t      sticky bit / save text   keep executable in memory on exit
s      set UID                  set process UID on execution
s      set GID                  set process GID on execution
l      file locking             set mandatory file locking

SUID and SGID are set on executables

Sticky bit on directories (AIX and SunOS)
Even though the directory has w permission for everyone, setting the sticky bit permits
deletion of a file only by its owner
e.g.: ls -ld /tmp
drwxrwxrwt 2 root 8704 Mar.. /tmp
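The set-UID, set-GID and sticky bits can be set with chmod, either symbolically (u+s, g+s) or through the leading octal digit described earlier (1777 below). A sketch on an empty scratch file and directory; setting these bits on your own files needs no special privilege, although on Linux set-GID may be silently dropped if the file's group is not one of your groups.

```shell
tmp=$(mktemp -d)
touch "$tmp/prog"
chmod 755 "$tmp/prog"

chmod u+s "$tmp/prog"      # set UID: shown as s in the owner's x position
ls -l "$tmp/prog"          # -rwsr-xr-x
chmod g+s "$tmp/prog"      # set GID: shown as s in the group's x position
ls -l "$tmp/prog"          # -rwsr-sr-x

mkdir "$tmp/shared"
chmod 1777 "$tmp/shared"   # leading digit 1: sticky bit, shown as t
ls -ld "$tmp/shared"       # drwxrwxrwt, the same mode as /tmp
```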

File/Directory Commands

PROCESS
A process is an instance of a program that is being
executed by the operating system.
A process is created by the fork system call in UNIX.
Some operating systems use the term task instead of
process.
Each process operates in its own address space
A process has a unique identity: its pid.
Every process has an owner; the owner's (user's) identity is
stored internally as a unique number, the uid (externally available
as a name). Uid 0, or root, has absolute privileges
Every user belongs to at least one (primary) group, identified by a gid. A
user can belong to multiple groups. The group is a mechanism
for controlled sharing
A process typically has the arrangement shown below:

PROCESS STATE TRANSITION

During the course of execution, processes assume different states
- Ready-to-run
- Running
- Waiting

The states through which a process passes, together with the events
causing these transitions, are described by the state transition diagram

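Process State Transition
[Figure: state transition diagram with the states Start, Ready, Running, Wait and Stop]
1. Job enters            (Start -> Ready)
2. Process scheduled     (Ready -> Running)
3. Process preempted     (Running -> Ready)
4. Resource wait         (Running -> Wait)
5. Resource allocated    (Wait -> Ready)
6. Process exits         (Running -> Stop)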
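On Linux the current state of any process can be read from /proc/<pid>/status. This Linux-specific sketch starts a sleep in the background, waits a moment, and reads its State line: S means sleeping (the waiting state above) and R means running or ready to run; other letters exist, such as T for stopped and Z for zombie.

```shell
sleep 30 &                   # a process that will wait on a timer
pid=$!
sleep 1                      # give it time to start and block
# The State: line of /proc/<pid>/status holds a letter plus a description
state=$(awk '/^State:/ {print $2}' "/proc/$pid/status")
echo "process $pid is in state $state"    # S: sleeping, i.e. waiting
kill "$pid"                  # clean up the example process
wait "$pid" 2>/dev/null || true
```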

PROCESS
Types:
Interactive processes: initiated from and controlled by a terminal.
They may run in the foreground (attached to the terminal) or in the background

Job control
bg      : resume the current (stopped) job in the background
fg %n   : bring background job n to the foreground

Batch processes: processes that are not associated with a terminal
and are submitted through a queue.

Daemons: server processes that are not associated with a
terminal; they are normally initiated at boot time and wait in the background
until some process requires service.

Process Attributes
Process ID (PID), parent process ID (PPID)
Nice number: a number indicating the process's priority
relative to others, used in computing the
execution priority
tty
Real and effective UID and GID

PROCESS ATTRIBUTES
Process states: ready, running, sleeping, waiting
Process address space layout
User area: all information relevant to process execution.
It holds
- real and effective uid, gid
- open file handles/descriptors
- signal disposition
- program invocation arguments
- accounting information
- etc.

PROCESS IMAGE STRUCTURE
[Figure: process image; the user context holds, from top to bottom: stack, heap, uninitialized data, initialized read/write data, initialized read-only data and text; the kernel context is shown alongside]
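On Linux the layout of a process image can be inspected through /proc/<pid>/maps, one line per mapped region with its address range and permissions. In this Linux-specific sketch cat prints its own map via /proc/self. Typically the executable's r-xp mapping corresponds to text, read-only and read/write mappings of the same file to the data segments, and regions labelled [heap] and [stack] to the heap and stack; the exact lines vary from system to system.

```shell
# Each line: address range, permissions (r/w/x, p = private), offset,
# device, inode and the mapped file or a label such as [heap] or [stack].
# "self" refers to the process reading the file, here cat itself.
cat /proc/self/maps
```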

PROCESS CREATION, DELETION AND TERMINATION

fork is the Unix system call to spawn a process
exec overlays the current process image with the specified program's
image.
exec does not create a new process
Command execution results in one or more processes being created
Termination: a process terminates when it executes the exit function
A process may also be sent a signal to terminate it
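fork and exec can both be observed from the shell. In this sketch a background command runs in a new child process, which records its own pid and its parent's pid (normally the pid of this shell); then exec replaces a running sh with a new program image, and the pid printed before and after exec is the same, confirming that exec does not create a new process. The scratch file name is arbitrary.

```shell
tmp=$(mktemp -d)
# fork: the background command runs in a new child process, which records
# its own pid ($$) and its parent's pid ($PPID)
sh -c 'echo "$$ $PPID"' > "$tmp/child" &
wait
read -r child_pid child_ppid < "$tmp/child"
echo "this shell: $$; child: $child_pid; child's parent: $child_ppid"

# exec: the running sh is overlaid by a new program image (another sh);
# both lines print the same pid, because no new process was created
sh -c 'echo "$$"; exec sh -c "echo \$\$"'
```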

The life cycle of a process

fork : create a copy of the executing process.
exec : overlay the existing process image with the one specified by
       the command.

[Figure: execution of a command (grep)
init (pid 1) --fork--> init (pid 420) --exec--> getty (pid 420) --exec--> login (pid 420) --exec--> login shell (pid 420)
login shell sh (pid 420) --fork--> sh (pid 563) --exec--> grep (pid 563)]

Process Control
- nice command
- Foreground and background modes
- Sending signals

1. PROCESS CREATION
The login shell runs as process a (pid = nnnn).
Each process is assigned a unique process id (pid).

2. $ cat file
Before fork / after fork: process a forks another process (b, pid = mmmm) to run the cat command.
Process a is the parent of process b; process b is the child of process a.
A process exists until it does an exit system call.

3. Exec system call
Process b (pid = mmmm) does an exec, which allows the cat command
(program) to overlay process b. The cat program is an executable file.

4. The exec call causes the cat program to be loaded into
process b's memory.

6. After exit
When the cat program (pid = mmmm) terminates, control reverts to process a
(pid = nnnn), which then waits for the next command.
Process b is no longer known to the login shell and its memory area is
freed up.

BACKGROUND PROCESS
[Figure: sh forks a child sh (pid = aaaa); the child execs cft, which executes and then exits]
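A background process as in the figure, sketched with sleep standing in for cft (cft itself is not a standard command). The shell forks a child for the command and returns immediately instead of waiting; wait collects the child's exit status later.

```shell
sleep 2 &                 # &: the shell forks a child for sleep and does not wait
bg_pid=$!                 # $! is the pid of the most recent background process
echo "prompt returns at once; background pid $bg_pid"
jobs                      # background jobs started by this shell
status=0
wait "$bg_pid" || status=$?   # now wait for the child to exit
echo "background job finished with status $status"
```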

Signals: notifications to a process that an event has occurred
- also called software interrupts
- usually sent asynchronously
A signal can be sent
- by a process to itself
- by one process to another
- by the kernel to a process
Every signal has a name
Signal names can be listed using kill -l

Kill command
kill -signal process_id
The above sends the signal signal to the process with pid
process_id
e.g.: kill -9 420
kills the process with pid 420
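Signals can be tried on a harmless process. In this sketch a background sleep is terminated with TERM; the shell reports a child killed by a signal with an exit status above 128 (128 plus the signal number in bash and dash), and kill -l converts that status back to a signal name.

```shell
kill -l                       # list the signal names
sleep 60 &                    # an example process to signal
victim=$!
kill -TERM "$victim"          # same as plain kill; kill -9 would send KILL
status=0
wait "$victim" || status=$?   # collect the child's exit status
echo "exit status: $status"   # above 128: terminated by a signal
kill -l "$status"             # the name of that signal: TERM
```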

User types in Unix
- ordinary user
- super user or root (uid 0)
The super user has unrestricted privileges and hence should be used
with care
Input/Output redirection

Processes communicate with files through file descriptors
- A file descriptor is associated with a file through the open system call
- In Unix, processes do I/O through file descriptors. It does not matter what type of file the descriptor is associated with: regular file or device

Every process has three file descriptors open on creation
- Standard input (0)
- Standard output (1)
- Standard error (2)

Further, the details of I/O are encapsulated within the kernel
- It is thus easy to redirect a file descriptor and let the kernel handle the details of redirection
The shell provides a simple notation for specifying redirection
> specifies redirection of standard output
< specifies redirection of standard input
In general
n> file redirects file descriptor n to file
n< file sets file descriptor n to read from file
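The redirection notation in action, on files in a scratch directory (the file names are arbitrary). The || true only keeps the deliberately failing ls from stopping a script run with set -e.

```shell
tmp=$(mktemp -d)
echo hello > "$tmp/out"                  # fd 1 (standard output) to a file
wc -c < "$tmp/out"                       # fd 0 (standard input) from a file
ls "$tmp/missing" 2> "$tmp/err" || true  # fd 2 (standard error) to a file
cat "$tmp/err"                           # the message ls would have printed
echo more 1>> "$tmp/out"                 # >> appends instead of truncating
cat "$tmp/out"                           # hello, then more
```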

INPUT REDIRECTION
[Figure: $ cat < fred: the shell connects the file fred to the standard input of the program (cat); its standard output goes to the terminal]

OUTPUT REDIRECTION
[Figure: $ cat > fred: the program (cat) reads its standard input from the terminal; the shell connects its standard output to the file]

STANDARD I/O
[Figure: $ cat: the shell connects both the standard input and the standard output of the program (cat) to the terminal]

Process pipeline: the pipe operator |

The pipe operator connects the standard output of the command on its left
to the standard input of the command on its right
e.g. who | wc -l
Pipes support the building-block approach to problem solving
Programs that are written to read input from standard input and
write output to standard output are called filters

Filters can be conveniently connected into pipelines
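Two short pipelines as a sketch. The first chains filters over literal input; in the second, head exits after three lines and yes, which would otherwise write forever, is stopped when it next writes to a pipe that nobody reads any more (it receives SIGPIPE). yes is not part of POSIX but is available on Linux systems.

```shell
# Three processes run together, each reading its predecessor's standard
# output through a pipe: sort groups equal lines, uniq -c counts them
printf 'pear\napple\nfig\napple\n' | sort | uniq -c

# head exits after three lines; yes is then stopped by SIGPIPE the next
# time it writes to the pipe
yes hello | head -n 3
```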

PIPED PROCESSES
[Figure: $ who | sort: sh forks two child shells; one execs who, which executes writing to the pipe buffer, and the other execs sort, which executes reading from the pipe]
PIPES
[Figure: $ grep -i joe /etc/passwd | sort: grep reads the file /etc/passwd; the shell connects grep's standard output to the standard input of sort, whose standard output goes to the terminal]

Process related Commands

REGULAR EXPRESSIONS
Regular expressions (REs) are used to specify text patterns for searching
and replacing
REs constitute a powerful language capable of describing complex
classes of patterns
Several important Unix tools support REs,
e.g. grep, find, sed, awk, vi

FORMING REGULAR EXPRESSIONS

Each literal character is an RE that matches only that character
The simplest operation for forming larger REs from literals is concatenation,
e.g. ABC is an RE formed by the concatenation of the literals A, B and C
REs are not limited to literals; they can also contain metacharacters
The list of metacharacters in REs, their functions and the utilities
supporting them are described in the table

Interpreting a regular expression

Metacharacters
The characters below have special meaning only in search patterns
.        matches any single character except newline
*        matches any number (or none) of the single character that immediately
         precedes it. The preceding character can also be a regular expression,
         e.g. since . (dot) means any character, .* means match any number of
         any characters
^        matches the following regular expression at the beginning of the line
$        matches the preceding regular expression at the end of the line
[ ]      matches any one of the enclosed characters.
         A hyphen (-) indicates a range of consecutive characters.
         A circumflex (^) as the first character in the brackets reverses the sense.
         A hyphen or close bracket (]) as the first character is treated as a member
         of the list.
\{n,m\}  matches a range of occurrences of the single character that immediately
         precedes it. The preceding character can also be a regular expression.
         \{n\} matches exactly n occurrences, \{n,\} matches at least n occurrences
         and \{n,m\} matches any number of occurrences between n and m;
         n and m must be between 0 and 255, inclusive.
\        turns off the special meaning of the character that follows
\( \)    saves the pattern enclosed between \( and \) into a special holding space.
\< \>    match characters at the beginning (\<) or end (\>) of a word.
+        matches one or more instances of the preceding regular expression
?        matches zero or one instance of the preceding regular expression
|        matches the regular expression specified before or after
( )      applies a match to the enclosed group of regular expressions.
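The metacharacters can be tried with grep, which uses basic regular expressions by default; grep -E accepts the extended forms (+, ?, | and ( )), and sed's s command can recall the \( \) holding space as \1, \2. The sample lines are invented for the example.

```shell
tmp=$(mktemp -d)
printf 'bag\nbig bag\nbug\nbugs\nbuggy\n' > "$tmp/words"

grep '^b.g$' "$tmp/words"       # ^ . $: b, any one character, g -> bag, bug
grep 'bugs*' "$tmp/words"       # *: bug then zero or more s -> bug, bugs, buggy
grep -E 'bugs?$' "$tmp/words"   # ERE ?: zero or one s, then end of line -> bug, bugs
grep -E 'b(a|u)g' "$tmp/words"  # ERE ( ) and |: bag or bug -> every line

# \( \) saves matched text; \1 and \2 recall it in the replacement
echo 'die or do' | sed 's/\(die\) or \(do\)/\2 or \1/'   # do or die
```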

Regular Expression Anchor Character Examples

Pattern   Matches
^A        An A at the beginning of a line
A$        An A at the end of a line
A^        An A^ anywhere on a line
$A        A $A anywhere on a line
^\^       A ^ at the beginning of a line
^^        Same as ^\^
\$$       A $ at the end of a line
$$        Same as \$$

Regular Expression Character Set Examples

Regular expression   Matches
[0-9]                Any digit
[^0-9]               Any character other than a digit
[-0-9]               Any digit or a -
[0-9-]               Any digit or a -
[^-0-9]              Any character except a digit or a -
[]0-9]               Any digit or a ]
[0-9]]               Any digit followed by a ]
[0-99-z]             Any digit or any character between 9 and z
[]0-9-]              Any digit, a -, or a ]
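The trickier bracket rules from the table (a leading ] or -, and a ] outside the brackets) can be checked with grep -c, which prints the number of matching lines. The input lines are chosen for the example; the last pattern is anchored so that it tests only the first character of each line.

```shell
tmp=$(mktemp -d)
printf 'a\n7\n-\n]\n7]\n' > "$tmp/chars"

grep -c '[0-9]' "$tmp/chars"       # a digit: 7 and 7]              -> 2
grep -c '[-0-9]' "$tmp/chars"      # a digit or a hyphen            -> 3
grep -c '[]0-9-]' "$tmp/chars"     # a digit, a hyphen or a ]       -> 4
grep '[0-9]]' "$tmp/chars"         # a digit followed by a ]        -> 7]
grep -c '^[^-0-9]' "$tmp/chars"    # first character neither: a, ]  -> 2
```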

Regular Expression Pattern Repetition Examples

Regular expression   Matches
*                    Any line with a *
\*                   Any line with a *
\\                   Any line with a \
^*                   Any line starting with a *
^A*                  Any line
^A\*                 Any line starting with an A*
^AA*                 Any line starting with an A
^AA*B                Any line starting with one or more A's followed by a B
^A\{4,8\}B           Any line starting with four, five, six, seven, or eight A's followed by a B
^A\{4,\}B            Any line starting with four or more A's followed by a B
^A\{4\}B             Any line starting with AAAAB
\{4,8\}              Any line with a {4,8}
A{4,8}               Any line with an A{4,8}
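The interval expressions can be checked against lines with 3, 4, 8 and 9 A's, each followed by a B. The loop only generates these test lines; grep -c then counts how many lines each pattern matches.

```shell
tmp=$(mktemp -d)
# Build test lines with 3, 4, 8 and 9 A's, each followed by a B
for n in 3 4 8 9; do
    printf "%${n}s" '' | tr ' ' A   # n spaces turned into n A's
    echo B
done > "$tmp/runs"

grep -c '^A\{4,8\}B' "$tmp/runs"   # 4 to 8 A's then B: 2 lines
grep -c '^A\{4,\}B' "$tmp/runs"    # 4 or more A's then B: 3 lines
grep -c '^A\{4\}B' "$tmp/runs"     # exactly AAAAB at the start: 1 line
grep -c '^AA*B' "$tmp/runs"        # one or more A's then B: 4 lines
```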

Examples of Searching and Replacing

Command                   Result
s/.*/( & )/               Redo the entire line, but add parentheses.
s/.*/mv & &.old/          Change a wordlist into mv commands.
/^$/d                     Delete blank lines.
:g/^$/d                   ex version of previous.
/^[ tab]*$/d              Delete blank lines, plus lines containing only spaces
                          or tabs (tab stands for a literal Tab character).
:g/^[ tab]*$/d            ex version of previous.
s/  */ /g                 Turn one or more spaces into one space.
:%s/  */ /g               ex version of previous.
:s/[0-9]/Item &:/         Turn a (single-digit) number into an item label
                          (on the current line).
:s                        Repeat the substitution on the first occurrence.
:&                        Same.
:sg                       Same, but for all occurrences on the line.
:&g                       Same.
:%&g                      Repeat the substitution globally.
:.,$s/Fortran/\U&/g       Change word to uppercase, from the current line to
                          the last line.
:%s/.*/\L&/               Lowercase entire file.
:s/\<./\u&/g              Uppercase the first letter of each word on the
                          current line (useful for titles).
:%s/yes/No/g              Globally change a word to No.
:%s/Yes/~/g               Globally change a different word to No
                          (previous replacement).
s/die or do/do or die/    Transpose words.
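The commands in this table that are not ex-only can also be given to sed. The sketch below, with made-up input, runs two of them in a single sed call, then shows a variant of the last row that uses \( \) with \1 and \2 so that the original capitalisation is kept; that variant is an illustration, not a row of the table.

```shell
# Made-up input with runs of spaces and an empty line
text='to  be    or

not to be'

# /^$/d deletes empty lines; s/  */ /g squeezes each run of spaces into one
printf '%s\n' "$text" | sed -e '/^$/d' -e 's/  */ /g'
# to be or
# not to be

# Transpose two words while keeping their case, using \( \) and \1, \2
echo 'Do or die' | sed 's/\([Dd]o\) or \([Dd]ie\)/\2 or \1/'
# die or Do
```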

Examples of Searching

Pattern          What Does It Match?
bag              The string bag.
^bag             bag at beginning of line.
bag$             bag at end of line.
^bag$            bag as the only word on line.
[Bb]ag           Bag or bag.
b[aeiou]g        Second letter is a vowel.
b[^aeiou]g       Second letter is a consonant (or uppercase or symbol).
b.g              Second letter is any character.
^...$            Any line containing exactly three characters.
^\.              Any line that begins with a . (dot).
^\.[a-z][a-z]    Same, followed by two lowercase letters (e.g., troff requests).
^\.[a-z]\{2\}    Same as previous, grep or sed only.
^[^.]            Any line that doesn't begin with a . (dot).
bugs*            bug, bugs, bugss, etc.
"word"           A word in quotes.
"*word"*         A word, with or without quotes.
[A-Z][A-Z]*      One or more uppercase letters.
[A-Z]+           Same, egrep or awk only.
[A-Z].*          An uppercase letter, followed by zero or more characters.
[A-Z]*           Zero or more uppercase letters.
[a-zA-Z]         Any letter.
[^0-9A-Za-z]     Any symbol (not a letter or a number).
[567]            One of the numbers 5, 6, or 7.

egrep or awk pattern:
five|six|seven   One of the words five, six, or seven.
80[23]?86        One of the numbers 8086, 80286, or 80386.
compan(y|ies)    One of the words company or companies.

ex or vi pattern:
\<the            Words like theater or the.
the\>            Words like breathe or the.
\<the\>          The word the.

sed or grep pattern:
0\{5,\}                            Five or more zeros in a row.
[0-9]\{3\}-[0-9]\{2\}-[0-9]\{4\}   US social security number (nnn-nn-nnnn).
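The egrep and sed/grep rows can be tried on a few made-up lines. This sketch uses grep -E, which accepts the same extended regular expressions as egrep, for the first two patterns, and plain grep with a basic-RE interval for the third.

```shell
# Made-up input lines
sample='8086
80286
80486
100000
one company
two companies'

# ERE: ? makes [23] optional  -> 8086, 80286 (80486 does not match)
printf '%s\n' "$sample" | grep -E '80[23]?86'

# ERE: ( ) groups and | separates alternatives -> both company lines
printf '%s\n' "$sample" | grep -E 'compan(y|ies)'

# BRE interval: five or more zeros in a row    -> 100000
printf '%s\n' "$sample" | grep '0\{5,\}'
```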

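Valid Metacharacters for Different Programs

Symbol   Programs                      Action
.        ed ex vi sed awk grep egrep   Match any character.
*        ed ex vi sed awk grep egrep   Match zero or more preceding.
^        ed ex vi sed awk grep egrep   Match beginning of line.
$        ed ex vi sed awk grep egrep   Match end of line.
\        ed ex vi sed awk grep egrep   Escape character following.
[ ]      ed ex vi sed awk grep egrep   Match one from a set.
\( \)    ed ex vi sed grep             Store pattern for later replay.
\{ \}    ed sed grep                   Match a range of instances.
\< \>    ex vi                         Match word's beginning or end.
+        awk egrep                     Match one or more preceding.
?        awk egrep                     Match zero or one preceding.
|        awk egrep                     Separate choices to match.
( )      awk egrep                     Group expressions to match.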

Valid Metacharacters for Replacement Patterns

Symbol   ex   sed   ed   Action
\        *    *     *    Escape character following.
\n       *    *     *    Reuse the pattern stored in the nth \( \).
&        *    *     *    Reuse the previous search pattern (the text it matched).
~        *               Reuse previous replacement pattern.
\u \U    *               Change character(s) to uppercase.
\l \L    *               Change character(s) to lowercase.
\E       *               Turn off previous \U or \L.
\e       *               Turn off previous \u or \l.
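The replacement metacharacters can be tried with sed. In this sketch the file name and the date are made up; the last command relies on \U and \E, which GNU sed accepts as an extension but classic sed does not.

```shell
# & reuses the text matched by the search pattern
echo 'report.txt' | sed 's/.*/mv & &.old/'
# mv report.txt report.txt.old

# \1, \2, \3 reuse the text saved by the first, second and third \( \);
# | is used as the delimiter so that / can appear in the replacement
echo '2007-04-26' | sed 's|\([0-9]*\)-\([0-9]*\)-\([0-9]*\)|\3/\2/\1|'
# 26/04/2007

# \U ... \E (GNU sed only): uppercase just the matched word
echo 'fortran and c' | sed 's/fortran/\U&\E!/'
```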

FIND
find is one of UNIX's most useful and important utilities.
It finds files that match a given set of criteria (such as name,
permissions, size, access and modification times).
Command format
find path... operators
Some of the operators are
-name filename     files whose name matches filename (wildcards allowed)
-perm mode         files whose access mode (permissions) is exactly mode
-type c            files of type c: f (regular file), d (directory),
                   l (symbolic link), b (block device), etc.
-user name         files owned by user name
-size n            files that are n blocks (of 512 bytes) in size
                   +n => more than n blocks
                   -n => less than n blocks
opt1 -a opt2       files which match both opt1 and opt2
opt1 -o opt2       files which match either opt1 or opt2
! operator         files that don't match operator
\( expression \)   grouping of operator expressions for precedence
-print             print selected filenames on standard output
-exec command      execute command. The selected file can be
                   referred to by {}. The command must be
                   terminated by \;

Examples
1. find . -name "*.o" -exec rm -f {} \;
2. find . -print
3. find ~ ~barny /usr/local -print
4. find . -name "[a-zA-Z]*.o" -print
5. find . -mtime +6 -print
6. find . -name \*.p -perm 664 -print
7. find . -perm -100 -print (a mode prefixed with - matches files that have
   at least all of the given permission bits set, here owner execute;
   special bits such as SUID are tested the same way)
8. All SUID files of owner root:
   find . -user root -perm -4000 -print
9. find /tmp -type f -mtime +7 -exec rm {} \;
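The examples above can be rehearsed safely in a scratch directory. This sketch assumes a system that provides mktemp; every file and directory name in it is made up.

```shell
# Build a throwaway directory tree; every name here is made up
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir "$dir/src"
touch "$dir/main.o" "$dir/src/util.o" "$dir/src/util.c"
chmod 755 "$dir/src/util.c"

# List the object files; the pattern is quoted so the shell does not expand it
find "$dir" -name '*.o' -print

# Regular files with the owner-execute bit set (as in example 7)
exec_files=$(find "$dir" -type f -perm -100 -print)
echo "$exec_files"

# Delete the object files (as in example 1) and check that none are left
find "$dir" -name '*.o' -exec rm -f {} \;
left=$(find "$dir" -name '*.o' -print)
echo "object files left: ${left:-none}"
```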

Global Regular Expression Printer


grep searches files for lines that match a pattern (a regular expression)
Related commands are egrep and fgrep

egrep is an extension of grep and handles more powerful (extended) regular
expressions
fgrep searches only for fixed strings, not regular expressions, and is
faster for such strings
(In GNU grep, egrep and fgrep are equivalent to grep -E and grep -F.)

Command Format
egrep [options] [regular-expr] file(s)
options
-v       print the lines that do not match the pattern
-c       do not print lines, only the count of matching lines
-i       ignore the distinction between uppercase and lowercase
-n       print line numbers along with the lines
-e expr  use expr as the pattern (useful when it begins with a - (minus))
-f file  read the regular expression(s) from file
Examples
egrep -c . inpfile
egrep -n '.*,.*' inpfile
cat exprfile
egrep -n -f exprfile inpfile
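The options can be tried on a small made-up file. This sketch calls grep directly; egrep accepts the same options. The temporary files created with mktemp stand in for inpfile and exprfile.

```shell
# Made-up input file and pattern file
inp=$(mktemp); pat=$(mktemp)
trap 'rm -f "$inp" "$pat"' EXIT
printf 'Linux kernel\nunix shell\n\nLINUX desktop\n' > "$inp"
printf 'shell\ndesktop\n' > "$pat"

grep -c . "$inp"            # 3: the number of non-empty lines
grep -in 'linux' "$inp"     # 1:Linux kernel and 4:LINUX desktop
grep -vc 'shell' "$inp"     # 3: lines that do not contain shell
grep -n -f "$pat" "$inp"    # 2:unix shell and 4:LINUX desktop

# -e lets the pattern itself start with a minus sign
printf '%s\n' '-x flag' | grep -e '-x'
```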

SED
sed is a stream editor
sed commands are similar to those of ed
sed reads its input from a file and edits the text, but the input file
itself is not altered; the result is written to standard output
sed can also take its input from the output of a command (through a pipe)
Being a stream editor, sed is non-interactive
sed is useful for
1) Editing files too large for efficient interactive editing
2) Global editing. Only one pass is made over the input
3) Complex editing which would be tedious to do interactively
sed does not create a temporary file; only a few lines are held in memory
at a time. Hence it is more efficient than ed or vi for such jobs

Execution of sed
Before execution, sed compiles its commands into an efficient internal form
One line is read from the input stream and placed in the pattern space
All commands act on this line; the pattern space is then written to
standard output (unless -n is given), the next line is read, and the
process repeats.
Command format
sed [-n] [-e script] [-f scriptfile] [file ...]
-n suppresses the automatic output of the pattern space
-e the argument is the command script for sed, given on the command line
-f the commands are not on the command line but in the named file
If input files are specified, the text is taken from them; otherwise sed
reads standard input.

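SED EXECUTION
Addresses
Addresses select the input lines to be operated on by a command
Two types of addresses
Line number addresses
Context addresses (regular expressions enclosed in slashes)
Examples
1,20           lines 1 to 20
/#/,$          from the first line containing # to the last line
/Begin/,/End/  from a line containing Begin through the next line containing End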

Commands
sed commands are one character in length
Text substitution
s/regular expression/replacement/[gp]
g - substitute every occurrence of the regular expression on the line,
not just the first
p - print the line if the substitution succeeds. Since every line is
printed by default anyway, p is mainly useful together with -n.

Deleting Text
E.g.: sed '/^$/d' double   (deletes the empty lines of the file double)
Appending Text
a
The text is written to standard output after the line addressed by 'a'.
If no address is given, the text is appended after every input line.
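The addresses and commands above can be combined in a few sed calls on a made-up file. This sketch uses the portable form of the a command, with the text on the line after a\, and ends by showing that the input file is left unchanged.

```shell
# Made-up input file
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf 'Begin\none\n\ntwo\nEnd\nthree\n' > "$f"

# Context address range: print only the lines from Begin to End
sed -n '/Begin/,/End/p' "$f"

# -n with the p flag: print only the lines where a substitution happened
sed -n 's/o/0/gp' "$f"        # 0ne and tw0

# d: delete the empty lines
sed '/^$/d' "$f"

# a: append a line of text after the line matching End
sed '/End/a\
--- appended ---' "$f"

# The input file itself is unchanged
cat "$f"
```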

AWK
Named after its authors: Aho, Weinberger and Kernighan
Awk is a pattern scanning and processing language
awk 'program' [file ...]
awk -f program-file [file ...]
Records and fields
Each line of input is a record, terminated by a record
separator (RS), which is a newline by default
The current record number is stored in NR
Every record is divided into fields separated by a field
separator (FS); the default is a single space, which means runs of blanks or tabs
The variable NF holds the number of fields in the current record
Fields are referenced by the notation $n, where n is the field
number. When n is zero, $0 refers to the entire record.
E.g.:
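A small sketch of records and fields, using made-up input: NR, NF and $1 are printed for each record, and the second command sets FS to a colon with awk's -F option while $0 still holds the whole record.

```shell
# NR = record number, NF = number of fields, $1 = first field
printf 'alpha beta gamma\none two\n' | awk '{ print NR, NF, $1 }'
# 1 3 alpha
# 2 2 one

# -F: sets FS to a colon; $0 is still the entire record
printf 'root:x:0:0\n' | awk -F: '{ print $1, $3, "(" $0 ")" }'
# root 0 (root:x:0:0)
```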

Awk program
Overall structure
pattern {action}
pattern {action}
action - a valid awk statement or group of statements
The action is optional: when only a pattern is specified, awk prints
every line that matches the pattern
Patterns can be regular expressions, the special BEGIN or END patterns,
or relational expressions
The pattern is also optional: an action without a pattern is executed
for every input line
Patterns select lines: when a pattern matches a line, the line is said
to be selected and the associated action is performed
Text patterns
[abc]     matches one of the enclosed characters
[a-z]     matches one character from the range a to z
[^abc]    matches any character that is not one of the enclosed characters
( )       groups characters
|         separates alternatives ("or")
*         0 or more occurrences of the previous character or group
+         one or more occurrences of the previous character or group
?         0 or one occurrence of the previous character or group
^         beginning of the line or field
$         end of the line or field
~         a field or the entire line matches a pattern
!~        a field or the entire line does not match a pattern

~ and !~ help when a specific field must (or must not) contain a pattern,
e.g. $1 ~ /^a/


Examples
1) awk '/in/' text
   --prints all lines containing 'in' in the file text
2) awk '/[Tt]he/' text --prints all lines containing 'The' or 'the'
3) awk '/[^Tt]he/' text --prints all lines containing 'he' preceded by a
   character other than 'T' or 't' (to print the lines that contain neither
   'The' nor 'the', use awk '!/[Tt]he/' text)
4) awk '/Unix|Berkeley/' text --prints all lines containing 'Unix' or
   'Berkeley'
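The ~ and !~ operators can be tried on the five-line data file used in this section (awk 15, sed 12, tr 6, grep 8, cut 2). In this sketch the file contents are supplied through a shell variable instead of a file.

```shell
# The data file used in this section, held in a shell variable
data='awk 15
sed 12
tr 6
grep 8
cut 2'

# $1 ~ /^[a-c]/ : first field starts with a, b or c -> awk 15, cut 2
printf '%s\n' "$data" | awk '$1 ~ /^[a-c]/'

# $1 !~ /e/ : first field contains no e             -> awk 15, tr 6, cut 2
printf '%s\n' "$data" | awk '$1 !~ /e/'
```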

BEGIN and END Patterns


BEGIN is used to specify actions to be performed before any input
record is read
END is used to specify actions to be performed after all input has
been read
Examples :
BEGIN {FS=":"}
awk 'BEGIN { FS=":" } /tng/' /etc/passwd
END { print TOTAL }
awk 'END {print NR}' text     (prints the number of lines in text)
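To avoid depending on the real /etc/passwd, this sketch feeds awk three made-up lines in the same colon-separated format; the user names, including tng, are hypothetical.

```shell
# Made-up lines in /etc/passwd format (name:password:uid:gid:comment:home:shell)
pw='root:x:0:0:root:/root:/bin/sh
tng:x:1000:1000:T N G:/home/tng:/bin/bash
guest:x:1001:1001::/home/guest:/bin/sh'

# BEGIN runs before the first record is read, so FS is set in time
printf '%s\n' "$pw" | awk 'BEGIN { FS=":" } /tng/ { print $1, $3 }'   # tng 1000

# END runs after the last record, when NR holds the number of records
printf '%s\n' "$pw" | awk 'END { print NR }'                          # 3
```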

RELATIONAL PATTERNS
$ awk '$2 >= 10' data
$ awk '$1 >= "gr"' data
Strings are enclosed in double quotes
A string comparison is done unless both operands are numeric
If data contains
awk   15
sed   12
tr     6
grep   8
cut    2
What is the output of the last two awk programs?

Pattern combination
Patterns are combined using boolean operators
Symbol   Function
||       logical or
&&       logical and
!        logical not

$ awk '$2 > 8 && $2 < 14' data
sed 12

$ awk '$1 == "awk" || $1 > "s"' data
awk 15
sed 12
tr 6
Pattern ranges
$ awk '/r/,/e/' data
tr 6
grep 8

Actions
If data consists of just the 2 records listed above (tr 6 and grep 8),
then the output of
$ awk '{print $1 $2}' data
is
tr6
grep8
$ awk '{print $1, $2}' data
tr 6
grep 8
Output Field Separator (OFS)
$ awk 'BEGIN {OFS="="}
> {print $1, $2}' data
tr=6
grep=8
When the arguments of print are separated only by white space (no commas),
they are concatenated and the output is not separated by OFS.
Formatted print
printf "control-string", arg1, arg2, ...
control-string - is similar to that of printf in C
Print redirection
{printf "%s \t\t %o octal\n", $1, $2 >> "comlog"}
(>> appends the output to the file comlog; > would overwrite it)
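A runnable version of the redirection example, under two assumptions: a temporary file replaces comlog, and its name is passed in with awk's -v option as the hypothetical variable outfile. %o prints the second field in octal.

```shell
# Temporary file standing in for comlog; outfile is a made-up awk variable
out=$(mktemp)
trap 'rm -f "$out"' EXIT
printf 'awk 15\nsed 12\n' |
  awk -v outfile="$out" '{ printf "%s %o octal\n", $1, $2 >> outfile }'
cat "$out"
# awk 17 octal
# sed 14 octal
```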

Variable assignment
var = "text"    assigns text
var = 9         assigns numeric data
If var = "9", both var > 5 and var > "+" are valid comparisons
Arithmetic operation
All standard arithmetic operations (+ - * / %) are available
Example
$ cat data
awk 15
sed 12
tr 6
grep 8
cut 2
$ cat prog
BEGIN {sum = 0}
{sum = sum + $2}
END {printf "Total is %d\n", sum}
$ awk -f prog data
Total is 43
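The same total can be computed in a one-line variant that also uses NR for an average. This sketch relies on awk variables starting at 0, so it leaves out the BEGIN block; the input is the data file above, given through a pipe.

```shell
# The five records of the file data, supplied on standard input
printf 'awk 15\nsed 12\ntr 6\ngrep 8\ncut 2\n' |
  awk '{ sum += $2 } END { printf "Total is %d, average %.1f\n", sum, sum / NR }'
# Total is 43, average 8.6
```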
