8 - Awk Programming

awk is a pattern scanning and processing language used for text processing and as a filter. It can process column-oriented text data by accessing fields as variables like $1 and $2. awk programs are surrounded by single quotes and use selection criteria and actions. It can perform number processing, comparisons, logical operations, and handle variables, arrays, and user-defined functions. Large programs are stored in files with the .awk extension and invoked with the -f option. BEGIN and END blocks allow preprocessing and postprocessing.

Uploaded by

tonnysylvester5

9.0 awk Programming

9.1 awk Preliminaries

awk is a pattern scanning and processing language. It is small, fast, and simple, unlike, say, perl.
awk also has a clean, comprehensible C-like input language. And while it can't do everything you can
do in perl, it can do most things that are actually text processing, and it's much easier to work with.
awk is also used as a filter, just like the sed and grep commands in UNIX.

awk options 'selection_criteria {action}' file(s)


awk -F: '$3 > 200 { print $1, $3 }' /etc/passwd

The selection_criteria and action constitute the awk program, which is surrounded by a set of single
quotes. Unlike other filters, awk uses a contiguous sequence of spaces and tabs as the default
delimiter; the -F: option above changes it to a colon. Fields in awk are numbered $1, $2, and so on,
and the selection criteria here test whether the third field is greater than 200. awk addresses the
entire line as $0. Because the shell would otherwise try to evaluate $1, $2, and so on as its own
parameters, we need to single-quote any awk program that uses them.
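
As a minimal sketch of the selection_criteria {action} form, the same program can be tried on three made-up colon-delimited records instead of /etc/passwd. The record contents and the variable names records and result are invented for illustration.

```shell
# Three hypothetical passwd-style records: name:password:uid
records='root:x:0
alice:x:1001
daemon:x:2'

# Select the lines whose third field exceeds 200, then print fields 1 and 3.
result=$(printf '%s\n' "$records" | awk -F: '$3 > 200 { print $1, $3 }')
echo "$result"
```

Only the alice record passes the test, so a single line is printed.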

In its simplest usage awk is meant for processing column-oriented text data, such as tables, presented
to it on standard input. The variables $1, $2, and so forth are the contents of the first, second, etc.
column of the current input line. For example, to print the second column of a file, you might use the
following simple awk script:

awk < file '{ print $2 }'

This means "on every line, print the second field".


To print the second and third columns, you might use
awk < file '{ print $2, $3 }'
The selection criteria can be a regular expression to search for, one or two line addresses, or a
conditional expression. Consider the examples below:

awk '/printf/ { print }' filefoo        # prints lines containing printf
awk '$2 ~ /^orange$/ { print }' foo     # tests for an exact match on the second field
awk 'NR == 1, NR == 5 { print }' foo    # prints lines 1 to 5
awk '$6 > 2000 { print }' foo           # sixth field greater than 2000

Printing is the default action for awk, so the action can be omitted when all you want is the
matching lines:

awk '/printf/' filefoo                  # prints lines containing printf
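
The kinds of selection criteria described above can be compared side by side on a small inline sample. This is only a sketch: the fruit data and the variable names are hypothetical.

```shell
# Hypothetical two-column sample: name and quantity
data='apple 10
orange 25
orange-juice 30
grape 5
pear 40
plum 7'

# Regular expression matched anywhere in the line; printing is the default action.
by_regex=$(printf '%s\n' "$data" | awk '/orange/')

# Exact match on the first field only.
exact=$(printf '%s\n' "$data" | awk '$1 ~ /^orange$/ { print }')

# Two line addresses: from line 2 through line 3.
range=$(printf '%s\n' "$data" | awk 'NR == 2, NR == 3 { print $1 }')

# Conditional expression on a numeric field.
big=$(printf '%s\n' "$data" | awk '$2 > 25 { print $1 }')

printf '%s\n' "$by_regex" "$exact" "$range" "$big"
```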

9.1.1 Input separator

By default awk splits input lines into fields based on whitespace, that is, spaces and tabs. You can
change this by using the -F option to awk and supplying another character. For instance, to print the
home directories of all users on the system, you might do
awk < /etc/passwd -F: '{ print $6 }'
since the password file has fields delimited by colons and the home directory is the 6th field.

9.2 Using print and printf

awk uses print and printf statements to write to standard output. print produces unformatted output. A
comma in the field list ($1, $2) ensures that the fields are not glued together. The default output
delimiter is a space, but we will learn how to change it later by setting the built-in variable OFS.

Example: awk -F: '{ print $1, $2 }' fruitlist.txt

When placing multiple statements in a single line, use the semicolon (;) as their delimiter.

With the C-like printf statement, you can use awk as a stream formatter. printf uses a quoted format
specifier and a field list. awk accepts most of the formats used by the printf function in C and the
printf command. They include %s for a string, %d for an integer, and %f for a floating-point number.
Unlike print, printf does not add a newline, so the format should normally end with \n.

Example: awk -F: '{ printf("%d %12s\n", $1, $2) }' list.txt
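
To see what the width and alignment parts of a format specifier do, here is a small sketch on two invented quantity:item records. The | characters are there only to make the padding visible.

```shell
# Hypothetical input: quantity:item
formatted=$(printf '3:apple\n12:kiwi\n' |
    awk -F: '{ printf("%d|%8s|%-8s|%.2f\n", $1, $2, $2, $1 / 4) }')
echo "$formatted"
# 3|   apple|apple   |0.75
# 12|    kiwi|kiwi    |3.00
```

%8s right-aligns the string in eight columns, %-8s left-aligns it, and %.2f prints two decimal places.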

9.2.1 Redirecting Standard Output

Every print and printf statement can be separately redirected with the > and | symbols. However, you
need to make sure that the filename or the command name that follows these symbols is enclosed within
double quotes.

printf "%d %-10s %-12s %-8s\n", $1, $3, $4, $6 | "sort"

If you use a filename instead, it should be enclosed in quotes in a similar manner:

printf "%d %-10s %-12s %-8s\n", $1, $3, $4, $6 > "mlist"
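
Both forms of redirection can be tried in one run. The sketch below works in a scratch directory created with mktemp, so the file mlist does not clutter the current directory; the three input records are invented.

```shell
workdir=$(mktemp -d)

# Inside awk, the command name and the filename are double-quoted strings.
sorted=$(cd "$workdir" && printf 'pear 3\napple 7\nfig 1\n' |
    awk '{
        print $1 | "sort"      # each name goes through the sort command
        print $2 > "mlist"     # each number goes to the file mlist
    }')

saved=$(cat "$workdir/mlist")
rm -rf "$workdir"

echo "$sorted"    # names in sorted order
echo "$saved"     # numbers in input order
```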

9.3 Number Processing

awk supports computation using the arithmetic operators +, -, *, and /, as well as the modulo (%)
operation. awk uses the symbol ^ for exponentiation. For example, 2^10 = 1024:

echo 2 10 | awk '{ printf("%d\n", $1 ^ $2) }'

List of assignment operators

Operator    Description                         Example

++          Adds one to itself                  i++
+=          Adds and assigns to itself          i += 5
--          Subtracts one from itself           i--
-=          Subtracts and assigns to itself     i -= 2
*=          Multiplies and assigns to itself    i *= 3
/=          Divides and assigns to itself       i /= 6

The statements x++ and ++x are similar but not identical:

kount = count = 5
print ++kount       # increments kount first and then prints 6
print count++       # prints 5 and then sets count to 6
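
The difference between the two increment forms, along with a few of the arithmetic operators, can be checked in a BEGIN block, which needs no input file. The variable names first and second are added here for illustration.

```shell
arith=$(awk 'BEGIN {
    kount = count = 5
    first = ++kount        # kount becomes 6, then its value is used
    second = count++       # the value 5 is used, then count becomes 6
    print first, second, count
    print 2^10, 17 % 5, 7 / 2
}')
echo "$arith"
# 6 5 6
# 1024 2 3.5
```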

9.4 Variables and Expressions

Expressions comprise strings, numbers, variables, and entities that are built by combining them with
operators. For example, (x + 5)*12 is an expression. Unlike many other programming languages, awk
does not have primitive data types such as char, int, or float. Every expression can be interpreted
as either a string or a number, and just like perl, awk makes the necessary conversion according to
context.

awk also allows the use of user-defined variables without declaring them. Variables are
case-sensitive: x is different from X. A variable is deemed to be declared the first time it is used.
Unlike shell variables, awk variables don't use the $ either in assignment or in evaluation.

x = "5"
print x

Strings in awk are always double-quoted and can contain any character. String concatenation is
achieved by simply placing the strings side by side.

NB: Variables are neither declared nor is their type specified. awk identifies their type from context
and initializes them to zero or the null string.
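
The points above about conversion by context, concatenation, and initialization can be sketched as follows. The variable names y, z, and unset_var are invented for the example.

```shell
vars=$(awk 'BEGIN {
    x = "5"                 # assigned as a string
    y = x + 2               # used in arithmetic, so x is treated as the number 5
    z = x "2"               # placed side by side, so x is concatenated: "52"
    print y, z
    # A variable that was never assigned is the null string or zero.
    print "[" unset_var "]", unset_var + 0
    X = "upper"             # X is a different variable from x
    print x, X
}')
echo "$vars"
# 7 52
# [] 0
# 5 upper
```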

9.5 The Comparison and Logical Operators

Unlike Shell and Perl, awk has a single set of comparison operators for handling strings and numbers,
and two separate operators for matching regular expressions.

Operator    Significance

<           Less than
<=          Less than or equal to
==          Equal to
!=          Not equal to
>=          Greater than or equal to
>           Greater than
~           Matches a regular expression
!~          Doesn't match a regular expression
&&          Logical AND
||          Logical OR
!           Logical NOT

Example: awk -F: '$6 > 120000 { print $2, $6 }' emplist.txt     # making a comparison on a field value
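
The comparison, matching, and logical operators can be combined freely in the selection criteria. The sketch below uses four invented name:designation:salary records, not the contents of emplist.txt.

```shell
# Hypothetical records: name:designation:salary
emp='Kinyua:Ass.Chair:117000
Johnson:Director:130000
Wood:Chairman:160000
Brian:Director:125000'

# String comparison AND numeric comparison with the same set of operators.
hits=$(printf '%s\n' "$emp" | awk -F: '$2 == "Director" && $3 > 126000 { print $1 }')

# Regular expression match on one field.
chairs=$(printf '%s\n' "$emp" | awk -F: '$2 ~ /Chair/ { print $1 }')

# Negated match combined with logical NOT.
others=$(printf '%s\n' "$emp" | awk -F: '$2 !~ /Chair/ && !($3 < 130000) { print $1 }')

printf '%s\n' "$hits" "$chairs" "$others"
```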

9.6 The -f Option: Storing awk Programs in a File

Large awk programs should be held in a file with the .awk extension for easier identification.
Consider:

$ cat empawk.awk

BEGIN { FS = "\t" }
$3 == "Director" && $4 > 120000 {
    printf "%4d %-20s %-12s %d\n", ++kount, $2, $3, $4
}

To run the program, use: awk -f empawk.awk empn.list

The -f option names the file that holds the program; -F is not needed here because the program sets
FS itself. Notice that the program in the file is not enclosed within single quotes. awk needs quotes
only when the program is specified on the command line or when the entire awk command line is held
in a shell script.
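
A self-contained way to try the -f option is to write a small program file and a small data file into a scratch directory. The file names demo.awk and demo.list and the two records are hypothetical stand-ins for empawk.awk and empn.list.

```shell
workdir=$(mktemp -d)

# The program file holds plain awk code with no surrounding single quotes.
cat > "$workdir/demo.awk" <<'EOF'
BEGIN { FS = "\t" }
$3 == "Director" { printf "%d %s\n", ++kount, $2 }
EOF

# Two tab-separated records: country, name, designation, salary
printf 'Uganda\tBill Johnson\tDirector\t130000\nUK\tBarry Wood\tChairman\t160000\n' \
    > "$workdir/demo.list"

directors=$(awk -f "$workdir/demo.awk" "$workdir/demo.list")
rm -rf "$workdir"
echo "$directors"
```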

9.7 The BEGIN and END Sections

The BEGIN and END sections are optional and take the following form; both require opening and closing
curly braces:
BEGIN { action }
END { action }

When present, the BEGIN action runs before the first input line is read and the END action runs after
the last line has been processed; the body of the awk program sits between them. You can use them to
print a suitable heading at the beginning and the average salary at the end.
Like the shell, awk uses # for comments.

Example: empawk2.awk

BEGIN {
    FS = "\t"
    printf "\t\tEmployee abstract\n\n"
}
$4 > 120000 {
    # Increment variables for serial number and pay
    kount++; total += $4    # Multiple assignments in one line
    printf "%3d %-20s %-12s %d\n", kount, $2, $3, $4
}
END {
    # Guard against division by zero when no line was selected
    if (kount > 0)
        printf "\n\tThe average salary is %6d\n", total/kount
}

Like all standard filters, awk reads standard input when filename is omitted. We can make awk
behave like a simple scripting language by doing all work in the BEGIN section.
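
The order in which the three parts run can be seen in a cut-down sketch of the same idea, fed from standard input. The three name and pay records are invented.

```shell
report=$(printf 'a\t130000\nb\t110000\nc\t150000\n' | awk '
BEGIN { FS = "\t"; print "Name Pay" }                 # runs once, before any input
$2 > 120000 { kount++; total += $2; print $1, $2 }    # runs for each selected line
END { if (kount > 0) printf "Average %d\n", total / kount }')   # runs once, after all input
echo "$report"
# Name Pay
# a 130000
# c 150000
# Average 140000
```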

9.8 Positional Parameters


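The script empawk2.awk would take a more generalized form if the number 120000 were replaced with a
value supplied when the script is run. Because awk uses $1, $2, and so on as field identifiers,
quoting is needed to distinguish a field identifier from a shell positional parameter: the single
quotes around the shell parameter close and reopen the awk program, so the shell substitutes its
value before awk runs.
E.g. $4 > '$1'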
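
The quoting trick can be sketched with a shell function, whose $1 behaves like a script's first positional parameter. The function names above and above_v and the sample data are hypothetical. The second function shows awk's standard -v option, which these notes do not cover, as an alternative way of passing the same value.

```shell
above() {
    # The quotes around $1 close and reopen the awk program, so the shell
    # substitutes its own first parameter; $2 is still an awk field.
    awk -F: '$2 > '$1' { print $1 }'
}

above_v() {
    # Alternative: hand the shell value to awk as the awk variable limit.
    awk -F: -v limit="$1" '$2 > limit { print $1 }'
}

pay='kinyua:117000
wood:160000'
r1=$(printf '%s\n' "$pay" | above 120000)
r2=$(printf '%s\n' "$pay" | above_v 120000)
echo "$r1 $r2"
```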

empn.list

Kenya      Kinyua Ware       Ass.Chair        117000
Uganda     Bill Johnson      Director         130000
USA        Ken Wamaitha      Treasurer        118000
UK         Barry Wood        Chairman         160000
Wakes      Gordon Lightf     Director         140000
Comoros    Juane Kay         active Member    112000
Kenya      Derrik O'Brian    Director         125000
Uganda     James Keysalt     P. Assistant     110000
Kenya      Ken Thompson      Secretary        119000

9.9 Arrays in awk

An array is also a variable except that this variable can store a set of values or elements. Each element
is accessed by a subscript called the index. Arrays in awk:

• Are not formally defined. An array is considered declared the moment it is used.
• Array elements are initialized to zero or an empty string unless initialized explicitly.
• Arrays do not have a fixed size; they expand automatically.
• The index can be virtually anything; it can even be a string.

Example: empawk3.awk

BEGIN { FS = ":" ; printf "%44s\n", "Salary Commission" }
$4 ~ /sales|marketing/ {
    commission = $6 * 0.20
    tot[1] += $6 ; tot[2] += commission
    kount++
}
END { printf "\t Average %5d %5d\n", tot[1]/kount, tot[2]/kount }

awk arrays are associative (hash) arrays, where information is held as key-value pairs. The index is
the key, which is saved internally as a string. When you set the array element mon[1] = "Jan", awk
converts the number 1 to a string, so mon[1] and mon["1"] refer to the same element.

The Environment Array ENVIRON[ ]: awk maintains the associative array ENVIRON[ ] to store all
the environment variables. The variable name is the index, so ENVIRON["HOME"] holds the value of
HOME.
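
The properties of associative arrays listed above can be observed in a short sketch. The array name tally, the department names, and the environment variable DEMO_VAR are invented for illustration.

```shell
# String indexes: elements spring into existence on first use, starting from zero.
counts=$(printf 'sales\nadmin\nsales\n' | awk '
{ tally[$1]++ }
END { print tally["sales"], tally["admin"], "[" tally["none"] "]" }')

# A numeric index is stored as a string, so 1 and "1" name the same element.
month=$(awk 'BEGIN { mon[1] = "Jan"; print mon["1"] }')

# ENVIRON is indexed by the name of the environment variable.
env_value=$(DEMO_VAR=hello awk 'BEGIN { print ENVIRON["DEMO_VAR"] }')

printf '%s\n' "$counts" "$month" "$env_value"
```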

9.10 Built-in Variables

The FS variable: awk uses a contiguous string of spaces and tabs as the default field delimiter. FS
redefines this field separator. When set inside the program, it should be assigned in the BEGIN
section so that it takes effect before the first line is read.

BEGIN { FS = ":" }

This is an alternative to the -F: option of the command, which does the same thing.

The OFS variable: a space is awk's default output field separator, and it can be reassigned using
the variable OFS in the BEGIN section as follows:

BEGIN { OFS = "~" }

When you reassign this variable with ~ (tilde), awk uses this character to delimit the print
arguments. This is a useful variable for creating lines with delimited fields.

Variable    Function                                                  Default value

NR          Cumulative number of lines read
FS          Input field separator                                     space
OFS         Output field separator                                    space
OFMT        Default floating-point format                             %.6g
RS          Record separator                                          newline
NF          Number of fields in the current line (record)
FILENAME    Current input file
ARGC        Number of arguments in command line
ARGV        Array containing list of arguments
ENVIRON     Associative array containing all environment variables
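
Several of these variables can be watched at work on two invented colon-delimited lines. $NF in the sketch means the field whose number is NF, that is, the last field of the line.

```shell
# FS splits the input on colons; OFS joins the print arguments with a tilde.
fields=$(printf 'a:b:c\nd:e\n' | awk '
BEGIN { FS = ":"; OFS = "~" }
{ print NR, NF, $1, $NF }')
echo "$fields"
# 1~3~a~c
# 2~2~d~e

# OFMT controls how print shows numbers that are not whole.
fmt=$(awk 'BEGIN { print 22 / 7; OFMT = "%.2f"; print 22 / 7 }')
echo "$fmt"
# 3.14286
# 3.14
```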

9.11 Functions

awk has several built-in functions that perform arithmetic and string operations. The arguments are
passed to a function C-style: delimited by commas and enclosed by a matched pair of parentheses.
However, awk does allow some calls without parentheses, such as length and the print and printf
statements.

Built-in functions

Function               Description

Arithmetic

int(x)                 returns integer value of x
sqrt(x)                returns square root of x

String

length()               returns length of a complete line
length(x)              returns length of x
tolower(s)             returns string s after conversion to lowercase
toupper(s)             returns string s after conversion to uppercase
substr(stg, m)         returns remaining string from position m in string stg
substr(stg, m, n)      returns portion of string of length n, starting from position m in string stg
index(s1, s2)          returns position of string s2 in string s1
split(stg, arr, ch)    splits string stg into array arr using ch as delimiter; returns number of fields
system("cmd")          runs UNIX command cmd and returns its exit status
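
A quick way to get a feel for the functions in the table is to call them from a BEGIN block. The sample string and the names s, n, parts, and status are made up for the sketch.

```shell
funcs=$(awk 'BEGIN {
    s = "Awk Programming"
    print length(s), toupper(s), tolower(s)
    print substr(s, 5), substr(s, 1, 3), index(s, "Prog")
    n = split("a:b:c", parts, ":")     # n is the number of pieces
    print n, parts[1], parts[3]
    print int(7.9), sqrt(16)
    status = system("true")            # the true command exits with status 0
    print status
}')
echo "$funcs"
# 15 AWK PROGRAMMING awk programming
# Programming Awk 5
# 3 a c
# 7 4
# 0
```

Note that string positions start at 1, so substr(s, 5) begins at the P of Programming.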

9.12 Control Flow – The if Statement

Like any programming language, awk supports conditional structures (the if statement) and loops
(while and for). The if statement permits two-way decision making.
Syntax:

if (condition is true) {
    statement(s)
} else {        # else is optional
    statement(s)
}

The condition itself must be enclosed in parentheses. As in C, the statements form a code
block delimited by curly braces. Also, as in C, the { and } are required only when multiple statements
are executed. The else section is optional.
Example:

if ($4 < 10000)
    commission = 0.15*$4
else
    commission = 0.10*$4
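
An if statement has to sit inside an action, so a runnable version of the commission example looks something like the sketch below. The two input records are invented, and the amount is in the second field here rather than the fourth.

```shell
comm=$(printf 'x 8000\ny 20000\n' | awk '{
    if ($2 < 10000)
        commission = 0.15 * $2     # 15% on small amounts
    else
        commission = 0.10 * $2     # 10% otherwise
    print $1, commission
}')
echo "$comm"
# x 1200
# y 2000
```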

9.13 Looping with for

Both for and while loops execute the loop body as long as the control condition returns a true value.
for has two forms.

First form of the for loop:

for (k = 1; k <= 100; k += 2)
    statement(s)

Example (builds a line from the fields of the root and uucp entries in reverse order):

BEGIN { FS = ":" }
{
    if ($1 ~ /^root$|^uucp$/) {
        line = ""
        for (i = NF; i > 0; i--)
            line = line ":" $i
        print line
    }
}

Second form: using for with an associative array

for (k in arr) {
    statement(s)
}
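
Both forms of for can be tried on inline data. In this sketch the records and the array name sum are hypothetical, and the output of the second command is piped through sort because awk does not promise any particular order when stepping through an array with for (k in arr).

```shell
# First form: walk the fields from last to first and rebuild the line.
rev=$(printf 'root:x:0:0\n' | awk -F: '{
    line = $NF
    for (i = NF - 1; i > 0; i--)
        line = line ":" $i
    print line
}')

# Second form: visit every index of an associative array.
totals=$(printf 'a 1\nb 2\na 3\n' | awk '
{ sum[$1] += $2 }
END { for (k in sum) print k, sum[k] }' | sort)

printf '%s\n' "$rev" "$totals"
```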

9.14 Looping with while

The while loop has a similar role to play; it repeatedly executes the loop body as long as the
condition remains true. Syntax:

while (condition is true) {
    statement(s)
}

Like for, while can use the continue statement to skip to the next iteration and break to exit the
loop.
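
The effect of continue and break inside a while loop can be sketched in a BEGIN block. The loop bounds are arbitrary values chosen for the example.

```shell
loop=$(awk 'BEGIN {
    i = 0
    while (i < 10) {
        i++
        if (i % 2 == 0) continue   # even number: go straight to the next iteration
        if (i > 7) break           # first odd number above 7: leave the loop
        printf "%d ", i
    }
    printf "\n"
}')
echo "$loop"
```

The loop prints the odd numbers 1, 3, 5, and 7 and then stops when i reaches 9.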
