8 - Awk Programming
awk is a pattern-scanning and text-processing language. It is small, fast, and simple, unlike, say, Perl,
and it has a clean, comprehensible, C-like input language. While awk cannot do everything you can
do in Perl, it handles most text-processing tasks and is much easier to work with.
awk is also used as a filter, just like the sed and grep commands in UNIX.
An awk command line has the general form awk options 'selection_criteria { action }' file(s). The
selection_criteria and the action together constitute the awk program, which is enclosed in single
quotes. Unlike most other filters, awk uses a contiguous sequence of spaces and tabs as the default
field delimiter. Fields are numbered $1, $2, and so on, and $0 addresses the entire line. A selection
criterion such as $3 > 200, for example, tests whether the third field is greater than 200. Because the
shell would otherwise try to expand $1, $2, and the like as its own variables, any awk program that
uses them must be enclosed in single quotes.
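As a minimal sketch of a selection criterion paired with an action, the command below keeps only the lines whose third field exceeds 200. The three input lines (item, unit, quantity) are made up for illustration.

```shell
# Hypothetical whitespace-separated data: item, unit, quantity
printf 'bolts box 150\nnuts box 320\nscrews bag 210\n' |
awk '$3 > 200 { print $1, $3 }'
# prints:
# nuts 320
# screws 210
```

The single quotes keep the shell from touching $1 and $3, so awk sees them unchanged.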
In its simplest usage awk is meant for processing column-oriented text data, such as tables, presented
to it on standard input. The variables $1, $2, and so forth are the contents of the first, second, etc.
column of the current input line. For example, to print the second column of a file, you might use the
following simple awk script:
awk < file '{ print $2 }'
By default awk splits input lines into fields based on whitespace, that is, spaces and tabs. You can
change this by using the -F option to awk and supplying another character. For instance, to print the
home directories of all users on the system, you might do
awk < /etc/passwd -F: '{ print $6 }'
since the password file has fields delimited by colons and the home directory is the 6th field.
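Since /etc/passwd differs from system to system, the same idea can be tried on fixed input. This sketch assumes two invented lines in the seven-field passwd layout and prints the login name along with the home directory.

```shell
# Two made-up lines in /etc/passwd format (seven colon-separated fields)
printf 'alice:x:1001:1001:Alice:/home/alice:/bin/sh\nbob:x:1002:1002:Bob:/home/bob:/bin/sh\n' |
awk -F: '{ print $1, $6 }'
# prints:
# alice /home/alice
# bob /home/bob
```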
9.2 Using print and printf
awk uses the print and printf statements to write to standard output. print produces unformatted
output. A comma in the field list ($1, $2) ensures that the fields are not glued together; each comma
is printed as the output field separator, which is a single space by default and can be changed by
setting the built-in variable OFS.
When placing multiple statements on a single line, use the semicolon (;) as their delimiter.
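A small sketch of the difference the comma makes, using one made-up input line. The two print statements share a line and are separated by a semicolon.

```shell
printf 'alpha beta\n' | awk '{ print $1, $2; print $1 $2 }'
# prints:
# alpha beta
# alphabeta
```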
With the C-like printf statement, you can use awk as a stream formatter. printf takes a quoted format
string and a field list. awk accepts most of the formats used by the printf function in C and the
printf command. They include %s for a string, %d for an integer, and %f for a floating-point number.
Unlike print, printf does not add a newline, so the format should normally end with \n.
Example: awk -F: '{ printf("%d %12s\n", $1, $2) }' list.txt
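To see the three format types side by side, here is a sketch on a single invented colon-delimited record. The | characters in the format are only there to make the field widths visible.

```shell
printf '7:widget:2.5\n' | awk -F: '{ printf "%d|%-8s|%6.2f\n", $1, $2, $3 }'
# prints: 7|widget  |  2.50
```

%-8s left-justifies the string in eight columns, and %6.2f right-justifies the number in six columns with two decimal places.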
Every print and printf statement can be separately redirected with the > and | symbols. However,
the filename or command name that follows these symbols must be enclosed in double quotes.
For example, when redirecting to a file named mlist:
printf "%d %-10s %-12s %-8s\n", $1, $3, $4, $6 > "mlist"
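The sketch below shows both kinds of redirection in one program, run inside a scratch directory so that no existing file is overwritten. The file name names.txt and the input lines are assumptions made for the illustration.

```shell
tmp=$(mktemp -d)    # scratch directory, so no existing file is overwritten
(
  cd "$tmp" || exit 1
  printf '3 c\n1 a\n2 b\n' |
  awk '{ print $2 > "names.txt"     # file name in double quotes
         print $1 | "sort -n" }'    # command in double quotes
  cat names.txt
)
rm -r "$tmp"
# prints 1, 2, 3 (from sort), then c, a, b (from names.txt), one per line
```

The sorted numbers appear only when awk finishes, because sort cannot produce output until awk closes the pipe.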
awk supports computation using the arithmetic operators +, -, *, and /, as well as the modulo
operator (%). awk uses the symbol ^ for exponentiation; for example, 2^10 is 1024.
The statements x++ and ++x are similar but not identical:
kount = count = 5
print ++kount    # increments kount first and then prints 6
print count++    # prints 5 and then sets count to 6
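These operators can be tried without any input file by placing the statements in a BEGIN section. A minimal sketch:

```shell
awk 'BEGIN {
    print 7 % 3, 2^10, 7 / 2     # modulo, exponentiation, division
    kount = count = 5
    print ++kount                # pre-increment: prints 6
    print count++                # post-increment: prints 5
    print count                  # count is now 6
}'
# prints:
# 1 1024 3.5
# 6
# 5
# 6
```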
9.4 Variables and Expressions
Expressions comprise strings, numbers, variables, and entities that are built by combining them with
operators. For example, (x + 5)*12 is an expression. Unlike languages such as C, awk does not have
primitive data types. Every expression can be interpreted as either a string or a number, and, just
as in Perl, awk makes the necessary conversion according to context.
awk also allows user-defined variables, which need no declaration. Variable names are
case-sensitive: x is different from X. A variable is deemed to be declared the first time it is used.
Unlike shell variables, awk variables don't use the $ either in assignment or in evaluation:
x = "5"
print x
Strings in awk are always double-quoted and can contain any character. String concatenation is
achieved by simply placing the strings side by side.
NB: Variables are not declared, and their type is not specified. awk determines the type from
context and initializes each variable to zero or a null string.
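A short sketch of context-driven conversion, concatenation, and default initialization. The variable names n and s are arbitrary and are deliberately never assigned.

```shell
awk 'BEGIN {
    x = "5"              # a string that looks like a number
    print x + 2          # numeric context: prints 7
    print x "2"          # string context (concatenation): prints 52
    print n + 0          # unset variable used as a number: prints 0
    print "[" s "]"      # unset variable used as a string: prints []
}'
```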
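Unlike the shell and Perl, awk has a single set of comparison operators for handling both strings and
numbers, and two separate operators for matching regular expressions.
Operator    Significance
<           Less than
<=          Less than or equal to
==          Equal to
!=          Not equal to
>=          Greater than or equal to
>           Greater than
~           Matches a regular expression
!~          Does not match a regular expression
Example (a comparison on a field value): awk -F: '$6 > 120000 { print $2, $6 }' emplist.txt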
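The following sketch combines a string comparison, a numeric comparison, and both regular-expression operators. The three colon-delimited records (name, designation, salary) are invented and do not come from emplist.txt.

```shell
printf 'kumar:Director:150000\nshah:Manager:90000\nrao:Director:110000\n' |
awk -F: '$2 == "Director" && $3 > 120000 { print "high:", $1 }
         $1 ~ /^s/                       { print "s-name:", $1 }
         $2 !~ /^Dir/                    { print "not director:", $1 }'
# prints:
# high: kumar
# s-name: shah
# not director: shah
```

Every rule is tested against every line, so a single line (shah here) can trigger more than one action.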
Large awk programs should be held in a file with the .awk extension for easier identification.
Consider:
$ cat empawk.awk
BEGIN { FS = "\t" }
$3 == "Director" && $4 > 120000 {
    printf "%4d %-20s %-12s %d\n", ++kount, $2, $3, $4
}
To run the program, use: awk -f empawk.awk empn.list
The -f option tells awk to read the program from the named file. No -F option is needed here
because the program sets FS itself in a BEGIN section.
Notice that the program in the file is not enclosed within single quotes. Quotes are needed only when
the program is specified on the command line, including when the entire awk command line is held in
a shell script.
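An end-to-end sketch of the -f workflow follows. The file names high.awk and emp.txt, and the tab-separated records in them, are hypothetical stand-ins created in a scratch directory.

```shell
tmp=$(mktemp -d)
# Program file: no surrounding single quotes are needed inside the file
cat > "$tmp/high.awk" <<'EOF'
BEGIN { FS = "\t" }
$3 == "Director" && $4 > 120000 {
    printf "%d %s %d\n", ++kount, $2, $4
}
EOF
# Hypothetical tab-separated data: id, name, designation, salary
printf '1001\tA. Kumar\tDirector\t150000\n1002\tB. Shah\tManager\t90000\n1003\tC. Rao\tDirector\t130000\n' > "$tmp/emp.txt"
awk -f "$tmp/high.awk" "$tmp/emp.txt"
# prints:
# 1 A. Kumar 150000
# 2 C. Rao 130000
rm -r "$tmp"
```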
The BEGIN and END sections are optional. Both require opening and closing curly braces and take
the form:
BEGIN { action }
END { action }
When present, they bracket the body of the awk program: the BEGIN action runs once before the first
input line is read, and the END action runs once after the last line has been processed. You can use
them to print a suitable heading at the beginning and the average salary at the end.
Like the shell, awk uses # for comments.
Example: empawk2.awk
BEGIN {
    FS = "\t"
    printf "\t\tEmployee abstract\n\n"
}
$4 > 120000 {                     # Increment variables for serial number and pay
    kount++; total += $4          # Multiple assignments in one line
    printf "%3d %-20s %-12s %d\n", kount, $2, $3, $4
}
END {
    if (kount > 0)                # Avoid dividing by zero when no line matches
        printf "\n\tThe average salary is %6d\n", total/kount
}
Like all standard filters, awk reads standard input when the filename is omitted. We can also make
awk behave like a simple scripting language by doing all the work in the BEGIN section.
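Both points can be sketched briefly. The first command reads three made-up numbers from standard input. The second has only a BEGIN section, so awk never reads any input at all.

```shell
# Reading standard input: no file name is given
printf '10\n20\n60\n' | awk '{ total += $1 } END { print "average:", total / NR }'
# prints: average: 30

# No input at all: the whole job is done in BEGIN
awk 'BEGIN { for (i = 1; i <= 3; i++) sum += i * i; print "sum of squares:", sum }'
# prints: sum of squares: 14
```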
9.9 Arrays in awk
An array is also a variable, except that it can store a set of values or elements. Each element
is accessed by a subscript called the index. Arrays in awk:
- Are not formally defined. An array is considered declared the moment it is used.
- Have elements that are initialized to zero or an empty string unless initialized explicitly.
- Do not have a fixed size; they expand automatically.
- Can use virtually anything as an index, even a string.
Example: empawk3.awk
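A minimal sketch of these array properties, using invented two-field input (name, designation). It is an illustration only and not the original empawk3.awk listing. The array count is never declared, it is indexed by strings, and an element that was never set evaluates to zero.

```shell
printf 'kumar Director\nshah Manager\nrao Director\n' |
awk '{ count[$2]++ }                 # element starts out as 0, then is incremented
     END { print "Director", count["Director"]
           print "Manager", count["Manager"]
           print "Clerk", count["Clerk"] + 0 }'   # never set: evaluates to 0
# prints:
# Director 2
# Manager 1
# Clerk 0
```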
awk arrays are associative (hashes), where information is held as key-value pairs. The index is the
key, which is saved internally as a string. When you set the array element mon[1] = "Jan", awk
converts the number 1 to the string "1".
Example:
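The conversion of a numeric index to a string can be observed directly. In this sketch the element stored under the number 1 is retrieved with the string "1", while the string "01" is a different key.

```shell
awk 'BEGIN {
    mon[1] = "Jan"
    print mon["1"]                                       # same element: prints Jan
    if ("1" in mon) print "key 1 exists"
    if (!("01" in mon)) print "string 01 is a different key"
}'
# prints:
# Jan
# key 1 exists
# string 01 is a different key
```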
The environment array ENVIRON[ ]: awk maintains the associative array ENVIRON[ ] to store all
the environment variables, indexed by variable name.
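As a quick sketch, the command below sets a variable named GREETING (a made-up name) in awk's environment for one invocation and reads it back through ENVIRON.

```shell
GREETING=hello awk 'BEGIN { print ENVIRON["GREETING"] }'
# prints: hello
```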
The FS variable: awk uses a contiguous string of spaces and tabs as the default field delimiter. FS
redefines this field separator. To take effect from the first input line, it must be set in the BEGIN
section:
BEGIN { FS = ":" }
This is an alternative to the -F: option of the command, which does the same thing.
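The equivalence of the two approaches can be checked on one made-up line:

```shell
printf 'a:b:c\n' | awk 'BEGIN { FS = ":" } { print $2 }'
printf 'a:b:c\n' | awk -F: '{ print $2 }'
# both commands print: b
```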
The OFS variable: a space is awk's default output field separator. It can be reassigned using
the variable OFS in the BEGIN section as follows:
BEGIN { OFS = "~" }
When you reassign this variable to ~ (tilde), awk uses that character to delimit the arguments of
print wherever they are separated by commas. This is a useful variable for creating lines with
delimited fields.
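A one-line sketch of the effect, on invented input:

```shell
printf 'a b c\n' | awk 'BEGIN { OFS = "~" } { print $1, $2, $3 }'
# prints: a~b~c
```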
Variable    Function                                                Default value
OFMT        Output format for floating-point numbers                %.6g
RS          Record separator                                        newline
NF          Number of fields in the current record                  -
FILENAME    Name of the current input file                          -
ARGC        Number of arguments on the command line                 -
ARGV        Array containing the list of arguments                  -
ENVIRON     Associative array containing all environment variables  -
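A few of these variables can be inspected with short commands. The sketch below uses invented input and arguments. Note that ARGV[0] holds the command name, so ARGC is one more than the number of arguments that follow the program.

```shell
# NF is the field count, so $NF is the last field of each line
printf 'one two three\nfour five\n' | awk '{ print NF, $NF }'
# prints:
# 3 three
# 2 five

# ARGC and ARGV: with only a BEGIN section, the arguments are not opened as files
awk 'BEGIN { print ARGC, ARGV[1], ARGV[2] }' first second
# prints: 3 first second

# OFMT controls how print shows non-integer numbers
awk 'BEGIN { OFMT = "%.2f"; print 3.14159 }'
# prints: 3.14
```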
9.11 Functions
awk has several built-in functions that perform arithmetic and string operations. Arguments are
passed to a function C-style: delimited by commas and enclosed in a matched pair of parentheses.
Some built-ins can be used without parentheses, however: print and printf (which are strictly
statements) take an unparenthesized argument list, and length with no argument returns the length
of the current line.
Built-in functions fall into two groups: arithmetic functions and string functions.
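As a hedged sampler of both groups, the sketch below calls a handful of built-ins found in standard awk implementations. The selection is illustrative and is not meant to reproduce the original table.

```shell
awk 'BEGIN {
    print int(7.9), sqrt(16)                # arithmetic: prints 7 4
    s = "awk programming"
    print length(s), index(s, "pro")        # prints 15 5
    print substr(s, 5, 7), toupper("awk")   # prints program AWK
    n = split("a:b:c", parts, ":")          # fills parts[1..3], returns 3
    print n, parts[3]                       # prints 3 c
}'
```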
Like any programming language, awk supports conditional structures (the if statement) and loops
(while and for). The if statement permits two-way decision making.
Syntax:
if (condition is true) {
    statement(s)
} else {    # else is optional
    statement(s)
}
The control command itself must be enclosed in parentheses. As in C, the statements form a code
block delimited by curly braces, and the { and } are required only when multiple statements
are executed. The else section is optional.
Example:
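A minimal sketch of a two-way decision, assuming invented name and salary pairs. Each branch holds a single statement, so the curly braces around the branches are omitted.

```shell
printf 'kumar 150000\nshah 90000\n' |
awk '{ if ($2 > 120000)
           print $1, "high"
       else
           print $1, "regular" }'
# prints:
# kumar high
# shah regular
```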
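Both the for and while loops execute the loop body as long as the control command returns a true
value. for has two forms. The first, as in C, combines an initialization, a condition, and an
increment. The following program takes the colon-delimited lines whose first field is root or uucp
and prints their fields in reverse order, each preceded by a colon:
BEGIN { FS = ":" }
{
    if ($1 ~ /^root$|^uucp$/) {
        line = ""
        for (i = NF; i > 0; i--)
            line = line ":" $i
        print line
    }
}
The second form steps through the indices of an array:
for (k in arr) {
    statement(s)
}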
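Both forms can be sketched on small, made-up data. The first command applies the field-reversing loop to two invented three-field lines. The second visits every index of a hypothetical array arr. The order in which for (k in arr) visits the indices is not specified, so the sketch computes only order-independent totals.

```shell
# C-style for: reverse the fields of the line whose first field is root
printf 'root:x:0\nalice:x:1001\n' |
awk -F: '$1 ~ /^root$|^uucp$/ { line = ""; for (i = NF; i > 0; i--) line = line ":" $i; print line }'
# prints: :0:x:root

# for (k in arr): count the elements and add up their values
awk 'BEGIN { arr["a"] = 1; arr["b"] = 2; arr["c"] = 3
             for (k in arr) { n++; sum += arr[k] }
             print n, sum }'
# prints: 3 6
```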
The while loop has a similar role to play; it repeats the loop body as long as the control command
succeeds. Syntax:
while (condition is true) {
    statement(s)
}
Like for, while can use the continue statement to start the next iteration prematurely and break to
exit the loop.
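A small sketch of while together with continue and break. The loop counts upward, skips the even numbers, and stops as soon as the counter passes 7.

```shell
awk 'BEGIN {
    i = 0
    while (i < 10) {
        i++
        if (i % 2 == 0) continue   # skip even numbers
        if (i > 7) break           # leave the loop when i reaches 9
        print i
    }
}'
# prints 1, 3, 5, 7 (one per line)
```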