AWK Programming
Introduction
Computer users spend a lot of time doing simple, mechanical data manipulation - changing the format of data, checking its validity, finding items with some property, adding up numbers, printing reports, and the like. All of these jobs ought to be mechanized, but it's a real nuisance to have to write a special-purpose program in a standard language like C or Pascal each time such a task comes up. Awk is a programming language that makes it possible to handle simple, mechanical data manipulation tasks with very short programs, often only one or two lines long. An awk program is a sequence of patterns and actions that tell what to look for in the input data and what to do when it's found.
(Aho, Kernighan, and Weinberger. 1988. The AWK Programming Language.)
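The following two commands are equivalent; each prints every line of the input:

    awk '{print}' emp.data
    awk '{print $0}' emp.data

Individual fields and built-in variables can be printed the same way:

    awk '{print $1, $3}' emp.data
    awk '{print NF, $1, $NF}' emp.data

(Any expression can be used after $ to denote a field.)

Computing and Printing

    awk '{print $1, $2 * $3}' emp.data

Printing Line Numbers

    awk '{print NR, $0}' emp.data

Putting Text in the Output

    awk '{print "total pay for", $1, "is", $2 * $3}' emp.data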
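The one-liners above can be tried end to end with a small script. This is only a sketch: the contents of emp.data are not reproduced in this handout, so the six rows below (name, hourly rate, hours worked) are made-up sample data, and the shell variable names are illustrative.

```shell
#!/bin/sh
# Hypothetical sample data: name, hourly rate, hours worked.
dir=$(mktemp -d)
cat > "$dir/emp.data" <<'EOF'
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
EOF

# NF is the number of fields on the line, so $NF is the last field.
counts=$(awk '{ print NF, $1, $NF }' "$dir/emp.data")

# NR is the number of lines read so far; $0 is the whole line.
numbered=$(awk '{ print NR, $0 }' "$dir/emp.data")

# Quoted strings are printed verbatim between the computed values.
pay=$(awk '{ print "total pay for", $1, "is", $2 * $3 }' "$dir/emp.data")

printf '%s\n\n%s\n\n%s\n' "$counts" "$numbered" "$pay"
rm -r "$dir"
```

With these sample rows, the first line of each result is "3 Beth 0", "1 Beth 4.00 0", and "total pay for Beth is 0".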
1.3 Fancier Output
The print statement is meant for quick and easy output. To format the output exactly the way you want it, use the printf statement.
Lining Up Fields
The printf statement has the form

    printf(format, value1, value2, ..., valuen)

where format is a string that contains text to be printed verbatim, interspersed with specifications of how each of the values is to be printed. A specification is a % followed by a few characters that control the format of a value.

Task: Use printf to print the total pay for every employee.

    awk '{printf("total pay for %s is $%.2f\n", $1, $2 * $3)}' emp.data

With printf, no blanks or newlines are produced automatically; you must create them yourself. Don't forget the \n.
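To make the point about \n concrete, here is a small sketch that compares a format with a trailing newline against one without. The three input rows are made-up sample data in the name / rate / hours layout, not the real emp.data.

```shell
#!/bin/sh
# Hypothetical sample rows: name, hourly rate, hours worked.
rows='Kathy 4.00 10
Mark 5.00 20
Susie 4.25 18'

# %s prints a string; %.2f prints a number with two decimal places.
with_nl=$(printf '%s\n' "$rows" |
    awk '{ printf("total pay for %s is $%.2f\n", $1, $2 * $3) }')

# Without \n (or any blank) in the format, the output runs together.
without_nl=$(printf '%s\n' "$rows" | awk '{ printf("%s", $1) }')

printf '%s\n' "$with_nl"
printf '%s\n' "$without_nl"
```

The first command prints one line per employee, ending with "total pay for Susie is $76.50"; the second prints the single string "KathyMarkSusie".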
Task: Print each employee's name and pay.

    awk '{printf("%-8s $%6.2f\n", $1, $2 * $3)}' emp.data

Sorting the Output
Task: Print all data for each employee, along with his or her pay, sorted in order of increasing pay.

    awk '{printf("%6.2f %s\n", $2 * $3, $0)}' emp.data | sort

This pipes the output of awk into the sort command.

1.4 Selection
Awk patterns are good for selecting interesting lines from the input for further processing.
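As a rough illustration of how patterns select lines: a pattern with no action prints each matching line unchanged, and a pattern with an action runs that action only on matching lines. The rows below are made-up sample data (name, hourly rate, hours worked), and the variable names are illustrative.

```shell
#!/bin/sh
# Hypothetical sample rows: name, hourly rate, hours worked.
rows='Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20'

# Pattern only: lines whose third field is zero are printed as they are.
idle=$(printf '%s\n' "$rows" | awk '$3 == 0')

# Pattern plus action: the action runs only for lines that match.
worked=$(printf '%s\n' "$rows" | awk '$3 > 0 { print $1, $2 * $3 }')

printf '%s\n\n%s\n' "$idle" "$worked"
```

Here the first command selects the Beth and Dan lines, and the second prints "Kathy 40" and "Mark 100".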
Selection by Comparison
Task: Use a comparison pattern to select the records of employees who earn $5.00 or more per hour.

    awk '$2 >= 5' emp.data

Selection by Computation
Task: Print the pay of those employees whose total pay exceeds $50.

    awk '$2 * $3 > 50 {printf("$%.2f for %s\n", $2 * $3, $1)}' emp.data

Selection by Text Content
Task: Print all lines in which the first field is Susie.

    awk '$1 == "Susie"' emp.data

Combinations of Patterns
Patterns can be combined with parentheses and the logical operators &&, ||, and !, which stand for AND, OR, and NOT, respectively.

Task: Print lines where $2 is at least 4 or $3 is at least 20.

    awk '$2 >= 4 || $3 >= 20' emp.data

Data Validation
Awk is an excellent tool for checking that data has reasonable values and that it is in the right format.

Task: Use comparison patterns to apply five plausibility tests to each line of emp.data.

    awk 'NF != 3 {print $0, "number of fields is not equal to 3"}' emp.data
    awk '$2 < 3.35 {print $0, "rate is below minimum wage"}' emp.data
    awk '$2 > 10 {print $0, "rate exceeds $10 per hour"}' emp.data
    awk '$3 < 0 {print $0, "negative hours worked"}' emp.data
    awk '$3 > 60 {print $0, "too many hours worked"}' emp.data
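The five tests can also be combined into a single awk program, so that the file is read once and every line is checked against every pattern. The sketch below assumes a deliberately faulty, made-up emp.data; each bad row is there to trigger exactly one test.

```shell
#!/bin/sh
# Hypothetical data with one deliberate error per row (Beth is clean).
dir=$(mktemp -d)
cat > "$dir/emp.data" <<'EOF'
Beth 4.00 0
Dan 3.00 0
Kathy 4.00 10 extra
Mark 12.50 20
Mary 5.50 -4
Susie 4.25 75
EOF

# One program, five pattern-action statements; each line is tested by all.
report=$(awk '
NF != 3   { print $0, "number of fields is not equal to 3" }
$2 < 3.35 { print $0, "rate is below minimum wage" }
$2 > 10   { print $0, "rate exceeds $10 per hour" }
$3 < 0    { print $0, "negative hours worked" }
$3 > 60   { print $0, "too many hours worked" }
' "$dir/emp.data")

printf '%s\n' "$report"
rm -r "$dir"
```

A file with no errors produces no output at all, which makes this kind of check convenient to run routinely.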
BEGIN and END
The special pattern BEGIN matches before the first line of the first input file is read, and END matches after the last line of the last file has been processed.

Task: Use BEGIN to print a heading. (Note: This program spans multiple lines, so store it in a file and run it with the -f option.)

    BEGIN {print "Name Rate Hours"; print ""}
          {print}

You can put several statements on a single line if you separate them by semicolons.
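A sketch of running such a program from a file is shown here. The file name heading.awk, the END rule, and the three data rows are additions for illustration only.

```shell
#!/bin/sh
# Hypothetical sample data: name, hourly rate, hours worked.
dir=$(mktemp -d)
cat > "$dir/emp.data" <<'EOF'
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
EOF

# heading.awk is a hypothetical file name for the program.
cat > "$dir/heading.awk" <<'EOF'
BEGIN { print "Name Rate Hours"; print "" }   # runs once, before any input
      { print }                               # runs for every input line
END   { print ""; print NR, "lines read" }    # runs once, after all input
EOF

# -f tells awk to read the program from a file instead of the command line.
out=$(awk -f "$dir/heading.awk" "$dir/emp.data")
printf '%s\n' "$out"
rm -r "$dir"
```

The output starts with the heading and a blank line, continues with the three data lines, and ends with a blank line followed by "3 lines read".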
1.5 Computing with AWK
In awk, user-created variables are not declared.
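A variable comes into existence the first time it is used, with the value 0 when treated as a number and the empty string when treated as text. That is why a counter such as emp = emp + 1 needs no initialization. A minimal sketch (the variable names n and s are arbitrary):

```shell
#!/bin/sh
# The whole program is a BEGIN action, so no input file is needed.
out=$(awk 'BEGIN {
    print n + 0          # n was never assigned: numeric value 0
    print "[" s "]"      # s was never assigned: empty string
    n = n + 1            # so a counter can be incremented right away
    print n
}')
printf '%s\n' "$out"
```

This prints 0, then [], then 1.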
Counting
Task: Use a variable emp to count employees who have worked more than 15 hours.

    $3 > 15 {emp = emp + 1}
    END     {print emp, "employees worked more than 15 hours"}

Computing Sums and Averages
Task: Use the built-in variable NR to count the number of employees.

    awk 'END {print NR, "employees"}' emp.data

Task: Compute the average pay.

        {pay = pay + $2 * $3}
    END {print NR, "employees"
         print "Total pay is", pay
         print "average pay is", pay / NR
        }

Handling Text
One strength of awk is its ability to handle strings of characters as conveniently as most languages handle numbers.

Task: Find the employee who is paid the most per hour.

    $2 > maxrate {maxrate = $2; maxemp = $1}
    END          {print "highest hourly rate:", maxrate, "for", maxemp}

String Concatenation
Task: Create new strings by combining old ones.

        {names = names $1 " "}
    END {print names}

Built-in Functions
Awk provides built-in variables that maintain frequently used quantities, such as the number of fields (NF) and the input line number (NR).
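Built-in functions are used in the same way as built-in variables. The sketch below prints NR and NF next to the result of length, a standard awk function that returns the number of characters in a string. The choice of length is only an example, since the handout does not name a specific function, and the rows are made-up sample data (the second row has only two fields on purpose).

```shell
#!/bin/sh
# Hypothetical sample rows; the second one is missing its hours field.
rows='Beth 4.00 0
Dan 3.75
Kathy 4.00 10'

# For each line: line number, field count, length of the first field.
out=$(printf '%s\n' "$rows" | awk '
    { print NR, NF, length($1) }
END { print NR, "lines" }
')
printf '%s\n' "$out"
```

In the END action, NR still holds the number of the last line read, so it doubles as a line count.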
1.6 Control-Flow Statements
(Note: These constructs are available in gawk and not awk at ISU.)

If-Else Statement

    $2 > 6 {n = n + 1; pay = pay + $2 * $3}
    END    {if (n > 0)
                print n, "employees, Total pay is", pay,
                         "average pay is", pay / n
            else
                print "no employees are paid more than $6/hour"
           }

While Statement
Task: Show how the value of an amount of money invested at a particular interest rate grows over a number of years, using the formula value = amount (1 + rate)^years.

    # interest1 - compute compound interest
    #   input: amount rate years
    #   output: compounded value at the end of each year
    {   i = 1
        while (i <= $3) {
            printf("\t%.2f\n", $1 * (1 + $2) ^ i)
            i = i + 1
        }
    }

Try

    gawk -f interest1

and then type the input lines

    1000 .06 5
    1000 .12 5

References
Aho, A. V., B. W. Kernighan, and P. J. Weinberger. 1988. The AWK Programming Language. Addison-Wesley. New York.
Dougherty, D. 1990. Sed and Awk: UNIX Power Tools. O'Reilly and Associates, Inc. California.