LinuxCBT AwkSed Edition Notes
###SED's FEATURES###
1. Non-interactive editor
2. Stream Editor
a. Manipulates input - performing edits as instructed
b. Sed accepts input on/from: STDIN (Keyboard), File, Pipe (|)
3. Loops through ALL lines of the input stream or file, by DEFAULT
4. Does NOT operate on the source file, by default. (Will NOT clobber the original
file, unless instructed to do so)
5. Supports addresses to indicate which lines to operate on: /^$/d - deletes blank
lines
6. Stores the active (current) line in the 'pattern space' and maintains a 'hold space'
for later use
7. Used primarily to perform Search-and-Replaces
###AWK's FEATURES###
1. Field processor based on whitespace, by default
2. Used for reporting (extracting specific columns) from data feed
3. Supports programming constructs
a. loops (for,while,do)
b. conditions (if, else) - Awk has no 'then' keyword
c. arrays (lists)
d. functions (string, numeric, user-defined)
4. Automatically tokenizes words in a line for later usage - $1, $2, $3, etc.
(This is based on the current delimiter)
5. Automatically loops through input like Sed, making lines available for
processing
6. Ability to execute shell commands using the 'system()' function
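As a rough illustration of feature 6, the sketch below calls system() from a BEGIN block. The echoed text and the variable names (rc, sys_out) are made up for this example.

```shell
# Run a shell command from inside awk; system() returns the exit status of that command.
sys_out=$(awk 'BEGIN {
    rc = system("echo hello from the shell")   # the child process writes straight to stdout
    print "exit status:", rc                   # 0 means the command succeeded
}')
printf '%s\n' "$sys_out"
```

The return value can be tested inside Awk to branch on whether the shell command succeeded.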
###METACHARACTERS###
^ - matches the character(s) at the beginning of a line
a. sed -ne '/^dog/p' animals.txt
###CHARACTERS CLASSES###
Allow searching for a range of characters
a. [0-9]
b. [a-z], [A-Z]
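A minimal sketch of the anchor and the character classes in use. The three sample lines are hypothetical (the real animals.txt is not reproduced in these notes), and LC_ALL=C is an added assumption so that the a-z range behaves the same in every locale.

```shell
# Hypothetical sample lines standing in for animals.txt.
sample=$(printf 'dog12\ncat\nDeer7')

# [0-9] matches any single digit: keep only lines that contain one.
with_digits=$(printf '%s\n' "$sample" | LC_ALL=C sed -n '/[0-9]/p')

# ^[a-z] requires a lowercase letter at the start of the line.
# LC_ALL=C keeps the a-z range from matching uppercase letters in some locales.
lower_start=$(printf '%s\n' "$sample" | LC_ALL=C sed -n '/^[a-z]/p')

printf 'with digits:\n%s\nlowercase start:\n%s\n' "$with_digits" "$lower_start"
```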
###INTRO TO SED###
Usage:
1. sed [options] 'instruction' file | PIPE | STDIN
2. sed -e 'instruction1' -e 'instruction2' ...
3. sed -f script_file_name file
Note: Supply Sed instructions in one of the following ways:
1. Command-line
2. Script File
sed -e '/^$/d' animals.txt > animals2.txt - deletes blank lines from file and
creates new output file 'animals2.txt'
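The following sketch shows that redirecting Sed's output leaves the source file untouched. The file contents and the temporary directory are assumptions made for the demo.

```shell
tmp=$(mktemp -d)                                    # scratch directory for the demo
printf 'dog\n\ncat\n\nbird\n' > "$tmp/animals.txt"  # hypothetical sample with 2 blank lines

# Delete blank lines; the result goes to a new file, not back into the source.
sed -e '/^$/d' "$tmp/animals.txt" > "$tmp/animals2.txt"

src_lines=$(grep -c '' "$tmp/animals.txt")    # still 5: the source is not modified
new_lines=$(grep -c '' "$tmp/animals2.txt")   # 3: the blank lines are gone
echo "source: $src_lines lines, copy: $new_lines lines"
rm -rf "$tmp"
```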
###SEARCH & REPLACE USING Sed###
General Usage:
sed -e 's/find/replace/g' animals.txt - replaces 'find' with 'replace'
Note: Left Hand Side (LHS) supports literals and RegExes
Note: Right Hand Side (RHS) supports literals and back references
Examples:
sed -e 's/LinuxCBT/UnixCBT/' - replaces 'LinuxCBT' with 'UnixCBT' on STDIN to
STDOUT
sed -e 's/LinuxCBT/UnixCBT/I' - replaces 'LinuxCBT' with 'UnixCBT' on STDIN to
STDOUT (Case-Insensitive)
Note: Replacements occur on the FIRST match, unless 'g' is appended to the
s/find/replace/g sequence
sed -e 's/LinuxCBT/UnixCBT/Ig' - replaces 'LinuxCBT' with 'UnixCBT' on STDIN to
STDOUT (Case-Insensitive & Global)
Task:
1. Remove ALL blank lines
2. Substitute 'cat', regardless of case, with 'Tiger'
Note: Whenever using the '-n' option, you MUST specify the print modifier 'p' to get
any output
sed -e '/^$/d' -e 's/Cat/Tiger/Ig' animals.txt - removes blank lines &
substitutes 'cat' with 'Tiger'
OR sed -e '/^$/d; s/Cat/Tiger/Ig' animals.txt - does the same as above
sed -ne '/^$/d; s/Cat/Tiger/Igp' animals.txt - prints ONLY the lines where a
substitution occurred
Note: Simply separate multiple commands with semicolons
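A short sketch of how '-n' and 'p' interact for this task. The input lines are hypothetical, and the bracket expression [Cc] is swapped in for the 'I' flag (a GNU Sed extension) so the example also runs on other Sed implementations.

```shell
# Default printing: every surviving line is emitted, changed or not.
all_lines=$(printf 'Cat\n\ndog\n\ncat\n' | sed -e '/^$/d' -e 's/[Cc]at/Tiger/g')

# With -n, nothing prints unless the p flag asks for it,
# so only lines where a substitution happened appear.
only_changed=$(printf 'Cat\n\ndog\n\ncat\n' | sed -n -e '/^$/d' -e 's/[Cc]at/Tiger/gp')

printf 'all:\n%s\nchanged only:\n%s\n' "$all_lines" "$only_changed"
```

Combining 'p' with default printing (no '-n') would emit each substituted line twice.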
###Focus on the Right Hand Side (RHS) of Search & Replace Functions in SED###
Note: SED reserves a few characters to help with substitutions based on the matched
pattern from the LHS
& = The full text matched by the LHS pattern (this equals the entire pattern space
only when the pattern matches the whole line, e.g. .*)
Task:
Prefix each line with the word 'Animal '
sed -ne 's/.*/&/p' animals.txt - replaces the matched pattern with the matched
pattern (no visible change)
sed -ne 's/.*/Animal &/p' animals.txt - Prefixes each line with 'Animal '
sed -ne 's/.*/Animal: &/p' animals.txt - Prefixes each line with 'Animal: '
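To make the role of '&' concrete, here is a small sketch on hypothetical input. The second command is an added illustration showing that '&' carries only the matched portion when the pattern does not cover the whole line.

```shell
# .* matches the whole line, so & stands for the entire line.
prefixed=$(printf 'dog\ncat\n' | sed -e 's/.*/Animal: &/')

# Here the pattern matches only the digits, so & is just "12".
wrapped=$(printf 'dog12\n' | sed -e 's/[0-9][0-9]*/(&)/')

printf '%s\n%s\n' "$prefixed" "$wrapped"
```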
###Sed Scripts###
Note: Sed supports scripting, which means, the ability to dump 1 or more
instructions into 1 file
Task:
Perform multiple transformations on animals.txt file
1. /^$/d - Removes blank lines
2. s/dog/frog/Ig - substitutes globally, 'dog' with 'frog' - (case-insensitive)
3. s/tiger/lion/Ig - substitute globally, 'tiger' with 'lion' - (case-insensitive)
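4. s/.*/Animals: &/ - Prefixes each line with 'Animals: '
5. s/animals/mammals/Ig - Replaces 'Animals' with 'mammals' - (case-insensitive)
6. s/\([a-z]*\)\([0-9]*\)/\1/Ip - Strips trailing numeric values from alphas
(Note: without '-n', the 'p' flag prints each substituted line twice)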
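The sketch below bundles a simplified variant of these transformations into a script file and runs it with '-f'. The file names, the sample data, the order of the steps, and the use of bracket expressions instead of the GNU-only 'I' flag are all assumptions for illustration, not the course's exact script.

```shell
tmp=$(mktemp -d)
printf 'Dog12\n\ntiger7\ndeer\n' > "$tmp/animals.txt"   # hypothetical sample data

# One instruction per line; lines starting with # are comments.
cat > "$tmp/animals.sed" <<'EOF'
# remove blank lines
/^$/d
# swap animal names, either capitalization of the first letter
s/[Dd]og/frog/g
s/[Tt]iger/lion/g
# strip trailing digits before adding the prefix
s/[0-9]*$//
s/.*/Animals: &/
EOF

script_out=$(sed -f "$tmp/animals.sed" "$tmp/animals.txt")
printf '%s\n' "$script_out"
rm -rf "$tmp"
```

Stripping the digits before adding the prefix matters here: once 'Animals: ' is at the start of the line, a pattern beginning with [a-z]* would match the prefix first.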
###Awk - Intro###
Features:
1. Reporter
2. Field Processor
3. Supports Scripting
4. Programming Constructs
5. Default delimiter is whitespace
6. Supports: Pipes, Files, and STDIN as sources of input
7. Automatically tokenizes processed columns/fields into the variables: $1, $2, $3
.. $n
8. Supports extended (EGREP-style) RegExes
Usage:
awk '{instructions}' file(s)
awk '/pattern/ { procedure }' file
awk -f script_file file(s)
Tasks:
Note: $0 represents the current record or row
1. Print entire row, one at a time, from an input file (animals.txt)
a. awk '{ print $0 }' animals.txt
6. Remove blank lines with Sed and pipe output to awk for processing
a. sed -e '/^$/d' animals.txt | awk '/^[0-9]*$/ { print $0 }' - prints only rows
made up entirely of digits
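A runnable sketch of the Sed-to-Awk pipeline, fed from hypothetical input on STDIN instead of animals.txt.

```shell
# Sed drops the blank lines first; Awk then keeps rows consisting only of digits.
digits_only=$(printf 'dog\n\n42\n\ncat 7\n1001\n' \
    | sed -e '/^$/d' \
    | awk '/^[0-9]*$/ { print $0 }')
printf '%s\n' "$digits_only"
```

Without the Sed stage, the Awk pattern would also match the blank lines, because [0-9]* can match zero digits.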
###Delimiters###
Default delimiter: whitespace (space, tabs)
Use: '-F' to override the default delimiter
Task:
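One possible illustration of '-F', assuming colon-delimited records in the style of /etc/passwd. The sample records are hypothetical.

```shell
# Hypothetical colon-delimited records.
records=$(printf 'root:x:0:0:root:/root:/bin/bash\nlinuxcbt:x:1000:1000:LinuxCBT User:/home/linuxcbt:/bin/bash')

# -F: makes the colon the field separator, so $1 is the login name.
names=$(printf '%s\n' "$records" | awk -F: '{ print $1 }')

# With no -F, runs of spaces and tabs separate fields.
second=$(printf 'dog  12\ncat\t7\n' | awk '{ print $2 }')

printf '%s\n%s\n' "$names" "$second"
```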
###Awk Scripts###
Features:
1. Ability to organize patterns and procedures into a script file
2. The patterns/procedures are much neater and easier to read
3. Less information is placed on the command-line
4. By default, loops through lines of input from various sources: STDIN, Pipe,
files
5. # is the default comment character
6. Able to perform matches based on specific fields
Tasks:
1. Print to the screen some useful information without reading input (STDIN, Pipe,
or File)
a. awk 'BEGIN { print "Testing Awk without input file" } '
3. Write script to extract rows which contain 'deer' from animals.txt using RegEx
a. awk -f animals.awk animals.txt
4. Parse /etc/passwd
a. print entire lines - { print }
b. print specific columns - { print $1, $5 }
c. print specific columns for a specific user - /linuxcbt/ { print $1, $5 }
d. print specific columns for a specific user matching a given column - $1 ~
/linuxcbt/ { print $1, $5 }
e. test column #7 for the string 'bash' - $7 ~ /bash/ { print }
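The passwd tasks above can be sketched as a script file. To keep the result reproducible, this example reads a hypothetical passwd-style sample rather than the real /etc/passwd; the file names, the ' - ' output separator, and the 'shells' counter are assumptions.

```shell
tmp=$(mktemp -d)
cat > "$tmp/passwd.sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
linuxcbt:x:1000:1000:LinuxCBT User:/home/linuxcbt:/bin/bash
EOF

cat > "$tmp/passwd.awk" <<'EOF'
# Set the separators before any input is read.
BEGIN { FS = ":"; OFS = " - " }
# Match on column 1 only, then print the login name and the comment field.
$1 ~ /linuxcbt/ { print $1, $5 }
# Count the accounts whose shell (column 7) contains 'bash'.
$7 ~ /bash/ { shells++ }
END { print "bash users", shells }
EOF

report=$(awk -f "$tmp/passwd.awk" "$tmp/passwd.sample")
printf '%s\n' "$report"
rm -rf "$tmp"
```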
###Awk Variables###
Features 3 Types of variables:
1. System - i.e. FILENAME, RS, ORS...
2. Scalars - i.e. a = 3
3. Arrays - i.e. variable_name[n]
System Variables:
1. FILENAME - name of current input file
2. FNR - record number within the current input file - useful when multiple input
files are used
3. FS - field separator - defaults to whitespace - can be a single character or a
RegEx
4. OFS - output field separator - defaults to a single space
5. NF - number of fields in the current record
6. NR - current record number (in the END section it holds the total number of
records read)
7. RS - record separator - defaults to a newline
8. ORS - output record separator - defaults to a newline
9. ARGV - array of command-line arguments - indexed from 0; ARGV[0] is the command
name and the operands begin at ARGV[1]
10. ARGC - total # of command-line arguments (including ARGV[0])
11. ENVIRON - array of environment variables for the current user
Tasks:
1. print key system variables
a. print FILENAME (print anywhere after the BEGIN block)
b. print NF - number of fields per record
c. print NR - current record number
d. print ARGC - returns total number of command-line arguments
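A small sketch printing the key system variables across two hypothetical input files (a.txt and b.txt are made-up names), which also shows how FNR restarts per file while NR keeps counting.

```shell
tmp=$(mktemp -d)
printf 'dog 12\ncat 7 extra\n' > "$tmp/a.txt"
printf 'deer 3\n' > "$tmp/b.txt"

# FILENAME, FNR (per-file record number), NR (overall record number), NF (field count).
vars=$(cd "$tmp" && awk '{ print FILENAME, FNR, NR, NF }
                         END { print "total", NR }' a.txt b.txt)

# ARGC counts ARGV[0] plus the two operands; a BEGIN-only program never opens them.
argc=$(awk 'BEGIN { print ARGC }' one two)

printf '%s\nARGC=%s\n' "$vars" "$argc"
rm -rf "$tmp"
```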
Scalar Variables:
variable_name = value
age = 50
Note: Set scalars in the BEGIN section; if required, they can also be set in the
main loop
{ ++age } - increments variable 'age' by 1, for each iteration of the main loop
(component 2 of 3)
Array Variables:
Feature:
1. List of information
Task:
1. Define an array variable to store various ages
a. age[0] = 50
2. Use split function to auto-build an array
a. arr1num = split(string, array, separator)
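A minimal sketch of split() building an array, using a made-up comma-separated string of ages. Note that split() numbers the elements starting at 1 and returns the element count.

```shell
ages=$(awk 'BEGIN {
    n = split("50,42,37", age, ",")      # n receives the number of elements
    for (i = 1; i <= n; ++i)             # split() indexes from 1, not 0
        printf("age[%d] = %s\n", i, age[i])
    print "count", n
}')
printf '%s\n' "$ages"
```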
###Operators###
Features:
1. Provides comparison tools for expressions
2. Generally 2 types:
a. Relational - ==, !=, <, >, <=, >=, ~ (RegEx Matches), !~ (RegEx Does NOT
Match)
b. Boolean - ||(OR), &&(AND), !(NOT) - Combines comparisons
4. Find records that have at least 2 fields and are positioned at record 5 or
higher
a. NF >= 2 && NR >=5 { print }
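The combined comparison can be tried on hypothetical input of eight records; the record number is printed alongside each match to make the NR test visible.

```shell
# Records 6 and 8 are the only ones at position 5 or later with 2 or more fields.
matches=$(printf 'a b\nc\nd e\nf g\nh\ni j k\nl\nm n\n' \
    | awk 'NF >= 2 && NR >= 5 { print NR ": " $0 }')
printf '%s\n' "$matches"
```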
###Loops###
Features:
1. Support for: while, do, and for
While:
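{ i = 1; while (i <= NF) { print $i; ++i } } - prints each field of the current
record on its own line
Note: The loop condition must change inside the loop body; a test such as (NR > 10)
never changes within a record and would loop forever
For:
BEGIN { for (i = 1; i <= 10; ++i) print i } - prints 1 through 10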
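A compact sketch of all three loop forms, on made-up input. The variable names are illustrative only.

```shell
# while: walk the fields of each record; the counter changes inside the body.
fields=$(printf 'dog cat deer\n' | awk '{ i = 1; while (i <= NF) { print i, $i; ++i } }')

# for: a counted loop needs no input, so it can live in BEGIN.
count=$(awk 'BEGIN { for (i = 1; i <= 3; ++i) print i }')

# do: the body runs once before the condition is tested, even if the test is false.
do_once=$(awk 'BEGIN { i = 10; do { print i; ++i } while (i < 10) }')

printf '%s\n%s\n%s\n' "$fields" "$count" "$do_once"
```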
###Printf Formatting###
Feature:
1. Ability to control the width of fields in the output
Usage:
printf("format", arguments)
Supported Printf Formats include:
1. %c - a single character
2. %d - Decimal integers - any fractional part (values to the right of the decimal
point) is truncated
3. %f - Floating Point
4. %s - Strings
Note: printf does NOT print newline character(s)
This means you'll need to indicate a newline character sequence: \n - in the
"format" section of the printf function
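A brief sketch of the four formats and the newline behavior, using made-up values. The '|' characters are only there to make the field widths visible.

```shell
# %-6s left-justifies in 6 columns, %5d truncates 42.9 to 42,
# %8.2f rounds to 2 decimals in 8 columns, %c prints one character.
formatted=$(awk 'BEGIN { printf("%-6s|%5d|%8.2f|%c\n", "deer", 42.9, 3.14159, "Z") }')

# No newline is added automatically: the two calls land on the same line.
joined=$(awk 'BEGIN { printf("a"); printf("b\n") }')

printf '%s\n%s\n' "$formatted" "$joined"
```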
Examples | Tasks:
1. print "Testing printf" from the command-line
a. awk 'BEGIN { printf("Testing printf\n") }'
4. Left-justify task #3
a. awk 'BEGIN { printf("Here is the output\n")} { printf("%-20s\t%-20s\n", $1,$2)
}' animals.txt
5. Parse animals_with_prices.txt file and properly represent strings, decimals and
floating point values
a. awk 'BEGIN { printf("Here is the output\n\n")} { printf("%-5s\t$%.2f\n",
$1,$2) }' animals_with_prices.txt
b. Effect the change to ALL product files and create .new output files without
clobbering the source files
for i in products_*php; do sed -e 's/<b>Shipping<\/b>: Free<br>//' "$i" > "$i.new"; done
Windows Stuff:
gawk "BEGIN { max=ARGV[1]; for (i=1;i<=max;++i) print i }" 10 - reads 10 from
ARGV[1] and passes it to 'max' var for use in the 'for' loop