Adv Unix Scripting
Table of Contents
Awk, Nawk, Gawk
Making Unix Shell Variables Upper and Lower Case
Unix Shell Variable Editing Hacks
Read-Only Unix Shell Variables
Don't Be Square, Part 1
Don't Be Square, Part 2
Don't Be Square, Part 3
Don't Be Square, Part 4
Awk Basics
Don't Be Square, Part 5
Elegant Solution for Common "ps" Problem
Looking at specific lines in a large logfile
Reversing a Text File
Reversing the Unix ls command
Celsius to Fahrenheit Conversion
Awk Day of the Week Program
Two Awk Scripts for Calculating Compound Interest
Word Frequency
Selecting the Most Recent File in a Directory
Unix Basic Calculator
MAN Pages
Unix Change directory Command
Awk Script to Sum Integers
Script to Sum Any Range of Integers
Awk substr function
Using Awk to Find Average Annual Interest Rate
Unix Date Command
Setting Default Values For Unix Shell Variables
Awk Script for 5x5 matrix
Solving A Multi-Line Problem With Awk
Example of Awk Pattern Matching
Unix test command
Awk Regular Expressions
Example of Renaming Files in Unix
Using Temporary Files in Unix Scripts
Unix Find Command
Awk OFS parameter
Adventure Game Written In Awk
WildCards for Unix Shell Commands
Searching for Special Characters in a File
Running Multiple Unix Commands On The Same Command Prompt
Comparing Time On Two Unix Servers
Unix shell Script Here-Document Function
Awk Script to Combine Lines in a File
Unix Stream Editor
Finding Duplicates in a File
Running Multiple Unix Commands
Unix Change Directory Commands
Complex Global Substitution in Unix Text files
Searching through Directories in Unix
Using Unix Commands on Web Pages
Using AWK to Generate SQL from File
Calculating the Previous Month in Unix
Processing Multiple Files Through Awk
Unix Script to Capitalize First Letter of Word
Pulling Sections From An XML File Using AWK
Unix Sort Question
Unix Script to Find Difference From Two Time Stamps
Using Unix Shell Script to Build Java Classpath
Using Awk To Generate Random Coupon Code
Splitting a Unix File into Smaller Files
The Split Function in Awk
Accessing Unix Command Output in Awk
A Formal Way To Parse Command Lines in Shell Scripts
Running Unix Commands on Directories With Too Many Files
Extracting Initials In Awk, Part 1
Extracting Initials In Awk, Part 2
Generating Random Numbers In Awk
Using the Unix Dot (.) Operator to Run a Shell Script in the Current Shell
Sending Email From a Unix Script
Customizing Your Unix Ksh Environment with .kshrc
How To Use Multiple-Word Arguments in Unix Scripts
Awk Script to Generate an HTML Page With Randomly Labelled URLs
Using Awk to Generate HTML and Java Script
Calling Unix Commands From Within Awk: system vs getline
Creating csv Files From Unix When the Data Has Commas
Using uuencode to Mail Attachments From Unix Shell Scripts
Removing Carriage Return Line Feeds (CRLF's) From Text Files in Unix
The Unix shell For Loop
If you are a programmer or engineer working in a unix or linux environment,
you will probably find the shell 'for' loop to be a handy tool for automating
command line tasks.
Here are three examples of the 'for' loop. Each command should be entered on the
command line, followed by a carriage return.
Note that, after entering the initial 'for' line, you will get the secondary unix
prompt, which is usually a ">".
1. Rename all files with the ".old" extension to ".bak".
for i in *.old
do
j=`echo $i|sed 's/old/bak/'`
mv $i $j
done
Here, we looped thru all files with extension ".old", setting the variable "i" to be
the file name we are currently looping thru. Then, between the "do" and "done",
we have the body of the loop. On each pass, we echo the file name ("i") to the
unix stream editor sed. Sed replaces the "old" with "bak" (so file "a.old"
becomes "a.bak"), and saves the changed name to variable "j". Then, we use the
unix move (mv) command to rename the original file (ex. a.old) to the new file
(a.bak).
2. Change all instances of "yes" to "no" in all ".txt" files in the current directory.
Back up the original files to ".bak".
for i in *.txt
do
j=`echo $i|sed 's/txt/bak/'`
mv $i $j
sed 's/yes/no/' $j > $i
done
In this case, we rename each file from ".txt" to ".bak". Additionally, we use sed a
second time, on the contents of the original file (now with a ".bak" extension)
and save the modified text back to the original name (with ".txt").
Here, we loop thru the results of a command (in this case "cat"), rather than
looping thru files in the directory. We also use an if statement with the "test"
command to test for a condition (in this case, whether the file is readable).
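The listing for this third example is not shown; a minimal sketch along the lines
described (the file name filelist is just an illustration) would be:
for i in `cat filelist`
do
if test -r $i
then
echo "$i is readable"
fi
done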
Awk, Nawk, Gawk
A lot of the awk examples I will share have been done on a Sun Solaris System.
Sun keeps the original version of awk on its systems. This version is very basic,
and does not contain many of the features in the current standard version of awk.
Sun systems carry this new version as "nawk" (for new awk).
On other systems, like AIX, HPUX, Linux, etc. the standard version is called
"awk".
Finally, the GNU project has an awk version called Gawk, which is free and
available for all systems, including windows.
For almost all my examples, all the awks (except the Sun awk) will work.
So, if you see me type "nawk ..." and you are on another system besides Sun,
you can change the nawk to awk.
Making Unix Shell Variables Upper and Lower Case
When writing unix ksh programs, you can use the built-in typeset command to
make variables upper and lower case only.
Example (typeset -l keeps the variable lower case):
[542]-> typeset -l small
[543]-> small="BIG"
[544]-> echo $small
big
Example (typeset -u keeps the variable upper case):
[545]-> typeset -u tiny
[546]-> tiny="i am tiny"
[547]-> echo $tiny
I AM TINY
Unix Shell Variable Editing Hacks
The unix korn shell (ksh) has some neat tricks for editing variables:
${variable#pattern} will return variable with the smallest possible pattern
removed from the front.
[586]-> t="/etc/tmp/foo.c"
[587]-> echo ${t#*/}
etc/tmp/foo.c
${variable##pattern} will return variable with the largest possible pattern
removed from the front.
[586]-> t="/etc/tmp/foo.c"
[587]-> echo ${t##*/}
foo.c
${variable%pattern} will return variable with the smallest possible pattern
removed from the back.
[589]-> t=foo.c.bak
[590]-> echo ${t%.*}
foo.c
${variable%%pattern} will return variable with the largest possible pattern
removed from the back.
[589]-> t=foo.c.bak
[590]-> echo ${t%%.*}
foo
I always remember that # slashes from the front, and % from the back, because #
comes before % on the keyboard.
The % can be handy for scripts that move files to back up versions.
i.e. If f = "sample.txt", you can move it to "sample.bak" with: mv $f ${f%.txt}.bak
Read-Only Unix Shell Variables
You can use typeset -r to make variables in a ksh script be read-only. This
effectively makes them constants, which can no longer be modified.
For example:
[550]-> pi=3.14159
[551]-> echo $pi
3.14159
[552]-> typeset -r pi
[553]-> pi=45
ksh: pi: is read only
[554]-> echo $pi
3.14159
Don't Be Square, Part 1
Let's look at an awk program called "square". It takes a number as an argument,
and prints out
its square.
[628]-> square
Usage: square number
[629]-> square 4
The square of 4.00 is: 16.00
#! /usr/bin/nawk -f
BEGIN {
if (ARGC < 2)
{
printf ("Usage: square number\n")
exit 1
}
printf ("The square of %.2f is: %.2f\n", ARGV[1], ARGV[1]*ARGV[1])
exit 0
}
The first line, "#!...", tells unix what command to use to run this file. By putting
this line in the file, I can just type "square number" to run it. If this line was not
there, I would have to type "nawk -f square number" to run the file. If you were
using gawk on windows, for example, you would have to type "gawk -f square
number" in your DOS window.
The next thing to notice is that the program is blocked into a section called
"BEGIN". By default, awk tries to read a file or input stream, and apply its
commands to each line. Commands in a BEGIN block are executed once, before
any input is processed. If your awk program only has a BEGIN block, then awk
works just like a standard procedural language. It executes the commands top to
bottom.
The "if" statement executes if there are less than 2 arguments (the program name
+ the number). If so, it prints the usage information, and exits with a status code
of 1. The unix convention is that a program returns 0 if execution was
successful, or else a non-zero error code.
If we make it past the argument check, the program prints the square and exits
(with a default return code of 0).
One of the authors of awk, Brian Kernighan, also co-wrote the classic book on C
and was one of the early unix contributors. Thus, awk uses a lot of C commands
and syntax. One of the features of awk is the C printf statement (awk also has a
simpler print statement).
The printf command arguments consist of the message in quotes (with formatters
like %f for variables) and a list of any variables used. Printf does not
automatically put a new line at the end of the message, so the user must insert a
backslash n for a carriage return.
In the second printf statement we use %.2f as the formatter. %f is floating point,
while %c is a character, %s is a string, and %d is an integer. The .2 says that we
want 2 decimal places.
A lot of times, a simple print statement will do, but printf gives you a lot of
control for output.
Don't Be Square, Part 2
Here is another squares AWK program. This one prints the squares from 1 to 10.
[721]-> ./squares
Squares from 1 to 10
Number Square
------ ------
1 1
2 4
3 9
4 16
5 25
6 36
7 49
8 64
9 81
10 100
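A minimal version that produces this output (sketched, since the original listing
is not shown) is:
#! /usr/bin/nawk -f
BEGIN {
print "Squares from 1 to 10"
print "Number Square"
print "------ ------"
for (i = 1; i <= 10; i++)
printf ("%6d %6d\n", i, i*i)
}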
Don't Be Square, Part 3
Next is square_file, which reads a file of numbers (named as the first argument)
and prints the square of each:
#! /usr/bin/nawk -f
BEGIN {
if (ARGC < 2)
{
printf ("Usage: %s file\n",ARGV[0])
exit 1
}
while ((getline num < ARGV[1]) > 0)
printf ("The square of %.2f is: %.2f\n", num, num*num)
}
Here we see that awk uses the standard while loop: the printf statement in the
loop body is executed as long as the while condition is true.
The while condition, however, consists of a command unique to awk: the
getline.
In this case, getline reads from the file named by the first argument. This while
loop will go thru all the lines in the file, until getline gets to the end of the file,
and returns 0 (non-true).
Don't Be Square, Part 4
Now, we will add a twist to the square_file program from the last example.
We will still use the same input file
[771]-> cat tmpfile
18
3
23
125
But, now, our numbers to be squared are displayed sorted, from lowest to
highest:
#! /usr/bin/nawk -f
BEGIN {
if (ARGC < 2)
{
printf ("Usage: %s file\n",ARGV[0])
exit 1
}
while (("sort -n "ARGV[1]) | getline num)
printf ("The square of %.2f is: %.2f\n", num, num*num)
}
Notice that the only thing that has changed is the getline statement in the while
loop. Last time, we redirected the argument to getline, so it was opened as a file.
Now, we made a unix command string (to numerically sort the file). By piping
the
string into getline, awk executes it in unix, and puts the output into getline.
The interesting thing here is that getline executes the command just once, on its
first use. After that, until a close("sort -n "ARGV[1]) is used, or the script ends,
the pipe stays open and each call to "sort -n "ARGV[1] | getline returns the next
line in the file.
Before, the while loop ended when getline reached the end of the file. In this
case, the while loop ends when getline gets to the end of the pipe contents.
Awk Basics
Some languages, such as basic and C, are procedural - the program executes
once, top to bottom. Languages like java are object-oriented - the program is
written as a series of objects which interact with each other. Awk is data-driven:
an awk program is a series of pattern-action pairs, and awk automatically loops
through each record of input, running the actions whose patterns match the
current record.
By default, one record = one line of input. The variable NR is the current record
number (i.e. line number). In advanced usage, the user can redefine the record to
span multiple lines.
Each record is split into one or more fields. The default separator is white space
(blanks, spaces, tabs). Each field in a record is denoted by a "$" sign and number
(i.e. $4 is the fourth field on the current record).
Patterns and actions are optional: a pattern with no actions will print any lines
that match the pattern. Action(s) with no pattern will be done for all lines. An
awk program with no pattern or action will just loop thru standard input and do
nothing.
There are 2 special patterns: BEGIN and END. Any actions in a BEGIN block
are executed once, before standard input is looped through. Actions in the END
block are done once, after standard input is processed.
The BEGIN block also has the special feature that, if it is used in an awk
program with no standard input, it makes awk into a regular procedural
language. We have been using awk this way in the “Don't Be Square” examples.
An awk program with no BEGIN block and no standard input will just hang.
Some examples:
/^The/ {print $1}   Print the first field of the line if the line starts with "The"
$2==5 {x = 12}   Set variable x equal to 12 if the second field of the line is 5
NR<=32   If the line is one of the first 32, we will print it (the default action)
{Name[NR]=$4}   For all lines (default), store field 4 in array element Name[line number]
Don't Be Square, Part 5
In this last Squares example, we will rewrite the awk program from “Don't Be
Square, Part 3” into awk's iterative style.
This was the procedural program we had written to loop through a file of
numbers and square them:
#! /usr/bin/nawk -f
BEGIN {
if (ARGC < 2)
{
printf ("Usage: %s file\n",ARGV[0])
exit 1
}
while ((getline num < ARGV[1]) > 0)
printf ("The square of %.2f is: %.2f\n", num, num*num)
}
Now, we will write the program using the built-in iteration of awk:
#! /usr/bin/nawk -f
BEGIN {
if (ARGC < 2)
{
printf ("Usage: %s file\n",ARGV[0])
exit 1
}
}
{ printf ("The square of %.2f is: %.2f\n", $1, $1*$1) }
Notice that, instead of having the whole program in the BEGIN block, we have
moved the printf statement that does the squaring into its own {} block which
has no pattern. This means that the printf statement will be executed for each
line in standard input. We do not have to explicitly loop with while and getline.
There is a difference between the two programs however. Both programs check
that there is at least one argument to the program. The first program, however,
only tries to open and loop thru the first argument. The latest program will loop
thru all files. So, if we had invoked both programs with prog file1 file2, then the
first program will just square the numbers in file1, while the second example will
square the numbers in both files.
Elegant Solution for Common "ps" Problem
Suppose you want to list the active processes on your system. You would use the
"ps" command:
$ ps
$ ps | grep vi
10ae 0535 0056 0c34 10ae (vi.exe) \unix\bin\vi.exe
1105 10ae 15cd 0000 2112 (/unix/bi) -c ps | grep vi
The problem is that you also get the entry for the "grep" process itself.
The elegant solution is to put brackets around one character of the search pattern:
$ ps | grep "[v]i"
The pattern "[v]i" still matches the string "vi". Thus, in our case, "grep" is still
looking for "vi", but the "ps" entry for the "grep" contains the original "[v]i"
instead of "vi", so we prevent a match.
Looking at specific lines in a large logfile
One day, a co-worker asked me for help. She was trying to look at a specific
error message in a unix logfile that was too big to view with an editor.
She knew how to use grep ERROR file to display the lines that the error occurred
on, but she needed to also see the lines on either side of the error.
I told her to first use grep -n ERROR file to return the errors with the line
numbers.
Then she can use awk with each instance of the error to see the lines around it.
For example, if the grep showed the error occurred on line 345, she could then
use:
nawk 'NR>=340 && NR<=350' logfile
which prints lines 340 through 350, giving her the error line plus the five lines
on either side.
Unix Date Command
The unix date command returns the current date and time. With the + option,
you control the output format. Some examples:
[560]-> date '+%m/%d/%Y'
10/05/2006
[561]-> date '+%m/%d/%y'
10/05/06
[562]-> date '+%m%d%y.%H%M%S'
100506.100131
The last example is great for creating unique log file names.
So you can do the following in your script:
DATE=`date '+%m%d%y.%H%M%S'`
LOG=/tmp/programname.$DATE
There are lots of formatting options for date. You can use man date to check out
the manual pages.
Setting Default Values For Unix Shell Variables
The unix korn shell has a shortcut method for returning a default value for a
variable that is not set, so that you do not need an if statement to check it.
For example, let us assume that the variable T is not set.
Then, echo $T returns nothing.
However, echo ${T:-"not set"} will return not set.
So, let us say that you write a script that takes an optional numerical argument.
If the argument is not specified (i.e. $1 is not set), you want the argument to
default to 2.
You could do
if test "$1" = ""
then
VAL=2
else
VAL=$1
fi
or you can just do
VAL=${1:-2}
Awk Script for 5x5 matrix
Here is an awk script that reads in a 5x5 matrix and prints the matrix, along with
the sums of the rows and columns.
Given file rr:
1 0 3 1 8
4 9 7 5 3
6 4 8 6 2
7 8 3 2 1
5 4 3 2 1
Here is the run:
[565]-> matrix rr
1 0 3 1 8| 13
4 9 7 5 3| 28
6 4 8 6 2| 26
7 8 3 2 1| 21
5 4 3 2 1| 15
-------------------------------
23 25 24 16 15
Here is the script:
#! /usr/bin/awk -f
{
R=$1+$2+$3+$4+$5
print $0"| "R
C[1]+=$1
C[2]+=$2
C[3]+=$3
C[4]+=$4
C[5]+=$5
}
END {
print "-------------------------------"
print C[1]" "C[2]" "C[3]" "C[4]" "C[5]
}
The script loops thru each line (row). For each row, it computes the total, then
prints the row and total. Then, it adds each column element to a running total.
The END block gets run after all the rows are processed. It prints the final
column totals below each column.
Solving A Multi-Line Problem With Awk
Here is a problem that spans more than one line in a file.
Problem
-------------
Given a text file such as:
Part 1
564 32718 976
54 2345 987 50
432 1 75
Section 2
281 34 1290 345
21 8 4 3
Create a script that outputs:
Part 1:564 32718 976
Part 1:54 2345 987 50
Part 1:432 1 75
Section 2:281 34 1290 345
Section 2:21 8 4 3
Solution
-------------
First, we need a pattern and action to recognize the header lines and save them
as the current prefix. A header line has two fields, the second of which is a
number:
NF==2 && $2 ~ /^[0-9]/ { prefix = $1" "$2 }
Next, we need a pattern to recognize the content lines, which begin with a
number:
$1 ~ /^[0-9]/
In this case, we need to print the current prefix and the content line like this:
{ print prefix":"$0 }
Putting it all together, our shell script for solving the problem is:
#! /bin/ksh
nawk '
NF==2 && $2 ~ /^[0-9]/ { prefix = $1" "$2 }
$1 ~ /^[0-9]/ { print prefix":"$0 }
' $1
Example of Awk Pattern Matching
Here is a problem-solution exercise that I wrote about 12 years ago, as another
example of pattern matching:
Problem:
---------
Compose a script or command to print out the lines in a text file
that contain "abc", and either "rst" or "xyz", but not "0.00".
Solution:
------------
nawk '/abc/ && ( /rst/ || /xyz/ ) && $0 !~ /0\.00/' filename
The tricky thing here is that we must escape the period, otherwise
it will match any character.
Solution2:
----------------
Now, when I wrote the above solution, notice that, for the 0.00 check, I used $0
!~ /0\.00/
The ~ and !~ are used when comparing strings with patterns. Since I'm checking
the whole line, I can drop the $0 ~ and write just the pattern, like the other
patterns (of course, I have to include the negation !):
nawk '/abc/ && ( /rst/ || /xyz/ ) && !/0\.00/' filename
Unix test command
The unix test command can test for various conditions, and then returns 0 for
true and 1 for false. Usually, test is used in if...then or while loops.
There are two ways of invoking test - either test condition or [ condition ]
Strings are compared with =, !=, >, and <.
For example, test "$1" = "yes" is true if $1 is "yes".
Numbers are compared with -eq (equal), -ne (not equal), -lt (less than), -le (less
than or equal), -gt (greater than), -ge (greater than or equal to).
For example, test $1 -lt 4 means that $1 is a number less than 4.
Here are some other conditions:
test -r file True if file file exists and is readable.
test -w file True if file file exists and is writeable.
test -x file True if file file exists and is executable.
test -d file True if file file exists and is a directory.
test -s file True if file file exists and has a size greater than 0.
test -z string True if string string has length 0.
conditions can be compounded with -o (or) or -a (and).
For example,
if test $SIZE -lt 5000 -o $DAY != "Sun" will be true if $SIZE is less than 5000
or $DAY is not equal to "Sun".
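Here is how test might look inside a script (an illustrative check, with made-up
file names):
#! /bin/ksh
if test -s /tmp/app.log -a -r /tmp/app.log
then
echo "The log exists, is readable, and is non-empty"
else
echo "Problem with the log"
fi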
For more conditions, use man test.
Awk Regular Expressions
In Awk scripts, you can use regular expressions to match text. Regular
expressions are enclosed in forward slashes (/).
In the pattern part of an awk script, the regular expression by itself means that
the pattern is checked against the whole record ($0).
For example, /jump/ {print $1} will print the first field if the line contains
jump.
Regular expressions can be matched to strings by using the tilde (~).
For example, $0 ~ /jump/ {print $1} is equivalent to the statement above, while
$2 ~ /jump/ {print $1} will only print the first field if the second field contains
the pattern.
Brackets allow alternatives in regular expressions. For example, /[Jj]ump/ {print
$1} will print the first field if the line contains jump or Jump.
A ^ anchors the pattern to the start of the string. So /^jump/ {print $1} will only
print the first field if the line starts with jump. It will not, for example, match
parachute jump.
A $ anchors the pattern to the end of the string. So /jump$/ {print $1} will print
the first field if the line is jump or parachute jump, but not jump out.
A * after a character or brackets means "0 or more", while a + means "1 or
more".
So, /j*/ would match jump, banjo, and cook, but /j+/ would match jump and
banjo, but not cook.
Example of Renaming Files in Unix
Problem
-----------
Write a script to automatically rename all files in the current directory
that have the ".txt" extension to the same name, with a ".doc" extension.
Thus, "foo.txt" becomes "foo.doc", "bar.txt" becomes "bar.doc", etc.
Solution
----------
First, we use the unix's for-loop mechanism to loop through all
".txt" files, and perform some as yet undefined actions:
for i in *.txt
do
actions
done
The above code will loop through all files with extension ".txt", one at a time,
and set i to the full name of the file.
Now, two actions are necessary. First, we must generate the new filename, then
we must rename(move) the original file to the new file.
To generate the new name from the old, we need only perform a simple
substitution.
This is a job for sed!
j=`echo $i | sed 's/\.txt/\.doc/'`
This line pipes the value of i into a sed command, which replaces the ".txt" with
".doc". This modified value of i is then assigned to j. (The sed expression is
quoted so that the shell does not strip the backslashes, which escape the period.)
To rename the file, we simply execute:
mv $i $j
Putting it all together, we get the following shell script:
for i in *.txt
do
j=`echo $i | sed 's/\.txt/\.doc/'`
mv $i $j
done
However, my advice would be that, before you run any for loop that modifies
files, you run a version which displays what you want to do, just to double
check.
So, before you execute our script to actually move the files, I would substitute
echo $i $j for mv $i $j and run this loop to make sure you are moving the right
files to the right extension.
Using Temporary Files in Unix Scripts
If you need to use one or more temporary files in a unix script, you should
append $$ to the filename.
$$ returns the process id of the process that is running that instance of the script.
This is important because unix is multi-tasking, and it is very possible for a
script to be running multiple times, simultaneously.
Unless a temporary file has a unique name, there is the danger that simultaneous
runs will overwrite the temporary file, causing problems.
For example, if your script used:
TMPfile=/tmp/stordata
The file will not be unique for different runs of the program. So, if the script is
running two times simultaneously, both sessions will write to /tmp/stordata at the
same time.
On the other hand,
TMPfile=/tmp/stordata.$$
Will be unique for each session. In the case of two users running the script
simultaneously, the two processes that are running the script may be 1234 and
1476, for example. So, the first script will write to /tmp/stordata.1234, while the
other writes to /tmp/stordata.1476.
After using the temporary file(s) in your script, you should clean them up by
removing them.
This can be done by a line such as the following at the end of the script:
/bin/rm -f $TMPfile 2>/dev/null
Notice that we are using the -f option to rm. This is because rm sometimes
prompts you if you want to remove the file. The -f overrides this.
The other thing is that we are redirecting standard error to /dev/null. This means
we do not want to display any error messages from the rm on the screen.
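Putting the pieces together, a script using a temporary file might look like this (the
file name and commands are just illustrations):
#! /bin/ksh
TMPfile=/tmp/stordata.$$
ps -ef > $TMPfile
grep myapp $TMPfile
/bin/rm -f $TMPfile 2>/dev/null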
Unix Find Command
The unix find command lets you recursively search a directory for files, and
even run commands on the files as they are found.
The basic find command is:
find dir
This will return a list of all files and directories, from directory dir on down. For
example, find . will return all directories and files from the current directory on
down.
We can add the -name option to look for specific files:
i.e. find /tmp -name "*.log"
Will return all files with the .log extension that are under the /tmp directory
structure.
We can use the -exec option to run a command on each file that find finds.
In this example, we also use the -mtime option to only select files that have not
been modified in a certain number of days:
find $log_dir -name "*.log" -mtime +14 -exec rm {} \;
This will find all .log files under the $log_dir directory that have not been
modified in 14 days and, for each file that is found, the rm command will be
run, and the {} will be replaced by the file name.
Whenever you use the -exec option, you have to end the line with a space and \;.
These are the main find options that I use. If you want to learn more, you can do
a man find to see the manual pages.
Awk OFS parameter
In awk, OFS is the output field separator. By default it is a space.
This parameter is used by the print command when you separate strings by using
a comma.
For example, if we have a test file consisting of one line:
burp boy orange
Then running the following script on the file:
nawk '
{
print $1,$2,$3
OFS="%"
print $1,$2,$3
print $0
}' $*
Will produce the following output:
burp boy orange
burp%boy%orange
burp boy orange
Notice that, in the first print, OFS is the space by default. So the fields are
printed with a space in between them.
Then, we set OFS to be a % sign, and the next print statement outputs the fields
separated by a %.
Finally, we do a print on $0 to illustrate the fact that $0 always preserves the
original format of the line.
Adventure Game Written In Awk
I wrote a small text-adventure game in awk - just to stretch the perception of
awk, and show that it can be used as a programming language.
This game is small, but gives a taste of the fantasy adventure games of the 80's -
like Zork from Infocom.
In this adventure, you are in a cave complex, and need to find the hidden gold to
win. The adventure lets you move around, search, pick up objects, and use
them. It uses a menu instead of free-form entries.
Here is the awk code:
nawk '
function intro() {
print
print "You are a brave adventurer. You have entered a hidden"
print "cave just outside town, that is rumored to hold gold!"
print "To win this adventure, you need to get the gold."
}
function invent() {
if (coin || axe || sword)
print "You are carrying: "
if (coin) print "coin"
if (axe) print "big, rusty battle axe"
if (sword) print "small sword"
}
function cave() {
print
print "You are standing in a cave. Sunlight gleams behind you"
print "from the entrance. In front of you, is a wooden door."
print "You see an opening to the left, and one to the right."
print
invent()
print
print "What do you want to do? "
print
print "(o)pen wooden door"
print "go (l)eft"
print "go (r)ight"
print "leave thru the (e)ntrance"
if (sword) print "break door with your (s)word"
if (axe) print "break door with your (a)xe"
print "(y)ell Open Sesame"
print "e(x)amine area"
print "read (i)ntroduction"
"read x;echo $x"|getline x
close "read x;echo $x"
if (x=="o") {print "The wooden door is shut tight."; cave()}
if (x=="l") {deadend()}
if (x=="r") {cave2()}
if (x=="e") {print "You decide to quit. Goodbye!";exit}
if (sword&&x=="s") {print "your sword breaks!";sword=0;cave()}
if (axe&&x=="a") {
print "You chop down the door and find the gold!!"
print "Great job, bold adventurer!"
print "This is the end of this adventure, but"
print "you have a promising career ahead of you!"
exit;
}
if (x=="y") {
print "A band of evil goblins passing by the entrance"
print "hear you, enter the cave, and kill you"
exit;
}
if (x=="x") {print "You find nothing";cave()}
if (x=="i") {intro();cave()}
print "What do you want to do?";cave()
}
function deadend() {
print
print "You are in a dead end"
print
invent()
print
print "What do you want to do? "
print
print "go (b)ack"
print "e(x)amine area"
print "read (i)ntroduction"
"read x;echo $x"|getline x
close "read x;echo $x"
if (x=="b") {cave()}
if (x=="x") {print "You find a sword!";sword=1;deadend()}
if (x=="i") {intro();deadend()}
print "What do you want to do?";deadend()
}
function cave2() {
print
print "You are in another cave."
print "You can go back, or explore a niche to the left."
print
invent()
print
print "What do you want to do? "
print
print "go (b)ack"
print "enter (n)iche"
if (rubble) print "(s)earch rubble"
print "e(x)amine area"
print "read (i)ntroduction"
"read x;echo $x"|getline x
close "read x;echo $x"
if (x=="b") {cave()}
if (x=="n") {niche()}
if (rubble&&x=="s"&&!coin) {print "you found a coin!";coin=1;cave2
()}
if (rubble&&x=="s"&&coin) {print "you found a nothing!";cave2()}
if (x=="x") {print "You see a pile of rubble";rubble=1;cave2()}
if (x=="i") {intro();cave2()}
print "What do you want to do?";cave2()
}
function niche() {
print
print "You are in a niche."
print "There is a dwarf here!"
print
invent()
print
print "What do you want to do? "
print
print "go (b)ack"
print "(t)alk to dwarf"
if (!sword&&!axe) print "(f)ight dwarf"
if (sword) print "fight dwarf with (s)word"
if (axe) print "fight dwarf with (a)xe"
if (coin) print "(o)ffer coin to dwarf"
print "e(x)amine area"
print "read (i)ntroduction"
"read x;echo $x"|getline x
close "read x;echo $x"
if (x=="b") {cave2()}
if (x=="t") {print "The dwarf grunts";niche()}
if (x=="f") {print "The dwarf kills you";exit}
if (x=="s") {print "The dwarf kills you";exit}
if (x=="a") {print "The dwarf kills you";exit}
if (coin&&x=="o") {print "The dwarf takes the coin and gives you a
n axe!";coin=0;axe=1;niche()}
if (x=="x") {print "You find nothing";niche()}
if (x=="i") {intro();niche()}
print "What do you want to do?";niche()
}
BEGIN { intro(); cave() }
'
This is one of the longest awk programs that I have written. Notice that it is
function-driven. I have created functions to give the introduction, and the
inventory, and I have created functions for each room.
The awk program is kicked off by the BEGIN section, which runs intro() and
cave() to put you in the first room.
Each object is represented by a variable of the same name (i.e. sword for sword)
and is either 0 (off) or 1 (on), depending if you have the object.
Each function will print descriptions and give options, depending on the setting
of these boolean variables.
The inputting is done by using getline to run "read x;echo $x" to read from the
screen and echo the response into awk. Then, a close is done so that the next
getline will get fresh input.
WildCards for Unix Shell Commands
Let's look at some common wildcards you can use with unix commands.
For example, lets assume that the current directory has the following files:
proxy.txt
proxy1.txt
proxy2.txt
proxy11.txt
Proxy1.txt
Then let's use the ls command to demonstrate the differences between the
wildcards.
? matches exactly 1 character:
$ ls proxy?.txt
proxy1.txt proxy2.txt
* matches 0 or more characters:
$ ls proxy*.txt
proxy.txt proxy1.txt proxy11.txt proxy2.txt
Brackets match any one of the characters inside them, so [Pp] matches an upper
or lower case p:
$ ls [Pp]roxy1.txt
Proxy1.txt proxy1.txt
Running Multiple Unix Commands On The Same Command Prompt
Let's look at some ways to run multiple unix commands on the same command
prompt.
$ cmd1; cmd2
This will first run cmd1, and then cmd2. It is the equivalent of running cmd1 on
the command prompt, pressing return, and then running cmd2 at the next
command prompt.
$ cmd1 && cmd2
This will run cmd1 first. If cmd1 runs successfully (return code 0), then cmd2 is
run.
$ cmd1 || cmd2
This will run cmd1 first. If cmd1 runs un-successfully (non-zero return code),
then cmd2 is run.
Comparing Time On Two Unix Servers
You can check that the time on two unix machines are synchronized by logging
into machine1, and then simultaneously running the date command on machine1
and machine2, using remsh (remote shell):
$ date; remsh machine2 date
This will return the current date/time from both machines, and you can see if
they are in synch.
I actually once had a production problem that was caused by one unix server
having its timestamp out of synch by 25 seconds. The machine that was running
25 seconds fast was running a program that was waiting for a response from a
program running on the second machine (that had the proper time).
The first program was supposed to wait for up to 2 minutes, and then timeout.
We noticed that it started timing out a lot. We first thought that there was some
problem with the second program, or with the communication between
machines.
But, then we figured out that the clocks were out of synch, and so the first
program was timing out when the second server still had 25 seconds in which to
respond.
Unix shell Script Here-Document Function
The Unix shell has what is called a "here-document" function which allows you
to put input under a command, instead of in a separate text file, and feed it into
the program.
This is done by placing a "<<" and a character string after the command. Then,
every line after the command is interpreted as free-form text to be fed into the
command, until a line is hit that consists of the character string.
For example:
$ cat << EOF
jack be
nimble jack be
quick.
EOF
Here, the three lines of text are fed into the cat command, which prints them
back out. The string "EOF" marks the end of the input.
Awk Script to Combine Lines in a File
Problem: We have a text file containing individual records spanning multiple
lines, like this:
Name: John Doe
Age: 32
Zip: 60324
Name: Jane Doe
Age: 34
Zip: 54930
Name: Skippy
Age: 134
Zip: 234556
We want to combine them into one line each:
Name: John Doe Age: 32 Zip: 60324
Name: Jane Doe Age: 34 Zip: 54930
Name: Skippy Age: 134 Zip: 234556
Solution:
/Name/ {
print d
d=""
}
{ d=d" "$0}
END { print d}
When a Name: line is hit, this script prints the current value of variable d, and
then clears it. Then, for all lines (including the Name: line) the variable d is built
up.
Then, the END statement gets executed to print the last record.
Unix Stream Editor
The Unix stream editor (sed) is useful for editing streams of text. You can
either pipe the text into sed, or else give a file name - in which case, sed works
on the file.
In all cases, sed does not change the original text, but sends the modified text to
standard out.
Sure, I could just use awk but, for some tasks, it's just easier to use sed.
Some examples:
sed 's/yes/no/' Substitute "no" for the first occurrence of "yes" on each line.
sed 's/yes/no/g' In this case, substitute "no" for all occurrences of "yes".
sed 's/yes/no/2' Substitute "no" for the second occurrence of "yes" on each line.
Finding Duplicates in a File
If you have a text file, you can find lines that are duplicated by running:
sort file | uniq -d
By default, the uniq command takes a sorted list and prints each line once. If you
add the -d option, it only prints lines that occur more than once.
For example,
file1
-------
hello
lemon
lemon
hello
lemon
So, to get all duplicates, we run sort file1 | uniq -d, which returns:
hello
lemon
Running Multiple Unix Commands
Let's look at some ways to run unix commands on the same command prompt.
$ cmd1; cmd2
This will first run cmd1, and then cmd2. It is the equivalent of running cmd1 on
the command prompt, pressing return, and then running cmd2 at the next
command prompt.
$ cmd1 && cmd2
This will run cmd1 first. If cmd1 runs successfully (return code 0), then cmd2 is
run.
$ cmd1 || cmd2
This will run cmd1 first. If cmd1 runs un-successfully (non-zero return code),
then cmd2 is run.
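For example (with made-up commands):
$ mkdir /tmp/work && cd /tmp/work
$ grep root /etc/passwd || echo "root not found"
The first line only changes into /tmp/work if the mkdir succeeded. The second
line only prints the message if the grep found no match.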
Unix Change Directory Commands
In unix, the cd command is used to change directories. For example, cd /tmp will
put you in the /tmp directory.
cd dir (without a /) will put you in a subdirectory. for example, if you are in /usr,
typing cd bin will put you in /usr/bin, while cd /bin puts you in /bin.
cd .. will move you up one directory. So, if you are in /usr/bin/tmp, cd .. moves
you to /usr/bin, while cd ../.. moves you to /usr (i.e. up two levels). You can use
this indirection to access subdirectories too. So, from /usr/bin/tmp, you can use
cd ../../local to go to /usr/local.
cd - will switch you to the previous directory. For example, if you are in
/usr/bin/tmp, and go to /etc, you can type cd - to go back to /usr/bin/tmp. You
can use this to toggle back and forth between two directories.
Complex Global Substitution in Unix Text files
A co-worker asked me to help him change all bfx* files in a directory. He
wanted to replace every occurrence of:
/2007
with:
/`date +%Y`
In other words, he wanted to remove the hard-coded 2007 and have unix
automatically run the date function to compute the current year. Here is the loop
we used:
for i in bfx*
do
sed 's/\/2007/\/\`date +%Y\`/' $i > tmp
mv $i $i.bak
mv tmp $i
done
Searching through Directories in Unix
Here is a script called "search" that will allow you to search through a hierarchy
of directories for files that contain a word or phrase:
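The listing is not shown, but a minimal version of search (assuming the script's
arguments make up the word or phrase) would be:
#! /bin/ksh
find . -type f -exec grep -il "$*" {} \;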
You could type in, for example, "search green" or "search will be going". In the
first case, it will return the names of files that contain "green". In the second
case, it will return the names of files that contain the phrase "will be going".
Search works because of the find command. The unix find command searches
directories recursively, and it has the -exec option, which allows you to specify a
command to be run on any file that is found:
find dir -exec command options {} \;
command and options are just the command name and any options. The {} are
place holders for the file name. Find will replace them with the name of each file
that it finds. The \; is used to signify the end of the command.
In this case, we are giving a grep command as the argument to the exec option.
Note that search is case insensitive so "search green" would return files with
"green", "Green", "GREEN", etc.
For case sensitive searches, I have a script called searchcase. The only difference
in searchcase is that the "i" in the grep is removed.
Using Unix Commands on Web Pages
We can parse web pages with unix tools by first using the lynx text browser to
retrieve the page.
For example, in this case we are using the lynx browser to return an Oncall page
(which lists which support people are on call), and then extracting the Primary
support person:
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
URL="http://www.oncallPage.com/contacts/"
lynx -dump $URL | grep Primary | sed 1q
First, I extend LD_LIBRARY_PATH so that the shared libraries lynx needs can
be found. Then, I create a string variable called URL which holds the URL of the
web page I'm interested in.
Finally, I call lynx. The -dump parameter tells lynx to return the formatted page
to standard out.
At this point, it is simply a text stream which we can edit like any other text
stream. Here, I am grepping any lines that contain "Primary", and then using sed
1q to return just the first instance.
Using AWK to Generate SQL from File
Here is a unix script that will read in a text file containing customer ids, Old
Template Names, and New Template Names.
The script will use awk to generate sql statements that can be run in oracle to
rename the templates from old to new, for each of the companies.
This script would come in handy when we have a whole file full of templates to
change, and don't want to write the sql by hand.
The interesting thing is that we use sprintf to store the apostrophe (ASCII
character 39) in the variable sq. This way, we can output apostrophes, which
normally cannot be embedded in an awk script that is itself enclosed in
apostrophes on the command line.
#! /bin/ksh
nawk '
BEGIN { sq=sprintf("%c",39) }
{
custid = $1
old = $2
new = $3
printf("UPDATE TEMPLATES SET TEMPLATENAME = %c%s%c\n",sq,new,sq)
printf("WHERE TEMPLATENAME like %c%s%c and custid = %c%s%c;\n",sq,old,sq,sq,custid,sq)
}' $*
Calculating the Previous Month in Unix
Here is a unix shell script that calculates the previous month.
Here is the run:
[576]-> last_month
Today is 10/10/2006
Last month was 9/2006
First day of last month was 9/01/2006
Last day of last month was 9/30/2006
Here is the script:
day=`date +%d`
month=`date +%m`
year=`date +%Y`
echo "Today is $month/$day/$year"
lmonth=`expr $month - 1`
if test "$lmonth" = "0"
then
lmonth=12
year=`expr $year - 1`
fi
echo "Last month was $lmonth/$year"
lday=`cal $lmonth $year |awk '$0~/[0-9]/ {print $NF}'|tail -1`
echo "First day of last month was $lmonth/01/$year"
echo "Last day of last month was $lmonth/$lday/$year"
The first part of the script uses the unix date command to retrieve today's day,
month, and year. We print today's date.
Next, we use the expr command to subtract 1 from the month. If the
month becomes 0, then that means that this month is January, so we wrap the
date to December of the previous year. We print out the previous month and
year.
In the third part, we retrieve the last day of the previous month, and then print
the first and last days of the previous month.
The tricky thing here is how we retrieve the last day. We run the unix cal
function to return last month's calendar. We pipe it into an awk command, which
prints the last field from each line. We pipe this to tail -1, which returns the last
line of the awk output.
This whole pipeline is enclosed in back ticks (`) so that we can assign the final
output to the variable lday.
Let's look at this, using the 9/2006 cal entry:
[578]-> cal 9 2006
September 2006
S M Tu W Th F S
1 2
3 4 5 6 7 8 9
10 11 12 13 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30
The above cal output would go to awk, which prints the last field of every line
that contains a digit:
2006
2
9
16
23
30
This would be piped to tail -1, which would return 30.
Processing Multiple Files Through Awk
Problem
--------------
Given an arbitrary number of text files, compose an awk script to search the first
20 lines of each file for a pattern, and then print each matched line, along with its
filename.
Solution
-------------
The solution makes use of two awk values: FNR and FILENAME.
FNR is the line number expressed RELATIVE to the current input file, while NR
is the line number RELATIVE to all input. Thus, if file1 (containing 20 lines)
and file2 were the inputs to awk, NR for line 1 of file2 would be 21, but FNR
would equal 1.
FILENAME contains the name of the current input file being processed. Thus,
when lines from file1 are being evaluated, FILENAME="file1". When line 1 of
file2 is reached, FILENAME becomes "file2".
Thus, the solution to the problem is:
nawk 'FNR<=20 && /pattern/ {print FILENAME":"$0}' files
Unix Script to Capitalize First Letter of Word
A friend at work asked me for help with an interesting unix problem.
He wanted his script to look at the first argument passed on the command line
($1).
If the first letter of the argument was capitalized (i.e. Boy or DOG), he wanted to
assign the argument to variable y as it is, with no changes.
If the first letter was lowercase (i.e. boy or dOG), he wanted to capitalize the
first letter of the word before assigning it to y. But, he only wanted to capitalize
the first letter. He did not want to capitalize the whole word. So, boy would
become Boy, not BOY. For dOG, it would become DOG since the OG was
already uppercase.
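Here is a sketch of the script (the original listing is not shown; the variable
names follow the description below):
#! /bin/ksh
typeset -u first
first=`echo "$1" | nawk '{print substr($1,1,1)}'`
rest=`echo "$1" | nawk '{print substr($1,2)}'`
y=$first$rest
echo $y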
I am using a variable called first to hold the first character of $1. I am using the
awk substring function to return the substring of $1 that starts with position 1,
and is 1 character long.
I used typeset -u to denote that variable first stores uppercase only. This means
that anything assigned to first will be automatically capitalized.
I am using the variable rest to hold the rest of $1. I used the awk substring
function to return the substring of $1 that starts in position 2. I did not specify a
length, so it automatically returns the characters up to the end.
So, for example, if $1 was sunset, then substr($1,1,1) returns s and substr($1,2)
returns unset.
Finally, I assign $first$rest to y, so that y now holds $1, with the first character
guaranteed to be uppercase.
Pulling Sections From An XML File Using AWK
Recently, I had to pull all mt100 records out of a file that was written in XML
format. So, I needed to pull out anything between mt100 and /mt100 tags, while
ignoring the rest of the file.
/mt100/ {writ=1}
writ == 1 {print}
/\/mt100/ {writ=0}
In this case, awk will ignore lines in the file unless one of three conditions is
met:
1. The line contains the pattern mt100. If this occurs then a variable called writ is
set to 1.
2. If writ is 1, then we print the current line. All awk variables are initialized to
0.
3. The line contains pattern /mt100. If this occurs then writ is set to 0.
So, writ is a flag. It starts off as unset, then gets set whenever we hit the mt100
tag, and is unset when we hit the /mt100 tag. Whenever writ is set, lines get
output.
Unix Sort Question
A co-worker came to me with a sort question. He had a file with the format of an
alpha part and then a numeric part.
Here is an example:
abf 11111
abc 11111
abde 11111
abc 11112
He wanted the file sorted only using the alpha part, and he wanted lines with
duplicated alpha parts removed.
So, using the example input file, the output would be:
abc 11111
abde 11111
abf 11111
He had tried to use the sort and uniq commands, but was having trouble. The
solution is:
sort -u -k 1,1 file
The -k 1,1 option causes the sorting to be done from the first field to the first
field. In other words, only use the first field for sorting.
The -u option eliminates the need to pipe the output to uniq. Coupled with the -k
1,1 option, it removes lines that are duplicates only in the first field - which is
what we want.
Unix Script to Find Difference From Two Time Stamps
Here, we will find the difference between these 2 time stamps (assume they are
from the same day):
"5:02:02"
"5:19:59"
Here is a sketch of the script, reconstructed to match the description below:
T1='5:02:02'
T2='5:19:59'
h1=`echo $T1 | cut -d: -f1`
m1=`echo $T1 | cut -d: -f2`
s1=`echo $T1 | cut -d: -f3`
h2=`echo $T2 | cut -d: -f1`
m2=`echo $T2 | cut -d: -f2`
s2=`echo $T2 | cut -d: -f3`
x1=`echo "$h1 * 3600 + $m1 * 60 + $s1" | bc -l`
x2=`echo "$h2 * 3600 + $m2 * 60 + $s2" | bc -l`
if test $x2 -ge $x1
then
diff=`echo "$x2 - $x1" | bc -l`
else
diff=`echo "$x1 - $x2" | bc -l`
fi
echo "The difference is $diff seconds"
We are storing the time stamps in T1 and T2. Then, we use the cut command to
extract the hour, minute, and seconds fields for each time stamp. We then
calculate T1 into seconds and store it in x1. We calculate T2 into seconds and
store that in x2.
We give bc the -l option to load the floating point library. Since we are only
adding integers, we don't need the library, but I always give the option anyway,
because there is hardly any overhead.
Using Unix Shell Script to Build Java Classpath
Scenario: You want to run a java program from a shell script. Before you invoke
the java command, you want to build the CLASSPATH vaariable dynamically
with all the jar files in a certain directory (denoted by $java_dir).
Solution:
for line in $java_dir/*.jar
do
CLASSPATH="$CLASSPATH:$line"
done
This for loop will cycle through each file in the directory $java_dir that has a
".jar" extension.
During each pass, the variable "line" is set to the full pathname of the jar file. We
add the jar's pathname to the CLASSPATH.
Using Awk To Generate Random Coupon Code
Here is an awk script I use to generate 100 random 8-character coupon codes.
Each character has 62 possibilities (a-z, A-Z, and 0-9). This means there are
62^8 possible coupon codes.
BEGIN {
s="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
srand()
for (i=1;i<=100;i++) {
code=""
for (j=1;j<=8;j++)
code = code""substr(s,int(rand()*62)+1,1)
print code
}
}
Let's analyze the script. First, we set string s to hold all 62 possible characters.
Next, we call srand() to seed awk's random number generator. We left the
argument blank, so that the current date is used for seeding.
We now loop 100 times, because we want to output 100 coupon codes.
In this loop, we first set the coupon code back to the empty string. Then, we
have an inner loop that executes 8 times to build the code. Finally, we print the
code.
Notice the command in the inner loop. This command uses the random (rand)
function. Since rand() returns a number greater or equal to 0, and less than 1, we
multiply it by 62 and use the integer (int) function.
This will return a number between 0 and 61. Why? Because int(0*62) = int(0) =
0 and int(.999...*62) = int(61.99...) = 61.
We then add 1 to the result to get a random number from 1-62. We then use this
result in the substr function to randomly pick a character.
Splitting a Unix File into Smaller Files
Let's say that we have a large unix file. For example, a text file called my_list
with 100,000 lines.
We need the data contained in smaller files with no more than 1000 lines each.
We can use the unix split command:
split -l 1000 my_list
This will create 100 files in the current directory that each contain 1000 lines
from my_list. Since we did not specify a name for the output file, the files will
be named by an x, followed by two letters of the alphabet (from aa to zz).
So, for example, the first 1000 lines of my_list will be in file xaa, the next 1000
lines in xab, the next 1000 in xac, etc.
If we had given a name as a second argument, such as split -l 1000 my_list my_list,
then the output files would have been my_listaa, my_listab, etc.
The Split Function in Awk
Awk has a split command, which takes a string and splits it into an array, and
returns the number of elements. The default separator is white space.
As an example, let us assume that a line in a logfile consists of:
4/2/2003 11:23:18 This is a log entry with timestamp.
and we have an awk program like this:
{
split($1,DATE,"/")
n = split($2,TIME,":")
print "Month is "DATE[1]
print "Minutes are "TIME[2]
print "Time has "n" parts"
}
Running the program against the logfile line would result in the following
output:
Month is 4
Minutes are 23
Time has 3 parts
Accessing Unix Command Output in Awk
You can run a unix command through awk, and then access the command's
output within the awk script.
The first call to "cmd"|getline will open it as a pipe and fetch the first line of
output. Each subsequent call will fetch the next line of output. If there is no
output, it will return empty.
For each line, $0 will be automatically assigned to the whole line, and the fields
($1, $2, etc) will be assigned by breaking up on the whitespace pattern.
while ("cmd"|getline)
#! /bin/nawk -f
BEGIN {
while ("env"|getline)
print $0
}
}
This will run the "env" command in a unix shell and it will keep looping until
there are no more environment variables. Each line will be printed by the print
command.
A Formal Way To Parse Command Lines in Shell Scripts
For unix scripts that get executed by lots of other users, I like to make them user
friendly by allowing arguments to be passed in any order.
For example, let's say that I created a script called cube that takes four
parameters: three values for the cube's dimensions (height, width, and depth) and
a flag to make it a die (i.e. have the sides numbered 1-6).
For quick and dirty scripts, I would just read in the arguments in order. For
example, I might assign $1 to height, $2 to width, $3 to depth, and $4 to flag.
So, if a user ran cube 3 4 5 1, I would create a 3x4x5 cube that was a die.
If I wanted to make the script user friendly, I would specify the parameters -
height, -width, -depth, and -die, where the first 3 params would take an
argument.
Let's further clarify that the height is the only required parameter. If the width or
depth is omitted, it would be the same as the height. If -die is not provided, then
the cube will not be a die.
My shell script would start with two functions: Usage and parseArgs.
The Usage function simply prints out the possible arguments and whether they
are optional (i.e. in brackets):
function Usage
{
echo "Usage: cube -height height [-width width] [-depth depth] [-die]"
}
The parseArgs function initializes a heightCheck variable to 0, because height is
a required field. Then, as long as the first argument ($1) exists, the while loop
will execute. The while loop uses a nested case statement to identify the
parameter, and then does a shift to shift all arguments to the left (so $2 becomes
the new $1). If the parameter takes an argument, then $2 is accessed and an extra
shift statement is done.
function parseArgs
{
heightCheck=0
while [ -n "$1" ]
do
case "$1" in
-height) height="$2"
heightCheck=1
shift
;;
-width) width="$2"
shift
;;
-depth) depth="$2"
shift
;;
-die) die=1
;;
esac
shift
done
}
After these functions, the main script will start. First, we will check for the
arguments by calling parseArgs with the arguments to the script:
parseArgs "$@"
Next, we will set the defaults. ${a:-"b"} returns a if a is set (assigned a value)
or else it will return b:
width=${width:-$height}
depth=${depth:-$height}
die=${die:-0}
If heightCheck is still 0 after parsing, the required height was never given, so we
would call Usage and exit.
Running Unix Commands on Directories With Too Many Files
A co-worker once tried to grep through a directory containing 240,695 files, and
the command failed because the expanded list of file names was too long for the
command line.
He thought the limitation was in grep and so asked me to provide him with the
equivalent awk script. I told him that it was not a grep problem. The issue is
with too many arguments on the command line - so the problem would happen
with awk also.
The solution is to use a for loop, so you are actually running the command
240,695 times with only one argument:
for i in *
do
grep pattern $i
done
Extracting Initials In Awk, Part 1
Someone once asked me for help.
They had a file file1 containing first and last names. They wanted to output the
initials to a new file file2.
So, if file1 contained "charlie brown", the script had to write "cb" to file2. They
also wanted duplicate initials to be numbered, so that each output line was
unique.
For example, if file1 contained:
charlie brown
orphan annie
chuck barry
then file2 would contain:
cb1
oa1
cb2
Here is my solution:
awk '{
initial=substr($1,1,1)""substr($2,1,1)
INITCOUNT[initial]++
print initial""INITCOUNT[initial]
}' file1 > file2
The first line sets a variable initial to be the initials (i.e. cb).
The second line sets an associative array called INITCOUNT that is indexed by
initial. The code increments the value of INITCOUNT by 1. So
INITCOUNT["cb"] is 1 the first time the initial is "cb", the next occurrence sets
it to 2, etc.
Generating Random Numbers In Awk
In awk, you can generate random numbers with the rand() function.
Like most number generators, you need to "seed" the function (i.e. provide an
initial value for the mathematical process), otherwise the function will return the
same values.
You can seed the awk random generator by calling srand() in the BEGIN section
of the awk program. By not providing an argument, it defaults to using the
current date / time as the seed value.
The function returns a value v, where 0 <= v < 1. This means that v can be a
value in between 0 and 1, and it can be 0, but it can't be 1. In other words, v can
be 0, 0.12, 0.65, 0.999, etc.
So, the way to generate a random integer from 1 to N in awk is to use the
formula value = int( rand() * N ) + 1.
For example, if you want to simulate a dice roll, you need to generate a random
number from 1 to 6. This means that the random value can be 1, 2, 3, 4, 5, or 6.
You would use int(rand() * 6) + 1. Here is why it works:
1. rand() will return a value v, where 0 <= v < 1.
2. Multiplying v by 6 gives a value from 0 up to, but not including, 6.
3. int() will round it down to the nearest integer, thus resulting in a number from
0 to 5.
4. Adding 1 shifts the result to a whole number from 1 to 6.
Using the Unix Dot (.) Operator to Run a Shell Script in the Current Shell
To run a script in the current shell, put a dot and a space in front of the script
name:
. script
Normally, when you run a shell script, it executes in a child shell. Once the script
completes, the child shell goes away and you are returned to the command
prompt on the original shell. This means that the scope of the script doesn't
apply to the invoking shell.
In other words, if your script changed the directory or set some variables, those
changes won't be reflected after the script ends.
Using a dot space in front of the script means that the scope of the script is the
same as the invoking shell.
For example, let's say that you are currently in the /usr directory and you run the
script go_tmp, which changes the directory to /tmp.
If you run go_tmp then, after the script ends, you will still be in /usr. If you run
. go_tmp instead, you will be left in /tmp.
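Here, go_tmp could be as simple as this (a made-up two-line script):
#! /bin/ksh
cd /tmp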
The dot operator is also used a lot in shell scripts that need to call other scripts
(like subroutines).
Sending Email From a Unix Script
Sometimes I need my unix script to email the output to a mailing list.
1. I set a unix variable called MAIL_LIST that holds all the email addresses in a
space separated list.
MAIL_LIST="john.doe@acme.com bob_jones@somewhere.org"
2. Instead of sending the output of the script to standard out, I send it to a temp
file.
3. At the end of the script, before exiting, I use /usr/bin/mailx to email the
output.
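For example (assuming the output was saved in a temp file named $TMPfile,
with a made-up subject line):
/usr/bin/mailx -s "Nightly report" $MAIL_LIST < $TMPfile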
Customizing Your Unix Ksh Environment with .kshrc
Here is a .kshrc file I helped a co-worker set up:
alias ll='ls -l'
umask 002
stty erase "^H" kill "^U" intr "^C" eof "^D" quit "^\\" susp "^Z"
PS1="\\
\$PWD \\
`hostname`:[\!]-> "
export PS1
The first entry set an alias of "ll", so "ll" could be run like a command. It would
do the long list function (ls -l).
The umask sets it so that files are created with a default permission of rw-rw-r--
and directories are created with drwxrwxr-x.
The stty line sets the terminal characteristics, so that his unix window's
backspace key will work.
Finally, it sets the main command prompt (PS1) to always display the current
directory, the machine name, the current command number in history, and a ->
for entering the command.
For example, if you were currently in the /etc directory, and your unix machine
was called "donut", your command prompt would look like:
/etc
donut:[583]->
Note: Make sure that your .profile file in your home directory calls .kshrc, so
that it is automatically loaded every time you log in to unix.
How To Use Multiple-Word Arguments in Unix Scripts
Did you know that unix command line arguments can be more than one word?
You can group text separated by spaces into a single argument by surrounding
the text with quotes.
For example, what are the differences between these argument lists?
boy girl
boy dog girl
"boy dog" girl
In the first example, we have two arguments: boy and girl. In the second
example, we have three arguments: boy, dog, and girl.
In the third example, we have two arguments: boy dog and girl.
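A quick way to see this is a two-line test script (call it showargs) that prints the
argument count and the first argument:
#! /bin/ksh
echo "Number of arguments: $#"
echo "First argument: $1"
Running showargs "boy dog" girl prints 2 and boy dog.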
Awk Script to Generate an HTML Page With Randomly Labelled URLs
Here is an interesting awk script I created.
It reads in a file of URLs and then generates an HTML page containing links to
each of the URLs. Each link will be labeled with a random number.
I used this script (and the page it generated) to do some testing with people I
recruited off of Craigslist.
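Here is a sketch of the script (the original listing survives only in fragments; the
square brackets stand in for HTML angle brackets, as in the rest of this post):
#! /usr/bin/nawk -f
BEGIN {
print "[html]"
srand()
}
{
print "[a href="$1"]" (int(rand()*62)+41) "[/a]"
print "[br]"
}
END {
print "[/html]"
}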
print "[br]"
The first things to look at are the BEGIN and END statements. They are each
run once - the BEGIN before the file is read in, and the END after.
The BEGIN statement prints the html tag and seeds the random number
generator. The argument to srand() is empty, so we use the date/time for the seed.
This way, we won't get the same random numbers every time the script is run.
The first one will output the link. The rand() function will return a number that is
greater than or equal to 0 and less than 1. So we can write this as 0 <= n < 1.
So, in other words, the lowest value returned by rand() is 0 and the highest is
.9999...
Next we are multiplying rand() by 62 and using the int() function. This will
result in an integer from 0 to 61. Then, we are adding 41. This means that the
random label used for the link will be from 41 to 102.
Using Awk to Generate HTML and Java Script
In this script, we do the same thing as in the last post, except that, for each URL,
we write javascript which will randomly build the link 5% of the time, at runtime.
So, we are using random numbers twice. First, in the awk script itself to create
the labels for the links. Second, we were writing the javascript random function
to the output, so that this random function is run every time the page is loaded in
a browser.
Let's use an example to make it clearer. Pretend we have a file with 10 URLs. If
we run the awk script from the last post, the output will be an HTML file that
displays all 10 links, each one labeled by a random number.
If we run this same file on the script below, we will have an HTML page that
runs a javascript routine for each link to decide whether or not it gets displayed.
Each link only gets displayed 5% of the time. So, this HTML file will display
from 0-10 links every time it is loaded into the web browser.
print "document.write("apos"[br]"apos");"
print "}"
print "[/script]"
}
Notice the apos constant. In the BEGIN block, we set it equal to the apostrophe.
This way, we can insert apostrophes for the javascript, without awk processing
them.
Calling Unix Commands From Within Awk: system vs getline
Inside an awk script, there are two ways in which you can interface with the
operating system: system() and getline.
system() is good if you want to just run a command, and don't need any results
back.
getline can be used when you want your awk program to return data back into
the awk script.
Some examples:
ret_code = system("sort file1 > file1.sorted")
"date" | getline today
The first example sorts a file, and then stores the return code in ret_code. The
second example runs the date command and reads its output into the variable
today.
Creating csv Files From Unix When the Data Has Commas
I run a query on a unix database using sqlplus from a shell script. I then save the
results in a csv (comma separated value) file and load it into Excel. What I find
is that some rows have extra columns, because some of the data contains
commas. The solution:
1. Extract the data with a % (or other symbol) separating the data.
2. Use sed (unix stream editor) to remove all commas, and then convert the %'s
into commas, along these lines (file names are just examples):
sed 's/,//g' rptfile | sed 's/%/,/g' > report.csv
Then, your columns should line up in excel because the data will no longer
contain commas.
Using uuencode to Mail Attachments From Unix Shell Scripts
At work, I frequently have to do queries on an Oracle database, and send out the
results in a spreadsheet.
If the report can be done with SQL alone, I usually do the report on my PC using
TOAD (Tool for Oracle Admins and Developers). TOAD lets you save the
results as an excel spreadsheet.
If, however, the report was complex where it required manipulating the output
through unix shell / awk scripts, or I wanted to automate the report, I had to run
the query from unix through sqlplus. If I then mailed the report from unix, the
data would be in the body of the email. This isn't good if the report had a lot of
columns.
I recently found a solution. It is a unix utility called uuencode, which let's you
send data as an attachment. By sending the report as a csv (comma separated
values) file, the email will appear in Outlook on my PC with an attachment that
will open in Excel with one click.
Here are two ways to use it (assume that the report was saved in a text file called
rptfile, all data are separated by commas, and the subject and address are just
examples):
uuencode rptfile report.csv | mailx -s "Report" john.doe@acme.com
(cat bodyfile; uuencode rptfile report.csv) | mailx -s "Report" john.doe@acme.com
The first way sends an empty email with report.csv as an attachment. The second
one also sends an email with report.csv as an attachment, but the email also has
the contents of bodyfile in the body of the email.
Removing Carriage Return Line Feeds (CRLF's) From Text Files in Unix
In unix, each line in a text file ends with a line feed. Windows text files,
however, end each line with a carriage return and a line feed.
Normally, when you ftp a text file between Windows and unix, the end of line
characters get converted. Sometimes, however, the carriage return and line feed
are carried over into the unix file unchanged. This can happen, for example, if the FTP is set to binary
mode before the file is sent.
When a unix file has both a carriage return and a line feed, it will display a
control M at the end of each line. You can remove them in the vi editor:
1. vi the file.
2. Type colon to get a command line.
3. Type 1,$s/ctrl v ctrl m//g and press return. Typing ctrl v and then ctrl m
enters the control M character itself; the empty replacement deletes it from every
line in the file.