Unix and Python Tutorial For 20 180
Unix and Python Tutorial For 20 180
180
Originally written by Sampsa Hautaniemi
Modified by Sabrina L. Spencer and Laura B. Sontag
February 8, 2006
Contents
1 Hands on UNIX 3
1.1 Basic UNIX operations . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Logging in and out of UNIX . . . . . . . . . . . . . . . . . . . . . 4
1.2.1 Help yourself . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.2 Directory and listing commands . . . . . . . . . . . . . . 4
1.2.3 File commands . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.4 Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Executing programs in UNIX . . . . . . . . . . . . . . . . . . . . 6
1.4 Editors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 Unix exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3 Python exercises 23
3.1 Play with Python . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2
1 Hands on UNIX
Unix is a multi-user, multi-tasking operating system; hundreds of Unix users
are able to use tens of program simultaneously. Accordingly, Unix provides
computational biologists and bioengineers with many useful features. For ex-
ample, high performance computing clusters are practically always based on
Unix. Therefore, some familiarity with the Unix commands is indispensable.
The Unix part of this tutorial is directed to students with no prior experience
with Unix systems. The goal of this section is to review basic commands and
principles for the Unix operating system, and using this tutorial a student should
be able to:
• copy files in Unix.
• create directories and navigate in Unix.
• get help in Unix.
• execute programs, such as Python, in Unix.
More comprehensive texts are available, for example, in [Unix Tutorial, Fiamingo et al., 1998].
In general, a command may have several options. Options are usually pre-
ceded by a hyphen (-) and several options can be concatenated. Here, only the
most useful options are discussed.
Unix is a rather old operating system (b. 1970). This brings forward several
differences between the Unix system and desktop/laptop computer operating
systems, such as Windows and Apple. For example, commands in Unix are
typed in command prompt (no need to use a mouse and click windows). Unix
is also case-sensitive, so string “text” is different from “TEXT”, and command
“exit” is executed while “Exit” causes an error.
Throughout this tutorial, Unix commands are typed in bold italics (e.g. cd ),
options in bold (e.g. ls -la), and examples of the Unix operations in Athena
are typed with the typewrite font. Often, especially when using editors, some
commands are preceded with pressing down control, alt or shift, and these are
denoted with CTRL, ALT, SHIFT, respectively. For example, in order to
exit a Pico editor session, one needs to first press down the control key and
then simultaneously press down “x”, which is denoted as CTRL+x. To exit a
Python session and return to the Athena prompt, one needs to type CTRL+d.
3
1.2 Logging in and out of UNIX
In order to use a Unix operating system one needs to log in. The easiest way
is via a workstation; another option could be an SSH (telnet) program in a
PC/MAC.
Whatever means are used to make the connection to Unix, you need to give
your username and password in order to log in. If your username and password
match an entry in the database of approved usernames and associated pass-
words, you have successfully logged in to the Athena cluster and the command
prompt appears:
athena%
In order to end a Unix session, the user needs to log out. This can done
with the command exit (or with the slightly longer command logout ).
Note that the /s is referring to the first letter of my Athena user name and the
/p is referring to the second letter of my Athena user name.
Command mkdir makes a new directory. For example, in order to create
directory BE180, type mkdir BE180 in the command prompt. Command
rmdir removes a directory, which must be empty (i.e., it should not contain
any files or directories). For example, in order to delete the folder BE180, type
rmdir BE180. Files can be removed with the command rm.
Command cd (change directory) enables navigation in the directory struc-
ture. For example, in order to move to directory BE180, type cd BE180. The
directory must exist before executing cd. In order to move one step towards
the root, two dots (..) should be used. For example, if you are in the directory
4
BE180, type cd .. in order to go up one level and get back to your home direc-
tory. Typing cd ∼ always brings you back to your home directory, no matter
where you are in the directory structure.
Command ls lists all files and directories in the current directory. ls has
many options, but -la is particularly useful (type ls -la) because it lists each
file and directory in one row and shows their properties such as attributes, size
and creation date. Attributes will be discussed in Section 1.2.4. Sometimes a
directory contains so many files that they do not all fit on the screen. In this
case -la | more is useful.
athena% cp /afs/athena.mit.edu/user/s/o/sontag/www/BE180/copyme.py
/afs/athena.mit.edu/user/s/p/spencers/BE180/copymedup.py
athena% cp /afs/athena.mit.edu/user/s/o/sontag/www/BE180/copyme.py
∼/BE180/
1.2.4 Security
Since Unix is a multiuser system, basic knowledge on some security issues is
needed. Students using Unix need to be especially careful about what they
leave in their directories and how these directories are secured. Unix gives
many ways to deal with the security, such as chmod (permissions mode of a
file). Attributes of a file or a directory can be listed with ls -la. For example,
Entries starting from the second position (rwxr. . .) denote attributes for copyme.py.
The first entry indicates if the listed entity is a file (-) or a directory (d). Entries
5
2–4 denote what rights the user has; r is read, w write, and x execute. Entries
5–7 denote rwx rights to a group (which needs to be specified) and entries 8–10
to all users. Thus, copyme.py can be read and altered by all users.
In this case, copyme.py was successfully executed and it printed two lines to
the screen.
Another example of executing programs in Unix is sending email with the
program Pine. For example, pine -attach copyme.py spencers@mit.edu,
opens the Pine program, attaches copyme.py, and sends the email to spencers@mit.edu.
1.4 Editors
Unix offers a variety of editors for typing and editing text. Probably the user-
friendliest is Pico. Another widely used editor is Emacs, which is very flexible
and valuable for heavy users. However, the flexibility of Emacs makes it far
more difficult to use than Pico. No support for Emacs will be provided in this
course, but there are many tutorials available, for example [Emacs Tutorial].
If your prior Unix experience consists only of a few hours, it is highly recom-
mended that you choose Pico for this course. The main reason for this is that
Pico is easy to use. If you want to edit copyme.py, just type pico copyme.py.
If the file does not exist in the directory, Pico will create new file with that
name. A short Pico online tutorial is available at [Pico Tutorial]. To save a file
in Pico type CTRL+o. To exit Pico and return to your Athena prompt, type
CTRL+x.
6
4. Execute Unix intro.py with Python in your directory BE180. (Section 1.3)
5. Open Unix intro.py with Pico (or with an editor of your choice) and follow
instructions in the file. Remember to save your changes. (Section 1.4)
6. Execute Unix intro.py again with Python.
7. Email Unix intro.py to yourself. (Section 1.3)
7
2 Introduction to programming and Python
Python is an object-oriented, interpreted, programming language mainly used
for making scripts (terms in italic will be explained in this tutorial). The aim of
this tutorial is to provide students with sufficient knowledge of programming in
Python to be able to solve basic tasks. Several excellent Python tutorials exist
for beginners and advanced users, and it is highly recommended to use them in
addition to this one [A Python Tutorial, Many Python Tutorials, Python book].
A programming language is a set of vocabulary and grammatical rules for
instructing a computer to perform specific tasks. Python is a “high-level lan-
guage”, which means that its syntax resembles that of natural language, such
as English. Even though Python is much more simple than English, it is more
versatile and easier to learn than low-level languages such as Assembler or ac-
tual machine language, which is understood by the computer. Accordingly, a
high-level program must be translated into machine language. In general, the
translation can be done in two ways. The first is to compile the program, which
is how C++ works. The second is to interpret the program, which is how
Python works. Compilation translates the whole program directly into ma-
chine language, while the interpreter translates one line at a time. Interpreted
programs, and Python in particular, have several benefits, of which the most
important is that the programs can be studied interactively.
A script (macro) is a sequence of commands that are interpreted. Scripting
allows users to create programs that automatically perform a specific task. For
example, assume that you obtain hundreds of DNA sequences and you want
to perform a BLAST search to align the sequences. A simple Python script
allows you automatically perform the search and store the results, while your
contribution is restricted to coding the script and starting it.
Python is an object-oriented programming language, which means that each
data type can have several types of operations or methods. For example, a
string object has several types of methods, such as ’count’, and this function is
used in Python (to open Python in Athena, type python ) as follows:
>>> "GATTACA".count("A")
3
8
before writing a single line of a program. That is, make sure that you under-
stand the assignment and have a plan for what you will do before starting to
program.
A fruitful approach to tasks requiring programming is a top-down approach.
First, understand what the programs needs to do: what are the inputs and what
does the program need to output? Second, split the task into smaller tasks, such
as: “Read a file”, “Find start/stop codon”, “Read amino acids between the start
and the stop codon”, “Write the results”. These smaller tasks correspond to
modules (or functions) and they can be used in several programs doing similar
things. For example, once you have made a function that reads a text file, it
can be used in many programs using text files and with small changes to read
other file formats.
9
>>> s = "GATTA"
>>> s2 = s + "CA"
>>> s2
’GATTACA’
>>> s3 = s + " " + "CA"
>>> s3
’GATTA CA’
Several strings, which do not need to be of the same length, can be stored
to a list. A list is delimited with brackets ([]). After creating a list, the entries
can be retrieved easily using a bracketed index. However, perhaps somewhat
counterintuitively, the first index to a list is 0, not 1. For example, in the inter-
active Python mode:
>>> li = ["GATTACA", "ATTAC", "ACATTAG"]
>>> li
[’GATTACA’, ’ATTAC’, ’ACATTAG’]
>>> li[1]
’ATTAC’
>>> li[0]
’GATTACA’
If you forget to type brackets, each entry is taken to be its own element in
the list:
>>> li.extend("GGG")
>>> li
[’GATTACA’, ’ATTAC’, ’ACATTAG’, ’GAGA’, ’G’, ’G’, ’G’]
The method ’find’ finds the first entry of a substring. For example, the starting
position of the substring “TTA” for the first sequence in the list li :
10
>>> li = ["GATTACA", "ATTAC", "ACATTAG"]
>>> li[0].find("TTA")
2
Methods for lists are sufficient for basic operations such as finding and count-
ing. However, sometimes there is a need for more sophisticated string opera-
tions. Python allows users to import modules that can perform complex tasks.
In the case of strings, the next step after string and list methods is to import
the module string. The string module can be imported with the command im-
port and the operations are first preceded by the module from which they are
imported. For example, (although this is not necessary to count):
For the full list of the operations in the string module, type dir(string),
help(string), and/or visit the Python online tutorial [String module commands].
In some cases the string module is not powerful enough. The next (and in
some sense the ultimate) step is to use regular expressions (import re). The
drawback of regular expressions is that they are much more difficult to use than
the string and list methods. Indeed, there are several tutorials for just how to
use regular expressions, e.g. [Regular expression Tutorial]. In this course you
should be able to perform all programming tasks without regular expressions.
2.3.2 Indexing
Getting slices of a string is somewhat counterintuitive because the first index is
included in the resulting slice, but the last one is not. Examples follow:
11
>>> s = "GATTACA"
>>> s[1:3]
’AT’
>>> s[0:2]
’GA’
>>> s[0:-1]
’GATTAC’
>>> s[1:len(s)]
’ATTACA’
Note that when using the method ’len’ the last entry is also included in the
slice.
2.3.3 Dictionaries
Dictionaries are mapping constructs consisting of key-value pairs. A dictionary
is created with curly brackets ({}) or the command dict. For example, let’s
make a dictionary that has two keys (Seq1 and Seq2 ) and two values (GAT-
TACA and ATTAC), and then add the sequence ACATTAGA (key is Seq3 ).
Let’s also try to create another entry of the dictionary that has a list for its key,
and observe that lists are not good keys:
Note that lists cannot be keys but can be values (the same applies to dic-
tionaries: dictionaries cannot be keys but can be values). Keys and associated
values are initialized using a colon (:) and new entries are added just by using
the key as above. Note also that the order of the key-value is irrelevant in a
dictionary because the values are retrieved using the keys. For example, let’s
retrieve GATTACA:
12
>>> my dict = {"Seq1":"GATTACA","Seq2":"ATTAC"}
>>> my dict["Seq1"]
’GATTACA’
Dictionaries, like lists and strings, have several methods. Use help(dict) to
see the listing. For example, in order to store all the keys and values, the meth-
ods ’keys’ and ’values’ are useful:
13
2.4 Controlling the flow
In real-life programming tasks one wants to control the flow of the program.
For example, IF a sequence has a start codon at position i, THEN print the
characters UNTIL a stop codon is reached. A way to control the flow is via
conditional execution (if -elseif -then ) construction.
Controlling the flow requires the concept of code indenting. Code indenting
is very important in Python! In this tutorial we use three spaces ( ) to
denote an indented block, but any other number is as good as long as it remains
consistent throughout the file. Indenting means that everything in if, elif or
else blocks needs to be indented. A colon after if, elif or else statement tells
that the block begins and returning indention up one level tells Python that the
block has ended. An if -statement does not necessarily need to be followed by
elif - or then-blocks.
When programs become more complex, it is better to write the program
into a file (using an editor such as Pico) and then execute it using the command
python. For example, assume we first want to determine whether a sequence
contains more “A”s or an equal number of “A”s than another sequence and
then print this information to the screen. Here, we have typed the following to
iftest.py (mark denotes a space):
s1 = "GATTACA"
s2 = "TATAG"
if s1.count("A") == s2.count("A") == 0: 1
print "Strings", s1, "and", s2, "do not have any As"
elif s1.count("A") > s2.count("A"): 2
print "String", s1, "has more As than",s2
elif s1.count("A") == s2.count("A"): 3
print "Strings", s1, "and", s2, "have equal numbers of As"
else: 4
print "String", s2, "has more As than", s1
print "This line will be printed always" 5
First, in 1 we ask if the strings have at least one “A”. Note colon (:) and
indention! Note also that an if -statement may have several comparisons. If
either of the strings has at least one “A”, in 2 we compare them to ask if the
number of “A”s is greater in s1 than in s2. In 3 we test if the number of “A”s
is equal. If s1 does not have greater or equal number of “A”s, then the only
possibility is that s2 has more, which is printed in the else block 4 . Regardless
of what happens in the if -elif -then blocks, 5 will always be executed because
it is not indented. Now, iftest.py is executed:
14
In Python, character == denotes equal comparison and using = in compari-
son causes an error. The symbol = is an assignment operator. Use it to assign a
number to a variable, for example s=9 . Character > denotes “strictly greater
than”, >= means “greater than or equal to”, and a != b compares whether a
is different from b. It is also worth remembering that if an if or elif condition
realizes, the rest of the elif conditions are not tested (in different programming
languages this may be different).
Note that it does not matter if there is an empty line between the for-block
and the print command. This is true in general in Python; one can add as
many empty lines to the program as needed (sometimes they improve readabil-
ity of the program).
Often with for-loops the method ’range’ is useful. Parameters for ’range’ are
start (optional), end (obligatory), and step (optional); ’range’ returns a list
containing an arithmetic progression of integers. For example,
15
>>> range(5)
[0, 1, 2, 3, 4]
>>> range(0,5,1)
[0, 1, 2, 3, 4]
>>> i = range(0,7,3)
>>> i
[0, 3, 6]
Note how Python does not care that we tried to access non-existing elements
and just returned the last letter.
In addition to for-loops, one can use while-loops for repeated execution. A
while-loop does something until a condition is fulfilled. It is useful when read-
ing something from a file (more discussion on dealing with the files is offered
in the next section). For example, if Seq.txt consists of sequences GATTACA,
ATTAC, ACATTAG (one in each row), we could read them line by line (the
following code is stored to whiletest.py ):
f = open("Seq.txt", "r")
seq = f.readline()
while seq != "" :
print seq.strip()
seq = f.readline()
f.close()
print "Once again this will be printed"
16
athena% python whiletest.py
GATTACA
ATTAC
ACATTAG
Once again this will be printed
athena%
Method ’strip()’ reads the sequence and removes all leading and trailing charac-
ters defined in the parenthesis. If the parenthesis is empty, like in the example
above, then whitespace characters (spaces, tabs, carriage returns and line feeds)
are removed. If ’strip’ was not used, there would be an empty line after each
sequence.
Using the command open, you can open files in a Python program. Attribute
"r" makes the file open in read-only mode, which means that you cannot (ac-
cidentally or deliberately) write anything to Seq.txt. Further, assume that you
need to write the contents of Seq.txt directly (of course, in real-life you would
do something sophisticated to the data before storing it again) to the file out-
put.txt. First, we need to again open both files (and then close them afterwards):
f = open("Seq.txt", "r")
towrite = open("output.txt", "w")
<...place some really cool code here...>
f.close()
towrite.close()
17
If output.txt does not exist, attribute "w" creates a file and names it as out-
put.txt, and assigns a write mode to it. If output.txt existed, the contents of it
are deleted before new data are stored into it, so be careful when using "w".
First we read all the lines from Seq.txt and then use a for-loop to print each
line to one row in output.txt. The code below is typed to iotest.py :
contents = readfile.readlines() 1
for i in contents:
SeqToWrite = i.strip() 2
print >> writefile, SeqToWrite
3
readfile.close()
writefile.close()
contents = readfile.readlines()
for i in contents:
writefile.write(i)
readfile.close()
writefile.close()
Note that here we did not delete newline characters from the sequence. If
we did, all the sequences would have been written together to the first row.
try:
<Do something, e.g. open a file.>
except:
18
<Define here what do to if an error occurred.>
<...here life may go on...>
2.7 Functions
Functions allow the programmer to store valuable procedures and use them fre-
quently by calling them in the main program. In Python, a function block is
denoted with the command def. For example, a function that takes in a name
of a file, tries to open it, reads its contents to a list and returns the list, would
be as follows (functiontest.py):
def ReadFileContents(filename):
try:
file = open(filename,"r")
except:
#Do nothing but return -1
return -1
contents = file.readlines()
file.close()
return contents
# Main program
filename = "Seq.txt"
sequences = ReadFileContents(filename)
print sequences
If ReadFileContents is not able to read the file, it returns -1. Since the main pro-
gram only prints the sequences, in case the file could not be read, -1 is printed
(Test this, e.g. rename Seq.txt to Seq2.txt ). In real-life programs exceptions
can be used to terminate the program and inform the user of the error. Note
that in the except -block we have a comment. In Python, programmers can (and
should) type comments using the hash character (#). Everything after the hash
character is considered as a comment and will not be executed. Also, note that
a function needs to be declared before you use it. Declaration begins with def
and ends when indentation gets back one to main level. Since the function above
did not use the ’strip’ method, the last character of each sequence is the newline
character (check this).
If a function returns something to the main program, this can be done with
the command return which is followed by the returned variables that are sep-
19
arated with commas. It is not necessary, however, for a function to return
anything.
# Main program
x = "ATTAC"
z = "ACATTAG"
testfun("GAGA")
print z
print y # This will cause an error
20
This is y’s value: GAGA
This is z’s value: GATTACA
ACATTAG
Traceback (most recent call last):
File "scopetest.py", line 20, in ?
print y
NameError: name ’y’ is not defined
athena%
So, the command print x in the function did not cause errors since x is defined
in the main program. However, trying to print y in the main program caused
an error because y is not defined. In other words, the scope of x is global (it
can be seen in the main program and in the functions) but y is local (it is ac-
cessible only in the function testfun). Note that in the function the value for z
is GATTACA, while in the main program it is ACATTAG.
After the global namespace comes built-in namespace and it contains several
Python functions (e.g. range, int ). The built-in namespace is created every
time when the interpreter starts. Normally programs do not modify the built-in
namespace.
2.8 Biopython
Biopython is a freely available project consisting of tools for computational
molecular biology [Biopython Homepage]. Detailed discussion on how to use
Biopython and what is included in it is not covered in this tutorial. However,
you can install Biopython to Athena as follows:
2.9 Help!?
Interactive Python has interactive help (help). In order to use help, one needs
to know the module or data type for which help is needed. For example, in order
to find out the methods for strings, type help(str), on lists type help(list),
on dictionaries type help(dict), and so forth. In order to get slightly more
detailed help on a specific method in a module, one needs to type the module
first and then method. For example, readline is a method for the object file,
and in order to get help on the command readline, type help(file.readline).
When getting help on the modules, remember to import the module first. For
example, first import string and then help(string). More thorough help is
available at Python webpages [A Python Tutorial].
21
2.10 Summary
• Before starting to code, make sure you understand the assignment.
• Use Pico (or Emacs) to write the code.
• Python is case-sensitive; “Print” is different from “print”.
• Use functions as much as possible (so you can reuse them later. . . )
• Declare functions before you use them.
22
3 Python exercises
This section contains some introductory Python exercises.
23
References
[A Python Tutorial] http://www.python.org/doc/current/tut/tut.html
[Python book] H.M Deitel, P.J. Deitel, J.P. Liperi, B.A Wiedermann Python
How to Program. Prentice Hall, Upper Saddle River, NJ, 2002.
[Emacs Tutorial] http://www.lib.uchicago.edu/keith/tcl-course/emacs-
tutorial.html
[Fiamingo et al., 1998] Frank G. Fiamingo, Linda DeBula and Linda Condron.
Introduction to Unix. University Technology Services, The Ohio State Uni-
versity, 1998. http://8help.osu.edu/wks/unix_course/intro-1.html
[Many Python Tutorials] http://www.python.org/doc/Intros.html
24