Introduction To Python: Chen Lin
Chen Lin
clin@brandeis.edu
COSI 134a
Volen 110
Office Hour: Thurs. 3-5
Python Videos
http://showmedo.com/videotutorials/python
5 Minute Overview (What Does Python Look Like?)
Introducing the PyDev IDE for Eclipse
Linear Algebra with Numpy
And many more
Jython
Development Environments
What IDE to use? http://stackoverflow.com/questions/81584
Background
Data Types/Structure
Control flow
File I/O
Modules
Class
NLTK
List
A compound data type:
[0]
[2.3, 4.5]
[5, "Hello", "there", 9.8]
[]
Use len() to get the length of a list
>>> names = ["Ben", "Chen", "Yaqin"]
>>> len(names)
3
remove an element
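The slide gives only the heading here. As a minimal sketch, reusing the names list from the len() example and written to run under Python 3, these are three standard ways to remove an element from a list:

```python
# Example list, reusing the names from the len() slide.
names = ["Ben", "Chen", "Yaqin"]

names.remove("Chen")   # remove the first element equal to "Chen"
print(names)           # ['Ben', 'Yaqin']

last = names.pop()     # remove and return the last element
print(last)            # Yaqin

del names[0]           # remove the element at index 0
print(len(names))      # 0
```

remove() works by value, while pop() and del work by position.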
Dictionaries
Duplicate
Dictionary
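Only the slide titles survive for this part. As a rough sketch of what a dictionary does, assuming a made-up word list (not from the slides), the following maps each word to its count and then picks out the duplicated words:

```python
# Made-up sample data, used only for illustration.
words = ["spam", "egg", "spam", "ham", "spam"]

counts = {}                                   # empty dictionary
for word in words:
    # get() returns 0 when the key is not in the dictionary yet
    counts[word] = counts.get(word, 0) + 1

print(counts["spam"])                         # 3

# Words that occur more than once
duplicates = sorted(w for w in counts if counts[w] > 1)
print(duplicates)                             # ['spam']
```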
Control Flow
Things that are False
The boolean value False
The numbers 0 (integer), 0.0 (float) and 0j (complex).
The empty string "".
The empty list [], empty dictionary {} and empty set set().
Things that are True
The boolean value True
All non-zero numbers.
Any string containing at least one character.
A non-empty data structure.
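The two lists above can be checked directly with bool(). A small sketch, with sample values chosen only for illustration and written to run under Python 3:

```python
# Values the slides list as false
falsy = [False, 0, 0.0, 0j, "", [], {}, set()]
# Sample values of the kinds the slides list as true
truthy = [True, 1, -2.5, "a", [0], {"key": None}]

for value in falsy + truthy:
    # repr() shows the value as it would be typed in source code
    print("%r -> %s" % (value, bool(value)))
```

Note that [0] is true: the list is non-empty, even though its only element is false.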
If
>>> smiles = "BrC1=CC=C(C=C1)NN.Cl"
>>> bool(smiles)
True
>>> not bool(smiles)
False
>>> if not smiles:
...     print "The SMILES string is empty"
...
The else case is always optional
Boolean logic
Python expressions can combine conditions with and and or:
if (ben <= 5 and chen >= 10 or
    chen == 500 and ben != 5):
    print "Ben and Chen"
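In Python, and binds more tightly than or, so the condition above is read as (A and B) or (C and D). A short sketch, with hypothetical values for ben and chen and written to run under Python 3:

```python
# Hypothetical values, chosen only to exercise the condition.
ben, chen = 7, 500

implicit = ben <= 5 and chen >= 10 or chen == 500 and ben != 5
explicit = (ben <= 5 and chen >= 10) or (chen == 500 and ben != 5)
# A different grouping gives a different answer:
regrouped = ben <= 5 and (chen >= 10 or chen == 500) and ben != 5

print(implicit)    # True
print(explicit)    # True
print(regrouped)   # False
```

Adding the parentheses explicitly costs nothing and makes the intent easier to read.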
Range Test
if (3 <= Time <= 5):
    print "Office Hour"
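The chained comparison means the same as two comparisons joined with and. A minimal sketch, where is_office_hour is a hypothetical helper name and the code is written to run under Python 3:

```python
def is_office_hour(time):
    """Return True when time lies between 3 and 5, inclusive."""
    # Equivalent to: 3 <= time and time <= 5
    return 3 <= time <= 5

print(is_office_hour(4))   # True
print(is_office_hour(5))   # True
print(is_office_hour(6))   # False
```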
For
>>> names = ["Ben", "Chen", "Yaqin"]
>>> for name in names:
...     print name
...
Ben
Chen
Yaqin
Break, continue
Use break to stop the for loop
Use continue to stop processing the current item
>>> for value in [3, 1, 4, 1, 5, 9, 2]:
...     print "Checking", value
...     if value > 8:
...         print "Exiting for loop"
...         break
...     elif value < 3:
...         print "Ignoring"
...         continue
...     print "The square is", value**2
...
Checking 3
The square is 9
Checking 1
Ignoring
Checking 4
The square is 16
Checking 1
Ignoring
Checking 5
The square is 25
Checking 9
Exiting for loop
>>>
Range()
>>> range(5)
[0, 1, 2, 3, 4]
>>> range(5, 10)
[5, 6, 7, 8, 9]
>>> range(0, 10, 2)
[0, 2, 4, 6, 8]
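The outputs above come from Python 2, where range() returns a list. In Python 3, range() returns a lazy sequence object instead, so a sketch of the same three calls, assuming Python 3, wraps each one in list():

```python
# range(stop), range(start, stop), range(start, stop, step)
first = list(range(5))
second = list(range(5, 10))
third = list(range(0, 10, 2))

print(first)    # [0, 1, 2, 3, 4]
print(second)   # [5, 6, 7, 8, 9]
print(third)    # [0, 2, 4, 6, 8]
```

In both versions the stop value itself is never included.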
File I/O
Reading files
>>> f = open("names.txt")
>>> f.readline()
'Yaqin\n'
Quick Way
>>> lst = [x for x in open("text.txt", "r").readlines()]
>>> lst
['Chen Lin\n', 'clin@brandeis.edu\n', 'Volen 110\n',
'Office Hour: Thurs. 3-5\n', '\n', 'Yaqin Yang\n',
'yaqin@brandeis.edu\n', 'Volen 110\n',
'Office Hour: Tues. 3-5\n']
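The quick way above never closes the file and keeps the trailing "\n" on every line. A possible alternative, sketched for Python 3 with a throwaway file created on the spot (the path and its contents are made up so that the example is self-contained), uses a with block:

```python
import os
import tempfile

# Create a small scratch file so the example can run anywhere.
path = os.path.join(tempfile.mkdtemp(), "names.txt")
with open(path, "w") as f:
    f.write("Yaqin\nChen\nBen\n")

# The with block closes the file automatically, even on errors.
with open(path) as f:
    lines = [line.rstrip("\n") for line in f]

print(lines)   # ['Yaqin', 'Chen', 'Ben']
```

Iterating over the file object reads one line at a time, so readlines() is not needed.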
File Output
input_file = open("in.txt")
output_file = open("out.txt", "w")
for line in input_file:
    output_file.write(line)

w = write mode
a = append mode
wb = write in binary
r = read mode (default)
rb = read in binary
U = read files with Unix or Windows line endings
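The difference between write mode and append mode is easiest to see by running both against the same file. A minimal sketch for Python 3, using a scratch file whose path is made up for the example:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "out.txt")  # hypothetical scratch file

with open(path, "w") as out:      # "w" creates or truncates the file
    out.write("first\n")
with open(path, "a") as out:      # "a" adds to the end of the file
    out.write("second\n")
with open(path) as f:             # default mode is "r"
    after_append = f.read()

with open(path, "w") as out:      # "w" again discards the old content
    out.write("third\n")
with open(path) as f:
    after_rewrite = f.read()

print(repr(after_append))    # 'first\nsecond\n'
print(repr(after_rewrite))   # 'third\n'
```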
Modules
When
Classes
class ClassName(object):
    <statement-1>
    ...
    <statement-N>

class MyClass(object):
    """A simple example class"""
    i = 12345
    def f(self):
        return self.i

class DerivedClassName(BaseClassName):
    <statement-1>
    ...
    <statement-N>
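To show the templates above in use, here is a small sketch that instantiates MyClass and derives from it. MyDerived and its doubled return value are hypothetical, added only to illustrate overriding a method; the code is written to run under Python 3.

```python
class MyClass(object):
    """A simple example class"""
    i = 12345

    def f(self):
        return self.i


class MyDerived(MyClass):
    """Hypothetical subclass that overrides f()."""

    def f(self):
        # Call the base-class method, then change its result.
        return MyClass.f(self) * 2


obj = MyClass()
print(obj.f())           # 12345
print(MyDerived().f())   # 24690
```

MyDerived inherits the attribute i from MyClass and replaces only the method f.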
NLTK
http://www.nltk.org/book
NLTK is on berry patch machines!
>>> from nltk.book import *
>>> text1
<Text: Moby Dick by Herman Melville 1851>
>>> text1.name
'Moby Dick by Herman Melville 1851'
>>> text1.concordance("monstrous")
>>> dir(text1)
>>> text1.tokens
>>> text1.index("my")
4647
>>> sent2
['The', 'family', 'of', 'Dashwood', 'had', 'long', 'been', 'settled', 'in',
'Sussex', '.']
Classify Text
>>> def gender_features(word):
...     return {'last_letter': word[-1]}
...
>>> gender_features('Shrek')
{'last_letter': 'k'}
>>> from nltk.corpus import names
>>> import random
>>> names = ([(name, 'male') for name in names.words('male.txt')] +
... [(name, 'female') for name in names.words('female.txt')])
>>> random.shuffle(names)
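A typical next step is to turn each (name, label) pair into a (features, label) pair and hold some of the pairs back for testing. The sketch below does this for Python 3 without NLTK, using a tiny made-up list in place of the names corpus; the list, the seed, and the split size are all assumptions for illustration.

```python
import random

def gender_features(word):
    return {'last_letter': word[-1]}

# Made-up stand-in for the (name, label) pairs built from the corpus.
labeled_names = [("Anna", "female"), ("John", "male"), ("Maria", "female"),
                 ("Peter", "male"), ("Julia", "female"), ("Mark", "male")]
random.Random(0).shuffle(labeled_names)   # fixed seed, repeatable order

# Pair each feature dictionary with its label.
featuresets = [(gender_features(name), label)
               for (name, label) in labeled_names]

# Hold out the first two pairs for testing, train on the rest.
test_set, train_set = featuresets[:2], featuresets[2:]
print(len(train_set))   # 4
print(len(test_set))    # 2
```

With NLTK installed, a training set of this shape is what nltk.NaiveBayesClassifier.train() takes as input.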
Reuters
Corpus: 10,788 news documents, 1.3 million words in total.
Classified into 90 topics.
Grouped into 2 sets, "training" and "test".
Categories overlap with each other.
http://nltk.googlecode.com/svn/trunk/doc/book/ch02.html
>>> from nltk.corpus import reuters
>>> reuters.fileids()
['test/14826', 'test/14828', 'test/14829', 'test/14832', ...]
>>> reuters.categories()
['acq', 'alum', 'barley', 'bop', 'carcass', 'castor-oil', 'cocoa', 'coconut',
'coconut-oil', 'coffee', 'copper', 'copra-cake', 'corn', 'cotton',
'cotton-oil', 'cpi', 'cpu', 'crude', 'dfl', 'dlr', ...]