Introduction To Python Programming
email_address = "clin"
if "@" not in email_address:
    email_address += "@brandeis.edu"
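The same check can be wrapped in a function so it works for any address. This is a minimal sketch: `add_domain` and its `domain` parameter are hypothetical names used only for illustration.

```python
def add_domain(email_address, domain="@brandeis.edu"):
    """Append a default domain if the address has no '@' (hypothetical helper)."""
    if "@" not in email_address:
        email_address += domain
    return email_address

print(add_domain("clin"))              # clin@brandeis.edu
print(add_domain("clin@example.org"))  # clin@example.org (left unchanged)
```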
String methods: "strip", "rstrip", and "lstrip" remove whitespace (or selected
characters) from both ends, the right end, or the left end of a string
>>> line = " # This is a comment line \n"
>>> line.strip()
'# This is a comment line'
>>> line.rstrip()
' # This is a comment line'
>>> line.rstrip("\n")
' # This is a comment line '
>>>
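The session above does not show `lstrip`, or what happens when `strip` gets an argument. A short sketch: the argument is treated as a set of characters to remove from the ends, not as a prefix or suffix.

```python
line = " # This is a comment line \n"

# lstrip() removes whitespace from the left end only
print(repr(line.lstrip()))   # '# This is a comment line \n'

# With an argument, any of the listed characters are stripped
# from both ends, in any order
print(repr("xxhixyx".strip("xy")))  # 'hi'
```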
More String methods
email.startswith("c"), email.endswith("u")
-> return True/False
>>> "chen".upper()
'CHEN'
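A runnable version of these calls, assuming `email` holds the address built earlier. Note that `startswith`/`endswith` also accept a tuple of alternatives.

```python
email = "clin@brandeis.edu"  # assumed example value

print(email.startswith("c"))             # True
print(email.endswith("u"))               # True
print(email.endswith(".com"))            # False
print(email.endswith((".edu", ".org")))  # True: any suffix in the tuple matches
print("chen".upper(), "CHEN".lower())    # CHEN chen
```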
Unexpected things about strings: strings are read-only (immutable)
>>> s = "andrew"
>>> s[0] = "A"
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
>>> s = "A" + s[1:]
>>> s
'Andrew'
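Because strings cannot be changed in place, string methods return new strings. A small sketch of two common ways to get a modified copy:

```python
s = "andrew"
t = s.capitalize()     # returns a new string; s itself is unchanged
print(s, t)            # andrew Andrew

# Alternatively, edit a list of characters (lists are mutable) and join it back
chars = list(s)
chars[0] = "A"
print("".join(chars))  # Andrew
```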
"\" starts an escape sequence for special characters
\n -> newline
\t -> tab
\\ -> backslash
...
But Windows uses backslash for directories!
filename = "M:\nickel_project\reactive.smi" # DANGER! "\n" and "\r" are escape sequences
filename = "M:\\nickel_project\\reactive.smi" # Better!
filename = "M:/nickel_project/reactive.smi" # Usually works
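To see why the first form is dangerous, and what the alternatives do, here is a short check. The raw-string form (`r"..."`) and `pathlib.PureWindowsPath` are additional options not mentioned above.

```python
from pathlib import PureWindowsPath

danger = "M:\nickel_project\reactive.smi"
# "\n" became a newline and "\r" a carriage return
print("\n" in danger, "\r" in danger)  # True True

# A raw string keeps every backslash literally
raw = r"M:\nickel_project\reactive.smi"
print(raw == "M:\\nickel_project\\reactive.smi")  # True

# pathlib accepts forward slashes for Windows-style paths
p = PureWindowsPath("M:/nickel_project/reactive.smi")
print(p.name, p.suffix)  # reactive.smi .smi
```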
Lists are mutable - some useful methods
>>> ids = ["9pti", "2plv", "1crn"]
append an element
>>> ids.append("1alm")
>>> ids
['9pti', '2plv', '1crn', '1alm']
ids.extend(L): extend the list by appending all the items in the given list L;
equivalent to ids[len(ids):] = L.
remove an element
>>> del ids[0]
>>> ids
['2plv', '1crn', '1alm']
sort by default order
>>> ids.sort()
>>> ids
['1alm', '1crn', '2plv']
reverse the elements in a list
>>> ids.reverse()
>>> ids
['2plv', '1crn', '1alm']
insert an element at some specified position (slower than .append())
>>> ids.insert(0, "9pti")
>>> ids
['9pti', '2plv', '1crn', '1alm']
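A sketch of `extend` plus two other removal methods, `pop` and `remove`, and the non-mutating `sorted()`. The IDs "4hhb" and "1tim" are illustrative values added for the example.

```python
ids = ["9pti", "2plv", "1crn", "1alm"]

ids.extend(["4hhb", "1tim"])  # append every item of another list
print(ids)   # ['9pti', '2plv', '1crn', '1alm', '4hhb', '1tim']

last = ids.pop()    # remove and return the last item
ids.remove("2plv")  # remove the first item equal to "2plv"
print(last, ids)    # 1tim ['9pti', '1crn', '1alm', '4hhb']

# sorted() returns a new list and leaves ids unchanged
print(sorted(ids))  # ['1alm', '1crn', '4hhb', '9pti']
```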
Tuples: sort of an immutable list
>>> yellow = (255, 255, 0) # r, g, b
>>> one = (1,)  # a one-element tuple needs the trailing comma
>>> yellow[0]
255
>>> yellow[1:]
(255, 0)
>>> yellow[0] = 0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
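Two tuple details worth seeing in code: unpacking into separate variables, and why `(1,)` needs its comma. Since tuples cannot be modified, a "changed" tuple is really a new one built by concatenation.

```python
yellow = (255, 255, 0)  # r, g, b
r, g, b = yellow        # tuple unpacking
print(r, g, b)          # 255 255 0

not_a_tuple = (1)       # parentheses alone only group: this is the int 1
one = (1,)              # the trailing comma makes a one-element tuple
print(type(not_a_tuple).__name__, type(one).__name__)  # int tuple

# Build a new tuple instead of assigning to yellow[0]
green = (0,) + yellow[1:]
print(green)            # (0, 255, 0)
```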
>>> list(symbol_to_name.values())  # in Python 3, .values() returns a view; list() makes a list
['carbon', 'hydrogen', 'oxygen', 'nitrogen', 'lithium', 'helium']
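`symbol_to_name` is not defined in these notes; the sketch below assumes it is a dictionary mapping element symbols to names, with contents chosen to match the values shown. It illustrates key lookup, `.get()` with a default, and that dictionaries are mutable.

```python
# Assumed contents; the original definition is not shown here
symbol_to_name = {"C": "carbon", "H": "hydrogen", "O": "oxygen",
                  "N": "nitrogen", "Li": "lithium", "He": "helium"}

print(symbol_to_name["C"])            # carbon: look up a value by its key
print(symbol_to_name.get("Xx", "?"))  # ?: .get() returns a default for a missing key
symbol_to_name["Na"] = "sodium"       # dictionaries are mutable

# Python 3.7+ preserves insertion order
print(list(symbol_to_name.values()))
```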
class MyClass(object):
    """A simple example class"""
    i = 12345

    def f(self):
        return self.i
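A usage sketch for `MyClass`: creating an instance, calling a method (Python passes the instance as `self`), and the difference between a class attribute and an instance attribute.

```python
class MyClass(object):
    """A simple example class"""
    i = 12345             # class attribute, shared by all instances

    def f(self):
        return self.i

x = MyClass()             # create an instance
print(x.f())              # 12345: x.f() calls MyClass.f(x)
x.i = 42                  # instance attribute shadows the class attribute
print(x.f(), MyClass.i)   # 42 12345
```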
class DerivedClassName(BaseClassName):
    <statement-1>
    ...
    <statement-N>
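A concrete instance of the template above. `Molecule` and `Protein` are hypothetical classes made up for illustration; the derived class extends the base initializer with `super()` and overrides a method while reusing the base version.

```python
class Molecule:                      # hypothetical base class
    def __init__(self, name):
        self.name = name

    def describe(self):
        return "molecule " + self.name

class Protein(Molecule):             # hypothetical derived class
    def __init__(self, name, pdb_id):
        super().__init__(name)       # run the base-class initializer
        self.pdb_id = pdb_id

    def describe(self):              # override, building on the base version
        return super().describe() + " (PDB " + self.pdb_id + ")"

p = Protein("crambin", "1crn")
print(p.describe())              # molecule crambin (PDB 1crn)
print(isinstance(p, Molecule))   # True
```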
Background
Data Types/Structure
Control flow
File I/O
Modules
Class
NLTK
http://www.nltk.org/book
NLTK is installed on the berry patch machines!
>>> from nltk.book import *
>>> text1
<Text: Moby Dick by Herman Melville 1851>
>>> text1.name
'Moby Dick by Herman Melville 1851'
>>> text1.concordance("monstrous")
>>> dir(text1)
>>> text1.tokens
>>> text1.index("my")
4647
>>> sent2
['The', 'family', 'of', 'Dashwood', 'had', 'long', 'been', 'settled', 'in',
'Sussex', '.']
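To make `concordance` less of a black box, here is a simplified pure-Python version working on a token list such as `sent2`. It is a sketch, not NLTK's implementation: `simple_concordance` and its `width` parameter (context measured in tokens) are hypothetical, and NLTK formats its output differently. The last line shows that `index` behaves like `list.index`, returning the first position of a token.

```python
def simple_concordance(tokens, word, width=3):
    """Return each occurrence of word with up to `width` tokens of context
    on either side, matching case-insensitively (hypothetical helper)."""
    target = word.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append(" ".join(left + [tok] + right))
    return lines

sent2 = ['The', 'family', 'of', 'Dashwood', 'had', 'long', 'been',
         'settled', 'in', 'Sussex', '.']
for line in simple_concordance(sent2, "dashwood", width=2):
    print(line)                # family of Dashwood had long
print(sent2.index("Sussex"))   # 9
```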
Classify Text
>>> def gender_features(word):
...     return {'last_letter': word[-1]}
...
>>> gender_features('Shrek')
{'last_letter': 'k'}
>>> from nltk.corpus import names
>>> import random
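>>> labeled_names = ([(name, 'male') for name in names.words('male.txt')] +
...                  [(name, 'female') for name in names.words('female.txt')])
>>> random.shuffle(labeled_names)
Featurize, train, test, predict
>>> import nltk
>>> featuresets = [(gender_features(n), g) for (n, g) in labeled_names]
>>> train_set, test_set = featuresets[500:], featuresets[:500]
>>> classifier = nltk.NaiveBayesClassifier.train(train_set)
>>> classifier.classify(gender_features('Neo'))
'male'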
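The featurize/train/test/predict steps can be mimicked without NLTK to show the data flow. This sketch deliberately swaps in a much simpler model than Naive Bayes: it predicts the label seen most often for a name's last letter. The helper names, the tiny training list, and the `'female'` fallback for unseen letters are all hypothetical choices for illustration.

```python
from collections import Counter, defaultdict

def gender_features(word):
    return {'last_letter': word[-1]}

def train_last_letter(featuresets):
    """Count labels per last letter (toy stand-in, not Naive Bayes)."""
    counts = defaultdict(Counter)
    for features, label in featuresets:
        counts[features['last_letter']][label] += 1
    return counts

def classify(counts, features, default='female'):
    letter_counts = counts.get(features['last_letter'])
    if not letter_counts:
        return default   # unseen letter: arbitrary fallback
    return letter_counts.most_common(1)[0][0]

def accuracy(counts, test_set):
    correct = sum(classify(counts, f) == g for f, g in test_set)
    return correct / len(test_set)

# Tiny hypothetical training data
labeled = [('John', 'male'), ('Mark', 'male'), ('Frank', 'male'),
           ('Anna', 'female'), ('Maria', 'female'), ('Emma', 'female')]
train_set = [(gender_features(n), g) for (n, g) in labeled]
model = train_last_letter(train_set)

test_set = [(gender_features('Derek'), 'male'),
            (gender_features('Julia'), 'female'),
            (gender_features('Ron'), 'male')]
print(classify(model, gender_features('Derek')))  # male
print(accuracy(model, test_set))                  # 1.0
```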
Reuters Corpus: 10,788 news documents
1.3 million words
Classified into 90 topics
Grouped into 2 sets, "training" and "test"
Categories overlap with each other (a document can belong to several topics)
http://nltk.googlecode.com/svn/trunk/doc/book/ch02.html
Reuters
>>> from nltk.corpus import reuters
>>> reuters.fileids()
['test/14826', 'test/14828', 'test/14829', 'test/14832', ...]
>>> reuters.categories()
['acq', 'alum', 'barley', 'bop', 'carcass', 'castor-oil', 'cocoa', 'coconut',
'coconut-oil', 'coffee', 'copper', 'copra-cake', 'corn', 'cotton',
'cotton-oil', 'cpi', 'cpu', 'crude', 'dfl', 'dlr', ...]
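With the real corpus, NLTK's categorized reader answers questions like these via `reuters.categories(fileid)` and `reuters.fileids(category)`. The sketch below reproduces the idea in plain Python on a small hypothetical mapping in the Reuters file-id style, to show what "categories overlap" and the training/test split look like as data.

```python
from collections import defaultdict

# Hypothetical file id -> categories mapping (not real corpus data)
doc_categories = {
    "training/1": ["grain", "wheat"],
    "training/2": ["crude"],
    "test/3": ["grain", "corn", "wheat"],
    "test/4": ["acq"],
}

# Invert to category -> file ids
category_to_docs = defaultdict(list)
for fileid, cats in doc_categories.items():
    for cat in cats:
        category_to_docs[cat].append(fileid)
print(sorted(category_to_docs["wheat"]))  # ['test/3', 'training/1']

# Overlap: documents filed under more than one category
multi = [f for f, cats in doc_categories.items() if len(cats) > 1]
print(multi)                              # ['training/1', 'test/3']

# The split is encoded in the file-id prefix
train = [f for f in doc_categories if f.startswith("training/")]
test = [f for f in doc_categories if f.startswith("test/")]
print(len(train), len(test))              # 2 2
```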