Python For Data Cheatsheet
1 Installation - Conda
• Install Anaconda (for Python 3) from anaconda.org
• Install libraries:
conda install notebook pandas seaborn xlrd openpyxl scipy scikit-learn
• Launch Jupyter:
jupyter notebook
2 Installation - Python.org
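• Install Python 3
• Create a virtual environment named env:
python3 -m venv env
• Activate the virtual environment:
– Windows:
env\Scripts\activate
– Unix (Mac/Linux):
source env/bin/activate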
• Install libraries:
pip install notebook pandas seaborn xlrd openpyxl scipy scikit-learn
• Launch Jupyter:
jupyter notebook
3 Jupyter
Command Mode:
• a - Insert cell above
• b - Insert cell below
• Ctrl-Enter - Run cell
• i, i - Interrupt kernel (press i twice)
Edit Mode:
• TAB - Completion
• Ctrl-Enter - Run cell
Hints:
%matplotlib inline
4 Variables
• You don’t need to declare variables. Just write the name followed by = and a value
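A minimal sketch of assignment. The names answer, label, and status are illustrative only and do not come from the sheet:

```python
# Assignment creates the name; no declaration or type annotation is needed.
answer = 41 + 1
label = 'answer'

# A name can later be rebound to an object of a different type.
status = 42
status = 'forty-two'

print(label, answer, status)  # answer 42 forty-two
```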
Copyright 2020 2
5 Math
Use Python as a calculator
Operation    Result    Description
41 + 1       42        Adds numbers
41 - 1       40        Subtracts numbers
41 * 1       41        Multiplies numbers
41 / 1       41.0      Divides numbers (always returns a float)
5 % 2        1         Modulus operation (remainder)
8 ** 2       64        Power operation
6 Getting Help
Operation        Result
dir(5)           Lists attributes of integer
help(range)      Shows documentation for range built-in
range?           Shows documentation for range built-in (can also hit Shift-TAB four times). Jupyter only
pd.read_csv??    Shows source code for read_csv function in pandas. Jupyter only
help()           Goes into help mode. Hit ENTER to exit.
7 Strings
Operation                         Result
name = 'paul'                     Create a variable with a string (can also use double quotes)
dir(name)                         List attributes of string
name.upper?                       Shows documentation for .upper method on string. Jupyter only
name.upper()                      'PAUL' - Uppercase string
name.find('au')                   1 - Index location of au
name[0]                           'p' - Character at position 0
name[-1]                          'l' - Last character in string
greeting = '\N{GRINNING FACE}'    Create string with the GRINNING FACE Unicode glyph, using its character name
greeting = '\U0001f600'           Create string with the GRINNING FACE Unicode glyph (1f600 is hex for GRINNING FACE)
Triple-quoted strings can span multiple lines. Python 3.6 introduced f-strings:
minutes = 36
paragraph = f"""Greetings {name.title()},
Thank you for attending tonight.
We will be here for {minutes/60:.2f} hours
Long-winded talk.
Goodbye {name}!"""
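A short sketch of how the replacement fields are evaluated. The variable names short and note are made up for this illustration:

```python
name = 'paul'
minutes = 36

# Expressions inside {} are evaluated; the text after ':' is a format spec.
short = f'{name.title()} stayed {minutes / 60:.2f} hours'
print(short)  # Paul stayed 0.60 hours

# Triple quotes keep the embedded newline.
note = f"""Hi {name.title()},
Bye {name}!"""
print(note.splitlines())  # ['Hi Paul,', 'Bye paul!']
```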
8 Files
Writing to a UTF-8 file:
with open('names.csv', mode='w', encoding='utf8') as fout:
    fout.write('name,age\n')
    fout.write('jeff,30\n')
    fout.write('linda,29\n')
# file is automatically closed when we dedent
Mode Meaning
'r' Read text file (default)
'w' Write text file (truncates if exists)
'x' Write text file, throw FileExistsError if exists.
'a' Append to text file (write to end)
'rb' Read binary file
'wb' Write binary (truncate)
'w+b' Open binary file for reading and writing
'xb' Write binary file, throw FileExistsError if exists.
'ab' Append to binary file (write to end)
Table 4: File Modes
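A sketch of a few text modes from the table in action. It writes into a temporary directory (an assumption made here so that no existing names.csv is overwritten); the names workdir, path, and exists_error are illustrative:

```python
import os
import tempfile

# Work in a throwaway directory so no existing file is touched.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'names.csv')

with open(path, mode='w', encoding='utf8') as fout:  # 'w' truncates
    fout.write('name,age\n')
    fout.write('jeff,30\n')

with open(path, mode='a', encoding='utf8') as fout:  # 'a' appends to the end
    fout.write('linda,29\n')

with open(path, encoding='utf8') as fin:             # 'r' is the default mode
    lines = [line.rstrip('\n') for line in fin]
print(lines)  # ['name,age', 'jeff,30', 'linda,29']

exists_error = False
try:
    open(path, mode='x', encoding='utf8')            # 'x' refuses to overwrite
except FileExistsError:
    exists_error = True
print(exists_error)  # True
```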
9 Lists
Lists are ordered mutable sequences. They can be created with the list literal syntax:
>>> people = ['Paul', 'John', 'George']
>>> people.append('Ringo')
Lists can also be created by calling the constructor with an optional sequence:
>>> people = list(('Paul', 'John', 'George', 'Ringo'))
>>> people
['Paul', 'John', 'George', 'Ringo']
If we need the index number during iteration, the enumerate function yields (index, item) tuples.
The optional second argument sets the starting index (1 here):
>>> for i, name in enumerate(people, 1):
...     print('{} - {}'.format(i, name))
1 - Paul
2 - John
3 - George
4 - Ringo
>>> people[0]
'Paul'
>>> people[-1] # len(people) - 1
'Ringo'
Operation Result
l.append(item) Append item to end
l.clear() Empty list (mutates l)
l.copy() Shallow copy
l.count(thing) Number of occurrences of thing
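A brief sketch of the methods listed above, with emphasis on what "shallow copy" means. The nested and shallow lists are made up for this illustration:

```python
l = ['Paul', 'John', 'George']
l.append('Ringo')
print(l.count('Paul'))  # 1

nested = [[1, 2], [3]]
shallow = nested.copy()   # new outer list, same inner list objects
shallow.append([4])       # only the copy grows
shallow[0].append(99)     # inner list is shared, so nested sees this
print(nested)             # [[1, 2, 99], [3]]

l.clear()                 # empties the list in place
print(l)                  # []
```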
10 Slicing
Python tends to use the half-open interval (Pandas .loc slicing is an exception). This means that a
slice includes the start index but not the end index, so its length equals the end index minus the
start index.
Operation                                               Result
names = ['lennon', 'mccartney', 'harrison', 'starr']    Create a list
names[0]                                                'lennon' - First item
names[-1]                                               'starr' - Last item
names[0:3]                                              ['lennon', 'mccartney', 'harrison'] - First three items
names[:3]                                               ['lennon', 'mccartney', 'harrison'] - First three items
names[2:]                                               ['harrison', 'starr'] - From position three (index location 2) to end
names[-3:]                                              ['mccartney', 'harrison', 'starr'] - Third from last to end
names2 = names[:]                                       Makes a shallow copy of names
names[::-1]                                             ['starr', 'harrison', 'mccartney', 'lennon'] - Copy in reverse order (stride -1)
list(range(10))[::3]                                    [0, 3, 6, 9] - Every third item
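A small sketch of the half-open rule described at the top of this section. The variables start, end, and chunk are illustrative:

```python
names = ['lennon', 'mccartney', 'harrison', 'starr']

start, end = 1, 3
chunk = names[start:end]          # includes index 1 and 2, excludes 3
print(chunk)                      # ['mccartney', 'harrison']
print(len(chunk) == end - start)  # True

# Adjacent half-open slices join back together without overlap.
print(names[:2] + names[2:] == names)  # True

# A full slice is a copy: changing it leaves the original alone.
names2 = names[:]
names2[0] = 'best'
print(names[0])  # lennon
```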
10.1 Dictionaries
Dictionaries are mutable mappings of keys to values. Keys must be hashable, but values can be any
object. Here is a dictionary literal:
>>> instruments = {'Paul': 'Bass',
...                'John': 'Guitar'}
Dictionaries can also be made by calling the constructor with an optional mapping, an iterable, or
using keyword arguments. The iterable must be a sequence of (key, value) pairs:
>>> instruments = dict([('Paul', 'Bass'),
...                     ('John', 'Guitar')])
Operation                       Result
d.clear()                       Remove all items (mutates d)
d.copy()                        Shallow copy
d.fromkeys(iter, value=None)    Create dict from iterable with values set to value
d.get(key, [default])           Get value for key or return default (None)
d.items()                       View of (key, value) pairs
d.keys()                        View of keys
d.pop(key, [default])           Remove key and return its value, or return default (KeyError if key is missing and no default is given)
d.popitem()                     Remove and return a (key, value) tuple (the most recently inserted pair since Python 3.7). KeyError if empty
d.setdefault(k, [default])      Does d.get(k, default). If k missing, sets to default
d.update(d2)                    Mutate d with values of d2 (dictionary or iterable of (key, value) pairs)
d.values()                      View of values
Table 9: Dictionary Methods
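A sketch exercising several of the methods in Table 9. The extra keys ('Ringo', 'George', 'Pete') and the name by_member are made up for this illustration:

```python
instruments = {'Paul': 'Bass', 'John': 'Guitar'}

# get never raises; it falls back to None or to the given default.
print(instruments.get('Ringo'))           # None
print(instruments.get('Ringo', 'Drums'))  # Drums
print('Ringo' in instruments)             # False (get did not insert it)

# setdefault inserts the default when the key is missing.
instruments.setdefault('George', 'Guitar')
instruments.update({'Ringo': 'Drums'})
print(sorted(instruments))  # ['George', 'John', 'Paul', 'Ringo']

# pop removes the key; a default avoids KeyError for missing keys.
removed = instruments.pop('Paul')
missing = instruments.pop('Pete', None)
print(removed, missing)     # Bass None

by_member = dict.fromkeys(['Paul', 'John'], 0)
print(by_member)            # {'Paul': 0, 'John': 0}
```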
11 Looping
You can loop over objects in a sequence:
>>> names = ['John', 'Paul', 'Ringo']
>>> for name in names:
...     print(name)
John
Paul
Ringo
The continue statement skips the rest of the loop body and continues with the next item of iteration:
>>> for name in names:
...     if name == 'Paul':
...         continue
...     print(name)
John
Ringo
12 Comprehensions
Comprehension constructs allow us to combine the functional ideas behind map and filter into an
easy-to-read, single line of code. When you see code that is aggregating into a list (or dict, set, or
generator), you can replace it with a list comprehension (or dict, set comprehension, or generator
expression). Here is an example of the code smell:
>>> nums = range(10)
>>> result = []
>>> for num in nums:
...     if num % 2 == 0:  # filter
...         result.append(num*num)  # map
This loop can be rewritten as a list comprehension in four steps:
• Assign the result (result) to brackets. The brackets signal to the reader of the code that a list
will be returned:
result = [ ]
• Place the for loop construct inside the brackets. No colons are necessary:
result = [for num in nums]
• Insert any operations that filter the accumulation after the for loop:
result = [for num in nums if num % 2 == 0]
• Insert the accumulated object (num*num) at the front directly following the left bracket. Insert
parentheses around the object if it is a tuple:
result = [num*num for num in nums
          if num % 2 == 0]
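The same filter-and-map pattern carries over to the other constructs mentioned above. A sketch, with the names square_of, last_digits, and total chosen only for this illustration:

```python
nums = range(10)

# List comprehension: the finished form of the steps above.
squares = [num * num for num in nums if num % 2 == 0]
print(squares)      # [0, 4, 16, 36, 64]

# Dict comprehension: key: value in front of the for clause.
square_of = {num: num * num for num in nums if num % 2 == 0}
print(square_of)    # {0: 0, 2: 4, 4: 16, 6: 36, 8: 64}

# Set comprehension: braces without a colon; duplicates collapse.
last_digits = {num * num % 10 for num in nums}
print(sorted(last_digits))  # [0, 1, 4, 5, 6, 9]

# Generator expression: parentheses, evaluated lazily as sum consumes it.
total = sum(num * num for num in nums if num % 2 == 0)
print(total)        # 120
```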
13 Functions
Functions may take input, do some processing, and return output. You can provide a docstring directly
following the name and parameters of the function:
>>> def add_numbers(x, y):
...     """ add_numbers sums up x and y
...
...     Arguments:
...     x -- object that supports addition
...     y -- object that supports addition
...     """
...     return x + y
We can create anonymous functions using a lambda expression. Because only a single expression
is allowed after the colon, lambdas are limited in functionality. They are commonly used as
the key argument to sorted, min, or max:
>>> add = lambda x, y: x + y
>>> add(4, 5)
9
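A sketch of the key-argument use mentioned above. The list of names and the variables by_length, longest, and by_last are illustrative:

```python
people = ['Paul', 'John', 'George', 'Ringo']

# key is called once per item; items are ordered by the returned value.
by_length = sorted(people, key=lambda name: len(name))
print(by_length)  # ['Paul', 'John', 'Ringo', 'George'] (ties keep input order)

longest = max(people, key=lambda name: len(name))
print(longest)    # George

by_last = sorted(people, key=lambda name: name[-1])
print(by_last)    # ['George', 'Paul', 'John', 'Ringo']
```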
Functions can have default arguments. Since Python 3.7, you can have more than 255 arguments
for a single function! Be careful with mutable types as arguments, as the default is bound to the
function when the function is created, not when it is called:
>>> def add_n(x, n=42):
...     return x + n
>>> add_n(10)
52
>>> add_n(3, -10)
-7
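The warning about mutable defaults can be seen in a small sketch. The functions append_bad and append_good are hypothetical names used only here; using None as a sentinel is one common workaround, not the only one:

```python
def append_bad(item, bucket=[]):
    # The same list object is reused on every call without bucket.
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    # None is a sentinel; a fresh list is created at call time.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_bad(1)
second = append_bad(2)
print(second)           # [1, 2]
print(first is second)  # True (both names refer to the shared default)

print(append_good(1), append_good(2))  # [1] [2]
```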
14 Modules
A module is a Python file (ending in .py). According to PEP 8, module names should be short and
all lowercase; underscores may be used when they improve readability. Any module found in a
directory listed in the PYTHONPATH environment variable or the sys.path list can be imported.
A directory that has a file named __init__.py in it is a package. A package can have modules in
it as well as sub packages. The package should be found in PYTHONPATH or sys.path to be imported.
An example might look like this:
packagename/
    __init__.py
    module1.py
    module2.py
    subpackage/
        __init__.py
The __init__.py module can be empty or can import code from other modules in the package
to remove nesting in import statements.
You can import a package or a module:
import packagename
import packagename.module1
Assume there is a fib function in module1. You have access to everything in the namespace
of the module you imported. To use this function you will need to use the fully qualified name,
packagename.module1.fib:
import packagename.module1
packagename.module1.fib()
If you only want to import the fib function, use the from variant:
from packagename.module1 import fib
fib()
You can also rename an import using as:
from packagename.module1 import fib as package_fib
package_fib()
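A runnable sketch of these import forms. It builds a throwaway package in a temporary directory; the package name demo_pkg and the body of fib are assumptions made for this illustration, since the sheet does not define them:

```python
import importlib
import os
import sys
import tempfile

# Build demo_pkg/__init__.py and demo_pkg/module1.py in a temp directory.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, 'demo_pkg')
os.mkdir(pkg_dir)

with open(os.path.join(pkg_dir, '__init__.py'), 'w', encoding='utf8') as fout:
    # Re-export fib so callers can skip the module1 level.
    fout.write('from demo_pkg.module1 import fib\n')

with open(os.path.join(pkg_dir, 'module1.py'), 'w', encoding='utf8') as fout:
    fout.write('def fib(n):\n'
               '    a, b = 0, 1\n'
               '    for _ in range(n):\n'
               '        a, b = b, a + b\n'
               '    return a\n')

# The package becomes importable once its parent directory is on sys.path.
sys.path.insert(0, root)
importlib.invalidate_caches()

import demo_pkg
import demo_pkg.module1
from demo_pkg.module1 import fib as package_fib

print(demo_pkg.module1.fib(10))     # 55 (fully qualified name)
print(package_fib(10))              # 55 (renamed import)
print(demo_pkg.fib is package_fib)  # True (re-exported by __init__.py)
```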
15 Classes
Python supports object-oriented programming but doesn’t require you to create classes. You can
use the built-in data structures to great effect. Here’s a class for a simple bike. The class attribute,
num_passengers, is shared for all instances of Bike. The instance attributes, size and ratio, are
unique to each instance:
>>> class Bike:
...     ''' Represents a bike '''
...     num_passengers = 1  # class attribute
...
...     def __init__(self, wheel_size,
...                  gear_ratio):
...         ''' Create a bike specifying the
...         wheel size, and gear ratio '''
...         # instance attributes
...         self.size = wheel_size
...         self.ratio = gear_ratio
...
...     def gear_inches(self):
...         return self.size * self.ratio
We can call the constructor (__init__) by invoking the class name. Note that self is the
instance, but Python passes it for us automatically:
>>> bike = Bike(26, 34/13)
>>> print(bike.gear_inches())
68.0
We can access both class attributes and instance attributes on the instance:
>>> bike.num_passengers
1
>>> bike.size
26
If an attribute is not found on the instance, Python looks for it on the class and then
searches the parent classes. If the lookup is unsuccessful, an
AttributeError is raised.
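A sketch of that lookup order. The Tandem subclass and the color attribute are hypothetical, added only to show the parent-class and failure cases:

```python
class Bike:
    num_passengers = 1  # class attribute

    def __init__(self, wheel_size, gear_ratio):
        self.size = wheel_size   # instance attributes
        self.ratio = gear_ratio


class Tandem(Bike):     # hypothetical subclass for illustration
    num_passengers = 2  # found on Tandem before Bike is consulted


bike = Bike(26, 34 / 13)
tandem = Tandem(26, 34 / 13)    # __init__ is found on the parent class

print(bike.num_passengers, tandem.num_passengers)  # 1 2
print(tandem.size)                                 # 26

# Assigning on the instance shadows the class attribute for that instance.
bike.num_passengers = 3
print(Bike.num_passengers)  # 1

try:
    bike.color              # not on the instance, Bike, or object
except AttributeError:
    lookup_failed = True
print(lookup_failed)        # True
```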
16 Exceptions
Python can catch one or more exceptions (PEP 3110). You can provide a chain of except clauses
if you want to react differently to different exceptions. A few hints:
• Try to keep the block of the try statement down to the code that throws exceptions
• If you use a bare raise inside of an except block, Python’s traceback will point back to the location
of the original exception, rather than where it is raised from.
>>> def avg(seq):
...     try:
...         result = sum(seq) / len(seq)
...     except ZeroDivisionError:
...         return None
...     except Exception:
...         raise
...     return result
>>> avg('matt')
Traceback (most recent call last):
...
TypeError: unsupported operand type(s) for +: 'int' and 'str'
You can raise an exception using the raise statement (PEP 3109):
>>> def bad_code(x):
...     raise ValueError('Bad code')
>>> bad_code(1)
Traceback (most recent call last):
...
ValueError: Bad code
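A sketch of reacting differently to different exceptions with a chain of except clauses. The function describe and its return strings are hypothetical, chosen only for this illustration:

```python
def describe(value, divisor):
    try:
        # Keep the try block limited to the code that can raise.
        result = int(value) / divisor
    except ZeroDivisionError:
        return 'undefined'
    except (TypeError, ValueError) as err:
        # A tuple catches several exception types in one clause.
        return f'bad input ({type(err).__name__})'
    return f'{result:.1f}'


print(describe('10', 4))   # 2.5
print(describe('10', 0))   # undefined
print(describe('ten', 2))  # bad input (ValueError)
print(describe(None, 2))   # bad input (TypeError)
```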
17 NumPy
Standard import is:
import numpy as np
Operation                                   Result
digits = np.array(range(10))                Make a NumPy array
digits.shape                                (10,) - Number of items (tuple)
digits.dtype                                int64 - Type of the items (NumPy optimized block)
np.log(digits)                              Return array with log of values (log of 0 gives -inf with a warning)
np.sin(digits)                              Return array with sine of values
digits.mean()                               Return mean of all values
digits + 10                                 Return array with value incremented by 10
nums = (np.arange(100).reshape(20, 5))      Create a two dimensional array (20 rows, 5 columns)
nums.mean()                                 Return mean of all values
nums.mean(axis=0)                           Return array with mean of each column (5 results)
nums.mean(axis=1)                           Return array with mean of each row (20 results)
nums.mean(axis=1, keepdims=True)            Return array with mean of each row (20 results), but in 2 dimensions
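A sketch of what the axis and keepdims arguments do to the result shape. It assumes NumPy is installed; the names col, row, row2d, and centered are illustrative:

```python
import numpy as np

nums = np.arange(100).reshape(20, 5)

print(nums.mean())            # 49.5

col = nums.mean(axis=0)       # collapse the rows: one mean per column
print(col.shape)              # (5,)

row = nums.mean(axis=1)       # collapse the columns: one mean per row
print(row.shape)              # (20,)

row2d = nums.mean(axis=1, keepdims=True)
print(row2d.shape)            # (20, 1)

# The (20, 1) shape broadcasts against (20, 5), e.g. to center each row.
centered = nums - row2d
print(centered[0].tolist())   # [-2.0, -1.0, 0.0, 1.0, 2.0]
```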
18 NumPy Slicing
Operation                                   Result
nums = (np.arange(100).reshape(20, 5))      Create a two dimensional array (20 rows, 5 columns)
nums[0]                                     First row
nums[[0,5,10]]                              Rows at index positions 0, 5 and 10
nums[0:10]                                  First ten rows
19 Boolean Arrays
Operation                                   Result
nums = (np.arange(100).reshape(20, 5))      Create a two dimensional array (20 rows, 5 columns)
nums % 2 == 0                               Return array with booleans where values are even
nums[nums % 2 == 0]                         Return one dimensional array of the even values
nums.sum(axis=1) < 100                      Return array with booleans where row sum is less than 100
nums[nums.sum(axis=1) < 100]                Return array with rows where row sum is less than 100
nums.mean(axis=0) > 50                      Return array with booleans where column mean is greater than 50
nums[:, nums.mean(axis=0) > 50]             Return array with columns where column mean is greater than 50
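A sketch showing the shapes that come back from each kind of boolean mask. It assumes NumPy is installed; the mask names even, small_rows, and big_cols are illustrative:

```python
import numpy as np

nums = np.arange(100).reshape(20, 5)

# Element-wise mask: same shape as nums; indexing flattens the result.
even = nums % 2 == 0
print(even.shape)               # (20, 5)
print(nums[even].shape)         # (50,)

# Row mask: one boolean per row selects whole rows.
small_rows = nums.sum(axis=1) < 100
print(nums[small_rows].shape)   # (4, 5)

# Column mask: one boolean per column, applied on the second axis.
big_cols = nums.mean(axis=0) > 50
print(big_cols.tolist())        # [False, False, False, True, True]
print(nums[:, big_cols].shape)  # (20, 2)
```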
20 Pandas
Operation                         Result
import pandas as pd               Import pandas
df = pd.read_csv('names.csv')     Load dataframe
df.age >= 30                      Return boolean series that is True where age >= 30
df[df.age >= 30]                  Return a dataframe with rows only where age >= 30
df.age + 2                        Return series incrementing ages by 2
df.mean(numeric_only=True)        Return series with mean of each numeric column (column name in index); pandas 2.0+ raises TypeError on the string column without numeric_only=True
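A sketch of the operations above on the same rows written in the Files section. It assumes pandas is installed and reads the CSV text from an in-memory buffer (io.StringIO) rather than from names.csv, so it runs without that file:

```python
import io

import pandas as pd

csv_text = 'name,age\njeff,30\nlinda,29\n'
df = pd.read_csv(io.StringIO(csv_text))

mask = df.age >= 30                    # boolean series, one entry per row
print(mask.tolist())                   # [True, False]

print(df.loc[mask, 'name'].tolist())   # ['jeff']
print((df.age + 2).tolist())           # [32, 31]

# numeric_only=True skips the string column 'name'.
means = df.mean(numeric_only=True)
print(means['age'])                    # 29.5
```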