Python Course Cheat Sheet
1. Basic
a. While Loop
i = 1
while i < 5:
    print('i is: {}'.format(i))
    i = i + 1
i is: 1
i is: 2
i is: 3
i is: 4
b. range()
range(5)
range(0, 5)
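A minimal sketch of how range() behaves: in Python 3 it returns a lazy range object, so wrap it in list() to see the values. The start, stop and step values below are arbitrary illustrations.

```python
# range(stop) counts from 0 up to, but not including, stop
first_five = list(range(5))
print(first_five)        # [0, 1, 2, 3, 4]

# range(start, stop, step) lets you choose the start and the stride
evens = list(range(0, 11, 2))
print(evens)             # [0, 2, 4, 6, 8, 10]
```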
c. list comprehension
x = [1,2,3,4]
[item**2 for item in x]
[1, 4, 9, 16]
seq = [1,2,3,4,5]
list(filter(lambda item: item%2 == 0,seq))
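The filter/lambda line above selects the even items. As a small sketch, the same selection can be written as a list comprehension with a condition; both give the same list.

```python
seq = [1, 2, 3, 4, 5]

# filter() keeps the items for which the lambda returns True
evens_filter = list(filter(lambda item: item % 2 == 0, seq))

# the same selection as a list comprehension with an if clause
evens_comp = [item for item in seq if item % 2 == 0]

print(evens_filter)  # [2, 4]
print(evens_comp)    # [2, 4]
```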
2. Numpy
import numpy as np
np.zeros((5,5))
np.ones(3)
array([ 1., 1., 1.])
e. max,min,argmax,argmin
ranarr = np.random.randint(0,50,10)
array([10, 12, 41, 17, 49, 2, 46, 3, 19, 39])
ranarr.max()
49
ranarr.argmax()
4
ranarr.min()
2
ranarr.argmin()
5
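A short sketch of how the four methods relate: max()/min() return the value, argmax()/argmin() return its position. The array is the sample output shown above, typed in by hand so the result is reproducible.

```python
import numpy as np

# same values as the sample array above (entered manually, not random)
ranarr = np.array([10, 12, 41, 17, 49, 2, 46, 3, 19, 39])

largest = ranarr.max()            # the value 49
where_largest = ranarr.argmax()   # its position, 4
smallest = ranarr.min()           # the value 2
where_smallest = ranarr.argmin()  # its position, 5

# indexing with the arg* result gives back the value
print(ranarr[where_largest] == largest)    # True
print(ranarr[where_smallest] == smallest)  # True
```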
f. dtype - You can also grab the data type of the object in the array:
arr.dtype
dtype('int64')
3. Pandas
a. DataFrames
from numpy.random import randn
np.random.seed(101)
          W         X         Y         Z States
B  0.651118 -0.319318 -0.848077  0.605965     NY
C -2.018168  0.740122  0.528813 -0.589001     WY
D  0.188695 -0.758872 -0.933237  0.955057     OR
# Index Levels
outside = ['G1','G1','G1','G2','G2','G2']
inside = [1,2,3,1,2,3]
hier_index = list(zip(outside,inside))
[('G1', 1), ('G1', 2), ('G1', 3), ('G2', 1), ('G2', 2), ('G2', 3)]
hier_index = pd.MultiIndex.from_tuples(hier_index)
df.loc['G1']
(output: a DataFrame with columns A and B holding the three inner-level rows 1, 2, 3 of group G1)
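A self-contained sketch of selecting from a MultiIndex frame. The DataFrame contents are an assumption (plain integers instead of random numbers) so that the result is easy to check.

```python
import numpy as np
import pandas as pd

outside = ['G1', 'G1', 'G1', 'G2', 'G2', 'G2']
inside = [1, 2, 3, 1, 2, 3]
hier_index = pd.MultiIndex.from_tuples(list(zip(outside, inside)))

# assumption: a 6x2 frame of the integers 0..11 instead of random values
df = pd.DataFrame(np.arange(12).reshape(6, 2),
                  index=hier_index, columns=['A', 'B'])

g1 = df.loc['G1']          # all rows of outer level 'G1'; inner level becomes the index
row = df.loc['G1'].loc[1]  # then inner label 1: a Series with A=0, B=1
print(g1)
print(row)
```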
c. Missing Data
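df.dropna() # drop every row that contains a null value
df.dropna(axis=1) # drop every column that contains a null value
df.dropna(thresh=2) # drop the rows that have fewer than 2 non-null values
df.fillna(value='FILL VALUE') # replace every null value with the given value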
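A small sketch of the calls above on a made-up frame with a few missing values (the data is hypothetical, chosen so each call gives a different result).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

rows_complete = df.dropna()        # keeps only row 0, the one row without NaN
cols_complete = df.dropna(axis=1)  # keeps only column C
rows_thresh = df.dropna(thresh=2)  # keeps rows 0 and 1 (at least 2 non-null values)
filled = df.fillna(value=0)        # replaces every NaN with 0

print(rows_thresh)
```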
    A   B key1 key2    C    D
0  A0  B0   K0   K0   C0   D0
1  A1  B1   K0   K1  NaN  NaN
2  A2  B2   K1   K0   C1   D1
3  A2  B2   K1   K0   C2   D2
4  A3  B3   K2   K1  NaN  NaN
     A   B    C    D
K0  A0  B0   C0   D0
K1  A1  B1  NaN  NaN
K2  A2  B2   C2   D2
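The code that produced the two tables above is not in these notes. The following is a sketch under the assumption that they come from a left merge on two key columns and from an index-based join; the input frames are hypothetical, chosen so that they give tables of the same shape.

```python
import pandas as pd

# hypothetical inputs for the merge
left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

# how='left' keeps every row of left; unmatched keys get NaN in C and D
merged = pd.merge(left, right, how='left', on=['key1', 'key2'])

# hypothetical inputs for the join; join() aligns on the index, not on columns
left_i = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']},
                      index=['K0', 'K1', 'K2'])
right_i = pd.DataFrame({'C': ['C0', 'C2', 'C3'], 'D': ['D0', 'D2', 'D3']},
                       index=['K0', 'K2', 'K3'])
joined = left_i.join(right_i)

print(merged)
print(joined)
```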
e. Pivot Table
df.pivot_table(values='D',index=['A', 'B'],columns=['C'])
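A sketch of what the pivot_table call does, on a made-up frame (the data is an assumption): the values of D are spread into a grid with (A, B) as the row index and the values of C as columns. Combinations that do not occur become NaN, and the default aggregation is the mean.

```python
import pandas as pd

# hypothetical data with the column names used in the call above
df = pd.DataFrame({'A': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'],
                   'B': ['one', 'one', 'two', 'two', 'one', 'one'],
                   'C': ['x', 'y', 'x', 'y', 'x', 'y'],
                   'D': [1, 3, 2, 5, 4, 1]})

pt = df.pivot_table(values='D', index=['A', 'B'], columns=['C'])
print(pt)

# one cell of the grid: A='bar', B='one', C='x'
cell = pt.loc[('bar', 'one'), 'x']
```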
4. Visualization
import matplotlib.pyplot as plt
%matplotlib inline
Use the line above to see plots inside a Jupyter notebook.
plt.show() - call this outside a notebook to have the figure pop up in another window.
axes.plot(x, y, 'r')
axes.set_xlabel('x')
axes.set_ylabel('y')
axes.set_title('title');
fig = plt.figure()
ax = fig.add_axes([0,0,1,1])
# custom dash
line, = ax.plot(x, x+8, color="black", lw=1.50)
line.set_dashes([5, 10, 15, 10]) # format: line length, space length, ...
# possible marker symbols: marker = '+', 'o', '*', 's', ',', '.', '1', '2', '3', '4', ...
ax.plot(x, x+ 9, color="blue", lw=3, ls='-', marker='+')
ax.plot(x, x+10, color="blue", lw=3, ls='--', marker='o')
ax.plot(x, x+11, color="blue", lw=3, ls='-', marker='s')
ax.plot(x, x+12, color="blue", lw=3, ls='--', marker='1')
axes[0].scatter(xx, xx + 0.25*np.random.randn(len(xx)))
axes[0].set_title("scatter")
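The snippets above use x, xx, axes and ax without defining them. A self-contained sketch that ties them together is shown here; the sample data and the two-panel layout are assumptions, not part of the original notes.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend so this also runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 11)  # hypothetical sample data
y = x ** 2

# one figure with two side-by-side axes
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(8, 3))

# left panel: scatter plot
axes[0].scatter(x, y)
axes[0].set_title('scatter')

# right panel: line plot with a custom dash pattern
line, = axes[1].plot(x, y, color='black', lw=1.5)
line.set_dashes([5, 10, 15, 10])  # line length, space length, ...
axes[1].set_xlabel('x')
axes[1].set_ylabel('y')
axes[1].set_title('title')

fig.tight_layout()
```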
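import seaborn as sns
tips = sns.load_dataset('tips')
sns.distplot(tips['total_bill'],kde=False,bins=30) # deprecated since seaborn 0.11; sns.histplot(tips['total_bill'],bins=30) is the replacement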
## jointplot
jointplot() combines the univariate distribution plots of two variables with a plot of their
joint (bivariate) relationship. The **kind** parameter selects how the joint part is drawn:
* “scatter”
* “reg”
* “resid”
* “kde”
* “hex”
sns.jointplot(x='total_bill',y='tip',data=tips,kind='scatter')
sns.jointplot(x='total_bill',y='tip',data=tips,kind='hex')
sns.jointplot(x='total_bill',y='tip',data=tips,kind='reg')
pairplot
pairplot will plot pairwise relationships across an entire dataframe (for the
numerical columns) and supports a color hue argument (for categorical
columns).
sns.pairplot(tips,hue='sex',palette='coolwarm')
## rugplot
A rugplot is a very simple concept: it draws a dash mark for every point
of a univariate distribution. Rugplots are the building block of a KDE plot:
sns.rugplot(tips['total_bill'])
barplot and countplot
These two similar plots aggregate data by a categorical feature.
barplot aggregates a numeric column within each category using a
function you choose (the estimator), by default the mean; countplot
simply counts the rows in each category:
sns.barplot(x='sex',y='total_bill',data=tips)
sns.barplot(x='sex',y='total_bill',data=tips,estimator=np.std)
sns.countplot(x='sex',data=tips)
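A rough sketch of the numbers behind those bars, using pandas only. The small frame is a hypothetical stand-in for the tips dataset; seaborn additionally draws error bars, which are not reproduced here.

```python
import pandas as pd

# hypothetical stand-in for the tips dataset
tips_small = pd.DataFrame({'sex': ['Male', 'Female', 'Male', 'Female', 'Male'],
                           'total_bill': [20.0, 10.0, 30.0, 14.0, 25.0]})

# barplot with the default estimator: mean of total_bill per category
bar_heights = tips_small.groupby('sex')['total_bill'].mean()

# countplot: number of rows per category
counts = tips_small['sex'].value_counts()

print(bar_heights)
print(counts)
```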
A violin plot plays a similar role as a box and whisker plot. It shows
the distribution of quantitative data across several levels of one (or
more) categorical variables such that those distributions can be
compared. Unlike a box plot, in which all of the plot components
correspond to actual datapoints, the violin plot features a kernel
density estimation of the underlying distribution.
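The notes give no call for the violin plot, so here is a minimal sketch using sns.violinplot. The data is randomly generated for illustration (an assumption); with the tips data the equivalent call would be sns.violinplot(x='day',y='total_bill',data=tips).

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend; omit in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)

# hypothetical data: two groups with different centers and spreads
data = pd.DataFrame({
    'group': ['a'] * 50 + ['b'] * 50,
    'value': np.concatenate([rng.normal(0, 1, 50), rng.normal(3, 2, 50)]),
})

# one violin per category, each showing a KDE of that group's values
ax = sns.violinplot(x='group', y='value', data=data)
```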
sns.catplot(x='sex',y='total_bill',data=tips,kind='bar') # factorplot was renamed to catplot in seaborn 0.9
Matrix Plots
Matrix plots allow you to plot data as color-encoded matrices and can also be
used to indicate clusters within the data (later in the machine learning
section we will learn how to formally cluster data).
flights = sns.load_dataset('flights')
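sns.heatmap(tips.corr(numeric_only=True),cmap='coolwarm',annot=True) # numeric_only needs pandas 1.5+; without it recent pandas raises on the text columns
pvflights = flights.pivot_table(values='passengers',index='month',columns='year')
sns.clustermap(pvflights)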
# Just the Grid
iris = sns.load_dataset('iris')
sns.PairGrid(iris)
sns.lmplot(x='total_bill',y='tip',data=tips)
sns.lmplot(x='total_bill',y='tip',data=tips,col='sex')
sns.countplot(x='sex',data=tips)
sns.despine()
# Pandas Built-in Data Visualization
import numpy as np
import pandas as pd
%matplotlib inline
df1 = pd.read_csv('df1',index_col=0)
df2 = pd.read_csv('df2')
df1['A'].hist()
plt.style.use('bmh')
df1['A'].hist()
plt.style.use('dark_background')
df1['A'].hist()
plt.style.use('fivethirtyeight')
df1['A'].hist()
Area
df2.plot.area(alpha=0.4)
Barplots
df2.head()
df2.plot.bar()
df2.plot.bar(stacked=True)
Histogram
df1['A'].plot.hist(bins=50)
Line Plots
df1.plot.line(y='B',figsize=(12,3),lw=1) # the index is used for x by default; recent pandas rejects x=df1.index
Scatter Plots
df1.plot.scatter(x='A',y='B')
BoxPlots
df2.plot.box() # Can also pass a by= argument for groupby
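Plotly and Cufflinks
import cufflinks as cf # df.iplot() is provided by the cufflinks package
cf.go_offline() # use plotly offline, inside the notebook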
df = pd.DataFrame(np.random.randn(100,4),columns='A B C D'.split())
df.head()
(output: the first five rows of df, with columns A, B, C and D filled with random values)
Scatter
df.iplot(kind='scatter',x='A',y='B',mode='markers',size=10)
Bar Plots
df.count().iplot(kind='bar')
Boxplots
df.iplot(kind='box')
3d Surface
df3 = pd.DataFrame({'x':[1,2,3,4,5],'y':[10,20,30,20,10],'z':[5,4,3,2,1]})
df3.iplot(kind='surface',colorscale='rdylbu')
Spread
df[['A','B']].iplot(kind='spread')
Histogram
df['A'].iplot(kind='hist',bins=25)
scatter_matrix()
Similar to sns.pairplot()
df.scatter_matrix()
df.idxmin() - returns, for each column, the index label of the row that holds the column's minimum value
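A quick sketch of idxmin() on a date-indexed frame. The prices are made-up numbers for illustration only, not real stock data.

```python
import pandas as pd

# made-up closing prices, indexed by date
prices = pd.DataFrame(
    {'BAC': [17.0, 15.5, 16.2], 'MS': [38.0, 39.1, 35.4]},
    index=pd.to_datetime(['2015-01-02', '2015-01-05', '2015-01-06']),
)

# for each column, the date on which its minimum occurs
lowest = prices.idxmin()
print(lowest)
```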
BAC['Close'].loc['2015-01-01':'2016-01-01'].ta_plot(study='boll')
MS['Close'].loc['2015-01-01':'2016-01-01'].ta_plot(study='sma',periods=[13,21,55],title='Simple Moving Averages')
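ta_plot computes the studies internally. As a conceptual sketch only, a simple moving average is a rolling mean, and Bollinger bands are commonly drawn as that average plus and minus a multiple of the rolling standard deviation. The prices, the window of 3 and the multiplier of 2 below are assumptions for illustration, not the settings ta_plot uses.

```python
import pandas as pd

close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])  # made-up prices

window = 3                                  # assumed window length
sma = close.rolling(window=window).mean()   # simple moving average
std = close.rolling(window=window).std()    # rolling standard deviation

# assumed band width: 2 standard deviations on either side of the average
upper = sma + 2 * std
lower = sma - 2 * std

# the first window-1 entries are NaN because the window is not yet full
print(pd.DataFrame({'close': close, 'sma': sma, 'upper': upper, 'lower': lower}))
```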