Python Cheat Sheet: Pandas - Numpy - Sklearn - Matplotlib - Seaborn - BS4 - Selenium - Scrapy
Cheat Sheet
Pandas provides data analysis tools for Python. All of the following code examples refer to the dataframe below.

           col1  col2
       A     1     4
df =   B     2     5
(axis 0 runs down the rows, axis 1 runs across the columns)

Select single column:
df['col1']

Select multiple columns:
df[['col1', 'col2']]

Show first n rows:
df.head(2)

Show last n rows:
df.tail(2)

Select rows by index values:
df.loc['A']
df.loc[['A', 'B']]

index=['B', 'D'],
columns=['col1', 'col3'])
#df3: new dataframe

Only merge complete rows (INNER JOIN):
df.merge(df3)

Left column stays complete (LEFT OUTER JOIN):
df.merge(df3, how='left')

Right column stays complete (RIGHT OUTER JOIN):
df.merge(df3, how='right')

Preserve all values (OUTER JOIN):
df.merge(df3, how='outer')
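
The four joins above can be tried end to end with a small sketch. It is an illustration under assumptions, not the sheet's own data: row C of df and every value inside df3 are made up here (only rows A and B, and the index and column labels of df3, are visible above).

```python
import pandas as pd

# The sheet's example dataframe; row C is an assumption (only A and B are shown above).
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]}, index=["A", "B", "C"])

# Hypothetical df3: index and column labels follow the fragment above, values are made up.
df3 = pd.DataFrame([[2, 7], [9, 8]], index=["B", "D"], columns=["col1", "col3"])

# With no `on` argument, merge joins on the columns both frames share (here: col1).
inner = df.merge(df3)               # only col1 values present in both frames
left = df.merge(df3, how="left")    # every row of df
right = df.merge(df3, how="right")  # every row of df3
outer = df.merge(df3, how="outer")  # every col1 value from either frame

print(len(inner), len(left), len(right), len(outer))
print(outer)
```

Note that merging on a shared column ignores the index labels (A, B, C, D); rows without a match on the other side are filled with NaN.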
Cheat Sheet
from sklearn.model_selection import train_test_split
# Splits data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
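
The call above needs an X and a y to work on. A minimal sketch with hypothetical toy data (the arrays below are assumptions, not from the sheet) shows what comes back:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 8 samples with 2 features each, and one label per sample.
X = np.arange(16).reshape(8, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])

# With no test_size given, scikit-learn holds out 25% of the samples for testing.
# random_state=0 makes the shuffle reproducible between runs.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

print(X_train.shape, X_test.shape)
```

Passing test_size (for example test_size=0.3) changes the proportion held out for the test set.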
Cheat Sheet
Matplotlib

Matplotlib is a Python 2D plotting library that produces figures in a variety of formats.

[Figure: anatomy of a plot, with the Figure, X-axis and Y-axis labeled]

Workflow
The basic steps to creating plots with matplotlib are Prepare Data, Plot, Customize Plot, Save Plot and Show Plot.
import matplotlib.pyplot as plt

Example with lineplot

Prepare data
x = [2017, 2018, 2019, 2020, 2021]
y = [43, 45, 47, 48, 50]

Plot & Customize Plot
plt.plot(x, y, marker='o', linestyle='--',
         color='g', label='USA')
plt.xlabel('Years')
plt.ylabel('Population (M)')
plt.title('Years vs Population')
plt.legend(loc='lower right')
plt.yticks([41, 45, 48, 51])

Save Plot
plt.savefig('example.png')

Show Plot
plt.show()

Markers: '.', 'o', 'v', '<', '>'
Line Styles: '-', '--', '-.', ':'
Colors: 'b', 'g', 'r', 'y' #blue, green, red, yellow

# x must hold one label per value in y
y = [40, 50, 33]
plt.bar(x, y)
plt.show()

Piechart
plt.pie(y, labels=x, autopct='%.0f %%')
plt.show()

Histogram
ages = [15, 16, 17, 30, 31, 32, 35]
bins = [15, 20, 25, 30, 35]
plt.hist(ages, bins, edgecolor='black')
plt.show()

Boxplots
ages = [15, 16, 17, 30, 31, 32, 35]
plt.boxplot(ages)
plt.show()

Scatterplot
a = [1, 2, 3, 4, 5, 4, 3, 2, 5, 6, 7]
b = [7, 2, 3, 5, 5, 7, 3, 2, 6, 3, 2]
plt.scatter(a, b)
plt.show()

Subplots
Add the code below to make multiple plots with 'n' number of rows and columns.
fig, ax = plt.subplots(nrows=1,
                       ncols=2,
                       sharey=True,
                       figsize=(12, 4))

Plot & Customize Each Graph
ax[0].plot(x, y, color='g')
ax[0].legend()
ax[1].plot(a, b, color='r')
ax[1].legend()
plt.show()

Find practical examples in these guides I made:
- Matplotlib & Seaborn Guide (link)
- Wordclouds Guide (link)
- Comparing Data Viz libraries (link)

Seaborn

Workflow
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

Lineplot
plt.figure(figsize=(10, 5))
flights = sns.load_dataset("flights")
may_flights = flights.query("month=='May'")
ax = sns.lineplot(data=may_flights,
                  x="year",
                  y="passengers")
ax.set(xlabel='x', ylabel='y',
       title='my_title', xticks=[1,2,3])
ax.legend(title='my_legend',
          title_fontsize=13)
plt.show()

Barplot
tips = sns.load_dataset("tips")
ax = sns.barplot(x="day",
                 y="total_bill",
                 data=tips)

Histogram
penguins = sns.load_dataset("penguins")
sns.histplot(data=penguins,
             x="flipper_length_mm")

Boxplot
tips = sns.load_dataset("tips")
ax = sns.boxplot(x=tips["total_bill"])

Scatterplot
tips = sns.load_dataset("tips")
sns.scatterplot(data=tips,
                x="total_bill",
                y="tip")

Figure aesthetics
sns.set_style('darkgrid') #styles
sns.set_palette('husl', 3) #palettes
sns.color_palette('husl') #colors

Fontsize of the axes title, x and y labels, tick labels and legend:
plt.rc('axes', titlesize=18)
plt.rc('axes', labelsize=14)
plt.rc('xtick', labelsize=13)
plt.rc('ytick', labelsize=13)
plt.rc('legend', fontsize=13)
plt.rc('font', size=13)

Made by Frank Andrade frank-andrade.medium.com
Web Scraping
Cheat Sheet

Web Scraping is the process of extracting data from a website. Before studying Beautiful Soup and Selenium, it's good to review some HTML basics first.

“Siblings” are nodes with the same parent. A node’s children and its children’s children are called its “descendants”. Similarly, a node’s parent and its parent’s parent are called its “ancestors”.

It’s recommended to find elements in this order:
a. ID
b. Class name
c. Tag name
d. XPath

XPath
We need to learn XPath to scrape with Selenium or Scrapy.

XPath Syntax
An XPath usually contains a tag name, attribute name, and attribute value.
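
To make the tag name / attribute name / attribute value pattern concrete, here is a minimal sketch. It is an assumption-laden illustration: the page content is hypothetical, and it uses Python's built-in xml.etree.ElementTree (which understands only a subset of XPath) instead of Selenium or Scrapy, so that it runs without a browser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page used only for illustration.
html = """
<html>
  <body>
    <h1 id="title">Hypothetical page</h1>
    <div class="article"><p>First paragraph</p></div>
    <div class="article"><p>Second paragraph</p></div>
    <div class="footer"><p>Contact</p></div>
  </body>
</html>
"""
root = ET.fromstring(html)

# Pattern: tag name [ @attribute name = 'attribute value' ]
# ElementTree needs the leading '.'; full XPath engines usually take
# the same expression written as //div[@class='article'].
articles = root.findall(".//div[@class='article']")
paragraphs = [div.find("p").text for div in articles]

# The same pattern with a different tag and attribute.
title = root.find(".//h1[@id='title']").text

print(title)
print(paragraphs)
```

Here div and h1 are the tag names, class and id the attribute names, and 'article' and 'title' the attribute values.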