Python for Data Scientists
A step-by-step roadmap
Dawn Choo
Did you know 92% of data science jobs require Python?
Here are the essential Python skills for data scientists:
1 Learn Python fundamentals
Key concepts
Control structures
if, elif, else
for loop
while loop
range()
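Here is a minimal sketch of these control structures. The temperatures, thresholds, and labels are made-up values chosen only for illustration.

```python
# if/elif/else inside a for loop: label each (made-up) temperature
temps = [12, 25, 31]
labels = []
for t in temps:
    if t < 15:
        labels.append("cold")
    elif t < 30:
        labels.append("mild")
    else:
        labels.append("hot")
print(labels)  # ['cold', 'mild', 'hot']

# range(start, stop, step) yields integers up to, but not including, stop
evens = list(range(0, 10, 2))
print(evens)  # [0, 2, 4, 6, 8]

# while loop: keep halving until the value drops below 1, counting the steps
value = 20
steps = 0
while value >= 1:
    value /= 2
    steps += 1
print(steps)  # 5
```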
Functions
def
return
args
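Here is a short sketch of def, return, and arguments. It assumes "args" refers to function arguments, including *args for a variable number of positional values. summarize is a hypothetical helper, not a library function.

```python
def summarize(*values, decimals=2):
    """Return the mean (rounded) and the range of any number of numeric arguments."""
    if not values:
        raise ValueError("need at least one value")
    mean = sum(values) / len(values)
    spread = max(values) - min(values)
    return round(mean, decimals), spread  # a tuple, unpacked by the caller


# *values collects the positional arguments; decimals is a keyword argument with a default
mean, spread = summarize(4, 8, 15, 16, 23, 42)
print(mean, spread)  # 18.0 38
```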
List comprehensions
[expression for item in iterable if condition]
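For example, the comprehension below filters and transforms a list of made-up scores in one line. The for loop after it does the same work the long way.

```python
scores = [55, 72, 90, 38, 81]

# [expression for item in iterable if condition]:
# keep passing scores (>= 60) and rescale each to 0-1
passing = [s / 100 for s in scores if s >= 60]
print(passing)  # [0.72, 0.9, 0.81]

# Equivalent explicit loop, for comparison
passing_loop = []
for s in scores:
    if s >= 60:
        passing_loop.append(s / 100)
```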
1 Learn Python fundamentals
Test your skills
Exercise 1
Exercise 2
Exercise 3
DataFrame operations
pd.DataFrame()
df.head(), df.tail()
df.info(), df.describe()
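Here is a quick sketch of creating and inspecting a DataFrame. It assumes pandas is installed and imported as pd. The city names and numbers are illustrative, not real measurements.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Austin", "Boston", "Chicago", "Denver", "Seattle", "Miami"],
    "temp_c": [30.1, 18.4, 22.0, 25.3, 16.8, 31.2],
    "rain_mm": [0.0, 5.2, 1.1, 0.0, 8.7, 12.5],
})

print(df.head(3))      # first 3 rows (the default is 5)
print(df.tail(2))      # last 2 rows
df.info()              # prints dtypes, non-null counts, and memory use; returns None
print(df.describe())   # count, mean, std, min, quartiles, max for the numeric columns
```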
Data cleaning
df.dropna(), df.fillna()
df.drop_duplicates()
df.replace()
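Here is a minimal cleaning pass on a small, made-up DataFrame. Treating the string "N/A" as missing is an assumption about how this toy data marks gaps. Real datasets use many different placeholders.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "name": ["Ana", "Ben", "Ben", "Cara", "Dev"],
    "age": [34, 29, 29, np.nan, 41],
    "status": ["active", "inactive", "inactive", "N/A", "active"],
})

# Remove exact duplicate rows (Ben appears twice)
clean = raw.drop_duplicates()

# Turn the placeholder string into a real missing value
clean = clean.replace("N/A", np.nan)

# Option 1: drop every row that still has a missing value
dropped = clean.dropna()

# Option 2: keep the rows and fill missing ages with the median age
filled = clean.fillna({"age": clean["age"].median()})
```

Dropping rows is simpler but loses data. Filling keeps every row but invents a value, so the choice depends on how much missing data you can afford to lose.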
Test your skills
Exercise 1
Exercise 2
Exercise 3
3 Exploratory Data Analysis
Key concepts
Descriptive statistics
df.mean(), df.median(), df.mode()
df.std(), df.var()
df.min(), df.max(), df.quantile()
Data distribution
df.hist()
plt.hist()
scipy.stats.normaltest()
Correlation analysis
df.corr()
plt.imshow() (for heatmaps)
scipy.stats.pearsonr()
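The sketch below runs the descriptive-statistics, distribution, and correlation calls for this section on synthetic height and weight data generated with NumPy. The relationship between the two columns is invented for the example. scipy.stats.normaltest tests the null hypothesis that a sample comes from a normal distribution, so a small p-value is evidence against normality.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
n = 200
height = rng.normal(170, 10, n)                    # synthetic heights in cm
weight = 0.9 * height - 90 + rng.normal(0, 5, n)   # synthetic weights that depend on height
df = pd.DataFrame({"height": height, "weight": weight})

# Descriptive statistics, computed per column
print(df.mean(), df.median(), df.std(), sep="\n")
print(df.quantile([0.25, 0.5, 0.75]))

# Distribution: D'Agostino-Pearson normality test on one column
stat, p_value = stats.normaltest(df["height"])
print(f"normaltest p = {p_value:.3f}")

# Correlation: full Pearson matrix, then a single pair with its p-value
print(df.corr())
r, p = stats.pearsonr(df["height"], df["weight"])
print(f"r = {r:.2f}, p = {p:.1e}")
```

df.hist() and plt.imshow(df.corr()) would plot the same quantities. They are left out here so the sketch runs without a display.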
Outlier detection
plt.boxplot()
scipy.stats.zscore()
IQR method using np.percentile()
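Here is a sketch comparing the IQR and z-score approaches on a made-up array with one injected outlier. The 1.5 * IQR fences follow the usual Tukey convention, which is also where plt.boxplot draws its whiskers by default (whis=1.5).

```python
import numpy as np
from scipy import stats

data = np.array([12, 14, 13, 15, 14, 13, 16, 15, 14, 95])  # 95 is the injected outlier

# IQR method: flag points more than 1.5 * IQR beyond the quartiles
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = data[(data < lower) | (data > upper)]

# Z-score method: flag points more than 2 standard deviations from the mean
z = stats.zscore(data)
z_outliers = data[np.abs(z) > 2]

print(iqr_outliers, z_outliers)  # [95] [95]
```

The z-score cutoff of 2 is a choice made for this tiny sample. With n values, the largest possible |z| is sqrt(n - 1), which is 3 here, so the common cutoff of 3 would never flag anything.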
Test your skills
Exercise 1
Exercise 2
Exercise 3
4 Data Visualization
Key concepts
Basic plotting
plt.plot() (line plots)
plt.scatter() (scatter plots)
plt.bar() (bar charts)
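Here is a sketch of the three basic chart types side by side, using illustrative monthly numbers. matplotlib.use("Agg") selects a non-interactive backend so the code also runs on a machine without a display. Drop that line and call plt.show() to see the figure in a window. The output file name is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to display plots on screen
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 150, 140]   # illustrative numbers
ad_spend = [10, 14, 18, 15]

fig = plt.figure(figsize=(12, 3.5))

plt.subplot(1, 3, 1)
plt.plot(months, sales, marker="o")   # line plot: trend over time

plt.subplot(1, 3, 2)
plt.scatter(ad_spend, sales)          # scatter plot: relationship between two variables

plt.subplot(1, 3, 3)
plt.bar(months, sales)                # bar chart: compare categories

plt.tight_layout()
plt.savefig("basic_plots.png")
```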
Box plots
plt.boxplot()
Customizing plots
plt.xlabel(), plt.ylabel(), plt.title()
plt.xscale(), plt.yscale()
plt.legend()
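Here is a sketch that combines a box plot with the customization calls, using randomly generated groups. The labels, title, and log-scaled x axis are placeholders that show where each call goes. Note that plt.legend() only lists artists that were given a label=.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to display plots on screen
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
group_a = rng.normal(50, 5, 100)
group_b = rng.normal(60, 12, 100)

fig = plt.figure(figsize=(10, 4))

# Box plot: median, quartiles, and any points beyond the 1.5 * IQR whiskers
plt.subplot(1, 2, 1)
plt.boxplot([group_a, group_b])
plt.xticks([1, 2], ["A", "B"])
plt.ylabel("Value")
plt.title("Box plot by group")

# Customized line plot: log-scaled x axis, axis labels, title, and legend
x = np.logspace(0, 3, 50)  # 50 points from 1 to 1000
plt.subplot(1, 2, 2)
plt.plot(x, np.log10(x), label="log10(x)")
plt.plot(x, np.sqrt(x), label="sqrt(x)")
plt.xscale("log")
plt.xlabel("x (log scale)")
plt.ylabel("f(x)")
plt.title("Customized line plot")
plt.legend()

plt.tight_layout()
plt.savefig("custom_plots.png")
```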
4 Data Visualization
Test your skills
For these exercises, use any dataset you like on Kaggle.
Exercise 1
Exercise 2
Exercise 3
5 Machine learning basics
Key concepts
Regression models
sklearn.linear_model.LinearRegression()
sklearn.metrics.mean_squared_error()
sklearn.metrics.r2_score()
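Here is a sketch of fitting and scoring LinearRegression on synthetic data with a known slope of 3 and intercept of 5, so you can check that the fitted coefficients land close to those values. It also uses train_test_split from sklearn.model_selection, which the list above does not include, because the metrics are meant to be computed on held-out data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))               # one synthetic feature, shape (n, 1)
y = 3.0 * X[:, 0] + 5.0 + rng.normal(0, 1.0, 200)   # true slope 3, intercept 5, plus noise

# Hold out 25% of the rows to evaluate on data the model has not seen
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("MSE:", mean_squared_error(y_test, y_pred))
print("R^2:", r2_score(y_test, y_pred))
```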
Classification models
sklearn.linear_model.LogisticRegression()
sklearn.metrics.accuracy_score()
sklearn.metrics.confusion_matrix()
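Here is a sketch using scikit-learn's bundled breast cancer dataset (load_breast_cancer), so no download is needed. Scaling the features with StandardScaler inside a pipeline is a choice made here, not something the list above requires. It helps LogisticRegression converge.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y  # keep class proportions in both splits
)

# Scale features, then fit a logistic regression classifier
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print("accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))  # rows = true class, columns = predicted class
```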
Clustering
sklearn.cluster.KMeans()
sklearn.metrics.silhouette_score()
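Here is a sketch that fits KMeans for several values of k on synthetic blobs from make_blobs and keeps the k with the highest silhouette score. Silhouette is one common way to choose k. The elbow method, which looks at how the fitted model's inertia_ falls as k grows, is another.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic 2-D data with 4 generated centers
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=0)

# Score each candidate k; silhouette ranges from -1 to 1, higher is better
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print({k: round(s, 3) for k, s in scores.items()})
print("best k:", best_k)
```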
5 Machine learning basics
Test your skills
For these exercises, use any dataset you like on Kaggle.
Exercise 1
Split the dataset into training and test sets, then build and evaluate a linear regression model to predict a continuous target variable.
Exercise 2
Implement a logistic regression classifier, use cross-validation to assess its performance, and interpret the model coefficients.
Exercise 3
Perform k-means clustering on the dataset, determine the optimal number of clusters, and visualize the results.