
supervised learning using python - chapter1

Datacamp supervised learning chapter 1

Uploaded by

senarkitgame
Copyright
© All Rights Reserved

Machine learning with scikit-learn
SUPERVISED LEARNING WITH SCIKIT-LEARN

George Boorman
Core Curriculum Manager, DataCamp
What is machine learning?
Machine learning is the process whereby computers are given the ability to learn to make decisions from data without being explicitly programmed.

[Figure: Examples of machine learning]

Unsupervised learning
Uncovering hidden patterns from unlabeled data
Example:
Grouping customers into distinct categories (Clustering)

Supervised learning
The values to be predicted (the labels) are already known for the training data

Aim: Predict the target values of unseen data, given the features

Types of supervised learning
Classification: Target variable consists of categories
Regression: Target variable is continuous
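
To make the distinction concrete, here is a minimal sketch that fits a classifier and a regressor on the same made-up, one-feature dataset. The data values are invented for illustration, and the choice of `KNeighborsClassifier` and `KNeighborsRegressor` is an assumption made to keep the two cases comparable, not something prescribed by the slides.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Made-up data: one feature, six observations
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y_class = np.array([0, 0, 0, 1, 1, 1])                 # categorical target
y_reg = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])    # continuous target

X_new = np.array([[2.5], [11.2]])

# Classification: the prediction is one of the known categories
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y_class)
class_pred = clf.predict(X_new)

# Regression: the prediction is a number (here the mean of the 3 nearest targets)
reg = KNeighborsRegressor(n_neighbors=3).fit(X, y_reg)
reg_pred = reg.predict(X_new)

print(class_pred, reg_pred)
```

The classifier returns labels taken from the target's categories, while the regressor returns values on a continuous scale.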

Naming conventions
Feature = predictor variable = independent variable
Target variable = dependent variable = response variable

Before you use supervised learning
Requirements:
No missing values

Data in numeric format

Data stored in pandas DataFrame or NumPy array

Perform Exploratory Data Analysis (EDA) first
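
The requirements above can be checked with a few pandas calls before any model is fitted. The following is a rough sketch on a hypothetical toy DataFrame; the column names and values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing value and one non-numeric column
df = pd.DataFrame({
    "age": [34, 51, np.nan, 29],
    "plan": ["basic", "premium", "basic", "basic"],
    "churn": [0, 1, 0, 0],
})

# Requirement: no missing values (count them per column)
missing_per_column = df.isna().sum()

# Requirement: data in numeric format (list the columns that are not)
non_numeric = df.select_dtypes(exclude="number").columns.tolist()

print(missing_per_column)
print("Non-numeric columns:", non_numeric)
```

Any column that shows up in either check would need to be cleaned or encoded before being passed to scikit-learn.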

scikit-learn syntax
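# Generic pattern: "module" and "Model" are placeholders for a real
# scikit-learn module and estimator class
from sklearn.module import Model
model = Model()                        # instantiate the model
model.fit(X, y)                        # learn from features X and target y
predictions = model.predict(X_new)     # predict targets for unseen data
print(predictions)

[0 0 0 0 1 0]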
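
As a concrete instance of this pattern, the sketch below runs the same `fit` and `predict` calls with two real estimators. The toy data is made up, and the choice of `KNeighborsClassifier` and `LogisticRegression` is only an assumption for illustration; any scikit-learn classifier should follow the same steps.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Made-up data: one feature, binary target
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([0, 0, 0, 1, 1, 1])
X_new = np.array([[2.5], [11.5]])

results = {}
# The same instantiate / fit / predict steps work for both estimators
for Model in (KNeighborsClassifier, LogisticRegression):
    model = Model()
    model.fit(X, y)
    predictions = model.predict(X_new)
    results[Model.__name__] = predictions.tolist()
    print(Model.__name__, predictions)
```

Because every estimator exposes the same methods, swapping one model for another usually means changing only the import and the constructor call.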

The classification challenge
Classifying labels of unseen data
1. Build a model
2. Model learns from the labeled data we pass to it

3. Pass unlabeled data to the model as input

4. Model predicts the labels of the unseen data

Labeled data = training data

k-Nearest Neighbors
Predict the label of a data point by
Looking at the k closest labeled data points

Taking a majority vote
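
The two steps above can be sketched directly in NumPy. This is a simplified illustration rather than scikit-learn's implementation: the function name `knn_predict` is hypothetical, Euclidean distance is assumed, and ties are not handled specially.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the label of x_new by majority vote among its k nearest neighbors."""
    # Euclidean distance from x_new to every labeled point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest labeled points
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of those points
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Made-up labeled data: two well-separated groups
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
                    [6.0, 6.0], [6.5, 7.0], [7.0, 6.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

label_a = knn_predict(X_train, y_train, np.array([1.2, 1.4]))
label_b = knn_predict(X_train, y_train, np.array([6.4, 6.2]))
print(label_a, label_b)
```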

[Figures: k-Nearest Neighbors; KNN Intuition]

Using scikit-learn to fit a classifier
from sklearn.neighbors import KNeighborsClassifier
# Features: two numeric columns; target: the churn label
X = churn_df[["total_day_charge", "total_eve_charge"]].values
y = churn_df["churn"].values
print(X.shape, y.shape)

(3333, 2) (3333,)

# Use the 15 nearest neighbors when voting
knn = KNeighborsClassifier(n_neighbors=15)
knn.fit(X, y)

Predicting on unlabeled data
import numpy as np
# Three new observations with the same two features used for fitting
X_new = np.array([[56.8, 17.5],
                  [24.4, 24.1],
                  [50.1, 10.9]])
print(X_new.shape)

(3, 2)

predictions = knn.predict(X_new)
print('Predictions: {}'.format(predictions))

Predictions: [1 0 0]
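
Since each prediction comes from a vote, it can help to look at how the vote was split. The sketch below uses `predict_proba`, which for `KNeighborsClassifier` with default uniform weights returns the fraction of the k neighbors in each class. The data is made up, with labels deliberately interleaved so that the votes are not unanimous.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up data: one feature, labels interleaved around the middle
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 1, 0, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

X_new = np.array([[3.1], [5.2]])
predictions = knn.predict(X_new)
# One row per new point, one column per class: share of the 3 votes
vote_shares = knn.predict_proba(X_new)

print(predictions)
print(vote_shares)
```

For the first point, two of the three nearest neighbors have label 0, so the prediction is 0 with a vote share of 2/3.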

Measuring model performance
In classification, accuracy is a commonly used metric
Accuracy = number of correct predictions / total number of observations
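
As a quick worked example, accuracy can be computed by hand and compared with scikit-learn's `accuracy_score`. The label arrays below are made up: four of the five predictions match, so both calculations should give 4/5.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up true labels and predictions (4 of 5 agree)
y_true = np.array([1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 1, 0])

# Correct predictions divided by total observations
manual_accuracy = (y_true == y_pred).sum() / len(y_true)
library_accuracy = accuracy_score(y_true, y_pred)

print(manual_accuracy, library_accuracy)
```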

Measuring model performance
How do we measure accuracy?
We could compute accuracy on the data used to fit the classifier

But this is NOT indicative of the model's ability to generalize to unseen data
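
A small experiment illustrates why. With k=1, every training point is its own nearest neighbor, so accuracy on the training data is perfect regardless of how the model behaves on new data. The sketch below uses a synthetic dataset from `make_classification` with some label noise; the dataset and its parameters are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data with 20% of labels randomly flipped (noise)
X, y = make_classification(n_samples=500, n_features=5, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=21, stratify=y
)

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

# Accuracy on the data used for fitting vs. accuracy on held-out data
train_accuracy = knn.score(X_train, y_train)
test_accuracy = knn.score(X_test, y_test)
print(train_accuracy, test_accuracy)
```

The training score says nothing about the noisy labels the model has memorized; only the held-out score reflects that.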

[Figure: Computing accuracy]

Train/test split
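from sklearn.model_selection import train_test_split
# Hold out 30% of the data for testing; stratify=y keeps the class
# proportions of y the same in the training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=21, stratify=y)
knn = KNeighborsClassifier(n_neighbors=6)
knn.fit(X_train, y_train)
# score() returns accuracy on the held-out test set
print(knn.score(X_test, y_test))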

0.8800599700149925
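
The effect of `stratify=y` is easiest to see on an imbalanced target. In this sketch the data is made up (90 observations of class 0 and 10 of class 1); after a stratified split, both subsets should keep the same 10% share of the minority class.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up imbalanced target: 90 zeros and 10 ones
y = np.array([0] * 90 + [1] * 10)
X = np.arange(100).reshape(-1, 1)  # placeholder feature, one column

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=21, stratify=y
)

# Share of the minority class in each subset
train_share = y_train.mean()
test_share = y_test.mean()
print(len(y_train), train_share)
print(len(y_test), test_share)
```

Without stratification, a random split could leave the test set with too few (or no) minority-class observations, which would make the accuracy estimate less reliable.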

Model complexity
Larger k = less complex model = can cause underfitting
Smaller k = more complex model = can lead to overfitting

Model complexity and over/underfitting
import numpy as np
train_accuracies = {}
test_accuracies = {}
neighbors = np.arange(1, 26)  # k = 1, 2, ..., 25
for neighbor in neighbors:
    knn = KNeighborsClassifier(n_neighbors=neighbor)
    knn.fit(X_train, y_train)
    # Record accuracy on both the training set and the test set
    train_accuracies[neighbor] = knn.score(X_train, y_train)
    test_accuracies[neighbor] = knn.score(X_test, y_test)
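
The loop above depends on data loaded earlier in the course. As a self-contained sketch, the version below runs the same loop on a synthetic dataset (an assumption, generated with `make_classification`) and then reads off the k with the highest test accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data with some label noise
X, y = make_classification(n_samples=600, n_features=5, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=21, stratify=y
)

train_accuracies = {}
test_accuracies = {}
for k in range(1, 26):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    train_accuracies[k] = knn.score(X_train, y_train)
    test_accuracies[k] = knn.score(X_test, y_test)

# k with the highest test accuracy (first one wins on ties)
best_k = max(test_accuracies, key=test_accuracies.get)
print(best_k, test_accuracies[best_k])
```

Training accuracy is highest at k=1 and tends to fall as k grows, while test accuracy typically peaks somewhere in between; the exact peak depends on the data.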

Plotting our results
import matplotlib.pyplot as plt
plt.figure(figsize=(8, 6))
plt.title("KNN: Varying Number of Neighbors")
# Convert the dict values to lists so they line up with neighbors
plt.plot(neighbors, list(train_accuracies.values()), label="Training Accuracy")
plt.plot(neighbors, list(test_accuracies.values()), label="Testing Accuracy")
plt.legend()
plt.xlabel("Number of Neighbors")
plt.ylabel("Accuracy")
plt.show()

[Figure: Model complexity curve]

