

Scikit-Learn Cheat Sheet: Python Machine Learning

A handy scikit-learn cheat sheet for machine learning with Python, including
code examples.

Most of you who are learning data science with Python will already have heard of
scikit-learn , the open source Python library that implements a wide variety of machine learning,
preprocessing, cross-validation and visualization algorithms with the help of a unified interface.

If you're still quite new to the field, you should be aware that machine learning, and thus also this
Python library, is a must-know for every aspiring data scientist.

That's why DataCamp has created a scikit-learn cheat sheet for those of you who have already
started learning about the Python package but still want a handy reference sheet. Or, if you still
have no idea how scikit-learn works, this machine learning cheat sheet might come in handy
for getting a quick first idea of the basics you need to know to get started.

Either way, we're sure that you're going to find it useful when you're tackling machine learning
problems!

This scikit-learn cheat sheet will introduce you to the basic steps that you need to go through to
implement machine learning algorithms successfully: you'll see how to load in your data, how to
preprocess it, how to create your own model to which you can fit your data and predict target labels,
how to validate your model and how to tune it further to improve its performance.


In short, this cheat sheet will kickstart your data science projects: with the help of code examples, you'll
have created, validated and tuned your machine learning models in no time.

So what are you waiting for? Time to get started!


Python For Data Science Cheat Sheet: Scikit-learn


Scikit-learn is an open source Python library that implements a range of machine learning,
preprocessing, cross-validation and visualization algorithms using a unified interface.

A Basic Example

>>> from sklearn import neighbors, datasets, preprocessing


>>> from sklearn.model_selection import train_test_split
>>> from sklearn.metrics import accuracy_score

>>> iris = datasets.load_iris()


>>> X, y = iris.data[:, :2], iris.target
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=33)
>>> scaler = preprocessing.StandardScaler().fit(X_train)
>>> X_train = scaler.transform(X_train)
>>> X_test = scaler.transform(X_test)
>>> knn = neighbors.KNeighborsClassifier(n_neighbors=5)
>>> knn.fit(X_train, y_train)
>>> y_pred = knn.predict(X_test)
>>> accuracy_score(y_test, y_pred)
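
Because every scikit-learn estimator exposes the same fit/predict interface, you can swap in a different model without changing the rest of the workflow. A minimal sketch, reusing the variables from the example above:

>>> from sklearn.svm import SVC
>>> svc = SVC(kernel='linear')
>>> svc.fit(X_train, y_train)                    # same fit() call as for the KNN classifier
>>> accuracy_score(y_test, svc.predict(X_test))  # same scoring call, different model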

Loading The Data

Your data needs to be numeric and stored as NumPy arrays or SciPy sparse matrices. Other types that
are convertible to numeric arrays, such as Pandas DataFrames, are also acceptable.

>>> import numpy as np


>>> X = np.random.random((10,5))
>>> y = np.array(['M','M','F','F','M','F','M','M','F','F'])   # one label per row of X
>>> X[X < 0.7] = 0
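
As noted above, estimators also accept other array-like inputs. A small sketch (the column names are arbitrary and only for illustration), reusing X and y from above and the knn estimator from the basic example:

>>> import pandas as pd
>>> from scipy import sparse
>>> X_df = pd.DataFrame(X, columns=['f1', 'f2', 'f3', 'f4', 'f5'])  # DataFrame view of X
>>> X_sparse = sparse.csr_matrix(X)                                 # sparse representation of X
>>> knn.fit(X_df, y)      # DataFrames and sparse matrices are converted internally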

Preprocessing The Data

Standardization

>>> from sklearn.preprocessing import StandardScaler


>>> scaler = StandardScaler().fit(X_train)
>>> standardized_X = scaler.transform(X_train)
>>> standardized_X_test = scaler.transform(X_test)

Normalization

>>> from sklearn.preprocessing import Normalizer


>>> scaler = Normalizer().fit(X_train)
>>> normalized_X = scaler.transform(X_train)
>>> normalized_X_test = scaler.transform(X_test)

Binarization


>>> from sklearn.preprocessing import Binarizer


>>> binarizer = Binarizer(threshold=0.0).fit(X)
>>> binary_X = binarizer.transform(X)

Encoding Categorical Features

>>> from sklearn.preprocessing import LabelEncoder


>>> enc = LabelEncoder()
>>> y = enc.fit_transform(y)
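
Note that LabelEncoder is intended for encoding target labels (y). For categorical input features, scikit-learn provides OneHotEncoder; a minimal sketch with a made-up feature matrix:

>>> from sklearn.preprocessing import OneHotEncoder
>>> X_cat = [['red', 'S'], ['blue', 'M'], ['red', 'L']]   # hypothetical categorical features
>>> ohe = OneHotEncoder()
>>> X_onehot = ohe.fit_transform(X_cat)                   # sparse one-hot encoded matrix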

Imputing Missing Values

>>> from sklearn.impute import SimpleImputer


>>> imp = SimpleImputer(missing_values=0, strategy='mean')   # successor to preprocessing.Imputer
>>> imp.fit_transform(X_train)

Generating Polynomial Features

>>> from sklearn.preprocessing import PolynomialFeatures


>>> poly = PolynomialFeatures(5)
>>> poly.fit_transform(X)

Training And Test Data

>>> from sklearn.model_selection import train_test_split


>>> X_train, X_test, y_train, y_test = train_test_split(X,y,random_state=0)
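
By default train_test_split holds out 25% of the samples for testing; you can control this explicitly with the test_size argument (the 0.3 below is just an example):

>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)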

Create Your Model

Supervised Learning Estimators

Linear Regression

>>> from sklearn.linear_model import LinearRegression


>>> lr = LinearRegression(normalize=True)

Support Vector Machines (SVM)


>>> from sklearn.svm import SVC


>>> svc = SVC(kernel='linear')

Naive Bayes

>>> from sklearn.naive_bayes import GaussianNB


>>> gnb = GaussianNB()

KNN

>>> from sklearn import neighbors


>>> knn = neighbors.KNeighborsClassifier(n_neighbors=5)

Unsupervised Learning Estimators

Principal Component Analysis (PCA)

>>> from sklearn.decomposition import PCA


>>> pca = PCA(n_components=0.95)
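
Passing a float such as 0.95 as n_components keeps however many components are needed to explain that fraction of the variance. After fitting you can check how much each retained component contributes:

>>> pca_model = pca.fit_transform(X_train)   # as in Model Fitting below
>>> pca.explained_variance_ratio_            # fraction of variance per retained component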

K Means

>>> from sklearn.cluster import KMeans


>>> k_means = KMeans(n_clusters=3, random_state=0)

Model Fitting

Supervised Learning

>>> lr.fit(X, y)
>>> knn.fit(X_train, y_train)
>>> svc.fit(X_train, y_train)

Unsupervised Learning

>>> k_means.fit(X_train)
>>> pca_model = pca.fit_transform(X_train)

Prediction

Supervised Estimators

>>> y_pred = svc.predict(np.random.random((2,5)))

>>> y_pred = lr.predict(X_test)

>>> y_pred = knn.predict_proba(X_test)   # class membership probabilities

Unsupervised Estimators

>>> y_pred = k_means.predict(X_test)
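
For K Means, the cluster assigned to each training sample is also stored on the fitted estimator, so no extra predict call is needed for the training data:

>>> k_means.labels_   # cluster index for every sample in X_train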

Evaluate Your Model's Performance

Classification Metrics

Accuracy Score

>>> knn.score(X_test, y_test)


>>> from sklearn.metrics import accuracy_score
>>> accuracy_score(y_test, y_pred)

Classification Report

>>> from sklearn.metrics import classification_report


>>> print(classification_report(y_test, y_pred))

Confusion Matrix

>>> from sklearn.metrics import confusion_matrix


>>> print(confusion_matrix(y_test, y_pred))

Regression Metrics

Mean Absolute Error

>>> from sklearn.metrics import mean_absolute_error


>>> y_true = [3, -0.5, 2]
>>> mean_absolute_error(y_true, y_pred)

Mean Squared Error

>>> from sklearn.metrics import mean_squared_error


>>> mean_squared_error(y_test, y_pred)

R2 Score

>>> from sklearn.metrics import r2_score


>>> r2_score(y_true, y_pred)

Clustering Metrics

Adjusted Rand Index

>>> from sklearn.metrics import adjusted_rand_score


>>> adjusted_rand_score(y_true, y_pred)

Homogeneity

>>> from sklearn.metrics import homogeneity_score


>>> homogeneity_score(y_true, y_pred)

V-measure

>>> from sklearn.metrics import v_measure_score


>>> v_measure_score(y_true, y_pred)

Cross-Validation

>>> from sklearn.model_selection import cross_val_score
>>> print(cross_val_score(knn, X_train, y_train, cv=4))


>>> print(cross_val_score(lr, X, y, cv=2))
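
cross_val_score returns one score per fold; a common pattern is to summarize them, for example (reusing the knn estimator from above):

>>> scores = cross_val_score(knn, X_train, y_train, cv=4)
>>> print(scores.mean(), scores.std())   # average score and spread across the 4 folds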

Tune Your Model

Grid Search

>>> from sklearn.model_selection import GridSearchCV

>>> params = {"n_neighbors": np.arange(1,3), "metric": ["euclidean", "cityblock"]}

>>> grid = GridSearchCV(estimator=knn, param_grid=params)

>>> grid.fit(X_train, y_train)

>>> print(grid.best_score_)

>>> print(grid.best_estimator_.n_neighbors)
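
The full winning parameter combination is also available on the fitted search object:

>>> print(grid.best_params_)   # e.g. the selected n_neighbors and metric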

Randomized Parameter Optimization

>>> from sklearn.model_selection import RandomizedSearchCV

>>> params = {"n_neighbors": range(1,5), "weights": ["uniform", "distance"]}

>>> rsearch = RandomizedSearchCV(estimator=knn,
...                              param_distributions=params,
...                              cv=4,
...                              n_iter=8,
...                              random_state=5)

>>> rsearch.fit(X_train, y_train)

>>> print(rsearch.best_score_)


Going Further
Begin with our scikit-learn tutorial for beginners, in which you'll learn in an easy, step-by-step way how
to explore handwritten digits data, how to create a model for it, how to fit your data to your model and
how to predict target values. In addition, you'll make use of Python's data visualization library
matplotlib to visualize your results.
