Scikit Learn - Quick Guide
In this chapter, we will understand what Scikit-Learn (Sklearn) is, the origin of Scikit-Learn and some other related topics such as the communities and contributors responsible for the development and maintenance of Scikit-Learn, its prerequisites, installation and its features.
Origin of Scikit-Learn
It was originally called scikits.learn and was initially developed by David Cournapeau as a Google Summer of Code project in 2007. Later, in 2010, Fabian Pedregosa, Gael Varoquaux, Alexandre Gramfort, and Vincent Michel, from INRIA (the French Institute for Research in Computer Science and Automation), took this project to another level and made the first public release (v0.1 beta) on 1st Feb. 2010.
Various organisations like Booking.com, JP Morgan, Evernote, Inria, AWeber, Spotify and many
more are using Sklearn.
Prerequisites
Before we start using the latest release of scikit-learn, we require the following −
Python (>= 3.5)
Pandas (>= 0.18.0) is required for some of the scikit-learn examples that use data structures and analysis.
Installation
If you have already installed NumPy and Scipy, the following are the two easiest ways to install scikit-learn −
Using pip
Following command can be used to install scikit-learn via pip −
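pip install -U scikit-learn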
Using conda
Following command can be used to install scikit-learn via conda −
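conda install scikit-learn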
On the other hand, if NumPy and Scipy are not yet installed on your Python workstation, you can install them by using either pip or conda.
Another option to use scikit-learn is to use Python distributions like Canopy and Anaconda
because they both ship the latest version of scikit-learn.
Features
Rather than focusing on loading, manipulating and summarising data, the Scikit-learn library is focused on modeling the data. Some of the most popular groups of models provided by Sklearn are as follows −
Supervised Learning algorithms − Almost all the popular supervised learning algorithms, like Linear Regression, Support Vector Machine (SVM), Decision Tree etc., are part of scikit-learn.
Unsupervised Learning algorithms − On the other hand, it also has all the popular
unsupervised learning algorithms from clustering, factor analysis, PCA (Principal Component
Analysis) to unsupervised neural networks.
Cross Validation − It is used to check the accuracy of supervised models on unseen data.
Dimensionality Reduction − It is used for reducing the number of attributes in data which can
be further used for summarisation, visualisation and feature selection.
Ensemble methods − As the name suggests, it is used for combining the predictions of multiple supervised models.
Feature extraction − It is used to extract the features from data to define the attributes in image
and text data.
Open Source − It is an open source library and is also commercially usable under the BSD license.
Dataset Loading
A collection of data is called a dataset. It has the following two components −
Features − The variables of data are called its features. They are also known as predictors,
inputs or attributes.
Feature matrix − It is the collection of features, in case there are more than one.
Response − It is the output variable that basically depends upon the feature variables. They are
also known as target, label or output.
Response Vector − It is used to represent response column. Generally, we have just one
response column.
Scikit-learn has a few example datasets like iris and digits for classification and the Boston house prices for regression.
Example
Following is an example to load iris dataset −
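from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
feature_names = iris.feature_names
target_names = iris.target_names
print("Feature names:", feature_names)
print("Target names:", target_names)
print("\nFirst 10 rows of X:\n", X[:10])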
Output
Feature names: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Target names: ['setosa' 'versicolor' 'virginica']
First 10 rows of X:
[
[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]
[4.6 3.1 1.5 0.2]
[5. 3.6 1.4 0.2]
[5.4 3.9 1.7 0.4]
[4.6 3.4 1.4 0.3]
[5. 3.4 1.5 0.2]
[4.4 2.9 1.4 0.2]
[4.9 3.1 1.5 0.1]
]
Example
The following example will split the data into 70:30 ratio, i.e. 70% data will be used as training
data and 30% will be used as testing data. The dataset is iris dataset as in above example.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
iris = load_iris()
X = iris.data
y = iris.target
# random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 1)
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
Output
(105, 4)
(45, 4)
(105,)
(45,)
As seen in the example above, it uses train_test_split() function of scikit-learn to split the
dataset. This function has the following arguments −
X, y − Here, X is the feature matrix and y is the response vector, which need to be split.
test_size − This represents the ratio of test data to the total given data. As in the above example, we are setting test_size = 0.3 for 150 rows of X. It will produce test data of 150*0.3 = 45 rows.
random_state − It is used to guarantee that the split will always be the same. This is useful in the situations where you want reproducible results.
Example
In the example below, we are going to use KNN (K nearest neighbors) classifier. Don’t go into the
details of KNN algorithms, as there will be a separate chapter for that. This example is used to
make you understand the implementation part only.
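One possible implementation − assuming a 60:40 train/test split with random_state = 1 and two made-up sample flowers for prediction, which together reproduce the output shown below −
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics
iris = load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.4, random_state = 1)
# Train a KNN classifier with 3 neighbours
classifier_knn = KNeighborsClassifier(n_neighbors = 3)
classifier_knn.fit(X_train, y_train)
y_pred = classifier_knn.predict(X_test)
# Finding accuracy by comparing actual response values (y_test) with predicted response values (y_pred)
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
# Providing sample data; the model will make predictions out of that data
sample = [[5, 5, 3, 2], [2, 4, 3, 5]]
preds = classifier_knn.predict(sample)
pred_species = [iris.target_names[p] for p in preds]
print("Predictions:", pred_species)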
Output
Accuracy: 0.9833333333333333
Predictions: ['versicolor', 'virginica']
Model Persistence
Once you train the model, it is desirable that the model should be persisted for future use so that we do not need to retrain it again and again. It can be done with the help of the dump and load features of the joblib package.
Consider the example below in which we will be saving the above trained model (classifier_knn)
for future use −
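import joblib   # in older scikit-learn versions: from sklearn.externals import joblib
joblib.dump(classifier_knn, 'iris_classifier_knn.joblib')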
The above code will save the model into file named iris_classifier_knn.joblib. Now, the object can
be reloaded from the file with the help of following code −
joblib.load('iris_classifier_knn.joblib')
Binarisation
This preprocessing technique is used when we need to convert our numerical values into
Boolean values.
Example
import numpy as np
from sklearn import preprocessing
input_data = np.array(
   [
      [2.1, -1.9, 5.5],
      [-1.5, 2.4, 3.5],
      [0.5, -7.9, 5.6],
      [5.9, 2.3, -5.8]
   ]
)
data_binarized = preprocessing.Binarizer(threshold = 0.5).transform(input_data)
print("\nBinarized data:\n", data_binarized)
In the above example, we used a threshold value of 0.5 and that is why all the values above 0.5 are converted to 1, and all the values below 0.5 are converted to 0.
Output
Binarized data:
[
[ 1. 0. 1.]
[ 0. 1. 1.]
[ 0. 0. 1.]
[ 1. 1. 0.]
]
Mean Removal
This technique is used to eliminate the mean from the feature vector so that every feature is centered on zero.
Example
import numpy as np
from sklearn import preprocessing
input_data = np.array(
   [
      [2.1, -1.9, 5.5],
      [-1.5, 2.4, 3.5],
      [0.5, -7.9, 5.6],
      [5.9, 2.3, -5.8]
   ]
)
#displaying the mean and the standard deviation of the input data
print("Mean =", input_data.mean(axis=0))
print("Stddeviation = ", input_data.std(axis=0))
#Removing the mean and the standard deviation of the input data
data_scaled = preprocessing.scale(input_data)
print("Mean_removed =", data_scaled.mean(axis=0))
print("Stddeviation_removed =", data_scaled.std(axis=0))
Output
Mean = [ 1.75 -1.275 2.2 ]
Stddeviation = [ 2.71431391 4.20022321 4.69414529]
Mean_removed = [ 1.11022302e-16 0.00000000e+00 0.00000000e+00]
Stddeviation_removed = [ 1. 1. 1.]
Scaling
We use this preprocessing technique for scaling the feature vectors. Scaling of feature vectors is
important, because the features should not be synthetically large or small.
Example
import numpy as np
from sklearn import preprocessing
input_data = np.array(
[
[2.1, -1.9, 5.5],
[-1.5, 2.4, 3.5],
[0.5, -7.9, 5.6],
[5.9, 2.3, -5.8]
]
)
data_scaler_minmax = preprocessing.MinMaxScaler(feature_range=(0,1))
data_scaled_minmax = data_scaler_minmax.fit_transform(input_data)
print ("\nMin max scaled data:\n", data_scaled_minmax)
Output
Min max scaled data:
[
[ 0.48648649 0.58252427 0.99122807]
[ 0. 1. 0.81578947]
[ 0.27027027 0. 1. ]
[ 1. 0.99029126 0. ]
]
Normalisation
We use this preprocessing technique for modifying the feature vectors. Normalisation of feature
vectors is necessary so that the feature vectors can be measured at common scale. There are
two types of normalisation as follows −
L1 Normalisation
It is also called Least Absolute Deviations. It modifies the values in such a manner that the sum of the absolute values always remains 1 in each row. The following example shows the implementation of L1 normalisation on input data.
Example
import numpy as np
from sklearn import preprocessing
input_data = np.array(
   [
      [2.1, -1.9, 5.5],
      [-1.5, 2.4, 3.5],
      [0.5, -7.9, 5.6],
      [5.9, 2.3, -5.8]
   ]
)
data_normalized_l1 = preprocessing.normalize(input_data, norm = 'l1')
print("\nL1 normalized data:\n", data_normalized_l1)
Output
L1 normalized data:
[
[ 0.22105263 -0.2 0.57894737]
[-0.2027027 0.32432432 0.47297297]
[ 0.03571429 -0.56428571 0.4 ]
[ 0.42142857 0.16428571 -0.41428571]
]
L2 Normalisation
It is also called Least Squares. It modifies the values in such a manner that the sum of the squares always remains 1 in each row. The following example shows the implementation of L2 normalisation on input data.
Example
import numpy as np
from sklearn import preprocessing
input_data = np.array(
[
[2.1, -1.9, 5.5],
[-1.5, 2.4, 3.5],
[0.5, -7.9, 5.6],
[5.9, 2.3, -5.8]
]
)
data_normalized_l2 = preprocessing.normalize(input_data, norm='l2')
print("\nL1 normalized data:\n", data_normalized_l2)
Output
L2 normalized data:
[
[ 0.33946114 -0.30713151 0.88906489]
[-0.33325106 0.53320169 0.7775858 ]
[ 0.05156558 -0.81473612 0.57753446]
[ 0.68706914 0.26784051 -0.6754239 ]
]
Data as table
The best way to represent data in Scikit-learn is in the form of tables. A table represents a 2-D grid of data where rows represent the individual elements of the dataset and the columns represent the quantities related to those individual elements.
Example
In the example given below, we download the iris dataset in the form of a Pandas DataFrame with the help of the python seaborn library.
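import seaborn as sns
iris = sns.load_dataset('iris')
iris.head()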
Output
sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa
From the above output, we can see that each row of the data represents a single observed flower and the number of rows represents the total number of flowers in the dataset. Generally, we refer to the rows of the matrix as samples.
On the other hand, each column of the data represents quantitative information describing each sample. Generally, we refer to the columns of the matrix as features.
As told earlier, the samples always represent the individual objects described by the dataset and the features represent the distinct observations that describe each sample in a quantitative manner.
Example
In the example below, from the iris dataset we predict the species of flower based on the other measurements. In this case, the Species column would be considered as the target and the remaining columns as the features.
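Using the same DataFrame, the features matrix and target vector can be built as follows −
X_iris = iris.drop('species', axis = 1)
print(X_iris.shape)
y_iris = iris['species']
print(y_iris.shape)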
Output
(150,4)
(150,)
Estimator API
In Scikit-learn, an object that learns from data is called an estimator. It can be used with any of the algorithms like classification, regression, clustering or even with a transformer that extracts useful features from raw data.
For fitting the data, all estimator objects expose a fit method that takes a dataset shown as
follows −
estimator.fit(data)
Next, all the parameters of an estimator can be set, as follows, when it is instantiated; they can later be inspected through the corresponding attributes.
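For instance (using SVC purely as an illustration − any estimator behaves the same way) −
from sklearn.svm import SVC
estimator = SVC(C = 1.0, kernel = 'linear')
estimator.C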
Once data is fitted with an estimator, parameters are estimated from the data at hand. Now, all
the estimated parameters will be the attributes of the estimator object ending by an underscore
as follows −
estimator.estimated_param_
fit
fit_predict if transductive
predict if inductive
Guiding Principles
While designing the Scikit-Learn API, the following guiding principles were kept in mind −
Consistency
This principle states that all the objects should share a common interface drawn from a limited
set of methods. The documentation should also be consistent.
Datasets should be represented in standard format like NumPy arrays, Pandas DataFrames,
SciPy sparse matrix.
Composition
As we know, ML algorithms can be expressed as a sequence of many fundamental algorithms. Scikit-learn makes use of these fundamental algorithms whenever needed.
Sensible defaults
According to this principle, the Scikit-learn library defines an appropriate default value whenever
ML models require user-specified parameters.
Inspection
As per this guiding principle, every specified parameter value is exposed as a public attribute.
In this step, we need to choose the model hyperparameters. This can be done by instantiating the class with the desired values.
Example
import seaborn as sns
iris = sns.load_dataset('iris')
X_iris = iris.drop('species', axis = 1)
X_iris.shape
Output
(150, 4)
Example
y_iris = iris['species']
y_iris.shape
Output
(150,)
Example
https://www.tutorialspoint.com/scikit_learn/scikit_learn_quick_guide.htm 17/111
06/10/2022, 18:24 Scikit Learn - Quick Guide
Now, for this regression example, we are going to use the following sample data −
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
rng = np.random.RandomState(35)
x = 10*rng.rand(40)
y = 2*x-1+rng.randn(40)
plt.scatter(x,y);
Output
So, we have the above data for our linear regression example.
Example
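Here, we first choose a class of model and instantiate it with the desired hyperparameters (the same lines appear in the complete code further below) −
from sklearn.linear_model import LinearRegression
model = LinearRegression(fit_intercept = True)
model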
Output
Example
X = x[:, np.newaxis]
X.shape
Output
(40, 1)
Model fitting
Once, we arrange the data, it is time to fit the model i.e. to apply our model to data. This can be
done with the help of fit() method as follows −
Example
model.fit(X, y)
Output
For this example, the below parameter shows the slope of the simple linear fit of the data −
Example
model.coef_
Output
array([1.99839352])
The below parameter represents the intercept of the simple linear fit to the data −
Example
model.intercept_
Output
-0.9895459457775022
Example
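Once trained, the model can be applied to new data and the fit visualised; a minimal sketch of this step (the xfit grid below is an assumed range) −
xfit = np.linspace(-1, 11)
Xfit = xfit[:, np.newaxis]
yfit = model.predict(Xfit)
plt.scatter(x, y)
plt.plot(xfit, yfit);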
Output
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
iris = sns.load_dataset('iris')
X_iris = iris.drop('species', axis = 1)
X_iris.shape
y_iris = iris['species']
y_iris.shape
rng = np.random.RandomState(35)
x = 10*rng.rand(40)
y = 2*x-1+rng.randn(40)
plt.scatter(x,y);
from sklearn.linear_model import LinearRegression
model = LinearRegression(fit_intercept=True)
model
X = x[:, np.newaxis]
X.shape
model.fit(X, y)
model.coef_
model.intercept_
Like the above given example, we can load and plot the random data from iris dataset. After that
we can follow the steps as below −
Example
from sklearn.decomposition import PCA
model = PCA(n_components=2)
model
Output
Model fitting
Example
model.fit(X_iris)
Output
X_2D = model.transform(X_iris)
Output
iris['PCA1'] = X_2D[:, 0]
iris['PCA2'] = X_2D[:, 1]
sns.lmplot("PCA1", "PCA2", hue = 'species', data = iris, fit_reg = False);
Output
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
iris = sns.load_dataset('iris')
X_iris = iris.drop('species', axis = 1)
X_iris.shape
y_iris = iris['species']
y_iris.shape
rng = np.random.RandomState(35)
x = 10*rng.rand(40)
y = 2*x-1+rng.randn(40)
plt.scatter(x,y);
from sklearn.decomposition import PCA
model = PCA(n_components=2)
model
model.fit(X_iris)
X_2D = model.transform(X_iris)
iris['PCA1'] = X_2D[:, 0]
iris['PCA2'] = X_2D[:, 1]
sns.lmplot("PCA1", "PCA2", hue='species', data=iris, fit_reg=False);
The APIs adopt simple conventions and the design choices have been guided in a manner to
avoid the proliferation of framework code.
Purpose of Conventions
The purpose of conventions is to make sure that the API sticks to the following broad principles −
Consistency − All the objects, whether they are basic or composite, must share a consistent interface which is further composed of a limited set of methods.
Various Conventions
The conventions available in Sklearn are explained below −
Type casting
It states that the input should be cast to float64. The following example, in which the sklearn.random_projection module is used to reduce the dimensionality of the data, explains it −
Example
import numpy as np
from sklearn import random_projection

rng = np.random.RandomState(0)
X = rng.rand(10, 2000)
X = np.array(X, dtype = 'float32')
X.dtype

transformer = random_projection.GaussianRandomProjection()
X_new = transformer.fit_transform(X)
X_new.dtype
Output
dtype('float32')
dtype('float64')
In the above example, we can see that X is float32 which is cast to float64 by fit_transform(X).
Example
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC
X, y = load_iris(return_X_y = True)
clf = SVC()
clf.set_params(kernel = 'linear').fit(X, y)
clf.predict(X[:5])
Output
array([0, 0, 0, 0, 0])
Once the estimator has been constructed, the above code changes the default kernel rbf to linear via SVC.set_params().
Now, the following code will change the kernel back to rbf to refit the estimator and to make a second prediction.
Example
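clf.set_params(kernel = 'rbf', gamma = 'scale').fit(X, y)
clf.predict(X[:5])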
Output
array([0, 0, 0, 0, 0])
Complete code
The following is the complete executable program −
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC
X, y = load_iris(return_X_y = True)
clf = SVC()
clf.set_params(kernel = 'linear').fit(X, y)
clf.predict(X[:5])
clf.set_params(kernel = 'rbf', gamma = 'scale').fit(X, y)
clf.predict(X[:5])
Example
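A sketch of multiclass fitting, assuming the small toy dataset used in the scikit-learn documentation and a OneVsRestClassifier wrapped around SVC −
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import LabelBinarizer

X = [[1, 2], [2, 4], [4, 5], [3, 2], [3, 1]]
y = [0, 0, 1, 1, 2]
classif = OneVsRestClassifier(estimator = SVC(gamma = 'scale', random_state = 0))
classif.fit(X, y).predict(X)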
Output
array([0, 0, 1, 1, 2])
In the above example, the classifier is fit on a one-dimensional array of multiclass labels and the predict() method hence provides the corresponding multiclass prediction. On the other hand, it is also possible to fit upon a two-dimensional array of binary label indicators as follows −
Example
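Continuing the sketch above, the same labels can first be binarised with LabelBinarizer (the exact indicator values depend on the data and estimator settings used) −
y = LabelBinarizer().fit_transform(y)
classif.fit(X, y).predict(X)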
Output
array(
[
[0, 0, 0],
[0, 0, 0],
[0, 1, 0],
[0, 1, 0],
[0, 0, 0]
]
)
Similarly, in case of multilabel fitting, an instance can be assigned multiple labels as follows −
Example
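Continuing the same sketch with MultiLabelBinarizer (the label lists below are assumed toy values) −
from sklearn.preprocessing import MultiLabelBinarizer
y = [[0, 1], [0, 2], [1, 3], [0, 2, 3], [2, 4]]
y = MultiLabelBinarizer().fit_transform(y)
classif.fit(X, y).predict(X)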
Output
array(
[
[1, 0, 1, 0, 0],
[1, 0, 1, 0, 0],
[1, 0, 1, 1, 0],
[1, 0, 1, 1, 0],
[1, 0, 1, 0, 0]
]
)
The following table lists out various linear models provided by Scikit-Learn −
1
Linear Regression
It is one of the best statistical models that studies the relationship between a dependent variable (Y) and a given set of independent variables (X).
2
Logistic Regression
3
Ridge Regression
4
Bayesian Ridge Regression
5
LASSO
6
Multi-task LASSO
It allows fitting multiple regression problems jointly, enforcing the selected features to be the same for all the regression problems, also called tasks. Sklearn provides a linear
model named MultiTaskLasso, trained with a mixed L1, L2-norm for regularisation,
which estimates sparse coefficients for multiple regression problems jointly.
7
Elastic-Net
8
Multi-task Elastic-Net
One such example is that a simple linear regression can be extended by constructing polynomial
features from the coefficients.
Mathematically, suppose we have standard linear regression model then for 2-D data it would
look like this −
Y = W0 + W1 X1 + W2 X2
Now, we can combine the features in second-order polynomials and our model will look like as
follows −
Y = W0 + W1 X1 + W2 X2 + W3 X1 X2 + W4 X1^2 + W5 X2^2
The above is still a linear model. Here, we saw that the resulting polynomial regression is in the
same class of linear models and can be solved similarly.
Parameters
The following table consists of the parameters used by the PolynomialFeatures module −
1
degree − integer, default = 2
2
interaction_only − Boolean, default = false
By default it is false, but if set to true, features that are products of at most degree distinct input features are produced. Such features are called interaction features.
3
include_bias − Boolean, default = true
It includes a bias column i.e. the feature in which all polynomials powers are zero.
4
order − str in {‘C’, ‘F’}, default = ‘C’
This parameter represents the order of output array in the dense case. ‘F’ order means
faster to compute but on the other hand, it may slow down subsequent estimators.
Attributes
The following table consists of the attributes used by the PolynomialFeatures module −
1
powers_ − array, shape (n_output_features, n_input_features)
It shows powers_ [i,j] is the exponent of the jth input in the ith output.
2
n_input_features _ − int
3
n_output_features _ − int
Implementation Example
The following Python script uses the PolynomialFeatures transformer to transform an array of 8 values, arranged into shape (4, 2), into its polynomial features −
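from sklearn.preprocessing import PolynomialFeatures
import numpy as np
Y = np.arange(8).reshape(4, 2)
poly = PolynomialFeatures(degree = 2)
poly.fit_transform(Y)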
Output
array(
[
[ 1., 0., 1., 0., 0., 1.],
[ 1., 2., 3., 4., 6., 9.],
[ 1., 4., 5., 16., 20., 25.],
[ 1., 6., 7., 36., 42., 49.]
]
)
Example
The Python script below uses Scikit-learn's Pipeline tools to streamline the preprocessing (it will fit to order-3 polynomial data).
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
import numpy as np
model = Pipeline([('poly', PolynomialFeatures(degree = 3)), ('linear', LinearRegression(fit_intercept = False))])
#Provide the size of array and order of polynomial data to fit the model.
x = np.arange(5)
y = 3 - 2 * x + x ** 2 - x ** 3
Stream_model = model.fit(x[:, np.newaxis], y)
Stream_model.named_steps['linear'].coef_
Output
array([ 3., -2., 1., -1.])
The above output shows that the linear model trained on polynomial features is able to recover
the exact input polynomial coefficients.
Stochastic Gradient Descent (SGD) is a simple yet efficient optimization algorithm used to find
the values of parameters/coefficients of functions that minimize a cost function. In other words, it
is used for discriminative learning of linear classifiers under convex loss functions such as SVM
and Logistic regression. It has been successfully applied to large-scale datasets because the
update to the coefficients is performed for each training instance, rather than at the end of
instances.
SGD Classifier
Stochastic Gradient Descent (SGD) classifier basically implements a plain SGD learning routine
supporting various loss functions and penalties for classification. Scikit-learn provides
SGDClassifier module to implement SGD classification.
Parameters
The following table consists of the parameters used by the SGDClassifier module −
1
loss − str, default = ‘hinge’
It represents the loss function to be used while implementing. The default value is
‘hinge’ which will give us a linear SVM. The other options which can be used are −
log − This loss will give us logistic regression i.e. a probabilistic classifier.
2
penalty − str, 'none', 'l2', 'l1', 'elasticnet'
It is the regularization term used in the model. By default, it is L2. We can use L1 or 'elasticnet' as well, but both might bring sparsity to the model (feature selection), which is not achievable with L2.
3
alpha − float, default = 0.0001
Alpha, the constant that multiplies the regularization term, is the tuning parameter that
decides how much we want to penalize the model. The default value is 0.0001.
4
l1_ratio − float, default = 0.15
This is called the ElasticNet mixing parameter. Its range is 0 < = l1_ratio < = 1. If
l1_ratio = 1, the penalty would be L1 penalty. If l1_ratio = 0, the penalty would be an L2
penalty.
5
fit_intercept − Boolean, Default=True
This parameter specifies that a constant (bias or intercept) should be added to the
decision function. No intercept will be used in calculation and data will be assumed
already centered, if it will set to false.
6
tol − float or none, optional, default = 1.e-3
This parameter represents the stopping criterion for iterations. Its default value is 1e-3; if it is not None, the iterations will stop when loss > best_loss - tol for n_iter_no_change successive epochs.
7
shuffle − Boolean, optional, default = True
This parameter represents that whether we want our training data to be shuffled after
each epoch or not.
8
verbose − integer, default = 0
9
epsilon − float, default = 0.1
This parameter specifies the width of the insensitive region. If loss = ‘epsilon-
insensitive’, any difference, between current prediction and the correct label, less than
the threshold would be ignored.
10
max_iter − int, optional, default = 1000
As name suggest, it represents the maximum number of passes over the epochs i.e.
training data.
11
warm_start − bool, optional, default = false
With this parameter set to True, we can reuse the solution of the previous call to fit as
initialization. If we choose default i.e. false, it will erase the previous solution.
12
random_state − int, RandomState instance or None, optional, default = none
This parameter represents the seed of the pseudo random number generated which is
used while shuffling the data. Followings are the options.
int − In this case, random_state is the seed used by random number generator.
13
n_jobs − int or none, optional, Default = None
It represents the number of CPUs to be used in OVA (One Versus All) computation, for
multi-class problems. The default value is none which means 1.
14
learning_rate − string, optional, default = ‘optimal’
15
eta0 − double, default = 0.0
It represents the initial learning rate for above mentioned learning rate options i.e.
‘constant’, ‘invscalling’, or ‘adaptive’.
16
power_t − double, default = 0.5
17
early_stopping − bool, default = False
This parameter represents the use of early stopping to terminate training when
validation score is not improving. Its default value is false but when set to true, it
automatically set aside a stratified fraction of training data as validation and stop
training when validation score is not improving.
18
validation_fraction − float, default = 0.1
It is only used when early_stopping is true. It represents the proportion of training data to set aside as the validation set for early termination of training.
19
n_iter_no_change − int, default=5
It represents the number of iteration with no improvement should algorithm run before
early stopping.
20
class_weight − dict, {class_label: weight} or 'balanced', or None, optional
This parameter represents the weights associated with classes. If not provided, the
classes are supposed to have weight 1.
21
warm_start − bool, optional, default = false
With this parameter set to True, we can reuse the solution of the previous call to fit as initialization. If we choose default i.e. false, it will erase the previous solution.
22
average − Boolean or int, optional, default = false
If set to True, it computes the averaged SGD weights and stores the result in the coef_ attribute.
Attributes
The following table consists of the attributes used by the SGDClassifier module −
1
coef_ − array, shape (1, n_features) if n_classes==2, else (n_classes, n_features)
2
intercept_ − array, shape (1,) if n_classes==2, else (n_classes,)
3
n_iter_ − int
Implementation Example
Like other classifiers, Stochastic Gradient Descent (SGD) has to be fitted with following two
arrays −
Example
import numpy as np
from sklearn import linear_model
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
Y = np.array([1, 1, 2, 2])
SGDClf = linear_model.SGDClassifier(max_iter = 1000, tol = 1e-3, penalty = "elasticnet")
SGDClf.fit(X, Y)
Output
SGDClassifier(
alpha = 0.0001, average = False, class_weight = None,
early_stopping = False, epsilon = 0.1, eta0 = 0.0, fit_intercept = True,
l1_ratio = 0.15, learning_rate = 'optimal', loss = 'hinge', max_iter = 1000,
n_iter = None, n_iter_no_change = 5, n_jobs = None, penalty = 'elasticnet',
power_t = 0.5, random_state = None, shuffle = True, tol = 0.001,
validation_fraction = 0.1, verbose = 0, warm_start = False
)
Example
Now, once fitted, the model can predict new values as follows −
SGDClf.predict([[2.,2.]])
Output
array([2])
Example
For the above example, we can get the weight vector with the help of following python script −
SGDClf.coef_
Output
array([[19.54811198, 9.77200712]])
Example
Similarly, we can get the value of intercept with the help of following python script −
SGDClf.intercept_
Output
array([10.])
Example
SGDClf.decision_function([[2., 2.]])
Output
array([68.6402382])
SGD Regressor
Stochastic Gradient Descent (SGD) regressor basically implements a plain SGD learning routine
supporting various loss functions and penalties to fit linear regression models. Scikit-learn
provides SGDRegressor module to implement SGD regression.
Parameters
Parameters used by SGDRegressor are almost the same as those used in the SGDClassifier module. The difference lies in the 'loss' parameter. For the SGDRegressor module's loss parameter, the possible values are as follows −
huber − It corrects the outliers by switching from squared to linear loss past a distance of epsilon. The work of 'huber' is to modify 'squared_loss' so that the algorithm focuses less on correcting outliers.
epsilon_insensitive − Actually, it ignores the errors less than epsilon.
Another difference is that the parameter named ‘power_t’ has the default value of 0.25 rather
than 0.5 as in SGDClassifier. Furthermore, it doesn’t have ‘class_weight’ and ‘n_jobs’
parameters.
Attributes
Attributes of SGDRegressor are also the same as those of the SGDClassifier module. In addition, it has three extra attributes as follows −
t_ − int
It provides the number of weight updates performed during the training phase.
Note − the attributes average_coef_ and average_intercept_ will work after enabling parameter
‘average’ to True.
Implementation Example
import numpy as np
from sklearn import linear_model
n_samples, n_features = 10, 5
rng = np.random.RandomState(0)
y = rng.randn(n_samples)
X = rng.randn(n_samples, n_features)
SGDReg =linear_model.SGDRegressor(
   max_iter = 1000, penalty = "elasticnet", loss = 'huber', tol = 1e-3, average = True
)
SGDReg.fit(X, y)
Output
SGDRegressor(
alpha = 0.0001, average = True, early_stopping = False, epsilon = 0.1,
eta0 = 0.01, fit_intercept = True, l1_ratio = 0.15,
learning_rate = 'invscaling', loss = 'huber', max_iter = 1000,
n_iter = None, n_iter_no_change = 5, penalty = 'elasticnet', power_t = 0.25,
random_state = None, shuffle = True, tol = 0.001, validation_fraction = 0.1,
verbose = 0, warm_start = False
)
Example
Now, once fitted, we can get the weight vector with the help of following python script −
SGDReg.coef_
Output
Example
Similarly, we can get the value of intercept with the help of following python script −
SGDReg.intercept_
Output
Example
We can get the number of weight updates during training phase with the help of the following
python script −
SGDReg.t_
Output
61.0
It is very easy to implement as there are lots of opportunities for code tuning.
Introduction
Support vector machines (SVMs) are powerful yet flexible supervised machine learning methods used for classification, regression, and outlier detection. SVMs are very efficient in high dimensional spaces and are generally used in classification problems. SVMs are popular and memory efficient because they use a subset of training points in the decision function.
The main goal of SVMs is to divide the datasets into a number of classes in order to find a maximum marginal hyperplane (MMH), which can be done in the following two steps −
Support Vector Machines will first generate hyperplanes iteratively that separate the classes in the best way.
After that, it will choose the hyperplane that segregates the classes correctly.
Support Vectors − They may be defined as the datapoints which are closest to the
hyperplane. Support vectors help in deciding the separating line.
Hyperplane − The decision plane or space that divides set of objects having different
classes.
Margin − The gap between two lines on the closest data points of different classes is called margin.
Following diagrams will give you an insight about these SVM concepts −
SVM in Scikit-learn supports both sparse and dense sample vectors as input.
Classification of SVM
Scikit-learn provides three classes namely SVC, NuSVC and LinearSVC which can perform multiclass classification.
SVC
It is C-support vector classification whose implementation is based on libsvm. The module used
by scikit-learn is sklearn.svm.SVC. This class handles the multiclass support according to one-
vs-one scheme.
Parameters
The following table consists of the parameters used by the sklearn.svm.SVC class −
1
C − float, optional, default = 1.0
2
kernel − string, optional, default = 'rbf'
This parameter specifies the type of kernel to be used in the algorithm. We can choose any one among 'linear', 'poly', 'rbf', 'sigmoid' and 'precomputed'. The default value of kernel is 'rbf'.
3
degree − int, optional, default = 3
It represents the degree of the 'poly' kernel function and will be ignored by all other kernels.
4
gamma − {'scale', 'auto'} or float, optional, default = 'scale'
If you choose the default i.e. gamma = 'scale', then the value of gamma to be used by SVC is 1/(n_features * X.var()).
5
coef0 − float, optional, default = 0.0
An independent term in the kernel function which is only significant in 'poly' and 'sigmoid'.
6
tol − float, optional, default = 1.e-3
7
shrinking − Boolean, optional, default = True
This parameter represents whether we want to use the shrinking heuristic or not.
8
verbose − Boolean, default = false
9
probability − Boolean, optional, default = false
This parameter enables or disables probability estimates. The default value is false, but it must be enabled before we call fit.
10
max_iter − int, optional, default = -1
As the name suggests, it represents the maximum number of iterations within the solver. Value -1 means there is no limit on the number of iterations.
11
cache_size − float, optional
This parameter will specify the size of the kernel cache. The value will be in MB (MegaBytes).
12
random_state − int, RandomState instance or None, optional, default = none
This parameter represents the seed of the pseudo random number generator which is used while shuffling the data. The following are the options −
int − In this case, random_state is the seed used by the random number generator.
13
class_weight − {dict, 'balanced'}, optional
This parameter will set the parameter C of class j to class_weight[j]*C for SVC. If we use the default option, it means all the classes are supposed to have weight one. On the other hand, if you choose class_weight:balanced, it will use the values of y to automatically adjust weights.
14
decision_function_shape − 'ovo', 'ovr', default = 'ovr'
This parameter will decide whether the algorithm will return 'ovr' (one-vs-rest) decision function of shape as all other classifiers, or the original ovo (one-vs-one) decision function of libsvm.
15
break_ties − boolean, optional, default = false
True − The predict will break ties according to the confidence values of decision_function.
False − The predict will return the first class among the tied classes.
Attributes
The following table consists of the attributes used by the sklearn.svm.SVC class −
1
support_ − array-like, shape = [n_SV]
2
support_vectors_ − array-like, shape = [n_SV, n_features]
3
n_support_ − array-like, dtype=int32, shape = [n_class]
4
dual_coef_ − array, shape = [n_class-1,n_SV]
These are the coefficient of the support vectors in the decision function.
5
coef_ − array, shape = [n_class * (n_class-1)/2, n_features]
This attribute, only available in case of linear kernel, provides the weight assigned to
the features.
6
intercept_ − array, shape = [n_class * (n_class-1)/2]
7
fit_status_ − int
8
classes_ − array of shape = [n_classes]
Implementation Example
Like other classifiers, SVC also has to be fitted with following two arrays −
import numpy as np
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])
from sklearn.svm import SVC
SVCClf = SVC(kernel = 'linear',gamma = 'scale', shrinking = False,)
SVCClf.fit(X, y)
Output
Example
Now, once fitted, we can get the weight vector with the help of following python script −
SVCClf.coef_
Output
array([[0.5, 0.5]])
Example
SVCClf.predict([[-0.5,-0.8]])
Output
array([1])
Example
SVCClf.n_support_
Output
array([1, 1])
Example
SVCClf.support_vectors_
Output
array(
[
[-1., -1.],
[ 1., 1.]
]
)
Example
SVCClf.support_
Output
array([0, 2])
Example
SVCClf.intercept_
Output
array([-0.])
Example
SVCClf.fit_status_
Output
NuSVC
NuSVC is Nu Support Vector Classification. It is another class provided by scikit-learn which can
perform multi-class classification. It is like SVC but NuSVC accepts slightly different sets of
parameters. The parameter which is different from SVC is as follows −
nu − float, optional, default = 0.5
It represents an upper bound on the fraction of training errors and a lower bound of the fraction of support vectors. Its value should be in the interval (0, 1].
Implementation Example
We can implement the same example using sklearn.svm.NuSVC class also.
import numpy as np
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])
from sklearn.svm import NuSVC
NuSVCClf = NuSVC(kernel = 'linear',gamma = 'scale', shrinking = False,)
NuSVCClf.fit(X, y)
Output
NuSVC(cache_size = 200, class_weight = None, coef0 = 0.0,
   decision_function_shape = 'ovr', degree = 3, gamma = 'scale', kernel = 'linear',
max_iter = -1, nu = 0.5, probability = False, random_state = None,
shrinking = False, tol = 0.001, verbose = False)
We can get the outputs of rest of the attributes as did in the case of SVC.
LinearSVC
It is Linear Support Vector Classification. It is similar to SVC having kernel = 'linear'. The difference between them is that LinearSVC is implemented in terms of liblinear while SVC is implemented in libsvm. That's the reason LinearSVC has more flexibility in the choice of penalties and loss functions. It also scales better to a large number of samples.
If we talk about its parameters and attributes then it does not support ‘kernel’ because it is
assumed to be linear and it also lacks some of the attributes like support_, support_vectors_,
n_support_, fit_status_ and, dual_coef_.
penalty − string, L1 or L2, default = 'l2'
This parameter is used to specify the norm (L1 or L2) used in penalization (regularization).
Implementation Example
Following Python script uses sklearn.svm.LinearSVC class −
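A sketch of this, assuming a synthetic dataset produced by make_classification −
from sklearn.svm import LinearSVC
from sklearn.datasets import make_classification
X, y = make_classification(n_features = 4, random_state = 0)
LSVCClf = LinearSVC(dual = False, random_state = 0, penalty = 'l1', tol = 1e-5)
LSVCClf.fit(X, y)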
Output
LinearSVC(C = 1.0, class_weight = None, dual = False, fit_intercept = True,
intercept_scaling = 1, loss = 'squared_hinge', max_iter = 1000,
   multi_class = 'ovr', penalty = 'l1', random_state = 0, tol = 1e-05, verbose = 0)
Example
Now, once fitted, the model can predict new values as follows −
LSVCClf.predict([[0,0,0,0]])
Output
[1]
Example
For the above example, we can get the weight vector with the help of following python script −
LSVCClf.coef_
Output
[[0. 0. 0.91214955 0.22630686]]
Example
Similarly, we can get the value of intercept with the help of following python script −
LSVCClf.intercept_
Output
[0.26860518]
Whereas, the model produced by SVR (Support Vector Regression) also only depends on a
subset of the training data. Why? Because the cost function for building the model ignores any
training data points close to the model prediction.
Scikit-learn provides three classes namely SVR, NuSVR and LinearSVR as three different
implementations of SVR.
SVR
It is Epsilon-support vector regression whose implementation is based on libsvm. As opposed to SVC, there are two free parameters in the model namely 'C' and 'epsilon'.
epsilon − float, optional, default = 0.1
It represents the epsilon in the epsilon-SVR model, and specifies the epsilon-tube within which no penalty is associated in the training loss function with points predicted within a distance epsilon from the actual value.
Implementation Example
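A minimal sketch, assuming two toy training points (which are consistent with the coefficient and prediction values shown further below) −
from sklearn import svm
X = [[1, 1], [2, 2]]
y = [1, 2]
SVRReg = svm.SVR(kernel = 'linear', gamma = 'auto')
SVRReg.fit(X, y)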
Output
SVR(C = 1.0, cache_size = 200, coef0 = 0.0, degree = 3, epsilon = 0.1, gamma = 'auto',
   kernel = 'linear', max_iter = -1, shrinking = True, tol = 0.001, verbose = False)
Example
Now, once fitted, we can get the weight vector with the help of following python script −
SVRReg.coef_
Output
array([[0.4, 0.4]])
Example
Similarly, we can get the value of other attributes as follows −
SVRReg.predict([[1,1]])
Output
array([1.1])
NuSVR
NuSVR is Nu Support Vector Regression. It is like NuSVC, but NuSVR uses a parameter nu to
control the number of support vectors. And moreover, unlike NuSVC where nu replaced C
parameter, here it replaces epsilon.
Implementation Example
The following Python script uses the sklearn.svm.NuSVR class −
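A sketch, assuming 20 random samples with 15 features (consistent with the 15 coefficients shown below) −
from sklearn.svm import NuSVR
import numpy as np
n_samples, n_features = 20, 15
np.random.seed(0)
y = np.random.randn(n_samples)
X = np.random.randn(n_samples, n_features)
NuSVRReg = NuSVR(kernel = 'linear', gamma = 'auto', C = 1.0, nu = 0.1)
NuSVRReg.fit(X, y)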
Output
NuSVR(C = 1.0, cache_size = 200, coef0 = 0.0, degree = 3, gamma = 'auto',
kernel = 'linear', max_iter = -1, nu = 0.1, shrinking = True, tol = 0.001,
verbose = False)
Example
Now, once fitted, we can get the weight vector with the help of following python script −
NuSVRReg.coef_
Output
array(
[
[-0.14904483, 0.04596145, 0.22605216, -0.08125403, 0.06564533,
0.01104285, 0.04068767, 0.2918337 , -0.13473211, 0.36006765,
-0.2185713 , -0.31836476, -0.03048429, 0.16102126, -0.29317051]
]
)
LinearSVR
It is Linear Support Vector Regression. It is similar to SVR having kernel = 'linear'. The difference between them is that LinearSVR is implemented in terms of liblinear, while SVR is implemented in libsvm. That's the reason LinearSVR has more flexibility in the choice of penalties and loss functions. It also scales better to a large number of samples.
If we talk about its parameters and attributes then it does not support ‘kernel’ because it is
assumed to be linear and it also lacks some of the attributes like support_, support_vectors_,
n_support_, fit_status_ and, dual_coef_.
loss − string, optional, default = 'epsilon_insensitive'
It represents the loss function, where the epsilon_insensitive loss is the L1 loss and the squared epsilon-insensitive loss is the L2 loss.
Implementation Example
Following Python script uses sklearn.svm.LinearSVR class −
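A sketch of this, assuming a synthetic dataset produced by make_regression −
from sklearn.svm import LinearSVR
from sklearn.datasets import make_regression
X, y = make_regression(n_features = 4, random_state = 0)
LSRReg = LinearSVR(dual = False, random_state = 0, loss = 'squared_epsilon_insensitive', tol = 1e-5)
LSRReg.fit(X, y)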
Output
LinearSVR(
C=1.0, dual=False, epsilon=0.0, fit_intercept=True,
intercept_scaling=1.0, loss='squared_epsilon_insensitive',
max_iter=1000, random_state=0, tol=1e-05, verbose=0
)
Example
Now, once fitted, the model can predict new values as follows −
LSRReg.predict([[0,0,0,0]])
Output
array([-0.01041416])
Example
For the above example, we can get the weight vector with the help of following python script −
LSRReg.coef_
Output
array([20.47354746, 34.08619401, 67.23189022, 87.47017787])
Example
Similarly, we can get the value of intercept with the help of following python script −
LSRReg.intercept_
Output
array([-0.01041416])
Anomaly detection is a technique used to identify data points in a dataset that do not fit well with the rest of the data. It has many applications in business such as fraud detection, intrusion detection, system health monitoring, surveillance, and predictive maintenance. Anomalies, which are also called outliers, can be divided into following three categories −
Methods
Two methods namely outlier detection and novelty detection can be used for anomaly
detection. It’s necessary to see the distinction between them.
Outlier detection
The training data contains outliers that are far from the rest of the data. Such outliers are defined
as observations. That’s the reason, outlier detection estimators always try to fit the region having
most concentrated training data while ignoring the deviant observations. It is also known as
unsupervised anomaly detection.
Novelty detection
It is concerned with detecting an unobserved pattern in new observations which is not included in
training data. Here, the training data is not polluted by the outliers. It is also known as semi-
supervised anomaly detection.
There is a set of ML tools, provided by scikit-learn, which can be used for both outlier detection as well as novelty detection. These tools first implement object learning from the data in an unsupervised way by using the fit() method as follows −
estimator.fit(X_train)
Now, the new observations would be sorted as inliers (labeled 1) or outliers (labeled -1) by using the predict() method as follows −
estimator.predict(X_test)
The estimator will first compute the raw scoring function and then the predict method will make use of a threshold on that raw scoring function. We can access this raw scoring function with the help of the score_samples method and can control the threshold by the contamination parameter.
We can also use the decision_function method that defines outliers as negative values and inliers as non-negative values.
estimator.decision_function(X_test)
Elliptic Envelope
This object fits a robust covariance estimate to the data, and thus, fits an ellipse to the central data points. It ignores the points outside the central mode.
Parameters
The following table consists of the parameters used by the sklearn.covariance.EllipticEnvelope method −
1
store_precision − Boolean, optional, default = True
2
assume_centered − Boolean, optional, default = False
If we set it False, it will compute the robust location and covariance directly with the help of the FastMCD algorithm. On the other hand, if set True, it will compute the support of robust location and covariance.
3
support_fraction − float in (0., 1.), optional, default = None
This parameter tells the method that how much proportion of points to be included in
the support of the raw MCD estimates.
4
contamination − float in (0., 1.), optional, default = 0.1
5
random_state − int, RandomState instance or None, optional, default = none
This parameter represents the seed of the pseudo random number generated which is
used while shuffling the data. Followings are the options −
int − In this case, random_state is the seed used by random number generator.
Attributes
The following table consists of the attributes used by the sklearn.covariance.EllipticEnvelope method −
1
support_ − array-like, shape(n_samples,)
2
location_ − array-like, shape (n_features)
3
covariance_ − array-like, shape (n_features, n_features)
4
precision_ − array-like, shape (n_features, n_features)
5
offset_ − float
It is used to define the decision function from the raw scores. decision_function =
score_samples -offset_
Implementation Example
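A minimal sketch, assuming 500 points drawn from a 2-D Gaussian and two query points, the second of which is an obvious outlier −
import numpy as np
from sklearn.covariance import EllipticEnvelope

true_cov = np.array([[0.8, 0.3],
                     [0.3, 0.4]])
X = np.random.RandomState(0).multivariate_normal(mean = [0, 0], cov = true_cov, size = 500)
cov = EllipticEnvelope(random_state = 0).fit(X)
# predict returns 1 for inliers and -1 for outliers
cov.predict([[0, 0], [3, 3]])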
Output
array([ 1, -1])
Isolation Forest
In case of high-dimensional dataset, one efficient way for outlier detection is to use random
forests. The scikit-learn provides the ensemble.IsolationForest method that isolates the observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of the selected feature.
Here, the number of splittings needed to isolate a sample is equivalent to the path length from the root
node to the terminating node.
Parameters
The following table consists of the parameters used by the sklearn.ensemble.IsolationForest method −
1
n_estimators − int, optional, default = 100
2
max_samples − int or float, optional, default = “auto”
It represents the number of samples to be drawn from X to train each base estimator. If
we choose int as its value, it will draw max_samples samples. If we choose float as its value, it will draw max_samples * X.shape[0] samples. And, if we choose auto as its value, it will draw max_samples = min(256, n_samples).
3
support_fraction − float in (0., 1.), optional, default = None
This parameter tells the method that how much proportion of points to be included in
the support of the raw MCD estimates.
4
contamination − auto or float, optional, default = auto
It provides the proportion of the outliers in the data set. If we set it default i.e. auto, it
will determine the threshold as in the original paper. If set to float, the range of
contamination will be in the range of [0,0.5].
5
random_state − int, RandomState instance or None, optional, default = none
This parameter represents the seed of the pseudo random number generated which is
used while shuffling the data. Followings are the options −
int − In this case, random_state is the seed used by random number generator.
6
max_features − int or float, optional (default = 1.0)
It represents the number of features to be drawn from X to train each base estimator. If we choose int as its value, it will draw max_features features. If we choose float as its value, it will draw max_features * X.shape[1] features.
7
bootstrap − Boolean, optional (default = False)
Its default option is False, which means the sampling would be performed without replacement. On the other hand, if set to True, it means individual trees are fit on a random subset of the training data sampled with replacement.
8
n_jobs − int or None, optional (default = None)
It represents the number of jobs to be run in parallel for fit() and predict() methods
both.
9
verbose − int, optional (default = 0)
10
warm_start − Bool, optional (default=False)
If warm_start = true, we can reuse previous calls solution to fit and can add more
estimators to the ensemble. But if is set to false, we need to fit a whole new forest.
Attributes
The following table consists of the attributes used by the sklearn.ensemble.IsolationForest method −
1
estimators_ − list of DecisionTreeClassifier
2
max_samples_ − integer
3
offset_ − float
It is used to define the decision function from the raw scores. decision_function =
score_samples -offset_
Implementation Example
The Python script below will use the sklearn.ensemble.IsolationForest method to fit 10 trees on the given data −
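One possible sketch, assuming a small made-up dataset (the data actually used in the original example is not shown) −
from sklearn.ensemble import IsolationForest
import numpy as np

# toy data: a few clustered points plus one far-away point
X = np.array([[-1, -2], [-3, -3], [-3, -4], [0, 0], [-50, 60]])
OUTDClf = IsolationForest(n_estimators = 10)
OUTDClf.fit(X)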
Output
IsolationForest(
behaviour = 'old', bootstrap = False, contamination='legacy',
max_features = 1.0, max_samples = 'auto', n_estimators = 10, n_jobs=None,
random_state = None, verbose = 0
)
Local Outlier Factor
Parameters
The following table consists of the parameters used by the sklearn.neighbors.LocalOutlierFactor method −
1
n_neighbors − int, optional, default = 20
It represents the number of neighbors used by default for the kneighbors query. All samples would be used if n_neighbors is larger than the number of samples provided.
2
algorithm − optional
If you choose auto, it will decide the most appropriate algorithm on the basis of
the value we passed to fit() method.
3
leaf_size − int, optional, default = 30
The value of this parameter can affect the speed of the construction and query. It also
affects the memory required to store the tree. This parameter is passed to BallTree or
KdTree algorithms.
4
contamination − auto or float, optional, default = auto
It provides the proportion of the outliers in the data set. If we set it default i.e. auto, it
will determine the threshold as in the original paper. If set to float, the range of
contamination will be in the range of [0,0.5].
5
metric − string or callable, default
6
P − int, optional (default = 2)
7
novelty − Boolean, (default = False)
By default, LOF algorithm is used for outlier detection but it can be used for novelty
detection if we set novelty = true.
8
n_jobs − int or None, optional (default = None)
It represents the number of jobs to be run in parallel for fit() and predict() methods
both.
Attributes
The following table consists of the attributes used by the sklearn.neighbors.LocalOutlierFactor method −
1
negative_outlier_factor_ − numpy array, shape(n_samples,)
2
n_neighbors_ − integer
3
offset_ − float
Implementation Example
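A sketch consistent with the output shown below, assuming three small sample points −
from sklearn.neighbors import NearestNeighbors
import numpy as np

samples = [[0., 0., 0.], [0., 0.5, 0.], [1., 1., 0.5]]
LOFneigh = NearestNeighbors(n_neighbors = 1, algorithm = "ball_tree", p = 1)
LOFneigh.fit(samples)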
Output
NearestNeighbors(
algorithm = 'ball_tree', leaf_size = 30, metric='minkowski',
metric_params = None, n_jobs = None, n_neighbors = 1, p = 1, radius = 1.0
)
Example
Now, we can ask this constructed classifier for the closest point to [0.5, 1., 1.5] by using the following python script −
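print(LOFneigh.kneighbors([[0.5, 1., 1.5]]))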
Output
One-Class SVM
The One-Class SVM, introduced by Schölkopf et al., is an unsupervised method for outlier detection. It is
also very efficient with high-dimensional data and estimates the support of a high-dimensional
distribution. It is implemented in the Support Vector Machines module, in the
sklearn.svm.OneClassSVM object. For defining a frontier, it requires a kernel (RBF is the most
commonly used) and a scalar parameter.
For better understanding let's fit our data with svm.OneClassSVM object −
Example
from sklearn.svm import OneClassSVM
X = [[0], [0.89], [0.90], [0.91], [1]]
OSVMclf = OneClassSVM(gamma = 'scale').fit(X)
OSVMclf.score_samples(X)
Output
array([1.12218594, 1.58645126, 1.58673086, 1.58645127, 1.55713767])
Neighbor-based learning methods are of both types, namely supervised and unsupervised.
Supervised neighbors-based learning can be used for both classification and regression
predictive problems, but it is mainly used for classification predictive problems in industry.
Neighbors-based learning methods do not have a specialised training phase; they use all the
training data while classifying, and they do not assume anything about the underlying data.
That’s the reason they are lazy and non-parametric in nature.
The basic idea is to find a predefined number of training samples closest in distance to the new
data point and to predict the label from these training samples.
Here, the number of samples can be a user-defined constant, as in K-nearest neighbor learning,
or vary based on the local density of points, as in radius-based neighbor learning.
sklearn.neighbors Module
Scikit-learn has the sklearn.neighbors module, which provides functionality for both unsupervised
and supervised neighbors-based learning methods. As input, the classes in this module can
handle either NumPy arrays or scipy.sparse matrices.
Types of algorithms
Different types of algorithms which can be used in neighbor-based methods’ implementation are
as follows −
Brute Force
The brute-force computation of distances between all pairs of points in the dataset provides the
most naïve neighbor search implementation. Mathematically, for N samples in D dimensions, the
brute-force approach scales as O[DN²].
For small data samples, this algorithm can be very useful, but it becomes infeasible as the
number of samples grows. Brute force neighbor search can be enabled by writing the keyword
algorithm = ’brute’.
K-D Tree
One of the tree-based data structures that have been invented to address the computational
inefficiencies of the brute-force approach is the KD tree data structure. Basically, the KD tree is a
binary tree structure which is called a K-dimensional tree. It recursively partitions the parameter
space along the data axes by dividing it into nested orthotropic regions into which the data
points are filed.
Advantages
Following are some advantages of K-D tree algorithm −
Construction is fast − As the partitioning is performed only along the data axes, K-D tree
construction is very fast.
Fewer distance computations − This algorithm needs only O[log(N)] distance computations to
determine the nearest neighbor of a query point.
Disadvantages
Fast only for low-dimensional neighbor searches − It is very fast for low-dimensional (D < 20)
neighbor searches, but it becomes inefficient as D grows, because the partitioning is
performed only along the data axes.
K-D tree neighbor searches can be enabled by writing the keyword algorithm=’kd_tree’.
Ball Tree
As we know that the KD tree is inefficient in higher dimensions, the Ball tree data structure was
developed to address this inefficiency. Mathematically, it recursively divides the data into nodes
defined by a centroid C and radius r, in such a way that each point in the node lies within the
hyper-sphere defined by that centroid C and radius r. It uses the triangle inequality, given below,
which reduces the number of candidate points for a neighbor search −
|X + Y| ≤ |X| + |Y|
Advantages
Following are some advantages of Ball Tree algorithm −
Efficient on highly structured data − As the ball tree partitions the data into a series of nesting
hyper-spheres, it is efficient on highly structured data.
Out-performs KD-tree − The ball tree out-performs the KD tree in high dimensions because of the
spherical geometry of its nodes.
Disadvantages
Costly − Partitioning the data into a series of nesting hyper-spheres makes its construction very
costly.
Ball tree neighbor searches can be enabled by writing the keyword algorithm=’ball_tree’.
The query time of KD tree algorithm changes with D in a strange manner that is very difficult
to characterize. When D < 20, the cost is O[D log(N)] and this algorithm is very efficient. On
the other hand, it is inefficient in case when D > 20 because the cost increases to nearly
O[DN].
Data Structure
Another factor that affects the performance of these algorithms is the intrinsic dimensionality of the
data or the sparsity of the data. This is because the query times of the Ball tree and KD tree
algorithms can be greatly influenced by it, whereas the query time of the Brute Force algorithm is
unchanged by the data structure. Generally, the Ball tree and KD tree algorithms produce faster
query times when implemented on sparser data with smaller intrinsic dimensionality.
Step 1
In this step, it computes and stores the k nearest neighbors for each sample in the training set.
Step 2
In this step, for an unlabeled sample, it retrieves the k nearest neighbors from dataset. Then
among these k-nearest neighbors, it predicts the class through voting (class with majority votes
wins).
The module, sklearn.neighbors that implements the k-nearest neighbors algorithm, provides the
functionality for unsupervised as well as supervised neighbors-based learning methods.
The unsupervised nearest neighbors implement different algorithms (BallTree, KDTree or Brute
Force) to find the nearest neighbor(s) for each sample. This unsupervised version is basically
only step 1, which is discussed above, and the foundation of many algorithms (KNN and K-
means being the famous ones) which require the neighbor search. In simple words, it is an
unsupervised learner for implementing neighbor searches.
On the other hand, the supervised neighbors-based learning is used for classification as well as
regression.
Scikit-learn module
sklearn.neighbors.NearestNeighbors is the module used to implement unsupervised nearest
neighbor learning. It uses specific nearest neighbor algorithms named BallTree, KDTree or Brute
Force. In other words, it acts as a uniform interface to these three algorithms.
Parameters
Following table consists of the parameters used by the NearestNeighbors module −
1
n_neighbors − int, optional
2
radius − float, optional
3
algorithm − {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, optional
This parameter will take the algorithm (BallTree, KDTree or Brute-force) you want to
use to compute the nearest neighbors. If you will provide ‘auto’, it will attempt to decide
the most appropriate algorithm based on the values passed to fit method.
4
leaf_size − int, optional
It can affect the speed of the construction & query as well as the memory required to
store the tree. It is passed to BallTree or KDTree. Although the optimal value depends
on the nature of the problem, its default value is 30.
5
metric − string or callable
It is the metric to use for distance computation between points. We can pass it as a
string or callable function. In case of callable function, the metric is called on each pair
of rows and the resulting value is recorded. It is less efficient than passing the metric
name as a string.
We can choose the metric from scikit-learn or scipy.spatial.distance. The valid values
are as follows −
Scikit-learn − [‘cosine’, ‘manhattan’, ‘euclidean’, ‘l1’, ‘l2’, ‘cityblock’]
Scipy.spatial.distance −
[‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘dice’, ‘hamming’, ‘jaccard’,
‘correlation’, ‘kulsinski’, ‘mahalanobis’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’,
‘sokalmichener’, ‘sokalsneath’, ‘seuclidean’, ‘sqeuclidean’, ‘yule’].
6
P − integer, optional
It is the parameter for the Minkowski metric. The default value is 2, which is equivalent
to using the Euclidean distance (l2).
7
metric_params − dict, optional
This is the additional keyword arguments for the metric function. The default value is
None.
8
n_jobs − int or None, optional
It represents the number of parallel jobs to run for the neighbor search. The default value is
None.
Implementation Example
The example below will find the nearest neighbors between two sets of data by using the
sklearn.neighbors.NearestNeighbors module.
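The import lines are not reproduced on this page; presumably something like −

import numpy as np
from sklearn.neighbors import NearestNeighbors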
Now, after importing the packages, define the sets of data between which we want to find the nearest
neighbors −
Input_data = np.array([[-1, 1], [-2, 2], [-3, 3], [1, 2], [2, 3], [3, 4], [4, 5]])
nrst_neigh = NearestNeighbors(n_neighbors = 3, algorithm = 'ball_tree')   # n_neighbors = 3 matches the output below; the algorithm choice is an assumption
nrst_neigh.fit(Input_data)
Now, find the K-neighbors of data set. It will return the indices and distances of the neighbors of
each point.
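The query itself is not shown on the page; with the estimator fitted above it would be −

distances, indices = nrst_neigh.kneighbors(Input_data)
indices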
Output
array(
[
[0, 1, 3],
[1, 2, 0],
[2, 1, 0],
[3, 4, 0],
[4, 5, 3],
[5, 6, 4],
[6, 5, 4]
], dtype = int64
)
distances
Output
array(
[
[0. , 1.41421356, 2.23606798],
[0. , 1.41421356, 1.41421356],
[0. , 1.41421356, 2.82842712],
[0. , 1.41421356, 2.23606798],
[0. , 1.41421356, 1.41421356],
[0. , 1.41421356, 1.41421356],
[0. , 1.41421356, 2.82842712]
]
)
The above output shows that the nearest neighbor of each point is the point itself i.e. at zero. It is
because the query set matches the training set.
Example
We can also show a connection between neighboring points by producing a sparse graph as
follows −
nrst_neigh.kneighbors_graph(Input_data).toarray()
Output
array(
[
[1., 1., 0., 1., 0., 0., 0.],
[1., 1., 1., 0., 0., 0., 0.],
[1., 1., 1., 0., 0., 0., 0.],
[1., 0., 0., 1., 1., 0., 0.],
[0., 0., 0., 1., 1., 1., 0.],
[0., 0., 0., 0., 1., 1., 1.],
[0., 0., 0., 0., 1., 1., 1.]
]
)
Once we fit the unsupervised NearestNeighbors model, the data will be stored in a data
structure based on the value set for the argument ‘algorithm’. After that we can use this
unsupervised learner’s kneighbors in a model which requires neighbor searches.
In nearest neighbor classification, the class of a query point is computed from a simple majority vote of
the nearest neighbors of each point.
The classifier simply stores instances of the training data; that’s why it is a type of non-generalizing
learning.
Scikit-learn modules
Following are the two different types of nearest neighbor classifiers used by scikit-learn −
1. KNeighborsClassifier
The K in the name of this classifier represents the k nearest neighbors, where k is an
integer value specified by the user. Hence as the name suggests, this classifier
implements learning based on the k nearest neighbors. The choice of the value of k is
dependent on data.
2. RadiusNeighborsClassifier
The Radius in the name of this classifier represents the nearest neighbors within a
specified radius r, where r is a floating-point value specified by the user. Hence, as the
name suggests, this classifier implements learning based on the number of neighbors
within a fixed radius r of each training point (a brief sketch of both classifiers follows this list).
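A minimal sketch of both classifiers on toy data (the data and parameter values here are illustrative assumptions, not taken from the guide) −

from sklearn.neighbors import KNeighborsClassifier, RadiusNeighborsClassifier

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]

# k-nearest neighbors: vote among the k = 3 closest training samples
knn_clf = KNeighborsClassifier(n_neighbors = 3).fit(X, y)
print(knn_clf.predict([[1.1]]))

# radius neighbors: vote among all training samples within radius r = 1.0
rad_clf = RadiusNeighborsClassifier(radius = 1.0).fit(X, y)
print(rad_clf.predict([[1.1]]))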
Following are the two different types of nearest neighbor regressors used by scikit-learn −
KNeighborsRegressor
The K in the name of this regressor represents the k nearest neighbors, where k is an integer
value specified by the user. Hence, as the name suggests, this regressor implements learning
based on the k nearest neighbors. The choice of the value of k is dependent on data. Let’s
understand it more with the help of an implementation example.
Implementation Example
In this example, we will be implementing KNN regression on the Iris Flower data set by using the
scikit-learn KNeighborsRegressor.
Now, we need to split the data into training and testing data. We will be using the Sklearn
train_test_split function to split the data in the ratio of 80% (training data) and 20% (testing data) −
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data[:, :4]
y = iris.target
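The split call is not reproduced here; it is the same train_test_split call shown in the RadiusNeighborsRegressor example below −

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)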
Next, we will be doing data scaling with the help of Sklearn preprocessing module as follows −
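These are the scaling lines (the same ones collected at the end of this example) −

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)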
Next, import the KNeighborsRegressor class from Sklearn and provide the value of neighbors
as follows.
Example
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
knnr = KNeighborsRegressor(n_neighbors = 8)
knnr.fit(X_train, y_train)
Output
KNeighborsRegressor(
algorithm = 'auto', leaf_size = 30, metric = 'minkowski',
metric_params = None, n_jobs = None, n_neighbors = 8, p = 2,
weights = 'uniform'
)
Example
Now, we can find the MSE (Mean Squared Error) as follows −
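The computation itself is not reproduced on the page; a sketch, mirroring the pattern of the full script in the radius-based example below −

print("The MSE is:", format(np.power(y - knnr.predict(X), 2).mean()))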
Output
The MSE is: 4.4333349609375
Example
Now, use it to predict the value as follows −
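The prediction step is not reproduced; a small sketch consistent with the output below (the toy data is an assumption) −

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
from sklearn.neighbors import KNeighborsRegressor
knnr = KNeighborsRegressor(n_neighbors = 3)
knnr.fit(X, y)
print(knnr.predict([[2.5]]))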
Output
[0.66666667]
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
knnr = KNeighborsRegressor(n_neighbors=8)
knnr.fit(X_train, y_train)
RadiusNeighborsRegressor
The Radius in the name of this regressor represents the nearest neighbors within a specified
radius r, where r is a floating-point value specified by the user. Hence, as the name suggests, this
regressor implements learning based on the number of neighbors within a fixed radius r of each
training point. Let’s understand it more with the help of an implementation example −
Implementation Example
In this example, we will be implementing KNN regression on the Iris Flower data set by using the
scikit-learn RadiusNeighborsRegressor −
Now, we need to split the data into training and testing data. We will be using the Sklearn
train_test_split function to split the data in the ratio of 80% (training data) and 20% (testing data) −
X = iris.data[:, :4]
y = iris.target
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
Next, we will be doing data scaling with the help of Sklearn preprocessing module as follows −
Next, import the RadiusNeighborsRegressor class from Sklearn and provide the value of radius
as follows −
import numpy as np
from sklearn.neighbors import RadiusNeighborsRegressor
knnr_r = RadiusNeighborsRegressor(radius=1)
knnr_r.fit(X_train, y_train)
Example
Now, we can find the MSE (Mean Squared Error) as follows −
Output
The MSE is: 5.666666666666667
Example
Now, use it to predict the value as follows −
Output
[1.]
iris = load_iris()
X = iris.data[:, :4]
y = iris.target
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
import numpy as np
from sklearn.neighbors import RadiusNeighborsRegressor
knnr_r = RadiusNeighborsRegressor(radius = 1)
knnr_r.fit(X_train, y_train)
print ("The MSE is:",format(np.power(y-knnr_r.predict(X),2).mean()))
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
from sklearn.neighbors import RadiusNeighborsRegressor
knnr_r = RadiusNeighborsRegressor(radius = 1)
knnr_r.fit(X, y)
print(knnr_r.predict([[2.5]]))
Bayes theorem states the following relationship in order to find the posterior probability of a class,
i.e. the probability of a class given the observed features −
P(Y | features) = ( P(Y) P(features | Y) ) / P(features)
Scikit-learn provides different naïve Bayes classifier models, namely Gaussian, Multinomial,
Complement and Bernoulli. All of them differ mainly by the assumption they make regarding the
distribution of P(features | Y), i.e. the probability of features given a class −
1
Gaussian Naïve Bayes − It assumes that the data from each label is drawn from
a simple Gaussian distribution.
2
Multinomial Naïve Bayes − It assumes that the features are drawn from a simple Multinomial distribution.
3
Complement Naïve Bayes − It was designed to correct the severe assumptions made by the
Multinomial Naïve Bayes classifier and is suitable for imbalanced data sets.
4
Bernoulli Naïve Bayes − The assumption in this model is that the features are binary (0s and 1s) in
nature. An application of Bernoulli Naïve Bayes classification is text classification with the ‘bag of
words’ model.
Example
Import Sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
data = load_breast_cancer()
label_names = data['target_names']
labels = data['target']
feature_names = data['feature_names']
features = data['data']
print(label_names)
print(labels[0])
print(feature_names[0])
print(features[0])
train, test, train_labels, test_labels = train_test_split(
features,labels,test_size = 0.40, random_state = 42
)
from sklearn.naive_bayes import GaussianNB
GNBclf = GaussianNB()
model = GNBclf.fit(train, train_labels)
preds = GNBclf.predict(test)
print(preds)
Output
[
1 0 0 1 1 0 0 0 1 1 1 0 1 0 1 0 1 1 1 0 1 1 0 1 1 1 1
1 1 0 1 1 1 1 1 1 0 1 0 1 1 0 1 1 1 1 1 1 1 1 0 0 1 1
1 1 1 0 0 1 1 0 0 1 1 1 0 0 1 1 0 0 1 0 1 1 1 1 1 1 0
1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 1 0 0 1 0 0 1 1 1 0
1 1 0 1 1 0 0 0 1 1 1 0 0 1 1 0 1 0 0 1 1 0 0 0 1 1 1
0 1 1 0 0 1 0 1 1 0 1 0 0 1 1 1 1 1 1 1 0 0 1 1 1 1 1
1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 1 1 1 1 1 0 0 0 1 1 0
1 0 1 1 1 1 0 1 1 0 1 1 1 0 1 0 0 1 1 1 1 1 1 1 1 0 1
1 1 1 1 0 1 0 0 1 1 0 1
]
The above output consists of a series of 0s and 1s which are basically the predicted values from
tumor classes namely malignant and benign.
Decision trees (DTs) are a powerful non-parametric supervised learning method. They
can be used for classification and regression tasks. The main goal of DTs is to create a model
predicting a target variable's value by learning simple decision rules deduced from the data features.
Decision trees have two main entities: one is the root node, where the data splits, and the other is
the decision nodes or leaves, where we get the final output.
ID3
It was developed by Ross Quinlan in 1986. It is also called Iterative Dichotomiser 3. The main
goal of this algorithm is to find, for every node, those categorical features that will yield the largest
information gain for categorical targets.
It lets the tree grow to its maximum size and then, to improve the tree’s ability on unseen
data, applies a pruning step. The output of this algorithm would be a multiway tree.
C4.5
It is the successor to ID3 and dynamically defines a discrete attribute that partitions the continuous
attribute values into a discrete set of intervals. That’s the reason it removed the restriction of
categorical features. It converts the ID3-trained tree into sets of ‘IF-THEN’ rules.
In order to determine the sequence in which these rules should be applied, the accuracy of each rule
will be evaluated first.
C5.0
It works similarly to C4.5 but uses less memory and builds smaller rulesets. It is more accurate
than C4.5.
CART
It is called the Classification and Regression Trees algorithm. It basically generates binary splits by
using the feature and threshold yielding the largest information gain at each node (measured by the
Gini index).
Homogeneity depends upon the Gini index: the higher the value of the Gini index, the higher the
homogeneity. It is like the C4.5 algorithm, but the difference is that it does not compute rule sets and
it does support numerical target variables (regression).
Sklearn Module − The Scikit-learn library provides the module name DecisionTreeClassifier for
performing multiclass classification on dataset.
Parameters
Following table consists of the parameters used by sklearn.tree.DecisionTreeClassifier module −
1
criterion − string, optional default= “gini”
It represents the function to measure the quality of a split. Supported criteria are “gini”
and “entropy”. The default is gini which is for Gini impurity while entropy is for the
information gain.
2
splitter − string, optional default= “best”
It tells the model, which strategy from “best” or “random” to choose the split at each
node.
3
max_depth − int or None, optional default=None
This parameter decides the maximum depth of the tree. The default value is None,
which means the nodes will expand until all leaves are pure or until all leaves contain
less than min_samples_split samples.
4
min_samples_split − int, float, optional default=2
This parameter provides the minimum number of samples required to split an internal
node.
5
min_samples_leaf − int, float, optional default=1
6
min_weight_fraction_leaf − float, optional default=0.
With this parameter, the model will get the minimum weighted fraction of the sum of
weights required to be at a leaf node.
7
max_features − int, float, string or None, optional default=None
It gives the model the number of features to be considered when looking for the best
split.
8
random_state − int, RandomState instance or None, optional, default = none
This parameter represents the seed of the pseudo-random number generator used
while shuffling the data. Following are the options −
int − In this case, random_state is the seed used by the random number generator.
None − In this case, the random number generator is the RandomState instance
used by np.random.
9
max_leaf_nodes − int or None, optional default=None
This parameter will let grow a tree with max_leaf_nodes in best-first fashion. The
default is none which means there would be unlimited number of leaf nodes.
10
min_impurity_decrease − float, optional default=0.
This value works as a criterion for a node to split because the model will split a node if
this split induces a decrease of the impurity greater than or equal to
min_impurity_decrease value.
11
min_impurity_split − float, default=1e-7
12
class_weight − dict, list of dicts, “balanced” or None, default=None
It represents the weights associated with classes. The form is {class_label: weight}. If
we use the default option, it means all the classes are supposed to have weight one.
On the other hand, if you choose class_weight: balanced, it will use the values of y to
automatically adjust weights.
13
presort − bool, optional default=False
It tells the model whether to presort the data to speed up the finding of best splits in
fitting. The default is false, but if set to true, it may slow down the training process.
Attributes
Following table consists of the attributes used by sklearn.tree.DecisionTreeClassifier module −
1
feature_importances_ − array of shape =[n_features]
2
classes_ − array of shape = [n_classes] or a list of such arrays
It represents the class labels, i.e. the single output problem, or a list of arrays of
class labels, i.e. the multi-output problem.
3
max_features_ − int
4
n_classes_ − int or list
It represents the number of classes i.e. the single output problem, or a list of number of
classes for every output i.e. multi-output problem.
5
n_features_ − int
6
n_outputs_ − int
Methods
Following table consists of the methods used by sklearn.tree.DecisionTreeClassifier module −
1
apply(self, X[, check_input])
2
decision_path(self, X[, check_input])
As name suggests, this method will return the decision path in the tree
3
fit(self, X, y[, sample_weight, …])
fit() method will build a decision tree classifier from given training set (X, y).
4
get_depth(self)
As name suggests, this method will return the depth of the decision tree
5
get_n_leaves(self)
As name suggests, this method will return the number of leaves of the decision tree.
6
get_params(self[, deep])
7
predict(self, X[, check_input])
8
predict_log_proba(self, X)
9
predict_proba(self, X[, check_input])
10
score(self, X, y[, sample_weight])
As the name implies, the score() method will return the mean accuracy on the given
test data and labels.
11
set_params(self, \*\*params)
Implementation Example
The Python script below will use the sklearn.tree.DecisionTreeClassifier module to construct a
classifier for predicting male or female from our data set having 25 samples and two features,
namely ‘height’ and ‘length of hair’ −
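The original 25-sample data set is not reproduced on this page; the sketch below uses a small illustrative stand-in data set (the height and hair-length values are assumptions) so that the later predictions can be followed −

from sklearn import tree

# illustrative stand-in data: [height, length of hair]
X = [[165, 19], [175, 32], [136, 35], [174, 65], [141, 28],
     [176, 15], [131, 32], [166, 6], [128, 32], [179, 10]]
Y = ['Man', 'Man', 'Woman', 'Man', 'Woman',
     'Man', 'Woman', 'Man', 'Woman', 'Man']

DTclf = tree.DecisionTreeClassifier()
DTclf = DTclf.fit(X, Y)
prediction = DTclf.predict([[135, 29]])
print(prediction)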
Output
['Woman']
We can also predict the probability of each class by using the predict_proba()
method as follows −
Example
prediction = DTclf.predict_proba([[135,29]])
print(prediction)
Output
[[0. 1.]]
Sklearn Module − The Scikit-learn library provides the module name DecisionTreeRegressor
for applying decision trees on regression problems.
Parameters
Parameters used by DecisionTreeRegressor are almost the same as those used in the
DecisionTreeClassifier module. The difference lies in the ‘criterion’ parameter. For the
DecisionTreeRegressor module, the ‘criterion: string, optional default= “mse”’ parameter has the
following values −
mse − It stands for the mean squared error. It is equal to variance reduction as the feature
selection criterion. It minimises the L2 loss using the mean of each terminal node.
friedman_mse − It also uses mean squared error but with Friedman’s improvement score.
mae − It stands for the mean absolute error. It minimizes the L1 loss using the median of
each terminal node.
Attributes
Attributes of DecisionTreeRegressor are also the same as those of the DecisionTreeClassifier
module. The difference is that it does not have the ‘classes_’ and ‘n_classes_’ attributes.
Methods
Methods of DecisionTreeRegressor are also the same as those of the DecisionTreeClassifier
module. The difference is that it does not have the ‘predict_log_proba()’ and ‘predict_proba()’
methods.
Implementation Example
The fit() method in the decision tree regression model will take floating point values of y. Let’s see a
simple implementation example using sklearn.tree.DecisionTreeRegressor −
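The fitting script is not shown on this page; a minimal sketch consistent with the prediction below (the training data is an assumption) −

from sklearn import tree

X = [[1, 1], [5, 5]]
y = [0.1, 1.5]
DTreg = tree.DecisionTreeRegressor()
DTreg = DTreg.fit(X, y)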
Once fitted, we can use this regression model to make prediction as follows −
DTreg.predict([[4, 5]])
Output
array([1.5])
For the random forest methods provided in sklearn.ensemble, ‘max_features’ is the size of the random
subsets of features to consider when splitting a node. If we set this parameter’s value to None then it
will consider all the features rather than a random subset. On the other hand, n_estimators is the
number of trees in the forest. The higher the number of trees, the better the result will be, but it will
also take longer to compute.
Implementation example
In the following example, we are building a random forest classifier by using
sklearn.ensemble.RandomForestClassifier and also checking its accuracy by using the
cross_val_score module.
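The script is not reproduced on this page; a sketch along the lines of the scikit-learn documentation example (the synthetic make_blobs data and parameter values are assumptions) −

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples = 10000, n_features = 10, centers = 100, random_state = 0)
RFclf = RandomForestClassifier(n_estimators = 10, max_depth = None, min_samples_split = 2, random_state = 0)
scores = cross_val_score(RFclf, X, y, cv = 5)
print(scores.mean())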
Output
0.9997
Example
We can also use the sklearn dataset to build Random Forest classifier. As in the following
example we are using iris dataset. We will also find its accuracy score and confusion matrix.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
path = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
headernames = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
dataset = pd.read_csv(path, names = headernames)
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)
RFclf = RandomForestClassifier(n_estimators = 50)
RFclf.fit(X_train, y_train)
y_pred = RFclf.predict(X_test)
result = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(result)
result1 = classification_report(y_test, y_pred)
print("Classification Report:",)
print (result1)
result2 = accuracy_score(y_test,y_pred)
print("Accuracy:",result2)
Output
Confusion Matrix:
[[14 0 0]
[ 0 18 1]
[ 0 0 12]]
Classification Report:
precision recall f1-score support
Iris-setosa 1.00 1.00 1.00 14
Iris-versicolor 1.00 0.95 0.97 19
Iris-virginica 0.92 1.00 0.96 12
Accuracy: 0.9777777777777777
Implementation example
In the following example, we are building a random forest regressor by using
sklearn.ensemble.RandomForestRegressor and also predicting for new values by using the
predict() method.
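The script is not reproduced; a sketch consistent with the parameters in the output shown below (the synthetic make_regression data is an assumption) −

from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_features = 10, n_informative = 2, random_state = 0, shuffle = False)
RFregr = RandomForestRegressor(max_depth = 10, random_state = 0, n_estimators = 100)
RFregr.fit(X, y)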
Output
RandomForestRegressor(
bootstrap = True, criterion = 'mse', max_depth = 10,
max_features = 'auto', max_leaf_nodes = None,
min_impurity_decrease = 0.0, min_impurity_split = None,
min_samples_leaf = 1, min_samples_split = 2,
min_weight_fraction_leaf = 0.0, n_estimators = 100, n_jobs = None,
oob_score = False, random_state = 0, verbose = 0, warm_start = False
)
print(RFregr.predict([[0, 2, 3, 0, 1, 1, 1, 1, 2, 2]]))
Output
[98.47729198]
Extra-Tree Methods
For each feature under consideration, it selects a random value for the split. The benefit of using
extra-tree methods is that they reduce the variance of the model a bit more. The
disadvantage of using these methods is that they slightly increase the bias.
Implementation example
In the following example, we are building an extra-trees classifier by using
sklearn.ensemble.ExtraTreesClassifier and also checking its accuracy by using the
cross_val_score module.
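The script is not reproduced; a sketch mirroring the random forest sketch above (the synthetic make_blobs data and parameter values are assumptions) −

from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples = 10000, n_features = 10, centers = 100, random_state = 0)
ETclf = ExtraTreesClassifier(n_estimators = 10, max_depth = None, min_samples_split = 2, random_state = 0)
scores = cross_val_score(ETclf, X, y, cv = 5)
print(scores.mean())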
Output
1.0
Example
We can also build a classifier using the Extra-Tree method on an external dataset. In the
following example we are using the Pima Indians Diabetes dataset.
from pandas import read_csv
from sklearn.model_selection import KFold, cross_val_score
from sklearn.ensemble import ExtraTreesClassifier
path = r"C:\pima-indians-diabetes.csv"
headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(path, names = headernames)
array = data.values
X = array[:,0:8]
Y = array[:,8]
seed = 7
kfold = KFold(n_splits = 10, random_state = seed)
num_trees = 150
max_features = 5
ETclf = ExtraTreesClassifier(n_estimators = num_trees, max_features = max_features)
results = cross_val_score(ETclf, X, Y, cv = kfold)
print(results.mean())
Output
0.7551435406698566
Implementation example
In the following example, we are applying sklearn.ensemble.ExtraTreesregressor and on the
same data as we used while creating random forest regressor. Let’s see the difference in the
Output
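The script is not reproduced; a sketch using the same assumed synthetic data as the random forest regressor sketch above −

from sklearn.ensemble import ExtraTreesRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_features = 10, n_informative = 2, random_state = 0, shuffle = False)
ETregr = ExtraTreesRegressor(max_depth = 10, random_state = 0, n_estimators = 100)
ETregr.fit(X, y)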
Output
Example
Once fitted we can predict from regression model as follows −
print(ETregr.predict([[0, 2, 3, 0, 1, 1, 1, 1, 2, 2]]))
Output
[85.50955817]
Boosting methods build an ensemble model in an incremental way. The main principle is to build the
model incrementally by training each base model estimator sequentially. In order to build a powerful
ensemble, these methods basically combine several weak learners which are sequentially trained
over multiple iterations of the training data. The sklearn.ensemble module has the following two
boosting methods.
AdaBoost
It is one of the most successful boosting ensemble methods, whose main key is in the way it
gives weights to the instances in the dataset: instances that are hard to classify receive more weight,
so that subsequent models concentrate on them.
Implementation example
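The script for this example is not shown on the page; a sketch of building an AdaBoost classifier consistent with the output below (the synthetic make_classification data is an assumption) −

from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples = 1000, n_features = 10, n_informative = 2, n_redundant = 0, random_state = 0, shuffle = False)
ADBclf = AdaBoostClassifier(n_estimators = 100, random_state = 0)
ADBclf.fit(X, y)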
Output
AdaBoostClassifier(algorithm = 'SAMME.R', base_estimator = None,
learning_rate = 1.0, n_estimators = 100, random_state = 0)
Example
Once fitted, we can predict for new values as follows −
print(ADBclf.predict([[0, 2, 3, 0, 1, 1, 1, 1, 2, 2]]))
Output
[1]
Example
Now we can check the score as follows −
ADBclf.score(X, y)
Output
0.995
Example
We can also build a classifier with the AdaBoost method on an external dataset. For example, in
the example given below, we are using the Pima Indians Diabetes dataset.
# X and Y are loaded from the Pima Indians Diabetes CSV, as in the Extra-Tree example above.
seed = 5
kfold = KFold(n_splits = 10, random_state = seed)
num_trees = 100
# Note: AdaBoostClassifier does not accept a max_features argument, so it is omitted here.
ADBclf = AdaBoostClassifier(n_estimators = num_trees)
results = cross_val_score(ADBclf, X, Y, cv = kfold)
print(results.mean())
Output
0.7851435406698566
Implementation example
In the following example, we are building an AdaBoost regressor by using
sklearn.ensemble.AdaBoostRegressor and also predicting for new values by using the predict()
method.
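The script is not reproduced; a sketch consistent with the output below (the synthetic make_regression data is an assumption) −

from sklearn.ensemble import AdaBoostRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_features = 10, n_informative = 2, random_state = 0, shuffle = False)
ADBregr = AdaBoostRegressor(random_state = 0, n_estimators = 100)
ADBregr.fit(X, y)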
Output
AdaBoostRegressor(base_estimator = None, learning_rate = 1.0, loss = 'linear',
n_estimators = 100, random_state = 0)
Example
print(ADBregr.predict([[0, 2, 3, 0, 1, 1, 1, 1, 2, 2]]))
Output
[85.50955817]
On the other hand, if we set this parameter’s value to ‘exponential’ then it recovers the
AdaBoost algorithm. The parameter n_estimators will control the number of weak learners. A
hyper-parameter named learning_rate (in the range (0.0, 1.0]) will control overfitting via
shrinkage.
Implementation example
In the following example, we are building a Gradient Boosting classifier by using
sklearn.ensemble.GradientBoostingClassifier. We are fitting this classifier with 50 weak
learners.
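The script is not reproduced; a sketch following the scikit-learn documentation pattern with 50 learners (the make_hastie_10_2 data and the remaining parameters are assumptions) −

from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_hastie_10_2(random_state = 0)
X_train, X_test = X[:5000], X[5000:]
y_train, y_test = y[:5000], y[5000:]
GDBclf = GradientBoostingClassifier(n_estimators = 50, learning_rate = 1.0, max_depth = 1, random_state = 0).fit(X_train, y_train)
print(GDBclf.score(X_test, y_test))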
Output
0.8724285714285714
Example
We can also build a classifier using the Gradient Boosting Classifier on an external dataset. As in
the following example, we are using the Pima Indians Diabetes dataset.
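The script is not reproduced; a sketch mirroring the Pima Indians pattern used in the Extra-Tree example above (the CSV path, column names and parameter values are assumptions) −

from pandas import read_csv
from sklearn.model_selection import KFold, cross_val_score
from sklearn.ensemble import GradientBoostingClassifier

path = r"C:\pima-indians-diabetes.csv"
headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(path, names = headernames)
array = data.values
X = array[:,0:8]
Y = array[:,8]
seed = 5
kfold = KFold(n_splits = 10, random_state = seed)
GBclf = GradientBoostingClassifier(n_estimators = 100, random_state = seed)
results = cross_val_score(GBclf, X, Y, cv = kfold)
print(results.mean())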
Output
0.7946582356674234
Implementation example
In the following example, we are building a Gradient Boosting regressor by using
sklearn.ensemble.GradientBoostingRegressor and also finding the mean squared error by
using the mean_squared_error() method.
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.datasets import make_friedman1
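# The data-generation and model-fitting lines are not reproduced on the page; a minimal
# sketch (the sample sizes and boosting parameters are assumptions) that matches the
# imports above and the mean_squared_error call below −
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_friedman1(n_samples = 2000, random_state = 0, noise = 1.0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.5, random_state = 0)
GDBreg = GradientBoostingRegressor(n_estimators = 80, learning_rate = 0.1, max_depth = 1, random_state = 0).fit(X_train, y_train)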
mean_squared_error(y_test, GDBreg.predict(X_test))
Output
5.391246106657164
Clustering methods, one of the most useful unsupervised ML methods, are used to find similarity and
relationship patterns among data samples. After that, they cluster those samples into groups
having similarity based on features. Clustering determines the intrinsic grouping among the
present unlabeled data; that’s why it is important.
The Scikit-learn library has sklearn.cluster to perform clustering of unlabeled data. Under this
module scikit-learn has the following clustering methods −
KMeans
This algorithm computes the centroids and iterates until it finds the optimal centroid. It requires the
number of clusters to be specified, that’s why it assumes that they are already known. The main
logic of this algorithm is to cluster the data by separating samples into n groups of equal
variance by minimizing the criterion known as inertia. The number of clusters identified by the
algorithm is represented by ‘K’.
Affinity Propagation
This algorithm is based on the concept of ‘message passing’ between different pairs of samples
until convergence. It does not require the number of clusters to be specified before running the
algorithm. The algorithm has a time complexity of the order O(N²T), which is its biggest
disadvantage.
Mean Shift
This algorithm mainly discovers blobs in a smooth density of samples. It assigns the data points
to the clusters iteratively by shifting points towards the highest density of data points. It automatically
sets the number of clusters, rather than requiring it as an input; the size of the region to search
through is dictated by a parameter named bandwidth.
Spectral Clustering
Before clustering, this algorithm basically uses the eigenvalues i.e. spectrum of the similarity
matrix of the data to perform dimensionality reduction in fewer dimensions. The use of this
algorithm is not advisable when there are large number of clusters.
Hierarchical Clustering
This algorithm builds nested clusters by merging or splitting the clusters successively. This cluster
hierarchy is represented as dendrogram i.e. tree. It falls into following two categories −
Agglomerative hierarchical algorithms − In this kind of hierarchical algorithm, every data point
is treated like a single cluster. It then successively agglomerates the pairs of clusters. This uses
the bottom-up approach.
Divisive hierarchical algorithms − In this hierarchical algorithm, all data points are treated as
one big cluster. In this the process of clustering involves dividing, by using top-down approach,
the one big cluster into various small clusters.
DBSCAN
It stands for “Density-based spatial clustering of applications with noise”. This algorithm is
based on the intuitive notion of “clusters” and “noise”: clusters are dense regions in the data
space, separated by regions of lower density.
A higher value of the parameter min_samples or a lower value of the parameter eps indicates the
higher density of data points which is necessary to form a cluster.
OPTICS
It stands for “Ordering points to identify the clustering structure”. This algorithm also finds
density-based clusters in spatial data. Its basic working logic is like DBSCAN.
BIRCH
It stands for Balanced iterative reducing and clustering using hierarchies. It is used to perform
hierarchical clustering over large data sets. It builds a tree named CFT i.e. Characteristics
Feature Tree, for the given data.
The advantage of CFT is that the data nodes, called CF (Characteristics Feature) nodes, hold the
necessary information for clustering, which further prevents the need to hold the entire input data
in memory.
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
digits = load_digits()
digits.data.shape
Output
(1797, 64)
This output shows that the digits dataset has 1797 samples with 64 features.
Example
Now, perform the K-Means clustering as follows −
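The clustering call is not reproduced on the page; a sketch consistent with the (10, 64) output and the later use of the cluster assignments and centers (the random_state is an assumption) −

kmeans = KMeans(n_clusters = 10, random_state = 0)
clusters = kmeans.fit_predict(digits.data)
kmeans.cluster_centers_.shape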
Output
(10, 64)
This output shows that K-means clustering created 10 clusters with 64 features.
Example
fig, ax = plt.subplots(2, 5, figsize = (8, 3))
centers = kmeans.cluster_centers_.reshape(10, 8, 8)
for axi, center in zip(ax.flat, centers):
    axi.set(xticks = [], yticks = [])
    axi.imshow(center, interpolation = 'nearest', cmap = plt.cm.binary)
Output
The below output has images showing clusters centers learned by K-Means Clustering.
Next, the Python script below will match the learned cluster labels (by K-Means) with the true
labels found in them −
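The setup for this matching loop is not reproduced on the page; a minimal reconstruction, assuming the cluster assignments computed above −

from scipy.stats import mode

labels = np.zeros_like(clusters)
for i in range(10):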
    mask = (clusters == i)
    labels[mask] = mode(digits.target[mask])[0]
We can also check the accuracy with the help of the below mentioned command.
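The command is not reproduced; presumably a standard accuracy computation such as −

from sklearn.metrics import accuracy_score
print(accuracy_score(digits.target, labels))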
Output
0.7935447968836951
Following are some important and most used functions given by Scikit-learn for evaluating
clustering performance −
Adjusted Rand Index
It has two parameters, namely labels_true, which is the ground truth class labels, and labels_pred,
which are the cluster labels to evaluate.
Example
from sklearn.metrics.cluster import adjusted_rand_score
labels_true = [0, 0, 1, 1, 1, 1]
labels_pred = [0, 0, 2, 2, 3, 3]
adjusted_rand_score(labels_true, labels_pred)
Output
0.4444444444444445
Perfect labeling would be scored 1 and bad labelling or independent labelling is scored 0 or
negative.
Example
from sklearn.metrics.cluster import normalized_mutual_info_score
labels_true = [0, 0, 1, 1, 1, 1]
labels_pred = [0, 0, 2, 2, 3, 3]
normalized_mutual_info_score(labels_true, labels_pred)
Output
0.7611702597222881
Example
from sklearn.metrics.cluster import adjusted_mutual_info_score
labels_true = [0, 0, 1, 1, 1, 1]
labels_pred = [0, 0, 2, 2, 3, 3]
adjusted_mutual_info_score(labels_true, labels_pred)
Output
0.4444444444444448
Fowlkes-Mallows Score
The Fowlkes-Mallows function measures the similarity of two clustering of a set of points. It may
be defined as the geometric mean of the pairwise precision and recall.
Mathematically,
FMS = TP / √((TP + FP) (TP + FN))
Here, TP = True Positive − number of pair of points belonging to the same clusters in true as
well as predicted labels both.
FP = False Positive − number of pair of points belonging to the same clusters in true labels but
not in the predicted labels.
FN = False Negative − number of pair of points belonging to the same clusters in the predicted
labels but not in the true labels.
Example
from sklearn.metrics.cluster import fowlkes_mallows_score
labels_true = [0, 0, 1, 1, 1, 1]
labels_pred = [0, 0, 2, 2, 3, 3]
fowlkes_mallows_score(labels_true, labels_pred)
Output
0.6546536707079771
Silhouette Coefficient
The Silhouette function will compute the mean Silhouette Coefficient of all samples using the
mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample.
Mathematically,
S = (b − a) / max(a, b)
Example
from sklearn.metrics import silhouette_score
from sklearn.metrics import pairwise_distances
from sklearn import datasets
import numpy as np
from sklearn.cluster import KMeans
dataset = datasets.load_iris()
X = dataset.data
y = dataset.target
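# The clustering and scoring lines are not reproduced on the page; a sketch
# (the KMeans settings here are assumptions) consistent with the score shown below −
kmeans_model = KMeans(n_clusters = 3, random_state = 1).fit(X)
labels = kmeans_model.labels_
print(silhouette_score(X, labels, metric = 'euclidean'))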
Output
0.5528190123564091
Contingency Matrix
This matrix will report the intersection cardinality for every (true, predicted) cluster pair.
The confusion matrix for classification problems is a square contingency matrix.
Example
from sklearn.metrics.cluster import contingency_matrix
x = ["a", "a", "a", "b", "b", "b"]
y = [1, 1, 2, 0, 1, 2]
contingency_matrix(x, y)
Output
array([
[0, 2, 1],
[1, 1, 1]
])
The first row of the above output shows that, among the three samples whose true cluster is “a”, none
of them is in cluster 0, two of them are in cluster 1 and one is in cluster 2. On the other hand, the second
row shows that, among the three samples whose true cluster is “b”, one is in cluster 0, one is in cluster 1
and one is in cluster 2.
Exact PCA
Principal Component Analysis (PCA) is used for linear dimensionality reduction using Singular
Value Decomposition (SVD) of the data to project it to a lower dimensional space. While
decomposition using PCA, input data is centered but not scaled for each feature before applying
the SVD.
Example
The below example will use the sklearn.decomposition.PCA module to find the best 5 principal
components from the Pima Indians Diabetes dataset.
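The script is not reproduced on this page; a sketch consistent with the output below (the CSV path and column names are assumptions, following the Pima pattern used earlier) −

from pandas import read_csv
from sklearn.decomposition import PCA

path = r"C:\pima-indians-diabetes.csv"
headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(path, names = headernames)
array = data.values
X = array[:,0:8]
pca = PCA(n_components = 5)
fit = pca.fit(X)
print("Explained Variance: %s" % fit.explained_variance_ratio_)
print(fit.components_)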
Output
Explained Variance: [0.88854663 0.06159078 0.02579012 0.01308614 0.00744094]
[
[-2.02176587e-03 9.78115765e-02 1.60930503e-02 6.07566861e-029.93110844e-01 1
[-2.26488861e-02 -9.72210040e-01 -1.41909330e-01 5.78614699e-029.46266913e-02
Incremental PCA
Incremental Principal Component Analysis (IPCA) is used to address the biggest limitation of
Principal Component Analysis (PCA), which is that PCA only supports batch processing, meaning all
the input data to be processed must fit in the memory.
Same as PCA, while decomposition using IPCA, input data is centered but not scaled for each
feature before applying the SVD.
Example
The below example will use the sklearn.decomposition.IncrementalPCA module on the Sklearn digits dataset.
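The script is not reproduced; a sketch consistent with the (1797, 10) output and the 100-sample batches mentioned below −

from sklearn.datasets import load_digits
from sklearn.decomposition import IncrementalPCA

X, _ = load_digits(return_X_y = True)
transformer = IncrementalPCA(n_components = 10, batch_size = 100)
transformer.partial_fit(X[:100, :])   # partial fit on the first batch of 100 samples
X_transformed = transformer.fit_transform(X)
print(X_transformed.shape)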
Output
(1797, 10)
Here, we can partially fit on smaller batches of data (as we did with 100 samples per batch) or we can
let the fit() function divide the data into batches.
Kernel PCA
Kernel Principal Component Analysis, an extension of PCA, achieves non-linear dimensionality
reduction using kernels. It supports both transform and inverse_transform.
Example
The below example will use sklearn.decomposition.KernelPCA module on Sklearn digit
dataset. We are using sigmoid kernel.
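The script is not reproduced; a sketch consistent with the (1797, 10) output and the sigmoid kernel mentioned above −

from sklearn.datasets import load_digits
from sklearn.decomposition import KernelPCA

X, _ = load_digits(return_X_y = True)
transformer = KernelPCA(n_components = 10, kernel = 'sigmoid')
X_transformed = transformer.fit_transform(X)
print(X_transformed.shape)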
Output
(1797, 10)
Example
The below example will use sklearn.decomposition.PCA module with the optional parameter
svd_solver=’randomized’ to find best 7 Principal components from Pima Indians Diabetes
dataset.
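A sketch, differing from the exact-PCA sketch above only in the PCA construction −

# X is loaded from the Pima Indians CSV exactly as in the exact-PCA sketch above.
pca = PCA(n_components = 7, svd_solver = 'randomized')
fit = pca.fit(X)
print("Explained Variance: %s" % fit.explained_variance_ratio_)
print(fit.components_)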
Output
Explained Variance: [8.88546635e-01 6.15907837e-02 2.57901189e-02 1.30861374e-02
[
[-2.02176587e-03 9.78115765e-02 1.60930503e-02 6.07566861e-029.93110844e-01 1
[-2.26488861e-02 -9.72210040e-01 -1.41909330e-01 5.78614699e-029.46266913e-02
[-2.24649003e-02 1.43428710e-01 -9.22467192e-01 -3.07013055e-012.09773019e-02
[-4.90459604e-02 1.19830016e-01 -2.62742788e-01 8.84369380e-01-6.55503615e-02
[ 1.51612874e-01 -8.79407680e-02 -2.32165009e-01 2.59973487e-01-1.72312241e-0
[-5.04730888e-03 5.07391813e-02 7.56365525e-02 2.21363068e-01-6.13326472e-03
[ 9.86672995e-01 8.83426114e-04 -1.22975947e-03 -3.76444746e-041.42307394e-03
]