Scikit Learn
Scikit-learn provides a selection of efficient tools for machine learning and statistical modeling, including
classification, regression, clustering and dimensionality reduction. Scikit-learn is focused on modeling the data. The library is built upon NumPy, SciPy and
Matplotlib. This tutorial will explore statistical learning with scikit-learn.
1- Installation
Scikit-learn can be installed with pip or, if you are using Anaconda, with conda.
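For example (assuming Python and pip, or an Anaconda distribution, are already set up):
pip install -U scikit-learn
conda install scikit-learn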
2- Features :
Clustering
Cross Validation
Dimensionality Reduction
Ensemble methods
Feature Extraction
Feature Selection
Open Source
3- Modelling Process
- Features: the variables of the data are called its features. They are also known as predictors, inputs or attributes.
- Response: the output variable that basically depends upon the feature variables. It is also known as target, label or output.
Scikit-learn ships with a few example datasets, such as iris and digits for classification and the Boston house prices for regression.
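For instance, the iris dataset can be loaded as follows (a minimal sketch; the printed attributes are the standard ones exposed by load_iris):
from sklearn.datasets import load_iris
iris = load_iris()
print(iris.feature_names)   # names of the four feature columns
print(iris.target_names)    # the three species labels
print(iris.data.shape)      # (150, 4) feature matrix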
output:
To check the accuracy of the model, we can split the dataset into two pieces, a training set and a test set, then use the training set to train the model and the test set to test
the model.
train_test_split(X, y, test_size, random_state):
- X: the feature matrix.
- y: the target vector.
- test_size: the ratio of test data to the total given data.
- random_state: used to guarantee that the split will always be the same.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
iris=load_iris()
X=iris.data
y=iris.target
# split 70/30; the chosen ratio and random_state are illustrative
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3,random_state=1)
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
output:
Now, we can use our dataset to train a prediction model. Here, for example, we use KNN (covered in a separate chapter).
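A minimal sketch of this step, assuming the X_train/X_test split created above (k=3 is an illustrative choice):
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics
knn = KNeighborsClassifier(n_neighbors=3)   # k=3 chosen for illustration
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
print(metrics.accuracy_score(y_test, y_pred))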
output:
Before feeding data to machine learning algorithms, we need to convert it into meaningful data. This process is called preprocessing the data. Scikit-learn has a
package named preprocessing for this purpose.
3-4-1- Binarisation
For example, we may need to convert numerical values into Boolean values. Here we use a threshold value of 0.5, so all values above 0.5 are converted to 1 and all other
values are converted to 0.
import numpy as np
from sklearn import preprocessing
data=np.array([[2.1,-5,6.3,-0.1],
[3.9,9.8,-6.3,-7],
[2,3.8,6,5],
[-8,0,2.1,6]])
print(f'Binarized data is :\n {preprocessing.Binarizer(threshold=0.5).transform(data)}')
output:
3-4-2- Mean and Standard Deviation
import numpy as np
from sklearn import preprocessing
data=np.array([[2.1,-5,6.3,-0.1],
[3.9,9.8,-6.3,-7],
[2,3.8,6,5],
[-8,0,2.1,6]])
print(f'Mean of data in axis=0 is : {data.mean(axis=0)}')
print(f'Standard deviation of data in axis=0 is : {data.std(axis=0)}')
output:
3-4-3- Scaling
We use the scaling technique because feature values should not be artificially large or small; scaling brings them into a common range.
import numpy as np
from sklearn import preprocessing
data=np.array([[2.1,-5,6.3,-0.1],
[3.9,9.8,-6.3,-7],
[2,3.8,6,5],
[-8,0,2.1,6]])
scaler=preprocessing.MinMaxScaler(feature_range=(0,1))
data_scaled=scaler.fit_transform(data)
print(f'Min-max scaled data is :\n {data_scaled}')
output:
3-4-4- Normalization
We use the normalization technique to modify the feature vectors. Normalization is necessary so that the feature vectors can be measured on a common scale. There are
two types of normalization: L1 normalization and L2 normalization.
L1 normalization is also called Least Absolute Deviations. It modifies the values in such a manner that the sum of the absolute values in each
row is always 1.
import numpy as np
from sklearn import preprocessing
data=np.array([[1,6.2,-5,3],
[-3.6,2,8,7],
[-3.6,5.2,0,1],
[-9,5.2,14,4]])
normalize_data=preprocessing.normalize(data,norm='l1')
print(f'L1 Norm of data is:\n{normalize_data}')
output:
L2 normalization is also called Least Squares. It modifies the values in such a manner that the sum of the squares in each row is always 1.
import numpy as np
from sklearn import preprocessing
data=np.array([[1,6.2,-5,3],
[-3.6,2,8,7],
[-3.6,5.2,0,1],
[-9,5.2,14,4]])
normalize_data=preprocessing.normalize(data,norm='l2')
print(f'L2 Norm of data is:\n{normalize_data}')
output:
4- Data Representation
As we know, machine learning is about creating models from data. For this purpose, the computer must understand the data first. Here we are going to discuss various
ways to represent the data.
The best way to represent data is in the form of tables. A table represents a 2-D grid of data where rows represent the individual elements of the dataset and columns
represent the quantities related to those individual elements. Each column of the data represents a piece of quantitative information describing each sample.
output:
A feature matrix may be defined as a table layout where the information can be thought of as a 2-D matrix. It is stored in a variable named X and is assumed to be two-
dimensional with shape [n_samples, n_features]. Mostly, it is contained in a NumPy array or a Pandas DataFrame. The samples always represent the individual objects
described by the dataset, and the features represent the distinct observations that describe each sample in a quantitative manner.
The target array is also called the label and is denoted by y. It is usually one-dimensional with length n_samples and is generally contained in a NumPy array or
Pandas Series. The target array may hold both continuous numerical values and discrete values.
We can distinguish the target array from the feature columns by one point: the target array is usually the quantity we want to predict from the data; in statistical terms, it is
the dependent variable.
In the example below, we predict the species of flower in the iris dataset based on the other measurements.
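A minimal sketch of separating the feature matrix X and the target array y from the iris dataset:
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data      # feature matrix, shape (n_samples, n_features)
y = iris.target    # target array, shape (n_samples,)
print(X.shape)
print(y.shape)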
output:
5- Estimator API
It is one of the main APIs implemented by Scikit-learn. It provides a consistent interface for a wide range of ML applications, which is why all machine learning algorithms
in Scikit-learn are implemented via the Estimator API. The object that learns from the data (fits the data) is an estimator. It can be used with any of the algorithms, such as
classification, regression and clustering, or even with a transformer that extracts useful features from raw data.
For fitting the data, all estimator objects expose a fit method that takes a dataset shown as follows:
estimator.fit(data)
Next, all the parameters of an estimator can be set when it is instantiated, via the corresponding attributes.
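As a brief sketch (using SVC purely as an illustrative estimator):
from sklearn.svm import SVC
estimator = SVC(kernel='linear', C=1.0)   # hyperparameters are set at construction time
print(estimator.kernel)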
Once data is fitted with an estimator, parameters are estimated from the data at hand.
The estimator object is used for estimation and decoding of a model. Furthermore, the model is estimated as a deterministic function of the parameters provided in object construction, the global random state if the estimator's random_state parameter is set to None, and any data passed to the most recent call to fit.
5-2- Steps in using the Estimator API
1. Choose a class of model
2. Choose model hyperparameters
3. Arrange the data
4. Fit the model
5. Apply the model
Scikit-learn's objects share a uniform basic API that consists of three complementary interfaces: an estimator interface for building and fitting models, a predictor interface for making predictions, and a transformer interface for converting data.
import numpy as np
from sklearn import random_projection
rannage=np.random.RandomState(0)
X=rannage.rand(10,2000)
X=np.array(X,dtype='float32')
print(X.dtype)
Transformer_data=random_projection.GaussianRandomProjection()
X_new=Transformer_data.fit_transform(X)
print(X_new.dtype)
output:
float32
float64
The hyper-parameters of an estimator can be updated, and the estimator refitted, after it has been constructed via the set_params() method.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC
X,y=load_iris(return_X_y=True)
clf=SVC()
clf.set_params(kernel='linear').fit(X,y)
clf.predict(X[:5])
output:
Now we can change the kernel back to rbf to refit the estimator and make a second prediction:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC
X,y=load_iris(return_X_y=True)
clf=SVC()
# clf.set_params(kernel='linear').fit(X,y)
# clf.predict(X[:5])
clf.set_params(kernel='rbf',gamma='scale').fit(X,y)
clf.predict(X[:5])
output:
In the case of multiclass fitting, both the learning and prediction tasks depend on the format of the target data fitted upon. The module used is sklearn.multiclass.
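A minimal sketch, assuming the OneVsRestClassifier wrapper from sklearn.multiclass with an SVC base estimator and a small made-up dataset; the output format follows the format of the target y that was fitted:
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import LabelBinarizer
X = [[1, 2], [2, 4], [4, 5], [3, 2], [3, 1]]
y = [0, 0, 1, 1, 2]
classif = OneVsRestClassifier(estimator=SVC(random_state=0))
# fitting on a 1-D multiclass target: predict() returns class labels
print(classif.fit(X, y).predict(X))
# fitting on a binarized 2-D target: predict() returns the binarized form
y_bin = LabelBinarizer().fit_transform(y)
print(classif.fit(X, y_bin).predict(X))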
output:
7- Scikit Learn - Linear Modeling
Let us begin by understanding what Linear Regression is in Sklearn.
7-1: Linear Regression studies the relationship between a dependent variable (Y) and a given set of independent variables (X).
7-2: Logistic Regression is a classification algorithm rather than a regression algorithm. Based on a given set of independent variables, it is used to estimate a discrete
value (0 or 1, yes/no, true/false).
7-3: Ridge Regression, or Tikhonov regularization, is a technique that performs L2 regularization. It modifies the loss function by adding a penalty equivalent to the square
of the magnitude of the coefficients.
7-4: Bayesian Ridge Regression allows a natural mechanism to survive insufficient data or poorly distributed data by formulating linear regression using probability
distributions rather than point estimates.
7-5: LASSO is a regularization technique that performs L1 regularization. It modifies the loss function by adding a penalty equivalent to the sum of the
absolute values of the coefficients.
7-6: Multi-task LASSO allows fitting multiple regression problems jointly, enforcing the selected features to be the same for all the regression problems, also called tasks.
Sklearn provides a linear model named MultiTaskLasso, trained with a mixed L1/L2 norm for regularization, which estimates sparse coefficients for multiple
regression problems jointly.
7-7: Elastic-Net is a regularized regression method that linearly combines both the L1 and L2 penalties. It is useful when there are multiple correlated features.
7-8: Multi-task Elastic-Net allows fitting multiple regression problems jointly, enforcing the selected features to be the same for all the regression problems, also called
tasks.
SGD (Stochastic Gradient Descent) is an optimization algorithm in Sklearn. This algorithm is used to find the values of the parameters/coefficients of a function that minimize a cost function. It has been
successfully applied to large-scale datasets because the update to the coefficients is performed for each training instance, rather than at the end of all instances.
The Stochastic Gradient Descent classifier implements a plain SGD learning routine supporting various loss functions and penalties for classification. Scikit-learn
provides the SGDClassifier module to implement SGD classification.
Like other classifiers, SGD has to be fitted with two arrays: an array X holding the training samples and an array y holding the target values:
import numpy as np
from sklearn import linear_model
X=np.array([[-1,1],[-2,-3],[2,3],[1,2]])
Y=np.array([1,2,5,1])
SGDCLF=linear_model.SGDClassifier(max_iter=1000,tol=1e-3,penalty='elasticnet')
SGDCLF.fit(X,Y)
print(SGDCLF.predict([[2,1]]))
print(SGDCLF.coef_)
output:
Support Vector Machines (SVM) are used for classification, regression and outlier detection. SVMs are very efficient in high-dimensional spaces and are generally used
in classification problems.
The main goal of SVMs is to divide the dataset into a number of classes by finding a maximum marginal hyperplane (MMH), which can be done in the following two
steps:
Support Vector Machines will first generate hyperplanes iteratively that separate the classes in the best way.
After that, it will choose the hyperplane that separates the classes correctly.
Scikit-learn provides three classes namely SVC,NuSVC,LinearSVC which can perform multi-class classification.
9-1: SVC
The module used by scikit-learn is sklearn.svm.SVC. This class handles multiclass support according to a one-vs-one scheme.
9-1-1:Parameters
9-1-2:Implementation Example
import numpy as np
from sklearn.svm import SVC
X=np.array([[-1,1],[-2,-3],[2,3],[1,2]])
y=np.array([1,2,2,1])
SVCCLF=SVC(kernel='linear',gamma='scale',shrinking=False)
SVCCLF.fit(X,y)
print(SVCCLF.predict([[-0.5,-0.8]]))
print(SVCCLF.coef_)
output:
9-2: NuSVC
It is another class provided by scikit-learn which can perform multi-class classification. It is like SVC, but NuSVC accepts slightly different sets of parameters:
- nu represents an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. Its value should be in the interval (0, 1].
9-2-1:Implementation Example
import numpy as np
from sklearn.svm import NuSVC
X=np.array([[-1,1],[-2,-3],[2,3],[1,2]])
y=np.array([1,2,2,1])
NuSVCCLF=NuSVC(kernel='linear',gamma='scale',shrinking=False)
NuSVCCLF.fit(X,y)
print(NuSVCCLF.predict([[-0.5,-0.8]]))
print(NuSVCCLF.coef_)
output:
9-3: LinearSVC
It is Linear Support Vector Classification. It is similar to SVC with kernel='linear'. The difference between them is that LinearSVC is implemented in terms of liblinear
while SVC is implemented in terms of libsvm. That is the reason LinearSVC has more flexibility in the choice of penalties and loss functions. It also scales better to a large
number of samples.
9-3-1:Implementation Example
import numpy as np
from sklearn.svm import LinearSVC
X=np.array([[-1,1],[-2,-3],[2,3],[1,2]])
y=np.array([1,2,2,1])
LSVCClf=LinearSVC(dual=False,random_state=0,penalty='l1',tol=1e-5)
LSVCClf.fit(X,y)
print(LSVCClf.predict([[-0.5,-0.8]]))
print(LSVCClf.coef_)
output:
SVM is used for both classification and regression problems. Scikit-learn's Support Vector Classification (SVC) can be extended to solve regression
problems as well. That extended method is called Support Vector Regression (SVR).
Scikit-learn provides three classes, namely SVR, NuSVR and LinearSVR, as three different implementations of SVR.
9-4-1:SVR
It is epsilon-support vector regression, whose implementation is based on libsvm. Here, the parameter epsilon specifies the epsilon-tube within which no penalty is
associated in the training loss function with points predicted within a distance epsilon from the actual value.
from sklearn.svm import SVR
X=[[1,2],[3,4]]
y=[1,3]
SVRREG=SVR(kernel='linear',gamma='auto')
SVRREG.fit(X,y)
print(SVRREG.predict([[1,2]]))
output:
[1.1]
9-4-2:NuSVR
It is like NuSVC, but NuSVR uses the parameter nu to control the number of support vectors. Moreover, unlike NuSVC, where nu replaced the C parameter, here it
replaces epsilon.
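A minimal sketch, assuming randomly generated data (the kernel, C and nu values are illustrative):
import numpy as np
from sklearn.svm import NuSVR
n_samples, n_features = 20, 15
rng = np.random.RandomState(0)
y = rng.randn(n_samples)
X = rng.randn(n_samples, n_features)
NuSVRReg = NuSVR(kernel='linear', gamma='auto', C=1.0, nu=0.1)
NuSVRReg.fit(X, y)
print(NuSVRReg.coef_)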
output:
9-5: LinearSVR
It is similar to SVR with kernel='linear'. The difference between them is that LinearSVR is implemented in terms of liblinear, while SVR is implemented in terms of libsvm. That is
the reason LinearSVR has more flexibility in the choice of penalties and loss functions. It also scales better to a large number of samples. It does not support the
'kernel' parameter because the kernel is assumed to be linear.
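A minimal sketch, assuming synthetic regression data from make_regression (the tolerance and seed are illustrative):
from sklearn.svm import LinearSVR
from sklearn.datasets import make_regression
X, y = make_regression(n_features=4, random_state=0)
LSVRReg = LinearSVR(random_state=0, tol=1e-5)
LSVRReg.fit(X, y)
print(LSVRReg.predict([[0, 0, 0, 0]]))
print(LSVRReg.coef_)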
output:
10- Scikit-Learn: Anomaly Detection
Anomaly detection is a technique used to identify data points in a dataset that do not fit well with the rest of the data. Two methods, namely outlier detection and
novelty detection, can be used for anomaly detection.
Outlier detection: the training data contains outliers, i.e. observations that are far from the rest of the data. Outlier
detection estimators therefore try to fit the region containing the most concentrated training data while ignoring the deviant observations.
Novelty detection: it is concerned with detecting an unobserved pattern in new observations that is not included in the training data. Here, the training data is not
polluted by outliers.
There is a set of ML tools, provided by scikit-learn, which can be used for both outlier detection and novelty detection. These tools first learn an object
from the data using the fit() method, and new observations are then classified as inliers or outliers with the predict() method:
estimator.fit(X_train)
estimator.predict(X_test)
Fitting an elliptic envelope: this algorithm assumes that regular data comes from a known distribution, such as a Gaussian distribution. For outlier detection,
scikit-learn provides an object named covariance.EllipticEnvelope. This object fits a robust covariance estimate to the data, and thus fits an ellipse to the
central data points. It ignores the points outside the central mode.
import numpy as np
from sklearn.covariance import EllipticEnvelope
data=np.array([[5,6],[6,9]])
X=np.random.RandomState(0).multivariate_normal(mean=[0,0],cov=data,size=500)
cov=EllipticEnvelope(random_state=0).fit(X)
print(cov.predict([[-5,-3],[2,4]]))
Isolation Forest: in the case of a high-dimensional dataset, one efficient way to perform outlier detection is to use random forests. Scikit-learn provides the
ensemble.IsolationForest method, which isolates the observations by randomly selecting a feature and then randomly selecting a split value between the
maximum and minimum values of the selected feature.
Local Outlier Factor: the LOF algorithm is another way to perform outlier detection on high-dimensional data. Scikit-learn provides the neighbors.LocalOutlierFactor method,
which computes a score, called the local outlier factor, reflecting the degree of abnormality of the observations. The main logic of this algorithm is to detect the
samples that have a substantially lower density than their neighbors, as sketched below.
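A minimal sketch of LocalOutlierFactor on a tiny made-up 1-D dataset (n_neighbors=2 is illustrative); fit_predict returns -1 for outliers and 1 for inliers:
from sklearn.neighbors import LocalOutlierFactor
X = [[-1.1], [0.2], [101.1], [0.3]]   # 101.1 is the obvious outlier
lof = LocalOutlierFactor(n_neighbors=2)
print(lof.fit_predict(X))
print(lof.negative_outlier_factor_)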
output:
One-Class SVM: this algorithm is very efficient with high-dimensional data and estimates the support of a high-dimensional distribution. It is implemented in the
Support Vector Machine module, in the sklearn.svm.OneClassSVM object.
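A minimal sketch, assuming a tiny 1-D dataset (the gamma setting is illustrative); as with the other estimators, predict returns -1 for outliers and 1 for inliers:
from sklearn.svm import OneClassSVM
X = [[0], [0.44], [0.45], [0.46], [1]]
clf = OneClassSVM(gamma='auto').fit(X)
print(clf.predict(X))
print(clf.score_samples(X))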
Neighbor-based learning methods work in two steps:
To find a predefined number of training samples closest in distance to the new data point.
To predict the label from these training samples.
Here, the number of samples can be a user-defined constant, as in k-nearest neighbor learning, or vary based on the local density of points, as in radius-based
neighbor learning. For this, scikit-learn has the sklearn.neighbors module.
Different types of algorithms which can be used in neighbor-based methods implementation are as follows:
Brute Force: the brute-force computation of distances between all pairs of points in the dataset provides the most naive neighbor search implementation. For N
samples in D dimensions, the brute-force approach scales as O[D N^2]. For small data samples this algorithm can be very useful, but it becomes infeasible as
the number of samples grows. Brute-force neighbor search is enabled with the keyword algorithm='brute'.
K-D Tree: the K-D tree is a binary tree structure known as a K-dimensional tree. It recursively partitions the parameter space along the data axes, dividing it
into nested orthotropic regions into which the data points are filed. It was invented to address the computational inefficiencies of the brute-force
approach. This algorithm needs far fewer distance computations to determine the nearest neighbor of a query point, roughly O[log(N)] distance computations.
K-D tree neighbor searches are enabled with the keyword algorithm='kd_tree'.
Ball Tree: the K-D tree is inefficient in higher dimensions, and the ball tree was developed to address this. This algorithm recursively divides the data into nodes defined by a
centroid C and radius r. It uses the triangle inequality, which reduces the number of candidate points for a neighbor search:
|X+Y| ≤ |X| + |Y|
Several factors should be considered while choosing a nearest neighbor algorithm, such as the number of samples, the dimensionality of the data and its structure.
KNN is non-parametric and lazy in nature. Non-parametric means that there is no assumption about the underlying data distribution, i.e. the model structure is
determined from the dataset. Lazy or instance-based learning means that, for the purpose of model generation, it does not build a model from the training data points beforehand; the whole
training data is used in the testing phase.
1. In this step, it computes and stores the k nearest neighbors for each sample in the training set.
2. In this step, for an unlabeled sample, it retrieves the k nearest neighbors from the dataset. Then, among these k nearest neighbors, it predicts the class through
voting (the class with the majority of votes wins).
sklearn.neighbors.NearestNeighbors is the module used to implement unsupervised nearest neighbor learning. It uses the nearest neighbor algorithms named
BallTree, KDTree or brute force.
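A minimal sketch of unsupervised nearest-neighbor queries on a tiny made-up dataset (the choices of algorithm='ball_tree' and n_neighbors=2 are illustrative):
import numpy as np
from sklearn.neighbors import NearestNeighbors
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree').fit(X)
# for each point, the distances to and indices of its 2 nearest neighbors
distances, indices = nbrs.kneighbors(X)
print(indices)
print(distances)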
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
iris=load_iris()
X=iris.data[:,:4]
y=iris.target
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3)
scaler=StandardScaler()
scaler.fit(X_train)
X_train=scaler.transform(X_train)
X_test=scaler.transform(X_test)
knnr=KNeighborsRegressor(n_neighbors=5)
knn=knnr.fit(X_train,y_train)
print ("The MSE is:",format(np.power(y-knnr.predict(X),4).mean()))
Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes' theorem with the strong assumption that all the predictors are independent
of each other.
Bayes' theorem is a mathematical formula used to determine the conditional probability of events. Essentially, it describes the probability of an event
based on prior knowledge of the conditions that might be relevant to the event:
P(A|B) = P(B|A) * P(A) / P(B)
where P(A|B) is the probability of event A given that event B has occurred, P(B|A) is the probability of B given A, and P(A) and P(B) are the prior probabilities of A and B.
Scikit-learn provides different naive Bayes classifier models, namely Gaussian, Multinomial, Complement and Bernoulli.
Gaussian Naive Bayes: this classifier assumes that the data for each label is drawn from a simple Gaussian distribution.
Multinomial Naive Bayes: it assumes that the features are drawn from a simple multinomial distribution.
Bernoulli Naive Bayes: the assumption in this model is that the features are binary (0s and 1s) in nature.
Complement Naive Bayes: it was designed to correct the severe assumptions made by the Multinomial Naive Bayes classifier.
import sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
data = load_breast_cancer()
label_names=data['target_names']
labels=data['target']
feature_names=data['feature_names']
feature=data['data']
train,test,train_labels,test_labels=train_test_split(feature,labels,test_size=0.40,random_state=42)
from sklearn.naive_bayes import GaussianNB
GNBclf=GaussianNB()
model=GNBclf.fit(train,train_labels)
pred=GNBclf.predict(test)
print(pred)
output:
Decision Trees are among the most powerful non-parametric supervised learning methods. They can be used for classification and regression tasks. The main goal of a decision tree is to
create a model that predicts the target variable by learning simple decision rules deduced from the data features. A decision tree has two main entities: one is the root
node, where the data splits, and the other is the decision nodes or leaves, where we get the final output.
from sklearn import tree
# the training arrays X ([height, length of hair] per sample) and Y (labels such as 'Man'/'Woman')
# are not shown in this excerpt and are assumed to be defined beforehand
data_feature_names=['height','length of hair']
DTclf=tree.DecisionTreeClassifier()
clf=DTclf.fit(X,Y)
prediction=clf.predict([[125,653]])
print(prediction)
output:
['Man']
A decision tree is usually trained by recursively splitting the data, but, being prone to overfitting, it is often turned into a random forest by training many trees over
various sub-samples of the data. The sklearn.ensemble module has two algorithms based on randomized decision trees.
In the random forest algorithm, each decision tree in the ensemble is built from a sample drawn with replacement from the training set; the forest then gets a prediction from each
tree and finally selects the best solution by means of voting. It can be used for both classification and regression tasks.
Classification with Random Forest: for creating a random forest classifier, the scikit-learn module provides sklearn.ensemble.RandomForestClassifier. The main
parameters are max_features and n_estimators. max_features is the size of the random subsets of features to consider when splitting a node, and n_estimators
is the number of trees in the forest.
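A minimal sketch, assuming synthetic data from make_blobs and cross-validation with cross_val_score (the parameter values are illustrative and mirror the extra-trees example further below):
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
X, y = make_blobs(n_samples=10000, n_features=10, centers=100, random_state=0)
RFclf = RandomForestClassifier(n_estimators=10, min_samples_split=2, random_state=0)
score = cross_val_score(RFclf, X, y, cv=5)
print(score.mean())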
output:
0.9998000000000001
Regression with Random Forest: for creating a random forest regression, the scikit-learn module provides sklearn.ensemble.RandomForestRegressor. It uses the
same parameters as sklearn.ensemble.RandomForestClassifier.
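A minimal sketch, assuming synthetic data from make_regression (the dataset and parameter values are illustrative, so the printed score may differ from the value shown below):
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, random_state=0)
RFregr = RandomForestRegressor(n_estimators=100, random_state=0)
score = cross_val_score(RFregr, X, y, cv=5)
print(score.mean())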
output:
0.9705321938477912
Extra-Tree methods: for each feature under consideration, the extra-tree method selects a random value for the split. The benefit of using extra-tree methods is that they reduce the variance of the model;
the disadvantage is that they slightly increase the bias.
Classification with Extra-Tree Method: for creating a classifier using the extra-tree method, the scikit-learn module provides sklearn.ensemble.ExtraTreesClassifier. It
uses the same parameters as sklearn.ensemble.RandomForestClassifier.
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_blobs
from sklearn.ensemble import ExtraTreesClassifier
X,y=make_blobs(n_samples=10000,n_features=10,centers=100,random_state=0)
ETclf=ExtraTreesClassifier(n_estimators=10,min_samples_split=2)
score=cross_val_score(ETclf,X,y,cv=5)
print(score.mean())
output:
0.9998000000000001
Regression with Extra-Tree Method: for creating an extra-tree regression, the scikit-learn module provides sklearn.ensemble.ExtraTreesRegressor. It uses the same
parameters as sklearn.ensemble.ExtraTreesClassifier.
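A minimal sketch, again assuming synthetic data from make_regression (illustrative settings; the printed score may differ from the value shown below):
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, random_state=0)
ETregr = ExtraTreesRegressor(n_estimators=100, random_state=0)
score = cross_val_score(ETregr, X, y, cv=5)
print(score.mean())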
output:
0.9914243964799774
19-Scikit Learn-Clustering Performance Evaluation
There are many functions with the help of which we can evaluate the performance of clustering algorithms. The following are some important and commonly used functions
provided by Scikit-learn for evaluating clustering performance:
19-1-Adjusted Rand Index
The Rand index is a function that computes a similarity measure between two clusterings. For this computation, the Rand index considers all pairs of samples and counts
pairs that are assigned to the same or different clusters in the predicted and true clusterings.
It has two parameters, namely labels_true, which is the ground truth class labels, and labels_pred, which are the cluster labels to evaluate.
from sklearn.metrics import adjusted_rand_score
labels_true=[0,0,1,1,2]
labels_pred=[0,1,1,1,0]
print(adjusted_rand_score(labels_true,labels_pred))
output:
0.090
19-2-Mutual Information Based Score
Mutual information is a function that computes the agreement of two assignments, ignoring permutations. The following versions are available: normalized mutual information (NMI) and adjusted mutual information (AMI).
from sklearn.metrics import normalized_mutual_info_score
labels_true=[0,0,1,1,2]
labels_pred=[0,1,1,1,0]
print(normalized_mutual_info_score(labels_true,labels_pred))
output:
0.45
from sklearn.metrics import adjusted_mutual_info_score
labels_true=[0,0,1,1,2]
labels_pred=[0,1,1,1,0]
print(adjusted_mutual_info_score(labels_true,labels_pred))
output:
0.10
19-3-Fowlkes-Mallows Score
This function measures the similarity of two clusterings of a set of points. It may be defined as the geometric mean of the pairwise precision and recall:
FMS = TP / sqrt((TP + FP) * (TP + FN))
TP = True Positive: number of pairs of points belonging to the same cluster in both the true and the predicted labels.
FP = False Positive: number of pairs of points belonging to the same cluster in the true labels but not in the predicted labels.
FN = False Negative: number of pairs of points belonging to the same cluster in the predicted labels but not in the true labels.
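A minimal sketch, presumably using the same label lists as in the examples above:
from sklearn.metrics import fowlkes_mallows_score
labels_true=[0,0,1,1,2]
labels_pred=[0,1,1,1,0]
print(fowlkes_mallows_score(labels_true,labels_pred))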
output:
0.35
19-4-Silhouette Coefficient
This function computes the mean Silhouette Coefficient of all samples, using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each
sample:
S = (b - a) / max(a, b)
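A minimal sketch, assuming K-means clustering of the iris data (the number of clusters and the random seed are illustrative choices):
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
X = load_iris().data
kmeans = KMeans(n_clusters=3, random_state=1, n_init=10).fit(X)
labels = kmeans.labels_
print(metrics.silhouette_score(X, labels, metric='euclidean'))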
output:
0.55
20- Scikit Learn - Dimensionality Reduction using PCA
Dimensionality reduction, an unsupervised machine learning method, is used to reduce the number of feature variables for each data sample by selecting a set of principal features. Principal Component Analysis (PCA) is one of the popular algorithms for dimensionality reduction.
PCA is used for linear dimensionality reduction, using Singular Value Decomposition of the data to project it to a lower-dimensional space.
Scikit-learn provides the sklearn.decomposition.PCA module, which is implemented as a transformer object that learns n components in its fit() method. In the following example,
PCA is used to find the 5 best principal components.
import pandas as pd
from sklearn.decomposition import PCA
data=pd.read_csv('pima-indians-diabetes.csv')
data.columns=['preg','plas','pres','skin','test','mass','pedi','age','class']
array=data.values
X=array[:,0:8]
Y=array[:,8]
pca=PCA(n_components=5)
fit=pca.fit(X)
print(fit.components_)
output:
20-1-Incremental PCA
Incremental Principal Component Analysis (IPCA) is used to address the biggest limitation of PCA, which is that PCA only supports batch processing, meaning all the input
data to be processed must fit in memory. Scikit-learn provides the sklearn.decomposition.IncrementalPCA module, which makes it possible to implement out-of-core PCA,
either by using its partial_fit method on sequentially fetched chunks of data or by enabling the use of np.memmap, a memory-mapped file, without loading the entire file
into memory.
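A minimal sketch on the digits dataset (the number of components and batch size are illustrative):
from sklearn.datasets import load_digits
from sklearn.decomposition import IncrementalPCA
X, _ = load_digits(return_X_y=True)
transformer = IncrementalPCA(n_components=10, batch_size=100)
X_transformed = transformer.fit_transform(X)
print(X_transformed.shape)   # (1797, 10)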
20-2-Kernel PCA
Kernel Principal Component Analysis, an extension of PCA, achieves non-linear dimensionality reduction using kernels. It supports both transform and inverse_transform.
Scikit-learn provides the sklearn.decomposition.KernelPCA module.
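A minimal sketch on the digits dataset (the sigmoid kernel and the number of components are illustrative choices):
from sklearn.datasets import load_digits
from sklearn.decomposition import KernelPCA
X, _ = load_digits(return_X_y=True)
transformer = KernelPCA(n_components=10, kernel='sigmoid')
X_transformed = transformer.fit_transform(X)
print(X_transformed.shape)   # (1797, 10)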