Info PCA
If life is like a box of chocolates and you never know what you are going to get, is there a way to reduce at least some of the uncertainty? Dimensionality reduction is the
process of reducing the number of random variables impacting your data.
Come and explore, but make sure you don't let the chocolates melt.
Principal Component Analysis (PCA) is an unsupervised, non-parametric statistical technique primarily used for dimensionality
reduction in machine learning.
It makes a large data set simpler and easier to explore and visualize, and it reduces the computational
complexity of the model, which makes machine learning algorithms run faster.
Principal Component Analysis (PCA) is a simple yet popular and useful linear transformation technique that is used in numerous applications, such as
stock market prediction, the analysis of gene expression data, and many more.
The main goal of a PCA analysis is to identify patterns in data; PCA aims to detect the correlation between variables. Attempting to reduce the
dimensionality only makes sense if a strong correlation between variables exists. In a nutshell, this is what PCA is all about: finding the directions of
maximum variance in high-dimensional data and projecting it onto a lower-dimensional subspace while retaining most of the information.
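To make that idea concrete, here is a minimal from-scratch sketch of PCA with NumPy: center the data, compute the covariance matrix, take its eigenvectors, and project onto the top-k directions of maximum variance. The function name pca_project and the toy data are illustrative, not from any particular library:

```python
import numpy as np

def pca_project(X, k):
    # Center each feature at zero; PCA directions are defined on centered data
    X_centered = X - X.mean(axis=0)
    # d x d covariance matrix (columns are the variables)
    cov = np.cov(X_centered, rowvar=False)
    # eigh handles symmetric matrices; eigenvalues come back in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Sort directions by explained variance, descending, and keep the top k
    order = np.argsort(eigvals)[::-1]
    components = eigvecs[:, order[:k]]
    # Project the centered data onto the k principal components
    return X_centered @ components

# Toy usage: 100 samples, 3 features with one strong correlation, reduced to 2
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 1] = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)
print(pca_project(X, 2).shape)  # (100, 2)
```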
Both Linear Discriminant Analysis (LDA) and PCA are linear transformation methods. PCA yields the directions (principal components) that maximize
the variance of the data, whereas LDA aims to find the directions that maximize the separation (or discrimination) between different classes, which
can be useful in pattern classification problems (PCA "ignores" class labels). The sketch after the feature list below shows the two side by side.
A classic example is the Iris dataset, which contains 150 flower samples from three species:
1. Iris-setosa (n=50)
2. Iris-versicolor (n=50)
3. Iris-virginica (n=50)
Each sample is described by four features:
1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm
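To see the PCA-versus-LDA contrast on this data, here is a minimal sketch assuming scikit-learn is installed (its load_iris helper bundles exactly these 150 samples with their class labels):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # X: 150 x 4 feature matrix, y: species labels 0..2

# PCA looks only at X: directions of maximum variance, class labels ignored
X_pca = PCA(n_components=2).fit_transform(X)

# LDA also consumes y: directions that best separate the three species
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)  # (150, 2) (150, 2)
```

Plotting X_pca and X_lda colored by y makes the difference visible: both reduce four dimensions to two, but only LDA was told where the class boundaries are.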
Example of Calculating a Covariance Matrix
The covariance matrix is a math concept that occurs in several areas of machine learning. If you have a set of n numeric data
items, where each data item has d dimensions, then the covariance matrix is a d-by-d symmetric square matrix with
variance values on the diagonal and covariance values off the diagonal.
Suppose you have a set of n=5 data items, representing 5 people, where each data item has a Height (X), test Score (Y), and Age (Z)
(therefore d = 3):

       X      Y     Z
    64.0  580.0  29.0
    66.0  570.0  33.0
    68.0  590.0  37.0
    69.0  660.0  46.0
    73.0  600.0  55.0

The resulting covariance matrix is:

            X        Y       Z
    X   11.50    50.00   34.75
    Y   50.00  1250.00  205.00
    Z   34.75   205.00  110.00
The 11.50 is the variance of X, 1250.0 is the variance of Y, and 110.0 is the variance of Z. In words: to get a variance, subtract
the dimension's mean from each value, square the differences, add them up, and divide by n-1. For example, for X:

Var(X) = [ (64-68.0)^2 + (66-68.0)^2 + (68-68.0)^2 + (69-68.0)^2 + (73-68.0)^2 ] / (5-1)
       = (16.0 + 4.0 + 0.0 + 1.0 + 25.0) / 4
       = 46.0 / 4
       = 11.50
Covariance follows the same pattern, except that instead of squaring one dimension's deviations, you multiply the paired deviations of two dimensions:

Covar(XY) = [ (64-68.0)(580-600.0) + (66-68.0)(570-600.0) + (68-68.0)(590-600.0) + (69-68.0)(660-600.0) + (73-68.0)(600-600.0) ] / (5-1)
          = (80.0 + 60.0 + 0.0 + 60.0 + 0.0) / 4
          = 200 / 4
          = 50.0
If you examine the calculations carefully, you'll see the pattern for computing the covariances of the XZ and YZ pairs. You'll also see that Covar(XY) =
Covar(YX).
One way to think about a covariance matrix is that it is a numerical summary of how variable a dataset is.
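Those hand calculations are easy to double-check with NumPy; a small sketch using the same five data items (np.cov with rowvar=False treats each column as a dimension, and its default divisor is n-1, matching the formulas above):

```python
import numpy as np

# The five data items: columns are Height (X), test Score (Y), Age (Z)
data = np.array([
    [64, 580, 29],
    [66, 570, 33],
    [68, 590, 37],
    [69, 660, 46],
    [73, 600, 55],
], dtype=float)

# rowvar=False: each column is a variable; the default divisor is n-1
print(np.cov(data, rowvar=False))
# [[  11.5    50.     34.75]
#  [  50.   1250.    205.  ]
#  [  34.75  205.    110.  ]]
```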
%matplotlib inline is an example of a predefined magic function in IPython. Magic functions are frequently used in
interactive environments like Jupyter notebooks. %matplotlib inline makes your plot outputs appear and
be stored within the notebook.
pandas is a library that you install, so it's local to your Python installation. import pandas as pd simply
imports the library into the current namespace, but rather than using the name pandas, it's instructed to
use the name pd instead.
pyplot is matplotlib's plotting framework. The import line import matplotlib.pyplot as plt merely imports the module
matplotlib.pyplot and binds it to the name plt.
Seaborn: the seaborn package was developed on top of the Matplotlib library. It is used to create more
attractive and informative statistical graphics. While seaborn is a separate package, it can also be used
to improve the appearance of matplotlib graphics.
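Putting those pieces together, a typical notebook preamble for the kind of analysis described above might look like this (sns.set_theme requires seaborn 0.11 or newer; older versions use sns.set):

```python
# Render plots inline in the notebook (Jupyter/IPython magic, not plain Python)
%matplotlib inline

import pandas as pd              # pandas, bound to the shorter name pd
import matplotlib.pyplot as plt  # matplotlib's plotting framework, bound to plt
import seaborn as sns            # statistical graphics built on top of matplotlib

sns.set_theme()  # apply seaborn's default styling to all matplotlib plots
```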