Info PCA
If life is like a box of chocolates and you never know what you are going to get, is there a way to reduce at least some of the uncertainty? Dimensionality reduction is the
process of reducing the number of random variables impacting your data.
Come and explore, but make sure you don't let the chocolates melt.
Principal Component Analysis (PCA) is an unsupervised, non-parametric statistical technique primarily used for dimensionality
reduction in machine learning.
It makes a large data set simpler and easier to explore and visualize, and it reduces the computational
complexity of the model, which makes machine learning algorithms run faster.
Principal Component Analysis (PCA) is a simple yet popular and useful linear transformation technique that is used in numerous applications, such as
stock market prediction, the analysis of gene expression data, and many more.
The main goal of a PCA analysis is to identify patterns in data; PCA aims to detect the correlation between variables. Attempting to reduce the
dimensionality only makes sense if a strong correlation between variables exists. In a nutshell, this is what PCA is all about: finding the directions of
maximum variance in high-dimensional data and projecting it onto a lower-dimensional subspace while retaining most of the information.
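To make that idea concrete, here is a minimal from-scratch sketch of PCA with NumPy: center the data, compute the covariance matrix, take its eigenvectors, and project onto the top-k directions of maximum variance. The function name pca_project and the toy data are illustrative, not from any particular library:

```python
import numpy as np

def pca_project(X, k):
    # Center each feature at zero; PCA directions are defined on centered data
    X_centered = X - X.mean(axis=0)
    # d x d covariance matrix (columns are the variables)
    cov = np.cov(X_centered, rowvar=False)
    # eigh handles symmetric matrices; eigenvalues come back in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Sort directions by explained variance, descending, and keep the top k
    order = np.argsort(eigvals)[::-1]
    components = eigvecs[:, order[:k]]
    # Project the centered data onto the k principal components
    return X_centered @ components

# Toy usage: 100 samples, 3 features with one strong correlation, reduced to 2
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 1] = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)
print(pca_project(X, 2).shape)  # (100, 2)
```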
Both Linear Discriminant Analysis (LDA) and PCA are linear transformation methods. PCA yields the directions (principal components) that maximize
the variance of the data, whereas LDA aims to find the directions that maximize the separation (or discrimination) between different classes, which
can be useful in pattern classification problems (PCA "ignores" class labels). The sketch after the feature list below shows the two side by side.
A classic example is the Iris dataset, which contains 150 flower samples from three species:
1. Iris-setosa (n=50)
2. Iris-versicolor (n=50)
3. Iris-virginica (n=50)
Each sample is described by four features:
1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm
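To see the PCA-versus-LDA contrast on this data, here is a minimal sketch assuming scikit-learn is installed (its load_iris helper bundles exactly these 150 samples with their class labels):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # X: 150 x 4 feature matrix, y: species labels 0..2

# PCA looks only at X: directions of maximum variance, class labels ignored
X_pca = PCA(n_components=2).fit_transform(X)

# LDA also consumes y: directions that best separate the three species
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)  # (150, 2) (150, 2)
```

Plotting X_pca and X_lda colored by y makes the difference visible: both reduce four dimensions to two, but only LDA was told where the class boundaries are.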
Example of Calculating a Covariance Matrix
The covariance matrix is a math concept that occurs in several areas of machine learning. If you have a set of n numeric data
items, where each data item has d dimensions, then the covariance matrix is a d-by-d symmetric square matrix with
variance values on the diagonal and covariance values off the diagonal.
Suppose you have a set of n=5 data items, representing 5 people, where each data item has a Height (X), test Score (Y), and Age (Z)
(therefore d = 3):

       X      Y     Z
    64.0  580.0  29.0
    66.0  570.0  33.0
    68.0  590.0  37.0
    69.0  660.0  46.0
    73.0  600.0  55.0

The resulting covariance matrix is:

            X        Y       Z
    X   11.50    50.00   34.75
    Y   50.00  1250.00  205.00
    Z   34.75   205.00  110.00
The 11.50 is the variance of X, 1250.0 is the variance of Y, and 110.0 is the variance of Z. In words: to get a variance, subtract
the dimension's mean from each value, square the differences, add them up, and divide by n-1. For example, for X:

Var(X) = [ (64-68.0)^2 + (66-68.0)^2 + (68-68.0)^2 + (69-68.0)^2 + (73-68.0)^2 ] / (5-1)
       = (16.0 + 4.0 + 0.0 + 1.0 + 25.0) / 4
       = 46.0 / 4
       = 11.50
Covariance follows the same pattern, except that instead of squaring one dimension's deviations, you multiply the paired deviations of two dimensions:

Covar(XY) = [ (64-68.0)(580-600.0) + (66-68.0)(570-600.0) + (68-68.0)(590-600.0) + (69-68.0)(660-600.0) + (73-68.0)(600-600.0) ] / (5-1)
          = (80.0 + 60.0 + 0.0 + 60.0 + 0.0) / 4
          = 200 / 4
          = 50.0
If you examine the calculations carefully, you'll see the pattern for computing the covariances of the XZ and YZ pairs. You'll also see that Covar(XY) =
Covar(YX).
One way to think about a covariance matrix is that it is a numerical summary of how variable a dataset is.
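Those hand calculations are easy to double-check with NumPy; a small sketch using the same five data items (np.cov with rowvar=False treats each column as a dimension, and its default divisor is n-1, matching the formulas above):

```python
import numpy as np

# The five data items: columns are Height (X), test Score (Y), Age (Z)
data = np.array([
    [64, 580, 29],
    [66, 570, 33],
    [68, 590, 37],
    [69, 660, 46],
    [73, 600, 55],
], dtype=float)

# rowvar=False: each column is a variable; the default divisor is n-1
print(np.cov(data, rowvar=False))
# [[  11.5    50.     34.75]
#  [  50.   1250.    205.  ]
#  [  34.75  205.    110.  ]]
```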
%matplotlib inline is an example of a predefined magic function in IPython. Magic functions are frequently used in
interactive environments like Jupyter notebooks. %matplotlib inline makes your plot outputs appear and
be stored within the notebook.
pandas is a library that you install, so it's local to your Python installation. import pandas as pd simply
imports the library into the current namespace, but rather than using the name pandas, it's instructed to
use the name pd instead.
pyplot is matplotlib's plotting framework. The import line import matplotlib.pyplot as plt merely imports the module
matplotlib.pyplot and binds it to the name plt.
Seaborn: the seaborn package was developed on top of the Matplotlib library. It is used to create more
attractive and informative statistical graphics. While seaborn is a separate package, it can also be used
to improve the appearance of matplotlib graphics.
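Putting those pieces together, a typical notebook preamble for the kind of analysis described above might look like this (sns.set_theme requires seaborn 0.11 or newer; older versions use sns.set):

```python
# Render plots inline in the notebook (Jupyter/IPython magic, not plain Python)
%matplotlib inline

import pandas as pd              # pandas, bound to the shorter name pd
import matplotlib.pyplot as plt  # matplotlib's plotting framework, bound to plt
import seaborn as sns            # statistical graphics built on top of matplotlib

sns.set_theme()  # apply seaborn's default styling to all matplotlib plots
```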