Principal component analysis (PCA) is a technique used to simplify complex datasets. It works by converting a set of observations of possibly correlated variables into a set of linearly uncorrelated variables called principal components. PCA identifies patterns in data and expresses the data in such a way as to highlight their similarities and differences. The main implementations of PCA are eigenvalue decomposition and singular value decomposition. PCA is useful for data compression, for reducing dimensionality for visualization, and for building predictive models. However, it works best for data that follows a multidimensional normal distribution.
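As a rough illustration of the two implementations mentioned above, the sketch below (using NumPy, with a small random matrix standing in for real data) computes principal components once via eigendecomposition of the covariance matrix and once via SVD of the centered data; both routes recover the same directions up to sign.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))          # toy data: 200 observations, 5 variables
Xc = X - X.mean(axis=0)                # center each variable

# Route 1: eigendecomposition of the covariance matrix
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]
components_eig = eigvecs[:, order]               # columns = principal directions

# Route 2: singular value decomposition of the centered data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components_svd = Vt.T                            # same directions, up to sign

# The two sets of directions agree up to sign flips
agree = np.allclose(np.abs(components_eig), np.abs(components_svd), atol=1e-8)
print("eig and SVD give the same principal directions:", agree)

# Project the data onto the first two principal components
scores = Xc @ components_eig[:, :2]
print("reduced data shape:", scores.shape)       # (200, 2)
```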
Principal Component Analysis, or PCA, is a statistical method that lets you summarize the information contained in large data tables by means of a smaller set of "summary indices" that can be more easily visualized and analyzed.
This document discusses dimensionality reduction techniques for data mining. It begins with an introduction to dimensionality reduction and the reasons for using it, including problems of high-dimensional data such as the curse of dimensionality. It then covers the two major dimensionality reduction approaches: feature selection and feature extraction. Feature selection techniques discussed include search strategies, feature ranking, and evaluation measures; feature extraction maps the data to a lower-dimensional space. The document outlines applications of dimensionality reduction such as text mining and gene expression analysis, and concludes with trends in the field.
The document discusses exploratory data analysis (EDA) techniques in R. It explains that EDA involves analyzing data using visual methods to discover patterns. Common EDA techniques in R include descriptive statistics, histograms, bar plots, scatter plots, and line graphs. Tools like R and Python are useful for EDA due to their data visualization capabilities. The document also provides code examples for creating various graphs in R.
- Naive Bayes is a classification technique based on Bayes' theorem that uses "naive" independence assumptions. It is easy to build and can perform well even with large datasets.
- It works by calculating the posterior probability for each class given the predictor values, using Bayes' theorem together with the independence assumption between predictors. The class with the highest posterior probability is predicted (see the sketch after this list).
- It is commonly used for text classification, spam filtering, and sentiment analysis due to its fast performance and high success rates compared to other algorithms.
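A minimal sketch of that posterior calculation, assuming Gaussian class-conditional densities and a tiny synthetic dataset (all names and values here are illustrative, not from the original document):

```python
import numpy as np

# Tiny synthetic training set: 2 predictors, 2 classes (0 and 1)
X = np.array([[1.0, 2.1], [0.9, 1.8], [3.0, 3.2], [3.2, 2.9]])
y = np.array([0, 0, 1, 1])

def fit_gaussian_nb(X, y):
    """Estimate per-class priors, means and variances (the 'naive' part:
    each predictor is modelled independently within each class)."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X), Xc.mean(axis=0), Xc.var(axis=0) + 1e-9)
    return params

def predict(params, x):
    """Pick the class with the highest (log) posterior for observation x."""
    best_class, best_logpost = None, -np.inf
    for c, (prior, mean, var) in params.items():
        log_likelihood = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)
        log_posterior = np.log(prior) + log_likelihood   # Bayes' theorem, up to a constant
        if log_posterior > best_logpost:
            best_class, best_logpost = c, log_posterior
    return best_class

params = fit_gaussian_nb(X, y)
print(predict(params, np.array([1.1, 2.0])))   # expected: 0
print(predict(params, np.array([3.1, 3.0])))   # expected: 1
```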
Random forests are an ensemble learning method that constructs multiple decision trees during training and outputs the class that is the mode of the classes of the individual trees. It improves upon decision trees by reducing variance. The algorithm works by:
1) Randomly sampling cases and variables to grow each tree.
2) Splitting nodes using the Gini index or information gain on the randomly selected variables.
3) Growing each tree fully without pruning.
4) Aggregating the predictions of all trees using a majority vote. This reduces variance compared to a single decision tree (a usage sketch with scikit-learn follows this list).
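Assuming scikit-learn is available, its RandomForestClassifier exposes these steps directly: bootstrap sampling of cases, a random subset of variables per split, unpruned trees, and majority voting. A minimal usage sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic classification data standing in for a real dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=200,        # number of trees, each grown on a bootstrap sample of cases
    max_features="sqrt",     # random subset of variables considered at each split
    criterion="gini",        # impurity measure used to split nodes
    max_depth=None,          # grow each tree fully, no pruning
    random_state=0,
)
forest.fit(X_train, y_train)

# Prediction is a majority vote over the individual trees
print("test accuracy:", forest.score(X_test, y_test))
```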
Introduction to principal component analysis (PCA), by Mohammed Musah
This document provides an introduction to principal component analysis (PCA), outlining its purpose for data reduction and structural detection. It defines PCA as a linear combination of weighted observed variables. The procedure section discusses assumptions like normality, homoscedasticity, and linearity that are evaluated prior to PCA. Requirements for performing PCA include the variables being at the metric or nominal level, sufficient sample size and variable ratios, and adequate correlations between variables.
This document discusses various data reduction techniques including dimensionality reduction through attribute subset selection, numerosity reduction using parametric and non-parametric methods like data cube aggregation, and data compression. It describes how attribute subset selection works to find a minimum set of relevant attributes to make patterns easier to detect. Methods for attribute subset selection include forward selection, backward elimination, and bi-directional selection. Decision trees can also help identify relevant attributes. Data cube aggregation stores multidimensional summarized data to provide fast access to precomputed information.
Introduction to Maximum Likelihood Estimator, by Amir Al-Ansary
This document provides an overview of maximum likelihood estimation (MLE). It discusses key concepts like probability models, parameters, and the likelihood function. MLE aims to find the parameter values that make the observed data most likely. This can be done analytically by taking derivatives or numerically using optimization algorithms. Practical considerations like removing constants and using the log-likelihood are also covered. The document concludes by introducing the likelihood ratio test for comparing nested models.
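As a small illustration of the analytical and numerical routes (an example of my own, not taken from the document): for i.i.d. exponential observations the MLE of the rate is the reciprocal of the sample mean, and the same value can be recovered by numerically minimizing the negative log-likelihood.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
data = rng.exponential(scale=2.0, size=1000)   # true rate = 1 / scale = 0.5

# Analytical MLE for the exponential rate: lambda_hat = 1 / mean(x)
lambda_analytic = 1.0 / data.mean()

# Numerical MLE: minimize the negative log-likelihood
def neg_log_likelihood(lam):
    if lam <= 0:
        return np.inf
    return -(len(data) * np.log(lam) - lam * data.sum())

result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10.0), method="bounded")

print("analytical MLE:", lambda_analytic)
print("numerical  MLE:", result.x)             # should agree closely
```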
Principal Component Analysis (PCA) is a technique used to simplify complex data sets by identifying patterns in the data and expressing it in such a way as to highlight similarities and differences. It works by subtracting the mean from the data, calculating the covariance matrix, and determining the eigenvectors and eigenvalues to form a feature vector representing the data in a lower-dimensional space. PCA can be used to represent image data as a one-dimensional vector by stacking the pixel rows of an image and applying this analysis to multiple images.
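A sketch of that image case, assuming a small stack of equally sized grayscale images (synthetic here) and scikit-learn's PCA: each image is flattened into one row of the data matrix before the analysis.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
images = rng.random(size=(50, 32, 32))       # 50 synthetic 32x32 grayscale images

# Stack the pixel rows of each image into a single one-dimensional vector
X = images.reshape(len(images), -1)          # shape (50, 1024)

pca = PCA(n_components=10)                   # keep the 10 strongest components
codes = pca.fit_transform(X)                 # low-dimensional representation
reconstructed = pca.inverse_transform(codes).reshape(images.shape)

print("compressed shape:", codes.shape)                         # (50, 10)
print("explained variance:", pca.explained_variance_ratio_.sum())
```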
This document provides an overview of parametric and non-parametric supervised machine learning. Parametric learning uses a fixed number of parameters and makes strong assumptions about the data, while non-parametric learning uses a flexible number of parameters that grows with more data, making fewer assumptions. Common examples of parametric models include linear regression and logistic regression, while non-parametric examples include K-nearest neighbors, decision trees, and neural networks. The document also briefly discusses calculating parameters using ordinary least squares for parametric models and the limitations when data does not follow the predefined assumptions.
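For the parametric case, linear regression has a fixed parameter vector that ordinary least squares can recover in closed form; a minimal NumPy sketch on synthetic data (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
true_w = np.array([2.0, -1.0])
y = X @ true_w + 0.5 + rng.normal(scale=0.1, size=100)

# Add an intercept column and solve the least-squares problem for the weights
Xb = np.hstack([np.ones((len(X), 1)), X])
w_hat, *_ = np.linalg.lstsq(Xb, y, rcond=None)

print("estimated [intercept, w1, w2]:", w_hat)   # close to [0.5, 2.0, -1.0]
```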
The document discusses the K-nearest neighbors (KNN) algorithm, a simple machine learning algorithm used for classification problems. KNN works by finding the K training examples that are closest in distance to a new data point, and assigning the most common class among those K examples as the prediction for the new data point. The document covers how KNN calculates distances between data points, how to choose the K value, techniques for handling different data types, and the strengths and weaknesses of the KNN algorithm.
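A minimal sketch of the distance-plus-majority-vote idea using Euclidean distance on a tiny synthetic dataset (the function name and data are illustrative):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    distances = np.linalg.norm(X_train - x_new, axis=1)   # Euclidean distances
    nearest = np.argsort(distances)[:k]                   # indices of the k closest points
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array(["red", "red", "blue", "blue"])

print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))   # "red"
print(knn_predict(X_train, y_train, np.array([5.1, 5.1]), k=3))   # "blue"
```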
1) Machine learning involves developing algorithms that can learn from data and improve their performance over time without being explicitly programmed.
2) Neural networks are a type of machine learning algorithm inspired by the human brain that can perform both supervised and unsupervised learning tasks.
3) Supervised learning involves using labeled training data to infer a function that maps inputs to outputs, while unsupervised learning involves discovering hidden patterns in unlabeled data through techniques like clustering.
Decision tree is a type of supervised learning algorithm (having a pre-defined target variable) that is mostly used in classification problems. It is a tree in which each branch node represents a choice between a number of alternatives, and each leaf node represents a decision.
Data preprocessing involves transforming raw data into an understandable and consistent format. It includes data cleaning, integration, transformation, and reduction. Data cleaning aims to fill missing values, smooth noise, and resolve inconsistencies. Data integration combines data from multiple sources. Data transformation handles tasks like normalization and aggregation to prepare the data for mining. Data reduction techniques obtain a reduced representation of data that maintains analytical results but reduces volume, such as through aggregation, dimensionality reduction, discretization, and sampling.
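A small sketch of the cleaning and transformation steps, assuming scikit-learn and a toy numeric table with a missing value: imputation fills the gap (cleaning) and standardization handles the normalization mentioned above (transformation).

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy table with a missing value in the second column
X = np.array([[1.0, 200.0],
              [2.0, np.nan],
              [3.0, 240.0],
              [4.0, 260.0]])

preprocess = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),   # fill missing values (data cleaning)
    ("scale", StandardScaler()),                  # normalization (data transformation)
])

X_clean = preprocess.fit_transform(X)
print(X_clean)        # no NaNs; each column has mean 0 and unit variance
```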
1) Machine learning involves analyzing data to find patterns and make predictions. It uses mathematics, statistics, and programming.
2) Key aspects of machine learning include understanding the business problem, collecting and preparing data, building and evaluating models, and different types of machine learning algorithms like supervised, unsupervised, and reinforcement learning.
3) Common machine learning algorithms discussed include linear regression, logistic regression, KNN, K-means clustering, and decision trees, along with practical issues such as handling missing values and outliers and performing feature engineering.
This document discusses using complexity analysis as an advanced method for data mining. It involves 3 steps: 1) Measuring the information content in scatter plots using image entropy. 2) Identifying relationships between variables using mutual information. 3) Repeating steps 1 and 2 for all variable pairs to map dependencies and identify key interrelated and isolated variables. Complexity is defined by the number and nature of relationships, providing insights into system controllability, predictability, and distance from critical complexity. Complexity analysis can extract more useful insights than simple correlations when data is complex, turbulent or chaotic.
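As a rough sketch of step 2, a standard histogram-based mutual information estimate between two variables is shown below; this is one common way to compute MI, and the document's own entropy-based procedure may differ in detail.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram-based estimate of the mutual information between x and y (in nats)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nonzero = pxy > 0
    return float(np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero])))

rng = np.random.default_rng(0)
a = rng.normal(size=5000)
b = a + 0.1 * rng.normal(size=5000)     # strongly related to a
c = rng.normal(size=5000)               # independent of a

print("MI(a, b):", mutual_information(a, b))   # large
print("MI(a, c):", mutual_information(a, c))   # near zero
```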
Working with the data for Machine Learning, by Mehwish690898
The document discusses various techniques for dimensionality reduction in machine learning. It explains that dimensionality reduction transforms high-dimensional data into a lower-dimensional representation while retaining important information. Techniques include feature selection, which selects a subset of relevant features, and feature extraction, which transforms existing features into a new set of features. Principal component analysis (PCA) is presented as a feature extraction method that finds new axes along which the data has maximum variance.
Anomaly Detection for Real-World Systems, by Manojit Nandi
(1) Anomaly detection aims to identify data points that are noticeably different from expected patterns in a dataset.
(2) Common approaches include statistical modeling, machine learning classification, and algorithms designed specifically for anomaly detection.
(3) Streaming data poses unique challenges due to limited memory and the need for rapid identification of anomalies.
(4) Heuristics like z-scores and the median absolute deviation provide robust ways to measure how extreme observations are compared to a distribution's center (see the sketch after this list).
(5) Density-based methods quantify how isolated data points are in order to identify anomalies.
(6) Time series algorithms decompose trends and seasonality to identify global and local anomalous spikes and troughs.
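A sketch of the z-score and median-absolute-deviation heuristics from point (4), on a toy series; the 0.6745 constant scales the MAD so it is comparable to a standard deviation under normality.

```python
import numpy as np

values = np.array([10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0])   # 25.0 is the anomaly

# z-score: distance from the mean in units of standard deviation
z_scores = (values - values.mean()) / values.std()

# Modified z-score based on the median absolute deviation (MAD),
# which is robust because the median is barely affected by outliers
median = np.median(values)
mad = np.median(np.abs(values - median))
modified_z = 0.6745 * (values - median) / mad

print("z-scores:      ", np.round(z_scores, 2))
print("modified z:    ", np.round(modified_z, 2))
print("flagged by MAD:", values[np.abs(modified_z) > 3.5])     # [25.]
```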
This document discusses feature engineering, which is the process of transforming raw data into features that better represent the underlying problem for predictive models. It covers feature engineering categories like feature selection, feature transformation, and feature extraction. Specific techniques covered include imputation, handling outliers, binning, log transforms, scaling, and feature subset selection methods like filter, wrapper, and embedded methods. The goal of feature engineering is to improve machine learning model performance by preparing proper input data compatible with algorithm requirements.
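A brief sketch of a few of the transformations listed above (log transform, binning, and scaling) with pandas and scikit-learn on a toy column; the column name and bin edges are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({"income": [20_000, 35_000, 52_000, 80_000, 1_200_000]})

# Log transform: compresses the long right tail of a skewed feature
df["income_log"] = np.log1p(df["income"])

# Binning: discretize the raw values into ordered categories
df["income_bin"] = pd.cut(df["income"], bins=[0, 30_000, 60_000, np.inf],
                          labels=["low", "mid", "high"])

# Scaling: map the numeric feature into the [0, 1] range
df["income_scaled"] = MinMaxScaler().fit_transform(df[["income"]]).ravel()

print(df)
```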
A walk through the maze of understanding Data Visualization using several tools such as Python, R, Knime and Google Data Studio.
This workshop is hands-on, and this set of presentations is designed to serve as the agenda for the workshop.
Machine Learning for the System Administrator, by butest
This document discusses how machine learning techniques can be applied to system monitoring tasks performed by system administrators. It argues that machine learning can help improve the accuracy of monitoring by detecting complex relationships between system measurements that would be difficult for humans to specify. The document provides examples of how machine learning can be used to identify normal and abnormal system behavior based on the covariance, contravariance, or independence of measurement pairs, without needing explicit thresholds. It suggests this approach could provide more specific and sensitive monitoring than traditional threshold-based methods.
This document summarizes a study on human activity recognition using mobile sensors. It analyzes the performance of a k-nearest neighbors classifier on a publicly available dataset containing accelerometer data from sensors on both arms during various activities. The study finds that some activities are recognized more accurately than others by the basic classifier. It also shows that combining data from both arms and selecting optimal features improves recognition performance compared to using each arm individually with all features.
Machine Learning Algorithm for Business Strategy.pdf, by PhD Assistance
Many algorithms are based on the idea that classes can be separated by a straight line (or its higher-dimensional analog, a hyperplane). Support vector machines and logistic regression are two examples.
Website: https://www.phdassistance.com/blog/a-simple-guide-to-assist-you-in-selecting-the-best-machine-learning-algorithm-for-business-strategy/
This document discusses techniques for feature extraction in big data using distance covariance based principal component analysis (PCA). It provides background on big data and dimensionality reduction. It then explains distance covariance and how it can be used to calculate principal components for feature extraction in big data, which can help reduce computation time compared to traditional PCA. Some modifications of distance-PCA are proposed to eliminate the need for normalization of the data. Potential drawbacks and areas for future work are also outlined.
The presentation discusses the challenges of software estimation and provides techniques to improve the estimation process. It notes that typical software organizations struggle with estimates being incorrect by 100% or more. It emphasizes defining estimates, targets, and commitments differently and accounting for external factors, project dynamics, and risk when generating estimates. Additionally, it encourages moving projects out of the "cone of uncertainty" through well-defined requirements and products. Estimators are advised to count or compute estimates where possible and only use judgment as a last resort. Overall, the presentation provides guidance on avoiding underestimation by focusing on definition, risk, and quantifying work items.
Software/Application Development Estimation, by John Nollin
Why, as a software industry, are we struggling so often to hit our estimates? Do the problems reside in the way we are estimating a project or in the way we are executing it? Through 30+ years of modern software development, what have we learned, and what are we continuing to struggle with today?
In this session I would like to discuss not only the problems associated with estimation and how to avoid them, but more importantly how we can plan for them, turning our estimation process into not only an art but a science. We'll cover how to sell your estimate internally, arm you with the methodologies to support your numbers, and avoid the pitfalls that create projects that go 100% over budget.
What is the problem with software estimation?
-- The morale, metrics and realities we have learned over time
-- The results of our decades worth of estimation error
Avoiding Risk
-- Project entry point of sale
-- Risk association with point of sale
-- Products in the front, estimations in the back
-- Getting out of jail
-- Other lessons learned
The Elusive Discovery phase
-- How to estimate a discovery
-- How to sell a discovery
Planning for Risk
-- Estimation types
--- Gut - An art form
--- Comparables - An art/science
--- Factors/formula - A science
-- Contingency
--- Rating systems
--- Formulas
--- Granularity
Data Science, Big Data, and Analytics are terms we hear constantly these days. More than buzzwords, they are guiding how companies of different sizes think about and evolve their business models.
Let's demystify some of these concepts and show how we can start applying some of these techniques in our own projects. And, as one of the most widely used languages for data analysis, we will see how Python can help us on this journey.
This document provides tips for effective presentations, emphasizing the importance of planning, simplicity, time management, and professional interaction with the audience. The tips include thinking about the objective and the audience, keeping the structure simple, using high-quality images sparingly, showing passion for the topic, and politely maintaining eye contact with the audience.
Kintsugi is the Japanese art of repairing broken pottery with gold or silver lacquer and appreciating the piece for its history rather than hiding the damage. This relates to the Japanese concepts of wabi sabi and the impermanence of all things, as nothing lasts forever and perfection is fleeting.
This document provides an overview of machine learning techniques including artificial neural networks, clustering, genetic algorithms, and reinforcement learning. It discusses how machines can learn through supervised and unsupervised methods, using techniques from statistics, brain modeling, and more. Specific algorithms covered include backpropagation for training neural networks, k-means clustering, genetic algorithms that represent solutions as chromosomes, and reinforcement learning approaches like Markov decision processes. The goal is to explain how different machine learning methods can allow computers to learn without being explicitly programmed.
Generative AI technology is a fascinating field that focuses on creating comp..., by Nohoax Kanont
Generative AI technology is a fascinating field that focuses on creating computer models capable of generating new, original content. It leverages the power of large language models, neural networks, and machine learning to produce content that can mimic human creativity. This technology has seen a surge in innovation and adoption since the introduction of ChatGPT in 2022, leading to significant productivity benefits across various industries. With its ability to generate text, images, video, and audio, generative AI is transforming how we interact with technology and the types of tasks that can be automated.
Multimodal Embeddings (continued) - South Bay Meetup Slides, by Zilliz
Frank Liu will walk through the history of embeddings and how we got to the cool embedding models used today. He'll end with a demo on how multimodal RAG is used.
Increase Quality with User Access Policies - July 2024, by Peter Caitens
⭐️ Increase Quality with User Access Policies ⭐️, presented by Peter Caitens and Adam Best of Salesforce. View the slides from this session to hear all about “User Access Policies” and how they can help you onboard users faster with greater quality.
Selling software today doesn't look anything like it did a few years ago, especially software that runs inside a customer environment. Dreamfactory has used Anchore and Ask Sage to achieve compliance in record time, reducing attack surface to keep vulnerability counts low and configuring automation to meet those compliance requirements. After achieving compliance, they are keeping up to date with Anchore Enterprise in their CI/CD pipelines.
The CEO of Ask Sage, Nic Chaillan, the CEO of Dreamfactory Terence Bennet, and Anchore’s VP of Security Josh Bressers are going to discuss these hard problems.
In this webinar we will cover:
- The standards Dreamfactory decided to use for their compliance efforts
- How Dreamfactory used Ask Sage to collect and write up their evidence
- How Dreamfactory used Anchore Enterprise to help achieve their compliance needs
- How Dreamfactory is using automation to stay in compliance continuously
- How reducing attack surface can lower vulnerability findings
- How you can apply these principles in your own environment
When you do security right, they won’t know you’ve done anything at all!
Project Delivery Methodology on a page with activities, deliverables, by CLIVE MINCHIN
I've not found a 1-pager like this anywhere, so I created it based on my experiences. This 1-pager details a waterfall-style project methodology with defined phases, activities, deliverables, and assumptions. There's nothing in here that conflicts with common sense.
Project management Course in Australia.pptx, by deathreaper9
Project Management Course
Over the past few decades, organisations have discovered something incredible: the principles that lead to great success on large projects can be applied to projects of any size to achieve extraordinary success. As a result, many employees are expected to be familiar with project management techniques and how they apply them to projects.
https://projectmanagementcoursesonline.au/
Planetek Italia is an Italian Benefit Company established in 1994, which employs 120+ women and men, passionate and skilled in Geoinformatics, Space solutions, and Earth science.
We provide solutions to exploit the value of geospatial data through all phases of the data life cycle. We operate in many application areas, ranging from environmental and land monitoring to open government and smart cities, and including defence and security as well as Space exploration and EO satellite missions.
DefCamp_2016_Chemerkin_Yury-publish.pdf - Presentation by Yury Chemerkin at DefCamp 2016 discussing mobile app vulnerabilities, data protection issues, and analysis of security levels across different types of mobile applications.
Securiport Gambia is a civil aviation and intelligent immigration solutions provider founded in 2001. The company was created to address security needs unique to today’s age of advanced technology and security threats. Securiport Gambia partners with governments, coming alongside their border security to create and implement the right solutions.
Lecture 8 of the IVE 2024 short course on the Psychology of XR.
This lecture introduced the basics of Electroencephalography (EEG).
It was taught by Ina and Matthias Schlesewsky on July 16th 2024 at the University of South Australia.
Using ScyllaDB for Real-Time Write-Heavy Workloads, by ScyllaDB
Keeping latencies low for highly concurrent, intensive data ingestion
ScyllaDB’s “sweet spot” is workloads over 50K operations per second that require predictably low (e.g., single-digit millisecond) latency. And its unique architecture makes it particularly valuable for the real-time write-heavy workloads such as those commonly found in IoT, logging systems, real-time analytics, and order processing.
Join ScyllaDB technical director Felipe Cardeneti Mendes and principal field engineer, Lubos Kosco to learn about:
- Common challenges that arise with real-time write-heavy workloads
- The tradeoffs teams face and tips for negotiating them
- ScyllaDB architectural elements that support real-time write-heavy workloads
- How your peers are using ScyllaDB with similar workloads
5. Intuition fails in high dimensions: Building a classifier in two or three dimensions is relatively easy; it's usually possible to find a reasonable frontier between examples of different classes just by visual inspection.
11. Objective of PCA: To perform dimensionality reduction while preserving as much of the randomness in the high-dimensional space as possible.
12. Principal Component Analysis: It takes your cloud of data points and rotates it such that the maximum variability is visible. PCA is mainly concerned with identifying correlations in the data.
13. Measuring Correlation: The degree and type of relationship between any two or more quantities (variables) in which they vary together over a period. Correlation can vary from +1 to -1: values close to +1 indicate a high degree of positive correlation, values close to -1 indicate a high degree of negative correlation, values close to zero indicate poor correlation of either kind, and 0 indicates no correlation at all.
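A quick way to see those values in practice: np.corrcoef returns the Pearson correlation matrix for the given variables (the toy data below is illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y_pos = 2 * x + 0.1 * rng.normal(size=1000)    # strongly positively correlated with x
y_neg = -x + 0.1 * rng.normal(size=1000)       # strongly negatively correlated with x
y_none = rng.normal(size=1000)                 # essentially uncorrelated with x

print(np.corrcoef(x, y_pos)[0, 1])    # close to +1
print(np.corrcoef(x, y_neg)[0, 1])    # close to -1
print(np.corrcoef(x, y_none)[0, 1])   # close to 0
```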
18. Steps for PCA: 1. Standardize the data. 2. Calculate the covariance matrix. 3. Find the eigenvalues and eigenvectors of the covariance matrix. 4. Plot the eigenvectors / principal components over the scaled data.
22. Agile Analytics: We could use PCA as a tool to quickly identify correlation between features, helping with feature extraction and selection. Reducing dimensionality using PCA or another similar technique can help us achieve better and quicker results.