
DATA SCIENCE

CLASS 11:
CLUSTERING AND
DIMENSIONALITY REDUCTION

AGENDA

I. CLUSTERING (K-MEANS AND HIERARCHICAL)
II. PRINCIPAL COMPONENT ANALYSIS
III. SUPPORT VECTOR MACHINES
CLUSTERING AND DIMENSIONALITY
REDUCTION

I. CLUSTERING
CLUSTERING
• Clustering is an unsupervised method where observations are grouped together due to
their feature similarity, but in a way not optimized to predict a certain class or feature.

• You can think of clustering as just another form of dimensionality reduction – we are
reducing the k features used to build the clusters to just one feature: the cluster labels themselves.

• Clustering is useful for (a) recommendation algorithms (especially if the customer
does not show clear intent to buy a specific product); (b) reducing dimensionality
ahead of prediction; (c) grouping or binning data (such as customer behavior) in an
objective, machine-driven way; and (d) visualizing data.
CLUSTERING
• There are many types of clustering depending on your application. The most common
are K-means and hierarchical clustering.

• K-means clustering creates clusters of data around centroids – the ‘average’ points of all
the points assigned to each cluster.

• Hierarchical clustering groups data together by absolute distance, and then further
groups up the hierarchy when distances cross a given threshold.
K-MEANS IMPLEMENTATION
• The most popular implementation of K-means (called Lloyd’s algorithm) uses
the following process to ‘lock in’ on the data’s proper centroids:

1. Pick the number of clusters you want to create, k.
2. Pick k observations at random to serve as the initial centroids of the data set.
3. Calculate the distance of each observation to each of the k centroids.
4. Assign each observation to the cluster of its nearest centroid.
5. Re-calculate each centroid as the ‘average’ value of its cluster.
6. Reassign each observation to the cluster corresponding to its nearest
centroid.
7. Continue steps 5-6 until no observations are re-assigned after new
centroids are calculated OR the reassignment no longer decreases mean
cluster loss (as defined by average Euclidean distance of all points to their
centroids).
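• A minimal NumPy sketch of Lloyd's algorithm as described above; the function and variable names are illustrative, and in practice you would normally use scikit-learn's KMeans, which adds smarter initialization and stopping rules.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: pick k random observations as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Steps 3-4: assign each observation to its nearest centroid (Euclidean distance)
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 5: recompute each centroid as the mean of its cluster
        # (assumes no cluster goes empty; a production implementation would re-seed it)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 7: stop once the centroids no longer move (no further reassignment)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```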
K-MEANS
K-MEANS PROS AND CONS
• Pros:
• Simple, intuitive algorithm
• Fast execution
• Effective for two-dimensional or geospatial data

• Cons:
• Tendency to converge to local minima or dense regions of data (especially if you
pick your starting points at random)
• Produces nonsensical centroids if the data does not form compact, well-separated groups
K-MEANS EXTENSIONS
• Due to the limitations of K-means, a number of related methods are more commonly
used to derive meaningful clusters:

• One simple extension of K-means is to repeatedly run the algorithm with different
initialization sets, and average the results.
• K-medoids assigns centroids to actual observations in your dataset.
• Expectation-Maximization (EM) clustering derives clusters by calculating
confidence that the found centroids are the ‘true’ centroids for the dataset.
• Density-Based Spatial Clustering of Applications with Noise (DBSCAN) grows clusters
from densely packed points, so it can recover irregular, non-spherical cluster shapes and
flag isolated points as noise.
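• A hedged scikit-learn sketch of these extensions (K-medoids itself lives in the separate scikit-learn-extra package, so it appears only as a comment); the synthetic blobs and parameter values are illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=0, scale=1, size=(100, 2)),
               rng.normal(loc=5, scale=1, size=(100, 2))])  # two synthetic blobs

# Repeated random initializations: n_init reruns K-means with different starts
# and keeps the lowest-inertia run (scikit-learn's take on rerunning K-means)
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# K-medoids: see KMedoids in the scikit-learn-extra package (not in core scikit-learn)

# EM clustering: GaussianMixture returns soft 'confidence' (probabilities) per cluster
gm = GaussianMixture(n_components=2, random_state=0).fit(X)
gm_probs = gm.predict_proba(X)

# DBSCAN: density-based, needs a neighborhood radius (eps) and a minimum neighbor count
db_labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)
```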
K-MEDOIDS

EM CLUSTERING

HIERARCHICAL CLUSTERING
• While k-means clustering tries to determine a given
set of discrete clusters, hierarchical clustering
attempts to determine the relationship between each
observation and each cluster at every level of the hierarchy.

• Hierarchical clustering is typically visualized via a
dendrogram: a representation of the relationship of each point against some sort of
dissimilarity measure (typically Euclidean distance).
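• A minimal SciPy sketch of building and cutting a dendrogram; the synthetic data and Ward linkage are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(20, 2)), rng.normal(6, 1, size=(20, 2))])

Z = linkage(X, method='ward')   # build the merge hierarchy (Euclidean distance)
dendrogram(Z)                   # plot each merge against its dissimilarity
plt.ylabel('Dissimilarity (Euclidean)')
plt.show()

labels = fcluster(Z, t=2, criterion='maxclust')  # cut the tree into 2 discrete clusters
```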
HIERARCHICAL CLUSTERING
• Hierarchical clustering typically gives you a
more accurate representation of the similarity
between your data points than most other techniques.

• However, as you must choose an arbitrary cutoff
point to split your data, you may get highly
unbalanced clusters or markedly different
numbers of clusters as the data changes
through time.

• Examples of use cases include identifying
customer behavior groupings or similarities
between genes.
IN-CLASS EXERCISE: CLUSTERING
• Using the Baseball dataset, we will try to cluster teams by two dimensions: their
average salaries and their number of hits:
1. Create an annotated plot of the data showing team name.
2. Perform K-means clustering and plot the results.
3. Perform DBSCAN clustering and plot the results.
4. Create a hierarchical cluster of the data and plot a dendrogram.
5. Create a cutoff in the dendrogram to create a discrete number of clusters.
6. Plot the results and compare to K-means and DBSCAN.
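• A hedged sketch of the exercise; the file name and column names ('team', 'avg_salary', 'hits') are hypothetical stand-ins for whatever the Baseball dataset actually uses, and the cluster counts are arbitrary.

```python
import matplotlib.pyplot as plt
import pandas as pd
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from sklearn.cluster import DBSCAN, KMeans
from sklearn.preprocessing import StandardScaler

teams = pd.read_csv('baseball.csv')   # hypothetical file name
X = StandardScaler().fit_transform(teams[['avg_salary', 'hits']])  # hypothetical columns

# 1. Annotated scatter plot of the two raw dimensions
fig, ax = plt.subplots()
ax.scatter(teams['avg_salary'], teams['hits'])
for _, row in teams.iterrows():
    ax.annotate(row['team'], (row['avg_salary'], row['hits']))

# 2-3. K-means and DBSCAN labels (plot each the same way, colored by label)
km_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
db_labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)

# 4-5. Hierarchical clustering, its dendrogram, and a cutoff into discrete clusters
Z = linkage(X, method='ward')
dendrogram(Z)
hc_labels = fcluster(Z, t=4, criterion='maxclust')
```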
CLUSTERING AND DIMENSIONALITY
REDUCTION

II. PRINCIPAL
COMPONENT
ANALYSIS
PRINCIPAL COMPONENT ANALYSIS
• Recall that in previous classes, you learned feature selection, i.e. a recursive process to
determine the bag of variables that allow your model to optimize predictive accuracy.

• But, recall the problems with recursive feature elimination:
• Current implementations do not explore all possible feature interactions.
• Elimination is done in a rank-ordered way, which can be misleading as rank and/or
significance of a feature can change as you eliminate other features.
• Implementation of RFE is very computationally intensive.
• There is no guarantee that you will eliminate linearly related variables if they make
it through your initial preprocessing.
PRINCIPAL COMPONENT ANALYSIS
• Moreover, recursive feature elimination can become unstable if you have multiple
correlated or related features, each of which has a weak contribution to overall
accuracy.
• This shows the curse of dimensionality: as you add related features to your dataset, it
is much harder for a machine learning algorithm to determine which ones are truly
predictive, and which are just correlated to the predictive features.
• This typically gives you nonsensical results, where, for example, you see highly
positive and negative coefficients if you use linear methods.
• Most social science and business data suffers from this – for example, when you see a
rise in revenue, most of your other features rise as well – but which one drove the rise
in revenue?
PRINCIPAL COMPONENT ANALYSIS
• To get around these issues, data scientists often use principal component analysis to
‘decompose’ the data into n dimensions.

• However, the simpler methods you will learn today (such as PCA) will only
decompose your data properly if the features have a linear relationship with one another.

• In addition, once you have transformed your data, it becomes un-interpretable, as it
no longer has a direct connection to any one feature. As such, you will have no way of
identifying a problem with an underlying feature by looking at the PCA output.

• In addition, PCA is dependent on the initial scaling of the variables – so if one variable
is scaled to have a larger magnitude, it will dominate your decomposition.
PRINCIPAL COMPONENT ANALYSIS
• Principal Component Analysis (PCA) decomposes your (sometimes-correlated)
variables into a set of linearly uncorrelated variables called principal
components.
• Here’s how principal component analysis works:
1. Center each feature around 0 by subtracting the feature’s mean from each observation.
2. Calculate the covariance matrix of the scaled variables in the data.
3. Calculate the eigenvalues and eigenvectors of that matrix.
• Eigenvectors act like successive best-fit directions: the first captures the direction of
greatest variance, and each subsequent eigenvector captures the greatest remaining
variance orthogonal to the ones before it.
4. Sort the eigenvectors by the size of their corresponding eigenvalues and
determine a cutoff (typically near 0) below which you discard the eigenvector.
5. Project your original data onto each retained eigenvector. The projection onto the first
eigenvector is called your first principal component.
• See http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf for
more information.
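• A minimal NumPy sketch of the steps above; in practice you would usually reach for sklearn.decomposition.PCA, but the from-scratch version makes each step explicit.

```python
import numpy as np

def pca(X, n_components=2):
    # 1. Center each feature around 0
    X_centered = X - X.mean(axis=0)
    # 2. Covariance matrix of the centered features
    cov = np.cov(X_centered, rowvar=False)
    # 3. Eigenvalues and eigenvectors (eigh: the covariance matrix is symmetric)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # 4. Sort by eigenvalue, largest first, and keep the top n_components
    order = np.argsort(eigenvalues)[::-1]
    components = eigenvectors[:, order[:n_components]]
    # 5. Project the centered data onto the retained eigenvectors
    return X_centered @ components, eigenvalues[order]
```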
WHAT’S AN EIGENVALUE AND AN EIGENVECTOR?
• An eigenvector of a square matrix A is a non-zero vector v that the matrix only stretches,
never rotates: Av = λv. The eigenvalue λ is the scalar or ‘fit’ on the eigenvector.

• You can solve for the eigenvalues via the characteristic equation: det(A − λI) = 0.

• Eigenvectors are then solved by substituting each eigenvalue λ back into
(A − λI)v = 0 and solving for v.
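• A worked NumPy example on an illustrative 2×2 matrix (not necessarily the matrix from the original slide):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# det(A - lambda*I) = 0 gives eigenvalues 3 and 1 for this matrix
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)         # eigenvalues (order not guaranteed)
print(eigenvectors[:, 0])  # unit eigenvector for the first eigenvalue: A @ v == lambda * v
```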
HOW MANY PRINCIPAL COMPONENTS SHOULD YOU KEEP?
• A scree plot lets you look at how
much variance is explained by each
principal component.

• Apply the elbow test to the plot:
only take those components to the
left of the ‘elbow’ in explained
variance.
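• A sketch of building a scree plot with scikit-learn and matplotlib; the random matrix X is only a stand-in for your own scaled features, so don't expect a meaningful elbow from it.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(200, 8))  # stand-in for your scaled feature matrix

pca = PCA().fit(X)  # fit all components so each one's explained variance can be inspected
plt.plot(range(1, X.shape[1] + 1), pca.explained_variance_ratio_, marker='o')
plt.xlabel('Principal component')
plt.ylabel('Proportion of variance explained')
plt.show()
```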
RELATED TECHNIQUES
• Besides PCA, there are a number of other related techniques that rely on matrix
decomposition:

• Singular Value Decomposition
• Pros: Less prone to overfitting than PCA, faster to compute
• Cons: Same as PCA: assumes linear relationships between variables, no guarantee that
principal components will be fit on features that are most important to dividing classes or
helping predict your regressand.

• Linear Discriminant Analysis
• Pros: Dimensionality reduction is supervised, so the differences returned are by their nature
relevant to your classification or regression problem.
• Cons: Assumes input features are normally distributed, assumes output classification
follows a functional (linear, quadratic, etc.) decision boundary
RELATED TECHNIQUES
• Multidimensional Scaling
• Pros: does not assume linear relationships between variables or normal distributions
among them.
• Cons: longer computation time, less available documentation to properly tune, still no
guarantee features extracted will be relevant to your regression/classification question.
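• A hedged scikit-learn sketch of the three related techniques above; the synthetic classification data and the n_components values are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import TruncatedSVD
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import MDS

X, y = make_classification(n_samples=200, n_features=10, n_informative=4, random_state=0)

X_svd = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)   # unsupervised, linear
X_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)  # supervised (needs y)
X_mds = MDS(n_components=2, random_state=0).fit_transform(X)            # distance-preserving, no linearity assumption
```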
IN-CLASS ASSIGNMENT: PCA
• Perform PCA on the features we used in last class’s dataset.

• Plot the first two principal components of the explanatory data.

• Use a scree plot to determine how many principal components you should keep.

• Run a random forest classifier on the retained principal components.

• Evaluate out-of-sample accuracy against non-transformed data.
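• The real assignment uses last class's dataset; the sketch below shows the same workflow on synthetic data, with the number of retained components (5) standing in for whatever the scree plot suggests.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Random forest on PCA-transformed features (keep enough components per the scree plot)
pca_model = make_pipeline(StandardScaler(), PCA(n_components=5),
                          RandomForestClassifier(random_state=0)).fit(X_train, y_train)
# Baseline random forest on the raw, non-transformed features
raw_model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

print('PCA + RF accuracy:', pca_model.score(X_test, y_test))
print('Raw RF accuracy:  ', raw_model.score(X_test, y_test))
```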


CLUSTERING AND DIMENSIONALITY
REDUCTION

III. SUPPORT
VECTOR MACHINES
SUPPORT VECTOR MACHINES
• Support Vector Machines (SVMs) are a set of classifiers that use techniques similar to
PCA and other matrix-based dimensionality reduction methods.

• SVMs apply a function (called a kernel function) to the independent features and find
the combination of the transformed features that best separates the classes (for classification)
or best follows the variance of the response feature (for regression).

• SVMs work best if you have multiple ‘weak inputs’ that have some sort of strong
underlying signal between them.
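• A minimal scikit-learn sketch of an SVM classifier; the RBF kernel, synthetic data, and C value are illustrative choices rather than anything prescribed by the slides.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaling matters for SVMs: the kernel operates on distances between observations
model = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=1.0)).fit(X_train, y_train)
print('Test accuracy:', model.score(X_test, y_test))
```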
SUPPORT VECTOR MACHINES
• Pros:
• Accuracy on par with RFs and GBMs
• Very good for picture and text analysis
• Works well with trending data (unlike RFs!)
• No ‘jagged edges’ in regression

• Cons:
• Computationally intensive
• Hard to debug
• Typically no intuition on which kernel function works best
KERNEL FUNCTIONS
• SVMs include the option of many different kernel functions, including:
• Linear
• Polynomial
• RBF
• Sigmoid
• Any Python function you want!

• You’ll have to use grid search to determine which kernel is best, which can take a very
long time!
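• A hedged sketch of choosing a kernel via grid search; the parameter grid and synthetic data are illustrative, and cross-validation over several kernels is exactly why this can be slow.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

grid = GridSearchCV(
    make_pipeline(StandardScaler(), SVC()),
    param_grid={'svc__kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
                'svc__C': [0.1, 1, 10]},
    cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```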
IN-CLASS EXERCISE: SVMS
• Run an SVM with a polynomial kernel.
• Perform grid search to find the optimal kernel for our use case.
• Compare the accuracy of the final model to the PCA result and the raw random forest.
CLUSTERING AND DIMENSIONALITY
REDUCTION

ADIOS, AMIGOS!
