
ISLR Chap 10 Shaheryar-Mutahira


Unsupervised Learning

Mutahira Khalid Chaudry & Syed Shaheryar Zahur

Convergent Business Technologies.


Overview
• Approach
• Principal Component Analysis
• Clustering



Unsupervised Learning
• Definition
• Goal in unsupervised learning
• More challenging than supervised learning



Principal Component Analysis
• Definition
• An unsupervised approach
• Serves as a tool for data visualization



Principal Components
• Definition
• Mathematical form

• Loading vectors define direction

• The squared loadings in each loading vector must sum to 1
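
The formula for the mathematical form did not survive in the extracted slide. A sketch in the usual ISLR notation, assuming p centered variables X_1, ..., X_p and loadings φ_11, ..., φ_p1 for the first principal component Z_1 (the symbols are assumptions, not taken from the slide):

```latex
Z_1 = \phi_{11} X_1 + \phi_{21} X_2 + \dots + \phi_{p1} X_p,
\qquad \sum_{j=1}^{p} \phi_{j1}^{2} = 1
```

The normalization constraint is what keeps the variance of Z_1 from being inflated simply by making the loadings arbitrarily large.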



Principal Components
• Variables centered to mean 0
• Linear combination with largest variance

• Second PC is orthogonal to PC1


• PC1 and PC2 are uncorrelated
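
The properties listed above can be checked numerically. The following is a minimal sketch, assuming NumPy is available and using a small made-up data matrix (not data from the slides); it obtains the loading vectors from the singular value decomposition of the centered data, which is one common way to compute principal components.

```python
import numpy as np

# Hypothetical toy data: 6 observations (rows), 3 variables (columns).
X = np.array([
    [2.0, 1.0, 0.5],
    [3.0, 2.5, 1.0],
    [4.0, 3.0, 0.0],
    [5.0, 4.5, 1.5],
    [6.0, 5.0, 0.5],
    [7.0, 7.0, 2.0],
])

# Center each variable to mean 0.
Xc = X - X.mean(axis=0)

# The rows of Vt are the loading vectors, ordered by decreasing variance.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
loadings = Vt.T          # column k is the loading vector of PC k+1
scores = Xc @ loadings   # column k holds the scores on PC k+1

phi1, phi2 = loadings[:, 0], loadings[:, 1]
print("sum of squared loadings (PC1):", np.sum(phi1 ** 2))
print("phi1 . phi2:", phi1 @ phi2)
print("corr(PC1, PC2):", np.corrcoef(scores[:, 0], scores[:, 1])[0, 1])
```

Up to floating-point error, the first value should be 1 and the other two should be 0, matching the unit-norm, orthogonality, and uncorrelatedness bullets.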



Example
• PC1 places most of its weight on the crime variables
• PC2 places most of its weight on UrbanPop
• States with a large positive PC1 score have high crime rates



More on PCA
• Scaling variables
• Uniqueness of PCs (each loading vector is unique up to a sign flip)
• Proportion of variance explained (PVE)
• Deciding on the number of PCs

Scree Plot
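
The proportion of variance explained by each component, which is what a scree plot displays, can be sketched as follows. This is an illustrative example assuming NumPy and randomly generated data; the function name `proportion_variance_explained` and its `scale` parameter are hypothetical, introduced only for this sketch.

```python
import numpy as np

def proportion_variance_explained(X, scale=True):
    """Return the PVE of each principal component of X (rows = observations)."""
    Xc = X - X.mean(axis=0)
    if scale:
        # Scale each variable to standard deviation 1 so that units do not dominate.
        Xc = Xc / Xc.std(axis=0, ddof=1)
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2  # proportional to the variance of each component
    return var / var.sum()

# Hypothetical data: the second variable is on a much larger scale than the others.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) * np.array([1.0, 100.0, 10.0])

pve = proportion_variance_explained(X)
pve_raw = proportion_variance_explained(X, scale=False)
print("PVE (scaled):", pve)
print("cumulative PVE (scaled):", np.cumsum(pve))
print("PVE (unscaled):", pve_raw)
```

Without scaling, the large-variance variable dominates the first component, which is why the scaling decision comes before reading a scree plot. A common heuristic is to look for the "elbow" where the PVE values level off.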



Clustering Methods
• Techniques for finding sub-groups/clusters
• Find homogeneous subgroups among observations

 Two approaches
I. K-means Clustering:
partition observations into a pre-specified number of clusters

II. Hierarchical Clustering:
tree-like visualization of observations (dendrogram)



K-means Clustering
• Simple and elegant
• Partition into distinct, non-overlapping clusters
• Number of clusters specified beforehand
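
A minimal sketch of the K-means iteration in plain Python, on made-up points. One stated difference from the ISLR presentation: ISLR starts from a random assignment of observations to clusters, whereas this sketch starts from K randomly chosen observations as initial centroids. The function names and the `seed` and `max_iter` parameters are hypothetical.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, seed=0, max_iter=100):
    """Alternate between assigning points to the nearest centroid and
    recomputing centroids, until the assignment stops changing."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct observations as initial centroids
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: a local optimum has been reached
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centers[c] = centroid(members)
    return labels, centers

# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = k_means(points, k=2)
print(labels)
print(centers)
```

Every observation receives exactly one label, so the clusters are distinct and non-overlapping, and K has to be supplied before the algorithm runs.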



K-means Clustering



K-means Clustering

Objective value:
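
The content that followed "Objective value:" did not survive extraction. As a sketch of what the objective usually looks like in ISLR notation, assuming clusters C_1, ..., C_K, p features, and squared Euclidean distance as the within-cluster variation W(C_k):

```latex
\min_{C_1,\dots,C_K} \; \sum_{k=1}^{K} W(C_k),
\qquad
W(C_k) = \frac{1}{|C_k|} \sum_{i,\,i' \in C_k} \sum_{j=1}^{p} \left( x_{ij} - x_{i'j} \right)^2
```

Because K-means only reaches a local optimum of this objective, it is common to run it from several random starts and keep the run with the smallest objective value.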



Hierarchical Clustering
• Does not require choice of K
• Attractive tree-based representation - dendrogram
• Bottom-up/Agglomerative clustering
• Built starting from leaves
• Combining clusters up to trunk
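
The bottom-up procedure can be sketched in plain Python as follows. This is an illustrative, unoptimized version that assumes Euclidean distance and complete linkage; the function names and the `n_clusters` stopping parameter are hypothetical, and the points are made up.

```python
import math

def complete_linkage(a, b, points):
    """Largest pairwise Euclidean distance between clusters a and b (index lists)."""
    return max(math.dist(points[i], points[j]) for i in a for j in b)

def agglomerative(points, n_clusters):
    """Start with every observation as its own cluster (a leaf) and
    repeatedly fuse the two least dissimilar clusters."""
    clusters = [[i] for i in range(len(points))]
    merges = []  # (cluster_a, cluster_b, fusion_height) for each fusion
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = complete_linkage(clusters[x], clusters[y], points)
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((clusters[x], clusters[y], d))
        merged = clusters[x] + clusters[y]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (x, y)]
        clusters.append(merged)
    return clusters, merges

# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters, merges = agglomerative(points, n_clusters=2)
print(sorted(sorted(c) for c in clusters))
for a, b, height in merges:
    print(a, "+", b, "fused at height", round(height, 3))
```

The recorded fusion heights are what a dendrogram draws on its vertical axis. Running the loop down to a single cluster builds the whole tree, while stopping early at `n_clusters` plays the role of cutting it.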



Hierarchical Clustering
• Obtaining number of clusters



Hierarchical Clustering
• Choice of dissimilarity measure is important
• Linkage defines the dissimilarity between two groups of observations:
I. Complete Linkage
II. Single Linkage
III. Average Linkage
IV. Centroid Linkage
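
The four linkage types differ only in how pairwise distances between two clusters are summarized. A minimal sketch in plain Python, assuming Euclidean distance and two made-up clusters (the function names are hypothetical):

```python
import math
from itertools import product

def pairwise(a, b):
    """All Euclidean distances between a point in cluster a and a point in cluster b."""
    return [math.dist(p, q) for p, q in product(a, b)]

def complete(a, b):
    return max(pairwise(a, b))      # largest pairwise dissimilarity

def single(a, b):
    return min(pairwise(a, b))      # smallest pairwise dissimilarity

def average(a, b):
    d = pairwise(a, b)
    return sum(d) / len(d)          # mean pairwise dissimilarity

def centroid_linkage(a, b):
    ca = [sum(c) / len(a) for c in zip(*a)]
    cb = [sum(c) / len(b) for c in zip(*b)]
    return math.dist(ca, cb)        # distance between the cluster centroids

# Hypothetical clusters.
a = [(0, 0), (0, 2)]
b = [(3, 0), (3, 2)]
for name, fn in [("complete", complete), ("single", single),
                 ("average", average), ("centroid", centroid_linkage)]:
    print(name, fn(a, b))
```

Single linkage never exceeds average linkage, which never exceeds complete linkage, so the same data can yield noticeably different dendrograms depending on the choice.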



Practical Issues in Clustering
• Consideration of dissimilarity measure (hierarchical clustering)
• Consideration of standardization (both approaches)
• Clustering results are not very robust to perturbations of the data
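
To illustrate why standardization matters for both approaches, the sketch below compares nearest neighbours before and after scaling each variable to standard deviation 1. The observations (income in dollars, age in years) are made up, and the helper names are hypothetical.

```python
import math
import statistics

def standardize(rows):
    """Center each column to mean 0 and scale it to sample standard deviation 1."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, means, sds)) for row in rows]

def nearest(rows, i):
    """Index of the observation closest to rows[i] in Euclidean distance."""
    others = [j for j in range(len(rows)) if j != i]
    return min(others, key=lambda j: math.dist(rows[i], rows[j]))

# Hypothetical observations: (income in dollars, age in years).
rows = [(30000, 25), (30400, 60), (31000, 27), (34000, 62)]

print("nearest to row 0, raw units:", nearest(rows, 0))
print("nearest to row 0, standardized:", nearest(standardize(rows), 0))
```

In raw units the income column dominates the distance, so the nearest neighbour of the first observation is decided almost entirely by income; after standardization, age contributes on an equal footing and the nearest neighbour changes. Whether to standardize depends on whether the variables' units should carry that weight.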



Thank you.

©2019 Convergent Business Technologies
