
ISLR Chap 4 Shaheryar


Classification

By
Syed Shaheryar Zahur

Convergent Business Technologies.


Overview
• An approach for predicting qualitative (categorical) responses
• Involves assigning each observation to a category, or class
• Often first predicts the probability that an observation belongs to each category



Why not Linear Regression?
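1. Cannot accommodate a qualitative response with more than two classes: any numeric coding imposes an arbitrary ordering and spacing on the classes
2. With just two classes, it does not give meaningful probability estimates: fitted values can fall outside the range 0 to 1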
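
A minimal sketch of the second point, assuming a single predictor and a small invented data set (the values are illustrative only, not taken from the slides). An ordinary least-squares line is fitted to a 0/1 response, and the fitted values at the ends of the predictor range land outside the interval from 0 to 1.

```python
# Least-squares fit of a 0/1 response on one predictor, in pure Python.
# The data below are invented for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Closed-form simple linear regression estimates.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar


def linear_fit(x):
    """Fitted value of the linear regression at x."""
    return intercept + slope * x


fitted_low = linear_fit(0.0)   # below 0, so not a valid probability
fitted_high = linear_fit(7.0)  # above 1, so not a valid probability
print(fitted_low, fitted_high)
```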



Logistic model
• Models the probability that the response belongs to a particular category
• A logistic function is used to keep the predicted probabilities between 0 and 1
• For estimation of the parameters, maximum likelihood is preferred
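
The two ideas above can be sketched as follows, assuming one predictor and the standard form p(x) = 1 / (1 + exp(-(b0 + b1 x))). The data, the learning rate and the step count are assumptions chosen for illustration, and plain gradient ascent stands in for the Newton-type routines that statistical software normally uses to maximise the likelihood.

```python
import math


def logistic(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1, xs, ys):
    """Bernoulli log-likelihood of the one-predictor logistic model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = logistic(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def fit_logistic(xs, ys, lr=0.01, steps=20000):
    """Maximise the log-likelihood by gradient ascent (illustrative only)."""
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        residuals = [y - logistic(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += lr * sum(residuals)
        b1 += lr * sum(r * x for r, x in zip(residuals, xs))
    return b0, b1


# Invented, overlapping data, so that the maximum likelihood estimate is finite.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(xs, ys)
probabilities = [logistic(b0 + b1 * x) for x in xs]
print(b0, b1, probabilities)
```

Every fitted probability lies strictly between 0 and 1, which is the property the linear model lacks.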



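Multiple Logistic Regression
• Extends the logistic model to several predictors
• There may be some association between the different predictors

• Confounding phenomenon: a predictor's estimated effect can change, even in sign, once correlated predictors are included in the model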



Multinomial Logistic Regression
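• Used when the response variable has more than two classes
• Process
• Select one class as the baseline
• Then model each remaining class relative to the baseline

• The choice of baseline is unimportant: the fitted values (predictions) stay the same, and only the interpretation of the coefficients changes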
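
A short sketch of why the baseline choice does not matter, assuming each class already has a linear predictor value for some observation. The class names "A", "B", "C" and the score values are hypothetical. Re-expressing the scores relative to a different baseline changes the coefficients but leaves the class probabilities unchanged.

```python
import math


def class_probs(scores, baseline):
    """Class probabilities with the linear predictors expressed relative to a baseline.

    scores: dict mapping class label to its linear predictor for one observation.
    The baseline class gets a relative score of 0, so its term is exp(0) = 1.
    """
    relative = {k: s - scores[baseline] for k, s in scores.items()}
    denom = sum(math.exp(v) for v in relative.values())
    return {k: math.exp(v) / denom for k, v in relative.items()}


# Hypothetical linear predictor values for one observation.
scores = {"A": 0.4, "B": -1.1, "C": 0.9}

probs_base_a = class_probs(scores, baseline="A")
probs_base_c = class_probs(scores, baseline="C")
print(probs_base_a)
print(probs_base_c)
```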


Generative Models
• Why another method is needed
I. When there is substantial separation between the classes, the parameter estimates for logistic regression are unstable
II. If the distribution of the predictors is approximately normal in each class and the sample size is small, generative methods may be more accurate
III. The methods extend naturally to the case of more than two response classes



Linear Discriminant Analysis
• Finds the linear combination of predictors that best separates the data into classes
• Assumptions
• Predictors are normally distributed within each class
• All classes share a common covariance matrix
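
A minimal sketch of LDA with a single predictor, under the two assumptions listed above. Each class k gets the discriminant score x * mu_k / var - mu_k^2 / (2 * var) + log(prior_k), where var is the pooled (shared) variance, and the observation is assigned to the class with the largest score. The function names and the data are illustrative assumptions.

```python
import math


def lda_fit(xs, labels):
    """Estimate class means, class priors and the pooled variance."""
    classes = sorted(set(labels))
    n = len(xs)
    means, priors = {}, {}
    for k in classes:
        xk = [x for x, y in zip(xs, labels) if y == k]
        means[k] = sum(xk) / len(xk)
        priors[k] = len(xk) / n
    # One variance shared by all classes (the LDA assumption).
    ss = sum((x - means[y]) ** 2 for x, y in zip(xs, labels))
    var = ss / (n - len(classes))
    return means, priors, var


def lda_predict(x, means, priors, var):
    """Assign x to the class with the largest linear discriminant score."""
    scores = {
        k: x * means[k] / var - means[k] ** 2 / (2 * var) + math.log(priors[k])
        for k in means
    }
    return max(scores, key=scores.get)


# Invented data: two classes with different means.
xs = [1.0, 1.5, 2.0, 2.5, 4.0, 4.5, 5.0, 5.5]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

means, priors, var = lda_fit(xs, labels)
print(lda_predict(3.0, means, priors, var), lda_predict(3.5, means, priors, var))
```

With equal priors the decision boundary sits at the midpoint of the two class means, here 3.25.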



Linear Discriminant Analysis

[Figure: LDA classification results at threshold = 0.5 and at threshold = 0.2]



Linear Discriminant Analysis
• ROC curve: a graph showing the performance of a classification model at all classification thresholds, plotting the true positive rate against the false positive rate
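
The effect of the threshold and the construction of the ROC curve can be sketched as below. The predicted probabilities and true labels are invented for illustration, and the helper names are assumptions. Lowering the threshold from 0.5 to 0.2 catches more true positives at the cost of more false positives. Sweeping the threshold over all values traces the curve, whose area (AUC) summarises overall performance.

```python
def confusion_rates(probs, labels, threshold):
    """Return (true positive rate, false positive rate) at a given threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    positives = sum(labels)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives


def roc_points(probs, labels):
    """ROC points (FPR, TPR), one per distinct threshold, starting at (0, 0)."""
    points = [(0.0, 0.0)]
    for t in sorted(set(probs), reverse=True):
        tpr, fpr = confusion_rates(probs, labels, t)
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Invented predicted probabilities and true classes (1 = positive).
probs = [0.9, 0.8, 0.6, 0.45, 0.4, 0.3, 0.25, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

rates_at_05 = confusion_rates(probs, labels, 0.5)
rates_at_02 = confusion_rates(probs, labels, 0.2)
area = auc(roc_points(probs, labels))
print(rates_at_05, rates_at_02, area)
```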



Quadratic Discriminant Analysis
• Assumptions
• Predictors are normally distributed within each class
• Each class has its own covariance matrix
• More flexible than LDA

• Used when
• The training set is large enough that the variance of the classifier is not a major concern
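
A one-predictor sketch of QDA, assuming class-specific variances. The score for class k is -(x - mu_k)^2 / (2 * var_k) - 0.5 * log(var_k) + log(prior_k), which is quadratic in x. The data are invented so that both classes share the same mean and differ only in spread, a case a linear boundary cannot handle.

```python
import math


def qda_fit(xs, labels):
    """Estimate a mean, a variance and a prior separately for each class."""
    params = {}
    n = len(xs)
    for k in sorted(set(labels)):
        xk = [x for x, y in zip(xs, labels) if y == k]
        mean = sum(xk) / len(xk)
        var = sum((x - mean) ** 2 for x in xk) / (len(xk) - 1)
        params[k] = (mean, var, len(xk) / n)
    return params


def qda_score(x, mean, var, prior):
    """Quadratic discriminant score for one class."""
    return -((x - mean) ** 2) / (2 * var) - 0.5 * math.log(var) + math.log(prior)


def qda_predict(x, params):
    """Assign x to the class with the largest quadratic discriminant score."""
    return max(params, key=lambda k: qda_score(x, *params[k]))


# Invented data: same mean (0) in both classes, very different variances.
xs = [-0.5, -0.2, 0.0, 0.2, 0.5, -4.0, -2.0, 0.0, 2.0, 4.0]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

params = qda_fit(xs, labels)
print([qda_predict(x, params) for x in (-3.0, 0.0, 3.0)])
```

The tight class is predicted near the centre and the wide class on both sides, so the decision region has two cut points rather than one.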



Naïve Bayes Classifier
• Assumption
• Within each class, the predictors are independent

• The assumption introduces some bias but reduces variance
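
A sketch of the independence assumption in practice, assuming Gaussian densities for each predictor (one common choice, not stated on the slide). Within a class the joint density is taken as the product of one-dimensional densities, so only a mean and a variance per predictor per class are estimated. The data and function names are illustrative.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def nb_fit(rows, labels):
    """Per class: prior plus a (mean, variance) pair for each predictor."""
    model = {}
    n = len(rows)
    for k in sorted(set(labels)):
        rk = [r for r, y in zip(rows, labels) if y == k]
        stats = []
        for j in range(len(rows[0])):
            col = [r[j] for r in rk]
            m = sum(col) / len(col)
            v = sum((c - m) ** 2 for c in col) / (len(col) - 1)
            stats.append((m, v))
        model[k] = (len(rk) / n, stats)
    return model


def nb_posterior(x, model):
    """Posterior class probabilities via Bayes' theorem."""
    joint = {}
    for k, (prior, stats) in model.items():
        density = prior
        # Independence within class: multiply the one-dimensional densities.
        for xj, (m, v) in zip(x, stats):
            density *= gaussian_pdf(xj, m, v)
        joint[k] = density
    total = sum(joint.values())
    return {k: d / total for k, d in joint.items()}


# Invented two-predictor data.
rows = [(1.0, 2.0), (1.5, 1.5), (2.0, 2.5), (1.2, 2.2),
        (4.0, 5.0), (4.5, 4.5), (5.0, 5.5), (4.2, 5.2)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

model = nb_fit(rows, labels)
post_low = nb_posterior((1.4, 2.0), model)
post_high = nb_posterior((4.6, 5.0), model)
print(post_low, post_high)
```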



Comparison

• Left panel: Bayes boundary line linear; 20 observations; uncorrelated random variables with a different mean in each class
• Middle panel: Bayes boundary line linear; 20 observations; random variables with a different mean in each class; correlation of -0.5
• Right panel: Bayes boundary line linear; 50 observations; non-normal distribution; correlation of 0.5

Comparison

• Left panel: non-linear decision boundary; correlation of 0.5 and -0.5 within classes
• Middle panel: non-linear decision boundary; normal distribution; complicated non-linear function of predictors
• Right panel: non-linear decision boundary; normal distribution; sample size = 6



Poisson Regression
• The response may be neither qualitative nor quantitative in the usual sense, e.g. counts, which take non-negative integer values
• Distinctions from the linear regression model
• Interpretation of coefficients
• Mean-variance relationship
• Non-negative fitted values
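
The three distinctions can be illustrated with a small sketch, assuming the usual log link, mean = exp(b0 + b1 * x). The coefficient values are hypothetical, not estimates from any data set. A one-unit increase in x multiplies the mean by exp(b1), the variance of a Poisson count equals its mean, and the fitted mean is positive for any x.

```python
import math


def poisson_mean(b0, b1, x):
    """Fitted mean under a log link; positive for every value of x."""
    return math.exp(b0 + b1 * x)


def poisson_pmf(k, lam):
    """Probability of observing the count k when the mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Hypothetical coefficients, chosen only for illustration.
b0, b1 = -1.0, 0.3

# Interpretation: increasing x by one unit multiplies the mean by exp(b1).
ratio = poisson_mean(b0, b1, 3.0) / poisson_mean(b0, b1, 2.0)

# Mean-variance relationship: both equal lam (sums truncated at k = 99).
lam = poisson_mean(b0, b1, 2.0)
mean = sum(k * poisson_pmf(k, lam) for k in range(100))
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in range(100))

# Non-negative fitted values: still positive far to the left.
far_left = poisson_mean(b0, b1, -50.0)
print(ratio, mean, variance, far_left)
```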



Thank you.

©2019 Convergent Business Technologies
