
Lecture 2: Classification I

Overview
⚫ Linear Regression vs. Classification
The linear regression model assumes that the response variable Y is quantitative. But
in many situations, the response variable is instead qualitative (categorical).
⚫ Examples of Classification
1. A person arrives at the emergency room with a set of symptoms that could
possibly be attributed to one of three medical conditions. Which of the three
conditions does the individual have?
2. An online banking service must be able to determine whether or not a
transaction being performed on the site is fraudulent, on the basis of the user’s
IP address, past transaction history, and so forth.
3. On the basis of DNA sequence data for a number of patients with and without a
given disease, a biologist would like to figure out which DNA mutations are
deleterious (disease-causing) and which are not.
⚫ Why not linear regression?
Suppose that we are trying to predict the medical condition of a patient in the
emergency room on the basis of her symptoms. In this simplified example, there are
three possible diagnoses: stroke, drug overdose, and epileptic seizure. We could
consider encoding these values as a quantitative response variable, Y, as follows:

Y = 1 if stroke; Y = 2 if drug overdose; Y = 3 if epileptic seizure.

Unfortunately, this coding implies an ordering on the outcomes and insists that the
gap between stroke and drug overdose is the same as the gap between drug overdose
and epileptic seizure. A different, equally sensible coding would imply a different
relationship among the conditions and yield a different fitted model, as the sketch
below illustrates.
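A quick illustration of this point: under two equally valid codings of the same
three-class outcome, least squares produces different fitted models. The sketch below
uses simulated data; all values are illustrative assumptions, not the lecture's example.

set.seed(1)
x <- rnorm(30)                                         # simulated predictor
cls <- sample(c("stroke", "overdose", "seizure"), 30, replace = TRUE)
y1 <- c(stroke = 1, overdose = 2, seizure = 3)[cls]    # one coding of Y
y2 <- c(stroke = 2, overdose = 1, seizure = 3)[cls]    # an equally valid coding
coef(lm(y1 ~ x))                                       # the two codings give
coef(lm(y2 ~ x))                                       # different linear models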

⚫ Three of the most widely used classifiers: logistic regression, linear
discriminant analysis, and K-nearest neighbors.

Logistic Regression
⚫ An Example
We will illustrate the concept of classification using the simulated Default data set. We
are interested in predicting whether an individual will default on his or her credit
card payment, on the basis of annual income and monthly credit card balance. The
data set is displayed in Figure 4.1. We have plotted annual income and monthly credit
card balance for a subset of 10,000 individuals.
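A minimal sketch of a plot in the spirit of Figure 4.1, assuming the ISLR2 package
(which contains the Default data) is installed; the color choices here are our
assumptions, not the textbook's.

library(ISLR2)
plot(Default$balance, Default$income,
     col = ifelse(Default$default == "Yes", "orange", "blue"),
     pch = 20, xlab = "Balance", ylab = "Income")   # defaulters shown in orange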

⚫ The Logistic Model

➢ For the Default data, logistic regression models the probability of default. For
example, the probability of default given balance can be written as

Pr(default = Yes | balance),

which we abbreviate p(balance); its values lie between 0 and 1.

➢ Using a linear regression model to represent these probabilities,

p(X) = β0 + β1X,

is problematic: for balances close to zero it can predict negative probabilities, and
for very large balances it can predict probabilities above 1. Logistic regression
instead uses the logistic function,

p(X) = e^(β0 + β1X) / (1 + e^(β0 + β1X)),

whose output always lies between 0 and 1.

➢ Taking log-odds, or the logit:

log[ p(X) / (1 − p(X)) ] = β0 + β1X,

so the logit is linear in X.
⚫ Estimation
The coefficients β0 and β1 are estimated by maximum likelihood: we seek estimates
for which the predicted probability p̂(xi) of default for each individual matches
that individual's observed default status as closely as possible. Formally, we
maximize the likelihood function

ℓ(β0, β1) = ∏(i: yi = 1) p(xi) × ∏(i′: yi′ = 0) (1 − p(xi′)).

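As a sketch of what maximizing this likelihood involves, the following simulates a
simple logistic regression problem and maximizes the log-likelihood numerically with
optim(); the true coefficients (−0.5, 1.2) are arbitrary choices for the simulation.

set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(-0.5 + 1.2 * x))   # simulated 0/1 response
negloglik <- function(beta) {                 # negative log-likelihood of (beta0, beta1)
  p <- plogis(beta[1] + beta[2] * x)
  -sum(y * log(p) + (1 - y) * log(1 - p))
}
optim(c(0, 0), negloglik)$par                 # numerical MLE
coef(glm(y ~ x, family = binomial))           # agrees with glm's estimates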
⚫ Making Predictions

Using the maximum likelihood estimates for the Default data (β̂0 = −10.6513 and
β̂1 = 0.0055, as in ISLR Table 4.1), we predict that the default probability for an
individual with a balance of $1,000 is

p̂(1000) = e^(−10.6513 + 0.0055 × 1000) / (1 + e^(−10.6513 + 0.0055 × 1000)) ≈ 0.00576,

which is below 1%. For a balance of $2,000 the predicted probability rises to about 0.586.

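The same numbers can be checked in one line with R's built-in logistic CDF, plogis();
the coefficient values are taken from ISLR's Table 4.1, as noted above.

plogis(-10.6513 + 0.0055 * 1000)   # ~0.00576 for a balance of $1,000
plogis(-10.6513 + 0.0055 * 2000)   # ~0.586 for a balance of $2,000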
⚫ Multiple Logistic Regression

With p predictors X = (X1, ..., Xp), the model becomes

log[ p(X) / (1 − p(X)) ] = β0 + β1X1 + ... + βpXp.

For example, a student with a credit card balance of $1,500 and an income of $40,000
has an estimated probability of default of about 0.058.
⚫ Multiple-Class (K > 2)
The two-class logistic regression model has multiple-class extensions (multinomial
logistic regression), but in practice they tend not to be used all that often.
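For completeness, here is a brief sketch of one such extension using nnet::multinom
on the built-in iris data, whose response has three classes; this example is our
assumption, not one from the lecture.

library(nnet)                                         # recommended package shipped with R
mfit <- multinom(Species ~ Sepal.Length + Sepal.Width, data = iris)
head(predict(mfit, type = "probs"))                   # one probability column per class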

Linear Discriminant Analysis
⚫ Why not use logistic regression?
1. When the classes are well separated, the parameter estimates for the
logistic regression model are surprisingly unstable.
2. If n is small and the distribution of the predictors X is approximately
normal in each of the classes, the linear discriminant model is again more
stable than the logistic regression model.
3. As mentioned above, linear discriminant analysis is popular when we have
more than two response classes.
⚫ Using Bayes' Theorem for Classification
➢ Rule of multiplication:

P(A ∩ B) = P(A) P(B | A).

➢ Bayes' Thm:

P(A | B) = P(B | A) P(A) / P(B).

➢ Let πk represent the overall or prior probability that a randomly chosen
observation comes from the kth class; this is the probability that a given
observation is associated with the kth category of the response variable Y.
Let fk(x) ≡ Pr(X = x | Y = k) denote the density function of X for an observation that
comes from the kth class. In other words, fk(x) is relatively large if there is a
high probability that an observation in the kth class has X ≈ x. Bayes'
Thm states that

Pr(Y = k | X = x) = πk fk(x) / Σ(l = 1 to K) πl fl(x);

we will use the abbreviation pk(x) = Pr(Y = k | X = x). In general, estimating πk is
easy if we have a random sample of Ys from the population: we simply
compute the fraction of the training observations that belong to the kth
class. However, estimating fk(x) tends to be more challenging, unless we
assume some simple forms for these densities. We refer to pk(x) as the
posterior probability that an observation X = x belongs to the kth
class. That is, it is the probability that the observation belongs to the kth
class, given the predictor value for that observation. (to be continued.....)
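Before continuing, a toy sketch of the Bayes theorem computation above, assuming
(purely for illustration) a single predictor whose class densities are normal with a
shared standard deviation, as LDA will assume later; the priors and means are made-up values.

pi_k <- c(0.3, 0.7)            # assumed prior probabilities pi_k
mu_k <- c(-1, 1); sigma <- 1   # assumed class means, shared sd
posterior <- function(x) {
  fx <- dnorm(x, mean = mu_k, sd = sigma)  # f_k(x) for k = 1, 2
  pi_k * fx / sum(pi_k * fx)               # p_k(x) = posterior probabilities
}
posterior(0.5)                 # e.g., posteriors for an observation at x = 0.5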

Computer Session
library(ISLR2)

## Warning: package 'ISLR2' was built under R version 4.3.2

names(Smarket)

## [1] "Year" "Lag1" "Lag2" "Lag3" "Lag4" "Lag5"


## [7] "Volume" "Today" "Direction"

summary(Smarket)

## Year Lag1 Lag2 Lag3
## Min. :2001 Min. :-4.922000 Min. :-4.922000 Min. :-4.922000
## 1st Qu.:2002 1st Qu.:-0.639500 1st Qu.:-0.639500 1st Qu.:-0.640000
## Median :2003 Median : 0.039000 Median : 0.039000 Median : 0.038500
## Mean :2003 Mean : 0.003834 Mean : 0.003919 Mean : 0.001716
## 3rd Qu.:2004 3rd Qu.: 0.596750 3rd Qu.: 0.596750 3rd Qu.: 0.596750
## Max. :2005 Max. : 5.733000 Max. : 5.733000 Max. : 5.733000
## Lag4 Lag5 Volume Today
## Min. :-4.922000 Min. :-4.92200 Min. :0.3561 Min. :-4.922000
## 1st Qu.:-0.640000 1st Qu.:-0.64000 1st Qu.:1.2574 1st Qu.:-0.639500
## Median : 0.038500 Median : 0.03850 Median :1.4229 Median : 0.038500
## Mean : 0.001636 Mean : 0.00561 Mean :1.4783 Mean : 0.003138
## 3rd Qu.: 0.596750 3rd Qu.: 0.59700 3rd Qu.:1.6417 3rd Qu.: 0.596750
## Max. : 5.733000 Max. : 5.73300 Max. :3.1525 Max. : 5.733000
## Direction
## Down:602
## Up :648
##

pairs(Smarket)

cor(Smarket[, -9])

## Year Lag1 Lag2 Lag3 Lag4
## Year 1.00000000 0.029699649 0.030596422 0.033194581 0.035688718
## Lag1 0.02969965 1.000000000 -0.026294328 -0.010803402 -0.002985911
## Lag2 0.03059642 -0.026294328 1.000000000 -0.025896670 -0.010853533
## Lag3 0.03319458 -0.010803402 -0.025896670 1.000000000 -0.024051036
## Lag4 0.03568872 -0.002985911 -0.010853533 -0.024051036 1.000000000
## Lag5 0.02978799 -0.005674606 -0.003557949 -0.018808338 -0.027083641
## Volume 0.53900647 0.040909908 -0.043383215 -0.041823686 -0.048414246
## Today 0.03009523 -0.026155045 -0.010250033 -0.002447647 -0.006899527
## Lag5 Volume Today
## Year 0.029787995 0.53900647 0.030095229
## Lag1 -0.005674606 0.04090991 -0.026155045
## Lag2 -0.003557949 -0.04338321 -0.010250033
## Lag3 -0.018808338 -0.04182369 -0.002447647
## Lag4 -0.027083641 -0.04841425 -0.006899527
## Lag5 1.000000000 -0.02200231 -0.034860083
## Volume -0.022002315 1.00000000 0.014591823
## Today -0.034860083 0.01459182 1.000000000
attach(Smarket)
plot(Volume)

glm.fit=glm(Direction~Lag1+Lag2+Lag3+Lag4+Lag5+Volume, data=Smarket,
family=binomial)
summary(glm.fit)

##
## Call:
## glm(formula = Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 +
## Volume, family = binomial, data = Smarket)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.126000 0.240736 -0.523 0.601
## Lag1 -0.073074 0.050167 -1.457 0.145
## Lag2 -0.042301 0.050086 -0.845 0.398
## Lag3 0.011085 0.049939 0.222 0.824
## Lag4 0.009359 0.049974 0.187 0.851
## Lag5 0.010313 0.049511 0.208 0.835
## Volume 0.135441 0.158360 0.855 0.392
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1731.2 on 1249 degrees of freedom
## Residual deviance: 1727.6 on 1243 degrees of freedom
## AIC: 1741.6
##
## Number of Fisher Scoring iterations: 3

coef(glm.fit)

## (Intercept) Lag1 Lag2 Lag3 Lag4 Lag5
## -0.126000257 -0.073073746 -0.042301344 0.011085108 0.009358938 0.010313068
## Volume
## 0.135440659

summary(glm.fit)$coef

## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.126000257 0.24073574 -0.5233966 0.6006983
## Lag1 -0.073073746 0.05016739 -1.4565986 0.1452272
## Lag2 -0.042301344 0.05008605 -0.8445733 0.3983491
## Lag3 0.011085108 0.04993854 0.2219750 0.8243333
## Lag4 0.009358938 0.04997413 0.1872757 0.8514445
## Lag5 0.010313068 0.04951146 0.2082966 0.8349974
## Volume 0.135440659 0.15835970 0.8552723 0.3924004

glm.probs=predict(glm.fit, type="response")
glm.probs[1:10]

## 1 2 3 4 5 6 7 8
## 0.5070841 0.4814679 0.4811388 0.5152224 0.5107812 0.5069565 0.4926509 0.5092292
## 9 10
## 0.5176135 0.4888378

contrasts(Direction)

## Up
## Down 0
## Up 1

glm.pred=rep("Down", 1250)
glm.pred[glm.probs > .5]="Up"
table(glm.pred, Direction)

## Direction
## glm.pred Down Up
## Down 145 141
## Up 457 507

(507+145)/1250

## [1] 0.5216

mean(glm.pred==Direction)

## [1] 0.5216

train=(Year<2005)
Smarket.2005=Smarket[!train, ]
dim(Smarket.2005)

## [1] 252 9

Direction.2005=Direction[!train]
glm.fit=glm(Direction~Lag1+Lag2+Lag3+Lag4+Lag5+Volume, data=Smarket,
family=binomial, subset=train)
glm.probs=predict(glm.fit, Smarket.2005, type="response")
glm.pred=rep("Down ", 252)
glm.pred[glm.probs>.5]="Up"
table(glm.pred, Direction.2005)

## Direction.2005
## glm.pred Down Up
## Down 77 97
## Up 34 44

mean(glm.pred==Direction.2005)

## [1] 0.4801587

mean(glm.pred!=Direction.2005)

## [1] 0.5198413
