Simple Linear and Logistic Regression

The document provides an overview of regression analysis, detailing concepts such as slope, intercept, and types of regression models including simple linear, multiple linear, and logistic regression. It explains how these models are used to predict relationships between variables, the importance of residuals and the least squares property, and the significance of the coefficient of determination. Additionally, it discusses non-linear regression and polynomial regression as methods to address non-linear data relationships.


REGRESSION

RECAP OF BASIC CONCEPTS


SLOPE
The slope of a linear regression line tells us how much the y-variable changes when the x-variable changes by one unit.

The slope indicates the steepness of a line and the intercept indicates the location where it intersects an axis. Together, the slope and the intercept define the linear relationship between two variables and can be used to estimate an average rate of change.

The greater the magnitude of the slope, the steeper the line
and the greater the rate of change.
The slope is positive 5: when x increases by 1, y increases by 5. The y-intercept is 2.

The slope is negative 0.4: when x increases by 1, y decreases by 0.4. The y-intercept is 7.2.

The slope is 0: when x increases by 1, y neither increases nor decreases. The y-intercept is -4.
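
A quick numerical check of those three lines in Python (a minimal sketch; the slopes and intercepts are the ones from the examples above):

# The three example lines: y = 5x + 2, y = -0.4x + 7.2, and y = -4
lines = [(5.0, 2.0), (-0.4, 7.2), (0.0, -4.0)]   # (slope m, intercept b)
for m, b in lines:
    y_at_0 = m * 0 + b   # the y-intercept
    y_at_1 = m * 1 + b   # y after x increases by 1
    print(f"slope {m}: y(0) = {y_at_0}, y(1) = {y_at_1}, change = {y_at_1 - y_at_0}")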
INTERCEPT
As noted above, the slope indicates the steepness of a line and the intercept indicates the location where it intersects an axis.

The intercept (often labeled the constant) is the point where the function crosses the y-axis.
Y-INTERCEPT

A positive y-intercept means the line crosses the y-axis above the origin, while a negative y-intercept means that the line crosses below the origin. Simply by changing the values of m and b, we can define any straight line. That's how powerful and versatile the slope-intercept formula is.
SLOPE AND INTERCEPT EXAMPLE
For example, a company determines that job
performance for employees in a production department
can be predicted using the regression model y = 130 +
4.3x, where x is the hours of in-house training they
receive (from 0 to 20) and y is their score on a job skills
test.
The value of the y-intercept (130) indicates the average
job skill score for an employee with no training.
The value of the slope (4.3) indicates that for each hour
of training, the job skill score increases, on average, by
4.3 points.
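
A minimal Python sketch of this model (the coefficients 130 and 4.3 come from the example; the function name is just illustrative):

# Regression model from the example: y = 130 + 4.3x
def predict_job_skill_score(training_hours):
    intercept = 130.0   # average score with no training
    slope = 4.3         # average gain per hour of training
    return intercept + slope * training_hours

for hours in (0, 10, 20):
    print(hours, "hours of training ->", predict_job_skill_score(hours))
# 0 -> 130.0, 10 -> 173.0, 20 -> 216.0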
MARGINAL CHANGE / SLOPE
Marginal change – in working with two variables related by a regression equation, the marginal change in a variable is the amount that the variable changes when the other variable changes by exactly one unit.

⚫ The slope, 𝑏1, in the regression equation is the marginal change in 𝑦 when 𝑥 changes by one unit.
TYPES OF REGRESSION MODELS
Simple linear regression
Multiple regression
Logistic regression
Polynomial regression
SIMPLE LINEAR REGRESSION
We find the equation of the straight line that best fits the paired sample data. That equation algebraically describes the relationship between two variables.

The best-fitting straight line is called a regression line, and its equation is called the regression equation.
SIMPLE LINEAR REGRESSION

Regression equation – given a collection of paired sample data, the regression equation that algebraically describes the relationship between the two variables 𝑥 and 𝑦 is ŷ = 𝑏0 + 𝑏1𝑥
⚫ The regression equation attempts to describe a relationship between two variables
⚫ Inherently, the equation algebraically describes how the values of one variable are associated with the values of the other variable

Regression line – the graph of the regression equation
⚫ Also known as the “line of best fit” or the “least-squares line”
⚫ The regression line is the line that fits the sample points best
NOTATION

Notice the ŷ (y-hat) in the sample regression equation!
This implies that we are predicting something!
We are predicting values ŷ based upon true, observed values of 𝑥.
SIMPLE LINEAR REGRESSION
REQUIREMENTS FOR SIMPLE LINEAR REGRESSION
1. The sample of paired data is a simple random sample of quantitative data.

2. The pairs of data (𝑥, 𝑦) have a bivariate normal distribution, meaning the following:
⚫ Visual examination of the scatter plot(s) confirms that the sample points follow an approximately straight-line pattern
⚫ Because results can be strongly affected by the presence of outliers, any outliers should be removed if they are known to be errors (Note: use caution when removing data points)
SIMPLE LINEAR REGRESSION

Outlier – in a scatter plot, an outlier is a point lying far away from the other data points.

Influential point – a point that strongly affects the graph of the regression line.
SIMPLE LINEAR REGRESSION

The additional point is an influential point because the graph of the regression line changed considerably when the point was added.

The additional point is also an outlier because it is far from the other points.
RESIDUAL AND THE LEAST SQUARES PROPERTY

Residual – for a pair of sample 𝑥 and 𝑦 values, the residual is the difference between the observed sample value of 𝑦 and the value ŷ that is predicted by the regression equation.

⚫ Residual = Observed − Predicted = 𝑦 − ŷ
⚫ A residual represents a type of inherent prediction error
⚫ The regression equation does not, typically, pass through all of the observed data values

The least squares property – a straight line satisfies this property if the sum of the squares of the residuals is the smallest sum possible.
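
A small Python sketch of the least squares method, using the standard closed-form estimates b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄ (the data below are made up for illustration):

# Fit y-hat = b0 + b1*x by least squares and inspect the residuals
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
sse = sum(r ** 2 for r in residuals)   # the quantity the least squares line minimizes
print(b0, b1, sse)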
RESIDUALS
EXAMPLE – HOME PRICE PREDICTION
Find the sum of squared errors for all the candidate lines. The blue line gives the minimum error.
SIMPLE LINEAR REGRESSION
EXAMPLE – EXAM GRADES
LEAST SQUARES METHOD
HOW GOOD IS THE PREDICTION?
That depends on how well the regression line fits the data.
The measure that tells us how well the regression line fits the data is the coefficient of determination.
COEFFICIENT OF DETERMINATION
R SQUARE
CORRELATION COEFFICIENT
PEARSON’S CORRELATION COEFFICIENT
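
As a hedged sketch of the standard definitions these slides refer to (assuming the usual formulas: Pearson's r as the standardized covariance, and, for simple linear regression, R² = r²), with made-up data:

import math

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Pearson's correlation coefficient r
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
r = sxy / math.sqrt(sxx * syy)

# Coefficient of determination: for simple linear regression, R^2 = r^2
r_squared = r ** 2
print(r, r_squared)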
MULTIPLE LINEAR REGRESSION
Multiple linear regression (MLR), also known simply as multiple regression, is a statistical technique that uses several explanatory variables to predict the outcome of a response variable.

The goal of multiple linear regression is to model the linear relationship between the explanatory (independent) variables and the response (dependent) variable.
MULTIPLE LINEAR REGRESSION
What Multiple Linear Regression Can Tell You
Simple linear regression is a function that allows an
analyst or statistician to make predictions about one
variable based on the information that is known about
another variable.
Linear regression can only be used when one has two
continuous variables—an independent variable and a
dependent variable.
The independent variable is the parameter that is used to
calculate the dependent variable or outcome.
A multiple regression model extends to several
explanatory variables.
MULTIPLE LINEAR REGRESSION
The multiple regression model is based on the following assumptions:

There is a linear relationship between the dependent variable and the independent variables.

The independent variables are not too highly correlated with each other.
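
A minimal sketch of fitting a multiple linear regression with NumPy's least squares solver (the two explanatory variables and all data values are made up for illustration):

import numpy as np

# Toy data: two explanatory variables x1, x2 and one response y
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = np.array([6.1, 6.9, 12.2, 12.8, 17.1])

# Design matrix with a leading column of ones for the intercept b0
X = np.column_stack([np.ones_like(x1), x1, x2])

# Solve min ||Xb - y||^2 for b = (b0, b1, b2)
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)        # fitted coefficients
print(X @ b)    # fitted values y-hat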
MULTIPLE LINEAR REGRESSION
EXAMPLE – Dataset and regression output from Excel
LOGISTIC REGRESSION
AIMS
When and Why do we Use Logistic Regression?
Binary
Multinomial
Theory Behind Logistic Regression
Assessing the Model
Assessing predictors
Things that can go Wrong
Interpreting Logistic Regression
WHEN AND WHY
To predict an outcome variable that is categorical from one or more categorical or continuous predictor variables.

Used because having a categorical outcome variable violates the assumption of linearity in normal regression.
WHEN AND WHY
No assumptions about the distributions of the predictor
variables.
Predictors do not have to be normally distributed
Logistic regression does not make any assumptions of
normality, linearity, and homogeneity of variance for the
independent variables.
Because it does not impose these requirements, it is
preferred to discriminant analysis when the data does not
satisfy these assumptions.
LOGISTIC REGRESSION
Logistic regression is used to analyze relationships between a dichotomous dependent variable and continuous or dichotomous independent variables.

A dichotomous variable is one that takes on one of only two possible values when observed or measured. For example, a dichotomous variable may be used to indicate whether a piece of legislation passed. The dichotomous variable (pass/fail) is a representation of the actual, and observable, vote on the legislation.

Logistic regression combines the independent variables to estimate the probability that a particular event will occur, i.e. that a subject will be a member of one of the groups defined by the dichotomous dependent variable.
LOGISTIC REGRESSION
Logistic regression is a type of binary classification machine learning algorithm used to predict the probability of something happening, in our case whether or not an event will occur.

The name “logistic regression” is derived from the concept of the logistic function that it uses. The logistic function is also known as the sigmoid function, and its value lies between zero and one. The following is an example of a logistic function we can use to find the probability of a vehicle breaking down, depending on how many years it has been since it was last serviced.
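
A minimal Python sketch of such a logistic function (the coefficients below are invented purely for illustration):

import math

def sigmoid(z):
    # Logistic (sigmoid) function: maps any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical model: P(breakdown) vs. years since the last service
b0, b1 = -4.0, 1.2   # made-up coefficients, for illustration only
for years in range(7):
    p = sigmoid(b0 + b1 * years)
    print(years, "years since service -> P(breakdown) =", round(p, 3))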
LOGISTIC REGRESSION
Logistic regression aims to solve classification problems.
It does this by predicting categorical outcomes, unlike
linear regression that predicts a continuous outcome.
In the simplest case there are two outcomes, which is called binomial; an example is predicting whether a tumor is malignant or benign. Other cases have more than two outcomes to classify, in which case it is called multinomial. A common example of multinomial logistic regression would be predicting the class of an iris flower among 3 different species.
OBJECTIVES OF LOGISTIC REGRESSION

Identify the independent variables that impact the dependent variable.

Establish a classification system, based on the logistic model, for determining group membership.
TYPES OF LOGISTIC REGRESSION
BINARY LOGISTIC REGRESSION
⚫ It is used when the dependent variable is dichotomous.

MULTINOMIAL LOGISTIC REGRESSION
⚫ It is used when the dependent or outcome variable has more than two categories.
LOGISTIC REGRESSION – BINARY CLASSIFICATION
With linear regression
With logistic regression
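
A minimal binary classification sketch along these lines, assuming scikit-learn is available (the tumor sizes and labels are made up for illustration):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up tumor sizes (cm) with labels: 0 = benign, 1 = malignant
X = np.array([[1.2], [1.8], [2.5], [3.1], [3.8], [4.5], [5.2], [6.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# Predicted class and class probabilities for a new tumor size
print(model.predict([[3.5]]))         # predicted label
print(model.predict_proba([[3.5]]))   # [P(benign), P(malignant)]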
NON-LINEAR REGRESSION
The simple linear regression algorithm only works when the relationship in the data is linear.

If we have non-linear data, linear regression is not capable of drawing a best-fit line and fails in such conditions.

Hence, we introduce polynomial regression to overcome this problem; it helps identify the curvilinear relationship between the independent and dependent variables.
NON-LINEAR REGRESSION
How does polynomial regression overcome the problem of non-linear data?

Polynomial regression is a form of linear regression in which, because of the non-linear relationship between the dependent and independent variables, we add polynomial terms to the linear model, converting it into polynomial regression.

Suppose we have X as independent data and Y as dependent data. Before feeding the data to a model, in the preprocessing stage we convert the input variable into polynomial terms of some degree, as sketched below.
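
A minimal sketch of that preprocessing step with NumPy (the data and degree are made up for illustration): the input x is expanded into polynomial terms, and an ordinary least squares fit is done on the expanded features.

import numpy as np

# Toy non-linear data: y roughly follows a quadratic in x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 9.2, 19.1, 33.0, 51.2])

degree = 2
# Convert the input variable into polynomial terms: columns 1, x, x^2
X_poly = np.column_stack([x ** d for d in range(degree + 1)])

# Linear least squares on the polynomial features
coeffs, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
print(coeffs)            # b0, b1, b2
print(X_poly @ coeffs)   # fitted curve values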
NON-LINEAR REGRESSION
BASIS FUNCTION REGRESSION
Polynomial basis functions

Gaussian basis functions
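
A hedged sketch of Gaussian basis function regression (the centers, width, and toy data are arbitrary illustrative choices): each input is mapped through a set of Gaussian bumps, and a linear model is then fit on those features.

import numpy as np

def gaussian_features(x, centers, width):
    # One Gaussian bump per center: exp(-(x - c)^2 / (2 * width^2))
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))

# Noisy non-linear toy data
x = np.linspace(0, 10, 30)
y = np.sin(x) + 0.1 * np.random.randn(30)

centers = np.linspace(0, 10, 8)   # arbitrary choice of basis centers
Phi = np.column_stack([np.ones_like(x),
                       gaussian_features(x, centers, width=1.0)])

# Ordinary least squares on the basis-expanded features
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(Phi @ w)   # fitted values at the training inputs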
