Lecture 8 Linear and Multiple Regression
Multiple Linear Regression
Lecture Outcomes
• What is Regression?
• Real-World Examples
• The Line, the Slope and the Intercept
• How is a Simple Linear Regression Analysis done?
• Plotting
• Fitting the Line
• Finding the Best Line
STA6166-RegBasics 2
What is Linear Regression?
• Definition:
Linear regression is a statistical method used to model the
relationship between a dependent variable and one or more
independent variables.
• The goal of linear regression is to find the line that best fits the data;
that line can then be used for prediction.
Real-World Example
• Scenario: Predicting house prices
• Independent Variable (X): Square footage of a house
• Dependent Variable (Y): Price of the house
• Question: How can we predict the price of a house based on its size?
Linear regression helps us understand this relationship.
Real-Life Example-II
• Scenario:
Imagine you want to predict a student's score on a test based on the
number of hours they studied.
• Independent variable (x): Number of hours studied
• Dependent variable (y): Test score
• The question:
Can we predict the test score by knowing the number of hours
studied?
Simple Linear Regression
• Equation for a Straight Line:
The equation for a straight line is:
y = β0 + β1⋅x, i.e., y = mx + c (slope m = β1, intercept c = β0)
Where:
• y = predicted value (test score)
• x = independent variable (hours studied)
• β0 = intercept (the point where the line crosses the y-axis)
• β1 = slope (how much y changes for each one-unit increase in x)
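Understanding the Slope and Intercept
The ‘c’
• Intercept (β0): This is the value of y when x = 0. In our example, it is the
predicted score when no hours are studied. This is where the line
crosses the y-axis.
The ‘m’
• Slope (β1): This is the change in the predicted y for each one-unit increase in x.
In our example, it is how many points the predicted score rises for each additional hour studied.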
Visualizing Linear Regression
• Graph Example:
• X-axis: Hours studied (independent variable)
• Y-axis: Test score (dependent variable)
• Scatter Plot:
Plot data points showing hours studied versus test scores.
• Regression Line:
The straight line drawn through the points that best represents the
relationship between hours studied and test score.
How Does Linear Regression Work?
• Objective:
Find the line that best fits the data by minimizing the difference
between actual values and predicted values.
• How do we fit the line?
• We use the least squares method, which minimizes the sum of the squared errors
(the differences between actual and predicted values).
• Error (Residual):
The difference between the observed value (real test score) and the
predicted value (value on the line).
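Here is a minimal Python sketch of the residual definition. The function name `residuals`, the candidate line (intercept 45, slope 5) and the three data points are hypothetical and chosen only for illustration. A positive residual means the observed point lies above the line, and a negative residual means it lies below.

```python
def residuals(xs, ys, b0, b1):
    """Residuals e_i = y_i - y_hat_i for the candidate line y_hat = b0 + b1 * x."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Hypothetical data: hours studied and observed test scores
hours = [1, 2, 3]
scores = [52, 54, 61]

# Hypothetical candidate line: predicted score = 45 + 5 * hours
print(residuals(hours, scores, b0=45, b1=5))  # [2, -1, 1]
```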
How Do We Find the Best-Fitting Line?
• Method: Least Squares Method
• This method minimizes the sum of the squared differences between the
observed (actual) values and the predicted values.
• Formula for minimizing:
$$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
• where $Y_i$ are the actual values and $\hat{Y}_i$ are the predicted values.
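The sum of squared errors can be computed directly. This sketch uses a hypothetical helper `sse` to score two candidate lines on the seven data points from the worked example later in this lecture. The candidate coefficients are illustrative assumptions. The line with the smaller SSE fits better.

```python
def sse(xs, ys, b0, b1):
    """Sum of squared errors sum((y_i - y_hat_i)**2) for y_hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


X = [1, 2, 3, 4, 5, 6, 7]
Y = [1.5, 3.8, 6.7, 9.0, 11.2, 13.6, 16]

# Two candidate lines (coefficients chosen for illustration)
print(round(sse(X, Y, b0=0.0, b1=2.0), 4))     # 9.78
print(round(sse(X, Y, b0=-0.83, b1=2.41), 4))  # 0.1715
```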
Evaluating the Model
An operations supervisor measured how long it takes one of her drivers to put 1, 2, 3 and 4
cases of soft drink into a soft drink machine. In this case, the levels of the explanatory variable
X are {1, 2, 3, 4}, and she controls them. She might repeat the measurement a couple of times at
each level of X. A scatter plot of the resulting data might look like:
A forestry graduate student makes wrapping paper from different percentages
of hardwood and then measures its tensile strength. He is free to choose, at
the beginning of the study, to work with only five percentages, say {5%,
10%, 15%, 20%, 25%}. A scatter plot of the resulting data might look like:
A farm manager is interested in the relationship between litter size and average
litter weight (average newborn piglet weight). She examines the farm records
over the last couple of years and records the litter size and average weight for all
births. A plot of the data pairs looks like the following:
A farm operations student is interested in the relationship between maintenance
cost and age of farm tractors. He performs a telephone interview survey of the 52
commercial potato growers in Putnam County, FL. One part of the questionnaire
provides information on tractor age and 1995 maintenance cost (fuel, lubricants,
repairs, etc.). A plot of these data might look like:
Questions needing answers.
• What is the association between Y and X?
• How can changes in Y be explained by changes in X?
• What are the functional relationships between Y and X?
A functional relationship is written symbolically as:
$$Y = f(X) \qquad \text{(Eq. 1)}$$
Example: A proportional relationship (e.g., fish weight to length).
$$Y = b_1 X$$
b1 is the slope of the line.
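The proportional model Y = b1X has no intercept, so least squares minimizes Σ(Yi − b1Xi)². Setting the derivative with respect to b1 to zero gives b1 = ΣXiYi / ΣXi². Below is a short sketch of that formula. The function name and the data are hypothetical.

```python
def fit_through_origin(xs, ys):
    """Least-squares slope b1 for the no-intercept model Y = b1 * X."""
    sxy = sum(x * y for x, y in zip(xs, ys))  # sum of X_i * Y_i
    sxx = sum(x * x for x in xs)              # sum of X_i ** 2
    return sxy / sxx


# Hypothetical measurements that lie exactly on Y = 3X
print(fit_through_origin([1, 2, 4], [3, 6, 12]))  # 3.0
```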
Example: Linear relationship (e.g., Y = cholesterol versus X = age)
$$Y = b_0 + b_1 X$$
b0 is the intercept,
b1 is the slope.
Problem Statement
Linear Regression: Fitting a Line
$y = mx + b$
Which Line Best Fits the Data Points? (This one, or this one, or this one?)
Finding out the best line
The line that minimizes the sum of squared errors
Linear Regression (Example)
• Find the best line that fits the following data points:
X 1 2 3 4 5 6 7
Y 1.5 3.8 6.7 9.0 11.2 13.6 16
Linear Regression (Example): Plot
Linear Regression (Example)
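X    Y      XY      X²
1    1.5    1.5     1
2    3.8    7.6     4
3    6.7    20.1    9
4    9.0    36      16
5    11.2   56      25
6    13.6   81.6    36
7    16     112     49
Totals (n = 7): ΣX = 28, ΣY = 61.8, ΣXY = 314.8, ΣX² = 140
• Slope: $m = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \frac{7(314.8) - 28(61.8)}{7(140) - 28^2} = \frac{473.2}{196} \approx 2.41$
• Intercept: $b = \frac{\sum Y - m\sum X}{n} = \frac{61.8 - 28(473.2/196)}{7} = \frac{61.8 - 67.6}{7} \approx -0.8286$
• Fitted line: $\hat{y} = 2.41x - 0.8286$
• Predictions from the fitted line:
• x = 2 → ŷ ≈ 3.99
• x = 5 → ŷ ≈ 11.22
• x = 7 → ŷ ≈ 16.04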
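The hand calculation can be reproduced in a few lines of Python. This sketch uses the same column-sum formulas for m and b. The function name `fit_line` is hypothetical. The sketch keeps the slope unrounded (m ≈ 2.4143), so its predictions at x = 2, 5 and 7 (about 4.00, 11.24 and 16.07) come out slightly higher than values computed with m rounded to 2.41.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b, from column sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b


X = [1, 2, 3, 4, 5, 6, 7]
Y = [1.5, 3.8, 6.7, 9.0, 11.2, 13.6, 16]

m, b = fit_line(X, Y)
print(f"m = {m:.4f}, b = {b:.4f}")  # m = 2.4143, b = -0.8286
for x in (2, 5, 7):
    print(f"x = {x}: y_hat = {m * x + b:.2f}")
```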
Equations Governing Linear Regression
Example (Home Work)
• The iodine value (x) is the amount of iodine necessary to saturate a
sample of 100 g of oil. Fit the simple linear regression model to this
data.
Multiple Linear Regression
• Boston Housing Data
• Titanic Data
• Iris Data
Multiple Linear Regression (Example)
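• Finding the best-fit plane:
$$X = \begin{bmatrix} 1 & 2 & 5 \\ 1 & 3 & 8 \\ 1 & 4 & 2 \end{bmatrix}, \quad Y = \begin{bmatrix} 6 \\ 8 \\ 12 \end{bmatrix}, \quad B = \begin{bmatrix} B_1 \\ B_2 \\ B_3 \end{bmatrix}$$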
• Step 1: compute $X^T X$.
• Step 2: compute its inverse $(X^T X)^{-1}$.
• Step 3: multiply by the transpose to get $(X^T X)^{-1} X^T$.
• Step 4: multiply by Y to get the coefficients $B = (X^T X)^{-1} X^T Y$.
• Result (B1 = b0, B2 = b1, B3 = b2):
• b0 = −1.69
• b1 = 3.48
• b2 = −0.05
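This is a pure-Python sketch of the steps above. It assumes no third-party libraries, and all function names are hypothetical. One deliberate swap: instead of forming (XᵀX)⁻¹ explicitly, it solves the normal equations (XᵀX)B = XᵀY by Gauss-Jordan elimination. That gives the same B whenever XᵀX is invertible and is numerically preferable.

Applied to the three rows of X and Y as they appear here, X is square and invertible. The fitted plane therefore passes exactly through all three points, and the sketch returns B ≈ (1.78, 2.67, −0.22). This does not match the coefficients listed above (−1.69, 3.48, −0.05), which suggests those were computed from more observations than the three rows reproduced here.

```python
def transpose(A):
    """Turn rows into columns."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Plain matrix product A @ B for lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [v] for row, v in zip(A, rhs)]  # augmented matrix [A | rhs]
    for c in range(n):
        p = max(range(c, n), key=lambda k: abs(M[k][c]))  # row with largest pivot
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_multiple(X, Y):
    """Least-squares coefficients B from the normal equations (X^T X) B = X^T Y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    XtY = [sum(a * y for a, y in zip(row, Y)) for row in Xt]
    return solve(XtX, XtY)


# X and Y as shown in the example (first column of X is the intercept column)
X = [[1, 2, 5],
     [1, 3, 8],
     [1, 4, 2]]
Y = [6, 8, 12]
print([round(v, 4) for v in fit_multiple(X, Y)])  # [1.7778, 2.6667, -0.2222]
```

With more rows than coefficients, the same function returns the least-squares fit. For example, passing rows [1, x] for the seven points of the simple linear example reproduces b ≈ −0.8286 and m ≈ 2.4143.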
Practical Example: Hours Studied vs. Test Scores
• Data:
Example dataset with hours studied and corresponding test scores.
• Hours studied: [1, 2, 3, 4, 5]
• Test scores: [50, 55, 60, 65, 70]
• Regression Line:
After performing linear regression, we get the equation for the line:
• Test Score = 45 + 5⋅(Hours Studied)
• This means that for every additional hour of study, the predicted score increases by 5 points (e.g., 1 hour → 50, 5 hours → 70).
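The coefficients for the hours-versus-scores data can be checked with the deviation form of the least-squares formulas, b1 = Sxy / Sxx and b0 = ȳ − b1·x̄. This form is algebraically equivalent to the column-sum formulas. Here is a short sketch on the data above.

```python
hours = [1, 2, 3, 4, 5]
scores = [50, 55, 60, 65, 70]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# Slope = S_xy / S_xx, using deviations from the means
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
s_xx = sum((x - mean_x) ** 2 for x in hours)
b1 = s_xy / s_xx
b0 = mean_y - b1 * mean_x  # the fitted line passes through (mean_x, mean_y)

print(b0, b1)  # 45.0 5.0
```

The intercept comes out as 45, so the fitted line is Test Score = 45 + 5·(Hours Studied). A student who studies 1 hour is predicted to score 50.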
Advantages of Linear Regression
• Simple and Easy to Understand:
The model is easy to explain and interpret, making it a good starting
point for prediction.
• Quick Computation:
Linear regression can be computed quickly, even with large datasets.
• Clear Interpretation:
The coefficients (β0 and β1) are easy to understand in practical terms.
Limitations of Linear Regression
• Sensitive to Outliers:
Outliers (data points far from the line) can greatly affect the line.
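To see how much a single outlier can matter, this sketch refits the hours-versus-scores data after replacing the last score with a hypothetical outlier (20 instead of 70). The function name and the outlier value are illustrative assumptions. One bad point out of five is enough to flip the slope from +5 to −5.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 (deviation form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    s_xx = sum((x - mx) ** 2 for x in xs)
    b1 = s_xy / s_xx
    return my - b1 * mx, b1


hours = [1, 2, 3, 4, 5]
clean = [50, 55, 60, 65, 70]
with_outlier = [50, 55, 60, 65, 20]  # hypothetical bad record for the last student

print(fit_line(hours, clean))         # (45.0, 5.0)
print(fit_line(hours, with_outlier))  # (65.0, -5.0)
```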
Applications of Linear Regression
• Predicting Sales:
Estimating sales based on advertising budget or marketing spend.
• Predicting Growth:
Estimating population growth or economic trends based on certain
factors.
• Medical Research:
Predicting the impact of various factors (e.g., age, lifestyle) on health
outcomes.
• Real Estate:
Estimating house prices based on factors like square footage, number
of rooms, and location.
Conclusion
• Linear regression is a powerful, simple, and widely used tool for
making predictions.
• It’s most useful when there is a linear relationship between the
independent and dependent variables.
• Understanding its basic principles, like the slope and intercept, helps
in applying linear regression effectively.
What is Regression?
Relationships
• Cause-and-effect relationships.
• Functional relationships.
Example: quadratic relationship
$$Y = b_0 + b_1 X + b_2 X^2$$
b0: intercept,
b1: linear coefficient,
b2: quadratic coefficient.
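A quadratic model is still linear in its coefficients b0, b1 and b2. It can therefore be fitted with the same least-squares normal equations by treating X² as an extra column of the design matrix. Below is a minimal sketch. The function name and the data (points lying exactly on Y = 1 + 2X + 3X²) are hypothetical.

```python
def fit_quadratic(xs, ys):
    """Least-squares b0, b1, b2 for Y = b0 + b1*X + b2*X**2."""
    rows = [[1.0, x, x * x] for x in xs]  # design matrix columns: 1, X, X^2
    # Normal equations (X^T X) b = X^T Y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix
    M = [A[i] + [rhs[i]] for i in range(3)]
    for c in range(3):
        p = max(range(c, 3), key=lambda k: abs(M[k][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(3):
            if i != c:
                f = M[i][c] / M[c][c]
                M[i] = [v - f * w for v, w in zip(M[i], M[c])]
    return [M[i][3] / M[i][i] for i in range(3)]


xs = [0, 1, 2, 3, 4]
ys = [1 + 2 * x + 3 * x * x for x in xs]  # 1, 6, 17, 34, 57
print([round(b, 6) for b in fit_quadratic(xs, ys)])  # [1.0, 2.0, 3.0]
```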