
Lecture 8: Linear Regression & Multiple Linear Regression
Lecture Outcomes

• What is Regression?
• Real World Examples
• The Line, the Slope and the Intercept
• How is a Simple Linear Regression Analysis done?
• Plotting
• Fitting the Line
• Finding the best Line

What is Linear Regression?
• Definition:
Linear regression is a statistical method used to model the
relationship between a dependent variable and one or more
independent variables.

• Regression is a statistical procedure that determines the equation for the straight line that best fits a specific set of data.

Goal:
• We want to draw a straight line that best fits the data points, so we can predict values of one variable based on the other(s).

• The goal of linear regression is to find the line that best fits the data
and can be used for prediction.
Real-World Example
• Scenario: Predicting house prices
• Independent Variable (X): Square footage of a house
• Dependent Variable (Y): Price of the house
• Question: How can we predict the price of a house based on its size?
Linear regression helps us understand this relationship.
Real-Life Example-II
• Scenario:
Imagine you want to predict a student's score on a test based on the
number of hours they studied.
• Independent variable (x): Number of hours studied
• Dependent variable (y): Test score
• The question:
Can we predict the test score by knowing the number of hours
studied?
Simple Linear Regression
• Equation for a Straight Line:
The equation for a straight line is:
y = β0 + β1·x   (i.e., y = mx + c)

Where:
• y = predicted value (test score)
• x = independent variable (hours studied)
• β0​= intercept (the point where the line crosses the y-axis)
• β1 = slope (how much y changes for each unit change in x)
Understanding the Slope and Intercept
• Intercept (β0), the 'c': the value of y when x = 0. In our example, it is the predicted score when no hours are studied. This is where the line crosses the Y-axis.
• Slope (β1), the 'm': how much the dependent variable (y) changes for each unit change in the independent variable (x). In our example, it tells us how much the test score increases for each additional hour of study.
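As a minimal sketch in Python (the coefficient values below are illustrative; they happen to match the hours-studied example worked out later in this lecture), the fitted line is simply a prediction function:

b0 = 45.0   # intercept: predicted score with 0 hours of study (illustrative value)
b1 = 5.0    # slope: extra points per additional hour of study (illustrative value)

def predict_score(hours):
    # Evaluate the regression line y = b0 + b1*x
    return b0 + b1 * hours

print(predict_score(3))                     # 60.0
print(predict_score(4) - predict_score(3))  # 5.0 -- exactly the slope b1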
The Straight Line
• Any straight line can be represented by an equation of the form Y = bX
+ a, where b and a are constants.
• The value of b is called the slope constant and determines the
direction and degree to which the line is tilted.
• The value of a is called the Y-intercept and determines the point
where the line crosses the Y-axis.

Visualizing Linear Regression

• Graph Example:
• X-axis: Hours studied (independent variable)
• Y-axis: Test score (dependent variable)
• Scatter Plot:
Plot data points showing hours studied versus test scores.
• Regression Line:
The straight line drawn through the points that best represents the
relationship between hours studied and test score.
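A minimal plotting sketch, assuming NumPy and matplotlib are available (the study-hours data below are invented for illustration):

import numpy as np
import matplotlib.pyplot as plt

hours = np.array([1, 2, 3, 4, 5])         # X-axis: hours studied
scores = np.array([52, 55, 61, 64, 70])   # Y-axis: test scores (illustrative numbers)

# Fit a straight line (degree-1 polynomial); polyfit returns [slope, intercept]
slope, intercept = np.polyfit(hours, scores, deg=1)

plt.scatter(hours, scores, label="observed data")              # scatter plot
plt.plot(hours, intercept + slope * hours, label="regression line")
plt.xlabel("Hours studied")
plt.ylabel("Test score")
plt.legend()
plt.show()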
How Does Linear Regression Work?
• Objective:
Find the line that best fits the data by minimizing the difference
between actual values and predicted values.
• How do we fit the line?
• We use a method called Least Squares to minimize the error (the difference
between actual and predicted points).
• Error (Residual):
The difference between the observed value (real test score) and the
predicted value (value on the line).
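For instance, residuals are just the element-wise differences between observed and fitted values (a short sketch with illustrative numbers):

observed  = [50, 55, 60, 65, 70]   # actual test scores
predicted = [51, 54, 61, 66, 68]   # values read off the fitted line (illustrative)

residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]
print(residuals)   # [-1, 1, -1, -1, 2]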
How Do We Find the Best-Fitting Line?
• Method: Least Squares Method
• This method minimizes the sum of the squared differences between the
observed (actual) values and the predicted values.
• Formula for minimizing:
  SSE = Σ (Yi − Ŷi)²
• where Yi are the actual values and Ŷi are the predicted values.
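A short sketch of the quantity being minimized; comparing two candidate lines on the hours-studied data used later in this lecture shows why the least-squares line is preferred:

def sse(x_vals, y_vals, b0, b1):
    # Sum of squared errors for the candidate line y = b0 + b1*x
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(x_vals, y_vals))

x = [1, 2, 3, 4, 5]
y = [50, 55, 60, 65, 70]

print(sse(x, y, 45, 5))   # 0.0  -- the least-squares line for these points
print(sse(x, y, 40, 6))   # 30.0 -- a worse candidate line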
Evaluating the Model

• Mean Squared Error (MSE):
  MSE = (1/n) Σ (Yi − Ŷi)²
• This measures the average of the squared differences between actual and predicted values.
• The lower the MSE, the better the model.
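A minimal sketch of the MSE computation, assuming NumPy (the arrays are illustrative):

import numpy as np

actual    = np.array([50, 55, 60, 65, 70])
predicted = np.array([51, 54, 61, 66, 68])

mse = np.mean((actual - predicted) ** 2)
print(mse)   # (1 + 1 + 1 + 1 + 4) / 5 = 1.6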
Steps in Linear Regression
1. Collect data:
Gather data for the dependent and independent variables (e.g., hours studied and test scores).
2. Plot the data:
Create a scatter plot to visualize the relationship between variables.
3. Fit the line:
Use linear regression to find the line that best fits the data.
4. Make predictions:
Once the line is fitted, use it to make predictions about new data (e.g., predict test scores based on hours studied).
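These four steps map onto a few lines of code; a sketch assuming NumPy and SciPy are installed, using the hours-studied data from this lecture:

import numpy as np
from scipy.stats import linregress

# 1. Collect data
hours  = np.array([1, 2, 3, 4, 5])
scores = np.array([50, 55, 60, 65, 70])

# 2. Plot the data (scatter plot) -- see the plotting sketch earlier

# 3. Fit the line by least squares
fit = linregress(hours, scores)
print(fit.intercept, fit.slope)   # 45.0 and 5.0 for these points

# 4. Make predictions about new data
new_hours = 6
print(fit.intercept + fit.slope * new_hours)   # 75.0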
Linear Regression (cont.)
• How well a set of data points fits a straight line can be measured by
calculating the distance between the data points and the line.
• The total error between the data points and the line is obtained by
squaring each distance and then summing the squared values.
• The regression equation is designed to produce the minimum sum of
squared errors.

An operations supervisor measured how long it takes one of her drivers to put 1, 2, 3 and 4
cases of soft drink into a soft drink machine. In this case the levels of the explanatory variable,
X are {1,2,3,4}, and she controls them. She might repeat the measurement a couple of times at
each level of X. A scatter plot of the resulting data might look like:

A forestry graduate student makes wrapping paper out of different percentages of hardwood and then measures its tensile strength. He has the freedom to choose at the beginning of the study to have only five percentages to work with, say {5%, 10%, 15%, 20%, and 25%}. A scatter plot of the resulting data might look like:

A farm manager is interested in the relationship between litter size and average
litter weight (average newborn piglet weight). She examines the farm records
over the last couple of years and records the litter size and average weight for all
births. A plot of the data pairs looks like the following:

A farm operations student is interested in the relationship between maintenance
cost and age of farm tractors. He performs a telephone interview survey of the 52
commercial potato growers in Putnam County, FL. One part of the questionnaire
provides information on tractor age and 1995 maintenance cost (fuel, lubricants,
repairs, etc). A plot of these data might look like:

Questions needing answers.
• What is the association between Y and X?
• How can changes in Y be explained by changes in X?
• What are the functional relationships between Y and X?
A functional relationship is symbolically written as:

Eq. 1:  Y = f(X)

Example: A proportional relationship (e.g. fish weight to length):

Y = b1·X,  where b1 is the slope of the line.
Example: A linear relationship (e.g. Y = cholesterol versus X = age):

Y = b0 + b1·X,  where b0 is the intercept and b1 is the slope.

Problem Statement

Linear Regression: Fitting a Line
y = mx + b
Which line best fits the data points: this one, or this one, or this one?
Finding the best line
The best line is the one that minimizes the sum of squared errors.
Linear Regression (Example)
• Find the best line that fits the following data points:

X 1 2 3 4 5 6 7
Y 1.5 3.8 6.7 9.0 11.2 13.6 16
Linear Regression (Example): Plot
Linear Regression (Example)
X      Y      XY      X²
1      1.5    1.5     1
2      3.8    7.6     4
3      6.7    20.1    9
4      9.0    36.0    16
5      11.2   56.0    25
6      13.6   81.6    36
7      16.0   112.0   49
ΣX=28  ΣY=61.8  ΣXY=314.8  ΣX²=140
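Plugging the column totals from this table into the closed-form least-squares formulas (plain Python, no libraries needed) reproduces the coefficients reported on the next slide:

n = 7
sum_x, sum_y, sum_xy, sum_x2 = 28, 61.8, 314.8, 140

b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # slope
b0 = (sum_y - b1 * sum_x) / n                                   # intercept

print(round(b1, 4))   # 2.4143
print(round(b0, 4))   # -0.8286 (the slide reports -0.8285)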


Linear Regression (Example)
• Slope: b1 = (nΣXY − ΣXΣY) / (nΣX² − (ΣX)²) = (7(314.8) − (28)(61.8)) / (7(140) − 28²) = 473.2 / 196 ≈ 2.4143
• Intercept: b0 = Ȳ − b1·X̄ = (61.8/7) − 2.4143(28/7) ≈ −0.8285
Linear Regression (Example)
• Predictions from the fitted line ŷ = −0.83 + 2.41x:
• ŷ(2) = −0.83 + 2.41(2) = 3.99
• ŷ(5) = −0.83 + 2.41(5) = 11.22
• ŷ(7) = −0.83 + 2.41(7) = 16.04
Equations Governing Linear Regression
• Slope: b1 = (nΣXY − ΣXΣY) / (nΣX² − (ΣX)²)
• Intercept: b0 = Ȳ − b1·X̄
• The resulting line is the one that minimizes the sum of squared errors.
Example (Home Work)
• The iodine value (x) is the amount of iodine necessary to saturate a
sample of 100 g of oil. Fit the simple linear regression model to this
data.
Multiple Linear Regression
• Example datasets with several predictors: Boston Housing Data, Titanic Data, Iris Data.
• General model: Y = b0 + b1·X1 + b2·X2 + … + bk·Xk
Multiple Linear Regression (Example)
• Finding the best fit plane:

X1 (Product-1 Sale)   X2 (Product-2 Sale)   Y (Weekly Sale)
1                     4                     1
2                     5                     6
3                     8                     8
4                     2                     12
Multiple Linear Regression (Example)
    | 1  1  4 |        | 1  |        | B0 |
X = | 1  2  5 |    Y = | 6  |    B = | B1 |
    | 1  3  8 |        | 8  |        | B2 |
    | 1  4  2 |        | 12 |

(The leading column of 1s in X corresponds to the intercept B0.)
Multiple Linear Regression (Example)

• XᵀX = | 4    10    19 |
        | 10   30    46 |
        | 19   46   109 |

• (XᵀX)⁻¹ = (1/366) | 1154  −216  −110 |
                    | −216    75     6 |
                    | −110     6    20 |

Multiple Linear Regression (Example)

• (XᵀX)⁻¹Xᵀ = (1/366) |  498   172  −374    70 |
                      | −117   −36    57    96 |
                      |  −24     2    68   −46 |

Multiple Linear Regression (Example)

• B = (XᵀX)⁻¹XᵀY = | −1.69 |
                   |  3.48 |
                   | −0.05 |
Multiple Linear Regression (Example)
• b0 = −1.69
• b1 = 3.48
• b2 = −0.05
• Fitted plane: ŷ = −1.69 + 3.48·X1 − 0.05·X2
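As a cross-check (a sketch assuming NumPy is installed), solving B = (XᵀX)⁻¹XᵀY for the data above reproduces these coefficients:

import numpy as np

# Design matrix with a leading column of 1s for the intercept (data from the slides)
X = np.array([[1, 1, 4],
              [1, 2, 5],
              [1, 3, 8],
              [1, 4, 2]], dtype=float)
Y = np.array([1, 6, 8, 12], dtype=float)

B = np.linalg.inv(X.T @ X) @ X.T @ Y    # normal-equations solution
print(np.round(B, 2))                   # [-1.7   3.48 -0.05] (the slide rounds b0 to -1.69)

# np.linalg.lstsq computes the same coefficients in a numerically safer way
B_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.round(B_lstsq, 2))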
Practical Example: Hours Studied vs. Test Scores
• Data:
Example dataset with hours studied and corresponding test scores.
• Hours studied: [1, 2, 3, 4, 5]
• Test scores: [50, 55, 60, 65, 70]
• Regression Line:
After performing linear regression, we get the equation for the line:
• Test Score = 45 + 5⋅(Hours Studied)
• This means that for every additional hour of study, the predicted score increases by 5 points (e.g., 1 hour → 50, 2 hours → 55).
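A quick check of this small dataset with scikit-learn (assuming it is installed) recovers the same intercept and slope:

import numpy as np
from sklearn.linear_model import LinearRegression

hours  = np.array([[1], [2], [3], [4], [5]])   # one feature column (2-D, as scikit-learn expects)
scores = np.array([50, 55, 60, 65, 70])

model = LinearRegression().fit(hours, scores)
print(model.intercept_, model.coef_[0])   # 45.0 and 5.0
print(model.predict([[6]]))               # [75.] -- predicted score for 6 hours of study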
Advantages of Linear Regression
• Simple and Easy to Understand:
The model is easy to explain and interpret, making it a good starting
point for prediction.
• Quick Computation:
Linear regression can be computed quickly, even with large datasets.
• Clear Interpretation:
The coefficients (β0 and β1​) are easy to understand in practical terms.
Limitations of Linear Regression

• Assumes a Linear Relationship:
It only works well if the relationship between the variables is linear. If the relationship is non-linear, this method may not work.

• Sensitive to Outliers:
Outliers (data points far from the line) can greatly affect the line.
Applications of Linear Regression
• Predicting Sales:
Estimating sales based on advertising budget or marketing spend.
• Predicting Growth:
Estimating population growth or economic trends based on certain
factors.
• Medical Research:
Predicting the impact of various factors (e.g., age, lifestyle) on health
outcomes.
• Real Estate:
Estimating house prices based on factors like square footage, number
of rooms, and location.
Conclusion
• Linear regression is a powerful, simple, and widely used tool for
making predictions.
• It’s most useful when there is a linear relationship between the
independent and dependent variables.
• Understanding its basic principles, like the slope and intercept, helps
in applying linear regression effectively.
What is Regression?
Relationships

In science, we frequently measure two or more variables on the same individual (case, object, etc.). We do this to explore the nature of the relationship among these variables. There are two basic types of relationships:
• Cause-and-effect relationships.
• Functional relationships.

Function: a mathematical relationship enabling us to predict what values of one variable (Y) correspond to given values of another variable (X).
• Y is referred to as the dependent variable, the response variable or the predicted variable.
• X is referred to as the independent variable, the explanatory variable or the predictor variable.
Example: Polynomial relationship (e.g. Y = crop yield vs. X = pH)

Y = b0 + b1·X + b2·X²

b0: intercept,
b1: linear coefficient,
b2: quadratic coefficient.
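A sketch of fitting such a quadratic with NumPy; the yield-versus-pH numbers below are invented purely for illustration:

import numpy as np

pH = np.array([4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5])                 # X: soil pH (illustrative)
crop_yield = np.array([20.0, 30.0, 38.0, 42.0, 41.0, 35.0, 25.0])  # Y: crop yield (illustrative)

# Degree-2 fit; np.polyfit returns coefficients highest power first: [b2, b1, b0]
b2, b1, b0 = np.polyfit(pH, crop_yield, deg=2)

# Predicted yield at pH 6.2 from Y = b0 + b1*X + b2*X^2
print(b0 + b1 * 6.2 + b2 * 6.2 ** 2)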
