Lesson 2 Linear Regression
Data Science
Module 2
Week 2
Review of Descriptive and Inferential Statistics
Data Processing and Visualization with R
Module Objectives
At the end of this module, students must be able to:
1. Differentiate the two areas of statistics: descriptive and
inferential;
2. Perform simple linear regression in Excel and in R along with
pertinent visual output;
3. Perform multiple linear regression in Excel and in R along with
pertinent visual output.
Statistics Refresher
STATISTICS
- DESCRIPTIVE: Collection, Organization, Presentation
- INFERENTIAL (via Probability): Point Estimation, Interval Estimation, Hypothesis Testing
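The split between the two areas can be sketched in R. This is a minimal illustration only: the vector x holds made-up values, and the null value of 50 is an arbitrary assumption chosen for the example.

```r
# Hypothetical sample of 10 measurements (made-up values, for illustration only)
x <- c(52, 55, 49, 61, 58, 50, 54, 57, 53, 56)

# DESCRIPTIVE: organize and summarize the data at hand
desc <- c(mean = mean(x), median = median(x), sd = sd(x))
print(desc)

# INFERENTIAL: use the sample to draw conclusions about the population
tt <- t.test(x, mu = 50)                 # H0: population mean equals 50 (assumed value)
point_estimate    <- unname(tt$estimate) # point estimation
interval_estimate <- tt$conf.int         # interval estimation (95% by default)
p_value           <- tt$p.value          # hypothesis testing

print(point_estimate)
print(interval_estimate)
print(p_value)
```

The descriptive part only describes these 10 values; the t.test() call is where probability enters, since the interval and p-value rely on a sampling distribution.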
The Process of Statistics
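Diagram: POPULATION (PARAMETER) -> Sampling Theory -> SAMPLE (STATISTIC)
Descriptive Statistics: summarize the sample
Inferential Statistics: use the sample statistic to draw conclusions about the population parameter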
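The population/sample distinction can be mimicked with a simulation. The sketch below assumes a hypothetical population of 10,000 normally distributed values; in practice the population is usually not observable and the parameter is unknown.

```r
set.seed(1)  # make the simulation reproducible

# Hypothetical population (assumed normal, mean 170, sd 10)
population <- rnorm(10000, mean = 170, sd = 10)
mu <- mean(population)               # PARAMETER: describes the population

# Draw a simple random sample of size 50
smp <- sample(population, size = 50)
x_bar <- mean(smp)                   # STATISTIC: describes the sample

# Inference: interval estimate of the parameter based on the sample only
ci <- t.test(smp)$conf.int

print(c(parameter = mu, statistic = x_bar))
print(ci)
```

Rerunning with a different seed changes x_bar and ci but not mu, which is the point of the diagram: the statistic varies from sample to sample, while the parameter is fixed.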
Stat Refresher: Regression Analysis
Regression Analysis:
Statistical technique used most frequently to analyze the
relationship between two or more variables.
In linear regression, the response variable should be continuous
Deals with the way one variable tends to change as one or
more other variables change
Example
Predictor Variables
- The predictors $X_1, X_2, \ldots, X_p$ can be continuous, discrete, or
categorical variables
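A short R sketch of how the three predictor types enter a model. The data frame d and its column names (x_cont, x_disc, x_cat) are hypothetical and the values are made up; the sketch only shows that a categorical predictor stored as a factor is expanded into indicator columns.

```r
# Hypothetical data: one response and three predictors of different types
d <- data.frame(
  y      = c(10.2, 12.1, 9.8, 14.3, 13.0, 11.5, 15.2, 12.8),
  x_cont = c(1.5, 2.0, 1.2, 3.1, 2.7, 1.9, 3.4, 2.5),           # continuous
  x_disc = c(1L, 2L, 1L, 3L, 2L, 2L, 3L, 2L),                   # discrete (count)
  x_cat  = factor(c("a", "b", "a", "c", "b", "a", "c", "b"))    # categorical
)

# Design matrix that lm() builds internally:
# the factor becomes indicator (dummy) columns, with level "a" as the baseline
X <- model.matrix(y ~ x_cont + x_disc + x_cat, data = d)
print(colnames(X))

# The same formula can be passed directly to lm()
fit <- lm(y ~ x_cont + x_disc + x_cat, data = d)
print(coef(fit))
```

Here the discrete predictor is treated as numeric; wrapping it in factor() would instead give it one indicator column per level, which is a modelling choice rather than a rule.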
Initial EDA
Prior to any regression modelling, the data should always be
inspected for:
Data-entry errors
Missing values
Outliers
Unusual (e.g., asymmetric) distributions
Changes in variability
Clustering
Non-linear bivariate relationships
Unexpected patterns
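Several of the checks listed above can be run with base R before any model is fitted. This is a sketch under stated assumptions: the data frame d, its columns height and weight, and all values are hypothetical, with one missing value and one implausible height planted on purpose.

```r
# Hypothetical data with a planted missing value (NA) and a planted
# data-entry error (height of 250)
d <- data.frame(
  height = c(160, 172, 168, NA, 181, 175, 158, 169, 250, 177),
  weight = c(55, 70, 64, 60, 82, 74, 52, 66, 71, 77)
)

# Data-entry errors and odd ranges: look at min/max per column
print(summary(d))

# Missing values: count NAs per column
n_missing <- colSums(is.na(d))
print(n_missing)

# Outliers: values beyond 1.5 * IQR from the hinges (boxplot rule)
outliers <- boxplot.stats(d$height)$out
print(outliers)

# Distribution shape: histogram of one variable
hist(d$weight, main = "Weight", xlab = "weight")

# Bivariate relationship, clustering, changes in variability: scatterplot
plot(d$height, d$weight, xlab = "height", ylab = "weight")
```

The boxplot rule is only one convention for flagging outliers; a flagged value still needs to be checked against the data source before it is corrected or dropped.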
Simple Linear Regression
The Variables
X : explanatory variable (horizontal axis)
Y : response variable (vertical axis)
After data collection, we have pairs of observations:
$(X_1, Y_1), \ldots, (X_n, Y_n)$
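In R, the pairs of observations are naturally stored as two columns of a data frame. The sketch below uses made-up values (not the lesson's sample data) and the hypothetical name obs; it plots X on the horizontal axis and Y on the vertical axis, then fits a straight line with lm().

```r
# Hypothetical (X, Y) pairs, one row per observation (made-up values)
obs <- data.frame(
  X = c(150, 155, 160, 165, 170, 175, 180),  # explanatory variable
  Y = c(50, 53, 57, 60, 66, 69, 74)          # response variable
)
n <- nrow(obs)  # number of pairs

# Scatterplot: X on the horizontal axis, Y on the vertical axis
plot(obs$X, obs$Y, xlab = "X (explanatory)", ylab = "Y (response)")

# Simple linear regression of Y on X, with the fitted line added to the plot
fit <- lm(Y ~ X, data = obs)
abline(fit)

print(coef(fit))  # intercept and slope estimates
```

The formula Y ~ X reads "Y modelled as a function of X", so the response always goes on the left of the tilde.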
Sample Data 1
Variables: X (Height), Y (Weight)