Simple Linear Regression
CONCEPT PAPER
Simple linear regression is a statistical method for summarizing and studying the
relationship between two continuous (quantitative) variables. The independent variable,
denoted x, is also called the predictor or explanatory variable; the dependent variable,
denoted y, is also called the response or outcome variable. The alternative terms are mentioned
only so that they can be recognized when encountered elsewhere. The adjective "simple"
indicates that the method involves only one predictor variable. When there are two or more
predictor variables, the method is called “multiple linear regression”.
Simple linear regression is appropriate when you want to understand the association
between two variables: one continuous dependent variable and one independent variable.
It has three major uses: determining the strength of predictors, forecasting an effect,
and trend forecasting. On its own, however, it does not establish a cause-and-effect
relationship between the variables.
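To make the idea of fitting a straight line concrete, the following is a minimal sketch of an ordinary least-squares fit in plain Python. The function name `fit_line` and the five data points are made up for illustration only; they are not the data used in the sample problem of this paper.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data, invented for this sketch
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

b0, b1 = fit_line(hours, scores)
print(f"score = {b0:.2f} + {b1:.2f} * hours")  # score = 47.70 + 4.10 * hours
```

The slope is the estimated change in y for a one-unit increase in x, and the intercept is the predicted y when x is zero.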
Linear regression is a simple method that is easy and intuitive to use and
understand; knowledge of high school mathematics is enough to apply it. In addition, it
works in most cases. Even when it does not fit the data exactly, it can still be used to
find the nature of the relationship between the two variables (D. Jain, 2009).
However, this statistical tool only captures relationships between dependent and
independent variables that are linear. It assumes a straight-line relationship between
them, which is sometimes incorrect. Linear regression is also very sensitive to anomalies
in the data (outliers). For example, suppose most of the data lie in the range 0-10. If,
for any reason, a single data item falls outside that range, say at 15, it can
significantly influence the regression coefficients. Another disadvantage is that if the
number of parameters exceeds the number of samples available, the model starts to
represent the noise rather than the relationship between the variables (D. Jain, 2009).
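The sensitivity to outliers can be illustrated with a small sketch in plain Python. The helper `slope_of` and both data sets are invented for this illustration: ten points with y values between 0 and 10, and the same points with a single y value replaced by 15.

```python
def slope_of(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data: every y value lies in the range 0-10
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys_clean = [2, 2, 3, 3, 4, 4, 5, 5, 6, 6]

# Same data, but the first observation is replaced by an outlier at 15
ys_outlier = [15] + ys_clean[1:]

slope_clean = slope_of(xs, ys_clean)      # about 0.48
slope_outlier = slope_of(xs, ys_outlier)  # about -0.22

print(f"slope without outlier: {slope_clean:.2f}")
print(f"slope with outlier:    {slope_outlier:.2f}")
```

In this made-up case a single out-of-range value is enough to turn an upward trend into a downward one, which is why outliers should be examined before the coefficients are interpreted.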
ASSUMPTIONS
SAMPLE PROBLEM
In a statistics course, we want to see whether there is any relationship between study time and
scores in the mid-semester exam.
We first graphed the data in a scatter plot to check whether the relationship looks linear.
Step 1
Select "Analyze -> Regression -> Linear".
A new window opens.
Step 2
From the list on the left, select the variable "Exam score" as "Dependent" and the variable
"Hours" as the "Independent(s)".
Step 3
Click “OK”. The results appear in the "Output" window.
Step 4
With the results displayed, we can now interpret them.
From B in the third table, the reported p-value is .000 (that is, p < .001), so the
relationship between study hours and exam scores is statistically significant. From A in the
second table, the correlation coefficient, R, is 0.827. Therefore, we can conclude that study
hours are positively correlated with exam scores and that the relationship is strong (R is
positive and close to 1). From C in the last table, we can conclude that, on average, each
additional hour a student studies is associated with 2.391 more marks in the exam.
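The reported figures can be pushed one step further with simple arithmetic. The sketch below uses only the two values quoted above (R = 0.827 and a slope of 2.391); the three extra study hours are a hypothetical input chosen for illustration. In simple linear regression the coefficient of determination equals the square of R.

```python
r = 0.827      # correlation coefficient reported in the output
slope = 2.391  # marks gained per additional study hour, as reported

# Share of the variation in exam scores accounted for by study hours
r_squared = r ** 2

# Expected difference in marks for a hypothetical 3 extra hours of study
extra_hours = 3
expected_gain = slope * extra_hours

print(f"R squared = {r_squared:.3f}")               # R squared = 0.684
print(f"Expected gain = {expected_gain:.3f} marks")  # Expected gain = 7.173 marks
```

Read this way, study hours account for roughly 68% of the variation in exam scores in this sample.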
The null hypothesis, which states that there is no significant relationship between study
time and scores in the mid-semester exam, is therefore rejected in favor of the alternative
hypothesis that such a relationship exists.