Polynomial Regression Models
Polynomial regression models are multiple linear regression models that contain higher-order terms. They are useful when there is reason to believe that the relationship between two variables is curvilinear.
In a polynomial regression model of order k, the relationship between the predictor and the response variable is modeled as a k-th order polynomial function:
$$y = \beta_0 + \beta_1 X + \beta_2 X^2 + \cdots + \beta_k X^k + \varepsilon$$
When $k = 2$, the polynomial model is referred to as a quadratic model, and the parameters $\beta_1$ and $\beta_2$ are the linear and quadratic effects respectively.
When $k = 3$, the polynomial model is referred to as a cubic model, and the parameters $\beta_1$, $\beta_2$ and $\beta_3$ are the linear, quadratic and cubic effects respectively.
The least-squares estimates are obtained as

$$\hat{\boldsymbol{\beta}} = \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_k \end{pmatrix} = (X'X)^{-1}X'y, \qquad \text{where } X = \begin{pmatrix} 1 & (x_1-\bar{x}) & (x_1-\bar{x})^2 & \cdots & (x_1-\bar{x})^k \\ 1 & (x_2-\bar{x}) & (x_2-\bar{x})^2 & \cdots & (x_2-\bar{x})^k \\ 1 & (x_3-\bar{x}) & (x_3-\bar{x})^2 & \cdots & (x_3-\bar{x})^k \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & (x_n-\bar{x}) & (x_n-\bar{x})^2 & \cdots & (x_n-\bar{x})^k \end{pmatrix}$$
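As a concrete illustration, the estimates can be computed numerically. Below is a minimal sketch assuming NumPy is available; the helper name fit_poly is ours:

```python
import numpy as np

def fit_poly(x, y, k):
    """Least-squares fit of a k-th order polynomial in the centered
    predictor (x - x_bar); returns (b0, b1, ..., bk)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()                      # center the predictor
    # Design matrix with columns 1, (x - x_bar), ..., (x - x_bar)^k
    X = np.vander(xc, k + 1, increasing=True)
    # lstsq solves the normal equations X'X b = X'y without forming
    # the inverse explicitly, which is numerically safer
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```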
Significance tests for determining the best polynomial model fit:
The higher the powers of X, the more complex the model becomes; hence it is advisable to keep the powers as low as possible.
To determine the best fit, we successively compare two models: the k-th order (reduced) model and the (k+1)-th order (full) model. Thus we compare the simple linear and quadratic models, the quadratic and cubic models, the cubic and fourth-order models, and so on.
The test statistic is

$$F = \frac{SSE_R - SSE_F}{MSE_F} \sim F(1,\; n-k-2)$$
where $SSE_R$ is the error sum of squares for the reduced model and $SSE_F$ is the error sum of squares for the full model.
The best fit is attained when the calculated value of $F$ is less than $F(1,\; n-k-2;\; \alpha)$.
Alternatively, we test the hypotheses

$$H_0: \beta_{k+1} = 0 \quad \text{vs.} \quad H_1: \beta_{k+1} \neq 0$$
If we fail to reject $H_0$, then we conclude that the k-th order polynomial is the better fit and hence the best fit.
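For illustration, here is a minimal sketch of this nested-model comparison, assuming SciPy is available; the helper name partial_f_test is ours:

```python
from scipy import stats

def partial_f_test(sse_reduced, sse_full, n, k, alpha=0.05):
    """Partial F test of H0: beta_{k+1} = 0, comparing the k-th order
    (reduced) and (k+1)-th order (full) polynomial models."""
    df_error_full = n - k - 2                     # error d.f. of the full model
    mse_full = sse_full / df_error_full
    f_stat = (sse_reduced - sse_full) / mse_full  # one extra parameter => 1 numerator d.f.
    f_crit = stats.f.ppf(1 - alpha, 1, df_error_full)
    return f_stat, f_crit, f_stat > f_crit        # True => reject H0, keep the full model
```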
Example:
(i) Fit linear, quadratic and cubic regression models to the data below:

y:  9.42  4.88  3.36  3.28  1.67  7.35  6.30  4.67  9.33  5.04
x:  11    9     5     5     4     10    3     10    11    8
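These fits can be reproduced numerically; the sketch below assumes NumPy and should match the sums of squares in the ANOVA tables that follow, up to rounding:

```python
import numpy as np

y = np.array([9.42, 4.88, 3.36, 3.28, 1.67, 7.35, 6.30, 4.67, 9.33, 5.04])
x = np.array([11, 9, 5, 5, 4, 10, 3, 10, 11, 8], dtype=float)
xc = x - x.mean()                           # centered predictor, x_bar = 7.6

for k in (1, 2, 3):                         # linear, quadratic, cubic
    X = np.vander(xc, k + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    sse = np.sum((y - fitted) ** 2)         # error sum of squares
    ssr = np.sum((fitted - y.mean()) ** 2)  # regression sum of squares
    print(f"order {k}: SSR = {ssr:.4f}, SSE = {sse:.4f}, "
          f"MSE = {sse / (len(y) - k - 1):.4f}")
```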
Linear model:
ANOVA TABLE:
Source of variation    d.f.    Sum of squares    Mean sum of squares    F-value
Regression             1       28.6689           28.6689                7.4268
Error                  8       30.882            3.8602
Quadratic model:
Parameter estimates:
Variable           Estimate    s.e.(estimate)
Intercept          3.3649
$(x-\bar{x})$      0.7958      0.1623
$(x-\bar{x})^2$    0.2565      0.0818
ANOVA TABLE:
Source of variation    d.f.    Sum of squares    Mean sum of squares    F-value
Regression             2       46.704            23.352                 12.7245
Error                  7       12.846            1.8352
Cubic model:
Parameter estimates:
Variable           Estimate    s.e.(estimate)
Intercept          3.4066
$(x-\bar{x})$      0.9031      0.4204
$(x-\bar{x})^2$    0.2437      0.0991
$(x-\bar{x})^3$    0.0095      0.0339
ANOVA TABLE:
Source of variation    d.f.    Sum of squares    Mean sum of squares    F-value
Regression             3       46.87             15.6233                7.3925
Error                  6       12.68             2.1134
Testing the quadratic coefficient, $H_0: \beta_2 = 0$ vs. $H_1: \beta_2 \neq 0$:

$$T = \frac{0.2565}{0.0818} = 3.136$$

The t-critical value is $t(7, 0.025) = 2.365$; hence the predictor is significant. The quadratic model is a better fit.
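As a quick check, the same comparison can be done with SciPy (a sketch using the estimates from the quadratic-model table above):

```python
from scipy import stats

t_stat = 0.2565 / 0.0818            # estimate / s.e. of the (x - x_bar)^2 term
t_crit = stats.t.ppf(1 - 0.025, 7)  # two-sided test at the 5% level, 7 error d.f.
print(t_stat, t_crit)               # ~3.136 > ~2.365, so the term is significant
```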
Alternatively, comparing the two models (linear as the reduced model and quadratic as the full model):

$$F = \frac{30.882 - 12.846}{1.8352} = 9.8278$$

The F-critical value is $F(1, 7, 0.05) = 5.59$; hence we reject $H_0$: the full (quadratic) model is a better fit.
Comparing the quadratic and cubic models, using the regression coefficient, $H_0: \beta_3 = 0$ vs. $H_1: \beta_3 \neq 0$:

$$T = \frac{0.0095}{0.0339} = 0.28$$

The t-critical value is $t(6, 0.025) = 2.447$; hence the predictor is not significant. The quadratic model is a better fit.
Alternatively, comparing the two models (quadratic as the reduced model and cubic as the full model):

$$F = \frac{12.846 - 12.68}{2.1134} = 0.079$$

The F-critical value is $F(1, 6, 0.05) = 5.99$; hence we fail to reject $H_0$: the reduced (quadratic) model is a better fit.
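Both F comparisons can be verified with the partial_f_test sketch given earlier:

```python
# Linear (reduced) vs. quadratic (full): n = 10, k = 1
print(partial_f_test(30.882, 12.846, n=10, k=1))  # F ~ 9.83 > 5.59: reject H0
# Quadratic (reduced) vs. cubic (full): n = 10, k = 2
print(partial_f_test(12.846, 12.680, n=10, k=2))  # F ~ 0.079 < 5.99: fail to reject H0
```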
Hence we conclude that the quadratic model is the one that best describes the relationship
between y and x.