Chemical Engineering
LECTURE 11
LINEAR AND NONLINEAR REGRESSION
CURVE FITTING
ÖZGE KÜRKÇÜOĞLU-LEVİTAS
Linear Least-Squares Regression
$$y = a_0 + a_1 x + e$$
$a_0$: intercept
$a_1$: slope
$e$: error or residual, $e = y - a_0 - a_1 x$, the discrepancy between the true value of $y$ and the approximate value $a_0 + a_1 x$ predicted by the linear equation.
The best line through the data is the one that minimizes the sum of the squares of the residuals (errors):
$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$$
To minimize $S_r$, set its partial derivatives with respect to the two unknowns equal to zero:
$$\frac{\partial S_r}{\partial a_0} = -2\sum_{i=1}^{n} (y_i - a_0 - a_1 x_i) = 0$$
$$\frac{\partial S_r}{\partial a_1} = -2\sum_{i=1}^{n} x_i (y_i - a_0 - a_1 x_i) = 0$$
This gives 2 equations with 2 unknowns, called the normal equations. Their solution is
$$a_1 = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}, \qquad a_0 = \bar{y} - a_1 \bar{x}$$
Example: to fit $y = a_0 + a_1 x$, calculate the means $\bar{x}$ and $\bar{y}$ and solve the normal equations, which yields
$$y = -234.3 + 19.5x$$
Quantification of Error of Linear Regression
Sum of the squares of the residuals: $S_r = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$
Discrepancy between the data and the mean: $S_t = \sum_{i=1}^{n} (y_i - \bar{y})^2$
Coefficient of determination: $r^2 = \dfrac{S_t - S_r}{S_t}$; its square root $r$ is the correlation coefficient.
A perfect fit gives $S_r = 0$ and $r^2 = 1$: the line explains 100% of the variability of the data.
IMPORTANT! Just because $r^2$ is close to 1 does not mean that the fit is necessarily good: a high $r^2$ can be obtained even when $x$ and $y$ are not linearly related.
[Figure: Anscombe's four data sets, each with the same best-fit line, y = 3 + 0.5x.]
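A quick illustration of this caveat (made-up data, not from the lecture): a straight line fitted to clearly quadratic data still produces a high $r^2$.

>> x = (1:10)'; y = x.^2; % clearly nonlinear data
>> a = polyfit(x, y, 1); % fit a straight line anyway
>> r2 = 1 - sum((y - polyval(a,x)).^2)/sum((y - mean(y)).^2) % about 0.95 despite the curvature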
Linearization of Nonlinear Relationships
Exponential model: $y = a_1 e^{b_1 x}$, linearized as $\ln y = \ln a_1 + b_1 x$
Power model: $y = a_2 x^{b_2}$, linearized as $\log y = \log a_2 + b_2 \log x$
Saturation-growth-rate model: $y = a_3 \dfrac{x}{b_3 + x}$, linearized as $\dfrac{1}{y} = \dfrac{1}{a_3} + \dfrac{b_3}{a_3}\dfrac{1}{x}$
The relationship between x and y is not always linear. The first step in any regression analysis is to plot and visually inspect the data.
Example: fit the power model via the transformed data. Compute the mean values of the transformed variables and fit the straight line
$$\log y = \log a_2 + b_2 \log x$$
>> plotregression(x,y)
plots the data together with the best linear fit, $y = -234.3 + 19.5x$. Alternatively, fit a straight line to the log-transformed data:
>> [r,m,b] = regression(log10(x),log10(y))
r=
0.9737
m=
1.9842
b=
-0.5620
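The power-model coefficients are then recovered from the logarithmic fit (a short follow-up sketch using the m and b above):

>> a2 = 10^b % 10^(-0.5620), about 0.274
>> b2 = m % about 1.98

so the power model is approximately $y = 0.274\,x^{1.98}$.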
function [a, r2] = linregr(x,y)
% linregr: linear regression curve fitting
% [a, r2] = linregr(x,y): Least squares fit of straight
% line to data by solving the normal equations
% input:
% x = independent variable
% y = dependent variable
% output:
% a = vector of slope, a(1), and intercept, a(2)
% r2 = coefficient of determination
n = length(x);
if length(y)~=n, error('x and y must be same length'); end
x = x(:); y = y(:); % convert to column vectors
sx = sum(x); sy = sum(y);
sx2 = sum(x.*x); sxy = sum(x.*y); sy2 = sum(y.*y);
a(1) = (n*sxy-sx*sy)/(n*sx2-sx^2);
a(2) = sy/n-a(1)*sx/n;
r2 = ((n*sxy-sx*sy)/sqrt(n*sx2-sx^2)/sqrt(n*sy2-sy^2))^2;
% create plot of data and best fit line
xp = linspace(min(x),max(x),2);
yp = a(1)*xp+a(2);
plot(x,y,'o',xp,yp)
grid on

(Chapra, 3rd ed.)
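A usage sketch for linregr. The data below are the wind-tunnel force measurements behind the example fit earlier (values from Chapra; the slides show only the resulting line):

>> x = [10 20 30 40 50 60 70 80];
>> y = [25 70 380 550 610 1220 830 1450];
>> [a, r2] = linregr(x,y)
a =
19.4702 -234.2857
r2 =
0.8805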
The built-in function polyfit fits a least-squares nth-order polynomial to data:
>> p = polyfit(x, y, n)
For our example n = 1, since a straight line is a first-order polynomial. polyfit returns the coefficients in descending powers, here slope then intercept:
>> a = polyfit(x,y,1)
a =
19.4702 -234.2857
i.e., $y = -234.3 + 19.5x$ as before.
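The fitted polynomial can then be evaluated at new points with polyval, e.g. to predict y at x = 40 (an illustrative point):

>> polyval(a, 40)
ans =
544.5238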
Example: fit a curve to the following data.
t: 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
w: 6.00 4.83 3.70 3.15 2.41 1.83 1.49 1.21 0.96 0.73 0.64
• The data is first plotted with linear scales on both axes [figure: w vs. t, linear-linear].
• Which model could fit: a power function? a logarithmic function? a reciprocal or an exponential?
Candidate models and their linearizing transformations (see the comparison sketch after this list):
• Power: $w = bt^m$, fit with polyfit(log(t), log(w), 1)
• Logarithmic: $w = m\ln t + b$, fit with polyfit(log(t), w, 1); or $w = m\log_{10} t + b$, fit with polyfit(log10(t), w, 1)
• Reciprocal: $w = 1/(mt + b)$, i.e. $1/w = mt + b$, fit with polyfit(t, 1./w, 1)
• Exponential: $w = be^{mt}$, i.e. $\ln w = mt + \ln b$, fit with polyfit(t, log(w), 1)
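One way to choose among the candidates (a sketch reusing the t, w data above): fit each transformed model and compare the $r^2$ of the straight-line fits. Note that log(0) is undefined, so the t = 0 point is dropped for the power and logarithmic models; also, the $r^2$ values live in different transformed coordinates, so they are only a rough guide alongside the plots.

>> f = @(u,v) 1 - sum((v - polyval(polyfit(u,v,1),u)).^2)/sum((v - mean(v)).^2);
>> r2_pow = f(log(t(2:end)), log(w(2:end)))
>> r2_log = f(log(t(2:end)), w(2:end))
>> r2_rec = f(t, 1./w)
>> r2_exp = f(t, log(w)) % highest here, consistent with the semilog plot below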
[Figure: the transformed data plotted with x linear / y logarithmic and x linear / y reciprocal. The semilog plot is close to a straight line, while the reciprocal plot is not linear, pointing to the exponential model.]
>> p = polyfit(t, log(w), 1); % ln(w) = m*t + ln(b)
>> m = p(1);
>> b = exp(p(2));
>> tc = 0:0.1:5;
>> wc = b*exp(m*tc);
>> plot(t,w,'o',tc,wc)
General Linear Least-Squares and
Nonlinear Regression
1. Exponential Model:
$$y = ae^{bx}$$
For this case, the sum of the squares of the residuals is
$$S_r = \sum_{i=1}^{n} \left(y_i - ae^{bx_i}\right)^2$$
Setting $\partial S_r/\partial a = 0$ gives $a = \dfrac{\sum y_i e^{bx_i}}{\sum e^{2bx_i}}$; substituting this into $\partial S_r/\partial b = 0$ leaves a single nonlinear equation in $b$, which can be solved by numerical methods (such as bisection).
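A sketch of this procedure in MATLAB, using fzero in place of bisection (x and y are the data vectors; the starting guess of -0.1 is illustrative and should be chosen near the expected rate):

>> aofb = @(b) sum(y.*exp(b*x))/sum(exp(2*b*x)); % a as a function of b, from dSr/da = 0
>> dSrdb = @(b) sum(x.*y.*exp(b*x)) - aofb(b)*sum(x.*exp(2*b*x)); % root condition from dSr/db = 0
>> b = fzero(dSrdb, -0.1);
>> a = aofb(b);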
Example:
$$\gamma = Ae^{\lambda t}$$
$\lambda$ is found by solving the nonlinear equation above, and $A$ from $A = \dfrac{\sum \gamma_i e^{\lambda t_i}}{\sum e^{2\lambda t_i}}$, which gives
$\lambda = -0.1151$, $A = 0.9998$:
$$\gamma = 0.9998\,e^{-0.1151t}$$
>> t=[0 1 3 5 7 9];
>> gamma=[1. 0.891 0.708 0.562 0.447 0.355 ];
>> plot(t,gamma,'o')
>> hold on
>> x=0:24;
>> plot(x,0.9998*exp(-0.1151*x))
$\gamma = 0.9998\,e^{-0.1151t}$, so at $t = 24$ h, $\gamma = 6.31 \times 10^{-2}$.
2. Polynomial Regression:
For a polynomial model, e.g. $y = a_0 + a_1 x + a_2 x^2 + e$, the sum of the squares of the residuals is
$$S_r = \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right)^2$$
To generate the least-squares fit, we take the derivative of $S_r$ with respect to each unknown coefficient of the polynomial and set it to zero, which again yields a system of linear normal equations.
Standard error: $s_{y/x} = \sqrt{\dfrac{S_r}{n - (m+1)}}$, where $m$ is the polynomial order.
Example: use MATLAB to solve the normal equations for the quadratic fit, where N and r contain the required sums:
>> N = [6 15 55;15 55 225;55 225 979];
>> r = [152.6 585.6 2488.8]'; % right-hand side must be a column vector
>> a = N\r
a =
2.4786
2.3593
1.8607
The coefficient of determination $r^2$ then follows from $S_t$ and $S_r$ as before.
3. General Linear Least Squares
$$y = a_0 z_0 + a_1 z_1 + a_2 z_2 + \dots + a_m z_m + e$$
In matrix form,
$$\{y\} = [Z]\{a\} + \{e\}$$
where $[Z]$ is the matrix of the calculated values of the basis functions at the measured values of the independent variables, $\{a\}$ contains the unknown coefficients, and $\{e\}$ the residuals.
m: number of basis functions (variables)
n: number of data points
Since $n \geq m+1$, $[Z]$ is generally not a square matrix.
Normal equations:
$$[Z]^T[Z]\{a\} = [Z]^T\{y\}$$
The coefficient of determination is again computed from $S_t$ and $S_r$.
Example:
$$y = a_0 + a_1 x + a_2 x^2$$
Use MATLAB:
>> x = [0 1 2 3 4 5]';
>> y = [2.1 7.7 13.6 27.2 40.9 61.1]';
>> Z = [ones(size(x)) x x.^2];
>> a = Z\y
a =
2.4786
2.3593
1.8607
i.e., $y = 2.4786 + 2.3593x + 1.8607x^2$, the same result as solving the normal equations directly.
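The coefficient of determination for this fit follows directly from the residuals (continuing the session above):

>> Sr = sum((y - Z*a).^2);
>> St = sum((y - mean(y)).^2);
>> r2 = 1 - Sr/St
r2 =
0.9985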
4. Nonlinear Regression:
There are many cases in engineering and science where nonlinear models must be fit to data. These models have a nonlinear dependence on their parameters, such as
$$y = a_0\left(1 - e^{-a_1 x}\right) + e$$
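Such models are typically fit by minimizing $S_r$ iteratively; a minimal sketch with fminsearch (data vectors x, y and the starting guess [1 1] are assumed for illustration):

>> Sr = @(a) sum((y - a(1)*(1 - exp(-a(2)*x))).^2); % sum of squared residuals
>> a = fminsearch(Sr, [1 1]) % a(1) = a0, a(2) = a1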