NUMERICAL METHODS

MA 311
MODULE 3: REGRESSION AND INTERPOLATION

Musango Lungu, D Eng

School of Mines and Mineral Sciences
Chemical Engineering Department
© 2023 M.L.
REGRESSION AND INTERPOLATION
• Many scientific and engineering observations are made by conducting
experiments in which physical quantities are measured and recorded.
• The experimental records are typically referred to as data points.
• For example, in Chemical Reaction Engineering, reactant A decomposes in a batch reactor according to $A \rightarrow \text{products}$.
• The composition of A in the reactor is measured at various times, with the results tabulated below:

  Time t, s                      0     20    40    60    120   180   300
  Concentration CA, mol/liter    10    8     6     5     3     2     1

  (CA0 = 10 mol/liter is the initial concentration at t = 0.)
• Sometimes measurements are made and recorded continuously with
analog devices, but in most cases, especially in recent years with
the wide use of computers, the measured quantities are digitized and
stored as a set of discrete points.
• Once the data is known, scientists and engineers can use it in different
ways.
• Often the data is used for developing, or evaluating, mathematical
formulas (equations) that represent the data.
• This is done by curve fitting in which a specific form of an equation is
assumed, or provided by a guiding theory, and then the parameters of the
equation are determined such that the equation best fits the data points.
• Sometimes the data points are used for estimating the expected values
between the known points, a procedure called interpolation.
• Or for predicting how the data might extend beyond the range over which
it was measured, a procedure called extrapolation.
• A primitive way of curve fitting is by arbitrarily drawing a best fit line
through the points. Such a method is subjective.
• An objective approach is to derive a curve that minimizes the error
between the data points and the curve. This is called least squares
regression.
• When the model is linear in its coefficients, the coefficients can be estimated using linear regression; non-linear regression is used for models that are non-linear in their coefficients.
• The most popular technique is to fit the 'best' straight line

  $y = a_0 + a_1 x + e$    (1)

  through n data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.
• where a0 and a1 are coefficients representing the intercept and slope respectively.
• e is the error or residual between the true value of y and the value predicted by the model:

  $e = y - a_0 - a_1 x$    (2)

• This is called linear regression.
• The sum of the squares of the residuals between the measured y and the y calculated with the linear model is:

  $S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$    (3)
• The coefficients a0 and a1 can be determined by minimization of equation 3 with respect to each coefficient:

  $\frac{\partial S_r}{\partial a_0} = -2\sum (y_i - a_0 - a_1 x_i) = 0, \qquad \frac{\partial S_r}{\partial a_1} = -2\sum \left[(y_i - a_0 - a_1 x_i)\,x_i\right] = 0$    (4)

• Simplifying the equations gives the solution:

  $a_1 = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}, \qquad a_0 = \bar{y} - a_1 \bar{x}$    (5)
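• As an illustration, equation 5 can be evaluated directly in MATLAB (a minimal sketch; the data are the batch-reactor measurements from the opening example, and the variable names are ours):

  % Least-squares straight line through n data points (equation 5)
  x = [0 20 40 60 120 180 300];   % times, s (illustrative data)
  y = [10 8 6 5 3 2 1];           % concentrations, mol/L
  n = length(x);
  a1 = (n*sum(x.*y) - sum(x)*sum(y)) / (n*sum(x.^2) - sum(x)^2);  % slope
  a0 = mean(y) - a1*mean(x);                                      % intercept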
Example 1
A leaf filter has an area of 0.5 m² and operates at a constant pressure drop of 500 kPa. The following test results were obtained for a slurry in water which gave rise to a filter cake regarded as incompressible:
  V (m³)        0.1     0.2     0.3     0.4     0.5
  t/V (s/m³)    1400    1800    2000    2600    3000

The filtration equation for constant pressure drop is given as

  $\frac{t}{V} = \frac{\mu r_c \varphi}{2A^2(-\Delta p)}\,V + \frac{\mu r_c \varphi\, V_{eq}}{A^2(-\Delta p)}$

where t is time (s), V is the volume of filtrate (m³), A is the surface area (m²), r_c is the cake resistance, −Δp is the pressure drop (kPa), V_eq is the volume of filtrate that creates a cake of a certain thickness (m³), φ is the volume of cake formed by passage of unit volume of filtrate and μ is the viscosity of the fluid (Pa·s).
• It can be seen that the filtration equation is in the form $y = a_0 + a_1 x$ (with y = t/V and x = V), and thus we can obtain a0 and a1 by linear regression, where a0 corresponds to $\frac{\mu r_c \varphi V_{eq}}{A^2(-\Delta p)}$ and a1 to $\frac{\mu r_c \varphi}{2A^2(-\Delta p)}$.

Solution
• Refer to the Excel sheet accompanying this PPT.
• The coefficients a0 and a1 are thus determined to be 960 s/m³ and 4000 s/m⁶.
• The quantities $\frac{\mu r_c \varphi}{A^2(-\Delta p)} = 2a_1$ and $V_{eq} = a_0/(2a_1) = 0.12$ m³ are then computed.
• Given values of φ and μ, one can also compute the cake resistance, r_c.
• Other graphing software such as Origin Lab also computes linear regression.
• MATLAB computes linear regression using the built-in polyfit function, as sketched below.
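• For instance, the Example 1 data can be fitted in one line (a sketch; polyfit returns the coefficients highest power first):

  V  = [0.1 0.2 0.3 0.4 0.5];         % filtrate volume, m^3
  tV = [1400 1800 2000 2600 3000];    % t/V, s/m^3
  p  = polyfit(V, tV, 1);             % p(1) = slope a1, p(2) = intercept a0
  % returns p = [4000 960], matching the regression results above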
Example 2
According to Charles's law for an ideal gas, at constant volume, a linear relationship exists between the pressure, p, and the temperature, T. In an experiment, a fixed volume of gas in a sealed container is submerged in ice water (T = 0 °C). The temperature of the gas is then increased in ten increments up to T = 100 °C by heating the water, and the pressure of the gas is measured at each temperature. The data from the experiment are:

  T, °C    0     10    20    30    40    50    60    70    80    90    100
  p, atm   0.94  0.96  1.0   1.05  1.07  1.09  1.14  1.17  1.21  1.24  1.28

(i) Use linear least-squares regression to determine a linear function in the form $p = a_1 T + a_0$ that best fits the data points.
(ii) Extrapolate the data to determine the absolute zero temperature, T0.
• Origin Lab graphing software gives the following plot.
• Linear fit: p = 0.0034T + 0.9336
• T0 = −0.9336/0.0034 = −274.5882 °C
• This result is close to the handbook value of −273.15 °C.
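• The same fit can be reproduced and extrapolated with polyfit (a sketch using the tabulated data; with the unrounded coefficients the estimate lands close to −273 °C):

  T = 0:10:100;                                                 % temperature, deg C
  p = [0.94 0.96 1.0 1.05 1.07 1.09 1.14 1.17 1.21 1.24 1.28];  % pressure, atm
  c = polyfit(T, p, 1);        % c(1) = slope, c(2) = intercept
  T0 = -c(2)/c(1)              % extrapolate to p = 0 (absolute zero)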
POLYNOMIAL REGRESSION
• If the dependent variable is a linear function of several variables $x_1, x_2, \ldots, x_N$, we may wish to fit a set of data $(x_{1i}, x_{2i}, \ldots, x_{Ni}, y_i),\ i = 1, \ldots, n$, to the following linear equation in a least-squares manner, with n > N:

  $y = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_N x_N$

• Defining the error:

  $E = \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_{1i} - \cdots - a_N x_{Ni}\right)^2$    (6)

• The minimum value of E will be obtained with choices of $a_0, a_1, \ldots, a_N$ given by
  $\frac{\partial E}{\partial a_k} = -2\sum_{i=1}^{n}\left(y_i - a_0 - a_1 x_{1i} - \cdots - a_N x_{Ni}\right)x_{ki} = 0, \qquad k = 0, 1, \ldots, N$    (7)

  (where $x_{0i} = 1$ for all i)

• These equations can be rewritten in the following matrix form:

  $\begin{bmatrix} n & \sum x_{1i} & \cdots & \sum x_{Ni} \\ \sum x_{1i} & \sum x_{1i}^2 & \cdots & \sum x_{1i}x_{Ni} \\ \vdots & \vdots & & \vdots \\ \sum x_{Ni} & \sum x_{Ni}x_{1i} & \cdots & \sum x_{Ni}^2 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_N \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_{1i}y_i \\ \vdots \\ \sum x_{Ni}y_i \end{bmatrix}$    (8)

• Equation 8 is referred to as the normal equations for linear regression.
• Any suitable method for solving matrices studied in module 2 can be used to obtain the solution for the a coefficients.
• Alternatively, the LEAST SQUARES SOLUTION that minimizes E is given by formal MATRIX ALGEBRA as:

  $\mathbf{a} = \left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{y}$    (9)

where

  $\mathbf{X} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{N1} \\ 1 & x_{12} & \cdots & x_{N2} \\ \vdots & \vdots & & \vdots \\ 1 & x_{1n} & \cdots & x_{Nn} \end{bmatrix}, \qquad \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$    (10)
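• A sketch of equations 9 and 10 in MATLAB (the regressor and response values here are hypothetical; in practice the backslash operator is preferred over forming the inverse explicitly):

  x1 = [1; 2; 3; 4; 5];   x2 = [2; 1; 4; 3; 5];   % hypothetical regressors
  y  = [4.1; 5.0; 9.2; 9.1; 13.1];                % hypothetical responses
  X  = [ones(size(x1)) x1 x2];                    % design matrix of equation 10
  a  = (X'*X) \ (X'*y);                           % normal equations, equation 9
  % equivalently, and better conditioned: a = X \ y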
Example 3: Application of least squares to develop a cost model for the cost of heat exchangers
• Curve fitting of the costs of fabrication of heat exchangers can be used to predict the cost of a new exchanger of the same class with different design variables.
• Let the cost be expressed as an equation that is linear in the design variables, and estimate the values of the constants from the table of cost data; the regressors are the design variables.
• Solution: The matrices used in computing the least-squares solution are assembled as in equation 10.
• The solution is given by equation 9; solving using MATLAB gives the constants, from which the model equation is written.
FUNCTIONAL REGRESSION
• In some situations the relationship between the dependent and independent variables is not linear.
• Examples of such functions are:
  the Arrhenius expression (reaction engineering), i.e. $k = k_0 e^{-E/RT}$ or the modified form $k = k_0 T^m e^{-E/RT}$;
  the Nusselt number correlation (heat & mass transfer), i.e. $Nu = a\,Re^b\,Pr^c$.
• Such expressions can be conveniently converted to linear form by the introduction of logarithms.
• Thus $\ln k = \ln k_0 - \frac{E}{RT}$ and $\ln Nu = \ln a + b\ln Re + c\ln Pr$.
Example 4
From the following equilibrium data for the distribution of SO3 in hexane, determine a suitable linear (in the parameters) empirical model to represent the data.

  Pressure (psia)              200    400    600    800    1000   1200   1400   1600
  Yi, weight fraction hexane   0.846  0.573  0.401  0.288  0.209  0.153  0.111  0.078
• Solution: The graph of −ln Yi versus the pressure xi is a straight line, as can be seen in the plot.
• The data thus fit a model of the form:

  $\ln Y_i = a_0 + a_1 x_i$, i.e. $Y_i = e^{a_0} e^{a_1 x_i}$

• Using Origin Lab, the constants a0 and a1 are computed to be 0.12127 and −0.00167.
• Using minimization of the square of the error:

  $E = \sum_i \left(\ln Y_i - a_0 - a_1 x_i\right)^2$

• Therefore the fitting equation is $Y_i = e^{0.12127 - 0.00167 x_i} = 1.129\, e^{-0.00167 x_i}$
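• A quick MATLAB check of this fit (a sketch; polyfit is applied to the log-transformed data):

  P = 200:200:1600;                                        % pressure, psia
  Y = [0.846 0.573 0.401 0.288 0.209 0.153 0.111 0.078];   % weight fraction hexane
  c = polyfit(P, log(Y), 1);    % fit ln Y = a0 + a1*P
  % c(2) = 0.12127 (intercept a0) and c(1) = -0.00167 (slope a1)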


• Other nonlinear equations can be transformed into linear form in a similar way.
NON LINEAR REGRESSION
• There are many cases in Chemical Engineering that require fitting non-linear models to data.
• These models are defined as those that have a non-linear dependence on their parameters.
• Example (a model that is non-linear in a0 and a1): $f(x) = a_0\left(1 - e^{-a_1 x}\right)$
• Such an equation cannot be converted to linear form in the manner seen previously.
• Thus in these situations we use a non-linear regression approach.
• As with the least-squares approach, the method minimizes the sum of the squares of the residuals, but the solution proceeds in an iterative manner.
• The Gauss-Newton method is one algorithm for minimizing the sum of the squares of the residuals between data and nonlinear equations.
• The model is

  $y_i = f(x_i; a_0, a_1, \ldots, a_m) + e_i$    (11)

  where yi is the measured value of the dependent variable, $f(x_i; a_0, a_1, \ldots, a_m)$ is the equation that is a function of the independent variable xi and a non-linear function of the parameters $a_0, a_1, \ldots, a_m$, and ei is a random error.
• For convenience the model is abbreviated:

  $y_i = f(x_i) + e_i$    (12)

• The non-linear model can be expanded in a Taylor series around the parameter values and truncated after the first derivative. For a two-parameter model:

  $f(x_i)_{j+1} \approx f(x_i)_j + \frac{\partial f(x_i)_j}{\partial a_0}\Delta a_0 + \frac{\partial f(x_i)_j}{\partial a_1}\Delta a_1$    (13)

  where j is the initial guess, j + 1 is the prediction, $\Delta a_0 = a_{0,j+1} - a_{0,j}$ and $\Delta a_1 = a_{1,j+1} - a_{1,j}$.
• Substituting equation 13 into equation 12 yields:

  $y_i - f(x_i)_j = \frac{\partial f(x_i)_j}{\partial a_0}\Delta a_0 + \frac{\partial f(x_i)_j}{\partial a_1}\Delta a_1 + e_i$    (14)

• In matrix notation:

  $\{D\} = [A_j]\{\Delta a\} + \{E\}$    (15)

• where Aj is the matrix of partial derivatives of the function evaluated at the initial guess j, and {D} contains the differences $y_i - f(x_i)_j$:

  $[A_j] = \begin{bmatrix} \partial f_1/\partial a_0 & \partial f_1/\partial a_1 \\ \partial f_2/\partial a_0 & \partial f_2/\partial a_1 \\ \vdots & \vdots \\ \partial f_n/\partial a_0 & \partial f_n/\partial a_1 \end{bmatrix}$    (16)
• Applying linear least-squares theory yields:

  $\{\Delta a\} = \left([A_j]^T[A_j]\right)^{-1}[A_j]^T\{D\}$    (17)

• Thus, the approach consists of solving equation 17, and the result is employed to compute improved values for the parameters:

  $a_{0,j+1} = a_{0,j} + \Delta a_0, \qquad a_{1,j+1} = a_{1,j} + \Delta a_1$

• This procedure is repeated until the solution converges, i.e. satisfies the convergence stopping criterion:

  $|\varepsilon_a|_k = \left|\frac{a_{k,j+1} - a_{k,j}}{a_{k,j+1}}\right| \times 100\% < \varepsilon_s$    (18)
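• A minimal MATLAB sketch of the Gauss-Newton iteration for the two-parameter model $f(x) = a_0(1 - e^{-a_1 x})$; the data and initial guesses are illustrative, and the partial derivatives of equation 16 are coded analytically:

  x = [0.25 0.75 1.25 1.75 2.25]';    % illustrative data
  y = [0.28 0.57 0.68 0.74 0.79]';
  a = [1; 1];                         % initial guesses for [a0; a1]
  for j = 1:50
      f  = a(1)*(1 - exp(-a(2)*x));                   % model at current parameters
      A  = [1 - exp(-a(2)*x), a(1)*x.*exp(-a(2)*x)];  % partial derivatives (equation 16)
      D  = y - f;                                     % residuals (equation 15)
      da = (A'*A) \ (A'*D);                           % least-squares step (equation 17)
      a  = a + da;                                    % improved parameters
      if max(abs(da./a))*100 < 1e-4, break, end       % stopping criterion (equation 18)
  end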
• The Gauss-Newton method has a number of possible shortcomings:
1. Partial derivatives of the function may be difficult to evaluate.
2. It may converge slowly.
3. It may oscillate widely, that is, continually change direction.
4. It may not converge at all.
• A more general approach is to use nonlinear optimization routines for nonlinear data fitting, as sketched below.
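• For example, in MATLAB the sum of squared residuals can be handed directly to a general-purpose minimizer such as fminsearch (a sketch, reusing the illustrative data above):

  x = [0.25 0.75 1.25 1.75 2.25]';  y = [0.28 0.57 0.68 0.74 0.79]';
  ssr = @(a) sum((y - a(1)*(1 - exp(-a(2)*x))).^2);   % objective: sum of squared residuals
  a = fminsearch(ssr, [1; 1]);                        % derivative-free Nelder-Mead minimization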
INTERPOLATION
• In Chemical engineering, one frequently has to estimate intermediate values between precise data points.
• The most common method used for this purpose is polynomial interpolation.
• The general formula for an nth-order polynomial is:

  $f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$    (19)

• For n + 1 data points, there is one and only one polynomial of order n that passes through all the points.
• For example, there is only one straight line (i.e., a first-order polynomial) that connects two points, and only one parabola connects a set of three points.
• Polynomial interpolation consists of determining the unique nth-order polynomial that fits n + 1 data points.
• This polynomial then provides a formula to compute intermediate values.
• There are a variety of mathematical formats in which this polynomial can be expressed.

Examples of interpolating polynomials: (a) first-order (linear) connecting two points, (b) second-order (quadratic or parabolic) connecting three points, and (c) third-order (cubic) connecting four points.
• Newton's divided-difference interpolating polynomial is among the most popular and useful forms.
• The simplest form of interpolation is to connect two data points with a straight line.
• This technique, called linear interpolation, is depicted graphically in the figure: the shaded areas indicate the similar triangles used to derive the linear-interpolation formula.
• Linear interpolation is the 1st-order version of this technique. Using similar triangles:

  $\frac{f_1(x) - f(x_0)}{x - x_0} = \frac{f(x_1) - f(x_0)}{x_1 - x_0}$    (20)

  (the right-hand side is the finite-divided-difference approximation of the 1st derivative)
• This can be rearranged to:

  $f_1(x) = f(x_0) + \frac{f(x_1) - f(x_0)}{x_1 - x_0}(x - x_0)$    (21)

• This is called the linear interpolation formula.
• The subscript 1 in $f_1(x)$ denotes that this is a first-order interpolating formula.
Example 5
• A mixture of galena and limestone has the undersize distribution given in the accompanying table.
• Using linear interpolation, calculate the corresponding undersize percentage for a size of 37.6 microns (galena) and 73.5 microns (limestone).
• For galena and for limestone, equation 21 is applied between the tabulated sizes that bracket the required size.
Example 6
• Estimate the natural logarithm of 2 using linear interpolation. First,
perform the computation by interpolating between ln 1 = 0 and ln 6 =
1.791759. Then, repeat the procedure, but use a smaller interval from ln
1 to ln 4 (1.386294). Note that the true value of ln 2 is 0.6931472.
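• Carrying out the computation with equation 21 (a worked check; errors are relative to the true value):

  $f_1(2) = 0 + \frac{1.791759 - 0}{6 - 1}(2 - 1) = 0.3583519$  (relative error 48.3%)

  $f_1(2) = 0 + \frac{1.386294 - 0}{4 - 1}(2 - 1) = 0.4620981$  (relative error 33.3%)

• Thus, using the shorter interval reduces the relative error from 48.3% to 33.3%.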
Quadratic interpolation
• In the previous example we approximated a curve with a straight line, which introduces a degree of error.
• The estimate can be improved by introducing some curvature into the line connecting the points.
• This is done with a second-order polynomial (also called a quadratic polynomial, or a parabola).
• The second-order polynomial is given by:

  $f_2(x) = b_0 + b_1(x - x_0) + b_2(x - x_0)(x - x_1)$    (22)

• Equation 22 is similar in form to equation 19, as follows:

Step 1: multiply out the terms in equation 22:

  $f_2(x) = b_0 + b_1 x - b_1 x_0 + b_2 x^2 + b_2 x_0 x_1 - b_2 x x_0 - b_2 x x_1$

Step 2: collecting terms yields:

  $f_2(x) = a_0 + a_1 x + a_2 x^2$

• where:

  $a_0 = b_0 - b_1 x_0 + b_2 x_0 x_1, \qquad a_1 = b_1 - b_2 x_0 - b_2 x_1, \qquad a_2 = b_2$

Determination of coefficients:
• For b0, equation 22 with x = x0 is used, i.e.

  $b_0 = f(x_0)$    (23)

• Equation 23 can now be substituted into equation 22 and evaluated at x = x1 for b1:
  $b_1 = \frac{f(x_1) - f(x_0)}{x_1 - x_0}$    (24)

• Finally, equations 23 and 24 can be substituted into equation 22, evaluated at x = x2 and solved for b2:

  $b_2 = \frac{\dfrac{f(x_2) - f(x_1)}{x_2 - x_1} - \dfrac{f(x_1) - f(x_0)}{x_1 - x_0}}{x_2 - x_0}$    (25)

• Coefficient b2 is very similar to the finite-divided-difference approximation of the second derivative.
• Equation 22 is manifesting a structure that is very similar to the Taylor series expansion.
Example 7
Fit a second-order polynomial to the three points used in Example 6:
x0 = 1, f(x0) = 0; x1 = 4, f(x1) = 1.386294; x2 = 6, f(x2) = 1.791759
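• Working through equations 23 to 25 (a worked sketch):

  $b_0 = f(x_0) = 0$
  $b_1 = \frac{1.386294 - 0}{4 - 1} = 0.4620981$
  $b_2 = \frac{\frac{1.791759 - 1.386294}{6 - 4} - 0.4620981}{6 - 1} = -0.0518731$

  so that $f_2(x) = 0 + 0.4620981(x - 1) - 0.0518731(x - 1)(x - 4)$, which gives $f_2(2) = 0.5658444$, closer to ln 2 = 0.6931472 than either linear estimate in Example 6.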

• General Form of Newton's Interpolating Polynomials
• The preceding analysis can be generalized to fit an nth-order polynomial to n + 1 data points. The nth-order polynomial is:

  $f_n(x) = b_0 + b_1(x - x_0) + \cdots + b_n(x - x_0)(x - x_1)\cdots(x - x_{n-1})$    (26)

• For an nth-order polynomial, n + 1 data points are required: $[x_0, f(x_0)], [x_1, f(x_1)], \ldots, [x_n, f(x_n)]$.
• These data points and the following equations are used to evaluate the coefficients:

  $b_0 = f(x_0)$    (27)
  $b_1 = f[x_1, x_0]$    (28)
  $b_2 = f[x_2, x_1, x_0]$    (29)
  $\ \vdots$
  $b_n = f[x_n, x_{n-1}, \ldots, x_1, x_0]$    (30)

• The bracketed function evaluations are finite divided differences.
• The first finite divided difference is represented as:

  $f[x_i, x_j] = \frac{f(x_i) - f(x_j)}{x_i - x_j}$    (31)

• The second finite divided difference, which represents the difference of two first divided differences, is expressed as:

  $f[x_i, x_j, x_k] = \frac{f[x_i, x_j] - f[x_j, x_k]}{x_i - x_k}$    (32)

• Similarly, the nth finite divided difference is

  $f[x_n, x_{n-1}, \ldots, x_0] = \frac{f[x_n, x_{n-1}, \ldots, x_1] - f[x_{n-1}, x_{n-2}, \ldots, x_0]}{x_n - x_0}$    (33)
• These differences can be used to evaluate the coefficients in equations 27 through 30, which can then be substituted into equation 26 to yield the interpolating polynomial:

  $f_n(x) = f(x_0) + (x - x_0)f[x_1, x_0] + (x - x_0)(x - x_1)f[x_2, x_1, x_0] + \cdots + (x - x_0)(x - x_1)\cdots(x - x_{n-1})f[x_n, \ldots, x_0]$    (34)

• This is called Newton's divided-difference interpolating polynomial.
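• A minimal MATLAB sketch of equation 34 (the function name and layout are ours; the first row of the table holds the coefficients b0, b1, . . . , bn):

  function yint = newtint(x, y, xi)
  % Newton divided-difference interpolation at the point xi
  n = length(x);
  b = zeros(n, n);   b(:,1) = y(:);                  % divided-difference table
  for k = 2:n
      for i = 1:n-k+1
          b(i,k) = (b(i+1,k-1) - b(i,k-1)) / (x(i+k-1) - x(i));
      end
  end
  yint = b(1,n);                                     % evaluate by nested multiplication
  for k = n-1:-1:1
      yint = b(1,k) + (xi - x(k))*yint;
  end
  end
  % e.g. newtint([1 4 6], [0 1.386294 1.791759], 2) reproduces the Example 7 result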


LAGRANGE INTERPOLATING POLYNOMIALS
• The Lagrange interpolating polynomial is simply a reformulation of the Newton polynomial that avoids the computation of divided differences.
• It can be represented concisely as

  $f_n(x) = \sum_{i=0}^{n} L_i(x)\, f(x_i)$    (35)

where

  $L_i(x) = \prod_{\substack{j=0 \\ j \neq i}}^{n} \frac{x - x_j}{x_i - x_j}$    (36)

and $\prod$ represents "the product of".
• For example, the linear version (n = 1) is

  $f_1(x) = \frac{x - x_1}{x_0 - x_1} f(x_0) + \frac{x - x_0}{x_1 - x_0} f(x_1)$    (37)
• The 2nd-order version is:

  $f_2(x) = \frac{(x - x_1)(x - x_2)}{(x_0 - x_1)(x_0 - x_2)} f(x_0) + \frac{(x - x_0)(x - x_2)}{(x_1 - x_0)(x_1 - x_2)} f(x_1) + \frac{(x - x_0)(x - x_1)}{(x_2 - x_0)(x_2 - x_1)} f(x_2)$    (38)

• The rationale underlying the Lagrange formulation can be grasped by realizing that each term Li(x) will be 1 at x = xi and 0 at all other sample points.
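• A minimal MATLAB sketch of equations 35 and 36 (the function name is ours):

  function yint = lagrange(x, y, xi)
  % Lagrange interpolation at the point xi
  n = length(x);
  yint = 0;
  for i = 1:n
      L = 1;                             % build Li(xi) term by term (equation 36)
      for j = [1:i-1, i+1:n]             % product over all j ~= i
          L = L*(xi - x(j))/(x(i) - x(j));
      end
      yint = yint + L*y(i);              % accumulate the sum of equation 35
  end
  end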
SPLINE INTERPOLATION
• Spline functions are an alternative approach in which lower-order polynomials are applied to subsets of the data points.
• For example, third-order curves employed to connect each pair of data points are called cubic splines.
Linear spline
• For linear splines, each function is merely the straight line connecting the two points at each end of the interval, which is formulated as:

  $s_i(x) = a_i + b_i(x - x_i)$    (39)

where $a_i$ is the intercept, which is defined as

  $a_i = f_i$    (40)

and $b_i$ is the slope of the straight line connecting the points:
REGRESSION AND INTERPOLATION
(41)

• Substituting equation 40 and 41 into equation 39 gives :

(42)
• These equations can be used to evaluate the function at any point
between x1 and xn by first locating the interval within which the point
lies.

• Then the appropriate equation is used to determine the function


value within the interval.
Example 8
Fit the data in the table below with first-order splines. Evaluate the function at x = 5.

  i      1     2     3     4
  x_i    3.0   4.5   7.0   9.0
  f_i    2.5   1.0   2.5   0.5

Solution
• For the 1st interval, from x = 3.0 to x = 4.5:

  $b_1 = \frac{1.0 - 2.5}{4.5 - 3.0} = -1.0, \qquad s_1(x) = 2.5 - 1.0(x - 3.0)$

• For the 2nd interval, from x = 4.5 to x = 7.0:

  $b_2 = \frac{2.5 - 1.0}{7.0 - 4.5} = 0.6, \qquad s_2(x) = 1.0 + 0.6(x - 4.5)$

• The value at x = 5 can be found from the 2nd interval as:

  $s_2(5) = 1.0 + 0.6(5 - 4.5) = 1.3$

• For the 3rd interval, from x = 7.0 to x = 9.0:

  $b_3 = \frac{0.5 - 2.5}{9.0 - 7.0} = -1.0, \qquad s_3(x) = 2.5 - 1.0(x - 7.0)$

• The resulting first-order splines are plotted as shown in the figure.
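• This result can be checked with MATLAB's built-in interp1, whose default method is exactly this first-order spline (a one-line sketch):

  x = [3.0 4.5 7.0 9.0];   f = [2.5 1.0 2.5 0.5];
  f5 = interp1(x, f, 5)    % returns 1.3, the linear-spline value at x = 5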


• From the plot, the primary disadvantage of first-order splines is that they are not smooth.
• In essence, at the data points where two splines meet (called a knot), the slope changes abruptly.
• In formal terms, the first derivative of the function is discontinuous at these points.
• This deficiency is overcome by using higher-order polynomial splines such as cubic splines.
Cubic splines
• Cubic splines are most frequently used in practice.
• Quartic or higher-order splines are not used because they tend to exhibit the instabilities inherent in higher-order polynomials.
• Cubic splines are preferred because they provide the simplest representation that exhibits the desired appearance of smoothness.
• The objective with cubic splines is to derive a third-order polynomial for each interval between knots, as represented generally by:

  $s_i(x) = a_i + b_i(x - x_i) + c_i(x - x_i)^2 + d_i(x - x_i)^3$    (43)

• Thus, for n data points (i = 1, 2, . . . , n), there are n − 1 intervals and 4(n − 1) unknown coefficients to evaluate.
• 4(n − 1) conditions are required for their evaluation.
Derivation of the cubic spline
• The first condition is that the spline must pass through all the data points:

  $f_i = a_i + b_i(x_i - x_i) + c_i(x_i - x_i)^2 + d_i(x_i - x_i)^3$    (44)

which simplifies to:

  $a_i = f_i$    (45)

• Substituting equation 45 into equation 43 gives:

  $s_i(x) = f_i + b_i(x - x_i) + c_i(x - x_i)^2 + d_i(x - x_i)^3$    (46)

• Next, the condition that each of the cubics must join at the knots is applied.
• For knot i + 1, this can be represented as:
  $f_i + b_i h_i + c_i h_i^2 + d_i h_i^3 = f_{i+1}$    (47)

where $h_i = x_{i+1} - x_i$.
• First derivatives at the interior nodes must be equal. Equation 46 is differentiated to yield:

  $s_i'(x) = b_i + 2c_i(x - x_i) + 3d_i(x - x_i)^2$    (48)

• The equivalence of the derivatives at an interior node, i + 1, can therefore be written as:

  $b_i + 2c_i h_i + 3d_i h_i^2 = b_{i+1}$    (49)
• The second derivatives at the interior nodes must also be equal. Equation 48 is differentiated to yield:

  $s_i''(x) = 2c_i + 6d_i(x - x_i)$    (50)

• Equivalence of the second derivatives at an interior node, i + 1:

  $2c_i + 6d_i h_i = 2c_{i+1}$    (51)

• Solving for di:

  $d_i = \frac{c_{i+1} - c_i}{3h_i}$    (52)
• Substituting equation 52 into equation 47:

  $f_{i+1} = f_i + b_i h_i + \frac{h_i^2}{3}\left(2c_i + c_{i+1}\right)$    (53)

• Substituting equation 52 into equation 49 yields:

  $b_{i+1} = b_i + h_i\left(c_i + c_{i+1}\right)$    (54)

• Equation 53 can be solved for bi:

  $b_i = \frac{f_{i+1} - f_i}{h_i} - \frac{h_i}{3}\left(2c_i + c_{i+1}\right)$    (55)
• The index of equation 55 can be reduced by 1:

  $b_{i-1} = \frac{f_i - f_{i-1}}{h_{i-1}} - \frac{h_{i-1}}{3}\left(2c_{i-1} + c_i\right)$    (56)

• The index of equation 54 can also be reduced by 1:

  $b_i = b_{i-1} + h_{i-1}\left(c_{i-1} + c_i\right)$    (57)

• Equations 55 and 56 can be substituted into equation 57 to yield:

  $h_{i-1}c_{i-1} + 2\left(h_{i-1} + h_i\right)c_i + h_i c_{i+1} = 3\left(\frac{f_{i+1} - f_i}{h_i} - \frac{f_i - f_{i-1}}{h_{i-1}}\right)$    (58)
• Equation 58 can be written for the interior knots, i = 2, 3, . . . , n − 2, which results in n − 3 simultaneous tri-diagonal equations with n − 1 unknown coefficients, c1, c2, . . . , cn–1.
• With two additional conditions, we can solve for the c's.
• Equations 55 and 52 can then be used to determine the remaining coefficients, b and d.
• The two additional end conditions can be formulated in a number of ways.
• One common approach, the natural spline, assumes that the second derivatives at the end knots are equal to zero.
• Thus the 2nd derivative at the first node [equation 50] can be set to zero:

  $s_1''(x_1) = 2c_1 + 6d_1(x_1 - x_1) = 2c_1 = 0$, i.e. $c_1 = 0$
• The same evaluation can be made at the last node:

  $s_{n-1}''(x_n) = 2c_{n-1} + 6d_{n-1}h_{n-1} = 0$    (59)

• Recalling equation 51, an extraneous parameter cn can be defined as:

  $c_n = c_{n-1} + 3d_{n-1}h_{n-1}$

• To impose a zero second derivative at the last node, we set cn = 0.
• The final equations can now be written in matrix form as:
  $\begin{bmatrix} 1 & & & & \\ h_1 & 2(h_1 + h_2) & h_2 & & \\ & \ddots & \ddots & \ddots & \\ & & h_{n-2} & 2(h_{n-2} + h_{n-1}) & h_{n-1} \\ & & & & 1 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_{n-1} \\ c_n \end{bmatrix} = \begin{bmatrix} 0 \\ 3\left(f[x_3, x_2] - f[x_2, x_1]\right) \\ \vdots \\ 3\left(f[x_n, x_{n-1}] - f[x_{n-1}, x_{n-2}]\right) \\ 0 \end{bmatrix}$    (60)

where $f[x_i, x_j] = \dfrac{f_i - f_j}{x_i - x_j}$.
• The system of equations is tri-diagonal and thus efficient to solve.

Example 9
• Fit a cubic spline to the data in Example 8. Utilize the results to estimate the value of the function at x = 5.
Solution
• Generate the set of simultaneous equations that will be utilized to determine the c coefficients in the matrix.
• The necessary function and interval-width values are:

  $f_1 = 2.5,\ f_2 = 1.0,\ f_3 = 2.5,\ f_4 = 0.5$
  $h_1 = 4.5 - 3.0 = 1.5,\ h_2 = 7.0 - 4.5 = 2.5,\ h_3 = 9.0 - 7.0 = 2.0$
• Substituting these values into equation 60 yields:

  $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 1.5 & 8 & 2.5 & 0 \\ 0 & 2.5 & 9 & 2 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix} = \begin{bmatrix} 0 \\ 4.8 \\ -4.8 \\ 0 \end{bmatrix}$

• Solving using MATLAB gives the results:

  c1 = 0; c2 = 0.8395; c3 = −0.7665; c4 = 0

• Equations 55 and 52 can be used to compute the b's and d's:

  $b_1 = -1.4198,\ b_2 = -0.1604,\ b_3 = 0.0220$
  $d_1 = 0.1866,\ d_2 = -0.2141,\ d_3 = 0.1278$
• These results, along with the values for the a's [equation 45], can be substituted into equation 46 to develop the following cubic splines for each interval:

  $s_1(x) = 2.5 - 1.4198(x - 3) + 0.1866(x - 3)^3$
  $s_2(x) = 1.0 - 0.1604(x - 4.5) + 0.8395(x - 4.5)^2 - 0.2141(x - 4.5)^3$
  $s_3(x) = 2.5 + 0.0220(x - 7) - 0.7665(x - 7)^2 + 0.1278(x - 7)^3$

• The equations can be employed to compute values within each interval.
• For example, the value at x = 5, which falls within the second interval, is calculated as:

  $s_2(5) = 1.0 - 0.1604(0.5) + 0.8395(0.5)^2 - 0.2141(0.5)^3 = 1.103$
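• The tridiagonal system of equation 60 can also be assembled and solved directly in MATLAB (a sketch reproducing the c values above; note that the built-in spline function uses not-a-knot rather than natural end conditions, so its answer differs slightly):

  x = [3.0 4.5 7.0 9.0]';   f = [2.5 1.0 2.5 0.5]';
  h = diff(x);                                 % interval widths h1, h2, h3
  A = [1    0             0             0;     % c1 = 0 (natural end condition)
       h(1) 2*(h(1)+h(2)) h(2)          0;     % equation 58 at i = 2
       0    h(2)          2*(h(2)+h(3)) h(3);  % equation 58 at i = 3
       0    0             0             1];    % c4 = 0 (natural end condition)
  r = [0;
       3*((f(3)-f(2))/h(2) - (f(2)-f(1))/h(1));
       3*((f(4)-f(3))/h(3) - (f(3)-f(2))/h(2));
       0];
  c = A \ r    % gives c = [0; 0.8395; -0.7665; 0]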
First and last equations needed to specify commonly used end conditions for cubic splines.

END OF MODULE 3
