
Recognition patterns

Jean Carlo Grandas Franco


March 2020

Exercise E8.1

The general formula for the second-order Taylor approximation of a function of one variable can be written as:

$$F(x) \approx F(x^*) + F'(x^*)\,(x - x^*) + \frac{F''(x^*)}{2}(x - x^*)^2$$

We are asked to find the second-order approximation for the following function:

$$F(x) = \frac{1}{x^3 - \frac{3}{4}x - \frac{1}{2}}$$

$$F'(x) = -\frac{3x^2 - \frac{3}{4}}{\left(x^3 - \frac{3}{4}x - \frac{1}{2}\right)^2}$$

$$F''(x) = -\frac{24\,(32x^4 - 12x^2 + 8x + 3)}{(-4x^3 + 3x + 2)^3}$$
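These derivatives can be verified symbolically. The following short SymPy sketch is not part of the original solution; it simply re-derives F'(x) and F''(x):

```python
# Symbolic check of the derivatives above (not part of the original solution).
import sympy as sp

x = sp.symbols('x')
F = 1 / (x**3 - sp.Rational(3, 4)*x - sp.Rational(1, 2))

F1 = sp.simplify(sp.diff(F, x))     # first derivative
F2 = sp.simplify(sp.diff(F, x, 2))  # second derivative
print(F1)
print(F2)  # algebraically equal to -24(32x^4 - 12x^2 + 8x + 3)/(-4x^3 + 3x + 2)^3
```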

i. For x∗ = −0.5

$$F(x) \approx -4 + 24(x + 0.5)^2$$

ii. For x∗ = 1.1

$$F(x) \approx \frac{500}{3} - 80000\,(x - 1.1) + \frac{114925000}{3}(x - 1.1)^2$$

iii. We now plot the function together with both approximations to compare them.

If we plot both approximations using MATLAB, we obtain a very accurate result around the expansion point x∗ = −0.5, roughly from −0.6 to 0.4. However, the approximation at x∗ = 1.1 is not accurate, even if we restrict the analysis to the interval from 1 to 1.2.
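For reference, here is a minimal Python/Matplotlib sketch of this comparison (the original plots were made in MATLAB; this script is not part of the original solution):

```python
# Compare F(x) with its two second-order approximations (a sketch, not the
# author's MATLAB script).
import numpy as np
import matplotlib.pyplot as plt

F = lambda x: 1.0 / (x**3 - 0.75*x - 0.5)
p_a = lambda x: -4 + 24*(x + 0.5)**2                                    # expansion at x* = -0.5
p_b = lambda x: 500/3 - 80000*(x - 1.1) + (114925000/3)*(x - 1.1)**2    # expansion at x* = 1.1

xa = np.linspace(-0.6, 0.4, 400)
xb = np.linspace(1.0, 1.2, 400)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(xa, F(xa), label='F(x)')
ax1.plot(xa, p_a(xa), '--', label='2nd-order approx. at -0.5')
ax2.plot(xb, F(xb), label='F(x)')
ax2.plot(xb, p_b(xb), '--', label='2nd-order approx. at 1.1')
ax2.set_ylim(-5e5, 5e5)   # F(x) blows up near the asymptote at x = 1.098
ax1.legend(); ax2.legend()
plt.show()
```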

These results are consistent with what should be expected if we consider that F(x) has a vertical asymptote at x = 1.09791, which is very close to 1.1. Thus, the successive derivatives at that expansion point become very large and the approximation quickly loses accuracy.

Exercise E8.2

Let us now focus on the function below:

$$F(x) = e^{2x_1^2 + 2x_2^2 + x_1 - 5x_2 + 10} = e^{g(x)}$$

In order to compute the second-order approximation, the gradient and Hessian must be derived from the above function.

 
$$\nabla F(x) = e^{g(x)}\begin{bmatrix} 4x_1 + 1 \\ 4x_2 - 5 \end{bmatrix}$$

$$\nabla^2 F(x) = e^{g(x)}\begin{bmatrix} 16x_1^2 + 8x_1 + 5 & (4x_1 + 1)(4x_2 - 5) \\ (4x_1 + 1)(4x_2 - 5) & 16x_2^2 - 40x_2 + 29 \end{bmatrix}$$

Thus, the second-order approximation can be written as:

$$F(x) \approx F(x^*) + \nabla F(x)^T\big|_{x^*}\,(x - x^*) + \frac{1}{2}(x - x^*)^T\,\nabla^2 F(x)\big|_{x^*}\,(x - x^*)$$

i. Let us check $x^* = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$:

$$\nabla F(x^*) = e^{10}\begin{bmatrix} 1 \\ -5 \end{bmatrix}$$

 
$$\nabla^2 F(x^*) = e^{10}\begin{bmatrix} 5 & -5 \\ -5 & 29 \end{bmatrix}$$

$$F(x) \approx \frac{e^{10}}{2}\left(5x_1^2 - 10x_1x_2 + 2x_1 + 29x_2^2 - 10x_2 + 2\right)$$
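These quantities are easy to check numerically; the SymPy sketch below is not part of the original solution:

```python
# Check the gradient and Hessian of F(x) = exp(g(x)) at x* = 0
# (not part of the original solution).
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
g = 2*x1**2 + 2*x2**2 + x1 - 5*x2 + 10
F = sp.exp(g)

grad = sp.Matrix([F.diff(x1), F.diff(x2)])
hess = sp.hessian(F, (x1, x2))

at0 = {x1: 0, x2: 0}
print(grad.subs(at0) / sp.exp(10))   # -> Matrix([[1], [-5]])
print(hess.subs(at0) / sp.exp(10))   # -> Matrix([[5, -5], [-5, 29]])
```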

ii. Stationary points for the approximation.

 
Setting the gradient of this quadratic approximation to zero:

$$\nabla F(x) = e^{10}\begin{bmatrix} 5x_1 - 5x_2 + 1 \\ -5x_1 + 29x_2 - 5 \end{bmatrix} = \mathbf{0}$$

$$x^* = \begin{bmatrix} -\frac{1}{30} \\ \frac{1}{6} \end{bmatrix}$$
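This point can be double-checked by solving the corresponding 2×2 linear system; the NumPy snippet below is not part of the original solution:

```python
# Solve 5*x1 - 5*x2 = -1 and -5*x1 + 29*x2 = 5, the stationarity conditions of
# the quadratic approximation (not part of the original solution).
import numpy as np

A = np.array([[ 5.0, -5.0],
              [-5.0, 29.0]])
b = np.array([-1.0, 5.0])
print(np.linalg.solve(A, b))   # -> [-0.0333...  0.1666...], i.e. (-1/30, 1/6)
```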

iii. Let us find the stationary points of F(x).

The stationary points are located exactly where the gradient vanishes:

$$\nabla F(x)\big|_{x^*} = \mathbf{0}$$

Thus,

   
$$e^{g(x)}\begin{bmatrix} 4x_1 + 1 \\ 4x_2 - 5 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

Since the exponential factor e^{g(x)} never reaches zero, the entries of the vector are the ones that must be set to 0.

$$x^* = \begin{bmatrix} -\frac{1}{4} & \frac{5}{4} \end{bmatrix}^T$$

Exercise E8.5

Consider the following function of two variables:

$$F(x) = (x_1 + x_2)^4 - 12x_1x_2 + x_1 + x_2 + 1$$

i. We can now find the stationary points by computing the gradient vector
of the function.

$$\nabla F(x) = \begin{bmatrix} 4(x_1 + x_2)^3 - 12x_2 + 1 \\ 4(x_1 + x_2)^3 - 12x_1 + 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

Subtracting the second equation from the first gives:

$$12x_1 - 12x_2 = 0 \quad\Longrightarrow\quad x_1 = x_2$$

Thus, at every stationary point both components must be equal, and the stationary points follow from the resulting cubic equation:

$$32x_1^3 - 12x_1 + 1 = 0$$

Testing for rational roots by synthetic division shows that the polynomial has none, so computational tools must be used to find the roots.

     
$$x^1 = \begin{bmatrix} -0.6504 \\ -0.6504 \end{bmatrix} \qquad x^2 = \begin{bmatrix} 0.085 \\ 0.085 \end{bmatrix} \qquad x^3 = \begin{bmatrix} 0.5655 \\ 0.5655 \end{bmatrix}$$
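A quick numerical check of these roots (not part of the original solution; any root finder would do):

```python
# Roots of 32*x^3 - 12*x + 1 = 0; each stationary point has both components
# equal to one of these roots (not part of the original solution).
import numpy as np

roots = np.sort(np.roots([32.0, 0.0, -12.0, 1.0]).real)  # all three roots are real
print(roots)   # -> approx. [-0.6504  0.0850  0.5655]
stationary_points = [np.array([r, r]) for r in roots]
```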

ii. We’re going to check whether the points are maxima or minima.

For all of the stationary points, we have:

$$F(x^1) = -2.51 \qquad F(x^2) = 1.08 \qquad F(x^3) = -0.07$$
We now check the Hessian matrix:

$$\nabla^2 F(x) = \begin{bmatrix} 12(x_1 + x_2)^2 & 12(x_1 + x_2)^2 - 12 \\ 12(x_1 + x_2)^2 - 12 & 12(x_1 + x_2)^2 \end{bmatrix}$$

For each of the stationary points, the corresponding Hessian matrix and its eigenvalues are:

 
$$\nabla^2 F(x^1) = \begin{bmatrix} 20.3 & 8.3 \\ 8.3 & 20.3 \end{bmatrix} \qquad \lambda_1 = 28.61 \quad \lambda_2 = 12$$
Since both eigenvalues are positive, this point is a strong minimum; comparing the values of the original function at the three stationary points, it is also the global minimum.

 
$$\nabla^2 F(x^2) = \begin{bmatrix} 0.347 & -11.7 \\ -11.7 & 0.347 \end{bmatrix} \qquad \lambda_1 = 12 \quad \lambda_2 = -11.3$$
One eigenvalue is negative and the other positive, so this is a saddle point, even though the function value at $x^2$ is the largest of the three and one might have expected a maximum.

 
$$\nabla^2 F(x^3) = \begin{bmatrix} 15.3 & 3.35 \\ 3.35 & 15.3 \end{bmatrix} \qquad \lambda_1 = 18.7 \quad \lambda_2 = 12$$
Both eigenvalues are positive, thus, this is a strong minimum.
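The same classification can be reproduced numerically; this sketch is not part of the original solution:

```python
# Evaluate the Hessian of F(x) = (x1+x2)^4 - 12*x1*x2 + x1 + x2 + 1 at each
# stationary point and inspect its eigenvalues (not part of the original solution).
import numpy as np

def hessian(x1, x2):
    s = 12.0 * (x1 + x2)**2
    return np.array([[s, s - 12.0],
                     [s - 12.0, s]])

for r in np.sort(np.roots([32.0, 0.0, -12.0, 1.0]).real):
    eig = np.linalg.eigvalsh(hessian(r, r))
    print(round(r, 4), np.round(eig, 2))
# -> approx. -0.6504: [12, 28.61]; 0.085: [-11.31, 12]; 0.5655: [12, 18.69]
```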

iii. For the second-order approximations it is necessary to use the Hessian matrices computed above:

$$F(x) \approx F(x^*) + \nabla F(x)^T\big|_{x^*}\,(x - x^*) + \frac{1}{2}(x - x^*)^T\,\nabla^2 F(x)\big|_{x^*}\,(x - x^*)$$

Now, the approximations at each of the stationary points are:

For $x^1$:

$$F(x) \approx 10.2x_1^2 + 8.3x_1x_2 + 18.6x_1 + 10.2x_2^2 + 18.6x_2 + 9.59$$

For $x^2$:

$$F(x) \approx 0.173x_1^2 - 11.7x_1x_2 + 0.961x_1 + 0.173x_2^2 + 0.961x_2 + 1$$

For $x^3$:

$$F(x) \approx 7.67x_1^2 + 3.35x_1x_2 - 10.6x_1 + 7.67x_2^2 - 10.6x_2 + 5.91$$
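The expansions above can be reproduced symbolically; the SymPy sketch below (not part of the original solution) does it for the expansion around $x^1$:

```python
# Build and expand the second-order Taylor approximation of
# F(x) = (x1+x2)^4 - 12*x1*x2 + x1 + x2 + 1 around x1* = (-0.6504, -0.6504)
# (not part of the original solution).
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
F = (x1 + x2)**4 - 12*x1*x2 + x1 + x2 + 1

xs = {x1: -0.6504, x2: -0.6504}
grad = sp.Matrix([F.diff(x1), F.diff(x2)]).subs(xs)
H = sp.hessian(F, (x1, x2)).subs(xs)
d = sp.Matrix([x1 - xs[x1], x2 - xs[x2]])

quad = F.subs(xs) + (grad.T * d)[0] + sp.Rational(1, 2) * (d.T * H * d)[0]
print(sp.expand(quad))   # coefficients close to 10.2, 8.3, 18.6, 10.2, 18.6, 9.59
```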

iv. The graphs of the above approximations are shown in the corresponding figures.
Exercise E2.4

A two-layer neural network is to have four inputs and six outputs. The outputs are continuous and their range lies between 0 and 1.

Considering that two layers are to be used and six outputs must be produced, it is necessary to have 6 neurons in the second layer, one for each required output. On the other hand, the number of neurons in the first layer does not play a crucial role in satisfying this condition, since its outputs are only used as inputs to the second layer; it can therefore be set to 1 or more.

If, for instance, four neurons are used in the first layer, the first-layer weight matrix has dimensions 4 × 4 (four neurons by four inputs); in general it is S¹ × 4, where S¹ is the chosen number of first-layer neurons. The second-layer weight matrix then has dimensions 6 × S¹, since its number of columns corresponds to the number of outputs coming from the first layer.
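As an illustration of these dimensions (not part of the original solution; the first-layer size S¹ = 3 and the transfer functions are only example choices), a minimal NumPy sketch of such a network:

```python
# Two-layer feedforward network with 4 inputs and 6 outputs.
# Assumptions: S1 = 3 first-layer neurons (any value >= 1 works), tanh in the
# first layer and a log-sigmoid output layer so the outputs stay in (0, 1).
import numpy as np

R, S1, S2 = 4, 3, 6                  # inputs, first-layer neurons, outputs

W1 = np.random.randn(S1, R)          # first-layer weight matrix:  S1 x 4
b1 = np.random.randn(S1, 1)
W2 = np.random.randn(S2, S1)         # second-layer weight matrix: 6 x S1
b2 = np.random.randn(S2, 1)

logsig = lambda n: 1.0 / (1.0 + np.exp(-n))

p = np.random.randn(R, 1)            # one input vector
a1 = np.tanh(W1 @ p + b1)            # first-layer output
a2 = logsig(W2 @ a1 + b2)            # six outputs, each in (0, 1)
print(a2.shape)                      # -> (6, 1)
```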

In the second layer, the transfer function must keep the outputs continuous and within the given limits. A linear transfer function is the simplest option, although it does not by itself restrict the values to the interval [0, 1]; a saturating choice such as the log-sigmoid, or a linear function saturating at 0 and 1, guarantees that no output lies outside the desired range. For the first layer there are more options, depending on how many neurons are used; for instance, if there are many of them, a hard-limit transfer function is also possible, and the second layer would then compute its output from the number of first-layer neurons whose output is 1.

Biases are normally optional, but in some cases they would help the network work better.

The rest of the exercises can be found in the attached file.
