Pattern Recognition: Jean Carlo Grandas Franco, March 2020
Exercise E8.1
The second-order Taylor approximation of a function F about a point x∗ is
\[
F(x) \approx F(x^*) + F'(x^*)\,(x - x^*) + \frac{F''(x^*)}{2}\,(x - x^*)^2 .
\]
We are asked to find the second-order approximation for the following function:
\[
F(x) = \frac{1}{x^3 - \frac{3}{4}x - \frac{1}{2}}
\]
Its first and second derivatives are
\[
F'(x) = -\frac{3x^2 - \frac{3}{4}}{\left(x^3 - \frac{3}{4}x - \frac{1}{2}\right)^2}
\]
\[
F''(x) = -\frac{24\,(32x^4 - 12x^2 + 8x + 3)}{(-4x^3 + 3x + 2)^3}
\]
i. For x∗ = −0.5: here F(−0.5) = −4, F'(−0.5) = 0 and F''(−0.5) = 48, so
\[
F(x) \approx -4 + 24\,(x + 0.5)^2 .
\]
ii. For x∗ = 1.1:
\[
F(x) \approx \frac{500}{3} - 80000\,(x - 1.1) + \frac{114925000}{3}\,(x - 1.1)^2 .
\]
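As a quick numerical cross-check of these coefficients (a sketch added here, not part of the original solution), the derivatives above can be evaluated directly in MATLAB:

% Evaluate F, F' and F''/2 at the two expansion points (cross-check only)
F   = @(x) 1./(x.^3 - (3/4)*x - 1/2);
Fp  = @(x) -(3*x.^2 - 3/4)./(x.^3 - (3/4)*x - 1/2).^2;
Fpp = @(x) -24*(32*x.^4 - 12*x.^2 + 8*x + 3)./(-4*x.^3 + 3*x + 2).^3;
pts = [-0.5 1.1];
disp([F(pts); Fp(pts); Fpp(pts)/2])   % rows: F(x*), F'(x*), F''(x*)/2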
If we plot both approximations in MATLAB, the approximation about x∗ = −0.5 is very accurate over the range from −0.6 to 0.4. However, the approximation about x∗ = 1.1 is not accurate, even if we restrict the comparison to the narrow interval from 1 to 1.2.
These results are consistent with what should be expected: F(x) has a vertical asymptote at x ≈ 1.09791, which is very close to 1.1, so the successive derivatives at that point become extremely large and the truncated series breaks down almost immediately.
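A minimal MATLAB sketch of this comparison is shown below; the exact plotting commands are an assumption, since the original report only states that MATLAB was used:

% Compare F with its two quadratic approximations
F   = @(x) 1./(x.^3 - (3/4)*x - 1/2);
F1q = @(x) -4 + 24*(x + 0.5).^2;                                   % about x* = -0.5
F2q = @(x) 500/3 - 80000*(x - 1.1) + (114925000/3)*(x - 1.1).^2;   % about x* = 1.1

subplot(1,2,1); x = linspace(-0.6, 0.4, 400);
plot(x, F(x), x, F1q(x)); legend('F', 'approx. at -0.5');

subplot(1,2,2); x = linspace(1.0, 1.2, 400);
plot(x, F(x), x, F2q(x)); legend('F', 'approx. at 1.1');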
Exercise E8.2
\[
F(x) = e^{\,2x_1^2 + 2x_2^2 + x_1 - 5x_2 + 10} = e^{g(x)}
\]
In order to compute the second-order approximation, the gradient and Hessian must be derived from the above function.
\[
\nabla F(x) = e^{g(x)} \begin{bmatrix} 4x_1 + 1 \\ 4x_2 - 5 \end{bmatrix}
\]
\[
\nabla^2 F(x) = e^{g(x)} \begin{bmatrix} 16x_1^2 + 8x_1 + 5 & (4x_1 + 1)(4x_2 - 5) \\ (4x_1 + 1)(4x_2 - 5) & 16x_2^2 - 40x_2 + 29 \end{bmatrix}
\]
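These expressions can be double-checked symbolically; the following MATLAB sketch (an addition here, assuming the Symbolic Math Toolbox is available) reproduces the gradient and Hessian:

% Symbolic cross-check of the gradient and Hessian of F(x) = exp(g(x))
syms x1 x2
F = exp(2*x1^2 + 2*x2^2 + x1 - 5*x2 + 10);
simplify(gradient(F, [x1 x2]))   % should equal exp(g)*[4*x1+1; 4*x2-5]
simplify(hessian(F, [x1 x2]))    % should match the matrix above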
The second-order approximation about a point x∗ is
\[
F(x) \approx F(x^*) + \nabla F(x)^T\big|_{x=x^*} (x - x^*) + \frac{1}{2}(x - x^*)^T \,\nabla^2 F(x)\big|_{x=x^*}\, (x - x^*) .
\]
Evaluating at the expansion point x∗ = (0, 0), for which g(x∗) = 10, gives
\[
\nabla F(x^*) = e^{10} \begin{bmatrix} 1 \\ -5 \end{bmatrix}, \qquad
\nabla^2 F(x^*) = e^{10} \begin{bmatrix} 5 & -5 \\ -5 & 29 \end{bmatrix} .
\]
Substituting these values into the formula above gives the second-order approximation
\[
F(x) \approx \frac{e^{10}}{2}\left(5x_1^2 - 10x_1x_2 + 2x_1 + 29x_2^2 - 10x_2 + 2\right) .
\]
Writing F₂(x) for this quadratic approximation, its gradient is
\[
\nabla F_2(x) = e^{10} \begin{bmatrix} 5x_1 - 5x_2 + 1 \\ -5x_1 + 29x_2 - 5 \end{bmatrix},
\]
and setting it to zero gives the stationary point of the approximation:
\[
x^* = \begin{bmatrix} -\tfrac{1}{30} & \tfrac{1}{6} \end{bmatrix}^T .
\]
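The stationary point of the quadratic approximation is simply the solution of a 2×2 linear system; as a small sketch (not part of the original solution), it can be obtained in MATLAB with:

% Solve  e^10*[5 -5; -5 29]*x = -e^10*[1; -5]  (the e^10 factor cancels)
A = [5 -5; -5 29];
b = -[1; -5];
x_star = A\b        % returns [-1/30; 1/6], i.e. [-0.0333; 0.1667]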
The stationary points of the original function are located where the gradient is zero:
\[
\nabla F(x)\big|_{x = x^*} = 0 .
\]
Thus,
\[
e^{g(x)} \begin{bmatrix} 4x_1 + 1 \\ 4x_2 - 5 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} .
\]
Since the exponential factor e^{g(x)} never reaches the value 0, the terms inside the vector are the ones that must be set to 0, giving
\[
x^* = \begin{bmatrix} -\tfrac{1}{4} & \tfrac{5}{4} \end{bmatrix}^T .
\]
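As an optional numerical cross-check (an addition here, not part of the original solution), a generic minimizer such as MATLAB's fminsearch converges to the same point:

% Minimize F(x) = exp(g(x)) numerically; expected minimizer: [-0.25; 1.25]
F = @(x) exp(2*x(1)^2 + 2*x(2)^2 + x(1) - 5*x(2) + 10);
x_min = fminsearch(F, [0; 0])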
Exercise E8.5
i. We can now find the stationary points by computing the gradient vector of the function F(x) = (x1 + x2)^4 − 12x1x2 + x1 + x2 + 1 and setting it to zero:
\[
\nabla F(x) = \begin{bmatrix} 4(x_1 + x_2)^3 - 12x_2 + 1 \\ 4(x_1 + x_2)^3 - 12x_1 + 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} .
\]
Subtracting the second component from the first gives
\[
12x_1 - 12x_2 = 0 \quad\Rightarrow\quad x_1 = x_2 .
\]
Thus, at every stationary point both components must be equal, and substituting x2 = x1 into either equation yields the cubic equation
\[
32x_1^3 - 12x_1 + 1 = 0 .
\]
Testing the above polynomial for rational roots (for instance by synthetic division) shows that there are none, so the roots must be found numerically.
\[
x^1 = \begin{bmatrix} -0.6504 \\ -0.6504 \end{bmatrix}, \qquad
x^2 = \begin{bmatrix} 0.085 \\ 0.085 \end{bmatrix}, \qquad
x^3 = \begin{bmatrix} 0.5655 \\ 0.5655 \end{bmatrix}
\]
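One way to obtain these roots numerically (a sketch of the computation, not necessarily the tool used originally) is MATLAB's roots function:

% Roots of 32*x^3 + 0*x^2 - 12*x + 1
r = roots([32 0 -12 1])    % approx. -0.6504, 0.5655 and 0.0850 (in some order)
% Each stationary point is [r(i); r(i)], since x1 = x2 there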
ii. We’re going to check whether the points are maxima or minima.
\[
F(x^1) = -2.51, \qquad F(x^2) = 1.08, \qquad F(x^3) = -0.07 .
\]
If we now check the Hessian matrix:
\[
\nabla^2 F(x) = \begin{bmatrix} 12(x_1 + x_2)^2 & 12(x_1 + x_2)^2 - 12 \\ 12(x_1 + x_2)^2 - 12 & 12(x_1 + x_2)^2 \end{bmatrix}
\]
For each of the stationary points, the corresponding Hessian matrix and its eigenvalues are:
\[
\nabla^2 F(x^1) = \begin{bmatrix} 20.3 & 8.3 \\ 8.3 & 20.3 \end{bmatrix}, \qquad \lambda_1 = 28.61, \quad \lambda_2 = 12
\]
Since both eigenvalues are positive, this point is a strong minimum; comparing the three function values above, it is also the global minimum of the function.
\[
\nabla^2 F(x^2) = \begin{bmatrix} 0.347 & -11.7 \\ -11.7 & 0.347 \end{bmatrix}, \qquad \lambda_1 = 12, \quad \lambda_2 = -11.3
\]
One of the eigenvalues is negative and the other positive, so this is a saddle point, even though its function value (the largest of the three) might suggest a maximum.
\[
\nabla^2 F(x^3) = \begin{bmatrix} 15.3 & 3.35 \\ 3.35 & 15.3 \end{bmatrix}, \qquad \lambda_1 = 18.7, \quad \lambda_2 = 12
\]
Both eigenvalues are positive; thus, this point is also a strong (local) minimum.
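The eigenvalue computations above can be reproduced with a short MATLAB sketch (the specific code is an assumption added here; the original only reports the resulting values):

% Hessian eigenvalues at each stationary point (where x1 = x2 = r)
r = roots([32 0 -12 1]);                 % the three roots of the cubic
for i = 1:3
    s = 2*r(i);                          % s = x1 + x2
    H = [12*s^2, 12*s^2 - 12; 12*s^2 - 12, 12*s^2];
    disp(eig(H)')                        % one eigenvalue is always 12, the other 24*s^2 - 12
end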
Finally, the second-order approximation about each stationary point follows from
\[
F(x) \approx F(x^*) + \nabla F(x)^T\big|_{x=x^*} (x - x^*) + \frac{1}{2}(x - x^*)^T \,\nabla^2 F(x)\big|_{x=x^*}\, (x - x^*),
\]
where the gradient term vanishes because the gradient is zero at a stationary point.
For x1:
\[
F(x) \approx -2.51 + \frac{1}{2}(x - x^1)^T \begin{bmatrix} 20.3 & 8.3 \\ 8.3 & 20.3 \end{bmatrix} (x - x^1)
\]
For x2:
\[
F(x) \approx 1.08 + \frac{1}{2}(x - x^2)^T \begin{bmatrix} 0.347 & -11.7 \\ -11.7 & 0.347 \end{bmatrix} (x - x^2)
\]
For x3:
\[
F(x) \approx -0.07 + \frac{1}{2}(x - x^3)^T \begin{bmatrix} 15.3 & 3.35 \\ 3.35 & 15.3 \end{bmatrix} (x - x^3)
\]
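To visualize how well these quadratic models track the true function, a contour comparison around, say, x1 can be sketched in MATLAB as follows (the plotting details are an assumption, not part of the original report):

% Contours of F and of its quadratic model around the stationary point x1
F  = @(X1, X2) (X1 + X2).^4 - 12*X1.*X2 + X1 + X2 + 1;
xs = [-0.6504; -0.6504];                 % stationary point x1
H  = [20.3 8.3; 8.3 20.3];               % Hessian at x1
F2 = @(X1, X2) -2.51 + 0.5*(H(1,1)*(X1 - xs(1)).^2 ...
      + 2*H(1,2)*(X1 - xs(1)).*(X2 - xs(2)) + H(2,2)*(X2 - xs(2)).^2);
[X1, X2] = meshgrid(linspace(-1.2, 0, 80), linspace(-1.2, 0, 80));
contour(X1, X2, F(X1, X2), 20); hold on
contour(X1, X2, F2(X1, X2), 20, '--'); hold off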
Exercise E2.4
A two-layer neural network is to have four inputs and six outputs. The range of the outputs is continuous and lies between 0 and 1.
Considering that two layers are to be used and six outputs must be produced, it is necessary to have 6 neurons in the second layer, one for each output. The number of neurons in the first layer, on the other hand, does not play a crucial role in satisfying this condition, since its outputs are only used as inputs to the second layer; it can therefore be set to 1 or more (call it S1).
The first-layer weight matrix has four columns, one per input, and S1 rows, one per first-layer neuron, so its dimension is S1 x 4 (for example 4 x 4 if four neurons are chosen for the first layer). The second-layer weight matrix then has dimension 6 x S1, since its number of columns must match the number of outputs coming from the first layer.
In the second layer, the transfer function must produce continuous values; several choices can keep those values within the given limits, and saturating at 0 and 1 (for example the satlin or log-sigmoid functions) is a good option to make sure that no output lies outside the desired band, whereas a purely linear function does not by itself bound the outputs. For the first layer there are more options, depending on how many neurons are used; for instance, if there are many of them, a hard-limit function is also possible, and the second layer would then compute its outputs from the number of first-layer neurons whose output is 1. A minimal sketch of such an architecture is given after the note on biases below.
Biases are normally optional, but in many cases they would help the network to work better.
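The following MATLAB sketch illustrates the dimensions discussed above; the choice of three first-layer neurons and of the particular transfer functions is an assumption made only for illustration:

% Two-layer network: 4 inputs -> S1 first-layer neurons -> 6 outputs in [0, 1]
S1 = 3;                                    % assumed number of first-layer neurons
W1 = randn(S1, 4);  b1 = randn(S1, 1);     % first-layer weights (S1 x 4) and biases
W2 = randn(6, S1);  b2 = randn(6, 1);      % second-layer weights (6 x S1) and biases

p  = rand(4, 1);                           % example input vector
a1 = 1 ./ (1 + exp(-(W1*p + b1)));         % first layer: log-sigmoid (one possible choice)
a2 = max(0, min(1, W2*a1 + b2));           % second layer: saturating linear, outputs in [0, 1]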