Practice BFGS Algorithm

This document details the optimization of a quadratic function using the BFGS algorithm. It outlines the steps taken to compute the initial gradient, search direction, optimal step size, and updates to the iterate and inverse Hessian approximation. The final result includes the updated inverse Hessian and a summary of the next steps in the algorithm.

Case Study: Quadratic Function Optimization

Let’s consider optimizing the quadratic function:


f(x) = \tfrac{1}{2} x^T Q x - b^T x

where

Q = \begin{bmatrix} 4 & 1 \\ 1 & 3 \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ 2 \end{bmatrix}

This is a convex function with a unique minimum at x^* = Q^{-1} b.
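The original walkthrough carries no code, so here is a minimal NumPy sketch (the names Q, b, x_star are mine) that sets up the problem and computes the exact minimizer, useful as a reference point for the iterations below:

```python
import numpy as np

# Problem data from the case study.
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

# The unique minimizer of f(x) = 0.5 x^T Q x - b^T x solves Q x = b.
x_star = np.linalg.solve(Q, b)
print(x_star)  # approximately [0.0909, 0.6364]
```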
Step 1: Initialization
We choose an initial guess:

x_0 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}

The initial inverse Hessian approximation is the identity matrix:

H_0 = I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}

Now, your turn! Compute the initial gradient g_0 = \nabla f(x_0). What do you get?

(RESPONSE): We compute the initial gradient:

g_0 = \nabla f(x_0) = Q x_0 - b

Substituting the given values:

g_0 = \begin{bmatrix} 4 & 1 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \end{bmatrix} - \begin{bmatrix} 1 \\ 2 \end{bmatrix}

Perform the matrix-vector multiplication:

\begin{bmatrix} 4(2) + 1(1) \\ 1(2) + 3(1) \end{bmatrix} = \begin{bmatrix} 8 + 1 \\ 2 + 3 \end{bmatrix} = \begin{bmatrix} 9 \\ 5 \end{bmatrix}

Subtracting b:

g_0 = \begin{bmatrix} 9 \\ 5 \end{bmatrix} - \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 8 \\ 3 \end{bmatrix}

Thus, the initial gradient is:

g_0 = \begin{bmatrix} 8 \\ 3 \end{bmatrix}
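As a quick numerical check of this gradient (reusing Q and b from the sketch above; x0 is my name for the starting point):

```python
import numpy as np

Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x0 = np.array([2.0, 1.0])

# Gradient of f(x) = 0.5 x^T Q x - b^T x is Q x - b.
g0 = Q @ x0 - b
print(g0)  # [8. 3.]
```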

Now, compute the search direction using the initial inverse Hessian approximation:

p_0 = -H_0 g_0

What do you get?

(RESPONSE): We compute the search direction:

p_0 = -H_0 g_0

Since H_0 is the identity matrix:

p_0 = -I g_0 = -\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 8 \\ 3 \end{bmatrix}

Multiplying by I (which leaves the vector unchanged):

p_0 = -\begin{bmatrix} 8 \\ 3 \end{bmatrix} = \begin{bmatrix} -8 \\ -3 \end{bmatrix}

Thus, the search direction is:

p_0 = \begin{bmatrix} -8 \\ -3 \end{bmatrix}

Now, perform a line search to determine the optimal step size \alpha_0. We seek \alpha_0 > 0 that minimizes:

\phi(\alpha) = f(x_0 + \alpha p_0)

Substituting f(x) = \tfrac{1}{2} x^T Q x - b^T x, derive the optimal \alpha_0.

(RESPONSE):
(I asked if that was all, and it said:)
No, there's much more! Let's go step by step through the line search process in detail.

We aim to minimize:

\phi(\alpha) = f(x_0 + \alpha p_0)

Step 1: Express x(\alpha) in Terms of \alpha

Since x_0 = \begin{bmatrix} 2 \\ 1 \end{bmatrix} and p_0 = \begin{bmatrix} -8 \\ -3 \end{bmatrix}, we write:

x(\alpha) = x_0 + \alpha p_0 = \begin{bmatrix} 2 \\ 1 \end{bmatrix} + \alpha \begin{bmatrix} -8 \\ -3 \end{bmatrix} = \begin{bmatrix} 2 - 8\alpha \\ 1 - 3\alpha \end{bmatrix}

Step 2: Compute \phi(\alpha) = f(x(\alpha))

Since f(x) = \tfrac{1}{2} x^T Q x - b^T x, we expand:

\phi(\alpha) = \tfrac{1}{2} x(\alpha)^T Q x(\alpha) - b^T x(\alpha)

First Term: Quadratic Form Expansion

Expanding x(\alpha)^T Q x(\alpha):

x(\alpha)^T Q x(\alpha) = \begin{bmatrix} 2 - 8\alpha & 1 - 3\alpha \end{bmatrix} \begin{bmatrix} 4 & 1 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} 2 - 8\alpha \\ 1 - 3\alpha \end{bmatrix}

Performing the matrix-vector multiplication:

Q x(\alpha) = \begin{bmatrix} 4(2 - 8\alpha) + 1(1 - 3\alpha) \\ 1(2 - 8\alpha) + 3(1 - 3\alpha) \end{bmatrix} = \begin{bmatrix} 8 - 32\alpha + 1 - 3\alpha \\ 2 - 8\alpha + 3 - 9\alpha \end{bmatrix} = \begin{bmatrix} 9 - 35\alpha \\ 5 - 17\alpha \end{bmatrix}

Now computing x(\alpha)^T (Q x(\alpha)):

\begin{bmatrix} 2 - 8\alpha & 1 - 3\alpha \end{bmatrix} \cdot \begin{bmatrix} 9 - 35\alpha \\ 5 - 17\alpha \end{bmatrix} = (2 - 8\alpha)(9 - 35\alpha) + (1 - 3\alpha)(5 - 17\alpha)

Expanding:

= 18 - 70\alpha - 72\alpha + 280\alpha^2 + 5 - 17\alpha - 15\alpha + 51\alpha^2
= 23 - 174\alpha + 331\alpha^2

Thus:

\tfrac{1}{2} x(\alpha)^T Q x(\alpha) = \tfrac{1}{2}(23 - 174\alpha + 331\alpha^2) = \tfrac{23}{2} - 87\alpha + \tfrac{331}{2}\alpha^2

Second Term: Linear Term Expansion

b^T x(\alpha) = \begin{bmatrix} 1 & 2 \end{bmatrix} \begin{bmatrix} 2 - 8\alpha \\ 1 - 3\alpha \end{bmatrix} = 1(2 - 8\alpha) + 2(1 - 3\alpha) = 2 - 8\alpha + 2 - 6\alpha = 4 - 14\alpha

Step 3: Derivative to Find \alpha_0

\phi(\alpha) = \tfrac{23}{2} - 87\alpha + \tfrac{331}{2}\alpha^2 - (4 - 14\alpha)
= \tfrac{23}{2} - 87\alpha + \tfrac{331}{2}\alpha^2 - 4 + 14\alpha
= \tfrac{15}{2} - 73\alpha + \tfrac{331}{2}\alpha^2

Taking the derivative:

\phi'(\alpha) = -73 + \tfrac{331}{2}(2\alpha) = -73 + 331\alpha

Setting \phi'(\alpha) = 0:

331\alpha = 73

\alpha_0 = \tfrac{73}{331} \approx 0.2205

Final Answer for Step Size:

\alpha_0 \approx 0.2205
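For a quadratic objective, this line search has a closed form: \phi'(\alpha) = p_0^T g_0 + \alpha \, p_0^T Q p_0 = 0 gives \alpha_0 = -p_0^T g_0 / (p_0^T Q p_0). A short check (names mine) that this reproduces 73/331:

```python
import numpy as np

Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x0 = np.array([2.0, 1.0])

g0 = Q @ x0 - b  # [8, 3]
p0 = -g0         # steepest-descent direction on the first iteration

# Exact line search for a quadratic: alpha = -(g^T p) / (p^T Q p).
alpha0 = -(g0 @ p0) / (p0 @ Q @ p0)
print(alpha0, 73 / 331)  # both 0.22054...
```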

Your Turn
Now update the iterate:

x_1 = x_0 + \alpha_0 p_0

Substituting the values, compute x_1.

(RESPONSE):
We update the iterate:

x_1 = x_0 + \alpha_0 p_0

Substituting the known values:

x_0 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \quad p_0 = \begin{bmatrix} -8 \\ -3 \end{bmatrix}, \quad \alpha_0 = \tfrac{73}{331} \approx 0.2205

Performing the element-wise update:

x_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix} + 0.2205 \begin{bmatrix} -8 \\ -3 \end{bmatrix} = \begin{bmatrix} 2 - 0.2205 \times 8 \\ 1 - 0.2205 \times 3 \end{bmatrix} = \begin{bmatrix} 2 - 1.764 \\ 1 - 0.6615 \end{bmatrix} = \begin{bmatrix} 0.236 \\ 0.3385 \end{bmatrix}

Thus, the updated iterate is:

x_1 = \begin{bmatrix} 0.236 \\ 0.3385 \end{bmatrix}
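The same update in code (continuing the names above; using the exact fraction 73/331 for \alpha_0):

```python
import numpy as np

x0 = np.array([2.0, 1.0])
p0 = np.array([-8.0, -3.0])
alpha0 = 73 / 331

# Iterate update: x1 = x0 + alpha0 * p0.
x1 = x0 + alpha0 * p0
print(x1)  # approximately [0.2356, 0.3384]
```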
Next Step: Compute Differences for the BFGS Update
Now, compute:

s_0 = x_1 - x_0, \qquad y_0 = \nabla f(x_1) - \nabla f(x_0)

Since we already know x_0 and x_1, compute s_0 first. Then, find \nabla f(x_1) using the gradient formula:

\nabla f(x) = Qx - b

Use this to compute y_0. Let's see your result!

(RESPONSE): We now compute the differences required for the BFGS update.

Step 1: Compute s_0

s_0 = x_1 - x_0

Substituting the values:

s_0 = \begin{bmatrix} 0.236 \\ 0.3385 \end{bmatrix} - \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 0.236 - 2 \\ 0.3385 - 1 \end{bmatrix} = \begin{bmatrix} -1.764 \\ -0.6615 \end{bmatrix}

Step 2: Compute y_0 = \nabla f(x_1) - \nabla f(x_0)

We use the gradient formula:

\nabla f(x) = Qx - b

We already computed \nabla f(x_0) = \begin{bmatrix} 8 \\ 3 \end{bmatrix}. Now compute \nabla f(x_1):

\nabla f(x_1) = Q x_1 - b = \begin{bmatrix} 4 & 1 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} 0.236 \\ 0.3385 \end{bmatrix} - \begin{bmatrix} 1 \\ 2 \end{bmatrix}

First, perform the matrix-vector multiplication:

Q x_1 = \begin{bmatrix} 4(0.236) + 1(0.3385) \\ 1(0.236) + 3(0.3385) \end{bmatrix} = \begin{bmatrix} 0.944 + 0.3385 \\ 0.236 + 1.0155 \end{bmatrix} = \begin{bmatrix} 1.2825 \\ 1.2515 \end{bmatrix}

Now subtract b:

\nabla f(x_1) = \begin{bmatrix} 1.2825 \\ 1.2515 \end{bmatrix} - \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 0.2825 \\ -0.7485 \end{bmatrix}

Now, compute y_0:

y_0 = \nabla f(x_1) - \nabla f(x_0) = \begin{bmatrix} 0.2825 \\ -0.7485 \end{bmatrix} - \begin{bmatrix} 8 \\ 3 \end{bmatrix} = \begin{bmatrix} 0.2825 - 8 \\ -0.7485 - 3 \end{bmatrix} = \begin{bmatrix} -7.7175 \\ -3.7485 \end{bmatrix}

Final Results

s_0 = \begin{bmatrix} -1.764 \\ -0.6615 \end{bmatrix}, \qquad y_0 = \begin{bmatrix} -7.7175 \\ -3.7485 \end{bmatrix}
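These differences in code (a sketch continuing the names above; with the exact \alpha_0 = 73/331, the later decimals differ slightly from the hand-rounded figures):

```python
import numpy as np

Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x0 = np.array([2.0, 1.0])
p0 = np.array([-8.0, -3.0])
alpha0 = 73 / 331

x1 = x0 + alpha0 * p0
s0 = x1 - x0                      # step taken
y0 = (Q @ x1 - b) - (Q @ x0 - b)  # gradient change; equals Q @ s0 for a quadratic
print(s0)  # approximately [-1.7644, -0.6616]
print(y0)  # approximately [-7.7190, -3.7492]
```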

Next Step: Compute \rho_0 and Update H_1

Now, compute:

\rho_0 = \frac{1}{y_0^T s_0}

H_1 = (I - \rho_0 s_0 y_0^T) H_0 (I - \rho_0 y_0 s_0^T) + \rho_0 s_0 s_0^T

Let's see your results!

(RESPONSE): You're right! Let's wrap it up properly.

Final Step: Compute \rho_0 and Update H_1

Step 1: Compute \rho_0

\rho_0 = \frac{1}{y_0^T s_0}

First, compute y_0^T s_0:

y_0^T s_0 = (-7.7175, -3.7485) \cdot (-1.764, -0.6615)
= (-7.7175 \times -1.764) + (-3.7485 \times -0.6615)
= 13.6137 + 2.4796
= 16.0933

Thus:

\rho_0 = \frac{1}{16.0933} \approx 0.0621

Step 2: Compute H_1 using the BFGS update

H_1 = (I - \rho_0 s_0 y_0^T) H_0 (I - \rho_0 y_0 s_0^T) + \rho_0 s_0 s_0^T

Since H_0 = I, multiplying out the two factors gives:

H_1 = I - \rho_0 s_0 y_0^T - \rho_0 y_0 s_0^T + \rho_0^2 (y_0^T y_0) s_0 s_0^T + \rho_0 s_0 s_0^T
= I - \rho_0 s_0 y_0^T - \rho_0 y_0 s_0^T + \rho_0 (1 + \rho_0 y_0^T y_0) s_0 s_0^T

(Note the cross term: (s_0 y_0^T)(y_0 s_0^T) = (y_0^T y_0) s_0 s_0^T, which must not be dropped.)

Computing each term:

1. \rho_0 s_0 y_0^T and \rho_0 y_0 s_0^T:

\rho_0 s_0 y_0^T = 0.0621 \begin{bmatrix} -1.764 \\ -0.6615 \end{bmatrix} \begin{bmatrix} -7.7175 & -3.7485 \end{bmatrix} = 0.0621 \begin{bmatrix} 13.6137 & 6.6124 \\ 5.1051 & 2.4796 \end{bmatrix} \approx \begin{bmatrix} 0.846 & 0.411 \\ 0.317 & 0.154 \end{bmatrix}

The outer product s_0 y_0^T is not symmetric, so \rho_0 y_0 s_0^T is its transpose, not the same matrix:

\rho_0 y_0 s_0^T \approx \begin{bmatrix} 0.846 & 0.317 \\ 0.411 & 0.154 \end{bmatrix}

Subtracting both from I:

I - \rho_0 s_0 y_0^T - \rho_0 y_0 s_0^T \approx \begin{bmatrix} 1 - 1.692 & -0.728 \\ -0.728 & 1 - 0.308 \end{bmatrix} = \begin{bmatrix} -0.692 & -0.728 \\ -0.728 & 0.692 \end{bmatrix}

2. \rho_0 (1 + \rho_0 y_0^T y_0) s_0 s_0^T:

y_0^T y_0 = 7.7175^2 + 3.7485^2 \approx 73.611, \qquad \rho_0 (1 + \rho_0 y_0^T y_0) \approx 0.0621 (1 + 4.571) \approx 0.346

s_0 s_0^T = \begin{bmatrix} 3.112 & 1.167 \\ 1.167 & 0.438 \end{bmatrix}, \qquad 0.346 \, s_0 s_0^T \approx \begin{bmatrix} 1.078 & 0.404 \\ 0.404 & 0.152 \end{bmatrix}

Adding to the previous result:

H_1 \approx \begin{bmatrix} -0.692 & -0.728 \\ -0.728 & 0.692 \end{bmatrix} + \begin{bmatrix} 1.078 & 0.404 \\ 0.404 & 0.152 \end{bmatrix} \approx \begin{bmatrix} 0.386 & -0.324 \\ -0.324 & 0.843 \end{bmatrix}

Final Result:
The updated inverse Hessian approximation (entries rounded to three decimals; exact fractions give 0.3860, -0.3239, 0.8434):

H_1 \approx \begin{bmatrix} 0.386 & -0.324 \\ -0.324 & 0.843 \end{bmatrix}

As a sanity check, H_1 is symmetric and satisfies the secant condition H_1 y_0 \approx s_0, as a BFGS update must.
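A NumPy sketch of this update (names mine), which reproduces H_1 at full precision and confirms the secant condition:

```python
import numpy as np

Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x0 = np.array([2.0, 1.0])
H0 = np.eye(2)

g0 = Q @ x0 - b
p0 = -H0 @ g0
alpha0 = -(g0 @ p0) / (p0 @ Q @ p0)  # exact line search for a quadratic

x1 = x0 + alpha0 * p0
s0 = x1 - x0
y0 = (Q @ x1 - b) - g0
rho0 = 1.0 / (y0 @ s0)

# BFGS inverse-Hessian update: H1 = (I - rho s y^T) H0 (I - rho y s^T) + rho s s^T
I2 = np.eye(2)
H1 = (I2 - rho0 * np.outer(s0, y0)) @ H0 @ (I2 - rho0 * np.outer(y0, s0)) \
     + rho0 * np.outer(s0, s0)
print(H1)           # approximately [[0.386, -0.324], [-0.324, 0.843]]
print(H1 @ y0, s0)  # secant condition: H1 @ y0 equals s0
```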
Final Thoughts
This completes the first iteration of the BFGS algorithm. In subsequent iterations, we would:

- Check for convergence.
- Compute a new search direction.
- Perform a line search.
- Update x_k, H_k, and iterate again.

A compact sketch of this loop is given below.
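To tie the steps together, here is a minimal BFGS loop for this specific quadratic, a sketch assuming exact line searches (cheap here because f is quadratic); the function name and arguments are mine:

```python
import numpy as np

Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

def bfgs_quadratic(x, tol=1e-10, max_iter=50):
    """Minimize f(x) = 0.5 x^T Q x - b^T x by BFGS with exact line search."""
    n = x.size
    H = np.eye(n)                       # initial inverse Hessian approximation
    g = Q @ x - b
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:     # convergence check
            break
        p = -H @ g                      # search direction
        alpha = -(g @ p) / (p @ Q @ p)  # exact step size for a quadratic
        s = alpha * p
        x = x + s
        g_new = Q @ x - b
        y = g_new - g
        rho = 1.0 / (y @ s)
        I = np.eye(n)
        H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
            + rho * np.outer(s, s)
        g = g_new
    return x

print(bfgs_quadratic(np.array([2.0, 1.0])))  # converges to Q^{-1} b ≈ [0.0909, 0.6364]
```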
