Subgradient Method Slides
$$f^{(k)}_{\mathrm{best}} = \min_{i=1,\dots,k} f(x^{(i)})$$
• $\|x^{(1)} - x^\star\|_2 \le R$
now we use
$$\sum_{i=1}^k \alpha_i \left( f(x^{(i)}) - f^\star \right) \ge \left( f^{(k)}_{\mathrm{best}} - f^\star \right) \sum_{i=1}^k \alpha_i$$
to get
$$f^{(k)}_{\mathrm{best}} - f^\star \le \frac{R^2 + G^2 \sum_{i=1}^k \alpha_i^2}{2 \sum_{i=1}^k \alpha_i}.$$
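To make the iteration behind this bound concrete, here is a minimal sketch of the subgradient method with a pluggable step-size rule, tracking $f^{(k)}_{\mathrm{best}}$; the function name, oracle interface, and toy problem are illustrative assumptions, not from the slides.

```python
import numpy as np

def subgradient_method(f, subgrad, x0, step, iters):
    """Minimal subgradient-method sketch: x^{(k+1)} = x^{(k)} - alpha_k * g^{(k)},
    tracking f_best^{(k)} = min_{i=1..k} f(x^{(i)})."""
    x = np.asarray(x0, dtype=float)
    f_best = np.inf
    for k in range(1, iters + 1):
        f_best = min(f_best, f(x))   # f_best over x^{(1)}, ..., x^{(k)}
        g = subgrad(x)
        alpha = step(k, g)           # step-size rule: constant, gamma/||g||_2, (R/G)/sqrt(k), ...
        x = x - alpha * g
    return x, f_best

# illustrative toy problem: f(x) = ||x - 1||_1, with subgradient sign(x - 1)
f = lambda x: np.abs(x - 1.0).sum()
subgrad = lambda x: np.sign(x - 1.0)
x, fb = subgradient_method(f, subgrad, np.zeros(5), step=lambda k, g: 0.01, iters=3000)
print(fb)
```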
for constant step size $\alpha_i = \alpha$, this gives
$$f^{(k)}_{\mathrm{best}} - f^\star \le \frac{R^2 + G^2 k \alpha^2}{2 k \alpha}$$
for constant step length $\alpha_i = \gamma/\|g^{(i)}\|_2$ (so $\|x^{(i+1)} - x^{(i)}\|_2 = \gamma$),
$$f^{(k)}_{\mathrm{best}} - f^\star \le \frac{R^2 + \sum_{i=1}^k \alpha_i^2 \|g^{(i)}\|_2^2}{2 \sum_{i=1}^k \alpha_i} \le \frac{R^2 + \gamma^2 k}{2 \gamma k / G},$$
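For reference, a minimal sketch of one constant step length update as assumed in this bound (the function name and the zero-gradient guard are assumptions):

```python
import numpy as np

def constant_step_length_update(x, g, gamma):
    """One subgradient step with constant step length:
    alpha_k = gamma / ||g^{(k)}||_2, so every step moves exactly distance gamma."""
    norm_g = np.linalg.norm(g)
    if norm_g == 0.0:            # g = 0 certifies optimality; stay put
        return x
    return x - (gamma / norm_g) * g
```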
if the step sizes are square summable but not summable, i.e.,
$$\sum_{k=1}^\infty \alpha_k^2 < \infty, \qquad \sum_{k=1}^\infty \alpha_k = \infty,$$
then
$$f^{(k)}_{\mathrm{best}} - f^\star \le \frac{R^2 + G^2 \sum_{i=1}^k \alpha_i^2}{2 \sum_{i=1}^k \alpha_i}$$
as $k \to \infty$, the numerator converges to a finite number and the denominator converges to $\infty$, so $f^{(k)}_{\mathrm{best}} \to f^\star$
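For instance, $\alpha_k = 1/k$ is square summable but not summable; a self-contained toy sketch (the problem $f(x) = |x - 3|$, the starting point, and the iteration count are illustrative assumptions):

```python
import numpy as np

# diminishing step sizes alpha_k = 1/k: sum alpha_k^2 < inf, sum alpha_k = inf
x, f_best = 0.0, float("inf")
for k in range(1, 5001):
    f_best = min(f_best, abs(x - 3.0))   # f_best^{(k)}
    g = np.sign(x - 3.0)                 # a subgradient of |x - 3|
    x = x - (1.0 / k) * g
print(f_best)                            # approaches f* = 0 as k grows
```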
• optimal choice of $\alpha_i$ to achieve $\frac{R^2 + G^2 \sum_{i=1}^k \alpha_i^2}{2 \sum_{i=1}^k \alpha_i} \le \epsilon$ for smallest $k$:
$$\alpha_i = (R/G)/\sqrt{k}, \quad i = 1, \dots, k$$
• the truth: there really isn’t a good stopping criterion for the subgradient
method . . .
and take $g = a_j$
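Assuming this fragment refers to the piecewise-linear example $f(x) = \max_i (a_i^T x + b_i)$, a minimal sketch of the corresponding subgradient oracle (the function name and data are illustrative assumptions):

```python
import numpy as np

def pwl_subgradient(A, b, x):
    """Subgradient of f(x) = max_i (a_i^T x + b_i):
    pick an index j attaining the maximum and return g = a_j."""
    vals = A @ x + b           # all affine pieces a_i^T x + b_i
    j = int(np.argmax(vals))   # an index where the maximum is attained
    return vals[j], A[j]       # f(x) and a subgradient g = a_j

# illustrative data
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
fx, g = pwl_subgradient(A, b, rng.standard_normal(5))
```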
(figure: $f^{(k)}_{\mathrm{best}} - f_{\min}$ versus iteration $k$ for three values of $\gamma$, log scale)
• applying recursively,
$$\sum_{i=1}^k \frac{\left( f(x^{(i)}) - f^\star \right)^2}{\|g^{(i)}\|_2^2} \le R^2$$
and so
$$\sum_{i=1}^k \left( f(x^{(i)}) - f^\star \right)^2 \le R^2 G^2,$$
which proves $f(x^{(k)}) \to f^\star$
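A minimal sketch, assuming the step rule being analyzed here is the Polyak step $\alpha_k = \left(f(x^{(k)}) - f^\star\right)/\|g^{(k)}\|_2^2$ for known $f^\star$ (the function name and one-step example are illustrative assumptions):

```python
import numpy as np

def polyak_step(f_x, f_star, g):
    """Polyak step size alpha_k = (f(x^{(k)}) - f*) / ||g^{(k)}||_2^2."""
    return (f_x - f_star) / np.dot(g, g)

# illustrative one step on f(x) = ||x||_1 (so f* = 0), subgradient sign(x)
x = np.array([1.0, -2.0, 0.5])
g = np.sign(x)
x = x - polyak_step(np.abs(x).sum(), 0.0, g) * g
```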
$$g = \nabla\, \mathrm{dist}(x, C_j) = \frac{x - P_{C_j}(x)}{\|x - P_{C_j}(x)\|_2}$$
• at each step, project the current point onto the farthest set
• for $m = 2$ sets, the projections alternate onto one set, then the other
• convergence: $\mathrm{dist}(x^{(k)}, C) \to 0$ as $k \to \infty$ (a minimal sketch follows the figure below)
(figure: alternating projections between sets $C$ and $D$; iterates $x_1, y_1, x_2, y_2, \ldots$ converge to $x^\star$)
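A minimal sketch of alternating projections for $m = 2$ sets, assuming projection oracles are available; the example sets (a halfspace and a box) and function names are illustrative assumptions.

```python
import numpy as np

def alternating_projections(proj_C, proj_D, x0, iters=100):
    """Alternate projections x -> P_C(x) -> P_D(P_C(x)) -> ...;
    for convex sets with nonempty intersection the iterates approach a point in C ∩ D."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = proj_C(x)
        x = proj_D(x)
    return x

# illustrative sets: C = {x : a^T x <= 1} (halfspace), D = [0, 1]^n (box)
a = np.array([2.0, 1.0, -1.0])
proj_C = lambda x: x - max(0.0, a @ x - 1.0) / (a @ a) * a
proj_D = lambda x: np.clip(x, 0.0, 1.0)
x = alternating_projections(proj_C, proj_D, np.array([3.0, -2.0, 5.0]))
```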
$$P_{C_1}(X) = \sum_{i=1}^n \max\{0, \lambda_i\}\, q_i q_i^T,$$
where $X = \sum_{i=1}^n \lambda_i q_i q_i^T$ is the eigendecomposition of $X$
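A minimal sketch of this projection onto the PSD cone via the eigendecomposition (the function name is an assumption):

```python
import numpy as np

def project_psd(X):
    """Project a symmetric matrix onto the PSD cone:
    eigendecompose X = sum_i lambda_i q_i q_i^T and clip negative eigenvalues at 0."""
    lam, Q = np.linalg.eigh(X)               # eigenvalues lambda_i, eigenvectors q_i (columns of Q)
    return (Q * np.maximum(lam, 0.0)) @ Q.T  # sum_i max{0, lambda_i} q_i q_i^T

X = np.array([[1.0, 2.0], [2.0, -3.0]])
print(project_psd(X))
```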
i=1
0
10
dist
−2
10
−4
10
−6
10
0 20 40 60 80 100
k
• $f^{(k)}_{\mathrm{best}} - \gamma_k$ serves as an estimate of $f^\star$
• can show $f^{(k)}_{\mathrm{best}} \to f^\star$
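One common way to turn this estimate into a step size (an assumption here, since the rule itself is not shown on these lines) is the Polyak-type target-level step $\alpha_k = \left(f(x^{(k)}) - f^{(k)}_{\mathrm{best}} + \gamma_k\right)/\|g^{(k)}\|_2^2$ with $\gamma_k$ square summable but not summable; a minimal sketch:

```python
import numpy as np

def target_level_step(f_x, f_best, gamma_k, g):
    """Polyak-type step with estimated optimal value:
    treats f_best^{(k)} - gamma_k as the estimate of f*, giving
    alpha_k = (f(x^{(k)}) - f_best^{(k)} + gamma_k) / ||g^{(k)}||_2^2."""
    return (f_x - f_best + gamma_k) / np.dot(g, g)
```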
(figure: $f^{(k)}_{\mathrm{best}} - f_{\min}$ versus iteration $k$, log scale)
• optimal choice of $\alpha_i$ to achieve $f^{(k)}_{\mathrm{best}} - f^\star \le \frac{R^2 + G^2 \sum_{i=1}^k \alpha_i^2}{2 \sum_{i=1}^k \alpha_i} \le \epsilon$:
$$\alpha_i = (R/G)/\sqrt{k}, \quad i = 1, \dots, k$$
• $f^{(k)}_{\mathrm{best}} - f^\star \le \frac{RG}{\sqrt{k}}$ after $k$ iterations
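To see where the $RG/\sqrt{k}$ figure comes from, substitute $\alpha_i = (R/G)/\sqrt{k}$ into the bound:
$$\sum_{i=1}^k \alpha_i^2 = \frac{R^2}{G^2}, \qquad \sum_{i=1}^k \alpha_i = \frac{R}{G}\sqrt{k}, \qquad \frac{R^2 + G^2 (R^2/G^2)}{2 (R/G)\sqrt{k}} = \frac{2R^2}{2(R/G)\sqrt{k}} = \frac{RG}{\sqrt{k}}.$$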
• we query a point $x$
$$f^{(k)}_{\mathrm{best}} - f^\star \ge \frac{RG}{\sqrt{k}}$$
• $f(x)$ is minimized at
$$x^\star_i = \begin{cases} -\dfrac{1}{\lambda k}, & 1 \le i \le k \\ 0, & k+1 \le i \le n \end{cases}$$
• $e_{i^\star} + \lambda x$ is a subgradient, where $i^\star$ is an index attaining the maximum:
$$e_{i^\star} + \lambda x \in \partial f(x) = \partial\left( \max_{1 \le i \le k} x_i + \frac{\lambda}{2} \|x\|_2^2 \right)$$
$$f^{(k)}_{\mathrm{best}} - f^\star \ge \frac{RG}{2(1 + \sqrt{k})},$$
while the subgradient method achieves
$$f^{(k)}_{\mathrm{best}} - f^\star \le \frac{RG}{\sqrt{k}},$$
so it is optimal up to constants
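As a concrete sketch of the worst-case instance above (the parameter values $n$, $k$, $\lambda$ and the function name are illustrative assumptions):

```python
import numpy as np

def worst_case_oracle(x, k, lam):
    """f(x) = max_{1<=i<=k} x_i + (lam/2) ||x||_2^2 and a subgradient e_{i*} + lam*x,
    where i* is an index attaining the max over the first k coordinates."""
    i_star = int(np.argmax(x[:k]))
    f_x = x[:k].max() + 0.5 * lam * np.dot(x, x)
    g = lam * x.copy()
    g[i_star] += 1.0                  # add e_{i*}
    return f_x, g

# check at the minimizer x*_i = -1/(lam*k) for i <= k, 0 otherwise
n, k, lam = 10, 4, 0.5
x_star = np.zeros(n); x_star[:k] = -1.0 / (lam * k)
print(worst_case_oracle(x_star, k, lam)[0])   # f* = -1/(2*lam*k)
```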