Optimal Analysis of Method with Batching for Monotone Stochastic Finite-Sum Variational Inequalities

Doklady Mathematics

Abstract

Variational inequalities are a universal optimization paradigm that is interesting in itself and also incorporates classical minimization and saddle point problems. Modern applications motivate the study of stochastic formulations of optimization problems. In this paper, we present an analysis of a method that gives optimal convergence estimates for monotone stochastic finite-sum variational inequalities. In contrast to previous works, our method supports batching without losing optimality in oracle complexity. The effectiveness of the algorithm, especially for small batches of size greater than one, is confirmed experimentally.

Fig. 1.
Fig. 2.

REFERENCES

  1. F. Facchinei and J.-S. Pang, Finite-Dimensional Variational Inequalities and Complementarity Problems (Springer, New York, 2003).

  2. G. Scutari, D. P. Palomar, F. Facchinei, and J.-S. Pang, “Convex optimization, game theory, and variational inequality theory,” IEEE Signal Process. Mag. 27 (3), 35–49 (2010).

  3. J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior (Princeton Univ. Press, Princeton, 1944).

  4. P. T. Harker and J.-S. Pang, “Finite-dimensional variational inequality and nonlinear complementarity problems: A survey of theory, algorithms and applications,” Math. Program. 48, 161–220 (1990).

  5. A. Ben-Tal, L. El Ghaoui, and A. Nemirovski, Robust Optimization (Princeton Univ. Press, Princeton, 2009).

  6. Y. Nesterov, “Dual extrapolation and its applications to solving variational inequalities and related problems,” Math. Program. 109 (2), 319–344 (2007).

  7. A. Nemirovski, “Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems,” SIAM J. Optim. 15 (1), 229–251 (2004).

  8. T. Joachims, “A support vector method for multivariate performance measures,” in Proceedings of the 22nd International Conference on Machine Learning (2005), pp. 377–384. https://doi.org/10.1145/1102351.1102399

  9. F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, “Optimization with sparsity-inducing penalties” (2011). https://doi.org/10.48550/arXiv.1108.0775

  10. L. Xu, J. Neufeld, B. Larson, and D. Schuurmans, “Maximum margin clustering,” Adv. Neural Inf. Process. Syst. 17, 1537–1544 (2005).

  11. F. Bach, J. Mairal, and J. Ponce, “Convex sparse matrix factorizations” (2008). https://doi.org/10.48550/arXiv.0812.1869

  12. E. Esser, X. Zhang, and T. F. Chan, “A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science,” SIAM J. Imaging Sci. 3 (4), 1015–1046 (2010).

  13. A. Chambolle and T. Pock, “A first-order primal-dual algorithm for convex problems with applications to imaging,” J. Math. Imaging Vision 40 (1), 120–145 (2011).

  14. S. Omidshafiei, J. Pazis, C. Amato, J. P. How, and J. Vian, “Deep decentralized multi-task multi-agent reinforcement learning under partial observability,” in Proceedings of the 34th International Conference on Machine Learning (ICML), PMLR (2017), Vol. 70, pp. 2681–2690. http://proceedings.mlr.press/v70/omidshafiei17a.html

  15. Y. Jin and A. Sidford, “Efficiently solving MDPs with stochastic mirror descent,” in Proceedings of the 37th International Conference on Machine Learning (ICML), PMLR (2020), Vol. 119, pp. 4890–4900.

  16. A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” in International Conference on Learning Representations (ICLR) (2018).

  17. I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, et al., “Generative adversarial networks” (2014). https://doi.org/10.48550/arXiv.1406.2661

  18. C. Daskalakis, A. Ilyas, V. Syrgkanis, and H. Zeng, “Training GANs with optimism” (2017). https://doi.org/10.48550/arXiv.1711.00141

  19. G. Gidel, H. Berard, G. Vignoud, P. Vincent, and S. Lacoste-Julien, “A variational inequality perspective on generative adversarial networks” (2018). https://doi.org/10.48550/arXiv.1802.10551

  20. P. Mertikopoulos, B. Lecouat, H. Zenati, C.-S. Foo, V. Chandrasekhar, and G. Piliouras, “Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile” (2018). https://doi.org/10.48550/arXiv.1807.02629

  21. H. Robbins and S. Monro, “A stochastic approximation method,” Ann. Math. Stat. 22 (3), 400–407 (1951).

  22. R. Johnson and T. Zhang, “Accelerating stochastic gradient descent using predictive variance reduction,” Adv. Neural Inf. Process. Syst. 26, 315–323 (2013).

  23. A. Defazio, F. Bach, and S. Lacoste-Julien, “SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives,” Adv. Neural Inf. Process. Syst. 27, 1646–1654 (2014).

  24. L. M. Nguyen, J. Liu, K. Scheinberg, and M. Takáč, “SARAH: A novel method for machine learning problems using stochastic recursive gradient,” in International Conference on Machine Learning, PMLR (2017), pp. 2613–2621.

  25. Z. Allen-Zhu, “Katyusha: The first direct acceleration of stochastic gradient methods,” in Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (2017), pp. 1200–1205.

  26. G. M. Korpelevich, “The extragradient method for finding saddle points and other problems,” Matecon 12, 35–49 (1977).

  27. A. Mokhtari, A. Ozdaglar, and S. Pattathil, “A unified analysis of extra-gradient and optimistic gradient methods for saddle point problems: Proximal point approach,” in International Conference on Artificial Intelligence and Statistics, PMLR (2020), pp. 1497–1507.

  28. L. D. Popov, “A modification of the Arrow–Hurwicz method for search of saddle points,” Math. Notes Acad. Sci. USSR 28, 845–848 (1980).

  29. P. Tseng, “A modified forward-backward splitting method for maximal monotone mappings,” SIAM J. Control Optim. 38 (2), 431–446 (2000).

  30. A. Juditsky, A. Nemirovski, and C. Tauvel, “Solving variational inequalities with stochastic mirror-prox algorithm,” Stochastic Syst. 1 (1), 17–58 (2011).

  31. Y. Carmon, Y. Jin, A. Sidford, and K. Tian, “Variance reduction for matrix games” (2019). https://doi.org/10.48550/arXiv.1907.02056

  32. A. Alacaoglu and Y. Malitsky, “Stochastic variance reduction for variational inequality methods” (2021). https://doi.org/10.48550/arXiv.2102.08352

  33. A. Alacaoglu, Y. Malitsky, and V. Cevher, “Forward-reflected-backward method with variance reduction,” Comput. Optim. Appl. 80, 321–346 (2021). https://doi.org/10.1007/s10589-021-00305-3

  34. Y. Han, G. Xie, and Z. Zhang, “Lower complexity bounds of finite-sum optimization problems: The results and construction” (2021). https://doi.org/10.48550/arXiv.2103.08280

  35. D. Kovalev, A. Beznosikov, A. Sadiev, M. Persiianov, P. Richtárik, and A. Gasnikov, “Optimal algorithms for decentralized stochastic variational inequalities” (2022). https://doi.org/10.48550/arXiv.2202.0277

  36. A. Nemirovski, Mini-Course on Convex Programming Algorithms (2013).

  37. A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, “Robust stochastic approximation approach to stochastic programming,” SIAM J. Optim. 19 (4), 1574–1609 (2009).

  38. L. Condat, “Fast projection onto the simplex and the ℓ1 ball,” Math. Program. 158 (1–2), 575–585 (2016).

Funding

The work of A. Pichugin and M. Pechin was supported by a grant for research centers in the field of artificial intelligence, provided by the Analytical Center for the Government of the Russian Federation in accordance with the subsidy agreement (agreement identifier 000000D730321P5Q0002) and the agreement with the Moscow Institute of Physics and Technology dated November 1, 2021, no. 70-2021-00138.

Author information

Corresponding author

Correspondence to A. Beznosikov.

Ethics declarations

The authors of this work declare that they have no conflicts of interest.

Additional information

Publisher’s Note.

Pleiades Publishing remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

APPENDIX A

1.1 PROOF OF THEOREM 1

Before proving the theorem, we introduce the following lemmas.

Lemma 1 (Lemma 2.4 from [32]). Let \(\mathcal{F} = (\mathcal{F}_k)_{k \geqslant 0}\) be a filtration and \((u^k)\) a stochastic process adapted to \(\mathcal{F}\) with \(\mathbb{E}[u^{k+1} \mid \mathcal{F}_k] = 0\). Then for any \(K \in \mathbb{N}\), \(x^0 \in X\), and any compact set \(\mathcal{C} \subseteq X\),

$$\begin{gathered} \mathbb{E}\left[ {\mathop {\max }\limits_{x \in \mathcal{C}} \sum\limits_{k = 0}^{K - 1} \langle {{u}^{{k + 1}}},x\rangle } \right] \\ \leqslant \mathop {\max }\limits_{x \in \mathcal{C}} \frac{1}{2}{\text{||}}{{x}^{0}} - x{\text{|}}{{{\text{|}}}^{2}} + \frac{1}{2}\sum\limits_{k = 0}^{K - 1} \mathbb{E}{\text{||}}{{u}^{{k + 1}}}{\text{|}}{{{\text{|}}}^{2}}. \\ \end{gathered} $$

Lemma 2. Under Assumption 1, the following inequality holds for the iterates of Algorithm 1:

$$\begin{gathered} \mathbb{E}\left[ {{{{\left\| {{{\Delta }^{k}} - {{\mathbb{E}}_{k}}\left[ {{{\Delta }^{k}}} \right]} \right\|}}^{2}}} \right] \\ \leqslant \frac{{2{{{\overline L }}^{2}}}}{b}\mathbb{E}\left[ {{{{\left\| {{{x}^{k}} - {{w}^{{k - 1}}}} \right\|}}^{2}} + {{{\left\| {{{x}^{k}} - {{x}^{{k - 1}}}} \right\|}}^{2}}} \right], \\ \end{gathered} $$
(8)

where \({{\mathbb{E}}_{k}}[{{\Delta }^{k}}]\) is equal to

$${{\mathbb{E}}_{k}}[{{\Delta }^{k}}] = 2F({{x}^{k}}) - F({{x}^{{k - 1}}}).$$
(9)
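As a sanity check of (9), the following Python sketch builds a toy finite-sum operator with affine components \(F_j(x) = A_j x + c_j\) (an assumption made only for illustration), forms the batched estimator \(\Delta^k\) in the form used in the proof below, and averages it over all \(M^b\) equally likely ordered batches whose indices are drawn independently and uniformly. The helper names `make_problem`, `batched_delta` and `exact_mean_delta` are hypothetical.

```python
import itertools
import numpy as np

def make_problem(seed=0, d=3, M=5):
    """Toy F = (1/M) sum_j F_j with affine F_j(x) = A_j x + c_j (illustrative assumption)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((M, d, d))
    c = rng.standard_normal((M, d))
    ops = [lambda x, A_j=A[j], c_j=c[j]: A_j @ x + c_j for j in range(M)]
    F = lambda x: sum(op(x) for op in ops) / M
    return ops, F, rng

def batched_delta(ops, F, batch, x_k, x_prev, w_prev):
    """Delta^k = (1/b) sum_{j in S^k} [F_j(x^k) - F_j(w^{k-1}) + F_j(x^k) - F_j(x^{k-1})] + F(w^{k-1})."""
    corr = sum(ops[j](x_k) - ops[j](w_prev) + ops[j](x_k) - ops[j](x_prev) for j in batch)
    return corr / len(batch) + F(w_prev)

def exact_mean_delta(ops, F, b, x_k, x_prev, w_prev):
    """E_k[Delta^k], computed exactly by enumerating all M**b equally likely ordered batches."""
    batches = itertools.product(range(len(ops)), repeat=b)
    vals = [batched_delta(ops, F, s, x_k, x_prev, w_prev) for s in batches]
    return np.mean(vals, axis=0)

ops, F, rng = make_problem()
x_k, x_prev, w_prev = rng.standard_normal((3, 3))
mean = exact_mean_delta(ops, F, 2, x_k, x_prev, w_prev)
print(np.allclose(mean, 2 * F(x_k) - F(x_prev)))  # True, in agreement with (9)
```

The terms involving \(w^{k-1}\) cancel in expectation, which is why the mean depends only on \(x^k\) and \(x^{k-1}\).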

Proof. We start from the definition of \(\Delta^k\) in line 6 of Algorithm 1:

$$\mathbb{E}_k\left[\|\Delta^k - \mathbb{E}_k[\Delta^k]\|^2\right]$$
$$\begin{gathered} = \mathbb{E}_k\Big[\Big\|\frac{1}{b}\sum_{j \in S^k}\big(F_j(x^k) - F_j(w^{k-1}) + F_j(x^k) - F_j(x^{k-1})\big) \\ + F(w^{k-1}) - \big(2F(x^k) - F(x^{k-1})\big)\Big\|^2\Big]. \end{gathered}$$

Using \(\|a + b\|^2 \leqslant 2\|a\|^2 + 2\|b\|^2\), which follows from the Cauchy–Schwarz inequality, we have

$${{\mathbb{E}}_{k}}[{\text{||}}{{\Delta }^{k}} - {{\mathbb{E}}_{k}}[{{\Delta }^{k}}]{\text{|}}{{{\text{|}}}^{2}}]$$
$$ \leqslant 2{{\mathbb{E}}_{k}}\left[ {{{{\left\| {\frac{1}{b}\sum\limits_{j \in {{S}^{k}}} ({{F}_{j}}({{x}^{k}})\, - \,{{F}_{j}}({{w}^{{k - 1}}}))\, - \,(F({{x}^{k}})\, - \,F({{w}^{{k - 1}}}))} \right\|}}^{2}}} \right]$$
$$ + \,\,2{{\mathbb{E}}_{k}}\left[ {{{{\left\| {\frac{1}{b}\sum\limits_{j \in {{S}^{k}}} ({{F}_{j}}({{x}^{k}})\, - \,{{F}_{j}}({{x}^{{k - 1}}}))\, - \,(F({{x}^{k}})\, - \,F({{x}^{{k - 1}}}))} \right\|}}^{2}}} \right].$$

Since the indices \(j_1^k, \ldots, j_b^k\) in \(S^k\) are chosen independently and uniformly from \(\{1, \ldots, M\}\), and \(F = \frac{1}{M}\sum_{j=1}^M F_j\) implies \(\mathbb{E}_j[F_j(y) - F_j(y')] = F(y) - F(y')\), the cross terms vanish: for any \(i \neq l\),

$$\begin{gathered} \mathbb{E}_k\Big[\Big\langle \big(F_{j_i^k}(x^k) - F_{j_i^k}(w^{k-1})\big) - \big(F(x^k) - F(w^{k-1})\big), \\ \big(F_{j_l^k}(x^k) - F_{j_l^k}(w^{k-1})\big) - \big(F(x^k) - F(w^{k-1})\big)\Big\rangle\Big] \\ = \Big\langle \mathbb{E}_{j_i^k}\Big[\big(F_{j_i^k}(x^k) - F_{j_i^k}(w^{k-1})\big) - \big(F(x^k) - F(w^{k-1})\big)\Big], \\ \mathbb{E}_{j_l^k}\Big[\big(F_{j_l^k}(x^k) - F_{j_l^k}(w^{k-1})\big) - \big(F(x^k) - F(w^{k-1})\big)\Big]\Big\rangle = 0, \end{gathered}$$

and the same holds for the terms with \(x^{k-1}\) in place of \(w^{k-1}\).

Hence, we get

$${{\mathbb{E}}_{k}}[{\text{||}}{{\Delta }^{k}} - {{\mathbb{E}}_{k}}[{{\Delta }^{k}}]{\text{|}}{{{\text{|}}}^{2}}]$$
$$ \leqslant \,2{{\mathbb{E}}_{k}}\left[ {\sum\limits_{j \in {{S}^{k}}} \frac{1}{{{{b}^{2}}}}{{{\left\| {({{F}_{j}}({{x}^{k}})\, - \,{{F}_{j}}({{w}^{{k - 1}}}))\, - \,(F({{x}^{k}})\, - \,F({{w}^{{k - 1}}}))} \right\|}}^{2}}} \right]$$
$$ + \,2{{\mathbb{E}}_{k}}\left[ {\sum\limits_{j \in {{S}^{k}}} \frac{1}{{{{b}^{2}}}}{{{\left\| {({{F}_{j}}({{x}^{k}})\, - \,{{F}_{j}}({{x}^{{k - 1}}}))\, - \,(F({{x}^{k}})\, - \,F({{x}^{{k - 1}}}))} \right\|}}^{2}}} \right]$$
$$ = \,\frac{2}{{{{b}^{2}}}}{{\mathbb{E}}_{k}}\left[ {\sum\limits_{j \in {{S}^{k}}} {{{\left\| {({{F}_{j}}({{x}^{k}})\, - \,{{F}_{j}}({{w}^{{k - 1}}}))\, - \,(F({{x}^{k}})\, - \,F({{w}^{{k - 1}}}))} \right\|}}^{2}}} \right]$$
$$ + \,\,\frac{2}{{{{b}^{2}}}}{{\mathbb{E}}_{k}}\left[ {\sum\limits_{j \in {{S}^{k}}} {{{\left\| {({{F}_{j}}({{x}^{k}})\, - \,{{F}_{j}}({{x}^{{k - 1}}}))\, - \,(F({{x}^{k}})\, - \,F({{x}^{{k - 1}}}))} \right\|}}^{2}}} \right]$$
$$\begin{gathered} \leqslant \frac{2}{{{{b}^{2}}}}{{\mathbb{E}}_{k}}\left[ {\sum\limits_{j \in {{S}^{k}}} {{{\left\| {{{F}_{j}}({{x}^{k}}) - {{F}_{j}}({{w}^{{k - 1}}})} \right\|}}^{2}}} \right] \\ + \frac{2}{{{{b}^{2}}}}{{\mathbb{E}}_{k}}\left[ {\sum\limits_{j \in {{S}^{k}}} {{{\left\| {{{F}_{j}}({{x}^{k}}) - {{F}_{j}}({{x}^{{k - 1}}})} \right\|}}^{2}}} \right]. \\ \end{gathered} $$

In the last step, we used the fact that \(\mathbb{E}\|X - \mathbb{E}X\|^2 = \mathbb{E}\|X\|^2 - \|\mathbb{E}X\|^2 \leqslant \mathbb{E}\|X\|^2\). Next, we again take into account that \(j_1^k, \ldots, j_b^k\) in \(S^k\) are chosen uniformly:

$${{\mathbb{E}}_{k}}[{\text{||}}{{\Delta }^{k}} - {{\mathbb{E}}_{k}}[{{\Delta }^{k}}]{\text{|}}{{{\text{|}}}^{2}}]$$
$$\begin{gathered} \leqslant \frac{2}{b}{{\mathbb{E}}_{k}}[{{\mathbb{E}}_{{j \sim {\text{u}}{\text{.a}}{\text{.r}}.\{ 1, \ldots ,M\} }}}[{\text{||}}{{F}_{j}}({{x}^{k}}) - {{F}_{j}}({{w}^{{k - 1}}}){\text{|}}{{{\text{|}}}^{2}} \\ \, + \,{\text{||}}{{F}_{j}}({{x}^{k}}) - {{F}_{j}}({{x}^{{k - 1}}}){\text{|}}{{{\text{|}}}^{2}}]] \\ \end{gathered} $$
$$ = \frac{2}{{Mb}}\sum\limits_{j = 1}^M \left( {{{{\left\| {{{F}_{j}}({{x}^{k}}) - {{F}_{j}}({{w}^{{k - 1}}})} \right\|}}^{2}} + {{{\left\| {{{F}_{j}}({{x}^{k}}) - {{F}_{j}}({{x}^{{k - 1}}})} \right\|}}^{2}}} \right).$$

Since each operator \(F_j\) is \(L_j\)-Lipschitz, we obtain

$$\begin{gathered} \mathbb{E}_k\left[\|\Delta^k - \mathbb{E}_k[\Delta^k]\|^2\right] \\ \leqslant \frac{2}{Mb}\sum_{j=1}^M L_j^2\left(\|x^k - w^{k-1}\|^2 + \|x^k - x^{k-1}\|^2\right). \end{gathered}$$

By the definition of \(\overline{L}\) (so that \(\frac{1}{M}\sum_{j=1}^M L_j^2 \leqslant \overline{L}^2\)), we obtain

$$\begin{gathered} {{\mathbb{E}}_{k}}\left[ {{\text{||}}{{\Delta }^{k}} - {{\mathbb{E}}_{k}}[{{\Delta }^{k}}]{\text{|}}{{{\text{|}}}^{2}}} \right] \\ \leqslant \frac{{2{{{\overline L }}^{2}}}}{b}\left( {{\text{||}}{{x}^{k}} - {{w}^{{k - 1}}}{\text{|}}{{{\text{|}}}^{2}} + \,{\text{||}}{{x}^{k}} - {{x}^{{k - 1}}}{\text{|}}{{{\text{|}}}^{2}}} \right). \\ \end{gathered} $$

Taking the full expectation concludes the proof. \(\square \)
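The bound of Lemma 2 can be checked numerically in a toy setting. The following Python sketch assumes affine operators \(F_j(x) = A_j x + c_j\), uses the spectral norms \(L_j = \|A_j\|_2\) as Lipschitz constants, and takes \(\overline{L}^2 = \frac{1}{M}\sum_j L_j^2\); all of these are illustrative assumptions. It computes the exact conditional variance of \(\Delta^k\) by enumerating all ordered batches, confirms that it equals the single-index variance divided by \(b\) (the cross terms vanish), and compares it with the right-hand side of (8).

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
d, M, b = 3, 4, 2
A = rng.standard_normal((M, d, d))
c = rng.standard_normal((M, d))

def F_j(j, x):
    """Toy affine component operator (assumption for illustration)."""
    return A[j] @ x + c[j]

def F(x):
    return sum(F_j(j, x) for j in range(M)) / M

L_j = np.array([np.linalg.norm(A[j], 2) for j in range(M)])  # Lipschitz constant of each F_j
L_bar_sq = np.mean(L_j ** 2)                                   # (1/M) sum_j L_j^2

x_k, x_prev, w_prev = rng.standard_normal((3, d))

def delta(batch):
    """Batched estimator Delta^k for a given tuple of indices."""
    corr = sum(2 * F_j(j, x_k) - F_j(j, w_prev) - F_j(j, x_prev) for j in batch)
    return corr / len(batch) + F(w_prev)

# Exact conditional variance over all M**b equally likely ordered batches.
vals = np.array([delta(s) for s in itertools.product(range(M), repeat=b)])
mean = vals.mean(axis=0)
variance = np.mean(np.sum((vals - mean) ** 2, axis=1))

# Variance of a single uniformly drawn index.
single = np.array([delta((j,)) for j in range(M)])
single_var = np.mean(np.sum((single - single.mean(axis=0)) ** 2, axis=1))

bound = 2 * L_bar_sq / b * (np.sum((x_k - w_prev) ** 2) + np.sum((x_k - x_prev) ** 2))
print(np.isclose(variance, single_var / b), variance <= bound)  # True True
```

The factor \(1/b\) in (8) comes exactly from the identity `variance == single_var / b`, which is where batching pays off.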

Lemma 3. For the iterates of Algorithm 1 with \(\gamma = p\), the following bound holds for any compact set \(\mathcal{C} \subseteq X\):

$$\begin{gathered} \mathbb{E}\left[\max_{x \in \mathcal{C}} \sum_{k=0}^{K-1} e_1(k,x)\right] \leqslant 2\max_{x \in \mathcal{C}} \|x - x^0\|^2 \\ + \frac{\gamma(1-\gamma)}{2}\sum_{k=0}^{K-1} \mathbb{E}\|x^{k+1} - w^k\|^2, \end{gathered}$$

where \(e_1(k,x) = \|w^{k+1} - x\|^2 - \gamma\|x^{k+1} - x\|^2 - (1-\gamma)\|w^k - x\|^2\).

Proof. For brevity, we introduce

$$u^{k+1} = \gamma x^{k+1} + (1 - \gamma) w^k - w^{k+1}.$$

With this notation, we can rewrite \(e_1(k,x)\) as

$$e_1(k,x) = 2\langle u^{k+1}, x\rangle + \|w^{k+1}\|^2 - \gamma\|x^{k+1}\|^2 - (1 - \gamma)\|w^k\|^2.$$

Since \(\gamma = p\), line 8 of Algorithm 1 gives \(\mathbb{E}_k[w^{k+1}] = \gamma x^{k+1} + (1 - \gamma) w^k\) and \(\mathbb{E}_k\|w^{k+1}\|^2 = \gamma\|x^{k+1}\|^2 + (1 - \gamma)\|w^k\|^2\); in particular,

$$\mathbb{E}\left[\mathbb{E}_k\left[\|w^{k+1}\|^2 - \gamma\|x^{k+1}\|^2 - (1 - \gamma)\|w^k\|^2\right]\right] = 0.$$

Using the two properties above, we obtain, for any compact set \(\mathcal{C} \subseteq X\),

$$\begin{gathered} \mathbb{E}\left[\max_{x \in \mathcal{C}} \sum_{k=0}^{K-1} e_1(k,x)\right] = 2\,\mathbb{E}\left[\max_{x \in \mathcal{C}} \sum_{k=0}^{K-1} \langle u^{k+1}, x\rangle\right] \\ + \sum_{k=0}^{K-1}\mathbb{E}\left[\|w^{k+1}\|^2 - \gamma\|x^{k+1}\|^2 - (1 - \gamma)\|w^k\|^2\right] \end{gathered}$$
$$ = 2\,\mathbb{E}\left[\max_{x \in \mathcal{C}} \sum_{k=0}^{K-1} \langle u^{k+1}, x\rangle\right].$$

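With \(\mathcal{F}_k = \sigma(\xi_0, \ldots, \xi_k, x^k)\), we have \(\mathbb{E}[u^{k+1} \mid \mathcal{F}_k] = 0\), so Lemma 1 applies. Writing \(\langle u^{k+1}, x\rangle = \langle u^{k+1}/\sqrt{2}, \sqrt{2}\,x\rangle\), applying Lemma 1 to the process \(u^{k+1}/\sqrt{2}\) with the point \(\sqrt{2}\,x^0\) and the compact set \(\sqrt{2}\,\mathcal{C}\), and multiplying the result by 2, we get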

$$\begin{gathered} \mathbb{E}\left[\max_{x \in \mathcal{C}} \sum_{k=0}^{K-1} e_1(k,x)\right] \\ \leqslant 2\max_{x \in \mathcal{C}} \|x^0 - x\|^2 + \frac{1}{2}\sum_{k=0}^{K-1} \mathbb{E}\|u^{k+1}\|^2. \end{gathered}$$
(10)

We estimate \(\mathbb{E}\|u^{k+1}\|^2\) using the fact that \(\mathbb{E}\|X - \mathbb{E}X\|^2 = \mathbb{E}\|X\|^2 - \|\mathbb{E}X\|^2\) and line 8 of Algorithm 1:

$$\mathbb{E}\|u^{k+1}\|^2 = \mathbb{E}\left[\mathbb{E}_k\|u^{k+1}\|^2\right] = \mathbb{E}\left[\mathbb{E}_k\|\mathbb{E}_k[w^{k+1}] - w^{k+1}\|^2\right]$$
$$ = \mathbb{E}\left[\mathbb{E}_k\|w^{k+1}\|^2 - \|\mathbb{E}_k[w^{k+1}]\|^2\right]$$
$$ = \mathbb{E}\left[\gamma\|x^{k+1}\|^2 + (1 - \gamma)\|w^k\|^2 - \|\gamma x^{k+1} + (1 - \gamma) w^k\|^2\right]$$
$$ = \gamma(1 - \gamma)\,\mathbb{E}\|x^{k+1} - w^k\|^2.$$
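The last computation can be checked with a short Python sketch. It assumes that line 8 of Algorithm 1 sets \(w^{k+1} = x^{k+1}\) with probability \(\gamma = p\) and keeps \(w^{k+1} = w^k\) otherwise, which is the form consistent with \(\mathbb{E}_k[w^{k+1}] = \gamma x^{k+1} + (1-\gamma)w^k\); `u_second_moment` is a hypothetical helper that evaluates the two-outcome expectation exactly.

```python
import numpy as np

def u_second_moment(x_next, w, gamma):
    """Exact E_k[u^{k+1}] and E_k||u^{k+1}||^2 for u^{k+1} = gamma x^{k+1} + (1 - gamma) w^k - w^{k+1},
    assuming w^{k+1} = x^{k+1} with probability gamma and w^{k+1} = w^k otherwise."""
    mean_w = gamma * x_next + (1 - gamma) * w           # E_k[w^{k+1}]
    outcomes = [(gamma, x_next), (1 - gamma, w)]        # (probability, value of w^{k+1})
    first = sum(prob * (mean_w - val) for prob, val in outcomes)
    second = sum(prob * np.sum((mean_w - val) ** 2) for prob, val in outcomes)
    return first, second

rng = np.random.default_rng(2)
x_next, w = rng.standard_normal((2, 5))
gamma = 0.3
first, second = u_second_moment(x_next, w, gamma)
print(np.allclose(first, 0), np.isclose(second, gamma * (1 - gamma) * np.sum((x_next - w) ** 2)))  # True True
```

The zero mean of \(u^{k+1}\) is what makes Lemma 1 applicable, and the second moment is what enters (10).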

Applying this result to (10), we get

$$\begin{gathered} \mathbb{E}\left[\max_{x \in \mathcal{C}} \sum_{k=0}^{K-1} e_1(k,x)\right] \leqslant 2\max_{x \in \mathcal{C}} \|x^0 - x\|^2 \\ + \frac{\gamma(1 - \gamma)}{2}\sum_{k=0}^{K-1} \mathbb{E}\|x^{k+1} - w^k\|^2, \end{gathered}$$

which is the claimed bound. \(\square\)

Proof of Theorem 1. We start from the identity

$$\begin{gathered} {\text{||}}{{x}^{{k + 1}}} - x{\text{|}}{{{\text{|}}}^{2}} = {\text{||}}{{x}^{k}} - x{\text{|}}{{{\text{|}}}^{2}} \\ \, + 2\langle {{x}^{{k + 1}}} - {{x}^{k}},{{x}^{{k + 1}}} - x\rangle - \,{\text{||}}{{x}^{{k + 1}}} - {{x}^{k}}{\text{|}}{{{\text{|}}}^{2}} \\ \end{gathered} $$
$$ = {\text{||}}{{x}^{k}} - x{\text{|}}{{{\text{|}}}^{2}} + 2\gamma \langle {{w}^{k}} - {{x}^{k}},{{x}^{{k + 1}}} - x\rangle - 2\eta \langle {{\Delta }^{k}},{{x}^{{k + 1}}} - x\rangle $$
$$\begin{gathered} - \,{\text{||}}{{x}^{{k + 1}}} - {{x}^{k}}{\text{|}}{{{\text{|}}}^{2}} - 2[\langle {{x}^{k}} + \gamma ({{w}^{k}} - {{x}^{k}}) \\ \, - \eta {{\Delta }^{k}} - {{x}^{{k + 1}}},{{x}^{{k + 1}}} - x\rangle ]. \\ \end{gathered} $$

From line 7 of Algorithm 1 and property (5) of the proximal operator, it follows that

$${{x}^{k}} + \gamma ({{w}^{k}} - {{x}^{k}}) - \eta {{\Delta }^{k}} - {{x}^{{k + 1}}} \in \partial (\eta g)({{x}^{{k + 1}}}).$$
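To make this step concrete, here is a minimal Python sketch of one iteration assembled only from the quantities that appear in this appendix: the batched \(\Delta^k\) (line 6), the proximal step (line 7) and the update of \(w\) (line 8). Several pieces are assumptions for illustration, not the authors' implementation: \(g = \lambda\|\cdot\|_1\), so that \(\mathrm{prox}_{\eta g}\) is soft-thresholding; the form of line 8; the initialization \(w^{-1} = x^0\); and a toy monotone problem with skew-symmetric \(F_j(x) = S_j x + c_j\). The names `prox_l1` and `iteration` are hypothetical. Since soft-thresholding is the proximal operator of \(\eta\lambda\|\cdot\|_1\), the vector \(z - x^{k+1}\) returned by the sketch should satisfy the inclusion above.

```python
import numpy as np

def prox_l1(z, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding); g = lam * ||.||_1 is an assumed example."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def iteration(rng, ops, F, state, eta, gamma, p, b, lam):
    """One hypothetical iteration; state = (x^k, x^{k-1}, w^k, w^{k-1})."""
    x, x_prev, w, w_prev = state
    batch = rng.integers(0, len(ops), size=b)            # j_1^k, ..., j_b^k i.i.d. uniform
    delta = sum(2 * ops[j](x) - ops[j](w_prev) - ops[j](x_prev) for j in batch) / b + F(w_prev)
    z = x + gamma * (w - x) - eta * delta                # argument of the prox (line 7)
    x_next = prox_l1(z, eta * lam)                       # x^{k+1} = prox_{eta g}(z)
    w_next = x_next.copy() if rng.random() < p else w    # assumed form of line 8
    return (x_next, x, w_next, w), z

# Toy monotone problem: skew-symmetric S_j (assumption for illustration).
rng = np.random.default_rng(3)
d, M, b = 4, 6, 2
B = rng.standard_normal((M, d, d))
S = B - np.transpose(B, (0, 2, 1))
c = rng.standard_normal((M, d))
ops = [lambda x, S_j=S[j], c_j=c[j]: S_j @ x + c_j for j in range(M)]
F = lambda x: sum(op(x) for op in ops) / M
L = np.linalg.norm(S.mean(axis=0), 2)
eta, p, lam = 1 / (8 * L), 1 / M, 0.5
gamma = p                                                # gamma = p, as in Lemma 3
x0 = rng.standard_normal(d)
state = (x0, x0, x0, x0)                                 # w^0 = x^{-1} = x^0; w^{-1} = x^0 assumed
for _ in range(20):
    state, z = iteration(rng, ops, F, state, eta, gamma, p, b, lam)
```

The inclusion can be verified coordinatewise: \(z_i - x^{k+1}_i = \eta\lambda\,\mathrm{sign}(x^{k+1}_i)\) where \(x^{k+1}_i \neq 0\), and \(|z_i - x^{k+1}_i| \leqslant \eta\lambda\) where \(x^{k+1}_i = 0\).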

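Since \(g(\cdot)\) is convex, any \(v \in \partial(\eta g)(x^{k+1})\) satisfies \(\langle v, x - x^{k+1}\rangle \leqslant \eta g(x) - \eta g(x^{k+1})\). Applying this to the vector above, we obtain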

$$\begin{gathered} {\text{||}}{{x}^{{k + 1}}} - x{\text{|}}{{{\text{|}}}^{2}} \leqslant {\text{||}}{{x}^{k}} - x{\text{|}}{{{\text{|}}}^{2}} \\ \, + 2\gamma \langle {{w}^{k}} - {{x}^{k}},{{x}^{{k + 1}}} - x\rangle - 2\eta \langle {{\Delta }^{k}},{{x}^{{k + 1}}} - x\rangle \\ \end{gathered} $$
$$ - \,{\text{||}}{{x}^{{k + 1}}} - {{x}^{k}}{\text{|}}{{{\text{|}}}^{2}} + 2\eta g(x) - 2\eta g({{x}^{{k + 1}}}).$$

Using \(2\gamma\langle w^k - x^k, x^{k+1} - x\rangle = 2\gamma\langle w^k - x, x^{k+1} - x\rangle - 2\gamma\langle x^k - x, x^{k+1} - x\rangle\) and the identity \(2\langle a, b\rangle = \|a\|^2 + \|b\|^2 - \|a - b\|^2\), we get

$$\begin{gathered} {\text{||}}{{x}^{{k + 1}}} - x{\text{|}}{{{\text{|}}}^{2}} \leqslant {\text{||}}{{x}^{k}} - x{\text{|}}{{{\text{|}}}^{2}} \\ + \,\gamma \left( {{\text{||}}{{w}^{k}} - x{\text{|}}{{{\text{|}}}^{2}} + \,{\text{||}}{{x}^{{k + 1}}} - x{\text{|}}{{{\text{|}}}^{2}} - \,{\text{||}}{{x}^{{k + 1}}} - {{w}^{k}}{\text{|}}{{{\text{|}}}^{2}}} \right) \\ \end{gathered} $$
$$\begin{gathered} \, - 2\eta \langle {{\Delta }^{k}},{{x}^{{k + 1}}} - x\rangle - \gamma {\text{||}}{{x}^{{k + 1}}} - x{\text{|}}{{{\text{|}}}^{2}} \\ \, - \gamma {\text{||}}{{x}^{k}} - x{\text{|}}{{{\text{|}}}^{2}} + \gamma {\text{||}}{{x}^{{k + 1}}} - {{x}^{k}}{\text{|}}{{{\text{|}}}^{2}} \\ \end{gathered} $$
$$ - \,{\text{||}}{{x}^{{k + 1}}} - {{x}^{k}}{\text{|}}{{{\text{|}}}^{2}} + 2\eta g(x) - 2\eta g({{x}^{{k + 1}}})$$
$$ = {\text{||}}{{x}^{k}} - x{\text{|}}{{{\text{|}}}^{2}} + \gamma {\text{||}}{{w}^{k}} - x{\text{|}}{{{\text{|}}}^{2}} - \gamma {\text{||}}{{x}^{k}} - x{\text{|}}{{{\text{|}}}^{2}} - \gamma {\text{||}}{{x}^{{k + 1}}} - {{w}^{k}}{\text{|}}{{{\text{|}}}^{2}}$$
$$\begin{gathered} \, - 2\eta \langle {{\Delta }^{k}},{{x}^{{k + 1}}} - x\rangle - (1 - \gamma ){\text{||}}{{x}^{{k + 1}}} - {{x}^{k}}{\text{|}}{{{\text{|}}}^{2}} \\ \, + 2\eta g(x) - 2\eta g({{x}^{{k + 1}}}). \\ \end{gathered} $$
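The two algebraic identities used so far (the expansion of \(\|x^{k+1} - x\|^2\) at the start of the proof and the rewriting of the \(\gamma\)-terms via \(2\langle a, b\rangle = \|a\|^2 + \|b\|^2 - \|a - b\|^2\)) hold for arbitrary vectors, so they can be checked numerically on random data. The \(\Delta^k\) and \(g\) terms are identical on both sides and are omitted. A minimal Python sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
x, x_k, x_next, w_k = rng.standard_normal((4, 6))   # arbitrary test vectors
gamma = 0.25
sq = lambda v: float(np.dot(v, v))                  # squared Euclidean norm

# ||x^{k+1}-x||^2 = ||x^k-x||^2 + 2<x^{k+1}-x^k, x^{k+1}-x> - ||x^{k+1}-x^k||^2
lhs1 = sq(x_next - x)
rhs1 = sq(x_k - x) + 2 * np.dot(x_next - x_k, x_next - x) - sq(x_next - x_k)

# gamma-terms before and after applying 2<a,b> = ||a||^2 + ||b||^2 - ||a-b||^2
before = 2 * gamma * np.dot(w_k - x_k, x_next - x) - sq(x_next - x_k)
after = (gamma * sq(w_k - x) - gamma * sq(x_k - x)
         - gamma * sq(x_next - w_k) - (1 - gamma) * sq(x_next - x_k))
print(np.isclose(lhs1, rhs1), np.isclose(before, after))  # True True
```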

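Splitting \(\Delta^k = \mathbb{E}_k[\Delta^k] + (\Delta^k - \mathbb{E}_k[\Delta^k])\), writing \(x^{k+1} - x = (x^{k+1} - x^k) + (x^k - x)\) in the stochastic part, and then using the expression (9) for \(\mathbb{E}_k[\Delta^k]\) from Lemma 2, we obtain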

$$\begin{gathered} \|x^{k+1} - x\|^2 \leqslant \|x^k - x\|^2 + \gamma\|w^k - x\|^2 \\ - \gamma\|x^k - x\|^2 - \gamma\|w^k - x^{k+1}\|^2 \end{gathered}$$
$$ - \,2\eta \langle {{\mathbb{E}}_{k}}[{{\Delta }^{k}}],{{x}^{{k + 1}}} - x\rangle - (1 - \gamma ){\text{||}}{{x}^{{k + 1}}} - {{x}^{k}}{\text{|}}{{{\text{|}}}^{2}}$$
$$ + \,2\eta \langle {{\mathbb{E}}_{k}}[{{\Delta }^{k}}] - {{\Delta }^{k}},{{x}^{{k + 1}}} - {{x}^{k}}\rangle + 2\eta \langle {{\mathbb{E}}_{k}}[{{\Delta }^{k}}] - {{\Delta }^{k}},{{x}^{k}} - x\rangle $$
$$ + \,2\eta g(x) - 2\eta g({{x}^{{k + 1}}})$$
$$ = {\text{||}}{{x}^{k}} - x{\text{|}}{{{\text{|}}}^{2}} + \gamma {\text{||}}{{w}^{k}} - x{\text{|}}{{{\text{|}}}^{2}} - \gamma {\text{||}}{{x}^{k}} - x{\text{|}}{{{\text{|}}}^{2}} - \gamma {\text{||}}{{w}^{k}} - {{x}^{{k + 1}}}{\text{|}}{{{\text{|}}}^{2}}$$
$$ - \,2\eta \left\langle {F({{x}^{k}}) + F({{x}^{k}}) - F({{x}^{{k - 1}}}),{{x}^{{k + 1}}} - x} \right\rangle $$
$$ - \,(1 - \gamma ){\text{||}}{{x}^{{k + 1}}} - {{x}^{k}}{\text{|}}{{{\text{|}}}^{2}} + 2\eta \left\langle {{{\mathbb{E}}_{k}}[{{\Delta }^{k}}] - {{\Delta }^{k}},{{x}^{{k + 1}}} - {{x}^{k}}} \right\rangle $$
$$ + \,2\eta \left\langle {{{\mathbb{E}}_{k}}[{{\Delta }^{k}}] - {{\Delta }^{k}},{{x}^{k}} - x} \right\rangle $$
$$ + \,2\eta g(x) - 2\eta g({{x}^{{k + 1}}})$$
$$ = {\text{||}}{{x}^{k}} - x{\text{|}}{{{\text{|}}}^{2}} + \gamma {\text{||}}{{w}^{k}} - x{\text{|}}{{{\text{|}}}^{2}} - \gamma {\text{||}}{{x}^{k}} - x{\text{|}}{{{\text{|}}}^{2}} - \gamma {\text{||}}{{w}^{k}} - {{x}^{{k + 1}}}{\text{|}}{{{\text{|}}}^{2}}$$
$$ - \,2\eta \langle F({{x}^{k}}) - F({{x}^{{k + 1}}}) + F({{x}^{k}}) - F({{x}^{{k - 1}}}),{{x}^{{k + 1}}} - x\rangle $$
$$ - \,2\eta \langle F({{x}^{{k + 1}}}),{{x}^{{k + 1}}} - x\rangle $$
$$ - \,(1 - \gamma ){\text{||}}{{x}^{{k + 1}}} - {{x}^{k}}{\text{|}}{{{\text{|}}}^{2}} + 2\eta \left\langle {{{\mathbb{E}}_{k}}[{{\Delta }^{k}}] - {{\Delta }^{k}},{{x}^{{k + 1}}} - {{x}^{k}}} \right\rangle $$
$$ + \,2\eta \left\langle {{{\mathbb{E}}_{k}}[{{\Delta }^{k}}] - {{\Delta }^{k}},{{x}^{k}} - x} \right\rangle + 2\eta g(x) - 2\eta g({{x}^{{k + 1}}}).$$

Rearranging terms, we obtain

$$2\eta (g({{x}^{{k + 1}}}) - g(x)) + 2\eta \langle F({{x}^{{k + 1}}}),{{x}^{{k + 1}}} - x\rangle $$
$$\begin{gathered} \leqslant {\text{||}}{{x}^{k}} - x{\text{|}}{{{\text{|}}}^{2}} + \gamma {\text{||}}{{w}^{k}} - x{\text{|}}{{{\text{|}}}^{2}} - \gamma {\text{||}}{{x}^{k}} - x{\text{|}}{{{\text{|}}}^{2}} \\ \, - \gamma {\text{||}}{{w}^{k}} - {{x}^{{k + 1}}}{\text{|}}{{{\text{|}}}^{2}} - \,{\text{||}}{{x}^{{k + 1}}} - x{\text{|}}{{{\text{|}}}^{2}} \\ \end{gathered} $$
$$ - \,2\eta \langle F({{x}^{k}}) - F({{x}^{{k + 1}}}) + F({{x}^{k}}) - F({{x}^{{k - 1}}}),{{x}^{{k + 1}}} - x\rangle $$
$$ - \,(1 - \gamma ){\text{||}}{{x}^{{k + 1}}} - {{x}^{k}}{\text{|}}{{{\text{|}}}^{2}} + 2\eta \left\langle {{{\mathbb{E}}_{k}}[{{\Delta }^{k}}] - {{\Delta }^{k}},{{x}^{{k + 1}}} - {{x}^{k}}} \right\rangle $$
$$ + \,2\eta \left\langle {{{\mathbb{E}}_{k}}[{{\Delta }^{k}}] - {{\Delta }^{k}},{{x}^{k}} - x} \right\rangle $$
$$ = (1 - \gamma ){{\left\| {{{x}^{k}} - x} \right\|}^{2}} + {{\left\| {{{w}^{k}} - x} \right\|}^{2}}$$
$$ - \,\,(1 - \gamma ){{\left\| {{{x}^{{k + 1}}} - x} \right\|}^{2}} - {{\left\| {{{w}^{{k + 1}}} - x} \right\|}^{2}}$$
$$\begin{gathered} + {{\left\| {{{w}^{{k + 1}}} - x} \right\|}^{2}} - \gamma {{\left\| {{{x}^{{k + 1}}} - x} \right\|}^{2}} \\ \, - (1 - \gamma ){{\left\| {{{w}^{k}} - x} \right\|}^{2}} - \gamma {{\left\| {{{w}^{k}} - {{x}^{{k + 1}}}} \right\|}^{2}} \\ \end{gathered} $$
$$\begin{gathered} - \,\,2\eta \langle F({{x}^{k}}) - F({{x}^{{k + 1}}}),{{x}^{{k + 1}}} - x\rangle \\ \, + 2\eta \langle F({{x}^{{k - 1}}}) - F({{x}^{k}}),{{x}^{{k + 1}}} - x\rangle \\ \end{gathered} $$
$$ - \,\,(1 - \gamma ){\text{||}}{{x}^{{k + 1}}} - {{x}^{k}}{\text{|}}{{{\text{|}}}^{2}} + 2\eta \left\langle {{{\mathbb{E}}_{k}}[{{\Delta }^{k}}] - {{\Delta }^{k}},{{x}^{{k + 1}}} - {{x}^{k}}} \right\rangle $$
$$ + \,2\eta \left\langle {{{\mathbb{E}}_{k}}\left[ {{{\Delta }^{k}}} \right] - {{\Delta }^{k}},{{x}^{k}} - x} \right\rangle $$
$$ = (1 - \gamma ){{\left\| {{{x}^{k}} - x} \right\|}^{2}} + {{\left\| {{{w}^{k}} - x} \right\|}^{2}}$$
$$ - \,(1 - \gamma ){{\left\| {{{x}^{{k + 1}}} - x} \right\|}^{2}} - {{\left\| {{{w}^{{k + 1}}} - x} \right\|}^{2}}$$
$$\begin{gathered} + \,\,{{\left\| {{{w}^{{k + 1}}} - x} \right\|}^{2}} - \gamma {{\left\| {{{x}^{{k + 1}}} - x} \right\|}^{2}} \\ \, - (1 - \gamma ){{\left\| {{{w}^{k}} - x} \right\|}^{2}} - \gamma {{\left\| {{{w}^{k}} - {{x}^{{k + 1}}}} \right\|}^{2}} \\ \end{gathered} $$
$$ - \,2\eta \langle F({{x}^{k}}) - F({{x}^{{k + 1}}}),{{x}^{{k + 1}}} - x\rangle $$
$$\begin{gathered} \, + 2\eta \langle F({{x}^{{k - 1}}}) - F({{x}^{k}}),{{x}^{k}} - x\rangle \\ \, + 2\eta \langle F({{x}^{{k - 1}}}) - F({{x}^{k}}),{{x}^{{k + 1}}} - {{x}^{k}}\rangle \\ \end{gathered} $$
$$ - \,(1 - \gamma ){{\left\| {{{x}^{{k + 1}}} - {{x}^{k}}} \right\|}^{2}} + 2\eta \left\langle {{{\mathbb{E}}_{k}}[{{\Delta }^{k}}] - {{\Delta }^{k}},{{x}^{{k + 1}}} - {{x}^{k}}} \right\rangle $$
$$ + \,2\eta \left\langle {{{\mathbb{E}}_{k}}[{{\Delta }^{k}}] - {{\Delta }^{k}},{{x}^{k}} - x} \right\rangle .$$

Summing over \(k = 0, \ldots, K-1\) and dividing by \(K\), we get

$$2\eta \cdot \frac{1}{K}\sum\limits_{k = 0}^{K - 1} \left[ {\langle F({{x}^{{k + 1}}}),{{x}^{{k + 1}}} - x\rangle + g({{x}^{{k + 1}}}) - g(x)} \right]$$
$$ \leqslant \frac{1}{K}\sum\limits_{k = 0}^{K - 1} \left[ {(1 - \gamma ){{{\left\| {{{x}^{k}} - x} \right\|}}^{2}} + {{{\left\| {{{w}^{k}} - x} \right\|}}^{2}}} \right]$$
$$ - \frac{1}{K}\sum\limits_{k = 0}^{K - 1} \left[ {(1 - \gamma ){{{\left\| {{{x}^{{k + 1}}} - x} \right\|}}^{2}} + {{{\left\| {{{w}^{{k + 1}}} - x} \right\|}}^{2}}} \right]$$
$$ - 2\eta \cdot \frac{1}{K}\sum\limits_{k = 0}^{K - 1} \langle F({{x}^{k}}) - F({{x}^{{k + 1}}}),{{x}^{{k + 1}}} - x\rangle $$
$$ + \,2\eta \cdot \frac{1}{K}\sum\limits_{k = 0}^{K - 1} \langle F({{x}^{{k - 1}}}) - F({{x}^{k}}),{{x}^{k}} - x\rangle $$
$$ + \frac{1}{K}\sum\limits_{k = 0}^{K - 1} \left[ {{{{\left\| {{{w}^{{k + 1}}} - x} \right\|}}^{2}} - \gamma {{{\left\| {{{x}^{{k + 1}}} - x} \right\|}}^{2}} - (1 - \gamma ){{{\left\| {{{w}^{k}} - x} \right\|}}^{2}}} \right]$$
$$ + \,2\eta \cdot \frac{1}{K}\sum\limits_{k = 0}^{K - 1} \left\langle {{{\mathbb{E}}_{k}}[{{\Delta }^{k}}] - {{\Delta }^{k}},{{x}^{k}} - x} \right\rangle $$
$$ - \frac{\gamma }{K}\sum\limits_{k = 0}^{K - 1} {{\left\| {{{w}^{k}} - {{x}^{{k + 1}}}} \right\|}^{2}} - \frac{{1 - \gamma }}{K}\sum\limits_{k = 0}^{K - 1} {{\left\| {{{x}^{{k + 1}}} - {{x}^{k}}} \right\|}^{2}}$$
$$ + \,2\eta \cdot \frac{1}{K}\sum\limits_{k = 0}^{K - 1} \langle F({{x}^{{k - 1}}}) - F({{x}^{k}}),{{x}^{{k + 1}}} - {{x}^{k}}\rangle $$
$$ + \,2\eta \cdot \frac{1}{K}\sum\limits_{k = 0}^{K - 1} \left\langle {{{\mathbb{E}}_{k}}[{{\Delta }^{k}}] - {{\Delta }^{k}},{{x}^{{k + 1}}} - {{x}^{k}}} \right\rangle $$
$$ = \frac{{2 - \gamma }}{K}{{\left\| {{{x}^{0}} - x} \right\|}^{2}} - \frac{{1 - \gamma }}{K}{{\left\| {{{x}^{K}} - x} \right\|}^{2}} - \frac{1}{K}{{\left\| {{{w}^{K}} - x} \right\|}^{2}}$$
$$ - \,2\eta \cdot \frac{1}{K}\langle F({{x}^{{K - 1}}}) - F({{x}^{K}}),{{x}^{K}} - x\rangle $$
$$ + \frac{1}{K}\sum\limits_{k = 0}^{K - 1} \left[ {{{{\left\| {{{w}^{{k + 1}}} - x} \right\|}}^{2}} - \gamma {{{\left\| {{{x}^{{k + 1}}} - x} \right\|}}^{2}} - (1 - \gamma ){{{\left\| {{{w}^{k}} - x} \right\|}}^{2}}} \right]$$
$$ + \,2\eta \cdot \frac{1}{K}\sum\limits_{k = 0}^{K - 1} \left\langle {{{\mathbb{E}}_{k}}[{{\Delta }^{k}}] - {{\Delta }^{k}},{{x}^{k}} - x} \right\rangle $$
$$ - \frac{\gamma }{K}\sum\limits_{k = 0}^{K - 1} {{\left\| {{{w}^{k}} - {{x}^{{k + 1}}}} \right\|}^{2}} - \frac{{1 - \gamma }}{K}\sum\limits_{k = 0}^{K - 1} {{\left\| {{{x}^{{k + 1}}} - {{x}^{k}}} \right\|}^{2}}$$
$$ + \,2\eta \cdot \frac{1}{K}\sum\limits_{k = 0}^{K - 1} \langle F({{x}^{{k - 1}}}) - F({{x}^{k}}),{{x}^{{k + 1}}} - {{x}^{k}}\rangle $$
$$ + \,2\eta \cdot \frac{1}{K}\sum\limits_{k = 0}^{K - 1} \left\langle {{{\mathbb{E}}_{k}}[{{\Delta }^{k}}] - {{\Delta }^{k}},{{x}^{{k + 1}}} - {{x}^{k}}} \right\rangle .$$
(11)

Here we also used the initialization \({{w}^{0}} = {{x}^{{ - 1}}} = {{x}^{0}}\) of Algorithm 1. Applying Young’s inequality, the \(L\)-Lipschitzness of \(F\), and the step-size condition \(\eta \leqslant \frac{1}{{8L}}\) from the theorem, we obtain

$$ - \,2\eta \langle F(x^{K - 1}) - F(x^{K}),x^{K} - x\rangle \leqslant 2\eta^{2}\left\| F(x^{K - 1}) - F(x^{K}) \right\|^{2} + \frac{1}{2}\left\| x^{K} - x \right\|^{2}$$
$$ \leqslant 2{{\eta }^{2}}{{L}^{2}}{{\left\| {{{x}^{{K - 1}}} - {{x}^{K}}} \right\|}^{2}} + \frac{1}{2}{{\left\| {{{x}^{K}} - x} \right\|}^{2}}$$
$$ \leqslant \frac{1}{{32}}{{\left\| {{{x}^{{K - 1}}} - {{x}^{K}}} \right\|}^{2}} + \frac{1}{2}{{\left\| {{{x}^{K}} - x} \right\|}^{2}}.$$
(12)
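Inequality (12) can be sanity-checked numerically. The sketch below is an illustration, not part of the proof: it assumes a linear operator \(F(z) = Az\) with a random matrix \(A\) (the section does not fix \(F\)), takes \(L\) to be the Frobenius norm of \(A\), which upper-bounds the Lipschitz constant of \(F\), sets \(\eta = \frac{1}{8L}\), and returns the largest sampled value of the left-hand side of (12) minus its final right-hand side. The helper names are hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def check_inequality_12(n=5, trials=10_000, seed=0):
    """Largest sampled value of LHS(12) - RHS(12); (12) predicts it is <= 0.

    Illustrative assumption: F(z) = A z for a random matrix A. The Frobenius
    norm of A bounds its spectral norm, so F is L-Lipschitz with this L.
    """
    rng = random.Random(seed)
    A = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    L = sum(a * a for row in A for a in row) ** 0.5
    eta = 1.0 / (8.0 * L)  # largest step allowed by eta <= 1/(8L)

    def F(z):
        return [dot(row, z) for row in A]

    worst = float("-inf")
    for _ in range(trials):
        x_prev = [rng.gauss(0.0, 1.0) for _ in range(n)]  # plays x^{K-1}
        x_last = [rng.gauss(0.0, 1.0) for _ in range(n)]  # plays x^K
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]       # reference point x
        lhs = -2.0 * eta * dot(sub(F(x_prev), F(x_last)), sub(x_last, x))
        step, ref = sub(x_prev, x_last), sub(x_last, x)
        rhs = dot(step, step) / 32.0 + dot(ref, ref) / 2.0
        worst = max(worst, lhs - rhs)
    return worst


print(check_inequality_12())
```

Because the Frobenius norm overestimates the spectral norm, the sampled maximum is typically well below zero; a positive value beyond rounding error would point to a wrong constant.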

Combining (11) and (12), we get

$$2\eta \cdot \frac{1}{K}\sum\limits_{k = 0}^{K - 1} [\langle F({{x}^{{k + 1}}}),{{x}^{{k + 1}}} - x\rangle + g({{x}^{{k + 1}}}) - g(x)]$$
$$ \leqslant \frac{{2 - \gamma }}{K}{{\left\| {{{x}^{0}} - x} \right\|}^{2}} - \frac{1}{K}\left( {\frac{1}{2} - \gamma } \right){{\left\| {{{x}^{K}} - x} \right\|}^{2}} - \frac{1}{K}{{\left\| {{{w}^{K}} - x} \right\|}^{2}}$$
$$ + \,\frac{1}{32K}\left\| x^{K - 1} - x^{K} \right\|^{2} + \frac{1}{K}\sum_{k = 0}^{K - 1} \left[ \left\| w^{k + 1} - x \right\|^{2} - \gamma \left\| x^{k + 1} - x \right\|^{2} - (1 - \gamma)\left\| w^{k} - x \right\|^{2} \right]$$
$$ + \,2\eta \cdot \frac{1}{K}\sum\limits_{k = 0}^{K - 1} \left\langle {{{\mathbb{E}}_{k}}[{{\Delta }^{k}}] - {{\Delta }^{k}},{{x}^{k}} - x} \right\rangle $$
$$ - \frac{\gamma }{K}\sum\limits_{k = 0}^{K - 1} {{\left\| {{{w}^{k}} - {{x}^{{k + 1}}}} \right\|}^{2}} - \frac{{1 - \gamma }}{K}\sum\limits_{k = 0}^{K - 1} {{\left\| {{{x}^{{k + 1}}} - {{x}^{k}}} \right\|}^{2}}$$
$$ + \,2\eta \cdot \frac{1}{K}\sum\limits_{k = 0}^{K - 1} \langle F({{x}^{{k - 1}}}) - F({{x}^{k}}),{{x}^{{k + 1}}} - {{x}^{k}}\rangle $$
$$ + \,2\eta \cdot \frac{1}{K}\sum\limits_{k = 0}^{K - 1} \left\langle {{{\mathbb{E}}_{k}}[{{\Delta }^{k}}] - {{\Delta }^{k}},{{x}^{{k + 1}}} - {{x}^{k}}} \right\rangle $$
$$ \leqslant \frac{{2 - \gamma }}{K}{{\left\| {{{x}^{0}} - x} \right\|}^{2}}$$
$$ + \frac{1}{K}\sum\limits_{k = 0}^{K - 1} \left[ {{{{\left\| {{{w}^{{k + 1}}} - x} \right\|}}^{2}} - \gamma {{{\left\| {{{x}^{{k + 1}}} - x} \right\|}}^{2}} - (1 - \gamma ){{{\left\| {{{w}^{k}} - x} \right\|}}^{2}}} \right]$$
$$ + \,2\eta \cdot \frac{1}{K}\sum\limits_{k = 0}^{K - 1} \left\langle {{{\mathbb{E}}_{k}}[{{\Delta }^{k}}] - {{\Delta }^{k}},{{x}^{k}} - x} \right\rangle $$
$$ - \frac{\gamma}{K}\sum_{k = 0}^{K - 1} \left\| w^{k} - x^{k + 1} \right\|^{2} - \frac{1 - \gamma}{K}\sum_{k = 0}^{K - 1} \left\| x^{k + 1} - x^{k} \right\|^{2} + \frac{1}{32K}\left\| x^{K - 1} - x^{K} \right\|^{2}$$
$$ + \,2\eta \cdot \frac{1}{K}\sum\limits_{k = 0}^{K - 1} \langle F({{x}^{{k - 1}}}) - F({{x}^{k}}),{{x}^{{k + 1}}} - {{x}^{k}}\rangle $$
$$ + \,2\eta \cdot \frac{1}{K}\sum\limits_{k = 0}^{K - 1} \left\langle {{{\mathbb{E}}_{k}}[{{\Delta }^{k}}] - {{\Delta }^{k}},{{x}^{{k + 1}}} - {{x}^{k}}} \right\rangle .$$
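The only new bookkeeping in this combination is the coefficient of \(\|x^K - x\|^2\): \(-(1-\gamma)/K\) from (11) plus \(1/(2K)\) from (12). The exact-arithmetic check below (Python's standard fractions module; the sample values of \(\gamma\) are arbitrary) confirms that it equals \(-(1/2 - \gamma)/K\), which is non-positive for \(\gamma \leqslant 1/2\), so this term, like \(-\|w^K - x\|^2/K\), can be dropped in the last inequality. The function name is hypothetical.

```python
from fractions import Fraction


def xK_coefficient(gamma):
    """K times the coefficient of ||x^K - x||^2 after combining (11) and (12)."""
    from_11 = -(1 - gamma)    # -(1 - gamma)/K in (11)
    from_12 = Fraction(1, 2)  # +1/(2K) from the Young step in (12)
    return from_11 + from_12


for gamma in (Fraction(1, 64), Fraction(1, 16), Fraction(1, 4), Fraction(1, 2)):
    c = xK_coefficient(gamma)
    assert c == -(Fraction(1, 2) - gamma)  # matches the combined display
    assert c <= 0                           # so the term can be dropped
    print(gamma, c)
```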

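Next, we use the monotonicity of \(F\), which gives \(\langle F({{x}^{{k + 1}}}),{{x}^{{k + 1}}} - x\rangle \geqslant \langle F(x),{{x}^{{k + 1}}} - x\rangle \), and apply Jensen’s inequality to the convex function \(g\) to obtain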

$$\begin{gathered} 2\eta \left[ {\left\langle {F(x),\frac{1}{K}\sum\limits_{k = 0}^{K - 1} {{x}^{{k + 1}}} - x} \right\rangle + g\left( {\frac{1}{K}\sum\limits_{k = 0}^{K - 1} {{x}^{{k + 1}}}} \right) - g(x)} \right] \\ \leqslant \frac{{2 - \gamma }}{K}{{\left\| {{{x}^{0}} - x} \right\|}^{2}} \\ \end{gathered} $$
$$ + \frac{1}{K}\sum\limits_{k = 0}^{K - 1} \left[ {{{{\left\| {{{w}^{{k + 1}}} - x} \right\|}}^{2}} - \gamma {{{\left\| {{{x}^{{k + 1}}} - x} \right\|}}^{2}} - (1 - \gamma ){{{\left\| {{{w}^{k}} - x} \right\|}}^{2}}} \right]$$
$$ + \,2\eta \cdot \frac{1}{K}\sum\limits_{k = 0}^{K - 1} \left\langle {{{\mathbb{E}}_{k}}[{{\Delta }^{k}}] - {{\Delta }^{k}},{{x}^{k}} - x} \right\rangle $$
$$ - \frac{\gamma}{K}\sum_{k = 0}^{K - 1} \left\| w^{k} - x^{k + 1} \right\|^{2} - \frac{1 - \gamma}{K}\sum_{k = 0}^{K - 1} \left\| x^{k + 1} - x^{k} \right\|^{2} + \frac{1}{32K}\left\| x^{K - 1} - x^{K} \right\|^{2}$$
$$ + \,2\eta \cdot \frac{1}{K}\sum\limits_{k = 0}^{K - 1} \langle F({{x}^{{k - 1}}}) - F({{x}^{k}}),{{x}^{{k + 1}}} - {{x}^{k}}\rangle $$
$$ + \,2\eta \cdot \frac{1}{K}\sum\limits_{k = 0}^{K - 1} \left\langle {{{\mathbb{E}}_{k}}[{{\Delta }^{k}}] - {{\Delta }^{k}},{{x}^{{k + 1}}} - {{x}^{k}}} \right\rangle .$$
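The monotonicity and Jensen step can be illustrated on a toy instance. The sketch assumes, purely for illustration, a linear monotone operator \(F(z) = Az\), where \(A\) is a skew-symmetric matrix plus a positive semidefinite one, and \(g = \|\cdot\|_1\). For random points it checks that the average of \(\langle F(x^{k+1}), x^{k+1} - x\rangle + g(x^{k+1}) - g(x)\) is never smaller than \(\langle F(x), \bar{x} - x\rangle + g(\bar{x}) - g(x)\), where \(\bar{x}\) is the average of the points. Helper names are hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def monotone_matrix(n, rng):
    """A = (S - S^T) + B^T B: skew-symmetric plus PSD, so F(z) = A z is monotone."""
    S = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    B = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    return [[S[i][j] - S[j][i] + sum(B[k][i] * B[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]


def l1(z):
    """Illustrative convex regularizer g(z) = ||z||_1."""
    return sum(abs(t) for t in z)


def check_monotone_jensen_step(n=4, K=6, trials=2000, seed=0):
    """Smallest sampled value of (averaged terms) - (terms at the average); expected >= 0."""
    rng = random.Random(seed)
    A = monotone_matrix(n, rng)

    def F(z):
        return [dot(row, z) for row in A]

    worst = float("inf")
    for _ in range(trials):
        iterates = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(K)]  # x^1..x^K
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        averaged = sum(dot(F(y), sub(y, x)) + l1(y) - l1(x) for y in iterates) / K
        x_bar = [sum(coords) / K for coords in zip(*iterates)]
        at_average = dot(F(x), sub(x_bar, x)) + l1(x_bar) - l1(x)
        worst = min(worst, averaged - at_average)
    return worst


print(check_monotone_jensen_step())
```

The difference splits into a monotonicity part, \(\frac{1}{K}\sum_k \langle F(x^{k+1}) - F(x), x^{k+1} - x\rangle\), and a Jensen gap for \(g\); both are non-negative, and the linear term in \(F(x)\) averages exactly to \(\langle F(x), \bar{x} - x\rangle\).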

Introducing the notation \({{\bar {x}}^{K}} = \frac{1}{K}\sum\limits_{k = 0}^{K - 1} {{x}^{{k + 1}}}\) and taking the maximum over \(x \in \mathcal{C}\), we obtain

$$2\eta \,\text{Gap}(\bar{x}^{K}) \leqslant \max_{x \in \mathcal{C}} \left\{ \frac{2 - \gamma}{K}\left\| x^{0} - x \right\|^{2} \right.$$
$$ + \,\,\frac{1}{K}\sum\limits_{k = 0}^{K - 1} \left[ {{{{\left\| {{{w}^{{k + 1}}} - x} \right\|}}^{2}} - \gamma {{{\left\| {{{x}^{{k + 1}}} - x} \right\|}}^{2}} - (1 - \gamma ){{{\left\| {{{w}^{k}} - x} \right\|}}^{2}}} \right]\,$$
$$\left. { + \,2\eta \cdot \frac{1}{K}\sum\limits_{k = 0}^{K - 1} \left\langle {{{\mathbb{E}}_{k}}[{{\Delta }^{k}}] - {{\Delta }^{k}},{{x}^{k}} - x} \right\rangle } \right\} + \frac{1}{{32K}}{{\left\| {{{x}^{{K - 1}}} - {{x}^{K}}} \right\|}^{2}}$$
$$ - \frac{\gamma }{K}\sum\limits_{k = 0}^{K - 1} {{\left\| {{{w}^{k}} - {{x}^{{k + 1}}}} \right\|}^{2}} - \frac{{1 - \gamma }}{K}\sum\limits_{k = 0}^{K - 1} {{\left\| {{{x}^{{k + 1}}} - {{x}^{k}}} \right\|}^{2}}$$
$$ + \,2\eta \cdot \frac{1}{K}\sum\limits_{k = 0}^{K - 1} \langle F({{x}^{{k - 1}}}) - F({{x}^{k}}),{{x}^{{k + 1}}} - {{x}^{k}}\rangle $$
$$ + \,2\eta \cdot \frac{1}{K}\sum\limits_{k = 0}^{K - 1} \left\langle {{{\mathbb{E}}_{k}}[{{\Delta }^{k}}] - {{\Delta }^{k}},{{x}^{{k + 1}}} - {{x}^{k}}} \right\rangle $$
$$ \leqslant \mathop {\max }\limits_{x \in \mathcal{C}} \left\{ {\frac{{2 - \gamma }}{K}{{{\left\| {{{x}^{0}} - x} \right\|}}^{2}}} \right\}$$
$$ + \max_{x \in \mathcal{C}} \left\{ \frac{1}{K}\sum_{k = 0}^{K - 1} \left[ \left\| w^{k + 1} - x \right\|^{2} - \gamma \left\| x^{k + 1} - x \right\|^{2} - (1 - \gamma)\left\| w^{k} - x \right\|^{2} \right] \right\}$$
$$\begin{gathered} + \,2\eta \mathop {\max }\limits_{x \in \mathcal{C}} \left\{ {\frac{1}{K}\sum\limits_{k = 0}^{K - 1} \left\langle {{{\mathbb{E}}_{k}}[{{\Delta }^{k}}] - {{\Delta }^{k}},{{x}^{k}} - x} \right\rangle } \right\} \\ + \frac{1}{{32K}}{{\left\| {{{x}^{{K - 1}}} - {{x}^{K}}} \right\|}^{2}} \\ \end{gathered} $$
$$ - \frac{\gamma }{K}\sum\limits_{k = 0}^{K - 1} {{\left\| {{{w}^{k}} - {{x}^{{k + 1}}}} \right\|}^{2}} - \frac{{1 - \gamma }}{K}\sum\limits_{k = 0}^{K - 1} {{\left\| {{{x}^{{k + 1}}} - {{x}^{k}}} \right\|}^{2}}$$
$$ + \,2\eta \cdot \frac{1}{K}\sum\limits_{k = 0}^{K - 1} \langle F({{x}^{{k - 1}}}) - F({{x}^{k}}),{{x}^{{k + 1}}} - {{x}^{k}}\rangle $$
$$ + \,2\eta \cdot \frac{1}{K}\sum\limits_{k = 0}^{K - 1} \left\langle {{{\mathbb{E}}_{k}}[{{\Delta }^{k}}] - {{\Delta }^{k}},{{x}^{{k + 1}}} - {{x}^{k}}} \right\rangle .\,$$

Here we also used that the maximum of a sum does not exceed the sum of the maxima. Taking the full expectation, we get

$$2\eta \mathbb{E}[{\text{Gap}}({{\bar {x}}^{K}})] \leqslant \mathbb{E}\left[ {\mathop {\max }\limits_{x \in \mathcal{C}} \left\{ {\frac{{2 - \gamma }}{K}{{{\left\| {{{x}^{0}} - x} \right\|}}^{2}}} \right\}} \right]$$
$$ + \,\mathbb{E}\left[ \max_{x \in \mathcal{C}} \left\{ \frac{1}{K}\sum_{k = 0}^{K - 1} \left[ \left\| w^{k + 1} - x \right\|^{2} - \gamma \left\| x^{k + 1} - x \right\|^{2} - (1 - \gamma)\left\| w^{k} - x \right\|^{2} \right] \right\} \right]$$
$$ + \,2\eta \mathbb{E}\left[ \max_{x \in \mathcal{C}} \left\{ \frac{1}{K}\sum_{k = 0}^{K - 1} \left\langle \mathbb{E}_{k}[\Delta^{k}] - \Delta^{k}, x^{k} - x \right\rangle \right\} \right] + \frac{1}{32K}\mathbb{E}\left[ \left\| x^{K - 1} - x^{K} \right\|^{2} \right]$$
$$\, - \mathbb{E}\left[ {\frac{\gamma }{K}\sum\limits_{k = 0}^{K - 1} {{{\left\| {{{w}^{k}} - {{x}^{{k + 1}}}} \right\|}}^{2}} + \frac{{1 - \gamma }}{K}\sum\limits_{k = 0}^{K - 1} {{{\left\| {{{x}^{{k + 1}}} - {{x}^{k}}} \right\|}}^{2}}} \right]$$
$$ + \,2\eta \mathbb{E}\left[ {\frac{1}{K}\sum\limits_{k = 0}^{K - 1} \langle F({{x}^{{k - 1}}}) - F({{x}^{k}}),{{x}^{{k + 1}}} - {{x}^{k}}\rangle } \right]$$
$$ + \,2\eta \mathbb{E}\left[ {\frac{1}{K}\sum\limits_{k = 0}^{K - 1} \left\langle {{{\mathbb{E}}_{k}}[{{\Delta }^{k}}] - {{\Delta }^{k}},{{x}^{{k + 1}}} - {{x}^{k}}} \right\rangle } \right].$$

Applying Lemma 3 to the second line of the previous estimate and Lemma 1 to the third line, we get

$$2\eta \mathbb{E}[{\text{Gap}}({{\bar {x}}^{K}})] \leqslant \mathbb{E}\left[ {\mathop {\max }\limits_{x \in \mathcal{C}} \left\{ {\frac{{2 - \gamma }}{K}{{{\left\| {{{x}^{0}} - x} \right\|}}^{2}}} \right\}} \right]$$
$$ + \max_{x \in \mathcal{C}} \left\{ \frac{2}{K}\left\| x - x^{0} \right\|^{2} \right\} + \frac{\gamma (1 - \gamma)}{2K}\sum_{k = 0}^{K - 1} \mathbb{E}\left[ \left\| x^{k + 1} - w^{k} \right\|^{2} \right]$$
$$ + \max_{x \in \mathcal{C}} \left\{ \frac{1}{K}\left\| x - x^{0} \right\|^{2} \right\} + \frac{\eta^{2}}{K}\sum_{k = 0}^{K - 1} \mathbb{E}\left[ \left\| \mathbb{E}_{k}[\Delta^{k}] - \Delta^{k} \right\|^{2} \right]$$
$$ - \,\mathbb{E}\left[ \frac{\gamma}{K}\sum_{k = 0}^{K - 1} \left\| w^{k} - x^{k + 1} \right\|^{2} + \frac{1 - \gamma}{K}\sum_{k = 0}^{K - 1} \left\| x^{k + 1} - x^{k} \right\|^{2} \right] + \frac{1}{32K}\mathbb{E}\left[ \left\| x^{K - 1} - x^{K} \right\|^{2} \right]$$
$$ + \,2\eta \mathbb{E}\left[ {\frac{1}{K}\sum\limits_{k = 0}^{K - 1} \langle F({{x}^{{k - 1}}}) - F({{x}^{k}}),{{x}^{{k + 1}}} - {{x}^{k}}\rangle } \right]$$
$$ + \,2\eta \mathbb{E}\left[ {\frac{1}{K}\sum\limits_{k = 0}^{K - 1} \left\langle {{{\mathbb{E}}_{k}}[{{\Delta }^{k}}] - {{\Delta }^{k}},{{x}^{{k + 1}}} - {{x}^{k}}} \right\rangle } \right]$$
$$ \leqslant \frac{4}{K}\mathbb{E}\left[ \max_{x \in \mathcal{C}} \left\{ \left\| x^{0} - x \right\|^{2} \right\} \right] + \frac{\eta^{2}}{K}\sum_{k = 0}^{K - 1} \mathbb{E}\left[ \left\| \mathbb{E}_{k}[\Delta^{k}] - \Delta^{k} \right\|^{2} \right]$$
$$ - \,\mathbb{E}\left[ \frac{\gamma}{2K}\sum_{k = 0}^{K - 1} \left\| w^{k} - x^{k + 1} \right\|^{2} + \frac{1 - \gamma}{K}\sum_{k = 0}^{K - 1} \left\| x^{k + 1} - x^{k} \right\|^{2} \right] + \frac{1}{32K}\mathbb{E}\left[ \left\| x^{K - 1} - x^{K} \right\|^{2} \right]$$
$$ + \,2\eta \mathbb{E}\left[ {\frac{1}{K}\sum\limits_{k = 0}^{K - 1} \langle F({{x}^{{k - 1}}}) - F({{x}^{k}}),{{x}^{{k + 1}}} - {{x}^{k}}\rangle } \right]$$
$$ + \,2\eta \mathbb{E}\left[ {\frac{1}{K}\sum\limits_{k = 0}^{K - 1} \left\langle {{{\mathbb{E}}_{k}}[{{\Delta }^{k}}] - {{\Delta }^{k}},{{x}^{{k + 1}}} - {{x}^{k}}} \right\rangle } \right].$$
(13)
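In (13), the Lemma 3 term \(\frac{\gamma(1-\gamma)}{2K}\sum_k \mathbb{E}\|x^{k+1} - w^k\|^2\) is merged with \(-\frac{\gamma}{K}\sum_k \|w^k - x^{k+1}\|^2\). The short exact check below (an illustration using Python's fractions module, with a hypothetical function name) confirms that the merged coefficient, \(-\gamma(1+\gamma)/(2K)\), is at most \(-\gamma/(2K)\) for \(\gamma \in (0, 1]\), which is what the replacement by \(-\frac{\gamma}{2K}\) in (13) requires.

```python
from fractions import Fraction


def w_coefficient(gamma):
    """K times the coefficient of sum ||w^k - x^{k+1}||^2 in (13) before simplifying."""
    from_previous = -gamma                  # -gamma/K from the previous estimate
    from_lemma_3 = gamma * (1 - gamma) / 2  # +gamma(1 - gamma)/(2K) from Lemma 3
    return from_previous + from_lemma_3


for gamma in (Fraction(1, 100), Fraction(1, 16), Fraction(1, 2), Fraction(1)):
    c = w_coefficient(gamma)
    assert c == -gamma * (1 + gamma) / 2  # closed form of the merged coefficient
    assert c <= -gamma / 2                 # the bound used in (13)
```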

By Young’s inequality, for each \(k\),

$$\mathbb{E}\left[ 2\eta \left\langle \mathbb{E}_{k}[\Delta^{k}] - \Delta^{k}, x^{k + 1} - x^{k} \right\rangle \right] \leqslant 4\eta^{2}\mathbb{E}\left[ \left\| \mathbb{E}_{k}[\Delta^{k}] - \Delta^{k} \right\|^{2} \right] + \frac{1}{4}\mathbb{E}\left[ \left\| x^{k + 1} - x^{k} \right\|^{2} \right],$$
(14)

and

$$\mathbb{E}\left[ 2\eta \left\langle F(x^{k - 1}) - F(x^{k}), x^{k + 1} - x^{k} \right\rangle \right] \leqslant 4\eta^{2}\mathbb{E}\left[ \left\| F(x^{k - 1}) - F(x^{k}) \right\|^{2} \right] + \frac{1}{4}\mathbb{E}\left[ \left\| x^{k + 1} - x^{k} \right\|^{2} \right].$$
(15)
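Both (14) and (15) are instances of the pointwise inequality \(2\eta\langle a, b\rangle \leqslant 4\eta^2\|a\|^2 + \frac{1}{4}\|b\|^2\), with \(b = x^{k+1} - x^k\) and \(a\) equal to \(\mathbb{E}_k[\Delta^k] - \Delta^k\) or \(F(x^{k-1}) - F(x^k)\); taking expectations preserves it. The gap between the two sides is exactly \(\|2\eta a - b/2\|^2\). The sketch below (hypothetical helper names, random test vectors) evaluates that gap.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def young_gap(a, b, eta):
    """RHS - LHS of 2*eta*<a, b> <= 4*eta^2*||a||^2 + ||b||^2 / 4.

    Algebraically this equals ||2*eta*a - b/2||^2, so it is never negative.
    """
    return 4.0 * eta**2 * dot(a, a) + dot(b, b) / 4.0 - 2.0 * eta * dot(a, b)


rng = random.Random(0)
gaps = []
for _ in range(10_000):
    n = rng.randint(1, 6)
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]  # e.g. F(x^{k-1}) - F(x^k)
    b = [rng.gauss(0.0, 1.0) for _ in range(n)]  # e.g. x^{k+1} - x^k
    gaps.append(young_gap(a, b, eta=rng.uniform(1e-3, 1.0)))
print(min(gaps))
# Equality holds exactly when b = 4 * eta * a.
print(young_gap([1.0], [4.0], eta=1.0))
```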

Combining (13) with (14) and (15), we obtain

$$2\eta \mathbb{E}[\text{Gap}(\bar{x}^{K})] \leqslant \frac{4}{K}\mathbb{E}\left[ \max_{x \in \mathcal{C}} \left\{ \left\| x^{0} - x \right\|^{2} \right\} \right] + \frac{\eta^{2}}{K}\sum_{k = 0}^{K - 1} \mathbb{E}\left[ \left\| \mathbb{E}_{k}[\Delta^{k}] - \Delta^{k} \right\|^{2} \right]$$
$$ - \,\mathbb{E}\left[ \frac{\gamma}{2K}\sum_{k = 0}^{K - 1} \left\| w^{k} - x^{k + 1} \right\|^{2} + \frac{1 - \gamma}{K}\sum_{k = 0}^{K - 1} \left\| x^{k + 1} - x^{k} \right\|^{2} \right] + \frac{1}{32K}\mathbb{E}\left[ \left\| x^{K - 1} - x^{K} \right\|^{2} \right]$$
$$ + \frac{4\eta^{2}}{K}\sum_{k = 0}^{K - 1} \mathbb{E}\left[ \left\| F(x^{k - 1}) - F(x^{k}) \right\|^{2} \right] + \frac{1}{4K}\sum_{k = 0}^{K - 1} \mathbb{E}\left[ \left\| x^{k + 1} - x^{k} \right\|^{2} \right]$$
$$ + \frac{4\eta^{2}}{K}\sum_{k = 0}^{K - 1} \mathbb{E}\left[ \left\| \mathbb{E}_{k}[\Delta^{k}] - \Delta^{k} \right\|^{2} \right] + \frac{1}{4K}\sum_{k = 0}^{K - 1} \mathbb{E}\left[ \left\| x^{k + 1} - x^{k} \right\|^{2} \right]$$
$$ \leqslant \frac{4}{K}\mathbb{E}\left[ \max_{x \in \mathcal{C}} \left\{ \left\| x^{0} - x \right\|^{2} \right\} \right] + \frac{5\eta^{2}}{K}\sum_{k = 0}^{K - 1} \mathbb{E}\left[ \left\| \mathbb{E}_{k}[\Delta^{k}] - \Delta^{k} \right\|^{2} \right]$$
$$ - \,\mathbb{E}\left[ \frac{\gamma}{2K}\sum_{k = 0}^{K - 1} \left\| w^{k} - x^{k + 1} \right\|^{2} + \frac{1/2 - \gamma}{K}\sum_{k = 0}^{K - 1} \left\| x^{k + 1} - x^{k} \right\|^{2} \right] + \frac{1}{32K}\mathbb{E}\left[ \left\| x^{K - 1} - x^{K} \right\|^{2} \right]$$
$$ + \frac{4\eta^{2}}{K}\sum_{k = 0}^{K - 1} \mathbb{E}\left[ \left\| F(x^{k - 1}) - F(x^{k}) \right\|^{2} \right].$$
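The constants after combining (13), (14) and (15) can be tallied exactly: the noise term collects \(\eta^2/K\) from (13) and \(4\eta^2/K\) from (14), and the coefficient of \(\sum_k \mathbb{E}\|x^{k+1} - x^k\|^2\) collects \(-(1-\gamma)/K\) from (13) and \(1/(4K)\) from each of (14) and (15). A minimal check with Python's fractions module, using a hypothetical function name:

```python
from fractions import Fraction


def combined_coefficients(gamma):
    """Multipliers after adding (14) and (15) to (13).

    noise: K / eta^2 times the coefficient of sum_k E||E_k[Delta^k] - Delta^k||^2
    step:  K times the coefficient of sum_k E||x^{k+1} - x^k||^2
    """
    noise = 1 + 4                                          # (13) and (14)
    step = -(1 - gamma) + Fraction(1, 4) + Fraction(1, 4)  # (13), (14), (15)
    return noise, step


for gamma in (Fraction(1, 64), Fraction(1, 16)):
    noise, step = combined_coefficients(gamma)
    assert noise == 5
    assert step == -(Fraction(1, 2) - gamma)
```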

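The \(L\)-Lipschitzness of \(F\) (Assumption 1) and the choice \(\eta \leqslant \frac{1}{{8L}}\), which implies \(4{{\eta }^{2}}{{L}^{2}} \leqslant \frac{1}{{16}} \leqslant \frac{1}{4}\), give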

$$2\eta \mathbb{E}[\text{Gap}(\bar{x}^{K})] \leqslant \frac{4}{K}\mathbb{E}\left[ \max_{x \in \mathcal{C}} \left\{ \left\| x^{0} - x \right\|^{2} \right\} \right] + \frac{5\eta^{2}}{K}\sum_{k = 0}^{K - 1} \mathbb{E}\left[ \left\| \mathbb{E}_{k}[\Delta^{k}] - \Delta^{k} \right\|^{2} \right]$$
$$ - \,\mathbb{E}\left[ \frac{\gamma}{2K}\sum_{k = 0}^{K - 1} \left\| w^{k} - x^{k + 1} \right\|^{2} + \frac{1/2 - \gamma}{K}\sum_{k = 0}^{K - 1} \left\| x^{k + 1} - x^{k} \right\|^{2} \right] + \frac{1}{32K}\mathbb{E}\left[ \left\| x^{K - 1} - x^{K} \right\|^{2} \right]$$
$$ + \frac{4\eta^{2}L^{2}}{K}\sum_{k = 0}^{K - 1} \mathbb{E}\left[ \left\| x^{k - 1} - x^{k} \right\|^{2} \right]$$
$$ \leqslant \frac{4}{K}\mathbb{E}\left[ \max_{x \in \mathcal{C}} \left\{ \left\| x^{0} - x \right\|^{2} \right\} \right] + \frac{5\eta^{2}}{K}\sum_{k = 0}^{K - 1} \mathbb{E}\left[ \left\| \mathbb{E}_{k}[\Delta^{k}] - \Delta^{k} \right\|^{2} \right]$$
$$ - \,\mathbb{E}\left[ \frac{\gamma}{2K}\sum_{k = 0}^{K - 1} \left\| w^{k} - x^{k + 1} \right\|^{2} + \frac{1/2 - \gamma}{K}\sum_{k = 0}^{K - 1} \left\| x^{k + 1} - x^{k} \right\|^{2} \right] + \frac{1}{32K}\mathbb{E}\left[ \left\| x^{K - 1} - x^{K} \right\|^{2} \right]$$
$$ + \frac{1}{4K}\sum_{k = 0}^{K - 1} \mathbb{E}\left[ \left\| x^{k - 1} - x^{k} \right\|^{2} \right]$$
$$ \leqslant \frac{4}{K}\mathbb{E}\left[ \max_{x \in \mathcal{C}} \left\{ \left\| x^{0} - x \right\|^{2} \right\} \right] + \frac{5\eta^{2}}{K}\sum_{k = 0}^{K - 1} \mathbb{E}\left[ \left\| \mathbb{E}_{k}[\Delta^{k}] - \Delta^{k} \right\|^{2} \right]$$
$$ - \,\mathbb{E}\left[ \frac{\gamma}{2K}\sum_{k = 0}^{K - 1} \left\| w^{k} - x^{k + 1} \right\|^{2} + \frac{1/2 - \gamma}{K}\sum_{k = 0}^{K - 1} \left\| x^{k + 1} - x^{k} \right\|^{2} \right]$$
$$ + \frac{1}{4K}\sum_{k = 0}^{K - 1} \mathbb{E}\left[ \left\| x^{k + 1} - x^{k} \right\|^{2} \right]$$
$$ = \frac{4}{K}\mathbb{E}\left[ \max_{x \in \mathcal{C}} \left\{ \left\| x^{0} - x \right\|^{2} \right\} \right] + \frac{5\eta^{2}}{K}\sum_{k = 0}^{K - 1} \mathbb{E}\left[ \left\| \mathbb{E}_{k}[\Delta^{k}] - \Delta^{k} \right\|^{2} \right]$$
$$ - \,\mathbb{E}\left[ \frac{\gamma}{2K}\sum_{k = 0}^{K - 1} \left\| w^{k} - x^{k + 1} \right\|^{2} + \frac{1/4 - \gamma}{K}\sum_{k = 0}^{K - 1} \left\| x^{k + 1} - x^{k} \right\|^{2} \right].$$

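Here we also used the initialization \({{x}^{{ - 1}}} = {{x}^{0}}\) of Algorithm 1: after the index shift, the \(k = 0\) term vanishes, and the term \(\frac{1}{{32K}}\mathbb{E}[\|x^{K - 1} - x^{K}\|^{2}]\) is absorbed into the \(k = K - 1\) term of \(\frac{1}{{4K}}\sum_{k = 0}^{K - 1}\mathbb{E}[\|x^{k + 1} - x^{k}\|^{2}]\). Applying Lemma 2, we obtain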

$$2\eta \mathbb{E}[\text{Gap}(\bar{x}^{K})] \leqslant \frac{4}{K}\mathbb{E}\left[ \max_{x \in \mathcal{C}} \left\{ \left\| x^{0} - x \right\|^{2} \right\} \right]$$
$$ + \frac{10\eta^{2}\overline{L}^{2}}{bK}\sum_{k = 0}^{K - 1} \mathbb{E}\left[ \left\| x^{k} - w^{k - 1} \right\|^{2} + \left\| x^{k} - x^{k - 1} \right\|^{2} \right]$$
$$ - \,\mathbb{E}\left[ \frac{\gamma}{2K}\sum_{k = 0}^{K - 1} \left\| w^{k} - x^{k + 1} \right\|^{2} + \frac{1/4 - \gamma}{K}\sum_{k = 0}^{K - 1} \left\| x^{k + 1} - x^{k} \right\|^{2} \right]$$
$$ \leqslant \frac{4}{K}\mathbb{E}\left[ \max_{x \in \mathcal{C}} \left\{ \left\| x^{0} - x \right\|^{2} \right\} \right]$$
$$ + \frac{10\eta^{2}\overline{L}^{2}}{bK}\sum_{k = 0}^{K - 1} \mathbb{E}\left[ \left\| x^{k + 1} - w^{k} \right\|^{2} + \left\| x^{k} - x^{k + 1} \right\|^{2} \right]$$
$$ - \,\mathbb{E}\left[ \frac{\gamma}{2K}\sum_{k = 0}^{K - 1} \left\| w^{k} - x^{k + 1} \right\|^{2} + \frac{1/4 - \gamma}{K}\sum_{k = 0}^{K - 1} \left\| x^{k + 1} - x^{k} \right\|^{2} \right]$$
$$ \leqslant \frac{4}{K}\mathbb{E}\left[ \max_{x \in \mathcal{C}} \left\{ \left\| x^{0} - x \right\|^{2} \right\} \right]$$
$$\begin{gathered} - \,\,\mathbb{E}\left[ {\left( {\frac{\gamma }{2} - \frac{{10{{\eta }^{2}}{{{\overline L }}^{2}}}}{b}} \right)\frac{1}{K}\sum\limits_{k = 0}^{K - 1} {{{\left\| {{{w}^{k}} - {{x}^{{k + 1}}}} \right\|}}^{2}}} \right. \\ \left. { + \left( {\frac{1}{4} - \gamma - \frac{{10{{\eta }^{2}}{{{\overline L }}^{2}}}}{b}} \right)\frac{1}{K}\sum\limits_{k = 0}^{K - 1} {{{\left\| {{{x}^{{k + 1}}} - {{x}^{k}}} \right\|}}^{2}}} \right]. \\ \end{gathered} $$

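Here we again used the initialization \({{w}^{{ - 1}}} = {{x}^{{ - 1}}} = {{x}^{0}}\) of Algorithm 1. The choice \(\eta \leqslant \frac{{\sqrt {\gamma b} }}{{8\overline L }}\) gives \(\frac{{10{{\eta }^{2}}{{\overline L }^{2}}}}{b} \leqslant \frac{{5\gamma }}{{32}}\), so for \(0 < \gamma \leqslant \frac{1}{{16}}\) the first coefficient above is nonnegative and the second is at least \(\frac{1}{{12}} - \gamma > 0\). This gives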

$$2\eta \mathbb{E}[\text{Gap}(\bar{x}^{K})] \leqslant \frac{4}{K}\mathbb{E}\left[ \max_{x \in \mathcal{C}} \left\{ \left\| x^{0} - x \right\|^{2} \right\} \right]$$
$$ - \,\,\mathbb{E}\left[ {\left( {\frac{1}{{12}} - \gamma } \right)\frac{1}{K}\sum\limits_{k = 0}^{K - 1} {{{\left\| {{{x}^{{k + 1}}} - {{x}^{k}}} \right\|}}^{2}}} \right]$$
$$ \leqslant \frac{4}{K}\mathop {\max }\limits_{x \in \mathcal{C}} \left\{ {{{{\left\| {{{x}^{0}} - x} \right\|}}^{2}}} \right\}.$$

Dividing both sides by \(2\eta \), we obtain

$$\mathbb{E}[{\text{Gap}}({{\bar {x}}^{K}})] \leqslant \frac{2}{{\eta K}}\mathop {\max }\limits_{x \in \mathcal{C}} \left\{ {{{{\left\| {{{x}^{0}} - x} \right\|}}^{2}}} \right\}.$$

Substituting \(\eta \) from the conditions of the theorem and setting \(\gamma = p\) completes the proof.
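The final parameter conditions can be checked the same way. The sketch below is illustrative and assumes that \(\eta\) is taken at the largest value allowed by the two step-size conditions used in this proof, \(\eta \leqslant \frac{1}{8L}\) and \(\eta \leqslant \frac{\sqrt{\gamma b}}{8\overline{L}}\); the theorem's exact choice of \(\eta\) and its dependence on \(p\) are not reproduced here. At the second bound, \(\eta^2\overline{L}^2/b = \gamma/64\), so \(10\eta^2\overline{L}^2/b = 5\gamma/32\). The functions residual_coefficients and gap_bound are hypothetical names.

```python
import math
from fractions import Fraction


def residual_coefficients(gamma, r):
    """Coefficients of the two sums in the display before the final bound.

    r stands for eta^2 * L_bar^2 / b; both coefficients must be non-negative
    for those sums to be dropped.
    """
    t = 10 * r
    return gamma / 2 - t, Fraction(1, 4) - gamma - t


def gap_bound(L, L_bar, b, gamma, K, max_dist_sq):
    """2 / (eta K) * max_x ||x^0 - x||^2 with eta at its largest allowed value (assumption)."""
    eta = min(1 / (8 * L), math.sqrt(gamma * b) / (8 * L_bar))
    return 2 * max_dist_sq / (eta * K)


for gamma in (Fraction(1, 1000), Fraction(1, 32), Fraction(1, 16)):
    # Worst case under eta <= sqrt(gamma b) / (8 L_bar): r = gamma / 64.
    c_w, c_step = residual_coefficients(gamma, gamma / 64)
    assert c_w >= 0 and c_step >= Fraction(1, 12) - gamma > 0

# Hypothetical constants, chosen only to exercise the formula.
print(gap_bound(L=1.0, L_bar=2.0, b=4, gamma=1 / 16, K=100, max_dist_sq=1.0))
```

With these illustrative constants the second step-size condition is the binding one, so the bound scales like \(\overline{L}/(K\sqrt{\gamma b})\) rather than \(L/K\).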


Cite this article

Pichugin, A., Pechin, M., Beznosikov, A. et al. Optimal Analysis of Method with Batching for Monotone Stochastic Finite-Sum Variational Inequalities. Dokl. Math. 108 (Suppl 2), S348–S359 (2023). https://doi.org/10.1134/S1064562423701582
