
A penalty barrier framework for nonconvex constrained optimization

Alberto De Marchi Department of Aerospace Engineering, Institute of Applied Mathematics and Scientific Computing, University of the Bundeswehr Munich, Werner-Heisenberg-Weg 39, 85577 Neubiberg, Germany
email: alberto.demarchi@unibw.de, orcid: 0000-0002-3545-6898
   Andreas Themelis Faculty of Information Science and Electrical Engineering (ISEE), Kyushu University, 744 Motooka, Nishi-ku 819-0395, Fukuoka, Japan
email: andreas.themelis@ees.kyushu-u.ac.jp, orcid: 0000-0002-6044-0169
Abstract

Focusing on minimization problems with structured objective function and smooth constraints, we present a flexible technique that combines the beneficial regularization effects of (exact) penalty and interior-point methods. Working in the fully nonconvex setting, a pure barrier approach requires careful steps when approaching the infeasible set, thus hindering convergence. We show how a tight integration with a penalty scheme overcomes such conservatism, does not require a strictly feasible starting point, and thus accommodates equality constraints. The crucial advancement that allows us to invoke generic (possibly accelerated) subsolvers is a marginalization step: amounting to a conjugacy operation, this step effectively merges (exact) penalty and barrier into a smooth, full domain functional object. When the penalty exactness takes effect, the generated subproblems do not suffer the ill-conditioning typical of penalty methods, nor do they exhibit the nonsmoothness of exact penalty terms. We provide a theoretical characterization of the algorithm and its asymptotic properties, deriving convergence results for fully nonconvex problems. Illustrative examples and numerical simulations demonstrate the wide range of problems our theory and algorithm are able to cover.

Keywords. Nonsmooth nonconvex optimization · exact penalty methods · interior point methods · proximal algorithms

AMS subject classifications. 49J52 · 49J53 · 65K05 · 90C06 · 90C30

1 Introduction

We are interested in developing numerical methods for constrained optimization problems of the form

\[
\operatorname*{minimize}_{\bm{x}\in\mathbb{R}^{n}}\; q(\bm{x})\quad\operatorname{subject\ to}\quad\bm{c}(\bm{x})\leq\bm{0}
\tag{1}
\]

where $\bm{x}$ is the decision variable and the functions $q$ and $\bm{c}$ are problem data. (Throughout, we stick to the convention of bold-facing vector variables and vector-valued functions, so that $\bm{0}$ indicates the zero vector of suitable size and similarly $\bm{1}$ is the vector with all entries equal to one.) The proposed algorithm for (1) can also be applied for tackling problems with equality constraints $\bm{c}_{\rm eq}(\bm{x})=\bm{0}$. For simplicity of presentation, we focus on the more general inequality constrained case and illustrate in Section 4.3 an efficient way of handling equality constraints in our framework other than describing them as two-sided inequalities. Henceforth we consider (1) under the following standing assumptions.

Assumption. The following hold in problem (1):
1. $q:\mathbb{R}^{n}\to\overline{\mathbb{R}}$ is proper, lsc, and continuous relative to its domain $\operatorname{dom}q$, in the sense that whenever $\operatorname{dom}q\ni\bm{x}^{k}\to\bm{x}$ it holds that $q(\bm{x}^{k})\to q(\bm{x})$. (In the absence of continuity of $g$ on its domain, all claims regarding convergence properties still hold true provided that the iterates converge '$g$-attentively'.)
2. $\bm{c}:\mathbb{R}^{n}\to\mathbb{R}^{m}$ is continuously differentiable.
3. The problem is feasible: $q_{\star}\coloneqq\inf_{\{\bm{x}\,\mid\,\bm{c}(\bm{x})\leq\bm{0}\}}q(\bm{x})\in\mathbb{R}$; in particular, $\{\bm{x}\in\operatorname{dom}q\mid\bm{c}(\bm{x})\leq\bm{0}\}\neq\emptyset$.

Notice that no differentiability requirements are imposed on the cost $q$, nor convexity on any term in the formulation. The primary objective of this paper is to devise an abstract algorithmic framework in the generality of this setting. The methodology requires an oracle for solving, up to approximate local optimality, minimization instances of the sum of $q$ with a differentiable term. In our numerical experiments we will invoke off-the-shelf routines based on proximal gradient iterations, thereby restricting our attention to problem instances in which $q$ is structured as $q=f+g$ for a differentiable function $f$ and a function $g$ that enjoys an easily computable proximal map. Most nonsmooth functions widely used in practice comply with all these requirements. For instance, $g$ can include indicators of any nonempty and closed set, and thus enforce arbitrary closed constraints that are easy to project onto. We also emphasize that the requirement of continuity relative to the domain is virtually negligible, as in most cases it can be circumvented through suitable reformulations of the problem that leverage the flexibility of the constraints. As a particularly enticing such instance, we mention the reformulation of the so-called $L^{0}$-norm penalty (number of nonzero entries) $\|\bm{x}\|_{0}$ for $\bm{x}\in\mathbb{R}^{n}$ as the linear program

\[
\|\bm{x}\|_{0}=\min_{\bm{u}\in\mathbb{R}^{n}}\;\|\bm{u}\|_{1}\quad\operatorname{subject\ to}\quad -\bm{1}\leq\bm{u}\leq\bm{1},\ \ \langle\bm{u},\bm{x}\rangle=\|\bm{x}\|_{1},
\]

and remark that, more generally, matrix rank can also be cast in a similar fashion [2, Lem. 3.1].
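As a quick sanity check of this reformulation (our illustration, not part of the original text), the linear program can be solved numerically after the standard splitting $\bm{u}=\bm{u}^{+}-\bm{u}^{-}$ with $\bm{u}^{+},\bm{u}^{-}\in[0,1]^{n}$, which turns the $L^{1}$ objective into a linear one; the sketch below uses SciPy's `linprog` and recovers $\|\bm{x}\|_{0}$ for a small test vector.

```python
import numpy as np
from scipy.optimize import linprog

def l0_via_lp(x):
    """Evaluate ||x||_0 through the LP reformulation above (illustrative sketch).

    Variables are z = [u_plus; u_minus] with u = u_plus - u_minus and
    0 <= u_plus, u_minus <= 1, so that sum(z) equals ||u||_1 at any optimal z.
    """
    n = x.size
    c = np.ones(2 * n)                        # minimize ||u||_1
    A_eq = np.concatenate([x, -x])[None, :]   # <u, x> = ||x||_1
    b_eq = np.array([np.abs(x).sum()])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0.0, 1.0)] * (2 * n))
    return res.fun

x = np.array([0.0, -2.5, 0.0, 1.0, 3.0])
print(l0_via_lp(x))  # 3.0 = number of nonzero entries of x (up to solver tolerance)
```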

Motivations and related work

The class of problems (1) with structured cost $q$ has been recently studied in [4] and [12], respectively, for the fully convex and nonconvex setting, developing methods that bear strong convergence guarantees under some restrictive assumptions. Above all, building on a pure barrier approach, these methods demand a feasible set with nonempty interior, thus excluding problems with equality constraints. Although restricted to simple bounds, a similar interior-point technique is investigated in [18] and manifests analogous pros and cons. In contrast to these works, we intend to address equality constraints as well. An augmented Lagrangian scheme for constrained structured problems was developed in [10], which also allows the specification of constraints in a function-in-set format.

Constrained structured programs (1) are also closely related to the template of structured composite optimization

\[
\operatorname*{minimize}_{\bm{x}\in\mathbb{R}^{n}}\; q(\bm{x})+h(\bm{c}(\bm{x}))
\]

with $h:\mathbb{R}^{m}\to\overline{\mathbb{R}}$. By introducing additional variables $\bm{z}\in\mathbb{R}^{m}$, composite problems can be rewritten in (equality) constrained form recovering the class of problems (1), with a one-to-one relationship between (local and global) solutions and stationary points [10, Lemma 3.1]. The recent literature on structured composite optimization includes [21], only for convex $h$, and [16, 9] for fully nonconvex problems, and concentrates almost exclusively on the augmented Lagrangian framework. Relying essentially on a penalty approach, in contrast to a barrier, the algorithmic characterization in [10] involved weaker assumptions and yet retrieved standard convergence results in constrained nonconvex optimization. However, the dependency on dual estimates makes methods of this family sensitive to the initialization of Lagrange multipliers. Moreover, they require some safeguards to ensure convergence from arbitrary starting points [3, 10]. In contrast, thanks to their 'primal' nature and inherent regularizing effect, penalty-barrier techniques can conveniently cope with degenerate problems.

The idea of adopting and merging penalty and barrier approaches, in a variety of possible flavors and combinations, is certainly not new, tracing back at least to [14]. Among several recent concretizations of this avenue, we refer to Curtis' work [7] for a comprehensive discussion and further references. Our motivation for developing this technique for constrained structured problems comes from previous experience while designing the interior point scheme IPprox [12]. The key observation therein is that, with a pure barrier approach, the arising subproblems have a smooth term without full domain. This nonstandard situation, together with a nonconvex and possibly extended-real-valued cost $q$ and nonlinear constraints $\bm{c}$, makes it difficult to adopt accelerated subsolvers. (For comparison, most interior point algorithms for classical nonlinear programming transform the original problem into one with equalities and simple bounds only, treating the latter with a barrier and dampening search directions with a so-called fraction-to-the-boundary rule to maintain strict feasibility, relative only to the simple bounds [26].)

In the broad setting of (1) under the standing assumptions above, a blind application of penalty-barrier strategies in the spirit of [7] would bear no advantages, since the restricted-domain issue of IPprox would persist, again hindering practical performance. In this paper we propose and investigate in detail a simple technique to overcome this limitation. The crucial step consists in the marginalization of auxiliary variables: after applying some penalty and barrier modifications, the auxiliary variables are optimized pointwise, for any given decision variable $\bm{x}$. (This approach can be interpreted as an extreme version of the so-called magical steps [5, 3], or slack reset in [7], and was inspired by the proximal approaches in [13, 9].) Before proceeding with the technical content, we emphasize that the marginalization step not only reduces the subproblems' size (recovering that of only the original decision variable $\bm{x}$), but it also, and especially, results in a smooth penalty term for the subproblems that always has full domain. The emergence of this penalty-barrier envelope enables the adoption of generic, possibly accelerated subsolvers, as well as tailored routines that exploit the problem's original structure. This claim will be substantiated in Fig. 1, where we show that convexity and Lipschitz differentiability are preserved in the transformed problems.
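To make the role of the subsolver oracle concrete, the following minimal sketch (our illustration with hypothetical names, not the paper's algorithm) shows a generic proximal-gradient routine for subproblems of the form $\min_{\bm{x}}\varphi(\bm{x})+g(\bm{x})$, where $\varphi$ collects the differentiable part of $q$ together with a smooth, full-domain term such as the penalty-barrier envelope discussed above, and $g$ is prox-friendly; an unknown Lipschitz constant is handled by backtracking.

```python
import numpy as np

def prox_gradient(smooth, prox_g, x0, L=1.0, max_iter=1000, tol=1e-8):
    """Generic proximal-gradient subsolver sketch (illustration only).

    smooth(x) -> (phi(x), grad_phi(x)) for the differentiable part,
    prox_g(v, gamma) -> prox_{gamma*g}(v) for the prox-friendly part.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        phi_x, grad_x = smooth(x)
        while True:  # backtracking on the quadratic upper model
            x_new = prox_g(x - grad_x / L, 1.0 / L)
            d = x_new - x
            if smooth(x_new)[0] <= phi_x + grad_x @ d + 0.5 * L * (d @ d) + 1e-12:
                break
            L *= 2.0
        if np.linalg.norm(L * d) <= tol:  # prox-gradient residual as stationarity measure
            return x_new
        x = x_new
    return x

# Toy usage: minimize 0.5*||x - b||^2 + ||x||_1 via the soft-thresholding prox.
b = np.array([3.0, -0.5, 0.2])
smooth = lambda x: (0.5 * np.sum((x - b) ** 2), x - b)
prox_l1 = lambda v, gamma: np.sign(v) * np.maximum(np.abs(v) - gamma, 0.0)
print(prox_gradient(smooth, prox_l1, np.zeros(3)))  # approximately [2., 0., 0.]
```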

2 Preliminaries

In this section we comment on useful notation and preliminary results before discussing optimality notions to characterize solutions of (1).

2.1 Notation and known facts

With $\mathbb{R}$ and $\overline{\mathbb{R}}\coloneqq\mathbb{R}\cup\{\infty\}$ we denote the real and extended-real line, respectively, and with $\mathbb{R}_{+}\coloneqq[0,\infty)$ and $\mathbb{R}_{-}\coloneqq(-\infty,0]$ the sets of nonnegative and nonpositive real numbers, respectively. The positive and negative parts of a number $r\in\mathbb{R}$ are respectively denoted as $[r]_{+}\coloneqq\max\{0,r\}$ and $[r]_{-}\coloneqq\max\{0,-r\}$, so that $r=[r]_{+}-[r]_{-}$. In the case of a vector $\bm{r}$, $[\bm{r}]_{+}$ and $[\bm{r}]_{-}$ are meant elementwise.

The notation $T:\mathbb{R}^{n}\rightrightarrows\mathbb{R}^{m}$ indicates a set-valued mapping $T$ that maps any $\bm{x}\in\mathbb{R}^{n}$ to a (possibly empty) subset $T(\bm{x})$ of $\mathbb{R}^{m}$. Its (effective) domain and graph are the sets $\operatorname{dom}T\coloneqq\{\bm{x}\in\mathbb{R}^{n}\mid T(\bm{x})\neq\emptyset\}$ and $\operatorname{gph}T\coloneqq\{(\bm{x},\bm{y})\in\mathbb{R}^{n}\times\mathbb{R}^{m}\mid\bm{y}\in T(\bm{x})\}$. Algebraic operations with or among set-valued mappings are meant in a componentwise sense; for instance, the sum of $T_{1},T_{2}:\mathbb{R}^{n}\rightrightarrows\mathbb{R}^{m}$ is defined as $(T_{1}+T_{2})(\bm{x})\coloneqq\{\bm{y}^{1}+\bm{y}^{2}\mid(\bm{y}^{1},\bm{y}^{2})\in T_{1}(\bm{x})\times T_{2}(\bm{x})\}$ for all $\bm{x}\in\mathbb{R}^{n}$.

With $\operatorname{dist}_{E}:\mathbb{R}^{n}\to[0,\infty)$ and $\operatorname{\Pi}_{E}:\mathbb{R}^{n}\rightrightarrows\mathbb{R}^{n}$ we indicate the distance from and the projection onto a nonempty set $E\subseteq\mathbb{R}^{n}$, respectively, namely

\[
\operatorname{dist}_{E}(\bm{x})\coloneqq\inf_{\bm{y}\in E}\|\bm{y}-\bm{x}\|\quad\text{and}\quad\operatorname{\Pi}_{E}(\bm{x})\coloneqq\operatorname*{arg\,min}_{\bm{y}\in E}\|\bm{y}-\bm{x}\|.
\]

With $\operatorname{\delta}_{E}:\mathbb{R}^{n}\to\overline{\mathbb{R}}$ we denote the indicator function of $E$, namely such that $\operatorname{\delta}_{E}(\bm{x})=0$ if $\bm{x}\in E$ and $\infty$ otherwise. For an extended-real-valued function $h:\mathbb{R}^{n}\to\overline{\mathbb{R}}$, the (effective) domain, graph, and epigraph are given by $\operatorname{dom}h\coloneqq\{\bm{x}\in\mathbb{R}^{n}\mid h(\bm{x})<\infty\}$, $\operatorname{gph}h\coloneqq\{(\bm{x},h(\bm{x}))\mid\bm{x}\in\operatorname{dom}h\}$, and $\operatorname{epi}h\coloneqq\{(\bm{x},\alpha)\in\mathbb{R}^{n}\times\mathbb{R}\mid\alpha\geq h(\bm{x})\}$. We say that $h$ is proper if $\operatorname{dom}h\neq\emptyset$ and lower semicontinuous (lsc) if $h(\bar{\bm{x}})\leq\liminf_{\bm{x}\to\bar{\bm{x}}}h(\bm{x})$ for all $\bar{\bm{x}}\in\mathbb{R}^{n}$ or, equivalently, if $\operatorname{epi}h$ is a closed subset of $\mathbb{R}^{n+1}$. Following [22, Def. 8.3], we denote by $\hat{\partial}h:\mathbb{R}^{n}\rightrightarrows\mathbb{R}^{n}$ the regular subdifferential of $h$, where

\[
\bm{v}\in\hat{\partial}h(\bar{\bm{x}})
\quad\overset{\text{(def)}}{\Longleftrightarrow}\quad
\liminf_{\substack{\bm{x}\to\bar{\bm{x}}\\ \bm{x}\neq\bar{\bm{x}}}}\frac{h(\bm{x})-h(\bar{\bm{x}})-\langle\bm{v},\bm{x}-\bar{\bm{x}}\rangle}{\|\bm{x}-\bar{\bm{x}}\|}\geq 0.
\]

The (limiting, or Mordukhovich) subdifferential of $h$ is $\partial h:\mathbb{R}^{n}\rightrightarrows\mathbb{R}^{n}$, where $\bar{\bm{v}}\in\partial h(\bar{\bm{x}})$ if and only if $\bar{\bm{x}}\in\operatorname{dom}h$ and there exists a sequence $(\bm{x}^{k},\bm{v}^{k})_{k\in\mathbb{N}}$ in $\operatorname{gph}\hat{\partial}h$ such that $(\bm{x}^{k},\bm{v}^{k},h(\bm{x}^{k}))\to(\bar{\bm{x}},\bar{\bm{v}},h(\bar{\bm{x}}))$. In particular, $\hat{\partial}h(\bm{x})\subseteq\partial h(\bm{x})$ holds at any $\bm{x}\in\mathbb{R}^{n}$; moreover, $\bm{0}\in\hat{\partial}h(\bm{x})$ is a necessary condition for local minimality of $h$ at $\bm{x}$ [22, Thm. 10.1]. The subdifferential of $h$ at $\bar{\bm{x}}$ satisfies $\partial(h+h_{0})(\bar{\bm{x}})=\partial h(\bar{\bm{x}})+\nabla h_{0}(\bar{\bm{x}})$ for any $h_{0}:\mathbb{R}^{n}\to\overline{\mathbb{R}}$ continuously differentiable around $\bar{\bm{x}}$ [22, Ex. 8.8]. If $h$ is convex, then $\hat{\partial}h=\partial h$ and both coincide with the convex subdifferential

\[
\mathbb{R}^{n}\ni\bar{\bm{x}}\mapsto\{\bm{v}\in\mathbb{R}^{n}\mid h(\bm{x})-h(\bar{\bm{x}})-\langle\bm{v},\bm{x}-\bar{\bm{x}}\rangle\geq 0\ \ \forall\bm{x}\in\mathbb{R}^{n}\}.
\]
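A standard example (ours, added for illustration) of how the two subdifferentials can differ: for the nonconvex function $h(x)=-|x|$ on $\mathbb{R}$ one has
\[
\hat{\partial}h(0)=\emptyset,\qquad \partial h(0)=\{-1,+1\},
\]
since the defining $\liminf$ in the regular subdifferential fails for every $\bm{v}$, while the limiting subdifferential collects the gradients $-1$ and $+1$ obtained along sequences approaching $0$ from the right and from the left, respectively; for the convex function $h(x)=|x|$, by contrast, both subdifferentials coincide with the interval $[-1,1]$.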

For a convex set $C\subseteq\mathbb{R}^{m}$ and a point $\bm{x}\in C$ one has that $\partial\operatorname{\delta}_{C}(\bm{x})=\operatorname{N}_{C}(\bm{x})$, where
\[
\operatorname{N}_{C}(\bm{x})\coloneqq\{\bm{v}\in\mathbb{R}^{m}\mid\langle\bm{v},\bm{x}^{\prime}-\bm{x}\rangle\leq 0\ \ \forall\bm{x}^{\prime}\in C\}
\]
denotes the normal cone of $C$ at $\bm{x}$.

We use the symbol $\operatorname{J}\bm{F}:\mathbb{R}^{n}\to\mathbb{R}^{m\times n}$ to indicate the Jacobian of a differentiable mapping $\bm{F}:\mathbb{R}^{n}\to\mathbb{R}^{m}$, namely $\operatorname{J}\bm{F}(\bar{\bm{x}})_{i,j}=\frac{\partial F_{i}}{\partial x_{j}}(\bar{\bm{x}})$ for all $\bar{\bm{x}}\in\mathbb{R}^{n}$. For a real-valued function $h$, we instead use the gradient notation $\nabla h\coloneqq\operatorname{J}h^{\top}$ to indicate the column vector of its partial derivatives. Finally, we recall that the convex conjugate of a proper lsc convex function $b:\mathbb{R}\to\overline{\mathbb{R}}$ is the proper lsc convex function $b^{\ast}:\mathbb{R}\to\overline{\mathbb{R}}$ defined as $b^{\ast}(\tau)\coloneqq\sup_{t\in\mathbb{R}}\{\tau t-b(t)\}$, and that one then has $\tau\in\partial b(t)$ if and only if $t\in\partial b^{\ast}(\tau)$.

2.2 Stationarity concepts

This subsection summarizes standard local optimality measures which were adopted in the proximal interior point framework of [12], and which will be further developed in the following Section 3 into conditions tailored to the setting of this paper. The interested reader is referred to [12, §2] for a verbose introduction and to [3, §3] for a detailed treatise. We start with the usual notion of (approximate) stationarity for general minimization problems of an extended-real-valued function.

Definition (stationarity). Relative to the problem $\operatorname*{minimize}_{\bm{x}\in\mathbb{R}^{n}}\varphi(\bm{x})$ for a function $\varphi:\mathbb{R}^{n}\to\overline{\mathbb{R}}$, a point $\bar{\bm{x}}\in\mathbb{R}^{n}$ is called
1. stationary if it satisfies $\bm{0}\in\partial\varphi(\bar{\bm{x}})$;
2. $\varepsilon$-stationary (with $\varepsilon>0$) if it satisfies $\operatorname{dist}_{\partial\varphi(\bar{\bm{x}})}(\bm{0})\leq\varepsilon$.

A standard optimality notion that reflects the constrained structure of (1) is given by the Karush-Kuhn-Tucker (KKT) conditions.

Definition (KKT optimality). Relative to problem (1), we say that $\bar{\bm{x}}\in\mathbb{R}^{n}$ is KKT-optimal if there exists $\bar{\bm{y}}\in\mathbb{R}^{m}$ such that
\[
\left\{\begin{array}{l}
-\operatorname{J}\bm{c}(\bar{\bm{x}})^{\top}\bar{\bm{y}}\in\partial q(\bar{\bm{x}})\\
\bm{c}(\bar{\bm{x}})\leq\bm{0}\\
\bar{\bm{y}}\geq\bm{0}\\
\bar{y}_{i}\,c_{i}(\bar{\bm{x}})=0\quad\forall i.
\end{array}\right.
\]
In such case, we say that $(\bar{\bm{x}},\bar{\bm{y}})\in\mathbb{R}^{n}\times\mathbb{R}^{m}$ is a KKT-optimal pair for (1).

Even for convex problems, unless suitable constraint and epigraphical qualifications are met, local solutions may fail to be KKT-optimal. Necessary conditions in the generality of problem (1) are provided by the following asymptotic counterpart.

Definition (asymptotic KKT optimality). Relative to problem (1), we say that $\bar{\bm{x}}\in\mathbb{R}^{n}$ is asymptotically KKT-optimal if there exist sequences $(\bm{x}^{k})_{k\in\mathbb{N}}\to\bar{\bm{x}}$ and $(\bm{y}^{k})_{k\in\mathbb{N}}\subset\mathbb{R}^{m}$ such that
\[
\left\{\begin{array}{l}
\operatorname{dist}_{\partial q(\bm{x}^{k})}\bigl(-\operatorname{J}\bm{c}(\bm{x}^{k})^{\top}\bm{y}^{k}\bigr)\to 0\\
{}[\bm{c}(\bm{x}^{k})]_{+}\to\bm{0}\\
\bm{y}^{k}\geq\bm{0}\\
y_{i}^{k}\,c_{i}(\bar{\bm{x}})=0\quad\forall i.
\end{array}\right.
\]
Proposition ([3, Thm. 3.1], [10, Prop. 2.5]). Any local minimizer for (1) is asymptotically KKT-optimal.

For the sake of designing suitable algorithmic stopping criteria, we also define an approximate variant which provides a further weaker notion of optimality.

Definition ($\bm{\epsilon}$-KKT optimality). Relative to problem (1), for $\bm{\epsilon}=(\epsilon_{\rm p},\epsilon_{\rm d})>(0,0)$ we say that $\bar{\bm{x}}$ is an (approximate) $\bm{\epsilon}$-KKT point if there exists $\bar{\bm{y}}\in\mathbb{R}^{m}$ such that
\[
\left\{\begin{array}{l}
\operatorname{dist}_{\partial q(\bar{\bm{x}})}\bigl(-\operatorname{J}\bm{c}(\bar{\bm{x}})^{\top}\bar{\bm{y}}\bigr)\leq\epsilon_{\rm d}\\
\bm{c}(\bar{\bm{x}})\leq\epsilon_{\rm p}\\
\bar{\bm{y}}\geq\bm{0}\\
\min\{\bar{y}_{i},[c_{i}(\bar{\bm{x}})]_{-}\}\leq\epsilon_{\rm p}\quad\forall i.
\end{array}\right.
\]

As discussed in the commentary after [3, Thm. 3.1], asymptotic KKT-optimality of $\bar{\bm{x}}$ is tantamount to the existence of a sequence $\bm{x}^{k}\to\bar{\bm{x}}$ of $\bm{\epsilon}^{k}$-KKT points for some $\bm{\epsilon}^{k}\to(0,0)$. More generally, for any $\epsilon_{\rm p},\epsilon_{\rm d}>0$ the notions are related as
\[
\text{KKT}\quad\Rightarrow\quad\text{asymptotic KKT}\quad\Rightarrow\quad\bm{\epsilon}\text{-KKT},
\]
while when $\epsilon_{\rm p}=\epsilon_{\rm d}=0$ one has that $\bm{\epsilon}$-KKT reduces to KKT. We conclude by listing the observations in [12, Lem. 6 and Rem. 7] that will be useful in the sequel.
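Since the $\bm{\epsilon}$-KKT conditions are intended as an algorithmic stopping criterion, the following minimal sketch (our illustration, with hypothetical function names) checks them numerically in the special case of a differentiable cost $q$, for which $\partial q(\bar{\bm{x}})=\{\nabla q(\bar{\bm{x}})\}$ and the distance in the dual condition reduces to a plain norm.

```python
import numpy as np

def is_eps_kkt(grad_q, c, jac_c, x, y, eps_p, eps_d):
    """Check the (eps_p, eps_d)-KKT conditions for differentiable q (sketch only)."""
    cx = c(x)
    dual_res  = np.linalg.norm(grad_q(x) + jac_c(x).T @ y)            # dist_{grad q(x)}(-Jc(x)^T y)
    primal_ok = np.all(cx <= eps_p)                                    # c(x) <= eps_p
    sign_ok   = np.all(y >= 0)                                         # y >= 0
    compl_ok  = np.all(np.minimum(y, np.maximum(-cx, 0.0)) <= eps_p)   # min{y_i, [c_i(x)]_-} <= eps_p
    return dual_res <= eps_d and primal_ok and sign_ok and compl_ok

# Toy instance: minimize ||x - 1||^2 subject to x_1 + x_2 - 1 <= 0.
grad_q = lambda x: 2.0 * (x - 1.0)
c      = lambda x: np.array([x[0] + x[1] - 1.0])
jac_c  = lambda x: np.array([[1.0, 1.0]])
print(is_eps_kkt(grad_q, c, jac_c, np.array([0.5, 0.5]), np.array([1.0]), 1e-8, 1e-8))  # True
```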

Remark. Relative to the conditions in the definition of asymptotic KKT optimality:
1. Up to possibly perturbing the sequence of multipliers, the complementarity slackness $y_{i}^{k}c_{i}(\bar{\bm{x}})=0$ can be equivalently expressed as $y_{i}^{k}c_{i}(\bm{x}^{k})\to 0$.
2. If the sequence of multipliers $(\bm{y}^{k})_{k\in\mathbb{N}}$ contains a bounded subsequence, then $\bar{\bm{x}}$ is KKT-optimal, not merely asymptotically so.

3 Subproblems generation

In this section we operate a two-step modification of problem (1), whose conceptual roadmap is as follows. We begin with a soft-constrained reformulation in which violation of the inequality $\bm{c}(\bm{x})\leq\bm{0}$ is penalized with an $L^{1}$-norm in the cost function; the use of a slack variable $\bm{s}\in\mathbb{R}^{m}$ simplifies this formulation by promoting separability. Next, a barrier term is added to enforce strict satisfaction of the constraints in the $L^{1}$-penalized reformulation. The minimization with respect to the slack variable $\bm{s}$ in the resulting problem can be carried out explicitly, and gives rise to a new problem in which the constraint $\bm{c}(\bm{x})\leq\bm{0}$ is softened with a smooth penalty. Increasing the $L^{1}$-penalty and decreasing the barrier coefficients results in a homotopic transition between smooth reformulations and the original nonsmooth problem.

3.1 $\bm{L^{1}}$-penalization

Given $\alpha>0$, we consider the following $L^{1}$ relaxation of (1):

\[
\operatorname*{minimize}_{\bm{x}\in\mathbb{R}^{n}}\; q(\bm{x})+\alpha\|[\bm{c}(\bm{x})]_{+}\|_{1}.
\]

By introducing a slack variable $\bm{s}\in\mathbb{R}^{m}$, this relaxation can equivalently be cast as

\[
\begin{array}{rl}
\operatorname*{minimize}_{(\bm{x},\bm{s})\in\mathbb{R}^{n}\times\mathbb{R}^{m}} & q(\bm{x})+\alpha\langle\bm{1},\bm{s}\rangle+\operatorname{\delta}_{\mathbb{R}^{m}_{+}}(\bm{s})\\[2pt]
\operatorname{subject\ to} & \bm{c}(\bm{x})\leq\bm{s},
\end{array}
\]

as one can easily verify that $[\bm{c}(\bm{x})]_{+}=\operatorname*{arg\,min}_{\bm{s}\in\mathbb{R}^{m}}\{\alpha\langle\bm{1},\bm{s}\rangle+\operatorname{\delta}_{\mathbb{R}^{m}_{+}}(\bm{s})\mid\bm{c}(\bm{x})\leq\bm{s}\}$ holds for any $\bm{x}\in\mathbb{R}^{n}$ and $\alpha>0$. In other words, the $L^{1}$ relaxation amounts to the slack reformulation after a marginal minimization with respect to the slack variable $\bm{s}$. Accordingly, we may consider the following relaxed optimality notion for problem (1), which, as explained below, is tantamount to KKT-optimality for the slack reformulation.
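For completeness, this identity follows from an elementary per-coordinate computation (spelled out here for the reader's convenience): for each $i$,
\[
\min_{s_{i}\geq 0,\ s_{i}\geq c_{i}(\bm{x})}\ \alpha s_{i}\;=\;\alpha\max\{0,c_{i}(\bm{x})\}\;=\;\alpha\,[c_{i}(\bm{x})]_{+},
\]
with unique minimizer $s_{i}=[c_{i}(\bm{x})]_{+}$; summing over $i$ shows that minimizing the slack reformulation over $\bm{s}$ for fixed $\bm{x}$ recovers exactly the $L^{1}$-penalized objective $q(\bm{x})+\alpha\|[\bm{c}(\bm{x})]_{+}\|_{1}$.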

Definition ($\alpha$-KKT optimality). Given $\alpha>0$, we say that a point $\bar{\bm{x}}^{\alpha}\in\mathbb{R}^{n}$ is $\alpha$-KKT-optimal for (1) if there exists $\bar{\bm{y}}^{\alpha}\in\mathbb{R}^{m}$ such that
\[
\left\{\begin{array}{l}
-\operatorname{J}\bm{c}(\bar{\bm{x}}^{\alpha})^{\top}\bar{\bm{y}}^{\alpha}\in\partial q(\bar{\bm{x}}^{\alpha})\\
0\leq\bar{y}_{i}^{\alpha}\leq\alpha\\
(\alpha-\bar{y}_{i}^{\alpha})[c_{i}(\bar{\bm{x}}^{\alpha})]_{+}=\bar{y}_{i}^{\alpha}[c_{i}(\bar{\bm{x}}^{\alpha})]_{-}=0\quad\forall i.
\end{array}\right.
\]
In such case, we call $(\bar{\bm{x}}^{\alpha},\bar{\bm{y}}^{\alpha})\in\mathbb{R}^{n}\times\mathbb{R}^{m}$ an $\alpha$-KKT-optimal pair for (1).

Since the cost function in the slack reformulation is separable in $\bm{x}$ and $\bm{s}$, its subdifferential at any point $(\bm{x},\bm{s})\in\mathbb{R}^{n}\times\mathbb{R}^{m}$ is the Cartesian product $\partial q(\bm{x})\times\bigl(\alpha\bm{1}+\partial\operatorname{\delta}_{\mathbb{R}^{m}_{+}}(\bm{s})\bigr)$ of the partial subdifferentials, see [22, Prop. 10.5]. By further observing that
\[
\partial\operatorname{\delta}_{\mathbb{R}^{m}_{+}}(\bm{s})=\{\bm{v}\in\mathbb{R}^{m}_{-}\mid v_{i}s_{i}=0,\ i=1,\dots,m\}
\]
for all $\bm{s}\geq\bm{0}$ (and is empty otherwise), it is easy to see that $\bar{\bm{x}}^{\alpha}$ is $\alpha$-KKT-optimal iff $(\bar{\bm{x}}^{\alpha},\bar{\bm{s}}^{\alpha})$ (with $\bar{\bm{s}}^{\alpha}=[\bm{c}(\bar{\bm{x}}^{\alpha})]_{+}$) is KKT-optimal for the slack reformulation, in which case the respective multipliers $\bar{\bm{y}}^{\alpha}$ coincide. More importantly, the following result clarifies how KKT- and $\alpha$-KKT-optimality for problem (1) are interrelated. The result is standard, but its simple proof is nevertheless provided for the sake of self-containedness.

Lemma. The following hold:
1. An $\alpha$-KKT-optimal pair $(\bar{\bm{x}}^{\alpha},\bar{\bm{y}}^{\alpha})$ satisfying $\bm{c}(\bar{\bm{x}}^{\alpha})\leq\bm{0}$ is KKT-optimal.
2. A KKT-optimal pair $(\bar{\bm{x}},\bar{\bm{y}})$ is $\alpha$-KKT-optimal for any $\alpha\geq\|\bar{\bm{y}}\|_{\infty}$.

Proof.

For clarity of exposition, we have schematically reported the KKT and $\alpha$-KKT optimality conditions side by side in the table below. Notice in particular that the dual feasibility $\bar{\bm{y}}\geq\bm{0}$ and the primal optimality $-\operatorname{J}\bm{c}(\bar{\bm{x}})^{\top}\bar{\bm{y}}\in\partial q(\bar{\bm{x}})$ coincide in both notions.

\[
\begin{array}{l|c|c|c|c}
\text{KKT} & -\operatorname{J}\bm{c}(\bar{\bm{x}})^{\top}\bar{\bm{y}}\in\partial q(\bar{\bm{x}}) & \bar{\bm{y}}\geq\bm{0} & \bm{c}(\bar{\bm{x}})\leq\bm{0} & \bar{y}_{i}\,c_{i}(\bar{\bm{x}})=0\\\hline
\alpha\text{-KKT} & -\operatorname{J}\bm{c}(\bar{\bm{x}})^{\top}\bar{\bm{y}}\in\partial q(\bar{\bm{x}}) & \bar{\bm{y}}\geq\bm{0} & \bar{\bm{y}}\leq\alpha\bm{1} & (\alpha-\bar{y}_{i})[c_{i}(\bar{\bm{x}})]_{+}=0,\ \ \bar{y}_{i}\,[c_{i}(\bar{\bm{x}})]_{-}=0
\end{array}
\]
1) Primal feasibility $\bm{c}(\bar{\bm{x}}^{\alpha})\leq\bm{0}$ holds by assumption, and since $\bm{c}(\bar{\bm{x}}^{\alpha})\leq\bm{0}$, the complementarity slackness in the $\alpha$-KKT conditions reduces to $0=[c_{i}(\bar{\bm{x}}^{\alpha})]_{-}\bar{y}_{i}^{\alpha}=-c_{i}(\bar{\bm{x}}^{\alpha})\bar{y}_{i}^{\alpha}$, $i=1,\dots,m$, yielding the corresponding complementarity slackness condition in the KKT conditions.

2) The upper bound $\bar{y}_{i}\leq\alpha$ holds by assumption, and since $\bm{c}(\bar{\bm{x}})\leq\bm{0}$, for every $i$ one has that $0=[c_{i}(\bar{\bm{x}})]_{+}=(\alpha-\bar{y}_{i})[c_{i}(\bar{\bm{x}})]_{+}$ and $0=\bar{y}_{i}c_{i}(\bar{\bm{x}})=-\bar{y}_{i}[c_{i}(\bar{\bm{x}})]_{-}$. ∎

3.2 IP-type barrier reformulation

To carry on with the second modification of the problem, in what follows we fix a barrier satisfying the following requirements.

{assumption} The barrier function $b\colon\mathbb{R}\to(0,\infty]$ is proper, lsc, and twice continuously differentiable on its domain $\operatorname{dom}b=(-\infty,0)$ with $b'>0$ and $b''>0$. Furthermore, $\inf b=0$.

For reasons that will be elaborated on later, convenient choices of barriers are $b(t)=-\frac1t$ and $b(t)=\ln\bigl(1-\frac1t\bigr)$ (both extended as $\infty$ on $\mathbb{R}_+$), see Table 1 in Section 4.1. Once such a barrier $b$ is fixed, in the spirit of interior point methods, given a parameter $\mu>0$ we enforce strict satisfaction of the constraint in (3.1) by considering the following barrier version

\[
\operatorname*{minimize}_{({\bm x},{\bm s})\in\mathbb{R}^{n}\times\mathbb{R}^{m}}\quad q({\bm x})+\alpha\langle{\bm 1},{\bm s}\rangle+\delta_{\mathbb{R}^{m}_{+}}({\bm s})+\mu\sum_{i=1}^{m}b\bigl(c_i({\bm x})-s_i\bigr).
\]
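Both example barriers above comply with the assumption; the elementary derivatives below (a direct computation, recorded here for convenience) verify the positivity and infimum requirements.
\[
b(t)=-\tfrac1t:\quad b'(t)=\tfrac{1}{t^{2}}>0,\quad b''(t)=-\tfrac{2}{t^{3}}>0\ \text{on }(-\infty,0),\quad b(t)\searrow 0\ \text{as }t\to-\infty;
\]
\[
b(t)=\ln\bigl(1-\tfrac1t\bigr):\quad b'(t)=\tfrac{1}{t^{2}-t}>0,\quad b''(t)=\tfrac{1-2t}{(t^{2}-t)^{2}}>0\ \text{on }(-\infty,0),\quad b(t)\searrow 0\ \text{as }t\to-\infty.
\]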

Differently from the IP frameworks of [4, 12], we here enforce a barrier in the relaxed version (3.1), and not on the original problem (1). As such, it is only pairs $({\bm x},{\bm s})$ that need to lie in the interior of the constraints, but ${\bm x}$ is otherwise ‘unconstrained’: for any ${\bm x}\in\mathbb{R}^{n}$, any ${\bm s}>{\bm c}({\bm x})$ (elementwise) yields a pair $({\bm x},{\bm s})$ that satisfies the strict constraint ${\bm c}({\bm x})-{\bm s}<{\bm 0}$. In fact, we may again explicitly minimize with respect to the slack variable ${\bm s}$ by observing that

\[
\operatorname*{arg\,min}_{{\bm s}\in\mathbb{R}^{m}}\Bigl\{\alpha\langle{\bm 1},{\bm s}\rangle+\delta_{\mathbb{R}^{m}_{+}}({\bm s})+\mu\sum_{i=1}^{m}b\bigl(c_i({\bm x})-s_i\bigr)\Bigr\}=\bigl[{\bm c}({\bm x})-({\bm b}^{\ast})'(\nicefrac{\alpha}{\mu})\bigr]_{+}
\tag{1}
\]

holds for every ${\bm x}\in\mathbb{R}^{n}$, where ${\bm b}(t)\coloneqq(b(t),\dots,b(t))$ and similarly conjugation and derivative are meant elementwise. Plugging the optimal ${\bm s}$ into (3.2) results in

\[
\operatorname*{minimize}_{{\bm x}\in\mathbb{R}^{n}}\quad q({\bm x})+\mu\Psi_{\nicefrac{\alpha}{\mu}}\bigl({\bm c}({\bm x})\bigr),
\]

where, for any $\rho^{\ast}>0$, $\Psi_{\rho^{\ast}}\colon\mathbb{R}^{m}\to\mathbb{R}$ is a separable function given by

\[
\Psi_{\rho^{\ast}}({\bm y})\coloneqq\sum_{i=1}^{m}\psi_{\rho^{\ast}}(y_i)
\tag{1a}
\]
with
\[
\psi_{\rho^{\ast}}(t)\coloneqq
\begin{cases}
b(t) & \text{if } b'(t)\le\rho^{\ast},\\
\rho^{\ast}t-b^{\ast}(\rho^{\ast}) & \text{otherwise},
\end{cases}
\tag{1d}
\]
being (globally) Lipschitz differentiable and $\rho^{\ast}$-Lipschitz continuous with derivative
\[
\psi_{\rho^{\ast}}'(t)=\min\{b'(t),\rho^{\ast}\}.
\tag{1e}
\]

A step-by-step derivation of all the identities above is given in A.2 in the appendix.
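As a concrete instance (a direct computation, not needed for the development), specializing to the inverse barrier $b(t)=-\frac1t$ gives closed forms for all the objects above:
\[
b^{\ast}(y)=\sup_{t<0}\bigl\{ty+\tfrac1t\bigr\}=-2\sqrt{y}\quad(y\ge 0),\qquad(b^{\ast})'(y)=-\tfrac{1}{\sqrt{y}},
\]
so the marginalized slack equals $\bigl[{\bm c}({\bm x})+\sqrt{\nicefrac{\mu}{\alpha}}\,{\bm 1}\bigr]_{+}$ and, with $\rho^{\ast}=\nicefrac{\alpha}{\mu}$,
\[
\psi_{\rho^{\ast}}(t)=
\begin{cases}
-\tfrac1t & \text{if } t\le-\tfrac{1}{\sqrt{\rho^{\ast}}},\\
\rho^{\ast}t+2\sqrt{\rho^{\ast}} & \text{otherwise},
\end{cases}
\qquad
\psi_{\rho^{\ast}}'(t)=\min\bigl\{\tfrac{1}{t^{2}},\rho^{\ast}\bigr\}.
\]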

Problem (3.2) is ‘unconstrained’, in the sense that no explicit ambient constraints are provided, yet stationarity notions relative to it bear a close resemblance with KKT-type optimality conditions.

{lemma}

For any $\mu,\alpha>0$ and ${\bm x}\in\mathbb{R}^{n}$

\[
\partial\bigl[q+\mu\Psi_{\nicefrac{\alpha}{\mu}}\circ{\bm c}\bigr]({\bm x})=\partial q({\bm x})+\sum_{i=1}^{m}\min\bigl\{\alpha,\mu\,b'(c_i({\bm x}))\bigr\}\,\nabla c_i({\bm x}).
\]

In particular, $\bar{\bm x}^{\alpha,\mu}\in\mathbb{R}^{n}$ is $\varepsilon$-stationary for (3.2) iff

\[
\operatorname{dist}_{\partial q(\bar{\bm x}^{\alpha,\mu})}\bigl(-\operatorname{J}{\bm c}(\bar{\bm x}^{\alpha,\mu})^{\top}\bar{\bm y}^{\alpha,\mu}\bigr)\le\varepsilon,
\quad\text{where}\quad
\bar y_i^{\alpha,\mu}\coloneqq\min\bigl\{\alpha,\mu\,b'(c_i(\bar{\bm x}^{\alpha,\mu}))\bigr\}\in(0,\alpha].
\]
Proof.

Since $\mu\Psi_{\nicefrac{\alpha}{\mu}}\circ{\bm c}$ is differentiable, we just need to confirm that its gradient equals the sum in the display. We have

\[
\nabla\bigl[\mu\Psi_{\nicefrac{\alpha}{\mu}}\circ{\bm c}\bigr]({\bm x})=\mu\sum_{i=1}^{m}\nabla\bigl[\psi_{\nicefrac{\alpha}{\mu}}\circ c_i\bigr]({\bm x})=\mu\sum_{i=1}^{m}\psi'_{\nicefrac{\alpha}{\mu}}(c_i({\bm x}))\,\nabla c_i({\bm x}),
\]

where the first identity owes to the definition of $\Psi_{\nicefrac{\alpha}{\mu}}$. Using (1e) yields the claimed expression. ∎

As is apparent from (1d), $\psi_{\nicefrac{\alpha}{\mu}}$ coincides with $b$ up to the point where its slope equals $\nicefrac{\alpha}{\mu}$, and after that point it reduces to its tangent line. As such, $\psi_{\nicefrac{\alpha}{\mu}}$ coincides with a McShane Lipschitz (and globally Lipschitz differentiable) extension [20] of a portion of the barrier $b$, cf. Fig. 1(a). As Fig. 1(b) instead shows, by introducing a scaling factor $\mu$ the linear part has slope $\alpha$, independently of $\mu$, and the sharp $L^{1}$ penalty $\alpha[{}\cdot{}]_{+}$ coincides with the limiting case as $\mu$ is driven to 0. These details are formalized next.

(a) Lipschitz extension of a portion of $b$. Function $\psi_{\rho^{\ast}}$ agrees with the barrier until its slope equals $\rho^{\ast}$ (at $\rho\coloneqq(b^{\ast})'(\rho^{\ast})$), and then continues linearly with slope $\rho^{\ast}$. Apparently, $\psi_{\rho^{\ast}}\nearrow b$ as $\rho^{\ast}\nearrow\infty$.
(b) Intermediate between barrier and $L^{1}$ penalty. As $\mu\searrow 0$, $\mu\psi_{\nicefrac{\alpha}{\mu}}$ uniformly converges to the sharp $L^{1}$ penalty $\alpha[{}\cdot{}]_{+}$ while maintaining the same slope $\alpha$ after the breakpoints.
Figure 1: Graph of $\psi_{\rho^{\ast}}$ for different values of $\rho^{\ast}$ (left) and graph of $\mu\psi_{\nicefrac{\alpha}{\mu}}$ for different values of $\mu$ (right). These examples employ $\alpha\equiv 1$ constant and the inverse barrier $b(t)=-\frac1t+\delta_{(-\infty,0)}(t)$.
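The curves in Figure 1 are easy to reproduce numerically. The following snippet is a small illustrative sketch (ours, not tied to any particular implementation), specialized to the inverse barrier; it evaluates $\mu\psi_{\nicefrac{\alpha}{\mu}}$ for decreasing $\mu$ and compares it with the sharp penalty $\alpha[{}\cdot{}]_{+}$, in line with the limiting behavior formalized next.
\begin{verbatim}
import numpy as np

# Illustration of (1d)-(1e) for the inverse barrier b(t) = -1/t on (-inf, 0):
# b'(t) = 1/t**2 and b*(y) = -2*sqrt(y), so psi_{rho*} follows b up to the
# breakpoint t = -1/sqrt(rho*) and continues linearly with slope rho* afterwards.

def psi(t, rho_star):
    bp = -1.0 / np.sqrt(rho_star)                   # point where b'(t) = rho*
    if t <= bp:
        return -1.0 / t                             # barrier branch b(t)
    return rho_star * t + 2.0 * np.sqrt(rho_star)   # tangent branch rho*.t - b*(rho*)

def psi_prime(t, rho_star):
    return min(1.0 / t ** 2, rho_star) if t < 0 else rho_star

if __name__ == "__main__":
    alpha, ts = 1.0, [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0]
    for mu in (1.0, 0.1, 0.001):
        # mu * psi_{alpha/mu} decreases pointwise towards alpha*[t]_+ as mu -> 0
        print(f"mu={mu:6.3f}", [round(mu * psi(t, alpha / mu), 3) for t in ts])
    print("limit     ", [round(alpha * max(t, 0.0), 3) for t in ts])
\end{verbatim}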
{lemma}

[Limiting behavior of $\psi_{\nicefrac{\alpha}{\mu}}$] The following hold:

  1. $\psi_{\rho^{\ast}}\nearrow b$ pointwise as $\rho^{\ast}\nearrow\infty$.

  2. $\psi_{\rho^{\ast}}/\rho^{\ast}\searrow[{}\cdot{}]_{+}$ pointwise as $\rho^{\ast}\nearrow\infty$.

  3. For any $\alpha>0$, $\mu\psi_{\nicefrac{\alpha}{\mu}}\searrow\alpha[{}\cdot{}]_{+}$ pointwise as $\mu\searrow 0$.

Proof.
  • 1)  Follows from the expression (1d) by observing that $t\mapsto\rho^{\ast}t-b^{\ast}(\rho^{\ast})$ is tangent to the graph of $b$ (at $\rho\coloneqq(b^{\ast})'(\rho^{\ast})$), and is thus globally majorized by $b$ because of convexity.

  • 2)  From the relation $\psi_{\rho^{\ast}}=\bigl(b^{\ast}+\delta_{[0,\rho^{\ast}]}\bigr)^{\ast}$ as in (19c) and the conjugacy calculus rule of [1, Prop. 13.23(ii)] it follows that

    \[
    \frac{\psi_{\rho^{\ast}}}{\rho^{\ast}}=\Bigl[\tfrac{1}{\rho^{\ast}}\bigl(b^{\ast}(\rho^{\ast}{}\cdot{})+\delta_{[0,\rho^{\ast}]}(\rho^{\ast}{}\cdot{})\bigr)\Bigr]^{\ast}=\Bigl[\tfrac{1}{\rho^{\ast}}b^{\ast}(\rho^{\ast}{}\cdot{})+\delta_{[0,1]}\Bigr]^{\ast}.
    \]

    To show that $\psi_{\rho^{\ast}}/\rho^{\ast}$ is pointwise decreasing as $\rho^{\ast}\nearrow\infty$, it thus suffices to show that $\tfrac{1}{\rho^{\ast}}b^{\ast}(\rho^{\ast}{}\cdot{})$ is pointwise increasing, owing to the relation $f\le g\Rightarrow f^{\ast}\ge g^{\ast}$ [1, Prop. 13.16(ii)]. On $(-\infty,0)$ one has that $\tfrac{1}{\rho^{\ast}}b^{\ast}(\rho^{\ast}{}\cdot{})\equiv\infty$ independently of $\rho^{\ast}$, as it follows from Item 2. Moreover, $b^{\ast}(0)=-\inf b=0$. Finally, for $t^{*}>0$ one has that

    \[
    \frac{\operatorname{d}}{\operatorname{d}\rho^{\ast}}\Bigl[\tfrac{b^{\ast}(\rho^{\ast}t^{*})}{\rho^{\ast}}\Bigr]=\frac{\rho^{\ast}t^{*}(b^{\ast})'(\rho^{\ast}t^{*})-b^{\ast}(\rho^{\ast}t^{*})}{(\rho^{\ast})^{2}}=\frac{b\bigl((b^{\ast})'(\rho^{\ast}t^{*})\bigr)}{(\rho^{\ast})^{2}}>0,
    \]

    where the second identity follows from Item 3. This confirms monotonicity on $\mathbb{R}$ as $\rho^{\ast}\nearrow\infty$.

    It remains to show that the limit (exists and) equals $[{}\cdot{}]_{+}$. Since $\tfrac{b^{\ast}(\rho^{\ast}t^{*})}{\rho^{\ast}}\ge 0$, the monotonically decreasing behavior just proven implies that the limit exists and is pointwise nonnegative. For $t<0$, from the definition (1d) we have

    \[
    \lim_{\rho^{\ast}\to\infty}\frac{\psi_{\rho^{\ast}}(t)}{\rho^{\ast}}=\lim_{\rho^{\ast}\to\infty}\frac{b(t)}{\rho^{\ast}}=0.
    \]

    For $t\ge 0$ we have $\frac{\psi_{\rho^{\ast}}(t)}{\rho^{\ast}}=t-\frac{b^{\ast}(\rho^{\ast})}{\rho^{\ast}}$, which converges to $t$ as $\rho^{\ast}\to\infty$ by virtue of (17).

  • 3)  This follows from assertion 2, since $\mu\psi_{\nicefrac{\alpha}{\mu}}=\alpha\,\psi_{\nicefrac{\alpha}{\mu}}/(\nicefrac{\alpha}{\mu})$, and $\nicefrac{\alpha}{\mu}\nearrow\infty$ as $\mu\searrow 0$. ∎

In support of our claims above, we emphasize that the smooth penalty $\Psi_{\nicefrac{\alpha}{\mu}}\circ{\bm c}$ is convex whenever all the components $c_i$ are, and demonstrate how Lipschitz differentiability of ${\bm c}$ is usually preserved after a composition with $\psi_{\nicefrac{\alpha}{\mu}}$.

{lemma}

[Properties of $\psi_{\nicefrac{\alpha}{\mu}}\circ{\bm c}$] Let $\alpha,\mu>0$ be fixed and let $b$ comply with Section 3.2.

  1. If $c_i$ is convex, then $\psi_{\nicefrac{\alpha}{\mu}}\circ c_i$ is convex.

  2. If $c_i$ has Lipschitz-continuous gradient, then so does $\psi_{\nicefrac{\alpha}{\mu}}\circ c_i$ provided that $c_i$ is Lipschitz continuous on closed subsets of $\{{\bm x}\in\mathbb{R}^{n}\mid c_i({\bm x})<0\}$ (as is the case when $c_i$ is lower bounded [17, Lem. 2.3]).

Proof.

The first claim about convexity is straightforward, since $\psi_{\nicefrac{\alpha}{\mu}}\circ c_i$ amounts to the composition of the convex and increasing function $\psi_{\nicefrac{\alpha}{\mu}}$ with the convex function $c_i$. We next prove the statement about Lipschitz differentiability, thereby assuming that $\nabla c_i$ is Lipschitz on $\mathbb{R}^{n}$ with modulus $L\ge 0$, while $c_i$ is Lipschitz on $\{{\bm x}\in\mathbb{R}^{n}\mid c_i({\bm x})\le\rho\}$ with modulus $\ell\ge 0$, where $\rho\coloneqq(b^{\ast})'(\nicefrac{\alpha}{\mu})<0$. Recall that $\psi_{\nicefrac{\alpha}{\mu}}$ is $(\nicefrac{\alpha}{\mu})$-Lipschitz continuous, coincides with $b$ on $(-\infty,\rho]$, and is then linear with slope $\nicefrac{\alpha}{\mu}$ on $(\rho,\infty)$, cf. (1e). Fix ${\bm x},{\bm y}\in\mathbb{R}^{n}$, and without loss of generality assume that $c_i({\bm x})\le c_i({\bm y})$. We have

\begin{align*}
\|\nabla(\psi_{\nicefrac{\alpha}{\mu}}\circ c_i)({\bm x})-\nabla(\psi_{\nicefrac{\alpha}{\mu}}\circ c_i)({\bm y})\|
&=\|\psi'_{\nicefrac{\alpha}{\mu}}(c_i({\bm x}))\nabla c_i({\bm x})-\psi'_{\nicefrac{\alpha}{\mu}}(c_i({\bm y}))\nabla c_i({\bm y})\|\\
&\le\psi'_{\nicefrac{\alpha}{\mu}}(c_i({\bm x}))\,\|\nabla c_i({\bm x})-\nabla c_i({\bm y})\|+\|\nabla c_i({\bm x})\|\,\bigl|\psi'_{\nicefrac{\alpha}{\mu}}(c_i({\bm x}))-\psi'_{\nicefrac{\alpha}{\mu}}(c_i({\bm y}))\bigr|\\
&\le\tfrac{\alpha}{\mu}L\,\|{\bm x}-{\bm y}\|+\|\nabla c_i({\bm x})\|\,\bigl|\psi'_{\nicefrac{\alpha}{\mu}}(c_i({\bm x}))-\psi'_{\nicefrac{\alpha}{\mu}}(c_i({\bm y}))\bigr|.
\end{align*}

It remains to account for the second term in the last sum. If $c_i({\bm x})\le c_i({\bm y})\le\rho$, then $\psi_{\nicefrac{\alpha}{\mu}}$ coincides with $b$ in all occurrences, and the term can be upper bounded as $B\ell^{2}\|{\bm x}-{\bm y}\|$, where $B\coloneqq\max_{(-\infty,\rho]}b''$ is a Lipschitz modulus for $b'$ on $(-\infty,\rho]$. If $c_i({\bm x})\le\rho<c_i({\bm y})$, then $\psi'_{\nicefrac{\alpha}{\mu}}(c_i({\bm y}))=\psi'_{\nicefrac{\alpha}{\mu}}(\rho)$ and, by continuity, there exists $t\in[0,1]$ such that $c_i({\bm x}+t({\bm y}-{\bm x}))=\rho$, so that $\psi'_{\nicefrac{\alpha}{\mu}}(c_i({\bm x}+t({\bm y}-{\bm x})))=\psi'_{\nicefrac{\alpha}{\mu}}(\rho)$, resulting in the same bound $Bt\ell^{2}\|{\bm x}-{\bm y}\|\le B\ell^{2}\|{\bm x}-{\bm y}\|$. Lastly, if $\rho\le c_i({\bm x})\le c_i({\bm y})$ then the last term is zero. In all cases we conclude that

\[
\|\nabla(\psi_{\nicefrac{\alpha}{\mu}}\circ c_i)({\bm x})-\nabla(\psi_{\nicefrac{\alpha}{\mu}}\circ c_i)({\bm y})\|\le\bigl(\tfrac{\alpha}{\mu}L+B\ell^{2}\bigr)\|{\bm x}-{\bm y}\|\qquad\forall{\bm x},{\bm y}\in\mathbb{R}^{n},
\]

proving the claim. ∎

The requirements on $c_i$ other than Lipschitz differentiability in Item 2 are virtually negligible, since lower boundedness can be artificially imposed by replacing $c_i$ with, say, $r\circ c_i$ where $r(t)\coloneqq\sqrt{[t+1]_{+}^{2}+3}-2$. Inspired by the penalty function adopted in [23], $r$ is lower bounded and such that $r(t)\le 0$ iff $t\le 0$; in addition, having $r'(0)=\nicefrac12\neq 0$, its adoption does not affect qualifications of active constraints. A Hessian inspection reveals that Lipschitz differentiability is preserved for ‘reasonable’ $c_i$, e.g. whenever $\frac{\|\nabla c_i({\bm x})\|^{2}}{\max\{c_i({\bm x})^{3},1\}}$ is bounded on the set $\{{\bm x}\mid c_i({\bm x})\ge-1\}$, as it happens for quadratic functions. This is in stark contrast with methods such as ALM, in which Lipschitz differentiability is typically lost in the composition with quadratic penalties.
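The stated properties of $r$ follow by direct computation, recorded here for convenience:
\[
r'(t)=\frac{[t+1]_{+}}{\sqrt{[t+1]_{+}^{2}+3}}\in[0,1),\qquad r(t)=\sqrt{3}-2<0\ \text{for }t\le-1,\qquad r(0)=\sqrt{4}-2=0,\qquad r'(0)=\tfrac12,
\]
so $r$ is bounded below by $\sqrt{3}-2$, nondecreasing (strictly increasing on $(-1,\infty)$), and satisfies $r(t)\le 0$ iff $t\le 0$.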

4 Algorithmic framework

Algorithm 1 General framework
1.1: tolerances $\epsilon_{\rm p},\epsilon_{\rm d}\ge 0$;  parameters $\alpha_0,\mu_0>0$ and $\varepsilon_0\ge\epsilon_{\rm d}$;  ratios $\delta_\alpha>1$ and $\delta_\varepsilon,\delta_\mu\in(0,1)$
1.2: for $k=0,1,2,\dots$ do
1.3:   Find an $\varepsilon_k$-stationary point ${\bm x}^k$ for (3.2) with $(\alpha,\mu)=(\alpha_k,\mu_k)$
1.4:   $y_i^k=\min\{\alpha_k,\mu_k\,b'(c_i({\bm x}^k))\}$, $i=1,\dots,m$   % $y_i^k=\mu_k\psi'_{\nicefrac{\alpha_k}{\mu_k}}(c_i({\bm x}^k))$, cf. Section 3.2
1.5:   $p_k=\|[{\bm c}({\bm x}^k)]_+\|_\infty$   % constraints violation
1.6:   $s_k=\|\min\{{\bm y}^k,[{\bm c}({\bm x}^k)]_-\}\|_\infty$   % slackness violation
1.7:   if $\varepsilon_k\le\epsilon_{\rm d}$, $p_k\le\epsilon_{\rm p}$, and $s_k\le\epsilon_{\rm p}$ then
1.8:     return $({\bm x}^k,{\bm y}^k)$   % $(\epsilon_{\rm p},\epsilon_{\rm d})$-KKT pair for (1)
1.9:   $\varepsilon_{k+1}=\max\{\delta_\varepsilon\varepsilon_k,\,\epsilon_{\rm d}\}$
1.10:  if $p_k>\max\{\epsilon_{\rm p},\,-2m\tfrac{\mu_k}{\alpha_k}b^\ast(\tfrac{\alpha_k}{\mu_k})\}$ then
1.11:    $\alpha_{k+1}=\delta_\alpha\alpha_k$ and $\mu_{k+1}=\begin{cases}\delta_\mu\mu_k&\text{if }s_k>\epsilon_{\rm p}\\ \mu_k&\text{otherwise}\end{cases}$
1.12:  else
1.13:    $\alpha_{k+1}=\alpha_k$ and $\mu_{k+1}=\delta_\mu\mu_k$

As shown in the previous section, the cost function in the smoothed problem (3.2) pointwise converges to the original hard-constrained cost $q+\delta_{\mathbb{R}^m_-}\circ{\bm c}$ of (1) as $\mu\searrow 0$ and $\alpha\nearrow\infty$. Following the penalty method rationale, this motivates solving (up to approximate local optimality) instances of (3.2) for progressively smaller values of $\mu$ and larger values of $\alpha$. This is the leading idea of the algorithmic framework of Algorithm 1 presented in this section, which also implements suitable update rules for the coefficients, ensuring that the output satisfies suitable optimality conditions for the original problem (1). In fact, a careful design of the update rule for the $L^1$ penalization parameter $\alpha$ in (3.2) prevents this coefficient from diverging under favorable conditions on the problem. The reason behind the involvement of the conjugate $b^\ast$ in the update criterion at Step 1.10 will be revealed in Sections 4.1 and 4.2 through a systematic study of the properties of the barrier $b$ in the generality of Section 1 as well as when specialized to the convex case.

Algorithm 1 is not tied to any particular solver for addressing each instance of (3.2). Whenever $q$ amounts to the sum of a differentiable and a proximable function (in the sense that its proximal mapping is easily computable), such structure is retained by the cost function in (3.2), indicating that proximal-gradient based methods are suitable candidates. This was also the case in the purely interior-point based IPprox of [12], which considers a plain proximal gradient with a backtracking routine for selecting the stepsizes. Differently from the subproblems of IPprox, in which the differentiable term is extended-real valued, the differentiable term in (3.2) is smooth on the whole of $\mathbb{R}^n$. This enables the employment of more sophisticated proximal-gradient-type algorithms such as PANOC+ [11] that make use of higher-order information to considerably enhance convergence speed. This claim will be substantiated with numerical evidence in Section 5; in this section, we instead focus on properties of the outer Algorithm 1 that are independent of the inner solver.
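For concreteness, the following sketch is our own illustration of the outer loop of Algorithm 1, specialized to the inverse barrier; the names `penalty_barrier` and `inner_solve` are placeholders and not part of any released code, and `inner_solve` stands for a user-supplied routine (e.g. a proximal-gradient method) returning an $\varepsilon$-stationary point of (3.2). The barrier enters only through $b'$ in Step 1.4 and $b^\ast$ in Step 1.10, so swapping barriers amounts to replacing the two helper functions.
\begin{verbatim}
import numpy as np

def b_prime(t):                 # inverse barrier b(t) = -1/t: b'(t) = 1/t**2 on (-inf, 0)
    return 1.0 / t ** 2

def b_conj(y):                  # its conjugate: b*(y) = -2*sqrt(y) for y >= 0
    return -2.0 * np.sqrt(y)

def penalty_barrier(c, inner_solve, x0, eps_p=1e-6, eps_d=1e-6,
                    alpha=1.0, mu=1.0, eps=1.0,
                    d_alpha=10.0, d_eps=0.5, d_mu=0.5, max_outer=100):
    x, y = np.asarray(x0, dtype=float), None
    for _ in range(max_outer):
        x = inner_solve(x, alpha, mu, eps)                 # step 1.3
        cx = np.asarray(c(x), dtype=float)
        m = cx.size
        # step 1.4: y_i = mu * psi'_{alpha/mu}(c_i(x)) = min{alpha, mu*b'(c_i(x))}
        cx_neg = np.minimum(cx, -1e-12)                    # clamp so b' is evaluated on (-inf, 0)
        y = np.where(cx < 0.0, np.minimum(alpha, mu * b_prime(cx_neg)), alpha)
        p = np.max(np.maximum(cx, 0.0))                    # step 1.5: constraint violation
        s = np.max(np.minimum(y, np.maximum(-cx, 0.0)))    # step 1.6: slackness violation
        if eps <= eps_d and p <= eps_p and s <= eps_p:     # step 1.7
            break                                          # (eps_p, eps_d)-KKT pair found
        eps = max(d_eps * eps, eps_d)                      # step 1.9
        if p > max(eps_p, -2.0 * m * (mu / alpha) * b_conj(alpha / mu)):  # step 1.10
            alpha *= d_alpha                               # step 1.11
            if s > eps_p:
                mu *= d_mu
        else:                                              # step 1.13
            mu *= d_mu
    return x, y
\end{verbatim}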

{lemma}

[properties of the iterates] Suppose that Sections 1 and 3.2 hold, and consider the iterates generated by Algorithm 1. At every iteration $k$ the following hold:

  1. ${\bm 0}\le{\bm y}^k\le\alpha_k{\bm 1}$.

  2. (${\bm x}^k\in\operatorname{dom}q$ and) $\operatorname{dist}_{\partial q({\bm x}^k)}(-\operatorname{J}{\bm c}({\bm x}^k)^\top{\bm y}^k)\le\varepsilon_k$.

  3. If ($\epsilon_{\rm p}>0$ and) $\mu_k\le\frac{\epsilon_{\rm p}}{b'(-\epsilon_{\rm p})}$, then $s_k\le\epsilon_{\rm p}$.

  4. For $k\ge 1$, either $\alpha_k=\delta_\alpha\alpha_{k-1}$ or $\mu_k=\delta_\mu\mu_{k-1}$ (possibly both); in particular, letting $\rho^\ast_k\coloneqq\nicefrac{\alpha_k}{\mu_k}$ and $\delta_{\rho^\ast}\coloneqq\min\{\delta_\alpha,\delta_\mu^{-1}\}$ it holds that $\rho^\ast_k\ge\delta_{\rho^\ast}\rho^\ast_{k-1}$.

Proof.

Recall that 𝒚k𝟎superscript𝒚𝑘0{\bm{y}}^{k}\geq{\bm{0}}bold_italic_y start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ≥ bold_0 holds by construction for every k𝑘kitalic_k, since >0{}^{\prime}{}>0start_FLOATSUPERSCRIPT ′ end_FLOATSUPERSCRIPT > 0, from which assertion ?? follows. Regarding assertion ??, by Section 3.2 this is precisely the required stationarity for 𝒙ksuperscript𝒙𝑘{\bm{x}}^{k}bold_italic_x start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT. Assertion ?? is obvious by observing that whenever αk+1=αksubscript𝛼𝑘1subscript𝛼𝑘\alpha_{k+1}=\alpha_{k}italic_α start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = italic_α start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT the update μk+1=δμμksubscript𝜇𝑘1subscript𝛿𝜇subscript𝜇𝑘\mu_{k+1}=\delta_{\mu}\mu_{k}italic_μ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = italic_δ start_POSTSUBSCRIPT italic_μ end_POSTSUBSCRIPT italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT is enforced.

We finally turn to assertion ??, and suppose that μkϵp(ϵp)\mu_{k}\leq\frac{\epsilon_{\rm p}}{{}^{\prime}{}(-\epsilon_{\rm p})}italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ≤ divide start_ARG italic_ϵ start_POSTSUBSCRIPT roman_p end_POSTSUBSCRIPT end_ARG start_ARG start_FLOATSUPERSCRIPT ′ end_FLOATSUPERSCRIPT ( - italic_ϵ start_POSTSUBSCRIPT roman_p end_POSTSUBSCRIPT ) end_ARG. Then, for all i𝑖iitalic_i such that [ci(𝒙k)]>ϵpsubscriptdelimited-[]subscript𝑐𝑖superscript𝒙𝑘subscriptitalic-ϵp[{{c}}_{i}({\bm{x}}^{k})]_{-}>\epsilon_{\rm p}[ italic_c start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ) ] start_POSTSUBSCRIPT - end_POSTSUBSCRIPT > italic_ϵ start_POSTSUBSCRIPT roman_p end_POSTSUBSCRIPT (or, equivalently, ci(𝒙k)<ϵpsubscript𝑐𝑖superscript𝒙𝑘subscriptitalic-ϵp{{c}}_{i}({\bm{x}}^{k})<-\epsilon_{\rm p}italic_c start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ) < - italic_ϵ start_POSTSUBSCRIPT roman_p end_POSTSUBSCRIPT), one has

\[ y_i^k\leq\mu_k\,\eulerb{}'(c_i({\bm x}^k))\leq\mu_k\,\eulerb{}'(-\epsilon_{\rm p})\leq\epsilon_{\rm p}, \]

where the first inequality follows from the definition of ${\bm y}^k$ and the second one owes to the monotonicity of $\eulerb{}'$. Thus, for all $i$ at least one among $y_i^k$ and $[c_i({\bm x}^k)]_-$ is not larger than $\epsilon_{\rm p}$, proving that $s_k=\max_{i=1,\dots,m}\min\{y_i^k,[c_i({\bm x}^k)]_-\}\leq\epsilon_{\rm p}$. ∎
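
For concreteness, the complementarity measure $s_k$ appearing above can be evaluated as in the following sketch. The multiplier model $y_i^k=\mu_k\eulerb{}'(c_i({\bm x}^k))$ with the logarithmic barrier $\eulerb{}(t)=-\ln(-t)$ is an illustrative assumption of ours and need not coincide with the actual construction in Algorithm 1.

```python
import numpy as np

def complementarity_measure(y, c):
    """s = max_i min{ y_i, [c_i(x)]_- }, with [t]_- = max{0, -t}."""
    return np.max(np.minimum(y, np.maximum(0.0, -c)))

# Illustration of the last assertion: with the (assumed) logarithmic barrier
# b(t) = -log(-t), one has b'(t) = 1/(-t), so b'(-eps_p) = 1/eps_p and the
# threshold mu <= eps_p / b'(-eps_p) becomes mu <= eps_p**2.
eps_p = 1e-3
c_val = np.array([-2.0, -0.5, -1e-4])   # strictly feasible constraint values
mu    = eps_p**2                        # mu at the threshold
y_val = mu / (-c_val)                   # y_i = mu * b'(c_i(x))
assert complementarity_measure(y_val, c_val) <= eps_p
```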

{corollary}

[stationarity of feasible limit points] Let the assumptions of Sections 1 and 3.2 hold, and consider the iterates generated by Algorithm 1. If the algorithm runs indefinitely, then $-\frac{\mu_k}{\alpha_k}\eulerb{}^\ast(\alpha_k/\mu_k)\searrow 0$ as $k\to\infty$, and any accumulation point $\bar{\bm x}$ of $({\bm x}^k)_{k\in\mathbb N}$ satisfying ${\bm c}(\bar{\bm x})\leq{\bm 0}$ is optimal for (1) in the sense of Section 2.2.

Proof.

The monotonic vanishing of $-\frac{\mu_k}{\alpha_k}\eulerb{}^\ast(\alpha_k/\mu_k)$ follows from the preceding lemma together with the monotonicity of $t^\ast\mapsto-\eulerb{}^\ast(t^\ast)/t^\ast$. Suppose that $({\bm x}^k)_{k\in K}\to\bar{\bm x}$ with ${\bm c}(\bar{\bm x})\leq{\bm 0}$, and let $i$ be such that $c_i(\bar{\bm x})<0$ (if no such $i$ exists, there is nothing to show). According to Sections 2.2 and 1, it suffices to show that $(y_i^k)_{k\in K}\to 0$; in turn, by the definition of ${\bm y}^k$ and the continuity of ${\bm c}$, it suffices to show that $\mu_k\searrow 0$. If $\epsilon_{\rm p}>0$, then continuity of ${\bm c}$ implies that $\alpha_{k+1}=\alpha_k$ for all $k\in K$ large enough, hence, by the preceding lemma, $\mu_{k+1}=\delta_\mu\mu_k$ for all such $k$. Since $(\mu_k)_{k\in\mathbb N}$ is monotone, in this case $\mu_k\searrow 0$ as $k\to\infty$. Suppose instead that $\epsilon_{\rm p}=0$ and, to arrive at a contradiction, that $\mu_k$ is asymptotically constant. Then $s_k\leq\epsilon_{\rm p}=0$ eventually always holds, which is a contradiction since

\[ s_k\geq\min\{y_i^k,[c_i({\bm x}^k)]_-\}\geq\min\{y_i^k,-\tfrac12 c_i(\bar{\bm x})\}>0\qquad\text{for all }k\in K\text{ large,} \]

where the first inequality follows from the definition of $s_k$, cf. Step 1.6, the second one holds for $k\in K$ large since $c_i({\bm x}^k)\to c_i(\bar{\bm x})<0$ as $K\ni k\to\infty$, and the last one because ${\bm y}^k>{\bm 0}$. ∎

The update rule for the penalty parameter does not demand (approximate) feasibility, but relies on the relaxed condition at Step 1.10. By (17), the second term vanishes as $\alpha/\mu\to\infty$, so the penalty parameter is eventually increased as needed to achieve $\epsilon_{\rm p}$-feasibility. Relaxing this condition by a quantity involving the conjugate $\eulerb{}^\ast$ mitigates the growth of $\alpha$. At the same time, under suitable choices of the barrier $\eulerb{}$, it ensures that the penalty parameter remains unchanged only if the constraint violation stays within a controlled range, as will be demonstrated in Section 4.1.
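
The resulting interplay between the two parameters can be sketched as follows. This is a minimal reconstruction based solely on the surrounding discussion: the trigger conditions, the logarithmic barrier $\eulerb{}(t)=-\ln(-t)$ (for which $-\eulerb{}^\ast(s)/s=(1+\ln s)/s$), and all names are assumptions of ours and are not meant to reproduce the precise statement of Step 1.10.

```python
import numpy as np

def neg_conj_over_rho(rho):
    # -b*(rho)/rho for the assumed logarithmic barrier b(t) = -log(-t)
    return (1.0 + np.log(rho)) / rho

def update_parameters(p_k, s_k, alpha_k, mu_k, m,
                      eps_p=1e-6, delta_alpha=10.0, delta_mu=0.5):
    """Hypothetical reconstruction of the penalty/barrier parameter update."""
    rho_k = alpha_k / mu_k
    # increase the penalty parameter only if the infeasibility p_k exceeds both
    # the tolerance and the conjugate-based relaxation term
    if p_k > max(eps_p, 2 * m * neg_conj_over_rho(rho_k)):
        alpha_next = delta_alpha * alpha_k
    else:
        alpha_next = alpha_k
    # decrease the barrier parameter whenever the complementarity measure s_k
    # is too large or the penalty parameter was left unchanged, so that at
    # least one of the two parameters is updated at every iteration
    if s_k > eps_p or alpha_next == alpha_k:
        mu_next = delta_mu * mu_k
    else:
        mu_next = mu_k
    return alpha_next, mu_next
```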

{theorem}

Suppose that the assumptions of Sections 1 and 3.2 hold, and consider the iterates generated by Algorithm 1 with $\epsilon_{\rm p},\epsilon_{\rm d}>0$. Then, $\min_k\mu_k>0$ and exactly one of the following scenarios occurs:

  1. the algorithm terminates and returns a pair $({\bm x}^k,{\bm y}^k)$ that is $(\epsilon_{\rm p},\epsilon_{\rm d})$-KKT-stationary for (1);

  2. it runs indefinitely with $s_k\leq\epsilon_{\rm p}<p_k$ for all $k$ large enough, and $(\alpha_k)_{k\in\mathbb N}\nearrow\infty$.

In the latter case, if $\operatorname{dom}q$ is closed, then any accumulation point $\bar{\bm x}$ of $({\bm x}^k)_{k\in\mathbb N}$ is such that $(\bar{\bm x},q(\bar{\bm x}))$ is KKT-stationary for the feasibility problem

\[ \operatorname*{minimize}_{({\bm x},t)\in\operatorname{epi}q}\ \|[{\bm c}({\bm x})]_+\|_1, \]  (2)

in the sense that $({\bm 0},0)\in\partial\|[{\bm c}(\bar{\bm x})]_+\|_1\times\{0\}+\operatorname{N}_{\operatorname{epi}q}(\bar{\bm x},q(\bar{\bm x}))$.

Proof.

Since $\mu_{k+1}\leq\mu_k$ for all $k$ and $\mu_k$ is linearly reduced whenever $s_k>\epsilon_{\rm p}$, if $s_k>\epsilon_{\rm p}$ occurred infinitely often we would have $\mu_k\searrow 0$, and then the last assertion of the preceding lemma would eventually yield $s_k\leq\epsilon_{\rm p}$, a contradiction. We conclude that (either the algorithm terminates or) $s_k\leq\epsilon_{\rm p}$ eventually always holds.

If the algorithm returns a pair $({\bm x}^k,{\bm y}^k)$, then compliance with the termination criteria ensures that $({\bm x}^k,{\bm y}^k)$ meets all conditions in Section 2.2, and hence that it is $(\epsilon_{\rm p},\epsilon_{\rm d})$-KKT-stationary for (1).

Suppose instead that the algorithm does not terminate. Clearly, $\varepsilon_k=\epsilon_{\rm d}$ holds for $k$ large enough, so that the only unmet termination criterion is eventually $p_k\leq\epsilon_{\rm p}$; therefore, $p_k>\epsilon_{\rm p}$ holds for every $k$ large enough. Since $\rho^\ast_k\nearrow\infty$ by the preceding lemma and $-\eulerb{}^\ast(t^\ast)/t^\ast$ vanishes as $t^\ast\to\infty$, the quantity $-\eulerb{}^\ast(\rho^\ast_k)/\rho^\ast_k$ eventually drops below $\epsilon_{\rm p}$, implying that the condition for increasing $\alpha_{k+1}$ at Step 1.10 reduces to $p_k>\epsilon_{\rm p}$. Having shown that this is eventually always the case, $\alpha_{k+1}=\delta_\alpha\alpha_k$ holds for all $k$ large, $\alpha_k\nearrow\infty$, and $\mu_k$ is eventually never updated, cf. Step 1.10.

To conclude, suppose that $\operatorname{dom}q$ is closed. By Section 3.2, for every $k$ there exists ${\bm\eta}^k\in\mathbb R^n$ with $\|{\bm\eta}^k\|\leq\epsilon_{\rm d}$ such that

\[ {\bm\eta}^k-\operatorname{J}{\bm c}({\bm x}^k)^\top{\bm y}^k\in\partial q({\bm x}^k). \]

Let $\bar{\bm x}$ be the limit of a subsequence $({\bm x}^k)_{k\in K}$ and, up to extracting a further subsequence, let $\bar{\bm\lambda}$ be the limit of $(\frac{1}{\alpha_k}{\bm y}^k)_{k\in K}$. By the definition of ${\bm y}^k$ and the continuity of ${\bm c}$,

\[ \bar\lambda_i\;\begin{cases}=0&\text{if }c_i(\bar{\bm x})<0,\\ =1&\text{if }c_i(\bar{\bm x})>0,\\ \in[0,1]&\text{if }c_i(\bar{\bm x})=0.\end{cases} \]

Equivalently,

\[ \bar\lambda_i\in\partial[\,\cdot\,]_+(c_i(\bar{\bm x})). \]  (3)

Since $\operatorname{dom}q$ is closed and ${\bm x}^k\in\operatorname{dom}q$ for all $k$, one has that $q(\bar{\bm x})<\infty$. Moreover, it follows from Assumption 1 that $q({\bm x}^k)\to q(\bar{\bm x})$ as $K\ni k\to\infty$, hence that

\[ -\operatorname{J}{\bm c}(\bar{\bm x})^\top\bar{\bm\lambda}\in\partial^\infty q(\bar{\bm x})\quad\Rightarrow\quad\bigl(-\operatorname{J}{\bm c}(\bar{\bm x})^\top\bar{\bm\lambda},0\bigr)\in\operatorname{N}_{\operatorname{epi}q}(\bar{\bm x},q(\bar{\bm x}))=\partial\delta_{\operatorname{epi}q}(\bar{\bm x},q(\bar{\bm x})), \]  (4)

where the inclusion follows from [22, Thm. 8.9]. Since

\[ \partial\|[{\bm c}({\bm x})]_+\|_1=\sum_{i=1}^m\partial[c_i({\bm x})]_+=\sum_{i=1}^m\partial[\,\cdot\,]_+(c_i({\bm x}))\,\nabla c_i({\bm x}), \]

see [22, Ex. 10.26], it follows from (3) and the expression above that $\operatorname{J}{\bm c}(\bar{\bm x})^\top\bar{\bm\lambda}=\sum_{i=1}^m\bar\lambda_i\nabla c_i(\bar{\bm x})\in\partial\|[{\bm c}(\bar{\bm x})]_+\|_1$. Combining this with (4) concludes the proof. ∎

The abuse of terminology in expressing KKT-stationarity in terms of subdifferentials passes through the same construct relating the two formulations of Section 3.1, in which a slack variable is tacitly introduced to reformulate the $L^1$ norm; see the discussion there. More importantly, the involvement in (2) of the epigraph of $q$, as opposed to its domain, is a necessary technicality that cannot be avoided in the generality of Section 1, as we illustrate next.

{remark}

[$\operatorname{epi}q$ vs $\operatorname{dom}q$] Stationarity for (2) is, in general, weaker than stationarity for the more natural minimal-infeasibility problem

\[ \operatorname*{minimize}_{{\bm x}\in\operatorname{dom}q}\ \|[{\bm c}({\bm x})]_+\|_1. \]  (5)

To see how the two notions may differ, consider $q(x)=\sqrt{|x|}$ and $c(x)=x+1$, so that (1) reads

\[ \operatorname*{minimize}_{x\in\mathbb R}\ \sqrt{|x|}\quad\operatorname{subject\ to}\ x\leq-1. \]

The point $x^k=0$ is stationary for any subproblem (3.2) with arbitrary $\alpha,\mu>0$, and therefore constitutes an admissible choice in Algorithm 1. However, the limit $\bar x=0$ of the corresponding constant sequence is not stationary for the minimization of $[x+1]_+$ over $\operatorname{dom}q=\mathbb R$. Nevertheless,

\[ \partial[\,\cdot\,+1]_+(0)\times\{0\}+\operatorname{N}_{\operatorname{epi}q}(0,0)\supseteq\{(1,0)\}+(\mathbb R\times\{0\})\ni(0,0), \]

confirming that $(0,0)$ is stationary for the epigraphical problem (2).

We next formally illustrate why stationarity for (5) always implies that for (2), and identify the culprit of a possible discrepancy: uncontrolled growth of $q$ around $\bar{\bm x}$ from within $\operatorname{dom}q$. To this end, we recall that a function $h:\mathbb R^n\to\overline{\mathbb R}$ is said to be calm at a point $\bar{\bm x}\in\operatorname{dom}h$ relative to a set $X\ni\bar{\bm x}$ if

\[ \liminf_{\substack{X\ni{\bm x}\to\bar{\bm x}\\ {\bm x}\neq\bar{\bm x}}}\ \frac{|h({\bm x})-h(\bar{\bm x})|}{\|{\bm x}-\bar{\bm x}\|}<\infty, \]

and that this condition is weaker than strict continuity.

{lemma}

Let $h:\mathbb R^n\to\overline{\mathbb R}$ be proper and lsc. Then, for any $\bar{\bm x}\in\operatorname{dom}h$ one has

\[ \widehat{\operatorname{N}}_{\operatorname{dom}h}(\bar{\bm x})\subseteq\bigl\{\bar{\bm v}\mid(\bar{\bm v},0)\in\widehat{\operatorname{N}}_{\operatorname{epi}h}(\bar{\bm x},h(\bar{\bm x}))\bigr\} \]
and
\[ \operatorname{N}_{\operatorname{dom}h}(\bar{\bm x})\subseteq\bigl\{\bar{\bm v}\mid(\bar{\bm v},0)\in\operatorname{N}_{\operatorname{epi}h}(\bar{\bm x},h(\bar{\bm x}))\bigr\}=\partial^\infty h(\bar{\bm x}). \]

When $h$ is convex, both inclusions hold as equalities. More generally, when $h$ is calm (in particular, when it is strictly continuous) at $\bar{\bm x}$ relative to $\operatorname{dom}h$, the first inclusion holds as an equality; so does the second one when this property holds not only at $\bar{\bm x}$ but also at all points of $\operatorname{dom}h$ close to it.

Proof.

The relations in the convex case are shown in [22, Thm. 8.9 and Prop. 8.12]; in what follows, we consider an arbitrary proper and lsc function $h$. Let $\bar{\bm v}\in\widehat{\operatorname{N}}_{\operatorname{dom}h}(\bar{\bm x})$ and let $\operatorname{epi}h\ni({\bm x}^k,t^k)\to(\bar{\bm x},h(\bar{\bm x}))$. Then, there exists $\varepsilon_k\to 0$ such that $\langle\bar{\bm v},{\bm x}^k-\bar{\bm x}\rangle\leq\varepsilon_k\|{\bm x}^k-\bar{\bm x}\|$ holds for every $k$, hence

\[ \varepsilon_k\left\|\binom{{\bm x}^k-\bar{\bm x}}{t^k-h(\bar{\bm x})}\right\|\geq\varepsilon_k\|{\bm x}^k-\bar{\bm x}\|\geq\langle\bar{\bm v},{\bm x}^k-\bar{\bm x}\rangle=\left\langle\binom{\bar{\bm v}}{0},\binom{{\bm x}^k-\bar{\bm x}}{t^k-h(\bar{\bm x})}\right\rangle. \]

By the arbitrariness of the sequence, we conclude that $(\bar{\bm v},0)\in\widehat{\operatorname{N}}_{\operatorname{epi}h}(\bar{\bm x},h(\bar{\bm x}))$. The same inclusion must then hold for the limiting normal cones, leading to

\[ \operatorname{N}_{\operatorname{dom}h}(\bar{\bm x})\subseteq\bigl\{\bar{\bm v}\in\mathbb R^n\mid(\bar{\bm v},0)\in\operatorname{N}_{\operatorname{epi}h}(\bar{\bm x},h(\bar{\bm x}))\bigr\}=\partial^\infty h(\bar{\bm x}), \]  (6)

where the identity follows from [22, Thm. 8.9].

Suppose now that there exists $\kappa>0$ such that $|h({\bm x})-h(\bar{\bm x})|\leq\kappa\|{\bm x}-\bar{\bm x}\|$ for all ${\bm x}\in\operatorname{dom}h$ close to $\bar{\bm x}$, and suppose that $(\bar{\bm v},0)\in\widehat{\operatorname{N}}_{\operatorname{epi}h}(\bar{\bm x},h(\bar{\bm x}))$. Let $\operatorname{dom}h\ni{\bm x}^k\to\bar{\bm x}$, and note that $\operatorname{epi}h\ni({\bm x}^k,h({\bm x}^k))\to(\bar{\bm x},h(\bar{\bm x}))$. Then, there exists $\varepsilon_k\to 0$ such that

\[ \langle\bar{\bm v},{\bm x}^k-\bar{\bm x}\rangle=\left\langle\binom{\bar{\bm v}}{0},\binom{{\bm x}^k-\bar{\bm x}}{h({\bm x}^k)-h(\bar{\bm x})}\right\rangle\leq\varepsilon_k\left\|\binom{{\bm x}^k-\bar{\bm x}}{h({\bm x}^k)-h(\bar{\bm x})}\right\|\leq\varepsilon_k\left\|\binom{{\bm x}^k-\bar{\bm x}}{\kappa\|{\bm x}^k-\bar{\bm x}\|}\right\|=\varepsilon_k\sqrt{1+\kappa^2}\,\|{\bm x}^k-\bar{\bm x}\|, \]

where the second inequality holds for $k$ large enough. Arguing again by the arbitrariness of the sequence, we conclude that $\bar{\bm v}\in\widehat{\operatorname{N}}_{\operatorname{dom}h}(\bar{\bm x})$. Finally, when $h$ is calm relative to its domain at all points ${\bm x}\in\operatorname{dom}h$ close to $\bar{\bm x}$, then the identity $\widehat{\operatorname{N}}_{\operatorname{dom}h}({\bm x})=\{{\bm v}\mid({\bm v},0)\in\widehat{\operatorname{N}}_{\operatorname{epi}h}({\bm x},h({\bm x}))\}$ holds at all such points, and a limiting argument then yields the corresponding identity for the limiting normal cones. Therefore, the inclusion in (6) holds as an equality, which concludes the proof. ∎

4.1 Barrier’s properties

According to its update rule in Algorithm 1, before a desired feasibility violation $p_k\leq\epsilon_{\rm p}$ has been reached, $\alpha_{k+1}=\alpha_k$ means that $p_k\leq 2m\frac{-\eulerb{}^\ast(\rho^\ast_k)}{\rho^\ast_k}$, where $\rho^\ast_k=\alpha_k/\mu_k$. As shown above, regardless of whether $\alpha_k$ is updated or not, $\rho^\ast_k$ grows at (at least) a linear rate over the iterations; therefore, $\alpha_{k+1}=\alpha_k$ implies in particular that either the constraint violation $p_k$ is within the desired tolerance $\epsilon_{\rm p}$, or it is controlled by $2m\frac{-\eulerb{}^\ast(\rho^\ast_k)}{\rho^\ast_k}\leq 2m\frac{-\eulerb{}^\ast(\delta_{\rho^\ast}^k\rho^\ast_0)}{\delta_{\rho^\ast}^k\rho^\ast_0}$, where the inequality follows from the monotonicity of $t^\ast\mapsto-\eulerb{}^\ast(t^\ast)/t^\ast$ and the bound $\rho^\ast_k\geq\delta_{\rho^\ast}^k\rho^\ast_0$. This means that a desired decrease in the feasibility violation can be enforced through suitable choices of the barrier $\eulerb{}$.
This will be particularly significant in the convex case, for it can be shown that $\alpha_k$ is eventually never updated under reasonable assumptions.
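
To visualize this control, the following sketch evaluates the relaxation term $2m\frac{-\eulerb{}^\ast(\rho^\ast_k)}{\rho^\ast_k}$ along $\rho^\ast_k=\delta_{\rho^\ast}^k\rho^\ast_0$, again under the assumption of the logarithmic barrier $\eulerb{}(t)=-\ln(-t)$; the numbers $m$, $\rho^\ast_0$ and $\delta_{\rho^\ast}$ are arbitrary illustrative choices.

```python
import numpy as np

# Infeasibility tolerated while alpha stays constant, assuming the logarithmic
# barrier b(t) = -log(-t), whose conjugate gives -b*(s)/s = (1 + log(s))/s.
m, rho0, delta_rho = 3, 10.0, 2.0

for k in range(6):
    rho_k = rho0 * delta_rho**k
    bound_k = 2 * m * (1.0 + np.log(rho_k)) / rho_k
    print(f"k={k}: rho*_k={rho_k:8.1f}   bound={bound_k:.3e}")
# The bound shrinks with k, so alpha_{k+1} = alpha_k can persist only while the
# constraint violation p_k decreases accordingly (or is already below eps_p).
```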

{lemma}

Let the assumptions of Sections 1 and 3.2 hold, and consider the iterates generated by Algorithm 1. Suppose that there exists $\theta\in(0,1)$ such that the barrier satisfies $\eulerb{}(\theta t)\leq\theta\delta_{\rho^\ast}\eulerb{}(t)$ for every $t<0$ (resp. for every $t<0$ close enough to $0$), where $\delta_{\rho^\ast}\coloneqq\min\{\delta_\mu^{-1},\delta_\alpha\}>1$. Then,

\[ \alpha_{k+1}=\alpha_k\quad\Rightarrow\quad p_k\leq\max\Bigl\{\epsilon_{\rm p},\,-2m\tfrac{\mu_0}{\alpha_0}\eulerb{}^\ast\bigl(\tfrac{\alpha_0}{\mu_0}\bigr)\theta^k\Bigr\} \]  (7)

holds for every $k$ (resp. for every $k$ large enough).

Proof.

To simplify the presentation, without loss of generality let us set $\epsilon_{\rm p}=0$. We have already argued that $\alpha_{k+1}=\alpha_k$ implies $p_k\leq 2m\pi_k$, where $\pi_k\coloneqq\frac{-\eulerb{}^\ast(\delta_{\rho^\ast}^k\rho^\ast_0)}{\delta_{\rho^\ast}^k\rho^\ast_0}$ for all $k\in\mathbb N$. It thus suffices to show that $\pi_k\leq-\tfrac{\mu_0}{\alpha_0}\eulerb{}^\ast\bigl(\tfrac{\alpha_0}{\mu_0}\bigr)\theta^k$. To this end, notice that for every $t^\ast>0$ one has

\[ \frac{-\eulerb{}^\ast(\delta_{\rho^\ast}t^\ast)}{\delta_{\rho^\ast}t^\ast}\leq\theta\,\frac{-\eulerb{}^\ast(t^\ast)}{t^\ast}\quad\Leftrightarrow\quad\eulerb{}^\ast(t^\ast)\leq\frac{\eulerb{}^\ast(\delta_{\rho^\ast}t^\ast)}{\theta\delta_{\rho^\ast}}=\sup_\tau\Bigl\{t^\ast\tau-\tfrac{\eulerb{}(\theta\tau)}{\delta_{\rho^\ast}\theta}\Bigr\}=\Bigl(\tfrac{\eulerb{}(\theta\,\cdot\,)}{\theta\delta_{\rho^\ast}}\Bigr)^{\!\ast}(t^\ast), \]

hence, since $\eulerb{}^\ast(t^\ast)=\infty$ for $t^\ast<0$,

\[ \frac{-\eulerb{}^\ast(\delta_{\rho^\ast}t^\ast)}{\delta_{\rho^\ast}t^\ast}\leq\theta\,\frac{-\eulerb{}^\ast(t^\ast)}{t^\ast}\quad\forall t^\ast\in\mathbb R\quad\Leftrightarrow\quad\eulerb{}(t)\geq\tfrac{\eulerb{}(\theta t)}{\theta\delta_{\rho^\ast}}\quad\forall t\in\mathbb R, \]

which amounts to the condition in the statement. Under such a condition, then, $\pi_{k+1}\leq\theta\pi_k$ holds for every $k$, leading to $\pi_k\leq\pi_0\theta^k=\frac{-\eulerb{}^\ast(\rho^\ast_0)}{\rho^\ast_0}\theta^k$ as claimed. ∎

Though it would be tempting to seek barriers for which $(\pi_k)_{k\in\mathbb N}$ as in the proof vanishes at any desired rate, it is easily verified that no choice of $\eulerb{}$ or $\delta_{\rho^\ast}$ can make $(\pi_k)_{k\in\mathbb N}$ converge faster than linearly. In fact,

\[ \pi_{k+1}=\frac{-\eulerb{}^\ast(\rho^\ast_0\delta_{\rho^\ast}^{k+1})}{\rho^\ast_0\delta_{\rho^\ast}^{k+1}}>\frac{-\eulerb{}^\ast(\rho^\ast_0\delta_{\rho^\ast}^{k})}{\rho^\ast_0\delta_{\rho^\ast}^{k+1}}=\frac{1}{\delta_{\rho^\ast}}\pi_k, \]

where the inequality follows from the monotonicity of $-\eulerb{}^\ast$. This shows that a linear decrease by a factor $\delta_{\rho^\ast}^{-1}$ is the best achievable rate, and that it can only be attained in the limit. The condition identified in the lemma above nevertheless allows us to judge the fitness of a barrier within the framework of Algorithm 1. As we will see in Section 4.2, this is particularly relevant in the convex case, where it can be guaranteed that, under suitable assumptions, $\alpha_k$ eventually remains constant; employing a barrier that complies with this requirement then guarantees that the infeasibility $p_k$ of the iterates generated by Algorithm 1 eventually vanishes at an R-linear rate. This motivates the following definition.
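
The limiting ratio $\delta_{\rho^\ast}^{-1}$ can be observed numerically; the logarithmic barrier $\eulerb{}(t)=-\ln(-t)$ is once more an assumed example.

```python
import numpy as np

rho0, delta_rho = 10.0, 2.0

def pi(k):
    # pi_k = -b*(rho0 * delta_rho**k) / (rho0 * delta_rho**k) for b(t) = -log(-t)
    s = rho0 * delta_rho**k
    return (1.0 + np.log(s)) / s

ratios = [pi(k + 1) / pi(k) for k in range(10)]
print([round(r, 4) for r in ratios])   # decreases towards 1/delta_rho = 0.5
```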

{definition}

[behavior profiles of $\eulerb{}$] We say that a barrier $\eulerb{}$ complying with Section 3.2 is asymptotically well behaved if

θ(0,1)κ(θ)for-all𝜃01𝜅𝜃\displaystyle\forall\theta\in(0,1)\quad\mathchoice{\hskip 14.32803pt\clap{${% \displaystyle{}\kappa(\theta){}}$}\hskip 14.32803pt}{\hskip 14.32803pt\clap{${% {}\kappa(\theta){}}$}\hskip 14.32803pt}{\hskip 10.10406pt\clap{${\scriptstyle{% }\kappa(\theta){}}$}\hskip 10.10406pt}{\hskip 8.28067pt\clap{${% \scriptscriptstyle{}\kappa(\theta){}}$}\hskip 8.28067pt}\coloneqq{}∀ italic_θ ∈ ( 0 , 1 ) italic_κ ( italic_θ ) ≔ lim supt0(θt)θ(t)<subscriptlimit-supremum𝑡superscript0𝜃𝑡𝜃𝑡\displaystyle\limsup_{t\to 0^{-}}\frac{\eulerb{}(\theta t)}{\theta\eulerb{}(t)% }<\inftylim sup start_POSTSUBSCRIPT italic_t → 0 start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT end_POSTSUBSCRIPT divide start_ARG ( italic_θ italic_t ) end_ARG start_ARG italic_θ ( italic_t ) end_ARG < ∞ andlimθ1κ(θ)=1.andsubscript𝜃superscript1𝜅𝜃1\displaystyle\text{and}\quad\lim_{\theta\to 1^{-}}\kappa(\theta)=1.and roman_lim start_POSTSUBSCRIPT italic_θ → 1 start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_κ ( italic_θ ) = 1 .
If this condition can be strengthened to
\[
\forall\theta\in(0,1)\quad \kappa^{\rm max}(\theta)\coloneqq\sup_{t<0}\frac{b(\theta t)}{\theta\,b(t)}<\infty \quad\text{and}\quad \lim_{\theta\to 1^{-}}\kappa^{\rm max}(\theta)=1,
\]

then we say that $b$ is well behaved (not merely asymptotically). We call the functions $\kappa^{\rm max},\kappa\colon(0,1)\to(1,\infty)$ the behavior profile and the asymptotic behavior profile of $b$, respectively.

In penalty-type methods, the update of a penalty parameter is typically decided based on the violation of the corresponding constraints. Under the assumption that the barrier is (asymptotically) well behaved, Section 4.1 demonstrates that in Algorithm 1 the condition $\alpha_{k+1}=\alpha_{k}$ (eventually) furnishes a guarantee of linear decrease of the infeasibility. Insisting on continuity of $\kappa$ and $\kappa^{\rm max}$ at $\theta=1$ in Section 4.1 is a minor technicality ensuring that, regardless of the value of $\delta_{\mu}\in(0,1)$ and $\delta_{\alpha}>1$, for any (asymptotically) well behaved barrier there always exists $\theta\in(0,1)$ such that $b(\theta t)\leq\theta\delta_{\rho^{\ast}}\,b(t)$ holds for every $t<0$ (close enough to zero), as required in Section 4.1. The result can thus be restated as follows.

{corollary}

Additionally to Sections 1 and 3.2, suppose that the barrier $b$ is (asymptotically) well behaved. Then, there exists $\theta\in(0,1)$ such that the iterates of Algorithm 1 satisfy (7) for all $k\in\mathbb{N}$ (large enough).

When it comes to comparing different barriers, lower values of $\kappa$ are clearly preferable. Notice that both $\kappa^{\rm max}$ and $\kappa$ are scaling invariant:

\[
\kappa_{\beta b}=\kappa_{b(\beta\,\cdot\,)}=\kappa \quad\text{and}\quad \kappa_{\beta b}^{\rm max}=\kappa_{b(\beta\,\cdot\,)}^{\rm max}=\kappa^{\rm max} \qquad\forall\beta>0.
\]

Moreover, since

\[
\kappa(\theta)\geq\tfrac{1}{\theta}\qquad\forall\theta\in(0,1)
\]

(owing to monotonicity of $b$ and the fact that consequently $b(\theta t)\geq b(t)$ for $t<0$), barriers attaining $\kappa(\theta)=\frac{1}{\theta}$ can be considered asymptotically optimal. Table 1 shows that logarithmic barriers can attain this lower bound.

\[
\begin{array}{|cccc|l}
b(t) & b^{\ast}(\tau) & \kappa(\theta) & \kappa^{\rm max}(\theta) & \\
\hline
\frac{1}{p}(-t)^{-p} & -\frac{1}{q}\tau^{q} & \left(\frac{1}{\theta}\right)^{1+p} & \left(\frac{1}{\theta}\right)^{1+p} & \text{($p>0$, $q=\frac{p}{1+p}$)}\\
\ln\left(1-\frac{1}{t}\right) & -2\left(\frac{\sqrt{\tau}}{\sqrt{\tau}+\sqrt{\tau+4}}+\ln\bigl(\frac{\sqrt{\tau}+\sqrt{\tau+4}}{2}\bigr)\right) & \frac{1}{\theta} & \left(\frac{1}{\theta}\right)^{2} & \\
\exp\left(-\frac{1}{t}\right) & -\frac{1}{2\operatorname{W_{0}}(\sqrt{\tau}/2)}\left(1+\frac{1}{2\operatorname{W_{0}}(\sqrt{\tau}/2)}\right)\tau & \infty & \infty & \\
\hline
\end{array}
\]
Table 1: Examples of barriers and their behavior profiles $\kappa$. A low $\kappa$ is symptomatic of good aptitude of $b$ as a barrier within Algorithm 1. Geometrically, it indicates that $b$ well approximates the nonsmooth indicator $\operatorname{\delta}_{\mathbb{R}_{-}}$. Functions like $\exp(-\frac{1}{t})$ growing too fast are unsuited, whereas logarithmic barriers such as $b(t)=\ln(1-\frac{1}{t})$ attain an optimal asymptotic behavior profile $\kappa(\theta)=\frac{1}{\theta}$. Here, $\operatorname{W_{0}}$ denotes the Lambert $\operatorname{W_{0}}$ function (product logarithm), namely the functional inverse of $\tau\mapsto\tau\exp(\tau)$ for $\tau\geq 0$ [6].
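The behavior profile also lends itself to a quick numerical check. The following Python sketch (an illustration only; the choices $\theta=1/2$, $p=1$ in the first row of Table 1, and the grid of test points $t$ are arbitrary) probes the ratio $b(\theta t)/(\theta\,b(t))$ as $t\to 0^{-}$ for the three barriers of Table 1; the printed values approach $(1/\theta)^{1+p}=4$ and $1/\theta=2$ for the first two barriers and blow up for the exponential one, in agreement with the tabulated profiles.

```python
import numpy as np

theta = 0.5
barriers = {
    "power (p=1), b(t) = -1/t":  lambda t: -1.0 / t,             # expected profile (1/theta)**2 = 4
    "log,  b(t) = ln(1 - 1/t)":  lambda t: np.log(1.0 - 1.0/t),  # expected profile 1/theta = 2
    "exp,  b(t) = exp(-1/t)":    lambda t: np.exp(-1.0/t),       # expected profile +infinity
}

with np.errstate(over="ignore"):  # the exponential barrier overflows on purpose
    for name, b in barriers.items():
        ratios = [b(theta*t) / (theta*b(t)) for t in (-1e-2, -1e-4, -1e-8, -1e-12)]
        print(f"{name:28s} b(th*t)/(th*b(t)) -> " + ", ".join(f"{r:.4g}" for r in ratios))
```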

4.2 The convex case

In this section we investigate the behavior of Algorithm 1 when applied to convex problems. In particular, we detail an asymptotic analysis in which the termination tolerances are set to zero, so that the algorithm may run indefinitely. We demonstrate that under standard assumptions the iterates subsequentially converge to (global) solutions, and that the $L^{1}$ penalty parameter $\alpha$ is eventually never updated.

{theorem}

Additionally to Sections 1 and 3.2, suppose that $q$ and $c_{i}$, $i=1,\dots,m$, are convex functions, and that there exists an optimal pair $({\bm x}^{\star},{\bm y}^{\star})$ for (1) in the sense of Section 2.2. Then, the following hold for the iterates generated by Algorithm 1 with $\epsilon_{\rm p}=\epsilon_{\rm d}=0$:

  1. Any accumulation point of the sequence $({\bm x}^{k})_{k\in\mathbb{N}}$ is a solution of (1).

  2. If, additionally, $({\bm x}^{k})_{k\in\mathbb{N}}$ remains bounded (as is the case when $\operatorname{dom}q$ is bounded), then $\alpha_{k}$ is eventually never updated.

  3. Further assuming that the barrier $b$ is asymptotically well behaved, so that there exists $\theta\in(0,1)$ such that $b(\theta t)\leq\theta\min\{\delta_{\mu}^{-1},\delta_{\alpha}\}\,b(t)$ for every $t<0$ close enough to $0$, the feasibility violation eventually vanishes with rate $p_{k}\leq-2m\tfrac{\mu_{0}}{\alpha_{0}}\,b^{\ast}\bigl(\tfrac{\alpha_{0}}{\mu_{0}}\bigr)\theta^{k}$.

Proof.

It follows from Item 2 that ${\bm x}^{\star}$ solves (3.1) for all $\alpha\geq\|{\bm y}^{\star}\|_{\infty}$. For every $k$, there exists ${\bm\eta}^{k}$ with $\|{\bm\eta}^{k}\|\leq\varepsilon_{k}$ such that ${\bm\eta}^{k}\in\partial\bigl[q+\mu_{k}\Psi_{\alpha_{k}/\mu_{k}}\bigr]({\bm x}^{k})$. If $(\alpha_{k})_{k\in\mathbb{N}}$ is asymptotically constant, then $p_{k}\leq-\frac{\mu_{k}}{\alpha_{k}}\,b^{\ast}(\alpha_{k}/\mu_{k})$ eventually always holds, and, since the right-hand side vanishes as $k\to\infty$, any limit point $\bar{\bm x}$ of $({\bm x}^{k})_{k\in\mathbb{N}}$ satisfies ${\bm c}(\bar{\bm x})\leq{\bm 0}$. Otherwise, $(\alpha_{k})_{k\in\mathbb{N}}\nearrow\infty$ and, for $k$ large enough that $\alpha_{k}>\alpha\coloneqq\|{\bm y}^{\star}\|_{\infty}$, since ${\bm c}({\bm x}^{\star})\leq{\bm 0}$ and ${\bm x}^{\star}$ solves (3.1), one has

\[
\begin{aligned}
q({\bm x}^{\star})={}& q({\bm x}^{\star})+\alpha\|[{\bm c}({\bm x}^{\star})]_{+}\|_{1}\\
\leq{}& q({\bm x}^{k})+\alpha\|[{\bm c}({\bm x}^{k})]_{+}\|_{1}\\
={}& q({\bm x}^{k})+\alpha_{k}\|[{\bm c}({\bm x}^{k})]_{+}\|_{1}-(\alpha_{k}-\alpha)\|[{\bm c}({\bm x}^{k})]_{+}\|_{1}
\end{aligned}
\]
since $[\,\cdot\,]_{+}\leq\psi_{\rho^{\ast}}/\rho^{\ast}$ for any $\rho^{\ast}>0$, see Item 2,
\[
\leq q({\bm x}^{k})+\mu_{k}\Psi_{\alpha_{k}/\mu_{k}}({\bm c}({\bm x}^{k}))-(\alpha_{k}-\alpha)\|[{\bm c}({\bm x}^{k})]_{+}\|_{1}
\]
since ${\bm\eta}^{k}\in\partial\bigl[q+\mu_{k}\Psi_{\alpha_{k}/\mu_{k}}\bigr]({\bm x}^{k})$ and $q+\mu_{k}\Psi_{\alpha_{k}/\mu_{k}}$ is convex by virtue of Item 1,
\[
\leq q({\bm x}^{\star})+\mu_{k}\Psi_{\alpha_{k}/\mu_{k}}({\bm c}({\bm x}^{\star}))+\langle{\bm\eta}^{k},\,{\bm x}^{\star}-{\bm x}^{k}\rangle-(\alpha_{k}-\alpha)\|[{\bm c}({\bm x}^{k})]_{+}\|_{1}
\]
since ${\bm c}({\bm x}^{\star})\leq{\bm 0}$ and $\psi_{\rho^{\ast}}$ is increasing for any $\rho^{\ast}>0$, cf. (1e), and since $\|{\bm\eta}^{k}\|\leq\varepsilon_{k}$,
\[
\begin{aligned}
\leq{}& q({\bm x}^{\star})+\mu_{k}\Psi_{\alpha_{k}/\mu_{k}}({\bm 0})+\varepsilon_{k}\|{\bm x}^{\star}-{\bm x}^{k}\|-(\alpha_{k}-\alpha)\|[{\bm c}({\bm x}^{k})]_{+}\|_{1}\\
={}& q({\bm x}^{\star})+m\mu_{k}\psi_{\alpha_{k}/\mu_{k}}(0)+\varepsilon_{k}\|{\bm x}^{\star}-{\bm x}^{k}\|-(\alpha_{k}-\alpha)\|[{\bm c}({\bm x}^{k})]_{+}\|_{1}\qquad(8)\\
={}& q({\bm x}^{\star})+m\mu_{k}\bigl(-b^{\ast}(\alpha_{k}/\mu_{k})\bigr)+\varepsilon_{k}\|{\bm x}^{\star}-{\bm x}^{k}\|-(\alpha_{k}-\alpha)\|[{\bm c}({\bm x}^{k})]_{+}\|_{1},
\end{aligned}
\]

where the last identity uses (1d). Therefore, dividing by $\alpha_{k}-\alpha$ and denoting $\rho^{\ast}_{k}\coloneqq\alpha_{k}/\mu_{k}\nearrow\infty$,

\[
p_{k}=\|[{\bm c}({\bm x}^{k})]_{+}\|_{\infty}\leq\|[{\bm c}({\bm x}^{k})]_{+}\|_{1}\leq m\,\tfrac{\alpha_{k}}{\alpha_{k}-\alpha}\,\underbrace{\tfrac{-b^{\ast}(\rho^{\ast}_{k})}{\rho^{\ast}_{k}}}_{\to 0}+\underbrace{\tfrac{\varepsilon_{k}}{\alpha_{k}-\alpha}}_{\to 0}\,\|{\bm x}^{\star}-{\bm x}^{k}\|
\]

holds for all $k$ large. Along any convergent subsequence, it is clear that $p_{k}\to 0$, hence that the corresponding limit point $\bar{\bm x}$ satisfies ${\bm c}(\bar{\bm x})\leq{\bm 0}$, and is thus optimal in view of Algorithm 1 and convexity of the problem.

If the entire sequence $({\bm x}^{k})_{k\in\mathbb{N}}$ is contained in $\operatorname{B}({\bm 0};R)$ for some $R>0$, then

\[
\frac{p_{k}}{\frac{-2m\,b^{\ast}(\rho^{\ast}_{k})}{\rho^{\ast}_{k}}}\leq\frac{m\tfrac{\alpha_{k}}{\alpha_{k}-\alpha}\tfrac{-b^{\ast}(\rho^{\ast}_{k})}{\rho^{\ast}_{k}}+2R\tfrac{\varepsilon_{k}}{\alpha_{k}-\alpha}}{\frac{-2m\,b^{\ast}(\rho^{\ast}_{k})}{\rho^{\ast}_{k}}}=\tfrac{1}{2}\tfrac{\alpha_{k}}{\alpha_{k}-\alpha}+R\varepsilon_{k}\tfrac{\alpha_{k}}{\alpha_{k}-\alpha}\frac{\mu_{k}}{-m\,b^{\ast}(\alpha_{k}/\mu_{k})}.
\]

If, contrary to the claim, $\alpha_{k}\nearrow\infty$, then, appealing to Item 2, the right-hand side of the above inequality converges to $\frac{1}{2}$ as $k\to\infty$, and in particular is eventually smaller than one. That is, $p_{k}\leq-2m\frac{\mu_{k}}{\alpha_{k}}\,b^{\ast}(\alpha_{k}/\mu_{k})$ eventually always holds and $\alpha_{k}$ is thus never updated, a contradiction.

Finally, the claim about the rate of pksubscript𝑝𝑘p_{k}italic_p start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT follows from Section 4.1. ∎

4.3 Equalities and bilateral constraints

In previous sections we described an algorithm for the inequality-constrained problem (1), but our approach can be applied to problems with bilateral and equality constraints as well. There are several possibilities for incorporating such constraints in the format (1) by means of reformulations. Below we discuss only a few options and refer to the related discussion in [7, §4.1.4] for more. Here we focus on two key aspects: on the one hand, for the sake of efficiency, we intend to exploit the additional problem structure available when bilateral constraints are explicitly specified by the user. On the other hand, for the sake of robustness, we need to tolerate and handle bad formulations as well. The latter point is particularly significant for large-scale, possibly automatically generated, optimization models, for which it might be impractical to parse the constraint specification with the goal of uncovering, e.g., hidden equalities. In practice, nonlinear problems (1) may contain (approximate) constraint redundancies leading to singularities in the Jacobian: thanks to the penalty-barrier regularization, we expect Algorithm 1 to cope with these kinds of degeneracy.

Two inequalities

An equality constraint $c_{i}({\bm x})=0$ can be split into the pair of inequalities $c_{i}({\bm x})\leq 0$ and $-c_{i}({\bm x})\leq 0$, making the technique developed for (1) directly applicable. However, following the approach outlined in Section 3, this can result in a flat portion of the penalty around the feasible set, and thus in the absence of an effective penalization term. This circumstance is demonstrated by the fact that

\[
\psi_{\rho^{\ast}}^{\pm[0,0]}(t)\coloneqq\psi_{\rho^{\ast}}(t)+\psi_{\rho^{\ast}}(-t)=\begin{cases}
b(-|t|)+\rho^{\ast}|t|-b^{\ast}(\rho^{\ast}) & \text{if }b'(-|t|)\leq\rho^{\ast},\\
-2b^{\ast}(\rho^{\ast}) & \text{otherwise,}
\end{cases}\qquad(9)
\]

which displays a flat region around $t=0$ with radius $-(b^{\ast})'(\rho^{\ast})>0$ vanishing as $\rho^{\ast}\to\infty$.
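To visualize this flattening, the two-inequality term can be evaluated numerically. Below is a minimal Python sketch, assuming the inverse barrier $b(t)=-1/t$ and the single-constraint marginal $\psi_{\rho^{\ast}}(t)=\inf_{s\geq 0}\{\rho^{\ast}s+b(t-s)\}$ (the unilateral analogue of (11)); the value $\rho^{\ast}=100$ and the test grid are arbitrary. The printout shows a plateau of value $-2b^{\ast}(\rho^{\ast})=4\sqrt{\rho^{\ast}}=40$ on $|t|\leq 1/\sqrt{\rho^{\ast}}=0.1$, matching (9).

```python
import numpy as np
from scipy.optimize import minimize_scalar

rho = 100.0                       # illustrative value of rho*
b = lambda t: -1.0/t              # inverse barrier, finite only for t < 0

def psi(t):
    # psi_rho(t) = inf_{s >= 0} { rho*s + b(t - s) }, computed numerically
    s_lo = max(0.0, t) + 1e-9     # keep the barrier argument t - s negative
    res = minimize_scalar(lambda s: rho*s + b(t - s),
                          bounds=(s_lo, s_lo + 100.0), method="bounded")
    return res.fun

for t in np.linspace(-0.2, 0.2, 9):
    print(f"t = {t:+.3f}   psi(t) + psi(-t) = {psi(t) + psi(-t):8.4f}")
# flat value 4*sqrt(rho) = 40 for |t| <= 1/sqrt(rho) = 0.1, larger values outside
```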

This splitting into a pair of inequality constraints also affects the theoretical motivations of Section 3. In fact, after this reformulation, constraint qualifications fail to hold for ${\bm c}({\bm x})\leq{\bm 0}$ in (1). Therefore, as these typically guarantee boundedness of the set of optimal Lagrange multipliers, the penalty exactness may not apply. These considerations underline that it is important to handle equality constraints carefully.

We now retrace the developments of Section 3 focusing on problems of the form

\[
\operatorname*{minimize}_{{\bm x}\in\mathbb{R}^{n}}\ q({\bm x})\quad\operatorname{subject\ to}\ {\bm l}\leq{\bm c}({\bm x})\leq{\bm u}.\qquad(10)
\]

Formulating the penalty problem—in the form (3.1)—with two slack variables ${\bm s}_{\rm u},{\bm s}_{\rm l}$ as

\[
\begin{aligned}
\operatorname*{minimize}_{{\bm x}\in\mathbb{R}^{n},\,{\bm s}_{\rm u},{\bm s}_{\rm l}\in\mathbb{R}^{m}}\quad & q({\bm x})+\alpha\langle{\bm 1},\,{\bm s}_{\rm u}+{\bm s}_{\rm l}\rangle+\operatorname{\delta}_{\mathbb{R}^{m}_{+}}({\bm s}_{\rm u})+\operatorname{\delta}_{\mathbb{R}^{m}_{+}}({\bm s}_{\rm l})\\
\operatorname{subject\ to}\quad & {\bm l}-{\bm s}_{\rm l}\leq{\bm c}({\bm x})\leq{\bm u}+{\bm s}_{\rm u}
\end{aligned}
\]

results in a penalty-barrier problem analogous to (3.2) after marginalization with respect to ${\bm s}_{\rm u},{\bm s}_{\rm l}$, which reads

\[
\operatorname*{minimize}_{{\bm x}\in\mathbb{R}^{n}}\quad q({\bm x})+\mu\Psi_{\alpha/\mu}({\bm c}({\bm x})-{\bm u})+\mu\Psi_{\alpha/\mu}({\bm l}-{\bm c}({\bm x})).
\]

Introducing the sum $\psi_{\alpha/\mu}^{\pm[l,u]}\coloneqq\psi_{\alpha/\mu}(\,\cdot\,-u)+\psi_{\alpha/\mu}(l-\,\cdot\,)$, it is easy to see that this approach is equivalent to specifying two (independent) inequalities. Thus, considering two slack variables does not overcome the issue of flat regions discussed above for (hidden) equalities. Instead, applying the technique above with only one slack variable appears to be a more reasonable approach for handling equality constraints, as we are about to show.

Combined marginalization

For problems with bilateral constraints as in (10) we may formulate the associated penalty problem, in analogy with (3.1), with a single slack variable ${\bm s}$ as

\[
\operatorname*{minimize}_{{\bm x}\in\mathbb{R}^{n},\,{\bm s}\in\mathbb{R}^{m}}\ q({\bm x})+\alpha\langle{\bm 1},{\bm s}\rangle+\operatorname{\delta}_{\mathbb{R}^{m}_{+}}({\bm s})\quad\operatorname{subject\ to}\ {\bm l}-{\bm s}\leq{\bm c}({\bm x})\leq{\bm u}+{\bm s}.
\]

Then, as for (3.2), we introduce a barrier to replace the inequality constraints, obtaining

\[
\operatorname*{minimize}_{{\bm x}\in\mathbb{R}^{n},\,{\bm s}\in\mathbb{R}^{m}}\ q({\bm x})+\alpha\langle{\bm 1},{\bm s}\rangle+\operatorname{\delta}_{\mathbb{R}^{m}_{+}}({\bm s})+\mu\sum_{i=1}^{m}b(c_{i}({\bm x})-u_{i}-s_{i})+\mu\sum_{i=1}^{m}b(l_{i}-c_{i}({\bm x})-s_{i}).
\]

Following this strategy, the marginalization subproblem corresponding to a constraint $l_{i}\leq c_{i}({\bm x})\leq u_{i}$ yields a definition for the counterpart of $\psi_{\rho^{\ast}}$ in (1) with bounds—see also (1):

\[
\psi_{\alpha/\mu}^{[l,u]}(t)\coloneqq\inf_{s\geq 0}\left\{\tfrac{\alpha}{\mu}s+b(t-u-s)+b(l-t-s)\right\}\qquad(11)
\]

and, in analogy to (3.2), leads to the penalty-barrier subproblem

\[
\operatorname*{minimize}_{{\bm x}\in\mathbb{R}^{n}}\ q({\bm x})+\mu\Psi_{\alpha/\mu}^{[{\bm l},{\bm u}]}({\bm c}({\bm x})),
\]

where $\Psi_{\rho^{\ast}}^{[{\bm l},{\bm u}]}({\bm y})\coloneqq\sum_{i=1}^{m}\psi_{\rho^{\ast}}^{[l_{i},u_{i}]}(y_{i})$. A closed-form expression for the marginalization subproblem (11) can be obtained for specific barrier choices. For the inverse barrier function $b(t)=-\frac{1}{t}$ (extended as $\infty$ on $\mathbb{R}_{+}$), the optimal value of the auxiliary variable is given by

\[
s_{\rho^{\ast}}^{[l,u]}\bigl(t+\tfrac{l+u}{2}\bigr)\coloneqq\max\left\{0,\ \sqrt{t^{2}+\frac{1}{\rho^{\ast}}+\sqrt{\frac{4}{\rho^{\ast}}t^{2}+\frac{1}{(\rho^{\ast})^{2}}}}+\frac{l-u}{2}\right\}.
\]

Similarly, the oracle for the log-like barrier function $b(t)=\ln\frac{t-1}{t}$ (extended as $\infty$ on $\mathbb{R}_{+}$) reads

\[
s_{\rho^{\ast}}^{[l,u]}\bigl(t+\tfrac{l+u}{2}\bigr)\coloneqq\max\left\{0,\ \sqrt{\rho^{\ast}t^{2}+\frac{\rho^{\ast}}{4}+1+\sqrt{\frac{4}{\rho^{\ast}}t^{2}+t^{2}+\frac{1}{\rho^{\ast}}}}-\frac{1}{2}+\frac{l-u}{2}\right\}.
\]
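As a sanity check on the inverse-barrier expression above, the marginalization (11) can also be carried out by direct one-dimensional minimization and compared with that closed form. The following Python sketch does so (the bounds $l=-1$, $u=1$, the value $\rho^{\ast}=\alpha/\mu=10$, and the test points $y$ are arbitrary illustrative choices); the two slack values agree up to the tolerance of the scalar solver.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def b_inv(t):
    # inverse barrier: b(t) = -1/t for t < 0, +infinity otherwise
    return -1.0/t if t < 0 else np.inf

def s_numeric(y, l, u, rho):
    # minimizer of rho*s + b(y - u - s) + b(l - y - s) over s >= 0, cf. (11)
    s_lo = max(0.0, y - u, l - y) + 1e-9          # stay inside the barrier's domain
    res = minimize_scalar(lambda s: rho*s + b_inv(y - u - s) + b_inv(l - y - s),
                          bounds=(s_lo, s_lo + 1e3), method="bounded")
    return res.x

def s_closed(y, l, u, rho):
    # closed-form slack for the inverse barrier (t measured from the midpoint of [l, u])
    t = y - (l + u)/2.0
    w = np.sqrt(t**2 + 1.0/rho + np.sqrt(4.0*t**2/rho + 1.0/rho**2))
    return max(0.0, w + (l - u)/2.0)

l, u, rho = -1.0, 1.0, 10.0                       # illustrative data
for y in (-1.5, -0.3, 0.0, 0.4, 2.0):
    print(f"y = {y:+.2f}   numeric s = {s_numeric(y, l, u, rho):.6f}"
          f"   closed-form s = {s_closed(y, l, u, rho):.6f}")
```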

It is nevertheless easy to verify, for an arbitrary $b$ complying with Section 3.2, the sharp $L^{1}$ penalization of the interval $[{\bm l},{\bm u}]$ attained in the limit.

{lemma}

[Limiting behavior of $\psi_{\rho^{\ast}}^{[l,u]}$] For any $l,u\in\mathbb{R}$ with $l\leq u$ one has that $\psi_{\rho^{\ast}}^{[l,u]}/\rho^{\ast}\searrow\operatorname{dist}_{[l,u]}$ pointwise as $\rho^{\ast}\nearrow\infty$.

Proof.

To begin with, notice that $\psi_{\rho^{\ast}}^{[l,u]}$ as in (11) is bounded as

\[
\inf_{s\geq 0}\left\{\rho^{\ast}s+b\bigl(|t|-\tfrac{u-l}{2}-s\bigr)\right\}\leq\psi_{\rho^{\ast}}^{[l,u]}\bigl(t+\tfrac{l+u}{2}\bigr)\leq\inf_{s\geq 0}\left\{\rho^{\ast}s+2b\bigl(|t|-\tfrac{u-l}{2}-s\bigr)\right\},
\]

where the first inequality owes to the fact that $b>0$, and the second one to the fact that $b'>0$ (hence that $b$ is increasing). Again deferring to A.2 for the details, such lower and upper bounds coincide with unilateral smooth penalties as in Section 3.2, namely

\[
\psi_{\rho^{\ast}}\bigl(|t|-\tfrac{u-l}{2}\bigr)\leq\psi_{\rho^{\ast}}^{[l,u]}\bigl(t+\tfrac{l+u}{2}\bigr)\leq 2\psi_{\rho^{\ast}/2}\bigl(|t|-\tfrac{u-l}{2}\bigr).\qquad(12)
\]

Dividing by $\rho^{\ast}$ and letting $\rho^{\ast}\to\infty$, Item 2 yields that both the lower and upper bounds converge to $\bigl[|t|-\tfrac{u-l}{2}\bigr]_{+}=\operatorname{dist}_{[l,u]}\bigl(t+\tfrac{l+u}{2}\bigr)$, demonstrating the claim. ∎
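The lemma can also be observed numerically: evaluating $\psi_{\rho^{\ast}}^{[l,u]}(y)/\rho^{\ast}$ for increasing $\rho^{\ast}$ exhibits the monotone decrease toward $\operatorname{dist}_{[l,u]}(y)$. The short Python sketch below does this for the inverse barrier with $l=-1$, $u=1$ (all numerical choices are illustrative), computing the marginal (11) by one-dimensional minimization.

```python
import numpy as np
from scipy.optimize import minimize_scalar

b_inv = lambda t: -1.0/t if t < 0 else np.inf     # inverse barrier

def psi_box(y, l, u, rho):
    # psi_rho^{[l,u]}(y) = inf_{s>=0} rho*s + b(y-u-s) + b(l-y-s), cf. (11)
    s_lo = max(0.0, y - u, l - y) + 1e-9
    res = minimize_scalar(lambda s: rho*s + b_inv(y - u - s) + b_inv(l - y - s),
                          bounds=(s_lo, s_lo + 1e3), method="bounded")
    return res.fun

l, u = -1.0, 1.0
for y in (0.5, 2.0, -3.0):
    dist = max(l - y, 0.0, y - u)
    vals = [psi_box(y, l, u, rho)/rho for rho in (1e1, 1e3, 1e5)]
    print(f"y = {y:+.1f}   dist = {dist:.3f}   psi/rho for rho = 1e1, 1e3, 1e5: "
          + ", ".join(f"{v:.4f}" for v in vals))
```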

The penalty-barrier function $\psi_{\rho^{\ast}}^{[0,0]}$ associated to an equality constraint $c_{i}({\bm x})=0$ is depicted in Fig. 2 and contrasted with the two-inequality term $\psi_{\rho^{\ast}}^{\pm[0,0]}$. An analogous comparison between $\psi_{\rho^{\ast}}^{[l,u]}$ and $\psi_{\rho^{\ast}}^{\pm[l,u]}$ associated to the more general bilateral constraint $l_{i}\leq c_{i}({\bm x})\leq u_{i}$ is displayed in Fig. 3 for the case $l_{i}<u_{i}$. The two scaled penalty-barrier terms $\mu\psi_{\rho^{\ast}}^{[l,u]}$ and $\mu\psi_{\rho^{\ast}}^{\pm[l,u]}$ have similar behaviors, according to Figs. 2(b) and 3(b), and lead to a well-justified procedure within Algorithm 1. In particular, both penalty-barrier terms converge to some sharp penalty function as $\mu\searrow 0$ (for fixed $\alpha>0$).
However, $\psi_{\rho^\ast}^{\pm[l,u]}$ exhibits a flat region around the midpoint $\tfrac{l+u}{2}$, whereas $\psi_{\rho^\ast}^{[l,u]}$ is strictly convex everywhere; this stark contrast is especially apparent for $\psi_{\rho^\ast}^{[0,0]}$ and $\psi_{\rho^\ast}^{\pm[0,0]}$ at the origin, where only the former attains its unique minimum. Regardless, since the width of the flat portion of $\psi_{\rho^\ast}^{\pm[l,u]}$ vanishes as $\rho^\ast\nearrow\infty$, Algorithm 1 can terminate for any $\epsilon_{\rm p},\epsilon_{\rm d}>0$ even when invoked with hidden equality or bilateral constraints. The minor modifications needed to account for explicit equalities are given next.

Remark (Algorithm 1 with equality constraints). Two additional steps in Algorithm 1 allow it to handle problems of the form

$\operatorname*{minimize}_{{\bm x}\in\mathbb{R}^n}\; q({\bm x}) \quad \operatorname{subject\ to}\; {\bm c}({\bm x})\leq{\bm 0},\; {\bm c}_{\rm eq}({\bm x})={\bm 0}$

for some smooth mapping ${\bm c}_{\rm eq}\colon\mathbb{R}^n\to\mathbb{R}^{m_{\rm eq}}$. First, an equality multiplier ${\bm y}_{\rm eq}^k\in\mathbb{R}^{m_{\rm eq}}$ is introduced; its update mirrors that of ${\bm y}^k$, but involves the derivative of $\psi_{\alpha_k/\mu_k}^{[0,0]}$ instead of that of $\psi_{\alpha_k/\mu_k}$ (see the comment in Step 1.4), and is therefore not restricted in sign, as expected of equality multipliers. Second, the infeasibility measure $p_k$ is updated in a straightforward manner. Altogether, the additions to the respective Steps 1.4 and 1.5 read as follows:

2a: ${\bm y}_{\rm eq}^k=\mu_k\bigl(\Psi_{\alpha_k/\mu_k}^{[0,0]}\bigr)'({\bm c}_{\rm eq}({\bm x}^k))$
3a: $p_k\leftarrow\max\bigl\{p_k,\ \|{\bm c}_{\rm eq}({\bm x}^k)\|_\infty\bigr\}$

All the convergence results remain valid for this variant; boundedness of $\alpha_k$ in Section 4.2 is also recovered as long as $m$ at Step 1.10 is replaced by $m+2m_{\rm eq}$, that is, in such a way that each equality is counted as two inequalities. The reason can be readily understood from the upper bound $\psi_{\alpha/\mu}^{[0,0]}\leq 2\psi_{\alpha/2\mu}$ in (12), which can be used in (8) by turning the equality therein into an inequality.
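As an illustration (not the released implementation), the two extra steps could be realized as in the following Python sketch, where psi_eq_prime is a hypothetical oracle returning the elementwise derivative of the equality penalty-barrier term $\psi_{\rho}^{[0,0]}$.

```python
import numpy as np

def equality_updates(c_eq, x_k, p_k, alpha_k, mu_k, psi_eq_prime):
    """Hypothetical sketch of Steps 2a/3a for explicit equality constraints.

    psi_eq_prime(t, rho) is assumed to return the derivative of the
    penalty-barrier term psi_rho^{[0,0]} at t (elementwise).
    """
    r_eq = c_eq(x_k)                                   # residual c_eq(x^k)
    y_eq = mu_k * psi_eq_prime(r_eq, alpha_k / mu_k)   # Step 2a: multiplier estimate
    p_k = max(p_k, np.linalg.norm(r_eq, ord=np.inf))   # Step 3a: infeasibility measure
    return y_eq, p_k
```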

Based on the discussion above, bilateral constraints should be specified explicitly whenever possible, so that the tailored penalty-barrier relaxation can be employed. Nevertheless, it is an important feature of our scheme that it can also handle badly formulated models, for instance those with hidden equalities. These favorable qualities are showcased in Section 5.5, where we present a numerical comparison of the two options.

(a) Functions $\psi_{\rho^\ast}^{[0,0]}$ and $\psi_{\rho^\ast}^{\pm[0,0]}$ behave similarly except for the flat region of $\psi_{\rho^\ast}^{\pm[0,0]}$ around the origin. As is apparent, $\psi_{\rho^\ast}^{[0,0]}(0),\psi_{\rho^\ast}^{\pm[0,0]}(0)\nearrow\infty$ as $\rho^\ast\nearrow\infty$.
(b) As $\mu\searrow 0$, both $\mu\psi_{\alpha/\mu}^{[0,0]}$ and $\mu\psi_{\alpha/\mu}^{\pm[0,0]}$ converge uniformly to the sharp $L^1$ penalty $\alpha|\cdot|$; $\mu\psi_{\alpha/\mu}^{[0,0]}$ is strictly convex, whereas $\mu\psi_{\alpha/\mu}^{\pm[0,0]}$ is flat around zero.
Figure 2: Graph of $\psi_{\rho^\ast}^{[0,0]}$ and $\psi_{\rho^\ast}^{\pm[0,0]}$ for different values of $\rho^\ast$ (left) and graph of $\mu\psi_{\alpha/\mu}^{[0,0]}$ and $\mu\psi_{\alpha/\mu}^{\pm[0,0]}$ for different values of $\mu$ (right), where $\psi_{\rho^\ast}^{[0,0]}$ stems from the combined marginalization and $\psi_{\rho^\ast}^{\pm[0,0]}$ arises from the formulation with two inequalities. These examples employ $\alpha\equiv 1$ constant and the inverse barrier $b(t)=-\tfrac{1}{t}+\operatorname{\delta}_{(-\infty,0)}(t)$.
(a) Functions $\psi_{\rho^\ast}^{[l,u]}$ and $\psi_{\rho^\ast}^{\pm[l,u]}$ behave similarly. As is apparent, $\psi_{\rho^\ast}^{[l,u]},\psi_{\rho^\ast}^{\pm[l,u]}\nearrow b^{\pm[l,u]}$ as $\rho^\ast\nearrow\infty$, where $b^{\pm[l,u]}\coloneqq b(\cdot-u)+b(l-\cdot)$.
(b) As $\mu\searrow 0$, both $\mu\psi_{\alpha/\mu}^{[l,u]}$ and $\mu\psi_{\alpha/\mu}^{\pm[l,u]}$ converge to the sharp $L^1$ penalty $\alpha\operatorname{dist}_{[l,u]}$, the former being strictly convex while the latter is flat around $\tfrac{l+u}{2}$.
Figure 3: Graph of $\psi_{\rho^\ast}^{[l,u]}$ and $\psi_{\rho^\ast}^{\pm[l,u]}$ for different values of $\rho^\ast$ (left) and graph of $\mu\psi_{\alpha/\mu}^{[l,u]}$ and $\mu\psi_{\alpha/\mu}^{\pm[l,u]}$ for different values of $\mu$ (right), where $\psi_{\rho^\ast}^{[l,u]}$ stems from the combined marginalization and $\psi_{\rho^\ast}^{\pm[l,u]}$ arises from the formulation with two inequalities. These examples employ $l=0$, $u=2$, $\alpha\equiv 1$ constant and the inverse barrier $b(t)=-\tfrac{1}{t}+\operatorname{\delta}_{(-\infty,0)}(t)$.

5 Numerical experiments

This section is dedicated to experimental results and comparisons with other numerical approaches for constrained structured optimization. We refer to our implementation of Algorithm 1 as Alg1. The performance and behavior of Alg1 is illustrated in different variants, considering two barrier functions, namely $b(t)=-\frac{1}{t}$ and $b(t)=\ln\frac{t-1}{t}$ (both extended as $\infty$ on $\mathbb{R}_+$), denominated inverse and log-like, respectively, and two inner solvers, NMPG [8] and PANOC+ [11, 24]. The numerical comparison highlights the influence of the barrier function on the performance of Alg1, supporting the quality assessment of Section 4.1.

The two subsolvers follow a proximal-gradient scheme and can handle merely locally smooth problems (as opposed to requiring global Lipschitz continuity of the gradient of the smooth term). NMPG combines a spectral stepsize with a nonmonotone globalization strategy. PANOC+ can exploit acceleration directions (e.g., of quasi-Newton type) while ensuring convergence through a backtracking linesearch; see also [25, §5.1].
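The following Python sketch is not the NMPG or PANOC+ logic, but a bare-bones proximal-gradient loop with stepsize backtracking of the kind both subsolvers build upon; phi, grad_phi and prox_g are hypothetical oracles for the smooth part of the subproblem and the proximal mapping of the nonsmooth term.

```python
import numpy as np

def prox_gradient(x0, phi, grad_phi, prox_g, gamma0=1.0, tol=1e-6, max_iter=1000):
    """Bare-bones proximal-gradient loop with backtracking (illustrative only).

    phi, grad_phi: smooth part and its gradient; prox_g(v, gamma): prox of g.
    """
    x, gamma = x0, gamma0
    for _ in range(max_iter):
        g_x = grad_phi(x)
        while True:  # backtrack until the descent-lemma inequality holds locally
            z = prox_g(x - gamma * g_x, gamma)
            d = z - x
            if phi(z) <= phi(x) + g_x @ d + (d @ d) / (2.0 * gamma):
                break
            gamma *= 0.5
        if np.linalg.norm(d) <= tol * gamma:  # fixed-point residual as stopping test
            return z
        x = z
    return x
```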

The performance of Alg1 is compared against those of IPprox [12, Alg. 1] and ALPS [9, Alg. 4.1], the latter based on [10]. IPprox builds upon a pure interior-point scheme and solves the barrier subproblems with a tailored adaptive proximal-gradient algorithm. ALPS belongs to the family of augmented Lagrangian algorithms and does not require a custom subsolver—suitable subsolvers for Alg1 can be applied within ALPS and vice versa.

Following the simulations of [12, §5.2], we examine the nonnegative PCA problem in Section 5.2 to evaluate Alg1 in several variants and compare it against IPprox. Then, Section 5.3 focuses on a low-rank matrix completion task, a fully nonconvex problem with bilateral constraints, contrasting Alg1 and ALPS. Finally, the exact penalty behavior and the ability to handle hidden equalities are illustrated and discussed in Sections 5.4 and 5.5, respectively.

The source code of our implementation has been made available for reproducibility of the numerical results presented in this paper; it can be found on Zenodo at doi: 10.5281/zenodo.11098283.

5.1 Implementation details

We describe here details pertinent to our implementation Alg1 of Algorithm 1, such as the initialization and update of algorithmic parameters. These numerical features tend to improve the practical performance without compromising the convergence guarantees. IPprox is available from [12] and adopted as is, whereas ALPS is a slight modification of the code from [9] to make it comparable with Alg1, as detailed below.

Alg1 accepts problems formulated as in (10), with bilateral bounds defined by extended-real-valued vectors ${\bm l}$ and ${\bm u}$. In a preprocessing phase (within Alg1), these vectors are parsed to instantiate the penalty-barrier functions that treat one-sided, two-sided and equality constraints.

Default parameters for Alg1 are $\mu_0=1$ and $\delta_\varepsilon=\delta_\mu=1/4$ as in IPprox, $\delta_\alpha=1/2$ as in ALPS, and $\alpha_0=1$. The initial tolerance $\varepsilon_0$ for Alg1 (and ALPS) is chosen adaptively, based on the user-provided starting point ${\bm x}^0$ and the penalty-barrier parameters. Matching the mechanism implemented in IPprox, we set $\varepsilon_0=\max\{\epsilon_{\rm d},\kappa_\varepsilon\eta_0\}$, where $\kappa_\varepsilon\in(0,1)$ is a user-specified parameter (default $\kappa_\varepsilon=10^{-2}$) and $\eta_0$ is an estimate of the initial stationarity measure, as evaluated by the inner solver invoked at $({\bm x}^0,\alpha_0,\mu_0)$. For simplicity, no infeasibility detection mechanism nor artificial bounds on penalty and barrier parameters have been included.

We run ALPS with the same settings as in [10, 9] apart from the following features, adjusted to match Alg1: the initial penalty parameter is fixed ($\alpha_0=1$) rather than adaptive, the tolerance reduction factor is set to $\delta_\varepsilon=1/4$ instead of $\delta_\varepsilon=1/10$, and the initial inner tolerance is selected adaptively rather than fixed to $\varepsilon_0=\epsilon_{\rm d}^{1/3}$. We always initialize ALPS with dual estimate ${\bm y}^0={\bm 0}$. The two subsolvers are considered with their default tuning: PANOC+ with L-BFGS directions (memory 5) and monotone linesearch strategy as in [11], NMPG with spectral stepsize and nonmonotone globalization with average-based merit function as in [8].

For $P$ the set of problems and $S$ the set of solvers, let $t_{s,p}$ denote the user-defined metric for the computational effort required by solver $s\in S$ to solve instance $p\in P$ (lower is better). We monitor the (total) number of gradient evaluations, so that the computational overhead triggered by backtracking is fairly accounted for, and the number of (outer) iterations. Then, to graphically summarize our numerical results and compare different solvers, we display so-called data profiles. A data profile is the graph of the cumulative distribution function $f_s\colon[0,\infty)\to[0,1]$ of the evaluation metric, namely $f_s(t)\coloneqq|\{p\in P \mid t_{s,p}\leq t\}|/|P|$. As such, each data profile reports the fraction $f_s(t)$ of problems solved by solver $s$ within a budget $t$ of the evaluation metric, and is therefore independent of the other solvers.
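For reference, a data profile can be computed in a few lines; the sketch below assumes the per-instance metrics are stored with numpy.inf marking unsolved instances.

```python
import numpy as np

def data_profile(costs, budgets):
    """costs: dict mapping solver -> array of per-problem metrics t_{s,p}
    (np.inf when the instance was not solved); budgets: 1D array of t values.
    Returns dict solver -> fraction of problems solved within each budget."""
    profiles = {}
    for solver, t_sp in costs.items():
        t_sp = np.asarray(t_sp, dtype=float)
        profiles[solver] = np.array([(t_sp <= t).mean() for t in budgets])
    return profiles
```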

5.2 Nonnegative PCA

Principal component analysis (PCA) aims at estimating the direction of maximal variability of a high-dimensional dataset. Imposing nonnegativity of entries as prior knowledge, we address PCA restricted to the positive orthant:

$\operatorname*{maximize}_{{\bm x}\in\mathbb{R}^n}\; {\bm x}^\top{\bm Z}{\bm x} \quad \operatorname{subject\ to}\; \|{\bm x}\|=1,\; {\bm x}\geq{\bm 0}.$ (13)

This task falls within the scope of (1), with $f({\bm x})\coloneqq-{\bm x}^\top{\bm Z}{\bm x}$, $g({\bm x})\coloneqq\operatorname{\delta}_{\|\cdot\|=1}({\bm x})$, and ${\bm c}({\bm x})=-{\bm x}$, and has been considered in [12] for validating IPprox and tuning its hyperparameters.
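As an illustration of how (13) plugs into this structure, the following sketch (hypothetical interface, not the code released on Zenodo) assembles the oracles for $f$, $g$, and ${\bm c}$; note that the proximal map of the sphere indicator is simply the normalization of its argument.

```python
import numpy as np

def make_nonnegative_pca(Z):
    """Oracles for (13): f(x) = -x'Zx, g = indicator of the unit sphere,
    c(x) = -x (so that c(x) <= 0 encodes x >= 0). Illustrative sketch."""
    f      = lambda x: -x @ Z @ x
    grad_f = lambda x: -2.0 * Z @ x                  # Z is symmetric
    # prox of the indicator of {||x|| = 1}: projection onto the sphere (x != 0)
    prox_g = lambda x, gamma: x / np.linalg.norm(x)
    c      = lambda x: -x                            # inequality constraints c(x) <= 0
    jac_c  = lambda x: -np.eye(x.size)
    return f, grad_f, prox_g, c, jac_c
```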

Setup

We generate synthetic problem data as in [12, §5.2]. For a problem size $n\in\mathbb{N}$, let ${\bm Z}=\sqrt{\sigma_n}\,{\bm z}{\bm z}^\top+{\bm N}\in\mathbb{R}^{n\times n}$, where ${\bm N}\in\mathbb{R}^{n\times n}$ is a random symmetric noise matrix, ${\bm z}\in\mathbb{R}^n$ is the true (random) principal direction, and $\sigma_n>0$ is the signal-to-noise ratio. We consider several dimensions $n$ and, for each dimension, the set of problems parametrized by $\sigma_n\in\{0.05,0.1,0.25,0.5,1.0\}$ and $\sigma_s\in\{0.1,0.3,0.7,0.9\}$, which control the noise and sparsity level, respectively. There are 5 choices for $\sigma_n$, 4 for $\sigma_s$, and, for each set of parameters, 5 instances are generated with different problem data ${\bm Z}$ and starting point ${\bm x}^0$. Overall, each solver-settings pair is invoked on 100 different instances for each dimension $n$.

A strictly feasible starting point ${\bm x}^0$ is generated by sampling a uniform distribution over $[0,3]^n$ and projecting onto $\operatorname{dom}g=\{{\bm x}\in\mathbb{R}^n \mid \|{\bm x}\|=1\}$. This property is necessary for IPprox but not for Alg1. We also test Alg1 with arbitrary initialization, in which case ${\bm x}^0$ is generated by sampling a uniform distribution over $[-3,3]^n$ and then projecting onto $\operatorname{dom}g$.
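For concreteness, a possible data generator along these lines is sketched below; the exact sparsification of the true direction ${\bm z}$ and the scaling of the noise ${\bm N}$ in [12, §5.2] may differ, so the snippet should be read as an assumption-laden approximation.

```python
import numpy as np

def make_instance(n, sigma_n, sigma_s, feasible_start=True, rng=None):
    """Synthetic nonnegative-PCA instance: Z = sqrt(sigma_n) z z' + N (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(n)
    z[rng.random(n) < sigma_s] = 0.0               # sparsify the true direction
    z = np.abs(z) / max(np.linalg.norm(z), 1e-12)  # nonnegative, unit norm
    N = rng.standard_normal((n, n)) / n
    N = (N + N.T) / 2.0                            # symmetric noise matrix
    Z = np.sqrt(sigma_n) * np.outer(z, z) + N
    lo = 0.0 if feasible_start else -3.0           # strictly feasible vs arbitrary start
    x0 = rng.uniform(lo, 3.0, size=n)
    x0 = x0 / np.linalg.norm(x0)                   # project onto the unit sphere
    return Z, x0
```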

Barriers and subsolvers

Algorithm 1 is controlled by, and its performance depends on, several algorithmic hyperparameters, such as the (sequences of) barrier and penalty parameters, the choice of barrier function $b$, and the subsolver adopted at Step 1.3. We now focus on the effect of the last two elements, for different levels of accuracy requirements, testing all combinations of barriers and subsolvers over problem dimensions $n\in\{10,15,20,25,30\}$, for a total of 500 calls to each solver. For this set of experiments we set a time limit of 100 seconds on each call.

The results are graphically summarized in Fig. 4. As IPprox performed almost identically with the two barrier functions, only the inverse variant (originally considered in [12]) is displayed for clarity. Moreover, because of the excessive run time needed to perform all simulations for IPprox with high accuracy, we exclude it altogether from the high-accuracy tests and consider instead starting points ${\bm x}^0$ that are not necessarily (strictly) feasible.

For low and medium accuracy, all instances are solved by Alg1 and IPprox up to the desired primal-dual tolerances. With high accuracy, only the variant of Alg1 with NMPG and the log-like barrier is not able to solve all instances within the time limit. Across all accuracy levels, Alg1 PANOC+ inverse operates consistently better than the other variants of Alg1, all of which outperform IPprox. In particular, the overall effort (number of gradient evaluations) required with PANOC+ is lower than with NMPG (for fixed barrier), and with the inverse barrier it is lower than with the log-like one (for fixed subsolver). With increasing accuracy it becomes more advantageous to adopt PANOC+ than NMPG, while IPprox performs gradually more poorly. The slow tail convergence typical of first-order schemes badly affects the scalability of IPprox, whereas the quasi-Newton scheme within PANOC+ appears to beat the simpler spectral approximation in NMPG.

In terms of (outer) iterations, the results appear, unsurprisingly, independent of the subsolver. Moreover, the log-like barrier invariably demands the solution of fewer subproblems than the inverse one, in agreement with the discussion in Section 4.1 on the barrier's well behavior. However, we emphasize that the overall computational effort (measured in terms of gradient evaluations) also depends on the subsolver's efficiency in solving the subproblems, as demonstrated by Fig. 4.

Figure 4: Nonnegative PCA problem (13): comparison of solvers with low, medium and high accuracy $\epsilon_{\rm p}=\epsilon_{\rm d}=\varepsilon\in\{10^{-3},10^{-4},10^{-5}\}$ (top to bottom) using data profiles relative to the number of gradient evaluations (left) and outer iterations (right). With high accuracy, Alg1 NMPG log-like is not able to solve all instances within the time limit, whereas results for IPprox are not included due to excessive run time.

Problem size and accuracy

To investigate scalability and the influence of accuracy requirements, we consider instances of (13) with dimensions $n\in\{10,\lceil 10^{1.5}\rceil,10^2,\lceil 10^{2.5}\rceil,10^3\}$ and tolerances $\epsilon_{\rm p}=\epsilon_{\rm d}=\varepsilon\in\{10^{-3},10^{-4},10^{-5}\}$, without time limit. For each of these tolerance parameters, we test Alg1 with PANOC+ and the log-like barrier. For this test, we generate 5 instances (as described above) for each set of parameters, leading to a total of 500 problem instances to be solved with increasing accuracy.

All instances are solved up to the desired primal-dual tolerances. The influence of problem size and tolerance is depicted in Fig. 5, which displays for each pair $(n,\varepsilon)$ the number of gradient evaluations as a jitter plot (for a better visualization of the distribution of numerical values over categories). The empirical cumulative distribution function and the associated median value are also indicated. This chart visualizes how problem size and accuracy requirements affect the solution process, and reveals the stark effect of both $n$ and $\varepsilon$. For low accuracy, Alg1 scales relatively well with the problem size, whereas large-scale problems become prohibitive for high accuracy.

This behavior is typical of first-order methods, due to their slow tail convergence, and we take it as a motivation for investigating the interaction between subproblems and subsolvers in future work. Nevertheless, these experiments (and those forthcoming) demonstrate Alg1's capability to handle thousands of variables and constraints in a fully nonconvex optimization landscape, witnessing a tremendous improvement over IPprox, not only in practical performance but also in ease of use.

Figure 5: Nonnegative PCA problem (13) with Alg1 PANOC+ log-like: comparison for increasing accuracy requirements (decreasing tolerances $\epsilon_{\rm p}=\epsilon_{\rm d}=\varepsilon$) and problem sizes $n$. Combination of jitter plot (dots) and empirical cumulative distribution function (solid line) with median value (vertical segment).

5.3 Low-rank matrix completion

Given an incomplete matrix of (uncertain) ratings ${\bm Y}$, a common task is to find a complete ratings matrix ${\bm X}$ that is a parsimonious representation of ${\bm Y}$, in the sense of low rank, and such that ${\bm Y}\approx{\bm X}$ on the available entries [19]. Let $\#_u$ and $\#_m$ denote the number of users and items, respectively, and let the rating $Y_{i,j}$ by the $i$th user for the $j$th item range on a scale defined by constants $Y_{\min}$ and $Y_{\max}$. Let $\Omega$ represent the set of observed ratings, and $|\Omega|$ its cardinality. The ratings matrix ${\bm Y}$ can be very large and most of its entries are typically unobserved, since a given user will only rate a small subset of items. Low-rankness of ${\bm X}$ can be enforced by construction, with the Ansatz ${\bm X}\equiv{\bm U}{\bm V}^\top$, as in dictionary learning. In practice, for some prescribed embedding dimension $\#_a$, we seek a user embedding matrix ${\bm U}\in\mathbb{R}^{\#_u\times\#_a}$ and an item embedding matrix ${\bm V}\in\mathbb{R}^{\#_m\times\#_a}$. Each row ${\bm U}_{i,:}$ of ${\bm U}$ is a $\#_a$-dimensional vector representing user $i$, while each row ${\bm V}_{j,:}$ of ${\bm V}$ is a $\#_a$-dimensional vector representing item $j$. We address the joint completion and factorization of the ratings matrix ${\bm Y}$, encoded in the following form:

$\begin{aligned}
\operatorname*{minimize}_{{\bm U}\in\mathbb{R}^{\#_u\times\#_a},\,{\bm V}\in\mathbb{R}^{\#_m\times\#_a}}\quad & \frac{1}{|\Omega|}\sum_{(i,j)\in\Omega}\bigl(\langle{\bm U}_{i,:},{\bm V}_{j,:}\rangle-Y_{i,j}\bigr)^2+\frac{\lambda}{\#_m}\sum_{j=1}^{\#_m}\|{\bm V}_{j,:}\|_0 && \\
\operatorname{subject\ to}\quad & \max\{Y_{\min},Y_{i,j}-1\}\leq\langle{\bm U}_{i,:},{\bm V}_{j,:}\rangle\leq\min\{Y_{\max},Y_{i,j}+1\} && \forall (i,j)\in\Omega,\\
& Y_{\min}\leq\langle{\bm U}_{i,:},{\bm V}_{j,:}\rangle\leq Y_{\max} && \forall (i,j)\notin\Omega,\\
& \|{\bm U}_{i,:}\|_2=1 && \forall i\in\{1,\dots,\#_u\}.
\end{aligned}$ (14)

While aiming at ${\bm U}{\bm V}^\top\approx{\bm Y}$, the model in (14) imposes the rating range $[Y_{\min},Y_{\max}]$ as a hard constraint on all predictions; a tighter constraint is imposed on observed ratings. Following [25, §6.2], we explicitly constrain the norm of the dictionary atoms ${\bm U}_{i,:}$, without loss of generality, to reduce the number of equivalent (up to scaling) solutions; this norm specification is included as an indicator in the nonsmooth objective term $g$. Furthermore, we encourage sparsity of the coefficient representation ${\bm V}_{j,:}$ with the $\|\cdot\|_0$ penalty, which counts the nonzero elements, scaled by a regularization parameter $\lambda\geq 0$. Overall, this problem has $n\coloneqq\#_a(\#_u+\#_m)$ decision variables and $m\coloneqq\#_u\#_m$ bilateral constraints. All terms ($f$, $g$, and ${\bm c}$) are nonconvex, as is the (unbounded) feasible set.
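To make the casting concrete, the following sketch (hypothetical names, dense indexing for brevity; not the released implementation) evaluates the smooth part of (14) and the matrix of predictions that the bilateral constraints act on.

```python
import numpy as np

def smooth_cost(U, V, Y_obs, idx):
    """f(U,V) = (1/|Omega|) * sum over observed (i,j) of (<U_i, V_j> - Y_ij)^2."""
    i, j = idx                                    # arrays of observed row/column indices
    resid = np.einsum("ka,ka->k", U[i], V[j]) - Y_obs
    return resid @ resid / len(Y_obs)

def constraint_values(U, V):
    """c(U,V): predictions <U_i, V_j> for every pair (i,j); in (14) these are kept
    within [Ymin, Ymax] everywhere and within +/-1 of Y_ij on Omega."""
    return U @ V.T                                # #u-by-#m matrix of predictions
```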

It appears nontrivial to find a strictly feasible point for (14), in the sense of [12, Def. 2], as required for initializing IPprox, which highlights a major advantage of Alg1.

Setup

We consider the MovieLens 100k dataset (available at https://grouplens.org/datasets/movielens/100k/), which contains 1000023 ratings for 3706 unique movies (the dataset contains some repetitions in movie ratings, which we have ignored); these recommendations were made by 6040 users on a discrete rating scale from $Y_{\min}=1$ to $Y_{\max}=5$. Constructing the matrix of movie ratings by the users yields a sparse unstructured matrix with only 4.47% of the total entries available.

We compare Alg1 and ALPS and test their scalability on instances of increasing size. We fix the number of atoms to $\#_a=10$ and consider the instances of (14) corresponding to subsets of $\#_u\in\{10,15,\ldots,45\}$ users (always starting from the first one). For these problem instances the sizes range over $n\in[7220,11060]$ and $m\in[7120,47745]$. We set the regularization parameter $\lambda=10^{-2}$ and invoke each solver with primal-dual tolerances $\epsilon_{\rm p}=\epsilon_{\rm d}=10^{-3}$ and without time limit. For each problem instance, we randomly generate 10 starting points, for a total of 80 calls to each solver variant.

Results

A summary of the numerical results is depicted in Fig. 6. For the sake of clarity we display only the variant of ALPS with NMPG, since it performed better than that with PANOC+. The variants of Alg1 with the inverse barrier are also not included, as their profiles were intermediate between those of the Alg1 variants with the log-like barrier. All solver variants were always able to find a solution up to the primal-dual tolerance. Although not all variants of Alg1 consistently outperform ALPS, Alg1 NMPG log-like finishes ahead in most calls, but the advantage becomes insignificant for larger instances. Alg1 PANOC+ log-like appears comparable to ALPS on smaller instances, but then exhibits a performance much more sensitive to the starting point (its shaded area in Fig. 6 is significantly larger for larger instances). This behavior may stem from the more complicated mechanisms (search direction and globalization with line search) employed in PANOC+. These observations agree with those in the previous Section 5.2 and highlight the potential benefits of subsolvers tailored to the subproblems' structure.

Figure 6: Comparison of different solvers and variants for the matrix completion problem (14). Data profiles (left) and computational effort for increasing problem size (right), relative to the number of gradient evaluations. For each solver and problem size, the right panel indicates the mean value (marked line) and interquartile range (shaded area).

5.4 Exact penalty behavior

This subsection is dedicated to the penalty-barrier behavior of Alg1. When addressing the nonconvex problem (14), we observed that the sequences of penalty parameters $(\alpha_k)_{k\in\mathbb{N}}$ remain constant (equal to the initial value $\alpha_0=1$) for all instances and solver variants. This is enabled by the relaxed condition at Step 1.10 of Algorithm 1, which does not require a sufficient improvement at every iteration, but instead monitors globally how the constraint violation $p_k$ vanishes. Correspondingly, only the barrier parameter $\mu_k$ is decreased in order to reduce the complementarity slackness $s_k$; see Item 3. When active, this exact penalty quality prevents the barrier from introducing too much ill-conditioning, exploiting Item 3.

Even in the fully nonconvex problem of the previous Section 5.3, the sequence of penalty parameters $(\alpha_k)_{k\in\mathbb{N}}$ generated by Alg1 always remains bounded. Although this indicates that the assumptions behind Item 2 could be relaxed, the penalty exactness does not always take effect. We now illustrate an example problem where Alg1 exhibits $\alpha_k\nearrow\infty$, hence it does not boil down to an exact penalty method. For this purpose it suffices to consider the two-dimensional convex problem

$\operatorname*{minimize}_{{\bm x}\in\mathbb{R}^2}\; x_1+\operatorname{\delta}_{\mathbb{R}_+}(x_2) \quad \operatorname{subject\ to}\; x_1^2+x_2\leq 0,$ (15)

whose (unique) solution is the only feasible point ${\bm x}^\star=(0,0)$, which is however not optimal in the sense of Definition 2.2 (there exists no suitable multiplier ${\bm y}^\star$).
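To see why no multiplier can certify optimality at ${\bm x}^\star$, note that feasibility forces $0\leq x_2\leq -x_1^2$, so ${\bm x}^\star=(0,0)$ is indeed the only feasible point. Writing the nonsmooth cost as $q({\bm x})=x_1+\operatorname{\delta}_{\mathbb{R}_+}(x_2)$ and the constraint as $c({\bm x})=x_1^2+x_2$, a KKT-type stationarity condition $0\in\partial q({\bm x}^\star)+y^\star\nabla c({\bm x}^\star)$ with $y^\star\geq 0$ would require, since $\partial q({\bm x}^\star)=\{1\}\times(-\infty,0]$ and $\nabla c({\bm x}^\star)=(0,1)$, that the first component satisfy $0=1+y^\star\cdot 0=1$, which is impossible; hence no multiplier exists, and the penalty parameter must grow unbounded to enforce feasibility.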

We solve problem (15) with tolerances $\epsilon_{\rm p}=\epsilon_{\rm d}=10^{-7}$ starting from 100 random initializations, generated according to $x_i^0\sim\mathcal{N}(0,\sigma_x^2)$ with large standard deviation $\sigma_x=30$.

Results

All solver variants find a primal-dual solution to (15), up to the tolerances, for all starting points. The unbounded behavior of the penalty parameters $(\alpha_k)_{k\in\mathbb{N}}$ is evident in Fig. 7. Thus, $\alpha_k\nearrow\infty$ appears necessary to drive the constraint violation $p_k$ to zero, while the barrier parameter $\mu_k\searrow 0$ forces the complementarity slackness $s_k$ to vanish.

Figure 7: Illustration of the algorithmic behavior of the penalty parameter when solving the convex problem (15). Ensemble of trajectories for different instances, starting points and barrier functions, indicating how many times the penalty parameter $\alpha_k$ is updated during the solution process. In all cases the sequence of penalty parameters $(\alpha_k)_{k\in\mathbb{N}}$ blows up.

The numerical performance of Alg1 and ALPS is summarized in Fig. 8, where ALPS is displayed only in the NMPG variant (which performed better than that with PANOC+). Considering both the overall effort (number of gradient evaluations) and the number of subproblems needed by Alg1, the log-like barrier yields better results than the inverse one (for fixed subsolver) and NMPG is more efficient than PANOC+ (for fixed barrier). Moreover, Alg1 performs consistently better than ALPS, despite the lack of penalty exactness.

Figure 8: Comparison of different solvers and variants for the convex problem (15). Data profiles relative to the number of gradient evaluations (left) and total inner iterations (right).

5.5 Handling equalities

Even though bilateral constraints can be handled explicitly, as examined in Section 4.3, it is important that Alg1 can cope with hidden equalities too. These may appear as the result of automatic model constructions and are often difficult to identify by inspection. Here we compare the behavior of Alg1 on problems whose specification contains explicit equalities against the same problems with each equality described as two inequalities. Consider possibly nonconvex quadratic programming (QP) problems of the form

$\operatorname*{minimize}_{{\bm x}\in\mathbb{R}^n}\; \tfrac{1}{2}{\bm x}^\top{\bm Q}{\bm x}+\langle{\bm q},{\bm x}\rangle \quad \operatorname{subject\ to}\; {\bm A}{\bm x}={\bm b},\; \underline{\bm x}\leq{\bm x}\leq\overline{\bm x}$ (16)

with matrices ${\bm Q}\in\mathbb{R}^{n\times n}$, ${\bm A}\in\mathbb{R}^{m\times n}$ and vectors ${\bm q},\underline{\bm x},\overline{\bm x}\in\mathbb{R}^n$, ${\bm b}\in\mathbb{R}^m$ as problem data. Problem (16) can be cast as (10) with $f({\bm x})\coloneqq\tfrac{1}{2}{\bm x}^\top{\bm Q}{\bm x}+\langle{\bm q},{\bm x}\rangle$, $g({\bm x})\coloneqq\operatorname{\delta}_{[\underline{\bm x},\overline{\bm x}]}({\bm x})$, ${\bm c}({\bm x})\coloneqq{\bm A}{\bm x}-{\bm b}$ and ${\bm l}\coloneqq{\bm u}\coloneqq{\bm 0}$. We are interested in comparing the performance of Alg1 (in its different variants) with the two problem formulations described in Section 4.3 for dealing with equalities: either splitting each into two inequalities (leading to $\psi_{\rho^\ast}^{\pm[0,0]}$ defined in (9)) or performing a combined marginalization (resulting in $\psi_{\rho^\ast}^{[0,0]}$ given by (11)). Hence, for each solver variant and problem instance, we contrast these two formulations, denoted Alg1± and Alg1, respectively.

Setup

Problem instances are generated as follows: we let $\bm{Q}=(\bm{M}+\bm{M}^\top)/2$ where the elements of $\bm{M}\in\mathbb{R}^{n\times n}$ are normally distributed, $M_{ij}\sim\mathcal{N}(0,1)$, with only 10% being nonzero. The linear part of the cost $\bm{q}$ is also normally distributed, i.e., $q_i\sim\mathcal{N}(0,1)$. Simple bounds are generated according to a uniform distribution, i.e., $\underline{x}_i\sim-\mathcal{U}(0,1)$ and $\overline{x}_i\sim\mathcal{U}(0,1)$. We set the elements of $\bm{A}\in\mathbb{R}^{m\times n}$ as $A_{ij}\sim\mathcal{N}(0,1)$, again with only 10% being nonzero. To ensure that the problem is feasible, we draw a point $\widehat{\bm{x}}\in[\underline{\bm{x}},\overline{\bm{x}}]$ (as $\widehat{x}_i=\underline{x}_i+(\overline{x}_i-\underline{x}_i)a_i$ with $a_i\sim\mathcal{U}(0,1)$) and set $\bm{b}=\bm{A}\widehat{\bm{x}}$. An initial guess, shared across all solvers and formulations, is randomly generated for each problem instance as $x_i^0\sim\mathcal{N}(0,1)$.
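This recipe can be reproduced with a few lines of NumPy; the sketch below is illustrative only and is not the code used for the reported experiments.

import numpy as np

def random_qp(n, m, density=0.10, seed=0):
    # One random (possibly nonconvex) QP instance following the recipe above.
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((n, n)) * (rng.random((n, n)) < density)  # ~10% nonzeros
    Q = (M + M.T) / 2                                   # symmetric quadratic term
    q = rng.standard_normal(n)                          # linear cost
    lb, ub = -rng.uniform(size=n), rng.uniform(size=n)  # simple bounds
    A = rng.standard_normal((m, n)) * (rng.random((m, n)) < density)
    x_hat = lb + (ub - lb) * rng.uniform(size=n)        # point inside the box
    b = A @ x_hat                                       # guarantees feasibility
    x0 = rng.standard_normal(n)                         # shared initial guess
    return Q, q, A, b, lb, ub, x0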

We consider problems with $m\in\{1,2,\ldots,10\}$ and $n=10m$, set the tolerances $\epsilon_{\rm p}=\epsilon_{\rm d}=10^{-5}$, and construct 10 instances for each size, for a total of 100 calls to each solver for each formulation.

Results

Numerical results are visualized by means of pairwise (extended) performance profiles. Let $t_{s,p}$ and $t_{s,p}^{\pm}$ denote the evaluation metric of solver $s\in S$ on an instance $p\in P$ with the two formulations. Then, for each solver $s$, the corresponding pairwise performance profile displays the cumulative distribution $\rho_s:[0,\infty)\to[0,1]$ of its performance ratio $\tau_{s,p}$, namely

\[
\rho_s(\tau)\coloneqq\frac{|\{p\in P\mid\tau_{s,p}\leq\tau\}|}{|P|}
\quad\text{where}\quad
\tau_{s,p}\coloneqq\frac{t_{s,p}}{t_{s,p}^{\pm}}.
\]

Thus, the profile for solver $s$ indicates the fraction of problems $\rho_s(\tau)$ for which solver $s$ invoked by Alg1 requires at most $\tau$ times the computational effort needed by the same solver $s$ invoked by Alg1$^\pm$. As depicted in Fig. 9, all pairwise performance profiles cross the unit ratio with at least 83% of the problems solved, meaning that all solver variants benefit from the tailored handling of equality constraints of Section 4.3. All variants are nevertheless robust to degenerate formulations, confirming that our algorithmic framework can endure redundant constraints and hidden equalities.
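The pairwise profiles can be computed directly from the two arrays of per-instance metrics; the snippet below is a minimal sketch of the definition above (with the number of gradient evaluations as the metric), not the plotting code behind Fig. 9.

import numpy as np

def pairwise_profile(t, t_pm, taus):
    # Cumulative distribution rho_s(tau) of the ratios tau_{s,p} = t_{s,p} / t^{pm}_{s,p}
    # for one solver s over the instance set P (t and t_pm have one entry per instance).
    ratio = np.asarray(t, dtype=float) / np.asarray(t_pm, dtype=float)
    return np.array([np.mean(ratio <= tau) for tau in taus])

# hypothetical gradient-evaluation counts on five instances, with both formulations
rho = pairwise_profile([80, 120, 90, 200, 50], [100, 110, 100, 180, 100],
                       taus=np.linspace(0.0, 3.0, 61))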

Figure 9: Comparison of different solvers and formulations for quadratic programs (16). Pairwise performance profiles, relative to the number of gradient evaluations, for formulation Alg1 (explicit equality) over Alg1$^\pm$ (split into two inequalities). Profiles located in the top-left indicate that the former tends to outperform the latter.

6 Final remarks and open questions

We proposed an optimization framework for the numerical solution of constrained structured problems in the fully nonconvex setting. We went beyond a simple combination of (exact) penalty and barrier approaches by taking a marginalization step, which not only allows us to reduce the problem size but also enables the adoption of generic subsolvers. In particular, by extending the domain of the subproblems' smooth objective term, the proposed methodology overcomes the need for safeguards within the subsolver and the difficulty of accelerating it, a major drawback of IPprox [12]. Under mild assumptions, our theoretical analysis established convergence results on par with those typical for nonconvex nonsmooth optimization; most notably, all feasible accumulation points are asymptotically KKT optimal. We tested our approach numerically on problems arising in data science, studying scalability and the effect of accuracy requirements. Furthermore, illustrative examples confirmed the robust behavior on badly formulated problems and degenerate cases.

The methodology in this paper could be applied to, and compared with, a combination of barrier and augmented Lagrangian approaches. By generating a smoother penalty-barrier term, such a strategy could make the subsolvers even more effective; however, this development comes with the additional challenge of designing suitable updates for the Lagrange multipliers. Future research may also focus on specializing the proposed framework to classical nonlinear programming, taking advantage of its special structure and linear algebra. Finally, mechanisms for rapid infeasibility detection and for guaranteeing the existence of solutions to the subproblems should be investigated.

Appendix A Auxiliary results

This appendix contains some auxiliary results and proofs of statements referred to in the main body.

Lemma A.1 (Properties of the barrier $b$).

Any function $b$ as in Section 3.2 satisfies the following:

  1. $\lim_{t\to-\infty}b(t)=\inf b=0$ and $\lim_{t\to0^-}b(t)=\lim_{t\to0^-}b'(t)=\infty$.

  2. The conjugate $b^\ast$ is continuously differentiable on the interior of its domain $\operatorname{dom}b^\ast=\mathbb{R}_+$ with $(b^\ast)'<0$, and satisfies $b^\ast(0)=0$ and $\lim_{t^\ast\to\infty}b^\ast(t^\ast)=-\infty$.

  3. $b^\ast(t^\ast)=(b^\ast)'(t^\ast)\,t^\ast-b\bigl((b^\ast)'(t^\ast)\bigr)$ for any $t^\ast>0$.

  4. The function $(0,\infty)\ni t^\ast\mapsto b^\ast(t^\ast)/t^\ast=t-b(t)/b'(t)$, where $t\coloneqq(b^\ast)'(t^\ast)$, strictly increases from $-\infty$ to $0$.

Proof.

  1) Trivial because of the strict monotonicity of $b$ on $(-\infty,0)$ (since $b'>0$).

  2) Since $b^\ast(t^\ast)\overset{\text{(def)}}{=}\sup_{t<0}\{tt^\ast-b(t)\}$, if $t^\ast<0$ one has that $\lim_{t\to-\infty}tt^\ast-b(t)=\infty$. For $t^\ast=0$ one directly has that $b^\ast(0)=-\inf b=0$, see [1, Prop. 13.10(i)], and in particular $0\in\operatorname{dom}b^\ast$; in addition, since $\operatorname{dom}b^\ast\supseteq\operatorname{dom}(b^\ast)'=\operatorname{range}b'=\mathbb{R}_{++}$ with equality holding by virtue of [1, Thm. 16.29], we conclude that $\operatorname{dom}b^\ast=\mathbb{R}_+$. For the same reason, one has that $(b^\ast)'<0$ on $(0,\infty)$, which proves that $b^\ast$ is strictly decreasing. Finally, since $\inf b^\ast=-b(0)=-\infty$, we conclude that $\lim_{t^\ast\to\infty}b^\ast(t^\ast)=-\infty$.

  3) This is a standard result of Fenchel conjugacy, see e.g. [1, Prop. 16.10], here specialized to the fact that $\operatorname{range}b'=\mathbb{R}_{++}$.

  4) Strict monotonic increase follows by observing that $\bigl(b^\ast(t^\ast)/t^\ast\bigr)'=b(t)/(t^\ast)^2>0$ for $t^\ast>0$. Moreover,
\[
\lim_{t^\ast\to0^+}\tfrac{b^\ast(t^\ast)}{t^\ast}
=\lim_{b'(t)\to0^+}t-\tfrac{b(t)}{b'(t)}
=\lim_{t\to-\infty}t-\overbracket{\tfrac{b(t)}{b'(t)}}^{>0}
=-\infty.
\]
Lastly,
\[
\lim_{t^\ast\to\infty}\tfrac{b^\ast(t^\ast)}{t^\ast}
=\lim_{t^\ast\to\infty}(b^\ast)'(t^\ast)
=\lim_{b'(t)\to\infty}t
=\lim_{t\to0^-}t
=0,
\tag{17}
\]
where the first equality uses L'Hôpital's rule. ∎
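For concreteness, consider the inverse barrier $b(t)=-1/t$ on $(-\infty,0)$, which we assume to be an admissible instance of the class in Section 3.2 (it enjoys all the properties invoked above: $b'>0$, $b''>0$, $\inf b=0$, and $b(t),b'(t)\to\infty$ as $t\to0^-$). Its conjugate objects can be computed in closed form,
\[
b^\ast(t^\ast)=\sup_{t<0}\bigl\{tt^\ast+\tfrac{1}{t}\bigr\}=-2\sqrt{t^\ast}\ \ (t^\ast\geq0),
\qquad
(b^\ast)'(t^\ast)=-\tfrac{1}{\sqrt{t^\ast}}<0,
\qquad
\tfrac{b^\ast(t^\ast)}{t^\ast}=-\tfrac{2}{\sqrt{t^\ast}},
\]
from which items 1–4 of Lemma A.1 can be read off directly.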

We next prove in detail the derivations of (1) and (1). Noticing that the minimization in (1) separates into that of $\sigma_{t,\alpha/\mu}(\tau)\coloneqq\tfrac{\alpha}{\mu}\tau+\delta_{\mathbb{R}_+}(\tau)+b(t-\tau)$ for $t=c_1(\bm{x}),\dots,c_m(\bm{x})$ (after a division by a factor $\mu$), we provide the elementwise version of the claim.

Lemma A.2.

For $\rho^\ast>0$ and $t\in\mathbb{R}$, let $\sigma_{t,\rho^\ast}:\mathbb{R}\to\overline{\mathbb{R}}$ be defined as $\sigma_{t,\rho^\ast}(\tau)=\rho^\ast\tau+\delta_{\mathbb{R}_+}(\tau)+b(t-\tau)$. Then,
\[
\operatorname*{arg\,min}\sigma_{t,\rho^\ast}=\bigl[t-(b^\ast)'(\rho^\ast)\bigr]_+
\tag{18a}
\]
and
\[
\min\sigma_{t,\rho^\ast}=\psi_{\rho^\ast}(t),
\tag{18b}
\]
where
\[
\psi_{\rho^\ast}(t)\coloneqq
\begin{cases}
b(t) & \text{if }b'(t)\leq\rho^\ast\\
\rho^\ast t-b^\ast(\rho^\ast) & \text{otherwise}
\end{cases}
=\bigl(b^\ast+\delta_{[0,\rho^\ast]}\bigr)^\ast(t)
\tag{19c}
\]
is (globally) Lipschitz differentiable and $\rho^\ast$-Lipschitz continuous with derivative
\[
\psi_{\rho^\ast}'(t)=\min\{b'(t),\rho^\ast\}.
\tag{19d}
\]
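To make the statement concrete, for the inverse barrier $b(t)=-1/t$ of the example following Lemma A.1 (again assumed to belong to the class of Section 3.2), one has $b^\ast(\rho^\ast)=-2\sqrt{\rho^\ast}$ and $b'(t)\leq\rho^\ast$ iff $t\leq-1/\sqrt{\rho^\ast}$, so that (19c)–(19d) read
\[
\psi_{\rho^\ast}(t)=
\begin{cases}
-1/t & \text{if }t\leq-1/\sqrt{\rho^\ast}\\
\rho^\ast t+2\sqrt{\rho^\ast} & \text{otherwise}
\end{cases}
\qquad\text{and}\qquad
\psi_{\rho^\ast}'(t)=
\begin{cases}
1/t^{2} & \text{if }t\leq-1/\sqrt{\rho^\ast}\\
\rho^\ast & \text{otherwise,}
\end{cases}
\]
a finite-valued, $\rho^\ast$-Lipschitz function defined on the whole real line.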
Proof.

We first consider (18a). The properties of $b$ as in Section 3.2 ensure that $\sigma_{t,\rho^\ast}$ is strictly convex and coercive, and therefore that it attains a unique (global) minimizer. Moreover, $\sigma_{t,\rho^\ast}$ is differentiable on $\operatorname{int}\operatorname{dom}\sigma_{t,\rho^\ast}=([t]_+,\infty)$; as such, the minimizer is the unique zero of $\sigma_{t,\rho^\ast}'$ if it exists, and $0$ otherwise. Solving $\sigma_{t,\rho^\ast}'(\tau)=0$ for $\tau>0$ gives
\[
\rho^\ast-b'(t-\tau)=0\ \wedge\ \tau>0
\quad\Leftrightarrow\quad
t-\tau=(b^\ast)'(\rho^\ast)\ \wedge\ \tau>0
\quad\Leftrightarrow\quad
\tau=t-(b^\ast)'(\rho^\ast)>0.
\]
Therefore, the minimizer is given by
\[
\operatorname*{arg\,min}\sigma_{t,\rho^\ast}=
\begin{cases}
t-(b^\ast)'(\rho^\ast) & \text{if }t-(b^\ast)'(\rho^\ast)>0\\
0 & \text{otherwise}
\end{cases}
=\bigl[t-(b^\ast)'(\rho^\ast)\bigr]_+,
\]
which yields the claimed expression in (18a).

Next, according to this formula, if $(b^\ast)'(\rho^\ast)<t$ then the minimizer of $\sigma_{t,\rho^\ast}$ is $t-(b^\ast)'(\rho^\ast)$, hence
\begin{align*}
(b^\ast)'(\rho^\ast)<t\quad\Rightarrow\quad
\min\sigma_{t,\rho^\ast}
={}&\sigma_{t,\rho^\ast}\bigl(t-(b^\ast)'(\rho^\ast)\bigr)\\
={}&\rho^\ast t-\rho^\ast(b^\ast)'(\rho^\ast)+b\bigl((b^\ast)'(\rho^\ast)\bigr)\\
={}&\rho^\ast t-\rho^\ast\rho+b(\rho)\\
={}&\rho^\ast t-b^\ast(\rho^\ast),
\end{align*}
where $\rho\coloneqq(b^\ast)'(\rho^\ast)$ (i.e., such that $\rho^\ast=b'(\rho)$) and the last equality owes to Lemma A.1(3). Otherwise, if $(b^\ast)'(\rho^\ast)\geq t$ then the minimizer of $\sigma_{t,\rho^\ast}$ is $0$, which gives
\[
(b^\ast)'(\rho^\ast)\geq t\quad\Rightarrow\quad
\min\sigma_{t,\rho^\ast}=\sigma_{t,\rho^\ast}(0)=b(t).
\]
By observing that $(b^\ast)'(\rho^\ast)\geq t$ iff $b'(t)\leq\rho^\ast$ we infer that
\[
\min\sigma_{t,\rho^\ast}=
\begin{cases}
b(t) & \text{if }b'(t)\leq\rho^\ast\\
\rho^\ast t-b^\ast(\rho^\ast) & \text{otherwise}
\end{cases}
\overset{\text{(def)}}{=}\psi_{\rho^\ast}(t),
\]
which proves (18b).

We next show the alternative expression involving convex conjugacy for $\psi_{\rho^\ast}$ as in (19c). To this end, for a function $h:\mathbb{R}\to\overline{\mathbb{R}}$ and a number $t\in\mathbb{R}$, let $\operatorname{T}_t h\coloneqq h(\cdot-t)$ and $h_-\coloneqq h(-\cdot)$ denote the translation by $t$ and the reflection, respectively, and recall that $(\operatorname{T}_t h)^\ast=h^\ast+t\,{\cdot}$ and $(h_-)^\ast=(h^\ast)_-\eqqcolon h^\ast_-$, see [1, Prop.s 13.23(iii)-(iv)]. We will also use the fact that $(h_1+h_2)^\ast=h_1^\ast\mathbin{\square}h_2^\ast$ holds for any pair of proper, lsc, convex functions $h_i$ defined on the same space, $i=1,2$, where $h_1\mathbin{\square}h_2\coloneqq\inf_{t}h_1(\cdot-t)+h_2(t)$ denotes the infimal convolution, see [1, Prop. 13.24]. Then,
\begin{align*}
\inf\sigma_{t,\rho^\ast}
={}&\inf_{\tau\in\mathbb{R}}\Bigl\{\rho^\ast\tau+\delta_{\mathbb{R}_+}(\tau)+\overbracket{b(t-\tau)}^{\operatorname{T}_t b_-(\tau)}\Bigr\}\\
={}&-\sup_{\tau\in\mathbb{R}}\Bigl\{-\rho^\ast\tau-\bigl[\delta_{\mathbb{R}_+}(\tau)+\operatorname{T}_t b_-(\tau)\bigr]\Bigr\}\\
={}&-\bigl(\operatorname{T}_t b_-+\delta_{\mathbb{R}_+}\bigr)^\ast(-\rho^\ast)\\
={}&-\Bigl[(\operatorname{T}_t b_-)^\ast\mathbin{\square}\delta_{\mathbb{R}_+}^\ast\Bigr](-\rho^\ast)\\
={}&-\Bigl[\bigl(b^\ast_-+t\,{\cdot}\bigr)\mathbin{\square}\delta_{\mathbb{R}_-}\Bigr](-\rho^\ast)\\
={}&-\inf_{u\leq0}\bigl\{b^\ast_-(-\rho^\ast-u)-t(\rho^\ast+u)\bigr\}\\
={}&\sup_{u\leq0}\bigl\{t(\rho^\ast+u)-b^\ast(\rho^\ast+u)\bigr\}
\intertext{and, since $b^\ast(u)=\infty$ for $u<0$,}
={}&\sup_{u\in[0,\rho^\ast]}\bigl\{tu-b^\ast(u)\bigr\}\\
={}&\bigl(b^\ast+\delta_{[0,\rho^\ast]}\bigr)^\ast(t)
\overset{\text{(def)}}{=}\psi_{\rho^\ast}(t),
\end{align*}
which yields (19c); the formula for the derivative as in (19d) then follows immediately.

To conclude, observe that $b$ is essentially differentiable (in the sense that $b'(t)\to\infty$ as $t\to0^-$, $0$ being the only point in the boundary of $\operatorname{dom}b$), locally strongly convex, and locally Lipschitz differentiable (having $b''>0$), all these conditions also holding for the conjugate $b^\ast$ by virtue of [15, Cor. 4.4]. Therefore, $b^\ast+\delta_{[0,\rho^\ast]}$ is (globally) strongly convex, proving the claimed global Lipschitz smoothness of its conjugate $\psi_{\rho^\ast}$. ∎
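As a quick numerical sanity check of Lemma A.2 (not part of the analysis above), the closed forms (18a) and (19c) can be compared against a brute-force grid minimization of $\sigma_{t,\rho^\ast}$ for the assumed inverse barrier $b(t)=-1/t$ of the earlier examples:

import numpy as np

b      = lambda t: -1.0 / t                 # inverse barrier, valid for t < 0
bstar  = lambda s: -2.0 * np.sqrt(s)        # its conjugate b*(s), s >= 0
dbstar = lambda s: -1.0 / np.sqrt(s)        # (b*)'(s), s > 0

def psi(t, rho):                            # closed form (19c) for this barrier
    return b(t) if t <= -1.0 / np.sqrt(rho) else rho * t - bstar(rho)

def sigma_min_grid(t, rho, taus):           # brute-force minimum of sigma_{t,rho}
    return min(rho * tau + b(t - tau) for tau in taus if tau > t)

rho, t = 2.0, 0.3
taus = np.linspace(0.0, 10.0, 200001)
print(max(t - dbstar(rho), 0.0))            # (18a): argmin, here 0.3 + 1/sqrt(2)
print(psi(t, rho), sigma_min_grid(t, rho, taus))  # the two values should agree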

References

[1] Heinz H. Bauschke and Patrick L. Combettes. Convex analysis and monotone operator theory in Hilbert spaces. CMS Books in Mathematics. Springer, 2017.
[2] Shujun Bi and Shaohua Pan. Multistage convex relaxation approach to rank regularized minimization problems based on equivalent mathematical program with a generalized complementarity constraint. SIAM Journal on Control and Optimization, 55(4):2493–2518, 2017.
[3] Ernesto G. Birgin and José Mario Martínez. Practical Augmented Lagrangian Methods for Constrained Optimization. Society for Industrial and Applied Mathematics, Philadelphia, PA, 2014.
[4] Emilie Chouzenoux, Marie-Caroline Corbineau, and Jean-Christophe Pesquet. A proximal interior point algorithm with applications to image processing. Journal of Mathematical Imaging and Vision, 62(6):919–940, 2020.
[5] Andrew R. Conn, Nicholas I. M. Gould, and Philippe L. Toint. Trust Region Methods. Society for Industrial and Applied Mathematics, 2000.
[6] Robert M. Corless, Gaston H. Gonnet, David E. G. Hare, David J. Jeffrey, and Donald E. Knuth. On the Lambert W function. Advances in Computational Mathematics, 5:329–359, 1996.
[7] Frank E. Curtis. A penalty-interior-point algorithm for nonlinear constrained optimization. Mathematical Programming Computation, 4(2):181–209, 2012.
[8] Alberto De Marchi. Proximal gradient methods beyond monotony. Journal of Nonsmooth Analysis and Optimization, 4, 2023.
[9] Alberto De Marchi. Implicit augmented Lagrangian and generalized optimization. Journal of Applied and Numerical Optimization, 6(2):291–320, 2024.
[10] Alberto De Marchi, Xiaoxi Jia, Christian Kanzow, and Patrick Mehlitz. Constrained composite optimization and augmented Lagrangian methods. Mathematical Programming, 201(1):863–896, 2023.
[11] Alberto De Marchi and Andreas Themelis. Proximal gradient algorithms under local Lipschitz gradient continuity. Journal of Optimization Theory and Applications, 194(3):771–794, 2022.
[12] Alberto De Marchi and Andreas Themelis. An interior proximal gradient method for nonconvex optimization. Open Journal of Mathematical Optimization, 2024. To appear.
[13] N. K. Dhingra, S. Z. Khong, and M. R. Jovanović. The proximal augmented Lagrangian method for nonsmooth composite optimization. IEEE Transactions on Automatic Control, 64(7):2861–2868, 2019.
[14] Anthony V. Fiacco and Garth P. McCormick. The sequential unconstrained minimization technique for nonlinear programming, a primal-dual method. Management Science, 10(2):360–366, 1964.
[15] Rafal Goebel and R. Tyrrell Rockafellar. Local strong convexity and local Lipschitz continuity of the gradient of convex functions. Journal of Convex Analysis, 15(2):263, 2008.
[16] Nadav Hallak and Marc Teboulle. An adaptive Lagrangian-based scheme for nonconvex composite optimization. Mathematics of Operations Research, 48(4):2337–2352, 2023.
[17] Ben Hermans, Andreas Themelis, and Panagiotis Patrinos. QPALM: A proximal augmented Lagrangian method for nonconvex quadratic programs. Mathematical Programming Computation, 14:497–541, 2022.
[18] Geoffroy Leconte and Dominique Orban. An interior-point trust-region method for nonsmooth regularized bound-constrained optimization, 2024.
[19] Jakub Mareček, Peter Richtárik, and Martin Takáč. Matrix completion under interval uncertainty. European Journal of Operational Research, 256(1):35–43, 2017.
[20] Edward J. McShane. Extension of range of functions. Bulletin of the American Mathematical Society, 40(12):837–842, 1934.
[21] R. Tyrrell Rockafellar. Convergence of augmented Lagrangian methods in extensions beyond nonlinear programming. Mathematical Programming, 199(1):375–420, 2022.
[22] R. Tyrrell Rockafellar and Roger J. B. Wets. Variational analysis, volume 317. Springer, 1998.
[23] Ajay S. Sathya, Pantelis Sopasakis, Ruben Van Parys, Andreas Themelis, Goele Pipeleers, and Panos Patrinos. Embedded nonlinear model predictive control for obstacle avoidance using PANOC. In 2018 European Control Conference (ECC), pages 1523–1528, 2018.
[24] Lorenzo Stella, Andreas Themelis, Pantelis Sopasakis, and Panagiotis Patrinos. A simple and efficient algorithm for nonlinear model predictive control. In 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pages 1939–1944. IEEE, 2017.
[25] Andreas Themelis, Lorenzo Stella, and Panagiotis Patrinos. Forward-backward envelope for the sum of two nonconvex functions: Further properties and nonmonotone linesearch algorithms. SIAM Journal on Optimization, 28(3):2274–2303, 2018.
[26] Andreas Wächter and Lorenz T. Biegler. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Mathematical Programming, 106(1):25–57, 2006.