A Characterization of Proximity Operators

Journal of Mathematical Imaging and Vision 62, 773–789 (2020). https://doi.org/10.1007/s10851-020-00951-y

Abstract

We characterize proximity operators, that is to say functions that map a vector to a solution of a penalized least-squares optimization problem. Proximity operators of convex penalties have been widely studied and fully characterized by Moreau. They are also widely used in practice with nonconvex penalties such as the \(\ell ^0\) pseudo-norm, yet the extension of Moreau’s characterization to this setting seemed to be a missing element of the literature. We characterize proximity operators of (convex or nonconvex) penalties as functions that are the subdifferential of some convex potential. This is proved as a consequence of a more general characterization of the so-called Bregman proximity operators of possibly nonconvex penalties in terms of certain convex potentials. As a side effect of our analysis, we obtain a test to verify whether a given function is the proximity operator of some penalty, or not. Many well-known shrinkage operators are indeed confirmed to be proximity operators. However, we prove that windowed Group-LASSO and persistent empirical Wiener shrinkage—two forms of a so-called social sparsity shrinkage—are generally not the proximity operator of any penalty; the exception is when they are simply weighted versions of group-sparse shrinkage with non-overlapping groups.
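
A classical illustration of this characterization in the convex case: soft-thresholding \(S_\lambda (y) := \text {sign}(y)\max (|y|-\lambda ,0)\), the proximity operator of the penalty \(\varphi (x)=\lambda |x|\), is the derivative of the convex potential

$$\begin{aligned} \psi (y) = \tfrac{1}{2}\left( \max (|y|-\lambda ,0)\right) ^{2}, \qquad S_\lambda (y) = \psi '(y), \quad \forall \;y \in {\mathbb {R}}. \end{aligned}$$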

Notes

  1. See Sect. 2.1 for detailed notations and reminders on convex analysis and differentiability in Hilbert spaces.

  2. See Sect. 2.1 for brief reminders on the notion of continuity/differentiability in Hilbert spaces.

  3. A continuous linear operator \(L: {{\mathcal {H}}}\rightarrow {{\mathcal {H}}}\) is symmetric if \({\langle }x, Ly{\rangle } = {\langle }Lx, y{\rangle }\) for each \(x,y \in {{\mathcal {H}}}\). A symmetric continuous linear operator is positive semi-definite if \({\langle }x,Lx{\rangle } \geqslant 0\) for each \(x \in {{\mathcal {H}}}\). This is denoted \(L \succeq 0\). It is positive definite if \({\langle }x,Lx{\rangle } >0\) for each nonzero \(x \in {{\mathcal {H}}}\). This is denoted \(L \succ 0\).

  4. See “Appendix 1” for some reminders on Fréchet derivatives in Hilbert spaces.

  5. For the sake of simplicity, we use the same notation \({\langle }\cdot ,\cdot {\rangle }\) for the inner products \({\langle }x,A(y){\rangle }\) (between elements of \({{\mathcal {H}}}\)) and \({\langle }B(x),y{\rangle }\) (between elements of \({{\mathcal {H}}}'\)). The reader can inspect the proof of Theorem 3 to check that the result still holds if we consider Banach spaces \({{\mathcal {H}}}\) and \({{\mathcal {H}}}'\), \({{\mathcal {H}}}^\star \) and \(({{\mathcal {H}}}')^\star \) their duals, and \(A: {\mathcal {Y}}\rightarrow {{\mathcal {H}}}^\star \), \(B: {{\mathcal {H}}}\rightarrow ({{\mathcal {H}}}')^\star \).

  6. That are explicitly constructed as the proximity operator of a convex l.s.c. penalty, e.g., soft-thresholding.

  7. Also known as convex hull, [30, p. 57], [21, Definition 2.5.3].

  8. For a proof, see, e.g., (in French) https://fr.wikipedia.org/wiki/Lemme_de_Cousin section 4.9, version from 13/01/2019.

  9. In general, we may have \(g\ne g_1\) as there is no connectedness assumption on \({\text {dom}}(\theta )\).

  10. The inclusion (29) is true even if f is not a proximity operator.

References

  1. Advani, M., Ganguli, S.: An equivalence between high dimensional Bayes optimal inference and M-estimation. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, pp. 3378–3386. Curran Associates Inc., New York (2016)

  2. Antoniadis, A.: Wavelet methods in statistics: some recent developments and their applications. Stat. Surv. 1, 16–55 (2007)

  3. Bach, F.: Optimization with sparsity-inducing penalties. FNT Mach. Learn. 4(1), 1–106 (2011)

  4. Bakin, S.: Adaptive regression and model selection in data mining problems. Ph.D. thesis, School of Mathematical Sciences, Australian National University (1999)

  5. Bauschke, H.H., Borwein, J.M., Combettes, P.L.: Bregman monotone optimization algorithms. SIAM J. Control Optim. 42(2), 596–636 (2003)

  6. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd edn. Springer, Cham (2017)

  7. Blumensath, T., Davies, M.E.: Iterative hard thresholding for compressed sensing. Appl. Comput. Harmon. Anal. 27(3), 265–274 (2009)

  8. Bredies, K., Lorenz, D.A., Reiterer, S.: Minimization of non-smooth, non-convex functionals by iterative thresholding. J. Optim. Theory Appl. 165(1), 78–112 (2014)

  9. Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7(3), 200–217 (1967)

  10. Cai, T.T., Silverman, B.W.: Incorporating information on neighbouring coefficients into wavelet estimation. Sankhyà Indian J. Stat. Ser. B 63, 127–148 (2001). (Special Issue on Wavelets)

  11. Cartan, H.: Cours de calcul différentiel. Collection Méthodes. Editions Hermann (1977)

  12. Censor, Y., Zenios, S.A.: Proximal minimization algorithm with \(d\)-functions. J. Optim. Theory Appl. 73(3), 451–464 (1992)

  13. Combettes, P.L., Pesquet, J.C.: Proximal thresholding algorithm for minimization over orthonormal bases. SIAM J. Optim. 18, 1351–1376 (2007)

  14. Ekeland, I., Turnbull, T.: Infinite-Dimensional Optimization and Convexity. Chicago Lectures in Mathematics. The University of Chicago Press, Chicago (1983)

  15. Févotte, C., Kowalski, M.: Hybrid sparse and low-rank time-frequency signal decomposition. EUSIPCO, pp. 464–468 (2015)

  16. Galbis, A., Maestre, M.: Vector Analysis Versus Vector Calculus. Universitext. Springer, Boston (2012)

  17. Gribonval, R.: Should penalized least squares regression be interpreted as maximum a posteriori estimation? IEEE Trans. Signal Process. 59(5), 2405–2410 (2011)

  18. Gribonval, R., Machart, P.: Reconciling “priors” and “priors” without prejudice? In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 26 (NIPS), pp. 2193–2201 (2013). https://papers.nips.cc/paper/4868-reconciling-priors-priors-without-prejudice

  19. Gribonval, R., Nikolova, M.: On Bayesian estimation and proximity operators. Appl. Comput. Harmon. Anal. (2019). https://doi.org/10.1016/j.acha.2019.07.002

  20. Hall, P., Penev, S.I., Kerkyacharian, G., Picard, D.: Numerical performance of block thresholded wavelet estimators. Stat. Comput. 7, 115–124 (1997)

  21. Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms, vol. I. Springer, Berlin (1996)

  22. Kowalski, M., Siedenburg, K., Dörfler, M.: Social sparsity! Neighborhood systems enrich structured shrinkage operators. IEEE Trans. Signal Process. 61, 2498–2511 (2013)

  23. Kowalski, M., Torrésani, B.: Sparsity and persistence: mixed norms provide simple signal models with dependent coefficients. Signal Image Video Process. 3(3), 251–264 (2009)

  24. Kowalski, M., Torrésani, B.: Structured sparsity: from mixed norms to structured shrinkage. In: Gribonval, R. (ed.) SPARS’09: Signal Processing with Adaptive Sparse Structured Representations. Inria Rennes - Bretagne Atlantique, Saint Malo (2009)

  25. Louchet, C., Moisan, L.: Posterior expectation of the total variation model: properties and experiments. SIAM J. Imaging Sci. 6(4), 2640–2684 (2013)

  26. Moreau, J.J.: Proximité et dualité dans un espace Hilbertien. Bull. Soc. Math. France 93, 273–299 (1965)

  27. Nikolova, M.: Estimation of binary images by minimizing convex criteria. In: Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269), pp. 108–112. IEEE Comput. Soc. (1998)

  28. Parekh, A., Selesnick, I.W.: Convex denoising using non-convex tight frame regularization. IEEE Signal Process. Lett. 22, 1786–1790 (2015)

  29. Rockafellar, R.: On the maximal monotonicity of subdifferential mappings. Pac. J. Math. 33(1), 209–216 (1970)

  30. Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Grundlehren der mathematischen Wissenschaften, vol. 317. Springer, Berlin (1998)

  31. Selesnick, I.W.: Sparse regularization via convex analysis. IEEE Trans. Signal Process. 65(17), 4481–4494 (2017)

  32. Siedenburg, K., Dörfler, M.: Structured sparsity for audio signals. In: Proceedings of the 14th International Conference on Digital Audio Effects (DAFx-11), Paris (2011)

  33. Siedenburg, K., Kowalski, M., Dörfler, M.: Audio declipping with social sparsity. In: ICASSP 2014: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1577–1581. IEEE (2014)

  34. Thomson, B.S.: Rethinking the elementary real analysis course. Am. Math. Mon. 114, 469–490 (2007)

  35. Varoquaux, G., Kowalski, M., Thirion, B.: Social-sparsity brain decoders: faster spatial sparsity. In: International Workshop on Pattern Recognition in Neuroimaging, Trento (2016)

  36. Villani, C.: Optimal Transport: Old and New. Grundlehren der mathematischen Wissenschaften: A Series of Comprehensive Studies in Mathematics, vol. 338. Springer, Berlin (2009)

  37. Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 68(1), 49–67 (2006)

Acknowledgements

The first author wishes to thank Laurent Condat, Jean-Christophe Pesquet and Patrick-Louis Combettes for their feedback that helped improve an early version of this paper, as well as the anonymous reviewers for many insightful comments that improved it much further.

Author information

Corresponding author

Correspondence to Rémi Gribonval.

Additional information

This work and the companion paper [19] are dedicated to the memory of Mila Nikolova, who passed away prematurely in June 2018. Mila dedicated much of her energy to bring the technical content to completion during the spring of 2018. The first author did his best to finalize the papers as Mila would have wished. He should be held responsible for any possible imperfection in the final manuscript.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Proofs

The proofs of the technical results of Sect. 2 are provided in “Appendix 4” (Theorem 3), “Appendix 5” (Lemma 1), “Appendix 6” (Corollary 3) and “Appendix 7” (Lemma 2). As a preliminary, we give brief reminders on some useful but classical notions in “Appendix 1” to “Appendix 3”.

1.1 Appendix 1 Brief Reminders on (Fréchet) Differentials and Gradients in Hilbert Spaces

Consider \({{\mathcal {H}}},{{\mathcal {H}}}'\) two Hilbert spaces. A function \(\theta : {\mathcal {X}}\rightarrow {{\mathcal {H}}}'\) where \({\mathcal {X}}\subset {{\mathcal {H}}}\) is an open domain is (Fréchet) differentiable at x if there exists a continuous linear operator \(L:{{\mathcal {H}}}\rightarrow {{\mathcal {H}}}'\) such that \(\lim _{h \rightarrow 0}\Vert \theta (x+h)-\theta (x)-L(h)\Vert _{{{\mathcal {H}}}'}/\Vert h\Vert _{{{\mathcal {H}}}} = 0\). The linear operator L is called the differential of \(\theta \) at x and denoted \(D\theta (x)\). When \({{\mathcal {H}}}' = {\mathbb {R}}\), L belongs to the dual of \({{\mathcal {H}}}\), hence there is \(u \in {{\mathcal {H}}}\)—called the gradient of \(\theta \) at x and denoted \(\nabla \theta (x)\)—such that \(L(h) = {\langle }u,h{\rangle },\ \forall \;h \in {{\mathcal {H}}}\).
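
For instance, for the quadratic data-fidelity term \(\theta (x) := \tfrac{1}{2}\Vert x-y\Vert ^{2}\) with \(y \in {{\mathcal {H}}}\) fixed, one has \(\theta (x+h)-\theta (x) = {\langle }x-y,h{\rangle } + \tfrac{1}{2}\Vert h\Vert ^{2}\) for each \(h \in {{\mathcal {H}}}\), so that \(D\theta (x)(h) = {\langle }x-y,h{\rangle }\) and \(\nabla \theta (x) = x-y\).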

1.2 Appendix 2 Subgradients and Subdifferentials for Possibly Nonconvex Functions

We adopt a gentle definition which is familiar when \(\theta \) is a convex function. Although possibly less well known to nonexperts, this definition remains valid when \(\theta \) is nonconvex, see, e.g., [6, Definition 16.1].

Definition 5

Let \(\theta :{{\mathcal {H}}}\rightarrow {\mathbb {R}}\cup \{+\infty \}\) be a proper function. The subdifferential \(\partial \theta (x)\) of \(\theta \) at x is the set of all \(u\in {{\mathcal {H}}}\), called subgradients of \(\theta \) at x, such that

$$\begin{aligned} \theta (x') \geqslant \theta (x) + {\langle } u,x'-x{\rangle },\quad \forall x' \in {{\mathcal {H}}}. \end{aligned}$$
(16)

If \(x\not \in {\text {dom}}(\theta )\), then \(\partial \theta (x)=\varnothing \). The function \(\theta \) is subdifferentiable at \(x \in {{\mathcal {H}}}\) if \(\partial \theta (x) \ne \varnothing \). The domain of \(\partial \theta \) is \({\text {dom}}(\partial \theta ) := \{x \in {{\mathcal {H}}}, \partial \theta (x) \ne \varnothing \}\). It satisfies \({\text {dom}}(\partial \theta ) \subset {\text {dom}}(\theta )\).
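
For instance, on \({{\mathcal {H}}}={\mathbb {R}}\) with \(\theta (x) = |x|\), the inequality (16) at \(x=0\) reads \(|x'| \geqslant u x'\) for each \(x'\in {\mathbb {R}}\), so that \(\partial \theta (0) = [-1,1]\), while \(\partial \theta (x) = \{\text {sign}(x)\}\) for \(x \ne 0\).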

Fact 1

When \(\partial \theta (x)\ne \varnothing \), the inequality in (16) is trivial for each \(x'\not \in {\text {dom}}(\theta )\) since it amounts to \(+\infty = \theta (x')-\theta (x) \geqslant {\langle } u,x'-x{\rangle }\).

Definition 5 leads to the well-known Fermat’s rule [6, Theorem 16.3]

Theorem 5

Let \(\theta :{{\mathcal {H}}}\rightarrow {\mathbb {R}}\cup \{+\infty \}\) be a proper function. A point \(x\in {\text {dom}}(\theta )\) is a global minimizer of \(\theta \) if and only if

$$\begin{aligned} 0 \in \partial \theta (x). \end{aligned}$$

If \(\theta \) has a global minimizer at x, then by Theorem 5 the set \(\partial \theta (x)\) is non-empty. However, \(\partial \theta (x)\) can be empty, e.g., at local minimizers that are not the global minimizer:

Example 7

Let \(\theta (x)=\frac{1}{2}x^2-\cos (\pi x)\). The global minimum of \(\theta \) is reached at \(x=0\), where \(\partial \theta (x)= \{\theta '(x)\}=\{0\}\). At \(x=\pm 1.7 {\bar{9}}\), \(\theta \) has local minimizers where \(\partial \theta (x)=\varnothing \) (even though \(\theta \) is \({{\mathcal {C}}}^\infty \)). For \(|x|<0.53\), one has \(\partial \theta (x)=\{\nabla \theta (x)\}\) with \(\theta ''(x)\geqslant 0\), and for \(0.54< |x| < 1.91\), \(\partial \theta (x)=\varnothing \).
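
As a numerical complement to Example 7 (a minimal sketch, not part of the original argument, assuming NumPy is available), one can test inequality (16) on a finite grid: a subgradient u exists at x if and only if every secant slope taken from the left of x is at most every secant slope taken from the right.

```python
import numpy as np

def theta(x):
    # Penalty from Example 7
    return 0.5 * x**2 - np.cos(np.pi * x)

def has_subgradient(x, grid=np.linspace(-5.0, 5.0, 20001)):
    # A subgradient u must satisfy theta(x') >= theta(x) + u*(x'-x) for all x'.
    # On a grid this holds iff the largest secant slope from the left of x
    # does not exceed the smallest secant slope from the right of x.
    left, right = grid[grid < x], grid[grid > x]
    lower = np.max((theta(left) - theta(x)) / (left - x))    # constraints u >= slope
    upper = np.min((theta(right) - theta(x)) / (right - x))  # constraints u <= slope
    return lower <= upper + 1e-9

print(has_subgradient(0.0))  # True: at the global minimizer, 0 is a subgradient
print(has_subgradient(1.8))  # False: near a local minimizer, the subdifferential is empty
```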

The proof of the following lemma is a standard exercise in convex analysis [6, Exercise 16.8].

Lemma 3

Let \(\theta :{{\mathcal {H}}}\rightarrow {\mathbb {R}}\cup \{+\infty \}\) be a proper function such that (a) \({\text {dom}}(\theta )\) is convex and (b) \(\partial \theta (x)\ne \varnothing \) for each \(x \in {\text {dom}}(\theta )\). Then, \(\theta \) is a convex function.

Definition 6

(Lower convex envelope of a function) Let \(\theta : {{\mathcal {H}}}\rightarrow {\mathbb {R}}\cup \{+\infty \}\) be proper with \({\text {dom}}(\partial \theta ) \ne \varnothing \). Its lower convex envelope,Footnote 7 denoted \(\breve{\theta }\), is the pointwise supremum of all the convex lower-semicontinuous functions minorizing \(\theta \)

$$\begin{aligned}&\breve{\theta }(x) := \sup \{\varrho (x)\, |\, \varrho : {{\mathcal {H}}}\rightarrow {\mathbb {R}}\cup \{+\infty \}, \varrho \ \text{ convex } \text{ l.s.c. },\ \nonumber \\&\varrho (z) \leqslant \theta (z), \forall \;z \in {{\mathcal {H}}}\},\quad \forall \;x \in {{\mathcal {H}}}. \end{aligned}$$
(17)

The function \(\breve{\theta }\) is proper, convex and lower-semicontinuous. It satisfies

$$\begin{aligned} \breve{\theta }(x) \leqslant \theta (x), \quad \forall \;x \in {{\mathcal {H}}}. \end{aligned}$$
(18)
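
For instance, for the \(\ell ^0\) pseudo-norm \(\theta (x) = \Vert x\Vert _{0}\) on \({{\mathcal {H}}}={\mathbb {R}}^{n}\), any convex minorant of \(\theta \) is bounded above by n, hence constant, and is at most \(\theta (0)=0\); therefore \(\breve{\theta }\equiv 0\). It coincides with \(\theta \) only at \(x=0\), which is the single point of \({\text {dom}}(\partial \theta )\), in accordance with Proposition 3 below.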

Proposition 3

Let \(\theta : {{\mathcal {H}}}\rightarrow {\mathbb {R}}\cup \{+\infty \}\) be proper with \({\text {dom}}(\partial \theta ) \ne \varnothing \). For any \(x_{0} \in {\text {dom}}(\partial \theta )\), we have \(\breve{\theta }(x_0) = \theta (x_0)\), \(\partial \theta (x_0) = \partial \breve{\theta }(x_0)\).

Proof

As \(\partial \theta (x_0) \ne \varnothing \), by [6, Proposition 13.45], \(\breve{\theta }\) is the so-called biconjugate \(\theta ^{**}\) of \(\theta \) [6, Definition 13.1]. Moreover, [6, Proposition 16.5] yields \(\theta ^{**}(x_{0}) = \theta (x_{0})\) and \(\partial \theta ^{**}(x_{0}) = \partial \theta (x_{0})\).

We need to adapt [6, Proposition 17.31] to the case where \(\theta \) is proper but possibly nonconvex, with a stronger assumption of Fréchet (instead of Gâteaux) differentiability.

Proposition 4

If \(\partial \theta (x) \ne \varnothing \) and \(\theta \) is (Fréchet) differentiable at x, then \(\partial \theta (x) = \{\nabla \theta (x)\}\).

Proof

Consider \(u \in \partial \theta (x)\). As \(\theta \) is differentiable at x, there is an open ball \({\mathcal {B}}\) centered at 0 such that \(x+h \in {\text {dom}}(\theta )\) for each \(h \in {\mathcal {B}}\). For each \(h \in {\mathcal {B}}\), Definition 5 yields

$$\begin{aligned}&\theta (x-h)-\theta (x) \geqslant {\langle }u,-h{\rangle }\quad \quad \text{ and } \\&\theta (x+h)-\theta (x) \geqslant {\langle }u,h{\rangle } \end{aligned}$$

hence \(-(\theta (x-h)-\theta (x)) \leqslant {\langle }u,h{\rangle } \leqslant \theta (x+h)-\theta (x)\). Since \(\theta \) is Fréchet differentiable at x, letting \(\Vert h\Vert \) tend to zero yields

$$\begin{aligned} -\left( {\langle }\nabla \theta (x),-h{\rangle } + o(\Vert h\Vert )\right) \leqslant {\langle }u,h{\rangle } \leqslant {\langle }\nabla \theta (x),h{\rangle } + o(\Vert h\Vert ) \end{aligned}$$

hence \({\langle }u-\nabla \theta (x),h{\rangle } = o(\Vert h\Vert )\), \(\forall \;h\in {\mathcal {B}}\). This shows that \(u = \nabla \theta (x)\).

1.3 Appendix 3 Characterizing Functions with a Given Subdifferential

Corollary 9 below generalizes a result of Moreau [26, Proposition 8.b] characterizing functions by their subdifferential. It shows that one only needs the subdifferentials to intersect. We begin in dimension one.

Lemma 4

Consider \(a_0,a_1: {\mathbb {R}}\rightarrow {\mathbb {R}}\cup \{+\infty \}\) convex functions such that \({\text {dom}}(a_i) = {\text {dom}}(\partial a_i) = [0,1]\) and \(\partial a_0(t) \cap \partial a_1(t) \ne \varnothing \) on [0, 1]. Then, there exists a constant \(K \in {\mathbb {R}}\) such that \(a_1(t)-a_0(t)=K\) on [0, 1].

Proof

As \(a_i\) is convex, it is continuous on (0, 1) [21, Theorem 3.1.1, p16]. Moreover, by [21, Proposition 3.1.2] we have \(a_{i}(0) \geqslant \lim _{t \rightarrow 0, t>0} a_{i}(t) =:a_{i}(0_+)\), and since \(\partial a_{i}(0) \ne \varnothing \), there is \(u_{i} \in \partial a_{i}(0)\) such that \(a_{i}(t) \geqslant a_{i}(0) + u_{i}(t-0)\) for each \(t \in [0,1]\) hence \(a_{i}(0_+) \geqslant a_{i}(0)\). This shows that \(a_{i}(0_+) = a_{i}(0)\), and similarly \( \lim _{t \rightarrow 1, t<1} a_{i}(t) = a_{i}(1)\), hence \(a_{i}\) is continuous on [0, 1] relative to [0, 1]. In addition, \(a_{i}\) is differentiable on [0, 1] except on a countable set \(B_i \subset [0,1]\) [21, Theorem 4.2.1 (ii)].

For \(t \in [0,1] \backslash (B_{0} \cup B_{1})\) and \(i \in \{0,1\}\), Proposition 4 yields \(\partial a_i(t) = \{a'_i(t)\}\), hence the function \(\delta := a_1-a_0\) is continuous on [0, 1] and differentiable on \([0,1] \backslash (B_0 \cup B_1)\). For \(t \in [0,1] \backslash (B_0 \cup B_1)\), \(\{a'_0(t)\} \cap \{a'_1(t)\} = \partial a_0(t) \cap \partial a_1(t) \ne \varnothing \), hence \(a'_0(t)=a'_1(t)\) and \(\delta '(t) = 0\). A classical exerciseFootnote 8 in real analysis [34, Example 4] is to show that if a function f is continuous on an interval, and differentiable with zero derivative except on a countable set, then f is constant. As \(B_0 \cup B_1\) is countable, it follows that \(\delta \) is constant on (0, 1). As it is continuous on [0, 1], it is constant on [0, 1].

Corollary 9

Let \(\theta _0,\theta _1: {{\mathcal {H}}}\rightarrow {\mathbb {R}}\cup \{+\infty \}\) be proper and \({{\mathcal {C}}}\subset {{\mathcal {H}}}\) a non-empty polygonally connected set. Assume that for each \(z \in {{\mathcal {C}}}\), \(\partial \theta _0(z) \cap \partial \theta _1(z) \ne \varnothing \); then, there is a constant \(K \in {\mathbb {R}}\) such that \(\theta _1(x) -\theta _0(x) = K\), \(\forall \;x \in {{\mathcal {C}}}\).

Remark 4

Note that the functions \(\theta _{i}\) and the set \({{\mathcal {C}}}\) are not assumed to be convex.

Proof

The proof is in two parts.

  1. (i)

    Assume that \({{\mathcal {C}}}\) is convex and fix some \(x^* \in {{\mathcal {C}}}\). Consider \(x \in {{\mathcal {C}}}\), and define \(a_i(t):= \theta _i(x^*+t(x-x^*))\), for \(i=0,1\) and each \(t \in [0,1]\), and \(a_i(t)=+\infty \) if \(t\not \in [0,1]\). As \({{\mathcal {C}}}\) is convex, \(z_t := x^*+t(x-x^*) \in {{\mathcal {C}}}\) hence for each \(t \in [0,1]\) there exists \(u_t \in \partial \theta _0(z_t) \cap \partial \theta _1(z_t)\). By Definition 5 for each \(t,t' \in [0,1]\),

    $$\begin{aligned}&a_i(t')-a_i(t) = \theta _i(x^*+t'(x-x^*))-\theta _i(x^*+t(x-x^*)) \\&\quad \geqslant {\langle }u_t,(t'-t)(x-x^*){\rangle } ={\langle }u_t,x-x^*{\rangle }(t'-t). \end{aligned}$$

    For \(t \in [0,1]\) and \(t' \in {\mathbb {R}}\backslash [0,1]\), since \(a_{i}(t') = +\infty \), the inequality \(a_i(t')-a_i(t) \geqslant {\langle }u_t,x-x^*{\rangle }(t'-t)\) also obviously holds, hence \({\langle }u_t,x-x^*{\rangle } \in \partial a_i(t)\), \(i=0,1\). Thus, \(\partial a_i(t)\ne \varnothing \) for each \(t \in [0,1]\), so by Lemma 3 \(a_i\) is convex on [0, 1] for \(i=0,1\), and \({\langle }u_t,x-x^*{\rangle } \in \partial a_0(t) \cap \partial a_1(t)\) for each \(t \in [0,1]\). By Lemma 4, there exists \(K \in {\mathbb {R}}\) such that \(a_1(t)-a_0(t)= K\) for each \(t \in [0,1]\). Therefore,

    $$\begin{aligned} \theta _1(x)-\theta _0(x)= & {} a_1(1)-a_0(1) = a_1(0) -a_0(0)\\= & {} \theta _1(x^*) -\theta _0(x^*) = K. \end{aligned}$$

    As this holds for each \(x \in {{\mathcal {C}}}\), we have established the result as soon as \({{\mathcal {C}}}\) is convex.

  2. (ii)

    Now, we prove the result when \({{\mathcal {C}}}\) is polygonally connected. Fix some \(x^* \in {{\mathcal {C}}}\) and define \(K:= \theta _1(x^*)-\theta _0(x^*)\). Consider \(x \in {{\mathcal {C}}}\): By the definition of polygonal-connectedness, there exists an integer \(n \geqslant 1\) and \(x_j \in {{\mathcal {C}}}\), \(0 \leqslant j \leqslant n\) with \(x_0 = x^*\) and \(x_n = x\) such that the (convex) segments \({{\mathcal {C}}}_j = [x_j,x_{j+1}] = \{t x_j + (1-t) x_{j+1}, t \in [0,1]\}\) satisfy \({{\mathcal {C}}}_j \subset {{\mathcal {C}}}\). Since each \({{\mathcal {C}}}_j\) is convex, the result established in (i) implies that \(\theta _1(x_{j+1})-\theta _0(x_{j+1}) = \theta _1(x_j)-\theta _0(x_j)\) for \(0 \leqslant j < n\). This shows that \(\theta _1(x)-\theta _0(x) = \theta _1(x_{n})-\theta _0(x_{n}) = \cdots = \theta _1(x_{0})-\theta _0(x_{0}) = \theta _1(x^*)-\theta _0(x^*) =K\).

1.4 Appendix 4 Proof of Theorem 3

The indicator function of a set \({\mathcal {S}}\) is denoted

$$\begin{aligned} \chi _{{\mathcal {S}}}(x):=\left\{ \begin{array}{lll} 0&{}\text{ if }&{}x\in {\mathcal {S}}, \\ +\infty &{}\text{ if }&{} x\not \in {\mathcal {S}}.\end{array}\right. \end{aligned}$$

(ai) \(\Rightarrow \) (aii) We introduce the function \(\theta :{{\mathcal {H}}}\rightarrow {\mathbb {R}}\cup \{+\infty \}\) by

$$\begin{aligned} \theta := b+\varphi + \chi _{{\text {Im}}(f)}. \end{aligned}$$
(19)

Consider \(x \in {\text {Im}}(f)\). By definition, \(x= f(y)\) where \(y \in {\mathcal {Y}}\), hence by (ai) x is a global minimizer of \(x'\mapsto \left\{ D(x',y)+\varphi (x')\right\} \). Therefore, we have

$$\begin{aligned}&\forall \;x' \in {{\mathcal {H}}},\quad -{\langle }A(y),x'{\rangle }+\underbrace{b(x')+\varphi (x') + \chi _{ {\text {Im}}(f) } (x')}_{=\theta (x')} \nonumber \\&\quad \geqslant -{\langle }A(y),x{\rangle }+\underbrace{b(x)+\varphi (x) + \chi _{{\text {Im}}(f)}(x)}_{=\theta (x)} \end{aligned}$$
(20)

which is equivalent to

$$\begin{aligned} \forall \;x'\in {{\mathcal {H}}}\quad \quad \theta (x') \geqslant \theta (x) + {\langle }A(y),x'-x{\rangle } \end{aligned}$$
(21)

meaning that \(A(y)\in \partial \theta \left( f(y)\right) \). As this holds for each \(y \in {\mathcal {Y}}\) such that \(f(y)=x\), we get \(A(f^{-1}(x))\subset \partial \theta (x)\). Consider \(g_1 := \breve{\theta } \) according to Definition 6. Since \(g_1\) is convex l.s.c. and

$$\begin{aligned} \forall \;x \in {\text {Im}}(f), \quad \partial \theta (x) \ne \varnothing , \end{aligned}$$
(22)

by Proposition 3, \(\partial \theta (x) = \partial g_1(x)\) and \(\theta (x) = g_{1}(x)\) for each \(x \in {\text {Im}}(f)\). This establishes (aii) with \(g := g_1 = \breve{\theta }\).

(aii) \(\Rightarrow \) (ai) Set \(\theta _1: = g+\chi _{{\text {Im}}(f)}\). By (aii), \(\partial g(x) \ne \varnothing \) for each \(x \in {\text {Im}}(f)\). Since \({\text {dom}}(\partial g) \subset {\text {dom}}(g)\), it follows that \({\text {Im}}(f) \subset {\text {dom}}(g)\) and consequently

$$\begin{aligned} {\text {dom}}(\theta _1)={\text {Im}}(f). \end{aligned}$$

Consider \(y \in {\mathcal {Y}}\) and \(x:=f(y)\) so that \(x \in {\text {Im}}(f)\), hence \(\theta _1(x)=g(x)\) and \(A(y)\in A(f^{-1}(x)) \subset \partial g(x)\) where the inclusion comes from (aii). It follows that for each \((x,x')\in {\text {Im}}(f) \times {{\mathcal {H}}}\), one has

$$\begin{aligned} \theta _1(x')= & {} g(x')+\chi _{{\text {Im}}(f)}(x') \geqslant g(x') \geqslant g(x)+{\langle }A(y),x'-x{\rangle } \\= & {} \theta _{1}(x) +{\langle }A(y),x'-x{\rangle }, \end{aligned}$$

showing that \(A(y)\in \partial \theta _1(x)\). This is equivalent to (21) with \(\theta := \theta _1\), and since \({\text {dom}}(\theta _1) = {\text {Im}}(f)\), the inequality in (20) holds with \(\varphi (x) := \theta _1(x)-b(x)\), i.e., x is a global minimizer of \(D(x',y)+\varphi (x')\). Since this holds for each \(y \in {\mathcal {Y}}\), this establishes (ai) with \(\varphi := \theta _1-b = g-b+\chi _{{\text {Im}}(f)}\).

(b) Consider \(\varphi \) and g satisfying (ai) and (aii), respectively. LetFootnote 9 \(g_1 := \breve{\theta }\) with \(\theta \) defined in (19). Following the arguments of (ai) \(\Rightarrow \) (aii), we obtain that \(g_1\) (just as g) satisfies (aii). For each \(x \in {{\mathcal {C}}}\), we thus have \(\partial g(x) \cap \partial g_1(x) \supset A(f^{-1}(x)) \ne \varnothing \) with \(g,g_1\) convex l.s.c. functions. Hence, by Corollary 9, since \({{\mathcal {C}}}\) is polygonally connected, there is a constant K such that \(g(x) = g_1(x)+K\), \(\forall \;x \in {{\mathcal {C}}}\). To establish the relation (2) between g and \(\varphi \), we now show that \(g_1(x) = b(x) + \varphi (x)\) on \({{\mathcal {C}}}\). By (22) and Proposition 3, we have \(\breve{\theta }(x)=\theta (x)\) for each \(x \in {\text {Im}}(f)\), hence as \({{\mathcal {C}}}\subset {\text {Im}}(f)\) we obtain \(g_1(x) := \breve{\theta }(x) = \theta (x) = b(x)+\varphi (x)\) for each \(x \in {{\mathcal {C}}}\). This establishes (2).

(ci) \(\Rightarrow \) (cii) Define

$$\begin{aligned} \varrho (y):= & {} {\left\{ \begin{array}{ll} +\infty , &{}\forall y \notin {\mathcal {Y}}\\ {\langle }B(f(y)),y{\rangle } -b(f(y))- \varphi (f(y)),\quad &{}\forall y \in {\mathcal {Y}}. \end{array}\right. }\nonumber \\ \end{aligned}$$
(23)

Consider \(y \in {\mathcal {Y}}\). From (ci), for each \(y'\) the global minimizer of \(x \mapsto {\widetilde{D}}(x,y')+\varphi (x)\) is reached at \(x'=f(y')\). Hence, for \(x = f(y)\) we have

$$\begin{aligned}&-\,{\langle }B(f(y')),y'{\rangle }+b(f(y'))+\varphi (f(y')) \leqslant -{\langle }B(x),y'{\rangle } \\&\quad +\,b(x)+\varphi (x) = -{\langle }B(f(y)),y'{\rangle }+b(f(y))+\varphi (f(y)). \end{aligned}$$

Using this inequality, we obtain that

$$\begin{aligned}&\forall \;y'\in {\mathcal {Y}},\quad \varrho (y')-\varrho (y) = -{\langle }B(f(y)),y{\rangle } + b(f(y)) \\&\qquad + \,\varphi (f(y)) + {\langle }B(f(y')),y'{\rangle }-b(f(y'))-\varphi (f(y')) \\&\quad \geqslant {\langle }B(f(y)),y'{\rangle }-{\langle }B(f(y)),y{\rangle } \geqslant {\langle } B(f(y)), y'-y{\rangle }. \end{aligned}$$

This shows that

$$\begin{aligned} B(f(y)) \in \partial \varrho (y). \end{aligned}$$
(24)

Set \(\psi _1 := \breve{\varrho } \) according to Definition 6. Then, the function \(\psi _1\) is convex l.s.c., and for each \(y \in {\mathcal {Y}}\) the vector \(B(f(y))\) is well defined and, by (24), belongs to \(\partial \varrho (y)\), so \(\partial \varrho (y) \ne \varnothing \). Hence, by Proposition 3, \(\partial \varrho (y) =\partial \breve{\varrho }(y)= \partial \psi _1(y)\) and \(\varrho (y)=\breve{\varrho }(y) = \psi _{1}(y)\) for each \(y \in {\mathcal {Y}}\). This establishes (cii) with \(\psi := \psi _1 =\breve{\varrho }\).

(cii) \(\Rightarrow \) (ci) Define \(h:{\mathcal {Y}}\rightarrow {\mathbb {R}}\) by

$$\begin{aligned} h(y):= {\langle } B(f(y)),y{\rangle } - \psi (y). \end{aligned}$$

Since \(B(f(y')) \in \partial \psi (y')\) with \(\psi \) convex by (cii), applying Definition 5 to \(\partial \psi \) yields \(\psi (y) - \psi (y') \geqslant {\langle }y-y',B(f(y')){\rangle }\). Using this inequality, one has

$$\begin{aligned}&\forall \;y,y'\in {\mathcal {Y}}\quad \quad h(y')-h(y) \nonumber \\&\quad = {\langle }B(f(y')),y'{\rangle } - \psi (y') -{\langle }B(f(y)),y{\rangle }+ \psi (y) \nonumber \\&\quad \geqslant {\langle }B(f(y')),y'{\rangle } - {\langle }B(f(y)),y{\rangle } + {\langle }B(f(y')),y-y'{\rangle } \nonumber \\&\quad = \big \langle B(f(y')) -B(f(y)),\ y\big \rangle . \end{aligned}$$
(25)

Noticing that for each \(x \in {\text {Im}}(f)\) there is \(y \in {\mathcal {Y}}\) such that \(x=f(y)\), we can define \(\theta :{{\mathcal {H}}}\rightarrow {\mathbb {R}}\cup \{+\infty \}\) obeying \({\text {dom}}(\theta )={\text {Im}}(f)\) by

$$\begin{aligned} \theta (x) := \left\{ \begin{array}{ll} h(y ) \,\, \text{ with }~~ y\in f^{-1}(x)&{} \quad \text{ if }~~ x\in {\text {Im}}(f) \\ +\infty &{} \quad \text{ otherwise }.\end{array} \right. \end{aligned}$$

For \(x \in {\text {Im}}(f)\), as \(f(y)=f(y')=x\) for each \(y,y' \in f^{-1}(x)\), applying (25) yields \(h(y')-h(y) \geqslant 0\). By symmetry, \(h(y')=h(y)\), hence the definition of \(\theta (x)\) does not depend on which \(y \in f^{-1}(x)\) is chosen.

For \(x'\in {\text {Im}}(f) \), we write \(x'=f(y')\). Using (25) and the definition of \(\theta \) yields

$$\begin{aligned}&\theta (x')-\theta (f(y)) = \theta (f(y'))-\theta (f(y)) = h(y')-h(y) \\&\quad \geqslant {\langle }B(f(y'))-B(f(y)),y{\rangle } = {\langle }B(x')-B(f(y)),y{\rangle }. \end{aligned}$$

that is to say

$$\begin{aligned}&\theta (x') - {\langle }B(x'),y{\rangle } \geqslant \theta (f(y)) - {\langle }B(f(y)),y{\rangle } , \\ \quad \forall \;x' \in {\text {Im}}(f). \end{aligned}$$

This also trivially holds for \(x' \notin {\text {Im}}(f)\). Setting \(\varphi (x):= \theta (x)-b(x)\) for each \(x \in {{\mathcal {H}}}\), and replacing \(\theta \) by \(b+\varphi \) in the inequality above yields

$$\begin{aligned}&a(y)-{\langle }B(x'),y{\rangle }+b(x') + \varphi (x') \geqslant a(y)\\&\quad -\,{\langle }B(f(y)),y{\rangle }+b(f(y))+\varphi (f(y)),\quad \forall \;x' \in {{\mathcal {H}}}\end{aligned}$$

showing that \(f(y) \in \arg \min _{x'} \{{\widetilde{D}}(x',y)+\varphi (x')\}\). As this holds for each \(y \in {\mathcal {Y}}\), \(\varphi \) satisfies (ci).

(d) Consider \(\varphi \) and \(\psi \) satisfying (ci) and (cii), respectively. Using the arguments of (ci) \(\Rightarrow \) (cii), the function \(\psi _1 := \breve{\varrho }\) with \(\varrho \) defined in (23) satisfies (cii). As \(\psi \) and \(\psi _1\) both satisfy (cii), for each \(y \in {{\mathcal {C}}}'\) we have \(\partial \psi (y) \cap \partial \psi _1(y) \supset B(f(y)) \ne \varnothing \) with \(\psi ,\psi _1\) convex l.s.c. functions. Hence, by Corollary 9, since \({{\mathcal {C}}}'\) is polygonally connected, there is a constant \(K'\) such that \(\psi (y) = \psi _1(y)+K'\), \(\forall \;y \in {{\mathcal {C}}}'\). By (24), \(\partial \varrho (y) \ne \varnothing \) for each \(y \in {\mathcal {Y}}\), hence by Proposition 3 we have \(\breve{\varrho }(y) = \varrho (y)\) for each \(y \in {\mathcal {Y}}\). As \({{\mathcal {C}}}' \subset {\mathcal {Y}}\), it follows that \(\psi _1(y) = \breve{\varrho }(y) = \varrho (y)\) for each \(y \in {{\mathcal {C}}}'\). This establishes (3).

1.5 Appendix 5 Proof of Lemma 1

Proof

Without loss of generality, we prove the equivalence for the convex envelope \(\breve{\theta }\) instead of \(\theta \): Indeed by Proposition 3, since \(\partial \theta (x) \ne \varnothing \) on \({\mathcal {X}}\) we have \(\breve{\theta }(x) = \theta (x)\) and \(\partial \breve{\theta }(x) = \partial \theta (x)\) on \({\mathcal {X}}\).

(a) \(\Rightarrow \) (b). By [6, Prop 17.41(iii)\(\Rightarrow \)(i)], as \(\breve{\theta }\) is convex l.s.c. and \(\varrho \) is a selection of its subdifferential which is continuous at each \(x \in {\mathcal {X}}\), \(\breve{\theta }\) is (Fréchet) differentiable at each \(x \in {\mathcal {X}}\). By Proposition 4, we get \(\partial \breve{\theta }(x) = \{\nabla \breve{\theta }(x)\} = \{\varrho (x)\}\) on \({\mathcal {X}}\). Since \(\varrho \) is continuous, \(x \mapsto \nabla \breve{\theta }(x)\) is continuous on \({\mathcal {X}}\).

(b) \(\Rightarrow \) (a). Since \(\breve{\theta }\) is differentiable on \({\mathcal {X}}\), by Proposition 4 we have \(\partial \breve{\theta }(x) = \{\nabla \breve{\theta }(x)\}\) on \({\mathcal {X}}\). By (9), it follows that \(\varrho (x) = \nabla \breve{\theta }(x)\) on \({\mathcal {X}}\). Since \(\nabla \breve{\theta }\) is continuous on \({\mathcal {X}}\), so is \(\varrho \).

1.6 Appendix 6 Proof of Corollary 3

By Theorem 2, as \({\mathcal {Y}}\) is open and convex and f is \(C^1({\mathcal {Y}})\) with Df(y) symmetric positive semi-definite for each \(y \in {\mathcal {Y}}\), there is a function \(\varphi _0\) and a convex l.s.c. function \(\psi \in C^2({\mathcal {Y}})\) such that \(\nabla \psi (y) = f(y) \in \text {prox}_{\varphi _0}(y)\) for each \(y \in {\mathcal {Y}}\). We define \(\varphi (x) := \varphi _0(x) + \chi _{{\text {Im}}(f)}(x)\) and let the reader check that \(f(y) \in \text {prox}_\varphi (y)\) for each \(y \in {\mathcal {Y}}\). By construction, \({\text {dom}}(\varphi ) = {\text {Im}}(f)\).

Uniqueness of the Global Minimizer. Consider \(\widetilde{f}\) any function such that \(\widetilde{f}(y) \in \text {prox}_{\varphi }(y)\) for each y. This implies

$$\begin{aligned}&\tfrac{1}{2}\Vert y-f(y)\Vert ^2+\varphi (f(y)) = \tfrac{1}{2}\Vert y-\widetilde{f}(y)\Vert ^2+\varphi (\widetilde{f}(y)) \nonumber \\&\quad = \min _{x \in {{\mathcal {H}}}} \{\tfrac{1}{2}\Vert y-x\Vert ^2+\varphi (x)\},\quad \forall \;y \in {\mathcal {Y}}. \end{aligned}$$
(26)

By Corollary 1, there is a convex l.s.c. function \(\widetilde{\psi }\) such that \(\widetilde{f}(y) \in \partial \widetilde{\psi }(y)\) for each \(y \in {\mathcal {Y}}\). Since \({\mathcal {Y}}\) is convex, it is polygonally connected hence by Theorem 4(b) and (26) there are \(K,K' \in {\mathbb {R}}\) such that

$$\begin{aligned}&\psi (y) -K = \tfrac{1}{2}\Vert y\Vert ^2-\tfrac{1}{2}\Vert y-f(y)\Vert ^2-\varphi (f(y)) = \tfrac{1}{2}\Vert y\Vert ^2 \nonumber \\&\quad -\tfrac{1}{2}\Vert y -\widetilde{f}(y)\Vert ^2-\varphi (\widetilde{f}(y)) = \widetilde{\psi }(y) - K',\quad \forall \;y \in {\mathcal {Y}}.\nonumber \\ \end{aligned}$$
(27)

Thus, \(\widetilde{\psi }\) is also \(C^2({\mathcal {Y}})\) and \(\widetilde{f}(y) \in \partial \widetilde{\psi }(y) = \{\nabla \psi (y)\} = \{f(y)\}\) for each \(y \in {\mathcal {Y}}\). This shows that \(\widetilde{f}(y)=f(y)\) for each y, hence f(y) is the unique global minimizer on \({{\mathcal {H}}}\) of \(x \mapsto \tfrac{1}{2}\Vert y-x\Vert ^2+\varphi (x)\), i.e., \(\text {prox}_\varphi (y) = \{f(y)\}\).

Injectivity of \(\varvec{f}\). The proof follows that of [17, Lemma 1]. Given \(y \ne y'\), define \(v := y'-y \ne 0\) and \(\theta (t) := {\langle }f(y+tv),v{\rangle }\) for \(t \in [0,1]\). As \({\mathcal {Y}}\) is convex, this is well-defined. As \(f \in {{\mathcal {C}}}^1({\mathcal {Y}})\) and \(Df(y+tv) \succ 0\), the function \(\theta \) is \(C^1([0,1])\) with \(\theta '(t) = {\langle } Df(y+tv)\ v,v{\rangle } > 0\) for each t. If we had \(f(y) = f(y')\), then by Rolle's theorem there would be \(t \in (0,1)\) such that \(\theta '(t)=0\), contradicting the fact that \(\theta '(t)>0\).

Differentiability of \(\varvec{\varphi }\). If Df(y) is boundedly invertible for each \(y \in {\mathcal {Y}}\), then by the inverse function theorem \({\text {Im}}(f)\) is open and \(f^{-1}: {\text {Im}}(f) \rightarrow {\mathcal {Y}}\) is \(C^{1}\). Given \(x \in {\text {Im}}(f)\), denoting \(u := f^{-1}(x)\), (27) yields

$$\begin{aligned} \varphi (x)= & {} \varphi (f(u)) = -(\psi (u)-K)+\tfrac{1}{2}\Vert u\Vert ^{2} \\&-\tfrac{1}{2}\Vert u-f(u)\Vert ^{2} = -(\psi (f^{-1}(x))-K)\\&+\tfrac{1}{2}\Vert f^{-1}(x)\Vert ^{2}-\tfrac{1}{2}\Vert f^{-1}(x)-x\Vert ^{2}. \end{aligned}$$

Since \(\psi \) is \(C^{2}\) and \(f^{-1}\) is \(C^{1}\), it follows that \(\varphi \) is \(C^{1}\).

Global Minimum is the Unique Critical Point. The proof is inspired by that of [17, Theorem 1]. Consider x a critical point of \(\theta : x \mapsto \tfrac{1}{2}\Vert y-x\Vert ^2+\varphi (x)\), i.e., since \(\varphi \) is \(C^{1}\), a point where \(\nabla \theta (x)=0\). Since \({\text {dom}}(\varphi ) = {\text {Im}}(f)\), there is some \(v \in {\mathcal {Y}}\) such that \(x = f(v)\). Moreover, as \(\varphi \) is \(C^{1}\) on the open set \({\text {Im}}(f)\), the gradient \(\nabla \theta (x)\) is well-defined and \(\nabla \theta (x)=0\). On the one hand, denoting \(\varrho (u):= (\theta \circ f)(u) = \tfrac{1}{2}\Vert y-f(u)\Vert ^2+\varphi (f(u))\) we have \(\nabla \varrho (u) = Df(u) \nabla \theta (f(u))\) for each \(u \in {\mathcal {Y}}\). On the other hand, for each \(u \in {\mathcal {Y}}\), as \(f(u) = \nabla \psi (u)\) we also have

$$\begin{aligned} \varrho (u)= & {} \tfrac{1}{2}\Vert y\Vert ^2+\tfrac{1}{2}\Vert f(u)\Vert ^2-{\langle }y,f(u){\rangle }+\varphi (f(u))\\= & {} +\tfrac{1}{2}\Vert y\Vert ^2+{\langle }u-y,f(u){\rangle }-(\psi (u)-K),\\ \nabla \varrho (u)= & {} Df(u)\ (u-y) + f(u) -\nabla \psi (u) = Df(u)\ (u-y). \end{aligned}$$

For \(u=v\), we get \(Df(v)\ (v-y) = \nabla \varrho (v) = Df(v) \nabla \theta (f(v)) = Df(v)\ \nabla \theta (x) = 0\). As \(Df(v) \succ 0\), this implies \(v=y\), hence \(x=f(y)\).
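
As a simple illustration of Corollary 3, consider the linear shrinkage \(f(y) := y/(1+\lambda )\) on \({\mathcal {Y}}={{\mathcal {H}}}\) with \(\lambda >0\): it is \(C^1\) with \(Df(y) = \mathrm {Id}/(1+\lambda ) \succ 0\) boundedly invertible, \(f(y)=\nabla \psi (y)\) with \(\psi (y) = \Vert y\Vert ^{2}/(2(1+\lambda ))\), and \(\text {prox}_{\varphi }(y) = \{f(y)\}\) with \(\varphi (x) = \tfrac{\lambda }{2}\Vert x\Vert ^{2}\), so that f(y) is indeed the unique global minimizer of \(x \mapsto \tfrac{1}{2}\Vert y-x\Vert ^{2}+\varphi (x)\).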

1.7 Appendix 7 Proof of Lemma 2

As a preliminary, let us compute the off-diagonal entries of the \(n \times n\) matrix associated to Df(y):

$$\begin{aligned}&\forall \;i,j \in \llbracket 1,n \rrbracket ,\ i \ne j,\quad \nonumber \\&\tfrac{\partial f_{i}}{\partial y_{j}}(y) = {\left\{ \begin{array}{ll} 0 &{} \text {if}\ \Vert \text {diag}(w^{i})y\Vert _{2} < \lambda ;\\ 2(w^{i}_{j})^{2} y_{i}y_{j} h'_{i}\left( \Vert \text {diag}(w^{i})y\Vert _2^{2}\right) &{} \text {if}\ \Vert \text {diag}(w^{i})y\Vert _{2}>\lambda . \end{array}\right. }\nonumber \\ \end{aligned}$$
(28)

Note that if \(\Vert \text {diag}(w^{i})y\Vert _{2}=\lambda \), then f may not be differentiable at y; this case will not be useful below.

The proof exploits Corollary 2 which shows that if f is a proximity operator, then Df(y) is symmetric in each open set where it is well-defined.
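
Before turning to the formal argument, here is a minimal numerical sketch of this symmetry test (not part of the proof; it assumes NumPy, and the social shrinkage below is an illustrative empirical-Wiener-type stand-in for Definition 4, with overlapping sliding neighborhoods \(N_i=\{i-1,i,i+1\}\)):

```python
import numpy as np

def social_shrink(y, lam=0.5):
    # Illustrative empirical-Wiener-type social shrinkage with overlapping
    # neighborhoods N_i = {i-1, i, i+1} (clipped to the valid index range).
    n = len(y)
    f = np.zeros(n)
    for i in range(n):
        s = np.sum(y[max(0, i - 1):min(n, i + 2)] ** 2)  # squared norm over N_i
        f[i] = y[i] * max(0.0, 1.0 - lam ** 2 / s)
    return f

def soft_thresh(y, lam=0.5):
    # Soft-thresholding: the proximity operator of lam * l1-norm.
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def jacobian(f, y, eps=1e-6):
    # Central finite-difference approximation of Df(y).
    n = len(y)
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        J[:, j] = (f(y + e) - f(y - e)) / (2 * eps)
    return J

rng = np.random.default_rng(0)
y = rng.normal(size=8) + 2.0  # keep entries and neighborhood norms well above the threshold

for name, f in [("soft-thresholding", soft_thresh), ("social shrinkage", social_shrink)]:
    J = jacobian(f, y)
    print(name, "max |Df - Df^T| =", float(np.max(np.abs(J - J.T))))
```

Soft-thresholding, a genuine proximity operator, yields a symmetric (here diagonal) Jacobian, whereas the overlapping social shrinkage does not; the proof below shows that this obstruction disappears only when the neighborhoods form non-overlapping groups.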

Let f be a generalized social shrinkage operator as described in Lemma 2 and consider \({\mathcal {G}} = \{G_{1},\ldots ,G_{p}\}\) the partition of \(\llbracket 1,n \rrbracket \) into the equivalence classes of the relation defined, for \(i,j \in \llbracket 1,n \rrbracket \), by \(i \sim j\) if and only if \(w^i = w^j\). Given \(G \in {\mathcal {G}}\), denote \(w^G\) the weight vector shared by all \(i \in G\). We show that if f is a proximity operator, then \(\text {supp}(w^{G}) = G\) for each \(G \in {\mathcal {G}}\).

For \(i \in G\), by Definition 4 we have \(i \in N_i = \text {supp}(w^i) = \text {supp}(w^G)\), establishing thatFootnote 10

$$\begin{aligned} G \subset \text {supp}(w^G). \end{aligned}$$
(29)

From now on, we assume that f is a proximity operator, and consider a group \(G \in {\mathcal {G}}\). To prove that \(G = \text {supp}(w^G)\), we will establish that for each \(i,j \in \llbracket 1,n \rrbracket \)

$$\begin{aligned} \text {if there exists}\ y \in {\mathbb {R}}^n\ \text {such that}\ \Vert \text {diag}(w^{j})y\Vert _{2} \ne \Vert \text {diag}(w^{i})y\Vert _{2},\ \text {then}\ w^i_j = 0\ \text {and}\ w^j_i = 0. \end{aligned}$$
(30)

To see why this allows us to conclude, consider \(j \in \text {supp}(w^G)\) and \(i \in G\). As \(N_i := \text {supp}(w^i) = \text {supp}(w^G)\), we obtain that \(j \in N_i\), i.e., \(w^i_j \ne 0\). By (30), it follows that \( \Vert \text {diag}(w^{j})y\Vert _{2} = \Vert \text {diag}(w^{i})y\Vert _{2}\) for each y. As \(w^i,w^j\) have nonnegative entries, this means that \(w^{i} = w^{j}\). As \(i \in G\), this implies \(j \in G\) by the very definition of G as an equivalence class. This shows \(\text {supp}(w^G) \subset G\). Combining this with (29), we conclude that \(\text {supp}(w^G)=G\).

Let us now prove (30). Consider a given pair \(i,j \in \llbracket 1,n \rrbracket \). Assume that \(\Vert \text {diag}(w^{j})y\Vert _{2} \ne \Vert \text {diag}(w^{i})y\Vert _{2}\) for at least one vector y. Without loss of generality, assume that \(a := \Vert \text {diag}(w^{j})y\Vert _{2} < \Vert \text {diag}(w^{i})y\Vert _{2} =:b\). Rescaling y by a factor \(c = 2\lambda /(a+b)\) yields the existence of y such that for the considered pair ij

$$\begin{aligned} \Vert \text {diag}(w^{j})y\Vert _{2}<\lambda < \Vert \text {diag}(w^{i})y\Vert _{2}. \end{aligned}$$
(31)

By continuity, perturbing y if needed we can also assume that for this pair ij we have \(y_{i}y_{j} \ne 0\).

By (28), as (31) holds in a neighborhood of y, f is \(C^1\) at y and its partial derivatives for the considered pair ij satisfy

$$\begin{aligned} \tfrac{\partial f_{i}}{\partial y_{j}}(y) = 2(w^{i}_{j})^{2} y_{i}y_{j} h'_{i}\left( \Vert \text {diag}(w^{i})y\Vert _2^{2}\right) \quad \text {and}\quad \tfrac{\partial f_{j}}{\partial y_{i}}(y) = 0. \end{aligned}$$

Since f is a proximity operator, by Corollary 2 we have \(\tfrac{\partial f_{i}}{\partial y_{j}}(y) = \tfrac{\partial f_{j}}{\partial y_{i}}(y)\). It follows that for the considered pair ij

$$\begin{aligned} (w^{i}_{j})^{2} y_{i}y_{j} h'_{i}\left( \Vert \text {diag}(w^{i})y\Vert _2^{2}\right) = 0. \end{aligned}$$

As \(y_iy_j \ne 0\) and \(h'_i(t) \ne 0\) for \(t \ne 0\), we obtain \(w^{i}_{j} = 0\).

To conclude, we now show that \(w^j_i = 0\). As \(w^i_j=0\), \(f_i\) is in fact independent of \(y_j\) and \(\frac{\partial f_{i}}{\partial y_{j}}\) is identically zero on \({\mathbb {R}}^{n}\). By scaling y as needed, we get a vector \(y'\) such that \(y'_{i}y'_{j} \ne 0\) and

$$\begin{aligned} \lambda< \Vert \text {diag}(w^{j})y'\Vert _{2} < \Vert \text {diag}(w^{i})y'\Vert _{2}. \end{aligned}$$

Reasoning as above yields \(2(w^j_i)^2 y'_j y'_i h'_j\left( \Vert \text {diag}(w^j)y'\Vert _2^2\right) = \frac{\partial f_{j}}{\partial y_{i}}(y') = \frac{\partial f_{i}}{\partial y_{j}}(y') =0\), hence \(w^{j}_{i} = 0\). We thus obtain that \(w^{i}_{j}=w^{j}_{i}=0\) as claimed, establishing (30) and therefore \(G = \text {supp}(w^G)\).
