Abstract
The classical multi-set split feasibility problem seeks a point in the intersection of finitely many closed convex domain constraints, whose image under a linear mapping also lies in the intersection of finitely many closed convex range constraints. Split feasibility generalizes important inverse problems including convex feasibility, linear complementarity, and regression with constraint sets. When a feasible point does not exist, solution methods that proceed by minimizing a proximity function can be used to obtain optimal approximate solutions to the problem. We present an extension of the proximity function approach that generalizes the linear split feasibility problem to allow for non-linear mappings. Our algorithm is based on the principle of majorization–minimization, is amenable to quasi-Newton acceleration, and comes complete with convergence guarantees under mild assumptions. Furthermore, we show that the Euclidean norm appearing in the proximity function of the non-linear split feasibility problem can be replaced by arbitrary Bregman divergences. We explore several examples illustrating the merits of non-linear formulations over the linear case, with a focus on optimization for intensity-modulated radiation therapy.
References
Alber, M., Meedt, G., Nusslin, F., Reemtsen, R.: On the degeneracy of the IMRT optimization problem. Med. Phys. 29(11), 2584–2589 (2002)
Antoniadis, A., Gijbels, I., Nikolova, M.: Penalized likelihood regression for generalized linear models with non-quadratic penalties. Ann. Inst. Stat. Math. 63(3), 585–615 (2011)
Bauschke, H.H., Borwein, J.M., Combettes, P.L.: Essential smoothness, essential strict convexity, and Legendre functions in Banach spaces. Commun. Contemp. Math. 3(04), 615–647 (2001)
Becker, M.P., Yang, I., Lange, K.: EM algorithms without missing data. Stat. Methods Med. Res. 6, 38–54 (1997)
Biegler, L.T., Zavala, V.M.: Large-scale nonlinear programming using IPOPT: an integrating framework for enterprise-wide dynamic optimization. Comput. Chem. Eng. 33(3), 575–582 (2009)
Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7(3), 200–217 (1967)
Byrne, C.L.: Iterative image reconstruction algorithms based on cross-entropy minimization. IEEE Trans. Image Process. 2(1), 96–103 (1993)
Byrne, C.L.: Iterative projection onto convex sets using multiple Bregman distances. Inverse Probl. 15(5), 1295 (1999)
Byrne, C.L.: Iterative oblique projection onto convex sets and the split feasibility problem. Inverse Probl. 18(2), 441–453 (2002)
Byrne, C.L.: Sequential unconstrained minimization algorithms for constrained optimization. Inverse Probl. 24(1), 015013 (2008)
Byrne, C.L.: Alternating minimization as sequential unconstrained minimization: a survey. J. Optim. Theory Appl. 156(3), 554–566 (2013)
Byrne, C.L.: An elementary proof of convergence for the forward-backward splitting algorithm. J. Nonlinear Convex Anal. 15(4), 681–691 (2014)
Cegielski, A.: General method for solving the split common fixed point problem. J. Optim. Theory Appl. 165(2), 385–404 (2015)
Censor, Y.: Row-action methods for huge and sparse systems and their applications. SIAM Rev. 23(4), 444–466 (1981)
Censor, Y.: Weak and strong superiorization: between feasibility-seeking and minimization. Analele Univ. Ovidius Constanta Ser. Mat. 23(3), 41–54 (2015)
Censor, Y., Bortfeld, T., Martin, B., Trofimov, A.: A unified approach for inversion problems in intensity-modulated radiation therapy. Phys. Med. Biol. 51(10), 2353 (2006)
Censor, Y., Elfving, T.: A multiprojection algorithm using Bregman projections in a product space. Numer. Algorithms 8(2), 221–239 (1994)
Censor, Y., Elfving, T., Kopf, N., Bortfeld, T.: The multiple-sets split feasibility problem and its applications for inverse problems. Inverse Probl. 21(6), 2071–2084 (2005)
Censor, Y., Gibali, A., Reich, S.: Algorithms for the split variational inequality problem. Numer. Algorithms 59(2), 301–323 (2012)
Censor, Y., Motova, A., Segal, A.: Perturbed projections and subgradient projections for the multiple-sets split feasibility problem. J. Math. Anal. Appl. 327(2), 1244–1256 (2007)
Censor, Y., Zenios, S.A.: Parallel Optimization: Theory, Algorithms, and Applications. Oxford University Press, Oxford (1997)
Chi, E.C., Lange, K.: A look at the generalized heron problem through the lens of majorization–minimization. Am. Math. Mon. 121(2), 95–108 (2014)
Chi, E.C., Zhou, H., Lange, K.: Distance majorization and its applications. Math. Program. Ser. A 146(1–2), 409–436 (2014)
Combettes, P.L.: Inconsistent signal feasibility problems: least-squares solutions in a product space. IEEE Trans. Signal Process. 42(11), 2955–2966 (1994)
Combettes, P.L.: Solving monotone inclusions via compositions of nonexpansive averaged operators. Optimization 53(5–6), 475–504 (2004)
Combettes, P.L., Bondon, P.: Hard-constrained inconsistent signal feasibility problems. IEEE Trans. Signal Process. 47(9), 2460–2468 (1999)
Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward–backward splitting. Multiscale Model. Simul. 4(4), 1168–1200 (2005)
Craft, D., Bangert, M., Long, T., Papp, D., Unkelbach, J.: Shared data for intensity modulated radiation therapy (IMRT) optimization research: the CORT dataset. GigaSci. 3(1), 37 (2014)
Davenport, M.A., Duarte, M.F., Eldar, Y.C., Kutyniok, G.: Introduction to compressed sensing. In: Compressed Sensing: Theory and Applications, Chapter 1. Cambridge University Press, Cambridge (2012)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 39(1), 1–38 (1977)
Dingping, W., Qibin, D., Erli, W., Hang, Z.: The Split Feasibility Problem in Hilbert Space, pp. 1149–1154. Springer, Berlin (2013)
Ehrgott, M., Güler, Ç., Hamacher, H.W., Shao, L.: Mathematical optimization in intensity modulated radiation therapy. Ann. Oper. Res. 175(1), 309–365 (2010)
Fan, J., Lv, J.: A selective overview of variable selection in high dimensional feature space. Stat. Sin. 20(1), 101 (2010)
Févotte, C., Idier, J.: Algorithms for nonnegative matrix factorization with the \(\beta \)-divergence. Neural Comput. 23(9), 2421–2456 (2011)
Fiacco, A.V., McCormick, G.P.: Nonlinear Programming: Sequential Unconstrained Minimization Techniques. Classics in Applied Mathematics. SIAM, Philadelphia (1990)
Gibali, A., Küfer, K., Süss, P.: Successive linear programming approach for solving the nonlinear split feasibility problem. J. Nonlinear Convex Anal. 15(2), 345–353 (2014)
Gibali, A., Küfer, K.-H., Reem, D., Süss, P.: A generalized projection-based scheme for solving convex constrained optimization problems. Comput. Optim. Appl. 70(3), 737–762 (2018)
Goldstein, T., Studer, C., Baraniuk, R.: A field guide to forward-backward splitting with a FASTA implementation. arXiv:1411.3406 [cs.NA] (2014)
Gordon, R., Bender, R., Herman, G.T.: Algebraic reconstruction techniques (ART) for three-dimensional electron microscopy and X-ray photography. J. Theor. Biol. 29(3), 471–481 (1970)
Hou, Q., Wang, J., Chen, Y., Galvin, J.M.: An optimization algorithm for intensity modulated radiotherapy-the simulated dynamics with dose-volume constraints. Med. Phys. 30(1), 61–68 (2003)
Lange, K.: A gradient algorithm locally equivalent to the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 57(2), 425–437 (1995)
Lange, K.: Numerical Analysis for Statisticians. Statistics and Computing, 2nd edn. Springer, New York (2010)
Lange, K.: Optimization. Springer Texts in Statistics, 2nd edn. Springer, New York (2013)
Lange, K.: MM Optimization Algorithms. SIAM, Philadelphia (2016)
Lange, K., Hunter, D.R., Yang, I.: Optimization transfer using surrogate objective functions (with discussion). J. Comput. Gr. Stat. 9, 1–20 (2000)
Li, Z., Han, D., Zhang, W.: A self-adaptive projection-type method for nonlinear multiple-sets split feasibility problem. Inverse Probl. Sci. Eng. 21(1), 155–170 (2013)
Llacer, J., Deasy, J.O., Bortfeld, T.R., Solberg, T.D., Promberger, C.: Absence of multiple local minima effects in intensity modulated optimization with dose-volume constraints. Phys. Med. Biol. 48(2), 183 (2003)
Lorenz, D.A., Schöpfer, F., Wenger, S.: The linearized Bregman method via split feasibility problems: analysis and generalizations. SIAM J. Imaging Sci. 7(2), 1237–1262 (2014)
Luenberger, D.G., Ye, Y.: Linear and Nonlinear Programming. International Series in Operations Research & Management Science, vol. 228, 4th edn. Springer, New York (2016)
Masad, E., Reich, S.: A note on the multiple-set split convex feasibility problem in Hilbert space. J. Nonlinear Convex Anal. 8(3), 367–371 (2007)
McCullagh, P., Nelder, J.A.: Generalized Linear Models, vol. 37. CRC Press, Boca Raton (1989)
Mordukhovich, B.S., Nam, N.M.: Applications of variational analysis to a generalized Fermat–Torricelli problem. J. Optim. Theory Appl. 148, 431–454 (2011)
Mordukhovich, B.S., Nam, N.M., Salinas, J.: Applications of variational analysis to a generalized Heron problem. Appl. Anal. 91(10), 1915–1942 (2011)
Mordukhovich, B.S., Nam, N.M., Salinas, J.: Solving a generalized Heron problem by means of convex analysis. Am. Math. Mon. 119(2), 87–99 (2012)
Moré, J.J.: The Levenberg–Marquardt algorithm: implementation and theory. In: Numerical Analysis, pp. 105–116. Springer, Berlin (1978)
Moudafi, A.: A note on the split common fixed-point problem for quasi-nonexpansive operators. Nonlinear Anal. Theory Methods Appl. 74(12), 4083–4087 (2011)
Murty, K.G., Yu, F.-T.: Linear Complementarity, Linear and Nonlinear Programming, vol. 3. Heldermann, Berlin (1988)
Palta, J.R., Mackie, T.R. (eds.): Intensity-Modulated Radiation Therapy: The State of The Art. Medical Physics Publishing, Madison (2003)
Park, M.Y., Hastie, T.: L1-regularization path algorithm for generalized linear models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 69(4), 659–677 (2007)
Polson, N.G., Scott, J.G., Willard, B.T.: Proximal algorithms in statistics and machine learning. Stat. Sci. 30(4), 559–581 (2015)
R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna (2017)
Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, New York (1998)
Shepard, D.M., Ferris, M.C., Olivera, G.H., Mackie, T.R.: Optimizing the delivery of radiation therapy to cancer patients. SIAM Rev. 41(4), 721–744 (1999)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 58(1), 267–288 (1996)
Wang, F., Xu, H.-K.: Cyclic algorithms for split feasibility problems in Hilbert spaces. Nonlinear Anal. Theory Methods Appl. 74(12), 4105–4111 (2011)
Wright, S.J., Nowak, R.D., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57(7), 2479–2493 (2009)
Xu, H.-K.: A variable Krasnosel’skii Mann algorithm and the multiple-set split feasibility problem. Inverse Probl. 22(6), 2021–2034 (2006)
Xu, H.-K.: Iterative methods for the split feasibility problem in infinite-dimensional Hilbert spaces. Inverse Probl. 26(10), 105018 (2010)
Xu, J., Chi, E., Lange, K.: Generalized linear model regression under distance-to-set penalties. In: Advances in Neural Information Processing Systems, pp. 1385–1395 (2017)
Zhang, X., Liu, H., Wang, X., Dong, L., Wu, Q., Mohan, R.: Speed and convergence properties of gradient algorithms for optimization of IMRT. Med. Phys. 31(5), 1141–1152 (2004)
Zhou, H., Alexander, D., Lange, K.: A quasi-Newton acceleration for high-dimensional optimization algorithms. Stat. Comput. 21, 261–273 (2011)
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B (Methodol.) 67(2), 301–320 (2005)
Acknowledgements
We thank Steve Wright and Dávid Papp for their help with the IMRT data examples, and thank Patrick Combettes for helpful comments. We also thank two anonymous referees for their constructive comments and thoughtful feedback.
Additional information
Jason Xu was partially supported by NSF MSPRF Grant DMS-1606177. Kenneth Lange was partially supported by NIH Grant HG006139.
Appendix
1.1 Proof of Proposition 3
Our proof of Proposition 3 relies on showing that the algorithm map \(\psi \) that carries out the Armijo backtracking line search is closed at all non-stationary points of f. Recall that a point-to-set map A is closed at a point \(\varvec{x}\) if, for any sequence \(\varvec{x}_j \rightarrow \varvec{x}\) such that \(\varvec{y}_j \in A(\varvec{x}_j)\) converges to \(\varvec{y}\), it follows that \(\varvec{y}\in A(\varvec{x})\). Define the point-to-set map
\[ S(\varvec{x}, \varvec{v}) = \left\{ \varvec{x} + \sigma ^k \varvec{v} : k \in \{0, 1, 2, \ldots \}, \; f(\varvec{x} + \sigma ^k \varvec{v}) \le f(\varvec{x}) + \alpha \sigma ^k df(\varvec{x}) \varvec{v} \right\} , \]
where \(\sigma , \alpha \in (0,1)\) are the backtracking and sufficient-decrease constants, and let G denote the map
\[ G(\varvec{x}) = (\varvec{x}, \varvec{v}(\varvec{x})), \]
where \(\varvec{v}(\varvec{x})\) is the search direction generated by the method at \(\varvec{x}\). The map G is continuous since f is Lipschitz differentiable. Let \(A = S \circ G\). By Corollary 2 in Chapter 7.7 of [49], the composite mapping A is closed at \(\varvec{x}\) if the mapping S is closed at \(G(\varvec{x})\). We will state and prove a slightly more general result.
Proposition 4
The mapping S is closed at all points \((\varvec{x}, \varvec{v})\) provided that \(\varvec{v}\not = \varvec{0}\).
Proof
Consider sequences \(\{ \varvec{x}_j \}\) and \(\{ \varvec{v}_j \}\) such that \(\varvec{x}_j \rightarrow \varvec{x}\) and \(\varvec{v}_j \rightarrow \varvec{v}\), and let \(\varvec{y}_j \in S(\varvec{x}_{j}, \varvec{v}_j)\) with \(\varvec{y}_j \rightarrow \varvec{y}\). For every j, there is some non-negative integer \(k_j\) such that
\[ \varvec{y}_j = \varvec{x}_j + \sigma ^{k_j} \varvec{v}_j. \]
Taking limits on both sides above yields
\[ \varvec{y} = \varvec{x} + \Big ( \lim _{j \rightarrow \infty } \sigma ^{k_j} \Big ) \varvec{v}, \]
where the limit exists because \(\varvec{v}\not = \varvec{0}\). Because \(\varvec{y}_j \in S(\varvec{x}_{j}, \varvec{v}_j)\), it holds for each j that
\[ f(\varvec{x}_j + \sigma ^{k_j} \varvec{v}_j) \le f(\varvec{x}_j) + \alpha \sigma ^{k_j} df(\varvec{x}_j) \varvec{v}_j. \qquad (22) \]
Since \(\{k_j\}\) is a sequence of integers and \(\sigma ^{k_j}\) converges, the sequence assumes only finitely many distinct values before converging to the constant sequence \(\{ \bar{k} \}\); let \(k^\dagger \) denote the maximum of these values. Then replacing \(k_j\) by \(k^\dagger \) and letting \(j \rightarrow \infty \) in (22), together with continuity of f and \(df(\varvec{x})\), we have that
\[ f(\varvec{x} + \sigma ^{k^\dagger } \varvec{v}) \le f(\varvec{x}) + \alpha \sigma ^{k^\dagger } df(\varvec{x}) \varvec{v}. \]
That is, \(\varvec{y}\in S(\varvec{x}, \varvec{v})\), proving that S is closed at \((\varvec{x}, \varvec{v})\). \(\square \)
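To make the backtracking map concrete, the following minimal Python sketch implements an Armijo line-search step of the kind \(\psi \) formalizes. The parameter values and the cap `max_backtracks` are illustrative choices of ours, not taken from the paper.

```python
import numpy as np

def armijo_backtrack(f, grad_f, x, v, sigma=0.5, alpha=1e-4, max_backtracks=50):
    """Return x + sigma**k * v for the first k satisfying the Armijo condition.

    f: objective, grad_f: its gradient, v: a descent direction at x.
    """
    fx = f(x)
    slope = grad_f(x) @ v          # directional derivative df(x)v; negative for descent
    step = 1.0
    for _ in range(max_backtracks):
        if f(x + step * v) <= fx + alpha * step * slope:
            return x + step * v    # sufficient-decrease condition met
        step *= sigma              # geometric backtracking
    return x                       # fallback: no acceptable step found

# Example: one step on f(x) = ||x||^2 / 2 along the negative gradient
f = lambda x: 0.5 * np.dot(x, x)
grad = lambda x: x
x0 = np.array([2.0, -1.0])
x1 = armijo_backtrack(f, grad, x0, -grad(x0))
```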
Since the search direction \(\varvec{v}(\varvec{x})\) vanishes only at stationary points of f, Proposition 4 informs us that the map A is closed at all non-stationary points of f. We are now ready to prove Proposition 3.
Proof
Fix an initial iterate \(\varvec{x}_0\). The level set \(\mathcal {L}_f(\varvec{x}_0) \equiv \{ \varvec{x}: f(\varvec{x}) \le f(\varvec{x}_0)\}\) is compact since f is coercive and continuous, and every iterate remains in \(\mathcal {L}_f(\varvec{x}_0)\) because the modified quasi-Newton method generates monotonically decreasing objective values. Since \(\mathcal {L}_f(\varvec{x}_0)\) is compact, the sequence \(\{\varvec{x}_{k}\}\) has a convergent subsequence whose limit lies in \(\mathcal {L}_f(\varvec{x}_0)\); write \(\varvec{x}_{k_l} \rightarrow \varvec{x}_\star \) as \(l \rightarrow \infty \). Our goal is to show that \(\varvec{x}_\star \) must be a stationary point of f. Suppose to the contrary that \(\varvec{x}_\star \) is not a stationary point of f.
Let \(\varvec{y}_{k_l} = \psi (\varvec{x}_{k_l}) \in A(\varvec{x}_{k_l})\). Note that \(\varvec{y}_{k_l} \in \mathcal {L}_f(\varvec{x}_0)\), and therefore the sequence \(\{\varvec{y}_{k_l}\}\) has a convergent subsequence \(\{\varvec{y}_{k_{l_j}}\}\); denote its limit by \(\varvec{y}_\star \). The map A is closed at \(\varvec{x}_\star \), since \(\varvec{x}_\star \) is not a stationary point of f. Therefore, by the definition of closed maps, we have \(\varvec{y}_\star \in A(\varvec{x}_\star )\), and consequently
\[ f(\varvec{y}_\star ) \le f(\varvec{x}_\star ) + \alpha \sigma ^{k} df(\varvec{x}_\star ) \varvec{v}(\varvec{x}_\star ) < f(\varvec{x}_\star ) \qquad (23) \]
for some positive integer k, where the strict inequality holds because \(\varvec{v}(\varvec{x}_\star )\) is a descent direction at the non-stationary point \(\varvec{x}_\star \), so that \(df(\varvec{x}_\star )\varvec{v}(\varvec{x}_\star ) < 0\). On the other hand, since f is Lipschitz differentiable, it is continuous; therefore \(\lim f(\varvec{x}_{k_l}) = f(\varvec{x}_\star )\). Moreover, since the objective values decrease monotonically and \(\varvec{y}_{k_l} = \varvec{x}_{k_l + 1}\), for all \(k_l\) we have that
\[ f(\varvec{x}_\star ) \le f(\varvec{x}_{k_l + 1}) = f(\varvec{y}_{k_l}). \]
By continuity of f, we have that \(f(\varvec{x}_\star ) \le f(\varvec{y}_\star )\), contradicting the inequality established in (23). We conclude that \(\varvec{x}_\star \) must be a stationary point of f. \(\square \)
1.2 \(\beta \)-divergence
The \(\beta \)-divergence is defined by
\[ d_\beta (x \,|\, y) = \left\{ \begin{array}{ll} \dfrac{1}{\beta (\beta -1)} \left( x^\beta + (\beta -1) y^\beta - \beta x y^{\beta -1} \right) , &{} \beta \in \mathbb {R}\setminus \{0, 1\}, \\ x \log \dfrac{x}{y} - x + y, &{} \beta = 1, \\ \dfrac{x}{y} - \log \dfrac{x}{y} - 1, &{} \beta = 0. \end{array} \right. \]
The cases \(\beta = 1\) and \(\beta = 0\) arise as limits of the general expression, so the \(\beta \)-divergence corresponds to the Kullback–Leibler and Itakura–Saito divergences when \(\beta = 1, 0\), respectively. Below we discuss the projection onto a hyperplane for the case of \(\beta \in \mathbb {R}\setminus \{0, 1 \}\).
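For instance, the \(\beta = 1\) case can be verified directly: both the numerator and the denominator of the general expression vanish at \(\beta = 1\), and a single application of l'Hôpital's rule in \(\beta \) gives
\[ \lim _{\beta \rightarrow 1} d_\beta (x \,|\, y) = \lim _{\beta \rightarrow 1} \frac{x^\beta \log x + y^\beta + (\beta -1) y^\beta \log y - x y^{\beta -1} - \beta x y^{\beta -1} \log y}{2\beta - 1} = x \log \frac{x}{y} - x + y, \]
recovering the Kullback–Leibler divergence; the \(\beta = 0\) case follows in the same way.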
The function \(\phi (x) = \dfrac{1}{\beta (\beta -1)} x^\beta \) generates the \(\beta \)-divergence as the Bregman divergence
\[ d_\beta (x \,|\, y) = \phi (x) - \phi (y) - \phi '(y)(x - y). \]
Applied coordinatewise, \(\phi (\varvec{x}) = \sum _j \dfrac{1}{\beta (\beta -1)} x_j^\beta \) has gradient
\[ \nabla \phi (\varvec{x}) = \left( \dfrac{1}{\beta -1} x_j^{\beta -1} \right) _j, \]
and recall the Fenchel conjugate of \(\phi \) is given by
\[ \phi ^\star (\varvec{z}) = \sup _{\varvec{x}} \left\{ \langle \varvec{z}, \varvec{x}\rangle - \phi (\varvec{x}) \right\} . \]
Defining \(h(\varvec{x}) = \langle \varvec{z}, \varvec{x}\rangle - \sum _j \dfrac{1}{\beta (\beta -1)} x_j^\beta \) and differentiating with respect to \(x_j\) gives the stationarity condition
\[ z_j - \dfrac{1}{\beta -1} x_j^{\beta -1} = 0, \qquad \text {so that} \qquad x_j = \left[ (\beta -1) z_j \right] ^{\frac{1}{\beta -1}}. \]
Plugging this maximizer into \(\phi ^\star (\varvec{z})\),
\[ \phi ^\star (\varvec{z}) = (\beta -1)^{\frac{1}{\beta -1}} \left( 1 - \dfrac{1}{\beta } \right) \sum _j z_j^{\frac{\beta }{\beta -1}}. \]
Finally, differentiating the Fenchel conjugate yields
\[ \nabla \phi ^\star (\varvec{z}) = \left( \left[ (\beta -1) z_j \right] ^{\frac{1}{\beta -1}} \right) _j. \]
Thus, the Bregman projection of \(\varvec{x}\) onto the hyperplane \(H = \{ \varvec{y}: \langle \varvec{a}, \varvec{y}\rangle = c \}\) is given by
\[ \mathcal {P}_{H}(\varvec{x}) = \nabla \phi ^\star \big ( \nabla \phi (\varvec{x}) - \widetilde{\gamma } \varvec{a}\big ), \qquad \text {with components} \qquad \mathcal {P}_{H}(\varvec{x})_j = \left[ x_j^{\beta -1} - (\beta -1) \widetilde{\gamma } a_j \right] ^{\frac{1}{\beta -1}}, \]
where \(\widetilde{\gamma } = \text {arg}\min _{\gamma } (\beta -1)^{\frac{1}{\beta -1}} \bigg ( 1 - \dfrac{1 }{\beta } \bigg ) \sum _j \bigg ( \dfrac{1}{\beta -1} x_j^{\beta -1} - \gamma a_j\bigg )^{\frac{\beta }{\beta -1}} + \gamma c .\)
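As a numerical companion to the formulas above, the following Python sketch computes the projection by minimizing the one-dimensional dual objective for \(\widetilde{\gamma }\). It assumes \(\beta > 1\) and data for which all fractional powers stay real; it is an illustration, not production code.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def beta_div_hyperplane_proj(x, a, c, beta):
    """Bregman projection of x onto {y : <a, y> = c} under the beta-divergence.

    Assumes beta > 1 and arguments that stay positive, so powers are real.
    """
    r = 1.0 / (beta - 1.0)

    def dual(gamma):
        # dual objective: phi*( grad phi(x) - gamma * a ) + gamma * c
        z = x**(beta - 1.0) / (beta - 1.0) - gamma * a
        return (beta - 1.0)**r * (1.0 - 1.0 / beta) * np.sum(z**(beta * r)) + gamma * c

    gamma_opt = minimize_scalar(dual).x
    # grad phi*( grad phi(x) - gamma * a ), componentwise
    return (x**(beta - 1.0) - (beta - 1.0) * gamma_opt * a)**r

# Example: project onto {y : sum(y) = 1}; for beta = 2 this reduces to the
# ordinary Euclidean projection onto the hyperplane.
x = np.array([0.5, 0.3, 0.4])
a = np.ones(3)
print(beta_div_hyperplane_proj(x, a, 1.0, beta=2.0))
```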
1.3 Per-iteration complexity of IMRT study
We detail the calculations behind our per-iteration complexity remarks. Note that in the IMRT dataset considered, \(p \ll n < m\).
1.3.1 Linear
How many flops are required to compute a single MM update given by Eq. (8) in the linear case?
We first tally the flops required to compute the gradient \(\nabla f(\varvec{x}_{k})\). In the IMRT example, the first sum \(\sum _i v_i(\varvec{x}_{k} - \mathcal {P}_{C_i}(\varvec{x}_{k}))\) requires \(\mathcal {O}(n)\) flops. The matrix-vector product \(\varvec{z}_k = \varvec{A}\varvec{x}_{k}\) requires \(\mathcal {O}(mn)\) flops. The sum \(\varvec{y}_k = \sum _j w_j(\varvec{z}_k - \mathcal {P}_{Q_j}(\varvec{z}_k))\) requires \(\mathcal {O}(m)\) flops. The matrix-vector product \(\varvec{A}^t\varvec{y}_k\) requires \(\mathcal {O}(mn)\) flops. Adding the two sums requires \(\mathcal {O}(n)\) flops. In total, the gradient requires \(\mathcal {O}(mn)\) flops.
We next tally the flops required to compute the MM update. Forming the matrix \(v\varvec{I}+ w\varvec{A}^t\varvec{A}\) requires \(\mathcal {O}(mn^2)\) flops, and computing its Cholesky factorization requires \(\mathcal {O}(n^3)\) flops. We only need to compute the factorization once and can cache it; subsequent iterations then require only \(\mathcal {O}(n^2)\) flops for the triangular solves. Since \(n < m\), these costs are dominated by the \(\mathcal {O}(mn)\) gradient computation, so the amount of work required to compute a single MM update for the linear formulation is \(\mathcal {O}(mn)\).
Note that the exact MM update in the linear case, given by
\[ \varvec{x}_{k+1} = \left( v\varvec{I}+ w\varvec{A}^t\varvec{A}\right) ^{-1} \left( \sum _i v_i \mathcal {P}_{C_i}(\varvec{x}_{k}) + \varvec{A}^t \sum _j w_j \mathcal {P}_{Q_j}(\varvec{A}\varvec{x}_{k}) \right) \]
with \(v = \sum _i v_i\) and \(w = \sum _j w_j\), has the same complexity as the update considered above.
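The update and its caching strategy can be summarized in a short Python sketch. The projection operators below are toy stand-ins (a non-negative orthant and a box), not the IMRT constraint sets; the comments mirror the flop tallies above.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def mm_update_linear(x, A, proj_C, proj_Q, v_wts, w_wts, chol):
    """One exact MM update for the linear proximity function (a sketch).

    proj_C[i], proj_Q[j] are Euclidean projectors onto C_i and Q_j;
    chol is the cached Cholesky factorization of v*I + w*A'A.
    """
    z = A @ x                                                      # O(mn)
    rhs = sum(vi * P(x) for vi, P in zip(v_wts, proj_C))           # O(n) per set
    rhs += A.T @ sum(wj * P(z) for wj, P in zip(w_wts, proj_Q))    # O(mn)
    return cho_solve(chol, rhs)                                    # O(n^2) per iteration

# Cache the factorization once -- O(mn^2 + n^3) up front
m, n = 50, 10
rng = np.random.default_rng(0)
A = rng.standard_normal((m, n))
v_wts, w_wts = [1.0], [1.0]
proj_C = [lambda x: np.clip(x, 0.0, None)]   # e.g. C_1 = non-negative orthant
proj_Q = [lambda z: np.clip(z, -1.0, 1.0)]   # e.g. Q_1 = a box
chol = cho_factor(sum(v_wts) * np.eye(n) + sum(w_wts) * (A.T @ A))
x_next = mm_update_linear(rng.standard_normal(n), A, proj_C, proj_Q, v_wts, w_wts, chol)
```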
1.3.2 Non-linear
We next consider the number of flops required for an MM update in the non-linear case, where the update is driven by the differential \(dh(\varvec{x}_{k})\) and the gradient \(\nabla h(\varvec{x}_{k})\) of the constraint map h.
We first tally the flops required to compute the gradient \(\nabla f(\varvec{x}_{k})\). The first sum \(\sum _i v_i(\varvec{x}_{k} - \mathcal {P}_{C_i}(\varvec{x}_{k}))\) requires \(\mathcal {O}(n)\) flops. Computing \(\varvec{A}\varvec{x}_{k}\) requires \(\mathcal {O}(mn)\) flops. Computing the sum \(\sum _j w_j (h(\varvec{x}_{k}) - \mathcal {P}_{Q_j}(h(\varvec{x}_{k})))\) requires \(\mathcal {O}(m)\) flops. Computing \(\nabla h(\varvec{x}_{k})\) requires \(\mathcal {O}(mn)\) flops, and computing its product with the sum term \(\nabla h(\varvec{x}_{k})\sum _j w_j (h(\varvec{x}_{k}) - \mathcal {P}_{Q_j}(h(\varvec{x}_{k})))\) requires \(\mathcal {O}(mp)\) flops. In total, the gradient requires \(\mathcal {O}(mn)\) flops. As in the linear case, the dominant costs are the matrix–vector products involving the matrix \(\varvec{A}\).
We next tally the flops required to compute the MM update. Forming the \(p \times p\) matrix \(v\varvec{I}+ w\,dh(\varvec{x}_{k})\nabla h(\varvec{x}_{k})\) requires \(\mathcal {O}(np^2)\) flops. Computing its Cholesky factorization requires \(\mathcal {O}(p^3)\) flops. Computing \(dh(\varvec{x}_{k})\nabla f(\varvec{x}_{k})\) requires \(\mathcal {O}(np)\) flops. Computing \(\left( v\varvec{I}+ w\,dh(\varvec{x}_{k})\nabla h(\varvec{x}_{k}) \right) ^{-1} dh(\varvec{x}_{k})\nabla f(\varvec{x}_{k})\) requires \(\mathcal {O}(p^2)\) flops. The product \(\nabla h(\varvec{x}_{k})\left( v\varvec{I}+ w\,dh(\varvec{x}_{k})\nabla h(\varvec{x}_{k}) \right) ^{-1} dh(\varvec{x}_{k})\nabla f(\varvec{x}_{k})\) requires \(\mathcal {O}(np)\) flops. Computing a candidate update requires \(\mathcal {O}(n)\) flops, and an objective function evaluation requires \(\mathcal {O}(mn)\) flops. Thus, including the line search, the amount of work required to compute a single MM update for the non-linear formulation is \(\mathcal {O}(\max \{mn,np^2\})\). When \(p^2 < m\), we conclude that the computational work for a single MM update in the non-linear case is comparable to that of the linear case. In practice, reducing the problem size via a non-linear formulation may additionally reduce the number of MM updates.
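The flop counts above reflect solving an \(n \times n\) system whose matrix is a rank-\(p\) modification of the identity by way of a \(p \times p\) system, in the spirit of the Woodbury identity. A generic Python sketch of this solve pattern, with \(\varvec{U}\) and \(\varvec{V}\) standing in for \(\nabla h(\varvec{x}_{k})\) and \(dh(\varvec{x}_{k})\):

```python
import numpy as np

def solve_low_rank(v, w, U, V, b):
    """Solve (v*I + w*U @ V) x = b via the Woodbury identity.

    U is n-by-p and V is p-by-n with p << n, so the only linear system
    solved is p-by-p: O(np^2 + p^3) work instead of O(n^3). When V = U.T
    (as with dh and its transpose), the small matrix is symmetric and a
    cached Cholesky factorization applies.
    """
    p = U.shape[1]
    small = v * np.eye(p) + w * (V @ U)   # O(np^2) to form
    y = np.linalg.solve(small, V @ b)     # O(p^3 + np)
    return (b - w * (U @ y)) / v          # Woodbury: (1/v) * (b - w*U*y)

# Sanity check against a dense solve
n, p = 200, 5
rng = np.random.default_rng(1)
U = rng.standard_normal((n, p))
V = U.T
b = rng.standard_normal(n)
x = solve_low_rank(2.0, 0.5, U, V, b)
assert np.allclose((2.0 * np.eye(n) + 0.5 * U @ V) @ x, b)
```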
Cite this article
Xu, J., Chi, E.C., Yang, M. et al. A majorization–minimization algorithm for split feasibility problems. Comput Optim Appl 71, 795–828 (2018). https://doi.org/10.1007/s10589-018-0025-z