Abstract
The classical multi-set split feasibility problem seeks a point in the intersection of finitely many closed convex domain constraints, whose image under a linear mapping also lies in the intersection of finitely many closed convex range constraints. Split feasibility generalizes important inverse problems including convex feasibility, linear complementarity, and regression with constraint sets. When a feasible point does not exist, solution methods that proceed by minimizing a proximity function can be used to obtain optimal approximate solutions to the problem. We present an extension of the proximity function approach that generalizes the linear split feasibility problem to allow for non-linear mappings. Our algorithm is based on the principle of majorization–minimization, is amenable to quasi-Newton acceleration, and comes complete with convergence guarantees under mild assumptions. Furthermore, we show that the Euclidean norm appearing in the proximity function of the non-linear split feasibility problem can be replaced by arbitrary Bregman divergences. We explore several examples illustrating the merits of non-linear formulations over the linear case, with a focus on optimization for intensity-modulated radiation therapy.
References
Alber, M., Meedt, G., Nusslin, F., Reemtsen, R.: On the degeneracy of the IMRT optimization problem. Med. Phys. 29(11), 2584–2589 (2002)
Antoniadis, A., Gijbels, I., Nikolova, M.: Penalized likelihood regression for generalized linear models with non-quadratic penalties. Ann. Inst. Stat. Math. 63(3), 585–615 (2011)
Bauschke, H.H., Borwein, J.M., Combettes, P.L.: Essential smoothness, essential strict convexity, and Legendre functions in Banach spaces. Commun. Contemp. Math. 3(04), 615–647 (2001)
Becker, M.P., Yang, I., Lange, K.: EM algorithms without missing data. Stat. Methods Med. Res. 6, 38–54 (1997)
Biegler, L.T., Zavala, V.M.: Large-scale nonlinear programming using IPOPT: an integrating framework for enterprise-wide dynamic optimization. Comput. Chem. Eng. 33(3), 575–582 (2009)
Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7(3), 200–217 (1967)
Byrne, C.L.: Iterative image reconstruction algorithms based on cross-entropy minimization. IEEE Trans. Image Process. 2(1), 96–103 (1993)
Byrne, C.L.: Iterative projection onto convex sets using multiple Bregman distances. Inverse Probl. 15(5), 1295 (1999)
Byrne, C.L.: Iterative oblique projection onto convex sets and the split feasibility problem. Inverse Probl. 18(2), 441–453 (2002)
Byrne, C.L.: Sequential unconstrained minimization algorithms for constrained optimization. Inverse Probl. 24(1), 015013 (2008)
Byrne, C.L.: Alternating minimization as sequential unconstrained minimization: a survey. J. Optim. Theory Appl. 156(3), 554–566 (2013)
Byrne, C.L.: An elementary proof of convergence for the forward-backward splitting algorithm. J. Nonlinear Convex Anal. 15(4), 681–691 (2014)
Cegielski, A.: General method for solving the split common fixed point problem. J. Optim. Theory Appl. 165(2), 385–404 (2015)
Censor, Y.: Row-action methods for huge and sparse systems and their applications. SIAM Rev. 23(4), 444–466 (1981)
Censor, Y.: Weak and strong superiorization: between feasibility-seeking and minimization. Analele Univ. Ovidius Constanta Ser. Mat. 23(3), 41–54 (2015)
Censor, Y., Bortfeld, T., Martin, B., Trofimov, A.: A unified approach for inversion problems in intensity-modulated radiation therapy. Phys. Med. Biol. 51(10), 2353 (2006)
Censor, Y., Elfving, T.: A multiprojection algorithm using Bregman projections in a product space. Numer. Algorithms 8(2), 221–239 (1994)
Censor, Y., Elfving, T., Kopf, N., Bortfeld, T.: The multiple-sets split feasibility problem and its applications for inverse problems. Inverse Probl. 21(6), 2071–2084 (2005)
Censor, Y., Gibali, A., Reich, S.: Algorithms for the split variational inequality problem. Numer. Algorithms 59(2), 301–323 (2012)
Censor, Y., Motova, A., Segal, A.: Perturbed projections and subgradient projections for the multiple-sets split feasibility problem. J. Math. Anal. Appl. 327(2), 1244–1256 (2007)
Censor, Y., Zenios, S.A.: Parallel Optimization: Theory, Algorithms, and Applications. Oxford University Press, Oxford (1997)
Chi, E.C., Lange, K.: A look at the generalized heron problem through the lens of majorization–minimization. Am. Math. Mon. 121(2), 95–108 (2014)
Chi, E.C., Zhou, H., Lange, K.: Distance majorization and its applications. Math. Program. Ser. A 146(1–2), 409–436 (2014)
Combettes, P.L.: Inconsistent signal feasibility problems: least-squares solutions in a product space. IEEE Trans. Signal Process. 42(11), 2955–2966 (1994)
Combettes, P.L.: Solving monotone inclusions via compositions of nonexpansive averaged operators. Optimization 53(5–6), 475–504 (2004)
Combettes, P.L., Bondon, P.: Hard-constrained inconsistent signal feasibility problems. IEEE Trans. Signal Process. 47(9), 2460–2468 (1999)
Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward–backward splitting. Multiscale Model. Simul. 4(4), 1168–1200 (2005)
Craft, D., Bangert, M., Long, T., Papp, D., Unkelbach, J.: Shared data for intensity modulated radiation therapy (IMRT) optimization research: the CORT dataset. GigaSci. 3(1), 37 (2014)
Davenport, M.A., Duarte, M.F., Eldar, Y.C., Kutyniok, G.: Introduction to compressed sensing. In: Compressed Sensing: Theory and Applications, Chapter 1. Cambridge University Press, Cambridge (2012)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 39(1), 1–38 (1977)
Dingping, W., Qibin, D., Erli, W., Hang, Z.: The Split Feasibility Problem in Hilbert Space, pp. 1149–1154. Springer, Berlin (2013)
Ehrgott, M., Güler, Ç., Hamacher, H.W., Shao, L.: Mathematical optimization in intensity modulated radiation therapy. Ann. Oper. Res. 175(1), 309–365 (2010)
Fan, J., Lv, J.: A selective overview of variable selection in high dimensional feature space. Stat. Sin. 20(1), 101 (2010)
Févotte, C., Idier, J.: Algorithms for nonnegative matrix factorization with the \(\beta \)-divergence. Neural Comput. 23(9), 2421–2456 (2011)
Fiacco, A.V., McCormick, G.P.: Nonlinear Programming: Sequential Unconstrained Minimization Techniques. Classics in Applied Mathematics. SIAM, Philadelphia (1990)
Gibali, A., Küfer, K., Süss, P.: Successive linear programming approach for solving the nonlinear split feasibility problem. J. Nonlinear Convex Anal. 15(2), 345–353 (2014)
Gibali, A., Küfer, K.-H., Reem, D., Süss, P.: A generalized projection-based scheme for solving convex constrained optimization problems. Comput. Optim. Appl. 70(3), 737–762 (2018)
Goldstein, T., Studer, C., Baraniuk, R.: A field guide to forward-backward splitting with a FASTA implementation. arXiv:1411.3406 [cs.NA] (2014)
Gordon, R., Bender, R., Herman, G.T.: Algebraic reconstruction techniques (ART) for three-dimensional electron microscopy and X-ray photography. J. Theor. Biol. 29(3), 471–481 (1970)
Hou, Q., Wang, J., Chen, Y., Galvin, J.M.: An optimization algorithm for intensity modulated radiotherapy-the simulated dynamics with dose-volume constraints. Med. Phys. 30(1), 61–68 (2003)
Lange, K.: A gradient algorithm locally equivalent to the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 57(2), 425–437 (1995)
Lange, K.: Numerical Analysis for Statisticians. Statistics and Computing, 2nd edn. Springer, New York (2010)
Lange, K.: Optimization. Springer Texts in Statistics, 2nd edn. Springer, New York (2013)
Lange, K.: MM Optimization Algorithms. SIAM, Philadelphia (2016)
Lange, K., Hunter, D.R., Yang, I.: Optimization transfer using surrogate objective functions (with discussion). J. Comput. Gr. Stat. 9, 1–20 (2000)
Li, Z., Han, D., Zhang, W.: A self-adaptive projection-type method for nonlinear multiple-sets split feasibility problem. Inverse Probl. Sci. Eng. 21(1), 155–170 (2013)
Llacer, J., Deasy, J.O., Bortfeld, T.R., Solberg, T.D., Promberger, C.: Absence of multiple local minima effects in intensity modulated optimization with dose-volume constraints. Phys. Med. Biol. 48(2), 183 (2003)
Lorenz, D.A., Schöpfer, F., Wenger, S.: The linearized Bregman method via split feasibility problems: analysis and generalizations. SIAM J. Imaging Sci. 7(2), 1237–1262 (2014)
Luenberger, D.G., Ye, Y.: Linear and Nonlinear Programming. International Series in Operations Research & Management Science, vol. 228, 4th edn. Springer, New York (2016)
Masad, E., Reich, S.: A note on the multiple-set split convex feasibility problem in Hilbert space. J. Nonlinear Convex Anal. 8(3), 367–371 (2007)
McCullagh, P., Nelder, J.A.: Generalized Linear Models, vol. 37. CRC Press, Boca Raton (1989)
Mordukhovich, B.S., Nam, N.M.: Applications of variational analysis to a generalized Fermat–Torricelli problem. J. Optim. Theory Appl. 148, 431–454 (2011)
Mordukhovich, B.S., Nam, N.M., Salinas, J.: Applications of variational analysis to a generalized Heron problem. Appl. Anal. 91(10), 1915–1942 (2011)
Mordukhovich, B.S., Nam, N.M., Salinas, J.: Solving a generalized Heron problem by means of convex analysis. Am. Math. Mon. 119(2), 87–99 (2012)
Moré, J.J.: The Levenberg–Marquardt algorithm: implementation and theory. In: Numerical Analysis, pp. 105–116. Springer, Berlin (1978)
Moudafi, A.: A note on the split common fixed-point problem for quasi-nonexpansive operators. Nonlinear Anal. Theory Methods Appl. 74(12), 4083–4087 (2011)
Murty, K.G., Yu, F.-T.: Linear Complementarity, Linear and Nonlinear Programming, vol. 3. Heldermann, Berlin (1988)
Palta, J.R., Mackie, T.R. (eds.): Intensity-Modulated Radiation Therapy: The State of The Art. Medical Physics Publishing, Madison (2003)
Park, M.Y., Hastie, T.: L1-regularization path algorithm for generalized linear models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 69(4), 659–677 (2007)
Polson, N.G., Scott, J.G., Willard, B.T.: Proximal algorithms in statistics and machine learning. Stat. Sci. 30(4), 559–581 (2015)
R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna (2017)
Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, New York (1998)
Shepard, D.M., Ferris, M.C., Olivera, G.H., Mackie, T.R.: Optimizing the delivery of radiation therapy to cancer patients. SIAM Rev. 41(4), 721–744 (1999)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 58(1), 267–288 (1996)
Wang, F., Xu, H.-K.: Cyclic algorithms for split feasibility problems in Hilbert spaces. Nonlinear Anal. Theory Methods Appl. 74(12), 4105–4111 (2011)
Wright, S.J., Nowak, R.D., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57(7), 2479–2493 (2009)
Xu, H.-K.: A variable Krasnosel’skii Mann algorithm and the multiple-set split feasibility problem. Inverse Probl. 22(6), 2021–2034 (2006)
Xu, H.-K.: Iterative methods for the split feasibility problem in infinite-dimensional Hilbert spaces. Inverse Probl. 26(10), 105018 (2010)
Xu, J., Chi, E., Lange, K.: Generalized linear model regression under distance-to-set penalties. In: Advances in Neural Information Processing Systems, pp. 1385–1395 (2017)
Zhang, X., Liu, H., Wang, X., Dong, L., Wu, Q., Mohan, R.: Speed and convergence properties of gradient algorithms for optimization of IMRT. Med. Phys. 31(5), 1141–1152 (2004)
Zhou, H., Alexander, D., Lange, K.: A quasi-Newton acceleration for high-dimensional optimization algorithms. Stat. Comput. 21, 261–273 (2011)
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B (Methodol.) 67(2), 301–320 (2005)
Acknowledgements
We thank Steve Wright and Dávid Papp for their help with the IMRT data examples, and thank Patrick Combettes for helpful comments. We also thank two anonymous referees for their constructive comments and thoughtful feedback.
Additional information
Jason Xu was partially supported by NSF MSPRF Grant DMS-1606177. Kenneth Lange was partially supported by NIH Grant HG006139.
Appendix
1.1 Proof of Proposition 3
Our proof of Proposition 3 relies on showing that the algorithm map \(\psi \) that carries out the Armijo backtracking line search is closed at all non-stationary points of f. Recall that a point-to-set map A is closed at a point \(\varvec{x}\) if, for any sequence \(\varvec{x}_j \rightarrow \varvec{x}\) such that \(\varvec{y}_j \in A(\varvec{x}_j)\) converges to \(\varvec{y}\), it follows that \(\varvec{y}\in A(\varvec{x})\). Define the point-to-set map
\[ S(\varvec{x}, \varvec{v}) = \left\{ \varvec{x} + \sigma ^k \varvec{v} : k \in \{0, 1, 2, \ldots \}, \; f(\varvec{x} + \sigma ^k \varvec{v}) \le f(\varvec{x}) + \alpha \sigma ^k df(\varvec{x}) \varvec{v} \right\} , \]
where \(\sigma , \alpha \in (0,1)\) are the backtracking and sufficient-decrease constants, and let G denote the map
\[ G(\varvec{x}) = (\varvec{x}, \varvec{v}(\varvec{x})), \]
where \(\varvec{v}(\varvec{x})\) is the search direction generated by the method at \(\varvec{x}\). The map G is continuous since f is Lipschitz differentiable. Let \(A = S \circ G\). By Corollary 2 in Chapter 7.7 of [49], the composite mapping A is closed at \(\varvec{x}\) if the mapping S is closed at \(G(\varvec{x})\). We will state and prove a slightly more general result.
Proposition 4
The mapping S is closed at all points \((\varvec{x}, \varvec{v})\) provided that \(\varvec{v}\not = \varvec{0}\).
Proof
Consider sequences \(\{ \varvec{x}_j \}\) and \(\{ \varvec{v}_j \}\) such that \(\varvec{x}_j \rightarrow \varvec{x}\) and \(\varvec{v}_j \rightarrow \varvec{v}\), and let \(\varvec{y}_j \in S(\varvec{x}_{j}, \varvec{v}_j)\) with \(\varvec{y}_j \rightarrow \varvec{y}\). For every j, there is some non-negative integer \(k_j\) such that
\[ \varvec{y}_j = \varvec{x}_j + \sigma ^{k_j} \varvec{v}_j. \]
Taking limits on both sides above yields
\[ \varvec{y} = \varvec{x} + \Big ( \lim _{j \rightarrow \infty } \sigma ^{k_j} \Big ) \varvec{v}, \]
where the limit exists because \(\varvec{v}\not = \varvec{0}\). Because \(\varvec{y}_j \in S(\varvec{x}_{j}, \varvec{v}_j)\), it holds for each j that
\[ f(\varvec{x}_j + \sigma ^{k_j} \varvec{v}_j) \le f(\varvec{x}_j) + \alpha \sigma ^{k_j} df(\varvec{x}_j) \varvec{v}_j. \qquad (22) \]
Since \(\{k_j\}\) is a sequence of integers and \(\sigma ^{k_j}\) converges, the sequence assumes only finitely many distinct values before converging to the constant sequence \(\{ \bar{k} \}\); let \(k^\dagger \) denote the maximum of these values. Then replacing \(k_j\) by \(k^\dagger \) and letting \(j \rightarrow \infty \) in (22), together with continuity of f and \(df(\varvec{x})\), we have that
\[ f(\varvec{x} + \sigma ^{k^\dagger } \varvec{v}) \le f(\varvec{x}) + \alpha \sigma ^{k^\dagger } df(\varvec{x}) \varvec{v}. \]
That is, \(\varvec{y}\in S(\varvec{x}, \varvec{v})\), proving that S is closed at \((\varvec{x}, \varvec{v})\). \(\square \)
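To make the backtracking map concrete, the following minimal Python sketch implements an Armijo line-search step of the kind \(\psi \) formalizes. The parameter values and the cap `max_backtracks` are illustrative choices of ours, not taken from the paper.

```python
import numpy as np

def armijo_backtrack(f, grad_f, x, v, sigma=0.5, alpha=1e-4, max_backtracks=50):
    """Return x + sigma**k * v for the first k satisfying the Armijo condition.

    f: objective, grad_f: its gradient, v: a descent direction at x.
    """
    fx = f(x)
    slope = grad_f(x) @ v          # directional derivative df(x)v; negative for descent
    step = 1.0
    for _ in range(max_backtracks):
        if f(x + step * v) <= fx + alpha * step * slope:
            return x + step * v    # sufficient-decrease condition met
        step *= sigma              # geometric backtracking
    return x                       # fallback: no acceptable step found

# Example: one step on f(x) = ||x||^2 / 2 along the negative gradient
f = lambda x: 0.5 * np.dot(x, x)
grad = lambda x: x
x0 = np.array([2.0, -1.0])
x1 = armijo_backtrack(f, grad, x0, -grad(x0))
```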
Since the search direction \(\varvec{v}(\varvec{x})\) vanishes only at stationary points of f, Proposition 4 informs us that the map A is closed at all non-stationary points of f. We are now ready to prove Proposition 3.
Proof
Fix an initial iterate \(\varvec{x}_0\). The level set \(\mathcal {L}_f(\varvec{x}_0) \equiv \{ \varvec{x}: f(\varvec{x}) \le f(\varvec{x}_0)\}\) is compact since f is coercive and continuous, and every iterate remains in \(\mathcal {L}_f(\varvec{x}_0)\) because the modified quasi-Newton method generates monotonically decreasing objective values. Since \(\mathcal {L}_f(\varvec{x}_0)\) is compact, the sequence \(\{\varvec{x}_{k}\}\) has a convergent subsequence whose limit lies in \(\mathcal {L}_f(\varvec{x}_0)\); write \(\varvec{x}_{k_l} \rightarrow \varvec{x}_\star \) as \(l \rightarrow \infty \). Our goal is to show that \(\varvec{x}_\star \) must be a stationary point of f. Suppose to the contrary that \(\varvec{x}_\star \) is not a stationary point of f.
Let \(\varvec{y}_{k_l} = \psi (\varvec{x}_{k_l}) \in A(\varvec{x}_{k_l})\). Note that \(\varvec{y}_{k_l} \in \mathcal {L}_f(\varvec{x}_0)\), and therefore the sequence \(\{\varvec{y}_{k_l}\}\) has a convergent subsequence \(\{\varvec{y}_{k_{l_j}}\}\); denote its limit by \(\varvec{y}_\star \). The map A is closed at \(\varvec{x}_\star \), since \(\varvec{x}_\star \) is not a stationary point of f. Therefore, by the definition of closed maps, we have \(\varvec{y}_\star \in A(\varvec{x}_\star )\), and consequently
\[ f(\varvec{y}_\star ) \le f(\varvec{x}_\star ) + \alpha \sigma ^{k} df(\varvec{x}_\star ) \varvec{v}(\varvec{x}_\star ) < f(\varvec{x}_\star ) \qquad (23) \]
for some positive integer k, where the strict inequality holds because \(\varvec{v}(\varvec{x}_\star )\) is a descent direction at the non-stationary point \(\varvec{x}_\star \), so that \(df(\varvec{x}_\star )\varvec{v}(\varvec{x}_\star ) < 0\). On the other hand, since f is Lipschitz differentiable, it is continuous; therefore \(\lim f(\varvec{x}_{k_l}) = f(\varvec{x}_\star )\). Moreover, since the objective values decrease monotonically and \(\varvec{y}_{k_l} = \varvec{x}_{k_l + 1}\), for all \(k_l\) we have that
\[ f(\varvec{x}_\star ) \le f(\varvec{x}_{k_l + 1}) = f(\varvec{y}_{k_l}). \]
By continuity of f, we have that \(f(\varvec{x}_\star ) \le f(\varvec{y}_\star )\), contradicting the inequality established in (23). We conclude that \(\varvec{x}_\star \) must be a stationary point of f. \(\square \)
1.2 \(\beta \)-divergence
The \(\beta \)-divergence is defined by
\[ d_\beta (x \,|\, y) = \left\{ \begin{array}{ll} \dfrac{1}{\beta (\beta -1)} \left( x^\beta + (\beta -1) y^\beta - \beta x y^{\beta -1} \right) , &{} \beta \in \mathbb {R}\setminus \{0, 1\}, \\ x \log \dfrac{x}{y} - x + y, &{} \beta = 1, \\ \dfrac{x}{y} - \log \dfrac{x}{y} - 1, &{} \beta = 0. \end{array} \right. \]
The cases \(\beta = 1\) and \(\beta = 0\) arise as limits of the general expression, so the \(\beta \)-divergence corresponds to the Kullback–Leibler and Itakura–Saito divergences when \(\beta = 1, 0\), respectively. Below we discuss the projection onto a hyperplane for the case of \(\beta \in \mathbb {R}\setminus \{0, 1 \}\).
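For instance, the \(\beta = 1\) case can be verified directly: both the numerator and the denominator of the general expression vanish at \(\beta = 1\), and a single application of l'Hôpital's rule in \(\beta \) gives
\[ \lim _{\beta \rightarrow 1} d_\beta (x \,|\, y) = \lim _{\beta \rightarrow 1} \frac{x^\beta \log x + y^\beta + (\beta -1) y^\beta \log y - x y^{\beta -1} - \beta x y^{\beta -1} \log y}{2\beta - 1} = x \log \frac{x}{y} - x + y, \]
recovering the Kullback–Leibler divergence; the \(\beta = 0\) case follows in the same way.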
The function \(\phi (x) = \dfrac{1}{\beta (\beta -1)} x^\beta \) generates the \(\beta \)-divergence as the Bregman divergence
\[ d_\beta (x \,|\, y) = \phi (x) - \phi (y) - \phi '(y)(x - y). \]
Applied coordinatewise, \(\phi (\varvec{x}) = \sum _j \dfrac{1}{\beta (\beta -1)} x_j^\beta \) has gradient
\[ \nabla \phi (\varvec{x}) = \left( \dfrac{1}{\beta -1} x_j^{\beta -1} \right) _j, \]
and recall the Fenchel conjugate of \(\phi \) is given by
\[ \phi ^\star (\varvec{z}) = \sup _{\varvec{x}} \left\{ \langle \varvec{z}, \varvec{x}\rangle - \phi (\varvec{x}) \right\} . \]
Defining \(h(\varvec{x}) = \langle \varvec{z}, \varvec{x}\rangle - \sum _j \dfrac{1}{\beta (\beta -1)} x_j^\beta \) and differentiating with respect to \(x_j\) gives the stationarity condition
\[ z_j - \dfrac{1}{\beta -1} x_j^{\beta -1} = 0, \qquad \text {so that} \qquad x_j = \left[ (\beta -1) z_j \right] ^{\frac{1}{\beta -1}}. \]
Plugging this maximizer into \(\phi ^\star (\varvec{z})\),
\[ \phi ^\star (\varvec{z}) = (\beta -1)^{\frac{1}{\beta -1}} \left( 1 - \dfrac{1}{\beta } \right) \sum _j z_j^{\frac{\beta }{\beta -1}}. \]
Finally, differentiating the Fenchel conjugate yields
\[ \nabla \phi ^\star (\varvec{z}) = \left( \left[ (\beta -1) z_j \right] ^{\frac{1}{\beta -1}} \right) _j. \]
Thus, the Bregman projection of \(\varvec{x}\) onto the hyperplane \(H = \{ \varvec{y}: \langle \varvec{a}, \varvec{y}\rangle = c \}\) is given by
\[ \mathcal {P}_{H}(\varvec{x}) = \nabla \phi ^\star \big ( \nabla \phi (\varvec{x}) - \widetilde{\gamma } \varvec{a}\big ), \qquad \text {with components} \qquad \mathcal {P}_{H}(\varvec{x})_j = \left[ x_j^{\beta -1} - (\beta -1) \widetilde{\gamma } a_j \right] ^{\frac{1}{\beta -1}}, \]
where \(\widetilde{\gamma } = \text {arg}\min _{\gamma } (\beta -1)^{\frac{1}{\beta -1}} \bigg ( 1 - \dfrac{1 }{\beta } \bigg ) \sum _j \bigg ( \dfrac{1}{\beta -1} x_j^{\beta -1} - \gamma a_j\bigg )^{\frac{\beta }{\beta -1}} + \gamma c .\)
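As a numerical companion to the formulas above, the following Python sketch computes the projection by minimizing the one-dimensional dual objective for \(\widetilde{\gamma }\). It assumes \(\beta > 1\) and data for which all fractional powers stay real; it is an illustration, not production code.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def beta_div_hyperplane_proj(x, a, c, beta):
    """Bregman projection of x onto {y : <a, y> = c} under the beta-divergence.

    Assumes beta > 1 and arguments that stay positive, so powers are real.
    """
    r = 1.0 / (beta - 1.0)

    def dual(gamma):
        # dual objective: phi*( grad phi(x) - gamma * a ) + gamma * c
        z = x**(beta - 1.0) / (beta - 1.0) - gamma * a
        return (beta - 1.0)**r * (1.0 - 1.0 / beta) * np.sum(z**(beta * r)) + gamma * c

    gamma_opt = minimize_scalar(dual).x
    # grad phi*( grad phi(x) - gamma * a ), componentwise
    return (x**(beta - 1.0) - (beta - 1.0) * gamma_opt * a)**r

# Example: project onto {y : sum(y) = 1}; for beta = 2 this reduces to the
# ordinary Euclidean projection onto the hyperplane.
x = np.array([0.5, 0.3, 0.4])
a = np.ones(3)
print(beta_div_hyperplane_proj(x, a, 1.0, beta=2.0))
```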
1.3 Per-iteration complexity of IMRT study
We detail the calculations behind our per-iteration complexity remarks. Note that in the IMRT dataset considered, \(p \ll n < m\).
1.3.1 Linear
How many flops are required to compute a single MM update given by Eq. (8) in the linear case?
We first tally the flops required to compute the gradient \(\nabla f(\varvec{x}_{k})\). In the IMRT example, the first sum \(\sum _i v_i(\varvec{x}_{k} - \mathcal {P}_{C_i}(\varvec{x}_{k}))\) requires \(\mathcal {O}(n)\) flops. The matrix-vector product \(\varvec{z}_k = \varvec{A}\varvec{x}_{k}\) requires \(\mathcal {O}(mn)\) flops. The sum \(\varvec{y}_k = \sum _j w_j(\varvec{z}_k - \mathcal {P}_{Q_j}(\varvec{z}_k))\) requires \(\mathcal {O}(m)\) flops. The matrix-vector product \(\varvec{A}^t\varvec{y}_k\) requires \(\mathcal {O}(mn)\) flops. Adding the two sums requires \(\mathcal {O}(n)\) flops. In total, the gradient requires \(\mathcal {O}(mn)\) flops.
We next tally the flops required to compute the MM update. Forming the matrix \(v\varvec{I}+ w\varvec{A}^t\varvec{A}\) requires \(\mathcal {O}(mn^2)\) flops, and computing its Cholesky factorization requires \(\mathcal {O}(n^3)\) flops. We only need to compute the factorization once and can cache it; subsequent iterations then require only \(\mathcal {O}(n^2)\) flops for the triangular solves. Since \(n < m\), these costs are dominated by the \(\mathcal {O}(mn)\) gradient computation, so the amount of work required to compute a single MM update for the linear formulation is \(\mathcal {O}(mn)\).
Note that the exact MM update in the linear case, given by
\[ \varvec{x}_{k+1} = \left( v\varvec{I}+ w\varvec{A}^t\varvec{A}\right) ^{-1} \left( \sum _i v_i \mathcal {P}_{C_i}(\varvec{x}_{k}) + \varvec{A}^t \sum _j w_j \mathcal {P}_{Q_j}(\varvec{A}\varvec{x}_{k}) \right) \]
with \(v = \sum _i v_i\) and \(w = \sum _j w_j\), has the same complexity as the update considered above.
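The update and its caching strategy can be summarized in a short Python sketch. The projection operators below are toy stand-ins (a non-negative orthant and a box), not the IMRT constraint sets; the comments mirror the flop tallies above.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def mm_update_linear(x, A, proj_C, proj_Q, v_wts, w_wts, chol):
    """One exact MM update for the linear proximity function (a sketch).

    proj_C[i], proj_Q[j] are Euclidean projectors onto C_i and Q_j;
    chol is the cached Cholesky factorization of v*I + w*A'A.
    """
    z = A @ x                                                      # O(mn)
    rhs = sum(vi * P(x) for vi, P in zip(v_wts, proj_C))           # O(n) per set
    rhs += A.T @ sum(wj * P(z) for wj, P in zip(w_wts, proj_Q))    # O(mn)
    return cho_solve(chol, rhs)                                    # O(n^2) per iteration

# Cache the factorization once -- O(mn^2 + n^3) up front
m, n = 50, 10
rng = np.random.default_rng(0)
A = rng.standard_normal((m, n))
v_wts, w_wts = [1.0], [1.0]
proj_C = [lambda x: np.clip(x, 0.0, None)]   # e.g. C_1 = non-negative orthant
proj_Q = [lambda z: np.clip(z, -1.0, 1.0)]   # e.g. Q_1 = a box
chol = cho_factor(sum(v_wts) * np.eye(n) + sum(w_wts) * (A.T @ A))
x_next = mm_update_linear(rng.standard_normal(n), A, proj_C, proj_Q, v_wts, w_wts, chol)
```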
1.3.2 Non-linear
We next consider the number of flops required for an MM update in the non-linear case, where the update is driven by the differential \(dh(\varvec{x}_{k})\) and the gradient \(\nabla h(\varvec{x}_{k})\) of the constraint map h.
We first tally the flops required to compute the gradient \(\nabla f(\varvec{x}_{k})\). The first sum \(\sum _i v_i(\varvec{x}_{k} - \mathcal {P}_{C_i}(\varvec{x}_{k}))\) requires \(\mathcal {O}(n)\) flops. Computing \(\varvec{A}\varvec{x}_{k}\) requires \(\mathcal {O}(mn)\) flops. Computing the sum \(\sum _j w_j (h(\varvec{x}_{k}) - \mathcal {P}_{Q_j}(h(\varvec{x}_{k})))\) requires \(\mathcal {O}(m)\) flops. Computing \(\nabla h(\varvec{x}_{k})\) requires \(\mathcal {O}(mn)\) flops, and computing its product with the sum term \(\nabla h(\varvec{x}_{k})\sum _j w_j (h(\varvec{x}_{k}) - \mathcal {P}_{Q_j}(h(\varvec{x}_{k})))\) requires \(\mathcal {O}(mp)\) flops. In total, the gradient requires \(\mathcal {O}(mn)\) flops. As in the linear case, the dominant costs are the matrix–vector products involving the matrix \(\varvec{A}\).
We next tally the flops required to compute the MM update. Forming the \(p \times p\) matrix \(v\varvec{I}+ w\,dh(\varvec{x}_{k})\nabla h(\varvec{x}_{k})\) requires \(\mathcal {O}(np^2)\) flops. Computing its Cholesky factorization requires \(\mathcal {O}(p^3)\) flops. Computing \(dh(\varvec{x}_{k})\nabla f(\varvec{x}_{k})\) requires \(\mathcal {O}(np)\) flops. Computing \(\left( v\varvec{I}+ w\,dh(\varvec{x}_{k})\nabla h(\varvec{x}_{k}) \right) ^{-1} dh(\varvec{x}_{k})\nabla f(\varvec{x}_{k})\) requires \(\mathcal {O}(p^2)\) flops. The product \(\nabla h(\varvec{x}_{k})\left( v\varvec{I}+ w\,dh(\varvec{x}_{k})\nabla h(\varvec{x}_{k}) \right) ^{-1} dh(\varvec{x}_{k})\nabla f(\varvec{x}_{k})\) requires \(\mathcal {O}(np)\) flops. Computing a candidate update requires \(\mathcal {O}(n)\) flops, and an objective function evaluation requires \(\mathcal {O}(mn)\) flops. Thus, including the line search, the amount of work required to compute a single MM update for the non-linear formulation is \(\mathcal {O}(\max \{mn,np^2\})\). When \(p^2 < m\), we conclude that the computational work for a single MM update in the non-linear case is comparable to that of the linear case. In practice, reducing the problem size via a non-linear formulation may additionally reduce the number of MM updates.
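The flop counts above reflect solving an \(n \times n\) system whose matrix is a rank-\(p\) modification of the identity by way of a \(p \times p\) system, in the spirit of the Woodbury identity. A generic Python sketch of this solve pattern, with \(\varvec{U}\) and \(\varvec{V}\) standing in for \(\nabla h(\varvec{x}_{k})\) and \(dh(\varvec{x}_{k})\):

```python
import numpy as np

def solve_low_rank(v, w, U, V, b):
    """Solve (v*I + w*U @ V) x = b via the Woodbury identity.

    U is n-by-p and V is p-by-n with p << n, so the only linear system
    solved is p-by-p: O(np^2 + p^3) work instead of O(n^3). When V = U.T
    (as with dh and its transpose), the small matrix is symmetric and a
    cached Cholesky factorization applies.
    """
    p = U.shape[1]
    small = v * np.eye(p) + w * (V @ U)   # O(np^2) to form
    y = np.linalg.solve(small, V @ b)     # O(p^3 + np)
    return (b - w * (U @ y)) / v          # Woodbury: (1/v) * (b - w*U*y)

# Sanity check against a dense solve
n, p = 200, 5
rng = np.random.default_rng(1)
U = rng.standard_normal((n, p))
V = U.T
b = rng.standard_normal(n)
x = solve_low_rank(2.0, 0.5, U, V, b)
assert np.allclose((2.0 * np.eye(n) + 0.5 * U @ V) @ x, b)
```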
Cite this article
Xu, J., Chi, E.C., Yang, M. et al. A majorization–minimization algorithm for split feasibility problems. Comput Optim Appl 71, 795–828 (2018). https://doi.org/10.1007/s10589-018-0025-z