Abstract
Generalized eigenvalue (GEV) problems arise in many areas of science and engineering. For example, principal component analysis (PCA), canonical correlation analysis (CCA) and Fisher discriminant analysis (FDA), all widely used in statistical data analysis, are specific instances of GEV problems. The main contribution of this work is a general, efficient algorithm for obtaining sparse solutions to a GEV problem; specific instances of sparse GEV problems can then be solved by specific instances of this algorithm. We achieve sparsity by solving the GEV problem subject to a constraint on the cardinality of the solution. Instead of relaxing the cardinality constraint with an ℓ1-norm approximation, we consider a tighter approximation related to the negative log-likelihood of a Student's t-distribution. The problem is then framed as a d.c. (difference of convex functions) program and solved as a sequence of convex programs by invoking the majorization-minimization method. The resulting algorithm is proved to exhibit global convergence behavior, i.e., for any random initialization, the sequence (or a subsequence) of iterates generated by the algorithm converges to a stationary point of the d.c. program. Finally, we illustrate the merits of this general sparse GEV algorithm with three specific examples of sparse GEV problems: sparse PCA, sparse CCA and sparse FDA. Empirical evidence for these examples suggests that the proposed sparse GEV algorithm, which offers a general framework for solving any sparse GEV problem, will give rise to competitive algorithms in a variety of applications where specific instances of GEV problems arise.
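To make the abstract's recipe concrete, here is a minimal illustrative sketch (not the paper's exact algorithm) for the sparse PCA instance of the GEV problem, where the constraint matrix is the identity. Each majorization-minimization step linearizes the concave log penalty at the current iterate, which turns it into a weighted ℓ1 term; the resulting convex subproblem over the unit ball has a closed-form soft-thresholding solution. The function name, update form, and parameter values below are assumptions chosen for illustration.

```python
import numpy as np

def sparse_pca_mm(A, rho=0.1, eps=1e-6, n_iter=200, seed=0):
    """MM sketch for a sparse leading eigenvector of a symmetric PSD matrix A.

    Approximately solves
        max_x  x^T A x - rho * sum_i log(1 + |x_i| / eps)   s.t.  ||x||_2 <= 1
    by solving a sequence of weighted-l1 convex subproblems.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(n_iter):
        # Majorize the concave log penalty at the current iterate:
        # weights w_i = 1 / (|x_i| + eps) turn it into a weighted l1 term,
        # so small components are penalized heavily and driven to zero.
        w = 1.0 / (np.abs(x) + eps)
        g = A @ x  # gradient of the (convex) quadratic term at the iterate
        # Closed-form maximizer of  2 g^T x - rho * sum_i w_i |x_i|
        # over the unit ball: soft-threshold the gradient, then renormalize.
        x_new = np.sign(g) * np.maximum(np.abs(g) - 0.5 * rho * w, 0.0)
        nrm = np.linalg.norm(x_new)
        if nrm == 0.0:  # penalty too strong: everything was thresholded away
            break
        x = x_new / nrm
    return x
```

On a covariance matrix whose leading eigenvector is already axis-aligned, e.g. `np.diag([5.0, 4.0, 0.1, 0.1, 0.1])`, the iterates concentrate on the first coordinate and zero out the rest, illustrating how the reweighted penalty induces exact sparsity that a plain eigensolver would not.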
Additional information
Editors: Süreyya Özöǧür-Akyüz, Devrim Ünay, and Alex Smola.
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
Cite this article
Sriperumbudur, B.K., Torres, D.A. & Lanckriet, G.R.G. A majorization-minimization approach to the sparse generalized eigenvalue problem. Mach Learn 85, 3–39 (2011). https://doi.org/10.1007/s10994-010-5226-3