Abstract
Generalized eigenvalue (GEV) problems arise in many areas of science and engineering. For example, principal component analysis (PCA), canonical correlation analysis (CCA) and Fisher discriminant analysis (FDA), all widely used in statistical data analysis, are specific instances of GEV problems. The main contribution of this work is a general, efficient algorithm for obtaining sparse solutions to a GEV problem; specific instances of sparse GEV problems can then be solved by specific instances of this algorithm. We achieve sparsity by solving the GEV problem subject to a constraint on the cardinality of the solution. Instead of relaxing the cardinality constraint with an ℓ1-norm approximation, we consider a tighter approximation related to the negative log-likelihood of a Student's t-distribution. The problem is then framed as a d.c. (difference of convex functions) program and solved as a sequence of convex programs by invoking the majorization-minimization method. The resulting algorithm is proved to exhibit global convergence behavior, i.e., for any random initialization, the sequence (or a subsequence) of iterates generated by the algorithm converges to a stationary point of the d.c. program. Finally, we illustrate the merits of this general sparse GEV algorithm with three specific examples of sparse GEV problems: sparse PCA, sparse CCA and sparse FDA. Empirical evidence for these examples suggests that the proposed sparse GEV algorithm, which offers a general framework for solving any sparse GEV problem, will give rise to competitive algorithms in a variety of applications where specific instances of GEV problems arise.
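To make the abstract's recipe concrete, here is a minimal illustrative sketch (not the paper's exact algorithm) for the sparse PCA instance of the GEV problem, where the constraint matrix is the identity. Each majorization-minimization step linearizes the concave log penalty at the current iterate, which turns it into a weighted ℓ1 term; the resulting convex subproblem over the unit ball has a closed-form soft-thresholding solution. The function name, update form, and parameter values below are assumptions chosen for illustration.

```python
import numpy as np

def sparse_pca_mm(A, rho=0.1, eps=1e-6, n_iter=200, seed=0):
    """MM sketch for a sparse leading eigenvector of a symmetric PSD matrix A.

    Approximately solves
        max_x  x^T A x - rho * sum_i log(1 + |x_i| / eps)   s.t.  ||x||_2 <= 1
    by solving a sequence of weighted-l1 convex subproblems.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(n_iter):
        # Majorize the concave log penalty at the current iterate:
        # weights w_i = 1 / (|x_i| + eps) turn it into a weighted l1 term,
        # so small components are penalized heavily and driven to zero.
        w = 1.0 / (np.abs(x) + eps)
        g = A @ x  # gradient of the (convex) quadratic term at the iterate
        # Closed-form maximizer of  2 g^T x - rho * sum_i w_i |x_i|
        # over the unit ball: soft-threshold the gradient, then renormalize.
        x_new = np.sign(g) * np.maximum(np.abs(g) - 0.5 * rho * w, 0.0)
        nrm = np.linalg.norm(x_new)
        if nrm == 0.0:  # penalty too strong: everything was thresholded away
            break
        x = x_new / nrm
    return x
```

On a covariance matrix whose leading eigenvector is already axis-aligned, e.g. `np.diag([5.0, 4.0, 0.1, 0.1, 0.1])`, the iterates concentrate on the first coordinate and zero out the rest, illustrating how the reweighted penalty induces exact sparsity that a plain eigensolver would not.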
Additional information
Editors: Süreyya Özöǧür-Akyüz, Devrim Ünay, and Alex Smola.
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
Cite this article
Sriperumbudur, B.K., Torres, D.A. & Lanckriet, G.R.G. A majorization-minimization approach to the sparse generalized eigenvalue problem. Mach Learn 85, 3–39 (2011). https://doi.org/10.1007/s10994-010-5226-3