Learning the Kernel Matrix with Semidefinite Programming

Authors:

Gert R. G. Lanckriet,

Nello Cristianini,

Peter Bartlett,

Laurent El Ghaoui,

Michael I. Jordan

The Journal of Machine Learning Research, Volume 5

Pages 27 - 72

Published: 01 December 2004

Abstract

Kernel-based learning algorithms work by embedding the data into a Euclidean space, and then searching for linear relations among the embedded data points. The embedding is performed implicitly, by specifying the inner products between each pair of points in the embedding space. This information is contained in the so-called kernel matrix, a symmetric and positive semidefinite matrix that encodes the relative positions of all points. Specifying this matrix amounts to specifying the geometry of the embedding space and inducing a notion of similarity in the input space, which are classical model selection problems in machine learning. In this paper we show how the kernel matrix can be learned from data via semidefinite programming (SDP) techniques. When applied to a kernel matrix associated with both training and test data, this gives a powerful transductive algorithm: using the labeled part of the data, one can also learn an embedding for the unlabeled part. The similarity between test points is inferred from training points and their labels. Importantly, these learning problems are convex, so we obtain a method for learning both the model class and the function without local minima. Furthermore, this approach leads directly to a convex method for learning the 2-norm soft margin parameter in support vector machines, solving an important open problem.
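To make the objects in the abstract concrete, the following is a minimal Python sketch, not the authors' SDP formulation. It builds kernel matrices over training and test points together (the transductive setting), forms a nonnegative combination of two kernels, spot-checks symmetry and positive semidefiniteness, and scores the training block against the labels with kernel-target alignment. The toy data, the choice of a linear and a Gaussian kernel, the `gamma` value, the hand-picked weights, and all function names are illustrative assumptions; in the paper the kernel matrix is learned by solving a semidefinite program rather than fixed by hand.

```python
import math
import random


def linear_kernel(x, z):
    """Inner product in the input space."""
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian kernel; gamma is an arbitrary illustrative width."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def gram_matrix(points, kernel):
    """Kernel matrix: entry (i, j) is the embedded inner product of points i and j."""
    return [[kernel(x, z) for z in points] for x in points]


def combine(kernels, weights):
    """Nonnegative weighted sum of kernel matrices (stays positive semidefinite)."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative in this sketch")
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(n)] for i in range(n)]


def is_symmetric(K, tol=1e-12):
    n = len(K)
    return all(abs(K[i][j] - K[j][i]) <= tol for i in range(n) for j in range(n))


def quadratic_form(K, v):
    """v^T K v, which is nonnegative for every v when K is positive semidefinite."""
    n = len(K)
    return sum(v[i] * K[i][j] * v[j] for i in range(n) for j in range(n))


def alignment(K, y):
    """Kernel-target alignment <K, yy^T>_F / (||K||_F * ||yy^T||_F)."""
    n = len(y)
    inner = sum(K[i][j] * y[i] * y[j] for i in range(n) for j in range(n))
    norm_K = math.sqrt(sum(K[i][j] ** 2 for i in range(n) for j in range(n)))
    norm_yy = sum(label ** 2 for label in y)  # Frobenius norm of yy^T
    return inner / (norm_K * norm_yy)


# Hypothetical toy data: four labeled training points, two unlabeled test points.
train = [(0.0, 1.0), (0.2, 0.8), (1.0, 0.0), (0.9, 0.1)]
labels = [1, 1, -1, -1]
test = [(0.1, 0.9), (0.8, 0.2)]
points = train + test  # one kernel matrix covers training and test data

K_lin = gram_matrix(points, linear_kernel)
K_rbf = gram_matrix(points, rbf_kernel)
K_mix = combine([K_lin, K_rbf], [0.3, 0.7])  # weights chosen by hand, not learned

# Spot check only: random probes cannot prove positive semidefiniteness.
rng = random.Random(0)
probes = [[rng.uniform(-1, 1) for _ in points] for _ in range(100)]
spot_check_psd = all(quadratic_form(K_mix, v) >= -1e-9 for v in probes)

# Only the training block can be compared with labels.
n_train = len(train)
K_train = [row[:n_train] for row in K_mix[:n_train]]
score = alignment(K_train, labels)

print(is_symmetric(K_mix), spot_check_psd, round(score, 3))
```

The weights here are fixed so that the sketch needs no solver. Choosing them by optimizing a convex criterion on the training block, subject to the combined matrix remaining positive semidefinite, is the kind of problem the abstract describes solving with SDP. The test rows and columns of the combined matrix then inherit the learned weighting.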

Index Terms

Learning the Kernel Matrix with Semidefinite Programming
1. Computing methodologies
  1. Machine learning
    1. Learning settings
  2. Symbolic and algebraic manipulation
    1. Symbolic and algebraic algorithms
      1. Linear algebra algorithms
2. Mathematics of computing
  1. Mathematical analysis
    1. Numerical analysis
      1. Computations on matrices

Publisher

JMLR.org
