
Learning the Kernel Matrix with Semidefinite Programming

Published: 01 December 2004

Abstract

Kernel-based learning algorithms work by embedding the data into a Euclidean space, and then searching for linear relations among the embedded data points. The embedding is performed implicitly, by specifying the inner products between each pair of points in the embedding space. This information is contained in the so-called kernel matrix, a symmetric and positive semidefinite matrix that encodes the relative positions of all points. Specifying this matrix amounts to specifying the geometry of the embedding space and inducing a notion of similarity in the input space, both classical model selection problems in machine learning. In this paper we show how the kernel matrix can be learned from data via semidefinite programming (SDP) techniques. When applied to a kernel matrix associated with both training and test data, this gives a powerful transductive algorithm: using the labeled part of the data, one can learn an embedding also for the unlabeled part. The similarity between test points is inferred from training points and their labels. Importantly, these learning problems are convex, so we obtain a method for learning both the model class and the function without local minima. Furthermore, this approach leads directly to a convex method for learning the 2-norm soft margin parameter in support vector machines, solving an important open problem.
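
As a rough illustration of the object the abstract describes, the sketch below builds kernel (Gram) matrices for a handful of points and checks that they are symmetric and appear positive semidefinite. It also forms a nonnegative weighted combination of two kernel matrices, which stays positive semidefinite. This is not the paper's SDP formulation: the function names, the choice of linear and RBF kernels, the weights, and the sample points are all assumptions made for the example, and the randomized check is only a necessary test, not a proof of positive semidefiniteness.

```python
import math
import random

def linear_kernel(x, y):
    # Inner product in the input space.
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=0.5):
    # Gaussian kernel; gamma is an illustrative choice.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def gram_matrix(points, kernel):
    # K[i][j] holds the implicit inner product between points i and j.
    return [[kernel(p, q) for q in points] for p in points]

def combine(matrices, weights):
    # Weighted sum of kernel matrices; nonnegative weights preserve PSD-ness.
    n = len(matrices[0])
    return [[sum(w * m[i][j] for w, m in zip(weights, matrices))
             for j in range(n)] for i in range(n)]

def is_symmetric(k, tol=1e-12):
    n = len(k)
    return all(abs(k[i][j] - k[j][i]) <= tol
               for i in range(n) for j in range(n))

def quadratic_form(k, z):
    n = len(k)
    return sum(z[i] * k[i][j] * z[j] for i in range(n) for j in range(n))

def looks_psd(k, trials=200, seed=0, tol=1e-9):
    # Randomized necessary check: z^T K z >= 0 for sampled vectors z.
    rng = random.Random(seed)
    n = len(k)
    for _ in range(trials):
        z = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        if quadratic_form(k, z) < -tol:
            return False
    return True

# Hypothetical sample points in the plane.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
K_lin = gram_matrix(points, linear_kernel)
K_rbf = gram_matrix(points, rbf_kernel)
K_mix = combine([K_lin, K_rbf], [0.3, 0.7])
```

In a transductive setting, the rows and columns of such a matrix would cover both training and test points, so that choosing the matrix also fixes the similarities involving unlabeled points.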


Publisher

JMLR.org

Publication History

Published: 01 December 2004
Published in JMLR Volume 5
