
A case study on meta-generalising: a Gaussian processes approach

Published: 01 March 2012

Abstract

We propose a novel model for meta-generalisation, that is, performing prediction on novel tasks based on information from multiple different but related tasks. The model is based on two coupled Gaussian processes with a structured covariance function: the first performs prediction by learning a constrained covariance function encapsulating the relations between the various training tasks, while the second determines the similarity of new tasks to previously seen tasks. We demonstrate empirically, on several real and synthetic data sets, both the strengths of the approach and its limitations due to the distributional assumptions underpinning it.
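The structured-covariance idea behind the first model can be illustrated with a minimal sketch. This is not the paper's coupled-GP construction; it is the standard multi-task building block from the Gaussian process literature, where a joint covariance over tasks and inputs is formed as the Kronecker product of a task-similarity matrix and an input kernel. The kernel choice, lengthscale, and the task matrix `Kf` below are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    """Squared-exponential kernel between two sets of 1-D inputs."""
    d = X1[:, None] - X2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def multitask_covariance(Kf, Kx):
    """Structured multi-task covariance: Kronecker product of a
    T x T task-similarity matrix Kf and an N x N input kernel Kx."""
    return np.kron(Kf, Kx)

# Two related tasks observed on a shared set of five inputs.
X = np.linspace(0.0, 1.0, 5)
Kx = rbf_kernel(X, X, lengthscale=0.3)
Kf = np.array([[1.0, 0.8],
               [0.8, 1.0]])  # assumed inter-task correlation of 0.8

K = multitask_covariance(Kf, Kx)  # (2*5) x (2*5) joint covariance
# A Cholesky factorisation confirms the joint covariance is valid
# (positive definite), up to a small jitter term.
L = np.linalg.cholesky(K + 1e-8 * np.eye(K.shape[0]))
```

The off-diagonal blocks of `K` scale the input kernel by the inter-task correlation, which is how observations from one task inform predictions on another; learning `Kf` under constraints is what the abstract refers to as a constrained covariance function.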



Published In

The Journal of Machine Learning Research, Volume 13 (March 2012)
2065 pages
ISSN: 1532-4435
EISSN: 1533-7928

Publisher

JMLR.org


Author Tags

  1. Gaussian processes
  2. meta-generalising
  3. mixture of experts
  4. multi-task learning
  5. transfer learning

