DOI: 10.5555/3327345.3327438

Semi-supervised deep kernel learning: regression with unlabeled data by minimizing predictive variance

Published: 03 December 2018

Abstract

Large amounts of labeled data are typically required to train deep learning models. For many real-world problems, however, acquiring additional labeled data can be expensive or even impossible. We present semi-supervised deep kernel learning (SSDKL), a semi-supervised regression model based on minimizing predictive variance in the posterior regularization framework. SSDKL combines the hierarchical representation learning of neural networks with the probabilistic modeling capabilities of Gaussian processes. By leveraging unlabeled data, we show improvements on a diverse set of real-world regression tasks over supervised deep kernel learning and over semi-supervised methods such as virtual adversarial training (VAT) and mean teacher adapted for regression.
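To make the objective concrete, below is a minimal NumPy sketch of an SSDKL-style loss: the Gaussian process negative log marginal likelihood on the labeled data plus a weighted mean of the posterior predictive variances at the unlabeled points. The names (`ssdkl_objective`, `embed`, `alpha`) and the fixed random-projection "network" are illustrative assumptions, not the paper's implementation, which trains the neural network and kernel hyperparameters jointly.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel evaluated on (embedded) inputs.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def ssdkl_objective(Xl, yl, Xu, embed, noise=0.1, alpha=0.1):
    """Hypothetical SSDKL-style loss: GP negative log marginal likelihood
    on labeled data plus alpha times the mean posterior predictive
    variance on unlabeled data (the semi-supervised regularizer)."""
    Zl, Zu = embed(Xl), embed(Xu)  # deep kernel: base kernel on learned features
    Kll = rbf_kernel(Zl, Zl) + noise**2 * np.eye(len(Zl))
    L = np.linalg.cholesky(Kll)
    a = np.linalg.solve(L.T, np.linalg.solve(L, yl))  # K^{-1} y
    # Negative log marginal likelihood of the labeled targets.
    nll = 0.5 * yl @ a + np.log(np.diag(L)).sum() + 0.5 * len(yl) * np.log(2 * np.pi)
    # Posterior predictive variance at each unlabeled point.
    Kul = rbf_kernel(Zu, Zl)
    v = np.linalg.solve(L, Kul.T)
    var = rbf_kernel(Zu, Zu).diagonal() - (v ** 2).sum(axis=0)
    return nll + alpha * var.mean()

# Toy usage: a fixed random projection stands in for the neural network,
# whose weights SSDKL would instead optimize through this same objective.
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 2))
embed = lambda X: np.tanh(X @ W)
Xl, yl = rng.normal(size=(20, 5)), rng.normal(size=20)
Xu = rng.normal(size=(100, 5))
print(ssdkl_objective(Xl, yl, Xu, embed))
```

Minimizing the variance term pushes the learned embedding to place unlabeled points near labeled ones in feature space, which is the intuition behind the regularizer.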


Cited By

  • (2021) Built Year Prediction from Buddha Face with Heterogeneous Labels. In Proceedings of the 3rd Workshop on Structuring and Understanding of Multimedia heritAge Contents, pages 5-12. https://doi.org/10.1145/3475720.3484441. Online publication date: 20-Oct-2021.

Published In

NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems
December 2018, 11021 pages

Publisher

Curran Associates Inc., Red Hook, NY, United States