DOI: 10.5555/3327345.3327438

Semi-supervised deep kernel learning: regression with unlabeled data by minimizing predictive variance

Published: 03 December 2018

Abstract

Large amounts of labeled data are typically required to train deep learning models. For many real-world problems, however, acquiring additional labeled data can be expensive or even impossible. We present semi-supervised deep kernel learning (SSDKL), a semi-supervised regression model based on minimizing predictive variance in the posterior regularization framework. SSDKL combines the hierarchical representation learning of neural networks with the probabilistic modeling capabilities of Gaussian processes. By leveraging unlabeled data, we show improvements on a diverse set of real-world regression tasks over supervised deep kernel learning and over semi-supervised methods such as virtual adversarial training (VAT) and mean teacher adapted for regression.
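To make the objective concrete, below is a minimal NumPy sketch of an SSDKL-style loss: the Gaussian process negative log marginal likelihood on the labeled data plus a weighted mean of the posterior predictive variances at the unlabeled points. The names (`ssdkl_objective`, `embed`, `alpha`) and the fixed random-projection "network" are illustrative assumptions, not the paper's implementation, which trains the neural network and kernel hyperparameters jointly.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel evaluated on (embedded) inputs.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def ssdkl_objective(Xl, yl, Xu, embed, noise=0.1, alpha=0.1):
    """Hypothetical SSDKL-style loss: GP negative log marginal likelihood
    on labeled data plus alpha times the mean posterior predictive
    variance on unlabeled data (the semi-supervised regularizer)."""
    Zl, Zu = embed(Xl), embed(Xu)  # deep kernel: base kernel on learned features
    Kll = rbf_kernel(Zl, Zl) + noise**2 * np.eye(len(Zl))
    L = np.linalg.cholesky(Kll)
    a = np.linalg.solve(L.T, np.linalg.solve(L, yl))  # K^{-1} y
    # Negative log marginal likelihood of the labeled targets.
    nll = 0.5 * yl @ a + np.log(np.diag(L)).sum() + 0.5 * len(yl) * np.log(2 * np.pi)
    # Posterior predictive variance at each unlabeled point.
    Kul = rbf_kernel(Zu, Zl)
    v = np.linalg.solve(L, Kul.T)
    var = rbf_kernel(Zu, Zu).diagonal() - (v ** 2).sum(axis=0)
    return nll + alpha * var.mean()

# Toy usage: a fixed random projection stands in for the neural network,
# whose weights SSDKL would instead optimize through this same objective.
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 2))
embed = lambda X: np.tanh(X @ W)
Xl, yl = rng.normal(size=(20, 5)), rng.normal(size=20)
Xu = rng.normal(size=(100, 5))
print(ssdkl_objective(Xl, yl, Xu, embed))
```

Minimizing the variance term pushes the learned embedding to place unlabeled points near labeled ones in feature space, which is the intuition behind the regularizer.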


Cited By

  • (2021) Built Year Prediction from Buddha Face with Heterogeneous Labels. In Proceedings of the 3rd Workshop on Structuring and Understanding of Multimedia heritAge Contents, pages 5-12. https://doi.org/10.1145/3475720.3484441. Online publication date: 20-Oct-2021.

Published In

NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems
December 2018, 11021 pages

Publisher

Curran Associates Inc., Red Hook, NY, United States