DOI: 10.1145/3577193.3593737

Parallel Software for Million-scale Exact Kernel Regression


Published: 21 June 2023
Abstract

    We present the design and implementation of kernel principal component regression software that handles training datasets with a million or more observations. Kernel regressions are nonlinear, interpretable models with wide downstream applications, and they have been shown to have a close connection to deep learning. Nevertheless, exact regression of large-scale kernel models with currently available software has been notoriously difficult: it is both compute- and memory-intensive, and it requires extensive tuning of hyperparameters.
    While distributed computing and iterative methods have long been mainstays of large-scale software in computational science, they have not been widely adopted in kernel learning. Our software leverages existing high performance computing (HPC) techniques and develops new ones that address cross-cutting constraints between HPC and learning algorithms. It integrates three major components: (a) a state-of-the-art parallel iterative eigenvalue solver; (b) a block matrix-vector multiplication routine that employs both multi-threading and distributed-memory parallelism and can be performed on the fly under limited memory; and (c) a software pipeline of Python front-ends that control the HPC backbone and the hyperparameter optimization through a boosting optimizer. We perform feasibility studies by running on the entire ImageNet dataset and a large asset-pricing dataset.


    Published In

    ICS '23: Proceedings of the 37th ACM International Conference on Supercomputing
    June 2023, 505 pages
    ISBN: 9798400700569
    DOI: 10.1145/3577193
    This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

    Publisher: Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. kernel principal component regression
    2. SVD
    3. parallel algorithm
    4. classification
    5. machine learning
    6. software tools
    7. boosting

    Qualifiers

    • Research-article

    Conference: ICS '23

    Acceptance Rates: Overall acceptance rate 629 of 2,180 submissions, 29%

