DOI: 10.1145/3577193.3593737

Parallel Software for Million-scale Exact Kernel Regression


Published: 21 June 2023
Abstract

    We present the design and implementation of kernel principal component regression software that handles training datasets with a million or more observations. Kernel regressions are nonlinear, interpretable models with wide downstream applications, and they have been shown to have a close connection to deep learning. Nevertheless, exact regression of large-scale kernel models with currently available software has been notoriously difficult: it is both compute- and memory-intensive, and it requires extensive tuning of hyperparameters.
    While distributed computing and iterative methods have long been mainstays of large-scale software in computational science, they have not been widely adopted in kernel learning. Our software leverages existing high performance computing (HPC) techniques and develops new ones that address cross-cutting constraints between HPC and learning algorithms. It integrates three major components: (a) a state-of-the-art parallel iterative eigenvalue solver; (b) a block matrix-vector multiplication routine that employs both multi-threading and distributed-memory parallelism and can be performed on the fly under limited memory; and (c) a software pipeline of Python front-ends that control the HPC backbone and the hyperparameter optimization through a boosting optimizer. We perform feasibility studies by running on the entire ImageNet dataset and a large asset-pricing dataset.


    Published In

    ICS '23: Proceedings of the 37th ACM International Conference on Supercomputing
    June 2023, 505 pages
    ISBN: 9798400700569
    DOI: 10.1145/3577193
    This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

    Publisher: Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. kernel principal component regression
    2. SVD
    3. parallel algorithm
    4. classification
    5. machine learning
    6. software tools
    7. boosting

    Qualifiers

    • Research-article

    Conference: ICS '23

    Acceptance Rates: Overall acceptance rate 629 of 2,180 submissions, 29%

