pylspack: Parallel Algorithms and Data Structures for Sketching, Column Subset Selection, Regression, and Leverage Scores

Published: 19 December 2022

Abstract

We present parallel algorithms and data structures for three fundamental operations in Numerical Linear Algebra: (i) Gaussian and CountSketch random projections and their combination, (ii) computation of the Gram matrix, and (iii) computation of the squared row norms of the product of two matrices, with a special focus on “tall-and-skinny” matrices, which arise in many applications. We provide a detailed analysis of the ubiquitous CountSketch transform and its combination with Gaussian random projections, accounting for memory requirements, computational complexity, and workload balancing. We also demonstrate how these results can be applied to column subset selection, least squares regression, and leverage scores computation. These tools have been implemented in pylspack, a publicly available Python package whose core is written in C++ and parallelized with OpenMP, and which is compatible with the standard matrix data structures of SciPy and NumPy. Extensive numerical experiments indicate that the proposed algorithms scale well and significantly outperform existing libraries for tall-and-skinny matrices.
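
As a concrete illustration of the CountSketch transform discussed above, the following is a minimal NumPy/SciPy sketch in plain Python. It is not the pylspack implementation or its API; the helper name countsketch and the chosen dimensions are assumptions made purely for illustration. The sketching matrix is built explicitly as a sparse CSR matrix with one random +/-1 entry per column, so applying it costs time proportional to the number of nonzeros of the input.

    import numpy as np
    from scipy import sparse

    def countsketch(A, r, seed=None):
        """Return S @ A, where S is an r x n CountSketch matrix.

        Each column of S holds a single +/-1 entry in a uniformly random
        row, so S @ A can be formed in O(nnz(A)) time.
        """
        rng = np.random.default_rng(seed)
        n = A.shape[0]
        rows = rng.integers(0, r, size=n)        # target row for each input row
        signs = rng.choice((-1.0, 1.0), size=n)  # random sign for each input row
        S = sparse.csr_matrix((signs, (rows, np.arange(n))), shape=(r, n))
        return S @ A

    # Tall-and-skinny example: n >> d, with sketch size d << r << n.
    rng = np.random.default_rng(0)
    n, d, r = 20_000, 50, 4_000
    A = rng.standard_normal((n, d))
    SA = countsketch(A, r, seed=1)

    # For suitably large r, S behaves as an oblivious subspace embedding:
    # the singular values of SA approximate those of A, which is the
    # property that least squares, leverage score estimation, and column
    # subset selection build on.
    print(np.linalg.svd(A, compute_uv=False)[:3])
    print(np.linalg.svd(SA, compute_uv=False)[:3])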


Published In

ACM Transactions on Mathematical Software, Volume 48, Issue 4
December 2022
339 pages
ISSN: 0098-3500
EISSN: 1557-7295
DOI: 10.1145/3572845

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 December 2022
Online AM: 08 August 2022
Accepted: 21 July 2022
Revised: 28 February 2022
Received: 05 July 2021
Published in TOMS Volume 48, Issue 4

Author Tags

  1. Parallel algorithms
  2. sparse data structures
  3. sketching
  4. column subset selection
  5. regression
  6. preconditioning
  7. statistical leverage scores

Qualifiers

  • Research-article
  • Refereed

