pylspack: Parallel Algorithms and Data Structures for Sketching, Column Subset Selection, Regression, and Leverage Scores

Published: 19 December 2022

Abstract

We present parallel algorithms and data structures for three fundamental operations in Numerical Linear Algebra: (i) Gaussian and CountSketch random projections and their combination, (ii) computation of the Gram matrix, and (iii) computation of the squared row norms of the product of two matrices, with a special focus on “tall-and-skinny” matrices, which arise in many applications. We provide a detailed analysis of the ubiquitous CountSketch transform and its combination with Gaussian random projections, accounting for memory requirements, computational complexity, and workload balancing. We also demonstrate how these results can be applied to column subset selection, least squares regression, and leverage scores computation. These tools have been implemented in pylspack, a publicly available Python package whose core is written in C++ and parallelized with OpenMP, and which is compatible with the standard matrix data structures of SciPy and NumPy. Extensive numerical experiments indicate that the proposed algorithms scale well and significantly outperform existing libraries for tall-and-skinny matrices.
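
As a concrete illustration of the CountSketch transform discussed above, the following is a minimal NumPy/SciPy sketch in plain Python. It is not the pylspack implementation or its API; the helper name countsketch and the chosen dimensions are assumptions made purely for illustration. The sketching matrix is built explicitly as a sparse CSR matrix with one random +/-1 entry per column, so applying it costs time proportional to the number of nonzeros of the input.

    import numpy as np
    from scipy import sparse

    def countsketch(A, r, seed=None):
        """Return S @ A, where S is an r x n CountSketch matrix.

        Each column of S holds a single +/-1 entry in a uniformly random
        row, so S @ A can be formed in O(nnz(A)) time.
        """
        rng = np.random.default_rng(seed)
        n = A.shape[0]
        rows = rng.integers(0, r, size=n)        # target row for each input row
        signs = rng.choice((-1.0, 1.0), size=n)  # random sign for each input row
        S = sparse.csr_matrix((signs, (rows, np.arange(n))), shape=(r, n))
        return S @ A

    # Tall-and-skinny example: n >> d, with sketch size d << r << n.
    rng = np.random.default_rng(0)
    n, d, r = 20_000, 50, 4_000
    A = rng.standard_normal((n, d))
    SA = countsketch(A, r, seed=1)

    # For suitably large r, S behaves as an oblivious subspace embedding:
    # the singular values of SA approximate those of A, which is the
    # property that least squares, leverage score estimation, and column
    # subset selection build on.
    print(np.linalg.svd(A, compute_uv=False)[:3])
    print(np.linalg.svd(SA, compute_uv=False)[:3])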


Published In

ACM Transactions on Mathematical Software, Volume 48, Issue 4
December 2022
339 pages
ISSN: 0098-3500
EISSN: 1557-7295
DOI: 10.1145/3572845

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 December 2022
Online AM: 08 August 2022
Accepted: 21 July 2022
Revised: 28 February 2022
Received: 05 July 2021
Published in TOMS Volume 48, Issue 4

Author Tags

  1. Parallel algorithms
  2. sparse data structures
  3. sketching
  4. column subset selection
  5. regression
  6. preconditioning
  7. statistical leverage scores

Qualifiers

  • Research-article
  • Refereed

