
Register Tiling for Unstructured Sparsity in Neural Network Inference

Published: 06 June 2023
    Abstract

    Unstructured sparse neural networks are an important class of machine learning (ML) models, as they compact model size and reduce floating point operations. The execution time of these models is frequently dominated by the sparse matrix multiplication (SpMM) kernel, C = A × B, where A is a sparse matrix and B and C are dense matrices. The unstructured sparsity pattern of matrices in pruned machine learning models, along with their sparsity ratio, has rendered useless the large class of libraries and systems that optimize sparse matrix multiplications. Reusing registers is particularly difficult because accesses to memory locations must be known statically. This paper proposes Sparse Register Tiling, a new technique composed of an unroll-and-sparse-jam transformation followed by data compression that is specifically tailored to sparsity patterns in ML matrices. Unroll-and-sparse-jam uses sparsity information to jam the code while improving register reuse. Sparse Register Tiling is evaluated across 2396 weight matrices from transformer and convolutional models with a sparsity range of 60–95% and provides an average speedup of 1.72× and 2.65× over MKL SpMM and dense matrix multiplication, respectively, on a multicore CPU. It also provides an end-to-end speedup of 2.12× for MobileNetV1 with 70% sparsity on an ARM processor commonly used in edge devices.
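    For readers new to the kernel named in the abstract, the sketch below shows a minimal baseline CSR-based SpMM, C = A × B, with A sparse and B and C dense, row-major matrices. It is an illustrative sketch only, not the paper's generated code, and all names in it are hypothetical. The inner-loop comment points at the register-reuse problem the abstract describes: the column indices of A are known only at run time, so a conventional compiler cannot statically assign the reused elements of B and C to registers.

        #include <stddef.h>

        /* Baseline SpMM: C = A * B, with A an m x k sparse matrix in CSR form,
         * B a dense k x n matrix, and C a dense m x n matrix (both row-major). */
        void spmm_csr(size_t m, size_t n,
                      const size_t *row_ptr,  /* length m+1: extent of each row of A */
                      const size_t *col_idx,  /* column index of each nonzero of A   */
                      const float  *vals,     /* value of each nonzero of A          */
                      const float  *B,        /* k x n, row-major                    */
                      float        *C)        /* m x n, row-major                    */
        {
            for (size_t i = 0; i < m; ++i) {
                for (size_t j = 0; j < n; ++j)
                    C[i * n + j] = 0.0f;
                for (size_t p = row_ptr[i]; p < row_ptr[i + 1]; ++p) {
                    const float a = vals[p];
                    const float *b_row = &B[col_idx[p] * n];
                    /* col_idx[p] is a run-time value, so the compiler cannot prove
                     * which rows of B are reused; C[i, :] and the touched rows of B
                     * keep round-tripping through memory instead of staying in
                     * registers. */
                    for (size_t j = 0; j < n; ++j)
                        C[i * n + j] += a * b_row[j];
                }
            }
        }

    As a rough picture of the idea behind unroll-and-sparse-jam (a simplification under assumed inputs, not the output of the paper's code generator): once an inspector knows that two adjacent rows of A have nonzeros only in columns k0, k1, and k2, the multiply for that two-row tile can be emitted with those indices fixed, so the accumulators for both output rows and the shared row of B stay in registers.

        /* Hypothetical specialized two-row tile, usable once the nonzero columns of
         * rows i and i+1 of A are known: row i has nonzeros in {k0, k2} and row i+1
         * in {k1, k2}. All accesses are now statically known, so c0, c1, and the
         * shared value b2 can be kept in registers across the accumulation. */
        static void tile_2row_example(size_t n, size_t i,
                                      size_t k0, size_t k1, size_t k2,
                                      float a_i_k0, float a_i_k2,     /* nonzeros of row i   */
                                      float a_i1_k1, float a_i1_k2,   /* nonzeros of row i+1 */
                                      const float *B, float *C)
        {
            for (size_t j = 0; j < n; ++j) {
                const float b2 = B[k2 * n + j];            /* reused by both output rows */
                float c0 = a_i_k0  * B[k0 * n + j] + a_i_k2  * b2;
                float c1 = a_i1_k1 * B[k1 * n + j] + a_i1_k2 * b2;
                C[i * n + j]       += c0;
                C[(i + 1) * n + j] += c1;
            }
        }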


    Cited By

    • (2023) Runtime Composition of Iterations for Fusing Loop-carried Sparse Dependence. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1–15. https://doi.org/10.1145/3581784.3607097. Online publication date: 12-Nov-2023.


    Published In

    Proceedings of the ACM on Programming Languages, Volume 7, Issue PLDI
    June 2023, 2020 pages
    EISSN: 2475-1421
    DOI: 10.1145/3554310
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 06 June 2023
    Published in PACMPL Volume 7, Issue PLDI


    Author Tags

    1. Loop Tiling
    2. Pruned Neural Networks
    3. Sparse Matrix

    Qualifiers

    • Research-article

    Funding Sources

    • NSERC
    • NSERC Discovery

