
Register Tiling for Unstructured Sparsity in Neural Network Inference

Published: 06 June 2023
    Abstract

    Unstructured sparse neural networks are an important class of machine learning (ML) models, as they compact model size and reduce floating point operations. The execution time of these models is frequently dominated by the sparse matrix multiplication (SpMM) kernel, C = A × B, where A is a sparse matrix and B and C are dense matrices. The unstructured sparsity pattern of matrices in pruned machine learning models, along with their sparsity ratio, has rendered useless the large class of libraries and systems that optimize sparse matrix multiplications. Reusing registers is particularly difficult because accesses to memory locations must be known statically. This paper proposes Sparse Register Tiling, a new technique composed of an unroll-and-sparse-jam transformation followed by data compression that is specifically tailored to sparsity patterns in ML matrices. Unroll-and-sparse-jam uses sparsity information to jam the code while improving register reuse. Sparse Register Tiling is evaluated across 2396 weight matrices from transformer and convolutional models with a sparsity range of 60–95% and provides an average speedup of 1.72× and 2.65× over MKL SpMM and dense matrix multiplication, respectively, on a multicore CPU. It also provides an end-to-end speedup of 2.12× for MobileNetV1 with 70% sparsity on an ARM processor commonly used in edge devices.
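    For readers new to the kernel named in the abstract, the sketch below shows a minimal baseline CSR-based SpMM, C = A × B, with A sparse and B and C dense, row-major matrices. It is an illustrative sketch only, not the paper's generated code, and all names in it are hypothetical. The inner-loop comment points at the register-reuse problem the abstract describes: the column indices of A are known only at run time, so a conventional compiler cannot statically assign the reused elements of B and C to registers.

        #include <stddef.h>

        /* Baseline SpMM: C = A * B, with A an m x k sparse matrix in CSR form,
         * B a dense k x n matrix, and C a dense m x n matrix (both row-major). */
        void spmm_csr(size_t m, size_t n,
                      const size_t *row_ptr,  /* length m+1: extent of each row of A */
                      const size_t *col_idx,  /* column index of each nonzero of A   */
                      const float  *vals,     /* value of each nonzero of A          */
                      const float  *B,        /* k x n, row-major                    */
                      float        *C)        /* m x n, row-major                    */
        {
            for (size_t i = 0; i < m; ++i) {
                for (size_t j = 0; j < n; ++j)
                    C[i * n + j] = 0.0f;
                for (size_t p = row_ptr[i]; p < row_ptr[i + 1]; ++p) {
                    const float a = vals[p];
                    const float *b_row = &B[col_idx[p] * n];
                    /* col_idx[p] is a run-time value, so the compiler cannot prove
                     * which rows of B are reused; C[i, :] and the touched rows of B
                     * keep round-tripping through memory instead of staying in
                     * registers. */
                    for (size_t j = 0; j < n; ++j)
                        C[i * n + j] += a * b_row[j];
                }
            }
        }

    As a rough picture of the idea behind unroll-and-sparse-jam (a simplification under assumed inputs, not the output of the paper's code generator): once an inspector knows that two adjacent rows of A have nonzeros only in columns k0, k1, and k2, the multiply for that two-row tile can be emitted with those indices fixed, so the accumulators for both output rows and the shared row of B stay in registers.

        /* Hypothetical specialized two-row tile, usable once the nonzero columns of
         * rows i and i+1 of A are known: row i has nonzeros in {k0, k2} and row i+1
         * in {k1, k2}. All accesses are now statically known, so c0, c1, and the
         * shared value b2 can be kept in registers across the accumulation. */
        static void tile_2row_example(size_t n, size_t i,
                                      size_t k0, size_t k1, size_t k2,
                                      float a_i_k0, float a_i_k2,     /* nonzeros of row i   */
                                      float a_i1_k1, float a_i1_k2,   /* nonzeros of row i+1 */
                                      const float *B, float *C)
        {
            for (size_t j = 0; j < n; ++j) {
                const float b2 = B[k2 * n + j];            /* reused by both output rows */
                float c0 = a_i_k0  * B[k0 * n + j] + a_i_k2  * b2;
                float c1 = a_i1_k1 * B[k1 * n + j] + a_i1_k2 * b2;
                C[i * n + j]       += c0;
                C[(i + 1) * n + j] += c1;
            }
        }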


    Cited By

    • (2023) Runtime Composition of Iterations for Fusing Loop-carried Sparse Dependence. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1–15. https://doi.org/10.1145/3581784.3607097. Online publication date: 12-Nov-2023.


    Published In

    Proceedings of the ACM on Programming Languages, Volume 7, Issue PLDI
    June 2023, 2020 pages
    EISSN: 2475-1421
    DOI: 10.1145/3554310
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 06 June 2023
    Published in PACMPL Volume 7, Issue PLDI


    Author Tags

    1. Loop Tiling
    2. Pruned Neural Networks
    3. Sparse Matrix

    Qualifiers

    • Research-article

    Funding Sources

    • NSERC
    • NSERC Discovery

