DOI: 10.5555/3433701.3433722
Research article

Accelerating sparse DNN models without hardware-support via tile-wise sparsity

Published: 09 November 2020

Abstract

Network pruning can reduce the high computation cost of deep neural network (DNN) models. However, to maintain accuracy, sparse models often carry randomly distributed weights, leading to irregular computations. Consequently, sparse models cannot achieve meaningful speedups on commodity hardware (e.g., GPUs) built for dense matrix computations. As such, prior works usually modify or design completely new sparsity-optimized architectures to exploit sparsity. We propose an algorithm-software co-designed pruning method that achieves latency speedups on existing dense architectures. Our work builds on the insight that matrix multiplication generally breaks the large matrix into multiple smaller tiles for parallel execution. We propose a tiling-friendly "tile-wise" sparsity pattern, which maintains a regular pattern at the tile level for efficient execution but allows irregular, arbitrary pruning at the global scale to preserve high accuracy. We implement and evaluate the sparsity pattern on GPU tensor cores, achieving a 1.95× speedup over the dense model.
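To make the tile-level idea concrete, below is a minimal NumPy sketch of one plausible tile-wise pruning scheme: the weight matrix is split into column tiles, and within each tile whole columns are kept or dropped by magnitude, so every tile stays regular while the pattern across tiles remains irregular. The function name, tile size, keep ratio, and magnitude criterion are illustrative assumptions, not the authors' exact algorithm.

import numpy as np

def tile_wise_prune(W, tile_size=128, keep_ratio=0.5):
    # Split the weight matrix into column tiles and, within each tile,
    # keep only the highest-magnitude columns. Each tile ends up with a
    # regular (column-pruned) pattern, while different tiles may keep
    # different columns, so the global pattern stays irregular.
    mask = np.zeros_like(W, dtype=bool)
    for start in range(0, W.shape[1], tile_size):
        tile = W[:, start:start + tile_size]
        col_scores = np.abs(tile).sum(axis=0)            # per-column L1 magnitude
        n_keep = max(1, int(keep_ratio * tile.shape[1]))
        keep = np.argsort(col_scores)[-n_keep:]          # columns retained in this tile
        mask[:, start + keep] = True
    return W * mask, mask

# Usage: prune a 512x512 weight matrix, keeping half of each tile's columns.
W = np.random.randn(512, 512).astype(np.float32)
W_pruned, mask = tile_wise_prune(W, tile_size=128, keep_ratio=0.5)

Because every tile is dense over its kept columns, each tile can still be dispatched to a regular dense GEMM kernel (e.g., on tensor cores), which is what lets this pattern run fast without sparsity-specific hardware.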




Published In

SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2020, 1454 pages
ISBN: 9781728199986
Publisher: IEEE Press
In-Cooperation: IEEE CS

Conference

SC '20
Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%


    Cited By

• (2025) BAFT: bubble-aware fault-tolerant framework for distributed DNN training with hybrid parallelism. Frontiers of Computer Science, 19(1). https://doi.org/10.1007/s11704-023-3401-5
• (2024) Jigsaw: Accelerating SpMM with Vector Sparsity on Sparse Tensor Core. Proceedings of the 53rd International Conference on Parallel Processing, pp. 1124-1134. https://doi.org/10.1145/3673038.3673108
• (2024) Tetris: Accelerating Sparse Convolution by Exploiting Memory Reuse on GPU. Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, pp. 229-242. https://doi.org/10.1145/3627535.3638471
• (2024) Fractal: Joint Multi-Level Sparse Pattern Tuning of Accuracy and Performance for DNN Pruning. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, pp. 416-430. https://doi.org/10.1145/3620666.3651351
• (2024) GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, pp. 450-466. https://doi.org/10.1145/3620665.3640423
• (2024) Amanda: Unified Instrumentation Framework for Deep Neural Networks. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, pp. 1-18. https://doi.org/10.1145/3617232.3624864
• (2023) Gradient-free structured pruning with unlabeled data. Proceedings of the 40th International Conference on Machine Learning, pp. 26326-26341. https://doi.org/10.5555/3618408.3619505
• (2023) Register Tiling for Unstructured Sparsity in Neural Network Inference. Proceedings of the ACM on Programming Languages, 7(PLDI), pp. 1995-2020. https://doi.org/10.1145/3591302
• (2023) Energy-Latency Attacks to On-Device Neural Networks via Sponge Poisoning. Proceedings of the 2023 Secure and Trustworthy Deep Learning Systems Workshop, pp. 1-11. https://doi.org/10.1145/3591197.3591307
• (2023) DistSim. Proceedings of the 20th ACM International Conference on Computing Frontiers, pp. 112-122. https://doi.org/10.1145/3587135.3592200
