DOI: 10.1145/3373376.3378514
Research article

Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators

Published: 13 March 2020
    Abstract

    We show that DNN accelerator micro-architectures and their program mappings represent specific choices of loop order and hardware parallelism for computing the seven nested loops of DNNs, which enables us to create a formal taxonomy of all existing dense DNN accelerators. Surprisingly, the loop transformations needed to create these hardware variants can be precisely and concisely represented by Halide's scheduling language. By modifying the Halide compiler to generate hardware, we create a system that can fairly compare these prior accelerators. As long as proper loop blocking schemes are used, and the hardware can support mapping replicated loops, many different hardware dataflows yield similar energy efficiency with good performance. This is because the loop blocking can ensure that most data references stay on-chip with good locality and the processing units have high resource utilization. How resources are allocated, especially in the memory system, has a large impact on energy and performance. By optimizing hardware resource allocation while keeping throughput constant, we achieve up to 4.2X energy improvement for Convolutional Neural Networks (CNNs), 1.6X and 1.8X improvement for Long Short-Term Memories (LSTMs) and multi-layer perceptrons (MLPs), respectively.
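
    To make the loop-nest framing concrete, here is a minimal sketch (in C++ with the Halide library) of a single convolutional layer and a schedule that picks one particular loop blocking and spatial replication. It is only an illustration of the kind of schedule the paper analyzes: the placeholder inputs, variable names, and blocking factors (14x14 output tiles, 16-wide output-channel unrolling) are assumptions, not the paper's actual mappings or generated hardware.

        #include "Halide.h"

        using namespace Halide;

        // Minimal sketch: one convolutional layer written as a Halide pipeline,
        // plus a schedule that fixes a loop order, blocking, and an unrolled
        // (spatially replicated) loop. Names and factors are illustrative only.
        int main() {
            Var x("x"), y("y"), c("c"), k("k"), n("n");

            // Placeholder activations and weights so the pipeline is self-contained.
            Func in("in"), w("w");
            in(x, y, c, n) = cast<float>(x + y + c + n);
            w(x, y, c, k)  = cast<float>(x - y + c + k);

            // The "seven nested loops" of a conv layer: output x, y, output channel k,
            // batch n, plus a reduction over filter fx, fy and input channel c.
            RDom r(0, 3, 0, 3, 0, 64, "r");          // r.x = fx, r.y = fy, r.z = c
            Func conv("conv");
            conv(x, y, k, n) = 0.f;
            conv(x, y, k, n) += w(r.x, r.y, r.z, k) * in(x + r.x, y + r.y, r.z, n);

            // The schedule encodes the dataflow choice: splits pick the on-chip
            // blocking, reorder picks the loop order, and unroll marks a loop that
            // would be replicated across parallel processing units.
            Var xo("xo"), xi("xi"), yo("yo"), yi("yi"), ko("ko"), ki("ki");
            conv.update()
                .split(x, xo, xi, 14)
                .split(y, yo, yi, 14)
                .split(k, ko, ki, 16)
                .reorder(ki, xi, yi, r.x, r.y, r.z, xo, yo, ko, n)
                .unroll(ki);

            // Print the resulting loop nest to inspect the blocking that was chosen.
            conv.print_loop_nest();
            return 0;
        }

    Running this prints the transformed loop nest, making the chosen loop order, blocking, and replication explicit; in the paper's flow, the same kinds of scheduling directives are instead consumed by the modified Halide compiler to generate and evaluate hardware.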




    Published In

    ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems
    March 2020
    1412 pages
    ISBN: 9781450371025
    DOI: 10.1145/3373376

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 13 March 2020


    Author Tags

    1. dataflow
    2. domain specific language
    3. neural networks

    Qualifiers

    • Research-article

    Funding Sources

    • Stanford SystemX
    • DSSoC DARPA grant
    • Stanford AHA affiliates program

    Conference

    ASPLOS '20

    Acceptance Rates

    Overall Acceptance Rate 535 of 2,713 submissions, 20%

    Article Metrics

    • Downloads (last 12 months): 482
    • Downloads (last 6 weeks): 41
    Reflects downloads up to 27 Jul 2024

    Cited By

    • gem5-NVDLA: A Simulation Framework for Compiling, Scheduling and Architecture Evaluation on AI System-on-Chips. ACM Transactions on Design Automation of Electronic Systems (2024). DOI: 10.1145/3661997
    • ML-Fusion: Determining Memory Levels for Data Reuse Between DNN Layers. Proceedings of the Great Lakes Symposium on VLSI 2024, pages 63-68. DOI: 10.1145/3649476.3658698
    • DNNOPT: A Framework for Efficiently Selecting On-chip Memory Loop Optimizations of DNN Accelerators. Proceedings of the 21st ACM International Conference on Computing Frontiers (2024), pages 126-137. DOI: 10.1145/3649153.3649196
    • YFlows: Systematic Dataflow Exploration and Code Generation for Efficient Neural Network Inference using SIMD Architectures on CPUs. Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction (2024), pages 212-226. DOI: 10.1145/3640537.3641566
    • SpecPIM: Accelerating Speculative Inference on PIM-Enabled System via Architecture-Dataflow Co-Exploration. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (2024), pages 950-965. DOI: 10.1145/3620666.3651352
    • TinyForge: A Design Space Exploration to Advance Energy and Silicon Area Trade-offs in tinyML Compute Architectures with Custom Latch Arrays. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (2024), pages 1033-1047. DOI: 10.1145/3620666.3651328
    • Towards Scalability and Performance: Framework for Heterogeneous Cluster Integration in Deep Learning Accelerators. 2024 IEEE 4th International Conference on VLSI Systems, Architecture, Technology and Applications (VLSI SATA), pages 1-6. DOI: 10.1109/VLSISATA61709.2024.10560269
    • Versa-DNN: A Versatile Architecture Enabling High-Performance and Energy-Efficient Multi-DNN Acceleration. IEEE Transactions on Parallel and Distributed Systems, 35(2):349-361 (2024). DOI: 10.1109/TPDS.2023.3340953
    • A High-Performance and Energy-Efficient Photonic Architecture for Multi-DNN Acceleration. IEEE Transactions on Parallel and Distributed Systems, 35(1):46-58 (2024). DOI: 10.1109/TPDS.2023.3327535
    • EWS: An Energy-Efficient CNN Accelerator With Enhanced Weight Stationary Dataflow. IEEE Transactions on Circuits and Systems II: Express Briefs, 71(7):3478-3482 (2024). DOI: 10.1109/TCSII.2024.3359511
