DOI: 10.1145/3617232.3624865
Research Article | Open Access

Cocco: Hardware-Mapping Co-Exploration towards Memory Capacity-Communication Optimization

Published: 17 April 2024

Abstract

Memory is a critical design consideration in current data-intensive DNN accelerators, as it profoundly determines energy consumption, bandwidth requirements, and area cost. As DNN structures grow more complex, larger on-chip memory capacity is required to reduce data-movement overhead, but at the expense of silicon cost. Prior work has proposed memory-oriented optimizations, such as various data-reuse and layer-fusion schemes, but these methods are neither general nor powerful enough to cope with diverse graph structures.
In this paper, we explore the intrinsic connection between network structures and memory behavior to optimize both hardware and mapping. First, we introduce a graph-level execution scheme, together with a corresponding dataflow and memory management method, that enables the execution of arbitrary graph patterns with high data reuse and low hardware overhead. We then propose Cocco, a hardware-mapping co-exploration framework that leverages graph-level features of networks to minimize communication overhead, such as energy consumption and bandwidth requirements, under a smaller memory capacity. We formulate graph-partition scheduling and memory-configuration search as an optimization problem and employ a genetic-algorithm-based method to co-explore large, irregular networks efficiently. Experiments show that Cocco achieves lower external memory access, lower bandwidth requirements, and more stable graph-partition optimization than the greedy and dynamic-programming algorithms used in prior work. Through co-exploration, Cocco also reduces cost by 1.89% to 50.33% compared to other representative methods.
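To make the abstract's co-exploration loop concrete, the sketch below shows a minimal genetic search over paired graph-partition and memory-capacity candidates. Everything here is an illustrative assumption, not Cocco's actual formulation: the network, the tensor sizes, the candidate encoding, and the spill-based cost model are all hypothetical stand-ins for the paper's real dataflow cost model.

```python
import random

random.seed(0)

# Hypothetical workload: a 12-layer network whose intermediate tensors
# (sizes in KB) either stay on chip or spill to external memory.
# Both the encoding and the cost model are illustrative assumptions,
# not Cocco's actual formulation.
NUM_LAYERS = 12
TENSOR_KB = [64 * (i % 5 + 1) for i in range(1, NUM_LAYERS)]
CAPACITIES_KB = [256, 512, 1024, 2048]   # candidate on-chip memory configs
NUM_CUTS = 3                             # subgraphs = NUM_CUTS + 1


def random_candidate():
    """A candidate pairs a graph partition (cut points between layers)
    with one on-chip memory-capacity choice."""
    cuts = tuple(sorted(random.sample(range(1, NUM_LAYERS), k=NUM_CUTS)))
    return (cuts, random.choice(CAPACITIES_KB))


def cost(candidate):
    """Toy objective: tensors crossing a subgraph boundary spill to
    external memory; a larger capacity absorbs part of that traffic but
    adds a silicon-area penalty. Cocco's real objective weighs
    communication energy and bandwidth under a capacity budget."""
    cuts, capacity = candidate
    spilled = sum(max(0, TENSOR_KB[c - 1] - capacity // 8) for c in cuts)
    return spilled + 0.05 * capacity


def mutate(candidate):
    cuts, capacity = candidate
    if random.random() < 0.5:  # re-draw the partition
        cuts = tuple(sorted(random.sample(range(1, NUM_LAYERS), k=NUM_CUTS)))
    else:                      # re-draw the memory configuration
        capacity = random.choice(CAPACITIES_KB)
    return (cuts, capacity)


def crossover(a, b):
    """Combine one parent's partition with the other's memory config."""
    return (a[0], b[1])


def evolve(pop_size=32, generations=50):
    population = [random_candidate() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        survivors = population[: pop_size // 2]  # elitist selection
        children = [mutate(crossover(random.choice(survivors),
                                     random.choice(survivors)))
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return min(population, key=cost)


if __name__ == "__main__":
    cuts, capacity = evolve()
    print(f"best partition cuts={cuts}, capacity={capacity} KB")
```

In the framework itself, the fitness evaluation would invoke the graph-level execution scheme's communication and bandwidth model rather than this toy spill estimate, and the partition encoding would handle arbitrary (non-contiguous) graph patterns.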


Cited By

  • TMAC: Training-Targeted Mapping and Architecture Co-Exploration for Wafer-Scale Chips. Integrated Circuits and Systems 1, 4 (Sep. 2024), 178-195. https://doi.org/10.23919/ICS.2024.3515003


Published In

ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1
April 2024, 494 pages
ISBN: 9798400703720
DOI: 10.1145/3617232
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. design space exploration
  2. memory
  3. graph analysis
  4. subgraph
  5. genetic algorithm
  6. deep learning accelerator

Conference

ASPLOS '24

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%
