Application-level Validation of Accelerator Designs Using a Formal Software/Hardware Interface

Published: 14 February 2024

Abstract

Ideally, accelerator development should be as easy as software development. Several recent design languages/tools are working toward this goal, but actually testing early designs on real applications end-to-end remains prohibitively difficult due to the costs of building specialized compiler and simulator support. We propose a new first-in-class, mostly automated methodology termed “3LA” to enable end-to-end testing of prototype accelerator designs on unmodified source applications. A key contribution of 3LA is the use of a formal software/hardware interface that specifies an accelerator’s operations and their semantics. Specifically, we leverage the Instruction-level Abstraction (ILA) formal specification for accelerators that has been successfully used thus far for accelerator implementation verification. We show how the ILA for accelerators serves as a software/hardware interface, similar to the Instruction Set Architecture for processors, that can be used for automated development of compilers and instruction-level simulators. Another key contribution of this work is to show how ILA-based accelerator semantics enables extending recent work on equality saturation to auto-generate basic compiler support for prototype accelerators in a technique we term “flexible matching.” By combining flexible matching with simulators auto-generated from ILA specifications, our approach enables end-to-end evaluation with modest engineering effort. We detail several case studies of 3LA, which uncovered a previously unknown flaw in a recently published accelerator and facilitated its fix.
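To make the role of an instruction-level software/hardware interface concrete, the sketch below models an ILA-style specification in plain Python: each accelerator instruction is a pair of a decode predicate and a state-update function over named architectural state, and an instruction-level simulator follows directly from the specification. This is a minimal illustration only, not the paper's ILAng-based implementation; the toy vector-add accelerator, its buffer names, and the ILASpec/Instruction classes are hypothetical.

# Hypothetical sketch: a toy ILA-style specification of a tiny vector-add
# accelerator. Each instruction is (decode predicate, state-update function)
# over named architectural state; the simulator is derived from the spec.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, object]  # architectural state: named registers/buffers

@dataclass
class Instruction:
    name: str
    decode: Callable[[State, dict], bool]   # does this command trigger the instruction?
    update: Callable[[State, dict], None]   # semantics: how architectural state changes

@dataclass
class ILASpec:
    state_init: Callable[[], State]
    instructions: List[Instruction] = field(default_factory=list)

    def simulate(self, commands: List[dict]) -> State:
        """Instruction-level simulator obtained directly from the spec."""
        state = self.state_init()
        for cmd in commands:
            for instr in self.instructions:
                if instr.decode(state, cmd):
                    instr.update(state, cmd)
                    break
        return state

# Toy accelerator: two input buffers, one output buffer, and a VADD instruction.
toy_spec = ILASpec(
    state_init=lambda: {"buf_a": [0] * 4, "buf_b": [0] * 4, "buf_out": [0] * 4},
    instructions=[
        Instruction(
            name="WRITE_BUF",
            decode=lambda s, c: c["op"] == "write",
            update=lambda s, c: s.__setitem__(c["buf"], list(c["data"])),
        ),
        Instruction(
            name="VADD",
            decode=lambda s, c: c["op"] == "vadd",
            update=lambda s, c: s.__setitem__(
                "buf_out", [a + b for a, b in zip(s["buf_a"], s["buf_b"])]
            ),
        ),
    ],
)

if __name__ == "__main__":
    final = toy_spec.simulate([
        {"op": "write", "buf": "buf_a", "data": [1, 2, 3, 4]},
        {"op": "write", "buf": "buf_b", "data": [10, 20, 30, 40]},
        {"op": "vadd"},
    ])
    print(final["buf_out"])  # [11, 22, 33, 44]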

Cited By

  • (2024) An Image-Retrieval Method Based on Cross-Hardware Platform Features. Applied System Innovation 7, 4, Article 64. https://doi.org/10.3390/asi7040064. Online publication date: 23-Jul-2024.

Published In

ACM Transactions on Design Automation of Electronic Systems, Volume 29, Issue 2
March 2024, 438 pages
EISSN: 1557-7309
DOI: 10.1145/3613564
Editor: Jiang Hu

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 14 February 2024
Online AM: 29 December 2023
Accepted: 15 December 2023
Revised: 14 November 2023
Received: 01 April 2023
Published in TODAES Volume 29, Issue 2

Author Tags

  1. Accelerator
  2. domain-specific language
  3. compilation
  4. validation
  5. software/hardware interface

Qualifiers

  • Research-article
