Language Support for Navigating Architecture Design in Closed Form

Published: 25 October 2019 Publication History

Abstract

As computer architecture continues to expand beyond software-agnostic microarchitecture to specialized and heterogeneous logic or even radically different emerging computing models (e.g., quantum cores, DNA storage units), detailed cycle-level simulation is no longer presupposed. Exploring designs under such complex interacting relationships (e.g., performance, energy, thermal, frequency) calls for a more integrative but higher-level approach. We propose Charm, a modeling language supporting closed-form high-level architecture modeling. Charm enables mathematical representations of mutually dependent architectural relationships to be specified, composed, checked, evaluated, reused, and shared. The language is interpreted through a combination of automatic symbolic evaluation, scalable graph transformation, and efficient compiler techniques, generating executable DAGs and optimized analysis procedures. Charm also exploits advances in satisfiability modulo theories (SMT) solvers to automatically search the design space, helping architects explore multiple design knobs simultaneously (e.g., different CNN tiling configurations). Through two case studies, we demonstrate that Charm allows one to define high-level architecture models in a clean and concise format, maximize reusability and shareability, capture unreasonable assumptions, and significantly ease design space exploration at a high level.
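
To make the idea of a closed-form architecture model concrete, the following is a minimal sketch in plain Python, not Charm syntax. It assumes Hill and Marty's symmetric multicore extension of Amdahl's law, with their assumption that a core built from r base-core equivalents delivers performance sqrt(r). The function names and the exhaustive search are illustrative assumptions only; the abstract describes an SMT-based search, which this sketch does not reproduce.

```python
import math


def symmetric_speedup(f, n, r):
    """Closed-form speedup of a symmetric multicore (Hill-Marty model).

    f: fraction of the workload that is parallelizable (0.0 to 1.0)
    n: total chip budget in base-core equivalents (BCEs)
    r: BCEs spent on each core, so the chip has n / r cores
    Assumes per-core performance perf(r) = sqrt(r).
    """
    perf = math.sqrt(r)
    serial_time = (1.0 - f) / perf        # one core runs the serial part
    parallel_time = f * r / (perf * n)    # n / r cores share the parallel part
    return 1.0 / (serial_time + parallel_time)


def best_core_size(f, n):
    """Exhaustively pick the core size r (a divisor of n) with the highest speedup.

    This brute-force loop stands in for a real design-space search and is
    only practical because the knob space here is tiny.
    """
    candidates = [r for r in range(1, n + 1) if n % r == 0]
    return max(candidates, key=lambda r: symmetric_speedup(f, n, r))


if __name__ == "__main__":
    for f in (0.5, 0.9, 0.99):
        r = best_core_size(f, 256)
        print(f"f={f}: best r={r}, speedup={symmetric_speedup(f, 256, r):.1f}")
```

Even this small model shows why several knobs need to be explored together: the best core size depends on both the parallel fraction and the chip budget, and adding further relationships (energy, area, frequency) quickly makes hand-written search loops unwieldy.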

Published In

ACM Journal on Emerging Technologies in Computing Systems  Volume 16, Issue 1
January 2020
232 pages
ISSN:1550-4832
EISSN:1550-4840
DOI:10.1145/3365593
  • Editor: Ramesh Karri
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 October 2019
Accepted: 01 September 2019
Revised: 01 June 2019
Received: 01 January 2019
Published in JETC Volume 16, Issue 1

Author Tags

  1. High-level models
  2. design space exploration
  3. modeling language

Qualifiers

  • Research-article
  • Research
  • Refereed
