A Scheduling Framework for Spatial Architectures Across Multiple Constraint-Solving Theories

Authors:

Michael Sartin-Tarm,

Lorenzo De Carli,

Karthikeyan Sankaralingam,

Cristian Estan,

Behnam Robatmili

ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 37, Issue 1

Article No.: 2, Pages 1 - 30

https://doi.org/10.1145/2658993

Published: 17 November 2014

Abstract

Spatial architectures provide energy-efficient computation but require effective scheduling algorithms. Existing heuristic-based approaches offer low compiler/architect productivity, little optimality insight, and low architectural portability.

We seek to develop a spatial-scheduling framework by utilizing constraint-solving theories and find that architecture primitives and scheduler responsibilities can be related through five abstractions: computation placement, data routing, event timing, resource utilization, and the optimization objective. We encode these responsibilities as 20 mathematical constraints, using SMT and ILP, and create schedulers for the TRIPS, DySER, and PLUG architectures. Our results show that a general declarative approach using constraint solving is implementable, is practical, and can outperform specialized schedulers.
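
As a rough illustration of how the five abstractions fit together, the following minimal sketch schedules a tiny dataflow graph onto a small grid of tiles. It is not the article's formulation: the 20 constraints and the SMT/ILP encodings are not reproduced here, and an exhaustive search stands in for the solver. The graph, the 3x2 grid, the one-operation-per-tile limit, the Manhattan-distance routing model, and all names (`OPS`, `EDGES`, `TILES`, `hops`, `earliest_times`, `schedule`) are assumptions made for the example.

```python
from itertools import permutations

# Hypothetical dataflow graph: a and b feed c, and c feeds d.
OPS = ["a", "b", "c", "d"]  # listed in topological order
EDGES = [("a", "c"), ("b", "c"), ("c", "d")]

# Assumed hardware model: a 3x2 grid of tiles, one operation per tile.
TILES = [(x, y) for x in range(3) for y in range(2)]
OP_LATENCY = 1   # assumed cycles per operation
HOP_LATENCY = 1  # assumed cycles per network hop


def hops(src, dst):
    """Data routing: hop count along a Manhattan path between two tiles."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def earliest_times(placement):
    """Event timing: an op starts once every operand has been routed to it."""
    start = {}
    for op in OPS:
        arrivals = [
            start[u] + OP_LATENCY + HOP_LATENCY * hops(placement[u], placement[op])
            for u, v in EDGES
            if v == op
        ]
        start[op] = max(arrivals, default=0)
    return start


def schedule():
    """Search all placements and return (latency, placement, start times)."""
    best = None
    # Computation placement and resource utilization: each permutation
    # assigns every op to a distinct tile, so no tile is oversubscribed.
    for tiles in permutations(TILES, len(OPS)):
        placement = dict(zip(OPS, tiles))
        start = earliest_times(placement)
        # Optimization objective: minimize the finish time of the last op.
        latency = max(start.values()) + OP_LATENCY
        if best is None or latency < best[0]:
            best = (latency, placement, start)
    return best


if __name__ == "__main__":
    latency, placement, start = schedule()
    print(latency, placement, start)
```

Under these assumptions the best schedule places `c` in the middle column, where all three of its neighbors are one hop away. A declarative scheduler would state the same placement, routing, timing, and utilization conditions as constraints and hand the objective to an SMT or ILP solver instead of enumerating placements.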

Published In

ACM Transactions on Programming Languages and Systems Volume 37, Issue 1

January 2015

170 pages

ISSN:0164-0925

EISSN:1558-4593

DOI:10.1145/2688877

Editor:
Jens Palsberg
University of California, Los Angeles, USA

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 November 2014

Accepted: 01 July 2014

Revised: 01 March 2014

Received: 01 October 2013

Published in TOPLAS Volume 37, Issue 1

Qualifiers

Research-article
Research
Refereed

Bibliometrics

Total Citations: 6
Total Downloads: 484
Downloads (Last 12 months): 68
Downloads (Last 6 weeks): 13

Reflects downloads up to 16 Oct 2024

Cited By

N. Shah, W. Meert, and M. Verhelst. 2023. DAG Processing Unit Version 2 (DPU-v2): Efficient Execution of Irregular Workloads on a Spatial Datapath. In Efficient Execution of Irregular Dataflow Graphs. 89--123. Online publication date: 26-Apr-2023. https://doi.org/10.1007/978-3-031-33136-7_5

N. Shah, W. Meert, and M. Verhelst. 2023. Irregular Workloads at Risk of Losing the Hardware Lottery. In Efficient Execution of Irregular Dataflow Graphs. 1--21. Online publication date: 26-Apr-2023. https://doi.org/10.1007/978-3-031-33136-7_1

S. Zheng, R. Chen, A. Wei, Y. Jin, Q. Han, L. Lu, B. Wu, X. Li, S. Yan, and Y. Liang. 2022. AMOS. In Proceedings of the 49th Annual International Symposium on Computer Architecture. 874--887. Online publication date: 18-Jun-2022. https://dl.acm.org/doi/10.1145/3470496.3527440

N. Shah, W. Meert, and M. Verhelst. 2022. DPU-v2: Energy-Efficient Execution of Irregular Directed Acyclic Graphs. In Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture. 1288--1307. Online publication date: 1-Oct-2022. https://dl.acm.org/doi/10.1109/MICRO56248.2022.00090

J. Baker, C. Duckering, A. Hoover, and F. Chong. 2020. Time-sliced quantum circuit partitioning for modular architectures. In Proceedings of the 17th ACM International Conference on Computing Frontiers. 98--107. Online publication date: 11-May-2020. https://dl.acm.org/doi/10.1145/3387902.3392617

T. Nowatzki, N. Ardalani, K. Sankaralingam, and J. Weng. 2018. Hybrid optimization/heuristic instruction scheduling for programmable accelerator codesign. In Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques. 1--15. Online publication date: 1-Nov-2018. https://dl.acm.org/doi/10.1145/3243176.3243212
