research-article
Open access

A Scheduling Framework for Spatial Architectures Across Multiple Constraint-Solving Theories

Published: 17 November 2014

Abstract

Spatial architectures provide energy-efficient computation but require effective scheduling algorithms. Existing heuristic-based approaches offer low compiler/architect productivity, little optimality insight, and low architectural portability.
We seek to develop a spatial-scheduling framework by utilizing constraint-solving theories and find that architecture primitives and scheduler responsibilities can be related through five abstractions: computation placement, data routing, event timing, resource utilization, and the optimization objective. We encode these responsibilities as 20 mathematical constraints, using SMT and ILP, and create schedulers for the TRIPS, DySER, and PLUG architectures. Our results show that a general declarative approach using constraint solving is implementable, is practical, and can outperform specialized schedulers.
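
To make the five abstractions concrete, the following is a minimal sketch on a toy problem. It is not the paper's SMT or ILP encoding: it replaces the solver with an exhaustive search so that it runs with the Python standard library alone. The four-operation dataflow graph, the 2x2 grid, the one-cycle operation latency, and the one-cycle-per-hop routing cost are all hypothetical values chosen for illustration.

```python
from itertools import permutations

# Hypothetical toy inputs (not from the paper): a 4-operation dataflow graph
# and a 2x2 grid of functional units.
EDGES = [("a", "c"), ("b", "c"), ("c", "d")]  # producer -> consumer
OPS = ["a", "b", "c", "d"]                     # listed in topological order
TILES = [(0, 0), (0, 1), (1, 0), (1, 1)]
OP_LATENCY = 1                                 # cycles per operation (assumed)


def hops(src, dst):
    """Data routing: assume one cycle per Manhattan hop on the grid."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def schedule(placement):
    """Event timing: earliest start cycle of each op for a given placement."""
    start = {}
    for op in OPS:
        # An op may start once every producer has finished and its value
        # has been routed to the consumer's tile.
        ready = [start[p] + OP_LATENCY + hops(placement[p], placement[op])
                 for p, c in EDGES if c == op]
        start[op] = max(ready, default=0)
    return start


def best_schedule():
    best = None
    # Computation placement and resource utilization: each permutation maps
    # every op to exactly one tile and never shares a tile between two ops.
    for tiles in permutations(TILES, len(OPS)):
        placement = dict(zip(OPS, tiles))
        start = schedule(placement)
        # Optimization objective: minimize the completion time of the last op.
        makespan = max(start[op] + OP_LATENCY for op in OPS)
        if best is None or makespan < best[0]:
            best = (makespan, placement, start)
    return best


makespan, placement, start = best_schedule()
print("makespan:", makespan)
for op in OPS:
    print(op, "on tile", placement[op], "starts at cycle", start[op])
```

Exhaustive search is only workable at this scale. A declarative formulation would instead state the same placement, routing, timing, and utilization conditions as constraints and leave the search to an SMT or ILP solver.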

Information & Contributors

Information

Published In

ACM Transactions on Programming Languages and Systems, Volume 37, Issue 1
January 2015
170 pages
ISSN: 0164-0925
EISSN: 1558-4593
DOI: 10.1145/2688877
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 November 2014
Accepted: 01 July 2014
Revised: 01 March 2014
Received: 01 October 2013
Published in TOPLAS Volume 37, Issue 1

Author Tags

  1. Satisfiability Modulo Theories
  2. Spatial architectures
  3. integer linear programming
  4. spatial architecture scheduling

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2023) DAG Processing Unit Version 2 (DPU-v2): Efficient Execution of Irregular Workloads on a Spatial Datapath. Efficient Execution of Irregular Dataflow Graphs, 89-123. DOI: 10.1007/978-3-031-33136-7_5. Online publication date: 26-Apr-2023
  • (2023) Irregular Workloads at Risk of Losing the Hardware Lottery. Efficient Execution of Irregular Dataflow Graphs, 1-21. DOI: 10.1007/978-3-031-33136-7_1. Online publication date: 26-Apr-2023
  • (2022) AMOS. Proceedings of the 49th Annual International Symposium on Computer Architecture, 874-887. DOI: 10.1145/3470496.3527440. Online publication date: 18-Jun-2022
  • (2022) DPU-v2: Energy-Efficient Execution of Irregular Directed Acyclic Graphs. Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture, 1288-1307. DOI: 10.1109/MICRO56248.2022.00090. Online publication date: 1-Oct-2022
  • (2020) Time-sliced quantum circuit partitioning for modular architectures. Proceedings of the 17th ACM International Conference on Computing Frontiers, 98-107. DOI: 10.1145/3387902.3392617. Online publication date: 11-May-2020
  • (2018) Hybrid optimization/heuristic instruction scheduling for programmable accelerator codesign. Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 1-15. DOI: 10.1145/3243176.3243212. Online publication date: 1-Nov-2018
