DOI: 10.5555/3343414.3343435
research-article

Optimal compilation for exposed datapath architectures with buffered processing units by SAT solvers

Published: 18 November 2016

    Abstract

    Conventional processor architectures are restricted in exploiting instruction-level parallelism (ILP) by the limited number of registers available in their instruction sets. Therefore, recent processor architectures expose their datapaths so that the compiler not only schedules instructions to functional units, but also moves values directly between functional units, which avoids the need for registers altogether. However, current compiler technology is still based on classic register architectures, where a nearly optimal register mapping is the key to the quality of the generated assembly code.
    The Synchronous Control Asynchronous Dataflow (SCAD) architecture is a new exposed datapath architecture in which processing units (PUs) are equipped with first-in first-out (FIFO) buffers at their inputs and outputs. Code generation for SCAD machines can be done as known for classic queue machines, which completely eliminates the use of registers and improves the degree of exploited ILP. However, the SCAD code generated this way is not optimal: compared to queue machines, SCAD machines can contain many PUs and buffers, which offer the compiler more freedom to reduce unnecessary computational overhead. In this paper, we map the SCAD code generation problem to a satisfiability problem, and then use SAT solvers to generate code without overhead that works with the minimal number of PUs. The generated optimal code will serve as a reference to judge the quality of the heuristics that will ultimately be used in SCAD compilers.
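
The abstract refers to code generation "as known for classic queue machines". As a rough illustration of that baseline only (not the paper's SAT encoding, and not SCAD's instruction set), the sketch below uses the textbook result that a level-order traversal of an expression tree, deepest level first, yields a program that a single FIFO queue can evaluate. The names `Node`, `queue_code`, and `run_queue_code` are hypothetical and were chosen for this example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Optional
import operator


@dataclass
class Node:
    """Binary expression tree: leaves carry a variable name, inner nodes an operator."""
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


# Operators understood by this toy machine (an assumption of the sketch).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def queue_code(root: Node) -> List[str]:
    """Emit queue-machine code: tree levels from deepest to root, each left to right."""
    levels: List[List[str]] = []
    frontier = [root]
    while frontier:
        levels.append([n.label for n in frontier])
        frontier = [c for n in frontier for c in (n.left, n.right) if c is not None]
    return [label for level in reversed(levels) for label in level]


def run_queue_code(code: List[str], env: Dict[str, int]) -> int:
    """Simulate one FIFO: operands leave at the front, results enter at the rear."""
    fifo: deque = deque()
    for instr in code:
        if instr in OPS:
            lhs = fifo.popleft()
            rhs = fifo.popleft()
            fifo.append(OPS[instr](lhs, rhs))
        else:
            fifo.append(env[instr])  # load a variable into the queue
    assert len(fifo) == 1, "a well-formed program leaves exactly one value"
    return fifo[0]


# Example: a * (b + c)
tree = Node("*", Node("a"), Node("+", Node("b"), Node("c")))
program = queue_code(tree)
result = run_queue_code(program, {"a": 2, "b": 3, "c": 4})
```

Here `program` is `['b', 'c', 'a', '+', '*']` and `result` is 14. No register names appear anywhere in the program, since operand positions are implied by queue order. A SCAD machine, as described above, has many such buffers attached to several PUs, so a compiler has far more placement choices than this single-queue sketch explores.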

    Cited By

    • (2021) Translating structured sequential programs to dataflow graphs. Proceedings of the 19th ACM-IEEE International Conference on Formal Methods and Models for System Design, pp. 66-77. DOI: 10.1145/3487212.3487343. Online publication date: 20-Nov-2021

      Published In

      MEMOCODE '16: Proceedings of the 14th ACM-IEEE International Conference on Formal Methods and Models for System Design
      November 2016
      196 pages
      ISBN: 9781509027910

      Publisher

      IEEE Press

      Conference

      MEMOCODE'16

      Acceptance Rates

      Overall Acceptance Rate 34 of 82 submissions, 41%
