Article

Matrix scheduler reloaded

Published: 09 June 2007
DOI: 10.1145/1250662.1250704

Abstract

From multiprocessor scale-up to cache sizes to the number of reorder-buffer entries, microarchitects wish to reap the benefits of more computing resources while staying within power and latency bounds. This tension is quite evident in schedulers, which need to be large and single-cycle for maximum performance on out-of-order cores. In this work we present two straightforward modifications to a matrix scheduler implementation which greatly strengthen its scalability. Both are based on the simple observation that the wakeup and picker matrices are sparse, even at small sizes; thus small indirection tables can be used to greatly reduce their width and latency. This technique can be used to create quicker iso-performance schedulers (17-58% reduced critical path) or larger iso-timing schedulers (7-26% IPC increase). Importantly, the power and area requirements of the additional hardware are likely offset by the greatly reduced matrix sizes and subsuming the functionality of the power-hungry allocation CAMs.
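The sparsity observation in the abstract can be pictured with a small software model. The sketch below is only an illustration under stated assumptions, not the hardware design the paper proposes: the class `WakeupMatrix` and its methods are hypothetical names, it models a generic dependency (wakeup) matrix in which row i has bit j set while entry i waits on entry j, it assumes results wake dependents in the same step they issue, and it does not implement the indirection tables. It shows how few bits are set when each instruction has only a handful of producers.

```python
class WakeupMatrix:
    """Toy dependency matrix (hypothetical model, not the paper's circuit)."""

    def __init__(self, size):
        self.size = size
        self.rows = [0] * size          # one bit vector (Python int) per entry
        self.valid = [False] * size     # whether the entry holds an instruction

    def allocate(self, entry, producers):
        """Place an instruction in `entry`, waiting on in-flight producer entries."""
        row = 0
        for p in producers:
            row |= 1 << p               # set the column bit of each producer
        self.rows[entry] = row
        self.valid[entry] = True

    def ready(self):
        """Entries whose row is all zero have no outstanding producers."""
        return [i for i in range(self.size)
                if self.valid[i] and self.rows[i] == 0]

    def issue(self, entry):
        """Issue `entry`: free it and clear its column so dependents wake up."""
        self.valid[entry] = False
        mask = ~(1 << entry)
        for i in range(self.size):
            self.rows[i] &= mask

    def density(self):
        """Fraction of matrix bits that are currently set."""
        set_bits = sum(bin(r).count("1") for r in self.rows)
        return set_bits / (self.size * self.size)


m = WakeupMatrix(8)
m.allocate(0, [])        # no producers: ready immediately
m.allocate(1, [0])       # waits on entry 0
m.allocate(2, [0, 1])    # waits on entries 0 and 1
print(m.ready())         # [0]
print(m.density())       # 3 set bits out of 64 -> 0.046875
m.issue(0)
print(m.ready())         # [1]
```

In this toy run only 3 of 64 bits are ever set, because an instruction rarely has more than two or three source operands. That is the kind of sparsity the abstract says makes narrower, indirected matrices attractive; how the paper's indirection tables are actually organized is not modeled here.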

    Published In

    ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture
    June 2007
    542 pages
    ISBN: 9781595937063
    DOI: 10.1145/1250662
    • General Chair: Dean Tullsen
    • Program Chair: Brad Calder
    • ACM SIGARCH Computer Architecture News, Volume 35, Issue 2
      May 2007
      527 pages
      ISSN: 0163-5964
      DOI: 10.1145/1273440
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. matrix
    2. microarchitecture
    3. picker
    4. scheduler
    5. wakeup

    Qualifiers

    • Article

    Conference

    SPAA07

    Acceptance Rates

    Overall Acceptance Rate 543 of 3,203 submissions, 17%

    Cited By

    • (2023) Simple Out of Order Core for GPGPUs. Proceedings of the 15th Workshop on General Purpose Processing Using GPU, pp. 21-26. DOI: 10.1145/3589236.3589244. Online publication date: 25-Feb-2023
    • (2023) SQUIP: Exploiting the Scheduler Queue Contention Side Channel. 2023 IEEE Symposium on Security and Privacy (SP), pp. 2256-2272. DOI: 10.1109/SP46215.2023.10179368. Online publication date: May-2023
    • (2022) HAIR: Halving the Area of the Integer Register File with Odd/Even Banking. ACM Transactions on Architecture and Code Optimization 19:4, pp. 1-25. DOI: 10.1145/3544838. Online publication date: 16-Sep-2022
    • (2022) Segmenting Age Matrices to Improve Instruction Scheduling without Increasing Delay and Area. 2022 IEEE 40th International Conference on Computer Design (ICCD), pp. 360-363. DOI: 10.1109/ICCD56317.2022.00059. Online publication date: Oct-2022
    • (2020) A Circuit for Identifying Oldest Ready Instructions in Reservation Stations. 2020 43rd International Convention on Information, Communication and Electronic Technology (MIPRO), pp. 109-113. DOI: 10.23919/MIPRO48935.2020.9245125. Online publication date: 28-Sep-2020
    • (2019) An Open Source FPGA-Optimized Out-of-Order RISC-V Soft Processor. 2019 International Conference on Field-Programmable Technology (ICFPT), pp. 63-71. DOI: 10.1109/ICFPT47387.2019.00016. Online publication date: Dec-2019
    • (2018) High-Performance Instruction Scheduling Circuits for Superscalar Out-of-Order Soft Processors. ACM Transactions on Reconfigurable Technology and Systems 11:1, pp. 1-22. DOI: 10.1145/3093741. Online publication date: 9-Jan-2018
    • (2018) Performance improvement by prioritizing the issue of the instructions in unconfident branch slices. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, pp. 82-94. DOI: 10.1109/MICRO.2018.00016. Online publication date: 20-Oct-2018
    • (2018) Rearranging Random Issue Queue with High IPC and Short Delay. 2018 IEEE 36th International Conference on Computer Design (ICCD), pp. 123-131. DOI: 10.1109/ICCD.2018.00027. Online publication date: Oct-2018
    • (2016) Register sharing for equality prediction. The 49th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 1-12. DOI: 10.5555/3195638.3195643. Online publication date: 15-Oct-2016
