Article

Matrix scheduler reloaded

Authors:

Peter G. Sassone,

Jeff Rupley, II,

Edward Brekelbaum,

Gabriel H. Loh,

Bryan Black

ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture

Pages 335 - 346

https://doi.org/10.1145/1250662.1250704

Published: 09 June 2007

Abstract

From multiprocessor scale-up to cache sizes to the number of reorder-buffer entries, microarchitects wish to reap the benefits of more computing resources while staying within power and latency bounds. This tension is quite evident in schedulers, which need to be large and single-cycle for maximum performance on out-of-order cores. In this work we present two straightforward modifications to a matrix scheduler implementation that greatly improve its scalability. Both are based on the simple observation that the wakeup and picker matrices are sparse, even at small sizes; thus small indirection tables can be used to greatly reduce their width and latency. This technique can be used to create quicker iso-performance schedulers (17-58% shorter critical path) or larger iso-timing schedulers (7-26% IPC increase). Importantly, the power and area requirements of the additional hardware are likely offset by the greatly reduced matrix sizes and by subsuming the functionality of the power-hungry allocation CAMs.
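To make the sparsity argument concrete, the Python sketch below models only the wakeup side of a matrix scheduler, at a purely behavioral level. It is a hypothetical illustration under stated assumptions, not the authors' design. A full wakeup matrix has one row per scheduler entry and one column per entry that could produce an operand, so the row of an instruction with at most two sources is almost all zeros. The narrowed variant keeps only `columns` columns and reaches them through a small indirection table (`col_of`). The class names, the policy of assigning a column when a producer gets its first waiting consumer, and the choice to stall dispatch when no column is free are all assumptions. Neither the picker matrix nor circuit timing is modeled.

```python
# Hypothetical behavioral model of matrix-scheduler wakeup; not the paper's design.
from typing import Dict, List, Optional, Sequence


class FullWakeupMatrix:
    """Baseline: one row per scheduler entry, one column per possible producer."""

    def __init__(self, entries: int) -> None:
        self.width = entries
        self.rows: List[Optional[List[bool]]] = [None] * entries

    def _column_for(self, producer: int) -> int:
        return producer  # column index is simply the producer's scheduler slot

    def insert(self, slot: int, producers: Sequence[int]) -> None:
        # Producers must be live, not-yet-issued entries. With at most two
        # sources per instruction, each row has at most two bits set.
        row = [False] * self.width
        for p in producers:
            row[self._column_for(p)] = True
        self.rows[slot] = row

    def _clear_column(self, col: int) -> None:
        # Wakeup: issuing a producer clears its column in every occupied row.
        for row in self.rows:
            if row is not None:
                row[col] = False

    def issue(self, slot: int) -> None:
        self.rows[slot] = None  # free the entry
        self._clear_column(slot)

    def ready(self) -> List[int]:
        # An occupied entry is ready once no dependence bit remains in its row.
        return [i for i, r in enumerate(self.rows) if r is not None and not any(r)]


class NarrowWakeupMatrix(FullWakeupMatrix):
    """Assumed variant: only `columns` columns, reached via an indirection table."""

    def __init__(self, entries: int, columns: int) -> None:
        super().__init__(entries)
        self.width = columns
        self.col_of: Dict[int, int] = {}  # indirection table: producer slot -> column
        self.free_cols: List[int] = list(range(columns))

    def can_insert(self, producers: Sequence[int]) -> bool:
        # Only producers that do not already own a column need a new one.
        needed = {p for p in producers if p not in self.col_of}
        return len(needed) <= len(self.free_cols)

    def _column_for(self, producer: int) -> int:
        if producer not in self.col_of:
            self.col_of[producer] = self.free_cols.pop(0)
        return self.col_of[producer]

    def insert(self, slot: int, producers: Sequence[int]) -> None:
        if not self.can_insert(producers):
            raise RuntimeError("no free column: dispatch would stall")
        super().insert(slot, producers)

    def issue(self, slot: int) -> None:
        self.rows[slot] = None
        col = self.col_of.pop(slot, None)
        if col is not None:  # only producers with waiting consumers own a column
            self._clear_column(col)
            self.free_cols.append(col)


def run(sched: FullWakeupMatrix) -> List[List[int]]:
    # Slots: 0 = A; 1 = B (needs A); 2 = C (needs A); 3 = D (needs B and C).
    sched.insert(0, [])
    sched.insert(1, [0])
    sched.insert(2, [0])
    sched.insert(3, [1, 2])
    order: List[List[int]] = []
    while True:
        ready = sched.ready()
        if not ready:
            return order
        order.append(ready)
        for s in ready:  # assume every ready entry issues in the same cycle
            sched.issue(s)


if __name__ == "__main__":
    print(run(FullWakeupMatrix(8)))
    print(run(NarrowWakeupMatrix(8, 3)))
```

For this four-instruction example both models return the same issue order, `[[0], [1, 2], [3]]`, but the narrowed matrix uses 3 columns instead of 8. With only 2 columns, `can_insert([1, 2])` returns `False` once A's two consumers are in place, which this model treats as a dispatch stall.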

Cited By

Mori K, Kosugi S, Yoshida H, Shimada H, Ando H (2024). Localizing the Tag Comparisons in the Wakeup Logic to Reduce Energy Consumption of the Issue Queue. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 493-506. DOI: 10.1109/MICRO61859.2024.00044. Online publication date: 2-Nov-2024.
https://doi.org/10.1109/MICRO61859.2024.00044
Huerta R, Cruz J, Arnau J, González A (2024). SIMIL. Microprocessors & Microsystems, 111:C. DOI: 10.1016/j.micpro.2024.105105. Online publication date: 1-Nov-2024.
https://dl.acm.org/doi/10.1016/j.micpro.2024.105105
Huerta R, Arnau J, Gonzalez A (2023). Simple Out of Order Core for GPGPUs. Proceedings of the 15th Workshop on General Purpose Processing Using GPU, pp. 21-26. DOI: 10.1145/3589236.3589244. Online publication date: 25-Feb-2023.
https://dl.acm.org/doi/10.1145/3589236.3589244

Index Terms

Computer systems organization → Architectures

Published In

ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture

June 2007

542 pages

ISBN:9781595937063

DOI:10.1145/1250662

General Chair: Dean Tullsen, University of California, San Diego
Program Chair: Brad Calder, Microsoft & University of California, San Diego

ACM SIGARCH Computer Architecture News Volume 35, Issue 2
May 2007
527 pages
ISSN:0163-5964
DOI:10.1145/1273440

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

SPAA07

Sponsor:

SIGARCH
IEEE-CS

SPAA07: 19th ACM Symposium on Parallelism in Algorithms and Architectures

June 9-13, 2007

San Diego, California, USA

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Article Metrics

Total Citations: 39
Total Downloads: 949
Downloads (last 12 months): 67
Downloads (last 6 weeks): 5

Reflects downloads up to 03 Mar 2025

Citations

Gast S, Juffinger J, Schwarzl M, Saileshwar G, Kogler A, Franza S, Köstl M, Gruss D (2023). SQUIP: Exploiting the Scheduler Queue Contention Side Channel. 2023 IEEE Symposium on Security and Privacy (SP), pp. 2256-2272. DOI: 10.1109/SP46215.2023.10179368. Online publication date: May-2023.
https://doi.org/10.1109/SP46215.2023.10179368
Michaud P, Peysieux A (2022). HAIR: Halving the Area of the Integer Register File with Odd/Even Banking. ACM Transactions on Architecture and Code Optimization, 19:4, pp. 1-25. DOI: 10.1145/3544838. Online publication date: 16-Sep-2022.
https://dl.acm.org/doi/10.1145/3544838
Ando H (2022). Segmenting Age Matrices to Improve Instruction Scheduling without Increasing Delay and Area. 2022 IEEE 40th International Conference on Computer Design (ICCD), pp. 360-363. DOI: 10.1109/ICCD56317.2022.00059. Online publication date: Oct-2022.
https://doi.org/10.1109/ICCD56317.2022.00059
Spasov D (2020). A Circuit for Identifying Oldest Ready Instructions in Reservation Stations. 2020 43rd International Convention on Information, Communication and Electronic Technology (MIPRO), pp. 109-113. DOI: 10.23919/MIPRO48935.2020.9245125. Online publication date: 28-Sep-2020.
https://doi.org/10.23919/MIPRO48935.2020.9245125
Mashimo S, Inoue K, Shioya R, Fujita A, Matsuo R, Akaki S, Fukuda A, Koizumi T, Kadomoto J, Irie H, Goshima M (2019). An Open Source FPGA-Optimized Out-of-Order RISC-V Soft Processor. 2019 International Conference on Field-Programmable Technology (ICFPT), pp. 63-71. DOI: 10.1109/ICFPT47387.2019.00016. Online publication date: Dec-2019.
https://doi.org/10.1109/ICFPT47387.2019.00016
Wong H, Betz V, Rose J (2018). High-Performance Instruction Scheduling Circuits for Superscalar Out-of-Order Soft Processors. ACM Transactions on Reconfigurable Technology and Systems, 11:1, pp. 1-22. DOI: 10.1145/3093741. Online publication date: 9-Jan-2018.
https://dl.acm.org/doi/10.1145/3093741
Ando H, Oskin M, Inoue K (2018). Performance improvement by prioritizing the issue of the instructions in unconfident branch slices. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, pp. 82-94. DOI: 10.1109/MICRO.2018.00016. Online publication date: 20-Oct-2018.
https://dl.acm.org/doi/10.1109/MICRO.2018.00016