Dynamically allocating processor resources between nearby and distant ILP

Authors:

Rajeev Balasubramonian,

Sandhya Dwarkadas, and

David H. Albonesi

ACM SIGARCH Computer Architecture News, Volume 29, Issue 2

Pages 26-37

https://doi.org/10.1145/384285.379249

Published: 01 May 2001

Abstract

Modern superscalar processors use wide issue and out-of-order execution to increase instruction-level parallelism (ILP). Because instructions must be committed in program order to guarantee precise exceptions, exploiting more ILP requires larger structures such as the register file, issue queue, and reorder buffer. At the same time, cycle-time constraints limit the sizes of these structures, resulting in conflicting design requirements.
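To make the conflict concrete, here is a back-of-the-envelope model. It is a hypothetical sketch, not the authors' simulator, and it assumes that every instruction allocates one physical register at rename, that nothing commits while the oldest instruction waits on a cache miss (so no register is freed), and that rename proceeds at the full issue width until the free registers run out. Under those assumptions, whatever miss latency remains after the free registers are exhausted is time the front end sits idle. The function name and all parameter values are illustrative.

```python
import math


def rename_stall_cycles(free_regs: int, issue_width: int, miss_latency: int) -> int:
    """Cycles rename is idle while the oldest instruction waits on a miss.

    Hypothetical toy model (assumptions, not taken from the paper):
      * each renamed instruction allocates exactly one physical register;
      * with in-order commit, nothing retires (and no register is freed)
        until the head's miss returns;
      * rename proceeds at `issue_width` instructions per cycle until the
        free registers are exhausted.
    """
    if free_regs < 0 or issue_width <= 0 or miss_latency < 0:
        raise ValueError("need free_regs >= 0, issue_width > 0, miss_latency >= 0")
    # Cycles needed to hand out every free register at full rename width.
    cycles_until_full = math.ceil(free_regs / issue_width)
    # Any miss latency left over after that is pure stall time.
    return max(0, miss_latency - cycles_until_full)


if __name__ == "__main__":
    # Hypothetical 4-wide machine waiting on a 100-cycle miss.
    for regs in (64, 256, 512):
        print(regs, rename_stall_cycles(regs, issue_width=4, miss_latency=100))
```

In this model, growing the free pool from 64 to 256 registers cuts the idle time from 84 to 36 cycles, but a larger register file is also slower to access, which is the tension the abstract describes. The numbers are illustrative only.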

In this paper, we present a novel microarchitecture designed to overcome the limitations of a register file whose size is dictated by cycle-time constraints. Available registers are dynamically allocated between the primary program thread and a future thread. The future thread executes instructions when the primary thread is limited by resource availability. Because the future thread is not constrained by in-order commit requirements, it can examine a much larger instruction window and jump far ahead to execute ready instructions. Results are communicated back to the primary thread by warming up the register file, instruction cache, data cache, and instruction reuse buffer, and by resolving branch mispredictions early. The proposed microarchitecture achieves an overall speedup of 1.17 over the base processor for our benchmark set, with speedups of up to 1.64.
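The abstract does not spell out the allocation policy. As a purely illustrative sketch, assume the physical register file is split into a primary partition and a future-thread partition, and that a controller revisits the split at the end of each fixed interval using one hypothetical signal: the fraction of cycles the primary thread spent stalled on resources. The class name, thresholds, step size, and minimum primary share below are all assumptions, not parameters from the paper.

```python
from dataclasses import dataclass


@dataclass
class RegisterPartition:
    """Hypothetical split of one physical register file between a primary
    thread and a future thread. Illustrative policy only, not the paper's."""

    total: int             # physical registers available for renaming
    future: int = 0        # registers currently lent to the future thread
    step: int = 8          # registers moved per adjustment (assumed)
    min_primary: int = 32  # floor kept for the primary thread (assumed)

    @property
    def primary(self) -> int:
        return self.total - self.future

    def adjust(self, primary_stall_fraction: float,
               high: float = 0.5, low: float = 0.1) -> None:
        """Revisit the split at the end of an interval.

        If the primary thread was stalled on resources for much of the
        interval, its in-order window is not making progress, so shift
        registers to the future thread, which can run further ahead.
        If it rarely stalled, return registers to the primary thread.
        Values between `low` and `high` leave the split unchanged.
        """
        if primary_stall_fraction > high:
            self.future = min(self.future + self.step,
                              self.total - self.min_primary)
        elif primary_stall_fraction < low:
            self.future = max(self.future - self.step, 0)


if __name__ == "__main__":
    part = RegisterPartition(total=128)
    for stall in (0.7, 0.7, 0.3, 0.05):
        part.adjust(stall)
    print(part.primary, part.future)  # 120 8
```

The dead band between `low` and `high` keeps the split from oscillating on noisy intervals; a real design might instead sample several splits and keep whichever performs best.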

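One of the channels the abstract lists is the instruction reuse buffer. As a simplified, hypothetical sketch, assume a value-based buffer that keys each entry by instruction address and source-operand values and stores the result: the future thread fills entries as it executes, and the primary thread may skip re-executing an instruction whose address and operands match. The class and method names are illustrative, and the sketch omits capacity limits, replacement, and the memory-dependence checks a real load would need.

```python
from typing import Dict, Optional, Tuple

# (instruction address, source-operand values)
Key = Tuple[int, Tuple[int, ...]]


class ReuseBuffer:
    """Toy value-based instruction reuse buffer (hypothetical; unbounded,
    no replacement policy, no memory-dependence checks)."""

    def __init__(self) -> None:
        self._table: Dict[Key, int] = {}
        self.hits = 0

    def insert(self, pc: int, operands: Tuple[int, ...], result: int) -> None:
        # Called when an instruction completes, e.g. on the future thread.
        self._table[(pc, operands)] = result

    def lookup(self, pc: int, operands: Tuple[int, ...]) -> Optional[int]:
        # The primary thread can skip execution if the same instruction
        # already ran with identical operand values.
        result = self._table.get((pc, operands))
        if result is not None:
            self.hits += 1
        return result


if __name__ == "__main__":
    rb = ReuseBuffer()
    rb.insert(0x400, (3, 4), 7)      # future thread computed 3 + 4 at PC 0x400
    print(rb.lookup(0x400, (3, 4)))  # 7: primary thread can reuse the result
    print(rb.lookup(0x400, (3, 5)))  # None: operands differ, so it must execute
```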

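Cited By

Chadha G, Mahlke S, Narayanasamy S (2015). Accelerating asynchronous programs through event sneak peek. ACM SIGARCH Computer Architecture News 43(3S):642-654. https://dl.acm.org/doi/10.1145/2872887.2750373

Chadha G, Mahlke S, Narayanasamy S, Marr D, Albonesi D (2015). Accelerating asynchronous programs through event sneak peek. Proceedings of the 42nd Annual International Symposium on Computer Architecture, 642-654. https://dl.acm.org/doi/10.1145/2749469.2750373

Naithani A, Feliu J, Adileh A, Eeckhout L (2020). Precise Runahead Execution. 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), 397-410. https://doi.org/10.1109/HPCA47549.2020.00040

Mittal S (2016). A Survey of Recent Prefetching Techniques for Processor Caches. ACM Computing Surveys 49(2):1-35. https://dl.acm.org/doi/10.1145/2907071

Friedmann O, Klaedtke F, Lange M (2015). Ramsey-Based Inclusion Checking for Visibly Pushdown Automata. ACM Transactions on Computational Logic 16(4):1-24. https://dl.acm.org/doi/10.1145/2774221

Ma Y, Gao H, Dimitrov M, Zhou H (2015). Submitted to IEEE Transactions on Parallel and Distributed Systems Special Issue on CMP Architectures. IEEE Transactions on Parallel and Distributed Systems. https://doi.org/10.1109/TPDS.2007.1080

Yang Y, Xiang P, Mantor M, Rubin N, Hsu L, Dong Q, Zhou H (2014). A Case for a Flexible Scalar Unit in SIMT Architecture. Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 93-102. https://dl.acm.org/doi/10.1109/IPDPS.2014.21

Huebner M, Goehringer D, Tradowsky C, Henkel J, Becker J (2012). Adaptive processor architecture - invited paper. 2012 International Conference on Embedded Computer Systems (SAMOS), 244-251. https://doi.org/10.1109/SAMOS.2012.6404181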

Information

Published In

ACM SIGARCH Computer Architecture News Volume 29, Issue 2

Special Issue: Proceedings of the 28th annual international symposium on Computer architecture (ISCA '01)

May 2001

262 pages

ISSN:0163-5964

DOI:10.1145/384285

Editor:
Per Stenström
Chalmers Univ. of Technology

Also published in:

ISCA '01: Proceedings of the 28th annual international symposium on Computer architecture
June 2001
289 pages
ISBN:0769511627
DOI:10.1145/379240
Chairman:
Per Stenström
Chalmers Univ. of Technology

Copyright © 2001 Authors.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 May 2001

Published in SIGARCH Volume 29, Issue 2

Qualifiers

Article

Bibliometrics

Total Citations: 70

Total Downloads: 606

Downloads (Last 12 months): 6

Downloads (Last 6 weeks): 0
