The performance potential of data dependence speculation & collapsing

Authors: Yiannakis Sazeides, Stamatis Vassiliadis, James E. Smith

MICRO 29: Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture

Pages 238 - 247

Published: 02 December 1996

Abstract

Two hardware methods for remedying the effects of true data dependences are studied. The first method, dependence speculation, is used to eliminate address generation-load dependences. This is enabled by address prediction, which permits load instructions to proceed speculatively without waiting for their address operands. The second method, dependence collapsing, is used to eliminate data dependences by combining a dependence among multiple instructions into one instruction. The potential of these techniques for improving processor performance is demonstrated via trace-driven simulation. When both techniques are used with maximum issue widths of 4, 8, 16, and 32, the overall speedups in comparison to a base instruction level parallel machine are 1.20, 1.35, 1.51, and 1.66, respectively. In general, dependence collapsing contributes the majority of the improvement in performance. Under the dependence collapsing model, 29% to 47% of the total number of instructions in a trace may be collapsed. The distance separating the collapsed instructions is nearly always less than 8. Our experimentation also suggests that further performance improvements can be achieved by incorporating mechanisms that increase the address prediction rate.
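
The two ideas in the abstract can be illustrated with a small trace-driven sketch. It is not the authors' simulator. The per-PC stride predictor, the `Instr` record, and the pairing rule in `collapsible_pairs` are all assumptions made for illustration, and the sketch ignores which opcodes a real collapsing ALU could actually combine.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class StrideEntry:
    last_addr: int
    stride: int = 0


class StrideAddressPredictor:
    """Toy per-PC stride predictor (an assumed scheme, chosen for illustration)."""

    def __init__(self) -> None:
        self.table: Dict[int, StrideEntry] = {}

    def predict(self, pc: int) -> Optional[int]:
        # No prediction is made the first time a load PC is seen.
        entry = self.table.get(pc)
        return None if entry is None else entry.last_addr + entry.stride

    def update(self, pc: int, actual_addr: int) -> None:
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = StrideEntry(last_addr=actual_addr)
        else:
            entry.stride = actual_addr - entry.last_addr
            entry.last_addr = actual_addr


def prediction_rate(loads: List[Tuple[int, int]]) -> float:
    """Fraction of (pc, address) load records whose address was predicted."""
    predictor = StrideAddressPredictor()
    correct = 0
    for pc, addr in loads:
        if predictor.predict(pc) == addr:
            correct += 1  # this load could have issued before its address was computed
        predictor.update(pc, addr)
    return correct / len(loads) if loads else 0.0


@dataclass
class Instr:
    dest: Optional[str]      # destination register, or None
    srcs: Tuple[str, ...]    # source registers


def collapsible_pairs(trace: List[Instr], max_distance: int = 8) -> List[Tuple[int, int]]:
    """Pair each consumer with the latest producer of one of its sources.

    Hypothetical rule: a pair is formed only if the two instructions are fewer
    than max_distance apart and neither already belongs to another pair.
    """
    last_writer: Dict[str, int] = {}
    used: Set[int] = set()
    pairs: List[Tuple[int, int]] = []
    for i, ins in enumerate(trace):
        for src in ins.srcs:
            p = last_writer.get(src)
            if p is not None and i - p < max_distance and p not in used:
                pairs.append((p, i))
                used.update((p, i))
                break  # at most one pair per consumer
        if ins.dest is not None:
            last_writer[ins.dest] = i
    return pairs
```

On a load that walks an array with a constant stride, `prediction_rate` misses the first two accesses while the stride is learned and hits afterwards. `collapsible_pairs` returns index pairs of producer and consumer instructions; dividing twice the number of pairs by the trace length gives a rough analogue of the "fraction of instructions collapsed" figure quoted in the abstract.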





Published In

MICRO 29: Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture

December 1996, 359 pages

ISBN: 0818676418

Chairmen: Stephen Melvin (Zytek Communications Corp.), Steve Beaty (Hewlett-Packard Corp.)

Copyright (c) 1996 Institute of Electrical and Electronics Engineers, Inc. All rights reserved.

Sponsors: SIGMICRO (ACM Special Interest Group on Microarchitectural Research and Processing), IEEE-CS\TCMM (TC on Microprocessors & Microcomputers)

Publisher: IEEE Computer Society, United States

Conference: MICRO96, 29th Annual International Symposium on Microarchitecture, December 2 - 4, 1996, Paris, France



