
The performance potential of data dependence speculation & collapsing

Published: 02 December 1996

Abstract

Two hardware methods for remedying the effects of true data dependences are studied. The first method, dependence speculation, is used to eliminate address generation-load dependences. This is enabled by address prediction, which permits load instructions to proceed speculatively without waiting for their address operands. The second technique, dependence collapsing, is used to eliminate data dependences by combining a dependence among multiple instructions into one instruction. The potential of these techniques for improving processor performance is demonstrated via trace-driven simulation. When both techniques are used with maximum issue widths of 4, 8, 16, and 32, the overall speedups in comparison to a base instruction level parallel machine are 1.20, 1.35, 1.51, and 1.66, respectively. In general, dependence collapsing contributes the majority of the improvement in performance. Under the dependence collapsing model, 29% to 47% of the total number of instructions in a trace may be collapsed. The distance separating the collapsed instructions is nearly always less than 8. Our experimentation also suggests that further performance improvements can be achieved by incorporating mechanisms that increase the address prediction rate.
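
The two techniques named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's mechanism: the abstract does not say which address predictor is used, so the per-load stride table below (`StrideAddressPredictor`) is an assumed, commonly used scheme. The `collapse` helper, the `(dest, sources)` instruction tuples, and the three-input limit are likewise hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class StrideEntry:
    last_addr: int            # most recent address seen for this load PC
    stride: int = 0           # difference between the last two addresses
    confident: bool = False   # True once the same stride has repeated


class StrideAddressPredictor:
    """Assumed scheme: one stride entry per load PC (not taken from the paper)."""

    def __init__(self):
        self.table = {}  # load PC -> StrideEntry

    def predict(self, pc):
        """Return a speculative load address, or None if no confident guess exists."""
        entry = self.table.get(pc)
        if entry is None or not entry.confident:
            return None
        return entry.last_addr + entry.stride

    def update(self, pc, actual_addr):
        """Train the table with the address the load actually computed."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = StrideEntry(last_addr=actual_addr)
            return
        new_stride = actual_addr - entry.last_addr
        entry.confident = new_stride == entry.stride
        entry.stride = new_stride
        entry.last_addr = actual_addr


def collapse(producer, consumer, max_inputs=3):
    """Merge a dependent pair into one instruction, or return None.

    Instructions are hypothetical (dest, [sources]) tuples. The consumer's use
    of the producer's result is replaced by the producer's own sources, so the
    merged instruction no longer waits on the producer. The max_inputs limit
    is an assumption standing in for the width of a collapsing ALU.
    """
    p_dest, p_srcs = producer
    c_dest, c_srcs = consumer
    if p_dest not in c_srcs:
        return None  # no true dependence between the pair
    merged = []
    for src in c_srcs:
        merged.extend(p_srcs if src == p_dest else [src])
    if len(merged) > max_inputs:
        return None  # too many operands for the assumed functional unit
    return (c_dest, merged)


# A load at PC 0x40 walking an array with a 4-byte stride.
predictor = StrideAddressPredictor()
predictions = []
for addr in [100, 104, 108, 112, 116]:
    predictions.append(predictor.predict(0x40))
    predictor.update(0x40, addr)
print(predictions)  # [None, None, None, 112, 116]

# r3 = r1 + r2 ; r5 = r3 + r4  becomes  r5 = r1 + r2 + r4
print(collapse(("r3", ["r1", "r2"]), ("r5", ["r3", "r4"])))  # ('r5', ['r1', 'r2', 'r4'])
```

In this sketch the predictor needs three observations before it speculates, which is one simple way to trade coverage for accuracy; a real design would also need a recovery path for mispredicted addresses, which is omitted here.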


Published In

MICRO 29: Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
December 1996
359 pages
ISBN: 0818676418

Publisher

IEEE Computer Society

United States

Author Tags

  1. address generation-load dependences
  2. address prediction
  3. address prediction rate
  4. base instruction level parallel machine
  5. data dependence speculation
  6. dependence collapsing
  7. parallel programming
  8. performance potential
  9. trace-driven simulation
  10. true data dependences
