DOI: 10.5555/998680.1006708
Article

Microarchitecture Optimizations for Exploiting Memory-Level Parallelism

Published: 02 March 2004

Abstract

The performance of memory-bound commercial applications such as databases is limited by increasing memory latencies. In this paper, we show that exploiting memory-level parallelism (MLP) is an effective approach for improving the performance of these applications and that microarchitecture has a profound impact on achievable MLP. Using the epoch model of MLP, we reason how traditional microarchitecture features such as out-of-order issue and state-of-the-art microarchitecture techniques such as runahead execution affect MLP. Simulation results show that a moderately aggressive out-of-order issue processor improves MLP over an in-order issue processor by 12-30%, and that aggressive handling of loads, branches and serializing instructions is needed to attain the full benefits of large out-of-order instruction windows. The results also show that a processor's issue window and reorder buffer should be decoupled to exploit MLP more efficiently. In addition, we demonstrate that runahead execution is highly effective in enhancing MLP, potentially improving the MLP of the database workload by 82% and its overall performance by 60%. Finally, our limit study shows that there is considerable headroom in improving MLP and overall performance by implementing effective instruction prefetching, more accurate branch prediction and better value prediction in addition to runahead execution.
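
As a rough illustration of the quantity the abstract discusses, the sketch below assumes MLP is measured as the average number of off-chip misses outstanding over the cycles in which at least one miss is outstanding. That working definition, the function name `average_mlp`, and the trace format of (issue cycle, return cycle) pairs are assumptions made for illustration; this is not the paper's epoch model or its simulator.

```python
def average_mlp(misses):
    """Return (mlp, busy_cycles) for a list of (issue_cycle, return_cycle) pairs.

    Assumed definition: MLP is the mean number of outstanding misses, taken
    only over cycles in which at least one miss is outstanding.
    Intervals are half-open: a miss occupies cycles [issue, return).
    """
    events = []
    for issue, ret in misses:
        if ret <= issue:
            raise ValueError("each miss must return after it is issued")
        events.append((issue, +1))  # miss becomes outstanding
        events.append((ret, -1))    # miss data returns

    # Sorting places a return (-1) before an issue (+1) at the same cycle,
    # which matches the half-open interval convention.
    events.sort()

    busy_cycles = 0   # cycles with at least one miss outstanding
    miss_cycles = 0   # sum over cycles of the number of outstanding misses
    outstanding = 0
    prev = None
    for cycle, delta in events:
        if prev is not None and outstanding > 0:
            span = cycle - prev
            busy_cycles += span
            miss_cycles += outstanding * span
        outstanding += delta
        prev = cycle

    mlp = miss_cycles / busy_cycles if busy_cycles else 0.0
    return mlp, busy_cycles


# Two hypothetical 100-cycle misses, first serialized and then fully overlapped.
serial = average_mlp([(0, 100), (100, 200)])
overlapped = average_mlp([(0, 100), (0, 100)])
print(serial, overlapped)
```

In this toy trace the total miss latency is the same in both cases, but overlapping the two misses halves the number of cycles spent waiting on memory, which is the effect that techniques such as out-of-order issue and runahead execution aim for.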

Published In

ISCA '04: Proceedings of the 31st annual international symposium on Computer architecture
June 2004
373 pages
ISBN: 0769521436
  • ACM SIGARCH Computer Architecture News, Volume 32, Issue 2
    ISCA 2004
    March 2004
    373 pages
    ISSN: 0163-5964
    DOI: 10.1145/1028176

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Conference

ISCA04

Acceptance Rates

ISCA '04 Paper Acceptance Rate 31 of 217 submissions, 14%;
Overall Acceptance Rate 543 of 3,203 submissions, 17%
