research-article

ANATOMY: an analytical model of memory system performance

Authors: Nagendra Gulur, Mahesh Mehendale, Raman Manikantan, Ramaswamy Govindarajan

SIGMETRICS '14: The 2014 ACM international conference on Measurement and modeling of computer systems

Pages 505 - 517

https://doi.org/10.1145/2591971.2591995

Published: 16 June 2014

Abstract

Memory system design increasingly influences modern multi-core architectures from both performance and power perspectives. However, predicting the performance of memory systems is complex, compounded by the myriad design choices and parameters along multiple dimensions, namely (i) technology, (ii) design and (iii) architectural choices. In this work, we construct an analytical model of the memory system to comprehend this diverse space and to study the impact of memory system parameters from latency and bandwidth perspectives. Our model, called ANATOMY, consists of two key components that are coupled with each other to model the memory system accurately. The first component is a queuing model of memory that models various design choices in detail and captures the impact of technological choices in memory systems. The second component is an analytical model that summarizes key workload characteristics, namely row buffer hit rate (RBH), bank-level parallelism (BLP), and request spread (S), which are used as inputs to the queuing model to estimate memory performance. We validate the model across a wide variety of memory configurations on 4, 8 and 16 cores using a total of 44 workloads. ANATOMY predicts memory latency with an average error of 8.1%, 4.1% and 9.7% for the 4-, 8- and 16-core configurations, respectively. We demonstrate the extensibility and applicability of our model by exploring a variety of memory design choices, such as the impact of clock speed, the benefit of multiple memory controllers, and the role of banks and channel width. We also demonstrate ANATOMY's ability to capture architectural elements such as scheduling mechanisms (using FR_FCFS and PAR_BS) and the impact of DRAM refresh cycles. In all of these studies, ANATOMY provides insight into the sources of memory performance bottlenecks and is able to quantitatively predict the benefit of redressing them.
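To make the idea of feeding workload characteristics into a queuing model concrete, the following is a minimal sketch and not the ANATOMY model itself. It assumes a textbook M/D/1 queue per bank, a service time that is the RBH-weighted mix of a row-hit and a row-miss cost, and requests spread evenly over all banks. The class name, field names, and cycle counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class BankQueueInputs:
    """Hypothetical inputs; names and units are illustrative, not from the paper."""
    arrival_rate: float   # requests per cycle arriving at the memory system
    num_banks: int        # banks the requests are assumed to spread over evenly
    row_hit_rate: float   # RBH: fraction of requests that hit the open row
    hit_cycles: float     # assumed service time of a row-buffer hit
    miss_cycles: float    # assumed service time of a row-buffer miss


def mean_service_time(x: BankQueueInputs) -> float:
    """Average cycles per request, weighting hit and miss costs by RBH."""
    return x.row_hit_rate * x.hit_cycles + (1.0 - x.row_hit_rate) * x.miss_cycles


def estimate_latency(x: BankQueueInputs) -> float:
    """Mean latency (queueing wait plus service) of one bank as an M/D/1 queue.

    Uses the standard M/D/1 mean waiting time W_q = rho * D / (2 * (1 - rho)),
    where D is the deterministic service time and rho the bank utilization.
    """
    service = mean_service_time(x)
    per_bank_rate = x.arrival_rate / x.num_banks  # even-spread assumption
    rho = per_bank_rate * service
    if rho >= 1.0:
        raise ValueError("bank is saturated (utilization >= 1); no steady state")
    wait = rho * service / (2.0 * (1.0 - rho))
    return wait + service


if __name__ == "__main__":
    base = BankQueueInputs(0.04, 8, 0.6, 20.0, 50.0)
    better_locality = BankQueueInputs(0.04, 8, 0.9, 20.0, 50.0)
    print(f"RBH 0.6: {estimate_latency(base):.2f} cycles")
    print(f"RBH 0.9: {estimate_latency(better_locality):.2f} cycles")
```

Even this simplified form shows the qualitative coupling the abstract describes: a higher row buffer hit rate shortens the service time, which lowers utilization and therefore the queueing delay as well.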

Index Terms

ANATOMY: an analytical model of memory system performance
1. Computing methodologies
  1. Modeling and simulation
    1. Model development and analysis
      1. Modeling methodologies

Published In

SIGMETRICS '14: The 2014 ACM international conference on Measurement and modeling of computer systems

June 2014

614 pages

ISBN: 9781450327893

DOI: 10.1145/2591971

General Chairs: Sujay Sanghavi (The University of Texas at Austin), Sanjay Shakkottai (The University of Texas at Austin)

Program Chairs: Marc Lelarge (INRIA, France), Bianca Schroeder (University of Toronto)

ACM SIGMETRICS Performance Evaluation Review, Volume 42, Issue 1

June 2014

569 pages

ISSN: 0163-5999

DOI: 10.1145/2637364

Editors: Derek Eager (University of Saskatchewan), Carey Williamson (University of Calgary)

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMETRICS: ACM Special Interest Group on Measurement and Evaluation

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGMETRICS '14: ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems

Sponsor: SIGMETRICS

June 16 - 20, 2014

Austin, Texas, USA

Acceptance Rates

SIGMETRICS '14 Paper Acceptance Rate 40 of 237 submissions, 17%;

Overall Acceptance Rate 459 of 2,691 submissions, 17%

Bibliometrics

Total Citations: 11

Total Downloads: 781

Downloads (Last 12 months): 27

Downloads (Last 6 weeks): 1

Reflects downloads up to 26 Jul 2024

Cited By

Pandey S, Venkatesh T (2022). Performance investigation of packet-based communication in 3D-memories. The Journal of Supercomputing 78:17 (19070-19096). Online publication date: 15-Jun-2022.
https://doi.org/10.1007/s11227-022-04605-1

Salkhordeh R, Mutlu O, Asadi H (2019). An Analytical Model for Performance and Lifetime Estimation of Hybrid DRAM-NVM Main Memories. IEEE Transactions on Computers 68:8 (1114-1130). Online publication date: 1-Aug-2019.
https://dl.acm.org/doi/10.1109/TC.2019.2906597

Panda R, John L (2018). HALO. Proceedings of the 2018 International Conference on Supercomputing (118-128). Online publication date: 12-Jun-2018.
https://dl.acm.org/doi/10.1145/3205289.3205323

Tarvo A, Reiss S (2018). Automatic performance prediction of multithreaded programs. Automated Software Engineering 25:1 (101-155). Online publication date: 1-Mar-2018.
https://dl.acm.org/doi/10.1007/s10515-017-0214-5

Dublish S, Nagarajan V, Topham N (2017). Evaluating and mitigating bandwidth bottlenecks across the memory hierarchy in GPUs. 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (239-248). Online publication date: May-2017.
https://doi.org/10.1109/ISPASS.2017.7975295

Huang Y, Li D (2017). Performance Modeling for Optimal Data Placement on GPU with Heterogeneous Memory Systems. 2017 IEEE International Conference on Cluster Computing (CLUSTER) (166-177). Online publication date: Oct-2017.
https://doi.org/10.1109/CLUSTER.2017.42

Bouache M, Glover J, Boukhobza J (2016). Analysis of Memory Performance: Mixed Rank Performance Across Microarchitectures. High Performance Computing (579-590). Online publication date: 6-Oct-2016.
https://doi.org/10.1007/978-3-319-46079-6_39

Gulur N, Mehendale M, Govindarajan R, John L, Smith C, Sachs K, Lladó C (2015). A Comprehensive Analytical Performance Model of DRAM Caches. Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering (157-168). Online publication date: 28-Jan-2015.
https://dl.acm.org/doi/10.1145/2668930.2688044

Wang H, Jog A (2019). Exploiting Latency and Error Tolerance of GPGPU Applications for an Energy-Efficient DRAM. 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) (362-374). Online publication date: Jul-2019.
https://doi.org/10.1109/DSN.2019.00046

Sun F, Ji K, Ling M, Shi L (2017). A trace-driven analytical model with less profiling overhead for DRAM access latencies. 2017 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM) (1-6). Online publication date: Aug-2017.
https://doi.org/10.1109/PACRIM.2017.8121883
