DOI: 10.1145/2591971.2591995
research-article

ANATOMY: an analytical model of memory system performance

Published: 16 June 2014

    Abstract

    Memory system design is increasingly influencing modern multi-core architectures from both performance and power perspectives. However, predicting the performance of memory systems is complex, compounded by the myriad design choices and parameters along multiple dimensions, namely (i) technology, (ii) design, and (iii) architectural choices. In this work, we construct an analytical model of the memory system to comprehend this diverse space and to study the impact of memory system parameters from latency and bandwidth perspectives. Our model, called ANATOMY, consists of two coupled components that together model the memory system accurately. The first component is a queuing model of memory that captures various design choices in detail, along with the impact of technological choices in memory systems. The second component is an analytical model that summarizes key workload characteristics, namely row buffer hit rate (RBH), bank-level parallelism (BLP), and request spread (S), which are used as inputs to the queuing model to estimate memory performance. We validate the model across a wide variety of memory configurations on 4, 8 and 16 cores using a total of 44 workloads. ANATOMY predicts memory latency with an average error of 8.1%, 4.1% and 9.7% over the 4-, 8- and 16-core configurations, respectively. We demonstrate the extensibility and applicability of our model by exploring a variety of memory design choices, such as the impact of clock speed, the benefit of multiple memory controllers, and the role of banks and channel width. We also demonstrate ANATOMY's ability to capture architectural elements such as scheduling mechanisms (using FR_FCFS and PAR_BS) and the impact of DRAM refresh cycles. In all of these studies, ANATOMY provides insight into the sources of memory performance bottlenecks and quantitatively predicts the benefit of redressing them.
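
    To make the coupling between workload characteristics and a queuing model more concrete, the following is a minimal illustrative sketch, not ANATOMY's actual equations (the abstract does not give them). It assumes a single M/D/1 queue per bank, a mean service time weighted by RBH, and an even split of the request rate across BLP banks. All function names, parameter names, and numeric values are hypothetical.

```python
def avg_bank_service_time(rbh, t_hit, t_miss):
    """Mean bank service time, weighting row-buffer hits and misses by RBH.

    rbh    : row buffer hit rate in [0, 1]
    t_hit  : assumed service time of a row-buffer hit (ns)
    t_miss : assumed service time of a row-buffer miss (ns)
    """
    if not 0.0 <= rbh <= 1.0:
        raise ValueError("rbh must be in [0, 1]")
    return rbh * t_hit + (1.0 - rbh) * t_miss


def md1_latency(arrival_rate, service_time):
    """Mean time in system for an M/D/1 queue (queueing delay + service).

    Uses the standard result W_q = rho * S / (2 * (1 - rho)),
    where rho = arrival_rate * service_time is the utilization.
    """
    rho = arrival_rate * service_time
    if rho >= 1.0:
        raise ValueError("queue is unstable: utilization >= 1")
    wait = rho * service_time / (2.0 * (1.0 - rho))
    return service_time + wait


def estimate_latency(total_rate, rbh, blp, t_hit, t_miss):
    """Hypothetical latency estimate for one bank.

    Assumption (not from the paper): requests spread evenly over
    `blp` concurrently busy banks, so each sees total_rate / blp.
    """
    service = avg_bank_service_time(rbh, t_hit, t_miss)
    return md1_latency(total_rate / blp, service)


if __name__ == "__main__":
    # Made-up inputs: 0.08 requests/ns, 60% hit rate, 4 banks in parallel.
    latency = estimate_latency(0.08, rbh=0.6, blp=4, t_hit=15.0, t_miss=45.0)
    print(f"estimated latency: {latency:.2f} ns")
```

    Under these assumptions, a higher RBH shortens the mean service time and a higher BLP lowers per-bank utilization, so both reduce the estimated latency. This matches the qualitative role the abstract assigns to these workload parameters.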

      Published In

      SIGMETRICS '14: The 2014 ACM international conference on Measurement and modeling of computer systems
      June 2014
      614 pages
      ISBN: 9781450327893
      DOI: 10.1145/2591971

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. analytical model
      2. dram
      3. memory system performance

      Qualifiers

      • Research-article

      Conference

      SIGMETRICS '14

      Acceptance Rates

      SIGMETRICS '14 Paper Acceptance Rate 40 of 237 submissions, 17%;
      Overall Acceptance Rate 459 of 2,691 submissions, 17%
