DOI: 10.5555/2523721.2523740
research-article

Writeback-aware bandwidth partitioning for multi-core systems with PCM

Published: 07 October 2013

Abstract

    Phase-Change Memory (PCM) has emerged as a promising low-power candidate to replace DRAM in main memory. A hybrid memory architecture that combines a large PCM with a small DRAM is a popular way to mitigate the undesirable characteristics of PCM writes. Because PCM writes are much slower than reads, writebacks from the last-level cache (LLC) consume a large portion of memory bandwidth and thus degrade performance. Effectively utilizing shared resources, such as the last-level cache and memory bandwidth, is crucial to achieving high performance in multi-core systems. Although existing memory bandwidth allocation schemes improve system performance, no current approach uses writeback information to partition bandwidth for hybrid memory. We use a writeback-aware analytic model to derive an allocation strategy for partitioning PCM bandwidth. From this model we derive Writeback-aware Bandwidth Partitioning (WBP), a new runtime mechanism that partitions PCM service cycles among applications. WBP uses a partitioning weight to indicate how much writebacks, in addition to LLC misses, count toward bandwidth allocation. A companion Dynamic Weight Adjustment (DWA) scheme selects the partitioning weight at runtime to maximize system performance. Simulation results show that, in an 8-core system, WBP and DWA improve weighted speedup by 24.9% over bandwidth partitioning schemes that do not take writebacks into account.
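
    To make the role of the partitioning weight concrete, the following is a minimal sketch of a weighted proportional split. The abstract does not give the allocation formula that the paper derives from its analytic model, so the linear rule used here (share proportional to LLC misses plus weight times writebacks) is purely an illustrative assumption, and the function name and parameters are hypothetical.

```python
def partition_service_cycles(misses, writebacks, weight, total_cycles):
    """Split total_cycles among applications.

    ASSUMPTION: each application's share is proportional to
    misses[i] + weight * writebacks[i]. This linear rule is only an
    illustration of a "partitioning weight"; it is not the formula
    derived in the paper.
    """
    if len(misses) != len(writebacks):
        raise ValueError("misses and writebacks must have the same length")
    if not misses:
        return []

    # Weighted demand per application: writebacks count `weight` times
    # as much as LLC misses.
    demand = [m + weight * w for m, w in zip(misses, writebacks)]
    total = sum(demand)

    # With no recorded traffic at all, fall back to an equal split.
    if total == 0:
        return [total_cycles / len(demand)] * len(demand)

    return [total_cycles * d / total for d in demand]


# Two applications with equal LLC misses; only the second issues writebacks.
# A weight of 0 ignores writebacks; a larger weight shifts cycles toward
# the write-heavy application.
for w in (0.0, 1.0, 2.0):
    print(w, partition_service_cycles([100, 100], [0, 100], w, 1000))
```

    With a weight of zero the sketch reduces to a writeback-oblivious split. A scheme in the spirit of DWA would choose among candidate weights at runtime based on observed performance; that selection logic is not described in the abstract and is not shown here.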

        Published In

        PACT '13: Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
        October 2013
        422 pages
        ISBN:9781479910212

        Publisher

        IEEE Press

        Author Tags

        1. analytic model
        2. memory bandwidth
        3. partitioning
        4. phase change memory

        Qualifiers

        • Research-article

        Acceptance Rates

        PACT '13 Paper Acceptance Rate 36 of 208 submissions, 17%;
        Overall Acceptance Rate 121 of 471 submissions, 26%

        Cited By

        • (2016) Symmetry-Agnostic Coordinated Management of the Memory Hierarchy in Multicore Systems. ACM Transactions on Architecture and Code Optimization 12:4 (1-26). DOI: 10.1145/2847254. Online publication date: 4-Jan-2016
        • (2015) Real-Time In-Memory Checkpointing for Future Hybrid Memory Systems. Proceedings of the 29th ACM on International Conference on Supercomputing (263-272). DOI: 10.1145/2751205.2751212. Online publication date: 8-Jun-2015
        • (2015) A Comprehensive Analytical Performance Model of DRAM Caches. Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering (157-168). DOI: 10.1145/2668930.2688044. Online publication date: 28-Jan-2015
        • (2014) ANATOMY. ACM SIGMETRICS Performance Evaluation Review 42:1 (505-517). DOI: 10.1145/2637364.2591995. Online publication date: 16-Jun-2014
        • (2014) ANATOMY. The 2014 ACM international conference on Measurement and modeling of computer systems (505-517). DOI: 10.1145/2591971.2591995. Online publication date: 16-Jun-2014
