DOI: 10.1145/2989081.2989094
research-article

Improving DRAM Bandwidth Utilization with MLP-Aware OS Paging

Published: 03 October 2016

Abstract

How well an application uses the available memory bank-level parallelism and channel bandwidth heavily affects its performance. Prior research has focused on improving bandwidth utilization through scheduling policies and request re-ordering techniques at the memory controller. However, the potential to extract memory performance through intelligent page allocation that maximizes the opportunity for bank-level parallelism and row buffer hits is often overlooked. The physical location of a page in memory strongly affects bank conflicts and the potential for prioritizing low-latency requests such as row buffer hits. We demonstrate that more intelligent virtual-to-physical paging mechanisms can reduce bank conflicts in memory and achieve higher bandwidth utilization. Such paging mechanisms can then form a basis for other request re-ordering techniques to further improve memory performance. In this study we focus only on virtual-to-physical paging techniques and demonstrate a 38.4% improvement in DRAM bandwidth utilization with a profile-based scheme. We study a wide variety of workloads from varied benchmark suites. We present results for profile-based paging techniques as well as preliminary results for dynamically adaptive ones. Our results demonstrate improved bandwidth utilization with DRAM-aware page layouts. Dynamic paging schemes further demonstrate the potential of run-time adaptive techniques for improving the bandwidth utilization of increasingly parallel multi-channel main memory systems.
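
The effect of physical page placement on bank conflicts can be illustrated with a toy model. The sketch below is not the paper's scheme: the DRAM geometry, the frame-to-bank mapping in `bank_of`, the one-frame-per-row simplification, and both layout functions are hypothetical choices made only for illustration. It counts how often an access finds a different frame open in the same bank, first when all pages share one bank and then when they are spread across banks.

```python
# Assumed DRAM geometry (hypothetical, not taken from the paper).
NUM_CHANNELS = 2
BANKS_PER_CHANNEL = 8
NUM_BANKS = NUM_CHANNELS * BANKS_PER_CHANNEL


def bank_of(frame):
    """Map a physical frame number to a (channel, bank) pair.

    Assumption: the lowest frame bits select the channel and the next bits
    select the bank, and each frame occupies exactly one DRAM row.
    """
    channel = frame % NUM_CHANNELS
    bank = (frame // NUM_CHANNELS) % BANKS_PER_CHANNEL
    return channel, bank


def count_bank_conflicts(trace, page_table):
    """Count accesses that find a different frame open in the target bank."""
    open_row = {}  # (channel, bank) -> frame currently held in the row buffer
    conflicts = 0
    for vpage in trace:
        frame = page_table[vpage]
        key = bank_of(frame)
        if key in open_row and open_row[key] != frame:
            conflicts += 1
        open_row[key] = frame
    return conflicts


def same_bank_layout(vpages):
    """Worst-case layout: every virtual page maps to channel 0, bank 0."""
    return {v: i * NUM_BANKS for i, v in enumerate(vpages)}


def bank_spread_layout(vpages):
    """Bank-aware layout: the first NUM_BANKS pages each get their own bank."""
    return {v: i for i, v in enumerate(vpages)}


if __name__ == "__main__":
    # Four pages accessed in an interleaved, round-robin pattern.
    trace = [0, 1, 2, 3] * 10
    pages = range(4)
    print("same bank:", count_bank_conflicts(trace, same_bank_layout(pages)))
    print("spread   :", count_bank_conflicts(trace, bank_spread_layout(pages)))
```

Under these assumptions the same-bank layout incurs a conflict on every access after the first (39 of 40), while the spread layout incurs none, since each page keeps its own row buffer open. Real address mappings and row sizes differ across memory controllers, so the numbers only indicate the direction of the effect.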

Cited By

  • (2023) A missing physical fitness test data classification method based on MLP. 2023 IEEE 10th International Conference on Cyber Security and Cloud Computing (CSCloud)/2023 IEEE 9th International Conference on Edge Computing and Scalable Cloud (EdgeCom), pp. 132-137. DOI: 10.1109/CSCloud-EdgeCom58631.2023.00031. Online publication date: Jul-2023
  • (2018) A case for richer cross-layer abstractions. Proceedings of the 45th Annual International Symposium on Computer Architecture, pp. 207-220. DOI: 10.1109/ISCA.2018.00027. Online publication date: 2-Jun-2018

    Published In

    MEMSYS '16: Proceedings of the Second International Symposium on Memory Systems
    October 2016
    463 pages
    ISBN: 9781450343053
    DOI: 10.1145/2989081
    © 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Bandwidth Utilization
    2. Bank-Level Parallelism
    3. Channels
    4. DRAM
    5. MLP
    6. Paging

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    MEMSYS '16
