Improving DRAM Bandwidth Utilization with MLP-Aware OS Paging

Authors:

Rishiraj A. Bheda,

Thomas M. Conte,

Jeffrey S. Vetter

MEMSYS '16: Proceedings of the Second International Symposium on Memory Systems

Pages 289 - 294

https://doi.org/10.1145/2989081.2989094

Published: 03 October 2016

Abstract

How well an application uses the available bank-level parallelism and channel bandwidth of memory heavily affects its performance. Prior research has focused on improving bandwidth utilization through scheduling policies and request-reordering techniques at the memory controller. However, the potential to improve memory performance through intelligent page allocation that maximizes the opportunity for bank-level parallelism and row-buffer hits is often overlooked. The physical location of a page in memory strongly affects bank conflicts and the potential for prioritizing low-latency requests such as row-buffer hits. We demonstrate that more intelligent virtual-to-physical paging mechanisms can reduce bank conflicts in memory and achieve higher bandwidth utilization. Such paging mechanisms can then form a basis for other request-reordering techniques to further improve memory performance. In this study we focus only on virtual-to-physical paging techniques and demonstrate a 38.4% improvement in DRAM bandwidth utilization with a profile-based scheme. We study a wide variety of workloads from several benchmark suites. We present results for profile-based paging techniques as well as preliminary results for dynamically adaptive ones. Our results demonstrate improved bandwidth utilization with DRAM-aware page layouts. The dynamic paging schemes further demonstrate the potential of run-time adaptive techniques to improve bandwidth utilization in increasingly parallel multi-channel main memory systems.
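
The abstract does not describe the paging policy in detail, so the following is only a minimal sketch of the general idea of DRAM-aware page placement, not the authors' scheme. It assumes a hypothetical page-interleaved address mapping (the constants `NUM_CHANNELS` and `BANKS_PER_CHANNEL`, the function `frame_to_bank`, and the class `BankAwareAllocator` are all illustrative names) and hands out physical frames so that consecutive allocations land in different (channel, bank) pairs.

```python
from collections import defaultdict

# Assumed memory geometry; real systems differ and often hash these bits.
NUM_CHANNELS = 2
BANKS_PER_CHANNEL = 8


def frame_to_bank(frame):
    """Map a physical frame number to (channel, bank).

    Hypothetical layout: the lowest frame bits pick the channel and the
    next bits pick the bank within that channel.
    """
    channel = frame % NUM_CHANNELS
    bank = (frame // NUM_CHANNELS) % BANKS_PER_CHANNEL
    return channel, bank


class BankAwareAllocator:
    """Toy allocator that spreads pages evenly over (channel, bank) pairs."""

    def __init__(self, num_frames):
        # Group free frames by the bank they map to.
        self.free = defaultdict(list)
        for frame in range(num_frames):
            self.free[frame_to_bank(frame)].append(frame)
        # Number of pages handed out so far from each bank.
        self.load = {key: 0 for key in self.free}

    def alloc(self):
        """Return a free frame from the least-loaded bank that has one."""
        candidates = [key for key, frames in self.free.items() if frames]
        if not candidates:
            raise MemoryError("no free frames")
        target = min(candidates, key=lambda key: (self.load[key], key))
        self.load[target] += 1
        return self.free[target].pop(0)


if __name__ == "__main__":
    allocator = BankAwareAllocator(num_frames=64)
    frames = [allocator.alloc() for _ in range(16)]
    print([frame_to_bank(f) for f in frames])
```

With this assumed geometry, the first sixteen allocations each fall in a distinct bank, so accesses to those pages can in principle proceed in parallel rather than conflicting in one bank. A profile-based or adaptive scheme of the kind the abstract mentions would replace the simple load counter with information about which pages are accessed together.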

Published In

MEMSYS '16: Proceedings of the Second International Symposium on Memory Systems

October 2016

463 pages

ISBN: 9781450343053

DOI: 10.1145/2989081

General Chair:
Bruce Jacob
University of Maryland

Copyright © 2016 ACM.

© 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

MEMSYS '16

MEMSYS '16: The Second International Symposium on Memory Systems

October 3 - 6, 2016

Alexandria, VA, USA
