research-article

Concurrent Dynamic Memory Coalescing on GoblinCore-64 Architecture

Authors: John D. Leidel, Yong Chen

MEMSYS '16: Proceedings of the Second International Symposium on Memory Systems

Pages 177 - 187

https://doi.org/10.1145/2989081.2989128

Published: 03 October 2016

Abstract

Most modern microprocessors use multi-level data caches as a primary optimization to reduce memory latency and increase the bandwidth perceived by an application. Data caches exploit spatial and temporal locality, which works well for applications that access memory in a linear fashion. However, applications that exhibit random or non-deterministic memory access patterns often induce a significant number of data cache misses, eroding the performance benefit the data cache would otherwise provide.

In response to the performance penalties inherent in non-deterministic applications, we have constructed a unique memory hierarchy within the GoblinCore-64 (GC64) architecture that is explicitly designed to extract memory performance from irregular memory access patterns. The GC64 architecture couples a RISC-V-based core, augmented with latency-hiding architectural features, to a memory hierarchy built on Hybrid Memory Cube (HMC) devices. To cope with the inherent non-determinism of such applications and to exploit the packetized interface presented by the HMC device, we develop a methodology and associated implementation of a dynamic memory coalescing unit for the GC64 memory hierarchy. This unit statistically samples memory requests from non-deterministic applications and coalesces them into the largest possible HMC payload requests.

In this work, we present two parallel methodologies, and associated implementations, for coalescing non-deterministic memory requests into the largest possible HMC request by constructing a binary tree representation of the live memory requests issued by disparate cores. We then present coalesced HMC memory request results for applications that exhibit linear and non-linear memory request patterns, compiled for a RISC-V core, and contrast them with results from a traditional memory hierarchy.
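The abstract does not spell out the tree algorithm or either of the two parallel methodologies. The following is a minimal, sequential Python sketch of the general idea under stated assumptions, not the GC64 implementation. Live requests are keyed by 16-byte block in an unbalanced binary search tree, and an in-order traversal yields sorted block indices. Contiguous runs are then carved into the largest payloads from an assumed size table: 16-byte multiples up to 128 bytes plus 256 bytes, loosely modeled on HMC 2.0 request sizes. Each payload must also stay within an assumed 256-byte block boundary. The names `coalesce`, `PAYLOAD_SIZES`, and `MAX_BLOCK_BYTES` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical parameters (assumptions, not taken from the paper):
FLIT_BYTES = 16          # requests are tracked at 16-byte granularity
MAX_BLOCK_BYTES = 256    # a coalesced request may not cross a 256-byte boundary
PAYLOAD_SIZES = (256, 128, 112, 96, 80, 64, 48, 32, 16)  # largest first


@dataclass
class Node:
    block: int                      # address // FLIT_BYTES
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], block: int) -> Node:
    """Insert a block index into an unbalanced BST; duplicate blocks are merged."""
    if root is None:
        return Node(block)
    node = root
    while node.block != block:
        if block < node.block:
            if node.left is None:
                node.left = Node(block)
            node = node.left
        else:
            if node.right is None:
                node.right = Node(block)
            node = node.right
    return root


def in_order(root: Optional[Node]) -> List[int]:
    """Iterative in-order traversal: returns block indices in ascending order."""
    out: List[int] = []
    stack: List[Node] = []
    node = root
    while stack or node is not None:
        while node is not None:
            stack.append(node)
            node = node.left
        node = stack.pop()
        out.append(node.block)
        node = node.right
    return out


def coalesce(addresses: List[int]) -> List[Tuple[int, int]]:
    """Return (byte_address, payload_bytes) requests covering every input address."""
    root: Optional[Node] = None
    for addr in addresses:
        root = insert(root, addr // FLIT_BYTES)
    blocks = in_order(root)

    requests: List[Tuple[int, int]] = []
    i = 0
    while i < len(blocks):
        # Extend j to the end of a run of contiguous block indices.
        j = i
        while j + 1 < len(blocks) and blocks[j + 1] == blocks[j] + 1:
            j += 1
        start, remaining = blocks[i], j - i + 1
        # Carve the run into the largest payloads that fit both the run length
        # and the space left before the next MAX_BLOCK_BYTES boundary.
        while remaining > 0:
            offset = (start * FLIT_BYTES) % MAX_BLOCK_BYTES
            size = next(s for s in PAYLOAD_SIZES
                        if s // FLIT_BYTES <= remaining
                        and offset + s <= MAX_BLOCK_BYTES)
            requests.append((start * FLIT_BYTES, size))
            start += size // FLIT_BYTES
            remaining -= size // FLIT_BYTES
        i = j + 1
    return requests


if __name__ == "__main__":
    print(coalesce([48, 0, 16, 32, 1024]))
```

For example, `coalesce([48, 0, 16, 32, 1024])` returns `[(0, 64), (1024, 16)]`: four out-of-order 16-byte requests become one 64-byte request, and the isolated address stays a 16-byte request. An unbalanced tree degrades to quadratic insertion cost on already-sorted input. A hardware unit, or either of the paper's parallel methods, would likely differ in structure and in how requests are sampled.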





Published In


MEMSYS '16: Proceedings of the Second International Symposium on Memory Systems

October 2016

463 pages

ISBN:9781450343053

DOI:10.1145/2989081

General Chair:
Bruce Jacob
University of Maryland

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National Science Foundation (CNS-1338078)

Conference

MEMSYS '16

MEMSYS '16: The Second International Symposium on Memory Systems

October 3 - 6, 2016

Alexandria, VA, USA


Bibliometrics

Total Citations: 7
Total Downloads: 145
Downloads (Last 12 months): 9
Downloads (Last 6 weeks): 0

Reflects downloads up to 13 Jan 2025


Cited By

Asiatici M, Ienne P (2021). Request, Coalesce, Serve, and Forget: Miss-Optimized Memory Systems for Bandwidth-Bound Cache-Unfriendly Applications on FPGAs. ACM Transactions on Reconfigurable Technology and Systems 15(2):1-33. Online publication date: 1-Dec-2021.
https://dl.acm.org/doi/10.1145/3466823

Asiatici M, Ienne P, Bazargan K, Neuendorffer S (2019). Stop Crying Over Your Cache Miss Rate. Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 310-319. Online publication date: 20-Feb-2019.
https://dl.acm.org/doi/10.1145/3289602.3293901

Leidel J, Wang X, Conlon F, Chen Y, Donofrio D, Fatollahi-Fard F, Keville K (2018). xBGAS. Proceedings of the Workshop on Memory Centric High Performance Computing, 22-26. Online publication date: 11-Nov-2018.
https://dl.acm.org/doi/10.1145/3286475.3286478

Leidel J, Jacob B (2018). Stake. Proceedings of the International Symposium on Memory Systems, 365-376. Online publication date: 1-Oct-2018.
https://dl.acm.org/doi/10.1145/3240302.3240307

Wang X, Leidel J, Chen Y (2018). Memory Coalescing for Hybrid Memory Cube. Proceedings of the 47th International Conference on Parallel Processing, 1-10. Online publication date: 13-Aug-2018.
https://dl.acm.org/doi/10.1145/3225058.3225062

Leidel J, Wang X, Chen Y (2018). GoblinCore-64: A RISC-V Based Architecture for Data Intensive Computing. 2018 IEEE High Performance Extreme Computing Conference (HPEC), 1-8. Online publication date: Sep-2018.
https://doi.org/10.1109/HPEC.2018.8547560

Lloyd S, Gokhale M, Jacob B (2017). Near memory key/value lookup acceleration. Proceedings of the International Symposium on Memory Systems, 26-33. Online publication date: 2-Oct-2017.
https://dl.acm.org/doi/10.1145/3132402.3132434
