DOI: 10.1145/2989081.2989128
research-article

Concurrent Dynamic Memory Coalescing on GoblinCore-64 Architecture

Published: 03 October 2016

Abstract

Most modern microprocessors rely on multi-level data caches as a primary optimization to reduce latency and increase the bandwidth perceived by an application. Data caches exploit spatial and temporal locality, so they work well for applications that access memory in a linear fashion. However, applications that exhibit random or non-deterministic memory access patterns often incur a significant number of data cache misses, which reduces the performance benefit the data cache would otherwise provide.
In response to the performance penalties inherent in non-deterministic applications, we have constructed a memory hierarchy within the GoblinCore-64 (GC64) architecture that is explicitly designed to extract memory performance from irregular memory access patterns. The GC64 architecture couples a RISC-V-based core with latency-hiding architectural features and a memory hierarchy built on Hybrid Memory Cube (HMC) devices. To cope with the inherent non-determinism of applications and to exploit the packetized interface presented by the HMC device, we develop a methodology and associated implementation of a dynamic memory coalescing unit for the GC64 memory hierarchy. This unit permits us to statistically sample memory requests from non-deterministic applications and coalesce them into the largest possible HMC payload requests.
In this work, we present two parallel methodologies and associated implementations for coalescing non-deterministic memory requests into the largest possible HMC request by constructing a binary tree representation of the live memory requests from disparate cores. We present the coalesced HMC memory request results for applications that exhibit linear and non-linear memory request patterns, compiled for a RISC-V core, and contrast them with a traditional memory hierarchy.
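
To make the idea of tree-based coalescing concrete, the following is a minimal illustrative sketch in Python. It is not the GC64 coalescing unit's algorithm, which the abstract does not spell out. The function name `coalesce`, the 16-byte granule, and the 256-byte maximum payload are all assumptions chosen for illustration. The sketch treats each touched granule as a leaf of a binary tree and repeatedly merges two aligned sibling ("buddy") blocks into their parent whenever both are present.

```python
def coalesce(addresses, granule=16, max_payload=256):
    """Merge byte addresses into aligned power-of-two (base, size) requests.

    Hypothetical helper: `granule` and `max_payload` are illustrative
    parameters, not values taken from the paper.
    """
    # Leaves of the tree: distinct granule-aligned blocks that were touched.
    blocks = {addr - addr % granule for addr in addresses}
    size = granule
    requests = []
    while size < max_payload and blocks:
        parents = set()
        merged = set()
        for base in blocks:
            buddy = base ^ size  # sibling block at this tree level
            if buddy in blocks:
                parents.add(min(base, buddy))  # parent block, aligned to 2 * size
                merged.add(base)
        # Blocks without a live sibling cannot grow; emit them at this size.
        requests.extend((base, size) for base in blocks - merged)
        blocks = parents
        size *= 2
    # Whatever reached the top level is emitted at the current size.
    requests.extend((base, size) for base in blocks)
    return sorted(requests)


# Six scattered requests collapse into two packets.
print(coalesce([0, 8, 16, 32, 48, 1024]))  # [(0, 64), (1024, 16)]
```

Under these assumptions, a linear access stream merges all the way up to the maximum payload, while an isolated request stays at the minimum size. This is the contrast between linear and non-linear request patterns that the abstract describes.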

Published In

MEMSYS '16: Proceedings of the Second International Symposium on Memory Systems
October 2016
463 pages
ISBN:9781450343053
DOI:10.1145/2989081

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Data-Intensive Computing
  2. GoblinCore-64
  3. Memory Coalescing
  4. Microcode
  5. Parallel Computing
  6. RISC-V

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • National Science Foundation (CNS-1338078)

Conference

MEMSYS '16

Cited By

  • (2021) Request, Coalesce, Serve, and Forget: Miss-Optimized Memory Systems for Bandwidth-Bound Cache-Unfriendly Applications on FPGAs. ACM Transactions on Reconfigurable Technology and Systems 15:2 (1-33). DOI: 10.1145/3466823. Online publication date: 1-Dec-2021
  • (2019) Stop Crying Over Your Cache Miss Rate. Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (310-319). DOI: 10.1145/3289602.3293901. Online publication date: 20-Feb-2019
  • (2018) xBGAS. Proceedings of the Workshop on Memory Centric High Performance Computing (22-26). DOI: 10.1145/3286475.3286478. Online publication date: 11-Nov-2018
  • (2018) Stake. Proceedings of the International Symposium on Memory Systems (365-376). DOI: 10.1145/3240302.3240307. Online publication date: 1-Oct-2018
  • (2018) Memory Coalescing for Hybrid Memory Cube. Proceedings of the 47th International Conference on Parallel Processing (1-10). DOI: 10.1145/3225058.3225062. Online publication date: 13-Aug-2018
  • (2018) GoblinCore-64: A RISC-V Based Architecture for Data Intensive Computing. 2018 IEEE High Performance extreme Computing Conference (HPEC) (1-8). DOI: 10.1109/HPEC.2018.8547560. Online publication date: Sep-2018
  • (2017) Near memory key/value lookup acceleration. Proceedings of the International Symposium on Memory Systems (26-33). DOI: 10.1145/3132402.3132434. Online publication date: 2-Oct-2017
