GraphIA: An In-situ Accelerator for Large-scale Graph Processing

Authors:

Yuan Xie

MEMSYS '18: Proceedings of the International Symposium on Memory Systems

Pages 79 - 84

https://doi.org/10.1145/3240302.3240312

Published: 01 October 2018

Abstract

Graph processing is widely used across many domains, but processing large-scale graphs has always been memory-bound. In-situ processing is a promising way to overcome the "memory wall" in such memory-intensive applications. Previous accelerator designs for graph processing focused only on integrating more computing units inside memories or on using more memory layers, rather than exploiting the massive parallelism available across memory banks. In this paper, we present GraphIA, an In-situ Accelerator for large-scale graph processing based on DRAM technology. GraphIA couples large-capacity memory with computing resources in DRAM by connecting multiple chips, each with computation circuits inside. GraphIA chips are organized into a scalable ring interconnect, which maximizes the bandwidth available to each chip with minimal connection overhead and scales to larger graphs by adding chips. DRAM banks are organized into heterogeneous edge banks and vertex banks that cooperate with customized peripheral circuits. Data duplication and scheduling schemes for the heterogeneous banks are further introduced to overcome the performance loss caused by irregular local and remote memory accesses in our multi-chip ring structure, achieving 1.63X and 1.16X speedups, respectively. In our extensive experiments, GraphIA achieves a 217X speedup over CPU-DRAM designs.
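To make the local versus remote access distinction in a multi-chip ring concrete, the following is a minimal toy model, not the GraphIA design itself. It assumes a hypothetical vertex placement (vertex IDs interleaved across chips) and a bidirectional ring; neither assumption is stated in the abstract, and the function names are illustrative only. For each edge it computes how many ring hops separate the chip that owns the source vertex from the chip that owns the destination vertex.

```python
from collections import Counter


def owner_chip(vertex: int, num_chips: int) -> int:
    """Hypothetical placement: vertices are interleaved across chips by ID."""
    return vertex % num_chips


def ring_hops(src_chip: int, dst_chip: int, num_chips: int) -> int:
    """Shortest hop count between two chips, assuming a bidirectional ring."""
    distance = abs(src_chip - dst_chip)
    return min(distance, num_chips - distance)


def classify_accesses(edges, num_chips: int) -> Counter:
    """Histogram of edges keyed by ring distance between the two owner chips."""
    histogram = Counter()
    for src, dst in edges:
        hops = ring_hops(owner_chip(src, num_chips),
                         owner_chip(dst, num_chips),
                         num_chips)
        histogram[hops] += 1
    return histogram


# A small made-up edge list, used only to exercise the model.
edges = [(0, 1), (0, 4), (1, 2), (2, 6), (3, 5), (5, 7), (6, 0), (7, 3)]
histogram = classify_accesses(edges, num_chips=4)

local_accesses = histogram[0]                      # both endpoints on one chip
remote_accesses = sum(count for hops, count in histogram.items() if hops > 0)

print("edges per hop distance:", dict(histogram))
print("local:", local_accesses, "remote:", remote_accesses)
```

In a model like this, every edge with a nonzero hop count stands for a remote access that must traverse the ring, which is the kind of irregular traffic the duplication and scheduling schemes described in the abstract are meant to reduce.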

Index Terms

1. Computer systems organization
  1. Dependable and fault-tolerant systems and networks
    1. Processors and memory architectures
2. Hardware
  1. Emerging technologies
    1. Memory and dense storage

Published In

MEMSYS '18: Proceedings of the International Symposium on Memory Systems

October 2018

361 pages

ISBN:9781450364751

DOI:10.1145/3240302

General Chair:
Bruce Jacob
University of Maryland

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MEMSYS '18

MEMSYS '18: The International Symposium on Memory Systems

October 1 - 4, 2018

Alexandria, Virginia, USA

Bibliometrics & Citations

Article Metrics

Total Citations: 9
Total Downloads: 597
Downloads (Last 12 months): 113
Downloads (Last 6 weeks): 18

Reflects downloads up to 01 Nov 2024

Cited By

Peng W, Chen J, Huang W, Huang Y (2024). MRH-GCN: An Efficient GCN Accelerator for Multi-Relation Heterogeneous Graph. 2024 IEEE 32nd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 197-203. Online publication date: 5-May-2024.
https://doi.org/10.1109/FCCM60383.2024.00030
Al-Hawaj K, Ta T, Cebry N, Agwa S, Afuye O, Hall E, Golden C, Apsel A, Batten C (2023). EVE: Ephemeral Vector Engines. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 691-704. Online publication date: Feb-2023.
https://doi.org/10.1109/HPCA56546.2023.10071074
Shah N, Meert W, Verhelst M (2023). DAG Processing Unit Version 1 (DPU): Efficient Execution of Irregular Workloads on a Multicore Processor. Efficient Execution of Irregular Dataflow Graphs, 69-88. Online publication date: 26-Apr-2023.
https://doi.org/10.1007/978-3-031-33136-7_4
Shah N, Meert W, Verhelst M (2023). Irregular Workloads at Risk of Losing the Hardware Lottery. Efficient Execution of Irregular Dataflow Graphs, 1-21. Online publication date: 26-Apr-2023.
https://doi.org/10.1007/978-3-031-33136-7_1
Shah N, Olascoaga L, Zhao S, Meert W, Verhelst M (2022). DPU: DAG Processing Unit for Irregular Graphs With Precision-Scalable Posit Arithmetic in 28 nm. IEEE Journal of Solid-State Circuits 57:8, 2586-2596. Online publication date: Aug-2022.
https://doi.org/10.1109/JSSC.2021.3134897
Oliveira G, Boroumand A, Ghose S, Gomez-Luna J, Mutlu O (2022). Heterogeneous Data-Centric Architectures for Modern Data-Intensive Applications: Case Studies in Machine Learning and Databases. 2022 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 273-278. Online publication date: Jul-2022.
https://doi.org/10.1109/ISVLSI54635.2022.00060
Challapalle N, Rampalli S, Song L, Chandramoorthy N, Swaminathan K, Sampson J, Chen Y, Narayanan V, Martínez J, Duato J, Eeckhout L (2020). GaaS-X. Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture, 433-445. Online publication date: 30-May-2020.
https://dl.acm.org/doi/10.1109/ISCA45697.2020.00044
Rheindt S, Fried A, Lenke O, Nolte L, Sabirov T, Twardzik T, Wild T, Herkersdorf A (2020). X-CEL: A Method to Estimate Near-Memory Acceleration Potential in Tile-Based MPSoCs. Architecture of Computing Systems – ARCS 2020, 109-123. Online publication date: 9-Jul-2020.
https://doi.org/10.1007/978-3-030-52794-5_9
Rheindt S, Fried A, Lenke O, Nolte L, Wild T, Herkersdorf A (2019). NEMESYS. Proceedings of the International Symposium on Memory Systems, 3-18. Online publication date: 30-Sep-2019.
https://dl.acm.org/doi/10.1145/3357526.3357545
Li S, Glova A, Hu X, Gu P, Niu D, Malladi K, Zheng H, Brennan B, Xie Y, Oskin M, Inoue K (2018). SCOPE. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 696-709. Online publication date: 20-Oct-2018.
https://dl.acm.org/doi/10.1109/MICRO.2018.00062
