DOI: 10.1109/DAC18074.2021.9586184
research-article

Skew-Oblivious Data Routing for Data Intensive Applications on FPGAs with HLS

Published: 05 December 2021

Abstract

FPGAs have emerged as a computing infrastructure for accelerating datacenter applications, and high-level synthesis (HLS) tools have been proposed to ease FPGA programming. Even with HLS, irregular data-intensive applications require explicit optimizations. A common one is to instantiate multiple processing elements (PEs), each owning a private BRAM-based buffer, so that multiple data items can be processed per cycle. Compared with statically assigning data to PEs, data routing, which dynamically dispatches each data item to its designated PE, avoids replicating data across buffers and therefore saves BRAM. However, workload imbalance among PEs severely degrades performance when processing skewed datasets. In this paper, we propose a skew-oblivious data routing architecture that allocates secondary PEs and schedules them at run time to share the workload of overloaded PEs. We further integrate the proposed architecture into a framework called Ditto to minimize the development effort for applications that require skew handling. We evaluate Ditto on five commonly used applications: histogram building, data partitioning, PageRank, heavy hitter detection, and HyperLogLog. The results demonstrate that the generated implementations are robust to skewed datasets and outperform state-of-the-art designs in both throughput and BRAM usage efficiency.
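The abstract combines two ideas: routing each item to the one PE that owns its key, and lending secondary PEs to overloaded primary PEs. The following Python sketch is a rough cycle-count model of both, not the Ditto architecture itself. It assumes routing by `key % num_pes`, that every PE (primary or secondary) consumes one item per cycle, that routing never stalls, and that secondary PEs are reassigned once per fixed-size window by a greedy rule that always helps the primary PE with the largest per-worker load. The functions `route`, `window_cycles`, and `total_cycles` are hypothetical names chosen for illustration.

```python
import heapq
import math
from collections import Counter


def route(key: int, num_pes: int) -> int:
    """Hypothetical routing rule: each key is owned by exactly one primary PE."""
    return key % num_pes


def window_cycles(keys, num_pes, num_secondary=0):
    """Cycles to drain one window of keys.

    Assumes every PE consumes one item per cycle and routing never stalls,
    so the window finishes when the most loaded worker finishes.
    """
    load = Counter(route(k, num_pes) for k in keys)
    helpers = [0] * num_pes  # secondary PEs lent to each primary PE
    # Max-heap (via negated keys) of per-worker load for each primary PE.
    heap = [(-load[p], p) for p in range(num_pes)]
    heapq.heapify(heap)
    for _ in range(num_secondary):
        # Greedy: lend the next secondary PE to the currently worst-off primary.
        _, p = heapq.heappop(heap)
        helpers[p] += 1
        heapq.heappush(heap, (-load[p] / (1 + helpers[p]), p))
    # A primary PE and its helpers split that PE's items evenly.
    return max(math.ceil(load[p] / (1 + helpers[p])) for p in range(num_pes))


def total_cycles(stream, num_pes, window, num_secondary=0):
    """Split the stream into windows and reschedule secondary PEs per window."""
    return sum(
        window_cycles(stream[i:i + window], num_pes, num_secondary)
        for i in range(0, len(stream), window)
    )


if __name__ == "__main__":
    uniform = list(range(512))  # distinct keys: load is balanced across PEs
    skewed = [0] * 512          # every key routes to primary PE 0
    for name, stream in (("uniform", uniform), ("skewed", skewed)):
        base = total_cycles(stream, num_pes=8, window=64)
        helped = total_cycles(stream, num_pes=8, window=64, num_secondary=4)
        print(f"{name}: {base} cycles without, {helped} cycles with 4 secondary PEs")
```

With 8 primary PEs and 64-item windows, the uniform stream drains in 64 cycles with or without secondary PEs, while the fully skewed stream takes 512 cycles without them and 104 with four of them (each window needs ceil(64 / 5) = 13 cycles). In a histogram-style application a secondary PE would presumably keep its own partial buffer for the keys it helps with, to be merged with the primary PE's buffer afterwards; that merge step and its BRAM cost are not modeled in this sketch.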


Cited By

  • (2022) ThunderGP: Resource-Efficient Graph Processing Framework on FPGAs with HLS. ACM Transactions on Reconfigurable Technology and Systems 15:4 (1-31). DOI: 10.1145/3517141. Online publication date: 9-Dec-2022



Published In

Guide Proceedings: 2021 58th ACM/IEEE Design Automation Conference (DAC)
Dec 2021
1380 pages

Publisher

IEEE Press


