DOI:10.1145/2847263.2847343
short-paper

A Case for Work-stealing on FPGAs with OpenCL Atomics

Published: 21 February 2016

Abstract

We provide a case study of work-stealing, a popular method for run-time load balancing, on FPGAs. Following the Cederman-Tsigas implementation for GPUs, we synchronize work-items not with locks, mutexes or critical sections, but instead with the atomic operations provided by Altera's OpenCL SDK. We evaluate work-stealing for FPGAs by synthesizing a K-means clustering algorithm on an Altera P385 D5 board, both with work-stealing and with a statically-partitioned load. When block RAM utilization is maximised in both cases, we find that work-stealing leads to a 1.5x speedup. This demonstrates that the ability to do load balancing at run-time can outweigh the drawback of using 'expensive' atomics on FPGAs. We hope that our case study will stimulate further research into the high-level synthesis of fine-grained, lock-free, concurrent programs.
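
The contrast drawn in the abstract between a statically-partitioned load and work-stealing can be illustrated with a toy discrete-time simulation. This is only a sketch under stated assumptions: it is not the paper's OpenCL kernel, it does not model atomics or FPGA hardware, and the function names, the integer task costs and the "steal from the longest queue" victim policy are all hypothetical choices made for illustration.

```python
from collections import deque


def static_makespan(task_costs, n_workers):
    """Finish time when tasks are split into contiguous blocks up front."""
    chunk = -(-len(task_costs) // n_workers)  # ceiling division
    loads = [sum(task_costs[i * chunk:(i + 1) * chunk])
             for i in range(n_workers)]
    return max(loads)


def stealing_makespan(task_costs, n_workers):
    """Same initial split, but an idle worker steals one task at a time.

    Assumes positive integer task costs. Stealing is modelled as free,
    which ignores the cost of the atomic operations the abstract mentions.
    """
    chunk = -(-len(task_costs) // n_workers)
    queues = [deque(task_costs[i * chunk:(i + 1) * chunk])
              for i in range(n_workers)]
    remaining = [0] * n_workers  # time left on each worker's current task
    time = 0
    while any(queues) or any(remaining):
        for w in range(n_workers):
            if remaining[w] == 0:
                if queues[w]:
                    # The owner takes work from the head of its own queue.
                    remaining[w] = queues[w].popleft()
                else:
                    # A thief takes work from the tail of the longest queue.
                    victim = max(range(n_workers),
                                 key=lambda v: len(queues[v]))
                    if queues[victim]:
                        remaining[w] = queues[victim].pop()
        # Every busy worker makes one time unit of progress.
        for w in range(n_workers):
            if remaining[w] > 0:
                remaining[w] -= 1
        time += 1
    return time


costs = [8, 8, 8, 8, 1, 1, 1, 1]      # deliberately unbalanced workload
print(static_makespan(costs, 2))      # → 32
print(stealing_makespan(costs, 2))    # → 20
```

The numbers produced here depend entirely on the made-up workload and say nothing about the 1.5x speedup reported in the abstract; the sketch only shows why moving work at run-time can help when the initial partition is unbalanced.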

References

[1] Altera. Altera SDK for OpenCL - Best Practices Guide. OCL003-14.1.0, 2014.
[2] Altera. Altera SDK for OpenCL - Programming Guide. OCL002-14.1.0, 2014.
[3] N. S. Arora, R. D. Blumofe, and C. G. Plaxton. Thread scheduling for multiprogrammed multiprocessors. In SPAA, 1998.
[4] A. Canis, J. Choi, M. Aldham, V. Zhang, A. Kammoona, J. Anderson, S. Brown, and T. Czajkowski. LegUp: High-level synthesis for FPGA-based processor/accelerator systems. In FPGA, 2011.
[5] D. Cederman and P. Tsigas. Dynamic load balancing using work-stealing. In GPU Computing Gems. Elsevier, 2012.
[6] T. Czajkowski, U. Aydonat, D. Denisenko, and J. Freeman. From OpenCL to high-performance hardware on FPGAs. In FPL, 2012.
[7] N. George, H. Lee, D. Novo, M. Owaida, D. Andrews, K. Olukotun, and P. Ienne. Automatic support for multi-module parallelism from computational patterns. In FPL, 2015.
[8] V. Gramoli. More than you ever wanted to know about synchronization. In PPoPP, 2015.
[9] D. Greaves and S. Singh. Kiwi: Synthesis of FPGA circuits from parallel programs. In FCCM, 2008.
[10] M. Hosseinabady and J. L. Nunez-Yanez. Optimised OpenCL workgroup synthesis for hybrid ARM-FPGA devices. In FPL, 2015.
[11] T. Kanungo, D. Mount, N. Netanyahu, C. Piatko, R. Silverman, and A. Wu. An efficient k-means clustering algorithm: Analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7):881-892, July 2002.
[12] S. Kestur, J. D. Davis, and E. S. Chung. Towards a universal FPGA matrix-vector multiplication architecture. In FCCM, 2012.
[13] Khronos Group. The OpenCL 1.0 Specification. 2009.
[14] H.-S. Kim, M. Ahn, J. A. Stratton, and W.-m. W. Hwu. Design evaluation of OpenCL compiler framework for coarse-grained reconfigurable arrays. In FPT, 2012.
[15] V. Kumar, A. Sbîrlea, A. Jayaraj, Z. Budimlić, D. Majeti, and V. Sarkar. Heterogeneous work-stealing across CPU and DSP cores. In HPEC, 2015.
[16] V. Mirian and P. Chow. Using an OpenCL framework to evaluate interconnect implementations on FPGAs. In FPL, 2014.
[17] T. T. Mutlugün and S.-D. Wang. OpenCL computing on FPGA using multiported shared memory. In FPL, 2015.
[18] B. Nahill, A. Ramdial, H. Zeng, M. Di Natale, and Z. Zilic. An FPGA implementation of wait-free data synchronization protocols. In ETFA, 2013.
[19] Z. Wang, B. He, and W. Zhang. A study of data partitioning on OpenCL-based FPGAs. In FPL, 2015.
[20] F. Winterstein, S. Bayliss, and G. A. Constantinides. High-level synthesis of dynamic data structures: A case study using Vivado HLS. In FPT, 2013.
[21] Xilinx. SDAccel Development Environment. UG1023 (v2015.1), 2015.

Published In

FPGA '16: Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
February 2016
298 pages
ISBN:9781450338561
DOI:10.1145/2847263
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. atomic operations
  2. high-level synthesis
  3. k-means clustering
  4. load balancing
  5. lock-free synchronisation
  6. parallelism

Qualifiers

  • Short-paper

Conference

FPGA'16

Acceptance Rates

FPGA '16 Paper Acceptance Rate 20 of 111 submissions, 18%;
Overall Acceptance Rate 125 of 627 submissions, 20%

Cited By

  • (2024) HardCilk: Cilk-like Task Parallelism for FPGAs. 2024 IEEE 32nd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), pp. 140-150. DOI:10.1109/FCCM60383.2024.00025. Online publication date: 5-May-2024
  • (2023) AM⁴: MRAM Crossbar Based CAM/TCAM/ACAM/AP for In-Memory Computing. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 13:1, pp. 408-421. DOI:10.1109/JETCAS.2023.3243222. Online publication date: Mar-2023
  • (2023) Will computing in memory become a new dawn of associative processors? Memories - Materials, Devices, Circuits and Systems, 4, article 100033. DOI:10.1016/j.memori.2023.100033. Online publication date: Jul-2023
  • (2022) ThunderGP: Resource-Efficient Graph Processing Framework on FPGAs with HLS. ACM Transactions on Reconfigurable Technology and Systems, 15:4, pp. 1-31. DOI:10.1145/3517141. Online publication date: 9-Dec-2022
  • (2022) GIRAF: General Purpose In-Storage Resistive Associative Framework. IEEE Transactions on Parallel and Distributed Systems, 33:2, pp. 276-287. DOI:10.1109/TPDS.2021.3065448. Online publication date: 1-Feb-2022
  • (2022) An Effective 2-Dimension Graph Partitioning for Work Stealing Assisted Graph Processing on Multi-FPGAs. IEEE Transactions on Big Data, 8:5, pp. 1247-1258. DOI:10.1109/TBDATA.2020.3035090. Online publication date: 1-Oct-2022
  • (2021) The semantics of shared memory in Intel CPU/FPGA systems. Proceedings of the ACM on Programming Languages, 5:OOPSLA, pp. 1-28. DOI:10.1145/3485497. Online publication date: 15-Oct-2021
  • (2021) ThunderGP. The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 69-80. DOI:10.1145/3431920.3439290. Online publication date: 17-Feb-2021
  • (2021) Evaluating the Performance of Integer Sum Reduction on an Intel GPU. 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 652-655. DOI:10.1109/IPDPSW52791.2021.00099. Online publication date: Jun-2021
  • (2021) Skew-Oblivious Data Routing for Data Intensive Applications on FPGAs with HLS. 2021 58th ACM/IEEE Design Automation Conference (DAC), pp. 937-942. DOI:10.1109/DAC18074.2021.9586184. Online publication date: 5-Dec-2021
