Bandwidth-Optimal Random Shuffling for GPUs

Mitchell, Rory; Stokes, Daniel; Frank, Eibe; Holmes, Geoffrey

arXiv:2106.06161v2 (cs)

[Submitted on 11 Jun 2021 (v1), last revised 3 Feb 2022 (this version, v2)]

Abstract: Linear-time algorithms traditionally used to shuffle data on CPUs, such as the Fisher-Yates method, are poorly suited to GPUs because of their inherent sequential dependencies, and existing parallel shuffling algorithms are unsuitable for GPU architectures because they incur a large number of read/write operations to high-latency global memory. To address this, we provide a method of generating pseudo-random permutations in parallel by fusing suitable pseudo-random bijective functions with stream compaction operations. Our algorithm, termed "bijective shuffle", trades increased per-thread arithmetic operations for reduced global memory transactions. It is work-efficient, deterministic, and requires only a single global memory read and write per shuffle input, thus maximising use of global memory bandwidth. To empirically demonstrate the correctness of the algorithm, we develop a statistical test for the quality of pseudo-random permutations based on kernel space embeddings. Experimental results show that the bijective shuffle algorithm outperforms competing algorithms on GPUs, achieving improvements of one to two orders of magnitude and approaching peak device bandwidth.
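
The core idea described in the abstract (a pseudo-random bijection fused with stream compaction) can be illustrated with a small sequential sketch. It rests on several assumptions that do not come from the paper. The bijection is a hypothetical balanced Feistel network with a SplitMix64-style round function, chosen only because any Feistel network is invertible by construction. The names `FeistelBijection` and `bijective_shuffle` are hypothetical. The bijection permutes a power-of-two domain of at least n indices (at most 4n here). Indices whose image falls outside [0, n) are discarded by an exclusive prefix sum over keep flags, so exactly n outputs remain and each input is read once and written once.

```python
from itertools import accumulate

MASK64 = (1 << 64) - 1


def _mix(x: int) -> int:
    """SplitMix64-style finalizer, used here as an assumed Feistel round function."""
    x = (x + 0x9E3779B97F4A7C15) & MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


class FeistelBijection:
    """Hypothetical pseudo-random bijection on [0, 2**(2*half_bits)) via a balanced Feistel network."""

    def __init__(self, n: int, seed: int, rounds: int = 4):
        assert n >= 1
        total_bits = max(2, (n - 1).bit_length())  # 2**total_bits >= n
        self.half_bits = (total_bits + 1) // 2     # round up so both halves have equal width
        self.half_mask = (1 << self.half_bits) - 1
        self.domain = 1 << (2 * self.half_bits)    # padded power-of-two domain, >= n
        self.keys = [_mix(seed * 0x100 + r) for r in range(rounds)]

    def __call__(self, x: int) -> int:
        left, right = x >> self.half_bits, x & self.half_mask
        for key in self.keys:
            # Each round (L, R) -> (R, L ^ F(R)) is invertible for any F,
            # so the composed map is a bijection on the padded domain.
            left, right = right, left ^ (_mix(right ^ key) & self.half_mask)
        return (left << self.half_bits) | right


def bijective_shuffle(data: list, seed: int) -> list:
    """Sequential sketch: bijection + stream compaction (not the authors' GPU kernel)."""
    n = len(data)
    if n == 0:
        return []
    f = FeistelBijection(n, seed)
    # Map phase: on a GPU, thread i would evaluate f(i) independently.
    targets = [f(i) for i in range(f.domain)]
    # Keep only images that are valid indices into data.
    flags = [1 if t < n else 0 for t in targets]
    # Exclusive prefix sum assigns each kept element its output slot.
    offsets = [0] + list(accumulate(flags))[:-1]
    out = [None] * n
    for i, t in enumerate(targets):
        if flags[i]:
            out[offsets[i]] = data[t]  # one read and one write per kept element
    return out


if __name__ == "__main__":
    print(bijective_shuffle(list(range(10)), seed=42))
```

In this sketch the map, flag, scan, and scatter steps run as separate Python loops. A GPU version would presumably fuse them into a single kernel with a device-wide scan, which is where the single read and write per input would come from. The same seed always produces the same permutation, which matches the determinism property stated above.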

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as: arXiv:2106.06161 [cs.DC] (or arXiv:2106.06161v2 [cs.DC] for this version)
DOI: https://doi.org/10.48550/arXiv.2106.06161

Submission history

From: Rory Mitchell
[v1] Fri, 11 Jun 2021 04:10:13 UTC (2,778 KB)
[v2] Thu, 3 Feb 2022 11:36:57 UTC (4,195 KB)
