research-article

BGPQ: A Heap-Based Priority Queue Design for GPUs

Authors:

Eddy Z. Zhang

ICPP '21: Proceedings of the 50th International Conference on Parallel Processing

Article No.: 9, Pages 1–10

https://doi.org/10.1145/3472456.3472463

Published: 05 October 2021

Abstract

Programming today’s many-core processors is challenging. Because of the enormous amount of parallelism, synchronization is expensive, so we need efficient data structures that provide automatic and scalable synchronization. In this paper, we focus on the priority queue. We develop a heap-based priority queue implementation for GPUs called BGPQ. BGPQ uses batched key nodes as its internal data representation, exploits both task parallelism and data parallelism, and is linearizable. We show that BGPQ achieves up to 88X speedup compared with four state-of-the-art CPU parallel priority queue implementations and up to 11.2X speedup over an existing GPU implementation. We also apply BGPQ to search problems, including 0-1 Knapsack and A* search, where it achieves 45X-100X and 12X-46X speedups, respectively, over the best-known concurrent CPU priority queues.
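The abstract names batched key nodes as BGPQ's internal representation but does not spell out the heap operations. The block below is a minimal sequential sketch of that general idea, assuming every node holds exactly k sorted keys and the heap property holds between whole batches (every key in a parent is at most every key in its children), maintained by a merge-and-split step between two nodes. The class and method names (`BatchedMinHeap`, `insert_batch`, `delete_batch`, `_merge_split`) are hypothetical. The sketch deliberately omits everything GPU-specific: there is no buffer for batches smaller than k, no locking, no parallel merge, and no linearizability argument, so it should not be read as the authors' implementation.

```python
from typing import List


class BatchedMinHeap:
    """Sequential min-heap whose nodes each hold a sorted batch of exactly k keys.

    Invariant (assumed for this sketch): every key in a node is <= every key in
    its child nodes, so the root batch always holds the k smallest keys.
    """

    def __init__(self, k: int) -> None:
        self.k = k
        self.nodes: List[List[int]] = []  # nodes[i] is a sorted list of k keys

    def _merge_split(self, lo: int, hi: int) -> None:
        # Merge two sorted batches: the k smallest keys stay in node `lo`,
        # the k largest keys go to node `hi`.
        merged = sorted(self.nodes[lo] + self.nodes[hi])
        self.nodes[lo], self.nodes[hi] = merged[: self.k], merged[self.k :]

    def insert_batch(self, keys: List[int]) -> None:
        """Insert a full batch of k keys as a new leaf, then sift it up."""
        if len(keys) != self.k:
            raise ValueError("this sketch only accepts full batches of k keys")
        self.nodes.append(sorted(keys))
        i = len(self.nodes) - 1
        while i > 0:
            parent = (i - 1) // 2
            if self.nodes[parent][-1] <= self.nodes[i][0]:
                break  # batch heap property already holds
            self._merge_split(parent, i)  # small keys move up, large keys down
            i = parent

    def delete_batch(self) -> List[int]:
        """Remove and return the k smallest keys (the root batch)."""
        if not self.nodes:
            raise IndexError("delete_batch from an empty heap")
        top = self.nodes[0]
        last = self.nodes.pop()
        if not self.nodes:
            return top  # the root was the only node
        self.nodes[0] = last  # move the last batch to the root, then sift down
        i, n = 0, len(self.nodes)
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            if left >= n:
                break  # leaf reached
            if right < n:
                # Give the larger half of both children to the child with the
                # larger maximum; keep sifting through the other child.
                if self.nodes[left][-1] >= self.nodes[right][-1]:
                    big, small = left, right
                else:
                    big, small = right, left
                self._merge_split(small, big)
            else:
                small = left
            if self.nodes[i][-1] <= self.nodes[small][0]:
                break  # node i is already <= both children
            self._merge_split(i, small)
            i = small
        return top
```

With k = 2, inserting the batches [9, 4], [7, 1], [8, 3], [6, 2], [5, 0] and then calling `delete_batch` five times returns [0, 1], [2, 3], [4, 5], [6, 7], [8, 9]. One plausible reading of the abstract's "data parallelism" is that the merge inside `_merge_split` is where many GPU threads would cooperate on a single pair of batches; this mapping is an interpretation, not something the abstract states.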

Information

Published In

ICPP '21: Proceedings of the 50th International Conference on Parallel Processing

August 2021

927 pages

ISBN:9781450390682

DOI:10.1145/3472456

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICPP 2021

ICPP 2021: 50th International Conference on Parallel Processing

August 9 - 12, 2021

Lemont, IL, USA

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%
