DOI: 10.1145/3472456.3472463

BGPQ: A Heap-Based Priority Queue Design for GPUs

Published: 05 October 2021

Abstract

Programming today’s many-core processors is challenging. Because of the enormous amount of parallelism, synchronization is expensive, so we need efficient data structures that provide automatic and scalable synchronization. In this paper, we focus on the priority queue data structure. We develop a heap-based priority queue implementation called BGPQ. BGPQ uses batched key nodes as its internal data representation, exploits both task parallelism and data parallelism, and is linearizable. We show that BGPQ achieves up to 88X speedup over four state-of-the-art parallel CPU priority queue implementations and up to 11.2X speedup over an existing GPU implementation. We also apply BGPQ to search problems, including 0-1 Knapsack and A* search, and achieve speedups of 45X-100X and 12X-46X, respectively, over the best known concurrent CPU priority queues.
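To make the phrase "batched key nodes" concrete, the following is a minimal sequential sketch in Python of a heap whose nodes each hold a sorted batch of k keys. It is not the BGPQ implementation: it runs on a single CPU thread, shows no locking or GPU data-parallel merging, and accepts only full batches of exactly k keys. The names `BatchedHeap`, `insert_batch`, and `delete_min_batch` are hypothetical and chosen only for this illustration. The invariant maintained is that the largest key in a node is no larger than the smallest key in either child, so the root always holds the k smallest keys.

```python
from heapq import merge


class BatchedHeap:
    """Sequential sketch: a binary heap whose nodes each hold k sorted keys."""

    def __init__(self, k):
        self.k = k
        self.nodes = []  # nodes[i] is a sorted list; children at 2i+1 and 2i+2

    def _split(self, a, b):
        """Merge two sorted batches and return (k smallest, k largest)."""
        merged = list(merge(a, b))
        return merged[:self.k], merged[self.k:]

    def insert_batch(self, keys):
        if len(keys) != self.k:
            raise ValueError("this sketch only accepts full batches of k keys")
        carry = sorted(keys)
        # Collect the ancestors of the new leaf, then visit them root first.
        path, i = [], len(self.nodes)
        while i > 0:
            i = (i - 1) // 2
            path.append(i)
        for i in reversed(path):
            # Each node keeps the k smallest keys; the rest move down.
            self.nodes[i], carry = self._split(self.nodes[i], carry)
        self.nodes.append(carry)

    def delete_min_batch(self):
        if not self.nodes:
            raise IndexError("delete from an empty heap")
        smallest = self.nodes[0]
        last = self.nodes.pop()
        if not self.nodes:
            return smallest  # the root was the only node
        self.nodes[0] = last
        i, n = 0, len(self.nodes)
        while True:
            l, r = 2 * i + 1, 2 * i + 2
            if l >= n:
                break
            if r < n:
                lo, hi = self._split(self.nodes[l], self.nodes[r])
                # The child with the larger maximum receives the k largest
                # keys, which keeps that child's subtree valid.
                if self.nodes[l][-1] > self.nodes[r][-1]:
                    l, r = r, l
                self.nodes[l], self.nodes[r] = lo, hi
            if self.nodes[i][-1] <= self.nodes[l][0]:
                break  # heap property already holds
            self.nodes[i], self.nodes[l] = self._split(self.nodes[i], self.nodes[l])
            i = l
        return smallest


heap = BatchedHeap(k=2)
for batch in ([5, 1], [4, 3], [2, 6], [0, 7]):
    heap.insert_batch(batch)
print(heap.delete_min_batch())  # [0, 1]
print(heap.delete_min_batch())  # [2, 3]
```

Each heap operation in this sketch touches one root-to-leaf path and performs a merge of two sorted batches per level. Those per-node merges are the kind of work a GPU could plausibly spread across the threads of a block, but that mapping is an assumption of this note rather than something shown here.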

Published In

ICPP '21: Proceedings of the 50th International Conference on Parallel Processing
August 2021
927 pages
ISBN: 9781450390682
DOI: 10.1145/3472456
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Batched Heap
  2. GPUs
  3. Priority Queue

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP 2021

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%
