A Generalized Parallel Prefix Sums Algorithm for Arbitrary Size Arrays

Sikorski, Andrzej; Wrosz, Izajasz; Lewandowski, Michał

doi:10.1007/978-3-031-30442-2_3

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13826))

Included in the following conference series:

International Conference on Parallel Processing and Applied Mathematics

512 Accesses

Abstract

The prefix sums algorithm is a fundamental parallel programming building block used to solve significant problems in engineering, mathematical software, and big data analytics. In this paper, we present a generalization of the work-efficient prefix sums algorithm introduced by Blelloch, which in its original form is particularly well-performing on highly parallel architectures. However, the algorithm works only with arrays whose size is a power of 2. While various solutions have been developed to alleviate this limitation, we propose a canonical extension of the classical algorithm, which preserves its original form and maintains the performance characteristics of the work-efficient algorithm.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 59.99; Price excludes VAT (USA)

Softcover Book: USD 79.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Parallel Computation for the All-Pairs Suffix-Prefix Problem

Constructing Generalized Suffix Trees on Distributed Parallel Platforms

Article 01 January 2023

Graph algorithms: parallelization and scalability

Article 21 September 2020

References

Intel Developer Cloud. https://devcloud.intel.com/. Accessed 30 Oct 2022
Aananthakrishnan, S., et al.: Piuma: programmable integrated unified memory architecture. arXiv preprint arXiv:2010.06277 (2020)
Akl, S.G.: Parallel Computation: Models and Methods. Prentice-Hall, Inc., Hoboken (1997)
Google Scholar
Bell, N., Garland, M.: Implementing sparse matrix-vector multiplication on throughput-oriented processors. In: Proceedings of the ACM/IEEE Conference on High Performance Computing, SC 2009, November 14–20, 2009, Portland, Oregon, USA. ACM (2009). https://doi.org/10.1145/1654059.1654078
Bilgic, B., Horn, B.K.P., Masaki, I.: Efficient integral image computation on the GPU. In: IEEE Intelligent Vehicles Symposium (IV), 2010, La Jolla, CA, USA, June 21–24, 2010, pp. 528–533. IEEE (2010). https://doi.org/10.1109/IVS.2010.5548142
Blelloch, G.E.: Scans as primitive parallel operations. IEEE Trans. Comput. 38(11), 1526–1538 (1989). https://doi.org/10.1109/12.42122
Article Google Scholar
Blelloch, G.E.: Prefix sums and their applications. Technical report. CMU-CS-90-190, School of Computer Science, Carnegie Mellon University, November 1990
Google Scholar
Blelloch, G.E., Little, J.J.: Parallel solutions to geometric problems in the scan model of computation. J. Comput. Syst. Sci. 48(1), 90–115 (1994). https://doi.org/10.1016/S0022-0000(05)80023-6
Article MathSciNet MATH Google Scholar
Chatterjee, S., Blelloch, G.E., Zagha, M.: Scan primitives for vector computers. In: Martin, J.L., Pryor, D.V., Montry, G.R. (eds.) Proceedings Supercomputing ’90, New York, NY, USA, November 12–16, 1990, pp. 666–675. IEEE Computer Society (1990). https://doi.org/10.1109/SUPERC.1990.130084
Cole, R., Vishkin, U.: Faster optimal parallel prefix sums and list ranking. Inf. Comput. 81(3), 334–352 (1989). https://doi.org/10.1016/0890-5401(89)90036-9
Article MathSciNet MATH Google Scholar
Elliott, C.: Generic functional parallel algorithms: scan and FFT. Proc. ACM Program. Lang. 1(ICFP), 7:1–7:25 (2017). https://doi.org/10.1145/3110251
Fisher, A.L., Ghuloum, A.M.: Parallelizing complex scans and reductions. In: Sarkar, V., Ryder, B.G., Soffa, M.L. (eds.) Proceedings of the ACM SIGPLAN 1994 Conference on Programming Language Design and Implementation (PLDI), Orlando, Florida, USA, June 20–24, 1994, pp. 135–146. ACM (1994). https://doi.org/10.1145/178243.178255
Harris, M., Sengupta, S., Owens, J.D.: Chapter 39. parallel prefix sum (scan) with CUDA (2007). https://developer.nvidia.com/gpugems/gpugems3/part-vi-gpu-computing/chapter-39-parallel-prefix-sum-scan-cuda
Hillis, W.D., Jr., G.L.S.: Data parallel algorithms. Commun. ACM. 29(12), 1170–1183 (1986). https://doi.org/10.1145/7902.7903
Iso/iec 14882:2020(en) programming languages - c++. Standard, International Organization for Standardization, Geneva, Switzerland, December 2020
Google Scholar
Ladner, R.E., Fischer, M.J.: Parallel prefix computation. J. ACM 27(4), 831–838 (1980). https://doi.org/10.1145/322217.322232
Article MathSciNet MATH Google Scholar
Lakhotia, K., Petrini, F., Kannan, R., Prasanna, V.K.: Accelerating all reduce with in-network reduction on intel PIUMA. IEEE Micro 42(2), 44–52 (2022)
Article Google Scholar
Lakshmivarahan, S., Dhall, S.K.: Parallel Computing Using the Prefix Problem. Oxford University Press, Oxford (1994)
Google Scholar
Merrill, D., Garland, M., Grimshaw, A.S.: High-performance and scalable GPU graph traversal. ACM Trans. Parallel Comput. 1(2), 14:1–14:30 (2015). https://doi.org/10.1145/2717511
Morihata, A.: Lambda calculus with algebraic simplification for reduction parallelisation: extended study. J. Funct. Program. 31, e7 (2021). https://doi.org/10.1017/S0956796821000058
Article MathSciNet MATH Google Scholar
Reinders, J., Ashbaugh, B., Brodman, J., Kinsner, M., Pennycook, J., Tian, X.: Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems Using C++ and SYCL. Springer Nature, CA (2021). https://doi.org/10.1007/978-1-4842-5574-2
Book Google Scholar
Sanders, P., Träff, J.L.: Parallel prefix (scan) algorithms for MPI. In: Mohr, B., Träff, J.L., Worringen, J., Dongarra, J.J. (eds.) Recent Advances in Parallel Virtual Machine and Message Passing Interface, 13th European PVM/MPI User’s Group Meeting, Bonn, Germany, September 17–20, 2006, Proceedings. LNCS, vol. 4192, pp. 49–57. Springer, Cham (2006). https://doi.org/10.1007/11846802_15
Satish, N., Harris, M.J., Garland, M.: Designing efficient sorting algorithms for manycore GPUs. In: 23rd IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2009, Rome, Italy, May 23–29, 2009, pp. 1–10. IEEE (2009). https://doi.org/10.1109/IPDPS.2009.5161005
Sengupta, S., Lefohn, A., Owens, J.D.: A work-efficient step-efficient prefix sum algorithm (2006)
Google Scholar
Sycl 2020 specification (revision 5). Standard, Khronos Group, Beaverton, OR, USA, May 2022
Google Scholar
Zhang, N.: A novel parallel prefix sum algorithm and its implementation on multi-core platforms. In: 2010 2nd International Conference on Computer Engineering and Technology, vol. 2, pp. V2–66-V2-70. IEEE (2010). https://doi.org/10.1109/ICCET.2010.5485315
Zhang, W., Wang, Y., Ross, K.A.: Parallel prefix sum with SIMD. In: Bordawekar, R., Lahiri, T. (eds.) International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures, ADMS@VLDB 2020, Tokyo, Japan, August 31, 2020. pp. 1–11 (2020)
Google Scholar

Download references

Author information

Authors and Affiliations

Intel, Gdańsk, Poland
Andrzej Sikorski, Izajasz Wrosz & Michał Lewandowski
Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, Gdańsk, Poland
Izajasz Wrosz

Authors

Andrzej Sikorski
View author publications
You can also search for this author in PubMed Google Scholar
Izajasz Wrosz
View author publications
You can also search for this author in PubMed Google Scholar
Michał Lewandowski
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Izajasz Wrosz .

Editor information

Editors and Affiliations

Czestochowa University of Technology, Czestochowa, Poland
Roman Wyrzykowski
University of Tennessee, Knoxville, TN, USA
Jack Dongarra
University of Southern California, Marina del Rey, CA, USA
Ewa Deelman
Czestochowa University of Technology, Czestochowa, Poland
Konrad Karczewski

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Sikorski, A., Wrosz, I., Lewandowski, M. (2023). A Generalized Parallel Prefix Sums Algorithm for Arbitrary Size Arrays. In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2022. Lecture Notes in Computer Science, vol 13826. Springer, Cham. https://doi.org/10.1007/978-3-031-30442-2_3

Download citation

DOI: https://doi.org/10.1007/978-3-031-30442-2_3
Published: 28 April 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30441-5
Online ISBN: 978-3-031-30442-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

A Generalized Parallel Prefix Sums Algorithm for Arbitrary Size Arrays

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Parallel Computation for the All-Pairs Suffix-Prefix Problem

Constructing Generalized Suffix Trees on Distributed Parallel Platforms

Graph algorithms: parallelization and scalability

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

A Generalized Parallel Prefix Sums Algorithm for Arbitrary Size Arrays

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Parallel Computation for the All-Pairs Suffix-Prefix Problem

Constructing Generalized Suffix Trees on Distributed Parallel Platforms

Graph algorithms: parallelization and scalability

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation