DOI: 10.1145/3458744.3474041
research-article
Public Access

Implementing Arbitrary/Common Concurrent Writes of CRCW PRAM

Published: 23 September 2021

Abstract

The Parallel Random Access Machine (PRAM) abstraction is the simplest and most elegant algorithmic model for the design and analysis of parallel algorithms. It comprises several models, categorized by the memory access mode they permit, the most powerful of which is the Concurrent Read Concurrent Write (CRCW) model. A PRAM algorithm describes a series of rounds, each consisting of a collection of operations that can be executed concurrently within the same time step. However, the lack of support for concurrent memory accesses and the prevalence of asynchronous programming models have led to the belief that implementing CRCW PRAM algorithms is unattainable, and have prompted many to avoid this model except in theoretical studies of optimal performance.
In this work, we study the arbitrary and common concurrent writes of the CRCW PRAM model and explore the challenges of implementing them on general-purpose systems. We also examine current practices for implementing common and arbitrary concurrent writes, and propose a new efficient, lightweight, and thread-safe method that implements concurrent writes using atomic instructions. To demonstrate the efficacy of our method, we developed OpenMP kernels for classical CRCW PRAM algorithms and provide experimental results and comparisons based on run-time performance measured on the x86 multicore architecture. Our results show speedups of up to 4.5x over current practices across all our benchmarks.
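To make the idea of an atomic-based arbitrary concurrent write concrete, the following is a minimal sketch and not necessarily the method proposed in the paper. It assumes a guard word per shared cell that records the last round in which the cell was written; within a round, the single thread whose compare-and-swap advances the guard performs the store, and all other writers back off. The names `GuardedCell`, `arbitrary_write`, and `run_round` are hypothetical, rounds are assumed to be numbered from 1 and separated by barriers, and combining `std::atomic` with an OpenMP loop is assumed to behave as it does on common compilers.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical helper (not from the paper): a shared cell plus a guard that
// records the last round in which the cell was written.
struct GuardedCell {
    std::atomic<int> last_round{0};  // 0 means "never written"; rounds start at 1
    int value = 0;
};

// Attempt an arbitrary concurrent write in `round`. Assuming rounds only
// increase and are separated by barriers, exactly one caller per round wins
// the compare-and-swap and performs the store; the rest return false without
// touching `value`.
bool arbitrary_write(GuardedCell& cell, int round, int v) {
    int seen = cell.last_round.load(std::memory_order_relaxed);
    if (seen >= round) return false;  // another writer already claimed this round
    if (!cell.last_round.compare_exchange_strong(seen, round,
                                                 std::memory_order_acq_rel,
                                                 std::memory_order_relaxed)) {
        return false;  // lost the race to a writer in the same round
    }
    cell.value = v;  // only the winner reaches this store
    return true;
}

// One PRAM round: every "processor" i proposes candidates[i] for the cell.
// Returns how many writers won, which should be 1 for a non-empty input.
int run_round(GuardedCell& cell, int round, const std::vector<int>& candidates) {
    assert(round > 0);
    int winners = 0;
    const long n = static_cast<long>(candidates.size());
    #pragma omp parallel for reduction(+ : winners)
    for (long i = 0; i < n; ++i) {
        if (arbitrary_write(cell, round, candidates[i])) ++winners;
    }
    // The implicit barrier at the end of the parallel loop ends the round,
    // so the caller can safely read cell.value afterwards.
    return winners;
}
```

Calling `run_round(cell, 1, {7, 11, 13, 17})` leaves one of the four candidates in `cell.value`; which one is unspecified, matching the arbitrary-write semantics. A common concurrent write, where all writers store the same value, could reuse the same guard, since any winner produces the same result.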

Published In

ICPP Workshops '21: 50th International Conference on Parallel Processing Workshop
August 2021
314 pages
ISBN:9781450384414
DOI:10.1145/3458744
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Arbitrary Concurrent Writes
  2. CRCW PRAM
  3. Parallel Algorithms
  4. Parallel Architectures
  5. Write-conflict resolution

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP 2021

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%
