We present the Stanford Parallel Applications for Shared-Memory (SPLASH), a set of parallel applications for use in the design and evaluation of shared-memory multiprocessing systems. Our goal is to provide a suite of realistic applications that will serve as a well-documented and consistent basis for evaluation studies. We describe the applications currently in the suite in detail, discuss and compare some of their important characteristics (such as data locality, granularity, and synchronization) and explore their behavior by running them on a real multiprocessor as well as on a simulator of an idealized parallel architecture. We expect the current set of applications to act as a nucleus for a suite that will grow with time. This report replaces and updates CSL-TR-91-469, April 1991.
Cited By
- Nongpoh B, Ray R and Banerjee A Approximate computing for multithreaded programs in shared memory architectures Proceedings of the 17th ACM-IEEE International Conference on Formal Methods and Models for System Design, (1-9)
- Subramaniam S, Steely S, Hasenplaugh W, Jaleel A, Beckmann C, Fossum T and Emer J (2013). Using in-flight chains to build a scalable cache coherence protocol, ACM Transactions on Architecture and Code Optimization, 10:4, (1-24), Online publication date: 1-Dec-2013.
- Kasikci B, Zamfir C and Candea G RaceMob Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, (406-422)
- Krishnaiah G, Silpa B, Panda P and Kumar A Exploiting temporal decoupling to accelerate trace-driven NoC emulation Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, (315-324)
- Bocchino R and Adve V Types, regions, and effects for safe programming with object-oriented parallel frameworks Proceedings of the 25th European conference on Object-oriented programming, (306-332)
- Patel A, Afram F, Chen S and Ghose K MARSS Proceedings of the 48th Design Automation Conference, (1050-1055)
- Krishnaiah G, Silpa B, Panda P and Kumar A FastFwd Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, (247-256)
- Rountree B, Lowenthal D, Funk S, Freeh V, de Supinski B and Schulz M Bounding energy consumption in large-scale MPI programs Proceedings of the 2007 ACM/IEEE conference on Supercomputing, (1-9)
- Analysis of Shared Memory Misses and Reference Patterns Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
- Goudreau M, Lang K, Rao S, Suel T and Tsantilas T (1999). Portable and Efficient Parallel Computing Using the BSP Model, IEEE Transactions on Computers, 48:7, (670-689), Online publication date: 1-Jul-1999.
- Yeung D The scalability of multigrain systems Proceedings of the 13th international conference on Supercomputing, (268-277)
- Chang Y and Bhuyan L (1999). An Efficient Tree Cache Coherence Protocol for Distributed Shared Memory Multiprocessors, IEEE Transactions on Computers, 48:3, (352-360), Online publication date: 1-Mar-1999.
- Bianchini R and Lim B (1996). Evaluating the Performance of Multithreading and Prefetching in Multiprocessors, Journal of Parallel and Distributed Computing, 37:1, (83-97), Online publication date: 25-Aug-1996.
- Goudreau M, Lang K, Rao S, Suel T and Tsantilas T Towards efficiency and portability Proceedings of the eighth annual ACM symposium on Parallel Algorithms and Architectures, (1-12)
- Lim B and Bianchini R Limits on the performance benefits of multithreading and prefetching Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, (37-46)
- Lim B and Bianchini R (1996). Limits on the performance benefits of multithreading and prefetching, ACM SIGMETRICS Performance Evaluation Review, 24:1, (37-46), Online publication date: 15-May-1996.
- Yeung D, Kubiatowicz J and Agarwal A MGS Proceedings of the 23rd annual international symposium on Computer architecture, (44-55)
- Yeung D, Kubiatowicz J and Agarwal A (1996). MGS, ACM SIGARCH Computer Architecture News, 24:2, (44-55), Online publication date: 1-May-1996.
- Brewer E, Gauthier P, Fox A and Schuett A Software Techniques for Improving MPP Bulk-Transfer Performance Proceedings of the 10th International Parallel Processing Symposium, (406-412)
- Woo S, Singh J and Hennessy J (1994). The performance advantages of integrating block data transfer in cache-coherent multiprocessors, ACM SIGOPS Operating Systems Review, 28:5, (219-229), Online publication date: 1-Dec-1994.
- Lim B and Agarwal A (1994). Reactive synchronization algorithms for multiprocessors, ACM SIGOPS Operating Systems Review, 28:5, (25-35), Online publication date: 1-Dec-1994.
- Woo S, Singh J and Hennessy J The performance advantages of integrating block data transfer in cache-coherent multiprocessors Proceedings of the sixth international conference on Architectural support for programming languages and operating systems, (219-229)
- Lim B and Agarwal A Reactive synchronization algorithms for multiprocessors Proceedings of the sixth international conference on Architectural support for programming languages and operating systems, (25-35)
- Woo S, Singh J and Hennessy J (1994). The performance advantages of integrating block data transfer in cache-coherent multiprocessors, ACM SIGPLAN Notices, 29:11, (219-229), Online publication date: 1-Nov-1994.
- Lim B and Agarwal A (1994). Reactive synchronization algorithms for multiprocessors, ACM SIGPLAN Notices, 29:11, (25-35), Online publication date: 1-Nov-1994.
- Chaiken D and Agarwal A Software-extended coherent shared memory Proceedings of the 21st annual international symposium on Computer architecture, (314-324)