... Torrellas, Josep; Gropp, Bill; Sarkar, Vivek; Moreno, Jaime; Olukotun, Kunle; University of Illinois at Urbana-Champaign. ... An extreme-scale system is one that is one thousand times more capable than a current comparable system, with the same power and physical footprint. ...
This collection of 45 papers and 13 posters from the September 2002 conference focuses on the software and hardware that will enable cluster computing. The researchers discuss task management, network hardware, programming clusters, and scalable clusters. Among the topics are experience in offloading ...
We present a simple auto-tuning method to improve the performance of sparse matrix-vector multiply (SpMV) on a GPU. The sparse matrix, stored in CSR format, is sorted in increasing order of the number of nonzero elements per row and partitioned into several ranges. A number of GPU threads per row (TPR) is then assigned to each range of rows separately, balancing the workload across the GPU threads. Tests show that the method performs well for most of the matrices tested, compared with the NVIDIA sparse package. The auto-tuning approach is easy to implement and the tuning process is fast. Unlike some other approaches to this problem, it does not require converting the matrix into several storage formats and trying each one in turn to find the best format for that matrix.
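The row-sorting and threads-per-row (TPR) idea in the SpMV abstract above can be illustrated with a small CPU-side simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the equal-size partitioning in `partition_ranges` and the power-of-two rule in `choose_tpr` are hypothetical stand-ins for the paper's tuning step, and the inner loop only mimics the strided partial sums that `tpr` GPU threads would compute for one row before a warp-level reduction.

```python
from typing import List, Tuple


def sort_rows_by_nnz(row_ptr: List[int]) -> List[int]:
    """Return row indices ordered by increasing nonzero count (CSR row_ptr)."""
    n = len(row_ptr) - 1
    return sorted(range(n), key=lambda r: row_ptr[r + 1] - row_ptr[r])


def partition_ranges(order: List[int], num_ranges: int) -> List[List[int]]:
    """Hypothetical: split the sorted rows into num_ranges contiguous, near-equal ranges."""
    n = len(order)
    if n == 0:
        return []
    size = -(-n // num_ranges)  # ceiling division
    return [order[i:i + size] for i in range(0, n, size)]


def choose_tpr(rows: List[int], row_ptr: List[int], max_tpr: int = 32) -> int:
    """Hypothetical heuristic: smallest power of two >= the longest row, capped at a warp."""
    longest = max((row_ptr[r + 1] - row_ptr[r] for r in rows), default=0)
    tpr = 1
    while tpr < longest and tpr < max_tpr:
        tpr *= 2
    return tpr


def spmv_simulated(row_ptr: List[int], col_idx: List[int], vals: List[float],
                   x: List[float], num_ranges: int = 4
                   ) -> Tuple[List[float], List[Tuple[int, int]]]:
    """Compute y = A x, returning y and the (rows in range, TPR) plan per range."""
    y = [0.0] * (len(row_ptr) - 1)
    plan: List[Tuple[int, int]] = []
    order = sort_rows_by_nnz(row_ptr)
    for rows in partition_ranges(order, num_ranges):
        tpr = choose_tpr(rows, row_ptr)
        plan.append((len(rows), tpr))
        for r in rows:
            start, end = row_ptr[r], row_ptr[r + 1]
            # Each of the tpr simulated threads accumulates a strided slice of the row.
            partial = [0.0] * tpr
            for k in range(start, end):
                partial[(k - start) % tpr] += vals[k] * x[col_idx[k]]
            # Combine the partial sums, as a warp reduction would on the GPU.
            y[r] = sum(partial)
    return y, plan
```

For the 3×3 matrix with rows `[1, 0, 2]`, `[0, 0, 0]`, `[3, 4, 5]` and `x = [1, 1, 1]`, calling `spmv_simulated([0, 2, 2, 5], [0, 2, 0, 1, 2], [1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 1.0, 1.0], num_ranges=2)` groups the two shorter rows under TPR 2 and the longest row under TPR 4, so short rows do not tie up as many threads as long ones.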
Proceedings of the Scalable Parallel Libraries Conference
... This provides an MPI implementation directly on a variety of parallel machines (IBM, Intel, CM-5), while supporting others (KSR, Ncube, Sequent), along with networks of workstations, via p4. A schematic diagram is shown in Figure 1. ...
Papers by W. Gropp