- research-article, October 2016
Multiscale Approximation with Graphical Processing Units for Multiplicative Speedup in Molecular Dynamics
BCB '16: Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, Pages 453–462. https://doi.org/10.1145/2975167.2975214
The timescales and structure sizes accessible via simulations of atomistic molecular dynamics (MD) can be advanced substantially by two independent techniques: (1) many-core parallelization with graphics processing units (GPUs) and (2) multiscale ...
- extended-abstract, April 2016
clSPARSE: A Vendor-Optimized Open-Source Sparse BLAS Library
IWOCL '16: Proceedings of the 4th International Workshop on OpenCL, Article No.: 7, Pages 1–4. https://doi.org/10.1145/2909437.2909442
Sparse linear algebra is a cornerstone of modern computational science. These algorithms ignore the zero-valued entries found in many domains in order to work on much larger problems at much faster rates than dense algorithms. Nonetheless, optimizing ...
- research-article, March 2016
Implementing directed acyclic graphs with the heterogeneous system architecture
GPGPU '16: Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit, Pages 53–62. https://doi.org/10.1145/2884045.2884052
Achieving optimal performance on heterogeneous computing systems requires a programming model that supports the execution of asynchronous, multi-stream, and out-of-order tasks in a shared memory environment. Asynchronous dependency-driven tasking is one ...
- Article, December 2015
Structural Agnostic SpMV: Adapting CSR-Adaptive for Irregular Matrices
HIPC '15: Proceedings of the 2015 IEEE 22nd International Conference on High Performance Computing (HiPC), Pages 64–74. https://doi.org/10.1109/HiPC.2015.55
Sparse matrix vector multiplication (SpMV) is an important linear algebra primitive. Recent research has focused on improving the performance of SpMV on GPUs when using compressed sparse row (CSR), the most frequently used matrix storage format on CPUs. ...
- Article, October 2015
Exploring Parallel Programming Models for Heterogeneous Computing Systems
IISWC '15: Proceedings of the 2015 IEEE International Symposium on Workload Characterization, Pages 98–107. https://doi.org/10.1109/IISWC.2015.16
Parallel systems that employ CPUs and GPUs as two heterogeneous computational units have become immensely popular due to their ability to maximize performance under restrictive thermal budgets. However, programming heterogeneous systems via traditional ...
- Article, May 2015
On the Performance, Energy, and Power of Data-Access Methods in Heterogeneous Computing Systems
IPDPSW '15: Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, Pages 871–879. https://doi.org/10.1109/IPDPSW.2015.131
Graphics processing units (GPUs) have delivered promising speedups in data-parallel applications. A discrete GPU resides on the PCIe interface and has traditionally required data to be moved from the host memory to the GPU memory via PCIe. In certain ...
- research-article, November 2014
Efficient sparse matrix-vector multiplication on GPUs using the CSR storage format
SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Pages 769–780. https://doi.org/10.1109/SC.2014.68
The performance of sparse matrix vector multiplication (SpMV) is important to computational scientists. Compressed sparse row (CSR) is the most frequently used format to store sparse matrices. However, CSR-based SpMV on graphics processing units (GPUs) ...
- Article, November 2012
Exploiting Coarse-Grained Parallelism in B+ Tree Searches on an APU
SCC '12: Proceedings of the 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, Pages 240–247. https://doi.org/10.1109/SC.Companion.2012.40
B+ tree structured index searches are one of the fundamental database operations and hence, accelerating them is essential. GPUs provide a compelling mix of performance per watt and performance per dollar, and thus are an attractive platform for ...
- Article, December 2011
Architecture-Aware Mapping and Optimization on a 1600-Core GPU
ICPADS '11: Proceedings of the 2011 IEEE 17th International Conference on Parallel and Distributed Systems, Pages 316–323. https://doi.org/10.1109/ICPADS.2011.29
The graphics processing unit (GPU) continues to make in-roads as a computational accelerator for high-performance computing (HPC). However, despite its increasing popularity, mapping and optimizing GPU code remains a difficult task, it is a multi-...
- Article, July 2011
On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing
SAAHPC '11: Proceedings of the 2011 Symposium on Application Accelerators in High-Performance Computing, Pages 141–149. https://doi.org/10.1109/SAAHPC.2011.29
The graphics processing unit (GPU) has made significant strides as an accelerator in parallel computing. However, because the GPU has resided out on PCIe as a discrete device, the performance of GPU applications can be bottlenecked by data transfers ...
- research-article, May 2011
Bounding the effect of partition camping in GPU kernels
CF '11: Proceedings of the 8th ACM International Conference on Computing Frontiers, Article No.: 27, Pages 1–10. https://doi.org/10.1145/2016604.2016637
Current GPU tools and performance models provide some common architectural insights that guide the programmers to write optimal code. We challenge and complement these performance models and tools, by modeling and analyzing a lesser known, but very ...
- Article, February 2011
Towards accelerating molecular modeling via multi-scale approximation on a GPU
ICCABS '11: Proceedings of the 2011 IEEE 1st International Conference on Computational Advances in Bio and Medical Sciences, Pages 75–80. https://doi.org/10.1109/ICCABS.2011.5729946
Research efforts to analyze biomolecular properties contribute towards our understanding of biomolecular function. Calculating non-bonded forces (or in our case, electrostatic surface potential) is often a large portion of the computational complexity ...