Author: Daga, Mayank

research-article

Multiscale Approximation with Graphical Processing Units for Multiplicative Speedup in Molecular Dynamics

BCB '16: Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, Pages 453–462, https://doi.org/10.1145/2975167.2975214

The timescales and structure sizes accessible via simulations of atomistic molecular dynamics (MD) can be advanced substantially by two independent techniques: (1) many-core parallelization with graphics processing units (GPUs) and (2) multiscale ...

extended-abstract

clSPARSE: A Vendor-Optimized Open-Source Sparse BLAS Library

IWOCL '16: Proceedings of the 4th International Workshop on OpenCL, Article No.: 7, Pages 1–4, https://doi.org/10.1145/2909437.2909442

Sparse linear algebra is a cornerstone of modern computational science. These algorithms ignore the zero-valued entries found in many domains in order to work on much larger problems at much faster rates than dense algorithms. Nonetheless, optimizing ...

research-article

Implementing directed acyclic graphs with the heterogeneous system architecture

GPGPU '16: Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit, Pages 53–62, https://doi.org/10.1145/2884045.2884052

Achieving optimal performance on heterogeneous computing systems requires a programming model that supports the execution of asynchronous, multi-stream, and out-of-order tasks in a shared memory environment. Asynchronous dependency-driven tasking is one ...

Article

Structural Agnostic SpMV: Adapting CSR-Adaptive for Irregular Matrices

HIPC '15: Proceedings of the 2015 IEEE 22nd International Conference on High Performance Computing (HiPC), Pages 64–74, https://doi.org/10.1109/HiPC.2015.55

Sparse matrix vector multiplication (SpMV) is an important linear algebra primitive. Recent research has focused on improving the performance of SpMV on GPUs when using compressed sparse row (CSR), the most frequently used matrix storage format on CPUs. ...

Article

Exploring Parallel Programming Models for Heterogeneous Computing Systems

IISWC '15: Proceedings of the 2015 IEEE International Symposium on Workload Characterization, Pages 98–107, https://doi.org/10.1109/IISWC.2015.16

Parallel systems that employ CPUs and GPUs as two heterogeneous computational units have become immensely popular due to their ability to maximize performance under restrictive thermal budgets. However, programming heterogeneous systems via traditional ...

Article

On the Performance, Energy, and Power of Data-Access Methods in Heterogeneous Computing Systems

IPDPSW '15: Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, Pages 871–879, https://doi.org/10.1109/IPDPSW.2015.131

Graphics processing units (GPUs) have delivered promising speedups in data-parallel applications. A discrete GPU resides on the PCIe interface and has traditionally required data to be moved from the host memory to the GPU memory via PCIe. In certain ...

research-article

Efficient sparse matrix-vector multiplication on GPUs using the CSR storage format

SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Pages 769–780, https://doi.org/10.1109/SC.2014.68

The performance of sparse matrix vector multiplication (SpMV) is important to computational scientists. Compressed sparse row (CSR) is the most frequently used format to store sparse matrices. However, CSR-based SpMV on graphics processing units (GPUs) ...
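For readers unfamiliar with the format, the following is a minimal sequential sketch of SpMV (y = Ax) over a matrix stored in CSR. It is the textbook row-by-row kernel, not the GPU algorithm from the paper above; the function name `csr_spmv` and the array names `row_ptr`, `col_idx`, and `values` are illustrative assumptions.

```python
from typing import List


def csr_spmv(row_ptr: List[int], col_idx: List[int],
             values: List[float], x: List[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form (illustrative sketch).

    row_ptr[i] .. row_ptr[i + 1] delimits the stored nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Only the stored (nonzero) entries of row i are visited;
        # zero-valued entries are never read.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Example matrix A = [[4, 0, 1],
#                     [0, 0, 0],
#                     [2, 3, 0]]
row_ptr = [0, 2, 2, 4]
col_idx = [0, 2, 0, 1]
values = [4.0, 1.0, 2.0, 3.0]
print(csr_spmv(row_ptr, col_idx, values, [1.0, 2.0, 3.0]))  # prints [7.0, 0.0, 8.0]
```

On a GPU, the straightforward mapping gives each thread one iteration of the outer loop (one row), so rows with very different nonzero counts can leave some threads idle while others are still working; irregularity of this kind is what GPU-oriented CSR kernels aim to handle.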

Article

Exploiting Coarse-Grained Parallelism in B+ Tree Searches on an APU

SCC '12: Proceedings of the 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, Pages 240–247, https://doi.org/10.1109/SC.Companion.2012.40

B+ tree structured index searches are one of the fundamental database operations and, hence, accelerating them is essential. GPUs provide a compelling mix of performance per watt and performance per dollar, and thus are an attractive platform for ...

Article

Architecture-Aware Mapping and Optimization on a 1600-Core GPU

ICPADS '11: Proceedings of the 2011 IEEE 17th International Conference on Parallel and Distributed Systems, Pages 316–323, https://doi.org/10.1109/ICPADS.2011.29

The graphics processing unit (GPU) continues to make in-roads as a computational accelerator for high-performance computing (HPC). However, despite its increasing popularity, mapping and optimizing GPU code remains a difficult task; it is a multi-...

Article

On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing

SAAHPC '11: Proceedings of the 2011 Symposium on Application Accelerators in High-Performance Computing, Pages 141–149, https://doi.org/10.1109/SAAHPC.2011.29

The graphics processing unit (GPU) has made significant strides as an accelerator in parallel computing. However, because the GPU has resided out on PCIe as a discrete device, the performance of GPU applications can be bottlenecked by data transfers ...

research-article

Bounding the effect of partition camping in GPU kernels

CF '11: Proceedings of the 8th ACM International Conference on Computing Frontiers, Article No.: 27, Pages 1–10, https://doi.org/10.1145/2016604.2016637

Current GPU tools and performance models provide some common architectural insights that guide the programmers to write optimal code. We challenge and complement these performance models and tools, by modeling and analyzing a lesser known, but very ...

Article

Towards accelerating molecular modeling via multi-scale approximation on a GPU

ICCABS '11: Proceedings of the 2011 IEEE 1st International Conference on Computational Advances in Bio and Medical Sciences, Pages 75–80, https://doi.org/10.1109/ICCABS.2011.5729946

Research efforts to analyze biomolecular properties contribute towards our understanding of biomolecular function. Calculating non-bonded forces (or in our case, electrostatic surface potential) is often a large portion of the computational complexity ...
