- research-article, December 2022
Optimal size of the block in block GMRES on GPUs: computational model and experiments
Numerical Algorithms (SPNA), Volume 92, Issue 1, Pages 119–147
https://doi.org/10.1007/s11075-022-01439-z
Abstract: The block version of GMRES (BGMRES) is most advantageous over the single right hand side (RHS) counterpart when the cost of communication is high while the cost of floating point operations is not. This is the particular case on modern graphics ...
- research-article, September 2022
Low-synch Gram–Schmidt with delayed reorthogonalization for Krylov solvers
Abstract: The parallel strong-scaling of iterative methods is often determined by the number of global reductions at each iteration. Low-synch Gram–Schmidt algorithms are applied here to the Arnoldi algorithm to reduce the number of global ...
- research-article, May 2022
Parallel graph coloring algorithms for distributed GPU environments
Abstract: Graph coloring is often used in parallelizing scientific computations that run in distributed and multi-GPU environments; it identifies sets of independent data that can be updated in parallel. Many algorithms exist for graph coloring ...
Highlights:
- We present the first multi-GPU graph coloring implementation.
- Our framework ...
- research-article, November 2021
EXAGRAPH: Graph and combinatorial methods for enabling exascale applications
- Seher Acer,
- Ariful Azad,
- Erik G Boman,
- Aydın Buluç,
- Karen D. Devine,
- SM Ferdous,
- Nitin Gawande,
- Sayan Ghosh,
- Mahantesh Halappanavar,
- Ananth Kalyanaraman,
- Arif Khan,
- Marco Minutoli,
- Alex Pothen,
- Sivasankaran Rajamanickam,
- Oguz Selvitopi,
- Nathan R Tallent,
- Antonino Tumeo,
- Tim Germann
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 35, Issue 6, Pages 553–571
https://doi.org/10.1177/10943420211029299
Combinatorial algorithms in general and graph algorithms in particular play a critical enabling role in numerous scientific applications. However, the irregular memory access nature of these algorithms makes them one of the hardest algorithmic kernels to ...
- research-article, July 2021
A survey of numerical linear algebra methods utilizing mixed-precision arithmetic
- Ahmad Abdelfattah,
- Hartwig Anzt,
- Erik G Boman,
- Erin Carson,
- Terry Cojean,
- Jack Dongarra,
- Alyson Fox,
- Mark Gates,
- Nicholas J Higham,
- Xiaoye S Li,
- Jennifer Loe,
- Piotr Luszczek,
- Srikara Pranesh,
- Siva Rajamanickam,
- Tobias Ribizel,
- Barry F Smith,
- Kasia Swirydowicz,
- Stephen Thomas,
- Stanimire Tomov,
- Yaohung M Tsai,
- Ulrike Meier Yang
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 35, Issue 4, Pages 344–369
https://doi.org/10.1177/10943420211003313
The efficient utilization of mixed-precision numerical linear algebra algorithms can offer attractive acceleration to scientific computing applications. Especially with the hardware integration of low-precision special-function units designed for machine ...
- research-article, January 2020
Scalable Asynchronous Domain Decomposition Solvers
SIAM Journal on Scientific Computing (SISC), Volume 42, Issue 6, Pages C384–C409
https://doi.org/10.1137/19M1291303
Parallel implementations of linear iterative solvers generally alternate between phases of data exchange and phases of local computation. Increasingly large problem sizes and more heterogeneous compute architectures make load balancing and the design of ...
- research-article, January 2020
An Algebraic Sparsified Nested Dissection Algorithm Using Low-Rank Approximations
SIAM Journal on Matrix Analysis and Applications (SIMAX), Volume 41, Issue 2, Pages 715–746
https://doi.org/10.1137/19M123806X
We propose a new algorithm for the fast solution of large, sparse, symmetric positive-definite linear systems, spaND (sparsified Nested Dissection). It is based on nested dissection, sparsification, and low-rank compression. After eliminating all ...
- research-article, November 2019
A robust hierarchical solver for ill-conditioned systems with applications to ice sheet modeling
Journal of Computational Physics (JOCP), Volume 396, Issue C, Pages 819–836
https://doi.org/10.1016/j.jcp.2019.07.024
Abstract: A hierarchical solver is proposed for solving sparse ill-conditioned linear systems in parallel. The solver is based on a modification of the LoRaSp method, but employs a deferred-compression technique, which provably reduces the ...
Highlights:
- We introduced the deferred-compression technique in hierarchical solvers for solving sparse ill-conditioned linear systems.
- research-article, May 2018
A distributed-memory hierarchical solver for general sparse linear systems
Highlights:
- Derived a new formulation of a sequential hierarchical solver, which compresses dense fill-in blocks.
Abstract: We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because ...
- research-article, November 2014
Domain decomposition preconditioners for communication-avoiding Krylov methods on a hybrid CPU/GPU cluster
- Ichitaro Yamazaki,
- Sivasankaran Rajamanickam,
- Erik G. Boman,
- Mark Hoemmen,
- Michael A. Heroux,
- Stanimire Tomov
SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Pages 933–944
https://doi.org/10.1109/SC.2014.81
Krylov subspace projection methods are widely used iterative methods for solving large-scale linear systems of equations. Researchers have demonstrated that communication-avoiding (CA) techniques can improve Krylov methods' performance on modern ...
- research-article, November 2013
Scalable matrix computations on large scale-free graphs using 2D graph partitioning
SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Article No.: 50, Pages 1–12
https://doi.org/10.1145/2503210.2503293
Scalable parallel computing is essential for processing large scale-free (power-law) graphs. The distribution of data across processes becomes important on distributed-memory computers with thousands of cores. It has been shown that two-dimensional ...
- Article, May 2012
Multithreaded Algorithms for Maximum Matching in Bipartite Graphs
IPDPS '12: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, Pages 860–872
https://doi.org/10.1109/IPDPS.2012.82
We design, implement, and evaluate algorithms for computing a matching of maximum cardinality in a bipartite graph on multicore and massively multithreaded computers. As computers with larger numbers of slower cores dominate the commodity processor ...
- Article, May 2012
ShyLU: A Hybrid-Hybrid Solver for Multicore Platforms
IPDPS '12: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, Pages 631–643
https://doi.org/10.1109/IPDPS.2012.64
With the ubiquity of multicore processors, it is crucial that solvers adapt to the hierarchical structure of modern architectures. We present ShyLU, a "hybrid-hybrid" solver for general sparse linear systems that is hybrid in two ways: First, it ...
- article, April 2012
The Zoltan and Isorropia parallel toolkits for combinatorial scientific computing: Partitioning, ordering and coloring
Partitioning and load balancing are important problems in scientific computing that can be modeled as combinatorial problems using graphs or hypergraphs. The Zoltan toolkit was developed primarily for partitioning and load balancing to support dynamic ...
- article, March 2012
A Quasi-algebraic Multigrid Approach to Fracture Problems Based on Extended Finite Elements
SIAM Journal on Scientific Computing (SISC), Volume 34, Issue 2, Pages 603–626
https://doi.org/10.1137/110819913
The modeling of discontinuities arising from fracture of materials poses a number of significant computational challenges. The extended finite element method provides an attractive alternative to standard finite elements in that they do not require fine ...
- poster, November 2011
Poster: a hybrid-hybrid solver for manycore platforms
SC '11 Companion: Proceedings of the 2011 companion on High Performance Computing Networking, Storage and Analysis Companion, Pages 35–36
https://doi.org/10.1145/2148600.2148619
With the increasing levels of parallelism in a compute node, it is important to exploit multiple levels of parallelism even within a single compute node. We present ShyLU (pronounced "Shy-loo" for Scalable Hybrid LU), a "hybrid-hybrid" solver for ...
- Article, August 2011
Enabling next-generation parallel circuit simulation with Trilinos
Euro-Par'11: Proceedings of the 2011 international conference on Parallel Processing, Pages 315–323
https://doi.org/10.1007/978-3-642-29737-3_36
The Xyce Parallel Circuit Simulator, which has demonstrated scalable circuit simulation on hundreds of processors, heavily leverages the high-performance scientific libraries provided by Trilinos. With the move towards multi-core CPUs and GPU technology, ...
- article, November 2010
Hypergraph-Based Unsymmetric Nested Dissection Ordering for Sparse LU Factorization
SIAM Journal on Scientific Computing (SISC), Volume 32, Issue 6, Pages 3426–3446
https://doi.org/10.1137/080720395
In this paper we discuss a hypergraph-based unsymmetric nested dissection (HUND) ordering for reducing the fill-in incurred during Gaussian elimination. It has several important properties. It takes a global perspective of the entire matrix, as opposed ...
- article, August 2010
Distributed-Memory Parallel Algorithms for Distance-2 Coloring and Related Problems in Derivative Computation
SIAM Journal on Scientific Computing (SISC), Volume 32, Issue 4, Pages 2418–2446
https://doi.org/10.1137/080732158
The distance-2 graph coloring problem aims at partitioning the vertex set of a graph into the fewest sets consisting of vertices pairwise at distance greater than 2 from each other. Its applications include derivative computation in numerical ...
- Article, June 2010
Factors impacting performance of multithreaded sparse triangular solve
As computational science applications grow more parallel with multi-core supercomputers having hundreds of thousands of computational cores, it will become increasingly difficult for solvers to scale. Our approach is to use hybrid MPI/threaded numerical ...