ABSTRACT Practical and efficient methods exist for parallelizing the numerical work in sparse mat... more ABSTRACT Practical and efficient methods exist for parallelizing the numerical work in sparse matrix calculations. The initial symbolic analysis is now becoming a sequential bottleneck, limiting problems' sizes. One such analysis is the weighted bipartite matching used to achieve scalable, unsymmetric $LU$ factorization in Supertextsclu. Applying a mathematical optimization algorithm produces a distributed-memory implementation with explicit trade-offs between speed and matching quality. We present accuracy and performance results for this phase alone and in the context of Supertextsclu.
Traditional pivoting during parallel, unsymmetric $LU$ factorization introduces heavy communicati... more Traditional pivoting during parallel, unsymmetric $LU$ factorization introduces heavy communication and restructuring costs. Possible alternatives include pre-pivoting to place heavy elements along the diagonal and limited pivoting that maintains the factors' structures. Each alternative comes with trade-offs that affect accuracy and performance.
The purpose of this document is to facilitate contributions to LAPACK and ScaLAPACK by documentin... more The purpose of this document is to facilitate contributions to LAPACK and ScaLAPACK by documenting their design and implementation guidelines. The long-term goal is to provide guidelines for both LAPACK and ScaLAPACK. However, the parallel ScaLAPACK code has more open issues, so this document primarily concerns LAPACK.
For sparse LU factorization, dynamic pivoting tightly couples symbolic and numerical computation.... more For sparse LU factorization, dynamic pivoting tightly couples symbolic and numerical computation. Dynamic structural changes limit parallel scalability. Demmel and Li use static pivoting in distributed SuperLU for performance, but intentionally perturbing the input may lead silently to erroneous results. Are there experimentally stable static pivoting heuristics that lead to a dependable direct solver? The answer is currently a qualified yes. Current heuristics fail on a few systems, but all failures are detectable.
Practical and efficient methods exist for parallelizing the numerical work in sparse matrix calcu... more Practical and efficient methods exist for parallelizing the numerical work in sparse matrix calculations. The initial symbolic analysis is now becoming a sequential bottleneck, limiting problems' sizes. One such analysis is the weighted bipartite matching used to achieve scalable, unsymmetric $LU$ factorization in Supertextsclu. Applying a mathematical optimization algorithm produces a distributed-memory implementation with explicit trade-offs between speed and matching quality. We present accuracy and performance results for this phase alone and in the context of Supertextsclu.
Current tools for analyzing graph-structured data and semantic networks focus on static graphs. O... more Current tools for analyzing graph-structured data and semantic networks focus on static graphs. Our STING package tackles analysis of streaming graphs like today's social networks and communication tools. STING maintains a massive graph under changes while coordinating analysis kernels to achieve analysis at real-world data rates. We show examples of local metrics like clustering coefficients and global metrics like connected components and agglomerative clustering. STING supports parallel Intel architectures as well as the Cray XMT.
An increasingly fast-paced, digital world has produced an ever-growing volume of petabyte-sized d... more An increasingly fast-paced, digital world has produced an ever-growing volume of petabyte-sized datasets. At the same time, terabytes of new, unstructured data arrive daily. As the desire to ask more detailed questions about these massive streams has grown, parallel software and hardware have only recently begun to enable complex analytics in this non-scientific space. In this tutorial, we will discuss the open problems facing us with analyzing this "data deluge". We will present algorithms and data structures capable of analyzing spatio-temporal data at massive scale on parallel systems. We will try to understand the difficulties and bottlenecks in parallel graph algorithm design on current systems and will show how multithreaded and hybrid systems can overcome these challenges. We will demonstrate how parallel graph algorithms can be implemented on a variety of architectures using different programming models. The goal of this tutorial is to provide a comprehensive intro...
Graph-structured data in social networks, finance, network security, and others not only are mass... more Graph-structured data in social networks, finance, network security, and others not only are massive but also under continual change. These changes often are scattered across the graph. Repeating complex global analyses on massive snapshots to capture only what has changed is inefficient. We discuss analysis algorithms for streaming graph data that maintain both local and global metrics. We extract parallelism from both analysis kernel and graph data to scale performance to real-world sizes.
ABSTRACT Practical and efficient methods exist for parallelizing the numerical work in sparse mat... more ABSTRACT Practical and efficient methods exist for parallelizing the numerical work in sparse matrix calculations. The initial symbolic analysis is now becoming a sequential bottleneck, limiting problems' sizes. One such analysis is the weighted bipartite matching used to achieve scalable, unsymmetric $LU$ factorization in Supertextsclu. Applying a mathematical optimization algorithm produces a distributed-memory implementation with explicit trade-offs between speed and matching quality. We present accuracy and performance results for this phase alone and in the context of Supertextsclu.
Traditional pivoting during parallel, unsymmetric $LU$ factorization introduces heavy communicati... more Traditional pivoting during parallel, unsymmetric $LU$ factorization introduces heavy communication and restructuring costs. Possible alternatives include pre-pivoting to place heavy elements along the diagonal and limited pivoting that maintains the factors' structures. Each alternative comes with trade-offs that affect accuracy and performance.
The purpose of this document is to facilitate contributions to LAPACK and ScaLAPACK by documentin... more The purpose of this document is to facilitate contributions to LAPACK and ScaLAPACK by documenting their design and implementation guidelines. The long-term goal is to provide guidelines for both LAPACK and ScaLAPACK. However, the parallel ScaLAPACK code has more open issues, so this document primarily concerns LAPACK.
For sparse LU factorization, dynamic pivoting tightly couples symbolic and numerical computation.... more For sparse LU factorization, dynamic pivoting tightly couples symbolic and numerical computation. Dynamic structural changes limit parallel scalability. Demmel and Li use static pivoting in distributed SuperLU for performance, but intentionally perturbing the input may lead silently to erroneous results. Are there experimentally stable static pivoting heuristics that lead to a dependable direct solver? The answer is currently a qualified yes. Current heuristics fail on a few systems, but all failures are detectable.
Practical and efficient methods exist for parallelizing the numerical work in sparse matrix calcu... more Practical and efficient methods exist for parallelizing the numerical work in sparse matrix calculations. The initial symbolic analysis is now becoming a sequential bottleneck, limiting problems' sizes. One such analysis is the weighted bipartite matching used to achieve scalable, unsymmetric $LU$ factorization in Supertextsclu. Applying a mathematical optimization algorithm produces a distributed-memory implementation with explicit trade-offs between speed and matching quality. We present accuracy and performance results for this phase alone and in the context of Supertextsclu.
Current tools for analyzing graph-structured data and semantic networks focus on static graphs. O... more Current tools for analyzing graph-structured data and semantic networks focus on static graphs. Our STING package tackles analysis of streaming graphs like today's social networks and communication tools. STING maintains a massive graph under changes while coordinating analysis kernels to achieve analysis at real-world data rates. We show examples of local metrics like clustering coefficients and global metrics like connected components and agglomerative clustering. STING supports parallel Intel architectures as well as the Cray XMT.
An increasingly fast-paced, digital world has produced an ever-growing volume of petabyte-sized d... more An increasingly fast-paced, digital world has produced an ever-growing volume of petabyte-sized datasets. At the same time, terabytes of new, unstructured data arrive daily. As the desire to ask more detailed questions about these massive streams has grown, parallel software and hardware have only recently begun to enable complex analytics in this non-scientific space. In this tutorial, we will discuss the open problems facing us with analyzing this "data deluge". We will present algorithms and data structures capable of analyzing spatio-temporal data at massive scale on parallel systems. We will try to understand the difficulties and bottlenecks in parallel graph algorithm design on current systems and will show how multithreaded and hybrid systems can overcome these challenges. We will demonstrate how parallel graph algorithms can be implemented on a variety of architectures using different programming models. The goal of this tutorial is to provide a comprehensive intro...
Graph-structured data in social networks, finance, network security, and others not only are mass... more Graph-structured data in social networks, finance, network security, and others not only are massive but also under continual change. These changes often are scattered across the graph. Repeating complex global analyses on massive snapshots to capture only what has changed is inefficient. We discuss analysis algorithms for streaming graph data that maintain both local and global metrics. We extract parallelism from both analysis kernel and graph data to scale performance to real-world sizes.
Uploads
Papers by Jason Riedy