- Sponsor: SIGHPC
Proceeding Downloads
The implementation of MPI-2 one-sided communication for the NEC SX-5
We describe the MPI/SX implementation of the MPI-2 standard for one-sided communication (Remote Memory Access) for the NEC SX-5 vector supercomputer. MPI/SX is a non-threaded implementation of the full MPI-2 standard. Essential features of the ...
Single sided MPI implementations for SUN MPI
This paper describes an implementation of generic MPI-2 single-sided communications for SUN-MPI. Our implementation is layered on top of point-to-point MPI communications and therefore can be adapted to other MPI implementations. The code is ...
Automatically tuned collective communications
The performance of the MPI's collective communications is critical in most MPI-based applications. A general algorithm for a given collective communication operation may not give good performance on all systems due to the differences in architectures, ...
Landing CG on EARTH: a case study of fine-grained multithreading on an evolutionary path
We report on our work in developing a fine-grained multithreaded solution for the communication-intensive Conjugate Gradient (CG) problem. In our recent work, we have developed a simple, yet very efficient, solution to executing matrix-vector multiply ...
Parallel smoothed aggregation multigrid: aggregation strategies on massively parallel machines
Algebraic multigrid methods offer the hope that multigrid convergence can be achieved (for at least some important applications) without a great deal of effort from engineers and scientists wishing to solve linear systems. In this paper we consider ...
Scalable algorithms for adaptive statistical designs
We present a scalable, high-performance solution to multidimensional recurrences that arise in adaptive statistical designs. Adaptive designs are an important class of learning algorithms for a stochastic environment, and we focus on the problem of ...
Randomization, speculation, and adaptation in batch schedulers
This paper proposes extensions to the backfilling job-scheduling algorithm that significantly improve its performance. We introduce variations that sort the "backfilling order" in priority-based and randomized fashions. We examine the effectiveness of ...
An object-oriented job execution environment
This is a project for developing a distributed job execution environment for highly iterative jobs. An iterative job is one where the same binary code is run hundreds of times with incremental changes in the input values for each run. An execution ...
Towards an integrated, web-executable parallel programming tool environment
We present a new parallel programming tool environment that is (1) accessible and executable “anytime, anywhere,” through standard Web browsers and (2) integrated in that it provides tools which adhere to a common underlying methodology for parallel ...
Performance of hybrid message-passing and shared-memory parallelism for discrete element modeling
The current trend in HPC hardware is towards clusters of shared-memory (SMP) compute nodes. For applications developers the major question is how best to program these SMP clusters. To address this we study an algorithm from Discrete Element Modeling, ...
A comparison of three programming models for adaptive applications on the Origin2000
Adaptive applications have computational workloads and communication patterns which change unpredictably at runtime, requiring load balancing to achieve scalable performance on parallel machines. Efficient parallel implementation of such adaptive ...
MPI versus MPI+OpenMP on IBM SP for the NAS benchmarks
The hybrid memory model of clusters of multiprocessors raises two issues: programming model and performance. Many parallel programs have been written by using the MPI standard. To evaluate the pertinence of hybrid models for existing MPI codes, we ...
A wrapper generator for wrapping high performance legacy codes as Java/CORBA components
This paper describes a Wrapper Generator for wrapping high performance legacy codes as Java/CORBA components for use in a distributed component-based problem-solving environment. Using the Wrapper Generator we have automatically wrapped an MPI-based ...
A scalable SNMP-based distributed monitoring system for heterogeneous network computing
Traditional centralized monitoring systems do not scale to present-day large, complex, network-computing systems. Based on recent SNMP standards for distributed management, this paper addresses the scalability problem through distribution of monitoring ...
ESP: a system utilization benchmark
This article describes a new benchmark, called the Effective System Performance (ESP) test, which is designed to measure system-level performance, including such factors as job scheduling efficiency, handling of large jobs and shutdown-reboot times. In ...
PM2: a high performance communication middleware for heterogeneous network environments
This paper introduces a high performance communication middle layer, called PM2, for heterogeneous network environments. PM2 currently supports Myrinet, Ethernet, and SMP. Binary code written in PM2 or written in a communication library, such as MPICH-...
Performance and interoperability issues in incorporating cluster management systems within a wide-area network-computing environment
This paper describes the performance and interoperability issues that arise in the process of integrating cluster management systems into a wide-area network-computing environment, and provides solutions in the context of the Purdue University Network ...
Architectural and performance evaluation of GigaNet and Myrinet interconnects on clusters of small-scale SMP servers
GigaNet and Myrinet are two of the leading interconnects for clusters of commodity computer systems. Both provide memory-protected user-level network interface access, and deliver low-latency and high-bandwidth communication to applications. GigaNet is ...
MPICH-GQ: quality-of-service for message passing programs
Parallel programmers typically assume that all resources required for a program's execution are dedicated to that purpose. However, in local and wide area networks, contention for shared networks, CPUs, and I/O systems can result in significant ...
Scalable fault-tolerant distributed shared memory
This paper shows how a state-of-the-art software distributed shared-memory (DSM) protocol can be efficiently extended to tolerate single-node failures. In particular, we extend a home-based lazy release consistency (HLRC) DSM system with independent ...
Realizing fault resilience in Web-server cluster
Today, it is critical for a successful Internet service to be up 100 percent of the time. Server clustering is the most promising approach to meet this requirement. However, the existing Web server-clustering solutions can merely provide high ...
Data access performance in a large and dynamic pharmaceutical drug candidate database
- Zina Ben-Miled,
- Yang Liu,
- Michael Bem,
- Robert Jones,
- Robert Oppelt,
- Samuel Milosevich,
- Dave Powers,
- Omran Bukhres
An explosion in the amount of data generated through chemical and biological experimentation has been observed in recent years. This rapid proliferation of vast amounts of data has led to a set of cheminformatics and bioinformatics applications that ...
Real-time biomechanical simulation of volumetric brain deformation for image guided neurosurgery
We aimed to study the performance of a parallel implementation of an intraoperative nonrigid registration algorithm that accurately simulates the biomechanical properties of the brain and its deformations during surgery. The algorithm was designed to ...
Computer simulations of cardiac electrophysiology
- John B. Pormann,
- Craig S. Henriquez,
- John A. Board,
- Donald J. Rose,
- David M. Harrild,
- Alexandra P. Henriquez
CardioWave is a modular system for simulating wavefront conduction in the heart. These simulations may be used to investigate the factors that generate and sustain life-threatening arrhythmias such as ventricular fibrillation. The user selects a set of ...
Parallel algorithms for radiation transport on unstructured grids
The method of discrete ordinates is commonly used to solve the Boltzmann radiation transport equation for applications ranging from simulations of fires to weapons effects. The equations are most efficiently solved by sweeping the radiation flux across ...
A parallel dynamic-mesh Lagrangian method for simulation of flows with dynamic interfaces
Many important phenomena in science and engineering, including our motivating problem of microstructural blood flow, can be modeled as flows with dynamic interfaces. The major challenge faced in simulating such flows is resolving the interfacial motion. ...
Self-consistent Langevin simulation of Coulomb collisions in charged-particle beams
In many plasma physics and charged-particle beam dynamics problems, Coulomb collisions are modeled by a Fokker-Planck equation. In order to incorporate these collisions, we present a three-dimensional parallel Langevin simulation method using a Particle-...
Using high-speed WANs and network data caches to enable remote and distributed visualization
Visapult is a prototype application and framework for remote visualization of large scientific datasets. We approach the technical challenges of tera-scale visualization with a unique architecture which employs high speed WANs and network data caches ...
High performance visualization of time-varying volume data over a wide-area network status
This paper presents an end-to-end, low-cost solution for visualizing time-varying volume data rendered on a parallel computer located at a remote site. Pipelining and careful grouping of processors are used to hide I/O time and to maximize processor ...
Distributed rendering for scalable displays
We describe a novel distributed graphics system that allows an application to render to a large tiled display. Our system, called WireGL, uses a cluster of off-the-shelf PCs connected with a high-speed network. WireGL allows an unmodified existing ...
Index Terms
- Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Acceptance Rates
Year | Submitted | Accepted | Rate |
---|---|---|---|
SC '17 | 327 | 61 | 19% |
SC '16 | 442 | 81 | 18% |
SC '15 | 358 | 79 | 22% |
SC '14 | 394 | 83 | 21% |
SC '13 | 449 | 91 | 20% |
SC '12 | 461 | 100 | 22% |
SC '11 | 352 | 74 | 21% |
SC '10 | 253 | 51 | 20% |
SC '09 | 261 | 59 | 23% |
SC '08 | 277 | 59 | 21% |
SC '07 | 268 | 54 | 20% |
SC '06 | 239 | 54 | 23% |
SC '05 | 260 | 62 | 24% |
SC '04 | 200 | 60 | 30% |
SC '03 | 207 | 60 | 29% |
SC '02 | 230 | 67 | 29% |
SC '01 | 240 | 60 | 25% |
SC '00 | 179 | 62 | 35% |
Supercomputing '95 | 241 | 69 | 29% |
Supercomputing '93 | 300 | 72 | 24% |
Supercomputing '92 | 220 | 75 | 34% |
Supercomputing '91 | 215 | 83 | 39% |
Overall | 6,373 | 1,516 | 24% |
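
The Rate column appears to be the accepted count divided by the submitted count, rounded to the nearest whole percent. The rounding rule is an assumption, since the page does not state it. A minimal Python sketch that rechecks a few rows copied from the table (the helper name `acceptance_rate` is hypothetical):

```python
# Rows copied from the table above: (label, submitted, accepted).
rows = [
    ("SC '17", 327, 61),
    ("SC '00", 179, 62),
    ("Supercomputing '91", 215, 83),
    ("Overall", 6373, 1516),
]


def acceptance_rate(submitted: int, accepted: int) -> int:
    """Return accepted/submitted as a whole-number percentage.

    Assumption: the table rounds to the nearest percent.
    """
    return round(100 * accepted / submitted)


for label, submitted, accepted in rows:
    print(f"{label}: {acceptance_rate(submitted, accepted)}%")
```

Under that assumption, the computed values agree with the Rate column for the rows listed.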