- research-article, November 2014
Microbank: architecting through-silicon interposer-based main memory systems
SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Pages 1059–1070
https://doi.org/10.1109/SC.2014.91
Through-Silicon Interposer (TSI) has recently been proposed to provide high memory bandwidth and improve energy efficiency of the main memory system. However, the impact of TSI on main memory system architecture has not been well explored. While TSI ...
- research-article, November 2014
Using an adaptive HPC runtime system to reconfigure the cache hierarchy
SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Pages 1047–1058
https://doi.org/10.1109/SC.2014.90
The cache hierarchy often consumes a large portion of a processor's energy. To save energy in HPC environments, this paper proposes software-controlled reconfiguration of the cache hierarchy with an adaptive runtime system. Our approach addresses the ...
- research-article, November 2014
Anton 2: raising the bar for performance and programmability in a special-purpose molecular dynamics supercomputer
- David E. Shaw,
- J. P. Grossman,
- Joseph A. Bank,
- Brannon Batson,
- J. Adam Butts,
- Jack C. Chao,
- Martin M. Deneroff,
- Ron O. Dror,
- Amos Even,
- Christopher H. Fenton,
- Anthony Forte,
- Joseph Gagliardo,
- Gennette Gill,
- Brian Greskamp,
- C. Richard Ho,
- Douglas J. Ierardi,
- Lev Iserovich,
- Jeffrey S. Kuskin,
- Richard H. Larson,
- Timothy Layman,
- Li-Siang Lee,
- Adam K. Lerer,
- Chester Li,
- Daniel Killebrew,
- Kenneth M. Mackenzie,
- Shark Yeuk-Hai Mok,
- Mark A. Moraes,
- Rolf Mueller,
- Lawrence J. Nociolo,
- Jon L. Peticolas,
- Terry Quan,
- Daniel Ramot,
- John K. Salmon,
- Daniele P. Scarpazza,
- U. Ben Schafer,
- Naseer Siddique,
- Christopher W. Snyder,
- Jochen Spengler,
- Ping Tak Peter Tang,
- Michael Theobald,
- Horia Toma,
- Brian Towles,
- Benjamin Vitale,
- Stanley C. Wang,
- Cliff Young
SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Pages 41–53
https://doi.org/10.1109/SC.2014.9
Anton 2 is a second-generation special-purpose supercomputer for molecular dynamics simulations that achieves significant gains in performance, programmability, and capacity compared to its predecessor, Anton 1. The architecture of Anton 2 is tailored ...
- research-article, November 2014
ECC parity: a technique for efficient memory error resilience for multi-channel memory systems
SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Pages 1035–1046
https://doi.org/10.1109/SC.2014.89
Servers and HPC systems often use a strong memory error correction code, or ECC, to meet their reliability and availability requirements. However, these ECCs often require significant capacity and/or power overheads. We observe that since memory ...
- research-article, November 2014
In-situ feature extraction of large scale combustion simulations using segmented merge trees
- Aaditya G. Landge,
- Valerio Pascucci,
- Attila Gyulassy,
- Janine C. Bennett,
- Hemanth Kolla,
- Jacqueline Chen,
- Peer-Timo Bremer
SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Pages 1020–1031
https://doi.org/10.1109/SC.2014.88
The ever-increasing amount of data generated by scientific simulations, coupled with system I/O constraints, is fueling a need for in-situ analysis techniques. Of particular interest are approaches that produce reduced data representations while ...
- research-article, November 2014
Scalable computation of stream surfaces on large scale vector fields
SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Pages 1008–1019
https://doi.org/10.1109/SC.2014.87
Stream surfaces and streamlines are two popular methods for visualizing three-dimensional flow fields. While several parallel streamline computation algorithms exist, relatively little research has been done to parallelize stream surface generation. ...
- research-article, November 2014
High-performance computation of distributed-memory parallel 3D voronoi and delaunay tessellation
SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Pages 997–1007
https://doi.org/10.1109/SC.2014.86
Computing a Voronoi or Delaunay tessellation from a set of points is a core part of the analysis of many simulated and measured datasets: N-body simulations, molecular dynamics codes, and LIDAR point clouds are just a few examples. Such computational ...
- research-article, November 2014
Finding constant from change: revisiting network performance aware optimizations on IaaS clouds
SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Pages 982–993
https://doi.org/10.1109/SC.2014.85
Network performance aware optimizations have long been an effective approach to optimizing distributed applications on traditional network environments. However, the assumptions of network topology or direct use of several measurements of pair-wise ...
- research-article, November 2014
Reciprocal resource fairness: towards cooperative multiple-resource fair sharing in IaaS clouds
SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Pages 970–981
https://doi.org/10.1109/SC.2014.84
Resource sharing in virtualized environments has been shown to bring significant benefits to application performance and resource/energy efficiency. However, resource sharing, especially for multiple resource types, poses several severe and ...
- research-article, November 2014
FlexSlot: moving hadoop into the cloud with flexible slot management
SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Pages 959–969
https://doi.org/10.1109/SC.2014.83
Load imbalance is a major source of overhead in Hadoop, where the uneven distribution of input data among tasks can significantly delay job completion. Running Hadoop in a private cloud opens up opportunities for mitigating data skew with elastic ...
- research-article, November 2014
Efficient shared-memory implementation of high-performance conjugate gradient benchmark and its application to unstructured matrices
- Jongsoo Park,
- Mikhail Smelyanskiy,
- Karthikeyan Vaidyanathan,
- Alexander Heinecke,
- Dhiraj D. Kalamkar,
- Xing Liu,
- Md. Mostofa Ali Patwary,
- Yutong Lu,
- Pradeep Dubey
SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Pages 945–955
https://doi.org/10.1109/SC.2014.82
A new sparse high performance conjugate gradient benchmark (HPCG) has been recently released to address challenges in the design of sparse linear solvers for the next generation extreme-scale computing systems. Key computation, data access, and ...
- research-article, November 2014
Domain decomposition preconditioners for communication-avoiding krylov methods on a hybrid CPU/GPU cluster
- Ichitaro Yamazaki,
- Sivasankaran Rajamanickam,
- Erik G. Boman,
- Mark Hoemmen,
- Michael A. Heroux,
- Stanimire Tomov
SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Pages 933–944
https://doi.org/10.1109/SC.2014.81
Krylov subspace projection methods are widely used iterative methods for solving large-scale linear systems of equations. Researchers have demonstrated that communication-avoiding (CA) techniques can improve Krylov methods' performance on modern ...
- research-article, November 2014
Parallelization of reordering algorithms for bandwidth and wavefront reduction
SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Pages 921–932
https://doi.org/10.1109/SC.2014.80
Many sparse matrix computations can be speeded up if the matrix is first reordered. Reordering was originally developed for direct methods but it has recently become popular for improving the cache locality of parallel iterative solvers since reordering ...
- research-article, November 2014
Real-time scalable cortical computing at 46 giga-synaptic OPS/watt with ~100× speedup in time-to-solution and ~100,000× reduction in energy-to-solution
- Andrew S. Cassidy,
- Rodrigo Alvarez-Icaza,
- Filipp Akopyan,
- Jun Sawada,
- John V. Arthur,
- Paul A. Merolla,
- Pallab Datta,
- Marc Gonzalez Tallada,
- Brian Taba,
- Alexander Andreopoulos,
- Arnon Amir,
- Steven K. Esser,
- Jeff Kusnitz,
- Rathinakumar Appuswamy,
- Chuck Haymes,
- Bernard Brezzo,
- Roger Moussalli,
- Ralph Bellofatto,
- Christian Baks,
- Michael Mastro,
- Kai Schleupen,
- Charles E. Cox,
- Ken Inoue,
- Steve Millman,
- Nabil Imam,
- Emmett McQuinn,
- Yutaka Y. Nakamura,
- Ivan Vo,
- Chen Guo,
- Don Nguyen,
- Scott Lekuch,
- Sameh Asaad,
- Daniel Friedman,
- Bryan L. Jackson,
- Myron D. Flickner,
- William P. Risk,
- Rajit Manohar,
- Dharmendra S. Modha
SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Pages 27–38
https://doi.org/10.1109/SC.2014.8
Drawing on neuroscience, we have developed a parallel, event-driven kernel for neurosynaptic computation that is efficient with respect to computation, memory, and communication. Building on the previously demonstrated highly-optimized software ...
- research-article, November 2014
Optimization of a multilevel checkpoint model with uncertain execution scales
SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Pages 907–918
https://doi.org/10.1109/SC.2014.79
Future extreme-scale systems are expected to experience different types of failures affecting applications with different failure scales, from transient uncorrectable memory errors in processes to massive system outages. In this paper, we propose a ...
- research-article, November 2014
Exploring automatic, online failure recovery for scientific applications at extreme scales
SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Pages 895–906
https://doi.org/10.1109/SC.2014.78
Application resilience is a key challenge that must be addressed in order to realize the exascale vision. Process/node failures, an important class of failures, are typically handled today by terminating the job and restarting it from the last stored ...
- research-article, November 2014
Understanding the effects of communication and coordination on checkpointing at scale
SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Pages 883–894
https://doi.org/10.1109/SC.2014.77
Fault-tolerance poses a major challenge for future large-scale systems. Active research into coordinated, uncoordinated, and hybrid checkpointing systems has explored how the introduction of asynchrony can address anticipated scalability issues. However, ...
- research-article, November 2014
DISC: a domain-interaction based programming model with support for heterogeneous execution
SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Pages 869–880
https://doi.org/10.1109/SC.2014.76
Several emerging trends are pointing to increasing heterogeneity among nodes and/or cores in HPC systems. Existing programming models, especially for distributed memory execution, typically have been designed to facilitate high performance on ...
- research-article, November 2014
Optimizing data locality for fork/join programs using constrained work stealing
SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Pages 857–868
https://doi.org/10.1109/SC.2014.75
We present an approach to improving data locality across different phases of fork/join programs scheduled using work stealing. The approach consists of: (1) user-specified and automated approaches to constructing a steal tree, the schedule of steal ...
- research-article, November 2014
Structure slicing: extending logical regions with fields
SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Pages 845–856
https://doi.org/10.1109/SC.2014.74
Applications on modern supercomputers are increasingly limited by the cost of data movement, but mainstream programming systems have few abstractions for describing the structure of a program's data. Consequently, the burden of managing data movement, ...