Newsletter Downloads
Using simulation to design extreme-scale applications and architectures: programming model exploration
A key problem facing application developers is that they are expected to utilize extreme levels of parallelism soon after delivery of future leadership class machines, but developing applications capable of exposing sufficient concurrency is a time ...
Performance analysis of the OP2 framework on many-core architectures
We present a performance analysis and benchmarking study of the OP2 "active" library, which provides an abstraction framework for the solution of parallel unstructured mesh applications. OP2 aims to decouple the scientific specification of the ...
Benchmarking and modelling of POWER7, Westmere, BG/P, and GPUs: an industry case study
This paper introduces an industry strength, multi-purpose, benchmark: Shamrock. Developed at the Atomic Weapons Establishment (AWE), Shamrock is a two dimensional (2D) structured hydrocode; one of its aims is to assess the impacts of a change in ...
Performance analysis of a hybrid MPI/CUDA implementation of the NAS-LU benchmark
We present the performance analysis of a port of the LU benchmark from the NAS Parallel Benchmark (NPB) suite to NVIDIA's Compute Unified Device Architecture (CUDA), and report on the optimisation efforts employed to take advantage of this platform. ...
Memory Trace Compression and Replay for SPMD Systems using Extended PRSDs
Concurrency levels in large-scale supercomputers are rising exponentially, and shared-memory nodes with hundreds of cores and non-uniform memory access latencies are expected within the next decade. However, even current petascale systems with tens of ...
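The core idea behind regular section descriptors (RSDs), which the PRSDs in the title extend, is to replace a long strided address sequence with a compact (base, stride, count) tuple. The C sketch below is a minimal single-level illustration of that idea only; it is not the nested, parameterized PRSD encoding or the replay machinery the paper describes, and all names in it are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical single-level regular section descriptor: represents the
 * address sequence base, base+stride, ..., base+(count-1)*stride. */
typedef struct {
    uint64_t base;
    int64_t  stride;
    uint64_t count;
} rsd_t;

/* Greedily compress a raw address trace into RSDs.
 * Returns the number of descriptors written to out. */
static size_t rsd_compress(const uint64_t *trace, size_t n, rsd_t *out)
{
    size_t n_out = 0;
    size_t i = 0;
    while (i < n) {
        rsd_t d = { trace[i], 0, 1 };
        if (i + 1 < n) {
            d.stride = (int64_t)(trace[i + 1] - trace[i]);
            while (i + d.count < n &&
                   (int64_t)(trace[i + d.count] - trace[i + d.count - 1]) == d.stride)
                d.count++;
        }
        out[n_out++] = d;
        i += d.count;
    }
    return n_out;
}

int main(void)
{
    /* A unit-stride scan followed by a larger strided access pattern. */
    uint64_t trace[] = { 0, 8, 16, 24, 32, 1000, 1064, 1128, 1192 };
    rsd_t out[sizeof trace / sizeof trace[0]];
    size_t m = rsd_compress(trace, sizeof trace / sizeof trace[0], out);
    for (size_t k = 0; k < m; k++)
        printf("base=%llu stride=%lld count=%llu\n",
               (unsigned long long)out[k].base, (long long)out[k].stride,
               (unsigned long long)out[k].count);
    return 0;
}
```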
The structural simulation toolkit
A. F. Rodrigues, K. S. Hemmert, B. W. Barrett, C. Kersey, R. Oldfield, M. Weston, R. Risen, J. Cook, P. Rosenfeld, E. Cooper-Balis, B. Jacob
As supercomputers grow, understanding their behavior and performance has become increasingly challenging. New hurdles in scalability, programmability, power consumption, reliability, cost, and cooling are emerging, along with new technologies such as 3D ...
Parallel memory prediction for fused linear algebra kernels
The performance of many scientific programs is limited by data movement. Loop fusion is one optimization used to increase the speed of memory bound operations. To automate loop fusion for matrix computations, we developed the Build to Order (BTO) ...
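As a hedged illustration of the loop fusion the abstract refers to (not BTO's actual generated code), consider two BLAS-like vector updates that each stream over the same input. Fusing them halves the number of passes over memory; the function names below are made up for the example.

```c
#include <stddef.h>

/* Unfused: two separate passes, so x is read from memory twice. */
void axpy_then_scale_unfused(size_t n, double alpha, double beta,
                             const double *x, double *y, double *z)
{
    for (size_t i = 0; i < n; i++)      /* pass 1: y += alpha * x */
        y[i] += alpha * x[i];
    for (size_t i = 0; i < n; i++)      /* pass 2: z  = beta  * x */
        z[i] = beta * x[i];
}

/* Fused: one pass, so each x[i] is loaded once and reused for both updates. */
void axpy_then_scale_fused(size_t n, double alpha, double beta,
                           const double *x, double *y, double *z)
{
    for (size_t i = 0; i < n; i++) {
        double xi = x[i];
        y[i] += alpha * xi;
        z[i] = beta * xi;
    }
}
```

For memory-bound kernels like these the fused version's advantage comes entirely from reduced data movement, which is the effect the abstract attributes to loop fusion.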
A fast GEMM implementation on the Cypress GPU
We present benchmark results of optimized dense matrix multiplication kernels for Cypress GPU. We write general matrix multiply (GEMM) kernels for single (SP), double (DP) and double-double (DDP) precision. Our SGEMM and DGEMM kernels show ~2 Tflop/s and ...
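For readers unfamiliar with the kernel being benchmarked, GEMM computes C = alpha*A*B + beta*C. The sketch below is a plain, unoptimized C reference implementation included only to define the operation; it bears no relation to the tuned Cypress GPU kernels the abstract reports on.

```c
/* Naive reference GEMM: C = alpha * A * B + beta * C, with row-major
 * m x k, k x n and m x n matrices. Optimized implementations block for
 * cache/registers and vectorize; this version only defines the operation. */
void gemm_ref(int m, int n, int k, double alpha, const double *A,
              const double *B, double beta, double *C)
{
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            double acc = 0.0;
            for (int p = 0; p < k; p++)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
    }
}
```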
Performance characteristics of hybrid MPI/OpenMP implementations of NAS parallel benchmarks SP and BT on large-scale multicore supercomputers
The NAS Parallel Benchmarks (NPB) are well-known applications with the fixed algorithms for evaluating parallel systems and tools. Multicore supercomputers provide a natural programming paradigm for hybrid programs, whereby OpenMP can be used with the ...
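The hybrid model the abstract discusses pairs MPI between nodes with OpenMP threads inside each node. Below is a minimal, self-contained sketch of that structure (not code from NPB SP or BT): each rank initializes MPI with thread support, computes a local partial sum with an OpenMP parallel loop, and reduces across ranks.

```c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    /* Request thread support because OpenMP threads run inside each rank. */
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank owns a slice of a global index space (remainder ignored). */
    const long n_global = 1L << 24;
    long local = n_global / size;
    long begin = rank * local;

    double partial = 0.0;
    /* Intra-node parallelism: OpenMP threads share the rank's slice. */
    #pragma omp parallel for reduction(+:partial)
    for (long i = begin; i < begin + local; i++)
        partial += 1.0 / (double)(i + 1);

    /* Inter-node parallelism: MPI combines the per-rank results. */
    double total = 0.0;
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("partial harmonic sum = %f (%d ranks x %d threads)\n",
               total, size, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}
```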
A framework for architecture-level power, area, and thermal simulation and its application to network-on-chip design exploration
We describe the integrated power, area and thermal modeling framework in the Structural Simulation Toolkit (SST) for large-scale high performance computer simulation. It integrates various power and thermal modeling tools and computes run-time energy ...
Should we worry about memory loss?
In recent years the High Performance Computing (HPC) industry has benefited from the development of higher density multi-core processors. With recent chips capable of executing up to 32 tasks in parallel, this rate of growth also shows no sign of ...
A statistical performance model of the Opteron processor
Cycle-accurate simulation is the dominant methodology for processor design space analysis and performance prediction. However, with the prevalence of multi-core, multi-threaded architectures, this method has become highly impractical as the sole means ...
Preliminary design examination of the ParalleX system from a software and hardware perspective
Exascale systems, expected to emerge by the end of the next decade, will require the exploitation of billion-way parallelism at multiple hierarchical levels in order to achieve the desired sustained performance. While traditional approaches to ...
Energy-aware metrics for benchmarking heterogeneous systems
With the advent of heterogeneous computing systems consisting of multi-core CPUs and many-core GPUs, robust methods are needed to facilitate fair benchmark comparisons between different systems. In this paper we present a benchmarking methodology for ...
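Two figures of merit commonly used for such comparisons are throughput per watt and the energy-delay product (EDP); whether the paper uses exactly these metrics is not stated in the snippet, so treat the helper below as a generic illustration with hypothetical names and made-up numbers.

```c
#include <stdio.h>

/* Generic energy-aware figures of merit for one benchmark run.
 * energy_j    : measured energy to solution in joules
 * runtime_s   : wall-clock time in seconds
 * flops_total : useful floating-point operations performed */
typedef struct {
    double flops_per_watt;  /* throughput per unit power    */
    double edp;             /* energy-delay product (J * s) */
} energy_metrics_t;

static energy_metrics_t energy_metrics(double energy_j, double runtime_s,
                                       double flops_total)
{
    energy_metrics_t m;
    double avg_power_w = energy_j / runtime_s;           /* P = E / t       */
    m.flops_per_watt   = (flops_total / runtime_s) / avg_power_w;
    m.edp              = energy_j * runtime_s;           /* lower is better */
    return m;
}

int main(void)
{
    /* Made-up numbers: a 120 s run consuming 30 kJ for 1e13 flops. */
    energy_metrics_t m = energy_metrics(30000.0, 120.0, 1e13);
    printf("GFLOPS/W = %.2f, EDP = %.0f J*s\n", m.flops_per_watt / 1e9, m.edp);
    return 0;
}
```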