Volume 38, Issue 4, March 2011: Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
SPECIAL ISSUE: Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
research-article
Using simulation to design extremescale applications and architectures: programming model exploration

A key problem facing application developers is that they are expected to utilize extreme levels of parallelism soon after delivery of future leadership class machines, but developing applications capable of exposing sufficient concurrency is a time ...

research-article
Performance analysis of the OP2 framework on many-core architectures

We present a performance analysis and benchmarking study of the OP2 "active" library, which provides an abstraction framework for the solution of parallel unstructured mesh applications. OP2 aims to decouple the scientific specification of the ...

research-article
Benchmarking and modelling of POWER7, Westmere, BG/P, and GPUs: an industry case study

This paper introduces an industry strength, multi-purpose, benchmark: Shamrock. Developed at the Atomic Weapons Establishment (AWE), Shamrock is a two dimensional (2D) structured hydrocode; one of its aims is to assess the impacts of a change in ...

research-article
Performance analysis of a hybrid MPI/CUDA implementation of the NAS-LU benchmark

We present the performance analysis of a port of the LU benchmark from the NAS Parallel Benchmark (NPB) suite to NVIDIA's Compute Unified Device Architecture (CUDA), and report on the optimisation efforts employed to take advantage of this platform. ...

research-article
Memory Trace Compression and Replay for SPMD Systems using Extended PRSDs

Concurrency levels in large-scale supercomputers are rising exponentially, and shared-memory nodes with hundreds of cores and non-uniform memory access latencies are expected within the next decade. However, even current petascale systems with tens of ...

research-article
The structural simulation toolkit

As supercomputers grow, understanding their behavior and performance has become increasingly challenging. New hurdles in scalability, programmability, power consumption, reliability, cost, and cooling are emerging, along with new technologies such as 3D ...

research-article
Parallel memory prediction for fused linear algebra kernels

The performance of many scientific programs is limited by data movement. Loop fusion is one optimization used to increase the speed of memory bound operations. To automate loop fusion for matrix computations, we developed the Build to Order (BTO) ...

research-article
A fast GEMM implementation on the cypress GPU

We present benchmark results of optimized dense matrix multiplication kernels for Cypress GPU. We write general matrix multiply (GEMM) kernels for single (SP), double (DP) and double-double (DDP) precision. Our SGEMM and DGEMM kernels show ~2 Tflop/s and ...

research-article
Performance characteristics of hybrid MPI/OpenMP implementations of NAS parallel benchmarks SP and BT on large-scale multicore supercomputers

The NAS Parallel Benchmarks (NPB) are well-known applications with the fixed algorithms for evaluating parallel systems and tools. Multicore supercomputers provide a natural programming paradigm for hybrid programs, whereby OpenMP can be used with the ...

research-article
A framework for architecture-level power, area, and thermal simulation and its application to network-on-chip design exploration

We describe the integrated power, area and thermal modeling framework in the Structural Simulation Toolkit (SST) for large-scale high performance computer simulation. It integrates various power and thermal modeling tools and computes run-time energy ...

research-article
Should we worry about memory loss?

In recent years the High Performance Computing (HPC) industry has benefited from the development of higher density multi-core processors. With recent chips capable of executing up to 32 tasks in parallel, this rate of growth also shows no sign of ...

research-article
A statistical performance model of the opteron processor

Cycle-accurate simulation is the dominant methodology for processor design space analysis and performance prediction. However, with the prevalence of multi-core, multi-threaded architectures, this method has become highly impractical as the sole means ...

research-article
Preliminary design examination of the ParalleX system from a software and hardware perspective

Exascale systems, expected to emerge by the end of the next decade, will require the exploitation of billion-way parallelism at multiple hierarchical levels in order to achieve the desired sustained performance. While traditional approaches to ...

research-article
Energy-aware metrics for benchmarking heterogeneous systems

With the advent of heterogeneous computing systems consisting of multi-core CPUs and many-core GPUs, robust methods are needed to facilitate fair benchmark comparisons between different systems. In this paper we present a benchmarking methodology for ...
