SIGARCH: Vol 39, No 4

column

The challenges of writing portable, correct and high performance libraries for GPUs

Pages 2–7 https://doi.org/10.1145/2082156.2082158

Graphics Processing Units (GPUs) are widely used to accelerate scientific applications. Many successes have been reported with speedups of two or three orders of magnitude over serial implementations of the same algorithms. These speedups typically ...

column

Power profiling and optimization for heterogeneous multi-core systems

Pages 8–13 https://doi.org/10.1145/2082156.2082159

Processing speed and energy efficiency are two of the most critical issues for computer systems. This paper presents a systematic approach for profiling the power and performance characteristics of applications targeting heterogeneous multi-core ...

COLUMN: GPU applications

column

GPU accelerated CAE using open solvers and the cloud

Pages 14–19 https://doi.org/10.1145/2082156.2082161

More than five years after GPUs were first used as accelerators for general scientific computations, the field of General Purpose GPU computing, or GPGPU, has finally reached the mainstream. Developers now have access to a mature hardware and software ...

column

Design space exploration of adaptive beamforming acceleration for bedside and portable medical ultrasound imaging

Pages 20–25 https://doi.org/10.1145/2082156.2082162

The use of adaptive beamforming is a viable solution to provide high-resolution real-time medical ultrasound imaging. However, the increase in image resolution comes at the expense of a significant increase in compute requirements over conventional ...

column

GPU implementation and optimization of electromagnetic simulation using the FDTD method for antenna designing

Pages 26–31 https://doi.org/10.1145/2082156.2082163

This paper describes electromagnetic field simulation using the 3D-FDTD method for antenna design on a CUDA-compatible GPU. We use the Split Perfectly Matched Layer as an absorbing boundary condition. As is well known, the 3D-FDTD method is a kind ...

COLUMN: Architectures I

column

CoreSymphony: an efficient reconfigurable multi-core architecture

Pages 32–37 https://doi.org/10.1145/2082156.2082165

This paper describes CoreSymphony, a cooperative and reconfigurable superscalar processor architecture that improves single-thread performance in chip multiprocessors. CoreSymphony enables some narrow-issue cores to be fused into a single wide-issue ...

column

An FPGA-based scalable simulation accelerator for tile architectures

Pages 38–43 https://doi.org/10.1145/2082156.2082166

FPGA-based simulation systems can simulate processor behavior in realistic time. In order to practically simulate tile many-core architectures, we propose ScalableCore for prototyping system development using multiple FPGAs. In this paper, we present an ...

COLUMN: FPGA applications I

column

Domain-specific programmable design of scalable streaming-array for power-efficient stencil computation

Pages 44–49 https://doi.org/10.1145/2082156.2082168

This paper presents the domain-specific programmable design of custom computing machines for high-performance stencil computation. Stencil computation is one of the typical kernels in scientific computations; however, its low operational intensity makes ...

column

An implementation of out-of-order execution system for acceleration of computational fluid dynamics on FPGAs

Pages 50–55 https://doi.org/10.1145/2082156.2082169

CFD is an important tool for designing aircraft components. FaSTAR is one of the most recent CFD program packages, with various solvers and automatic generation of grid data. However, FaSTAR is difficult to execute on parallel machines because of its ...

column

Embedded architecture with hardware accelerator for target recognition in driver assistance system

Pages 56–59 https://doi.org/10.1145/2082156.2082170

This paper presents a new radar-based recognition system that is able to identify obstacles while a vehicle is moving. Obstacle recognition helps avoid false alarms and allows generating alarms that take into account the ...

COLUMN: Systems and tools II

column

Surviving the end of frequency scaling with reconfigurable dataflow computing

Pages 60–65 https://doi.org/10.1145/2082156.2082172

Over the past decade, x86 processors have come to dominate the world's largest supercomputers. However, in the future, conventional multicore processors are unlikely to be able to deliver the necessary performance per $ and per W to achieve exascale ...

column

KPN2GPU: an approach for discovery and exploitation of fine-grain data parallelism in process networks

Pages 66–71 https://doi.org/10.1145/2082156.2082173

With advances in manycore and accelerator architectures, the high performance and embedded spaces are rapidly converging. Emerging architectures feature different forms of parallelism. Polyhedral Process Networks (PPNs) are a proven model of ...

COLUMN: FPGA applications II

column

High speed CRC with 64-bit generator polynomial on an FPGA

Pages 72–77 https://doi.org/10.1145/2082156.2082175

Deployment of jumbo frame sizes beyond 9000 bytes for storage systems is limited by the 32-bit Cyclic Redundancy Checks used by a network protocol. In order to overcome this limitation, we study the possibility of using 64-bit polynomials in software and ...

column

A biologically plausible real-time spiking neuron simulation environment based on a multiple-FPGA platform

Pages 78–81 https://doi.org/10.1145/2082156.2082176

Neurological research has revealed that neurons encode information in the timing of spikes. Spiking neural network simulations are a flexible and powerful method for investigating the behaviour of such neuronal systems. The spiking neuron models which ...

column

Parallelization of the channel width search for FPGA routing

Pages 82–85 https://doi.org/10.1145/2082156.2082177

COLUMN: Architectures II

column

A study of an FPGA based flexible SIMD processor

Pages 86–89 https://doi.org/10.1145/2082156.2082179

column

Augmenting DR-ASIP flexibility through multi-mode custom instructions

Pages 90–93 https://doi.org/10.1145/2082156.2082180

This paper introduces a simple method called multi-mode custom instructions, which aims at reducing the power consumption of the register file of tightly coupled dynamically reconfigurable application-specific instruction set processors (DR-ASIPs). To ...

column

A MEMS writer system embedded for a programmable optically reconfigurable gate array

Pages 94–97 https://doi.org/10.1145/2082156.2082181

POSTER SESSION: Poster session short presentations

poster

Automatic fusions of CUDA-GPU kernels for parallel map

Pages 98–99 https://doi.org/10.1145/2082156.2082183

When implementing a function mapping on a contemporary GPU, several contradictory performance factors affecting the distribution of computation into GPU kernels have to be balanced. A decomposition-fusion scheme suggests decomposing the computational ...

poster

A discussion on calculating eigenvalues of real symmetric tridiagonal matrices on a GPU

Pages 100–101 https://doi.org/10.1145/2082156.2082184

While GPUs are attracting attention as accelerators in a wide range of application areas, compatibility between the architecture and the selected algorithm is important to effectively bring out their potential performance. This paper focuses on eigenvalue ...

poster

Multicore reconfiguration platform an alternative to RAMPSoC

Pages 102–103 https://doi.org/10.1145/2082156.2082185

The current state of the art in processor performance improvement is multicore-processor systems. These systems offer a number of homogeneous and static processor cores for the parallel distribution of computational tasks. A novel idea in this research ...

poster

Parallelism Level Impact on Energy Consumption in Reconfigurable Devices

Pages 104–105 https://doi.org/10.1145/2082156.2082186

Nowadays, System-on-Chip architectures are composed of several execution resources which support complex applications. As it shares silicon area and limits the cost of the global circuit, the embedding of a reconfigurable resource in these SoCs provides ...

poster

Power and area optimisation in heterogeneous 3D networks-on-chip architectures

Pages 106–107 https://doi.org/10.1145/2082156.2082187

Three-dimensional Network-on-Chip (3D NoC) architectures have attracted considerable interest as a way to address the on-chip communication delays of modern SoC systems. However, the vertical interconnections between layers are more power- and area-hungry compared ...

COLUMN: Departments

department

Internet nuggets

Mark Thorson

Pages 108–117 https://doi.org/10.1145/2082156.2082189
