COLUMN: Systems and tools I
column
The challenges of writing portable, correct and high performance libraries for GPUs

Graphics Processing Units (GPUs) are widely used to accelerate scientific applications. Many successes have been reported with speedups of two or three orders of magnitude over serial implementations of the same algorithms. These speedups typically ...

column
Power profiling and optimization for heterogeneous multi-core systems

Processing speed and energy efficiency are two of the most critical issues for computer systems. This paper presents a systematic approach for profiling the power and performance characteristics of applications targeting heterogeneous multi-core ...

COLUMN: GPU applications
column
GPU accelerated CAE using open solvers and the cloud

More than five years after GPUs were first used as accelerators for general scientific computations, the field of General Purpose GPU computing, or GPGPU, has finally reached the mainstream. Developers now have access to a mature hardware and software ...

column
Design space exploration of adaptive beamforming acceleration for bedside and portable medical ultrasound imaging

Adaptive beamforming is a viable way to provide high-resolution, real-time medical ultrasound imaging. However, the increase in image resolution comes at the expense of a significant increase in compute requirements over conventional ...

column
GPU implementation and optimization of electromagnetic simulation using the FDTD method for antenna designing

This paper describes electromagnetic field simulation using the 3D-FDTD method for antenna design on a CUDA-compatible GPU. We use the Split Perfectly Matched Layer as an absorbing boundary condition. As is well known, the 3D-FDTD method is a kind ...

COLUMN: Architectures I
column
CoreSymphony: an efficient reconfigurable multi-core architecture

This paper describes CoreSymphony, a cooperative and reconfigurable superscalar processor architecture that improves single-thread performance in chip multiprocessors. CoreSymphony enables some narrow-issue cores to be fused into a single wide-issue ...

column
An FPGA-based scalable simulation accelerator for tile architectures

FPGA-based simulation systems can simulate processor behavior in realistic time. In order to practically simulate tile many-core architectures, we propose ScalableCore for prototyping system development using multiple FPGAs. In this paper, we present an ...

COLUMN: FPGA applications I
column
Domain-specific programmable design of scalable streaming-array for power-efficient stencil computation

This paper presents the domain-specific programmable design of custom computing machines for high-performance stencil computation. Stencil computation is one of the typical kernels in scientific computations; however, its low operational intensity makes ...

column
An implementation of out-of-order execution system for acceleration of computational fluid dynamics on FPGAs

CFD is an important tool for designing aircraft components. FaSTAR is one of the most recent CFD program packages, with various solvers and automatic generation of grid data. However, FaSTAR is difficult to execute on parallel machines because of its ...

column
Embedded architecture with hardware accelerator for target recognition in driver assistance system

This paper presents a new radar-based recognition system that is able to identify obstacles while a vehicle is moving. Obstacle recognition helps avoid false alarms and allows alarms to be generated that take into account the ...

COLUMN: Systems and tools II
column
Surviving the end of frequency scaling with reconfigurable dataflow computing

Over the past decade, x86 processors have come to dominate the world's largest supercomputers. However, in the future, conventional multicore processors are unlikely to be able to deliver the necessary performance per $ and per W to achieve exascale ...

column
KPN2GPU: an approach for discovery and exploitation of fine-grain data parallelism in process networks

With advances in manycore and accelerator architectures, the high performance and embedded spaces are rapidly converging. Emerging architectures feature different forms of parallelism. Polyhedral Process Networks (PPNs) are a proven model of ...

COLUMN: FPGA applications II
column
High speed CRC with 64-bit generator polynomial on an FPGA

Deployment of jumbo frame sizes beyond 9000 bytes for storage systems is limited by the 32-bit Cyclic Redundancy Checks used by network protocols. To overcome this limitation, we study the possibility of using 64-bit polynomials in software and ...

column
A biologically plausible real-time spiking neuron simulation environment based on a multiple-FPGA platform

Neurological research has revealed that neurons encode information in the timing of spikes. Spiking neural network simulations are a flexible and powerful method for investigating the behaviour of such neuronal systems. The spiking neuron models which ...

COLUMN: Architectures II
column
Augmenting DR-ASIP flexibility through multi-mode custom instructions

This paper introduces a simple method called multi-mode custom instructions, which aims at reducing the power consumption of the register file of tightly coupled dynamically reconfigurable application specific instruction set processors (DR-ASIPs). To ...

POSTER SESSION: Poster session short presentations
poster
Automatic fusions of CUDA-GPU kernels for parallel map

When implementing a function mapping on a contemporary GPU, several contradictory performance factors affecting the distribution of computation into GPU kernels have to be balanced. A decomposition-fusion scheme suggests decomposing the computational ...

poster
A discussion on calculating eigenvalues of real symmetric tridiagonal matrices on a GPU

While GPUs are attracting attention as accelerators in a wide range of application areas, compatibility between the architecture and the selected algorithm is important to effectively bring out their potential performance. This paper focuses on eigenvalue ...

poster
Multicore reconfiguration platform an alternative to RAMPSoC

The current state of the art in processor performance improvement is multicore-processor systems. These systems offer a number of homogeneous and static processor cores for the parallel distribution of computational tasks. A novel idea in this research ...

poster
Parallelism Level Impact on Energy Consumption in Reconfigurable Devices

Nowadays, System-on-Chip architectures are composed of several execution resources which support complex applications. As it shares silicon area and limits the cost of the global circuit, the embedding of a reconfigurable resource in these SoCs provides ...

poster
Power and area optimisation in heterogeneous 3D networks-on-chip architectures

Three-dimensional Network-on-Chip (3D NoC) architectures have attracted a lot of interest as a way to address the on-chip communication delays of modern SoC systems. However, the vertical interconnections between layers are more power- and area-hungry compared ...

COLUMN: Departments
department
Internet nuggets
