An FPGA NIC Based Hardware Caching for Blockchain
Blockchain, a core technology behind cryptocurrency, has been attracting attention recently. Blockchain is a fault-tolerant distributed ledger that does not need an administrator. We call a transfer of digital assets a "transaction". We need to hold ...
High Speed Performance Estimation of Embedded Hard-core Processors in FPGA-based SoCs
The embedded hard-core processors beside the traditional FPGA fabric in FPGA-based System-on-Chip (SoC) devices make them an attractive alternative for realizing the software portions of the application while using the FPGA fabric for hardware ...
A Time-Division Multiplexing Ising Machine on FPGAs
Annealing machines based on the Ising model, which can solve combinatorial optimization problems, are an emerging solution to overcome the performance limit of the von Neumann architecture. However, it is difficult to solve practical combinatorial optimization ...
An Adaptive Demotion Policy for High-Associativity Caches
Although the Least Recently Used (LRU) policy is known as a simple but high-performance cache replacement policy, high-associativity caches rarely adopt the LRU policy because of the increased hardware overhead. The Re-Reference Interval ...
Towards Flexible Automatic Generation of Graph Processing Gateware
FPGAs have been demonstrated as promising platforms to accelerate graph processing applications at scale with superior energy-efficiency. However, programming FPGAs is significantly more challenging than similar software solutions. To address this ...
Dataflow based Near Data Computing Achieves Excellent Energy Efficiency
The emergence of 3D-DRAM has rekindled interest in near data computing (NDC) research. This article introduces dataflow processing in memory (DFPIM) which melds near data computing, dataflow architecture, coarse-grained reconfigurable logic (CGRL), and ...
DTP: Enabling Exhaustive Exploration of FPGA Temporal Partitions for Streaming HPC Applications
Reconfigurable computing systems show great promise for accelerating streaming HPC applications because of their low power consumption and high performance. However, mapping an HPC application to a reconfigurable system is a challenging task. The ...
Hardware Acceleration with Multi-Threading of Java-Based High Level Synthesis Tool
In this research, we attempt to speed up computational fluid dynamics (CFD) and convolutional neural network (CNN) workloads using the thread function of the JavaRock-Thrash high-level synthesis tool with an FPGA. In the two-dimensional heat equation, by ...
Performance Evaluation of PEACH3: Field-Programmable Gate Array Switch for Tightly Coupled Accelerators
An FPGA switching hub for tightly coupled accelerators (TCA) architecture called PEACH3 (PCI-Express Adaptive Communication Hub ver. 3) is evaluated and its communication speed is analyzed. PEACH3 connects a number of GPUs directly through PCI express ...
Accelerated Embedded AKAZE Feature Detection Algorithm on FPGA
Feature detection is a major operation in various computer vision systems. The KAZE algorithm and its improved version, Accelerated-KAZE (AKAZE), are considered the first algorithms to detect features by building a scale space using nonlinear ...
Reducing the Cost of Removing Border Artefacts in Fourier Transforms
Many image processing algorithms are implemented in a combination of spatial and frequency domains. The fast Fourier transform (FFT) is the workhorse of such algorithms. One limitation of the FFT is artefacts that result from the implicit periodicity ...
A porting and optimization of search for neighbour-particle in MPS method for GPU by using OpenACC
The Moving Particle Semi-implicit (MPS) method is a particle method used in fields such as computational fluid dynamics. Target fluids and objects are divided into particles, and each particle interacts with its ...
Acceleration of Publish/Subscribe Messaging in ROS-compliant FPGA Component
Intelligent robots demand complex information processing such as SLAM (Simultaneous Localization and Mapping) and DNN (Deep Neural Network). FPGA (Field Programmable Gate Array) is expected to accelerate these applications with high energy efficiency. ...
HW/SW Co-design of an IEEE 802.11a/g Receiver on Xilinx Zynq SoC using High-Level Synthesis
This paper presents an implementation of an Orthogonal Frequency-Division Multiplexing (OFDM) receiver using the high-level synthesis tool from Xilinx called Software Defined System-on-Chip (SDSoC). The Zynq SoCs containing an ARM processor besides a ...
FPGA-based Stream Computing for High-Performance N-Body Simulation using Floating-Point DSP Blocks
Recent advancement of FPGAs allows high-performance and low-power computing by constructing deeply-pipelined custom hardware using floating-point DSP blocks. In this paper, we present a stream-computing architecture and design for FPGA-based high-...
Neural Network Training Acceleration with PSO Algorithm on a GPU Using OpenCL
Neural networks and deep learning currently provide promising solutions to many practical problems. One of the difficulties in building neural network models is the training process, which requires finding an optimal solution for the network weights. ...
FPGA based ASIC Emulator with High Speed Optical Serial Links
We propose a multi-FPGA system that uses the high-speed optical serial interfaces built into recent FPGAs, and construct an ASIC emulator with it. Although a conventional system that uses parallel connections is limited in bandwidth by the number of I/Os, the proposed system has ...
A Case for Remote GPUs over 10GbE Network for VR Applications
VR technology, which enables users to experience computer-generated environments similar to real ones, has become popular. In VR, the computation cost of graphics processing is high, and thus it requires a high-end GPU because high-quality ...
Acceleration of the aggregation process in a Hall-thruster simulation using Intel FPGA SDK for OpenCL
The Full Particle-In-Cell (Full-PIC) method is a numerical simulation technique used in the research and development of Hall-thrusters, which are a type of electric propulsion engine. It treats ions, neutrons, and electrons as particles and is highly ...
FPGA Accelerated NoC-Simulation: A Case Study on the Intel Xeon Phi Ringbus Topology
- Oliver Jakob Arndt,
- Christian Spindeldreier,
- Kevin Wohnrade,
- Daniel Pfefferkorn,
- Martin Neuenhahn,
- Holger Blume
Complex signal processing algorithms targeting architectures with increasingly high numbers of parallel processing units require high-performance core interconnections (i.e., low latencies, high throughput, no pinch-offs or bottlenecks). Therefore, ...
High-level Synthesis based on Parallel Design Patterns using a Functional Language
Logic-circuit integration of a field-programmable gate array (FPGA) has grown considerably with improvements in semiconductor technology. High-level synthesis (HLS) is now widely used to implement complex FPGA applications to increase design efficiency, ...
High-Performance Hardware Accelerators for Solving Ordinary Differential Equations
- Ioannis Stamoulias,
- Matthias Möller,
- Rene Miedema,
- Christos Strydis,
- Christoforos Kachris,
- Dimitrios Soudris
Ordinary Differential Equations (ODEs) are widely used in many high-performance computing applications. However, contemporary processors generally provide limited throughput for these kinds of calculations. A high-performance hardware accelerator has ...
Access Network Generation for Efficient Debugging of FPGAs
The inclusion of access networks in modern FPGAs can enable a large number of use cases, notably in debugging. Using access networks can eliminate the need for frequent synthesis during the debugging phase, which results in saving debugging time and ...
Probabilistic Strategies Based on Staged LSH for Speedup of Audio Fingerprint Searching with Ten Million Scale Database
We are developing and improving algorithms to identify audio fingerprints (AFP) in a network router. Staged Locality Sensitive Hashing (LSH) is one of them, and it is nearly as fast as the 1 Gbps of prevalent network routers. In this paper, we propose two ...
HLS Compilation for CPU Interlays
The idea of coupling reconfigurable fabrics with general-purpose processors has been extensively studied during the last couple of decades. Custom instructions targeting those reconfigurable fabrics had to be handcrafted because tools capable of high ...
Index Terms
- Proceedings of the 8th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies