A Parallel Yet Pipelined Architecture for Efficient Implementation of the Advanced Encryption Standard Algorithm on Reconfigurable Hardware
The Advanced Encryption Standard (AES) is used in almost all network-based applications to ensure security. The core computation of AES, which is performed on data blocks of 128 bits, is iterated for several rounds, depending on the key size. The strength ...
A Cross-ISA Kernelized High-Performance Parallel Emulator
Cross-instruction-set-architecture (cross-ISA) virtual machines are a core technology for many important applications. However, traditional cross-ISA virtual machines have encountered several bottlenecks: (1) they cannot effectively utilize the host ...
Towards Scalable Java HPC with Hybrid and Native Communication Devices in MPJ Express
MPJ Express is a messaging system that allows application developers to parallelize their compute-intensive sequential Java codes on High Performance Computing clusters and multicore processors. In this paper, we extend MPJ Express software to provide ...
Parallel Implementations of the Cooperative Particle Swarm Optimization on Many-core and Multi-core Architectures
Particle swarm optimization (PSO) is an evolutionary heuristics-based method used for continuous function optimization. PSO is stochastic yet very robust. Nevertheless, real-world optimizations require a high computational effort to converge to a good ...
Transparent Speculative Parallelization of Discrete Event Simulation Applications Using Global Variables
Parallelizing (compute-intensive) discrete event simulation (DES) applications is a classical approach for speeding up their execution and for making very large/complex simulation models tractable. This has been historically achieved via parallel DES (...
Czip: A Fast Lossless Compression Algorithm for Climate Data
Climate data have been dramatically increasing in volume in recent years. This huge volume of climate data poses considerable challenges for data storage, archiving and sharing. In this paper, we propose a lossless compression algorithm for climate data,...
Combining Data and Computation Distribution Directives for Hybrid Parallel Programming: A Transformation System
This paper describes dSTEP, a directive-based programming model for hybrid shared and distributed memory machines. The originality of our work is the definition and implementation of a unified high-level programming model addressing both data and ...
A Parallelization Approach for Hard Real-Time Systems and Its Application on Two Industrial Programs
Industrial applications have often grown and improved over many years. As their performance demands increase, they also need to benefit from the availability of multi-core processors. However, a reimplementation from scratch and even a restructuring ...
Automatic Parallelization: Executing Sequential Programs on a Task-Based Parallel Runtime
Today's software contains billions of lines of sequential code that do not benefit from the parallelism available in modern multicore architectures. Automatically parallelizing sequential code, to promote an efficient use of the available ...
Purge-Rehab: Eager Software Transactional Memory with High Performance Under Contention
Transactional memory is a programming model that attempts to make parallel programming easier. Transactional memory uses either eager (at encounter time) or lazy (at commit time) validation to check for conflicting accesses between concurrent ...