Scheduling Parallel Computations by Work Stealing: A Survey
Work stealing has been proven to be an efficient technique for scheduling parallel computations, and has been gaining popularity as the multiprocessor/multicore-processor load balancing technology of choice in both industry and academia. A review on the ...
Data-Driven Thread Execution on Heterogeneous Processors
In this paper we report our experience in implementing and evaluating the Data-Driven Multithreading (DDM) model on a heterogeneous multi-core processor. DDM is a non-blocking multithreading model that decouples the synchronization from the computation ...
RedThreads: An Interface for Application-Level Fault Detection/Correction Through Adaptive Redundant Multithreading
In the presence of accelerated fault rates, which are projected to be the norm on future exascale systems, it will become increasingly difficult for high-performance computing (HPC) applications to accomplish useful computation. Due to the fault-...
Parallel Asynchronous Strategies for the Execution of Feature Selection Algorithms
Reducing the dimensionality of datasets is a fundamental step in the task of building a classification model. Feature selection is the process of selecting a smaller subset of features from the original set in order to enhance the performance of the ...
Software Static Energy Modeling for Modern Processors
Power and energy estimation tools are essential for system designers, software developers and compiler developers who want to optimize their products. In this work we present a novel method for statically estimating and analyzing the energy ...
Software Speculation on Caching DSMs
Clusters with caching DSMs deliver programmability and performance by supporting a shared-memory programming model and tolerating the communication latency of remote fetches via caching. The input of a data-parallel program is partitioned across machines in ...
Accelerating Data Analytics on Integrated GPU Platforms via Runtime Specialization
- Naila Farooqui
- Indrajit Roy
- Yuan Chen
- Vanish Talwar
- Rajkishore Barik
- Brian Lewis
- Tatiana Shpeisman
- Karsten Schwan
Integrated GPU systems are a cost-effective and energy-efficient option for accelerating data-intensive applications. While these platforms reduce the overhead of offloading computation to the GPU and offer potential for fine-grained resource scheduling, ...
Hierarchical Pattern Mining with the Automata Processor
Mining complex patterns with hierarchical structures is becoming increasingly important for understanding the underlying information in large and unstructured databases. When compared with a set-mining problem or a string-mining problem, the computation ...
Graph Programming Interface (GPI): A Linear Algebra Programming Model for Large Scale Graph Computations
- William Horn
- Manoj Kumar
- Joefon Jann
- José Moreira
- Pratap Pattnaik
- Mauricio Serrano
- Gabriel Tanase
- Hao Yu
Graph processing is becoming a crucial component for analyzing big data arising in many application domains such as social and biological networks, fraud detection, and sentiment analysis. As a result, a number of computational models for graph ...
Fast Automated Processing and Evaluation of Identity Leaks
Identity data leaks on the Internet are more relevant than ever. Almost every week, the news reports leaked databases with more than a million users. Smaller but no less dangerous leaks happen multiple times a day. The ...
Automated Compiler Optimization of Multiple Vector Loads/Stores
With widening vectors and the proliferation of advanced vector instructions in today's processors, vectorization plays an ever-increasing role in delivering application performance. Achieving the performance potential of this vector hardware has ...