Achieving Fair or Differentiated Cache Sharing in Power-Constrained Chip Multiprocessors
Limiting the peak power consumption of chip multiprocessors (CMPs) has recently received a lot of attention. In order to enable chip-level power capping, the peak power consumption of on-chip L2 caches in a CMP often needs to be constrained by ...
A Theoretical Framework for Value Prediction in Parallel Systems
We present here a theoretical framework towards a fundamental understanding of the effects of value prediction. Our framework consists of two parts: first, an identification of the theoretical limit of value prediction and an indication of the potential ...
Heterogeneous Mini-rank: Adaptive, Power-Efficient Memory Architecture
Memory power consumption has become a big concern in server platforms. A recently proposed mini-rank architecture reduces the memory power consumption by breaking each DRAM rank into multiple narrow mini-ranks and activating fewer devices for each ...
Gossamer: A Lightweight Approach to Using Multicore Machines
The key to performance improvements in the multi-core era is for software to utilize the available concurrency. This paper presents a lightweight programming framework called Gossamer that is easy to use, enables the solution of a broad range of ...
Toward Harnessing DOACROSS Parallelism for Multi-GPGPUs
To exploit the full potential of GPGPUs for general-purpose computing, the DOACROSS (DOACR) parallelism abundant in scientific and engineering applications must be harnessed. However, the presence of cross-iteration data dependences in DOACR loops poses an obstacle to ...
Exploitation of Dynamic Communication Patterns through Static Analysis
- Robert Preissl
- Bronis R. de Supinski
- Martin Schulz
- Daniel J. Quinlan
- Dieter Kranzlmüller
- Thomas Panas
Collective operations can have a large impact on the performance of parallel applications. However, the ideal implementation of a particular collective communication often depends on both the application and the targeted machine structure. Our approach ...
Parallel Exact Inference on a CPU-GPGPU Heterogenous System
Exact inference is a key problem in exploring probabilistic graphical models. The computational complexity of inference increases dramatically with the parameters of the graphical model. Achieving scalability over hundreds of threads remains a ...
Minimizing Stretch and Makespan of Multiple Parallel Task Graphs via Malleable Allocations
Many scientific applications can be structured as Parallel Task Graphs (PTGs), i.e., graphs of data-parallel tasks. Adding data-parallelism to a task-parallel application provides opportunities for higher performance and scalability, but poses scheduling ...
Efficient PageRank and SpMV Computation on AMD GPUs
Google's famous PageRank algorithm is widely used to determine the importance of web pages in search engines. Given the large number of web pages on the World Wide Web, efficient computation of PageRank becomes a challenging problem. We accelerated the ...
Identifying the Root Causes of Wait States in Large-Scale Parallel Applications
Driven by growing application requirements and accelerated by current trends in microprocessor design, the number of processor cores on modern supercomputers is increasing from generation to generation. However, load or communication imbalance prevents ...
Energy Modeling of Wireless Sensor Nodes Based on Petri Nets
Energy minimization is of great importance in wireless sensor networks for extending battery lifetime. Accurately understanding the energy consumption characteristics of each sensor node is a critical step in designing energy-saving strategies. ...
Optimal Task Reallocation in Heterogeneous Distributed Computing Systems with Age-Dependent Delay Statistics
This paper presents a general framework for optimal task reallocation in heterogeneous distributed-computing systems and offers a rigorous analytical model for the stochastic execution time of a workload. The model takes into account the heterogeneity ...
Rearchitecting MapReduce for Heterogeneous Multicore Processors with Explicitly Managed Memories
This paper presents a new design and an implementation of the runtime system of MapReduce for heterogeneous multicore processors with explicitly managed local memories. We advance the state of the art in runtime support for MapReduce using five ...
System-Level, Unified In-band and Out-of-band Dynamic Thermal Control
High-density computer racks are becoming increasingly commonplace in supercomputing centers and data centers. With tight integration of high-powered computing components in the racks, hot spots or pockets of elevated temperatures on the chips and system can ...
Microwiper: Efficient Memory Propagation in Live Migration of Virtual Machines
Live migration of virtual machines relocates running VMs across physical hosts with unnoticeable service downtime. However, propagating changing VM memory at low cost, especially for write-intensive applications or at relatively low network bandwidth, is ...
Reliability and Performance Optimization of Pipelined Real-Time Systems
We consider pipelined real-time systems, commonly found in assembly lines, consisting of a chain of tasks executing on a distributed platform. Their processing is pipelined: each processor executes only one interval of consecutive tasks. We are ...
An Efficient Randomized Routing Protocol for Single-Hop Radio Networks
In this paper we study the important problems of message routing, sorting, and selection in a radio network. A radio network consists of stations where each station is a hand-held device. We consider a single-hop radio network. In a single-hop network ...
Checkpointing vs. Migration for Post-Petascale Supercomputers
An alternative to classical fault-tolerant approaches for large-scale clusters is failure avoidance, by which the occurrence of a fault is predicted and a preventive measure is taken. We develop analytical performance models for two types of preventive ...
Revisting Tag Collision Problem in RFID Systems
In RFID systems, the reader cannot distinguish the IDs reported concurrently by multiple tags from their overlapped signals, and a collision occurs. Many anticollision algorithms have been proposed to improve the throughput and reduce the latency for tag ...
Distributed Minimum Transmission Multicast Routing Protocol for Wireless Sensor Networks
Energy-efficient multicast routing is one of the fundamental problems in wireless sensor networks (WSNs). Previous work has shown that when the goal is to find multicast trees with minimum transmission cost, the problem becomes NP-complete. In this work, ...
A Quantitative Study of Accountability in Wireless Multi-hop Networks
In this paper, we explore a quantitative approach to accountable wireless multi-hop networks. We propose using hierarchical P-Accountability to adapt to the requirements of modeling a complex network environment and assess the degree of accountability in a ...
A Stack-on-Demand Execution Model for Elastic Computing
Cloud computing is all the rage these days; its confluence with mobile computing would bring an even more pervasive influence. Clouds per se are elastic computing infrastructure where mobile applications can offload or draw tasks in an on-demand push-...
Designing Power-Aware Collective Communication Algorithms for InfiniBand Clusters
Modern supercomputing systems have grown phenomenally in recent years owing to the advent of multi-core architectures and high-speed networks. However, the operational and maintenance costs of these systems have also grown rapidly. ...