Volume 8, Issue 4, January 2012: Special Issue on High-Performance Embedded Architectures and Compilers
research-article
Open Access
ABS: A low-cost adaptive controller for prefetching in a banked shared last-level cache
Article No.: 19, Pages 1–20, https://doi.org/10.1145/2086696.2086698

Hardware data prefetch is a very well known technique for hiding memory latencies. However, in a multicore system fitted with a shared Last-Level Cache (LLC), prefetch induced by a core consumes common resources such as shared cache space and main ...

research-article
Open Access
An architecture-independent instruction shuffler to protect against side-channel attacks
Article No.: 20, Pages 1–19, https://doi.org/10.1145/2086696.2086699

Embedded cryptographic systems, such as smart cards, require secure implementations that are robust to a variety of low-level attacks. Side-Channel Attacks (SCA) exploit information such as power consumption, electromagnetic radiation and acoustic ...

research-article
Open Access
Approximate graph clustering for program characterization
Article No.: 21, Pages 1–21, https://doi.org/10.1145/2086696.2086700

An important aspect of system optimization research is the discovery of program traits or behaviors. In this paper, we present an automated method of program characterization which is able to examine and cluster program graphs, i.e., dynamic data graphs ...

research-article
Open Access
Bahurupi: A polymorphic heterogeneous multi-core architecture
Article No.: 22, Pages 1–21, https://doi.org/10.1145/2086696.2086701

Computing systems have made an irreversible transition towards parallel architectures with the emergence of multi-cores. Moreover, power and thermal limits in embedded systems mandate the deployment of many simpler cores rather than a few complex cores ...

research-article
Open Access
Compiler mitigations for time attacks on modern x86 processors
Article No.: 23, Pages 1–20, https://doi.org/10.1145/2086696.2086702

This paper studies and evaluates the extent to which automated compiler techniques can defend against timing-based side channel attacks on modern x86 processors. We study how modern x86 processors can leak timing information through side channels that ...

research-article
Open Access
Compiler techniques to improve dynamic branch prediction for indirect jump and call instructions
Article No.: 24, Pages 1–20, https://doi.org/10.1145/2086696.2086703

Indirect jump instructions are used to implement multiway branch statements and virtual function calls in object-oriented languages. Branch behavior can have significant impact on program performance, but fortunately hardware predictors can alleviate ...

research-article
Open Access
DAPSCO: Distance-aware partially shared cache organization
Article No.: 25, Pages 1–19, https://doi.org/10.1145/2086696.2086704

Many-core tiled CMP proposals often assume a partially shared last level cache (LLC) since this provides a good compromise between access latency and cache utilization. In this paper, we propose a novel way to map memory addresses to LLC banks that ...

research-article
Open Access
On-the-fly structure splitting for heap objects
Article No.: 26, Pages 1–20, https://doi.org/10.1145/2086696.2086705

With the advent of multicore systems, the gap between processor speed and memory latency has grown worse because of their complex interconnect. Sophisticated techniques are needed more than ever to improve an application's spatial and temporal locality. ...

research-article
Open Access
Efficient liveness computation using merge sets and DJ-graphs
Article No.: 27, Pages 1–18, https://doi.org/10.1145/2086696.2086706

In this work we devise an efficient algorithm that computes the liveness information of program variables. The algorithm employs SSA form and DJ-graphs as representation to build Merge sets. The Merge set of node n, M(n) is based on the structure of the ...

research-article
Open Access
Efficiently exploiting memory level parallelism on asymmetric coupled cores in the dark silicon era
Article No.: 28, Pages 1–21, https://doi.org/10.1145/2086696.2086707

Extracting high memory-level parallelism (MLP) is essential for speeding up single-threaded applications which are memory bound. At the same time, the projected amount of dark silicon (the fraction of the chip powered off) on a chip is growing. Hence, ...

research-article
Open Access
Exploring the limits of GPGPU scheduling in control flow bound applications
Article No.: 29, Pages 1–22, https://doi.org/10.1145/2086696.2086708

GPGPUs are optimized for graphics; for that reason, the hardware is optimized for massively data parallel applications characterized by predictable memory access patterns and little control flow. For such applications, e.g., matrix multiplication, GPGPU ...

research-article
Open Access
FlexSig: Implementing flexible hardware signatures
Article No.: 30, Pages 1–20, https://doi.org/10.1145/2086696.2086709

With the advent of chip multiprocessors, new techniques have been developed to make parallel programming easier and more reliable. New parallel programming paradigms and new methods of making the execution of programs more efficient and more reliable have ...

research-article
Open Access
Hardware transactional memory with software-defined conflicts
Article No.: 31, Pages 1–20, https://doi.org/10.1145/2086696.2086710

In this paper we investigate the benefits of turning the concept of transactional conflict from its traditionally fixed definition into a variable one that can be dynamically controlled in software. We propose the extension of the atomic language ...

research-article
Open Access
Improving performance of nested loops on reconfigurable array processors
Article No.: 32, Pages 1–23, https://doi.org/10.1145/2086696.2086711

Pipelining algorithms are typically concerned with improving only the steady-state performance, or the kernel time. The pipeline setup time happens only once and therefore can be negligible compared to the kernel time. However, for Coarse-Grained ...

research-article
Open Access
Making wide-issue VLIW processors viable on FPGAs
Article No.: 33, Pages 1–16, https://doi.org/10.1145/2086696.2086712

Soft and highly customized processors are emerging as a common way to efficiently control the large amount of computing resources available on FPGAs. Yet, some processor architectures of choice for DSP and media applications, such as wide-issue VLIW ...

research-article
Open Access
On the evaluation of the impact of shared resources in multithreaded COTS processors in time-critical environments
Article No.: 34, Pages 1–25, https://doi.org/10.1145/2086696.2086713

Commercial Off-The-Shelf (COTS) processors are now commonly used in real-time embedded systems. The characteristics of these processors fulfill system requirements in terms of time-to-market, low cost, and high performance-per-watt ratio. However, ...

research-article
Open Access
Non-monopolizable caches: Low-complexity mitigation of cache side channel attacks
Article No.: 35, Pages 1–21, https://doi.org/10.1145/2086696.2086714

We propose a flexibly-partitioned cache design that either drastically weakens or completely eliminates cache-based side channel attacks. The proposed Non-Monopolizable (NoMo) cache dynamically reserves cache lines for active threads and prevents other ...

research-article
Open Access
On the simulation of large-scale architectures using multiple application abstraction levels
Article No.: 36, Pages 1–20, https://doi.org/10.1145/2086696.2086715

Simulation is a key tool for computer architecture research. In particular, cycle-accurate simulators are extremely important for microarchitecture exploration and detailed design decisions, but they are slow and, so, not suitable for simulating large-...

research-article
Open Access
Optimizing explicit data transfers for data parallel applications on the cell architecture
Article No.: 37, Pages 1–20, https://doi.org/10.1145/2086696.2086716

In this paper we investigate a general approach to automate some deployment decisions for a certain class of applications on multi-core computers. We consider data-parallelizable programs that use the well-known double buffering technique to bring the ...

research-article
Open Access
PLDS: Partitioning linked data structures for parallelism
Article No.: 38, Pages 1–21, https://doi.org/10.1145/2086696.2086717

Recently, parallelization of computations in the presence of dynamic data structures has shown promising potential. In this paper, we present PLDS, a system for easily expressing and efficiently exploiting parallelism in computations that are based on ...

research-article
Open Access
Polyhedral parallelization of binary code
Article No.: 39, Pages 1–21, https://doi.org/10.1145/2086696.2086718

Many automatic software parallelization systems have been proposed in the past decades, but most of them are dedicated to source-to-source transformations. This paper shows that parallelizing executable programs is feasible, even if they require complex ...

research-article
Open Access
ReNIC: Architectural extension to SR-IOV I/O virtualization for efficient replication
Article No.: 40, Pages 1–22, https://doi.org/10.1145/2086696.2086719

Virtualization is gaining popularity in cloud computing and has become the key enabling technology in cloud infrastructure. By replicating the virtual server state to multiple independent platforms, virtualization improves the reliability and ...

research-article
Open Access
Sabrewing: A lightweight architecture for combined floating-point and integer arithmetic
Article No.: 41, Pages 1–22, https://doi.org/10.1145/2086696.2086720

In spite of the fact that floating-point arithmetic is costly in terms of silicon area, the joint design of hardware for floating-point and integer arithmetic is seldom considered. While components like multipliers and adders can potentially be shared, ...

research-article
Open Access
Seamlessly portable applications: Managing the diversity of modern heterogeneous systems
Article No.: 42, Pages 1–20, https://doi.org/10.1145/2086696.2086721

Nowadays, many possible configurations of heterogeneous systems exist, posing several new challenges to application development: different types of processing units usually require individual programming models with dedicated runtime systems and ...

research-article
Open Access
SYRANT: SYmmetric resource allocation on not-taken and taken paths
Article No.: 43, Pages 1–20, https://doi.org/10.1145/2086696.2086722

In the multicore era, achieving ultimate single process performance is still an issue, e.g., for single process workloads or for sequential sections in parallel applications. Unfortunately, despite tremendous research effort on branch prediction, ...

research-article
Open Access
The gradient-based cache partitioning algorithm
Article No.: 44, Pages 1–21, https://doi.org/10.1145/2086696.2086723

This paper addresses the problem of partitioning a cache between multiple concurrent threads and in the presence of hardware prefetching. Cache replacement designed to preserve temporal locality (e.g., LRU) will allocate cache resources proportional to ...

research-article
Open Access
The migration prefetcher: Anticipating data promotion in dynamic NUCA caches
Article No.: 45, Pages 1–20, https://doi.org/10.1145/2086696.2086724

The exponential increase in multicore processor (CMP) cache sizes, accompanied by growing on-chip wire delays, makes it difficult to implement traditional caches with a single, uniform access latency. Non-Uniform Cache Architecture (NUCA) designs have been ...

research-article
Open Access
Thread Tranquilizer: Dynamically reducing performance variation
Article No.: 46, Pages 1–21, https://doi.org/10.1145/2086696.2086725

To realize the performance potential of multicore systems, we must effectively manage the interactions between memory reference behavior and the operating system policies for thread scheduling and migration decisions. We observe that these interactions ...

research-article
Open Access
TL-plane-based multi-core energy-efficient real-time scheduling algorithm for sporadic tasks
Article No.: 47, Pages 1–20, https://doi.org/10.1145/2086696.2086726

As the energy consumption of multi-core systems becomes increasingly prominent, it's a challenge to design an energy-efficient real-time scheduling algorithm in multi-core systems for reducing the system energy consumption while guaranteeing the ...
