TACO: Vol 18, No 4

Volume 18, Issue 4, December 2021

Editor:

David Kaeli
Northeastern University, USA

Publisher:

Association for Computing Machinery, New York, NY, United States

ISSN: 1544-3566

EISSN: 1544-3973

research-article

Open Access

Towards Enhanced System Efficiency while Mitigating Row Hammer

Article No.: 40, Pages 1–26, https://doi.org/10.1145/3458749

In recent years, DRAM-based main memories have become susceptible to the Row Hammer (RH) problem, which causes bits to flip in a row without accessing them directly. Frequent activation of a row, called an aggressor row, causes its adjacent rows’ (...
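
The aggressor/victim mechanism described in this abstract is often countered with per-row activation counting. The following is a minimal, generic sketch of that idea, not the mitigation proposed in the article: a controller counts activations per row and refreshes the two adjacent rows once a hypothetical threshold (`ACT_THRESHOLD`, whose value here is purely illustrative and far below real DRAM thresholds) is reached. The class and method names are assumptions.

```python
from collections import defaultdict

ACT_THRESHOLD = 4  # hypothetical, illustrative only: activations allowed before neighbours are refreshed


class CounterBasedMitigator:
    """Toy per-row activation counter that refreshes the neighbours of a hammered row."""

    def __init__(self, num_rows: int, threshold: int = ACT_THRESHOLD) -> None:
        self.num_rows = num_rows
        self.threshold = threshold
        self.counts = defaultdict(int)  # row index -> activations since last reset
        self.refreshed = []             # log of victim rows that received a targeted refresh

    def activate(self, row: int) -> None:
        self.counts[row] += 1
        if self.counts[row] >= self.threshold:
            # Refresh the physically adjacent (potential victim) rows, then reset the counter.
            for victim in (row - 1, row + 1):
                if 0 <= victim < self.num_rows:
                    self.refreshed.append(victim)
            self.counts[row] = 0

    def refresh_window_elapsed(self) -> None:
        # The periodic refresh restores the charge of every row, so all counters can be cleared.
        self.counts.clear()
```

For example, activating row 5 eight times with the default threshold refreshes rows 4 and 6 twice. Real devices complicate this picture: the affected neighbourhood can extend beyond the immediately adjacent rows, and logical row addresses may be remapped inside the DRAM chip, so a practical counter scheme has to account for both.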

research-article

Open Access

All-gather Algorithms Resilient to Imbalanced Process Arrival Patterns

Jerzy Proficz

Article No.: 41, Pages 1–22, https://doi.org/10.1145/3460122

Two novel algorithms for the all-gather operation resilient to imbalanced process arrival patterns (PATs) are presented. The first one, Background Disseminated Ring (BDR), is based on the regular parallel ring algorithm often supplied in MPI ...
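
The abstract notes that BDR builds on the regular ring all-gather algorithm found in MPI libraries. A minimal simulation of that baseline ring (not of BDR itself) shows its structure: with p ranks, the ring needs p - 1 steps, and in step s rank r forwards block (r - s) mod p to its right neighbour while receiving the block its left neighbour forwards. The function name `ring_allgather` is hypothetical.

```python
from typing import List, Optional


def ring_allgather(inputs: List[str]) -> List[List[Optional[str]]]:
    """Simulate the classic ring all-gather on p processes.

    inputs[r] is the block contributed by rank r; the result holds, for every
    rank, the full list of p blocks after p - 1 communication steps.
    """
    p = len(inputs)
    # buffers[r][b] is rank r's copy of block b (None until it arrives).
    buffers: List[List[Optional[str]]] = [[None] * p for _ in range(p)]
    for r in range(p):
        buffers[r][r] = inputs[r]

    for step in range(p - 1):
        # All sends of one step happen "simultaneously": collect first, then deliver.
        messages = []
        for r in range(p):
            block = (r - step) % p  # block that rank r forwards in this step
            dest = (r + 1) % p      # right neighbour in the ring
            messages.append((dest, block, buffers[r][block]))
        for dest, block, payload in messages:
            # Rank r received block (r - step) in the previous step, so it is always present.
            assert payload is not None, "ring invariant violated"
            buffers[dest][block] = payload
    return buffers
```

Because every step depends on data from the left neighbour, a single late-arriving rank delays every rank downstream of it in the ring, which is the kind of sensitivity to imbalanced arrival patterns that the article's algorithms are designed to tolerate.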

research-article

Open Access

Configurable Multi-directional Systolic Array Architecture for Convolutional Neural Networks

Article No.: 42, Pages 1–24, https://doi.org/10.1145/3460776

The systolic array architecture is one of the most popular choices for convolutional neural network hardware accelerators. The biggest advantage of the systolic array architecture is its simple and efficient design principle. Without complicated control ...
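
To make the "simple and efficient design principle" concrete, here is a cycle-level model of a conventional output-stationary systolic array computing C = AB. It is a generic, single-direction baseline, not the configurable multi-directional architecture proposed in the article; the function name and the output-stationary dataflow are assumptions chosen for illustration. Each processing element (PE) only multiplies, accumulates, and forwards its operands to a neighbour, so no control is needed beyond skewing the inputs at the array edges.

```python
from typing import List

Matrix = List[List[float]]


def systolic_matmul(A: Matrix, B: Matrix) -> Matrix:
    """Cycle-by-cycle model of an output-stationary n x m systolic array.

    Rows of A enter from the left edge and columns of B from the top edge,
    each skewed by one cycle per row/column, so A[i][k] and B[k][j] meet in
    processing element (i, j) at cycle i + j + k.
    """
    n, K, m = len(A), len(B), len(B[0])
    assert all(len(row) == K for row in A), "inner dimensions must match"
    acc = [[0.0] * m for _ in range(n)]    # stationary partial sums, one per PE
    a_reg = [[0.0] * m for _ in range(n)]  # operand each PE forwards to the right
    b_reg = [[0.0] * m for _ in range(n)]  # operand each PE forwards downward

    for t in range(n + m + K - 2):
        new_a = [[0.0] * m for _ in range(n)]
        new_b = [[0.0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:  # left edge: skewed input of row i of A
                    k = t - i
                    new_a[i][j] = A[i][k] if 0 <= k < K else 0.0
                else:       # otherwise take the left neighbour's operand from the previous cycle
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:  # top edge: skewed input of column j of B
                    k = t - j
                    new_b[i][j] = B[k][j] if 0 <= k < K else 0.0
                else:       # otherwise take the upper neighbour's operand from the previous cycle
                    new_b[i][j] = b_reg[i - 1][j]
                acc[i][j] += new_a[i][j] * new_b[i][j]  # one multiply-accumulate per PE per cycle
        a_reg, b_reg = new_a, new_b  # latch operands for forwarding in the next cycle
    return acc
```

For a 2 x 2 example, `systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19.0, 22.0], [43.0, 50.0]]` after n + m + K - 2 = 4 cycles.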

research-article

Open Access

SLO-Aware Inference Scheduler for Heterogeneous Processors in Edge Platforms

Article No.: 43, Pages 1–26, https://doi.org/10.1145/3460352

With the proliferation of machine learning (ML) applications, edge platforms have become increasingly important for processing streaming sensor data locally without resorting to remote servers. Such edge platforms are commonly equipped with ...

research-article

Open Access

Gem5-X: A Many-core Heterogeneous Simulation Platform for Architectural Exploration and Optimization

Article No.: 44, Pages 1–27, https://doi.org/10.1145/3461662

The increasing adoption of smart systems in our daily life has led to the development of new applications with varying performance and energy constraints, and suitable computing architectures need to be developed for these new applications. In this ...

research-article

Open Access

PICO: A Presburger In-bounds Check Optimization for Compiler-based Memory Safety Instrumentations

Article No.: 45, Pages 1–27, https://doi.org/10.1145/3460434

Memory safety violations such as buffer overflows are a threat to security to this day. A common solution to ensure memory safety for C is code instrumentation. However, this often causes high execution-time overhead and is therefore rarely used in ...
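
The overhead described here comes from executing a check before every memory access. As a generic illustration of why such checks can often be reduced (not PICO's Presburger-based analysis itself), the sketch below contrasts per-access checks with two checks hoisted out of a loop whose index is an affine function of the loop counter. All function names are hypothetical, and Python lists stand in for C buffers.

```python
from typing import List


class BoundsError(Exception):
    pass


def check(idx: int, length: int) -> None:
    """Instrumented in-bounds check inserted before a memory access."""
    if not 0 <= idx < length:
        raise BoundsError(f"index {idx} out of bounds for length {length}")


def sum_stride_checked(buf: List[int], start: int, n: int, stride: int) -> int:
    # Naive instrumentation: one check per access, i.e. n checks at run time.
    total = 0
    for i in range(n):
        idx = start + i * stride
        check(idx, len(buf))
        total += buf[idx]
    return total


def sum_stride_hoisted(buf: List[int], start: int, n: int, stride: int) -> int:
    # The index start + i * stride is affine (hence monotonic) in i, so over
    # 0 <= i < n its extreme values occur at i = 0 and i = n - 1. Checking both
    # endpoints covers every access: two checks instead of n.
    if n > 0:
        check(start, len(buf))
        check(start + (n - 1) * stride, len(buf))
    total = 0
    for i in range(n):
        total += buf[start + i * stride]
    return total
```

With `buf = list(range(10))`, both functions return 12 for `start=1, n=3, stride=3` (indices 1, 4, 7). One behavioural difference is worth noting: the hoisted version rejects an out-of-bounds loop before performing any access, whereas the per-access version fails partway through, so an optimizer must ensure that this earlier failure is acceptable.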

research-article

Open Access

Low I/O Intensity-aware Partial GC Scheduling to Reduce Long-tail Latency in SSDs

Article No.: 46, Pages 1–25, https://doi.org/10.1145/3460433

This article proposes a low I/O intensity-aware scheduling scheme for garbage collection (GC) in SSDs that minimizes I/O long-tail latency to ensure I/O responsiveness. The basic idea is to assemble partial GC operations by referring to several ...

research-article

Open Access

Low-precision Logarithmic Number Systems: Beyond Base-2

Article No.: 47, Pages 1–25, https://doi.org/10.1145/3461699

Logarithmic number systems (LNS) are used to represent real numbers in many applications using a constant base raised to a fixed-point exponent, making their distribution exponential. This greatly simplifies hardware multiply, divide, and square root. LNS ...
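
The abstract's central point, that a constant base raised to a fixed-point exponent turns multiplication, division, and square root into simple exponent arithmetic, can be sketched in a few lines. The class below is a hypothetical toy format, not the article's design: `FRAC_BITS` (the number of fractional exponent bits) and the rejection of zero are assumptions, and the base is a parameter so that non-binary bases, the theme of this article, can be tried.

```python
import math
from dataclasses import dataclass

FRAC_BITS = 8  # hypothetical: fractional bits of the fixed-point exponent


@dataclass(frozen=True)
class LNS:
    sign: int          # 0 for positive, 1 for negative
    exp: int           # fixed-point exponent: real exponent = exp / 2**FRAC_BITS
    base: float = 2.0

    @staticmethod
    def encode(x: float, base: float = 2.0) -> "LNS":
        if x == 0:
            # Assumption: real LNS formats reserve a special encoding for zero.
            raise ValueError("zero is not representable in this toy format")
        e = math.log(abs(x), base)
        return LNS(int(x < 0), round(e * (1 << FRAC_BITS)), base)

    def decode(self) -> float:
        mag = self.base ** (self.exp / (1 << FRAC_BITS))
        return -mag if self.sign else mag

    def _same_base(self, other: "LNS") -> None:
        if self.base != other.base:
            raise ValueError("operands must use the same base")

    def __mul__(self, other: "LNS") -> "LNS":
        # b^e1 * b^e2 = b^(e1 + e2): multiplication is an integer addition.
        self._same_base(other)
        return LNS(self.sign ^ other.sign, self.exp + other.exp, self.base)

    def __truediv__(self, other: "LNS") -> "LNS":
        # b^e1 / b^e2 = b^(e1 - e2): division is an integer subtraction.
        self._same_base(other)
        return LNS(self.sign ^ other.sign, self.exp - other.exp, self.base)

    def sqrt(self) -> "LNS":
        if self.sign:
            raise ValueError("square root of a negative number")
        # sqrt(b^e) = b^(e/2): an arithmetic right shift (truncating odd exponents).
        return LNS(0, self.exp >> 1, self.base)
```

For instance, `LNS.encode(16.0).sqrt().decode()` yields `4.0`, since the stored exponent 4 x 2^8 = 1024 is simply halved. Addition and subtraction, which are the costly operations in LNS, are deliberately omitted.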

research-article

Open Access

Monolithically Integrating Non-Volatile Main Memory over the Last-Level Cache

Article No.: 48, Pages 1–26, https://doi.org/10.1145/3462632

Many emerging non-volatile memories are compatible with CMOS logic, potentially enabling their integration into a CPU’s die. This article investigates such monolithically integrated CPU–main memory chips. We exploit non-volatile memories employing 3D ...

research-article

Open Access

Byte-Select Compression

Article No.: 49, Pages 1–27, https://doi.org/10.1145/3462209

Cache-block compression is a highly effective technique for both reducing accesses to lower levels in the memory hierarchy (cache compression) and minimizing data transfers (link compression). While many effective cache-block compression algorithms have ...

research-article

Open Access

CIB-HIER: Centralized Input Buffer Design in Hierarchical High-radix Routers

Article No.: 50, Pages 1–21, https://doi.org/10.1145/3468062

Hierarchical organization is widely used in high-radix routers to enable efficient scaling to higher switch port counts. A general-purpose hierarchical router must be symmetrically designed with the same input buffer depth, resulting in a large amount of ...

research-article

Open Access

Domain-Specific Multi-Level IR Rewriting for GPU: The Open Earth Compiler for GPU-accelerated Climate Simulation

Article No.: 51, Pages 1–23, https://doi.org/10.1145/3469030

Most compilers have a single core intermediate representation (IR) (e.g., LLVM) sometimes complemented with vaguely defined IR-like data structures. This IR is commonly low-level and close to machine instructions. As a result, optimizations relying on ...

research-article

Open Access

System-level Early-stage Modeling and Evaluation of IVR-assisted Processor Power Delivery System

Article No.: 52, Pages 1–27, https://doi.org/10.1145/3468145

Despite being employed in numerous efforts to improve power delivery efficiency, the integrated voltage regulator (IVR) approach has yet to be evaluated rigorously and quantitatively in a full power delivery system (PDS) setting. To fulfill this need, ...

research-article

Open Access

GraphAttack: Optimizing Data Supply for Graph Applications on In-Order Multicore Architectures

Article No.: 53, Pages 1–26, https://doi.org/10.1145/3469846

Graph structures are a natural representation of important and pervasive data. While graph applications have significant parallelism, their characteristic pointer indirect loads to neighbor data hinder scalability to large datasets on multicore systems. ...

research-article

Open Access

Scenario-Aware Program Specialization for Timing Predictability

Article No.: 54, Pages 1–26, https://doi.org/10.1145/3473333

The successful application of static program analysis strongly depends on flow facts of a program such as loop bounds, control-flow constraints, and operating modes. This problem heavily affects the design of real-time systems, since static program ...

research-article

Open Access

WaFFLe: Gated Cache-Ways with Per-Core Fine-Grained DVFS for Reduced On-Chip Temperature and Leakage Consumption

Article No.: 55, Pages 1–25, https://doi.org/10.1145/3471908

Managing thermal imbalance in contemporary chip multi-processors (CMPs) is crucial in assuring functional correctness of modern mobile as well as server systems. Localized regions with high activity, e.g., register files, ALUs, FPUs, and so on, ...

research-article

Open Access

SortCache: Intelligent Cache Management for Accelerating Sparse Data Workloads

Article No.: 56, Pages 1–24, https://doi.org/10.1145/3473332

Sparse data applications have irregular access patterns that stymie modern memory architectures. Although hyper-sparse workloads have received considerable attention in the past, moderately-sparse workloads prevalent in machine learning applications, ...

research-article

Open Access

Device Hopping: Transparent Mid-Kernel Runtime Switching for Heterogeneous Systems

Article No.: 57, Pages 1–25, https://doi.org/10.1145/3471909

Existing OS techniques for homogeneous many-core systems make it simple for single and multithreaded applications to migrate between cores. Heterogeneous systems do not benefit so fully from this flexibility, and applications that cannot migrate in mid-...

research-article

Open Access

LargeGraph: An Efficient Dependency-Aware GPU-Accelerated Large-Scale Graph Processing

Article No.: 58, Pages 1–24, https://doi.org/10.1145/3477603

Many out-of-GPU-memory systems have recently been designed to support iterative processing of large-scale graphs. However, these systems still suffer from long convergence times because of inefficient propagation of active vertices’ new states along graph ...

research-article

Open Access

Spiking Neural Networks in Spintronic Computational RAM

Article No.: 59, Pages 1–21, https://doi.org/10.1145/3475963

Spiking Neural Networks (SNNs) represent a biologically inspired computation model capable of emulating neural computation in the human brain and brain-like structures. The main promise is very low energy consumption. Classic Von Neumann architecture based ...

ACM Transactions on Architecture and Code Optimization
