TACO: Vol 19, No 1

Volume 19, Issue 1, March 2022

Editor:

David Kaeli
Northeastern University, USA

Publisher:

Association for Computing Machinery
New York, NY, United States

ISSN: 1544-3566

EISSN: 1544-3973

research-article

Open Access

Locality-Aware CTA Scheduling for Gaming Applications

Article No.: 1, Pages 1–26, https://doi.org/10.1145/3477497

The compute work rasterizer or the GigaThread Engine of a modern NVIDIA GPU focuses on maximizing compute work occupancy across all streaming multiprocessors in a GPU while retaining design simplicity. In this article, we identify the operational aspects ...

research-article

Open Access

Iterative Compilation Optimization Based on Metric Learning and Collaborative Filtering

Article No.: 2, Pages 1–25, https://doi.org/10.1145/3480250

Pass selection and phase ordering are two critical compiler auto-tuning problems. Traditional heuristic methods cannot effectively address these NP-hard problems especially given the increasing number of compiler passes and diverse hardware architectures. ...

research-article

Open Access

ReuseTracker: Fast Yet Accurate Multicore Reuse Distance Analyzer

Article No.: 3, Pages 1–25, https://doi.org/10.1145/3484199

One widely used metric that measures data locality is reuse distance—the number of unique memory locations that are accessed between two consecutive accesses to a particular memory location. State-of-the-art techniques that measure reuse distance in ...
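The definition of reuse distance quoted above can be made concrete with a small sketch. This is a naive, single-threaded illustration of the metric as defined in the abstract, not the ReuseTracker technique itself; the function name `reuse_distances` and the toy trace are assumptions made for this example.

```python
from typing import Hashable, List, Optional, Sequence


def reuse_distances(trace: Sequence[Hashable]) -> List[Optional[int]]:
    """For each access, count the distinct locations touched since the
    previous access to the same location (None on a first access)."""
    last_seen = {}  # location -> index of its most recent access
    distances: List[Optional[int]] = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            # Unique locations strictly between the two consecutive accesses.
            between = set(trace[last_seen[addr] + 1 : i])
            distances.append(len(between))
        else:
            distances.append(None)
        last_seen[addr] = i
    return distances


if __name__ == "__main__":
    # Hypothetical trace of memory locations, labeled by letter.
    print(reuse_distances(["a", "b", "c", "b", "a"]))
    # prints [None, None, None, 1, 2]
```

In the toy trace, the second access to `b` has distance 1 (only `c` lies in between), and the second access to `a` has distance 2 (`b` and `c`). This version rescans the trace for every reuse, so it is quadratic in the worst case and only suitable for illustration.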

research-article

Open Access

GPU Domain Specialization via Composable On-Package Architecture

Article No.: 4, Pages 1–23, https://doi.org/10.1145/3484505

As GPUs scale their low-precision matrix math throughput to boost deep learning (DL) performance, they upset the balance between math throughput and memory system capabilities. We demonstrate that a converged GPU design trying to address diverging ...

research-article

Open Access

SMT-Based Contention-Free Task Mapping and Scheduling on 2D/3D SMART NoC with Mixed Dimension-Order Routing

Article No.: 5, Pages 1–21, https://doi.org/10.1145/3487018

SMART NoCs achieve ultra-low latency by enabling single-cycle multiple-hop transmission via bypass channels. However, contention along bypass channels can seriously degrade the performance of SMART NoCs by breaking the bypass paths. Therefore, contention-...

research-article

Open Access

Marvel: A Data-Centric Approach for Mapping Deep Learning Operators on Spatial Accelerators

Article No.: 6, Pages 1–26, https://doi.org/10.1145/3485137

A spatial accelerator’s efficiency depends heavily on both its mapper and cost models to generate optimized mappings for various operators of DNN models. However, existing cost models lack a formal boundary over their input programs (operators) for ...

research-article

Open Access

Joint Program and Layout Transformations to Enable Convolutional Operators on Specialized Hardware Based on Constraint Programming

Article No.: 7, Pages 1–26, https://doi.org/10.1145/3487922

The success of Deep Artificial Neural Networks (DNNs) in many domains created a rich body of research concerned with hardware accelerators for compute-intensive DNN operators. However, implementing such operators efficiently with complex hardware ...

research-article

Open Access

SecNVM: An Efficient and Write-Friendly Metadata Crash Consistency Scheme for Secure NVM

Article No.: 8, Pages 1–26, https://doi.org/10.1145/3488724

Data security is an indispensable part of non-volatile memory (NVM) systems. However, implementing data security efficiently on NVM is challenging, since we have to guarantee the consistency of user data and the related security metadata. Existing ...

research-article

Open Access

TLB-pilot: Mitigating TLB Contention Attack on GPUs with Microarchitecture-Aware Scheduling

Article No.: 9, Pages 1–23, https://doi.org/10.1145/3491218

Co-running GPU kernels on a single GPU can provide high system throughput and improve hardware utilization, but this raises concerns about application security. We reveal that the translation lookaside buffer (TLB) attack, one of the common attacks on CPUs, can ...

research-article

Open Access

HeapCheck: Low-cost Hardware Support for Memory Safety

Article No.: 10, Pages 1–24, https://doi.org/10.1145/3495152

Programs written in C/C++ are vulnerable to memory-safety errors like buffer-overflows and use-after-free. While several mechanisms to detect such errors have been previously proposed, they suffer from a variety of drawbacks, including poor performance, ...

research-article

Open Access

Task-RM: A Resource Manager for Energy Reduction in Task-Parallel Applications under Quality of Service Constraints

Article No.: 11, Pages 1–26, https://doi.org/10.1145/3494537

Improving energy efficiency is an important goal of computer system design. This article focuses on a general model of task-parallel applications under quality-of-service requirements on the completion time. Our technique, called Task-RM, exploits the ...

research-article

Open Access

CASHT: Contention Analysis in Shared Hierarchies with Thefts

Article No.: 12, Pages 1–27, https://doi.org/10.1145/3494538

Cache management policies should consider workloads’ contention behavior when managing a shared cache. Prior art makes estimates about shared cache behavior by adding extra logic or time to isolate per workload cache statistics. These approaches provide ...

research-article

Open Access

Optimizing Small-Sample Disk Fault Detection Based on LSTM-GAN Model

Article No.: 13, Pages 1–24, https://doi.org/10.1145/3500917

In recent years, research on disk fault detection that combines SMART data with different machine learning algorithms has proven effective. However, these methods require a large amount of data. In the early stages of the establishment ...

research-article

Open Access

E-BATCH: Energy-Efficient and High-Throughput RNN Batching

Article No.: 14, Pages 1–23, https://doi.org/10.1145/3499757

Recurrent Neural Network (RNN) inference exhibits low hardware utilization due to the strict data dependencies across time-steps. Batching multiple requests can increase throughput. However, RNN batching requires a large amount of padding since the ...

research-article

Open Access

CARL: Compiler Assigned Reference Leasing

Article No.: 15, Pages 1–28, https://doi.org/10.1145/3498730

Data movement is a common performance bottleneck, and its chief remedy is caching. Traditional cache management is transparent to the workload: data that should be kept in cache are determined by the recency information only, while the program information,...

ACM Transactions on Architecture and Code Optimization
