PERI: A Configurable Posit Enabled RISC-V Core
Owing to the breakdown of Dennard scaling, the past decade has seen rapid growth in new paradigms that exploit opportunities in computer architecture. Two technologies of interest are Posit and RISC-V. Posit was introduced in mid-2017 as a ...
MC-DeF: Creating Customized CGRAs for Dataflow Applications
Executing complex scientific applications on Coarse-Grain Reconfigurable Arrays (CGRAs) promises improvements in execution time and/or energy consumption compared to optimized software implementations or even fully customized hardware solutions. Typical ...
Acceleration of Parallel-Blocked QR Decomposition of Tall-and-Skinny Matrices on FPGAs
QR decomposition is one of the most useful factorization kernels in modern numerical linear algebra algorithms. In particular, the decomposition of tall-and-skinny matrices (TSMs) has major applications in areas including scientific computing, machine ...
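As a quick illustration of the factorization discussed above (a generic NumPy sketch, not the paper's FPGA kernel or its parallel-blocked algorithm), a tall-and-skinny matrix has far more rows than columns, and its reduced QR factorization yields a small square triangular factor:

```python
import numpy as np

# Tall-and-skinny matrix: 1000 rows, only 8 columns.
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 8))

# Reduced QR factorization: Q is 1000x8 with orthonormal columns,
# R is a small 8x8 upper-triangular matrix.
Q, R = np.linalg.qr(A)

assert np.allclose(Q @ R, A)                        # A = QR
assert np.allclose(Q.T @ Q, np.eye(8), atol=1e-10)  # orthonormal columns
assert np.allclose(R, np.triu(R))                   # R is upper triangular
```

Because R is tiny relative to A, TSM-oriented schemes can factor row blocks independently and combine their R factors, which is what makes the kernel attractive for parallel hardware.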
Decreasing the Miss Rate and Eliminating the Performance Penalty of a Data Filter Cache
While data filter caches (DFCs) have been shown to be effective at reducing data access energy, they have not been adopted in processors due to the associated performance penalty caused by high DFC miss rates. In this article, we present a design that ...
Performance Evaluation of Intel Optane Memory for Managed Workloads
Intel Optane memory offers non-volatility, byte addressability, and high capacity. It suits managed workloads that prefer large main memory heaps. We investigate Optane as the main memory for managed (Java) workloads, focusing on performance ...
GraphPEG: Accelerating Graph Processing on GPUs
Due to massive thread-level parallelism, GPUs have become an attractive platform for accelerating large-scale data parallel computations, such as graph processing. However, achieving high performance for graph processing with GPUs is non-trivial. ...
PRISM: Strong Hardware Isolation-based Soft-Error Resilient Multicore Architecture with High Performance and Availability at Low Hardware Overheads
Multicores increasingly deploy safety-critical parallel applications that demand resilience against soft errors to satisfy safety standards. However, protection against these errors is challenging due to complex communication and data access ...
PAVER: Locality Graph-Based Thread Block Scheduling for GPUs
The massive parallelism present in GPUs comes at the cost of reduced L1 and L2 cache sizes per thread, leading to serious cache contention problems such as thrashing. Hence, the data access locality of an application should be considered during thread ...
Automatic Sublining for Efficient Sparse Memory Accesses
Sparse memory accesses, which are scattered accesses to single elements of a large data structure, are a challenge for current processor architectures. Their lack of spatial and temporal locality and their irregularity make caches and traditional ...
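The access pattern described above can be pictured as a gather: single elements read from a large array at irregular positions. This is a minimal NumPy sketch of the pattern (illustrative only; the array sizes and indices are made up, and this is not the paper's sublining technique):

```python
import numpy as np

# A large data structure...
data = np.arange(1_000_000, dtype=np.int64)

# ...accessed one element at a time at irregular positions.
# Neighbouring indices are far apart, so each access likely touches
# a different cache line (no spatial locality to exploit).
idx = np.array([3, 901_234, 17, 512_000])

gathered = data[idx]  # a gather: one scattered read per index
assert gathered.tolist() == [3, 901234, 17, 512000]
```

Each gathered element pulls in a full cache line but uses only a few bytes of it, which is why such accesses waste bandwidth on conventional cache hierarchies.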
Fast Key-Value Lookups with Node Tracker
Lookup operations for in-memory databases are heavily memory bound, because they often rely on pointer-chasing traversals of linked data structures. They also have many branches that are hard to predict due to random key lookups. In this study, we show that ...
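The pointer-chasing traversal mentioned above can be sketched with a chained hash map: every lookup hops through a linked list of nodes, and each hop is a dependent load that cannot begin until the previous node arrives. This is a generic illustration of why such lookups are memory bound, not the paper's Node Tracker design:

```python
class Node:
    """One bucket entry; `next` is the pointer the lookup must chase."""
    __slots__ = ("key", "value", "next")

    def __init__(self, key, value, nxt=None):
        self.key, self.value, self.next = key, value, nxt


class ChainedHashMap:
    def __init__(self, nbuckets=8):
        self.buckets = [None] * nbuckets

    def put(self, key, value):
        i = hash(key) % len(self.buckets)
        self.buckets[i] = Node(key, value, self.buckets[i])

    def get(self, key):
        node = self.buckets[hash(key) % len(self.buckets)]
        while node is not None:      # pointer chase: each iteration is a
            if node.key == key:      # dependent load plus a data-dependent
                return node.value    # branch, both hard to predict
            node = node.next
        return None


m = ChainedHashMap()
m.put("a", 1)
m.put("b", 2)
assert m.get("a") == 1 and m.get("b") == 2
assert m.get("missing") is None
```

In hardware terms, each `node.next` dereference stalls until the previous cache miss resolves, which is the latency that node-tracking or prefetching schemes try to hide.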
CacheInspector: Reverse Engineering Cache Resources in Public Clouds
- Weijia Song,
- Christina Delimitrou,
- Zhiming Shen,
- Robbert Van Renesse,
- Hakim Weatherspoon,
- Lotfi Benmohamed,
- Frederic De Vaulx,
- Charif Mahmoudi
Infrastructure-as-a-Service cloud providers sell virtual machines that are specified only in terms of the number of CPU cores, the amount of memory, and I/O throughput. Performance-critical aspects such as cache sizes and memory latency are missing or reported ...
Understanding Cache Compression
Hardware cache compression derives from software-compression research; yet, its implementation is not a straightforward translation, since it must abide by multiple restrictions to comply with area, power, and latency constraints. This study sheds light ...
Flynn’s Reconciliation: Automating the Register Cache Idiom for Cross-accelerator Programming
A large portion of the recent performance increase in the High Performance Computing (HPC) and Machine Learning (ML) domains is fueled by accelerator cards. Many popular ML frameworks support accelerators by organizing computations as a computational ...
KernelFaRer: Replacing Native-Code Idioms with High-Performance Library Calls
- João P. L. De Carvalho,
- Braedy Kuzma,
- Ivan Korostelev,
- José Nelson Amaral,
- Christopher Barton,
- José Moreira,
- Guido Araujo
Well-crafted libraries deliver much higher performance than code generated by sophisticated application programmers using advanced optimizing compilers. When a code pattern for which a well-tuned library implementation exists is found in the source code ...
Early Address Prediction: Efficient Pipeline Prefetch and Reuse
Achieving low load-to-use latency with low energy and storage overheads is critical for performance. Existing techniques either prefetch into the pipeline (via address prediction and validation) or provide data reuse in the pipeline (via register ...