Volume 18, Issue 3, September 2021
Publisher: Association for Computing Machinery, New York, NY, United States
ISSN: 1544-3566
EISSN: 1544-3973
research-article
Open Access
PERI: A Configurable Posit Enabled RISC-V Core
Article No.: 25, Pages 1–26, https://doi.org/10.1145/3446210

Owing to the failure of Dennard scaling, the past decade has seen a steep growth of new paradigms in computer architecture. Two technologies of interest are Posit and RISC-V. Posit was introduced in mid-2017 as a ...

research-article
Open Access
MC-DeF: Creating Customized CGRAs for Dataflow Applications
Article No.: 26, Pages 1–25, https://doi.org/10.1145/3447970

Executing complex scientific applications on Coarse-Grain Reconfigurable Arrays (CGRAs) promises improvements in execution time and/or energy consumption compared to optimized software implementations or even fully customized hardware solutions. Typical ...

research-article
Open Access
Acceleration of Parallel-Blocked QR Decomposition of Tall-and-Skinny Matrices on FPGAs
Article No.: 27, Pages 1–25, https://doi.org/10.1145/3447775

QR decomposition is one of the most useful factorization kernels in modern numerical linear algebra algorithms. In particular, the decomposition of tall-and-skinny matrices (TSMs) has major applications in areas including scientific computing, machine ...
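
As an illustrative sketch of the kind of blocked factorization the title refers to (not the paper's FPGA design), the classic "TSQR" trick factors a tall-and-skinny matrix in row blocks and then combines the small R factors with one more QR. A minimal NumPy version, assuming the block height is at least the column count:

```python
import numpy as np

def tsqr(A, block_rows=4):
    """One-level TSQR sketch: QR each row block of a tall-and-skinny
    matrix, stack the small R factors, and QR them once more."""
    m, n = A.shape
    blocks = [A[i:i + block_rows] for i in range(0, m, block_rows)]
    qs, rs = zip(*(np.linalg.qr(b) for b in blocks))
    # Combine the per-block n-by-n R factors with a second QR.
    Q2, R = np.linalg.qr(np.vstack(rs))
    # Fold the combining Q back into each block's Q factor.
    Q = np.vstack([q @ Q2[i * n:(i + 1) * n] for i, q in enumerate(qs)])
    return Q, R
```

The per-block factorizations are independent, which is what makes this shape of algorithm attractive for parallel hardware.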

research-article
Open Access
Decreasing the Miss Rate and Eliminating the Performance Penalty of a Data Filter Cache
Article No.: 28, Pages 1–22, https://doi.org/10.1145/3449043

While data filter caches (DFCs) have been shown to be effective at reducing data access energy, they have not been adopted in processors due to the associated performance penalty caused by high DFC miss rates. In this article, we present a design that ...

research-article
Open Access
Performance Evaluation of Intel Optane Memory for Managed Workloads
Article No.: 29, Pages 1–26, https://doi.org/10.1145/3451342

Intel Optane memory offers non-volatility, byte addressability, and high capacity. It suits managed workloads that prefer large main memory heaps. We investigate Optane as the main memory for managed (Java) workloads, focusing on performance ...

research-article
Open Access
GraphPEG: Accelerating Graph Processing on GPUs
Article No.: 30, Pages 1–24, https://doi.org/10.1145/3450440

Due to massive thread-level parallelism, GPUs have become an attractive platform for accelerating large-scale data parallel computations, such as graph processing. However, achieving high performance for graph processing with GPUs is non-trivial. ...

research-article
Open Access
PRISM: Strong Hardware Isolation-based Soft-Error Resilient Multicore Architecture with High Performance and Availability at Low Hardware Overheads
Article No.: 31, Pages 1–25, https://doi.org/10.1145/3450523

Multicores increasingly deploy safety-critical parallel applications that demand resiliency against soft-errors to satisfy the safety standards. However, protection against these errors is challenging due to complex communication and data access ...

research-article
Open Access
PAVER: Locality Graph-Based Thread Block Scheduling for GPUs
Article No.: 32, Pages 1–26, https://doi.org/10.1145/3451164

The massive parallelism present in GPUs comes at the cost of reduced L1 and L2 cache sizes per thread, leading to serious cache contention problems such as thrashing. Hence, the data access locality of an application should be considered during thread ...

research-article
Open Access
Automatic Sublining for Efficient Sparse Memory Accesses
Article No.: 33, Pages 1–23, https://doi.org/10.1145/3452141

Sparse memory accesses, which are scattered accesses to single elements of a large data structure, are a challenge for current processor architectures. Their lack of spatial and temporal locality and their irregularity make caches and traditional ...

research-article
Open Access
Fast Key-Value Lookups with Node Tracker
Article No.: 34, Pages 1–26, https://doi.org/10.1145/3452099

Lookup operations for in-memory databases are heavily memory bound, because they often rely on pointer-chasing linked data structure traversals. They also have many branches that are hard to predict due to random key lookups. In this study, we show that ...
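
The pointer-chasing pattern the abstract describes can be sketched with a generic chained hash table (the names here are illustrative, not the paper's Node Tracker design). Each hop is a dependent load: the next address is unknown until the current node arrives, and each key comparison is a data-dependent branch.

```python
class Node:
    __slots__ = ("key", "value", "next")
    def __init__(self, key, value, nxt=None):
        self.key, self.value, self.next = key, value, nxt

def chained_lookup(buckets, key):
    """Walk one bucket's chain: every iteration dereferences a pointer
    loaded on the previous iteration, so the loads serialize."""
    node = buckets[hash(key) % len(buckets)]
    while node is not None:      # hard-to-predict branch per node
        if node.key == key:
            return node.value
        node = node.next
    return None
```

Hardware that tracks and prefetches along such chains attacks exactly this serialization.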

research-article
Open Access
CacheInspector: Reverse Engineering Cache Resources in Public Clouds
Article No.: 35, Pages 1–25, https://doi.org/10.1145/3457373

Infrastructure-as-a-Service cloud providers sell virtual machines that are only specified in terms of number of CPU cores, amount of memory, and I/O throughput. Performance-critical aspects such as cache sizes and memory latency are missing or reported ...
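
A standard building block for this kind of cache reverse engineering (sketched here generically; the helper names are assumptions, not CacheInspector's API) is a pointer chase over a random cyclic permutation: because each load depends on the previous one, the time per hop tracks the latency of whichever cache level holds the working set.

```python
import random
import time

def make_chase(n_slots):
    """Build a single random cycle over n_slots indices; the random
    order defeats hardware prefetchers."""
    perm = list(range(1, n_slots))
    random.shuffle(perm)
    chain = [0] * n_slots
    cur = 0
    for nxt in perm:
        chain[cur] = nxt
        cur = nxt
    chain[cur] = 0  # close the cycle
    return chain

def chase(chain, hops):
    """Average seconds per dependent load while walking the cycle."""
    idx = 0
    t0 = time.perf_counter()
    for _ in range(hops):
        idx = chain[idx]
    return (time.perf_counter() - t0) / hops
```

Sweeping the working-set size and watching for latency jumps reveals where each cache level ends, even when the provider does not report it.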

research-article
Open Access
Understanding Cache Compression
Article No.: 36, Pages 1–27, https://doi.org/10.1145/3457207

Hardware cache compression derives from software-compression research; yet, its implementation is not a straightforward translation, since it must abide by multiple restrictions to comply with area, power, and latency constraints. This study sheds light ...
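
One simple family the hardware-compression literature builds on is frequent-value compression: mark which words of a line equal a common value (often zero) in a bitmask and store only the rest. A minimal sketch on a 16-word line, purely illustrative of the idea rather than any scheme from the article:

```python
def compress_line(line, frequent=0):
    """Store a bitmask of words equal to `frequent` plus the others."""
    mask, rest = 0, []
    for i, w in enumerate(line):
        if w == frequent:
            mask |= 1 << i
        else:
            rest.append(w)
    return mask, rest

def decompress_line(mask, rest, length=16, frequent=0):
    """Rebuild the line from the bitmask and the stored words."""
    out, it = [], iter(rest)
    for i in range(length):
        out.append(frequent if mask >> i & 1 else next(it))
    return out
```

The hardware constraints the abstract mentions show up here as fixed-latency encode/decode and the need to fit the mask and residual words into discrete compressed-size buckets.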

research-article
Open Access
Flynn’s Reconciliation: Automating the Register Cache Idiom for Cross-accelerator Programming
Article No.: 37, Pages 1–26, https://doi.org/10.1145/3458357

A large portion of the recent performance increase in the High Performance Computing (HPC) and Machine Learning (ML) domains is fueled by accelerator cards. Many popular ML frameworks support accelerators by organizing computations as a computational ...

research-article
Open Access
KernelFaRer: Replacing Native-Code Idioms with High-Performance Library Calls
Article No.: 38, Pages 1–22, https://doi.org/10.1145/3459010

Well-crafted libraries deliver much higher performance than code generated by sophisticated application programmers using advanced optimizing compilers. When a code pattern for which a well-tuned library implementation exists is found in the source code ...
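
The canonical instance of such an idiom is a hand-written triple-loop matrix multiply, which a pattern matcher can replace with one call into a tuned GEMM library. A sketch of the before/after pair (illustrative only; this is not KernelFaRer's matcher):

```python
import numpy as np

def gemm_idiom(A, B, C):
    """The source idiom: a naive triple-loop C += A @ B over lists."""
    m, k, n = len(A), len(B), len(B[0])
    for i in range(m):
        for j in range(n):
            acc = C[i][j]
            for p in range(k):
                acc += A[i][p] * B[p][j]
            C[i][j] = acc
    return C

def gemm_library(A, B, C):
    """The replacement: one call into a tuned library routine."""
    return (np.asarray(C) + np.asarray(A) @ np.asarray(B)).tolist()
```

The two compute the same result, but the library version dispatches to cache-blocked, vectorized kernels the naive loops cannot match.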

research-article
Open Access
Early Address Prediction: Efficient Pipeline Prefetch and Reuse
Article No.: 39, Pages 1–22, https://doi.org/10.1145/3458883

Achieving low load-to-use latency with low energy and storage overheads is critical for performance. Existing techniques either prefetch into the pipeline (via address prediction and validation) or provide data reuse in the pipeline (via register ...
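
Address prediction of the kind the abstract mentions is often built from a per-instruction stride table: remember each load's last effective address and stride, and guess the next one so the access can start early and be validated later. A minimal sketch (generic, not the paper's predictor):

```python
class StridePredictor:
    """Per-PC last-address + stride table for early address prediction."""
    def __init__(self):
        self.table = {}  # pc -> (last_addr, stride)

    def predict(self, pc):
        """Guess the next effective address, or None on a cold entry."""
        if pc in self.table:
            last, stride = self.table[pc]
            return last + stride
        return None

    def update(self, pc, addr):
        """Train on the actual address once the load resolves."""
        if pc in self.table:
            last, _ = self.table[pc]
            self.table[pc] = (addr, addr - last)
        else:
            self.table[pc] = (addr, 0)
```

A mispredicted address is caught at validation time, which is where the energy and recovery trade-offs the abstract alludes to arise.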
