DOI: 10.1145/3626202.3637571
research-article

MiCache: An MSHR-inclusive Non-blocking Cache Design for FPGAs

Published: 02 April 2024

Abstract

On FPGAs, customizing data parallelism can significantly improve the performance of applications. However, many applications, such as sparse matrix multiplication, exhibit irregular memory access patterns, for which further improvements are limited by low memory access efficiency. Traditional caches struggle with such patterns because of the massive number of cache misses. To address this, prior research efforts developed non-blocking caches with Miss Status Holding Registers (MSHRs) to manage cache misses and mitigate the stalls they cause. However, existing approaches allocate dedicated Block RAMs (BRAMs) for implementing MSHRs. This introduces complexity in MSHR configuration and potential resource inefficiency, as MSHR demand is highly dynamic when solving real-world problems. In this paper, we present MiCache, an MSHR-inclusive non-blocking cache design in which cache entries and MSHR entries share the same storage space, supporting the dynamic demand for MSHRs during application execution. We design a consistent storage structure for cache and MSHR entries, ensuring a unified and efficient mechanism for cache/MSHR lookup and data access. To further improve performance, we design a parallel dual pipeline: one pipeline processes requests from processing elements, and the other processes responses from off-chip memory. We implement and evaluate our proposal on a Xilinx Alveo U280 board. Evaluation results show that, compared to the state-of-the-art non-blocking cache design on FPGAs with equivalent cache configurations, MiCache reduces BRAM consumption by up to 17%. When using the same amount of BRAM resources, MiCache achieves up to 1.56x performance improvement.
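The abstract's core idea — cache entries and MSHR entries sharing one storage structure, with separate request and response pipelines — can be illustrated with a small behavioral model. This is a hypothetical sketch under assumed semantics (direct-mapped sets, one outstanding miss per line, miss merging into the pending entry), not the authors' actual RTL design; the class and method names are invented for illustration.

```python
from collections import deque

DATA, MSHR = "data", "mshr"

class MiCacheModel:
    """Behavioral sketch of an MSHR-inclusive cache: each storage entry is
    either cached DATA or a pending MSHR for the same tag, so MSHR capacity
    grows and shrinks with demand instead of living in dedicated BRAMs."""

    def __init__(self, num_sets):
        self.num_sets = num_sets
        # One direct-mapped entry per set: None, or [state, tag, payload].
        # For DATA entries the payload is the cached value; for MSHR entries
        # it is a queue of request IDs waiting on the outstanding miss.
        self.entries = [None] * num_sets
        self.mem_requests = deque()  # addresses issued to off-chip memory

    def access(self, addr, req_id):
        """Request pipeline: returns ('hit', value) or ('miss', None)."""
        s, tag = addr % self.num_sets, addr // self.num_sets
        e = self.entries[s]
        if e is not None and e[1] == tag:
            if e[0] == DATA:               # cache hit: data already present
                return ("hit", e[2])
            e[2].append(req_id)            # secondary miss: merge into MSHR
            return ("miss", None)
        # Primary miss: the set's entry becomes an MSHR (evicting any old
        # data) and exactly one off-chip request is issued for this line.
        self.entries[s] = [MSHR, tag, deque([req_id])]
        self.mem_requests.append(addr)
        return ("miss", None)

    def fill(self, addr, value):
        """Response pipeline: convert the MSHR entry back into a DATA entry
        and release the request IDs that were waiting on the miss."""
        s, tag = addr % self.num_sets, addr // self.num_sets
        e = self.entries[s]
        assert e is not None and e[0] == MSHR and e[1] == tag
        waiters = list(e[2])
        self.entries[s] = [DATA, tag, value]
        return waiters
```

Because an entry in the MSHR state occupies the same slot the fetched line will later fill, lookup and allocation use one mechanism for both roles, which is the unified cache/MSHR structure the abstract describes; the two methods correspond loosely to the two pipelines operating in parallel.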



    Published In

    FPGA '24: Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays
    April 2024
    300 pages
    ISBN:9798400704185
    DOI:10.1145/3626202
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. field-programmable gate arrays
    2. hardware acceleration
    3. non-blocking cache

    Conference

    FPGA '24

    Acceptance Rates

    Overall Acceptance Rate 125 of 627 submissions, 20%
