DOI: 10.1145/3695794.3695795

PIM-Potential: Broadening the Acceleration Reach of PIM Architectures

Published: 11 December 2024

Abstract

Continual demand for memory bandwidth has made it worthwhile for memory vendors to reassess processing in memory (PIM), which enables higher bandwidth by placing compute units in or near memory. As such, memory vendors have recently proposed commercially viable PIM designs. While efficient PIM orchestration requires navigating new constraints across the compute stack, these constraints can often be hidden from software and microarchitecture for highly regular workloads (e.g., common machine learning, or ML, primitives). However, they are not as easy to hide for workloads that exhibit certain types of irregularity. To extend PIM's reach to a broader range of workloads, these new constraints must be navigated at all levels of the compute stack.
In this work, we analyze the capabilities and constraints of a promising new class of commercial PIM architectures, and we describe the properties that make a workload amenable to acceleration on such a system. Next, we explore how limitations of these PIM designs, such as row activation overheads, lack of reuse benefit, and limited command bandwidth, can expose novel bottlenecks for some workloads. These workloads, termed PIM-potential workloads, have properties that deviate in limited ways from the identified amenability characteristics, yet they enjoy only minor performance gains from PIM. The exposed bottlenecks motivate targeted hardware and software optimizations (eager activation, increased register storage, selective PIM command issue, and increased command bandwidth) that mitigate these performance constraints and enable PIM acceleration for a wider range of workloads. We evaluate their impact on PIM-potential primitive acceleration and demonstrate that PIM can be applied more broadly than previously described if the emergent PIM bottlenecks are addressed. We argue that emerging PIM architectures and programming models should account for these novel PIM bottlenecks and the corresponding optimizations in order to broaden the scope of PIM acceleration.
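To build intuition for the bottlenecks the abstract names, the following is a hypothetical back-of-envelope cost model (not taken from the paper; all timing parameters and the overlap factor are illustrative assumptions). It treats a PIM kernel's runtime as bounded by either host-side command issue bandwidth or row activation time, and shows how an eager-activation-style optimization, modeled here simply as overlapping most activation latency with compute, can shift which resource is the limiter:

```python
# Hypothetical back-of-envelope model (not from the paper) of how row
# activation overhead and command issue bandwidth can cap PIM speedup.
# All timing parameters below are illustrative placeholders, not real
# DRAM datasheet values.

def pim_exec_time_ns(n_commands, n_activations,
                     cmd_issue_ns=1.0,   # assumed host-side PIM command issue cost
                     t_rc_ns=45.0,       # assumed DRAM row cycle (activation) time
                     eager_activation=False):
    """Estimate PIM kernel time as the max of command-issue traffic and
    row activation work; eager activation hides most of the row overhead."""
    issue_time = n_commands * cmd_issue_ns
    act_time = n_activations * t_rc_ns
    if eager_activation:
        # assume activations are issued ahead of use, so only ~20% of the
        # activation latency remains exposed (illustrative factor)
        act_time *= 0.2
    return max(issue_time, act_time)

# Activation-bound kernel: eager activation helps until command issue
# bandwidth becomes the new limiter.
lazy = pim_exec_time_ns(n_commands=10_000, n_activations=2_000)
eager = pim_exec_time_ns(n_commands=10_000, n_activations=2_000,
                         eager_activation=True)
print(lazy, eager)  # 90000.0 18000.0
```

In this toy model the lazy case is activation-bound (90 µs vs. 10 µs of command issue), and eager activation cuts it to 18 µs; pushing further would require raising command bandwidth, mirroring the abstract's point that the optimizations address distinct, successively exposed bottlenecks.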


Information & Contributors

Information

Published In

MEMSYS '24: Proceedings of the International Symposium on Memory Systems
September 2024
289 pages
ISBN: 9798400710919
DOI: 10.1145/3695794

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article

Conference

MEMSYS '24: The International Symposium on Memory Systems
September 30 - October 3, 2024
Washington, DC, USA
