research-article

To PIM or not for emerging general purpose processing in DDR memory systems

Authors: Alexandar Devic, Siddhartha Balakrishna Rai, Anand Sivasubramaniam, Justin Eno

ISCA '22: Proceedings of the 49th Annual International Symposium on Computer Architecture

Pages 231 - 244

https://doi.org/10.1145/3470496.3527431

Published: 11 June 2022

Abstract

As Processing-In-Memory (PIM) hardware matures and makes its way into mainstream compute platforms, software has an important role to play in determining what to perform where, and when, on such heterogeneous systems. Focusing on an emerging class of PIM hardware that provisions a general purpose (RISC-V) processor at each memory bank, this paper addresses this challenging problem by developing a software compilation framework. The framework analyzes several application characteristics (parallelizability, vectorizability, data set sizes, and offload costs) to determine what, whether, when, and how to offload computations to the PIM engines. In the process, this work also proposes a vector engine extension to the bank-level RISC-V cores. Using several off-the-shelf C/C++ applications, we demonstrate that PIM is not always a panacea, and that a framework such as ours is essential for carefully selecting what needs to be performed where, when, and how. The choice of hardware platform (the number of memory banks, and the relative speeds and capabilities of the host CPU and PIM cores) can further impact the "to PIM or not" question.
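To make the "to PIM or not" question concrete, the following is a minimal sketch of an offload decision driven by the four characteristics the abstract names. It is not the paper's framework: the `Kernel` and `Platform` classes, every parameter name, the additive host-time estimate, and the Amdahl-style PIM-time estimate are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    """Hypothetical summary of one candidate code region."""
    work_ops: float           # total operations in the region
    parallel_fraction: float  # share of work that can be spread across banks (0..1)
    vector_fraction: float    # share of work that can be vectorized (0..1)
    bytes_touched: float      # data set size in bytes


@dataclass
class Platform:
    """Hypothetical platform parameters; all values are illustrative."""
    host_ops_per_s: float       # host CPU throughput
    host_bw_bytes_per_s: float  # host-to-DRAM bandwidth
    pim_core_ops_per_s: float   # scalar throughput of one bank-level core
    pim_vector_lanes: int       # lanes of the assumed vector extension
    num_banks: int              # number of banks, one PIM core each
    offload_cost_s: float       # fixed cost of launching work on the PIM cores


def host_time(k: Kernel, p: Platform) -> float:
    # Assumption: host time is compute time plus time to move the data set.
    return k.work_ops / p.host_ops_per_s + k.bytes_touched / p.host_bw_bytes_per_s


def pim_time(k: Kernel, p: Platform) -> float:
    # Assumption: data is already local to each bank, so no transfer term.
    scalar = k.work_ops * (1.0 - k.vector_fraction) / p.pim_core_ops_per_s
    vector = k.work_ops * k.vector_fraction / (p.pim_core_ops_per_s * p.pim_vector_lanes)
    one_core = scalar + vector
    # Amdahl-style split: the serial share runs on one core, the rest on all banks.
    scale = (1.0 - k.parallel_fraction) + k.parallel_fraction / p.num_banks
    return p.offload_cost_s + one_core * scale


def should_offload(k: Kernel, p: Platform) -> bool:
    """Offload only when the estimated PIM time beats the host estimate."""
    return pim_time(k, p) < host_time(k, p)


if __name__ == "__main__":
    platform = Platform(1e10, 2e10, 5e8, 4, 64, 1e-3)
    streaming = Kernel(work_ops=1e9, parallel_fraction=0.99,
                       vector_fraction=0.8, bytes_touched=4e9)
    tiny = Kernel(work_ops=1e5, parallel_fraction=0.5,
                  vector_fraction=0.0, bytes_touched=1e4)
    for name, kernel in (("streaming", streaming), ("tiny", tiny)):
        print(name, "-> PIM" if should_offload(kernel, platform) else "-> host")
```

Even in this toy model, changing `num_banks`, `offload_cost_s`, or the relative core speeds can flip the decision for the same kernel, which mirrors the abstract's point that the answer depends on both the application and the platform.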

Index Terms

To PIM or not for emerging general purpose processing in DDR memory systems
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Heterogeneous (hybrid) systems
2. Hardware
  1. Emerging technologies
    1. Analysis and design of emerging devices and systems
      1. Emerging architectures
      2. Emerging languages and compilers

Published In

ISCA '22: Proceedings of the 49th Annual International Symposium on Computer Architecture

June 2022

1097 pages

ISBN:9781450386104

DOI:10.1145/3470496

General Chairs: Valentina Salapura (Google), Mohamed Zahran (New York University)

Program Chairs: Fred Chong (The University of Chicago), Lingjia Tang (The University of Michigan)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

In-Cooperation

IEEE CS TCCA: IEEE Computer Society Technical Committee on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ISCA '22: The 49th Annual International Symposium on Computer Architecture

Sponsor: SIGARCH

June 18 - 22, 2022

New York, New York

Acceptance Rates

ISCA '22 Paper Acceptance Rate 67 of 400 submissions, 17%;

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Bibliometrics

Article Metrics

Total Citations: 21

Total Downloads: 3,542

Downloads (Last 12 months): 1,077

Downloads (Last 6 weeks): 98

Reflects downloads up to 31 Dec 2024

Cited By

Friesel B, Dreimann M, Spinczyk O. (2024). Performance Models for Task-based Scheduling with Disruptive Memory Technologies. Proceedings of the 2nd Workshop on Disruptive Memory Systems, 1-8. Online publication date: 3-Nov-2024.
https://dl.acm.org/doi/10.1145/3698783.3699376
Liu J, Zhao Z, Ding Z, Brock B, Rong H, Zhang Z. (2024). UniSparse: An Intermediate Language for General Sparse Format Customization. Proceedings of the ACM on Programming Languages 8:OOPSLA1, 137-165. Online publication date: 29-Apr-2024.
https://dl.acm.org/doi/10.1145/3649816
Lee D, Hyun B, Kim T, Rhu M. (2024). PIM-MMU: A Memory Management Unit for Accelerating Data Transfers in Commercial PIM Systems. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), 627-642. Online publication date: 2-Nov-2024.
https://doi.org/10.1109/MICRO61859.2024.00053
Kakrannaya V, Rai S, Sivasubramaniam A, Zhu T. (2024). Fast and Accurate DNN Performance Estimation across Diverse Hardware Platforms. 2024 32nd International Conference on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), 1-8. Online publication date: 21-Oct-2024.
https://doi.org/10.1109/MASCOTS64422.2024.10786578
Zhao Y, Gao M, Liu F, Hu Y, Wang Z, Lin H, Li J, Xian H, Dong H, Yang T, Jing N, Liang X, Jiang L. (2024). UM-PIM: DRAM-based PIM with Uniform & Shared Memory Space. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 644-659. Online publication date: 29-Jun-2024.
https://doi.org/10.1109/ISCA59077.2024.00053
Rogers J, Soliman T, Jahre M. (2024). AIO: An Abstraction for Performance Analysis Across Diverse Accelerator Architectures. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 487-500. Online publication date: 29-Jun-2024.
https://doi.org/10.1109/ISCA59077.2024.00043
Patel N, Mamandipoor A, Nouri M, Alian M. (2024). SmartDIMM: In-Memory Acceleration of Upper Layer Protocols. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 312-329. Online publication date: 2-Mar-2024.
https://doi.org/10.1109/HPCA57654.2024.00032
Hyun B, Kim T, Lee D, Rhu M. (2024). Pathfinding Future PIM Architectures by Demystifying a Commercial PIM Technology. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 263-279. Online publication date: 2-Mar-2024.
https://doi.org/10.1109/HPCA57654.2024.00029
Oliveira G, Olgun A, Yağlıkçı A, Bostancı F, Gómez-Luna J, Ghose S, Mutlu O. (2024). MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Computing. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 186-203. Online publication date: 2-Mar-2024.
https://doi.org/10.1109/HPCA57654.2024.00024
Vieira J, Roma N, Falcao G, Tomás P. (2024). NDPmulator: Enabling Full-System Simulation for Near-Data Accelerators From Caches to DRAM. IEEE Access 12, 10349-10365. Online publication date: 2024.
https://doi.org/10.1109/ACCESS.2024.3352924