DOI: 10.1145/3470496.3527431
research-article
Public Access

To PIM or not for emerging general purpose processing in DDR memory systems

Published: 11 June 2022

Abstract

As Processing-In-Memory (PIM) hardware matures and starts making its way into mainstream computing platforms, software has an important role to play in determining what to perform where, and when, on such heterogeneous systems. Focusing on an emerging class of PIM hardware that provisions a general-purpose (RISC-V) processor at each memory bank, this paper tackles this problem by developing a software compilation framework. The framework analyzes several application characteristics (parallelizability, vectorizability, data set sizes, and offload costs) to determine what, whether, when, and how to offload computations to the PIM engines. In the process, the paper also proposes a vector engine extension to the bank-level RISC-V cores. Using several off-the-shelf C/C++ applications, we demonstrate that PIM is not always a panacea and that a framework such as ours is essential for carefully selecting what should be performed where, when, and how. The choice of hardware platform (the number of memory banks and the relative speeds and capabilities of the host CPU and PIM cores) can further affect the answer to the "to PIM or not" question.
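The abstract lists the inputs the framework weighs when deciding whether to offload: parallelizability, vectorizability, data set size, and offload cost. The Python sketch below illustrates how such factors might combine into a first-order host-versus-PIM estimate. It is not the paper's cost model: the names `Kernel`, `Platform`, `host_time`, `pim_time`, and `decide` are hypothetical, the serial portion is assumed to stay on the host, data is assumed to already reside in the banks, and every number in the example is made up for illustration.

```python
from dataclasses import dataclass, replace

# Hypothetical first-order offload model; not the cost model used in the paper.


@dataclass
class Kernel:
    """Hypothetical summary of one candidate code region."""
    ops: float            # total operations in the region
    parallel_frac: float  # share of ops that can be split across banks (0..1)
    vector_frac: float    # share of the parallel ops that vectorize (0..1)
    data_bytes: float     # data set size the region streams through
    offload_bytes: float  # bytes moved between host and PIM to offload


@dataclass
class Platform:
    """Hypothetical platform parameters; none are taken from the paper."""
    host_ops_per_s: float       # host CPU throughput on this region
    host_bw_bytes_per_s: float  # DRAM bandwidth seen by the host
    bank_ops_per_s: float       # scalar throughput of one bank-level core
    vector_speedup: float       # assumed gain of a bank vector unit over scalar
    num_banks: int              # PIM-enabled banks working in parallel
    launch_s: float             # fixed cost to start and synchronize bank cores
    xfer_bytes_per_s: float     # rate for host<->PIM argument/result transfers


def host_time(k: Kernel, p: Platform) -> float:
    # Host is limited by the slower of compute and memory bandwidth (roofline-style).
    return max(k.ops / p.host_ops_per_s, k.data_bytes / p.host_bw_bytes_per_s)


def pim_time(k: Kernel, p: Platform) -> float:
    # Assumption: the serial share runs on the host, and data already sits in the
    # banks, so each bank core is treated as compute-bound on its slice.
    serial_s = (1.0 - k.parallel_frac) * k.ops / p.host_ops_per_s
    par_ops = k.parallel_frac * k.ops
    eff_ops = par_ops * (1.0 - k.vector_frac) + par_ops * k.vector_frac / p.vector_speedup
    bank_s = eff_ops / (p.num_banks * p.bank_ops_per_s)
    offload_s = p.launch_s + k.offload_bytes / p.xfer_bytes_per_s
    return serial_s + bank_s + offload_s


def decide(k: Kernel, p: Platform) -> str:
    """Return 'PIM' if the estimated offloaded time beats the host, else 'host'."""
    return "PIM" if pim_time(k, p) < host_time(k, p) else "host"


p = Platform(host_ops_per_s=1e10, host_bw_bytes_per_s=2e10, bank_ops_per_s=2e8,
             vector_speedup=4.0, num_banks=64, launch_s=1e-5, xfer_bytes_per_s=1e9)
stream = Kernel(ops=1e9, parallel_frac=0.99, vector_frac=0.9,
                data_bytes=8e9, offload_bytes=1e6)
tiny = Kernel(ops=1e4, parallel_frac=0.5, vector_frac=0.0,
              data_bytes=4e4, offload_bytes=4e4)

print(decide(stream, p))                        # PIM
print(decide(tiny, p))                          # host
print(decide(stream, replace(p, num_banks=1)))  # host
```

Under these made-up parameters, the large streaming kernel favors PIM because its parallel, vectorizable work spreads across 64 bank cores, while the tiny kernel stays on the host because the fixed launch and transfer costs dwarf its runtime. Dropping to a single bank flips the streaming kernel back to the host, echoing the abstract's point that the number of banks and the relative speeds of host and PIM cores shape the decision.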

Information

Published In

ACM Conferences
ISCA '22: Proceedings of the 49th Annual International Symposium on Computer Architecture
June 2022
1097 pages
ISBN: 9781450386104
DOI: 10.1145/3470496
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

In-Cooperation

  • IEEE CS TCCA: IEEE Computer Society Technical Committee on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. DRAM
  2. compilers
  3. general purpose processing
  4. parallel processing
  5. processing-in-memory
  6. vector processing

Conference

ISCA '22
Acceptance Rates

ISCA '22 paper acceptance rate: 67 of 400 submissions (17%)
Overall acceptance rate: 543 of 3,203 submissions (17%)

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 1,077
  • Downloads (last 6 weeks): 98
Reflects downloads up to 31 Dec 2024.

Cited By

  • (2024) Performance Models for Task-based Scheduling with Disruptive Memory Technologies. Proceedings of the 2nd Workshop on Disruptive Memory Systems, pp. 1-8. DOI: 10.1145/3698783.3699376. Online publication date: 3-Nov-2024.
  • (2024) UniSparse: An Intermediate Language for General Sparse Format Customization. Proceedings of the ACM on Programming Languages 8, OOPSLA1, pp. 137-165. DOI: 10.1145/3649816. Online publication date: 29-Apr-2024.
  • (2024) PIM-MMU: A Memory Management Unit for Accelerating Data Transfers in Commercial PIM Systems. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 627-642. DOI: 10.1109/MICRO61859.2024.00053. Online publication date: 2-Nov-2024.
  • (2024) Fast and Accurate DNN Performance Estimation across Diverse Hardware Platforms. 2024 32nd International Conference on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), pp. 1-8. DOI: 10.1109/MASCOTS64422.2024.10786578. Online publication date: 21-Oct-2024.
  • (2024) UM-PIM: DRAM-based PIM with Uniform & Shared Memory Space. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 644-659. DOI: 10.1109/ISCA59077.2024.00053. Online publication date: 29-Jun-2024.
  • (2024) AIO: An Abstraction for Performance Analysis Across Diverse Accelerator Architectures. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 487-500. DOI: 10.1109/ISCA59077.2024.00043. Online publication date: 29-Jun-2024.
  • (2024) SmartDIMM: In-Memory Acceleration of Upper Layer Protocols. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 312-329. DOI: 10.1109/HPCA57654.2024.00032. Online publication date: 2-Mar-2024.
  • (2024) Pathfinding Future PIM Architectures by Demystifying a Commercial PIM Technology. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 263-279. DOI: 10.1109/HPCA57654.2024.00029. Online publication date: 2-Mar-2024.
  • (2024) MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Computing. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 186-203. DOI: 10.1109/HPCA57654.2024.00024. Online publication date: 2-Mar-2024.
  • (2024) NDPmulator: Enabling Full-System Simulation for Near-Data Accelerators From Caches to DRAM. IEEE Access 12, pp. 10349-10365. DOI: 10.1109/ACCESS.2024.3352924. Online publication date: 2024.
