Compiler support for near data computing

Authors:

Mahmut Taylan Kandemir,

Mustafa Karakoy

PPoPP '21: Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Pages 90 - 104

https://doi.org/10.1145/3437801.3441600

Published: 17 February 2021

Abstract

Recent works from both hardware and software domains offer various optimizations that try to take advantage of near data computing (NDC) opportunities. While the results from these works indicate performance improvements of various magnitudes, the existing literature lacks a detailed quantification of the potential of NDC and an analysis of how well compiler optimizations tap into that potential. This paper first presents an analysis of the NDC potential when executing multithreaded applications on manycore platforms. It then presents two compiler schemes designed to take advantage of NDC. The first of these schemes tries to increase the amount of computation that can be performed in a hardware component, whereas the second strikes a balance between optimizing NDC and exploiting data reuse by being more selective about when to perform NDC (even if the opportunity presents itself) and how. The experimental results collected on a 5×5 manycore system reveal that our first and second compiler schemes improve the overall performance of our multithreaded applications by 22.5% and 25.2% on average, respectively. Furthermore, these two compiler schemes are only 6.8% and 4.1% worse, respectively, than an oracle scheme that makes the best near data computing decisions for each and every computation.

Index Terms

Compiler support for near data computing
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multicore architectures

Information & Contributors

Information

Published In

PPoPP '21: Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

February 2021

507 pages

ISBN:9781450382946

DOI:10.1145/3437801

General Chair: Jaejin Lee (Seoul National University, South Korea)
Program Chair: Erez Petrank (Technion, Israel)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

University of Pittsburgh
NSF

Conference

PPoPP '21

PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

February 27, 2021

Virtual Event, Republic of Korea

Acceptance Rates

PPoPP '21 Paper Acceptance Rate 31 of 150 submissions, 21%;

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Bibliometrics

Total Citations: 11
Total Downloads: 950
Downloads (Last 12 months): 181
Downloads (Last 6 weeks): 28

Reflects downloads up to 13 Nov 2024

Cited By

Lin J, Qu H, Ma S, Ji X, Li H, Li X, Song C, Zhang W. (2024). SongC: A Compiler for Hybrid Near-Memory and In-Memory Many-Core Architecture. IEEE Transactions on Computers 73:10 (2420-2433). Online publication date: Oct-2024.
https://doi.org/10.1109/TC.2023.3311948

Bitalebi H, Safaei F, Ebrahimi M. (2024). Nearest data processing in GPU. Sustainable Computing: Informatics and Systems 44 (101047). Online publication date: Dec-2024.
https://doi.org/10.1016/j.suscom.2024.101047

Wang Z, Liu C, Beckmann N, Nowatzki T. (2023). Affinity Alloc: Taming Not-So Near-Data Computing. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (784-799). Online publication date: 28-Oct-2023.
https://dl.acm.org/doi/10.1145/3613424.3623778

Kandemir M, Akbulut G, Choi W, Karakoy M. (2023). Architecture-Aware Currying. 2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT) (250-264). Online publication date: 21-Oct-2023.
https://doi.org/10.1109/PACT58117.2023.00029

Akbulut G, Kandemir M, Karakoy M, Choi W. (2023). Data Recomputation for Multithreaded Applications. 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD) (01-09). Online publication date: 28-Oct-2023.
https://doi.org/10.1109/ICCAD57390.2023.10323776

Maity S, Goel M, Ghose M. (2023). Data Locality Aware Computation Offloading in Near Memory Processing Architecture for Big Data Applications. 2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics (HiPC) (288-297). Online publication date: 18-Dec-2023.
https://doi.org/10.1109/HiPC58850.2023.00019

Bitalebi H, Geraeinejad V, Ebrahimi M, Sun Y, Wong D, Naghibijouybari H. (2022). Near LLC versus near main memory processing. Proceedings of the 14th Workshop on General Purpose Processing Using GPU (1-6). Online publication date: 3-Apr-2022.
https://dl.acm.org/doi/10.1145/3530390.3532726

Choe J, Crotty A, Moreshet T, Herlihy M, Bahar R, Agrawal K, Lee I. (2022). HybriDS. Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures (321-332). Online publication date: 11-Jul-2022.
https://dl.acm.org/doi/10.1145/3490148.3538591

Devic A, Rai S, Sivasubramaniam A, Akel A, Eilert S, Eno J, Salapura V, Zahran M, Chong F, Tang L. (2022). To PIM or not for emerging general purpose processing in DDR memory systems. Proceedings of the 49th Annual International Symposium on Computer Architecture (231-244). Online publication date: 18-Jun-2022.
https://dl.acm.org/doi/10.1145/3470496.3527431

Chen D, Jin H, Zheng L, Huang Y, Yao P, Gui C, Wang Q, Liu H, He H, Liao X, Zheng R. (2022). A General Offloading Approach for Near-DRAM Processing-In-Memory Architectures. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (246-257). Online publication date: May-2022.
https://doi.org/10.1109/IPDPS53621.2022.00032
