DOI: 10.1145/3422575.3422792

X-Centric: A Survey on Compute-, Memory- and Application-Centric Computer Architectures

Published: 21 March 2021

Abstract

Big Data and machine learning have constituted the multifaceted challenge of computer engineering over the past decade. The meaningful processing of vast amounts of unstructured data from a myriad of sensors and devices is already a complicated endeavor. The need to enter the extremely power- and resource-constrained pocket-size mobile domain aggravates this challenge, and computing as we know it is rapidly evolving. Data-centric in- and near-memory computing, as well as highly heterogeneous accelerator-equipped application-centric architectures, are on the rise to tackle the insatiable demand for ever more compute performance and efficiency.
To learn from these innovations, this paper surveys compute-, memory-, and application-centric architectures and related programming paradigms and analyzes prominent opportunities and challenges. The key insights from the individual domains are: 1) The high nominal processing performance of compute-centric systems is thwarted by massively decreasing data-to-task locality and increased data movement. Nevertheless, the commodity status of shared-memory programming and the presence of widespread legacy applications keep this domain alive. 2) Memory-centric designs help to mitigate the data locality wall and significantly improve power and performance efficiency. However, a memory-centric programming paradigm is still missing. 3) Heterogeneity, customization, and established ecosystems (such as those for mobile devices) enable application-centric optimization under often tight thermal, power, and resource constraints. However, a holistic SoC-level design approach is required to efficiently utilize and program the diversity of processing units in different application domains.
A one-size-fits-all architecture does not seem to be in sight because of the wide diversity of domain-specific requirements and constraints. Therefore, established ecosystems, 3D-stacked logic-enhanced memory devices, and commoditized architecture-aware programming models seem fundamental for performant and programmable future-proof computer architectures.

Cited By

  • (2024) Elastic Gateway SoC design. Vehicular Communications 45:C. DOI: 10.1016/j.vehcom.2023.100721. Online publication date: 16-May-2024
  • (2022) An Agile Tile-based Platform for Adaptive Heterogeneous Many-Core Systems. 2022 International Conference on Field-Programmable Technology (ICFPT), 1-4. DOI: 10.1109/ICFPT56656.2022.9974358. Online publication date: 5-Dec-2022
  • (2022) AGILER: An Adaptive Heterogeneous Tile-Based Many-Core Architecture for RISC-V Processors. IEEE Access 10, 43895-43913. DOI: 10.1109/ACCESS.2022.3168686. Online publication date: 2022

Published In

MEMSYS '20: Proceedings of the International Symposium on Memory Systems
September 2020
362 pages
ISBN:9781450388993
DOI:10.1145/3422575
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. application-centric
  2. architecture evolution
  3. computer architecture
  4. heterogeneous architecture
  5. memory-centric
  6. mobile device
  7. near-memory computing
  8. programming model
  9. roofline model
  10. survey

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

MEMSYS 2020
MEMSYS 2020: The International Symposium on Memory Systems
September 28 - October 1, 2020
Washington, DC, USA
