Vortex: Extending the RISC-V ISA for GPGPU and 3D-Graphics

Authors:

Krishna Praveen Yalamarthy,

Fares Elsabbagh,

Hyesoon Kim

MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture

Pages 754 - 766

https://doi.org/10.1145/3466752.3480128

Published: 17 October 2021

Abstract

The importance of open-source hardware and software has been increasing. However, despite GPUs being one of the more popular accelerators across various applications, there is very little open-source GPU infrastructure in the public domain. We argue that one of the reasons for the lack of open-source infrastructure for GPUs is rooted in the complexity of their ISA and software stacks. In this work, we first propose an ISA extension to RISC-V that supports GPGPUs and graphics. The main goal of the ISA extension proposal is to minimize the ISA changes so that the corresponding changes to the open-source ecosystem are also minimal, which makes for a sustainable development ecosystem. To demonstrate the feasibility of the minimally extended RISC-V ISA, we implemented the complete software and hardware stacks of Vortex on FPGA. Vortex is a PCIe-based soft GPU that supports OpenCL and OpenGL. Vortex can be used in a variety of applications, including machine learning, graph analytics, and graphics rendering. Vortex can scale up to 32 cores on an Altera Stratix 10 FPGA, delivering a peak performance of 25.6 GFLOPS at 200 MHz.
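
As a quick sanity check on the peak figure quoted in the abstract, the sketch below works out what that figure implies per core per cycle. It uses only the three numbers stated above (25.6 GFLOPS, 32 cores, 200 MHz). The function and constant names are illustrative, and the abstract does not say how those operations are divided among threads, lanes, or fused multiply-adds, so no such breakdown is assumed here.

```python
# Figures quoted in the abstract; nothing else is assumed.
PEAK_GFLOPS = 25.6
NUM_CORES = 32
CLOCK_MHZ = 200


def flops_per_core_per_cycle(peak_gflops, num_cores, clock_mhz):
    """Return the FLOPs each core must complete per cycle to reach the peak."""
    total_flops_per_second = peak_gflops * 1e9
    cycles_per_second = clock_mhz * 1e6
    return total_flops_per_second / (num_cores * cycles_per_second)


implied = flops_per_core_per_cycle(PEAK_GFLOPS, NUM_CORES, CLOCK_MHZ)
print(f"{implied:.1f} FLOPs per core per cycle")
```

The quotient comes to 4 floating-point operations per core per cycle, which is the per-core throughput the 32-core configuration would need to sustain at 200 MHz to reach the stated peak.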

Published In

MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture

October 2021

1322 pages

ISBN: 9781450385572

DOI: 10.1145/3466752

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

NSF CCRI
NSF CNS

Conference

MICRO '21

Sponsor:

SIGMICRO

MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture

October 18 - 22, 2021

Virtual Event, Greece

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Cited By

Gouk D, Kang S, Bae H, Ryu E, Lee S, Kim D, Jang J, Jung M. (2024). Breaking Barriers: Expanding GPU Memory with Sub-Two Digit Nanosecond Latency CXL Controller. Proceedings of the 16th ACM Workshop on Hot Topics in Storage and File Systems, 108-115. Online publication date: 8-Jul-2024.
https://dl.acm.org/doi/10.1145/3655038.3665953

Ahn C, Jeong S, Cooper L, Parnenzini N, Kim H. (2024). Comparative Analysis of Executing GPU Applications on FPGA: HLS vs. Soft GPU Approaches. 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 634-641. Online publication date: 27-May-2024.
https://doi.org/10.1109/IPDPSW63119.2024.00123

Tarasov I, Mirzoyan D, Sovietov P. (2024). Designing a Graphics Accelerator with Heterogeneous Architecture. High-Performance Computing Systems and Technologies in Scientific Research, Automation of Control and Production, 29-40. Online publication date: 26-Jan-2024.
https://doi.org/10.1007/978-3-031-51057-1_3

Kosmidis L, Solé M, Rodriguez I, Wolf J, Trompouki M. (2023). The METASAT Hardware Platform: A High-Performance Multicore, AI SIMD and GPU RISC-V Platform for On-board Processing. 2023 European Data Handling & Data Processing Conference (EDHPC), 1-6. Online publication date: 2-Oct-2023.
https://doi.org/10.23919/EDHPC59100.2023.10396370

Tine B, Saxena V, Srivatsan S, Simpson J, Alzammar F, Cooper L, Kim H, Aamodt T, Jerger N, Swift M. (2023). Skybox: Open-Source Graphic Rendering on Programmable RISC-V GPUs. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, 616-630. Online publication date: 25-Mar-2023.
https://dl.acm.org/doi/10.1145/3582016.3582024

Chen Y, Wang X, Song S, Feng L, Wang Z. (2023). RISC-V Custom Instructions of Elementary Functions for IoT Endpoint Devices. IEEE Transactions on Computers 73:2, 523-535. Online publication date: 1-Dec-2023.
https://dl.acm.org/doi/10.1109/TC.2023.3336174

Todaro G, Monopoli M, Benelli G, Zulberti L, Pacini T. (2023). Enhanced Soft GPU Architecture for FPGAs. 2023 18th Conference on Ph.D Research in Microelectronics and Electronics (PRIME), 177-180. Online publication date: 18-Jun-2023.
https://doi.org/10.1109/PRIME58259.2023.10161749

Kwon M, Jang J, Choi H, Lee S, Jung M. (2023). Failure Tolerant Training With Persistent Memory Disaggregation Over CXL. IEEE Micro 43:2, 66-75. Online publication date: 1-Mar-2023.
https://dl.acm.org/doi/10.1109/MM.2023.3237548

Chen G, Knag P, Tokunaga C, Krishnamurthy R. (2023). An Eight-Core RISC-V Processor With Compute Near Last Level Cache in Intel 4 CMOS. IEEE Journal of Solid-State Circuits 58:4, 1117-1128. Online publication date: Apr-2023.
https://doi.org/10.1109/JSSC.2022.3228765

Sarda G, Shah N, Bhattacharjee D, Debacker P, Verhelst M. (2023). Optimising GPGPU Execution Through Runtime Micro-Architecture Parameter Analysis. 2023 IEEE International Symposium on Workload Characterization (IISWC), 226-228. Online publication date: 1-Oct-2023.
https://doi.org/10.1109/IISWC59245.2023.00017
