DOI: 10.1145/3466752.3480128
research-article

Vortex: Extending the RISC-V ISA for GPGPU and 3D-Graphics

Published: 17 October 2021
    Abstract

    The importance of open-source hardware and software has been increasing. However, despite GPUs being one of the more popular accelerators across various applications, there is very little open-source GPU infrastructure in the public domain. We argue that one of the reasons for the lack of open-source infrastructure for GPUs is rooted in the complexity of their ISA and software stacks. In this work, we first propose an ISA extension to RISC-V that supports GPGPUs and graphics. The main goal of the ISA extension proposal is to minimize the ISA changes so that the corresponding changes to the open-source ecosystem are also minimal, which makes for a sustainable development ecosystem. To demonstrate the feasibility of the minimally extended RISC-V ISA, we implemented the complete software and hardware stacks of Vortex on FPGA. Vortex is a PCIe-based soft GPU that supports OpenCL and OpenGL. Vortex can be used in a variety of applications, including machine learning, graph analytics, and graphics rendering. Vortex can scale up to 32 cores on an Altera Stratix 10 FPGA, delivering a peak performance of 25.6 GFlops at 200 MHz.
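
The peak figure quoted in the abstract can be sanity-checked with simple arithmetic. The sketch below assumes that peak throughput is the product of core count, clock frequency, and floating-point operations per cycle per core. That decomposition is an assumption, not something the abstract states, and the function name is hypothetical. Under that assumption, the quoted numbers imply about 4 floating-point operations per cycle per core; how that splits across threads or fused operations is not given here.

```python
def flops_per_cycle_per_core(peak_gflops: float, cores: int, clock_mhz: float) -> float:
    """Back out per-core, per-cycle throughput from a quoted peak figure.

    Assumes peak = cores * clock * flops_per_cycle_per_core (an assumption,
    not a statement from the paper).
    """
    peak_flops = peak_gflops * 1e9        # GFLOP/s -> FLOP/s
    cycles_per_second = clock_mhz * 1e6   # MHz -> cycles/s
    return peak_flops / (cores * cycles_per_second)


# Figures quoted in the abstract: 25.6 GFlops, 32 cores, 200 MHz.
per_core = flops_per_cycle_per_core(25.6, 32, 200.0)
print(f"{per_core:.2f} FLOPs per cycle per core")
```

The same helper can be reused to compare other soft-GPU configurations, provided their peak figures are reported on the same basis.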


    Published In

    MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture
    October 2021
    1322 pages
    ISBN:9781450385572
    DOI:10.1145/3466752
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. computer graphics
    2. memory systems
    3. reconfigurable computing

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • NSF CCRI
    • NSF CNS

    Conference

    MICRO '21
    Acceptance Rates

    Overall Acceptance Rate 484 of 2,242 submissions, 22%

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 473
    • Downloads (Last 6 weeks): 31
    Reflects downloads up to 10 Aug 2024

    Citations

    Cited By

    • (2024) Breaking Barriers: Expanding GPU Memory with Sub-Two Digit Nanosecond Latency CXL Controller. Proceedings of the 16th ACM Workshop on Hot Topics in Storage and File Systems, pp. 108-115. DOI: 10.1145/3655038.3665953. Online publication date: 8-Jul-2024
    • (2024) Comparative Analysis of Executing GPU Applications on FPGA: HLS vs. Soft GPU Approaches. 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 634-641. DOI: 10.1109/IPDPSW63119.2024.00123. Online publication date: 27-May-2024
    • (2024) Designing a Graphics Accelerator with Heterogeneous Architecture. High-Performance Computing Systems and Technologies in Scientific Research, Automation of Control and Production, pp. 29-40. DOI: 10.1007/978-3-031-51057-1_3. Online publication date: 26-Jan-2024
    • (2023) The METASAT Hardware Platform: A High-Performance Multicore, AI SIMD and GPU RISC-V Platform for On-board Processing. 2023 European Data Handling & Data Processing Conference (EDHPC), pp. 1-6. DOI: 10.23919/EDHPC59100.2023.10396370. Online publication date: 2-Oct-2023
    • (2023) Skybox: Open-Source Graphic Rendering on Programmable RISC-V GPUs. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, pp. 616-630. DOI: 10.1145/3582016.3582024. Online publication date: 25-Mar-2023
    • (2023) RISC-V Custom Instructions of Elementary Functions for IoT Endpoint Devices. IEEE Transactions on Computers 73:2, pp. 523-535. DOI: 10.1109/TC.2023.3336174. Online publication date: 1-Dec-2023
    • (2023) Enhanced Soft GPU Architecture for FPGAs. 2023 18th Conference on Ph.D Research in Microelectronics and Electronics (PRIME), pp. 177-180. DOI: 10.1109/PRIME58259.2023.10161749. Online publication date: 18-Jun-2023
    • (2023) Failure Tolerant Training With Persistent Memory Disaggregation Over CXL. IEEE Micro 43:2, pp. 66-75. DOI: 10.1109/MM.2023.3237548. Online publication date: 1-Mar-2023
    • (2023) An Eight-Core RISC-V Processor With Compute Near Last Level Cache in Intel 4 CMOS. IEEE Journal of Solid-State Circuits 58:4, pp. 1117-1128. DOI: 10.1109/JSSC.2022.3228765. Online publication date: Apr-2023
    • (2023) Optimising GPGPU Execution Through Runtime Micro-Architecture Parameter Analysis. 2023 IEEE International Symposium on Workload Characterization (IISWC), pp. 226-228. DOI: 10.1109/IISWC59245.2023.00017. Online publication date: 1-Oct-2023
