DOI: 10.1145/3307650.3322247
research-article

OO-VR: NUMA friendly object-oriented VR rendering framework for future NUMA-based multi-GPU systems

Published: 22 June 2019

Abstract

With their strong computation capability, NUMA-based multi-GPU systems are promising candidates for providing sustainable and scalable performance for Virtual Reality (VR) applications and delivering an excellent user experience. However, under the single programming model the entire multi-GPU system is viewed as a single GPU, which largely ignores the data locality among VR rendering tasks during workload distribution and leads to a large number of remote memory accesses among GPU modules (GPMs). The limited inter-GPM link bandwidth (e.g., 64 GB/s for NVLink) therefore becomes the major obstacle when executing VR applications on a multi-GPU system. Through a comprehensive characterization of different parallel rendering frameworks, we observe that distributing each rendering object, together with its required data, to a GPM can reduce inter-GPM memory accesses. However, this object-level rendering still faces two major challenges in NUMA-based multi-GPU systems: (1) the high data locality between the left and right views of the same object, together with data sharing among different objects, and (2) the workload imbalance induced by software-level distribution and composition mechanisms.
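The locality observation in the paragraph above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the paper's simulator or characterization methodology: the scene, the texture sizes (in MB), and the texture-to-GPM placement are all hypothetical. It counts the texture data that must cross inter-GPM links for a two-GPM stereo workload under two distributions: a per-view split (left eye on GPM 0, right eye on GPM 1) and an object-level split that keeps both views of an object on the GPM holding most of its texture bytes. It also assumes each GPM fetches a given remote texture at most once (ideal caching).

```python
# Toy model of inter-GPM texture traffic (hypothetical inputs, not from the paper).
# scene: object -> {texture: size_mb}; home: texture -> GPM whose memory holds it.


def remote_traffic(tasks, scene, home):
    """Sum the MB that GPMs must pull over inter-GPM links.

    tasks: iterable of (gpm, object) pairs, one pair per rendered view.
    Assumes a GPM fetches a given remote texture at most once (ideal caching).
    """
    fetched = set()
    total = 0
    for gpm, obj in tasks:
        for tex, size in scene[obj].items():
            if home[tex] != gpm and (gpm, tex) not in fetched:
                fetched.add((gpm, tex))
                total += size
    return total


def view_split(scene):
    """Per-view split: left view of every object on GPM 0, right view on GPM 1."""
    return [(eye, obj) for obj in scene for eye in (0, 1)]


def object_level(scene, home, num_gpms):
    """Both views of an object go to the GPM holding most of its texture bytes."""
    tasks = []
    for obj, textures in scene.items():
        local = [0] * num_gpms
        for tex, size in textures.items():
            local[home[tex]] += size
        gpm = max(range(num_gpms), key=local.__getitem__)
        tasks += [(gpm, obj), (gpm, obj)]  # left and right views stay together
    return tasks


# Hypothetical scene: object C shares texture t1 with object A.
scene = {
    "A": {"t0": 4, "t1": 2},
    "B": {"t2": 8},
    "C": {"t1": 2, "t3": 6},
}
home = {"t0": 0, "t1": 0, "t2": 1, "t3": 1}

split_mb = remote_traffic(view_split(scene), scene, home)
object_mb = remote_traffic(object_level(scene, home, 2), scene, home)
print(split_mb, object_mb)  # 20 2
```

With these made-up inputs, the per-view split moves 20 MB across the links because both GPMs need every object's textures, while the object-level split moves 2 MB. The remaining 2 MB comes from texture `t1` being shared by objects placed on different GPMs, which corresponds to the cross-object data sharing listed as challenge (1).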
To tackle these challenges, we propose OO-VR, an object-oriented VR rendering framework that applies software and hardware co-optimization to provide a NUMA-friendly solution for VR multi-view rendering in NUMA-based multi-GPU systems. We first propose an object-oriented VR programming model that exploits the data sharing between the two views of the same object and groups objects into batches based on their texture-sharing levels. We then design an object-aware runtime batch distribution engine and a distributed hardware composition unit that balance workloads among GPMs and further improve VR rendering performance. Finally, evaluations on our VR-featured simulator show that OO-VR provides a 1.58x overall performance improvement and a 76% reduction in inter-GPM memory traffic over state-of-the-art multi-GPU systems. In addition, OO-VR provides NUMA-friendly performance scalability for future, larger multi-GPU systems in which the bandwidth asymmetry between local and remote memory keeps increasing.
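The batching and batch-distribution steps described above can be approximated with two generic heuristics. This is a hedged sketch, not OO-VR's programming model or its hardware distribution engine: batching is simplified to "objects that share any texture go into the same batch" (a union-find over objects), and distribution uses longest-processing-time (LPT) greedy scheduling, which sends the most expensive remaining batch to the least-loaded GPM. The scene, texture names, and per-object cost estimates are hypothetical.

```python
import heapq


def batch_by_texture_sharing(scene):
    """Group objects into batches: objects sharing any texture join one batch.

    Union-find over objects; scene maps object -> set of texture names.
    """
    parent = {obj: obj for obj in scene}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_user = {}  # texture -> first object seen sampling it
    for obj, textures in scene.items():
        for tex in textures:
            if tex in first_user:
                parent[find(obj)] = find(first_user[tex])  # merge the two groups
            else:
                first_user[tex] = obj

    batches = {}
    for obj in scene:
        batches.setdefault(find(obj), []).append(obj)
    return list(batches.values())


def distribute(batches, cost, num_gpms):
    """LPT greedy: each batch, largest first, goes whole to the least-loaded GPM."""

    def batch_cost(batch):
        return sum(cost[obj] for obj in batch)

    heap = [(0, gpm) for gpm in range(num_gpms)]  # (current load, gpm id)
    plan = {gpm: [] for gpm in range(num_gpms)}
    for batch in sorted(batches, key=batch_cost, reverse=True):
        load, gpm = heapq.heappop(heap)
        plan[gpm].append(batch)
        heapq.heappush(heap, (load + batch_cost(batch), gpm))
    loads = {gpm: load for load, gpm in heap}
    return plan, loads


# Hypothetical scene and per-object cost estimates (arbitrary units).
scene = {
    "A": {"t0", "t1"}, "B": {"t1"}, "C": {"t2"},
    "D": {"t3"}, "E": {"t3", "t4"}, "F": {"t5"},
}
cost = {"A": 30, "B": 10, "C": 25, "D": 5, "E": 15, "F": 20}

batches = batch_by_texture_sharing(scene)
plan, loads = distribute(batches, cost, num_gpms=2)
print(batches)  # [['A', 'B'], ['C'], ['D', 'E'], ['F']]
print(plan)     # {0: [['A', 'B'], ['F']], 1: [['C'], ['D', 'E']]}
print(sorted(loads.values()))  # [45, 60]
```

Keeping each batch whole on one GPM preserves its texture locality, but it also means coarse batches limit how evenly work can be spread (60 vs. 45 units here). That tension between locality and balance is the kind of problem the abstract's object-aware runtime distribution engine is designed to address.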


Cited By

  • (2024) GRIT: Enhancing Multi-GPU Performance with Fine-Grained Dynamic Page Placement. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 1080-1094. DOI: 10.1109/HPCA57654.2024.00085. Online publication date: 2-Mar-2024.
  • (2023) IDYLL: Enhancing Page Translation in Multi-GPUs via Light Weight PTE Invalidations. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 1163-1177. DOI: 10.1145/3613424.3614269. Online publication date: 28-Oct-2023.
  • (2023) PBVR: Physically Based Rendering in Virtual Reality. 2023 IEEE International Symposium on Workload Characterization (IISWC), 77-86. DOI: 10.1109/IISWC59245.2023.00039. Online publication date: 1-Oct-2023.


Published In

ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture
June 2019
849 pages
ISBN: 9781450366694
DOI: 10.1145/3307650
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

• IEEE-CS\DATC: IEEE Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

ISCA '19

Acceptance Rates

ISCA '19 Paper Acceptance Rate: 62 of 365 submissions, 17%
Overall Acceptance Rate: 543 of 3,203 submissions, 17%

Article Metrics

• Downloads (last 12 months): 21
• Downloads (last 6 weeks): 2
Reflects downloads up to 08 Feb 2025

Citations

• (2023) Trans-FW: Short Circuiting Page Table Walk in Multi-GPU Systems via Remote Forwarding. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 456-470. DOI: 10.1109/HPCA56546.2023.10071054. Online publication date: Feb-2023.
• (2022) Strategies for the Digitalization of Cultural Heritage. Computational Science and Its Applications – ICCSA 2022 Workshops, 486-502. DOI: 10.1007/978-3-031-10592-0_35. Online publication date: 4-Jul-2022.
• (2021) Improving Address Translation in Multi-GPUs via Sharing and Spilling aware TLB Design. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 1154-1168. DOI: 10.1145/3466752.3480083. Online publication date: 18-Oct-2021.
• (2021) Q-VR: system-level design for future mobile collaborative virtual reality. Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 587-599. DOI: 10.1145/3445814.3446715. Online publication date: 19-Apr-2021.
• (2021) CHOPIN: Scalable Graphics Rendering in Multi-GPU Systems via Parallel Image Composition. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 709-722. DOI: 10.1109/HPCA51647.2021.00065. Online publication date: Feb-2021.
