-
Adaptive Dynamic Global Illumination
Authors:
Sayantan Datta,
Negar Goli,
Jerry Zhang
Abstract:
We present an adaptive extension of a probe-based global illumination solution that enhances the response to dynamic changes in the scene while also enabling an order-of-magnitude increase in probe count. Our adaptive sampling strategy carefully places samples in regions where we detect time-varying changes in radiosity due to a change in lighting, geometry, or both. Even with a large number of probes, our technique robustly updates the irradiance and visibility cache to reflect the most up-to-date changes without stalling the overall algorithm. Our bandwidth-aware approach is largely an improvement over the original Dynamic Diffuse Global Illumination while also remaining orthogonal to recent advancements in the technique.
Submitted 12 January, 2023;
originally announced January 2023.
-
Analyzing Machine Learning Workloads Using a Detailed GPU Simulator
Authors:
Jonathan Lew,
Deval Shah,
Suchita Pati,
Shaylin Cattell,
Mengchi Zhang,
Amruth Sandhupatla,
Christopher Ng,
Negar Goli,
Matthew D. Sinclair,
Timothy G. Rogers,
Tor Aamodt
Abstract:
Most deep neural networks deployed today are trained using GPUs via high-level frameworks such as TensorFlow and PyTorch. This paper describes changes we made to the GPGPU-Sim simulator to enable it to run PyTorch by running PTX kernels included in NVIDIA's cuDNN library. We use the resulting modified simulator, which has been made available publicly with this paper, to study some simple deep learning workloads. With our changes to GPGPU-Sim's functional simulation model, we find that GPGPU-Sim's performance model running a cuDNN-enabled implementation of LeNet for MNIST reports results within 30% of real hardware. Using GPGPU-Sim's AerialVision performance analysis tool, we observe that cuDNN API calls contain many varying phases and appear to include potentially inefficient microarchitecture behaviour such as DRAM partition bank camping, at least when executed on GPGPU-Sim's current performance model.
Submitted 26 January, 2019; v1 submitted 18 November, 2018;
originally announced November 2018.
-
Modeling Deep Learning Accelerator Enabled GPUs
Authors:
Md Aamir Raihan,
Negar Goli,
Tor Aamodt
Abstract:
The efficacy of deep learning has resulted in its use in a growing number of applications. The Volta graphics processing unit (GPU) architecture from NVIDIA introduced a specialized functional unit, the "tensor core", that helps meet the growing demand for higher performance for deep learning. In this paper we study the design of the tensor cores in NVIDIA's Volta and Turing architectures. We further propose an architectural model for the tensor cores in Volta. When implemented in a GPU simulator, GPGPU-Sim, our tensor core model achieves 99.6% correlation versus an NVIDIA Titan V GPU in terms of average instructions per cycle when running tensor-core-enabled GEMM workloads. We also describe support added to enable GPGPU-Sim to run CUTLASS, an open-source CUDA C++ template library providing customizable GEMM templates that utilize tensor cores.
Submitted 20 February, 2019; v1 submitted 18 November, 2018;
originally announced November 2018.