levels of parallelism. The parallel programming uses hybrid CUDA, OpenMP and MPI (Message Passing Interface) programming [1].

A CUDA program is organized into a host program, having at least one thread running on the host Central Processing Unit, and at least one parallel kernel that executes on the Graphical Processing Unit, which is a parallel processing device. A kernel then executes a scalar sequential program on a set of parallel threads. The developer needs to arrange these threads into a grid of thread blocks.

The threads of a single block are allowed to synchronize with each other by means of barriers and have access to a fast on-chip shared memory for inter-thread communication. Threads from different blocks in the same grid can coordinate only through a shared global memory space, which is visible to all threads. CUDA requires thread blocks to be independent, meaning that a kernel should execute correctly regardless of the order in which blocks are scheduled, and regardless of whether all blocks are executed concurrently or sequentially in arbitrary order. This restriction on dependencies between the blocks of a kernel provides scalability. It also implies that the need for synchronization among threads is the fundamental consideration in decomposing parallel work into separate kernels. [3]
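As a concrete illustration of this block-level cooperation (a minimal sketch, not code from the paper; the kernel name blockSum and the 256-thread block size are assumptions), the kernel below reduces one segment of an array per block using shared memory and barrier synchronization, and communicates between blocks only through global memory:

__global__ void blockSum(const float *in, float *blockSums, int n)
{
    // One slot of fast on-chip shared memory per thread; assumes the kernel
    // is launched with exactly 256 threads per block.
    __shared__ float tile[256];

    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;      // this thread's global element
    tile[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                              // barrier: the whole block has loaded its data

    // Tree reduction inside the block; only threads of this block take part.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            tile[tid] += tile[tid + stride];
        __syncthreads();
    }

    // Blocks stay independent: their partial results meet only in global memory.
    if (tid == 0)
        blockSums[blockIdx.x] = tile[0];
}

The per-block partial sums would then be combined on the host or by a second kernel, which is exactly the kind of decomposition of parallel work into separate kernels mentioned above.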
Process Flow on CUDA
In Figure 1 we illustrate the process flow in CUDA. First, the data is copied from main memory to the GPU memory. The CPU then sends the processing instructions to the GPU, where they are executed in parallel on each core. After the execution, the results are copied back from the GPU memory to main memory.
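This flow maps directly onto the CUDA runtime API. The fragment below is a minimal, self-contained sketch of the three steps (copy in, launch, copy back); the scale kernel and its parameters are illustrative assumptions, not code from the paper:

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;                              // each thread handles one element
}

int main(void)
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *h = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float *d;
    cudaMalloc(&d, bytes);

    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);    // 1. main memory -> GPU memory
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);        // 2. CPU issues the kernel, GPU cores run it in parallel
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);    // 3. GPU memory -> main memory

    printf("h[0] = %f\n", h[0]);                        // expected: 2.000000
    cudaFree(d);
    free(h);
    return 0;
}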
CUDA model overview
CUDA, Kernels, and Threads: The CUDA programming model is a model in which both the CPU and the GPU are used. The DEVICE is the GPU, which executes many threads in parallel. The HOST is the CPU, which executes the serial portion of an application. KERNELS are the functions that run on the device. The explanation below is illustrated by Figure 2. In the CUDA programming model, the GPU is considered a device: it is a co-processor to the CPU with its own DRAM (the GPU memory), and it runs numerous threads in parallel. A kernel executes a grid, which consists of blocks. A block consists of a cluster of threads which cooperate with each other; the cooperation is done by synchronizing their execution and by sharing data through shared memory. Two threads from two different blocks cannot cooperate. Threads and blocks have their own IDs, and using these IDs each thread can decide which data to work on. This model has solved the memory addressing problem. [2]

Fig. 2. CUDA Model Overview
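As a small sketch of how those IDs are used in practice (a generic example, not code from the paper), a kernel usually combines its block ID and thread ID into a unique global index and uses it to select its data element:

__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    // blockIdx, blockDim and threadIdx are the built-in IDs described above.
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // unique index of this thread
    if (i < n)                                       // guard the last, partially filled block
        c[i] = a[i] + b[i];                          // this thread works only on element i
}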
II. APPLICATIONS OF CUDA

a) MRI:
Magnetic Resonance Imaging forms images with high spatial resolution. The object to be scanned is placed in a very strong magnetic field, which makes the spins of its nuclei align either parallel or anti-parallel to the field. Radio-frequency pulses are then applied to excite the nuclei, which emit energy as they return to their original state. The most common nucleus used in imaging is hydrogen. To increase the acquisition rate, parallel imaging and compressed sensing are applied. In MRI, gridding is used for image reconstruction. To analyse the challenges and opportunities of general-purpose GPU computing, the non-parallel Fast Fourier reconstruction algorithm called gridding was implemented by Gregerson on a GeForce 8800 GPU using Nvidia's CUDA system. The transform converts a signal from its original domain (often time or space) to the frequency domain and vice versa, and it reduces the complexity of the computation from O(n^2) to O(n log n). This increased GPU performance by roughly 40%. [4]
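The gridding implementation itself is not reproduced in the paper; as a generic illustration of offloading the Fourier transform step to the GPU, the sketch below uses NVIDIA's cuFFT library (also mentioned in the conclusion). The function name gpu_fft is an assumption.

#include <cuda_runtime.h>
#include <cufft.h>

// Generic sketch (not the gridding code cited above): a 1D complex-to-complex
// FFT executed on the GPU with cuFFT. Link with -lcufft.
void gpu_fft(cufftComplex *h_signal, int n)
{
    cufftComplex *d_signal;
    cudaMalloc(&d_signal, sizeof(cufftComplex) * n);
    cudaMemcpy(d_signal, h_signal, sizeof(cufftComplex) * n, cudaMemcpyHostToDevice);

    cufftHandle plan;
    cufftPlan1d(&plan, n, CUFFT_C2C, 1);                    // plan one n-point transform
    cufftExecC2C(plan, d_signal, d_signal, CUFFT_FORWARD);  // time/space domain -> frequency domain

    cudaMemcpy(h_signal, d_signal, sizeof(cufftComplex) * n, cudaMemcpyDeviceToHost);
    cufftDestroy(plan);
    cudaFree(d_signal);
}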
pixels in a binary image is a complete morphological description of the image. In binary images, the sets are members of the 2D integer space Z2, where each element of a set is a tuple (2D vector) whose coordinates are the (x, y) coordinates of a white (or black) pixel in the image. The same idea can be extended to greyscale images: grey-scale images can be represented as sets whose elements are in Z3, where two components are the coordinates of a pixel and the third one is its discrete intensity value.
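Written out in the set notation implied above (a restatement of the description, with f(x, y) used here as an assumed symbol for the pixel value):

    Binary image:      A = { (x, y) in Z2 : f(x, y) = 1 }
    Grey-scale image:  B = { (x, y, f(x, y)) : (x, y) in Z2 }, a subset of Z3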
A CUDA program is considered as a serial program with parallel sections. The CPU and GPU code are bound together: execution begins on the CPU with the serial code, and whenever a parallel section occurs it is executed on the GPU, with control handed back to the CPU when the parallel code completes execution. The NVIDIA CUDA compiler (NVCC) separates the two different codes (device and host) and then compiles them independently of each other. The host code is written as plain C and is compiled with an ordinary C compiler. The device code is not quite the same as the host code; it is programmed in the same language with a few small extensions and is compiled with "nvcc". The functions that execute on the GPU in a data-parallel fashion are called "kernels". [10]
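A minimal sketch of this host/device split in a single source file (a generic example, not taken from [10]; the file name hello.cu is an assumption):

// hello.cu -- host and device code in one file. nvcc separates them,
// hands the host part to the ordinary C/C++ compiler and compiles the
// device part (the kernel) for the GPU.
#include <cstdio>

__global__ void fillKernel(int *out)      // device code: a "kernel"
{
    out[threadIdx.x] = threadIdx.x;
}

int main()                                // host code: plain C/C++
{
    int h_out[32], *d_out;
    cudaMalloc(&d_out, sizeof(h_out));
    fillKernel<<<1, 32>>>(d_out);         // parallel section runs on the GPU
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    printf("h_out[31] = %d\n", h_out[31]);  // prints 31; control is back on the CPU
    cudaFree(d_out);
    return 0;
}

// Build and run:  nvcc -o hello hello.cu && ./hello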
h) CUDA based parallel wavelet algorithm in medical image fusion:
In their work, IMSNA (2013) considered the CUDA technology for medical image fusion using a wavelet algorithm as follows. At present, mainly CPUs and PC clusters are used for medical image fusion. A parallel wavelet image fusion algorithm is therefore given, based on the CUDA stream computing model. Furthermore, after a detailed analysis of wavelet image fusion, the CUDA multi-layer storage structure and the SIMD design, the data gathering and the thread tasks are decomposed. Clinical image experiments are also carried out with the help of the CUDA based wavelet image fusion algorithm. The experiments show that this algorithm can improve the speed of the CPU based one by 8-10 times, and it likewise has a linear speed-up capability. [11]
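The paper does not reproduce the algorithm, but the stream-based structure it refers to can be sketched with the CUDA runtime API as follows; the kernel fuseChunk, the simple averaging rule and the four-way chunking are illustrative assumptions, not the wavelet fusion of [11] (host buffers should be allocated with cudaMallocHost for the asynchronous copies to actually overlap):

#include <cuda_runtime.h>

// Placeholder fusion rule, one element per thread.
__global__ void fuseChunk(const float *a, const float *b, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = 0.5f * (a[i] + b[i]);
}

// Split the work across CUDA streams so that the transfers and the kernel of
// one chunk can overlap with those of the other chunks.
void fuseWithStreams(const float *h_a, const float *h_b, float *h_out,
                     float *d_a, float *d_b, float *d_out, int n)
{
    const int nStreams = 4;
    const int chunk = n / nStreams;              // assume n divisible by nStreams
    cudaStream_t s[nStreams];

    for (int k = 0; k < nStreams; ++k) {
        cudaStreamCreate(&s[k]);
        int off = k * chunk;
        size_t bytes = chunk * sizeof(float);
        cudaMemcpyAsync(d_a + off, h_a + off, bytes, cudaMemcpyHostToDevice, s[k]);
        cudaMemcpyAsync(d_b + off, h_b + off, bytes, cudaMemcpyHostToDevice, s[k]);
        fuseChunk<<<(chunk + 255) / 256, 256, 0, s[k]>>>(d_a + off, d_b + off, d_out + off, chunk);
        cudaMemcpyAsync(h_out + off, d_out + off, bytes, cudaMemcpyDeviceToHost, s[k]);
    }
    for (int k = 0; k < nStreams; ++k) {
        cudaStreamSynchronize(s[k]);             // wait for this stream's chunk to finish
        cudaStreamDestroy(s[k]);
    }
}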
i) Parallelizing Multi-Sensor Image Fusion Using CUDA:
In their work (Journal of Computational and Theoretical Nanoscience, 7(1):408-411, March 2012), the authors considered the CUDA technology as follows. Pixel-level image fusion can achieve excellent fusion quality and thereby helps improve the performance of image fusion for multiple sensors. However, there is a vast volume of data to be processed, and real-time fusion processing is expensive and, moreover, not scalable. Considering the above, a fine-grained pixel-level image fusion implementation based on CUDA (Compute Unified Device Architecture) is proposed, which is used to accelerate multi-sensor image fusion; in it, every single pixel is assigned to a single CUDA thread, so that pixel-level image fusion is parallelized by numerous CUDA threads simultaneously. The performance of the image fusion is further improved by making use of the texture memory in CUDA. Moreover, the experimental results clearly show that the proposed methods improve performance remarkably compared with an optimized CPU counterpart. [12]
j) Pyramidal image blending using CUDA framework:
In their work, NVIDIA Corporation (2010) considered the CUDA technology as follows. Pyramidal image blending is one of the important image blending techniques and is used, for example, as part of panoramic image stitching. The pyramidal blending technique uses several image pyramids that work at different resolution levels of the images. The algorithm begins by building Laplacian pyramids for the two given (input) images and a Gaussian pyramid for the mask image. Afterwards these image pyramids are combined to form a blended Laplacian pyramid, which is subsequently collapsed to obtain a seamless panorama as the output. The full procedure described above is not only computationally expensive but also time-consuming, and hence it needs to be sped up; for this reason, a GPU based pyramidal image blending algorithm is realised on NVIDIA's Compute Unified Device Architecture (CUDA). [13]
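The combination step at each pyramid level reduces to a per-pixel weighted sum of the two Laplacian levels under the corresponding Gaussian mask level; a minimal sketch (pyramid construction and collapse are not shown, and the kernel name blendLevel is an assumption):

__global__ void blendLevel(const float *lapA, const float *lapB,
                           const float *gaussMask, float *lapOut, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float w = gaussMask[i];                         // mask weight in [0, 1]
        lapOut[i] = w * lapA[i] + (1.0f - w) * lapB[i]; // blended Laplacian coefficient
    }
}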
III. CONCLUSION

CUDA is an advanced technology that has brought a revolution in the processing power of GPUs. The parallel execution of code can also be used to solve many other problems, including image processing, bioinformatics and molecular dynamics. The CUDA C platform is a software layer that grants direct access to the virtual instruction set of GPUs. CUDA integration is consistently strong, as NVIDIA provides high-quality support to the application developers who opt for CUDA acceleration. CUDA is sometimes not as easy for applications to adopt as OpenCL, because CUDA is a closed Nvidia framework and not open source like OpenCL; on the other hand, CUDA is supported by a wide variety of applications, which is a major advantage of using it. CUDA cross-develops with confidence and ease and thus helps in maintaining and using highly customized environments. It has more mature tools, which include a debugger and a profiler, as well as libraries such as CUBLAS and CUFFT. Wherever CUDA is integrated it ensures strong performance due to the high-quality NVIDIA support, and it provides the user with a problem-free environment for running jobs.

IV. REFERENCES

[1] Yang, C. T., Huang, C. L., & Lin, C. F. (2011). Hybrid CUDA, OpenMP, and MPI parallel programming on multicore GPU clusters. Computer Physics Communications, 182(1), 266-269.
[2] Yang, Z., Zhu, Y., & Pu, Y. (2008, December). Parallel image
processing based on CUDA. In Computer Science and Software
Engineering, 2008 International Conference on (Vol. 3, pp. 198-201).
IEEE.
[3] Garland, M., Le Grand, S., Nickolls, J., Anderson, J., Hardwick, J.,
Morton, S., ... & Volkov, V. (2008). Parallel computing experiences
with CUDA. IEEE micro, 28(4).
[5] Reichl, T., Passenger, J., Acosta, O., & Salvado, O. (2009, February).
Ultrasound goes GPU: real-time simulation using CUDA. In SPIE
Medical Imaging (pp. 726116-726116). International Society for Optics
and Photonics.
[6] Noël, P. B., Walczak, A. M., Xu, J., Corso, J. J., Hoffmann, K. R., &
Schafer, S. (2010). GPU-based cone beam computed tomography.
Computer methods and programs in biomedicine, 98(3), 271-277.
[7] Mahmoudi, S. A., Lecron, F., Manneback, P., Benjelloun, M., &
Mahmoudi, S. (2010, September). GPU-based segmentation of cervical
vertebra in X-ray images. In Cluster Computing Workshops and Posters,
2010 IEEE International Conference on (pp. 1-8). IEEE.
[11] Liu, J., Feld, D., Xue, Y., Garcke, J., & Soddemann, T. Multicore processors and Graphics Processing Unit accelerators for parallel retrieval of aerosol optical depth from satellite data: Implementation, performance, and energy efficiency. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, No. 5, May 201