
Acceleration of scientific computing
using graphics hardware

Graham Pullan
Whittle Lab
Engineering Department, University of Cambridge
28 May 2008

(I’ve added some notes that weren’t on the original slides to help readers of the online PDF version.)
Coming up...
• Background
• CPUs and GPUs
• GPU programming models
• An example – CFD
• Alternative devices
• Conclusions
Part 1: Background
Whittle Lab

You are here


Whittle Lab

I work here

You are here


Turbomachinery
Engine calculation

Courtesy Vicente Jerez Fidalgo, Whittle Lab
CFD basics

Body-fitted mesh

For each cell, conserve:
• mass
• momentum
• energy
and update flow properties
Approximate compute requirements
“Steady” models (no wake/blade interaction, etc):

1 blade                    0.5 Mcells    1 CPU hour
1 stage (2 blades)         1.0 Mcells    3 CPU hours
1 component (5 stages)     5.0 Mcells    20 CPU hours

“Unsteady” models (with wakes, etc):

1 component (1000 blades)  500 Mcells    0.1 M CPU hours
Engine (4000 blades)       2 Gcells      1 M CPU hours
Graham’s coding experience:
• FORTRAN
• C
• MPI
Graham’s coding experience:
• C
• MPI
Part 2: CPUs and GPUs
Moore’s Law
“The complexity for minimum component costs has
increased at a rate of roughly a factor of two per year.
Certainly over the short term this rate can be expected to
continue.”
Gordon Moore (Intel), 1965

“OK, maybe a factor of two every two years.”
Gordon Moore (Intel), 1975 [paraphrased]
Was Moore right?

Source: ftp://download.intel.com/research/silicon/Gordon_Moore_ISSCC_021003.pdf (Intel)
Was Moore right?

Source: Intel
Was Moore right?

Source: Intel
Feature size

Source: ftp://download.intel.com/research/silicon/Gordon_Moore_ISSCC_021003.pdf (Intel)
Clock speed

Source: http://www.tomshardware.com/2005/11/21/the_mother_of_all_cpu_charts_2005/index.html
Power – the Clock speed limiter?
• 1 GHz CPU requires ≈ 25 W
• 3 GHz CPU requires ≈ 100 W

“The total of electricity consumed by major search
engines in 2006 approaches 5 GW.” – Wired / AMD

Source: http://www.hotchips.org/hc19/docs/keynote2.pdf
What to do with all these transistors?
Parallel computing
Multi-core chips are either:

– Instruction parallel (Multiple Instruction, Multiple Data) – MIMD

or

– Data parallel (Single Instruction, Multiple Data) – SIMD
Today’s commodity MIMD chips: CPUs
Intel Core 2 Quad
• 4 cores
• 2.4 GHz
• 65 nm features
• 582 million transistors
• 8 MB on-chip memory
Today’s commodity SIMD chips: GPUs
NVIDIA 8800 GTX
• 128 cores
• 1.35 GHz
• 90 nm features
• 681 million transistors
• 768 MB on-board memory
CPUs vs GPUs

Source: http://www.eng.cam.ac.uk/~gp10006/research/Brandvik_Pullan_2008a_DRAFT.pdf
CPUs vs GPUs
Transistor usage:

Source: NVIDIA CUDA SDK documentation
Graphics pipeline

Source: ftp://download.nvidia.com/developer/presentations/2004/Perfect_Kitchen_Art/English_Evolution_of_GPUs.pdf
Graphics pipeline
GPUs and scientific computing

GPUs are designed to apply the same shading function
to many pixels simultaneously.

GPUs and scientific computing

GPUs are designed to apply the same function
to many data elements simultaneously.

This is what most scientific computing needs!


Part 3: Programming methods
3 Generations of GPGPU (Owens, 2008)
• Making it work at all:
– Primitive functionality and tools (graphics APIs)
– Comparisons with CPU not rigorous
• Making it work better:
– Easier to use (higher level APIs)
– Understanding of how best to do it
• Doing it right:
– Stable, portable, modular building blocks

Source: http://www.ece.ucdavis.edu/~jowens/talks/intel-santaclara-070420.pdf
GPU – Programming for graphics
Courtesy, John Owens, UC Davis

• Application specifies geometry – GPU rasterizes
• Each fragment is shaded (SIMD)
• Shading can use values from memory (textures)
• Image can be stored for re-use

Source: http://www.ece.ucdavis.edu/~jowens/talks/intel-santaclara-070420.pdf
GPGPU programming (“old-school”)

• Draw a quad
• Run a SIMD program over each fragment
• Gather is permitted from texture memory
• Resulting buffer can be stored for re-use

Courtesy, John Owens, UC Davis


NVIDIA G80 hardware implementation
• Vertex/fragment processors replaced by Unified Shaders
• Now view the GPU as a massively parallel co-processor
• Set of 16 SIMD MultiProcessors (8 cores each)

Source: http://www.ece.wisc.edu/~kati/fpga2008/fpga2008%20workshop%20-%2006%20NVIDIA%20-%20Kirk.pdf
NVIDIA G80 hardware implementation
Divide 128 cores into 16 Multiprocessors (MPs)

• Each MP has:
– Registers
– Shared memory
– Read-only constant cache
– Read-only texture cache
NVIDIA’s CUDA programming model
• G80 chip supports MANY active threads: 12,288
• Threads are lightweight:
– Little creation overhead
– “instant” switching
– Efficiency achieved through 1000s of threads
• Threads are organised into blocks (1D, 2D, 3D)
• Blocks are further organised into a grid
Kernels, grids, blocks and threads
• Organisation of threads and blocks is key abstraction
• Software:
– Threads from one block may cooperate:
• Using data in shared memory
• Through synchronising
• Hardware:
– A block runs on one MP
– Hardware free to schedule any block on any MP
– More than one block can reside on one MP
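
To make the software/hardware mapping concrete, here is a minimal sketch (not from the original slides) of a kernel whose threads cooperate within one block through shared memory and synchronisation – a block-wise sum reduction, assuming 256 threads per block and an input length that is a multiple of the block size:

__global__ void block_sum(const float *in, float *out)
{
  /* one partial sum per block, accumulated cooperatively in shared memory */
  __shared__ float buf[256];          /* assumes blockDim.x == 256 */

  int tid = threadIdx.x;
  int gid = blockIdx.x * blockDim.x + tid;

  buf[tid] = in[gid];
  __syncthreads();                    /* make all loads visible to the whole block */

  /* tree reduction within the block */
  for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
    if (tid < stride)
      buf[tid] += buf[tid + stride];
    __syncthreads();
  }

  if (tid == 0)
    out[blockIdx.x] = buf[0];         /* one result per block */
}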
Kernels, grids, blocks and threads
CUDA implementation
• CUDA implemented as extensions to C

• CUDA programs:
– explicitly manage host and device memory:
• allocation
• transfers
– set thread blocks and grid
– launch kernels
– are compiled with the CUDA nvcc compiler
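
As an illustration of these points, here is a minimal host-side sketch (not taken from the slides; the kernel and variable names are made up): allocate on host and device, copy data across, set the grid and block dimensions, launch the kernel, copy the result back, and compile the file with nvcc.

#include <cuda_runtime.h>
#include <stdlib.h>

__global__ void scale(float *x, float s, int n)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] *= s;               /* trivial illustrative kernel */
}

int main(void)
{
  int n = 1 << 20;
  size_t bytes = n * sizeof(float);

  /* allocate memory on host and device */
  float *h_x = (float *)malloc(bytes);
  for (int i = 0; i < n; i++) h_x[i] = 1.0f;
  float *d_x;
  cudaMalloc((void **)&d_x, bytes);

  /* host -> device transfer */
  cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice);

  /* set thread blocks and grid, launch kernel */
  dim3 block(256);
  dim3 grid((n + block.x - 1) / block.x);
  scale<<<grid, block>>>(d_x, 2.0f, n);

  /* device -> host transfer */
  cudaMemcpy(h_x, d_x, bytes, cudaMemcpyDeviceToHost);

  cudaFree(d_x);
  free(h_x);
  return 0;                           /* compile with: nvcc example.cu */
}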
Part 4: An example – CFD
Distribution function

f = f(c, x, t)
c is the microscopic velocity

ρ = ∫ f dc

ρu = ∫ c f dc
u is the macroscopic velocity
Boltzmann equation
The evolution of f:

∂f/∂t + u·∇f = (∂f/∂t)collisions

Major simplification:

∂f/∂t + u·∇f = −(1/τ)(f − feq)
Lattice Boltzmann Method
Uniform mesh (lattice)

Restrict microscopic velocities to a finite set:

ρ = Σα fα        ρu = Σα fα cα
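
Combining the streaming of these discrete distributions with the BGK collision gives the standard lattice Boltzmann update (a standard result, added here as a note; it is what the Stream and Collide routines below implement):

fα(x + cα Δt, t + Δt) = fα(x, t) − (1/τ)[fα(x, t) − fαeq(x, t)]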
Macroscopic flow

For 2D, 9 velocities recover:

• Isothermal, incompressible Navier-Stokes eqns

• With viscosity: ν = (τ − 1/2) Δx²/Δt
Solution procedure

1. Evaluate macroscopic properties:

   ρ = Σα fα        ρu = Σα fα cα

2. Evaluate fαeq(ρ, u)

3. Find

   fα* = fα − (1/τ)(fα − fαeq)
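
The slides leave fαeq unspecified. For reference, a sketch of the standard D2Q9 equilibrium (the weights, lattice velocities and function name here are illustrative, not taken from the original code):

/* Standard D2Q9 equilibrium distribution (sketch).
   w[a]         : lattice weights (4/9 rest, 1/9 axis, 1/36 diagonal)
   cx[a], cy[a] : lattice velocity components                        */
static const float w[9]  = {4.f/9,  1.f/9,  1.f/9,  1.f/9,  1.f/9,
                            1.f/36, 1.f/36, 1.f/36, 1.f/36};
static const float cx[9] = {0, 1, 0, -1,  0, 1, -1, -1,  1};
static const float cy[9] = {0, 0, 1,  0, -1, 1,  1, -1, -1};

float feq(int a, float density, float vel_x, float vel_y)
{
  float cu  = cx[a]*vel_x + cy[a]*vel_y;       /* c_a . u */
  float usq = vel_x*vel_x + vel_y*vel_y;       /* |u|^2   */
  return w[a] * density * (1.0f + 3.0f*cu + 4.5f*cu*cu - 1.5f*usq);
}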
Solution procedure

Simple prescriptions at boundary nodes
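
The slides do not say which prescription is used; a common choice at solid walls is bounce-back, sketched here (the direction numbering matches the D2Q9 set above; the helper function is illustrative):

/* On-node bounce-back at a solid wall node (sketch): every distribution is
   returned along the opposite lattice direction. opp[a] is the index of the
   velocity opposite to c_a in the D2Q9 numbering used above. */
static const int opp[9] = {0, 3, 4, 1, 2, 7, 8, 5, 6};

void bounce_back_node(float *f)   /* f[0..8] at one wall node */
{
  float tmp[9];
  for (int a = 0; a < 9; a++) tmp[a] = f[a];
  for (int a = 0; a < 9; a++) f[a]   = tmp[opp[a]];
}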
CPU code: main.c
/* Memory allocation */
f0 = (float *)malloc(ni*nj*sizeof(float));
...

/* Main loop */
Stream (...args...);
Apply_BCs (...args...);
Collide (...args...);
GPU code: main.cu
/* allocate memory on host */
f0 = (float *)malloc(ni*nj*sizeof(float));

/* allocate memory on device */
cudaMallocPitch((void **)&f0_data, &pitch,
                sizeof(float)*ni, nj);
cudaMallocArray(&f0_array, &desc, ni, nj);

/* Main loop */
Stream (...args...);
Apply_BCs (...args...);
Collide (...args...);
CPU code – collide.c
for (j=0; j<nj; j++) {
  for (i=0; i<ni; i++) {
    i2d = I2D(ni,i,j);
    /* Flow properties */
    density = ... function of f’s ...
    vel_x   = ... “
    vel_y   = ... “
    /* Equilibrium f’s */
    f0eq = ... function of density, vel_x, vel_y ...
    f1eq = ... “
    /* Collisions */
    f0[i2d] = rtau1 * f0[i2d] + rtau * f0eq;
    f1[i2d] = rtau1 * f1[i2d] + rtau * f1eq;
    ...
  }
}
GPU code – collide.cu – kernel wrapper
void collide( ... args ...)
{
  /* Set thread blocks and grid */
  dim3 grid  = dim3(ni/TILE_I, nj/TILE_J);
  dim3 block = dim3(TILE_I, TILE_J);

  /* Launch kernel */
  collide_kernel<<<grid, block>>>(... args ...);
}
GPU code – collide.cu – kernel
/* Evaluate indices */
i = blockIdx.x*TILE_I + threadIdx.x;
j = blockIdx.y*TILE_J + threadIdx.y;
i2d = i + j*pitch/sizeof(float);

/* Read from device global memory */
f0now = f0_data[i2d];
f1now = f1_data[i2d];

/* Calc flow, feq, collide, as CPU code */

/* Write to device global memory */
f0_data[i2d] = rtau1 * f0now + rtau * f0eq;
f1_data[i2d] = rtau1 * f1now + rtau * f1eq;
GPU code – stream.cu – kernel wrapper
void stream( ... args ...)
{
  /* Copy linear memory to CUDA array */
  cudaMemcpy2DToArray(f1_array, 0, 0,
      (void *)f1_data, pitch, sizeof(float)*ni, nj,
      cudaMemcpyDeviceToDevice);

  /* Make CUDA array a texture */
  f1_tex.filterMode = cudaFilterModePoint;
  cudaBindTextureToArray(f1_tex, f1_array);

  /* Set threads and launch kernel */
  dim3 grid  = dim3(ni/TILE_I, nj/TILE_J);
  dim3 block = dim3(TILE_I, TILE_J);
  stream_kernel<<<grid, block>>>(... args ...);
}
GPU code – stream.cu – kernel
/* indices */
i = blockIdx.x*TILE_I + threadIdx.x;
j = blockIdx.y*TILE_J + threadIdx.y;
i2d = i + j*pitch/sizeof(float);

/* stream using texture fetches */
f1_data[i2d] = tex2D(f1_tex, (i-1), j);
f2_data[i2d] = tex2D(f2_tex, i, (j-1));
...
CPU / GPU demo
Results
• 2D Lattice Boltzmann code: 15x speedup GPU vs CPU

• Real CFD is more complex:
– more kernels
– 3D

• To improve performance, make use of shared memory
3D stencil operations
• Most CFD operations use nearest neighbour lookups
(stencil operations)

• e.g. 7 point stencil: centre point + 6 nearest neighbours

• Load data into shared memory
• Perform stencil ops
• Export results to device global memory
• Read in more data into shared memory
Stencil operations

3D sub-domain; threads in one plane

Source: http://www.eng.cam.ac.uk/~gp10006/research/Brandvik_Pullan_2008a_DRAFT.pdf
CUDA stencil kernel
__global__ void smooth_kernel(float sf, float *a_data, float *b_data)
{
  /* shared memory array */
  __shared__ float a[16][3][5];

  /* fetch first planes */
  a[i][0][k] = a_data[i0m10];
  a[i][1][k] = a_data[i000];
  a[i][2][k] = a_data[i0p10];
  __syncthreads();

  /* compute */
  b_data[i000] =
    sf1*a[i][1][k] + sfd6*(a[im1][1][k] +
    a[ip1][1][k] + a[i][0][k] +
    a[i][2][k] + a[i][1][km1] + a[i][1][kp1]);

  /* load next "j" plane and repeat ... */
Typical grid – CUDA partitioning

Each colour is assigned to a different multiprocessor.
3D results

30x speedup GPU vs CPU


Part 5: NVIDIA – the only show in town?
NVIDIA
• 4 Tesla HPC GPUs
• 500 GFLOPs peak per GPU
• 1.5 GB per GPU
AMD
• FireStream HPC GPU
• 500 GFLOPs
• 2 GB
• available?
ClearSpeed

80 GFLOPs
35 W !
IBM Cell BE

25 x 8 GFLOPs
Chip comparison (Giles 2008)

Source: http://www.cardiff.ac.uk/arcca/services/events/NovelArchitecture/Mike-Giles.pdf
Too much choice!
• Each device has
– different hardware characteristics
– different software (C extensions)
– different developer tools

• How can we write code for all SIMD devices for all
applications?
Big picture – all devices, all problems?
Forget the big picture
Tackle the dwarves!
The View from Berkeley (7 “dwarves”)
1. Dense Linear Algebra
2. Sparse Linear Algebra
3. Spectral Methods
4. N-Body Methods
5. Structured Grids
6. Unstructured Grids
7. MapReduce

Source: http://view.eecs.berkeley.edu/wiki/Main_Page
The View from Berkeley (13 dwarves?)
1. Dense Linear Algebra
2. Sparse Linear Algebra
3. Spectral Methods
4. N-Body Methods
5. Structured Grids
6. Unstructured Grids
7. MapReduce
8. Combinational Logic
9. Graph Traversal
10. Dynamic Programming
11. Backtrack and Branch-and-Bound
12. Graphical Models
13. Finite State Machines
SBLOCK (Brandvik)
• Tackle structured grid, stencil operations dwarf
• Define kernel using high level Python abstraction
• Generate kernel for a range of devices from same
definition: CPU, GPU, Cell
• Use MPI to handle multiple devices
SBLOCK kernel definition
kind = "stencil"
bpin = ["a"]
bpout = ["b"]
lookup = ((-1, 0, 0), (0, 0, 0), (1, 0, 0), (0, -1, 0),
          (0, 1, 0), (0, 0, -1), (0, 0, 1))
calc = {"lvalue": "b",
        "rvalue": """sf1*a[0][0][0] +
                     sfd6*(a[-1][0][0] + a[1][0][0] +
                           a[0][-1][0] + a[0][1][0] +
                           a[0][0][-1] + a[0][0][1])"""}
SBLOCK – CPU implementation (C)
void smooth(float sf, float *a, float *b)
{
for (k=0; k < nk; k++) {
for (j=0; j < nj; j++) {
for (i=0; i < ni; i++) {
/* compute indices i000, im100, etc */
b[i000] = sf1*a[i000] +
sfd6*(a[im100] + a[ip100] +
a[i0m10] + a[i0p10]
+ a[i00m1] + a[i00p1]);
}
}
}
}
SBLOCK – GPU implementation (CUDA)
__global__ void smooth_kernel(float sf, float *a_data, float *b_data)
{
  /* shared memory array */
  __shared__ float a[16][3][5];

  /* fetch first planes */
  a[i][0][k] = a_data[i0m10];
  a[i][1][k] = a_data[i000];
  a[i][2][k] = a_data[i0p10];
  __syncthreads();

  /* compute */
  b_data[i000] =
    sf1*a[i][1][k] + sfd6*(a[im1][1][k] +
    a[ip1][1][k] + a[i][0][k] +
    a[i][2][k] + a[i][1][km1] + a[i][1][kp1]);

  /* load next "j" plane and repeat ... */
Benefits of SBLOCK

So long as the task fits the dwarf:

• Programmer need not learn every device library
• Optimal device code is produced
• Code is future-proofed (so long as back-ends are available)
Part 6: Conclusions
Conclusions
• Many science applications fit the SIMD model
• GPUs are commodity SIMD chips
• Good speedups (10x – 100x) can be achieved

• GPGPU is evolving (Owens, UC Davis):
1. Making it work at all (graphics APIs)
2. Doing it better (high level APIs)
3. Doing it right (portable, modular building blocks)
Acknowledgements and info
• Research student: Tobias Brandvik (CUED)
• Donation of GPU hardware: NVIDIA

http://dx.doi.org/10.1109/JPROC.2008.917757
http://www.gpgpu.org
http://www.oerc.ox.ac.uk/research/many-core-and-reconfigurable-supercomputing
