Frequently Asked Questions About CUDA Programming
A:
You are using Visual Studio 2010, so you need to add the path to your project. Right-click
the project name and select Properties. Under Configuration Properties, select VC++
Directories. Append a ; to the end of Include Directories and then
add C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\common\inc\. The
common directory may also contain a lib folder, which you should add under Library
Directories.
You need to do this for each project that uses them. Alternatively, you can copy the files to your VS
directory under VC\include.
Q: General way of solving Error: Stack around the variable 'x' was corrupted
A:
There is, however, a fairly small number of things that typically cause this
problem:
Improper handling of memory:
Using the wrong type of deallocation (free for something allocated with new, etc.),
Accessing something after its memory has been freed.
Declare everything static inline, and ensure that there are no undefined
functions, and that there are no functions that call undefined functions.
Declare everything inline for Studio and extern inline for gcc. Then provide a
global version of the function in a separate file.
The downside of inlining is that it can bloat your code size if the function is called
from many places.
We often write small functions that contain only a few simple instructions.
Consider the call overhead that is paid every time such a function is called.
When a normal function call is encountered, the program saves the address of the
instruction immediately following the call, copies the argument values, jumps to the
memory location of the called function, executes its code, stores the return value,
and then jumps back to the saved address. For a very small function, this run-time
overhead can outweigh the work the function actually does.
The C++ inline function provides an alternative. With the inline keyword, the compiler
may replace the function call with the function's code itself (a process called
expansion) and then compile the result. With an inlined function, the program
does not have to jump to another location to execute the function and then jump back,
because the code of the called function is already present at the call site.
Pros:
1. It speeds up your program by avoiding function call overhead.
2. It saves the overhead of pushing and popping variables on the stack when a function is called.
3. It saves the overhead of returning from a function.
4. It increases locality of reference by making better use of the instruction cache.
5. By marking a function inline, you can put its definition in a header file (i.e. it can be included in
multiple compilation units without the linker complaining).
Cons:
1. It increases the executable size due to code expansion.
2. C++ inlining is resolved at compile time, which means that if you change the code of an inlined
function, you need to recompile all the code that uses it to make sure it is updated.
3. When used in a header, it makes the header file larger with information that users do not care about.
4. As mentioned above, it increases the executable size, which may cause thrashing in memory. More
page faults bring down your program's performance.
5. It is sometimes not useful, for example in embedded systems where a large executable is not acceptable
due to memory constraints.
When to use:
A function can be made inline as the programmer needs. Some useful recommendations are listed
below:
1. Use an inline function when performance is needed.
2. Prefer inline functions over macros.
3. Prefer to use the inline keyword outside the class, with the function definition, to hide implementation
details.
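To illustrate the recommendation to prefer inline functions over macros, here is a minimal sketch. The names SQUARE_MACRO, square, next_value, calls_via_function and calls_via_macro are hypothetical and chosen only for this example. It shows one common reason for the recommendation: a macro pastes its argument text at every use, so an argument with a side effect may run more than once, while an inline function evaluates its argument exactly once.

```cpp
// Macro version: the argument text is pasted twice, so side effects repeat.
#define SQUARE_MACRO(x) ((x) * (x))

// Inline function version: the argument is evaluated exactly once.
inline int square(int x) { return x * x; }

// Hypothetical helper with a visible side effect: it counts its own calls.
inline int next_value(int& calls) {
    ++calls;
    return 3;
}

// Returns how many times next_value ran when passed through the inline function.
inline int calls_via_function() {
    int calls = 0;
    int result = square(next_value(calls));
    (void)result;  // the result itself is not needed here
    return calls;
}

// Returns how many times next_value ran when passed through the macro.
inline int calls_via_macro() {
    int calls = 0;
    int result = SQUARE_MACRO(next_value(calls));
    (void)result;  // same value as above, but the helper ran once per paste
    return calls;
}
```

Calling calls_via_function() and calls_via_macro() from main and comparing the two counts makes the difference visible.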
Just Tested:
.value after ratingListBox.Items[i] can also work.
Q : member
type C#
A : A method that has the same name as its class is treated as a constructor, and constructors
do not have a return type.
Change either the class name or the method name.
Q : PInvokeStackImbalance
function
A : As mentioned in Dane Rose's comment, you can either use __stdcall on your C++
function or declare CallingConvention = CallingConvention.Cdecl on your DllImport.
Q : Use
A:
All CUDA API functions return an error code (or cudaSuccess if no error occurred). Any other
result is returned through a parameter passed by reference. However, plain C has no references, which is
why you have to pass the address of the variable in which you want the returned information to be
stored. Since the value being returned is itself a pointer, you need to pass a double pointer.
Another well-known function that operates on addresses for the same reason is
the scanf function. How many times have you forgotten to write the & before the variable that
you want to store the value in? ;)
int i;
scanf("%d",&i);
It is needed because the function sets the pointer. As with every output parameter in C,
you need to pass a pointer to the actual variable to be set, rather than the value itself.
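The double-pointer pattern described above can be sketched without any CUDA dependency. This is only an illustration of the calling convention, not the CUDA API: Status, kSuccess, kOutOfMemory, allocate_ints and allocate_ints_by_value are hypothetical names, and calloc stands in for a device allocation.

```cpp
#include <cstdlib>

// Hypothetical status codes, loosely modeled on an error-code return style.
enum Status { kSuccess = 0, kOutOfMemory = 1 };

// The return value carries the status, so the allocated pointer has to come
// back through an output parameter: a pointer to the caller's pointer.
Status allocate_ints(int** out, std::size_t count) {
    int* p = static_cast<int*>(std::calloc(count, sizeof(int)));
    if (p == nullptr) return kOutOfMemory;
    *out = p;  // writes into the caller's variable
    return kSuccess;
}

// Broken variant for contrast: 'out' is a local copy of the caller's pointer,
// so assigning to it never reaches the caller.
Status allocate_ints_by_value(int* out, std::size_t count) {
    out = static_cast<int*>(std::calloc(count, sizeof(int)));
    std::free(out);  // release at once; the caller could never free it
    return kSuccess;
}
```

With int* data = nullptr; a call to allocate_ints(&data, 4) leaves data pointing at the new buffer, while allocate_ints_by_value(data, 4) would leave the caller's pointer unchanged.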
In this article we will walk through the complete syntax of a CUDA kernel launch.
We all love to learn and are always curious to know everything in detail.
I was very disappointed when I could not find the complete syntax of
CUDA kernels, so I decided to spend a day searching everywhere. After
a heavy search I found the syntax, and today I am
presenting it to you, the reader.
A CUDA kernel launch takes four things inside the <<< >>> brackets.
The first argument is the grid size, followed by the block size, followed
by the size of shared memory, and finally the stream argument.
Here is the complete syntax:
Kernel_Name<<< GridSize, BlockSize, SMEMSize, Stream >>>(arguments,....);
Grid Size
In case you do not know what grid size is, read on.
Grid size is the number of blocks in a grid. On devices of compute
capability 1.x the grid can only be organized in two dimensions (X and Y).
From compute capability 2.x onwards the grid can be organized in three
dimensions (X, Y and Z).
Block Size
Blocks are organized in terms of threads, and block size is the number of threads
in a block. A thread is the smallest unit of execution in parallel programming, and so it is in CUDA.
Shared Memory (SMEMSize)
This is the size, in bytes, of the shared memory that is dynamically allocated per block
for shared variables in the CUDA kernel. It is needed because the size of dynamic
shared memory is specified at launch time rather than in the kernel source.
Streams
A stream is a sequence of operations that are performed in order on the
device.
Streams allow independent, concurrent, in-order queues of
execution. The stream argument tells the runtime in which stream the kernel will execute.
Operations in different streams can be interleaved and overlapped, which
can be used to hide data transfers between host and device.
Q : What is Stream in CUDA API
A : http://cuda-programming.blogspot.in/2013/01/cuda-streams-what-is-cudastreams.html
Stream
A stream is a sequence of operations that are performed in order on the
device.
Streams allow independent, concurrent, in-order queues of execution.
Operations in different streams can be interleaved and overlapped, which
can be used to hide data transfers between host and device.
---
Function Prototype
Creates a new asynchronous stream.
cudaError_t cudaStreamCreate (cudaStream_t * pStream)
Parameters:
pStream - Pointer to new stream identifier
Returns:
cudaSuccess, cudaErrorInvalidValue
Note that this function may also return error codes from previous,
asynchronous launches.
A:
Short answer ... it depends.
1. Statically defined local variables do not lose their value between function calls. In other
words they are global variables, but scoped to the local function they are defined in.
2. Static global variables are not visible outside of the C file they are defined in.
3. Static functions are not visible outside of the C file they are defined in.
Static member functions are functions that do not require an instance of the class, and are
called the same way you access static member variables -- with the class name rather than a
variable name. (E.g. a_class::static_function(); rather than an_instance.function();) Static member
functions can only operate on static members, as they do not belong to specific instances of a class.
Static member functions can be used to modify static member variables to keep track of their values
-- for instance, you might use a static member function if you chose to use a counter to give each
instance of a class a unique id.
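The static local variable and the unique-id counter described above can be sketched as follows. The names next_ticket, Widget, id and created are hypothetical and used only for illustration.

```cpp
// A static local variable keeps its value between calls.
int next_ticket() {
    static int ticket = 0;  // initialized once, not on every call
    return ++ticket;
}

class Widget {
public:
    // Each new instance takes the current counter value as its id.
    Widget() : id_(next_id_++) {}

    int id() const { return id_; }

    // Static member function: callable as Widget::created(), no instance needed.
    // It can only touch static members such as next_id_.
    static int created() { return next_id_; }

private:
    int id_;
    static int next_id_;  // one counter shared by all instances
};

// Definition of the static data member, placed in exactly one source file.
int Widget::next_id_ = 0;
```

Each call to next_ticket() returns the next number in sequence, and Widget::created() reports how many instances have been constructed so far.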
Ex.
int angka[] = {0};
memset
void * memset ( void * ptr, int value, size_t num );
Fill block of memory
Sets the first num bytes of the block of memory pointed by ptr to the
specified value (interpreted as an unsigned char).
Parameters
ptr
Pointer to the block of memory to fill.
value
Value to be set. The value is passed as an int, but the function fills the block
of memory using the unsigned char conversion of this value.
num
Number of bytes to be set to the value.
size_t is an unsigned integral type.
Return Value
ptr is returned.
Example
/* memset example */
#include <stdio.h>
#include <string.h>

int main ()
{
  char str[] = "almost every programmer should know memset!";
  memset (str,'-',6);
  puts (str);
  return 0;
}
Output:
------ every programmer should know memset!
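Because memset works byte by byte, using it on a wider type can give surprising values. The following sketch, with the hypothetical helper name fill_bytes_with_one, sets every byte of a 32-bit integer to 1. The result is the byte pattern 0x01 repeated four times, not the integer 1.

```cpp
#include <cstdint>
#include <cstring>

// Sets each of the four bytes of a 32-bit integer to 0x01.
// The resulting value is 0x01010101, not 1.
inline std::uint32_t fill_bytes_with_one() {
    std::uint32_t v = 0;
    std::memset(&v, 1, sizeof v);
    return v;
}
```

For this reason memset is normally used on integer arrays only with the value 0, where every byte pattern and every element value agree.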
Right Shift by 2
Shifting right by 2 moves every bit two places to the right and leaves the 2 leading bit positions blank.
Syntax :
[variable]>>[number of places]
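A minimal sketch of the shift on an 8-bit unsigned value follows. The helper name shift_right is hypothetical. For unsigned values the vacated leading positions are filled with zeros, and the number of places is assumed to be smaller than the width of int.

```cpp
#include <cstdint>

// Shifts an 8-bit unsigned value to the right.
// Example: 1011 0100 (0xB4) shifted by 2 becomes 0010 1101 (0x2D).
inline std::uint8_t shift_right(std::uint8_t value, unsigned places) {
    return static_cast<std::uint8_t>(value >> places);
}
```

Shifting an unsigned value right by n places gives the same result as integer division by 2 to the power n, so shift_right(8, 2) yields 2.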