
GPU Coder™ Support Package for

NVIDIA® GPUs
User’s Guide

R2019b
How to Contact MathWorks

Latest news: www.mathworks.com

Sales and services: www.mathworks.com/sales_and_services

User community: www.mathworks.com/matlabcentral

Technical support: www.mathworks.com/support/contact_us

Phone: 508-647-7000

The MathWorks, Inc.


1 Apple Hill Drive
Natick, MA 01760-2098
GPU Coder™ Support Package for NVIDIA® GPUs User's Guide
© COPYRIGHT 2018–2019 by The MathWorks, Inc.
The software described in this document is furnished under a license agreement. The software may be used
or copied only under the terms of the license agreement. No part of this manual may be photocopied or
reproduced in any form without prior written consent from The MathWorks, Inc.
FEDERAL ACQUISITION: This provision applies to all acquisitions of the Program and Documentation by,
for, or through the federal government of the United States. By accepting delivery of the Program or
Documentation, the government hereby agrees that this software or documentation qualifies as commercial
computer software or commercial computer software documentation as such terms are used or defined in
FAR 12.212, DFARS Part 227.72, and DFARS 252.227-7014. Accordingly, the terms and conditions of this
Agreement and only those rights specified in this Agreement, shall pertain to and govern the use,
modification, reproduction, release, performance, display, and disclosure of the Program and
Documentation by the federal government (or other entity acquiring for or through the federal government)
and shall supersede any conflicting contractual terms or conditions. If this License fails to meet the
government's needs or is inconsistent in any respect with federal procurement law, the government agrees
to return the Program and Documentation, unused, to The MathWorks, Inc.
Trademarks
MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See
www.mathworks.com/trademarks for a list of additional trademarks. Other product or brand
names may be trademarks or registered trademarks of their respective holders.
Patents
MathWorks products are protected by one or more U.S. patents. Please see
www.mathworks.com/patents for more information.
Revision History
September 2018 Online only New for Version 18.2.0 (R2018b)
November 2018 Online only Rereleased for Version 18.2.1 (R2018b)
March 2019 Online only Revised for Version 19.1.0 (R2019a)
July 2019 Online only Rereleased for Version 19.1.1 (R2019a)
August 2019 Online only Rereleased for Version 19.1.2 (R2019a)
September 2019 Online only Revised for Version 19.2.0 (R2019b)
Contents

Installation and Setup


1
Install Support Package for NVIDIA Hardware . . . . . . . . . . . . . 1-2
Install, Update, or Uninstall Support Package . . . . . . . . . . . . . 1-2

Install and Setup Prerequisites for NVIDIA Boards . . . . . . . . . 1-4


Target Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-4
Development Host Requirements . . . . . . . . . . . . . . . . . . . . . . 1-6

Deployment
2
Build and Run an Executable on NVIDIA Hardware . . . . . . . . . 2-2
Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2
Tutorial Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2
Example: Vector Addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3
Create a Live Hardware Connection Object . . . . . . . . . . . . . . . 2-3
Generate CUDA Executable Using GPU Coder . . . . . . . . . . . . 2-4
Run the Executable and Verify the Results . . . . . . . . . . . . . . . 2-7

Build and Run an Executable on NVIDIA Hardware Using GPU Coder App . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9
Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9
Tutorial Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-10
Example: Vector Addition . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-10
Custom Main File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-11
GPU Coder App . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-12
Run the Executable and Verify the Results . . . . . . . . . . . . . . 2-15

Read Video Files on NVIDIA Hardware . . . . . . . . . . . . . . . . . . 2-17


Sobel Edge Detection on Video File . . . . . . . . . . . . . . . . . . . 2-17

Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-17
Create a Live Hardware Connection Object . . . . . . . . . . . . . 2-17
The videoReaderDeploy Entry-Point Function . . . . . . . . . . . . 2-18
Generate CUDA Executable Using GPU Coder . . . . . . . . . . . 2-19
Run the Executable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-20
Specifying the Video File at Runtime . . . . . . . . . . . . . . . . . . 2-21
Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-23

Stop or Restart an Executable Running on NVIDIA Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-24

Run Linux Commands on NVIDIA Hardware . . . . . . . . . . . . . . 2-26


Create a Communication Object . . . . . . . . . . . . . . . . . . . . . . 2-26
Execute System Commands on Your NVIDIA Hardware . . . . . 2-27
Run/Stop a CUDA Executable on Your NVIDIA Hardware . . . 2-28
Manipulate Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-28

Open a Secure Shell Command-Line Session with NVIDIA Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-30

Verification
3
Processor-In-The-Loop Execution from Command Line . . . . . . 3-2
Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-2
Example: The Mandelbrot Set . . . . . . . . . . . . . . . . . . . . . . . . . 3-3
Create a Live Hardware Connection Object . . . . . . . . . . . . . . . 3-5
Configure the PIL Execution . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6
Generate Code and Run PIL Execution . . . . . . . . . . . . . . . . . . 3-7
Terminate the PIL Execution Process. . . . . . . . . . . . . . . . . . . . 3-7

Processor-In-The-Loop Execution with the GPU Coder App . . . 3-9


Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-9
Example: The Mandelbrot Set . . . . . . . . . . . . . . . . . . . . . . . . 3-10
GPU Coder App . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-12

Execution-Time Profiling for PIL . . . . . . . . . . . . . . . . . . . . . . . 3-17


Generate Execution-Time Profile . . . . . . . . . . . . . . . . . . . . . . 3-17
View Execution Times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-19

1

Installation and Setup

• “Install Support Package for NVIDIA Hardware” on page 1-2


• “Install and Setup Prerequisites for NVIDIA Boards” on page 1-4

Install Support Package for NVIDIA Hardware


Add support for NVIDIA DRIVE and Jetson hardware to the MATLAB® product by
installing the GPU Coder Support Package for NVIDIA GPUs. This support package
requires the base product, GPU Coder, which requires MATLAB, MATLAB Coder™, and
Parallel Computing Toolbox™. Ensure that these products are installed before installing
this support package. See “Install and Setup Prerequisites for NVIDIA Boards” on page 1-
4.

The NVIDIA DRIVE and Jetson hardware is also referred to as a board or as target
hardware.

Install, Update, or Uninstall Support Package


Install Support Package

1 On the MATLAB Home tab, in the Environment section, select Add-Ons > Get
Hardware Support Packages.


2 In the Add-On Explorer window, click the support package and then click Install.

Update Support Package

On the MATLAB Home tab, in the Environment section, select Help > Check for
Updates.

Uninstall Support Package

1 On the MATLAB Home tab, in the Environment section, click Add-Ons > Manage
Add-Ons.
2 In the Add-On Manager window, find and click the support package, and then click
Uninstall.

See Also

More About
• “Install and Setup Prerequisites for NVIDIA Boards” on page 1-4
• “Getting Started with the GPU Coder Support Package for NVIDIA GPUs”


Install and Setup Prerequisites for NVIDIA Boards

Target Requirements
Hardware

GPU Coder Support Package for NVIDIA GPUs supports the following development
platforms:

• NVIDIA Jetson AGX Xavier platform.


• NVIDIA Jetson Nano platform.
• NVIDIA Jetson TX2 embedded platform.
• NVIDIA Jetson TX1 embedded platform.
• NVIDIA DRIVE PX2 platform.

The GPU Coder Support Package for NVIDIA GPUs uses an SSH connection over TCP/IP
to execute commands while building and running the generated CUDA® code on the
DRIVE or Jetson platforms. Connect the target platform to the same network as the host
computer. Alternatively, you can use an Ethernet crossover cable to connect the board
directly to the host computer.

Software

• Use the JetPack or the DriveInstall software to install the OS image, developer tools,
and the libraries required for developing applications on the Jetson or DRIVE
platforms. You can use the Component Manager in the JetPack or the
DriveInstall software to select the components to be installed on the target
hardware. For installation instructions, refer to the NVIDIA board documentation. At a
minimum, you must install:

• CUDA toolkit.
• cuDNN library.
• TensorRT library.
• OpenCV library.
• GStreamer library (v1.0 or higher) for deployment of the VideoReader function.

The GPU Coder Support Package for NVIDIA GPUs has been tested with the following
JetPack and DRIVE SDK versions:


Hardware Platform                    Software Version
Jetson AGX Xavier, Nano, TX2/TX1     JetPack 4.2.1
DRIVE                                DRIVE SDK 5.0.3.10
• Install the Simple DirectMedia Layer (SDL v1.2) library, V4L2 library, and V4L2
utilities for running the webcam examples. You must also install the development
packages for these libraries.

For example, on Ubuntu, use the apt-get command to install these libraries.

sudo apt-get install libsdl1.2-dev


sudo apt-get install v4l-utils
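
The development packages are typically separate; for example, on Ubuntu the V4L2 development files can usually be installed with the following command (the package name libv4l-dev is an assumption and may differ on other distributions):

sudo apt-get install libv4l-dev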

Environment Variable on the Target

GPU Coder Support Package for NVIDIA GPUs uses environment variables to locate the
necessary tools, compilers, and libraries required for code generation. Ensure that the
following environment variables are set.

Variable Name        Default Value            Description
PATH                 /usr/local/cuda/bin      Path to the CUDA toolkit executables on the Jetson or DRIVE platform.
LD_LIBRARY_PATH      /usr/local/cuda/lib64    Path to the CUDA library folder on the Jetson or DRIVE platform.

Ensure that the required environment variables are accessible from non-interactive SSH
logins. For example, you can use the export command at the beginning of the
$HOME/.bashrc shell config file to add the environment variables.

Example .bashrc File


# ~/.bashrc: executed by bash(1) for non-login shells.
# see /usr/share/doc/bash/examples/startup-files (in the package bash-doc)
# for examples

# If not running interactively, don't do anything


case $- in
*i*) ;;
*)
export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
return;;
esac


# don't put duplicate lines or lines starting with space in the history.
# See bash(1) for more options
HISTCONTROL=ignoreboth

# append to the history file, don't overwrite it


shopt -s histappend

# for setting history length see HISTSIZE and HISTFILESIZE in bash(1)


HISTSIZE=1000
HISTFILESIZE=2000
.
.
.

Alternatively, you can set system-wide environment variables in the /etc/environment


file. You must have sudo privileges to edit this file.

Example /etc/environment File


PATH="/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
LD_LIBRARY_PATH="/usr/local/cuda/lib64/"

Input Devices

A webcam connected to the USB host port of the target hardware.
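
After you create a hardware connection object (see “Build and Run an Executable on NVIDIA Hardware” on page 2-2), you can confirm that the camera is detected before running the webcam examples. The following is a minimal sketch; it assumes the webcam and snapshot functions provided by this support package and the board credentials used elsewhere in this guide:

hwobj = jetson('192.168.1.15','ubuntu','ubuntu');
hwobj.webcamlist              % list the webcams detected on the board
cam = webcam(hwobj);          % connect to the first detected webcam (assumed interface)
img = snapshot(cam);          % capture one frame
image(img);                   % display the frame on the host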

Development Host Requirements


This support package requires the base product, GPU Coder. GPU Coder requires the
following MathWorks® and third-party products.

MathWorks Products

• MATLAB (required).
• MATLAB Coder (required).
• Parallel Computing Toolbox (required).
• Deep Learning Toolbox™ (required for deep learning).
• Image Processing Toolbox™ (recommended).
• Embedded Coder® (recommended).
• Simulink® (recommended).


Third-Party Products

• NVIDIA GPU enabled for CUDA.


• CUDA toolkit and driver.
• C/C++ Compiler.
• CUDA Deep Neural Network library (cuDNN).
• NVIDIA TensorRT – high performance deep learning inference optimizer and run-time
library.

For information on the version numbers for the compiler tools and libraries, see
“Installing Prerequisite Products” (GPU Coder). For information on setting up the
environment variables on the host development computer, see “Setting Up the
Prerequisite Products” (GPU Coder).

Note It is recommended to use the same versions of cuDNN and TensorRT libraries on
the target board and the host computer.
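
To confirm that the host environment is set up correctly before generating code, you can run the GPU Coder environment check. The following is a minimal sketch, assuming the coder.gpuEnvConfig and coder.checkGpuInstall interface documented for GPU Coder in this release:

envCfg = coder.gpuEnvConfig('host');  % check the host development computer
envCfg.BasicCodegen = 1;              % verify basic CUDA code generation
envCfg.DeepLibTarget = 'cudnn';       % optionally verify the cuDNN setup as well
envCfg.Quiet = 1;                     % suppress detailed output
coder.checkGpuInstall(envCfg);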

See Also

More About
• “Install Support Package for NVIDIA Hardware” on page 1-2
• “Getting Started with the GPU Coder Support Package for NVIDIA GPUs”

2

Deployment

• “Build and Run an Executable on NVIDIA Hardware” on page 2-2


• “Build and Run an Executable on NVIDIA Hardware Using GPU Coder App”
on page 2-9
• “Read Video Files on NVIDIA Hardware” on page 2-17
• “Stop or Restart an Executable Running on NVIDIA Hardware” on page 2-24
• “Run Linux Commands on NVIDIA Hardware” on page 2-26
• “Open a Secure Shell Command-Line Session with NVIDIA Hardware” on page 2-30

Build and Run an Executable on NVIDIA Hardware


In this section...
“Learning Objectives” on page 2-2
“Tutorial Prerequisites” on page 2-2
“Example: Vector Addition” on page 2-3
“Create a Live Hardware Connection Object” on page 2-3
“Generate CUDA Executable Using GPU Coder” on page 2-4
“Run the Executable and Verify the Results” on page 2-7

The GPU Coder Support Package for NVIDIA GPUs uses the GPU Coder product to
generate CUDA code (kernels) from the MATLAB algorithm. These kernels run on any
CUDA-enabled GPU platform. The support package automates the deployment of the generated CUDA code on GPU hardware platforms such as Jetson or DRIVE.

Learning Objectives
In this tutorial, you learn how to:

• Prepare your MATLAB code for CUDA code generation by using the kernelfun
pragma.
• Connect to the NVIDIA target board.
• Generate and deploy CUDA executable on the target board.
• Run the executable on the board and verify the results.

Tutorial Prerequisites
Target Board Requirements

• NVIDIA DRIVE or Jetson embedded platform.


• Ethernet crossover cable to connect the target board and host PC (if the target board
cannot be connected to a local network).
• NVIDIA CUDA toolkit installed on the board.
• Environment variables on the target for the compilers and libraries. For information
on the supported versions of the compilers and libraries and their setup, see “Install
and Setup Prerequisites for NVIDIA Boards” on page 1-4.


Development Host Requirements

• GPU Coder for code generation. For an overview and tutorials, see the “Getting
Started with GPU Coder” (GPU Coder) page
• NVIDIA CUDA toolkit on the host.
• Environment variables on the host for the compilers and libraries. For information on
the supported versions of the compilers and libraries, see “Third-party Products” (GPU
Coder). For setting up the environment variables, see “Environment Variables” (GPU
Coder).

Example: Vector Addition


This tutorial uses a simple vector addition example to demonstrate the build and
deployment workflow on NVIDIA GPUs. Create a MATLAB function myAdd.m that acts as
the entry-point for code generation. Alternatively, use the files in the “Getting Started
with the GPU Coder Support Package for NVIDIA GPUs” example for this tutorial. The
easiest way to create CUDA code for this function is to place the coder.gpu.kernelfun
pragma in the function. When GPU Coder encounters the kernelfun pragma, it attempts to parallelize all the computation within this function and then maps it to the GPU.

function out = myAdd(inp1,inp2) %#codegen


coder.gpu.kernelfun();
out = inp1 + inp2;
end
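
Before targeting the board, you can optionally check the generated CUDA code on the host by building a MEX function and comparing its output with the MATLAB result. This step is a sketch and assumes a CUDA-capable GPU on the host; it is not required for the deployment workflow described in this tutorial:

cfgMex = coder.gpuConfig('mex');                  % MEX target for host verification
codegen -config cfgMex -args {1:100,1:100} myAdd  % generates myAdd_mex
gpuOut = myAdd_mex(1:100,1:100);                  % run the generated MEX on the host GPU
refOut = myAdd(1:100,1:100);                      % reference result from MATLAB
fprintf('Maximum difference: %g\n', max(abs(gpuOut - refOut)));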

Create a Live Hardware Connection Object


The support package software uses an SSH connection over TCP/IP to execute commands
while building and running the generated CUDA code on the DRIVE or Jetson platforms.
Connect the target platform to the same network as the host computer. Alternatively, use
an Ethernet crossover cable to connect the board directly to the host computer. Refer to
the NVIDIA documentation on how to set up and configure your board.

To communicate with the NVIDIA hardware, you must create a live hardware connection
object by using the jetson or drive function. To create a live hardware connection
object, provide the host name or IP address, user name, and password of the target
board. For example, to create a live object for Jetson hardware:

hwobj = jetson('192.168.1.15','ubuntu','ubuntu');


The software performs a check of the hardware, compiler tools and libraries, and IO server installation, and gathers peripheral information on the target. This information is displayed in the command window.
Checking for CUDA availability on the Target...
Checking for NVCC in the target system path...
Checking for CUDNN library availability on the Target...
Checking for TensorRT library availability on the Target...
Checking for Prerequisite libraries is now complete.
Fetching hardware details...
Fetching hardware details is now complete. Displaying details.
Board name : NVIDIA Jetson TX2
CUDA Version : 9.0
cuDNN Version : 7.0
TensorRT Version : 3.0
Available Webcams : UVC Camera (046d:0809)
Available GPUs : NVIDIA Tegra X2

Alternatively, to create a live object for DRIVE hardware:

hwobj = drive('192.168.1.16','nvidia','nvidia');

Note If there is a connection failure, a diagnostic error message is reported in the MATLAB Command Window. If the connection has failed, the most likely cause is an incorrect IP address or host name.

Generate CUDA Executable Using GPU Coder


To generate a CUDA executable that can be deployed to an NVIDIA target, create a custom main wrapper file (main.cu, main.h) that calls the entry-point function in the generated code. The main file passes a vector containing the first 100 natural numbers to the entry-point function and writes the results to a myAdd.bin binary file.
//main.cu
// Include Files
#include "myAdd.h"
#include "main.h"
#include "myAdd_terminate.h"
#include "myAdd_initialize.h"
#include <stdio.h>

// Function Declarations


static void argInit_1x100_real_T(real_T result[100]);


static void main_myAdd();

// Function Definitions
static void argInit_1x100_real_T(real_T result[100])
{
int32_T idx1;

// Initialize each element.


for (idx1 = 0; idx1 < 100; idx1++) {
result[idx1] = (real_T) idx1;
}
}

void writeToFile(real_T result[100])


{
FILE *fid = NULL;
fid = fopen("myAdd.bin", "wb");
fwrite(result, sizeof(real_T), 100, fid);
fclose(fid);
}

static void main_myAdd()


{
real_T out[100];
real_T b[100];
real_T c[100];

argInit_1x100_real_T(b);
argInit_1x100_real_T(c);

myAdd(b, c, out);
writeToFile(out); // Write the output to a binary file
}

// Main routine
int32_T main(int32_T, const char * const [])
{
// Initialize the application.
myAdd_initialize();

// Invoke the entry-point functions.


main_myAdd();


// Terminate the application.


myAdd_terminate();
return 0;
}

//main.h
#ifndef MAIN_H
#define MAIN_H

// Include Files
#include <stddef.h>
#include <stdlib.h>
#include "rtwtypes.h"
#include "myAdd_types.h"

// Function Declarations
extern int32_T main(int32_T argc, const char * const argv[]);

#endif

Create a GPU code configuration object for generating an executable. Use the
coder.hardware function to create a configuration object for the DRIVE or Jetson
platform and assign it to the Hardware property of the code configuration object cfg.
Use the BuildDir property to specify the folder for performing remote build process on
the target. If the specified build folder does not exist on the target, then the software
creates a folder with the given name. If no value is assigned to
cfg.Hardware.BuildDir, the remote build process happens in the last specified build
folder. If there is no stored build folder value, the build process takes place in the home
folder.

cfg = coder.gpuConfig('exe');
cfg.Hardware = coder.hardware('NVIDIA Jetson');
cfg.Hardware.BuildDir = '~/remoteBuildDir';
cfg.CustomSource = fullfile('main.cu');

To generate CUDA code, use the codegen command and pass the GPU code configuration
object along with the size of the inputs for the myAdd entry-point function. After the code
generation takes place on the host, the generated files are copied over and built on the
target.

codegen('-config',cfg,'myAdd','-args',{1:100,1:100});


Run the Executable and Verify the Results


To run the executable on the target hardware, use the runApplication() method of the
hardware object. In the MATLAB command window, enter:

pid = runApplication(hwobj,'myAdd');
### Launching the executable on the target...
Executable launched successfully with process ID 26432.
Displaying the simple runtime log for the executable...

Copy the output bin file myAdd.bin to the MATLAB environment on the host and
compare the computed results with the results from MATLAB.

outputFile = [hwobj.workspaceDir '/myAdd.bin']


getFile(hwobj,outputFile);

% Simulation result from the MATLAB.


simOut = myAdd(0:99,0:99);

% Read the copied result binary file from target in MATLAB.


fId = fopen('myAdd.bin','r');
tOut = fread(fId,'double');
diff = simOut - tOut';
fprintf('Maximum deviation between MATLAB Simulation output and GPU coder output on Target is: %f\n', max(diff(:)));
Maximum deviation between MATLAB Simulation output and GPU coder output on Target is: 0.000000
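
Optionally, remove the result file from the target after copying it by using the deleteFile method described in “Run Linux Commands on NVIDIA Hardware” on page 2-26:

deleteFile(hwobj,outputFile);  % remove myAdd.bin from the target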

See Also
drive | jetson | killApplication | killProcess | openShell | runApplication | runExecutable | system

Related Examples
• “Sobel Edge Detection using Webcam on NVIDIA Jetson”
• “Getting Started with the GPU Coder Support Package for NVIDIA GPUs”
• “Deploy and Run Sobel Edge Detection with I/O on NVIDIA Jetson”

More About
• “Build and Run an Executable on NVIDIA Hardware Using GPU Coder App” on page
2-9


• “Stop or Restart an Executable Running on NVIDIA Hardware” on page 2-24


• “Run Linux Commands on NVIDIA Hardware” on page 2-26
• “Processor-In-The-Loop Execution from Command Line” on page 3-2
• “Processor-In-The-Loop Execution with the GPU Coder App” on page 3-9


Build and Run an Executable on NVIDIA Hardware Using


GPU Coder App
In this section...
“Learning Objectives” on page 2-9
“Tutorial Prerequisites” on page 2-10
“Example: Vector Addition” on page 2-10
“Custom Main File” on page 2-11
“GPU Coder App” on page 2-12
“Run the Executable and Verify the Results” on page 2-15

The GPU Coder Support Package for NVIDIA GPUs uses the GPU Coder product to
generate CUDA code (kernels) from the MATLAB algorithm. These kernels run on any
CUDA-enabled GPU platform. The support package automates the deployment of the generated CUDA code on GPU hardware platforms such as Jetson or DRIVE.

Learning Objectives
In this tutorial, you learn how to:

• Prepare your MATLAB code for CUDA code generation by using the kernelfun
pragma.
• Create and set up a GPU Coder project.
• Change settings to connect to the NVIDIA target board.
• Generate and deploy CUDA executable on the target board.
• Run the executable on the board and verify the results.

Before getting started with this tutorial, it is recommended that you familiarize yourself with the GPU Coder App. For more information, see “Code Generation by Using
the GPU Coder App” (GPU Coder).


Tutorial Prerequisites
Target Board Requirements

• NVIDIA DRIVE or Jetson embedded platform.


• Ethernet crossover cable to connect the target board and host PC (if the target board
cannot be connected to a local network).
• NVIDIA CUDA toolkit installed on the board.
• Environment variables on the target for the compilers and libraries. For information
on the supported versions of the compilers and libraries and their setup, see “Install
and Setup Prerequisites for NVIDIA Boards” on page 1-4.

Development Host Requirements

• GPU Coder for code generation. For an overview and tutorials, see the “Getting
Started with GPU Coder” (GPU Coder) page
• NVIDIA CUDA toolkit on the host.
• Environment variables on the host for the compilers and libraries. For information on
the supported versions of the compilers and libraries, see “Third-party Products” (GPU
Coder). For setting up the environment variables, see “Environment Variables” (GPU
Coder).

Example: Vector Addition


This tutorial uses a simple vector addition example to demonstrate the build and
deployment workflow on NVIDIA GPUs. Create a MATLAB function myAdd.m that acts as
the entry-point for code generation. Alternatively, use the files in the “Getting Started
with the GPU Coder Support Package for NVIDIA GPUs” example for this tutorial. The
easiest way to create CUDA code for this function is to place the coder.gpu.kernelfun
pragma in the function. When GPU Coder encounters the kernelfun pragma, it attempts to parallelize all the computation within this function and then maps it to the GPU.

function out = myAdd(inp1,inp2) %#codegen


coder.gpu.kernelfun();
out = inp1 + inp2;
end


Custom Main File


To generate a CUDA executable that can be deployed to an NVIDIA target, create a custom main wrapper file (main.cu, main.h) that calls the entry-point function in the generated code. The main file passes a vector containing the first 100 natural numbers to the entry-point function and writes the results to a myAdd.bin binary file.

//main.cu
// Include Files
#include "myAdd.h"
#include "main.h"
#include "myAdd_terminate.h"
#include "myAdd_initialize.h"
#include <stdio.h>

// Function Declarations
static void argInit_1x100_real_T(real_T result[100]);
static void main_myAdd();

// Function Definitions
static void argInit_1x100_real_T(real_T result[100])
{
int32_T idx1;

// Initialize each element.


for (idx1 = 0; idx1 < 100; idx1++) {
result[idx1] = (real_T) idx1;
}
}

void writeToFile(real_T result[100])


{
FILE *fid = NULL;
fid = fopen("myAdd.bin", "wb");
fwrite(result, sizeof(real_T), 100, fid);
fclose(fid);
}

static void main_myAdd()


{
real_T out[100];
real_T b[100];
real_T c[100];


argInit_1x100_real_T(b);
argInit_1x100_real_T(c);

myAdd(b, c, out);
writeToFile(out); // Write the output to a binary file
}

// Main routine
int32_T main(int32_T, const char * const [])
{
// Initialize the application.
myAdd_initialize();

// Invoke the entry-point functions.


main_myAdd();

// Terminate the application.


myAdd_terminate();
return 0;
}

//main.h
#ifndef MAIN_H
#define MAIN_H

// Include Files
#include <stddef.h>
#include <stdlib.h>
#include "rtwtypes.h"
#include "myAdd_types.h"

// Function Declarations
extern int32_T main(int32_T argc, const char * const argv[]);

#endif

GPU Coder App


To open the GPU Coder app, on the MATLAB toolstrip Apps tab, under Code
Generation, click the GPU Coder app icon. You can also open the app by typing
gpucoder in the MATLAB Command Window.

1 The app opens the Select source files page. Select myAdd.m as the entry-point
function. Click Next.


2 In the Define Input Types window, enter myAdd(1:100,1:100) and click


Autodefine Input Types, then click Next.
3 You can initiate the Check for Run-Time Issues process or click Next to go to the
Generate Code step.
4 Set the Build type to Executable and the Hardware Board to NVIDIA Jetson.

5 Click More Settings. On the Custom Code panel, enter the custom main file main.cu in the field for Additional source files. The custom main file and the header file must be in the same location as the entry-point file.


6 Under the Hardware panel, enter the device address, user name, password, and
build folder for the board.


7 Close the Settings window and click Generate. The software generates CUDA code
and deploys the executable to the folder specified. Click Next and close the app.

Run the Executable and Verify the Results


In the MATLAB command window, use the runApplication() method of the hardware
object to start the executable on the target hardware.

hwobj = jetson;
pid = runApplication(hwobj,'myAdd');

### Launching the executable on the target...


Executable launched successfully with process ID 26432.


Displaying the simple runtime log for the executable...

Copy the output bin file myAdd.bin to the MATLAB environment on the host and
compare the computed results with the results from MATLAB.
outputFile = [hwobj.workspaceDir '/myAdd.bin']
getFile(hwobj,outputFile);

% Simulation result from the MATLAB.


simOut = myAdd(0:99,0:99);

% Read the copied result binary file from target in MATLAB.


fId = fopen('myAdd.bin','r');
tOut = fread(fId,'double');
diff = simOut - tOut';
fprintf('Maximum deviation between MATLAB Simulation output and GPU coder output on Target is: %f\n', max(diff(:)));

Maximum deviation between MATLAB Simulation output and GPU coder output on Target is: 0.000000

See Also
drive | jetson | killApplication | killProcess | openShell | runApplication | runExecutable | system

Related Examples
• “Sobel Edge Detection using Webcam on NVIDIA Jetson”
• “Getting Started with the GPU Coder Support Package for NVIDIA GPUs”
• “Deploy and Run Sobel Edge Detection with I/O on NVIDIA Jetson”

More About
• “Build and Run an Executable on NVIDIA Hardware” on page 2-2
• “Stop or Restart an Executable Running on NVIDIA Hardware” on page 2-24
• “Run Linux Commands on NVIDIA Hardware” on page 2-26
• “Processor-In-The-Loop Execution from Command Line” on page 3-2
• “Processor-In-The-Loop Execution with the GPU Coder App” on page 3-9


Read Video Files on NVIDIA Hardware


With GPU Coder Support Package for NVIDIA GPUs, you can generate CUDA code for the
MATLAB VideoReader object to read files containing video data on the NVIDIA target
hardware. The generated code uses the GStreamer library API to read the video files.

Sobel Edge Detection on Video File


In this example, you use GPU Coder and the GPU Coder Support Package for NVIDIA
GPUs to generate and deploy a CUDA executable for a Sobel edge detection application
on the Jetson TX2 board. This CUDA application reads the contents of a video file,
performs the edge detection operation, and displays the output video on the NVIDIA
hardware.

Requirements
1 GPU Coder.
2 GPU Coder Support Package for NVIDIA GPUs.
3 Image Processing Toolbox for the rhinos.avi sample video file used in this
example.
4 NVIDIA CUDA toolkit.
5 GStreamer and SDL libraries on the target.
6 Environment variables for the compilers and libraries on the host and the target. For
more information, see “Third-party Products” (GPU Coder), “Environment Variables”
(GPU Coder), and “Install and Setup Prerequisites for NVIDIA Boards” on page 1-4.
7 NVIDIA Jetson TX2 embedded platform.

Create a Live Hardware Connection Object


The support package software uses an SSH connection over TCP/IP to execute commands
while building and running the generated CUDA code on the Jetson platforms. Connect
the target platform to the same network as the host computer. Alternatively, use an
Ethernet crossover cable to connect the board directly to the host computer. Refer to the
NVIDIA documentation on how to set up and configure your board.

To communicate with the NVIDIA hardware, you must create a live hardware connection
object by using the jetson function. To create a live hardware connection object, provide


the host name or IP address, user name, and password of the target board. For example, to create a live object for Jetson hardware:

hwobj = jetson('192.168.1.15','ubuntu','ubuntu');

The software performs a check of the hardware, compiler tools and libraries, and IO server installation, and gathers peripheral information on the target. This information is displayed in the command window.

Checking for CUDA availability on the Target...


Checking for NVCC in the target system path...
Checking for CUDNN library availability on the Target...
Checking for TensorRT library availability on the Target...
Checking for Prerequisite libraries is now complete.
Fetching hardware details...
Fetching hardware details is now complete. Displaying details.
Board name : NVIDIA Jetson TX2
CUDA Version : 9.0
cuDNN Version : 7.0
TensorRT Version : 3.0
Available Webcams : UVC Camera (046d:0809)
Available GPUs : NVIDIA Tegra X2

Alternatively, to create a live object for DRIVE hardware:

hwobj = drive('192.168.1.16','nvidia','nvidia');

Note If there is a connection failure, a diagnostic error message is reported in the MATLAB Command Window. If the connection has failed, the most likely cause is an incorrect IP address or host name.

The videoReaderDeploy Entry-Point Function


Create a MATLAB file videoReaderDeploy.m that acts as the entry-point function for
code generation. The videoReaderDeploy.m function creates a VideoReader object
called vObj to read the rhinos.avi video file located on the target hardware. The
function then uses the hasFrame and readFrame methods of the VideoReader object to
determine and read valid video frames from the input file. The function performs Sobel
edge detection through a 2-dimensional spatial gradient operation and displays the edge
detected image on the target hardware. The function finds the horizontal gradient (h) and vertical gradient (v) of the input image by using the respective Sobel kernels.


function videoReaderDeploy()

% Create Jetson hardware object


hwobj = jetson();
vidName = '/home/ubuntu/Videos/rhinos.avi';

% Create video reader object


vObj = VideoReader(hwobj,vidName,'Width',320,'Height',240);

% Create display object on the target


dispObj = hwobj.imageDisplay;

% Grab frame from the video pipeline


while vObj.hasFrame
img = vObj.readFrame();

% Sobel edge detection


kernel = [1 2 1;0 0 0;-1 -2 -1];
h = conv2(img(:,:,2),kernel,'same');
v = conv2(img(:,:,2),kernel','same');
e = sqrt(h.*h + v.*v);
edgeImg = uint8((e > 100) * 240);

% Display edge detected image


image(dispObj,edgeImg);
end
end

For code generation, the VideoReader function requires the full path to the video file on
the target hardware. The GPU Coder Support Package for NVIDIA GPUs uses the
GStreamer library API to read the video files on the target platform. The software
supports file (container) formats and codecs that are compatible with GStreamer. For
more information, see https://gstreamer.freedesktop.org/documentation/plugin-
development/advanced/media-types.html?gi-language=c. For other code generation
limitations for the VideoReader function, see “Limitations” on page 2-23.
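
One way to confirm that a compatible GStreamer installation is present on the target is to query its version through the system method of the hardware object (gst-inspect-1.0 ships with GStreamer 1.0):

system(hwobj,'gst-inspect-1.0 --version')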

Generate CUDA Executable Using GPU Coder


Create a GPU code configuration object for generating an executable. Use the
coder.hardware function to create a configuration object for the Jetson platform and
assign it to the Hardware property of the code configuration object cfg. Use the
BuildDir property to specify the folder for performing remote build process on the
target. If the specified build folder does not exist on the target, then the software creates


a folder with the given name. If no value is assigned to cfg.Hardware.BuildDir, the


remote build process happens in the last specified build folder. If there is no stored build
folder value, the build process takes place in the home folder. Set the
GenerateExampleMain property to generate an example CUDA C++ main function and
compile it. This example does not require modifications to the generated main files. Use
the putFile method of the Jetson object to move the input video file to the target
platform.

cfg = coder.gpuConfig('exe');
cfg.Hardware = coder.hardware('NVIDIA Jetson');
cfg.Hardware.BuildDir = '~/remoteBuildDir';
cfg.GenerateExampleMain = 'GenerateCodeAndCompile';
hwobj.putFile('rhinos.avi', hwobj.workspaceDir);

To generate CUDA code, use the codegen command and pass the GPU code configuration
object along with the videoReaderDeploy entry-point function. After the code
generation takes place on the host, the generated files are copied over and built on the
target.

codegen('-config',cfg,'videoReaderDeploy','-report');

Run the Executable


To run the executable on the target hardware, use the runApplication() method of the
hardware object. In the MATLAB command window, enter:

pid = runApplication(hwobj,'videoReaderDeploy');

A window opens on the target hardware display showing the edge detected output of the
input video.
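
To stop the application when you are done, use the killApplication method described in “Stop or Restart an Executable Running on NVIDIA Hardware” on page 2-24:

killApplication(hwobj,'videoReaderDeploy');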


Specifying the Video File at Runtime


Instead of specifying the video file at code generation time, you can modify the entry-
point function and code configuration object to accept a variable file name when running
the executable.
function videoReaderDeploy(vfilename)

% Create Jetson hardware object


hwobj = jetson();

% Create video reader object


vObj = VideoReader(hwobj,vfilename,'Width',640,'Height',480);

% Create display object on the target


dispObj = hwobj.imageDisplay;

% Grab frame from the video pipeline


while vObj.hasFrame
img = vObj.readFrame();

% Sobel edge detection


kernel = [1 2 1;0 0 0;-1 -2 -1];
h = conv2(img(:,:,2),kernel,'same');


v = conv2(img(:,:,2),kernel','same');
e = sqrt(h.*h + v.*v);
edgeImg = uint8((e > 100) * 240);

% Display edge detected image


image(dispObj,edgeImg);
end
end

Create a custom main file to handle the variable file name input when running the
executable. A snippet of the code is shown.

static void main_videoReaderDeploy(const char* const vfilename)


{
videoReaderDeploy(vfilename);
}

//
// Arguments : int32_T argc
// const char * const argv[]
// Return Type : int32_T
//
int32_T main(int32_T, const char * const argv[])
{
//Initialize the application
videoReaderDeploy_initialize();

//Invoke entry-point function


main_videoReaderDeploy(argv[1]);

//Terminate the application


videoReaderDeploy_terminate();
return 0;
}

Modify the code configuration object to include this custom main file.

cfg = coder.gpuConfig('exe');
cfg.Hardware = coder.hardware('NVIDIA Jetson');
cfg.Hardware.BuildDir = '~/remoteBuildDir';
cfg.CustomSource = 'main.cu';

To generate CUDA code, use the codegen command and pass the GPU code configuration
object along with the videoReaderDeploy entry-point function.


vfilename = coder.typeof('a',[1,1024]);
codegen('-config',cfg,'-args',{vfilename},'videoReaderDeploy','-report');
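
One way to pass the file name when launching the deployed executable is through the system method of the hardware object. The sketch below is an assumption about the on-target layout: it supposes the executable was deployed as videoReaderDeploy.elf in hwobj.workspaceDir, so adjust the path and name for your build folder:

exeDir = hwobj.workspaceDir;   % assumed deployment folder on the target
cmd = ['cd ' exeDir ' && ./videoReaderDeploy.elf /home/ubuntu/Videos/rhinos.avi &'];
system(hwobj,cmd);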

Limitations
• The VideoReader.getFileFormats method is not supported for code generation.
• For the readFrame and read functions, code generation does not support the optional
positional argument native.

See Also
drive | jetson | killApplication | killProcess | openShell | runApplication | runExecutable | system

Related Examples
• “Deploy and Run Fog Rectification for Video on NVIDIA Jetson”
• “Sobel Edge Detection using Webcam on NVIDIA Jetson”
• “Getting Started with the GPU Coder Support Package for NVIDIA GPUs”
• “Deploy and Run Sobel Edge Detection with I/O on NVIDIA Jetson”

More About
• “Build and Run an Executable on NVIDIA Hardware” on page 2-2
• “Stop or Restart an Executable Running on NVIDIA Hardware” on page 2-24
• “Run Linux Commands on NVIDIA Hardware” on page 2-26
• “Processor-In-The-Loop Execution from Command Line” on page 3-2
• “Processor-In-The-Loop Execution with the GPU Coder App” on page 3-9


Stop or Restart an Executable Running on NVIDIA


Hardware
You can use the MATLAB Command Window to stop or restart binary executables that are
running on the NVIDIA hardware.

1 Create a connection from the MATLAB software to the NVIDIA hardware. In this
example, the connection is to a Jetson board and is named hwJetson.

If a connection is already present in the MATLAB Workspace, skip this step.

hwJetson = jetson('192.168.1.15','ubuntu','ubuntu');

hwJetson =

jetson with properties:

DeviceAddress: '192.168.1.15'
Port: 22
BoardName: 'NVIDIA Jetson TX2'
CUDAVersion: '9.0'
cuDNNVersion: '7.0'
TensorRTVersion: '3.0'
GpuInfo: [1×1 struct]
webcamlist: []

2 To stop an executable running on the hardware, use the killApplication function


with the connection followed by the name of the executable. The name of the
executable is the same as the name of the model from which the executable
originated. For example:

killApplication(hwJetson,'myAdd')
3 To restart the stopped executable, or to run multiple instances of the executable, use
the runApplication function. For example:

runApplication(hwJetson,'myAdd')

### Launching the executable on the target...


Executable launched successfully with process ID 26432.
Displaying the simple runtime log for the executable...
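
The process ID returned by runApplication can also be used to stop one specific instance. A short sketch, assuming the killProcess function listed under See Also accepts the connection object and a process ID:

pid = runApplication(hwJetson,'myAdd');  % returns the process ID of the new instance
killProcess(hwJetson,pid);               % stop only that instance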

See Also
drive | jetson | killApplication | killProcess | openShell | runApplication | runExecutable | system


Related Examples
• “Sobel Edge Detection using Webcam on NVIDIA Jetson”
• “Getting Started with the GPU Coder Support Package for NVIDIA GPUs”
• “Deploy and Run Sobel Edge Detection with I/O on NVIDIA Jetson”

More About
• “Build and Run an Executable on NVIDIA Hardware” on page 2-2
• “Run Linux Commands on NVIDIA Hardware” on page 2-26
• “Processor-In-The-Loop Execution from Command Line” on page 3-2
• “Processor-In-The-Loop Execution with the GPU Coder App” on page 3-9


Run Linux Commands on NVIDIA Hardware


The NVIDIA DRIVE and Jetson hardware runs a Linux® distribution as the operating
system. Using utilities shipped in the GPU Coder Support Package for NVIDIA GPUs, you
can remotely execute Linux shell commands on the NVIDIA hardware directly from the MATLAB command line. For example, you can run and stop a CUDA executable, list the contents of a folder, or look up the CPU load of a process running on the NVIDIA hardware. You can also start an interactive SSH session directly from within MATLAB.

Create a Communication Object


The GPU Coder Support Package for NVIDIA GPUs uses an SSH connection over TCP/IP
to execute commands while building and running the generated CUDA code on the DRIVE
or Jetson platforms. You can use the infrastructure developed for this purpose to
communicate with the NVIDIA hardware. Connect the target platform to the same
network as the host computer. Alternatively, use an Ethernet crossover cable to connect
the board directly to the host computer. Refer to the NVIDIA documentation on how to set
up and configure your board.

To communicate with the NVIDIA hardware, you must create a live hardware connection
object by using the drive or jetson function. To create a live hardware connection
object, provide the host name or IP address, user name, and password of the target
board. For example, to create a live object for the Jetson hardware:

hwobj = jetson('192.168.1.15','ubuntu','ubuntu');

During creation of the live hardware object, the software checks the hardware and the IO server installation, and gathers peripheral information on the target. This information is displayed in the command window as shown.

Checking for CUDA availability on the Target...


Checking for NVCC in the target system path...
Checking for CUDNN library availability on the Target...
Checking for TensorRT library availability on the Target...
Checking for Prerequisite libraries is now complete.
Fetching hardware details...
Fetching hardware details is now complete. Displaying details.
Board name : NVIDIA Jetson TX2
CUDA Version : 9.0
cuDNN Version : 7.0


TensorRT Version : 3.0


Available Webcams : UVC Camera (046d:0809)
Available GPUs : NVIDIA Tegra X2

Similarly, to create a live object for DRIVE hardware:

hwobj = drive('192.168.1.16','nvidia','nvidia');

Note If there is a connection failure, a diagnostic error message is reported on the MATLAB command line. If the connection has failed, the most likely cause is an incorrect IP address or host name.

Execute System Commands on Your NVIDIA Hardware


You can use the system method of the jetson or drive object to execute various Linux
shell commands on the NVIDIA hardware from MATLAB. For example, to list the contents
of the home folder on the target:

system(hwobj,'ls -al ~')

This statement executes a folder list shell command and returns the resulting text output
at the MATLAB command prompt. You can store the result in a MATLAB variable to
perform further processing. For example, establish who owns the .profile file under /home/ubuntu.
output = system(hwobj,'ls -al /home/ubuntu');
ret = regexp(output, '\s+[\w-]+\s+\d\s+(\w+)\s+.+\.profile\s+', 'tokens');
ret{1}

You can also achieve the same result using a single shell command.

system(hwobj,'stat --format="%U" /home/ubuntu/.profile')

You cannot execute interactive system commands using the system() method. To
execute interactive commands on the NVIDIA hardware, you must open a terminal
session.

openShell(hwobj)

This command opens a PuTTY terminal that can execute interactive shell commands like
'top'.


Run/Stop a CUDA Executable on Your NVIDIA Hardware


To run/stop a CUDA executable, you can use the runExecutable and
killApplication methods of the jetson or drive object.

1. To run a CUDA executable that you previously deployed to the NVIDIA hardware, execute the
following command on the MATLAB command line:

runExecutable(hwobj,'<executable name>')

where the string '<executable name>' is the name of the CUDA executable you want
to run on the NVIDIA hardware.

2. To stop a CUDA executable running on the NVIDIA hardware, execute the following
command on the MATLAB command line:

killApplication(hwobj,'<executable name>')

This command kills the Linux process with the name '<executable name>.elf' on the NVIDIA hardware. Alternatively, you can execute the following command to stop the executable:

system(hwobj,'sudo killall <executable name>')

Manipulate Files
The jetson or drive object provides basic file manipulation capabilities. To transfer a file on the NVIDIA hardware to your host computer, use the getFile() method.

getFile(hwobj,'/usr/share/pixmaps/debian-logo.png');

You can then read the PNG file in MATLAB:

img = imread('debian-logo.png');
image(img);

The getFile() method takes an optional second argument that allows you to define the
file destination. To transfer a file from your host computer to the NVIDIA hardware, use the putFile() method.

putFile(hwobj,'debian-logo.png','/home/ubuntu/debian-logo.png.copy');

Make sure that file is copied.


system(hwobj,'ls -l /home/ubuntu/debian-logo.png.copy')

You can delete files on your NVIDIA hardware using the deleteFile() command.

deleteFile(hwobj,'/home/ubuntu/debian-logo.png.copy');

Make sure that file is deleted.

system(hwobj,'ls -l /home/ubuntu/debian-logo.png.copy')

The command results in an error indicating that the file cannot be found.

See Also
deleteFile | drive | getFile | jetson | killApplication | openShell | runApplication | system

Related Examples
• “Sobel Edge Detection using Webcam on NVIDIA Jetson”
• “Getting Started with the GPU Coder Support Package for NVIDIA GPUs”
• “Deploy and Run Sobel Edge Detection with I/O on NVIDIA Jetson”

More About
• “Build and Run an Executable on NVIDIA Hardware” on page 2-2
• “Stop or Restart an Executable Running on NVIDIA Hardware” on page 2-24
• “Processor-In-The-Loop Execution from Command Line” on page 3-2
• “Processor-In-The-Loop Execution with the GPU Coder App” on page 3-9


Open a Secure Shell Command-Line Session with NVIDIA


Hardware
To open a Secure Shell (SSH) command-line session with NVIDIA DRIVE or Jetson
hardware, use the openShell method of the jetson or drive object. For example to
open an SSH session to the Jetson hardware, enter:

hwJetson = jetson;
openShell(hwJetson);

Similarly, you can create a live hardware connection to the DRIVE target and open an
SSH session.
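
For example, using the DRIVE connection parameters shown earlier in this guide:

hwDrive = drive('192.168.1.16','nvidia','nvidia');
openShell(hwDrive);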
See Also
drive | jetson | killApplication | killProcess | openShell | runApplication | runExecutable | system


Related Examples
• “Sobel Edge Detection using Webcam on NVIDIA Jetson”
• “Getting Started with the GPU Coder Support Package for NVIDIA GPUs”
• “Deploy and Run Sobel Edge Detection with I/O on NVIDIA Jetson”

More About
• “Build and Run an Executable on NVIDIA Hardware” on page 2-2
• “Stop or Restart an Executable Running on NVIDIA Hardware” on page 2-24
• “Processor-In-The-Loop Execution from Command Line” on page 3-2
• “Processor-In-The-Loop Execution with the GPU Coder App” on page 3-9

3

Verification

• “Processor-In-The-Loop Execution from Command Line” on page 3-2


• “Processor-In-The-Loop Execution with the GPU Coder App” on page 3-9
• “Execution-Time Profiling for PIL” on page 3-17

Processor-In-The-Loop Execution from Command Line


Use the processor-in-the-loop (PIL) execution to check the numerical behavior of the
CUDA code that you generate from MATLAB functions. A PIL simulation, which requires
target connectivity, compiles generated source code, and then downloads and runs object
code on NVIDIA GPU platforms. The results of the PIL simulation are transferred to
MATLAB to verify the numerical equivalence of the simulation and the code generation
results.

The PIL verification process is a crucial part of the design cycle to check that the behavior
of the generated code matches the design. PIL verification requires an Embedded Coder
license.

Note When using PIL execution, make sure that the Benchmarking option in GPU Coder
settings is false. Executing PIL with benchmarking results in compilation errors.

Prerequisites
Target Board Requirements

• NVIDIA DRIVE or Jetson embedded platform.


• Ethernet crossover cable to connect the target board and host PC (if the target board
cannot be connected to a local network).
• NVIDIA CUDA toolkit installed on the board.
• Environment variables on the target for the compilers and libraries. For information
on the supported versions of the compilers and libraries and their setup, see “Install
and Setup Prerequisites for NVIDIA Boards” on page 1-4.

Development Host Requirements

• GPU Coder for code generation. For an overview and tutorials, see the “Getting
Started with GPU Coder” (GPU Coder) page.
• Embedded Coder.
• NVIDIA CUDA toolkit on the host.
• Environment variables on the host for the compilers and libraries. For information on
the supported versions of the compilers and libraries, see “Third-party Products” (GPU
Coder). For setting up the environment variables, see “Environment Variables” (GPU
Coder).


Example: The Mandelbrot Set


Description

You do not have to be familiar with the algorithm in the example to complete the tutorial.

The Mandelbrot set is the region in the complex plane consisting of the values z0 for
which the trajectories defined by

z_{k+1} = z_k^2 + z_0,   k = 0, 1, …

remain bounded as k → ∞. The overall geometry of the Mandelbrot set is shown in the
figure. This view does not have the resolution to show the richly detailed structure of the
fringe just outside the boundary of the set. At increasing magnifications, the Mandelbrot
set exhibits an elaborate boundary that reveals progressively finer recursive detail.


Algorithm

Create a MATLAB script called mandelbrot_count.m with the following lines of code.
This code is a baseline vectorized MATLAB implementation of the Mandelbrot set.
function count = mandelbrot_count(maxIterations, xGrid, yGrid) %#codegen
% mandelbrot computation

z0 = xGrid + 1i*yGrid;
count = ones(size(z0));


% Add Kernelfun pragma to trigger kernel creation


coder.gpu.kernelfun;

z = z0;
for n = 0:maxIterations
z = z.*z + z0;
inside = abs(z)<=2;
count = count + inside;
end
count = log(count);

For this tutorial, pick a set of limits that specify a highly zoomed part of the Mandelbrot
set in the valley between the main cardioid and the p/q bulb to its left. A 1000x1000 grid
of real parts (x) and imaginary parts (y) is created between these two limits. The
Mandelbrot algorithm is then iterated at each grid location. An iteration number of 500 is
enough to render the image in full resolution. Create a MATLAB script called
mandelbrot_test.m with the following lines of code. It also calls the
mandelbrot_count function and plots the resulting Mandelbrot set.

maxIterations = 500;
gridSize = 1000;
xlim = [-0.748766713922161, -0.748766707771757];
ylim = [ 0.123640844894862, 0.123640851045266];

x = linspace( xlim(1), xlim(2), gridSize );


y = linspace( ylim(1), ylim(2), gridSize );
[xGrid,yGrid] = meshgrid( x, y );

count = mandelbrot_count(maxIterations, xGrid, yGrid);

figure(1)
imagesc( x, y, count );
colormap([jet();flipud( jet() );0 0 0]);
axis off
title('Mandelbrot set');

Create a Live Hardware Connection Object


To communicate with the NVIDIA hardware, you must create a live hardware connection
object by using the jetson or drive function. To create a live hardware connection
object, provide the host name or IP address, user name, and password of the target
board. For example, to create a live object for Jetson hardware:

hwobj = jetson('192.168.1.15','ubuntu','ubuntu');


The software performs a check of the hardware, compiler tools and libraries, and IO server installation, and gathers peripheral information on the target. This information is displayed in the command window.
Checking for CUDA availability on the Target...
Checking for 'nvcc' in the target system path...
Checking for cuDNN library availability on the Target...
Checking for TensorRT library availability on the Target...
Checking for prerequisite libraries is complete.
Gathering hardware details...
Gathering hardware details is complete.
Board name : NVIDIA Jetson TX2
CUDA Version : 9.0
cuDNN Version : 7.0
TensorRT Version : 3.0
Available Webcams : Microsoft® LifeCam Cinema(TM)
Available GPUs : NVIDIA Tegra X2

Alternatively, to create a live object for DRIVE hardware:

hwobj = drive('192.168.1.16','nvidia','nvidia');

Note If there is a connection failure, a diagnostic error message is reported in the MATLAB Command Window. If the connection has failed, the most likely cause is an incorrect IP address or host name.

Configure the PIL Execution


Create a GPU code configuration object for generating a library and configure the object
for PIL. Use the coder.hardware function to create a configuration object for the DRIVE
or Jetson platform and assign it to the Hardware property of the code configuration
object cfg. Use 'NVIDIA Jetson' for the Jetson boards and 'NVIDIA Drive' for the
DRIVE boards.
cfg = coder.gpuConfig('lib','ecoder',true);
cfg.GpuConfig.CompilerFlags = '--fmad=false';
cfg.VerificationMode = 'PIL';
cfg.GenerateReport = true;
cfg.Hardware = coder.hardware('NVIDIA Jetson');

The --fmad=false flag, when passed to nvcc, instructs the compiler to disable Floating-
Point Multiply-Add (FMAD) optimization. This option is set to prevent numerical mismatch


in the generated code because of architectural differences in the CPU and the GPU. For
more information, see “Numerical Differences Between CPU and GPU” (GPU Coder).

Generate Code and Run PIL Execution


To generate CUDA library and the PIL interface, use the codegen command and pass the
GPU code configuration object along with the size of the inputs for the
mandelbrot_count entry-point function. The -test option runs the MATLAB test file,
mandelbrot_test. The test file uses mandelbrot_count_pil, the generated PIL
interface for mandelbrot_count.
codegen -config cfg -args {0,zeros(1000),zeros(1000)} mandelbrot_count -test mandelbrot_test

### Connectivity configuration for function 'mandelbrot_count': 'NVIDIA Jetson'


Code generation successful: View report
Running test file: 'mandelbrot_test' with MEX function 'mandelbrot_count_pil.mexa64'.
### Starting application: 'codegen/lib/mandelbrot_count/pil/mandelbrot_count.elf'
To terminate execution: clear mandelbrot_count_pil
### Launching application mandelbrot_count.elf...

The software creates the following output folders:

• codegen\lib\mandelbrot_count — Standalone code for mandelbrot_count.


• codegen\lib\mandelbrot_count\pil — PIL interface code for
mandelbrot_count.

Verify that the output of this run matches the output from the original
mandelbrot_count.m function.
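
For example, one way to compare the two results numerically is to call the original
function and the generated PIL interface on the same inputs and inspect the maximum
absolute difference. This is a minimal sketch that assumes the workspace variables
created by mandelbrot_test.m are still available:

countRef = mandelbrot_count(maxIterations, xGrid, yGrid);     % original MATLAB function
countPil = mandelbrot_count_pil(maxIterations, xGrid, yGrid); % generated PIL interface
maxDiff = max(abs(countRef(:) - countPil(:)));
fprintf('Maximum absolute difference: %g\n', maxDiff);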

Note On a Microsoft® Windows® system, the Windows Firewall can potentially block a
PIL execution. Change the Windows Firewall settings to allow access.

Terminate the PIL Execution Process


To terminate the PIL execution process, run:
clear mandelbrot_count_pil;

See Also
drive | getPILPort | getPILTimeout | jetson | setPILPort | setPILTimeout | webcam


Related Examples
• “Sobel Edge Detection using Webcam on NVIDIA Jetson”
• “Processor-in-the-Loop Execution on NVIDIA Targets using GPU Coder”

More About
• “Build and Run an Executable on NVIDIA Hardware” on page 2-2
• “Stop or Restart an Executable Running on NVIDIA Hardware” on page 2-24
• “Run Linux Commands on NVIDIA Hardware” on page 2-26
• “Processor-In-The-Loop Execution with the GPU Coder App” on page 3-9
• “Execution-Time Profiling for PIL” on page 3-17


Processor-In-The-Loop Execution with the GPU Coder App
Use the processor-in-the-loop (PIL) execution to check the numerical behavior of the
CUDA code that you generate from MATLAB functions. A PIL simulation, which requires
target connectivity, compiles generated source code, and then downloads and runs object
code on NVIDIA GPU platforms. The results of the PIL simulation are transferred to
MATLAB to verify the numerical equivalence of the simulation and the code generation
results.

The PIL verification process is a crucial part of the design cycle to check that the behavior
of the generated code matches the design. PIL verification requires an Embedded Coder
license.

Note When using PIL execution, make sure that the Benchmarking option in GPU Coder
settings is false. Executing PIL with benchmarking results in compilation errors.

Prerequisites
Target Board Requirements

• NVIDIA DRIVE or Jetson embedded platform.


• Ethernet crossover cable to connect the target board and host PC (if the target board
cannot be connected to a local network).
• NVIDIA CUDA toolkit installed on the board.
• Environment variables on the target for the compilers and libraries. For information
on the supported versions of the compilers and libraries and their setup, see “Install
and Setup Prerequisites for NVIDIA Boards” on page 1-4.

Development Host Requirements

• GPU Coder for code generation. For an overview and tutorials, see the “Getting
Started with GPU Coder” (GPU Coder) page.
• Embedded Coder.
• NVIDIA CUDA toolkit on the host.
• Environment variables on the host for the compilers and libraries. For information on
the supported versions of the compilers and libraries, see “Third-party Products” (GPU


Coder). For setting up the environment variables, see “Environment Variables” (GPU
Coder).

Example: The Mandelbrot Set


Description

You do not have to be familiar with the algorithm in the example to complete the tutorial.

The Mandelbrot set is the region in the complex plane consisting of the values z0 for
which the trajectories defined by

z_{k+1} = z_k^2 + z_0,    k = 0, 1, …

remain bounded as k → ∞. The overall geometry of the Mandelbrot set is shown in the
figure. This view does not have the resolution to show the richly detailed structure of the
fringe just outside the boundary of the set. At increasing magnifications, the Mandelbrot
set exhibits an elaborate boundary that reveals progressively finer recursive detail.


Algorithm

Create a MATLAB function called mandelbrot_count.m with the following lines of code.
This code is a baseline vectorized MATLAB implementation of the Mandelbrot set.
function count = mandelbrot_count(maxIterations, xGrid, yGrid) %#codegen
% mandelbrot computation

z0 = xGrid + 1i*yGrid;
count = ones(size(z0));

% Add Kernelfun pragma to trigger kernel creation
coder.gpu.kernelfun;

z = z0;
for n = 0:maxIterations
z = z.*z + z0;
inside = abs(z)<=2;
count = count + inside;
end
count = log(count);

For this tutorial, pick a set of limits that specify a highly zoomed part of the Mandelbrot
set in the valley between the main cardioid and the p/q bulb to its left. A 1000-by-1000 grid
of real parts (x) and imaginary parts (y) is created between these two limits. The
Mandelbrot algorithm is then iterated at each grid location. An iteration count of 500 is
enough to render the image in full resolution. Create a MATLAB script called
mandelbrot_test.m with the following lines of code. The script calls the
mandelbrot_count function and plots the resulting Mandelbrot set.

maxIterations = 500;
gridSize = 1000;
xlim = [-0.748766713922161, -0.748766707771757];
ylim = [ 0.123640844894862, 0.123640851045266];

x = linspace( xlim(1), xlim(2), gridSize );


y = linspace( ylim(1), ylim(2), gridSize );
[xGrid,yGrid] = meshgrid( x, y );

count = mandelbrot_count(maxIterations, xGrid, yGrid);

figure(1)
imagesc( x, y, count );
colormap([jet();flipud( jet() );0 0 0]);
axis off
title('Mandelbrot set');

GPU Coder App


To open the GPU Coder app, on the MATLAB toolstrip Apps tab, under Code
Generation, click the GPU Coder app icon. You can also open the app by typing
gpucoder in the MATLAB Command Window.

1 The app opens the Select source files page. Select mandelbrot_count.m as the
entry-point function. Click Next.

2 In the Define Input Types window, enter


mandelbrot_count(500,zeros(1000),zeros(1000)) and click Autodefine
Input Types, then click Next.
3 You can initiate the Check for Run-Time Issues process or click Next to go to the
Generate Code step.
4 Set the Build type to Static Library and the Hardware Board to NVIDIA
Jetson.

5 Under the Hardware panel, enter the device address, user name, password, and
build folder for the board.


6 Close the Settings window and click Generate. The software generates CUDA code
for the mandelbrot_count entry-point function.
7 Click Verify Code.
8 In the command field, specify the test file that calls the original MATLAB functions.
For example, mandelbrot_test.
9 To start the PIL execution, click Run Generated Code.

The GPU Coder app:

• Generates a standalone library, for example, codegen\lib\mandelbrot_count.


• Generates PIL interface code, for example, codegen\lib\mandelbrot_count\pil.


• Runs the test file, replacing calls to the MATLAB function with calls to the
generated code in the library.
• Displays messages from the PIL execution in the Test Output tab.

Note On a Microsoft Windows system, the Windows Firewall can potentially block a
PIL execution. Change the Windows Firewall settings to allow access.
10 Verify that the results from the PIL execution match the results from the original
MATLAB functions.
11 To terminate the PIL execution process, click Stop PIL Verification. Alternatively, on
the Test Output tab, click the link that follows To terminate execution.


See Also
drive | getPILPort | getPILTimeout | jetson | setPILPort | setPILTimeout | webcam

Related Examples
• “Sobel Edge Detection using Webcam on NVIDIA Jetson”
• “Processor-in-the-Loop Execution on NVIDIA Targets using GPU Coder”

More About
• “Build and Run an Executable on NVIDIA Hardware” on page 2-2
• “Stop or Restart an Executable Running on NVIDIA Hardware” on page 2-24
• “Run Linux Commands on NVIDIA Hardware” on page 2-26
• “Processor-In-The-Loop Execution from Command Line” on page 3-2
• “Execution-Time Profiling for PIL” on page 3-17


Execution-Time Profiling for PIL


During a processor-in-the-loop (PIL) execution, you can produce a profile of execution
times for code generated from entry-point functions. The software calculates execution
times from data that is obtained through instrumentation probes added to the PIL
application.

Use the execution-time profile to check whether your code runs within the required time
on your target hardware:

• If code execution overruns, look for ways to reduce execution time.


• If your code easily meets time requirements, consider enhancing functionality to
exploit the unused processing power.

At the end of the PIL execution, you can:

• View a report of code execution times.


• Use the Simulation Data Inspector to view and compare plots of function execution
times.
• Access and analyze execution time profiling data.

Note PIL execution supports multiple entry-point functions. An entry-point function can
call another entry-point function as a subfunction. However, the software generates
execution-time profiles only for functions that are called at the entry-point level. The
software does not generate execution-time profiles for entry-point functions that are
called as subfunctions by other entry-point functions.
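
For example, PIL execution with multiple entry-point functions uses the standard codegen
syntax of one -args specification per entry-point function. The function names fcnA and
fcnB in this sketch are hypothetical and are shown only to illustrate the syntax:

codegen -config cfg fcnA -args {zeros(100)} fcnB -args {0}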

Note When using PIL execution, make sure that the Benchmarking option in GPU Coder
settings is false. Executing PIL with benchmarking results in compilation errors.

Generate Execution-Time Profile


Before running a processor-in-the-loop (PIL) execution, enable execution-time profiling:

1 To open the GPU Coder app, on the MATLAB toolstrip Apps tab, under Code
Generation, click the app icon.


2 To open your project, click Open existing project and select the project.
3 On the Generate Code page, click Verify Code.
4 Select the Enable entry point execution profiling check box.

Or, from the Command Window, specify the CodeExecutionProfiling property of your
coder.gpuConfig object. For example:

cfg.CodeExecutionProfiling = true;
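
For reference, a minimal command-line sketch for this example that combines the earlier
PIL configuration with execution profiling enabled (the hardware settings and input sizes
are the same as in the previous sections):

cfg = coder.gpuConfig('lib','ecoder',true);
cfg.VerificationMode = 'PIL';
cfg.CodeExecutionProfiling = true;
cfg.Hardware = coder.hardware('NVIDIA Jetson');
codegen -config cfg -args {0,zeros(1000),zeros(1000)} mandelbrot_count -test mandelbrot_test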


View Execution Times


When you run a PIL execution with execution time profiling enabled, the software
generates a message in the Test Output tab. For example:
### Starting application: 'codegen\lib\mandelbrot_count\pil\mandelbrot_count.elf'
To terminate execution: clear mandelbrot_count_pil
### Launching application mandelbrot_count.elf...
Execution profiling data is available for viewing. Open Simulation Data Inspector.
Execution profiling report available after termination.

To open the code execution profiling report:


1 Click the Stop PIL Verification link.


The software terminates the execution process and displays a new link.
Execution profiling report: report(getCoderExecutionProfile('mandelbrot_count'))

2 Click the new link.

The report provides:

• A summary.
• Information about profiled code sections, which includes time measurements for:


• The entry_point_fn_initialize function, for example, mandelbrot_count_initialize.
• The entry-point function, for example, mandelbrot_count.
• The entry_point_fn_terminate function, for example, mandelbrot_count_terminate.
• Definitions for metrics.

By default, the report displays time in ticks. You can specify the time unit and numeric
display format. The report displays time in seconds only if the timer is calibrated, that is,
the number of timer ticks per second is established. For example, if your processor speed
is 2.035 GHz, specify the number of timer ticks per second by using the
TimerTicksPerSecond property. To display time in microseconds (10^-6 seconds), use the
report command.
executionProfile=getCoderExecutionProfile('mandelbrot_count'); % Create workspace var
executionProfile.TimerTicksPerSecond = 2035 * 1e6;
report(executionProfile, ...
'Units', 'Seconds', ...
'ScaleFactor', '1e-06', ...
'NumericFormat', '%0.3f')

To display measured execution times for a code section, click the Simulation Data
Inspector icon on the corresponding row. You can use the Simulation Data Inspector to
manage and compare plots from various executions.
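
If you prefer to open the Simulation Data Inspector programmatically, one way is the
Simulink.sdi.view command. This assumes that Simulink is installed, because the
Simulation Data Inspector ships with Simulink:

Simulink.sdi.view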

The following table lists the information provided in the code section profiles.

Column                    Description
Section                   Name of function from which code is generated.
Maximum Execution Time    Longest time between start and end of code section.
Average Execution Time    Average time between start and end of code section.
Maximum Self Time         Maximum execution time, excluding time in child sections.
Average Self Time         Average execution time, excluding time in child sections.
Calls                     Number of calls to the code section.
[Icon]                    Icon that you click to display the profiled code section.
[Icon]                    Icon that you click to display measured execution times with
                          Simulation Data Inspector.

See Also
drive | getPILPort | getPILTimeout | jetson | setPILPort | setPILTimeout | webcam

Related Examples
• “Sobel Edge Detection using Webcam on NVIDIA Jetson”
• “Processor-in-the-Loop Execution on NVIDIA Targets using GPU Coder”

More About
• “Build and Run an Executable on NVIDIA Hardware” on page 2-2
• “Stop or Restart an Executable Running on NVIDIA Hardware” on page 2-24
• “Run Linux Commands on NVIDIA Hardware” on page 2-26
• “Processor-In-The-Loop Execution from Command Line” on page 3-2
• “Processor-In-The-Loop Execution with the GPU Coder App” on page 3-9
