Jetson Nano
Version: 1765
October 24, 2023
Contents
Contents 3
List of Figures 11
I. Machine Learning 17
1. Artificial intelligence - When are machines intelligent? 19
1.1. What is Artificial Intelligence? . . . . . . . . . . . . . . . . . . . . . . 19
1.2. Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.3. Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.4. Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.5. Limits of Artificial Intelligence . . . . . . . . . . . . . . . . . . . . . . 22
3. Neural Networks 37
3.1. Mathematical Modelling . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.1.1. Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.1.2. Bias . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.1.3. Activation and Output Function . . . . . . . . . . . . . . . . . 39
3.2. Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2.1. Propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2.2. Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2.3. Delta Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2.4. Error Backpropagation . . . . . . . . . . . . . . . . . . . . . . . 42
3.2.5. Problem of Overfitting . . . . . . . . . . . . . . . . . . . . . . . 42
3.2.6. Epochs, Batches and Steps . . . . . . . . . . . . . . . . . . . . 42
3.2.7. Dropout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.2.8. Correct Classification Rate and Loss Function . . . . . . . . . . 43
3.2.9. Training Data, Validation Data and Test Set . . . . . . . . . . 43
3.2.10. Transfer Learning . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.2.11. Weight Imprinting . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.3. Convolutional Neural Networks . . . . . . . . . . . . . . . . . . . . . . 45
3.3.1. Convolutional Layer . . . . . . . . . . . . . . . . . . . . . . . . 45
3.3.2. Pooling Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.3.3. Fully Connected Layer . . . . . . . . . . . . . . . . . . . . . . . 46
3.3.4. Hyperparameter . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.3.5. Fine-tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.3.6. Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.3.7. AlexNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.3.8. YOLO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.4. Images and Image Classification with a Convolutional Neural Network
(CNN) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.4.1. Representation of Images . . . . . . . . . . . . . . . . . . . . . 50
3.5. Convolution Neural Network . . . . . . . . . . . . . . . . . . . . . . . 52
3.5.1. CNN Architectures . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.5.2. LeNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.5.3. AlexNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.5.4. InceptionNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.5.5. VGG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.5.6. Residual Neural Network (ResNet) . . . . . . . . . . . . . . . . 54
3.5.7. MobileNet & MobileNetV2 . . . . . . . . . . . . . . . . . . . . 54
3.5.8. EfficientNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.5.9. General Structure . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.5.10. Input and Output . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.5.11. Convolution Layer . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.5.12. Padding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.5.13. Activation function . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.5.14. Pooling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.5.15. Flattening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.5.16. Dense Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.5.17. Model used for a CIFAR 10 classification . . . . . . . . . . . . 60
6.2. OpenCV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.2.1. Application of OpenCV . . . . . . . . . . . . . . . . . . . . . . 86
6.2.2. Image Processing . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.2.3. How does a computer read an image . . . . . . . . . . . . . . . 86
6.3. MediaPipe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
6.4. MediaPipe Applications . . . . . . . . . . . . . . . . . . . . . . . . . . 87
6.4.1. Hand Landmarks Detection . . . . . . . . . . . . . . . . . . . . 87
6.4.2. Face Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
18. TensorFlow 151
18.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
18.1.1. TensorFlow Framework . . . . . . . . . . . . . . . . . . . . . . 151
18.1.2. TensorFlow Use Case . . . . . . . . . . . . . . . . . . . . . . . . 151
18.1.3. TensorFlow Lite for Microcontrollers . . . . . . . . . . . . . . . 151
18.2. Sample TensorFlow Code . . . . . . . . . . . . . . . . . . . . . . . . . 152
18.3. TensorFlow 2.0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
18.4. TensorFlow and Python . . . . . . . . . . . . . . . . . . . . . . . . . . 152
Literature 337
Index 347
List of Figures
3.1. Biological (left) and abstracted (right) model of neural networks. Source:[Ert16] 37
3.2. Representation of a neuron as a simple threshold element . . . . . . . 38
3.3. Graph with weighted edges and equivalent weight matrix . . . . . . . 38
3.4. ReLu function (left) and Leaky ReLu . . . . . . . . . . . . . . . . . . . 40
3.5. Sigmoid function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.6. Hyperbolic tangent function . . . . . . . . . . . . . . . . . . . . . . . . 41
3.7. Illustration of the weight imprinting process [QBL18] . . . . . . . . . . 44
3.8. Illustration of imprinting in the normalized embedding space. (a) Before
imprinting, the decision boundaries are determined by the trained
weights. (b) With imprinting, the embedding of an example (the yellow
point) from a novel class defines a new region. [QBL18] . . . . . . . . 44
3.9. The YOLO system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.10. Original image, scaling, rotation and cropping of the image . . . . . . 49
3.11. Structure of a black and white image with 10 × 10 pixels . . . . . . . . 50
3.12. Image with 3 colour channels . . . . . . . . . . . . . . . . . . . . . . . 50
3.13. Greyscale with a value range from 0 to 255 . . . . . . . . . . . . . . . 51
3.14. Black and white image, image with 4 grey levels and with 256 grey levels. 51
3.15. Softened image with a 5 × 5 filter . . . . . . . . . . . . . . . . . . . . . 52
3.16. Edge detection on an image . . . . . . . . . . . . . . . . . . . . . . . . 52
3.17. Structure of the CNN architecture according to LeCun et al. [LeC+98] . 53
3.18. Definition of the convolution of two matrices . . . . . . . . . . . . . . 56
3.19. Convolution layer with 3 × 3 kernel and stride (1, 1) . . . . . . . . . . 56
3.20. Example application of a filter and an image section . . . . . . . . . . 57
3.21. Visualisation of filters of a trained network [LXW19; Sha+16] . . . . . 58
3.22. Detection of vertical and horizontal edges . . . . . . . . . . . . . . . . 58
3.23. Representation of the ReLu activation function . . . . . . . . . . . . . 59
3.24. Application of an activation function with a bias b ∈ R to a matrix . . 59
3.25. Application of max-pooling and average-pooling . . . . . . . . . . . . . 60
3.26. Application of flattening . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.27. Structure of the neural network used . . . . . . . . . . . . . . . . . . . 61
4.1. Examples from the training set of the dataset MNIST [SSS19] . . . . . 65
4.2. Extract from ten illustrations of the data set Canadian Institute For
Advanced Research (CIFAR)-10 . . . . . . . . . . . . . . . . . . . . . . 66
14.1. Jetson Nano MSR-Lab in case with Pi camera, front view . . . . . . . 133
14.2. Connections of the Jetson Nano MSR-Lab in the case . . . . . . . . . 133
14.3. J41 header of the Jetson Nano MSR-Lab in the case, rear view . . . . 134
14.4. Circuit for testing the GPIO pins with LED . . . . . . . . . . . . . . . 135
14.5. Circuit for testing the GPIO pins with LED . . . . . . . . . . . . . . . 136
18.1. Training history of the simple model visualised with TensorBoard . . . 159
18.2. Training history of the simple model visualised with TensorBoard . . . 160
18.3. Training history of the simple model visualised with TensorBoard . . . 161
18.4. Training history of the simple model visualised with TensorBoard . . . 162
18.5. Header lines of the data set Iris . . . . . . . . . . . . . . . . . . . . . . 164
18.6. Output of the categories of the data set Iris . . . . . . . . . . . . . . . 166
18.7. Names of the categories in the data set Iris . . . . . . . . . . . . . . . 166
18.8. Checking the data set Iris . . . . . . . . . . . . . . . . . . . . . . . . . 167
18.9. Variance in the data set Iris . . . . . . . . . . . . . . . . . . . . . . . . 167
18.10. Coding of the categories before (left) and after conversion . . . . . . . 168
18.11. Course of the accuracy for the training and test data during the training
of the model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
18.12. Course of the loss function for the training and test data during the
training of the model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
18.13. Course of the accuracy for the training and test data during the training
of the second model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
18.14. Course of the loss function for the training and test data during the
training of the second model . . . . . . . . . . . . . . . . . . . . . . . . 174
18.15. Categories in CIFAR-100 . . . . . . . . . . . . . . . . . . . . . . . . . . 176
18.16. Visualisation of the first 25 training data from the data set CIFAR-10 . 178
18.17. Visualisation of the first 25 training data from the data set CIFAR-100 . 179
18.18. Visualisation of the first 25 training data in the filtered CIFAR-100
dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
18.19. Visualisation of the transition between the two data sets . . . . . . . . 181
18.20. Visualisation of the generated transition data set . . . . . . . . . . . . . 182
18.21. Visualisation of bottlenecks with nvtop. Source: [Meu20] . . . . . . . . 192
18.22. Visualisation of bottlenecks with TensorBoard. Source: [Meu20] . . . . 193
18.23. Bottleneck when loading data. Source: [Meu20] . . . . . . . . . . . . . . 193
Part I.
Machine Learning
1. Artificial intelligence - When are
machines intelligent?
Answering this question is not easy. First, the term intelligence must be defined, and
its definition is not unambiguous. For example, it is not specified at what point a person
is considered intelligent. This raises the question of what belongs to intelligence. In
psychology, intelligence includes, among other things, the ability to adapt to unknown
situations, but also to solve new problems. [FVP98]
„The aim of AI is to develop machines that behave as if they had intelligence.“ - John
McCarthy, 1955
According to this definition, even simple machines and sensors are intelligent. They act
in accordance with a programme, meaning that all reactions of the system have been
determined and fixed beforehand. If individual characteristics of humans are considered,
then a great many machines can be regarded as intelligent, especially since machines
have taken over activities that were previously performed wholly or partially by humans.
A simple example is computer programs, which are many times more efficient at such
tasks than humans. Hence the following definition [Ric83]:
„AI is the study of how to make computers do things which, at the moment, people
do better.“ - Elaine Rich, 1983
This approach is clever in that it circumvents the definition of intelligence. At the
same time, it points out that it describes a snapshot in time. This takes into account
that the human being is very adaptable; this adaptability distinguishes humans in
particular. Humans are able to grasp new external influences, adapt to them, and
learn independently.
In the decade from 2010 to 2020, many impressive results were achieved in the field of
artificial intelligence. In 2011, the computer Watson defeated the human champions
in the very popular quiz show „Jeopardy“ [Fer12]. Professional players of the board
game Go were beaten by the system AlphaGo for the first time in 2015 and 2016
[Wan+16]. Even bluffing in poker was successfully mastered by an artificial intelligence
called Libratus in 2017 [BS18]. In 2018, a computer program from the Chinese
corporation Alibaba showed better reading comprehension than humans. In 2020, the
software GPT-3 wrote non-fiction texts and poetry that look as if they had been created
by humans. Artificial intelligences have also become active in the art world; new songs
in the style of the Beatles have been composed and paintings created in the style of
famous painters. These successes tempt one to conclude that computer software is
superior to human intelligence. To clarify the difference, the terms weak and strong
artificial intelligence are introduced. Strong artificial intelligence refers to all-encompassing
software that can
Figure 1.1.: Relationship of artificial intelligence, machine learning and deep learning
Supervised learning uses training and test data for the learning process. The training
data includes both input data (e.g. object metrics) and the desired result (e.g.
classification of objects). The algorithm should then use the training data to find
a function that maps the input data to the result. Here, the function is adapted
independently by the algorithm during the learning process. Once a certain
success rate has been achieved for the training data, the learning process is
verified with the help of the test data. An example of this would be a clustering
procedure in which the clusters are already known before the learning process
begins.
Unsupervised learning uses only input data for the learning process, where the out-
come is not yet known. Based on the characteristics of the input data, patterns
are to be recognised in this. One application of unsupervised learning is the
clustering of data where the individual clusters are not yet defined before the
learning process.
Reinforcement learning is based on the reward principle for actions that have taken
place. It starts in an initial state without information about the environment or
about the effects of actions. An action then leads to a new state and provides a
reward (positive or negative). This is done until a final condition occurs. The
learning process can then be repeated to maximise the reward.
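A minimal sketch of the supervised case, assuming scikit-learn and the freely available Iris data set as illustrative choices (any labelled data set would do):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)          # input data and desired results (labels)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)   # separate training and test data

clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)                  # the algorithm adapts its function
print(clf.score(X_test, y_test))           # verification with the test data
```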
Figure 1.2.: Simple neural network with input neurons (green), hidden neurons (blue) and output neurons (red)
input neurons are the neurons that receive signals from the outside world. There is
one neuron for each type of input (feature).
hidden neurons are neurons that represent the actual learning process.
output neurons are the neurons that emit signals to the outside world. There is one
neuron for each type of output (feature).
All neurons of a category are combined into a layer. Thus, in each neural network
there is an input layer (the green neurons in figure 1.2), a hidden layer (the blue
neurons in figure 1.2), and an output layer (the red neurons in figure 1.2). If an
artificial neural network contains more than one hidden layer, it is called a deep
neural network. The connections between the neurons of the individual layers are
called synapses. Each synapse carries a weight, which is multiplied by the signal of
the start neuron; thus the individual signals are weighted. The weights are in turn
adjusted during the learning process based on functions.
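The layer structure described above can be written down directly with TensorFlow/Keras. A minimal sketch; the layer sizes and activation functions are illustrative assumptions:

```python
import tensorflow as tf

# One layer per neuron category: input layer, hidden layer, output layer.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),                       # 4 input neurons (features)
    tf.keras.layers.Dense(8, activation="relu"),      # hidden layer
    tf.keras.layers.Dense(3, activation="softmax"),   # 3 output neurons
])
model.summary()   # lists the layers and the number of trainable weights
```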
1.4. Application
Due to the adaptability of artificial intelligences, they can be used in many different
ways. In its study „Smartening up with Artificial Intelligence (AI)“, McKinsey
described eight application scenarios in which artificial intelligence has particular
potential [Mck].
• Autonomous Vehicles
Figure 1.5.: Optimal image for the identification of a toaster [Sit+18]
(In the underlying experiment from [Bro+17], a photo of a banana is classified correctly by VGG16; after a sticker optimised for the class „toaster“ is placed next to it, the image is classified as a toaster with 99 % confidence.)
the human eye could not recognise as such. A classic example is the toaster in
figure 1.5. [Bro+17]
This attack is significant because the attacker does not need to know what image they are attacking
when constructing the attack. After generating an adversarial patch, the patch could be widely
distributed across the Internet for other attackers to print out and use. Additionally, because the
attack uses a large perturbation, the existing defense techniques which focus on defending against
small perturbations may not be robust to larger perturbations such as these. Indeed recent work has
demonstrated that state-of-the-art adversarially trained models on MNIST are still vulnerable to larger
perturbations than those used in training either by searching for a nearby adversarial example using a
different metric for distance [14], or by applying large perturbations in the background [1].
The traditional strategy for finding a targeted adversarial example is as follows: given some classifier
$P[y \mid x]$, some input $x \in \mathbb{R}^n$, some target class $\hat{y}$ and a maximum perturbation $\varepsilon$, we want to find the
input $\hat{x}$ that maximizes $\log P[\hat{y} \mid \hat{x}]$, subject to the constraint that $\|x - \hat{x}\|_\infty \le \varepsilon$. When $P[y \mid x]$ is
parameterized by a neural network, an attacker with access to the model can perform iterated gradient
descent on $x$ in order to find a suitable input $\hat{x}$. This strategy can produce a well camouflaged attack,
but requires modifying the target image.
Instead, we create our attack by completely replacing a part of the image with our patch. We mask
our patch to allow it to take any shape, and then train over a variety of images, applying a random
translation, scaling, and rotation on the patch in each image, optimizing using gradient descent. In
particular, for a given image $x \in \mathbb{R}^{w \times h \times c}$, patch $p$, patch location $l$, and patch transformations $t$
(e.g. rotations or scaling), we define a patch application operator $A(p, x, l, t)$ which first applies the
transformations $t$ to the patch $p$, and then applies the transformed patch $p$ to the image $x$ at location $l$
(see [Bro+17]).
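A minimal sketch of the "traditional strategy" quoted above, i.e. iterated gradient steps towards a target class inside an L∞ ball. The model choice (VGG16), step size, number of steps and perturbation bound are illustrative assumptions; input preprocessing and pixel-range clipping are omitted:

```python
import tensorflow as tf

model = tf.keras.applications.VGG16(weights="imagenet")     # classifier P[y|x]
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

def targeted_attack(x, target_class, eps=0.05, steps=40, step_size=0.005):
    """Maximise log P[target | x_hat] subject to ||x - x_hat||_inf <= eps."""
    x_hat = tf.Variable(x)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            probs = model(x_hat, training=False)
            # minimising the cross-entropy towards the target class
            # maximises log P[target | x_hat]
            loss = loss_fn([target_class], probs)
        grad = tape.gradient(loss, x_hat)
        x_hat.assign_sub(step_size * tf.sign(grad))               # gradient step
        x_hat.assign(tf.clip_by_value(x_hat, x - eps, x + eps))   # stay in the L_inf ball
    return x_hat
```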
2. Processes for Knowledge Acquisition
in Databases
2.1. Knowledge Discovery in Databases (KDD Process)
Knowledge discovery in databases, or KDD for short, is the decisive step in the
examination of data. There are various possibilities for knowledge discovery. However,
in order to achieve the goal efficiently and systematically, a suitable process is of
paramount importance. In this section, the KDD process is considered in detail. In
this process, knowledge discovery is divided into several steps. [Düs00]
Figure 2.1.: KDD process according to [FPSS96] (own reduced representation based
on [FPSS96])
At the beginning of the KDD process there is always the available database from which
the knowledge is to be gained. From this, the data to be used must be selected from
one or more databases and the data must then be processed and prepared. Only then
can the data be used for knowledge extraction with the help of data mining methods.
After evaluation and possibly several runs through the various steps, the knowledge
gained can finally be extracted and used in the application.
Fayyad further divides this process into nine sub-steps, some of which are not directly
represented in Figure 2.1. These nine steps are described below.
To further improve the data, useful features are sought to represent the data [FPSS96].
Possible methods include dimensionality reduction and transformation. The
implementation of this step will vary greatly depending on the application and domain.
As in the previous steps, the most effective transformation may emerge during the
KDD process [MR10]. The importance of this step is made clearer by the application
example in section 2.6.
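As an illustration of such a transformation, a minimal sketch of a dimensionality reduction with principal component analysis; the synthetic data and the number of components are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
raw_features = rng.normal(size=(500, 20))    # 500 records with 20 raw features

pca = PCA(n_components=5)                    # keep 5 derived features
reduced = pca.fit_transform(raw_features)
print(reduced.shape)                         # (500, 5)
print(pca.explained_variance_ratio_.sum())   # share of the variance retained
```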
After the preparation of the data has been completed, a suitable data mining method
is selected. Which method is suitable depends largely on the objectives defined in the
first step [FPSS96]. A basic distinction is made between descriptive methods, which
aim at interpreting and understanding the data, and predictive methods, which provide
a behavioural model that is able to predict certain variables for previously unseen
data. For this purpose, regressions can be used on the one hand, and classification
methods such as decision trees, support vector machines or neural networks on the
other. Some of these models can also be used for descriptive statements. [MR10]
In this step, the specific data mining algorithm to be used is selected. If, for example,
it was decided in the previous step that a predictive method should be chosen, a choice
must now be made between, for example, neural networks and decision trees, whereby
the former provide more precise descriptions of the correlations, but the latter are
easier to interpret. [MR10]
7. Data Mining
In the seventh step, data mining is finally applied. Often data mining and KDD
are used synonymously, but in fact data mining is a sub-step of the KDD process.
[FPSS96] This step may also have to be carried out several times, possibly even
immediately one after the other, to optimise the hyperparameters. [MR10]
After the data mining has been successfully completed, the results are evaluated and
interpreted. The visualisation of found patterns or models can be helpful for this.
In this step, the influence of the data preparation steps is reflected on and possibly
returned to. It must also be evaluated whether the model found is plausible and usable.
[FPSS96; MR10]
The discovered knowledge can now be used. The product of this can be documentation
or reports that record the knowledge obtained so that it can be used by interested
parties. Furthermore, the extracted knowledge can be integrated into another system,
e.g. in the form of a prediction model. By applying it under real conditions, e.g. with
dynamic instead of static data, deficits can become apparent that still need to be
reworked. [FPSS96; MR10]
2.2.5. Evaluation
The generic model also emphasises the iterative nature of the KDD process. Therefore,
an evaluation of the obtained results is indispensable.
2.3. CRISP-DM
While the first KDD models seem to be strongly oriented towards business applications,
the CRoss-Industry Standard Process for Data Mining (CRISP-DM) is intended to be
applicable to all industries and thus also to technical problems. CRISP-DM was developed
from 1996 onwards by a consortium of the companies DaimlerChrysler, SPSS and NCR.
The aim was to embed data mining in an industry-, tool- and application-independent
process. The methodology is described with a hierarchical process model consisting of
four abstraction levels: phases, generic tasks, specialised tasks and process instance.
While the phases are always run through in the same way and the generic tasks are
the same, which are described in more detail by the specialised tasks, the last level
deals with the individual scope of application. [Cha+00]
The CRISP-DM process model contains the life cycle of a data mining project as shown
in Figure 2.2. The phases shown are largely consistent with the phases described
in the generic model. Ultimately, the KDD process is embedded in CRISP-DM.
As mentioned, individual generic tasks are assigned to each phase, which must be
processed in accordance with the specific task. In the phase „Data preparation“ this
includes data selection, cleansing, construction, integration and formatting. [Cha+00]
Figure 2.2.: Life cycle of a data mining project in CRISP-DM according to [Cha+00]
This model also emphasises the importance of returning to past phases during the
process. In Figure 2.2 only the most important and frequent connections between the
phases are shown. The outer circle represents the cyclical nature of the data mining
process itself. [Cha+00]
Figure 2.3.: Modified model of the KDD process based on the model according to
[FPSS96]
The lines marking the iterative loops are solid and not dashed to emphasise the
absolute necessity of these iterative correction steps. However, the feedback from the
model is shown dashed because in this process model it is assumed that the knowledge
gained is used in the form of a model on an edge computer, here the Jetson Nano.
This complicates direct feedback and corrections.
In Figure 2.4 the distribution of the steps on different levels as well as localities and
systems becomes clear. This illustration refers to the application in a production
environment. It would be conceivable, for example, to create a system for process
control that detects defective workpieces with the help of a camera. The database
would be collected with this camera. The use of the data, including the steps explained
in the previous sections, takes place in an office outside of production. However, the
results found are used again in the control system in production, where they are
exposed to new data. This new data can be added to the database(s) and possibly
a feedback between the results in the field to a review and correction of the models
taking place in the office can also be realised.
Figure 2.4.: Distribution of the KDD steps across work preparation, the office and the production system, with databases, new data, models and patterns, results, and evaluation/verification
2.6.1. Database
As a data basis for an image classification, a sufficient number of images including their
correct classification in terms of the application, the label, must be available. There
are various image databases that are freely accessible, including ImageNet, MS-COCO
and CIFAR-10, for example. Of course, only such databases come into question that
contain objects and features that the model is to recognise later. [Den+09; Den12;
SW16; Aga18a]
File formats and colour depth as well as the number of colour channels should also
already be taken into account when searching for suitable databases, as a (lossless)
transformation to the required file formats is not possible in every case or the database
might contain less information than desired. Which file formats, colour depths and
colour channels to use depends on the desired application, the training platform used
and the algorithm used.
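A minimal sketch of obtaining one of the freely accessible databases mentioned above (CIFAR-10 via the Keras dataset loader) and inspecting size, colour channels and colour depth:

```python
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
print(x_train.shape)          # (50000, 32, 32, 3): three colour channels (RGB)
print(x_train.dtype)          # uint8, i.e. a colour depth of 0..255
print(y_train[:5].ravel())    # integer labels for ten categories
```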
adjustment is done on the basis of the training images. After a given number of
training images, the validation images are used to test the accuracy of the current
model. After several epochs, i.e. several runs of all training images, the model can be
tested.
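A minimal sketch of this training scheme with Keras, assuming the MNIST data set and an illustrative model size, batch size and number of epochs:

```python
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# validation images are evaluated after every epoch, i.e. after every run
# through all training images
model.fit(x_train, y_train, epochs=5, batch_size=32, validation_split=0.1)
model.evaluate(x_test, y_test)   # final test with images unknown to the model
```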
2.6.6. Evaluation/Verification
In the first instance, the interpretation and evaluation of the trained model is done
repeatedly during the training with the help of the validation data set. In the second
instance, the model is tested directly after training with the help of the test images
that are still unknown to the model. If the results are satisfactory, the model can
be integrated into the application. In this instance, too, it is important to test the
performance of the model in order to be able to draw conclusions and make corrections.
In the dynamic application, other framework conditions prevail, so that drawing
conclusions about necessary corrections in the previous steps may require great domain
knowledge.
The interpretation of the found model is not trivial for neural networks, especially for
complex networks like Convolutional Neural Network (CNN). It may not be possible
to extract direct correlations that could be transferred to other applications.
2.6.7. Model
The trained model can finally be integrated into the intended application. While the
data preparation and training take place on a computer that is generally equipped
with high computing power, the model is intended to be used on an edge computer.
For example, real-time object recognition could be performed with a camera connected
to the Edge Computer based on the training with the static images. Transferring the
model to the edge computer may require converting the model to a different format.
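A minimal sketch of such a format conversion, here from a saved Keras model to TensorFlow Lite; the file names are hypothetical, and for the Jetson Nano an export via ONNX/TensorRT would be an alternative route:

```python
import tensorflow as tf

model = tf.keras.models.load_model("trained_model.h5")        # hypothetical file
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:                         # hypothetical target
    f.write(tflite_model)
```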
Hypothesis Test
For the application of the hypothesis test, two hypotheses are always formulated:
a null hypothesis H0, which mostly implies that the presumed data structure does
not exist, and the alternative hypothesis H1 (also written HA), which mostly implies that the
presumed data structure exists. The hypothesis test starts from the assumption that the
null hypothesis is true [ABF11]; the aim is to show that this assumption is untenable.
First, a test statistic such as the T-value („The T-value is the quotient of the arithmetic
mean of the differences between two variables being compared and the estimate of the
standard error for this mean in the population.“ [ABF11]) or the F-value is calculated.
This value is compared with a critical value adjusted to the situation. If the calculated
value is smaller than the critical value, the null hypothesis is retained; in this case the
P-value, i.e. the probability of obtaining a result at least as extreme under the null
hypothesis, is larger than the chosen significance level (e.g. 0.05). If, on the other hand,
the P-value is very small, the null hypothesis is rejected. It is then considered relatively
certain that the null hypothesis is false; however, it is not known which hypothesis is correct.
Conversely, if the null hypothesis is not rejected, it cannot be concluded that it is
correct [ABF11]. In this case, the result cannot be interpreted.
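A minimal sketch of a two-sample T-test with SciPy; the two synthetic samples are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(loc=5.0, scale=1.0, size=30)
group_b = rng.normal(loc=5.5, scale=1.0, size=30)

t_value, p_value = stats.ttest_ind(group_a, group_b)
print(f"T = {t_value:.3f}, p = {p_value:.3f}")
# small p (e.g. below 0.05): reject the null hypothesis of equal means;
# otherwise the null hypothesis is retained, not proven true
```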
all mean values are equal. The test variables of the procedure are used to test whether
the variance between the groups is greater than the variance within the groups. This
makes it possible to determine whether the grouping is meaningful or not, or whether
the groups differ significantly or not. In its simplest form, the analysis of variance is
an alternative to the T-test. The result is the same as with the T-test, „because the
results of a one-way analysis of variance (one-way ANOVA) and a T-test are identical
if the two samples have the same variance.“[Dor13]
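A minimal sketch comparing a one-way ANOVA with the T-test for two groups, as stated in the quote above; the synthetic samples are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
group_a = rng.normal(5.0, 1.0, size=30)
group_b = rng.normal(5.5, 1.0, size=30)

f_value, p_anova = stats.f_oneway(group_a, group_b)   # one-way ANOVA
t_value, p_ttest = stats.ttest_ind(group_a, group_b)  # T-test for comparison
print(p_anova, p_ttest)                     # identical p-values for two groups
print(np.isclose(f_value, t_value ** 2))    # F equals t squared in this case
```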
Here, outlier detection as well as cluster analysis belong to the observation problems;
classification and regression analysis belong to the prediction problems.
methods [JL07] are used to determine the distance between two clusters or a cluster
and a single object. Classification, on the other hand, assigns data to already existing
groups.
• residual analysis,
• overfitting,
• examining the data for outliers and influential data points, and
The validated model can be used to forecast values of y given values of x. To estimate
the uncertainty of the prediction, a confidence interval is often given in addition to
the predicted y-value.
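A minimal sketch of a validated linear regression used for prediction with a confidence interval, assuming statsmodels and synthetic data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = np.linspace(0, 10, 50)
y = 1.5 * x + 2.0 + rng.normal(scale=1.0, size=50)

model = sm.OLS(y, sm.add_constant(x)).fit()            # fit y = b0 + b1 * x

x_new = sm.add_constant(np.array([12.0]), has_constant="add")
pred = model.get_prediction(x_new)
print(pred.predicted_mean)    # forecast of y for x = 12
print(pred.conf_int())        # confidence interval for the predicted value
```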
Outlier Detection
Outliers are measured values or findings that are inconsistent with the rest of the
data, for example, by having unusual attribute values or not meeting expectations.
Expectation is most often the range of dispersion around the expected value in which
most measured values are found. „Robust bounds for detecting outliers for many
distribution types can also be derived based on quartiles and quartile distance.“[SH06]
Values of exploratory studies that lie further than 1.5 times the quartile distance
outside this interval are called outliers. Particularly high outliers are shown separately
in the boxplot.
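A minimal sketch of the quartile-based outlier rule described above; the data values are illustrative assumptions:

```python
import numpy as np

data = np.array([4.8, 5.1, 5.0, 4.9, 5.2, 5.3, 4.7, 9.8])   # 9.8 looks suspicious
q1, q3 = np.percentile(data, [25, 75])                      # quartiles
iqr = q3 - q1                                               # quartile distance
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = data[(data < lower) | (data > upper)]
print(outliers)                                             # -> [9.8]
```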
If, as with the Local Outlier Factor procedure, the search targets objects whose
density deviates significantly from that of their neighbours, this is referred to as
density-based outlier detection. Identified outliers are then usually verified manually
and removed from the data set, as they can worsen the results of other procedures.
Before deciding in favour of removing values, it is therefore still necessary to check in
each case what data loss occurs when deleting or labelling the missing values. If the
number of available data sets falls below the level necessary to proceed, the removal
of the outliers should be avoided.
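A minimal sketch of density-based outlier detection with the Local Outlier Factor, assuming scikit-learn; the neighbour count and data are illustrative assumptions:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(size=(100, 2)),     # dense cluster of normal objects
               [[8.0, 8.0]]])                 # isolated point with deviating density

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)                   # -1 marks detected outliers
print(np.where(labels == -1)[0])              # index of the isolated point
```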
Correlation Analysis
An important task of data analysis is the analysis of the correlation between individual
characteristics. The strength of the connection between two quantitative characteristics
is called correlation in descriptive statistics and inferential statistics and can be
differentiated into linear and non-linear correlation. In multivariate data sets, the
correlation coefficient is additionally calculated for each pair of variables.[Gö] „For
correlation analysis, mainly methods of classical, multivariate, robust and exploratory
statistics are used, but also various non-linear regression methods whose approximation
errors can be used as correlation measures.“[Run15] A prerequisite for performing
correlation analysis is the normal distribution of the data under investigation.
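A minimal sketch of a pairwise correlation analysis for a multivariate data set, assuming pandas and synthetic data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
x = rng.normal(size=200)
df = pd.DataFrame({
    "x": x,
    "linear": 2 * x + rng.normal(scale=0.5, size=200),   # strong linear relation
    "noise": rng.normal(size=200),                        # no relation
})
print(df.corr())   # correlation coefficient for each pair of variables
```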
Figure 3.1.: Biological (left) and abstracted (right) model of neural networks.
Source:[Ert16]
In 1958, Frank Rosenblatt presented the simplest form of a neural network. This
model, presented as a single-layer perceptron, contains a layer of input neurons, all
connected to an output neuron. [Dör18] If several such layers of neurons are connected
in series, we speak of a multilayer perceptron. The output of a neuron then acts as an
input signal for the neurons of the following layer, until a final output takes place at
the output layer.
3.1.1. Weights
In order to be able to map different relationships between the activations of the
neurons, the information flows between the neurons are weighted differently. In
addition, constant input values can provide a neuron-specific bias, the weighting of
which may also need to be determined. [Moe18] In the representation of the network
as a directed graph, the weights are usually specified at the edges. Mathematically,
the representation as a matrix is common. An example of the graph representation
and the equivalent weight matrix can be seen in Figure ??. The weights are often
named by convention with first the index of the neuron receiving the signal as input
and then the index of the sending neuron.
Figure 3.3.: Graph with weighted edges and equivalent weight matrix
At the beginning, the weights are set to random values. Usually values between -1
and 1 are chosen, although the values resulting from the training can also lie outside
this interval. [Moe18] For certain(!) training methods, initialisation with zeros is also
conceivable. [KK07]
3.1.2. Bias
The bias is a neuron-specific offset that can be interpreted as an additional input with a
constant value. The bias can be modelled either explicitly or as the weighting of an
additional constant input value and can strengthen or weaken the result of a neuron.
[Zie15]
Which value a neuron passes on depends on the incoming signals, their weighting,
the activation function and the output function. From the incoming output signals
of the connected neurons (and possibly the feedback of the own output signal of the
considered neuron itself), the activation function f calculates the activity level of the
neuron, taking into account the weighting of the signals. [Kru+15] The output of the
neuron xi is calculated from the wij weighted incoming signals of the n neurons xj
with j = 1...n.
$x_i = f\left( \sum_{j=1}^{n} w_{ij} \, x_j \right)$
In the simplest case, the activation function is linear; the signals are then simply added
with their weights and possibly scaled by a constant factor. More complex, non-linear
relationships cannot be modelled with purely linear activation functions, not even in
multi-layer networks. [KK07] Non-linear functions facilitate generalisation and adaptation to diverse
data and are therefore widely used. [Gup20]
The output value of the neuron is then determined from the activity level of the
neuron determined in this way, with the help of the output function. In most cases,
the identity is used as the output function [Bec18a], so that often only the activation
function is addressed and the output function is neglected.
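A minimal sketch of this computation for a single neuron; weights, inputs and the choice of a sigmoid activation are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.2, 0.7, -0.4])    # output signals x_j of the connected neurons
w = np.array([0.5, -1.2, 0.8])    # weights w_ij of the incoming synapses

activity = np.dot(w, x)           # weighted sum over all incoming signals
output = sigmoid(activity)        # activation function f
print(output)                     # value passed on by neuron i
```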
Function ReLu
The Rectified Linear Unit function is considered the most commonly used activation
function. It is defined as R(z) = max(0, z), i.e. all negative values become zero and
all values above zero remain unchanged. However, this can also lead to problems, as
negative activations are ignored and neurons are deactivated. This is what the Leaky
ReLu function is designed to correct, using a low slope <1 in the negative range (see
figure 3.4). Both functions and their derivatives are monotonic. This is important so
that the activation cannot decrease with increasing input values. [Gup20; Sha18]
Function Sigmoid
The sigmoid function transforms the input values into the range [0, 1], see figure 3.5.
The function for this is $S(z) = \frac{1}{1 + e^{-z}}$. Due to the range of values, the function is
well suited for the representation of probabilities. It is always positive, so that the
output cannot have a weakening influence on the subsequent neurons. The function is
differentiable and monotonic, but its derivative is not monotonic. [Gup20; Sha18]
Function Softmax
The softmax function is often described as a superposition of several sigmoid functions.
It is mainly used in the output layer to reflect the probabilities for the different
categories. The mathematical expression for the softmax function is [Gup20]:
$\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}$
Function tanh
The tangent hyperbolic function is a non-linear function in the value range from −1
to 1. It is similar to the sigmoid function with the major difference of point symmetry
at the origin. The function is defined as:
$\tanh(z) = \frac{2}{1 + e^{-2z}} - 1$
Strongly negative inputs are mapped to strongly negative outputs, and inputs close to
zero are mapped close to the zero point of the diagram. Negative outputs can occur,
which can also negatively affect the weighted sum of a neuron of the following layer.
This function is particularly well suited to the differentiation of two classes. The
function is differentiable and monotonic, but its derivative is not monotonic. [Gup20]
Figure 3.6 shows the hyperbolic tangent function in the definition range from −6 to 6.
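A minimal sketch of the activation functions discussed above, written out with NumPy; the Leaky ReLu slope of 0.01 is an illustrative assumption:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))      # shift for numerical stability
    return e / e.sum()

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z), leaky_relu(z), sigmoid(z), np.tanh(z), softmax(z), sep="\n")
```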
3.2. Training
The great strength of neural networks lies in their ability to learn. In neural networks,
the learning process consists of adapting the weights with the aim of outputting the
desired output value(s) in the output layer in the case of corresponding input signals
in the input layer. For this purpose, the weights are first initialised randomly and
then adjusted according to the learning rules.
3.2.1. Propagation
Propagation refers to the process in which information is sent via the input layer
into the network and passes through it to the output layer. If the information only
moves from the input towards the output layer, it is referred to as a feedforward network or
forward propagation. At the beginning, the input is copied into the input layer and
then forwarded to the first hidden layer. What information this layer passes on to the
next depends on the weights and the activation functions. Finally, the output can be
read from the output layer. [Kri08]
3.2.2. Error
Error is the deviation between the actual network output and the desired network
output after the input data has been propagated through the entire neural network.
[Kri08]
error function is used (gradient descent). The strength of the adjustment is determined
by the learning rate. This process is repeated iteratively until the termination criterion,
e.g. a sufficiently small error, is met.[KK07]
This rule can only be applied directly to the weights between the output layer and
the layer before it. For the layers before it, the error must be propagated backwards
through the network, since the desired outputs for the hidden networks are not given.
[Kru+15]
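A minimal sketch of this iterative weight adjustment for a single linear neuron trained by gradient descent on a squared error; learning rate, data and the number of iterations are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))       # 100 training samples with 3 inputs
true_w = np.array([0.5, -1.0, 2.0])
y = X @ true_w                      # desired outputs

w = rng.uniform(-1, 1, size=3)      # random initial weights
eta = 0.05                          # learning rate
for _ in range(200):
    y_hat = X @ w                   # propagation (linear activation)
    error = y_hat - y               # deviation from the desired output
    grad = X.T @ error / len(X)     # gradient of the mean squared error
    w -= eta * grad                 # weight adjustment against the gradient
print(w)                            # approaches true_w
```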
3.2.7. Dropout
Dropout disables some neurons by multiplying a defined percentage of neurons by
zero during forward propagation. These neurons then no longer have any influence on
subsequent activations. Deactivation occurs randomly in each run. If a neuron that
has specialised on a feature fails, the surrounding neurons learn to deal with this
problem. This increases the generalisation of the network; it learns more abstract
concepts. A disadvantage is the increase in training time, because the parameter
feedback is noisy. At the end of the training, all neurons are reactivated. [Bec18b]
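A minimal sketch of a dropout layer in Keras; the dropout rate of 0.3 and the layer sizes are illustrative assumptions:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dropout(0.3),   # 30 % of these activations are zeroed,
                                    # but only during training
    tf.keras.layers.Dense(10, activation="softmax"),
])
```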
If the data sets are highly similar, transfer learning is a good way to improve a neural
network. Similarities exist, for example, in a trained network for the recognition of
dog breeds, which would also be suitable for the recognition of cat breeds. However,
this network would be unsuitable for the recognition of different car models, as there
would be too many deviations in the basic structures. As far as the size of the data
set is concerned, transfer learning may be particularly suitable for small data sets, as
otherwise overfitting would quickly occur. [Bec18c]
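A minimal sketch of transfer learning with a pre-trained base network whose weights are frozen, plus a new classification head; the choice of MobileNetV2 and the number of target classes are illustrative assumptions:

```python
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False                       # keep the pre-trained features frozen

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(5, activation="softmax"),   # new task with 5 classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```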
Figure 3.8.: Illustration of imprinting in the normalized embedding space. (a) Before
imprinting, the decision boundaries are determined by the trained weights.
(b) With imprinting, the embedding of an example (the yellow point) from
a novel class defines a new region. [QBL18]
Average Embedding
If $n > 1$ examples $x_+^{(i)}$, $i = 1, \dots, n$, are available for a new class, new weights are computed by
averaging the normalized embeddings, $\tilde{w}_+ = \frac{1}{n}\sum_{i=1}^{n} \Phi(x_+^{(i)})$, and re-normalizing the resulting
vector to unit length, $w_+ = \tilde{w}_+ / \|\tilde{w}_+\|$. In practice, the averaging operation can also be applied
to the embeddings computed from the randomly augmented versions of the original low-shot training
examples.
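A minimal sketch of the averaging step described above; the embedding network Φ is replaced by a hypothetical stub, and the embedding dimension and number of examples are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
projection = rng.normal(size=(128, 64))    # stand-in for the trained network Phi

def embed(x):
    """Hypothetical embedding Phi(x), L2-normalised to unit length."""
    v = np.tanh(x @ projection)
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

new_class_examples = rng.normal(size=(5, 128))   # n = 5 examples of a novel class
embeddings = embed(new_class_examples)

w_tilde = embeddings.mean(axis=0)                # average the normalised embeddings
w_plus = w_tilde / np.linalg.norm(w_tilde)       # re-normalise to unit length
# w_plus is imprinted as the classifier weight vector of the novel class
```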
Fine-tuning
Since the model architecture has the same differentiable form as ordinary ConvNet
classifiers, a finetuning step can be applied after new weights are imprinted. The
average embedding strategy assumes that examples from each novel class have a
unimodal distribution in the embedding space. This may not hold for every novel class
since the learned embedding space could be biased towards features that are salient
and discriminative among base classes. However, fine-tuning (using backpropagation
to further optimize the network weights) should move the embedding space towards
having unimodal distribution for the new class. [QBL18]
The convolutional layer does not necessarily have to be the first layer, but can
also receive preprocessed data as input. Usually, multiple Convolutional Layers are
used.[Mic19]
The filters of the first Convolutional Layer detect low-level features such as colours,
edges and curves, while higher layers detect more abstract features. By stacking
multiple Convolutional and Pooling Layers, higher level feature representations can be
progressively extracted. Moreover, the dimensionality is also reduced continuously,
which reduces the computational cost. [Gu+18; Sah18]
3.3.4. Hyperparameter
When building and/or training a CNN, several parameters need to be considered.
The architecture of the network and the order of the layers need to be determined.
There are new approaches to different convolutional methods from which to choose.
In addition, the size and number of filters and the stride must be defined. The
pooling procedure must also be defined, as well as the activation functions in the fully
connected layers. Likewise, it must be defined how many neurons the input layer
should have (this influences the possible image size) and how many layers and neurons
per layer the Fully Connected Layer should be composed of.
Many considerations must be made regarding the database. The complexity of the
network to be trained depends mainly on whether only one input channel (black and
white or greyscale) is used or three (RGB). If only one colour channel is to be used
to reduce complexity, the images must be prepared accordingly. Likewise, the size
(number of pixels, height and width) of the images may need to be adjusted to fit the
architecture used. Normalising the images can also be helpful.
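A minimal sketch of such a preparation with OpenCV: reducing to one colour channel, resizing and normalising; the file name and target size are illustrative assumptions:

```python
import cv2
import numpy as np

img = cv2.imread("sample.jpg")                    # hypothetical file, BGR colour image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)      # one channel instead of three
resized = cv2.resize(gray, (224, 224))            # match the input layer of the network
normalised = resized.astype(np.float32) / 255.0   # value range [0, 1]
print(normalised.shape, normalised.min(), normalised.max())
```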
Since the images are presented to the neural network as data arrays, the file format
does not play a direct role. Indirectly, however, the different image qualities of the
various formats can influence the performance of the networks. For example, networks
trained with high-resolution images may produce poor results when they are asked
to classify compressed images. [DK16] Differences can also arise from the different
compressions, so different file formats should not be mixed.
3.3.5. Fine-tuning
Fine-tuning allows individual layers of the convolutional part to be trained specifically.
With a small number of training images, the network can thus be transferred to a new
task. This is much faster than designing and training a neural network from scratch. [Cho18]
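A minimal sketch of fine-tuning in which only the last layers of a pre-trained base remain trainable; the cut-off index and learning rate are illustrative assumptions:

```python
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")

base.trainable = True
for layer in base.layers[:-20]:      # freeze all but the last convolutional layers
    layer.trainable = False

# a small learning rate avoids destroying the pre-trained weights
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-5)
```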
• The fully connected layer contains no information about where the object is
located.
Should the position of an object in the image matter, the fully connected layers are
largely useless, as the later layers only generate abstract concepts. [Cho18]
3.3.7. AlexNet
The most popular CNNs for object detection and classification from image data are
AlexNet, GoogleNet and ResNet50. [SJM18]
Gu et al. [Gu+18] refer to the development of AlexNet in 2012 as a breakthrough in
large-scale image classification. Nevertheless, its architecture is just complex enough
that its operation remains comprehensible. It therefore seems a good choice for first
attempts at training a network of one's own.
AlexNet consists of five Convolutional Layers and three Fully Connected Layers. The
number and size of the filters as well as their stride are shown in table 3.1 for the
sake of clarity. The first and second convolutional layers are each followed by a
response normalisation layer, also known as batch normalisation. This additional
layer is intended to mitigate the effect of unstable gradients through normalisation.
After the response normalisation layers as well as after the fifth convolutional layer, a
max-pooling layer follows. Overlapping pooling was chosen as the pooling method.
This means that the individual sections to which pooling is applied overlap. The
pooling area measures 3x3 and the stride 2 pixels. ReLu is applied to the output
of each Convolutional Layer and each of the Fully Connected Layers. Each Fully
Connected Layer consists of 4096 neurons. The output layer uses the softmax function
to map the probabilities for the different categories via the output neurons. [KSH12a;
Ala08]
3.3.8. YOLO
YOLO (You Only Look Once) is a state-of-the-art real-time object detection system
that performs classification and localisation (with bounding box) much faster than
conventional systems.[Red21]
Detection (in the sense of classification AND localisation) of objects is more complex
than mere classification. In conventional systems, the image is divided into many
regions, each of which is classified to determine the regions that contain the detected
object. [Cha17]
In YOLO, on the other hand, the entire image is passed once to a single neural network,
which itself decomposes the image into individual regions, determines bounding boxes
and classification probabilities for each of these regions, and weights the bounding
boxes with the probabilities.
Since the entire image is processed by the network and not just individual parts of it,
contextual information can be taken into account. For example, errors due to interpreting
parts of the background as objects occur less frequently than with other systems. In
addition, what the system has learned can be transferred better to other domains, so that
objects learned from photographs, for example, are also recognised better in artistic
representations than with other systems. [Red+15; Red21]
The YOLO code as well as the code used for the training are open source.
Figure 3.10.: Original image, scaling, rotation and cropping of the image
Figure 3.11.: Structure of a black and white image with 10 × 10 pixels, represented as a matrix of the values 0 and 1
Often greyscale images are used instead of colour images, typically with 256 grey levels.
When converting a colour image with three colour channels, the three matrices are
added in a weighted manner. It has been shown that an equal weighting of the colour
channels is inappropriate. The OpenCV library for processing images uses the following
weighting [Tea20a]:
$Y = 0.299 \cdot R + 0.587 \cdot G + 0.114 \cdot B$
Figure 3.14.: Black and white image, image with 4 grey levels and with 256 grey levels.
However, the quality of the image does not only depend on the parameters mentioned;
the sharpness of the image also contributes to it. Unfortunately, there is no simple
definition of sharpness, as it is often shaped by subjective perception. It is, however,
possible to influence image sharpness with mathematical operations. A softening
(blurring) of an image can be achieved by averaging: each pixel value is replaced by
the average of the pixel values in a small window centred on the pixel.
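A minimal sketch of this averaging with OpenCV; the file name and the 5 × 5 window size are illustrative assumptions:

```python
import cv2

img = cv2.imread("sample.jpg")           # hypothetical input image
blurred = cv2.blur(img, (5, 5))          # each pixel becomes the mean of a 5 x 5 window
cv2.imwrite("sample_blurred.jpg", blurred)
```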
The resolution of the image affects the amount of computation required. For an image
with a resolution of 674 × 1200 and a window size of 101 × 101, the new pixel value
must be calculated separately for all three colour channels. In total this results in
674 · 1200 · 3 ≈ 2.4 million pixel values, each of which is an average over 101 · 101 = 10 201
values, i.e. roughly 2.5 · 10^10 operations.
Two basic tasks are distinguished:
1. Object detection
2. Image classification.
When detecting objects, an image is examined to see if one or more objects can be
identified. If an object is detected, the coordinates of the surrounding rectangle and
the class or classes of the object are returned with a respective probability. With
the segmentation, instead of the coordinates of the surrounding rectangle, a polygon
course is returned that surrounds the object. In contrast, when classifying an image,
a description of the image is searched for. In both cases, CNNs are commonly used;
they are described in the following section.
A CNN expects an image as input information. If images do not contain colour
information, they are represented as matrices whose number of columns and rows
correspond to the number of pixels in width and height. In the case that an image
contains colour information, such a matrix is created for each colour. In this case,
one speaks of a colour channel or simply of a channel.

3.5.1. CNN Architectures

CNN is a commonly used, shift-invariant method for extracting adaptive features. Its
success has contributed to the popularity of machine learning. In its history, its
architectures have undergone a great evolution. Some well-known architectures are
now described.

3.5.2. LeNet

Figure 3.17.: Structure of the CNN architecture according to LeCun et al. [LeC+98]

Layer                   Feature Map   Size      Kernel Size   Stride   Activation function
Input Image             1             32 × 32   -             -        -
1 Convolution           6             28 × 28   5 × 5         1        tanh
2 Average Pooling       6             14 × 14   2 × 2         2        tanh
3 Convolution           16            10 × 10   5 × 5         1        tanh
4 Average Pooling       16            5 × 5     2 × 2         2        tanh
5 Convolution           120           1 × 1     5 × 5         1        tanh
6 Neural Network        -             84        -             -        tanh
Output Neural Network   -             10        -             -        softmax

Table 3.3.: Structural elements of LeNet
3.5.3. AlexNet
The AlexNet architecture was the first to use CNN models implemented on Graphics Processing Units (GPUs). In 2012, this connected the growing computing power with Deep Learning. AlexNet is significantly deeper and more complex than LeNet. Its authors also started using ReLU activations instead of sigmoid or tanh, which helped train
better models. AlexNet not only won the 2012 ImageNet classification competition,
but beat the runner-up by a margin that suddenly made non-neural models almost
obsolete. [KSH12a]
3.5.4. InceptionNet
The InceptionNet architecture exploited the increased possibilities of the hardware. The architecture was again much deeper and richer in parameters than the existing models. To deal with the problem of training deeper models, it used multiple auxiliary classifiers at intermediate points of the network to prevent the gradient from vanishing.
One of their ideas was to use kernels of different sizes in parallel, thus increasing the
width of the model instead of the depth. This allows such an architecture to extract
both larger and smaller features simultaneously. [Sze+14]
3.5.5. VGG
Many architectures tried to achieve better results by using larger convolution kernels. For example, the AlexNet architecture uses kernels of size 11 × 11, among others. The VGG architecture took the path of using several kernels of size 3 × 3 in succession and
more non-linearities in the form of activation functions. This again improved the
results. [Zha+15; SZ15]
3.5.8. EfficientNet
The architecture is the result of the observation that the various architectures to date focus on either performance or computational efficiency. The authors of [TL20] claimed that
these two problems can be solved by similar architectures. They proposed a common
CNN skeleton architecture and three parameters, namely the width, the depth and
the resolution. The width of the model refers to the number of channels present in
the different layers, the depth refers to the number of layers in the model and the
resolution refers to the size of the input image to the model. They claimed that by
keeping all these parameters small, one can create a competitive yet computationally efficient CNN model. On the other hand, by increasing the value of these parameters, one can create a model that is designed for accuracy.

Model Size Accuracy Parameters Depth
VGG16 528 MB 71.3% 138,357,544 23
VGG19 549 MB 71.3% 143,667,240 26
ResNet-50 98 MB 74.9% 25,636,712 50
ResNet-101 171 MB 76.4% 44,707,176 101
ResNet-152 232 MB 76.6% 60,419,944 152
InceptionV3 92 MB 77.9% 23,851,784 159
InceptionResNetV2 215 MB 80.3% 55,873,736 572
MobileNet 16 MB 70.4% 4,253,864 88
MobileNetV2 14 MB 71.3% 3,538,984 88
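As an illustration of these scaling trade-offs, the differently scaled EfficientNet variants can be compared directly; the following is a minimal sketch using tf.keras.applications (assuming a TensorFlow version that ships these models, roughly 2.3 or later).

import tensorflow as tf

# Instantiate two differently scaled variants without downloading weights
small = tf.keras.applications.EfficientNetB0(weights=None)
large = tf.keras.applications.EfficientNetB7(weights=None)

print('B0 parameters:', small.count_params())
print('B7 parameters:', large.count_params())
print('B0 input shape:', small.input_shape, '- B7 input shape:', large.input_shape)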
• Activation function
• Pooling
• Flattening
• Neural network
When defining the filters, make sure that the filter is defined for each channel. To
capture the features in an image, the values in the filter must match the structure of
the original pixel values of the image. Basically, if there is a shape in the input image
that generally resembles the curve that this filter represents, then the convolution will
result in a large value.
As a rule, several filters are used in parallel. The result is so-called feature maps.
Further convolution layers are then applied to this result, which in turn generate
feature maps.
Feature Identification
A convolution is defined by filters. Each filter can be interpreted as an identifier of
features. Features are, for example, edges, colours or even curves. As an example, consider the following 7 × 7 filter:
0 0 0 0 0 50 0
0 0 0 0 50 0 0
0 0 0 50 0 0 0
0 0 0 50 0 0 0
0 0 0 50 0 0 0
0 0 0 50 0 0 0
0 0 0 0 0 0 0
This filter can also be visualised as an image. To demonstrate its effect, the sample image 3.20 is considered. In the example image 3.20, a section is marked to which the filter is applied:
0 0 0 0 0 0 30        0 0 0 0 0 50 0
0 0 0 0 80 80 80      0 0 0 0 50 0 0
0 0 0 30 80 0 0       0 0 0 50 0 0 0
0 0 0 80 80 0 0   ⋆   0 0 0 50 0 0 0
0 0 0 80 80 0 0       0 0 0 50 0 0 0
0 0 0 80 80 0 0       0 0 0 50 0 0 0
0 0 0 80 80 0 0       0 0 0 0 0 0 0

Multiplying the marked image section element-wise with the filter and summing the products gives 80 · 50 + 30 · 50 + 80 · 50 + 80 · 50 + 80 · 50 = 17 500, i.e. a large value.
This illustrates the basic property of filters: if there is a shape in the input image that generally resembles the filter, large values result. A probability is then assigned via
the activation function. Various filters are then defined and applied in the convolution
layer. The result is then the feature map. The figure 3.21 shows some examples of
concrete visualisations of filters of the first convolutional layer of a trained network.
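The effect can also be reproduced numerically; the following NumPy sketch rebuilds the image section and the curve filter from the example above and sums the element-wise products.

import numpy as np

# Image section (a curve of bright pixels) and the 7x7 curve filter from above
patch = np.array([
    [0, 0, 0,  0,  0,  0, 30],
    [0, 0, 0,  0, 80, 80, 80],
    [0, 0, 0, 30, 80,  0,  0],
    [0, 0, 0, 80, 80,  0,  0],
    [0, 0, 0, 80, 80,  0,  0],
    [0, 0, 0, 80, 80,  0,  0],
    [0, 0, 0, 80, 80,  0,  0],
])
kernel = np.zeros((7, 7))
kernel[0, 5] = kernel[1, 4] = 50
kernel[2:6, 3] = 50

# Element-wise multiplication and summation = the filter response at this position
print(np.sum(patch * kernel))  # large value, because the shape matches the filter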
Edge Detection
Suitable filters can also be defined for edge detection. The filter
1 0 −1
1 0 −1
1 0 −1
can be used for vertical edge detection, whereas the filter

1 1 1
0 0 0
−1 −1 −1

Figure 3.21.: Visualisation of filters of a trained network [LXW19; Sha+16]
can be used for a horizontal one. Both filters are now applied to a vertical edge:

10 10 10 0 0 0       1 0 −1       0 30 30 0
10 10 10 0 0 0       1 0 −1       0 30 30 0
10 10 10 0 0 0   ∗   1 0 −1   =   0 30 30 0
10 10 10 0 0 0                    0 30 30 0
10 10 10 0 0 0
10 10 10 0 0 0

10 10 10 0 0 0        1  1  1      0 0 0 0
10 10 10 0 0 0        0  0  0      0 0 0 0
10 10 10 0 0 0   ∗   −1 −1 −1  =   0 0 0 0
10 10 10 0 0 0                     0 0 0 0
10 10 10 0 0 0
10 10 10 0 0 0

Figure 3.22.: Detection of vertical and horizontal edges

The vertical filter responds strongly to the edge, while the horizontal filter yields only zeros. Edges can be present in any orientation in an image, and edge detection must take this into account. That is, when a CNN is trained, the filters evolve based on the given images.
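The same behaviour can be checked with a few lines of Python; this sketch applies both filters to the 6 × 6 example image with a vertical edge using SciPy's cross-correlation.

import numpy as np
from scipy.signal import correlate2d

image = np.array([[10, 10, 10, 0, 0, 0]] * 6)          # bright left half, dark right half
vertical = np.array([[1, 0, -1]] * 3)
horizontal = np.array([[1, 1, 1], [0, 0, 0], [-1, -1, -1]])

print(correlate2d(image, vertical, mode='valid'))       # rows of 0 30 30 0
print(correlate2d(image, horizontal, mode='valid'))     # all zeros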
3.5.12. Padding
In the previous representation, filters could not be applied to the boundary values. On the one hand, this means that information is not used; on the other hand, the size of the images is reduced each time the convolution is applied. To prevent this, padding is applied. For a filter of dimension (2n + 1) × (2m + 1), the input image is padded with an additional border of size n and m. The additional values are each set to zero.
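A minimal sketch of zero padding with NumPy (here for a 3 × 3 filter, i.e. n = m = 1):

import numpy as np

x = np.arange(16).reshape(4, 4)
# One additional row/column of zeros on every side, so a subsequent
# 3x3 convolution keeps the original 4x4 size.
x_padded = np.pad(x, pad_width=1, mode='constant', constant_values=0)
print(x_padded.shape)  # (6, 6)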
For input values less than zero, the ReLU function returns zero; for positive values, it behaves linearly [Aga18b]. Thus, only positive values are returned (Abb. 3.23).
For example, an activation function can be applied after a convolution. This is shown
in the equation ??.
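As an illustration, the following sketch applies ReLU element-wise to a small feature map, as it could result from a convolution (the values are made up for the example).

import numpy as np

def relu(x):
    # negative values become zero, positive values are passed through unchanged
    return np.maximum(0, x)

feature_map = np.array([[-3.0, 5.0],
                        [ 0.5, -1.2]])
print(relu(feature_map))  # [[0.  5. ] [0.5 0. ]]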
3.5.14. Pooling
After the use of convolutional layers, pooling layers are often used in a CNN to reduce the size of the representation, to speed up the calculations and to make some of the detected features a bit more robust. A distinction is usually made between
two pooling methods.
• Max Pooling
• Average Pooling
In pooling, areas of a matrix are combined. The ranges are defined by two values n
and m. The ranges are submatrices of the matrix of size n × m. A function is then
applied to each submatrix.
In max-pooling, the maximum value of the submatrix is determined and entered as
the result. In the case of average pooling, the average value of the matrix elements is
determined instead of the maximum value and used further.
In pooling, a window of defined size is moved over the input matrix and the corresponding value within the window is taken over as the pooled value. The step size with which the window moves over the matrix is defined via a parameter [DV16]; its default value corresponds to the size of the sub-matrix. An example is shown in figure 3.25: a 4 × 4 matrix is processed with a pool_size of 2 × 2 and a stride of 2 × 2. Since two values are combined in each dimension, the dimensions of the output matrix are halved compared to the input matrix. [Bri15]
2 0 1 1
0 1 0 0    pool_size = (2,2)    2 1
0 0 1 0    stride = (2,2)       3 1
0 3 0 0

2 0 1 1
0 1 0 0    pool_size = (2,2)    0.75 0.50
0 0 1 0    stride = (2,2)       0.75 0.25
0 3 0 0
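The two pooling variants can be reproduced with Keras layers; the following sketch applies max and average pooling to the 4 × 4 example matrix (reshaped to the batch/height/width/channel layout that Keras expects).

import numpy as np
import tensorflow as tf

x = np.array([[2, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 3, 0, 0]], dtype=np.float32).reshape(1, 4, 4, 1)

max_pool = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2))
avg_pool = tf.keras.layers.AveragePooling2D(pool_size=(2, 2), strides=(2, 2))

print(max_pool(x)[0, :, :, 0].numpy())  # [[2. 1.] [3. 1.]]
print(avg_pool(x)[0, :, :, 0].numpy())  # [[0.75 0.5 ] [0.75 0.25]]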
3.5.15. Flattening
With a flatten layer, the multidimensional input object is converted into a vector.
For example, if an object with dimensions 5 × 7 × 7 is passed in, it results in a
one-dimensional vector with 245 elements. In the figure 3.26, the function Flatten is
applied to a 3 × 3 matrix. The function is used when a neural network is subsequently
used.
1 1 0
4 2 1   →   1 1 0 4 2 1 0 2 1
0 2 1
1 MODEL = models.Sequential()
2 MODEL.add(layers.Conv2D(filters=32, kernel_size=(3, 3), activation='relu', input_shape=(32, 32, 3)))
3 MODEL.add(layers.MaxPooling2D(pool_size=(2, 2)))
4 MODEL.add(layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
5 MODEL.add(layers.MaxPooling2D(pool_size=(2, 2)))
6 MODEL.add(layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
7 MODEL.add(layers.Flatten())
8 MODEL.add(layers.Dense(units=64, activation='relu'))
9 MODEL.add(layers.Dense(units=10))
This configuration transforms the input data in the first convolutional layer from 32 × 32 × 3
to an output of 30 × 30 × 32. A separate feature map is stored for each of the applied
filters.
The next layer, a max-pooling layer with a pool_size of 2 × 2, reduces the data size
from 30 × 30 × 32 to 15 × 15 × 32 (line 3).
This data is now fed again into a convolutional layer, which here is also configured with a kernel_size of 3 × 3 (line 4), but with 64 filters. This transforms the data from 15 × 15 × 32 to 13 × 13 × 64.
To reduce the amount of data again, another max-pooling layer is applied, which is
also configured with a pool_size of 2 × 2 in line 5, reducing the amount of data
from 13 × 13 × 64 to 6 × 6 × 64.
The last convolutional layer is then applied to this network. Like the previous convolutional layer, it works with a kernel_size of 3 × 3 and 64 filters as per line 6. This transforms the data from 6 × 6 × 64 to 4 × 4 × 64.
To enable classification of this information with a neural network, the so-called dense
layer, the results of the convolutional layers are transformed into a vector with a
flatten layer, see line 7. Due to the input size of 4 × 4 × 64, this results in a vector of length 1024.
The vector is fed into a Dense layer in line 8. This layer is parameterised with 64 units and the activation function ReLU, which combines the 1024 input values into 64 nodes and calculates the corresponding outputs.
In the last layer, the 64 output values are passed to another Dense layer, which has
been configured with 10 units. The input data are thus now reduced to 10 output
data, which correspond to the classes to be identified of the data set CIFAR-10.
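A possible way to compile and train this model on CIFAR-10 is sketched below; the optimiser, the number of epochs and the array names (TRAIN_IMAGES, TRAIN_LABELS, TEST_IMAGES, TEST_LABELS, loaded as described in the following chapter) are assumptions, not part of the original listing.

import tensorflow as tf

# assumes MODEL from the listing above and the normalised CIFAR-10 arrays
MODEL.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
MODEL.fit(TRAIN_IMAGES, TRAIN_LABELS, epochs=10,
          validation_data=(TEST_IMAGES, TEST_LABELS))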
4. Databases and Models for Machine
Learning
Data is the basis for machine learning. The success of the models depends on their
quality. This is because machine learning relies on accurate and reliable information in
the training of its algorithms. This self-evident fact is well known, but unfortunately not sufficiently taken into account: poor data leads to insufficient or incorrect results. The training data is realistic if
it reflects the data that the AI system records in real use. Unrealistic data sets
hinder machine learning and lead to expensive misinterpretations. If one wants to
develop software for drone cameras, realistic images must also be used. In such a
case, if one uses corresponding images from the web, they usually have the following
characteristics:
If one wants to use data sets for one’s own needs, care must be taken that only data
that are also realistic are used. The data sets must also not contain any outliers or
redundancies. When checking the quality of the data, the following questions can be
helpful:
• Where does the data come from? Is it suitable for the intended application?
Kaggle.com: More than 20,000 datasets are offered here. Only a free user account is required.
lionbridge.ai: The website offers a good overview of data sets from the public and
commercial sectors.
govdata.de: The Data Portal for Germany offers freely available data from all areas
of public administration in Germany.
# Build your input pipeline (assuming ds was loaded via tensorflow_datasets,
# e.g. ds = tfds.load("mnist", split="train"))
ds = ds.shuffle(1024).batch(32).prefetch(tf.data.experimental.AUTOTUNE)
for example in ds.take(1):
    image, label = example["image"], example["label"]
scikit datasets: Datasets are also installed with the Python library scikit-learn. There are only a few datasets, but they are already prepared so that they are easy to load and use.
UCI - Center for Machine Learning and Intelligent Systems: The University of California, Irvine offers around 600 datasets for study purposes.
Open Science Data Cloud: The platform aims to create a way for everyone to access
high-quality data. Researchers can house and share their own scientific data,
access complementary public datasets, create and share customised virtual
machines with the tools they need to analyse their data.
Amazon: Amazon also provides data sets. You have to register for this free of charge.
Google Dataset Search: The website does not directly offer datasets, but rather
search support. Google restricts its search engine here to data sets.
faces: Labeled Faces in the Wild is a public benchmark for face verification, also known as pair matching. No matter how well an algorithm performs on it, this should not be used to conclude that the algorithm is suitable for any commercial purpose.
4.1. Databases
4.1.1. Data Set MNIST
The dataset MNIST is available for free use and contains 70,000 images of handwritten
digits with the corresponding correct classification. [Den+09; Den12] The name of the
dataset is justified by its origin, as it is a modified dataset from two datasets from
the US National Institute of Standards and Technology. These contain handwritten
digits from 250 different people, consisting of US Census Bureau employees and high
school students. The NIST Special Database 3 (SD-3) and NIST Special Database 1
(SD-1) datasets collected in this way were merged because the former dataset of clerical
workers contains cleaner and more easily recognisable data. Initially, SD-3 was used
Chapter 4. Databases and Models for Machine Learning 65
Figure 4.1.: Examples from the training set of the dataset MNIST [SSS19]
as the training set and SD-1 as the test set, but because of the differences, it makes
more sense to merge the two. The dataset is already split into 60,000 training images
and 10,000 test images. [LCB13; Nie15]
The data is provided in four files, two for the training dataset and two for the test dataset; in each pair, one file contains the image data and the other the associated labels.
The images in the dataset MNIST are greyscale images of size 28 × 28 pixels.[LCB13]
The data is in IDX file format and so cannot be opened and visualised by default.
However, you can write a programme to convert the data into CSV format or load a
variant in CSV format directly from other websites. [Red20]
The library TensorFlow provides the dataset MNIST under tensorflow_datasets. This is not the only dataset provided by TensorFlow; a list of all datasets that can be loaded this way can be found in the catalogue of datasets.
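A minimal sketch of loading MNIST this way (assuming the tensorflow_datasets package is installed):

import tensorflow_datasets as tfds

ds, info = tfds.load('mnist', split='train', with_info=True)
print(info.features)                        # image (28, 28, 1), label with 10 classes
print(info.splits['train'].num_examples)    # 60000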
To import the dataset into the Python environment, the module keras.datasets is used here. This contains a direct interface to the dataset CIFAR-10, which is downloaded to the local system by calling the function load_data(). In the background, the dataset is downloaded as a packed file with the extension .tar.gz and stored in the folder C:\Users\<username>\.keras\datasets. After unpacking, the dataset is contained in several serialised objects, which are loaded into the Python environment via pickle. The training data, training labels, test data and test labels can then be accessed via corresponding lists (Section 4.1.2). In order to reduce the vanishing gradient problem during the later training, the RGB colour values stored in the dataset with the value range from 0 to 255 are converted into values from 0 to 1. The listing ?? shows the loading and normalisation of the data.
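Since the listing is not reproduced here, the following is a minimal sketch of what such a loading and normalisation step could look like with keras.datasets (the variable names are chosen analogously to the CIFAR-100 listing below).

import tensorflow as tf

(TRAIN_IMAGES, TRAIN_LABELS), (TEST_IMAGES, TEST_LABELS) = tf.keras.datasets.cifar10.load_data()
TRAIN_IMAGES = TRAIN_IMAGES / 255.0   # scale RGB values from [0, 255] to [0, 1]
TEST_IMAGES = TEST_IMAGES / 255.0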
The data set CIFAR-100, on the other hand, also contains 60,000 images, but divided
into 100 categories with 600 images each. In addition, there are 20 upper categories,
to each of which five of the 100 categories are assigned; see table 18.15.
The dataset can also be downloaded and imported directly into TensorFlow via Keras
[Raj20]:
import tensorflow as tf
from tensorflow.keras import datasets

(TRAIN_IMAGES100, TRAIN_LABELS100), (TEST_IMAGES100, TEST_LABELS100) = tf.keras.datasets.cifar100.load_data()
TRAIN_IMAGES100 = TRAIN_IMAGES100 / 255.0
TEST_IMAGES100 = TEST_IMAGES100 / 255.0
Figure 4.3.: Supercategories and categories in CIFAR-100 with the original designations
The data set is available on various websites, but attention must be paid to the structure
of the data set. Since it is a file in CSV format, the structure is column-oriented. As a
rule, the title of the individual columns is given in the first line: sepal_length, sepal_width, petal_length, petal_width and species. The values are in centimetres,
the species is given as setosa for Iris setosa, versicolor for Iris versicolor and
virginica for the species Iris virginica. The columns in this record are:
• Sequence number
• Length of sepal in centimetres
• Sepal width in centimetres
• Petal length in centimetres
• Petal width in centimetres
• Species
The aim is to classify the three different iris species based on the length and width
of the sepal and petal. Since the dataset is delivered with the library sklearn, this
approach is chosen.
The record is a dictionary. Its keys can be easily displayed:
1 >>> iris.keys()
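A minimal sketch of loading the dataset with scikit-learn and inspecting the keys (the exact set of keys depends on the scikit-learn version):

from sklearn.datasets import load_iris

iris = load_iris()
print(iris.keys())   # typically includes 'data', 'target', 'target_names', 'feature_names', 'DESCR', ...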
The command X.head() shows – as in Figure 18.5 – the head of the data frame. It is obvious that each record consists of four values. Each record also already contains its classification in the key target. In figure 18.6 this is listed for the first records.
.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica

    :Summary Statistics:

    ============== ==== ==== ======= ===== ====================
                    Min  Max   Mean    SD   Class Correlation
    ============== ==== ==== ======= ===== ====================
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)
    ============== ==== ==== ======= ===== ====================

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :Date: July, 1988

The famous Iris database, first used by Sir R.A. Fisher. The dataset is taken
from Fisher's paper. Note that it's the same as in R, but not as in the UCI
Machine Learning Repository, which has two wrong data points.

This is perhaps the best known database to be found in the
pattern recognition literature. Fisher's paper is a classic in the field and
is referenced frequently to this day. (See Duda & Hart, for example.) The
data set contains 3 classes of 50 instances each, where each class refers to a
type of iris plant. One class is linearly separable from the other 2; the
latter are NOT linearly separable from each other.

.. topic:: References

   - Fisher, R.A. "The use of multiple measurements in taxonomic problems"
     Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to
     Mathematical Statistics" (John Wiley, NY, 1950).
   - Duda, R.O., & Hart, P.E. (1973) Pattern Classification and Scene Analysis.
     (Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218.
   - Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
     Structure and Classification Rule for Recognition in Partially Exposed
     Environments". IEEE Transactions on Pattern Analysis and Machine
     Intelligence, Vol. PAMI-2, No. 1, 67-71.
   - Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule". IEEE Transactions
     on Information Theory, May 1972, 431-433.
   - See also: 1988 MLC Proceedings, 54-64. Cheeseman et al's AUTOCLASS II
     conceptual clustering system finds 3 classes in the data.
   - Many, many more ...
can be used to determine how many classes are present. The output of the command
is shown in the figure 18.7; it results in 3 classes with the numbers 0, 1 and 2. There
are 50 records assigned to each of them.
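One way such a check could look in code (a sketch using NumPy on the scikit-learn iris object):

import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
classes, counts = np.unique(iris.target, return_counts=True)
print(classes)   # [0 1 2]
print(counts)    # [50 50 50]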
• Object recognition
– The dataset COCO contains ≈ 330,000 images.
– The dataset COCO contains ≈ 1,500,000 annotations for objects.
– The data set COCO contains 80 categories.
– Each image has five captions.
– The images have an average resolution of 640 × 480 pixels.
• Semantic segmentation
{
    "info": {...},
    "licenses": {...},
    "images": {...},
    "categories": {...},
    "annotations": {...}
}
{
    "description": "COCO 2017 Dataset",
    "url": "http://cocodataset.org",
    "version": "1.0",
    "year": 2017,
    "contributor": "COCO Consortium",
    "date_created": "2017/09/01"
}
Due to the size and frequent use of the dataset, there are many tools, for example
COCO-annotator and COCOapi, to access the data.
Actually, the dataset COCO consists of several, each made for a specific machine
learning task. The first task is to determine surrounding rectangles for objects. That is,
objects are identified and the coordinates of the surrounding rectangle are determined.
The extended task is object segmentation. Here, objects are also identified, but in
addition, instead of the surrounding rectangles, polygons are drawn to delimit the
objects. The third classical task is stuff segmentation. Here the model should also perform segmentation, but not of individual objects; instead, it segments continuous background patterns such as grass or sky.
The annotations are stored in JSON format. A JSON document is a dictionary with key-value pairs in curly brackets. It can also contain lists (ordered collections of elements in square brackets) or further dictionaries nested within it.
Section “Info”
The dictionary for the info section contains metadata about the dataset. For the official COCO dataset it is the following information:
As can be seen, only basic information is included, with the url value pointing to the official website of the dataset. It is common for machine learning datasets to point to their websites for additional information; for example, there you can find information on how and when the data was collected.
In the section “licenses” you will find links to licenses for images in the dataset with
the following structure:
[
    {
        "url": "http://creativecommons.org/licenses/by-nc-sa/2.0/",
        "id": 1,
        "name": "Attribution-NonCommercial-ShareAlike License"
    },
    {
        "url": "http://creativecommons.org/licenses/by-nc/2.0/",
        "id": 2,
        "name": "Attribution-NonCommercial License"
    },
    ...
]
{
    "license": 3,
    "file_name": "000000391895.jpg",
    "coco_url": "http://images.cocodataset.org/train2017/000000391895.jpg",
    "height": 360,
    "width": 640,
    "date_captured": "2013-11-14 11:18:45",
    "flickr_url": "http://farm9.staticflickr.com/8186/8119368305_4e622c8349_z.jpg",
    "id": 391895
}
This dictionary “images” is probably the second most important and contains metadata
about the images.
The images dictionary contains the field license. In this field the licence of the image is given. The full text can be found at the URL. When using the images, it must be ensured
that no licence infringement occurs. If in doubt, do not use them. This also means,
however, that when creating one’s own data set, one assigns an appropriate licence to
each image.
The most important field is the id field. This is the number used in the annotations
section to identify the image. So, for example, if you want to identify the annotations
for the given image file, you have to check the value of the id field for the corresponding
image document in the images section and then cross-reference it in the annotations
section.
In the official COCO dataset, the value of the id field is the same as the file_name after removing the leading zeros and the extension. If one uses a custom COCO dataset, this may not necessarily be the case.
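The cross-referencing described above can be done with a few lines of Python; this is a sketch that assumes the standard annotation file name of the official dataset.

import json
from collections import defaultdict

with open('annotations/instances_train2017.json') as f:
    coco = json.load(f)

# index: image id -> list of annotations for that image
annotations_by_image = defaultdict(list)
for ann in coco['annotations']:
    annotations_by_image[ann['image_id']].append(ann)

# annotations for the example image 000000391895.jpg (id 391895)
for ann in annotations_by_image[391895]:
    print(ann['category_id'], ann['bbox'])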
Section “Categories”
The categories section is a little different from the other sections. It is designed
for the task of object recognition and segmentation and for the task of stuff segmentation.
For object recognition and object segmentation, the information is obtained according
to the listing 4.10
[
    {"supercategory": "person", "id": 1, "name": "person"},
    {"supercategory": "vehicle", "id": 2, "name": "bicycle"},
    {"supercategory": "vehicle", "id": 3, "name": "car"},
    ...
    {"supercategory": "indoor", "id": 90, "name": "toothbrush"}
]

[
    {"supercategory": "textile", "id": 92, "name": "banner"},
    {"supercategory": "textile", "id": 93, "name": "blanket"},
    ...
    {"supercategory": "other", "id": 183, "name": "other"}
]
In the section, the lists contain the categories of objects that can be recognised on
images. Each category has a unique number id and it should be in the range [1,
number of categories]. Categories are also grouped into supercategories that can be
used in programs to recognise, for example, vehicles in general, when you don’t care if
it is a bicycle, a car or a truck.
There are separate lists for stuff segmentation, see Listing 4.11. The category numbers in this section start high to avoid conflicts with object segmentation, as these tasks can be performed together in the panoptic segmentation task. The values from 92 to 182 represent well-defined background stuff, while the value 183 represents all other background textures that do not have their own
classes.
The annotations section is the most important section of the dataset, containing
important information for the specific dataset for each task.
The fields according to the listing 4.12 have the following meaning.
“segmentation”: This is a list of segmentation masks for pixels; this is a flattened list
of pairs, so you should take the first and second values (x and y in the image),
then the third and fourth, and so on, to get coordinates; note that these are
not image indices, as they are floating point numbers — they are created and
compressed from the pixel coordinates by tools like COCO-annotator.
“iscrowd”: This is a flag indicating whether the annotation applies to a single object (value 0) or to several objects close to each other (value 1); for stuff segmentation, this field is always 0 and is ignored.
“image_id”: The field corresponds to the id field from the images dictionary; note that this value (and not the annotation's own id field) should be used to cross-reference the image with other dictionaries.
“bbox”: The field contains the surrounding rectangle or bounding box, i.e. the coordinates in the form of the x and y coordinates of the upper left corner, as
{
    "segmentation":
    [[
        239.97,
        260.24,
        222.04,
        ...
    ]],
    "area": 2765.1486500000005,
    "iscrowd": 0,
    "image_id": 558840,
    "bbox":
    [
        199.84,
        200.46,
        77.71,
        70.88
    ],
    "category_id": 58,
    "id": 156
}
well as the width and height of the rectangle around the object; it is very useful for extracting individual objects from images, as in many languages such as Python this can be done by accessing the image array (note that the array is indexed row-first, i.e. y before x):
cropped_object = image[bbox[1]:bbox[1] + bbox[3], bbox[0]:bbox[0] + bbox[2]]
“category_id”: The field contains the class of the object, corresponding to the field
id from the section “categories”
“id”: The number is the unique identifier for the annotation; note that this is only
the ID of the annotation, so it does not refer to the respective image in other
dictionaries.
When working with crowd images (“iscrowd”: 1), the “segmentation” part may look a little different.
This is because with a large number of pixels, explicitly listing all the pixels
that create a segmentation mask would take a lot of space. Instead, the dataset
COCO uses a custom compression, Run-Length Encoding (RLE), which is very
efficient because segmentation masks are binary and RLE for only zeros and
ones can reduce the size many times.
4.1.5. ImageNet
ImageNet is an image database with more than 14 million images that is constantly
growing. Each image is assigned to a noun. For each category there are on average more than 500 images. ImageNet contains more than 20,000 categories in English, with typical categories such as “balloon” or “strawberry”. The database of third-party image URL annotations is freely accessible directly through ImageNet, although the actual images are not owned by ImageNet. [Den+09; Sha+20; KSH12b]
4.2. FaceNet
[SKP15]
FaceNet learns a mapping from face images to a compact Euclidean space in which distances directly correspond to a measure of face similarity. Once this is done, tasks such as face recognition, verification and clustering can easily be accomplished with standard techniques, using the FaceNet embeddings as features. FaceNet uses a deep CNN that is trained to optimise the embedding itself, rather than using the output of an intermediate bottleneck layer. Training is performed with triplets: one image of a face ("anchor"), another image of the same face ("positive example") and an image of a different face ("negative example"). The main advantage lies in the representational efficiency: state-of-the-art performance can be achieved with only 128 bytes per face (record accuracy of 99.63% on LFW, 95.12% on YouTube Faces DB).

see ../../MLbib/CNN/class10_FaceNet.pdf

FaceNet is a deep convolutional neural network that was developed by Google researchers and introduced around 2015 to effectively overcome the hurdles in face recognition and verification. The FaceNet algorithm transforms the face image into a 128-dimensional Euclidean space, similar to word embedding [9]. The FaceNet model created in this way is trained on triplet loss to capture the similarities and differences in the provided image dataset. The 128-dimensional embeddings produced by the model can be used to cluster faces in a very effective and precise way. By using FaceNet embeddings as feature vectors, functionalities such as face recognition and verification can be implemented once the vector space has been created [10]. In short, the distances for similar images are much smaller than those for random, non-similar images. The general block diagram of the FaceNet approach to face recognition is shown in Fig. 1.

see ../../MLbib/CNN/10.1109ICACCS.2019.8728466.pdf

Input format? Colours? Pixel resolution?
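A minimal NumPy sketch of the triplet-loss idea described above; the margin and the toy embeddings are illustrative assumptions, in FaceNet the 128-dimensional embeddings come from the trained CNN.

import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    pos_dist = np.sum((anchor - positive) ** 2)   # anchor vs. same identity
    neg_dist = np.sum((anchor - negative) ** 2)   # anchor vs. different identity
    # the anchor should be closer to the positive than to the negative by a margin alpha
    return max(pos_dist - neg_dist + alpha, 0.0)

rng = np.random.default_rng(0)
anchor = rng.normal(size=128)
positive = anchor + 0.01 * rng.normal(size=128)
negative = rng.normal(size=128)
print(triplet_loss(anchor, positive, negative))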
4.3. Models
When installing NVidia’s jetson-inference project, the utility offers a large selection of pre-trained deep learning models for download. Among them are well-known ones like AlexNet from 2012 as well as various so-called ResNets. Also included are further networks such as GoogleNet, VGG and Inception-v4, as listed in the following table.
Network CLI argument NetworkType enum
AlexNet alexnet ALEXNET
GoogleNet googlenet GOOGLENET
GoogleNet-12 googlenet-12 GOOGLENET_12
ResNet-18 resnet-18 RESNET_18
ResNet-50 resnet-50 RESNET_50
ResNet-101 resnet-101 RESNET_101
ResNet-152 resnet-152 RESNET_152
VGG-16 vgg-16 VGG-16
VGG-19 vgg-19 VGG-19
Inception-v4 inception-v4 INCEPTION_V4
The models GoogleNet and ResNet-18, which are based on the ImageNet image
database, are automatically downloaded in the build step. The table ?? lists all models
that are available in the jetson-inference project. The first column of the table gives the name of the model architecture [Alo+18]. The second column contains the parameter passed to the -network argument of the imagenet-camera.py program.
5. Frameworks and Libraries
As a programming language for data science, Python represents a compromise between
the R language, which focuses on data analysis and visualisation, and Java, which
forms the backbone of many large-scale applications. This flexibility means that
Python can act as a single tool that brings your entire workflow together.
Python is often the first choice for developers who need to apply statistical techniques
or data analysis to their work, or for data scientists whose tasks need to be integrated
into web applications or production environments. Python particularly shines in the
area of machine learning. The combination of machine learning libraries and flexibility
makes Python unique. Python is best suited for developing sophisticated models and
predictive modules that can be directly integrated into production systems.
One of Python’s greatest strengths is its extensive library. Libraries are sets of routines
and functions written in a specific language. A robust set of libraries can make the job
of developers immensely easier to perform complex tasks without having to rewrite
many lines of code.
5.2. Frameworks
5.2.1. TensorFlow
The TensorFlow framework, which is supported by Google, is the undisputed top dog.
It has the most GitHub activity, Google searches, Medium articles, books on Amazon
and ArXiv articles. It also has the most developers using it, and is listed in the most
online job descriptions. [Goo19b]
Hardware-specific variants of TensorFlow also exist. For example, NVIDIA graphics cards are addressed by a variant with Compute Unified Device Architecture (CUDA), and Intel also offers an optimised variant. However, you may have to do without the latest version.
TensorFlow Lite, the small device version, brings model execution to a variety of
devices, including mobile devices and IoT, and provides more than a 3x increase in
inference speed compared to the original TensorFlow.
TensorFlow uses data flow graphs to represent computation, shared state, and the
operations that mutate that state. By fully representing the dependencies of each step
of the computation, parts of the computation can be reliably parallelised [Aba+16].
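A minimal sketch of how a computation can be expressed so that TensorFlow 2 captures it as a graph via tf.function:

import tensorflow as tf

@tf.function          # traces the Python function into a TensorFlow graph
def affine(x, w, b):
    return tf.matmul(x, w) + b

x = tf.ones((2, 3))
w = tf.ones((3, 4))
b = tf.zeros((4,))
print(affine(x, w, b))   # executed as a graph; independent operations can be parallelised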
Originally, TensorFlow was to be used for Google’s internal use, but was released in
November 2015 under the Apache 2.0 open source licence. Since then, TensorFlow has
brought together a variety of tools, libraries and communities. TensorFlow provides
low-level interfaces for programming languages such as Python, Java, C++ and Go.
The mid-level and high-level APIs provide functions for creating, training, saving and loading models. [Cho18]
5.2.3. TensorRT
TensorRT is a neural network runtime environment based on CUDA, which specialises
in optimising the performance of neural networks [NVI15]. This is achieved by reducing
the computational accuracy from Floating Point 32-bit (FP32) to Floating Point 16-bit (FP16) or Integer 8-bit (INT8). Since the remaining inference accuracy is sufficient for most use cases, the speed can be significantly increased with this method
[GMG16]. When optimising the network, the operators used are replaced by TensorRT
operators, which can only be executed within a TensorRT environment.
5.2.4. Keras
Keras is a high-level Application Programming Interface (API) for neural networks built in Python and running on TensorFlow,
Theano or CNTK. It enables the definition and training of different deep learning mod-
els using optimised Tensor libraries, which serve as a backend engine. The TensorFlow
backend, Theano backend and CNTK backend implementations are currently sup-
ported. Any code written in Keras can be run on the backends without customisation.
Keras offers a choice of predefined layers, optimisation functions or other important
neural network components. The functions can also be extended with custom modules.
The two most important model types provided by Keras are sequential and functional
api. With sequential, straight-line models can be created whose layers are lined up
one after the other. For more complex network structures with feedback, there is the
functional api. [Cho18; SIG20b]
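The difference between the two model types can be sketched in a few lines (a minimal example, not taken from the Keras documentation):

from tensorflow import keras

# Sequential: layers strictly one after the other
seq = keras.Sequential([
    keras.layers.Dense(32, activation='relu', input_shape=(16,)),
    keras.layers.Dense(1),
])

# Functional API: the graph of layers is built explicitly,
# which also allows branches and multiple inputs or outputs
inputs = keras.Input(shape=(16,))
x = keras.layers.Dense(32, activation='relu')(inputs)
outputs = keras.layers.Dense(1)(x)
func = keras.Model(inputs=inputs, outputs=outputs)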
5.2.5. PyTorch
PyTorch, which is supported by Facebook, is the third most popular framework. It is
younger than TensorFlow and has rapidly gained popularity. It allows customisation
that TensorFlow does not. [Fac20]
5.2.6. Caffe
Caffe has been around for almost five years. It is in relatively high demand by
employers and frequently mentioned in academic articles, but has had little recent
coverage of its use. [JES20]
5.2.7. Caffe2
Caffe2 is another open source product from Facebook. It builds on Caffe and is now
in the PyTorch GitHub repository. [Sou20]
5.2.8. Theano
Theano was developed at the University of Montreal in 2007 and is the oldest major
Python framework. It has lost much of its popularity and its leader stated that
major versions are no longer on the roadmap. However, updates continue to be made.
[AR+16]
Theano uses a NumPy-like syntax to optimise and evaluate mathematical expressions. What makes Theano different is that it can execute computations on the GPU of the graphics card. This speed makes Theano interesting.
5.2.10. CNTK
CNTK is the Microsoft Cognitive Toolkit. It is reminiscent of many other Microsoft products in the sense that it tries to compete with Google and Facebook offerings but fails to gain significant acceptance. [Mic20]
5.2.11. Deeplearning4J
Deeplearning4J, also called DL4J, is used with the Java language. It is the only
semi-popular framework that is not available in Python. However, you can import
models written with Keras into DL4J. The framework has a connection to Apache
Spark and Hadoop. [Ecl20]
5.2.12. Chainer
Chainer is a framework developed by the Japanese company Preferred Networks. It
has a small following. [PN20]
5.2.13. FastAI
FastAI is built on PyTorch. Its API was inspired by Keras and requires even less
code for strong results. Jeremy Howard, the driving force behind Fast.AI, was a top
Kaggler and president of Kaggle. [fas20]
Fast.AI is not yet in demand for careers, nor is it widely used. However, it has a
large built-in pipeline of users through its popular free online courses. It is also both
powerful and easy to use. Its uptake could grow significantly.
5.3.1. NumPy
NumPy is the fundamental library for scientific computing in Python, and many of
the libraries in this list use NumPy arrays as their basic inputs and outputs. In short,
NumPy introduces objects for multidimensional arrays and matrices, and routines
that allow developers to perform advanced mathematical and statistical functions on
these arrays with as little code as possible. [Fou20b]
5.3.2. SciPy
SciPy builds on NumPy by adding a collection of algorithms and high-level com-
mands for manipulating and visualising data. The package also includes functions
for numerically calculating integrals, solving differential equations, optimisation and
more.
5.3.3. pandas
Pandas adds data structures and tools designed for practical data analysis in finance,
statistics, social sciences and engineering. Pandas is well suited for incomplete, messy
and unlabelled data (i.e. the kind of data you are likely to face in the real world)
and provides tools for shaping, merging, reshaping and splitting datasets.
5.3.4. IPython
IPython extends the functionality of Python’s interactive interpreter with a souped-up
interactive shell that adds introspection, rich media, shell syntax, tab completion and
command archive retrieval. It also acts as an integrated interpreter for your programs,
which can be particularly useful for debugging. If you have ever used Mathematica or
MATLAB, you will feel at home with IPython.
5.3.5. Matplotlib
Matplotlib is the standard Python library for creating 2D diagrams and plots. The
API is quite low-level, i.e. it requires several commands to produce good-looking
graphs and figures compared to some more advanced libraries. However, the advantage
is greater flexibility. With enough commands, you can create almost any graph with
matplotlib.
5.3.6. scikit-learn
scikit-learn builds on NumPy and SciPy by adding a set of algorithms for general ma-
chine learning and data mining tasks, including clustering, regression and classification.
As a library, scikit-learn has much to offer. Its tools are well documented and among
the contributing developers are many machine learning experts. Moreover, it is a very
helpful library for developers who do not want to choose between different versions of
the same algorithm. Its power and ease of use make the library very popular.
5.3.7. Scrapy
Scrapy is a library for creating spiderbots to systematically crawl the web and extract
structured data such as prices, contact information and URLs. Originally developed
for web scraping, Scrapy can also extract data from APIs.
5.3.8. NLTK
NLTK stands for Natural Language Toolkit and provides an effective introduction to
Natural Language Processing (NLP) or text mining with Python. The basic functions
of NLTK allow you to mark up text, identify named entities and display parse trees
that reveal parts of speech and dependencies like sentence diagrams. This gives you
the ability to do more complicated things like sentiment analysis or generate automatic
text summaries.
5.3.9. Pattern
Pattern combines the functionality of Scrapy and NLTK into a comprehensive library
that aims to serve as an out-of-the-box solution for web mining, NLP, machine learning
and network analysis. Its tools include a web crawler; APIs for Google, Twitter and
Wikipedia; and text analytics algorithms such as parse trees and sentiment analysis
that can be run with just a few lines of code.
5.3.10. Seaborn
Seaborn is a popular visualisation library built on top of matplotlib. With Seaborn,
graphically very high quality plots such as heat maps, time series and violin plots can
be generated.
5.3.11. Bokeh
Bokeh allows the creation of interactive, zoomable plots in modern web browsers using
JavaScript widgets. Another nice feature of Bokeh is that it comes with three levels of
user interface, from high-level abstractions that let you quickly create complex plots
to a low-level view that provides maximum flexibility for app developers.
5.3.12. basemap
Basemap supports adding simple maps to matplotlib by taking coordinates from
matplotlib and applying them to more than 25 different projections. The library
Folium builds on Basemap and allows the creation of interactive web maps, similar to
the JavaScript widgets of Bokeh.
5.3.13. NetworkX
This library allows you to create and analyse graphs and networks. It is designed
to work with both standard and non-standard data formats, making it particularly
efficient and scalable. With these features, NetworkX is particularly well suited for
analysing complex social networks.
5.3.14. LightGBM
Gradient boosting is one of the best and most popular techniques in machine learning; it helps developers to create new algorithms by combining newly defined elementary models, especially decision trees.
Accordingly, there are special libraries designed for a fast and efficient implementation
of this method. These are LightGBM, XGBoost and CatBoost. All these libraries are
competitors but help solve a common problem and can be used in almost similar ways.
5.3.15. Eli5
Most of the time, model prediction results in machine learning are not really accurate. Eli5, a Python library, helps overcome this very challenge. It is a combination of visualising and debugging machine learning models and tracking all steps of the algorithm.
5.4.2. OpenVINO
The OpenVINO toolkit helps accelerate the development of powerful computer vision
and deep learning in vision applications. It enables Deep Learning via hardware
accelerators as well as simple, heterogeneous execution on Intel platforms, including
CPU, GPU, Field-Programmable Gate Array (FPGA) and Video Processing Unit
(VPU). Key components include optimised features for OpenCV. [Int19b; Tea20b]
5.4.4. OpenNN
OpenNN is a software library written in C++ for advanced analysis. It implements
neural networks. This library is characterised by its execution speed and memory
allocation. It is constantly optimised and parallelised to maximise its efficiency.
[AIT20]
5.5.1. glob
The glob module finds all pathnames in a given directory which match a given pattern and returns them in arbitrary order. [Fou20a]
5.5.2. os
The os module provides general operating system functionalities. The functions of this library allow programmes to be designed in a platform-independent way. The abbreviation os stands for Operating System. It enables access to and reference to certain paths and the files stored there. [Bal19]
5.5.3. PIL
The abbreviation PIL stands for Python Image Library; the library provides many functions for image processing and guarantees fast data access to the basic image formats. In addition to image archiving and batch processing, it can convert file formats, create thumbnails, print images and perform many other functions.
5.5.4. LCC15
5.5.5. math
As can easily be inferred from the name, this library provides access to all mathematical functions defined in the C standard. These include, among others, trigonometric functions, power and logarithm functions and angle functions. Complex numbers are not included in this library. [Fou20b]
5.5.6. cv2
With the cv2 module, input images can be read in as three-dimensional arrays.
OpenCV contains algorithms for image processing and machine vision. The algorithms
are based on the latest research results and are continuously being further developed.
The modules from image processing include, for example, face recognition, as well as
many fast filters and functions for camera calibration. The machine vision modules
include boosting (automatic classification), decision tree learning and artificial neural
networks. [Wik20]
5.5.7. random
The random module implements pseudo-random number generators for different distributions. For example, random numbers can be generated or a collection of elements can be shuffled. [Fou20c]
5.5.8. pickle
The pickle module implements binary protocols for serialising and deserialising a Python object structure. Object hierarchies can be converted into a binary stream (pickle) and then converted back from the binary representation into the object hierarchy (unpickle). [Fou20d]
5.5.9. PyPI
The Python Software Foundation organises and manages various packages for Python programming. [Fou21a] Here you can find packages for various tasks. An example is the official API for accessing the dataset COCO. [Fou21b]
6. Libraries, Modules, and Frameworks
Libraries, modules, and frameworks have an essential role in the machine learning and computer vision field. They are written for a specific topic and have built-in functions and classes that help developers and data scientists build useful applications. Some useful libraries, modules, and frameworks used in Machine Learning (ML) and Computer Vision (CV) are:
• TensorFlow
• MediaPipe
• OpenCV
• Pyfirmata
6.2. OpenCV
OpenCV is a huge open-source library for computer vision, machine learning, and image processing. It plays a major role in real-time operation, which is very important in today's systems. Computer vision is the science of programming a computer to process and ultimately understand images and video, or, simply put, making a computer see. Solving even small parts of certain computer vision challenges creates exciting new possibilities in technology, engineering and even entertainment. By using OpenCV, one can process images and videos to identify objects, faces, or even the handwriting of a human. When we create applications for computer vision that we don't want to build from scratch, we can use this library to start focusing on real-world problems. Many companies use this library today, such as Google, Amazon, Microsoft and Toyota. Typical applications of OpenCV include:
• Face recognition
• Object recognition
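As a small illustration, the following sketch reads an image, converts it to greyscale and detects edges with OpenCV (the file name is an assumption for the example):

import cv2

image = cv2.imread('example.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, threshold1=100, threshold2=200)
cv2.imwrite('edges.jpg', edges)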
6.3. MediaPipe
MediaPipe is one of the most widely shared and re-usable libraries for media processing within Google. Google open-sourced MediaPipe in June 2019. It aims to provide some integrated computer vision and machine learning features. It has built-in modules that can detect human motion by tracking different parts of the body. This GitHub link [MediaPipe Github] shows all the available applications the MediaPipe modules are able to perform. It is best suited for gesture detection in edge computing applications. Examples of its modules are:
• Hand Landmarks
• Face Detection
• Object Detection
• Holistic
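A minimal sketch of hand landmark detection on a single image with the MediaPipe Python package (the file name is an assumption; the solutions API may differ between MediaPipe versions):

import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=2)
image = cv2.imread('hand.jpg')
results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.multi_hand_landmarks:
    for hand_landmarks in results.multi_hand_landmarks:
        print(len(hand_landmarks.landmark))   # 21 landmarks per detected hand
hands.close()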
Jetson Nano
7. Jetson Nano
7.1. The Jetson Nano and the Competition
System-on-a-chip (SOC) systems offer a low-cost entry point for developing autonomous,
low-power deep-learning applications. They are best suited for inference, i.e. for
applying pre-trained machine learning models to real data. The Jetson Nano from
the manufacturer NVIDIA is such an SOC system. It is the smallest AI module in NVIDIA's Jetson family and promises high processing speed with moderate power consumption at the same time. [NVI19]
At the lower end of the price spectrum, several low-cost systems have come to market
in recent years that are suitable for developing machine learning applications for
embedded systems. Chips and systems have emerged that have special computational
kernels for such tasks, with low power consumption at the same time.
For developers, Intel has created an inexpensive solution with the Neural Compute
Stick, for example, which can be connected to existing hardware such as a Raspberry
Pi via USB. For around 70 euros, you get a system with 16 cores that is based on
Intel’s own low-power processor chip for image processing, Movidius. The product
from Intel uses Movidius Myriad X Vision Processing Unit and can be used with the
operating systems Windows® 10, Ubuntu or MacOS. [Int19a]
Google is trying to address the same target group with the Google Edge TPU Boards.
For 80 euros, you get a Tensor Processing Unit (TPU) that uses similar hardware to
Google Cloud or Google’s development environment for Jupyter notebooks Colab and
can also be used as a USB 3 dongle. [Vin18; Goo19a; Goo11b]
Comparable technology can also be found in the latest generation of Apple’s iPhone
with the A12 Bionic chip with 8 cores and 5 trillion operations per second, in Samsung’s
Exynos 980 with integrated Neural Processing Unit (NPU) or in Huawei’s Kirin 990
with two NPUs at once. [Sam20; HDC20]
But the real added value of the Jetson Nano is its design for easy implementation of artificial intelligence. What makes this so easy is the NVIDIA TensorRT software development kit, which enables high-performance deep learning inference. An already trained network can be deployed and evolve as it is confronted with the new data available in the specific application.
NVidia’s [NVI20a] website features a variety of projects from the Jetson community.
Besides simple teleoperation, the Jetson Nano also allows the realisation of collision
avoidance and object tracking up to the further construction of a small autonomous
vehicle. In addition to the non-commercial projects, industrial projects are also
presented. One example is the Adaptive Traffic Controlling System, which uses a
visual traffic congestion measurement method to adjust traffic light timing to reduce
vehicle congestion. Another project enables the visually or reading impaired to hear
both printed and handwritten text by converting recognised sentences into synthesised
speech. This project also basically uses only the Jetson Nano and the Pi camera.
Tensorflow 2.0 and Google Colab are used to train the model. The recognition of sign
language, faces and objects can also be realised with Jetson Nano and pre-trained
models.
8. Application “Image Recognition
with the Jetson”
8.1. Application Example “Bottle Detection”
Due to growing data analytics capabilities, developments such as predictive mainte-
nance and intelligent fault detection are being driven in industry. By using artificial
intelligence, fewer or no human decisions are required and more complex relationships
can be identified. The goal here is often quality control or early detection of error
sources or expected errors.
A simpler application is the rapid detection of faults that have already occurred.
A fault can be a foreign or unwanted object that enters the system in any phase of
product development, such as production, manufacturing, assembly or logistics. To
avoid this, or at least to detect it at an early stage, an artificial intelligence system
can be used to continuously monitor the system in the required phases.
In this example, bottles are to be detected in any environment. The object detection
is to run continuously. A camera is used to capture the environment, and the collected
data is evaluated with the NVIDIA Jetson Nano. If a bottle is detected, the Jetson
Nano shall provide a binary signal to a connected system. In industry, this system
could be a programmable logic controller (PLC) that controls a machine.
After starting the programme, the live image of the camera is permanently displayed
on the screen and the result of the image recognition is shown, as can be seen in
Figure 8.1. This makes it easy to check the results of the Jetson Nano during the
development and testing phase.
[Figure: Jetson Nano board connectors – J44 serial header, J38 PoE header, J40 button header, J13 camera connector, J41 expansion header, J48 power jumper, J15 5V fan header, PWR LED]
If no monitor is to be used in the application, this must be taken into account when
installing the operating system.
µUSB interface
A power supply (5V 2A) with µUSB output can also be used at the µUSB interface.
µUSB was chosen because inexpensive power supplies with this interface are easy to
find. The maximum power consumption is 5W to 10W. A high quality power supply
unit should be chosen. If an unsuitable power supply is used, the NVIDIA Jetson
Nano will not start. A simple measure is to use a power supply with a shorter cable,
as this leads to a lower voltage drop.
The Jetson Nano has four USB 3.0 ports. The current consumption on a USB port
is limited to 100mA. If a connected device has a higher consumption, it must be
equipped with an additional power supply. A USB keyboard and mouse can usually
be connected without problems.
Ethernet interface
With the help of the Ethernet interface with RJ45 connection, the computer can either
be connected directly to the Internet or communicate with another computer. Gigabit
Ethernet can be used with a patch cable plugged into the router. Ethernet connections
based on 10/100/1000BASE-T are supported. However, a WiFi connection can also
be achieved with an optional USB dongle for WLAN.
Status LED
The status LED indicates whether the Jetson Nano is receiving power. The status
LED will flash if there is insufficient or too much power from the power supply.
GPIOs
The Jetson Nano has an interface through which extensions can be connected. At the
J41 header, 40 pins are routed to the outside. They are called General Purpose Input
Output (GPIO) pins. With them, custom circuits can be connected and the board can
be given new functions; LEDs or motors, for example, can be attached here. In
addition, other interesting interfaces of the types SPI, I2C and I2S are available.
microSD card
The Jetson Nano does not have a hard disk containing an operating system. Therefore,
there must be another way to store it. For this purpose, the Jetson Nano uses a
microSD card. The SD card contains the operating system to be started and the
application programmes. The SD card must be chosen accordingly: it should be fast
and large enough; the recommended minimum is a 16GB UHS-1 card.
The slot is on the bottom of the Jetson Nano.
[Figure: signal assignment of the serial header – CTS, TXD, RTS, GND]
Binary signals are exchanged between two systems via communication over GPIO.
They are simply digital inputs and outputs that can be defined as inputs or outputs
as required.
By default, all interface signal pins of the Jetson Nano are configured as GPIO. Pins
3, 5, 8, 10, 27 and 28 are exceptions. Pins 3 and 5 as well as pins 27 and 28 belong to
the two I2C interfaces with the data line Serial Data (SDA) and the clock line Serial
Clock (SCL). Pins 8 and 10 are reserved for the transmit line (Tx) and the receive
line (Rx) of the UART. [NVI21a]
To use the GPIO pins, the GPIO library must be installed [NVI20b].
The pinout of the Jetson Nano is shown in figure 9.2.
9.4.5. I2S Interface
9.4.6. Serial Peripheral Interface (SPI)
In the default configuration, the Jetson Nano does not provide access to an SPI port.
However, this can be configured on the J41 expansion header.
Pin  Name (Sysfs GPIO, alt.)            | Pin  Name (Sysfs GPIO, alt.)
 1   3.3V                               |  2   5V
 3   I2C_2_SDA (I2C Bus 1, D2)          |  4   5V
 5   I2C_2_SCL (I2C Bus 1, D3)          |  6   GND
 7   AUDIO_MCLK (gpio216, D4)           |  8   UART_2_TX (/dev/ttyTHS1, D14)
 9   GND                                | 10   UART_2_RX (/dev/ttyTHS1, D15)
11   UART_2_RTS (gpio50, D17)           | 12   I2S_4_SCLK (gpio79, D18)
13   SPI_2_SCK (gpio14, D27)            | 14   GND
15   LCD_TE (gpio194, D22)              | 16   SPI_2_CS1 (gpio232, D23)
17   3.3V                               | 18   SPI_2_CS0 (gpio15, D24)
19   SPI_1_MOSI (gpio16, D10)           | 20   GND
21   SPI_1_MISO (gpio17, D9)            | 22   SPI_2_MISO (gpio13, D25)
23   SPI_1_SCK (gpio18, D11)            | 24   SPI_1_CS0 (gpio19, D8)
25   GND                                | 26   SPI_1_CS1 (gpio20, D7)
27   I2C_1_SDA (I2C Bus 0, D0)          | 28   I2C_1_SCL (I2C Bus 0, D1)
29   CAM_AF_EN (gpio149, D5)            | 30   GND
31   GPIO_PZ0 (gpio200, D6)             | 32   LCD_BL_PWM (gpio168, D12)
33   GPIO_PE6 (gpio38, D13)             | 34   GND
35   I2S_4_LRCK (gpio76, D19)           | 36   UART_2_CTS (gpio51, D16)
37   SPI_2_MOSI (gpio12, D26)           | 38   I2S_4_SDIN (gpio77, D20)
39   GND                                | 40   I2S_4_SDOUT (gpio78, D21)
[Figure: pin assignment of the fan header – PWM, Tach, 5V, GND]
• CUDA,
• TensorRT,
• Computer Vision,
• Deep Stream SDK.
This enables development for different hardware systems, so the performance can be
scaled depending on the application. [AY18; NVI21b] Figure 9.7 shows the structure
of the development environment with the modules listed above and others, and also
indicates their possible uses.
1. Windows:
• Format your microSD card with the SD memory card formatter from the
company SD Association [Link]. The SD card formatter must be installed
and started. After starting the SD Memory Card Formatter programme, a
dialogue opens, see figure 9.8.
Then the following steps have to be gone through:
a) First, the microSD card must be inserted into an SD card drive.
b) Then the correct card drive is selected.
c) The type „Quick format“ is then selected.
d) The field „Volume label“ must not be changed.
e) After that, formatting can be started by activating the button „Format“.
However, a warning dialogue opens first, which must be confirmed with
the button „Yes“.
After formatting the microSD card, you can write the image file to the mi-
croSD card using the Etcher programme. The following steps are necessary
for this:
a) The programme must be installed and started. After starting the
Etcher programme, a dialogue opens, see Figure 9.8.
b) In this start dialogue, the button „Select image“ is clicked first. Then the
downloaded image file can be selected.
If the microSD card is not formatted, the operating system reports
this, see figure ??. Then the process must be aborted and the card
must first be formatted.
c) Pressing the button „Flash!“ starts the copying process. The process
takes about 10 minutes if a USB 3 interface is used. During this time
Etcher writes the image file and then validates it.
d) After the process is complete, Windows may report that the microSD
card cannot be read, see Figure ??. This message is to be ignored. The
microSD card can simply be removed from the drive.
Removing the microSD card from the drive completes its preparation.
This decision must be made before the first start; a change cannot be made later.
The only remaining option is to change it by reinstalling the operating system. This
means that all settings must be repeated and any software not provided by the
operating system must be added again.
Before switching on the Jetson Nano, some preparations must be made. First, a
monitor must be connected via an HDMI cable, along with a keyboard and a mouse.
The Jetson Nano and a suitable power supply unit must be ready. However, the power
supply unit must not yet be plugged in, as the Jetson Nano would then start
automatically. Under these conditions, the prepared microSD card is inserted into the
Jetson Nano. Now
the system can be switched on. First the monitor is switched on, then the power
supply unit is connected to the Jetson Nano. The green PWR LED lights up and the
system starts automatically. The boot process takes a few minutes. The system must
then be configured during the initial start-up.
First start
During the first boot process, some settings have to be made:
First, the End User License Agreement (EULA) from NVIDIA for the Jetson Nano
software is displayed. This agreement must be accepted. Then the language must
be set. The keyboard layout can also be selected. This should match the connected
keyboard. A suitable time zone must also be selected. Then the login settings, which
will be requested at each subsequent login, must be made. The following labels are
recommended:
Finally, the size of the partition must be specified. For the standard application, the
maximum size must be specified here.
After logging in
After successful configuration, a welcome message appears, see ??. The system can
now be used.
• Jetson Nano,
• Ethernet cable,
• PC,
• Ethernet connector.
After that, the Jetson Nano is available. You can then connect to the Jetson Nano via
ssh or PuTTY:
ssh elmar@JetsonNanoMSRLab
Installation of SSH
An installation of SSH is not necessary, as it is included in the distribution of the
Jetson Nano’s operating system. This only needs to be activated.
Activation of SSH
To activate SSH, the SSH server is started. This is done by the following command:
sudo /etc/init.d/ssh start
Since this is necessary with every use of SSH, this process is automated:
sudo update-rc.d ssh defaults
After this entry, SSH is permanently available, even after a reboot. Access via SSH to
the Jetson Nano is then always possible.
Two further methods are known from the Raspberry Pi but are not available on the
Jetson Nano's Ubuntu system: there, an empty file named „ssh“ can be placed in the
boot partition so that SSH is automatically activated at the next boot, or SSH can be
activated via the system settings by opening the configuration dialogue in the start
menu Settings Configuration, switching to the tab „Interfaces“ and setting the item
„SSH“ to „Enabled“. On the Raspberry Pi, this dialogue is opened with the command:
sudo raspi-config
The Jetson Nano must then be restarted:
sudo reboot
In order to use SSH, the IP address of the Jetson Nano must be known. With the
terminal command
ifconfig
the current network settings including the IP address are output.
Since 2017, Windows has had an SSH implementation based on OpenSSH, which is
integrated into the PowerShell command line. To use it, the PowerShell must be called
up via the Start menu and the following command entered:
ssh username@<IP_address_of_the_Jetson_Nano>
When connecting for the first time, the SSH keys of the Jetson Nano must be confirmed
with „yes“. After entering the user password, remote access can be performed.
After the installation is complete, the Jetson Nano must be restarted. Once the
reboot is complete, the nmap command on the PC can be used to check whether the
installation of xrdp was successful:
$ nmap jetson
Starting Nmap 7.70 ( https://nmap.org ) at 2019-04-13 01:39 BST
Nmap scan report for jetson (192.168.1.118)
Host is up (0.0025s latency).
Not shown: 996 closed ports
PORT STATE SERVICE
22/tcp open ssh
111/tcp open rpcbind
3389/tcp open ms-wbt-server
Nmap done: 1 IP address (1 host up) scanned in 1.25 seconds
$
Once the settings have been configured, the session can be set in full screen mode by
setting a reasonable resolution for the resulting windowed desktop.
If the PC is connected to the board via RDP, the desktop will look slightly different.
This is because a standard Ubuntu desktop is displayed, running Gnome, rather than
the older Unity-style desktop that is the default for L4T. This means that a separate
desktop is created for the RDP. This desktop remains until the user logs off, closing
the RDP window does not close the desktop. This means that the next time the RDP
window is opened, the desktop will still be available.
However, it should be noted that one cannot be logged on to the physical desktop and
open an RDP desktop at the same time.
If the Jetson Nano is accessed via a wireless network, in the Gnome application, under
„Settings“, select „Power“ from the menu on the left and ensure that „Turn off Wi-Fi
to save power“ is turned off.
Now the monitor, keyboard and mouse can be unplugged, as they are no longer needed.
the project can be copied to the Jetson Nano. The library can then be installed. In
the directory where the setup.py file of the project is located, the command
sudo python3 setup.py install
can be executed. To be able to work with the library, the user needs the corresponding
rights. To do this, a GPIO user group is first created with
sudo groupadd -f -r gpio
Then the corresponding user rights can be set by adding a user to this group:
sudo usermod -a -G gpio <username>
The programme udev manages the devices. The user rules must now be communicated
to this programme. To do this, copy the file 99-gpio.rules from the project into the
directory /etc/udev/rules.d. These rules are only applied after a restart of the system.
Alternatively, the rules can also be loaded actively:
sudo udevadm control --reload-rules && sudo udevadm trigger
The GitHub project also contains sample programmes located in the samples folder.
These can be used for testing. One example is the program simple_out.py. It uses
the BCM pin numbering mode of the Raspberry Pi and outputs alternating high and
low values to BCM pin 18, i.e. pin 12, every 2 seconds. This can be used to make an
LED flash. The call is made by the following command:
$ python3 simple_out.py
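For orientation, the following is a minimal sketch of a programme with the behaviour described above; it is modelled on the sample, not a verbatim copy of simple_out.py:

import time
import Jetson.GPIO as GPIO

output_pin = 18  # BCM numbering: BCM pin 18 corresponds to board pin 12

GPIO.setmode(GPIO.BCM)                         # use the BCM numbering scheme
GPIO.setup(output_pin, GPIO.OUT, initial=GPIO.HIGH)

try:
    value = GPIO.HIGH
    while True:
        time.sleep(2)                          # toggle the pin every 2 seconds
        GPIO.output(output_pin, value)
        value = GPIO.LOW if value == GPIO.HIGH else GPIO.HIGH
finally:
    GPIO.cleanup()                             # release the pin on exit (e.g. Ctrl+C)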
Specification                     Camera V2
Size                              25 mm × 23 mm × 9 mm
Weight                            3 g
Still image resolution            8 megapixels
Video modes                       1080p30, 720p60 and 640 × 480p60/90
Sensor                            Sony IMX219
Sensor resolution                 3280 × 2464 pixels
Sensor image area                 3.68 mm × 2.76 mm
Pixel size                        1.12 µm × 1.12 µm
Optical size                      1/4"
Full-frame SLR lens equivalent    35 mm
Focal length                      3.04 mm
Horizontal field of view          62.2°
Vertical field of view            48.8°
Focal length ratio                2.0
installed. The available formats are then queried with the following command:
$ v4l2-ctl --list-formats-ext
9.10.3. CSI-3 Interface
The camera is controlled via a CSI-3 interface, which is regularly used to integrate
digital still cameras, high-resolution and high-frame-rate sensors, teleconferencing and
camcorder functionalities on a UniPro network.
A basic CSI-3 v1.1 link configuration using four forward lanes and one reverse lane
(10 wires in total) can support up to 14.88 Gbps usable bit rate (including 8B10B and
UniPro overhead) in the forward direction and typically 1 Mbps or more in the reverse
direction. The UniPro stack itself uses some link bandwidth (primarily in the reverse
direction) to guarantee reliable packet delivery to the receiver. Cameras implementing
a minimal MIPI CSI-3 configuration consisting of one forward and one reverse lane
(four wires in total) can transmit 12 bpp 4K video at about 40 FPS. [MIPI]
height of 2464 pixels at 21 frames per second and to display it in a window with a
width of 960 pixels and a height of 616 pixels, the following line is entered:
$ gst-launch-1.0 nvarguscamerasrc sensor_mode=0 !\
’video/x-raw(memory:NVMM),width=3820, height=2464,\
framerate=21/1, format=NV12’ ! nvvidconv flip-method=0 ! \
’video/x-raw,width=960, height=616’ ! nvvidconv ! \
nvegltransform ! nveglglessink -e
When the camera is attached to the bracket on the Jetson Nano, the image may
initially be upside down. If you set flip-method=2, the image will be rotated by
180°. For more options to rotate and flip the image, see table ??.
none                   0   No rotation
clockwise              1   Rotate clockwise by 90°
rotate-180             2   Rotate by 180°
counterclockwise       3   Rotate counterclockwise by 90°
horizontal-flip        4   Flip horizontally
vertical-flip          5   Flip vertically
upper-left-diagonal    6   Flip over the upper-left/lower-right diagonal
upper-right-diagonal   7   Flip over the upper-right/lower-left diagonal
automatic              8   Flip method based on the image-orientation tag
The documentation for GStreamer can be found at [GSt16] and in a GitLab project.
The documentation can be obtained from the repository via
git clone https://gitlab.freedesktop.org/gstreamer/gst-docs
• Step 2: Calibrating the camera. In order to calibrate the camera, the first
step is to read in calibration images of a chessboard. It is recommended to use
at least 20 images to get a reliable calibration [SM20]. For the calibration itself,
OpenCV provides the function calibrateCamera():
retval, cameraMatrix, distCoeffs, rvecs, tvecs = cv2.calibrateCamera(objectPoints,
imagePoints, imageSize)
The function calibrateCamera() needs object points and image points. The
coordinates of the corners in the 2D displayed image, called image points, are
mapped to the 3D coordinates of the real, undistorted chessboard corners, called
object points. The z coordinates stay zero, so they can be left as they are; for the
first two columns, x and y, NumPy's mgrid function is used to generate the
required coordinates. mgrid returns the coordinate values for a given grid size,
which are then reshaped into two columns, one for x and one for y. The following
are returned by the function:
– The camera matrix and the distortion coefficients
– The position of the camera in the world, given by the rotation and translation
vectors rvecs and tvecs
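To illustrate the use of mgrid, the following sketch prepares the object points for one chessboard image; the pattern size of 9 × 6 inner corners and the square size of 1.0 are assumptions:

import numpy as np

# chessboard with 9x6 inner corners and unit square size (assumptions)
pattern_size = (9, 6)
square_size = 1.0

# one 3D point (x, y, 0) per inner corner: (0,0,0), (1,0,0), ..., (8,5,0)
pattern_points = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
pattern_points[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2)
pattern_points *= square_size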
Note: More details about the concept of distortion in cameras, a step-by-step procedure
to calibrate the camera matrix and distortion coefficients and to remove distortion, and
the corresponding Python programs are given in the file Distortion_Camera_-
Calibration_OpenCV.tex in the directory “JetsonNano_Inhalt_Software”.
Figure 10.1.: Camera matrix calculated by the cameracalibrate function for the logitech
webcam
Below, the image before and after distortion removal by the Python program is shown:
Figure 10.3.: Image of the chessboard before and after removing distortion
There is no noticeable difference between the original image and the image after
distortion removal. However, it is important to remove the distortion, because the
real-world coordinates of the object's centroid need to be calculated from the pixel
coordinates.
11. Distortion and Camera Calibration
The process of estimating the parameters of the camera is called camera calibration.
When a camera looks at 3D objects in the real world and transforms them into a
2D image, the transformation is not perfect. Sometimes the images are distorted:
edges are bent, rounded or stretched outward. It is necessary to correct this distortion
in use cases where the information from the image is crucial.
Tangential distortion: Tangential distortion occurs mainly because the lens is not
aligned parallel to the imaging plane. This makes parts of the image appear stretched
or tilted, so that objects appear farther away or closer than they actually are.
To reduce the distortion, it can be described by five numbers called distortion
coefficients, whose values reflect the amount of radial and tangential distortion in an
image. If the values of all coefficients are known, they can be used to calibrate the
camera and undistort the distorted images.
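In the notation of the OpenCV documentation, the distortion model with these five coefficients (k_1, k_2, p_1, p_2, k_3) can be written as follows, where (x, y) are the normalised undistorted coordinates and r^2 = x^2 + y^2:

x_{dist} = x (1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_1 x y + p_2 (r^2 + 2 x^2)
y_{dist} = y (1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1 (r^2 + 2 y^2) + 2 p_2 x y

The terms with k_1, k_2, k_3 describe the radial distortion, the terms with p_1, p_2 the tangential distortion.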
This means we have all the information (parameters or coefficients) about the camera
required to determine an accurate relationship between a 3D point in the real world
and its corresponding 2D projection (pixel) in the image captured by that calibrated
camera.
Typically this means recovering two kinds of parameters: the internal parameters of
the camera (such as the focal length, the optical centre and the distortion coefficients)
and the external parameters (the orientation and position of the camera with respect
to the scene).
A chessboard is great for calibration because its regular, high-contrast pattern makes
it easy to detect automatically. Checkerboard patterns are distinct and easy to detect
in an image. Moreover, the corners of the squares are ideal for localisation because
they have sharp gradients in two directions. In addition, these corners are related by
the fact that they lie at the intersections of the checkerboard lines. All these facts are
used to robustly locate the corners of the squares in a checkerboard pattern.
The below image shows the difference between image of the chessboard with and
without the distortion.
Figure 11.6.: The Image of the chessboard passed to the opencv functions to identify
and draw corners
Listing 11.1.: Script to find and mark the corners on a chessboard image
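A minimal sketch of such a script, assuming a board with 9 × 6 inner corners and a hypothetical file name chessboard.jpg, could look like this:

import cv2

# load one calibration image (file name is only an example)
img = cv2.imread('chessboard.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# number of inner corners per row and column (assumption: 9 x 6)
pattern_size = (9, 6)

# search for the chessboard corners in the greyscale image
found, corners = cv2.findChessboardCorners(gray, pattern_size, None)

if found:
    # draw the detected corners into the colour image and display it
    cv2.drawChessboardCorners(img, pattern_size, corners, found)
    cv2.imshow('Detected corners', img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
else:
    print('chessboard not found')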
• The camera matrix and the distortion coefficients
• The position of the camera in the world, given by the rotation and translation
vectors rvecs and tvecs
Below, the camera matrix and the distortion coefficients calculated by the function
are shown.
Figure 11.8.: Camera matrix calculated by the cameracalibrate function for the logitech
webcam
The undistort function takes a distorted image, the camera matrix and the distortion
coefficients, and returns an undistorted image, often called the destination image.
OpenCV provides the function cv2.undistort(), which takes the camera matrix, the
distortion coefficients and the image:
dst = cv2.undistort(img, camera_matrix, dist_coefs, None, newcameramtx)
Figure 11.10.: Image of the chessboard before and after removing distortion
#!/usr/bin/env python
'''
camera calibration for distorted images with chess board samples
reads distorted images, calculates the calibration and write undistorted images

usage:
    calibrate.py [--out <output path>] [--square_size] [<image mask>]

default values:
    --out:         ./sample/output/
    --square_size: 1.0
    <image mask> defaults to ./out/chessboard/*.jpg
'''

# Python 2/3 compatibility
from __future__ import print_function

# local modules
from common import splitfn

# built-in modules
import os
import sys
from glob import glob

import numpy as np
import cv2
import logging
import argparse

if __name__ == '__main__':
    # Parse arguments
    parser = argparse.ArgumentParser(description='Generate camera matrix and '
                                                 'distortion parameters from checkerboard '
                                                 'images')
    parser.add_argument('images', help='path to images')
    parser.add_argument('pattern_x', metavar='X', default=9, type=int,
                        help='pattern x')
    parser.add_argument('pattern_y', metavar='Y', default=6, type=int,
                        help='pattern y')
    parser.add_argument('--out', help='optional path for output')
    parser.add_argument('--square_size', default=1.0)
    args = parser.parse_args()
    logging.debug(args)

    # get images into a list
    extensions = ['jpg', 'JPG', 'png']
    if os.path.isdir(args.images):
        img_names = [fn for fn in os.listdir(args.images)
                     if any(fn.endswith(ext) for ext in extensions)]
        proj_root = args.images
    else:
        logging.error("%s is not a directory" % args.images)
        exit()

    if not args.out:
        out = os.path.join(proj_root, 'out')
    else:
        out = args.out

    # (the loop over the calibration images and the corner detection is omitted here)
        if not found:
            print('chessboard not found')
            continue
        img_points.append(corners.reshape(-1, 2))
        obj_points.append(pattern_points)
        print('ok')

    # calculate camera distortion
    rms, camera_matrix, dist_coefs, rvecs, tvecs = cv2.calibrateCamera(obj_points

    # write to matrix to be used as input
    with open(os.path.join(out, "matrix.txt"), "w") as matf:
        camera_matrix.reshape((3, 3))
        np.savetxt(matf, (camera_matrix[0], camera_matrix[1], camera_matrix[2]),

    # undistort the image with the calibration
    for img_found in img_names_undistort:
        img = cv2.imread(img_found)
        h, w = img.shape[:2]
        newcameramtx, roi = cv2.getOptimalNewCameraMatrix(camera_matrix, dist_coe

    cv2.destroyAllWindows()
12. Arducam IMX477
For Jetson: IMX477 HQ camera board, 12.3 MP camera board for NVIDIA Jetson
Nano/Xavier NX and the Raspberry Pi Compute Module.
Variant: IMX477 camera board for Jetson boards.
An update over the IMX219 that is hard to miss: much better image quality and lens
flexibility thanks to 53 % more pixels, 92 % larger pixel size, 192 % larger image area
and a CS mount. Maximum still image resolution of 4056 × 3040; frame rates: 30 fps
at the full 12.3 MP, 60 fps at 1080p.
Accelerated by the Jetson hardware ISP engine: it supports the NVIDIA Argus camera
plugin for H.264 encoding, JPEG snapshots as well as gain, exposure, frame rate and
group hold control.
A 4-lane camera for the future: all four data lanes are routed out to the camera
connector for customised carrier boards with a 4-lane CSI connector and future 4-lane
camera driver updates. For the new version (R32 4.4), please go to the link:
https://github.com/ArduCAM/MIPI_Camera/releases
Note:
• It can also be used on the Raspberry Pi Compute Module (such as CM3, CM3+),
but it is not compatible with standard Raspberry Pi models.
• https://github.com/JetsonHacksNano/CSI-Camera/blob/master/simple_camera.py
• https://www.arducam.com/docs/camera-for-jetson-nano/native-jetson-cameras-imx219-imx477/imx477-troubleshoot/
• https://www.arducam.com/sony/imx477/
• https://www.uctronics.com/camera-modules/camera-for-raspberry-pi/high-quality-camera-raspberry-pi-12mp-imx477.html
# MIT License
# Copyright (c) 2019 JetsonHacks
# See license
# Using a CSI camera (such as the Raspberry Pi Version 2) connected to a
# NVIDIA Jetson Nano Developer Kit using OpenCV
# Drivers for the camera and OpenCV are included in the base image

import cv2


def gstreamer_pipeline(
    capture_width=1280,
    capture_height=720,
    display_width=1280,
    display_height=720,
    framerate=60,
    flip_method=0,
):
    return (
        "nvarguscamerasrc ! "
        "video/x-raw(memory:NVMM), "
        "width=(int)%d, height=(int)%d, "
        "format=(string)NV12, framerate=(fraction)%d/1 ! "
        "nvvidconv flip-method=%d ! "
        "video/x-raw, width=(int)%d, height=(int)%d, format=(string)BGRx ! "
        "videoconvert ! "
        "video/x-raw, format=(string)BGR ! appsink"
        % (
            capture_width,
            capture_height,
            framerate,
            flip_method,
            display_width,
            display_height,
        )
    )


def show_camera():
    # To flip the image, modify the flip_method parameter (0 and 2 are the most common)
    print(gstreamer_pipeline(flip_method=0))
    cap = cv2.VideoCapture(gstreamer_pipeline(flip_method=0), cv2.CAP_GSTREAMER)
    if cap.isOpened():
        window_handle = cv2.namedWindow("CSI Camera", cv2.WINDOW_AUTOSIZE)
        # Window
        while cv2.getWindowProperty("CSI Camera", 0) >= 0:
            ret_val, img = cap.read()
            cv2.imshow("CSI Camera", img)
            # This also acts as
            keyCode = cv2.waitKey(30) & 0xFF
            # Stop the program on the ESC key
            if keyCode == 27:
                break
        cap.release()
        cv2.destroyAllWindows()
    else:
        print("Unable to open camera")


if __name__ == "__main__":
    show_camera()
13. Lens Calibration Tool
Arducam Lens Calibration Tool, field of view (FoV) test chart folding card, pack of 2
https://www.amazon.com/-/de/dp/B0872Q1RLD/
13.1. Overview
Are you still frustrated by calculating the FOV of your lenses and by blurred images?
Arducam has now released a multifunctional tool for lenses with which the field of
view of a lens can be determined without calculation and the lens focus can be
calibrated quickly and easily.
13.2. Applications
Focus calibration, sharpness estimation and field of view quick measuring for M12,
CS-Mount, C-Mount, and DSLR lenses
Figure 14.1.: Jetson Nano MSR-Lab in case with Pi camera, front view
On the side of the Jetson Nano (Figure 14.2), the required connections - USB, HDMI,
Ethernet, Micro USB and the microSD card slot - are exposed.
At the back of the Jetson Nano (Figure 14.3) is the J41 header with the GPIO pins.
Figure 14.3.: J41 header of the Jetson Nano MSR-Lab in the case, rear view
14.1. GPIOs
The pinout of the Jetson Nano and the instructions for installing the required libraries
can be found in chapter ??.
The Jetson GPIO library offers four options for numbering the I/O pins, so the
numbering to be used must be specified.
GPIO.setmode(GPIO.BOARD)
or
GPIO.setmode(GPIO.BCM)
or
GPIO.setmode(GPIO.CVM)
or
GPIO.setmode(GPIO.TEGRA_SOC)
A pin is then configured as an input or an output:
GPIO.setup(channel, GPIO.IN)
or
GPIO.setup(channel, GPIO.OUT)
where channel specifies the pin in the chosen numbering scheme. The state of an
output pin prepared in this way can then be changed with
GPIO.output(channel, GPIO.HIGH)
and
GPIO.output(channel, GPIO.LOW)
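As an illustration, the following sketch combines these calls; the pin numbers are examples only (pin 12 matches the LED circuit in figure 14.4, pin 18 stands for an arbitrary input):

import Jetson.GPIO as GPIO

GPIO.setmode(GPIO.BOARD)          # physical pin numbers of the J41 header

GPIO.setup(12, GPIO.OUT)          # pin 12 as output, e.g. for the LED circuit
GPIO.setup(18, GPIO.IN)           # pin 18 as input, e.g. for a push button

GPIO.output(12, GPIO.HIGH)        # switch the output on
if GPIO.input(18) == GPIO.HIGH:   # read the current level of the input
    print("Input pin 18 is high")

GPIO.cleanup()                    # release the pins again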
[Figure: components of the test circuit – 330 Ω resistor, LED, 10 kΩ resistor, pin 12, BC337 transistor]
Figure 14.4.: Circuit for testing the GPIO pins with LED
A photo of the circuit can be seen in figure ??. The cable on the left in this photo is
connected to pin 2, the one in the middle to pin 6 and the one on the right to pin 12.
Figure 14.5.: Circuit for testing the GPIO pins with LED
15. “Hello World” - Image Recognition
This chapter shows how the Jetson Nano can be used for image recognition. However,
no network of our own is trained; instead, an already trained network is used. An
example project from NVIDIA serves this purpose. In the example, an attempt is
made to recognise bottles. The result is a probability; a value of 70 % is considered good.
Model                   Argument        Classes
Detect-COCO-Airplane    coco-airplane   airplanes
Detect-COCO-Bottle      coco-bottle     bottles
Detect-COCO-Chair       coco-chair      chairs
Detect-COCO-Dog         coco-dogs       dogs
ped-100                 pednet          pedestrians
multiped-500            multiped        pedestrians & luggage
facenet-120             facenet         faces
15.2.2. Using the Console Programme imagenet-console.py
The Python programme imagenet-console.py can also be used. The programme
performs an inference for an image with the imageNet model. It expects three
parameters: the first parameter is the image to classify; the second parameter is a file
name under which the image, overlaid with the classification result, is stored; the third
parameter is the classification model.
The call to the application could look like this, for example:
$ ./imagenet-console.py --network=googlenet images/orange_0.jpg output_0.jpg
--width: sets the width of the camera resolution. The default value is 1280.
--height: sets the height of the camera resolution. The default value is 720.
The resolution should be set to a format that the camera supports.
The use of these arguments can be combined as needed. To run the programme with
the GoogleNet network using the Pi camera with a default resolution of 1280 × 720,
entering this command is sufficient:
$ ./imagenet-camera.py
The OpenGL window displays the live camera stream, the name of the classified
object and the confidence of the classification, along with the frame rate of the
network. The Jetson Nano provides up to 75 frames per second for the GoogleNet
and ResNet-18 models.
Since each individual image is analysed in this programme and the result is output,
rapid changes between the identified object classes may occur in the case of ambiguous
assignments. If this is not desired, modifications must be made.
#!/usr/bin/python
import argparse
After the shebang line, the necessary import statements are added. Here these are the
modules jetson.inference and jetson.utils, which are used for loading images and
for image recognition. In addition, the standard package argparse is imported; with
the help of its functions, the command line can be parsed.
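Based on this description, a complete classification programme can be sketched as follows. It is only a sketch; the exact function signatures (for example loadImage versus the older loadImageRGBA) depend on the installed version of the jetson-inference library:

#!/usr/bin/python
import argparse
import jetson.inference
import jetson.utils

# parse the image file and the network name from the command line
parser = argparse.ArgumentParser(description='Classify an image with imageNet')
parser.add_argument('filename', type=str, help='image file to classify')
parser.add_argument('--network', type=str, default='googlenet', help='model to use')
args = parser.parse_args()

# load the image and the pre-trained classification network
img = jetson.utils.loadImage(args.filename)
net = jetson.inference.imageNet(args.network)

# classify the image and print the class description and confidence
class_id, confidence = net.Classify(img)
print('recognised as {} (class #{}) with {:.1f}% confidence'.format(
    net.GetClassDesc(class_id), class_id, confidence * 100))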
Part III.
16. Python Installation
Python is commonly used for developing websites and software, for task automation,
data analysis and data visualisation. Since it is relatively easy to learn, Python has
been adopted by many non-programmers, such as accountants and data scientists, for
a variety of everyday tasks like organising finances. It is an interpreted, object-oriented,
high-level programming language with dynamic semantics. At the time of writing this
report, the latest version of Python is the 3.9.7 release (see 16.1).
17. Virtual environments in Python
When creating software, it is a difficult undertaking to maintain a well-defined
development environment. When developing software with Python, there is support
for this: a virtual environment can be set up. It ensures that the defined package
versions are always used, even if individual software packages on the computer are
updated.
The following command creates a folder python-envs if it does not already exist, and
a new virtual environment in /home/msr/python-envs/env with the name env:
python3 -m venv ~/python-envs/env
After these two steps, the virtual environment can be used. To do this, it must be
activated:
source ~/python-envs/env/bin/activate
The terminal indicates that the environment is active by displaying the environment
name (env) before the user name. To deactivate the environment, simply run the
command deactivate.
Now individual packages can be installed. In order to be able to manage the installed
packages, the package wheel should also be installed; this is needed for the installation
of many subsequent packages.
pip3 install wheel
A new virtual environment should be created for each major project. This way, specific
package versions can be installed. If updating a package causes problems, the virtual
environment can simply be deleted. This is easier than reinstalling the entire system.
In the Windows operating system the package pipenv can now be installed.
pip install pipenv
On the other hand, Linux/macOS users can use the following command to install the
pipenv package after installing LinuxBrew:
brew install pipenv
Now a new virtual environment can be created for the project with the following
command:
pipenv shell
The command creates a new virtual environment virtualenv and a pip file Pipfile
for the project. After installing the package virtualenv, the necessary packages for
the project can now be installed in the virtual environment. In the following command
the packages requests and flask are installed.
pipenv install requests
pipenv install flask
The command reports that the file Pipfile has been changed accordingly. The
following command can be used to uninstall the package.
pipenv uninstall flask
To be able to work outside the project, the virtual environment must be exited:
exit
The virtual environment can then be activated again with the command
source MyProject/bin/activate
With the installation of Anaconda there is an easy way to define a virtual environment.
The command
conda create -n MyProject python=3.7
creates a new virtual environment named MyProject using the specific Python version
of 3.7. After creating the virtual environment, it can be activated:
conda activate MyProject
Alternatively, the package manager inside the virtual environment can be used.
pip install numpy
For documentation purposes, all installed packages within a virtual environment can
be displayed. The following command is used for this, with the figure 17.2 showing a
possible output:
conda list
If one wants to use a different virtual environment, the current one must be deactivated:
conda deactivate
All created virtual environments can also be displayed. This facilitates their manage-
ment. The figure 17.3 shows a possible output.
conda env list
Finally, you can also remove a virtual environment. The following command removes
the virtual environment MyProject with all its packages.
conda env remove -n MyProject
After starting the programme, one can switch to the „Environments“ task according to
Figure 17.4, where the virtual environment base is always displayed. It is the default
setting.
In this dialogue one can press the button „Create“. Then the necessary information
must be entered to create a new virtual environment called new-env, see figure 17.5.
After that, you can display all installed packages and install new ones. To activate
the virtual environment, the corresponding environment must be selected.
TensorFlow Lite, short “TFLite”, is a set of tools for converting and optimising
TensorFlow models for use on mobile and edge devices. TensorFlow Lite, as an Edge
AI implementation, significantly eliminates the challenges of integrating large-scale
computer vision with on-device machine learning, allowing machine learning to be
used anywhere. TensorFlow Lite for Microcontrollers is a machine learning framework
designed to run on microcontrollers and other devices with limited memory. We can
improve the intelligence of billions of devices in our lives, including household appliances
and Internet of Things devices, by bringing machine learning to tiny microcontrollers,
rather than relying on expensive hardware or reliable internet connections, which are
often subject to bandwidth and power constraints and result in high latency. Keeping
the data on the device also helps with data security. TensorFlow Lite is specially
optimised for on-device machine learning (Edge ML). As an Edge ML model, it is
suitable for deployment on resource-constrained edge devices. [Goo19c; DMM20]
3. The class has a call function which will run an input through those layers.
# Load in the TensorFlow library
import tensorflow as tf

# Create an instance of the model
model = MyModel()
In this section, the complete process of training a neural network is shown. The
starting point is a data set containing 70,000 images, already divided into test and
training images. Since the images show the ten digits, the number of classes is ten. A
neural network with one layer is used; all in all, this results in a short training time.
Since the images only have a resolution of 28 × 28 pixels and the neural network is
very small, only an accuracy of about 60 % is achieved. Extending the neural network
with a hidden layer containing 200 neurons improves the result significantly to over
80 %. A further improvement is achieved by using a CNN, which achieves an accuracy
of 99 %. During the training, the results are displayed graphically with the help of
TensorBoard.
Formalities must be observed when training a neural network with TensorFlow. The
following sequence must be taken into account:
import tensorflow.keras.backend as K
from tensorflow.keras.utils import to_categorical

dat_form = K.image_data_format()
rows, cols = 28, 28
train_size = x_train.shape[0]
test_size = x_test.shape[0]

if dat_form == 'channels_first':
    x_train = x_train.reshape(train_size, 1, rows, cols)
    x_test = x_test.reshape(test_size, 1, rows, cols)
    input_shape = (1, rows, cols)
else:
    x_train = x_train.reshape(train_size, rows, cols, 1)
    x_test = x_test.reshape(test_size, rows, cols, 1)
    input_shape = (rows, cols, 1)

# norm data to float in range 0..1
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

# conv class vecs to one hot vec
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
To reduce the training time for the first trials, instead of starting with all training
data, the data is initially limited to 100 images:
x_train = x_train[:100]
y_train = y_train[:100]
The command Flatten() transforms the input data, here a field with 28 by 28 entries,
into a one-dimensional vector of size 784. The function model.add() adds each of
the calculations as a separate layer to the model. Then the function Dense() adds
the calculation of the activation function of the neurons; in this case there are ten
neurons with the activation function softmax. Each of the ten neurons corresponds to
one possible result, i.e. one of the ten digits. Each neuron, i.e. each possible outcome,
is assigned a probability by the function softmax, so that the sum over all neurons is 1.
A value close to 1 is to be interpreted such that the associated result can be assumed
with a high degree of certainty. If all values are close to 0, no reliable statement can
be made.
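A minimal sketch of the model described in this way (not necessarily identical to the original listing) is:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense

# flatten the 28x28 input to a vector of length 784,
# then classify it with ten softmax neurons (one per digit)
model = Sequential()
model.add(Flatten(input_shape=input_shape))
model.add(Dense(10, activation='softmax'))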
TensorFlow uses the above construction to build a graph for the calculations; these are
automatically optimised and transformed into a format that allows efficient calculation
on the hardware. In practical use, the values are calculated from input to output. For
training, TensorFlow automatically determines the derivative of the loss function and
adds the gradient descent calculations to the graph. One function is sufficient for this:
The command passes the loss function categorical_crossentropy to the model.
The selected loss function calculates the cross-entropy between the predicted probability
distribution and the one-hot encoded target vector. To determine the optimal weights
in the neural network, the algorithm Adam() is used, which is a variant of gradient
descent. The algorithm is very robust,
from tensorflow.keras.losses import categorical_crossentropy
from tensorflow.keras.optimizers import Adam

model.compile(
    loss=categorical_crossentropy,
    optimizer=Adam(),
    metrics=['accuracy'])
so that for most applications no changes to the parameters are necessary. By passing
the parameter metrics=['accuracy'], the number of images correctly classified by
the neural network is counted during the calculations.
be executed. If only the history of the respective run is to be visualised, any data
already contained in the log file must be deleted. According to [Goo10], this can be
done with the command
!rm -rf ./logs/
must be prepared:
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, \
histogram_freq=1)
With this callback function, the data, named after the time of the training, is stored.
During the training, the callback function must be passed to the model, which is why
the call to model.fit() must be adapted:
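A possible form of the adapted call is sketched below; the number of epochs and the use of the test data as validation data are assumptions:

model.fit(x_train, y_train,
          epochs=12,
          validation_data=(x_test, y_test),
          callbacks=[tensorboard_callback])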
With the input
%tensorboard --logdir logs/fit
the training is visualised with TensorBoard after it is completed. For the simple model
built above, the history of the training in TensorBoard looks as shown in Figure 18.1.
Figure 18.1.: Training history of the simple model visualised with TensorBoard
Listing 18.9.: Neural network with a second hidden layer and the dropout function
An error message may appear that the file cannot be found. This can be ignored.
After a short time, the TensorBoard should then be able to be reloaded.
Figure 18.2.: Training history of the simple model visualised with TensorBoard
There are now two convolutional layers and one pooling layer in this model. The
newly added Conv2D() layers belong in front of the existing Flatten() layer because
they use the two-dimensional structure of the input images. The first layer learns 32
filters with a kernel size of 3 × 3, the second 64 filters. This is followed by a max
pooling layer that keeps only the largest value from each 2 × 2 field.
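A minimal sketch of a model with these layers (following the description above, not necessarily the original listing) is:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))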
With the training data set limited to 100 samples, even this training is still fast. The
history is shown in Figure 18.3.
Figure 18.3.: Training history of the simple model visualised with TensorBoard
If you train over 12 epochs with all 60,000 training images, the training on a laptop
with an AMD Ryzen 5 3500U processor takes over 15 minutes (about 1.5 minutes
per epoch at about 200 ms per step). In return, an accuracy of over 99 % is already
achieved in the sixth run, which TensorBoard displays as the fifth (see figure 18.4).
The individual elements can now be viewed. After entering the command
iris[’DESCR’]
#!/usr/bin/env python
# coding: utf-8

# In[ ]:
# preparation for tensorboard
get_ipython().run_line_magic('load_ext', 'tensorboard')
import tensorflow as tf
import datetime

# In[ ]:
# clear previous logs
import shutil
shutil.rmtree('./logs', ignore_errors=True)

# In[ ]:
# load mnist dataset
from tensorflow.keras.datasets import mnist
train_da, test_da = mnist.load_data()
x_train, y_train = train_da
x_test, y_test = test_da

# In[ ]:
# data preparation / transformation
import tensorflow.keras.backend as K
from tensorflow.keras.utils import to_categorical

dat_form = K.image_data_format()
rows, cols = 28, 28
train_size = x_train.shape[0]
test_size = x_test.shape[0]

if dat_form == 'channels_first':
    x_train = x_train.reshape(train_size, 1, rows, cols)
    x_test = x_test.reshape(test_size, 1, rows, cols)
    input_shape = (1, rows, cols)
else:
    x_train = x_train.reshape(train_size, rows, cols, 1)
    x_test = x_test.reshape(test_size, rows, cols, 1)
    input_shape = (rows, cols, 1)

# norm data to float in range 0..1
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

# conv class vecs to one hot vec
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# In[ ]:
from sklearn.datasets import load_iris
iris = load_iris()

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
Each record also already contains its classification in the key target. In the figure 18.6
this is listed for the first data sets.
y = pd.DataFrame(data=iris.target, columns = [’irisType’])
.. _iris_dataset:

Iris plants dataset
--------------------
**Data Set Characteristics:**
:Number of Instances: 150 (50 in each of three classes)
:Number of Attributes: 4 numeric, predictive attributes and the class
:Attribute Information:
    - sepal length in cm
    - sepal width in cm
    - petal length in cm
    - petal width in cm
    - class:
        - Iris-Setosa
        - Iris-Versicolour
        - Iris-Virginica
:Summary Statistics:
    ============== ==== ==== ======= ===== ====================
                    Min  Max   Mean    SD   Class Correlation
    ============== ==== ==== ======= ===== ====================
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)
    ============== ==== ==== ======= ===== ====================
:Missing Attribute Values: None
:Class Distribution: 33.3% for each of 3 classes.
:Creator: R.A. Fisher
:Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
:Date: July, 1988
y.head()
Missing Values
The command
pandas.DataFrame.info()
can be used to check whether the data is consistent. Figure 18.8 shows that the
data is complete and the data type is identical for all values. Consequently, no action
is required for this data set.
x.info()
priority, so that the result is distorted. To avoid this, a vector is generated that
contains only the 0 and 1 values, thus encoding the category. This is a so-called
„OneHotVector“. There are two possibilities for this. One could use the function
to_categorical of the library Keras or the function OneHotEncoder of the library
sklearn.
y_train = tf.keras.utils.to_categorical(y_train)
y_test = tf.keras.utils.to_categorical(y_test)
The result for the first five data sets is shown on the right of figure 18.10. It is displayed with
y_train[:5,:]
The left side of the figure 18.10 shows the original coding of the categories. Here you
can also see which data sets were randomly selected as the first five training data.
Figure 18.10.: Coding of the categories before (left) and after conversion
The result of the conversion can be checked in the first data set:
x_train[0]
model1 = Sequential()
model1.add(Dense(64, activation='relu', input_shape=X_train[0].shape))
model1.add(Dense(128, activation='relu'))
model1.add(Dense(128, activation='relu'))
model1.add(Dense(128, activation='relu'))
model1.add(Dense(128, activation='relu'))
model1.add(Dense(64, activation='relu'))
model1.add(Dense(64, activation='relu'))
model1.add(Dense(64, activation='relu'))
model1.add(Dense(64, activation='relu'))
model1.add(Dense(3, activation='softmax'))
Listing 18.15.: Structure of the neural network for the data set Iris
The next step is to finalise the model. For this, the optimisation, the evaluation
function and the metric must be passed.
model1.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
Several candidates are available for the optimisation function:
In the figure 18.11 the progression of accuracy over the epochs is shown. Only five
lines need to be programmed to generate the graph:
plt.plot(history.history[’acc’])
plt.plot(history.history[’val_acc’])
plt.xlabel(’Epochs’)
plt.ylabel(’Acc’)
Figure 18.11.: Course of the accuracy for the training and test data during the training
of the model
plt.plot(history.history[’loss’])
plt.plot(history.history[’val_loss’])
plt.xlabel(’Epochs’)
plt.ylabel(’Loss’)
Figure 18.12.: Course of the loss function for the training and test data during the
training of the model
model1.evaluate(x_test, y_test)
18.7.5. Regularisation
An accuracy of 87 % is also achieved for this model using the test data.
The course of the accuracy and the evaluation function are shown in the figures 18.13
and 18.14.
model2 = Sequential()
model2.add(Dense(64, activation='relu', input_shape=X_train[0].shape))
model2.add(Dense(128, activation='relu', kernel_regularizer=tf.keras.re
model2.add(Dense(128, activation='relu', kernel_regularizer=tf.keras.re
model2.add(tf.keras.layers.Dropout(0.5))
model2.add(Dense(128, activation='relu', kernel_regularizer=tf.keras.re
model2.add(Dense(128, activation='relu', kernel_regularizer=tf.keras.re
model2.add(Dense(64, activation='relu', kernel_regularizer=tf.keras.re
model2.add(Dense(64, activation='relu', kernel_regularizer=tf.keras.re
model2.add(tf.keras.layers.Dropout(0.5))
model2.add(Dense(64, activation='relu', kernel_regularizer=tf.keras.re
model2.add(Dense(64, activation='relu', kernel_regularizer=tf.keras.re
model2.add(Dense(3, activation='softmax', kernel_regularizer=tf.keras.
Listing 18.16.: Construction of the neural network for the data set Iris - Improved
version
plt.plot(history2.history['acc'])
plt.plot(history2.history['val_acc'])
plt.title('Accuracy vs. epochs')
plt.ylabel('Acc')
plt.xlabel('Epoch')
plt.legend(['Training', 'Validation'], loc='lower right')
plt.show()
Figure 18.13.: Course of accuracy for the training and test data during the training of
the second model
The Python program part for the loss function is given as follows:
Figure 18.14.: Course of the loss function for the training and test data during the
training of the second model
#!/usr/bin/env python
# coding: utf-8

# In[ ]:
# Tutorial for Iris Dataset
# code based on the tutorial @ https://www.kdnuggets.com/2020/07/getting-started-t

# In[ ]:
# load data
from sklearn.datasets import load_iris
iris = load_iris()

# In[ ]:
# load other needed libraries
from sklearn.model_selection import train_test_split  # to split data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# In[ ]:
# convert into data frame
X = pd.DataFrame(data=iris.data, columns=iris.feature_names)
y = pd.DataFrame(data=iris.target, columns=['irisType'])

# In[ ]:
# check if data is complete
X.info()

# In[ ]:
# Split data into training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)

# In[ ]:
# check variance
X_train.var(), X_test.var()

# In[ ]:
Superclass | Classes
aquatic mammals | beaver, dolphin, otter, seal, whale
fish | aquarium fish, flatfish, ray, shark, trout
flowers | orchids, poppies, roses, sunflowers, tulips
food containers | bottles, bowls, cans, cups, plates
fruit and vegetables | apples, mushrooms, oranges, pears, sweet peppers
household electrical devices | clock, computer keyboard, lamp, telephone, television
household furniture | bed, chair, couch, table, wardrobe
insects | bee, beetle, butterfly, caterpillar, cockroach
large carnivores | bear, leopard, lion, tiger, wolf
large man-made outdoor things | bridge, castle, house, road, skyscraper
large natural outdoor scenes | cloud, forest, mountain, plain, sea
large omnivores and herbivores | camel, cattle, chimpanzee, elephant, kangaroo
medium-sized mammals | fox, porcupine, possum, raccoon, skunk
non-insect invertebrates | crab, lobster, snail, spider, worm
people | baby, boy, girl, man, woman
reptiles | crocodile, dinosaur, lizard, snake, turtle
small mammals | hamster, mouse, rabbit, shrew, squirrel
trees | maple, oak, palm, pine, willow
vehicles 1 | bicycle, bus, motorcycle, pickup truck, train
vehicles 2 | lawn-mower, rocket, streetcar, tank, tractor
Each data set thus provides 50,000 training images of 32×32 pixels with three colour channels. A vector of length 50,000 holds the corresponding labels. To see in which form the labels are coded, one can look at the first ten entries:
print(y_train10[:10])
[[6]
 [9]
 [9]
 [4]
 [1]
 [1]
 [2]
 [7]
 [8]
 [3]]
If you look at several entries, you will notice that the ten categories are coded with the numbers 0 to 9. This means that the label vector has to be transformed into a one-hot vector representation.
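For illustration, this transformation can be done with the Keras utility to_categorical (the variable name on the left is only an example); the document applies the same function to the merged label vector further below:
y_train10_onehot = tf.keras.utils.to_categorical(y_train10)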
In Figure 18.16 the first 25 training data are shown with associated labels. This can
be easily programmed:
plt.figure(figsize=(10,10))
for i in range(25):
plt.subplot(5,5,i+1)
plt.xticks([])
plt.yticks([])
plt.grid(False)
plt.imshow(X_train10[i], cmap=plt.cm.binary)
plt.xlabel(y_train10[i])
plt.show()
Figure 18.16.: Visualisation of the first 25 training data from the data set CIFAR-10
Similarly, examples from the CIFAR-100 dataset can be displayed in Figure 18.17:
plt.figure(figsize=(10,10))
for i in range(25):
plt.subplot(5,5,i+1)
plt.xticks([])
plt.yticks([])
plt.grid(False)
plt.imshow(X_train100[i], cmap=plt.cm.binary)
plt.xlabel(y_train100[i])
plt.show()
Figure 18.17.: Visualisation of the first 25 training data from the data set CIFAR-100
It can be seen that the categories are numbered in alphabetical order (0-apples, 1-aquarium fish, ...; cf. table 18.15). Accordingly, the category “bottles” has index 9, which can be checked by displaying the corresponding images.
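A brief sketch of such a check, reusing the plotting pattern from above (hedged, not taken from the original listing):
idx9 = (y_train100 == 9).reshape(X_train100.shape[0])
plt.figure(figsize=(10,10))
for i, img in enumerate(X_train100[idx9][:25]):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(img, cmap=plt.cm.binary)
plt.show()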
Since the goal is to add only the images of the category “bottles” to CIFAR-10, the
CIFAR-100 dataset has to be filtered and reduced so that only the images with index
9 remain. This is done by searching for all entries with the value 9 in the label vector
and applying the resulting structure to the training data to reduce it to these entries.
Since we want to add the filtered data to a dataset that has already assigned the index 9 to an existing category, its label is also changed to 10 in this step.
idx = (y_train100 == 9).reshape(X_train100.shape[0])
X_train100 = X_train100[idx]
y_train100 = y_train100[idx]
for i in range(len(y_train100)):
y_train100[i]=10
With the query
len(X_train100)
we find that the number of training data in the CIFAR-100 dataset has reduced from
50,000 to 500, which is exactly the number of expected entries per category. For a
rough check, we can again display the first 25 entries from the modified CIFAR-100
dataset. As can be seen in Figure 18.18, judging from both the image and the label,
these consist exclusively of the desired category.
Figure 18.18.: Visualisation of the first 25 training data in the filtered CIFAR-100
dataset
Unfortunately, the two data sets cannot simply be merged directly: the added category would only have 500 images, while there are 5,000 images for each of the
other categories. As a result, this category would not be sufficiently trained and would
be recognised less well than the other categories. Now there are two possibilities. From
the 500 images, 5,000 images could be generated, but this is costly. The alternative
is to reduce the images of the other categories to 500. For this purpose, a similar
procedure is now also applied to the data from the CIFAR-10 dataset in order to
represent all categories equally in the new dataset.
X_train10_red = [None]*5000
y_train10_red = [None]*5000
for i in range(10):
idx = (y_train10 == i).reshape(X_train10.shape[0])
x=X_train10[idx]
y=y_train10[idx]
X_train10_red[i*500:i*500+500] = x[0:500]
y_train10_red[i*500:i*500+500] = y[0:500]
With the command concatenate from numpy, the two data sets, our modified versions
of CIFAR-10 and CIFAR-100, can now be linked.
X_train = np.concatenate((X_train10, X_train100))
y_train = np.concatenate((y_train10, y_train100))
The length of the two arrays is now 5500 entries. To see how the two datasets have
been merged, entries 4999 to 5023 of the newly created dataset can be viewed. In
Figure 18.19 it can be seen that the data of the modified dataset CIFAR-100 has been
appended to the data of the dataset CIFAR-10. In addition, because of the per-category filtering, the reduced CIFAR-10 data is now sorted by category.
Figure 18.19.: Visualisation of the transition between the two data sets
To correct this, the created data set is shuffled. It is important that images and labels are shuffled in the same way so that they still fit together. For this, the auxiliary variable shuffler is defined, which holds a single random permutation that is applied identically to both arrays.
shuffler = np.random.permutation(len(X_train))
X_train = X_train[shuffler]
y_train = y_train[shuffler]
Then the same section of the dataset looks as shown in Figure 18.20.
y_train = tf.keras.utils.to_categorical(y_train)
print(X_train.shape)
print(y_train.shape)
print(y_train[1])
The same steps for modifying and merging the datasets must also be carried out for the test data. The code for creating the dataset, without the additional steps for visualisation, is shown below in Listing 18.21.
#!/usr/bin/env python
# coding: utf-8

# In[ ]:
import numpy as np
import tensorflow as tf
from tensorflow.keras import datasets

# In[ ]:
# In[ ]:
# In[ ]:
for i in range(10):
    idx = (y_train10 == i).reshape(X_train10.shape[0])
    x = X_train10[idx]
    y = y_train10[idx]
    X_train10_red[i*500:i*500+500] = x[0:500]
    y_train10_red[i*500:i*500+500] = y[0:500]

X_test10_red = [None]*1000
y_test10_red = [None]*1000
for i in range(10):
    idx = (y_test10 == i).reshape(X_test10.shape[0])
    x = X_test10[idx]
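The remainder of this loop is cut off in the extract; by analogy with the reduction of the training data above, and since CIFAR-10 provides 1,000 test images per class (100 of which are kept here), it can be assumed to continue along these lines:
    y = y_test10[idx]
    X_test10_red[i*100:i*100+100] = x[0:100]
    y_test10_red[i*100:i*100+100] = y[0:100]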
Table 18.1.: Visualisation of training progress with the data set CIFAR10 (accuracy and loss plots for each configuration)
Dataset 1, Model 1, Epochs 100, Batch Size 32
Dataset 1, Model 2, Epochs 100, Batch Size 32

Table 18.2.: Visualisation of the training courses with the modified data set (accuracy and loss plots for each configuration)
Dataset 2, Model 1, Epochs 100, Batch Size 32
Dataset 2, Model 1, Epochs 200, Batch Size 32
Dataset 2, Model 1, Epochs 100, Batch Size 64
Dataset 2, Model 1, Epochs 100, Batch Size 128
Dataset 2, Model 1, Epochs 100, Batch Size 256
Dataset 2, Model 2, Epochs 100, Batch Size 32
Dataset 2, Model 2, Epochs 200, Batch Size 32
Dataset 2, Model 2, Epochs 100, Batch Size 64
Dataset 2, Model 2, Epochs 100, Batch Size 128
Dataset 2, Model 2, Epochs 100, Batch Size 256
Table 18.3 shows the accuracy and loss values at the end of the training for the training and validation data, as well as for the evaluation with the test data after training.
Table 18.3.: Model quality and training time for different parameters: D-dataset,
M-model, E-epochs, B-batch size
D  M  E    B    | train. acc | train. loss | val. acc | val. loss | test acc | test loss | train. time
1  1  100   32  | 0.9498     | 0.1522      | 0.7109   | 1.586     | 0.7066   | 1.567     | 65.1
1  2  100   32  | 0.9937     | 0.0219      | 0.7356   | 1.323     | 0.7316   | 1.363     | 1,366.2
2  1  100   32  | 0.9782     | 0.0647      | 0.5482   | 3.441     | 0.5367   | 3.306     | 7.4
2  1  200   32  | 0.9914     | 0.0299      | 0.5309   | 4.229     | 0.5224   | 4.350     | 14.8
2  1  100   64  | 0.9861     | 0.0460      | 0.5564   | 2.990     | 0.5427   | 3.157     | 6.1
2  1  100  128  | 0.9793     | 0.0663      | 0.5355   | 2.994     | 0.5392   | 3.060     | 5.6
2  1  100  256  | 0.9616     | 0.1003      | 0.5491   | 2.642     | 0.5543   | 2.602     | 5.2
2  2  100   32  | 0.9932     | 0.0413      | 0.5836   | 1.598     | 0.5582   | 1.730     | 152.3
2  2  200   32  | 0.9989     | 0.0094      | 0.6073   | 1.715     | 0.5578   | 1.954     | 303.0
2  2  100   64  | 0.9884     | 0.0548      | 0.5773   | 1.562     | 0.5603   | 1.679     | 151.9
2  2  100  128  | 0.9891     | 0.0489      | 0.6036   | 1.537     | 0.5602   | 1.686     | 151.4
2  2  100  256  | 0.9857     | 0.0570      | 0.5936   | 1.561     | 0.5569   | 1.727     | 154.6
As might be expected, the full CIFAR-10 dataset achieves higher accuracy when trained with AlexNet, since the model sees many more images. For the simpler model, the small dataset appears to reach high accuracy during training, but there is evident overfitting, since the test accuracy for the small, modified dataset is much lower than the training accuracy. At the same time, training with the full CIFAR-10 dataset on the AlexNet architecture requires a training time of almost a full day on a machine with 64 GB RAM and an Intel i9-9900K processor (8 cores, 3600 MHz).
It can be seen that training with 100 epochs is absolutely sufficient. The results vary
slightly with the batch size, but no clear pattern can be seen here.
The accuracies achieved are not particularly high. This is not surprising, however, when one considers the reduced training effort and compares with the results of the original AlexNet authors: even with over 1.2 million high-resolution images from the ImageNet dataset, a top-1 error rate of 37.5 % was still observed. In this respect, the results achieved here with TensorFlow and far fewer, low-resolution images are plausible, if not satisfactory.
#!/usr/bin/env python
# coding: utf-8
# Script for Training with standard and manipulated CIFAR10 dataset
# (see Script 'Dataset.py') with a simple CNN and AlexNet
#
# Written by C. Joachim based on different sources (see references in code)
# January 2021
#
# If the manipulated dataset is to be used, run this script in advance
# such that the dataset can be loaded

# In[ ]:
import tensorflow as tf
import time
from tensorflow import keras

# In[ ]:
# Choose parameters for training
Dataset = 1   # Dataset=1 for standard CIFAR10, Dataset=2 for own dataset
Model = 1     # Model=1 for simple CNN, Model=2 for AlexNet
Epochs = 100  # choose number of epochs
Batch = 256   # choose Batch Size (standard is 32, 64, 128, 256)

# In[ ]:
# import dataset
if Dataset == 1:
    # load CIFAR10
    from tensorflow.keras import datasets
    (X_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()
    # 10 Categories --> 10 output neurons needed
    cat = 10
if Dataset == 2:
    # load manipulated CIFAR10 dataset (already normalized and OneHot-encoded)
    get_ipython().run_line_magic('store', '-r')
    (X_train, y_train, X_test, y_test) = Data_CIFAR
    # 11 Categories --> 11 output neurons needed
    cat = 11

# In[ ]:
if Model == 1:
    # use Model from MNIST-Tutorial from 'c't Python-Projekte by Heise
    # with modified input size and adjustable number of output neurons
    #
    # if the simple CNN is chosen, normalize data
    X_train = X_train/255
    X_test = X_test/255
    # if the Dataset is original CIFAR-10, we still need to
    # convert to OneHot-Vector
    if Dataset == 1:
        y_train = tf.keras.utils.to_categorical(y_train)
        y_test = tf.keras.utils.to_categorical(y_test)
epochs=10,
validation_data=(test_images, test_labels),
callbacks=[cp_callback]) # Pass callback to training
A warning message may be generated in this way, but it can be ignored.
The weights can be shared between two models with the same architecture by means of the checkpoint file that has been created. Let model be a second model with the same architecture as the model whose weights were saved; the weights are then loaded from the checkpoint into the new model as follows:
model.load_weights(checkpoint_path)
model.save_weights(’./checkpoints/my_checkpoint’)
SavedModel Format
!mkdir -p saved_model
model.save(’saved_model/my_model’)
Saving a model is only useful if it can be used later. The model can be reloaded from the saved model:
new_model = tf.keras.models.load_model(’saved_model/my_model’)
HDF5 Format
model.save(’my_model.h5’)
The model is then loaded accordingly, using the file name including the .h5 extension:
new_model = tf.keras.models.load_model(’my_model.h5’)
1. tf.data
When using an NVIDIA graphics card, the simplest solution for monitoring GPU utilisation over time is probably the tool nvtop. A possible output is shown in Figure 18.21.
profile_batch={BATCH_INDEX_TO_MONITOR}
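For context, a hedged sketch of how such a profile_batch argument is typically passed to the Keras TensorBoard callback (the log directory and the batch range are illustrative):
tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir="logs/profile",
                                                profile_batch=(10, 15))
# model.fit(..., callbacks=[tensorboard_cb])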
18.11.2. tf.data
To keep the GPU working continuously, the bottleneck of loading the data must be eliminated. In Figure 18.23 it can be seen that the GPU is idle while the CPU is busy loading data.
As a first step towards eliminating this bottleneck, one should switch from Keras sequences to tf.data. Further optimisations, for example in the patch selection, can then be made.
The following example shows optimised code for creating a dataset with sharp and blurred images:
18.11.4. Multi-GPU-Training
The easiest way to perform multi-GPU training is to use the MirroredStrategy. It instantiates the model on each GPU. At each step, different batches are sent to the GPUs, which execute the backward pass. The gradients are then aggregated to update the weights, and the updated values are propagated to each instantiated model.
With TensorFlow 2.0, this distribution strategy is again quite simple to use. One only has to remember to multiply the usual batch size by the number of available GPUs:
# Define multi-gpu strategy
mirrored_strategy = tf.distribute.MirroredStrategy()
# Update batch size value
batch_size *= mirrored_strategy.num_replicas_in_sync
# Create strategy scope to perform training
with mirrored_strategy.scope():
model = [...]
model.fit(...)
• tf.config.threading:
– set_intra_op_parallelism_threads: sets the number of threads used for parallelism within an individual operation.
– set_inter_op_parallelism_threads: sets the number of threads used for parallelism between independent operations.
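A minimal sketch of how these two settings could be applied in a script (the thread counts are illustrative and would normally be tuned to the number of physical cores):
import tensorflow as tf

tf.config.threading.set_intra_op_parallelism_threads(8)  # threads within one operation
tf.config.threading.set_inter_op_parallelism_threads(2)  # threads across independent operations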
# dataset creation for optimizing training time
# from https://www.kdnuggets.com/2020/03/tensorflow-optimizing-training-tim
import tensorflow as tf

class TensorflowDatasetLoader:
    def __init__(self, dataset_path, batch_size=4, patch_size=(256, 256), n_e
        # List all images paths
        sharp_images_paths = [str(path) for path in Path(dataset_path).glob("*/sh
        if n_images is not None:
            sharp_images_paths = sharp_images_paths[0:n_images]
        # Generate corresponding blurred images paths
        blur_images_paths = [path.replace("sharp", "blur") for path in sharp_imag
        # Define dataset characteristics (batch_size, number_of_epochs, shuffling
        dataset = dataset.batch(batch_size)
        dataset = dataset.shuffle(buffer_size=50)
        dataset = dataset.repeat()
        dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
        self.dataset = dataset

    @staticmethod
The value OMP_NUM_THREADS is set to the number of available cores, and the parameter KMP_BLOCKTIME is set to a smaller value for the larger network.
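A minimal sketch of how these environment variables might be set from Python before TensorFlow is imported (the values are illustrative):
import os

os.environ["OMP_NUM_THREADS"] = "8"  # e.g. the number of available cores
os.environ["KMP_BLOCKTIME"] = "1"    # small block time in milliseconds

import tensorflow as tf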
19.3. Procedure
The process for using TensorFlow Lite consists of four steps:
1. Choice of a model:
An existing, already trained model can be adopted as is or trained further yourself, or a model can be created from scratch and trained.
2. Conversion of the model:
If a custom model is used that is accordingly not in TensorFlow Lite format, it
must be converted to the format using the TensorFlow Lite converter.
Train a Model
Figure 19.2 illustrates the basic process for creating a model that can be used on
an edge computer. Most of the process uses standard TensorFlow tools.
[Figure content: TensorFlow pipeline from the definition of a 32-bit model, through training (trained 32-bit model), quantisation (trained 8-bit model) and export (model in the .pb file format), to conversion for the edge computer.]
Figure 19.2.: The basic process for creating a model for an edge computer[Goo19a]
In general, converting models reduces file size and introduces optimisations without
compromising accuracy. The TensorFlow Lite converter continues to offer options to
further reduce file size and increase execution speed, with some trade-offs in accuracy.
[Goo20a][WS20]
One way to optimise is quantisation. While the weights of the models are usually
stored as 32-bit floating point numbers, they can be converted to 8-bit integers with
only a minimal loss in the accuracy of the model. This saves memory and also allows
for faster computation, as a CPU works more efficiently with integers. [WS20]
This converts the saved model into the format tflite and saves it in the specified
location with the specified name. [Goo11a] The flag ’wb’ merely specifies that it is
written in binary mode.
In addition, this model can now be optimised as already described. For this purpose,
the optimisation must be specified [WS20] before the actual conversion:
converter.optimizations = [tf.lite.Optimize.DEFAULT]
In addition to the DEFAULT variant, there are also options to specifically optimise for size or latency. However, according to [Ten21], these are deprecated, so DEFAULT should be used.
Here <username> is the user name used on the Jetson Nano and <IP address> is its IP address. The IP address can be displayed in a terminal on the Jetson Nano using the command
$ ip addr show
Then the shape/size of the input and output tensors can be determined so that
subsequently the data can be adjusted accordingly.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
With this information, the input data, in this example a random array, can be adjusted and set as the input data:
input_shape = input_details[0][’shape’]
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0][’index’], input_data)
Then the interpreter is executed and determines the result of the model:
interpreter.invoke()
Finally, the output of the model can be determined and read out:
output_data = interpreter.get_tensor(output_details[0][’index’])
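To turn this raw output into a class prediction, the entry with the highest value can be taken, analogous to the PC-side evaluation script further below (a brief sketch, assuming numpy is imported as np; the variable names are illustrative):
pred_conf = np.amax(output_data)    # highest confidence value
pred_index = np.argmax(output_data) # index of the predicted category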
#!/usr/bin/env python
# coding: utf-8
# Script for converting a trained model (see Script Training_CIFAR.py)
# to a TensorFlow-Lite-Model without optimization
#
# Based on https://www.tensorflow.org/api_docs/python/tf/lite/TFLiteConverter

# In[2]:
import tensorflow as tf

# In[1]:
# In[3]:
# Convert the model
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/"+"Dataset %
tflite_model = converter.convert()

# In[4]:
# Save the model.
with open('model1.tflite', 'wb') as f:
    f.write(tflite_model)

# In[ ]:
#!/usr/bin/env python
# coding: utf-8
# Script for converting a trained model (see Script Training_CIFAR.py)
# to a TensorFlow-Lite-Model with optimization
#
# Based on https://www.tensorflow.org/api_docs/python/tf/lite/TFLiteConverter

# In[ ]:
import tensorflow as tf

# In[ ]:
under conditions closer to the application, using a few more or less arbitrary images selected from the Google Image Search for the various categories.
Per category, five photos in .jpg or .jpeg format are selected from the internet and, to a small extent, from the photos contained in the project jetson-inference (https://github.com/dusty-nv/jetson-inference). Care is taken to ensure that the images are not too similar, so that the results are approximately representative and meaningful despite the small number of images per category.
The selected images are shown in figure ??; to facilitate the search for comparable photos, a reference to their source, i.e. the link to the image in the Google Image Search, is given in table 19.2.
bottle3.jpg https://images.app.goo.gl/wPaB7Q7u9DTbzhd38
bottle4.jpg https://images.app.goo.gl/HLKGVanQACtDxxmm8
With the selected images, the trained model is first tested on the work PC in the Jupyter notebook in order to determine the performance of the original model on the test images as a reference. The script for this is test_original.py. Based on the selection of the model parameters (dataset, model, epochs and batch size; see script training_CIFAR.py), it first reads in the model to be evaluated. Then the selected test image is loaded. This must be normalised and scaled in the same way as during training, so the built-in functions of TensorFlow are used again for this (tf.image.per_image_standardization and tf.image.resize). Finally, the model can be run and the result evaluated. The category with the maximum probability and the corresponding description are determined.
After the test on the work PC, the models converted to the tflite format are used on the Jetson Nano. The programme tflite-foto.py works similarly to the previous one: the selected .tflite model is loaded, as well as the selected image, which is normalised and resized in the same way. Before that, however, the TensorFlow Lite interpreter is initialised and the tensors are allocated. In addition, the necessary sizes of the input and output tensors are determined. After the input image has been prepared and defined as the input tensor, the model is executed, including a measurement of the computing time. Afterwards, the result can be evaluated as usual. In this case, the result is also superimposed on the input image, and the resulting image is stored in a separate folder.
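The interpreter initialisation and tensor allocation mentioned above take, in the TensorFlow Lite Python API, the following form (a hedged sketch; the model file name is an example):
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model1.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()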
Table 19.3.: Test results for the different models: A - model with modified dataset,
AlexNet, 100 epochs, batch size 32 in .pb format; A1 - tflite model
without quantisation; A2 - tflite model with quantisation; B - model with
CIFAR-10 dataset, AlexNet, 100 epochs, batch size 32 in .pb format; B1 -
tflite model without quantisation; B2 - tflite model with quantisation
Results
Image    A    A1    A2    B    B1    B2
airplane0 99.9% ship 99.9% ship 99.7% ship 99.7% plane 94.4% bird 51.0% bird
airplane1 94.3% plane 94.3% plane 94.0% plane 99.9% plane 100% plane 100% plane
airplane2 62.0% auto 62.0% auto 99.2% auto 85.2% auto 85.2% auto 94.8% frog
airplane3 99.9% auto 99.9% auto 99.9% auto 100% plane 100% plane 100% plane
airplane4 99.9% plane 100% plane 99.9% plane 100% plane 100% plane 100% plane
auto0 99.7% horse 99.7% horse 100% horse 84.7% horse 84.7% horse 91.0% horse
auto1 94.6% auto 94.6% auto 97.8% auto 99.9% auto 100% auto 100% auto
auto2 42.0% bird 42.0% bird 95.9% truck 100% auto 100% auto 100% auto
auto3 100% auto 100% auto 100% auto 100% auto 100% auto 100% auto
auto4 100% horse 100% horse 100% horse 89.0% auto 89.0% auto 79.3% auto
bird0 100% bird 100% bird 100% bird 99.9% bird 99.9% bird 100% bird
bird1 95.3% horse 95.3% horse 85.2% horse 47.9% plane 47.9% plane 93.8% bird
bird2 99.7% frog 99.7% frog 99.7% frog 99.9% bird 100% bird 100% bird
bird3 80.4% truck 80.4% truck 41.5% truck 91.7% bird 91.7% bird 99.3% bird
bird4 99.9% bird 99.9% bird 99.4% bird 99.8% plane 99.8% plane 99.5% plane
cat0 97.9% horse 97.9% horse 98.6% horse 91.3% horse 91.3% horse 74.3% plane
cat1 74.2% horse 74.2% horse 66.3% horse 57.6% horse 57.6% horse 72.5% frog
cat2 99.9% horse 99.9% horse 99.8% horse 98.2% cat 98.2% cat 57.6% auto
cat3 99.6% bird 99.6% bird 99.8% bird 78.7% dog 78.7% dog 65.3% plane
cat4 99.0% horse 99.0% horse 59.6% bird 100% horse 100% horse 100% horse
deer0 86.6% cat 86.6% cat 80.2% cat 99.9% frog 100% frog 100% frog
deer1 99.8% horse 99.8% horse 99.8% horse 100% horse 100% horse 100% horse
deer2 99.9% frog 99.9% frog 99.9% frog 100% frog 100% frog 100% frog
deer3 96.6% bird 96.6% bird 96.4% bird 100% deer 100% deer 100% deer
deer4 97.3% truck 97.3% truck 97.8% truck 100% deer 100% deer 100% deer
dog0 90.0% truck 90.0% truck 97.8% truck 99.8% horse 99.8% horse 72.3% horse
dog1 98.7% dog 98.7% dog 89.0% dog 98.7% deer 100% dog 100% dog
dog2 99.3% horse 99.3% horse 99.0% horse 78.1% plane 78.1% plane 74.3% bird
dog3 99.9% bird 100% bird 100% bird 99.9% bird 99.9% bird 99.6% bird
dog4 99.9% horse 100% horse 99.99% horse 85.8% horse 85.8% horse 100% horse
frog0 95.0% truck 95.0% truck 87.19% truck 97.7% plane 97.7% plane 93.1% plane
frog1 46.5% bird 46.5% bird 44.1% frog 58.5% plane 58.5% plane 41.7% frog
frog2 99.9% horse 99.9% horse 99.9% horse 95.7% bird 100% frog 100% frog
frog3 99.3% bottle 99.3% bottle 96.3% bottle 99.5% frog 99.5% frog 99.9% frog
frog4 62.8% bird 62.8% bird 89.0% bird 58.1% plane 58.1% plane 62.3% plane
horse0 100% bird 100% bird 100% bird 99.9% horse 100% horse 100% horse
horse1 66.7% bird 66.7% bird 83.16% horse 77.5% frog 77.5% frog 99.6% frog
horse2 88.1% horse 88.1% horse 95.0% horse 99.9% horse 100% horse 100% horse
horse3 90.0% horse 90.0% horse 98.5% horse 100% horse 100% horse 100% horse
horse4 86.2% horse 86.2% horse 78.2% horse 77.7% dog 77.7% dog 50.5% bird
ship0 99.9% auto 100% auto 100% auto 85.8% truck 85.8% truck 50.9% frog
ship1 69.2% ship 69.2% ship 50.7% auto 99.0% plane 99.0% plane 99.7% plane
ship2 66.4% bird 66.4% bird 76.7% bird 50.4% frog 50.5% frog 83.3% plane
ship3 54.2% bottle 54.2% bottle 87.1% bottle 98.6% plane 98.6% plane 73.1% plane
ship4 73.8% ship 73.8% ship 97.4% truck 99.9% plane 100% plane 100% plane
truck0 52.6% horse 52.57% horse 99.9% horse 99.9% plane 100% plane 99.9% plane
truck1 99.9% truck 100% truck 100% truck 100% truck 100% truck 100% truck
truck2 99.9% truck 99.9% truck 99.6% truck 99.9% plane 99.9% plane 99.3% plane
truck3 64.5% auto 64.5% truck 99.3% auto 99.9% plane 99.9% plane 99.8% plane
truck4 99.9% bird 99.9% bird 99.9% bird 99.9% horse 100% horse 100% horse
bottle0 99.9% bottle 100% bottle 100% bottle - - -
bottle1 99.9% bird 99.9% bottle 99.5% bird - - -
bottle2 96.7% horse 96.7% horse 89.7% horse - - -
bottle3 93.7% truck 93.7% truck 99.9% truck - - -
bottle4 81.5% bird 81.5% bird 96.6% bird - - -
Table 19.4 records the computing time required to classify the images with the respective network in .tflite format, with and without optimisation; it is printed by the script tflite-foto.py together with the classification on the output image and in the terminal. All images are processed in one run, so that
#!/usr/bin/env python
# coding: utf-8

# In[ ]:
import tensorflow as tf
import numpy as np
import time
from PIL import Image

# In[ ]:
# choose model parameters
Dataset = 2
Model = 2  # must not be changed
Epochs = 100
Batch = 32

# In[ ]:
# load trained model
model = tf.keras.models.load_model("saved_model/"+"Dataset %s Model %s Epochs %s

# In[ ]:
# choose image
file_in = "bottle3.jpg"

# In[ ]:
# prepare image - adjust size and normalize
img = tf.io.read_file("images/"+file_in)
img = tf.image.decode_jpeg(img, channels=3)
img = tf.reshape(img, (1, img.shape[0], img.shape[1], 3))
img = tf.image.per_image_standardization(img)
img = tf.image.resize(img, (227, 227))

# In[ ]:
# evaluate image with model
prob = model.predict(img)

# In[ ]:
# find class with maximum confidence
conf = np.amax(prob)
index = np.argmax(prob)
#!/usr/bin/python
#
# Script for using tflite-Models with photos from disc
# The tflite-Models were trained with the original CIFAR-10 dataset
# and a manipulated CIFAR-10 dataset (category 'bottle' added from CIFAR-100)
#
# Documentation in the Document 'Kuenstliche Intelligenz mit dem Jetson Nano' by
# Related to the Project 'Kuenstliche Intelligenz mit dem Jetson Nano und TensorF
#
# Written by C. Joachim in January 2021
# based on the imagenet-console from https://github.com/dusty-nv/jetson-inference
# and TensorFlow Lite inference with Python from https://www.tensorflow.org/lite/
# and https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examp
#
# To run enter (in the example using the image bottle_1 and the model1)
# $ python3 ./tflite-foto.py bottle_1.jpg m1_bottle_1.jpg model1.tflite
#
# The Models:
# model1.tflite: manipulated CIFAR-10, AlexNet-architecture, trained in 100 epoch
# model2.tflite: manipulated CIFAR-10, AlexNet-architecture, trained in 100 epoch
# model3.tflite: original CIFAR-10, AlexNet-architecture, trained in 100 epochs w
# model4.tflite: original CIFAR-10, AlexNet-architecture, trained in 100 epochs w

import jetson.inference
import jetson.utils
import argparse
import sys
import tensorflow as tf
from PIL import Image
import numpy as np
import time

# parse the command line
parser = argparse.ArgumentParser(description="Classify an image using tflite-mod
                                 formatter_class=argparse.RawT
try:
    opt = parser.parse_known_args()[0]
except:
    print("")
    parser.print_help()
    sys.exit(0)

# Get input and output tensors
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Adjust input
height = input_details[0]['shape'][1]
width = input_details[0]['shape'][2]
img = Image.open("images/"+opt.file_in).resize((width, height))
input_data = np.expand_dims(img, axis=0)
input_data = (np.float32(input_data) - np.mean(input_data)) / np.std(input_data)
an increased computing time due to the first call of the model only occurs with the
image “airplane0”.
Table 19.4.: Computing time for the different models: A1 - model with modified dataset, AlexNet, 100 epochs, batch size 32, tflite model without quantisation; A2 - model with modified dataset, AlexNet, 100 epochs, batch size 32, tflite model with quantisation; B1 - model with CIFAR-10 dataset, AlexNet, 100 epochs, batch size 32, tflite model without quantisation; B2 - model with CIFAR-10 dataset, AlexNet, 100 epochs, batch size 32, tflite model with quantisation.
Computing Time
Image    A1    A2    B1    B2
airplane0 2.63 0.67 2.73 0.66
airplane1 0.31 0.13 0.35 0.13
airplane2 0.32 0.13 0.32 0.13
airplane3 0.32 0.13 0.31 0.13
airplane4 0.31 0.13 0.32 0.13
auto0 0.32 0.13 0.32 0.13
auto1 0.32 0.13 0.32 0.13
auto2 0.32 0.13 0.31 0.13
auto3 0.31 0.13 0.32 0.13
auto4 0.31 0.13 0.32 0.13
bird0 0.31 0.13 0.32 0.13
bird1 0.31 0.13 0.31 0.13
bird2 0.31 0.13 0.31 0.13
bird3 0.31 0.13 0.31 0.13
bird4 0.31 0.13 0.31 0.13
cat0 0.32 0.13 0.35 0.13
cat1 0.32 0.13 0.32 0.13
cat2 0.31 0.13 0.31 0.13
cat3 0.31 0.13 0.32 0.13
cat4 0.31 0.13 0.32 0.13
deer0 0.31 0.13 0.32 0.13
deer1 0.32 0.13 0.32 0.13
deer2 0.32 0.13 0.32 0.13
deer3 0.32 0.13 0.32 0.13
deer4 0.31 0.13 0.32 0.13
dog0 0.31 0.13 0.32 0.13
dog1 0.32 0.13 0.32 0.13
dog2 0.32 0.13 0.32 0.13
dog3 0.32 0.13 0.32 0.13
dog4 0.32 0.13 0.32 0.13
frog0 0.31 0.13 0.32 0.13
frog1 0.32 0.13 0.31 0.13
frog2 0.32 0.13 0.32 0.13
frog3 0.31 0.13 0.31 0.13
frog4 0.31 0.13 0.32 0.13
horse0 0.31 0.13 0.32 0.13
horse1 0.32 0.13 0.32 0.13
horse2 0.31 0.13 0.32 0.13
horse3 0.32 0.13 0.32 0.13
horse4 0.31 0.13 0.32 0.13
The results of Alex Krizhevsky [KSH12a] show that even the original AlexNet, when trained with 1.2 million high-resolution images of the ImageNet dataset, still has a top-1 error rate of 37.5 %. In this respect, an error rate in the range of about twice this value for a training with only 5,500 or even 60,000 images at a resolution of only 32 × 32 is quite understandable. Other sources also confirm the accuracy curves observed on the training and validation data during training, which indicates that the training was conducted correctly and thus supports the results. [Var20]
Due to these high error rates for models using the AlexNet architecture, many more complex architectures have since been developed. If reliable classification is to be realised in an application, it would make sense to use one of these more advanced architectures, but this would also increase the training time accordingly. The same applies to the use of many more training images. This would lead to a stronger generalisation of the learned patterns, so that, for example, the risk of the model classifying on the basis of the backgrounds instead of the subjects can be reduced.
However, if the training time is not to be increased by such measures, it makes sense
to use an already pre-trained network and to train it further on the primary category
to be recognised for the application.
In this respect, only the first pass through the KDD scheme was realised here. Now,
in order to realise the actual application, it is necessary to proceed iteratively in order
to improve the mentioned aspects so that finally a model can be generated that can
be used in reality.
Regardless of the model quality, it can be stated that the models could be converted
into the .tflite format and evaluated on the Jetson Nano without major impairments.
The conversion did not necessarily increase the error rate. Even between the optimised
and non-optimised model, contrary to expectations, no increase in the error rate is
recognisable in the tests. On the other hand, the optimisation is effective in reducing
the file size and optimising the computing time.
#!/usr/bin/python
#
# Script for using tflite-Models with connected camera in 'live'-mode,
# 50 frames are classified and the category with the highest probability is displ
# If a bottle is detected, the chosen Pin is set to high
# Another Pin indicates whether program is ready to execute detection
# Pins to be used as output can be defined below the importing commands
#
# The tflite-Models were trained with a manipulated CIFAR-10 dataset (category 'b
#
# Documentation in the Document 'Kuenstliche Intelligenz mit dem Jetson Nano' by
# Related to the Project 'Kuenstliche Intelligenz mit dem Jetson Nano und TensorF
#
# Written by C. Joachim in January 2021
# Based on the modified imagenet-camera for bottle-detection from previous projec
# and TensorFlow Lite inference with Python from https://www.tensorflow.org/lite/
# and https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examp
# original imagenet-camera from https://github.com/dusty-nv/jetson-inference
#
# The Models:
# model1.tflite: manipulated CIFAR-10, AlexNet-architecture, trained in 100 epoch
# model2.tflite: manipulated CIFAR-10, AlexNet-architecture, trained in 100 epoch
#
# Can be run with
# $ python3 ./tflite-camera.py model1.tflite
# where the latter defines the model to use.
# For further options see parsing of command line
#
#
import jetson.inference
import jetson.utils
import argparse
import sys
import atexit
import Jetson.GPIO as GPIO
import numpy as np
import time

BottleDet = 12
Running = 13

# prepare the Pins as Outputs
GPIO.setmode(GPIO.BOARD)
GPIO.setup(BottleDet, GPIO.OUT, initial=GPIO.LOW)  # Signal for bottle detected
GPIO.setup(Running, GPIO.OUT, initial=GPIO.LOW)  # Signal for image detection runni

# parse the command line
parser = argparse.ArgumentParser(description="Classify a live camera stream using
                                 formatter_class=argparse.RawTextHelpFormatter)
parser.add_argument("model", type=str, default="model1.tflite", help="model to lo
parser.add_argument("--camera", type=str, default="0", help="index of the MIPI CS
parser.add_argument("--width", type=int, default=1280, help="desired width of cam
parser.add_argument("--height", type=int, default=720, help="desired height of ca
try:
    opt = parser.parse_known_args()[0]
except:
    print("")
#!/usr/bin/python
#
# Script for using tflite-Models with connected camera in 'live'-mode
# Only classify the image whenever key 'q' is pressed (then take a photo and anal
# If a bottle is detected, the chosen Pin is set to high
# Another Pin indicates whether program is ready to execute detection
# Pins to be used as output can be defined below the importing commands
#
# The tflite-Models were trained with a manipulated CIFAR-10 dataset (category 'b
#
# Documentation in the Document 'Kuenstliche Intelligenz mit dem Jetson Nano' by
# Related to the Project 'Kuenstliche Intelligenz mit dem Jetson Nano und TensorF
#
# Written by C. Joachim in January 2021
# Based on the modified imagenet-camera for bottle-detection from previous projec
# and TensorFlow Lite inference with Python from https://www.tensorflow.org/lite/
# and https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examp
# original imagenet-camera from https://github.com/dusty-nv/jetson-inference
#
# The Models:
# model1.tflite: manipulated CIFAR-10, AlexNet-architecture, trained in 100 epoch
# model2.tflite: manipulated CIFAR-10, AlexNet-architecture, trained in 100 epoch
#
# Need to install Keyboard Module with
# sudo pip3 install keyboard
#
# Can be run with
# $ sudo python3 ./tflite-camera-button.py model1.tflite
# where the latter defines the model to use.
# For further options see parsing of command line
#
# Create a folder named 'out_images_camera' on the same level beforehand
#
#
import jetson.inference
import jetson.utils
import argparse
import sys
import atexit
import Jetson.GPIO as GPIO
import numpy as np
import time
import keyboard

BottleDet = 12
Running = 13

# prepare the Pins as Outputs
GPIO.setmode(GPIO.BOARD)
GPIO.setup(BottleDet, GPIO.OUT, initial=GPIO.LOW)  # Signal for bottle detected
GPIO.setup(Running, GPIO.OUT, initial=GPIO.LOW)  # Signal for image detection runni

# parse the command line
parser = argparse.ArgumentParser(description="Classify a live camera stream using
                                 formatter_class=argparse.RawTextHelpFormatter)
parser.add_argument("model", type=str, default="model1.tflite", help="model to lo
parser.add_argument("--camera", type=str, default="0", help="index of the MIPI CS
parser.add_argument("--width", type=int, default=1280, help="desired width of cam
Part IV.
Appendix
20. Material List
Number Designation Link Price
NVIDIA JETSON NANO MODULE
SMALL. POWERFUL. POWERED BY AI.
KEY FEATURES
Jetson Nano module
>> 128-core NVIDIA Maxwell GPU
>> Quad-core ARM® A57 CPU
>> 4 GB 64-bit LPDDR4
>> 16 GB eMMC 5.1
>> 10/100/1000BASE-T Ethernet
Power
>> Voltage Input: 5 V
>> Module Power: 5 W~10 W
Environment
>> Operating Temperature: -25 C to 80 C*
>> Storage Temperature: -25 C to 80 C
>> Humidity: 85% RH, 85ºC [non-operational]
>> Vibration: Sinusoidal 5 G RMS 10 to 500 Hz, random 2.88 G RMS, 5 to 500 Hz [non-operational]
>> Shock: 140 G, half sine 2 ms duration [non-operational]
* See thermal design guide for details
© 2019 NVIDIA Corporation. All rights reserved. NVIDIA, the NVIDIA logo, CUDA, Jetson, Jetson Nano, NVIDIA Maxwell, and TensorRT are
trademarks and/or registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other company and product names may be
trademarks of the respective companies with which they are associated. ARM, AMBA and ARM Powered are registered trademarks of ARM
Limited. Cortex, MPCore and Mali are trademarks of ARM Limited. All other brands or product names are the property of their respective holders.
“ARM” is used to represent ARM Holdings plc; its operating company ARM Limited; and the regional subsidiaries ARM Inc.; ARM KK; ARM Korea
Limited.; ARM Taiwan Limited; ARM France SAS; ARM Consulting (Shanghai) Co. Ltd.; ARM Germany GmbH; ARM Embedded Technologies Pvt.
Ltd.; ARM Norway, AS and ARM Sweden AB. Jul19
DATA SHEET
Audio
Industry standard High Definition Audio (HDA) controller provides a multichannel audio path to the HDMI interface.

Memory
Dual Channel | System MMU | Memory Type: 4ch x 16-bit LPDDR4 | Maximum Memory Bus Frequency: 1600MHz | Peak Bandwidth: 25.6 GB/s | Memory Capacity: 4GB

Storage
eMMC 5.1 Flash Storage | Bus Width: 8-bit | Maximum Bus Frequency: 200MHz (HS400) | Storage Capacity: 16GB

Boot Sources
eMMC and USB (recovery mode)

Networking
10/100/1000 BASE-T Ethernet | Media Access Controller (MAC)

Imaging
Dedicated RAW to YUV processing engines process up to 1400Mpix/s (up to 24MP sensor) | MIPI CSI 2.0 up to 1.5Gbps (per lane) | Support for x4 and x2 configurations (up to four active streams).

Operating Requirements
Temperature Range (Tj): -25 – 97C* | Module Power: 5 – 10W | Power Input: 5.0V

Multi-Stream HD Video and JPEG
Video Decode
H.265 (Main, Main 10): 2160p 60fps | 1080p 240fps
H.264 (BP/MP/HP/Stereo SEI half-res): 2160p 60fps | 1080p 240fps
H.264 (MVC Stereo per view): 2160p 30fps | 1080p 120fps
VP9 (Profile 0, 8-bit): 2160p 60fps | 1080p 240fps
VP8: 2160p 60fps | 1080p 240fps
VC-1 (Simple, Main, Advanced): 1080p 120fps | 1080i 240fps
MPEG-2 (Main): 2160p 60fps | 1080p 240fps | 1080i 240fps
Video Encode
H.265: 2160p 30fps | 1080p 120fps
H.264 (BP/MP/HP): 2160p 30fps | 1080p 120fps
H.264 (MVC Stereo per view): 1440p 30fps | 1080p 60fps
VP8: 2160p 30fps | 1080p 120fps
JPEG (Decode and Encode): 600 MP/s

Peripheral Interfaces
xHCI host controller with integrated PHY: 1 x USB 3.0, 3 x USB 2.0 | USB 3.0 device controller with integrated PHY | EHCI controller with embedded hub for USB 2.0 | 4-lane PCIe: one x1/2/4 controller | single SD/MMC controller (supporting SDIO 4.0, SD HOST 4.0) | 3 x UART | 2 x SPI | 4 x I2C | 2 x I2S: support I2S, RJM, LJM, PCM, TDM (multi-slot mode) | GPIOs

Mechanical
Module Size: 69.6 mm x 45 mm | PCB: 8L HDI | Connector: 260 pin SO-DIMM

Note: Refer to the software release feature list for current software support; all features may not be available for a particular OS.
◊ Product is based on a published Khronos Specification and is expected to pass the Khronos Conformance Process. Current conformance status can be found at www.khronos.org/conformance.
* See the Jetson Nano Thermal Design Guide for details. Listed temperature range is based on module Tj characterization.
Revision History
Version Date Description
v0.1 JAN 2019 Initial Release
v0.7 MAY 2019 Description
• Memory: corrected peak bandwidth
• Peripheral Interfaces: corrected number of available I2C interfaces
Functional Overview
• Removed block diagram; see the Jetson Nano Product Design Guide for these details
Power and System Management
• Removed On-Module Internal Power Rails table
• Updated Power Domains table
• Updated Programmable Interface Wake Event table
• Updated Power Up/Down sequence diagrams
Pin Descriptions
• Updated throughout to reflect updated pinmux
• GPIO Pins: updated table to reflect dedicated GPIO pins only (see pinmux for ALL GPIO
capable pins)
Interface Descriptions
• Updated throughout to reflect updated pinmux
• Embedded DisplayPort (eDP) Interface: clarified DP use/limitations on DP0
• MIPI Camera Serial Interface (CSI) - Updated CSI description to remove erroneous
reference to virtual channels
Physical/Electrical Characteristics
• Absolute Maximum Ratings - Added reference to Jetson Nano Thermal Design Guide for
Operating Temperature; extended IDDMAX to 5A
• Pinout: Updated to reflect updated pinmux
• Package Drawing and Dimensions – Updated drawing
v0.8 OCT 2019 Description
• Operating Requirements: corrected Module Power to reflect power for module only
(previous stated range included module + IO); updated Temperature Range for clarity,
included maximum operating temperature and updated note to reflect module
temperature is based on Tj .
v1.0 FEB 2020 Pin Descriptions
• GPIO Pins: corrected pin number listing for GPIO01
Interface Descriptions
• High-Definition Multimedia Interface (HDMI) and DisplayPort (DP) Interfaces reference to
YUV output support
• Gigabit Ethernet – Corrected Realtek Gigabit Ethernet Controller part number
Physical/Electrical Characteristics
• Operating and Absolute Maximum Ratings – Added Mounting Force to Absolute
Maximum Ratings table.
• Package Drawing and Dimensions – Updated drawing
• Environmental & Mechanical Screening – Added section
Table of Contents
1.0 Functional Overview 4
1.1 Maxwell GPU........................................................................................................................................................................ 4
1.2 CPU Complex....................................................................................................................................................................... 5
1.2.1 Snoop Control Unit and L2 Cache....................................................................................................................... 5
1.2.2 Performance Monitoring ....................................................................................................................................... 6
1.3 High-Definition Audio-Video Subsystem............................................................................................................................. 6
1.3.1 Multi-Standard Video Decoder............................................................................................................................. 6
1.3.2 Multi-Standard Video Encoder ............................................................................................................................. 6
1.3.3 JPEG Processing Block ....................................................................................................................................... 7
1.3.4 Video Image Compositor (VIC) ............................................................................................................................ 7
1.4 Image Signal Processor (ISP) ............................................................................................................................................. 8
1.5 Display Controller Complex ................................................................................................................................................. 8
1.6 Memory ................................................................................................................................................................................. 9
2.0 Power and System Management 10
2.1 Power Rails ........................................................................................................................................................................ 11
2.2 Power Domains/Islands ..................................................................................................................................................... 11
2.3 Power Management Controller (PMC).............................................................................................................................. 12
2.3.1 Resets ................................................................................................................................................................. 12
2.3.2 System Power States and Transitions .............................................................................................................. 12
2.3.2.1 ON State ................................................................................................................................................... 12
2.3.2.2 OFF State ................................................................................................................................................. 13
2.3.2.3 SLEEP State............................................................................................................................................. 13
2.4 Thermal and Power Monitoring ......................................................................................................................................... 13
2.5 Power Sequencing ............................................................................................................................................................. 14
2.5.1 Power Up ............................................................................................................................................................ 14
2.5.2 Power Down........................................................................................................................................................ 14
3.0 Pin Descriptions 15
3.1 MPIO Power-on Reset Behavior ....................................................................................................................................... 15
3.2 MPIO Deep Sleep Behavior .............................................................................................................................................. 16
3.3 GPIO Pins........................................................................................................................................................................... 17
4.0 Interface Descriptions 18
4.1 USB..................................................................................................................................................................................... 18
4.2 PCI Express (PCIe)............................................................................................................................................................ 19
4.3 Display Interfaces............................................................................................................................................................... 20
4.3.1 MIPI Display Serial Interface (DSI) .................................................................................................................... 20
4.3.2 High-Definition Multimedia Interface (HDMI) and DisplayPort (DP) Interfaces............................................... 21
4.3.3 Embedded DisplayPort (eDP) Interface ............................................................................................................ 22
4.4 MIPI Camera Serial Interface (CSI) / VI ( Video Input) ..................................................................................................... 23
4.5 SD / SDIO ........................................................................................................................................................................... 25
4.6 Inter-IC Sound (I2S)............................................................................................................................................................ 25
4.7 Miscellaneous Interfaces ................................................................................................................................................... 26
4.7.1 Inter-Chip Communication (I2C) ........................................................................................................................ 26
4.7.2 Serial Peripheral Interface (SPI) ........................................................................................................................ 27
4.7.3 UART ................................................................................................................................................................... 29
4.7.4 Gigabit Ethernet.................................................................................................................................................. 30
4.7.5 Fan ...................................................................................................................................................................... 30
4.7.6 Debug .................................................................................................................................................................. 31
5.0 Physical / Electrical Characteristics 32
5.1 Operating and Absolute Maximum Ratings ...................................................................................................................... 32
5.2 Digital Logic ........................................................................................................................................................................ 33
5.3 Pinout.................................................................................................................................................................................. 35
5.4 Package Drawing and Dimensions ................................................................................................................................... 36
The Maxwell GPU architecture introduced an all-new design for the SM, redesigned all unit and crossbar structures, optimized
data flows, and significantly improved power management. The SM scheduler architecture and algorithms were rewritten to
be more intelligent and avoid unnecessary stalls, while further reducing the energy per instruction required for scheduling. The
organization of the SM also changed; each Maxwell SM (called SMM) is now partitioned into four separate processing blocks,
each with its own instruction buffer, scheduler and 32 CUDA cores.
The SMM CUDA cores perform pixel/vertex/geometry shading and physics/compute calculations. Texture units perform
texture filtering and load/store units fetch and save data to memory. Special Function Units (SFUs) handle transcendental and
graphics interpolation instructions. Finally, the Polymorph Engine handles vertex fetch, tessellation, viewport transform,
attribute setup, and stream output. The SMM geometry and pixel processing performance make it highly suitable for rendering
advanced user interfaces and complex gaming applications; the power efficiency of the Maxwell GPU enables this
performance on devices with power-limited environments.
GPU frequency and voltage are actively managed by Tegra Power and Thermal Management Software and are influenced by the workload. The frequency may be throttled at higher temperatures (above a specified threshold), reducing the GPU operating frequency. Observed chip-to-chip variance is due to NVIDIA's ability to maximize performance (DVFS) on a per-chip basis within the available power budget.
CPU frequency and voltage are actively managed by Tegra Power and Thermal Management Software and are influenced by the workload. The frequency may be throttled at higher temperatures (above a specified threshold), reducing the CPU operating frequency. Observed chip-to-chip variance is due to NVIDIA's ability to maximize performance (DVFS) on a per-chip basis within the available power budget.
2MB L2
Fixed line length of 64 bytes
16-way set-associative cache structure
The video decoder communicates with the memory controller through the video DMA which supports a variety of memory
format output options. For low power operations, the video decoder can operate at the lowest possible frequency while
maintaining real-time decoding using dynamic frequency scaling techniques.
Features:
Color Decompression
High-quality Deinterlacing
Inverse Teleciné
Temporal Noise Reduction
- High-quality video playback
- Reduces camera sensor noise
Scaling
Color Conversion
Memory Format Conversion
Blend/Composite
2D Bit BLIT operation
Rotation
Features:
Flexible post-processing architecture for supporting custom computer vision and computational imaging operations
Bayer domain hardware noise reduction
Per-channel black-level compensation
High-order lens-shading compensation
3 x 3 color transform
Bad pixel correction
Programmable coefficients for de-mosaic with color artifact reduction
Color Artifact Reduction: a two-level (horizontal and vertical) low-pass filtering scheme that is used to reduce/remove
any color artifacts that may result from Bayer signal processing and the effects of sampling an image.
Enhanced down scaling quality
Edge Enhancement
Color and gamma correction
Programmable transfer function curve
Color-space conversion (RGB to YUV)
Image statistics gathering (per-channel)
- Two 256-bin image histograms
- Up to 4,096 local region averages
- AC flicker detection (50Hz and 60Hz)
- Focus metric block
1.6 Memory
The Jetson Nano integrates 4GB of LPDDR4 over a four-channel x16 interface (64 bits total). Memory frequency options are 204MHz and 1600MHz; at the maximum frequency of 1600MHz, the theoretical peak memory bandwidth is 25.6GB/s.
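As a quick sanity check on the quoted figure, the peak bandwidth follows directly from the bus width and data rate, assuming the 1600MHz option refers to the DDR command clock (i.e. 3200MT/s on the 64-bit bus):

```python
# Rough sanity check of the quoted LPDDR4 peak bandwidth.
# Assumption: 1600 MHz is the DDR clock, so data toggles at 3200 MT/s.
channels = 4                   # four LPDDR4 channels
bits_per_channel = 16          # x16 interface per channel
transfers_per_s = 2 * 1600e6   # DDR: two transfers per clock cycle

bus_bytes = channels * bits_per_channel / 8   # 8 bytes of bus width
peak_bw = bus_bytes * transfers_per_s         # bytes per second
print(f"{peak_bw / 1e9:.1f} GB/s")            # -> 25.6 GB/s
```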
The Memory Controller (MC) maximizes memory utilization while providing minimum latency access for critical CPU requests.
An arbiter is used to prioritize requests, optimizing memory access efficiency and utilization and minimizing system power
consumption. The MC provides access to main memory for all internal devices. It provides an abstract view of memory to its
clients via standardized interfaces, allowing the clients to ignore details of the memory hierarchy. It optimizes access to shared
memory resources, balancing latency and efficiency to provide best system performance, based on programmable
parameters.
Features:
TrustZone (TZ) Secure and OS-protection regions
System Memory Management Unit
Dual CKE signals for dynamic power down per device
Dynamic Entry/Exit from Self-Refresh and Power Down states
The MC can sustain high utilization over a very diverse mix of requests. For example, the MC prioritizes bandwidth (BW) over latency for all multimedia blocks (the multimedia blocks have been architected to prefetch and pipeline their operations to increase latency tolerance); this enables the MC to optimize performance by coalescing, reordering, and grouping requests to minimize memory power. DRAM also has modes for saving power when it is either not being used, or during periods of
specific types of use.
Power Management Controller (PMC) and Real Time Clock (RTC): These blocks reside in an Always On (not
power gated) partition. The PMC provides an interface to an external power manager IC or PMU. It primarily controls
voltage transitions for the SoC as it transitions to/from different low power modes; it also acts as a slave receiving
dedicated power/clock request signals as well as wake events from various sources (e.g., SPI, I2C, RTC, USB
attach) which can wake the system from a deep-sleep state. The RTC maintains the ability to wake the system based
on either a timer event or an external trigger (e.g., key press).
Power Gating: The SoC aggressively employs power-gating (controlled by PMC) to power-off modules which are
idle. CPU cores are on a separate power rail to allow complete removal of power and eliminate leakage. Each CPU
can be power gated independently. Software provides context save/restore to/from DRAM.
Clock Gating: Used to reduce dynamic power in a variety of power states.
Dynamic Voltage and Frequency Scaling (DVFS): Raises voltages and clock frequencies when demand requires,
lowers them when less is sufficient, and removes them when none is needed. DVFS is used to change the voltage
and frequencies in the following power domains: CPU, CORE, and GPU.
An optional back-up battery can be attached to the PMIC_BBAT module input. It is used to maintain the RTC voltage when VDD_IN is not present. This pin is connected directly to the onboard PMIC. When a backup cell is connected to the PMIC, the RTC will retain its contents and can be configured to charge the backup cell.
The backup cell MUST provide a voltage in the range 2.5V to 3.5V. The cell is charged with a constant-current (CC), constant-voltage (CV) charger that can be configured for a CV output between 2.5V and 3.5V and a CC of 50µA to 800µA.
251–260 VDD_IN Input 5.0V Power: Main DC input; supplies the PMIC and other regulators.
235 PMIC_BBAT Bidirectional 1.65V–5.5V Power: PMIC battery back-up. Optionally used to provide back-up power for the Real-Time Clock (RTC). Connects to a lithium cell or super capacitor on the carrier board. The PMIC is the supply while charging the capacitor or coin cell; the super capacitor or coin cell is the source when the system is disconnected from power.
240 SLEEP/WAKE* Input CMOS – 5.0V, PU Sleep / Wake. Configured as GPIO for optional use to place the system in sleep mode or wake the system from sleep.
237 POWER_EN Input CMOS – 5.0V Module on/off: high = on, low = off.
233 SHUTDOWN_REQ* Output CMOS – 5.0V, z Shutdown Request: used by the module to request a shutdown from the carrier board (POWER_EN low). 100kΩ pull-up to VDD_IN (5V) on the module.
239 SYS_RESET* Bidirectional Open Drain, 1.8V, 1 Module Reset: resets the module when driven low by the carrier board. Once the module power sequence is complete, it is used as the carrier board supply enable, ensuring proper power on/off sequencing between module and carrier board supplies. 4.7kΩ pull-up to 1.8V on the module.
178 MOD_SLEEP* Output CMOS – 1.8V Indicates the module sleep status: low = sleep mode, high = normal operation. This pin is controlled by system software and should not be modified.
2.3.1 Resets
The PMC receives the primary reset event (from SYS_RESET*) and generates various resets for: PMC, RTC, and CAR. From
the PMC provided reset, the Clock and Reset (CAR) controller generates resets for most of the blocks in the module. In
addition to reset events, the PMC receives other events (e.g., thermal, WatchDog Timer (WDT), software, wake) which also
result in variants of system reset.
The RTC block includes an embedded real-time clock and can wake the system based on either a timer event or an external
trigger (e.g., key press).
[Figure: power-state diagram showing transitions among the OFF, ON, and SLEEP states, driven by ON, OFF, SLEEP, and WAKE events.]
2.3.2.1 ON State
The ON power state is entered from either OFF or SLEEP states. In this state the Jetson module is fully functional and
operates normally. An ON event must occur to transition from OFF to ON. The only ON event currently used is a low-to-high transition on the POWER_EN pin, with VDD_IN connected to a power rail and POWER_EN asserted (logic 1). POWER_EN is the carrier board's indication to the Jetson module that the VDD_IN power is good; the carrier board should assert it high only when VDD_IN has reached its required voltage level and is stable. This prevents the Jetson module from powering up until the VDD_IN power is stable.
NOTE: The Jetson Nano module does include an Auto-Power-On option; a system input that enables the
module to power on if asserted. For more information on available signals and broader system
usage, see the Jetson Nano Product Design Guide.
HW Shutdown: Drive the POWER_EN pin low for at least 100µs; the internal PMIC then starts the shutdown sequence. (Applies in the ON state.)
Thermal Shutdown: If the internal temperature of the Jetson module reaches an unsafe level, the hardware is designed to initiate a shutdown. (Applies in any power state.)
The SLEEP state can only be entered directly by software. For example, when operating within an OS, a period with no activity can trigger the OS to initiate a transition to the SLEEP state.
To Exit the SLEEP state a WAKE event must occur. WAKE events can occur from within the Jetson module or from external
devices through various pins on the Jetson Nano connector. A full list of Wake enabled pins is available in the pinmux.
RTC Wake-up: Timers within the Jetson module can be programmed on SLEEP entry; when they expire, they create a WAKE event that exits the SLEEP state.
Thermal Condition: If the Jetson module internal temperature exceeds the programmed hot or cold limits, the system is forced to wake so it can report the condition and take appropriate action (shut down, for example).
USB VBUS detection: If VBUS is applied to the system (USB cable attached), the device can be configured to wake and enumerate.
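On L4T, the RTC wake timer described above is reachable from user space through the standard Linux rtc sysfs interface. The following is a hedged sketch that arms a wake alarm before the system is suspended; the choice of rtc0 and the suspend command are assumptions about the software configuration, not guarantees from this datasheet.

```python
# Minimal sketch: arm an RTC wake alarm N seconds in the future.
# Assumes the standard Linux /sys/class/rtc interface and that rtc0 is the
# wake-capable RTC on this platform; run with sufficient privileges.
import time

WAKE_IN_SECONDS = 120
wakealarm = "/sys/class/rtc/rtc0/wakealarm"

with open(wakealarm, "w") as f:
    f.write("0")                                      # clear any previous alarm
with open(wakealarm, "w") as f:
    f.write(str(int(time.time()) + WAKE_IN_SECONDS))  # absolute epoch time

# The system can now be suspended (e.g. `systemctl suspend`); the RTC alarm
# generates the WAKE event described in the table above.
```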
2.5.1 Power Up
During power up, the carrier board must wait until the signal SYS_RESET* is deasserted from the Jetson module before
enabling its power; the Jetson module will deassert the SYS_RESET* signal to enable the complete system to boot.
NOTE: I/O pins cannot be high (>0.5V) before SYS_RESET* goes high. When SYS_RESET* is low, the
maximum voltage applied to any I/O pin is 0.5V. For more information, refer to the Jetson Nano
Product Design Guide.
[Timing diagrams: power-up and power-down sequencing of VDD_IN, POWER_EN, Module Power, SYS_RESET*, and SHUTDOWN_REQ*.]
The I/O pins on the SO-DIMM are comprised of both Single Function I/O (SFIO) and Multi-Purpose digital I/O (MPIO) pins.
Each MPIO can be configured to act as a GPIO or it can be assigned for use by a particular I/O controller. Though each MPIO
has up to five functions (GPIO function and up to four SFIO functions), a given MPIO can only act as a single function at a
given point in time. On the Jetson module, each pin's function is fixed to a single SFIO function or to GPIO. The
different MPIO pins share a similar structure, but there are several varieties of such pins. The varieties are designed to
minimize the number of on-board components (such as level shifters or pull-up resistors) required in Jetson Nano designs.
ST (standard) pins are the most common pins on the chip. They are used for typical General Purpose I/O.
DD (dual-driver) pins are similar to the ST pins. A DD pin can tolerate its I/O pin being pulled up to 3.3V (regardless
of supply voltage) if the pin’s output-driver is set to open-drain mode. There are special power-sequencing
considerations when using this functionality.
NOTE: The output of DD pins cannot be pulled High during deep-power-down (DPD).
CZ (controlled output impedance) pins are optimized for use in applications requiring tightly controlled output impedance. They are similar to ST pins except for changes in the drive-strength circuitry and in the weak pull-ups/pull-downs. CZ pins are included on the VDDIO_SDMMC3 (module SDMMC pins) power rail, which also includes a CZ_COMP pin. Circuitry within the Jetson module continually matches the output impedance of the CZ pins to the on-board pull-up/pull-down resistors attached to the CZ_COMP pins.
LV_CZ (low-voltage controlled impedance) pins are similar to CZ pins but are optimized for use with a 1.2V supply voltage (and signaling level). They support a 1.8V supply voltage (and signaling level) as a secondary mode. The Jetson Nano uses LV_CZ pins for SPI interfaces operating at 1.8V.
The DP_AUX pin is used as an auxiliary control channel for DisplayPort, which requires differential signaling. Because the same I/O block is used for DisplayPort and HDMI (to keep the control path to the display interface minimal), the DP_AUX pins can also operate in open-drain mode so that HDMI's control path (the DDC interface, which requires I2C) can use the same pins.
Each MPIO pin consists of:
An output driver with tristate capability, drive strength controls and push-pull mode, open-drain mode, or both
An input receiver with either Schmitt mode, CMOS mode, or both
A weak pull-up and a weak pull-down
MPIO pins are partitioned into multiple “pin control groups” with controls being configured for the group. During normal
operation, these per-pin controls are driven by the pinmux controller registers. During deep sleep, the PMC bypasses and then
resets the pinmux controller registers. Software reprograms these registers as necessary after returning from deep sleep.
Refer to the Tegra X1 (SoC) Technical Reference Manual for more information on modifying pin controls.
NOTE: The output of DD pins cannot be pulled High during deep-power-down (DPD).
OD pins do NOT retain their output during DPD. OD pins should NOT be configured as GPIOs in a platform
where they are expected to hold a value during DPD.
Not all MPIO pins behave identically during deep sleep. They differ with regard to:
Input buffer behavior during deep sleep
- Forcibly disabled OR
- Enabled for use as a “GPIO wake event” OR
- Enabled for some other purpose (e.g., a “clock request” pin)
Output buffer behavior during deep sleep
- Maintain a static programmable (0, 1, or tristate) constant value OR
- Capable of changing state (i.e., dynamic while the chip is still in deep sleep)
Weak pull-up/pull-down behavior during deep sleep
- Forcibly disabled OR
- Can be configured
Pins that do not enter deep sleep
- Some of the pins whose outputs are dynamic during deep sleep are of a special type and do not enter deep sleep at all (e.g., pins associated with PMC logic and pins associated with JTAG never enter deep sleep).
206 GPIO07 Bidirectional CMOS – 1.8V [ST] pd Pulse Width Modulation Signal (PWM)
228 GPIO13 Bidirectional CMOS – 1.8V [ST] pd Pulse Width Modulation Signal
230 GPIO14 Bidirectional CMOS – 1.8V [ST] pd Pulse Width Modulation Signal
4.1 USB
Supported standards:
Universal Serial Bus Specification Revision 3.0: Refer to the specification for related interface timing details.
Universal Serial Bus Specification Revision 2.0: USB Battery Charging Specification, version 1.0, including the Data Contact Detect protocol. Modes: Host and Device. Speeds: Low, Full, and High. Refer to the specification for related interface timing details.
Enhanced Host Controller Interface Specification for Universal Serial Bus, Revision 1.0: Refer to the specification for related interface timing details.
An xHCI/Device controller (named XUSB) supports the xHCI programming model for scheduling transactions and interface management as a host, natively supporting USB 3.0, USB 2.0, and USB 1.1 transactions on its USB 3.0 and USB 2.0 interfaces. The XUSB controller supports USB 2.0 L1 and L2 (suspend) link power management and USB 3.0 U1, U2, and U3 (suspend) link power management. The XUSB controller supports remote wakeup, wake on connect, wake on disconnect, and wake on overcurrent in all power states, including sleep mode.
Each USB 2.0 port operates in USB 2.0 High Speed mode when connecting directly to a USB 2.0 peripheral and operates in
USB 1.1 Full- and Low-Speed modes when connecting directly to a USB 1.1 peripheral. All USB 2.0 ports operating in High
Speed mode share one High-Speed Bus Instance, which means 480 Mb/s theoretical bandwidth is distributed across these
ports. All USB 2.0 ports operating in Full- or Low-Speed modes share one Full/Low-Speed Bus Instance, which means 12
Mb/s theoretical bandwidth is distributed across these ports.
The USB 3.0 port only operates in USB 3.0 Super Speed mode (5 Gb/s theoretical bandwidth).
87 GPIO0 Input USB VBUS, 5V USB 0 VBUS Detect (USB_VBUS_EN0). Do not feed 5V
directly into this pin; see the Jetson Nano Product Design
Guide for complete details.
NOTE: Upstream Type 1 Vendor Defined Messages (VDM) should be sent by the Endpoint Port (EP) only if the Root Port (RP) belongs to the same vendor/partner; otherwise the VDM is silently discarded.
See the Jetson Nano Product Design Guide for supported USB 3.0/PCIe configuration and connection examples.
180 PCIE0_CLKREQ* Bidirectional Open Drain 3.3V, z PCIe Reference Clock Request. This signal is used by a PCIe device to indicate that it needs PCIE0_CLK_N and PCIE0_CLK_P to actively drive the reference clock. 47kΩ pull-up to 3.3V on the module.
Features:
PHY Layer
- Start / End of Transmission. Other out-of-band signaling
- Per DSI interface: one Clock Lane; two Data Lanes
- Supports a 1x2 link configuration
- Maximum link rate of 1.5Gbps, per the MIPI D-PHY v1.1 specification
- Maximum 10MHz LP receive rate
Lane Management Layer with Distributor
Protocol Layer with Packet Constructor
Supports the mandatory features of MIPI DSI v1.0.1
Command Mode (One-shot) with Host and/or display controller as master
Clocks
- Bit Clock: Serial data stream bit-rate clock
- Byte Clock: Lane Management Layer Byte-rate clock
- Application Clock: Protocol Layer Byte-rate clock.
Error Detection / Correction
- ECC generation for packet Headers
- Checksum generation for Long Packets
Error recovery
High-Speed Transmit timer
Low-Power Receive timer
Turnaround Acknowledge Timeout
76 / 78 DSI_CLK_N / DSI_CLK_P Output MIPI D-PHY Differential output clock for the DSI interface.
82 / 84 DSI_D1_N / DSI_D1_P Output MIPI D-PHY Differential data lane 1 for the DSI interface.
70 / 72 DSI_D0_N / DSI_D0_P Bidirectional MIPI D-PHY Differential data lane 0 for the DSI interface; this lane can read data back from the panel side in low-power (LP) mode.
4.3.2 High-Definition Multimedia Interface (HDMI) and DisplayPort (DP) Interfaces
The HDMI and DP interfaces share the same set of interface pins. A new transport mode was introduced in HDMI 2.0 to
enable link clock frequencies greater than 340MHz and up to 600MHz. For transfer rates above 340MHz, there are two main
requirements:
All link data, including active pixel data, guard bands, data islands and control islands must be scrambled.
The TMDS clock lane must toggle at CLK/4 instead of CLK. Below 340MHz, the clock lane toggles as normal
(independent of the state of scrambling).
Features:
HDMI
- HDMI 2.0 mode (3.4Gbps < data rate <= 6Gbps)
- HDMI 1.4 mode (data rate<=3.4Gbps)
- Multi-channel audio from HDA controller, up to eight channels 192kHz 24-bit.
- Vendor Specific Info-frame (VSI) packet transmission
- 24-bit RGB pixel formats
- Transition Minimized Differential Signaling (TMDS) functional up to 340MHz pixel clock rate
DisplayPort
- Display Port mode: interface is functional up to 540MHz pixel clock rate (i.e., 1.62GHz for RBR, 2.7GHz for HBR,
and 5.4GHz for HBR2).
- 8b/10b encoding support
- External Dual Mode standard support
- Audio streaming support
83 / 81 DP1_TXD3_P / DP1_TXD3_N Differential Output [DP], AC-coupled on carrier board. DP Data Lane 3 or HDMI Differential Clock. AC coupling required on the carrier board; for HDMI, pull-downs (with disable) are also required on the carrier board.
77 / 75 DP1_TXD2_P / DP1_TXD2_N
71 / 69 DP1_TXD1_P / DP1_TXD1_N
65 / 63 DP1_TXD0_P / DP1_TXD0_N Differential Output [DP], AC-coupled on carrier board. HDMI Differential Data Lanes 2:0. AC coupling required on the carrier board; for HDMI, pull-downs (with disable) are also required on the carrier board. HDMI mapping: DP1_TXD2_[P,N] = HDMI Lane 0, DP1_TXD1_[P,N] = HDMI Lane 1, DP1_TXD0_[P,N] = HDMI Lane 2.
96 DP1_HPD Input CMOS – 1.8V [ST] HDMI Hot Plug detection. A level shifter is required as this pin is not 5V tolerant.
94 HDMI_CEC Bidirectional Open Drain, 1.8V [DD] Consumer Electronics Control (CEC) one-wire serial bus. NVIDIA provides low-level CEC APIs (read/write); these are not supported in earlier Android releases. For additional CEC support, third-party libraries need to be made available.
100 DP1_AUX_P Bidirectional Open-Drain, 1.8V (3.3V tolerant – DDC) [DP_AUX] DDC Serial Clock for HDMI. A level shifter is required; the pin is not 5V tolerant.
98 DP1_AUX_N Bidirectional Open-Drain, 1.8V (3.3V tolerant – DDC) DDC Serial Data. A level shifter is required; the pin is not 5V tolerant.
83 / 81 DP1_TXD3_P / DP1_TXD3_N
77 / 75 DP1_TXD2_P / DP1_TXD2_N
71 / 69 DP1_TXD1_P / DP1_TXD1_N
65 / 63 DP1_TXD0_P / DP1_TXD0_N Differential Output [DP], AC-coupled on carrier board. DisplayPort 1 Differential Data Lanes 2:0. AC coupling required on the carrier board. DP1_TXD2_[P,N] = DP Lane 2, DP1_TXD1_[P,N] = DP Lane 1, DP1_TXD0_[P,N] = DP Lane 0.
96 DP1_HPD Input CMOS – 1.8V [ST] DisplayPort 1 Hot Plug detection. A level shifter is required and must be non-inverting.
100 / 98 DP1_AUX_P / DP1_AUX_N Bidirectional Open-Drain, 1.8V [DP_AUX] DisplayPort 1 auxiliary channel. AC coupling required on the carrier board.
4.3.3 Embedded DisplayPort (eDP) Interface
eDP is a mixed-signal interface consisting of four differential serial output lanes and one PLL. This PLL is used to generate a high-frequency bit clock from an input pixel clock, enabling 10-bit parallel data per lane to be handled at the pixel rate for the desired mode. Supported Embedded DisplayPort (eDP) link rates are 1.6GHz for RBR; 2.16GHz, 2.43GHz, and 2.7GHz for HBR; and 3.42GHz, 4.32GHz, and 5.4GHz for HBR2.
NOTE: eDP has been tested according to DP1.2b PHY CTS even though eDPv1.4 supports lower swing
voltages and additional intermediate bit rates. This means the following nominal voltage levels
(400mV, 600mV, 800mV, 1200mV) and data rates (RBR, HBR, HBR2) are tested. This interface can
be tuned to drive lower voltage swings below 400mV and can be programmed to other intermediate
bit rates as per the requirements of the panel and the system designer.
59 / 57 DP0_TXD3_P / DP0_TXD3_N
53 / 51 DP0_TXD2_P / DP0_TXD2_N
47 / 45 DP0_TXD1_P / DP0_TXD1_N
41 / 39 DP0_TXD0_P / DP0_TXD0_N Differential Output [DP], AC-coupled on carrier board. DP0 Differential Data. AC coupling and pull-downs (with disable) required on the carrier board. DP0_TXD3_[P,N] = DisplayPort 0 Data Lane 3, DP0_TXD2_[P,N] = DisplayPort 0 Data Lane 2, DP0_TXD1_[P,N] = DisplayPort 0 Data Lane 1, DP0_TXD0_[P,N] = DisplayPort 0 Data Lane 0.
88 DP0_HPD Input CMOS – 1.8V [ST] DP0 Hot Plug detection. A level shifter is required as this pin is not 5V tolerant.
4.4 MIPI Camera Serial Interface (CSI) / VI (Video Input)
The Camera Serial Interface (CSI) is based on the MIPI CSI 2.0 standard specification and implements the CSI receiver, which receives data from an external camera module with a CSI transmitter. The Video Input (VI) block receives data from the CSI receiver and prepares it for presentation to system memory or to the dedicated image signal processor (ISP) execution resources.
If the two streams come from a single source, then the streams are separated using a filter indexed on different data types. In
case of separation using data types, the normal data type is separated from the embedded data type.
213 CAM_I2C_SCL Bidirectional Open Drain – 3.3V [DD] z Camera I2C Clock
215 CAM_I2C_SDA Bidirectional Open Drain – 3.3V [DD] z Camera I2C Data
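For reference, frames from a camera attached to this CSI interface are typically consumed on L4T through the Argus/GStreamer stack. The sketch below is a hedged example that builds a GStreamer pipeline around the nvarguscamerasrc element and reads frames with OpenCV; the element name, caps string, and an OpenCV build with GStreamer support are assumptions about the JetPack software environment rather than properties of the interface itself.

```python
# Minimal sketch: capture frames from a MIPI CSI camera on L4T.
# Assumes JetPack's nvarguscamerasrc GStreamer element and an OpenCV build with
# GStreamer support; adjust width/height/framerate to the sensor mode in use.
import cv2

pipeline = (
    "nvarguscamerasrc ! "
    "video/x-raw(memory:NVMM), width=1280, height=720, framerate=30/1 ! "
    "nvvidconv ! video/x-raw, format=BGRx ! "
    "videoconvert ! video/x-raw, format=BGR ! appsink"
)

cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
ok, frame = cap.read()
if ok:
    print("captured frame:", frame.shape)
cap.release()
```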
4.5 SD / SDIO
Standard: SD Specifications Part E1, SDIO Specification Version 4.00. Notes: Supports the SD 4.0 Specification without UHS-II.
The SecureDigital (SD)/Embedded MultiMediaCard (eMMC) controller is used to support the on-module eMMC and a single
SDIO interface made available for use with SDIO peripherals; it supports Default and High-Speed modes.
The SDMMC controller has a direct memory interface and is capable of initiating data transfers between memory and external
device. The SDMMC controller supports both the SD and eMMC bus protocol and has an APB slave interface to access
configuration registers. Interface is intended for supporting various compatible peripherals with an SD/MMC interface.
SD/SDIO Card: 4-bit bus width, 1.8V/3.3V, up to 208MHz / 104MB/s; available at the connector for SDIO or SD Card use.
4.6 Inter-IC Sound (I2S)
The I2S controller transports streaming audio data between system memory and an audio codec. The I2S controller supports
I2S format, Left-justified Mode format, Right-justified Mode format, and DSP mode format, as defined in the Philips inter-IC-
sound (I2S) bus specification.
The I2S and PCM (master and slave modes) interfaces support clock rates up to 24.5760MHz.
The I2S controller supports point-to-point serial interfaces for the I2S digital audio streams. I2S-compatible products, such as
compact disc players, digital audio tape devices, digital sound processors, and those with digital TV sound may be directly
connected to the I2S controller. The controller also supports the PCM and telephony mode of data-transfer. Pulse-Code-
Modulation (PCM) is a standard method used to digitize audio (particularly voice) patterns for transmission over digital
communication channels. The Telephony mode is used to transmit and receive data to and from an external mono CODEC in
a slot-based scheme of time-division multiplexing (TDM). The I2S controller supports bidirectional audio streams and can
operate in half-duplex or full-duplex mode.
Features:
Basic I2S modes (I2S, RJM, LJM, and DSP) supported in both master and slave modes
PCM mode with short (one bit-clock wide) and long (two bit-clocks wide) fsync in both master and slave modes
NW-mode with independent slot selection for both Tx and Rx
TDM mode with flexibility in the number of slots and slot selection
Capability to drive out a High-Z level outside the prescribed slot for transmission
Flow control for the external input/output stream
211 GPIO09 Output CMOS – 1.8V [ST] PD Audio Codec Master Clock (AUD_MCLK)
195 I2S0_DIN Input CMOS – 1.8V [CZ] PD I2S Audio Port 0 Data In
193 I2S0_DOUT Output CMOS – 1.8V [CZ] PD I2S Audio Port 0 Data Out
197 I2S0_FS Bidirectional CMOS – 1.8V [CZ] PD I2S Audio Port 0 Frame Select (Left/Right Clock)
199 I2S0_SCLK Bidirectional CMOS – 1.8V [CZ] PD I2S Audio Port 0 Clock
222 I2S1_DIN Input CMOS – 1.8V [ST] PD I2S Audio Port 1 Data In
220 I2S1_DOUT Output CMOS – 1.8V [ST] PD I2S Audio Port 1 Data Out
224 I2S1_FS Bidirectional CMOS – 1.8V [ST] PD I2S Audio Port 1 Frame Select (Left/Right Clock)
226 I2S1_SCLK Bidirectional CMOS – 1.8V [ST] PD I2S Audio Port 1 Clock
4.7.1 Inter-Chip Communication (I2C)
This general-purpose I2C controller allows system expansion with I2C-based devices as defined in the NXP inter-IC bus (I2C) specification. The I2C bus supports serial communication with multiple devices; the I2C controller handles clock-source negotiation, speed negotiation for standard and fast devices, and 7-bit slave addressing according to the I2C protocol, and it supports both master and slave modes of operation.
The I2C controller supports the following operating modes: Master – Standard-mode (up to 100kbit/s), Fast-mode (up to 400kbit/s), Fast-mode Plus (Fm+, up to 1Mbit/s); Slave – Standard-mode (up to 100kbit/s), Fast-mode (up to 400kbit/s), Fast-mode Plus (Fm+, up to 1Mbit/s).
185 / 187 I2C0_SCL / I2C0_SDA Bidirectional Open Drain – 3.3V [DD], z I2C 0 clock/data pins. Only 3.3V devices are supported without a level shifter. 2.2kΩ pull-up to 3.3V on the module.
189 / 191 I2C1_SCL / I2C1_SDA Bidirectional Open Drain – 3.3V [DD], z I2C 1 clock/data pins. Only 3.3V devices are supported without a level shifter. 2.2kΩ pull-up to 3.3V on the module.
232 / 234 I2C2_SCL / I2C2_SDA Bidirectional Open Drain – 1.8V [DD], z I2C 2 clock/data pins. Only 1.8V devices are supported without a level shifter. 2.2kΩ pull-up to 1.8V on the module.
213 / 215 CAM_I2C_SCL / CAM_I2C_SDA Bidirectional Open Drain – 3.3V [DD], z Camera I2C clock/data pins. Only 3.3V devices are supported without a level shifter. 4.7kΩ pull-up to 3.3V on the module.
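Under L4T these controllers are exposed to user space as /dev/i2c-* character devices. A minimal sketch using the smbus2 package is shown below; the package choice, bus number, and device/register addresses are placeholders and assumptions, not values from this datasheet.

```python
# Minimal sketch: read one register from an I2C device via Linux i2c-dev.
# Assumes the smbus2 package; bus number and addresses are placeholders --
# check `i2cdetect -l` on the target for the actual bus mapping.
from smbus2 import SMBus

BUS = 1      # hypothetical /dev/i2c-1
ADDR = 0x50  # hypothetical 7-bit slave address
REG = 0x00   # hypothetical register

with SMBus(BUS) as bus:
    value = bus.read_byte_data(ADDR, REG)
    print(f"reg 0x{REG:02X} = 0x{value:02X}")
```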
4.7.2 Serial Peripheral Interface (SPI)
Features:
Independent Rx FIFO and Tx FIFO.
Software controlled bit-length supports packet sizes of 1 to 32 bits.
Packed mode support for bit-length of 7 (8-bit packet size) and 15 (16-bit packet size).
SS_N can be selected to be controlled by software, or it can be generated automatically by the hardware on packet
boundaries.
Receive compare mode (controller listens for a specified pattern on the incoming data before receiving the data in the
FIFO).
Simultaneous receive and transmit supported
Supports Master mode; Slave mode has not been validated
108 SPI1_MISO Bidirectional CMOS – 1.8V [CZ] PD SPI 1 Master In / Slave Out
[Figure: SPI master-mode timing diagrams showing tCS, tCSH, tSU, tHD, tDSU, tDH, and tDD on the SPIx_CSx, SPIx_SCK, SPIx_MOSI, and SPIx_MISO signals.]
Note: The polarity of SCLK is programmable; data can be driven or sampled relative to either the rising edge or the falling edge.
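On L4T these SPI controllers are typically exposed through the Linux spidev interface once routed to external pins (for example with jetson-io). The following is a minimal sketch using the py-spidev package; the bus/device numbers, clock speed, and transferred bytes are placeholders, not values taken from this datasheet.

```python
# Minimal sketch: full-duplex SPI transfer through Linux spidev.
# Assumes the spidev package and that an SPI controller has been routed to the
# header (e.g. with jetson-io); /dev/spidev0.0 and the bytes sent are placeholders.
import spidev

spi = spidev.SpiDev()
spi.open(0, 0)                # bus 0, chip select 0 (placeholder)
spi.max_speed_hz = 1_000_000  # 1 MHz SCK
spi.mode = 0                  # CPOL=0, CPHA=0

rx = spi.xfer2([0x9F, 0x00, 0x00])  # send a placeholder command, read 2 bytes back
print("received:", [hex(b) for b in rx])
spi.close()
```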
4.7.3 UART
The UART controller provides serial data synchronization and data conversion (parallel-to-serial and serial-to-parallel) for both the receiver and transmitter sections. Synchronization of the serial data stream is accomplished by adding start and stop bits to the transmit data to form a data character. Data integrity is accomplished by attaching a parity bit to the data character; the parity bit can be checked by the receiver for transmission bit errors.
NOTE: The UART receiver input has low baud rate tolerance in 1-stop bit mode. External devices must use
2 stop bits.
In 1-stop bit mode, the Tegra UART receiver can lose sync between Tegra receiver and the external
transmitter resulting in data errors/corruption. In 2-stop bit mode, the extra stop bit allows the Tegra
UART receiver logic to align properly with the UART transmitter.
Features:
Synchronization for the serial data stream with start and stop bits to transmit data and form a data character
Supports both 16450- and 16550-compatible modes. Default mode is 16450
Device clock up to 200MHz, baud rate of 12.5Mbits/second
Data integrity by attaching parity bit to the data character
Support for word lengths from five to eight bits, an optional parity bit and one or two stop bits
Support for modem control inputs
DMA capability for both Tx and Rx
8-bit x 36 deep Tx FIFO
11-bit x 36 deep Rx FIFO. Three bits of 11 bits per entry log the Rx errors in FIFO mode (break, framing, and parity
errors as bits 10, 9, 8 of FIFO entry)
Auto sense baud detection
Timeout interrupts to indicate if the incoming stream stopped
Priority interrupts mechanism
Flow control support on RTS and CTS
Internal loopback
SIR encoding/decoding (3/16 or 4/16 baud pulse widths to transmit bit zero)
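The stop-bit note above matters in practice for whatever external device drives this UART. The sketch below is a hedged pyserial configuration for that external transmitter, using two stop bits as the note requires; the device node is a placeholder for the host's own serial adapter.

```python
# Minimal sketch: configure a serial link to the Jetson UART with 2 stop bits,
# as the receiver-tolerance note above requires of external transmitters.
# Assumes pyserial; "/dev/ttyUSB0" is a placeholder for the external host's adapter.
import serial

ser = serial.Serial(
    port="/dev/ttyUSB0",           # placeholder device node
    baudrate=115200,
    bytesize=serial.EIGHTBITS,
    parity=serial.PARITY_NONE,
    stopbits=serial.STOPBITS_TWO,  # 2 stop bits keep the Tegra receiver in sync
    timeout=1.0,
)
ser.write(b"hello\r\n")
print(ser.readline())
ser.close()
```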
4.7.4 Gigabit Ethernet
188 GBE_LED_LINK Output Link LED (green) enable. The Link LED only illuminates when a 1000Mbps link is established; 10/100Mbps links do not light the Link LED.
4.7.5 Fan
The Jetson Nano includes PWM and tachometer functionality to enable fan control as part of a thermal solution. The Pulse Width Modulator (PWM) controller is a frequency divider with a varying pulse width. The PWM runs off a device clock programmed in the Clock and Reset controller, which can be any frequency up to the device clock maximum speed of 48MHz. The PWM source clock is divided by 256 before being further subdivided based on a programmable value.
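Reading that description literally (source clock divided by 256, then by a programmable value), the fan PWM output frequency can be estimated as below. This is only a sketch of the text's arithmetic, not a register-accurate model of the controller; the exact register semantics are in the Tegra TRM.

```python
# Rough estimate of the fan PWM output frequency from the description above:
# the device clock is divided by 256 and then by a programmable divisor.
def pwm_frequency(device_clock_hz: float, divisor: int) -> float:
    """Approximate PWM frequency; see the Tegra TRM for register-level details."""
    return device_clock_hz / 256.0 / divisor

# Example: a 48 MHz device clock with a divisor of 4 -> ~46.9 kHz PWM.
print(f"{pwm_frequency(48e6, 4):,.0f} Hz")
```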
4.7.6 Debug
A debug interface is supported via JTAG on-module test points or via a serial interface over UART1. The JTAG interface can be used for SCAN testing or for communicating with the integrated CPU. See the NVIDIA Jetson Nano Product Design Guide for more information.
WARNING: Exceeding the listed conditions may damage and/or affect long-term reliability of the part.
The Jetson Nano module should never be subjected to conditions extending beyond the
ratings listed below.
Absolute maximum ratings describe stress conditions only; they do not define the minimum and maximum operating conditions that can be tolerated over extended periods of time. If the device is exposed to these conditions for extended periods, no guarantee is made and device reliability may be affected. Operating the Jetson Nano module under these conditions is not recommended.
VM_PIN Voltage applied to any powered I/O pin: -0.5V to VDD + 0.5V. VDD + 0.5V applies when CARRIER_PWR_ON is high and the associated I/O rail is powered. I/O pins cannot be high (>0.5V) before CARRIER_PWR_ON goes high; when CARRIER_PWR_ON is low, the maximum voltage applied to any I/O pin is 0.5V.
DD pins configured as open drain: -0.5V to 3.63V. The pin's output driver must be set to open-drain mode.
TOP Operating Temperature: -25°C to 97°C. See the Jetson Nano Thermal Design Guide for details.
TSTG Storage Temperature (ambient): -40°C to 80°C.
MMAX Mounting Force: 4.0 kgf (kilogram-force) maximum force applied to the PCB. See the Jetson Nano Thermal Design Guide for additional details on mounting a thermal solution.
I2C[1,0] Output Low Voltage (IOL = 2mA): 0.3 x VDD V maximum (see note).
5.4 Pinout
Each row lists two pin pairs in the order: Signal Name, Pin # (top / odd), Pin # (bottom / even), Signal Name; the left and right halves of the table continue the same numbering.
GND 1 2 GND PCIE0 RX0 P 133 134 PCIE0 TX0 N
CSI1_D0_N 3 4 CSI0_D0_N GND 135 136 PCIE0_TX0_P
CSI1_D0_P 5 6 CSI0_D0_P PCIE0_RX1_N 137 138 GND
GND 7 8 GND PCIE0 RX1 P 139 140 PCIE0 TX1 N
RSVD 9 10 CSI0 CLK N GND 141 142 PCIE0 TX1 P
RSVD 11 12 CSI0_CLK_P RSVD 143 144 GND
GND 13 14 GND KEY KEY KEY KEY
CSI1 D1 N 15 16 CSI0 D1 N RSVD 145 146 GND
CSI1 D1 P 17 18 CSI0 D1 P GND 147 148 PCIE0 TX2 N
GND 19 20 GND PCIE0_RX2_N 149 150 PCIE0_TX2_P
CSI3_D0_N 21 22 CSI2_D0_N PCIE0_RX2_P 151 152 GND
CSI3 D0 P 23 24 CSI2 D0 P GND 153 154 PCIE0 TX3 N
GND 25 26 GND PCIE0 RX3 N 155 156 PCIE0 TX3 P
CSI3_CLK_N 27 28 CSI2_CLK_N PCIE0_RX3_P 157 158 GND
CSI3 CLK P 29 30 CSI2 CLK P GND 159 160 PCIE0 CLK N
GND 31 32 GND USBSS RX N 161 162 PCIE0 CLK P
CSI3 D1 N 33 34 CSI2 D1 N USBSS RX P 163 164 GND
CSI3_D1_P 35 36 CSI2_D1_P GND 165 166 USBSS_TX_N
GND 37 38 GND RSVD 167 168 USBSS TX P
DP0 TXD0 N 39 40 CSI4 D2 N RSVD 169 170 GND
DP0_TXD0_P 41 42 CSI4_D2_P GND 171 172 RSVD
GND 43 44 GND RSVD 173 174 RSVD
DP0 TXD1 N 45 46 CSI4 D0 N RSVD 175 176 GND
DP0 TXD1 P 47 48 CSI4 D0 P GND 177 178 MOD SLEEP*
GND 49 50 GND PCIE_WAKE* 179 180 PCIE0_CLKREQ*
DP0_TXD2_N 51 52 CSI4_CLK_N PCIE0_RST* 181 182 RSVD
DP0 TXD2 P 53 54 CSI4 CLK P RSVD 183 184 GBE MDI0 N
GND 55 56 GND I2C0 SCL 185 186 GBE MDI0 P
DP0_TXD3_N 57 58 CSI4_D1_N I2C0_SDA 187 188 GBE_LED_LINK
DP0_TXD3_P 59 60 CSI4_D1_P I2C1_SCL 189 190 GBE_MDI1_N
GND 61 62 GND I2C1 SDA 191 192 GBE MDI1 P
DP1_TXD0_N 63 64 CSI4 D3 N I2S0_DOUT 193 194 GBE LED ACT
DP1_TXD0_P 65 66 CSI4_D3_P I2S0_DIN 195 196 GBE_MDI2_N
GND 67 68 GND I2S0 FS 197 198 GBE MDI2 P
DP1 TXD1 N 69 70 DSI D0 N I2S0 SCLK 199 200 GND
DP1_TXD1_P 71 72 DSI_D0_P GND 201 202 GBE_MDI3_N
GND 73 74 GND UART1_TXD 203 204 GBE_MDI3_P
DP1 TXD2 N 75 76 DSI CLK N UART1 RXD 205 206 GPIO07
DP1 TXD2 P 77 78 DSI CLK P UART1 RTS* 207 208 GPIO08
GND 79 80 GND UART1_CTS* 209 210 CLK_32K_OUT
DP1_TXD3_N 81 82 DSI_D1_N GPIO09 211 212 GPIO10
DP1 TXD3 P 83 84 DSI D1 P CAM I2C SCL 213 214 FORCE RECOVERY*
GND 85 86 GND CAM I2C SDA 215 216 GPIO11
GPIO0 87 88 DP0_HPD GND 217 218 GPIO12
SPI0_MOSI 89 90 DP0_AUX_N SDMMC_DAT0 219 220 I2S1_DOUT
SPI0 SCK 91 92 DP0 AUX P SDMMC DAT1 221 222 I2S1 DIN
SPI0_MISO 93 94 HDMI CEC SDMMC DAT2 223 224 I2S1 FS
SPI0_CS0* 95 96 DP1_HPD SDMMC_DAT3 225 226 I2S1_SCLK
SPI0_CS1* 97 98 DP1_AUX_N SDMMC_CMD 227 228 GPIO13
UART0 TXD 99 100 DP1 AUX P SDMMC CLK 229 230 GPIO14
UART0_RXD 101 102 GND GND 231 232 I2C2 SCL
UART0_RTS* 103 104 SPI1_MOSI SHUTDOWN_REQ* 233 234 I2C2_SDA
UART0 CTS* 105 106 SPI1 SCK PMIC BBAT 235 236 UART2 TXD
GND 107 108 SPI1 MISO POWER EN 237 238 UART2 RXD
USB0_D_N 109 110 SPI1_CS0* SYS_RESET* 239 240 SLEEP/WAKE*
USB0_D_P 111 112 SPI1_CS1* GND 241 242 GND
GND 113 114 CAM0 PWDN GND 243 244 GND
USB1 D N 115 116 CAM0 MCLK GND 245 246 GND
USB1_D_P 117 118 GPIO01 GND 247 248 GND
GND 119 120 CAM1_PWDN GND 249 250 GND
USB2 D N 121 122 CAM1 MCLK VDD IN 251 252 VDD IN
USB2 D P 123 124 GPIO02 VDD IN 253 254 VDD IN
GND 125 126 GPIO03 VDD_IN 255 256 VDD_IN
GPIO04 127 128 GPIO05 VDD_IN 257 258 VDD_IN
GND 129 130 GPIO06 VDD IN 259 260 VDD IN
PCIE0_RX0_N 131 132 GND
Notice
The information provided in this specification is believed to be accurate and reliable as of the date provided. However, NVIDIA Corporation
(“NVIDIA”) does not give any representations or warranties, expressed or implied, as to the accuracy or completeness of such information.
NVIDIA shall have no liability for the consequences or use of such information or for any infringement of patents or other rights of third
parties that may result from its use. This publication supersedes and replaces all other specifications for the product that may have been
previously supplied.
NVIDIA reserves the right to make corrections, modifications, enhancements, improvements, and other changes to this specification, at any
time and/or to discontinue any product or service without notice. Customer should obtain the latest relevant specification before placing
orders and should verify that such information is current and complete.
NVIDIA products are sold subject to the NVIDIA standard terms and conditions of sale supplied at the time of order acknowledgement,
unless otherwise agreed in an individual sales agreement signed by authorized representatives of NVIDIA and customer. NVIDIA hereby
expressly objects to applying any customer general terms and conditions with regard to the purchase of the NVIDIA product referenced in
this specification.
NVIDIA products are not designed, authorized or warranted to be suitable for use in medical, military, aircraft, space or life support
equipment, nor in applications where failure or malfunction of the NVIDIA product can reasonably be expected to result in personal injury,
death or property or environmental damage. NVIDIA accepts no liability for inclusion and/or use of NVIDIA products in such equipment or
applications and therefore such inclusion and/or use is at customer’s own risk.
NVIDIA makes no representation or warranty that products based on these specifications will be suitable for any specified use without
further testing or modification. Testing of all parameters of each product is not necessarily performed by NVIDIA. It is customer’s sole
responsibility to ensure the product is suitable and fit for the application planned by customer and to do the necessary testing for the
application in order to avoid a default of the application or the product. Weaknesses in customer’s product designs may affect the quality
and reliability of the NVIDIA product and may result in additional or different conditions and/or requirements beyond those contained in this
specification. NVIDIA does not accept any liability related to any default, damage, costs or problem which may be based on or attributable
to: (i) the use of the NVIDIA product in any manner that is contrary to this specification, or (ii) customer product designs.
No license, either express or implied, is granted under any NVIDIA patent right, copyright, or other NVIDIA intellectual property right under
this specification. Information published by NVIDIA regarding third-party products or services does not constitute a license from NVIDIA to
use such products or services or a warranty or endorsement thereof. Use of such information may require a license from a third party under
the patents or other intellectual property rights of the third party, or a license from NVIDIA under the patents or other intellectual property
rights of NVIDIA. Reproduction of information in this specification is permissible only if reproduction is approved by NVIDIA in writing, is
reproduced without alteration, and is accompanied by all associated conditions, limitations, and notices.
ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER
DOCUMENTS (TOGETHER AND SEPARATELY, “MATERIALS”) ARE BEING PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES,
EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL
IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.
Notwithstanding any damages that customer might incur for any reason whatsoever, NVIDIA’s aggregate and cumulative liability towards
customer for the products described herein shall be limited in accordance with the NVIDIA terms and conditions of sale for the product.
HDMI
HDMI, the HDMI logo, and High-Definition Multimedia Interface are trademarks or registered trademarks of HDMI Licensing LLC.
ARM
ARM, AMBA and ARM Powered are registered trademarks of ARM Limited. Cortex, MPCore and Mali are trademarks of ARM Limited. All
other brands or product names are the property of their respective holders. "ARM" is used to represent ARM Holdings plc; its operating
company ARM Limited; and the regional subsidiaries ARM Inc.; ARM KK; ARM Korea Limited.; ARM Taiwan Limited; ARM France SAS;
ARM Consulting (Shanghai) Co. Ltd.; ARM Germany GmbH; ARM Embedded Technologies Pvt. Ltd.; ARM Norway, AS and ARM Sweden
AB
OpenCL
OpenCL is a trademark of Apple Inc. used under license to the Khronos Group Inc.
Trademarks
NVIDIA, the NVIDIA logo, and Tegra are trademarks and/or registered trademarks of NVIDIA Corporation in the U.S. and other countries.
Other company and product names may be trademarks of the respective companies with which they are associated.
Copyright
NVIDIA Corporation | 2788 San Tomas Expressway | Santa Clara, CA 95051 | www.nvidia.com
JETSON NANO
DEVELOPER KIT
User Guide
DA_09402_004
NOTE
Welcome to the NVIDIA Jetson platform! There are two key things you should do right
away:
1. Sign up for the NVIDIA Developer Program – this enables you to ask
questions and contribute on the NVIDIA Jetson Forums, gives access to all
documentation and collateral on the Jetson Download Center, and more.
2. Read this User Guide! After that, check out these important links:
• Jetson FAQ – Please read the FAQ.
• Support Resources – This web page links to important resources, including
the Jetson Forum and the Jetson Ecosystem page.
• NVIDIA Jetson Linux Driver Package Developer Guide – Jetson Linux
Driver Package is a key component of the Jetson platform, and provides
the sample filesystem for your developer kit. Comprehensive
documentation may be found in the Developer Guide.
Thanks,
The NVIDIA Jetson team
TABLE OF CONTENTS
Note
JetPack
Summary of JetPack Components
How to Install JetPack
Initial Configuration upon First Boot
The NVIDIA® Jetson Nano™ Developer Kit is an AI computer for makers, learners, and
developers that brings the power of modern artificial intelligence to a low-power, easy-
to-use platform. Get started quickly with out-of-the-box support for many popular
peripherals, add-ons, and ready-to-use projects.
Your carrier board revision is the last three characters of the 180-level part number, which is printed on the underside of the carrier board. Your Jetson Nano Developer Kit part number is printed on the developer kit box.
Note The B01 revision carrier board is compatible with the production
specification Jetson Nano module. The A02 revision carrier board is
not.
Both revisions of the carrier board are described in this user guide.
Jetson Nano is supported by the comprehensive NVIDIA® JetPack™ SDK, and has the
performance and capabilities needed to run modern AI workloads. JetPack includes:
• Full desktop Linux with NVIDIA drivers
• AI and Computer Vision libraries and APIs
• Developer tools
• Documentation and sample code
In summary:
• You need a 16 GB or larger UHS-1 microSD card, HDMI or DP monitor, USB
keyboard and mouse, and 5V⎓2A Micro-USB power supply.
• Download the image and write it to the microSD card.
• Insert the microSD card into the slot under the Jetson Nano module, then attach the
display, keyboard, mouse, and Ethernet cable or wireless networking adapter.
• Connect the Micro-USB power supply. The developer kit powers on automatically.
Developer kit module and carrier board: rev B01 top view
Interface Details
This section highlights some of the Jetson Nano Developer Kit interfaces. See the Jetson
Nano Developer Kit Carrier Board Specification for comprehensive information.
Module
• [J501] Slot for a microSD card.
• The passive heatsink supports 10W module power usage at 25°C ambient
temperature. If your use case requires additional cooling, you can configure the
module to control a system fan. See the Jetson Nano Supported Component List for
fans that have been verified for attachment to the heatsink.
Carrier Board
• [DS3] Power LED; lights when the developer kit is powered on.
• [J2] SO-DIMM connector for Jetson Nano module.
• [J6] HDMI and DP connector stack.
• [J13] Camera connector; enables use of CSI cameras. Jetson Nano Developer Kit
works with IMX219 camera modules, including Leopard Imaging LI-IMX219-MIPI-
FF-NANO camera module and Raspberry Pi Camera Module V2.
• [J15] 4-pin fan control header. Pulse Width Modulation (PWM) output and
tachometer input are supported.
• [J18] M.2 Key E connector can be used for wireless networking cards; includes
interfaces for PCIe (x1), USB 2.0, UART, I2S, and I2C.
To reach J18 you must detach the Jetson Nano module.
• [J25] Power jack for 5V⎓4A power supply. (The maximum supported continuous
current is 4.4A.) Accepts a 2.1×5.5×9.5 mm plug with positive polarity.
• [J28] Micro-USB 2.0 connector; can be used in either of two ways:
• If J48 pins are not connected, you can power the developer kit from a 5V⎓2A
Micro-USB power supply.
• If J48 pins are connected, operates in Device Mode.
• [J32 and J33] are each a stack of two USB 3.0 Type A connectors. Each stack is limited
to 1A total power delivery. All four are connected to the Jetson Nano module via a
USB 3.0 hub built into the carrier board.
• [J38] The Power over Ethernet (POE) header exposes any DC voltage present on J43
Ethernet jack per IEEE 802.3af.
• [J40] Carrier board rev A02 only: 8-pin button header; brings out several system
power, reset, and force recovery related signals (see the following diagram).
• Pins 3 and 4 put the developer kit into Force Recovery Mode if they are
connected when it is powered on.
• [J41] 40-pin expansion header includes:
• Power pins.
Two 3.3V power pins and two 5V power pins. These are not switchable; power is
always available when the developer kit is connected to power.
Two 5V pins can be used to power the developer kit at 2.5A each.
• Interface signal pins.
All signals use 3.3V levels.
By default, all interface signal pins are configured as GPIOs, except pins 3 and 5
and pins 27 and 28, which are I2C SDA and SCL, and pins 8 and 10, which are
UART TX and RX. L4T includes a Python library, Jetson.GPIO, for controlling GPIOs; a minimal usage sketch appears after this list. See /opt/nvidia/jetson-gpio/doc/README.txt on your Jetson system for details.
L4T includes the jetson-io utility to configure pins for SFIOs. See
“Configuring the 40-pin Expansion Header” in the L4T Development Guide for
more information.
• [J43] RJ45 connector for gigabit Ethernet.
• [J44] Carrier board rev A02 only: 3.3V serial port header; provides access to the
UART console.
• [J48] Enables either J28 Micro-USB connector or J25 power jack as power source for
the developer kit. Without a jumper, the developer kit can be powered by J28 Micro-
USB connector. With a jumper, no power is drawn from J28, and the developer kit
can be powered via J25 power jack.
• [J49] Carrier board rev B01 only: Camera connector; same as [J13].
• [J50] Carrier Board Rev B01 only: 12-pin button header; brings out system power,
reset, UART console, and force recovery related signals:
• Pin 1 connects to LED Cathode to indicate System Sleep/Wake (Off when system
is in sleep mode).
• Pin 2 connects to LED Anode.
• Pins 3 and 4 are respectively UART Receive and Send.
• Pins 5 and 6 disable auto power-on if connected.
• Pins 7 and 8 reset the system if connected when the system is running.
• Pins 9 and 10 put the developer kit into Force Recovery Mode if they are
connected when it is powered on.
• Pins 11 and 12 initiate power-on when connected if auto power-on is disabled.
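As referenced in the [J41] item above, the following is a minimal Jetson.GPIO sketch that toggles one of the 40-pin expansion header signals. The choice of physical pin 12 is a placeholder example; confirm the pin configuration for your board with jetson-io and the Jetson.GPIO README.

```python
# Minimal sketch: toggle a 40-pin expansion header GPIO with Jetson.GPIO.
# Pin 12 (BOARD numbering) is a placeholder example output pin.
import time
import Jetson.GPIO as GPIO

GPIO.setmode(GPIO.BOARD)  # use the physical 40-pin header numbering
GPIO.setup(12, GPIO.OUT, initial=GPIO.LOW)

try:
    for _ in range(5):
        GPIO.output(12, GPIO.HIGH)
        time.sleep(0.5)
        GPIO.output(12, GPIO.LOW)
        time.sleep(0.5)
finally:
    GPIO.cleanup()  # release the pin on exit
```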
Power Guide
Jetson Nano Developer Kit requires a 5V power supply capable of supplying 2A current.
The J25 power jack is 9.5 mm deep, and accepts positive polarity plugs with 2.1 mm
inner diameter and 5.5 mm outer diameter. As an example, NVIDIA has validated
Adafruit’s GEO241DA-0540 Power Supply for use with Jetson Nano Developer Kit.
The carrier board consumes between 0.5W (at 2A) and 1.25W (at 4A) with no peripherals
attached.
The Jetson Nano module is designed to optimize power efficiency and supports two
software-defined power modes. The default mode provides a 10W power budget for the
modules, and the other, a 5W budget. These power modes constrain the module to near
their 10W or 5W budgets by capping the GPU and CPU frequencies and the number of
online CPU cores at a pre-qualified level. See the NVIDIA Jetson Linux Driver Package
Developer Guide for details about power modes.
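Power modes are selected at run time with L4T's nvpmodel tool. The sketch below simply queries the active mode from Python via subprocess; the commonly documented mode IDs for Jetson Nano (0 for the 10W budget, 1 for the 5W budget) should be verified against your L4T release with `nvpmodel -q`.

```python
# Minimal sketch: query the active nvpmodel power mode on a Jetson system.
# Assumes L4T's nvpmodel tool is installed; mode IDs (commonly 0 = 10W, 1 = 5W
# on Jetson Nano) should be confirmed for your L4T release.
import subprocess

result = subprocess.run(["nvpmodel", "-q"], capture_output=True, text=True, check=True)
print(result.stdout)

# Switching modes requires root, e.g.:
#   sudo nvpmodel -m 0   # 10W budget
#   sudo nvpmodel -m 1   # 5W budget
```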
Note that the power mode budgets cover the two major power domains for the Jetson
Nano module: GPU (VDD_GPU) and CPU (VDD_CPU). Individual parts of the CORE
(VDD_CORE) power domain, such as video encode and video decode, are not covered
by these budgets. This is a reason why power modes constrain the module to near a
power budget, but not to the exact power budget. Your particular use case determines
the module’s actual power consumption. See the Jetson Nano module Data Sheet for details
about how power domains are used to optimize power consumption.
Attached peripherals are the final component of the developer kit’s total power usage.
Select a power supply that is capable of delivering sufficient power for your workload.
JETPACK
NVIDIA JetPack SDK is the most comprehensive solution for building AI applications. It
includes the latest OS images for Jetson products, along with libraries and APIs,
samples, developer tools, and documentation.
https://docs.nvidia.com/jetson/jetpack/index.html
OS Image
JetPack includes a reference file system derived from Ubuntu.
Sample Applications
JetPack includes several samples which demonstrate the use of JetPack components.
These are stored in the reference filesystem and can be compiled on the developer kit.
Developer Tools
JetPack includes the following developer tools. Some are used directly on a Jetson
system, and others run on a Linux host computer connected to a Jetson system.
• Tools for application development and debugging:
• Nsight Eclipse Edition for development of GPU accelerated applications: Runs
on Linux host computer. Supports all Jetson products.
• CUDA-GDB for application debugging: Runs on the Jetson system or the Linux
host computer. Supports all Jetson products.
• CUDA-MEMCHECK for debugging application memory errors: Runs on the
Jetson system. Supports all Jetson products.
• Tools for application profiling and optimization:
• Nsight Systems for application profiling across GPU and CPU: Runs on the
Linux host computer. Supports all Jetson products.
• nvprof for application profiling across GPU and CPU: Runs on the Jetson system.
Supports all Jetson products.
• Visual Profiler for application profiling across GPU and CPU: Runs on Linux
host computer. Supports all Jetson products.
• Nsight Graphics for graphics application debugging and profiling: Runs on the
Linux host computer. Supports all Jetson products.
Documentation
Documents that are relevant to developers using JetPack include:
• JetPack Documentation
• Multimedia API Reference
Before using SDK Manager, follow these steps to power your developer kit and put it
into Force Recovery mode:
1. Jumper the Force Recovery pins (3 and 4) on J40 button header
2. Jumper the J48 Power Select Header pins and connect a power supply to J25 power
jack. The developer kit automatically powers on in Force Recovery mode.
3. Remove the Force Recovery pins’ jumper when the developer kit powers on.
Now use SDK Manager to flash your developer kit with the OS image and install other
JetPack components. SDK Manager can also set up your Linux host computer
development environment. For full instructions, see the SDK Manager documentation.
Note Headless initial configuration requires that the developer kit not be
powered by a Micro-USB power supply, since the Micro-USB port is
needed to access the initial configuration prompts.
NVIDIA® Jetson™ Linux Driver Package (L4T) is the operating system component of
JetPack, and provides the Linux kernel, Bootloader, Jetson Board Support Package (BSP),
and sample filesystem for Jetson developer kits. L4T is included on the Jetson Nano SD
Card image. Alternatively, you can use SDK Manager to install L4T along with all the
other JetPack components to your developer kit.
L4T is also available for download directly from the main L4T page on the Jetson
Developer Site. See the “Quick Start Guide” section of the NVIDIA Jetson Linux Driver
Package Developer Guide for flashing instructions.
The “Platform Adaptation and Bring-Up” topic in the Developer Guide describes how to
port the Jetson BSP and bootloader from your developer kit to a new hardware platform
incorporating the Jetson module. Porting L4T to a new device enables use of the other
JetPack components on that device, along with the software you’ve created using the
developer kit.
COMPLIANCE INFORMATION
The NVIDIA Jetson Nano Developer Kit is compliant with the regulations listed in this
section.
UNITED STATES
Federal Communications Commission (FCC)
This device complies with part 15 of the FCC Rules. Operation is subject to the following
two conditions: (1) this device may not cause harmful interference, and (2) this device
must accept any interference received, including any interference that may cause
undesired operation of the device.
This equipment has been tested and found to comply with the limits for a Class B digital
device, pursuant to Part 15 of the FCC Rules. These limits are designed to provide
reasonable protection against harmful interference in a residential installation. This
equipment generates, uses, and can radiate radio frequency energy and, if not installed
and used in accordance with the instructions, may cause harmful interference to radio
communications. However, there is no guarantee that interference will not occur in a
particular installation.
If this equipment does cause harmful interference to radio or television reception, which
can be determined by turning the equipment off and on, the user is encouraged to try to
correct the interference by one or more of the following measures:
• Reorient or relocate the receiving antenna.
• Increase the separation between the equipment and receiver.
• Connect the equipment into an outlet on a circuit different from that to which the receiver is connected.
• Consult the dealer or an experienced radio/TV technician for help.
FCC Warning: The FCC requires that you be notified that any changes or modifications
to this device not expressly approved by the manufacturer could void the user’s
authority to operate the equipment.
UL Recognized Component Logo for Embedded System Module for Jetson Nano, model
number P3448
CANADA
Innovation, Science and Economic Development Canada (ISED)
CAN ICES-3(B)/NMB-3(B)
This device complies with Industry Canada license-exempt RSS standard(s). Operation
is subject to the following two conditions: (1) this device may not cause interference, and
(2) this device must accept any interference, including interference that may cause
undesired operation of the device.
Le présent appareil est conforme aux CNR d'Industrie Canada applicables aux appareils
radio exempts de licence. L'exploitation est autorisée aux deux conditions suivantes : (1)
l'appareil ne doit pas produire de brouillage, et (2) l'utilisateur de l'appareil doit accepter
tout brouillage radioélectrique subi, même si le brouillage est susceptible d'en
compromettre le fonctionnement.
EUROPEAN UNION
European Conformity; Conformité Européenne (CE)
This product meets the applicable EMC requirements for Class B ITE (information technology
equipment) and the applicable radio equipment requirements.
JAPAN
Voluntary Control Council for Interference (VCCI)
A Japanese regulatory requirement, defined by specification JIS C 0950:2008, mandates that
manufacturers provide Material Content Declarations for certain categories of electronic
products offered for sale after July 1, 2006.

Parts                                            Pb      Hg   Cd   Cr(VI)   PBB   PBDE
PCB                                              0       0    0    0        0     0
Passive components                               Exempt  0    0    0        0     0
Active components                                Exempt  0    0    0        0     0
Connectors/Cables                                Exempt  0    0    0        0     0
Processor                                        0       0    0    0        0     0
Memory                                           0       0    0    0        0     0
Mechanicals                                      Exempt  0    0    0        0     0
Soldering material                               0       0    0    0        0     0
Flux, solder paste, label and other
consumable materials                             0       0    0    0        0     0

1. "0" indicates that the level of the specified chemical substance is less than the threshold
level specified in the standard, JIS C 0950:2008.
2. "Exempt" indicates that the specified chemical substance is exempt from marking and it is
not required to display the marking for that specified chemical substance per the standard,
JIS C 0950:2008.
3. "Exceeding 0.1wt%" or "Exceeding 0.01wt%" is entered in the table if the level of the
specified chemical substance exceeds the threshold level specified in the standard,
JIS C 0950:2008.
SOUTH KOREA
Radio Research Agency (RRA)
R-R-NVA-P3450 R-R-NVA-P3448
Class B device: This equipment is a Class B (household) electromagnetic compatibility device,
intended primarily for use in the home, and may be used in all areas.
Company name: NVIDIA Hong Kong Holdings Limited (Korean branch); corporate registration
number: 110181-0036373
Representative: Karen Theresa Burns; business registration number: 120-84-06711
Address: 511 Yeongdong-daero, Suite 2101 (Samseong-dong, COEX Trade Tower), Gangnam-gu, Seoul
Product details: product type: not applicable; product name (model): not applicable
Required documents: none
This form publicly certifies that NVIDIA has undergone the confirmation and evaluation
procedures for the acceptable amounts of hazardous materials contained in its products
according to the regulations stipulated in Article 3 of the 'Statute on the Recycling of
Electrical and Electronic Products and Automobiles' and that the company has adhered to the
Enforcement Regulations of Article 11, Item 1 of the statute.
Attachment: None
*Preparing the Form
① Please indicate the product category according to the categories listed in Article 8, Items
1 and 2 of the 'Enforcement Ordinance of the Statute on the Recycling of Electrical, Electronic
and Automobile Materials'.
② For electrical and electronic products, please indicate the Model Name (and number). For
automobiles, please indicate the Vehicle Identification Number.
③ Please indicate the name of manufacturer and/or importer of the product.
RUSSIA/KAZAKHSTAN/BELARUS
Customs Union Technical Regulations (CU TR)
This device complies with the technical regulations of the Customs Union (CU TR)
This device complies with the rules set forth by Federal Agency of Communications and the
Ministry of Communications and Mass Media
Federal Security Service notification has been filed.
TAIWAN
Bureau of Standards, Metrology & Inspection (BSMI)
Declaration of the Presence Condition of the Restricted Substances Marking
Equipment name: Jetson Nano Developer Kit

Parts                                            Pb      Hg   Cd   Cr(VI)   PBB   PBDE
PCB                                              O       O    O    O        O     O
Processor                                        O       O    O    O        O     O
Active components                                -       O    O    O        O     O
Passive components                               -       O    O    O        O     O
Memory                                           O       O    O    O        O     O
Mechanicals                                      -       O    O    O        O     O
Connectors/Cable                                 -       O    O    O        O     O
Soldering material                               O       O    O    O        O     O
Flux, solder paste, label and other
consumable materials                             O       O    O    O        O     O

Note 1: "O" indicates that the percentage content of the restricted substance does not exceed
the percentage of reference value of presence.
Note 2: "-" indicates that the restricted substance corresponds to the exemption.
Note: The referenced Environmental Protection Use Period Marking was determined according to
normal operating use conditions of the product such as temperature and humidity.
CHINA
China RoHS Material Content Declaration
The Table of Hazardous Substances and their Content
as required by China's Management Methods for Restricted Use of Hazardous Substances in
Electrical and Electronic Products

Parts                                            Pb      Hg   Cd   Cr(VI)   PBB   PBDE
PCB                                              O       O    O    O        O     O
Processor                                        O       O    O    O        O     O
Active components                                X       O    O    O        O     O
Passive components                               X       O    O    O        O     O
Memory                                           O       O    O    O        O     O
Mechanicals                                      X       O    O    O        O     O
Connectors/Cable                                 X       O    O    O        O     O
Soldering material                               O       O    O    O        O     O
Flux, solder paste, label and other
consumable materials                             O       O    O    O        O     O

This table is prepared in accordance with SJ/T 11364-2014.
Note: The referenced Environmental Protection Use Period Marking was determined according
to normal operating use conditions of the product such as temperature and humidity.
INDIA
India RoHS Compliance Statement
This product, as well as its related consumables and spares, complies with the reduction
in hazardous substances provisions of the “India E-waste (Management and Handling)
Rule 2016”. It does not contain lead, mercury, hexavalent chromium, polybrominated
biphenyls or polybrominated diphenyl ethers in concentrations exceeding 0.1 weight %
and 0.01 weight % for cadmium, except for where allowed pursuant to the exemptions
set in Schedule 2 of the Rule.
Notice
© 2017-2020 NVIDIA Corporation. All rights reserved. NVIDIA, the NVIDIA logo, Jetson, Jetson Nano, and JetPack
are trademarks and/or registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other
company and product names may be trademarks of the respective companies with which they are associated.
NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER
DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO
WARRANTIES, EXPRESS, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND ALL
EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY
OR CONDITION OF TITLE, MERCHANTABILITY, SATISFACTORY QUALITY, FITNESS FOR A PARTICULAR PURPOSE
AND NON-INFRINGEMENT, ARE HEREBY EXCLUDED TO THE MAXIMUM EXTENT PERMITTED BY LAW.
Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no
responsibility for the consequences of use of such information or for any infringement of patents or other rights
of third parties that may result from its use. No license is granted by implication or otherwise under any patent
or patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change
without notice. This publication supersedes and replaces all information previously supplied. NVIDIA
Corporation products are not authorized for use as critical components in life support devices or systems
without express written approval of NVIDIA Corporation.
www.nvidia.com
Table of Contents
Customizing the Jetson Nano 40-pin Expansion Header
Prerequisites
Download and Customize the Pinmux Spreadsheet
Download the L4T Driver Package and Source Files
Update the U-Boot Pinmux
Update the CBoot Pinmux
Flash Jetson Nano
The Jetson Nano Developer Kit carrier board includes a 40-pin expansion header. By default,
all interface signal pins are configured as GPIO inputs, except pins 3 and 5 and pins 27 and 28,
which are I2C SDA and SCL, and pins 8 and 10, which are UART TX and RX.
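Pins that are left in their default GPIO configuration can be driven directly from user space with the Jetson.GPIO Python package included in JetPack, without touching the pinmux. The sketch below is a minimal example; board pin 12 is only an assumption of a convenient default-GPIO pin and should be matched to your own wiring.

import time
import Jetson.GPIO as GPIO

# Toggle a default-GPIO pin of the 40-pin header using Jetson.GPIO.
LED_PIN = 12  # physical (BOARD) pin number; an assumption, adapt to your wiring

GPIO.setmode(GPIO.BOARD)                          # address pins by header position
GPIO.setup(LED_PIN, GPIO.OUT, initial=GPIO.LOW)

try:
    for _ in range(10):                           # blink for a few seconds
        GPIO.output(LED_PIN, GPIO.HIGH)
        time.sleep(0.5)
        GPIO.output(LED_PIN, GPIO.LOW)
        time.sleep(0.5)
finally:
    GPIO.cleanup()                                # release the pin again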
This application note describes how to alter the function of pins on the 40-pin header by using
the Jetson Nano Developer Kit pinmux spreadsheet. Note that the pinmux actually configures
the SoC on the Jetson module, which ultimately routes the SoC signals to the carrier board’s
40-pin header.
If you want to configure other SoC pins, see the full Jetson Nano module pinmux spreadsheet
and the NVIDIA L4T Development Guide for complete documentation.
Prerequisites
• A Jetson Nano Developer Kit.
• A computer running Linux (a Linux host) with the GCC toolchain recommended for building
L4T installed. For more details, see the section "The L4T Toolchain" in the L4T
Development Guide.
• A computer running Microsoft Windows (a Windows host) with Microsoft Excel installed.
List of Figures
Figure 1. TXB Devices Typical Usage
Figure 2. TXB Device Usage on Developer Kit Carrier Board
Figure 3. Simplified TXB0108 Architecture Diagram
Figure 4. One TXB Connection from Jetson Nano to 40-Pin Header
Figure 5. Audio Codec (I2S and MCLK) Example Connections
Figure 6. Button and LED Example Connections
Introduction
This application note describes how to work with the signals on the 40-pin expansion header
on the NVIDIA® Jetson Nano™ Developer Kit carrier board. All signals routed to this connector
from the Jetson Nano module, except for the I2C interfaces, pass through TI TXB0108RGYR
level shifters. These shift the signals from the module's 1.8V levels to 3.3V levels at the
expansion connector pins. The level shifter retains the ability to pass bidirectional signals
without the need for a direction pin. There are several factors that must be considered when
using the signals that come from (or go to) these level shifters; they are described in this
application note.
Figure 1 shows the usage of the TXB devices and Figure 2 shows the TXB on the carrier board.
[Figure 1: TXB Devices Typical Usage. A controller drives the TXB0108 A port (VccA) and a
peripheral connects to the B port (VccB), with the OE pin enabling the device.]
[Figure 2: TXB Device Usage on Developer Kit Carrier Board. Module SFIO/GPIO pins connect to
the TXB0108 A port (1.8V), the corresponding 40-pin header signals connect to the B port (3.3V),
and OE is pulled up through a 1 MΩ resistor.]
[Figure 3: Simplified TXB0108 Architecture Diagram. Each A/B pin pair is linked by weak output
buffers with ~4 kΩ series resistors; one-shot circuits (T1-T4) briefly strengthen the drive when
a rising or falling edge is detected, and OE gates the outputs.]
The TXB level shifters have output buffers with ~4kΩ resistors in series which make them very
weak. A one-shot (OS) circuit is included to help with signal rise/fall times. The OS circuitry is
enabled when a rising or falling edge is sensed at one of the inputs.
When an A-port pin is connected to a push-pull driver output or a strong pull-up/down resistor
(equivalent to a push-pull driver) and driven high, the weak buffer (with its series resistor)
drives the B-port high along with the OS circuit, which is enabled when the rising edge is
sensed. The B-port is driven high by both the buffer and T1 until the OS circuit times out,
after which only the weak buffer continues to drive the output.
If, instead, the external push-pull driver drives the A-Port pin low, the B-port will be driven low
along with the OS circuit which is enabled when the falling edge is sensed. The B-port is driven
low by both the buffer and T2 until the OS circuit times out and only the weak buffer continues
to drive the output. (See Figure 3).
The TXB-type level shifter is called "weak-buffered" because it is strong enough to hold the
output port high or low during a DC state with a high-impedance load, but weak enough that it
can easily be overdriven by a push-pull driver (or strong pull-up/down resistor). This allows
the device to support inputs or outputs on both the A- and B-port sides.
An external push-pull driver or strong pull-up/down resistor must be able to supply more than
±2 mA of current to reliably overdrive the weak output buffer, which is always active as long as
the OE pin is active (high). If a pull-up/down resistor is used to force a TXB pin to a high/low
state, similar to a push-pull driver, the resistor value should be at most ~VccX / 2 mA
(~1.65 kΩ or stronger if VccX is 3.3V, or ~0.9 kΩ or stronger if VccX is 1.8V).
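The rule of thumb above can be restated as a one-line calculation: the pull resistor must pass at least the ~2 mA needed to overdrive the weak buffer at the I/O voltage in question. The small sketch below only re-derives the two values quoted above; consult the TXB0108 data sheet for your actual design margins.

# Largest pull-up/down resistor that still overdrives the TXB0108's weak buffer.
def max_pull_resistor_ohms(vcc_volts: float, min_current_amps: float = 2e-3) -> float:
    """R = V / I with I = 2 mA, the overdrive current assumed above."""
    return vcc_volts / min_current_amps

print(max_pull_resistor_ohms(3.3))  # ~1650 ohms -> use ~1.65 kOhm or stronger (lower value)
print(max_pull_resistor_ohms(1.8))  # ~900 ohms  -> use ~0.9 kOhm or stronger (lower value)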
Output Enable
The TXB level shifters have an output enable (OE) pin. When OE is low, the outputs (weak buffers
and OS circuits) are disabled. When OE is high, the weak buffers are always enabled, and the OS
circuits are triggered whenever a rising or falling edge is detected. The Jetson Nano Developer
Kit carrier board pulls the OE pins to 1.8V, so the level shifters are always enabled when the
system is powered on.
Following are several examples where pins from the 40-pin expansion header are connected
to a device or circuit, along with some things to consider. The following figure shows eight
signals from Jetson Nano that are routed to one of the TXB level shifters and then to the 40-pin
expansion header. These signals support the following options on Jetson Nano:
I2S0_SCLK: Audio I2S interface shift clock or GPIO
I2S0_DOUT: Audio I2S interface data output or GPIO
I2S0_DIN: Audio I2S interface data input or GPIO
I2S0_FS: Audio I2S interface frame sync or GPIO
GPIO09: GPIO or Audio Master Clock
GPIO13: GPIO or PWM (see the sketch after this list)
GPIO11: GPIO or General Purpose Clock
GPIO01: GPIO or General Purpose Clock
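As a sketch of the PWM option on GPIO13: once the pin has been reconfigured as PWM through the pinmux procedure referenced above, it can be driven from Python with Jetson.GPIO. Board pin 33 for GPIO13 and the 50 Hz / 25% values are assumptions for illustration only.

import time
import Jetson.GPIO as GPIO

PWM_PIN = 33  # assumed BOARD pin for GPIO13; verify against your pinmux configuration

GPIO.setmode(GPIO.BOARD)
GPIO.setup(PWM_PIN, GPIO.OUT)

pwm = GPIO.PWM(PWM_PIN, 50)   # 50 Hz carrier
pwm.start(25)                 # 25% duty cycle
try:
    time.sleep(5)             # let the signal run for a few seconds
    pwm.ChangeDutyCycle(75)   # change the duty cycle on the fly
    time.sleep(5)
finally:
    pwm.stop()
    GPIO.cleanup()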
[Figure 4: One TXB Connection from Jetson Nano to 40-Pin Header. The module's I2S0_SCLK,
I2S0_DOUT, I2S0_DIN, I2S0_FS, GPIO09, GPIO13, GPIO11 and GPIO01 signals pass through one
TXB0108 (A port at 1.8V, B port at 3.3V, OE pulled up through 1 MΩ) and out to the corresponding
pins of the 40-pin expansion header.]
The following example shows a possible connection of the Jetson Nano I2S0 interface and
GPIO09 (Audio MCLK) to an audio codec.
[Figure 5: Audio Codec (I2S and MCLK) Example Connections. I2S0_SCLK, I2S0_FS, I2S0_DIN and
I2S0_DOUT from the 40-pin header connect to the audio codec's SCLK, FS, DIN and DOUT signals,
with GPIO09 supplying the codec's MCLK.]
[Figure 6: Button and LED Example Connections. A push button is wired to GPIO09 on the 40-pin
header, and GPIO13 drives an LED from the 5.0V rail through a 100 Ω resistor and a 2N3904
transistor with a 6.8 kΩ base resistor.]
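A matching software sketch for the button and LED example might look as follows; it simply polls the button input and mirrors its state onto the LED output with Jetson.GPIO. The pin numbers are assumptions for illustration, and the pull resistor on the button must come from the pinmux or the external circuit, since Jetson.GPIO cannot enable pulls in software.

import time
import Jetson.GPIO as GPIO

BUTTON_PIN = 7   # assumed BOARD pin for the button input (e.g. GPIO09)
LED_PIN = 33     # assumed BOARD pin driving the LED transistor (e.g. GPIO13)

GPIO.setmode(GPIO.BOARD)
GPIO.setup(BUTTON_PIN, GPIO.IN)                  # pull resistors come from the pinmux/circuit
GPIO.setup(LED_PIN, GPIO.OUT, initial=GPIO.LOW)

try:
    while True:
        pressed = GPIO.input(BUTTON_PIN)
        GPIO.output(LED_PIN, GPIO.HIGH if pressed else GPIO.LOW)
        time.sleep(0.05)                         # simple 20 Hz polling loop
except KeyboardInterrupt:
    pass
finally:
    GPIO.cleanup()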
Notice
This document is provided for information purposes only and shall not be regarded as a warranty of a certain functionality, condition, or quality
of a product. NVIDIA Corporation (“NVIDIA”) makes no representations or warranties, expressed or implied, as to the accuracy or completeness
of the information contained in this document and assumes no responsibility for any errors contained herein. NVIDIA shall have no liability for
the consequences or use of such information or for any infringement of patents or other rights of third parties that may result from its use. This
document is not a commitment to develop, release, or deliver any Material (defined below), code, or functionality.
NVIDIA reserves the right to make corrections, modifications, enhancements, improvements, and any other changes to this document, at any
time without notice.
Customer should obtain the latest relevant information before placing orders and should verify that such information is current and complete.
NVIDIA products are sold subject to the NVIDIA standard terms and conditions of sale supplied at the time of order acknowledgement, unless
otherwise agreed in an individual sales agreement signed by authorized representatives of NVIDIA and customer (“Terms of Sale”). NVIDIA
hereby expressly objects to applying any customer general terms and conditions with regards to the purchase of the NVIDIA product referenced
in this document. No contractual obligations are formed either directly or indirectly by this document.
NVIDIA products are not designed, authorized, or warranted to be suitable for use in medical, military, aircraft, space, or life support equipment,
nor in applications where failure or malfunction of the NVIDIA product can reasonably be expected to result in personal injury, death, or property
or environmental damage. NVIDIA accepts no liability for inclusion and/or use of NVIDIA products in such equipment or applications and
therefore such inclusion and/or use is at customer’s own risk.
NVIDIA makes no representation or warranty that products based on this document will be suitable for any specified use. Testing of all
parameters of each product is not necessarily performed by NVIDIA. It is customer’s sole responsibility to evaluate and determine the
applicability of any information contained in this document, ensure the product is suitable and fit for the application planned by customer, and
perform the necessary testing for the application in order to avoid a default of the application or the product. Weaknesses in customer’s product
designs may affect the quality and reliability of the NVIDIA product and may result in additional or different conditions and/or requirements
beyond those contained in this document. NVIDIA accepts no liability related to any default, damage, costs, or problem which may be based on
or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this document or (ii) customer product designs.
No license, either expressed or implied, is granted under any NVIDIA patent right, copyright, or other NVIDIA intellectual property right under
this document. Information published by NVIDIA regarding third-party products or services does not constitute a license from NVIDIA to use
such products or services or a warranty or endorsement thereof. Use of such information may require a license from a third party under the
patents or other intellectual property rights of the third party, or a license from NVIDIA under the patents or other intellectual property rights
of NVIDIA.
Reproduction of information in this document is permissible only if approved in advance by NVIDIA in writing, reproduced without alteration and
in full compliance with all applicable export laws and regulations, and accompanied by all associated conditions, limitations, and notices.
THIS DOCUMENT AND ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER
DOCUMENTS (TOGETHER AND SEPARATELY, “MATERIALS”) ARE BEING PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED,
IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF
NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. TO THE EXTENT NOT PROHIBITED BY LAW, IN NO
EVENT WILL NVIDIA BE LIABLE FOR ANY DAMAGES, INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL,
PUNITIVE, OR CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING OUT OF ANY USE
OF THIS DOCUMENT, EVEN IF NVIDIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. Notwithstanding any damages that
customer might incur for any reason whatsoever, NVIDIA’s aggregate and cumulative liability towards customer for the products described
herein shall be limited in accordance with the Terms of Sale for the product.
ARM
ARM, AMBA and ARM Powered are registered trademarks of ARM Limited. Cortex, MPCore and Mali are trademarks of ARM Limited. All other brands
or product names are the property of their respective holders. "ARM" is used to represent ARM Holdings plc; its operating company ARM Limited;
and the regional subsidiaries ARM Inc.; ARM KK; ARM Korea Limited.; ARM Taiwan Limited; ARM France SAS; ARM Consulting (Shanghai) Co. Ltd.;
ARM Germany GmbH; ARM Embedded Technologies Pvt. Ltd.; ARM Norway, AS and ARM Sweden AB.
Trademarks
NVIDIA, the NVIDIA logo, and Jetson Nano are trademarks and/or registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other
company and product names may be trademarks of the respective companies with which they are associated.
Copyright
© 2020 NVIDIA Corporation. All rights reserved.
• https://www.raspberrypi.org/products/raspberry-pi-high-quality-camera/
• https://www.reichelt.de/de/de/raspberry-pi-kamera-12mp-sony-imx477r-rasp-cam-hq-p276
html?PR
RASP CAM HQ
The Raspberry Pi High Quality Camera is the latest camera accessory from Raspberry Pi
(supplied without a lens, see accessories).
It offers a higher resolution than the previous v2 model (12 megapixels compared to
8 megapixels), higher sensitivity (around 50 % more area per pixel for improved performance
in low light), and is designed for use with interchangeable C- and CS-mount lenses. Other
lens form factors can be accommodated with third-party lens adapters.
It was designed, among other things, for industrial and consumer applications, including
security cameras, that require the highest levels of visual fidelity and/or integration with
specialist optics.
Technical Data
• 7.9 mm sensor diagonal
• Lens standard: C-mount
• integrated IR cut filter
• Tripod mount: 1/4"-20
Scope of Delivery
Note
• Operating the camera requires the latest Raspberry Pi software
• In production until at least January 2026
Version
Model: Raspberry Pi
Video: 30 fps at 1920 × 1080 pixels
Photo: 4056 × 3040 pixels
General
Version: Standard
Resolution: 12 MP
Video: 60 fps at 1280 × 720 pixels
Viewing angle: 75°
Image sensor: 1/4"
Connections / Interfaces
Connector: CSI
Miscellaneous
Specification: IMX477R
Dimensions
Length: 38 mm
Width: 38 mm
Manufacturer Information
Manufacturer: RASPBERRY PI
Manufacturer part number: SC0261
Packaging weight: 0.046 kg
RoHS compliant
EAN / GTIN: 0633696492738
CS-mount adapter. Using an adapter cable, the camera can also be used with the
Raspberry Pi Zero.
Technical Data
Sensor:
Sony IMX477R stacked, back-illuminated sensor
7.9 mm sensor diagonal
1.55 µm × 1.55 µm pixel size
Output: RAW12/10/8, COMP8
Back focus: adjustable (12.5 mm - 22.4 mm)
Lens type:
C-mount
CS-mount (C-CS adapter included)
IR cut filter: integrated (the filter can be removed, but removal is irreversible)
Ribbon cable length: 200 mm
Tripod mount: 1/4"-20
The Raspberry Pi High Quality Camera will remain in production until at least January 2026.
MAXIMUM CAMERA RESOLUTION: According to the data sheet, the image sensor of the Raspberry Pi
HQ camera would technically be capable of capturing photos and videos at a resolution of
4K / 60 Hz.
However, this would only be possible with a 4-lane CSI-2 interface. The Raspberry Pi uses a
2-lane CSI-2 interface and is additionally limited by its H.264 hardware encoder.
As a result, in combination with the Raspberry Pi, the resolution of the HQ camera is limited
to 1920×1080 for video recordings and 2592×1944 for photos.
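On a Raspberry Pi these limits can be exercised with the picamera Python library; the following sketch records a short 1080p clip through the H.264 hardware encoder and then takes a still at the photo resolution quoted above. File names and durations are arbitrary examples.

from time import sleep
from picamera import PiCamera

camera = PiCamera()
camera.resolution = (1920, 1080)        # video limit discussed above
camera.framerate = 30

camera.start_recording('clip.h264')     # H.264 hardware encoder
sleep(10)
camera.stop_recording()

camera.resolution = (2592, 1944)        # still resolution quoted above
camera.capture('photo.jpg')
camera.close()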
FOCAL LENGTH, WIDE ANGLE, TELEPHOTO, ZOOM - A SHORT LENS GUIDE
When choosing a suitable lens for the HQ camera, you inevitably come across technical terms
such as focal length, telephoto lens, or wide-angle lens. Here is a short explanation:
As a general rule, the focal length of a lens determines the field of view of your camera and
is measured in millimetres (mm). As a rule of thumb: the higher the focal length number, the
narrower the field of view and the closer / larger the object to be photographed appears.
With a 6 mm lens this means that you can "see" a large area with the camera, but the object
you want to photograph appears relatively small. Wide-angle lenses are therefore ideal for
landscape shots or pictures of a group of people.
If you want to photograph the church tower in the landscape or your mother-in-law at the
wedding party, a telephoto lens with a focal length of 16 mm or more is suitable.
The ideal combination of wide angle and telephoto is a zoom lens. Here the focal length can be
changed flexibly via a zoom ring, so the object to be photographed appears larger or smaller.
Nikon provides an excellent lens simulator on its website, which demonstrates the topic of
focal length clearly in use with a Nikon camera.
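As a rough orientation (not from the source), the relationship between focal length and field of view can be estimated with the standard thin-lens approximation, angle = 2·arctan(d / 2f), where d is the sensor dimension and f the focal length. The sketch below applies it to the HQ camera's 7.9 mm sensor diagonal from the data above.

import math

def angle_of_view_deg(focal_length_mm: float, sensor_dimension_mm: float) -> float:
    """Approximate diagonal angle of view for a simple (thin) lens."""
    return math.degrees(2 * math.atan(sensor_dimension_mm / (2 * focal_length_mm)))

SENSOR_DIAGONAL_MM = 7.9  # HQ camera sensor diagonal from the data sheet above
for f in (6, 16):
    print(f"{f} mm lens: ~{angle_of_view_deg(f, SENSOR_DIAGONAL_MM):.0f} degrees diagonal")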
CONNECTING THE COMPONENTS: Since the Raspberry Pi Zero has a smaller camera connector, you
first have to exchange the cable supplied with the HQ camera. To do this, carefully pull the
black latch of the ZIF connector backwards. Caution: do not use too much force here, as the
connector is very delicate!
Now pull the white cable out of the connector and attach the Raspberry Pi Zero adapter cable
by sliding the wide end of the cable into the connector with the gold contact surfaces facing
down. Then carefully push the black latch forwards again.
The narrow end of the adapter cable then also goes into the connector of the Raspberry Pi
Zero, with the gold contacts facing down.
Now insert the prepared microSD card into the Pi Zero and connect the micro-USB cable to the
USB / J10 socket.
Your Raspberry Pi webcam is now ready for use!
CONNECTING TO THE COMPUTER AND FIRST TEST: You can now connect the USB-A plug of the micro-USB
cable to your computer. After about 10 seconds, the webcam is recognized as "Piwebcam" without
any driver installation and is available as a camera in all programs with webcam support.
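A quick way to confirm that the Piwebcam is visible as an ordinary video device is to grab a frame with OpenCV; device index 0 is an assumption and may differ if other cameras are attached.

import cv2

cap = cv2.VideoCapture(0)                      # assumed device index of the Pi webcam
if not cap.isOpened():
    raise RuntimeError("Webcam not found - check the cable and the device index")

ok, frame = cap.read()
if ok:
    cv2.imwrite("webcam_test.jpg", frame)      # save a single test frame
    print("Captured frame with shape", frame.shape)
cap.release()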
23. RPIZ CAM 16mm
https://www.reichelt.de/raspberry-pi-16mm-kameralinse-teleobjektiv-rpiz-cam-16mm-to-p2769
html?&nbc=1&trstct=lsbght_sldr::276919
This 16 mm telephoto lens was designed for the original Raspberry Pi camera.
Scope of Delivery
16 mm telephoto lens (without image sensor board, see accessories)
Version
Model: Raspberry Pi
General
Version: Accessory
Miscellaneous
Specification: Telephoto
Manufacturer Information
Manufacturer: RASPBERRY PI
Manufacturer part number: SC0123
Packaging weight: 0.214 kg
RoHS compliant
EAN / GTIN: 9900002769213
24. RPIZ CAM 6mm WW
https://www.reichelt.de/raspberry-pi-6mm-kameralinse-weitwinkel-rpiz-cam-6mm-ww-p276922.html?&nbc=1&tr
This wide-angle camera lens was designed for the original Raspberry Pi camera.
Scope of Delivery
6 mm wide-angle lens (without image sensor board, see accessories)
Aperture: F1.2
Thread: CS
Minimum object distance: 20 cm
Version
Model: Raspberry Pi
General
Version: Accessory
Miscellaneous
Specification: Wide angle
Manufacturer Information
Manufacturer: RASPBERRY PI
Manufacturer part number: SC0124
Packaging weight: 0.107 kg
RoHS compliant
EAN / GTIN: 9900002769220
25. Data Sheets RaspCAM
Raspberry Pi
High Quality Camera
www.raspberrypi.org
Overview
The Raspberry Pi High Quality Camera is the latest camera accessory from
Raspberry Pi. It offers higher resolution (12 megapixels, compared to 8
megapixels), and sensitivity (approximately 50% greater area per pixel for
improved low-light performance) than the existing Camera Module v2, and
is designed to work with interchangeable lenses in both C- and CS-mount
form factors. Other lens form factors can be accommodated using third-party
lens adapters.
The High Quality Camera provides an alternative to the Camera Module v2
for industrial and consumer applications, including security cameras,
which require the highest levels of visual fidelity and/or integration with
specialist optics. It is compatible with all models of Raspberry Pi computer
from Raspberry Pi 1 Model B onwards, using the latest software release from
www.raspberrypi.org.1
The package comprises a circuit board carrying a Sony IMX477 sensor,
an FPC cable for connection to a Raspberry Pi computer, a milled aluminium
lens mount with integrated tripod mount and focus adjustment ring, and a C- to
CS-mount adapter.
1. Excluding early Raspberry Pi Zero models, which lack the necessary FPC connector. Later
Raspberry Pi Zero models require an adapter FPC, sold separately.
Specification
2. Can be removed to enable IR sensitivity. Modification is irreversible.
Physical specifications
[Mechanical drawing: board outline approximately 38 mm × 38 mm, with mounting-hole and
lens-mount dimensions given in millimetres.]
SAFETY INSTRUCTIONS
To avoid malfunction or damage to this product, please observe the following:
• Before connecting the device, shut down your Raspberry Pi computer and disconnect it from
external power.
• If the cable becomes detached, pull the locking mechanism forward on the connector, insert the ribbon
cable ensuring the metal contacts face towards the circuit board, then push the locking mechanism back
into place.
• This device should be operated in a dry environment at 0–50°C.
• Do not expose it to water or moisture, or place on a conductive surface whilst in operation.
• Do not expose it to excessive heat from any source.
• Care should be taken not to fold or strain the ribbon cable.
• Care should be taken when screwing in parts or fitting a tripod. A cross-thread can cause irreparable
damage and void the warranty.
• Take care whilst handling to avoid mechanical or electrical damage to the printed circuit board
and connectors.
• Avoid handling the printed circuit board whilst it is powered and only handle by the edges to minimise the
risk of electrostatic discharge damage.
• Store in a cool, dry location.
• Avoid rapid changes of temperature, which can cause moisture build up in the device, affecting
image quality.
MIPI DSI and MIPI CSI are service marks of MIPI Alliance, Inc
Raspberry Pi and the Raspberry Pi logo are trademarks of the Raspberry Pi Foundation
www.raspberrypi.org
Raspberry Pi
High Quality Camera
Getting started
Operating instructions, regulatory compliance
information, and safety information
www.raspberrypi.org
Operating instructions
[Annotated photo: the camera board shown with the dust cap fitted and the mounting holes
indicated.]
Fitting lenses
The High Quality Camera is designed to accept CS-mount lenses. An optional
adapter is supplied to extend the back focus by 5 mm , such that the camera is
also compatible with C-mount lenses.
Please ensure that the dust cap is fitted when there is no lens fitted, because
the sensor is sensitive to dust. To fit a lens, unscrew the dust cap and screw the
lens into the threads. Remove the C-CS adapter when a CS-mount lens is to be
fitted; it is only required when a C-mount lens is fitted.
The CGL 6 mm CS-mount and 16 mm C-mount lenses are examples of third-
party products that are compatible with the High Quality Camera. See step-by-
step instructions for fitting the CS-mount and C-mount lenses.
2. Loosen the back focus lock screw with a small flat screwdriver.
3. Adjust the back focus height by turning the back focus adjustment ring clockwise or
anti-clockwise relative to the main housing until the camera is in focus.
4. Tighten the back focus lock screw.
Tripod mount
The tripod mount is an optional component, and it can be unscrewed when it is
not needed. If it is needed, take care not to damage the ribbon when screwing
the tripod into the camera.
Regulatory compliance
EU
The Raspberry Pi High Quality Camera is in conformity with the following
applicable community harmonised legislation:
Electromagnetic Compatibility Directive (EMC) 2014/30/EU,
Restriction of Hazardous Substances (RoHS) Directive 2011/65/EU
The following harmonised standards have been used to demonstrate conformity
to these standards:
EN 55032:2015
EN 55024:2010
IEC60695-2-11
EN60950-1
FCC
The Raspberry Pi High Quality Camera is in conformity with the requirements
of the following specifications:
FCC 47 CFR Part 15, Subpart B, Class B Digital Device.
This device complies with part 15 of the FCC Rules. Operation is subject to the
following two conditions: (1) This device may not cause harmful interference,
and (2) this device must accept any interference received, including interference
that may cause undesired operation.
Safety information
WARNINGS
• This product should only be connected to and powered by a Raspberry Pi computer. Any external power
supply used with the Raspberry Pi should comply with relevant regulations and standards applicable in
the country of intended use.
• This product should be operated in a well ventilated environment and should not be covered.
• This product should be placed on a stable, flat, non-conductive surface while it is in use, and it should not
be contacted by conductive items.
MIPI DSI and MIPI CSI are service marks of MIPI Alliance, Inc
Raspberry Pi and the Raspberry Pi logo are trademarks of the Raspberry Pi Foundation
www.raspberrypi.org
Lens parameters
Model: PT361060M3MP12
Aperture: F1.2
Mount: CS
Weight: 53 g
16 mm C-mount lens
The 16 mm lens provides a higher quality image
than the 6 mm lens. It has a narrow angle of view
which is more suited to viewing distant objects.
Aperture
To adjust the aperture, hold the camera with the
lens facing away from you. Turn the inner ring,
closest to the camera, while holding the camera
steady. Turn clockwise to close the aperture and
reduce image brightness. Turn anti-clockwise to
open the aperture. Once you are happy with the
light level, tighten the screw on the side of the
lens to lock the aperture into position.
Focus
To adjust focus, hold the camera with the
lens facing away from you. Turn the focus
ring, labelled “NEAR FAR”,
anti-clockwise to focus on a nearby
object. Turn it clockwise to focus on a
distant object. You may find you need to
adjust the aperture again after this.
6 mm CS-mount lens
A low-cost 6 mm lens is available for the High Quality
Camera. This lens is suitable for basic photography.
It can also be used for macro photography because it
can focus objects at very short distances.
Aperture
To adjust the aperture, hold the camera with the lens
facing away from you. Turn the middle ring while
holding the outer ring, furthest from the camera, steady.
Turn clockwise to close the aperture and reduce image
brightness. Turn anti-clockwise to open the aperture.
Once you are happy with the light level, tighten the
screw on the side of the lens to lock the aperture.
Focus
To adjust focus, hold the camera with the lens facing
away from you. Hold the outer two rings of the lens;
this is easier if the aperture is locked as described
above. Turn the camera and the inner ring
anti-clockwise relative to the two outer rings to focus
on a nearby object. Turn them clockwise to focus on a
distant object. You may find you need to adjust the
aperture again after this.
Connectivity (connection)
Thread / connector: 1/4" (6.4 mm), 3D, ball
Physical properties
Type: mini tripod
Leg segments: 3 sections (2-fold extendable)
Handle: none
Material: metal
Tripod head: 3D ball head
Dimensions & weight
Weight: 125 g
Min.-max. height: 14-26 cm
Field of application
Use: photo, video (3D)
Recommended application: photo / video
Design (colour, design, motif, series)
Colour: black
Shade: black
Product area: photo & video
Series: Ball XL
Article number: 00004065
GTIN: 4007249040657
Index

Dataset
    CIFAR-10, 65, 201, 206, 211
    CIFAR-100, 65, 201, 206
    CIFAR10, 201
    CIFAR100, 201
    Fisher's Iris Data Set, 67–70, 162, 164–167, 169, 172
    ImageNet, 75
    MNIST, 64, 65
    TensorFlow, 65
dataset
    CIFAR-10, 206
Datensatz (dataset)
    CIFAR, 11, 12, 15, 60, 62, 65–67, 201, 206, 211
    NIST Special Database 1, 15, 64, 65
    NIST Special Database 3, 15, 64
Deeplearning4J, 79
DeepScene, 76
EfficientNet, 54
Eli5, 82
FastAI, 79
feature maps, 56
Frameworks, 77
General Purpose Input Output, see GPIO, 6, 15, 98
glob, 83
GoogleNet, 76
GPIO, 6, 15, 98, 99, 109
GPU, 15, 53, 79, 82, 192–194, 199
Graphics Processing Unit, see GPU, 15, 53
hyperparameter, 55
hyperparameters, 55
I2C, 15, 99
I2S, 15, 99
Image classification, 52
ImageNet, 55, 76, see Dataset
Inception, 55, 76
InceptionNet, 54
Inter-IC Sound, see I2S, 15, 99
Inter-Integrated Circuit, see I2C, 15, 99
IPython, 80
Keras, 78
LeNet, 53
LightGBM, 81
math, 83
matplotlib, 80
Microsoft Cognitive Toolkit, 79
Mlpy, 82
MNIST, 155, 156
MobileNet, 54, 55
MobileNetV2, 54
MXNet, 79
NetworkX, 81
NIST Special Database 1, see Datensatz, 15, 64
NIST Special Database 3, see Datensatz, 15, 64
NLTK, 81
NumPy, 80
Object detection, 52
OpenCV, 82
OpenNN, 83
OpenVINO, 82
os, 83
Pandas, 80
Pattern, 81
pickle, 84
PIL, 83
PyPI, 84
Python Software Foundation, 84
PyTorch, 78
random, 83
Rectified Linear Unit, see ReLu, 15, 58
ReLu, 15, 58, 62
Residual Neural Network, see ResNet, 15, 54
ResNet, 15, 54, 55, 75
ResNets, 76
RLE, 15, 74
Run-Length Encoding, 15, 74
SBC, 15, 95
scikit-learn, 80
SciPy, 80
SCL, 15, 99
Scrapy, 81
SDA, 15, 99
Seaborn, 81
Secure Shell, see SSH, 15, 107
segmentation, 52
Serial Clock, see SCL, 15, 99
Serial Data, see SDA, 15, 99
Serial Peripheral Interface, see SPI, 15, 99
Single-Board-Computer, see SBC, 15, 95
Speicherprogrammierbare Steuerung (programmable logic controller), see SPS, 15, 93
SPI, 15, 99
SPS, 15, 93
SSD-Mobilenet-V2, 76
SSH, 15, 107, 108
Statsmodels, 82