Object Detection Using Deep CNNs Trained On Synthetic Images
Abstract—The need for large annotated image datasets for training Convolutional Neural Networks (CNNs) has been a significant impediment for their adoption in computer vision applications. We show that with transfer learning an effective object detector can be trained almost entirely on synthetically rendered datasets. We apply this strategy for detecting packaged food products clustered in refrigerator scenes. Our CNN trained only with 4000 synthetic images achieves mean average precision (mAP) of 24 on a test set with 55 distinct products as objects of interest and 17 distractor objects. A further increase of 12% in the mAP is obtained by adding only 400 real images to these 4000 synthetic images in the training set. A high degree of photorealism in the synthetic images was not essential in achieving this performance. We analyze factors like training data set size and 3D model dictionary size for their influence on detection performance. Additionally, training strategies like fine-tuning with selected layers and early stopping, which affect transfer learning from synthetic scenes to real scenes, are explored. Training CNNs with synthetic datasets is a novel application of high-performance computing and a promising approach for object detection applications in domains where there is a dearth of large annotated image data.

Keywords-Convolutional Neural Networks (CNN); Deep learning; Transfer learning; Synthetic datasets; Object Detection; 3D Rendering

I. INTRODUCTION

The field of Computer Vision has reached new heights over the last few years. In the past, methods like DPMs [1], SIFT [2] and HOG [3] were used for feature extraction, and linear classifiers were used for making predictions. Other methods [4] used correspondences between template images and the scene image. Later works focused on class-independent object proposals [5] using segmentation and classification using hand-crafted features. Today, methods based on Deep Neural Networks (DNNs) have achieved state-of-the-art performance on image classification, object detection, and segmentation [6], [7]. DNNs have been successfully deployed in numerous domains [6], [7]. Convolutional Neural Networks (CNNs), specifically, have fulfilled the demand for a robust feature extractor that can generalize to new types of scenes. CNNs were initially deployed for image classification [6] and later extended to object detection [8]. The R-CNN approach [8] used object proposals and features from a pre-trained object classifier. Recently published works like Faster R-CNN [9] and SSD [10] learn object proposals and object classification in an end-to-end fashion.

The availability of large sets of training images has been a prerequisite for successfully training CNNs [6]. Manual annotation of images for object detection, however, is a time-consuming and mechanical task; what is more, in some applications the cost of capturing images with sufficient variety is prohibitive. In fact, the largest image datasets are built upon only a few categories for which images can be feasibly curated (20 categories in PASCAL VOC [11], 80 in COCO [12], and 200 in ImageNet [13]). In applications where a large set of intra-category objects needs to be detected, the option of supervised learning with CNNs is even tougher as it is practically impossible to collect sufficient training material.

There have been solutions proposed to reduce annotation efforts by employing transfer learning or simulating scenes to generate large image sets. The research community has proposed multiple approaches for the problem of adapting vision-based models trained in one domain to a different domain [14]–[18]. Examples include: re-training a model in the target domain [19]; adapting the weights of a pre-trained model [20]; using pre-trained weights for feature extraction [21]; and learning common features between domains [22].

Attempts to use synthetic data for training CNNs to adapt to real scenarios have been made in the past. Peng et al. used available 3D CAD models, both with and without texture, and rendered images after varying the projections and orientations of the objects, evaluating on 20 categories in the PASCAL VOC 2007 data set [23]. The CNN employed for their approach used a general object proposal module [8] which operated independently from the fine-tuned classifier network. In contrast, Su and coworkers [24] used the rendered 2D images from 3D models on varying backgrounds for pose estimation. Their work also uses an object proposal stage and limits the objects of interest to a few specific categories from the PASCAL VOC data set. Georgakis and coworkers [25] propose to learn object detection with synthetic data generated by object instances being superimposed into real scenes at different positions, scales, and illumination. They
propose the use of existing object recognition data sets such as BigBird [26] rather than using 3D CAD models. They limit their synthesized scenes to low-occlusion scenarios with 11 products in the GMU-Kitchens data set. Gupta et al. generate a synthetic training set by taking advantage of scene segmentation to create synthetic training examples; however, the goal is text localization instead of object detection [21]. Tobin et al. perform domain randomization with low-fidelity rendered images from 3D meshes; however, their objective is to locate simpler polygon-shaped objects restricted to a table top in world coordinates [27]. In [28], [29], the Unity game engine is used to generate RGB-D rendered images and semantic labels for outdoor and indoor scenes. They show that by using photo-realistic rendered images the effort for annotation can be significantly reduced. They combine synthetic and real data to train models for semantic segmentation; however, the network requires depth map information for semantic segmentation.

None of the existing approaches to training with synthetic data consider the use of synthetic image datasets for training a general object detector in a scenario where high intra-class variance is present along with high clutter or occlusion. Additionally, while previous works have compared the performance using benchmark datasets, the study of cues or hyper-parameters involved in transfer learning has not received sufficient attention. We propose to detect object candidates in the scene with large intra-class variance, compared to an approach of detecting objects for a few specific categories. We are especially interested in synthetic datasets which do not require extensive effort towards achieving photorealism. In this work, we simulate scenes using 3D models and use the rendered RGB images to train a CNN-based object detector. We automate the process of rendering and annotating the 2D images with sufficient diversity to train the CNN end-to-end and use it for object detection in real scenes. Our experiments also explore the effects of different parameters like data set size and 3D model repository size. We also explore the effects of training strategies like fine-tuning selected layers and early stopping [30] on transfer learning from simulation to reality. The rest of this paper is organized as follows: our methodology is described in section II, followed by the results we obtain reported in section III, finally concluding the paper in section IV.

II. METHOD

Given an RGB image captured inside a refrigerator, our goal is to predict a bound-box and the object class category for each object of interest. In addition, there are a few objects in the scene that need to be ignored. Our approach is to train a deep CNN with synthetic rendered images from available 3D models. An overview of the approach is shown in Figure 1. Our work can be divided into two major parts, namely synthetic image rendering from 3D models and transfer learning by fine-tuning the deep neural network with synthetic images.

Figure 1. Overview of our approach to train object detectors for real images based on synthetic rendered images.

A. Synthetic Generation of Images from 3D Models

We use the open-source 3D graphics software Blender. The Blender-Python API makes it possible to load 3D models and automate the scene rendering. We use the Cycles Render Engine available with Blender, since it supports ray-tracing, to render the synthetic images. Since all the required annotation data is available, we use the KITTI [31] format with bound-box co-ordinates, truncation state and occlusion state for each object in the image.

Real-world images have a lot of information embedded about the environment, illumination, surface materials, shapes, etc. Since the trained model must, at test time, be able to generalize to real-world images, we take the following aspects into consideration during generation of each scenario:

• Number of objects
• Shape, texture, and materials of the objects
• Texture and materials of the refrigerator
• Packing pattern of the objects
• Position and orientation of the camera
• Illumination via light sources
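As a concrete illustration, the short script below sketches how such randomized scenes could be assembled, rendered, and annotated automatically through Blender's bpy Python API. It is only a sketch under stated assumptions, not the exact pipeline used in this work: the file paths, the light object name, the coordinate ranges, and the bounding-box projection are placeholders, and some bpy operator names differ between Blender versions.

```python
# Minimal sketch of an automated Blender scene-randomization and rendering
# loop (NOT the exact script used in this work). Paths, the light object
# name, and the coordinate ranges are placeholder assumptions.
import random
import bpy
import mathutils
from bpy_extras.object_utils import world_to_camera_view

RES = 512
scene = bpy.context.scene
scene.render.engine = 'CYCLES'          # Cycles supports ray-traced rendering
scene.render.resolution_x = RES
scene.render.resolution_y = RES

def image_bbox(obj, cam):
    """Project the corners of obj's 3D bounding box into pixel coordinates."""
    xs, ys = [], []
    for corner in obj.bound_box:
        world_co = obj.matrix_world @ mathutils.Vector(corner)
        u, v, _ = world_to_camera_view(scene, cam, world_co)
        xs.append(u * RES)
        ys.append((1.0 - v) * RES)      # image origin is top-left
    return min(xs), min(ys), max(xs), max(ys)

def render_random_scene(model_paths, out_stem):
    # Import 5-25 randomly chosen product models and scatter them on a tray.
    placed = []
    for path in random.sample(model_paths, random.randint(5, 25)):
        bpy.ops.import_scene.obj(filepath=path)   # name differs in Blender 3.x
        obj = bpy.context.selected_objects[0]
        obj.location = (random.uniform(-0.3, 0.3), random.uniform(-0.2, 0.2), 0.0)
        obj.rotation_euler = (0.0, 0.0, random.uniform(0.0, 6.28))
        placed.append(obj)

    # Vary illumination and camera pose per scene (dim, fridge-like lighting).
    bpy.data.objects['Light'].data.energy = random.uniform(50, 400)
    cam = scene.camera
    cam.location = (random.uniform(-0.5, 0.5), -1.0, random.uniform(0.3, 0.8))

    scene.render.filepath = out_stem + '.png'
    bpy.ops.render.render(write_still=True)

    # One KITTI-style line per object: class, truncation, occlusion, alpha,
    # 2D bbox, then 3D fields (zeroed here because only 2D boxes are needed).
    with open(out_stem + '.txt', 'w') as labels:
        for obj in placed:
            x1, y1, x2, y2 = image_bbox(obj, cam)
            labels.write(f"object 0.0 0 0.0 {x1:.1f} {y1:.1f} {x2:.1f} {y2:.1f} "
                         "0 0 0 0 0 0 0\n")
```

In the same spirit, the truncation and occlusion flags mentioned above would be filled in from the projected geometry rather than the zero placeholders used here.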
Figure 2. Overview of the training dataset. a) Snapshots of a few of the 3D models from the ShapeNet database used for rendering images. We illustrate the variety in object textures, surface materials and shapes in the 3D models used for rendering. b) Rendered non-photorealistic images with varying object textures, surface materials and shapes, arranged in random, grid and bin-packed patterns, captured from various camera angles with different illuminations. c) A few real images used to illustrate the difference between real and synthetic images. These images are a subset of the real dataset used for benchmarking the performance of the model trained with synthetic images.
In order to simulate the scenario, we need 3D models, their texture information and metadata. Thousands of 3D CAD models are available online. We choose the ShapeNet [32] database since it provides a large variety of objects of interest for our application. Among various categories from ShapeNet like bottles, tins, cans and food items, we selectively add 616 object models to an object repository (R0) for generating scenes. Figure 2a shows a few of the models in R0. The variety helps randomize the aspects of shape, texture and materials of the objects. For the refrigerator, we choose a model from Archive3D [33] suitable for the application. The design of the refrigerator remains the same for all the scenarios, though the textures and material properties are dynamically chosen.

For generating a training set of rendered images, the 3D scenes need to be distinct. The refrigerator model, together with 5-25 randomly selected objects from R0, is imported into each scene. To simulate clusters of objects packed in a refrigerator as in real-world scenarios, we use three placement patterns for the 3D models, namely grid, random and bin packing. The grid pattern places the objects in a particular scene on a refrigerator tray top at predefined distances. Random placement drops the objects at random locations on the refrigerator tray top. Bin packing tries to optimize the usage of the tray-top area, placing objects very close together and clustered in the scene to replicate common refrigerator scenarios. The light sources are placed such that the illumination varies in every scene and the images are not biased towards a well-lit environment, since refrigerators generally tend to have dim lighting. Multiple cameras are placed at random locations and orientations to render images from each scene. The refrigerator texture and material properties are dynamically chosen for every rendered image. Figure 2b shows a few rendered images used as the training set, while Figure 2c shows the subset of real-world images used in training.
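To make the three placement patterns concrete, the following is a minimal sketch of how the tray-top (x, y) positions could be generated; the tray dimensions, spacing, and greedy shelf-packing rule are illustrative assumptions rather than the exact procedure used for the dataset.

```python
# Illustrative generation of tray-top (x, y) placements for the three
# patterns; tray size, grid spacing and the packing rule are assumptions.
import random

TRAY_W, TRAY_D = 0.6, 0.4            # assumed tray dimensions (metres)

def grid_positions(n, spacing=0.12):
    """Grid: objects at predefined distances on the tray top."""
    cols = max(1, int(TRAY_W // spacing))
    return [(-TRAY_W / 2 + (i % cols) * spacing,
             -TRAY_D / 2 + (i // cols) * spacing) for i in range(n)]

def random_positions(n):
    """Random: objects dropped at random tray-top locations."""
    return [(random.uniform(-TRAY_W / 2, TRAY_W / 2),
             random.uniform(-TRAY_D / 2, TRAY_D / 2)) for _ in range(n)]

def bin_packed_positions(footprints):
    """Bin packing: greedily fill the tray row by row with no slack,
    producing tightly clustered arrangements. footprints = [(width, depth)]."""
    positions, x, y, row_depth = [], -TRAY_W / 2, -TRAY_D / 2, 0.0
    for w, d in footprints:
        if x + w > TRAY_W / 2:       # current row is full: start a new row
            x, y = -TRAY_W / 2, y + row_depth
            row_depth = 0.0
        positions.append((x + w / 2, y + d / 2))
        x += w
        row_depth = max(row_depth, d)
    return positions
```

The greedy shelf rule here is just one simple way to obtain the tightly packed, partially occluded arrangements described above.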
B. Deep Neural Network Architecture, Training and Evaluation

Figure 3. Work-flow for the major steps of the system. a) Using annotated images, the FCN generates a coverage map and bound-box co-ordinates. The training loss is a weighted sum of coverage and bound-box loss. b) At validation time, coverage map and bound-boxes are generated from the FCN.

Figure 3 provides a detailed illustration of the network architecture and the work-flow for the training and validation stages. For neural network training we use NVIDIA DIGITS DetectNet [34] with the Caffe [35] library as back-end. During training, the RGB images with a resolution of 512 x 512 pixels are labelled in the standard KITTI [31] format for object detection. We neglect objects that are truncated or highly occluded in the images, using the appropriate flags in the ground-truth labels generated while rendering. The dataset is then fed into a fully convolutional network (FCN) predicting a coverage map for each detected class. The FCN network, represented concisely in Figure 4, has the same structure as GoogLeNet [7] without the data input layers and output layers. For our experiments, we use pre-trained weights on ImageNet to initialize the FCN network, which has earlier been helpful for transfer learning [25]. The coverage loss is computed as the squared error between the coverage maps predicted by the network and ground truth:

\[
\frac{1}{2N}\sum_{i=1}^{N}\left(\mathrm{coverage}_{i}^{t}-\mathrm{coverage}_{i}^{p}\right)^{2} \qquad (1)
\]
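Read as code, Eq. (1) is one half of the squared difference between predicted and ground-truth coverage maps, averaged over the N training samples. A minimal NumPy illustration follows (the array shapes are an assumption; DetectNet's actual loss is implemented inside its Caffe layers):

```python
import numpy as np

def coverage_loss(pred, target):
    """Eq. (1): squared difference between predicted and ground-truth
    coverage maps, summed per sample and averaged over the N samples
    with the conventional 1/2 factor. Both arrays: shape (N, H, W)."""
    assert pred.shape == target.shape
    n = pred.shape[0]
    return float(np.sum((target - pred) ** 2) / (2.0 * n))

# Example: a batch of 4 coverage maps on a 32 x 32 grid of cells.
pred = np.random.rand(4, 32, 32)
target = (np.random.rand(4, 32, 32) > 0.5).astype(float)
print(coverage_loss(pred, target))
```

As the Figure 3 caption notes, the full training objective also contains a bounding-box regression term, weighted against this coverage term.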
B. Detection Accuracy

We evaluate our best object detector model on a set of 50 crowd-sourced refrigerator scenes with all cue variances, covering 55 distinct objects of interest considered as positives and 17 distractor objects as negatives. Figure 8 shows the variety in the test set and the predicted bound-boxes for all refrigerator images. The detector achieves a mAP of 24 on this dataset, which is a promising result considering that no distractor objects were used while training with synthetic images.

Figure 8. Scenes representing variance in scale, background, textures, illumination, packing patterns and material properties. Top Row: the object detector correctly predicts the bound-boxes for all objects of interest. Middle Row: the object detector misses objects of interest. Bottom Row: the object detector falsely predicts the presence of an object.
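The paper does not spell out the exact mAP protocol, so the sketch below is only a generic illustration of how average precision over such a test set is commonly computed (score-sorted greedy matching at an assumed IoU threshold of 0.5, 11-point interpolation); the mAP is then the mean of this value over the object classes.

```python
# Generic sketch of detection evaluation: greedy IoU matching at a fixed
# threshold, then 11-point interpolated average precision for one class.
import numpy as np

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def average_precision(detections, gt_boxes, iou_thr=0.5):
    """detections: list of (image_id, score, box); gt_boxes: dict mapping
    image_id -> list of ground-truth boxes for one class."""
    n_gt = sum(len(v) for v in gt_boxes.values())
    matched = {img: [False] * len(v) for img, v in gt_boxes.items()}
    tps, fps = [], []
    for img, score, box in sorted(detections, key=lambda d: -d[1]):
        cands = gt_boxes.get(img, [])
        best = max(range(len(cands)), key=lambda j: iou(box, cands[j]), default=-1)
        if best >= 0 and iou(box, cands[best]) >= iou_thr and not matched[img][best]:
            matched[img][best] = True
            tps.append(1.0); fps.append(0.0)
        else:
            tps.append(0.0); fps.append(1.0)
    tp, fp = np.cumsum(tps), np.cumsum(fps)
    recall = tp / max(n_gt, 1)
    precision = tp / np.maximum(tp + fp, 1e-9)
    # 11-point interpolation; mAP averages this quantity over all classes.
    return np.mean([precision[recall >= r].max() if np.any(recall >= r) else 0.0
                    for r in np.linspace(0, 1, 11)])
```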
We observe that the detector handles scale, shape and texture variance, though packing patterns like vertical stacking or highly oblique camera angles lead to false predictions. A few vegetables among the distractor objects are falsely predicted as objects of interest, suggesting the influence of pre-training.

IV. CONCLUSION

The question arises how well a network trained with synthetic images fares against one trained with real-world images. Hence we compare the performance of networks trained with three different training image sets, as illustrated in Figure 9. The synthetic training set consisted of 4000 images with 200 3D object models of interest, while the real training set consisted of 400 images parsed from the internet with 240 distinct products and 19 distractor objects. The hybrid set of synthetic and real images consisted of 3600 synthetic and 400 real images. All models were evaluated on a set of 50 refrigerator scenes with less than 5% object overlap between the test-set and train-set images. The CNN trained fully with 4000 synthetic images (24 mAP) underperforms against one trained with 400 real images (28 mAP), but the addition of the 4000 synthetic images to the real dataset boosts the detection performance by 12% (36 mAP), which signifies the importance of transferable cues from synthetic to real.

Figure 9. Performance plots illustrating the effect of including synthetic images while training neural networks.

To improve the observed performance, several tactics can be tried. The presence of distractor objects in the test set was observed to negatively impact performance. We are working on the addition of distractor objects to the 3D model repository for rendering scenes with distractor objects, to train the network to become aware of them. Optimizing the model architecture or replacing DetectNet with object
proposal networks might be another alternative. Training CNNs for semantic segmentation using synthetic images and the addition of depth information to the training sets are also expected to help in the case of images with a high degree of occlusion.

ACKNOWLEDGMENT

We acknowledge funding support from the Innit Inc. consultancy grant CNS/INNIT/EE/P0210/1617/0007 and High Performance Computing Lab support from Mr. Sudeep Banerjee. We thank Aalok Gangopadhyay for the insightful discussions.

REFERENCES

[1] D. Forsyth, "Object detection with discriminatively trained part-based models," Computer, vol. 47, no. 2, pp. 6–7, Feb. 2014.

[2] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, Nov. 2004.

[3] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. I. IEEE, 2005, pp. 886–893.

[4] S. Ekvall, F. Hoffmann, and D. Kragic, "Object recognition and pose estimation for robotic manipulation using color cooccurrence histograms," in Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003), vol. 2. IEEE, 2003, pp. 1284–1289.

[5] J. R. Uijlings, K. E. Van De Sande, T. Gevers, and A. W. Smeulders, "Selective search for object recognition," International Journal of Computer Vision, vol. 104, no. 2, pp. 154–171, Sep. 2013.

[6] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in International Conference on Neural Information Processing Systems. Curran Associates Inc., 2012, pp. 1097–1105.

[7] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, Jun. 2015, pp. 1–9.

[8] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Region-based convolutional networks for accurate object detection and segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 1, pp. 142–158, Jan. 2016.

[9] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, Jun. 2017.

[10] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Y. Fu, and A. C. Berg, "SSD: Single shot multibox detector," in Lecture Notes in Computer Science, vol. 9905 LNCS, 2016, pp. 21–37.

[11] M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, "The Pascal Visual Object Classes Challenge: A retrospective," International Journal of Computer Vision, vol. 111, no. 1, pp. 98–136, 2014.

[12] T. Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common objects in context," in Lecture Notes in Computer Science, vol. 8693 LNCS, part 5, 2014, pp. 740–755.

[13] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Jun. 2009, pp. 248–255.

[14] W. Li, L. Duan, D. Xu, and I. W. Tsang, "Learning with augmented features for supervised and semi-supervised heterogeneous domain adaptation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 6, pp. 1134–1148, Jun. 2014.

[15] J. Hoffman, E. Rodner, J. Donahue, T. Darrell, and K. Saenko, "Efficient learning of domain-invariant image representations," in ICLR, Jan. 2013, pp. 1–9.

[16] J. Hoffman, S. Guadarrama, E. Tzeng, R. Hu, J. Donahue, R. Girshick, T. Darrell, and K. Saenko, "LSDA: Large scale detection through adaptation," in Proceedings of the 27th International Conference on Neural Information Processing Systems. MIT Press, 2014, pp. 3536–3544.

[17] B. Kulis, K. Saenko, and T. Darrell, "What you saw is not what you get: Domain adaptation using asymmetric kernel transforms," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, Jun. 2011, pp. 1785–1792.

[18] M. Long, Y. Cao, J. Wang, and M. I. Jordan, "Learning transferable features with deep adaptation networks," in Proceedings of the 32nd International Conference on Machine Learning, vol. 37, 2015, pp. 97–105.

[19] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, "How transferable are features in deep neural networks?" in Proceedings of the 27th International Conference on Neural Information Processing Systems. MIT Press, 2014, pp. 3320–3328.

[20] Y. Li, N. Wang, J. Shi, J. Liu, and X. Hou, "Revisiting batch normalization for practical domain adaptation," arXiv preprint arXiv:1603.04779, Mar. 2016.

[21] A. Gupta, A. Vedaldi, and A. Zisserman, "Synthetic data for text localisation in natural images," arXiv preprint arXiv:1604.06646, Apr. 2016.

[22] E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, and T. Darrell, "Deep domain confusion: Maximizing for domain invariance," arXiv preprint arXiv:1412.3474, Dec. 2014.

[23] X. Peng, B. Sun, K. Ali, and K. Saenko, "Learning deep object detectors from 3D models," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1278–1286.

[24] H. Su, C. R. Qi, Y. Li, and L. J. Guibas, "Render for CNN: Viewpoint estimation in images using CNNs trained with rendered 3D model views," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 2686–2694.

[35] M. P. Vlastelica, S. Hayrapetyan, M. Tapaswi, and R. Stiefelhagen, "KIT at MediaEval 2015 - Evaluating visual cues for affective impact of movies task," in CEUR Workshop Proceedings, vol. 1436. ACM Press, 2015, pp. 675–678.

[36] D. Hoiem, Y. Chodpathumwan, and Q. Dai, "Diagnosing error in object detectors," in Lecture Notes in Computer Science, vol. 7574 LNCS, part 3. Springer, Berlin, Heidelberg, 2012, pp. 340–353.