Object Detection
TA: Young-geun Kim
Biostatistics Lab.,
Seoul National University
March-June, 2018
1 Introduction
2 R-CNN
3 YOLO
4 Evaluation
Introduction
Challenges
Challenges (Conti.)
Challenges (Conti.)
Approaches
R-CNN
Selective Search
Detection Network
Limitation of R-CNN
Fast R-CNN
RoI pooling connects the raw image to the final extracted feature map
before the FC layers.
The RoI feature vector then passes through two sibling FC layers.
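As a rough illustration of the idea (a toy sketch, not the actual Fast R-CNN implementation), RoI pooling can be viewed as adaptive max pooling: the region is split into a fixed grid of bins and each bin is max-pooled, so regions of any size yield a fixed-size output. Here the feature map is a plain 2-D list and the RoI is given in feature-map cell coordinates; the function name and grid size are illustrative.

```python
def roi_pool(feature, roi, out_h=2, out_w=2):
    """Max-pool a rectangular region of a 2-D feature map into a
    fixed out_h x out_w grid, whatever the region's size.

    feature: 2-D list of activations (toy stand-in for a conv feature map).
    roi: (x1, y1, x2, y2) in feature-map cell coordinates, exclusive end.
    """
    x1, y1, x2, y2 = roi
    h, w = y2 - y1, x2 - x1
    out = []
    for i in range(out_h):
        # Integer bin boundaries along the vertical axis.
        ys = y1 + i * h // out_h
        ye = y1 + (i + 1) * h // out_h
        row = []
        for j in range(out_w):
            # Integer bin boundaries along the horizontal axis.
            xs = x1 + j * w // out_w
            xe = x1 + (j + 1) * w // out_w
            # Max over all cells that fall into this bin.
            row.append(max(feature[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        out.append(row)
    return out
```

Because the output grid is fixed, the FC layers that follow see the same input size for every proposal.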
Multi-task Loss
The bbox regression targets for class $k$ are
$t_x^k = (x^k - x_r)/w_r$, $t_y^k = (y^k - y_r)/h_r$, $t_w^k = \log(w^k/w_r)$, $t_h^k = \log(h^k/h_r)$.
The multi-task loss is
$L(p, u, t^u, v) = L_{cls}(p, u) + \lambda\,[u \ge 1]\, L_{loc}(t^u, v)$,
where $L_{cls}(p, u) = -\log p_u$ is the log loss for the true class $u$ and
$L_{loc}(t^u, v) = \sum_{i \in \{x, y, w, h\}} \mathrm{huber}(t_i^u - v_i)$.
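The parameterization and localization loss above can be sketched in a few lines of Python (a minimal illustration; here "huber" is the smooth-L1 loss with delta = 1 used by Fast R-CNN, and the function names are ours, not the paper's):

```python
import math

def bbox_targets(proposal, gt):
    """Regression targets t = (tx, ty, tw, th) for a proposal
    (xr, yr, wr, hr) against a ground-truth box (x, y, w, h),
    following the parameterization above: offsets normalized by
    the proposal size, log-scale width/height ratios."""
    xr, yr, wr, hr = proposal
    x, y, w, h = gt
    return ((x - xr) / wr, (y - yr) / hr,
            math.log(w / wr), math.log(h / hr))

def smooth_l1(d):
    """Smooth-L1 (Huber with delta = 1): quadratic near zero,
    linear in the tails, so outliers are penalized less than L2."""
    d = abs(d)
    return 0.5 * d * d if d < 1.0 else d - 0.5

def loc_loss(t, v):
    """L_loc: sum of smooth-L1 over the four box coordinates."""
    return sum(smooth_l1(ti - vi) for ti, vi in zip(t, v))
```

Note that a proposal identical to the ground truth yields the zero target vector, so the regressor learns corrections relative to the proposal.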
Faster R-CNN
3. For all regions classified as positive, refine them using the reg layer.
The RPN and fast R-CNN share the feature extractor. This shared
structure reduces test time, which is the origin of the name "Faster R-CNN".
The sharing is implemented by the following training sequence.
Phase                  Feature Extractor                Region Proposal
1. Train RPN           Initialized from ImageNet model  -
2. Train fast R-CNN    Initialized from ImageNet model  RPN from phase 1
3. Tune RPN            Frozen from phase 2              -
4. Tune fast R-CNN     Frozen from phase 2              RPN from phase 3
Table: Evaluation on VOC 2007 test set, adapted from Girshick, 2015
and Ren et al., 2015.
YOLO
You-Only-Look-Once
Terminology
Terminology (Conti.)
Terminology (Conti.)
All bounding boxes sharing a grid cell have the same conditional class
probability, formally $Pr(\text{Class}_i \mid \text{Object})$.
At test time, the class-specific confidence,
$Pr(\text{Class}_i) * IoU^{\text{truth}}_{\text{pred}}$, is
predicted by multiplying the predicted conditional class probability and the
objectness confidence.
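This product can be sketched directly (a minimal illustration; the function name is ours, and `objectness` stands for the box's predicted $Pr(\text{Object}) * IoU^{\text{truth}}_{\text{pred}}$):

```python
def class_confidence(cond_class_probs, objectness):
    """Class-specific confidence for one bounding box:
    Pr(Class_i | Object) * Pr(Object) * IoU = Pr(Class_i) * IoU.

    cond_class_probs: the grid cell's conditional class probabilities.
    objectness: the box's predicted objectness confidence.
    """
    return [p * objectness for p in cond_class_probs]
```

Because all boxes in a cell share the conditional class probabilities, only the objectness score differentiates them.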
Terminology (Conti.)
Architecture
Architecture (Conti.)
Loss
YOLO's loss function has five terms: the first two handle
bbox regression, the next two objectness classification,
and the last the class classification.
Here, $1_i$ and $1_{ij}$ are indicators of responsibility for the $i$th grid cell and
its $j$th bounding box, respectively.
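For reference, the loss as given in Redmon et al., 2016, with $S^2$ grid cells, $B$ boxes per cell, and weights $\lambda_{\text{coord}} = 5$, $\lambda_{\text{noobj}} = 0.5$:

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}}
    \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}}
    \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
         + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}}
    \left(C_i - \hat{C}_i\right)^2
 + \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{noobj}} \left(C_i - \hat{C}_i\right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
    \sum_{c \in \text{classes}} \left(p_i(c) - \hat{p}_i(c)\right)^2
\end{aligned}
```

The square roots on width and height make a fixed error count more in small boxes than in large ones.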
Seoul National University Deep Learning March-June, 2018 43 / 57
YOLO
Performance
Performance (Conti.)
Compared with fast R-CNN, YOLO has higher localization error but lower
background error.
Correct: correct class and IoU > .5
Loc: correct class, .1 < IoU < .5
Sim: similar class, IoU > .1
Other: wrong class, IoU > .1
Background: IoU < .1 for any object
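The five categories above can be expressed as a simple decision cascade (a sketch under our own naming; `similar_classes` is a hypothetical set of classes deemed similar to the true class, e.g. dog vs. cat):

```python
def categorize(pred_class, true_class, iou, similar_classes=()):
    """Assign one prediction to an error category, checked in order
    from strictest (Correct) to loosest (Background)."""
    if pred_class == true_class and iou > 0.5:
        return "Correct"
    if pred_class == true_class and 0.1 < iou < 0.5:
        return "Loc"          # right class, poorly localized
    if pred_class in similar_classes and iou > 0.1:
        return "Sim"          # confused with a similar class
    if iou > 0.1:
        return "Other"        # wrong class entirely
    return "Background"       # fired on background
```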
Evaluation
Figure: from
https://kr.mathworks.com/help/vision/ref/selectstrongestbbox.html
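The figure illustrates non-maximum suppression (what MATLAB's selectStrongestBbox performs). A minimal greedy NMS can be sketched as follows; the box format (x1, y1, x2, y2) and the threshold value are illustrative choices:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, threshold=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop every remaining box that overlaps it by more than
    `threshold`, and repeat on what is left."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= threshold]
    return keep
```

For example, two heavily overlapping detections of the same object collapse to the single stronger one, while a distant detection survives.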
Evaluation
Evaluation measures
Detecting all objects is easy: just classify every region as every object
class, so recall alone is not a useful measure.
What would be the value of TN? If the model is reasonable, TN
should be ∞, since there are innumerable background regions correctly left
undetected; hence precision and recall are used rather than accuracy.
Average Precision
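Assuming the VOC-style 11-point interpolation is intended here (the common convention for this era of detectors), AP can be computed as the mean, over recall levels 0.0, 0.1, ..., 1.0, of the maximum precision attained at recall at least that level:

```python
def average_precision(recalls, precisions):
    """11-point interpolated AP (VOC style).

    recalls, precisions: paired values along the precision-recall
    curve, e.g. obtained by sweeping the detection score threshold.
    """
    ap = 0.0
    for level in [i / 10 for i in range(11)]:
        # Interpolated precision: best precision at recall >= level.
        ps = [p for r, p in zip(recalls, precisions) if r >= level]
        ap += max(ps) if ps else 0.0
    return ap / 11
```

The interpolation smooths the characteristic zig-zag of raw precision-recall curves, so small score perturbations do not dominate the metric.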
References
Girshick, Ross, et al. "Rich feature hierarchies for accurate object
detection and semantic segmentation." Proceedings of the IEEE conference
on computer vision and pattern recognition (2014): 580-587.
Uijlings, Jasper RR, et al. "Selective search for object recognition."
International journal of computer vision 104.2 (2013): 154-171.
Felzenszwalb, Pedro F., and Daniel P. Huttenlocher. "Efficient
graph-based image segmentation." International journal of computer
vision 59.2 (2004): 167-181.
Felzenszwalb, Pedro F., et al. "Object detection with discriminatively
trained part-based models." IEEE transactions on pattern analysis and
machine intelligence 32.9 (2010): 1627-1645.
References (Conti.)
References (Conti.)
References (Conti.)
Boyd, Kendrick, Kevin H. Eng, and C. David Page. "Area under the
precision-recall curve: Point estimates and confidence intervals." Joint
European Conference on Machine Learning and Knowledge Discovery
in Databases. Springer, Berlin, Heidelberg, 2013.
Introduction to modern information retrieval