cs231n 2018 ds06
Detection and Segmentation
Vincent Chen and Edward Chou
Agenda
● Why would understanding different architectures be useful?
● Modular Frameworks
● Describe Modern Frameworks
○ Detection
○ Segmentation
○ Trade-offs
○ Open Source Links
● Using Detection for Downstream Tasks
Why do I need this?
● SoTA Object Detectors are really good!
○ Used in consumer products
● Understanding trade-offs: when should I use each framework?
● Object detection/segmentation is a first step to many interesting problems!
○ While not perfect, you can assume you have bounding boxes for your visual tasks!
○ Examples: scene graph prediction, dense captioning, medical imaging features
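This sketch shows detector output used as input to a later task. It keeps only confident detections and crops their regions out of the image. Several details are assumptions and do not match any particular detector's output format:
- Each detection is a dict with hypothetical `box`, `label`, and `score` keys.
- Boxes are integer `(x1, y1, x2, y2)` pixel coordinates.
- The score threshold is 0.5.

```python
def confident_crops(image, detections, min_score=0.5):
    """Return (label, crop) pairs for detections scoring at least min_score.

    Assumed (hypothetical) detection format: {"box": (x1, y1, x2, y2),
    "label": str, "score": float}, integer pixel coordinates, x2/y2 exclusive.
    `image` is a 2D list indexed as image[row][col].
    """
    crops = []
    for det in detections:
        if det["score"] < min_score:
            continue  # drop low-confidence boxes before the downstream task
        x1, y1, x2, y2 = det["box"]
        # Rows correspond to y, columns to x.
        crop = [row[x1:x2] for row in image[y1:y2]]
        crops.append((det["label"], crop))
    return crops


image = [[r * 4 + c for c in range(4)] for r in range(4)]
detections = [
    {"box": (0, 0, 2, 2), "label": "cat", "score": 0.9},
    {"box": (1, 1, 4, 3), "label": "dog", "score": 0.2},
]
print(confident_crops(image, detections))  # [('cat', [[0, 1], [4, 5]])]
```

A captioning, relation, or feature model would then consume the crops and labels in place of the raw image.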
Modular Frameworks
● Base network
○ Feature extraction
● Proposal Generation
○ Sliding windows, regions of interest (RoIs), or use a network?
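The simplest proposal generator listed above is an exhaustive sliding window. This sketch enumerates candidate boxes for a few window sizes. The image size, window sizes, and stride are illustrative assumptions. Even for a small image the number of candidates grows quickly, which is one motivation for generating proposals with a network instead.

```python
def sliding_window_proposals(img_h, img_w, window_sizes, stride):
    """Enumerate (x1, y1, x2, y2) boxes for each (win_h, win_w) window size."""
    boxes = []
    for win_h, win_w in window_sizes:
        # Slide the window top-to-bottom, left-to-right, staying inside the image.
        for y in range(0, img_h - win_h + 1, stride):
            for x in range(0, img_w - win_w + 1, stride):
                boxes.append((x, y, x + win_w, y + win_h))
    return boxes


print(sliding_window_proposals(4, 4, [(2, 2)], stride=2))
# [(0, 0, 2, 2), (2, 0, 4, 2), (0, 2, 2, 4), (2, 2, 4, 4)]

# Three square window sizes on a 224x224 image with stride 8 (hypothetical settings).
print(len(sliding_window_proposals(224, 224, [(32, 32), (64, 64), (128, 128)], 8)))  # 1235
```

Each of these boxes would have to be scored by the classifier. A learned proposal stage replaces this exhaustive list with a much smaller set of likely boxes.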
Modern Convolutional Detection/Segmentation
Detection
● R-FCN
● Faster R-CNN
● YOLO
● SSD
Segmentation
● Mask R-CNN
● SegNet
● U-Net, DeepLab, and more!
Modern Convolutional Object Detectors
Dilated Convolutions
Proposes a ‘context module’ that uses dilated convolutions for multi-scale context aggregation.
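A dilated convolution spaces its kernel taps `dilation` elements apart. Stacking layers with growing dilation therefore enlarges the receptive field exponentially, with no pooling and no extra parameters. The 1D sketch below is illustrative only. Its doubling dilation schedule (1, 2, 4, 8) is an assumption chosen to show the exponential growth, not necessarily the exact configuration of the context module.

```python
def dilated_conv1d(x, w, dilation=1):
    """'Valid' 1D dilated convolution (cross-correlation, as CNN libraries compute it).

    Output i combines x[i], x[i + dilation], ..., x[i + (k - 1) * dilation].
    """
    k = len(w)
    span = (k - 1) * dilation + 1  # input extent seen by one output element
    return [
        sum(w[j] * x[i + j * dilation] for j in range(k))
        for i in range(len(x) - span + 1)
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field (in input elements) of stacked stride-1 dilated convolutions."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d  # each layer widens the field by (k - 1) * d
    return rf


x = [1, 2, 3, 4, 5, 6, 7]
print(dilated_conv1d(x, [1, 1, 1], dilation=2))  # [9, 12, 15]
print(receptive_field(3, [1, 1, 1, 1]))  # 9: ordinary convolutions grow linearly
print(receptive_field(3, [1, 2, 4, 8]))  # 31: doubling dilations grow exponentially
```

With four layers of 3-tap kernels, the dilated stack sees 31 inputs instead of 9. The parameter count is the same.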
SegNet Architecture
Uses a novel technique to upsample the encoder output: the max-pooling indices from each encoder pooling layer are stored and reused by the decoder for upsampling. This gives reasonably good performance and is space efficient compared with FCN, which keeps full encoder feature maps for its decoder.
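Index-based upsampling works in two steps:
1. During 2x2 max pooling, the encoder records where each maximum came from.
2. The decoder places each value back at that position and leaves zeros elsewhere.

In SegNet, trainable convolutions then densify the resulting sparse map; this pure-Python sketch omits that step. It also assumes even input dimensions.

```python
def max_pool_2x2_with_indices(x):
    """2x2/stride-2 max pooling that also returns the (row, col) of each max."""
    h, w = len(x), len(x[0])
    pooled, indices = [], []
    for i in range(0, h, 2):
        prow, irow = [], []
        for j in range(0, w, 2):
            window = [(x[r][c], (r, c)) for r in (i, i + 1) for c in (j, j + 1)]
            val, pos = max(window, key=lambda t: t[0])  # first max wins on ties
            prow.append(val)
            irow.append(pos)
        pooled.append(prow)
        indices.append(irow)
    return pooled, indices


def max_unpool_2x2(pooled, indices, h, w):
    """Scatter pooled values back to their stored positions; other cells stay 0."""
    out = [[0.0] * w for _ in range(h)]
    for prow, irow in zip(pooled, indices):
        for val, (r, c) in zip(prow, irow):
            out[r][c] = val
    return out


x = [[1, 5, 2, 0],
     [3, 4, 8, 1],
     [0, 0, 1, 1],
     [9, 2, 3, 7]]
pooled, idx = max_pool_2x2_with_indices(x)
print(pooled)  # [[5, 8], [9, 7]]
print(max_unpool_2x2(pooled, idx, 4, 4))
```

Only the argmax position within each window must be kept for the decoder. Storing the full encoder feature map is not necessary.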
Mask R-CNN
U-Net
- Encoder-decoder architecture.
- Suited to tasks where the desired output includes localization, i.e., a class label is assigned to each pixel.
- Training in patches helps compensate for limited training data.
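Training in patches means cutting one image into many overlapping crops, each of which becomes a training example. The patch size and stride below are hypothetical. For segmentation, the same function would be applied to the label mask so that each image patch keeps its per-pixel labels.

```python
def extract_patches(image, patch, stride):
    """Return all patch x patch crops of a 2D list, stepping by `stride`."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            patches.append([row[left:left + patch] for row in image[top:top + patch]])
    return patches


def num_patches(h, w, patch, stride):
    """Number of patches extract_patches would return, without building them."""
    return ((h - patch) // stride + 1) * ((w - patch) // stride + 1)


image = [[r * 4 + c for c in range(4)] for r in range(4)]
mask = [[(r + c) % 2 for c in range(4)] for r in range(4)]  # toy per-pixel labels
pairs = list(zip(extract_patches(image, 2, 1), extract_patches(mask, 2, 1)))
print(len(pairs))  # 9 aligned (image, mask) pairs from a single 4x4 image
print(num_patches(512, 512, 128, 64))  # 49
```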
Open Source Links: Object Detection
● TensorFlow detection model zoo: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md
● Object detection tutorial notebook: https://github.com/tensorflow/models/blob/master/research/object_detection/object_detection_tutorial.ipynb
Further Reading
Speed/accuracy tradeoffs for modern convolutional object detectors (2017):
https://arxiv.org/pdf/1611.10012.pdf