Computer vision

Basics of
Computer Vision
Dmitry Ryabokon, github.com/dryabokon

Content
1. CNNs out of box
2. Basic operations
3. Feature extraction
4. Transfer learning
5. Fine Tuning
6. Benchmarking the CNNs
7. Object detection
8. Detection of patterns and anomalies
9. Stereo Vision
10. Reconstruct features back into images
11. Style Transfer
12. GAN
13. Labeling
14. Color detection
15. PyTorch

4
The popular networks
Classification
o LeNet Model
o AlexNet Model
o VGG Model
o ResNet Paper
o YOLO9000 Paper
o DenseNet Paper
Segmentation
o FCN8 Paper
o SegNet Paper
o U-Net Paper
o E-Net Paper
o ResNetFCN Paper
o PSPNet Paper
o Mask RCNN Paper
Detection
o Faster RCNN Paper
o SSD Paper
o YOLOv2 Paper
o R-FCN Paper
CNNs out of box

5
Some datasets available for research
MNIST: 10 classes, ~7000 ex. per class ImageNet: 1000 classes, ~100 ex per class
CNNs out of box

6
The Street View House Numbers
10 classes, ~2000 ex. per class
CIFAR: 10 classes, 6000 ex. per class
100 classes, 600 ex per class
CNNs out of box

7
Olivetti database: 40 classes .. and much more @ kaggle
CNNs out of box

8
26%
16.40%
11.70%
7.30%
6.70%
3.60% 3.10%
0%
5%
10%
15%
20%
25%
30%
2010 2011 2012 2013 2014 2015 2016 2017
AlexNet VGG
GoogLeNet
ResNet
GoogLeNet-v4
XRCE
ImageNet Classification error
ZF
CNNs out of box

9
real-world image text extraction.
compressing and decompressing images
automatic speech recognition
semantic image segmentation
image matching and retrieval
image-to-text
computer vision
discovery of latent 3D keypoints
unsupervised learning
localizing and identifying multiple objects in a single image
3D object reconstruction
image classification
identify the name of a street (in France) from an image
predicting future video frames
https://github.com/keras-team/keras/tree/master/examples
https://github.com/tensorflow/models/tree/master/research
Usage of the neural networks
CNNs out of box

11
CNNs out of box
import tensorflow_hub as hub
import urllib.request
import cv2
import numpy
import json
import tensorflow as tf
# ----------------------------------------------------------------------------------------------------------------------
URL = 'https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json'
data = json.loads(urllib.request.urlopen(URL).read().decode())
class_names = [data['%d'%i][1] for i in range(0,999)]
# ----------------------------------------------------------------------------------------------------------------------
def example_predict():
module = hub.Module("https://tfhub.dev/google/imagenet/resnet_v2_50/classification/1")
img = cv2.imread('data/ex-natural/dog/dog_0000.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (224, 224)).astype(numpy.float32)
img = numpy.array([img]).astype(numpy.float32) / 255.0
sess = tf.Session()
sess.run(tf.global_variables_initializer())
sess.run(tf.tables_initializer())
outputs = module(dict(images=img), signature="image_classification", as_dict=True)
prob = outputs["default"].eval(session=sess)[0]
idx = numpy.argsort(-prob)[0]
print(class_names[idx], prob[idx])
sess.close()
return
# ----------------------------------------------------------------------------------------------------------------------
if __name__ == '__main__':
example_predict()

13
CNNs out of box
from keras.applications import MobileNet
from keras.applications.xception import Xception
from keras.applications.mobilenet import preprocess_input
import urllib.request
import cv2
import numpy
import json
from keras import backend as K
K.set_image_dim_ordering('tf')
# ----------------------------------------------------------------------------------------------------------------------
URL = 'https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json'
data = json.loads(urllib.request.urlopen(URL).read().decode())
class_names = [data['%d'%i][1] for i in range(0,999)]
# ----------------------------------------------------------------------------------------------------------------------
def example_predict():
CNN = MobileNet()
#CNN = Xception()
img = cv2.imread('data/ex-natural/dog/dog_0000.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (224, 224)).astype(numpy.float32)
prob = CNN.predict(preprocess_input(numpy.array([img])))
idx = numpy.argsort(-prob[0])[0]
print(class_names[idx], prob[0, idx])
return
# ----------------------------------------------------------------------------------------------------------------------
if __name__ == '__main__':
example_predict()

15
Basic operations
Convolution

16
Basic operations
Convolution
=

17
Basic operations
Convolution
=

18
Basic operations
Convolution
=

19
Basic operations
Convolution
=

20
Basic operations
Convolution
=

21
Basic operations
Convolution
=

22
Basic operations
Convolution
=

23
Basic operations
Convolution
=

24
Basic operations
Convolution
=

25
Basic operations
Convolution
=

26
Basic operations
Convolution
=

27
Basic operations
Convolution
=

28
Basic operations
Convolution
=

29
Basic operations
Convolution
=

30
Basic operations
Convolution
=

31
Basic operations
Convolution
=

32
Basic operations
Convolution

33
Basic operations
Convolution

34
Basic operations
De-Convolution

38
Basic operations
Fully Connected

40
Basic operations
Normalization
layer batch instance group

48
airplane car cat dog
flower fruit motorbike person
Feature extraction

49
Feature extraction

52
How CNN constructs the features
AlexNet

60
MobileNetFeature extraction

61

63

73
Random before learning After learning
Finetuning

74
Filters after learning
Finetuning

78
2ConvlayersBenchmarking the CNNs

79
AlexNetBenchmarking the CNNs

80
InceptionBenchmarking the CNNs

81
MobileNetBenchmarking the CNNs

Reconstruct
features back into
images

83AlexNet
Reconstruct features back into images

84AlexNet

85AlexNet

86
Features
AlexNet

87
Reconstructed
AlexNet

88
Original

89
Features
FC

90
Reconstructed
FC

91MobileNet

92MobileNet

93Inception

94Inception

96
Object detection
def demo_cascade():
if USE_CAMERA:
cap = cv2.VideoCapture(0)
else:
cap = []
frame = cv2.imread(filename_in)
while (True):
if USE_CAMERA:
ret, frame = cap.read()
gray_rgb = tools_image.desaturate(frame)
faces = faceCascade.detectMultiScale(frame,scaleFactor=1.1,minNeighbors=5,minSize=(30, 30))
for (x, y, w, h) in faces:
cv2.rectangle(gray_rgb, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imshow('frame', gray_rgb)
key = cv2.waitKey(1)
if key & 0xFF == 27:
break
if (key & 0xFF == 13) or (key & 0xFF == 32):
cv2.imwrite(filename_out, gray_rgb)
if USE_CAMERA:
cap.release()
cv2.destroyAllWindows()
if not USE_CAMERA:
cv2.imwrite(filename_out, gray_rgb)
return

97
Object detection
haarcascade_eye.xml
haarcascade_eye_tree_eyeglasses.xml
haarcascade_frontalcatface.xml
haarcascade_frontalcatface_extended.xml
haarcascade_frontalface_alt.xml
haarcascade_frontalface_alt2.xml
haarcascade_frontalface_alt_tree.xml
haarcascade_frontalface_default.xml
haarcascade_fullbody.xml
haarcascade_lefteye_2splits.xml
haarcascade_licence_plate_rus_16stages.xml
haarcascade_lowerbody.xml
haarcascade_profileface.xml
haarcascade_righteye_2splits.xml
haarcascade_russian_plate_number.xml
haarcascade_smile.xml
haarcascade_upperbody.xml

98
Object detection: Haar cascades
1. Create samples
2. Verify samples
3. Train the cascade

99
Object detection: create samples from single positive
opencv_createsamples -vec pos.vec
-img pos_singleobject.jpg
-bg negneg.txt
-num 100
-bgcolor 0
-bgthresh 0
-w 28 -h 28
-maxxangle 1.1
-maxyangle 1.1
-maxzangle 0.5
-maxidev 40
result vec file
input positive sample file
list of negative samples

100
-info pos_multippos.txt
-num 4
-w 28 -h 28
result file
list of pos samples
Object detection: create samples from many positive

101
Object detection: verify vec file
-show

102
Object detection: training
opencv_traincascade -data output_folder
-vec ../positive/positives.vex
-bg ../negative/bg_train.txt
-numPos 800
-numNeg 400
-numStages 10
-w 20
-h 20
existing folder
made by opencv_createsamples
list of negative samples

103
Train the Cascades
import cv2
import os
import tools_image
# ---------------------------------------------------------------------------------------------------------------------
path = 'data/ex12_single/'
filename_in = path + 'pos_single/object_to_detect.jpg'
filename_out = 'data/output/detect_result.jpg'
object_detector= cv2.CascadeClassifier(path + 'output_single/cascade.xml')
# ---------------------------------------------------------------------------------------------------------------------
def train_cascade():
os.chdir(path)
os.system('1-create_vec_from_single.bat')#os.system('2-verify_vec_single.bat')
os.system('3-train_from_single.bat')
return
# ---------------------------------------------------------------------------------------------------------------------
if __name__ == '__main__':
#train_cascade()
image = tools_image.desaturate(cv2.imread(filename_in))
objects, rejectLevels, levelWeights =
object_detector.detectMultiScale3(image, scaleFactor=1.05, minSize=(20, 20), outputRejectLevels=True)
for (x, y, w, h) in objects:cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite(filename_out,image)

104
Object detection: example of training

Detection of
patterns and
anomalies

106
Detection of patterns and anomalies
Input data:
o Still image of multiple similar objects sampled across the grid-like structure.
Output:
o Pattern – an image that best fits (represents) the sampled instances of the
similar objects
o Markup (labeling) of an image to indicate the positions of the patterns and
outliers as well as their confidence level
Limitations
o Samples on the input image have limited pose variance so their view angle is
similar (e.g. front view only, no side or back view)

107
Synthetic image: a template is sampled thru the grid

108
Input image: added some variance : Euclidian translation + gamma correction

109
Result hitmap: confidence level to match the pattern
pattern
(auto detected)

110
Result mark-up: positions of the patterns

111
Real image

112
Real image

113
Real image

114
Real image

115
Result hitmap: confidence level to match the pattern
pattern
(auto detected)

116
Result mark-up: positions of the patterns + outlier area

121
Stereo vision
def get_disparity_v_01(imgL, imgR, disp_v1, disp_v2, disp_h1, disp_h2):
window_size = 7
left_matcher = cv2.StereoSGBM_create(
minDisparity=disp_h1,
numDisparities=int(0 + (disp_h2 - disp_h1) / 16) * 16,
blockSize=5,
P1=8 * 3 * window_size ** 2,
P2=32 * 3 * window_size ** 2,
disp12MaxDiff=disp_h2,
uniquenessRatio=15,
speckleWindowSize=0,
speckleRange=2,
preFilterCap=63,
mode=cv2.STEREO_SGBM_MODE_SGBM_3WAY
)
right_matcher = cv2.ximgproc.createRightMatcher(left_matcher)
dispr = right_matcher.compute(imgR, imgL)
displ = left_matcher.compute(imgL, imgR)
wls_filter = cv2.ximgproc.createDisparityWLSFilter(matcher_left=left_matcher)
wls_filter.setLambda(80000)
wls_filter.setSigmaColor(1.2)
filteredImg_L = wls_filter.filter(displ, imgL, None, dispr)
wls_filter = cv2.ximgproc.createDisparityWLSFilter(matcher_left=right_matcher)
filteredImg_R = wls_filter.filter(dispr, imgR, None, displ)

123
Stereo vision
def get_disparity_v_02(imgL, imgR, disp_v1, disp_v2, disp_h1, disp_h2):
max = numpy.maximum(math.fabs(disp_h1), math.fabs(disp_h2))
levels = int(1 + (max) / 16) * 16
stereo = cv2.StereoBM_create(numDisparities=levels, blockSize=15)
displ = stereo.compute(imgL, imgR)
dispr = numpy.flip(stereo.compute(numpy.flip(imgR, axis=1), numpy.flip(imgL, axis=1)), axis=1)
return -displ / 16, -dispr / 16

125
Stereo vision
def get_best_matches(image1,image2,disp_v1, disp_v2, disp_h1, disp_h2, window_size=15,step=10):
N = int(image1.shape[0] * image1.shape[1] / step / step)
rand_points = numpy.random.rand(N,2)
rand_points[:, 0] = window_size + (rand_points[:, 0]*(image1.shape[0]-2*window_size))
rand_points[:, 1] = window_size + (rand_points[:, 1]*(image1.shape[1]-2*window_size))
coord1,coord2,quality = [],[],[]
for each in rand_points:
row,col = int(each[0]),int(each[1])
template = tools_image.crop_image(image1,row-window_size,col-window_size,row+window_size,col+window_size)
top,left,bottom,right = row - window_size + disp_v1, col - window_size + disp_h1,row + window_size + disp_v2,col + window_size + disp_h2
field = tools_image.crop_image(image2,top,left,bottom,right)
q = cv2.matchTemplate(field, template, method=cv2.TM_CCOEFF_NORMED)
q = q[1:, 1:]
q = (q + 1) * 128
idx = numpy.argmax(q.flatten())
q_best = numpy.max(q.flatten())
qq=q[int(idx/q.shape[1]), idx % q.shape[1]]
if q_best>0:
dr = int(idx/q.shape[1])+disp_v1
dc = idx % q.shape[1]+disp_h1
if col + dc>=0 and col + dc<image1.shape[1] and row + dr>=0 and row + dr<image1.shape[0]:
coord1.append([col ,row ])
coord2.append([col+ dc, row+dr])
quality.append(q_best)
return numpy.array(coord1), numpy.array(coord2),numpy.array(quality)

Labeling M Schlesinger
D Schlesinger
Flach Hlavác

127
T
T
set of objects
object
Kkt  label for object t
KTk :
set of labels
labeling
T
K all possible labelings
  Ttt   neighbor of t
 tttt kkg , weight of an edge
K
t
Labeling
Definitions

128
 
 tttt
tt
TtKk
kkgk T 


 ,&*

Find the valid labeling
   1,0,  tttt kkg
Labeling
The (OR, AND) problem

129
Find a labeling with maximal sum
 
 






tt
Tt
tttt
Kk
kkgk
T

,maxarg*
  Rkkg tttt  ,
Labeling
The (MAX, +) problem

130
Labeling
The (MAX, +) problem: the best path on graph

131
 
 






tt
Tt
tttt
Kk
kkgk
T

,maxarg*
Labeling
The (MAX, +) problem: the best path on graph

132
 
 






tt
Tt
tttt
Kk
kkgk T

,min*
  городамимеждурастояниеkkg tttt  ,
городаT 
дорогиK 
Labeling
The (MIN, +) problem: travelling salesman problem

133
Labeling
The (OR, AND) problem: example
 
 tttt
tt
TtKk
kkgk T 


 ,&*


134
 
 tttt
tt
TtKk
kkgk T 


 ,&*

Labels in all the neighbor objects should be
connected by an edge
Labeling
The (OR, AND) problem: example

135
 
 tttt
tt
TtKk
kkgk T 


 ,maxmin*

Labeling
The (MIN, MAX) problem: clustering
The distance between most different objects of
the same cluster should be minimal
T = objects
K = clusters
𝑔 𝑡𝑡′(𝑘 𝑡, 𝑘 𝑡′) = distance between the objects

136
 
 tttt
tt
TtKk
kkgk T 


 ,*

 
 
 
 
 
 
 
 
 maxmin,
minmax,
min,
max,
min,
max,
andor,
,
,






Count of valid labelings
Valid labeling
Travelling salesman problem
Travelling salesman problem
Optimization on Gibbs field
Optimization on Gibbs field
Hamiltonian path
Clustering
Labeling
The general labeling problem

137
Labeling
Some applications: texture segmentation

138
Labeling
Some applications: stereo vision

139
Labeling
Some applications: stereo vision

140
Labeling
(MAX, +) labeling problem on 1D chain structure

141
Labeling
(MAX, +) labeling problem on 1D chain structure

142
Labeling
(MAX, +) labeling problem on 1D chain structure: N best paths

143
       1 1 2 2 1 2 2 1, & , , & ,g k k g k k g k k g k k   
1 2 1 2> , >k k k k 
Labeling
(OR, AND) labeling problem with supermodular weights

144
Labeling
Supermodular weights

145
Labeling
NOT supermodular weights

146
Labeling
(OR, AND) labeling problem with supermodular weights: example

147
Labeling

148
Labeling

149
Labeling

150
Labeling

151
Labeling
(OR, AND) labeling problem with supermodular weights

152
Labeling

153
Labeling

154
1 2 1 2> , >k k k k 
       1 2 2 1 1 1 2 2, , , , , ,min g k k g k k min g k k g k k         
Labeling
(MAX, MIN) labeling problem with supermodular weights

155
Labeling

156
Labeling

157
Labeling

158
Labeling

159
Labeling

160
Labeling

161
Labeling

162
Labeling

163
Labeling

164
Labeling

165
 
 tttt
tt
TtKk
kkgk T 


 ,maxmin*

  кластераодногообъектамимеждуотличиеkkg tttt  ,
объектыT 
кластерыK 
Labeling
(MIN, MAX) labeling problem: clustering
The distance between most different objects of
the same cluster should be minimal

166
Labeling

167
Labeling
Remove all edges having weight > 0
Solve (OR,AND) problem

168
Labeling
Remove all edges having weight ≥ 1

169
Labeling
Remove all edges having weight ≥ 1

170
Labeling
distance between most different objects of the same cluster = 1

Computer vision

More Related Content

Computer vision