Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
SlideShare a Scribd company logo

1

Basics of
Computer Vision
Dmitry Ryabokon, github.com/dryabokon

2

Content
1. CNNs out of box
2. Basic operations
3. Feature extraction
4. Transfer learning
5. Fine Tuning
6. Benchmarking the CNNs
7. Object detection
8. Detection of patterns and anomalies
9. Stereo Vision
10. Reconstruct features back into images
11. Style Transfer
12. GAN
13. Labeling
14. Color detection
15. PyTorch

3

CNNs out of
box

4

4
The popular networks
Classification
o LeNet Model
o AlexNet Model
o VGG Model
o ResNet Paper
o YOLO9000 Paper
o DenseNet Paper
Segmentation
o FCN8 Paper
o SegNet Paper
o U-Net Paper
o E-Net Paper
o ResNetFCN Paper
o PSPNet Paper
o Mask RCNN Paper
Detection
o Faster RCNN Paper
o SSD Paper
o YOLOv2 Paper
o R-FCN Paper
CNNs out of box

5

5
Some datasets available for research
MNIST: 10 classes, ~7000 ex. per class ImageNet: 1000 classes, ~100 ex per class
CNNs out of box

6

6
Some datasets available for research
The Street View House Numbers
10 classes, ~2000 ex. per class
CIFAR: 10 classes, 6000 ex. per class
100 classes, 600 ex per class
CNNs out of box

7

7
Some datasets available for research
Olivetti database: 40 classes .. and much more @ kaggle
CNNs out of box

8

8
26%
16.40%
11.70%
7.30%
6.70%
3.60% 3.10%
0%
5%
10%
15%
20%
25%
30%
2010 2011 2012 2013 2014 2015 2016 2017
AlexNet VGG
GoogLeNet
ResNet
GoogLeNet-v4
XRCE
ImageNet Classification error
ZF
CNNs out of box

9

9
real-world image text extraction.
compressing and decompressing images
automatic speech recognition
semantic image segmentation
image matching and retrieval
image-to-text
computer vision
discovery of latent 3D keypoints
unsupervised learning
localizing and identifying multiple objects in a single image
3D object reconstruction
image classification
identify the name of a street (in France) from an image
predicting future video frames
https://github.com/keras-team/keras/tree/master/examples
https://github.com/tensorflow/models/tree/master/research
Usage of the neural networks
CNNs out of box

10

10
CNNs out of box

11

11
CNNs out of box
import tensorflow_hub as hub
import urllib.request
import cv2
import numpy
import json
import tensorflow as tf
# ----------------------------------------------------------------------------------------------------------------------
URL = 'https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json'
data = json.loads(urllib.request.urlopen(URL).read().decode())
class_names = [data['%d'%i][1] for i in range(0,999)]
# ----------------------------------------------------------------------------------------------------------------------
def example_predict():
module = hub.Module("https://tfhub.dev/google/imagenet/resnet_v2_50/classification/1")
img = cv2.imread('data/ex-natural/dog/dog_0000.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (224, 224)).astype(numpy.float32)
img = numpy.array([img]).astype(numpy.float32) / 255.0
sess = tf.Session()
sess.run(tf.global_variables_initializer())
sess.run(tf.tables_initializer())
outputs = module(dict(images=img), signature="image_classification", as_dict=True)
prob = outputs["default"].eval(session=sess)[0]
idx = numpy.argsort(-prob)[0]
print(class_names[idx], prob[idx])
sess.close()
return
# ----------------------------------------------------------------------------------------------------------------------
if __name__ == '__main__':
example_predict()

12

12
CNNs out of box

13

13
CNNs out of box
from keras.applications import MobileNet
from keras.applications.xception import Xception
from keras.applications.mobilenet import preprocess_input
import urllib.request
import cv2
import numpy
import json
from keras import backend as K
K.set_image_dim_ordering('tf')
# ----------------------------------------------------------------------------------------------------------------------
URL = 'https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json'
data = json.loads(urllib.request.urlopen(URL).read().decode())
class_names = [data['%d'%i][1] for i in range(0,999)]
# ----------------------------------------------------------------------------------------------------------------------
def example_predict():
CNN = MobileNet()
#CNN = Xception()
img = cv2.imread('data/ex-natural/dog/dog_0000.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (224, 224)).astype(numpy.float32)
prob = CNN.predict(preprocess_input(numpy.array([img])))
idx = numpy.argsort(-prob[0])[0]
print(class_names[idx], prob[0, idx])
return
# ----------------------------------------------------------------------------------------------------------------------
if __name__ == '__main__':
example_predict()

14

Basic
operations

15

15
Basic operations
Convolution

16

16
Basic operations
Convolution
=

17

17
Basic operations
Convolution
=

18

18
Basic operations
Convolution
=

19

19
Basic operations
Convolution
=

20

20
Basic operations
Convolution
=

21

21
Basic operations
Convolution
=

22

22
Basic operations
Convolution
=

23

23
Basic operations
Convolution
=

24

24
Basic operations
Convolution
=

25

25
Basic operations
Convolution
=

26

26
Basic operations
Convolution
=

27

27
Basic operations
Convolution
=

28

28
Basic operations
Convolution
=

29

29
Basic operations
Convolution
=

30

30
Basic operations
Convolution
=

31

31
Basic operations
Convolution
=

32

32
Basic operations
Convolution

33

33
Basic operations
Convolution

34

34
Basic operations
De-Convolution

35

35
Basic operations
Pool: avg

36

36
Basic operations
Pool: max

37

37
Basic operations
Flatten

38

38
Basic operations
Fully Connected

39

39
Basic operations
Dropout

40

40
Basic operations
Normalization
layer batch instance group

41

Feature
extraction

42

42
Feature extraction

43

43
Feature extraction

44

44
Extract the features

45

45
Feature extraction

46

46
Feature extraction

47

47
Feature extraction

48

48
airplane car cat dog
flower fruit motorbike person
Feature extraction

49

49
airplane car cat dog
flower fruit motorbike person
Feature extraction

50

50
AlexNetFeature extraction

51

51
AlexNetFeature extraction

52

52
How CNN constructs the features
AlexNet

53

53
AlexNetFeature extraction

54

54
AlexNetFeature extraction

55

55
AlexNetFeature extraction

56

56
AlexNetFeature extraction

57

57
AlexNetFeature extraction

58

58
AlexNetFeature extraction

59

59
AlexNetFeature extraction

60

60
MobileNetFeature extraction

61

61
MobileNetFeature extraction

62

62
XceptionFeature extraction

63

63
MobileNetFeature extraction

64

Transfer
learning

65

65
Transfer learning

66

Fine tuning

67

67
Finetuning

68

68
Finetuning

69

69
Finetuning

70

70
Finetuning

71

71
Finetuning

72

72
Finetuning

73

73
Random before learning After learning
Finetuning

74

74
Filters after learning
Finetuning

75

75
Finetuning

76

76
Finetuning

77

Benchmarking the
CNNs

78

78
2ConvlayersBenchmarking the CNNs

79

79
AlexNetBenchmarking the CNNs

80

80
InceptionBenchmarking the CNNs

81

81
MobileNetBenchmarking the CNNs

82

Reconstruct
features back into
images

83

83AlexNet
Reconstruct features back into images

84

84AlexNet
Reconstruct features back into images

85

85AlexNet
Reconstruct features back into images

86

86
Features
AlexNet
airplane car cat dog
flower fruit motorbike person
Reconstruct features back into images

87

87
Reconstructed
AlexNet
airplane car cat dog
flower fruit motorbike person
Reconstruct features back into images

88

88
Original
airplane car cat dog
flower fruit motorbike person
Reconstruct features back into images

89

89
Features
FC
airplane car cat dog
flower fruit motorbike person
Reconstruct features back into images

90

90
Reconstructed
FC
airplane car cat dog
flower fruit motorbike person
Reconstruct features back into images

91

91MobileNet
Reconstruct features back into images

92

92MobileNet
Reconstruct features back into images

93

93Inception
Reconstruct features back into images

94

94Inception
Reconstruct features back into images

95

Object
detection

96

96
Object detection
def demo_cascade():
if USE_CAMERA:
cap = cv2.VideoCapture(0)
else:
cap = []
frame = cv2.imread(filename_in)
while (True):
if USE_CAMERA:
ret, frame = cap.read()
gray_rgb = tools_image.desaturate(frame)
faces = faceCascade.detectMultiScale(frame,scaleFactor=1.1,minNeighbors=5,minSize=(30, 30))
for (x, y, w, h) in faces:
cv2.rectangle(gray_rgb, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imshow('frame', gray_rgb)
key = cv2.waitKey(1)
if key & 0xFF == 27:
break
if (key & 0xFF == 13) or (key & 0xFF == 32):
cv2.imwrite(filename_out, gray_rgb)
if USE_CAMERA:
cap.release()
cv2.destroyAllWindows()
if not USE_CAMERA:
cv2.imwrite(filename_out, gray_rgb)
return

97

97
Object detection
haarcascade_eye.xml
haarcascade_eye_tree_eyeglasses.xml
haarcascade_frontalcatface.xml
haarcascade_frontalcatface_extended.xml
haarcascade_frontalface_alt.xml
haarcascade_frontalface_alt2.xml
haarcascade_frontalface_alt_tree.xml
haarcascade_frontalface_default.xml
haarcascade_fullbody.xml
haarcascade_lefteye_2splits.xml
haarcascade_licence_plate_rus_16stages.xml
haarcascade_lowerbody.xml
haarcascade_profileface.xml
haarcascade_righteye_2splits.xml
haarcascade_russian_plate_number.xml
haarcascade_smile.xml
haarcascade_upperbody.xml

98

98
Object detection: Haar cascades
1. Create samples
2. Verify samples
3. Train the cascade

99

99
Object detection: create samples from single positive
opencv_createsamples -vec pos.vec
-img pos_singleobject.jpg
-bg negneg.txt
-num 100
-bgcolor 0
-bgthresh 0
-w 28 -h 28
-maxxangle 1.1
-maxyangle 1.1
-maxzangle 0.5
-maxidev 40
result vec file
input positive sample file
list of negative samples

100

100
opencv_createsamples -vec pos.vec
-info pos_multippos.txt
-num 4
-w 28 -h 28
result file
list of pos samples
Object detection: create samples from many positive

101

101
Object detection: verify vec file
opencv_createsamples -vec pos.vec
-show

102

102
Object detection: training
opencv_traincascade -data output_folder
-vec ../positive/positives.vex
-bg ../negative/bg_train.txt
-numPos 800
-numNeg 400
-numStages 10
-w 20
-h 20
existing folder
made by opencv_createsamples
list of negative samples

103

103
Train the Cascades
import cv2
import os
import tools_image
# ---------------------------------------------------------------------------------------------------------------------
path = 'data/ex12_single/'
filename_in = path + 'pos_single/object_to_detect.jpg'
filename_out = 'data/output/detect_result.jpg'
object_detector= cv2.CascadeClassifier(path + 'output_single/cascade.xml')
# ---------------------------------------------------------------------------------------------------------------------
def train_cascade():
os.chdir(path)
os.system('1-create_vec_from_single.bat')#os.system('2-verify_vec_single.bat')
os.system('3-train_from_single.bat')
return
# ---------------------------------------------------------------------------------------------------------------------
if __name__ == '__main__':
#train_cascade()
image = tools_image.desaturate(cv2.imread(filename_in))
objects, rejectLevels, levelWeights = 
object_detector.detectMultiScale3(image, scaleFactor=1.05, minSize=(20, 20), outputRejectLevels=True)
for (x, y, w, h) in objects:cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite(filename_out,image)

104

104
Object detection: example of training

105

Detection of
patterns and
anomalies

106

106
Detection of patterns and anomalies
Input data:
o Still image of multiple similar objects sampled across the grid-like structure.
Output:
o Pattern – an image that best fits (represents) the sampled instances of the
similar objects
o Markup (labeling) of an image to indicate the positions of the patterns and
outliers as well as their confidence level
Limitations
o Samples on the input image have limited pose variance so their view angle is
similar (e.g. front view only, no side or back view)

107

107
Detection of patterns and anomalies
Synthetic image: a template is sampled thru the grid

108

108
Detection of patterns and anomalies
Input image: added some variance : Euclidian translation + gamma correction

109

109
Detection of patterns and anomalies
Result hitmap: confidence level to match the pattern
pattern
(auto detected)

110

110
Detection of patterns and anomalies
Result mark-up: positions of the patterns

111

111
Detection of patterns and anomalies
Real image

112

112
Detection of patterns and anomalies
Real image

113

113
Detection of patterns and anomalies
Real image

114

114
Detection of patterns and anomalies
Real image

115

115
Detection of patterns and anomalies
Result hitmap: confidence level to match the pattern
pattern
(auto detected)

116

116
Detection of patterns and anomalies
Result mark-up: positions of the patterns + outlier area

117

Stereo Vision

118

118
Stereo vision

119

119
Stereo vision

120

120
Stereo vision

121

121
Stereo vision
def get_disparity_v_01(imgL, imgR, disp_v1, disp_v2, disp_h1, disp_h2):
window_size = 7
left_matcher = cv2.StereoSGBM_create(
minDisparity=disp_h1,
numDisparities=int(0 + (disp_h2 - disp_h1) / 16) * 16,
blockSize=5,
P1=8 * 3 * window_size ** 2,
P2=32 * 3 * window_size ** 2,
disp12MaxDiff=disp_h2,
uniquenessRatio=15,
speckleWindowSize=0,
speckleRange=2,
preFilterCap=63,
mode=cv2.STEREO_SGBM_MODE_SGBM_3WAY
)
right_matcher = cv2.ximgproc.createRightMatcher(left_matcher)
dispr = right_matcher.compute(imgR, imgL)
displ = left_matcher.compute(imgL, imgR)
wls_filter = cv2.ximgproc.createDisparityWLSFilter(matcher_left=left_matcher)
wls_filter.setLambda(80000)
wls_filter.setSigmaColor(1.2)
filteredImg_L = wls_filter.filter(displ, imgL, None, dispr)
wls_filter = cv2.ximgproc.createDisparityWLSFilter(matcher_left=right_matcher)
filteredImg_R = wls_filter.filter(dispr, imgR, None, displ)

122

122
Stereo vision

123

123
Stereo vision
def get_disparity_v_02(imgL, imgR, disp_v1, disp_v2, disp_h1, disp_h2):
max = numpy.maximum(math.fabs(disp_h1), math.fabs(disp_h2))
levels = int(1 + (max) / 16) * 16
stereo = cv2.StereoBM_create(numDisparities=levels, blockSize=15)
displ = stereo.compute(imgL, imgR)
dispr = numpy.flip(stereo.compute(numpy.flip(imgR, axis=1), numpy.flip(imgL, axis=1)), axis=1)
return -displ / 16, -dispr / 16

124

124
Stereo vision

125

125
Stereo vision
def get_best_matches(image1,image2,disp_v1, disp_v2, disp_h1, disp_h2, window_size=15,step=10):
N = int(image1.shape[0] * image1.shape[1] / step / step)
rand_points = numpy.random.rand(N,2)
rand_points[:, 0] = window_size + (rand_points[:, 0]*(image1.shape[0]-2*window_size))
rand_points[:, 1] = window_size + (rand_points[:, 1]*(image1.shape[1]-2*window_size))
coord1,coord2,quality = [],[],[]
for each in rand_points:
row,col = int(each[0]),int(each[1])
template = tools_image.crop_image(image1,row-window_size,col-window_size,row+window_size,col+window_size)
top,left,bottom,right = row - window_size + disp_v1, col - window_size + disp_h1,row + window_size + disp_v2,col + window_size + disp_h2
field = tools_image.crop_image(image2,top,left,bottom,right)
q = cv2.matchTemplate(field, template, method=cv2.TM_CCOEFF_NORMED)
q = q[1:, 1:]
q = (q + 1) * 128
idx = numpy.argmax(q.flatten())
q_best = numpy.max(q.flatten())
qq=q[int(idx/q.shape[1]), idx % q.shape[1]]
if q_best>0:
dr = int(idx/q.shape[1])+disp_v1
dc = idx % q.shape[1]+disp_h1
if col + dc>=0 and col + dc<image1.shape[1] and row + dr>=0 and row + dr<image1.shape[0]:
coord1.append([col ,row ])
coord2.append([col+ dc, row+dr])
quality.append(q_best)
return numpy.array(coord1), numpy.array(coord2),numpy.array(quality)

126

Labeling M Schlesinger
D Schlesinger
Flach Hlavác

127

127
T
T
set of objects
object
Kkt  label for object t
KTk :
set of labels
labeling
T
K all possible labelings
  Ttt   neighbor of t
 tttt kkg , weight of an edge
K
t
Labeling
Definitions

128

128
 
 tttt
tt
TtKk
kkgk T 


 ,&*

Find the valid labeling
   1,0,  tttt kkg
Labeling
The (OR, AND) problem

129

129
Find a labeling with maximal sum
 
 






tt
Tt
tttt
Kk
kkgk
T

,maxarg*
  Rkkg tttt  ,
Labeling
The (MAX, +) problem

130

130
Labeling
The (MAX, +) problem: the best path on graph

131

131
 
 






tt
Tt
tttt
Kk
kkgk
T

,maxarg*
Labeling
The (MAX, +) problem: the best path on graph

132

132
 
 






tt
Tt
tttt
Kk
kkgk T

,min*
  городамимеждурастояниеkkg tttt  ,
городаT 
дорогиK 
Labeling
The (MIN, +) problem: travelling salesman problem

133

133
Labeling
The (OR, AND) problem: example
 
 tttt
tt
TtKk
kkgk T 


 ,&*


134

134
 
 tttt
tt
TtKk
kkgk T 


 ,&*

Labels in all the neighbor objects should be
connected by an edge
Labeling
The (OR, AND) problem: example

135

135
 
 tttt
tt
TtKk
kkgk T 


 ,maxmin*

Labeling
The (MIN, MAX) problem: clustering
The distance between most different objects of
the same cluster should be minimal
T = objects
K = clusters
𝑔 𝑡𝑡′(𝑘 𝑡, 𝑘 𝑡′) = distance between the objects

136

136
 
 tttt
tt
TtKk
kkgk T 


 ,*

 
 
 
 
 
 
 
 
 maxmin,
minmax,
min,
max,
min,
max,
andor,
,
,






Count of valid labelings
Valid labeling
Travelling salesman problem
Travelling salesman problem
Optimization on Gibbs field
Optimization on Gibbs field
Hamiltonian path
Clustering
Labeling
The general labeling problem

137

137
Labeling
Some applications: texture segmentation

138

138
Labeling
Some applications: stereo vision

139

139
Labeling
Some applications: stereo vision

140

140
Labeling
(MAX, +) labeling problem on 1D chain structure

141

141
Labeling
(MAX, +) labeling problem on 1D chain structure

142

142
Labeling
(MAX, +) labeling problem on 1D chain structure: N best paths

143

143
       1 1 2 2 1 2 2 1, & , , & ,g k k g k k g k k g k k   
1 2 1 2> , >k k k k 
Labeling
(OR, AND) labeling problem with supermodular weights

144

144
Labeling
Supermodular weights

145

145
Labeling
NOT supermodular weights

146

146
Labeling
(OR, AND) labeling problem with supermodular weights: example

147

147
Labeling
(OR, AND) labeling problem with supermodular weights: example

148

148
Labeling
(OR, AND) labeling problem with supermodular weights: example

149

149
Labeling
(OR, AND) labeling problem with supermodular weights: example

150

150
Labeling
(OR, AND) labeling problem with supermodular weights: example

151

151
Labeling
(OR, AND) labeling problem with supermodular weights

152

152
Labeling
(OR, AND) labeling problem with supermodular weights: example

153

153
Labeling
(OR, AND) labeling problem with supermodular weights: example

154

154
1 2 1 2> , >k k k k 
       1 2 2 1 1 1 2 2, , , , , ,min g k k g k k min g k k g k k         
Labeling
(MAX, MIN) labeling problem with supermodular weights

155

155
Labeling
(MAX, MIN) labeling problem with supermodular weights

156

156
Labeling
(MAX, MIN) labeling problem with supermodular weights

157

157
Labeling
(MAX, MIN) labeling problem with supermodular weights

158

158
Labeling
(MAX, MIN) labeling problem with supermodular weights

159

159
Labeling
(MAX, MIN) labeling problem with supermodular weights

160

160
Labeling
(MAX, MIN) labeling problem with supermodular weights

161

161
Labeling
(MAX, MIN) labeling problem with supermodular weights

162

162
Labeling
(MAX, MIN) labeling problem with supermodular weights

163

163
Labeling
(MAX, MIN) labeling problem with supermodular weights

164

164
Labeling
(MAX, MIN) labeling problem with supermodular weights

165

165
 
 tttt
tt
TtKk
kkgk T 


 ,maxmin*

  кластераодногообъектамимеждуотличиеkkg tttt  ,
объектыT 
кластерыK 
Labeling
(MIN, MAX) labeling problem: clustering
The distance between most different objects of
the same cluster should be minimal

166

166
Labeling
(MIN, MAX) labeling problem: clustering

167

167
Labeling
(MIN, MAX) labeling problem: clustering
Remove all edges having weight > 0
Solve (OR,AND) problem

168

168
Labeling
(MIN, MAX) labeling problem: clustering
Remove all edges having weight ≥ 1
Solve (OR,AND) problem

169

169
Labeling
Remove all edges having weight ≥ 1
Solve (OR,AND) problem
(MIN, MAX) labeling problem: clustering

170

170
Labeling
(MIN, MAX) labeling problem: clustering
distance between most different objects of the same cluster = 1

More Related Content

Computer vision