
Lecture 13


Stereo

16-385 Computer Vision


http://www.cs.cmu.edu/~16385/ Spring 2020, Lecture 13
Course announcements

• Homework 3 is due on March 4th.


- How many of you have looked at/started/finished homework 3?

• Take-home quiz 5 is due on March 1st.


Overview of today’s lecture

• Leftover from two-view geometry.

• Revisiting triangulation.

• Disparity.

• Stereo rectification.

• Stereo matching.

• Improving stereo matching.

• Structured light.
Slide credits

Some of these slides were adapted directly from:

• Kris Kitani (16-385, Spring 2017).


• Srinivasa Narasimhan (16-823, Spring 2017).
Revisiting triangulation
How would you reconstruct 3D points?

Left image    Right image

1. Select point in one image (how?)
2. Form epipolar line for that point in second image (how?)
3. Find matching point along line (how?)
4. Perform triangulation (how?)
Triangulation

[Figure: a 3D point projecting into the left image and the right image; left camera with matrix P, right camera with matrix P'.]

How would you reconstruct 3D points?

1. Select point in one image (how?)
2. Form epipolar line for that point in second image (how?)
3. Find matching point along line (how?)
4. Perform triangulation (how?)

What are the disadvantages of this procedure?
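Step 4 is typically implemented as linear (DLT) triangulation. Below is a minimal sketch, assuming two 3x4 projection matrices P1 and P2 and a matched pixel pair; the function and variable names are illustrative, not from the slides.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views.

    P1, P2: 3x4 camera projection matrices.
    x1, x2: (u, v) coordinates of the same 3D point in each image.
    Returns the 3D point in inhomogeneous coordinates.
    """
    # Each measurement gives two linear constraints on the homogeneous
    # point X: u * (row 3 of P) - (row 1 of P), and v * (row 3) - (row 2).
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest
    # singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
```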
Stereo rectification
What's different between these two images?

Objects that are close move more or less?

The amount of horizontal movement is inversely proportional to …

… the distance from the camera.

More formally…

[Figure: a 3D point viewed by two parallel cameras; the camera centers are separated by the baseline b, and each camera has focal length f and its own image plane.]

How is X related to x? By similar triangles on the left camera:

$$x = f \frac{X}{z}$$

How is X related to x'? By similar triangles on the right camera, whose center is offset by the baseline b:

$$x' = f \frac{X - b}{z}$$

Disparity

(x and x' are measured with respect to each camera's image-plane origin)

Subtracting the two projections gives the disparity:

$$d = x - x' = \frac{bf}{z}$$

Disparity is inversely proportional to depth.
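As a quick sketch of how this relation is used in code, assuming a rectified pair with focal length f (in pixels) and baseline b (in scene units); names are illustrative:

```python
import numpy as np

def depth_from_disparity(disparity, f, b):
    """Convert a disparity map (pixels) to depth (same units as b).

    Uses z = f * b / d for a rectified stereo pair. Zero or negative
    disparities are marked invalid (infinite depth / no match).
    """
    depth = np.full_like(disparity, np.inf, dtype=np.float64)
    valid = disparity > 0
    depth[valid] = f * b / disparity[valid]
    return depth
```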
Real-time stereo sensing

Nomad robot searches for meteorites in Antarctica


http://www.frc.ri.cmu.edu/projects/meteorobot/index.html
Subaru EyeSight system

Pre-collision braking
What other vision system uses
disparity for depth sensing?
This is how 3D movies work
Is disparity the only depth cue
the human visual system uses?
So can I compute depth from any two images of the same object?

1. Need sufficient baseline

2. Images need to be ‘rectified’ first (make epipolar lines horizontal)


1. Rectify images
(make epipolar lines horizontal)
2. For each pixel
a. Find epipolar line
b. Scan line for best match
c. Compute depth from disparity
How can you make the epipolar lines horizontal?

[Figure: a 3D point viewed by two cameras whose image planes are parallel; camera centers side by side.]

What's special about these two cameras?

When are epipolar lines horizontal?
When this relationship holds:

$$R = I, \qquad \mathbf{t} = (T, 0, 0)^\top$$

Proof in take-home quiz 5.
It’s hard to make the image planes exactly parallel
How can you make the epipolar lines horizontal?
Use stereo rectification?
What is stereo rectification?

Reproject image planes onto a common plane parallel to the line between camera centers.

How can you do this?

Need two homographies (3x3 transforms), one for each input image reprojection.

C. Loop and Z. Zhang. Computing Rectifying Homographies for Stereo Vision. Computer Vision and Pattern Recognition, 1999.
Stereo Rectification
1. Rotate the right camera by R
   (aligns camera coordinate system orientation only)
2. Rotate (rectify) the left camera so that the epipole is at infinity
3. Rotate (rectify) the right camera so that the epipole is at infinity
4. Adjust the scale

Stereo Rectification:

1. Compute E to get R
2. Rotate right image by R
3. Rotate both images by Rrect
4. Scale both images by H

[Figures: the image pair after each step, in turn rotated by R, rotated by Rrect, and scaled by H.]
Step 1: Compute E to get R

SVD: let $E = U \Sigma V^\top$ and define

$$W = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

We get FOUR solutions: two possible rotations, $R_1 = U W V^\top$ and $R_2 = U W^\top V^\top$, and two possible translations, $\mathbf{t} = \pm\mathbf{u}_3$ (the third column of $U$).

Which one do we choose?

Compute the determinant of R; a valid solution must have det(R) = 1
(note: det(R) = -1 means a rotation plus a reflection).

Compute a 3D point using triangulation; the valid solution has a positive Z value
(note: negative Z means the point is behind the camera).
Let's visualize the four configurations…

[Camera icon: image plane, optical axis, camera center.]

Find the configuration where the point is in front of both cameras.
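Here is a hedged sketch of that disambiguation, assuming correspondences in normalized camera coordinates and reusing the triangulate_dlt helper sketched earlier; the function name and structure are illustrative.

```python
import numpy as np

W = np.array([[0., -1., 0.],
              [1.,  0., 0.],
              [0.,  0., 1.]])

def decompose_essential(E, x1, x2):
    """Pick the (R, t) pair that puts a triangulated point in front of
    both cameras (positive depth), out of the four candidates."""
    U, _, Vt = np.linalg.svd(E)
    # Flip a sign if needed so the candidates are proper rotations
    # (det = +1, not -1).
    if np.linalg.det(U @ W @ Vt) < 0:
        Vt = -Vt
    candidates = [(U @ W @ Vt,   U[:, 2]),
                  (U @ W @ Vt,  -U[:, 2]),
                  (U @ W.T @ Vt, U[:, 2]),
                  (U @ W.T @ Vt, -U[:, 2])]
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])  # reference camera
    for R, t in candidates:
        P2 = np.hstack([R, t.reshape(3, 1)])
        X = triangulate_dlt(P1, P2, x1, x2)
        # Depth in camera 1 is X[2]; depth in camera 2 is the z
        # component of R X + t.
        if X[2] > 0 and (R @ X + t)[2] > 0:
            return R, t
    raise ValueError("no configuration puts the point in front of both cameras")
```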
When do epipolar lines become horizontal?

Parallel cameras: where is the epipole?

The epipole is at infinity.
Setting the epipole to infinity
(Building Rrect from e)

Given: the epipole e (using SVD on E), which coincides with the translation vector T recovered from E. Build three orthonormal rows:

$$\mathbf{r}_1 = \mathbf{e} = \frac{\mathbf{T}}{\|\mathbf{T}\|} \quad \text{(epipole coincides with translation vector)}$$

$$\mathbf{r}_2 = \frac{1}{\sqrt{T_x^2 + T_y^2}} \begin{bmatrix} -T_y & T_x & 0 \end{bmatrix}^\top \quad \text{(cross product of } \mathbf{e} \text{ and the direction vector of the optical axis)}$$

$$\mathbf{r}_3 = \mathbf{r}_1 \times \mathbf{r}_2 \quad \text{(orthogonal vector)}$$

$$R_{rect} = \begin{bmatrix} \mathbf{r}_1^\top \\ \mathbf{r}_2^\top \\ \mathbf{r}_3^\top \end{bmatrix}$$

If $\mathbf{r}_2$ and $\mathbf{r}_3$ are orthogonal to $\mathbf{e}$, then

$$R_{rect}\,\mathbf{e} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$$

Where is this point located on the image plane?

At x-infinity (a point at infinity along the horizontal axis).
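As a code sketch, assuming the translation T recovered from E (function and variable names are illustrative):

```python
import numpy as np

def build_rrect(T):
    """Build the rectifying rotation whose first row is the epipole
    direction, so that R_rect maps the epipole to (1, 0, 0): x-infinity.

    Assumes T is not parallel to the optical axis (Tx, Ty not both 0).
    """
    r1 = T / np.linalg.norm(T)
    # Cross product of the epipole direction with the optical axis
    # (0, 0, 1), normalized: orthogonal to r1, lying in the image plane.
    r2 = np.array([-T[1], T[0], 0.0]) / np.hypot(T[0], T[1])
    r3 = np.cross(r1, r2)
    return np.stack([r1, r2, r3])
```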
Stereo Rectification Algorithm

1. Estimate E using the 8-point algorithm (SVD)
2. Estimate the epipole e (SVD of E)
3. Build Rrect from e
4. Decompose E into R and T
5. Set R1 = Rrect and R2 = R Rrect
6. Rotate each left camera point (warp image): [x' y' z'] = R1 [x y z]
7. Rectified points as p = f/z' [x' y' z']
8. Repeat 6 and 7 for right camera points using R2
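A minimal end-to-end sketch of steps 5 to 8, reusing build_rrect from above; it warps camera-frame ray directions rather than a full image, and the focal length f is an assumed calibration input.

```python
import numpy as np

def rectify_points(points, R_cam, T, f):
    """Apply steps 5-8: rotate each camera's rays by its rectifying
    rotation, then reproject onto the z = f plane.

    points: (N, 3) array of camera-frame directions [x, y, z].
    R_cam:  rotation of the right camera from E (identity for the left).
    T:      translation direction recovered from E.
    f:      focal length used for the reprojection.
    """
    R_rect = build_rrect(T)
    R = R_cam @ R_rect          # R1 = Rrect (left), R2 = R Rrect (right)
    rotated = points @ R.T      # [x' y' z'] = R [x y z] for every point
    # p = f / z' * [x' y' z']: perspective reprojection onto the new plane.
    return f * rotated / rotated[:, 2:3]
```

For the left camera one would pass R_cam = np.eye(3) (so R1 = Rrect); for the right camera, the R recovered from E (so R2 = R Rrect).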


What can we do after
rectification?
Stereo matching
Depth Estimation via Stereo Matching
1. Rectify images
   (make epipolar lines horizontal)
2. For each pixel
   a. Find epipolar line
   b. Scan line for best match (how would you do this?)
   c. Compute depth from disparity
Reminder from filtering

How do we detect an edge?
• We filter with something that looks like an edge.

[Figure: the original image filtered with the row kernel [1 0 -1] (horizontal edge filter) and with the column kernel [1 0 -1]ᵀ (vertical edge filter).]

We can think of linear filtering as a way to evaluate how similar an image is locally to some template.
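A tiny sketch of that idea with a standard scipy call; the grayscale image here is a placeholder input.

```python
import numpy as np
from scipy.ndimage import correlate

image = np.random.rand(128, 128)  # placeholder grayscale image

# Correlating with [1 0 -1] responds to intensity changes along the
# row direction; the transposed kernel responds along columns.
kernel = np.array([[1.0, 0.0, -1.0]])
edges_a = correlate(image, kernel)
edges_b = correlate(image, kernel.T)
```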
Find this template
How do we detect the template in the following image?

[Figure: the template used as filter, the input image, and the filter output.]

Solution 1: Filter the image using the template as filter kernel.

What will the output look like? What went wrong? The output increases for higher local intensities, whether or not the template is present.
Find this template
How do we detect the template in the following image?

[Figure: filter (template minus template mean), the input image, and the output; thresholding yields one true detection but also false detections.]

Solution 2: Filter the image using a zero-mean template.

What went wrong? The response is not robust to high-contrast areas.
Find this template
How do we detect the template in the following image?

[Figure: the input image and 1 - output; thresholding yields the true detection.]

Solution 3: Use sum of squared differences (SSD).

What could go wrong? SSD is not robust to local intensity changes.
Find this template
How do we detect the template in the following image?

Observations so far:

• subtracting the mean deals with brightness bias
• dividing by the standard deviation removes contrast bias

Can we combine the two effects?
Find this template
How do we detect the template in the following image?

[Figure: filter (template minus template mean) compared against each local patch minus the local patch mean; thresholding 1 - output yields the true detections, with no false positives.]

Solution 4: Normalized cross-correlation (NCC).
What is the best method?
It depends on whether you care about speed or invariance.

• Zero-mean: Fastest, very sensitive to local intensity.

• Sum of squared differences: Medium speed, sensitive to intensity offsets.

• Normalized cross-correlation: Slowest, invariant to contrast and brightness.
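To make the trade-off concrete, here is a hedged sketch of all three scores for one template against one image, using only numpy; the function name is illustrative and the loops are written for clarity, not speed.

```python
import numpy as np

def match_scores(image, template):
    """Zero-mean correlation, SSD, and NCC score maps for a template
    slid over an image (valid positions only)."""
    image = image.astype(np.float64)
    template = template.astype(np.float64)
    th, tw = template.shape
    t0 = template - template.mean()
    out_h, out_w = image.shape[0] - th + 1, image.shape[1] - tw + 1
    zmc = np.zeros((out_h, out_w))
    ssd = np.zeros((out_h, out_w))
    ncc = np.zeros((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            patch = image[y:y + th, x:x + tw]
            p0 = patch - patch.mean()
            zmc[y, x] = (t0 * patch).sum()               # zero-mean filter
            ssd[y, x] = ((template - patch) ** 2).sum()  # SSD: low is good
            denom = np.sqrt((t0 ** 2).sum() * (p0 ** 2).sum())
            ncc[y, x] = (t0 * p0).sum() / denom if denom > 0 else 0.0
    return zmc, ssd, ncc
```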


Stereo Block Matching

[Figure: left and right images with a window sliding along a scanline; plot of matching cost versus disparity.]

• Slide a window along the epipolar line and compare contents of that window with the reference window in the left image.
• Matching cost: SSD or normalized correlation.
Similarity measures and their formulas (sums taken over the window, at disparity d):

• Sum of Absolute Differences (SAD): $\sum_{(i,j)} |I_1(i,j) - I_2(i+d,j)|$
• Sum of Squared Differences (SSD): $\sum_{(i,j)} (I_1(i,j) - I_2(i+d,j))^2$
• Zero-mean SAD: $\sum_{(i,j)} |(I_1(i,j) - \bar{I}_1) - (I_2(i+d,j) - \bar{I}_2)|$
• Locally scaled SAD: $\sum_{(i,j)} |I_1(i,j) - \tfrac{\bar{I}_1}{\bar{I}_2} I_2(i+d,j)|$
• Normalized Cross-Correlation (NCC): $\dfrac{\sum_{(i,j)} I_1(i,j)\, I_2(i+d,j)}{\sqrt{\sum_{(i,j)} I_1(i,j)^2 \sum_{(i,j)} I_2(i+d,j)^2}}$

[Figure: disparity maps computed with SAD, SSD, and NCC, next to the ground truth.]
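A hedged block-matching sketch using the SAD cost, assuming rectified grayscale inputs; the window size and disparity range are tunable parameters, and all names are illustrative.

```python
import numpy as np

def block_matching(left, right, max_disp=64, win=5):
    """Brute-force stereo block matching with a SAD matching cost.

    For each pixel in the left image, slide a (2*win+1)^2 window along
    the same scanline of the right image and keep the disparity with
    the lowest cost. Assumes the pair is already rectified.
    """
    left = left.astype(np.float32)
    right = right.astype(np.float32)
    H, W = left.shape
    disp = np.zeros((H, W), dtype=np.float32)
    for y in range(win, H - win):
        for x in range(win, W - win):
            ref = left[y - win:y + win + 1, x - win:x + win + 1]
            best_cost, best_d = np.inf, 0
            # Candidate windows sit at x - d on the same scanline.
            for d in range(0, min(max_disp, x - win) + 1):
                cand = right[y - win:y + win + 1,
                             x - d - win:x - d + win + 1]
                cost = np.abs(ref - cand).sum()  # SAD matching cost
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp
```

Larger win values produce the smoother but blurrier maps discussed next; a practical implementation would vectorize this or use integral images.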


Effect of window size

[Figure: disparity maps with W = 3 and W = 20.]

Smaller window:
+ More detail
- More noise

Larger window:
+ Smoother disparity maps
- Less detail
- Fails near boundaries
When will stereo block matching fail?

[Figure examples: textureless regions, repeated patterns, specularities.]
Improving stereo matching
[Figure: block matching result next to the ground truth.]

What are some problems with the result? How can we improve depth estimation?

Too many discontinuities: we expect disparity values to change slowly.

Let's make an assumption: depth should change smoothly.
Stereo matching as … Energy Minimization

What defines a good stereo correspondence?

1. Match quality
   - Want each pixel to find a good match in the other image
2. Smoothness
   - If two pixels are adjacent, they should (usually) move about the same amount

Energy function (for one pixel):

$$E(d) = E_{data}(d) + \lambda\, E_{smooth}(d)$$

data term: want each pixel to find a good match in the other image (block matching result)
smoothness term: adjacent pixels should (usually) move about the same amount (smoothness function)
data term

$$E_{data}(d) = \sum_{(x,y)} C\big(x, y, d(x,y)\big)$$

where C is the SSD distance between windows centered at I(x, y) and J(x + d(x,y), y).

smoothness term

$$E_{smooth}(d) = \sum_{(p,q) \in \mathcal{E}} V(d_p, d_q)$$

where $\mathcal{E}$ is the set of neighboring pixels (4-connected or 8-connected neighborhood).

Common choices for V:

• L1 distance: $V(d_p, d_q) = |d_p - d_q|$
• "Potts model": $V(d_p, d_q) = \mathbf{1}[d_p \neq d_q]$
One possible solution…

Dynamic Programming

Can minimize this independently per scanline using dynamic programming (DP).

Define $D(x, y, d)$: the minimum cost of a solution such that $d(x,y) = d$. The standard scanline recurrence is

$$D(x, y, d) = C(x, y, d) + \min_{d'}\big( D(x-1, y, d') + \lambda\,|d - d'| \big)$$
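A compact sketch of that per-scanline DP, assuming a precomputed cost volume C[y, x, d] (for example, SAD costs from the block matcher above); the function name and backtracking structure are illustrative.

```python
import numpy as np

def scanline_dp(C, lam=1.0):
    """Minimize matching cost plus lam * |d - d'| along each scanline.

    C: (H, W, D) cost volume with C[y, x, d] = C(x, y, d).
    Returns an (H, W) integer disparity map.
    """
    H, W, D = C.shape
    disp = np.zeros((H, W), dtype=np.int64)
    dvals = np.arange(D)
    penalty = lam * np.abs(dvals[:, None] - dvals[None, :])  # |d - d'|
    for y in range(H):
        cost = np.empty((W, D))       # cost[x, d] = D(x, y, d)
        back = np.zeros((W, D), dtype=np.int64)
        cost[0] = C[y, 0]
        for x in range(1, W):
            # total[d, d'] = D(x-1, y, d') + lam * |d - d'|
            total = cost[x - 1][None, :] + penalty
            back[x] = total.argmin(axis=1)
            cost[x] = C[y, x] + total.min(axis=1)
        # Backtrack the cheapest disparity path for this scanline.
        d = int(cost[-1].argmin())
        for x in range(W - 1, -1, -1):
            disp[y, x] = d
            d = back[x, d]
    return disp
```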


[Figure: match only vs. match & smoothness (via graph cut) vs. ground truth.]

Y. Boykov, O. Veksler, and R. Zabih, Fast Approximate Energy Minimization via Graph Cuts, PAMI 2001.
All of these cases remain difficult; what can we do?

[Figure examples: textureless regions, repeated patterns, specularities.]
Structured light
Use controlled ("structured") light to make correspondences easier.

Disparity between laser points on the same scanline in the images determines the 3-D coordinates of the laser point on the object.

Structured light and two cameras

[Figure: a laser illuminating the object, observed by two cameras producing images I and J.]
Structured light and one camera

The projector acts like a "reverse" camera.

[Figure: a projector and a camera, with projected pattern I and captured image J.]
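To make the "reverse camera" idea concrete, here is a hedged sketch of camera-projector triangulation: one projector column sweeps a plane in space, a camera pixel defines a ray, and their intersection is the 3D point. The calibration values and all names below are hypothetical, for illustration only.

```python
import numpy as np

def intersect_ray_plane(ray_dir, plane_n, plane_d):
    """Intersect the camera ray t * ray_dir (camera at the origin) with
    the plane {X : plane_n . X = plane_d} swept by one projector column."""
    t = plane_d / (plane_n @ ray_dir)
    return t * ray_dir

# Hypothetical example: a camera pixel back-projected through an
# intrinsic matrix K, and a light plane from projector calibration.
K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0., 0., 1.]])
pixel = np.array([350., 260., 1.])
ray = np.linalg.solve(K, pixel)  # camera ray direction, K^{-1} pixel
plane_n, plane_d = np.array([0.9, 0.0, -0.44]), 0.5
X = intersect_ray_plane(ray, plane_n, plane_d)
```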
Example: Laser scanner

Digital Michelangelo Project


http://graphics.stanford.edu/projects/mich/
15-463/15-663/15-862 Computational Photography
Learn about structured light and other cameras, and build some on your own!

• cameras that take video at the speed of light
• cameras that measure depth in real time
• cameras that capture entire focal stacks
• cameras that see around corners

http://graphics.cs.cmu.edu/courses/15-463/


References
Basic reading:
• Szeliski textbook, Section 8.1 (not 8.1.1-8.1.3), Chapter 11, Section 12.2.
• Hartley and Zisserman, Section 11.12.
