Structure From Motion: Computer Vision Jia-Bin Huang, Virginia Tech
Many slides from S. Seitz, N Snavely, and D. Hoiem
Administrative stuff
• HW 3
• Fundamental matrix
• Affine structure from motion
Motion [ˈmōSH(ə)n]:
Camera Location and Orientation
http://www.3dcadbrowser.com/download.aspx?3dmodel=40454
SfM Applications – Surveying
cultural heritage structure analysis
https://www.youtube.com/watch?v=1HhOmF22oYA
SfM Applications – Visual effects (matchmove)
https://www.youtube.com/watch?v=bK6vCPcFkfk
Steps
• Images → Points: Structure from Motion
• Points → More points: Multiple View Stereo
• Points → Meshes: Model Fitting
• Meshes → Models: Texture Mapping
Slide credit: J. Xiao
Example: https://photosynth.net/
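Triangulation: Linear Solution
• Projection of $\mathbf{X}$, where $\mathbf{p}_1^T, \mathbf{p}_2^T, \mathbf{p}_3^T$ are the rows of $P$:
\[ \mathbf{x} = w \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = P\mathbf{X} = \begin{bmatrix} \mathbf{p}_1^T \\ \mathbf{p}_2^T \\ \mathbf{p}_3^T \end{bmatrix} \mathbf{X} = \begin{bmatrix} \mathbf{p}_1^T \mathbf{X} \\ \mathbf{p}_2^T \mathbf{X} \\ \mathbf{p}_3^T \mathbf{X} \end{bmatrix} \]
• The third row gives $w = \mathbf{p}_3^T \mathbf{X}$, so
\[ \begin{bmatrix} u\,\mathbf{p}_3^T \mathbf{X} \\ v\,\mathbf{p}_3^T \mathbf{X} \\ \mathbf{p}_3^T \mathbf{X} \end{bmatrix} = \begin{bmatrix} \mathbf{p}_1^T \mathbf{X} \\ \mathbf{p}_2^T \mathbf{X} \\ \mathbf{p}_3^T \mathbf{X} \end{bmatrix} \]
• The same holds in the second view, $\mathbf{x}' = w' [u', v', 1]^T = P'\mathbf{X}$. Stacking the first two rows from each view:
\[ A\mathbf{X} = \mathbf{0}, \qquad A = \begin{bmatrix} u\,\mathbf{p}_3^T - \mathbf{p}_1^T \\ v\,\mathbf{p}_3^T - \mathbf{p}_2^T \\ u'\,\mathbf{p}_3'^T - \mathbf{p}_1'^T \\ v'\,\mathbf{p}_3'^T - \mathbf{p}_2'^T \end{bmatrix} \]
Further reading: HZ p. 312-313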
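Triangulation: Linear Solution
Given $P$, $P'$, $\mathbf{x}$, $\mathbf{x}'$:
1. Precondition points and projection matrices
2. Create matrix $A$
3. [U, S, V] = svd(A)
4. X = V(:, end)
where $\mathbf{x} = w [u, v, 1]^T = P\mathbf{X}$, $\mathbf{x}' = w' [u', v', 1]^T = P'\mathbf{X}$, $\mathbf{p}_k^T$ is the k-th row of $P$, and
\[ A = \begin{bmatrix} u\,\mathbf{p}_3^T - \mathbf{p}_1^T \\ v\,\mathbf{p}_3^T - \mathbf{p}_2^T \\ u'\,\mathbf{p}_3'^T - \mathbf{p}_1'^T \\ v'\,\mathbf{p}_3'^T - \mathbf{p}_2'^T \end{bmatrix} \]
Pros and Cons
• Works for any number of corresponding images
• Not projectively invariant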
Code: http://www.robots.ox.ac.uk/~vgg/hzbook/code/vgg_multiview/vgg_X_from_xP_lin.m
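The four steps above can be sketched in Python with NumPy. This is a minimal illustration, not the linked VGG code: it skips step 1 (preconditioning), and the function name `triangulate_linear`, the helper `project`, and the two test cameras are assumptions made up for the example.

```python
import numpy as np

def triangulate_linear(points, cameras):
    """Linear triangulation of one 3D point from two or more views.

    points:  list of (u, v) image coordinates, one per view
    cameras: list of 3x4 projection matrices, one per view
    """
    rows = []
    for (u, v), P in zip(points, cameras):
        P = np.asarray(P, dtype=float)
        rows.append(u * P[2] - P[0])   # u * p3^T - p1^T
        rows.append(v * P[2] - P[1])   # v * p3^T - p2^T
    A = np.vstack(rows)
    # The solution of AX = 0 (up to scale) is the right singular vector with
    # the smallest singular value: the last row of Vt, i.e. V(:, end).
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]                # de-homogenize

def project(P, X):
    """Project a homogeneous 3D point X with P and return (u, v)."""
    x = P @ X
    return x[:2] / x[2]

# Synthetic check with two hypothetical cameras and a known point.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, -0.2, 4.0, 1.0])

X_est = triangulate_linear([project(P1, X_true), project(P2, X_true)], [P1, P2])
print(X_est)
```

Because each view only adds two rows to A, the same function accepts more than two views, which is the first "pro" listed above.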
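Triangulation: Non-linear Solution
• Minimize projected error while satisfying $\hat{\mathbf{x}}'^T F \hat{\mathbf{x}} = 0$
\[ cost(\mathbf{X}) = dist(\mathbf{x}, \hat{\mathbf{x}})^2 + dist(\mathbf{x}', \hat{\mathbf{x}}')^2 \]
[Figure: measured points $\mathbf{x}$, $\mathbf{x}'$ and corrected points $\hat{\mathbf{x}}$, $\hat{\mathbf{x}}'$]
Figure source: Robertson and Cipolla (Chapter 13 of Practical Image Processing and Computer Vision)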
Slides from Lana Lazebnik
Projective structure from motion
• Given: m images of n fixed 3D points
• $\mathbf{x}_{ij} = P_i \mathbf{X}_j$, $i = 1, \dots, m$, $j = 1, \dots, n$
• Problem: estimate the m unknown projection matrices $P_i$ and the n 3D points $\mathbf{X}_j$ from the mn known corresponding points $\mathbf{x}_{ij}$
• With no calibration info, cameras and points can only be recovered up to a 4×4 projective transformation $Q$:
• $\mathbf{X} \to Q\mathbf{X}$, $P \to PQ^{-1}$
• We can solve for structure and motion when $2mn \geq 11m + 3n - 15$
  (11m: DoF in $P_i$; 3n: DoF in $\mathbf{X}_j$; 15: DoF of the 4×4 projective transformation $Q$)
• For two cameras, at least 7 points are needed
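The counting argument can be checked numerically. The sketch below simply searches for the smallest n that satisfies the inequality for a given number of views; the function name `min_points` is made up for the example.

```python
def min_points(m):
    """Smallest n with 2*m*n >= 11*m + 3*n - 15, for m >= 2 views."""
    n = 1
    while 2 * m * n < 11 * m + 3 * n - 15:
        n += 1
    return n

for m in (2, 3, 4):
    print(m, min_points(m))
```

For m = 2 the inequality reduces to 4n >= 3n + 7, which reproduces the 7-point requirement stated above.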
Sequential structure from motion
• Initialize motion (calibration) from two images using the fundamental matrix
• Determine the projection matrix of each new camera using all the known 3D points that are visible in its image – calibration/resectioning
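\[ E(P, X) = \sum_{i=1}^{m} \sum_{j=1}^{n} D(\mathbf{x}_{ij}, P_i \mathbf{X}_j)^2 \]
• Theory: The Levenberg–Marquardt algorithm
• Practice: The Ceres-Solver from Google
[Figure: point $\mathbf{X}_j$ seen by cameras $P_1$, $P_2$, $P_3$; observed points $\mathbf{x}_{1j}$, $\mathbf{x}_{2j}$, $\mathbf{x}_{3j}$ versus reprojections $P_1\mathbf{X}_j$, $P_2\mathbf{X}_j$, $P_3\mathbf{X}_j$]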
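The reprojection error $E(P, X)$ can be evaluated as in the sketch below. It only computes the cost, taking $D$ to be Euclidean image distance; a Levenberg–Marquardt solver would then minimize it over all cameras and points. The function name, array layout, and test cameras are assumptions for illustration.

```python
import numpy as np

def reprojection_error(cameras, points3d, observations):
    """Sum of squared image distances between x_ij and the projection P_i X_j.

    cameras:      (m, 3, 4) projection matrices P_i
    points3d:     (n, 3) points X_j
    observations: (m, n, 2) observed image points x_ij
    """
    cameras = np.asarray(cameras, dtype=float)
    points3d = np.asarray(points3d, dtype=float)
    X_h = np.hstack([points3d, np.ones((len(points3d), 1))])  # (n, 4) homogeneous
    proj = np.einsum('mij,nj->mni', cameras, X_h)             # (m, n, 3)
    proj = proj[..., :2] / proj[..., 2:3]                     # perspective divide
    return float(np.sum((np.asarray(observations) - proj) ** 2))

# Two hypothetical cameras and two points; observations are exact projections.
P = np.stack([
    np.hstack([np.eye(3), np.zeros((3, 1))]),
    np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])]),
])
X = np.array([[0.5, -0.2, 4.0], [1.0, 1.0, 5.0]])
X_h = np.hstack([X, np.ones((2, 1))])
obs = np.einsum('mij,nj->mni', P, X_h)
obs = obs[..., :2] / obs[..., 2:3]

# Shift one observation by (3, 4) pixels, i.e. a distance of 5 pixels.
obs_noisy = obs.copy()
obs_noisy[0, 0] += np.array([3.0, 4.0])

print(reprojection_error(P, X, obs), reprojection_error(P, X, obs_noisy))
```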
Auto-calibration
(best method with software available; also has good overview of recent methods)
C. Tomasi and T. Kanade. Shape and motion from image streams under orthography:
A factorization method. IJCV, 9(2):137-154, November 1992.
Orthographic Projection - Examples
Orthographic projection for
rotated/translated camera
[Figure: image axes $\mathbf{a}_1$, $\mathbf{a}_2$ and 3D point $\mathbf{X}$]
Affine structure from motion
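• Affine projection is a linear mapping + translation in inhomogeneous coordinates
\[ \mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix} = A\mathbf{X} + \mathbf{t} \]
• $\mathbf{t}$ is the projection of the world origin
[Figure: image axes $\mathbf{a}_1$, $\mathbf{a}_2$ and 3D point $\mathbf{X}$]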
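\[ \mathbf{x}_{ij} = A_i \mathbf{X}_j + \mathbf{t}_i, \qquad \hat{\mathbf{x}}_{ij} = \mathbf{x}_{ij} - \frac{1}{n} \sum_{k=1}^{n} \mathbf{x}_{ik} \]
\[ \mathbf{x}_{ij} - \frac{1}{n} \sum_{k=1}^{n} \mathbf{x}_{ik} = A_i \mathbf{X}_j + \mathbf{t}_i - \frac{1}{n} \sum_{k=1}^{n} \left( A_i \mathbf{X}_k + \mathbf{t}_i \right) = A_i \left( \mathbf{X}_j - \frac{1}{n} \sum_{k=1}^{n} \mathbf{X}_k \right) = A_i \hat{\mathbf{X}}_j \]
\[ \hat{\mathbf{x}}_{ij} = A_i \hat{\mathbf{X}}_j \]
where $\hat{\mathbf{x}}_{ij}$ is the 2D normalized point (observed), $\hat{\mathbf{X}}_j$ is the 3D normalized point, and $A_i$ is the linear (affine) mapping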
Suppose we know 3D points and
affine camera parameters …
then, we can compute the observed 2d
positions of each point
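\[ \begin{bmatrix} A_1 \\ A_2 \\ \vdots \\ A_m \end{bmatrix} \begin{bmatrix} \mathbf{X}_1 & \mathbf{X}_2 & \cdots & \mathbf{X}_n \end{bmatrix} = \begin{bmatrix} \hat{\mathbf{x}}_{11} & \hat{\mathbf{x}}_{12} & \cdots & \hat{\mathbf{x}}_{1n} \\ \hat{\mathbf{x}}_{21} & \hat{\mathbf{x}}_{22} & \cdots & \hat{\mathbf{x}}_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \hat{\mathbf{x}}_{m1} & \hat{\mathbf{x}}_{m2} & \cdots & \hat{\mathbf{x}}_{mn} \end{bmatrix} \]
The left-hand side is $AX$, where $X$ holds the 3D points (3 × n)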
Source: M. Hebert
Factorizing the measurement matrix
• Singular value decomposition of D:
Source: M. Hebert
Factorizing the measurement matrix
• Obtaining a factorization from SVD: the measurement matrix is written as the product $\tilde{A}\tilde{X}$
Source: M. Hebert
Affine ambiguity
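• The factors $\tilde{A}$ and $\tilde{X}$ (the shape matrix, also written $\tilde{S}$) are determined only up to an affine transformation
• Why? We have only an affine transformation and we have not enforced any Euclidean constraints (e.g., perpendicular image axes)
Source: M. Hebert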
Eliminating the affine ambiguity
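$\mathbf{a}_1 \cdot \mathbf{a}_2 = 0$
$|\mathbf{a}_1|^2 = |\mathbf{a}_2|^2 = 1$
[Figure: image axes $\mathbf{a}_1$, $\mathbf{a}_2$, image point $\mathbf{x}$, and 3D point $\mathbf{X}$]
Source: M. Hebert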
Solve for orthographic constraints
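Three equations for each image i:
\[ \tilde{\mathbf{a}}_{i1}^T C C^T \tilde{\mathbf{a}}_{i1} = 1 \]
\[ \tilde{\mathbf{a}}_{i2}^T C C^T \tilde{\mathbf{a}}_{i2} = 1 \]
\[ \tilde{\mathbf{a}}_{i1}^T C C^T \tilde{\mathbf{a}}_{i2} = 0 \]
where $\tilde{A}_i = \begin{bmatrix} \tilde{\mathbf{a}}_{i1}^T \\ \tilde{\mathbf{a}}_{i2}^T \end{bmatrix}$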
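How to solve $L = CC^T$?
• Each constraint has the form $[a\ b\ c]\, L\, [d\ e\ f]^T = k$, which is linear in the entries of $L$:
\[ \begin{bmatrix} ad & bd & cd & ae & be & ce & af & bf & cf \end{bmatrix} \begin{bmatrix} L_{11} \\ L_{12} \\ L_{13} \\ L_{21} \\ L_{22} \\ L_{23} \\ L_{31} \\ L_{32} \\ L_{33} \end{bmatrix} = k \]
• The coefficient row is reshape([a b c]'*[d e f], [1, 9]); this ordering is valid because $L$ is symmetric ($L_{12} = L_{21}$, and so on)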
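The coefficient row can be reproduced in NumPy as a sanity check. This is a sketch under the assumption that $L$ is symmetric; the function name `constraint_row` and the numeric values are made up for the example.

```python
import numpy as np

def constraint_row(a, d):
    """Coefficients of [L11, L12, L13, L21, ..., L33] in a^T L d, for symmetric L."""
    # Column-major flattening mirrors MATLAB's reshape(a' * d, [1, 9]).
    return np.outer(a, d).reshape(-1, order='F')

a = np.array([1.0, 2.0, 3.0])
d = np.array([4.0, 5.0, 6.0])

# Arbitrary lower-triangular C, used only to build a symmetric L = C C^T.
C = np.array([[2.0, 0.0, 0.0], [1.0, 3.0, 0.0], [0.5, -1.0, 1.5]])
L = C @ C.T

row = constraint_row(a, d)
print(row)
print(np.isclose(row @ L.reshape(-1), a @ L @ d))
```

Stacking one such row per constraint (three per image) gives a linear system in the nine entries of $L$.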
Algorithm summary
• Given: m images and n tracked features $\mathbf{x}_{ij}$
• For each image i, center the feature coordinates
• Construct a 2m × n measurement matrix D:
  • Column j contains the projection of point j in all views
  • Each row contains one coordinate (x or y) of the projections of all n points in one image
• Factorize D:
  • Compute SVD: $D = U W V^T$
  • Create $U_3$ by taking the first 3 columns of $U$
  • Create $V_3$ by taking the first 3 columns of $V$
  • Create $W_3$ by taking the upper left 3 × 3 block of $W$
• Create the motion (affine) and shape (3D) matrices: $A = U_3 W_3^{1/2}$ and $S = W_3^{1/2} V_3^T$
• Eliminate affine ambiguity
  • Solve $L = CC^T$ using metric constraints
  • Solve C using Cholesky decomposition
  • Update A and S: $A = AC$, $S = C^{-1}S$
Source: M. Hebert
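The centering and factorization steps of the summary can be sketched as follows. The example stops before the metric upgrade (solving for C) and runs on synthetic, noise-free affine projections; the function name, array layout, and random test data are assumptions for illustration.

```python
import numpy as np

def affine_factorization(tracks):
    """Rank-3 factorization of centered tracks (before the metric upgrade).

    tracks: (m, n, 2) array, tracks[i, j] = image position of point j in view i
    Returns D (2m x n), A (2m x 3), and S (3 x n) with D ~= A @ S.
    """
    m, n, _ = tracks.shape
    centered = tracks - tracks.mean(axis=1, keepdims=True)  # subtract per-image centroid
    # Stack the x row and y row of each view, so D is 2m x n.
    D = centered.transpose(0, 2, 1).reshape(2 * m, n)
    U, w, Vt = np.linalg.svd(D, full_matrices=False)
    W3_sqrt = np.diag(np.sqrt(w[:3]))
    A = U[:, :3] @ W3_sqrt        # motion (affine) matrix
    S = W3_sqrt @ Vt[:3]          # shape (3D) matrix
    return D, A, S

# Synthetic data: hypothetical random affine cameras viewing random 3D points.
rng = np.random.default_rng(0)
m, n = 4, 20
X = rng.normal(size=(3, n))
tracks = np.empty((m, n, 2))
for i in range(m):
    A_i = rng.normal(size=(2, 3))
    t_i = rng.normal(size=(2, 1))
    tracks[i] = (A_i @ X + t_i).T

D, A, S = affine_factorization(tracks)
print(np.allclose(A @ S, D))
```

With noisy tracks, D is only approximately rank 3, and keeping the three largest singular values gives the closest rank-3 matrix in the least-squares sense.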
Dealing with missing data
• So far, we have assumed that all points are
visible in all views
• In reality, the measurement matrix typically
looks something like this:
[Figure: measurement matrix with cameras along the rows and points along the columns]
One solution:
• Solve using a dense submatrix of visible points
• Iteratively add new cameras
Reconstruction results
C. Tomasi and T. Kanade. Shape and motion from image streams under orthography:
A factorization method. IJCV, 9(2):137-154, November 1992.
Review of Affine SfM from Interest Points
2. Correspondence via Lucas-Kanade tracking
Original (x,y) position
b) Compute (u,v) by
Solve for orthographic constraints
HW 3 – Part 1 Epipolar Geometry
Problem: recover F from matches with outliers
load matches.mat
[c1, r1] – 477 x 2
[c2, r2] – 500 x 2
matches – 252 x 2
Write-up:
• Describe what test you used for deciding inlier vs. outlier.
• Display the estimated fundamental matrix F after normalizing to unit length
• Plot the outlier keypoints with green dots on top of the first image: plot(x, y, '.g');
• Plot the corresponding epipolar lines
Distance of point to epipolar line
$l = Fx = [a\ b\ c]^T$, $x' = [u\ v\ 1]^T$
\[ d(l, x') = \frac{|au + bv + c|}{\sqrt{a^2 + b^2}} \]
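The distance formula translates directly into code. The sketch below is plain Python; the function names and the example F (the fundamental matrix of a pure horizontal translation, for which epipolar lines are image rows) are assumptions chosen so the result is easy to verify by hand.

```python
import math

def epipolar_line(F, x):
    """l = F x for a 3x3 matrix F (nested lists) and a pixel x = (x, y)."""
    xh = (x[0], x[1], 1.0)
    return tuple(sum(F[r][k] * xh[k] for k in range(3)) for r in range(3))

def point_line_distance(line, point):
    """Distance |a*u + b*v + c| / sqrt(a^2 + b^2) from (u, v) to line (a, b, c)."""
    a, b, c = line
    u, v = point
    return abs(a * u + b * v + c) / math.hypot(a, b)

# Hypothetical F for a camera translating along the x axis.
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]

l = epipolar_line(F, (10.0, 20.0))           # the row v = 20 in the second image
dist = point_line_distance(l, (15.0, 23.0))  # the match sits 3 rows away
print(l, dist)
```

One possible inlier test thresholds this distance, but the choice of threshold is left open here.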
HW 3 – Part 2 Affine SfM
Problem: recover motion and structure
load tracks.mat
$\tilde{\mathbf{a}}_{i1}^T C C^T \tilde{\mathbf{a}}_{i2} = 0$
- Building
- Tractor
- Camera
HW 3 – Graduate credits
Automatic vanishing point detection
Input:
• lines: a matrix of size [NumLines x 5], where each row represents a line segment as (x1, y1, x2, y2, lineLength)
Output:
• VP: [2 x 3], where each column corresponds to a vanishing point, in the order X, Y, Z
• lineLabel: [NumLines x 3], where each column is a logical vector indicating which line segments correspond to that vanishing point
HW 3 – Graduate credits
Epipolar Geometry
Source: Y. Furukawa
Multi-view stereo: Basic idea
Source: Y. Furukawa
Plane Sweep Stereo
[Figure: plane sweep stereo, showing the reference camera, the scene surface, the sweeping plane, Image 1, and Image 2]
Further reading
• C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2):137-154, November 1992.
• D. Nister. An efficient solution to the five-point relative pose problem. PAMI, 2004.
• Y. Furukawa and J. Ponce. Accurate, dense, and robust multiview stereopsis. CVPR, 2007.