Structure from Motion

•

PUC-RIO

Daniel Costa

11/09/2014

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Você viu 3, do total de 39 páginas

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Você viu 6, do total de 39 páginas

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Você viu 9, do total de 39 páginas

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

E aí, curtiu este material?

Ajude a incentivar outros estudantes a melhorar o conteúdo

Gostou desse material? Compartilhe! 🧡

Visão Computacional

209 Materiais compartilhados

Baixe o app para aproveitar ainda mais

Leia os materiais offline, sem usar a internet. Além de vários outros recursos!

Prévia do material em texto

Structure from Motion 
 
 
Raul Queiroz Feitosa 
6/11/2013 Structure from Motion 2 
Objectives 
 
 
 State the Structure from Motion (SFM) problem, 
 Introduce SFM techniques for small relief (affine), and 
 Introduce SFM techniques for large relief (projective). 
6/11/2013 Structure from Motion 3 
Contents 
 Introduction 
 Affine SFM 
 Projective SFM 
6/11/2013 Structure from Motion 4 
Structure from motion 
 Automatic recovery of camera motion and scene 
structure from two or more images. 
 
 
 
 
 
 
 
 
 
 also called automatic camera tracking or matchmoving. 
6/11/2013 Structure from Motion 5 
Applications of SFM 
 Computer vision: 
 multiple-view shape reconstruction, 
 novel view synthesis, 
 autonomous vehicle navigation. 
 … 
 Film production: 
 insertion of computer-generated imagery (CGI) into live-
action backgrounds, 
 … 
 
6/11/2013 Structure from Motion 6 
Stereopsis × SFM 
 Stereopsis 
 Intrinsic camera parameters known (calibrated 
cameras), and 
 Extrinsic camera parameters known (camera pose 
relative to a fixed world coordinate system known) 
 Structure from Motion 
 Camera parameters are (partially) unknown and 
 possibly change over time. 
6/11/2013 Structure from Motion 7 
SFM problem formulation 
 
 Given the images pij seen by m cameras of n fixed 
3D points Pj (homogeneous) 
 
 estimate the m projection matrices Mi and the 3D 
points Pj from the mn correspondences pij. 
 
6/11/2013 Structure from Motion 8 
Structure from Motion 
Contents: 
 Introduction 
 Affine SFM 
 Projective SFM 
6/11/2013 Structure from Motion 9 
Affine approximation 
 If scene's relief is small compared with distance zj to the camera 
the perspective projection may be approximated by affine 
models. 
increasing focal length / distance from camera 
center at 
infinite 
6/11/2013 Structure from Motion 10 
Affine model 
 For affine cameras (see chapter on camera models), the 
previous equation takes the (nonhomogeneous) form 
 
 
 where Mi is the affine projection matrix, for i=1,…, m, 
j=1,…,n. 
 Dropping i and j from the notation above, yields 
iji
j
iij bP
P
p 





 M
1
2×1 2×3 3×1 2×1 
bPp 




























 
2
1
232221
131211
b
b
z
y
x
aaa
aaa
v
u
2×3 3×1 2×1 
6/11/2013 Structure from Motion 11 
Affine model 
 Thus 
 
 
 
 involves 2mn equations in 8m+3n unknowns. 
 Hence, for a sufficient number of views (m) and a 
sufficient number of points (n), camera motion (Mi) and 
scene structure (Pi) may be recovered. 
iji
j
iij bP
P
p 





 M
1
2×1 2×4 3×1 2×3 3×1 2×1 
6/11/2013 Structure from Motion 12 
Affine ambiguity 
Affine Ambiguity 
 If are solutions to the SFM affine problem, 
so are , where 
 
 
 and Q is an arbitrary affine transformation matrix, - 
that is, it can be written as 
 
 , 
 
 where C is a nonsingular 3x3 matrix and d R3. 
ji P and M ji P and M











 
 
1
 
1
 and 1
jj
ii QQ
PP
MM 
1
 with 
1
11
1







 









TT
QQ
00
dd CCC
6/11/2013 Structure from Motion 13 
Affine ambiguity 
Affine Ambiguity 
 The affine SFM problem can only be defined up to an 
affine transformation ambiguity. 
 
 Taking into account the 12 parameters of the general 
affine transformation, a solution may exist as long as 
 
2mn ≥ 8m+3n-12 
 
 Thus, for 2 views, we need 4 point correspondences. 
# of Mi 
# of Pj 
6/11/2013 Structure from Motion 14 
Offset to the center of mass 
 Translate the origin of the image coordinate system to the 
center of mass pi0 of the observed points (pij → pij - pi0 ), 
by doing 
 
 
 
 For simplicity, the origin of the world coordinate system 
is moved to the centroid of the 3D points (Pj → Pj - P0 ) 
 This yields 
  
 







n
k
n
k
kjiikiiji
n
k
ikijij
nnn 1 11
111
PPbPbPppp 
jiij Pp 
6/11/2013 Structure from Motion 15 
The measurement matrixx 
 Let’s create a 2m × n data (measurement) matrix D: 
 
 
 
 
 
 
 
 The measurement matrix D = P must have rank 3! 
  P



D 





























 n
mmnmm
n
n
PPP
ppp
ppp
ppp






21
2
1
21
22221
11211
 
ca
m
er
as
 
(2
 m
) 
points (n) 
2m×n 2m×3 3×n 
6/11/2013 Structure from Motion 16 
Tomasi-Kanade factorization 
Applying SVD to matrix D 
D=UWVT 
Since rank(D)=3 (noise free) 
 
 U= W= VT= 
 
Therefore 
D= U3 W3 VT3 = P 
In practice, due to noise, inaccuracy in location of the 
interest points, and the mere fact that cameras are not 
affine, the matrix D is full rank. 
 
U3 Un-3 
W3 0 
0 0 
VT3. 
VTn-3 
2m×n 2m×3 3×3 3×n 
6/11/2013 Structure from Motion 17 
Tomasi-Kanade factorization algorithm 
1. Compute the singular value decomposition. 
D=UWVT 
 
2. Construct the matrices U3, V3 and W3 formed by the three 
leftmost columns of the matrices U and V W (the ones 
corresponding to the largest singular values), and the 
corresponding 3×3 sub matrix of W. 
 
3. Define 
0= U3 (W3)½ P0 = (W3)½ VT3 
 the 2m×3 matrix is an estimate of the camera motion, 
and the 3×n matrix is an estimate of the scene structure. 
 
It can be proven that W3 is the closest rank-3 approximation of D. 
6/11/2013 Structure from Motion 18 
Recalling the affine ambiguity 
 The decomposition 
 
0= U3 (W3)
½
 P0 = (W3)
½
 VT3 
 is not unique. 
 
 For any 3×3 non singular matrix C 
= 0 C and P = C-1 P0 
 is also a solution (affine ambiguity). 
6/11/2013 Geometric Camera Models 19 
Affine camera models 
Summary of affine mappings 
p=M WP , where 
 
 
 
 
 
 
 
Note that in all cases the 3rd row is =(0 0 0 1)! 











1
2
1
0
y
T
x
T
t
t
r
r
M













1
2
1
0
y
T
x
T
t
t
r
r
M











1
2
1
0
y
T
x
T
t
t


r
r
M



















 

11
0sin
0cot
2
1
0
y
T
x
T
t
t
r
r


M
orthographic - 5 dof scaled orthographic – 6 dof 
weak-perspective - 7 dof general affine - 8 dof 
6/11/2013 Structure from Motion 20 
Eliminating the affine ambiguity 
 From a previous slide 
 
 the image coordinates pij are the projections of Pij, 
on the vectors aTi1 and a
T
i2 , the rows of i. 
 Consider scaled orthographic projection: image 
axes are perpendicular. 
 
jiij Pp 
 











2211
21
2211
21 0
ˆˆˆˆ
0ˆˆ
i
TT
ii
TT
i
i
TT
i
i
T
ii
T
i
i
T
iaaaa
aa
aaaa
aa
CCCC
CC
6/11/2013 Structure from Motion 21 
Eliminating the affine ambiguity 
 Scaled orthographic: image axes are perpendicular 
 
 
 
 
 
 
 
This translates into 2m quadratic equations in the elements of C 
 Solve for C 
 Update  and P →  = 0C and P = C-1 P0 
 Note that the solution is defined up to an arbitrary rotation. 
 Common practice: → first camera frame ≡ world frame. 
p 
P 
i1aˆ
2
ˆ
ia
 











2211
21
2211
21 0
ˆˆˆˆ
0ˆˆ
i
TT
ii
TT
i
i
TT
i
i
T
ii
T
i
i
T
i
aaaa
aa
aaaa
aa
CCCC
CC
6/11/2013 Structure from Motion 22 
Tomasi-Kanade algorithm summary 
 Given: m images and n features pij 
 For each image i, center the feature coordinates 
 Construct a 2m × n measurement matrix D: 
 Column j has the projection of point Pj in all views 
 Row i has one coordinate of the projections of all n points in image i 
 Factorize D : 
 Compute SVD: D = U W VT 
 Create U3 by taking the first 3 columns of U 
 Create V3 by taking the first 3 columns of V 
 Create W3 by taking the upper left 3 × 3 block of W 
 Create the motion and shape matrices: 
 0 = U3W3½ and P0 = W3½ V3T (or 0 = U3 and P0 = W3V3T) 
 Eliminate affine ambiguity. 
Source: M. Hebert 
6/11/2013 Structure from Motion 23 
Reconstruction results 
Figures from 
C. Tomasi and T. Kanade. 
Shape and motion from 
image streams under 
orthography: A 
factorization method. 
International Journal of 
Computer Vision, 9(2):137-
154, November 1992. 
6/11/2013 Structure from Motion 24 
Contents 
 Introduction 
 Affine SFM 
 Projective SFM 
6/11/2013 Structure from Motion 25 
Projective model 
 For projective cameras we have 
 
 
 
 
 
 for i=1,…,m and j=1,…,n where mTi1, m
T
i2, m
T
i3, are the 
rows of a 3 ×4 matrix Mi and Pj denotes the homo-
geneous coordinate vector of the point Pj in some world 
coordinate system. 
 or 
1
1
3
2
3
1





















ji
ji
ij
ji
ji
ij
ji
ij
ij
ij
ij
v
u
z
v
u
Pm
Pm
Pm
Pm
Pp M
6/11/2013 Structure from Motion 26 
Projective ambiguity 
 
 
 
 
 According to Theorem 1 (see chapter on Camera 
Model) Mi is an arbitrary rank-3 3×4 matrix. 
 Hence, if Mi and Pj are solutions for the equation 
above, so are 
M’i =Mi Q and P’j =Q-1 Pj , 
 where Q is an arbitrary projective transformation matrix, 
i.e., any non singular 4×4 matrix. 
 or 
1
1
3
2
3
1





















ji
ji
ij
ji
ji
ij
ji
ij
ij
ij
ij
v
u
z
v
u
Pm
Pm
Pm
Pm
Pp M
6/11/2013 Structure from Motion 27 
Projective ambiguity 
The 4×4 matrix Q is defined up to a scale, since multiplying it 
by a nonzero scalar amounts to applying inverse scaling to Mi 
and Pj. 
Thus, the projective SFM problem can only be defined up to a 
projective transformation ambiguity. 
Taking into account the projective ambiguity the problem has a 
finite number of solutions as soon as 
 
2mn ≥ 11m+3n-15 
 
For two views, we need seven point correspondences. 
# of Mi 
# of Pj 
Bilinear projection SFM 
As in algebraic reconstruction, we do 
 
 
So, we look for a solution that minimizes 
 
 
This can be done by alternating steps where Pj are 
kept constant (estimated) while Mi are estimated 
(kept constant). 
6/11/2013 Structure from Motion 28 
zij pij = Mi Pj  pij  Mi Pj = 0 
2
 
ij
jiijE Pp M
6/11/2013 Structure from Motion 29 
Bundle Adjustment 
Given estimates for the matrices Mi (i=1,…,m) and vectors 
Pj (j=1,…,n) , refine them by using non linear least square 
to minimize the global reprojection error E: 
 
 
 
 
Expensive, but encompasses a physically significant error 
measure. 
As initial guesses use Pj from Affine SFM e and from them 
compute Mi. 































ji ji
ji
ij
ji
ji
ij vu
mn
E
,
2
3
2
2
3
11
Pm
Pm
Pm
Pm
6/11/2013 Structure from Motion 30 
Eliminating the projective ambiguity 
Let us assume that Mi (i=1,…,m) and vectors Pj (j=1,…,n) 
have been estimated. 
If Mi and Pj are a reconstruction in a particular Euclidean 
coordinate system, there exist a 4×4 matrix Q such that 
 
 
The next slides show two methods of computing the 
Euclidean upgrade matrix Q, when (some of) the intrinsic 
parameters are known. 
iiii PP
1ˆ and ˆ  QQMM
ˆ ˆ 
6/11/2013 Structure from Motion 31 
Eliminating the projective ambiguity 
Method 1 (calibrated cameras): 
 Mi (like Mi) is defined up to a scale, so 
 
 where ρi accounts for the scale of Mi and Ki is the 
calibration matrix. 
 Writing Q=(Q3 q4), where Q3 is 4×3 matrix and q4 is 
a vector in R4 , yields 
 
 If the camera is calibrated (Ki is known), Mi Q3 are 
scaled rotation matrix, Thus …. 
 )( iiiii t RKM 
 3 iiii RKQM 
ˆ 
ˆ 
6/11/2013 Structure from Motion 32 
Eliminating the projective ambiguity 
Method 1 (cont.): 
 … the rows (j=1,2,3) of Mi are perpendicular vectors, so 
 
 
 
 
 
 
 
 
 
 where Qi is defined up to a scale. To determine it uniquely we 
assume that the world coordinate system coincides with the first 
camera’s frame. 
 
T
ijm














0
0
0
0
0
33332332
23321331
1333
3332
2331
T
i
TT
i
T
i
TT
i
T
i
TT
i
T
i
TT
i
T
i
TT
i
T
i
TT
i
T
i
TT
i
QQQQ
QQQQ
QQ
QQ
QQ
mmmm
mmmm
mm
mm
mm
 
 
 
6/11/2013 Structure from Motion 33 
Eliminating the projective ambiguity 
Method 1 (cont.): 
 Given m images, we obtain 12 linear and 5(m-1) 
quadratic equations in the coefficients of Q that can be 
solved using non linear least squares. 
 The vector q4 of matrix Q 
Q=(Q3 q4), 
 can be determined by assuming (arbitrarily) that the 
origin of world coordinate system and the origin of the 
first camera’s frame coincide. 
 
6/11/2013 Structure from Motion 34 
Eliminating the projective ambiguity 
Method 2: 
 The previous equation is linear in the coefficients of 
the symmetric matrix =Q3Q3T (linear least squares) 
 To enforce the rank 3 of , we do 
 
=UΛUT, 
 
 where Λ is the diagonal matrix with the eigenvalues of 
 and U is the orthogonal matrix formed by its 
eigenvectors. 
6/11/2013 Structure from Motion 35 
Eliminating the projective ambiguity 
Method 2 (cont.): 
 Computing the eigenvalues/eigenvectors of  we get 
=UΛUT, 
 where Λ is the diagonal matrix with the eigenvalues of , and U 
is the orthogonal matrix formed by its eigenvectors. 
 To enforce rank 3 of  we do 
Q=U3Λ3½ 
 where U3 is formed by the 3 columns of U associated to the 3 
largest eigenvalues of  and Λ3 is the corresponding sub matrix 
of Λ.. 
 Vector q4 can be computed as in the previous method. 
 
6/11/2013 Structure from Motion 36 
Eliminating the projective ambiguity 
These methods can be adapted to the case where only some 
of the intrinsiccamera parameters are known. 
 
Since Ri are orthogonal matrices 
 
 
 each image provides a set of constraints between the 
entries of Ri and . 
 
 2 Tiii
T
ii KKMM 
6/11/2013 Structure from Motion 37 
Eliminating the projective ambiguity 
 Assuming, for example that the center of the image is known for 
each camera, we can take u0=v0=0 and 
 
 
 
 
 
 Each zero entry of provides 2 independent linear equations 
in the 10 coefficients of the 4×4 matrix . With m ≥ 5 images, these 
parameters can be estimated via linear least squares. 
 
 
 What if zero skew can be assumed? 
T
i i
KK
 





0
0
32
31
i
T
i
i
T
i
mm
mm


100
0
sin
1
sin
cos
0
sin
cos
sin
1
2
2
2
2
2
2
2
2
i
i
i
i
ii
i
i
ii
i
i
T
i i 











KK
6/11/2013 Structure from Motion 38 
Readings 
 C. Tomasi and T. Kanade. Shape and motion from image 
streams under orthography: A factorization method. 
International Journal of Computer Vision, 9(2):137-154, 
November 1992. 
 F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce. 
Segmenting, Modeling, and Matching Video Clips 
Containing Multiple Moving Objects. PAMI 2007. 
 Muhamad, S. and Hebert, M. (2000), Iterative Projective 
Reconstruction from Multiple Views, Proc. IEEE 
Conference on Computer Vision and Pattern Recognition, 
2000, pp. II, 430-437. 
6/11/2013 Structure from Motion 39 
Next Topic 
 
Up to you!