Baixe o app para aproveitar ainda mais
Prévia do material em texto
Structure from Motion Raul Queiroz Feitosa 6/11/2013 Structure from Motion 2 Objectives State the Structure from Motion (SFM) problem, Introduce SFM techniques for small relief (affine), and Introduce SFM techniques for large relief (projective). 6/11/2013 Structure from Motion 3 Contents Introduction Affine SFM Projective SFM 6/11/2013 Structure from Motion 4 Structure from motion Automatic recovery of camera motion and scene structure from two or more images. also called automatic camera tracking or matchmoving. 6/11/2013 Structure from Motion 5 Applications of SFM Computer vision: multiple-view shape reconstruction, novel view synthesis, autonomous vehicle navigation. … Film production: insertion of computer-generated imagery (CGI) into live- action backgrounds, … 6/11/2013 Structure from Motion 6 Stereopsis × SFM Stereopsis Intrinsic camera parameters known (calibrated cameras), and Extrinsic camera parameters known (camera pose relative to a fixed world coordinate system known) Structure from Motion Camera parameters are (partially) unknown and possibly change over time. 6/11/2013 Structure from Motion 7 SFM problem formulation Given the images pij seen by m cameras of n fixed 3D points Pj (homogeneous) estimate the m projection matrices Mi and the 3D points Pj from the mn correspondences pij. 6/11/2013 Structure from Motion 8 Structure from Motion Contents: Introduction Affine SFM Projective SFM 6/11/2013 Structure from Motion 9 Affine approximation If scene's relief is small compared with distance zj to the camera the perspective projection may be approximated by affine models. increasing focal length / distance from camera center at infinite 6/11/2013 Structure from Motion 10 Affine model For affine cameras (see chapter on camera models), the previous equation takes the (nonhomogeneous) form where Mi is the affine projection matrix, for i=1,…, m, j=1,…,n. Dropping i and j from the notation above, yields iji j iij bP P p M 1 2×1 2×3 3×1 2×1 bPp 2 1 232221 131211 b b z y x aaa aaa v u 2×3 3×1 2×1 6/11/2013 Structure from Motion 11 Affine model Thus involves 2mn equations in 8m+3n unknowns. Hence, for a sufficient number of views (m) and a sufficient number of points (n), camera motion (Mi) and scene structure (Pi) may be recovered. iji j iij bP P p M 1 2×1 2×4 3×1 2×3 3×1 2×1 6/11/2013 Structure from Motion 12 Affine ambiguity Affine Ambiguity If are solutions to the SFM affine problem, so are , where and Q is an arbitrary affine transformation matrix, - that is, it can be written as , where C is a nonsingular 3x3 matrix and d R3. ji P and M ji P and M 1 1 and 1 jj ii QQ PP MM 1 with 1 11 1 TT QQ 00 dd CCC 6/11/2013 Structure from Motion 13 Affine ambiguity Affine Ambiguity The affine SFM problem can only be defined up to an affine transformation ambiguity. Taking into account the 12 parameters of the general affine transformation, a solution may exist as long as 2mn ≥ 8m+3n-12 Thus, for 2 views, we need 4 point correspondences. # of Mi # of Pj 6/11/2013 Structure from Motion 14 Offset to the center of mass Translate the origin of the image coordinate system to the center of mass pi0 of the observed points (pij → pij - pi0 ), by doing For simplicity, the origin of the world coordinate system is moved to the centroid of the 3D points (Pj → Pj - P0 ) This yields n k n k kjiikiiji n k ikijij nnn 1 11 111 PPbPbPppp jiij Pp 6/11/2013 Structure from Motion 15 The measurement matrixx Let’s create a 2m × n data (measurement) matrix D: The measurement matrix D = P must have rank 3! P D n mmnmm n n PPP ppp ppp ppp 21 2 1 21 22221 11211 ca m er as (2 m ) points (n) 2m×n 2m×3 3×n 6/11/2013 Structure from Motion 16 Tomasi-Kanade factorization Applying SVD to matrix D D=UWVT Since rank(D)=3 (noise free) U= W= VT= Therefore D= U3 W3 VT3 = P In practice, due to noise, inaccuracy in location of the interest points, and the mere fact that cameras are not affine, the matrix D is full rank. U3 Un-3 W3 0 0 0 VT3. VTn-3 2m×n 2m×3 3×3 3×n 6/11/2013 Structure from Motion 17 Tomasi-Kanade factorization algorithm 1. Compute the singular value decomposition. D=UWVT 2. Construct the matrices U3, V3 and W3 formed by the three leftmost columns of the matrices U and V W (the ones corresponding to the largest singular values), and the corresponding 3×3 sub matrix of W. 3. Define 0= U3 (W3)½ P0 = (W3)½ VT3 the 2m×3 matrix is an estimate of the camera motion, and the 3×n matrix is an estimate of the scene structure. It can be proven that W3 is the closest rank-3 approximation of D. 6/11/2013 Structure from Motion 18 Recalling the affine ambiguity The decomposition 0= U3 (W3) ½ P0 = (W3) ½ VT3 is not unique. For any 3×3 non singular matrix C = 0 C and P = C-1 P0 is also a solution (affine ambiguity). 6/11/2013 Geometric Camera Models 19 Affine camera models Summary of affine mappings p=M WP , where Note that in all cases the 3rd row is =(0 0 0 1)! 1 2 1 0 y T x T t t r r M 1 2 1 0 y T x T t t r r M 1 2 1 0 y T x T t t r r M 11 0sin 0cot 2 1 0 y T x T t t r r M orthographic - 5 dof scaled orthographic – 6 dof weak-perspective - 7 dof general affine - 8 dof 6/11/2013 Structure from Motion 20 Eliminating the affine ambiguity From a previous slide the image coordinates pij are the projections of Pij, on the vectors aTi1 and a T i2 , the rows of i. Consider scaled orthographic projection: image axes are perpendicular. jiij Pp 2211 21 2211 21 0 ˆˆˆˆ 0ˆˆ i TT ii TT i i TT i i T ii T i i T iaaaa aa aaaa aa CCCC CC 6/11/2013 Structure from Motion 21 Eliminating the affine ambiguity Scaled orthographic: image axes are perpendicular This translates into 2m quadratic equations in the elements of C Solve for C Update and P → = 0C and P = C-1 P0 Note that the solution is defined up to an arbitrary rotation. Common practice: → first camera frame ≡ world frame. p P i1aˆ 2 ˆ ia 2211 21 2211 21 0 ˆˆˆˆ 0ˆˆ i TT ii TT i i TT i i T ii T i i T i aaaa aa aaaa aa CCCC CC 6/11/2013 Structure from Motion 22 Tomasi-Kanade algorithm summary Given: m images and n features pij For each image i, center the feature coordinates Construct a 2m × n measurement matrix D: Column j has the projection of point Pj in all views Row i has one coordinate of the projections of all n points in image i Factorize D : Compute SVD: D = U W VT Create U3 by taking the first 3 columns of U Create V3 by taking the first 3 columns of V Create W3 by taking the upper left 3 × 3 block of W Create the motion and shape matrices: 0 = U3W3½ and P0 = W3½ V3T (or 0 = U3 and P0 = W3V3T) Eliminate affine ambiguity. Source: M. Hebert 6/11/2013 Structure from Motion 23 Reconstruction results Figures from C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. International Journal of Computer Vision, 9(2):137- 154, November 1992. 6/11/2013 Structure from Motion 24 Contents Introduction Affine SFM Projective SFM 6/11/2013 Structure from Motion 25 Projective model For projective cameras we have for i=1,…,m and j=1,…,n where mTi1, m T i2, m T i3, are the rows of a 3 ×4 matrix Mi and Pj denotes the homo- geneous coordinate vector of the point Pj in some world coordinate system. or 1 1 3 2 3 1 ji ji ij ji ji ij ji ij ij ij ij v u z v u Pm Pm Pm Pm Pp M 6/11/2013 Structure from Motion 26 Projective ambiguity According to Theorem 1 (see chapter on Camera Model) Mi is an arbitrary rank-3 3×4 matrix. Hence, if Mi and Pj are solutions for the equation above, so are M’i =Mi Q and P’j =Q-1 Pj , where Q is an arbitrary projective transformation matrix, i.e., any non singular 4×4 matrix. or 1 1 3 2 3 1 ji ji ij ji ji ij ji ij ij ij ij v u z v u Pm Pm Pm Pm Pp M 6/11/2013 Structure from Motion 27 Projective ambiguity The 4×4 matrix Q is defined up to a scale, since multiplying it by a nonzero scalar amounts to applying inverse scaling to Mi and Pj. Thus, the projective SFM problem can only be defined up to a projective transformation ambiguity. Taking into account the projective ambiguity the problem has a finite number of solutions as soon as 2mn ≥ 11m+3n-15 For two views, we need seven point correspondences. # of Mi # of Pj Bilinear projection SFM As in algebraic reconstruction, we do So, we look for a solution that minimizes This can be done by alternating steps where Pj are kept constant (estimated) while Mi are estimated (kept constant). 6/11/2013 Structure from Motion 28 zij pij = Mi Pj pij Mi Pj = 0 2 ij jiijE Pp M 6/11/2013 Structure from Motion 29 Bundle Adjustment Given estimates for the matrices Mi (i=1,…,m) and vectors Pj (j=1,…,n) , refine them by using non linear least square to minimize the global reprojection error E: Expensive, but encompasses a physically significant error measure. As initial guesses use Pj from Affine SFM e and from them compute Mi. ji ji ji ij ji ji ij vu mn E , 2 3 2 2 3 11 Pm Pm Pm Pm 6/11/2013 Structure from Motion 30 Eliminating the projective ambiguity Let us assume that Mi (i=1,…,m) and vectors Pj (j=1,…,n) have been estimated. If Mi and Pj are a reconstruction in a particular Euclidean coordinate system, there exist a 4×4 matrix Q such that The next slides show two methods of computing the Euclidean upgrade matrix Q, when (some of) the intrinsic parameters are known. iiii PP 1ˆ and ˆ QQMM ˆ ˆ 6/11/2013 Structure from Motion 31 Eliminating the projective ambiguity Method 1 (calibrated cameras): Mi (like Mi) is defined up to a scale, so where ρi accounts for the scale of Mi and Ki is the calibration matrix. Writing Q=(Q3 q4), where Q3 is 4×3 matrix and q4 is a vector in R4 , yields If the camera is calibrated (Ki is known), Mi Q3 are scaled rotation matrix, Thus …. )( iiiii t RKM 3 iiii RKQM ˆ ˆ 6/11/2013 Structure from Motion 32 Eliminating the projective ambiguity Method 1 (cont.): … the rows (j=1,2,3) of Mi are perpendicular vectors, so where Qi is defined up to a scale. To determine it uniquely we assume that the world coordinate system coincides with the first camera’s frame. T ijm 0 0 0 0 0 33332332 23321331 1333 3332 2331 T i TT i T i TT i T i TT i T i TT i T i TT i T i TT i T i TT i QQQQ QQQQ QQ QQ QQ mmmm mmmm mm mm mm 6/11/2013 Structure from Motion 33 Eliminating the projective ambiguity Method 1 (cont.): Given m images, we obtain 12 linear and 5(m-1) quadratic equations in the coefficients of Q that can be solved using non linear least squares. The vector q4 of matrix Q Q=(Q3 q4), can be determined by assuming (arbitrarily) that the origin of world coordinate system and the origin of the first camera’s frame coincide. 6/11/2013 Structure from Motion 34 Eliminating the projective ambiguity Method 2: The previous equation is linear in the coefficients of the symmetric matrix =Q3Q3T (linear least squares) To enforce the rank 3 of , we do =UΛUT, where Λ is the diagonal matrix with the eigenvalues of and U is the orthogonal matrix formed by its eigenvectors. 6/11/2013 Structure from Motion 35 Eliminating the projective ambiguity Method 2 (cont.): Computing the eigenvalues/eigenvectors of we get =UΛUT, where Λ is the diagonal matrix with the eigenvalues of , and U is the orthogonal matrix formed by its eigenvectors. To enforce rank 3 of we do Q=U3Λ3½ where U3 is formed by the 3 columns of U associated to the 3 largest eigenvalues of and Λ3 is the corresponding sub matrix of Λ.. Vector q4 can be computed as in the previous method. 6/11/2013 Structure from Motion 36 Eliminating the projective ambiguity These methods can be adapted to the case where only some of the intrinsiccamera parameters are known. Since Ri are orthogonal matrices each image provides a set of constraints between the entries of Ri and . 2 Tiii T ii KKMM 6/11/2013 Structure from Motion 37 Eliminating the projective ambiguity Assuming, for example that the center of the image is known for each camera, we can take u0=v0=0 and Each zero entry of provides 2 independent linear equations in the 10 coefficients of the 4×4 matrix . With m ≥ 5 images, these parameters can be estimated via linear least squares. What if zero skew can be assumed? T i i KK 0 0 32 31 i T i i T i mm mm 100 0 sin 1 sin cos 0 sin cos sin 1 2 2 2 2 2 2 2 2 i i i i ii i i ii i i T i i KK 6/11/2013 Structure from Motion 38 Readings C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. International Journal of Computer Vision, 9(2):137-154, November 1992. F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce. Segmenting, Modeling, and Matching Video Clips Containing Multiple Moving Objects. PAMI 2007. Muhamad, S. and Hebert, M. (2000), Iterative Projective Reconstruction from Multiple Views, Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2000, pp. II, 430-437. 6/11/2013 Structure from Motion 39 Next Topic Up to you!
Compartilhar