Camera Calibration and Epipolar Geometry
CS5330, Huaizu Jiang
Fall 2022, Northeastern University
Announcement
Grades of PA2
Reach out to us if you believe your grade is incorrect
Piazza is preferred
Email is also fine
Using Teams or comments on Canvas may result in significant delay
Recap
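Batch Normalization
Input: $x$, shape is N x D
Per-channel mean, shape is D: $\mu_j = \frac{1}{N}\sum_{i=1}^{N} x_{i,j}$
Per-channel std, shape is D: $\sigma_j = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_{i,j} - \mu_j)^2}$
Normalized x, shape is N x D: $\hat{x}_{i,j} = \frac{x_{i,j} - \mu_j}{\sqrt{\sigma_j^2 + \varepsilon}}$
Learnable scale and shift parameters: $\gamma, \beta$, each of shape D
Output, shape is N x D: $y_{i,j} = \gamma_j \hat{x}_{i,j} + \beta_j$
Learning $\gamma = \sigma$, $\beta = \mu$ will recover the identity function (in expectation)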
Slide credit: D Fouhey & J Johnson
Don’t/Can’t treat the mean and std as constants in PA4: they are functions of x, so gradients flow through them
 Derivation: https://www.adityaagrawal.net/blog/deep_learning/bprop_batch_norm
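A minimal NumPy sketch of the training-mode forward pass may help fix the shapes. The function name `batchnorm_forward` and the `eps` value are assumptions for illustration, not part of the slides or of PA4.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization for an input of shape (N, D)."""
    mu = x.mean(axis=0)                    # per-channel mean, shape (D,)
    var = x.var(axis=0)                    # per-channel variance, shape (D,)
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalized x, shape (N, D)
    return gamma * x_hat + beta            # scaled and shifted output, shape (N, D)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 4))

# With gamma = 1 and beta = 0, every channel has roughly zero mean and unit std.
y = batchnorm_forward(x, gamma=np.ones(4), beta=np.zeros(4))

# Choosing gamma = std and beta = mean undoes the normalization.
y_identity = batchnorm_forward(x, np.sqrt(x.var(axis=0) + 1e-5), x.mean(axis=0))
```

Because `mu` and `var` are computed from `x`, a backward pass has to differentiate through them as well.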
Regularization: Dropout
“randomly set some neurons to zero in the forward pass”
[Srivastava et al., 2014]
Slide credit: E. Learned-Miller, originally developed by Andrej Karpathy, Justin Johnson, Fei-Fei Li
Randomly set the output value of some neurons to be 0
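As a rough sketch, dropout can be written in a few lines of NumPy. The rescaling by `1 / (1 - p_drop)` ("inverted dropout") is a common convention assumed here, not something stated on the slide; `dropout_forward` is a hypothetical name.

```python
import numpy as np

def dropout_forward(x, p_drop, rng, train=True):
    """Zero each activation with probability p_drop during training."""
    if not train:
        return x  # at test time all neurons are kept
    mask = rng.random(x.shape) >= p_drop  # True for neurons that survive
    # Rescale survivors so the expected activation matches test time (assumed convention).
    return x * mask / (1.0 - p_drop)

rng = np.random.default_rng(0)
x = np.ones((1000, 100))
train_out = dropout_forward(x, p_drop=0.5, rng=rng, train=True)
test_out = dropout_forward(x, p_drop=0.5, rng=rng, train=False)
```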
VGG: Deeper Networks, Regular Design
[Figure: layer-by-layer diagrams of AlexNet, VGG16, and VGG19]
VGG Design rules:
All conv are 3x3 stride 1 pad 1
All max pool are 2x2 stride 2
After pool, double #channels
Simonyan and Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition”, ICLR 2015
Slide credit: D Fouhey & J Johnson
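The design rules can be checked with a small shape calculation. This is a sketch only: the 224 x 224 input and the per-stage conv counts (2, 2, 3, 3, 3 for VGG16) are assumptions taken from the usual VGG16 configuration, and `vgg_shapes` is a hypothetical helper.

```python
def vgg_shapes(in_shape, stages):
    """Track (channels, height, width) through VGG-style stages.

    stages is a list of (num_convs, out_channels); each stage ends with a pool.
    """
    c, h, w = in_shape
    shapes = []
    for num_convs, out_channels in stages:
        for _ in range(num_convs):
            # 3x3 conv, stride 1, pad 1: spatial size is unchanged
            h = (h + 2 * 1 - 3) // 1 + 1
            w = (w + 2 * 1 - 3) // 1 + 1
            c = out_channels
        # 2x2 max pool, stride 2: spatial size is halved
        h = (h - 2) // 2 + 1
        w = (w - 2) // 2 + 1
        shapes.append((c, h, w))
    return shapes

vgg16_stages = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]
shapes = vgg_shapes((3, 224, 224), vgg16_stages)
for stage, shape in enumerate(shapes, start=1):
    print(f"after stage {stage}: {shape}")
```

Each stage halves the resolution, and the channel count doubles until it reaches 512.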
Residual Networks
Once we have Batch Normalization, we can train networks with 10+ layers. What happens as we go deeper?
[Figure: training error and test error vs. iterations for a 20-layer and a 56-layer network]
The deeper model is actually underfitting: it performs worse than the shallow model on the training set as well as on the test set.
He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016
Slide credit: D Fouhey & J Johnson
Residual Networks
[Figure: ResNet layer diagram (Input; 7x7 conv, 64, / 2; Pool; stages of 3x3 conv with 64, 128, ..., 512 channels; Pool; FC 1000; Softmax) and a residual block: X -> 3x3 conv -> relu -> 3x3 conv gives F(x); the block outputs relu(F(x) + x)]
A residual network is a stack of many residual blocks
Regular design, like VGG: each residual block has two 3x3 conv
Network is divided into stages: the first block of each stage halves the resolution (with stride-2 conv) and doubles the number of channels
He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016
Adapted from: D Fouhey & J Johnson
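The F(x) + x structure can be sketched without a deep learning framework. In the sketch below the two 3x3 convs are replaced by plain matrix multiplications, which is a simplifying assumption made only to keep the example self-contained; `residual_block` is a hypothetical name.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2):
    """relu(F(x) + x), with F = linear -> relu -> linear standing in for the two convs."""
    f = relu(x @ w1) @ w2  # F(x)
    return relu(f + x)     # add the shortcut, then apply relu

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))
w1 = rng.normal(size=(16, 16)) * 0.1
w2 = rng.normal(size=(16, 16)) * 0.1
out = residual_block(x, w1, w2)

# If the weights of F are zero, F(x) = 0 and the block reduces to relu(x).
zero = np.zeros((16, 16))
out_identity = residual_block(x, zero, zero)
```

The zero-weight case illustrates why extra blocks need not hurt: a block can fall back to passing its input through.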
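[Figure: residual block at the start of a stage: X -> 3x3 conv, stride 2 -> relu -> 3x3 conv gives F(x); the shortcut applies a 1x1 conv, stride 2, to X before the addition, followed by relu]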
Today’s Class
Training of CNNs
Camera calibration
Group-based Convolution
Input: Cin x H x W
Hyperparameters:
Kernel size: KH x KW
Number filters: Cout 
Padding: P
Stride: S
Groups: G
Weight tensor: G x (Cout/G) x (Cin/G) x KH x KW
Bias vector: Cout (Cout/G per group)
FLOPs: (Cout/G) x (Cin/G) x KH x KW x G x H x W (for an H x W output)
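To make the counts concrete, here is a small sketch that evaluates the weight and FLOP formulas for a dense and a grouped layer. The layer sizes are made-up example values and `grouped_conv_cost` is a hypothetical helper; FLOPs are counted as multiply-adds over an assumed output size.

```python
def grouped_conv_cost(c_in, c_out, kh, kw, groups, h_out, w_out):
    """Return (number of weights, multiply-adds) for a grouped convolution."""
    assert c_in % groups == 0 and c_out % groups == 0
    # Each of the G groups maps Cin/G input channels to Cout/G output channels.
    weights = groups * (c_out // groups) * (c_in // groups) * kh * kw
    flops = weights * h_out * w_out  # one multiply-add per weight per output position
    return weights, flops

dense = grouped_conv_cost(64, 128, 3, 3, groups=1, h_out=56, w_out=56)
grouped = grouped_conv_cost(64, 128, 3, 3, groups=4, h_out=56, w_out=56)
print("dense:  ", dense)
print("grouped:", grouped)
```

With G groups, both the weight count and the FLOPs drop by a factor of G relative to the dense layer.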
MobileNet
Standard 3x3 conv (C -> C channels): 9C^2HW
Depthwise 3x3 conv (Channel: C, Groups: C): 9CHW
Pointwise (1x1) Conv: C^2HW
Computation reduction: 9C^2HW / (9CHW + C^2HW) = 9C / (9 + C)
[Howard et al., MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017]
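The reduction factor can be verified numerically. The channel count and feature-map size below are arbitrary example values, and the function names are hypothetical.

```python
def standard_conv_flops(c, h, w, k=3):
    """k x k conv mapping C channels to C channels."""
    return k * k * c * c * h * w

def depthwise_separable_flops(c, h, w, k=3):
    depthwise = k * k * c * h * w  # one k x k filter per channel (groups = C)
    pointwise = c * c * h * w      # 1x1 conv mixing channels
    return depthwise + pointwise

c, h, w = 256, 14, 14
ratio = standard_conv_flops(c, h, w) / depthwise_separable_flops(c, h, w)
closed_form = 9 * c / (9 + c)
print(ratio, closed_form)
```

For large C the ratio approaches 9, so the 3x3 depthwise separable layer is close to nine times cheaper.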
ShuffleNet
[Zhang et al., ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. CVPR 2018]
ShuffleNet Units
[Zhang et al., ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. CVPR 2018]
Readings
AlexNet: https://papers.nips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf
VGG: https://arxiv.org/abs/1409.1556
ResNet: https://arxiv.org/abs/1512.03385
MobileNets: https://arxiv.org/abs/1704.04861
ShuffleNet: https://arxiv.org/abs/1707.01083
Training Convolutional Networks
 Download big datasets
 Design CNN architecture
 Initialize Weights
 For t = 1 to T:
 Form minibatch
 Compute loss + gradient
 Update Weights
 Apply trained model to task
Slide credit: D Fouhey & J Johnson
Weight Initialization: Activation Statistics
Forward pass for a 6-layer net with hidden size 4096
Scale of weights at initialization: 0.01
All activations tend to zero for deeper network layers
Q: What do the gradients dL/dW look like?
A: All zero, no learning =(
Weights are too small at initialization!
Slide credit: D Fouhey & J Johnson
Weight Initialization: Activation Statistics
Increase scale of weights at initialization 0.01 -> 0.05
All activations saturate
Q: What do the gradients look like?
A: Local gradients all zero, no learning =(
Weights are too big at initialization!
Slide credit: D Fouhey & J Johnson
Weight Initialization: Xavier
“Xavier” initialization: std = 1 / sqrt(Din)
For conv layers, Din is kernel_size^2 * input_channels
Weights are just right at initialization!
Glorot and Bengio, “Understanding the difficulty of training deep feedforward neural networks”, AISTATS 2010
Slide credit: D Fouhey & J Johnson
Weight Initialization: MSRA
”Just right” – activations nicely scaled for all layers
He et al, “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification”, ICCV 2015
For ReLU networks: std = sqrt(2 / Din)
Weights are just right at initialization!
Slide credit: D Fouhey & J Johnson
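The activation-statistics experiment is easy to reproduce in NumPy. The sketch below assumes a tanh nonlinearity (the slides only say the activations "saturate") and uses a hidden size of 1024 rather than 4096 to keep it quick; `activation_stds` is a hypothetical helper.

```python
import numpy as np

def activation_stds(weight_std_fn, hidden=1024, layers=6, batch=64, seed=0):
    """Std of the activations after each layer of a tanh network (tanh is an assumption)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(batch, hidden))
    stds = []
    for _ in range(layers):
        w = rng.normal(size=(hidden, hidden)) * weight_std_fn(hidden)
        x = np.tanh(x @ w)
        stds.append(float(x.std()))
    return stds

small = activation_stds(lambda d_in: 0.01)                  # too small: activations shrink
large = activation_stds(lambda d_in: 0.05)                  # larger scale: pushed toward +/-1
xavier = activation_stds(lambda d_in: 1.0 / np.sqrt(d_in))  # std = 1 / sqrt(Din)

for name, stds in [("0.01", small), ("0.05", large), ("xavier", xavier)]:
    print(name, [round(s, 4) for s in stds])
```

For a ReLU network, the same harness can be rerun with `np.maximum(x @ w, 0)` in place of `np.tanh` and `np.sqrt(2.0 / d_in)` as the scale.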
Training Convolutional Networks
 Download big datasets
 Design CNN architecture
 Initialize Weights
 For t = 1 to T:
 Form minibatch
 Compute loss + gradient
 Update Weights
 Apply trained model to task
If the model is big, won’t we overfit?
Slide credit: D Fouhey & J Johnson
Regularizing CNNs: Weight Decay
Add L2 regularization term to the loss penalizing large weight matrices
Usually don’t regularize bias terms, or BatchNorm scale / shift params
*Technical note: Adding an explicit term to the loss is “L2 Regularization”; “Weight decay” adds a term to the gradient. They are equivalent for SGD, but not quite the same for other optimizers like Adam
Slide credit: D Fouhey & J Johnson
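The equivalence for plain SGD can be checked on a single update step. The sketch assumes the L2 term is written as (lam / 2) * ||w||^2, so that its gradient is lam * w; both function names are hypothetical.

```python
import numpy as np

def sgd_l2_step(w, grad_loss, lr, lam):
    """SGD on loss + (lam / 2) * ||w||^2: the penalty contributes lam * w to the gradient."""
    return w - lr * (grad_loss + lam * w)

def sgd_weight_decay_step(w, grad_loss, lr, lam):
    """Shrink the weights directly, then take the usual gradient step."""
    return (1.0 - lr * lam) * w - lr * grad_loss

rng = np.random.default_rng(0)
w = rng.normal(size=5)
g = rng.normal(size=5)  # stand-in for the gradient of the data loss
w_l2 = sgd_l2_step(w, g, lr=0.1, lam=0.01)
w_wd = sgd_weight_decay_step(w, g, lr=0.1, lam=0.01)
print(np.allclose(w_l2, w_wd))
```

With an adaptive optimizer the penalty gradient would be rescaled along with the loss gradient, which is why the two forms stop being identical there.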
Regularizing CNNs: Data Augmentation
Horizontal Flip
Color Jitter
Image Cropping
[Figure: an image labeled “Hippo” and three transformed versions, each labeled “Hippo?”]
Slide credit: D Fouhey & J Johnson
Regularizing CNNs: Data Augmentation
Apply random transformations to input images during training
Artificially “inflate” the size of your dataset
[Figure: a training image labeled “Hippo” and augmented images with the same label]
Slide credit: D Fouhey & J Johnson
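A minimal sketch of two of these transformations, a random horizontal flip followed by a random crop, is shown below. The image size, crop size, and flip probability are arbitrary assumptions, and `random_flip_and_crop` is a hypothetical name.

```python
import numpy as np

def random_flip_and_crop(image, crop_size, rng):
    """Randomly flip an (H, W, C) image left-right, then take a random square crop."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]  # horizontal flip
    h, w, _ = image.shape
    top = rng.integers(0, h - crop_size + 1)   # upper bound is exclusive
    left = rng.integers(0, w - crop_size + 1)
    return image[top:top + crop_size, left:left + crop_size, :]

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
augmented = random_flip_and_crop(image, crop_size=24, rng=rng)
print(augmented.shape)
```

Calling the function anew for each minibatch gives the network a slightly different view of the same labeled image every epoch.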
Training Convolutional Networks
 Download big datasets
 Design CNN architecture
 Initialize Weights
 For t = 1 to T:
 Form minibatch
 Compute loss + gradient
 Update Weights
 Apply trained model to task
What if we can’t find one?
Slide credit: D Fouhey & J Johnson
Transfer Learning: Feature Extraction
Donahue et al, “DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition”, ICML 2014
[Figure: VGG-style network: Image, 2x Conv-64, MaxPool, 2x Conv-128, MaxPool, 2x Conv-256, MaxPool, 2x Conv-512, MaxPool, 2x Conv-512, MaxPool, FC-4096, FC-4096, FC-1000]
1. Train on ImageNet
Slide credit: D Fouhey & J Johnson
Transfer Learning: Feature Extraction
Donahue et al, “DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition”, ICML 2014
Use your small dataset to train a linear classifier on top of pretrained CNN features
[Figure: the ImageNet-trained network next to a copy with the last layer (FC-1000) removed and the remaining layers frozen]
1. Train on ImageNet
2. CNN as feature extractor: remove last layer, freeze the rest
Slide credit: D Fouhey & J Johnson
Transfer Learning: Fine-Tuning
Reinitialize last layer and continue training whole network on your dataset
[Figure: three copies of the network, one per step below; in step 3 a new FC layer replaces FC-1000]
1. Train on ImageNet
2. CNN as feature extractor: remove last layer, freeze the rest
3. Bigger dataset: Fine-Tuning
Slide credit: D Fouhey & J Johnson
Transfer Learning: Fine-Tuning
Some tricks:
Train with feature extraction first before fine-tuning
Lower the learning rate: use ~1/10 of LR used in original training
Sometimes freeze lower layers to save computation
Reinitialize last layer and continue training whole network on your dataset
[Figure: same three-step diagram as on the previous slide: 1. Train on ImageNet, 2. CNN as feature extractor, 3. Bigger dataset: Fine-Tuning]
Slide credit: D Fouhey & J Johnson
Recap: Convolutional Networks
Convolution Layers
Pooling Layers
Fully-Connected Layers
Activation Function
Normalization
Slide credit: D Fouhey & J Johnson
Recap: CNN Architectures
[Chart: Error Rate by year]
2010: Lin et al, shallow, 28.2
2011: Sanchez & Perronnin, shallow, 25.8
2012: Krizhevsky et al (AlexNet), 8 layers, 16.4
2013: Zeiler & Fergus, 8 layers, 11.7
2014: Simonyan & Zisserman (VGG), 19 layers, 7.3
2014: Szegedy et al (GoogLeNet), 22 layers, 6.7
2015: He et al (ResNet), 152 layers, 3.6
2016: Shao et al, 152 layers, 3.0
2017: Hu et al (SENet), 152 layers, 2.3
Human: Russakovsky et al, 5.1
Slide credit: D Fouhey & J Johnson
Recap: Training CNNs
 Download big datasets
 Design CNN architecture
 Initialize Weights
 For t = 1 to T:
 Form minibatch
 Compute loss + gradient
 Update Weights
 Apply trained model to task
Transfer Learning
Xavier / MSRA Init
Regularization + Data Augmentation
Slide credit: D Fouhey & J Johnson
Perspective & 3D Geometry
Our goal: Recovery of 3D structure
J. Vermeer, Music Lesson, 1662
A. Criminisi, M. Kemp, and A. Zisserman,Bringing Pictorial Space to Life: computer techniques for the analysis of paintings, Proc. Computers and the History of Art, 2002
Slide credit: S Lazebnik
Single-view ambiguity
x
X?
X?
X?
Slide credit: S Lazebnik
Things aren’t always as they appear…
http://en.wikipedia.org/wiki/Ames_room
Slide credit: S Lazebnik
Single-view ambiguity
Slide credit: S Lazebnik
Single-view ambiguity
Rashad Alakbarov shadow sculptures
Slide credit: S Lazebnik
Anamorphic perspective
Image source
Slide credit: S Lazebnik
Resolving Single-view Ambiguity
Shoot light (lasers etc.) out of your eyes!
Con: not so biologically plausible, dangerous?
Slide credit: D Fouhey & J Johnson
Resolving Single-view Ambiguity
Stereo: given 2 calibrated cameras in different views and correspondences, can solve for X
[Figure: 3D point X and its image points x in the two cameras]
Original diagram credit: S. Lazebnik
Slide credit: D Fouhey & J Johnson
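One standard way to "solve for X" from two calibrated views is linear triangulation, sketched below. This is an illustration under assumed camera parameters, not necessarily the method used later in the course; `triangulate` and `project` are hypothetical helpers.

```python
import numpy as np

def project(P, X):
    """Project a 3D point with a 3x4 camera matrix and return pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(P1, P2, x1, x2):
    """Linear triangulation: each view contributes two linear equations in X."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]            # homogeneous solution: right singular vector of the smallest singular value
    return X[:3] / X[3]

# Assumed example cameras: same intrinsics, second camera shifted along x.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.3, -0.2, 5.0])
X_est = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
print(X_est)
```

With noisy correspondences the two rays no longer intersect exactly, and the same least-squares solve returns a compromise point.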
http://www.well.com/~jimg/stereo/stereo_list.html
Slide credit: J. Hays
Autostereograms
Exploit disparity as depth cue using single image.
(Single image random dot stereogram, Single image stereogram)
Slide credit: J. Hays, Images from magiceye.com
Multi-view geometry problems
[Figure: Camera 1 with unknown intrinsics K]
Slide credit: Noah Snavely
Calibration:
Figure out intrinsics of camera (K).
We need camera intrinsics / K in order to figure out where the rays are
Structure from motion solves the following problem:
Given a set of images of a static scene with 2D points in correspondence (shown in the figure as color-coded points), find:
a set of 3D points P, and
a rotation R and position t for each camera that explain the observed correspondences. In other words, when we project a point into any of the cameras, the reprojection error between the projected and observed 2D points is low.
This can be formulated as an optimization problem: find the rotations R, positions t, and 3D point locations P that minimize the sum of squared reprojection errors f. This is a non-linear least squares problem and can be solved with algorithms such as Levenberg-Marquardt. Because the problem is non-linear, it is susceptible to local minima, so it is important to initialize the parameters carefully. In addition, we need to be able to deal with erroneous correspondences.
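The objective for a single camera can be written down directly. The sketch below assumes known intrinsics K and the convention x ~ K (R X + t); both are assumptions for illustration, since the notes only name R, t, and P. The function names are hypothetical.

```python
import numpy as np

def project_points(K, R, t, points_3d):
    """Project (N, 3) world points into one camera, assuming x ~ K (R X + t)."""
    cam = points_3d @ R.T + t        # points in camera coordinates
    proj = cam @ K.T                 # homogeneous pixel coordinates
    return proj[:, :2] / proj[:, 2:3]

def reprojection_error(K, R, t, points_3d, observed_2d):
    """Sum of squared distances between projected and observed 2D points."""
    residuals = project_points(K, R, t, points_3d) - observed_2d
    return float(np.sum(residuals ** 2))

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])
points_3d = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 1], [-1, 0.5, 2]], dtype=float)

observed = project_points(K, R, t, points_3d)
err_exact = reprojection_error(K, R, t, points_3d, observed)
# Shifting every observation by one pixel in u adds 1 to each squared residual.
err_shifted = reprojection_error(K, R, t, points_3d, observed + np.array([1.0, 0.0]))
print(err_exact, err_shifted)
```

A full structure-from-motion solver would sum this quantity over all cameras and hand the residuals to a non-linear least squares routine.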
Multi-view geometry problems
[Figure: Cameras 1-3 with known poses (R1,t1), (R2,t2), (R3,t3); the 3D point is unknown]
Recovering structure:
Given cameras and correspondences, find 3D.
Slide credit: Noah Snavely
Multi-view geometry problems
[Figure: Cameras 1-3 with poses (R1,t1), (R2,t2), (R3,t3)]
Stereo/Epipolar Geometry:
Given 2 cameras, find where a point could be
Slide credit: Noah Snavely
Multi-view geometry problems
[Figure: Cameras 1-3 with unknown poses (R1,t1), (R2,t2), (R3,t3)]
Motion:
Figure out R, t for a set of cameras given correspondences
Slide credit: Noah Snavely
Outline
(Today) Calibration: 
Getting intrinsic matrix/K
(Today, next week) Epipolar geometry: 
2 pictures -> depth map
(Next week) Stereo
Rectified 2 pictures -> depth map
Structure from motion (SfM): 
2+ pictures -> cameras, pointcloud
Optical flow (we may have a guest lecture here)
Finding correspondences in 2 pictures (similar to stereo)
Slide inspired by: D Fouhey & J Johnson
Our goal: Recovery of 3D structure
When certain assumptions hold, we can recover structure from a single view
In general, we need multi-view geometry
Image source
But first, we need to understand the geometry of a single camera…
Slide credit: S Lazebnik
Projection matrix
$x = K [R \; t] X$
x: Image Coordinates: (u,v,1)
K: Intrinsic Matrix (3x3)
R: Rotation (3x3)
t: Translation (3x1)
X: World Coordinates: (X,Y,Z,1)
[Figure: camera with optical center, principal point, and image plane center; world frame with origin Ow and axes iw, jw, kw, related to the camera by R,t]
Typical Perspective Model
Reading $x = K [R \; t] X$ term by term:
x: 2D projection of X
K: focal length and principal point (image coords of camera origin on retina)
[R t]: rotation and translation; just moves camera origin
X: 3D point
Slide credit: D Fouhey & J Johnson
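The projection x = K [R t] X, with the shapes listed above, can be checked on one point. All numeric values below (focal length, principal point, pose) are made-up example values.

```python
import numpy as np

# Assumed intrinsics: focal length 800 pixels, principal point (320, 240), no skew.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                        # rotation (3x3)
t = np.array([[0.0], [0.0], [4.0]])  # translation (3x1)

M = K @ np.hstack([R, t])            # 3x4 projection matrix
X = np.array([1.0, 0.5, 0.0, 1.0])   # world point in homogeneous coordinates

x_h = M @ X                          # homogeneous image point, defined up to scale
u, v = x_h[:2] / x_h[2]              # divide by the last coordinate to get pixels
print(u, v)
```

The division by the third coordinate is what makes the mapping a perspective projection rather than a linear one.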
Notes on the Intrinsic Matrix
Scale factor in x: how many pixels per unit in the x direction on the sensor
Skew factor: exists if the x and y axes are not perpendicular
Principal point offset: offset from the image plane center
Radial Distortion in Wide-angle Lenses
Slide adapted from James Hays
Reading: Section 2.1.5, Computer Vision: Algorithms and Applications, 2nd Edition
Camera Calibration
Pairs of [X,Y,Z] and [u,v] → eqns to constrain M
How do I get [X,Y,Z], [u,v]?
Slide credit: D Fouhey & J Johnson
Camera Calibration Targets
Using a tape measure
Known 3d locations (X Y Z):
312.747 309.140 30.086
305.796 311.649 30.356
307.694 312.358 30.418
310.149 307.186 29.298
311.937 310.105 29.216
311.202 307.572 30.682
307.106 306.876 28.660
309.317 312.490 30.230
307.435 310.151 29.318
308.253 306.300 28.881
306.650 309.301 28.905
308.069 306.831 29.189
309.671 308.834 29.029
308.255 309.955 29.267
307.546 308.613 28.963
311.036 309.206 28.913
307.518 308.175 29.069
309.950 311.262 29.990
312.160 310.772 29.080
311.988 312.709 30.514
Known 2d image coords (u v):
880 214
43 203
270 197
886 347
745 302
943 128
476 590
419 214
317 335
783 521
235 427
665 429
655 362
427 333
412 415
746 351
434 415
525 234
716 308
602 187
Image credit: J. Hays
Slide credit: D Fouhey & J Johnson
Camera Calibration Targets
…
A set of views of a plane (not covered here)
Slide credit: D Fouhey & J Johnson
Camera Calibration Targets
A single, huge plane. What’s this for?
Slide credit: D Fouhey & J Johnson
Camera calibration
Given n points with known 3D coordinates Xi and known image projections pi, estimate the camera parameters
[Figure: 3D points Xi and their image projections pi]
Slide credit: S. Lazebnik
Camera Calibration: Linear Method
$p_i \equiv M X_i$
Remember (from geometry): this implies $M X_i$ and $p_i$ are proportional/scaled copies of each other, $p_i = \lambda M X_i$ with $\lambda \neq 0$
Remember (from homography fitting): this implies their cross product is 0, $p_i \times M X_i = 0$
Slide credit: D Fouhey & J Johnson
Camera Calibration: Linear Method
…Some tedious math occurs…
(see Homography derivation)
Slide credit: D Fouhey & J Johnson
Camera Calibration: Linear Method
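The cross product gives 3 equations per pair. How many are linearly independent? 2
How many equations per [u,v] + [X,Y,Z] pair? 2
If M is 3x4, how many degrees of freedom? 11 (12 entries, defined up to scale)
So at least 6 pairs are needed.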
Slide credit: D Fouhey & J Johnson
Camera Calibration: Linear Method
Derivation from S. Lazebnik; note we negate one of the equations from the cross product
How do we solve problems of the form $\min_{m} \|A m\|^2$ subject to $\|m\| = 1$, where $m$ stacks the 12 entries of M?
Eigenvector of $A^T A$ with smallest eigenvalue
Slide credit: D Fouhey & J Johnson
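A compact sketch of the linear method on synthetic data is given below. It builds two equations per correspondence and takes the right singular vector of A with the smallest singular value, which is the same vector as the eigenvector of A^T A with the smallest eigenvalue; the SVD is used here only because it is numerically better behaved. The camera parameters and `calibrate_dlt` are assumptions for illustration.

```python
import numpy as np

def calibrate_dlt(points_3d, points_2d):
    """Estimate the 3x4 matrix M (up to scale) from n >= 6 correspondences."""
    rows = []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        Xh = np.array([X, Y, Z, 1.0])
        zero = np.zeros(4)
        rows.append(np.concatenate([Xh, zero, -u * Xh]))  # m1.X - u * m3.X = 0
        rows.append(np.concatenate([zero, Xh, -v * Xh]))  # m2.X - v * m3.X = 0
    A = np.array(rows)                                    # shape (2n, 12)
    _, _, vt = np.linalg.svd(A)
    return vt[-1].reshape(3, 4)                           # unit-norm minimizer of ||A m||

# Synthetic ground truth (assumed values) with non-coplanar 3D points.
rng = np.random.default_rng(0)
pts3d = rng.uniform(-1.0, 1.0, size=(8, 3))
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
theta = 0.2
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.1, -0.2, 5.0])
M_true = K @ np.column_stack([R, t])

pts_h = np.column_stack([pts3d, np.ones(len(pts3d))])
proj = pts_h @ M_true.T
pts2d = proj[:, :2] / proj[:, 2:3]

M_est = calibrate_dlt(pts3d, pts2d)
reproj = pts_h @ M_est.T
reproj = reproj[:, :2] / reproj[:, 2:3]
print(np.abs(reproj - pts2d).max())
```

M_est matches M_true only up to a scale factor (possibly negative), so the comparison is made through reprojected pixel coordinates rather than matrix entries.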
Camera calibration: Linear method
Note: for coplanar points that satisfy $\Pi^T X = 0$, we will get degenerate solutions $(\Pi, 0, 0)$, $(0, \Pi, 0)$, or $(0, 0, \Pi)$
Slide credit: S. Lazebnik
Camera calibration: Linear vs. nonlinear
Linear calibration is easy to formulate and solve, but it doesn’t directly tell us the camera parameters
In practice, non-linear methods are preferred
Write down objective function in terms of intrinsic and extrinsic parameters
Define error as sum of squared distances between measured 2D points and estimated projections of 3D points
Minimize error using Newton’s method or other non-linear optimization
Can model radial distortion and impose constraints such as known focal length and orthogonality
$$\begin{bmatrix} \lambda x \\ \lambda y \\ \lambda \end{bmatrix} = \begin{bmatrix} * & * & * & * \\ * & * & * & * \\ * & * & * & * \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \quad \text{vs.} \quad x = K [R \; t] X$$
Slide credit: S. Lazebnik