Lecture5 - multivariate regression (rabiscado)

Jose Mario Chizzotti

on 05/03/2026

Econometrics I: Multivariate Regressions
Bruno Ferman Econometrics I: Multivariate Regression
 
Multivariate OLS
Very similar to what we have seen before... The difference
is that now we have more regressors...
$Y_i = b_1 + b_2 X_{2i} + \dots + b_K X_{Ki} + u_i$ (1)
As we will see, most of the intuition we saw on bivariate
regression carries through.
In particular, the distinction between correlation and
causality...
Note: we can think of X1i as a constant covariate. Recall this is just
notation...
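A minimal sketch of what fitting a model like (1) can look like in practice, assuming NumPy is available. The data-generating process, the coefficient values, and the variable names (x2, x3, b_true) are hypothetical and chosen only for illustration; the column of ones plays the role of the constant covariate X1i.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical population with K = 3 regressors; the first one is the constant.
x2 = rng.normal(size=n)
x3 = 0.5 * x2 + rng.normal(size=n)     # regressors are allowed to be correlated
u = rng.normal(size=n)                 # error term, independent of the regressors
b_true = np.array([1.0, 2.0, -0.5])    # (b1, b2, b3), illustrative values only

X = np.column_stack([np.ones(n), x2, x3])  # column of ones = constant covariate X1
y = X @ b_true + u

# Least-squares fit of y on (1, x2, x3)
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b_hat)
```

With a sample this large, the printed estimates should be close to the illustrative values in b_true.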
Regression Terminology
$Y_i = b_1 + b_2 X_{2i} + \dots + b_K X_{Ki} + u_i$ (2)
Terminology:
Yi is the dependent variable
Xki are the independent variables
b1 + b2X2 + ... + bKXK is the population regression line
b1 is the intercept of the population regression line
bk is the slope on the k-th variable of the population regression
line.
ui is the error term
Alternatively, we can write:
$Y_i = x_i' b + u_i$ (3)
where $x_i \in \mathbb{R}^K$ and $b \in \mathbb{R}^K$
The OLS Population Parameter
As before, we start by defining the population parameters b
such that:
$E[x_i u_i] = E[x_i (Y_i - x_i' b)] = 0$ (4)
As before, this is the FOC of $\min_b E[(Y_i - x_i' b)^2]$.
Important: this b is only uniquely identified if $E[x_i x_i']$ is
invertible.
What does that mean?
We will show later that the OLS estimator is a “good”
estimator of b (defined this way).
Before that, note that this b may be of interest for exactly
the same reasons as in the bivariate case.
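As a worked step, condition (4) can be solved for b explicitly. The sketch below only rearranges the moment condition; the last step is where invertibility of E[x_i x_i'] is needed.

```latex
\begin{align*}
E[x_i (Y_i - x_i' b)] = 0
  &\iff E[x_i Y_i] = E[x_i x_i']\, b \\
  &\iff b = \left( E[x_i x_i'] \right)^{-1} E[x_i Y_i]
  \quad \text{(if } E[x_i x_i'] \text{ is invertible)}.
\end{align*}
```

If E[x_i x_i'] is singular, the first line still holds but has more than one solution, which is why b is not uniquely identified in that case.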
The OLS Population Parameter
Theorem
The Linear CEF Theorem
Suppose the CEF is linear. Then the population regression
function is the CEF.
Important result!
The CEF is the minimum mean square error predictor of Yi
given Xi (considering any function m(Xi)).
The CEF summarizes the relationship between variables Xi
and Yi.
The OLS Population Parameter
Theorem
The Best Linear Predictor Theorem
The function $x_i' b$ is the best linear predictor of Yi given xi in the
MMSE sense.
The OLS Population Parameter
Theorem
The Regression CEF Theorem
The function $x_i' b$ is the best linear approximation to $E[Y_i | x_i]$.
That is:
$b = \arg\min_{c} E[(E[Y_i | x_i] - x_i' c)^2]$
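A small numerical illustration of the Regression CEF Theorem, under assumptions made up for this sketch: the regressor is discrete (so the sample analog of the CEF is just the mean of y within each value of x) and the true CEF is deliberately nonlinear. Regressing y on x and regressing the cell means on x should give the same coefficients.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

x = rng.integers(0, 5, size=n)             # discrete regressor with values 0..4
y = np.exp(x / 2) + rng.normal(size=n)     # nonlinear CEF, hypothetical example

X = np.column_stack([np.ones(n), x])

# Sample analog of E[Y | x]: the mean of y within each value of x
cell_means = np.array([y[x == v].mean() for v in range(5)])
cef = cell_means[x]

# Linear fit to y itself, and linear fit to the (estimated) CEF
b_from_y, *_ = np.linalg.lstsq(X, y, rcond=None)
b_from_cef, *_ = np.linalg.lstsq(X, cef, rcond=None)
print(b_from_y, b_from_cef)
```

The two coefficient vectors coincide because the deviation of y from its cell mean is orthogonal, in the sample, to any function of x.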
The OLS Population Parameter
Note that the linearity imposes restrictions on the
approximation to the CEF.
Example: Yi is earnings, and xi includes education and
gender.
If we can “saturate” the model, then we can guarantee that
the CEF is linear.
Example.
Note that, in this case, $E[u_i | x_i] = 0$.
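One way to see what "saturating" the model means is a sketch with two binary covariates, assuming a hypothetical college dummy in place of years of education. With a constant, both dummies, and their interaction there is one parameter per cell, so the fitted regression reproduces every cell mean and the CEF is linear in these regressors by construction. The earnings process below is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

college = rng.integers(0, 2, size=n)   # hypothetical education dummy
female = rng.integers(0, 2, size=n)
# Hypothetical earnings process, for illustration only
earnings = (10 + 4 * college - 2 * female
            + 1.5 * college * female + rng.normal(size=n))

# Saturated model: constant, both dummies, and their interaction
X = np.column_stack([np.ones(n), college, female, college * female])
b_hat, *_ = np.linalg.lstsq(X, earnings, rcond=None)

# Compare the fitted value with the sample mean in each of the four cells
for c in (0, 1):
    for f in (0, 1):
        fitted = b_hat @ np.array([1, c, f, c * f])
        cell_mean = earnings[(college == c) & (female == f)].mean()
        print(c, f, round(fitted, 3), round(cell_mean, 3))
```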
Partial Effects
If $E[u_i | x_i] = 0$ holds, then note that (in a model with no
interactions):
$\frac{\partial E[Y_i | x_i]}{\partial X_{ki}} = b_k$ (5)
That is: bk is the marginal effect of Xk on E[Yi|xi] holding
constant all other X’s
This is a partial correlation.
Note: this does not necessarily mean a causal effect!
Intuition: comparison to the bivariate regression model.
Partial Effects
More generally, we have that:
$b_k = \frac{cov(Y_i, \tilde{x}_{ki})}{var(\tilde{x}_{ki})}$ (6)
where $\tilde{x}_{ki}$ is the error in a population OLS model of xki on
all other covariates.
This is related to, but is not, the Frisch-Waugh-Lovell
theorem.
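A sample-analog sketch of formula (6), under an invented data-generating process: the coefficient on x3 from the full regression is compared with cov(y, x3_tilde) / var(x3_tilde), where x3_tilde is the residual from regressing x3 on the other covariates (here a constant and x2). All names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000

x2 = rng.normal(size=n)
x3 = 0.6 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x2 - 1.0 * x3 + rng.normal(size=n)   # illustrative coefficients

# Full multivariate regression of y on (1, x2, x3)
X = np.column_stack([np.ones(n), x2, x3])
b_full, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residual of x3 after regressing it on all other covariates (constant and x2)
Z = np.column_stack([np.ones(n), x2])
g, *_ = np.linalg.lstsq(Z, x3, rcond=None)
x3_tilde = x3 - Z @ g

# Sample analog of cov(Y, x_tilde) / var(x_tilde)
b3_from_residual = np.cov(y, x3_tilde)[0, 1] / np.var(x3_tilde, ddof=1)
print(b_full[2], b3_from_residual)
```

The two printed numbers agree up to floating-point error, which is the sample counterpart of the population statement in (6).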
The OLS Estimator
Assumptions:
1 $E[u_i | x_i] = 0$ (alternatively, $E[x_i u_i] = 0$)
2 $\{x_i, Y_i\}_{i=1}^{N}$ are i.i.d.
3 No perfect multicollinearity
4 xi and Yi have nonzero finite fourth moments (technical
condition for the large sample approximations)
How does Assumption 3 relate to the assumption that
$E[x_i x_i']$ is invertible?
See Goldberger's parody on "micronumerosity" (Hansen,
chapter 4)
When we consider asymptotic properties, we assume
directly that $E[x_i x_i']$ is invertible.
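To make the link between perfect multicollinearity and invertibility concrete, here is a minimal sketch with a made-up design in which x3 is an exact linear function of the constant and x2. The regressor matrix then has fewer than K linearly independent columns, and the sample analog of E[x_i x_i'] is singular (numerically, extremely ill-conditioned).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000

x2 = rng.normal(size=n)
x3 = 2.0 * x2 + 3.0          # exact linear function of the constant and x2

X = np.column_stack([np.ones(n), x2, x3])   # K = 3 columns
XtX = X.T @ X / n                           # sample analog of E[x_i x_i']

rank = np.linalg.matrix_rank(X)
print(rank)                   # number of linearly independent columns
print(np.linalg.cond(XtX))    # condition number; very large when XtX is singular
```

In this situation the minimization problem that defines OLS has infinitely many solutions, so the individual coefficients on x2 and x3 cannot be told apart.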
The OLS Estimator
Similar to the bivariate case....
The Ordinary Least Squares (OLS) estimator chooses b to
minimize the sum of squared residuals:
$\hat{b} = \arg\min_{b} \sum_{i=1}^{N} (Y_i - x_i' b)^2$ (7)
If Assumptions 1 to 4 are satisfied, we can show that $\hat{b}$
will be unbiased, consistent, and asymptotically normal.
Later, when we use matrix algebra, we will derive the
formula for multivariate OLS, prove that it is unbiased and
consistent, and derive the asymptotic distribution.
What would happen if Assumption 3 is not valid?
Multivariate OLS
What do we get? Very similar to before....
If the CEF is linear, we get the CEF coefficients...
If not, we get the best approximation to the CEF, or the
best linear predictor (in this case, OLS is consistent, but
not unbiased)
Again, we are just getting correlations (which might be
useful!), not necessarily causal effects.
Causal Effects
In many instances, we want to estimate a causal effect.
Example: consider the OLS regression of earnings on
education, or of test scores on class size.
In these cases, Assumption 1 may become a very strong assumption...
Including more variables might help...
Causal Effects
Suppose you want to estimate the return to schooling. Let
b1 be the increase in earnings one would have if education
increases. Consider the potential outcomes model:
$Y_i(0) = b_0 + b_2 a_i + u_i$, where ai is a measure of ability.
$Y_i(1) = b_1 + Y_i(0)$, so b1 is the causal effect of education on
earnings.
So we can write:
$Y_i = b_0 + b_1 X_i + b_2 a_i + u_i$ (8)
Under which assumptions would we have ui uncorrelated
with Xi and ai?
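The step from the potential outcomes to equation (8) can be spelled out as follows. This sketch assumes, for illustration, that X_i is a binary indicator (X_i = 1 if individual i receives the additional education, 0 otherwise), so that the observed outcome is Y_i(1) when X_i = 1 and Y_i(0) when X_i = 0.

```latex
\begin{align*}
Y_i &= Y_i(0) + X_i \, [\, Y_i(1) - Y_i(0) \,] \\
    &= (b_0 + b_2 a_i + u_i) + X_i \, b_1 \\
    &= b_0 + b_1 X_i + b_2 a_i + u_i .
\end{align*}
```

Note that this derivation says nothing about how X_i is assigned; whether u_i is uncorrelated with X_i and a_i is a separate assumption.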
Causal Effects
Suppose b1 is the causal effect of Xi, and the true model is:
$Y = b_0 + b_1 X + b_2 a + u$, where $E[Xu] = E[au] = 0$ (9)
That is: ability is the only other determinant of earnings
that is correlated with education. Is this a reasonable
assumption?
If we have information on ability, then we can run multivariate OLS
and get a consistent estimator of b1!
Notes:
It is not necessary that b2 is the causal effect of ability on earnings.
Ability is only a control variable.
However, it cannot be that education has a causal effect on ability (more
details in Microeconometrics I). Example: link, see from 6:45
Conclusion: if we are able to include all variables in u that are
correlated with education, then OLS gives you a causal effect.
Omitted Variable Bias
What if we cannot observe ability? What would happen if
we regress earnings on education only?
$\tilde{b}_1 = \frac{\widehat{cov}(X, Y)}{\widehat{var}(X)} \xrightarrow{p} \frac{cov(X, Y)}{var(X)} = b_1 + b_2 \frac{cov(X, a)}{var(X)}$ (10)
We have omitted variable bias if:
1 $b_2 \neq 0$, and;
2 $cov(X, a) \neq 0$
Note that we can also talk about the sign of the bias: it has
the sign of $b_2 \cdot cov(X, a)$.
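A simulation sketch of the omitted variable bias formula, with every number invented for illustration: ability a is positively correlated with education X and has a positive coefficient b2, so the regression of Y on X alone should overstate b1 by roughly b2 * cov(X, a) / var(X), while the regression that also includes a should recover b1.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
b0, b1, b2 = 1.0, 0.10, 0.50          # hypothetical parameter values

a = rng.normal(size=n)                     # ability
X = 12 + 2.0 * a + rng.normal(size=n)      # education, correlated with ability
u = rng.normal(size=n)                     # uncorrelated with X and a
Y = b0 + b1 * X + b2 * a + u

# Short regression: earnings on education only (bivariate OLS slope)
b1_short = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)

# Right-hand side of the bias formula, evaluated with sample moments
b1_formula = b1 + b2 * np.cov(X, a)[0, 1] / np.var(X, ddof=1)

# Long regression: earnings on education and ability
W = np.column_stack([np.ones(n), X, a])
b_long, *_ = np.linalg.lstsq(W, Y, rcond=None)

print(b1_short, b1_formula, b_long[1])
```

In this design cov(X, a) = 2 and var(X) = 5 in the population, so the short regression slope should be near 0.10 + 0.50 * 2 / 5 = 0.30, while the long regression slope should be near 0.10.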
Another example
Yi is the mother's employment and Xi is access to childcare.
Can we estimate the causal effect of Xi on Yi using an OLS
regression?
It depends.... Imagine that you have a group of n applicant
mothers and...
1 a fraction $n_1/n$ of them randomly received access to childcare.
2 the first $n_1$ who applied received access.
3 the $n_1$ with the most vulnerable background received access.
4 they were divided into two groups (more vs less vulnerable
background). Each group had its own lottery, where the
odds were higher for the more vulnerable group.
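Scenario 4 can be explored with a small simulation. Everything below is a hypothetical sketch: the lottery odds (0.8 vs 0.2), the employment probabilities, and the assumed causal effect of 0.15 are invented, and the effect is assumed to be the same in both groups. Because access is random only within each group, the regression of Y on X alone mixes the effect of childcare with the effect of vulnerability, while adding the group indicator V as a control should recover the assumed effect.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
effect = 0.15   # hypothetical causal effect of childcare access on employment

V = rng.integers(0, 2, size=n)                  # 1 = more vulnerable background
p_access = np.where(V == 1, 0.8, 0.2)           # lottery odds differ by group
X = (rng.random(n) < p_access).astype(float)    # access drawn by lottery within group

# Employment probability: depends on access and, separately, on vulnerability
p_employed = 0.6 + effect * X - 0.3 * V
Y = (rng.random(n) < p_employed).astype(float)

ones = np.ones(n)
b_naive, *_ = np.linalg.lstsq(np.column_stack([ones, X]), Y, rcond=None)
b_ctrl, *_ = np.linalg.lstsq(np.column_stack([ones, X, V]), Y, rcond=None)

print(b_naive[1])   # coefficient on access without controlling for the group
print(b_ctrl[1])    # coefficient on access controlling for the group
```

In this design the uncontrolled coefficient is pulled toward zero (and can even turn negative) because mothers with access are disproportionately from the more vulnerable group.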
Summary
What you get from multivariate OLS is pretty much the
same as what you get from bivariate OLS: correlations, the CEF,
the best linear predictor,...
If you care about a causal relationship, multivariate OLS
allows you to control for other variables, making it more
plausible that the assumptions you need are satisfied.
However, you need to control for all variables that
are correlated with X and affect Y....
Non-linear Regression Functions and Interactions
The linear regression model we saw assumes that marginal
effects do not vary with other variables.
Example:
$Earnings_i = b_0 + b_1 Educ_i + b_2 Female_i + u_i$
Question: how does Educ affect E[Earnings|Female = 1] and
E[Earnings|Female = 0]?
Consider the extended model:
$Earnings_i = b_0 + b_1 Educ_i + b_2 Female_i + b_3 Educ_i \times Female_i + u_i$
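To see what the interaction term adds, the conditional expectations implied by the extended model can be written out group by group. This is a sketch that assumes E[u_i | Educ_i, Female_i] = 0.

```latex
\begin{align*}
E[Earnings_i \mid Educ_i, Female_i = 0] &= b_0 + b_1 \, Educ_i \\
E[Earnings_i \mid Educ_i, Female_i = 1] &= (b_0 + b_2) + (b_1 + b_3) \, Educ_i \\
\frac{\partial E[Earnings_i \mid Educ_i, Female_i]}{\partial Educ_i} &= b_1 + b_3 \, Female_i
\end{align*}
```

So b3 is the difference between the education slopes of the two groups; setting b3 = 0 gives back the first model, in which the slope is b1 for everyone.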
Non-linear Regression Functions and Interactions
Other non-linear models that can be estimated using OLS:
Power functions: $Y_i = b_0 + b_1 X_i + b_2 X_i^2 + u_i$
Log-linear: $\ln(Y_i) = b_0 + b_1 X_i + u_i$
Linear-log: $Y_i = b_0 + b_1 \ln(X_i) + u_i$
Log-log: $\ln(Y_i) = b_0 + b_1 \ln(X_i) + u_i$
Interpretation
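These models are nonlinear in the variables but linear in the parameters, which is why OLS still applies after transforming the data. As usual, in the log-log model b1 is read as an elasticity (a 1% increase in X is associated with roughly a b1% change in Y); in the log-linear model a one-unit increase in X is associated with roughly a 100*b1% change in Y; in the linear-log model a 1% increase in X is associated with roughly a b1/100 change in Y. The sketch below simulates a log-log relationship with a made-up elasticity of 0.7 and recovers it by OLS on the logged variables.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000

elasticity = 0.7                       # hypothetical b1 in the log-log model
X = np.exp(rng.normal(size=n))         # strictly positive regressor
u = 0.3 * rng.normal(size=n)
Y = np.exp(0.5 + elasticity * np.log(X) + u)   # implies ln(Y) = 0.5 + 0.7 ln(X) + u

# OLS of ln(Y) on a constant and ln(X)
Z = np.column_stack([np.ones(n), np.log(X)])
b_hat, *_ = np.linalg.lstsq(Z, np.log(Y), rcond=None)
print(b_hat)   # (intercept, elasticity) estimates
```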
Save for later...
Some topics on multivariate OLS we will cover later...
Derivation of multivariate OLS formula (derive the
variance, prove consistency, asymptotic distribution,...)
Frisch-Waugh-Lovell theorem
Gauss Markov theorem
Hypothesis testing in multivariate OLS
Self-study and problem set
$R^2$ and adjusted $R^2$