Harvard Economics
Ec 1126
Tamer - October 21, 2015
Note 5 - Asymptotics
Note 5: Asymptotic Inference in the Linear Model
1 Large Sample Inference in the Least Squares Model
Again, to construct confidence intervals, we used the assumption that $y_i | x_i \sim N(x_i'\beta, \sigma^2)$. This is a very strong assumption, as it requires that the distribution of the outcome variable have a given form (or shape); in particular, it requires that this distribution be known (up to mean and variance). On the other hand, with the CRM assumption above, we were able to construct exact, small-sample-valid confidence intervals.

Here, we try to relax that assumption by making a different (also heroic) assumption. Namely, we require that we have a large enough sample that limit theory holds (i.e., we are able to use central limit theorems and laws of large numbers). This will allow us to get approximate confidence intervals that hold as the sample size gets large.
We first provide a (very) brief refresher of these. Please consult a standard probability and statistics text, such as DeGroot and Schervish, for a more complete treatment.
1.1 Consistency of Least Squares
Definition 1.1. The sequence of random variables $Q_n$ converges in probability to a constant $\alpha$ if, for every $\epsilon > 0$,
\[
\lim_{n \to \infty} \mathrm{Prob}(|Q_n - \alpha| > \epsilon) = 0.
\]
The intuitive argument that sample moments can be used as estimators for population moments is justified in large samples by the law of large numbers:

Law of Large Numbers. If the $W_i$ are i.i.d. and $E(|W_i|) < \infty$, then
\[
\frac{1}{N} \sum_{i=1}^{N} W_i \to_p E(W_1).
\]
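A quick simulation illustrates the theorem (a sketch only; the exponential distribution, its mean, and the sample sizes are arbitrary choices, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# W_i i.i.d. exponential with E(W_i) = 2; the sample average should
# approach the population mean as n grows.
true_mean = 2.0
for n in (100, 10_000, 1_000_000):
    w = rng.exponential(scale=true_mean, size=n)
    print(n, w.mean())
```

The printed averages settle near 2 as $n$ increases, as the theorem predicts.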
It is convenient to work with convergence in probability because it interacts nicely with
continuous functions:
1
Harvard Economics
Ec 1126
Tamer - October 21, 2015
Note 5 - Asymptotics
Slutsky Theorem: (i) If the sequence of random variables $Q_n$ takes on values in $\mathbb{R}^J$, (ii) $Q_n \to_p \alpha$, and (iii) the function $g : \mathbb{R}^J \to \mathbb{R}^M$ is continuous at $\alpha$, then
\[
g(Q_n) \to_p g(\alpha).
\]
We say that $\hat\beta$ is a consistent estimator of $\beta$ if $\hat\beta \to_p \beta$.
Claim 1.2. $\hat\beta$ is a consistent estimator of $\beta$.

Proof: Reminder: the least squares estimator, $\hat\beta = (X'X)^{-1}X'Y$, was derived by minimizing the sample version of the expected squared loss.

So, we have
\[
\frac{1}{N} X'Y = \frac{1}{N} \sum_{i=1}^{N} x_i y_i \to_p E x_i y_i.
\]
Also,
\[
\frac{1}{N} \sum_{i=1}^{N} x_i x_i' \to_p E x_i x_i'.
\]
Since $\hat\beta$ is a continuous function of these sample moments, we have
\[
\hat\beta \to_p \beta = [E x_i x_i']^{-1} E x_i y_i.
\]
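This moment-based consistency argument can be checked numerically (a sketch; the design, coefficients, and error distribution below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
beta = np.array([1.0, -0.5])   # hypothetical "true" coefficients

N = 200_000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ beta + rng.normal(size=N)

# Sample moments (1/N) X'X and (1/N) X'y converge to E[x x'] and E[x y],
# so betahat, a continuous function of them, converges to beta.
Sxx = X.T @ X / N
Sxy = X.T @ y / N
betahat = np.linalg.solve(Sxx, Sxy)
print(betahat)
```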
1.2 Limit Distribution of Least Squares
Let $W$ be a $K \times 1$ random variable with a $N(0, \Sigma)$ distribution. A sequence of random variables $S_N$ converges in distribution to $N(0, \Sigma)$ if for any (well-behaved) subset $A$ of $\mathbb{R}^K$, we have
\[
\lim_{N \to \infty} \mathrm{Prob}(S_N \in A) = \mathrm{Prob}(W \in A).
\]
Notation:
\[
S_N \to_d N(0, \Sigma).
\]
Central Limit Theorem: If the $K \times 1$ random variables $G_i$ are independent and identically distributed with $E(G_i) = 0$ and $\mathrm{Cov}(G_i) = \Sigma$, then
\[
\frac{1}{\sqrt{N}} \sum_i G_i \to_d N(0, \Sigma).
\]
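A simulation of the CLT (a sketch; the skewed Gamma distribution is an arbitrary choice, made so the summands are visibly non-normal):

```python
import numpy as np

rng = np.random.default_rng(2)
N, reps = 500, 20_000

# G_i i.i.d. with E(G_i) = 0: centered Gamma(2, 1) draws, so Var(G_i) = 2.
g = rng.gamma(shape=2.0, scale=1.0, size=(reps, N)) - 2.0

# One draw of (1/sqrt(N)) sum_i G_i per replication.
s = g.sum(axis=1) / np.sqrt(N)

# Across replications, s should look like N(0, 2).
print(s.mean(), s.var())
```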
There is also a second part to Slutsky.

Slutsky Part 2: Let $S_N$ be a sequence of $K \times 1$ random variables with $S_N \to_d N(0, \Sigma)$, and let $Q_N$ be a sequence of $J \times K$ random variables with $Q_N \to_p \alpha$, a constant. Then
\[
Q_N S_N \to_d \alpha \cdot N(0, \Sigma) = N(0, \alpha \Sigma \alpha').
\]
Define the (linear) prediction error
\[
u_i = y_i - E^*(y_i | x_i)
\]
so that
\[
y_i = x_i' \beta + u_i
\]
with $\mathrm{Cov}(u_i, x_i) = 0$ (by definition).
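The orthogonality $\mathrm{Cov}(u_i, x_i) = 0$ holds by construction even when the linear predictor is misspecified; a numerical sketch (the nonlinear DGP is invented, and population moments are replaced by large-sample analogs):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 500_000

# y is nonlinear in x, so E*(y|x), the best linear predictor, differs
# from E(y|x); still, its error u_i is uncorrelated with x_i by construction.
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
y = np.exp(0.5 * x) + rng.normal(size=N)

# Linear-predictor coefficients: beta = [E x x']^{-1} E x y (sample analogs).
beta = np.linalg.solve(X.T @ X / N, X.T @ y / N)
u = y - X @ beta

# The sample covariance between u and x is zero up to rounding.
print(np.cov(u, x)[0, 1])
```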
Substitute this expression for $y_i$ into the formula for the least-squares estimator:
\[
\hat\beta = \Big(\frac{1}{N}\sum_i x_i x_i'\Big)^{-1} \frac{1}{N}\sum_i x_i y_i
= \beta + \Big(\frac{1}{N}\sum_i x_i x_i'\Big)^{-1} \frac{1}{N}\sum_i x_i u_i.
\]
Now, look at $\sqrt{N}(\hat\beta - \beta)$:
\[
\sqrt{N}(\hat\beta - \beta) = \Big(\frac{1}{N}\sum_i x_i x_i'\Big)^{-1} \frac{1}{\sqrt{N}}\sum_i x_i u_i.
\]
Because $\hat\beta$ is converging to $\beta$, its distribution is becoming degenerate, with the probability piling up in a shrinking neighborhood of $\beta$. So we multiply by $\sqrt{N}$ to obtain a nondegenerate limit distribution. Define $G_i = x_i u_i$. Then $G_i$ is i.i.d. (a function of $(y_i, x_i)$), $E G_i = 0$, and
\[
\mathrm{Cov}\Big(\frac{1}{\sqrt{N}}\sum_i G_i\Big) = \frac{1}{N}\sum_i \mathrm{Cov}(G_i) = \mathrm{Cov}(G_1).
\]
So, $\sqrt{N}$ will be the right stabilizing factor for the variance. Now we can appeal to the central limit theorem to obtain the normal distribution.
Claim 1.3. We have
\[
\sqrt{N}(\hat\beta - \beta) \to_d N(0, \alpha \Sigma \alpha')
\]
where $\alpha = [E(x_i x_i')]^{-1}$ and $\Sigma = E[u_i^2 x_i x_i']$.
We start with
\[
\sqrt{N}(\hat\beta - \beta) = \Big(\frac{1}{N}\sum_i x_i x_i'\Big)^{-1} \frac{1}{\sqrt{N}}\sum_i x_i u_i.
\]
Define $G_i = x_i u_i$ and note that $E(G_i) = 0$ and $\mathrm{Cov}(G_i) = E(G_i G_i') = \Sigma$. By the Central Limit Theorem we have
\[
\frac{1}{\sqrt{N}}\sum_i G_i \to_d N(0, \Sigma).
\]
By the law of large numbers and Slutsky,
\[
Q_N = \Big(\frac{1}{N}\sum_i x_i x_i'\Big)^{-1} \to_p [E x_i x_i']^{-1} = \alpha.
\]
Then, by the second Slutsky theorem,
\[
\sqrt{N}(\hat\beta - \beta) = Q_N S_N \to_d \alpha N(0, \Sigma) = N(0, \alpha \Sigma \alpha').
\]
This is usually called the Eicker-Huber-White standard error formula. This formula for the variance simplifies under homoskedasticity. In particular, if $E[u_i^2 | x_i] = \sigma^2$ (it does not depend on $x_i$; that is what homoskedasticity means), we get
\[
\Sigma = E[u_i^2 x_i x_i'] = E\big[E(u_i^2 | x_i)\, x_i x_i'\big] = \sigma^2 E x_i x_i',
\]
which means that in this case,
\[
\sqrt{N}(\hat\beta - \beta) = Q_N S_N \to_d \alpha N(0, \Sigma) = N(0, \sigma^2 \alpha) = N(0, \sigma^2 [E x_i x_i']^{-1}).
\]
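A numerical comparison of the two formulas (a sketch; heteroskedasticity is built into the invented DGP so that the sandwich formula $\alpha\Sigma\alpha'$ and the homoskedastic formula $\sigma^2\alpha$ visibly disagree; the true errors $u_i$ are used directly for clarity, where a real application would use residuals):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 100_000

x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])

# Var(u|x) = 0.5 + x^2 depends on x: heteroskedastic by design.
u = rng.normal(size=N) * np.sqrt(0.5 + x**2)
y = X @ np.array([1.0, 2.0]) + u

alpha = np.linalg.inv(X.T @ X / N)            # sample analog of [E x x']^{-1}
Sigma = (X * (u**2)[:, None]).T @ X / N       # sample analog of E[u^2 x x']
sandwich = alpha @ Sigma @ alpha.T            # robust asymptotic variance
classical = u.var() * alpha                   # sigma^2 * alpha, wrong here

print(np.diag(sandwich))
print(np.diag(classical))
```

For the slope, the sandwich variance comes out around 3.5 while the homoskedastic formula gives about 1.5, so classical standard errors would be badly misleading under this DGP.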
So, the purpose of deriving a limit distribution for an estimator is to be able to characterize the sampling uncertainty. Again, we do not observe $\beta$, and we are using $\hat\beta$ as an estimator for $\beta$. This estimator is a function of the sample (and hence depends on the sample size). It is natural to expect that, as the sample size increases in an i.i.d. setting, we gain more and more information about $\beta$,
and hence, somehow, $\hat\beta$ is closer to $\beta$. This will be reflected in the distribution of $\hat\beta$. In particular, given the above, we have
\[
\hat\beta \overset{N \text{ large}}{\sim} N\Big(\beta, \frac{\alpha \Sigma \alpha'}{N}\Big).
\]
So,
\[
\mathrm{Var}(\hat\beta) \approx \frac{1}{N} \alpha \Sigma \alpha'.
\]
Here, $\approx$ means approximately, for $N$ large. So, it is reassuring to see that the variance of $\hat\beta$ is converging to zero as $N$ gets large. So, the distribution of $\hat\beta$ is centered around $\beta$, with its variance (a measure of sampling uncertainty) decreasing to zero.
2 Confidence Interval
So, we have derived, under random sampling and large $N$, that the sampling distribution of the least squares estimator $\hat\beta$ can be approximated by a precisely defined normal random vector with a given covariance matrix:
\[
P(\sqrt{N}(\hat\beta - \beta) \in A) \approx P(Z \in A)
\]
where $Z \sim N(0, \alpha \Sigma \alpha')$ and $\approx$ means "approximately equals, for large $N$". There is a question of how good the approximation is, but we will not deal with that in this class.
Now, the point of deriving this approximation is that we want to do inference on $\beta$ (which we do not observe), i.e., we want to learn about the value of $\beta$. A confidence interval is one way to do it. But, to get a confidence interval, we need to somehow get the sampling distribution of $\hat\beta$ and how it is related to $\beta$. In the previous note, we did that by making assumptions. In particular, there the normal linear model requires that the distribution of $y|x$ be normal (a strong assumption). Here, on the other hand, we drop that assumption but use an approximation to the distribution of $\hat\beta$ that we get via large $N$. So, we can "substitute" questions about where $\beta$ is via the normal limit above. To do that, we need an estimator of $\alpha$ and $\Sigma$.

In particular, let the residuals be $\hat{u}_i = y_i - x_i' \hat\beta$, and define
\[
\hat\Lambda = \hat\alpha \hat\Sigma \hat\alpha'
\]
where
\[
\hat\alpha = \Big(\frac{1}{N}\sum_i x_i x_i'\Big)^{-1}, \qquad \hat\Sigma = \frac{1}{N}\sum_i \hat{u}_i^2 x_i x_i'.
\]
We can use the law of large numbers and Slutsky to show that
\[
\hat\Lambda = \hat\alpha \hat\Sigma \hat\alpha' \to_p \Lambda = \alpha \Sigma \alpha'.
\]
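A sketch of the plug-in estimator $\hat\Lambda$ computed from residuals (the DGP is invented; it is chosen so that the limit $\Lambda$ is the identity matrix, making the convergence easy to eyeball):

```python
import numpy as np

rng = np.random.default_rng(6)
N = 200_000

x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
u = rng.normal(size=N)                     # homoskedastic, Var(u) = 1
y = X @ np.array([0.5, 1.0]) + u

betahat = np.linalg.solve(X.T @ X, X.T @ y)
uhat = y - X @ betahat                     # residuals, not the true errors

alphahat = np.linalg.inv(X.T @ X / N)
Sigmahat = (X * (uhat**2)[:, None]).T @ X / N
Lamhat = alphahat @ Sigmahat @ alphahat.T

# With E[x x'] = I and Var(u|x) = 1, Lambda = alpha Sigma alpha' = I.
print(Lamhat)
```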
Now, we shall obtain a confidence interval for $\beta_1$, the first element of the $K$-vector $\beta$ (this is just for illustration; we can construct an interval for any element of $\beta$).

To do that precisely, let $l$ be a $K$-dimensional vector with $l_i = 0$ for all $i \neq 1$ and $l_1 = 1$. So,
\[
l'\beta = \beta_1
\]
and Slutsky implies that
\[
(l' \hat\Lambda l)^{1/2} \to_p (l' \Lambda l)^{1/2}.
\]
(Here, $l' \hat\Lambda l$ essentially "picks out" the $(1,1)$ element of $\hat\Lambda$.)

The second Slutsky theorem implies that
\[
\frac{l'\big(\sqrt{N}(\hat\beta - \beta)\big)}{(l' \hat\Lambda l)^{1/2}} \to_d \frac{1}{(l' \Lambda l)^{1/2}} N(0, l' \Lambda l) = N(0, 1).
\]
Define the Standard Error (SE) as
\[
SE = (l' \hat\Lambda l / N)^{1/2}.
\]
We have established:

Claim 2.1. We have
\[
\frac{l'(\hat\beta - \beta)}{SE} \to_d N(0, 1).
\]
The ratio $l'(\hat\beta - \beta)/SE$ is an asymptotic pivot for $l'\beta$. It depends upon the unknown parameters only through $l'\beta$, and it has a known limit distribution. This leads to a confidence interval for $l'\beta$. The normal distribution is available in tables and in computer programs. We have
\[
\mathrm{Prob}(N(0,1) > 1.96) = .025
\]
and since the normal distribution is symmetric about zero,
\[
\mathrm{Prob}(-1.96 \le N(0,1) \le 1.96) = .95.
\]
Then, the Claim above implies that
\[
\lim_{N \to \infty} \mathrm{Prob}\Big(-1.96 \le \frac{l'(\hat\beta - \beta)}{SE} \le 1.96\Big) = .95
\]
and so
\[
\lim_{N \to \infty} \mathrm{Prob}\big(l'\hat\beta - 1.96\,SE \le l'\beta \le l'\hat\beta + 1.96\,SE\big) = .95,
\]
or,
\[
\lim_{N \to \infty} \mathrm{Prob}\big(l'\beta \in [\,l'\hat\beta - 1.96\,SE,\; l'\hat\beta + 1.96\,SE\,]\big) = .95.
\]
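A coverage check of this interval (a sketch with an invented heteroskedastic DGP; the number of replications and the sample size are arbitrary, and $l$ picks out the slope):

```python
import numpy as np

rng = np.random.default_rng(4)
beta = np.array([1.0, 2.0])        # hypothetical true coefficients
reps, N = 2_000, 500
covered = 0

for _ in range(reps):
    x = rng.normal(size=N)
    X = np.column_stack([np.ones(N), x])
    u = rng.normal(size=N) * np.sqrt(0.5 + x**2)   # heteroskedastic errors
    y = X @ beta + u

    betahat = np.linalg.solve(X.T @ X, X.T @ y)
    uhat = y - X @ betahat

    alphahat = np.linalg.inv(X.T @ X / N)
    Sigmahat = (X * (uhat**2)[:, None]).T @ X / N
    Lam = alphahat @ Sigmahat @ alphahat.T

    se = np.sqrt(Lam[1, 1] / N)    # SE for the slope: (l' Lamhat l / N)^{1/2}
    covered += betahat[1] - 1.96 * se <= beta[1] <= betahat[1] + 1.96 * se

print(covered / reps)              # limit theory says this approaches .95
```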