Harvard Economics Ec 1126, Tamer - October 21, 2015

Note 5: Asymptotic Inference in the Linear Model

1 Large Sample Inference in the Least Squares Model

Again, to construct confidence intervals we used the assumption that $y_i \mid x_i \sim N(x_i'\beta, \sigma^2)$. This is a very strong assumption, since it requires that the distribution of the outcome variable have a given form (or shape); in particular, it requires that this distribution be known (up to its mean and variance). On the other hand, with the CRM assumption above we were able to construct exact, small-sample-valid confidence intervals. Here we try to relax that assumption by making a different (also heroic) assumption: we require that the sample be large enough that limit theory holds (i.e., we are able to use central limit theorems and laws of large numbers). This allows us to get approximate confidence intervals that are valid as the sample size gets large. We first provide a (very) brief refresher. Please consult a standard probability and statistics text, such as DeGroot and Schervish, for a more complete treatment.

1.1 Consistency of Least Squares

Definition 1.1. The sequence of random variables $Q_n$ converges in probability to a constant $\alpha$ if, for every $\epsilon > 0$,
$$\lim_{n \to \infty} \mathrm{Prob}(|Q_n - \alpha| > \epsilon) = 0.$$

The intuitive argument that sample moments can be used as estimators for population moments is justified in large samples by the law of large numbers.

Law of Large Numbers. If the $W_i$ are i.i.d. and $E(|W_i|) < \infty$, then
$$\frac{1}{N} \sum_{i=1}^N W_i \to_p E(W_1).$$

It is convenient to work with convergence in probability because it interacts nicely with continuous functions.

Slutsky Theorem. If (i) the sequence of random variables $Q_n$ takes on values in $\mathbb{R}^J$, (ii) $Q_n \to_p \alpha$, and (iii) the function $g : \mathbb{R}^J \to \mathbb{R}^M$ is continuous at $\alpha$, then $g(Q_n) \to_p g(\alpha)$.

We say that $\hat\beta$ is a consistent estimator of $\beta$ if $\hat\beta \to_p \beta$.

Claim 1.2. $\hat\beta$ is a consistent estimator of $\beta$.

Proof: Recall that the least squares estimator, $\hat\beta = (X'X)^{-1} X'Y$, was derived by minimizing the sample version of the expected squared loss. We have
$$\frac{1}{N} X'Y = \frac{1}{N} \sum_{i=1}^N x_i y_i \to_p E(x_i y_i),$$
and also
$$\frac{1}{N} \sum_{i=1}^N x_i x_i' \to_p E(x_i x_i').$$
Since $\hat\beta$ is a continuous function of these sample moments, Slutsky gives
$$\hat\beta \to_p \beta = [E(x_i x_i')]^{-1} E(x_i y_i).$$

1.2 Limit Distribution of Least Squares

Let $W$ be a $K \times 1$ random variable with a $N(0, \Sigma)$ distribution. A sequence of random variables $S_N$ converges in distribution to $N(0, \Sigma)$ if, for any (well-behaved) subset $A$ of $\mathbb{R}^K$,
$$\lim_{N \to \infty} \mathrm{Prob}(S_N \in A) = \mathrm{Prob}(W \in A).$$
Notation: $S_N \to_d N(0, \Sigma)$.

Central Limit Theorem. If the $K \times 1$ random variables $G_i$ are independent and identically distributed with $E(G_i) = 0$ and $\mathrm{Cov}(G_i) = \Sigma$, then
$$\frac{1}{\sqrt{N}} \sum_i G_i \to_d N(0, \Sigma).$$

There is also a second part to Slutsky.

Slutsky Part 2. Let $S_N$ be a sequence of $K \times 1$ random variables with $S_N \to_d N(0, \Sigma)$, and let $Q_N$ be a sequence of $J \times K$ random matrices with $Q_N \to_p \alpha$, a constant. Then
$$Q_N S_N \to_d \alpha \cdot N(0, \Sigma) = N(0, \alpha \Sigma \alpha').$$
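To see Claim 1.2 and these limit tools at work, here is a minimal simulation sketch in Python (not part of the original note; the design, the true coefficients $\beta = (1, 2)'$, and the sample sizes are all illustrative assumptions): the least squares estimate, a continuous function of sample moments, settles down on $\beta$ as $N$ grows.

```python
import numpy as np

# Illustrative design (assumed, not from the note): K = 2 regressors,
# true beta = (1, 2)', heteroskedastic errors with E(u|x) = 0.
rng = np.random.default_rng(0)
beta = np.array([1.0, 2.0])

def beta_hat(N):
    x = np.column_stack([np.ones(N), rng.normal(size=N)])     # x_i = (1, z_i)'
    u = rng.normal(scale=1 + 0.5 * np.abs(x[:, 1]), size=N)   # mean-zero errors
    y = x @ beta + u
    # beta_hat = (X'X)^{-1} X'Y, a continuous function of the sample moments
    return np.linalg.solve(x.T @ x, x.T @ y)

for N in [100, 10_000, 1_000_000]:
    print(N, beta_hat(N))   # estimates approach (1, 2) as N grows (LLN + Slutsky)
```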
Define the prediction error $u_i = y_i - E^*(y_i \mid x_i)$, so that $y_i = x_i'\beta + u_i$ with $\mathrm{Cov}(u_i, x_i) = 0$ (by definition of the best linear predictor). Substitute this expression for $y_i$ into the formula for the least-squares estimator:
$$\hat\beta = \Big( \frac{1}{N} \sum_i x_i x_i' \Big)^{-1} \frac{1}{N} \sum_i x_i y_i = \beta + \Big( \frac{1}{N} \sum_i x_i x_i' \Big)^{-1} \frac{1}{N} \sum_i x_i u_i.$$

Now look at $\sqrt{N}(\hat\beta - \beta)$:
$$\sqrt{N}(\hat\beta - \beta) = \Big( \frac{1}{N} \sum_i x_i x_i' \Big)^{-1} \frac{1}{\sqrt{N}} \sum_i x_i u_i.$$

Because $\hat\beta$ is converging to $\beta$, its distribution is becoming degenerate, with the probability piling up in a shrinking neighborhood of $\beta$. So we multiply by $\sqrt{N}$ to obtain a nondegenerate limit distribution. Define $G_i = x_i u_i$. Then the $G_i$ are i.i.d. (each is a function of $(y_i, x_i)$), $E(G_i) = 0$, and
$$\mathrm{Cov}\Big( \frac{1}{\sqrt{N}} \sum_i G_i \Big) = \frac{1}{N} \sum_i \mathrm{Cov}(G_i) = \mathrm{Cov}(G_1).$$
So $\sqrt{N}$ is the right stabilizing factor for the variance, and we can appeal to the central limit theorem to obtain the normal limit distribution.

Claim 1.3. We have
$$\sqrt{N}(\hat\beta - \beta) \to_d N(0, \alpha \Sigma \alpha'),$$
where $\alpha = [E(x_i x_i')]^{-1}$ and $\Sigma = E[u_i^2 x_i x_i']$.

We start with
$$\sqrt{N}(\hat\beta - \beta) = \Big( \frac{1}{N} \sum_i x_i x_i' \Big)^{-1} \frac{1}{\sqrt{N}} \sum_i x_i u_i.$$
Define $G_i = x_i u_i$ and note that $E(G_i) = 0$ and $\mathrm{Cov}(G_i) = E(G_i G_i') = \Sigma$. By the central limit theorem,
$$S_N = \frac{1}{\sqrt{N}} \sum_i G_i \to_d N(0, \Sigma).$$
By the law of large numbers and Slutsky,
$$Q_N = \Big( \frac{1}{N} \sum_i x_i x_i' \Big)^{-1} \to_p [E(x_i x_i')]^{-1} = \alpha.$$
Then, by the second Slutsky theorem,
$$\sqrt{N}(\hat\beta - \beta) = Q_N S_N \to_d \alpha \, N(0, \Sigma) = N(0, \alpha \Sigma \alpha').$$

This is usually called the Eicker-Huber-White standard error formula. The variance formula simplifies under homoskedasticity. In particular, if $E[u_i^2 \mid x_i] = \sigma^2$ (the conditional variance does not depend on $x_i$; that is what homoskedasticity means), we get
$$\Sigma = E[u_i^2 x_i x_i'] = E\big[ E(u_i^2 \mid x_i) \, x_i x_i' \big] = \sigma^2 E(x_i x_i'),$$
which means that in this case
$$\sqrt{N}(\hat\beta - \beta) = Q_N S_N \to_d N(0, \sigma^2 \alpha) = N(0, \sigma^2 [E(x_i x_i')]^{-1}).$$

The purpose of deriving a limit distribution for an estimator is to be able to characterize the sampling uncertainty. Again, we do not observe $\beta$, and we are using $\hat\beta$ as an estimator of $\beta$. This estimator is a function of the sample (and hence depends on the sample size). It is natural to expect that as the sample size increases in an i.i.d. setting, we gain more and more information about $\beta$, and hence $\hat\beta$ gets closer to $\beta$. This is reflected in the distribution of $\hat\beta$: given the above, for large $N$,
$$\hat\beta \overset{N \text{ large}}{\sim} N\Big( \beta, \frac{\alpha \Sigma \alpha'}{N} \Big),$$
so that
$$\mathrm{Var}(\hat\beta) \simeq \frac{1}{N} \alpha \Sigma \alpha',$$
where $\simeq$ means "approximately equal for large $N$". It is reassuring to see that the variance of $\hat\beta$ converges to zero as $N$ gets large: the distribution of $\hat\beta$ is centered around $\beta$, with its variance (a measure of sampling uncertainty) decreasing to zero.
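As a numerical check of Claim 1.3, the following Monte Carlo sketch (same illustrative, heteroskedastic design as above; nothing here is from the note itself) compares the simulated covariance of $\sqrt{N}(\hat\beta - \beta)$ across replications with $\alpha \Sigma \alpha'$, where the population moments are approximated from one very large sample.

```python
import numpy as np

# Monte Carlo check of Claim 1.3 under an assumed design: the sample
# covariance of sqrt(N)*(beta_hat - beta) should be close to
# alpha Sigma alpha', with alpha = [E x x']^{-1} and Sigma = E[u^2 x x'].
rng = np.random.default_rng(1)
beta = np.array([1.0, 2.0])

def draw(N):
    x = np.column_stack([np.ones(N), rng.normal(size=N)])
    u = rng.normal(scale=1 + 0.5 * np.abs(x[:, 1]), size=N)
    return x, x @ beta + u

N, reps = 2_000, 5_000
draws = np.empty((reps, 2))
for r in range(reps):
    x, y = draw(N)
    b = np.linalg.solve(x.T @ x, x.T @ y)
    draws[r] = np.sqrt(N) * (b - beta)

x, y = draw(1_000_000)                        # "population" moments by simulation
u = y - x @ beta
alpha = np.linalg.inv(x.T @ x / len(y))       # [E x x']^{-1}
Sigma = (x * (u**2)[:, None]).T @ x / len(y)  # E[u^2 x x']
print(np.cov(draws.T))                        # Monte Carlo covariance
print(alpha @ Sigma @ alpha)                  # alpha Sigma alpha' (alpha symmetric)
```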
2 Confidence Interval

So we have derived, under random sampling and large $N$, that the sampling distribution of the least squares estimator $\hat\beta$ can be approximated by a precisely defined normal random vector with a given covariance matrix:
$$\mathrm{Prob}\big( \sqrt{N}(\hat\beta - \beta) \in A \big) \approx \mathrm{Prob}(Z \in A), \quad \text{where } Z \sim N(0, \alpha \Sigma \alpha')$$
and $\approx$ means "approximately equal for large $N$". There is a question of how good the approximation is, but we will not deal with that in this class.

The point of deriving this approximation is that we want to do inference on $\beta$ (which we do not observe); i.e., we want to learn about the value of $\beta$, and a confidence interval is one way to do it. But to get a confidence interval, we need to somehow get the sampling distribution of $\hat\beta$ and how it relates to $\beta$. In the previous note, we did that by making assumptions. In particular, there the normal linear model required that the distribution of $y \mid x$ be normal (a strong assumption). Here, on the other hand, we drop that assumption but use the large-$N$ approximation to the distribution of $\hat\beta$, so we can answer questions about where $\beta$ is via the normal limit above.

To do that, we need estimators of $\alpha$ and $\Sigma$. Let the residuals be $\hat u_i = y_i - x_i'\hat\beta$ and define
$$\hat\Lambda = \hat\alpha \hat\Sigma \hat\alpha', \quad \text{where } \hat\alpha = \Big( \frac{1}{N} \sum_i x_i x_i' \Big)^{-1}, \quad \hat\Sigma = \frac{1}{N} \sum_i \hat u_i^2 x_i x_i'.$$
We can use the law of large numbers and Slutsky to show that
$$\hat\Lambda = \hat\alpha \hat\Sigma \hat\alpha' \to_p \Lambda = \alpha \Sigma \alpha'.$$

Now we shall obtain a confidence interval for $\beta_1$, the first element of the $K$-vector $\beta$ (this is just for illustration; we can construct an interval for any element of $\beta$). To do this precisely, let $l$ be a $K$-dimensional vector with $l_1 = 1$ and $l_i = 0$ for all $i \neq 1$, so that $l'\beta = \beta_1$. Slutsky implies that
$$(l' \hat\Lambda l)^{1/2} \to_p (l' \Lambda l)^{1/2}$$
(here $l' \hat\Lambda l$ essentially "picks out" the $(1,1)$ element of $\hat\Lambda$). The second Slutsky theorem then implies that
$$\frac{l' \big( \sqrt{N}(\hat\beta - \beta) \big)}{(l' \hat\Lambda l)^{1/2}} \to_d \frac{1}{(l' \Lambda l)^{1/2}} N(0, l' \Lambda l) = N(0, 1).$$
Define the standard error (SE) as
$$SE = (l' \hat\Lambda l / N)^{1/2}.$$
We have thus established:

Claim 2.1. We have
$$\frac{l'(\hat\beta - \beta)}{SE} \to_d N(0, 1).$$

The ratio $l'(\hat\beta - \beta)/SE$ is an asymptotic pivot for $l'\beta$: it depends on the unknown parameters only through $l'\beta$, and it has a known limit distribution. This leads to a confidence interval for $l'\beta$. The normal distribution is available in tables and in computer programs. We have
$$\mathrm{Prob}(N(0,1) > 1.96) = .025,$$
and since the normal distribution is symmetric about zero,
$$\mathrm{Prob}(-1.96 \le N(0,1) \le 1.96) = .95.$$
Then Claim 2.1 implies that
$$\lim_{N \to \infty} \mathrm{Prob}\Big( -1.96 \le \frac{l'(\hat\beta - \beta)}{SE} \le 1.96 \Big) = .95,$$
and so
$$\lim_{N \to \infty} \mathrm{Prob}\big( l'\hat\beta - 1.96\,SE \le l'\beta \le l'\hat\beta + 1.96\,SE \big) = .95,$$
or, equivalently,
$$\lim_{N \to \infty} \mathrm{Prob}\big( l'\beta \in [\, l'\hat\beta - 1.96\,SE, \; l'\hat\beta + 1.96\,SE \,] \big) = .95.$$
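The following sketch puts the whole construction together (again under the illustrative design assumed above, so every design choice in it is an assumption for demonstration, not part of the note): it builds the interval $l'\hat\beta \pm 1.96\,SE$ from the plug-in $\hat\Lambda$ and checks by simulation that the interval covers the true $l'\beta$ about 95% of the time.

```python
import numpy as np

# Coverage check for the 95% interval: Lambda_hat = alpha_hat Sigma_hat alpha_hat',
# SE = (l' Lambda_hat l / N)^{1/2}, interval = l'beta_hat +/- 1.96 * SE.
rng = np.random.default_rng(2)
beta = np.array([1.0, 2.0])
l = np.array([0.0, 1.0])          # picks out the second coefficient (the slope)
N, reps, hits = 500, 2_000, 0

for _ in range(reps):
    x = np.column_stack([np.ones(N), rng.normal(size=N)])
    u = rng.normal(scale=1 + 0.5 * np.abs(x[:, 1]), size=N)
    y = x @ beta + u
    b = np.linalg.solve(x.T @ x, x.T @ y)
    uhat = y - x @ b                                   # residuals
    alpha_hat = np.linalg.inv(x.T @ x / N)
    Sigma_hat = (x * (uhat**2)[:, None]).T @ x / N
    Lam_hat = alpha_hat @ Sigma_hat @ alpha_hat        # Lambda_hat
    se = np.sqrt(l @ Lam_hat @ l / N)                  # SE = (l'Lam_hat l / N)^{1/2}
    hits += l @ b - 1.96 * se <= l @ beta <= l @ b + 1.96 * se

print(hits / reps)   # empirical coverage, close to 0.95
```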