Apostila 7, PGE953, copyright Betsabe Blas, 2018

Apostila 7 - Outline
• Gauss-Markov Theorem
• Generalized Least Squares (GLS)
• Weighted Least Squares (WLS)
• The linear mixed model as GLS
• Start distribution theory for linear models

Review: What can we estimate?
• In Result 6.1, we showed that there is a unique decomposition Y = Y_1 + Y_2 with Y_1 ∈ C(X_1) and Y_2 ∈ C^⊥(X_1), and that Y_1 = P_{X_1} Y, the predicted values from the least squares fit using X_1.
• However, since C(X_1) = C(X_2), the same result holds for X_2.
• Hence if C(X_1) = C(X_2), the predicted values from the two least squares fits are the same.
• Result 6.2: If two design matrices are reparameterizations of one another, the predicted values P_X Y are the same.

Review: More Properties of Least Squares
• Result 6.3:
  E(β̂_ols | X) = G X^T X β;  cov(β̂_ols | X) = σ² G X^T X G^T.
• If G is the Moore-Penrose inverse, then G X^T X G^T = G.
• Result 6.4:
  E(ε̂ | X) = 0;  cov(ε̂ | X) = σ² (I − P_X).
• In practice, it is important to estimate σ². Let r = rank(X^T X). The usual estimate is
  σ̂² = (n − r)^{-1} ε̂^T ε̂.
• Result 6.6: E(σ̂² | X) = σ².

Review: Estimability
• We know that if X^T X is of full rank, then E(β̂_ols | X) = β, and hence OLS gives us a method for estimating β and functions of it.
• What happens if X^T X is not of full rank?
• Definition 6.2: The linear combination c^T β is said to be estimable if there exists an (n × 1) vector b with the property that E(b^T Y | X) = c^T β for all β when E(Y | X) = Xβ. In this case, b^T Y is said to be an unbiased estimator of c^T β.
• Basically we are asking: what can we actually estimate from a non-full-rank parameterization?
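As a quick numeric illustration of Result 6.2 (a sketch, not part of the original notes): the two hypothetical design matrices below satisfy C(X1) = C(X2), one of them rank-deficient, and the fitted values P_X Y agree. The Moore-Penrose pseudoinverse plays the role of the generalized inverse G.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
g = np.repeat([0, 1], n // 2)                    # two groups of 4
X2 = np.column_stack([(g == 0).astype(float),    # cell-means coding
                      (g == 1).astype(float)])
X1 = np.column_stack([np.ones(n), X2])           # intercept + indicators: rank 2, 3 columns
Y = rng.normal(size=n)

def proj(X):
    # P_X = X (X^T X)^- X^T, with the Moore-Penrose pseudoinverse as G
    return X @ np.linalg.pinv(X.T @ X) @ X.T

fit1, fit2 = proj(X1) @ Y, proj(X2) @ Y
print(np.allclose(fit1, fit2))  # True: the two least squares fits coincide
```

Even though X1 is a non-full-rank parameterization, the projection P_X depends only on the column space, so the predicted values match.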
Review: Estimability
• Result 6.7: The function c^T β is estimable if and only if c^T = b^T X for some vector b.
• Result 6.8: The function c^T β is estimable if and only if c^T = c^T G X^T X, where G is a generalized inverse of X^T X.

Review: The Gauss-Markov Theorem
• Gauss-Markov Theorem: Suppose that c^T β is estimable. Then among all estimators that are linear combinations of Y, i.e., d^T Y, and that are also unbiased, i.e., E(d^T Y | X) = c^T β, the estimator c^T β̂_ols has minimum variance.
• We showed that c^T β̂_ols is an unbiased estimator of c^T β.

The Gauss-Markov Theorem
• Next we have to show the minimum variance property. We showed previously that (X G X^T X)^T = X^T. Remember that c^T = c^T G X^T X and c^T = b^T X. Then
  var(c^T β̂_ols | X) = σ² c^T G X^T X G^T c = σ² c^T G X^T X G^T X^T b = σ² c^T G X^T b = σ² c^T G c.
• There is a somewhat subtle argument going on here:
  c^T β̂_ols = c^T G X^T Y = b^T X G X^T Y = b^T P_X Y.
• The estimate does not depend on the form of the generalized inverse.
• Hence its variance does not either.
• The Moore-Penrose inverse is symmetric, so G X^T X G^T = G X^T X G = G.
• Hence the previous argument holds for any G.

The Gauss-Markov Theorem
• Now consider any other linear function d^T Y with the property that it is unbiased, i.e., E(d^T Y | X) = c^T β.
• This means that for any β, d^T X β = c^T β, and hence that d^T X = c^T.
• Then
  cov(c^T β̂_ols, d^T Y | X) = cov(c^T G X^T Y, d^T Y | X) = σ² c^T G X^T d = σ² c^T G c,
  the last step following from d^T X = c^T.
• This means that
  0 ≤ var(c^T β̂_ols − d^T Y | X)
    = var(c^T β̂_ols | X) + var(d^T Y | X) − 2 cov(c^T β̂_ols, d^T Y | X)
    = σ² c^T G c + var(d^T Y | X) − 2 σ² c^T G c
    = var(d^T Y | X) − σ² c^T G c
    = var(d^T Y | X) − var(c^T β̂_ols | X),
  as desired.
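To make the invariance argument above concrete, here is a small sketch with hypothetical data: the design has a duplicated column, so X^T X is singular and admits many generalized inverses, yet c^T β̂_ols computed from two different choices of G coincides, as the identity c^T G X^T Y = b^T P_X Y predicts.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
x = rng.normal(size=n)
X = np.column_stack([x, x])            # duplicated column: X^T X is singular
Y = rng.normal(size=n)
b = rng.normal(size=n)
c = X.T @ b                            # c^T = b^T X, so c^T beta is estimable

s = x @ x
G1 = np.linalg.pinv(X.T @ X)                # Moore-Penrose generalized inverse
G2 = np.array([[1 / s, 0.0], [0.0, 0.0]])   # a different generalized inverse
# Check the g-inverse property (X^T X) G2 (X^T X) = X^T X
assert np.allclose(X.T @ X @ G2 @ (X.T @ X), X.T @ X)

est1 = c @ (G1 @ (X.T @ Y))
est2 = c @ (G2 @ (X.T @ Y))
print(np.allclose(est1, est2))  # True: the estimate does not depend on G
```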
The Gauss-Markov Theorem
• The Gauss-Markov Theorem is a powerful result, of course.
• It says that if we restrict to linear functions of Y that are unbiased for c^T β, then β̂_ols is as good as we can do.
• However, both unbiasedness and linearity are constraints on the system.
• There is a large literature on estimators that relax both.

Critical Result
• Result 7.1: Suppose that the m × m symmetric matrix Σ is positive semidefinite of rank r. Then there is an m × r matrix K such that Σ = K K^T, and K^T K is positive definite of rank r.
• First use the spectral decomposition
  Σ = P [ D  0 ; 0  0 ] P^T,
  where P is orthogonal and D is the r × r diagonal matrix of the positive eigenvalues of Σ.
• Now note that we can rewrite
  [ D  0 ; 0  0 ] = [ D^{1/2} ; 0_{(m−r)×r} ] [ D^{1/2}  0_{r×(m−r)} ] = [ D^{1/2} ; 0_{(m−r)×r} ] [ D^{1/2} ; 0_{(m−r)×r} ]^T.
• Hence Σ = K K^T, where
  K = P [ D^{1/2} ; 0_{(m−r)×r} ].
• Because P^T P = I, we get K^T K = D, as claimed.

Generalized Least Squares (GLS)
• The GLS model occurs when cov(ε | X) ≠ σ² I.
• Some examples later.
• Let us write
  Y = Xβ + ε;  E(Y | X) = Xβ;  cov(ε | X) = σ² V;  cov(Y | X) = σ² V,
  where V is positive definite (it is by definition symmetric).
• Since V is symmetric and positive definite, we can write it as V = K K^T, where K is nonsingular. (why?)
• Define
  Y_* = K^{-1} Y;  X_* = K^{-1} X;  ε_* = K^{-1} ε.
• Then, because K is invertible, knowing X is equivalent to knowing X_*, and
  E(Y_* | X_*) = E(Y_* | X) = X_* β;  cov(Y_* | X_*) = σ² I.  (why?)
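Result 7.1 can be checked numerically. The sketch below (hypothetical Σ) builds K from the eigendecomposition of a rank-3 positive semidefinite 5 × 5 matrix, then verifies Σ = K K^T with K^T K positive definite.

```python
import numpy as np

rng = np.random.default_rng(2)
m, r = 5, 3
A = rng.normal(size=(m, r))
Sigma = A @ A.T                         # symmetric positive semidefinite, rank r

# Spectral decomposition: keep only the r positive eigenvalues
w, P = np.linalg.eigh(Sigma)
keep = w > 1e-10
K = P[:, keep] * np.sqrt(w[keep])       # K = P [D^{1/2}; 0], collapsed to m x r

print(np.allclose(K @ K.T, Sigma))              # True: Sigma = K K^T
print(np.all(np.linalg.eigvalsh(K.T @ K) > 0))  # True: K^T K positive definite
```

Here K^T K equals D, the diagonal matrix of positive eigenvalues, exactly as the proof shows.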
Generalized Least Squares (GLS)
• The OLS estimator for this transformed model is the GLS estimator:
  β̂_gls = (X_*^T X_*)^− X_*^T Y_* = (X^T V^{-1} X)^− X^T V^{-1} Y.
• Remember,
  E(Y_* | X_*) = E(Y_* | X) = X_* β;  cov(Y_* | X_*) = σ² I.
• The GLS estimator is the OLS estimator for this equivalent model. We minimize
  (Y_* − X_* β)^T (Y_* − X_* β) = (Y − Xβ)^T V^{-1} (Y − Xβ).  (homework)
• Since (X^T V^{-1} X) is symmetric, its Moore-Penrose inverse is as well, and if you use the Moore-Penrose inverse, then
  E(β̂_gls | X) = (X^T V^{-1} X)^− (X^T V^{-1} X) β;  cov(β̂_gls | X) = σ² (X^T V^{-1} X)^−.

Gauss-Markov for GLS
• Result 7.2: Let c^T β be estimable. Then c^T β̂_gls is unbiased for c^T β and has minimum variance among all linear estimators that are also unbiased for c^T β.
• Because K is invertible, an equivalent model is
  E(Y_* | X_*) = E(Y_* | X) = X_* β;  cov(Y_* | X_*) = σ² I.
• Note that, since c^T β is estimable and K is invertible,
  c^T = b^T X = b^T K K^{-1} X = b_*^T X_*,  where b_* = K^T b.
• Thus, if c^T β is estimable in the original model, it is estimable in this equivalent model, and hence we can apply Gauss-Markov.

Weighted Least Squares, an Example of GLS
• Suppose that V = diag(w_1², w_2², ..., w_n²).
• This problem is generally known as weighted least squares.
• As homework, I have asked you to show that GLS is equivalent to dividing Y_i and X_i by w_i.
• As homework, I have asked you to show that GLS minimizes
  Σ_{i=1}^n (Y_i − X_i^T β)² / w_i².
• This case is known as weighted least squares, with the weights inversely proportional to the regression error variances.

The Linear Mixed Model
• The most general form of the linear mixed model is that, for independent u and ε,
  Y = Xβ + Zu + ε;  E(u | X, Z) = E(ε | X, Z) = 0;  cov(u | X, Z) = D;  cov(ε | X, Z) = σ² I.
• Here, D is not necessarily a diagonal matrix.
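Returning to the weighted least squares example above, the claimed equivalence with rescaled OLS is easy to verify numerically. A sketch with simulated data (hypothetical weights and coefficients): with V = diag(w_i²), the GLS estimator equals OLS after dividing each row of (Y, X) by w_i.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 12
X = np.column_stack([np.ones(n), rng.normal(size=n)])
w = rng.uniform(0.5, 2.0, size=n)        # V = diag(w_i^2)
Y = X @ np.array([1.0, 2.0]) + w * rng.normal(size=n)

# GLS: (X^T V^{-1} X)^{-1} X^T V^{-1} Y
Vinv = np.diag(1.0 / w**2)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ Y)

# Equivalent OLS after dividing Y_i and the i-th row of X by w_i
Xs, Ys = X / w[:, None], Y / w
beta_ols = np.linalg.lstsq(Xs, Ys, rcond=None)[0]

print(np.allclose(beta_gls, beta_ols))  # True
```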
Repeated Measures
• The classic example of this model is repeated measures with a random intercept. There are N individuals and m_i observations per individual, and n = Σ_{i=1}^N m_i.
• The model takes the form
  Y_ij = X_i^T β + u_i + ε_ij,
  where u_i has mean 0 and variance σ_u², and ε_ij has mean 0 and variance σ².
• Note how the data within an individual are correlated: for j ≠ k,
  cov(Y_ij, Y_ik | X_i) = σ_u².

Repeated Measures
• The idea of repeated measures analysis is to get better inference for β while taking into account the correlation among the observations for an individual.
• In practice, the gain in efficiency can be very large.

Repeated Measures
• We can write the repeated measures model as a linear mixed model. Let u = (u_1, ..., u_N)^T, let e_i be an m_i-vector of ones, and define
  Z = [ e_1  0   0   ···  0  ;
        0   e_2  0   ···  0  ;
        0   0   e_3  ···  0  ;
        ⋮    ⋮    ⋮         ⋮  ;
        0   0   0   ···  e_N ].
• This is an n × N matrix, where n is the total sample size and N is the number of individuals.

Repeated Measures and the Linear Mixed Model
• With X and ε defined in the usual way, and λ = σ_u²/σ², we have
  Y = Xβ + Zu + ε.
• Since cov(u | X, Z) = σ_u² I_N, we have that
  E(Y | X, Z) = Xβ;  cov(Y | X, Z) = σ_u² Z Z^T + σ² I = σ² (I + λ Z Z^T) = σ² V.
• Note that, from a homework assignment, V is positive definite even though Z Z^T is not.
• If we knew V, i.e., if we knew λ = σ_u²/σ², then from Gauss-Markov the GLS estimator is just
  β̂_gls = β̂_gls(λ) = (X^T V^{-1} X)^− X^T V^{-1} Y.
• Later we will learn how to estimate the variance component σ_u².
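A minimal sketch of the random-intercept construction above (simulated data, hypothetical dimensions): build the block design Z from the group sizes m_i, form V = I + λ Z Z^T, and compute the GLS estimator assuming λ known.

```python
import numpy as np

rng = np.random.default_rng(4)
m = np.array([3, 4, 2, 5])               # m_i observations per individual
N, n = len(m), int(m.sum())
sigma2_u, sigma2 = 2.0, 1.0
lam = sigma2_u / sigma2

# Z: n x N block matrix whose i-th column is e_i, a vector of m_i ones
Z = np.zeros((n, N))
idx = np.repeat(np.arange(N), m)         # individual index for each row
Z[np.arange(n), idx] = 1.0

X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, -0.5])
u = rng.normal(scale=np.sqrt(sigma2_u), size=N)
Y = X @ beta + Z @ u + rng.normal(scale=np.sqrt(sigma2), size=n)

V = np.eye(n) + lam * Z @ Z.T            # cov(Y | X, Z) = sigma^2 V
Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ Y)
print(beta_gls)                          # GLS estimate of (beta_0, beta_1)
```

V is positive definite (all eigenvalues are at least 1), so the inverse exists even though Z Z^T is singular.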
Distribution Theory Relevant for Linear Models
• The standard normal distribution, denoted Z = Normal(0, 1), has density function
  f(z) = (2π)^{-1/2} exp(−z²/2).
• Its first four moments are E(Z) = 0, E(Z²) = 1, E(Z³) = 0 and E(Z⁴) = 3.
• We write X = Normal(µ, σ²) if X = µ + σZ, where Z is standard normal. Its density function is
  f_X(x; µ, σ²) = (2πσ²)^{-1/2} exp{−(2σ²)^{-1}(x − µ)²}.
• If X = Normal(µ, σ²), its moment generating function is
  m_X(t) = E{exp(tX)} = ∫ exp(tx) f_X(x; µ, σ²) dx = exp(tµ + t²σ²/2).
• If X = Normal(µ, σ²), then Z = (X − µ)/σ = Normal(0, 1). (homework)
• If it exists, the moment generating function determines the distribution function of a random variable, i.e., if two random variables have the same mgf on an open interval including zero, then they have the same distribution.
• The mgf determines the moments:
  E(X^k) = ∂^k m_X(t)/∂t^k |_{t=0}.

Multivariate Standard Normal
• Z = (Z_1, ..., Z_n)^T is multivariate standard normal if the Z_i are independent standard normal random variables, and its density function is
  f_Z(z) = (2π)^{-n/2} exp(−z^T z/2).
• The mgf of a multivariate random vector X is
  m_X(b) = E{exp(b^T X)}.
• If the multivariate random vector X = (X_1, ..., X_n)^T has mgf m_X(b), and if X_i has mgf m_i(t_i), then the X_i are independent if and only if, writing b = (t_1, ..., t_n)^T,
  m_X(b) = Π_{i=1}^n m_i(t_i).
• Suppose that X has mean µ and covariance matrix Σ, where rank(Σ) = r > 0, and hence Σ has r positive eigenvalues.
• Using the spectral decomposition (Result 7.1), we can write Σ = K K^T.
• Definition 7.1: X = Normal(µ, Σ), the multivariate normal with mean µ, covariance matrix Σ and rank r, if it has the same distribution as µ + KZ, where Z is an r-dimensional multivariate standard normal.

Multivariate Normal
• Result 7.3: Suppose that X = Normal(µ, Σ). Then (homework) its mgf is
  m_X(b) = exp(b^T µ + b^T Σ b/2).
• In addition, let a and B be constants. Then, by another mgf calculation,
  a + BX = Normal(a + Bµ, B Σ B^T).
• Thus, if X = (X_1^T, X_2^T)^T and µ = (µ_1^T, µ_2^T)^T, and if
  Σ = [ Σ_11  Σ_12 ; Σ_21  Σ_22 ],
  then X_1 = Normal(µ_1, Σ_11) and X_2 = Normal(µ_2, Σ_22).
• Result 7.4: In other words, the marginal distributions of a multivariate normal are also multivariate normal.
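Definition 7.1 and Result 7.4 can be illustrated by simulation (hypothetical µ and Σ; the Cholesky factor is used as one valid choice of K, since it satisfies K K^T = Σ): sample X = µ + KZ and check that the empirical mean and the empirical covariance of the first block match µ and Σ_11.

```python
import numpy as np

rng = np.random.default_rng(6)
mu = np.array([1.0, -1.0, 0.5])
B = rng.normal(size=(3, 3))
Sigma = B @ B.T + np.eye(3)          # positive definite, so rank r = 3

K = np.linalg.cholesky(Sigma)        # one valid K with K K^T = Sigma
Z = rng.standard_normal((3, 500_000))
X = mu[:, None] + K @ Z              # draws from Normal(mu, Sigma)

# Result 7.4: the first two coordinates are Normal(mu_1, Sigma_11)
S11 = np.cov(X[:2])
print(np.allclose(X.mean(axis=1), mu, atol=0.05))   # empirical mean near mu
print(np.allclose(S11, Sigma[:2, :2], atol=0.1))    # empirical cov near Sigma_11
```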