a $t_n$ random variable follows an $F_{1,n}$ distribution (see Problem 6 at the end of this chapter).

6.3 The Sample Mean and the Sample Variance

Let $X_1, \ldots, X_n$ be independent $N(\mu, \sigma^2)$ random variables; we sometimes refer to them as a sample from a normal distribution. In this section, we will find the joint and marginal distributions of
\[
\overline{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \qquad S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \overline{X})^2
\]
These are called the sample mean and the sample variance, respectively. First note that because $\overline{X}$ is a linear combination of independent normal random variables, it is normally distributed with
\[
E(\overline{X}) = \mu \qquad \operatorname{Var}(\overline{X}) = \frac{\sigma^2}{n}
\]
As a preliminary to showing that $\overline{X}$ and $S^2$ are independently distributed, we establish the following theorem.

THEOREM A
The random variable $\overline{X}$ and the vector of random variables $(X_1 - \overline{X}, X_2 - \overline{X}, \ldots, X_n - \overline{X})$ are independent.

Proof
At the level of this course, it is difficult to give a proof that provides sufficient insight into why this result is true; a rigorous proof depends essentially on geometric properties of the multivariate normal distribution, which this book does not cover. We present a proof based on moment-generating functions; in particular, we will show that the joint moment-generating function
\[
M(s, t_1, \ldots, t_n) = E\{\exp[s\overline{X} + t_1(X_1 - \overline{X}) + \cdots + t_n(X_n - \overline{X})]\}
\]
factors into the product of two moment-generating functions—one of $\overline{X}$ and the other of $(X_1 - \overline{X}), \ldots, (X_n - \overline{X})$. The factoring implies (Section 4.5) that the random variables are independent of each other and is accomplished through some algebraic trickery. First we observe that since
\[
\sum_{i=1}^{n} t_i (X_i - \overline{X}) = \sum_{i=1}^{n} t_i X_i - n \overline{X} \bar{t}
\]
then
\[
s\overline{X} + \sum_{i=1}^{n} t_i (X_i - \overline{X}) = \sum_{i=1}^{n} \left[ \frac{s}{n} + (t_i - \bar{t}\,) \right] X_i = \sum_{i=1}^{n} a_i X_i
\]
where
\[
a_i = \frac{s}{n} + (t_i - \bar{t}\,)
\]
Furthermore, we observe that
\[
\sum_{i=1}^{n} a_i = s \qquad \sum_{i=1}^{n} a_i^2 = \frac{s^2}{n} + \sum_{i=1}^{n} (t_i - \bar{t}\,)^2
\]
Now we have
\[
M(s, t_1, \ldots, t_n) = M_{X_1 \cdots X_n}(a_1, \ldots, a_n)
\]
and since the $X_i$ are independent normal random variables, we have
\begin{align*}
M(s, t_1, \ldots, t_n) &= \prod_{i=1}^{n} M_{X_i}(a_i) = \prod_{i=1}^{n} \exp\left( \mu a_i + \frac{\sigma^2}{2} a_i^2 \right) = \exp\left( \mu \sum_{i=1}^{n} a_i + \frac{\sigma^2}{2} \sum_{i=1}^{n} a_i^2 \right) \\
&= \exp\left[ \mu s + \frac{\sigma^2}{2} \left( \frac{s^2}{n} \right) + \frac{\sigma^2}{2} \sum_{i=1}^{n} (t_i - \bar{t}\,)^2 \right] \\
&= \exp\left( \mu s + \frac{\sigma^2}{2n} s^2 \right) \exp\left[ \frac{\sigma^2}{2} \sum_{i=1}^{n} (t_i - \bar{t}\,)^2 \right]
\end{align*}
The first factor is the mgf of $\overline{X}$. Since the mgf of the vector $(X_1 - \overline{X}, \ldots, X_n - \overline{X})$ can be obtained by setting $s = 0$ in $M$, the second factor is this mgf. ■

COROLLARY A
$\overline{X}$ and $S^2$ are independently distributed.

Proof
This follows immediately since $S^2$ is a function of the vector $(X_1 - \overline{X}, \ldots, X_n - \overline{X})$, which is independent of $\overline{X}$. ■

The next theorem gives the marginal distribution of $S^2$.

THEOREM B
The distribution of $(n-1)S^2/\sigma^2$ is the chi-square distribution with $n-1$ degrees of freedom.

Proof
We first note that
\[
\frac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \mu)^2 = \sum_{i=1}^{n} \left( \frac{X_i - \mu}{\sigma} \right)^2 \sim \chi^2_n
\]
Also,
\[
\frac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \mu)^2 = \frac{1}{\sigma^2} \sum_{i=1}^{n} [(X_i - \overline{X}) + (\overline{X} - \mu)]^2
\]
Expanding the square and using the fact that $\sum_{i=1}^{n} (X_i - \overline{X}) = 0$, we obtain
\[
\frac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \mu)^2 = \frac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \overline{X})^2 + \left( \frac{\overline{X} - \mu}{\sigma/\sqrt{n}} \right)^2
\]
This is a relation of the form $W = U + V$. Since $U$ and $V$ are independent by Corollary A, $M_W(t) = M_U(t) M_V(t)$. $W$ and $V$ both follow chi-square distributions, with $W \sim \chi^2_n$ and $V \sim \chi^2_1$, so
\[
M_U(t) = \frac{M_W(t)}{M_V(t)} = \frac{(1 - 2t)^{-n/2}}{(1 - 2t)^{-1/2}} = (1 - 2t)^{-(n-1)/2}
\]
The last expression is the mgf of a random variable with a $\chi^2_{n-1}$ distribution. ■

One final result concludes this chapter's collection.

COROLLARY B
Let $\overline{X}$ and $S^2$ be as given at the beginning of this section. Then
\[
\frac{\overline{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}
\]

Proof
We simply express the given ratio in a different form:
\[
\frac{\overline{X} - \mu}{S/\sqrt{n}} = \frac{\left( \dfrac{\overline{X} - \mu}{\sigma/\sqrt{n}} \right)}{\sqrt{S^2/\sigma^2}}
\]
The latter is the ratio of an $N(0, 1)$ random variable to the square root of an independent random variable with a $\chi^2_{n-1}$ distribution divided by its degrees of freedom.
Thus, from the definition in Section 6.2, the ratio follows a $t$ distribution with $n - 1$ degrees of freedom. ■

6.4 Problems

1. Prove Proposition A of Section 6.2.
2. Prove Proposition B of Section 6.2.
3. Let $\overline{X}$ be the average of a sample of 16 independent normal random variables with mean 0 and variance 1. Determine $c$ such that $P(|\overline{X}| < c) = .5$.
4. If $T$ follows a $t_7$ distribution, find $t_0$ such that (a) $P(|T| < t_0) = .9$ and (b) $P(T > t_0) = .05$.
5. Show that if $X \sim F_{n,m}$, then $X^{-1} \sim F_{m,n}$.
6. Show that if $T \sim t_n$, then $T^2 \sim F_{1,n}$.
7. Show that the Cauchy distribution and the $t$ distribution with 1 degree of freedom are the same.
8. Show that if $X$ and $Y$ are independent exponential random variables with $\lambda = 1$, then $X/Y$ follows an $F$ distribution. Also, identify the degrees of freedom.
9. Find the mean and variance of $S^2$, where $S^2$ is as in Section 6.3.
10. Show how to use the chi-square distribution to calculate $P(a < S^2/\sigma^2 < b)$.
11. Let $X_1, \ldots, X_n$ be a sample from an $N(\mu_X, \sigma^2)$ distribution and $Y_1, \ldots, Y_m$ be an independent sample from an $N(\mu_Y, \sigma^2)$ distribution. Show how to use the $F$ distribution to find $P(S_X^2/S_Y^2 > c)$.

CHAPTER 7
Survey Sampling

7.1 Introduction

Resting on the probabilistic foundations of the preceding chapters, this chapter marks the beginning of our study of statistics by introducing the subject of survey sampling. As well as being of considerable intrinsic interest and practical utility, the development of the elementary theory of survey sampling serves to introduce several concepts and techniques that will recur and be amplified in later chapters.

Sample surveys are used to obtain information about a large population by examining only a small fraction of that population. Sampling techniques have been used in many fields, such as the following:

• Governments survey human populations; for example, the U.S. government conducts health surveys and census surveys.
• Sampling techniques have been extensively employed in agriculture to estimate such quantities as the total acreage of wheat in a state by surveying a sample of farms.
• The Interstate Commerce Commission has carried out sampling studies of rail and highway traffic. In one such study, records of shipments of household goods by motor carriers were sampled to evaluate the accuracy of preshipment estimates of charges, claims for damages, and other variables.
• In the practice of quality control, the output of a manufacturing process may be sampled in order to examine the items for defects.
• During audits of the financial records of large companies, sampling techniques may be used when examination of the entire set of records is impractical.

The sampling techniques discussed here are probabilistic in nature—each member of the population has a specified probability of being included in the sample, and the actual composition of the sample is random. Such techniques differ markedly from the type of sampling scheme in which particular population members are included in the sample because the investigator thinks they are typical in some way. Such a scheme may be effective in some situations, but there is no way mathematically to guarantee its unbiasedness (a term that will be precisely defined later) or to estimate the magnitude of any error committed, such as that arising from estimating the population mean by the sample mean. We will see that using a random sampling technique has the consequence that estimates can be guaranteed to be unbiased and probabilistic
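The defining property of a probability sample, that each member of the population has a specified probability of being included, can be illustrated with a short simulation. The sketch below uses a hypothetical population of farm wheat acreages (the numbers are invented for illustration, not taken from the text): it draws simple random samples without replacement, a scheme under which each of the $N$ members has inclusion probability $n/N$, and checks empirically that the sample mean behaves as an unbiased estimate of the population mean.

```python
import random
import statistics

def simple_random_sample(population, n, rng=random):
    """Draw a simple random sample of size n without replacement.

    Every subset of size n is equally likely, so each member has
    inclusion probability n / N, the defining property of this
    probability sampling scheme.
    """
    return rng.sample(population, n)

# A hypothetical finite population, e.g., wheat acreage on 1000 farms.
rng = random.Random(0)
population = [rng.uniform(50, 500) for _ in range(1000)]
mu = statistics.mean(population)  # the population mean to be estimated

# Repeatedly draw samples of size 25; unbiasedness means the average
# of the sample-mean estimates settles near the population mean mu.
estimates = [statistics.mean(simple_random_sample(population, 25, rng))
             for _ in range(2000)]
print(statistics.mean(estimates) - mu)  # small relative to the spread of mu
```

Each individual estimate varies from sample to sample, but averaging many of them recovers the population mean; Chapter 7 makes this notion of unbiasedness, and the size of the sampling error, precise.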