Rice, J. A., Mathematical Statistics and Data Analysis
Thus the square of a $t_n$ random variable follows an $F_{1,n}$ distribution (see Problem 6 at the end of this chapter).
6.3 The Sample Mean and the Sample Variance
Let $X_1, \ldots, X_n$ be independent $N(\mu, \sigma^2)$ random variables; we sometimes refer to them as a sample from a normal distribution. In this section, we will find the joint and marginal distributions of
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$$
These are called the sample mean and the sample variance, respectively. First note that because $\bar{X}$ is a linear combination of independent normal random variables, it is normally distributed with
$$E(\bar{X}) = \mu, \qquad \operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}$$
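Although the text contains no code, these two facts are easy to check by simulation. The following NumPy sketch (with arbitrary choices $\mu = 5$, $\sigma = 2$, and $n = 25$) draws many samples of size $n$ and compares the empirical mean and variance of $\bar{X}$ with $\mu$ and $\sigma^2/n$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 5.0, 2.0, 25

# Draw 100,000 samples of size n and compute the sample mean of each.
xbars = rng.normal(mu, sigma, size=(100_000, n)).mean(axis=1)

# Empirically, E(X-bar) should be close to mu = 5.0
# and Var(X-bar) close to sigma^2 / n = 0.16.
print(xbars.mean())
print(xbars.var())
```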
As a preliminary to showing that $\bar{X}$ and $S^2$ are independently distributed, we establish the following theorem.

THEOREM A
The random variable $\bar{X}$ and the vector of random variables $(X_1 - \bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X})$ are independent.
Proof
At the level of this course, it is difficult to give a proof that provides sufficient insight into why this result is true; a rigorous proof depends essentially on geometric properties of the multivariate normal distribution, which this book does not cover. We present a proof based on moment-generating functions; in particular, we will show that the joint moment-generating function
$$M(s, t_1, \ldots, t_n) = E\{\exp[s\bar{X} + t_1(X_1 - \bar{X}) + \cdots + t_n(X_n - \bar{X})]\}$$
factors into the product of two moment-generating functions, one of $\bar{X}$ and the other of $(X_1 - \bar{X}, \ldots, X_n - \bar{X})$. The factoring implies (Section 4.5) that the random variables are independent of each other and is accomplished through some algebraic trickery. First we observe that since
$$\sum_{i=1}^{n} t_i(X_i - \bar{X}) = \sum_{i=1}^{n} t_i X_i - n\bar{X}\,\bar{t}$$
then
$$s\bar{X} + \sum_{i=1}^{n} t_i(X_i - \bar{X}) = \sum_{i=1}^{n}\left[\frac{s}{n} + (t_i - \bar{t}\,)\right]X_i = \sum_{i=1}^{n} a_i X_i$$
where
$$a_i = \frac{s}{n} + (t_i - \bar{t}\,)$$
Furthermore, we observe that
$$\sum_{i=1}^{n} a_i = s, \qquad \sum_{i=1}^{n} a_i^2 = \frac{s^2}{n} + \sum_{i=1}^{n} (t_i - \bar{t}\,)^2$$
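These two identities are purely algebraic, so they can be spot-checked numerically for any particular values. A minimal NumPy sketch (the choice $n = 6$ and the random $s$ and $t_i$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
s = rng.normal()
t = rng.normal(size=n)
tbar = t.mean()

# The coefficients a_i = s/n + (t_i - t-bar) from the proof.
a = s / n + (t - tbar)

# Check: sum a_i = s  and  sum a_i^2 = s^2/n + sum (t_i - t-bar)^2.
print(np.isclose(a.sum(), s))
print(np.isclose((a**2).sum(), s**2 / n + ((t - tbar)**2).sum()))
```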
Now we have
$$M(s, t_1, \ldots, t_n) = M_{X_1 \cdots X_n}(a_1, \ldots, a_n)$$
and since the $X_i$ are independent normal random variables, we have
$$\begin{aligned}
M(s, t_1, \ldots, t_n) &= \prod_{i=1}^{n} M_{X_i}(a_i) \\
&= \prod_{i=1}^{n} \exp\left(\mu a_i + \frac{\sigma^2}{2}a_i^2\right) \\
&= \exp\left(\mu\sum_{i=1}^{n} a_i + \frac{\sigma^2}{2}\sum_{i=1}^{n} a_i^2\right) \\
&= \exp\left[\mu s + \frac{\sigma^2}{2}\left(\frac{s^2}{n}\right) + \frac{\sigma^2}{2}\sum_{i=1}^{n}(t_i - \bar{t}\,)^2\right] \\
&= \exp\left(\mu s + \frac{\sigma^2}{2n}s^2\right)\exp\left[\frac{\sigma^2}{2}\sum_{i=1}^{n}(t_i - \bar{t}\,)^2\right]
\end{aligned}$$
The first factor is the mgf of $\bar{X}$. Since the mgf of the vector $(X_1 - \bar{X}, \ldots, X_n - \bar{X})$ can be obtained by setting $s = 0$ in $M$, the second factor is this mgf. ■
COROLLARY A
$\bar{X}$ and $S^2$ are independently distributed.
Proof
This follows immediately since $S^2$ is a function of the vector $(X_1 - \bar{X}, \ldots, X_n - \bar{X})$, which is independent of $\bar{X}$. ■
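The independence of $\bar{X}$ and $S^2$ can be illustrated (though of course not proved) by simulation. In the following sketch (NumPy, with an arbitrary $n = 10$ and standard normal data), the empirical correlation between $\bar{X}$ and $S^2$ across many samples should be near zero; note that zero correlation is only a necessary consequence of independence, not a sufficient one:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10
x = rng.normal(0.0, 1.0, size=(200_000, n))

xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)   # ddof=1 gives the n - 1 divisor of S^2

# Independence implies zero correlation between X-bar and S^2;
# the empirical correlation over 200,000 samples should be tiny.
r = np.corrcoef(xbar, s2)[0, 1]
print(abs(r) < 0.01)
```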
The next theorem gives the marginal distribution of S2.
THEOREM B
The distribution of $(n-1)S^2/\sigma^2$ is the chi-square distribution with $n-1$ degrees of freedom.
Proof
We first note that
$$\frac{1}{\sigma^2}\sum_{i=1}^{n}(X_i - \mu)^2 = \sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi_n^2$$
Also,
$$\frac{1}{\sigma^2}\sum_{i=1}^{n}(X_i - \mu)^2 = \frac{1}{\sigma^2}\sum_{i=1}^{n}\left[(X_i - \bar{X}) + (\bar{X} - \mu)\right]^2$$
Expanding the square and using the fact that $\sum_{i=1}^{n}(X_i - \bar{X}) = 0$, we obtain
$$\frac{1}{\sigma^2}\sum_{i=1}^{n}(X_i - \mu)^2 = \frac{1}{\sigma^2}\sum_{i=1}^{n}(X_i - \bar{X})^2 + \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2$$
This is a relation of the form $W = U + V$, where $U = (n-1)S^2/\sigma^2$ and $V = [(\bar{X} - \mu)/(\sigma/\sqrt{n})]^2$. Since $U$ and $V$ are independent by Corollary A, $M_W(t) = M_U(t)M_V(t)$. $W$ and $V$ both follow chi-square distributions ($W \sim \chi_n^2$ and $V \sim \chi_1^2$), so
$$M_U(t) = \frac{M_W(t)}{M_V(t)} = \frac{(1-2t)^{-n/2}}{(1-2t)^{-1/2}} = (1-2t)^{-(n-1)/2}$$
The last expression is the mgf of a random variable with a $\chi_{n-1}^2$ distribution. ■
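As a numerical check on Theorem B (a simulation sketch with arbitrary $\sigma = 2$ and $n = 8$): a $\chi_{n-1}^2$ variable has mean $n-1$ and variance $2(n-1)$, so the simulated values of $(n-1)S^2/\sigma^2$ should have mean near 7 and variance near 14:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, n = 2.0, 8
x = rng.normal(0.0, sigma, size=(200_000, n))

# w = (n - 1) S^2 / sigma^2 for each simulated sample.
w = (n - 1) * x.var(axis=1, ddof=1) / sigma**2

# Chi-square with n - 1 = 7 degrees of freedom: mean 7, variance 14.
print(w.mean())
print(w.var())
```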
One final result concludes this chapter’s collection.
COROLLARY B
Let $\bar{X}$ and $S^2$ be as given at the beginning of this section. Then
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$
Proof
We simply express the given ratio in a different form:
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{S^2/\sigma^2}}$$
The latter is the ratio of an $N(0, 1)$ random variable to the square root of an independent random variable with a $\chi_{n-1}^2$ distribution divided by its degrees of freedom. Thus, from the definition in Section 6.2, the ratio follows a $t$ distribution with $n-1$ degrees of freedom. ■
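Corollary B can likewise be checked by simulation. With the arbitrary choices $\mu = 3$, $\sigma = 1.5$, and $n = 10$ below, the simulated $t$ statistics should have mean near 0 and variance near $\nu/(\nu - 2) = 9/7 \approx 1.29$, the variance of a $t_9$ distribution:

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n = 3.0, 1.5, 10
x = rng.normal(mu, sigma, size=(200_000, n))

# The studentized mean (X-bar - mu) / (S / sqrt(n)) for each sample.
tstat = (x.mean(axis=1) - mu) / (x.std(axis=1, ddof=1) / np.sqrt(n))

# t with 9 degrees of freedom: mean 0, variance 9/7.
print(tstat.mean())
print(tstat.var())
```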
6.4 Problems
1. Prove Proposition A of Section 6.2.
2. Prove Proposition B of Section 6.2.
3. Let $\bar{X}$ be the average of a sample of 16 independent normal random variables with mean 0 and variance 1. Determine $c$ such that $P(|\bar{X}| < c) = .5$.
4. If $T$ follows a $t_7$ distribution, find $t_0$ such that (a) $P(|T| < t_0) = .9$ and (b) $P(T > t_0) = .05$.
5. Show that if $X \sim F_{n,m}$, then $X^{-1} \sim F_{m,n}$.
6. Show that if $T \sim t_n$, then $T^2 \sim F_{1,n}$.
7. Show that the Cauchy distribution and the $t$ distribution with 1 degree of freedom are the same.
8. Show that if $X$ and $Y$ are independent exponential random variables with $\lambda = 1$, then $X/Y$ follows an $F$ distribution. Also, identify the degrees of freedom.
9. Find the mean and variance of $S^2$, where $S^2$ is as in Section 6.3.
10. Show how to use the chi-square distribution to calculate $P(a < S^2/\sigma^2 < b)$.
11. Let $X_1, \ldots, X_n$ be a sample from an $N(\mu_X, \sigma^2)$ distribution and $Y_1, \ldots, Y_m$ be an independent sample from an $N(\mu_Y, \sigma^2)$ distribution. Show how to use the $F$ distribution to find $P(S_X^2/S_Y^2 > c)$.
CHAPTER 7
Survey Sampling
7.1 Introduction
Resting on the probabilistic foundations of the preceding chapters, this chapter marks
the beginning of our study of statistics by introducing the subject of survey sampling.
As well as being of considerable intrinsic interest and practical utility, the development
of the elementary theory of survey sampling serves to introduce several concepts and
techniques that will recur and be amplified in later chapters.
Sample surveys are used to obtain information about a large population by examining only a small fraction of that population. Sampling techniques have been used in many fields, such as the following:
• Governments survey human populations; for example, the U.S. government conducts health surveys and census surveys.
• Sampling techniques have been extensively employed in agriculture to estimate
such quantities as the total acreage of wheat in a state by surveying a sample of
farms.
• The Interstate Commerce Commission has carried out sampling studies of rail and
highway traffic. In one such study, records of shipments of household goods by
motor carriers were sampled to evaluate the accuracy of preshipment estimates of
charges, claims for damages, and other variables.
• In the practice of quality control, the output of a manufacturing process may be
sampled in order to examine the items for defects.
• During audits of the financial records of large companies, sampling techniques may
be used when examination of the entire set of records is impractical.
The sampling techniques discussed here are probabilistic in nature: each member of the population has a specified probability of being included in the sample, and the actual composition of the sample is random. Such techniques differ markedly from the type of sampling scheme in which particular population members are included in the sample because the investigator thinks they are typical in some way. Such a scheme may be effective in some situations, but there is no way mathematically to guarantee its unbiasedness (a term that will be precisely defined later) or to estimate the magnitude of any error committed, such as that arising from estimating the population mean by the sample mean. We will see that using a random sampling technique has the consequence that estimates can be guaranteed to be unbiased and probabilistic