T0review

UFV

Lorena Ribeiro

23/09/2014

Measures of Central Tendency and Dispersion [ST&D p. 16-27] 
Individual values of a population are designated $Y_i$, i = 1,...,N, where N is the size of the population.
Individual values of a sample are also denoted $Y_i$, i = 1,...,n, where n is the size of the sample.
Greek letters are used for population parameters (µ = pop. mean; σ² = pop. variance).
Mean or average (measure of central tendency)
 Pop. mean: $\mu = \frac{\sum_{i=1}^{N} Y_i}{N}$
 Sample mean: $\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n}$

Variance (measure of dispersion of the individuals about the mean)
 Pop. variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (Y_i - \mu)^2}{N}$
 Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}$

 The quantities $(Y_i - \bar{Y})$ are called deviations.
 
To express these measures of dispersion in the original units of observation:

 Pop. standard deviation: $\sigma = \sqrt{\sigma^2}$
 Sample standard deviation: $s = \sqrt{s^2}$
 
To express the standard deviation in units of the mean (or %):

 Pop. coeff. of variation: $CV = \frac{\sigma}{\mu}$
 Sample coeff. of variation: $CV = \frac{s}{\bar{Y}}$
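
As a rough illustration, the sample mean, variance, standard deviation, and coefficient of variation can be computed directly from their definitions. This Python sketch uses the 14 malt extract values quoted later in these notes (ST&D p. 30); the function names are illustrative choices, not part of ST&D.

```python
import math

# Malt extract values quoted later in these notes (ST&D p. 30).
malt = [77.7, 76.0, 76.9, 74.6, 74.7, 76.5, 74.2,
        75.4, 76.0, 76.0, 73.9, 77.4, 76.6, 77.3]

def sample_mean(y):
    """Sum of the observations divided by the sample size n."""
    return sum(y) / len(y)

def sample_variance(y):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    ybar = sample_mean(y)
    return sum((yi - ybar) ** 2 for yi in y) / (len(y) - 1)

mean = sample_mean(malt)
variance = sample_variance(malt)
sd = math.sqrt(variance)          # back in the original units
cv_percent = 100 * sd / mean      # standard deviation as a % of the mean

print(f"mean = {mean:.3f}, s^2 = {variance:.3f}, s = {sd:.3f}, CV = {cv_percent:.2f}%")
```

The mean and standard deviation agree with the values 75.943 and 1.227 used for the normal line near the end of these notes.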
 
Visualization of central tendency and dispersion using boxplots
 Review ST&D p. 58 (Estimation and inference) and p. 53 (3.8 Distribution of means).
Box Plots
[Figure: box plot of Y marking the median, the mean, the interquartile (IQ) range (the box), and whiskers extending up to 1.5 IQ range beyond the box.]
Outliers
 0 : >1.5 IQ and <3 IQ
 * : >3 IQ
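
A minimal Python sketch of the outlier rule from the box plot legend. It assumes distances are measured from the edges of the box (the first and third quartiles) and uses the default quartile method of `statistics.quantiles`, which may differ slightly from the method used by SAS or ST&D. The function name `classify_outliers` and the appended values 82.0 and 90.0 are hypothetical; the base data are the malt extract values quoted later in these notes.

```python
import statistics

def classify_outliers(values):
    """Return (mild, extreme): values >1.5 IQ and values >3 IQ beyond the box."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iq = q3 - q1
    mild, extreme = [], []
    for y in values:
        # Distance beyond the nearer edge of the box (0 if inside the box).
        distance = max(q1 - y, y - q3, 0.0)
        if distance > 3 * iq:
            extreme.append(y)
        elif distance > 1.5 * iq:
            # A value at exactly 3 IQ is treated as mild in this sketch.
            mild.append(y)
    return mild, extreme

malt = [77.7, 76.0, 76.9, 74.6, 74.7, 76.5, 74.2,
        75.4, 76.0, 76.0, 73.9, 77.4, 76.6, 77.3]

print(classify_outliers(malt))            # no outliers in the malt data
print(classify_outliers(malt + [82.0]))   # hypothetical mild outlier
print(classify_outliers(malt + [90.0]))   # hypothetical extreme outlier
```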
Measures of dispersion of sample means 
 
An important population parameter is the sample variance of the mean ($\sigma^2_{\bar{Y}}$).
 
 
If you repeatedly sample a population by taking samples of size n, the variance 
of those sample means is what we call the sample variance of the mean. 
 
 
It relates very simply to the population variance:

 Variance of the mean: $\sigma^2_{\bar{Y}} = \frac{\sigma^2}{n}$
 
 
We can estimate $\sigma^2_{\bar{Y}}$ for a population by taking r independent, random samples
of size n from that population, calculating the sample means $\bar{Y}_i$, and then
calculating the variance of those sample means.
 
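 $s^2_{\bar{Y}} = \frac{\sum_{i=1}^{r} (\bar{Y}_i - \bar{\bar{Y}})^2}{r - 1}$, where $\bar{\bar{Y}}$ is the mean of the r sample means.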
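
The relationship between the variance of sample means and the population variance can be checked by simulation. The sketch below assumes a normal population with illustrative values µ = 50 and σ = 10 (these numbers are not from ST&D), draws r samples of size n, and compares the variance of the r sample means with σ²/n.

```python
import random
import statistics

rng = random.Random(42)          # fixed seed so the run is repeatable
mu, sigma = 50.0, 10.0           # assumed population mean and SD (illustrative only)
n, r = 25, 5000                  # sample size and number of samples

sample_means = []
for _ in range(r):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    sample_means.append(statistics.fmean(sample))

s2_ybar = statistics.variance(sample_means)   # divisor is r - 1
expected = sigma ** 2 / n                     # sigma^2 / n = 4.0

print(f"variance of sample means: {s2_ybar:.3f}")
print(f"sigma^2 / n:              {expected:.3f}")
```

The two printed values should be close but not identical, because the simulated variance is itself an estimate.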



 
 
 
The square root of $s^2_{\bar{Y}}$ is called the standard error (or standard deviation of a mean).

 Standard error: $s_{\bar{Y}} = \sqrt{s^2_{\bar{Y}}} = \frac{s}{\sqrt{n}}$
 
 As with the standard deviation, this is a quantity in the original units of 
observation. 
 
 The SE is important in determining confidence intervals and the powers 
of tests. 
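
For a single sample, the standard error can be sketched as s divided by the square root of n. The example below uses the 14 malt extract values listed later in these notes; the helper name `standard_error` is an illustrative choice.

```python
import math
import statistics

malt = [77.7, 76.0, 76.9, 74.6, 74.7, 76.5, 74.2,
        75.4, 76.0, 76.0, 73.9, 77.4, 76.6, 77.3]

def standard_error(sample):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

se = standard_error(malt)
print(f"SE = {se:.3f} (same units as the observations)")
```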
 
The Normal distribution (~N) 
 
If you measure a quantitative trait, most of the measurements will cluster near the
population mean (µ); as you consider values further and further from µ,
individuals exhibiting those values become rarer.
 
 
 
 
 
 
 
 
 
 
 
 
 
 Some basic characteristics of this kind of distribution are: 
 
1) The maximum value occurs at µ; 
2) The dispersion is symmetric about µ (i.e. the mean, median, and mode of 
the population are equal); and 
3) The “tails” asymptotically approach zero. 
 
A distribution which meets these basic criteria is known as a normal distribution. 
 
 The following conditions tend to result in a normal distribution: 
 
1) There are many factors which contribute to the observed value of the trait; 
2) These many factors act independently of one another; and 
3) The individual effects of the factors are additive and of comparable magnitude. 
 
 Many biological and ecological variables are approximately normally distributed. 
 
 The bell-shaped normal distribution is also known as a Gaussian curve, named
after Carl Friedrich Gauss, who worked out the formal mathematics:
 
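 $Z(Y) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{1}{2} \left( \frac{Y - \mu}{\sigma} \right)^2}$

 Z(Y) is the height of the curve at a given observed value Y.
 The location and shape are uniquely determined by only two parameters, µ and σ².

[Figure: bell-shaped curve centered at µ; x-axis: observed value; y-axis: frequency of observation.]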
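
The height of the normal curve can be evaluated numerically. This is a minimal Python sketch, assuming the usual normal density with mean µ and standard deviation σ; the name `normal_height` is illustrative, and the result is compared with the standard library's `NormalDist.pdf`.

```python
import math
from statistics import NormalDist

def normal_height(y, mu, sigma):
    """Height Z(Y) of the normal curve with mean mu and standard deviation sigma."""
    exponent = -0.5 * ((y - mu) / sigma) ** 2
    return math.exp(exponent) / (sigma * math.sqrt(2 * math.pi))

# The maximum occurs at Y = mu; for N(0,1) the peak height is about 0.399.
peak = normal_height(0.0, 0.0, 1.0)
print(f"peak of N(0,1): {peak:.4f}")

# The curve is symmetric about mu.
print(normal_height(-1.5, 0.0, 1.0) == normal_height(1.5, 0.0, 1.0))
```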
 By varying the value of µ, one can center Z(Y) anywhere on the x-axis.
 By varying σ², one can freely adjust the width of the central hump.
 If we set µ = 0 and σ² = 1, we obtain a standard normal curve [N(0,1)]:
 
[Figure: three normal curves, Normal (0, 1), Normal (0, 2), and Normal (1, 1), each plotted with Freq. (0.0 to 0.4) on the y-axis and Sigma (-5 to 5) on the x-axis.]
 
To convert any ~N into a standard N curve:

 $Z_i = \frac{Y_i - \mu}{\sigma}$

 Subtracting µ centers the curve at 0; dividing by σ puts the variation in units of σ.
 Standard N curve: µ = 0, σ = 1.
 
 
 
 
 
 
 
 
 
 
 
 
The following % of items lie within the indicated limits:
 µ ± σ contains 68.27% of the items
 µ ± 2σ contains 95.45% of the items
 µ ± 3σ contains 99.73% of the items

Conversely:
 50% of the items fall between µ ± 0.674σ
 95% of the items fall between µ ± 1.960σ
 99% of the items fall between µ ± 2.576σ

[Figure: normal curve with the central areas 68.27%, 95.45%, and 99.73% marked.]
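
These percentages and multipliers can be reproduced from the standard normal curve. A short Python sketch using the standard library's `NormalDist`; the helper names `central_area` and `half_width` are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(0, 1)

def central_area(k):
    """Fraction of items within mu +/- k*sigma."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

def half_width(p):
    """Multiplier k such that mu +/- k*sigma contains a fraction p of the items."""
    return std_normal.inv_cdf(0.5 + p / 2)

for k in (1, 2, 3):
    print(f"within {k} sigma: {100 * central_area(k):.2f}%")

for p in (0.50, 0.95, 0.99):
    print(f"{100 * p:.0f}% of items: +/- {half_width(p):.3f} sigma")
```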
Location and Scale transformation (when µ ≠ 0 and/or σ ≠ 1)

 N(1,1) − µ = N(0,1)
 Z = (Y − µ)/σ
 N(0,2) / σ = N(0,1)

[Figure: four panels plotting Freq. (0.0 to 0.4) against values from -5 to 5, showing N(1,1) shifted to N(0,1) by subtracting µ, and N(0,2) rescaled to N(0,1) by dividing by σ.]
Q1: From a ~N population of finches with mean weight µ = 17.2 g and variance σ² = 36 g²,
what is the probability of randomly selecting an individual finch weighing more than 22 g?
 
Solution: To answer this, first convert the value 22 g to its corresponding normal score: 
 
 $Z_i = \frac{Y_i - \mu}{\sigma} = \frac{22\ \text{g} - 17.2\ \text{g}}{6\ \text{g}} = 0.8$

 
 
Table A4: 21.19% of the area lies to the right of Z = 0.8. Thus, 22 g is not an unusual
weight for a finch in this population (it is less than 1 SD from the mean).
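
The table lookup for Q1 can be cross-checked in a few lines of Python. This is only a sketch; it uses the standard library's `NormalDist` in place of the printed table.

```python
from statistics import NormalDist

mu, sigma = 17.2, 6.0     # sigma is the square root of the variance, 36 g^2
y = 22.0

z = (y - mu) / sigma                      # normal score of 22 g
p_greater = 1 - NormalDist().cdf(z)       # area to the right of z

print(f"Z = {z:.1f}, P(Y > 22 g) = {p_greater:.4f}")
```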
 
 
 
 
 
 
 
 
 
 
 
 
Q2: From the same population, what is the probability of randomly selecting a sample of 20
finches with an average weight of more than 22 g?
 
This question is asking for the probability of selecting a sample of a certain average value. 
 
For a sample of size n = 20, the appropriate distribution to consider is the normal distribution
of sample means for sample size n = 20, with µ = 17.2 g and

 $\sigma^2_{\bar{Y}(n=20)} = \frac{\sigma^2}{n} = \frac{36\ \text{g}^2}{20} = 1.8\ \text{g}^2$
 
With this in mind, we proceed as before: 
 
 $Z_i = \frac{\bar{Y}_i - \mu}{\sigma_{\bar{Y}(n=20)}} = \frac{22\ \text{g} - 17.2\ \text{g}}{1.34\ \text{g}} = 3.6$

 
 
Table A4: only 0.02% of the area lies to the right of Z = 3.6, so there is only a 0.02% chance.
22 g is an extremely unusual mean weight for a sample of twenty finches in this
population (it is >3 SE from the mean!).
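
The same cross-check for Q2, again as a sketch with `NormalDist` standing in for the printed table. The only change from Q1 is that the standard deviation of the mean (the square root of σ²/n) replaces σ. The unrounded Z is slightly below the rounded 3.6 used above, so the computed tail area differs a little from the table lookup while leading to the same conclusion.

```python
import math
from statistics import NormalDist

mu, variance, n = 17.2, 36.0, 20
ybar = 22.0

se = math.sqrt(variance / n)              # sqrt(1.8 g^2), about 1.34 g
z = (ybar - mu) / se                      # normal score of the sample mean
p_greater = 1 - NormalDist().cdf(z)       # area to the right of z

print(f"SE = {se:.2f} g, Z = {z:.2f}, P(mean > 22 g) = {p_greater:.5f}")
```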
 
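[Figure for Q1: normal curve of Y with mean 17.2 and the area to the right of 22.0 marked (Question: What is this area? Or: P(Y≥22) = X), alongside the standard normal curve of Z with mean 0 and the area to the right of 0.8 marked. Answer: P(Y≥22) = P(Z≥0.8) = 0.2119]

One final word about the wide applicability of the normal distribution:

 The central limit theorem states that, as sample size increases, the
distribution of sample means drawn from a population of any distribution (with finite variance)
will approach a normal distribution with mean µ and variance σ²/n.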
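
The central limit theorem can be illustrated with a small simulation. The sketch below assumes a strongly skewed population (exponential with rate 1, which has mean 1 and variance 1; this choice is illustrative and not from ST&D) and checks that the means of samples of size n = 30 behave approximately like a normal distribution with mean µ and variance σ²/n.

```python
import math
import random
import statistics

rng = random.Random(7)            # fixed seed so the run is repeatable
n, r = 30, 4000                   # sample size and number of samples
pop_mean, pop_var = 1.0, 1.0      # exponential(rate=1): mean 1, variance 1

means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(r)
]

se = math.sqrt(pop_var / n)
# Under normality, about 95% of sample means fall within mu +/- 1.960 SE.
within = sum(abs(m - pop_mean) <= 1.96 * se for m in means) / r

print(f"mean of sample means:     {statistics.fmean(means):.3f}")
print(f"variance of sample means: {statistics.variance(means):.4f} (sigma^2/n = {pop_var / n:.4f})")
print(f"fraction within 1.96 SE:  {within:.3f}")
```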
 
Use of the normal distribution table (page 612, Appendix A4) 
 
For any value of Z, the table reports the area under the curve to the right of Z. 
 
This area to the right of Z is the theoretical probability of randomly picking an 
individual from N(0,1) whose value is greater than Z. 
From Table
 P(Z ≥ 1.17) = 0.121 (probability given inside the table)

If asked
 P(Z ≤ 1.17) = 1 − P(Z ≥ 1.17) = 0.879

 P(0.42 ≤ Z ≤ 1.61) = P(Z ≥ 0.42) − P(Z ≥ 1.61)
                    = 0.3372 − 0.0537 = 0.2835

 P(−1.61 ≤ Z ≤ 0.42) = P(Z ≥ −1.61) − P(Z ≥ 0.42)
                     = 1 − P(Z ≥ 1.61) − P(Z ≥ 0.42)
                     = [1 − 0.0537] − 0.3372
                     = 0.9463 − 0.3372 = 0.6091

 P(|Z| ≥ 1.05) = 2 × P(Z ≥ 1.05) = 2 × 0.1469 = 0.2938
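
The table manipulations above can be mirrored in code. In this Python sketch the helper `upper_tail` plays the role of the table (area to the right of Z); the name is an illustrative choice, and the values come from the standard library's `NormalDist` rather than from the printed table.

```python
from statistics import NormalDist

std_normal = NormalDist(0, 1)

def upper_tail(z):
    """Area under N(0,1) to the right of z, as reported by the table."""
    return 1 - std_normal.cdf(z)

p_right = upper_tail(1.17)                          # P(Z >= 1.17)
p_left = 1 - upper_tail(1.17)                       # P(Z <= 1.17)
p_between = upper_tail(0.42) - upper_tail(1.61)     # P(0.42 <= Z <= 1.61)
p_straddle = upper_tail(-1.61) - upper_tail(0.42)   # P(-1.61 <= Z <= 0.42)
p_two_tails = 2 * upper_tail(1.05)                  # P(|Z| >= 1.05)

for label, p in [("P(Z >= 1.17)", p_right), ("P(Z <= 1.17)", p_left),
                 ("P(0.42 <= Z <= 1.61)", p_between),
                 ("P(-1.61 <= Z <= 0.42)", p_straddle),
                 ("P(|Z| >= 1.05)", p_two_tails)]:
    print(f"{label} = {p:.4f}")
```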
Normal probability plot (Q-Q plot) ST&D p. 566

14 malt extract values: 77.7, 76.0, 76.9, 74.6, 74.7, 76.5, 74.2, 75.4, 76.0, 76.0,
73.9, 77.4, 76.6, 77.3 (ST&D p. 30, Lab1). N=14
Divide the ~N curve into 14 intervals of equal area.
Normal line: slope = s = 1.227, intercept = $\bar{Y}$ = 75.943 (y = a + bx).

 $z = \frac{Y - \bar{Y}}{s}$, so $Y = (z \times s) + \bar{Y}$; for z = 2, $Y = (2 \times 1.227) + 75.943 = 78.4$
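
The coordinates of a normal probability plot can be sketched as follows. The plotting positions are an assumption: this example takes the midpoint of each of the 14 equal-area intervals, (i − 0.5)/n, whereas SAS and ST&D may use a slightly different convention. The name `normal_line` is illustrative.

```python
import statistics
from statistics import NormalDist

malt = [77.7, 76.0, 76.9, 74.6, 74.7, 76.5, 74.2,
        75.4, 76.0, 76.0, 73.9, 77.4, 76.6, 77.3]

n = len(malt)
ybar = statistics.fmean(malt)     # intercept of the normal line
s = statistics.stdev(malt)        # slope of the normal line

# Assumed plotting positions: midpoints of n equal-area intervals, (i - 0.5) / n.
normal_scores = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Each point pairs a normal score (x) with the matching ordered observation (y).
points = list(zip(normal_scores, sorted(malt)))

def normal_line(z):
    """Value expected at normal score z if the data were exactly normal."""
    return ybar + s * z

for z, y in points:
    print(f"z = {z:6.3f}  observed = {y:5.1f}  line = {normal_line(z):6.3f}")
```

Points falling close to the line suggest approximate normality; systematic curvature suggests departure from it.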
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Graphic tool for assessing normality 
Shapiro-Wilk test for ~N
 
Correlation coefficient 
between the data and the 
normal scores. 
 
W=1 perfect ~N 
 
W=0.8 ~N? 
 
SAS 
PROC UNIVARIATE 
 NORMAL; 
 
Pr<W should be lower than 
0.05 to reject Normality