Measures of Central Tendency and Dispersion [ST&D p. 16-27]

Individual values of a population are designated $Y_i$, i = 1,...,N, where N = size of the population. Individual values of a sample are also denoted $Y_i$, i = 1,...,n, where n = size of the sample. Greek letters are used for population parameters (µ = pop. mean; σ² = pop. variance).

Mean or average (measure of central tendency)
Pop. mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} Y_i$
Sample mean: $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$

Variance (measure of dispersion of the individuals about the mean)
Pop. variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (Y_i - \mu)^2$
Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2$
The quantities $(Y_i - \bar{Y})$ are called deviations.

To express these measures of dispersion in the original units of observation:
Pop. standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample standard deviation: $s = \sqrt{s^2}$

To express the standard deviation in units of the mean (or %):
Pop. coeff. of variation: $CV = \sigma / \mu$
Sample coeff. of variation: $CV = s / \bar{Y}$

Visualization of central tendency and dispersion using boxplots
Review ST&D p. 58. Estimation and inference, p. 53: 3.8 Distribution of means.

[Figure: Box plots, marking the median, the mean, the interquartile (IQ) range, and whiskers of 1.5 IQ range. Outliers: "0" marks values >1.5 IQ and <3 IQ; "*" marks values >3 IQ.]

Measures of dispersion of sample means

An important population parameter is the variance of the mean ($\sigma^2_{\bar{Y}}$). If you repeatedly sample a population by taking samples of size n, the variance of those sample means is what we call the variance of the mean. It relates very simply to the population variance:

Variance of the mean: $\sigma^2_{\bar{Y}} = \frac{\sigma^2}{n}$

We can estimate $\sigma^2_{\bar{Y}}$ for a population by taking r independent, random samples of size n from that population, calculating the sample means $\bar{Y}_i$, and then calculating the variance of those sample means about their overall mean $\bar{\bar{Y}}$:

$s^2_{\bar{Y}} = \frac{1}{r-1}\sum_{i=1}^{r} (\bar{Y}_i - \bar{\bar{Y}})^2$

The square root of $s^2_{\bar{Y}}$ is called the standard error (or standard deviation of a mean).

Standard error: $s_{\bar{Y}} = \sqrt{s^2_{\bar{Y}}} = \frac{s}{\sqrt{n}}$

As with the standard deviation, this is a quantity in the original units of observation. The SE is important in determining confidence intervals and the power of tests.
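As a rough check on the sample formulas above, the following sketch computes them with the Python standard library. It uses the 14 malt extract values that appear in the normal probability plot part of these notes. The helper name `describe` and its dictionary keys are illustrative assumptions, not anything defined in ST&D.

```python
import math
import statistics

# Malt extract values quoted in the Q-Q plot part of these notes (ST&D p. 30).
malt = [77.7, 76.0, 76.9, 74.6, 74.7, 76.5, 74.2,
        75.4, 76.0, 76.0, 73.9, 77.4, 76.6, 77.3]


def describe(sample):
    """Sample mean, variance, SD, CV (%), and standard error (hypothetical helper)."""
    n = len(sample)
    mean = sum(sample) / n
    deviations = [y - mean for y in sample]               # the (Y_i - Ybar) terms
    variance = sum(d ** 2 for d in deviations) / (n - 1)  # divide by n - 1 for a sample
    sd = math.sqrt(variance)
    return {
        "n": n,
        "mean": mean,
        "variance": variance,
        "sd": sd,
        "cv_percent": 100 * sd / mean,
        "se": sd / math.sqrt(n),
    }


summary = describe(malt)

# Cross-check the hand-written formulas against the statistics module.
assert math.isclose(summary["mean"], statistics.mean(malt))
assert math.isclose(summary["variance"], statistics.variance(malt))

for name, value in summary.items():
    print(f"{name:>10}: {value:.4f}")
```

The mean and standard deviation printed here should match the values these notes quote for the normal line (75.943 and 1.227, after rounding).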
The Normal distribution (~N)

If you measure a quantitative trait, most of the measurements will cluster near the population mean (µ), and as you consider values further and further from µ, individuals exhibiting those values become rarer. Some basic characteristics of this kind of distribution are: 1) The maximum value occurs at µ; 2) The dispersion is symmetric about µ (i.e. the mean, median, and mode of the population are equal); and 3) The "tails" asymptotically approach zero. A distribution which meets these basic criteria is known as a normal distribution.

The following conditions tend to result in a normal distribution: 1) There are many factors which contribute to the observed value of the trait; 2) These many factors act independently of one another; and 3) The individual effects of the factors are additive and of comparable magnitude. Many biological and ecological variables are approximately normally distributed.

The bell-shaped normal distribution is also known as a Gaussian curve, named after Carl Friedrich Gauss, who worked out the formal mathematics:

$Z(Y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y-\mu}{\sigma}\right)^2}$

Z(Y) is the height of the curve at a given observed value Y. The location and shape are uniquely determined by only two parameters, µ and σ².

[Figure: bell-shaped curve centered at µ. x-axis: Observed value; y-axis: Frequency of observation.]

If we set µ = 0 and σ² = 1, we obtain a standard normal curve [N(0,1)]. By varying the value of µ, one can center Z(Y) anywhere on the x-axis. By varying σ², one can freely adjust the width of the central hump.

[Figure: three normal curves, titled Normal(0,1), Normal(0,2), and Normal(1,1). x-axis: Sigma, from -5 to 5; y-axis: Freq., from 0.0 to 0.4.]
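To make the role of the two parameters concrete, here is a minimal sketch that evaluates the height of the normal curve directly from the density formula and compares it with `statistics.NormalDist` from the Python standard library. The function name `normal_height` is a hypothetical label, and treating the second number in each figure panel title as σ is an assumption about how the panels were labeled.

```python
import math
from statistics import NormalDist


def normal_height(y, mu=0.0, sigma=1.0):
    """Height Z(Y) of the normal curve with mean mu and standard deviation sigma."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z ** 2) / (sigma * math.sqrt(2 * math.pi))


# Parameter pairs taken from the figure panel titles; the second value is
# treated as sigma here, which is an assumption.
for mu, sigma in [(0, 1), (0, 2), (1, 1)]:
    peak = normal_height(mu, mu, sigma)
    left = normal_height(mu - 1.5, mu, sigma)
    right = normal_height(mu + 1.5, mu, sigma)
    assert math.isclose(left, right)                          # symmetric about mu
    assert peak > left                                        # maximum occurs at mu
    assert math.isclose(peak, NormalDist(mu, sigma).pdf(mu))  # agrees with the stdlib
    print(f"N({mu},{sigma}): peak height {peak:.4f} at Y = {mu}")
```

Changing `mu` only moves the peak along the x-axis, while a larger `sigma` lowers and widens the hump.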
To convert any normal distribution into a standard normal curve:

$Z_i = \frac{Y_i - \mu}{\sigma}$

Subtracting µ centers the distribution at 0, and dividing by σ puts the variation in units of σ. The standard normal curve has µ = 0 and σ = 1.

The following % of items lie within the indicated limits:
µ ± σ contains 68.27% of the items
µ ± 2σ contains 95.45% of the items
µ ± 3σ contains 99.73% of the items

Conversely:
50% of the items fall between µ ± 0.674σ
95% of the items fall between µ ± 1.960σ
99% of the items fall between µ ± 2.576σ

Location and scale transformation (when µ ≠ 0 and/or σ ≠ 1): Z = (Y - µ)/σ
Location: N(1,1) - µ = N(0,1)
Scale: N(0,2) / σ = N(0,1)

[Figure: four panels of normal curves illustrating the location and scale transformations. x-axis from -5 to 5; y-axis: Freq., from 0.0 to 0.4.]

Q1: From a normally distributed population of finches with mean weight µ = 17.2 g and variance σ² = 36 g², what is the probability of randomly selecting an individual finch weighing more than 22 g?

Solution: To answer this, first convert the value 22 g to its corresponding normal score:

$Z_i = \frac{Y_i - \mu}{\sigma} = \frac{22\,g - 17.2\,g}{6\,g} = 0.8$

Table A4: 21.19% of the area lies to the right of Z = 0.8. Thus, 22 g is not an unusual weight for a finch in this population (it is less than 1 SD from the mean).

[Figure: normal curve for Y centered at 17.2, with the area to the right of Y = 22.0 marked. Question: What is this area?]

Q2: From the same population, what is the probability of randomly selecting a sample of 20 finches with an average weight of more than 22 g?

This question asks for the probability of selecting a sample with a certain average value. For a sample of size n = 20, the appropriate distribution to consider is the normal distribution of sample means for sample size n = 20, which has µ = 17.2 g and

$\sigma^2_{\bar{Y}(n=20)} = \frac{\sigma^2}{n} = \frac{36\,g^2}{20} = 1.8\,g^2$

With this in mind, we proceed as before:

$Z_i = \frac{\bar{Y}_i - \mu}{\sigma_{\bar{Y}(n=20)}} = \frac{22\,g - 17.2\,g}{1.34\,g} = 3.6$

Table A4: only 0.02% of the area lies to the right of Z = 3.6 (only a 0.02% chance). 22 g is an extremely unusual mean weight for a sample of twenty finches in this population (it is >3 SE from the mean!).
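The two finch questions can be reproduced with a short sketch that uses `statistics.NormalDist` in place of the printed table. The helper `right_tail` and the variable names are hypothetical, chosen only for this illustration.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(0.0, 1.0)


def right_tail(z):
    """Area under N(0,1) to the right of z, as the normal table reports it."""
    return 1.0 - STANDARD_NORMAL.cdf(z)


mu = 17.2                # population mean weight, g
sigma = math.sqrt(36.0)  # population SD, g (the variance is 36 g^2)

# Q1: a single finch heavier than 22 g, so scale by the population SD.
z_individual = (22.0 - mu) / sigma
p_individual = right_tail(z_individual)

# Q2: the mean of n = 20 finches above 22 g, so scale by the SD of the mean.
n = 20
sigma_mean = sigma / math.sqrt(n)
z_mean = (22.0 - mu) / sigma_mean
p_mean = right_tail(z_mean)

print(f"Q1: Z = {z_individual:.2f}, P = {p_individual:.4f}")
print(f"Q2: Z = {z_mean:.2f}, P = {p_mean:.5f}")
```

Because the code does not round Z to 3.6 before the lookup, its Q2 probability differs slightly from the table value, but it stays on the order of 0.02%.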
[Figure: standard normal curve for Z with the area to the right of Z = 0.8 marked. Answer: P(Y ≥ 22) = P(Z ≥ 0.8) = 0.2119.]

The central limit theorem explains why the normal distribution is so widely applicable. It states that, as sample size increases, the distribution of sample means drawn from a population of any distribution will approach a normal distribution with mean µ and variance σ²/n.

Use of the normal distribution table (page 612, Appendix A4)

For any value of Z, the table reports the area under the curve to the right of Z. This area to the right of Z is the theoretical probability of randomly picking an individual from N(0,1) whose value is greater than Z.

From the table: P(Z ≥ 1.17) = 0.121 (the probability is read inside the table)
If asked for the left tail: P(Z ≤ 1.17) = 1 - P(Z ≥ 1.17) = 0.879
P(0.42 ≤ Z ≤ 1.61) = P(Z ≥ 0.42) - P(Z ≥ 1.61) = 0.3372 - 0.0537 = 0.2835
P(-1.61 ≤ Z ≤ 0.42) = P(Z ≥ -1.61) - P(Z ≥ 0.42) = [1 - P(Z ≥ 1.61)] - P(Z ≥ 0.42) = [1 - 0.0537] - 0.3372 = 0.9463 - 0.3372 = 0.6091
P(|Z| ≥ 1.05) = 2 * P(Z ≥ 1.05) = 2 * 0.1469 = 0.2938

Normal probability plot (Q-Q plot) [ST&D p. 566]

A graphic tool for assessing normality. Example: 14 malt extract values: 77.7, 76.0, 76.9, 74.6, 74.7, 76.5, 74.2, 75.4, 76.0, 76.0, 73.9, 77.4, 76.6, 77.3 (ST&D p. 30, Lab1). N = 14, so divide the normal curve into 14 intervals of equal area.

Normal line (y = a + bx): slope = s = 1.227, intercept = $\bar{Y}$ = 75.943. A normal score z is converted back to the original units by inverting the standardization, for example at z = 2:

$z = \frac{Y - \bar{Y}}{s} \;\Rightarrow\; Y = (z \cdot s) + \bar{Y} = (2 \times 1.227) + 75.943 = 78.4$

Shapiro-Wilk test for normality

W is a correlation coefficient between the data and the normal scores. W = 1: perfectly normal. W = 0.8: normal? In SAS, use PROC UNIVARIATE NORMAL; Pr < W must be lower than 0.05 to reject normality.
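The idea behind the normal probability plot can be sketched without any plotting library: sort the data, pair each value with a normal score, and see how closely the pairs follow a straight line. The plotting position (i - 0.5)/n used below is an assumption (several conventions exist), and the correlation it prints is only a rough stand-in for W. It is not the Shapiro-Wilk statistic that SAS reports, which uses its own weights.

```python
import math
from statistics import NormalDist, mean, stdev

malt = [77.7, 76.0, 76.9, 74.6, 74.7, 76.5, 74.2,
        75.4, 76.0, 76.0, 73.9, 77.4, 76.6, 77.3]


def normal_scores(n):
    """z-values at the centers of n equal-area slices of N(0,1).

    The (i - 0.5) / n plotting position is an assumed convention.
    """
    std = NormalDist()
    return [std.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


ordered = sorted(malt)
scores = normal_scores(len(ordered))
r = pearson_r(scores, ordered)

# Normal line from the notes: intercept = sample mean, slope = sample SD.
y_bar, s = mean(malt), stdev(malt)


def normal_line(z):
    """Value expected at normal score z if the data were exactly normal."""
    return y_bar + s * z


for z, y in zip(scores, ordered):
    print(f"z = {z:6.3f}  observed = {y:5.1f}  normal line = {normal_line(z):6.2f}")
print(f"correlation between data and normal scores: {r:.3f}")
```

Points that fall close to the normal line, together with a correlation near 1, are consistent with normality. A formal decision would still rest on the Pr < W value from a Shapiro-Wilk routine.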