Hardy-Weinberg: Directed study 1

1) Calculate the expected frequency of the genotype below and the probability of finding an individual with this same genotype, using the following table of allele frequencies. In your opinion, is the identification provided by this set of markers reliable?

2) Test the Hardy-Weinberg null hypothesis for a population sample of 459 individuals with the following genotypes: MM = 144, MN = 201, NN = 114.

3) A cleavage site for the restriction enzyme BanI is located within a long intron of the gene that encodes alcohol dehydrogenase in D. melanogaster. This site was found in 29 of the 60 chromosomes isolated from a population sampled in Raleigh, North Carolina (Kreitman and Aguadé, 1986). Let B and b represent the presence and absence, respectively, of the BanI site on the chromosome. Assuming Hardy-Weinberg equilibrium, calculate the expected frequencies of the genotypes BB, Bb and bb.

4) In the Ss blood group system, which is related to the MN system, three phenotypes corresponding to the genotypes SS, Ss and ss can be distinguished. In a sample of 1000 British individuals, the observed number of each genotype for the Ss system was 99 SS, 418 Ss and 483 ss. Estimate the allele frequencies of S (p) and s (q) and check whether the genotype frequencies are in Hardy-Weinberg equilibrium.

5) In a sample of 1617 Basques from Spain, the numbers of individuals with blood types A, B, O and AB were 724, 110, 763 and 20, respectively. The best estimates of the allele frequencies are p1 = 0.2661 (for IA), p2 = 0.0411 (for IB) and p3 = 0.6928 (for IO). Calculate the expected numbers for the four genotypes and carry out a test to check whether the observed frequencies follow the Hardy-Weinberg principle.

CHAPTER 2

Current forensic DNA profiles use 10–13 loci to estimate expected genotype frequencies.
Problem 2.1 gives a 10-locus genotype for the same individual in Table 2.2, allowing you to calculate the odds ratio for a realistic example. In Chapter 4 we will reconsider the expected frequency of a DNA profile with the added complication of allele-frequency differentiation among human racial groups.

Testing for Hardy–Weinberg

A common use of Hardy–Weinberg expectations is to test for deviations from its null model. Populations with genotype frequencies that do not fit Hardy–Weinberg expectations are evidence that one or more of the evolutionary processes embodied in the assumptions of Hardy–Weinberg are acting to determine genotype frequencies. Our null hypothesis is that genotype frequencies meet Hardy–Weinberg expectations within some degree of estimation error. Genotype frequencies that are not close to Hardy–Weinberg expectations allow us to reject this null hypothesis. The processes in the list of assumptions then become possible alternative hypotheses to explain observed genotype frequencies. In this section we will work through a hypothesis test for Hardy–Weinberg equilibrium. The first example uses observed genotypes for the MN blood group, a single locus in humans that has

Calculate the expected genotype frequency and odds ratio for the 10-locus DNA profile below. Allele frequencies are given in Table 2.3.

D3S1358     17, 18
vWA         17, 17
FGA         24, 25
Amelogenin  X, Y
D8S1179     13, 14
D21S11      29, 30
D18S51      18, 18
D5S818      12, 13
D13S317     9, 12
D7S820      11, 12

What does the amelogenin locus tell us and how did you assign an expected frequency to the observed genotype? Is it likely that two unrelated individuals would share this 10-locus genotype by chance? For this genotype, would a match between a crime scene sample and a suspect be convincing evidence that the person was present at the crime scene?
Problem box 2.1 The expected genotype frequency for a DNA profile

The loci used for human DNA profiling are a general class of DNA sequence marker known as simple tandem repeat (STR), simple sequence repeat (SSR), or microsatellite loci. These loci feature tandemly repeated DNA sequences of one to six base pairs (bp) and often exhibit many alleles per locus and high levels of heterozygosity. Allelic states are simply the number of repeats present at the locus, which can be determined by electrophoresis of PCR amplified DNA fragments. STR loci used in human DNA profiling generally exhibit Hardy–Weinberg expected genotype frequencies, there is evidence that the genotypes are selectively “neutral” (i.e. not affected by natural selection), and the loci meet the other assumptions of Hardy–Weinberg. STR loci are employed widely in population genetic studies and in genetic mapping (see reviews by Goldstein & Pollock 1997; McDonald & Potts 1997).

This is an example of the DNA sequence found at a microsatellite locus. This sequence is the 24.1 allele from the FGA locus (GenBank accession no. AY749636; see Fig. 2.8). The integral repeat is the 4 bp sequence CTTT and most alleles have sequences that differ by some number of full CTTT repeats. However, there are exceptions where alleles have sequences with partial repeats or stutters in the repeat pattern, for example the TTTCT and CTC sequences imbedded in the perfect CTTT repeats. In this case, the 24.1 allele is 1 bp longer than the 24 allele sequence.

(continued)

Box 2.1 DNA profiling

9781405132770_4_002.qxd 1/19/09 2:22 PM Page 22
Genotype frequencies

larger if the expected number is 8 (it is 50%). Adding all of these relative squared differences gives the total relative squared deviation observed over all genotypes.

$$\chi^2 = \frac{(-21.6)^2}{181.61} + \frac{(43.2)^2}{518.80} + \frac{(-21.6)^2}{360.58} = 7.46 \qquad (2.8)$$

We need to compare our statistic to values from the χ² distribution. But first we need to know how much information, or the degrees of freedom (commonly abbreviated as df), was used to estimate the χ² statistic. In general, degrees of freedom are based on the number of categories of data:

df = no. of classes compared − no. of parameters estimated − 1

for the χ² test itself. In this case df = 3 − 1 − 1 = 1 for three genotypes and one estimated allele frequency (with two alleles: the other allele frequency is fixed once the first has been estimated).

Figure 2.9 shows a χ² distribution for one degree of freedom. Small deviations of the observed from the expected are more probable since they leave more area of the distribution to the right of the χ² value. As the χ² value gets larger, the probability that the difference between the observed and expected is just due to chance sampling decreases (the area under the curve to the right gets smaller). Another way of saying this is that as the observed and expected get increasingly different, it becomes more improbable that our null hypothesis of Hardy–Weinberg is actually the process that is determining genotype frequencies.

Using Table 2.5 we see that a χ² value of 7.46 with 1 df has a probability between 0.01 and 0.001. The conclusion is that the observed genotype frequencies would be observed less than 1% of the time in a population that actually had Hardy–Weinberg expected genotype frequencies. Under the null hypothesis we do not expect this much difference or more from Hardy–Weinberg expectations to occur often. By convention, we would reject chance as the explanation for the differences if the χ² value had a probability of 0.05 or less. In other words, if chance explains the difference in five trials out of 100 or less then we reject the hypothesis that the observed and expected patterns are the same. The critical value above which we reject the null hypothesis for a χ² test is 3.84 with 1 df, or in notation $\chi^2_{0.05,1} = 3.84$. In this case, we can clearly see an excess of heterozygotes and deficits of homozygotes, and employing the χ² test allows us to conclude that Hardy–Weinberg expected genotype frequencies are not present in the population.
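The steps of the test described above (estimate the allele frequency, compute expected genotype counts, sum the relative squared deviations, compare with the χ² distribution for 1 df) can be sketched in Python. This is a minimal illustration, not code from the text: the function name `hardy_weinberg_chi_square` and the counts 30, 40, 30 are hypothetical, and the tail probability uses the identity P(χ² > x) = erfc(√(x/2)), which holds only for one degree of freedom. The last lines simply re-add the three terms printed in equation 2.8.

```python
import math


def hardy_weinberg_chi_square(n_hom1, n_het, n_hom2):
    """Chi-square test of Hardy-Weinberg for one locus with two alleles.

    Arguments are observed genotype counts (hypothetical helper, not from
    the text). Returns (chi2, df, p_value). Assumes both alleles occur in
    the sample, otherwise an expected count is zero and division fails.
    """
    n = n_hom1 + n_het + n_hom2
    # Allele frequency estimated from the sample (one estimated parameter).
    p = (2 * n_hom1 + n_het) / (2 * n)
    q = 1.0 - p
    observed = (n_hom1, n_het, n_hom2)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    # Sum of squared deviations, each relative to its expected count.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # df = classes compared - parameters estimated - 1
    df = 3 - 1 - 1
    # Right-hand tail probability; this closed form is valid for 1 df only.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, df, p_value


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
chi2, df, p_value = hardy_weinberg_chi_square(30, 40, 30)
print(f"chi2 = {chi2:.2f}, df = {df}, P = {p_value:.4f}")

# Re-adding the three terms as printed in equation 2.8.
deviations = (-21.6, 43.2, -21.6)
expected_2_8 = (181.61, 518.80, 360.58)
chi2_2_8 = sum(d * d / e for d, e in zip(deviations, expected_2_8))
p_2_8 = math.erfc(math.sqrt(chi2_2_8 / 2.0))
print(f"equation 2.8: chi2 = {chi2_2_8:.2f}, P = {p_2_8:.4f}")
```

With more than two alleles the degrees of freedom change, so the `erfc` shortcut would need to be replaced by a general χ² tail function or a table lookup such as Table 2.5.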
Figure 2.9 A χ² distribution with one degree of freedom. [Plot: x axis, χ² value (0 to 10); y axis, probability of observing as much or more difference by chance (0 to 1); marked values: χ² critical (0.05, 1 df) = 3.84 and χ² observed = 7.46.] The χ² value for the Hardy–Weinberg test with MN blood group genotypes as well as the critical value to reject the null hypothesis are shown (see text for details). The area under the curve to the right of the arrow indicates the probability of observing that much or more difference between the observed and expected outcomes.

Table 2.5 χ² values and associated cumulative probabilities in the right-hand tail of the distribution for 1–5 df.

      Probability
df    0.5      0.25     0.1      0.05      0.01      0.001
1     0.4549   1.3233   2.7055   3.8415    6.6349    10.8276
2     1.3863   2.7726   4.6052   5.9915    9.2103    13.8155
3     2.3660   4.1083   6.2514   7.8147    11.3449   16.2662
4     3.3567   5.3853   7.7794   9.4877    13.2767   18.4668
5     4.3515   6.6257   9.2364   11.0705   15.0863   20.5150

Table 2.3 Allele frequencies for nine STR loci commonly used in forensic cases estimated from 196 US Caucasians sampled randomly with respect to geographic location. The allele names are the numbers of repeats at that locus (see Box 2.1). Allele frequencies (Freq) are as reported in Budowle et al. (2001), Table 1, from FBI sample population.
Locus     Allele (Freq)
D3S1358   12 (0.0000), 13 (0.0025), 14 (0.1404), 15 (0.2463), 16 (0.2315), 17 (0.2118), 18 (0.1626), 19 (0.0049)
vWA       13 (0.0051), 14 (0.1020), 15 (0.1122), 16 (0.2015), 17 (0.2628), 18 (0.2219), 19 (0.0842), 20 (0.0102)
D21S11    27 (0.0459), 28 (0.1658), 29 (0.1811), 30 (0.2321), 30.2 (0.0383), 31 (0.0714), 31.2 (0.0995), 32 (0.0153), 32.2 (0.1122), 33.2 (0.0306), 35.2 (0.0026)
D18S51    <11 (0.0128), 11 (0.0128), 12 (0.1276), 13 (0.1224), 14 (0.1735), 15 (0.1276), 16 (0.1071), 17 (0.1556), 18 (0.0918), 19 (0.0357), 20 (0.0255), 21 (0.0051), 22 (0.0026)
D13S317   8 (0.0995), 9 (0.0765), 10 (0.0510), 11 (0.3189), 12 (0.3087), 13 (0.1097), 14 (0.0357)
FGA       18 (0.0306), 19 (0.0561), 20 (0.1454), 20.2 (0.0026), 21 (0.1735), 22 (0.1888), 22.2 (0.0102), 23 (0.1582), 24 (0.1378), 25 (0.0689), 26 (0.0179), 27 (0.0102)
D8S1179   <9 (0.0179), 9 (0.1020), 10 (0.1020), 11 (0.0587), 12 (0.1454), 13 (0.3393), 14 (0.2015), 15 (0.1097), 16 (0.0128), 17 (0.0026)
D5S818    9 (0.0308), 10 (0.0487), 11 (0.4103), 12 (0.3538), 13 (0.1462), 14 (0.0077), 15 (0.0026)
D7S820    6 (0.0025), 7 (0.0172), 8 (0.1626), 9 (0.1478), 10 (0.2906), 11 (0.2020), 12 (0.1404), 13 (0.0296), 14 (0.0074)
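As a rough sketch of how the allele frequencies in Table 2.3 turn into an expected profile frequency: under Hardy–Weinberg a homozygote has expected frequency p² and a heterozygote 2pq, and the per-locus values are multiplied together on the assumption that the loci are independent. The function names and the choice of only three loci (D3S1358, vWA and D21S11 from the profile in Problem box 2.1) are illustrative assumptions, not part of the text, and the "one in N" figure is simply the reciprocal of the expected frequency.

```python
def genotype_frequency(freqs, allele_1, allele_2):
    """Hardy-Weinberg expected frequency of one single-locus genotype.

    freqs maps allele name -> allele frequency for that locus.
    """
    p = freqs[allele_1]
    if allele_1 == allele_2:
        return p * p                 # homozygote: p^2
    return 2.0 * p * freqs[allele_2]  # heterozygote: 2pq


def profile_frequency(allele_freqs, profile):
    """Multiply single-locus expectations, assuming independent loci."""
    total = 1.0
    for locus, (allele_1, allele_2) in profile.items():
        total *= genotype_frequency(allele_freqs[locus], allele_1, allele_2)
    return total


# Allele frequencies copied from Table 2.3 for three loci only.
TABLE_2_3_SUBSET = {
    "D3S1358": {"17": 0.2118, "18": 0.1626},
    "vWA": {"17": 0.2628},
    "D21S11": {"29": 0.1811, "30": 0.2321},
}

# Three of the ten genotypes listed in Problem box 2.1.
PROFILE = {
    "D3S1358": ("17", "18"),
    "vWA": ("17", "17"),
    "D21S11": ("29", "30"),
}

freq = profile_frequency(TABLE_2_3_SUBSET, PROFILE)
print(f"expected frequency = {freq:.3e}, about 1 in {1.0 / freq:,.0f}")
```

Extending the two dictionaries to the remaining loci follows the same pattern; the amelogenin locus is left out here because Table 2.3 gives no frequencies for it.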