<!DOCTYPE html PUBLIC "-//w3c//dtd html 4.0 transitional//en"> <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> <meta name="Generator" content="Microsoft Word 98"> <meta name="GENERATOR" content="Mozilla/4.76 (Macintosh; U; PPC) [Netscape]"> <title>Fst example</title> </head> <body text="#000000" bgcolor="#c0c0c0" link="#0000ee" vlink="#551a8b" alink="#ff0000"> <a name="Top"></a><font face="Times"><font size="+2">Worked example of calculating <i>F-statistics</i> from genotypic data:</font></font> <p><b><font face="Times New Roman,Times"><a href="http://www.uwyo.edu/dbmcd/popecol/index.html"> Return to Main Index page</a></font></b> <a href="http://www.uwyo.edu/dbmcd/popecol/Maylects/lect37.html"> Go to Lecture 35</a> <a href="http://www.uwyo.edu/dbmcd/popecol/Maylects/lect38.html">Go to Lecture 36</a> </p> <table border="1" cellpadding="5" width="311"> <tbody> <tr> <td valign="top" width="41%"> </td> <td valign="top" colspan="3"> <center><font face="Times">Genotype</font></center> </td> </tr> <tr> <td valign="top" width="41%"> </td> <td valign="top" width="19%"> <center><i><font face="Times">AA</font></i></center> </td> <td valign="top" width="25%"> <center><i><font face="Times">Aa</font></i></center> </td> <td valign="top" width="15%"> <center><i><font face="Times">aa</font></i></center> </td> </tr> <tr> <td valign="top" width="41%"> <center><font face="Times">Subpopulation 1</font></center> </td> <td valign="top" width="19%"> <center><font face="Times">125</font></center> </td> <td valign="top" width="25%"> <center><font face="Times">250</font></center> </td> <td valign="top" width="15%"> <center><font face="Times">125</font></center> </td> </tr> <tr> <td valign="top" width="41%"> <center><font face="Times">Subpopulation 2</font> </center> </td> <td valign="top" width="19%"> <center><font face="Times">50</font></center> </td> <td valign="top" width="25%"> <center><font
face="Times">30</font></center> </td> <td valign="top" width="15%"> <center><font face="Times">20</font></center> </td> </tr> <tr> <td valign="top" width="41%"> <center><font face="Times">Subpopulation 3</font> </center> </td> <td valign="top" width="19%"> <center><font face="Times">100</font></center> </td> <td valign="top" width="25%"> <center><font face="Times">500</font></center> </td> <td valign="top" width="15%"> <center><font face="Times">400</font></center> </td> </tr> </tbody> </table> <p><font face="Times"><i>N</i> (the number of individuals genotyped in each subpopulation, i.e., the sum of each row in the table above):</font> </p> <blockquote><font face="Times">Subpopulation 1: 500</font> <br> <font face="Times">Subpopulation 2: 100</font> <br> <font face="Times">Subpopulation 3: 1,000</font></blockquote> <font face="Times">Remember that the number of alleles is <b><font color="#ff0000">TWICE</font></b> the number of individuals genotyped, because each diploid individual carries two alleles.</font> <p><font face="Times"><b>Step 1. Calculate the gene (allele) frequencies</b>:</font> </p> <blockquote><font face="Times">Each homozygote contributes two copies of the same allele, and each heterozygote contributes one copy of each allele. The denominator is therefore twice <i>N</i><sub>i</sub> (there are twice as many alleles as individuals).</font> <p><img src="EqnFST.1.gif" border="0" height="183" width="422" align="middle"> <b>Eqns FST.1</b></p> </blockquote> <p><br> <font face="Times"><b>Step 2. Calculate the expected genotypic counts under Hardy-Weinberg equilibrium (HWE)</b>, and then calculate the <b>excess or deficiency of homozygotes in each subpopulation</b>.</font> </p> <blockquote><font face="Times">Pop. 1 Expected <em>AA</em> = 500*0.5<sup>2</sup> = 125 (= observed)</font> <p><font face="Times"> Expected <em>Aa</em> = 500*2*0.5*0.5 = 250 (= observed)</font> </p> <p><font face="Times"> Expected <em>aa</em> = 500*0.5<sup>2</sup> = 125 (= observed)</font> </p> <p><font face="Times">Pop.
2 Expected <em>AA</em> = 100*0.65<sup>2</sup> = 42.25 (observed has an excess of 7.75)</font> </p> <p><font face="Times"> Expected <em>Aa</em> = 100*2*0.65*0.35 = 45.5 (observed has a deficit of 15.5)</font> </p> <p><font face="Times"> Expected <em>aa</em> = 100*0.35<sup>2</sup> = 12.25 (observed has an excess of 7.75)</font> </p> <p><font face="Times"> <font color="#0000ff">Note that the sum of the two homozygote excesses (7.75 + 7.75 = 15.5) equals the heterozygote deficit.</font></font> <br> <font face="Times"><font color="#0000ff"> These quantities <b>have</b> to balance (it's a mathematical necessity, given that <i>p</i> + <i>q</i> = 1).</font></font> <br> </p> <p><font face="Times">Pop. 3 Expected <em>AA</em> = 1,000*0.35<sup>2</sup> = 122.5 (observed has a deficiency of 22.5)</font> </p> <p><font face="Times"> Expected <em>Aa</em> = 1,000*2*0.65*0.35 = 455 (observed has an excess of 45)</font> </p> <p><font face="Times"> Expected <em>aa</em> = 1,000*0.65<sup>2</sup> = 422.5 (observed has a deficiency of 22.5)</font> </p> <p><font face="Times"><b>Summary of homozygote deficiency or excess relative to HWE</b>:</font> </p> <p><font face="Times"> Pop. 1. Observed = Expected: perfect fit</font> <br> <font face="Times"> Pop. 2. Excess of 15.5 homozygotes: some inbreeding</font> <br> <font face="Times"> Pop. 3. Deficiency of 45 homozygotes: outbred or experiencing a Wahlund effect (isolate breaking).</font></p> </blockquote> <p><b>Step 3. Calculate the local <font color="#ff0000">observed </font>heterozygosities</b> of each subpopulation (we will call them <em>H</em><sub>obs s</sub>, where the subscript <i>s</i> refers to the <i>s</i><sup>th</sup> of <i>n</i> subpopulations; <i>n</i> = 3 in this example). Here we count <b>genotypes</b> (heterozygous individuals), not alleles: </p> <blockquote> <blockquote> <p><i>H</i><sub>obs 1</sub> = 250 / 500 = 0.5 </p> <p><i>H</i><sub>obs 2</sub> = 30 / 100 = 0.3</p> <p><i>H</i><sub>obs 3</sub> = 500 / 1000 = 0.5</p> </blockquote> </blockquote> <b>Step 4.
Calculate the local expected heterozygosity, or gene diversity, of each subpopulation </b>(modified version of Eqn 35.1): <blockquote><img src="EqnFST.2.gif" border="0" height="97" width="559" align="middle"> <b>Eqns FST.2</b> <p>(With two alleles it is actually easier to use 2<i>pq</i> than the summation format of Eqn 35.1.)</p> </blockquote> <blockquote><font color="#0000ff"><b><font size="+2">Notation</font></b>: Note that I am using <i>p</i><sub>1</sub> and <i>q</i><sub>1</sub> here (where the subscripts refer to subpopulations 1 through 3). We would need multiple subscripts if we were using the notation of Eqn 35.1, where the allele frequencies are <i>p</i><sub>i</sub> (and <i>i</i> refers to alleles 1 to <i>k</i>). Indeed, with real multilocus, multipopulation data, we would have a triple summation and three subscripts: one for alleles (<i>i</i> = 1 to <i>k</i>), one for loci (<font face="Mistral">l</font> = 1 to <i>m</i>), and one for subpopulations (<i>s</i> = 1 to <i>n</i>).</font></blockquote> <b>Step 5. Calculate the local inbreeding coefficient of each subpopulation </b>(same as Eqn 35.4, except that we are subscripting for the subpopulations): <blockquote><img src="EqnFST.3.gif" style="border: 0px solid ; height: 35px; width: 126px;" align="middle" title="" alt="FS"> where <i>s</i> (<i>s</i> = 1 to 3) refers to the subpopulation <b>Eqn FST.3</b></blockquote> <blockquote><i>F</i><sub>1</sub> = (0.5 &minus; 0.5) / 0.5 = 0 <p><i>F</i><sub>2</sub> = (0.455 &minus; 0.3) / 0.455 = 0.341 <br> [a positive <i>F</i> means fewer heterozygotes than expected, which indicates inbreeding] </p> <p><i>F</i><sub>3</sub> = (0.455 &minus; 0.5) / 0.455 = &minus;0.099 <br> [a negative <i>F</i> means more heterozygotes than expected, which indicates excess outbreeding]</p> </blockquote> <b>Step 6.
Calculate <img src="pbar.gif" border="0" height="15" width="12"> (<i>p-bar</i>, the frequency of allele <i>A</i>) over the total population.</b> <br> [<i>Note that if we had more alleles we could combine this step and Step 7 into a single "global gene frequencies" step, or have one step for each allele frequency</i>]. <blockquote><img src="pbarcalc1a.gif" border="0" height="31" width="360" align="middle"> {genotype splitting method}</blockquote> <blockquote> <blockquote>or (yields the same answer)</blockquote> <img src="pbarcalc2.gif" border="0" height="31" width="357" align="middle"> {using Eqn FST.1 values for <i>p</i><sub>s</sub>} <p> <i>Note that we weight each subpopulation by its <b>population size</b>.</i></p></blockquote> <b>Step 7. Calculate <img src="qbara.gif" border="0" height="15" width="12"> (<i>q-bar</i>, the frequency of allele <i>a</i>) over the total population.</b> <blockquote><img src="qbarcalc.gif" border="0" height="31" width="360"></blockquote> <blockquote><b><font color="#ff0000">Check</font></b>: <i>p-bar</i> + <i>q-bar</i> = 0.4156 + 0.5844 = 1.0 (as required by Eqn 31.1). <p> The check doesn't guarantee that our result is correct, but if the two frequencies don't sum to one, we know we miscalculated.</p></blockquote> <p><b>Step 8.
Calculate the global heterozygosity indices (over <font color="#0000ff">I</font>ndividuals, <font color="#0000ff">S</font>ubpopulations and the <font color="#0000ff">T</font>otal population)</b></p> <blockquote> <p><i>Note that the first two calculations employ an average of the subpopulation values, weighted by subpopulation size.</i></p> <p><i>H</i><sub><font color="#0000ff">I</font> </sub>based on <b><font color="#ff0000">observed</font></b> heterozygosities of <b>individuals</b> within subpopulations </p> <p><img src="EqnFST.4.gif" style="border: 0px solid ; height: 84px; width: 372px;" title="" alt="HI"> <b>Eqn FST.4</b> </p> <p><i>H</i><sub><font color="#0000ff">S</font></sub> based on <b><font color="#0000cc">expected</font></b> heterozygosities in <b>subpopulations</b></p> <p> <img src="EqnFST.5.gif" style="border: 0px solid ; height: 85px; width: 383px;" title="" alt="HS"> <b>Eqn FST.5</b> </p> <p><i>H</i><sub><font color="#0000ff">T</font> </sub>based on <b><font color="#0000cc">expected</font></b> heterozygosity for the overall total population (using global allele frequencies and a modified form of Eqn 35.1): </p> <p><img src="Htcalc.gif" style="border: 0px solid ; height: 21px; width: 359px;" title="" alt="HTcalc"> <b>Eqn FST.6</b> </p> </blockquote> <p>Alternatively, with only two alleles, we could calculate it as 2 &times; <i>p</i>-bar &times; <i>q</i>-bar = 2 &times; 0.4156 &times; 0.5844 = 0.4858.</p> <b>Step 9. CALCULATE THE GLOBAL <i>F</i>-STATISTICS:</b> <blockquote>Compare and contrast the global <i>F</i><sub>IS</sub> below with the "local inbreeding coefficient" <i>F</i><sub>s</sub> of Step 5. <br> For <i>F</i><sub>IS</sub> we use weighted averages of the heterozygosities over all the subpopulations (<i>H</i><sub>I</sub> and <i>H</i><sub>S</sub> from Step 8) rather than the values for a single subpopulation. <br>Both <i>F</i><sub>IS</sub> and <i>F</i><sub>s</sub>, however, compare <b>observed</b> heterozygosities with expected ones, <br> as does <i>F</i><sub>IT</sub> (observed <i>H</i><sub>I</sub> against expected <i>H</i><sub>T</sub>), whereas <i>F</i><sub>ST</sub> is based entirely on <b>expected</b> heterozygosities.
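<p>The arithmetic of Steps 1 to 8, together with the three global <i>F</i>-statistics in Eqns FST.7 to FST.9 that follow, can be cross-checked with a short script. The Python below is a minimal sketch, not part of the original lecture: it assumes a single locus with two alleles and assumes that Eqns FST.7 to FST.9 take their usual forms, <i>F</i><sub>IS</sub> = (<i>H</i><sub>S</sub> &minus; <i>H</i><sub>I</sub>) / <i>H</i><sub>S</sub>, <i>F</i><sub>ST</sub> = (<i>H</i><sub>T</sub> &minus; <i>H</i><sub>S</sub>) / <i>H</i><sub>T</sub> and <i>F</i><sub>IT</sub> = (<i>H</i><sub>T</sub> &minus; <i>H</i><sub>I</sub>) / <i>H</i><sub>T</sub>. The function name <code>f_statistics</code> and the layout of the input dictionary are hypothetical choices made for illustration.</p>
<pre>
```python
# Minimal sketch (hypothetical names): F-statistics for one two-allele locus.
# Genotype counts (AA, Aa, aa) per subpopulation, copied from the table above.
genotype_counts = {
    1: (125, 250, 125),
    2: (50, 30, 20),
    3: (100, 500, 400),
}


def f_statistics(counts):
    """Follow Steps 1-9 and return local and global quantities in a dict.

    Assumes every subpopulation is polymorphic (expected heterozygosity
    must be nonzero, otherwise F_s divides by zero).
    """
    sizes, p, h_obs, h_exp, f_local = {}, {}, {}, {}, {}
    for s, (n_AA, n_Aa, n_aa) in counts.items():
        n = n_AA + n_Aa + n_aa              # N_s, individuals genotyped
        sizes[s] = n
        p[s] = (2 * n_AA + n_Aa) / (2 * n)  # Step 1: 2 * N_s alleles
        q = 1 - p[s]
        h_obs[s] = n_Aa / n                 # Step 3: observed heterozygosity
        h_exp[s] = 2 * p[s] * q             # Step 4: expected heterozygosity, 2pq
        f_local[s] = (h_exp[s] - h_obs[s]) / h_exp[s]  # Step 5: local F_s

    total = sum(sizes.values())
    # Steps 6-7: global allele frequencies, weighted by subpopulation size
    p_bar = sum(sizes[s] * p[s] for s in counts) / total
    q_bar = 1 - p_bar
    # Step 8: size-weighted H_I and H_S, then H_T from the global frequencies
    h_i = sum(sizes[s] * h_obs[s] for s in counts) / total
    h_s = sum(sizes[s] * h_exp[s] for s in counts) / total
    h_t = 2 * p_bar * q_bar
    # Step 9: global F-statistics (assumed usual forms of Eqns FST.7-FST.9)
    return {
        "p": p, "H_obs": h_obs, "H_exp": h_exp, "F_s": f_local,
        "p_bar": p_bar, "q_bar": q_bar,
        "H_I": h_i, "H_S": h_s, "H_T": h_t,
        "F_IS": (h_s - h_i) / h_s,
        "F_ST": (h_t - h_s) / h_t,
        "F_IT": (h_t - h_i) / h_t,
    }


results = f_statistics(genotype_counts)
for key in ("p_bar", "H_I", "H_S", "H_T", "F_IS", "F_ST", "F_IT"):
    print(f"{key}: {results[key]:.4f}")
```
</pre>
<p>With the genotype counts from the table, the script should print p_bar = 0.4156 and H_T = 0.4858 (as in Steps 6 to 8), H_I = 0.4875, H_S = 0.4691, F_IS = -0.0393, F_ST = 0.0344 (the roughly 3.4% quoted in Step 10) and F_IT = -0.0036. Like the hand calculation, it uses 2<i>pq</i> from the sample frequencies without any small-sample correction.</p>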
<p><img src="EqnFIS.gif" style="border: 0px solid ; height: 32px; width: 299px;" align="middle" title="" alt="FIScalc"> <b>Eqn FST.7</b></p> </blockquote> <blockquote><img src="EqnFST.gif" style="border: 0px solid ; height: 32px; width: 291px;" align="middle" title="" alt="FSTcalc"> <b>Eqn FST.8</b></blockquote> <blockquote><img src="EqnFIT.gif" style="border: 0px solid ; height: 32px; width: 299px;" align="middle" title="" alt="FITcalc"> <b>Eqn FST.9</b> <p><b>Notation note</b>: the subscripts I, S, and T are not summation subscripts. They simply indicate the level of our analysis. Likewise, the <i>s</i> on <i>F</i><sub>s</sub> in Step 5 or on the <i>p</i><sub>s</sub> in Step 1 (the <i>s</i> was implicit there) just tell us what we are referring to. In contrast, the subscripts for Eqn 35.1 and 35.2 are used in summations and change as we work through the pieces of the calculation.</p> </blockquote> <b><font face="Times"><font size="+1">Step 10. Finally, draw some conclusions about the genetic structure of the population and its subpopulations.</font></font></b> <blockquote><font face="Times"><font size="+1">1) One of the possible HWE conclusions we could make:</font></font> <blockquote><font face="Times"><font size="+1">Pop. 1 is consistent with HWE (results of Step 2)</font></font></blockquote> <font face="Times"><font size="+1">2) Two of the possible "local inbreeding" conclusions we could make from Step 5:</font></font> <blockquote><font face="Times"><font size="+1">Pop. 2 is inbred (results of Step 5), and</font></font> <br> <font face="Times"><font size="+1">Pop. 
3 may have disassortative mating or be experiencing a Wahlund effect (more heterozygotes than expected).</font></font></blockquote> <b><font face="Times"><font size="+1">3) Conclusion concerning the overall degree of genetic differentiation (<i>F</i></font><sub>ST</sub><font size="+1">)</font></font></b> <blockquote><font face="Times"><font size="+1">Subdivision of the population, possibly due to genetic drift,</font></font> <br> <font face="Times"><font size="+1">accounts for approximately 3.4% of the total genetic variation</font></font> <br> <font face="Times"><font size="+1"> (the result of the Eqn FST.8 <i>F</i></font><sub>ST</sub><font size="+1"> calculation in Step 9).</font></font></blockquote> <font face="Times"><font size="+1">4) No excess or deficiency of heterozygotes over the total population (<i>F</i></font><sub>IT</sub><font size="+1"> is nearly zero).</font></font></blockquote> <font face="Times"><font size="+1"><a href="#Top">Return to top</a></font></font> </body> </html>