2009 Oct 16;5(10):e1000686. doi: 10.1371/journal.pgen.1000686

Figure 2. Principal component analysis of two populations.

(A) Consider a sample of Inline graphic individuals from population A (indicated by the red circle) and Inline graphic from population B (indicated by the blue circle), where the two populations have the same effective population size of Inline graphic and are both derived from a single ancestral population, also of size Inline graphic, with the split occurring at a time Inline graphic in the past. (B) The expected locations of these two sets of samples on the first PC are determined by the time since divergence (the Euclidean distance between the samples is Inline graphic; see text for definitions) and by the relative sample sizes from the two populations, with the larger sample lying closer to the origin. Defining Inline graphic, the relative locations of the two populations on the first PC are Inline graphic for samples from population A and Inline graphic for samples from population B (note that the sign is arbitrary). (C) To investigate the effect of finite genome size, simulations were carried out for the model shown in part A, with 80 genomes sampled from population A, 20 from population B, a split time of 0.02 Inline graphic generations (Inline graphic), and between Inline graphic and Inline graphic SNPs. Lines indicate the analytical expectation. Jitter has been added along the x-axis for clarity. Note that the separation of samples with 10 SNPs does not correlate with population and simply reflects random clustering arising from the small number of SNPs.
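
The qualitative result in panel B, that the larger sample lies closer to the origin on the first PC, can be checked with a small numerical experiment. The sketch below is only an illustration and is not the simulation procedure used for panel C: it assumes a Balding-Nichols style allele-frequency model in place of the split model in panel A, and the function names (`simulate_two_populations`, `first_pc`) and the values `n_snps=5000` and `fst=0.05` are hypothetical choices made here. Only the sample sizes of 80 and 20 are taken from the caption.

```python
import numpy as np


def simulate_two_populations(n_a=80, n_b=20, n_snps=5000, fst=0.05, seed=0):
    """Simulate haploid 0/1 genotypes for two diverged populations.

    Assumption: a Balding-Nichols model stands in for the split model of
    panel A; n_snps and fst are illustrative values, not from the source.
    """
    rng = np.random.default_rng(seed)
    # Ancestral allele frequencies, kept away from 0 and 1.
    p = rng.uniform(0.1, 0.9, size=n_snps)
    # Each population's frequency is drawn from a Beta centred on p,
    # with spread controlled by fst.
    alpha = p * (1.0 - fst) / fst
    beta = (1.0 - p) * (1.0 - fst) / fst
    p_a = rng.beta(alpha, beta)
    p_b = rng.beta(alpha, beta)
    # One Bernoulli draw per individual per SNP (rows: individuals).
    g_a = rng.binomial(1, p_a, size=(n_a, n_snps))
    g_b = rng.binomial(1, p_b, size=(n_b, n_snps))
    return np.vstack([g_a, g_b]).astype(float)


def first_pc(genotypes):
    """Return each individual's coordinate on the first principal component."""
    # Centre each SNP (column) on its sample mean before the decomposition.
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]


if __name__ == "__main__":
    n_a, n_b = 80, 20
    pc1 = first_pc(simulate_two_populations(n_a=n_a, n_b=n_b))
    # The sign of a principal component is arbitrary, so compare magnitudes.
    print("mean PC1, population A:", pc1[:n_a].mean())
    print("mean PC1, population B:", pc1[n_a:].mean())
```

Because every SNP is mean-centred, the PC1 coordinates sum to zero across all individuals, so the mean coordinate of the 80 samples from population A has one quarter of the magnitude of the mean coordinate of the 20 samples from population B, with opposite sign. This matches the caption's statement that the larger sample sits closer to the origin.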