Methods in Molecular Biology TM
VOLUME 195
Quantitative
Trait Loci
Methods and Protocols
Edited by
Nicola J. Camp
Angela Cox
HUMANA PRESS
1
Association Studies
Jennifer H. Barrett
1. Introduction
A classical case-control study design is frequently used in genetic epidemiol-
ogy to investigate the association between genotype and the presence or absence
of disease. Association studies can also be useful in the investigation of quantita-
tive traits. The aim of such studies is to test for association at the population
level between the quantitative trait and genotype at a particular locus. Whether
investigating qualitative or quantitative traits, such studies depend on the prior
identification of a candidate gene or genes. The genotyped locus could either
be a polymorphism within a potentially trait-affecting gene or a marker in
linkage disequilibrium with such a gene. Currently, screening of the whole
genome is only feasible using linkage analysis, which is discussed elsewhere,
because linkage extends over much greater distances than does linkage disequi-
librium.
Quantitative trait association studies are based on a sample of unrelated
subjects from the population. Various sampling designs are possible, including
random sampling and sampling on the basis of an extreme phenotype. The
advantages and disadvantages of these alternative designs are discussed.
The basic method of analysis is analysis of variance (see Subheading
2.1.), a standard statistical technique for testing for differences in mean between
two or more groups, on the basis of the comparison of between- and within-
group variances. An alternative if subjects are sampled on the basis of extreme
phenotype is to compare genotypes between groups with high and low trait
values (see Subheading 2.2.).
From: Methods in Molecular Biology: vol. 195: Quantitative Trait Loci: Methods and Protocols.
Edited by: N. J. Camp and A. Cox Humana Press, Inc., Totowa, NJ
2. Methods
2.1. Analysis of Variance and Linear Regression
The standard approach to the analysis of quantitative trait association studies
assumes the following model. The phenotype $y_{ij}$ of individual i with genotype
j at the locus of interest is given by
$$y_{ij} = \mu_j + e_i \tag{1}$$
where $\mu_j$ is the mean for the jth genotype and $e_i$ represents residual environmental
and possibly polygenic effects for individual i, assumed to be Normally distributed
with mean 0 and variance $\sigma_e^2$. The data required consist of measured
phenotypes and genotypes on a sample of unrelated individuals. The parameters
µj are estimated in the obvious way by the mean values of individuals with
genotype j. The F-statistic from analysis of variance (ANOVA), the ratio of
between- and within-genotype variances, is used to test for association
between genotype and phenotype, because under the null hypothesis that all
genotypes have the same mean and variance, this ratio is expected to be close to 1. This
approach has been called the measured-genotype test (1), in contrast to earlier
biometrical methods that use information on the distribution of the phenotype
only (i.e., with unmeasured genotype), which are discussed briefly in Note 1.
Equivalently, a linear regression analysis of phenotype on genotype can be
carried out, possibly including as covariates other factors that may be related
to phenotype. Where the genotype is determined by one biallelic polymorphism
(with possible genotypes AA, AB, and BB), a test for trend is provided by
regressing the phenotype on the number of copies of the A allele.
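As a minimal sketch of the test for trend, the following Python code regresses phenotype on the number of copies of the A allele by ordinary least squares and returns the slope with its t-statistic. Python is used here purely for illustration; the function name trend_regression and the data are hypothetical and do not come from any study cited in this chapter.

```python
import math

def trend_regression(allele_counts, phenotypes):
    """Least-squares regression of phenotype on allele count (0, 1, or 2).

    Returns (intercept, slope, t_statistic). The t-statistic tests slope = 0
    and has n - 2 degrees of freedom under the model assumptions.
    """
    n = len(phenotypes)
    mean_x = sum(allele_counts) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in allele_counts)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(allele_counts, phenotypes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    rss = sum((y - intercept - slope * x) ** 2
              for x, y in zip(allele_counts, phenotypes))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return intercept, slope, slope / se_slope

# Hypothetical data: copies of the A allele and a trait value per subject.
copies = [0, 0, 0, 1, 1, 1, 2, 2, 2]
trait = [10.0, 12.0, 11.0, 14.0, 13.0, 15.0, 17.0, 16.0, 18.0]
intercept, slope, t_stat = trend_regression(copies, trait)
print(f"slope per A allele = {slope:.2f}, t = {t_stat:.2f}")
```

The slope estimates the average change in trait value per additional copy of the A allele, which is why this coding corresponds to an additive (trend) model rather than to three unconstrained genotype means.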
There are many examples of this type of approach in the literature. For
example, O’Donnell et al. (2) used multiple linear regression to investigate the
relationship between diastolic blood pressure and different genotypes of the
angiotensin-converting enzyme (ACE) gene. Hegele et al. (3) used analysis of
variance to demonstrate association between serum concentrations of creatinine
and urea and the gene encoding angiotensinogen (AGT).
3. Interpretation
In common with association studies for qualitative traits, a significant associa-
tion does not demonstrate an effect of the polymorphism considered, because
it may also arise through linkage disequilibrium with another locus. A further
similarity is that population admixture can lead to spurious associations. For
this reason, family-based approaches, such as the transmission-disequilibrium
test for quantitative traits (7), have been developed (see Chapter 5).
3.1. Heterogeneity
Published results of association studies are not always in agreement, for
quantitative as for qualitative traits. Because for most complex traits the effect of any
one locus is likely to be small, individual studies are often not sufficiently
powerful to detect association. To address this issue, Juo et al. (8) carried out
a meta-analysis of studies investigating association between apolipoprotein A-
I levels and variants of the apolipoprotein gene, which had produced conflicting
results. This is a potentially useful approach, but may be flawed by publication
bias, which is likely to be more of an issue in epidemiological studies than in
clinical trials. There is also an assumption that patients are genetically and
clinically homogeneous, with similar environmental exposures.
3.2. Using Extremes
An important consideration when using extreme sampling strategies (as
outlined in Subheading 2.2.) is that extremes may be atypical of the quantitative
trait as a whole in that they may be under the influence of other genes.
A clear example of this, cited in ref. 4, is that studying individuals with
achondroplastic dwarfism would be inappropriate if the primary interest were
in identifying genes controlling height.
3.3. Power of Association Studies
An attractive feature of association studies is that they may require smaller
sample sizes than methods based on linkage (9).
Schork et al. (5) investigated analytically the power of the extreme sampling
method (see Subheading 2.2.) to detect association between the trait and a
single biallelic marker in linkage disequilibrium with a trait-affecting locus.
Power depends on many factors, including locus-specific heritability, degree
of linkage disequilibrium, allele frequencies, mode of inheritance, and choice
of threshold. In some settings, overall sample sizes of less than 500 provided
adequate power to detect association with a locus accounting for 10% of the
trait variance.
The power of several methods of analysis, variants of those described here,
has been compared in a simulation study (10). Under the models considered,
ANOVA/linear regression (see Subheading 2.1.) generally performed better
than a variant of the extremes method (see Subheading 2.2.), based on the
same number of genotyped individuals, as most of the information on phenotype
is lost by categorizing into “high” and “low” values. As with any method based
on selective sampling, another drawback is that it is also necessary to phenotype
a larger number of subjects to achieve the same sample size for analysis. The
same authors suggested a variation on ANOVA/linear regression, the truncated
measured genotype (TMG) test, where only extremes are included in the analysis
(see Note 4). This TMG test was found to be more powerful than ANOVA/
linear regression for the same sample size of genotyped individuals, although,
again, a larger number of subjects must be phenotyped to achieve this. These
results are, however, dependent on the underlying genetic model. Allison et
al. (4) showed that extreme sampling can actually lead to a decrease in power
in the presence of another gene influencing the trait.
Page and Amos (10) also found that variants of ANOVA/linear regression
and of the TMG test, which are based on alleles, were more powerful than the
genotype-based methods discussed earlier. In these approaches, the phenotype
of each individual contributes to two groups, one for each allele or, in the case
of homozygotes, contributes twice to one group. Allele-based methods, which
“double the sample size,” are generally only valid under the assumption of
Hardy–Weinberg equilibrium (11). Furthermore, the greater power of this
approach is to be expected for the models used in these simulations, all of which
assumed an additive effect of the trait allele, and may not apply more generally.
Long and Langley (12) investigated the power to detect association using
a number of single nucleotide polymorphisms in the region of a quantitative
trait locus, but excluding the functional locus itself. Their test statistic was
based on ANOVA (see Subheading 2.1.); the significance of the largest
F-statistic obtained from any marker was estimated from its empirical distribution
based on 1000 random permutations of the phenotype/marker data. From their
simulations, they concluded that, using about 500 individuals, there was gener-
ally sufficient power to detect association if 5–10% of the phenotypic variation
was attributable to the locus. Furthermore, tests using single markers had greater
power than haplotype-based tests. The latter were based on comparing mean
trait values across all distinct haplotypes, and the authors concede that other
haplotype-based tests making use of additional information may perform better.

Table 1
Summary Data on ACE Levels According to Genotype
ace_geno    Mean         Std. dev.    Freq.
II           74.496732   31.729764    153
ID           90.233871   39.484505    124
DD          103.73913    46.564928     23
Total        83.243333   37.475487    300
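The permutation approach attributed to Long and Langley (12) above, in which the significance of the largest F-statistic over several markers is judged against its empirical distribution, might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: phenotypes are shuffled across individuals while each individual's marker genotypes are kept together, the function names are hypothetical, and the data are invented.

```python
import random

def f_statistic(genotypes, phenotypes):
    """One-way ANOVA F-statistic for phenotype grouped by genotype."""
    groups = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)
    n = len(phenotypes)
    k = len(groups)
    grand_mean = sum(phenotypes) / n
    ss_between = sum(len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
                     for ys in groups.values())
    ss_within = sum((y - sum(ys) / len(ys)) ** 2
                    for ys in groups.values() for y in ys)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def max_f_permutation_p(markers, phenotypes, n_perm=1000, seed=0):
    """Empirical p-value for the largest F-statistic over all markers.

    markers: list of per-marker genotype lists (one entry per individual).
    Returns (observed maximum F, permutation p-value).
    """
    rng = random.Random(seed)
    observed = max(f_statistic(m, phenotypes) for m in markers)
    shuffled = list(phenotypes)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype association
        if max(f_statistic(m, shuffled) for m in markers) >= observed:
            exceed += 1
    # Add 1 to numerator and denominator so the estimate is never zero.
    return observed, (exceed + 1) / (n_perm + 1)

# Hypothetical data: 12 individuals, 2 markers coded as genotype classes 0/1/2.
markers = [
    [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
    [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
]
trait = [5.1, 4.8, 5.3, 4.9, 6.2, 5.9, 6.4, 6.0, 7.1, 7.3, 6.8, 7.0]
observed, p_value = max_f_permutation_p(markers, trait, n_perm=1000, seed=1)
print(f"largest F = {observed:.1f}, permutation p = {p_value:.4f}")
```

Taking the maximum over markers inside each permutation is what accounts for testing several markers at once: the reference distribution is that of the best marker under the null hypothesis, not of a single prespecified marker.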
4. Software
The basic methods described in this chapter can be carried out in standard
statistical software packages such as Stata (13), which is used here, SAS, or
SPSS. The data would generally be expected to consist of one record for each
subject, recording their measured trait value, their genotype, and any covariates
of interest.
5. Worked Example
5.1. Analysis of Variance
An insertion/deletion (I/D) polymorphism of the ACE gene is associated
with plasma ACE levels in some populations. Plasma ACE levels were measured
and I/D genotype obtained for 300 Pima Indians to investigate the relationship
in this population (14). The data consist of 300 records, including ACE levels
(ranging from 7 to 238 units) and genotype (II, ID, or DD).
In Stata, ANOVA can be carried out by the command
oneway ace_leve ace_geno, tabulate
where ace_leve and ace_geno are the variables for ACE levels and genotype,
respectively. This produces Tables 1 and 2. Table 1 is produced by specifying
the tabulate option after the oneway command (for one-way analysis of variance)
and provides useful summary information. In addition to the mean ACE levels
within each genotype group (i.e., estimates of µ1, µ2, and µ3), the standard
deviation and the number of subjects with each genotype are displayed. It can
be seen that individuals with the DD genotype have much higher levels on
average than those with the II genotype, with intermediate levels found in
heterozygotes.
Table 2
Analysis of Variance Results for the Data in Table 1
Source            SS            df    MS           F       Prob > F
Between groups     27426.3358     2   13713.1679   10.38   0.0000
Within groups     392492.901    297    1321.52492
Total             419919.237    299    1404.41216

Table 2 is the basic ANOVA table. The total variability of the data is
measured by the total sum of squares (419,919) (i.e., the sum of squares of the
differences between each of the observations and the overall mean). This figure
can be separated into the between-genotype sum of squares (the sum of squares
of the difference between the group mean and the overall mean) and the within-
genotype sum of squares (the sum of squares of the differences between each
observation and the mean for the corresponding genotype). These are used to
estimate the corresponding variance, shown in the mean square (MS) column,
by dividing by the number of degrees of freedom. [The number of degrees of
freedom is one less than the number of groups or observations within groups
(i.e., 3−1 for between genotypes and 152+123+22 within genotypes).] The
F-statistic (10.38) is the ratio of these estimated variances. Under the null
hypothesis of no difference between groups, its expected value is 1 and it
should follow an F-distribution with (2, 297) degrees of freedom. In this case,
there is overwhelming evidence for a difference in level according to genotype.
The differences in the initial table are not the result of random variation.
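The analysis of variance table (Table 2) can also be obtained by using the
Stata command
anova ace_leve ace_geno
The output also reports R-squared, the between-genotype sum of squares as a
proportion of the total (27,426/419,919 = 0.065), indicating that the I/D genotype
explains 6.5% of the variance in plasma ACE levels in this population.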
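The sums of squares, F-statistic, and proportion of variance explained can be recovered from the summary data in Table 1 alone, which makes the arithmetic behind Table 2 explicit. The sketch below is written in Python for illustration; the function name anova_from_summary is hypothetical, and small differences from the Stata output are expected because the inputs are rounded summary values.

```python
# Summary data from Table 1: genotype -> (mean, standard deviation, count).
table1 = {
    "II": (74.496732, 31.729764, 153),
    "ID": (90.233871, 39.484505, 124),
    "DD": (103.73913, 46.564928, 23),
}

def anova_from_summary(summary):
    """One-way ANOVA quantities computed from group means, SDs, and counts."""
    n_total = sum(n for _, _, n in summary.values())
    k = len(summary)
    grand_mean = sum(mean * n for mean, _, n in summary.values()) / n_total
    # Between-genotype SS: group means around the overall mean.
    ss_between = sum(n * (mean - grand_mean) ** 2
                     for mean, _, n in summary.values())
    # Within-genotype SS: (n - 1) times each group's sample variance.
    ss_within = sum((n - 1) * sd ** 2 for _, sd, n in summary.values())
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "F": f_stat,
        "r_squared": ss_between / (ss_between + ss_within),
    }

result = anova_from_summary(table1)
print(f"F({result['df_between']}, {result['df_within']}) = {result['F']:.2f}, "
      f"R-squared = {result['r_squared']:.3f}")
```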
Slightly different output, but exactly the same F-test and estimate of
R-squared, can alternatively be obtained by carrying out a regression analysis:
xi: regress ace_leve i.ace_geno
The i. in front of the ACE genotype variable shows that this is to be treated
as a categorical variable in the analysis. If, instead, interest was in testing for
a trend in ACE levels with the number of D alleles, then genotype could be
Table 3
Genotype Frequencies in Two Extreme Groups Defined by the Top and
Bottom Quintiles of ACE Levels (a)
Quintile of               ace_geno
ace_leve        II       ID       DD     Total
1               39       20        3        62
             62.90    32.26     4.84    100.00
5               17       33       10        60
             28.33    55.00    16.67    100.00
Total           56       53       13       122
             45.90    43.44    10.66    100.00
Each cell shows the count, with the row percentage beneath it.
(a) Pearson chi2(2) = 15.5722, Pr = 0.000.
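The Pearson chi-squared statistic quoted in the footnote to Table 3 can be reproduced from the genotype counts in the two extreme groups. The following Python sketch is illustrative only (the function name pearson_chi_square is hypothetical); it uses the fact that, for 2 degrees of freedom, the upper-tail probability of the chi-squared distribution is exactly exp(-x/2).

```python
import math

def pearson_chi_square(table):
    """Pearson chi-squared statistic and degrees of freedom for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of group and genotype.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Counts of II, ID, DD in the bottom (1) and top (5) quintiles, from Table 3.
counts = [[39, 20, 3],
          [17, 33, 10]]
chi2, df = pearson_chi_square(counts)
p_value = math.exp(-chi2 / 2)  # exact upper tail, valid only when df == 2
print(f"chi2({df}) = {chi2:.4f}, p = {p_value:.5f}")
```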
6. Notes
1. Commingling analysis. The model underlying ANOVA (see Subheading 2.1.)
assumes that the data consist of a mixture of Normal distributions, one corresponding
to each genotype, each with the same variance. Even in the absence of genotype
data, statistical methods can be used to test for evidence of a mixture of more than
one Normal distribution. This “unmeasured genotype” approach is sometimes known
as commingling analysis. Evidence for a mixture of two or three distributions is
supportive of the hypothesis that a major gene underlies the trait, although, of
course, environmental factors could also give rise to distinct distributions. Model
fitting allows estimates to be made of parameters of interest such as $\mu_j$ and $\sigma_e^2$ and
the proportion of subjects in each class.
In the presence of genotype data in a candidate gene, the method of commingling
analysis can be extended to condition on the measured polymorphism(s). In addition
to testing for evidence of a mixture of distributions, this method also provides
evidence of whether the measured genotype itself gives rise to the mixture or whether
another polymorphism in the gene is a more likely explanation (15,16).
2. Distributional assumptions. In view of the underlying model for ANOVA, a Nor-
malizing transformation may be applied to the data. It is important to note that the
model assumes a Normal distribution within each genotype rather than overall. (In
commingling analysis, Normalizing the data leads to a conservative test for mixture,
as this may remove skewness in the overall distribution of the data arising from
the mixing of distributions.) The further assumption of a common within-genotype
variance can be tested, and homogeneity of variance may sometimes be achieved
by transformation. In the worked example in this chapter, there is some evidence
for heterogeneity in the variances. One advantage of the extremes method outlined
in Subheading 2.2. is that it does not rely on these distributional assumptions.
3. Nonparametric alternatives. Another nonparametric alternative to ANOVA is the
Kruskal–Wallis test. In this approach, the complete set of N trait values is ranked
from 1 to N, and the average rank in each genotype group is calculated. The test
statistic is based on comparing the genotype-specific average ranks with the overall
average rank of (N+1)/2. Under the null hypothesis of no genotype–phenotype
association, the test statistic follows a chi-squared distribution with two degrees of
freedom (assuming three genotypes), and a significantly higher value indicates that
the distributions differ. Applying this method to the example in Subheading 5., the
test statistic takes the value 18.2 (p=0.0001). This method is only slightly less
powerful than ANOVA when the data are Normally distributed and has the advantage
that distributional assumptions are not made. However, the test alone is not very
informative, and, in general, the estimates provided by ANOVA are also useful.
4. Analysis of extremes. An alternative suggestion for the analysis of extreme samples,
the TMG method mentioned earlier, is to use analysis of variance, ignoring the
sampling scheme. The analysis of variance assumption of random sampling from
a Normal distribution is violated, but it has been argued that, for large enough
sample sizes, the significance level of the test is still correct (10). The analogs of
this test and of those outlined in Subheadings 2.1. and 2.2. based on alleles rather
than genotypes, where each individual’s phenotype contributes twice to the analysis,
violate the further assumption of independence of observations.
Slatkin (17) suggested selecting individuals on the basis of unusually high (or
low) trait values and testing (1) for a difference in genotype frequency between the
selected sample and a random sample and (2) for differences in phenotype distribu-
tion according to genotype within the selected sample. These two tests are approxi-
mately independent and so can be combined into one overall test. This approach is
particularly powerful when a rare allele has a substantial effect on phenotype, even
though the overall proportion of phenotypic variance attributable to the locus is small.
5. Family-based samples. Although association studies as described in this chapter are
applicable to unrelated sets of cases and controls, extensions have been suggested
to allow for relatedness between subjects. Tregouet et al. (18) suggested using
estimating equations, a statistical method for estimating regression parameters based
on correlated data. They found that, for nuclear families of equal size, the power
of this approach was comparable to maximum likelihood and was similar to the
power expected in a sample of the same number of unrelated individuals. However,
the type 1 error rate could be substantially inflated in the presence of strong clustering
if the number of families is relatively small (<50).
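The Kruskal–Wallis statistic described in Note 3 can be sketched in Python as follows. The code compares each genotype's average rank with the overall average rank (N+1)/2, as in the note, and applies the standard correction for tied values; the function name kruskal_wallis and the data are hypothetical. With three genotypes the reference distribution has 2 degrees of freedom, for which the upper-tail probability is exactly exp(-H/2).

```python
import math

def kruskal_wallis(groups):
    """Kruskal-Wallis H statistic with tie correction.

    groups: list of lists of trait values, one list per genotype.
    Returns (H, degrees_of_freedom).
    """
    values = sorted(v for g in groups for v in g)
    n = len(values)
    rank_of = {}   # value -> average rank (ties share the mean rank)
    tie_term = 0   # sum of t^3 - t over runs of t tied values
    i = 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        t = j - i
        rank_of[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j
    overall_mean_rank = (n + 1) / 2
    h = 0.0
    for g in groups:
        mean_rank = sum(rank_of[v] for v in g) / len(g)
        h += len(g) * (mean_rank - overall_mean_rank) ** 2
    h *= 12 / (n * (n + 1))
    h /= 1 - tie_term / (n ** 3 - n)
    return h, len(groups) - 1

# Hypothetical trait values for three genotype groups.
h_stat, df = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
p_value = math.exp(-h_stat / 2)  # exact upper tail, valid only when df == 2
print(f"H = {h_stat:.2f} on {df} df, p = {p_value:.4f}")

# The value reported in Note 3 (H = 18.2 on 2 df) gives p of about 0.0001.
p_note3 = math.exp(-18.2 / 2)
```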
References
1. Boerwinkle, E., Chakraborty, R., and Sing, C. F. (1986) The use of measured
genotype information in the analysis of quantitative phenotypes in man. Ann. Hum.
Genet. 50, 181–194.
2. O’Donnell, C. J., Lindpaintner, K., Larson, M. G., Rao, V. S., Ordovas, J. M.,
Schaefer, E. J., et al. (1998) Evidence for association and genetic linkage of the
angiotensin-converting enzyme locus with hypertension and blood pressure in men
but not women in the Framingham Heart Study. Circulation 97, 1766–1772.
3. Hegele, R. A., Harris, S. B., Hanley, A. J. G., and Zinman, B. (1999) Association
between AGT codon 235 polymorphism and variation in serum concentrations of
creatinine and urea in Canadian Oji-Cree. Clin. Genet. 55, 438–443.
4. Allison, D. B., Heo, M., Schork, N. J., and Elston, R. C. (1998) Extreme selection
strategies in gene mapping studies of oligogenic quantitative traits do not always
increase power. Hum. Heredity 48, 97–107.
5. Schork, N. J., Nath, S. K., Fallin, D., and Chakravarti, A. (2000) Linkage disequilib-
rium analysis of biallelic DNA markers, human quantitative trait loci, and threshold-
defined case and control subjects. Am. J. Hum. Genet. 67, 1208–1218.
6. Risch, N. and Zhang, H. (1995) Extreme discordant sib pairs for mapping quantita-
tive trait loci in humans. Science 268, 1584–1589.
7. Allison, D. B. (1997) Transmission-disequilibrium tests for quantitative traits. Am.
J. Hum. Genet. 60, 676–690.
8. Juo, S.-H.H., Wyszynski, D. F., Beaty, T. H., Huang, H.-Y., and Bailey-Wilson,
J. E. (1999) Mild association between the A/G polymorphism in the promoter of
the apolipoprotein A-I gene and apolipoprotein A-I levels: a meta-analysis. Am.
J. Med. Genet. 82, 235–241.
9. Risch, N. J. (2000) Searching for genetic determinants in the new millennium.
Nature 405, 847–856.
10. Page, G. P. and Amos, C. I. (1999) Comparison of linkage-disequilibrium methods
for localization of genes influencing quantitative traits in humans. Am. J. Hum.
Genet. 64, 1194–1205.
11. Sasieni, P. (1997) From genotype to genes: doubling the sample size. Biometrics
53, 1253–1261.
12. Long, A. D. and Langley, C. H. (1999) The power of association studies to detect
the contribution of candidate gene loci to variation in complex traits. Genome Res.
9, 720–731.
13. StataCorp. 1999. Stata Statistical Software: Release 6.0. Stata Corporation, College
Station, TX.
14. Foy, C. A., McCormack, L. J., Knowler, W. C., Barrett, J. H., Catto, A., and
Grant, P. J. (1996) The angiotensin-I converting enzyme (ACE) gene I/D polymor-
phism and ACE levels in Pima Indians. J. Med. Genet. 33, 336–337.
15. Cambien, F., Costerousse, O., Tiret, L., Poirier, O., Lecerf, L., Gonzales, M. F.,
et al. (1994) Plasma level and gene polymorphism of angiotensin-converting enzyme
in relation to myocardial infarction. Circulation 90, 669–676.
16. Barrett, J. H., Foy, C. A., and Grant, P. J. (1996) Commingling analysis of the
distribution of a phenotype conditioned on two marker genotypes: application to
plasma angiotensin-converting enzyme levels. Genet. Epidemiol. 13, 615–625.
17. Slatkin, M. (1999) Disequilibrium mapping of a quantitative-trait locus in an
expanding population. Am. J. Hum. Genet. 64, 1765–1773.
18. Tregouet, D.-A., Ducimetiere, P., and Tiret, L. (1997) Testing association between
candidate-gene markers and phenotype in related individuals, by use of estimating
equations. Am. J. Hum. Genet. 61, 189–199.
2
Parametric Linkage Analysis
Lyle J. Palmer, Audrey H. Schnell, John S. Witte,
and Robert C. Elston
1. Introduction
“Linkage” describes the situation in which two syntenic loci are inherited
together. More specifically, two loci are said to be linked if they are close
enough to each other on a chromosome that recombination during meiosis is
uncommon enough for their cosegregation to be detectable within families.
Thus, linkage is a property of loci. All linkage techniques are essentially
designed to test for a statistical association between a marker (genetic or
biochemical) and a phenotypic trait. Classical model-based (parametric) linkage
analysis was developed to investigate the cosegregation of a genetic marker
and a binary trait (generally, disease affection status) within pedigrees. Model-
based linkage analysis of quantitative traits is also possible and forms the basis
of this chapter. Methods based on the exact likelihood calculation are described
in this chapter; Markov chain Monte Carlo methods are described in Chapter 6.
Classically, model-based linkage is tested by the calculation of the maximum
likelihood log-odds (LOD) score for each marker over a range of recombination
fractions (θ). Linkage of a marker to a trait phenotype relies on the detection
within families of low levels of recombination between the marker and trait
loci. This analysis assumes that a locus having both a major effect on phenotype
and a defined Mendelian pattern of inheritance is segregating within families.
The detailed model specification required makes model-based LOD score link-
age a stringent but nonrobust method for gene discovery. Although linkage
analysis can be repeated using many possible models, this constitutes multiple
testing; statistical power to detect linkage is reduced once appropriate correc-
tions are made (1).
Model-based linkage analysis may be used for the following: (1) to assess
the genetic distance between marker and disease-associated loci by estimating
the number of recombination events between them; (2) to order genes in a
genetic map if the recombination fractions (θ) are known; and (3) to identify
genetic forms of common diseases. The statistical level of significance generally
used for evidence of linkage is about $10^{-4}$, which corresponds to a LOD score
of 3.0, translating to a false-positive rate (i.e., the probability of making an
error when inferring the presence of linkage) of around 5% (2). Parametric linkage
analysis can be performed on nuclear or extended families. Multipoint linkage
analysis using more than one marker locus can be performed, which increases
statistical power to detect linkage. Similarly, linkage of more than one trait locus
is possible (3). However, the interpretation of LOD scores is then difficult and
somewhat controversial (4). It is unclear what level of significance is meaningful
for a linkage to a trait determined by multiple genes; there is no clear prior hypoth-
esis to which one may attribute a Bayesian prior probability and genetic studies
of complex traits often involve large-scale multiple testing. Lander and Kruglyak
(5) have suggested that standard linkage analysis of complex traits should use a
LOD of 3.3 (p≈0.00005) as the threshold for statistical significance, in order to
give a genomewide false-positive rate of 5%. This assumes linkage analysis with
one free parameter (θ), a dense genetic map of markers applied to a large number
of informative meioses, and a genome size of 3300 cM.
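The correspondence between LOD thresholds and p-values quoted above can be checked numerically using the asymptotic result stated in Subheading 3.2.: with one free parameter, 2 log_e(10) Z is distributed under the null hypothesis as a 1/2 : 1/2 mixture of a chi-squared distribution with 1 degree of freedom and a point mass at zero. The Python sketch below is illustrative; the function name lod_to_p_value is hypothetical, and the calculation gives a pointwise p-value, not a genomewide one.

```python
import math

def lod_to_p_value(lod):
    """Asymptotic one-sided p-value for a LOD score with one free parameter."""
    if lod <= 0:
        return 1.0  # the statistic is at least zero with probability 1
    chi_square = 2 * math.log(10) * lod
    # Upper tail of chi-squared with 1 df is erfc(sqrt(x / 2)); halving it
    # allows for the 1/2 : 1/2 mixture with a point mass at zero.
    return 0.5 * math.erfc(math.sqrt(chi_square / 2))

for lod in (3.0, 3.3):
    print(f"LOD {lod}: chi-square = {2 * math.log(10) * lod:.1f}, "
          f"p = {lod_to_p_value(lod):.6f}")
```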
1.1. Genetic Models
Simple genetic models are derived from Mendelian laws of inheritance. For
an individual, the pair of alleles (maternal and paternal) at a locus (the genotype)
is homozygous if the two alleles are the same allelic variant and heterozygous
if they are different allelic variants. If more than one locus is involved, the
pattern of alleles for a single chromosome is called a haplotype; together, the
two haplotypes for an individual are called a (multilocus) genotype. Each offspring
receives at each locus only one of the two alleles from a given parent;
alleles are transmitted randomly (i.e., each with probability 0.5), and offspring
genotypes are independent conditional on the parental genotypes. The probabil-
ity that a parent transmits a particular allele or haplotype to an offspring is
called the transmission probability and is the first component of a genetic model.
The second component of a genetic model concerns the relationship between
the (unobserved) genotypes and the observed characteristics, or phenotype, of
an individual. A phenotype may be discrete or, the focus of this volume,
continuous. Penetrance is defined as the probability (in the case of a continuous
phenotype, a probability density) of a phenotype given a genotype; a complete
genetic model requires specification of the penetrances of all possible genotypes.
With this model specification, we can calculate the likelihood for a set
of pedigrees, in which we assume that the only unknown parameter is the
recombination fraction θ on which the transmission probability depends. (We
shall assume that θ is scalar, although more generally it may be a vector if,
for example, multiple marker loci are involved or θ is made sex dependent.)
Letting L denote likelihood, we base inferences about θ on the likelihood ratio
$$\Lambda = \frac{L(\theta)}{L(1/2)} \tag{1}$$
or, equivalently, its logarithm. In human genetics, it is usual to take logarithms
to base 10 and we define the LOD score at θ to be
$$Z(\theta) = \log_{10}\left[\frac{L(\theta)}{L(1/2)}\right] \tag{2}$$
with a maximum Z(θ̂) at the maximum likelihood estimate θ̂. Thus, the LOD
as used in genetics is the logarithm of the likelihood for the data if there is
linkage divided by the likelihood if there is no linkage. Note that if L(1⁄2)>L(θ)
for some value of θ, then the corresponding LOD score is negative. Invariably,
it is the maximum LOD (sometimes referred to as the maxLOD) that is calculated
in linkage analyses, usually with θ̂ bounded at one-half.
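To make the definition of the LOD score concrete, the sketch below evaluates Z(θ) in the simplest textbook setting, which is an assumption introduced here and not the quantitative trait model of this chapter: n phase-known, fully informative meioses of which r are recombinant, so that L(θ) = θ^r (1−θ)^(n−r). The function names are hypothetical and the code is written in Python for illustration.

```python
import math

def lod_score(theta, recombinants, meioses):
    """LOD score Z(theta) for phase-known, fully informative meioses.

    Assumes the binomial likelihood L(theta) = theta^r * (1 - theta)^(n - r);
    theta must lie strictly between 0 and 1 unless r = 0 (theta = 0 allowed).
    """
    r, n = recombinants, meioses

    def log10_likelihood(t):
        total = 0.0
        if r > 0:
            total += r * math.log10(t)
        if n - r > 0:
            total += (n - r) * math.log10(1 - t)
        return total

    return log10_likelihood(theta) - log10_likelihood(0.5)

def max_lod(recombinants, meioses):
    """Maximum LOD score, with the estimate of theta bounded at one-half."""
    theta_hat = min(recombinants / meioses, 0.5)
    return theta_hat, lod_score(theta_hat, recombinants, meioses)

theta_hat, z_max = max_lod(2, 10)
print(f"theta_hat = {theta_hat:.2f}, maximum LOD = {z_max:.3f}")
# A poorly supported value of theta gives a negative LOD score.
print(f"Z(0.1) with 5 recombinants in 10 meioses = {lod_score(0.1, 5, 10):.3f}")
```

Bounding the estimate at one-half means that the maximum LOD score is never negative: when more than half the meioses are recombinant, the estimate is set to 1/2 and the LOD score is zero.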
When three-generational data are available, more power can be obtained by
estimating sex-specific recombination fractions θf and θm if they are different,
using the maximum log likelihood
where θ̃f and θ̃m are maximum likelihood estimates constrained so that θ̃f +
θ̃m = 1 (12).
2. Methods
We will discuss methods of exact likelihood calculations of the LOD score
statistics for linkage analysis. Sampling methods will be discussed in Chapter 6.
There are two approaches for model-based linkage analysis of a quantitative
trait based on direct maximization of the likelihood that are widely available,
have been previously published, and have software available: LODLINK and
LINKAGE. In each case, a single gene with two alleles is assumed to contribute
to the distribution of the trait.
2.1. The LINKAGE Software Package
In the LINKAGE package version 5.1 (10), the quantitative trait is described
by the mean for each genotype, the common homozygote variance, and a
multiplier for the heterozygote variance (see Note 1). Commingling analysis
is first applied to a quantitative trait using pedigree data in order to estimate
mixture parameters—means, standard deviation(s), and admixture propor-
tion(s)—under the assumption of a mixture of two Normal component distribu-
tions (13). Admixture resulting from two components is often the case of interest
in human linkage analysis; the “abnormal” components of the quantitative trait
distribution may correspond to one genotype (the recessive case) or to two
genotypes (the dominant case). The results of the commingling analysis are used
to recode individuals into liability classes, which are then treated as qualitative
outcomes in standard LOD-score-based linkage analysis using LINKAGE (11)
(see Note 2). The relative frequencies of alleles in the two component distributions
are also estimated by the commingling analysis and are used to determine
genotype probabilities of founder individuals in a pedigree (14). The ordinates
of the two component Normal distributions for chosen intervals are scaled and
are then used as the penetrance probabilities for the respective liability classes.
However, this pseudoquantitative algorithm employed in the LINKAGE
package is awkward, has the restriction that it assumes monogenic inheritance
of the trait being analyzed (15), and, in practice, has proven to result in less
statistical power than expected (16,17).
3. Interpretation
3.1. Assumptions Implicit in the Genetic Model
Model-based linkage analysis is often used with guessed values of the disease
allele frequencies and penetrances, and this will not inflate the significance of
a result (i.e., probability statements about the data on the assumption θ=1⁄2),
provided that the quantitative trait being modeled is, in fact, under the control
of a major locus in the families being studied and there are no errors in the
probability model assumed for the marker [it is not necessary for the marker
to be error-free—only that the allele frequencies and marker penetrances are
correct (20,21)]. Furthermore, given the assumptions underlying the likelihood,
we can maximize the LOD score over both θ and the parameters that describe
the mode of inheritance of the trait, and, provided the pedigrees are randomly
sampled or ascertained on the basis of the trait only, we obtain consistent
parameter estimates (22,23).
3.2. Statistical Inference
Model-based linkage analysis was originally derived for monogenic diseases
and was used exclusively for dichotomous disease affection status. Traditionally,
Z(θ̂)>3 has been taken as significant evidence for linkage (24). From general
likelihood theory, under the null hypothesis θ=1⁄2, the statistic 2[logₑ10]Z(θ̂) is
asymptotically distributed as a 1⁄2 : 1⁄2 mixture of χ²₁ and a point mass at zero,
so that Z(θ̂)>3 corresponds asymptotically to a statistic value greater than 13.8,
which translates to p<10⁻⁴ if we allow for the mixture of distributions, which
is equivalent to performing a one-sided χ²₁ test. Use of such an extremely small
p-value was chosen in an attempt to limit to 0.05 the probability of making
an error when concluding that linkage is present, using the fact that the prior
probability of linkage between two random autosomal loci in the human genome
is about 0.054. On the assumption that there is no appropriate prior probability
of linkage in the case of complex traits, Lander and Kruglyak (5) proposed
that the appropriate p-value should be based on the multiple testing performed
when the whole genome is scanned for linkage, whether or not such a scan
has been performed (25).
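The arithmetic linking a LOD score of 3 to a statistic of about 13.8 and a p-value on the order of 10⁻⁴ can be checked with a short sketch. The function names are hypothetical; the only facts used are the conversion factor 2 logₑ10 and the 1⁄2 : 1⁄2 mixture stated above, with the χ² tail (1 degree of freedom) written in terms of the complementary error function.

```python
import math

def lod_to_lrt(lod):
    """Convert a LOD score (base-10 log likelihood ratio) to 2 * ln(likelihood ratio)."""
    return 2.0 * math.log(10.0) * lod

def chi2_1df_tail(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom, x > 0."""
    return math.erfc(math.sqrt(x / 2.0))

def mixture_p_value(lod):
    """Asymptotic p-value under the 1/2 : 1/2 mixture of chi-square(1) and a point mass at 0."""
    stat = lod_to_lrt(lod)
    if stat <= 0.0:
        return 1.0
    return 0.5 * chi2_1df_tail(stat)

STAT = lod_to_lrt(3.0)
P_VALUE = mixture_p_value(3.0)
print(f"statistic = {STAT:.2f}, one-sided p = {P_VALUE:.2e}")
```

Halving the χ² tail probability is what makes the test one-sided: only values of θ̂ below 1⁄2 count as evidence for linkage.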
Many linkage programs assume 0≤θ̂≤0.5. LODLINK obtains the maximum
likelihood estimate over the whole interval between 0 and 1 because when
most of the data are only two generational, there are usually two maxima, one
less than 0.5 and one greater than 0.5. Should the larger maximum occur for
θ̂ > 0.5, this is evidence against linkage. If the maximum occurs for θ̂ < 0.5
and the LOD score for 1 − θ̂ is smaller, the result is in favor of linkage.
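Why two maxima arise, and how they are compared, can be seen in a toy example. This is only a sketch under stated assumptions, not LODLINK's algorithm: it uses a simple counting likelihood for fully informative meioses, a grid search instead of a proper optimizer, and made-up family data. For a phase-unknown two-generation family the likelihood is symmetric about θ=0.5, so it has a mirror-image peak above 0.5; phase-known meioses break the tie.

```python
import math

def lod_phase_known(theta, n, k):
    """LOD for n phase-known meioses with k recombinants."""
    return (k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
            - n * math.log10(0.5))

def lod_phase_unknown(theta, n, k):
    """LOD for n meioses when the parental phase is unknown (two equally likely phases)."""
    lik = 0.5 * (theta ** k * (1.0 - theta) ** (n - k)
                 + theta ** (n - k) * (1.0 - theta) ** k)
    return math.log10(lik) - n * math.log10(0.5)

def total_lod(theta, known, unknown):
    """Sum LOD scores over families; each family is (meioses, recombinants under one phase)."""
    return (sum(lod_phase_known(theta, n, k) for n, k in known)
            + sum(lod_phase_unknown(theta, n, k) for n, k in unknown))

def maximize_lod(known, unknown, steps=99):
    """Grid search over the open interval (0, 1), avoiding log(0) at the endpoints."""
    grid = [i / (steps + 1) for i in range(1, steps + 1)]
    return max(grid, key=lambda t: total_lod(t, known, unknown))

# Hypothetical data: one phase-known family and two phase-unknown families.
KNOWN = [(3, 0)]
UNKNOWN = [(4, 0), (4, 1)]

THETA_HAT = maximize_lod(KNOWN, UNKNOWN)
LOD_AT_MAX = total_lod(THETA_HAT, KNOWN, UNKNOWN)
LOD_AT_MIRROR = total_lod(1.0 - THETA_HAT, KNOWN, UNKNOWN)
print(f"theta_hat = {THETA_HAT:.2f}, LOD = {LOD_AT_MAX:.3f}, LOD at 1 - theta_hat = {LOD_AT_MIRROR:.3f}")
```

In this made-up data set the larger peak lies below 0.5 and the LOD score at 1 − θ̂ is smaller, which is the pattern described above as favoring linkage.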
3.3. Power and Efficient Study Design
Linkage studies depend on the availability of families in which at least one
parent is a double heterozygote for the two loci being investigated (i.e., the
marker and putative disease locus). Families may thus be informative or nonin-
formative with respect to either the genetic marker or trait. Highly polymorphic
markers with many, equally frequent alleles are generally most informative for
linkage analysis. As is the case with all genetic analysis, model-based linkage
analysis is dependent on consistent and accurate phenotypic assessment. Assum-
4. Software
4.1. The LINKAGE Software Package
The LINKAGE software package is available from
ftp://linkage.rockefeller.edu/software/linkage/ and is compiled for the DOS,
OS/2, Windows, UNIX, and VMS operating systems.
5. Worked Example
In this worked example we use dopamine-β-hydroxylase activity as the
quantitative trait of interest. Dopamine-β-hydroxylase (DBH) is an enzyme
that catalyzes the conversion of dopamine to norepinephrine (33). Several
studies found evidence that plasma and serum DBH levels are under control
of a major locus linked to the ABO blood group locus (34–36). In a model-
based linkage study of four large Caucasian families (37), Wilson and colleagues
found strong evidence (LOD=5.88 at θ=0.00) that a gene influencing DBH
activity is linked to the ABO blood group locus on chromosome 9q. This
analysis of square-root transformed DBH activity (37) forms the basis of our
worked example.
All of the files used in this example are available on the S.A.G.E. website
(http://darwin.cwru.edu/pub/sage.html). Although only a single Caucasian fam-
ily (HGAR Family 9) is used here because of space constraints, all four families
described by Wilson et al. (37) are available on our website. The LODLINK
program and the Family Structure Program (FSP), both part of the S.A.G.E.
v3.1 package of computer programs, will be used to perform the model-based
linkage analysis.
screen. For FSP screen 1 (Fig. 3), the user types in a name for the title of the
run. For this example, the box is checked to create the segregation analysis
data file. There is one record per individual in the family data file; the symbol
for male is 1 and the symbol for female is 2, and these numbers are typed into the
respective boxes.
For screen 2 (Fig. 4), it is necessary to fill in a FORTRAN format statement
that tells the program where the data are located and the required format (see
Note 6). The family ID must be numeric. The other parameters are alphanumeric
and the maximum length of each (i.e., the maximum number of columns) is
listed. Figure 5 shows the last FSP screen, which outputs the parameter file.
When the output parameter file box is clicked, a file download screen appears.
The option to save this file to disk should be chosen and the user should note
the location where the file is saved. The next step is to run FSP using the
parameter file just created and the original family data file to produce the .seg
file. How S.A.G.E. is run depends on the computer platform on which S.A.G.E.
is installed.
Table 1
Marker Locus Description File for ABO Blood Group

                            Explanation
MISSING=0
ABO                         ABO is the locus name
A1 = 0.190400               The alleles and their frequencies
A2 = 0.061200
B = 0.072800
O = 0.675600
;
1 = {A1/A1,A1/A2,A1/O}      1 is the phenotype code for blood group A1
2 = {A1/B}                  2 is the phenotype code for blood group A1B
3 = {A2/A2,A2/O}            3 is the phenotype code for blood group A2
4 = {A2/B}                  4 is the phenotype code for blood group A2B
5 = {B/B,B/O}               5 is the phenotype code for blood group B
6 = {O/O}                   6 is the phenotype code for blood group O
;
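As a consistency check on a locus description such as Table 1, the allele frequencies should sum to 1 and every possible genotype should map to exactly one phenotype code. The sketch below encodes the table's values in Python and derives the expected frequency of each phenotype code. Hardy-Weinberg proportions are an assumption introduced here for illustration, and the variable and function names are hypothetical; this is not part of S.A.G.E.

```python
# Allele frequencies and phenotype-to-genotype mapping copied from Table 1.
ALLELE_FREQ = {"A1": 0.190400, "A2": 0.061200, "B": 0.072800, "O": 0.675600}

PHENOTYPE_GENOTYPES = {
    1: [("A1", "A1"), ("A1", "A2"), ("A1", "O")],  # blood group A1
    2: [("A1", "B")],                              # blood group A1B
    3: [("A2", "A2"), ("A2", "O")],                # blood group A2
    4: [("A2", "B")],                              # blood group A2B
    5: [("B", "B"), ("B", "O")],                   # blood group B
    6: [("O", "O")],                               # blood group O
}

def genotype_freq(a, b, freq=ALLELE_FREQ):
    """Genotype frequency under Hardy-Weinberg proportions (an assumption)."""
    p, q = freq[a], freq[b]
    return p * p if a == b else 2.0 * p * q

def phenotype_freqs():
    """Expected frequency of each phenotype code: sum over its genotypes."""
    return {code: sum(genotype_freq(a, b) for a, b in genotypes)
            for code, genotypes in PHENOTYPE_GENOTYPES.items()}

PHENO_FREQ = phenotype_freqs()
for code, f in sorted(PHENO_FREQ.items()):
    print(f"phenotype code {code}: expected frequency {f:.4f}")
```

If either the allele frequencies or the phenotype frequencies fail to sum to 1, a genotype has been omitted or listed twice in the description file.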
7 (Fig. 12), the FORTRAN format statement is filled in. The first five parameters
are the family structure information created by FSP. The family ID, trait, and
marker phenotype symbols are in exactly the same format (i.e., in the same
columns) as the original family data (see Note 8). Figure 13 shows the screen
to output the LODLINK parameter file again, and the user should save the file
and note the location. LODLINK can now be run.
"But why would they not have had work? Why were people glad to obtain
work in procuring luxurious amusements and the gratification of
desires for the capitalists, selling themselves into the dirtiest and
most degrading occupations? It was simply because the profit-taking of
these same capitalists, by reducing the people's power of consumption
to only a small part of its power of production, had proportionately
contracted the field of productive employment, in which, under a
rational system, there must always be work for every worker until the
needs of all are satisfied, as is now the case. In defending their
luxurious wastefulness, the capitalists acknowledged the consequences
of one wrong in order to justify themselves in committing
another."
"Now then, Charles," said the teacher, "you may help us a little with
a question of conscience. One and another of us have told
thoroughly bad things about the profit system, treating it from both
its moral and its economic side. Might it not be possible that we
have done it an injustice? Have we perhaps painted too black a
picture of it? From the standpoint of justice we can hardly have
done so, for there are no words harsh enough to describe properly
the mockery it made of all humankind. But have we perhaps not
brought out strongly enough its economic impotence and the
hopelessness of the world's future as regards material well-being
so long as it had to be endured? Can you defend us in this
respect?"
"Rent and interest."
"How?"
THE PARABLE OF THE WATER TANK.
"And the thirst of the people was great, for now it was not as it had
been in the days of their fathers, when the land was open to them, for
each to seek water for himself, for now the capitalists had taken for
themselves all the wells and all the springs and the water mills and
the vessels and the buckets, so that no one could get water except
from the tank, which was the Market. And the people murmured against
the capitalists and said: 'Behold, the tank runs over and we die of
thirst. Give us therefore water, that we perish not.'
"And when the capitalists saw that the people still murmured and would
not listen to the soothsayers, and because they also feared that the
people might come upon the tank and take the water by force, they
brought to them certain holy men (but these were false prophets), who
told the people that they should remain at peace and not trouble the
capitalists because they were thirsty. And these holy men, who were
false prophets, testified to the people that this suffering had been
sent upon them by God for the salvation of their souls, and that if
they bore it patiently and neither craved water nor disturbed the
capitalists, it would come to pass that when they were dead they
would come to a land where there are no capitalists but water in
abundance. Nevertheless, there were also some true prophets of God,
and these had compassion on the people and did not prophesy for the
capitalists, but rather spoke constantly against them.
"Now when the capitalists saw that the people still murmured and would
not remain calm despite the words of the soothsayers and the false
prophets, they themselves went among them and dipped their fingers in
the water that overflowed from the tank and sprinkled drops of water
from their fingers upon the people who thronged around the tank, and
the name of these drops of water was charity, and they were
exceedingly bitter.
"And when the capitalists saw yet again that neither the words of the
soothsayers nor those of the holy men who were false prophets, nor the
drops that were called charity, calmed the people, but rather stirred
them up and gathered them at the tank as if they meant to take the
water by force, they held a common council and sent their men secretly
among the people. And these men sought out the mightiest among the
people and all who had skill in war, and took them aside and spoke
forcefully to them, saying:
"And after a long time the water in the tank had fallen, for the
capitalists made fountains and fish ponds of its water and bathed in
it, they and their wives and children, and wasted the water for their
own pleasure.
"And when the capitalists saw that the tank was empty, they said: 'The
bad times are over,' and they sent out to hire the people, that they
might bring water to fill it again. And for the water that the people
brought to the tank they received a penny for every bucket, but for
the water that the capitalists drew from the tank to give to the
people they received two pennies, that they might have a profit. And
after a time the tank was again overflowing as before.
"And when the people had many times filled the tank until it
overflowed, and had thirsted until the capitalists had wasted the
water in it, it came to pass that there arose in the land certain men
who were called agitators, for they stirred up the people to rise in
resistance. And they spoke to the people, urging them to unite, and
then they would no longer need to serve the capitalists nor suffer any
more for want of water. And in the eyes of the capitalists these
agitators were pernicious persons, and they would surely have
crucified them, but dared not for fear of the people.