Introduction

Although many environmental factors may influence body height, in Westernized countries interindividual variation in adult height is to a large extent the result of genetic variation. Heritability estimates for adult height at age 20 years or older range from 0.77 to 0.94 in men and 0.68 to 0.93 in women from seven European Countries and Australia.1 These high estimates are in line with those from other studies.2, 3, 4, 5

With the heritability of adult height well quantified, the next step is to identify specific genes involved in the variation in height. Several linkage and associations studies have been conducted to locate chromosomal regions affecting height. Association studies focus on candidate genes and, with respect to height, have concentrated on an array of growth-related candidate genes. Associations were shown for vitamin D receptor gene,6 D2 dopamine receptor (DRD2) gene,7, 8 collagen IA1,9 oestrogen receptor gene,10 luteinizing hormone beta gene,11 and the SHOX gene.12 Although association studies have relatively high statistical power to detect quantitative trait loci (QTLs) with small effects, they have recently been criticized as having a higher risk of false positives.13 Also, by focusing on genes in known pathways, it is possible to overlook genes involved in currently unknown pathways. Linkage studies are needed to scan the whole genome for chromosomal regions that are related to the trait of interest. However, the statistical power for localizing QTLs in studies using sib-pair designs is much lower. At present, seven papers have been published with results of genomewide scans for adult height,14, 15, 16, 17, 18, 19, 20 while results from one study pertain to a genome scan in a limited region.21 Table 1 presents an overview of the results of these studies. In order to compare the outcomes of the studies, all chromosomal positions were transformed to Kosambi cM for both the Decode and Marshfield map.

Table 1 Overview of the results of previous genomewide scans for height

LOD scores of 3 and higher were found for regions on six chromosomes: chromosome 3,16 chromosome 6,14, 20 chromosome 7,14 chromosome 12,14 chromosome 13,14 and chromosome 14.19 For other positions along the genome LOD scores were below 3. Such lower LOD scores may still be informative and taken as suggestive for linkage, when a consistent pattern emerges for a specific chromosomal region.

The results of some studies in Table 1 seem to provide evidence for linkage for a select number of chromosomal regions. For chromosome 3, results centre around 176.54 and 181.87 cM with additional results at either end of this region, while a promising area at chromosome 5 is located near 138.64–139.33 cM. For chromosome 6, two high LOD scores were found for a region between 154.64 and 159.98 cM. Mukhopadhyay et al18 reported a peak near this same position, although they did not specify the exact location. For chromosome 7, the region of 155.1–173.71 cM seems a good candidate for QTLs, while for chromosome 9 there seems to be replication of the chromosomal region around 55 cM. For chromosome 17 there is a replication for position 103.53 cM and for chromosome 18 there is a replication in the region of 115.89 cM. Lastly, for chromosome 20 two nearby regions seem interesting, around 35 and around 60 cM.

Although gradually a pattern is emerging indicating the likely locations of QTLs for adult height, additional replication studies in independent samples are needed to draw more definite conclusions with regard to the relevant chromosomal regions for height. In this paper, we present the linkage results for adult height in a Dutch population of twins and their siblings.

Methods

Participants

In 1991 the Netherlands Twin Register (NTR) started a longitudinal survey study of health and lifestyle.22 To this aim, questionnaires were sent out in 1991, 1993, 1995, 1997, and 2000 to adolescent and adult twins and their family members. Twin pairs were asked to participate in all waves, parents were asked to participate in 1991, 1993, and 1995, siblings were included since 1995 and spouses in 2000. Data are currently being collected for a sixth survey.

Body height was obtained from all survey data collected between 1991 and 2000. Height data were only included when height was obtained at age 20 years or older. Most individuals completed questionnaires at more than one survey. Differences in height across the questionnaires were checked and height data were discarded when there was no consistency across questionnaires and differences were larger than 5 cm (N=118, 1.4%). In several subsamples, height was also measured in the laboratory during experimental protocols. When measured height was available for age 20 years or older, this was used in the analyses instead of self-reported height (N=973, 12%). As reported elsewhere, the correlation between self-reported and measured height for this sample is 0.93.1 Height at age 20 years or older was available for 5949 twins and 2138 siblings.

Based on questionnaire data on anxiety and depression, a genetic factor score was composed that was used for an extreme discordant and concordant selection for a QTL study of anxious depression (Netherlands Twin family Study of Anxious Depression, see Boomsma et al23 for a detailed description). The participants selected for the QTL study (n=2.724) were asked to provide a buccal swab for DNA isolation. Of the 1962 participants (72%) who returned a buccal swab, 917 (624 offspring and 293 parents) were genotyped over the entire genome. Participants for whom less than 50% of the markers were successfully typed were removed from the sample. This resulted in a subsample of 558 offspring and 278 parents from 192 families for whom genotyping was successful.

DNA collection, genotyping, and error checking

DNA collection has been described in detail elsewhere.22, 24 Genotyping was conducted by the Marshfield Laboratory with the 10 cM spaced microsatellite screening set 1025 with few alternative markers. Pedigrees were checked for Mendelian errors with the program Unknown26 and pedigree relationships in the entire sample with the GRR program.27 Mendelian errors were removed by assigning missing values to the marker scores if the errors appeared incidental. Likelihoods for recombinations were checked using the Merlin program.28 Excessive recombinations were observed for five markers indicating potential problems. Those markers were not included in the final analyses: two markers on chromosome 1 (D1S160-MS48 and D1S1627-ATA25E0); two markers (D11S1985-GGAA5C04 and D11S2006-GATA46A12) in a group of five very closely or identically mapped markers on chromosome 11; and one marker on chromosome 20 (D20S159-UT1307). The pseudoautosomal markers on the X-chromosome were not included in the analyses. For all other recombination problems, the data were cleaned using Merlin's procedure for identifying unlikely recombinations, which resulted in the assignment of missing values to genotypes if the likelihood of a recombination was less than 0.025.

Marker distances were assigned from the Decode map29 if available. For markers not mapped by Decode, the original distance provided on the Marshfield website30 was transformed by linear interpolation from adjacent markers with known Decode map values.

Identical by descent (IBD) estimation

If a sibling pair receives the same alleles from a parent in a certain region of the genome, the pair is said to share the parent's alleles in that region IBD. Since offspring receive their alleles from two parents, a sib-pair can share 0, 1. or 2 alleles IBD. The IBD status of a pair is usually estimated for a number of markers with (approximately) known location along the genome and is then used as a measure of genetic similarity. The IBD status at a marker is informative for the IBD status at any other locus (eg a disease or trait locus) along the chromosome as long as the population recombination fraction between the marker and the locus is less than 0.5. In that case, the IBD status at the marker and the locus are correlated in the population, and hence similarity at the marker is informative for similarity at the locus. IBD status is not always unambiguously known and has to be estimated using the specific allele pattern across chromosomes of two or more siblings and parents. The estimate of the proportion of alleles shared IBD is referred to as π̂, and is obtained as: , where π̂ijk is the estimated proportion of alleles shared IBD between sib j and k for the ith family, and p (IBD = 0) i j k , p (IBD = 1) i j k and p (IBD = 2) i j k are the probabilities that sib j and k share 0, 1, or 2 alleles, respectively, conditional on the marker information. The probabilities of sharing 0, 1, or 2 alleles IBD at every 7.544 cM (Haldane map) over the genome were estimated with the program Merlin28 (autosomal chromosomes) or Merlin-in-X (MINX28) for the X-chromosome.

Genetic model fitting

Genotypic data as well as adult height at 20 years or older were available for 477 offspring from 174 families, forming 513 sib-pairs. Although this does not provide information for linkage, phenotypic data from participants for whom no genotypic data were available were simultaneously analysed to obtain accurate estimates of the effects of covariates and background genetic influences. The total (typed+untyped) sample consisted of 8087 offspring from 3617 families, with an average of 2.2 offspring per family. As an additional check analyses were run with and without the untyped participants.

We included effects of sex and birth cohort on body height. The latter was included to take into account the secular increase in body height in the Dutch population; adult height increases with more than 1 cm per decade.31

In a previous study, it was demonstrated that variation in body height in the Dutch population consists of additive genetic variation (89% in males and 90% in females) and nonshared environmental variation.1 Variation in body height was decomposed into variation due to a QTL (σq2), due to additive influences (σa2), and due to nonshared environmental influences (σe2), using structural equation modelling as implemented in Mx.32 The X-chromosome was analysed using option - -vc in MINX.28 Estimates of the variance component associated with a putative QTL were obtained by using the π̂ approach, in which the covariance due to the marker or trait locus for a sib-pair is modelled as a function of the estimated proportion of alleles shared IBD. As stated earlier, height data from genotyped and untyped individuals were simultaneously modelled to allow more accurate estimation of the background genetic variance, the effect of sex on body height, and the cohort effect. For the untyped pairs, σa2 consisted of (σa2+σq2). The general variance–covariance matrix for pair j, k of the ith family (Ωijk) is given by

Here σa2, σq2, and σe2 denote the background additive genetic, QTL, and environmental variances, with ρ=1 for MZ twins and ρ = ½ for DZ twins. Significance of genetic variation due to the QTL was evaluated by the likelihood ratio test, from which the LOD score can be calculated by dividing the test statistic χ2 by 2 ln 10 (∼4.6).33

Results

Table 2 shows the descriptives for the total sample of offspring with adult height (N=8087) and the genotyped sample of offspring with adult height (N=477). Body height was normally distributed and comparable across the total sample and the genotyped sample (Table 2). This result, taken together with low correlations between body height and the anxious depression factor scores (−0.02 and −0.03), indicates that the genotyped subsample can be considered to be a random selection of the total sample for body height. Males were on average 13.18 cm taller than females (95% confidence interval (CI): 12.91–13.45, P<0.001). The cohort effect indicated that body height in the Dutch population shows an annual increase of 0.13 cm (95% CI: 0.12–0.14, P<0.001).

Table 2 Number of participants (N), mean age at 1-1-2003 (SD) and mean height in cm (SD) for the total population and the genotyped subsample

Figure 1 shows the results of the whole-genome scan. The highest peak (LOD score of 2.32) was found on chromosome 6 at approximately 84.38 Kosambi cM between markers D6S1053 and D6S1031. Six other peaks with LOD scores ranging from 1.53 to 2.04 were found on chromosomes 1, 5, 8, 10, and 18. These results are outlined in Table 3 and shown in Figure 2.

Figure 1
figure 1

Whole-genome scan for height.

Table 3 Linkage results of the sib-pair based linkage for height
Figure 2
figure 2

Linkage results for chromosomes showing LOD scores above 1.5 for height.

Discussion

As a highly heritable, complex trait height presents a prime candidate for identifying the responsible genes by means of linkage analysis. The current article presented the results of a linkage analysis for adult height using data from Dutch twins and siblings. The highest LOD score in our study was obtained for a region on chromosome 6, near the markers D6S1053 and D6S1031 (LOD=2.32), while LOD scores of at least 1.53 were shown for regions on chromosomes 1, 5, 8, 10, and 18. It is important that results are replicated in other samples before drawing conclusions about the significance of a chromosomal region. In the current article, we reviewed the results of previous studies and indicated some chromosomal regions for which there was consistent evidence for linkage. Our result for chromosome 6 was a replication of the result reported by Wu et al19 for the GENOA European American sample, where a LOD score of 2.66 was found for the same region. Two other studies, including another Dutch population, also found evidence for linkage on chromosome 6, although for another region than in the present study.14, 20 However, estimates for QTL locations obtained for complex traits through linkage analyses may be many centimorgans from the true locus. Roberts et al34 used simulations to show that with a sample size of 200 families CIs around the peak location may cover 30–100 cM. With a sample size of 800 families, the 95% CIs around the peak location may still cover 15–35 cM. Even when specific QTLs on a chromosome may not be replicated, repeated linkage for nearby areas on a chromosome may indicate a significance of that region for the trait of interest. In this case, our current results and that of previous studies seem to warrant more detailed study of the regions identified on chromosome 6. This is further supported by the fact that the oestrogen receptor alpha, which has been associated with height,10 is located within the same region. A replication was also shown for our peak at 98.3 cM on chromosome 18; Hirschhorn et al14 also reported increased LOD scores for the region of 106.81 and 115.89 at chromosome 18 in the Botnia and the Finland population, while Mukhopadhyay et al18 reported an increased LOD score at 99.04 cM.

Specific population characteristics may decrease the generalizability of results to other populations. A gene–environment interaction may increase the effect of a particular gene for height and may limit the replication of effects. However, most populations studied have been Western populations and lifestyle factors may therefore be fairly similar across the populations. A further complication may arise because different analysis packages may result in different LOD scores for the same regions. This was recently demonstrated by Mukhopadhyay et al,18 further emphasizing the need to include lower LOD scores when comparing study results. Combining the results of studies may prove critical to the identification of chromosomal regions important for height:35 no single study can be expected to localize all genes involved in height but a meta-analysis may indicate which chromosomal regions with small but consistent effects are worth further investigation. The small effects for a variety of genes would suggest height to be a polygenic trait. This would be in contrast to the results of a segregation analysis in which height was predicted by a putative major gene, explaining 37–53% of the variance.36 A more recent analysis20 also found that a model that included a major recessive gene and residual polygenic effect (mixed-recessive model) best fitted the height data. However, the difference between the mixed-recessive model and the polygenic model was very small and both models fitted the data equally well when spouse correlations were included in the model. Considering height as a polygenic trait, it may not be surprising that the different linkage analyses show a diverse pattern with many genes having small effects.

In conclusion, the present study replicated previous findings for possible QTLs on chromosomes 6 and 18. However, additional linkage studies are needed to draw more definite conclusions about the relevance of these regions for determining adult height.