Confirmation linkage study in support of the X chromosome harbouring a QTL underlying human height variation
  1. Y-Z Liu1,
  2. F-H Xu1,
  3. H Shen1,
  4. H Deng1,
  5. Y-J Liu1,
  6. L-J Zhao1,
  7. V Dvornyk1,
  8. T Conway1,
  9. J-L Li2,
  10. Q-Y Huang1,
  11. K M Davies1,
  12. R R Recker1,
  13. H-W Deng1
  1. 1Osteoporosis Research Center, Creighton University, Omaha, Nebraska, USA
  2. 2Center for Medical Informatics, School of Medicine, Yale University, New Haven, Connecticut, USA
  1. Correspondence to:
 Dr Hong-Wen Deng
 Osteoporosis Research Center, Creighton University, 601 N 30th St, Suite 6787, Omaha, Nebraska 68131, USA; deng@creighton.edu

Human height is a complex trait determined by both genetic and environmental factors. An initial whole genome study showed several genomic regions with suggestive linkage to height in a sample of 630 subjects from 53 human pedigrees. The present study was conducted in an extended sample of 1816 subjects from 79 pedigrees in an attempt to replicate and confirm the results of the previous whole genome scan. Xq24–25 on the X chromosome was confirmed as the region showing linkage to height. In the previous whole genome study, a microsatellite marker in this region, DXS1001, achieved a two point LOD score of 1.91 for linkage to height. In the present study on the 79 pedigrees, another marker in the same region, DXS8067, which is only 2.7 cM away from the former marker, attained a higher two point LOD score of 2.66. Moreover, the region’s significant linkage to height was sustained, with a two point LOD score of 1.00 achieved in a subset of the current sample (1058 subjects from 26 new pedigrees), which is independent of the original 630 subjects used in the whole genome study. Our results, together with the identification of several syndromes with short stature that are linked to Xq24–25, strongly suggest that this region may harbour a quantitative trait locus (QTL) underlying human height variation.

Human height is a typical complex trait determined by both genetic and environmental factors. Nutritional status and diseases are the most important environmental factors controlling human linear growth.1–4 However, genetic factors play a more dominant role in height determination. This is indicated by a significant familial aggregation of the trait, translating into a heritability of well above 50%.5–9

The search for genes underlying height variation has long been an endeavour in the field of genetic studies of complex traits. Several segregation analyses on populations of different races have suggested the existence of major genes for height, though with different inheritance models.10,11 Association studies identified a list of candidate genes showing a significant relation to height variation. These genes include vitamin D receptor (VDR) gene,12 D2 dopamine receptor gene,13 collagen Iα1 (COLIA1) gene,14 oestrogen receptor α (ER-α) gene,15 and luteinising hormone β gene.16 Stimulated by high throughput genotyping technology, genome-wide linkage scanning has recently become a major approach to the search for genes determining height variation. Whole genome studies using microsatellite markers have been undertaken by our group and others.11,17–21 Several genomic regions with significant or suggestive linkage to height were identified.

Our whole genome scan for QTLs underlying height variation was initially done on 630 subjects from 53 multiplex pedigrees of white European origin. We detected suggestive linkage to height at the regions of 5q31, 8p22, 17q25, Xp22, and Xq24–25.20 The current study was undertaken to confirm those results by linkage analysis with 16 markers concentrated on the above five regions, in a much more extended sample of 1816 subjects from 79 pedigrees. The average intermarker spacing was narrowed from ∼10 cM in the previous scan to ∼5 cM in the current study.

Key points

  • An initial whole genome study in a sample of 630 subjects from 53 human pedigrees showed several genomic regions that might have linkage to height.

  • The present study was conducted on an extended sample of 1816 subjects from 79 pedigrees in an attempt to replicate/confirm the results of the previous whole genome scan.

  • Xq24–25 on the X chromosome is confirmed as the region that may have linkage to height. Linkage to this region is suggested in both the original 53 pedigrees and the extended 79 pedigrees. Moreover, the linkage signal remained significant in a subset of the extended sample independent of the original 53 pedigrees.

  • These results, together with identification of several syndromes with short stature which are also in linkage to Xq24–25, strongly suggest that this region may harbour a quantitative trait locus underlying human height variation.

METHODS

Subjects

The study was approved by the Creighton University institutional review board. All the study subjects signed informed consent documents before entering the project. The study subjects came from an expanding database that is being created at the Osteoporosis Research Center (ORC) of Creighton University for studies searching for genes underlying the risk of developing osteoporosis and obesity. The sampling scheme and exclusion criteria have been detailed elsewhere.20 Briefly, patients with chronic diseases and conditions which could affect bone mass were excluded from the study. These diseases/conditions include chronic disorders involving vital organs (heart, lung, liver, kidney, brain), serious metabolic diseases (diabetes, hypo- and hyperparathyroidism, hyperthyroidism, and so on), other skeletal diseases (such as Paget’s disease, osteogenesis imperfecta, and rheumatoid arthritis), chronic use of drugs affecting bone metabolism (corticosteroids, anticonvulsant drugs), and conditions leading to malnutrition (chronic diarrhoea, chronic ulcerative colitis).

All the study subjects were of Europid origin. These subjects can be divided into three groups: 630 individuals from 53 pedigrees in our previous whole genome study were included in the current study; 128 individuals were newly recruited but still belonged to the original 53 pedigrees; and 1058 individuals were newly recruited and made up 26 new pedigrees. The pedigrees varied in size from four to 416 individuals (mean (SD), 31.9 (48.9)). The distribution of pedigree size is summarised in table 1.

Table 1

Distribution of pedigree size

Recruitment of the sample was initially intended for osteoporosis study and was therefore based on bone mineral density (BMD) values in the probands. Among the original 53 pedigrees for our whole genome study, 50 were recruited through probands having BMD z scores of ⩽−1.28 at the hip or spine and three were recruited without regard to BMD. Among the 26 new pedigrees, 25 were recruited through probands having BMD z scores of ⩾+1.28 at the hip or spine, and one was recruited without regard to BMD. Recruitment through probands having extreme BMD values—that is, z scores of ⩽−1.28 or ⩾+1.28—may help enhance the statistical power for a linkage study of BMD related traits.22,23 In our sample, the correlations of sex and age adjusted height values, with thus adjusted spine BMD and hip BMD values, are 0.17 (p<0.01) and 0.18 (p<0.01), respectively. Therefore our sampling scheme based on BMD values may also have positive effects on the power of linkage study of height.
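To make the ±1.28 cut-offs concrete, the short sketch below computes the tail area they correspond to. It assumes that BMD z scores follow a standard normal distribution in the reference population, which is an assumption of this illustration rather than something stated above; under it, each cut-off selects roughly the most extreme tenth of the distribution.

```python
from statistics import NormalDist

# Proband selection cut-offs quoted in the text (BMD z scores).
LOW_CUTOFF = -1.28
HIGH_CUTOFF = 1.28

# Assumption: z scores are standard normal in the reference population.
std_normal = NormalDist(mu=0.0, sigma=1.0)

# Fraction of the reference population at or below the low cut-off.
lower_tail = std_normal.cdf(LOW_CUTOFF)
# Fraction of the reference population at or above the high cut-off.
upper_tail = 1.0 - std_normal.cdf(HIGH_CUTOFF)

print(f"P(z <= {LOW_CUTOFF}) = {lower_tail:.4f}")
print(f"P(z >= {HIGH_CUTOFF}) = {upper_tail:.4f}")
```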

Genotyping

For each subject, blood (20 ml) was drawn into EDTA-containing tubes by certified phlebotomists and stored chilled (at ∼4°C) until DNA extraction, which was normally completed within the next five calendar days. DNA was extracted using a Puregene DNA isolation kit (Gentra Systems Inc, catalogue number D-5000, Minneapolis, Minnesota, USA) following the procedures detailed therein. DNA was genotyped using fluorescently labelled markers as before.24 The 16 dinucleotide markers we genotyped are commercially available through Perkin Elmer Applied Biosystems (ABI PRISM linkage mapping sets, version 2, Norwalk, Connecticut, USA). The polymerase chain reaction (PCR) was done on PE 9700 thermocyclers (GeneAmp® PCR System 9700, Applied Biosystems, Foster City, California, USA). PCR cycling conditions followed those suggested in the ABI PRISM linkage mapping sets, version 2.5. Genotyping was done using Applied Biosystems automated DNA sequencing systems (model 3700; Perkin Elmer-ABI, Foster City, California, USA) running the GENESCAN™ version 4.0 and GENOTYPER™ version 4.0 software for allele identification and sizing. The 3700 system used in this study is a genotyping platform upgraded from the ABI 377, which we used for the previous whole genome study.

As some of the markers for the current confirmation study were also genotyped in the previous whole genome study, we retyped those markers to avoid a systematic shift in genotyping results and consequent difficulty in binning adjustment.

A genetic database management system (GenoDB)25 was employed to manage the phenotype and genotype data for linkage analyses. GenoDB was also employed for allele binning (including setting up allele binning criteria and converting allele sizes to distinct allele numbers), data quality control, and data formatting for PedCheck26 and linkage analyses using SOLAR. PedCheck (available at http://watson.hgen.pitt.edu/register/-soft_doc.html) was employed for checking whether the data conformed to a Mendelian inheritance pattern at all the marker loci, and for confirming the alleged relations of family members within pedigrees. A genotyping error rate of 0.3% was achieved after three rounds of data checking with PedCheck and three rounds of PCR amplification of the missing data and of data that could not pass the PedCheck. All 16 markers were successfully genotyped. These markers have an average population heterozygosity of ∼0.79 and are spaced on average ∼5 cM apart.
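The idea behind a Mendelian consistency check can be sketched for the simplest case, a single parent-parent-child trio at one autosomal marker. This is an illustration only and is not PedCheck's algorithm, which works on whole pedigrees with missing genotypes. The function name and the genotypes are hypothetical, and X-linked markers in males (who carry a single allele) would need a different rule.

```python
from typing import Tuple

# Two binned allele numbers observed at one autosomal marker.
Genotype = Tuple[int, int]


def is_mendelian_consistent(child: Genotype, father: Genotype, mother: Genotype) -> bool:
    """Return True if the child could carry one allele from each parent."""
    a, b = child
    # Either allele a came from the father and b from the mother, or the reverse.
    return (a in father and b in mother) or (b in father and a in mother)


# Hypothetical binned genotypes, for illustration only.
compatible = is_mendelian_consistent((1, 2), (1, 3), (2, 4))
incompatible = is_mendelian_consistent((1, 1), (1, 2), (2, 3))
print("trio 1 consistent:", compatible)
print("trio 2 consistent:", incompatible)
```

A trio flagged as inconsistent would, in a workflow like the one described above, be a candidate for regenotyping or for review of the stated family relationships.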

Statistical analyses

The variance component linkage analysis27–29 for quantitative traits was used. The program employed was SOLAR (Sequential Oligogenic Linkage Analysis Routines),29 which is available online (http://www.sfbr.org/sfbr/public/software/solar/solar.html).

In the linkage analysis, height was adjusted for age and sex as covariates, as both generally have a significant effect on height variation in our study population.30 Analyses were also done without adjusting for one or both of these covariates. Adjustment for significant covariates in genetic analyses can generally increase the signal to noise ratio in linkage detection by decreasing the proportion of the residual phenotypic variation attributable to random environmental factors.31 This may thus improve statistical power in our linkage analyses. The height data were tested by graphical methods32 and found not to deviate from a normal distribution.
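For orientation, the variance component model with age and sex covariates can be written in the form commonly used in the variance component linkage literature. This is a sketch of the standard autosomal formulation, not a specification taken from this study. The symbols are introduced here for illustration, and X-linked markers require modified sharing coefficients.

```latex
\begin{align}
y_i &= \mu + \beta_{\mathrm{age}}\,\mathrm{age}_i + \beta_{\mathrm{sex}}\,\mathrm{sex}_i + q_i + g_i + e_i, \\
\operatorname{Cov}(y_i, y_j) &= \hat{\pi}_{ij}\,\sigma_q^2 + 2\phi_{ij}\,\sigma_g^2 + \delta_{ij}\,\sigma_e^2, \\
\mathrm{LOD} &= \log_{10}\frac{L(\hat{\sigma}_q^2)}{L(\sigma_q^2 = 0)}.
\end{align}
```

Here $y_i$ is the height of subject $i$, $q_i$ is the effect of a QTL linked to the tested marker, $g_i$ is the residual polygenic effect, and $e_i$ is the random environmental effect. $\hat{\pi}_{ij}$ is the estimated proportion of alleles that subjects $i$ and $j$ share identical by descent at the marker, $\phi_{ij}$ is their kinship coefficient, and $\delta_{ij}$ equals 1 when $i = j$ and 0 otherwise. The LOD score compares the likelihood of the model with a freely estimated QTL variance against the model in which that variance is fixed at zero.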

We also tested genetic heterogeneity for the markers or regions achieving a LOD score of >2.0 in the linkage analysis of the 79 pedigrees. The program we used for this purpose is HOMO. It was developed by Dr Harald Göring and is available online at hgoringdarwin.sfbr.org. HOMO has been integrated into SOLAR version 2.0. It performs a test of heterogeneity using an admixture model33 and is, in principle, quite similar to Jürg Ott’s program HOMOG.34

RESULTS

The distribution of pedigree size is summarised in table 1. The basic characteristics of study subjects in 79 pedigrees are summarised in table 2. The relationships contained in the original pedigrees for the whole genome study, the current extended pedigrees, and the pedigrees independent of the original ones are listed in table 3. It can be seen that all the relationships informative for linkage analysis have drastically increased in the extended pedigrees. For example, the number of the sibling, grandparent–grandchild, and avuncular pairs has been more than tripled and that for cousin pairs has increased 10-fold. This sharp increase in relationship is attributed to the incorporation of several large multigenerational pedigrees to our extended sample. The biggest one contains 416 people.

Table 2

Basic characteristics of the study subjects

Table 3

Relationships contained in pedigrees

Table 4 compares the results of the present study with our initial whole genome scan. As the SOLAR software for linkage analysis does not handle multipoint linkage analysis of the X chromosome, the two regions Xp22 and Xq24–25 have only two point linkage results available. As presented in table 4, Xq24–25 is the region consistently showing linkage to height in both the initial 630 subjects and the present extended sample of 1816 subjects. While in the previous study Xq24–25 achieved a two point LOD score of 1.91 at the marker DXS1001, a higher one of 2.66 is attained in the present extension study at the marker DXS8067, which is only 2.7 cM away from DXS1001 (fig 1). In addition, another region on chromosome 5 (5q31) also achieved a marginally significant two point LOD score of 0.96 and a multipoint LOD score of 0.61 in the extended sample.

Table 4

Comparison of linkage results between the previous whole genome study and the present confirmation study

Figure 1

X chromosome two point linkage analysis results for the previous whole genome study and the present confirmation study. The X axis lists genotyped microsatellite markers arranged in the order as on the X chromosome. The Y axis is the scale for two point LOD score values. Open squares represent the LOD scores achieved in the previous whole genome study in 53 pedigrees. Filled diamonds represent the LOD scores achieved in the present confirmation study in 79 pedigrees. The grey lines enclose the region Xq24–25, where confirmative significant linkage to height was detected. Two markers, DXS8009 and DXS8067, are genotyped only in the present study. Three markers—DXS1060, DXS8051, and DXS1001—are genotyped in both the previous whole genome study and the present confirmation study.

We also conducted linkage analysis in the subset of the sample extended from, but still limited to, the original 53 pedigrees. This subset of the sample is made up of the original 630 subjects for the whole genome study and 128 newly recruited subjects who come from the same pedigrees as the 630 subjects. Again, Xq24–25 was confirmed as the only region of significant linkage to height, with a two point LOD score of 1.46 (table 4).

To further confirm the significance for linkage of the two regions Xq24–25 and 5q31, we computed two point or multipoint LOD scores in the 26 newly recruited pedigrees, composed of 1058 genotyped subjects. This subset of the sample makes up the majority of our total sample size. More importantly, it is a purely independent sample in relation to the 630 subjects on whom our initial whole genome linkage study was conducted. For the Xq24–25 region, a significant two point LOD score of 1.00 was achieved, equivalent to a nominal p value of 0.0197. For the 5q31 region, a marginally significant two point LOD score of 0.72 (p = 0.048) and a multipoint LOD score of 0.60 (p = 0.07) were attained. The above results are summarised in table 5.
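A common textbook way to relate a LOD score to a nominal p value is sketched below. It assumes that the likelihood ratio statistic, 2 ln(10) × LOD, asymptotically follows a 50:50 mixture of a point mass at zero and a chi-square distribution with one degree of freedom. This is an assumption of the sketch and not necessarily how the p values in this study were obtained, so its output need not coincide with the reported figures. For a LOD score of 1.00 it gives a value of roughly 0.016, of the same order as the reported 0.0197.

```python
import math


def lod_to_nominal_p(lod: float) -> float:
    """One-sided asymptotic p value for a non-negative LOD score.

    Assumes 2*ln(10)*LOD follows a 50:50 mixture of a point mass at
    zero and a chi-square distribution with 1 degree of freedom.
    """
    if lod < 0:
        raise ValueError("LOD score must be non-negative")
    # P(chi2_1 > x) = erfc(sqrt(x / 2)), with x = 2 * ln(10) * lod,
    # and the mixture halves that tail probability.
    return 0.5 * math.erfc(math.sqrt(lod * math.log(10)))


# LOD scores quoted in the text for the independent subset.
for lod in (1.00, 0.72, 0.60):
    print(f"LOD {lod:.2f} -> asymptotic nominal p = {lod_to_nominal_p(lod):.4f}")
```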

Table 5

Linkage analysis results for the regions Xq25 and 5q31 in original sample, extended sample, and independent sample

A heterogeneity test for the region Xq24–25 in the 79 pedigrees did not detect significant genetic heterogeneity of height, with an hLOD value of 2.67. However, the result should be interpreted with caution owing to the test’s intrinsic limitations.35

Figure 1 plots X chromosome two point LOD scores for both the previous whole genome study and the present confirmation study. The figure includes 20 microsatellite markers genotyped in the previous whole genome study in the 53 pedigrees (with LOD scores denoted by open squares) and the present confirmation study in the extended 79 pedigrees (with LOD scores denoted by filled diamonds). For the initial whole genome study, the linkage signal peaks at the two regions, the region near the pter (Xp22) and the region Xq24–25. Five markers were retyped in or near these two regions in the present confirmation study. The marker, DXS8067, of the region Xq24–25 achieved a LOD score of 2.66, higher than the one (1.91) attained by another marker, DXS1001, of the same region in the initial whole genome study.

DISCUSSION

The genome-wide linkage scan has now become a major device for deciphering the genetic basis of complex traits. Theoretically, it can obviate the need to look into complicated intermediate biochemical steps and directly indicate the genomic regions that may be relevant to the development of a quantitative trait or the pathogenesis of a genetic disease.

Proven to be effective in the localisation of genes underlying monogenic characters, the whole genome linkage approach nevertheless faces challenges in the genetic dissection of complex traits such as height, blood pressure, and blood glucose levels. This is largely because of the complex biological basis of these characters: they are determined not only by multiple genetic and environmental factors but also by their interactions, which lead to both genetic heterogeneity and phenocopies. Consequently, significant linkage results are much easier to claim than to replicate. To ensure the relevance of the results of whole genome linkage studies, stringent criteria have been proposed.36 Generally, a multipoint LOD score above 3 has been accepted as a rule of thumb for a region to be statistically significant for linkage in a whole genome scan. However, no consensus has been reached on the significance levels to apply to a linkage study seeking confirmation of an initial whole genome scan, as in the case of this study. The strict standard accounting for multiple testing in a whole genome study should certainly not be applied to a confirmation study using only a limited number of markers. For the situation where five markers are used to replicate a region, it has been suggested that a nominal p value of 0.01 should be required for confirmation at a 5% significance level.36
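The figure of 0.01 for five markers at a 5% level is consistent with a Bonferroni-style correction, although that reading is ours rather than something the text states. The sketch below shows the arithmetic and, under the usual asymptotic assumption of a 50:50 mixture of a point mass at zero and a one degree of freedom chi-square, the LOD score that such a nominal p value would correspond to. The function names are hypothetical.

```python
from math import log
from statistics import NormalDist


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test nominal p value giving an overall level of alpha."""
    return alpha / n_tests


def nominal_p_to_lod(p: float) -> float:
    """LOD score matching a one-sided nominal p value (0 < p <= 0.5).

    Assumes the asymptotic 50:50 mixture distribution, under which the
    p value equals the upper tail of a standard normal at
    z = sqrt(2 * ln(10) * LOD).
    """
    if not 0.0 < p <= 0.5:
        raise ValueError("p must lie in (0, 0.5]")
    z = NormalDist().inv_cdf(1.0 - p)
    return z * z / (2.0 * log(10))


# Five markers tested in one region, overall 5% significance level.
threshold = bonferroni_threshold(0.05, 5)
lod_needed = nominal_p_to_lod(threshold)
print(f"per-marker nominal p threshold: {threshold:.3f}")
print(f"corresponding asymptotic LOD:   {lod_needed:.2f}")
```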

Our primary whole genome linkage scan was undertaken on 630 subjects from 53 pedigrees. In that study, 380 microsatellite markers were used, giving us an average marker density of ∼10 cM/marker. Suggestive linkage results were achieved at 5q31, 8p22, 17q25, Xq24–25, and Xp22. To confirm these results, more microsatellite markers (with a density of ∼5 cM/marker) within or around these regions were chosen and tested for linkage to height in our extended sample. Xq24–25 turns out to be the only region consistently showing linkage to height (tables 4 and 5). Although further investigation is needed to validate this result, its relevance is supported by three features.

First, the significance level of linkage achieved is remarkable for a confirmation study. As suggestive linkage had already been detected at the whole genome level, the current aim of confirming the linkage signal represents a common hypothesis-driven test, where nominal rather than stringent genome-wide significance levels should be used. In the light of that, the LOD score of 2.66 achieved in our extended sample can be regarded as highly significant. It is equivalent to a nominal p value of 0.00014 and even approaches the significance levels for a genome-wide linkage scan.

Second, the characteristics of our extended sample are highly suitable for a study trying to replicate a previous linkage. The majority of our extended sample is composed of newly recruited pedigrees, which themselves almost double the original sample size, making the current sample quite independent of the original one. Moreover, the newly recruited subjects are mainly from large multigeneration pedigrees. Pedigrees with at least 85 family members make up more than 60% of the newly recruited sample, the largest pedigree having 416 individuals with genotype information. These huge pedigrees provide a large number of relationships informative for linkage analysis and hence increase the statistical power of the current study substantially as compared with the original whole genome scan.

Third and most importantly, in the 1058 newly recruited subjects from the 26 new pedigrees, Xq24–25 sustained its significance for linkage to height, with a two point LOD score of 1.00, equivalent to a nominal significance level of 0.0197 (table 5). These 1058 subjects represent a purely independent subset of the total sample in relation to the 630 subjects used in our initial whole genome study. Confirmation of linkage in such an independent group of subjects constitutes an unequivocal replication for the region’s importance to height identified in the genome-wide scan.36

Recently, several whole genome studies have been done on height. The time is ripe to summarise the data generated by these studies for meta-analysis purposes. So far linkage signals for height have been reported for chromosomes 3, 5, 6, 7, 9, 12, 20, and X.11,17–21 Cross study comparison shows that two regions, 7q31–36 and 6q25, were each detected by more than one study. Linkage to height of the former region was identified both in 614 subjects from 247 Finnish families19 and in 746 subjects from 179 Swedish families.18 Linkage to height of the latter region was detected both in 408 subjects from 58 Finnish families18 and in 1184 subjects from 200 Dutch pedigrees.11 The replication achieved for these two regions suggests a relatively homogeneous genetic basis for height determination in northern Europeans. On the other hand, the importance of these two regions for height may be limited to the northern European population, an isolated population with a history marked by population bottlenecks and random genetic drift.

In contrast, in a relatively less homogeneous population, such as the white US subjects in our study, identification of genetic factors underlying height variation should be much more elusive and challenging, as genetic heterogeneity is to be expected. However, the answers to this search have more universal implications for human biology. To ensure statistical power, gene mapping in less homogeneous (as compared with isolated) populations requires a much larger sample size, which is the prime advantage of this study over others in the field. Among all the linkage studies on height, this study has not only the largest number of individuals but also the most complex pedigrees and the most abundant informative relationships, which give it the highest statistical power among these studies. This is particularly meaningful for a replication study in two respects. First, only with high statistical power can it be possible, as well as theoretically justified, for a study to replicate linkage findings caused by QTLs with moderate effects,37 which are expected to be the genetic basis of height. Second, only with high statistical power can one be confident enough to reject previous linkage findings that could not be replicated in a confirmation study.36

The X chromosome plays an important role in human height. This is evidenced by the large number of X linked diseases involving abnormal linear growth. In particular, for the genomic region Xq24–25, linkage to several syndromes with short stature has recently been discovered. These syndromes include the MRSS syndrome (X linked mental retardation with short stature),38 the MRGH syndrome (X linked mental retardation with isolated growth hormone deficiency),39,40 panhypopituitarism,41 and a syndrome reported by Cabezas et al (X linked mental retardation with short stature, small testes, muscle wasting, and tremor).42 These findings, together with the present study confirming the linkage of Xq24–25 to normal human height variation, shed light on the region for its potential involvement in human linear growth and support the existence of a QTL in this region for height determination. Given such a critical role of the X chromosome in human height development, surprisingly little effort has been made to map the X chromosome in whole genome studies of human height. To date, our initial whole genome study20 and the present confirmation study are the only ones exploring the X chromosome on a genome-wide scale for its importance in height determination.

Some genomic regions showing suggestive linkage to height in our last whole genome scan could not be replicated in this study. A significant decrease in LOD scores is seen for 8p22, 17q25, and Xp22. Confirmation for the region 5q31 is tenuous, as only marginally significant results were achieved.

Given the relatively high statistical power of this study, it is unlikely that the disappearance of these regions’ linkage signal for height reflects random fluctuations. However, there are still certain factors that could lead to a drastic difference in linkage results between this study and the earlier whole genome scan. First, the allele frequencies may have changed in the present study for the markers genotyped in both the current and the previous whole genome study. As IBD (identity by descent) inference is based on allele frequencies, changes in the latter will inevitably affect, to some extent, the computed IBD values and other downstream parameters, including the multipoint and two point LOD scores. Second, genotyping errors may have substantial effects on linkage results. Despite the similar genotyping error rate (0.3%) for both the current study and the initial whole genome study, the distribution of the errors may vary between the two studies. Such errors can seriously deflate power and inflate recombination fraction estimates in two point linkage analysis.43–45 In multipoint linkage analysis, the effects of the errors can be further magnified as marker density increases,46 leading to increased potential for false exclusion of the true disease loci. Third, even if those unreplicated regions do harbour QTLs underlying height variation, it is still possible that those regions have gone undetected in a replication study owing to the expected minor effects of each QTL on height. Such a situation can be explained by the fact that, for a given sample size, a study has much greater power to detect any one of multiple genes underlying a complex trait than to repeat the finding of a particular one.47

A limitation of our study is the absence of multipoint linkage results for the X chromosome. This is because of the inability of the SOLAR software to handle multipoint analysis for the X chromosome. Other software that has this function, such as GENEHUNTER,48 unfortunately cannot handle the large multiplex pedigrees that make up most of our study sample. Breaking down these large pedigrees into smaller ones is an option, but this procedure may result in a considerable loss of statistical power, especially for a sample like that of the present study, with huge multigeneration pedigrees as the main component. Although multipoint linkage results are more accurate than two point results, our two point results on the X chromosome still provide a clear and consistent pattern for linkage of Xq24–25 to height. The selection of this region for the confirmation study was based on our linkage scan of the X chromosome using 18 microsatellite markers,20 giving us a marker density of ∼10 cM/marker. A clear peak of linkage signal was formed in the Xq24–25 region, with the marker DXS1001 having the highest LOD score (fig 1). Three markers, including DXS1001 and two flanking ones, were thus chosen to saturate this region in the present study. The marker DXS8067, which is just 2.7 cM away from DXS1001, achieved a much higher LOD score of 2.66 (fig 1). More notably, in the purely independent subset of the sample (that is, the 1058 subjects from 26 pedigrees) the significant linkage of the marker DXS8067 to height is maintained. Given such a remarkable increase in LOD score in the extended sample, and a consistently significant linkage in the independent subset of the sample, the chance of a type I error for the linkage of Xq24–25 is remote. Further investigation of this region’s importance is justified.

This study represents the first follow up study in the field of genome-wide scan for height. It is also the first one bringing to light the X chromosome as the potential genetic basis of normal human height variation. Based on the result, subsequent fine mapping studies will be pursued. Candidate genes inside the Xq24–25 region will be subject to association studies and transmission disequilibrium test studies49 in pursuit of functional mutations. Our efforts will contribute to the understanding of genetics of human growth and the genetic architecture of complex traits as a whole.

Acknowledgments

The investigators were partially supported by grants from Health Future Foundation, NIH grants (K01 AR02170–01, R01 AR45349–01, R01 GM60402–01A1, P01 DC01813–07), grants from State of Nebraska cancer and smoking related disease research program and the State of Nebraska tobacco settlement fund, US Department of Energy grant DE-FG03–00ER63000/A00, grants from Creighton University, grants from National Science Foundation of China, a grant from Huo Ying Dong Education Foundation, and grants from HuNan Normal University and the Ministry of Education of China.

REFERENCES
