

A survey of haplotype variants at several disease candidate genes: the importance of rare variants for complex diseases
  1. P-Y Liu1,
  2. Y-Y Zhang1,
  3. Y Lu1,
  4. J-R Long1,
  5. H Shen1,2,
  6. Lan-J Zhao1,2,
  7. F-H Xu1,2,
  8. P Xiao1,2,
  9. D-H Xiong1,2,
  10. Y-J Liu1,2,
  11. R R Recker1,
  12. H-W Deng1,2,3
  1. 1Osteoporosis Research Center, Creighton University, Omaha, NE 68131, USA
  2. 2Department of Biomedical Sciences, Creighton University, Omaha, NE 68131, USA
  3. 3Laboratory of Molecular and Statistical Genetics, College of Life Sciences, Hunan Normal University, Changsha, Hunan 410081, China
  1. Correspondence to:
 Dr H-W Deng
 Osteoporosis Research Center, Creighton University Medical Center, 601 N. 30th St., Suite 6787, Omaha, NE 68131, USA;


Background: The haplotype based association method offers a powerful approach to complex disease gene mapping. In this method, a few common haplotypes that account for the vast majority of chromosomes in the populations are usually examined for association with disease phenotypes. This brings us to a critical question of whether rare haplotypes play an important role in influencing disease susceptibility and thus should not be ignored in the design and execution of association studies.

Methods: To address this question we surveyed, in a large sample of 1873 white subjects, six candidate genes for osteoporosis (a common late onset bone disorder), which had 29 SNPs, an average marker density of 13 kb, and covered a total of 377 kb of the DNA sequence.

Results: Our empirical data demonstrated that two rare haplotypes of the parathyroid hormone (PTH)/PTH related peptide receptor type 1 and vitamin D receptor genes (PTHR1 and VDR) with frequencies of 1.1% and 2.9%, respectively, had significant effects on osteoporosis phenotypes (p = 4.2 × 10−6 and p = 1.6 × 10−4, respectively). Large phenotypic differences (4.0∼5.0%) were observed between carriers of these rare haplotypes and non-carriers. Carriers of the two rare haplotypes showed quantitatively continuous variation in the population and were derived from a wide spectrum rather than from one extreme tail of the population phenotype distribution.

Conclusions: These findings indicate that rare haplotypes/variants are important for disease susceptibility and cannot be ignored in genetics studies of complex diseases. The study has profound implications for association studies and applications of the HapMap project.

  • BMD, bone mineral density
  • CD-CV, common diseases common variants
  • LD, linkage disequilibrium
  • PTH, parathyroid hormone
  • PTHR1, parathyroid hormone receptor 1
  • VDR, vitamin D receptor
  • association
  • complex diseases
  • haplotype
  • rare variants

Haplotype analyses have become increasingly important in genetic studies of human diseases. When multiple markers, often in linkage disequilibrium (LD), in a chromosomal region are studied to assess the association between this region and the traits of interest, a statistical analysis based on haplotypes may often be more efficient than separate analyses of individual markers. This has been demonstrated through both empirical1,2 and simulation3–5 studies. Firstly, haplotype analyses take into account a number of tightly linked markers, which together are much more informative than individual markers.6,7 Secondly, haplotype analyses can identify unique chromosomal segments likely to harbour disease predisposing genes. The phenotypic effect of several mutations at different sites within a gene can depend on whether the mutations occur on the same chromosome (in cis, as a haplotype) or on opposite homologous chromosomes (in trans).1,8,9 These findings emphasise an important aspect of examining candidate genes by SNP haplotyping.

The human genome has been portrayed as a series of high LD regions with limited haplotype diversity.10,11 Several common haplotypes that can be captured by a few tag SNPs usually account for a majority of genetic variation in the genomic regions or candidate genes.10–12 Such haplotype patterns observed in empirical studies have triggered the development of the International HapMap Project, which aims to determine the common patterns of DNA sequence variation in the human genome.13 Focusing on these common haplotypes greatly facilitates LD based mapping analyses.12 By comparing the frequency of haplotype variants in unrelated cases and controls, and/or the distribution of disease phenotypes among haplotype variants in cohort samples of moderate size, genetic association studies can identify specific disease predisposing common haplotypes. However, a critical question is whether rare haplotypes play an important role in influencing disease susceptibility and thus should not be ignored in the design and execution of association studies.

Given the heightened interest in association studies using haplotypes, we aimed to assess the potential role of rare haplotypes in influencing disease susceptibility. In the present study, we surveyed six candidate genes for osteoporosis (a common late onset bone disorder) in a large sample of 1873 white subjects. The genes were covered by 29 SNPs, which had an average marker density of 13 kb and spanned a total of 377 kb of DNA sequence. We systematically investigated haplotype variations at these candidate genes, haplotype effects on disease phenotypes, and the phenotypic distribution of haplotype carriers in the sample.


Study subjects

The subjects came from a study, carried out at the Osteoporosis Research Center of Creighton University Medical Center, that searches for genes underlying the risk of osteoporosis. We recruited 405 nuclear families totaling 1873 subjects, including 740 parents, 744 daughters, and 389 sons, with a mean (SD) family size of 4.62 (1.78). All the subjects were white, of European origin. Only healthy people were included; the exclusion criteria have been detailed earlier.14 Briefly, people with chronic diseases and conditions that might potentially affect bone mass were excluded from the study. These diseases/conditions included chronic disorders involving vital organs (heart, lung, liver, kidney, and brain), serious metabolic diseases (including diabetes, hypoparathyroidism and hyperparathyroidism, hyperthyroidism), other skeletal diseases (including Paget’s disease, osteogenesis imperfecta, and rheumatoid arthritis), chronic use of drugs affecting bone metabolism (corticosteroid therapy and anti-convulsant drugs), and malnutrition conditions (including chronic diarrhoea and chronic ulcerative colitis). For each study subject, we obtained information on age, sex, medical, family and reproductive history, physical activity, alcohol use, and dietary and smoking habits. The study was approved by the institutional review board of Creighton University, and informed consent documents were obtained from each subject.

Candidate genes

The candidate genes chosen for the study were apolipoprotein E (APOE), type I collagen α1 (COL1A1), oestrogen receptor-α (ER-α), parathyroid hormone (PTH)/PTH-related peptide receptor type 1 (PTHR1), transforming growth factor-β1 (TGF-β1) and vitamin D receptor (VDR). They were selected for their functional roles in bone metabolism and/or their prominence in genetic studies of osteoporosis.15 A total of 29 SNPs for these candidate genes were identified from the dbSNP database. The SNPs were selected on the basis of: (a) functional relevance and importance (missense mutation, frameshift mutation etc.), (b) level of heterozygosity, (c) position in or around the gene, and (d) their use in previous genetic epidemiology studies. Detailed information about the SNPs analyzed in this study is presented in table 1. These SNPs spanned a total of about 377 kb; the average physical distance between neighbouring markers was 13.0 kb. However, the pairwise LD (D′) was highly variable among these candidate genes, ranging from 0.02 to 1.0 with an average of 0.48.16

Table 1

 Information about the 29 SNPs in the six candidate genes for osteoporosis

SNP genotyping

Genomic DNA was extracted from whole blood using a commercial isolation kit (Gentra Systems, Minneapolis, MN, USA) following the procedure detailed in the kit. The genotyping procedure for all SNPs was similar, involving PCR and Invader assay (Third Wave Technology, Madison, WI, USA). PCR was performed in 10 μl reaction volume with 35 ng genomic DNA, 0.2 mmol/l each of dCTP, dATP, dGTP and dTTP, 1× PCR buffer and 1.5 mmol/l MgCl2, 0.4 μmol/l each of the primers, and 0.35 U of Taq polymerase (ABI, Applied Biosystems, Foster City, CA, USA). The sequences of the PCR primers for all SNPs are presented in table 1. The following procedure was used on an ABI 9700 thermal cycler: 95°C for 5 minutes, 30 cycles of 94°C for 1 minute, 50°C for 1 minute, 72°C for 1 minute, and then 72°C for 5 minutes. After amplification, the product was diluted 1:20 in nuclease free water. The Invader reaction was performed in a 7.5 μl reaction volume, with 3.75 μl diluted PCR product, 1.5 μl probe mix, 1.75 μl Cleavase FRET mix, and 0.5 μl Cleavase enzyme/MgCl2 solution (Third Wave Technology). The reaction mix was overlaid with 15 μl mineral oil and denatured at 95°C for 5 minutes, and then incubated at 63°C for 20 minutes in an ABI 9700 thermal cycler. After incubation, the fluorescence intensity for both colours (FAM and Red dyes) was measured using Cytofluor 4000 (ABI). The data were then analysed with Invader Analyzer software (Third Wave Technology), and the genotype for every sample was identified according to the ratio of the fluorescence intensity of the two dyes. PedCheck software17 was used to verify Mendelian inheritance of the alleles within each family and the family relationships.


Bone mineral density (BMD) is the most important surrogate phenotype for osteoporosis, which is mainly characterised by low BMD (34). Femoral neck and total hip BMDs (g/cm2) were measured by a Hologic 2000+ or a 4500 dual energy x ray absorptiometry (DXA) scanner (Hologic Inc., Bedford, MA, USA). Both machines were calibrated daily, and the coefficient of variability (CV) values of the DXA measurements at the femoral neck and total hip were 1.87% and 1.0% on the Hologic 2000+, and 1.98% and 1.4% on the Hologic 4500. Of the subjects, 92% were measured on the Hologic 4500. Data obtained from different machines were transformed to a compatible measurement,18 which has been shown to be highly reliable and accurate.19 Members of the same nuclear family were measured on the same scanner. At the same visit of the BMD scan, weight was measured using a calibrated balance beam scale, and height was measured using a calibrated stadiometer.

Statistical analyses

Haplotype pairs carried by each individual were inferred in nuclear families using Genehunter (version 2.1). Subjects with ambiguous haplotypes were excluded from further analyses. Specifically, we excluded 87, 59, 49, 70, 47 and 46 such subjects from haplotype analyses at the APOE, COL1A1, ER-α, PTHR1, TGF-β1 and VDR genes, respectively. To avoid reporting results based on very few individuals (∼20), only those haplotypes with frequencies greater than 0.8% were analysed. In genetic analyses, the phenotypic values were tested against measured, potentially important covariates and adjusted for the significant ones (age, sex, weight and height). These adjustments, which take into account the correlation structures among subjects,21 were performed in the regression model described by George and Elston22 and implemented in SOLAR. The model residuals were calculated by subtracting the fitted values for covariate effects from the original phenotypic values and were used as phenotypic variables in haplotype analyses. This adjustment procedure is similar to a regular multiple regression model with BMD as the dependent variable and age, sex, weight, and height as independent variables, except that it also accounts for the kinship of subjects within pedigrees. The heritabilities of BMD phenotypes and their standard errors were also estimated in the above covariate analysis. The normality of the phenotype data was examined by the Kolmogorov-Smirnov test implemented in SPSS 10.0 (SPSS Inc., Chicago, IL, USA). Mean differences in the studied phenotypes were tested by comparing individuals carrying a specific haplotype with non-carriers, using two sample two sided t tests. The empirical p values of the t tests were obtained by permutation tests. During the permutation procedure, the original phenotypic and genotypic data were reshuffled 10^7 times. The t test statistics were then computed on each dataset generated by reshuffling. Using Bonferroni correction for multiple tests, we obtained an empirical threshold of p⩽5.4 × 10−4 for a single test, which achieves a global significance level of 0.05 for our study. The power for association studies using variance component models for TDT for sibship data24 was obtained with the Genetic Power Calculator.
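The permutation and Bonferroni steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a pooled variance Student t statistic, treats subjects as independent (family structure is ignored), and uses far fewer reshuffles than the study did. The function names and the example number of tests are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, variance


def t_statistic(a, b):
    """Two sample Student t statistic with pooled variance (an assumption)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1.0 / na + 1.0 / nb))


def permutation_p_value(carriers, noncarriers, n_perm=10000, seed=0):
    """Two sided empirical p value obtained by reshuffling group labels."""
    rng = random.Random(seed)
    observed = abs(t_statistic(carriers, noncarriers))
    pooled = list(carriers) + list(noncarriers)
    k = len(carriers)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(t_statistic(pooled[:k], pooled[k:])) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (hits + 1) / (n_perm + 1)


def bonferroni_threshold(alpha, n_tests):
    """Per test threshold that keeps the global error rate at alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical adjusted phenotype values, for illustration only.
    carriers = [1.00, 1.10, 1.20, 1.30, 1.25]
    noncarriers = [0.70, 0.80, 0.75, 0.90, 0.85, 0.95]
    print(permutation_p_value(carriers, noncarriers, n_perm=2000, seed=1))
    print(bonferroni_threshold(0.05, 100))  # 100 tests is a made-up count
```

Because the smallest attainable empirical p value is about 1/(n_perm + 1), a very large number of reshuffles is needed before a permutation p value can fall below a stringent corrected threshold.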

We also calculated the effective number of haplotypes and expected haplotype heterozygosity for each gene. The effective number of haplotypes, analogous to the effective number of alleles,26 was calculated as:

$$n_e = \frac{1}{\sum_i p_i^2}$$

where $p_i$ is the frequency of the $i$th haplotype and $n_e$ is the effective number of haplotypes. Expected haplotype heterozygosity was calculated as $1 - 1/n_e$.
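As a small worked sketch of these two quantities, the functions below compute the effective number of haplotypes as the reciprocal of the sum of squared haplotype frequencies, and the expected heterozygosity as one minus its reciprocal. The function names are hypothetical, and normalising the input so that raw counts are also accepted is an added convenience, not something the text specifies.

```python
def effective_number(freqs):
    """Effective number of haplotypes: 1 / sum of squared frequencies.

    `freqs` may hold frequencies or raw counts; values are normalised
    to sum to one first (an added convenience, not from the source).
    """
    total = sum(freqs)
    proportions = [f / total for f in freqs]
    return 1.0 / sum(p * p for p in proportions)


def expected_heterozygosity(freqs):
    """Expected haplotype heterozygosity, 1 - 1/n_e."""
    return 1.0 - 1.0 / effective_number(freqs)


if __name__ == "__main__":
    # Two equally frequent haplotypes behave like exactly two haplotypes.
    print(effective_number([0.5, 0.5]))
    # A skewed distribution has fewer effective haplotypes than observed ones.
    print(effective_number([0.7, 0.2, 0.05, 0.05]))
    print(expected_heterozygosity([0.7, 0.2, 0.05, 0.05]))
```

When all haplotypes are equally frequent, the effective number equals the observed number; the more uneven the frequencies, the further it drops below the observed count.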


Characteristics of the study subjects

Descriptive characteristics of the study subjects stratified by sex are presented in table 2. Men were generally older, taller, and heavier than women in our sample. The mean BMD unadjusted for any covariates (age, sex, height, and weight) at the femoral neck and hip were significantly different between men and women (p<0.001); men had 6.0% and 10.8% higher femoral neck and hip BMD, respectively. Covariates accounted for 38.3% and 38.7% of phenotypic variations in femoral neck and hip BMD, respectively. After adjusting for covariate effects, the heritabilities (SE) for femoral neck and hip BMD were estimated to be 0.61 (0.04) and 0.65 (0.04), respectively, which fall into the range of the heritability estimates for BMD reported elsewhere in whites.15

Table 2

 Basic characteristics of the study subjects

Haplotype variations

For the six candidate genes studied here, the effective number of haplotypes varied from 2.3 to 7.4 (mean 5.0) and the haplotype heterozygosity varied from 0.57 to 0.91 (mean 0.73). The number of common haplotypes with frequencies ⩾5% ranged from two to seven, with an average of 4.3 per gene. These common haplotypes accounted for, on average, 87% (ranging from 56 to 98%) of all chromosomes in our white sample (fig 1). We observed a large number of rare haplotypes with frequencies as low as 0.2%, owing to the large sample size used in our study.
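The split between common and rare haplotypes used here can be illustrated with a short sketch. It applies the 5% cut-off from the text to a dictionary of haplotype frequencies and reports how many haplotypes fall in each class and what share of chromosomes each class represents. The function name and the example frequencies are hypothetical and are not taken from the study data.

```python
def summarize_haplotypes(freqs, threshold=0.05):
    """Count common (>= threshold) and rare (< threshold) haplotypes
    and the share of chromosomes that each class accounts for."""
    common = {h: f for h, f in freqs.items() if f >= threshold}
    rare = {h: f for h, f in freqs.items() if f < threshold}
    return {
        "n_common": len(common),
        "n_rare": len(rare),
        "common_share": sum(common.values()),
        "rare_share": sum(rare.values()),
    }


if __name__ == "__main__":
    # Made-up frequencies for a single gene, for illustration only.
    example = {"H1": 0.55, "H2": 0.32, "H3": 0.06,
               "H4": 0.04, "H5": 0.02, "H6": 0.01}
    print(summarize_haplotypes(example))
```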

Figure 1

 Proportion of chromosomes represented by rare haplotypes with frequencies <5% (black bars) and common haplotypes with frequencies ⩾5% (grey bars) for different candidate genes. The numbers at the top of the bars indicate the total number of rare or common haplotypes observed in the sample.

Haplotype effects on osteoporosis phenotypes

We compared individuals carrying a specific haplotype with non-carriers in our sample. Two rare haplotypes of the PTHR1 and VDR genes showed significant effects on femoral neck and hip BMD, respectively (fig 2). The haplotype H5 of the PTHR1 gene (that is, the GATG haplotype) accounted for only 1.1% of chromosomes in the population (fig 2A). However, individuals carrying H5 had significantly higher femoral neck BMD (by 5.5%) than non-carriers (p = 4.2 × 10−6). This result remained significant in permutation tests (p<1.0 × 10−7), suggesting that the association was very unlikely to be a false positive. Similarly, the H10 haplotype of the VDR gene (that is, the ATAC haplotype) accounted for 2.9% of chromosomes in the population, and subjects with this haplotype had an average of 4.0% lower hip BMD than those without (p = 1.6 × 10−4, fig 2B). The other rare haplotypes did not show significant differences between carriers and non-carriers. Among common haplotypes, the haplotype H4 (that is, the C+C haplotype) of the TGF-β1 gene accounted for 47.7% of chromosomes in the population, and showed a significant difference (∼1.4%) in femoral neck BMD between its carriers and non-carriers (p = 4.1 × 10−4).
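The percentage differences quoted above can be read as a relative difference in group means. The sketch below assumes the non-carrier mean is used as the reference, which the text does not state explicitly; the function name and the BMD values are hypothetical.

```python
from statistics import mean


def percent_difference(carriers, noncarriers):
    """Relative difference in mean phenotype, in percent, taking the
    non-carrier mean as the reference (an assumption)."""
    reference = mean(noncarriers)
    return 100.0 * (mean(carriers) - reference) / reference


if __name__ == "__main__":
    # Made-up BMD values in g/cm2, for illustration only.
    print(percent_difference([0.84, 0.86], [0.80, 0.82, 0.78]))
```

A positive value means carriers have a higher mean than non-carriers, and a negative value means a lower mean.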

Figure 2

 Mean BMD (g/cm2) of carriers (black bars) and non-carriers (grey bars) of specific haplotypes. Only haplotypes with frequencies >0.8% are considered. (A) Femoral neck BMD and PTHR1 haplotypes. (B) Hip BMD and VDR haplotypes.

Phenotypic distribution of haplotype carriers

Given the results for rare haplotypes above, we further examined the distribution of femoral neck BMD among various PTHR1 haplotype carriers to determine whether the effect of the rare H5 haplotype is an artefact due to outliers and/or an anomalous data distribution (fig 3A). The femoral neck BMD data had kurtosis and skewness coefficients of 0.54 and 0.22, respectively, and fitted well to a normal distribution (p = 0.34). Importantly, the H5 haplotype carriers showed quantitatively continuous variation in the sample. They were derived from a wide spectrum rather than only from one extreme tail of the population phenotype distribution. In our sample, only two individuals carrying H5 had femoral neck BMD within the range μ ± 4σ, while all other carriers were within the range μ ± 3σ. A similar distribution was observed for the H10 haplotype carriers at the VDR gene in the hip BMD data (fig 3B). Therefore, the effects of the rare haplotypes observed are not artefacts due to outliers and/or an anomalous data distribution.
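The outlier check described above can be sketched as follows: express each carrier's phenotype as a z score against the whole sample, count how many carriers fall within a given number of standard deviations, and compute simple shape coefficients for the sample. This is only an illustration. It uses plain moment estimators rather than the bias-corrected formulas a statistics package such as SPSS may report, and the function names and data are hypothetical.

```python
from statistics import mean, pstdev


def carrier_z_scores(population, carriers):
    """Z scores of carrier values relative to the whole sample."""
    mu = mean(population)
    sigma = pstdev(population)
    return [(x - mu) / sigma for x in carriers]


def fraction_within(z_scores, k):
    """Fraction of z scores lying within mu +/- k sigma."""
    return sum(1 for z in z_scores if abs(z) <= k) / len(z_scores)


def skewness_and_excess_kurtosis(values):
    """Moment based skewness and excess kurtosis (no bias correction)."""
    mu = mean(values)
    n = len(values)
    m2 = sum((x - mu) ** 2 for x in values) / n
    m3 = sum((x - mu) ** 3 for x in values) / n
    m4 = sum((x - mu) ** 4 for x in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


if __name__ == "__main__":
    # Made-up phenotype values, for illustration only.
    sample = [0.62, 0.70, 0.74, 0.78, 0.80, 0.83, 0.86, 0.90, 0.97, 1.05]
    carriers = [0.78, 0.86, 0.97, 1.05]
    z = carrier_z_scores(sample, carriers)
    print(fraction_within(z, 3))
    print(skewness_and_excess_kurtosis(sample))
```

If carriers were drawn only from one extreme tail, their z scores would cluster at large values of the same sign instead of spreading across the range.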

Figure 3

 BMD distributions among various haplotype carriers. Carriers of different haplotypes are indicated by different colours. (A) Femoral neck BMD among PTHR1 haplotype carriers. (B) Hip BMD among VDR haplotype carriers.


In our study, we assessed haplotype effects on osteoporosis phenotypes in a large white sample, with a particular emphasis on the potential role of rare haplotypes influencing disease susceptibility. We demonstrated that two rare haplotypes had significant effects on osteoporosis phenotypes. Large phenotypic differences were observed between carriers of these rare haplotypes and non-carriers. These findings indicate that rare haplotypes/variants are important for disease susceptibility and cannot be ignored in genetics studies of complex diseases.

Our results are particularly relevant to association studies, especially haplotype based ones, that aim to identify complex disease genes. The major attraction of haplotype methods is that common haplotypes explain most of the genetic variation in the genomic regions or candidate genes, and that these haplotypes can be captured by a small number of tag SNPs. Focusing on these common haplotypes greatly facilitates experimental design and execution of association studies.12 In our empirical data, on average, four common haplotypes per gene represented 87% of all chromosomes in the sample. Nevertheless, we found two rare haplotypes showing large phenotypic differences between carriers and non-carriers. For example, two common PTHR1 haplotypes accounted for 87% of total genetic diversity in the sample, while the remaining diversity was explained by 14 rare haplotypes, of which the haplotype H5 (with a frequency of 1.1%) had significant effects on femoral neck BMD. In contrast, we did not observe any significant evidence of association for the two common haplotypes of PTHR1. These results imply that analysis based exclusively on common haplotypes may be inadequate in association studies.

Our results provide indirect evidence for better understanding the genetic architecture of complex diseases, which is ultimately important for the success of association mapping. The “common diseases common variants” (CD-CV) hypothesis proposes that the genetic risk for common disease will often be due to disease predisposing variants found relatively commonly in susceptible populations.27–29 Accordingly, a systematic association analysis of common variants in the human genome should reveal the major causative genetic contributions to diseases with considerably greater statistical power than the linkage approach. Several favourite examples from the CD-CV proponents include APOE e4 in Alzheimer’s disease,30 PPARγ Pro12Ala in type 2 diabetes,31 factor V Leiden in deep vein thrombosis,32 and CCR5 in protection against HIV.33 On the other hand, increasing evidence of allelic complexity at the loci predisposing to complex diseases has been observed,34 in contrast to the CD-CV hypothesis. A recent modelling study showed that most genetic variance underlying complex diseases is probably attributable to loci where susceptibility mutations are mildly deleterious and where the overall mutation rate (and allelic heterogeneity) is relatively high.35 Allelic complexity may be even greater for late onset chronic diseases (such as osteoporosis), as negative selection does not act strongly on phenotypes that typically afflict individuals later in life, after reproduction has taken place.36 Our empirical data demonstrated that two haplotypes of the PTHR1 and VDR genes, which were significantly associated with the variation in osteoporosis phenotypes (p = 4.2 × 10−6 and p = 1.6 × 10−4, respectively), were rare, with frequencies of 1.1% and 2.9%, respectively. The two rare haplotypes conferred large phenotypic differences (4.0–5.0%) for carriers versus non-carriers in the sample. It should be noted that similar significance was also found for a common TGF-β1 haplotype that had a frequency of 47.7% and accounted for a phenotypic difference of about 1.4% in the sample. These findings should increase knowledge of the genetic architecture of complex diseases.

Our results showed that rare variants may play an important role in influencing the susceptibility to common diseases. Identification of such rare variants in whole genome association studies will pose a daunting challenge. For example, assuming additive models and using variance component approaches for TDT for sibship data,24 3940 random sibling pairs would be required to have 80% power to detect a QTL underlying 1% phenotypic variation at a genome-wide significance level of p<5.0 × 10−8 if the tested marker was a functional mutation variant and had a minor allele frequency of 1%.37 However, only 415 sibling pairs would be required in the same situation if the frequency of the functional mutation allele was increased to 10%. The sample size required to detect rare variants with sufficient statistical power becomes prohibitive in association studies when there is incomplete LD and/or the frequencies of the tested marker and functional mutation allele are not matched. The sample size currently employed in association studies is generally on the order of a few hundred individuals. This sample size may be suitable for identifying common variants, but will probably miss important rare variants unless the sample is as large as or larger than ours. This attests to the necessity of large sample association studies with collaborative efforts for identifying these rare variants. It also implies that the results from current association studies with small to moderate sample sizes may tend to favour the CD-CV hypothesis unduly.

Data are currently too limited to anticipate how frequently the hypothesis of common variants or rare variants is correct at different susceptibility loci.38–40 Common variants may contribute substantially to phenotypic variation in general populations; however, rare variants may also be important in human health and cannot be ignored in genetic studies of complex diseases, because they may confer large phenotypic differences for carriers versus non-carriers, as shown here. Certainly, association methods will work well at some susceptibility loci for some common diseases. It should be kept in mind, however, that gene mapping studies should not overlook those rare variants that exert a large effect on common diseases. Classical linkage analysis and positional cloning remain the methods of choice for identifying rare and high risk disease associated variants, owing to the clear inheritance patterns that they display in large and affected pedigrees.38,41

Several caveats for our findings should be acknowledged. Firstly, the main strength of our study is the large sample of 1873 subjects used in the analyses, which allows assessment of the potential role of rare haplotypes. However, with such a large sample, molecular determination of haplotypes and comprehensive genotyping of the studied candidate genes were not feasible, owing to limited resources. In our sample, haplotypes were inferred using family data. Information from relatives and the use of a large family dataset can help resolve haplotype ambiguity and greatly increase the precision of haplotype inference.42 Empirical studies also indicate advantages to using family data, including detection of genotyping errors and integration with meiotic maps.43 Therefore, our analyses based on statistically inferred haplotypes are reliable and robust. Secondly, the chosen SNPs were distributed with an average density of 13 kb, which should capture most of the genetic variation in the studied candidate genes. Patterns of haplotype variation observed in our sample were largely consistent with previous studies.10–12 Thus, our conclusions drawn from such haplotype data should be generally applicable to genetic studies of complex diseases. Thirdly, although TDT methods are robust to population stratification, they can only use offspring information from informative families. This leads to a potentially large reduction in power to detect allelic associations. In our study, we used whole family data in the analyses, and the results were validated by robust permutation tests. In a previous study, Long et al examined population stratification in the same sample as ours by testing the equality of within and between family genetic components.44 They did not find any evidence for population stratification in the sample. Therefore, the results obtained from our analyses are convincing.


Investigators of this work were partially supported by grants from the Health Future Foundation, NIH, the State of Nebraska, and US DOE. The study also benefited from grants from CNSF, the Huo Ying Dong Education Foundation, and the Ministry of Education of China.



  • Competing interests: none declared
