Genomewide scans of red cell indices suggest linkage on chromosome 6q23
A Iliadou1, D M Evans1, G Zhu1, D L Duffy1, I H Frazer2, G W Montgomery1, N G Martin1

1 Queensland Institute of Medical Research, Brisbane, Queensland, Australia
2 Centre for Immunology and Cancer Research, University of Queensland, Princess Alexandra Hospital, Brisbane, Queensland, Australia

Correspondence to: Dr D M Evans, The Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK; davide{at}


Background: The red cell indices quantify the size, number and oxygen-carrying ability of erythrocytes. Although the genetic basis of many monogenic forms of anaemia is well understood, comparatively little is known about the genes responsible for variation in the red cell indices among healthy participants.

Objective: To identify quantitative trait loci (QTLs) responsible for normal variation in the red cell indices of 391 pairs of dizygotic twins who were measured longitudinally at 12, 14 and 16 years of age.

Results: Suggestive evidence of linkage was found for haemoglobin concentration (LOD = 3.03) and haematocrit (LOD = 2.95) on chromosome 6q23, a region previously identified as possibly harbouring a QTL for haematocrit. There was also evidence of linkage to several other regions of the genome, including chromosome 4q32 for red cell count and 7q for mean cell volume. In contrast, there was little evidence of linkage to the chromosomal regions containing the genes for erythropoietin (7q21) and its receptor (19p13.2), or to the regions containing the genes for the haemoglobin α (16p13.3) and β chains (11p15.5).

Conclusion: These findings provide additional evidence for a QTL affecting haemoglobin and haematocrit on chromosome 6q23. In contrast, polymorphisms in the genes coding for erythropoietin, its receptor and the haemoglobin α and β chains do not appear to contribute substantially to variation in the red cell indices among healthy persons.

  • HCT, haematocrit
  • ibd, identical-by-descent
  • LOD, logarithm of odds ratio
  • MCV, mean corpuscular volume
  • QTL, quantitative trait locus
  • RBC, red blood cell count

The red cell indices describe the size, number and oxygen-carrying capacity of erythrocytes. Haemoglobin concentration indicates the amount of the oxygen-carrying protein haemoglobin in a given volume of blood; red blood cell count (RBC), the concentration of erythrocytes; mean corpuscular volume (MCV), the average volume of each red cell; and haematocrit (HCT), the proportion of blood volume that consists of erythrocytes. Together, these indices assist in the differential diagnosis of anaemia and are risk factors for a number of clinical conditions. For example, a high haematocrit is associated with increased risk of cerebrovascular1,2 and coronary artery diseases.1

Approximately 7% of the world’s population are carriers of inherited disorders of haemoglobin, making these disorders among the most prevalent Mendelian diseases.3 Although the genetic basis of many of these haemoglobinopathies is well understood, comparatively little is known about the genetic causes of variation in the normal range of red cell values. Studies on twins have shown that a substantial proportion of the variation between persons is due to genetic factors, with heritability estimates ranging from 20% to 96%.4–9 A recent genomewide linkage study by Lin et al7 identified a locus on chromosome 6q23–24 linked to HCT, and a pleiotropic locus affecting haemoglobin and HCT on chromosome 9q. Interestingly, the authors found no evidence of linkage to the regions containing the genes for the α and β haemoglobin chains (on chromosomes 16p13.3 and 11p15.5), or to those containing the genes for erythropoietin and its receptor (on chromosomes 7q21 and 19p13.2), all of which have been implicated in Mendelian forms of anaemia. The implication is that the quantitative trait loci (QTLs) responsible for normal variation in the red cell indices might differ from the loci that cause monogenic forms of anaemia.

In a previous study, we reported that variation in the red cell indices was highly heritable, with genetic factors explaining between 61% and 96% of the phenotypic variance in our sample of adolescent twins.5 In this paper, we extend these results by performing a genomewide linkage scan of the 391 dizygotic twin pairs from that study. We measured the red cell indices of twins at 12, 14 and 16 years of age and performed univariate multipoint sib-pair linkage analyses across the genome. Our study is the first to report linkage results for RBC and MCV, and will hopefully constitute the first stage in the identification and subsequent positional cloning of QTLs responsible for normal variation in the red cell indices.



Twins were recruited as part of an ongoing study concerned with the development of melanocytic naevi (moles), the clinical protocol of which has been described in detail elsewhere.10–12 Twins were enlisted by contacting the principals of primary schools in the greater Brisbane area, media appeals and by word of mouth. Informed consent was obtained from all participants and parents before testing. The results reported here are for data collected from May 1992 to June 1999. Twins were tested as closely as possible to their 12th, 14th and 16th birthdays. Data were obtained from 706 pairs of twins comprising 154 monozygotic females, 161 monozygotic males, 91 dizygotic females, 104 dizygotic males and 196 dizygotic twin pairs of opposite sex (including 91 pairs in which the females were born first and 105 pairs in which the males were born first). Not all twins were tested across all three measurement occasions. Table 1 shows a breakdown of these data. For example, the first row in table 1 indicates that 85 twin pairs with complete phenotypic and genotypic information were tested at age 12 years only (ie, not at ages 14 or 16 years). No attempt was made to exclude subjects having illness, although a few such cases were subsequently excluded as outliers.13

Table 1

 Breakdown of participation showing the number of complete twin pairs for whom all the red cell indices and genotype information (in the case of dizygotic pairs) were available

Venous blood was collected into a 5-ml EDTA tube. Total blood haemoglobin (g/l), RBC ($\times 10^{12}$/l) and MCV (fl) were measured using a Coulter Model STKS blood counter. From these values, haematocrit was calculated as the product of RBC and MCV. Dizygotic twin pairs were selected for linkage analysis on the basis of previous zygosity testing.14
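The derivation of haematocrit from RBC and MCV can be illustrated with a short sketch. The function name and the input values below are hypothetical and chosen only to show the unit conversion; the only thing taken from the text is that HCT is the product of RBC and MCV.

```python
def haematocrit(rbc_e12_per_l, mcv_fl):
    """Return haematocrit as a volume fraction (litres of red cells per litre of blood)."""
    # RBC is expressed in units of 10^12 cells per litre and MCV in femtolitres
    # (10^-15 l), so the raw product carries a factor of 10^12 * 10^-15 = 10^-3.
    return rbc_e12_per_l * mcv_fl / 1000.0


# Illustrative values only (not taken from the study data).
print(round(haematocrit(4.8, 85.0), 3))  # 0.408
```

Because HCT is a deterministic function of RBC and MCV, it carries no information beyond those two measures.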


DNA was extracted from buffy coats using a modification of the “salt method”.15 For twin pairs of the same sex, zygosity was determined by typing nine independent DNA microsatellite polymorphisms and the X/Y amelogenin marker for sex determination by polymerase chain reaction (ABI Profiler system). All twins were also typed for ABO, Rh and MNS blood groups (zygosity was subsequently confirmed via the genome scans).

The genome scan consisted of 726 highly polymorphic autosomal microsatellite markers at an average spacing of ∼5 cM in 539 families (2360 participants). Markers on the X chromosome were also typed, but linkage to these is not reported here. The microsatellites consisted of a combination of markers from the ABI-Prism and CIDR genotyping sets. Overlapping parts of the sample received either a 10 cM scan using the ABI-2 marker set (400 markers) at the Australian Genome Research Facility (Melbourne, Australia) or a 10 cM scan using the Weber marker set at the Center for Inherited Disease Research (Baltimore, Maryland, USA), or both. Only 30 markers were common to both marker sets and were used for quality control; the remaining markers intercalated to form a scan at approximately 5 cM spacing. The only families to receive one scan had both parents genotyped, and so had high information content. The average heterozygosity of markers was 0.78 and the mean information content was 0.77. Although genome scan data were available from parents, twins and siblings, phenotype data (ie, blood cell counts) were available only from twins. Full details of the genome scan are provided elsewhere.14

Linkage analyses

Univariate multipoint variance components linkage analysis was used to test for linkage between marker loci and blood cell phenotypes.16–19 Variance components were estimated by maximum-likelihood analysis of the raw data20 as implemented in the software package MERLIN,21 along with fixed effects for sex and age (although every effort was made to measure twins at their 12th, 14th and 16th birthdays, some twins were slightly older or younger than this). As both circadian and seasonal effects have been reported for erythrocytes, linear, quadratic and sinusoidal fixed effects were included for the time of day and the month in which blood was sampled.22 Univariate multipoint linkage analyses were performed for each marker at each age. Only phenotypic data from dizygotic pairs were included in the analyses because monozygotic twins share all their genes identical by descent (IBD) across the genome, and are thus uninformative for linkage. Note also that although red cell indices were measured only in twins, parental and sibling genotypes still helped determine IBD sharing between the dizygotic twin pairs.
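As a rough illustration of the fixed effects described above, the sketch below builds one row of a covariate matrix. The exact parameterisation used in the analysis is not given in the text, so the choice of a single sine and cosine harmonic per cycle, the 24-hour and 12-month periods, the column order and the function names are all assumptions.

```python
import math


def periodic_covariates(value, period):
    """Linear, quadratic and first-harmonic sinusoidal terms for a cyclic variable."""
    angle = 2.0 * math.pi * value / period
    return [value, value ** 2, math.sin(angle), math.cos(angle)]


def fixed_effect_row(sex, age_years, hour, month):
    """One design-matrix row: intercept, sex, age, time-of-day terms, month terms."""
    row = [1.0, float(sex), float(age_years)]
    row += periodic_covariates(hour, 24.0)   # assumed 24-hour circadian cycle
    row += periodic_covariates(month, 12.0)  # assumed 12-month seasonal cycle
    return row


# Hypothetical participant: coded sex 1, aged 12.1 years, sampled at 06:00 in month 3.
example_row = fixed_effect_row(1, 12.1, 6.0, 3.0)
print(len(example_row))
```

Including both sine and cosine terms lets the fitted curve peak at any phase of the cycle, rather than forcing the peak to a fixed time.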

The null hypothesis that the additive genetic variance in a trait caused by a QTL linked to a given marker is zero (ie, $\sigma_q^2 = 0$) was tested by comparing the likelihood of a reduced model in which $\sigma_q^2$ was constrained to zero with the likelihood of a model in which the genetic variance due to the QTL ($\sigma_q^2$) was estimated. Twice the difference in natural log-likelihood between these models is distributed asymptotically as a ½:½ mixture of $\chi^2_1$ and a point mass at zero,23 whereas the difference between the two $\log_{10}$ likelihoods produces a LOD score equivalent to the classical LOD score of parametric linkage analysis.24
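The relationship between the likelihood-ratio statistic, the LOD score and the mixture distribution can be sketched as follows. The function names are illustrative and this is not MERLIN's interface; the code only applies the two statements in the paragraph above, using the identity that the upper tail of a $\chi^2_1$ variable at $x$ equals erfc($\sqrt{x/2}$).

```python
import math


def lod_from_loglik(lnl_full, lnl_null):
    """LOD score: difference in log10 likelihoods, from natural log-likelihoods."""
    return (lnl_full - lnl_null) / math.log(10.0)


def chi2_from_lod(lod):
    """Likelihood-ratio statistic (twice the natural log-likelihood difference)."""
    return 2.0 * math.log(10.0) * lod


def mixture_p_value(lod):
    """Asymptotic p value under a 1/2:1/2 mixture of chi-square(1) and a point mass at zero."""
    stat = chi2_from_lod(lod)
    if stat <= 0.0:
        return 1.0
    # Upper tail of chi-square with 1 df, halved for the point mass at zero.
    return 0.5 * math.erfc(math.sqrt(stat / 2.0))


# LOD score reported for haemoglobin at age 14 years on chromosome 6q23.
print(chi2_from_lod(3.03), mixture_p_value(3.03))
```

Under these asymptotics, a LOD score of about 3 corresponds to a pointwise p value of roughly 1 in 10 000.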

Multivariate analyses

Several groups have shown that multivariate methods can increase the power of QTL linkage analysis.25–29 The rationale is that information on the QTL comes not only from each variable’s variance but also from the covariation between the different measures. Multivariate analysis takes advantage of this fact and models both the variation and the covariation between the variables in terms of the underlying QTL. Given the high correlation between the haemoglobin and RBC variables, as well as the moderate negative correlation between the RBC and MCV variables (see Results section), we used a multivariate procedure in an attempt to increase evidence for linkage at the most promising chromosomal regions. Specifically, we performed multivariate analyses of the haemoglobin, RBC and MCV variables on chromosome 6 after the univariate analyses had indicated the possible existence of a QTL affecting these variables in 14-year-old twins (see Results section). We did not perform longitudinal genetic analyses on any of the variables because of the small number of dizygotic twins measured across all occasions (table 1), and did not include HCT in any of the multivariate models as this measure was derived from the RBC and MCV variables.

Multivariate QTL linkage analyses were performed using structural equation modelling as implemented in the computer package Mx30–32 using data from both monozygotic and dizygotic pairs. The QTL was modelled as a single latent factor which pleiotropically affected the phenotypes. In this model, the correlation between the QTL effects was set to one for monozygotic twins (as monozygotic twins are genetically identical), and to $\hat{\pi}$, the estimated proportion of alleles shared IBD at the marker locus, for dizygotic twins. The probabilities of sharing zero ($p_0$), one ($p_1$) or two ($p_2$) marker alleles IBD for each dizygotic twin pair were calculated in a multipoint fashion using the Lander–Green algorithm as implemented in MERLIN.21 These probabilities were then used to obtain $\hat{\pi} = p_2 + \tfrac{1}{2}p_1$ for each dizygotic pair. Residual sources of variation were modelled using saturated Cholesky structures. As phenotypic information was available from monozygotic twins, it was possible to partition the residual familial resemblance into separate variance components owing to additive genetic and common environmental sources of variation. Multivariate analyses included the same fixed effects as in the univariate models. We compared the fit of the full multivariate model with a model where the latent factor containing the QTL effects was absent. Twice the difference in log-likelihoods between the models was evaluated against a $\chi^2$ distribution with 3 degrees of freedom (df).31
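A minimal sketch of the IBD sharing estimate and of the cross-twin covariance implied by a single pleiotropic QTL factor is given below. The function names, IBD probabilities and factor loadings are hypothetical. The covariance formula (sharing coefficient times the outer product of the loadings) is the standard consequence of a single latent factor whose correlation between twins is 1 or $\hat{\pi}$, and is an assumption about how the model was specified in Mx.

```python
def pi_hat(p0, p1, p2):
    """Estimated proportion of alleles shared IBD by a dizygotic pair."""
    if abs(p0 + p1 + p2 - 1.0) > 1e-6:
        raise ValueError("IBD probabilities must sum to 1")
    return p2 + 0.5 * p1


def qtl_cross_twin_cov(loadings, sharing):
    """Cross-twin covariance block contributed by a single QTL factor."""
    return [[sharing * a * b for b in loadings] for a in loadings]


# Hypothetical IBD probabilities and loadings for haemoglobin, RBC and MCV.
loadings = [0.5, 0.4, 0.2]
dz_sharing = pi_hat(0.1, 0.5, 0.4)
dz_block = qtl_cross_twin_cov(loadings, dz_sharing)
mz_block = qtl_cross_twin_cov(loadings, 1.0)  # monozygotic pairs share all alleles IBD
print(dz_sharing)
```

With three phenotypes loading on the factor, the QTL contributes three free parameters, which matches the 3 df used for the test.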


Table 2 presents the range, means and standard deviations for each of the red cell indices. All variables approximated the normal distribution and displayed minimal skew and kurtosis. We draw the reader’s attention to the large interindividual variation among these healthy Australian twins.

Table 2

 Minimum and maximum values, means and standard deviations (SD) in the red cell indices for males and females at 12 years, 14 years and 16 years of age

Figures 1–4 show the results from the genomewide tests of linkage. Each plot displays the LOD scores for all three ages so that it is possible to compare the consistency of results across measurement occasions. Suggestive evidence of linkage occurred at age 14 years on chromosome 6q23 for the highly correlated haemoglobin (LOD = 3.03; fig 1) and haematocrit (LOD = 2.95; fig 2) measures. Some minor evidence was also present at ages 12 and 14 years for RBC on chromosome 6 (fig 3), although none of the red cell indices showed any evidence of linkage in these regions at age 16 years. Another major peak occurred at age 16 years on chromosome 4q32 for RBC (LOD = 3.07; fig 3), with some evidence of linkage at age 12 years but not at age 14 years.

Figure 1

 Genome scan for haemoglobin. LOD, logarithm of odds. Age in years.

Figure 2

 Genome scan for haematocrit. LOD, logarithm of odds. Age in years.

Figure 3

 Genome scan for red blood cell count. LOD, logarithm of odds. Age in years.

Table 3 summarises several other regions of suggestive linkage (defined here as a LOD score >1.5). These included regions where linkage was present either at more than one age or for more than one variable. The most promising examples were chromosome 7q, where MCV showed suggestive evidence of linkage at all three ages (fig 4), and chromosome 17, which contained peaks for haemoglobin, RBC, HCT and MCV in similar regions at age 14 years. We found little evidence of linkage on chromosome 16p13.3 or 11p15.5, which contain the genes for the haemoglobin α and β chains (all LODs <1). There was also no evidence of linkage on chromosome 7q21 or 19p13.2, which contain the genes for erythropoietin and its receptor (all LODs <1 except for MCV at age 12 years, where LOD = 1.02 at marker D7S221).

Table 3

 Regions with multipoint LOD >1.5 from the genome scan of the red cell indices

Figure 4

 Genome scan for mean corpuscular volume. LOD, logarithm of odds. Age in years.

Table 4 displays the correlations between the different red cell indices. In each age group there was a high positive correlation between haemoglobin, HCT and RBC, as well as a moderate negative correlation between RBC and MCV, but little correlation between MCV and the other red cell indices. We attempted to take advantage of this correlation structure by performing a multivariate analysis of haemoglobin, RBC and MCV after the univariate analyses had indicated the possible existence of a QTL on chromosome 6 in 14-year-old twins. Figure 5 displays the results of these analyses. Unfortunately, multivariate analysis of these variables failed to increase evidence of linkage in this region of the genome (an analysis restricted to haemoglobin and RBC produced similar results; results not shown).

Table 4

 Pearson’s correlations between the different red cell indices at 12, 14 and 16 years of age

Figure 5

 Multivariate analysis of haemoglobin (Hb), red blood cell (RBC) and mean corpuscular volume (MCV) on chromosome 6 for 14-year-old twins. Also shown are the univariate results for each of these variables.


The most promising result from our study was the large linkage peak on chromosome 6q23 for the haemoglobin and HCT variables in 14-year-old twins. This is the same region identified by Lin et al7 as part of the Framingham Heart Study, although interestingly, Lin et al7 found linkage only for HCT and not for the haemoglobin variable. In our study, the haemoglobin and HCT variables were very highly correlated (r = 0.91–0.93 across all sexes and ages), so it was not surprising that the same chromosomal region was implicated by both variables. Chromosome 6q23 contains a number of promising candidate genes including EPB41L2, which codes for protein 4.1G, a member of the erythrocyte cytoskeletal protein 4.1R gene family (defects in erythrocyte cytoskeletal proteins are known to produce anaemia and other haematological anomalies), and HEBP2, which codes for a putative haem-binding protein. A form of hereditary persistence of fetal haemoglobin has also been mapped to this region.33

Multivariate analysis of this region in 14-year-old twins failed to increase evidence for linkage. This could be due to a number of factors. Firstly, the distribution of the multivariate test statistic is a complicated mixture of $\chi^2$ distributions which has yet to be well characterised in the literature.23–31 Evaluating the test against a $\chi^2$ distribution with 3 df is slightly conservative and may be partially responsible for the apparent loss in power. Secondly, multivariate linkage analysis increases power to detect linkage only if the QTL pleiotropically affects more than one variable in the analysis. Possibly, the peaks identified in the univariate analyses represent different QTLs which affect the variables separately. In this case, a multivariate analysis of the phenotypes would be less powerful than the univariate analyses, as the multivariate test has more df. Finally, and most cynically, one or more of the univariate results might simply reflect random fluctuation and type I error.
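The cost of the extra degrees of freedom can be illustrated numerically. The sketch below compares the p value that one likelihood-ratio statistic would receive under the univariate ½:½ mixture with the p value from a $\chi^2$ distribution with 3 df. The function names are illustrative, the statistic is simply the value implied by a LOD score of 3.03, and the closed-form tail probabilities are standard results for 1 and 3 df.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 df (closed form)."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


# Likelihood-ratio statistic implied by a univariate LOD score of 3.03.
stat = 2.0 * math.log(10.0) * 3.03
p_univariate = 0.5 * chi2_sf_1df(stat)  # 1/2:1/2 mixture with a point mass at zero
p_three_df = chi2_sf_3df(stat)          # reference distribution used for the multivariate test
print(p_univariate, p_three_df)
```

For the same statistic, the 3 df reference distribution yields a noticeably larger p value, so a multivariate model must improve the fit substantially more than a univariate one to reach the same level of significance.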

A major strength of our study is the availability of multiple measurements on many of the same persons across time. Although these results do not formally constitute replication (because the data are not independent), they do indicate the extent to which the results are robust with respect to measurement error and temporal changes in the phenotypes. In this vein, it is interesting to note that although there was some evidence of linkage at ages 12 and 14 years on chromosome 6q23 for the haemoglobin and HCT measures, there was little evidence of linkage in 16-year-olds. This is particularly interesting given that the phenotypic correlation across measurement occasions was quite high (table 4), and previous longitudinal genetic analyses have suggested that most genetic variance from ages 12 and 14 years is transmitted to 16 years.13 A possible explanation is that fewer dizygotic twins were measured at age 16 years than at ages 12 or 14 years (table 1), and hence the power to detect linkage at 16 years was lower than at 12 or 14 years. Alternatively, the positive results might simply reflect type I error and stochastic variation in the linkage signal.

Our study also identified several other regions of suggestive linkage. Most prominent among these were the large peak on chromosome 4q32 for RBC at age 16 years, and the peak on chromosome 7q36 which displayed linkage to MCV at all ages. Although there are no obvious candidate genes in these regions, the 7q36 region has previously been implicated in a linkage study involving a large family with heterocellular hereditary persistence of fetal haemoglobin.34 Although some of these regions may harbour QTLs, other peaks will be the result of random fluctuation and type I error. As always, the key to assessing the relevance of these findings is replication using independent datasets.

Similar to Lin et al,7 we found no evidence of linkage to regions containing the genes for the α and β haemoglobin chains (chromosomes 16p13.3 and 11p15.5), nor to the regions containing the genes for erythropoietin and its receptor (chromosomes 7q21 and 19p13.2). Polymorphisms in all of these genes can produce Mendelian forms of anaemia. The results from this study add weight to the notion that the QTLs responsible for normal variation in the red cell indices differ from the loci responsible for monogenic forms of anaemia, although it should be noted that the power of sib-pair linkage analysis is low, and so QTLs accounting for small proportions of the phenotypic variance could go undetected in our study.

In conclusion, we have found suggestive evidence for a QTL influencing HCT and haemoglobin on chromosome 6q23. This is the same region that was previously identified by Lin et al7 as containing a QTL for HCT. We intend to fine map this region (ie, by increasing marker density) and perform association analysis of candidate genes in the area (eg, EPB41L2, HEBP2). This study might represent the first step towards identifying genes that contribute to normal variation in the red cell indices, and thereby improve our understanding of the quantitative genetics of these traits.



  • Published Online First 1 September 2006

  • Competing interests: None declared.
