Abstract
The MSH2*1906G→C mutation was recently shown to be a rare yet highly penetrant mutation leading to colorectal cancer. The mutation was only found among Ashkenazi Jewish individuals and lies on an extended haplotype that is common in that population. This study determined that the mutation probably arose between 11 and 22 generations ago, during the time when the Ashkenazim were living in eastern Europe.
- DMLE, disease mapping using linkage disequilibrium
- colorectal cancer
- founder mutation
- haplotype
- hereditary cancer
In both men and women, colorectal cancer is the third most commonly diagnosed cancer in Canada.1 A small proportion of colorectal cancer cases (less than 5% overall) can be attributed to highly penetrant mutations in susceptibility genes such as APC, MLH1, and MSH2.2
Recently, we reported identification and characterization of a new mutation in MSH2 (MSH2*1906G→C) among Ashkenazi Jewish individuals with colorectal cancer.3 The mutation appears to be rare in Ashkenazi Jews and is highly penetrant. All 16 mutation carriers studied, recruited from five different countries, carried the same haplotype around the mutation, indicating that all cases probably descend from a common ancestor. It seems clear that this mutation can be considered a founder mutation in this population4; we have therefore estimated the probable age of the mutation in order to interpret its frequency and distribution in light of the demographic history of the Ashkenazim.
The usual method for estimating mutation age5 is based on the expected decay of linkage disequilibrium owing to recombination over the generations between time of the most recent common ancestor and today. Modifications have been suggested that adjust for population growth under a simple model.6 However, consideration of gene genealogies and models for coalescent times for different chromosomal lineages can lead to richer models capable of accounting for the randomness of the gene genealogy.
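The single locus calculation described above can be sketched as follows. This is a minimal illustration, assuming the widely used moment estimator in which the excess frequency of the associated marker allele on mutation-bearing chromosomes, delta = (p_D - p_N) / (1 - p_N), decays as (1 - theta)^g over g generations. The function name and the allele frequencies are hypothetical and are not values from this study.

```python
import math


def single_locus_age(p_disease, p_normal, theta):
    """Estimate generations since the common ancestor at one marker.

    p_disease: frequency of the associated allele on mutation-bearing chromosomes
    p_normal:  frequency of the same allele on normal chromosomes
    theta:     recombination fraction between marker and mutation

    Assumes delta = (p_disease - p_normal) / (1 - p_normal) = (1 - theta) ** g
    and solves for g. Marker mutation and population growth are ignored.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    delta = (p_disease - p_normal) / (1.0 - p_normal)
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must lie in (0, 1] for the estimate to exist")
    return math.log(delta) / math.log(1.0 - theta)


# Hypothetical inputs, chosen only to show the sensitivity to theta.
age_far_marker = single_locus_age(0.80, 0.20, theta=0.01)
age_near_marker = single_locus_age(0.80, 0.20, theta=0.001)
print(f"theta=0.01:  {age_far_marker:.1f} generations")
print(f"theta=0.001: {age_near_marker:.1f} generations")
```

With these illustrative inputs, lowering theta from 0.01 to 0.001 raises the estimate roughly tenfold, which is one way to see why estimates from markers close to the mutation depend so heavily on the assumed recombination rate.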
An intra-allelic coalescent model for a rare disease mutation, allowing for population growth and selection, was recently extended to multipoint linkage disequilibrium mapping.7 This approach elegantly incorporates the dependence between gene genealogies at multiple markers. Incorporation of realistic assumptions about the past leads to higher variability in the genealogies, and therefore the confidence regions for mutation age tend to be wider than regions from simpler approaches. It is also worth noting that this method estimates the time of origin of the mutation, and this may be substantially older than the time of the most recent common ancestor, which is the quantity estimated by other methods.5–7 Software is available (DMLE).8
Order and physical distances between markers were updated (UCSC, Santa Cruz, California, USA, July 2003). Genetic distances were taken from deCODE9 where available. Linear interpolation was used to obtain genetic distances for markers not in deCODE and for one marker (D2S391) where the genetic distance given by deCODE9 gave an incorrect marker ordering. The single locus method requires knowledge of haplotype phase. For the 12 families where phase was unknown, it was estimated using the software PHASE10; we seeded the algorithm with the haplotypes of the four phase-known cases, and sampled from the posterior distribution of mutation carrying haplotypes. We estimated population growth among Ashkenazi people from the year 1500 onwards at 1.5-fold or 1.6-fold per generation, assuming 25 years per generation.5 The proportion of mutation carrying chromosomes sampled was estimated from a lifetime risk of colorectal cancer of 6.85%,1 a worldwide population of Ashkenazi Jews of 13 million,4 and a mutation prevalence among colorectal cancer cases of 0.59%.3
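The sampled proportion can be approximated with simple arithmetic. The sketch below is one plausible reading of the figures quoted above, not the authors' documented calculation: it assumes the number of mutation-bearing chromosomes is the product of population size, lifetime colorectal cancer risk, and mutation prevalence among cases, and that each of the 16 studied carriers contributes one such chromosome. All names are illustrative.

```python
# Figures quoted in the text.
ASHKENAZI_POPULATION = 13_000_000
LIFETIME_CRC_RISK = 0.0685             # 6.85%
MUTATION_PREVALENCE_IN_CASES = 0.0059  # 0.59%
SAMPLED_CARRIERS = 16


def estimated_carrier_chromosomes(population, lifetime_risk, prevalence_in_cases):
    """Rough count of mutation-bearing chromosomes in the population.

    Assumption: one mutant chromosome per carrier, and every carrier is
    counted through a lifetime colorectal cancer diagnosis.
    """
    return population * lifetime_risk * prevalence_in_cases


def sampled_proportion(n_sampled, n_total):
    """Fraction of mutation-bearing chromosomes represented in the sample."""
    return n_sampled / n_total


carriers = estimated_carrier_chromosomes(
    ASHKENAZI_POPULATION, LIFETIME_CRC_RISK, MUTATION_PREVALENCE_IN_CASES
)
proportion = sampled_proportion(SAMPLED_CARRIERS, carriers)
print(f"estimated mutation-bearing chromosomes: {carriers:.0f}")
print(f"proportion sampled: {proportion:.4f}")
```

Under these assumptions the sample covers well under 1% of mutation-bearing chromosomes. Carriers who never develop colorectal cancer are not counted, so the true proportion would be somewhat lower.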
The single locus method for estimation of the time since the founding of the mutation5,6 gives highly variable estimates ranging between 12 and 700 generations (table 1). In particular, for markers near the mutation the estimates are very sensitive to the recombination rates and also may be strongly affected by mutation at the microsatellite markers. However, the coalescent model with growth rate 1.5-fold per generation (fig 1A) presents a clearer picture of a recent mutation origin at about 17 generations ago (95% confidence region, 12.7 to 22.5 generations). For a growth rate of 1.6, the mutation is estimated to be slightly younger (fig 1B).
Table 1. Associated allele frequencies, linkage disequilibrium, and single locus estimates of the number of generations since the most recent common ancestor
Figure 1. Plot of the posterior distribution of mutation age, estimated by the software DMLE8 (disease mapping using linkage disequilibrium) employing Markov Chain Monte Carlo estimation. The dotted lines show the 95% posterior confidence region for the number of generations.
Estimates of the year of origin of the mutation range between 1440 (growth rate 1.5-fold per generation, upper end of the confidence region) and 1715 (growth rate 1.6-fold per generation, lower end of the confidence region). This 275 year range places the mutation in the period when the Ashkenazim were living in eastern Europe, in partially closed communities. Such inference makes sense, as this mutation has not been identified in other Jewish groups or in individuals not of Jewish descent. These dates are consistent with an origin around the time of the Chmielnicki massacres that particularly affected the Jewish population of what is now Ukraine and Poland in 1648 to 1655. A similarly recent age and site of origin has been postulated for the mutation in the torsion dystonia gene, although that mutation is frequent in the Ashkenazim.5 For this rare MSH2 mutation, the rapid population growth means in fact that neither positive selection nor drift is required to explain the current allele frequency.
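The conversion between generations and calendar years behind these dates can be sketched as below. The 25 years per generation comes from the text; the reference year of 2004 is an assumption made here for illustration (the text does not state which year was used as "today"), and the function names are hypothetical.

```python
YEARS_PER_GENERATION = 25  # stated in the text
REFERENCE_YEAR = 2004      # assumption: approximate year of the analysis


def origin_year(generations, reference_year=REFERENCE_YEAR,
                years_per_generation=YEARS_PER_GENERATION):
    """Calendar year lying `generations` generations before the reference year."""
    return reference_year - generations * years_per_generation


def generations_before(year, reference_year=REFERENCE_YEAR,
                       years_per_generation=YEARS_PER_GENERATION):
    """Number of generations between `year` and the reference year."""
    return (reference_year - year) / years_per_generation


# Upper end of the 95% region for growth rate 1.5 (22.5 generations).
print(origin_year(22.5))
# Point estimate for growth rate 1.5 (about 17 generations).
print(origin_year(17))
# Generations implied by the most recent reported year of origin.
print(generations_before(1715))
```

With these assumptions, 22.5 generations corresponds to about 1441, in line with the reported earliest date of 1440, and the point estimate of 17 generations falls in the late sixteenth century.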
The estimate of lifetime colorectal cancer risk (6.85%) was taken from Canadian cancer statistics for 2004. This rate may be somewhat higher than estimates from the USA or from Canada several years previously. We repeated the DMLE analysis using a lifetime risk of 5% for colorectal cancer, and this did not alter our estimates of mutation age detectably. On the other hand, the coalescent model is very sensitive to the assumed growth rate. However, for this population there is good information about the population size over the last 500 years. Nevertheless, variation in population size owing to persecution or major epidemics was not considered, and such factors could affect the variability of gene genealogies and hence the estimated mutation age. The effect of the massacres in eastern Europe is likely to increase dependence between gene genealogies, and therefore to make the age of the mutation appear slightly older; on the other hand, any selection against the mutation would make the mutation appear slightly younger. Selection is not likely, given that most cases have onset of cancer in the fourth or fifth decades of life.
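The sensitivity to growth rate can be illustrated with a deliberately simple deterministic calculation. This sketch is not the coalescent model used by DMLE; it only assumes that a neutral mutation present as a single copy grows in step with the population, so that the expected number of copies after g generations is r^g for a per-generation growth factor r. The function name is hypothetical.

```python
def expected_copies(growth_factor, generations, founders=1):
    """Expected copies of a neutral mutation under pure exponential growth.

    Assumption: no selection, no drift, no bottlenecks; every lineage grows
    by `growth_factor` each generation.
    """
    return founders * growth_factor ** generations


GENERATIONS = 17  # approximate point estimate quoted in the text

copies_slow = expected_copies(1.5, GENERATIONS)
copies_fast = expected_copies(1.6, GENERATIONS)
print(f"growth 1.5: {copies_slow:.0f} copies")
print(f"growth 1.6: {copies_fast:.0f} copies")
print(f"ratio: {copies_fast / copies_slow:.2f}")
```

Over 17 generations, the difference between growth factors of 1.5 and 1.6 changes the expected count about threefold, which gives some intuition for why the assumed growth rate matters to the age estimate.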
This mutation was identified by sequencing a well studied candidate gene, MSH2, in which several other mutations have also been shown to increase risk for colorectal cancer.2 It is worth speculating whether a strategy of testing for linkage disequilibrium could have identified this mutation. Recently, it has been shown that genome-wide association testing was able to identify this gene using only Ashkenazi Jewish individuals with colorectal cancer (compared with Ashkenazi controls).11 Therefore, case–control studies of rare variants can be a successful strategy when cases come from high risk families, when a founder population with extensive linkage disequilibrium is studied, when the founding mutation is recent enough for the haplotypic signature to extend over a sizeable region, and when there is one mutation responsible for a large proportion of the cases in the sampled population.
Acknowledgments
This work was supported by the Mathematics of Information Technology and Complex Systems/Networks of Centres of Excellence Program, and by the Ontario Genomics Institute. WDF is funded by the Canadian Genetic Diseases Network and the Cancer Research Society Inc. We thank Dr Reeve for his assistance and suggestions with respect to the DMLE software, and Dr Speed for his comments.
Footnotes
- Competing interests: none declared