

Risk of colorectal and endometrial cancer for carriers of mutations of the hMLH1 and hMSH2 gene: correction for ascertainment
  1. F Quehenberger1,
  2. H F A Vasen2,
  3. H C van Houwelingen1
  1. 1Department of Medical Statistics and Bioinformatics, Leiden University Medical Centre, Leiden, Netherlands
  2. 2The Netherlands Foundation for the Detection of Hereditary Tumours, Leiden University Medical Centre
  1. Correspondence to:
 Dr Hans F A Vasen
 The Netherlands Foundation for the Detection of Hereditary Tumours, Leiden University Medical Centre, Rijnsburgerweg 10, “Poortgebouw Zuid”, 2333 AA Leiden, Netherlands;


Background: Hereditary non-polyposis colorectal cancer (HNPCC) is caused by germline mutations of mismatch repair genes, usually in hMLH1 or hMSH2. All earlier studies on penetrance except one population based study were conducted in HNPCC families and did not correct for the way in which these families were ascertained.

Objective: To obtain estimates of the risk of colorectal cancer (CRC) and endometrial cancer (EC) for carriers of disease causing mutations of the hMSH2 and hMLH1 genes.

Methods: Families with known germline mutations of hMLH1 (n = 39) and hMSH2 (n = 45) were extracted from the Dutch HNPCC cancer registry. Ascertainment-corrected maximum likelihood estimation was carried out on a competing risks model for cancer of the colorectum and endometrium.

Results: Both loci were analysed jointly as there was no significant difference in risk (p = 0.08). At age 70, colorectal cancer risk for men was 26.7% (95% confidence interval, 12.6% to 51.0%) and for women, 22.4% (10.6% to 43.8%); the risk for endometrial cancer was 31.5% (11.1% to 70.3%).

Conclusions: Current estimates of the CRC risk of mutations to the hMLH1 and hMSH2 locus should be replaced by considerably lower risks which account for the selection of the families.

  • CRC, colorectal carcinoma
  • EC, endometrial carcinoma
  • HNPCC, hereditary non-polyposis colorectal cancer
  • MC, cancer at minor HNPCC sites
  • MMR, mismatch repair
  • hereditary non-polyposis colorectal cancer
  • hMLH1
  • hMSH2
  • colorectal cancer risk
  • endometrial cancer risk


Hereditary non-polyposis colorectal cancer (HNPCC) or Lynch syndrome1 was originally defined as familial clustering of colorectal cancer (CRC). A set of diagnostic criteria, the so called Amsterdam criteria, was proposed to provide uniformity in clinical studies. According to these criteria, at least three relatives should have colorectal cancer (that is, Amsterdam criteria I) or HNPCC associated cancers (cancer of colorectum, endometrium, ureter, renal pelvis, and small bowel (Amsterdam criteria II)), and one of them should be a first degree relative of the other two. In addition, at least two generations should be affected, the cancer should be diagnosed before the age of 50 years in one of the relatives, familial adenomatous polyposis should be excluded, and the cancer should be confirmed by pathology.2

HNPCC is caused by germline mutations in DNA mismatch repair genes (MMR). Mutations of hMSH2 and hMLH1 constitute almost 90% of the mutations reported in families with HNPCC and are identified in half the families which meet the Amsterdam criteria.3 The penetrance of these mutations was found to be high,1 but the evidence was mainly based on observed risk of CRC in cohorts of mutation carriers that were identified by multiple cancer cases in the families. This contains a circular argument, as the excess of CRC cases—which was the reason for genotyping family members—was counted again for a risk estimate. On the other hand, a family with only two cancer cases and an equal number of mutation carriers, which would provide evidence for low mutation risk, would not be included in the study. Indeed it was shown that this method seriously overestimated the CRC risks of mutations and underestimated the risks of extracolonic cancers in families that were ascertained by the Amsterdam criteria I.4

Direct assessment of penetrance by identifying cohorts of mutation carriers from the population would be difficult, as the frequency of hMSH2 and hMLH1 mutations in CRC cases was found to be as low as 0.135% by Salovaara et al5 and 0.0319% by Dunlop et al.6 In this situation, the genotyping of a kin cohort of relatives of cancer cases is a possible way of identifying more mutation carriers. This design was later described as genotyped proband or kin cohort design, with or without additional genotyping of relatives.7–9

The most efficient way of identifying mutation carriers is to ascertain families with multiple cancer cases. If a mutation has been identified in such a family, genetic testing is offered to all unaffected relatives. The proportion of carriers of the pathogenic mutation among unaffected relatives is expected to decrease with age, because carriers are more likely to have developed cancer already, whereas affected relatives not carrying the pathogenic mutation are increasingly found at older ages. From these observations it is possible to draw conclusions about the cancer risk associated with the mutation, independent of the phenotypic criteria that were originally used to ascertain the high risk families.10,11
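The age dependence of the carrier proportion among unaffected relatives can be illustrated with Bayes' rule. The following is a minimal sketch, not the method used in the study: it assumes constant hazards (the hazard values and the function name are hypothetical) and a prior carrier probability of 0.5, as for a first degree relative of a heterozygous carrier.

```python
import math

def carrier_prob_given_unaffected(age, prior, carrier_hazard, noncarrier_hazard):
    """Posterior probability of being a carrier, given no cancer up to `age`.

    Assumes constant (hypothetical) hazards, so survival is exp(-hazard * age).
    """
    s_carrier = math.exp(-carrier_hazard * age)
    s_noncarrier = math.exp(-noncarrier_hazard * age)
    numerator = prior * s_carrier
    return numerator / (numerator + (1.0 - prior) * s_noncarrier)

# Hypothetical yearly hazards: 1% for carriers, 0.1% for non-carriers.
for age in (0, 40, 70):
    p = carrier_prob_given_unaffected(age, 0.5, 0.01, 0.001)
    print(age, round(p, 3))
```

The faster the carrier proportion falls with age among unaffected relatives, the higher the implied cancer risk of the mutation.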

Our aim in this study was to obtain estimates of the risk of CRC and endometrial cancer (EC) for carriers of disease causing mutations of the hMSH2 and hMLH1 genes from data on HNPCC families in which a mutation of either the hMSH2 or hMLH1 gene had been found by conditioning on all phenotypic information.


HNPCC families

In 1987 a national registry for families with HNPCC was established in the Netherlands. The registry had three objectives:

  • to promote surveillance in HNPCC families;

  • to safeguard the continuity of the surveillance programme;

  • to promote research.

The methods and approach of the registry have been described elsewhere.12 In brief, clinical specialists or clinical genetic centres from all parts of the Netherlands refer all families suspected of HNPCC because of clustering of CRC to the registry. The genealogical studies were carried out by genetic field workers associated with the registry or by clinical geneticists. We collected clinical information including the age at diagnosis of cancer, site of the tumour, age at death, and causes of death. The cancer diagnosis was confirmed by medical and pathological reports in as many affected relatives as possible. In addition, we collected data on colonoscopic screening of the unaffected relatives. For the present study families were selected in which a disease causing mutation of hMLH1 or hMSH2 had been identified. Genetic counselling and testing were offered to all first degree relatives of carriers of a pathogenic mutation.

Mutation analysis

The techniques used in the mutation analysis have been reported previously.13 To summarise, the general strategy was to amplify by polymerase chain reaction each of the 16 exons of hMSH2 and the 19 exons of hMLH1 in a single affected member of the family, and to analyse these products by guanosine and cytidine extension clamped denaturing gradient gel electrophoresis. To determine the molecular nature of the variant, exons with an altered pattern of migration on denaturing gradient gel electrophoresis were sequenced. When variants were detected, the investigations were extended to the rest of the family to verify the segregation of the nucleotide change with the disease phenotype.

Study data

The identity of the parents was recorded for each non-founder family member. The phenotypic data included the current age, age at first colonoscopy, age at death, age at diagnosis of CRC, EC, or cancer at minor HNPCC sites (MC). The minor HNPCC sites were the small bowel, the stomach, the ovary, and the urinary tract including the renal pelvis and ureter but excluding the urinary bladder. Mutation status at the hMLH1 and the hMSH2 locus was obtained from a part of the family members.

Statistical analysis

HNPCC-causing germline mutations of MMR genes were assumed to have a population frequency of 0.1% each,5,6 to follow Mendelian inheritance, and to be in Hardy–Weinberg equilibrium. All calculations were conditional on the family structure.
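As an illustration of these genetic assumptions, the sketch below computes founder genotype probabilities under Hardy–Weinberg equilibrium and the Mendelian transmission probabilities for a child. It reads the assumed frequency of 0.1% as a mutant allele frequency, which is an interpretation; the function names are hypothetical and do not come from the program used in the study.

```python
def hardy_weinberg_genotype_probs(q):
    """Founder genotype probabilities, keyed by number of mutant alleles (0, 1, 2)."""
    p = 1.0 - q
    return {0: p * p, 1: 2.0 * p * q, 2: q * q}

def child_genotype_probs(mother, father):
    """Mendelian genotype probabilities for a child, given parental genotypes.

    Each parent carrying g mutant alleles transmits one with probability g / 2.
    """
    tm, tf = mother / 2.0, father / 2.0
    return {
        0: (1.0 - tm) * (1.0 - tf),
        1: tm * (1.0 - tf) + (1.0 - tm) * tf,
        2: tm * tf,
    }

# Assumed mutant allele frequency of 0.1% (interpretation of the text).
founder = hardy_weinberg_genotype_probs(0.001)
carrier_prob = founder[1] + founder[2]
print(carrier_prob)
print(child_genotype_probs(1, 0))
```

Under these assumptions a founder is a carrier with probability of about 0.2%, and each child of a heterozygous carrier and a non-carrier is a carrier with probability 0.5.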

A biallelic single locus model was applied, which meant that in hMLH1 families only the hMLH1 locus was modelled and in hMSH2 families only the hMSH2 locus was modelled. The unconditional likelihood of observing the genotypes and the phenotypes of a pedigree was factorised into the product of the likelihood of the genotypes and the likelihood of the phenotypes conditional on the genotypes. The likelihood of the genotype was determined by the mutation allele frequencies. For the likelihood of the phenotypes conditional on the genotypes we assumed an age dependent disease risk for carriers and non-carriers of a pathogenic mutation. CRC, EC, and MC were determined to be competing risks.14 This meant that any information about cancer cases was ignored after the first diagnosis of any of CRC, EC, or MC. Assuming that the minimum of current age, age at first colonoscopy, and age at death constitute an uninformative censoring event, the partial likelihood of observing a non-diseased person with genotype g at the censoring time t was

$$S_{\mathrm{CRC},g}(t) \cdot S_{\mathrm{EC},g}(t) \cdot S_{\mathrm{MC},g}(t)$$

where S denotes the cause specific survival function, which is the exponential function of minus the cumulative cause specific hazard function. The complement of this function is the cause specific cumulative risk function; it gives the age dependent probability of getting the specific cancer diagnosed if there were no censoring and the risk of observing an event was not changed by the diagnosis of a competing cancer type. For persons diagnosed with the first cancer of type CRC, EC, or MC at age t, the likelihood of the phenotype given the genotype was the specific hazard function of that type of cancer multiplied by $S_{\mathrm{CRC},g}(t) \cdot S_{\mathrm{EC},g}(t) \cdot S_{\mathrm{MC},g}(t)$. In order to correct for ascertainment, we conditioned the likelihood of the observed genotypes and phenotypes on the likelihood of the observed phenotypes and on the event that at least one CRC, EC, or MC case in the family was a mutation carrier, which was given by the ratio of the two likelihoods.10,15 The likelihood of the conditioning event was calculated as the difference between the likelihood of the phenotypes alone and the likelihood of the phenotypes and none of the cancer cases being a mutation carrier. More details on the statistical methods are given in the electronic appendix (this can be viewed on the JMG web site).
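The individual likelihood contributions described above can be sketched as follows. This is a simplified illustration for a single person with a given genotype, not the pedigree likelihood computed in the study; the function names and the numerical values are hypothetical.

```python
import math

CANCERS = ("CRC", "EC", "MC")

def overall_survival(cum_hazards):
    """Product of the cause specific survival functions S_k(t) = exp(-H_k(t))."""
    return math.exp(-sum(cum_hazards[k] for k in CANCERS))

def phenotype_likelihood(cum_hazards, hazards, first_cancer=None):
    """Likelihood of one person's phenotype at age t, given the genotype.

    cum_hazards: cumulative cause specific hazards H_k(t) up to age t.
    hazards: cause specific hazards h_k(t) at age t.
    first_cancer: None if censored at t, otherwise "CRC", "EC" or "MC".
    """
    survival = overall_survival(cum_hazards)
    if first_cancer is None:
        return survival
    return hazards[first_cancer] * survival

# Hypothetical values at some age t for one genotype.
cum = {"CRC": 0.20, "EC": 0.10, "MC": 0.05}
haz = {"CRC": 0.010, "EC": 0.004, "MC": 0.002}
print(phenotype_likelihood(cum, haz))
print(phenotype_likelihood(cum, haz, "CRC"))
```

In the full analysis these terms are combined over all family members and summed over the possible genotypes of untested relatives before the ascertainment correction is applied.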

For each sex, cancer specific hazard rates for non-carriers were taken from the age dependent cancer incidences published by the Netherlands Cancer Registry for each five year age interval.16 Thereby, we assumed that the mutation frequencies were so low that they would not have a substantial influence at the population level. Furthermore, we assumed that age dependent cancer incidences of the populations represented a cause specific hazard function.14 Finally the cause specific hazard functions were smoothed by a triangular kernel smoother with a kernel width of 11 years (see electronic appendix).
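A possible form of this smoothing step is sketched below. The exact procedure is given in the electronic appendix and is not reproduced here, so the details are assumptions: the five year incidences are expanded to yearly values, the kernel covers 11 yearly points with weights falling linearly from the centre, and the weights are renormalised at the ends of the age range. The function names are hypothetical.

```python
def expand_five_year(rates_by_interval):
    """Repeat each five year interval rate for every year of the interval."""
    return [rate for rate in rates_by_interval for _ in range(5)]

def triangular_smooth(yearly_rates, width=11):
    """Smooth yearly rates with a triangular kernel spanning `width` years.

    Weights decrease linearly with distance from the centre; at the
    boundaries the weights inside the age range are renormalised
    (an assumption, not taken from the original appendix).
    """
    half = width // 2
    offsets = range(-half, half + 1)
    weights = [half + 1 - abs(d) for d in offsets]
    n = len(yearly_rates)
    smoothed = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for d, w in zip(offsets, weights):
            j = i + d
            if 0 <= j < n:
                num += w * yearly_rates[j]
                den += w
        smoothed.append(num / den)
    return smoothed

# Hypothetical incidences per five year interval (per person-year).
yearly = expand_five_year([0.0001, 0.0003, 0.0008, 0.0015])
print(triangular_smooth(yearly))
```

Smoothing removes the artificial jumps at the interval boundaries, so that the non-carrier hazard changes gradually with age.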

Cancer specific hazard rates of mutation carriers were modelled as a product of non-carrier hazard rates and an age dependent relative hazard function. We assumed that the logarithms of the cancer specific relative hazard rates were polynomial functions. The relative hazard rates of MC were set constant. The polynomial degrees for CRC and EC were determined by a two stage search. In the first stage a backwards search was made, reducing the polynomial degree for both cancers simultaneously. In the second stage a stepwise search was carried out separately for each cancer until there was no indication of a lack of fit according to the likelihood ratio criterion at the 5% level. This search was done for hMLH1 and hMSH2 families separately. For the comparison of cancer specific log relative hazard between hMLH1 and hMSH2 we added to the model relative hazard parameters for locus, one parameter for each cancer. Sex differences were tested similarly by adding a risk parameter.
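The structure of the carrier hazard model can be sketched as below. The coefficients, the centring age, and the baseline hazard are hypothetical and chosen only for illustration; they are not the estimates of the final model, and the parametrisation of age used in the study may differ.

```python
import math

def log_relative_hazard(age, coeffs, centre=50.0):
    """Polynomial in (age - centre); coeffs[k] multiplies (age - centre) ** k."""
    x = age - centre
    return sum(c * x ** k for k, c in enumerate(coeffs))

def carrier_hazard(age, noncarrier_hazard, coeffs, centre=50.0):
    """Carrier hazard = non-carrier hazard times exp(polynomial in age)."""
    return noncarrier_hazard(age) * math.exp(log_relative_hazard(age, coeffs, centre))

def baseline(age):
    """Hypothetical non-carrier hazard, rising with age."""
    return 1e-5 * math.exp(0.08 * age)

constant_rh = [math.log(10.0)]   # degree 0: relative hazard of 10 at every age
quadratic_rh = [3.0, 0.0, -0.01]  # degree 2: relative hazard peaks at the centre

for age in (30, 50, 70):
    print(age, carrier_hazard(age, baseline, constant_rh),
          carrier_hazard(age, baseline, quadratic_rh))
```

A degree of zero gives a relative hazard that does not vary with age, while a second degree polynomial with a negative leading coefficient gives a relative hazard that rises to a maximum and then declines.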

Parameter estimates were obtained by maximum likelihood. Standard errors were obtained from the information matrix of the parameters by the delta method. Confidence intervals of disease risks were calculated symmetrically around the logarithm of the disease specific relative hazard rates. Hypothesis tests were based on the likelihood ratio criterion at a 5% error rate.
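The construction of intervals that are symmetric on the log relative hazard scale can be sketched as follows. The example covers only the simplest case of a relative hazard that is constant over age, for which the cumulative risk is a monotone function of the relative hazard; the estimate, standard error, and baseline cumulative hazard are hypothetical, and the function names are not from the original program.

```python
import math

def relative_hazard_ci(log_rh, se, z=1.96):
    """Interval symmetric around the log relative hazard, back-transformed."""
    return math.exp(log_rh - z * se), math.exp(log_rh + z * se)

def cumulative_risk(rel_hazard, baseline_cum_hazard):
    """Cumulative risk when the relative hazard is constant over age."""
    return 1.0 - math.exp(-rel_hazard * baseline_cum_hazard)

# Hypothetical estimate: relative hazard 8 with standard error 0.5 on the log scale.
log_rh, se = math.log(8.0), 0.5
lower, upper = relative_hazard_ci(log_rh, se)

# Hypothetical non-carrier cumulative hazard up to age 70.
h0 = 0.05
print(cumulative_risk(8.0, h0),
      cumulative_risk(lower, h0),
      cumulative_risk(upper, h0))
```

Because the interval is symmetric only on the log scale, the resulting interval for the cumulative risk is asymmetric around the point estimate.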

All the analyses were carried out on a modified version of the computer program MENDEL 3.3.17


The characteristics of the HNPCC families are summarised in table 1. The diseased non-carriers were one patient with ovarian cancer diagnosed at age 55 from an hMLH1 family; and two patients with CRC diagnosed at ages 73 and 42, respectively, plus one patient with gastric cancer diagnosed at age 28 from separate hMSH2 families. We assumed a constant relative hazard function for MC risk, as there were only seven cases of MC that had been genotyped. The search for the best fitting relative hazard function for CRC risk found second degree polynomials for both genes. The degree of the polynomial for the log relative hazard function for EC was zero, indicating that the relative hazard for EC did not vary with age. Sensitivity analysis showed little dependence of the results on the assumed allele frequencies, non-carrier risks, and phenotypic and genotypic information earlier than 1990 (table 5 in the electronic appendix).

Table 1

 Characteristics of HNPCC families in which either an hMLH1 or an hMSH2 mutation had been found

The risk of hMSH2 mutations relative to hMLH1 was 0.23 (95% confidence interval, 0.03 to 1.80) in CRC and 0.47 (0.22 to 1.10) in EC. As the statistical test for differences between genes was not significant (p = 0.08), it was decided to assume the same hazard rates for both genes.

The test for sex specific relative hazard rates was not significant for either CRC (p = 0.99) or MC (p = 0.88). However, we still assumed sex dependent hazard rates for non-carriers, as there was no evidence against this.

The coefficients of the polynomials that represented the logarithms of the relative hazard functions are shown in table 2. The resulting age dependent relative and absolute hazards are given in table 3 and fig 1. Maximum relative hazard for CRC was (mean (SD)) 33.1 (15.2) at age 39 years. The maximum age dependent incidence, which is given approximately by the hazard rates, was 1.3 (0.6)% at age 57 for men and 1.0 (0.5)% at age 55 for women. At age 72, CRC incidence had decreased to the population level. The age dependent incidence of EC followed the population incidence at a distance, rising steeply after the age of 50 to a maximum at 80 years, as the relative risk of EC had not been found to be age dependent.

Table 2

 Parameter estimates of the final model

Table 3

 Cause specific hazards relative to the population and absolute cause specific hazards of carriers of mutations at either the hMLH1 or the hMSH2 locus

Figure 1

 Age dependent incidences of the population and of carriers of a pathogenic mutation at either hMLH1 or hMSH2 locus. CRC, colorectal carcinoma; EC, endometrial carcinoma.

The mutation carrier cumulative risks are presented in table 4 and fig 2. The results of separate analysis of hMLH1 and hMSH2 families are given in tables 2a to 4a and tables 2b to 4b, respectively, in the electronic appendix.

Table 4

 Age dependent cause specific cumulative risks and 95% confidence intervals for carriers of mutations at either the hMLH1 or the hMSH2 locus

Figure 2

 Age dependent cause specific cumulative risk and confidence intervals for carriers of mutations at either the hMLH1 or the hMSH2 locus. (A) Colorectal carcinoma (CRC) for men. (B) CRC for women. (C) Endometrial carcinoma (EC). (D) Both CRC and EC. It was assumed that no competing risks took effect.


By assessing the occurrence of germline mutations of MMR genes in high risk families for a person aged 70 years carrying an hMLH1 or an hMSH2 mutation we found a CRC risk of 26.7% for men and 22.4% for women, whereas the risk for EC was estimated to be 31.5%. The highest incidence for CRC was 1.3% at age 57 for men and 1.0% at age 55 for women. The difference between genes was not significant. There was also no sex difference in relative hazard rates for CRC for mutation carriers.

Cancer risks of mutation carriers within HNPCC families have been investigated in at least seven studies.13,18–23 The cumulative CRC risk reported for hMLH1 or hMSH2 ranged from 54%13 to 100%20 at age 70. Differences between genes13,24 and between sexes9 have been found in some studies. EC risks between 24% and 62% have been reported.13,20–23 All studies except one9 applied the Kaplan–Meier estimate to cohorts of mutation carriers that were ascertained on the occurrence of multiple cancer cases within a family. Carayol et al4 had demonstrated in a simulation study that by selecting HNPCC families under the demographic parameters of France by the Amsterdam criteria I the CRC risk estimates reported in previous studies were largely overestimated. They found that an actual male cumulative risk for CRC of 25% at age 59 would on average give a risk of 59% with the Kaplan–Meier estimate. Vasen et al,13 using an earlier version of the data which were underlying this study, found 55% and 70% for the risk of hMLH1 and hMSH2, respectively, at age 70. By analysing our data in a way that took into account that the families were selected on familial clustering of cancers and the occurrence of mutation carriers,10,15 we found cumulative risks that were in close agreement with those obtained in the simulation study4 (table 4).

Carayol et al4 found in addition an underestimation of extracolonic cancer risk for the Kaplan–Meier method in families ascertained only by multiple CRC cases. At least some of the families in the current study were ascertained for extracolonic cancers, so we would expect a risk estimate for EC or MC, respectively, within the range reported by other studies. Unfortunately, owing to the few observed EC cases, our results are too imprecise to allow conclusions to be drawn.

Our analysis required us to be explicit about the phenotype on which the families were selected and how phenotypes depended on genotypes. The relevant phenotypes were age at diagnosis of HNPCC specific cancers, at the start of screening, at death, or at study end. The competing risks model for the phenotypic data had the advantage that it took into account only the first HNPCC specific cancer and therefore did not need to be explicit about the risk in persons in whom one cancer had occurred already. The method is not sensitive to the selection of probands for genotyping, as it is unbiased under any sampling scheme.

The usual risk estimates of an event from competing risk models give the probability of observing the index event (or several index events) as the first of a set of events.14 However, the risk estimates given in table 4 assume that the subject is still at risk for the index event after an event from the remaining set of competing risks has occurred. Nevertheless these estimates are different from an estimate of the risk of the index event in a population based cohort study. The reason for this is that by ignoring any events after the first one, competing risk models also ignore the possibly higher risk of a person getting the index cancer who had been diagnosed as having another HNPCC related cancer, compared with a person who did not have any previous event. A person who might have died from a cancer not included in the competing risks would cause a risk difference in the other direction.
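The difference between the two kinds of risk estimate can be illustrated numerically. The sketch below assumes constant hazards for an index cancer and for the remaining competing cancers; the hazard values and function names are hypothetical. It compares the cumulative risk in the absence of competing events, which is the kind of estimate described for table 4, with the probability that the index cancer is observed as the first event.

```python
import math

def net_risk(hazard, t):
    """Cumulative risk by time t if competing events did not remove anyone from risk."""
    return 1.0 - math.exp(-hazard * t)

def first_event_risk(hazard, competing_hazard, t, steps=10000):
    """Probability that the index event is the first event by time t.

    Integrates hazard * exp(-(hazard + competing_hazard) * u) over [0, t]
    with the midpoint rule.
    """
    dt = t / steps
    total = hazard + competing_hazard
    acc = 0.0
    for i in range(steps):
        u = (i + 0.5) * dt
        acc += hazard * math.exp(-total * u) * dt
    return acc

# Hypothetical yearly hazards over 30 years of follow up.
print(net_risk(0.01, 30.0))
print(first_event_risk(0.01, 0.02, 30.0))
```

The first quantity is always at least as large as the second, and the gap widens as the competing hazard increases.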

Families selected on many cancer cases are likely not only to contain a high proportion of mutation carriers but also to overrepresent other genetic and environmental risk factors, as well as high penetrance mutations.25 As a consequence this will wrongly attribute more risk to the mutation under study. Such an overestimated mutation risk would still be representative for mutation carriers from high risk families, to which we would like to give genetic counselling.26 In our study, we assumed the population risk for persons without a mutation. Sensitivity analysis gave no evidence for excess familial risk.

A follow up study of 199 non-diseased mutation carriers found a risk for developing CRC under endoscopic surveillance of 10.5% within 10 years (95% confidence interval, 3.8% to 17.2%), corresponding to an annual incidence of 1.0%.27 Taking into account a considerable risk reduction resulting from screening, this would suggest high lifetime risks for CRC. In our study, an incidence above 1.0% was only found between ages 50 and 60 for men and for an even narrower age range in women (table 4, fig 1). However, one has to consider the high degree of uncertainty within both studies.

The bell shape of the incidence function (fig 1) was caused by the data driven choice of the relative hazard function. The decline in CRC incidence for mutation carriers below the population incidence probably reflects the lack of data at higher ages.


The CRC risk estimates for mutation carriers in the current study are the lowest reported so far. These lower estimates might have an impact on counselling. However, we do not believe that there is a need to change screening practice, because the CRC risks are still high. Additional studies are needed to assess the impact of other genetic and environmental factors.


We thank the families who participated in the study. We are grateful to A Antoniou from the Cancer Research UK Genetic Epidemiology Unit, University of Cambridge, for advice on how to adapt MENDEL. We would like to thank N Nagelkerke from the Department of Medical Statistics and Bioinformatics, Leiden University Medical Centre, and L Hsu from the Fred Hutchinson Cancer Research Centre for comments on the manuscript. We would like to thank all specialists who referred families to the Dutch HNPCC registry. FQ was funded by the Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO) as part of the project “Survival analysis of complicated data”, grant No 91202015.


  • The supplement is available as a downloadable PDF (printer friendly file).



    Files in this Data Supplement:

    • [view PDF] - Sensitivity analysis
      Penetrance estimates from hMLH1 families
      Penetrance estimates from hMSH2 families
      Details of statistical methods


  • Competing interests: none declared
