- CRC, colorectal cancer
- HNPCC, hereditary non-polyposis colorectal cancer
- MSI, microsatellite instability
- ECC, extracolonic cancer
A positive family history has been shown to be an important risk factor for colorectal cancer (CRC). Part of the familial aggregation is explained by the inherited diseases familial adenomatous polyposis and hereditary non-polyposis colorectal cancer (HNPCC).1 The latter syndrome is characterised by a high risk of colorectal cancer with a high rate of multiple primary tumours and a young age of onset, and also by a high risk of cancers of other organs (endometrium, stomach, pancreas, ovary, small intestine, urinary tract).2 The germline mutations which cause this syndrome have been shown to occur in genes that are responsible for repairing DNA mismatches. In humans, six mismatch repair (MMR) genes have been identified (hMLH1, hMSH2, hPMS1, hPMS2, hMSH6, and hMSH3) but germline mutations have been found in the first five only, mostly in hMLH1 and hMSH2.3 It is now commonly accepted that the lifetime risk of colorectal cancer in carriers of MMR gene mutations is very high, between 70% and 90%. A recent review of the available data indicated a lifetime risk of colorectal cancer of 74% or more in males, a somewhat lower risk in females, and a lifetime endometrial cancer risk of 42% or more in female mutation carriers.4
Apart from the preferential localisation of tumours in the proximal part of the colon, and the high frequency of multiple tumours, there is no specific individual characteristic of the syndrome. Therefore, the syndrome is diagnosed in patients on familial criteria. The classical criteria are the so-called “Amsterdam criteria”,5 which were issued in an effort to standardise clinical studies and to ensure that only families with HNPCC would be classified as such. These very stringent criteria include: (1) three relatives with colon cancer, two of them being first degree relatives of the third; (2) at least two generations affected by colon cancer; and (3) one colon cancer patient diagnosed at 50 years or younger. New criteria for HNPCC were published after a workshop organised by the National Cancer Institute (Bethesda guidelines)6 or proposed by the International Collaborative Group on HNPCC (Amsterdam II),7 which both substantially expanded the Amsterdam criteria to take into account extracolonic cancers and, in the Bethesda guidelines, early onset adenomas.
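The three Amsterdam criteria listed above can be read as a simple rule applied to a pedigree. The following is a minimal sketch of such a check, not the procedure used in any of the cited studies. The `Person` record and its field names are hypothetical, "first degree relative" is taken to mean parent, child, or full sibling, and criteria (2) and (3) are applied to all CRC cases in the family rather than to one particular trio; these are assumptions.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional


@dataclass(frozen=True)
class Person:
    """Hypothetical pedigree record used only for this illustration."""
    pid: int
    generation: int                 # 1 = ancestors, 2 = children, 3 = grandchildren
    father: Optional[int] = None    # pid of the father, None if not in the pedigree
    mother: Optional[int] = None    # pid of the mother, None if not in the pedigree
    crc_age: Optional[int] = None   # age at CRC diagnosis, None if unaffected


def first_degree(a: Person, b: Person) -> bool:
    """True if a and b are parent and child, or full siblings."""
    if a.pid in (b.father, b.mother) or b.pid in (a.father, a.mother):
        return True
    return (a.father is not None and a.mother is not None
            and a.father == b.father and a.mother == b.mother)


def meets_amsterdam(family: List[Person]) -> bool:
    cases = [p for p in family if p.crc_age is not None]
    # (3) at least one CRC case diagnosed at 50 years or younger
    if not any(p.crc_age <= 50 for p in cases):
        return False
    # (2) CRC cases in at least two generations
    if len({p.generation for p in cases}) < 2:
        return False
    # (1) three CRC cases, one of whom has the other two as first degree relatives
    for trio in combinations(cases, 3):
        for hub in trio:
            others = [p for p in trio if p is not hub]
            if all(first_degree(hub, o) for o in others):
                return True
    return False
```

A family with an affected parent and two affected children, one of them diagnosed before 50, satisfies the check; the same family with all diagnoses after 50 does not.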
Because of the lack of DNA mismatch repair in the tumour cells of affected subjects, these cells acquire mutations that change the length of the nucleotide repeat sequences, termed microsatellite instability (MSI).8 This MSI can be detected in the tumours of affected subjects and Aaltonen et al9 proposed using this phenotype as a prescreening method in HNPCC.
In most of the studies that have provided estimates of the risk for a carrier of one of the MMR genes of developing CRC or extracolonic cancer (ECC), the families were ascertained using the Amsterdam criteria.10–14 The lifetime risks for colorectal cancer ranged from 78%10,13 to 87%,11 slightly higher for males than females. The lifetime risks for extracolonic cancer also turned out to be substantial, with estimates ranging from 28%13 to 72%,10 the highest risk being for endometrial cancer in women, with values ranging from 43%10 to 60%.14 Yet the use of these very restrictive criteria is bound to cause an ascertainment bias towards multiple case families. The reason for this is that, in populations where sibships are small, families with no case, one case, or two cases, which are the majority among families including mutation carriers, do not meet the criteria for being selected. In addition, families with predominantly ECC cases will not be tested. There is therefore an over-representation of families with multiple CRC cases in the samples, and for this reason the cancer risk in mutation carriers, and in particular CRC risk, is expected to be overestimated in these studies.
The aim of the present paper is to investigate the bias associated with the Amsterdam criteria for both colorectal and extracolonic cancer risks. For this purpose, we simulated samples of at most three generation families, using the French population demographic characteristics and various values of the colorectal cancer and extracolonic cancer risks. We then tested subsets of HNPCC pedigrees meeting the Amsterdam criteria, and compared the cancer risk estimates based on these pedigrees to the actual risks underlying the simulations. Lastly, we discussed the importance of the bias in relation to the magnitude of the actual risks.
MATERIAL AND METHODS
The families which were simulated were at most three generation families, with two ancestors one of whom carried a mutation. Since the probability that a given family fulfils the Amsterdam criteria obviously depends on family size, we chose a simulation process where the family size and structure would be variable, as it is in reality, and we used French demographic data dating back to 1920 to perform the simulations. The ancestors in each family were taken from the generation of people born between 1901 and 1925. All the children of this couple and their grandchildren (third generation) were generated using the following parameters: number of children per woman and per birth cohort; interval between two consecutive births according to the mother's birth cohort; age of the mother at the birth of the first child according to the mother's birth cohort.
The simulation was conducted using the guidelines provided by Pennec15: (1) each mother of the ancestral couple was randomly attributed a date of birth between 1901 and 1925, and (2) each mother in the pedigree was randomly attributed the total number of her children, her age at the birth of the first child, and the age at the following births, if any, according to her birth cohort. The descendants of a man were studied through those of his spouse, and the age difference between the spouses was arbitrarily set at two years.
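The pedigree generation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' program: the fertility tables in `COHORTS` are hypothetical placeholders (the French demographic values are not reproduced here), the husband is assumed to be the older spouse, and births after the year 2000 are dropped.

```python
import random

# HYPOTHETICAL fertility tables keyed by the first year of the mother's birth
# cohort. The paper used French demographic data; these values are placeholders.
COHORTS = {
    1901: {"sizes": [0, 1, 2, 3, 4], "weights": [15, 25, 30, 20, 10],
           "first_birth": (20, 30), "interval": (1, 4)},
    1926: {"sizes": [0, 1, 2, 3, 4], "weights": [10, 20, 35, 25, 10],
           "first_birth": (20, 30), "interval": (1, 4)},
    1951: {"sizes": [0, 1, 2, 3], "weights": [15, 30, 40, 15],
           "first_birth": (22, 32), "interval": (2, 5)},
}
SPOUSE_AGE_GAP = 2        # years, as in the text; husband assumed older (assumption)
LAST_CONTACT_YEAR = 2000  # births after this year are ignored (assumption)


def cohort_params(birth_year):
    """Parameters of the latest cohort starting on or before birth_year."""
    start = max(y for y in COHORTS if y <= birth_year)
    return COHORTS[start]


def birth_years_of_children(mother_birth_year, rng):
    """Draw the number of children, then the year of each birth."""
    p = cohort_params(mother_birth_year)
    n = rng.choices(p["sizes"], weights=p["weights"])[0]
    years = []
    year = mother_birth_year + rng.randint(*p["first_birth"])
    for _ in range(n):
        if year > LAST_CONTACT_YEAR:
            break
        years.append(year)
        year += rng.randint(*p["interval"])
    return years


def simulate_family(rng, max_generation=3):
    """Return a list of person dicts for one family of at most three generations."""
    people = []

    def add(sex, birth_year, generation, father=None, mother=None):
        people.append({"id": len(people), "sex": sex, "birth_year": birth_year,
                       "generation": generation, "father": father, "mother": mother})
        return people[-1]

    def add_children(father, mother):
        generation = mother["generation"] + 1
        for year in birth_years_of_children(mother["birth_year"], rng):
            child = add(rng.choice("MF"), year, generation, father["id"], mother["id"])
            if generation < max_generation:
                # Descendants of a man are generated through his spouse.
                if child["sex"] == "F":
                    spouse = add("M", year - SPOUSE_AGE_GAP, generation)
                    add_children(spouse, child)
                else:
                    spouse = add("F", year + SPOUSE_AGE_GAP, generation)
                    add_children(child, spouse)

    mother = add("F", rng.randint(1901, 1925), 1)
    father = add("M", mother["birth_year"] - SPOUSE_AGE_GAP, 1)
    add_children(father, mother)
    return people
```

Calling `simulate_family(random.Random(0))` returns one family; family size and structure vary from call to call because the number of children and the birth dates are drawn per mother.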
The risks of cancer were assumed to be different for colorectal cancer and extracolonic cancer, and to vary with age and sex. Four age classes were considered (20-39, 40-59, 60-79, and 80-99), and the cancer risk was assumed to be nil before the age of 20. For each sex, and for each age class, a cumulative risk up to the end of the interval was fixed. The incidence rate $\lambda_k$ being assumed to be constant over a given class $k$, the cumulative risk $RC(t)$ at age $t$ belonging to the interval $[t_k, t_{k+1}]$ may be obtained by the following formula:
$$RC(t) = 1 - \left[1 - RC(t_k)\right]\exp\left[-\lambda_k\,(t - t_k)\right]$$
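To make the piecewise-constant incidence model concrete, the sketch below derives the rate of each age class from the cumulative risks fixed at the end of each class, evaluates the cumulative risk at any age, and draws an age at onset. The values in `CUM_RISK` are hypothetical and are not those of table 1. The draw uses inverse-transform sampling, which is assumed here as an equivalent shortcut to attributing disease status year by year.

```python
import math
import random

BOUNDS = [20, 40, 60, 80, 100]        # class limits t_k in years
CUM_RISK = [0.10, 0.45, 0.80, 0.85]   # HYPOTHETICAL cumulative risks at the end of each class


def hazards(bounds, cum_risk):
    """Constant incidence rate per class implied by the end-of-class cumulative risks."""
    rates, prev = [], 0.0
    for k, r in enumerate(cum_risk):
        width = bounds[k + 1] - bounds[k]
        rates.append(-math.log((1.0 - r) / (1.0 - prev)) / width)
        prev = r
    return rates


def cumulative_risk(t, bounds, rates):
    """Risk is nil before bounds[0], then accrues at a constant rate within each class."""
    exposure = 0.0
    for k, lam in enumerate(rates):
        lo, hi = bounds[k], bounds[k + 1]
        if t <= lo:
            break
        exposure += lam * (min(t, hi) - lo)
    return 1.0 - math.exp(-exposure)


def sample_onset_age(bounds, rates, rng):
    """Inverse-transform draw of age at onset; None if no onset before bounds[-1]."""
    target = -math.log(1.0 - rng.random())  # cumulative hazard at which onset occurs
    for k, lam in enumerate(rates):
        lo, hi = bounds[k], bounds[k + 1]
        if lam > 0 and target <= lam * (hi - lo):
            return lo + target / lam
        target -= lam * (hi - lo)
    return None
```

By construction, `cumulative_risk` evaluated at the end of each class returns the corresponding entry of `CUM_RISK`, which gives a quick check that the rates were derived correctly.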
Two sets of risks defined as “low” and “high” risks were considered and are given in table 1. The high risks are those given in previous studies10–14 and the low risks were arbitrarily chosen at values approximately 30% less than the high ones. For ECC, women were attributed a higher risk than men since endometrial and ovarian cancer are part of the spectrum of tumours.
In each pedigree, the genotypes were simulated according to Mendel's laws for subjects whose parents were in the pedigree, and according to the relative frequency of genotypes in the population for spouses. The frequency of the mutation in the population was arbitrarily fixed at 0.001 and we assumed the absence of de novo mutations. We considered the most informative situation in which all the genotypes of the subjects in the pedigree would be known, and the phenotypes were simulated using the values of risks given in table 1. For each person, at each age greater than 20 years and until age at last contact, the disease status was randomly attributed for CRC as well as ECC. In order to avoid the difficult problem of independence of cancers in case of multiple tumours, the first occurrence of cancer was the only one to be considered. Thus, any person affected at a given age with a type of cancer was censored at this age for other events. For the sake of simplicity, we did not simulate death and the age at last contact of all subjects was the age they would have reached in the year 2000.
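The genotype and phenotype rules of this paragraph can be sketched as below. The function names are hypothetical. Two assumptions are made that the text does not state: spouses are drawn under Hardy-Weinberg proportions, and carriers are treated as heterozygous (homozygotes being negligible at an allele frequency of 0.001).

```python
import random

MUTATION_FREQ = 0.001  # allele frequency used in the text


def founder_is_carrier(rng, q=MUTATION_FREQ):
    """Carrier probability for a spouse under Hardy-Weinberg proportions (assumption)."""
    return rng.random() < 1.0 - (1.0 - q) ** 2


def child_is_carrier(father_carrier, mother_carrier, rng):
    """Each carrier parent, assumed heterozygous, transmits the mutation with probability 1/2."""
    from_father = father_carrier and rng.random() < 0.5
    from_mother = mother_carrier and rng.random() < 0.5
    return from_father or from_mother


def first_event(crc_age, ecc_age, last_contact_age):
    """Keep only the first cancer; the person is censored at that age for the other type.

    Returns (age, kind) with kind in {"CRC", "ECC"}, or (last_contact_age, None)
    if neither cancer occurred by the age at last contact.
    """
    onsets = [(age, kind) for age, kind in ((crc_age, "CRC"), (ecc_age, "ECC"))
              if age is not None and age <= last_contact_age]
    if not onsets:
        return last_contact_age, None
    return min(onsets)
```

For example, a person with simulated onsets of CRC at 45 and ECC at 60 contributes a CRC event at 45 and, for the ECC analysis, an observation censored at 45.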
We also considered that families with many affected subjects have a greater likelihood of being ascertained than families with smaller numbers. If π is the probability that a family fulfilling the Amsterdam criteria is ascertained, this probability was set to 1 in a first step, and then we considered the two situations where π would be equal respectively to N/6 and N/15, N being the number of CRC cases in the pedigree.
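The ascertainment step can be written in a few lines. This is a sketch only: the function names are hypothetical, and capping π at 1 when N exceeds the denominator is an assumption, since the text does not say how such families were handled.

```python
import random


def ascertainment_probability(n_crc_cases, denominator=None):
    """pi = 1 for complete ascertainment, otherwise N / denominator (6 or 15 in the text)."""
    if denominator is None:
        return 1.0
    return min(1.0, n_crc_cases / denominator)  # cap at 1 is an assumption


def is_ascertained(n_crc_cases, denominator, rng):
    """Random draw deciding whether a family meeting the Amsterdam criteria enters the sample."""
    return rng.random() < ascertainment_probability(n_crc_cases, denominator)
```

With three CRC cases, a family meeting the criteria would thus be retained with probability 0.5 under π = N/6 and 0.2 under π = N/15.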
For each model of risks, 40 replicates of 1000 pedigrees were simulated and the simulation procedure was validated by checking that the risk estimates based on the whole sample were the same as the theoretical risks underlying the simulation.
The risks were estimated using the Kaplan-Meier estimator, as used by all authors.10–14 The bias was evaluated through the average difference between the estimated risk and the actual risk, relative to the actual risk:
$$B = \frac{1}{n}\sum_{i=1}^{n}\frac{R_{e,i} - R_0}{R_0}$$
where $R_{e,i}$ is the risk estimated in replicate $i$, $n$ is the number of replicates, and $R_0$ is the actual risk used in the simulation.
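The estimation and bias computation can be illustrated with a plain Kaplan-Meier product-limit estimator. This is a minimal sketch rather than the code used in the study; the function names are hypothetical. The bias is assumed to be expressed relative to the actual risk, which is consistent with the figures reported in the text (an estimate of 87% for an actual risk of 60% is reported as a bias of 45%).

```python
def kaplan_meier_risk(records, horizon):
    """Cumulative risk 1 - S(horizon) from (age, event) pairs.

    event is True for a cancer onset at that age, False for censoring at that age.
    """
    survival = 1.0
    event_ages = sorted({age for age, event in records if event and age <= horizon})
    for t in event_ages:
        at_risk = sum(1 for age, _ in records if age >= t)
        events = sum(1 for age, event in records if event and age == t)
        survival *= 1.0 - events / at_risk
    return 1.0 - survival


def relative_bias(estimates, actual):
    """Average of (estimated - actual) / actual over the replicates."""
    return sum((r - actual) / actual for r in estimates) / len(estimates)
```

As a usage example, `relative_bias([0.87], 0.60)` evaluates to approximately 0.45. Because the estimator only counts events among subjects at risk, it carries no correction for the way the families entered the sample.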
RESULTS
Whatever the risks, the number of subjects aged more than 80 was too small to permit estimation of cancer risk in this age class and therefore the results are given for the first three classes only.
Applying a selection according to the Amsterdam criteria (with complete ascertainment of families fulfilling these criteria) dramatically reduced the sample size, as expected. Indeed, only about 10% of the families in the samples fulfilled these criteria, when the actual CRC risks were high, as defined in table 1, and only 4% when these risks were low. In our analysis, the estimates of CRC and ECC risks are independent given that we studied only the first occurrence of cancer, and the biases associated with each of these risks are shown in tables 2 and 3, respectively. For the CRC risk estimates, there is a very large overestimation of risks from 17% to 130% depending on the actual risks. Compared to the CRC risks, the ECC estimates are substantially underestimated (about 30% lower than the actual values), with a small variation according to the actual risks. Because of random fluctuations, the range of estimates is generally large, for both cancer risks, as shown in tables 2 and 3.
On the one hand, selecting families on Amsterdam criteria and not taking this selection into account in the estimation method leads to a large overestimation of the CRC risks. On the other hand, there is also an underestimation of the ECC risks, which was quite unexpected; indeed, since families are selected through CRC cases only, the proportion of other cancers in families should remain unchanged. However, since subjects affected with CRC are censored at the age of onset of this cancer for other events, selecting CRC cases as a first occurrence results in selection of cases not affected with ECC. To get an idea of what the actual CRC and ECC risks could be, we simulated samples using various values of risks, until we reached the CRC and ECC risk estimates closest to the average estimates found in published studies (table 4). We found that the actual CRC lifetime risks could be as low as 40%, instead of 80% in men and 70% in women, and that the ECC risks could be as high as 60% in men, instead of 35%, and 65% in women, instead of 50%.
Allowing for preferential ascertainment of families with multiple cases of CRC did not modify the estimates of risks (results not shown), whatever the value of π specified in the simulation. However, since it drastically reduced the sample size, the random fluctuations were substantially greater.
DISCUSSION
Our study clearly showed that colorectal cancer risks are largely overestimated in HNPCC, at about double the actual levels. It also showed a non-negligible underestimation of the risk of extracolonic cancer, which would be still higher than CRC risks. In summary, the lifetime risk of colorectal cancer for mutation carriers would be about 40% in both sexes, and the lifetime risk of extracolonic cancer would be about 60% in men and 65% in women.
In most simulation studies dealing with ascertainment biases, families have a fixed size, usually relatively small (about three sibs per sibship). The problem with this constraint is that families meeting the Amsterdam criteria would have very few unaffected subjects, and therefore also very few unaffected carriers, which would artificially inflate the bias on the risk estimates. To avoid this problem, we have developed a simulation method using real demographic data. In our simulated samples, the families meeting the Amsterdam criteria are larger than the others, allowing for a proper representation of unaffected carriers. However, since we used French demographic parameters to perform the simulations, one could argue that these parameters might not be appropriate to the countries (The Netherlands, Finland, USA) in which the risks have been estimated. To evaluate the sensitivity of our results to this parameter specification, we performed the simulations and evaluated the biases in very extreme, and unrealistic, situations where the fertility rate of women would be twice or half the French one. As expected, the bias was higher for a low fertility rate and lower with a high one, but the impact on the risk estimates was quite small. For instance, for an actual cumulative risk of 60% of developing colorectal cancer at the age of 79, which would lead to an estimation of 87% (bias 45%) using French fertility rates, the estimated risk would be 79% (bias 32%) if the fertility rate was twice, and 94% (bias 57%) if this rate was half the French one. Given that the countries where the studies were carried out have demographic parameters which are fairly comparable to the French one, we can conclude that the biases found in this study are quite robust to demographic parameter specification.
Would these results be different in a more realistic situation where family members would not all be tested, some of them could be deceased, de novo mutations could occur, etc? Including mortality rates in the simulation process would decrease the overall information provided by a pedigree. Indeed, some subjects would be studied at various ages and a smaller number of subjects would be available for risk estimation, particularly in the older age groups. However, this is not likely to affect the risk estimates, given that mortality from other causes would be independent of the condition under study. The absence of mortality may have another consequence since people dying from cancer at a young age would not have the opportunity to have descendants, which would reduce the number of carriers in the following generation. Since the average age at diagnosis is about 45 years,2 most people would have already had their children before the occurrence of their disease, according to French demographic data. Therefore, an allowance for mortality would not have modified the cancer risk estimates, but rather would have reduced the precision of those estimates.
Unavailability of some family members is also likely to lessen the information provided by the pedigrees. In “real life”, a systematic bias could be introduced if genetic testing was not independent of phenotype. For instance, affected subjects may be dead and unaffected subjects may be more or less willing to undergo genetic testing. An additional overestimation of risk might be expected if affected relatives were systematically considered as carriers (since some of them could be sporadic cases) and/or if a non-negligible proportion of unaffected relatives were missing, which is probably the case in most of the studies to which we referred.
De novo mutations were not considered in our study. Such cases are unlikely to be ascertained using Amsterdam criteria since their family history is expected to be negative. Even when subjects are ascertained through non-familial criteria (as further discussed), patients' parents are seldom available and de novo mutations are very difficult to prove directly. The argument for the existence of such mutations is usually indirect, through the study of haplotype sharing among subjects carrying the same germline mutation.16 This phenomenon is not expected to have any impact on our conclusions.
Among the hypotheses made in the present study, the most questionable is the assumption of genetic homogeneity, that is, of equal risk associated with all mutations. Previous studies showed that the risks associated with hMLH1 and hMSH2 mutations appeared quite similar11 or only slightly different.13 However, we cannot exclude that some specific mutations could be associated with much higher risks than others. The high aggregation of colorectal cancer cases in Finnish families is striking,17 and it is possible that mutation 1 of hMLH1, a predominant mutation owing to a strong founder effect in Finland, is associated with particularly high colorectal cancer risks. If heterogeneity existed, we would expect that the estimation bias would be smaller for these families who would be over-represented in a sample selected on Amsterdam criteria. Conversely, mutations with lower penetrance would be under-selected and the ascertainment bias would be maximum in that group. The data published up to now do not support such a difference. For instance, the relative frequency of mutation 1 in Finland is not much greater (57% versus 50%) when families were selected on Amsterdam criteria18 than when they were selected on the presence of MSI in tumours.19 Such heterogeneity could, however, exist but would require a large body of data and rigorous methods of analysis to be demonstrated.
We would like to emphasise that the overestimation of risks shown in this study is a totally different issue from the discrepancies in penetrance estimates found in other family syndromes, depending on the population in which the mutation carriers have been ascertained. Such discrepancies have been shown, for instance, in breast-ovarian cancer family syndrome resulting from mutations of the BRCA1 or BRCA2 genes. The estimation from multiple case families, although using a method which perfectly corrected for ascertainment bias,25 provided higher risk estimates for BRCA1 mutation carriers than studies of subjects selected independently of family history in a population with a particularly high frequency of a specific BRCA1 mutation.26 Such a discrepancy could be explained by the fact that the cancer risk conferred by mutated BRCA1 is modified by other factors, either genetic or acquired, that themselves run in families.27
The question now is how unbiased estimates of cancer risks could be obtained. The risks that we found by our approximate method give an order of magnitude. Dunlop et al20 proposed selecting subjects on age at diagnosis of the index case (at or below 35 years) and the presence of MSI in the patient's tumour, that is, independently of family history. Excluding the index case from the analysis and using a maximum likelihood method, they obtained significantly higher risk estimates for males than females for colorectal cancer (74% versus 30%), and a risk of uterine cancer of 42% by the age of 70 years. As noted by Watson and Lynch,4 the estimates from this study were considerably lower than the estimates from other studies. However, since the frequency of colorectal cancer cases occurring so early in life is low, there were only six families fulfilling the inclusion criteria in the study by Dunlop et al20 and the estimates are subject to considerable sampling errors. Another possibility would be to conduct the analysis of families selected on the existence of MSI in patients' tumours.9,19,21 One should keep in mind, however, that the risk estimates could still be biased because at least one person, the index case, would be affected by colorectal cancer.
The lifetime risk of colorectal cancer among mutation carriers in HNPCC families is considered to be very high (70-90% in published reports).
This risk is likely to be overestimated as it is based on families selected according to the very stringent Amsterdam criteria and not corrected for ascertainment bias.
Using a simulation study, we showed that the bias was such that the estimated risk could be double the actual risk.
More generally, we would like to emphasise that, whatever the way of detecting families with mutation carriers, estimating disease risks from these data requires an appropriate method adapted to this particular selection. Apart from the study of Dunlop et al,20 authors all used the Kaplan-Meier method, which is a mere counting method, and is thus totally inappropriate for the estimation of risks from families selected through the Amsterdam criteria. Using this method with carriers in families selected on the enlarged criteria6,7 would still provide biased estimates, although to a lesser extent, of the risks. Totally unbiased estimates could be obtained by using a maximum likelihood method conditional on the mode of selection of families, as does the ARCAD method for cancer risk estimation in carriers of p53 mutations.22,23 This method corrects for selection through an affected child and the restriction of genetic testing to families with at least one first or second degree relative with early onset cancer. Such a method could be adapted to other types of selection, such as the new criteria for HNPCC,6,7 provided that the method of estimation takes into account the particular selection of families. If this is the case, it would allow an estimation of the risks without any bias and with good precision, which is still lacking at present.
This work was supported in part by the French “Ligue Nationale Contre le Cancer”.