Detecting low penetrance genes in cancer: the way ahead
- aSection of Cancer Genetics, Institute of Cancer Research, Cotswold Road, Sutton, Surrey SM2 5NG, UK, bMolecular and Population Genetics Laboratory, Imperial Cancer Research Fund, 44 Lincoln's Inn Fields, London WC2A 3PX, UK
- Dr Houlston, or Dr Tomlinson,
The search for the genes responsible for many complex genetic diseases is well under way and has already been successful in some cases. The study of cancer as a complex genetic disease has lagged behind other conditions, largely because of particular problems that are associated with malignant disease. Cancer also, however, presents specific opportunities for gene identification, which are not found in many other diseases. While the methods of genetic mapping and gene cloning used for other complex diseases will be applied to cancer, these must almost certainly be complemented by other methods, such as the study of somatic mutations, cancer associated phenotypes, and modifier genes for Mendelian cancers. Here, we review the strategies available for identifying cancer predisposition genes of low and moderate penetrance.
Large scale genetic studies which aim to identify the moderate and low penetrance loci involved in many genetic but non-Mendelian diseases now appear to be almost commonplace. Notable successes in this field include the recognition of a relationship between the ApoE genotype and Alzheimer's disease risk1 and, in the field of infectious diseases, the demonstration that variation in TNF and HLA is associated with substantially different risks of TB and malaria.2-4 This is not to say that all these studies are near completion and in some cases, such as multiple sclerosis, localisation of predisposition genes (let alone gene identification) is proving very difficult.5 6 While it is easy to underestimate the difficulties inherent in studying diseases like diabetes, asthma, and rheumatoid arthritis, and indeed to underestimate the remaining problems, it is nevertheless surprising that the study of cancer has lagged behind that of other complex genetic diseases. Almost all cancer susceptibility alleles identified so far are rare and highly penetrant (for example, APC, BRCA1, BRCA2, MSH2, MLH1, PTEN, CDKN2A).7 They may cause a substantial proportion of cancers at young ages, but they are unlikely to be responsible for a high proportion of all cancers, leaving a considerable potential contribution from less penetrant genes. It is possible to gain an insight into the potential impact of such genes on cancer incidence given that the relative risk of disease in first degree relatives is only of the order of 1.5-2.5 over all ages for most common cancers.7 Under a dominant low penetrance model, the ratio of cancer risk in susceptibles to that in the general population cannot exceed about 5 if 50% of all cancers occur in susceptibles, in which case about 10% of the population must be at increased risk. A dominant gene carried by 2% of the population will cause a risk 11 times that in the general population and will cause 22% of all cancers.
Within this range, such genes will rarely produce striking multiple case families, except possibly for breast cancer, where an 11-fold increase in risk would correspond to a penetrance of 43% by the age of 70. In the case of colon cancer, for example, where the cumulative risk in the general population is only 1.2% by the age of 70, an 11-fold increase in risk in susceptibles would correspond to a penetrance of only 13%.
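The arithmetic behind these figures can be checked with a minimal sketch. It assumes, as the text does, that the risk ratio is expressed relative to the general population, so that the share of all cancers arising in susceptibles is simply the carrier frequency multiplied by that ratio. The function names are illustrative only.

```python
def fraction_of_cases_in_carriers(carrier_frequency, risk_ratio):
    """Share of all cancers occurring in carriers.

    risk_ratio is the carriers' risk divided by the general population risk
    (assumption: the same convention as the figures quoted in the text).
    """
    return carrier_frequency * risk_ratio


def penetrance(population_cumulative_risk, risk_ratio):
    """Cumulative risk in carriers, given the population cumulative risk."""
    return population_cumulative_risk * risk_ratio


# 2% of the population carrying an allele with an 11-fold risk
share_rare = fraction_of_cases_in_carriers(0.02, 11)
# 10% of the population at 5-fold risk
share_common = fraction_of_cases_in_carriers(0.10, 5)
# colon cancer: 1.2% cumulative population risk by the age of 70
colon_penetrance = penetrance(0.012, 11)

print(f"{share_rare:.2f} {share_common:.2f} {colon_penetrance:.3f}")
```

The three values correspond to the 22% of all cancers, the 50% of all cancers, and the penetrance of about 13% for colon cancer quoted above.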
Why is cancer different from other common diseases?
Cancer has very few strictly unique features compared with other complex genetic diseases, but it has a combination of features that make it particularly problematic to study. First, very few cancers with non-Mendelian inheritance have large sib relative risks (unless stratified by age). Second, common cancers (such as those of the colon, breast, or bronchus) are usually late onset and the parents of patients (even those who present “early”) are often dead. Third, many cancers are fatal, making samples difficult to obtain retrospectively and even contemporaneously. Fourth, not only does cancer require particular combinations of genes and environment (like other diseases), but it also has a truly random component in that several somatic mutations must occur for a carcinoma or sarcoma to develop. Fifth, the challenge of discovering genes mutated somatically in cancer, but with no germline effects, has been a worthy distraction for the geneticist from the task of studying cancer as a complex genetic disease.
Nevertheless, despite the above problems and relatively low overall relative risks, the prospects for identifying low penetrance genes for cancer are far from bleak. The wide age range of presentation of most common cancers means that selection for early age of onset may be a more powerful way of enriching for disease with a genetic component in cancer than in many other diseases and for reducing environmental influences. In addition, the fact that some cancers progress in a stepwise fashion means that the more common benign precursor lesions can be studied instead of or in addition to the cancer itself. Moreover, as will be shown below, complementary strategies can be used to identify low penetrance cancer genes.
Types of low penetrance cancer predisposition genes
All Mendelian cancer predisposition genes appear to act in a cell autonomous fashion. Indeed, such a mechanism is more or less implicit in the action of tumour suppressor genes such as APC, RB1, and TP53. It is probable that non-Mendelian cancer predisposition loci will include many genes with cell autonomous effects, but will also include some genes with “global” effects (for example, carcinogen metabolism polymorphisms, behavioural differences reflected in different diets or tendencies to smoke, and anti-tumour immune response) and other genes with effects on the local tumour environment (for example, influencing stromal-epithelial interactions or production of paracrine hormones). Some potential low or moderate penetrance cancer genes have been characterised, but the only successful mechanism for identifying these genes has been the analysis of candidate loci. The most common experimental design has been the case-control study, comparing allele frequencies in cancer patients with those in healthy controls. A number of putative low penetrance genes have been described, conferring susceptibility to cancer through a variety of mechanisms. Table 1 provides a summary of loci reported to act as low penetrance genes, most on the basis of more than one study. Of those which have known or probable modes of action, some, such as those involved in the metabolism of carcinogens, probably increase cancer risk by raising the “global” mutation rate (although the overall effect may appear to be site specific) while others such as the I1307K variant of APC may increase the mutation rate in a cell autonomous fashion.17
Recently, it has been proposed that a significant part of the non-Mendelian contribution to cancer might be derived from missense variants (or a restricted set of protein truncating variants) at classical tumour suppressor loci. This suggestion was prompted by the discovery of the I1307K variant at the APC locus, which confers about a two-fold increased risk of colorectal cancer in the Ashkenazi population.17-19 It has also been proposed that the E1317Q APC variant is associated with colorectal adenomas and hence with colon cancer.18 Individually, each of these types of variant would be at “sub-polymorphic” levels, contributing little to cancer risk and difficult to detect by usual gene mapping methods. Together, however, several rare variants at any locus such as APC could contribute significantly to cancer risk, although there is as yet little evidence to support this theory.
Linkage versus association analysis to detect low penetrance genes
The detection of high penetrance genes is generally through linkage studies using multiple case families. Occasional high penetrance cancer genes may remain to be identified. Moderate penetrance genes for cancer, if such genes exist for any particular tumour type, can also be mapped by linkage studies, although non-parametric methods are most likely to be of use given uncertainties regarding genetic heterogeneity and the certain occurrence of sporadic cases within families with a genetic predisposition. The optimal linkage design for detection of a common low penetrance gene is not as simple as for a rare high penetrance gene. Families comprising three or more cases are likely to be more powerful for linkage than affected sib pairs or other two case families. For example, assuming a dominant model and gene frequency of 0.1 and risk ratio of 6, 473 affected sib pairs, 205 three affected sibs, 145 four affected sibs, 545 cousin pairs, or 550 avuncular pairs would be required to give a lod of 3.3. Increasing the number of affected subjects in a family does not necessarily represent an efficient strategy, since a large proportion of these large multiple case families are likely to be caused by highly penetrant genes or by chance. Furthermore, if the gene is common there is the possibility that a large family will be segregating two copies of the mutation. This leads to parents being homozygous at the disease locus with the consequence of the family being uninformative for linkage. While the composition of families used for linkage analysis will always depend on what can practicably be collected, affected sib trios probably provide the most efficient strategy for detecting low penetrance genes over a wide range of models.24
A concern, given the low relative risks associated with most common cancers, is that few moderate penetrance cancer predisposition genes exist. Low penetrance genes, characterised by small genotypic risks (that is, <4), will only confer a sib relative risk of 1.7 or less and will therefore rarely give rise to multiple case families. It is difficult or impossible to identify such genes by linkage analysis, because the number of affected relative pairs required will be prohibitively large.25 In contrast, the required sample size for a test based on allelic association can be vastly smaller, even allowing for multiple comparisons.25 26 For example, to detect by linkage a dominantly acting gene conferring a four-fold increase in risk with a frequency of 0.1 would require 1055 affected sib pairs. However, with certain caveats discussed below, only 200 cases and 200 controls (or 200 parent offspring trios using the transmission disequilibrium test) would be required if an association strategy were adopted, even if a high degree of significance was stipulated to allow for 100 000 comparisons (that is, 5 × 10⁻⁸).
At present, association studies are limited to the evaluation of candidate gene loci and there is every expectation that such studies will continue and become more common. However, there is increasing interest in the possibility of systematic genome wide association studies. These are becoming increasingly feasible with the advent of high density marker maps of single nucleotide polymorphisms (SNPs) coupled with new methods of genotyping.27 Furthermore, the availability of the complete human genome sequence opens up the possibility of conducting genome wide allelic association studies based on analysis of intragenic polymorphisms.
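To see why association needs so few subjects in this example, the following rough sketch may help. It is not the calculation used in the cited work: it assumes a standard normal approximation for comparing two allele frequencies, a two-sided test, 80% power, and controls whose allele frequency equals that of the general population. All function names are illustrative.

```python
from statistics import NormalDist


def case_allele_frequency(p, gamma):
    """Risk allele frequency among cases for a dominant allele.

    p is the population allele frequency and gamma the genotypic relative
    risk of carriers compared with non-carriers.
    """
    q = 1.0 - p
    mean_relative_risk = gamma * (1.0 - q * q) + q * q
    return gamma * p / mean_relative_risk


def alleles_per_group(p_cases, p_controls, alpha, power):
    """Alleles needed per group for a two-proportion z test (approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_beta = z.inv_cdf(power)
    p_bar = (p_cases + p_controls) / 2.0
    null_sd = (2.0 * p_bar * (1.0 - p_bar)) ** 0.5
    alt_sd = (p_cases * (1.0 - p_cases) + p_controls * (1.0 - p_controls)) ** 0.5
    return ((z_alpha * null_sd + z_beta * alt_sd) / (p_cases - p_controls)) ** 2


# Allele frequency 0.1, four-fold risk, significance level 5e-8 (from the text);
# the 80% power is an assumption made for this sketch.
p_cases = case_allele_frequency(0.1, 4.0)
subjects_per_group = alleles_per_group(p_cases, 0.1, 5e-8, 0.80) / 2.0
print(round(p_cases, 3), round(subjects_per_group))
```

Under these assumptions the risk allele frequency rises from 0.1 in controls to about 0.25 in cases, and the required sample is a few hundred subjects per group, the same order of magnitude as the figure quoted above.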
An essential issue in a genome wide association study is marker density since the power of any association test will decline rapidly as linkage disequilibrium diminishes, or if the marker and predisposition alleles differ in frequency.28 A recent analysis of this problem based on simulations suggested that a useful level of linkage disequilibrium is unlikely to extend beyond an average distance of roughly 3 kb in the general population.29 This would mean that around half a million SNPs would be required for whole genome studies. Furthermore, the extent of linkage disequilibrium is similar in isolated populations unless the founding bottleneck is very narrow or the frequency of the variant is low (<5%).29 The outlook for genome wide association studies from these findings appears grim. It is, however, not clear that the underlying premises on which these simulations were carried out truly reflect the real world and the outlook is likely not to be so negative. As pointed out by Risch and Merikangas,30 the expectation with respect to linkage disequilibrium across the genome is unknown and studies of the ApoE gene and late onset Alzheimer's disease, and the insulin VNTR region and diabetes, show that significant linkage disequilibrium exists well outside these regions.
The issue of false positive results is of great concern in association studies. Their frequency can be reduced by imposing low p values (for example, 10⁻⁶ to 10⁻⁸) or by replicating findings in independent samples, or ideally by both. The simplest and most efficient form of association study is the case-control approach based on a comparison of allele frequencies in unrelated cases and controls. A major problem inherent in this approach, however, is that spurious associations can arise from population stratification. Hidden stratification can be tested for. Provided that cases and controls are well matched, differences in the frequency of genotypes will only be seen at predisposition loci. Given that spurious associations result in a departure of genotypes from Hardy-Weinberg equilibrium, stratification can be detected by typing a series of unlinked markers chosen from a panel known to exhibit differences in allele frequency between populations.31 The appropriate Bonferroni correction would, of course, be required to assess the statistical significance of any putative association.
One method of circumventing the problem of occult population stratification is to use family based controls. The most common approach is the transmission disequilibrium test (TDT), which assesses the evidence for preferential transmission of one allele over the other from heterozygous parents.32 An attractive feature of the TDT is that it is a test of linkage and not merely of linkage disequilibrium, since only linkage disequilibrium can distort the distribution of marker genotypes among parents of affecteds. A problem, however, is that most cancers develop in later life and it may rarely be possible to determine parental genotypes directly. To obviate the requirement for parental genotypes, allied, although less powerful, statistics based on the use of sib genotypes have been devised.33-35
Once the human genome project has led to the identification of polymorphisms for all genes, and reliable, high density oligonucleotide arrays (or alternative methods) for detecting these polymorphisms allow genome wide searches to be conducted rapidly, the detection of low penetrance susceptibility genes will be restricted only by the availability of large cohorts of well characterised patients and healthy controls. For a direct association study based on 100 000 SNPs, a sample of 1000 or more cases and controls would be sufficient to detect loci conferring fairly small genotypic risks (∼1.01-3.00) under multiplicative and additive models. This number of cases would be similarly sufficient under a common recessive model provided that p (predisposition allele frequency) >0.5 and under a common dominant model provided p<0.5.26
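The marker count quoted above can be reproduced with a back of the envelope sketch. Two assumptions are made here that are not stated in the text: a genome length of roughly 3 × 10⁹ bp, and a marker that is informative for the stated distance on both sides, so that markers can be spaced at twice that distance.

```python
def markers_needed(genome_length_bp, ld_extent_bp):
    """Evenly spaced markers needed to cover a genome.

    Assumes each marker covers ld_extent_bp on either side of itself.
    """
    spacing_bp = 2 * ld_extent_bp
    return genome_length_bp // spacing_bp


# Assumed genome length of 3e9 bp; 3 kb of useful linkage disequilibrium
snps_required = markers_needed(3_000_000_000, 3_000)
print(snps_required)  # 500000
```

If linkage disequilibrium in fact extends further, as the ApoE and insulin VNTR examples suggest, the required number of markers falls in proportion.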
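The link between stratification and departure from Hardy-Weinberg equilibrium can be illustrated with a minimal sketch of the usual chi-square goodness of fit test on genotype counts at a biallelic marker. The counts used are invented purely for illustration.

```python
def hardy_weinberg_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 degree of freedom) for departure from
    Hardy-Weinberg proportions, from counts of the three genotypes."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# A single randomly mating population: genotypes in exact proportions
print(hardy_weinberg_chi_square(25, 50, 25))  # 0.0

# A hidden mixture of two subpopulations fixed for different alleles:
# no heterozygotes are seen and the statistic is very large
print(hardy_weinberg_chi_square(50, 0, 50))  # 100.0
```

The second case is an extreme version of the heterozygote deficit produced when two populations with different allele frequencies are pooled.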
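In its simplest form, for a biallelic marker, the TDT reduces to a McNemar type statistic on the transmissions from heterozygous parents. A minimal sketch follows; the counts are hypothetical and the function names are illustrative.

```python
import math


def tdt_chi_square(transmitted, untransmitted):
    """TDT statistic for one allele of a biallelic marker.

    transmitted and untransmitted count how often heterozygous parents
    did and did not pass that allele to an affected child.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    return (transmitted - untransmitted) ** 2 / total


def chi_square_1df_p_value(statistic):
    """Upper tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical data: 100 heterozygous parents, allele transmitted 60 times
statistic = tdt_chi_square(60, 40)
print(statistic, round(chi_square_1df_p_value(statistic), 4))
```

Only heterozygous parents contribute, which is why the test is robust to population stratification but loses information when parents are unavailable or homozygous.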
Thus there is, in summary, scope for identifying low or moderate penetrance cancer predisposition genes using association studies in particular, possibly selected for early onset or disease of specific histological type. Both of these approaches of subclassifying cancers are likely to prove useful in order to enrich for genetic homogeneity. It is, however, clear that the chances of successfully identifying new cancer genes with incomplete penetrance are uncertain, whether using association or linkage analysis. Cancer provides novel opportunities for gene identification using indirect methods and we believe that these should be exploited wherever possible.
Indirect approaches for identifying low penetrance cancer predisposition genes
As for any other genetic disease, cancer is amenable to specific techniques for enhancing the power of studies to detect genetic effects, for example, by studying genetically young or isolated populations, and selecting early onset or severe cases. There are, however, several techniques for detecting low penetrance predisposition genes, which have special application to cancer.
Although the proportion of diseases which results from somatic mutation may be under-recognised, cancer provides a near unique opportunity for identifying low penetrance predisposing genes by studying somatic mutations. There are precedents for carrying out such studies, which derive from Mendelian cancer syndromes. The PTEN gene causing Cowden's syndrome and the DPC4/SMAD4 gene causing juvenile polyposis were both originally identified by mapping somatic homozygous deletions in cancers.36 37 In Mendelian cancer syndromes, the justification for extending results from somatic genetics to germline predispositions is that germline and somatic mutations are, in theory, functionally equivalent in a cell; one patient may inherit a variant and another may acquire it by somatic mutation, just as Knudson38 predicted. For the identification of low or moderate penetrance genes, the use of somatic genetics is suitable for genes with cell autonomous action. An example comes from the study of Paget's disease, in which linkage analysis and the study of somatic mutations in osteosarcoma have independently identified a disease locus on chromosome 18q.39 40 Somatic and germline studies will be complementary; fine mapping information, for example, might come from allelic association in the germline or from minimal regions of deletion in the soma.
Identifying somatic mutations is not, of course, a straightforward task, although samples of most tumours are relatively easy to collect. There are also some doubts that the sort of variation present and selected in the soma (generally loss of tumour suppressor genes and oncogene amplification) would have the relatively subtle effects typical of a low penetrance gene; high penetrance, or cell lethality, or simply lack of suitable germline variation are arguably more likely. Nevertheless, mechanisms such as somatic hypermutability (as proposed for I1307K-APC) provide sufficient justification for identifying predisposition genes by studying somatic mutations; the existence of alternative and complex genetic pathways may also mean that a tumour following one pathway may still gain an advantage from a variant in another pathway even if that variant does not have such a profound effect as the somatic mutations which usually occur at that locus.
Many and varied methods of somatic mutation mapping and identification exist. One of the most interesting is allele specific loss of heterozygosity (LOH), which is suited to studying patients with multiple tumours, but without Mendelian family histories. This is an extension of the principle of differential LOH patterns in multiple tumours first advanced in studies of MEN1.41 The approach stems from the concept of using loss of heterozygosity information in tumours to increase the power of linkage since loss of the wild type allele occurs in cases of tumours caused by mutations in tumour suppressor recessive oncogenes.42 To date there have been few examples of the utility of this approach, although linkage of isolated hyperparathyroidism to MEN1 has been reported.43 The concept can clearly be extended to mapping a novel gene. Examples of multiple tumours, which might be amenable to analysis using this strategy, include colorectal adenomas, naevi, and solar keratoses. If the tumours have a genetic cause owing to a mutation in a tumour suppressor gene, every lesion from the same patient should show loss of the “wild type” allele. In this way, double somatic events can be distinguished from single events associated with the site of a predisposition allele. Using this strategy it is theoretically possible to identify a novel adenoma predisposition gene by the analysis of five adenomas from 200 subjects, given a rate of allelic loss in excess of 50%. This would apply even if the frequency of the deleterious gene and its associated risk were to vary widely (for example, 0.001-0.1 and 5-20 respectively). The utility of this approach is, however, dependent upon a low background rate of somatic events, a condition most likely to be fulfilled in early tumours.
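The logic of allele specific LOH can be made concrete with a small sketch. It assumes that, in the absence of a predisposition allele, each somatic loss removes either parental allele with equal probability and independently of the other tumours; this null model is an assumption made for illustration, not a description of the published power calculation.

```python
def chance_all_lose_same_allele(n_tumours_with_loss):
    """Probability that every tumour showing LOH at a locus has lost the
    same parental allele purely by chance (null model: each loss removes
    either allele with probability 1/2, independently)."""
    if n_tumours_with_loss < 1:
        raise ValueError("at least one tumour with allelic loss is needed")
    return 0.5 ** (n_tumours_with_loss - 1)


# Five adenomas from one patient, all showing loss at the marker
print(chance_all_lose_same_allele(5))  # 0.0625
```

Concordant loss in a single patient is therefore only weak evidence, but the chance probabilities multiply across patients, which is why a series of subjects with several lesions each can localise a predisposition allele.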
ASSOCIATED NON-TUMOUR PHENOTYPES
Some inherited disease phenotypes (or, sometimes, normal variants) are associated with an increased risk of tumours, without tumours being an integral part of the disease process (table 2). These non-tumour phenotypes are classically non-neoplastic, although there is some evidence that this classification is incorrect in some cases. Many of the associated phenotypes have high sib relative risks and may be relatively genetically homogeneous, so that these diseases are much more amenable to genetic study than the cancer itself. The genes involved in predisposition to the disease phenotype may either contribute directly to inherited cancer risk, or may harbour a set of alleles which increases cancer risk without leading to the associated disease phenotype. The usual methods of studying complex genetic diseases can be used to analyse families with these tumour associated phenotypes. In almost all cases, the phenotype is relatively common, usually more so than the associated cancer. In most cases, genetic studies of the diseases shown are in progress.
Some inherited tumours are associated with an increased risk of tumours of a different site or type from the cancer of primary interest, without any good evidence that one tumour type can progress to the other. Like the associated non-tumour phenotypes, the associated tumours may have relatively high sib relative risks and genetic homogeneity, making them an attractive means of identifying genes for the cancer of interest. Examples include multiple cutaneous leiomyomas, which are associated with uterine leiomyomas and leiomyosarcomas,63 melanocytic naevi which are associated with melanomas,59 and palmar keratoses which are associated with bladder cancer.46 47 Such lesions are theoretically amenable to study by a combination of linkage analysis and somatic mutation mapping, although no study of this type has yet been completed.
Modifier genes for the severity of a Mendelian disease must be distinguished from minor susceptibility loci for that disease, on the grounds that the modifiers have no direct effect on disease susceptibility. In cancer, however, there is a requirement for somatic mutations and there is frequently an overlap, in theory and in practice, between the spectrum of germline and somatic mutations in a single tumour type. Thus, for example, a gene which modifies the number of adenomas in the Mendelian disease familial adenomatous polyposis (FAP) has a good chance of also influencing the probability that a person develops a sporadic colorectal adenoma and thus carcinoma (by reason that it acts in a similar fashion on germline and somatic APC mutations). Such genes may therefore act as low penetrance susceptibility loci for cancer, as long as they harbour suitable genetic variation. Mendelian cancer modifier genes are more likely to be QTLs, but semi-quantitative or qualitative variation (for example, in progression of benign lesions to malignancy or the propensity to metastasise) may also exist; in attenuated FAP, for example, the modifier may have such a strong effect that no detectable tumour develops.
One strategy for identifying modifiers of Mendelian cancer syndromes is to use discordant sib pairs, both of whom are affected by the disease (or, at least, carry the disease causing mutation), but one of whom has mild disease and the other has severe disease. Alternatively, disease severity may be treated as a continuous variable. The best Mendelian cancer syndromes to study are those associated with multiple lesions, in order to minimise the effects of somatic mutations which occur randomly in time. It is not essential to exclude disease variation caused by genotype-phenotype correlations provided that studies are performed within families, but it is clear that the clinical data on which these studies are based must be collected in a uniform manner and that, in cases in which the phenotype may be influenced by exogenous factors, information on all possible confounding covariates is collected.
The increasingly complete synteny between the map of mouse and human genomes opens up the possibility of using mouse models for the identification of low penetrance cancer susceptibility genes. A complete discussion of such possibilities is beyond the scope of this article. Nevertheless, the ease with which mouse models can be used to investigate complex traits will undoubtedly lead to an increase in the use of animal models as a route into the human situation,64 thus reversing the current trend to make mouse models of human disease. Although attractive, the use of animal models is not problem free. Specifically, the limited genetic variation in animal systems and the very different environments and reproductive strategies of mice and humans represent intrinsic problems, as evidenced by attempts to identify a human modifier gene for FAP using the Min mouse. In addition, the use of animal models to identify cancer genes is beset by the very different life span of humans and small mammals and hence the time frame within which a tumour must grow. Furthermore, one of the most important facets of tumourigenesis, namely organ or site specificity, can be difficult to explore using animal models, as seen in the Min mouse model of FAP65-67 and mouse models of HNPCC.65 68 69 Attempts to counter these problems in animal systems, for example by the use of extrinsic carcinogens as a strategy to accelerate tumourigenesis, may succeed, but also have potential to create additional problems.
RESPONSE TO TREATMENT
Intervention in cancer provides another potential way of identifying moderate or low penetrance predisposition genes. Genes influencing the response to therapy may also influence susceptibility. Naturally, the main role of these studies will be to modify therapeutic intervention and candidate susceptibility genes will emerge as a by product.
The fine mapping problem: a multidisciplinary approach
There is already a considerable body of publications and discussion of gene identification in complex diseases. Most conclude that good fortune, such as a prime candidate gene within a region of significant linkage, is the best recipe for success. In the absence of such good fortune, initial mapping screens for moderate or low penetrance cancer susceptibility genes may use any of the above techniques (although the indirect approaches must subsequently be confirmed using linkage or association studies or both). Subsequent steps must be taken to identify the gene(s) involved, and experience has shown that this can be a very troublesome task. Even when the human genome is sequenced, it may still be more efficient to proceed to fine mapping of these genes, rather than go directly to testing candidates within the minimal region of interest, both in terms of time, and as a means of overcoming the problem, resulting from linkage disequilibrium, of identifying the relevant functional variation within a genetic interval. There is considerable overlap between each of the indirect approaches for identifying cancer genes which we have detailed above. Thus, after initial mapping screens, if fine mapping is considered worthwhile, several of these methods can be used in parallel. It may even be the case that cancer has advantages over other complex diseases in this regard; minimal regions of allele loss do not, for example, exist in asthma (as far as is known). All the methods outlined above may be useful in specific circumstances, but the most useful are likely to include somatic mutation mapping, modifier identification, and the study of tumour associated phenotypes. Additional methods, such as in vitro studies of tumour suppressor genes or of gene expression in tumours may be used as supplementary techniques.
Establishing causality is clearly a major issue with respect to association studies. Conclusions about the relationship between a specific gene variant and cancer risk should be based upon the guidelines developed by Hill70: (1) the strength of the association, weak associations being more likely to be attributable to bias or confounding; (2) reproducibility of the findings, based on the analysis of different cohorts and study designs; (3) biological plausibility and functional analyses; and (4) animal models, the ease with which “knock outs” can be made making suitable animal models more frequently available.
The identification of common, moderate, or low penetrance genes for cancer is potentially of great benefit, because it allows screening to be targeted to those at greatest risk. Cancer is therefore different from diseases such as type I diabetes or rheumatoid arthritis in which gene identification will provide benefits which are less direct. It is therefore ironic that cancer has lagged behind other diseases in the identification of non-Mendelian genetic predisposition. The strategies most commonly used for identifying the genes for other complex diseases can, in theory, be used for cancer, but some of these methods will be extremely difficult to use in practice, because of problems specific to cancer which are detailed above. Affected sib pairs who are alive with gastric or pancreatic cancer are extremely rare, for example, and few patients with ovarian cancer have living parents to provide optimal power for analysis using methods like the transmission disequilibrium test. It is likely to be necessary, therefore, to use supplementary, indirect methods for the identification of non-Mendelian cancer predisposition genes. The biology of cancer provides opportunities, through, for example, the study of somatic mutations, tumour associated phenotypes, and modifier genes which may not exist in other complex diseases. Most of these rely on the multistage mutation model of carcinogenesis, which provides both problems and opportunities for the identification of moderate and low penetrance cancer genes. Finally, one of the differences between cancer and many other multifactorial diseases is that we already know many of the genes that are associated with a high risk of cancer, and this clears the field for the clearly more difficult area of low and moderate penetrance genetics.
We are grateful to two anonymous reviewers for their comments.