Background Colorectal cancer is the fourth most common type of cancer and the second most common cause of cancer death. Fewer than 5% of colon cancers arise in the presence of a clear hereditary cancer condition; however, current estimates suggest that an additional 15–25% of colorectal cancers arise on the basis of unknown inherited factors.
Aim To identify additional genetic factors responsible for colon cancer.
Methods A large kindred with excess colorectal cancer was identified through the Utah Population Database and evaluated clinically and genetically for inherited susceptibility.
Results A major genetic locus segregating with colonic polyps and cancer in this kindred was identified on chromosome 13q with a non-parametric linkage score of 24 (LOD score of 2.99 and p=0.001). The genetic region spans 21 Mbp and contains 27 RefSeq genes. Sequencing of all candidate genes in this region failed to identify a clearly deleterious mutation; however, polymorphisms segregating with the phenotype were identified. Chromosome 13q is commonly gained and overexpressed in colon cancers and correlates with metastasis, suggesting the presence of an important cancer progression gene. Evaluation of tumours from the kindred revealed a gain of 13q as well.
Conclusions This identified region may contain a novel gene responsible for colon cancer progression in a significant proportion of sporadic cancers. Identification of the precise gene and causative genetic change in the kindred will be an important next step to understanding cancer progression and metastasis.
- Colon cancer
- genetic linkage
- chromosome 13q
The lifetime risk of colorectal cancer (CRC) in the USA is 6%, and this tumour is the second leading cause of cancer death after lung cancer. CRC screening programmes that include the removal of precancerous adenomatous polyps are the key to prevention, early diagnosis and survival. Understanding the genetic and environmental risk factors that affect colon cancer initiation and progression constitutes a complementary piece of these prevention efforts.
Sequentially ascertained pedigree and twin studies indicate that 20–30% of colon cancer cases appear to arise in the setting of inherited susceptibility.1–3 Three to five per cent of colon cancer cases arise in the setting of well characterised inherited syndromes.4 These include syndromes with colonic adenomatous polyps, namely (1) familial adenomatous polyposis, (2) MUTYH-associated polyposis and (3) hereditary non-polyposis colon cancer (HNPCC or Lynch syndrome), and syndromes with colonic hamartomatous polyps, namely (4) Peutz–Jeghers syndrome, (5) juvenile polyposis and (6) Cowden syndrome. Each has now been associated with a gene (or genes) that, when mutated, gives rise to the condition. Although the inherited mutations are rare in the population, the genes involved have been found to be very important for initiation and progression of all colon cancers. The genetic basis of the remaining 15–25% of inherited colon cancer susceptibility is poorly understood.
Individuals with more than one first-degree relative with colon cancer or a single first-degree relative with colon cancer diagnosed at age ≤50 years have a three- to six-fold greater risk than those with no family history.5 6 Multiple recent studies have characterised ‘high-risk colon cancer families’ that fulfilled clinical criteria for HNPCC, but did not have one of the inherited syndromes based on phenotype as well as tumour and germline genetic testing.7–10 This non-syndromic type of susceptibility is less penetrant than that observed in the known inherited syndromes. The average age at CRC diagnosis in these non-syndromic cases is the mid-50s to early 60s, a decade earlier than in the general population (70 years), whereas the average age in Lynch syndrome is 44 years. Defining the colon cancer aetiology of this population may again reveal genes that are generally important in colon cancer development.
Association studies report several low-penetrance genetic variants associated with colon cancer risk that could account for a yet-to-be defined proportion of the familial colon cancer cases.11 12 A genome-wide association study of ∼7000 colon cancer cases and controls identified a region on 8q24 with an OR for colon cancer of ∼1.2.13 The OR climbed to 2.6 with co-inheritance of single-nucleotide polymorphisms (SNPs) on 8q24, 11q23 and 18q21 in a follow-up study of 14 000 cases and controls.14 Affected-relative-pair studies have also reported genetic regions that are co-inherited more often in first-degree relatives with CRC than those without. These include 7q31, 9q22.33, 3q21–24, and 11q23 with some minor peaks in agreement across studies.15–18 These reports support the paradigm that common inherited colon cancer arises from a number of susceptibility genes of lower penetrance than the well-described syndromes of colon cancer.
Large families in which precise inheritance can be correlated with phenotype offer another approach to identifying specific loci with well-defined recombinant boundaries. Large families identified through the Utah Population Database (UPDB), a genealogical resource linked to vital records and Utah and Idaho statewide cancer registries, are the foundation of what we have come to understand about both sporadic and inherited colon cancers.2 19–23 We report one such large family ascertained through UPDB and identified as having a statistical excess of CRC. Phenotypic and genetic analysis revealed a significant genetic locus on chromosome 13q that is linked to the colonic adenomatous polyp and cancer phenotype.
Materials and methods
This study was approved by the Institutional Review Board of the University of Utah. Informed consent was obtained from all research participants.
Family ascertainment and participation
The family (Kindred 5275 (K5275)) was identified from the Utah Population Database (UPDB), a genealogical resource containing over 7.5 million individual records of people who had a significant life event (birth, death, childbirth) in Utah or who are ancestral to current members of the Utah population. Probabilistic record linking methods, which take into account common identifiers to link records from one source to another, have been used to link ∼94% of Utah Cancer Registry records (1966–present) to individuals in UPDB.24 K5275 was identified from UPDB as having a statistical excess of CRC as compared with the database as a whole.24 The probability that some number of cancer cases is observed among the descendants of a founder, given some number of person-years of risk among his or her descendants, is
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$$
where x is the number of cancers observed, and λ is the number expected given the total person time experienced in each of some number of risk strata based on age and sex. Considering only situations in which the observed number of cases (x) is greater than the expected number (λ), the probability of x or more cases being observed in a given family is
$$P(X \geq x) = 1 - \sum_{j=0}^{x-1} \frac{e^{-\lambda}\lambda^{j}}{j!}$$
In this formula, j is incremented from 0 to (x−1), and the Poisson probability of observing j cases when λ are expected is summed over all possible values of j and then subtracted from 1. Pedigrees selected from UPDB by this method were reviewed for all cancers (to rule out obvious known syndromes), dominant inheritance patterns, and availability of age-appropriate participants as described in our previous report of six such families including this one.10 The p value (not adjusted for multiple comparisons) calculated under the assumption of no familial aggregation of CRC, based on the Poisson probability of the observed number of cases, was 0.002.24 The Familial Standardised Incidence Ratio of CRC (ratio of observed to expected CRCs) was calculated as 12.4 for the five-generation family as previously described.25
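The Poisson tail probability and the observed-to-expected ratio described above can be illustrated with a minimal Python sketch. The function names and the example inputs are hypothetical and are not the values used for K5275; the published Familial Standardised Incidence Ratio method25 may include weighting that this simple ratio omits.

```python
import math


def poisson_pmf(j: int, lam: float) -> float:
    """Probability of exactly j cases when lam cases are expected."""
    return math.exp(-lam) * lam ** j / math.factorial(j)


def excess_probability(x: int, lam: float) -> float:
    """P(X >= x): one minus the summed Poisson terms for j = 0 .. x-1."""
    return 1.0 - sum(poisson_pmf(j, lam) for j in range(x))


def observed_to_expected(observed: int, expected: float) -> float:
    """Simple ratio of observed to expected cases (unweighted)."""
    return observed / expected


if __name__ == "__main__":
    # Hypothetical inputs for illustration only (not the K5275 data).
    observed_cases = 5
    expected_cases = 1.2
    p_value = excess_probability(observed_cases, expected_cases)
    ratio = observed_to_expected(observed_cases, expected_cases)
    print(f"P(X >= {observed_cases}) = {p_value:.4g}; O/E = {ratio:.2f}")
```

Summing the lower tail and subtracting from 1 mirrors the wording of the formula; for large x a library survival function would be numerically safer.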
CRC cases in the family were contacted by mail by the Utah Cancer Registry, which requested permission from them, or from their next of kin, to be contacted by the study. Before the family was expanded, inherited CRC syndromes were excluded. Medical records were obtained on CRC cases and evaluated to rule out adenomatous and hamartomatous polyposis syndromes based on published guidelines.4 HNPCC or Lynch syndrome was excluded by evaluating DNA microsatellite instability, a common feature of Lynch syndrome tumours, in archived tumour blocks from two index CRC cases (II-6 and III-13).26 27 Tumour and normal DNA were extracted as described previously28 and analysed using the ‘reference marker panel’ (BAT25, BAT26, D2S123, D5S346 and D17S250).29 In addition, germline DNA of individuals II-6, III-1 and III-15 (figure 1) was sequenced for the two common MUTYH mutations, Y165C and G382D, which constitute ∼85% of the mutations in the Caucasian population.30
Colonoscopy and phenotype assignment
Once known inherited CRC syndromes were excluded, study staff contacted interested individuals and expanded the kindred through family referral. All reported CRC cases were confirmed by the cancer registry or pathology report. A medical history and physical examination were completed for each participant. Clinically indicated colonoscopy with polypectomy was performed by participating endoscopists with standard preparation and monitoring. Each polyp was noted by location and size before being removed and sent for histopathological evaluation. Individuals were coded as affected on the basis of CRC status, size and number of adenomas, and the age when they were first diagnosed with an adenoma. Three individuals had a single adenoma without advanced features (<10 mm and no villous histology) and were over age 50, the age at which the frequency of adenomas exceeds 10% in the general population.31 This included two individuals with a single <5 mm adenoma at ages 80 (II-4) and 55 (III-4) and one individual with a single 7 mm adenoma at age 55 (III-5). Linkage analysis was run two ways: (1) with all adenoma and CRC cases as affected and spouses as unknown; (2) with the three noted cases run as unknown.
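The two coding schemes used for linkage analysis can be sketched as a small Python function. This is an illustration under stated assumptions rather than the study's actual procedure: the `Participant` fields, the status labels and the treatment of adenoma-free kindred members as unaffected are all hypothetical, and only the single-adenoma rule (<10 mm, no villous histology, over age 50) is taken from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    ident: str
    is_spouse: bool
    has_crc: bool
    adenoma_count: int
    largest_adenoma_mm: float
    villous_histology: bool
    age_at_first_adenoma: Optional[int]


def code_status(p: Participant, strict: bool) -> str:
    """Return 'affected', 'unaffected' or 'unknown' for linkage analysis.

    strict=False: every adenoma or CRC case is affected (analysis 1).
    strict=True: a single non-advanced adenoma found after age 50 is
    coded unknown (analysis 2).
    """
    if p.is_spouse:
        return "unknown"  # spouses are coded unknown in both analyses
    if p.has_crc:
        return "affected"
    if p.adenoma_count == 0:
        return "unaffected"  # assumption: not stated explicitly in the text
    single_non_advanced = (
        p.adenoma_count == 1
        and p.largest_adenoma_mm < 10
        and not p.villous_histology
        and p.age_at_first_adenoma is not None
        and p.age_at_first_adenoma > 50
    )
    if strict and single_non_advanced:
        return "unknown"
    return "affected"


if __name__ == "__main__":
    # III-5: a single 7 mm adenoma at age 55, as reported in the text.
    iii_5 = Participant("III-5", False, False, 1, 7.0, False, 55)
    print(code_status(iii_5, strict=False), code_status(iii_5, strict=True))
```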
Genotyping and linkage analysis
Genome-wide linkage was performed using a highly polymorphic custom set of 325 short tandem repeat (STR) genetic markers. The average heterozygosity was 0.78, and the average spacing was 10.8 cM. Genotyping was performed on 13 individuals who were enrolled at the time (two spouses, III-17, III-2, and 11 kindred members, II-4, II-6, III-1, III-11, III-12, III-15, III-18, IV-1, IV-2, IV-3, IV-22 in figure 1) by using automated probe hybridisation instruments designed and built at the University of Utah Human Genome Center.32 Genotype data were screened, and misinheritances were reviewed and resolved by two technicians; these reviews were carried out for <5% of the genotypes. Genome-wide pairwise two-point linkage analysis of the genotypes was performed using the MLINK subroutine of the FASTLINK (v4.0) and LINKAGE (v5.1) programs.33 34 All markers were analysed using the Marshfield genetic map35 assuming equal allele frequencies and an autosomal dominant model with a population frequency of 0.001 and a penetrance of 0.60.
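For readers unfamiliar with two-point LOD scores, the underlying quantity can be sketched in Python for the simplest textbook case of phase-known, fully informative meioses. This is not how MLINK computes pedigree likelihoods: the sketch ignores penetrance, allele frequencies and phase uncertainty, and the function name and example counts are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """log10 likelihood ratio of recombination fraction theta versus 0.5.

    Assumes every meiosis is phase-known and fully informative, so the
    likelihood is theta**r * (1 - theta)**(n - r).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    non_recombinants = meioses - recombinants
    if theta == 0.0:
        # Any observed recombinant is impossible under complete linkage.
        if recombinants > 0:
            return float("-inf")
        return meioses * math.log10(2.0)
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1.0 - theta))
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


if __name__ == "__main__":
    # Hypothetical counts: 1 recombinant in 12 informative meioses.
    grid = [i / 100 for i in range(0, 51)]
    best = max(grid, key=lambda t: lod_score(1, 12, t))
    print(f"max LOD {lod_score(1, 12, best):.2f} at theta = {best:.2f}")
```

Under these simplifying assumptions, each non-recombinant meiosis contributes at most log10(2) ≈ 0.3 to the score, which is one reason large families are needed to reach conventional significance thresholds.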
The family was expanded, and genotyping was also done using Affymetrix GeneChip Human Mapping 10 000 SNP array (HMA10K) on 21 individuals (four spouses, 17 kindred members). Samples were processed according to the GeneChip Mapping Assay Manual DNAARRAY_WS2 protocol (Affymetrix) on the Affymetrix Fluidics Station 400. Arrays were scanned with the Affymetrix GeneChip Scanner 3000 and analysed with Affymetrix GeneChip DNA Analysis Software (GDAS) to generate genotype assignments for each of the SNP probes on the array. The deCODE map was used with Affymetrix allele frequencies. Multipoint non-parametric linkage analysis of SNP genotypic data was performed using the program GENEHUNTER (v2.1)36 37 as part of the graphical user interface easyLinkage (v4.01). Analysis was completed both with and without removal of SNPs that were uninformative or in linkage disequilibrium, with virtually identical results. Five STR markers were used for fine mapping of the chromosome 13 locus in a total of 40 individuals (seven spouses, 33 kindred members with phenotypic information). FASTLINK and GENEHUNTER were used for the combined analysis of STR and SNP data at the chromosome 13 locus using the analysis programs on a UNIX platform.
Identification of sequence changes
Primers to amplify genes of interest were designed using the Exon Primer (Institut für Humangenetik, Munich, Germany) utility found on the UCSC Genome Browser with a maximal target size of 300 bp. Primers were designed for all exons, 5′ untranslated region (UTR), 3′-UTR, and 2 kb of promoter for each gene. Amplicons were optimised using 2.5×LC Green Plus master mix (2.0 mM; Idaho Technology, Salt Lake City, UT, USA), 1.0 μM primers with 20 ng DNA in a 10 μl reaction mixture with a temperature gradient of 62–72°C. Melting acquisition was performed on a 96-well LightScanner high-resolution melting instrument (Idaho Technology). The plate was heated from 76°C to 98°C, and melting curve analysis was performed with LightScanner Software v2.0 with normalisation of data. A total of ten individuals (four affected, six controls) were screened for each amplicon. Samples giving abnormal curves were submitted to the University of Utah DNA sequencing core for analysis. DNA sequences were compared with published sequences using the UCSC Genome Browser BLAT utility found at (http://www.genome.ucsc.edu/cgi-bin/hgBlat?command=start).
Genome-wide loss of heterozygosity (LOH) analysis on tumour
Comparative genomic hybridisation (CGH) was performed on archived formalin-fixed paraffin-embedded CRC from individual III-13 using the Agilent CGH arrays. Tumour and normal DNA were microdissected, deparaffinised, and extracted from 5–10 μm sections of the block. Puregene DNA purification protocol (Gentra, Minneapolis, MN, USA), with an extensive proteinase K treatment step, was used. Genomic DNA was digested with AluI and RsaI and labelled with Cy3-dCTP (normal) or Cy5-dCTP (tumour) using the Agilent Genomic DNA Labelling Kit. Labelled DNA was hybridised to Agilent's Human Genome CGH Microarray Kit 44B with an average resolution of 35 kb. Hybridised microarray slides were washed, dried and scanned using an Agilent G2505B Microarray Scanner. Data were read and processed using Agilent's Feature Extraction Software to prepare microarray data for analysis. CGH Analytics software (Agilent, Santa Clara, CA, USA) was used to check data quality and analyse statistically significant gains and losses.
Formalin-fixed paraffin-embedded tumour DNA (individuals II-6, III-13 and IV-1) and adenoma DNA (individuals III-11 and IV-4) were compared with normal DNA for somatic copy number changes at the chromosome 13 locus. DNA was PCR-amplified using primers for STR markers in the region (D13S170, D13S251 and D13S265), and products were resolved on an ABI3130xl capillary sequencing instrument and analysed with ABI GeneMapper 3.7 software. The ratio of the peak areas of the two alleles in the tumour was compared with the corresponding ratio in the normal DNA. The ratio was calculated as: (peak area of tumour allele 2/peak area of tumour allele 1)/(peak area of normal allele 2/peak area of normal allele 1), and values over 1.5 were suggestive of copy number gain in the tumour.
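The allelic ratio defined above can be written as a short Python sketch. The function names and the peak areas in the example are hypothetical; only the formula and the 1.5 threshold come from the text.

```python
GAIN_THRESHOLD = 1.5  # ratios above this were considered suggestive of gain


def allelic_ratio(tumour_a1: float, tumour_a2: float,
                  normal_a1: float, normal_a2: float) -> float:
    """(tumour allele 2 / tumour allele 1) / (normal allele 2 / normal allele 1)."""
    areas = (tumour_a1, tumour_a2, normal_a1, normal_a2)
    if min(areas) <= 0:
        raise ValueError("peak areas must be positive (marker must be heterozygous)")
    return (tumour_a2 / tumour_a1) / (normal_a2 / normal_a1)


def suggests_gain(ratio: float, threshold: float = GAIN_THRESHOLD) -> bool:
    """True when the ratio exceeds the threshold for copy number gain."""
    return ratio > threshold


if __name__ == "__main__":
    # Hypothetical peak areas for one heterozygous STR marker.
    ratio = allelic_ratio(tumour_a1=1000.0, tumour_a2=1650.0,
                          normal_a1=1000.0, normal_a2=1000.0)
    print(f"ratio = {ratio:.2f}, gain suggested: {suggests_gain(ratio)}")
```

Normalising by the ratio in normal DNA corrects for allele-specific amplification differences. The text states only the >1.5 criterion, so the sketch does not flag ratios below 1, which would arise if allele 1 rather than allele 2 were gained.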
Results
Phenotype of family
The couple at the top of the kindred had 81 descendants recorded in UPDB (figure 1). The extended five-generation pedigree has been published and shows five documented CRCs at ages 72, 86, 61, 42 and 35 years.10 The family clinically evaluated and used for genotyping and linkage analysis includes four lower generations and is shown in figure 1. The average age at colon cancer diagnosis in this branch was 46, and the average age when the first adenoma was detected was 49.5. DNA was obtained for genotyping on 40 individuals including spouses (table 1). Colonoscopy was performed on 32 kindred members, all of whom have genotype data. Three family members had CRC, nine family members and two spouses had adenomatous polyps, and 22 family members had no adenomatous polyps. Medical records from the CRC cases and colonoscopy procedures showed no evidence of known adenomatous or hamartomatous polyposis conditions. No one in the family had more than five adenomatous polyps. Two individuals had eight hyperplastic polyps (III-5 and IV-13). Two CRC cases (II-6 and III-13) showed microsatellite stability, indicating that this family does not have Lynch syndrome. We note that no affected family members had advanced adenomas (≥10 mm or advanced histology); however, two of the three colon cancers were metastatic at diagnosis (ages 42 and 35), a subtle suggestion that these adenomas may rapidly advance to a metastatic state.
K5275 links to chromosome 13q31
Two separate genome-wide scans and additional fine mapping identified a single major locus on 13q31 which segregates with adenomatous polyps and colon cancer in K5275 (figures 2–4). Both the STR scan and the HMA10K SNP scan supported linkage to this identical region. The linkage analysis, specifying an autosomal dominant model, generated a maximum two-point logarithm of odds (LOD) score of 2.43 for the STR marker D13S251 on chromosome 13. Two adjacent markers, D13S170 and D13S265, also yielded positive LOD scores, 1.26 and 1.21, respectively. Non-parametric analysis of the chromosome 13 fine mapping region including both SNP (n=12) and STR (n=4) markers with high heterozygosity yielded a maximum non-parametric linkage (NPL) score of 24.12 (LOD score of 2.99 and p=0.001) at D13S251 when individuals II-4, III-4 and III-5 were coded as unknown (figure 4). Phenotyping and analysis with these three individuals are described in Materials and methods. When these individuals were coded as affected, the NPL score for this region increased to 30.25 (LOD score of 3.12 and p=0.0005), but the maximum LOD score was then found at D13S265. This result is due to inclusion of individual III-5, who shares two markers with the minimal disease haplotype (figure 5). When this individual is excluded, the non-recombinant region spans rs1870836 (75 454 410 bp; 13q22.2) to D13S265 (89 171 101 bp; 13q31.3).
Although this is a large genetic region of 21 Mbp, there are only 27 RefSeq genes in the non-recombinant region. Included are eight genes, KLF5, KLF12, LMO7, c13orf7 (RNF219), SPRY2, GPC5, MYCBP2 and POU4F1, that have been implicated in cancer initiation or progression.38 39 Each of the exons in these eight genes (or the RNA in the case of MYCBP2) was evaluated for genetic variants using the LightScanner followed by sequencing when a variant was detected. No unambiguously deleterious mutations have been identified in these genes; however, SNPs segregating with the colon cancer and polyp phenotype were identified (table 2). There is no evidence to indicate whether these are in disequilibrium with the responsible change or are themselves causative. Although the population frequencies of the noted SNPs in RNF219 and POU4F1 have not been reported, the SNPs have been observed in individuals without colon cancer.
LOD scores were negative in regions surrounding genes known to cause familial colon cancer (APC, MYH, CTNNB1, MLH1, MSH2, MSH6, PMS2, STK11, PTEN, BMPR1A). The exception was D18S548, which is near SMAD4, with a LOD of 1.36; however, additional markers close to SMAD4 (D18S363 and D18S858) gave negative LOD scores.
Whole genome loss and gain of tumour in kindred reveals classic 13q gain
CGH analysis was performed on archived colon tumour and normal DNA from individual III-13, who is an obligate carrier of the chromosome 13 haplotype. The cancer was diagnosed at age 42 and had metastasised to five of 15 lymph nodes. CGH analysis revealed duplication of the majority of chromosome 13q (see online supplemental figure S1A); however, at the region of linkage (online supplemental figure S1B), there are only two small significant amplifications (p=0.002) as indicated by the gold bar on the right. One region includes POU4F1, a gene sequenced and found to have a novel SNP ∼50 bp upstream of the 5′-UTR. Interestingly, this cancer had losses commonly observed in sporadic CRCs including APC on chromosome 5, TP53 on the p-arm of 17, and DCC and SMAD4 on chromosome 18.
To further support this result, all three colon cancers from the family and two adenomas >5 mm were evaluated for copy number gains by comparing peak areas of three STR markers at the chromosome 13 locus in neoplastic versus normal DNA. The adenomas showed no change; however, all three cancers showed a copy number increase at one of three markers. The tumour from IV-1 had a ratio of 1.65 at D13S170, and tumours II-6 and III-13 had ratios of 1.89 and 1.67, respectively, at D13S265. The other markers were either non-informative (homozygous, see figure 5) or did not show copy number gain.
Discussion
We describe a large extended family with excess CRC and no clinical or molecular features of the known hereditary colon cancer conditions. Colon cancers were identified at early ages (average of 59 years in the extended pedigree, and 46 years in the lower four generations who were enrolled in the study), suggesting that underlying hereditary factors were at play. A genome-wide scan with linkage analysis identified a major locus on chromosome 13q22.1–13q31.3 that correlates statistically with colon cancer and adenomatous polyps in this family. Although linkage of 13q to CRC has not been previously reported, chromosome 13q is often gained and overexpressed in primary colon tumours.40–43 Consistent with these sporadic tumours, CGH analysis of one tumour from K5275 showed a gain of 13q, along with other chromosomal regions commonly altered in colon tumours. Two other cancers in the family show a copy number gain at 13q with one of three markers. Genes within this region are thought to play an important role in CRC progression, but have not been precisely identified.
Linkage analysis in common inherited conditions such as CRC can be challenging because of reduced penetrance, sporadic adenomas and the interplay of multiple genetic and environmental factors. Analysis was run twice, coding the three individuals who had adenomas but did not meet the strict disease criteria first as affected and then as unknown. Because of the large size of the family, analysis could be restricted to the most informative branches, excluding all of the young (under 39 years of age) unaffected individuals in the lower generation, who represent a mix of gene carriers and non-carriers (figure 1). Inclusion of these additional individuals, even with assignment of age liability classes with reduced penetrance, reduces the LOD score for marker D13S251 to 2.17, suggesting that many of the individuals in the younger generation are too young to express the phenotype. Having a family large enough to eliminate phenotypic ambiguity may be essential to tease the signal from the noise in these rare inherited conditions whose phenotypes are also common in the general population.
As demonstrated by this study, investigation of large extended families, especially those identified through the UPDB, provides a powerful approach to defining precise boundaries of genetic regions involved in CRC initiation and/or progression. One could argue that genetic regions found to increase colon cancer risk in these families are unique to the family and not applicable to the general population. However, the fact that 13q is gained in 30–50% of all CRCs and correlates particularly with metastasis suggests that this is an important genetic region. Consistent with this correlation, it appears that simple adenomas may rapidly advance to metastatic colon cancer in this family, as no family members had the intermediate phenotype of advanced adenomas.
Identification of the precise gene and causative genetic change in this kindred will be an important next step. Traditional exon sequencing of genes in this region has not yielded a clear answer. A more comprehensive approach using high-throughput sequencing technology and evaluation of inherited copy number variation will be applied to the entire locus from representative affected family members. There is a reasonable chance that a clear deleterious mutation will not be identified because evidence from sporadic CRCs suggests that there is a gain of function (oncogene) at 13q. The polymorphisms, non-coding sequences and copy number variations that are identified and that segregate with affected individuals will need to be evaluated for biological significance. Once the gene is identified, additional cases/families can be evaluated to determine the precise fraction of colon cancer cases that this specific locus affects. If this gene is also involved in sporadic cancer progression and metastasis, there may be opportunities for management of the molecular process through prevention or treatment interventions.
University of Utah's sequencing, genotyping and microarray core facilities provided services in support of this study. We are grateful to the study coordinators, Amy Lee Dalton, David Nilson, Jennifer Lilley and Michelle Lewandowski for their tireless work with the family. We also thank Cindy Solomon, Rebecca Hulinsky, Katrina Lowstuter and Kory Jasperson for counselling the families on cancer risk, and Michelle Condie-Done for analysis of the tumour samples.
Funding This study was supported by National Cancer Institute grants R01-CA40641 and PO1-CA73992; additional support was provided by a Cancer Center Support Grant P30-CA42014, General Clinical Research Center Grant M01-RR00064 and N01-PC-67000, Utah Cancer Registry grant N01-PC-35141 from the National Cancer Institute's SEER program with additional support from the Utah State Department of Health and the University of Utah, and by the Huntsman Cancer Foundation. Database support to the Utah Population Database is provided by the Huntsman Cancer Foundation. Other Funders: NIH; Huntsman Cancer Foundation, Utah State Department of Health.
Competing interests None.
Ethics approval This study was conducted with the approval of the University of Utah Institutional Review Board.
Provenance and peer review Not commissioned; externally peer reviewed.