Variants in or near KITLG, BAK1, DMRT1, and TERT-CLPTM1L predispose to familial testicular germ cell tumour
- Christian P Kratz1,
- Summer S Han1,
- Philip S Rosenberg1,
- Sonja I Berndt1,
- Laurie Burdett2,
- Meredith Yeager2,
- Larissa A Korde1,
- Phuong L Mai1,
- Ruth Pfeiffer1,
- Mark H Greene1
- 1Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland, USA
- 2Core Genotyping Facility, National Cancer Institute, SAIC-Frederick, Gaithersburg, Maryland, USA
- Correspondence to Christian P Kratz, Clinical Genetics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, 6120 Executive Blvd, EPS/7018, Rockville, MD 20852, USA
Contributors CPK planned the study, wrote the paper; SSH conducted analysis, wrote the paper; PSR conducted analysis; SIB provided control samples; LB and MY performed genotyping; LAK and PLM planned the study; RP conducted analysis; MHG planned the study, wrote the paper.
- Received 17 February 2011
- Revised 21 March 2011
- Accepted 18 April 2011
- Published Online First 26 May 2011
Background Familial testicular germ cell tumours (TGCTs) and bilateral TGCTs comprise 1–2% and 5% of all TGCTs, respectively, but their genetic basis remains largely unknown.
Aim To investigate the contribution of known testicular cancer risk variants in familial and bilateral TGCTs.
Methods and results The study genotyped 106 single nucleotide polymorphisms (SNPs) in four regions (BAK1, DMRT1, KITLG, TERT-CLPTM1L) previously identified from genome-wide association studies of TGCT, including risk SNPs rs210138 (BAK1), rs755383 (DMRT1), and rs4635969 (TERT-CLPTM1L), in 97 cases with familial TGCT and 22 affected individuals with sporadic bilateral TGCT as well as 871 controls. Associations with familial and bilateral TGCT were analysed using a generalised estimating equations method that takes into account blood relationships among cases. Three previously identified risk SNPs were found to be associated with familial and bilateral TGCT (rs210138: OR 1.80, CI 1.35 to 2.41, p=7.03×10−5; rs755383: OR 1.67, CI 1.23 to 2.22, p=6.70×10−4; rs4635969: OR 1.59, CI 1.16 to 2.19, p=4.07×10−3). Evidence for a second independent association was found for an SNP in TERT (rs4975605: OR 1.68, CI 1.23 to 2.29, p=1.24×10−3). Another association with an SNP was identified in KITLG (rs2046971: OR 2.33, p=1.28×10−3); this SNP is in high linkage disequilibrium (LD) with reported risk variant rs995030.
Conclusion This study replicates results from recent genome-wide association studies and shows that variants in or near BAK1, DMRT1, TERT-CLPTM1L, and KITLG predispose to familial and bilateral TGCT. These findings imply that familial TGCT and sporadic TGCT share a common genetic basis.
- Testicular cancer
- cancer-prone syndromes
- risk variants
- paediatric oncology
- genetic epidemiology
Testicular germ cell tumour (TGCT) is the most common cancer diagnosed among young men.1 Most affected individuals are diagnosed with seminomas or non-seminomas; however, mixed germ cell tumours also occur. The incidence of TGCT has increased since 1960,2 suggesting that TGCTs are, at least partially, caused by environmental factors. Established risk factors include white race, positive personal or family history of TGCT, and cryptorchidism (reviewed in Greene et al3).
TGCTs have a strong genetic component; approximately 1.4% of men with TGCT have familial TGCT, defined as at least two affected men in one family.4 Sons of men with TGCT display a four- to sixfold increase in TGCT risk. In brothers of cases, the risk is increased eight- to 10-fold.5 6 Interestingly, in dizygotic and monozygotic twin brothers of men with TGCT, 37-fold and 76.5-fold elevated risks of TGCT have been reported, respectively7; however, according to a recent meta-analysis, twins had only an approximately 30% increased risk of developing TGCT.8 Moreover, 2–5% of patients develop bilateral TGCT.9 10
Three recent genome-wide association studies (GWAS) have uncovered predisposition loci for TGCT in or near six genes: KITLG, SPRY4, BAK1, TERT-CLPTM1L, ATF7IP, and DMRT1.11–13 Although the majority of subjects participating in these three studies had non-familial TGCT, these GWAS also included subsets of subjects with familial TGCT and bilateral TGCT. Subgroup analyses showed no significant differences between sporadic, bilateral and familial cases in these data11–13; however, the power to detect subgroup differences was limited. There are important aetiologic implications if it can be established that single nucleotide polymorphism (SNP) associations are homogeneous in familial and sporadic cases. The National Cancer Institute's (NCI) Clinical Genetics Branch is conducting a multidisciplinary aetiologic study of familial and sporadic bilateral TGCT, and those cases are independent of those published in GWAS to date.14 Here, we investigated whether four of the previously identified regions are associated with familial or bilateral TGCT cases in our study cohort.
Patients and methods
Affected individuals and controls
Genotyping was performed on 97 patients with familial TGCT from 56 multiple-case families and 22 affected individuals with sporadic bilateral TGCT (table 1). Men with sporadic bilateral disease were included because of a presumed strong genetic component underlying bilateral TGCT. All subjects were white and enrolled in the NCI Clinical Genetics Branch Familial Testicular Germ Cell Study (NCI Protocol 02-C-0178; NCT00039598) from 2003 to 2009. Subjects were recruited/ascertained through self-referral. We confirmed the diagnosis by reviewing medical records, pathology reports, and/or stored histological material. Written informed consent was obtained from all participants, and the study was approved by NCI's institutional review board. An additional 871 cancer-free Caucasian male control subjects were obtained from the Prostate, Lung, Colorectal, Ovarian (PLCO) Cancer Screening Trial,15 which is an early cancer detection screening trial that enrolled men and women, ages 55–74 years, from 10 different centres in the USA between 1993 and 2001. All subjects included in this study were required to have completed a baseline questionnaire, provided a blood specimen, and consented to participate in aetiologic studies of cancer and related diseases. Controls were limited to whites living in the continental USA without a diagnosis of colon adenoma or cancer at baseline. DNA was extracted from blood specimens using standard procedures. The institutional review boards at the NCI and the 10 screening centres approved the PLCO study.
Genotyping was conducted using DNA extracted from blood or buffy coat from all subjects at the Core Genotyping Facility of the NCI's Division of Cancer Epidemiology and Genetics, using a custom iSelect bead chip (Illumina Custom Infinium, http://www.illumina.com/pages.ilmn?ID=158) as part of a large scale genotyping effort at NCI for 19 different tumour types. The iSelect panel included 27 904 SNPs representing ∼1300 genes. These candidate genes were chosen by various investigators because of their potential role in the pathogenesis of one of the 19 studied tumour types. TagSNPs were chosen for the candidate genes included on this platform based on the HapMap CEU population (Data Release 20/Phase II, NCBI Build 36.1 assembly, dbSNPb126) using a modified version of the methods by Carlson et al.16 For each candidate gene, tagSNPs were selected for the region spanning 20 kb upstream and 10 kb downstream of the gene, using a binning threshold of r2=0.8. Description and methods for assays can be found at http://cgf.nci.nih.gov/operations/multiplex-genotyping.html. A total of 195 duplicates were included for quality control purposes; these had 99.9% concordance. SNPs were excluded if they had a call rate <90%, were inconsistent with Hardy–Weinberg proportions among controls (p<1×10−6), or failed validation. SNPs on the X chromosome were also excluded if they exhibited >10% heterozygosity among males. Individuals with a call rate <90% were also excluded. After exclusions, 25 823 SNPs remained. Of the six previously identified gene regions for testicular cancer, four of the genes (BAK1, DMRT1, KITLG, and TERT) were genotyped on our iSelect panel. A total of 106 SNPs were analysed from those regions.
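The SNP-level exclusion rules described above (call rate <90%, departure from Hardy–Weinberg proportions among controls at p<1×10−6) can be illustrated with a minimal Python sketch. This is not the Core Genotyping Facility pipeline. The function names, the 0/1/2 genotype coding with `None` for a failed call, and the use of a 1-df chi-square test for Hardy–Weinberg proportions are all assumptions made for illustration.

```python
import math

def call_rate(genotypes):
    """Fraction of non-missing calls; None marks a failed call (assumed coding)."""
    return sum(g is not None for g in genotypes) / len(genotypes)

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """1-df chi-square test of Hardy-Weinberg proportions from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))

def passes_qc(all_calls, control_calls, min_call_rate=0.90, hwe_alpha=1e-6):
    """Apply the two SNP-level filters: call rate overall, HWE among controls only."""
    if call_rate(all_calls) < min_call_rate:
        return False
    counts = [sum(g == k for g in control_calls) for k in (0, 1, 2)]
    return hwe_chi2_pvalue(*counts) >= hwe_alpha
```

The Hardy–Weinberg filter is evaluated in controls only, as in the text, so that a true disease association does not cause a SNP to be discarded.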
To take into account correlations among cases within each family, we used a generalised estimating equations approach17 that incorporated known familial relationships using kinship coefficients. For the jth individual in the ith family, our linear logistic model is given as $\operatorname{logit}(p_{ij}) = \beta_0 + \beta_1 x_{ij}$, where $p_{ij}$ is the probability that subject j in family i is a case and $x_{ij}$ is an SNP genotype for subject j in family i. The variance–covariance matrix for the ith family was specified as $V_i = A_i^{1/2} R_i A_i^{1/2}$, where $A_i$ is a diagonal matrix with the jth diagonal value taking the binomial variance $p_{ij}(1-p_{ij})$. Here $R_i$ is a correlation matrix in which the correlation between each pair of subjects is 2×kinship coefficient, which can be calculated from known familial relationships. A kinship coefficient is the probability that two alleles, one drawn at random from each of two individuals at the same locus, are identical by descent; it takes values from 0 (unrelated pair) to 0.5 (monozygotic twins). Kinship coefficients were calculated using the R package kinship, and GEE estimation was performed using the R package geepack.18
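To make the kinship-based correlation structure concrete, the following Python sketch computes kinship coefficients recursively from a small pedigree and assembles a working covariance of the standard GEE form $V_i = A_i^{1/2} R_i A_i^{1/2}$ with $R_i$ = 2×kinship. It is an illustration only and not the R code (kinship, geepack) used in the study; the pedigree, the member names, and the fitted probabilities are hypothetical.

```python
import math
from functools import lru_cache

# Hypothetical pedigree: id -> (father, mother); founders have (None, None).
# Parents must be listed before their children.
PEDIGREE = {
    "gf": (None, None),
    "gm": (None, None),
    "bro1": ("gf", "gm"),
    "bro2": ("gf", "gm"),
    "wife": (None, None),
    "son": ("bro1", "wife"),
}
ORDER = {pid: k for k, pid in enumerate(PEDIGREE)}

@lru_cache(maxsize=None)
def kinship(a, b):
    """Kinship coefficient by the standard pedigree recursion."""
    if a is None or b is None:
        return 0.0                      # unknown parent of a founder
    if a == b:
        father, mother = PEDIGREE[a]
        return 0.5 * (1.0 + kinship(father, mother))
    if ORDER[a] < ORDER[b]:
        a, b = b, a                     # recurse through the later-listed person
    father, mother = PEDIGREE[a]
    return 0.5 * (kinship(father, b) + kinship(mother, b))

def working_covariance(members, fitted_probs):
    """V = A^(1/2) R A^(1/2), with R[j][k] = 2 * kinship(j, k)."""
    sd = [math.sqrt(p * (1.0 - p)) for p in fitted_probs]
    return [[sd[j] * 2.0 * kinship(a, b) * sd[k]
             for k, b in enumerate(members)]
            for j, a in enumerate(members)]
```

For two affected brothers and the son of one of them, `kinship("bro1", "bro2")` is 0.25 and `kinship("son", "bro2")` is 0.125, so the corresponding working correlations are 0.5 and 0.25.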
Conditional analysis was performed to check the independence of association signals from two distinct loci within a gene. More specifically, we performed an association test for one locus after adjusting for the other locus (and vice versa), and concluded that the signals were dependent (or independent) if the significance decreased (or was unchanged) after adjusting for each other. An analysis by adaptive combination of p values was conducted to combine the information of association signals from multiple SNPs, and also to take into account multiple testing within a gene.19 The products of the top K p values (K=1, 2, …, 5) were used as test statistics, and their significance was assessed through permutations. In order to generate permutations that incorporate both the correlations among cases and the ascertainment scheme of the study, we applied the following procedure. Among the NCASE families with cases and the NCONTROL ‘families’ with controls (with family size one), we randomly chose NCASE families and re-labelled the disease status as ‘case’ for all the members in the chosen families. Similarly, for the remaining NCONTROL families, which were not selected as case families in the permutation, we re-labelled the disease status as ‘control’. This strategy breaks the association between SNPs and disease status, while keeping (1) the ascertainment scheme in our study, wherein all family members have the same disease status, and (2) the linkage disequilibrium (LD) structure among SNPs in the observed data. Our simulation study showed that this procedure maintains the correct type 1 error rate (data not shown). The software Haploview (version 4.2) was used to estimate and visualise LD among SNPs using our control genotype data.20
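The family-level permutation and the top-K product statistics can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names and data layout are hypothetical, the per-SNP association test that would be rerun on each permuted label set is omitted, and the final adaptive step is one common way of calibrating the minimum p value over K against the same permutations.

```python
import math
import random

def permute_family_labels(family_is_case, rng):
    """Reassign case status to whole families, keeping the number of case families fixed."""
    family_ids = list(family_is_case)
    n_case = sum(family_is_case.values())
    chosen = set(rng.sample(family_ids, n_case))
    # Every member of a family later inherits that family's permuted label.
    return {fid: (fid in chosen) for fid in family_ids}

def top_k_log_products(p_values, max_k=5):
    """Log of the product of the K smallest p values, for K = 1..max_k."""
    out, running = [], 0.0
    for p in sorted(p_values)[:max_k]:
        running += math.log(p)
        out.append(running)
    return out

def adaptive_p_value(observed, permuted):
    """observed: K statistics; permuted: B lists of K statistics (smaller = stronger)."""
    table = [observed] + permuted
    n = len(table)

    def rank_p(row, k):
        # Permutation p value of this row's K-th statistic within the table.
        return sum(other[k] <= row[k] for other in table) / n

    min_p = [min(rank_p(row, k) for k in range(len(observed))) for row in table]
    # Calibrate the observed minimum against the minima of the permuted rows.
    return sum(m <= min_p[0] for m in min_p) / n
```

Permuting at the family level, rather than the individual level, is what preserves the ascertainment scheme: relatives always share the same label, and genotypes are never shuffled, so LD among SNPs is left intact.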
Results
Of the previously identified gene regions,11–13 four were genotyped in this study, including three of the identified risk SNPs, rs210138 (within an intron of BAK1), rs755383 (near DMRT1), and rs4635969 (near TERT-CLPTM1L). Moreover, we genotyped SNP rs2046971 (KITLG), which served as a surrogate for the previously identified risk SNP rs995030 from this region. To test whether we could confirm these four risk loci in a cohort of familial or bilateral TGCT cases, we analysed these four SNPs first. We found all four SNPs to be associated with familial and bilateral TGCT under a log-additive genetic model (rs210138: OR 1.80, CI 1.35 to 2.41, p=7.03×10−5, non-risk/risk allele: A/G; rs755383: OR 1.67, CI 1.23 to 2.22, p=6.70×10−4, non-risk/risk allele: C/T; rs4635969: OR 1.59, CI 1.16 to 2.19, p=4.07×10−3, non-risk/risk allele: C/T; rs2046971: OR 2.33, CI 1.39 to 3.85, p=1.28×10−3, non-risk/risk allele: G/C).
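Under a log-additive model, the OR, its CI, and the p value are linked through the per-allele log odds ratio and its standard error. The Python sketch below shows that relationship and can be used to check that a reported OR, CI, and p value are mutually consistent. It assumes that the CIs are 95% Wald intervals and that the p values are two-sided Wald tests; neither is stated explicitly in the text, and the function name is hypothetical.

```python
import math

def wald_summary(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover beta, SE, z and a two-sided p value from an OR and its CI.

    Assumes a 95% Wald interval: exp(beta +/- z_crit * SE).
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p_two_sided

# Reported values for rs210138: OR 1.80, CI 1.35 to 2.41.
beta, se, z, p = wald_summary(1.80, 1.35, 2.41)
```

For rs210138 this gives a p value on the order of 7×10−5, in line with the reported value; small differences are expected because the published OR and CI are rounded.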
Next we tested whether other SNPs in these four genomic regions had similar or stronger associations with familial or bilateral TGCT. We found several SNPs in BAK1 that were strongly associated with case status; the strongest association was for rs210162 (p=9.11×10−6) (figure 1A). Notably, rs210138 and rs210162 were in LD (r2=0.74), and conditional analysis suggested that these associations were correlated and driven by one signal (data not shown). The LD among the seven top associated SNPs in this region was high, with r2 values ranging from 0.66 to 1.00. No additional associations that were stronger than rs755383 were identified in DMRT1 (figure 1B). However, we identified a second associated SNP in the TERT locus (rs4975605: OR 1.68, CI 1.23 to 2.29, p=1.24×10−3) (figure 1C). rs4975605 is located within an intron of TERT and is not in LD with rs4635969 (r2=0.04) or with rs2736100 (r2=0.013, calculated from HapMap data), another reported TGCT risk variant in the TERT locus.13 Conditional analysis suggested that the signals at rs4975605 and rs4635969 were independent because their effects remained detectable after adjustment for each other (rs4975605 adjusting for rs4635969: OR 1.60, CI 1.15 to 2.21, p=4.66×10−3; rs4635969 adjusting for rs4975605: OR 1.43, CI 1.02 to 1.99, p=3.63×10−2). An analysis by adaptive combination of p values was then performed to assess the significance of association signals from multiple SNPs within an entire gene, and the results showed p=6.01×10−5, p=0.0144, p=0.0403, and p=0.0028 for BAK1, DMRT1, TERT and KITLG, respectively.
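The r2 values quoted above measure pairwise LD. As a reminder of the definition, the Python sketch below computes r2 from phased two-SNP haplotypes. It assumes the haplotypes are already known and coded 0/1 per SNP; Haploview, which was used in the study, estimates haplotype frequencies from unphased genotypes, and that estimation step is not reproduced here. The function name and input format are hypothetical.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes.

    haplotypes: list of (a, b) pairs, each allele coded 0 or 1 (assumed coding).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n        # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n        # frequency of allele 1 at SNP B
    p_ab = sum(a * b for a, b in haplotypes) / n   # frequency of the 1-1 haplotype
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    d = p_ab - p_a * p_b                           # LD coefficient D
    return d * d / denom
```

An r2 near 1, as for some of the top BAK1 SNPs, means the two SNPs carry almost the same information, whereas the r2 of 0.04 between rs4975605 and rs4635969 is what makes two independent TERT signals plausible.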
Discussion
Three different mechanisms may underlie the totality of familial TGCT. First, familial TGCT may be a classical Mendelian disorder that is caused by germline mutations in rare, highly penetrant, yet-to-be-discovered genes. A second subset of familial TGCT may be a polygenic disorder associated with several common, low-penetrance susceptibility alleles. A third subset may be due primarily to shared environmental exposures in members of individual families. Of course, genetic and environmental factors may modulate the risk in each basic type. However, the clear replication signals of SNPs implicated in familial disease strongly suggest that familial, bilateral, and sporadic tumours are polygenic diseases driven by the same spectrum of genetic risk factors. Previous linkage studies that did not detect loci with consistently high logarithm of odds (LOD) scores,21 22 the three recent GWAS,11–13 and our findings are consistent with this model.
Our study provides the first replication of the recently identified TGCT risk loci in DMRT1 and TERT, and the second replication of previously identified predisposition alleles in BAK1 and KITLG. Moreover, we provided evidence for a new and independent signal in TERT that requires further verification. Although the reported GWAS focused on non-familial TGCT, they also included subsets of patients with familial disease or bilateral disease.11–13 All three GWAS demonstrated similar ORs in familial TGCT cases compared with those TGCT cases without a family history.11–13 Therefore, our results from an independent set of previously unstudied familial cases strengthen the notion that familial and sporadic TGCT are polygenic diseases associated with the same genetic factors.
We compared the ORs obtained in our study with those reported in the previous GWAS and found them to be quite similar. For rs4635969 in TERT, the ORs reported previously13 were 1.65 and 1.54, and ours was 1.59. For KITLG, a previous study12 reported ORs of 2.29 and 2.59 for rs995030, and we obtained an OR of 2.33 for rs2046971, which is in high LD with rs995030. For BAK1 and DMRT1, we obtained slightly higher ORs than previously reported: for rs210138 in BAK1, our OR was 1.80 compared with a reported OR12 of 1.5; for rs755383 in DMRT1, our OR was 1.67 compared with the previously reported13 ORs of 1.57 and 1.37. Previous studies also reported that <1% of sporadic cases were homozygous for the non-risk minor allele of rs4474514 in KITLG,11 and we observed a similar pattern in our data: one of 97 familial cases (1%) was homozygous for the non-risk minor allele of rs2046971 in KITLG, and none of the 22 sporadic bilateral cases was homozygous for this allele.
Our analytic method accounted for correlations between relatives, and thereby increased the power to detect associations because it allowed us to include all affected members from each individual family. This method may be useful to explore large scale genetic association analyses in other complex disorders with strong heritability but unclear linkage signals. Due to sample size limitations of the present study, our approach was better suited for testing prior hypotheses rather than agnostically detecting novel associations. The control group is a convenience sample not closely matched in age, but it is unlikely that this matters given the comparatively early onset of this disease. Another limitation was that bilateral and familial cases were not separated in this analysis. Notably, the three GWAS did not observe differences between these subgroups, suggesting that they do not represent biologically distinct entities.
In conclusion, this is the first large scale genotyping effort focusing exclusively on subjects with familial or bilateral TGCT. Using a statistical approach that accounted for familial relationships, we confirmed results from recent GWAS and identified familial/bilateral TGCT risk alleles in KITLG, BAK1, TERT, and DMRT1. We provided evidence for a new and independent signal in TERT that requires further verification. Together with the results from previous GWAS, our data suggest that familial, bilateral, and sporadic TGCT are polygenic diseases caused by the same spectrum of genetic risk factors.
We are grateful to the Protocol 02-C-0178 and PLCO participants for their valuable contributions.
CPK and SSH contributed equally to this work.
Funding This work was supported by the Intramural Research Program of the National Institutes of Health and the National Cancer Institute, and by a support services contract with Westat (N02-CP-65504).
Competing interests None.
Patient consent Obtained.
Ethics approval Ethics approval was provided by NCI IRB.
Provenance and peer review Not commissioned; externally peer reviewed.