Association between genetic polymorphisms and endometrial cancer risk: a systematic review

Introduction Endometrial cancer is one of the most commonly diagnosed cancers in women. Although there is a hereditary component to endometrial cancer, most cases are thought to be sporadic and lifestyle related. The aim of this study was to systematically review prospective and retrospective case–control studies, meta-analyses and genome-wide association studies to identify genomic variants that may be associated with endometrial cancer risk. Methods We searched MEDLINE, Embase and CINAHL from 2007 to 2019 without restrictions. We followed PRISMA 2009 guidelines. The search yielded 3015 hits in total. Following duplicate exclusion, 2674 abstracts were screened and 453 full-texts evaluated based on our pre-defined screening criteria. 149 articles were eligible for inclusion. Results We found that single nucleotide polymorphisms (SNPs) in HNF1B, KLF, EIF2AK, CYP19A1, SOX4 and MYC were strongly associated with incident endometrial cancer. Nineteen variants were reported with genome-wide significance and a further five with suggestive significance. No convincing evidence was found for the widely studied MDM2 variant rs2279744. Publication bias and false discovery rates were noted throughout the literature. Conclusion Endometrial cancer risk may be influenced by SNPs in genes involved in cell survival, oestrogen metabolism and transcriptional control. Larger cohorts are needed to identify more variants with genome-wide significance.



Introduction
Endometrial cancer is the most common gynaecological malignancy in the developed world. 1 Its incidence has risen over the last two decades as a consequence of the ageing population, fewer hysterectomies for benign disease and the obesity epidemic. In the USA, it is estimated that women have a 1 in 35 lifetime risk of endometrial cancer, and in contrast to cancers of most other sites, cancer-specific mortality has risen by approximately 2% every year since 2008, related to the rapidly rising incidence. 2 Endometrial cancer has traditionally been classified into type I and type II based on morphology. 3 The more common subtype, type I, mostly comprises endometrioid tumours and is oestrogen-driven, arises from a hyperplastic endometrium, presents at an early stage and has an excellent 5-year survival rate. 4 By contrast, type II includes non-endometrioid tumours, specifically serous, carcinosarcoma and clear cell subtypes, which are biologically aggressive tumours with a poor prognosis that are often diagnosed at an advanced stage. 5 Recent efforts have focused on a molecular classification system for more accurate categorisation of endometrial tumours into four groups with distinct prognostic profiles. 6 7 The majority of endometrial cancers arise through the interplay of familial, genetic and lifestyle factors. Two inherited cancer predisposition syndromes, Lynch syndrome and the much rarer Cowden syndrome, substantially increase the lifetime risk of endometrial cancer, but these only account for around 3-5% of cases. [8][9][10] Having first- or second-degree relative(s) with endometrial or colorectal cancer increases endometrial cancer risk, although a large European twin study failed to demonstrate a strong heritable link. 11 The authors failed to show that there was greater concordance in monozygotic than dizygotic twins, but the study was based on relatively small numbers of endometrial cancers.
Lu and colleagues reported an association between common single nucleotide polymorphisms (SNPs) and endometrial cancer risk, revealing the potential role of SNPs in explaining part of the risk in both the familial and general populations. 12 Thus far, many SNPs have been reported to modify susceptibility to endometrial cancer; however, much of this work predated genome-wide association studies and is of variable quality. Understanding genetic predisposition to endometrial cancer could facilitate personalised risk assessment with a view to targeted prevention and screening interventions. 13 This emerged as the most important unanswered research question in endometrial cancer according to patients, carers and healthcare professionals in our recently completed James Lind Womb Cancer Alliance Priority Setting Partnership. 14 It would be particularly useful for non-endometrioid endometrial cancers, for which advancing age is so far the only predictor. 15 We therefore conducted a comprehensive systematic review of the literature to provide an overview of the relationship between SNPs and endometrial cancer risk. We compiled a list of the most robust endometrial cancer-associated SNPs. We assessed the applicability of this panel of SNPs with a theoretical polygenic risk score (PRS) calculation. We also critically appraised the meta-analyses investigating the

Search strategy
We searched Embase, MEDLINE and Cumulative Index to Nursing and Allied Health Literature (CINAHL) databases via the Healthcare Databases Advanced Search (HDAS) platform, from 2007 to 2018, to identify studies reporting associations between polymorphisms and endometrial cancer risk. Key words including MeSH (Medical Subject Heading) terms and free-text words were searched in both titles and abstracts. The following terms were used: "endomet*","uter*", "womb", "cancer(s)", "neoplasm(s)", "endometrium tumour", "carcinoma", "adenosarcoma", "clear cell carcinoma", "carcinosarcoma", "SNP", "single nucleotide polymorphism", "GWAS", and "genome-wide association study/ies". No other restrictions were applied. The search was repeated with time restrictions between 2018 and June 2019 to capture any recent publications.

Eligibility criteria
Studies were selected for full-text evaluation if they were primary articles investigating a relationship between endometrial cancer and SNPs. Study outcome was either the increased or decreased risk of endometrial cancer relative to controls reported as an odds ratio (OR) with corresponding 95% confidence intervals (95% CIs).

Study selection
Three independent reviewers screened all articles uploaded to a screening spreadsheet developed by Helena VonVille. 17 Disagreements were resolved by discussion. Cronbach's α was calculated between reviewers and indicated high consistency at 0.92. Case-control, prospective and retrospective studies, genome-wide association studies (GWAS), and both discovery and validation studies were selected for full-text evaluation. Non-English articles, editorials, conference abstracts and proceedings, letters and correspondence, case reports and review articles were excluded.
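The review does not state how the reviewer consistency score was computed. As a minimal sketch, assuming each reviewer's screening decision is coded as 1 (include) or 0 (exclude) and each reviewer is treated as one "item" in the standard Cronbach's α formula, the calculation could look like this. The function name and the example decisions are hypothetical and are not taken from the study.

```python
from statistics import variance

def cronbach_alpha(ratings):
    """Cronbach's alpha for k raters scoring the same n articles.

    ratings: list of k lists, one per reviewer, each holding that
    reviewer's decision for every article in the same order
    (assumed coding: 1 = include, 0 = exclude).
    """
    k = len(ratings)
    n = len(ratings[0])
    # Variance of each reviewer's decisions across articles.
    rater_vars = [variance(r) for r in ratings]
    # Variance of the summed decision per article across reviewers.
    totals = [sum(r[i] for r in ratings) for i in range(n)]
    total_var = variance(totals)  # zero if every article gets the same total
    return k / (k - 1) * (1 - sum(rater_vars) / total_var)

# Hypothetical decisions from three reviewers on six abstracts.
reviewer_a = [1, 0, 1, 1, 0, 0]
reviewer_b = [1, 0, 1, 0, 0, 0]
reviewer_c = [1, 0, 1, 1, 0, 1]
alpha = cronbach_alpha([reviewer_a, reviewer_b, reviewer_c])
```

Values close to 1 indicate that the reviewers' decisions vary together across articles; the formula is undefined when the per-article totals have no variance.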
Candidate-gene studies with at least 100 women and GWAS with at least 1000 women in the case arm were selected to ensure reliability of the results, as explained by Spencer et al. 18 To construct a panel of up to 30 SNPs with the strongest evidence of association, those with the lowest p values were selected. For the purpose of an SNP panel, articles utilising broad European or multi-ethnic cohorts were selected. Where overlapping populations were identified, the most comprehensive study was included.

Data extraction and synthesis
For each study, the following data were extracted: SNP ID, nearby gene(s)/chromosome location, OR (95% CI), p value, minor or effect allele frequency (MAF/EAF), EA (effect allele) and OA (other allele), adjustment, ethnicity and ancestry, number of cases and controls, endometrial cancer type, and study type including discovery or validation study and meta-analysis. For risk estimates, a preference towards most adjusted results was applied. For candidate-gene studies, a standard p value of $<0.05$ was applied and for GWAS a p value of $<5\times10^{-8}$, indicating genome-wide significance, was accepted as statistically significant. However, due to the limited number of SNPs with p values reaching genome-wide significance, this threshold was then relaxed to $<1\times10^{-5}$, allowing for marginally significant SNPs to be included. As shown by Mavaddat et al, for breast cancer, SNPs that fall below genome-wide significance may still be useful for generating a PRS and improving the models. 19
We estimated the potential value of a PRS based on the most significant SNPs by comparing the predicted risk for a woman with a risk score in the top 1% of the distribution to the mean predicted risk. Per-allele ORs and MAFs were taken from the publications and standard errors (SEs) for the lnORs were derived from published 95% CIs. The PRS was assumed to have a Normal distribution, with mean $2\sum_i \beta_i p_i$ and SE, $\sigma$, equal to $\sqrt{2\sum_i \beta_i^2 p_i(1-p_i)}$, according to the binomial distribution, where $\beta_i$ is the per-allele lnOR and $p_i$ the allele frequency of SNP $i$, and the summation is over all SNPs in the risk score. Hence the relative risk (RR) comparing the top 1% of the distribution to the mean is given by $\exp(Z_{0.01}\sigma)$, where $Z$ is the inverse of the standard normal cumulative distribution.
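The theoretical PRS calculation described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' own code: each SNP is assumed to contribute independently, $\beta_i$ is taken as the natural log of the published per-allele OR, $Z_{0.01}$ is taken as the 99th percentile of the standard normal distribution, and the function names and the example panel are hypothetical.

```python
import math
from statistics import NormalDist

def se_from_ci(lower, upper, level=0.95):
    """Standard error of lnOR recovered from a published confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    return (math.log(upper) - math.log(lower)) / (2 * z)

def prs_sd(snps):
    """sigma = sqrt(2 * sum(beta_i^2 * p_i * (1 - p_i))).

    snps: iterable of (per_allele_or, allele_frequency) pairs.
    """
    var = sum(2 * math.log(or_) ** 2 * p * (1 - p) for or_, p in snps)
    return math.sqrt(var)

def rr_top_vs_mean(snps, top=0.01):
    """RR comparing the top `top` fraction of the PRS distribution to the mean."""
    z = NormalDist().inv_cdf(1 - top)  # upper-tail quantile, Z_0.01 by default
    return math.exp(z * prs_sd(snps))

# Hypothetical panel: (per-allele OR, effect allele frequency). Not study data.
example_panel = [(1.10, 0.35), (1.15, 0.20), (0.90, 0.45), (1.25, 0.03)]
rr = rr_top_vs_mean(example_panel)

# SE of lnOR from a reported OR interval, here 1.14 to 1.38.
se = se_from_ci(1.14, 1.38)
```

Because $\sigma$ depends on $\beta_i^2$, protective alleles (OR below 1) widen the distribution just as risk alleles do, so the sketch does not need the ORs to be aligned to the risk-increasing allele.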

Results
The flow chart of study selection is illustrated in figure 1. In total, 453 full-text articles were evaluated and, of those, 149 articles met our inclusion criteria. One study was excluded from table 1 because it had an Asian-only population, which would make it harder to compare with the rest of the results, all of which came from either multi-ethnic or Caucasian cohorts, as stated in our inclusion criteria for the SNP panel. 20 Any SNPs without 95% CIs were also excluded from any downstream analysis. Additionally, SNPs in linkage disequilibrium ($r^2>0.2$) with each other were examined, and of those in linkage disequilibrium, the SNP with the strongest association was reported. Per-allele ORs were used unless stated otherwise.
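The rule of keeping only the most strongly associated SNP among those in linkage disequilibrium can be illustrated with a simple greedy filter. This is a sketch under assumptions, not the procedure used in the review: pairwise $r^2$ values are assumed to be available from a reference panel, pairs with no recorded $r^2$ are assumed to be independent, and the identifiers and values below are hypothetical.

```python
def prune_by_ld(p_values, r2, threshold=0.2):
    """Keep the lowest-p SNP from each group of SNPs in LD.

    p_values: dict mapping SNP ID -> association p value.
    r2: dict mapping frozenset({snp_a, snp_b}) -> pairwise r^2.
    Pairs missing from r2 are treated as not in LD (an assumption).
    """
    kept = []
    # Visit SNPs from strongest to weakest association.
    for snp in sorted(p_values, key=p_values.get):
        in_ld = any(r2.get(frozenset((snp, k)), 0.0) > threshold for k in kept)
        if not in_ld:
            kept.append(snp)
    return kept

# Hypothetical SNPs: snpB is in LD with the stronger snpA and is dropped.
example_p = {"snpA": 1e-10, "snpB": 1e-6, "snpC": 1e-8}
example_r2 = {frozenset(("snpA", "snpB")): 0.8}
independent = prune_by_ld(example_p, example_r2)
```

Sorting by p value first means that a SNP is only ever compared with SNPs that have stronger evidence of association, so the retained SNP in each correlated group is always the one with the lowest p value.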

Top SNPs associated with endometrial cancer risk
Following careful interpretation of the data, 24 independent SNPs with the lowest p values that showed the strongest association with endometrial cancer were obtained (table 1). [21][22][23][24][25] These SNPs are located in or around genes coding for transcription factors, cell growth and apoptosis regulators, and enzymes involved in the steroidogenesis pathway. All the SNPs presented here were reported on the basis of a GWAS or in one case, an exome-wide association study, and hence no SNPs from candidate-gene studies made it to the list. This is partly due to the nature of larger GWAS providing more comprehensive and powered results as opposed to candidate gene studies. Additionally, a vast majority of SNPs reported by candidate-gene studies were later refuted by large-scale GWAS such as in the case of TERT and MDM2 variants. 26 27 The exception to this is the CYP19 gene, where candidate-gene studies reported an association between variants in this gene with endometrial cancer in both Asian and broad European populations, and this association was more recently confirmed by large-scale GWAS. 21 28-30 Moreover, a recent article authored by O'Mara and colleagues reviewed the GWAS that identified most of the currently known SNPs associated with endometrial cancer. 31 Most of the studies represented in table 1 are GWAS and the majority of these involved broad European populations. Those having a multi-ethnic cohort also consisted primarily of broad European populations. Only four of the variants in table 1 are located in coding regions of a gene, or in regulatory flanking regions around the gene. Thus, most of these variants would not be expected to cause any functional effects on the gene or the resulting protein. 
An eQTL search using GTEx Portal showed that some of the SNPs are significantly associated (p<0.05) with modified transcription levels of the respective genes in various tissues such as prostate (rs11263761), thyroid (rs9668337), pituitary (rs2747716), breast mammary (rs882380) and testicular (rs2498794) tissue, as summarised in table 2.
The only variant for which there was an indication of a specific association with non-endometrioid endometrial cancer was rs148261157 near the BCL11A gene. The A allele of this SNP had a moderately higher association in the non-endometrioid arm (OR 1.64, 95% CI 1.32 to 2.04; $p=9.6\times10^{-6}$) compared with the endometrioid arm (OR 1.25, 95% CI 1.14 to 1.38; $p=4.7\times10^{-6}$). 21 Oestrogen receptors α and β encoded by ESR1 and ESR2, respectively, have been extensively studied due to the assumed role of oestrogens in the development of endometrial cancer. O'Mara et al reported a lead SNP (rs79575945) in the ESR1 region that was associated with endometrial cancer ($p=1.86\times10^{-5}$). 24 However, this SNP did not reach genome-wide significance in a more recent larger GWAS. 21 No statistically significant associations have been reported between endometrial cancer and SNPs in the ESR2 gene region.

Cancer genetics
The MYC family of proto-oncogenes encode transcription factors that regulate cell proliferation, which can contribute to cancer development if dysregulated. The recent GWAS by O'Mara et al reported three SNPs within the MYC region that reached genome-wide significance with conditional p values reaching at least $5\times10^{-8}$. 35 To test the utility of these SNPs as predictive markers, we devised a theoretical PRS calculation using the log ORs and EAFs per SNP from the published data. The results were very encouraging with an RR of 3.16 for the top 1% versus the mean, using all the top SNPs presented in table 1 and 2.09 when using only the SNPs that reached genome-wide significance (including AKT1).

Controversy surrounding MDM2 variant SNP309
MDM2 negatively regulates the tumour suppressor gene TP53 and, as such, has been extensively studied in relation to its potential role in predisposition to endometrial cancer. Our search identified six original studies of the association between MDM2 SNP rs2279744 (also referred to as SNP309) and endometrial cancer, all of which found a statistically significant increased risk per copy of the G allele. Two more original studies were identified through our full-text evaluation; however, these were not included here as they did not meet our inclusion criteria: one due to small sample size, the other due to studying rs2279744 status dependent on another SNP. 36 37 Even so, the two studies were described in multiple meta-analyses that are listed in table 3. Different permutations of these eight original studies appear in at least eight published meta-analyses. However, even the largest meta-analysis contained <2000 cases (table 3). 38 In comparison, a GWAS including nearly 13 000 cases found no evidence of an association, with an OR and corresponding 95% CI of 1.00 (0.97 to 1.03) and a p value of 0.93 (personal communication). 21 Nevertheless, we cannot completely rule out a role for MDM2 variants in endometrial cancer predisposition as the candidate-gene studies reported larger effects in Asians, whereas the GWAS primarily contained participants of European ancestry. There is also some suggestion that the SNP309 variant is in linkage disequilibrium with another variant, SNP285, which confers an opposite effect.
It is worth noting that the SNP285C/SNP309G haplotype was observed in up to 8% of Europeans, thus requiring correction for the confounding effect of SNP285C in European studies. 39 However, aside from one study conducted by Knappskog et al, no other study, including the meta-analyses, corrected for the confounding effect of SNP285. 40 Among the studies presented in table 3, Knappskog et al (2012) reported that after correcting for SNP285, the OR for association of this haplotype with endometrial cancer was much lower, though still significant. Unfortunately, the meta-analyses that synthesised Knappskog et al (2012) as part of their analysis did not correct for SNP285C in the European-based studies they included. 38 41 42 It is also concerning that, in two instances, two meta-analyses using the same primary articles failed to report the same result. 38 42-44

Discussion
This article represents the most comprehensive systematic review to date regarding critical appraisal of the available evidence of common low-penetrance variants implicated in predisposition to endometrial cancer. We have identified the most robust SNPs in the context of endometrial cancer risk. Of those, only 19 were significant at genome-wide level and a further five were considered marginally significant. The largest GWAS conducted in this field was the discovery and meta-GWAS by O'Mara et al, which utilised 12 096 cases and 108 979 controls. 21 Despite the inclusion of all published GWAS and around 5000 newly genotyped cases, the total number did not reach anywhere near what is currently available for other common cancers such as breast cancer. For instance, BCAC (Breast Cancer Association Consortium) stands at well over 200 000 individuals with more than half being cases, and resulted in identification of ~170 SNPs in relation to breast cancer. 19 45 A total of 313 SNPs including imputations were then used to derive a PRS for breast cancer. 19 Therefore, further efforts should be directed to recruit more patients, with deep phenotypic clinical data to allow for relevant adjustments and subgroup analyses to be conducted for better precision.
A recent pre-print study by Zhang and colleagues examined the polygenicity and potential for SNP-based risk prediction for 14 common cancers, including endometrial cancer, using available summary-level data from European-ancestry datasets. 46 They estimated that there are just over 1000 independent endometrial cancer susceptibility SNPs, and that a PRS comprising all such SNPs would have an area under the receiver-operator curve of 0.64, similar to that predicted for ovarian cancer, but lower than that for the other cancers in the study. The modelling in the paper suggests that an endometrial cancer GWAS double the size of the current largest study would be able to identify susceptibility SNPs together explaining 40% of the genetic variance, but that in order to explain 75% of the genetic variance it would be necessary to have a GWAS comprising close to 150 000 cases and controls, far in excess of what is currently feasible.
We found that the literature consists mainly of candidate-gene studies with small sample sizes, meta-analyses reporting conflicting results despite using the same set of primary articles, and multiple reports of significant SNPs that have not been validated by any larger GWAS. Candidate-gene studies were indeed the most useful and cheapest technique available until the mid to late 2000s. However, a lack of reproducibility (particularly due to population stratification and reporting bias), uncertainty of reported associations, and considerably high false discovery rates make these studies much less appropriate in the post-GWAS era. Unlike the candidate-gene approach, GWAS do not require prior knowledge or selection of genes or SNPs, and provide vast amounts of data. Furthermore, both the genotyping process and data analysis phases have become cheaper, the latter particularly due to faster and open-access pre-phasing and imputation tools being made available.
It is clear from table 1 that some SNPs were reported with wide 95% CIs, which can be directly attributed to small sample sizes (particularly when restricting the cases to non-endometrioid histology only), low EAF or poor imputation quality. Thus, these should be interpreted with caution. Additionally, most of the SNPs reported by candidate-gene studies were not detected by the largest GWAS to date conducted by O'Mara et al. 21 However, this does not necessarily mean that the possibility of those SNPs being relevant should be completely dismissed. Moreover, meta-analyses were attempted for other variants; however, these showed no statistically significant association and many presented with high heterogeneity between the respective studies (data not shown). Furthermore, as many studies utilised the same set of cases and/or controls, conducting a meta-analysis was not possible for a good number of SNPs. It is therefore unequivocal that the literature is crowded with numerous small candidate-gene studies and conflicting data. This makes it particularly hard to detect novel SNPs and conduct meaningful meta-analyses.
We found convincing evidence for 19 variants that indicated the strongest association with endometrial cancer, as shown in table 1. The associations between endometrial cancer and variants in or around HNF1B, CYP19A1, SOX4, MYC, KLF and EIF2AK found in earlier GWAS were then replicated in the latest and largest GWAS. These SNPs showed promising potential in a theoretical PRS we devised based on published data. Using all 24 SNPs or only the genome-wide significant SNPs, women with a PRS in the top 1% of the distribution would be predicted to have a risk of endometrial cancer 3.16 and 2.09 times higher than the mean risk, respectively.
However, the importance of these variants and relevance of the proximate genes in a functional or biological context is challenging to evaluate. Long-distance promoter regulation by enhancers may disguise the genuine target gene. In addition, enhancers often do not loop to the nearest gene, further complicating the relevance of nearby gene(s) to a GWAS hit. In order to elucidate biologically relevant candidate target genes in endometrial cancer, O'Mara et al looked into promoter-associated chromatin looping using a modern HiChIP approach. 47 The authors utilised normal and tumoural endometrial cell lines for this analysis, which showed significant enrichment for endometrial cancer heritability, with 103 candidate target genes identified across the 13 risk loci identified by the largest ECAC GWAS. Notable genes identified here were CDKN2A and WT1, and their antisense counterparts. The former was reported to be near rs1679014 and the latter near rs10835920, as shown in table 1. Moreover, of the 36 candidate target genes, 17 were found to be downregulated while 19 were upregulated in endometrial tumours.
The authors also investigated overlap between the 13 endometrial cancer risk loci and top eQTL variants for each target gene. 47 In whole blood, of the two particular lead SNPs, rs882380 at 17q21.32 was a top eQTL for SNX11 and HOXB2, whereas rs937213 at 15q15.1 was a top eQTL for SRP14. In endometrial tumour, rs7579014 at 2p16.1 was found to be a top eQTL for BCL11A. This is particularly interesting because BCL11A was the only nearby/candidate gene that had a GWAS association reported in both endometrioid and non-endometrioid subtypes. The study looked at protein-protein interactions between endometrial cancer drivers and candidate target gene products. Significant interactions were observed with TP53 (most significant), AKT, PTEN, ESR1 and KRAS, among others. Finally, when 103 target candidate genes and 387 proteins were combined together, 462 pathways were found to be significantly enriched. Many of these are related to gene regulation, cancer, obesity, insulinaemia and oestrogen exposure. This study clearly showed a potential biological relevance for some of the SNPs reported by ECAC GWAS in 2018.
Most of the larger included studies used cohorts primarily composed of women of broad European descent. Hence, there are negligible data available for other ethnicities, particularly African women. This is compounded by the lack of reference genotype data available for comparative analysis, making it harder for research to be conducted in ethnicities other than Europeans. This poses a problem for developing risk prediction models that are equally valuable and predictive across populations. Thus, our results are also of limited applicability to non-European populations.
Furthermore, considering that non-endometrioid cases comprise a small proportion (~20%) of all endometrial cancer cases, much larger cohort sizes are needed to detect any genuine signals for non-endometrioid tumours. Most of the evaluated studies looked at either overall/mixed endometrial cancer subtypes or endometrioid histology, and those that looked at variant associations with non-endometrioid histology were unlikely to have enough power to detect any signal with statistical significance. This is particularly concerning because non-endometrioid subtypes are biologically aggressive tumours with a much poorer prognosis that contribute disproportionately to mortality from endometrial cancer. It is particularly important that attempts to improve early detection and prevention of endometrial cancer focus primarily on improving outcomes from these subtypes. It is also worth noting that, despite the current shift towards a molecular classification of endometrial cancer, most studies used the overarching classical Bokhman's classification system, type I versus type II, or no histological classification system at all. Therefore, it is important to create and follow a standardised and comprehensive classification system for reporting tumour subtypes for future studies.
This study compiled and presented available information for an extensively studied, yet unproven in large datasets, SNP309 variant in MDM2. Currently, there is no convincing evidence for an association between this variant and endometrial cancer risk. Additionally, of all the studies, only one accounted for the opposing effect of a nearby variant SNP285 in their analyses. Thus, we conclude that until confirmed by a sufficiently large GWAS, this variant should not be considered significant in influencing the risk of endometrial cancer and therefore not included in a PRS. This is also true for the majority of the SNPs reported in candidate-gene studies, as the numbers fall far short of being able to detect genuine signals.
This systematic review presents the most up-to-date evidence for endometrial cancer susceptibility variants, emphasising the need for further large-scale studies to identify more variants of importance, and validation of these associations. Until data from larger and more diverse cohorts are available, the top 24 SNPs presented here are the most robust common genetic variants that affect endometrial cancer risk. The multiplicative effects of these SNPs could be used in a PRS to allow personalised risk prediction models to be developed for targeted screening and prevention interventions for women at greatest risk of endometrial cancer.
Twitter Emma J Crosbie @Dremmacrosbie
Contributors CB planned the study, did the systematic review, analysed the data and wrote the manuscript. DJT and AL supervised the study and provided statistical support for the analysis. MJS supervised the study. NAJR and AN supported data acquisition. DGE and EJC designed and planned the study, provided supervision and wrote the manuscript. EJC provided funding for the study. All authors reviewed and approved the final manuscript.
Disclaimer The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.
Competing interests None declared.
Patient consent for publication Not required.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement Data are available upon reasonable request. The protocol for this systematic review was published at PROSPERO and the data that inform this manuscript are available upon reasonable request from the corresponding author.