Evaluation of polygenic risk scores for ovarian cancer risk prediction in a prospective cohort study

Background Genome-wide association studies have identified >30 common SNPs associated with epithelial ovarian cancer (EOC). We evaluated the combined effects of EOC susceptibility SNPs on predicting EOC risk in an independent prospective cohort study. Methods We genotyped ovarian cancer susceptibility single nucleotide polymorphisms (SNPs) in a nested case–control study (750 cases and 1428 controls) from the UK Collaborative Trial of Ovarian Cancer Screening trial. Polygenic risk scores (PRSs) were constructed and their associations with EOC risk were evaluated using logistic regression. The absolute risk of developing ovarian cancer by PRS percentiles was calculated. Results The association between serous PRS and serous EOC (OR 1.43, 95% CI 1.29 to 1.58, p=1.3×10–11) was stronger than the association between overall PRS and overall EOC risk (OR 1.32, 95% CI 1.21 to 1.45, p=5.4×10–10). Women in the top fifth percentile of the PRS had a 3.4-fold increased EOC risk compared with women in the bottom 5% of the PRS, with the absolute EOC risk by age 80 being 2.9% and 0.9%, respectively, for the two groups of women in the population. Conclusion PRSs can be used to predict future risk of developing ovarian cancer for women in the general population. Incorporation of PRSs into risk prediction models for EOC could inform clinical decision-making and health management.

AbsTrACT background genome-wide association studies have identified >30 common SnPs associated with epithelial ovarian cancer (eOc). We evaluated the combined effects of eOc susceptibility SnPs on predicting eOc risk in an independent prospective cohort study. Methods We genotyped ovarian cancer susceptibility single nucleotide polymorphisms (SnPs) in a nested case-control study (750 cases and 1428 controls) from the UK collaborative trial of Ovarian cancer Screening trial. Polygenic risk scores (PrSs) were constructed and their associations with eOc risk were evaluated using logistic regression. the absolute risk of developing ovarian cancer by PrS percentiles was calculated. results the association between serous PrS and serous eOc (Or 1.43, 95% ci 1.29 to 1.58, p=1.3×10 -11 ) was stronger than the association between overall PrS and overall eOc risk (Or 1.32, 95% ci 1.21 to 1.45, p=5.4×10 -10 ). Women in the top fifth percentile of the PrS had a 3.4-fold increased eOc risk compared with women in the bottom 5% of the PrS, with the absolute eOc risk by age 80 being 2.9% and 0.9%, respectively, for the two groups of women in the population. Conclusion PrSs can be used to predict future risk of developing ovarian cancer for women in the general population. incorporation of PrSs into risk prediction models for eOc could inform clinical decision-making and health management.

InTroduCTIon
Ovarian cancer (OC) is the sixth most common cancer in women with 7378 women diagnosed with the cancer in the UK in 2014. Epithelial ovarian cancer (EOC) is the most common type, accounting for 90% of OC, of which two-thirds are serous EOC. 1 The overall 10-year survival rate for OC is around 36% and is poorer when diagnosed at an advance stage. 2 Therefore, early diagnosis has the potential to improve survival rates; however, most women with symptoms present with advanced stage disease. Epidemiological studies have estimated the risk of OC in first-degree relatives of patients with OC to be threefold greater than the risk in the general population, indicating the importance of genetic factors in disease susceptibility. 3 High penetrance mutations in BRCA1 and BRCA2 explain around 25% of the observed familial relative risk (FRR) 3 and a further 10% is explained by moderate-risk mutations in MLH1, MSH2, MSH6, RAD51C, RAD51D and BRIP1. 4 Genome-wide association studies (GWAS) have identified ~30 common low-risk SNPs that are associated with EOC, accounting for approximately 6.4% of the FRR. 5 6 Additional potential susceptibility loci were identified by pleiotropy cancer GWAS analysis. 7 Although individually each SNP is associated with a low risk of EOC, in combination their effects on risk may be greater. Their inclusion in EOC risk prediction models may improve risk precision. 3 Providing refined personalised cancer risks can result in better risk stratification and hence help in improving early cancer detection and prevention.
Previous published studies investigating the combined effects of EOC SNPs in terms of polygenic risk scores (PRSs) have either been based on retrospective studies or overlapped with the association studies that led to the identification of the SNPs. Here, we use data from an independent prospective population-based cohort study, the UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS), to evaluate the EOC PRSs in predicting EOC risk prospectively.

MeThods study subjects
UKCTOCS is a randomised controlled trial for OC screening initiated in 2001. Postmenopausal women between age 50 and 74 years were recruited from 13 regional centres in National Health Service (NHS) Trusts in England, Wales and Northern Ireland. 8 9 Exclusion criteria included self-reported previous bilateral oophorectomy or ovarian malignancy, increased risk of OC due to history of OC or breast cancer in the family or known OC predisposing mutations, or had an active non-ovarian malignancy. [8][9][10] All participants provided written informed consent. All women completed a two-page 18-item baseline questionnaire at recruitment which captured data on the known OC risk factors (eg, personal/family cancer history, height/ weight, reproductive history, oral contraceptive pill (OCP) and hormone replacement therapy use). 9 Two postal follow-up questionnaires were sent to the women, with the first 3-5 years post-randomisation and the second in 2014. 8 A blood sample was donated by each woman at recruitment and serum was extracted as previously described. 9 Further details on sample processing and DNA extraction are provided in the online supplementary material. Notification of cancer diagnosis and deaths were through NHS Digital for the women residing in England and the Northern Ireland Cancer Registry and Central Services Agency for those residing in Northern Ireland. For women who developed OC, medical notes were retrieved and independently reviewed by an Outcomes Review Committee who assigned histological subtype, stage and grade. For the present study, we used a nested case-control design in women of self-reported white European ancestry. Cases were defined as women diagnosed with incident invasive epithelial ovarian or fallopian tube cancers or primary peritoneal cancer. Two random controls were selected per case, matched on regional centre, age at randomisation and year at recruitment. Following an outcomes review on 31 December 2014, 8 a total of 750 EOC cases and 1428 controls were included in the present analysis.

snP selection and genotyping
A panel of 96 SNPs were designed on the basis of their association with EOC risk from the meta-analysis of Kuchenbaecker et al 5 (online supplementary figure 1). These included SNPs from 50 regions that demonstrated associations at genome-wide significance level but also regions with associations at p<10 −5 . This was done in view of the ongoing OncoArray experiment 6 that was being performed on a larger sample size compared with the study of Kuchenbaecker et al. For each region, multiple correlated SNPs were selected for inclusion in the panel to ensure data availability in case of SNP genotyping failures.
Genotyping was performed on 96.96 dynamic arrays using the Fluidigm EP1 system (Fluidigm, San Francisco, California, USA) from 10 ng of DNA following the manufacturer's conditions using the pre-amplification protocol. The 96 SNPs included inventoried and Custom Assay-by-Design TaqMan probes (Applied Biosystems). Analysis was performed using Genotyping SNP Analysis software (Fluidigm). In total, 52 SNPs failed quality control (QC) due to poor clustering on the serum DNA samples, leaving 44 SNPs for analysis. To ensure consistency with the most recent GWAS results, SNPs were selected for inclusion in the PRS if they were in regions that showed genome-wide significance in the OncoArray experiment 6 (online supplementary figure 1). Of the 44 SNPs, 19 SNPs were from 15 genomewide significant regions reported in the OncoArray experiment (online supplementary figure 1). In total, 191 samples with call rates <80% were excluded (8%); therefore, 2178 samples passed QC (750 cases and 1428 controls). Any 96-well plates with pass rates <80% were excluded for a particular SNP. Also, 131 duplicate samples were included, and the concordance for duplicate samples was 97.6%. SNPs with significant deviations from Hardy-Weinberg equilibrium were assessed for quality of genotype clustering. All demonstrated clear clusters of genotyping calls and were therefore included in the analysis.

snP selection for inclusion in the Prs
The selection of genotyped SNPs for inclusion in the PRS was based on the latest results from the meta-analysis of GWAS for EOC reported by Phelan et al. 6 Two separate PRS were constructed: one for overall EOC and one for serous EOC. The overall PRS was constructed using the set of SNPs that showed associations with overall or any type of EOC at GWAS level. The serous PRS was constructed using the set of SNPs that showed associations with overall or any type of EOC at GWAS level, but also showed associations in the same direction for serous EOC. Only one SNP from each region was used. For each region, if the most significant SNP in the GWAS was among the SNPs genotyped, then it was selected for inclusion in the PRS. If the top SNP was not available, then we used a genotyped SNP (among the 44) from the region which had the highest correlation with the top published SNP. In total, 15 SNPs from 15 regions were included in the overall PRS construction and the same set of SNPs was selected for the serous PRS construction.

statistical analysis
To construct the PRS, we first evaluated all pairwise SNP interactions among the SNPs included in the PRS for their associations with EOC risk using logistic regression. In each model, the effects of both SNPs were included (as continuous variables taking values 0, 1 and 2) together with an interaction term between the SNPs. The quantile-quantile plot (qqplot) for all pairwise combinations were examined for the null hypothesis of no interaction effect. A Bonferroni correction was applied to adjust for multiple testing. The adjusted p value threshold was set at 8.3×10 −5 .
The PRS for individual i was defined as where g ki is the number of effect alleles for SNP k in individual i (taking values 0, 1 and 2) and β k is the per-allele log odds ratio (OR) for developing EOC associated with each copy of the effect allele of SNP k and n is the total number of SNPs used (n=15, online supplementary table 1). The log OR estimates for each SNP were obtained from the combined COGS and OncoArray association analyses of the Ovarian Cancer Association Consortium 6 (online supplementary table 1). For the overall PRS, we used the log OR estimates for developing overall EOC; for the serous PRS, we used the log OR estimates for serous EOC. For women with missing SNP genotypes (due to genotyping failures), we used the mean genotypes in controls and cases for each SNP separately.
Logistic regression was used to examine the association between the PRS and outcome. When investigating the overall PRS, the outcome (cases) was all invasive EOC (any histotype). When investigating the PRS for serous EOC, the outcome was serous EOC only. In each case, the PRS was treated as either a continuous or a categorical variable. The PRS was standardised by subtracting the mean in controls and dividing by the standard deviation (SD) in controls. When used as a categorical predictor, the PRS was grouped into the percentiles: [90%,95%) and [95%,100%] on the basis of the PRS distribution in controls with [0,5%) as the lowest 5% PRS group and [95%,100%] as the highest. The middle [40%,60%) group was used as the reference category. The observed ORs by PRS percentiles were compared with the theoretical OR predictions under a multiplicative polygenic model of inheritance. 11 Additional analyses were performed by adjusting for age and family history of EOC. Age was considered to be the age at EOC diagnosis for cases and at the age at last follow-up or age at the first non-EOC cancer (whichever occurred first) for controls. Two separate family history variables were constructed indicating (1) the number of relatives diagnosed with EOC in first-degree relatives and (2) indicating the number of affected relatives in both first-degree and second-degree relatives. We performed separate analyses adjusting for each of the two family history variables. Family history information was available at Cancer genetics the baseline questionnaire. The discriminatory power of the PRS was assessed by the C-statistic using R package 'pROC'. All statistical tests were two-tailed, and the significance threshold was set at 0.05.
The age-specific absolute risks of developing EOC by PRS percentiles were calculated by considering the OC incidence by PRS percentile and competing causes of mortality (other than EOC). We used data on EOC incidences and mortality rates were from the UK during 2012-2014. 1 12 The OC risks by PRS percentiles were calculated as is the OC incidence associated with PRS at is the PRS-specific survival function of being OC free at age t and S m ( t ) is the survival function at age t calculated on the basis of incidences of death from causes other than OC. To calculate the OC incidence for each PRS percentile, we assumed that the average, age-specific OC incidences, over all PRS percentiles, agreed with the population OC incidences and calculated the PRS-specific incidence recursively. Details of these methods have been described elsewhere. 11 13 14 resulTs Data on 1428 controls and 750 EOC cases were included in the overall PRS analysis. Data on the same 1428 controls and 489 serous EOC cases (including 417 high-grade serous) were included in the serous PRS association analysis. Table 1 summarises the study characteristics and provides a breakdown by histology subtype. A summary of the SNPs included the PRS is shown in online supplementary table 1.

Pairwise snP*snP interaction analysis
A total of 105 pairwise SNP*SNP interaction tests were performed but there was no significant evidence of interaction between any SNP pairs after a Bonferroni adjustment. The plot of observed against expected −log 10 p values did not show a significant departure from the expected values under the null hypothesis of no interaction (figure 1). Figure 2A shows that the PRS follows a nearly normal distribution in both controls and cases where the mean of cases was right shifted by 0.28 for the overall standardised PRS and 0.37 for the serous standardised PRS. There was a significant association between the overall PRS and overall EOC in the nested case-control study and the association was stronger between the serous PRS and serous EOC. The OR per unit SD was estimated to be 1.32 (95% CI 1.21 to 1.45, p=5.38 × 10 -10 ) for the overall PRS and 1.43 (95% CI 1.29 to 1.58, p=1.28 × 10 -11 ) for the serous PRS (table 2).

Association between Prs and ovarian cancer
The associations between the overall PRS and overall EOC, and between the serous PRS and serous EOC, stratified by age are shown in table 2. No significant interaction between age and EOC was observed in either the overall or the serous group (table 2). Discrimination as measured by the C-statistic was equal to 0.58 (95% CI 0.55 to 0.60) for the overall PRS and 0.60 (95% CI 0.57 to 0.63) for the serous PRS.
The estimated ORs by percentiles of PRS compared with the middle quintile ([40%,60%)) are shown in figure 2B. The ORs increased with increasing PRS percentiles for both the overall and the serous PRS. In the overall group, the OR for developing EOC for women in the lowest overall PRS percentile ([0,5%)) was estimated to be 0.53 (95% CI 0.30 to 0.89) and the OR for those in the highest overall PRS percentile ([95%,100%]) was 1.77 (95% CI 1.20 to 2.61) compared with the women in the middle overall PRS quintile. In the serous group, the OR for developing serous EOC was estimated to be 0.51 (95% CI 0.25 to 0.94) for women in the lowest serous PRS percentile ([0,5%)) and 2. 16 Figure 3 shows the predicted age-specific absolute risk of developing overall EOC by different PRS percentile categories. By age 80, the risks of developing EOC for women in the highest and lowest 5% of the PRS are predicted to be 2.9% and 0.9%, respectively.

dIsCussIon
Prior to incorporating the effects of common genetic variants into risk prediction models, it is important to calibrate the magnitude of their associations in studies which are independent of the original GWAS. This is the first prospective cohort study to evaluate the combined effects of GWAS identified common SNPs on EOC risk. We used data from a cohort of women in which women with known strong family history of OC or with known high-risk mutations were excluded. Therefore, the results are more applicable to women at 'low risk' of developing OC.

Cancer genetics
Our results show that SNPs combine multiplicatively on EOC risk and that the PRS predicts EOC risk prospectively. There was a stronger association with the serous EOC for predicting the risk of serous EOC. This was expected as most of the SNPs used in the PRS construction showed stronger associations with serous EOC in the published GWAS. The empirical OR estimates for EOC associated with different percentiles of the PRS were in line with the theoretical expected values under the multiplicative model, suggesting the PRS is calibrated at the extremes of the distribution, although the OR estimates are associated with wide CIs. There was 3.4-fold difference in the risk of EOC between women at the 5th and 95th percentiles of the PRS. However, the discriminatory ability of the 'PRS alone' is modest with an area under the receiver-operating characteristic curve of 0.58-0.60 (based on 1428 controls, 750 overall OC cases, 489 serous OC cases). There was no evidence of an interaction between the PRS and age in our study, suggesting that the relative effect of the PRS remains constant with age. Little changes were observed in the OR estimates associated with the PRS after adjusting for family history of OC (table 3). Overall, family history of OC was not significantly associated with EOC risk in the present study, but this could be a consequence of the study design. One of the eligibility criteria for inclusion in the UKCTOCS trial was that women had to be at low risk of familial OC; therefore, the overall cohort is 'biased' towards women without significant family history of OC. However, the reduction in the effect size for family history after adjusting for the PRS is consistent with the predicted contribution of the SNPs to the familial risk of OC. To assess the effect of family history on OC accurately and the possible attenuation in the family history association after taking into account the PRS, larger studies with more representative samples of women from the population (with respect to family history) would be required.
The current estimate of lifetime EOC risk in the UK general population is 1.86% based on 2014 data. Our results show that using the PRS alone results in a cumulative EOC risk by age 80 of 2.92% for women in the highest 5% of the overall PRS percentiles and 0.88% for women in the lowest 5% of the PRS. Although such differences alone may not lead to changes in the clinical management of women (eg, use of risk-reducing treatments such as salpingo-oophorectomy), the PRS in combination In the overall PRS analysis, we used cases of any type of ovarian cancers and in the serous PRS analysis we used cases of serous ovarian cancer.  17 Although the present study is limited by the inclusion of women with no significant family history of OC, the findings are relevant to tailoring screening efforts in the future. In addition, these PRS can result in clinically significant differences in risk when used in combination with mutations in moderate penetrance genes such as RAD51C, RAD51D and BRIP1. 16 Hence, risk modelling incorporating a combination of PRS, other rare genetic susceptibility variants, family history and epidemiological factors may enable population risk stratification to identify individuals who will benefit from targeted interventions. For example, risk-reducing salpingo-oophorectomy has been suggested to be cost-effective at >4%-5% lifetime risk of OC. 18 19 This may provide clinical utility for undertaking surgical prevention above these levels of risk. A change in guidelines to enable women at above these risk thresholds to benefit from surgical prevention has been advocated. 20 The present study has also several limitations. Some of the study participants had been diagnosed with another cancer prior to their recruitment into the study. These women were included in the analysis (52 incident cases, 111 controls) in line with previous GWAS. After excluding the women with a history of cancer, the results remain similar and the conclusions were not influenced by these assumptions (online supplementary tables 2-4 and supplementary figure 2-5). Although the UKCTOCS study was not included in the recent GWAS, 6 since this is a national study, it is possible that some incidental overlaps may exist with samples included in the Phelan et al study, 6 if study participants enrolled independently in other studies. On further investigation by the coordinating centre, 34 incident OC cases and 5 of the controls in the study were also included in one of the case-control studies included in Phelan et al. 6 After excluding the overlapping individuals the results remained virtually identical (data not shown). Another limitation is that the majority of the women were eligible to participate in the UKCTOCS study if they did not have family history of OC. Therefore, we were not able to obtain an unbiased estimates of the effect of family history. In the present study, the OR estimate associated with family history of OC is substantially lower compared with studies of familial risks of OC, 3 but this is expected under the present design. A possible further implication of the present study design (ie, women selected for no family history) is a possible attenuation of the effect of the PRS due to the fact that common variants are expected to be confounded with family history of OC. However, the OR estimates for different percentiles of the PRS were in line with those expected under a multiplicative model, suggesting that this is unlikely to result in a substantial bias in the PRS associations. We note that although the multiplicative model assumption is further supported by the fact that no SNP*SNP interactions were detected, the current study is underpowered to investigate pairwise interactions of modest effects (online supplementary table 5). Larger studies will have to investigate this. The sample size is also limited for assessing the associations of the PRS with different histotypes, other than serous EOC. Although GWAS have demonstrated associations between SNPs and other EOC histotypes, the number of endometrioid, clear cell and mucinous cancers were too small and the PRS did not show evidence associations with these histotypes (online supplementary table 6). Additionally, the latest GWAS has identified many more common SNPs (~30) associated with EOC which were estimated to account for 6.4% of the polygenic risk in the population. 6 The 15 SNPs included in the PRS and evaluated in the present study explained only 3.4% of the polygenic risk. Therefore, further improvement could be achieved by incorporating these additional SNPs into the PRS or by constructing PRS which include both genomewide significant SNPs and SNPs with higher p values of association using penalised regression models. [21][22][23] In conclusion, this paper is the first evaluation of the association of the PRS with EOC in a prospective general population cohort. It demonstrates that the PRS based on published SNP effect sizes is well calibrated and the PRS is a strong risk factor for EOC that contributes towards the discrimination of women who will develop EOC. It will be necessary to incorporate the PRS in comprehensive OC risk prediction models together with other risk factors for the disease and assess the improvement in risk prediction in prospective studies. Such comprehensive risk models will facilitate the clinical decision-making and health management for at-risk women and provide more personalised risk management.