Population-based targeted sequencing of 54 candidate genes identifies PALB2 as a susceptibility gene for high-grade serous ovarian cancer

Purpose The known epithelial ovarian cancer (EOC) susceptibility genes account for less than 50% of the heritable risk of ovarian cancer, suggesting that other susceptibility genes exist. The aim of this study was to evaluate the contribution to ovarian cancer susceptibility of rare deleterious germline variants in a set of candidate genes. Methods We sequenced the coding region of 54 candidate genes in 6385 invasive EOC cases and 6115 controls of broad European ancestry. Genes with an increased frequency of putative deleterious variants in cases versus controls were further examined in an independent set of 14 135 EOC cases and 28 655 controls from the Ovarian Cancer Association Consortium and the UK Biobank. For each gene, we estimated the EOC risks and evaluated associations between germline variant status and clinical characteristics. Results The ORs for high-grade serous ovarian cancer were 3.01 for PALB2 (95% CI 1.59 to 5.68; p=0.00068), 1.99 for POLK (95% CI 1.15 to 3.43; p=0.014) and 4.07 for SLX4 (95% CI 1.34 to 12.4; p=0.013). Deleterious mutations in FBXO10 were associated with a reduced risk of disease (OR 0.27, 95% CI 0.07 to 1.00, p=0.049). However, based on the Bayes false discovery probability, only the association for PALB2 in high-grade serous ovarian cancer is likely to represent a true positive. Conclusions We have found strong evidence that carriers of PALB2 deleterious mutations are at increased risk of high-grade serous ovarian cancer. Whether the magnitude of risk is sufficiently high to warrant the inclusion of PALB2 in cancer gene panels for ovarian cancer risk testing is unclear; much larger sample sizes will be needed to provide sufficiently precise estimates for clinical counselling.


Original research

INTRODUCTION
Rare, predicted deleterious variants in multiple genes have been shown to be associated with a moderate to high risk of epithelial ovarian cancer (EOC). These include the DNA double-strand break repair genes BRCA1, 1 BRCA, 2 BRIP1, 3 RAD51C, and RAD51 4 , and the mismatch repair genes MSH2 and MSH6. 5 6 ANKRD11, FANCM, PALB2 and POLE have recently been reported as possible susceptibility genes. [7][8][9] Multiple common variants conferring weaker risk effects have also been identified, 10-17 some of which modify EOC risk in carriers of more highly penetrant gene mutations. 18 19 EOC is heterogeneous, with five main histotypes: high-grade serous (HGSOC), low-grade serous, endometrioid, clear cell and mucinous ovarian cancer. These have different clinical characteristics and outcomes and are characterised by different germline and somatic genetic changes that result in the perturbation of different molecular pathways. For example, germline mutations in DNA double-strand break repair genes predispose to HGSOC, while germline mutations in mismatch repair genes increase the risk of the endometrioid and clear cell histotypes. 6

The known susceptibility alleles account for less than 50% of the excess familial risk of ovarian cancer, suggesting that other susceptibility genes and alleles exist. 15 The unexplained genetic component of risk is likely to be made up of a combination of common genetic variants conferring weak effects and uncommon alleles conferring weak to moderate relative risks (less than 10-fold).
The aim of this study was to identify additional ovarian cancer susceptibility genes using case-control sequencing of candidate genes. Candidates were identified through several approaches, including their known function in pathways that are associated with ovarian cancer development and whole exome sequencing (WES) studies of ovarian cancer cases that have identified putative deleterious mutations in genes not previously evaluated for EOC risk.

MATERIAL AND METHODS
Selection of candidate genes
Genes based on known biological function
As several EOC susceptibility genes are involved in DNA double-strand break repair and Fanconi anaemia (FA), 8 we selected genes involved in these pathways. FA is a rare genetic disease characterised by chromosomal instability, hypersensitivity to DNA crosslinking agents, defective DNA repair, severe bone marrow failure, cancer susceptibility and many congenital defects. To date, 22 FA genes have been identified, of which eight have previously been evaluated in ovarian cancer case-control studies: 3  We also included FANCN (PALB2), which has been studied previously in ovarian cancer 3 9 20-22 but whose association with EOC risk is equivocal. Eight candidate genes involved in other aspects of DNA repair were also included: ALKBH3, CHEK2, GTF2H4, POLE, POLK, RDM1 and XRCC1.

Genes from whole exome sequencing studies (WES)
Twelve genes (BUB1B, C5orf28, C6, DNAJB4, EXO1, LIG4, MKNK2, MMRN1, PARP1, RAD52, SMC1A and SNRNP200) were selected from WES analysis of EOC cases where putative deleterious (truncating) mutations were identified at a greater frequency in cases compared with publicly available WES data from controls reported by the NHLBI GO Exome Sequencing Project and the Exome Aggregation Consortium databases (http://exac.broadinstitute.org). Germline WES data for EOC cases were available for 412 HGSOC cases from the Cancer Genome Atlas ovarian cancer study; 513 ovarian cancer cases from an Australian case series 6; 97 familial non-BRCA1/BRCA2 ovarian cancer cases from the Gilda Radner Familial Ovarian Cancer Registry and 54 ovarian cancer cases from the UK Familial Ovarian Cancer Registry.
Four genes from these WES studies (GANC, KNTC1, PSG6 and UPK2) were selected because more than one family member diagnosed with ovarian cancer from 10 familial cases carried the same truncating mutation in one of these genes.

Study subjects
We used case-control data from targeted sequencing, exome sequencing and array-based genotyping.

Targeted sequencing
We included 5914 EOC cases and 5479 controls of European ancestries from 19 studies: 13 case-control studies, 1 familial ovarian cancer study from Poland, 2 clinical trials and 3 case-only studies (online supplementary table 1). 14 HGSOC cases were preferentially plated out for sequencing where possible.

Exome sequencing
We extracted data on the 54 candidate genes from 829 cases and 913 controls from two ovarian cancer case-control studies (MDA [23][24][25] and NCO 14 ) for which whole exome sequence data were available (online supplementary table 1).

Variants from genotyping array data
For genes that reached nominal significance in the combined analysis of the targeted sequencing and exome sequencing data, we extracted genotypes of any deleterious variants included on the OncoArray and the UK Biobank Axiom Array. These two arrays were used to genotype up to 18 936 controls and 13 288 cases from the Ovarian Cancer Association Consortium (OCAC) 15 and 9725 controls and 858 cases from the UK Biobank GWAS (https://www.ukbiobank.ac.uk/), respectively. Samples that overlapped with the sequencing studies were excluded from the analysis.
All studies had ethics committee approval, and all participants provided informed consent.
Sequencing reads were demultiplexed and then aligned against the human genome reference sequence (hg19) using the Burrows-Wheeler Aligner. 26 The Genome Analysis Toolkit 27 was used for base quality-score recalibration, local indel realignment and variant calling. Finally, ANNOVAR 28 was used for variant annotation. Variants were called if (1) genotype information was available from a chip genotype for that sample or (2) the variant was present in more than one amplicon or (3) read depth was ≥15 and alternate allele frequency ≥40% or (4) read depth was ≥100 and alternate allele frequency ≥25%. These thresholds were defined using the results from sequencing of positive controls with known variants and genotype information from chip array genotyping of overlapping samples.  29 was used to merge the overlapped paired-end reads into one read, using default parameters. Reference genome alignment and joint genotype calling were performed according to a pipeline described in Yu et al. 30 The coding sequences and splice sites of all 54 genes were extracted. Fifty-three genes with 100% average coverage at 10X were included in the analysis. GTF2H4 was excluded from the analysis, as the average coverage was only 43%.
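The four calling criteria above can be read as a simple "any one of" rule. The following is a minimal sketch of that rule only, not the pipeline used in the study; the class `VariantCall` and its field names are hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Hypothetical per-sample summary of one candidate variant."""
    read_depth: int              # total reads covering the site
    alt_allele_fraction: float   # fraction of reads with the alternate allele (0 to 1)
    n_amplicons: int = 1         # number of amplicons in which the variant is seen
    on_chip: bool = False        # True if a chip genotype for this sample is available


def passes_call_filter(v: VariantCall) -> bool:
    """Return True if at least one of the four criteria in the text is met."""
    if v.on_chip:                                              # criterion (1)
        return True
    if v.n_amplicons > 1:                                      # criterion (2)
        return True
    if v.read_depth >= 15 and v.alt_allele_fraction >= 0.40:   # criterion (3)
        return True
    if v.read_depth >= 100 and v.alt_allele_fraction >= 0.25:  # criterion (4)
        return True
    return False
```

Under this reading, a site with depth 150 and an alternate allele fraction of 30% is called through criterion (4), while the same fraction at depth 20 is not called unless criterion (1) or (2) applies.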
Deleterious variants were defined as those predicted to result in protein truncation (frameshift indel, splice site, nonsense mutations and start loss) or predicted to be deleterious and/or likely deleterious by ClinVar. 31 Any exonic single nucleotide variants within 3 bp of the exon-intron boundary and any intronic variants within 20 bp of the exon-intron boundary at the 5-prime end, and 6 bp at the 3-prime end, were evaluated using the software MaxEntScan to identify those most likely to disrupt splicing. 32 Variants with a MaxEntScan score that decreased by more than 40% compared with the reference sequence and having a reference sequence score ≥3 were considered deleterious. Sequencing alignments were confirmed by visual inspection using the Integrative Genomics Viewer. 33
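The MaxEntScan rule above combines a minimum reference score with a relative drop in score. A minimal sketch of that decision rule follows, assuming the two scores have already been computed by MaxEntScan; the function name and parameter names are hypothetical.

```python
def splice_disrupting(ref_score: float,
                      alt_score: float,
                      min_ref_score: float = 3.0,
                      min_relative_drop: float = 0.40) -> bool:
    """Apply the two-part rule from the text to precomputed MaxEntScan scores.

    A variant is flagged when the reference site scores at least
    `min_ref_score` and the variant score is more than
    `min_relative_drop` (as a fraction) below the reference score.
    """
    if ref_score < min_ref_score:
        # Weak reference site: not considered, regardless of the drop.
        return False
    relative_drop = (ref_score - alt_score) / ref_score
    return relative_drop > min_relative_drop
```

For instance, a reference score of 8.0 falling to 4.0 is a 50% decrease and would be flagged, whereas a fall to 6.0 (25%) would not.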

Risk estimation and genotype-phenotype analyses
We used a simple burden test for association between deleterious variants and ovarian cancer risk on a gene-by-gene basis. The burden test was based on unconditional logistic regression adjusted for country (Australia, Denmark, Germany, Poland, the UK and the USA) and sequencing method (targeted sequencing or exome sequencing). ORs and associated 95% CI were calculated.
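To make the quantities concrete, the sketch below computes an unadjusted OR and a Wald 95% CI from carrier counts in a single 2×2 table. This is a deliberate simplification: the analysis described above used logistic regression adjusted for country and sequencing method, which this example does not attempt. The function name is hypothetical and the counts in the usage line are made-up illustrative numbers, not study data.

```python
import math


def odds_ratio_wald(case_carriers: int, case_total: int,
                    control_carriers: int, control_total: int,
                    z: float = 1.96) -> tuple[float, float, float]:
    """Unadjusted carrier-vs-non-carrier OR with a Wald confidence interval."""
    a = case_carriers
    b = case_total - case_carriers
    c = control_carriers
    d = control_total - control_carriers
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for a Wald interval")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of the log OR
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# Illustrative counts only (not taken from the study).
or_hat, ci_low, ci_high = odds_ratio_wald(30, 1000, 10, 1000)
print(f"OR {or_hat:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```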

Missense variant analyses
We also identified multiple rare (minor allele frequency <1%) missense variants that have an unknown functional effect on the protein. We used the rare admixture likelihood burden test 34 to test these variants for association. We excluded any missense variants classified as deleterious and classified the remaining variants by whether or not they are predicted to have a damaging effect on protein function by two out of three prediction tools: SIFT (score <0.05), 35 PolyPhen-2 36 (classified as probably damaging or damaging) and PROVEAN 37 (score ≤−2.5). Subjects with a missense variant call rate less than 80% and variants with a call rate less than 80% or with genotype frequencies inconsistent with Hardy-Weinberg equilibrium (p<10⁻⁵) were excluded.
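The two-out-of-three rule above can be sketched as a simple vote over the three predictors. The sketch assumes the scores and the PolyPhen-2 label have already been obtained from the respective tools; the function name, the label strings and the handling of missing predictions (treated as "no vote") are assumptions made for illustration.

```python
from typing import Optional

# Labels counted as damaging, following the wording in the text (assumed spelling).
DAMAGING_POLYPHEN_LABELS = {"probably damaging", "damaging"}


def predicted_damaging(sift_score: Optional[float],
                       polyphen_label: Optional[str],
                       provean_score: Optional[float]) -> bool:
    """Return True if at least two of the three tools predict a damaging effect."""
    votes = 0
    if sift_score is not None and sift_score < 0.05:
        votes += 1
    if (polyphen_label is not None
            and polyphen_label.strip().lower() in DAMAGING_POLYPHEN_LABELS):
        votes += 1
    if provean_score is not None and provean_score <= -2.5:
        votes += 1
    return votes >= 2
```

A variant with a SIFT score of 0.01 and a PROVEAN score of -3.1 would count as damaging even if PolyPhen-2 labels it benign, because two of the three criteria are met.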

Germline deleterious mutations in ovarian cancer cases and controls
Sequencing results were available for 6385 EOC cases and 6115 controls after quality control analysis. The characteristics of these individuals by study are summarised in online supplementary table 1. Most EOC cases were serous histotype (n=6304, 98.7%), of which 5951 were the HGSOC histotype (93.2%).   table 4). Given the evidence for association of multiple FA genes with EOC risk, we also carried out a burden test to compare the frequency of deleterious variants in any of the eight genes that were not significantly associated with ovarian cancer risk individually (FANCA, FANCB, FANCC, FANCD2, FANCE, FANCG, FANCI and FANCL). A combined analysis has greater power if multiple genes are associated with risk but their individual effect sizes are too small to detect. There was no significant difference in the frequency of deleterious variants in cases (96/6184, 1.6%) and controls (85/6089, 1.4%) (p=0.50).

Validation analyses in ovarian cancer case-control studies
We also evaluated associations of deleterious variants in POLK, PALB2 and SLX4 with EOC risk based on germline genotyping data for 13 277 EOC cases and 18 930 controls from OCAC and for 858 EOC cases and 9725 controls from UK Biobank. For OCAC samples, data were available for six deleterious non-monomorphic variants in PALB2; for UK Biobank samples, data were available for seven PALB2 and one POLK deleterious variants (table 2, list of variants in online supplementary table 5).
In OCAC case-control analyses, PALB2 variants showed a non-significant increased risk of EOC (OR 2.10, 95% CI 0.74 to 5.94, p=0.16). The strength of this association increased when the analysis was restricted to 6181 HGSOC cases (OR 3.48, 95% CI 1.10 to 11.1, p=0.035). In UK Biobank, we observed a weak association for PALB2 mutations with EOC risk (OR 3.12, 95% CI 0.87 to 11.2, p=0.081). There was no evidence of risk association for mutations in POLK (table 2).
We used an approximate Bayes factor to calculate the Bayes false discovery probability (BFDP) described by Wakefield 38 for PALB2, SLX4, POLK and FBXO10 based on several different priors and assuming that the associated risk is unlikely to be greater than an OR of 4 (table 3). The evidence for association of PALB2 was strong with a BFDP of less than 15% when the prior on the alternative hypothesis is 0.1. The nominally significant associations for the other three genes are likely to be false positives.
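The BFDP calculation can be sketched from a reported OR and its 95% CI using Wakefield's approximate Bayes factor. The code below is a hedged illustration, not the authors' implementation. In particular, how "unlikely to be greater than an OR of 4" is translated into a prior variance is an assumption here: the prior on the log OR is taken to be normal with mean zero, with `prior_quantile` standard deviations reaching log(4). The function and parameter names are hypothetical.

```python
import math


def approximate_bayes_factor(or_hat: float, ci_low: float, ci_high: float,
                             or_upper: float = 4.0,
                             prior_quantile: float = 1.645) -> float:
    """Approximate Bayes factor for the null versus the alternative.

    V is the variance of the estimated log OR, recovered from the 95% CI.
    W is the prior variance of the log OR under the alternative (assumed form).
    """
    theta_hat = math.log(or_hat)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    v = se ** 2
    w = (math.log(or_upper) / prior_quantile) ** 2
    z_squared = theta_hat ** 2 / v
    return math.sqrt((v + w) / v) * math.exp(-0.5 * z_squared * w / (v + w))


def bfdp(abf: float, prior_alternative: float) -> float:
    """Bayes false discovery probability given the prior on the alternative."""
    prior_odds_null = (1 - prior_alternative) / prior_alternative
    x = abf * prior_odds_null
    return x / (1 + x)


# PALB2 estimate for HGSOC as reported in the text.
abf_palb2 = approximate_bayes_factor(3.01, 1.59, 5.68)
print(round(bfdp(abf_palb2, 0.1), 2))
```

With these assumed settings and a prior of 0.1 on the alternative, the result for PALB2 falls below the 15% mark stated above; a different choice of `prior_quantile` (for example 1.96) shifts it slightly upwards.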

Predicting the functional impact of missense coding variants
Combining the whole exome and targeted sequencing data, we identified 5265 unique missense variants with minor allele frequency less than 1% in the 54 genes (online supplementary table 6). We used the in silico software programs SIFT, PolyPhen-2 and PROVEAN to evaluate the predicted impact of these variants on protein function for each gene. Of the 5265 variants, 2111 were classified as 'deleterious' based on at least 2 out of 3 of these classifiers. We found weak evidence for association with increased EOC risk for rare missense variants in DUOX1 and PAK4 using burden testing (p=0.015 and 0.025, respectively) (online supplementary table 7); for DUOX1, the strength of this association improved when the analyses were restricted to the HGSOC histotype (p=0.0061). When we performed the same analyses for 1493 very rare variants (MAF<0.001), we observed nominally significant associations for missense variants in DUOX1 and FANCE (p=0.015 and 0.034, respectively).

DISCUSSION
We have evaluated the association between putative deleterious variants in 54 genes with the risk of HGSOC through a combination of whole exome and targeted sequencing analysis in 5951 cases and 6115 controls of broad European ancestries. We found evidence for four genes-PALB2, POLK, SLX4 and FBXO10-associated with HGSOC risk. Association analysis in an additional 14 135 ovarian cancer cases and 28 655 controls genotyped through OCAC and the UK Biobank provided further support for PALB2 as a HGSOC susceptibility gene.
The probability that a genetic association deemed statistically significant is a false positive depends on the prior of the null hypothesis and the power of the study to detect an effect size plausible under the alternative hypothesis. We calculated Wakefield's BFDP 38 based on several different priors to further evaluate the likelihood that PALB2, POLK, SLX4 and FBXO10 are EOC susceptibility genes. If we assume the prior on the alternative to be 1 in 10 or 1 in 20, the BFDPs for the association of deleterious variants in PALB2 with HGSOC are 0.14 and 0.26, respectively. These moderately strong priors are reasonable given the evidence for the association from previously published studies. 20 Two studies have reported nominally significant associations for PALB2, with OR 4.4 (95% CI 2.1 to 9.1) 20 and OR 2.87 (95% CI 1.61 to 4.74). 21 Kotsopoulos and colleagues reported an increased risk that was not significant (OR 4.55, 95% CI 0.76 to 27) and, in a subset of the samples included in this study, we also found a non-significant increase in risk (OR 3.2, 95% CI 0.86 to 12). 3
It is possible that cryptic population structure could cause spurious association in these data. Principal component analysis is one approach to reducing the risk of such bias, but there are too few common variants in the regions covered by the targeted sequencing panel to do a principal component analysis, and the chip genotyping data that would be required for such an analysis are not available for all the samples. Adjusting for country of origin and restricting the analysis to samples from individuals of broad European ancestries should reduce any problem with population stratification.
We lacked the statistical power to identify susceptibility genes conferring relative risks of less than 2 (figure 1). Our use of targeted sequencing and a definition of deleterious variants as those that likely truncate the protein product will probably have underestimated the true prevalence of deleterious variants in these genes. Incomplete coverage of each gene will have missed some small indels and single nucleotide variants. Amplicon-based sequencing will also miss large deletions and rearrangements, which are relatively common in some genes. 39 40 Finally, any functional mutations in the non-coding region of these genes will have been missed. 41
Some commercial gene-panel tests for hereditary breast-ovarian cancer already include PALB2. However, whether there is clinical utility in testing unaffected women for deleterious mutations in PALB2 is not clear given the uncertainties in the risk estimates for this gene. There is no consensus over the risk threshold at which preventative surgery should be offered; many cancer genetics clinics in the UK will refer women if their predicted lifetime risk of EOC is greater than 10%. Others have suggested that the risk threshold should be lower given the low-risk nature of the intervention; prophylactic surgery has been shown to be cost-effective for women at a lifetime risk of 5%. Recent updates to the US National Comprehensive Cancer Network Guidelines recommend considering risk-reducing salpingo-oophorectomy in carriers of moderate risk genes if the lifetime risk of such mutation carriers exceeds 2.6%. Based on our data and population data for ovarian cancer incidence in England and Wales in 2016, the cumulative risk of ovarian cancer by age 80 for a carrier of a deleterious PALB2 mutation is 3.2% (figure 2). Thus, under this threshold, a woman carrying a PALB2 deleterious mutation would be eligible for prophylactic surgery. However, the CIs for this estimate range from 1.8% to 5.7%.
Very large, well-designed case-control studies will be required to provide more precise, unbiased estimates of risk suitable for clinical counselling.
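A cumulative risk of the kind quoted above is typically obtained by scaling age-specific population incidence by the carrier relative risk and accumulating over age. The sketch below illustrates that arithmetic under strong simplifying assumptions: the OR is treated as a constant relative risk at all ages, competing mortality is ignored, and the incidence rates are invented placeholders rather than the England and Wales 2016 data, so the output does not reproduce the 3.2% estimate. The function name is hypothetical.

```python
import math


def cumulative_risk(incidence_per_100k: list[float],
                    band_width_years: float,
                    relative_risk: float = 1.0) -> float:
    """Cumulative risk over consecutive age bands of equal width.

    Each entry is an annual incidence rate per 100 000 for one age band.
    Risk = 1 - exp(-sum of relative_risk * rate * band width).
    """
    cumulative_hazard = sum(relative_risk * rate / 100_000 * band_width_years
                            for rate in incidence_per_100k)
    return 1 - math.exp(-cumulative_hazard)


# Placeholder annual rates per 100 000 for eight 10-year bands (ages 0 to 80).
# These values are invented for illustration and are NOT real incidence data.
placeholder_rates = [1, 2, 5, 10, 20, 30, 40, 50]

baseline = cumulative_risk(placeholder_rates, 10)
carrier = cumulative_risk(placeholder_rates, 10, relative_risk=3.01)
print(f"baseline {baseline:.1%}, carrier {carrier:.1%}")
```

Because the carrier risk scales with the relative risk, the width of the OR confidence interval carries through to the cumulative risk, which is why the interval around the by-age-80 estimate is wide.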
In summary, we have found relatively strong evidence that deleterious germline mutations in PALB2 are associated with a moderate increase in the risk of HGSOC with weak evidence for POLK, SLX4 and FBXO10. Mutations in the other 50 genes we tested are unlikely to contribute meaningfully to genetic predisposition to HGSOC. This study highlights the importance of large sample sizes needed to obtain risk estimates with the precision necessary for clinical use.