Background Moderate-risk genes have not been extensively studied, and missense substitutions in them are generally returned to patients as variants of uncertain significance lacking clearly defined risk estimates. The fraction of early-onset breast cancer cases carrying moderate-risk genotypes and quantitative methods for flagging variants for further analysis have not been established.
Methods We evaluated rare missense substitutions identified from a mutation screen of ATM, CHEK2, MRE11A, RAD50, NBN, RAD51, RINT1, XRCC2 and BARD1 in 1297 cases of early-onset breast cancer and 1121 controls via scores from Align-Grantham Variation Grantham Deviation (GVGD), combined annotation dependent depletion (CADD), multivariate analysis of protein polymorphism (MAPP) and PolyPhen-2. We also evaluated subjects by polygenotype from 18 breast cancer risk SNPs. From these analyses, we estimated the fraction of cases and controls that reach a breast cancer OR≥2.5 threshold.
Results Analysis of mutation screening data from the nine genes revealed that 7.5% of cases and 2.4% of controls were carriers of at least one rare variant with an average OR≥2.5. 2.1% of cases and 1.2% of controls had a polygenotype with an average OR≥2.5.
Conclusions Among early-onset breast cancer cases, 9.6% had a genotype associated with an increased risk sufficient to affect clinical management recommendations. Over two-thirds of variants conferring this level of risk were rare missense substitutions in moderate-risk genes. Placement in the estimated OR≥2.5 group by at least two of these missense analysis programs should be used to prioritise variants for further study. Panel testing often creates more heat than light; quantitative approaches to variant prioritisation and classification may facilitate more efficient clinical classification of variants.
- Cancer: breast
- Clinical genetics
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
For the last 15+ years, most clinical cancer genetics involving predisposition to breast (and ovarian) cancer have been driven by mutation screening of BRCA1 and BRCA2. For these two genes, the ratio of truncating and splice junction variants (T+SJV) to pathogenic rare missense substitutions (rMS) is about 10:1,1 ,2 fostering a view among clinical cancer geneticists that rMS are only a minor part of the spectrum of breast cancer predisposing variants.
Following the discoveries of BRCA1 and BRCA2, many additional genes have been identified as breast cancer susceptibility genes. A prominent group of these are referred to as moderate-risk susceptibility genes because protein truncating variants and severely dysfunctional missense substitutions in them appear to confer, on average, twofold to fivefold increased risk of breast cancer. This magnitude of increased risk is less dramatic than that conferred by most pathogenic alleles in the high-risk genes BRCA1, BRCA2 and PALB2, but potentially high enough to influence the medical management of carriers.3–6 Beyond the moderate-risk genes, many common SNPs have been identified as markers for slightly increased breast cancer risk.7–9 A challenge posed by these modest-risk SNPs is that, individually, they do not confer enough risk to influence the medical management of a carrier, but considered as an ensemble they may.
In our previously published case–control mutation screening studies of ATM, CHEK2, MRE11A, NBN, RAD50, RAD51, RINT1 and XRCC2,10–15 we repeatedly found, albeit with some variations in the methodology, that the summed frequency of predicted deleterious missense substitutions exceeded that of protein truncating variants. This study used the same ethnically diverse sample of 1297 breast cancer cases and 1121 controls, negative for pathogenic variants in BRCA1, BRCA2 or PALB2 (table 1). Here, we added BARD1 mutation screening data to the original gene-by-gene analyses and applied consistent analytic models across the rare variants from all nine genes. Setting an OR≥2.5 as a threshold for clinical significance, we estimated the scores from the missense substitution analysis programs Align-GVGD,16 CADD,17 MAPP18 and PolyPhen-219 required to identify a group of missense substitutions that reach an average OR≥2.5. These results were used to determine the proportion of cases and controls carrying a potential risk-conferring rMS. We also explored a combined evaluation of 18 Breast Cancer Association Consortium (BCAC)-confirmed modest-risk SNPs as a polygene, compared predicted to empirically observed ORs and estimated the prevalence of genotype combinations across these 18 SNPs with an average OR≥2.5. Our data set is unique in that the moderate-risk gene mutation screening and SNP genotyping were performed on the same subjects, giving us the opportunity to compare prevalence of the OR≥2.5 threshold across T+SJV and OR≥2.5 groupings of rMS or normalised polygene score (NPS).
Materials and methods
Patients were selected from women systematically recruited by population-based sampling by the Australian, Northern Californian and Ontarian sites of the Breast Cancer Family Registry (BCFR). Patients were recruited between 1995 and 2005. The selection criteria for cases (N=1297) were diagnosis at or before age 45 years and self-reported race/ethnicity plus grandparents’ country of origin consistent with Caucasian, East Asian, Hispanic/Latino or Recent African racial or ethnic heritage.20 The controls (N=1121) were frequency matched to cases within each centre on racial or ethnic group, with age at selection not more than ±10 years difference from the age range at diagnosis of the patients systematically recruited from the same centre.
Mutation screening and SNP genotyping
Mutation screening was as described previously10–15 and is included in online supplementary methods, as is SNP genotyping. The following methods focus on the analysis of missense substitutions and of the ensemble of 18 modest-risk SNPs.
Allele frequency threshold
Following our allele frequency analysis of ATM, BRCA1, BRCA2 and CHEK2 from Damiola et al,12 we applied a minor allele frequency (q) threshold of ≤0.1%, based on Exome Variant Server and 1000 Genomes Project allele frequency data that are independent of this study's mutation screening, for all variants of the eight genes in which biallelic truncating variants are often either embryonic lethal or else cause a highly deleterious phenotype from the ataxia telangiectasia/Fanconi anaemia spectrum. Biallelic CHEK2 carriers are superficially healthy, and our analysis suggested a cut-off of q<0.32% for that gene.12
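As a minimal sketch of the frequency filter described above, the Python below applies a per-gene minor allele frequency cut-off. The function name `is_rare`, the constants and the example frequencies are illustrative assumptions rather than the authors' code. The handling of the boundary values (inclusive for 0.1%, exclusive for 0.32%) is read directly from the inequality signs in the text.

```python
# Hypothetical per-gene minor allele frequency (MAF) filter.
# Cut-offs come from the text: q <= 0.1% for eight genes, q < 0.32% for CHEK2.
DEFAULT_MAX_MAF = 0.001   # 0.1%, treated as inclusive (assumption from "≤0.1%")
CHEK2_MAX_MAF = 0.0032    # 0.32%, treated as exclusive (assumption from "q<0.32%")


def is_rare(gene: str, maf: float) -> bool:
    """Return True if a variant passes the rarity filter for its gene."""
    if gene == "CHEK2":
        return maf < CHEK2_MAX_MAF
    return maf <= DEFAULT_MAX_MAF


# Made-up frequencies, for illustration only.
variants = [("ATM", 0.0005), ("ATM", 0.002), ("CHEK2", 0.002), ("CHEK2", 0.0032)]
rare = [(gene, q) for gene, q in variants if is_rare(gene, q)]
print(rare)
```

In practice the frequencies would come from an external reference data set, as in the text, so that the filter stays independent of the case–control data being analysed.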
In silico missense substitution scoring
Align-GVGD (agvgd.iarc.fr/agvgd_input.php) and MAPP (mendel.stanford.edu/SidowLab/downloads/MAPP/index.html) require user-supplied protein multiple sequence alignments (pMSAs) to score missense substitutions; both compare the physicochemical features of the missense residue to the physicochemical range of variation at the relevant position in the pMSA to calculate their scores. Align-GVGD produces a score with seven discrete grades from C0 (most likely neutral) to C65 (most likely deleterious). MAPP, which additionally requires a phylogenetic tree detailing the evolutionary relationships and distances between the organisms with sequences represented in the pMSA, outputs a continuous variable, the MAPP score.
For programs that require a user-generated pMSA, it has been suggested that the pMSA for each gene needs enough variation to average at least three amino acid substitutions per position (3S/P).21 For each gene, we created an initial pMSA containing the human sequence and 13 additional orthologs. To maintain harmony across the pMSAs, orthologs were sampled from a phylogenetically similar set of organisms ranging from a non-human primate (Macaca mulatta) to the non-chordate deuterostome Strongylocentrotus purpuratus (see details in online supplementary methods).
Ortholog sequences downloaded from GenBank were aligned using the expresso extension of T-Coffee to create the initial pMSA.22 ,23 The initial alignment was checked by hand in Geneious V.7.1.4 ( http://www.geneious.com) for anomalies that might be attributed to gene model errors rather than actual sequence divergence. Potential anomalies were corrected by reference to, and gene reprediction from, genomic DNA sequence available on the UCSC genome browser (http://genome.ucsc.edu).
Routines from the PHYLIP package (V.3.69),24 constrained using the known phylogeny of the species included in our alignments, were used to estimate substitutions per position within each alignment and to calculate the distance matrices required by MAPP. The complete alignments, phylogenetic trees and distances are available upon request.
PolyPhen-2 (genetics.bwh.harvard.edu/pph2/)25 and CADD (cadd.gs.washington.edu/)17 operate without user-supplied alignments. PolyPhen-2 uses a combination of internally generated pMSAs, functional annotations and structural information to evaluate missense substitutions;19 we used its output variable ‘pph2_prob’ as a continuous variable score. CADD uses a series of 63 gene annotations, combined through a support vector machine linear kernel, to define a PHRED-like score (their ‘scaled C-score’) for all possible single-nucleotide substitutions and small insertion–deletion mutations to the human genome.17
Although CADD has a built-in method for short indels, the other missense analysis programs do not. For Align-GVGD, MAPP and PolyPhen-2, nonsense substitutions in the final exon and non-frameshift indels received the score of the most severe missense substitution possible in the affected interval. Variant pathogenicity scores are summarised in online supplementary table S1.
To assess evidence of risk from the case–control frequency distribution of T+SJV and rMS, we constructed a table with one entry per subject; the variants per subject; and annotations for whether the variant was in a key functional domain (see online supplementary table S2), its frequency, as well as study centre, case–control status, race/ethnicity and age for the subject. In-frame deletions (IFDs) were treated as rMS. For the subjects who carried more than one rare variant of interest, only the most deleterious score was considered. We then divided the subjects into groups: a reference group of non-carriers and carriers of common variants (only), carriers of rMS not in a functional domain, carriers of rMS in a key functional domain divided into two groups via score and carriers of T+SJVs. For each of the four rMS analysis programs, we toggled the program’s severity score from a very relaxed to a very stringent value. We repeatedly estimated two ORs as the stringency increased: the OR for subjects that carried one or more rMS at or above the score (and no T+SJV), and the OR for subjects who carried an rMS that was below the score (and carried neither a T+SJV nor a higher scoring rMS). From this analysis, we determined a threshold severity score for each program at which subjects carrying an rMS at or above the threshold had an average OR≥2.5.
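The threshold scan described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it uses crude 2×2 ORs against the non-carrier reference group, whereas the study used logistic regression adjusted for race/ethnicity and study centre. The data layout, function names and toy counts are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: (is_case, score), where score is the most severe rMS
# score carried by the subject, or None for the reference group.
Subject = Tuple[bool, Optional[float]]


def odds_ratio(exp_case: int, exp_ctrl: int,
               ref_case: int, ref_ctrl: int) -> Optional[float]:
    """Crude (unadjusted) odds ratio; None when it cannot be computed."""
    denom = exp_ctrl * ref_case
    if denom == 0:
        return None
    return (exp_case * ref_ctrl) / denom


def scan(subjects: List[Subject], thresholds: List[float]):
    """For each threshold, return (threshold, OR at/above, OR below)."""
    ref_case = sum(1 for is_case, s in subjects if s is None and is_case)
    ref_ctrl = sum(1 for is_case, s in subjects if s is None and not is_case)
    rows = []
    for t in sorted(thresholds):
        above = [is_case for is_case, s in subjects if s is not None and s >= t]
        below = [is_case for is_case, s in subjects if s is not None and s < t]
        or_above = odds_ratio(sum(above), len(above) - sum(above), ref_case, ref_ctrl)
        or_below = odds_ratio(sum(below), len(below) - sum(below), ref_case, ref_ctrl)
        rows.append((t, or_above, or_below))
    return rows


def lowest_threshold(rows, target: float = 2.5) -> Optional[float]:
    """Lowest threshold whose at-or-above group reaches the target OR."""
    for t, or_above, _ in rows:
        if or_above is not None and or_above >= target:
            return t
    return None


# Toy data: 100 reference cases and controls, plus carriers at three scores.
subjects = ([(True, None)] * 100 + [(False, None)] * 100
            + [(True, 1.0)] * 5 + [(False, 1.0)] * 5
            + [(True, 2.0)] * 6 + [(False, 2.0)] * 3
            + [(True, 3.0)] * 9 + [(False, 3.0)] * 2)
rows = scan(subjects, [1.0, 2.0, 3.0])
print(rows)
print(lowest_threshold(rows))
```

Reporting the below-threshold OR alongside the above-threshold OR, as the text does, makes it easy to see whether raising the stringency actually separates higher-risk from lower-risk substitutions.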
For the 18 SNP polygene, we created a polygenic risk score (PRS) by multiplying together the appropriate published OR estimate from each individual SNP genotype.7 ,9 ,26 ,27 The geometric mean of the PRS of the controls was used to normalise the PRS into an NPS. Because risk estimates from Caucasian populations may not be applicable to women from other race/ethnicities, we gave each non-Caucasian subject a race/ethnicity-specific risk estimate derived from the population of the subject,26 ,28–30 and normalised each race/ethnicity separately. Risk estimates of rs1045485 could not be found for the non-Caucasian subjects in this study, so the risk estimate based on Caucasian populations was used for all race/ethnicities. In instances where a Latina-specific risk estimate could not be found, we used the average between Caucasian and East Asian race/ethnicities. To determine the correlation between NPS and the observed OR, we grouped the subjects into a series of 10 contiguous bins based on percentile, using the central quintile (40–60 percentile) as the reference group. We treated groups outside of the reference as categorical variables for OR calculations. For the threshold analysis, we used the same reference group and adjusted the NPS threshold until the group containing scores above the set threshold had an OR≥2.5.
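A minimal Python sketch of the PRS and NPS calculation described above is given here. The per-genotype ORs in the example are made up and the function names are hypothetical. The only steps taken from the text are the product of per-SNP genotype ORs and the division by the geometric mean of the controls' PRS.

```python
import math
from typing import List, Sequence


def prs(genotype_ors: Sequence[float]) -> float:
    """Polygenic risk score: product of the OR for each SNP genotype."""
    return math.prod(genotype_ors)


def geometric_mean(values: Sequence[float]) -> float:
    """Geometric mean, computed on the log scale."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalise(prs_values: Sequence[float], control_prs: Sequence[float]) -> List[float]:
    """Divide each PRS by the geometric mean PRS of the controls to obtain the NPS."""
    gm = geometric_mean(control_prs)
    return [p / gm for p in prs_values]


# Made-up example: three SNPs per subject, two controls and one case.
control_prs = [prs([1.0, 1.0, 1.0]), prs([2.0, 2.0, 1.0])]   # 1.0 and 4.0
case_prs = [prs([2.0, 1.0, 1.0])]                            # 2.0
print(normalise(case_prs, control_prs))
```

Because the text normalises each race/ethnicity separately, `normalise` would be called once per group, using only that group's controls.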
For the regressions, NPS was treated as the independent variable and the resultant OR of each group as the dependent variable, weighted by the number of individuals in each group and excluding subjects in the middle quintile. p Values were obtained by testing whether the regression coefficient equalled 0 or 1. To combine the risk estimates from the rMS and NPS, we multiplied the NPS by the OR from the rMS.
All analyses were performed using multivariable unconditional logistic regression using Stata V.12.1 software (StataCorp, College Station, Texas, USA). Adjustments were made for race/ethnicity and study centre, unless otherwise noted.
Initial evaluation of rare variants
From mutation screening of 1297 cases and 1121 controls, we observed 22 T+SJV, 9 IFDs and 196 rMS with minor allele frequencies <0.32% for CHEK2 and <0.1% for the remaining genes. T+SJVs falling before the final exon were considered pathogenic and were associated with an OR of 3.32 (p=0.0023, table 2). Nonsense mutations located in the final exon were considered as IFDs.
The National Comprehensive Cancer Network (NCCN) and the American College of Radiology (ACR) recommend screening beginning at age 30 years and offering breast MRI in addition to mammograms for women with a ≥20% lifetime breast cancer risk.31 ,32 The American Cancer Society (ACS) recommends breast MRI for women with a 20–25% or greater lifetime risk.33 In the USA, the lifetime risk of a woman to develop breast cancer is estimated to be 12.3%;34 however, this figure is an overestimate for our purposes because it includes women who are at high risk because of inherited mutations in genes such as BRCA1 or BRCA2, or very strong family history. For a woman with minimal risk factors, for example, age at menarche ≥14 years, first childbirth at age ≤20 years and no family history, the Gail model35 and Tyrer–Cuzick model36 suggest a lifetime risk of 6.9% and 11% for developing breast cancer, respectively. If we assume that the average of these two estimates (9%) is approximately correct, carriage of a genotype conferring a 2.5-fold increase of risk, even in this low-risk population, would result in a lifetime risk estimate exceeding the NCCN, ACR and ACS medically actionable threshold of a 20% lifetime risk. Subject to formal variant classification, carriers may then qualify, under current recommendations, for early mammography and/or enhanced screening with breast MRI. We note that the threshold for intensified screening may be higher in other countries.
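The arithmetic behind the OR≥2.5 threshold can be laid out in a few lines of Python. This sketch mirrors the simplification in the text, which scales a baseline lifetime risk directly by the fold increase. Treating an OR as a simple multiplier on lifetime risk is an approximation, and the function name is hypothetical.

```python
def scaled_lifetime_risk(baseline_pct: float, fold_increase: float) -> float:
    """Scale a baseline lifetime risk (in %) by a fold increase in risk.

    Simplifying assumption: the fold increase acts as a plain multiplier.
    """
    return baseline_pct * fold_increase


gail_pct = 6.9            # Gail model estimate quoted in the text
tyrer_cuzick_pct = 11.0   # Tyrer-Cuzick estimate quoted in the text
baseline_pct = (gail_pct + tyrer_cuzick_pct) / 2    # 8.95, rounded to 9% in the text
carrier_pct = scaled_lifetime_risk(baseline_pct, 2.5)

ACTIONABLE_PCT = 20.0     # lifetime risk threshold cited from the guidelines
print(round(baseline_pct, 2), round(carrier_pct, 1), carrier_pct >= ACTIONABLE_PCT)
```

With the unrounded baseline the scaled estimate is about 22.4%, and with the rounded 9% baseline it is 22.5%. Either way it exceeds the 20% threshold.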
Considering all rMS as a group, we obtained a risk estimate that was elevated but that did not reach an OR≥2.5 threshold (OR=1.42, p=0.0091, table 2). We therefore restricted our analyses to rMS that are relatively likely to affect key functions. This grouping included all of the rMS from the relatively small proteins encoded by CHEK2, RAD51, RINT1 and XRCC2. Noting the structural similarity between BARD1 and BRCA1, and that BRCA1 pathogenic rMS are so far only known from the RING and BRCT domains,37–40 we limited analyses of BARD1 rMS to RING and BRCT domain substitutions. For the relatively large proteins encoded by ATM, MRE11A, NBN and RAD50, we focused analyses of rMS on the same key functional domains specified in our prior publications (see online supplementary table S2).10 ,12 We observed 140 rMS and IFDs in key functional domains (OR 1.94, p=5.1×10−05, table 2), which still did not reach an OR≥2.5 threshold.
Grouping rMS to estimate risk and carrier rates
To define a higher-risk subset of rMS, we focused the next analyses on the three established moderate-risk genes: ATM, CHEK2 and NBN.41 There is no fully accepted method for rMS analysis. Instead of introducing a new method for variant classification, we used four existing missense analysis programs, Align-GVGD, CADD, MAPP and PolyPhen-2,16–18 ,25 to assign severity scores to the key domain rMS from these genes. Align-GVGD was selected because its scores contribute to determination of prior probabilities of pathogenicity for key domain missense substitutions in BRCA1 and BRCA2,2 MAPP and PolyPhen-2 because of their strong performance in our recent analyses of mismatch repair protein missense substitutions,42 and CADD because of its reported ability to prioritise variants across functional categories and effect sizes.17 For variant evaluation, we adjusted our pMSAs to two depths: human through platypus (mammals only), and human through the organism required for 3S/P for each individual gene. Receiver operating characteristic (ROC) curves were generated for each method and depth (if applicable). Areas under the curve (AUCs) were similar for all methods (see online supplementary table S3 and supplementary figure S1). The correlations between the missense analysis programs were highest between PolyPhen-2 and CADD (R2=0.56), but the R2 for any combination of missense analysis programs was never >0.8, so none of the missense analysis programs were dropped from further analysis (see online supplementary table S4).
We then toggled the severity score for each of the four programs to find the lowest score where the OR for key domain rMS above the score reached at least 2.5 (figure 1A–D). The thresholds at which each of the four rMS analysis programs reached OR≥2.5 for key domain rMS were above a score of 11 for MAPP when using pMSAs consisting of organisms from human through 3S/P, C35 for Align-GVGD when using pMSAs consisting of organisms from human through 3S/P, 23 for CADD and 0.9 for PolyPhen-2 (see online supplementary table S5). It was interesting that neither Align-GVGD nor MAPP was able to achieve an OR≥2.5 with a pMSA that consisted only of mammals (human through platypus). It appears that these sequences, although generally more complete than those from more distant organisms, do not offer adequate variation to stratify variants. Examining the variants that were placed in the OR≥2.5 category by multiple missense analysis programs, we found that an overlap of at least two of the missense analysis programs resulted in a classification of variants with an OR≥2.5 (OR 2.59, p=0.0044; online supplementary table S6).
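The "at least two programs" rule can be sketched in Python using the thresholds reported above. Two points are assumptions: each threshold is treated as inclusive (at or above), which the text does not state unambiguously, and the function names and example scores are hypothetical. The seven Align-GVGD grades are ordered from C0 to C65.

```python
from typing import Dict

# Align-GVGD grades in increasing order of predicted severity.
GVGD_GRADES = ["C0", "C15", "C25", "C35", "C45", "C55", "C65"]

# Thresholds reported in the text; inclusive comparison is an assumption.
MAPP_MIN = 11.0
GVGD_MIN = "C35"
CADD_MIN = 23.0
PPH2_MIN = 0.9


def above_threshold_calls(mapp: float, gvgd_grade: str,
                          cadd: float, pph2_prob: float) -> Dict[str, bool]:
    """Flag, per program, whether a substitution meets the OR>=2.5 threshold."""
    return {
        "MAPP": mapp >= MAPP_MIN,
        "Align-GVGD": GVGD_GRADES.index(gvgd_grade) >= GVGD_GRADES.index(GVGD_MIN),
        "CADD": cadd >= CADD_MIN,
        "PolyPhen-2": pph2_prob >= PPH2_MIN,
    }


def is_concordant(calls: Dict[str, bool], min_programs: int = 2) -> bool:
    """True when at least `min_programs` programs place the variant above threshold."""
    return sum(calls.values()) >= min_programs


# Made-up scores for two hypothetical substitutions.
calls_a = above_threshold_calls(mapp=12.0, gvgd_grade="C45", cadd=20.0, pph2_prob=0.5)
calls_b = above_threshold_calls(mapp=5.0, gvgd_grade="C0", cadd=25.0, pph2_prob=0.2)
print(is_concordant(calls_a), is_concordant(calls_b))
```

The MAPP and Align-GVGD thresholds apply to scores computed from pMSAs extending from human through 3S/P, so scores from shallower alignments should not be compared against them.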
Applying the score thresholds determined from the ATM-CHEK2-NBN group to the key domain rMS observed in the remaining six less established moderate-risk genes, rMS ORs for the BARD1-MRE11A-RAD50-RAD51-RINT1-XRCC2 group ranged from 2.41 (p=0.0078) using PolyPhen-2 to 4.86 (p=0.0129) using Align-GVGD (data not shown). We also found that concordance between at least two of the missense analysis programs resulted in a grouping of rMS with an OR≥2.5 (OR 4.90, p=0.0012; online supplementary table S6).
Having established that the thresholds identified with the ATM-CHEK2-NBN group were able to extract OR≥2.5 groupings from the remaining six genes, we used these thresholds to evaluate the proportions of cases and controls with above-threshold variants across the nine-gene ensemble (table 2). We found that 3.7–6.3% of cases and 1.0–2.5% of controls carried an above-threshold key domain rMS. Considering only the key domain rMS that were placed in an OR≥2.5 grouping by more than one of the missense analysis programs (figure 1E,F), concordance between two or more missense analysis programs was associated with an OR≥2.5 (OR 3.18, p=2.68×10−5), affecting 5.4% of cases and 1.6% of controls (table 2). These results are comparable to other studies, but include data from controls.43–45 Comparing the proportion of above-threshold key domain rMS carriers to T+SJV carriers, rMS carriers appear to outnumber T+SJV carriers by a ratio of about 2.5:1.
To test whether the results reported here are robust to the loss of any one gene from the less established moderate-risk gene set, we performed a series of analyses in which the genotype information of one gene was dropped and the OR, rMS to T+SJV ratio and carrier percentage were re-determined for the rMS from the remaining eight genes. We observed that, in each subset of eight genes, 3.0–5.9% of cases and 0.8–2.7% of controls were carriers of a variant from the above-threshold grouping, with a ratio of rMS to T+SJVs consistently over 1.6:1 for cases (see online supplementary table S7).
Common SNP-based polygene scores and above-threshold carrier rates
Generally, individual modest-risk SNPs do not confer enough risk to impact clinical practice. An attractive method for using SNPs in a clinical setting is to combine the risk estimates from multiple SNPs. Indeed, a recent large study combined risk estimates from 77 SNPs and found ORs≥2.5 at and above the 99th percentile of the combined scores.9 We genotyped 18 BCAC-confirmed SNPs on the same subjects from the case–control mutation screening phase of this study (table 1). Using per-allele ORs from recent large studies,7 ,26 ,28–30 we treated the SNPs as a polygene and created an NPS for each subject (see online supplementary table S8 and figure 2A).
To determine how closely the NPS predicted OR, we grouped the NPS scores into deciles and compared the mean NPS of each decile to its observed OR. With all subjects grouped together, the NPS correlated highly with the observed OR (coeff.=0.9232, R2=0.70, p=0.0060) (figure 2B). Evaluating each race/ethnicity individually, Caucasians were the only group to achieve significance (coeff.=0.9835, R2=0.81, p=0.0014) (figure 2C, data not shown), likely due to small sample sizes of the non-Caucasian groups. We also tested the alternate hypothesis, NPS=observed OR, and did not observe a significant difference (p=0.75 and 0.93 for all subjects and Caucasians, respectively).
Using the NPS, how many women are at a medically actionable risk? Using an approach analogous to that applied to the key domain rMS, we toggled the NPS to find the lowest score where the observed OR for the group of subjects with NPS above the score exceeded 2.5 (figure 2D–E). From the data, 2.1% of cases and 1.2% of controls carry a combination of SNPs such that they have a polygene score associated with an average OR≥2.5 (table 3). Limiting the analysis to Caucasians, we found that 3.2% of cases and 1.3% of controls have an NPS associated with an average OR≥2.5.
To explore the possibility of integrating gene mutation screening with SNP genotyping, we tested for interactions between carriage of an OR≥2.5 rare variant (combining T+SJVs and above-threshold rMS into a single group) and the NPS. In these tests, the interaction term never approached significance (p=0.52, 0.82, 0.51 and 0.96 for analyses with rMS scored by Align-GVGD, CADD, MAPP and PolyPhen-2, respectively). Accordingly, as the multiplicative OR model does appear to apply to combinations of rare variants from these nine genes with the NPS, we isolated the subjects who carried a rare variant in the OR≥2.5 category and multiplied the OR estimated from their rMS or T+SJV with their NPS. These combined ORs varied from ∼1.0 to >5.0 (figure 3).
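Under the multiplicative model described above, the combined estimate for a carrier is simply the product of the rare-variant OR and that subject's NPS. The short Python sketch below illustrates this with the concordant-rMS OR of 3.18 from the text and made-up NPS values. The function name is hypothetical.

```python
def combined_or(rare_variant_or: float, nps: float) -> float:
    """Combine a rare-variant OR with a normalised polygene score.

    Assumes the two act multiplicatively, with no interaction term.
    """
    return rare_variant_or * nps


RMS_OR = 3.18  # OR for rMS placed above threshold by two or more programs

# Made-up NPS values spanning a low to a high polygene score.
for nps in (0.4, 1.0, 1.6):
    print(nps, round(combined_or(RMS_OR, nps), 2))
```

A carrier with a low NPS can therefore fall back towards an OR near 1, while a carrier with a high NPS can exceed an OR of 5, which is the spread the text reports.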
Neither pathogenic alleles in moderate-risk breast cancer susceptibility genes nor individual modest-risk breast cancer-associated SNPs confer the magnitude of risk of early-onset breast cancer conferred by pathogenic alleles in high-risk genes such as BRCA1 and BRCA2. Nonetheless, under a generalised understanding of NCCN, ACR and ACS guidelines, a ≥2.5-fold increased risk of breast cancer is high enough to impact the medical management of otherwise healthy carriers. Across the nine moderate-risk susceptibility genes examined here, two classes of sequence variants meet or exceed this 2.5-fold risk threshold. 2.1% of cases carried a T+SJV, and these were associated with an OR of 3.32 (p=0.0023). Each of the four missense substitution analysis programs that we evaluated was able to define a set of key functional domain rMS that reached the OR≥2.5 threshold. 5.4% of cases carried an rMS that two or more of the programs agreed was above the threshold, and this group of rMS was associated with an OR of 3.18 (p=2.68×10−5). In addition, 2.1% of cases carried an above-threshold SNP polygene genotype.
Whether focusing on the confirmed moderate-risk genes ATM, CHEK2 and NBN or looking at all nine genes, the ratio of carriers of T+SJV to above-threshold key domain rMS was in the range of 1:2–1:3. This finding differs from the ∼10:1 ratio observed in BRCA1/2; the preponderance of above-threshold rMS in these moderate-risk genes is much more reminiscent of the situation with TP53.5 Because most rMS observed during clinical testing of these moderate-risk genes would be returned as variants of uncertain significance (VUS) in test reports, the relatively high proportion of above-threshold rMS reported here creates a challenge for test interpretation. During clinical counselling, to alleviate patient distress, observations of VUS rMS, especially in moderate-risk genes, are often downplayed as of minimal significance (‘normalised’). However, at least for the nine genes that we examined, normalising the rMS amounts to disregarding approximately 2/3 of sequence variants with OR≥2.5 detectable by the genetic tests.
Within the logical structure of the analyses presented here, the OR≥2.5 threshold applied to rMS and SNP polygene groupings was a device used to align the analyses with current patient management standards. A consequence is that the OR point estimates reported for those groupings in tables 2 and 3 are circularly dependent on the threshold selected. Nonetheless, the following four key results are independent of the circular logic underlying those OR point estimates: (i) the a priori existence of groupings with OR≥2.5; (ii) the p values associated with those groupings; (iii) the ratios of subjects with T+SJVs, rMSs with OR≥2.5 and SNP polygene with OR≥2.5; and (iv) the frequencies among controls and early-onset cases of individuals with a genotype falling into one of these groupings. These findings all correspond to open, medically relevant questions.
Although we did not accompany this study with functional assays, a yeast complementation assay applied to 25 CHEK2 missense substitutions included applicable Align-GVGD and PolyPhen-2 scores.46 Among the six rMS with Align-GVGD and PolyPhen-2 scores meeting our severity criterion, the average activity was −0.062 (SD=0.027) in an assay where the internal wild-type and dysfunctional variants were given scores of 1.00 and 0.00, respectively. In contrast, the average score among the 19 rMS not meeting our concordant severity criterion was +0.472 (SD=0.388), resulting in a p value of 1.12×10−5 against the hypothesis that the two groups have the same mean activity. Moving forward, it will become important to develop methods that combine patient observational data with in silico and functional assay results towards clinical classification of these rMS. Such methods may leverage the Bayesian classification framework already developed for rMS in BRCA1, BRCA2, MLH1, MSH2, etc.37 ,47 ,48
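The comparison of mean assay activity between the two groups of CHEK2 substitutions can be illustrated from the summary statistics quoted above. The text does not state which test produced the reported p value, so the Python sketch below uses Welch's unequal-variance t statistic purely as an assumption. It computes only the statistic and its approximate degrees of freedom, and it does not attempt to reproduce the published p value.

```python
import math
from typing import Tuple


def welch_t(mean1: float, sd1: float, n1: int,
            mean2: float, sd2: float, n2: int) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1   # squared standard error of group 1 mean
    v2 = sd2 ** 2 / n2   # squared standard error of group 2 mean
    t = (mean2 - mean1) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Summary statistics quoted in the text.
t_stat, df = welch_t(mean1=-0.062, sd1=0.027, n1=6,     # rMS meeting the criterion
                     mean2=0.472, sd2=0.388, n2=19)     # rMS not meeting it
print(round(t_stat, 2), round(df, 1))
```

A statistic of this size on roughly 18 to 19 degrees of freedom is consistent with a very small p value, but the exact figure depends on the test the authors actually used.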
One weakness of this study is that it focuses on early-onset cases using a data set that already contributed either to association of rMS in these genes with breast cancer susceptibility (ATM, CHEK2),10 ,11 or susceptibility to breast cancer in general (MRE11A, NBN, RAD50, RINT1, XRCC2).12 ,14 ,15 While the impact on the overall results of a possible false association for one or another of the genes is addressed by the leave-one-out analysis, the possibility remains that the ORs that we report are systematically inflated either because this was a study of early-onset cases or because of winner's curse.41 These issues were partly ameliorated in two ways: (i) an OR≥2.5 grouping of rMS could be isolated by each of the four rMS analysis programs that we used, and (ii) the group of women that can benefit most from early or intensified breast cancer screening is primarily those at risk of early-onset breast cancer—largely, the group of women from which the cases used in this study are drawn. Looking forward, the ratio of the above-threshold rMS to T+SJV can be re-evaluated in case–control studies, but accurate assessment of risk will have to come from prospective cohort studies. A second weakness in our analytic strategy is that the rMS analyses in five of the genes included here—ATM, BARD1, MRE11A, RAD50 and NBN—are somewhat dependent on our definitions of key protein functional domains. This analytic element is in need of independent evaluation and refinement.
Our analysis raises additional questions regarding standard clinical genetic testing practices using panel tests. For the established moderate-risk genes ATM, CHEK2 and NBN, the majority of the pathogenic variants that the test can actually detect are rMS, likely to be reported to patients as VUS, and likely to be normalised during counselling. In this circumstance, how does one answer the clinical validity question, “Are the variants the test is intended to identify associated with disease risk, and are these risks well quantified?”41 What is the impact on studies intended to explore the penetrance and tumour spectrum of pathogenic variants in these genes if the studies focus on T+SJVs even though these may represent a minority of the pathogenic variants? One path forward lies in a more nuanced use of the IARC 5-class system for variant classification and reporting to incorporate more data from ongoing research on missense substitution evaluation.49 From work that defined the sequence analysis-based prior probabilities of pathogenicity for rMS in BRCA1, BRCA2 and the mismatch repair genes, one can clearly define subsets of rMS that have relatively high probabilities of pathogenicity.2 ,42 A straightforward approach for clinicians could be to make systematic efforts to enrol carriers of high probability of pathogenicity rMS in research studies, such as those coordinated through the Evidence-based Network for the Interpretation of Germline Mutant Alleles (ENIGMA) consortium,50 while still describing these findings to patients as VUS. For BRCA1, BRCA2 and the mismatch repair genes, these could be defined as rMS with prior probabilities of pathogenicity of ≥0.66 as defined at the calibrated prior probability of pathogenicity websites (priors.hci.utah.edu/PRIORS/index.php and hci-lovd.hci.utah.edu/home.php, respectively). 
rMS from the nine genes examined here that are placed in an OR≥2.5 grouping by two or more of the missense analysis programs similarly fall into a relatively high probability of pathogenicity subset. VUS with lower probabilities of pathogenicity could reasonably be normalised since future reclassification to a clearly pathogenic variant is rather unlikely. Such an approach would better prioritise those missense substitutions with high probabilities of pathogenicity, leading to better understanding of these VUS by clinicians and patients. This approach should empower research towards gene validation, penetrance and tumour spectrum and thereby address the question of clinical validity in the future.
The authors wish to thank all participants in the BCFR for their contribution to the study. The authors also appreciate the support of J. McKay and the Genetic Cancer Susceptibility group at the International Agency for Research on Cancer (IARC).
Contributors ELY and BJF were involved in data analysis, critical revision of manuscript and final approval. AWS, FD, GD, NF, TCF, AG, WKK, SMcK-C, TN-D, JO, AMP, MP, NR, JSR, MH, JG, FLC-K, FL, DEG, MV and CV were involved in data acquisition, critical revision of manuscript and final approval. KAK was involved in critical revision of manuscript and final approval. JLH, MCS, ILA and EMJ were involved in the acquisition of subjects, critical revision of manuscript and final approval. SVT was involved in the acquisition of data, critical revision of manuscript, final approval and is the corresponding author.
Funding This work was supported by the United States National Institutes of Health (NIH) National Cancer Institute (NCI) grants R01 CA121245 and R01 CA155767, by the Canadian Institutes of Health Research (CIHR) for the CIHR Team in Familial Risks of Breast Cancer programme, by the Government of Canada through Genome Canada and the Canadian Institutes of Health Research, and the Ministère de l'enseignement supérieur, de la recherche, de la science, et de la technologie du Québec through Génome Québec. The BCFR is supported by grant UM1 CA164920 from the USA National Cancer Institute.
Disclaimer The content of this manuscript does not necessarily reflect the views or policies of the National Cancer Institute or any of the collaborating centres in the BCFR, nor does mention of trade names, commercial products or organisations imply endorsement by the US government or the BCFR.
Competing interests None declared.
Patient consent Obtained.
Ethics approval The studies described here were approved by the institutional review board (IRB) of the International Agency for Research on Cancer (IARC), the University of Utah IRB and the local IRBs of the BCFR centres from which we received samples.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement A table of the data used for this analysis including case–control status and variant calling information is available upon contacting the corresponding author. Additionally, protein multiple sequence alignments for the nine genes are also available upon request.