Background The known breast cancer-associated single nucleotide polymorphisms (SNPs) are not currently used to guide clinical management. We explored whether a genetic test that incorporates a SNP-based polygenic risk score (PRS) is clinically meaningful in non-BRCA1/2 high-risk breast cancer families.
Methods 101 non-BRCA1/2 high-risk breast cancer families were included; 323 cases and 262 unaffected female relatives were genotyped. The 161-SNP PRS was calculated and standardised to 327 population controls (sPRS). Association analysis was performed using a Cox-type random effect regression model adjusted by family history. Updated individualised breast cancer lifetime risk scores were derived by combining the Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm breast cancer lifetime risk with the effect of the sPRS.
Results The mean sPRS for cases and their unaffected relatives was 0.70 (SD=0.9) and 0.53 (SD=0.9), respectively. A significant association was found between sPRS and breast cancer, HR=1.16, 95% CI 1.03 to 1.28, p=0.026. Addition of the sPRS to risk prediction based on family history alone changed screening recommendations in 11.5%, 14.7% and 19.8% of the women according to breast screening guidelines from the USA (National Comprehensive Cancer Network), UK (National Institute for Health and Care Excellence) and the Netherlands (Netherlands Comprehensive Cancer Organisation), respectively.
Conclusion Our results support the application of the PRS in risk prediction and clinical management of women from genetically unexplained breast cancer families.
- cancer: breast
- clinical genetics
- genetic epidemiology
- genetic screening/counselling
- polygenic risk score
Breast cancer is the most common cancer in women in the Western world. For women with a first-degree relative with breast cancer, the risk for developing breast cancer is twofold in comparison with women without such a family history.1 Approximately 20% of this familial relative risk is explained by pathogenic variants in the high-risk genes BRCA1 and BRCA2, 2%–5% by variants in other breast cancer genes (eg, CHEK2, PALB2 and ATM) and 18% by the currently known common low risk variants, mostly single nucleotide polymorphisms (SNPs).2–5
Individually, these SNPs confer a very small increase in breast cancer risk but jointly they may confer a substantial increase of the risk.2 This combined risk of all SNPs associated with breast cancer can be summarised in a polygenic risk score (PRS). The PRS can stratify women into different risk categories,2 6–8 which for 8% of women from the general population might be high enough to be clinically relevant, regardless of family history.2
The PRS may also be combined with other risk factors, such as BRCA1/2 status or breast cancer family history, to further refine and individualise risk estimation. The large majority of breast cancer families seen in Family Cancer Clinics today cannot be linked to pathogenic variants in BRCA1 or BRCA2. Risk management for women from these families is based mainly on family history, which can be used as a variable to calculate individual breast cancer risk in various risk prediction algorithms,9 such as the Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm (BOADICEA).10
To date, the PRS has not been included in clinical genetic practice to guide clinical management. Several studies have shown an improved discriminative power between breast cancer cases and controls by combining the PRS with a breast cancer risk prediction tool.11–14 However, little is currently known about the discriminative power of the PRS between family members, with respect to who will develop breast cancer. A recent study genotyped cases and controls in 52 Finnish non-BRCA1/2 breast cancer families to calculate a 75-SNP PRS. The PRS for healthy women from breast cancer families was lower in comparison with affected family members.15 This suggests that the PRS can help to individualise risk stratification and advice for surveillance for women in breast cancer families.
Here, we explore the clinical applicability of the 161-SNP PRS for risk prediction in a cohort of 101 high-risk breast cancer families not explained by pathogenic variants in the BRCA1 and BRCA2 genes. The clinical impact of the PRS on breast cancer risk prediction based on family history alone was investigated by determining the potential change in clinical management, as stipulated by three currently used guidelines: the National Comprehensive Cancer Network guideline (NCCN),16 the National Institute for Health and Care Excellence guideline (NICE)17 and the Netherlands Comprehensive Cancer Organisation guideline (IKNL).18
Materials and methods
Two cohorts were included: a hospital-based case–control cohort (Oorsprong van borstkanker integraal onderzocht (ORIGO)) and a family-based case–control cohort. Informed consent was obtained for all individuals. Population controls were irreversibly anonymised. Only women were included in this study.
The ORIGO cohort consists of incident breast cancer cases, not selected for breast cancer family history, enrolled between 1996 and 2006 in the context of the ORIGO study, as described elsewhere.19 For the present study, 357 ORIGO cases were selected for which genotyping had been performed on the iCOGS array. Likewise, 327 healthy genotyped blood bank donors were included in the ORIGO cohort as controls. Age of last follow-up was determined as the age at diagnosis for cases and the age at inclusion for controls.
The families from the family-based cohort were selected between 1990 and 2012 through five Clinical Genetic Services (Rotterdam, Groningen, Nijmegen, Leiden, the Netherlands, and Budapest, Hungary) and the Foundation for the Detection of Hereditary Tumours in the Netherlands, as previously described.20 At least one family member affected with breast cancer was tested for BRCA1 and BRCA2. We did not have informed consent for testing other specific genes besides BRCA1 and BRCA2. The selection criteria for families were: breast cancer (invasive/in situ) before the age of 60 years in at least three women, or in two women if at least one of them had bilateral breast cancer before the age of 60 years. In total, 102 families without a pathogenic variant in BRCA1 or BRCA2 were included, with a blood DNA sample available for 612 women. Of these women, 340 were affected with breast cancer and 272 were unaffected relatives. The unaffected relatives were censored regarding breast cancer, irrespective of other types of cancer. Most cancers were verified with a pathology report. Date of last follow-up was determined as the date of last contact with the family.
DNA samples of all included individuals were genotyped with the iCOGS SNP array, designed for association analysis in breast, ovarian and prostate cancer, containing 211 155 SNPs.3 Genotyping and quality control of the ORIGO cohort were performed as part of association studies conducted by the Breast Cancer Association Consortium (BCAC).3 For the family-based cohort, quality control led to the exclusion of 27 individuals (see online supplementary material and methods). Therefore, further analysis was done with 323 breast cancer cases and 262 unaffected relatives from 101 families for this cohort.
Some of the 182 currently known SNPs are associated primarily with oestrogen receptor (ER)-negative or ER-positive breast cancer. We constructed a PRS for overall breast cancer with 161 SNPs, selecting all SNPs significantly associated (p<5×10⁻⁸) with overall breast cancer in case–control studies performed by BCAC4 (online supplementary table S1). ER status was not known for all cases in our study, and substrata would become too small to reach sufficient statistical power for ER-specific PRSs. The 85 SNPs that were not directly genotyped by the iCOGS array were imputed by prephasing with SHAPEIT and IMPUTE2.21 22 To improve imputation quality, both the 1000 Genomes phase 3 and Genome of the Netherlands (GoNL) reference panels were used.23 24
Polygenic risk score
The following formula was used to calculate the PRS based on 161 SNPs:
$$\mathrm{PRS}_j = \sum_{i=1}^{161} n_{ij}\,\ln(\mathrm{OR}_i)$$
where $n_{ij}$ is the number of risk alleles (0, 1 or 2) for SNP $i$ carried by individual $j$ and $\ln(\mathrm{OR}_i)$ is the per-allele log OR for breast cancer associated with SNP $i$. The ORs were the most recent estimates from analysis of the OncoArray data4 (online supplementary table S1). The majority of studies used for this analysis were population-based case–control studies.4
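The weighted sum described above can be sketched in a few lines of Python. This is only an illustration: the function name, the three-SNP genotype and the odds ratios are made up for the example and are not taken from the study, which used 161 SNPs with the OncoArray estimates.

```python
import math

def polygenic_risk_score(risk_allele_counts, odds_ratios):
    """Sum of risk-allele counts weighted by per-allele log odds ratios."""
    if len(risk_allele_counts) != len(odds_ratios):
        raise ValueError("one odds ratio is needed per SNP")
    return sum(n * math.log(o) for n, o in zip(risk_allele_counts, odds_ratios))

# Hypothetical three-SNP example (values invented for illustration only).
example_counts = [0, 1, 2]         # risk alleles carried at each SNP
example_ors = [1.05, 1.10, 0.95]   # made-up per-allele odds ratios
example_prs = polygenic_risk_score(example_counts, example_ors)
```

For imputed SNPs the count can be replaced by an expected allele dosage between 0 and 2; the function accepts fractional values unchanged.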
The PRS was calculated for all included individuals. For the descriptive analysis, the PRS was standardised to the mean and SD in healthy population controls. The mean standardised PRS (sPRS) in population controls is therefore 0 with an SD of 1. Standardisation facilitates the comparison between different groups. For further analysis in the family-based cohort, the PRS was standardised to the mean and SD in the family-based cohort including both cases and unaffected relatives.
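As a minimal sketch of this standardisation step, the snippet below rescales raw PRS values by the mean and SD of a reference group. The function name and the numbers are hypothetical, and the use of the sample SD (n − 1 denominator) is an assumption; the text does not say which SD estimator was used.

```python
from statistics import mean, stdev

def standardise(prs_values, reference_values):
    """Express PRS values in SD units of the reference group."""
    ref_mean = mean(reference_values)
    ref_sd = stdev(reference_values)  # sample SD; an assumption of this sketch
    return [(value - ref_mean) / ref_sd for value in prs_values]

# Invented raw scores, for illustration only.
control_prs = [0.10, 0.30, 0.50, 0.70]
case_prs = [0.60, 0.90]

control_sprs = standardise(control_prs, control_prs)  # mean 0, SD 1 by construction
case_sprs = standardise(case_prs, control_prs)        # cases relative to controls
```

Changing the reference group, as done for the family-based analysis, only shifts and rescales the scores; the ordering of individuals is unaffected.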
Total BOADICEA score and polygenic load (BOADICEAFH)
The pedigrees were collected and drawn for all families, including all known first-degree and second-degree relatives of the genotyped individuals. For 25 of the 561 family members affected with breast cancer, the age of breast cancer diagnosis was not known. For these affected family members, the age at diagnosis was assumed to equal the average age of developing breast cancer in the Netherlands (61 years), or the age at last follow-up if this was earlier.
Two different scores were calculated for all individuals in the family cohort by the online risk prediction tool BOADICEA:10 the total BOADICEA score and the polygenic load. The total BOADICEA score (hereafter termed BOADICEALTR) is a measure for lifetime breast cancer risk and incorporates BRCA1 and BRCA2 status, age, birth cohort and a polygenic load. The polygenic load in the BOADICEA model is an estimated polygenic component representing a large number of loci of small effect to capture the residual familial aggregation of breast cancer and is therefore a measure of the breast cancer family history.15 Calculation of the polygenic load has been described previously by Muranen et al.15 To avoid confusion between the variables polygenic load and the PRS, the polygenic load is hereafter termed BOADICEAFH. The BOADICEALTR and BOADICEAFH were calculated by simulating an individual to be at an age of 1 year and unaffected (for cases), that is, lifetime risk at birth, given the family history.
To define the degree of correlation between the sPRS and the BOADICEAFH, the Pearson correlation coefficient was calculated. A Cox-type random effect regression model was used to estimate the association between the sPRS and breast cancer, adjusting for family history, using the BOADICEAFH (FH) as covariate:
$$\lambda_{ij}(t_{ij} \mid u_i) = u_i\,\lambda_0(t_{ij})\,\exp(\beta_1\,\mathrm{sPRS}_{ij} + \beta_2\,\mathrm{FH}_{ij}) \qquad (1)$$
where $t_{ij}$ is the age at first diagnosis of breast cancer or the age at censoring for member $j$ in family $i$. Censoring was done at age of last contact with the family or death. Censoring at the age of diagnosis for other tumours, if present, did not affect the result. $\lambda_0$ refers to the baseline hazard, which is left completely unspecified (Cox-type model), $\beta_1$ is the main effect of interest, the regression coefficient of the sPRS, and $\beta_2$ is the effect of the BOADICEAFH. In comparing affected to unaffected relatives, it is important to adjust for different numbers of affected versus unaffected relatives per family. We therefore added a family-specific random effect $u_i > 0$ to our model, shared by the members of the same family. This unobserved heterogeneity shared within families was assumed to follow a gamma distribution.
To evaluate the potential of the sPRS on the reclassification of breast cancer risk, we constructed a new individual breast cancer risk score based on both the BOADICEALTR and the estimated effect of the sPRS with the model defined by expression 1. Namely, since BOADICEALTR is defined as the probability of experiencing breast cancer before age 80 years, the new score is calculated as the distribution function at 80 of a Cox proportional hazard model using BOADICEALTR as baseline (average risk in the sample) and the sPRS as covariate:
$$\mathrm{BOADICEA}_{\mathrm{sPRS}} = 1 - \left(1 - \mathrm{BOADICEA}_{\mathrm{LTR}}\right)^{\exp(\beta_1\,\mathrm{sPRS})} \qquad (2)$$
where $\beta_1$ is the regression coefficient of the sPRS estimated with expression 1.
The sPRS is expected to individualise cancer risk estimates but not to alter the overall average risk level computed by BOADICEA in the joint sample, that is, the higher risks given to some individuals are expected to be compensated by lower risks in others. For this reason, we centred the sPRS at the mean of the whole family cohort.
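The combined score can be illustrated with a short sketch, assuming the standard proportional-hazards form in which the baseline survival to age 80, 1 − BOADICEALTR, is raised to the power exp(β × sPRS). The function name and the 25% baseline risk are hypothetical; the coefficient is taken as the natural log of the HR of 1.16 per SD reported in the Results.

```python
import math

def updated_lifetime_risk(boadicea_ltr, sprs_centred, beta):
    """Lifetime risk to age 80 after applying the sPRS effect.

    Assumes proportional hazards: survival = baseline_survival ** exp(beta * sPRS).
    """
    baseline_survival = 1.0 - boadicea_ltr
    return 1.0 - baseline_survival ** math.exp(beta * sprs_centred)

BETA_SPRS = math.log(1.16)  # log of the per-SD hazard ratio reported in the Results

# Hypothetical woman with a 25% family-history-based lifetime risk.
risk_avg = updated_lifetime_risk(0.25, 0.0, BETA_SPRS)    # sPRS at the cohort mean
risk_high = updated_lifetime_risk(0.25, 1.0, BETA_SPRS)   # one SD above the mean
risk_low = updated_lifetime_risk(0.25, -1.0, BETA_SPRS)   # one SD below the mean
```

With a centred sPRS of 0 the score equals the BOADICEALTR itself, which is why centring at the cohort mean leaves the average risk level roughly unchanged while spreading individual estimates around it.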
The risk calculation based on BOADICEA alone (BOADICEALTR) and the new individual breast cancer risk score (BOADICEAsPRS) were compared for all individuals in the family-based cohort to define the change in risk category and thus advice for breast cancer surveillance according to three different guidelines, NICE,17 NCCN16 and IKNL18 (online supplementary table S2).
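The comparison of risk categories before and after adding the sPRS can be sketched as follows. The cut-offs and the risk values below are illustrative placeholders, not the guideline thresholds listed in online supplementary table S2, and the function names are hypothetical.

```python
from bisect import bisect_right

def risk_category(lifetime_risk, thresholds):
    """Index of the risk band: 0 is the lowest; a risk equal to a cut-off moves up."""
    return bisect_right(sorted(thresholds), lifetime_risk)

def proportion_reclassified(risks_before, risks_after, thresholds):
    """Share of individuals whose risk band differs between the two scores."""
    changed = sum(
        risk_category(before, thresholds) != risk_category(after, thresholds)
        for before, after in zip(risks_before, risks_after)
    )
    return changed / len(risks_before)

# Placeholder cut-offs and risks, invented for illustration only.
ILLUSTRATIVE_THRESHOLDS = [0.17, 0.30]
risks_family_history = [0.15, 0.18, 0.29, 0.35]
risks_with_sprs = [0.16, 0.16, 0.31, 0.36]

share_changed = proportion_reclassified(
    risks_family_history, risks_with_sprs, ILLUSTRATIVE_THRESHOLDS
)
```

Running the same comparison with each guideline's own cut-offs gives one reclassification percentage per guideline; individuals whose risk changes without crossing a cut-off do not count as reclassified.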
Statistical significance was established at 5%. Analysis was performed using R V.18.104.22.168
Results
The analysis of the ORIGO cohort included 357 breast cancer cases and 327 population controls. The analysis of the family-based cohort included 323 breast cancer cases and 262 unaffected relatives from 101 families. Unaffected relatives derived from 49 of these 101 families.
Virtually all breast cancers were invasive in both cohorts, and second breast cancers were more prevalent in familial cases (table 1). In both the ORIGO and family-based cohort, the sPRS was on average higher in cases than in controls (table 2). The unaffected relatives in the family-based cohort had on average a higher sPRS in comparison with ORIGO cases and controls. The mean sPRS for sporadic cases was 0.35 (SD=0.92), and in the family-based cohort, the mean sPRS was 0.70 (SD=0.90) and 0.53 (SD=0.95) for the affected and unaffected relatives, respectively. In the family-based cohort, the sPRS was higher for cases with two invasive breast tumours in comparison with cases with one breast tumour (invasive/in situ), with a mean sPRS of 0.89 (SD=0.93) and 0.66 (SD=0.89), respectively. The distributions of the sPRS in both cohorts are shown in figure 1. Information about the 95% CI and SE in different groups is shown in table 2.
Further analyses were performed only for the family-based cohort. A weak but statistically significant positive correlation was detected between the BOADICEAFH (measure of the family history) and the sPRS. The Pearson correlation coefficient was 0.103, 95% CI 0.022 to 0.183, p=0.013, which means that 1.1% of the variance in the sPRS is explained by the BOADICEAFH. Larger correlation was found in the unaffected relatives (correlation coefficient 0.153, 95% CI 0.032 to 0.269, p=0.013). No evidence of correlation was found in family cases only (correlation coefficient 0.057, 95% CI −0.052 to 0.165, p=0.306).
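The link between the correlation coefficient and the share of variance explained is simply the square of r. The sketch below also recomputes a 95% CI under two assumptions that the text does not state: that the interval was obtained with the usual Fisher z-transformation, and that all 585 genotyped women in the family-based cohort contributed to the estimate.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate CI for a Pearson correlation via the Fisher z-transformation."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

r = 0.103  # reported Pearson correlation between BOADICEAFH and sPRS
n = 585    # 323 cases + 262 unaffected relatives (assumed sample size)

variance_explained = r ** 2        # proportion of variance shared by the two scores
ci_low, ci_high = fisher_ci(r, n)  # approximate 95% CI under the stated assumptions
```

Under these assumptions the squared correlation is about 0.011, matching the 1.1% quoted in the text, and the interval comes out close to the reported 0.022 to 0.183.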
Cox-type random effects modelling
The sPRS should not be directly combined with the BOADICEALTR because the PRS is a part of the familial relative risk, captured by BOADICEA through its polygenic component, the BOADICEAFH. For this reason, the association analysis was adjusted for the BOADICEAFH, using the model defined by expression 1. Furthermore, adjusting for the BOADICEAFH helps to correct for ascertainment bias. The BOADICEAFH was calculated for cases assuming they were at age 1 year and unaffected. Consequently, controls in our sample have a larger BOADICEAFH than cases. Hence, adding the BOADICEAFH as a covariate in the model indirectly corrects for the oversampling of cases in our design. Within the family-based cohort, the sPRS was significantly associated with breast cancer, conferring an HR of 1.16 (95% CI 1.03 to 1.28; p=0.026) per SD. No statistically significant association was found without adjustment, HR 1.10, 95% CI 0.98 to 1.23, p=0.122.
PRS-based individualised risk score
To calculate a PRS-based breast cancer risk score (BOADICEAsPRS), the individual sPRS was combined with the BOADICEALTR. Both risk scores for each individual in the family-based cohort are plotted against each other in figure 2. This resulted in a change in breast cancer lifetime risk for all individuals. We evaluated the proportions of individuals that would fall in another risk management category, given risk cut-off levels from three different clinical guidelines. Risk management changed for 19.8%, 14.7% and 11.5% of women under the IKNL,18 NICE17 and NCCN16 guidelines, respectively (table 3). The percentages of family cases and unaffected relatives who changed to a lower or higher risk category based on these guidelines are shown in online supplementary table S3. Examples of the change in breast cancer risk category are shown for individuals in three pedigrees in figure 3 and online supplementary table S4.
Discussion
PRSs, derived from a combination of disease-associated SNPs, are gaining importance as predictive factors for a range of disease phenotypes, including breast cancer.26 All breast cancer SNPs discovered to date explain 18% of the familial relative risk.4 Here, we use a PRS based on these SNPs to show the potential clinical utility within high-risk breast cancer families. While most studies use population controls as a reference group,2 8 12 13 we used the healthy relatives of breast cancer cases as a reference to make it more compatible with clinical practice in Family Cancer Clinics. Similar to population-based case–control studies,2 12 13 we found that the PRS was significantly associated with breast cancer within high-risk breast cancer families. In addition, the PRS may change breast screening recommendations in a substantial proportion of women from these families, according to currently used screening guidelines.16–18 Because data on ER status were incomplete, we did not calculate PRSs predictive for ER-positive or ER-negative disease.5 27 While breast cancer screening guidelines are mainly based on overall breast cancer risk, some guidelines suggest discussing the use of chemoprevention with women at high risk of breast cancer.16 17 We expect these ER-specific PRSs, similar to the overall PRS, to individualise these discussions within these families.
Some studies have described an association between the PRS and contralateral breast cancer.8 28 In agreement with this, we found the average sPRS in women diagnosed with two primary breast cancers in our family cohort to be higher in comparison with women with one breast cancer (similarly in ORIGO cases, online supplementary figure S1 and table S5). Thus, the PRS may be helpful in managing contralateral breast cancer risk and in guiding the choice for treatment or risk-reducing mastectomy.
The family-based cohort used in our study was not part of the cohort used to discover the breast cancer associated SNPs by GWAS, while the ORIGO cohort was.3 4 A notable finding in our family-based cohort was that unaffected relatives of familial breast cancer cases had on average a higher sPRS than ORIGO incident breast cancer cases, not selected by family history. This may be due to our selection of families with multiple cases of breast cancer, since SNPs of this PRS are expected to cluster in breast cancer families. Moreover, the mean sPRS we calculated for ORIGO cases was lower than found in a large population-based study.2 Since we found no evidence for substructures in the ORIGO cohort (online supplementary figure S1 and table S5), this effect is probably due to the relatively small number of ORIGO cases included in this study.
Three previous studies have also genotyped breast cancer cases and their unaffected relatives.7 15 29 These studies found an association with breast cancer as well, but effect sizes are difficult to compare because of differences in methodology and cohort selection criteria. Furthermore, these studies used a much smaller number of SNPs to calculate the PRS. Li et al 7 analysed a prospective dataset and concluded that their 24-SNP PRS could have altered clinical management in up to 23% of women, regarding an MRI screening threshold of 20% breast cancer lifetime risk. Evans et al 29 performed a case–control study of women attending a familial risk clinic and showed that their 18-SNP PRS moved 52% of the controls without a pathogenic variant in BRCA1 or BRCA2 to a different lifetime risk category based on the NICE guideline.17 29
In our study, we adopted a conditional approach for association analysis because of the large heterogeneity between the families. Although our use of the BOADICEAFH adjusts for family history, the HR is probably still underestimated given the strong selection criteria used in our study. Of note, this BOADICEAFH is not a true family score in a clinical sense, given the retrospective nature of our family cohort. In clinical practice, the risk scores are only calculated for unaffected family members, while in this study, we derive the BOADICEAFH also for cases, assuming they were at age 1 year and unaffected. With this definition, controls have, in general, a larger BOADICEAFH than cases. Hence, adding the BOADICEAFH as a covariate in the model indirectly corrects the oversampling of cases of our design. The same definition of the BOADICEAFH is also used when computing BOADICEALTR and the new individual score BOADICEAsPRS, given by expression 2.
We found that 1.1% of the variance in the sPRS is explained by the BOADICEAFH. Given that 18% of the familial relative risk for breast cancer is explained by the currently known SNPs, this is lower than expected. Nonetheless, other studies have also found a weak correlation or no correlation at all between the PRS and the BOADICEAFH or total BOADICEA score.12 15 Thus, BOADICEA appears to be a poor predictor of the PRS, underscoring the value of measuring the PRS for every individual in the family instead of using an estimated PRS based on the total family history.
It is estimated that a large number of SNPs just below the level of genome-wide significance, combined with the currently used 161 SNPs, are able to explain about 41% of the familial relative risk.4 Addition of these SNPs could potentially further refine risk prediction and improve the discriminatory power of the PRS. Studies are now ongoing to find the best performing PRS, including also these SNPs. Khera et al 30 found that a PRS of 5218 SNPs associated with breast cancer at a significance level of <5×10⁻⁴, combined with age, had the best performance based on the area under the receiver-operator curve. Mavaddat et al 31 used a hard-thresholding approach to include 313 SNPs at a significance level of <10⁻⁵. A further improvement for breast cancer risk prediction could come from information on pathogenic variants in non-BRCA high-risk or moderate-risk breast cancer genes (eg, PALB2, CHEK2 and ATM). Pathogenic variants in these genes are found in approximately 4%–6% of women affected with breast cancer.32 33 Recently, the BOADICEA model has been extended with incorporation of the effects of truncating variants in CHEK2, PALB2 and ATM and the 313-SNP based PRS to calculate breast cancer lifetime risks.34 A limitation of our study is that we had no ethical approval to test CHEK2, PALB2 and ATM in the studied families. Extrapolating from expected prevalences of pathogenic variants in these genes, we estimate the total percentage of individuals that would have changed to another risk category by addition of the PRS to be 3%–4% higher than the 20% we report here.
In summary, we showed that the PRS based on the most recently discovered breast cancer SNPs can be used for breast cancer risk prediction within high-risk breast cancer families. Individualising breast cancer risk prediction by adding the individual 161-SNP PRS to family history-based risk prediction may change screening recommendation in up to 20% of the individuals in these families. While this study illustrates the importance of clinical applicability of the PRS, our results must be interpreted with caution. The HR obtained in this family cohort cannot be translated directly to the clinic as the effect size must be validated in another larger familial breast cancer cohort. Further evaluation, preferably in prospective settings, will be needed.
We would like to thank Prof. D.F. Easton (University of Cambridge, United Kingdom) for critical review of the manuscript. We would like to thank M.E. Braspenning (Leiden University Medical Centre, the Netherlands) for drawing all pedigrees from the family-based cohort. We would like to thank Dr S. Böhringer and R.L.M. Tissier (Leiden University Medical Centre, the Netherlands) for helpful discussions on statistics at the beginning of the project.
IMML and FSH contributed equally.
Contributors PD and CvA designed and supervised the project. AH, CS, HMH, JO, NH, EO, HV and CvA recruited the included breast cancer families and provided the DNA samples. MV and FH contributed to DNA sample preparation. FH and IL were responsible for data acquisition. AL calculated the BOADICEA scores. IL analysed the results with support from MRG. IL, MRG, PD and CvA were involved in data interpretation. IL wrote the manuscript with support from MRG, CvA and PD. All authors read and approved the final manuscript.
Funding This work was supported by the Dutch Cancer Society (KWF), grants UL2009-4388 and UL2014-7473.
Competing interests None declared.
Patient consent for publication Not required.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement Data are available upon reasonable request.