Article Text

*RAD51C*,

*RAD51D*,

*BARD1*updates to tumour pathology and cancer incidence

## Abstract

**Background** BOADICEA (Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm) for breast cancer and the epithelial tubo-ovarian cancer (EOC) models included in the CanRisk tool (www.canrisk.org) provide future cancer risks based on pathogenic variants in cancer-susceptibility genes, polygenic risk scores, breast density, questionnaire-based risk factors and family history. Here, we extend the models to include the effects of pathogenic variants in recently established breast cancer and EOC susceptibility genes, up-to-date age-specific pathology distributions and continuous risk factors.

**Methods** BOADICEA was extended to further incorporate the associations of pathogenic variants in *BARD1*, *RAD51C* and *RAD51D* with breast cancer risk. The EOC model was extended to include the association of *PALB2* pathogenic variants with EOC risk. Age-specific distributions of oestrogen-receptor-negative and triple-negative breast cancer status for pathogenic variant carriers in these genes and *CHEK2* and *ATM* were also incorporated. A novel method to include continuous risk factors was developed, exemplified by including adult height as continuous.

**Results**
*BARD1*, *RAD51C* and *RAD51D* explain 0.31% of the breast cancer polygenic variance. When incorporated into the multifactorial model, 34%–44% of these carriers would be reclassified to the near-population and 15%–22% to the high-risk categories based on the UK National Institute for Health and Care Excellence guidelines. Under the EOC multifactorial model, 62%, 35% and 3% of *PALB2* carriers have lifetime EOC risks of <5%, 5%–10% and >10%, respectively. Including height as continuous, increased the breast cancer relative risk variance from 0.002 to 0.010.

**Conclusions** These extensions will allow for better personalised risks for *BARD1*, *RAD51C*, *RAD51D* and *PALB2* pathogenic variant carriers and more informed choices on screening, prevention, risk factor modification or other risk-reducing options.

- genetic counseling

## Data availability statement

Data sharing not applicable as no datasets generated and/or analysed for this study. The models are implemented in the CanRisk tool, which is freely available at www.canrisk.org.

This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https://creativecommons.org/licenses/by/4.0/.

## Statistics from Altmetric.com

#### WHAT IS ALREADY KNOWN ON THIS TOPIC

Pathogenic variants in

*BARD1*,*RAD51C*and*RAD51D*have recently been established as breast cancer susceptibility genes, and pathogenic variants in*PALB2*have been shown to be associated with epithelial ovarian cancer risk. No cancer risk prediction model currently exists which incorporates these associations.

#### WHAT THIS STUDY ADDS

The BOADICEA (Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm) multifactorial breast and ovarian cancer risk prediction model has been extended to incorporate these associations and has been implemented in the CanRisk tool (www.canrisk.org) for use by healthcare professionals.

#### HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE AND/OR POLICY

The enhanced risk prediction models will enable healthcare professionals to provide personalised breast and epithelial ovarian cancer risks to

*BARD1*,*RAD51C*,*RAD51D*and*PALB2*pathogenic variant carriers and will allow for more informed choices on cancer risk management options.

## Introduction

Breast cancer (BC) and epithelial tubo-ovarian cancer (EOC) are two of the most common cancers in women.1 2 Through mammography or other methods, screening for BC can reduce mortality, and organised screening is available in most developed countries.3 For EOC, no effective screening exists, but the disease can be prevented by salpingo-oophorectomy. However, these preventative options are associated with adverse effects. Therefore, identifying those at increased risk may help to target screening and preventative options to those most likely to benefit.4 Both BC and EOC risks are multifactorial diseases, with family history of cancer (FH), genetic factors and lifestyle, hormonal and reproductive risk factors (RFs) all contributing to risk.5–7

Previously we developed the BOADICEA (Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm) model for BC risk prediction and for the likelihood of carrying pathogenic variants (PVs) in BC susceptibility genes. BOADICEA v5 incorporates the effects of PVs in five BC susceptibility genes (*BRCA1*, *BRCA2*, *PALB2*, *CHEK2* and *ATM*), the effects of known common genetic variants summarised as a polygenic risk score (PRS, accounting for ~20% of the polygenic variance), and a polygenic component that accounts for any residual familial aggregation.8 9 We also developed a similar EOC model (Ovarian Cancer Model v1) that considers the effects of PVs in *BRCA1*, *BRCA2*, *RAD51D*, *RAD51C* and *BRIP1* on EOC together with a PRS (accounting for ~5% of the polygenic variance) and a residual polygenic component.10 11 BOADICEA includes mammographic density and both models incorporate the effects of known lifestyle, hormonal, reproductive and anthropometric RFs. In addition, the models incorporate breast tumour heterogeneity by considering the distributions of tumour oestrogen receptor (ER) and triple-negative (TN) (ER, progesterone receptor and human epidermal growth factor receptor 2 negative) status for *BRCA1* and *BRCA2* PV carriers and the general population.12 13 Both models are freely available to healthcare professionals via the CanRisk webtool (www.canrisk.org) and are widely used by healthcare professionals.14

Recently, large population-based and family-based targeted sequencing studies have established that PVs in *RAD51C*, *RAD51D* and *BARD1* are associated with BC risk15 16 and that PVs in *PALB2* are associated with EOC risk.17 18 In addition, analysis of the tumour characteristics in the BRIDGES study has provided age-specific estimates of the distributions of tumour characteristics for PV carriers in all established susceptibility genes.19

A further limitation of the previous models is that all epidemiological RFs are treated as categorical. However, some RFs (eg, height, body mass index (BMI) mammographic density) are intrinsically continuous, and discretisation results in a loss of information, reducing their predictive ability.

Here we extend both models to explicitly model the effects of PVs in the recently established BC and EOC susceptibility genes and incorporate up-to-date age-specific pathology distributions. We present a methodological framework for incorporating continuous RFs into the model, and we demonstrate this by including height as a continuous variable. Finally, we describe updates to the population reference cancer incidence rates used in the models by incorporating more up-to-date incidences, incidences for additional countries and refining the derivation of birth-cohort-specific incidences for inclusion in the models that address sparsity in the population incidence data.

## Methods

### Rare moderate-risk pathogenic variants

Both BOADICEA and the EOC models, model cancer incidence as an explicit function of PVs in known high- and moderate-penetrance susceptibility genes (major genes) together with a polygenic component.9–12 20–22 By using an explicit genetic model, they can account for both genetic testing and detailed FH. BOADICEA includes the genes *BRCA1*, *BRCA2*, *PALB2*, *CHEK2* and *ATM*, with dominance in that order, along with a BC susceptibility polygenic component. The EOC model includes the genes *BRCA1*, *BRCA2*, *RAD51D*, *RAD51C* and *BRIP1*, with dominance in that order, along with an EOC susceptibility polygenic component. Details of the underlying model are included in the online supplemental material. The values of the parameters for the original models were determined by complex segregation analysis.9 10 However, this was not possible for the extended versions since no sufficiently large dataset containing all the model features was available. Instead, we adopted a synthetic approach,23 in which additional model parameters are taken from large-scale external studies.8 11 12 21

### Supplemental material

Here, BOADICEA was extended to explicitly model the effects of PVs in *BARD1*, *RAD51C* and *RAD51D* (i.e. in total, eight BC susceptibility genes), while the EOC model was extended to include *PALB2* (i.e. in total, six EOC susceptibility genes). In both models, the effects of PVs were included as major genes and are parameterised by their allele frequency in the general population and their age-specific relative risks (RRs). The BC RR for carriers of PV in *BARD1* was taken from the BRIDGES study,15 while those for *RAD51C* and *RAD51D* were the meta-analysed values from Dorling *et al*
15 and Yang *et al*.24 The EOC RR for *PALB2* PV carriers was taken from Yang *et al*.17 The BRIDGES study15 suggested that the RR estimates associated with PVs in *ATM* are lower than the previously assumed estimate of 2.8,21 and it was therefore updated to the Dorling *et al*
15 estimate. The previously assumed RR estimates for PVs in *BRCA1*, *BRCA2*, *PALB2* and *CHEK2*
8 21 were based on large studies that enabled the estimation of age-specific risks or were estimated as part of the BOADICEA model fitting process, and were not updated, except for the *BRCA2*-associated EOC RRs for ages 59 and over (online supplemental material). The PV carrier frequencies for *PALB2*, *CHEK2* (including all PVs), *ATM*, *BARD1*, *RAD51D*, *RAD51C* and *BRIP1* and screening test sensitivities for all genes were derived from Dorling *et al*.15 We used the BRIDGES study to derive these frequency estimates as it is a very large population-based dataset that includes targeted sequencing data. Frequencies were based on the control frequencies in European populations, adjusted for the assumed sensitivity of the sequencing and the fact that large rearrangements were not detectable (online supplemental material). The default sensitivities were then calculated, assuming that clinical genetic testing will detect all known pathogenic mutations except for large rearrangements (except *BRCA1* and *BRCA2*, where testing for large rearrangements is routinely done). All model parameters for PVs are given in table 1.

As the polygenic component captures all residual familial aggregation not explained by the major genes, the previous models implicitly included the contributions of PVs in the new genes (ie, *BARD1*, *RAD51C* and *RAD51D* for BOADICEA and *PALB2* for the EOC model). Therefore, to avoid double counting their contribution, it was necessary to remove their contribution from the polygenic component by adjusting the log-RR per SD of the polygenic component such that the total variance of the polygenic component and the new genes is the same as that of the polygenic component of the previous model21 (online supplemental material).

The association between *PALB2* PVs and EOC was also included in the BOADICEA model, and the associations with male BC and pancreatic cancer have been included in both models.17

The impact of including PVs in the new BC susceptibility genes on risk prediction were assessed by considering the risk categories described in the National Institute for Health and Care Excellence familial BC guidelines25 for hypothetical women with different ages or family history. For lifetime risk (aged 20–80 years), three categories are defined: (1) near-population risk, for risks less than 17%, (2) moderate risk, for risks in the range of 17%–30% and (3) high risk, for risks of 30% or greater. Reclassification was considered based on questionnaire-based RFs (QRFs) (RFs other than mammographic density), mammographic density (MD, based on the BI-RADS system) and a polygenic risk score (PRS). The assumed distributions and RRs for QRFs and MD have been described in detail previously.8 11 For BC, the PRS was taken to be the Breast Cancer Association Consortium 313 variant PRS, which accounts for 20% of the overall polygenic variance.8 26 For EOC, we defined three risk categories based on lifetime risk27 28: (1) near-population risk, for risks of less than 5%, (2) moderate risk, for risks in the range of 5%–10% and (3) high risk, for risks of 10% or greater, and reclassification was considered based on RFs and PRS. For EOC, the PRS was taken as the Ovarian Cancer Association Consortium 36 variant PRS, which accounts for 5% of the overall polygenic variance.11 29

### Updates to tumour pathology

Both models incorporate data on BC tumour pathology, specifically ER and TN. The distribution of pathology for affected carriers of PVs differs substantially from that in non-carriers for several genes, so that pathology data can affect the carrier probabilities and hence cancer risks.11 12 In BOADICEA and the EOC model, breast tumours are classified into five groups based on ER and TN status: ER unknown, ER-positive, ER-negative/TN unknown, ER-negative/not TN and TN. Previously, the models achieved this using age-dependent distributions in the general population and *BRCA1* and *BRCA2* PV carriers and an age-independent distribution for *CHEK2* PV carriers.12 21 Due to the lack of data, the tumour ER distribution for carriers of PV in other genes was assumed to be the same as the general population. Here, the models have been updated to incorporate age-dependent ER and TN tumour distributions for carriers of PVs in the BC susceptibility genes *PALB2*, *CHEK2*, *ATM*, *BARD1*, *RAD51C* and *RAD51D*, using data from BRIDGES.19

### Continuous risk factors

The previous versions of the models included reproductive, lifestyle, hormonal and anthropometric RFs.8 11 One limitation of these models was that the RFs needed to be coded as categorical variables. Some RFs are naturally continuous, requiring prior discretisation to a finite number of categories, resulting in some loss of information and reduction in risk discrimination. Here, the methodology was extended to allow the inclusion of continuous RFs.

The key challenge is to calculate the baseline incidences
in equation s.1 (online supplemental file 1) from the population incidence and the RF distributions. The baseline incidences are calculated sequentially for each age
t
(considered discrete) using the values at age
, starting from age 0, requiring the evolution with age of the probability distribution of those who are disease free.30 For discrete factors/genes, this involves summing over all possible categories/genotypes, but for continuous factors/genes, it would involve integrating over all possible values. In principle, these integrals could be computed (either analytically or numerically). However, at each age, the number of terms in the integrand increases by a factor of 2, so by age 80, there are >10^{24} terms, with evaluation becomes impracticable. Alternatively, the RF could be discretised into a very large number of categories. This would give a very close approximation to the continuous distribution, but (particularly once multiple RFs are considered, as here) the large number of categories would also make the calculations impractical. Instead, we propose an alternate approach in which the continuous factors are discretised with categories adapted according to the observed RF. The approach is as follows:

1. First, discretise the range of possible RF values into a finite number ( n ) of bins and calculate the probability mass and RR for each bin from the probability density and RR function for the continuous RF. This part is identical to the standard approach for discretising RFs, used in the existing models.8 For a RF, x , with probability density and relative risk , the probability mass for bin i with range is:

(1)

and the corresponding RR is

(2)

2. Create an additional bin based on the individual’s measured RF value that has an infinitesimal width. The RR for this bin is taken as the RR at the measured value, and it has zero mass. As this bin is infinitesimal, its overlap with the other bins is zero, so there is no double-counting.

This procedure creates a categorical RF with categories, where the individual is assigned to the category defined in step 2. This allows the exact value of the risk for the individual to be used, while the number of categories required to compute the baseline rates is fixed, limiting the computation time.

The accuracy of the approximation in the procedure relies on the assumption that the range of values within each bin have similar RRs, which should be reflected in the choice of discretisation scheme and the number of bins n . These choices will depend on the shape of the distribution and the RR function.

The above procedure can be applied to any RF distribution or RR function. However, the process assumes that an individual’s position within the distribution is fixed with respect to age, although the value of the RF and RR may vary with age. Here, the method was applied to height.

### Updates to population incidences

The baseline incidences in equation s.1 in online supplemental file 1 are birth year and country specific as a consequence of using birth year and country-specific population incidences in the constraining process. We refined the derivation of cohort-specific population incidences to account for variability in the incidences due to small numbers. In addition, we have updated existing incidences in the model to include more recent calendar periods and adapted the model to use cancer incidence from four new populations: the Netherlands, France, Slovenia and Estonia. Details are included in the online supplemental material.

## Results

### Rare moderate-risk pathogenic variants

Table 1 summarises the models’ genetic parameter estimates, including those for the new genes. The estimated cumulative age-specific BC risks for *BARD1*, *RAD51C* and *RAD51D* PV carriers in BOADICEA and EOC risks for *PALB2* carriers, assuming the UK incidences applicable to those born in the 1980s, are shown in figure 1. The estimated average lifetime BC risks for PV carriers are 24%, 22% and 21% for *BARD1*, *RAD51C* and *RAD51D* PV carriers, respectively. The estimated lifetime EOC risk for *PALB2* carriers is 5.0%. Based on the assumed allele frequencies, 0.22% of the population carry PV in the genes *BARD1*, *RAD51C* or *RAD51D*, and these explain on average 0.31% of the female BC polygenic variance (averaged over all ages and cohorts, weighted by the age-specific and cohort-specific BC incidences). Approximately 0.13% of the population carry PVs in *PALB2*, explaining 0.16% of the EOC polygenic variance and 2.5% of the male BC polygenic variance.

Figure 2A–F and online supplemental table S1 show the distributions of lifetime BC risks for carriers of PVs in B*ARD1*, *RAD51C* and *RAD51D* for a female with unknown FH and a female whose mother is affected at age 50 based on PV carrier status alone and including QRF, MD and a PRS. Based solely on PV carrier status, all females with unknown FH would be classified as at moderate risk. When information on QRF, MD or PRS is known, there is significant reclassification to near-population and high-risk categories, which is greatest when all factors are used in combination. For example, based on lifetime BC risks and using the full multifactorial model incorporating QRF, MD and with unknown FH would be classified as at moderate risk. When information on QRF, MD or PRS is known, there is significant reclassification to near-population and high-risk categories, which is greatest when all factors are used in combination. For example, based on lifetime BC risks and using the full multifactorial model incorporating QRF, MD and PRS, 33.9% of *BARD1* PV carriers with unknown FH would be reclassified from moderate risk to near-population risk, and 21.9% would be reclassified to high risk (online supplemental table S1). Similarly, *BARD1* PV carriers with an affected first-degree relative would be considered high risk (risk of 33.7% by age 80) based on family history and PV status alone. Incorporating the other risk factors would reclassify 12% as near-population risk and 40.2% as moderate risk (online supplemental table S1).

Figure 2G,H and online supplemental table S2 show the distribution of lifetime EOC risks for carriers of PVs in *PALB2* for a female with unknown FH and a female whose mother is affected at age 50, as a function of the RFs and PRS. For a *PALB2* carrier with unknown FH, when the RFs and PRS are considered jointly, 62.4% are classified as near-population risk, 34.9% as moderate risk and 2.7% as high risk. The corresponding proportions with an affected mother are 11.2%, 55.8% and 33%, respectively. However, even among *PALB2* carriers with an affected mother, 97.5% will have risks of less than 3% by age 50 (online supplemental table S2).

### Tumour pathology

Figure 3 and online supplemental tables S3 and S4 show the age-specific distributions of ER-negative tumours and TN tumours among ER-negative tumours used in the models for *PALB2*, *ATM*, *CHEK2*, *BARD1*, *RAD51C* and *RAD51D* PV carriers based on the BRIDGES data.19 *BARD1*, *RAD51C* and *RAD51D PV* carriers predominantly develop ER-negative BCs, and the proportions decrease with increasing age. On the other hand, *CHEK2* and *ATM* carriers primarily develop ER-positive BCs, and the proportion of ER-positive tumours increases with age. Among those with ER-negative tumours, most tumours are TN for PV carriers in all genes, except *CHEK2* carriers, in whom the majority are ER-negative but not TN.

Using the updated age-specific and gene-specific ER-negative and TN tumour status distributions resulted in differences in the predicted overall and gene-specific carrier probabilities by different tumour pathology and age (figure 4). For *ATM*, the carrier probabilities for ER-negative tumours are reduced relative to previous estimates, reflecting the stronger association with ER-positive disease. Carrier probabilities for *CHEK2* now show a decline with age for ER-negative tumours (previously, this was only predicted for ER-positive disease). The carrier probabilities for *PALB2* remain similar to previous estimates. For the new genes *BARD1*, *RAD51C* and *RAD51D*, the carrier probabilities are, as expected, higher for ER-negative and TN diseases, but there is little variation by age.

### Continuous risk factors

As previously, adult female height was assumed to be normally distributed with mean 162.81 cm and SD 6.452 cm, and be associated with a log-RR per SD, for both BC and EOC, of 0.10130.8 11 We therefore discretised the normal distribution such that the probability masses of the bins were given by a binomial distribution , giving sufficient discretisation to adequately capture the tails of the distribution. We examined the relative discretisation error of the predicted lifetime risk as a function of the number of bins (figure 5E,F) and chose , as the lowest number of bins such that the root-mean-square relative error was less than . Compared with the discrete (five-level) RF, the variance of the RR of both BC and EOC increased from 0.002 to 0.010 when height was included as a continuous RF. The effects on predicted lifetime risks are shown in figure 5 A–D). Under the continuous implementation here, the lifetime BC risk varied from 9.7% for the first percentile to 14.6% for the 99th percentile, whereas under the previous discrete distribution, the risks range from 10.1% to 14.2%.

## Discussion

This work has extended the multifactorial BOADICEA BC and EOC risk prediction models (BOADICEA v6 and the Ovarian Cancer Model v2), employing a synthetic approach.23 The explicit effects of PVs in *RAD51C*, *RAD51D*, *BARD1* and *PALB2*, which have now been established as BC and/or EOC susceptibility genes15–17 and are commonly included on cancer gene panels, are now included in the models. The models have also been extended to accommodate continuous RFs, and parameterisation of tumour pathology and cancer incidence have been updated with more recent data. These represent the most comprehensive models for BC and EOC and will allow more complete BC and EOC risk assessment of those undergoing gene-panel testing. In a separate study, the BOADICEA v6 breast cancer model presented here has been validated in an independent prospective study of 66 415 women attending mammographic screening in Sweden. The full model, including RFs, mammographic density, PRS and PVs in *BRCA1*, *BRCA2*, *PALB2*, *CHEK2*, *ATM*, *BARD1*, *RAD51C* and *RAD51D* was well calibrated overall (calibration slope 0.97 (95% CI: 0.95 to 0.99)) and in deciles of predicted 5-year risks and had a C-index of 0.71 (95% CI: 0.68 to 0.74) for discriminating between affected and unaffected women.31

By explicitly modelling the effects of PVs in the new cancer susceptibility genes, the models provide personalised cancer risks of PV carriers when combined with QRFs, MD and PRS. Although the numbers affected by these changes will be small at population level, for individuals with *RAD51C*, *RAD51D* and *BARD1* PVs and their families, the updated risks will be clinically important. *RAD51C*, *RAD51D* and *BARD1* (like *ATM* and *CHEK2*) would be classified as ‘moderate risk’ BC genes based on the average risks.15 However, according to the BOADICEA predictions, over half (56%–59%) of carriers of PVs in these genes in the population would be reclassified from being in the moderate BC risk category to either being near-population risk (34%–44%) or high risk (15%–22%), if data on the other RFs were incorporated (online supplemental table S1). Such changes may have important implications for discussions around earlier or more frequent screening or on risk-reduction options for these women. Similarly, based on the multifactorial EOC model ~38% of *PALB2* PV carriers will have lifetime EOC risks of >5% (online supplemental table S2), which may influence recommendations on the timing of risk-reducing surgery.

As previously, the models assume that the effects of the PVs in the new genes interact multiplicatively with the PRS and the RFs. No studies have yet assessed the joint effects of PVs in these genes and the PRS or RFs. Previous results for *CHEK2* and *ATM* suggest that the multiplicative model holds true for earlier versions of the PRS.32–34 Unlike *CHEK2* and *ATM*, however, the new genes predispose more strongly to ER-negative disease, and the combined effect may depart from the multiplicative assumption. Demonstrating this explicitly for the new genes will be challenging given the rarity of the mutations. The multiplicative model has also been shown to be reasonable for the combined effects of PRS and RFs,35 but there is as yet no large-scale evaluation of the combined effects of PVs and RFs. However, recent prospective validation studies of the current and previous versions of the models suggest that, overall, the models fit well.11 ,31 Should deviations from the multiplicative model between these PVs and RFs emerge, the model can be updated to take them into account.

Both the BC and EOC models incorporate PVs’ effects using the estimated population allele frequencies and RRs. These are combined with reference population incidences to calculate absolute risks while constraining the overall incidences over the RFs included in the model. Our implementation used RR and allele frequency estimates from the largest available studies on those of European ancestry.15 These were assumed to be constant across all countries. Available data are currently too sparse to obtain country-specific estimates. Although there is no evidence that RRs vary among populations, the allele frequencies are likely to vary to some extent.15 This is most apparent for *CHEK2*, where the founder c.1100delC variant (p.Thr367Metfs*15) is common in northwest Europe with carrier frequencies between 0.3% and 1.2%36 and explains the majority of carriers but is rare or absent in other populations. If population-specific variant frequencies can be generated, the model can be easily updated to accommodate these. Nevertheless, by allowing population incidences to vary by country, the predicted absolute risks given by the models are country-specific.

The updated age-specific distributions of tumour ER and TN status for six of the BC susceptibility genes in the model (*PALB2*, *CHEK2*, *ATM*, *BARD1*, *RAD51C* and *RAD51D*) should allow better differentiation between PVs that may be present in a family and provide age-specific and gene-specific mutation carrier probabilities consistent with the prevalence of PVs observed in Mavaddat *et al*.19 We note, however, that estimates are more uncertain at very young and very old ages, where the data are sparse, and more extensive validation may be required in these age-groups. Since PV carrier probabilities are used internally in the models, these will also impact the predicted absolute risks for all unaffected individuals if information on tumour characteristics is available for affected relatives whether or not they carry a PV.

We have developed a novel methodological approach for including continuous RFs into the models. We demonstrated this by including height in both the BC and EOC models, allowing for more nuanced predictions and improving the risk discrimination. While the resulting discrimination based on height alone is modest, the framework will allow other more predictive RFs to be included in the model if accurate risk estimates become available. The most important example is MD: continuous measures of MD, available through tools such as STRATUS, CUMULUS and Volpara,37–39 have been shown to have stronger associations with BC risk than the categorical BI-RADS system. Other examples include BMI and ages at menarche and menopause. Further, the method could be applied to the joint distribution of several continuous risk factors, where the integrals in equations (1) and (2) become multidimensional integrals.

We have further refined the method for creating cohort incidences from calendar period incidences (online supplemental material). The approach provides incidences that are less sensitive to year-on-year fluctuations by averaging over all years in the birth cohort. This method is particularly useful for cancers with low incidences, such as EOC and male BC, where the population size is small, and there is no prior averaging over calendar years. The refinement will have little effect on incidences from larger countries.

Our models have certain limitations. No single dataset containing all the required information was available to construct the multifactorial models, so the models were extended via a synthetic approach. The new model parameters were taken from extensive, well-designed published studies together with existing parameters from model fitting.9 10 We and others have used this approach for developing previous versions of the models,8 11 12 21 40 41 which have been shown to provide clinically valid predictions.42 43 ,31 As is the case for the previous versions, the updates presented here are primarily based on studies of those of European ancestry in developed countries. There is little evidence that the RRs associated with PVs differ by ancestry. The PV frequencies are also broadly similar across populations, except for specific founder mutations and *CHEK2* PVs, which have a much higher frequency in European than non-European populations. However, other parameters in the model, including RF and PRS distributions, will differ by population, and the model will need to be adapted for use in non-European ancestry populations and developing countries. The synthetic approach presented here allows the model to be easily customised to other populations as better estimates become available.44 45 Although we used the associations between PVs and tumour ER and TN status, the models do not currently consider the associations with intrinsic BC subtypes based on combinations of ER, progesterone receptor, HER2 and/or grade.19 The methodology described here could be used to further extend the models to consider these BC subtypes. Finally, the models make the simplifying assumption that PVs in the assumed BC and EOC susceptibility genes are associated with similar risks to those for truncating variants. These would include missense variants which have similar risks to truncating variants. However, there is evidence that missense variants in *CHEK2* and *ATM* are associated with BC risk, which may be different from the risks for truncating variants.46 The models would not be applicable to carriers of such variants.

The new model features have been built on the established and well-validated BOADICEA and EOC models.8 11 42 The updated models will allow for more personalised risk assessment and can help guide decisions on screening, prevention, risk factor modification or other risk-reducing options. The models presented are now available for use by healthcare professionals through the user-friendly CanRisk webtool (www.canrisk.org, CanRisk V.2).

## Data availability statement

Data sharing not applicable as no datasets generated and/or analysed for this study. The models are implemented in the CanRisk tool, which is freely available at www.canrisk.org.

## Ethics statements

### Patient consent for publication

### Ethics approval

Not applicable.

## References

## Supplementary materials

## Supplementary Data

This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

## Footnotes

Contributors Conceptualisation: ACA, PPDP, DFE; Methodology: AL, ACA, DFE, PPDP; Software: AL, AC, TC, LF, SA, FMW, MT, JR, JU-S, ACA; Investigation: AL, ACA, PPDP, DFE; Formal analysis: AL, NM, LF; Funding acquisition: ACA, MKS, PD, JS; Resources: PPDP, DFE, ACA, VZ, HJ, EM-F, ADP, MR, TMM; Supervision: ACA; Visualisation: AL; Writing–original draft: AL, DFE, ACA; Writing–review and editing: all authors.

Funding This work has been supported by grants from Cancer Research UK (C12292/A20861 and PPRPGM-Nov20\100002); the European Union’s Horizon 2020 research and innovation programme under grant agreement numbers 633784 (B-CAST) and 634935 (BRIDGES); the PERSPECTIVE I&I project, which the Government of Canada funds through Genome Canada (#13529) and the Canadian Institutes of Health Research (#155865), the Ministère de l’Économie et de l’Innovation du Québec through Genome Québec, the Quebec Breast Cancer Foundation, the CHU de Quebec Foundation and the Ontario Research Fund. MT is supported by the NIHR Cambridge Biomedical Research Centre (BRC-1215-20014).

Competing interests The authors ACA, DFE, AL, AC and TC are named inventors of BOADICEA v5 commercialised by Cambridge Enterprise. AL is now employed by Illumina.

Provenance and peer review Not commissioned; externally peer reviewed.

Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.