Introduction

In the majority of high-income Western countries, breast cancer screening is systematic and population-based, and this has contributed to an improvement in survival1. By contrast, screening in the majority of Asian countries is opportunistic and suffers from poor uptake, contributing to delayed detection and poor survival2. In addition, there are concerns about the appropriate starting age of screening, as women are recommended to start screening at age 50 in many Asian countries, even though the peak breast cancer incidence in Asian populations is between 40 and 50 years of age3. Taken together with the rapidly increasing incidence of breast cancer in Asia4, there is thus an urgent need to develop an appropriate screening strategy for Asian women.

Provision of genetic counselling and genetic testing for rare variants in breast cancer predisposition genes such as BRCA1 and BRCA2 can lead to better management of risk, but these only explain a small fraction of breast cancer cases in the general population5. Risk profiles based on a combination of low penetrance but common breast cancer susceptibility single nucleotide polymorphisms (SNPs), summarised as polygenic risk scores (PRS), have been shown to be an important predictor of disease risk6,7,8. A 313-SNP PRS developed in European populations has improved predictive power compared to earlier PRS based on fewer SNPs;6,7 this PRS demonstrated similar associations with disease risk in eleven independent prospective studies. Studies in European populations have demonstrated that PRS substantially improve discrimination, in comparison to risk prediction models based on classical risk factors alone8,9. In particular, using the recent extension of the BOADICEA model, it has been demonstrated that the 313-SNP PRS provides a greater level of risk stratification in the population than epidemiological risk factors alone, and that the greatest level of risk stratification is achieved when both the PRS and epidemiological risk factors are considered jointly10. Screening trials11,12,13 in women of predominantly European descent are ongoing to evaluate personalised breast cancer screening programmes based on a woman’s individual risk of disease, as a means of improving screening efficiency14.

Although there have been several efforts to create an Asian-specific PRS, these have been limited by the smaller sample size of Asian genetic studies15,16,17,18,19. Only ~20% of the existing breast cancer genome-wide association study (GWAS) data are from women of Asian ancestry20,21. This limits the precision in the relative risk estimates for individual variants, which is critical for development of predictive PRS. Furthermore, Asian populations are ethnically and genetically diverse22, and genetic associations with breast cancer risk may vary by ancestry. Here, we evaluate the predictive ability of the 313-SNP PRS developed for European women for predicting breast cancer risk in Asian women, using data from 17,262 cases and 17,695 control women of Asian ancestry, from 10 studies based in Asian countries and three studies from North America, participating in the Breast Cancer Association Consortium (BCAC); and 10,255 Chinese women from a prospective cohort. We also evaluate the heterogeneity in the associations with breast cancer risk by ethnicity. We show that European ancestry-based PRS is predictive of breast cancer risk in Asian women.

Results

SNPs included in PRS analyses

To ensure accurate determination of PRS in the ethnic-specific analyses, 26 of the 313 SNPs with imputation accuracy scores <0.9 in the combined Malaysian Breast Cancer Genetic Study (MyBrCa) and Singapore Breast Cancer Cohort (SGBCC) data (6900 cases and 7866 controls) were excluded. Hence, the PRS was constructed using 287 SNPs for all BCAC studies (Supplementary Table 1). For the Singapore Chinese Health Study (SCHS), 229 of the 287 SNPs that were polymorphic and could be imputed in this dataset were used for PRS derivation. To compare the PRS performance with that in women of European ancestry, we recalculated the PRS using these sets of SNPs in the validation and prospective cohorts of European women described in Mavaddat et al.7.

For association analyses between PRS and overall breast cancer, the PRS was calculated using overall breast cancer weights while for association analyses between PRS and subtype-specific breast cancer, subtype-specific PRSs were constructed using the same set of SNPs but weights from the hybrid method described by Mavaddat et al.7 [see section “Methods”]. The list of SNPs and corresponding weights used to construct the 287-SNP PRS and 229-SNP PRS are provided in Supplementary Data 1.

PRS and breast cancer risk in Asian women living in Asia

Data on 15,755 invasive cases and 16,483 control women from 10 Asian studies in BCAC were included (Supplementary Table 1). The mean of the 287-SNP PRS was markedly higher in Asian women compared to European women for overall breast cancer PRS (PRSOVERALL), ER-positive PRS (PRSER+) and ER-negative PRS (PRSER−), while the standard deviations (SDs) were slightly lower in Asian controls [versus European controls] for all three PRSs (0.556 [0.597], 0.592 [0.638] and 0.533 [0.567], respectively, Table 1). The remaining analyses for 287-SNP PRSs in this manuscript are presented in terms of the PRS standardised to the SD in the European controls.

Table 1 Mean and standard deviation of 287-SNP and 229-SNP polygenic risk scores.

Table 2 shows the estimated odds ratio (OR) per unit increase of standardised PRSs for overall and subtype-specific breast cancer. For overall breast cancer, the estimated OR per SD was 1.52 (95% confidence interval (CI): 1.49–1.56). For subtype-specific disease, the estimated OR per SD of the subtype-specific PRS was 1.62 (95% CI: 1.57–1.67) for ER-positive and 1.41 (95% CI: 1.36–1.46) for ER-negative disease. There was no evidence of heterogeneity among studies genotyped with either iCOGS or OncoArray (p values for heterogeneity >0.05 [chi-squared test], Fig. 1). For overall breast cancer and ER-positive disease, the ORs per SD of the PRS were slightly higher in OncoArray genotyped studies compared to the studies genotyped with iCOGS. However, the confidence intervals for the array-specific estimates overlapped (Fig. 1). There was no evidence that the effect of PRS was modified by age (p value of interaction ≥ 0.05 [Student’s t test]; Supplementary Table 2). When analyses were stratified by 10-year age groups, the ORs per SD of the PRS by age group were similar (Supplementary Table 3).

Table 2 Association between standardised polygenic risk scores and breast cancer risk.
Fig. 1: Association between standardised 287-SNP polygenic risk scores and breast cancer risk.
figure 1

Panel a shows the results for the iCOGS array by study and panel b shows the results for the OncoArray. The squares represent the odds ratios (ORs) and the horizontal lines represent the corresponding 95% confidence intervals. Overall estimates within genotyping array were obtained by combining the estimates across studies using fixed-effect meta-analysis, represented by the diamond shape. I-squared and p value (two-sided) for heterogeneity were obtained by fitting a random-effects model and using the generalised Q-statistic estimator (the rma() command in R). The sample sizes of individual studies are listed in Supplementary Table 1. The ORs and corresponding 95% confidence intervals are provided as a Source Data file.
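The pooling described in this caption can be illustrated with a minimal Python sketch. It is a simplified inverse-variance fixed-effect combination with Cochran's Q and the Higgins I-squared, not a reimplementation of the rma() command used by the authors; the function names and the two study estimates are hypothetical.

```python
from math import exp, log, sqrt

Z_95 = 1.959964  # two-sided 95% normal quantile


def se_from_ci(lower, upper):
    """Standard error of a log OR recovered from its 95% confidence limits."""
    return (log(upper) - log(lower)) / (2 * Z_95)


def fixed_effect(odds_ratios, standard_errors):
    """Inverse-variance pooled OR, its 95% CI, Cochran's Q and I-squared (%)."""
    betas = [log(o) for o in odds_ratios]
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    ci = (exp(pooled - Z_95 * pooled_se), exp(pooled + Z_95 * pooled_se))
    return exp(pooled), ci, q, i_squared


# Hypothetical per-study estimates (OR with 95% CI), for illustration only.
studies = [(1.50, 1.40, 1.61), (1.55, 1.42, 1.69)]
ors = [s[0] for s in studies]
ses = [se_from_ci(s[1], s[2]) for s in studies]
pooled_or, pooled_ci, q_stat, i2 = fixed_effect(ors, ses)
```

Working on the log OR scale keeps the weights proportional to the precision of each study, which is why the interval limits are converted to standard errors first.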

The associations between the PRSs and breast cancer risk by PRS percentile are shown in Fig. 2 and Supplementary Table 4. Compared to women in the middle quintile (40–60%), the observed ORs of developing overall breast cancer for women in the highest and lowest 1% of the PRS distribution were 2.72 (95% CI: 2.24–3.29) and 0.38 (95% CI: 0.27–0.52), respectively. Women in the highest and lowest 1% of the ER-specific PRSs had 2.84 (95% CI: 2.30–3.49)- and 0.25 (95% CI: 0.16–0.39)-fold risk, respectively, for ER-positive disease, and 2.29 (95% CI: 1.77–2.97)- and 0.57 (95% CI: 0.36–0.90)-fold risk, respectively, for ER-negative disease. The observed ORs by PRS percentile did not differ from those predicted under a theoretical polygenic model in which the log OR depends linearly on the PRS: all predicted ORs fall within the confidence intervals of the observed ORs (Fig. 2; Supplementary Table 4).
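A rough sketch of how predicted ORs by percentile can be obtained under such a log-linear model is given below. It assumes, beyond what the text states, that the PRS is normally distributed in controls with unit SD, in which case the PRS in cases is the same normal distribution shifted by the log OR per SD. The function name and the choice of 1.48 (the overall OR per Asian SD quoted in the Discussion) as input are illustrative assumptions, not the authors' procedure.

```python
from math import log
from statistics import NormalDist


def predicted_or(lower_pct, upper_pct, or_per_sd, reference=(40.0, 60.0)):
    """Predicted OR for a control-percentile band versus a reference band.

    Assumes PRS ~ N(0, 1) in controls and a log OR that is linear in the PRS,
    so that PRS ~ N(log(or_per_sd), 1) in cases.
    """
    beta = log(or_per_sd)
    controls = NormalDist(0.0, 1.0)
    cases = NormalDist(beta, 1.0)

    def case_cdf_at(pct):
        # Fraction of cases lying below the given control percentile.
        if pct <= 0:
            return 0.0
        if pct >= 100:
            return 1.0
        return cases.cdf(controls.inv_cdf(pct / 100.0))

    def case_control_ratio(lo, hi):
        p_cases = case_cdf_at(hi) - case_cdf_at(lo)
        p_controls = (hi - lo) / 100.0
        return p_cases / p_controls

    return case_control_ratio(lower_pct, upper_pct) / case_control_ratio(*reference)


top_1 = predicted_or(99, 100, or_per_sd=1.48)    # highest 1% vs middle quintile
bottom_1 = predicted_or(0, 1, or_per_sd=1.48)    # lowest 1% vs middle quintile
```

Under these assumptions the predicted values for the extreme 1% bands fall inside the observed confidence intervals for overall breast cancer reported above.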

Fig. 2: Association between percentiles of 287-SNP polygenic risk scores (PRS) and breast cancer risk in combined Asian studies.
figure 2

The results for overall breast cancer, oestrogen-receptor (ER)-positive breast cancer and ER-negative breast cancer are shown in Fig. 2a–c, respectively. The squares/dots represent the odds ratios (ORs) and the vertical lines represent the corresponding 95% confidence intervals, with the middle quintile (40–60th) as the reference category. Solid lines represent the observed ORs, black dashed lines represent the predicted ORs of PRSs under a multiplicative polygenic model in the Asian population and the red dashed line represents the predicted OR in the European population. The analysis was conducted using 15,755 cases and 16,483 controls. Of the 15,755 cases, 9989 were ER-positive breast cancer while 4611 were ER-negative breast cancer. Source data are provided in Supplementary Table 5.

Table 3 shows the association between family history of breast cancer and overall/ER-specific breast cancer risk, adjusted and unadjusted for standardised overall/ER-specific PRSs. Family history information was not available for all cases in Seoul Breast Cancer Study (genotyped on OncoArray) or control women in the second batch of Singapore Breast Cancer Cohort, hence both studies were excluded from these analyses. The percentage attenuation in the log ORs for family history after adjusting for PRSs was 10.0% for overall breast cancer (unadjusted family history OR = 1.35, adjusted OR = 1.31), 7.3% for ER-positive breast cancer (unadjusted OR = 1.36, adjusted OR = 1.33) and 13.2% for ER-negative breast cancer (unadjusted OR = 1.2, adjusted OR = 1.18). There was no evidence of interaction between the PRSs and family history (p values ≥ 0.05 [Student’s t test], Supplementary Table 2). Including family history in the model, in addition to the PRS, increased the AUC only slightly (0.616 vs. 0.613 for PRS alone; Table 2).
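The percentage attenuation quoted here is the relative change in the log OR for family history once the PRS is added to the model. A minimal sketch follows; the function name is ours, and because the published ORs are rounded to two decimal places, values recomputed from them can differ slightly from the reported percentages.

```python
from math import log


def percent_attenuation(or_unadjusted, or_adjusted):
    """Relative reduction (%) in the log OR after adjustment."""
    return 100.0 * (log(or_unadjusted) - log(or_adjusted)) / log(or_unadjusted)


# Overall breast cancer: family history OR 1.35 unadjusted, 1.31 adjusted for PRS.
overall = percent_attenuation(1.35, 1.31)
# ER-positive disease: 1.36 unadjusted, 1.33 adjusted.
er_positive = percent_attenuation(1.36, 1.33)
```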

Table 3 First-degree family history of breast cancer and breast cancer risk in Asian studies.

PRS and breast cancer risk in Chinese, Malays and Indians

Analyses by ethnic subgroup were limited to 6900 invasive cases and 7866 controls participating in the MyBrCa and SGBCC studies (Supplementary Table 1). Malaysia and Singapore are ethnically diverse, with the majority of individuals identifying as Chinese, Malay or Indian. Principal component analysis showed that these ethnic groups can be distinguished based on genetic data; however, the distribution of the first two principal components for each ethnic group was similar between the two countries (Supplementary Fig. 1). Hence for the purposes of this analysis, women belonging to the same ethnic group from the two countries were analysed together.

Table 4 summarises the characteristics of the study participants by self-reported ethnicity. The majority of the participants were Chinese (72%), while 17% and 11% were Malay and Indian, respectively. The mean PRS was markedly higher in Chinese and Malay women compared to European women, with the mean being highest for Chinese women. The mean for Indian women was intermediate between those for Chinese and Malay women and those for European women (Tables 1 and 4). The PRS SDs of Malay and Indian controls were similar to that of women of European ancestry, while the SDs of Chinese controls were slightly lower.

Table 4 Characteristics of women in Malaysian Breast Cancer Genetic Study and Singapore Breast Cancer Cohort.

The breast cancer OR per SD of the 287-SNP PRSs and the discriminatory accuracy, measured by area under the receiver operating characteristic curve (AUC), were similar across the three ethnic groups (heterogeneity p values > 0.05 [chi-squared test]; AUCs for overall breast cancer were 0.60–0.62, for ER-positive disease were 0.62–0.63 and for ER-negative disease were 0.57–0.60; Fig. 3). OR estimates by percentiles for overall breast cancer risk, compared to the middle quintile, are shown in Fig. 4 and Supplementary Table 5. The OR estimates were similar across ethnicities, except for the highest 10% of the PRS distribution, where Chinese women had a higher OR (2.19, 95% CI: 1.91–2.52) compared to Malay women (1.79, 95% CI: 1.35–2.37) and Indian women (1.57, 95% CI: 1.09–2.26). However, the confidence intervals of the ethnicity-specific estimates overlapped (Fig. 4).

Fig. 3: Association between standardised PRSs and breast cancer risk in Chinese, Malay and Indian women from Malaysia and Singapore.
figure 3

Odds ratios (ORs) and AUCs were generated using data from the Malaysian Breast Cancer Genetic Study (MyBrCa) and Singapore Breast Cancer Cohort (SGBCC), stratified by ethnicity. The squares represent the ORs, the horizontal lines represent the corresponding 95% confidence intervals and the diamond shapes represent the overall estimates. I-squared and p value (two-sided) for heterogeneity were obtained by fitting a random-effects model and using the generalised Q-statistic estimator (the rma() command in R). The numbers of cases and controls for each ethnicity by breast cancer subtype are tabulated in Table 4. The sample sizes, ORs and corresponding 95% confidence intervals are also provided in the Source Data file.

Fig. 4: Association between percentiles of 287-SNP polygenic risk scores and overall breast cancer risk in Chinese, Malay and Indian women from Malaysia and Singapore.
figure 4

Results were generated using 5236/5516 Chinese cases/controls, 1084/1332 Malay cases/controls and 580/1018 Indian cases/controls from the Malaysian Breast Cancer Genetic Study (MyBrCa) and Singapore Breast Cancer Cohort (SGBCC), stratified by ethnicity. The squares represent the odds ratios (ORs) and the vertical lines represent the corresponding 95% confidence intervals, with the middle quintile (40–60th) as the reference category. Solid lines represent the observed ORs and dashed lines represent the predicted ORs of PRS under a multiplicative polygenic model. Source data are provided in Supplementary Table 6.

PRS and breast cancer risk in Asian Americans

The 287-SNP PRS was also evaluated using data from 2719 women of Asian ancestry recruited into three studies from North America (Supplementary Table 1). The means for all PRS were very similar to those in the Asian studies, and markedly higher than those in Europeans. The SDs in controls for all PRSs were similar to those in the Asian studies and somewhat lower than the observed SDs in European controls (Table 1).

Compared to the breast cancer OR per SD in the Asian studies from Asia, the OR per SD of the 287-SNP PRS in the North American studies was smaller (p < 0.05) for overall breast cancer (1.36, 95% CI: 1.25–1.49) and ER-positive breast cancer (1.38, 95% CI: 1.25–1.53), but did not differ significantly for ER-negative breast cancer (1.49, 95% CI: 1.26–1.76, an interval that contains the estimate of 1.41 from the studies in Asia; Table 2). Of the three studies included in these analyses, only the Los Angeles County Asian–American Breast Cancer (LAABC) case–control study showed a significant association with breast cancer risk for all three PRSs, while the Canadian Breast Cancer (CBC) study showed non-significant associations across all PRSs (Fig. 1). However, the heterogeneity in the estimates among studies was not significant.

Prospective evaluation for PRS

We further evaluated the PRS in the prospective Singapore Chinese Health Study (SCHS), using data on 10,255 women, of whom 413 had developed breast cancer (Supplementary Table 1). The mean and SD of the 229-SNP PRS in the prospective study were similar to the mean and SD of 229-SNP PRS in the BCAC Asian studies (Table 1). The estimated hazard ratio (HR) for overall breast cancer, per European-SD of the 229-SNP PRS, was 1.49 (95% CI: 1.33–1.67) and the AUC was 0.610 (Table 2). The estimates were similar to those for the 229-SNP PRS in Asian studies (Asian studies from Asia: 1.49 (1.45–1.52); from North American studies: 1.33 (1.22–1.45)) but slightly lower than those in the European studies (1.59 (1.55–1.64)).

Absolute risk of developing breast cancer by PRS percentiles

Absolute lifetime and 10-year breast cancer risks by 287-SNP PRS percentile were derived by combining the estimated overall breast cancer ORs from BCAC Asian studies (Supplementary Table 4) and the breast cancer incidence and mortality rates for Chinese, Malay and Indian women in Singapore23,24 (Table 5; Supplementary Fig. 2). The risks of developing breast cancer by age 80 for women in the lowest and highest 1% of the PRS distribution were ~2% and ~13–16%, respectively, depending on ethnicity. For women between the 90th and 99th percentiles of the risk distribution, the lifetime risks varied from 9 to 13%. Assuming that a 10-year absolute risk threshold of 2.3% (approximately the 10-year risk from age 50 in women of European descent25) is used to define women at sufficient risk to justify screening, Chinese and Malay women in the highest 1% of the PRS distribution would reach this threshold by age 35, while Indian women in the highest 1% would reach the threshold at age 39 years.
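The general shape of such an absolute risk calculation can be sketched as follows. This is a simplified illustration rather than the authors' procedure: it assumes one-year age bands, treats death from other causes as a competing risk, and takes the relative risk for a PRS category as already expressed relative to the population-average incidence. The function name and the flat rates in the example are hypothetical placeholders, not the Singapore registry data.

```python
from math import exp


def absolute_risk(start_age, end_age, incidence, mortality, relative_risk):
    """Probability of developing breast cancer between start_age and end_age.

    incidence and mortality map each age (in years) to an annual rate;
    mortality is the rate of death from other causes (competing risk).
    """
    surviving = 1.0  # probability of being alive and cancer-free so far
    risk = 0.0
    for age in range(start_age, end_age):
        cancer_rate = relative_risk * incidence[age]
        total_rate = cancer_rate + mortality[age]
        if total_rate > 0:
            any_event = 1.0 - exp(-total_rate)
            # Share of events in this year that are breast cancer diagnoses.
            risk += surviving * any_event * cancer_rate / total_rate
        surviving *= exp(-total_rate)
    return risk


# Hypothetical flat rates, for illustration only.
incidence = {age: 0.001 for age in range(30, 80)}
mortality = {age: 0.0 for age in range(30, 80)}
ten_year_average = absolute_risk(40, 50, incidence, mortality, relative_risk=1.0)
ten_year_high = absolute_risk(40, 50, incidence, mortality, relative_risk=2.7)
```

Comparing the 10-year value at each starting age with a threshold such as 2.3% gives the age at which a PRS category would first qualify for screening under the stated assumptions.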

Table 5 Absolute risk of developing overall breast cancer by percentiles.

We also determined the proportion of women in the general population who would have a 10-year absolute risk above the risk threshold (2.3%) at some point in their life. The maximum 10-year absolute risk for Chinese women in the highest 25%, Malay women in the highest 16% and Indian women in the highest 17% of the PRS distribution was greater than 2.3%. Offering screening to these women would capture ~40%, ~27% and ~28% of all breast cancer cases in the Chinese, Malay and Indian populations, respectively (Supplementary Fig. 3).

Comparison with other PRSs

We compared the predictive performance of the 287-SNP PRS for overall breast cancer with five PRSs15,17,19,26,27, which were previously developed or evaluated using data from Asian populations. Of these five PRSs, one was developed using iCOGS genotyped studies in BCAC and 744 samples from the MyBrCa study15. To avoid the potential for overfitting and to enable direct comparison between PRSs, we limited the analyses to OncoArray genotyped studies only (excluding the 744 samples from the MyBrCa study). We also recalculated the 287-SNP PRS using the same samples. The list of SNPs and corresponding weights as reported in the literature are given in Supplementary Table 6. The ORs per SD of the five Asian PRSs were between 1.10 and 1.41 and the corresponding AUCs were between 0.533 and 0.586, substantially lower than those for the European-ancestry based 287-SNP PRS (Table 6).

Table 6 Association between Asian-specific PRSs and overall breast cancer risk.

Discussion

To date, the utility of incorporating common genetic variants into breast cancer risk prediction models has predominantly been investigated in women of European descent. Previous efforts in Asian studies thus far have focused on the development of Asian-specific PRS, and have been limited by small sample size. Given the difficulties of defining population-specific PRS, a more practical question is whether the PRS developed using data from women of European ancestry is predictive of risk for women of Asian ancestry. In this study, using the largest available data of Asian women, we independently evaluated the predictive performance of PRS developed based on 287 variants.

Our study showed that the European-ancestry PRS was predictive of overall breast cancer risk for Asians. The magnitudes of association were generally consistent across the ten participating case–control Asian studies and the prospective Singaporean Chinese study. The association was also consistent across the three ethnic groups in Malaysia and Singapore, suggesting that the PRS is associated with similar relative risk estimates in all three ethnicities, though the confidence intervals for Malays and Indians are wide.

The estimated effect size and AUC of both the 287-SNP PRS and 229-SNP PRS were slightly lower than those observed in women of European ancestry. We evaluated the individual association of the 287 SNPs with overall breast cancer risk in Chinese, Malays and Indians separately and compared with the effect sizes in women of European ancestry (Supplementary Data 1). The intraclass correlation coefficients (ICC), taking into account standard errors of estimates, were estimated to be >0.7 for all ethnicities. These results indicate that the susceptibility variants in both populations are largely similar and confer similar relative risks; the lower effect size and AUC may arise from different patterns of linkage disequilibrium. Notably, our analyses showed that the Asian-specific PRS which included only five Asian-specific SNPs27 achieved an AUC of 0.562 (Table 6), suggesting that the development of more accurate PRSs in the Asian population is possible when larger cohorts of Asians become available to identify population-specific SNPs.

The mean for the 287-SNP PRS was markedly higher in Asian populations than European populations, but the SD was slightly lower in Asians than Europeans. The lower variation (SD) may reflect the different allele frequency distributions: of the 287 SNPs that are common in women of European ancestry (minor allele frequency > 0.05), 43 are rare in Asian women and therefore contribute minimally to the PRS. In this paper, we have standardised the PRS to the European SD to enable comparison of the performance of the PRS in European and Asian populations. A more relevant approach is to standardise the PRS to the Asian SD, in which case the overall breast cancer OR per unit increase in PRS would be decreased to 1.48 (95% CI: 1.44–1.52). Taken together, these results highlight the need to calibrate the PRS distribution to enable risk models developed based on one population (e.g. Europeans) to be used in another population (e.g. Asians).
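The effect of changing the standardisation can be checked with a short calculation: because the log OR is linear in the PRS, re-expressing an OR per European SD as an OR per Asian SD only requires scaling the log OR by the ratio of the two SDs. The sketch below uses the control SDs for the overall PRS given in the Results (0.597 in Europeans, 0.556 in Asians); the function name is ours.

```python
from math import exp, log


def rescale_or_per_sd(or_per_sd, sd_from, sd_to):
    """Re-express an OR per SD on the scale of a different reference SD."""
    return exp(log(or_per_sd) * sd_to / sd_from)


# OR of 1.52 per European SD, converted to an OR per Asian SD.
or_per_asian_sd = rescale_or_per_sd(1.52, sd_from=0.597, sd_to=0.556)
```

The result is close to the value of 1.48 quoted above; small differences are expected because the inputs are rounded.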

The 287-SNP PRS had a lower predictive performance for overall breast cancer among Asians from the three North American studies, compared to the Asian or European studies (Table 2). This somewhat surprising observation might be due to chance, or might reflect a greater admixture with non-Asian ancestry populations, or a greater variation in the distribution of lifestyle factors26 leading to a greater variation in risk of breast cancer. Larger studies of Asian women in non-Asian countries are needed to provide more reliable estimates.

For subtype analyses using ER-specific PRS, we observed greater discrimination for ER-positive than ER-negative disease. This difference was also seen in European studies, and reflects the fact that the majority of risk SNPs are more strongly associated with ER-positive than ER-negative disease.

The majority of breast cancer studies have been conducted in populations of European descent and, as a result, the screening guidelines for Asian women are often based on those developed in Europe or North America28,29. In high-income countries where most women are of European descent, a personalised screening strategy based on age and PRS rather than age alone could reduce the number of people eligible for screening30, thus potentially reducing overdiagnosis, overtreatment and false-positive diagnoses, which could lead to anxiety and stress in women who have gone for screening14. In the Asian context, however, a more cogent argument for stratified screening is to target limited screening resources on those women most likely to benefit. Based on the OR estimated in our analyses, and assuming that a 10-year absolute risk threshold of 2.3% is an appropriate threshold for screening, the majority of Asian women living in the Asian country with the highest population risk of breast cancer (Singapore) would never reach this threshold (Table 5; Supplementary Fig. 2). Notably, only ~25% of Chinese women, ~16% of Malay women and ~17% of Indian women would reach this threshold at any point in their lives. It is important to note, however, that Asians will experience a substantial increase in breast cancer incidence over the next decade, and it will therefore be necessary to revisit the screening recommendations over time. To explore this, we simulated the 10-year absolute breast cancer risk of Chinese women using Australian breast cancer incidence31, which is about twice that in Singapore (Supplementary Fig. 4). Assuming the breast cancer ORs associated with the PRS remain similar to those estimated here, those who are in the 60–80th percentile of the risk distribution, which would be classified as a low-risk group for screening based on current incidence, would reach the risk threshold for screening at age 45 based on the increased incidence. If the incidence rate reaches that of Western European countries, a similar proportion of women (~20%) would not meet the screening threshold at any age7.

Our study has some limitations worth noting. Although we used the largest dataset of Asian women available to date to evaluate the performance of PRS, the sample size was still too limited to provide precise relative risk estimates for the extremes of the PRS distribution, particularly for ER-specific disease. The majority of the data in the BCAC dataset were generated with the OncoArray; however, ~27% of samples were genotyped using the iCOGS array, which has lower genome coverage. Of the 287 SNPs, 42 have an imputation score between 0.75 and 0.9, while 53 have an imputation score below 0.75 in the iCOGS dataset. This may explain in part the evidence for some heterogeneity in effect sizes between the iCOGS and OncoArray datasets. The attenuation (10%) in the effect size of family history of breast cancer on breast cancer risk after adjusting for the 287-SNP PRS is consistent with the predicted contribution of the 287 SNPs to the twofold familial risk of breast cancer (~11%, based on an overall OR per Asian SD of 1.488). It is important to note, however, that the estimated association of family history with breast cancer risk (OR = 1.35) is lower than in other studies (OR = 1.8–3.9 in European studies32,33,34 and OR = 1.52–2.1 in Asian studies16,26). This might be due to inaccuracies in the family history data. The control women in the largest study (MyBrCa) contributing to these analyses, which accounted for ~30% of the total data, were recruited through opportunistic screening, which may be enriched for family history relative to the cases. In addition, there was evidence of heterogeneity (I² = 66.1%, p value < 0.0001 [chi-squared test]) in the effect sizes of association between family history and breast cancer risk across Asian studies.
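The ~11% figure can be reproduced with a standard approximation, which we state here as an assumption because the formula is not given in the text. If the PRS is normally distributed with a log-linear effect on risk, and σ denotes the log OR per SD, the familial relative risk to first-degree relatives attributable to the PRS is approximately exp(σ²/2). The fraction of a twofold familial relative risk explained on the log scale is then:

$$\frac{\ln \lambda_{\mathrm{PRS}}}{\ln \lambda_{\mathrm{total}}} = \frac{\sigma^2/2}{\ln 2} = \frac{(\ln 1.488)^2/2}{\ln 2} \approx \frac{0.158/2}{0.693} \approx 0.11$$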

In summary, we have shown that a PRS based on common breast cancer susceptibility variants identified in women of European ancestry is a strong predictor of breast cancer risk in Asian women. Furthermore, even though Asians are genetically diverse, our study shows that the PRS derived from women of European ancestry works equivalently well across the diverse ethnic groups in Asia. In the meantime, the PRS developed using data from large European-ancestry studies (provided that it is recalibrated to the Asian population being tested) may be used as the basis for Asian-specific breast cancer risk prediction models that include the PRS as well as other predictors of breast cancer risk. These models will allow for higher levels of risk stratification to be achieved, as recently demonstrated in women of European ancestry10. Such risk assessment tools could help in resource planning, especially in low- and middle-income countries where resources are limited and population-based screening is unavailable, to improve the efficiency of personalised screening.

Methods

Study populations

The study participants were 45,212 women of Asian ancestry from three sources: (a) 32,238 women (15,755 invasive cases and 16,483 controls) participating in 10 Asian studies in the Breast Cancer Association Consortium (BCAC); (b) 2719 women (1507 invasive cases and 1212 controls) of Asian ancestry participating in three North American population-based case–control studies in BCAC; and (c) 10,255 women of Chinese ethnicity participating in the Singapore Chinese Health Study (SCHS32,33). SCHS is a population-based prospective cohort study. Of the total of 10,255 women aged 43–75 years who had not had any cancer diagnosis prior to recruitment, 413 registry-confirmed breast cancers developed over 195,317.2 person years of prospective follow-up. Follow-up started 6 months after recruitment and was censored at age of breast cancer diagnosis, age at last known non-breast cancer status, or age on 31 December 2015, whichever came first. Supplementary Table 1 shows the study design and number of breast cancer cases and controls for individual studies. Comparative results for European women were obtained from (a) 4926 cases and 4979 controls from 26 population-based case–control studies participating in BCAC and included in the validation analysis in Mavaddat et al.7 and (b) ten nested case–control studies within prospective cohorts in BCAC, comprising 11,225 cases and 17,788 controls, included in the test dataset in Mavaddat et al.7, but excluding subjects >80 years old and those for whom age was unknown. All studies were approved by the relevant institutional ethics committees and review boards, and all participants provided written informed consent.

Genotyping methods

All samples in BCAC studies were genotyped using one of two arrays: the ~211,155-SNP iCOGS array and the ~533,000-SNP OncoArray34. Genotype calling, quality control procedures and imputation have been described previously20,21. Briefly, samples found to be genotypically not female, discordant or cryptic duplicate pairs, and samples with assay call rate <95% and extreme heterozygosity (<5% or >40%, 4.89 SD from the mean for the ethnicity), were excluded. For first-degree relative pairs, the control was removed from the case–control pairs; otherwise the sample with the lower call rate was excluded. SNPs with assay call rate <95% and deviation from Hardy–Weinberg equilibrium at p < 10⁻⁷ in controls or p < 10⁻¹² in cases were excluded. The iCOGS and OncoArray datasets were imputed separately using a two-stage imputation approach, using SHAPEIT235 for phasing and IMPUTE236 for imputation, with 1000 Genomes Project (Phase 3) data as the reference panel37.

Samples in the prospective study (SCHS) were genotyped using Illumina Global Screening Array. Samples with call rate < 95% and extremes in heterozygosity were excluded. For first- and second-degree relative pairs, the sample with the lower call rate was excluded. Data were imputed using IMPUTE2 with 1000 Genomes Project (Phase 3) as reference panel. Only non-monomorphic SNPs in East Asian population in the reference panel were imputed.

Post-imputation quality was based on the imputation accuracy score INFOSCORE as provided by IMPUTE236. This metric takes values between 0 and 1, with higher values indicating higher imputation certainty and 1 implying perfect imputation.

Principal components analyses were used to identify ethnic outliers and define ancestry informative covariates. For the BCAC data, continental ancestry was derived by combining the data with the 1000 Genomes Project reference data34. Individuals with >40% estimated East Asian ancestry were retained. In the second stage, principal components were generated on the Asian ancestry individuals using a subset of uncorrelated SNPs. Similar ancestry informative principal components were generated on the SCHS dataset.

Statistical methods

The analyses were based on the 313-SNP PRS developed in women of European ancestry7. To ensure accurate determination of PRSs in the ethnic-specific analyses, SNPs with an imputation accuracy score <0.9 in the combined MyBrCa and SGBCC data were excluded.

We derived PRS for overall breast cancer using Eq. (1)

$${\mathrm{PRS}}_{{\mathrm{overall}}} = \beta _1x_1 + \beta _2x_2 + \cdots + \beta _kx_k + \cdots + \beta _nx_n,$$
(1)

where \(x_k\) is the dosage of the risk allele (0–2) for SNP k and \(\beta_k\) is the corresponding weight. To avoid bias due to overfitting, we used the weights previously derived for women of European ancestry7. The ER-specific PRSs (denoted as PRSER+ for ER-positive PRS and PRSER− for ER-negative PRS) used the same set of SNPs but with weights from the hybrid method as reported in Mavaddat et al.7; the hybrid method assigns subtype-specific weights to a subset of SNPs for which the effect sizes differ significantly by subtype. The list of SNPs and the corresponding weights are provided in Supplementary Data 1. To enable direct comparison of the performance of each PRS with those reported in European women, we standardised the PRSs by dividing the PRS of each individual by the standard deviation (SD) of the PRSs in the control subjects from the population-based case–control series in European studies.
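
As a minimal sketch of Eq. (1) and the standardisation step, the Python fragment below computes a raw PRS from allele dosages and divides it by a reference SD. The function names, weights, dosages and reference SD are illustrative assumptions; they are not the values in Supplementary Data 1 or the SD used in the study.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Raw PRS as in Eq. (1): sum over SNPs of weight * risk-allele dosage."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    for x in dosages:
        if not 0.0 <= x <= 2.0:
            raise ValueError("each dosage must lie between 0 and 2")
    return sum(b * x for b, x in zip(weights, dosages))


def standardise(prs: float, reference_sd: float) -> float:
    """Divide a raw PRS by the SD of the PRS in the reference controls."""
    if reference_sd <= 0:
        raise ValueError("reference_sd must be positive")
    return prs / reference_sd


# Illustrative values only (assumptions, not taken from the study).
weights = [0.05, -0.02, 0.10]  # per-allele log ORs
dosages = [1.0, 2.0, 0.35]     # imputed dosages may be fractional
raw = polygenic_risk_score(dosages, weights)
z = standardise(raw, reference_sd=0.5)
```

Note that, as described in the text, the score is scaled by the reference SD only; no mean-centring is applied in this sketch.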

Logistic regression models were used to estimate ORs for the association between the standardised PRSs and breast cancer risk. The overall breast cancer PRS was used as the predictor in association analyses between overall breast cancer and PRS, while for subtype-specific analyses the ER-specific PRSs were used as predictors. The PRS was treated as either a continuous or a categorical predictor in the model. When used as a categorical variable, the PRS was categorised into the following percentile ranges based on the PRS distribution in controls: 0–1%, 1–5%, 5–10%, 10–20%, 20–40%, 40–60%, 60–80%, 80–90%, 90–95%, 95–99% and 99–100%. The 40–60% category was used as the reference. For ethnic-specific analyses, analyses were stratified by ethnicity (Chinese, Malay and Indian) using only the MyBrCa and SGBCC datasets. All models were adjusted for the first ten principal components and study/array/batch; here samples from the same study that were genotyped in two batches (as was the case for MyBrCa and SGBCC) or on both arrays were treated as different strata for the purposes of adjustment. A Cox proportional hazards model was used to evaluate the association of the PRS with overall breast cancer risk in the prospective cohort, and HRs per SD of the PRS were estimated.
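
The percentile categorisation described above can be sketched as follows. This is an illustration under stated assumptions: the cut points are taken from the control distribution by linear interpolation between order statistics (the text does not specify the quantile method), values equal to a cut point are placed in the upper category, and all names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence

# Percentile boundaries and labels as listed in the text.
CUT_PERCENTILES = [1, 5, 10, 20, 40, 60, 80, 90, 95, 99]
LABELS = ["0-1%", "1-5%", "5-10%", "10-20%", "20-40%", "40-60%",
          "60-80%", "80-90%", "90-95%", "95-99%", "99-100%"]
REFERENCE = "40-60%"  # reference category for the ORs


def control_cut_points(control_prs: Sequence[float]) -> List[float]:
    """PRS values at each cut percentile of the control distribution.

    Assumption: linear interpolation between adjacent order statistics.
    """
    s = sorted(control_prs)
    n = len(s)
    cuts = []
    for p in CUT_PERCENTILES:
        pos = (n - 1) * p / 100.0
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        cuts.append(s[lo] + (s[hi] - s[lo]) * (pos - lo))
    return cuts


def prs_category(prs: float, cuts: Sequence[float]) -> str:
    """Label of the percentile range that a (case or control) PRS falls into."""
    return LABELS[bisect_right(cuts, prs)]


# Toy control distribution (assumption): evenly spaced scores from 0 to 1.
controls = [i / 1000 for i in range(1001)]
cuts = control_cut_points(controls)
category = prs_category(0.5, cuts)
```

Because the cut points come from controls only, cases with a high PRS are over-represented in the upper categories, which is what the categorical ORs relative to the 40–60% reference capture.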

The discriminatory accuracy of models for predicting breast cancer risk was evaluated using the area under the receiver operating characteristic curve (AUC), adjusted by study. Estimated ORs by PRS quantiles were compared with the predicted ORs under the model in which the PRS is considered as a continuous covariate and the log(OR) is linearly related to the PRS. To determine the proportion of the familial breast cancer risk that could be explained by the PRS, we estimated the OR for the association between first-degree family history and breast cancer risk, first adjusted for the first 10 principal components and study/array/batch, and then additionally adjusted for the PRS.

To evaluate the effect modification of the PRS (as a continuous covariate) by age and family history of breast cancer in first-degree relatives, we included additional interaction terms in the logistic regression model.

The predicted proportion of the familial relative risk of breast cancer explained by the PRS was estimated by noting that the familial relative risk to first-degree relatives of affected individuals due to the PRS alone is estimated to be \(\lambda _{\mathrm{P}} = {\mathrm{exp}}(\frac{{\gamma ^2}}{2})\), where \(\gamma\) is the log OR per one SD of the PRS (equivalent to the SD of the polygenic risk distribution on the log scale)38. The proportion of the familial relative risk (on a log scale) due to the PRS was therefore estimated by using Eq. (2):

$$\frac{{{\mathrm{ln}}(\lambda _{\mathrm{P}})}}{{{\mathrm{ln}}\left( \lambda \right)}} = \frac{{\hat \gamma ^2}}{{2\,{\mathrm{ln}}\left( \lambda \right)}},$$
(2)

where \(\lambda\) is the familial relative risk of breast cancer in first-degree relatives, assumed to be 2 for breast cancer.
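
A short worked sketch of Eq. (2) in Python follows. It assumes that \(\gamma\) is on the natural-log scale, i.e. the log of the OR per SD, since \(\mathrm{exp}(\gamma^2/2)\) evaluated with \(\gamma\) on the OR scale would exceed \(\lambda = 2\). The function name is hypothetical; the OR of 1.48 is the per-SD estimate quoted later in this section for Asian studies.

```python
import math


def proportion_frr_explained(or_per_sd: float, familial_rr: float = 2.0) -> float:
    """Eq. (2): ln(lambda_P) / ln(lambda), with lambda_P = exp(gamma^2 / 2)."""
    if or_per_sd <= 0 or familial_rr <= 1:
        raise ValueError("or_per_sd must be > 0 and familial_rr must be > 1")
    gamma = math.log(or_per_sd)       # assumed: log OR per SD of the PRS
    log_lambda_p = gamma ** 2 / 2.0   # ln of the PRS-attributable familial RR
    return log_lambda_p / math.log(familial_rr)


# OR per SD of 1.48 with an assumed familial relative risk of 2.
proportion = proportion_frr_explained(1.48)
```

Under these assumptions the proportion comes out at roughly 11%, and it grows with the square of the log OR, so modest gains in the per-SD effect translate into noticeably more familial risk explained.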

To compare the effect sizes of individual SNPs on breast cancer risk with those reported in women of European ancestry, we estimated the effect size of the association between each SNP and breast cancer risk in Chinese, Malays and Indians in the MyBrCa and SGBCC studies separately using logistic regression, adjusting for age, study and the first 10 principal components, and assuming a log-additive genetic model. Intra-class correlation (ICC) was then used to compare the estimated effect sizes with those reported in Mavaddat et al. (2019)7. To take into account the sampling error of the effect sizes in the ICC estimate, we fitted a hierarchical model of the form given by Eq. (3):

$$y_{{ij}} = \beta _{{ij}} + \delta _{{ij}},$$
(3)

where \(y_{{ij}}\) denotes the parameter estimate of SNP i in population j, \(\beta _{{ij}}\) are the true parameter values and \(\delta _{{ij}}\sim {N}(0,\sigma _{{ij}}^2)\) are the sampling errors, with known SDs \(\sigma _{{ij}}\). The model was fitted by using the expectation–maximisation (EM) algorithm39, in which \(\beta _{{ij}}\) were estimated at iteration k using a weighted mean of the observed estimates \(y_{{ij}}\) and the group mean \(\alpha _{i}^{\left( {k} \right)}\), as given in Eq. (4)

$$\hat \beta _{{ij}}^{({k})} = \frac{{\frac{{\alpha _{i}^{\left( {k} \right)}}}{{\sigma _{R}^2}} + \frac{{y_{{ij}}}}{{\sigma _{{ij}}^2}}}}{{\frac{1}{{\sigma _{R}^2}} + \frac{1}{{\sigma _{{ij}}^2}}}},$$
(4)

in the E-step, and the estimated \(\beta _{{ij}}\) were treated as complete data in the M-step to estimate \(\alpha _{i}^{({k} + 1)}\) and \(\sigma _{R}^2\), the within-group variance. This process was iterated until the estimated ICC converged.
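
The iteration in Eqs. (3) and (4) can be sketched in Python as below. This is not the authors' implementation: it assumes \(\beta_{ij} \sim N(\alpha_i, \sigma_R^2)\), adds the posterior variance of each \(\beta_{ij}\) in the M-step (the standard EM update for a normal random-effects model, a detail the text does not spell out), and defines the ICC as the between-SNP variance of the group means divided by the sum of between-SNP and within-SNP variances. All names and the toy inputs are hypothetical.

```python
from statistics import fmean, pvariance
from typing import List, Sequence, Tuple


def em_icc(
    y: Sequence[Sequence[float]],   # y[i][j]: estimate for SNP i in population j
    se: Sequence[Sequence[float]],  # se[i][j]: known SD of the sampling error
    tol: float = 1e-8,
    max_iter: int = 10_000,
) -> Tuple[float, List[float], float]:
    """Return (ICC, group means alpha_i, within-group variance sigma_R^2)."""
    floor = 1e-12  # keeps sigma_R^2 strictly positive
    alpha = [fmean(row) for row in y]
    sigma_r2 = max(
        fmean([(yij - a) ** 2 for row, a in zip(y, alpha) for yij in row]), floor
    )
    icc, icc_prev = 0.0, float("inf")
    for _ in range(max_iter):
        # E-step, Eq. (4): precision-weighted mean of alpha_i and y_ij.
        beta_rows, var_rows = [], []
        for row, se_row, a in zip(y, se, alpha):
            betas, variances = [], []
            for yij, s in zip(row, se_row):
                precision = 1.0 / sigma_r2 + 1.0 / s ** 2
                betas.append((a / sigma_r2 + yij / s ** 2) / precision)
                variances.append(1.0 / precision)  # posterior variance of beta_ij
            beta_rows.append(betas)
            var_rows.append(variances)
        # M-step: update group means and within-group variance.
        alpha = [fmean(betas) for betas in beta_rows]
        sigma_r2 = max(
            fmean([
                (b - a) ** 2 + v
                for betas, variances, a in zip(beta_rows, var_rows, alpha)
                for b, v in zip(betas, variances)
            ]),
            floor,
        )
        between = pvariance(alpha)  # between-SNP variance of the group means
        icc = between / (between + sigma_r2)
        if abs(icc - icc_prev) < tol:
            break
        icc_prev = icc
    return icc, alpha, sigma_r2


# Toy log ORs for three SNPs in two populations (assumed values).
concordant = [[0.1, 0.1], [-0.2, -0.2], [0.3, 0.3]]
small_se = [[0.01, 0.01]] * 3
icc_high, _, _ = em_icc(concordant, small_se)
```

When the per-SNP estimates agree across populations the ICC approaches 1; when they disagree as much within a SNP as between SNPs it approaches 0.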

The age-specific absolute risks of developing breast cancer, adjusting for competing mortality, in each PRS percentile were calculated using Eq. (5)

$${\mathrm{AR}}_{g}\left( t \right) = \mathop {\sum }\limits_{u = 0}^t \lambda _{g}\left( u \right) \cdot {S}_{g}(u) \cdot {S}_{m}(u),$$
(5)

where \(\lambda _{g}(u)\) is the breast cancer incidence for PRS category g at age u, \(S_{g}(u)\) is the probability of being breast cancer free at age u, and \(S_{m}(u)\) is the probability of not dying from a cause other than breast cancer by age u. The PRS-specific breast cancer incidences, \(\lambda _{g}(u)\), were calculated iteratively by assuming that the average age-specific breast cancer incidence over all PRS percentiles agreed with the population breast cancer incidence6. We calculated lifetime and 10-year absolute risks using Singaporean mortality and breast cancer incidence for Chinese, Malays and Indians23,24. The recommended screening age of 50 years in many Asian countries is based on European or North American guidelines29, and the average 10-year risk of breast cancer for women of European ancestry at age 50 years is 2.3%25. Hence, we determined the proportion of women in the general population who would have a 10-year risk of breast cancer above this threshold, using the method described in Pharoah et al.38. To do this, the maximum 10-year absolute risk, adjusting for competing mortality, for women aged 20–70 was calculated for each PRS centile category (0–0.1%, …, 99.9–100%), assuming an OR per 1 SD of the PRS of 1.48 (the estimated effect size in Asian studies).
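
The sum in Eq. (5) can be illustrated with the Python sketch below. It makes several simplifying assumptions that are not taken from the study: annual rates indexed by age in years, survival functions of the form exp(−cumulative rate), and risk conditional on being alive and free of breast cancer at the starting age (so a 10-year risk at age 50 sums over ages 50 to 59). The function name and the toy rates are hypothetical, and the iterative calibration of \(\lambda_g(u)\) to population incidence is not reproduced.

```python
import math
from typing import Sequence

THRESHOLD_10Y = 0.023  # 2.3% average 10-year risk at age 50 quoted in the text


def absolute_risk(
    incidence_g: Sequence[float],      # lambda_g(u): annual incidence at age u
    other_mortality: Sequence[float],  # annual death rate from other causes at age u
    start_age: int,
    end_age: int,
) -> float:
    """Sum of lambda_g(u) * S_g(u) * S_m(u) over ages start_age..end_age-1."""
    cum_incidence = 0.0
    cum_mortality = 0.0
    risk = 0.0
    for u in range(start_age, end_age):
        s_g = math.exp(-cum_incidence)  # still breast cancer free at age u
        s_m = math.exp(-cum_mortality)  # not dead from another cause by age u
        risk += incidence_g[u] * s_g * s_m
        cum_incidence += incidence_g[u]
        cum_mortality += other_mortality[u]
    return risk


# Toy rates (assumptions): constant incidence and competing mortality.
incidence = [0.002] * 100
mortality = [0.005] * 100
ten_year = absolute_risk(incidence, mortality, 50, 60)
exceeds_threshold = ten_year > THRESHOLD_10Y
```

Evaluating such a function at each starting age from 20 to 70 and taking the maximum gives the quantity that is compared against the 2.3% threshold for each PRS centile category.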

We compared the predictive performance of the European ancestry-based PRS with PRSs that were previously developed or evaluated in Asian populations. The five Asian ancestry-derived PRSs included 5 SNPs15, 51 SNPs17, 44 SNPs19, 6 SNPs26 and 46 SNPs27. The PRSs were derived using Eq. (1) and the corresponding weights reported in the literature. The list of SNPs and corresponding weights are tabulated in Supplementary Table 6.

All statistical analyses were conducted using R v.3.0.3 or Stata v.14.2. Logistic regression and AUC estimation were performed using logistic and comproc in Stata; the Cox proportional hazards model was fitted using coxph() in R.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.