Introduction

Breast cancer is the most common malignant form of cancer among occidental women, with hereditary cases representing approximately 5–10%.1 Mutations in high penetrance BRCA1 and BRCA2 genes account for only 25% of families with hereditary breast cancer2 whereas other lower penetrance genes such as ATM, CHEK2, TP53, PTEN, STK11, CDH1, PALB2/FANCN and BRIP1/FANCJ3 explain a much smaller proportion of families with hereditary breast cancer. Deleterious mutations in BRCA1/BRCA2 were identified in 24% of our 256 high-risk French Canadian breast/ovarian cancer families,4 supporting that a significant proportion of high-risk breast cancer families still remain unexplained by either germline mutations in high penetrance BRCA1/2 alleles or other lower penetrance genes.5, 6, 7, 8, 9, 10

Several structural motifs allow the BRCA1 protein to interact with cellular proteins to regulate diverse biological functions such as control of transcription11 and DNA damage repair.12 Therefore, BRCA1-interacting proteins implicated in these cellular pathways represent attractive candidates with regard to breast cancer susceptibility. Based on a candidate gene strategy, four genes encoding BRCA1-interacting proteins, namely AURKA, BAP1, BARD1 and DHX9, were thoroughly investigated in an attempt to identify germline deleterious mutations and/or aberrant splicing events, which could potentially be associated with an increase in breast cancer risk. In this study, familial breast cancer cases were purposely selected. Indeed, such a study design has been demonstrated to substantially decrease the number of cases and controls to achieve the same power, as compared with studies where cases are unselected for family history.13

The AURKA protein is a centrosome-associated Ser/Thr kinase which localizes to the centrosome and could be involved in centrosome duplication,14 maturation and separation, as well as in the control of bipolar spindle assembly.15 AURKA binds and phosphorylates BRCA1, and it was suggested that this BRCA1 phosphorylation plays a role in G2/M transition.16 The AURKA gene has been recently identified as a low-penetrance tumor susceptibility gene,17 and was subsequently analyzed for its implication with breast cancer albeit leading conflicting results.18, 19, 20, 21, 22, 23, 24, 25, 26, 27

BAP1 is an ubiquitin carboxy-terminal hydrolase which binds to the BRCA1 RING finger domain and enhances BRCA1 growth suppression properties.28 BAP1 was suggested to regulate the BRCA1/BRCA2-RAD51 DNA repair complex.29 Given the known implication of BRCA1 in DNA repair12 and its interaction with RNA pol II,11 which is linked to transcription-coupled repair (TCR) processes, a role for BAP1 in TCR has been suggested.30 It has been reported that mutation of an amino acid of BAP1 in its BRCA1-binding domain (L691P) abolishes the BAP1-BRCA1 interaction.28 On the other hand, a study on 47 non-BRCA1/BRCA2 familial breast cancer cases from the French population did not reveal any association with breast cancer predisposition.31

The BARD1 protein interacts with BRCA1 and both proteins form a complex which possesses a dual E3 ubiquitin ligase activity.32 The heterodimer BRCA1/BARD1 promotes its own ubiquitination33 and ubiquitinates RNA pol II subunit A.34 The interaction of BRCA1 with RNA pol II11 and DHX935 suggested a role for BARD1 in TCR. The BARD1 gene has been reported to be targeted by somatic36 and germline37, 38, 39 mutations in breast and in ovarian36, 40 cancer cohorts. BARD1 mutations have been identified in hereditary breast/ovarian cancers from BRCA1/2-negative patients37, 41 and it was suggested that the Cys557Ser variant may represent a low-penetrance breast cancer susceptibility allele.39, 42, 43 However, a recent study did not identify any increased risk of breast cancer associated with specific BARD1 variations,44 whereas another study reported an association of BARD1 haplotypes with either an increased or decreased risk of breast cancer.45

As for the DHX9 helicase, a member of the DEAH helicase family,46 it binds directly to both CBP and pol II and is required for complex formation between both proteins.47 The DHX9 protein (aa 230–325) binds the full-length BRCA1 protein through a subregion of the BRCT domain of BRCA1.35 DHX9 truncating mutations were reported to affect the interaction of both proteins (BRCA1 and RNA pol II) and to result in decreased transcriptional activity of BRCA1,35 thus possibly affecting its function in transcription and TCR. Despite an interaction with BRCA1 and a role in transcriptional activity of BRCA1, DHX9 mutations have not been assessed thus far in non-BRCA1/BRCA2 breast cancer families.

Based on the multiple functions of these BRCA1-interacting genes, they obviously represent attractive candidates that might explain a proportion of the remaining inherited breast cancer cases. The goal of this study being the identification of deleterious mutation, and given that such deleterious disease-causing mutations producing a premature truncation of the protein occur mostly within the coding region, complementary DNA (cDNA) mutation screening was therefore performed. In addition, since alternative splicing is now recognized as the primary source of human proteomic diversity and gene regulation and that an increasing number of disease-causing mutations affecting the correct splicing of genes are identified,48 cDNA material definitively represents a resource of choice as alternative splice forms could not be detected when extensive re-sequencing is performed on genomic DNA.

Thus the goal of this study was to evaluate the possible involvement of AURKA, BAP1, BARD1 and DHX9 germline deleterious mutations or on breast cancer susceptibility by analyzing all mRNA-encoding sequences as well as alternative splicing in cDNA samples from 96 breast cancer cases, in which no BRCA1/2 mutations were detected.

Materials and methods

Ascertainment of families and DNA extraction

High-risk French Canadian breast and/or ovarian families were recruited through a large ongoing interdisciplinary research program designated INHERIT BRCAs. Clinicians (listed in Appendix) were directly involved in this translational research program and were responsible for the BRCA1/2 test result disclosure to participants. Approval was obtained from ethics committees corresponding to the different participating institutions. More details regarding ascertainment criteria, experimental and clinical procedures, as well as the INHERIT BRCAs research program have been described elsewhere.4, 5 Subsequently, another component was designed for the ‘Localization and identification of novel breast cancer susceptibility loci/genes’. Ethics approval for this latter study was also obtained and each participant knowing their inconclusive BRCA1/2 test results status had to sign a specific informed consent for their participation in this component. A subset of 96 high-risk French Canadian breast/ovarian cancer families were recruited in this study according to the ascertainment criteria described previously.5, 7, 9 Recruitment of 98 healthy unrelated French Canadian individuals was conducted on a non-nominative basis, as part of a long term study aiming at the characterization of the genetic variability in human population and approved by the Institutional Ethics Review Board of the Hôpital Sainte-Justine.

DNA extraction, RNA isolation from immortalized cell lines and cDNA synthesis

Lymphocytes from breast cancer individuals were isolated and immortalized as previously described.5 Total RNA extraction, storage and reverse transcription of RNA, as well as genomic DNA extraction from healthy unrelated French Canadian individuals were also performed as previously described.5, 6, 8, 9, 10

PCR amplification and mutation analysis

cDNA amplicons covering the entire mRNA-encoding sequences of the GenBank mRNA records (AURKA: NM_198436.1; BAP1: NM_004656.2; BARD1: NM_000465.1; DHX9: NM_001357.3) were obtained by PCR amplification from cDNA of breast cancer cases using the primer pairs listed in Supplementary Table 1. Direct sequencing and sequence data analysis were performed as previously described.5 Genomic DNA sequences from healthy unrelated French Canadian individuals were obtained using the same direct sequencing method, but with amplification of relevant variant-containing exons from amplicons obtained using intronic primers (Supplementary Table 1). The segregation of rare missense and deletion variants (minor allele frequency; MAF <5%), and variants demonstrating a suggestive difference in genotype frequency between breast cancer cases and unaffected individuals were also analyzed in families for which DNA material was available from multiple individuals.

Each variant was tested for deviation from HWE by means of a χ2 test. All P-values were two-sided with 1 degree of freedom. To estimate the genetic association of each variation or haplotype with breast cancer, differences in genotype distributions between the case and control groups were tested using a χ2 test. P-values less than 0.05 are considered as significant. However, to address the issue of multiple comparison, Bonferroni correction was applied for each estimate of genetic association.

Conservation in other species and computational analyses

Protein alignment was performed using data extracted from the UCSC database, and both the SIFT and PolyPhen web-based softwares were used to predict the effect of amino acid substitutions. All tests were run under default threshold values.

LD analysis and haplotype estimation

To estimate the pattern of LD, the LDA program was used to calculate pairwise LD (D′ and r2) values. Haplotype analysis was performed with the PHASE 2.1.1 software. Haplotype frequencies were estimated using all variants genotyped in both sample series (cases and controls) and a permutation test was conducted to determine the significance of differences in the distribution of inferred haplotypes between both groups. All association tests were run under default conditions, with 1000 permutations. Haplotype blocks were identified using genotyping data from both series using the Haploview software. tSNPs from each LD block were thereafter identified using the same software, and haplotype frequencies were also estimated using only the tSNPs identified.

Results

The entire coding sequence of AURKA, BAP1, BARD1 and DHX9 genes were analyzed using cDNA from 96 high-risk breast cancer individuals, each coming from a distinct high-risk family. No deleterious truncating germline mutations were found. In addition, based on gel electrophoresis and sequence chromatogram analysis, no major alternative splice forms have been detected for any of the genes analyzed, unlikely involving this kind of alteration with breast cancer risk. However, a number of sequence variations were identified within each of these four genes, as described below.

AURKA gene analysis

Five of the 10 variants identified in the AURKA gene are coding substitutions, whereas the remaining five are located within the 3′UTR region (Table 1). Four of the AURKA variants are novel, not being reported in dbSNP database (Build 127). Albeit no significant difference was observed between both sample sets based on single-marker analysis, the novel c.1130T>A variation is present exclusively among three breast cancer cases at the heterozygous state. As for the c.1117A>G (Met373Val) variant present in four breast cancer individuals at the heterozygous state, segregation analysis suggests a possible association with breast cancer. Indeed as illustrated in the pedigree from one of the four carrier's families (Figure 1), all heterozygous carriers are affected with breast cancer whereas among the two non-carriers, one is affected (diagnosed at 45 years) and the other is unaffected (55 years). A DNA sample from an additional family member was also available for two other families. An unaffected first-degree relative (59 years) could be genotyped and was found to be a non-carrier of the Met373Val variant in one family, whereas an affected third-degree relative diagnosed with breast cancer at 74 years was found to be a heterozygote carrier for this variant in the other family (data not shown). No additional family member was available for the remaining family. With the exception of the c.1212+423G>C variant which displays a slight excess of homozygotes (Hardy–Weinberg equilibrium (HWE) P=0.03) among the breast cancer cases, no variants display a significant deviation from HWE.

Table 1 Sequence variations and genotype frequencies of AURKA, BAP1, BARD1 and DHX9 genes in cases and controls
Figure 1
figure 1

Pedigree of AURKA:Met373Val carriers family. The age of onset of cancer, age of death or current age are indicated below family members. The affected case that was sequenced first is designated by an arrow. The genotype call is indicated below the respective individual analyzed. d, deceased.

As shown in Table 2, all variants located in the coding sequence result in an amino acid change. Of these, three residues (Val57, Ala213 and Val377) are completely conserved in distant species (apart from Leu377 in Md), suggesting that they are under strong functional constraint or may have a specific role on protein conformation. The three variants (Ala213Val, Met373Val and Val377Glu) located within the Ser/Thr protein kinase domain are predicted to be not tolerated by SIFT, with Val377Glu also evaluated as probably damaging by the PolyPhen program (Table 3).

Table 2 Non-synonymous sequence variants detected in AURKA, BAP1, BARD1 and DHX9 genes and corresponding residues present in orthologs
Table 3 Location of non-synonymous sequence variants detected in AURKA, BAP1, BARD1 and DHX9 genes and prediction on protein function

A graphical representation of linkage disequilibrium (LD) with both r2 and Lewontin's D′ and using data from both series is shown in Figure 2. Most single nucleotide polymorphism (SNP) pairs display strong LD (0.8 <D1), whereas lower LD is observed for a combination involving SNPs 1, 2, 4 and 6. The r2 coefficient calculated displays a large spectrum of values ranging from 0 to 0.6, given the high dependence of r2 on allele frequency. Analysis of haplotypes estimated with the PHASE program led to six haplotypes exhibiting a frequency 1% (Table 4). Based on solid block algorithm, two LD blocks were identified at the AURKA locus, with LD breakage being located in the region of exon 9 (Figure 3). However, when performing a similar analysis using CEPH/CEU data, three LD blocks are obtained, including two relatively small blocks (0.5 and 1.2 kb) in the 5′ region and a larger block covering intron 1 to intron 7 (17 kb) (data not shown). As indicated in Figure 3, three tagging SNPs (tSNPs) (rs2273535, rs1047972 and rs34189236) have been identified, representing 95% of our estimated haplotypes.

Figure 2
figure 2

Graphical overview of pairwise linkage disequilibrium (LD) measures of (D′) (lower left part) and r2 (higher right part) for each SNPs pair identified in breast cancer cases and healthy individuals. All SNPs are denoted numerically with reference to Table 1. Distance between first and last SNPs for each gene is indicated on the left end on each LD representation.

Table 4 Estimated haplotype frequencies in cases and controls using PHASE
Figure 3
figure 3

Haplotype blocks and tSNPs identified for the AURKA (a), BAP1 (b), BARD1 (c) and DHX9 (d) genes. Tagging SNPs identified on a block-by-block basis using haplotypes showing a frequency higher than 1% are denoted with an asterisk (*). Haplotype frequencies are displayed on the right of each haplotype combination whereas the level of recombination is displayed above the connections between two blocks. Thin connections represent haplotypes with frequencies between 1 and 10%, whereas haplotypes with frequencies higher than 10% are represented by thick lines.

BAP1 gene analysis

Analysis of BAP1 sequence in our breast cancer individuals led to the identification of only three variants (Table 1). Two are novel synonymous variants, c.294C>T being detected only once among our 96 breast cancer individuals, whereas c.1026C>T was present exclusively in two unaffected individuals at the heterozygous state. No significant difference in allele frequency was obtained between both series for any of these nucleotide substitutions based on single-marker analysis.

All BAP1 variations are comprised within one block of strong LD (Figure 3), which is confirmed by the perfect LD (D′=1) observed between the two most distant intragenic SNPs (SNP1 and SNP4) (Figure 2). As displayed in Table 4, PHASE analysis estimated two major haplotypes, which are represented by only one tSNP (rs123598; SNP4) (Figure 3), and both haplotypes display a similar frequency in both sample sets. Given that only three SNPs, all located in intron 3, are available from HapMap data, it was not relevant to perform haplotype and tSNP identification using the CEU population.

BARD1 gene analysis

Among the BARD1 sequence variations identified in this study, six are missense variants, whereas another consists of an in frame 21 bp deletion (seven amino acids) in exon 4 (c.1075del21). Three (c.820A>G, c.1203T>C and c.2212A>G) are considered as novel, being not reported in dbSNP database (Table 1). All variants follow HWE, with the exception of the c.1670G>C variant, where a significant deviation is observed due to the occurrence of one rare homozygote. A significant difference is also observed for the c.1518C>T substitution, which is slightly more frequent within cases (P=0.013, data not shown). However, this association does not reach statistical significance following Bonferroni correction, and segregation of this variant in a large informative family does not reveal any obvious association with breast cancer (data not shown). Of the seven amino acid changes identified, only the Ile738Val residue displays a certain level of conservation in higher species, with the alternative Val residue being observed in rodents and Tetraodon nigroviridis (Table 2). Apart from the deletion of 7 amino acids, in silico evaluation of the potential effect of amino acid changes on the protein function (Table 3) revealed that four amino acid changes are predicted to affect the protein function by at least one of the two programs used, including the Arg378Ser change which is predicted to potentially affect the protein function by both SIFT and PolyPhen softwares.

The graphical representation of the pairwise LD between the 11 variants identified in both series combined is shown in Figure 2. The low pairwise D′ values observed involve mainly SNP8, whereas moderate pairwise values are observed for SNP1. A large spectrum of r2 values ranging from 0 to 1.0 is observed for BARD1. The Haploview software (Figure 3) determined two LD blocks, with breakage of LD located between SNP3 and SNP4 in the region of exon 4. Within these two LD blocks, 2 and 6 tSNPs were identified, block 1 being defined by SNPs 1 and 2, whereas SNPs 4, 6, 8, 9, 10 and 11 represent the second block. Similar analyses using CEPH/CEU data set led to the identification of seven LD blocks (data not shown). The discordance obtained is present mainly in the 5′-region of the BARD1 gene, in which the four haplotype blocks identified from HapMap data are represented by a single block in the French Canadian population. Regarding haplotypes estimation, the analysis led to the identification of 11 haplotypes, all exhibiting a frequency 1% (Table 4). No significant difference in single or global haplotype distribution was identified between both groups.

DHX9 gene analysis

The analysis of DHX9 in our breast cancer individuals led to the identification of five variants, with an additional nucleotide substitution present exclusively among unaffected individuals (Table 1). Out of these variations, two involve an amino acid change, among which c.1873C>G is located in the proximity of the helicase C-terminal domain (aa 631–776). The c.1745C>T (rs2275177) sequence variation demonstrates a significant deviation from HWE in breast cancer cases, where an excess of rare homozygous genotypes is observed. Single-marker analysis did not reveal a significant difference in MAF for any of the nucleotide changes identified. Regarding the amino acid conservation of the Pro89Ala variant in other orthologs, the Pro allele is conserved in the three species in which this residue is available (Canis and rodents, Table 2). The two DHX9 amino acid changes identified are predicted to be possibly damaging by the PolyPhen program. In addition, the Ser625Cys located within the RNA pol II-binding domain (aa 230–650) according to SIFT is predicted to be not tolerated (Table 3).

Most pairs of adjacent SNPs display strong LD (0.8<D1) following LDA analysis. As displayed in Figure 2, r2 values under 0.3 are obtained for all pairwise comparisons. The perfect D′ value associated with the two most distant variations identified is confirmed by Haploview analysis (Figure 3). Three tSNPs were highlighted for DHX9, namely c.915T>G, c.1745C>T and c.1976C>T (Figure 3). Four haplotypes exhibiting a frequency 1% have been estimated with the PHASE program (Table 4).

Discussion

Given that deleterious mutations in BRCA1/BRCA2 were identified in only 24% of the 256 high-risk French Canadian breast/ovarian cancer families,4 the analysis of the coding region of four genes encoding BRCA1-interacting proteins was undertaken in 96 high-risk individuals drawn from non-BRCA1/BRCA2 French Canadian families. To our knowledge, this is the first study to investigate the potential association of DHX9 germline mutations with breast cancer susceptibility in high-risk non-BRCA1/BRCA2 breast cancer families. Although AURKA and BARD1 genes have been investigated in tumor samples, population-based studies or high-risk families, only one investigation assessed the involvement of BAP1 in breast cancer susceptibility.31

The Phe31Ile variant was reported to confer an increased risk of breast cancer, either alone or in combination with Val57Ile.18, 19, 20, 21, 23, 27 However, our study and others did not replicate this association.22, 24, 25, 26 With the exception of c.1212+423G>C (rs8173), evaluated in both breast44 and ovarian49 cancer, none of the 3′UTR AURKA variants identified in our study were assessed in other studies. As for the Met373Val variant, we report here a possible implication of this variant with breast cancer susceptibility, segregation analysis indicating that all heterozygous carriers are affected with breast cancer. However, this rare variant is observed at a similar frequency in affected and unaffected individuals from our study, thus requiring further analyses.

Many BARD1 variants identified in this study have been evaluated in the context of breast cancer. The c.1670G>C variant was associated with an increased risk of breast cancer either in high-risk families or in population-based studies,38, 39 whereas our results and others43, 50 do not support these findings. None of the other variants identified here (that is, c.609A>C, c.1053G>C, c.1075del21 and c.1518C>T) was previously associated with an increased risk of breast cancer,38, 41 despite our results suggesting an association of the c.1518C>T (His506His) with an increased risk of breast cancer (P=0.013), which is however not significant following Bonferroni correction. This finding would nonetheless have to be confirmed in much bigger cohorts.

The BAP1 analysis led to the identification of the c.1026C>T (Ser342Ser) variant, which was reported once in a cohort of 47 French familial breast cancer cases negative for BRCA1/BRCA2 mutations.31 Interestingly the common c.2190+444C>T sequence variation localized in the 3′UTR region was identified in our population with a MAF>5% in both series but was not analyzed in the French cohort.31

As for DHX9 sequence variations, we report here the first assessment of its implication in breast cancer. Of the six sequence variants identified, four are rare, novel variants, two of which being missense variants. The Ser625Cys variation is located within the RNA pol II-binding domain (aa 230–650) close to the C-terminal helicase domain (aa 631–776), whereas the Pro89Ala is situated in the N-terminal basic region capable of binding efficiently the C/H3 region of CBP transcription coactivator (aa 1–250).47

In addition to establishing the allele frequency of each variant in a series of unaffected controls, the effect of the variants identified were also assessed using conservation analysis and web-based programs. Both sequencing chromatograms and gel electrophoresis were thoroughly analyzed to detect any aberrant spliced forms, and we were able to conclude that no major spliced mRNAs are highly expressed in the cDNA material from our high-risk breast cancer individuals. Despite the fact that alternative transcripts expressed at low levels (10% compared to the wild type form) can be detected by sequencing chromatogram analysis,5, 51 we cannot exclude the presence of minor, weakly expressed forms which could not have been detected here. The identification of alternative spliced forms of BARD140, 52, 53 and AURKA54 mRNA have been reported, which is in accordance with UCSC database data and other alternative transcript databases such as ASAP II and Fast DB. As for BAP1, a splice variant was reported in a small cell lung cancer cell line.28 However, discordant results are presented in different databases, UCSC identifying many alternative spliced mRNA species for both BAP1 and DHX9 genes, whereas the database Fast DB identified only few alternative spliced mRNA species for BAP1, and none for DHX9.

In terms of linkage disequilibrium, relatively strong LD (0.8 <D1) is observed for the four genes when using the D′ measure (Figure 2), and the D′ values obtained here are concordant with data from the sole study reporting LD values (D′) for the AURKA gene.22 Using the r2 correlation coefficient, much lower LD values were obtained. Indeed differences between D′ and r2 values are likely to be explained by the fact that r2 is more stringent and sensitive to frequency than D′.55

To our knowledge, the current study is the first to report haplotype estimation for the BAP1 and DHX9 genes. The sole other study assessing BARD1 haplotypes45 reported a significant difference in haplotype distribution between cases and controls for two different haplotypes. Although both haplotypes have indeed been estimated in our analysis (BARD1-h1 and BARD1-h6), this association was not observed. The current study is also the most extensive investigation conducted on AURKA haplotypes, which includes 11 variations encompassing exons 3–9. Our results do not confirm any association of AURKA haplotypes with an increased risk of breast cancer, as previously demonstrated in other studies.19, 22, 23

The identification of haplotype blocks was thereafter conducted for the AURKA, BAP1, BARD1 and DHX9 genes. With the exception of the AURKA gene, for which the 5′-end can be roughly separated into two clusters,22 this analysis was not previously performed or reported for the BAP1, BARD1 and DHX9 genes. The haplotype block identification for BAP1 is concordant with results obtained from HapMap data, whereas similar analyses using French Canadian data for the AURKA, BARD1 and DHX9 genes revealed slight differences when compared with HapMap data (data not shown). However, we have to keep in mind that the greater extent of LD generally found in the French Canadian population can be responsible for a lower number of haplotype blocks.56 In addition such a difference between HapMap and this study may be due to the higher coverage of the gene regions in the HapMap data set which include both exonic and intronic SNPs, the latter not assessed in our study.

The analysis of the coding region of the AURKA, BAP1, BARD1 and DHX9 genes did not identify any deleterious truncating mutation or aberrant splicing mRNAs. We report here a possible association of the BARD1 c.1518C>T variant with an increased risk of breast cancer. Further studies on much bigger cohorts are however needed to fully evaluate the association of this variant with breast cancer risk and to assess the functional contribution of this variant on mRNA stability and protein expression. Haplotype analysis, LD block and tSNPs identification were also conducted for all four genes, and permitted to state that haplotypes displaying a frequency 1% represent more than 94% of all haplotypes estimated for each gene in our population. This first haplotype-based analysis to establish tSNPs for both the BAP1 and DHX9 genes will provide a valuable tool for further large association studies.