Current clinical approaches for mutation discovery are based on short sequence reads (100–300 bp) of exons and flanking splice sites targeted by multigene panels or whole exomes. Short-read sequencing is highly accurate for detection of single nucleotide variants, small indels and simple copy number differences but is of limited use for identifying complex insertions and deletions and other structural rearrangements. We used CRISPR-Cas9 to excise complete BRCA1 and BRCA2 genomic regions from lymphoblast cells of patients with breast cancer, then sequenced these regions with long reads (>10 000 bp) to fully characterise all non-coding regions for structural variation. In a family severely affected with early-onset bilateral breast cancer and with negative (normal) results by gene panel and exome sequencing, we identified an intronic SINE-VNTR-Alu retrotransposon insertion that led to the creation of a pseudoexon in the BRCA1 message and introduced a premature truncation. This combination of CRISPR–Cas9 excision and long-read sequencing reveals a class of complex, damaging and otherwise cryptic mutations that may be particularly frequent in tumour suppressor genes replete with intronic repeats.
- genetic testing
- germ-line mutation
- sequence analysis
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Multigene panel sequencing for inherited cancer risk was widely adopted in 2013 after the US Supreme Court invalidated the patenting of the genomic DNA sequences of BRCA1 and BRCA2.¹ Panel sequencing has increased the diagnostic yield of pathogenic variants and decreased the cost of genetic testing for patients. The approach relies on DNA capture of exons and flanking intronic splice sites and highly accurate sequencing with short reads (100–300 bp).² However, this technology does not efficiently detect complex structural rearrangements such as inversions and mobile element insertions.³ The BRCA1 genomic region is particularly challenging for short-read sequencing. It is composed of 42% Alu repeats,⁴ the second highest proportion in the genome, and contains a 30 kb tandem segmental duplication spanning its promoter and first two exons.⁵ As a consequence of this unstable genomic structure, simple structural variants (deletions and duplications) of BRCA1 exons represent more than 10% of all germline BRCA1 mutations.⁶ However, the frequency and nature of complex structural variants within introns and other non-coding regions of BRCA1 are not yet known.
For families severely affected with breast cancer, we applied sequencing of long DNA reads (>10 000 bp) to evaluate complete BRCA1 and BRCA2 genomic loci, including exons, introns, promoters and regulatory regions. Participants with DNA sequenced by this approach were the probands of 19 families with at least four relatives with young-onset breast cancer, all with negative (normal) sequence based on gene panel and whole exome sequencing. All participants provided informed consent (UW protocol 1583). For each participant, freshly grown lymphoblasts were loaded onto a high molecular weight library system (Sage Science, Beverly, Massachusetts, USA) and lysed directly in agarose gels. Pairs of CRISPR guides, designed to excise 200 kb genomic loci including BRCA1⁷ (chr17:41,170,535–41,368,879) and BRCA2 (chr13:32,836,996–33,026,430), were added to the gels along with Cas9 enzyme. Cut fragments were separated by field inversion gel electrophoresis (FIGE), an approach developed in the 1980s for separation of very large DNA fragments by periodically reversing the polarity of the electrophoretic field at pulse times of hundreds to thousands of milliseconds. ‘Next generation’ FIGE is automated to modify the duration of pulse times systematically over the course of the run. Separated fragments were eluted and evaluated for BRCA1 and BRCA2 enrichment by TaqMan qPCR. BRCA1 and BRCA2 fragments, which were ~200 kb in size, were sheared to ~20–30 kb by two passages through a gTUBE (Covaris, Woburn, Massachusetts, USA). Fragments were then end repaired, A-tailed and ligated to SMRTbell adapters using the Express Template Prep Kit 2.0 (Pacific Biosciences, Menlo Park, California, USA) following the manufacturer’s recommendations for low DNA input. Libraries were then sequenced on a Sequel I (Pacific Biosciences) with an average read length of 9700 bp.
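As a quick check on the "~200 kb" figure, the span of each excised locus can be computed from the coordinates quoted above. This is only an illustrative sketch: the helper name `locus_length` is hypothetical, and it assumes the coordinates are 1-based and inclusive, which the text does not state.

```python
import re


def locus_length(region: str) -> int:
    """Return the length in bp of a 'chr:start-end' region string.

    Assumption: start and end are 1-based and inclusive.
    Commas are thousands separators; the range may use '-' or an en dash.
    """
    _chrom, coords = region.split(":")
    start_text, end_text = re.split(r"[-–]", coords.replace(",", ""))
    return int(end_text) - int(start_text) + 1


# Coordinates as quoted in the text (GRCh37/hg19).
brca1_locus = "chr17:41,170,535–41,368,879"
brca2_locus = "chr13:32,836,996–33,026,430"

brca1_bp = locus_length(brca1_locus)  # 198345
brca2_bp = locus_length(brca2_locus)  # 189435
print(brca1_bp, brca2_bp)
```

Under that assumption the targeted regions span roughly 198 kb and 189 kb, consistent with the ~200 kb fragment size reported for the eluted material.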
Reads were aligned to BRCA1 and BRCA2 and evaluated using PALMER⁸ for structural variants (deletions, duplications, insertions, inversions and translocations) >50 bp in size.
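The size threshold can be illustrated with a minimal sketch that scans the CIGAR string of a single aligned long read for insertions and deletions larger than 50 bp. This is not PALMER's algorithm and does not cover inversions, duplications or translocations; the function name `large_indels`, the 0-based coordinate convention and the toy CIGAR string are all assumptions made for illustration.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def large_indels(cigar: str, ref_start: int, min_size: int = 50):
    """Return (op, ref_pos, length) for each I or D longer than min_size.

    ref_start is the 0-based reference position of the first aligned base.
    For an insertion, ref_pos is the reference base that follows the insert.
    """
    events = []
    ref_pos = ref_start
    for length_text, op in CIGAR_OP.findall(cigar):
        length = int(length_text)
        if op in "ID" and length > min_size:
            events.append((op, ref_pos, length))
        if op in REF_CONSUMING:
            ref_pos += length
    return events


# Toy read: a large insertion is reported, a 30 bp deletion is below threshold.
toy_cigar = "5000M2856I4000M30D1000M"
print(large_indels(toy_cigar, ref_start=1000))
```

In practice a dedicated caller would also use split and clipped alignments and repeat annotation; the sketch only shows how the >50 bp cut-off separates structural variants from small indels.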
In genomic DNA of one of the 19 probands, we identified an intronic insertion event that was not present in the Database of Genomic Variants,⁹ in the gnomAD V.2.1 structural variant call set¹⁰ or in a diverse group of individuals whose whole genomes were sequenced to high depth with long reads.¹¹ The proband of family CF1225 harboured a 2856 bp SVA_F (SINE-VNTR-Alu) retrotransposon at chr17:41,229,081 in intron 13 of BRCA1 (GRCh37/hg19 assembly). This participant, an American of Romanian ancestry, was diagnosed with bilateral breast cancer at ages 40 and 42 years. PCR and Sanger sequencing confirmed the SVA insertion location, flanked on both ends by a palindromic 14 bp target site duplication (figure 1A). The SVA insertion shared 98.6% identity with the sequence at chr1:46,706,032–46,708,626. Multiple long reads included all elements of the mutation and of wild-type flanking BRCA1 intronic sequence, so that the mutation’s position and sequence were clear.
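A target site duplication shows up as the same short sequence immediately before and immediately after the inserted element. The sketch below, using invented toy flanks rather than the actual BRCA1 sequence, finds the longest such shared sequence and tests whether it is palindromic in the reverse-complement sense. Both the helper names and the reading of "palindromic" as reverse-complement symmetry are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def flanking_tsd_length(left_flank: str, right_flank: str, max_len: int = 30) -> int:
    """Longest k such that the last k bases 5' of the insertion
    equal the first k bases 3' of the insertion (0 if none)."""
    limit = min(max_len, len(left_flank), len(right_flank))
    for k in range(limit, 0, -1):
        if left_flank[-k:] == right_flank[:k]:
            return k
    return 0


# Hypothetical 14 bp duplication and flanks, for illustration only.
toy_tsd = "GATTACATGTAATC"
left = "CCTGA" + toy_tsd    # sequence ending at the 5' junction
right = toy_tsd + "TTCAG"   # sequence starting at the 3' junction

print(flanking_tsd_length(left, right))        # length of the shared flank
print(reverse_complement(toy_tsd) == toy_tsd)  # reverse-complement palindrome?
```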
To determine whether the intronic SVA insertion altered BRCA1 transcription, we grew lymphoblasts of CF1225.04 in puromycin to inhibit nonsense-mediated decay, then evaluated cDNA of BRCA1 by RT-PCR. Sequencing cDNA across BRCA1 exons 12–15 yielded three transcripts: one of the expected size and two larger. Sanger sequencing of these larger products revealed that the naturally occurring splice acceptor of BRCA1 intron 13 was paired with each of two cryptic splice donor sites in the Alu portion of the SVA insertion, yielding pseudoexons of sizes 509 bp and 666 bp in the BRCA1 message (figure 1B). Both pseudoexons included premature stop codons, predicted to truncate the BRCA1 protein at codon 1558 of the 1863-codon full-length protein. The SVA insertion segregated with breast cancer in family CF1225 (figure 1C). All relatives and their adult children have been recontacted for genetic and clinical follow-up. We also hope to determine whether this SVA retrotransposon could represent a founder allele in the Romanian population.
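The two pseudoexon sizes behave differently with respect to the reading frame: 509 is not a multiple of 3, so that pseudoexon also shifts the downstream frame, whereas 666 is a multiple of 3, so truncation there depends on a stop codon lying in frame within the pseudoexon itself. The sketch below shows this arithmetic and a simple in-frame stop scan on a toy sequence. The function names and the toy sequence are hypothetical and are not the BRCA1 pseudoexon sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def shifts_frame(inserted_bp: int) -> bool:
    """True if inserting this many bases changes the downstream reading frame."""
    return inserted_bp % 3 != 0


def first_in_frame_stop(seq: str, phase: int = 0):
    """Return the 0-based index of the first stop codon read in the given
    phase (offset of the first complete codon), or None if there is none."""
    for i in range(phase, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


for size in (509, 666):
    print(size, "bp:", "frameshift" if shifts_frame(size) else "in frame")

# Toy sequence: codons ATG GCC TGA AAA, so the stop starts at index 6.
print(first_in_frame_stop("ATGGCCTGAAAA"))
```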
The genomic regions harbouring tumour-suppressor genes are replete with repeats and segmental duplications. Indeed, these features yield the tumour suppressor phenotype, in that they lead to frequent somatic mutation and complete loss of gene function among persons carrying an inherited damaging allele at the same locus. Given these genomic structures, it is possible, even likely, that complex mutations are common at tumour suppressor genes. We suggest that complex mutations have thus far been rarely encountered because they are difficult to detect with existing approaches. A recent whole genome sequencing study of triple negative breast tumours, with targeted analysis of mobile elements, identified an SVA insertion in BRCA1 intron 2 in a tumour with independent loss of the wild-type BRCA1 allele, leading to reduced expression of the BRCA1 message.¹² As far as we know, the only other tumour-suppressor gene previously known to harbour an SVA insertion is PMS2, in a case discovered by Southern blotting.¹³ The genomic approach described here, integrating CRISPR–Cas9 excision of critical loci with long-read sequencing, yields complete sequence of targeted loci and thus can detect all classes of complex non-coding structural variants. The frequency of these classes of mutations could be determined by offering this approach on a research basis to families severely affected with breast, ovarian or prostate cancer but with negative gene panel and exome sequencing results.
We would like to thank Chris Boles for technical advice and help.
Contributors All the authors contributed to generating and/or analysing the data.
Funding This project was supported by National Cancer Institute grant 5R35CA197458 and by the Breast Cancer Research Foundation.
Competing interests TW discloses consulting fees from Color Genomics outside the submitted work. M-CK is an American Cancer Society Research Professor.
Patient consent for publication Not required.
Provenance and peer review Not commissioned; externally peer reviewed.