Purpose Whereas most human genes encode multiple mRNA isoforms with distinct function, clinical workflows for assessing this heterogeneity are not readily available. This is a substantial shortcoming, considering that up to 25% of disease-causing gene variants are suspected of disrupting mRNA splicing or mRNA abundance. Long-read sequencing can readily portray mRNA isoform diversity, but its sensitivity is relatively low due to insufficient transcriptome penetration.
Methods We developed and applied capture-based target enrichment from patient RNA samples combined with Oxford Nanopore long-read sequencing for the analysis of 123 hereditary cancer transcripts (capture and ultradeep long-read RNA sequencing (CAPLRseq)).
Results Validating CAPLRseq, we confirmed 17 cases of hereditary non-polyposis colorectal cancer/Lynch syndrome based on the demonstration of splicing defects and loss of allele expression of mismatch repair genes MLH1, PMS2, MSH2 and MSH6. Using CAPLRseq, we reclassified two variants of uncertain significance in MSH6 and PMS2 as either likely pathogenic or benign.
Conclusion Our data show that CAPLRseq is an automatable and adaptable workflow for effective transcriptome-based identification of disease variants in a clinical diagnostic setting.
- Nanopore Sequencing
- Gastrointestinal Diseases
- Gene Expression Profiling
Data availability statement
Data are available upon reasonable request.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
WHAT IS ALREADY KNOWN ON THIS TOPIC
A large fraction of disease-causing variants are known to disrupt mRNA structure or expression.
Long-read RNA sequencing is a powerful tool to assess mRNA structure, but its sensitivity is limited.
WHAT THIS STUDY ADDS
We developed capture and ultradeep long-read RNA sequencing (CAPLRseq) as an automatable and adaptable workflow for effective transcriptome-based identification of disease variants in a clinical diagnostic setting.
CAPLRseq can evaluate a wide range of simple and complex DNA variants that affect mRNA structure and expression.
We validated CAPLRseq for the diagnosis of Lynch syndrome.
HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY
CAPLRseq may be incorporated into the diagnostic workflow to unambiguously classify DNA variants of uncertain significance in hereditary cancer predisposition genes.
DNA genetic testing for germline variants in tumour suppressor genes can identify individuals with hereditary cancer predisposition. However, up to 44% of genetic variants identified by DNA sequencing are classified as variants of uncertain significance (VUS).1 VUS present significant clinical challenges as their uncertain impact can impede optimal preventive and therapeutic care of patients and their family members. Fifteen to twenty-five percent of VUS are predicted to disrupt mRNA splicing,2 although the functional impact of individual variants is rarely confirmed. This is particularly worrisome as hereditary cancer genes are highly susceptible to splicing defects.3–5 For example, studying genes encoding mismatch repair (MMR) proteins MLH1, PMS2, MSH2 and MSH6 involved in hereditary non-polyposis colorectal cancer (HNPCC)/Lynch syndrome, a form of hereditary colorectal cancer predisposition, we previously reported that 16% of missense variants and 12% of VUS actually cause splicing defects.6 These considerations highlight the potential of RNA-based analysis in aiding the classification of VUS. Indeed, a recent study demonstrated successful interpretation of VUS identified at the DNA level by subsequent RNA-based analysis in 88% of cases.7
Long-read sequencing is ideally suited for detecting mRNA isoforms that might arise from DNA variants.8–11 For example, using PCR amplification of cDNA, a study identified 32 alternatively spliced isoforms of BRCA1 mRNA by Oxford Nanopore Technology (ONT) sequencing, 20 of which were novel.12 Using a similar approach, a study identified single-nucleotide variants (SNVs) and an aberrantly spliced form of NF1 mRNA.13 However, the requirement for PCR amplification limits this approach to distinct loci and is thus unsuitable for highly parallel panel analyses and automation. Although whole transcriptome sequencing on the ONT platform is possible, sequencing depth is often limiting in confidently assigning new isoforms to mRNAs with low-level expression (see the Results section). For example, many variants causing premature termination codons (PTCs) trigger nonsense-mediated mRNA decay (NMD), causing severe depletion of PTC carrying mRNA species of potential diagnostic value.14 15 Likewise, the relatively high error rate of ONT sequencing necessitates ultradeep coverage for variant calling and mRNA isoform profiling.
Recent studies established the feasibility of RNA capture by hybridisation to enrich parts of the transcriptome for deep long-read sequencing.11 16 17 While being a powerful tool for the de novo discovery of coding and non-coding transcripts, this methodology requires custom capture probe design and validation. To perform RNA-based analysis in a routine diagnostic setting, we sought to develop an approach that employs validated probe sets and is amenable to automation. Here, we describe a facile protocol merging Agilent’s SureSelectXT Low Input Target Enrichment System with Oxford Nanopore’s cDNA-PCR Barcoding Library Preparation Kit for highly efficient capture of transcripts from hereditary cancer predisposition genes. Studying samples from patients with suspected HNPCC/Lynch syndrome, we demonstrate that the technique readily enables the interpretation of variants affecting mRNA structure or expression, including (1) PTCs or promoter methylation causing allelic reduction in mRNA expression, (2) alterations in mRNA splicing resulting in exon skipping or intron retention, and (3) structural variants such as SINE-VNTR-Alu (SVA) insertions and fusion transcripts.
Performance of RNA capture sequencing (capture and ultradeep long-read RNA sequencing (CAPLRseq))
The CAPLRseq method established here is summarised in figure 1. It involves extraction of total cellular RNA from either whole human blood or short-term cultures of peripheral blood mononuclear cells (PBMCs). The RNA is then reverse transcribed by oligo-dT priming and template switching according to the ONT cDNA-PCR Sequencing Kit (SQK-PCS109), followed by PCR amplification with or without barcoding. The amplified cDNA is used as input for the Agilent SureSelectXT capture workflow according to the Low Input Target Enrichment System. Captured cDNAs are PCR-amplified and subjected to ONT Rapid Adapter ligation and sequencing on a GridION instrument.
Table 1 compares run statistics of cDNA sequencing (cDNA-seq) performed with total RNA or after enrichment of 123 hereditary cancer-related transcripts with the CAPLRseq method developed here. In single runs on the GridION platform using R.9.4.1 flow cells, we obtained 4.8–9.8 million reads, with multiplexed samples typically being at the lower end of this range. A minimum of 80% and as many as 94% of reads were typically aligned to the reference genome. Mean read length varied between 625 and 1111 nucleotides, whereas medians ranged from 625 to 917 nucleotides. We did not observe any systematic differences in read length between total cDNA-seq versus CAPLRseq samples.
Whereas we obtained a reasonable sequencing depth of 82× across the exome by total RNA sequencing (RNA-seq), average read depth was relatively shallow for the 123 cancer-related transcripts (48×; table 1, top row), which appear to be expressed at a comparatively low level. This sequencing depth was deemed insufficient for reliable use in disease diagnostics. The CAPLRseq approach raised average sequencing depth of the cancer gene transcripts in duplexed samples to over 5000×, a >100-fold improvement over total RNA-seq (table 1). High depth was also obtained by CAPLRseq of RNA derived from whole blood (~2000×). This depth was not substantially increased by depleting >80% of haemoglobin (HB)-encoding mRNAs HBA1/HBA2 and HBB using the GlobinLock strategy,18 even though globin mRNAs were sequenced to a depth of ~75 000× in non-depleted samples (table 1). This indicates that, at least in a duplex format, flow cell capacity is not limiting for CAPLRseq of cancer-related mRNAs from whole blood RNA samples even without depletion of globin mRNA.
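The enrichment factor quoted in this paragraph follows directly from the reported depths. A minimal sketch of that arithmetic, using only the depth values given in the text (the function name is illustrative and not part of any published pipeline):

```python
def fold_enrichment(depth_enriched: float, depth_unenriched: float) -> float:
    """Ratio of mean read depth with capture to mean read depth without capture."""
    if depth_unenriched <= 0:
        raise ValueError("unenriched depth must be positive")
    return depth_enriched / depth_unenriched


# Depths from the text: ~5000x (duplexed CAPLRseq) vs 48x (total RNA-seq).
duplex_gain = fold_enrichment(5000, 48)
# Whole blood CAPLRseq reached ~2000x.
whole_blood_gain = fold_enrichment(2000, 48)

print(f"duplex: {duplex_gain:.0f}-fold, whole blood: {whole_blood_gain:.0f}-fold")
```

Because the reported depths are rounded ("over 5000×", "~2000×"), the resulting ratios should be read as approximate lower bounds rather than exact values.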
Although average sequencing depth of the 123 hereditary cancer gene transcripts in the 12-plex format was 187×–273× (table 1), relative quantification based on transcripts per million reads (TPM) revealed that the abundance of individual mRNAs of the cancer panel varied in a representative sample across four orders of magnitude (online supplemental figure S1A). Thus, 75% of the total reads mapped to the 15 most abundant mRNAs, while the remaining 108 transcripts accounted for only 25% of all reads (online supplemental figure S1B). The median TPM value of the 123 cancer gene transcripts was 1590. Fourteen transcripts on our cancer panel (ALK, CASR, CDKN2B, CFTR, CTRC, GREM1, HOXB13, KIT, MITF, PDGFRA, PHOX2B, PTCH2, SPRED1 and WT1) could not be reliably detected in the 12-plex format due to low abundance in PBMCs. The wide range of expression levels illustrates the importance of enriching transcripts in order to assess low abundance cancer gene transcripts.
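The two summary quantities used in this paragraph, per-million scaling of read counts and the share of reads taken up by the most abundant transcripts, can be sketched as follows. This is an illustration only: it assumes one read per transcript molecule and therefore applies no length normalisation, the transcript names and counts are hypothetical toy values, and it is not the quantification performed by the tools named in the methods.

```python
def per_million(counts: dict[str, float]) -> dict[str, float]:
    """Scale per-transcript read counts so that they sum to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads assigned to any transcript")
    return {name: 1_000_000 * c / total for name, c in counts.items()}


def top_n_share(counts: dict[str, float], n: int) -> float:
    """Fraction of all reads that map to the n most abundant transcripts."""
    ranked = sorted(counts.values(), reverse=True)
    return sum(ranked[:n]) / sum(ranked)


# Hypothetical toy counts, NOT data from the study.
toy_counts = {"A": 7000, "B": 2000, "C": 900, "D": 100}

tpm = per_million(toy_counts)
share = top_n_share(toy_counts, 2)
print(tpm["A"], share)
```

Applied to the real panel, `top_n_share(counts, 15)` would correspond to the 75% figure reported above.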
Despite these limitations, quantification of mRNA expression by CAPLRseq was remarkably reproducible, with correlation coefficients between technical replicates of the same RNA sample typically above 0.9 (online supplemental figure S1C). Correlations between samples from distinct patients were also high, varying between 0.62 and >0.9. Poor correlation (r<0.5) was obtained only for RNA samples of low integrity (online supplemental figure S1C).
Figure 1 also summarises the timeline of CAPLRseq. Times refer to the duration of the RNA-seq procedure for one GridION flow cell. Considering the time needed for tissue culture and RNA isolation, the entire duration per sample from blood draw to result is ca. 1 week. Since 12 libraries fit on one flow cell (12-plex), four patients per flow cell can be analysed in triplicate. On the fully loaded GridION instrument with five flow cells, the maximum number of patients that can be analysed in parallel is 20. If required, the throughput of 20 patients per week can be increased by stacking the entire process, such that several 24-hour runs on the GridION instrument are scheduled per week.
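The throughput figures above are simple capacity arithmetic. A short sketch, using the plex level, replicate number and flow cell count stated in the text (the function and parameter names are illustrative):

```python
def patients_per_run(libraries_per_flow_cell: int,
                     replicates_per_patient: int,
                     flow_cells: int) -> int:
    """Number of patients that fit into one instrument run."""
    if replicates_per_patient <= 0:
        raise ValueError("replicates_per_patient must be positive")
    per_flow_cell = libraries_per_flow_cell // replicates_per_patient
    return per_flow_cell * flow_cells


# 12-plex, triplicates, five flow cells on a fully loaded instrument.
capacity = patients_per_run(12, 3, 5)
print(capacity)
```

Lowering the replicate number or raising the plex level changes capacity proportionally, at the cost of statistical support or per-sample depth, respectively.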
Validation of CAPLRseq for the diagnosis of patients with suspected HNPCC/Lynch syndrome
Using a panel of 25 samples, we had previously demonstrated that PCR-based analysis of mRNAs encoding each of the four MMR proteins involved in HNPCC/Lynch syndrome considerably increases diagnostic yield.6 Although relative expression values (TPM) of the four mRNAs differed more than 10-fold (PMS2=34 312, MSH6=17 458, MLH1=7215 and MSH2=3031), all were expressed well beyond the median TPM value of the 123-cancer gene panel (1590) and thus readily analysable.
We initially used eight of these PCR-verified samples harbouring a wide array of disease variants to test whether our newly developed CAPLRseq method would recapitulate the PCR-based variant classification. As shown in table 2, all eight variants were confirmed by CAPLRseq. This included splicing changes in MSH2 and MLH1 caused by intronic SNVs as well as nucleotide and whole exon duplications in MSH6 and PMS2, respectively. An SNV creating a PTC in MSH2 mRNA was confirmed to cause allelic imbalance in mRNA expression due to NMD, as the variant transcript was rescued in PBMC cultures treated with puromycin (figure 2A).
To assess the effect of an intronic variant affecting a splice donor region of MSH2 mRNA, we determined percent spliced-in (PSI) values, which represent the fraction of reads containing a certain exon relative to all reads spanning that exonic region.19 The analysis confirmed that the splice donor variant caused in-frame skipping of exon 5 with a PSI of 41.93%±4.16% (figure 2B). Exon 5 skipping was similar in puromycin-treated PBMCs (45.21%±0.79%), suggesting that the aberrantly spliced isoform is not degraded by NMD, as expected for a variant that does not lead to a frame shift. Control samples show almost complete inclusion of exon 5 in MSH2 transcripts with a PSI of 90.84%±3.62% (figure 2B).
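The PSI definition given above (reads containing the exon divided by all reads spanning that region) can be written out as a small calculation. The sketch below is a simplified illustration and not the algorithm of the PSI tool cited in the methods; the replicate read counts are hypothetical, and summarising replicates as mean and sample standard deviation is an assumption about how the ± values are formed.

```python
from statistics import mean, stdev


def psi(inclusion_reads: int, skipping_reads: int) -> float:
    """Percent spliced-in: reads containing the exon / all spanning reads * 100."""
    spanning = inclusion_reads + skipping_reads
    if spanning == 0:
        raise ValueError("no reads span this exon")
    return 100.0 * inclusion_reads / spanning


# Hypothetical (inclusion, skipping) read counts for three replicates.
replicates = [(420, 580), (400, 600), (450, 550)]

values = [psi(inc, skip) for inc, skip in replicates]
summary = (mean(values), stdev(values))
print(values, summary)
```

With long reads, each read spanning the exon is counted once as either including or skipping it, which is what makes this per-molecule ratio straightforward to compute.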
Next, we sequenced a series of nine samples with known DNA variants in MMR genes graded as pathogenic based on American College of Medical Genetics and Genomics (ACMG) criteria for which we had no prior RNA data available. CAPLRseq confirmed all nine variants as either class 4 or 5 pathogenic variants (table 2). For example, a germline promoter methylation identified by multiplex ligation-dependent probe amplification was apparent as a ~2-fold downregulation of MLH1 mRNA in PBMCs (figure 2C).
Furthermore, an intronic variant in MSH2 caused an 11-nucleotide extension of exon 15 predicted to cause a frame shift and a stop codon three residues downstream (figure 3A). The frequency of the extension was increased from a PSI of 20.19%±6.75% to a PSI of 26.49%±3.62% in puromycin-treated PBMCs.
We also documented the deleterious effect of insertion of an SVA retrotransposon on the PMS2 mRNA. This SVA insertion was previously shown by PCR amplification and Sanger sequencing to add 71 nucleotides to exon 8 of PMS2 mRNA, thus resulting in a premature stop codon.20 In the CAPLRseq data, the SVA insertion was readily detectable in the Integrative Genomics Viewer as a shoulder in the coverage track formed by a stretch of 71 nucleotides mapping to the reference sequence with multiple mismatches (figure 3B). The sequence of the insertion corresponded to the relevant region of the SVA retrotransposon (data not shown). The frequency of the variant allele was ~2-fold increased in PBMCs treated with puromycin (37.91%±5.57% without puromycin vs 54.73%±1.40% with puromycin), suggesting that it is degraded by NMD as expected for a frameshift variant.
Lastly, we confirmed fusion transcripts between MLH1 and DCLK3, a gene located ~230 kb upstream of MLH1.21 The fusion arises from a genomic inversion with breakpoints in MLH1 and DCLK3 resulting in the fusion of MLH1 exon 1 (and parts of an alternative exon 2; see further) with exons 4 and 5 of DCLK3 (figure 3C and data not shown). In the rearranged allele, transcription is likely driven by the inverted MLH1 promoter. Some of the fusion transcripts also contained exon 2 of a unique isoform of MLH1 mRNA (NM_001354619) corresponding to the genomic region in which the breakpoint is located (figure 3C). We also detected heterogeneity at the 3′ end of the MLH1–DCLK3 fusion transcript with some isoforms missing exon 4 of DCLK3 presumably due to alternative splicing (figure 3C). Puromycin led to a ~2.2-fold upregulation of the fusion transcripts (16.52%±0.28% without puromycin vs 36.81%±5.42% with puromycin) indicating that they are subjected to NMD. Consistent with this interpretation, we detected a ~2-fold downregulation of MLH1 mRNA (figure 3C), which is known to cause severe deficiency in MMR activity.22 We did not detect the predicted reverse DCLK3–MLH1 fusion transcript, indicating that the promoter of DCLK3 is either disrupted through the rearrangement or otherwise silenced.
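The upregulation of the fusion transcripts on puromycin treatment is the ratio of the two reported fractions. A brief sketch using the mean values given above (the function name is illustrative; the ± terms are ignored here, so no uncertainty is propagated):

```python
def fold_change(fraction_with_puromycin: float,
                fraction_without_puromycin: float) -> float:
    """Ratio of a transcript fraction with NMD inhibition to that without."""
    if fraction_without_puromycin <= 0:
        raise ValueError("untreated fraction must be positive")
    return fraction_with_puromycin / fraction_without_puromycin


# Mean fusion-transcript fractions from the text (percent of reads).
fusion_fc = fold_change(36.81, 16.52)
print(f"{fusion_fc:.1f}-fold")
```

A ratio clearly above 1 is the readout interpreted in the text as evidence that the transcript is subject to NMD.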
Application of CAPLRseq for the diagnosis of patients with suspected HNPCC/Lynch syndrome
Finally, we sequenced two samples in which variants of uncertain significance were identified in MMR genes. The first proband had a positive family history for colorectal cancer with one parent and a grandparent affected. An SNV in intron 2 of the PMS2 gene was predicted to alter a conserved splice donor site, but in the absence of any evidence at the mRNA level, the variant was graded a VUS. CAPLRseq revealed a skipping of exon 2 in PMS2 transcripts with a PSI of 38.42%±9.33% (figure 4A). The skipping is predicted to result in a frame shift, although we found no evidence that the variant transcript is degraded by NMD (PSI of exon 2=41.06%±3.27% in puromycin-treated PBMCs, figure 4A). Consistent with this conclusion, we did not see a major allelic imbalance in PMS2 mRNA expression based on transcribed SNVs (figure 4A). Regardless, the frame shift predicts a defective PMS2 protein, thus presumably causing a loss of function of PMS2. These results provided evidence to reclassify the variant as class 4 according to ACMG criteria (PS3_VRS, PM2_SUP).
The second proband had ovarian cancer (mother) and bladder cancer (brother) in the family, and a VUS was found at the end of intron 3 of the MSH6 gene, potentially affecting a splice acceptor site. CAPLRseq did not identify a change in splicing of exon 4 nor a change in the balance of allele expression (figure 4B), and the variant was hence reclassified as likely benign (ACMG class 2, BS3_SUP, BP4).
The CAPLRseq method we developed, validated and deployed for the diagnosis of HNPCC/Lynch syndrome provides a powerful diagnostic workflow. Its main strengths comprise its versatility in identifying the consequences of a diverse set of genomic variants for the structural integrity and expression of cognate mRNAs. The spectrum of variants includes coding and non-coding (intronic) SNVs as well as structural variants such as insertion of retrotransposons and large genomic rearrangements resulting in the formation of fusion transcripts. Whereas the use of individual splice junctions can also be assessed by short-read RNA-seq, only long-read sequencing can mirror the full diversity of the transcriptome at a single molecule level. As such, the de novo detection of altered transcripts arising from structural variants such as the MLH1-DCLK3 inversion is not readily accomplished by short-read sequencing. Likewise, genes such as PMS2, which has several highly homologous pseudogenes, cannot be fully assessed by short-read RNA-seq.4 Finally, the enrichment approach provides deep penetration of the hereditary cancer transcriptome, which spans an abundance range of four orders of magnitude, thus enabling the application of the approach in clinical genetic diagnosis.
Our standard input material for CAPLRseq was total RNA isolated from short-term cultures of patient PBMCs. Apart from straightforward sampling, PBMC cultures have several advantages that, in our view, outweigh their main disadvantage of increased handling time:
Unlike RNA isolated from whole blood (eg, PAXgene Blood RNA System), PBMC RNA does not contain the vast amounts of HBA and HBB mRNA that can limit transcriptome penetration. Although CAPLRseq is compatible with whole blood RNA samples (table 1) because HB RNA is partially depleted during the enrichment process, HB sequencing depth remains high even after enrichment despite the inclusion of an additional HB mRNA depletion step (GlobinLock, table 1). It is thus likely that HB will negatively impact capacity to sequence cancer-related transcripts at higher levels of multiplexing, such as the 12-plex format we typically use on an R.9.4.1 flow cell. Availability of patient PBMC RNA samples free of HB mRNA also provides the option of subsequent follow-up by standard total RNA-seq, if applicable.
A second major advantage is the ability to supplement the PBMC cultures with puromycin to inhibit NMD. The necessity to inhibit NMD-mediated degradation of aberrant transcripts for the reliable RNA-based diagnosis of HNPCC/Lynch syndrome has been repeatedly demonstrated.6 15 23 24 Confirming this, we show here that NMD inhibition improves the detection of a splicing variant in MSH2 (NM_000251.3 c.2459-12A>G, figure 3A). It is likely that NMD-mediated degradation of aberrantly spliced transcripts is equally pervasive for many other hereditary cancer genes included in our panel as was, for example, previously demonstrated for BRCA1.14
A secondary benefit of PBMC cultures is that cells can be frozen and banked for later reuse as a source of additional RNA, DNA and protein samples for confirmation studies without the need of resampling the patient, which is often inconvenient or impossible.
Lastly, although we have not systematically addressed this, it is conceivable that the surprisingly modest interindividual variability in the expression of hereditary cancer genes we observed within our diverse cohort of patients subjected to different environmental influences is, at least in part, due to the standardised conditions under which the patient-derived PBMCs were cultured for several days. The resulting stabilisation of expression signatures by passaging cells through uniform culture conditions might cancel out extraneous influences thus facilitating the identification of genetically driven changes of potential diagnostic value.
In summary, CAPLRseq is a highly efficient diagnostic method that readily integrates into existing workflows of modern clinical genetics laboratories. The method is automatable and cost effective with an approximate material cost of ~€400 per patient in our setting.
Nineteen patients meeting at least the revised Bethesda criteria25 were retrospectively enrolled in this study. All patients underwent genetic counselling and genetic diagnostic testing by DNA and RNA-seq with consent according to German laws. ACMG guidelines26 were used to categorise variants as class 5 (pathogenic), class 4 (likely pathogenic), class 3 (variant of uncertain significance), class 2 (likely not pathogenic) or class 1 (not pathogenic).
RNA isolation from clinical samples
Short-term PBMC cultures were established from 3 mL whole blood anticoagulated with heparin. The blood sample was diluted with 3 mL sterile saline solution and PBMCs were isolated by centrifugation in 10 mL Leucosep tubes (Greiner). The layer containing PBMCs was removed, washed and transferred into two 15 mL conical tubes containing 5 mL PB-MAX Karyotyping Media (Thermo Fisher Scientific, catalogue number 12557021). Cells were incubated at 37°C for 72–96 hours. Five hours prior to harvesting, one of the two cultures received 50 µg/mL of the protein synthesis inhibitor puromycin to prevent NMD. Cells were collected by centrifugation, and RNA was isolated with the Qiagen RNA Blood Mini-Kit (Qiagen) or the NEB Monarch Total RNA Miniprep Kit (T2010S) according to the manufacturer’s instructions. RNA yield was determined by spectrophotometry, and RNA integrity was assessed using the Agilent Fragment Analyzer.
For extraction of total RNA from whole blood, 2.5 mL blood obtained by standard venipuncture was collected into PAXgene Blood RNA Tubes (Qiagen). Blood samples were kept at room temperature for at least 2 hours prior to storage at 4°C for up to 3 days or at −20°C for long-term storage. RNA was purified on spin columns according to the instructions provided in the PAXgene Blood RNA Kit. RNA yield was determined by spectrophotometry, and RNA integrity was assessed using the Agilent Fragment Analyzer. RNA samples with an RNA integrity number (RIN) >9.5 were used as input for reverse transcription.
Reverse transcription and PCR amplification
Fifty nanograms of total RNA was used for cDNA synthesis following the ONT SQK-PCB109 kit protocol with the adjustments described as follows. After cDNA synthesis, four parallel PCRs per sample were performed with the respective barcoding primer pairs (online supplemental table 1) using LongAmp Taq Master Mix (New England Biolabs, catalogue number M0287) with the following cycling conditions: denaturation at 95°C for 30 s, 14 cycles (95°C for 15 s, 62°C for 15 s, 65°C for 8 min and 20 s), 65°C for 6 min, hold at 4°C. After digestion with 20 units of exonuclease 1 (New England Biolabs, catalogue number M0293) at 37°C for 15 min, samples were heated to 80°C for 15 min, and the four PCR reactions were pooled in a 1.5 mL Eppendorf LoBind tube, followed by purification with 0.8× equivalents of resuspended AMPure XP beads (Beckman Coulter). Samples were incubated on an Intelli-Mixer ELMI RM-2M (mode F-F7, 15 RPM) for 5 min at room temperature, followed by magnetic capture and two washes with 200 µL of freshly prepared 70% ethanol. After brief drying, the pelleted beads were resuspended in 12 µL of elution buffer (EB) and incubated on an Intelli-Mixer ELMI RM-2M (mode F-90, 15 RPM) for 10 min at room temperature. Beads were retained on a magnetic rack and 12 µL of eluate was transferred to a new LoBind tube. Quantification of the amplified DNA was done with Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, catalogue number Q32851) using 1 µL of amplified library. Typical yields ranged between 30 ng/µL and 100 ng/µL.
Purified library (between 350 ng and 850 ng) in a volume of 11 µL was used for cDNA capture with the Agilent SureSelectXT Low Input Enrichment System. Prehybridisation blocking was done by adding 5 µL of SureSelect XT HS and XT Low Input Blocker Mix followed by incubation in a thermal cycler with the following settings: heated lid on, 95°C for 5 min, 65°C for 10 min, and pause at 65°C for 1 min, during which the hybridisation mix was added. The capture library hybridisation mix was prepared at room temperature and contained 2 µL 25% RNase Block solution, 1 µL Cancer Panel Capture Library (<3 Mb; see online supplemental table 2 for individual genes), 6 µL SureSelect Fast Hybridization Buffer and 3 µL nuclease-free water. The amount of capture library was optimised by titration from 0.1 µL to 2 µL (data not shown). The mix was added to the samples while in the thermal cycler at 65°C and mixed by slowly pipetting up and down 8–10 times. Brief vortexing was followed by brief centrifugation, and the thermal cycler programme continued for 60 cycles each for 1 min at 65°C and 3 s at 37°C, after which samples were held at 65°C for a maximum of 10 min before probe binding to streptavidin beads.
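When several samples are captured in parallel, the hybridisation mix described above is conveniently prepared as a master mix. The sketch below scales the per-reaction volumes stated in the text; the 10% pipetting overage and the function name are assumptions introduced for illustration and are not part of the published protocol.

```python
# Per-reaction volumes in microlitres, as listed in the text (12 µL in total).
PER_REACTION_UL = {
    "25% RNase Block solution": 2.0,
    "Cancer Panel Capture Library": 1.0,
    "SureSelect Fast Hybridization Buffer": 6.0,
    "nuclease-free water": 3.0,
}


def master_mix(n_reactions: int, overage: float = 0.1) -> dict[str, float]:
    """Scale each component to n reactions plus a fractional overage (assumed 10%)."""
    if n_reactions < 1:
        raise ValueError("need at least one reaction")
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2) for name, vol in PER_REACTION_UL.items()}


mix = master_mix(4)
for component, volume in mix.items():
    print(f"{component}: {volume} µL")
```

Each sample then receives 12 µL of the pooled mix, matching the single-reaction recipe.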
Dynabeads MyOne Streptavidin T1 magnetic beads were prepared according to the manufacturer (Thermo Fisher Scientific). Briefly, 200 µL of SureSelect Binding Buffer was mixed with 50 µL of the resuspended beads. Beads were pelleted in a magnetic rack and the supernatant was discarded. A total of three washes were done, and beads were resuspended in 200 µL of SureSelect Binding Buffer. After hybridisation, samples were transferred and mixed with the washed streptavidin beads. A 30 min incubation in the Intelli-Mixer ELMI RM-2M was done at room temperature and low speed (F-F30, RPM20). Beads were gently mixed by flicking the tube every 5 min to prevent beads from settling. During this incubation, PCR tubes containing 200 µL of SureSelect Wash Buffer 2 were prewarmed at 70°C in a thermal cycler. Exact temperature control is essential at this step to maintain capture specificity. In order to remove non-hybridised DNA, beads were collected in a magnetic rack and the supernatant was discarded. Beads were fully resuspended in 200 µL of SureSelect Wash Buffer 1 by pipetting up and down 15–20 times. Using the magnet, the supernatant was removed and beads were fully resuspended in 200 µL of 70°C prewarmed Wash Buffer 2 by pipetting up and down 15–20 times. Samples were vortexed at high speed for 8 s and spun briefly, taking care that beads did not pellet but remained in suspension. Beads were incubated for 5 min at 70°C in the thermal cycler. Beads were pelleted for 1 min in the magnetic rack, and the supernatant was discarded. These steps were repeated for a total of 6 washes. After removal of the wash buffer, beads were resuspended in 25 µL of nuclease-free water.
A second PCR was done using 14 µL of the enriched library, the same ONT Barcode Primers used in the first PCR (see reverse transcription and PCR amplification) and 2× LongAmp Taq Master Mix. Cycling conditions were 95°C for 30 s, 20 cycles (95°C for 15 s, 62°C for 15 s, 65°C for 8 min and 20 s), 65°C for 6 min, hold at 4°C. After digestion with 20 units of NEB Exonuclease 1 at 37°C for 15 min and heat inactivation at 80°C for 15 min, amplification products were purified with 0.8× equivalents of resuspended AMPure XP beads as described previously, followed by elution in 12 µL of EB. Quantification of the enriched DNA was done with Qubit dsDNA HS Assay Kit, using 1 µL of sample. Typical yields ranged between 50 ng/µL and 100 ng/µL.
Depletion of HBA and HBB mRNA was done with the GlobinLock procedure.18 For this, the ONT SQK-PCB109 protocol was modified by annealing locked nucleic acid (LNA) containing oligonucleotides complementary to the 3′-untranslated regions (UTRs) of HBA (oligo LNA-A) and HBB (oligo LNA-B) mRNAs (online supplemental table 1) prior to first strand cDNA synthesis. Reactions of 10 µL contained 50 ng of total RNA, 3 µM each of LNA-A and LNA-B, 125 mM KCl, 1 mM dNTPs, and 1× RT buffer. Samples were heated to 95°C for 30 s and incubated at 60°C for 5 min. After this, 1 µL VNP primer was added (0.2 µM final concentration), followed by incubation at 60°C for another 5 min, after which the sample was placed on ice. Reverse transcription was started by adding 2 µL 5× RT buffer, 1 µL RNAseOUT, 2 µL SSP, 3 µL H2O and 1 µL Maxima RT and incubation as per ONT SQK-PCB109 protocol.
Adapter ligation and ONT sequencing
A total of 100 fmol of amplified library in a volume of 11 µL was used for sequencing. Assuming a mean length of 1.5 kb, 100 fmol corresponds to ~100 ng. If multiplexed libraries were sequenced on one flow cell, barcoded libraries were mixed at equal ratio to obtain a total of 100 fmol in 11 µL. ONT Rapid Adapter (1 µL) was added, and the mixture was incubated for 5 min at room temperature. The prepared library was kept on ice until loading onto the sequencing flow cell.
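The molar-to-mass conversion quoted above can be checked with the usual approximation of 660 g/mol per base pair of double-stranded DNA. That constant and the helper names below are assumptions of this sketch, not values stated in the protocol.

```python
AVG_MASS_PER_BP = 660.0  # g/mol per base pair of dsDNA (common approximation)


def fmol_to_ng(fmol: float, mean_length_bp: float) -> float:
    """Mass in ng of a dsDNA library given its amount in fmol and mean length in bp."""
    return fmol * mean_length_bp * AVG_MASS_PER_BP / 1e6


def per_library_fmol(total_fmol: float, n_libraries: int) -> float:
    """Equimolar share of each barcoded library in a multiplexed pool."""
    return total_fmol / n_libraries


mass_ng = fmol_to_ng(100, 1500)      # 100 fmol at a mean length of 1.5 kb
share = per_library_fmol(100, 12)    # 12-plex pool
print(f"{mass_ng:.0f} ng total, {share:.2f} fmol per library")
```

For libraries whose measured mean length differs from 1.5 kb, the mass needed to reach 100 fmol scales linearly with length.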
Priming and sample loading onto R.9.4.1 flow cells were done according to standard protocols described by ONT. Sequencing was performed on the GridION platform using the ONT FLO-MIN106D flow cell setting and the standard MinKNOW protocol script (NC_48 hours_sequencing_FLO-MIN106_SQK-PBC109) with accurate base calling and demultiplexing turned on.
Mapping for the analysis of aberrant splicing and transcript fusion detection was performed using minimap2 V.2.17-r941. Fusion transcripts were detected using JAFFA V.2.2. Percentages of aberrant transcripts due to splicing defects were determined either by manual inspection of mapped reads in Integrative Genomics Viewer (IGV) and deduction of affected read counts or by calculation of PSI scores using PSI-Sigma V.1.9r19 as applicable. Differential gene expression (DGE) analysis was performed using ONT’s pipeline for DGE and differential transcript usage analysis of long reads (https://github.com/nanoporetech/pipeline-transcriptome-de). Accordingly, mapping was performed using minimap2 V.2.18, expression was quantified using salmon V.1.5.0, and DGE analysis was performed using edgeR V.3.34.0. Log2-fold changes and adjusted p values were calculated relative to the cohort of all samples, and data were plotted in volcano plots.
Patient consent for publication
This study involves human participants and was approved by the ethics committee of the Ludwig-Maximilians University (project number 20-0839). The participants gave informed consent to participate in the study before taking part.
We thank all members of the MGZ Medical Genetics Center for supporting this work.
VS, RMLS and FS are joint first authors.
EH-F and DAW are joint senior authors.
Contributors Conceptualisation: EH-F and DAW. Data curation: FS, VS, JMAP, AL and MM. Investigation: VS, RMLS, FS, TH, MW and DAW. Formal analysis: VS, RMLS, FS, KK, AL and MM. Visualisation: VS, FS and DAW. Writing: DAW and FS. Guarantor: DAW.
Funding Initial phases of this project were funded by German Cancer Aid (project no. 111222) and the Wilhelm Sander-Stiftung (#2012.081.1).
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.