Article Text
Abstract
Background Duchenne muscular dystrophy or Becker muscular dystrophy might be a suitable candidate disease for application of next-generation sequencing in the genetic diagnosis because the complex mutational spectrum and the large size of the dystrophin gene require two or more analytical methods and have a high cost. The authors tested whether large deletions/duplications or small mutations, such as point mutations or short insertions/deletions of the dystrophin gene, could be predicted accurately in a single platform using next-generation sequencing technology.
Methods A custom solution-based target enrichment kit was designed to capture whole genomic regions of the dystrophin gene and other muscular-dystrophy-related genes. A multiplexing strategy, wherein four differently bar-coded samples were captured and sequenced together in a single lane of the Illumina Genome Analyser, was applied. The study subjects were 25 patients: 16 with deficient dystrophin expression without a large deletion/duplication and 9 with a known large deletion/duplication.
Results Nearly 100% of the exonic region of the dystrophin gene was covered by at least eight reads with a mean read depth of 107. Pathogenic small mutations were identified in 15 of the 16 patients without a large deletion/duplication. Using these 16 patients as the standard, the authors' method accurately predicted the deleted or duplicated exons in the 9 patients with known mutations. Inclusion of non-coding regions and paired-end sequence analysis enabled accurate identification by increasing the read depth and providing information about the breakpoint junction.
Conclusions The current method has an advantage for the genetic diagnosis of Duchenne muscular dystrophy and Becker muscular dystrophy wherein a comprehensive mutational search may be feasible using a single platform.
- Duchenne muscular dystrophy
- Becker muscular dystrophy
- genetic diagnosis
- next-generation sequencing
- genetics
- neurology
- epilepsy and seizures
Statistics from Altmetric.com
- Duchenne muscular dystrophy
- Becker muscular dystrophy
- genetic diagnosis
- next-generation sequencing
- genetics
- neurology
- epilepsy and seizures
Introduction
Duchenne muscular dystrophy (DMD; MIM #310200) and Becker muscular dystrophy (BMD; MIM #3003376) are the most common forms of childhood muscular dystrophy affecting from 1 in 3500 to 1 in 6000 male births.1 2 Genetic testing of the dystrophin gene (DMD; Xp21.2) is now the initial method for confirming the diagnosis, although a muscle biopsy might be considered if rapid and reliable genetic testing is unavailable.3 However, full characterisation of the mutational spectrum is necessary for genetic counselling, prenatal diagnosis and selecting the patients eligible for future mutation-specific treatments. Although the proportion of mutations differs slightly between studies, possibly reflecting bias in cohort selection and application of different molecular diagnostic methods,4–7 the mutational spectrum can be approximated as follows: a large deletion in about 60% of patients, a large duplication in about 10% of patients and small mutations confined mostly to coding exons in about 30% of patients. To date, no genetic testing has been developed to cover this whole mutational spectrum in a single platform. In most laboratories, methods for detecting large deletions/duplications5 8–10 and methods for detecting small mutations11 12 are conducted separately. In addition, the large size of the dystrophin gene requires considerable effort, cost and time for direct sequencing using the Sanger method.
Massively parallel or next-generation sequencing (NGS) technologies have become essential tools for searching for new human disease genes, most of which were identified by either whole-exome sequencing or targeted sequencing of the regions identified by linkage analyses.13–17 NGS technology is also useful for molecular diagnosis of certain diseases where laborious sequencing efforts are required because of the large gene size or the presence of multiple causative genes in single disease entities. A few proof-of-concept studies have been published recently in this field,18–20 although further optimisation and validation are required. Besides small mutations including point mutations and short insertions/deletions (indels) spanning several base pairs, large deletions/duplications can also be identified by NGS technologies through read depth estimation.21 Because NGS provides a comprehensive mutation search from large deletions/duplications to small mutations in a single platform, DMD/BMD is a suitable candidate disease for testing whether NGS can be applied for molecular diagnosis.
To demonstrate whether large deletions/duplications and small mutations of DMD/BMD can be predicted accurately in a single NGS platform, we analysed the entire dystrophin gene regions of 25 patients with DMD or BMD using solution capture with bar-code multiplexing and massively parallel sequencing. First, the pathogenic mutations were searched in 16 patients with pathologically proven DMD/BMD who were negative for large deletions/duplications. Next, we showed how large deletion/duplication mutations could be detected in the same platform by analysing the sequencing data of nine patients with DMD/BMD with known deletion/duplication mutations.
Methods
Patients
The Institutional Review Board of the Seoul National University Hospital approved the study protocol (H-1104-053-358). Multiplex ligation-dependent probe amplification (MLPA) was the first molecular diagnostic method used in Seoul National University Children's Hospital since 2006 when DMD/BMD was suspected from the clinical and laboratory findings. If a large deletion/duplication is not identified with MLPA, the next step is to confirm deficient dystrophin staining in the muscle biopsy specimen. We randomly selected 16 patients who were candidates for small mutations because they were negative for large deletions/duplications and positive for deficient dystrophin staining. The additional nine patients harbouring large deletion/duplication mutations, which were confirmed with MLPA, were included to validate whether large deletion/duplication mutations could be correctly predicted through NGS sequencing data. The MLPA, dystrophin immunohistochemistry findings and clinical phenotypes of the 25 male patients are summarised in table 1.
Target enrichment of genomic DNA and sequencing
After the patients provided informed consent, genomic DNA was extracted from peripheral blood leucocytes using a QIAamp DNA Blood Midi Kit (Qiagen, California, USA) according to the manufacturer's instructions. Three micrograms of DNA was sheared using a Covaris S2 (Covaris, Inc., Massachusetts, USA) to ∼250 nt at a 20% duty cycle, level 5 intensity and 200 cycles per burst for 180 s. Bar-coded fragment sequencing libraries were made using an NEB sample preparation kit (New England Biolabs Inc., Massachusetts, USA) and Illumina multiplexing adaptor (Illumina, California, USA) according to the manufacturers' instructions. After ligation with the Illumina adaptor, the libraries were prepared using AMPure bead (Beckman Coulter, Inc., California, USA) rather than gel purification. Library quality was assessed using an Agilent 2100 Analyser and DNA 1000 chips (Agilent Technology, California, USA). An equimolar four-plex pool was produced for enrichment using a SureSelect Target Enrichment System Kit (Agilent Technology) and a modified protocol. Five hundred nanogrammes of pooled DNA with 5 µl (100 ng) of custom baits were used for enrichment, with blocking oligonucleotides specific for paired-end sequencing libraries and 24-h hybridisation. Biotinylated RNA library hybrids were recovered with streptavidin beads. In addition to the dystrophin gene, the target regions included 25 genes involved in congenital muscular dystrophies and limb girdle muscular dystrophies (supplementary table 1). The dystrophin gene region to be captured is at genomic position ChrX:31 047 266 to 33 139 594 (NM_004006, hg18, National Center for Biotechnology Information (NCBI) build 36), and this region was uploaded to Agilent's web-based design tool, eArray. The parameters selected were bait length (120 bp), bait tiling frequency (2×), allowed overlap into avoided regions (20 bp) and avoided standard repeat masked regions, which eliminates repetitive sequences using the RepeatMasker program alignment-based method. The bait region, which was defined as the region covered by one or more capture probes in the dystrophin gene, was 1 069 974 bp (supplementary file 1). The captured libraries were amplified and sequenced on the Illumina Genome Analyser IIx by 2×69 cycles. The other experimental procedures were performed according to the protocol of each manufacturer.
Single nucleotide variant (SNV) and indel detection
The sequencing data were processed using the Genomic Short-read Nucleotide Alignment Program (GSNAP) alignment tool,22 23 and human genome NCBI build 36 was used as the reference. Reads aligned to multiple regions of the reference sequence were removed for further analysis. The variant detection criteria for SNVs, insertions (<10 bases) and deletions (<31 bases) were as follows: (1) ≥50% of the reads indicated variation, (2) ≥8 reads were uniquely aligned to each position of variance and (3) the mean quality score (Q score) was ≥20. The identified SNVs and indels were compared with the data from the 1000 Genomes Project, dbSNP132, data from 68 Korean healthy controls (13 from whole genome sequencing and 55 from whole-exome sequencing data) and the Leiden DMD Mutation Database.24 The novel nonsense or frameshift variants were regarded as putative pathogenic mutations. The functional consequences of the novel missense variants were predicted using Sorting Intolerant from Tolerant (SIFT) and Polymorphism Phenotyping (PolyPhen).
Large deletion/duplication prediction
The exonic coverage depth of each dystrophin gene exon was calculated by dividing the sum of all read depths per base within the specific exon by the size of that exon (base pairs) and corrected by all aligned bases within the bait region except the dystrophin gene to ensure comparability between experiments. The reference exonic coverage depth of each dystrophin gene exon was obtained by averaging the exonic coverage depths of the 16 patients without deletion/duplication. The exonic coverage depth of patients 17–25 was compared with the corresponding reference exonic coverage depth in each of the 79 exons. The threshold ratio for deletion was set at <0.2, and that for duplication was set at >1.8. After identifying a series of deleted or duplicated exons, the range of deleted or duplicated regions was approximated by plotting the read depth spanning the whole dystrophin gene region. Finally, the breakpoint of the deleted or duplicated region was estimated with the aid of aberrant paired-end sequencing calls. In the paired-end sequencing setting, the GSNAP alignment tool gives a ‘too long’ signal when the distance of a pair of reads is >1000 bp, indicating the possibility of a large deletion between that pair of reads. The GSNAP reports ‘inversion’ when the orientations are opposite to those expected and ‘scramble’ when the distance appears to be negative. These two signals suggest the disruption of genomic integrity and the possibility of duplication near both reads of pairs.
MLPA and validation through Sanger sequencing
MLPA analysis was performed using the SALSA MLPA KIT P034-A2/P035-A2 DMD/Becker (MRC Holland, Amsterdam, The Netherlands) according to the manufacturer's instructions. Direct Sanger sequencing of the selected regions was performed using primer pairs designed by the authors (available upon request) to validate the coding or known pathogenic intronic variations identified through NGS. For breakpoint junction analysis, PCR primers were designed to cover the deleted or duplicated regions suggested by the paired-end sequencing (available upon request).
Results
Sequencing summary and small mutation detection (patients 1–16)
Analysis of the sequence data revealed uniform coverage and high read depths in 16 patients (table 2). For estimation of coverage, only positions having ≥8 uniquely aligned reads with a Q score ≥20 were included. The bait region including all the dystrophin gene exons was almost completely covered by ≥8 read depth. The mean read depth of the exons was 107 (range, 86–125). Accordingly, about 68% of the whole dystrophin gene region was covered adequately to identify SNVs and short indels. The number of total SNVs and indels and the number of variants within the coding exons are summarised in supplementary table 2. There were no SNVs or short indels involving splice sites. All coding SNVs and short indels were presented specifically and compared with the data from the 1000 Genomes Project, dbSNP132 and data from 68 Korean healthy controls; these data are provided in supplementary file 2. All coding variants were confirmed using Sanger sequencing. We also tested whether there were false negatives among other bases in the exons selected for Sanger sequencing; no false-negative variant was present. Putative small mutations were identified in 15 of 16 patients (table 3); most were either nonsense or frameshift mutations causing truncation of dystrophin. Intronic variant in patient 6 (c.4518+5G>A) was demonstrated previously to cause aberrant splicing in vivo and in vitro.25 Novel nonsense (c.1006G>T, p.Glu336X) and missense (c.1005G>A, p.Met335Ile) variants were found in patient 15. This missense variant was not considered as a putative pathogenic mutation because SIFT and PolyPhen predicted that the functional consequences of this variant are ‘tolerated’ and ‘benign’. The functional consequences of the novel missense variant in patient 16 (c.450T>G, p.Asn150Lys) were also predicted by SIFT and PolyPhen as tolerated and benign.
Large deletion/duplication prediction (patients 17–25)
The reference exonic coverage depth of each dystrophin gene exon was obtained using the methods described above (supplementary file 3). With the predetermined threshold criteria (deletion <0.2, duplication >1.8), deleted or duplicated exons in all nine patients were completely consistent with the results identified using the MLPA method (table 4, supplementary file 4). The coverage plot of these nine patients reconfirmed these results (figure 1). The junctions of deleted or duplicated regions, which were usually located in introns, were also roughly estimated from these coverage plots. To determine the breakpoint precisely, we used the raw coverage depth and paired-end sequencing data around the junctional sequence identified from the coverage plots. Subsequent Sanger sequencing confirmed the breakpoints in seven patients (patient 17 and patients 19–24; table 4). Paired-end data for patient 25 revealed an inversion between intron 55 of the dystrophin gene and intron 13 of PHEX, suggesting the presence of complex rearrangement in addition to duplication of exons 50 to 55 of the dystrophin gene.
Discussion
Although the targeted resequencing by NGS technology has gained much attention in the field of molecular diagnostics, several limitations have delayed its application to medical genetic diagnosis.19 20 26 It is difficult to standardise the procedures because of the use of various sequencing platforms and target enrichment methods, which are updated rapidly. The cut-off thresholds for accurate variant identification, including the minimum read depth, range of variant percentage compared with the wild type and quality score, have not been defined fully. Variable coverage depth across target regions and misalignment between homologous sequences or pseudogenes may also affect the accuracy of sequencing data.
Accepting that some problems are inherent in NGS technologies regardless of the target regions, we believe that the dystrophin gene has an advantage over other target regions located in autosomal chromosomes. Because of hemizygosity and the absence of reported pseudogenes, a less stringent cut-off threshold can be applied. This advantage may also make target enrichment with bar-code multiplexing feasible, thus further reducing the cost and time, which are other obstacles to the clinical application. A recent report by Cummings et al,27 which used multiplexing and target enrichment protocols similar to those used in the present study, provides a proof-of-concept example for our study. Even after multiplexing four samples in one lane, nearly 100% of regions of all the dystrophin gene exons were covered optimally to identify the SNVs and indels. Although most research groups suggest that a minimum of 10–30 read depths is needed for accurate SNV identification,19 20 28–30 the cut-off criteria used in the present study might be acceptable considering the hemizygosity of the dystrophin gene.
The high proportion of variants compared with the wild type illustrates the advantage of hemizygosity of the dystrophin gene when identifying SNVs (table 3). The relatively low proportion of variants in indels (table 3) may stem from the terminal location of deleted or inserted bases in part of the aligned reads. When deleted or inserted bases are located in either end of an aligned read, they could be erroneously identified as a series of SNVs rather than indels. Although a separate algorithm for indel detection may be necessary to improve sensitivity, we applied the same criteria to both SNVs and indels because the high coverage depth in this study and the hemizygosity of the target region should decrease the likelihood of such false negatives. Because 16 patients in the present study were selected randomly and their genotypes were unconfirmed, the mutation detection rate might provide further support for the accuracy of this method. The pathogenic mutations were identified in 15 of 16 patients, and the mutation detection rate and distribution of mutation by types (12 nonsense, 2 small deletions causing frameshift and 1 splicing mutation) were similar to the results of other studies using different methods.6 7 As the SIFT and PolyPhen prediction suggested tolerable or benign functional changes, the novel missense variant found in patient 16 (c.450T>G, p.Asn150Lys) was not classified as a pathogenic mutation.
As described above, variable coverage across the target regions is inherent in the targeted resequencing method. Although the coverage depth may differ among each of the 79 dystrophin gene exons in a single patient, this pattern of variation seems to be reproducible in multiple patients sharing the same target enrichment and NGS platform (supplementary file 1). This was demonstrated recently by Walsh et al21 Because a large deletion/duplication of the dystrophin gene typically includes several exons, comparison using the coverage depth per exon (exonic coverage depth in the present study) might be justified. Compared with the target genes located in autosomal chromosomes, copy number changes would be more obvious in the dystrophin gene because of hemizygosity. This difference is particularly significant when analysing large deletions because the coverage depth of the deleted region approaches zero.
The non-coding region of the dystrophin gene, except for the repetitive area, was also captured and sequenced in the present study. Although intronic variations contribute little to the whole DMD/BMD mutational spectrum and are difficult to validate by genomic sequencing alone, inclusion of intronic regions may have several implications for DMD/BMD molecular diagnosis. First, targeting only exonic regions may produce an uneven distribution of coverage within the exon, resulting in the possible detection of a suboptimally covered region. Cummings et al noted that the coverage per base is reduced at the flanks and is increased in the middle of the target regions when discontiguous exons are resequenced.27 Sequencing of nearly contiguous regions encompassing non-coding regions would increase the mean read depth of all the dystrophin gene exons, which would appear more evenly distributed when plotted across the whole dystrophin gene region, as indicated in figure 1. This positively affects the accuracy of the prediction of both small mutations and large deletion/duplication mutations.
Second, aberrant paired-end sequencing calls (too long, scramble and inversion) usually identified in intronic sequences could provide clues for breakpoint determination in large duplication/deletion cases. Although breakpoint determination from paired-end sequencing data may not be an indispensable step in routine genetic diagnosis, this method may provide additional information regarding complex rearrangement, as in the case of patient 25. Contrary to the prediction of in-frame duplication using the reading frame rule in patient 25 (exons 50–55), the clinical course and expression of dystrophin in muscle were consistent with DMD. In addition, insertion of PHEX exons into the dystrophin gene was identified by reverse transcription PCR experiments (data not shown). As it is predicted to cause the protein truncation, it could explain the severity in patient 25.
Compared with previous proof-of-concept studies that discussed the detailed methodological perspectives in limited cases, the present study emphasises a more practical approach by testing a relatively large number of patients and demonstrating complex mutational spectra from large deletions/duplications to small mutations in this single platform. This method would reduce the time necessary to reach the genetic diagnosis of DMD/BMD compared with the current method—a combination of MLPA and Sanger sequencing. Moreover, multiplexing and inclusion of other related genes might contribute to reducing the cost, which is one of the main obstacles for clinical use. We hope that the present study provides practical evidence to encourage the application of NGS technology to routine genetic diagnosis of DMD/BMD.
References
Supplementary materials
Supplementary Data
This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.
Files in this Data Supplement:
- Download Supplementary Data (PDF) - Manuscript file of format pdf
- Download Supplementary Data (PDF) - Manuscript file of format pdf
- Download Supplementary Data (PDF) - Manuscript file of format pdf
- Download Supplementary Data (PDF) - Manuscript file of format pdf
- Download Supplementary Data (PDF) - Manuscript file of format pdf
Footnotes
BCL and SL contributed equally to this work.
Funding This study was supported by a grant from the Korea Healthcare Technology R&D Project, Ministry for Health, Welfare and Family Affairs, Republic of Korea (Grant No. A080588) and Seoul Broadcasting System Grant-in-Aid for Seoul National University Children's Hospital Research (Grant No.30-2010-0190).
Competing interests None.
Ethics approval Ethics approval was provided by the Institutional Review Board of the Seoul National University Hospital.
Provenance and peer review Not commissioned; externally peer reviewed.