Molecular diagnosis for heterogeneous genetic diseases with targeted high-throughput DNA sequencing applied to retinitis pigmentosa
David A Simpson, Graeme R Clark, Sharon Alexander, Giuliana Silvestri, Colin E Willoughby

Centre for Vision and Vascular Science, Queen's University Belfast, Belfast, Northern Ireland, UK

Correspondence to Dr David Simpson, Queen's University Belfast, Centre for Vision and Vascular Science, Ophthalmic Research Centre, Institute of Clinical Science A, Royal Victoria Hospital, Belfast BT12 6BA, N Ireland, UK; david.simpson{at}


Background The genetic heterogeneity of many Mendelian disorders, such as retinitis pigmentosa which results from mutations in over 40 genes, is a major obstacle to obtaining a molecular diagnosis in clinical practice. Targeted high-throughput DNA sequencing offers a potential solution and was used to develop a molecular diagnostic screen for patients with retinitis pigmentosa.

Methods A custom sequence capture array was designed to target the coding regions of all known retinitis pigmentosa genes and used to enrich these sequences from DNA samples of five patients. Enriched DNA was subjected to high-throughput sequencing singly or in pools, and sequence variants were identified by alignment of up to 10 million reads per sample to the normal reference sequence. Potential pathogenicity was assessed by functional predictions and frequency in controls.

Results and conclusions Known homozygous PDE6B and compound heterozygous CRB1 mutations were detected in two patients. A novel homozygous missense mutation (c.2957A→T; p.N986I) in the cyclic nucleotide gated channel β1 (CNGB1) gene predicted to have a deleterious effect and absent in 720 control chromosomes was detected in one case in which conventional genetic screening had failed to detect mutations. The detection of known and novel retinitis pigmentosa mutations in this study establishes high-throughput DNA sequencing with DNA pooling as an effective diagnostic tool for heterogeneous genetic diseases.

  • Diagnostics test
  • clinical genetics
  • genetic screening/counselling
  • molecular genetics
  • ophthalmology


The completion of the Human Genome Project in 2003 was heralded as the dawn of an era of genomic medicine,1 in which information from genomes would guide clinical decision making and deliver personalised medicine.2 It was expected that accelerated detection of disease-related mutations would improve genetic diagnosis and prognosis.3 However, delivery of personalised genomic medicine requires not only access to the complete human genome, but also availability of appropriate genetic tests for individual patients. The genetic heterogeneity of many Mendelian disorders is a major obstacle to obtaining molecular diagnoses in clinical practice.4 For example, retinitis pigmentosa5 (RP (MIM #268000)), the most common inherited retinal degeneration (prevalence 1:4000), is caused by mutations in over 40 genes for the non-syndromic form of the disease alone (Retnet: Retinal Information Network). Molecular genetic testing is important for clinical care,6 enabling assignment of risk, genetic counselling and prognosis, and will be essential for enrolling patients in the future gene therapy trials likely to stem from the promising current human trials of RPE65 (MIM +180069) therapies for Leber congenital amaurosis.7 8

Apart from the genetic heterogeneity of RP, there are a number of obstacles that currently limit molecular diagnosis and therefore hinder the potential for personalised genomic medicine to guide clinical decision making. The lack of clearly defined genotype–phenotype correlations makes it difficult to direct testing to specific candidate genes. Approximately 50% of all patients with RP are the only known affected family member and have no evidence of consanguinity; these patients are classified as isolated, sporadic or simplex cases,9 which excludes diagnostic evaluation based on inheritance pattern. Traditional genetic screening for RP is laborious, although technological advances have had some impact.10–12 The development of massively parallel or ‘next-generation’ sequencing techniques which generate millions of DNA sequence reads in parallel during a single experimental run offers a potential solution.4 13 However, to date, applications of this high-throughput DNA sequencing approach as a molecular diagnostic tool have been limited because of the costs and perceived technical and data-handling challenges.14 15 The aim of this work is to demonstrate that high-throughput DNA sequencing is now sufficiently established to warrant introduction into clinical practice. Herein we report the use of this technique to detect known and novel mutations in patients with RP, demonstrating the clinical utility of this new technology.

Methods

All applicable institutional and governmental regulations concerning the ethical use of human volunteers were followed during this research. The study was approved by the Northern Ireland Research Ethics Committee and all participants gave informed consent. Patients were selected from a previously reported cohort,12 and all underwent a comprehensive ocular evaluation including Snellen visual acuity testing, visual fields (Humphrey 24-2; Carl Zeiss Meditec, Inc, Dublin, California, USA), fundal examination, and electroretinography to establish the diagnosis of RP based on the presence of the typical fundal features (figure 1), visual field constriction and an attenuated or abolished electroretinogram. DNA was extracted from whole blood using standard protocols (Wizard DNA Purification Kit; Promega, Southampton, UK). A total of 360 DNA samples from the general Northern Ireland population (unscreened for ocular disease) were used as controls.

Figure 1

Fundal images and pedigrees of the five patients with retinitis pigmentosa (RP). The fundal appearances of the patients are typical of RP, showing variable amounts of optic disc pallor, attenuation of the retinal vessels, retinal pigment epithelial atrophy, and bone spicule pigmentation.

Sequence capture and next-generation sequencing

A custom sequence capture array (Roche NimbleGen, Madison, Wisconsin, USA) was designed to target all exons and 100 bp of flanking sequence from all genes in which mutations were known to cause RP (Retnet: Retinal Information Network) and selected genes associated with Leber congenital amaurosis, a disease that has phenotypic overlap with RP (table 1).

Table 1

Genes implicated in retinitis pigmentosa (or Leber congenital amaurosis which shows phenotypic overlap) arranged by inheritance pattern

All reference sequences were based on the NCBI36/hg18 assembly of the human genome. The array comprised 385 000 unique probes selected using the Sequence Search and Alignment by Hashing Algorithm16 to capture a total of 359 kb of genomic sequence comprising 681 exons from 45 genes. Patient DNA samples were enriched for the targeted sequences using the manufacturer's protocols. Briefly, 21 μg aliquots were fragmented and hybridised to the array, non-target sequences were washed off, and the enriched fragment DNA pool was subsequently eluted and amplified by ligation-mediated PCR. Approximately 5 μg of amplified enriched DNA was used as input for massively parallel sequencing on a Genome Analyser II (Illumina, San Diego, California, USA) with either a single sample or four pooled, bar-coded samples per flow cell to generate single end reads of 40 bp or 32 bp after removal of tags (GATC Biotech, Konstanz, Germany). Reference gene sequences were annotated with known single-nucleotide polymorphisms (SNPs) from the NCBI dbSNP database build 130 and RP mutations from The Human Gene Mutation Database or reported in the literature. Sequencing reads were aligned to the reference sequences using Genomic Workbench software (CLC bio, Nottingham, UK) with default settings. Sequence variants present in >30% of reads at positions covered by at least five reads were classified as novel or known non-coding, synonymous or non-synonymous changes by comparison with the annotated reference sequences. These thresholds were chosen to maximise the detection of real variants while minimising artefactual calls and were informed by observation of the detection rate of known SNPs. Identified sequence variants were annotated according to the guidelines published by the Human Genome Variation Society.
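The calling rule stated above (variant present in >30% of reads, at positions covered by at least five reads) can be illustrated with a short sketch. This is not the Genomic Workbench implementation; the `PositionCounts` type and the `passes_thresholds` function are hypothetical names introduced only to show how the two thresholds combine.

```python
from dataclasses import dataclass

MIN_COVERAGE = 5              # at least five reads must cover the position
MIN_VARIANT_FRACTION = 0.30   # the variant must be present in >30% of reads


@dataclass(frozen=True)
class PositionCounts:
    """Read counts at one reference position (hypothetical structure)."""
    total_reads: int
    variant_reads: int


def passes_thresholds(counts: PositionCounts) -> bool:
    """Return True when both thresholds stated in the text are met."""
    if counts.total_reads < MIN_COVERAGE:
        return False
    return counts.variant_reads / counts.total_reads > MIN_VARIANT_FRACTION


# Read counts reported in the Results for the CRB1 c.2129A>T variant.
print(passes_thresholds(PositionCounts(total_reads=135, variant_reads=76)))
# Hypothetical low-coverage position: rejected although every read is variant.
print(passes_thresholds(PositionCounts(total_reads=4, variant_reads=4)))
```

Note that the fraction test is strict, so a position with exactly 30% variant reads is not called under this reading of the rule.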

Determination of pathogenicity

The potential pathogenicity of sequence variants identified by high-throughput DNA sequencing was assessed as follows. First, known SNPs were excluded as described above. The frequency of non-synonymous sequence variants in 360 population controls (720 chromosomes) was then determined by either Sequenom mass array genotyping (Sequenom, San Diego, California, USA) or high-resolution melt curve analysis on a LightCycler 480 (Roche Diagnostics, Basel, Switzerland) using Precision HRM MasterMix (Primerdesign, Southampton, UK), according to the manufacturers' protocols. The functional effects of non-synonymous sequence variants that were absent from or present in less than 1% of the control population were predicted using a range of freely available computational tools. The conservation of the affected amino acid across species was analysed using the ClustalW multiple sequence alignment program. Deleterious structural effects of amino acid substitutions on protein function were assessed using the PolyPhen (Polymorphism Phenotyping), PMut and SIFT algorithms. Synonymous and intronic sequence variants were assessed for potential deleterious effects upon messenger RNA splicing using the Human Splicing Finder V.2·4 tool. Sequence variants of interest identified by high-throughput DNA sequencing were verified in replicate PCR amplicons using Sanger sequencing. When other family members were available, the segregation of sequence variants with the disease was assessed.
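The order of the filtering steps described above can be sketched as a simple triage function. This is an illustration only, not the authors' pipeline: the `Variant` record, its field names, and the returned labels are hypothetical, and measuring the control frequency per chromosome is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional

MAX_CONTROL_FREQUENCY = 0.01  # variants in <1% of controls are retained


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one non-synonymous variant."""
    name: str
    known_snp: bool
    control_alleles: int       # control chromosomes carrying the variant
    control_chromosomes: int   # control chromosomes genotyped
    predicted_deleterious: Optional[bool] = None  # summary of prediction tools


def triage(variant: Variant) -> str:
    """Return the stage at which a variant leaves the workflow."""
    if variant.known_snp:
        return "excluded: known SNP"
    frequency = variant.control_alleles / variant.control_chromosomes
    if frequency >= MAX_CONTROL_FREQUENCY:
        return "excluded: common in controls"
    if variant.predicted_deleterious is False:
        return "low priority: predicted benign"
    return "candidate: verify by Sanger sequencing and test segregation"


# The CNGB1 variant is reported as absent from 720 control chromosomes
# and predicted to be deleterious.
cngb1 = Variant("CNGB1 c.2957A>T", known_snp=False, control_alleles=0,
                control_chromosomes=720, predicted_deleterious=True)
print(triage(cngb1))
```

A variant that survives every step is only a candidate; the text still requires Sanger verification and, where relatives are available, segregation analysis.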

Results

We performed array-based enrichment of 45 RP genes and subsequent high-throughput DNA sequencing and report the detection of known RP mutations and a novel recessive mutation in CNGB1 (MIM *600724; c.2957A→T; p.N986I) in a patient with simplex RP. Two patients with an established genetic diagnosis12 but different patterns of molecular involvement, one homozygote (patient RP148: PDE6B; MIM +180072; c.1685G→A; p.G562D) and one compound heterozygote (patient RP179: CRB1; MIM +604210; c.2129A→T; p.E710V and c.2548G→A; p.G850S), were included to demonstrate the ability of our approach to detect known mutations. Three patients without a genetic diagnosis were selected because they were the only affected member in their family (simplex RP) and conventional genotyping and resequencing arrays had failed to detect mutations.12 Such cases present the greatest challenge, and these patients were chosen deliberately to assess the ability of targeted sequence capture and high-throughput DNA sequencing to detect novel mutations in the known RP genes.

A custom sequence capture array was designed to capture all exons and 100 bp of flanking sequence from a total of 45 genes in which mutations have been reported to cause RP or Leber congenital amaurosis (table 1). When the captured DNA was subjected to high-throughput DNA sequencing, ∼35% of the reads aligned uniquely to the reference sequences, indicating a >3000 fold enrichment of the targeted sequences. One sample was sequenced in a single flow cell and resulted in high fold coverage of almost the entire targeted region, with a mean coverage of 486 reads and >20-fold coverage over more than 99% of the target region (figure 2). The four remaining samples were labelled with 4 bp tags and pooled before sequencing. Although the coverage was reduced to an average of 98 reads, ∼94% and 88% of the target region was covered by at least 10 or 20 reads, respectively (figure 2), demonstrating that sample pooling is a feasible approach to reducing sequencing costs.
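The enrichment figure quoted above can be checked with a back-of-the-envelope calculation: without enrichment, the fraction of reads expected to fall on the target equals the target's share of the genome. The sketch below assumes a haploid genome size of about 3.1 Gb, which is not stated in the article, and the function name is illustrative; the authors' exact calculation is not given.

```python
def fold_enrichment(on_target_fraction: float, target_bp: int, genome_bp: float) -> float:
    """Observed on-target fraction divided by the fraction expected by chance."""
    expected_fraction = target_bp / genome_bp
    return on_target_fraction / expected_fraction


GENOME_BP = 3.1e9    # assumption: approximate haploid human genome size
TARGET_BP = 359_000  # 359 kb target region, from the Methods
ON_TARGET = 0.35     # ~35% of reads aligned to the reference sequences

fold = fold_enrichment(ON_TARGET, TARGET_BP, GENOME_BP)
print(f"approximately {fold:.0f}-fold enrichment")
```

Under this assumption the result is a little over 3000-fold, in line with the reported value.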

Figure 2

Sequence coverage of targeted regions. The graph shows the ‘completeness’ of total coverage—that is, the percentage of the targeted sequences that are represented at different minimum fold coverages. The solid line denotes the sample analysed in a single lane of a Genome Analyser flow cell, and the dashed line the four samples multiplexed together (mean values±SD).
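The 'completeness' measure plotted in figure 2 can be computed from per-base read depths as sketched below. The depth values are a toy example invented for illustration, not data from the study, and the `completeness` function name is an assumption.

```python
from typing import Sequence


def completeness(depths: Sequence[int], min_fold: int) -> float:
    """Percentage of targeted positions covered by at least `min_fold` reads."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for depth in depths if depth >= min_fold)
    return 100.0 * covered / len(depths)


# Hypothetical per-base depths for a ten-base target region.
toy_depths = [0, 4, 12, 18, 25, 30, 41, 55, 60, 98]
for threshold in (1, 10, 20):
    print(f">={threshold}-fold: {completeness(toy_depths, threshold):.1f}%")
```

Evaluating the function over a range of thresholds yields a curve of the kind shown in the figure.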

Applying a requirement for at least five reads and a minimum variant frequency of 30%, 582 single-nucleotide variants were detected, of which 150 were novel variants and 432 were previously reported SNPs. Of the novel variants, 113 occurred in non-coding and 37 in coding sequences (13 of which were non-synonymous), and none affected canonical splice sites. The non-synonymous sequence variants detected included the known homozygous and compound heterozygous mutations, and were all confirmed by Sanger sequencing (table 2). The known PDE6B homozygous mutation (c.1685G→A; p.G562D) in patient RP148 was detected in all 134 reads that spanned this position (figure 3). For the known CRB1 heterozygous variants, 76 of 135 reads (c.2129A→T; p.E710V) and 78 of 130 reads (c.2548G→A; p.G850S) carried the mutated sequence in patient RP179 (figure 3).
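The read counts above show how the fraction of variant reads separates homozygous from heterozygous changes. The following sketch makes that explicit; the 0.90 cut-off for a homozygous call is an assumption chosen for illustration and is not a threshold given in the article, and `call_zygosity` is a hypothetical helper.

```python
HOMOZYGOUS_MIN_FRACTION = 0.90    # assumption, not taken from the article
HETEROZYGOUS_MIN_FRACTION = 0.30  # the 30% calling threshold from the Methods


def call_zygosity(variant_reads: int, total_reads: int) -> str:
    """Classify a variant from the fraction of reads that carry it."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    fraction = variant_reads / total_reads
    if fraction >= HOMOZYGOUS_MIN_FRACTION:
        return "homozygous"
    if fraction > HETEROZYGOUS_MIN_FRACTION:
        return "heterozygous"
    return "below calling threshold"


# Read counts reported in the text for the three known mutations.
observations = {
    "PDE6B c.1685G>A": (134, 134),
    "CRB1 c.2129A>T": (76, 135),
    "CRB1 c.2548G>A": (78, 130),
}
for name, (variant_reads, total_reads) in observations.items():
    fraction = variant_reads / total_reads
    print(f"{name}: {fraction:.2f} -> {call_zygosity(variant_reads, total_reads)}")
```

Both CRB1 variants sit well away from the two cut-offs, so the classification here does not depend strongly on the assumed value.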

Table 2

Classification of non-synonymous sequence variants detected by massively parallel DNA sequencing

Figure 3

Detection of previously identified mutations in two patients with retinitis pigmentosa. (A) Detection of PDE6B homozygous mutation c.1685G→A (p.G562D). The target regions (black boxes), across which capture probes were designed, coincide with the exons of the PDE6B gene. The target region for exon 13 is expanded and can be seen to extend 100 bp 5′ from the exon (because of the proximity of exons 13 and 14, the whole 3′ intron is captured). Part of the exon is expanded to show the DNA sequence, and below this the reads aligned at this position are shown in red or green to indicate forward and reverse reads (for legibility, not all aligned reads are shown). All the sequence reads have an A in place of a G in the reference sequence, indicating a homozygous change. This variant is confirmed by Sanger sequencing (bottom) and results in a missense substitution of glycine (GGC) with aspartic acid (GAC). (B) Detection of CRB1 compound heterozygous mutations c.2129A→T; p.E710V and c.2548G→A; p.G850S. The targeted regions are depicted as in (A). However, for both variants approximately half of the reads match the reference sequence, whereas half carry the variant nucleotide, indicating a heterozygous variant.

Two previously reported rare missense variants were detected in RP1 (MIM *603937): c.4250T→C; p.L1417P (reference 19) in patient RP141 and c.1118C→T; p.T373I (reference 19) in patient RP179. These were absent from or present at a frequency of 1%, respectively, in 360 control samples (720 chromosomes), but neither segregated with disease and they were therefore considered to be non-pathogenic. Of the remaining eight non-synonymous variants, five were found to occur in <1% of controls (table 2). Three of these were novel heterozygous changes (ABCA4; MIM *601691; c.3352C→G; p.H1118D: FSCN2; MIM *607643; c.538C→T; p.R180W: USH2A; MIM +608400; c.3123C→A; p.H1041Q), and, because no second potentially pathogenic variant could be found in the respective genes, they were considered unclassified variants. The non-synonymous variant in ROM1 (MIM *180721; c.686G→A; p.R229H) was found in <1% of controls but was a previously reported polymorphism,18 and no mutation was found in the peripherin-RDS gene (MIM *179605).

One novel homozygous missense variant was detected in a simplex patient (RP167; figure 1), in the cyclic nucleotide-gated channel β1 gene (CNGB1), and was absent in controls (table 2, figure 4). The c.2957A→T sequence variant in CNGB1 exon 29 results in the non-conservative substitution (p.N986I) of the amidic amino acid asparagine (AAC: Asn; N) with the aliphatic isoleucine (ATC: Ile; I). Computational analyses (PolyPhen, PMut and SIFT) predicted pathogenicity (table 2), and evolutionary conservation analysis (ClustalW) showed the N986 position to be invariant across species with CNGB1 orthologs (figure 4). When a recessive RP mutation was detected by high-throughput DNA sequencing, the affected patient was counselled about the diagnosis and risk of disease transmission. During this interview, the patient revealed that his parents were second cousins, establishing consanguinity in the pedigree. Available family members were screened using Sanger sequencing to assess segregation, and the heterozygous state of these family members indicates segregation of the mutation with disease (figure 4).
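The relationship between the coding-DNA position and the affected codon can be verified with simple arithmetic. The sketch below assumes that position 1 is the first base of the start codon, as in HGVS coding-DNA numbering; the helper names are illustrative, and the codon table contains only the entries needed here.

```python
from typing import Tuple


def codon_of(cdna_position: int) -> Tuple[int, int]:
    """Return (codon number, position within the codon) for a coding-DNA position."""
    if cdna_position < 1:
        raise ValueError("coding-DNA positions start at 1")
    index = cdna_position - 1
    return index // 3 + 1, index % 3 + 1


def apply_substitution(codon: str, position_in_codon: int, new_base: str) -> str:
    """Replace one base of a three-letter codon (position is 1-based)."""
    bases = list(codon)
    bases[position_in_codon - 1] = new_base
    return "".join(bases)


# Subset of the standard genetic code, limited to the codons used below.
CODON_TABLE = {"AAC": "N", "ATC": "I", "GGC": "G", "GAC": "D"}

codon_number, offset = codon_of(2957)
mutant = apply_substitution("AAC", offset, "T")
print(f"c.2957 lies in codon {codon_number}, base {offset}")
print(f"AAC ({CODON_TABLE['AAC']}) -> {mutant} ({CODON_TABLE[mutant]})")
```

The same arithmetic places c.1685 in codon 562, matching the p.G562D change reported for PDE6B.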

Figure 4

Detection of a novel mutation (c.2957A→T; p.N986I) in cyclic nucleotide gated channel β1 gene (CNGB1; MIM *600724). (A) Alignment of sequence reads to the CNGB1 reference sequence. All of the 134 aligned reads are depicted as lines below the sequence of CNGB1 exon 29. A small number of reads are expanded to show their DNA sequence and the T which varies from the reference (A) in all reads displayed, indicating a homozygous mutation c.2957A→T in exon 29 of CNGB1. (B) Pedigree showing the relatedness of the parents of the index case uncovered following the identification of a recessive mutation. The heterozygous state (+/−) of the other available family members indicates segregation of the mutation with disease. (C) The CNGB1 protein (NP_001288) contains an ion-transport domain (pfam00520) and a cyclic nucleotide-binding domain (CNBD: cd0038; NCBI Conserved Domain Database). The novel mutation results in a non-conservative substitution (p.N986I) of a polar, uncharged, hydrophilic (blue) residue (asparagine: N) for a non-polar, hydrophobic (red) residue (isoleucine: I). ClustalW alignment of CNGB1 orthologs from 10 species, with amino acids coloured according to hydrophobicity, reveals the high level of conservation of this region. The asparagine (N986) is an invariant residue in a key structural domain: the cyclic nucleotide-binding domain (CNBD). CNBDs are present in cAMP- and cGMP-dependent protein kinases and vertebrate cyclic nucleotide-gated ion channels. Closure of cyclic nucleotide-gated channels occurs when the concentration of cGMP decreases in response to the G-protein-coupled cascade initiated by light. The decrease in cGMP is thus transduced into membrane hyperpolarisation. Previously a homozygous missense mutation in exon 30 of the CNGB1 gene (c.2978G→T; p.G993V) was identified as a rare cause of recessive RP in a consanguineous French family20; the residue is indicated (V).
Both of these missense mutations in CNGB1 result in amino acid substitutions (p.G993V; p.N986I) in the CNBD and may affect the binding of cGMP to CNGB1. Modulation of the sensitivity of CNGB1-containing channels is potentially important for light or dark adaptation and can be achieved by phosphorylation of a tyrosine residue (Y983, equivalent to Y1097 in bovine CNGB121). The p.N986I mutation is only three residues from this tyrosine (P) and is likely to disrupt phosphorylation by altering the phosphorylase binding motif or changing the local three-dimensional protein structure. The loss of 170 amino acids of the distal C-terminus caused by the only other reported mutation in CNGB1 (c.3444 + 1G→A)22 23 is indicated by a black line.

Discussion

The potential of targeted sequence capture and high-throughput DNA sequencing to improve genetic diagnosis and counselling and deliver personalised medicine is widely acknowledged. However, applications of this approach as a molecular diagnostic tool have been limited, and most commentators report that a number of technical and economic issues must be solved before this technique can be incorporated into routine clinical care.4 13 15 24 The detection of known and novel RP mutations in this study now establishes DNA pooling and high-throughput DNA sequencing as an effective diagnostic tool for heterogeneous genetic diseases.

Over 40 genes have been identified in RP, and there are insufficient phenotypic differences between patients (figure 1) to indicate which gene is probably mutated and guide molecular diagnostics. Inheritance pattern can direct genetic testing, particularly for X-linked RP in which mutations in RPGR (MIM *312610) and RP2 (MIM *300757) predominate, but the majority of patients have no family history.9 Current molecular testing strategies focus on small sets of genes and apply conventional laboratory techniques based on the relative frequency of mutations and the cost-effectiveness of testing. Commercially available genotyping arrays (Asper Ophthalmics, Tartu, Estonia) will only screen for known mutations based on inheritance pattern.10 Our results show that high-throughput DNA sequencing can detect previously reported mutations in known RP genes regardless of family history or inheritance pattern.

In three patients with simplex RP, conventional genotyping and resequencing arrays10 12 had failed to detect mutations, and high-throughput DNA sequencing was used to look for novel mutations. A novel homozygous mutation was detected in CNGB1 (c.2957A→T) in a patient presumed initially to have simplex RP (figure 1), but who subsequently reported consanguinity in the family (figure 4). The likely functional effects of this novel homozygous variant (figure 4), its absence in controls, its segregation with disease, and consanguinity in the family support its designation as the causative mutation in this patient. CNGB1, which encodes the β subunit of the rod cGMP-gated channel, is a rare cause of recessive RP (RP45; MIM *600724), with only two mutations previously reported.20 22 23 This mutation would not have been identified by conventional testing strategies, which would have excluded CNGB1 because of the extremely low frequency of mutations in this gene. Therefore, high-throughput DNA sequencing can detect novel variants in addition to any known mutations if applied in previously uncharacterised RP cases. In addition to the immediate benefits for genetic counselling of patients with RP, high-throughput DNA sequencing provides the opportunity to identify candidates for gene therapy trials.7 8

A diagnostic molecular test must deliver a reliable, cost-effective mutational screen which can be performed in a clinical laboratory. Perhaps the most important factor for ensuring detection of variants is sufficient coverage. A similar approach for screening of genes involved in RP is being developed by Daiger et al25 to target complete genes and additional candidate genes. By focusing on the exonic regions of known RP genes, we kept our target region to 359 kb, which, although well below the capacity of the capture array, meant that coverage was maximised. The minimal coverage of 15-fold recommended for diagnostic implementation of this technology by Hoischen et al26 was achieved in 99% of the target region. We demonstrated that sufficient coverage (>15-fold in 90% of the target region) was maintained when four samples, each labelled with a 4 bp tag or ‘bar code’, were pooled and sequenced together. Additional cost savings could be achieved by multiplexing of more samples in a first pass screen, and, if any did not meet specified coverage criteria or no pathogenic variant was detected, further sequencing could be performed to increase coverage. Although high-throughput DNA sequencing might have to be outsourced by smaller diagnostic laboratories, the data analysis process performed by our group could be readily adopted by any laboratory.
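The trade-off between multiplexing and coverage discussed above can be explored with a rough estimate. The sketch below assumes that reads are shared equally between pooled samples and spread evenly over the target, which real capture experiments only approximate. The lane yield of 10 million reads is an illustrative input rather than a measured value, and both function names are hypothetical.

```python
def expected_mean_coverage(lane_reads: int, on_target_fraction: float,
                           read_length_bp: int, target_bp: int,
                           samples_per_lane: int) -> float:
    """Mean per-sample coverage of the target, assuming equal pooling."""
    on_target_bases = lane_reads * on_target_fraction * read_length_bp
    return on_target_bases / (target_bp * samples_per_lane)


def max_samples_per_lane(lane_reads: int, on_target_fraction: float,
                         read_length_bp: int, target_bp: int,
                         min_mean_coverage: float) -> int:
    """Largest pool size whose expected mean coverage stays above the minimum."""
    single = expected_mean_coverage(lane_reads, on_target_fraction,
                                    read_length_bp, target_bp, 1)
    return int(single // min_mean_coverage)


LANE_READS = 10_000_000  # illustrative lane yield (assumption)
ON_TARGET = 0.35         # ~35% of reads on target, from the Results
READ_LENGTH = 32         # bp remaining after removal of the 4 bp tag
TARGET_BP = 359_000      # 359 kb target region

single = expected_mean_coverage(LANE_READS, ON_TARGET, READ_LENGTH, TARGET_BP, 1)
pooled = expected_mean_coverage(LANE_READS, ON_TARGET, READ_LENGTH, TARGET_BP, 4)
limit = max_samples_per_lane(LANE_READS, ON_TARGET, READ_LENGTH, TARGET_BP, 15)
print(f"single sample: {single:.0f}-fold; four pooled: {pooled:.0f}-fold each")
print(f"pool size keeping mean coverage above 15-fold: {limit}")
```

Mean coverage is an optimistic guide, because the criterion in the text is a minimum fold coverage across the target region; a practical pool size would be chosen from completeness curves such as those in figure 2.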

The sheer length of sequence interrogated by targeted capture and massively parallel sequencing inevitably leads to the detection of many sequence variants, and the assessment of which of these are potentially pathogenic is a significant challenge. In this study we have focused on the ‘low hanging fruit’ represented by missense (or nonsense) variants and assessed pathogenicity from predicted functional effects, frequency in controls and segregation with disease. The successful detection of known and novel mutations justifies the introduction of a mutational screen using this approach. More sophisticated analyses are possible, although if a novel non-coding or synonymous variant does not have an obvious effect upon splicing, then it is difficult to assess its likely functional effect. It would not be feasible to test all new variants in a control population as part of a routine diagnostic service, but the availability of an increasing number of genome sequences (1000 Genomes Project) will make it easier to classify common and rare variants.

The power of high-throughput DNA sequencing is exemplified by studies in which targeted sequencing of all coding regions, termed the ‘exome’, successfully identified mutations in two rare Mendelian disorders,27 28 and whole-genome sequencing detected the genetic basis of Charcot–Marie–Tooth disease.29 Unfortunately, the cost of achieving sufficient coverage and the challenges of data analysis mean that the application of exome and whole-genome sequencing as a clinical diagnostic tool is remote. However, we have demonstrated that targeted sequence capture of a disease-specific cohort and subsequent high-throughput DNA sequencing can be used as a cost-effective genetic diagnostic tool.

Acknowledgments

We thank Clive Wolsley for his assistance with retinal electrophysiology, Gareth McKay for assistance with control DNA samples, and Justin O'Neill for his work in recruiting the families.




  • Funding This research was supported by the Health and Social Care Northern Ireland R&D of the Public Health Agency (project grant RRG 4·43).

  • Competing interests None.

  • Ethics approval This study was conducted with the approval of the Northern Ireland Research Ethics Committee.

  • Provenance and peer review Not commissioned; externally peer reviewed.
