Article Text

High-throughput mutation analysis in patients with a nephronophthisis-associated ciliopathy applying multiplexed barcoded array-based PCR amplification and next-generation sequencing
1. Jan Halbritter1,
2. Katrina Diaz1,
3. Moumita Chaki1,
4. Jonathan D Porath1,
5. Brendan Tarrier2,
6. Clementine Fu1,
7. Jamie L Innis1,
8. Susan J Allen1,
9. Robert H Lyons2,
10. Constantinos J Stefanidis3,
11. Heymut Omran4,
12. Neveen A Soliman5,6,
13. Edgar A Otto1
1. 1Department of Pediatrics, University of Michigan, Ann Arbor, Michigan, USA
2. 2DNA Sequencing Core, University of Michigan, Ann Arbor, Michigan, USA
3. 3Pediatric Nephrology, ‘A. and P. Kyriakou’ Children's Hospital, Athens, Greece
4. 4Department of Pediatrics and Adolescent Medicine, University Hospital Münster, Münster, Germany
5. 5Center of Pediatric Nephrology & Transplantation, Cairo University, Cairo, Egypt
6. 6Egyptian Group for Orphan Renal Diseases (EGORD), Cairo, Egypt
1. Correspondence to Dr Edgar A Otto, Department of Pediatrics, University of Michigan Health System, 8220A MSRB III, 1150 West Medical Center Drive, Ann Arbor, MI 48109-5646, USA; eotto{at}umich.edu

## Abstract

Objective To identify disease-causing mutations within coding regions of 11 known NPHP genes (NPHP1-NPHP11) in a cohort of 192 patients diagnosed with a nephronophthisis-associated ciliopathy, at low cost.

Methods Mutation analysis was carried out using PCR-based 48.48 Access Array microfluidic technology (Fluidigm) with consecutive next-generation sequencing. We applied a 10-fold primer multiplexing approach allowing PCR-based amplification of 475 amplicons (251 exons) for 48 DNA samples simultaneously. After four rounds of amplification followed by indexing all of 192 patient-derived products with different barcodes in a subsequent PCR, 2×100 paired-end sequencing was performed on one lane of a HiSeq2000 instrument (Illumina). Bioinformatics analysis was performed using ‘CLC Genomics Workbench’ software. Potential mutations were confirmed by Sanger sequencing and shown to segregate.

Results Bioinformatics analysis revealed sufficient coverage of 30×for 168/192 (87.5%) DNA samples (median 449×) and of 234 out of 251 targeted coding exons (sensitivity: 93.2%). For proof-of-principle, we analysed 20 known mutations and identified 18 of them in the correct zygosity state (90%). Likewise, we identified pathogenic mutations in 34/192 patients (18%) and discovered 23 novel mutations in the genes NPHP3 (7), NPHP4 (3), IQCB1 (4), CEP290 (7), RPGRIP1L (1), and TMEM67 (1). Additionally, we found 40 different single heterozygous missense variants of unknown significance.

Conclusions We conclude that the combined approach of array-based multiplexed PCR-amplification on a Fluidigm Access Array platform followed by next-generation sequencing is highly cost-efficient and strongly facilitates diagnostic mutation analysis in broadly heterogeneous Mendelian disorders.

• Molecular genetics
• Diagnostics
• Renal Medicine
View Full Text

## Introduction

Nephronophthisis-associated ciliopathies (NPHP-AC) comprise a group of rare autosomal-recessive cystic kidney diseases that include nephronophthisis (NPHP), Senior-Loken syndrome (SLSN), Joubert syndrome (JBTS), and Meckel Gruber syndrome (MKS).1 NPHP-AC are genetically heterogeneous disorders that share a broad phenotypic spectrum as a result of primary cilia, basal body or centrosomal defects.2 In patients with NPHP, renal tubular atrophy, basement membrane disintegration, interstitial fibrosis and cyst formation result in progressive kidney failure during childhood.3 About 15% of patients develop extrarenal manifestations, most notably progressive retinal dystrophy, referred to as SLSN. In patients with JBTS, midbrain-hindbrain malformations and cerebellar vermis hypoplasia/aplasia results in numerous neurological features including developmental delay, intellectual disability, muscle hypotonia, ataxia, oculomotor apraxia, nystagmus and breathing pattern irregularities.4 The most severe manifestation of the NPHP-AC clinical spectrum is seen in fetuses with MKS, a perinatally lethal ciliopathy, characterised by central nervous system malformations (typically occipital encephalocele), bilateral postaxial hexadactyly, hepatobiliary ductal plate malformation and multicystic dysplastic kidneys.5 To date, mutations in 11 genes (NPHP1-NPHP11) have been implicated in patients with isolated NPHP and/or SLSN (NPHP1, INVS, NPHP3, NPHP4, IQCB1, CEP290, GLIS2, RPGRIP1L, NEK8, SDCCAG8, TMEM67) (table 1).6–17 JBTS or MKS results from mutations in a subset of these genes, or from any of one of at least 16 additional disease genes (MKS1, B9D1, B9D2, AHI1, INPP5E, ARL13B, TMEM216, CC2D2A, TTC21B, KIF7, TCTN1, TCTN2, ATXN10, WDR19, CEP41 and TMEM237) most of which have been identified only recently.18–32 The broad heterogeneity in NPHP-AC presents a huge challenge in that it requires the ability to sequence a large number of specific genomic regions (eg, coding exons of many disease genes) across many DNA samples, rendering Sanger sequencing analysis tedious and prohibitively expensive.

Table 1

Genes investigated using Fluidigm 48.48 Access Array amplification and consecutive next-generation sequencing

Fortunately, recent advances in next-generation sequencing (NGS) techniques are now allowing high-throughput targeted resequencing at very low cost per sample.33 For target sequence enrichment, we applied PCR-based amplification using the 48.48 Access Array System (Fluidigm, South San Francisco, California, USA). A workflow of this high-throughput method used for mutation analysis is illustrated in figure 1. This array system with integrated fluidic circuits (IFC) contains micro-PCR chambers and enables nanolitre reaction volumes. It facilitates parallel amplification of 48 amplicons for each of 48 different DNA samples resulting in 2304 different PCR assays (figure 1A). Universal tags are introduced during this first amplification step after adding respective sequences to the 5′-end of each target-specific primer (figure 1A). In a second PCR, performed on a standard PCR cycler (figure 1B), the presence of the universal tags enables the incorporation of unique barcodes to the final amplicon, as well as specific adaptor sequences required by the various NGS platforms (figure 1C). In order to increase throughput and reduce costs per sample further, we applied a multiplex protocol in which 10 primer pairs per single PCR reaction were combined, allowing the amplification of 480 different PCR products per array run for each of 48 DNA samples. We performed mutation analysis for all nephronophthisis genes (NPHP1-11) in 192 patients by combining the high-throughput technique of Fluidigm Access Array amplification (four arrays) with one paired-end (2×100 bases) sequence run on one lane of a flowcell of an Illumina HiSeq2000 NGS instrument. This highly effective economic approach will help to accelerate genetic diagnostics in paediatrics, especially in patients with heterogeneous Mendelian disorders.

Figure 1

Fluidigm Access Array high-throughput mutation analysis workflow. (A) 48.48 Access Array with 48 DNA sample inlets (50 ng genomic DNA (gDNA) per sample) and 48 primer inlets is shown on the top. We applied a multiplex approach by using 10 primer pairs per single primer inlet in order to amplify 48×480=23 040 different amplicons per array. All target-specific forward primers (dark red) and reverse primers (light red) have been tagged with a universal forward (dark blue) or reverse tag (light blue), respectively (designed by Fluidigm). The first amplification on the Fluidigm system results in amplicons containing universal tags on both ends. (B) In a second PCR performed in a standard PCR cycler, Illumina next-generation sequencing (NGS) forward (dark green) and reverse (light green) adaptors are incorporated into the amplicons, together with a unique barcode (yellow) for each of 48 different DNA samples. Subsequently, samples are pooled, purified, quantified and quality control (QC) assessed using a Bioanalyzer instrument (Agilent). (C) Samples are submitted for high-throughput sequencing on a HiSeq2000 Illumina platform performing a paired-end run of 2×100 bases. Fluidigm custom primers are used for sequencing amplicons from both sides (1st and 2nd sequencing primer), and for sequencing the barcode of 10 bases (3rd sequencing read). Note, that we used four Fluidigm access arrays in order to perform mutation analysis in 192 DNA samples for 11 genes (NPHP1-11) on one lane of a HiSeq2000 NGS instrument. Taq: High-Fidelity DNA Polymerase.

## Materials and methods

### Human subjects

We obtained blood samples, pedigrees and clinical information after receiving informed consent (http://www.renalgenes.org). Approval for experiments on humans was obtained from the University of Michigan institutional review board. The diagnosis of NPHP-associated ciliopathy was based on published clinical criteria.34 The cohort of 192 patients with NPHP-AC included 91 (47.5%) patients with NPHP and early onset of end-stage renal disease before age 6 years, 98 (51%) patients with SLSN and three (1.5%) patients with JBTS with kidney involvement. Many of these patients presented with additional extrarenal manifestations, for instance, retinal dystrophy (84), early blindness (Leber congenital amaurosis) (14), heart anomalies (14), situs inversus (10), polydactyly (7), urinary tract malformations (6), liver fibrosis/hepatomegaly (4), facial dysmorphic features (4), cone-shaped epiphysis (3), congenital oculomotor apraxia (3), deafness (3), cerebellar vermis hypoplasia (3), hydrocephalus (2), skeletal malformations (2), pancreatic cysts (2) and microcephaly (2). The total cohort consisted of 20 (10%) familial cases and 172 (90%) sporadic cases. Consanguinity was known to be present in 26 (14%) families. As a first diagnostic step, homozygous deletions of NPHP1 were excluded in all patients by applying a multiplex PCR-based deletion analysis described elsewhere.35

### Primer design and evaluation for the Fluidigm Access Array IFC system

We designed 475 target-specific primer pairs to cover all 251 coding exons of the genes NPHP1-11 according to the guidelines from Fluidigm (Access Array System user guide). The maximal amplicon size was chosen as 200 bp anticipating subsequent NGS paired-end sequence reads of 2×100 bases. This resulted in needing to split 67% of all coding exons, and to design exon-flanking overlapping amplicons. All primer sequences are listed in supplementary table S1 with failed primer pairs (median exon coverage ≤30×) indicated in red. Primer 3 software (http://frodo.wi.mit.edu/primer3/input.htm) was used to design primers with a melting temperature (Tm) between 59°C and 61°C, and an optimal size of 20 (range: 17–27) nucleotides avoiding homopolymers of four or more nucleotides. Universal primer sequences (5′-ACACTCTTTCCCTACACGACGCTCTTCCGATCT-3′) and (5′CTCGGCATTCCTGCTGAACCGCTCTTCCGATCT-3′) required for Illumina MiSeq, GAII or HiSeq2000 compatible amplicon tagging and for downstream indexing were added at the 5′ end to all target-specific forward and reverse primers, respectively (figure 1).

### Target DNA enrichment using the Fluidigm 48.48 Access Array IFC System

Starting with 475 amplicons, covering all 251 coding exons and intron/exon boundaries of the genes NPHP1-11, we generated 48×10-plex primer pools with a final concentration of 1 µM per primer (20 primers/well). Every sample master mix solution (1 per DNA sample) contained 50 ng genomic DNA, 1X FastStart High Fidelity Reaction Buffer with MgCl2, 5% DMSO, dNTP (200 µM each), 1X Access Array loading reagent, and FastStart High Fidelity Enzyme Blend (Roche, Indianapolis, Indiana, USA). DNA containing samples were applied to the 48 ‘sample inlets’, whereas, the 48 different separately prepared 10-plexed primer dilutions were applied to the 48 ‘primer inlets’ on the 48.48 Access Array (figure 1A). The pre-PCR IFC controller AX distributed and combined each of the 48 samples with each of the 48 primer solutions into 2304 separate microreaction chambers on a 2 cm2 area. Subsequently, the access array underwent thermal cycling on an FC1 Cycler with PCR conditions provided in supplementary table S2. PCR products were then harvested as 48 separate 10-plex amplicon pools using the post-PCR IFC controller AX and transferred to a 96-well plate. In a second PCR reaction with 15 cycles, Illumina sequence-specific adaptors and sample barcodes were attached to the different amplicon pools originating from different patients (figure 1B). We indexed 192 total DNA samples by attaching 192 different barcodes after processing four different 48.48 Access Arrays. The primer sequences required for Illumina GAII or HiSeq2000 compatible amplicon tagging are as follows: forward- 5′-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGA-3′ and reverse- 5′-CAAGCAGAAGACGGCATACGAGAT-(BARCODE)-CGGTCTCGGCATTCCTGCTGAAC-3′. Subsequently, all 192 barcoded samples were pooled and submitted for Illumina NGS.

### Next-generation sequencing using the Illumina HiSeq2000 platform

Pooled and indexed PCR products were sequenced on a single lane of an Illumina HiSeq2000 instrument as 2×100 bases paired-end run (v2.5 reagents) following standard Illumina protocols with the following modifications. In order to sequence the Fluidigm-specific barcodes and perform a multiplexed paired-end run, we substituted the Illumina index sequencing primer with the custom Fluidigm-specific Index-SP primer (5′-AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCG-3′). Furthermore, we extended the index read length to eight cycles in order to decipher the 8 bp-long Fluidigm barcodes. For sequencing the first and second read, Illumina common primers PE-Rd1 (5′ACACTCTTTCCCTACACGACGCTCTTCCGATCT-3′) and PE-Rd2 (5′CGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT-3′) were used, respectively (figure 1C).

### Bioinformatics pipeline

A comprehensive flowchart illustrating the bioinformatics pipeline implemented for variant/mutation analysis is shown in figure 2. Sequence reads were separated according to their barcodes by using the CASAVA v1.7 demultiplex.dp script (Illumina) resulting in 30–40 Mbases per barcode. Sequence reads were aligned for each barcode (patient) using ‘CLC Genomics Workbench’ software (CLC-bio, Aarhus, Denmark) to a single reference sequence generated by concatenating the genomic sequences of all 11 known NPHP target genes which were downloaded from NCBI (hg19 build). We annotated all donor and acceptor splice sites of all exons within that reference sequence.

Figure 2

Bioinformatics pipeline flowchart implemented for high-throughput mutation analysis. Acronyms: SNP, single nucleotide polymorphism; DIP, deletion insertion polymorphism.

Parameters used for the alignment algorithm were as follows: maximum window length=11; maximum gap and mismatch count=2; length fraction=50% with a required similarity of 80%; mismatch cost (penalty)=2, indel cost (penalty)=3. Variant calls were obtained using the following filter parameters: minimal central base quality=20, minimal average quality=15, variant frequency ≥20%, and a minimum variant count of 10 reads. The rationale for choosing these variant count and frequency parameters is described in the Results section. The variant analysis included coordinates of obligatory splice sites and all variants predicted to change the amino acid sequence of the protein (missense, nonsense, insertions and deletions). Synonymous variants and single nucleotide polymorphism (dbSNP) (v132) reported in the database with a population allele frequency above 1% were excluded. Variants were ranked by the criteria of whether mutations were most likely truncating the conceptual reading frame (nonsense, frameshift and obligatory splice variants). Missense variants were ranked by evolutionary conservation analysis using web-based programmes predicting the impact of the variants on the encoded protein.

### Sanger sequencing confirmation and segregation analysis

The mutations most likely to be detrimental as identified by NGS were subsequently confirmed by performing Sanger sequencing using original stock DNA samples from the respective patient as a template. Furthermore, missense variants with an allele frequency ≤1% were analysed as well. In all cases where DNA was available (index patient, affected siblings, parents) we performed intrafamilial segregation analysis. In patients in whom only one heterozygous mutation was detected, all exons and flanking intronic sequences of the respective gene(s) were analysed by Sanger sequencing. PCR was performed using a touchdown protocol described previously.36 Sequencing was performed using BigDye Terminator v3.1 Cycle Sequencing Kit on an ABI 3730 XL sequencer (Applied Biosystems). Sequence traces were analysed using Sequencher (v.4.8) software (Gene Codes Corporation, Ann Arbor, Michigan, USA).

### Web-based programmes for variant analysis

Predictions on the possible impact of an amino acid substitution on the chemical change, evolutionary conservation and protein function were obtained by using the web-based PolyPhen2 software programme (http://genetics.bwh.harvard.edu/pph2/).

## Results

### Illumina NGS and mapping statistics

We performed sequencing of one lane of a flow-cell on a HiSeq2000 instrument after targeted amplification of 251 coding exons (475 amplicons) in 192 different indexed patients using the Fluidigm microfluidic platform. This generated a total of 66.9 million 100-base-long reads, of which 62.1 million (93%) mapped to the targeted reference coding exons of the genes NPHP1-11. The average of matched reads per patient was 323 267±114 174 SEM resulting in an average base coverage depth of 522×±185× (median=449×) across 35 640 bases targeted within coding regions (figure 2). Insufficient exon coverage was found in the DNA samples of 24 patients (12.5% failed DNA samples due to DNA quality) with a coverage depth below 30× (figure 3A). The average coverage data of each of 251 exons (NPHP1-11) across all 192 patients analysed is provided in supplementary table S3. The majority of the other patient-derived samples showed a median exon coverage in the range between 300× and 650×. In three cases, a very high coverage depth was found with 968×, 992× and 1032×, respectively (figure 3A). When assessing and comparing the median coverage of all 251 sequenced coding exons across all patients, we found that 17 exons (6.8%) showed a median coverage depth below 30×, which is insufficient in order to consistently call heterozygous mutations when applying a minimal cutoff parameter of 10 variant counts/reads (rational of parameter settings is described in the following section) (figures 2, 3B, online supplementary table S3). The enrichment/coverage profile with 2× median coverage for 98%, 10× coverage for 95.6%, 30× coverage for 93.2%, and 100× coverage for 88.4% of 251 different coding exons is shown in figures 2, 3B.

Figure 3

Coverage depth statistics after Fluidigm 48.48 Access Array amplification and next-generation sequencing of all coding exons of NPHP1-11 in 192 samples. (A) Median sequence coverage (y-axis) for each of 192 patients (x-axis) across all exons. Note that insufficient coverage (≤30×) was obtained for 24 different DNA samples. (B) Median sequence coverage (sorted on y-axis) for each of 251 exons (x-axis) across all DNA samples (192 patients). Note that 5 exons showed no coverage, and an additional 12 exons had a coverage depth below 30× (sensitivity 93%). (C) Allele frequency distribution of variant calls of known dbSNP132 variants compared with unknown variant calls. Note that only 0.7% (0.2%+0.5%) of the SNP132 calls show allele frequencies below 20% compared with 64.8% (51.9%+12.9%) of unknown variants. (D) Distribution of variant calls of dbSNP132 compared with unknown variant calls. Note that only 5.5% of dbSNP132 variants showed count numbers below 10 counts compared with 57% of all counts in unknown variants. Allele frequency of ≥20% and variant calls of ≥10 counts were therefore used as filter parameters.

### Variant filtering, validation and parameter setting

Variant SNP and insertion/deletion polymorphism (DIP) calling was performed using ‘CLC Genomics Workbench’ software, and resulted in altogether 13 375 SNP and 10 559 DIP calls. Filter parameters for potential variants/mutations were set after the analysis of the distribution of allele frequency and coverage parameters of all the known dbSNP132 identified in the total dataset assuming that these calls most likely represent ‘true positives’. We found a total of 1733 heterozygous dbSNP132 (exonic and intronic) across the dataset derived from 192 patients. The distribution of allele frequencies of these calls in comparison with the distribution of unknown (non-dbSNP132) variants is shown in figure 3C. Only 24 out of 1733 known dbSNP132 (1.4%) showed an allele frequency below 20% compared with (9430/13 375) 70.5% in non-dbSNP callings. We therefore set the 20% minimal allele frequency as one of the filter parameters, and we consider most (98.6%) of the unknown variants/mutation calls below this threshold as ‘false positives’ (figure 3C). Furthermore, 5.5% of the known dbSNP132 (96 out of 1733) vs 50.5% (6754/13 375) of the unknown (non-dbSNP132) dataset showed counts below 10 (figure 3D). We set 10 counts/reads as the second filter parameter to reduce ‘false positive’ calls further. Applying additional filters described in figure 2 led to the final selection of 95 potential mutations (73 SNPs and 22 DIPs) for Sanger sequencing validation. We were able to confirm 83 out of these 95 variants (87% specificity). The confirmed variants were 12 nonsense mutations, eight essential splice site mutations, 20 frameshift mutations and 43 missense changes (figure 2).

### Detection of previously identified known mutations

For proof-of-principle, we tested 19 DNA samples with 20 different known variants/mutations including deletions, insertions, nonsenses, missenses and splice site changes (see online supplementary table S4). All 20 positive control variants were detected in their correct zygosity state (19 heterozygous, 1 homozygous). Only two mutations/variants would have been missed when applying the established stringent parameters of at least 10 counts and allele frequency above 20%. One of these two variants is the heterozygous CEP290/NPHP6 splice site change (c.3104−2A>G) with seven out of nine counts, and the other variant is the previously published heterozygous NPHP4 missense change (p.Phe91Leu), with an allele frequency of only 14% (see online supplementary table S4). The Fluidigm amplification, and NGS strategy resulted, therefore, in the reliable detection of 18/20 variants/mutations (90% sensitivity).

### Discovery of disease-associated mutations in 34 patients with a NPHP-AC

High-throughput resequencing using the Fluidigm 48.48 Access Arrays and consecutive Illumina NGS led to the identification of mutations in 34 out of the 192 ascertained NPHP-AC patients (18%) in the genes NPHP1 (1), INVS (1), NPHP3 (6), NPHP4 (4), IQCB1 (11), CEP290 (8), RPGRIP1L (1), and TMEM67 (2) (table 2). We identified both mutated alleles in 26 patients (10 homozygous, 16 compound heterozygous), whereas in eight patients, only one potential loss of function mutation (nonsense (2), insertion (1), deletion (2), obligatory splice site (3)) has been found (table 2). Sanger sequencing of all exons together with the exon/intron boundaries did not reveal a second mutated allele in these eight cases. In total, we discovered 23 novel mutations in the genes NPHP3 (p.T136Rfs*13, p.E145Vfs*3, p.K179_E180del, c.2570+1G>T, p.R1125*, p.K1189Nfs*5, c.3812+1G>C), NPHP4 (c.1763+1G>A, p.N613Kfs*5, c.1956−2A>G), IQCB1 (p.C253Sfs*9, p.I301Lfs*42, c.1130−1G>C , p.R445*), CEP290 (p.L32S, p.K90Nfs*6, c.1189+1G>A, p.K663*, p.Q1268*, p.K1484Nfs*4, c.5226+5_8delGTAA), RPGRIP1L (c.1700−1G>A), and TMEM67 (p.D430G) (figure 4). All mutations were absent from 96 healthy control individuals (Human Random Control DNA panel-2, HRC-2) obtained from the European Collection of Cell Cultures (ECACC, Salisbury, UK). In all cases where DNA was available, the mutations segregated with the affected status compatible with a recessive inheritance pattern.

Table 2

Genotypes and phenotypes of 39 patients (34 families) with mutations in NPHP1, INVS, NPHP3, NPHP4, IQCB1, CEP290, RPGRIP1L and TMEM67

Figure 4

Novel mutations (23) detected in NPHP3, NPHP4, IQCB1, CEP290, RPGRIP1L and TMEM67 in individuals with nephronophthisis-associated ciliopathy. Family identifier (underlined), gene name, nucleotide change, deduced amino acid change and reading frame are indicated. Sequence traces are shown for mutations above wild-type controls. All mutations were absent from at least 96 healthy control individuals, and in cases where DNA from relatives was available, segregation analysis was carried out and was compatible with an autosomal-recessive inheritance.

### Single heterozygous variants of unknown significance

In our cohort of 192 patients, high-throughput mutation analysis revealed an additional 40 different single heterozygous missense variants of unknown significance, and an allele frequency below 1% (see online supplementary table S5). We used the frequency data available from the exome variant server (EVS), and found that 21 out of these 40 variants are listed, whereas, 19 are completely novel. Additionally, a high probability for an existing protein damaging effect (score ≥0.8) was predicted for 20 of the variants by using the PolyPhen2 programme (see online supplementary table S5). None of the 40 variants have been described in a public mutation database before. After applying Sanger sequencing for all exons with an insufficient coverage below 30× in the respective NPHP gene and respective patient, we have not found a second mutated recessive allele.

## Discussion

Molecular diagnostics in patients with a heterogeneous disease like nephronophthisis and other ciliopathies is very tedious and expensive when using the standard technique of Sanger sequencing on a capillary sequencer. The development and continued improvements of high-throughput, massively parallel sequencer platforms, and of targeted DNA enrichment methods, now provide the basis and opportunity for a more economical mutation analysis approach.

### Targeted enrichment methods

There are currently several preparative approaches available that allow targeted enrichment and resequencing of specific genomic regions prior to NGS, including oligohybridisation-based techniques (Agilent Sure Select, NimbleGen EZCap), molecular inversion probe ligation sequencing, and PCR-based strategies.43 There are pros and cons to each of these target enrichment modules, and their performances are different regarding costs, ease of use, DNA requirement, sensitivity, specificity, uniformity and reproducibility (reviewed in44). On-array, or in-solution oligohybridisation, is usually applied in cases where whole exome capture and sequencing for a few samples is the goal, whereas, PCR-based strategies are particularly useful for projects with only a small number of genomic regions of interest (eg, all exons of a small number of genes) across a large number of DNA samples. Furthermore, PCR has the advantage in that NGS adaptors, as well as specific barcodes, can be easily attached, avoiding expensive downstream library preparations.

### High-throughput PCR-based resequencing platforms

There are several PCR-based amplicon sequencing platforms commercially available. One of these is the Thunderstorm system from Raindance, which uses emulsion PCR to generate picolitre droplets as vessels for PCR amplification, and allows up to 96 samples, and between 500 and 20 000 amplicons to be processed in a two-day run. Another platform which we used in this study, is the 48.48 Access Array IFC microfluidic system from Fluidigm, which enables the amplification of 48 different DNA samples in combination with each of 48 target-specific primer pairs, resulting in 2304 individual PCR reactions in parallel. Both systems require high up-front costs for the instruments (approximate cost for Fluidigm US$79 000, and for Raindance, US$225 000).44 Non-reusable chips are in the range of US$350 each from both companies. A primer design fee of US$12 000, which covers 500 amplicons, is charged by Raindance, with a US$12 charge for every additional primer pair. For the Fluidigm system, all oligonucleotides can be purchased from any oligo provider, and there are no additional requirements other than the NGS-specific addition of respective tails at the 5′ end. We decided to use the Fluidigm system because of the lower up-front instrument costs and lower oligo costs. In order to reduce costs further, we applied a primer multiplexing protocol (10-plex) which allows to amplify up to 480 primer pairs, rather than only 48, in combination with 48 DNA samples simultaneously on one array (23 040 indexed amplicons). In the present study, all 192 DNA samples were processed using four separate Fluidigm access arrays, subsequently indexed with 192 different barcodes, and then sequenced on one lane on an Illumina HiSeq2000 instrument running paired-end chemistry (2×100 bases). The cost for the entire pilot experiment for 192 samples amounted to approx. US$9250 (US$1400 for the four 48.48 Access Arrays (US$350 each), US$5000 for 475 primer pairs (US$0.10/base), US$2350 for NGS Illumina HiSeq2000 sequencing (1 lane 2×100 bases), and about US$500 for Taq polymerase, barcodes and various consumables). Fortunately, the primers can be used for many additional mutation analyses, reducing the costs further when running more samples. By comparison, the price tag for sequencing all 251 exons of the 11 known NPHP genes in 192 patients (48 192 amplicons) using Sanger sequencing (one strand only) on an ABI capillary sequencer would amount to about US$195 000 assuming US$4 total cost per Sanger sequence.

### Limitations of the Fluidigm/NGS technology for mutation analysis

Although the multiplex Fluidigm/NGS mutation analysis approach is about 50 times less expensive compared with Sanger sequencing (when disregarding primer costs required for both methods), there are still some limitations to be outlined. NGS platforms generate lots of sequencing base call errors, and sequence quality scores decline with read length. PCR artefacts, especially shorter products generated with unspecific primers present within the multiplex pools, can mimic false variants after alignment. Therefore, in order to rule out false positives, it is crucial to always confirm potential mutations by Sanger sequencing using the original stock DNA sample as template. The vast sequence data generation of NGS requires bioinformatics and data analysis software for sequence alignment to a reference, and algorithms for mutant variant identification. NGS generates so many potential variants that these have to be prioritised, and cutoffs have to be established in order to reduce the downstream Sanger sequencing confirmation process. Unfortunately, this will most likely result in some missed mutations. We established cutoffs for potential mutant variants after analysing the allele frequency of heterozygous-known exonic dbSNP132 in our dataset, assuming that these changes are ‘true positives’. We found that 99% (1709/1733) of these ‘true SNP132’ showed an allele frequency above 20%. We applied this cutoff together with a minimum coverage of ≥30× to identify potential mutations, avoid false positives, and reduce the need for extensive Sanger sequencing. We were able to confirm 83 out of 95 in silico selected potential mutations by Sanger sequencing giving a specificity of 87%. In cases where the variants have been published earlier, or are present in mutation databases (eg, BIOBASE, http://www.biobase-international.com/), we confirmed these variants/mutations regardless of count, coverage or percentage of allele frequency values. Another limitation of NGS data is the difficulty of some analysis software to detect larger insertions or deletions of more than seven bases. Different programmes and algorithms are available to tackle this problem and to identify such variants. Besides the problem with false positives, other drawbacks to the Fluidigm/NGS approach include occasional insufficient coverage due to failed exon amplification, or because of the failure of some low-quality DNA samples.

### Coverage of targeted coding exons of 11 different genes

The multiplex PCR approach using 10 different primer pairs per PCR reaction proved very robust in the Fluidigm/NGS system starting with PCR reaction volumes in the nanolitre range. About 93% of all targeted exons have been amplified and showed sufficient coverage of at least 30× with a high likelihood of ≥10 counts/reads per heterozygous change. Unfortunately, not all exons amplify easily and sufficiently in a multiplex setting, resulting in low coverage and subsequent mutation detection failures. NGS analysis revealed 5 exons with virtually no coverage across all patients and 12 low-coverage exons with a median coverage below 30× out of all 251 targeted exons. The resulting low or zero coverage was most likely due to a high GC-content as, for example, seen in exon 1 of the NPHP3 gene. Some of the GC-rich exons seem to require special PCR conditions, and therefore, should be amplified separately in a conventional PCR setting prior to NGS. Other suggestions to raise the coverage for low-performing exons are as follows: redesigning of respective exon specific primer pairs, splitting exons into two or more overlapping amplicons with specific primers localising to unique exonic regions, adding nested primers to the multiplex pool in order to generate additional specific templates during PCR, or lowering the fold of primer-plexing per PCR reaction. Furthermore, besides the failure of certain exons, we found that in 24 DNA samples, almost no sequence results could be obtained. When assessing these DNA samples, we found that low DNA quality (degraded and low concentration) was only present in six of these samples. For the other samples, we speculate that the failure was due to a partial failure of one of the 48.48 Access Arrays in which 12 DNA samples of ‘good’ quality were processed. Whether the array itself was defective, or pipette handling errors occurred, and air bubbles were accidentally introduced into the array system, cannot be determined with certainty.

### Mutations identified in patients with a nephronophthisis-associated ciliopathy

High-throughput mutation analysis of 11 different NPHP genes in a large cohort of 192 patients using the Fluidigm/NGS system led to the identification of 54 mutated alleles found in 34 unrelated patients/families. We detected homozygous mutations in 10 patients/families, compound heterozygous mutations in another 16 patients/families, and single heterozygous truncating mutations in eight patients/families. Lack of finding causative mutations in 158 patients indicates the presence of further heterogeneity in patients with an NPHP-associated ciliopathy. For these remaining unsolved cases, we are planning to extend the spectrum of genes to be analysed by high-throughput mutation analysis, and to include additional genes known to be implicated in JBTS and MKS as for example, AHI1, ARL13B, CC2D2A, INPP5E, TCTN1, TCTN2, ATXN10 and MKS1.

As we have described before, there is a strong phenotype-genotype correlation for patients with mutations in NPHP5/IQCB1.11 In the present study, all 14 patients (11 families) in whom we detected NPHP5/IQCB1 mutations presented with retinal dystrophy or Leber congenital amaurosis and early blindness, together with cystic kidney disease (SLSN). Besides NPHP5/IQCB1, the gene most frequently implicated in SLSN, we similarly detected mutations in NPHP4, CEP290 and TMEM67 in those patients with additional eye involvement.

As mentioned above, we have found heterozygous deleterious mutations in eight patients without detecting a second mutated allele to fully explain a recessive disease phenotype, although we have Sanger-sequenced all exons of the respective genes. The consequence of focusing on sequencing coding exons only will result in missed mutations localised within introns, enhancers, promoters and 3′ or 5′ untranslated regions. We also have to consider the possibility that some of these truncating mutations, although rare, might have been found by chance in concordance with their frequency seen in the healthy population. Using the exome capture and NGS data derived from over 5000 individuals deposited in the Exome Variant Server database (EVS, http://evs.gs.washington.edu/EVS/), we calculated that one heterozygous severe truncating mutation within an NPHP gene is expected to be present by chance for every 100 individuals. This number is obviously even bigger for missense variants. In total, we identified 40 different single heterozygous missense variants of unknown significance in the genes NPHP1-11. The circumstance that no second mutated allele has been found in these cases indicates that most of these variants possibly will be random findings without relevance to kidney disease. The EVS server lists 736 different missense variants with a frequency ≤1% in over 5000 individuals across all 11 NPHP genes. Therefore, we expect to find about 28 different missense variants by chance in these genes in a cohort of 192 individuals. Whether some of these variants contribute to the disease phenotype, and whether a second mutated allele might be overlooked, cannot be decided without extensive further investigation.

In our cohort, the high-throughput Fluidigm/NGS approach led to the identification of 36 different mutations, 23 of which were novel findings in the genes NPHP3 (7), NPHP4 (3), NPHP5/IQCB1 (4), NPHP6/CEP290 (7), NPHP8/RPGRIP1L (1) and NPHP11/TMEM67 (1). This contributes a significant amount of novel changes to public databases, especially for mutations in NPHP3 and NPHP5/IQCB1 for which ‘Biobase’ (released 30 March 2012) currently lists only 31 different NPHP3 and 19 different NPHP5/IQCB1 mutations.

In patients in whom we have not found any mutations in NPHP1-11, we are planning mutation analysis of all known JBTS and Meckel–Gruber syndrome genes using the Fluidigm/NGS strategy. In cases where no mutations have been found we consider a whole exome capture and NGS approach.

### Future improvements of the Fluidigm/NGS mutation analysis approach

Recent protocol improvements supplied by Fluidigm enable bidirectional amplicon tagging which allows unidirectional NGS sequencing and avoiding the necessity of paired-end sequencing cost. Another huge improvement is the opportunity to perform reliable runs of 150 bases on an Illumina MiSeq or on an upgraded Illumina HiSeq2000 platform which will be the basis to sequence amplicons with a length of up to 300 bases so that most exons can be amplified with one primer pair only. This will significantly reduce oligo costs and will help to almost double the amount of exons which can be processed per array. Additional cost reductions might result after increasing the number of primer pairs for multiplexing further or by running more than 192 samples per NGS lane; however, lower-sequence coverage must be anticipated. The HiSeq2000 platform sequence output of only 62 million reads was exceptionally low in our pilot project. Usually, the output is in the range of 200–300 million reads per lane and, therefore, we expect better coverage and performance of our Fluidigm/NGS strategy in future experiments.

In summary, we showed that multiplex PCR amplification on a Fluidigm microfluidic array combined with Illumina NGS is a robust and very cost-efficient mutation analysis method easily adaptable to various resequencing applications.

## Acknowledgments

The authors sincerely thank the affected individuals and their families for participation, and we thank the physicians who contributed to this study. This work was supported by a grant from the National Institutes of Health to EO (RC4-DK090917). We thank all the physicians and researchers of the ‘Gesellschaft für pädiatrische Nephrologie (GPN)’ study group for participation.

View Abstract

## Footnotes

• JH and KD contributed equally

• Contributors JH and KD performed 48.48 Access Array amplification, sample indexing, NGS data analysis and Sanger sequence confirmation. BT and RHL performed large-scale sequencing on an Illumina HiSeq2000 analyser and helped with the bioinformatics analysis. SJA, MC, JDP, CF and JLI performed DNA extraction, Sanger mutation analysis and segregation analysis. CJS, HO and NHS recruited patients and gathered detailed clinical information for the study. EAO conceived and directed the project, analysed and evaluated all data, and wrote the paper.

• Competing interests None.

• Patient consent Obtained.

• Ethics approval Ethics approval was granted by the University of Michigan Institutional Review Board.

• Provenance and peer review Not commissioned; externally peer reviewed.

## Request Permissions

If you wish to reuse any or all of this article please use the link below which will take you to the Copyright Clearance Center’s RightsLink service. You will be able to get a quick price and instant permission to reuse the content in many different ways.