Comprehensive genomic analysis of PKHD1 mutations in ARPKD cohorts
A M Sharp1, L M Messiaen2, G Page3, C Antignac4, M-C Gubler5, L F Onuchic6, S Somlo7, G G Germino8, L M Guay-Woodford1

  1. Department of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
  2. Department of Genetics, University of Alabama at Birmingham, Birmingham, AL, USA
  3. Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
  4. Department of Genetics, Hopital Necker-Enfants Malades, Paris, France
  5. INSERM U574, Hopital Necker-Enfants Malades, Paris, France
  6. Department of Medicine, University of São Paulo School of Medicine, São Paulo, Brazil
  7. Departments of Internal Medicine and Genetics, Yale University School of Medicine, New Haven, CT, USA
  8. Departments of Medicine and Genetics, Johns Hopkins University School of Medicine, Baltimore, MD, USA

Correspondence to: Dr Lisa M Guay-Woodford, Division of Genetic and Translational Medicine, University of Alabama at Birmingham, 740 Kaul Human Genetics Building, 720 20th Street South, Birmingham, AL, USA

Autosomal recessive polycystic kidney disease (ARPKD; MIM 263200) is an important childhood nephropathy, occurring in 1 in 20 000 live births.1 The clinical phenotype is dominated by dilatation of the renal collecting ducts, biliary dysgenesis, and portal tract fibrosis. Affected children often present in utero with enlarged, echogenic kidneys, as well as oligohydramnios secondary to poor urine output. Approximately 30% of affected neonates die shortly after birth as a result of severe pulmonary hypoplasia and secondary respiratory insufficiency. Those who survive the perinatal period express widely variable disease phenotypes with systemic hypertension, renal insufficiency, and portal hypertension due to portal tract fibrosis as the most common clinical features.2

Linkage analysis indicates that mutations in a single locus on chromosome 6p12 are responsible for all typical forms of ARPKD.3,4 Two groups working independently have identified PKHD1 (MIM 606702) as the locus responsible for ARPKD and have demonstrated that this novel gene is among the largest in the human genome, extending over at least 470 kb and including a minimum of 86 exons.5,6 Both PKHD1 and its mouse orthologue (Pkhd1) encode a complex and extensive array of splice variants, with most abundant transcriptional expression in fetal and adult kidney and weaker expression in other tissues including liver and pancreas.5,7 The longest PKHD1 transcript includes 67 exons with an open reading frame (ORF) composed of 66 exons that encode a 4074 amino acid protein, polyductin/fibrocystin.5,6 The full length protein is predicted to have several immunoglobulin-like, plexin, transcription factor (IPT) domains and multiple parallel beta-helix 1 (PbH1) repeats in its approximately 3860 amino acid extracellular amino terminus; a single transmembrane (TM) spanning domain; and a short, cytoplasmic carboxyl terminus with potential phosphorylation sites. Alternatively spliced transcripts are predicted to fall into two broad groups. The first subset, polyductin-M, is comprised of polypeptides that contain the single TM element but vary with respect to inclusion of the other predicted domains. The second subset, polyductin-S, lacks the TM domain and thus its members may be secreted.5 The PKHD1 gene products share structural features with hepatocyte growth factor receptor and plexins, members of a superfamily of proteins involved in regulation of cellular adhesion and repulsion as well as cell proliferation. In addition, recent studies have demonstrated that like other cystoproteins, polyductin/fibrocystin is expressed in the primary apical cilium.6,8–12

Based on the available data, ARPKD appears to result from partial or complete loss of polyductin/fibrocystin function. However, the mechanisms by which PKHD1 mutations cause clinical disease phenotypes are not well understood. Gene based analyses have been complicated by the large gene size and reported mutation detection rates have ranged from 47% to 61%.5,6,13–15 The limited mutation detection rates and the absence of mutational hot spots in PKHD1 have confounded efforts to examine potential genotype-phenotype correlations. These methodological challenges must be overcome before such correlative analyses are revealing and gene based examination is robust enough for clinical diagnostic testing.

Key points

  • Mutations at a single locus, PKHD1, are responsible for all typical forms of autosomal recessive polycystic kidney disease (ARPKD). We have refined previously reported mutation detection strategies and evaluated all 86 predicted PKHD1 exons, including the 67 exons in the longest open reading frame transcript as well as the 19 alternative exons.

  • We have rigorously examined the predicted pathogenicity of amino acid substitutions using the matrix criteria described by Miller and Kumar (Hum Mol Genet 2001;21:2319), as well as potential splice site alterations using the Splice Site Prediction by Neural Network (SSPNN) algorithm.

  • Our mutation detection rate of 82.7% is the best reported to date in an ethnically diverse population with a wide range of ARPKD associated phenotypes.

  • We have re-categorised all reported mutations by numbering the exons according to their genomic order and providing the genomic nucleotide designation to clarify position. This re-compilation provides an essential platform for the robust interpretation of PKHD1 variants identified in the course of prenatal testing or pre-implantation genetic diagnosis. We have submitted this data compilation to the PKHD1 database.

In the current study, we have refined the mutation detection strategies and evaluated all 86 predicted exons, including the 67 exons in the longest ORF transcript as well as the 19 alternative exons. Our mutation detection rate of 82.7% is the best reported to date in an ethnically diverse population with a wide range of ARPKD associated phenotypes. We have examined potential correlations between disease phenotypes and specific mutational mechanisms and/or linear positions along PKHD1. Consistent with previous studies, we found that mutations are distributed along the PKHD1 gene and that patients carrying two potentially chain terminating mutations expressed the severe perinatal phenotype.13–15 We have rigorously examined the predicted pathogenicity of amino acid substitutions and potential splice site alterations in both our dataset and all sequence variants reported to date. These systematic analyses, annotation of missense changes, and characterisation of mutations based on genomic position will provide an essential platform for the robust interpretation of PKHD1 variants identified in the course of prenatal testing or pre-implantation genetic diagnosis.


Patients and samples

A cohort of 59 unrelated families and individuals was ascertained from the databases at the University of Alabama at Birmingham (North American ARPKD Database and prenatal testing database at UAB Molecular Genomics Laboratory). ARPKD was diagnosed according to previously established criteria.16 In addition, 16 affected fetuses from unrelated families with at least one previous ARPKD affected child were identified by haplotype analysis.1 The pregnancies were terminated and ARPKD was confirmed by histopathological analyses.17 Study subjects were ethnically diverse and represented the full spectrum of clinical presentations for ARPKD (table 1). Whole blood, chorionic villus samples, or products of conception were obtained from study subjects and family members under informed consent approved by the University of Alabama at Birmingham Institutional Review Board. Genomic DNA was prepared using standard protocols.15 In addition, DNA samples were obtained from 100 anonymous, unrelated normal individuals.

Table 1

PKHD1 variants in the study cohort

PCR amplification

The 67 exons that compose the transcript (GenBank accession no. AF4800064) with the longest predicted ORF of PKHD1 (GenBank accession no. AY129465) were amplified as a set of 80 amplicons. PCR primers were designed to amplify exon sequences, the adjacent splice sites, and 40–50 nucleotides of flanking sequence on each side as 200–400 bp products. When the exon size was greater than 400 bp, a series of overlapping primers was designed to limit the size of the amplicons. Exons 32, 59, 65, and 71 were amplified in seven, four, four, and two overlapping fragments, respectively. A number of the primers used in this study were described previously15 and the remainder were designed using the Primer3 program (supplemental table 1; supplemental tables 1–4 are available online). Several of these primer sets also amplified the predicted alternative exons 20a, 32a, 32b, 32c, 39a, 39b, 41a, 44a, 51a, 51b, 60a, 66a, and 71a,5 and we designed primers to amplify the remaining alternative exons 38, 38a, 39b, 41a, 62, 63, 64, and 71b (supplemental table 2).
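The splitting of large exons into overlapping amplicons can be sketched as follows. This is an illustration only, not the primer design procedure used in the study: the 45 nucleotide flank is taken from the middle of the stated 40–50 nucleotide range, while the 50 bp minimum overlap, the example exon length of 1000 bp, and the function name are hypothetical.

```python
import math

def tile_amplicons(exon_len, flank=45, max_len=400, min_overlap=50):
    """Return 0-based, half-open (start, end) amplicon coordinates that cover
    an exon plus its flanking sequence with products no longer than max_len.
    flank and min_overlap are illustrative assumptions, not study values."""
    target = exon_len + 2 * flank          # exon plus flank on each side
    if target <= max_len:
        return [(0, target)]               # a single amplicon is enough
    # Smallest number of max_len fragments that keeps the required overlap.
    n = math.ceil((target - min_overlap) / (max_len - min_overlap))
    span = target - max_len                # start position of the last fragment
    starts = [i * span // (n - 1) for i in range(n)]
    return [(s, s + max_len) for s in starts]

# Hypothetical 1000 bp exon: three overlapping 400 bp amplicons.
fragments = tile_amplicons(1000)
print(fragments)
```

Spacing the start positions evenly keeps every product at the same length, which simplifies the choice of a common cycling protocol.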

Table 2

 Truncating variants

Amplifications were performed in an MJ Research Dyad thermocycler using 100 ng genomic DNA in a total reaction volume of 50 µl, incorporating: 1×HotMasterMix (Eppendorf, Hamburg, Germany) and 400 pM of each forward and reverse primer (Integrated DNA Technologies, Coralville, IA, USA). The PCR reaction mixes were subjected to the following “touchdown” thermal cycling protocol: initial denaturation at 95°C for 5 min; followed by 10 cycles at 95°C for 30 s, 65°C to 55°C (−1°C per cycle) for 30 s, 65°C for 30 s; followed by 35 cycles at 95°C for 30 s, 53°C for 30 s, 65°C for 30 s; followed by a final extension for 7 min at 65°C. The PCR reaction volume provided amplification product for denaturing high performance liquid chromatography (DHPLC) analysis as well as for sequencing templates.
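The touchdown protocol can be written out as a per-cycle schedule. The sketch below assumes that the first touchdown cycle anneals at 65°C and that each later cycle is 1°C cooler, so 10 cycles end at 56°C; the total time counts programmed hold times only and ignores ramping between temperatures. The function names are illustrative.

```python
def touchdown_annealing_temps(start_c=65.0, step_c=-1.0, cycles=10):
    """Annealing temperature of each touchdown cycle, assuming the first
    cycle runs at start_c and each subsequent cycle changes by step_c."""
    return [start_c + i * step_c for i in range(cycles)]

def total_hold_seconds(touchdown_cycles=10, fixed_cycles=35, step_s=30,
                       initial_denaturation_s=5 * 60, final_extension_s=7 * 60):
    """Sum of programmed hold times; every cycle has three 30 s steps
    (denaturation, annealing, extension). Ramp times are not included."""
    per_cycle_s = 3 * step_s
    cycles = touchdown_cycles + fixed_cycles
    return initial_denaturation_s + cycles * per_cycle_s + final_extension_s

temps = touchdown_annealing_temps()
print(temps)
print(total_hold_seconds() / 60, "min of hold time")
```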

Mutation detection

Mutation detection was performed by heteroduplex analysis using the Transgenomic WAVE DHPLC system (Transgenomic, Omaha, NE). For each amplicon, the optimal elution gradient for amplicon size and GC content as well as the optimal denaturing temperature were determined according to the WaveMaker version 4.1.44 system control software. Briefly, amplicons were prepared from three normal control samples and injected at five different temperatures: the determined optimal melting temperature, as well as temperatures ±0.5–1.5°C. Where necessary, GC clamps were included in primer sequences to normalise the melting profile of the amplicon and to permit higher injection temperatures. The DHPLC conditions used in these analyses are provided in supplemental table 1.

To enhance heteroduplex formation, PCR products were denatured at 95°C for 5 min and allowed to gradually re-anneal in the thermocycler block using 48 cycles of 1 min each with temperatures stepped down from 93.5°C to 21.5°C (−1.5°C per cycle). Samples were then placed into the DHPLC autosampler and held at 10°C as 5–8 μl aliquots were injected onto the column at the appropriate analytic temperature. The mobile phase consisted of a mixture of buffer A (0.1 M triethylammonium acetate (TEAA)), and buffer B (25% acetonitrile in 0.1 M TEAA) as per the manufacturer’s instructions. Each fragment was eluted with a linear acetonitrile gradient at a flow rate of 0.9 ml/min. A normal control was processed with each batch of patient amplicons, and when possible, amplicons harbouring known mutations were included in the DHPLC analysis. For patients in whom no pathogenic variants were detected by DHPLC, their DNA samples were mixed with wildtype control DNA templates prior to re-amplification and DHPLC analysis. This “sample mixing” enhanced the DHPLC based detection of homozygous sequence variants in the patient DNAs.
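The re-annealing ramp is consistent with 48 steps of 1.5°C. A minimal sketch, assuming that each 1 min cycle is held 1.5°C below the previous temperature so that the final cycle is held at 21.5°C (the function name is illustrative):

```python
def reanneal_ramp(start_c=93.5, end_c=21.5, step_c=1.5):
    """Temperatures of successive 1 min holds, stepping down by step_c
    from just below start_c until end_c is reached."""
    n_steps = round((start_c - end_c) / step_c)
    return [start_c - step_c * i for i in range(1, n_steps + 1)]

ramp = reanneal_ramp()
print(len(ramp), ramp[0], ramp[-1])
```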

Samples exhibiting altered chromatogram patterns or retention times with respect to normal controls were subjected to direct sequence analysis. PCR products were purified using the PCR Product Presequencing Kit (USB, Cleveland, OH, USA) and sequenced in both directions with BigDye Deoxy Terminator cycle sequencing on an ABI 3100 DNA sequencer (Applied Biosystems, Foster City, CA, USA). The primers used for DHPLC were also used as sequencing primers. Sequences were aligned and analysed using SeqMan II version 5.05 (DNASTAR, Madison, WI). All putative mutations were tested by segregation analysis when family material was available. To assess whether missense changes represented potential pathogenic mutations or benign polymorphisms, DNA samples from 100 unrelated normal control subjects were examined using DHPLC and sequence analysis.

The sequence variants were described using the nomenclature reported by den Dunnen and Antonarakis18 and the Human Genome Variation Society website (HGVS; accessed May 31, 2004). Nomenclature of PKHD1 sequence variants was based upon the PKHD1 mRNA sequence (GenBank accession no. AF4800064) with the A of the start codon designated as nucleotide 1. Exon and gDNA nucleotide numbers were reported from the PKHD1 genomic DNA sequence (GenBank accession no. AY129465).
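The numbering convention, with the A of the start codon as nucleotide 1, can be illustrated with a short conversion from a 1-based mRNA position to a coding (c.) coordinate. In this convention there is no nucleotide 0: the base immediately 5′ of the start codon is c.-1. The start codon position used in the example (223) is a hypothetical value chosen for illustration, and the sketch does not handle intronic or 3′ untranslated positions.

```python
def mrna_to_coding(mrna_pos, atg_pos):
    """Convert a 1-based mRNA position to a c. coordinate, where atg_pos is
    the 1-based mRNA position of the A of the start codon (assumed known).
    Positions 5' of the start codon are negative; there is no c.0."""
    offset = mrna_pos - atg_pos
    return offset + 1 if offset >= 0 else offset

ATG_POS = 223  # hypothetical position of the start codon A in the mRNA

for pos in (1, 222, 223, 224):
    print(pos, "-> c.%d" % mrna_to_coding(pos, ATG_POS))
```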

Statistical analyses

Genotype-phenotype comparisons were performed in those patients for whom two putative mutations were identified, using a contingency table and a χ2 significance test. We compared allelic and genotypic frequencies of chain terminating mutations, splice site variants, and amino acid substitutions among our patients with and without perinatal presentation. Analyses were conducted in Excel, SPSS, and SAS.
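For readers unfamiliar with the test, a Pearson χ2 test on a contingency table can be sketched as below. This is not the code used in the study, which relied on Excel, SPSS, and SAS. The counts in the example are invented purely for illustration, no continuity correction is applied, and the p value helper covers only 1 and 2 degrees of freedom, where closed forms exist.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table
    given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def chi_square_p_value(stat, df):
    """Upper-tail probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("only df = 1 or df = 2 are supported in this sketch")

# Invented counts (NOT study data): rows = phenotype, columns = genotype class.
stat, df = chi_square([[10, 20], [20, 10]])
p = chi_square_p_value(stat, df)
print(round(stat, 3), df, round(p, 4))
```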


PKHD1 mutations

We performed a systematic DHPLC based analysis of PKHD1 mutations in a cohort of 75 unrelated ARPKD patients. Samples were screened for all 67 exons comprising the transcript with the longest continuous PKHD1 ORF as well as the 19 predicted alternative exons. We detected a total of 173 variants in the PKHD1 gene, 92 of which have not been previously reported. The sequence variations included 10% frameshifts (50% novel), 3.3% nonsense alterations (100% novel), 8% splicing variants (77% novel), 41% amino acid substitutions (56% novel), 1.7% in-frame insertion/deletion alterations (67% novel), 12% silent exon variants (32% novel), and 24% intronic alterations (60% novel).

The following criteria were applied to predict whether a sequence variant was pathogenic: (i) a potential chain terminating effect on the longest predicted polypeptide; (ii) disruption of a canonical splice site or creation of a novel site; (iii) substitution of an evolutionarily conserved amino acid; (iv) alteration in the polarity or charge of an amino acid; and (v) assessment of the variant frequency in 200 control chromosomes. In total, we identified 124 pathogenic sequence variants among 150 test chromosomes for an overall detection efficiency of 82.7% (table 1). Homozygous PKHD1 mutations were detected by mixing in eight samples. In three additional patients, there was insufficient DNA for mixing analysis. Mutations were identified on both chromosomes in 56 individuals (74.7%) and one mutation was found in 14 samples (18.7%). No disease causing variants were detected in six patients (8%).
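The overall detection efficiency follows from the allele counts, since each of the 75 patients contributes two test chromosomes. As a worked check (the function name is illustrative):

```python
def detection_rate(pathogenic_alleles, patients):
    """Fraction of test chromosomes carrying a putative pathogenic variant,
    counting two chromosomes per patient."""
    chromosomes = 2 * patients
    return pathogenic_alleles / chromosomes

rate = detection_rate(124, 75)
print(f"{rate:.1%}")  # prints 82.7%
```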

Our patient cohort included six patients previously reported in Furu et al15: AL40/21-1, AL39/2-18, AL7/10-16, AL41/92-1, AL31/72-1, and AL33/74-1. In each case we confirmed the previously identified mutation and identified a second putative pathogenic sequence variant. In addition, while Furu et al did not detect any sequence variant in AL3/5-1, mixing analysis revealed that the patient was homozygous for the p.G1971D substitution.

Among the putative mutant alleles, 22 were predicted to cause chain termination with 16 frameshift variants (deletions, insertions, and duplications) and six nonsense mutations (table 2). Of note, c.9689delA was found on 10 alleles in seven patients; three were homozygotes and four were heterozygotes. Because these patients had different ethnic origins, a founder effect is less likely and this deletional event may represent a mutational “hot spot”. Two additional frameshift variants, c.3761_3762delCCinsG and c.5895dupA, were detected in several unrelated individuals from ethnically distinct populations, also suggesting the possibility of mutational hot spots. Of note, we identified the frameshift mutation c.3528dupC only in a control DNA sample while assessing the frequency of another variant in 200 control chromosomes. Given that the carrier frequency for PKHD1 mutations is approximately 1:70 (1.4%) in non-isolated populations19 and given the diverse array of mutant alleles described to date, it has been suggested that the prevalence of any individual mutation would be low in the normal population.20 Therefore, it has been proposed that the absence of a PKHD1 variant in 400 control chromosomes strongly indicates that it is a pathogenic alteration.20 Our detection of this frameshift mutation in one of 200 control chromosomes suggests that, although supportive, a variant frequency below 0.25% is neither a sufficient nor a necessary criterion for categorising a sequence change as a pathogenic alteration.
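A rough calculation illustrates why a pathogenic allele may well appear among control chromosomes. The sketch below is not part of the study's analysis and rests on simplifying assumptions: Hardy-Weinberg proportions, so that the combined frequency of all pathogenic PKHD1 alleles is about half the carrier frequency of 1:70, and control chromosomes treated as independent draws. The function names are illustrative.

```python
def mutant_allele_frequency(carrier_frequency):
    """Approximate combined mutant allele frequency q, assuming
    Hardy-Weinberg proportions and a rare allele (carriers ~ 2q)."""
    return carrier_frequency / 2

def prob_at_least_one(allele_frequency, n_chromosomes):
    """Probability that a sample of n independent chromosomes contains at
    least one allele with the given frequency."""
    return 1 - (1 - allele_frequency) ** n_chromosomes

q = mutant_allele_frequency(1 / 70)
p200 = prob_at_least_one(q, 200)
p400 = prob_at_least_one(q, 400)
print(f"q = {q:.4f}; P(>=1 in 200) = {p200:.2f}; P(>=1 in 400) = {p400:.2f}")
```

Under these assumptions a panel of 200 control chromosomes is more likely than not to include some pathogenic PKHD1 allele, although the chance of seeing any one specific variant remains small when the mutant alleles are highly diverse.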

We analysed all single nucleotide changes for potential splice site effects using Splice Site Prediction by Neural Network (SSPNN). Splice site scores predicted the theoretical impact upon donor and acceptor site strength and the probability that sequence variants created novel splice sites. We found this to be a very useful tool to predict the effect of certain nucleotide changes on splicing, especially as mRNA work to complement our data is not currently feasible. Sixteen variants were predicted to alter splicing (table 3) with alterations in splice site scores generally >50%. Six variants disrupted the canonical GT splice donor or AG splice acceptor and presumably caused exon skipping. For c.7912-1 G→C, disruption of the 100% conserved canonical splice acceptor site had minimal impact (0.37 to <0.3) on the splice site score. However, we note that reductions of predicted splice site strength as low as 0.03 (3%) have been demonstrated to cause exon skipping.21 Four mutations occurred at less conserved positions of the 5′ or 3′ splice sites: c.391-5 A→G, c.5751+3 A→G, c.6121+3_6121+4insT, and c.8302+3 G→C. Six missense changes were predicted to alter splicing, of which two, p.R760H and p.G2809R, altered conserved residues. As we did not have access to kidney mRNA and PKHD1 is expressed at low levels in peripheral blood lymphocytes, the predicted impact of the splice site alterations could not be tested directly. We note, however, that investigations at the transcript level are likely to be problematic given the complex transcriptional profile of PKHD1.5

Table 3

 Splice variants

Of the 71 amino acid substitutions, 30 replaced residues that were conserved in Mus musculus polyductin (NM_153179) and/or predicted to be conserved in polyductin-like proteins from Rattus norvegicus (XM_236979, XM_236984), Gallus gallus, and Macaca fascicularis (table 4). In addition, we applied the matrix criteria described by Miller and Kumar22 to assess the statistical probability of pathogenicity for each amino acid substitution. This matrix, developed using disease associated human genetic variation and interspecific comparisons as well as Graham’s chemical difference matrix, defines the relative likelihood that a missense change represents a polymorphism versus a pathogenic alteration. In our cohort, 24 missense changes were predicted to be pathogenic, and an additional three potentially pathogenic substitutions were identified. The p.Y1136C substitution was previously reported as a putative pathogenic mutation.15 In the current study, this substitution was detected in another patient as well as a control chromosome and was predicted to have a higher pathogenic potential than polymorphic probability. Therefore, as a conservative estimate, 46% of the amino acid substitutions in our cohort were predicted to be pathogenic. We categorised an additional 20 variants as “unclassified” because the change disrupted non-conserved residues or the predicted polymorphic potential was higher than the pathogenic probability. Eight of these sequence variants were also detected in one control chromosome.

Table 4

 Amino acid substitutions

The missense alteration, p.T36M, has been described in each PKHD1 mutation study reported to date (reviewed in Bergmann et al20) and has been proposed to either represent a founder effect, as most of the patients were of Central European origin, or constitute a mutational “hot spot”, perhaps due to methylation induced deamination.23 In the current study, the p.T36M change was identified in 10 unrelated individuals. Unlike the previous studies, these individuals were of diverse ethnic origins including Scotch Irish, French, African American, and Egyptian, as well as Central European. Thus, our analysis provides circumstantial evidence favouring the possibility that p.T36M occurs due to a frequent mutational event.

A second missense change, p.R1624W, was detected in three Saudi patients on five of six chromosomes. This observation may suggest a common founder allele in the Saudi population. However, this allele was not detected in two previously reported Saudi patients15 nor in a fourth Saudi patient (SB137) in the current study. Moreover, the pathogenic potential of this sequence variant is not clear. The missense change disrupts a non-conserved residue but is predicted to have a high pathogenic potential. We were unable to examine the frequency of this variant in control Saudi chromosomes.

Both of the South African patients were homozygous for the p.M627K substitution. These unrelated children were of Afrikaner origin, a population with a higher prevalence of ARPKD24 and other recessively transmitted disorders.25 We are currently examining whether this substitution represents a founder effect in this population, information that could streamline gene based diagnostic testing in at risk Afrikaner children.

Two amino acid substitutions merit further discussion. The missense changes, p.V419S and p.W2749S, were caused by in-frame insertion and deletion mutations (in-frame indel) (table 4). The p.W2749S substitution was identified in a single patient (2-18), whereas the p.V419S variant was detected only in one control chromosome. While these sequence variants disrupted conserved residues, the Miller and Kumar criteria, which apply only to single nucleotide changes, were not informative regarding potential pathogenicity. Therefore, we included the in-frame indel mutations among the unclassified variants. We recognise, however, that the genetic events leading to these variations are likely to be more complex than single nucleotide substitution, thus perhaps increasing the likelihood that these variants are associated with pathogenic effects. Functional studies will be required to examine the pathogenic potential of each of these variants further.

We note that a maximum of two putative mutations were detected in all patients except 33-1 and 74-1, in whom three potentially pathogenic missense changes were identified (table 1). For 33-1, the p.T36M and p.P3602L substitutions have been described in other ARPKD patients and are predicted to have high pathogenic potential. However, the p.M3642I variant is predicted to create a relatively strong acceptor splice site. Functional studies will be required to assess the relative pathogenicity of these three missense variants. For 74-1, the p.I2957T and p.V3440D variants have been reported previously in ARPKD patients and were not found in control chromosomes. Our analysis indicated that p.V3440D involves a substitution of a non-conserved residue but has a higher probability of being a pathogenic alteration than a polymorphism.22 Therefore, we would categorise this missense change as an unclassified variant. In comparison, the p.S2861G variant has a higher probability of being a polymorphism than a pathogenic alteration, but the substitution involves a highly conserved residue and was not detected in 200 control chromosomes. Therefore, the pathogenic potential of this variant in patients 36-6 and 74-1 cannot be excluded. Unfortunately, for both patients 33-1 and 74-1, maternal and paternal DNAs were not available to define the allelic inheritance of these sequence variants.

PKHD1 sequence polymorphisms

In addition to the putative pathogenic changes described above, we also detected 19 single nucleotide polymorphisms (SNPs) that cause amino acid substitutions (table 4). As expected, these missense changes occurred frequently in the normal control population. However, 42% (8/19) involved conserved residues and the Miller and Kumar matrix predicted that these eight variants had a higher pathogenic potential than polymorphic probability. Therefore, such criteria should be interpreted as a guide and pathogenic potential will ultimately need to be assessed by functional studies.

We also detected silent nucleotide changes in both exons and intronic sequences (table 5). Those variants that were common in the control population were designated as polymorphisms. In addition, a number of these variants were detected only in the affected cohort, but no clear pathogenic potential could be ascribed. In the absence of functional studies, we have categorised these variants as probable polymorphisms. While several of these intragenic SNPs have been reported previously (reviewed in Bergmann et al20), many are novel. Taken together, these SNPs can be used in future analyses to define the haplotypes for mutant chromosomes.

Table 5

 Silent and intronic variants

Finally, we identified a number of single nucleotide changes that would be predicted to involve alternative exons. However, without information defining those transcripts that contain alternative exons and their predicted reading frames, we were unable to interpret the potential pathogenic impact of these sequence variants.

Genotype-phenotype correlations

To examine possible genotype-phenotype correlations, we stratified our patient cohort into two groups: the severe group (perinatal) in which the affected child died in the perinatal period; and the less severe group (non-perinatal or older presentation) in which the child either survived the perinatal period or presented at an older age beyond the perinatal period. Haplotype based diagnosis initially identified all the fetal cases and in each family the index child had died in the perinatal period. Therefore, for the purposes of our analyses, we combined the fetal and perinatal cohorts.

We identified both putative mutations in 56 (74.7%) of the 75 patients examined: 36/49 (73.5%) perinatal and 20/26 (76.9%) non-perinatal (table 1). We focused our genotype-phenotype analysis on these 56 patients and characterised the mutations into three classes: (i) chain terminating mutations, including nonsense mutations and frameshifting insertions/deletions; (ii) putative splice site variants; and (iii) amino acid substitutions. We reasoned that chain terminating mutations were likely to be complete loss of function variants. Splice site variants could be associated with chain termination if skipping occurs as an out-of-frame event or with a hypomorphic allele if in-frame skipping removes only a number of amino acids. Amino acid substitutions may be an admixture of hypomorphic alleles of varying degrees of functional loss.
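The three-class scheme can be expressed as a simple lookup that assigns each patient an unordered pair of mutation classes. The variant type labels and function names below are illustrative choices, not identifiers from the study.

```python
# Mapping from variant type to the three mutation classes described above.
MUTATION_CLASS = {
    "nonsense": "chain terminating",
    "frameshift": "chain terminating",
    "splice": "splice site",
    "missense": "amino acid substitution",
}

def genotype(variant_a, variant_b):
    """Unordered pair of mutation classes for a patient's two alleles."""
    return tuple(sorted(MUTATION_CLASS[v] for v in (variant_a, variant_b)))

def has_substitution(genotype_pair):
    """True if at least one allele is an amino acid substitution."""
    return "amino acid substitution" in genotype_pair

pair = genotype("frameshift", "missense")
print(pair, has_substitution(pair))
```

Sorting the pair means that a frameshift/missense patient and a missense/frameshift patient fall into the same cell of the contingency table.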

First, we confirmed that the observed mutational combinations were in Hardy-Weinberg equilibrium, suggesting that there is no obvious bias in the identification of mutations in our patient cohort. Homozygous mutations were detected in 12 patients and the remaining patients were compound heterozygotes. Homozygous chain terminating mutations were identified in four patients with the perinatal phenotype, whereas homozygous amino acid substitutions were found in one perinatal patient and six non-perinatal cases. None of the patients were known to be related. The presence of a common amino acid substitution among the Saudi and Afrikaner patients may be due to population bottlenecks, founder effects, and/or inbreeding.

We then examined the phenotype correlations in patients with various combinations of chain terminating mutations, splice site variants, and amino acid substitutions (table 6). Consistent with previous reports,5,13–15,26 all patients with combinations of predicted chain terminating and splice site variants (n = 13) were in the perinatal cohort. There was a significant difference in the mutational spectra between the perinatal phenotype and the non-perinatal subset, with variant pairings in the perinatal group comprising mainly clearly pathogenic mutations (chain terminating and splicing variants) or combinations of more pathogenic variants with amino acid variants (p = 0.00003 by χ2 test, df = 2). The presence of at least one amino acid substitution was significantly associated with the non-perinatal subset (p = 0.0029 by χ2 test with df = 1; odds ratio 1.81, 95% confidence interval 1.374 to 2.406). However, we cannot definitively conclude that at least one amino acid mutation is required for survival. As in previous studies, the p.T36M variant was detected frequently in our cohort (11 patients), but the distribution of this substitution was not significantly different between the phenotypes.

Table 6

 Mutation type and phenotype in PKHD1

Lastly, we examined whether the position of the putative PKHD1 mutation within the longest ORF correlated with disease phenotype. RT-PCR data from mouse kidney indicate that probes from the 5′ and 3′ ends of the longest transcript reveal different transcriptional profiles (GG Germino, unpublished). Therefore, we partitioned the PKHD1 ORF into three “bins”: exons 1–20 (c.-222 to c.1964); exons 21–37/39–50 (c.1965 to c.7911); and exons 51–61/65–71 (c.7912 to c.12225). Based upon these three bins, there is no significant difference in the distribution of mutations along the PKHD1 ORF by phenotype (p = 0.182, χ2 test, df = 2). Exons 32, 59, and 65 have the highest absolute number of mutations, but these are also some of the largest exons. In contrast, exons 3, 5, and 9 are short exons (78, 109, and 65 bp, respectively) but have significantly higher mutational rates than the rest of the ORF exons (p<0.05, by binomial test). Sequence variants involving 3.67–4.61% of the nucleotides in these exons are predicted to have pathogenic effects. When all sequence variants are taken into account (pathogenic, unclassified, and polymorphic), exons 7 and 22 have significantly higher variant rates. The significance of this observation (for example, potential domain effects) is unclear and will require further study. Finally, we note that no mutations were detected in exons 28, 42, 45, 46, or 66. The pathogenic significance of this observation remains to be determined.
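Assignment of a variant to one of the three positional bins depends only on its coding nucleotide position. A minimal sketch using the bin boundaries given above (the labels and function name are illustrative, and intronic offsets such as c.391-5 are not handled):

```python
# (label, first c. position, last c. position), as defined in the text.
BINS = [
    ("exons 1-20", -222, 1964),
    ("exons 21-37/39-50", 1965, 7911),
    ("exons 51-61/65-71", 7912, 12225),
]

def bin_for_position(c_pos):
    """Return the bin label for a coding (c.) nucleotide position."""
    if c_pos == 0:
        raise ValueError("there is no c.0 in this numbering scheme")
    for label, first, last in BINS:
        if first <= c_pos <= last:
            return label
    raise ValueError(f"c.{c_pos} lies outside the binned ORF")

# c.9689delA, c.3761_3762delCCinsG, and codon 36 (coding nucleotides 106-108).
for c_pos in (9689, 3761, 107):
    print(f"c.{c_pos}: {bin_for_position(c_pos)}")
```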

Analysis of all reported mutations

The PKHD1 gene is predicted to contain a minimum of 86 exons, 71 non-overlapping exons that span the entire length of the gene and 15 alternative exons that use different splice sites. The longest ORF transcript contains 67 of the 71 non-overlapping exons. We designed primers for all 86 predicted exons5 and defined DHPLC protocols to examine each exon (supplemental tables 1 and 2).

Previous mutational studies have either numbered PKHD1 exons sequentially from 1 to 67 according to the longest ORF transcript6,13,14,26 or provided two numerical identifiers per exon, one using the sequential numbering system and the second using the genomic number.5,15 Given the potential pathogenic importance of sequence variants in the alternative exons (this study), we have re-categorised all the mutations reported to date (including the current study) by numbering the exons according to their genomic order and providing the genomic nucleotide designation to clarify position (supplemental table 3). This re-compilation provides a robust template for future mutational reports, particularly those that examine the pathogenic potential of sequence variants observed in alternative exons. We have submitted this data compilation to the PKHD1 database.

We re-analysed all reported missense mutations using the SSPNN website to determine whether previously reported missense changes involving the longest ORF exons could disrupt predicted splice sites (supplemental table 4). The variants c.6865+4 A→T, c.657 C→T, and p.Q1917R have been reported in previous studies and were predicted to cause splice site alterations. Our analysis using the SSPNN scores is consistent with these predictions. In addition, these analyses indicate that p.S1664F, p.S1867N, p.I2303F, p.C2688F, and p.S2983L all have a very high probability of causing aberrant splicing in the PKHD1 ORF. We also determined that two variants in normal controls, p.T2938M (1/200)6 and p.R3107P (1/400),20 were likely to alter splice sites. In contrast, while previous reports have classified the variants p.I222V,6 p.A17V,6 and c.5381−9 T→G14 as putative splice site mutations, our SSPNN scores were not consistent with a significant splice site effect. The pathogenic potential of these missense changes will require further analysis with RNA templates.

We note that 30 variants detected in the studies reported to date would be predicted to have pathogenic effects on the longest ORF but may also affect alternate PKHD1 transcripts, as they involve nine of the alternative exons (supplemental table 3). These 30 variants comprise six frameshift, three splice, four nonsense, and 17 missense variants. The frameshift and splice variants would be responsible for chain termination of transcripts encoded by the longest ORF as well as transcripts containing alternate exons. The nonsense and missense variants may have an effect on alternate transcripts depending on the reading frame. Of the missense variants predicted to occur in alternate transcripts, 10 are classified as pathogenic and seven are unclassified in the longest ORF. Two variants from the unclassified group are silent variations in the longest ORF; however, an altered reading frame may cause a more pathogenic amino acid substitution or, in the case of c.6975 C→T, one reading frame may lead to the formation of a stop codon and subsequent chain termination. Further investigations must clarify the complex transcriptional profile of PKHD1 before the pathogenic potential of sequence variants involving alternative exons can be assessed.

Finally, we re-examined all the reported amino acid substitutions and assessed putative pathogenicity using the Miller and Kumar matrix criteria. Supplemental table 4 provides a compilation of all these missense mutations stratified according to their likely pathogenic effect. These data should provide a useful reference for laboratories performing gene based diagnostic testing.


The current study represents the first comprehensive genomic analysis of PKHD1, a novel gene that includes a minimum of 86 exons.5,6 All previous mutational studies have focused on the 67 exons that comprise the longest ORF transcript and reported mutation detection rates that vary from 47–61% in phenotypically diverse cohorts5,6,13–15 to 85% among patients with the severe perinatal phenotype.26

We have postulated that genomic analysis of all 86 predicted exons should identify putative pathogenic changes among all biologically relevant exons. This information thus should provide a rational starting point to investigate the role of alternative exons among the extensive array of PKHD1 splice variants and perhaps, once the complex transcriptional profile of PKHD1 is better understood, increase the effectiveness of mutation detection strategies.

Mutation detection rate

Our overall detection efficiency among the 67 exons in the longest ORF transcript was 82.7%, the highest reported to date in a phenotypically diverse ARPKD cohort. The detection rates were not statistically different among the three phenotypic groups examined (fetal, perinatal, older presentation). When combined with linkage data indicating that mutations in PKHD1 are causative in all typical cases of ARPKD,3,4 these data suggest that the previous low detection rate among children with older presentations was more likely due to methodological issues than potential genetic heterogeneity.

A total of 319 PKHD1 variants have been reported to date, including the 173 variants from this report, 92 of which were novel. These novel variants account for 28.8% (92/319) of the PKHD1 variants reported to date. Among these novel variants, 44/92 (47.8%) were classified as likely pathogenic changes (splice site alterations, chain termination, and pathogenic amino acid substitutions) and 33/92 (35.9%) were unclassified variants, consisting primarily of amino acid substitutions at non-conserved residues. Four variants, p.D1720E, p.W2736G, p.W2749S (in-frame indel), and p.G2849A, involved substitutions of highly conserved amino acids and were not detected in 200 normal chromosomes. However, the Miller and Kumar matrix predicted these missense changes were likely to be polymorphisms and thus we categorised them as unclassified variants.
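
The proportions quoted in this paragraph can be re-derived from the reported counts. The short check below is only an arithmetic aid; the helper name percent and the constant names are illustrative. Note that 28.8% corresponds to 92/319, that is, the share of all reported variants that were first described here.

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# Counts as stated in the text.
TOTAL_REPORTED = 319      # all PKHD1 variants reported to date
NOVEL = 92                # novel variants from this report
NOVEL_PATHOGENIC = 44     # novel variants classified as likely pathogenic
NOVEL_UNCLASSIFIED = 33   # novel variants left unclassified

share_of_database = percent(NOVEL, TOTAL_REPORTED)
pathogenic_share = percent(NOVEL_PATHOGENIC, NOVEL)
unclassified_share = percent(NOVEL_UNCLASSIFIED, NOVEL)

print(share_of_database, pathogenic_share, unclassified_share)
```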

We did not detect any mutations in 8% of our cohort and we found both mutations in only 15 (78.9%) of the 19 fetal cases, despite the fact that ARPKD was confirmed in all by histopathological analysis. In the former case, these findings may suggest misdiagnosis, but several of these children had affected siblings with pathoanatomically proven ARPKD. The unidentified PKHD1 alterations in these patients could involve mutational events that are not detectable with our exon based method, for example, intronic changes that generate cryptic splice sites or large rearrangements that cause one to multiple exon deletions. Alternatively, these patients may carry alterations in regulatory elements. For example, the proximal promoter of the mouse Pkhd1 gene contains an evolutionarily conserved hepatocyte nuclear factor-1 (HNF-1) binding site and mutations of this site inhibit promoter activity.27 The human PKHD1 promoter region also contains a putative HNF-1 site. When this site was examined in 21 patients using a PCR based strategy, no sequence variants were identified (Guay-Woodford and Igarashi, unpublished data). However, exhaustive analysis of this human PKHD1 promoter region has not been performed and alternative promoters for this transcriptionally complex gene are at least theoretically possible. Finally, missense changes involving alternative exons were identified in several patients. For example, the missense change g.405493A→G was detected in exon 64 in patient 32-1 (data not shown). While only one other putative mutation was identified for this patient, the pathogenic potential of the g.405493A→G variant cannot be assessed in the absence of more detailed information regarding the putative ORF of alternative exon containing transcripts.

Direct gene based testing

Given that ARPKD is an often devastating disease with typical presentation in the perinatal period, there is strong demand for gene based diagnostic testing. However, the complexity of the gene, the variability in mutation detection efficiency, and the high frequency of missense mutations have complicated the development of an efficient clinical test and the robust interpretation of detected sequence variants. To address these issues, we have refined the DHPLC screening protocol by optimising primer design and DHPLC analytic conditions. We have included a template mixing step when no mutation was detected in the first round of screening to optimise the identification of homozygous variants. Without mixing, the detection rate would have been 75.3%. In addition, we have incorporated the matrix criteria described by Miller and Kumar22 and the SSPNN algorithm into our analysis of missense mutations to optimise predictions regarding the pathogenicity of amino acid substitutions and potential splice site effects. We have compiled our missense mutations with all missense changes reported to date and categorised them as pathogenic, unclassified, or polymorphic based on these criteria. This compilation should serve as an important reference source for laboratories performing gene based diagnostic testing.

For newly identified missense changes, assessment of the likely pathogenicity remains problematic. Bergmann et al20 have proposed that the absence of a PKHD1 variant among 400 control chromosomes (<0.25%) is sufficient to conclude that the change is pathogenic. However, we have detected variants from all mutational classes in 1 of 200 control chromosomes (0.5%). Moreover, a few variants with high pathogenic potential have been detected only in normal chromosomes to date.

Therefore, we suggest the following criteria to predict potential pathogenicity of novel mutations: (i) a putative chain terminating effect on the longest predicted polypeptide; (ii) disruption of a canonical splice site or creation of a novel site (SSPNN algorithm); (iii) alteration of an evolutionarily conserved amino acid in the context of the Miller and Kumar matrix; and (iv) detection in <0.5% of normal chromosomes. We propose categorising missense changes that meet one of the first three criteria as pathogenic; those that meet criteria (ii) or (iii) but occur in >0.5% of normal chromosomes as unclassified, that is, pathogenicity cannot be excluded; and those found in >0.5% of normal chromosomes as polymorphisms, particularly if they do not meet any of the first three criteria.
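
The proposed criteria can be read as a simple decision rule. The sketch below is one possible reading, not a validated classifier. The Variant fields and the classify function are hypothetical names. Two points are not specified in the text and are assumptions here: a frequency of exactly 0.5% is treated as not rare, and a rare variant that meets none of criteria (i) to (iii) is returned as unclassified (in line with the unclassified substitutions at non-conserved residues described earlier).

```python
from dataclasses import dataclass


@dataclass
class Variant:
    chain_terminating: bool         # criterion (i)
    splice_site_effect: bool        # criterion (ii), e.g. from a splice predictor
    conserved_residue_change: bool  # criterion (iii), conservation-based assessment
    control_hits: int               # normal chromosomes carrying the variant
    control_chromosomes: int        # normal chromosomes screened


def classify(v):
    """Return 'pathogenic', 'unclassified', or 'polymorphism'."""
    # Criterion (iv): found in fewer than 0.5% (1 in 200) of normal chromosomes.
    # Integer arithmetic avoids floating-point comparisons at the threshold.
    rare = v.control_hits * 200 < v.control_chromosomes
    meets_any = (v.chain_terminating or v.splice_site_effect
                 or v.conserved_residue_change)
    if rare:
        # Assumption: rare variants meeting none of (i)-(iii) stay unclassified.
        return "pathogenic" if meets_any else "unclassified"
    if v.splice_site_effect or v.conserved_residue_change:
        # Meets (ii) or (iii) but is not rare: pathogenicity cannot be excluded.
        return "unclassified"
    return "polymorphism"
```

For example, a variant seen in 1 of 400 control chromosomes (0.25%) that alters a conserved residue would be returned as pathogenic, whereas the same change seen in 3 of 200 control chromosomes would be returned as unclassified.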

Finally, we recognise that despite our stratification strategies, missense mutations may pose challenges for clinical diagnostic laboratories engaged in gene based testing, particularly when the data are sought to guide prenatal diagnosis or pre-implantation genetic diagnosis. Therefore, when possible in these cases, we recommend a combined approach using haplotype analysis to complement PKHD1 mutation screening.

Genotype-phenotype correlations

In our previous study,15 we used a binary assessment of clinical presentation based on whether or not the affected children survived the immediate perinatal period. This endpoint was chosen because it was reliably available for all our patient samples. We tested the hypothesis that differences in the clinical severity observed among the patients resulted, at least in part, from the nature of the germ-line mutations. These analyses indicated that the presence of two chain terminating mutations invariably resulted in perinatal lethality and survival past the immediate perinatal period required the presence of at least one amino acid substitution mutation. However, the converse did not apply, that is, perinatal lethality could occur in the presence of an apparent missense variant.

The current study focused on those 56 patients in whom we had identified two putative mutations. We applied the same phenotype stratification and examined correlations with various combinations of chain terminating mutations, splice site variants, and amino acid substitutions. In addition, we examined whether the linear position of putative PKHD1 mutations correlated with disease phenotype. Our experimental design has advantages over previous analyses because we did not combine data from several studies, we did not have a high frequency of sequence variants attributable to founder effects, and we did not include a mixed group of individuals with one and two identified mutations. As a result, we lowered the risk of biased results due to differential discovery rates for different mutational types. In addition, because all patients were from a single study and genotype information is complete, we had a higher probability of more accurately assessing genotype-phenotype correlations. That said, as in previous studies, all our patients with combinations of predicted chain terminating and splice site variants were in the perinatal cohort. The presence of at least one amino acid substitution was significantly higher in the non-perinatal subset. However, even this study had limited power and we cannot definitively conclude that at least one amino acid mutation is required for survival. In contrast, our analysis did not confirm previous observations15 suggesting a significant correlation between the less severe phenotype and the distribution of mutations along the PKHD1 ORF. In fact, we conclude that neither the location nor the type of amino acid substitution correlates with disease severity. Of specific note, our two patients with isolated congenital hepatic fibrosis had mutations that were not unique in either type or position.

Future directions for mutation testing

Our study demonstrates that DHPLC provides an efficient and economical screening approach for a gene of the size and complexity of PKHD1. However, we acknowledge that a PCR based methodology is subject to various limitations. For example, PCR will not detect genomic rearrangements involving deletions or duplications of a few kilobases. Two recently described techniques, multiplex amplification and probe hybridisation (MAPH) and multiplex ligation dependent probe amplification (MLPA), allow detection of such mid-size rearrangements by simultaneously screening for the loss or duplication of up to 40 target sequences.28 Both methods rely on sequence specific probe hybridisation to genomic DNA, followed by amplification of the hybridised probe, and semi-quantitative analysis of the resulting PCR products. We speculate that a subset of ARPKD patients in whom ⩽1 sequence variant was detected may carry such mid-size deletions or duplications and we are examining these patients further. In addition, we note that PCR based mutation detection methods may be adversely affected by intronic variations that do not themselves have pathogenic potential but compromise the binding of PCR primers.29 Given the frequency of intronic variants reported in PKHD1 (approximately 17%), it is at least theoretically possible that non-amplification of a mutant allele due to primer binding site variants may contribute to a reduced mutation detection rate in patients with strong clinical evidence for ARPKD.
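
The semi-quantitative step shared by MAPH and MLPA can be sketched as a dosage ratio: each target probe signal is normalised to reference probes within the same sample and then compared with the same quantity in a control sample. The code below is a minimal illustration under stated assumptions, not a protocol from this study. The function names, the probe names, and the 0.7 and 1.3 cut-offs are assumptions chosen for illustration; a heterozygous deletion is expected to give a ratio near 0.5 and a heterozygous duplication a ratio near 1.5.

```python
def dosage_ratios(sample_peaks, control_peaks, reference_probes):
    """Normalised dosage ratio for each non-reference probe.

    `sample_peaks` and `control_peaks` map probe names to peak signals;
    `reference_probes` names probes assumed to be present in two copies.
    """
    sample_ref = sum(sample_peaks[p] for p in reference_probes)
    control_ref = sum(control_peaks[p] for p in reference_probes)
    ratios = {}
    for probe, peak in sample_peaks.items():
        if probe in reference_probes:
            continue
        sample_norm = peak / sample_ref
        control_norm = control_peaks[probe] / control_ref
        ratios[probe] = sample_norm / control_norm
    return ratios


def call_copy_number(ratio, loss_below=0.7, gain_above=1.3):
    """Flag a probe as a possible deletion or duplication (illustrative cut-offs)."""
    if ratio < loss_below:
        return "possible deletion"
    if ratio > gain_above:
        return "possible duplication"
    return "normal"


# Hypothetical peak signals, for illustration only.
sample = {"ref1": 1000, "ref2": 1000, "exon_A": 500, "exon_B": 1000}
control = {"ref1": 1000, "ref2": 1000, "exon_A": 1000, "exon_B": 1000}
ratios = dosage_ratios(sample, control, {"ref1", "ref2"})
calls = {probe: call_copy_number(r) for probe, r in ratios.items()}
```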

Finally, we have considered RNA based methodologies, such as RT-PCR30 and protein truncation testing,31 as complementary strategies to enhance the identification of pathogenic variants in PKHD1. These approaches have proven to be quite robust in mutational analyses of transcriptionally complex genes such as NF1, increasing the detection efficiency rate to 95%.21 However, several factors complicate the application of these methodologies to PKHD1 testing. First, unlike neurofibromatosis (NF1) and hereditary breast cancer (BRCA1), ARPKD is transmitted as a recessive trait and detection of both mutations is required for definitive diagnosis. Second, PKHD1 is expressed at very low levels in peripheral blood lymphocytes (see Onuchic et al5 and unpublished data) and neither kidney tissue nor renal epithelial cells are generally available for clinical testing. Third, in mouse Pkhd1, the complex array of splice variants appears to vary, at least in part, in an organ specific fashion (Guay-Woodford, unpublished data). If the same holds true for PKHD1, it will be difficult to assess the biological impact of variants detected in templates from non-phenotypically affected organs.

We propose that the next major advance in PKHD1 mutation detection efficiency will be predicated on an exhaustive examination of the complex transcriptional profile of PKHD1. Moreover, comparative analysis of mRNA processing will be required to determine whether pathogenic variants are represented among the array of transcripts in more clinically accessible tissues such as peripheral blood lymphocytes, amniocytes, and chorionic villus cells. However, in the interim, we believe that DHPLC provides an acceptable screening tool for the detection of PKHD1 variants. Our data and those recently reported by Bergmann et al26 provide an appropriate platform to begin offering gene based diagnostic testing in prenatal cases, patients with unusual clinical presentations, and pre-implantation assessment of early stage embryos.


The authors thank the patients and families who were involved in these studies.


Supplementary materials

  • The tables are available as a downloadable PDF (printer friendly file).


    Files in this Data Supplement:

    • Table 1: PKHD1 Primers and DHPLC Conditions
    • Table 2: Primers for Alternate PKHD1 Exons
    • Table 3: PKHD1 Variant Database
    • Table 4: Compilation of PKHD1 Missense Mutations


  • These investigations were supported by the National Institutes of Health (GGG), the Polycystic Kidney Disease Foundation (LG-W), and a Clinical Scientist Award in Translational Research from the Burroughs-Wellcome Foundation (LG-W).

  • Competing interests: none declared