Background Intellectual disability (ID) is characterised by an extreme genetic heterogeneity. Several hundred genes have been associated to monogenic forms of ID, considerably complicating molecular diagnostics. Trio-exome sequencing was recently proposed as a diagnostic approach, yet remains costly for a general implementation.
Methods We report the alternative strategy of targeted high-throughput sequencing of 217 genes in which mutations had been reported in patients with ID or autism as the major clinical concern. We analysed 106 patients with ID of unknown aetiology following array-CGH analysis and other genetic investigations. Ninety per cent of these patients were males, and 75% sporadic cases.
Results We identified 26 causative mutations: 16 in X-linked genes (ATRX, CUL4B, DMD, FMR1, HCFC1, IL1RAPL1, IQSEC2, KDM5C, MAOA, MECP2, SLC9A6, SLC16A2, PHF8) and 10 de novo in autosomal-dominant genes (DYRK1A, GRIN1, MED13L, TCF4, RAI1, SHANK3, SLC2A1, SYNGAP1). We also detected four possibly causative mutations (eg, in NLGN3) requiring further investigations. We present detailed reasoning for assigning causality for each mutation, and associated patients’ clinical information. Some genes were hit more than once in our cohort, suggesting they correspond to more frequent ID-associated conditions (KDM5C, MECP2, DYRK1A, TCF4). We highlight some unexpected genotype to phenotype correlations, with causative mutations being identified in genes associated with defined syndromes in patients deviating from the classic phenotype (DMD, TCF4, MECP2). We also provide additional supportive (HCFC1, MED13L) or unsupportive (SHROOM4, SRPX2) evidence for the implication of previous candidate genes or mutations in cognitive disorders.
Conclusions With a diagnostic yield of 25%, targeted sequencing appears relevant as a first-intention test for the diagnosis of ID, and importantly will also contribute to a better understanding of the specific contribution of the many genes implicated in ID and autism.
- intellectual disability
- high-throughput sequencing
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Intellectual disability (ID) is a common neurodevelopmental disorder reported in 1.5–2% of children and adolescents.1 ,2 ID is defined by significant limitations in both intellectual functioning and adaptive behaviour with onset before the age of 18. Different classes of ID are conventionally defined according to IQ values (severe or profound, <35; moderate, 35–49 and mild, 50–70). However, in routine genetic practice, classification into these subcategories relies on clinical assessment, mainly based on records of developmental history, speech acquisition and patients' autonomy.
Causes of ID can be environmental, genetic or multifactorial. Single genetic events are thought to account for a majority of cases, varying from large chromosomal anomalies or copy number variants (CNVs) affecting several genes to point mutations in single genes. These latter monogenic forms are characterised by an extreme genetic heterogeneity, with a hundred genes described as implicated in X-linked ID (XLID), and more associated to autosomal-recessive or autosomal-dominant forms. Altogether there are more than 500 genes proposed to cause ID with high penetrance when mutated3–9 underlying a phenotypic heterogeneity of the same extent in both severity and associated symptoms. This genetic heterogeneity has long limited the diagnostic offer for patients and families, which was often restricted to fragile-X (MIM 300624) testing, array-CGH (comparative genomic hybridization) analysis and generic metabolic tests (see online supplementary figure S1). It may be complemented by sequencing a few genes associated to a specific syndrome evoked by patients' phenotype, yet the diagnostic yield remains low (1–2% for the recurrent fragile-X mutation; 10–15% for array-CGH and chromosomal analyses, higher in highly syndromic patients).10–12 A majority of patients remain therefore without molecular diagnosis, while it is of crucial importance for establishing recurrence risks and providing genetic counselling in the family. Moreover, such diagnosis often has direct consequences for the medical prognosis of patients or their optimised healthcare, and even (yet in still a minority of cases) can indicate specific therapeutic options.
To overcome this low diagnostic yield, we developed the simultaneous targeted sequencing of protein-coding exons of 217 genes associated with ID or autism spectrum disorders (ASDs) as the primary clinically significant feature: 99 located on the X-chromosome, 118 on the autosomes. We report here the results of this strategy on a cohort of 106 ID patients with or without associated autistic-like features, negative for array-CGH, fragile-X and other specific genetic analyses. A causal mutation was detected in 25% of these patients, regardless of the severity of their cognitive impairment. We illustrate cases in which the molecular diagnosis was immediately established, as well as other more complex situations. This highlights the challenge of interpreting variants generated by NGS technologies, even with targeted approaches restricted to a few hundred genes. This work demonstrates that a targeted sequencing approach is highly efficient for the diagnosis of ID, but also allows refining the clinical spectrum associated with mutations in certain genes, and confirming or questioning the involvement of other genes in cognitive disorders.
Cohort of patients
DNA samples from 106 patients were referred for testing by clinical geneticists from 16 public hospitals in France. Inclusion criteria were a negative result for the recurrent fragile-X mutation and for pathogenic CNVs via array-CGH testing, and the availability of DNA samples from family members in order to interpret molecular findings. Additional specific genetic investigations had been performed on a majority of patients (on average: two genes tested per patient, table 1). Presence of multiple major congenital anomalies or suspected mitochondrial/peroxisomal disorders was an exclusion criterion. Clinical data were recorded following a standardised clinical record highlighting prenatal history, developmental history, neurological and behavioural disorders. ID severity was assessed by medical geneticists upon clinical evaluation and was not a discriminating inclusion criterion, although we encouraged inclusion of probands with moderate or severe ID. This study was approved by the local Ethics Committee of the Strasbourg University Hospital (Comité Consultatif de Protection des Personnes dans la Recherche Biomédicale (CCPPRB)). For all patients, written informed consent for genetic testing was obtained from their legal representative.
Targeted genes and capture design
The 217 selected genes include 99 genes associated to XLID and 118 genes located on autosomes, implicated in dominant (45), recessive (66) or complex (7) forms of ID (see online supplementary table S8, further justification on gene selection is given in online supplementary methods). We targeted all protein-coding exons of these genes, including 20 bp of intronic flanking sequences. The overall size of targeted regions is 1.034 Mbp. Corresponding 120 bp RNA baits were designed using SureDesign (https://earray.chem.agilent.com/suredesign/).
Library preparation and sequencing
DNA samples were extracted from peripheral blood or saliva. Sequencing libraries were prepared as described previously,13 performing individual in-solution SureSelect capture reaction for each DNA sample (Agilent, Santa Clara, California, USA). Paired-end sequencing (2×101-bp) was performed on an Illumina HiSeq 2000/2500, multiplexing up to 16 samples per sequencing lane.
Bioinformatic pipeline and variant ranking
Read mapping, variant calling and annotation were performed as described previously.13 Detected variants (short indels and single nucleotide variants (SNVs)) were ranked by VaRank (an in-house developed script), which incorporates the annotations retrieved by alamut-HT (putative effect on the protein, conservation scores, splice site predictions, allelic frequency in the 106 patients and in control cohorts such as Exome Variant Server (EVS) or 1000 genomes). Candidate variants were selected when their frequency was compatible with the incidence of the disease: each is expected to account for less than 0.1% of all ID cases, ie, to result in a disease frequency <0.002%.14 Using the EVS population as a subset of the general population, candidate variants were thus retained when reported in EVS: with a minor allele frequency lower than 0.45% for variants in autosomal-recessive genes (ie, with a frequency of homozygotes <0.002%), in no more than one carrier for variants in autosomal-dominant genes or in no more than one male for variants in X-linked genes (as we cannot exclude, although it is unlikely, that a particular carrier from the general population might have mild ID). We concomitantly excluded variants present more than twice in the cohort of 106 patients. Remaining variants predicted as potentially pathogenic and fitting with the mode of inheritance associated with the affected gene were further tested for validation (see online supplementary table S9).
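The frequency rules above can be sketched as a small filter. This is only an illustrative sketch, not the VaRank implementation: the `Variant` fields and function names are hypothetical, and only the numeric thresholds (0.45% minor allele frequency, one EVS carrier, one EVS male, more than twice in the cohort) are taken from the text. Genes with a complex mode of inheritance are not handled.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; every name below is a hypothetical illustration.
MAX_RECESSIVE_MAF = 0.0045  # 0.45% minor allele frequency in EVS
MAX_EVS_CARRIERS = 1        # autosomal-dominant genes: at most one EVS carrier
MAX_EVS_MALES = 1           # X-linked genes: at most one EVS male
MAX_COHORT_COUNT = 2        # variants seen more than twice in the cohort are excluded


@dataclass
class Variant:
    gene: str
    inheritance: str        # "AR", "AD" or "XL"
    evs_maf: float          # minor allele frequency in EVS, between 0 and 1
    evs_carriers: int       # number of carriers reported in EVS
    evs_male_carriers: int  # number of males reported in EVS
    cohort_count: int       # number of cohort patients carrying the variant


def homozygote_frequency(maf: float) -> float:
    """Expected homozygote frequency under Hardy-Weinberg equilibrium (q squared)."""
    return maf ** 2


def is_candidate(v: Variant) -> bool:
    """Apply the cohort and EVS frequency rules for the gene's inheritance mode."""
    if v.cohort_count > MAX_COHORT_COUNT:
        return False
    if v.inheritance == "AR":
        return v.evs_maf < MAX_RECESSIVE_MAF
    if v.inheritance == "AD":
        return v.evs_carriers <= MAX_EVS_CARRIERS
    if v.inheritance == "XL":
        return v.evs_male_carriers <= MAX_EVS_MALES
    raise ValueError(f"unsupported inheritance mode: {v.inheritance}")
```

The link between the two recessive thresholds follows from `homozygote_frequency`: a minor allele frequency of 0.45% corresponds to an expected homozygote frequency of about 0.002%.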
CNVs detection pipeline
Putative heterozygous/homozygous/hemizygous structural variants or CNVs were highlighted using the previously described method based on a depth-of-coverage comparison between the index sample and eight other random samples from the same sequencing lane.13 For the X-chromosome, coverage was normalised according to the number of X-chromosomes of the patient.
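A depth-of-coverage comparison of this kind can be sketched as follows. This is a simplified illustration and not the published pipeline: it assumes depths have already been normalised for library size, uses the median of the reference samples as the baseline, and the `loss_below`/`gain_above` cut-offs are hypothetical values chosen only for the example.

```python
from statistics import median
from typing import Optional, Sequence


def per_copy_depth(depth: float, n_x: Optional[int] = None) -> float:
    """Depth per X-chromosome copy for X-linked targets; unchanged for autosomes."""
    return depth / n_x if n_x else depth


def copy_ratio(
    index_depth: float,
    reference_depths: Sequence[float],
    index_n_x: Optional[int] = None,
    reference_n_x: Optional[Sequence[int]] = None,
) -> float:
    """Ratio of the index sample depth to the median depth of the reference samples."""
    if reference_n_x is None:
        reference_n_x = [None] * len(reference_depths)
    refs = [per_copy_depth(d, n) for d, n in zip(reference_depths, reference_n_x)]
    return per_copy_depth(index_depth, index_n_x) / median(refs)


def classify(ratio: float, loss_below: float = 0.7, gain_above: float = 1.3) -> str:
    """Label a target as 'loss', 'gain' or 'normal' (hypothetical cut-offs)."""
    if ratio < loss_below:
        return "loss"
    if ratio > gain_above:
        return "gain"
    return "normal"
```

With per-copy normalisation, a male sample (one X) compared against female references (two X) gives a ratio near 1 for an intact exon and near 0 for a hemizygous deletion, while a heterozygous autosomal deletion gives a ratio near 0.5.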
All candidate mutations were validated by Sanger sequencing and co-segregation analyses were performed as extensively as possible. Putative splicing mutations were confirmed either using a minigene in vitro assay with the SPL3B plasmid as described previously15 or using patients’ fibroblasts or blood RNA when available. For apparent de novo variants, pedigree concordance was checked using polymorphic microsatellite markers (PowerPlex 16HS System, Promega). Mutations were considered as certainly causative when no doubt remained regarding their pathogenicity and an unambiguous diagnosis could thus be established. Such mutations co-segregated with the disease status in the family and were either truncating mutations or missense mutations that had been previously convincingly published or that we confirmed with functional analyses. Mutations were considered as potentially causative when they appeared to co-segregate with ID in available members of the family and were predicted to be damaging, but further functional studies are needed to prove unambiguously their pathogenicity.
High-quality sequencing data ensure low rates of false-positive/negative calls for SNVs, indels and CNVs
Our strategy allowed generating a high-quality sequencing dataset, with a mean depth of coverage of 350× and an average per patient of 97.7% of targeted regions being well covered (>40×; see online supplementary table S1). Such coverage ensures a sensitivity of 99–99.9% of detecting SNVs and indels at any allelic state (Illumina Technical Note). We further assessed the sensitivity of SNV detection by comparing allelic states of SNPs detected by SNP-array (Affymetrix SNP Array 6.0) with the corresponding sites located in the targeted sequencing data of two patients and found no false-negatives after Sanger sequencing validation of six SNPs showing discrepant allelic states between both methods (452 SNPs analysed). Interestingly, for these six SNPs, Sanger sequencing results were always in favour of targeted sequencing data suggesting a much higher accuracy (data not shown). No false positive was detected out of the 80 candidate variants located in well-covered regions that were tested for confirmation by Sanger sequencing.
This high sequencing depth also ensured reliable CNV calling. All CNVs detected by our pipeline were validated by Sanger sequencing, qPCR and/or confirmed retrospectively when looking at array-CGH data (see online supplementary table S2). Some were not initially mentioned in the array-CGH report because they were covered by only a few SNP probes (≤2 deleted probes, under the detection threshold).
Very few regions (a total of 3.9 kb, only 1.8 kb being protein coding) appear consistently poorly covered (coverage <40× in >90% of the samples; see online supplementary table S3). Those are mainly first exons or highly GC-rich regions that are a well-known burden in such capture strategy.
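The criterion for a consistently poorly covered region (coverage <40× in >90% of the samples) is simple to express in code. The sketch below assumes a hypothetical mapping from region name to per-sample mean depth; the data layout and function name are illustrative and do not come from the article.

```python
from typing import Dict, List, Sequence


def poorly_covered_regions(
    depths_by_region: Dict[str, Sequence[float]],
    min_depth: float = 40.0,
    min_fraction: float = 0.9,
) -> List[str]:
    """Return regions whose depth is below min_depth in more than min_fraction of samples."""
    flagged = []
    for region, depths in depths_by_region.items():
        if not depths:
            continue  # no samples available for this region
        low = sum(1 for d in depths if d < min_depth)
        if low / len(depths) > min_fraction:
            flagged.append(region)
    return flagged
```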
Cohort description and diagnostic yield
Patients harboured various degrees of cognitive impairment, although with a higher proportion of moderate or severe forms (46% and 42% respectively; table 1). The cohort was highly enriched in males. Among male probands, 68% were sporadic cases, the remaining had familial history of cognitive impairment mainly evocative of an X-linked mode of transmission (table 1). We detected certainly causative mutations in 26/106 patients (table 2; see online supplementary figures S2–S19), leading to an overall diagnostic yield of 25% for the entire cohort (from 23% for sporadic cases to 31% for familial cases). Unexpectedly, the diagnostic yield appears unrelated to the severity of ID in patients (table 1).
Sixteen mutations are located in genes of the X-chromosome: 14 point mutations or small indels (in ATRX, CUL4B, DMD, HCFC1, IL1RAPL1, IQSEC2, KDM5Cx2, MECP2x2, MAOA, PHF8, SLC9A6, SLC16A2), as well as two larger pathogenic events (one hemizygous complex rearrangement in MECP2, one hemizygous exon deletion in FMR1; see online supplementary figures S17 and S13). We identified 10 de novo point mutations or small indels in genes involved in autosomal-dominant/haploinsufficient forms of ID (in DYRK1Ax2, GRIN1, MED13L, RAI1, SHANK3, SYNGAP1, SLC2A1, TCF4x2). In four other patients, we identified potentially causative mutations (in NLGN3, PQBP1, SLC2A1 and TCF4; see online supplementary figures S20, S21, S7 and S9, respectively), whose implication in cognitive impairment has to be further confirmed. Finally, missense variants that appeared at first likely to be pathogenic, notably based on the very high evolutionary conservation of the affected residues or on previous publications, appeared excluded as causal after further segregation analysis (in FLNA, FMR1, HUWE1 or MECP2, see online supplementary figure S22).
Atypical type of mutations
Among the 26 certainly causative mutations identified, some were surprising by the nature of the mutation itself. In a boy with severe encephalopathy, epilepsy, hypotonia and microcephaly, we detected a highly complex rearrangement in exon #4 of MECP2, involving a 139 bp deletion flanked by the insertion of two sequences in inverted orientation derived from intron #2 still keeping the reading frame downstream of the rearrangement. Such event is inherited from the proband's mother who presents with speech delay and dyslexia (see online supplementary figure S17). This exon is known to be the one accumulating most mutations in patients, especially the 3′ half, which is a recombination hotspot and the target of several deletions/duplications and less frequently inversions.16–18
We report here for the first time an intragenic deletion affecting FMR1 outside of the promoter/exon #1 region that is the target of the fragile-X syndrome CGG-expansion. Very few point mutations have been reported in coding regions. We identified a complete deletion of the last exon of FMR1 in one patient and his two affected brothers unevenly presenting with clinical features of fragile-X syndrome (see online supplementary figure S13).
We also identified a patient carrying a maternally inherited 10 bp deletion causing a frameshift in exon #7 of IL1RAPL1, while unexpectedly his affected brother bears a de novo deletion of the full exon, highlighting that affected relatives may carry distinct mutations. We propose that the small 10 bp deletion may have created a sequence conformation favouring further instability and leading to the larger deletion observed in the second brother (see online supplementary figure S14). Indeed, a large proportion of IL1RAPL1 causative mutations are intragenic exon deletions or pericentric inversions,19 supporting the view that the IL1RAPL1 region is highly susceptible to recombination events.
Lastly, we identified a de novo Pro578Arg missense mutation in GRIN1 in a male proband with severe ID, hypotonia, feeding disorders and very poor speech but no epilepsy, a phenotype similar to that associated to GRIN1 missense mutations in some patients.20 ,21 A maternal uncle presented with similar features except for the poor speech and hypotonia (see online supplementary figure S3), initially suggesting an associated X-linked mode of inheritance. The finding of a de novo deleterious missense mutation excludes this latter hypothesis, further highlighting the prevalence of phenocopies in cognitive disorders.
Genotype–phenotype correlations: from expected to unexpected
The majority of certainly causative mutations were identified in patients whose clinical phenotype was retrospectively consistent with previous reports (table 2). For instance, the proband carrying a truncating mutation in IQSEC2 presents with severe ID, no speech, motor developmental delay, severe epilepsy, strabismus and autistic features (see online supplementary figure S15), which matches the recently proposed clinical spectrum associated with mutations in this gene.22 ,23 Likewise, DYRK1A was originally found disrupted by translocations or deleted in several patients with ID and microcephaly.24–26 More recently, truncating mutations in this gene were shown to cause ID associated with primary microcephaly (sometimes borderline at −2 SD), growth retardation, developmental delay, facial dysmorphic traits, seizures and major feeding difficulties, with or without associated autism.27–31 We report here two novel de novo truncating mutations in patients with similar clinical features but no epilepsy (see online supplementary figure S2). Nonetheless, in a few other cases (eg, mutations in RAI1 or MECP2), the probands lacked some clinical features, thus the corresponding diagnosis of Smith-Magenis (MIM 182290) or Rett (MIM 312750) syndrome was not evoked by experienced clinical geneticists (see online supplementary figures S5 and S17). For instance, among the three patients with MECP2 mutations, one female had the classic Rett phenotype, the other female proband presented with a non-classic Rett phenotype (no regression episode, no hand-flapping), while the rarity of reports of MECP2 encephalopathies in adolescent males precluded suspicion of the involvement of that gene in a patient (APN-3) who had undergone rather extensive prior genetic testing.
A few detected mutations were unexpected as they were detected in patients whose phenotype did not match previous descriptions. For instance, mutations in TCF4 mainly cause Pitt–Hopkins syndrome (PHS, MIM 610954) characterised by severe motor retardation, absence of speech, characteristic dysmorphic traits, autistic features, intestinal problems and hyperventilation.32 We here describe truncating TCF4 mutations in two patients, one with clinical manifestations highly suggestive of PHS, the other with less syndromic manifestations and no dysmorphic traits (see online supplementary figure S9). TCF4 mutations were already reported in patients with non-syndromic ID, suggesting that such mutations were likely to be underdiagnosed.33
Another patient and his affected brother both carry a distal frameshift mutation in DMD affecting the major muscle transcript encoding the dystrophin protein associated to Duchenne or Becker muscular dystrophy (DMD, MIM 310200; BMD, MIM 300376), and the brain-specific isoform Dp71. The index case presents with moderate ID, psychomotor retardation, no speech, behavioural disorders, dysmorphic traits but strikingly no muscular phenotype (see online supplementary figure S12). His brother harbours a milder phenotype with additional hypotonia and cerebellar dysplasia. Both harbour borderline-high CPK levels. The association of cognitive impairment with DMD/BMD has been extensively reported and correlated to truncating mutations affecting Dp71, yet never in the absence of a muscular phenotype.34–39 Our findings extend the recent report of a large family with affected males carrying an in-frame single amino acid deletion associated to mild ID and no muscular phenotype.40
Confirmation of candidate genes for cognitive disorders
Some selected genes were only candidate ID or ASD genes at the time of the design, with single pieces of evidence in the literature. The identification of additional mutations in patients with similar phenotype definitively confirms their implication in cognitive disorders.
We reported a damaging missense affecting the function of the Monoamine Oxidase A enzyme (MAOA),41 ,42 which replicated for the first time in 20 years the implication of MAOA in autism/ID associated to significant behavioural disorders.41 ,42 We also identified a probably pathogenic missense variant in NLGN3 in a male and his cousin, both presenting with ID and autism (see online supplementary figure S20). In silico predictions, high conservation of the mutated residue across all neuroligin paralogs, and familial analysis are altogether in favour of a pathogenicity of this missense change. A definitive functional effect of this missense still needs to be demonstrated to clearly establish the diagnosis. The implication of this gene was never replicated since the initial publication,43 although screened in several cohorts with comparable phenotypes.44–50
Similarly, we identified a novel truncating point mutation in MED13L confirming the implication of this gene in ID (see online supplementary figure S4). Disruption of MED13L was initially associated with transposition of the great arteries (TGA), associated with ID in a single case with a chromosomal translocation.51 A homozygous missense mutation was then identified in two siblings from a consanguineous family presenting with non-syndromic ID, suggesting an implication of the gene in autosomal-recessive forms of ID and justifying its selection in our panel.5 A total of five patients were more recently described with de novo intragenic CNVs or point mutations affecting MED13L, delineating a recognisable MED13L-haploinsufficiency syndrome characterised by hypotonia, moderate ID, variable cardiac defects, facial hypotonia and dysmorphic traits.52 ,53 Our patient with the MED13L mutation presents with a concordant phenotype, including an open mouth appearance and muscular hypotonia, but no cardiac defects. Due to the initially proposed autosomal-recessive mode of inheritance associated with ID, at first we did not consider this heterozygous truncating mutation as causative, as may also have been the case for a heterozygous splicing mutation identified earlier in a large-scale exome sequencing study in one male with ASD.28 Altogether, these findings suggest that ID associated with MED13L-haploinsufficiency syndrome is a relatively frequent condition.5 ,53
Ambiguous mode of inheritance in ID-associated genes: the example of DEAF1
As for MED13L, the mode of inheritance associated with some genes is ambiguously described in the literature. We identified a heterozygous variant affecting splicing in DEAF1 inherited from the asymptomatic mother in a patient presenting with severe ID, developmental delay, poor speech, pain resistance, dysmorphic features and aggressive behaviour (figure 1), while the gene had been proposed as associated with autosomal-dominant forms of ID.7 ,8 The recent report of two additional individuals carrying de novo missense mutations narrowed the associated phenotype to moderate/severe ID, speech impairment, behavioural problems, high pain threshold, dysmorphic features and abnormal walking pattern, hence highly similar to that of our proband.54 Although in vitro validation studies suggest that the reported missense variants lead to an impaired function of DEAF1, the authors concluded that they presumably act as dominant-negatives incapacitating both normal and mutant proteins since truncating variants had been observed in asymptomatic individuals.54 In parallel, a homozygous missense mutation clustering in the same SAND-domain with all three de novo missense mutations was reported in members of a consanguineous family presenting with ID, microcephaly and white matter abnormalities, therefore suggesting a possible autosomal-recessive mode of inheritance.55 The pathogenic mechanism associated with DEAF1 mutations is therefore unclear. Due to the highly similar clinical features of the herein reported proband and of probands carrying de novo missense mutations, the splice variant detected here may contribute to the phenotype of our patient, possibly through a recessive mode of inheritance (ie, acting in trans with another heterozygous variant) since haploinsufficiency appears tolerated in healthy individuals.
Altogether, those findings suggest either a similar phenotype for the autosomal-dominant and autosomal-recessive modes of inheritance associated with DEAF1 mutations, or a universal autosomal-recessive mode of inheritance with a second variant that has not yet been identified, as was eventually shown for Thrombocytopenia-absent radius (TAR) syndrome, for instance.56
Unsupportive evidence for proposed ID-associated mutations and genes
The identification in patients of a previously reported ID mutation should be considered with caution: indeed, as already discussed in previous publications, some ‘false-positive’ mutations are present among variants annotated as ‘pathogenic’ in dbSNP, OMIM or in published reports.13 ,14 We here question the causative effect of a few previously proposed mutations (see online supplementary table S4). For instance, one missense variant identified in FLNA (c.3872C>T, p.Pro1291Leu) was previously reported in a patient with FG syndrome (MIM 300321).57 This variant was identified here in a patient with a different phenotype (see online supplementary figure S22) and is also reported in one male in EVS (presumably not presenting with cognitive disorders), raising doubt about its pathogenicity.
The identification of non-segregating truncating variants (ie, detected both in patients and healthy relatives) can also challenge the implication of genes in X-linked and autosomal-dominant forms of ID. In one family, we identified a frameshift variant in SRPX2 in a male proband. It is most likely inherited from the deceased asymptomatic maternal grandfather since it was also detected in the mother and in three maternal aunts yet absent from the maternal grandmother. Despite recent functional evidence regarding the role of SRPX2 in brain development,58 ,59 its definitive implication in cognitive disorders has already been questioned following the presence of the initially proposed mutations in EVS14 and in control individuals,60 and subsequent to the identification of a missense mutation in GRIN2A co-segregating with the epileptic status in the initial SRPX2 family.61 In another family, we identified a nonsense variant in SHROOM4 in a male proband yet also in his unaffected brothers, a finding that further challenges the implication of SHROOM4 in X-linked cognitive disorders (see online supplementary table S5, figure 2).14
Patients carrying probably pathogenic variants in two ID-associated genes
In three unrelated patients, we identified candidate variants in two separate genes, requiring the evaluation of different scenarios: (a) one single contributor while the second variant is innocuous, (b) one major contributor while the second variant acts as a modifier, and (c) both variants are implicated in the phenotype and have a synergistic effect.
In one male proband, we identified a maternally inherited splice variant leading to a frameshift in the X-linked gene PHF8, together with a de novo truncating variant in DOCK8 (see online supplementary figure S18). Considering the weak evidence implicating DOCK8 in autosomal-dominant ID (two probands reported with translocations disrupting the gene), the clinical features consistent with a PHF8 mutation (even in the absence of cleft lip/palate) and the X-inactivation bias identified in the mother, the PHF8 variant alone is most likely to be responsible for the phenotype, leaving the DOCK8 variant as probably innocuous.62 ,63
In another male proband, two possibly causative missense variants were identified in the XLID genes ATRX and HCFC1 (figure 3). The patient's phenotype was not evocative of an ATRX mutation (no dysmorphic traits, no urogenital abnormalities, absence of Heinz bodies), but perfectly matched the recent description of cobalamin-X metabolic disorder (cblX, MIM 309541).64 We concluded that the HCFC1 mutation is causative (being one of the cblX recurrent mutations), but cannot exclude a contributory effect of the ATRX variant based on current co-segregation data.
In the third family, the proband carries one distal truncating variant in SLC2A1 and one missense variant in ANKRD11, each inherited from one asymptomatic parent. Some mutations in SLC2A1 have already been associated to GLUT-1 deficiency (MIM 606777/612126) with incomplete penetrance.65 The proband presents with evocative symptoms of both SLC2A1 and ANKRD11 mutations (major hypotonia along with skeletal abnormalities) that might suggest a synergistic contribution of both variants to the phenotype (see online supplementary figure S7). No conclusion could be unambiguously drawn in this case, but such a di- or oligogenic mode of inheritance has already been proposed in neurodevelopmental disorders.66–68
Targeted sequencing of 217 genes in a cohort of 106 patients with unknown genetic aetiology of ID led to a conclusive diagnostic yield of 25%. A majority of causative mutations identified are located in XLID genes, which can be partially explained by our male-enriched cohort. When excluding familial cases, X-linked mutations are found in 7/72 sporadic male cases, matching the proposed figure that 10% of males with sporadic ID carry mutations in X-linked genes.69 Combining the results of recent trio-exome analyses leads to similar proportions (X-linked causative mutations in 7/66 males with sporadic ID3 ,8). The detection rate of X-linked mutations in our cohort is—as expected—higher in males with familial history of ID, but is also significant in females (2/10, both with MECP2 mutations). Also unsurprisingly, mutations in autosomal-dominant/haploinsufficient ID genes are mostly found within sporadic cases.
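The proportions quoted above come from small denominators, so it can help to look at them alongside an approximate confidence interval. The sketch below recomputes the percentages from the counts given in the text and adds a 95% Wilson score interval. The interval is an illustration added here and is not part of the original analysis; the dictionary labels are hypothetical.

```python
from math import sqrt
from typing import Tuple


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Counts taken from the text; labels are illustrative only.
COUNTS = {
    "overall diagnostic yield": (26, 106),
    "X-linked mutations, sporadic males (this cohort)": (7, 72),
    "X-linked mutations, sporadic males (trio-exome studies)": (7, 66),
    "X-linked mutations, females": (2, 10),
}

for label, (k, n) in COUNTS.items():
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {100 * k / n:.1f}% "
          f"(95% Wilson interval {100 * low:.1f}-{100 * high:.1f}%)")
```

The wide interval for the 2/10 figure in females illustrates why such subgroup rates should be read as indicative rather than precise.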
Although we did not expect patients to harbour mutations in the same genes because of the small size of the cohort and the extensive genetic heterogeneity of ID, we had a few genes hit more than once: MECP2 (despite being screened in 11% of patients prior to inclusion), KDM5C, DYRK1A and TCF4. Our results, intersected with the ones from other studies, highlight mutations within common genes (MECP2, CUL4B, IL1RAPL1, IQSEC2, KDM5C, SLC9A6, SLC16A2, DYRK1A, SLC2A1, SYNGAP1 or TCF4), suggesting that they are more frequently mutated in ID patients.3 ,7–9 ,20 ,28–30 ,70 ,71 If mutations in such ID genes are confirmed to account for a substantial number of patients, introducing as a diagnostic step massive multiplexed resequencing of such genes in large cohorts may be considered.29
The extensive genetic and phenotypic heterogeneity of ID is the major hindrance for obtaining a precise molecular diagnosis. Direct sequencing strategies consisting of sequentially screening candidate genes are being replaced in diagnostic laboratories by higher-throughput NGS-based strategies: multiplex targeted sequencing of a few genes in large cohorts, selective targeted sequencing of up to several hundred genes, and exome sequencing, while whole-genome sequencing (WGS), although powerful, notably for genome rearrangements and for CG-rich exons, remains in the research domain. Exome or full-genome strategies are more exhaustive alternatives and very attractive options in the field of genetic diagnosis. Their universal approach, whatever the clinical features, simplifies the technical management of the workflow. However, the actual coverage achieved with the exome or full-genome strategies is still frequently insufficient (see online supplementary table S6), which may result in missing mutations.9 ,29 Also, the finding of a putative mutation in a gene never before associated with a given pathology marks the start of a research endeavour to validate the finding but is not per se a diagnostic result.
Targeted sequencing appears more appealing for well-defined pathologies in which most implicated genes have been uncovered or for clinically homogeneous entities (eg, Bardet-Biedl and related ciliopathies,13 retinal dystrophies,72 hearing loss,73 etc). We show here that it is also a powerful alternative for diagnostic purposes in ID (see online supplementary table S6). The positive diagnostic yield in our cohort of 106 patients is 25% overall and 21% for sporadic cases, which is similar to the initial reports with the trio-exome strategy (32 highly likely causative mutations in a total of 151 patients, a 21% yield; see online supplementary table S7).3 ,8 Indeed, most of the mutations identified by exome sequencing affect genes included in our panel and would thus have been detected with our strategy. Our diagnostic yield is slightly lower than the 30% yield (on 211 patients) that we have now calculated by pooling the three published exome studies (also including reference 29, and using the updated data for reference 2 very recently reported in Gilissen et al9), a difference that can be accounted for by the recently recognised ID genes missing from our panel (see online supplementary table S7). Our relatively high diagnostic yield is not due to an overestimation of the pathogenicity of the identified variants, as we were highly stringent in classifying mutations as certainly causative (eg, we did not include all probably damaging missense variants located in XLID genes that were detected in males). The high depth of coverage and the smaller proportion of poorly covered regions achieved with our strategy ensure a high sensitivity and specificity for detecting pathogenic events in the regions of interest (SNVs, indels, but also CNVs, a topic that had not been addressed in previous exome studies). Also, the relative ease of variant analysis and the smaller number of follow-up studies needed for candidate variants may contribute to this high yield.
As fewer candidate variants are identified per patient with this approach, they can also be analysed more thoroughly (putative effect on splicing for variants not affecting canonical splice sites, predicted impact on the protein through structural modelling, etc). With an even broader panel of genes (including, for instance, genes involved in ID associated with multiple congenital anomalies but found mutated in patients with a milder presentation), one can expect the resulting diagnostic yield to be higher.
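The yield comparison above rests on simple proportions from cohorts of modest size, so it can help to look at the uncertainty around each figure. The following is a minimal sketch, not part of the original analysis: it recomputes the two yields for which the text gives both counts (26 causative mutations in 106 patients for the targeted panel, and 32 in 151 patients for the initial trio-exome reports) and attaches a 95% Wilson score interval. The function name `diagnostic_yield`, the choice of the Wilson interval, and the assumption that each causative mutation corresponds to one diagnosed patient are ours.

```python
from math import sqrt


def diagnostic_yield(n_diagnosed, n_patients, z=1.96):
    """Return (yield, lower, upper) with a Wilson score interval.

    Hypothetical helper for illustration; z=1.96 gives an approximate 95% interval.
    """
    if n_patients <= 0 or not 0 <= n_diagnosed <= n_patients:
        raise ValueError("counts must satisfy 0 <= n_diagnosed <= n_patients, n_patients > 0")
    p = n_diagnosed / n_patients
    denom = 1 + z**2 / n_patients
    centre = (p + z**2 / (2 * n_patients)) / denom
    half_width = z * sqrt(p * (1 - p) / n_patients + z**2 / (4 * n_patients**2)) / denom
    return p, centre - half_width, centre + half_width


# Counts quoted in the text; one causative mutation per diagnosed patient is assumed.
cohorts = {
    "targeted panel (this study)": (26, 106),
    "initial trio-exome reports": (32, 151),
}

for name, (k, n) in cohorts.items():
    p, lower, upper = diagnostic_yield(k, n)
    print(f"{name}: {p:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

Under these assumptions the two intervals overlap, which is consistent with describing the yields as similar rather than as clearly different.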
The limited number of genes sequenced with the targeted approach (restricted to those involved in cognitive disorders) also avoids the controversial issues raised by incidental findings. Nonetheless, the targeted sequencing approach will miss newly identified genes and in principle prevents data reanalysis (ie, incorporating novel findings regarding the genetics of ID). The major value of targeted sequencing for ID is that it should allow such a test to be applied to a much higher proportion of patients awaiting molecular diagnosis, given the significantly lower cost of sequencing and especially of data analysis, storage and interpretation. It should thus generate much more promptly large amounts of data on the spectrum of mutations and phenotypes associated with the many genes implicated in ID, and so considerably increase our knowledge of the specific condition associated with each gene.
The substantial proportion of patients who remain without a molecular diagnosis with either strategy raises several questions: whether a large fraction of ID-associated genes remain to be discovered, whether many mutations are missed because they are located in non-coding regions, or lastly whether more complex genetic scenarios are implicated, such as variants with reduced penetrance and oligogenic and/or multifactorial modes of inheritance.66–68 In particular, many disorders with autosomal-dominant inheritance show incomplete penetrance and high intrafamilial phenotypic variability, although such a scenario is mostly excluded from trio studies that focus on de novo mutations. The recent results from Gilissen et al provide an initial response on this issue: a substantial proportion of such previously negative patients seem to carry pathogenic CNVs that could not be detected by array-CGH studies (and were not tested in the exome studies), while other patients carry mutations located in newly identified genes or mutations in known genes that had been missed by previous bioinformatic pipelines.9 With an estimated cumulative diagnostic yield of 62% (although derived from a small cohort of 50 patients for the WGS analysis), whole-genome analyses of larger cohorts of patients may shed more light on the contribution of the abovementioned hypotheses to explaining the remaining proportion of patients without a molecular diagnosis.
This study was supported by grants and fellowships from Fondation pour la Recherche Médicale, Agence de Biomedecine and Fondation Jerome Lejeune, APLM and CREGEMES. We thank the students who were involved in this project: Sébastien Kirsch, Audrey Creppy, Inès Bekkour and Grace Gan. We thank Nadège Calmels, Valérie Biancalana, Elsa Nourisson and all other members of the Genetic Diagnostic Laboratory of the Nouvel Hopital Civil (Strasbourg) for their help with patients’ DNA sample selection and preparation. We thank Cecile Pizot for the development of VaRank. We thank Damien Sanlaville, Christine Coubes, Delphine Héron, Sophie Naudion, James Lespinasse and Marie-Line Bichon for their contribution to the recruitment of patients, and all medical interns and genetic counsellors who participated in this project. Finally, we warmly thank all patients and their families for their participation in this study.
Contributors Study concept and design: CR, J-LM, AP. Clinical genetics investigations: AM-P, MW, GL, SE-C, YA, MB, DB, EC, HD, BD, M-AD, VD-G, EF, MF, CF, AG, SL, MM-D, DM-C, DL, GM, AP, SS, CT-R, JT, MD-F, DG, PS, PE, BI and LO-F. Acquisition, analysis and interpretation of data: CR, BG, JL, YH, JM, AQ, BJ, J-LM and AP. Drafting of the manuscript: CR, J-LM and AP. Critical revision of the manuscript for important intellectual content: BG and JM. Obtained funding: J-LM and AP. Administrative, technical or material support: NH, MD, CF, VG, SLG, MP and SV. Study supervision: BG, J-LM, AP.
Competing interests None.
Patient consent Obtained.
Ethics approval Comité de Protection des Personnes, France.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement All the variants identified during the course of this study will be submitted to the variant database ClinVar: http://www.ncbi.nlm.nih.gov/clinvar/.