From the periphery to centre stage: de novo single nucleotide variants play a key role in human genetic disease
- 1Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- 2Department of Neurology, National Neuroscience Institute, Singapore, Singapore
- 3Duke-National University of Singapore Graduate Medical School, Singapore General Hospital, Singapore, Singapore
- 4School of Medicine, Institute of Medical Genetics, Cardiff University, Cardiff, UK
- Correspondence to Dr Chee-Seng Ku, Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm 17177, Sweden; and Professor David N Cooper, Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff CF14 4XN, UK;
- Received 4 January 2013
- Accepted 9 January 2013
- Published Online First 9 February 2013
Human germline mutations arise anew during meiosis in every generation. Such spontaneously occurring genetic variants are termed de novo mutations. Although the introduction of microarray based approaches led to the discovery of numerous de novo copy number variants underlying a range of human genetic conditions, de novo single nucleotide variants (SNVs) remained refractory to analysis at the whole genome level until the advent of next generation sequencing technologies such as whole genome sequencing and whole exome sequencing. These approaches have recently allowed the estimation of the mutation rate of de novo SNVs and greatly increased our understanding of their contribution to human genetic disease. Indeed, de novo SNVs have been found to underlie various common human neurodevelopmental conditions such as schizophrenia, autism and intellectual disability, as well as sporadic cases of rare Mendelian disorders. In many cases, however, confirmation of the pathogenicity of identified de novo SNVs remains a major challenge.
Germline mutations arise anew during meiosis in every generation. Such spontaneously occurring genetic alterations are termed de novo mutations and this term serves to describe those heritable mutations that neither parent possessed or transmitted. Thus, de novo mutations are mutations that arose in the gametes of an individual's parents as distinct from post-zygotic somatic mutations that may have arisen post-fertilisation.1–3
De novo mutations are evident in the context of a range of different types of lesion, from single nucleotide variants (SNVs) to small indels of multiple bases, to larger structural variations including deletions, duplications, and other chromosomal rearrangements. Studies of human de novo mutations on a genome wide scale were, until comparatively recently, extremely challenging owing to the technological limitations of available screening methods. Microarray based technologies have been deployed very successfully in the context of de novo copy number variants (CNVs), most notably in the identification of submicroscopic deletions and duplications underlying schizophrenia and intellectual disability.4–9 However, these microarray based methodologies were inadequate to the task of investigating de novo SNVs and small indels on a genome wide scale. Such studies required large scale sequencing which was simply not feasible by means of Sanger sequencing. Thus, the extent to which de novo SNVs and small indels contribute to the burden of human genetic disease has remained largely unexplored. The advent of high throughput, next generation sequencing (NGS) technologies has now ushered in a new era in the study of these types of de novo mutation in the human genome.10
Whole exome sequencing (WES) and whole genome sequencing (WGS) can now be rapidly performed on parent–offspring trios to identify de novo SNVs residing either within the protein coding regions or across the entire genome.1–3 However, the majority of studies reported to date have adopted WES (see online supplementary table S1; refs 11–36) owing to: (1) the fact that WES is cheaper and analytically less challenging than WGS; and (2) the expectation that most of the sought after disease causing and/or deleterious mutations will be found within the protein coding regions. In this article, we review and summarise recent discoveries of de novo SNVs (ie, single nucleotide substitutions) causing human genetic disease. We focus on those studies which have applied WES to the investigation of de novo SNVs and their implications for our understanding of the aetiology of common complex diseases as well as rare Mendelian diseases. We also discuss the potential clinical applications of NGS in the genetic diagnosis of those diseases characterised by a high frequency of de novo SNVs.
By definition, WES focuses on sequencing the entire set of exons (protein coding regions) in the genome.37 As such, and in contrast to WGS, it requires exome enrichment before massively parallel sequencing. The development of a battery of whole exome enrichment methods was therefore a prerequisite for the success of this approach.38 During the sequence enrichment steps, the genomic regions of interest (ie, all exons) are captured by hybridisation of DNA fragments to oligonucleotide probes, whereas the unwanted DNA sequences (ie, the non-coding regions) are removed before sequencing. Different commercial exome enrichment methods vary markedly in terms of the size of their targeted exomes and the specific genomic regions being enriched.39–42 Genomic enrichment leads to a significant reduction in the proportion of the genome that needs to be sequenced; hence WES can achieve higher sequencing coverage for a given amount of sequence data. Thus, since the exome comprises only ∼30 Mb of genomic DNA sequence43 (the actual targeted size being dependent on the precise choice of exome enrichment method adopted), approximately 3 Gb of filtered sequence reads or sequencing data are sufficient to achieve an average 100× depth of coverage. Achieving sufficient depth of coverage is particularly important for the detection of heterozygous de novo mutations which are characteristic of dominant conditions, in order to ensure that both alleles are sequenced adequately. These sequence enrichment methods, coupled with the high throughput NGS technologies such as Illumina HiSeq and Life Technologies SOLiD sequencing platforms, have together heralded a veritable explosion of WES applications in delineating the genetic causes of rare Mendelian disorders with new insights even for some common complex diseases (eg, schizophrenia, autism, and intellectual disability).44–49
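The coverage figure quoted above follows from a simple ratio: average depth is the amount of usable sequence divided by the size of the targeted region. The short Python sketch below reproduces that arithmetic using the two values given in the text. It is an idealised illustration only; the function name is our own, and it assumes that every filtered base falls on target and that coverage is uniform, neither of which holds exactly in practice.

```python
def mean_depth(total_bases_sequenced: float, target_size_bases: float) -> float:
    """Average depth of coverage = usable sequenced bases / targeted region size."""
    if target_size_bases <= 0:
        raise ValueError("target size must be positive")
    return total_bases_sequenced / target_size_bases


# Values quoted in the text (approximate).
EXOME_TARGET_BASES = 30e6  # ~30 Mb exome
FILTERED_BASES = 3e9       # ~3 Gb of filtered sequence reads

depth = mean_depth(FILTERED_BASES, EXOME_TARGET_BASES)
print(f"Mean depth of coverage: {depth:.0f}x")  # prints "Mean depth of coverage: 100x"
```

The same ratio applied to a whole genome of roughly 3 Gb shows why an equivalent depth by WGS requires about 100 times more sequence data.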
De novo mutations in human genetic disease
Owing to technological limitations, the contribution of de novo SNVs to human genetic disease (both common complex diseases and rare Mendelian disorders) has until recently remained largely unexplored, at least at the whole genome level. De novo SNVs are nevertheless likely to have profound clinical and/or phenotypic consequences when they impact functionally important nucleotides in the genome (eg, nonsense and splice site mutations or missense mutations in evolutionarily conserved sequences). However, only a relatively small proportion of the de novo SNVs occurring at each meiosis are nonsense and splice site mutations or will alter functionally important nucleotides, and hence are likely to be disease causing (see online supplementary table S1). Although de novo CNVs were shown quite early on, by means of whole genome microarrays, to be associated with several common neurodevelopmental diseases,4–9 it was not until comparatively recently that de novo SNVs were also implicated in the aetiology of schizophrenia, autism/autism spectrum disorder (ASD) and intellectual disability through the WES of affected trios.50–56 In similar vein, de novo SNVs have also been found to be responsible for sporadic cases of various rare dominant Mendelian disorders such as Kabuki syndrome (MIM 147920), Schinzel–Giedion syndrome (MIM 269150), and Bohring–Opitz syndrome (MIM 605039).11–13 These Mendelian disorders are also characterised by multiple neurodevelopmental defects; thus, for example, Bohring–Opitz syndrome is a clinically recognisable syndrome characterised by severe intellectual disability, distinctive facial features, and multiple congenital malformations.13
It has been postulated that the occurrence of de novo mutations might explain why diseases characterised by dramatically reduced fecundity such as schizophrenia, autism, and intellectual disability nevertheless remain fairly common in the general population. If this were to be the case, the de novo mutations occurring in these diseases would serve to replenish the number of highly penetrant disease mutations in every generation despite continual negative selection against the disease alleles. Consistent with this hypothesis, de novo CNVs have been shown to be a common cause of schizophrenia, autism, and intellectual disability.57 ,58
De novo SNVs have also been identified in schizophrenia through the WES of affected case–parent trios.52 The exomes of 53 sporadic schizophrenia cases (with no history of the disease in a first or second degree relative), 22 unaffected controls and their parents were sequenced. This study implemented a set of filters to eliminate false positive de novo variants (ie, variants that would appear to have occurred de novo either as a consequence of non-detection in the parents or due to systematic false positive calls in the offspring), including validation by Sanger sequencing. As such, the study identified a total of 34 de novo point mutations (33 SNVs and one dinucleotide substitution) and four de novo indels (microdeletions or microinsertions) in the affected trios throughout the exomes. This study found an excess of non-synonymous changes among the identified de novo SNVs in schizophrenia cases, as well as an increased likelihood of missense variants affecting protein structure and function when compared with the rare inherited exonic variants identified in the same study. Of the 34 de novo point mutations identified, 32 were missense mutations, 19 of which affected evolutionarily conserved positions and were predicted bioinformatically (by PolyPhen-2 analysis) to alter protein function. In addition, three of the de novo indels were predicted to give rise to protein truncations whereas one resulted in a single amino acid deletion. Overall, 27 of the 53 cases of schizophrenia were found to carry at least one de novo mutation (SNVs and/or indels). However, it is noteworthy that several exonic de novo SNVs (four non-synonymous and three synonymous) were also identified in seven of the 22 control subjects.52 This highlights the fact that distinguishing pathogenic and/or deleterious de novo mutations from non-disease-associated de novo mutations can be a tricky task; indeed, the detection of de novo non-synonymous SNVs in controls makes it clear that such variants need not necessarily be pathogenic.
By contrast, sequencing of complete genomes from two ‘apparently healthy’ parent–offspring trios (from the HapMap CEU (Utah residents with Northern and Western European ancestry) and YRI (Yoruba in Ibadan, Nigeria) populations) identified a total of 49 and 35 germline de novo mutations in the CEU and YRI trios, respectively.2 However, none of these were non-synonymous SNVs, and only one synonymous SNV was detected in the CEU trio. A larger study of apparently healthy trios would be needed to determine the number of de novo non-synonymous SNVs occurring per meiosis. Thus, additional evidence would generally be required to demonstrate the pathogenicity of a given de novo non-synonymous SNV. Such evidence includes prediction of protein damage, evolutionary conservation of the affected nucleotide or amino acid residue, and whether the SNV is located in a gene already known to be responsible for the disease in question, or at least in a gene which is a biologically plausible candidate for disease involvement (this is discussed in more detail below). Girard et al53 also applied WES to 14 trios (each trio comprising an individual with schizophrenia plus his or her parents) and identified 15 de novo SNVs in different genes in eight of the schizophrenia probands. Of these 15 validated de novo SNVs, four were nonsense SNVs predicted to lead to premature termination of translation; the remaining 11 were missense. The predicted effects of the 11 missense SNVs on protein function ranged from benign through possibly damaging to probably damaging. As is already well known from studies of inherited variants, prediction of the effect on protein function is by itself inadequate to determine the pathogenicity of de novo missense SNVs, and additional lines of evidence are required.
In a parallel development, de novo SNVs were also found at elevated frequencies in ASD/autism.51 ,54–56 This phenomenon was first noted in a WES study of 20 sporadic ASD cases and their parents that identified 18 de novo SNVs within coding sequence, 11 of which were protein altering (missense and nonsense).51 Potentially causative de novo SNVs were identified in the FOXP1, GRIN2B, SCN1A, and LAMC3 genes, respectively, in four unrelated ASD probands who were among the most severely affected individuals studied.51 Interestingly, de novo mutations in some of these genes had already been found in association with a variety of different neurodevelopmental phenotypes. For example, four de novo mutations in the GRIN2B gene were identified in a study of 468 individuals with intellectual disability.59 Similarly, de novo mutations in the FOXP1 gene had been identified in individuals with intellectual disability, autism, and language impairment.60 ,61 A pleiotropic effect for these genes in different neurodevelopmental disorders is therefore to be suspected—it follows that de novo mutations in these genes could give rise to variable phenotypic expressivity or even a spectrum of neurodevelopmental phenotypes, depending upon mutation severity and modulation by other genetic and/or environmental factors.
A much larger WES study was performed in 238 families from a comprehensively phenotyped ASD cohort, comprising pedigrees with two unaffected parents and an affected proband (and in 200 ASD families, with an unaffected sibling).55 This study found that highly deleterious (nonsense and splice site) de novo mutations in brain expressed genes were associated with ASD, and that the total number of non-synonymous de novo SNVs was significantly higher in ASD probands than in the unaffected siblings. In addition, the odds ratio of de novo non-synonymous to silent mutations in probands versus unaffected siblings was 1.93. When the analysis was confined to de novo nonsense and splice site mutations in brain expressed genes, this substantially increased the estimate of the effect size and demonstrated a significant difference in cases versus controls based either on an analysis of mutational burden or an evaluation of the odds ratio of nonsense and splice site mutations to silent SNVs (OR=5.65).55 These results demonstrated that non-synonymous de novo mutations, and particularly highly deleterious nonsense and splice site de novo mutations in brain expressed genes, are associated with ASD.
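For readers unfamiliar with this type of comparison, an odds ratio of the kind quoted above contrasts the ratio of non-synonymous to silent de novo mutations in probands with the same ratio in unaffected siblings. The Python sketch below shows the calculation. The function name is our own and the counts are invented purely for illustration; they are not the counts reported by the study, which are not reproduced in this article.

```python
def odds_ratio(case_nonsyn: int, case_silent: int,
               sib_nonsyn: int, sib_silent: int) -> float:
    """Odds of a de novo mutation being non-synonymous rather than silent,
    in probands relative to unaffected siblings."""
    if min(case_silent, sib_nonsyn, sib_silent) <= 0:
        raise ValueError("counts in the denominator must be positive")
    return (case_nonsyn * sib_silent) / (case_silent * sib_nonsyn)


# HYPOTHETICAL counts, chosen only to make the arithmetic easy to follow.
example = odds_ratio(case_nonsyn=30, case_silent=10,
                     sib_nonsyn=15, sib_silent=10)
print(f"Odds ratio: {example:.2f}")  # prints "Odds ratio: 2.00"
```

An odds ratio above 1 indicates that non-synonymous changes are over-represented among the de novo mutations of probands; in practice a confidence interval or exact test would accompany the point estimate.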
Several new insights have been obtained from the recent WES studies of de novo mutations in ASD.54–56 First, the number of de novo SNVs correlates positively with increasing paternal age; this is consistent with the postulate that the increased risk for children of older fathers to develop ASD is the result of an increased mutation rate with paternal age,56 an observation also made by Sanders et al.55 Further, new biologically plausible disease genes, such as NTNG1, were also identified on the basis that NTNG1 harboured recurrent, protein disruptive mutations. NTNG1 is a strong biological candidate given its role in the laminar organisation of dendrites and axonal guidance; both de novo SNVs identified were missense variants located at highly conserved positions and predicted to disrupt protein function.56 Another gene of interest was SCN2A since two probands carrying de novo nonsense SNVs in SCN2A were identified; gain-of-function mutations in SCN2A were already known to be associated with a range of epilepsy phenotypes.56 ,62–64 The presence of two or more de novo nonsense and/or splice site mutations in the same gene in unrelated affected individuals is unlikely to have occurred by chance alone, and hence provides strong evidence to implicate SCN2A as a disease gene in ASD. In addition, KATNAL2 and CHD8 were each found to carry two highly deleterious mutations in different individuals (by combining data with another study), again strongly implicating these genes in the aetiology of ASD.55
Taken together, these studies identified a variety of de novo SNVs that are predicted to disrupt gene function in ASD, thereby strengthening the hypothesis that the occurrence of de novo SNVs underlies at least a proportion of ASD. The de novo nature of the underlying lesions may account for the high prevalence of this disease which is clearly associated with a notable reduction in reproductive fitness. Further, these studies highlighted the extreme locus heterogeneity of ASD. This might explain why de novo mutations that are individually very rare could play a role in the causation of common diseases, because many different genes may harbour such mutations in different individuals. In support of this view, for example, it has been estimated that mutations in more than 1000 different genes may cause intellectual disability65; this large number of disease ‘targets’ could account for the prevalence of intellectual disability in the general population, as demonstrated by the findings of de Ligt et al66 who implicated a total of 79 de novo mutations in 77 genes in causing intellectual disability (discussed below).
Comparable progress in identifying de novo SNVs has also been achieved in individuals with hitherto unexplained causes of intellectual disability. The first report of its kind employed the WES approach to identify de novo SNVs in children with a normal karyotype in whom array based genome profiling had excluded a potential contribution from de novo CNVs.50 Thus, WES was performed on 10 children with unexplained intellectual disability and their parents, leading to the identification of a total of nine de novo SNVs. These mutations occurred in different genes (including RAB39B and SYNGAP1), some of which had previously been implicated in causing intellectual disability,67 ,68 thereby lending further support to the results obtained by exome sequencing. Six of these nine de novo SNVs were predicted to be pathogenic on the basis of gene function, evolutionary conservation, and likely mutational impact.50
A recent WES study, performed on a larger cohort of 100 people with intellectual disability and their unaffected parents, identified 79 de novo SNVs in 53 patients.66 Of these, 10 de novo SNVs identified in 10 patients were predicted to compromise the structure or function of known intellectual disability genes. Potentially causative de novo SNVs in novel candidate genes were also detected in 22 patients. Taken together, this study identified the underlying genetic cause in 10 patients with de novo mutations in known intellectual disability genes and in three male patients with severely disruptive, maternally inherited mutations in known X linked intellectual disability genes, giving a diagnostic yield of 13%. This study once again conveyed the message that de novo SNVs represent an important cause of intellectual disability.66 Given the extensive locus heterogeneity of intellectual disability (ie, it is caused by mutations in a very large number of different genes), WES represents a promising diagnostic tool to detect the underlying mutations. The diagnostic yield of 13% in patients with intellectual disability using WES is most encouraging. It should be appreciated that this is a conservative estimate because a further 24 new candidate genes affected by de novo mutations could also be potentially pathogenic, but their actual clinical significance would require further investigation in additional patient cohorts to be confirmed. Indeed, a pathogenic role for three of these genes (DYNC1H1, GATAD2B, and CTNNB1) was supported by the identification of additional patients with intellectual disability and severely disruptive mutations in these genes.66
Since CNVs constitute a significant cause of intellectual disability, improvements in the detection of genomic deletions and duplications from WES data should further increase the diagnostic yield (while eliminating the need for concomitant microarray screening), as different types of mutations would then be detectable in a single experiment.69–71 Although NGS methods are promising as diagnostic tools, various challenges remain in relation to their implementation in the clinical setting. These include the development of a standardised pipeline and protocol extending from the experiment to the interpretation of results, data analysis, the interpretation of findings such as variants of uncertain significance, communication of the results to patients and their families, and ethical concerns such as the disclosure of incidental findings. Challenges are also to be faced in the application of NGS methods to the genetic diagnosis of rare Mendelian diseases as discussed below.72–74
Rare Mendelian disorders
Over the last 2 years, WES has been widely applied to the identification of new inherited causal mutations for a range of dominant and recessive Mendelian disorders.48 ,75–78 However, our focus here is placed firmly on the utility of WES in the context of identifying de novo SNVs for rare Mendelian disorders. Heterozygous de novo mutations are believed to be a common cause of sporadic instances of rare diseases characterised by multiple congenital malformations or anomalies, developmental delay and intellectual disability. Examples include Schinzel–Giedion syndrome (MIM 269150),12 Bohring–Opitz syndrome (MIM 605039),13 and Coffin–Siris syndrome (MIM 135900),22 ,23 to name but a few, whose genetic bases had previously remained elusive.
Among recent WES studies, of particular interest is Coffin–Siris syndrome, characterised by various anomalies such as developmental delay and severe speech impairment. It is a rare congenital anomaly syndrome in which the majority of affected individuals are sporadic cases, strongly implying a dominant genetic basis for the disorder with underlying de novo mutations. This has now been confirmed to be the case by molecular analysis. WES, performed on five affected individuals with Coffin–Siris syndrome, identified 51 plausible variants under the hypothesis that an abnormality in a causal gene would be shared by at least two individuals.23 These variants were then validated by targeted Sanger sequencing of genomic DNA from the five affected individuals and their parents. This led to the identification of two de novo heterozygous SNVs in SMARCB1 in two (unrelated) affected individuals. Since the a priori probability that two different coding sequence mutations in unrelated individuals would be found to occur de novo within the same gene is extremely low, this provided strong support for the disease gene candidacy of SMARCB1 in Coffin–Siris syndrome. Further, in another study that performed WES on Coffin–Siris syndrome patients (including one case–parents trio and two sporadic cases), a second disease gene was identified22; this time, three de novo mutations were found to truncate the ARID1B reading frame (one frameshift and two nonsense mutations). It would be interesting to reanalyse the data from Tsurusaki et al23 to confirm if de novo mutations in this gene were missed in the remaining three patients (who did not harbour de novo mutations in SMARCB1) in their earlier analysis, or if a third gene for these three patients should be suspected. 
Intriguingly, both SMARCB1 and ARID1B encode components of the SWI/SNF (SWItch/Sucrose NonFermentable) chromatin remodelling complex, which acts as an epigenetic modifier modulating the accessibility of transcription factors to DNA by altering chromatin structure. This was the first evidence to show that germline de novo mutations in SWI/SNF complex genes are associated with a multiple congenital anomaly syndrome. WES has clearly already proved itself to be a powerful discovery tool.
Among other new discoveries has been the detection of de novo microlesions in the KAT6B gene in genitopatellar syndrome (MIM 606170).16 ,17 Genitopatellar syndrome is a rare disorder in which patellar aplasia or hypoplasia is associated with external genital anomalies and severe intellectual disability. In line with the assumption that the underlying causal mutations would act in a dominant manner and would have arisen de novo, KAT6B was identified as the only candidate gene harbouring previously unidentified (not previously reported in a public database) heterozygous variants in five of the six individuals whose exomes were sequenced. The mutations included a single nonsense variant and three frameshift indels, one of which (a 4 bp microdeletion) was observed in two unrelated cases. The de novo status of these mutations was subsequently established by Sanger sequencing of the parents, who had not been subjected to WES: none of the mutations was present in genomic DNA from the unaffected parents.16 De novo heterozygous truncating mutations in the KAT6B gene were independently identified in three subjects with genitopatellar syndrome by another study (also through WES), and subsequent Sanger sequencing detected similar KAT6B gene mutations in three additional subjects.17
The consistency of these findings provided strong support for a causative role of deleterious de novo mutations in KAT6B in genitopatellar syndrome. Intriguingly, de novo protein truncating mutations in KAT6B were also implicated in another rare disorder, Say–Barber–Biesecker–Young–Simpson syndrome (SBBYSS or Ohdo syndrome) (MIM 603736).14 This is a multiple anomaly syndrome which is also characterised by severe intellectual disability in addition to blepharophimosis and a mask-like facial appearance. KAT6B is a gene encoding a highly conserved histone acetyltransferase involved in chromatin modification. This finding, together with the identification of the genes for Coffin–Siris syndrome,22 ,23 has provided evidence for a key role for chromatin modifying genes and epigenetic abnormalities in human developmental and congenital disorders characterised by different developmental anomalies. The findings so far reported are likely to be merely the tip of the iceberg, since many more mutations affecting genes involved in chromatin remodelling or modification are expected to be identified in other rare congenital disorders.
In addition to the application of WES to unrelated cases (with subsequent further validation of the putative disease variants in parental DNA to confirm their de novo status), other studies have applied WES directly to trios. Thus, Lin et al24 applied WES to an individual with Olmsted syndrome (MIM 607066) and her parents; to detect de novo mutations, heterozygous variants from her parents were filtered out. In total, 45 putative de novo variants were found in the affected individual which were predicted to be damaging by SIFT (Sorting Intolerant From Tolerant). It is very likely that among these putative de novo variants, there are quite a few false positives as the number appears too large by comparison with other studies in apparently normal healthy individuals.2 The false positives could be due to variants which were detected in offspring but overlooked in parents. Indeed, further validation of these candidate variants by Sanger sequencing yielded a solitary de novo heterozygous point mutation in TRPV3. Support for the causative role of this mutation was then garnered by the discovery of additional heterozygous missense mutations in TRPV3 in five additional individuals; all five mutations were found to have occurred de novo. Further, these mutations were absent in 216 ethnically matched normal controls.24 The evidential value of this last observation is debatable, however: de novo mutations would in any case be expected to be absent from normal controls, so their absence does not constitute additional independent proof.
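The trio filtering step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in any of the cited studies: the `Variant` record, the function name, the toy coordinates and the parental depth threshold of 20 reads are all hypothetical. Variants seen in the child but in neither parent are split into those where both parents were well covered and those where a parental variant could simply have been overlooked.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def candidate_de_novo(child_calls, mother_calls, father_calls,
                      parental_depth, min_depth=20):
    """Return (confident, low_coverage) lists of child-only variants.

    parental_depth maps a Variant to (mother_depth, father_depth) at that site.
    min_depth is an illustrative threshold, not a published recommendation.
    """
    parental = set(mother_calls) | set(father_calls)
    confident, low_coverage = [], []
    for variant in sorted(set(child_calls) - parental):
        mother_dp, father_dp = parental_depth.get(variant, (0, 0))
        if min(mother_dp, father_dp) >= min_depth:
            confident.append(variant)
        else:
            # Possible false positive: the variant may have been missed in a parent.
            low_coverage.append(variant)
    return confident, low_coverage


# Toy data with made-up coordinates.
inherited = Variant("chr1", 100, "A", "G")
well_covered = Variant("chr2", 200, "G", "A")
poorly_covered = Variant("chr3", 300, "C", "T")

confident, low_coverage = candidate_de_novo(
    child_calls=[inherited, well_covered, poorly_covered],
    mother_calls=[inherited],
    father_calls=[],
    parental_depth={well_covered: (45, 38), poorly_covered: (6, 50)},
)
print(confident)     # only the chr2 variant
print(low_coverage)  # only the chr3 variant
```

Candidates in either list would still require orthogonal validation, such as Sanger sequencing of the proband and both parents, before being regarded as genuine de novo events.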
The major advantage of applying WES to parent–case trios is that it narrows down the list of potential candidate disease genes very significantly because of the limited number of de novo events to be expected in protein coding sequences. This is of course especially useful in the case of disorders in which de novo mutations are strongly suspected from the outset, for example, Baraitser–Winter syndrome (MIM 243310), a rare but well defined developmental disorder.25 No familial recurrence or consanguinity has ever been observed in families affected with this syndrome, and hence the genetic basis in the known cases of Baraitser–Winter syndrome was always likely to be due to the occurrence of de novo microlesions (particularly as no obviously pathogenic CNVs had been detected using microarrays). Thus, WES was applied in the case of three probands and their unaffected parents. This allowed the direct investigation of de novo mutation and resulted in the identification of de novo missense changes in the ACTG1 gene in two probands and in the ACTB gene in the third proband.25 However, the potential drawback of this study design is the additional cost of WES incurred by sequencing the parents as well as the child, in contrast to testing a limited panel of candidate disease mutations or genes in the corresponding parents in order to determine their de novo status.
Although these different approaches come with their own particular advantages and disadvantages, both have been successful in identifying genes harbouring de novo mutations causing rare disorders. WES would be sufficient to identify the mutations or genes underlying most rare Mendelian disorders as long as the causative mutations reside within the coding regions. In contrast to WGS, WES is more cost effective and analytically less challenging. However, some studies have also applied the WGS approach, for example, to a family quartette of a subject affected by a sporadic case of severe epileptic encephalopathy and her unaffected parents and sibling.79 WGS revealed a de novo missense mutation in SCN8A. The advantage of WGS is that it does not (like WES) exclude the possibility that the sought after pathological mutations might occur in functional regulatory or splicing elements, some of which might be remote from the genes whose expression they help to regulate.80 The disadvantage of WGS is that there are potentially many more de novo mutations that need to be assessed before pathogenicity can be reliably attributed. For example, of the variants detected in the quartette, 34 violated the Mendelian inheritance rules, suggesting the occurrence of multiple de novo mutations within the proband. However, 10 of these variants were removed, either because they had been found during the course of the 1000 Genomes Project or because they were excluded by other criteria such as location in error-prone regions or known segmental duplications. Finally, Sanger sequencing demonstrated that 23 of the remaining 24 candidates were false positives, leaving a single, true de novo variant in the proband.
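The attrition of candidate variants in this WGS example can be followed as simple arithmetic. The Python sketch below uses only the counts quoted in the text; the variable names are our own, and the false positive fraction is merely a derived illustration of how few raw Mendelian violations survived validation.

```python
# Counts quoted in the text for the family quartette studied by WGS.
mendelian_violations = 34    # variants violating Mendelian inheritance rules
removed_by_filters = 10      # 1000 Genomes Project matches and other exclusion criteria
sanger_false_positives = 23  # candidates refuted by Sanger sequencing

remaining_candidates = mendelian_violations - removed_by_filters
confirmed_de_novo = remaining_candidates - sanger_false_positives
false_positive_fraction = sanger_false_positives / remaining_candidates

print(f"Candidates after filtering: {remaining_candidates}")  # 24
print(f"Confirmed de novo variants: {confirmed_de_novo}")     # 1
print(f"False positives among filtered candidates: {false_positive_fraction:.1%}")  # 95.8%
```

Seen this way, fewer than one in twenty of the filtered candidates proved genuine, which underlines why orthogonal validation remains essential for putative de novo calls.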
Although numerous de novo mutations have now been identified as being responsible for rare Mendelian disorders (see online supplementary table S1), it remains unclear what constitutes evidence for causality. It is challenging to establish the causative role of newly identified mutations even when these events have occurred de novo.81 ,82 Cosegregation of the putative causal mutation with the disease phenotype in large multigenerational pedigrees can provide strong genetic evidence of causality, but this is obviously not feasible with de novo mutations that severely impair reproductive fitness and are therefore not transmitted down the generations. Therefore, for any newly identified putative pathological lesion which appears to have occurred de novo, further screening of additional cases is invariably required. Detection of a recurrent deleterious mutation, or of different mutations in the same gene, in additional cases constitutes strong evidence of causality.83 However, it can be difficult, in the context of extremely rare disorders, to find additional cases to validate the newly identified de novo mutation. Further, this search might be unsuccessful in disorders characterised by extensive locus heterogeneity. Moreover, de novo mutations tend to be individually extremely rare. Although sometimes very challenging, further validation in additional patient samples has nevertheless been achieved in recent WES studies of rare disorders. Thus, after the discovery of de novo mutations in the ACTG1 and ACTB genes by WES of three parent–case trios with Baraitser–Winter syndrome, Sanger sequencing was employed to screen the coding sequences of both genes in 15 additional affected individuals, with the result that pathogenic mutations in one or other of these genes were detected in all subjects under study.25 These mutations were then shown to have occurred de novo in all 11 subjects for whom parental DNA was available.
Moreover, further validation in normal controls demonstrated that none of the mutations identified in Baraitser–Winter syndrome was present in several large control datasets.25
This approach was also successfully employed by Rosewich et al33 who aimed to identify the genetic cause of alternating hemiplegia of childhood type 2 (AHC2; MIM 614820). A total of 24 patients with a clinical diagnosis of AHC2, together with their healthy parents, were recruited for this study. Even though WES was initially performed in only three proband–parent trios, the analysis identified ATP1A3 as the disease causing gene for AHC2 on the basis that it harboured three quite distinct heterozygous de novo missense mutations in the different patients. However, to further strengthen the evidence to support the contention that de novo mutations in ATP1A3 were causative, this gene was sequenced in the remaining 21 AHC2 patients and their healthy parents; in line with expectation, mutations in the ATP1A3 gene were identified in all patients and all the heterozygous mutations were shown to have occurred de novo.33 Finally, confirmation of ATP1A3 as the causal gene for AHC2 was provided by another WES based study. This study applied WES to seven patients with AHC2 and their unaffected parents, and identified de novo non-synonymous mutations in ATP1A3 in all patients.32 This suggested that ATP1A3 might be the only disease gene for this disorder. However, subsequent analysis of ATP1A3 in an additional 98 patients with AHC2 found that ATP1A3 mutations were likely to be responsible for at least 74% of the cases. Most AHC2 cases appear to be caused by one of seven recurrent ATP1A3 mutations found by Heinzen et al.32 The identification of ATP1A3 as the disease gene for AHC2 is critical because AHC2 is currently diagnosed only on the basis of clinical criteria, and the variable phenotypic manifestations of this condition could easily give rise to diagnostic confusion.
Delineation of the underlying molecular basis of AHC2 should allow the development of a genetic test to identify unequivocally children with the disease at the same time as ascertaining its clinical/phenotypic spectrum.
In addition to genetic evidence as described above, molecular functional studies can be performed to confirm the functional significance of newly identified de novo mutations. Evidence from such studies can be considered confirmatory only when the mutated gene can be shown to have a clear and well defined role in the molecular pathology of the disease. For example, ATP1A3, identified as the underlying disease gene in AHC2, encodes an α subunit of the Na+/K+ ATPase pump that is partly responsible for establishing and maintaining electrochemical gradients of sodium and potassium ions across the plasma membrane of neurons.32 Mutations in ATP1A3 have also been shown to cause rapid onset dystonia–parkinsonism.84 To better understand how ATP1A3 mutations cause two clinically distinct disorders, Heinzen et al32 investigated the in vitro functional consequences of the ATP1A3 mutations underlying AHC2 and rapid onset dystonia–parkinsonism, respectively. It was found that, unlike the ATP1A3 mutations associated with rapid onset dystonia–parkinsonism, the AHC2-causing mutations in this gene caused consistent reductions in ATPase activity without affecting the level of protein expression.
Nonsense and splice site mutations are very likely to be deleterious. However, most of the de novo SNVs identified to date are missense and it is often difficult to establish an unequivocal causative link between a specific missense mutation and a disease phenotype.85 Amino acid substitutions in evolutionarily conserved residues can provide evidence for pathogenicity.86 If the function of the protein is known, assessment of the biological effect of the missense mutation can be performed by in vitro mutagenesis and functional assay.87–92 Without in depth analytical studies, however, missense mutations may often be difficult to distinguish from polymorphisms with little or no clinical significance, either in the context of candidate gene sequencing studies93 or in the context of WES studies.94 Evidence for pathological authenticity usually comes from one or more different lines of evidence,93,95 including: (1) independent occurrence in additional patients; (2) absence in normal controls (only applicable to inherited variants as it does not constitute an additional proof for de novo mutations); (3) cosegregation of the lesion and disease phenotype through the family pedigree (not applicable in the case of de novo mutations); (4) non-conservative substitutions being more likely to disrupt protein function; (5) location in a protein region of structural or functional importance; (6) location in an evolutionarily conserved nucleotide sequence and/or amino acid residue; and, albeit rarely, (7) reversal of the pathological phenotype in patient/cultured cells by gene replacement. These lines of evidence should not of course be given equal weight; in most cases, the independent occurrence of de novo mutations in additional patients provides by far the most compelling support for pathogenicity.
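The seven lines of evidence listed above, together with their stated applicability to de novo mutations, can be written down as a simple checklist. The following Python sketch is purely illustrative: the key names and the `applicable_support` function are hypothetical, and no numerical weights are assigned because the text gives none beyond noting that recurrence in additional patients is the most compelling.

```python
# Lines of evidence (1)-(7) from the text, mapped to whether each is
# treated as applicable when the mutation has occurred de novo.
# Key names are hypothetical labels chosen for this sketch.
APPLICABLE_TO_DE_NOVO = {
    "recurrence_in_additional_patients": True,    # (1)
    "absence_in_normal_controls": False,          # (2) inherited variants only
    "cosegregation_in_pedigree": False,           # (3) not applicable de novo
    "non_conservative_substitution": True,        # (4)
    "important_protein_region": True,             # (5)
    "evolutionary_conservation": True,            # (6)
    "phenotype_reversal_by_gene_replacement": True,  # (7) rarely available
}


def applicable_support(observed: dict, de_novo: bool) -> list:
    """Return the observed lines of evidence that count for this variant.

    `observed` maps evidence names to True/False. For a de novo mutation,
    lines of evidence marked as not applicable are ignored.
    """
    return [
        name
        for name, applies in APPLICABLE_TO_DE_NOVO.items()
        if observed.get(name, False) and (applies or not de_novo)
    ]


observed = {"absence_in_normal_controls": True, "evolutionary_conservation": True}
print(applicable_support(observed, de_novo=True))
print(applicable_support(observed, de_novo=False))
```

For the de novo case only the conservation evidence is returned, whereas for an inherited variant absence in controls is counted as well.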
De novo mutations located outside coding regions, for example, in promoter, intronic, intergenic or untranslated regions, can also be of pathological significance. However, in the context of a whole genome screen, most such pathogenic mutations could only be identified using WGS rather than WES. This notwithstanding, the ability of WES to identify de novo mutations residing within flanking non-coding regions was exemplified by the identification of a heterozygous de novo mutation in the 5′-untranslated region of IFITM5 (located 14 bp upstream of the annotated translation initiation codon) underlying osteogenesis imperfecta type V (MIM 610967).29 Subsequently, the same study identified an identical heterozygous de novo mutation in an additional patient with osteogenesis imperfecta type V by Sanger sequencing, thereby strongly suggesting that this was the causal mutation for the disease. This finding concurred with another study, which also found the same heterozygous mutation in the 5′-untranslated region of IFITM5. This mutation occurred de novo in five unrelated cases, and cosegregated completely with the disease in three families, providing yet another line of evidence to support the pathological significance of this mutation (irrespective of whether it was acquired de novo or was inherited from the parents) in osteogenesis imperfecta type V.30
The identification of the causal genes for rare Mendelian disorders is important for the development of future molecular diagnostic tests to confirm the clinical diagnosis, especially for those disorders characterised by diverse clinical manifestations lacking specific phenotypic characteristics, or for those disorders which share common clinical features, and where ambiguity in diagnosis is common. For example, Weaver syndrome (MIM 277590) is a rare congenital anomaly syndrome characterised by generalised overgrowth, advanced bone age, pronounced macrocephaly, hypertelorism, and characteristic facial features. In addition, intellectual disability is common. Some patients with Weaver syndrome harbour mutations in NSD1, the gene that is mutated (or deleted) in most patients with classic Sotos syndrome (MIM 117550).96,97 This molecular finding has led to uncertainty as to whether the Sotos and Weaver syndromes represent variable expressivity of a single locus with allelic heterogeneity or whether they represent distinct disorders caused by mutations in different genes. To compound the problem, these disorders have some shared and some distinguishing clinical features.96,97 The application of WES to two trios affected by Weaver syndrome successfully identified two different de novo mutations in the EZH2 gene.19 Sanger sequencing of EZH2 in a third classically affected proband identified a third de novo mutation in this gene. In addition, the study ruled out rare variants in NSD1 by means of Sanger sequencing in all three probands.19 The identification of this new disease gene has extended our definition of the genetic basis of Weaver syndrome. It may well be that other genes still remain to be identified, since Weaver syndrome is genetically and clinically quite a heterogeneous condition.
In the context of the entirety of the human genome, de novo point mutations remained largely refractory to analysis until the arrival of NGS and its twin fruits, WES and WGS. Microarray technologies have been used widely and successfully to identify de novo CNVs in a number of common neurodevelopmental conditions such as schizophrenia and ASD. By contrast, since SNVs and other microlesions occurring de novo were not amenable to analysis by these microarray based methods, little was known about either their frequency or their impact upon neurodevelopmental disease until the advent of WES. However, by means of WES of case–parent trios, de novo SNVs have recently been implicated in schizophrenia, ASD, and intellectual disability. Taken together, these findings strengthen the hypothesis that the occurrence of de novo mutations could account for the high prevalence of those diseases which are associated with a notable reduction in reproductive fitness.
Many rare Mendelian disorders that are associated with multiple congenital malformations or anomalies, developmental delay, and intellectual disability occur sporadically because the severity and/or early onset of the disorders tend to preclude the transmission of the causal mutations to subsequent generations. As a result, these causal mutations are under strong negative selection that ensures they are quickly eliminated from the population. Since de novo mutations are by definition refractory to traditional linkage analysis, many genes causing sporadic cases of rare diseases still remain to be identified. Although the importance of de novo mutations in rare disorders without family transmission is well recognised, the genome wide screening methodology to identify novel disease-causing de novo mutations within the coding regions was not widely available until 3 years ago. Recent studies have demonstrated the effectiveness of WES in the elucidation of the de novo genetic basis of an ever increasing number of rare Mendelian diseases. We are only at the beginning of the process of elucidation of the molecular basis of a large number of individually rare diseases whose genetic aetiologies have previously been elusive. It is really just a matter of time until WES is applied to all those rare diseases which were not previously amenable to study by pre-genomic era analytical methods. In addition to improving our knowledge of the pathogenesis of human genetic disease and the biological roles of the newly identified genes and the proteins they encode, the identification of disease-causing genes (whether the mutation has been parentally transmitted or has instead occurred de novo) will form the basis of future molecular diagnostic tests for these conditions.
While technological advances have driven studies of de novo mutations, the collection of DNA samples from parent–offspring trios affected by various diseases is an important prerequisite to further enhance the rate of discovery of novel disease genes characterised by de novo mutations.
Confirming the pathological authenticity of missense mutations occurring de novo which have been identified in unique individuals/families is extremely challenging. This is because, in such cases, there is often little or no possibility of confirming the involvement of the candidate mutation or gene in the disease through analysis of a second individual or family. In addition to the in silico (bioinformatics) prediction of a protein damaging effect or the assessment of the evolutionary conservation of the amino acid residues affected by these ‘unique’ de novo missense mutations, the identification of these or similar mutations in other closely related conditions could be held to be suggestive of their pathogenicity. A further line of evidence might be the biological plausibility of the gene harbouring these mutations. However, the biological functions of the candidate gene(s) might not be fully known, and the biology underlying the disease might not be well characterised. Individualised functional studies of the ‘unique’ de novo missense mutations will provide additional evidence for their pathogenicity, and this functional characterisation will be greatly facilitated if the gene encodes a protein that plays a role in a pathophysiological pathway known to be involved in the disease. Currently, there are no well established criteria with which to distinguish de novo pathological lesions from other missense mutations that may have occurred de novo in a particular individual. It is even more challenging to show that a de novo missense mutation has no pathological significance in unique cases, as such mutations cannot be screened for in a general population of apparently healthy individuals to confirm their non-involvement in pathology.
Contributors CSK and DNC contributed to the conceptualisation of this manuscript. KCS contributed to the writing of this manuscript. DNC contributed to the editing of this manuscript. All authors contributed to proofreading. CSK approved the final version of this manuscript.
Competing interests None.
Provenance and peer review Not commissioned; externally peer reviewed.