Original article
SETD2 and DNMT3A screen in the Sotos-like syndrome French cohort
  1. Camille Tlemsani1,2,
  2. Armelle Luscan1,2,
  3. Nicolas Leulliot3,
  4. Eric Bieth4,
  5. Alexandra Afenjar5,
  6. Geneviève Baujat6,
  7. Martine Doco-Fenzy7,
  8. Alice Goldenberg8,
  9. Didier Lacombe9,
  10. Laetitia Lambert10,
  11. Sylvie Odent11,
  12. Jérôme Pasche12,
  13. Sabine Sigaudy13,
  14. Alexandre Buffet14,
  15. Céline Violle-Poirsier15,
  16. Audrey Briand-Suleau1,
  17. Ingrid Laurendeau2,
  18. Magali Chin2,
  19. Pascale Saugier-Veber8,16,
  20. Dominique Vidaud1,2,
  21. Valérie Cormier-Daire5,
  22. Michel Vidaud1,2,
  23. Eric Pasmant1,2,
  24. Lydie Burglen5,17
  1. 1Service de Génétique et Biologie Moléculaires, Hôpital Cochin, Hôpitaux Universitaires Paris Centre, AP-HP, Paris, France
  2. 2EA7331, Faculté de Pharmacie, Université Paris Descartes, Sorbonne Paris Cité, Paris, France
  3. 3Faculté de Pharmacie, Laboratoire de Cristallographie et RMN Biologiques-CNRS UMR-8015, Université Paris Descartes, Sorbonne Paris Cité, Paris, France
  4. 4Service de Génétique, Hôpital Purpan, Toulouse, France
  5. 5Département de Génétique, Centre de référence des anomalies du développement et syndromes malformatifs, Hôpital Trousseau, AP-HP, Paris, France
  6. 6INSERM UMR_1163, Département de Génétique, Université Paris Descartes, Sorbonne Paris Cité, Institut Imagine, Hôpital Necker-Enfants Malades, AP-HP, Paris, France
  7. 7Service de génétique HMB CHU Reims, EA 3801, SFR CAPSANTE, Reims, France
  8. 8Service de Génétique, Centre Normand de Génomique Médicale et Médecine personnalisée, CHU de Rouen, Rouen, France
  9. 9Service de Génétique, CHU Bordeaux, Bordeaux, France
  10. 10Service de Génétique, CHU, Nancy, France
  11. 11Service de Génétique, CHU, Rennes, France
  12. 12Service de Pédiatrie, Centre Hospitalier de Polynésie française, Papeete, Tahiti, France
  13. 13Service de Génétique, CHU de Marseille—Hôpital de la Timone, Marseille, France
  14. 14Service d'Endocrinologie, Maladies Métaboliques, Nutrition, Hôpital Larrey, Toulouse, France
  15. 15Département de Génétique, CHU de Reims, Reims, France
  16. 16Inserm U1079, Université de Rouen, IRIB, Rouen, France
  17. 17INSERM UMR_1141, Paris, France
  1. Correspondence to Dr Eric Pasmant, Service de Biochimie et Génétique Moléculaire, Hôpital Cochin, AP-HP, Bâtiment Jean DAUSSET—3ème étage, 27 rue du Faubourg Saint Jacques, Paris, France; eric.pasmant{at}


Background Heterozygous NSD1 mutations were identified in 60%–90% of patients with Sotos syndrome. Recently, mutations of the SETD2 and DNMT3A genes were identified in patients exhibiting only some Sotos syndrome features. Both NSD1 and SETD2 genes encode epigenetic ‘writer’ proteins that catalyse methylation of histone 3 lysine 36 (H3K36me). The DNMT3A gene encodes an epigenetic ‘reader’ protein of the H3K36me chromatin mark.

Methods We aimed at confirming the implication of DNMT3A and SETD2 mutations in an overgrowth phenotype, through a comprehensive targeted-next generation sequencing (NGS) screening in 210 well-phenotyped index cases with a Sotos-like phenotype and no NSD1 mutation, from a French cohort.

Results Six unreported heterozygous likely pathogenic variants in DNMT3A were identified in seven patients: two nonsense variants and four de novo missense variants. One de novo unreported heterozygous frameshift variant was identified in SETD2 in one patient. All the four DNMT3A missense variants affected DNMT3A functional domains, suggesting a potential deleterious impact. DNMT3A-mutated index cases shared similar clinical features including overgrowth phenotype characterised by postnatal tall stature (≥+2SD), macrocephaly (≥+2SD), overweight or obesity at older age, intellectual deficiency and minor facial features. The phenotype associated with SETD2 mutations remains to be described more precisely. The p.Arg882Cys missense de novo constitutional DNMT3A variant found in two patients is the most frequent DNMT3A somatic mutation in acute leukaemia.

Conclusions Our results illustrate the power of targeted NGS to identify rare disease-causing variants. These observations provided evidence for a unifying mechanism (disruption of apposition and reading of the epigenetic chromatin mark H3K36me) that causes an overgrowth syndrome phenotype. Further studies are needed in order to assess the role of SETD2 and DNMT3A in intellectual deficiency without overgrowth.

  • Overgrowth
  • Sotos syndrome
  • SETD2
  • DNMT3A
  • H3K36me3


Human growth comes from a complex interplay of various factors including genetic backgrounds and environmental influences.1 Overgrowth refers to a condition characterised by extreme physical size and stature including tall stature or generalised or localised overgrowth of tissues. Among various conditions showing overgrowth, genetic overgrowth syndrome refers to a non-hormonally mediated overgrowth condition which can accompany increased height and/or head circumference, various degrees of intellectual disability or physical dysmorphisms in children. Sotos syndrome (MIM 117550) is a childhood overgrowth condition, first described in 1964, characterised by cardinal features including overgrowth (including height and occipito-frontal circumference), dysmorphism and learning disability with a wide spectrum of associated features including advanced bone age, neonatal jaundice and hypotonia, seizures, scoliosis, cardiac defects and genitourinary anomalies.2 In 2002, the nuclear receptor set domain containing protein 1 gene, NSD1 (MIM 606681) on chromosome 5q35 was identified as at least one of the genetic causes of Sotos syndrome.3 Since this discovery, heterozygous NSD1 intragenic mutations or microdeletions were identified in 60%–90% of patients with clinically diagnosed Sotos syndrome. Hundreds of cases have been reported in the literature with a broad phenotypic spectrum, varying from a classical Sotos syndrome phenotype to patients exhibiting only some Sotos syndrome features (Sotos-like syndrome).4–6 Recently, mutations of the DNMT3A and SETD2 genes were identified in patients with Sotos-like overgrowth syndromes by two studies employing next generation sequencing (NGS) approaches. 
Trio-based whole-exome sequencing (WES) in NSD1-negative families with overgrowth, distinctive facial appearance and intellectual disability identified de novo mutations in the DNA (cytosine-5)-methyltransferase 3A (DNMT3A) gene.7 At the same time, we reported heterozygous loss-of-function mutations in the SET domain containing 2 (SETD2) gene in patients with postnatal tall stature, macrocephaly, minor facial features and learning difficulties, using a targeted NGS approach.8

Interestingly, both NSD1 and SETD2 genes encode epigenetic ‘writer’ proteins that catalyse methylation of histone H3 lysine 36 (H3K36me) via the conserved Suppressor of variegation 3–9, Enhancer of Zeste and Trithorax (SET) domain.9 Numerous studies in multiple systems support a role for H3K36 methylation in transcriptional activation. Histone lysine methylation signalling is a principal chromatin regulatory mechanism that influences fundamental nuclear processes linked to downstream biological functions by methyllysine-binding proteins.10 Lysine (K) residues can accept up to three methyl groups to form monomethylated, dimethylated and trimethylated derivatives. NSD1 mediates H3K36 monomethylation and dimethylation which appears to affect transcriptional initiation.11 SETD2 also functions as a histone methyltransferase and is non-redundantly responsible for all trimethylation of lysine 36 of histone H3.12 ,13 Moreover, the DNMT3A gene encodes a member of the mammalian family of DNA methyltransferases and is an epigenetic ‘reader’ protein of the H3K36me chromatin mark.14 The DNMT3A DNA methyltransferase contains a C-terminal catalytic domain and a Proline-Tryptophan-Tryptophan-Proline (PWWP) and ATRX-DNMT3A-DNMT3L-type zinc finger (ADD) domains that recognise histone H3 lysine modifications and recruit the protein to specific gene targets. These observations provided evidence for a unifying mechanism (disruption of apposition and reading of the epigenetic chromatin mark H3K36me) that causes an overgrowth syndrome phenotype.

The aim of this study was to confirm the implication of DNMT3A and SETD2 constitutional mutations in an overgrowth syndrome phenotype, through a comprehensive targeted NGS screening in 210 well-phenotyped index cases with a Sotos-like phenotype and no NSD1 mutation, from a French cohort.

Subjects and methods


A total of 210 patients were included in the cohort between 2004 and 2014 in France. Inclusion criteria were as follows: (1) Clinical criteria: at least two of the four following characteristics: tall stature (≥+2SD), macrocephaly (≥+2SD), learning disability and Sotos-like facial dysmorphy and (2) Molecular criteria: no NSD1 intragenic mutations or microdeletions found. Patients were regularly followed in French medical genetics departments and phenotypically scored by clinical geneticists. For each patient, the full phenotypic information was recorded in a standardised way by using a standardised case report file (available upon request; Dr Lydie Burglen, Department of Genetics, Trousseau Hospital, AP-HP, France). The study was approved by the local ethics committee. Informed consent was obtained from all patients and/or parents. In all cases, routine G banding and R banding chromosome analyses showed a normal karyotype.

DNA extraction

Blood samples from index cases and their parents were obtained and genomic DNA was isolated from EDTA-anticoagulated blood using a Nucleon kit (Amersham, UK) according to the manufacturer's instructions. DNA concentrations were quantified using a Quant-iT dsDNA HS assay kit and a Qubit V.2.0 Fluorometer (Life Technologies, Saint-Aubin, France).

Targeted next generation sequencing

Experiments were performed at the NGS facility of the Cochin hospital, Paris (Assistance Publique-Hôpitaux de Paris, France), as previously described.15 The custom primer panel targeting the DNMT3A and SETD2 genes was designed using the AmpliSeq Designer (reference IAD62451_192, Life Technologies, Saint-Aubin, France). The targeted region included the entire DNMT3A and SETD2 coding exons and their intron boundaries (20 bp). The targeted region (19.4 kb) was amplified by 136 amplicons (length between 125 and 275 bp) distributed in three primer pools. Genomic DNA (30 ng) was amplified to generate the library using the Ion AmpliSeq Library Kit V.2.0 (Life Technologies), according to the manufacturer's instructions (Ion AmpliSeq Library Preparation, Revision V.5.0, July 2013, Life Technologies). The amplified libraries were purified using Agencourt AMPure XP beads (Beckman Coulter, Brea, California, USA). Prior to library pooling and sequencing sample preparation, amplified libraries were quantified using the Qubit V.2.0 Fluorometer (Life Technologies). Emulsion PCR was performed using the Ion OneTouch Instrument (Life Technologies). Enrichment of the template-positive Ion OneTouch 200 ion sphere particles (ISPs, containing clonally amplified DNA) was performed using the Ion OneTouch ES (Life Technologies), according to the manufacturer's procedures. An ISP quality control was then performed using a Qubit V.2.0 Fluorometer. The template-positive ISPs were loaded on Ion 316 chips and sequenced with an Ion Personal Genome Machine (PGM) System (Life Technologies). The chip capacity allowed mixing 24 bar-coded samples.

NGS bioinformatics analysis: variants calling, filtering and annotation

Data collected on the PGM were collated and re-analysed with the Torrent Suite V.4.2, using FASTQ files from the Ion Torrent Browser. Sequence alignment and extraction of SNPs and short insertions/deletions (In/Del) were performed using the Variant Caller plugin on the Ion Torrent Browser, and DNA sequences were visualised using the Integrated Genomics Viewer (IGV, V.2.3) from Broad Institute (Cambridge, Massachusetts, USA). The NextGENe software V.2.3.3 (Softgenetics, State College, Pennsylvania, USA) was also used for sequence alignment, extraction of SNPs and In/Del, and their visualisation and annotation. In brief, major calling parameters were chosen as follows: minimum allele frequency ≥20% for both SNPs and In/Del, minimum sequencing depth ≥6× for SNPs and ≥15× for In/Del, and minimum sequencing depth on either strand ≥6× for In/Del. An NGS bioinformatics analysis was also performed to identify single-exon and multi-exon deletions/duplications, as previously described.8 Briefly, quantitative values were obtained from the number of reads for each amplicon of each sample, extracted using the Coverage Analysis plugin on the Ion Torrent Browser (Life Technologies). The read number for each amplicon was normalised by dividing it by the total of the amplicon read numbers of a control gene from the same sample. The normalised read numbers obtained for each amplicon of a sample were then divided by the average normalised read number of control samples for the corresponding amplicon. Amplicons with copy number ratios of <0.7 and >1.3 were considered deleted and duplicated, respectively.
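The read-depth normalisation described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names, the dictionary layout and the amplicon identifiers are hypothetical, and only the two normalisation steps and the 0.7/1.3 cut-offs are taken from the text.

```python
from statistics import mean

DELETION_MAX = 0.7     # ratio below this value: amplicon considered deleted (from the text)
DUPLICATION_MIN = 1.3  # ratio above this value: amplicon considered duplicated (from the text)


def normalise(sample_reads, control_gene_amplicons):
    """Divide each target amplicon read count by the summed read count of the
    control-gene amplicons of the same sample."""
    control_total = sum(sample_reads[a] for a in control_gene_amplicons)
    if control_total == 0:
        raise ValueError("control gene has no reads in this sample")
    return {
        amplicon: count / control_total
        for amplicon, count in sample_reads.items()
        if amplicon not in control_gene_amplicons
    }


def copy_number_ratios(test_reads, control_samples_reads, control_gene_amplicons):
    """Divide the normalised value of each amplicon in the test sample by the
    average normalised value of the same amplicon across control samples."""
    test_norm = normalise(test_reads, control_gene_amplicons)
    controls_norm = [normalise(c, control_gene_amplicons) for c in control_samples_reads]
    ratios = {}
    for amplicon, value in test_norm.items():
        reference = mean(c[amplicon] for c in controls_norm)
        if reference == 0:
            raise ValueError(f"no control reads for amplicon {amplicon}")
        ratios[amplicon] = value / reference
    return ratios


def classify(ratio):
    """Apply the <0.7 and >1.3 cut-offs stated in the text."""
    if ratio < DELETION_MAX:
        return "deleted"
    if ratio > DUPLICATION_MIN:
        return "duplicated"
    return "normal"


if __name__ == "__main__":
    # Hypothetical read counts, for illustration only.
    control_gene = {"CTRL_a1"}
    test_sample = {"CTRL_a1": 1000, "DNMT3A_a1": 250, "SETD2_a1": 500}
    control_samples = [
        {"CTRL_a1": 1000, "DNMT3A_a1": 500, "SETD2_a1": 500},
        {"CTRL_a1": 2000, "DNMT3A_a1": 1000, "SETD2_a1": 1000},
    ]
    for amplicon, ratio in copy_number_ratios(test_sample, control_samples, control_gene).items():
        print(amplicon, round(ratio, 2), classify(ratio))
```

Normalising to a control gene within each sample removes differences in total sequencing yield between samples before the comparison with control samples is made.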

Variant annotation was performed using the phyloP software, which generates a score to evaluate evolutionary conservation, consistent with functional importance, and the PolyPhen-2 and Sorting Intolerant from Tolerant (SIFT) software, which provide in silico prediction of the possible impact of an amino acid substitution on the structure and function of a human protein. For exonic variants, putative modification of exonic splicing enhancers (ESE) was evaluated using the ESE finder software. Creation of a new splice site was evaluated using the Human Splicing Finder V.2.4.1 software.

Variants confirmation using Sanger sequencing

DNMT3A and SETD2 variants detected by targeted NGS were confirmed using Sanger DNA sequencing analysis performed on the corresponding exon, as previously described.8 Mutational screening was performed using bidirectional DNA sequencing of the purified PCR products with the ABI Big Dye terminator sequencing kit (Applied Biosystems) on an ABI Prism 3130 automatic DNA sequencer (Applied Biosystems). Sequences were aligned with Seqscape analysis software V.2.5 (Applied Biosystems). The primer oligonucleotide sequences and PCR conditions are available on request.

Protein structural modelling

To explore the potential impact of the four identified missense DNMT3A variants, we undertook protein structure modelling. DNMT3A contains three functional domains: a PWWP domain, an ADD domain and a C-terminal DNA methyltransferase (MTase) domain.16 Mutants were mapped onto the active and autoinhibited form of the DNMT3A-DNMT3L complex (Protein Data Bank (PDB) 4U7T and 4U7P). Bound DNA was positioned by superposing the crystal structure of bacterial cytosine methyltransferase HhaI in complex with DNA (PDB 1MHT) onto the DNMT3A MTase domain. The Histone bound PWWP domain was modelled by superposition of the DNMT3A PWWP domain (PDB 3LLR) with Hepatoma-derived growth factor 2 PWWP domain in complex with H3K79me3 peptide. Molecular graphics and analyses were performed with the UCSF Chimera package. Chimera is developed by the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco (supported by NIGMS P41-GM103311).17


NGS analysis

For a typical run of 24 samples, about 640 Mb were generated, corresponding to ∼3.5×10^6 reads. The mean read length was ∼178 bp. On average, for every sample, 99% of high quality sequencing reads (98% of bases) mapped to the reference genome. NGS analysis of the DNMT3A and SETD2 genes in the 210 samples generated 2958 variants after the bioinformatic calling step. Variants with a frequency >1% in the database of SNP (dbSNP)18 were filtered out. Only variants affecting evolutionarily conserved residues, as assessed with the phyloP software, were retained.
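As an illustration only, the calling thresholds given in the methods and the two filters described above can be written as a short Python sketch. The `Variant` record and its field names are hypothetical, and because the text does not state a phyloP cut-off, `min_phylop` is left as a required argument rather than given a value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    """Hypothetical record; field names are illustrative, not from the study."""
    name: str
    kind: str                         # "SNP" or "INDEL"
    allele_fraction: float            # fraction of reads carrying the variant
    depth: int                        # total sequencing depth at the position
    min_strand_depth: int             # lower of the forward and reverse strand depths
    dbsnp_frequency: Optional[float]  # None when the variant is absent from dbSNP
    phylop: float                     # conservation score


def passes_calling(v: Variant) -> bool:
    """Thresholds from the methods: allele frequency >=20%; depth >=6x for SNPs;
    depth >=15x and >=6x on either strand for In/Del."""
    if v.allele_fraction < 0.20:
        return False
    if v.kind == "SNP":
        return v.depth >= 6
    return v.depth >= 15 and v.min_strand_depth >= 6


def passes_filters(v: Variant, min_phylop: float) -> bool:
    """Discard variants with a dbSNP frequency >1%; keep conserved positions.
    min_phylop is an assumption: the text gives no numerical cut-off."""
    if v.dbsnp_frequency is not None and v.dbsnp_frequency > 0.01:
        return False
    return v.phylop >= min_phylop


def retained(variants, min_phylop: float):
    """Return the variants that pass both the calling thresholds and the filters."""
    return [v for v in variants if passes_calling(v) and passes_filters(v, min_phylop)]
```

Keeping the calling thresholds separate from the population and conservation filters makes it easy to see at which step a given variant is discarded.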

SETD2 and DNMT3A variant detection and causality investigation

Based on population data, functional prediction and genetic (inheritance vs de novo) evidence, the implication of variants in overgrowth syndrome was assessed according to recent guidelines.19 After annotation and filtering, NGS screening of DNMT3A and SETD2 identified seven different unreported heterozygous likely causal variants (ie, mutations) in 8/210 NSD1-negative index cases: p.Arg320*, p.Gln362*, p.Tyr365Cys, p.Asp529Asn, p.Val778Gly, and p.Arg882Cys in DNMT3A and p.His1762Leufs*26 in SETD2. All variations identified by NGS were confirmed using Sanger sequencing of the corresponding exons. Online supplementary figure S2 and table S1 present the six DNMT3A (two nonsense and four de novo missense) mutations and the SETD2 frameshift de novo mutation. The genotypes and clinical features of the eight carrier patients are summarised in table 1. Variants identified in our study and in the two previous studies7 ,8 were distributed over the entire SETD2 and DNMT3A coding sequences with no hotspots. Two DNMT3A variants were previously reported in the Catalogue of Somatic Mutations in Cancer (COSMIC) database (c.958C>T, p.Arg320* and c.2644C>T, p.Arg882Cys). We confirmed that the four DNMT3A missense variants p.Tyr365Cys, p.Asp529Asn, p.Val778Gly and p.Arg882Cys (the latter in two patients) were not found in either of the index cases' parents, indicating that these variants were de novo. The DNMT3A nonsense variant p.Gln362* was found to be inherited from the unaffected mother of T214: the ratio of reads with mutated sequence versus wild-type sequence was ∼17% with the NGS approach and Sanger sequencing confirmed that the mutant allele was less represented than the wild-type allele (see online supplementary figure S1), arguing for a somatic mosaic mutation in the unaffected mother. We were able to detect the mutation at a low level in her buccal cells (∼5%) and saliva (∼17%). The SETD2 frameshift variant p.His1762Leufs*26 was also found to be de novo.

Table 1

Phenotypic manifestations in patients with SETD2 and DNMT3A mutations

It is noteworthy that all the identified DNMT3A missense variants affected DNMT3A functional domains. One of the four DNMT3A missense variants (T144; p.Tyr365Cys) was located in the PWWP DNA methyltransferase conserved motif (PWWP). One missense variant was located in the ADD protein–protein interaction domain (T94; p.Asp529Asn) and the two other missense variants were located in the DNA MTase domain (T8, p.Val778Gly; T155 and T162, p.Arg882Cys). Protein structural modelling was used to explore the potential impact of the four identified missense DNMT3A variants (figure 1).

Figure 1

(A) Ribbon representation of the DNMT3A-DNMT3L dimer in active conformation (see Subjects and Methods section for PDB codes and modelling details). The PWWP, ADD and MTase domains of DNMT3A are coloured in orange, blue and cyan, respectively, DNMT3L is coloured in green and histone peptides are in magenta. Reported mutated residues are represented as yellow spheres. A substrate DNA oligomer is modelled on the complex. (B) Surface representation of the ADD domain in complex with an H3K4me0-containing peptide shows specific recognition of unmethylated lysine by Asp529 and Asp531. (C) Position of Val778 with respect to the active (full colour) and inactive (faded colour) conformation of the ADD domain. (D) Surface representation of the PWWP domain in complex with an H3K36me3-containing peptide.

In addition, seven likely neutral variants and two variants of unknown significance were identified in nine NSD1-negative index cases (see online supplementary table S2): seven missense SETD2 variants and two synonymous DNMT3A variants. SETD2 variants of unknown significance need further family and functional studies to help clarify their significance.

Patient phenotypes

All patients with DNMT3A mutations were sporadic cases. They showed postnatal tall stature (>+2 SD), and macrocephaly (>+2 SD). Overweight or obesity (body mass index scores range 27.4–29.4) was observed in older patients (>13 years: 4/4). Mild intellectual deficiency (ID) was present in all. Minor facial features were observed: horizontal eyebrows, down-slanting small palpebral fissures, marked philtrum with a peculiar ‘drop’ shape of the interpillar space, large nasal bridge and thick nasal alae (figure 2). One patient (T214) presented combined pituitary hormone deficiency (somatotroph, thyrotroph and gonadotroph deficiencies; normal prolactin; no corticotroph deficiency). Other patients had no evidence of endocrine anomalies and puberty was normal in three patients older than 13 years. Other features were café-au-lait spots (1/7), nystagmus (2/7), sensory-motor axonal neuropathy (1/7) and atrial septal defect (1/7). The single patient with a de novo truncating SETD2 mutation had neonatal hypotonia and feeding difficulties in the first months, and psychomotor retardation (able to walk at 21 months, first words at 3 years). At 12 years, he had ID, tall stature, macrocephaly, dysmorphy, café-au-lait spots and hyperchromic spots following the lines of Blaschko. He was able to speak complete sentences but did not have meaningful communication. He was in a special school for children with ID. The patient had a younger sister with tall stature (+2 SD), macrocephaly (+2.3 SD) and Blaschkoid skin spots on arms. However, this girl had no hypotonia and was able to walk at 15 months. She was able to speak in sentences at 3 years. She had mild learning difficulties but no ID (IQ 95). She was in a normal school and was very well adapted socially. She was not dysmorphic and did not carry the SETD2 mutation.

Figure 2

Minor dysmorphic features in DNMT3A-positive patients: horizontal eyebrows, large nasal bridge, thick nasal alae, small palpebral fissures, relatively high midface and marked philtrum with triangular interpillar space. The pointed chin and receding hairline observed in NSD1-positive patients were not present in the herein described patients with DNMT3A mutation.


We report here seven different likely causal variants (six in DNMT3A and one in SETD2) in eight unrelated patients (table 1). Patients with DNMT3A mutation showed a consistent although not specific overgrowth phenotype characterised by postnatal tall stature (≥+2SD), macrocephaly (≥+2SD), ID and minor facial features. This observation confirms the previous publication.7 Our observations showed that phenotypes associated with mutations in NSD1 and DNMT3A overlap, combining tall stature, macrocephaly and learning disabilities, as previously described.4–7 Whereas facial dysmorphism in NSD1-positive patients is frequently recognisable, we failed to observe a highly specific facial dysmorphism in patients with DNMT3A mutation. The pointed chin, dolichocephaly and prominent forehead with receding hairline commonly identified in patients with Sotos NSD1-mutation were not observed in patients with DNMT3A mutation who presented with only minor facial features (horizontal eyebrows, small palpebral fissures, thick nasal alae and marked philtrum with a ‘drop’ shape). One patient carrying a DNMT3A missense mutation (T162; p.Arg882Cys) presented a cardiac malformation (atrial septal defect). Atrial septal defect was reported in 2/13 patients with DNMT3A mutations in the original study.7 In our study, somatotroph, thyrotroph and gonadotroph deficiencies were observed in a patient carrying a nonsense DNMT3A mutation (T214; p.Gln362*). Other symptoms observed in patients with DNMT3A mutation were overweight or obesity in four patients older than 13 years (table 1), café-au-lait spots in one patient and behavioural or psychiatric symptoms (schizophrenia, aggressiveness) in two adult patients both with nonsense mutations.

Seven patients with mutations in SETD2 were previously reported.8 ,20 ,21 The two patients with de novo mutations described by Luscan et al8 had tall stature and macrocephaly but they were identified from an overgrowth cohort. They presented learning difficulties with no ID. They had socialisation difficulties and one of them had behavioural disorders (aggressiveness, temper outbursts). O'Roak et al20 reported two patients with de novo SETD2 mutations (one frameshift and one missense) and two patients with an inherited nonsense SETD2 mutation. They were identified by targeted sequencing of 44 candidate genes in 2446 autism spectrum disorder (ASD) probands. The four patients had autism or ASD but no data were available regarding their cognitive level, height or occipito-frontal circumference. Likewise, the phenotype of the carrier parents was not reported. Recently, Lumish et al21 reported a patient with a frameshift de novo SETD2 mutation. This patient showed ASD, ID and macrocephaly, but short stature was described. Obesity due to overeating was reported in the two patients with SETD2 mutations in the original study.8 In the present study, this phenotypic trait was not found in T56 carrying a SETD2 mutation, perhaps due to his young age. In the present study, the patient with a SETD2 mutation showed ID, macrocephaly and tall stature. His younger sister had tall stature and macrocephaly, but had no ID and did not carry the mutation. We could not therefore relate the overgrowth to the presence of a SETD2 mutation in this family. Conversely, the ID and difficulties in pragmatic language, as well as the dysmorphism, observed in the boy were not present in the non-carrier sister and could be related to the SETD2 de novo mutation. In conclusion, the phenotype associated with SETD2 mutation remains to be described more precisely, but social communication difficulties and ASD could be part of the clinical spectrum.
Like in Sotos syndrome, tall stature as well as ID could be frequent but inconstant features in SETD2-positive patients. Macrocephaly as well as autistic and other behavioural disorders remain to be confirmed as consistent features. Further large studies in cohorts of ID, ID with macrocephaly or patients with autism are needed in order to accurately describe the phenotype caused by SETD2 constitutional mutations. Likewise, screening for DNMT3A mutations in large cohorts of patients with ID with or without overgrowth will help to confirm if overgrowth is a major feature of patients with DNMT3A mutation. However, the relatively high number of DNMT3A-mutated patients in the cohort may suggest a real association between DNMT3A mutations and overgrowth.

Our NGS approach allowed the identification of one DNMT3A somatic and germline mosaic point mutation in the unaffected mother of the T214 index case carrying a nonsense mutation (p.Gln362*). Using targeted NGS, the mutant allele frequency in the unaffected mother of T214 was estimated at ∼17% in leucocyte DNA and ∼5% in buccal cell DNA. Mosaicism in small populations of mutation-bearing cells may result in this maternal asymptomatic presentation. The frequency of mosaicism in sporadically affected Sotos patients or in unaffected index cases' parents is currently still unknown.
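To give a sense of scale for these allele frequencies, the short Python sketch below converts a mutant allele fraction into an approximate fraction of mutation-bearing cells. This conversion is not part of the study: it assumes that every mutated cell is diploid and heterozygous at the locus, with no copy number change, so the cell fraction is simply twice the allele fraction. The function name is hypothetical.

```python
def mutant_cell_fraction(allele_fraction: float) -> float:
    """Approximate fraction of cells carrying a heterozygous mutation.

    Assumption (not from the study): mutated cells are diploid and
    heterozygous, so each contributes one mutant and one wild-type allele.
    """
    if not 0.0 <= allele_fraction <= 0.5:
        raise ValueError("allele fraction must lie between 0 and 0.5 under this model")
    return 2.0 * allele_fraction


if __name__ == "__main__":
    # Allele fractions reported in the text for the mother of T214.
    for tissue, fraction in (("leucocytes", 0.17), ("buccal cells", 0.05)):
        print(tissue, round(mutant_cell_fraction(fraction), 2))
```

Under this simplified model, the reported allele fractions would correspond to roughly one third of leucocytes and one tenth of buccal cells carrying the mutation.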

The overlapping phenotypes caused by NSD1, SETD2 and DNMT3A loss of function may be explained by common functional effects. NSD1 and SETD2 encode ‘writers’ of the H3K36me3 mark and DNMT3A encodes a ‘reader’ of the same chromatin mark. However, the ADD domain of DNMT3A can recognise another epigenetic mark: unmethylated lysine 4 of histone H3 (H3K4me0).16 H3K4me0 binding to the ADD domain was recently shown to stimulate DNMT3A to undergo a conformational change from an autoinhibitory form to an active form.16 DNMT3A Asp529 is both part of the ADD-MTase interface and forms a salt bridge to H3K4me0 (figure 1). It therefore participates in the specificity of the ADD domain for unmethylated lysine binding and the autoinhibition of the MTase activity. The DNMT3A p.Asp529Asn mutation (in patient T94) is located in the DNMT3A ADD domain and may hinder recognition of H3K4me0 by removing the charge on this residue (figure 1B).22 The mutation of the corresponding Asp to Ala in DNMT3L (an inactive paralogue of DNMT3A enzymes) was previously described to abolish Lys-peptide binding.23 The p.Asp529Ala mutation was also shown to release DNMT3A autoinhibition in vitro.16 DNMT3A Val778 is located in the MTase domain at the interface with the ADD domain and in a region contacting the MTase-ADD domain linker in the autoinhibited conformation (figure 1C). The local disruption in the DNMT3A p.Val778Gly mutation (in patient T8) was also predicted to influence binding of both the active and autoinhibited conformations of the ADD domain and indirectly modulate DNMT3A MTase activity. This protein structural modelling supported the likely causal effect of the de novo DNMT3A missense mutations, in particular the p.Tyr365Cys variant (figure 1). Tyr365 is located in the DNMT3A PWWP domain in the second α-helix of the C-terminal helical bundle (figure 1A, D).
Tyr365 is totally buried between the helix bundle and the β barrel and packs against the tryptophan (W305) of the PWWP motif. W305 forms part of the hydrophobic pocket that specifically binds trimethylated histone H3 lysine 36 (H3K36me3, purple in figure 1D). Since Cys is less bulky than Tyr, the p.Tyr365Cys variant could locally collapse the hydrophobic core, leading to a structural breakdown of this region or protein instability. This would in turn alter the H3K36me3 binding capability and/or specificity of this domain, disrupting reading of the epigenetic chromatin mark H3K36me3. However, the relative contribution of the H3K36me3 and H3K4me0 epigenetic marks to the Sotos-like phenotype remains to be determined by functional studies.

Our results suggest analysing NSD1, SETD2 and DNMT3A coding sequences in routine molecular diagnosis in patients with Sotos or ‘Sotos-like’ overgrowth syndromes. Targeted-NGS seems particularly suitable for this strategy, as it allows simultaneous sequencing of the three genes coding sequences and provides a quantitative aspect allowing exons copy number alteration (CNA) detection. The NGS coupled qualitative and quantitative analysis avoids the sequential implementation of different techniques for the identification of these two different types of alterations in three different genes. No SETD2 or DNMT3A exon CNAs were identified in our study while 5q35 microdeletions and partial NSD1 constitutional deletions account for approximately 15% of Sotos syndrome.24 NGS may also be the method of choice for NSD1, SETD2 and DNMT3A mosaic detection thanks to a higher analytic sensitivity compared with Sanger sequencing.

No molecular NSD1, SETD2 and DNMT3A coding sequence abnormality was found in 193/210 patients (92%). Genome-wide array-based comparative genomic hybridisation was performed in the majority of patients. Revising clinical features would be probably helpful to recognise other specific overgrowth syndromes like Beckwith-Wiedemann, Perlman (DIS3L2) and Simpson-Golabi-Behmel (GPC3) syndromes25–27 which could have been missed in some of these patients. Moreover, these negative index cases and index cases with likely neutral variants should be analysed using NGS to search for causal mutations in other overgrowth syndromes known genes: NFIX,28 EZH229 and EED.30 Finally an exome sequencing approach which has proven to be a useful and relevant method for the identification of disease-causing genes should be used.31 ,32 WES bioinformatic analysis may be driven by the recent involvement of chromatin mark alterations in overgrowth condition development. Using NGS approaches, de novo loss-of-function mutations in the SETD5 gene were recently identified to cause a genetic syndrome with intellectual disability and facial dysmorphism.33 ,34 SETD5 encodes a histone SET domain methyltransferase.

Defects in NSD1, SETD2 and DNMT3A have also been identified at the somatic level in several types of sporadic cancers. NSD1 hypermethylation was described as a predictor of poor outcome in high-risk neuroblastoma35 and somatic inactivation of SETD2 was found to be a common event in clear cell renal cell carcinoma with loss of H3K36me3 mark.36 Somatic mutations of DNMT3A were reported in many malignancies, including acute myeloid leukaemia (AML) with a ∼25% frequency.37 ,38 DNMT3A mutations were described to arise early in AML evolution leading to a clonally expanded pool of preleukaemic haematopoietic stem cells (from which AML evolves). Functional experiments have shown that DNMT3A loss impairs differentiation of haematopoietic stem cells, resulting in an increase in the number of such cells in the bone marrow.39 On mouse haematopoietic stem cells (HSCs), Dnmt3a loss of function confers a preleukemic phenotype.40 ,41 Preleukaemic HSCs carrying DNMT3A mutations were found in remission samples, indicating that they survive chemotherapy.42 One nonsense DNMT3A variant (patient T57; p.Arg320*) was previously described in AMLs.43 The p.Arg882Cys missense constitutional DNMT3A variant found in two patients (T155 and T162) is the most frequent DNMT3A somatic mutation in AML (190 occurrences in the COSMIC database, June 2015), and is associated with a poor prognosis. Functional studies have shown a decrease in DNMT3A methyltransferase activity in tumour cells with this p.Arg882Cys mutation.44 These data support the pathogenicity of this variant shown to be de novo in two overgrowth patients in the present study. 
This variant was not identified in any patients in the original description of constitutional DNMT3A mutations.7 Recently, DNMT3A somatic mutations were described in WES of leucocyte DNA from >30 000 individuals without blood disorders.45 ,46 These DNMT3A mutations are thought to be initiating events for haematological malignancies that remain in subclinical states for long periods.47

Alterations in NSD1, DNMT3A and SETD2 are involved in the development of inherited genetic disorders (constitutional mutation) and also sporadic cancers (somatic alterations). Although reporting bias may exist, it has been suggested that overgrowth syndromes were associated with differing, although overlapping, patterns of tumour formation.48 However, only limited evidence of cancer association with Sotos syndrome has been provided and thus far, none of the index cases in our study has developed malignancies. Current therapeutic advances targeting chromatin marks give hope that patients with genetic syndromes associated with abnormal chromatin marks, including rare disease like Sotos syndrome, may benefit from these therapies in the future.49 ,50


The authors gratefully acknowledge the generosity of the families in providing samples and clinical details for this study.



  • Contributors All authors of this manuscript fulfil the criteria of authorship. EB, AA, GB, MD-F, AG, DL, LL, SO, JP, SS, AB, PS-V, LB, CV-P and VC-D recruited patients and collected clinical information. NL performed protein structure analysis. IL, AL, CT, DV, AB-S and MC performed sequencing and NGS data analysis. LB, CT, MV and EP designed the study and wrote the manuscript.

  • Competing interests None declared.

  • Patient consent Parental/guardian consent obtained.

  • Ethics approval Local Institutional (Hospital) Ethics Committee.

  • Provenance and peer review Not commissioned; externally peer reviewed.
