

Non-hotspot-related breakpoints of common deletions in Sotos syndrome are located within destabilised DNA regions
R Visser (2), O Shimokawa (1), N Harada (1), N Niikawa (1), N Matsumoto (2)

(1) Department of Human Genetics, Nagasaki University Graduate School of Biomedical Sciences, Nagasaki, Japan
(2) Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan

Correspondence to: Dr Naomichi Matsumoto, Department of Human Genetics, Yokohama City University Graduate School of Medicine, Fukuura 3-9, Kanazawa-ku, Yokohama 236-0004, Japan


Background: Sotos syndrome (SoS) is a disorder characterised by excessive growth, typical craniofacial features, and developmental retardation. It is caused by haploinsufficiency of NSD1 at 5q35. There is a 3.0 kb recombination hotspot to which the breakpoints of around 80% of SoS patients with a common deletion can be mapped.

Objective: To identify deletion breakpoints located outside the SoS recombination hotspot.

Methods: A screening system for the directly orientated segments of the SoS LCRs was developed for 10 SoS patients with a common deletion who were negative for the SoS hotspot. Deletion-junction fragments were analysed for DNA duplex stability and their relation to scaffold/matrix attachment regions (S/MARs). These features were compared with the SoS hotspot and recombination hotspots of other genomic disorders.

Results: The breakpoint was mapped in four SoS patients, two with a deletion in the maternally derived chromosome. These breakpoint regions were located ∼2.5 kb, ∼9.6 kb, ∼27.2 kb, and ∼27.7 kb telomeric to the SoS hotspot and were confined to 164 bp, 46 bp, 256 bp, and 124 bp, respectively. Two of the regions were mapped within Alu elements. All crossover events were found to have occurred within or adjacent to a highly destabilised DNA duplex with a high S/MAR probability. In contrast, the SoS hotspot and other genomic disorders’ recombination hotspots were mapped to stabilised DNA helix regions, flanked by destabilised regions with high probability of containing S/MAR elements.

Conclusions: The data suggest that a specific chromatin structure may increase susceptibility for recurrent crossover events and thus predispose to recombination hotspots in genomic disorders.

  • AF4, ALL-1 fused chromosome 4
  • CMT1A, Charcot-Marie-Tooth disease type 1A
  • DLCR, distal low copy repeat
  • HNPP, hereditary neuropathy with liability to pressure palsies
  • LCR, low copy repeat
  • MLL, mixed lineage leukaemia gene
  • NAHR, non-allelic homologous recombination
  • NF1, neurofibromatosis type 1
  • NSD1, nuclear receptor binding, SET domain containing protein 1
  • PLCR, proximal low copy repeat
  • PSV, paralogous sequence variant
  • RB1, retinoblastoma 1 gene
  • SIDD, stress induced destabilisation duplex
  • S/MAR, scaffold/matrix attachment region
  • SMS, Smith-Magenis syndrome
  • SoS, Sotos syndrome
Keywords: Sotos syndrome; low copy repeats; homologous recombination; stress induced destabilisation duplex (SIDD)


Sotos syndrome (SoS; OMIM No 117550) is a congenital disorder characterised by overgrowth, distinctive craniofacial features, and various degrees of developmental delay.1 Aberrations of the nuclear receptor binding, SET domain containing protein 1 (NSD1) at 5q35 include intragenic mutations or submicroscopic whole gene deletions.2–9 Microdeletions are found in around 50% of the Japanese SoS patient population, while they account for about 10% of non-Japanese SoS patients.10 Recently, we showed that the 1.9 Mb common microdeletion is caused by homologous recombination between directly orientated segments (PLCR-B and DLCR-2B) of the proximal and distal low copy repeats (PLCR and DLCR).11 The unequal strand exchange region was limited to a 3.0 kb hotspot in which we mapped the breakpoints of 78.7% (37/47) of our Sotos patients with a common deletion. This major hotspot was recently confirmed by others.12 Similar analysis at a nucleotide level of recombination hotspots in other genomic disorders has identified, among others, regions of uninterrupted sequence homology, several sequence motifs, and raised GC content as hotspot features.11,13–16 However, these features are not consistent for all hotspots and, owing to the analytically difficult background of highly homologous LCRs, the number of identified hotspot related and non-hotspot-related breakpoints is limited. 
Other possible contributing factors—such as epigenetic alterations or specific chromatin structure—have been suggested.17 Interestingly, breakpoints of gross deletions were indeed found to coincide with non-B DNA conformations.18 Non-B DNA conformation could result in an increase in accessibility for cleavage enzymes or a weakened chemical stability of the DNA helix, or both.18 Recently, breakpoints of recurrent intragenic deletions of the retinoblastoma 1 (RB1) gene were located within a transition region between double stranded B-DNA and single stranded DNA.19 This region was adjacent to a strong scaffold/matrix attachment region (S/MAR).19 S/MARs are responsible for chromatin attachment to the nuclear matrix and for organisation of chromatin into loop domains.20 Thus chromatin organisation in relation to stability of the DNA duplex may be a contributing factor for hotspot predisposition in genomic disorders.

In this study, we screened the directly orientated regions within the Sotos LCRs in order to identify deletion breakpoints located outside the SoS recombination hotspot. The deletion junction fragments found were investigated at nucleotide level and compared with the SoS hotspot with regard to their locations, neighbouring structures, stability of the DNA helix (so called stress induced destabilisation duplex (SIDD)), and probability of containing an S/MAR element. Furthermore, the recombination hotspots of other genomic disorders were analysed for their SIDD and S/MAR profiles.



This study included 10 Japanese patients with Sotos syndrome who carry a common deletion but in whom the breakpoint could not be mapped to the SoS hotspot.3,11 Furthermore, available parental DNA of patients with newly identified breakpoints was analysed. The control group consisted of 50 healthy Japanese individuals. After informed consent, genomic DNA was obtained from peripheral blood cells or lymphoblastoid cell lines using standard methods. Experimental protocols were approved by the committee for ethical issues at Yokohama City University School of Medicine and by the committee for ethical issues on human genome and gene analysis at Nagasaki University.

Screening by long range polymerase chain reaction

Methods followed were similar to those described previously.11 In short, sets of primers with the forward primer specific for PLCR-B and the reverse primer specific for DLCR-2B were designed with the online version of Primer3. Primer sequences and product lengths are shown in table 1.

Table 1

 Primer sequences used for long range polymerase chain reactions

Amplification was tested on PLCR-B BAC-clone RP11-546L14 (GenBank accession number AC108509), DLCR-2B BAC-clone CTD-2515I1 (GenBank accession number AC118457), and genomic DNA from a normal individual. The annealing temperatures decisive for specific amplification of a possible deletion-junction fragment were determined experimentally. Long range polymerase chain reaction (PCR) was carried out using the GeneAmp XL PCR Kit (Applied Biosystems, Foster City, California, USA). Positive PCR products were amplified with nested primers and subsequently sequenced. For primer set 6, all products were first submitted to restriction with FspI to eliminate the amplified product of the normal DLCR-2B, and possible breakpoint-junction fragments were then sequenced. All nested primer sequences and conditions are available on request. Paralogous sequence variants (PSVs) (that is, nucleotide differences between the PLCR-B and DLCR-2B)22 were mapped to the PLCR-B and DLCR-2B according to the NCBI build 35 (May 2004) database.

Analysis of the deletion-junction fragments and recombination hotspots

The identified SoS deletion-junction fragments were analysed together with 3.0 kb of their flanking sequences. Repetitive sequence elements were identified with RepeatMasker. The sequences covering the recombination hotspots of deletions in neurofibromatosis type 1 (NF1; OMIM No 162200),14 of deletions in hereditary neuropathy with liability to pressure palsies (HNPP; OMIM No 162500) and its reciprocal duplications in Charcot–Marie–Tooth disease type 1A (CMT1A; OMIM No 118220),23–25 and of deletions in Smith–Magenis syndrome (SMS; OMIM No 182290) and its reciprocal duplications in dup(17)(p11.2p11.2),15,26 were obtained by use of the “PCR” and “Blat” functions on the UCSC homepage, containing the NCBI build 35 (May 2004 version). WebSIDD was used for the prediction of stress induced, duplex destabilised (SIDD) sites in double stranded DNA. Scaffold/matrix attachment regions were predicted with S/MAR-Wiz version 1.0. Both programs were run under default conditions.


Four primer sets were designed (table 1) and, in combination with the previously designed hotspot primer sets,11 nearly complete coverage of PLCR-B and DLCR-2B was achieved (fig 1, panels A and B). The remaining small gaps of ∼1.5 kb, ∼0.4 kb, and ∼0.6 kb (X, Y, and Z in fig 1B) were due to difficulties in obtaining amplification of these regions. SoS 85 and SoS 110 showed a ∼11.1 kb amplified product for primer set 4, while their respective parents were negative for the same reaction (fig 2A). This indicated a deletion-junction fragment. All 50 control samples were also negative for this product. Sequencing revealed a transition from PSVs mapped to PLCR-B to those mapped to DLCR-2B in both patients. The breakpoint region could be restricted to 164 base pairs (bp) for SoS 85 (between nucleotide positions 1319 and 1483) and to 46 bp for SoS 110 (between nucleotide positions 8517 and 8563) (fig 2C). PSVs at positions 8975 and 9460 for SoS 110 are likely to be polymorphisms as they were also mapped to PLCR-B in SoS 85 (data not shown). The breakpoints were located ∼2.5 kb and ∼9.6 kb telomeric to the SoS recombination hotspot for SoS 85 and SoS 110, respectively. SoS 4 and SoS 5 were, after restriction with FspI, positive for an ∼11.3 kb product, which indicated a breakpoint-junction fragment. Restriction with FspI was necessary as amplification of the normal DLCR-2B chromosome was also occasionally detected. Fifty normal controls were screened and in seven an amplified product could be obtained. However, none of the seven controls showed a breakpoint-junction fragment after restriction with FspI. Unfortunately, parental DNA could not be obtained. The breakpoint-junction fragments were sequenced and the crossover regions based on PSVs were confined to 256 bp for SoS 5 (between positions 5505 and 5761) and 124 bp for SoS 4 (between positions 6028 and 6152) (fig 2D).
The two breakpoint regions are located ∼27.2 kb and ∼27.7 kb telomeric to the SoS hotspot and are separated from each other by ∼0.3 kb. In SoS 5, an insertion of 4 nucleotides (GACA) was found at position 5594. This could indicate either the exact breakpoint location or a mere polymorphism.
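The PSV-based mapping above amounts to bracketing the crossover between the last variant that still matches PLCR-B and the first that matches DLCR-2B. A minimal Python sketch of that bookkeeping follows; the function name and the (position, origin) data layout are hypothetical, and only the flanking positions reported in the text are used as input.

```python
def breakpoint_interval(psv_calls):
    """Bracket a single crossover from PSV calls on a junction fragment.

    psv_calls: iterable of (position, origin) pairs, where origin is
    "PLCR-B" or "DLCR-2B" (hypothetical layout, not the authors' format).
    Returns (last PLCR-B position, first DLCR-2B position, size in bp).
    """
    proximal = [pos for pos, origin in psv_calls if origin == "PLCR-B"]
    distal = [pos for pos, origin in psv_calls if origin == "DLCR-2B"]
    if not proximal or not distal:
        raise ValueError("PSVs from both repeats are needed to bracket a crossover")
    last_plcr, first_dlcr = max(proximal), min(distal)
    if last_plcr >= first_dlcr:
        raise ValueError("PSV origins are interleaved; no single clean transition")
    return last_plcr, first_dlcr, first_dlcr - last_plcr


# Flanking PSV positions as reported in the text (position within the PCR product).
flanking = {
    "SoS 85": [(1319, "PLCR-B"), (1483, "DLCR-2B")],
    "SoS 110": [(8517, "PLCR-B"), (8563, "DLCR-2B")],
    "SoS 5": [(5505, "PLCR-B"), (5761, "DLCR-2B")],
    "SoS 4": [(6028, "PLCR-B"), (6152, "DLCR-2B")],
}
sizes = {patient: breakpoint_interval(calls)[2] for patient, calls in flanking.items()}

# Distance between the end of the SoS 5 region and the start of the SoS 4 region.
gap_sos5_to_sos4 = 6028 - 5761
```

Under these assumptions the interval sizes come out as 164, 46, 256, and 124 bp, and the SoS 5 and SoS 4 regions lie 267 bp (∼0.3 kb) apart. Variants that are suspected polymorphisms, such as positions 8975 and 9460 in SoS 110, would have to be excluded before calling the function, because it rejects interleaved origins.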

Figure 1

 (A) Schematic presentation of possible non-allelic homologous recombination (NAHR) events resulting in the common 1.9 Mb microdeletion in Sotos syndrome (SoS). Mechanisms of genomic rearrangements are reviewed in detail by Stankiewicz and Lupski.29 The upper part depicts the possible crossover in an interchromosomal or an intrachromosomal recombination event. The lower part shows an intrachromatid crossover event. The predicted deletion-junction fragment is shown with thick black lines. The segments within the proximal low copy repeat (PLCR) together with corresponding homologous counterparts in the distal low copy repeat (DLCR), are indicated with blocks and their respective letters. PLCR-B is represented twice in the DLCR (DLCR-1B and DLCR-2B). The arrows indicate the genomic orientation. (B) Presentation of the deletion-junction fragment of directly orientated PLCR-B and DLCR-2B. A shaded box indicates the SoS hotspot. Vertical arrows show the breakpoint location of SoS 85, SoS 110, SoS 5, and SoS 4. Bidirectional arrows above the fragment depict the genomic distances between the different breakpoint locations and the remaining parts. Horizontal black lines below the deletion-junction fragment show the schematic position and length of long PCR products with the used primer sets. The letters X, Y, and Z indicate existing gaps of ∼1.5 kb, ∼0.4 kb, and ∼0.6 kb, respectively. Cen: centromere; Tel: telomere.

Figure 2

 (A) Polymerase chain reaction (PCR) results for primer set 4 in patients SoS 85 and SoS 110 and their parents. Left lane: a 1 kb plus DNA ladder (Invitrogen, San Diego, California, USA). F 85: father of SoS 85; M 85: mother of SoS 85; F 110: father of SoS 110; M 110: mother of SoS 110. (B) PCR results for primer set 6 before and after restriction with FspI in SoS 4, SoS 5, DNA of a normal individual and clone CTD-2515I1. The detected ∼11.3 kb product in SoS 4 and SoS 5 indicates a breakpoint-junction fragment. Left lane: a 1 kb plus DNA ladder. (C) Paralogous sequence variants (PSVs) identified in the breakpoint regions of SoS 85 (upper) and SoS 110 (lower). Black boxes indicate PSVs of PLCR-B and white boxes show those of DLCR-2B. The PSVs as deposited in the NCBI build 35 (May version 2004) are shown above the respective patient’s PSVs. The position in bp indicates the position of the PSVs within the product amplified with primer set 4. (D) PSVs identified in the breakpoint regions of SoS 5 (upper) and SoS 4 (lower). The position in base pairs (bp) indicates the position of the PSVs within the product amplified with primer set 6 and after restriction with FspI. The grey boxes show the position of the four inserted nucleotides as found in SoS 5. DLCR, distal low copy repeat; PLCR, proximal low copy repeat; SoS, Sotos syndrome.

For SoS 5 and SoS 85, the breakpoint region was mapped to a sequence not related to any interspersed repeats. However, in SoS 85 a simple repeat (TA)n was found at position 1308–1319 and a LINE1 element was found in close proximity (5 bp telomeric). The crossover event for SoS 4 occurred within an Alu-Sx element and the region for SoS 110 overlapped partially with an Alu-Sg element. Among the breakpoint regions of the four patients, a translin motif (5′-GCCCWSSW-3′) was detected only once, in SoS 5. This motif was found at increased frequency in the SoS hotspot.11 Patients SoS 85 and SoS 110 were haplotyped previously and confirmed to carry a deletion in the maternally derived chromosome.30 The parental origin was unknown in SoS 4 and SoS 5. The deletion-junction fragment of SoS 85 arose through an intrachromosomal recombination event (fig 1A, upper and lower panels), while this is not known for SoS 110.30 The parents of SoS 85 and SoS 110 had a heterozygous inversion of the interval between the SoS LCRs.11 The father of SoS 5 also showed a heterozygous inversion, while the mother did not carry an inversion.11 Parental DNA could not be obtained for SoS 4.
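The translin motif is written in IUPAC degenerate notation, where W stands for A or T and S for C or G. As an illustration of how such a motif can be located on both strands, here is a minimal Python sketch; the function names are hypothetical, the test sequences are toy strings rather than patient data, and this is not the search procedure used in the study.

```python
import re

# IUPAC codes needed for common degenerate motifs, as regex character classes.
IUPAC_CLASSES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}
IUPAC_COMPLEMENT = {
    "A": "T", "T": "A", "C": "G", "G": "C",
    "W": "W", "S": "S", "R": "Y", "Y": "R",
    "K": "M", "M": "K", "N": "N",
}


def motif_regex(motif):
    """Translate an IUPAC motif into a regular expression."""
    return "".join(IUPAC_CLASSES[base] for base in motif.upper())


def reverse_complement_motif(motif):
    """Reverse complement of an IUPAC motif (W and S are self-complementary)."""
    return "".join(IUPAC_COMPLEMENT[base] for base in reversed(motif.upper()))


def find_motif(sequence, motif="GCCCWSSW"):
    """Return sorted (0-based start, strand) hits, including overlapping ones."""
    sequence = sequence.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement_motif(motif))):
        # A zero-width lookahead lets overlapping occurrences be reported.
        regex = re.compile("(?=" + motif_regex(pattern) + ")")
        hits.extend((match.start(), strand) for match in regex.finditer(sequence))
    return sorted(hits)


forward_hits = find_motif("AAGCCCACGTTT")   # toy sequence containing GCCCACGT
reverse_hits = find_motif("ACGTGGGC")       # reverse complement of GCCCACGT
```

Start coordinates are 0-based and refer to the given strand as written; a hit on the "-" strand marks where the reverse-complemented motif begins in the forward sequence.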

The results of the SIDD and S/MAR analysis based on the proximal sequences of the LCRs involved are shown in fig 3. Analysis of the distal sequences of the respective LCRs did not show any significant differences in SIDD and S/MAR profiles owing to the high homology of proximal and distal LCRs (data not shown). The breakpoints of SoS 85 and SoS 110 (fig 3, panels A and B, respectively) overlapped with DNA regions which are very susceptible to duplex destabilisation with G(x) values (that is, the energy needed to force the base pair at position x always to be unpaired)27 close to 0 kcal/mol. In concordance, the same regions showed increased S/MAR potential. The breakpoints of SoS 5 and SoS 4 were mapped to a transition region with an increased S/MAR potential and directly adjacent to destabilised DNA (fig 3C). The SoS hotspot region was mapped to a ∼4.8 kb segment of highly stabilised DNA without S/MAR potential (fig 3D). The recombination hotspots for NF1 (2.1 kb),14 for common 4 Mb deletions in SMS (∼8 kb),15 and for uncommon deletions in SMS (577 bp)26 also showed similar stretches of stabilised SIDD sites covering the hotspots, flanked by non-stable regions with a high S/MAR potential (fig 3, panels E, F, and G, respectively). The recombination hotspot in CMT1A and HNPP (557–741 bp)23–25 was mapped to a stabilised region, although this region also showed a slightly increased S/MAR potential (∼0.10) (fig 3H). However, the S/MAR potential of the corresponding sequence of the distal LCR was close to zero (data not shown).
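To make the comparison between breakpoint regions and a SIDD profile concrete, the following Python sketch summarises G(x) over an interval. It assumes the profile is already available as a list of per-base values in kcal/mol; the function name, the 2.0 kcal/mol cut-off, and the toy profile are all hypothetical choices for illustration and are not taken from WebSIDD or from the study.

```python
def summarise_interval(g_profile, start, end, destabilised_below=2.0):
    """Summarise G(x) over base pairs start..end (1-based, inclusive).

    g_profile[i] holds G(x) in kcal/mol for base pair i + 1.
    destabilised_below is an arbitrary illustrative cut-off, not a
    published threshold.
    """
    if not 1 <= start <= end <= len(g_profile):
        raise ValueError("interval lies outside the profile")
    window = g_profile[start - 1:end]
    min_g = min(window)
    return {
        "min_g": min_g,
        "mean_g": sum(window) / len(window),
        "destabilised": min_g < destabilised_below,
    }


# Toy profile: stable DNA (about 10.2 kcal/mol) around a short destabilised dip.
toy_profile = [10.2] * 5 + [0.3] * 3 + [10.2] * 4

overlapping = summarise_interval(toy_profile, 5, 7)   # touches the dip
stable_only = summarise_interval(toy_profile, 1, 4)   # stays in stable DNA
```

Using the minimum rather than the mean flags an interval as soon as it touches a destabilised site, which mirrors the "within or adjacent to" wording used for the breakpoints; a stricter definition would need a different summary statistic.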

Figure 3

 The stress induced destabilisation duplex (SIDD) energy profile (upper graph) and scaffold/matrix attachment region (S/MAR) potential (lower) of the proximal sequences of the low copy repeats (LCRs) involved are shown in relation to the breakpoint region in SoS 85 (A), SoS 110 (B), SoS 5 (C), and SoS 4 (C) (which are ordered based upon their genomic location in DLCR-2B (see fig 1B)), the SoS hotspot (D), NF1 hotspot (E), SMS hotspot (F), SMS hotspot for uncommon sized deletions (G), and the CMT1A/HNPP hotspot (H). Analysis of the distal LCR sequences showed similar patterns (data not shown). Horizontal bidirectional arrows between the SIDD and S/MAR profiles indicate the respective hotspots. Vertical bidirectional arrows show the breakpoint regions for SoS 85, SoS 110, SoS 5, and SoS 4, respectively. In the SIDD profile, the y axis shows the incremental free energy G(x) in (kcal/mol), which corresponds to the energy necessary to force the base pair at position x always to be open.27 For example, a G(x) value of 10.2 kcal/mol indicates unstressed stable DNA at that position. The y axis in the S/MAR profile indicates the normalised S/MAR association potential with values between 0 and 1.0. The higher the predicted value, the more likely it is that the corresponding region contains a S/MAR element. Both x axes show the sequence distances in base pairs. CMT1A, Charcot-Marie-Tooth disease type 1A; DLCR, distal low copy repeat; HNPP, hereditary neuropathy with liability to pressure palsies; NF1, neurofibromatosis type 1; SMS, Smith–Magenis syndrome; SoS, Sotos syndrome.


Common deletions in SoS are caused by non-allelic homologous recombination (NAHR) between directly orientated LCR segments.11 By use of long range PCR screening, we identified the first non-hotspot-related breakpoints in four SoS patients with a common microdeletion. These breakpoint locations are expected to have a low recurrence: first, because of the low frequency in our SoS patient population so far (2.1% (1/47) for each of the breakpoint regions of SoS 85 and SoS 110, and 4.3% (2/47) for the region containing the breakpoints of SoS 4 and SoS 5); and second, in the case of SoS 85 and SoS 110, because of a maternal deletion origin, while all other patients previously haplotyped carried the deletion in the paternally derived chromosome.30 A rare maternal deletion origin and uncommon breakpoint locations are suggestive of a sex biased recombination mechanism in these two patients, but investigations in larger populations are necessary for confirmation. It is, however, this low recurrence characteristic in all four patients, in combination with NAHR as the general underlying mechanism, which makes them interesting candidates for comparison with recombination hotspot features.
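The frequencies quoted above follow directly from the patient counts. A short Python check, using only the counts given in the text (47 patients with a common deletion, 37 mapped to the hotspot, and 1, 1, and 2 patients for the three non-hotspot regions); the variable names are illustrative:

```python
def percentage(count, total):
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


common_deletion_patients = 47
hotspot_mapped = 37
non_hotspot_regions = {
    "SoS 85 region": 1,
    "SoS 110 region": 1,
    "SoS 4 / SoS 5 region": 2,
}

hotspot_pct = percentage(hotspot_mapped, common_deletion_patients)
region_pct = {
    region: percentage(count, common_deletion_patients)
    for region, count in non_hotspot_regions.items()
}

# Patients whose breakpoint is still unmapped after this screen.
unmapped = common_deletion_patients - hotspot_mapped - sum(non_hotspot_regions.values())
```

This reproduces 78.7% for the hotspot, 2.1% and 4.3% for the non-hotspot regions, and leaves six patients without an identified breakpoint.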

In SoS 5 and SoS 85, similar to the hotspot, the breakpoints were not located within short repetitive DNA elements. However, a LINE1 element was found in close proximity in SoS 85. Active human LINE1 retrotransposons in vivo have been shown to induce genomic instability such as inversions, deletions, and recombination between L1 elements.31,32 However, because of truncating mutations (data not shown) within the two open reading frames necessary for retrotransposition, it is unlikely that the LINE1 element near the breakpoint of SoS 85 maintained its function. On the other hand, the location of LINE elements has also been proposed to be a marker for the localisation of S/MARs, as easily unwinding DNA might predispose to their integration within the genome.20 In SoS 4, the crossover event occurred in an Alu-Sx element and in SoS 110 the breakpoint region partially overlapped with an Alu-Sg element. Alu mediated illegitimate recombination is estimated to be responsible for ∼0.3% of all human genetic diseases.33 An Alu-Sq/x element was found in the hotspot of uncommon sized deletions in Smith–Magenis syndrome and the recombination interval was mapped to this element in four patients.26 However, this region was also characterised by a stabilised DNA helix and low S/MAR potential (fig 3G).

The breakpoints of the other six SoS patients could not be identified. Although our LCR specific long range PCR proved useful, it is possible that the remaining primer sets are not sensitive enough, as only detection of a true positive can confirm their sensitivity. In addition, possible polymorphisms at the primer sites, unknown complex rearrangements within the LCRs during deletion-junction formation, or a location of the breakpoints within the remaining gaps might have prevented detection.

To date, analysis of hotspot related and non-hotspot-related breakpoints has deepened our knowledge about the underlying causative mechanisms. LCRs in direct orientation with a high sequence identity are the necessary structures for rearrangement resulting in deletion and reciprocal duplication.34 Even higher sequence similarity is usually found within hotspots in combination with regions of uninterrupted sequence homology (∼300–500 bp), which are thought to be necessary for efficient recombination in mammalian cells.35,36 Several sequence motifs have been described, but neither has a common recombination initiating factor been found, nor have the identified motifs been confirmed in vivo.11,14,15,24,26 In the quest for an explanation for the exhibited preference for unequal recombination in a small region (for example, the 3.0 kb hotspot in SoS is only ∼5% of the total PLCR-B in size), other contributing factors seem likely. A specific chromatin structure was hypothesised to be such a determining factor.17,18 A susceptible conformation would possibly have increased accessibility for the double strand break and repair machinery and could thus predispose to a hotspot location.17

Human chromatin is organised in around 60 000 loop domains, which are periodically attached at their base to a supporting skeleton, the so called nuclear scaffold or matrix.20 This compartmentalisation of the genome has an important regulatory function in gene expression, DNA replication, and recombination.20,37 As S/MARs are essentially recombinogenic unpairing regions, a strong correlation has been found between two fundamentally different algorithms, S/MAR-Wiz and WebSIDD, with the latter detecting stress induced destabilised unwound DNA.38 In general, results of in silico analysis should be considered carefully. However, a good correlation between the output of the programs used and the results of in vitro experiments has already been confirmed.39

As many as 74% (23/31) of the breakpoints in the mixed lineage leukaemia (MLL) gene in de novo leukaemia patients were mapped to a breakpoint cluster region located between S/MARs.20 Furthermore, a clustering of breakpoints of t(4;11) translocations in the human MLL and AF4 (ALL-1 fused chromosome 4) genes was also found to be located outside high affinity S/MARs, but with flanking S/MARs in the vicinity.40 The hotspots for SoS, NF1, SMS/dup(17)(p11.2p11.2), and uncommon deletions in SMS, as investigated with these programs in this study, showed a similar pattern of stabilised DNA duplex regions, located between destabilised regions with a coinciding higher S/MAR probability. In contrast, the four non-hotspot-related breakpoints were found in or at the border of highly destabilised DNA with an increased S/MAR potential. The patterns of the hotspot for deletions and reciprocal duplications in HNPP/CMT1A did not fully match this picture. The hotspot still seemed to be located within a stabilised DNA helix, but an S/MAR potential of ∼0.10 was also found. As the S/MAR potential in the distal LCR was close to zero (data not shown), the meaning of such a slightly increased potential remains to be determined. The differences in DNA destabilisation profiles and in frequency of occurrence between breakpoints located in and outside the SoS hotspot seem to support the view that the centre for recombination is located in stabilised DNA regions and that regions with strand separation potential (that is, S/MARs) are likely to function as mediators.20 However, it should be noted that the previous data are based upon somatic events in leukaemia patients with translocations between different chromosomes.20,40 Currently only a limited number of genomic disorders could be used for analysis.
Therefore, future identification and analysis of other breakpoint clusters and non-hotspot-related breakpoints mediated by NAHR will possibly determine whether this analysis could be used in combination with other hotspot characteristics to predict possible recombination hotspot locations within LCRs.

In conclusion, the first identification of four non-hotspot-related breakpoints in SoS in comparison with the SoS and other recombination hotspots indicates that DNA duplex stabilisation and specific chromatin organisation might play a role in predisposition for recombination hotspot locations of genomic disorders.


We express our gratitude to the patients, their parents, and the referring physicians for their cooperation. We thank Ms Tamae Hanai and Ms Yasuko Noguchi for their excellent technical assistance. The study was supported by the Japan Science and Technology Agency (CREST), the International Consortium for Medical Care of Hibakusha and Radiation Life Science, and the 21st Century Center of Excellence (COE).




  • Competing interests: none declared
