Original article
Whole genome paired-end sequencing elucidates functional and phenotypic consequences of balanced chromosomal rearrangement in patients with developmental disorders
  1. Caroline Schluth-Bolard1,2,
  2. Flavie Diguet1,2,
  3. Nicolas Chatron1,2,
  4. Pierre-Antoine Rollat-Farnier1,
  5. Claire Bardel3,
  6. Alexandra Afenjar4,5,
  7. Florence Amblard6,
  8. Jeanne Amiel7,
  9. Sophie Blesson8,
  10. Patrick Callier9,
  11. Yline Capri10,
  12. Patrick Collignon11,
  13. Marie-Pierre Cordier1,
  14. Christine Coubes12,
  15. Benedicte Demeer13,
  16. Annabelle Chaussenot14,
  17. Florence Demurger15,
  18. Françoise Devillard6,
  19. Martine Doco-Fenzy16,
  20. Céline Dupont10,
  21. Jean-Michel Dupont17,
  22. Sophie Dupuis-Girod1,
  23. Laurence Faivre18,
  24. Brigitte Gilbert-Dussardier19,
  25. Anne-Marie Guerrot20,
  26. Marine Houlier7,
  27. Bertrand Isidor21,
  28. Sylvie Jaillard22,
  29. Géraldine Joly-Hélas23,
  30. Valérie Kremer24,
  31. Didier Lacombe25,
  32. Cédric Le Caignec21,
  33. Aziza Lebbar17,
  34. Marine Lebrun26,
  35. Gaetan Lesca1,2,
  36. James Lespinasse27,
  37. Jonathan Levy10,
  38. Valérie Malan28,
  39. Michele Mathieu-Dramard13,
  40. Julie Masson1,2,
  41. Alice Masurel-Paulet18,
  42. Cyril Mignot29,
  43. Chantal Missirian30,
  44. Fanny Morice-Picard25,
  45. Sébastien Moutton25,
  46. Gwenaël Nadeau27,31,
  47. Céline Pebrel-Richard32,
  48. Sylvie Odent15,33,
  49. Véronique Paquis-Flucklinger14,
  50. Laurent Pasquier15,
  51. Nicole Philip34,
  52. Morgane Plutino14,
  53. Linda Pons1,2,
  54. Marie-France Portnoï4,
  55. Fabienne Prieur26,
  56. Jacques Puechberty12,
  57. Audrey Putoux1,2,
  58. Marlène Rio7,
  59. Caroline Rooryck-Thambo25,
  60. Massimiliano Rossi1,2,
  61. Catherine Sarret35,
  62. Véronique Satre6,36,
  63. Jean-Pierre Siffroi4,
  64. Marianne Till1,
  65. Renaud Touraine26,
  66. Annick Toutain8,
  67. Jérome Toutain25,
  68. Stéphanie Valence5,37,
  69. Alain Verloes10,
  70. Sandra Whalen4,
  71. Patrick Edery1,2,
  72. Anne-Claude Tabet10,
  73. Damien Sanlaville1,2
  1. 1 Service de Génétique, Hospices Civils de Lyon, Bron, France
  2. 2 INSERM U1028, CNRS UMR5292, UCBL1, GENDEV Team, Neurosciences Research Center of Lyon, Bron, France
  3. 3 Cellule bioinformatique de la plateforme NGS, Hospices Civils de Lyon, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558, Lyon 1 University, Bron, France
  4. 4 Département de génétique et embryologie médicale, Centre de référence des déficiences intellectuelles de causes rares, AP-HP, Hôpital Armand Trousseau, Paris, France
  5. 5 GRC n°19, pathologies Congénitales du Cervelet-LeucoDystrophies, AP-HP, Hôpital Armand Trousseau, Sorbonne Université, Paris, France
  6. 6 Laboratoire de Génétique Chromosomique, Hôpital Couple Enfant, CHU Grenoble, Grenoble, France
  7. 7 Service de Génétique Médicale, Hôpital Necker-Enfants Malades, Paris, France
  8. 8 Service de Génétique, CHRU de Tours, Tours, France
  9. 9 Laboratoire de Cytogénétique, CHU Dijon, Dijon, France
  10. 10 Département de Génétique, Hôpital Robert Debré, Paris, France
  11. 11 Service de Génétique Médicale, CHI Toulon, Toulon, France
  12. 12 Service de Génétique, Hôpital Arnaud de Villeneuve, Montpellier, France
  13. 13 Centre d’activité de génétique clinique, CLAD nord de France, CHU Amiens, Amiens, France
  14. 14 Service de Génétique Médicale, CHU Nice, Nice, France
  15. 15 Service de Génétique Clinique, CHU Rennes, Rennes, France
  16. 16 Service de Génétique, EA3801, SFR CAP SANTE, CHU Reims, Reims, France
  17. 17 Laboratoire de Cytogénétique Constitutionnelle, APHP-HUPC site Cochin, Paris, France
  18. 18 Centre de référence anomalies du développement et syndromes malformatifs, FHU TRANSLAD et équipe GAD INSERM UMR1231, CHU Dijon-Bourgogne et Université de Bourgogne-Franche Comté, Dijon, France
  19. 19 Service de Génétique, EA3808, Université de Poitiers, CHU de Poitiers, Poitiers, France
  20. 20 Unité de Génétique Clinique, CHU de Rouen, Rouen, France
  21. 21 Service de Génétique Médicale, CHU-Nantes, Nantes, France
  22. 22 Laboratoire de Cytogénétique et de Biologie Cellulaire, CHU Pontchaillou, Rennes, France
  23. 23 Laboratoire de Cytogénétique, CHU de Rouen, Rouen, France
  24. 24 Laboratoire de Cytogénétique, CHU Strasbourg, Strasbourg, France
  25. 25 Service de Génétique Médicale, Hôpital Pellegrin, Université de Bordeaux, MRGM INSERM U1211, CHU Bordeaux, Bordeaux, France
  26. 26 Service de Génétique Clinique, Chromosomique et Moléculaire, CHU Hôpital Nord, Saint-Etienne, France
  27. 27 Laboratoire de Génétique Chromosomique, CH Général, Chambéry, France
  28. 28 Service de Cytogénétique, Hôpital Necker Enfants Malades, Paris, France
  29. 29 Département de Génétique; Centre de Référence Déficience Intellectuelle de Causes Rares, Groupe Hospitalier Pitié-Salpêtrière, APHP, Paris, France
  30. 30 Laboratoire de Génétique Chromosomique, Département de Génétique Médicale, AP-HM, Marseille, France
  31. 31 Service de Cytogénétique, CH Valence, Valence, France
  32. 32 Service de Cytogénétique Médicale, Hôpital Estaing, CHU Clermont-Ferrand, Clermont-Ferrand, France
  33. 33 CNRS, IGDR (Institut de Génétique et Développement de Rennes) UMR 6290, Université de Rennes, Rennes, France
  34. 34 Département de Génétique Médicale, Unité de Génétique Clinique, AP-HM, Marseille, France
  35. 35 Service de Génétique Médicale, Hôpital Estaing, CHU Clermont-Ferrand, Clermont-Ferrand, France
  36. 36 Equipe Génétique, Epigénétique et Thérapies de l’Infertilité, IAB, INSERM 1209, CNRS UMR5309, Grenoble, France
  37. 37 Service de Neurologie Pédiatrique, Hôpital Armand Trousseau, APHP, GHUEP, Paris, France
Correspondence to Dr Caroline Schluth-Bolard, Service de Génétique, Centre de Référence des Anomalies du Développement, Centre Hospitalier Universitaire de Lyon, Bron cedex 69677, France; caroline.schluth-bolard{at}chu-lyon.fr

Abstract

Background Balanced chromosomal rearrangements associated with abnormal phenotype are rare events, but may be challenging for genetic counselling, since molecular characterisation of breakpoints is not performed routinely. We used next-generation sequencing to characterise breakpoints of balanced chromosomal rearrangements at the molecular level in patients with intellectual disability and/or congenital anomalies.

Methods Breakpoints were characterised by a paired-end low depth whole genome sequencing (WGS) strategy and validated by Sanger sequencing. Expression study of disrupted and neighbouring genes was performed by RT-qPCR from blood or lymphoblastoid cell line RNA.

Results Among the 55 patients included (41 reciprocal translocations, 4 inversions, 2 insertions and 8 complex chromosomal rearrangements), we were able to detect 89% of chromosomal rearrangements (49/55). Molecular signatures at the breakpoints suggested that DNA breaks arose randomly and that there was no major influence of repeated elements. Non-homologous end-joining appeared as the main mechanism of repair (55% of rearrangements). A diagnosis could be established in 22/49 patients (44.8%), 15 by gene disruption (KANSL1, FOXP1, SPRED1, TLK2, MBD5, DMD, AUTS2, MEIS2, MEF2C, NRXN1, NFIX, SYNGAP1, GHR, ZMIZ1) and 7 by position effect (DLX5, MEF2C, BCL11B, SATB2, ZMIZ1). In addition, 16 new candidate genes were identified. Systematic gene expression studies further supported these results. We also showed the contribution of topologically associated domain maps to WGS data interpretation.

Conclusion Paired-end WGS is a valid strategy and may be used for structural variation characterisation in a clinical setting.

  • whole genome sequencing
  • chromosomal rearrangements
  • intellectual disability
  • position effect
  • structural variation

Introduction 

Structural variations (SV) are rearrangements of chromosome architecture that may be benign or pathogenic. They were first identified on the basis of karyotype in 0.7% of the population.1 Two groups are distinguished: unbalanced chromosomal rearrangements, such as deletions or duplications, generally associated with an altered phenotype, and apparently balanced chromosomal rearrangements (ABCR), such as reciprocal translocations, inversions and insertions. The latter category is characterised by the absence of gain or loss of genetic material and usually has no phenotypic consequence for the carrier, except reproductive issues such as infertility or miscarriages.1 However, it has been estimated that up to 27% of these ABCR may be associated with an abnormal phenotype.2 3 Until recently, the pathophysiology was poorly understood and these cases have long been a challenge for genetic counselling. The development of molecular cytogenetic techniques, such as FISH4 and chromosomal microarray (CMA),5 contributed to deciphering the underlying aetiologies of abnormal phenotypes in ABCR. In approximately 40% of cases, pathogenic CNVs were uncovered.6 In some other cases, the phenotype could be related to gene disruption7 or to position effect,8 a mechanism in which the modification of the chromatin environment alters gene expression. However, the precise characterisation of ABCR breakpoints is time-consuming and rarely proposed in diagnostic settings.

More recently, a further step has been taken by the use of next-generation sequencing that has proved to be a rapid and efficient method to detect SV, including CNVs and balanced SVs.9 It was used to characterise the breakpoints of ABCR in patients with developmental disorders.10–16 By allowing the refinement of SV breakpoints at base pair level, these studies shed light on some mechanistic and pathophysiological aspects. In particular, they showed the unexpected complexity of ABCR,11 the role of repeated elements13 14 and the predominance of non-homologous end-joining (NHEJ).11–13 17 They also assessed their impact on genome architecture, especially on topologically associated domains (TADs)12–14 and their involvement in patient phenotype.13 14 16 However, studies remain scarce and, with a few exceptions, have included a small number of cases.2 11–14 16 In addition, these studies used mainly mate-pair or similar strategies11–14 16 that were developed for SV detection and are not available in all laboratories. Paired-end sequencing is most widely used in genetics laboratories and may have other applications.18 We previously demonstrated that paired-end whole genome sequencing (WGS) was also powerful for SV detection.19 In the study presented herein, we applied this strategy to one of the largest cohorts of patients with intellectual disability and/or multiple congenital anomalies (ID/MCA) associated with ABCR and no pathogenic CNV detected by CMA in order to characterise ABCR breakpoints and study their mechanisms as well as their functional and phenotypic consequences.

Materials and methods

Patients

Fifty-five patients were recruited between April 2015 and February 2017 among 21 French clinical genetics centres as part of the ANI project (ABCR characterisation by next-generation sequencing in patients with ID/MCA, ClinicalTrials NCT02451761). Inclusion criteria were the following: i) abnormal phenotype including intellectual disability and/or congenital anomalies; ii) presence of ABCR on standard karyotype; iii) no pathogenic CNVs detected by CMA (according to French CMA Guidelines V.3.1); iv) ABCR de novo or inherited from a parent presenting the same phenotype. All patients or their parents gave written informed consent for this study, which was conducted with respect to the recommendations of the Helsinki Declaration. All patients had a detailed clinical examination by a trained geneticist.

Whole genome sequencing

Genomic DNA was extracted from EDTA-blood samples with the QIAamp DNA Blood Midi Kit (Qiagen, Venlo, The Netherlands) according to the manufacturer’s instructions. Genomic DNA libraries of 350 bp fragments were prepared following the Illumina TruSeq DNA PCR-free protocol (Illumina, San Diego, California, USA) with 3 µg DNA. DNA libraries were sequenced on an Illumina NextSeq 500 as paired-end 101 bp reads using the High Output (300 cycles) NextSeq500 kit, with two patients per flow-cell yielding a mean sequencing depth between 5.53X and 17.91X. For patient MD/0110, carrying a mosaic translocation, a complete flow cell was used with a 24.9X mean sequencing depth. Image analysis and base calling were performed using Illumina Real-Time Analysis Pipeline 2 and bcl2fastq with default parameters. For each sample, the reads were aligned against the hg19 version of the human genome using BWA-MEM V.0.7.10.20 The reads were then sorted using Samtools V.1.3.1,21 and the duplicates removed by PicardTools V.1.138. Then, SV were detected using BreakDancer V.1.4.5.22 The generated outputs were then annotated using an in-house programme, Svagga (Rollat-Farnier et al, https://gitlab.inria.fr/NGS/svagga). For each sample, we used the other samples of the ANI project as reference, in order to remove recurrent variants. A distance of 500 nucleotides was used as the maximal distance for Svagga to consider two breakpoints as identical. Integrative Genomics Viewer V.2.323 was used for the SV visualisation and validation. In case of failed variant identification, or discrepancy with the corresponding karyotype, the same pipeline was used but with an alignment against the hg38 version of the human genome.
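The recurrent-variant filter described above can be illustrated with a minimal Python sketch. It is not the Svagga implementation: the `Breakpoint` type, the function names, and the rule that both ends of a call must match a reference call (in either order) are assumptions made for illustration. Only the 500-nucleotide threshold comes from the text.

```python
from typing import List, NamedTuple, Tuple


class Breakpoint(NamedTuple):
    chrom: str
    pos: int


# An SV call is represented here as a pair of breakpoints (hypothetical layout).
SVCall = Tuple[Breakpoint, Breakpoint]

MAX_DISTANCE = 500  # nucleotides; threshold stated in the text


def same_breakpoint(a: Breakpoint, b: Breakpoint, max_distance: int = MAX_DISTANCE) -> bool:
    """Two breakpoints are treated as identical if on the same chromosome and close enough."""
    return a.chrom == b.chrom and abs(a.pos - b.pos) <= max_distance


def same_sv(a: SVCall, b: SVCall, max_distance: int = MAX_DISTANCE) -> bool:
    """Assumption: two calls match when both ends match, in either order."""
    (a1, a2), (b1, b2) = a, b
    direct = same_breakpoint(a1, b1, max_distance) and same_breakpoint(a2, b2, max_distance)
    swapped = same_breakpoint(a1, b2, max_distance) and same_breakpoint(a2, b1, max_distance)
    return direct or swapped


def remove_recurrent(sample_calls: List[SVCall], reference_calls: List[SVCall],
                     max_distance: int = MAX_DISTANCE) -> List[SVCall]:
    """Keep only the calls of one sample that are absent from the other samples."""
    return [call for call in sample_calls
            if not any(same_sv(call, ref, max_distance) for ref in reference_calls)]
```

In this sketch, `reference_calls` would hold the pooled calls of the other samples in the cohort, so that a variant seen in several unrelated patients is discarded as recurrent.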

PCR amplification of junction fragment and Sanger sequencing

Each breakpoint identified by WGS was confirmed by PCR and Sanger sequencing. Junction fragments were amplified using the Taq DNA Core kit 10 (MP Biomedicals, Solon, Ohio, USA), from patient DNA. DNA from an individual who was not a carrier of chromosomal rearrangement was included as negative control; amplification with the primer pair for the ATP1A3 gene (exons 7–8) was used as positive control. Specific PCR products corresponding to the junction fragments were sequenced by the Sanger method. Breakpoint sequences were aligned to the reference genome hg19 using the BLAT tool (UCSC) in order to determine the breakpoint coordinates. For some complex rearrangements, FISH on lymphocyte metaphase spreads was performed according to standard procedures.

CNV validation by qPCR

CNVs associated with the chromosomal rearrangement were confirmed by real-time qPCR, performed according to the manufacturer’s recommendations with the QuantiTect SYBR Green PCR kit (Qiagen, Courtaboeuf, France) on a LightCycler 2000 (Roche Applied Science, Indianapolis, Indiana, USA) using specific primers amplifying a unique sequence within the CNV and ADORA2B as reference gene.

Expression studies

Blood RNA was extracted from Paxgene samples according to the manufacturer’s instructions (Qiagen). Lymphoblastoid cell line (LCL) RNA was extracted using the RNeasy plus mini kit (Qiagen). RT was performed using 500 ng RNA with the Superscript II RT kit (Invitrogen, Life Technologies, Carlsbad, California, USA). Real-time PCR was performed using 1/20 dilutions of cDNAs with the QuantiTect SYBR Green PCR kit (Qiagen) on a LightCycler 2000 (Roche Applied Science) in triplicate. Primers were designed for each disrupted gene and genes adjacent to the breakpoints; ACTB was used as reference gene. Samples included patient and two to four sex-matched and tissue-matched control cDNAs, as well as RT-negative controls. Data were analysed by relative quantification according to the 2^(−ΔΔCt) method.24
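As an illustration of the relative quantification step, the following Python sketch computes a 2^(−ΔΔCt) ratio for one target gene against a reference gene (ACTB in this study). The function name, argument layout and the choice to average control ΔCt values are assumptions for illustration, not a description of the authors' analysis script.

```python
from statistics import mean
from typing import Sequence


def relative_expression(ct_target_patient: Sequence[float],
                        ct_ref_patient: Sequence[float],
                        ct_target_controls: Sequence[float],
                        ct_ref_controls: Sequence[float]) -> float:
    """Return the 2^(-ddCt) expression ratio of a patient relative to controls.

    Patient arguments are replicate Ct values (e.g. a triplicate).
    Control arguments hold one mean Ct per control individual, in the same order.
    """
    # dCt = Ct(target gene) - Ct(reference gene)
    delta_ct_patient = mean(ct_target_patient) - mean(ct_ref_patient)
    delta_ct_controls = mean(t - r for t, r in zip(ct_target_controls, ct_ref_controls))
    # ddCt = dCt(patient) - dCt(controls)
    delta_delta_ct = delta_ct_patient - delta_ct_controls
    return 2 ** (-delta_delta_ct)
```

With this definition, a ΔΔCt of 1 (the target amplifies one cycle later in the patient than in controls, after normalisation) gives a ratio of 0.5, the value expected when one of two alleles is not expressed.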

For fusion transcript amplification, primer pairs were chosen on each side of the putative mRNA junction. The fusion was then amplified by PCR from cDNA with Taq DNA Core kit 10 (MP Biomedicals).

All primer sequences are available on request.

Determination of breakpoints characteristics

ABCR were classified according to the number of breakpoints assessed by WGS into simple rearrangements (two breakpoints), complex rearrangements (3 to 10 breakpoints) and chromoanagenesis (>10 breakpoints with clustering of the breakpoints).11 Disruption of genes, TADs and repeated elements was established according to UCSC (UCSC genes track), 3D Genome Browser (GM12878 cells25) and RepeatMasker, respectively. Molecular signatures included deletion, duplication, microhomology defined as a series of nucleotides (<70) that were identical at the junctions of the two genomic segments that contributed to the rearrangement26 and templated insertions defined as insertions originating from nearby segments that contributed to the rearrangement.27 The mechanisms were defined at the junction level according to the following criteria. Non-homologous end-joining was defined by the presence of blunt ends, small deletions or duplications, microhomology not exceeding 4 nucleotides, insertions of <10 nucleotides.28 Microhomology-mediated end-joining was considered in junctions showing microhomology, deletions of >10 nucleotides and templated insertion of >10 nucleotides.28 Replicative mechanisms (including microhomology-mediated break-induced replication [MMBIR] and fork stalling and template switching [FoSTeS]) were considered in junctions showing microhomology, templated insertions and possible gains or losses of nucleotides.26 Pathogenicity of ABCR was assessed according to the criteria used by Redin et al.13 For position effect, genes located in the same TAD as the breakpoint were taken into account.
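The breakpoint-count classification above can be written as a small decision rule. The Python sketch below is illustrative only: the function name and the `clustered` flag are hypothetical, and the text does not say how a rearrangement with more than 10 unclustered breakpoints would be labelled, so that case is returned as "unclassified" by assumption.

```python
def classify_rearrangement(n_breakpoints: int, clustered: bool = False) -> str:
    """Classify a rearrangement from its number of WGS-defined breakpoints.

    clustered: whether the breakpoints cluster together (how clustering is
    assessed is not specified here and is left to the caller).
    """
    if n_breakpoints < 2:
        raise ValueError("a rearrangement needs at least two breakpoints")
    if n_breakpoints == 2:
        return "simple"
    if n_breakpoints <= 10:
        return "complex"
    # More than 10 breakpoints: chromoanagenesis requires clustering.
    # The unclustered case is not defined in the text (assumption).
    return "chromoanagenesis" if clustered else "unclassified"
```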

Statistical analyses

Statistical analyses were performed using the R software V.3.3.3.29 The proportions of breakpoints disrupting genes, located in TADs or in repeated elements, were compared with reference proportions calculated from the following databases: UCSC genes, 3D Genome Browser (GM12878 cells25) and RepeatMasker using a standard one sample test to compare proportions. The distribution of breakpoints in the different classes of repeated element was compared with the proportion of the reference genome in the different classes of repeated elements using a χ² ‘goodness-of-fit’ test. Five classes were studied: SINE, LINE, LTR, DNA and satellite+segmental duplications. The correlation between the expression level and the absolute value of the distance to the breakpoint was estimated and tested with the R package rmcorr.30 The R package nlme was used to compare the mean expression level of the genes located in the same TAD as the breakpoint, in an adjacent TAD or in an inter-TAD region.
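To make the one-sample comparison of proportions concrete, here is a minimal Python sketch of a two-sided z-test using the normal approximation. It is an assumption that this corresponds to the "standard one sample test" mentioned above: the exact R call is not given, and R's `prop.test`, for example, applies a continuity correction by default and would return slightly different p values. The function name and the reference proportion passed in are hypothetical.

```python
import math
from typing import Tuple


def one_sample_proportion_test(successes: int, n: int, p0: float) -> Tuple[float, float]:
    """Two-sided z-test of an observed proportion against a reference proportion p0.

    Returns (z statistic, p value) under the normal approximation.
    """
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1.0 - p0) / n)
    z = (p_hat - p0) / standard_error
    # Two-sided p value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

A call such as `one_sample_proportion_test(95, 218, p0)` would test the observed fraction of gene-disrupting breakpoints against a genome-wide reference proportion `p0`, which has to be computed separately from the annotation database.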

Results

Population characteristics

In the present study, 55 patients were included; there were 23 males and 32 females (sex ratio 0.7). They were aged between 1 and 44 years (mean 13.1 years, SD 9.6). They all presented with an abnormal karyotype (figure 1). In 49 patients, ABCR arose de novo; in 6 patients, ABCR was familial and cosegregated with the phenotype. Eighty-seven per cent of the patients (48/55) presented with isolated or syndromic intellectual disability. Seven patients (13%) had congenital malformations without intellectual disability. Details of patients' phenotype and karyotype are available in online supplementary table S1 and figure S1.

Figure 1

Study flow chart. ID, intellectual disability; CCR, complex chromosomal rearrangement; VUS, variants of unknown significance; WGS, whole genome sequencing.

Yield of paired-end whole genome sequencing for breakpoint characterisation

In order to characterise the breakpoint sequences of these rearrangements, a paired-end WGS strategy was applied. It allowed the detection of the rearrangements in 49/55 patients (89%) (figure 1, online supplementary data S1 and table S2). Chromosomal rearrangements were detected with the standard pipeline in 46 patients, including a mosaic translocation (MD/0110) and rearrangements involving alpha-satellite sequences (MD/0104) or segmental duplications (JP/0107, IL/1901). A second analysis (including hg38 alignment and/or focused analysis on breakpoint region defined by karyotype) was required to detect rearrangements in three additional patients (CM/0103, VD/2401, JE/1401). For two patients carrying complex SV (NM/0201, KT/1403), WGS was not able to resolve the entire complexity of the breakpoints. In six patients, WGS did not detect the chromosomal rearrangements (EB/0501, MG/1001, ML/1402, VL/1102, LD/0108, PS/0502). In all these problematic cases, at least one chromosomal breakpoint, as defined by karyotype, involved highly repetitive sequences or sequence gaps: short arms of acrocentric chromosomes (CM/0103, MG/1001, ML/1402, KT/1403), alpha-satellite regions (PS/0502), constitutive heterochromatin (EB/0501), subtelomeric regions (VL/1102, LD/0108) or segmental duplications (VD/2401, JE/1401, NM/0201).

In 9/49 patients, additional CNVs (loss and/or gain) were detected at the breakpoint. The CNV size ranged from 2 to 71 kb, which is smaller than the resolution of CMA in diagnostic setting. They were all confirmed by real-time qPCR (online supplementary table S3).

Moreover, WGS showed an additional level of complexity of SVs compared with karyotype. WGS detected 218 breakpoints whereas 119 breakpoints were expected from karyotype (+83.2%). Fifteen patients presented a complex SV (at least three breakpoints) after WGS (figure 1). For 14 of them, the level of complexity was unexpected (28.5% patients). The various degrees of complexity included a small inverted fragment at the breakpoints, cryptic insertions and chromoanagenesis events (figure 2 and online supplementary figure S2-S3). Overall, the karyotype breakpoints were revised in 29/49 patients (59%).

Figure 2

Characterisation of a reciprocal translocation t(1;14)(q32;q22) uncovered additional complexity that accounted for the patient’s phenotype (OL/2202). (A) Schematic representation of the rearrangement according to WGS result: insertion of chromosomal fragments of 153 and 736 kb from the 5q14.3 region to the breakpoint of derivative 1 and to the breakpoint of derivative 14, respectively. The black star indicates the localisation of MEF2C. Confirmation of the insertions by FISH on metaphase spread. (B) Insertion of 5q14.3 region in the derivative 1: RP11-484D1(5q14.3) (rhodamine) (BlueFish), 1pter (FITC) (Cytocell), 14qter (Texas Red) (Cytocell). (C) Insertion of 5q14.3 region in the derivative 14: RP11-109H16 (5q14.3) (rhodamine) (RainbowFish), RP11-120I18 (14q12) (FITC), 1qter (Texas Red) (Cytocell). (D) Three-dimensional (3D)-genome map at the 5q14.3 locus derived from Hi-C data of GM1287825 (10 kb resolution). TAD, topologically associated domain (3D-genome browser); regulatory elements for MEF2C: defined according to GeneHancer, enhancer in grey, promoter in red; interactions between regulatory element and genes according to GeneHancer. Yellow bars indicate patients’ breakpoints in the 5q14.3 locus. For patients OL/2202 and EB/0401, breakpoints disrupt interaction between enhancer and MEF2C. For patient MD/2203, the proximal breakpoint disrupts the MEF2C gene whereas the distal breakpoint has no consequence on regulatory elements. (E) Expression of MEF2C in blood cells for patients OL/2202 and EB/0401 compared with healthy controls. (F) Expression of MEF2C in lymphoblastoid cell line for patient MD/2203 compared with healthy controls.

Genomic features at the breakpoints

Different genomic features involved in SVs were assessed, including genes, TADs and repeated elements (table 1). Ninety-five of the 218 breakpoints disrupted a gene (43.5%). This proportion did not significantly differ from the proportion of genes in the reference genome hg19 (p=0.8374). Eighty-three different genes were disrupted, one gene being disrupted in two different patients (MBD5) and nine genes being disrupted by multiple breakpoints in the same patient (BC015590, CAMK1D, CCDC3, MEF2C, KANSL1, TENM2, CELF2, TTC23, PKN2, GBP3). A mean of 1.6 genes were disrupted per patient (0–7 genes per patient). TAD disruption was observed in 192/218 breakpoints (88%). This was not statistically different from the TAD proportion in the genome (p=0.2155). Seven TADs were disrupted in a recurrent manner, in at least two patients (chr2:148675000_149850000, chr5:88000000_90100000, chr10:79600000_82050000, chr14:46950000_49825000, chr14:94100000_94675000, chr14:97425000_99900000, chr15:35800000_38200000, hg19). Repeated elements were involved in 110/218 breakpoints (50.4%), which was not significantly different from the proportion of repeated sequences in the genome (p=0.4227). Similarly, the distribution of breakpoints among the different families of repeated elements did not significantly differ from their distribution in the genome (p=0.607).

Table 1

Genomic features disrupted by chromosomal breakpoints

Thus, all these results suggest that DNA breaks at the origin of chromosomal rearrangements arose randomly and that there was no major influence of DNA architecture or repeated sequences.

Molecular signatures and mechanism of ABCR

We also studied the molecular signatures at junction sequences, including deletion, gain, insertion and microhomology, in order to infer the underlying mechanism at the origin of the rearrangements (table 2). Among the 49 rearrangements, 57.1% (28/49) were probably due to NHEJ and 10.2% (5/49) were likely to be the result of microhomology-mediated mechanisms, including replicative mechanisms, such as FoSTeS and MMBIR,31 and microhomology-mediated end-joining (MMEJ).32 It was not possible to attribute 20.4% of cases (10/49) to a specific mechanism. Interestingly, 12.2% of rearrangements (6/49) may have combined both NHEJ and microhomology-mediated mechanisms (online supplementary table S4).

Table 2

Molecular signatures at junction sequences

Phenotypic consequences of chromosomal rearrangements

We addressed the question of the contribution of ABCR to the phenotype of patients. To assess the pathogenicity of ABCR, we used the classification proposed by Redin et al.13 Sixteen ABCR were considered as pathogenic, 7 as likely pathogenic and 26 as variants of unknown significance (VUS) (table 3). Pathogenic and likely pathogenic ABCR accounted for the phenotype in 22 patients (44.8%). In patient DM/0109, disruption of PRDM16 was considered as a secondary finding as it did not account for his phenotype but may be responsible for left ventricular non-compaction (MIM615373). Fifteen patients with pathogenic ABCR presented disruption of well-known disease genes (KANSL1, FOXP1, SPRED1, MEIS2, MBD5, DMD, MEF2C, NRXN1, NFIX, AUTS2, SYNGAP1, GHR) or more recently described genes (TLK2, ZMIZ1).33 34 In 11 of these patients, expression studies could be performed and further supported the role of the putative causative gene in 7 patients (IS/0101, CL/2001, SL/0105, MH/1103, JP/0107, MD/2203, JB/1404) (see online supplementary figure S1). In patient CG/0106 showing an X-autosome translocation disrupting DMD, X-inactivation study showed a biased inactivation profile, supporting the role of DMD disruption in the phenotype, as previously described.35 For patients showing likely pathogenic position effect, the breakpoint lay within the same TAD as the putative causal gene (BCL11B, MEF2C, DLX5, SATB2, ZMIZ1) and separated the gene from its regulatory element (figure 2), resulting in regulatory loss of function. This was further supported by expression studies showing decreased gene expression in five patients (VJ/0601, CS/1902, OL/2202, VD/2401, JE/1401) (online supplementary data S1). It is of note that for 6/23 patients the pathogenic breakpoint accounting for the phenotype was part of additional breakpoint complexity only visible after WGS (IL/1901, CS/1002, MD/2203, OL/2202, EB/0401, CF/2304).

Table 3

Apparently balanced chromosomal rearrangements with a phenotypic impact in the patients

We also identified four potential in-frame fusion transcripts in three patients: M1AP-MBD5 (SL/0105), CCDC3-CCSER1 (TS/1101), MECOM-DDX24 and DDX24-MECOM (OP/0701) (online supplementary figure S4). Three of them could not be amplified in blood or LCL samples due to tissue-specific expression pattern (M1AP-MBD5, CCDC3-CCSER1 and MECOM-DDX24). M1AP is mainly expressed in testis, CCDC3 is expressed in arterial tissues and MECOM is not expressed in blood (GTEx). We were able to amplify the DDX24-MECOM fusion. This fusion transcript retained almost all the coding sequence of MECOM (exons 2–17).

Identification of candidate genes in developmental disorders

We then sought to identify new candidate genes for developmental disorders. First, among the disrupted genes, we looked for those that showed a high probability of loss of function intolerance (>0.85), suggesting they may be responsible for a phenotype by haploinsufficiency. Fourteen genes not previously reported in developmental disorders were identified (table 4). Then we observed that in 8/21 patients with pathogenic and probably pathogenic ABCR, the causal breakpoint involved one of the seven recurrently disrupted TADs defined above. We looked at the genes of the other recurrently disrupted TADs and were able to identify two more candidate genes (DDX24, MDGA2) (table 4). All these genes are predicted to be involved in different cellular processes and all showed cerebral expression (GTEx).

Table 4

Candidate genes identified through gene disruption or TAD disruption

Role of genome architecture in gene regulation

We then explored the general impact of breakpoints on the regulation of genes. In particular, we looked for an effect of the distance from the breakpoints36 and of genomic architecture (TADs) on gene expression level. For this purpose, we systematically explored the expression in blood or LCL of disrupted genes and genes at either side of the breakpoints. One hundred eighty-nine genes in 47 patients were studied. Regarding disrupted genes, relative mRNA ratios ranged from 0.239 to 1.6 in blood and from 0.187 to 2.007 in LCL. For genes in the vicinity of the breakpoints, relative mRNA ratios ranged from 0.258 to 6.821 in blood and from 0.644 to 1.996 in LCLs. No significant correlation was found between the mRNA ratio and the absolute value of the distance from the breakpoint in blood (repeated measure correlation coefficient r=−0.187, p=0.097) and in LCL (repeated measures correlation coefficient r=−0.0637, p=0.77). There was also no significant difference between the mean mRNA ratio for genes located in the same TAD as the breakpoint, genes located in an adjacent TAD or genes in inter-TAD regions (repeated measures analysis of variance, p=0.7581; online supplementary figure S5).

Discussion

In the present work, we developed a paired-end WGS approach to characterise ABCRs and demonstrated its efficiency in a clinical setting.

WGS has already proved to be powerful for breakpoint mapping of chromosomal rearrangements.10 11 13 14 16 19 Most of the studies used a mate-pair library or similar approaches, which are more cost-effective and more robust.11 13 14 37 These approaches are based on long insert sequencing and require less sequencing to achieve a better sequence coverage. They are also expected to overcome, at least in part, difficulties for mapping in repeated sequences and gaps. In the present study, the chromosomal rearrangement detection rate was similar to previous studies using mate-pair sequencing.13 The paired-end strategy was able to detect breakpoints in highly repetitive regions such as centromeric regions, segmental duplications and short arms of acrocentric chromosomes. Working with the latest version of reference genome, hg38, was helpful to minimise gaps.17 We were also able to characterise the breakpoints of a mosaic SV. Mosaicism detection from next-generation sequencing data has scarcely been considered in balanced SV and needs to be further addressed.

The present study provided a diagnosis in nearly half of patients, ending a long diagnostic odyssey. For these patients and their families, it improved medical management and allowed informed genetic counselling. For a subset of patients, WGS was the only way to achieve this result, as the pathogenic breakpoint could not be inferred from karyotype alone. Although we opted for a low-depth WGS in this study, which is sufficient for SV detection, the analysis strategy we developed would be easily applicable to classical 30X depth WGS. An advantage of the 30X paired-end WGS over mate-pair is that it provides the possibility of an overall approach combining SV, including balanced SVs and CNVs, and SNV detection. A recent study found that a WGS strategy combining SNV and CNV analysis could reach a diagnostic yield of up to 62% in patients with severe ID.18 It was also pointed out that phenotype could result from variation at multiple loci in 4.9% of patients.38 Regarding patients with ABCR and abnormal phenotype, an extrapolation combining analysis of CNVs (40%), SV (40% of patients with no CNVs) and SNV (based on a yield of 39%)18 could reach a diagnostic yield of 78%. Concerning most patients with ID/MCA, karyotype is no longer performed systematically and CMA, used as first-tier diagnostic method, is not able to detect balanced rearrangements. It has been demonstrated that WGS could provide unbiased detection of previously unknown ABCR.39 In the future, as WGS could be considered as the initial test in ID/MCA, pipelines should include balanced SV detection as it would be the only way to determine gene disruption. It is of note, however, that we were faced with a secondary finding in a patient in whom the disrupted gene caused a disorder unrelated to his phenotype. This observation underlines the importance of complete pretest information including the possibility of secondary findings even if the analysis is focused on breakpoint mapping.40
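The 78% extrapolation can be reproduced with a short calculation. The sketch below assumes that each analysis is applied in turn to the patients left undiagnosed by the previous one, and that the 39% SNV yield applies to patients with neither a causal CNV nor a causal SV; this sequential reading is our interpretation of the figures quoted above, and the function name is hypothetical.

```python
def combined_yield(stage_yields):
    """Cumulative diagnostic yield of sequential analyses.

    stage_yields: yield of each analysis among the patients still
    undiagnosed after the previous analyses (assumption).
    """
    undiagnosed = 1.0
    for stage_yield in stage_yields:
        undiagnosed *= 1.0 - stage_yield
    return 1.0 - undiagnosed


# CNV analysis (40%), then SV analysis (40% of patients with no CNV),
# then SNV analysis (39% of the remaining patients, assumed).
total = combined_yield([0.40, 0.40, 0.39])
print(f"{total:.1%}")  # prints 78.0%
```

Step by step, CNV analysis accounts for 40% of patients, SV analysis for a further 0.60 × 0.40 = 24%, and SNV analysis for 0.36 × 0.39 ≈ 14%, giving approximately 78% in total.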

In the 22 patients with a diagnosis, the pathophysiological mechanism was either gene disruption or position effect, both presumed to result in haploinsufficiency. In one patient, we identified a fusion transcript DDX24-MECOM that allowed preservation of almost all of the MECOM coding sequence. Chimeric genes have rarely been described in constitutional rearrangements.41 42 Their pathogenicity is not well established and could result from a new function, expression deregulation or a dominant negative effect. In the present case, the phenotype of patient OP/0701 was not consistent with the phenotype of radioulnar synostosis with amegakaryocytic thrombocytopaenia (MIM616738) described with MECOM mutations. It cannot be excluded that deregulation of MECOM expression under the control of the DDX24 promoter may contribute to the phenotype.

The organisation of the genome is not linear, and domains of preferential chromatin interactions, called TADs, have recently been mapped across the genome.25 Their disruption by SV has been shown to be involved in altered phenotypes.43 In the present study, TAD maps helped support the pathogenicity of breakpoints in seven patients. Two TADs, one involving MEF2C and the other SATB2, have already been described as recurrently disrupted in patients with ID.13 In the future, TAD maps may be integrated into tools for breakpoint annotation. This will become increasingly feasible as TAD mapping algorithms improve44 and in silico modelling tools become available.45 Nevertheless, it has to be noted that no systematic correlation between TAD disruption and altered gene expression could be demonstrated in the present study. Although expression studies were limited by the tissues available, blood and LCL, this result may highlight fine intra-TAD regulations, dependent on tissue and developmental stages.46

Regarding ABCR mechanisms, the distribution of the breakpoints suggests that non-recurrent rearrangements occur randomly; in particular, we did not show any enrichment for repeated elements. This may be explained by the fact that the present cohort did not include any patient with a large CNV, which is more prone to be associated with repeated elements.13 In addition, some rearrangements hold molecular signatures suggesting that they may result from the combination of multiple repair mechanisms, in particular a combination of NHEJ and MMEJ. MMEJ, also called alternative NHEJ, is a double-strand break repair mechanism. It is associated with microhomology and larger deletions than NHEJ and may be activated when the canonical NHEJ pathway is deficient.32 This observation needs to be further validated in a larger cohort.

In conclusion, the present study demonstrated the relevance of paired-end WGS for ABCR breakpoint characterisation and its contribution to diagnosis, with a yield of 44.8%. This analysis should be systematically proposed to patients harbouring an ABCR with an abnormal phenotype, and SV analysis should be included in WGS pipelines in the clinical setting.

Acknowledgments

The authors would like to thank the patients and their families for their cooperation. The authors would also like to thank Isabelle Rouvet and Emilie Chopin for lymphoblastoid cell line support (Cellular Biotechnology Center, Hospices Civils de Lyon, France), Lamia El-Amrani for clinical data collection (Clinical Investigation Center, Hospices Civils de Lyon, France) and Hélène Dessuant-Karageorgiou, Guillaume Jedraszak, Olivier Dupuy, Pascale Kleinfinger and Evan Gouy for providing karyogram photographs.

Footnotes

  • Contributors CS-B conceived the project, designed and coordinated the research, interpreted the data and wrote the paper. AA, FA, JA, SB, YC, PC, M-PC, CC, AC, FDem, FDev, MD-F, CD, SD-G, LF, A-MG, MH, BI, GJ-H, DL, AL, ML, GL, JL, AM-P, CMig, CMis, FM-P, SM, CP-R, SO, VP-F, LP, NP, MP, M-FP, FP, AP, MR, MasR, CS, MT, SV, AV, SW and PE performed patient phenotyping, collected clinical samples and enrolled the cohort. PC, BD, J-MD, BG, SJ, VK, CLC, JL, VM, MM-D, GN, JP, CR, VS, JP, RT, AT, A-CT and DS assisted with the design of the work and data interpretation. FD, NC, JM and LP conducted the experiments, analysed the data and contributed to the interpretation. P-AR-F and CB conducted bioinformatics analyses.

  • Funding This study was supported by the French Ministry of Health (DGOS) and the French National Agency for Research (ANR) (PRTS 2013 grant to CS-B, n° PRTSN1300001N).

  • Competing interests None declared.

  • Ethics approval This study was approved by the local ethics committee (CPP Lyon Sud-Est 06/04/2014).

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

  • Patient consent for publication Obtained.