Original research
Disruption of the topologically associated domain at Xp21.2 is related to 46,XY gonadal dysgenesis
  1. Jakob A Meinel¹,
  2. Verónica Yumiceba²,
  3. Axel Künstner³,⁴,
  4. Kristin Schultz²,
  5. Nathalie Kruse²,
  6. Frank J Kaiser⁵,⁶,
  7. Paul-Martin Holterhus⁷,
  8. Alexander Claviez⁸,
  9. Olaf Hiort¹,
  10. Hauke Busch³,⁴,
  11. Malte Spielmann²,⁹,
  12. Ralf Werner¹,¹⁰
  1. Department of Pediatrics and Adolescent Medicine, Division of Pediatric Endocrinology and Diabetes, Universität zu Lübeck, Lübeck, Germany
  2. Institute of Human Genetics, Universität zu Lübeck, Lübeck, Germany
  3. Group of Medical Systems Biology, Lübeck Institute of Experimental Dermatology, Universität zu Lübeck, Lübeck, Germany
  4. Institute for Cardiogenetics, Universität zu Lübeck, Lübeck, Germany
  5. Institute of Human Genetics, Universität Duisburg-Essen, Duisburg, Germany
  6. Essen Center for Rare Diseases (EZSE), University Hospital Essen, Essen, Germany
  7. University Medical Center for Pediatric Endocrinology and Diabetes, Department of Pediatrics and Adolescent Medicine I, Universitätsklinikum Schleswig-Holstein, Kiel, Germany
  8. Department of Pediatrics and Adolescent Medicine I, Division of Pediatric Oncology and Hematology, Universitätsklinikum Schleswig-Holstein, Kiel, Germany
  9. Partner Site Hamburg/Kiel/Lübeck, German Center for Cardiovascular Disease, Berlin, Germany
  10. Institute of Molecular Medicine, Universität zu Lübeck, Lübeck, Germany
Correspondence to Dr Ralf Werner, Department of Pediatrics and Adolescent Medicine, Division of Pediatric Endocrinology and Diabetes, Universität zu Lübeck, Lübeck 23562, Germany; ralf.werner{at}uni-luebeck.de

Abstract

Background Duplications at the Xp21.2 locus have previously been linked to 46,XY gonadal dysgenesis (GD), which is thought to result from gene dosage effects of NR0B1 (DAX1), but the exact disease mechanism remains unknown.

Methods Patients with 46,XY GD were analysed by whole genome sequencing. Identified structural variants were confirmed by array CGH and analysed by high-throughput chromosome conformation capture (Hi-C).

Results We identified two unrelated patients: one with a complex rearrangement upstream of NR0B1 and a second harbouring a 1.2 Mb triplication that includes NR0B1. Whole genome sequencing and Hi-C analysis revealed rewiring of a topologically associating domain (TAD) boundary close to NR0B1, associated with neo-TAD formation, which may cause enhancer hijacking and ectopic NR0B1 expression. Modelling of previously reported Xp21.2 structural variations associated with isolated GD supports our hypothesis and predicts similar neo-TAD formation as well as TAD fusion.

Conclusion Here we present a general mechanism by which deletions, duplications or inversions at the NR0B1 locus can lead to partial or complete GD by disrupting the cognate TAD in the vicinity of NR0B1. This model not only allows better diagnosis of GD with copy number variations (CNVs) at Xp21.2, but also gives deeper insight into how spatiotemporal activation of developmental genes can be disrupted by reorganised TADs, impairing gonadal development.

  • sex determination processes
  • gene expression regulation
  • gene rearrangement
  • high-throughput nucleotide sequencing
  • sequence analysis, DNA

Data availability statement

All data relevant to the study are included in the article or uploaded as supplementary information.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

WHAT IS ALREADY KNOWN ON THIS TOPIC

  • Duplications at Xp21.2 have previously been linked to 46,XY gonadal dysgenesis (GD), and the aetiology has been attributed to enhanced NR0B1 gene dosage.

  • The study presents the first individual with 46,XY GD harbouring an Xp21.2 duplication excluding NR0B1.

WHAT THIS STUDY ADDS

  • We provide a novel model of how duplications, deletions or inversions at Xp21.2 with or without NR0B1 can lead to 46,XY GD by shuffling of topologically associating domains and enhancer hijacking.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

  • The findings of this study will improve future diagnostic practice in patients with 46,XY GD and help to elucidate how copy number variations disrupt the regulatory mechanisms of gonadal development.

Introduction

Mammalian sex determination is a time-dependent process controlled by antagonistic sex determination genes that direct the bipotential genital ridges towards either ovarian or testicular development. Gonadal dysgenesis (GD), the diminished or absent development of the gonads, occurs when the timely expression of key genes is disturbed and/or their appropriate expression levels are not reached. In the 46,XY embryo, GD leads to a difference/disorder of sex development (DSD) with a range of genital phenotypes, from ambiguous external genitalia to female-appearing genitalia, caused by variable degrees of testicular failure. Even today, the genetic origin of GD remains elusive in up to 60% of cases.1

Thus far, genetic variants in more than 20 genes involved in sex determination have been described as monogenic causes of 46,XY GD. These variants also include copy number variations (CNVs) at different loci, whose phenotypes were explained by the altered allele number, that is, the gene dosage, of candidate genes.2 Among these, NR0B1 (nuclear receptor subfamily 0, group B, member 1; also known as DAX1) is located within a 160 kb region on chromosome Xp21.2 termed dosage-sensitive sex reversal.3 During the last two decades, eight patients with non-syndromic 46,XY GD and duplications at the Xp21.2 locus encompassing NR0B1 have been described;4–11 however, the approximate boundaries of the duplications were reported in only six of these cases. In contrast, deletions or inactivating NR0B1 mutations in 46,XY patients cause adrenal hypoplasia congenita (AHC).12 Although individuals with AHC have normal sexual development at birth, they present with hypogonadotropic hypogonadism at puberty (OMIM #300200).

In mice, Nr0b1 expression starts in the genital ridge in both sexes and is synchronised with Sry (sex determining region Y) expression. However, it is downregulated in the developing testis and persists in the ovary.13 14 It has been shown that high exogenous Nr0b1 expression in transgenic mice delays testis formation. Coexpression with a weak Sry allele even resulted in complete sex reversal,15 resembling the phenotype of 46,XY GD patients with NR0B1 duplications (OMIM #300018). This makes NR0B1 the most plausible candidate gene for 46,XY GD in the Xp21.2 region.

The six previously published and well-characterised duplications at Xp21.2 associated with non-syndromic 46,XY GD harbour two copies of NR0B1, suggesting a gene dosage effect as the cause of GD. However, Smyk et al16 reported a case of 46,XY GD carrying an Xp21.2 deletion upstream of NR0B1, which challenges the gene dosage hypothesis and calls for a different explanation of the pathogenicity of structural variations (SVs) at this locus.

In recent years, studies of three-dimensional chromosome structure and its organisation into topologically associating domains (TADs) have uncovered novel disease mechanisms.17 TADs are megabase-scale, non-randomly organised regions within which regulatory elements preferentially contact their cognate promoters, while contacts across domains are insulated.18 SVs disturbing TAD boundaries have been shown to alter chromatin architecture and enhancer–promoter interactions within different domains, leading to deleterious effects on development17 19 20 and to oncogenesis.21 22 These data indicate that SVs should not only be investigated for gene dosage effects but must also be interpreted in their three-dimensional (3D) genomic context.

Here we present two new cases of 46,XY GD with CNVs at Xp21.2, one excluding and one including NR0B1. Analysis of the size and orientation of the SVs by whole genome sequencing (WGS), and of their effect on TAD structure by high-throughput chromosome conformation capture (Hi-C), supports TAD disruption as a cause of GD in these patients. Our data indicate that TAD disruption and neo-TAD formation with subsequent ectopic enhancer adoption are the most likely causes of GD in patients with SVs at the Xp21.2 locus.

Materials and methods

Patient description

Patient 1 (P1)

First presentation at the University DSD Center in Kiel and Lübeck occurred during early adolescence because of primary amenorrhoea and pubertal delay. The girl had been treated for hyperprolactinaemia for 15 months, and a left-sided gonadal tumour (6×11×11 cm) had been surgically removed 3 months earlier. Histology revealed dysgerminoma. Tanner stages were B4 and P4–5; however, breast development had started only a few months before the clinical tumour diagnosis. External genitalia were entirely female without clitoromegaly, and a uterus with tubular configuration was present. The karyotype was 46,XY. Hormonal evaluation revealed basal luteinising hormone (LH) of 53 IU/L (normal range for 46,XY control males, Tanner 4, 1.2–3.4 IU/L) and follicle stimulating hormone (FSH) of 94.3 IU/L (normal range for 46,XY control males, Tanner 4, 3.0–5.2 IU/L), increasing to 200 IU/L (normal range for 46,XY control males, Tanner 4, 12.2–29.4 IU/L) and 128 IU/L (normal range for 46,XY control males, Tanner 4, 4.9–9.6 IU/L), respectively, 30 min after 60 µg/m² GnRH intravenously. Plasma oestradiol was 33 pmol/L (age-dependent reference interval 10.0–221.5 pmol/L for 46,XY control males and 10–507 pmol/L for 46,XX control females; prepubertal for girls) and testosterone 4.26 nmol/L (age-dependent reference interval 0.1–17.6 nmol/L for 46,XY control males and 0.1–2.0 nmol/L for 46,XX control females; midpubertal for males), both determined by liquid chromatography tandem mass spectrometry (LC-MS/MS).23 Plasma anti-Müllerian hormone (AMH) was 5.1 pmol/L (age-dependent range for 46,XY control males with Tanner 4–5, 48±14 pmol/L (±SEM)24), which is extremely low. The clinical diagnosis of GD was therefore established, and the patient underwent whole genome sequencing to investigate the molecular genetic aetiology of her GD. Laparoscopic gonadectomy of the right side was performed and showed gonadoblastoma with focal transition into dysgerminoma.
Five years after the initial diagnosis, the patient remains in first remission.

Patient 2 (P2)

This patient was born at term after an uneventful pregnancy, and the genitalia were described as unequivocally female. The family history is unremarkable, with healthy siblings. Because of muscular hypotonia, chromosomal analysis was initiated and revealed a 46,XY karyotype. On ultrasound, a small prepubertal uterus was seen, but the gonads could not be visualised. Hormone analysis in early childhood showed a prepubertal status with inhibin B below the threshold and AMH of 3.0 pmol/L (age-dependent range for 46,XY control males, 499±66 pmol/L (±SEM)24), which was considered low for her age and compatible with a clinical diagnosis of GD. A laparoscopy was performed, and the gonadal tissue was removed. Histology revealed a gonadoblastoma of the right gonad, while the left gonad consisted purely of stromal tissue. Quantitative PCR suggested a copy number gain at Xp21.2. The patient was seen at several follow-up visits and developed epilepsy in middle childhood.

Whole genome sequencing

WGS was performed in both patients to identify deleterious point mutations, indels and SVs, and to fine-map the breakpoints of the duplications and the triplication identified at Xp21.2. P1 was sequenced as a trio with both parents, and P2 as a singleton. Sequencing libraries were constructed from 1.0 µg DNA per sample using the TruSeq Nano DNA HT Sample Preparation Kit (Illumina, USA) following the manufacturer's recommendations. Genomic DNA was randomly fragmented to approximately 350 bp with a Covaris cracker (Covaris, USA). DNA fragments were end-blunted, A-tailed and ligated to full-length Illumina sequencing adapters, followed by PCR amplification. Libraries were purified using AMPure XP (Beckman Coulter, USA), analysed for size distribution on an Agilent 2100 Bioanalyzer (Agilent Technologies) and quantified by qPCR. Paired-end sequencing was performed on Illumina HiSeq platforms (Illumina). More than 90 GB of raw data were obtained per sample, resulting in an average genomic read depth of 30×.
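As a rough consistency check on these figures, mean depth can be approximated as total sequenced bases divided by genome size. The sketch below assumes that "90 GB" denotes about 90 gigabases of sequence and takes the haploid human genome as about 3.1 Gb; both are illustrative assumptions, and duplicate removal, unmapped reads and low-quality bases reduce the effective depth in practice.

```python
# Rough mean-depth estimate: total sequenced bases / genome size.
# Assumptions (illustrative only): "90 GB" is read as ~90 gigabases of raw
# sequence, and the haploid human genome is taken as ~3.1 Gb.


def mean_depth(total_bases: float, genome_size: float = 3.1e9) -> float:
    """Return the expected average read depth (fold coverage)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


if __name__ == "__main__":
    depth = mean_depth(90e9)
    print(f"approximate mean depth: {depth:.1f}x")
```

Under these assumptions the estimate is roughly 29×, in line with the reported average depth of 30×.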

Bioinformatics

Sequencing reads were mapped to the human reference genome GRCh37/hg19 using the Burrows-Wheeler Aligner.25 The resulting alignment files were screened for duplicate reads using Picard tools MarkDuplicates version 1.111 (Picard: http://sourceforge.net/projects/picard/). Split reads and discordant paired-end alignments were extracted using SAMtools (V.0.1.18).25 SNPs and indels were called using HaplotypeCaller as implemented in the Genome Analysis Toolkit V.3.8.0,26 with standard parameters. SVs were detected using DELLY.27 Variants were annotated using ANNOVAR.28
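Paired-end SV callers such as DELLY infer the SV type partly from the orientation and spacing of discordant read pairs. The sketch below illustrates the standard signatures for a forward-reverse (FR) Illumina library on a single chromosome. It is a simplified illustration, not DELLY's implementation; the `ReadPair` record, the function name and the 1000 bp `max_insert` cut-off are hypothetical, and positions are used as a crude proxy for insert size.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical minimal record: mapping position and strand of each mate."""
    pos1: int
    strand1: str  # "+" or "-"
    pos2: int
    strand2: str


def classify_pair(pair: ReadPair, max_insert: int = 1000) -> str:
    """Classify an intrachromosomal read pair by its SV signature.

    Assumes an FR library: a normal pair has the left mate on "+" and the
    right mate on "-" within the expected insert size.
    """
    # Order the two mates by mapping position along the chromosome.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        # FF or RR: one mate lies within inverted sequence.
        return "inversion"
    if left_strand == "-" and right_strand == "+":
        # RF (outward-facing): typical of a tandem duplication junction.
        return "tandem_duplication"
    # FR orientation: the mapped distance separates normal pairs from deletions.
    if right_pos - left_pos > max_insert:
        return "deletion"
    return "concordant"
```

For example, a tandem duplication joins the end of the duplicated segment to its start, so reads spanning that junction map outward-facing (RF) in reference coordinates.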

CNV validation

To validate CNVs identified by WGS and qPCR, array-based comparative genomic hybridisation (aCGH) was performed. DNA of P1, P2 and the mother of P2 was hybridised to an Agilent 180K aCGH array (Agilent Technologies, Inc) and compared with a pooled sample of either 10 normal males (P1 and P2) or 10 normal females (mother of P2). Data were analysed using CytoGenomics Software V.4.0.2.21 (Agilent). CNVs were compared with entries in the Database of Genomic Variants (DGV, Version CNV_DGV_hg19_v4, Toronto, Canada).
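Because each sample was compared with a sex-matched pooled reference, the expected aCGH signal for a given X-chromosomal copy number differs between males and females. The sketch below computes theoretical log2 ratios only; real array ratios are usually compressed towards zero, and the function name is hypothetical, not part of the CytoGenomics software.

```python
import math


def expected_log2_ratio(test_copies: int, reference_copies: int) -> float:
    """Theoretical aCGH log2 ratio (ignores array noise and ratio compression)."""
    if test_copies <= 0:
        return float("-inf")  # complete loss: no signal from the test sample
    return math.log2(test_copies / reference_copies)


# An X-chromosomal region has 1 copy in a male pool and 2 in a female pool.
examples = {
    "male, duplication vs male pool": expected_log2_ratio(2, 1),
    "male, triplication vs male pool": expected_log2_ratio(3, 1),
    "female, duplication vs female pool": expected_log2_ratio(3, 2),
}
for label, value in examples.items():
    print(f"{label}: {value:.2f}")
```

A duplication carried on one X in a female therefore gives a weaker expected shift (log2 1.5) than the same duplication in a male (log2 2).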

Preparation of Hi-C libraries

Lymphoblastoid cell lines (LCLs) were established by Epstein-Barr virus transformation of leucocytes from peripheral blood samples of P1 as well as of controls. Fibroblast cell lines were established from a skin biopsy of the mother of P2.

In situ Hi-C libraries were prepared as described previously,29 with minor modifications. Briefly, ~1 million cells were harvested, and genomic material was cross-linked with 2% formaldehyde (PanReac AppliChem, A0877) in intact nuclei. gDNA from lysed cells was digested with a total of 200 U of DpnII (New England BioLabs (NEB), R0543) at 37°C, split into two intervals of 30 min each. The 5′ overhangs of the restricted fragments were filled in with biotin-14-dATP (Thermo Fisher Scientific, 19524016) plus dCTP, dGTP and dTTP (NEB, N0446), all at 0.3 mM. The resulting blunt ends were ligated overnight at 4°C with 2000 U of T4 DNA ligase (NEB, M0202). Cross-links were reversed with 25 µL Proteinase K (QIAGEN, 19131) and 1% SDS at 55°C for 30 min, followed by a 4-hour incubation with 0.5 M NaCl (final concentration) at 68°C. DNA was subsequently purified by ethanol precipitation at 4°C. Hi-C libraries were prepared by shearing ~3 µg of DNA with a Bioruptor Pico (Diagenode) to obtain fragments between 300 and 700 bp (12 cycles of 20 s on, 60 s off). Sonicated, biotin-filled fragments were pulled down using 150 µL Dynabeads MyOne Streptavidin T1 beads (Thermo Fisher Scientific, 65602). The DNA ends were repaired using 12 U of T4 DNA polymerase (NEB, M0203), 5 U of Klenow fragment of DNA polymerase I (NEB, M0210) and 50 U of T4 polynucleotide kinase (NEB, M0201). Libraries were then processed with the NEBNext Multiplex Oligos kit (E7335): adaptors were added first, and indexes were added by PCR amplification on beads (four to six cycles) using NEBNext Ultra II Q5 Master Mix (NEB, M0544). Double size selection (0.55× and 0.7×) was carried out with Agencourt AMPure XP beads (Beckman Coulter, A63881) to clean up the PCR products. Finally, Hi-C libraries were quantified by qPCR with the NEBNext Library Quant Kit (NEB, E7630) and sequenced (~200 million fragments) in a 150 bp paired-end run on a NextSeq 2000 (Illumina).
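Filling in the 5′ overhangs and then ligating the blunt ends duplicates the overhang at every Hi-C ligation junction. For DpnII (^GATC) the expected junction is GATCGATC, the ligation motif that Hi-C pipelines such as HiC-Pro use for this enzyme. The sketch below derives the junction for a palindromic site with a symmetric staggered cut; the function name is hypothetical, and HindIII is included only for comparison.

```python
def ligation_junction(site: str, cut_offset: int) -> str:
    """Sequence created by fill-in and blunt-end ligation of two cut sites.

    Assumes a palindromic recognition site cut `cut_offset` bases into the
    top strand (and symmetrically on the bottom strand), leaving a 5' overhang.
    Fill-in copies the overhang onto both ends, so ligation duplicates it.
    """
    overhang_end = len(site) - cut_offset
    if not 0 <= cut_offset < overhang_end:
        raise ValueError("the cut must leave a 5' overhang")
    # Left fragment after fill-in ends with the site up to the bottom-strand cut.
    left = site[:overhang_end]
    # Right fragment after fill-in starts at the top-strand cut.
    right = site[cut_offset:]
    return left + right


print(ligation_junction("GATC", 0))    # DpnII (^GATC)
print(ligation_junction("AAGCTT", 1))  # HindIII (A^AGCTT), for comparison
```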

Hi-C bioinformatic analysis

Interaction maps were generated using the HiC-Pro pipeline V.3.0.0 in parallel mode, with Bowtie2 V.2.3.5.1, SAMtools V.1.9, R V.3.6.3 and Python V.3.7.6. DpnII restriction fragments were generated from the hg19 reference with the HiC-Pro utility script digest_genome.py.
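The restriction fragment file used by Hi-C pipelines is, in essence, the list of intervals between consecutive restriction sites. A minimal sketch of that idea for a single sequence, assuming DpnII cuts immediately before GATC; this is an illustration under that assumption, not HiC-Pro's code.

```python
import re
from typing import List, Tuple


def digest(sequence: str, site: str = "GATC", cut_offset: int = 0) -> List[Tuple[int, int]]:
    """Return restriction fragments as 0-based, half-open (start, end) intervals."""
    seq = sequence.upper()
    # Zero-width lookahead so that overlapping occurrences are also found.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0] + cuts + [len(seq)]
    # Drop empty fragments, e.g. when the sequence starts with the site.
    return [(start, end) for start, end in zip(bounds, bounds[1:]) if end > start]


print(digest("AAGATCTTGATCA"))
```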

Paired-end sequencing data were aligned to hg19 with Bowtie2 using the HiC-Pro default parameters, that is, --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end (global alignment with a seed length of 30). Singletons and reads with MAPQ <10 were discarded, and duplicates were removed. The resulting valid pairs were converted to Juicer format using the HiC-Pro utility script hicpro2juicebox, which uses Juicer Tools V.1.22.01. Juicer Tools was then used to add Knight and Ruiz (KR) matrix balancing. Because this normalisation assumes equal visibility of all loci and would therefore distort the display of CNVs, we used raw interaction counts for the locus of interest.
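Why matrix balancing masks copy-number changes can be seen on a toy matrix. The sketch uses simple ICE-style iterative correction, a related balancing method substituted here for Knight and Ruiz because it is short to write; both rescale the matrix so that every bin has the same total, which removes the roughly doubled coverage of a duplicated bin. The matrix values are invented for illustration, and the sketch assumes no empty bins.

```python
from typing import List

Matrix = List[List[float]]


def balance(matrix: Matrix, iterations: int = 200) -> Matrix:
    """Iteratively rescale a symmetric contact matrix to equal row sums (ICE-style).

    Assumes every bin has a non-zero row sum (no empty bins).
    """
    n = len(matrix)
    m = [row[:] for row in matrix]
    for _ in range(iterations):
        sums = [sum(row) for row in m]
        mean = sum(sums) / n
        bias = [s / mean for s in sums]  # > 1 for over-represented bins
        m = [[m[i][j] / (bias[i] * bias[j]) for j in range(n)] for i in range(n)]
    return m


# Toy 3-bin matrix in which bin 1 is duplicated: its contacts are ~2x and
# its self-contacts ~4x those expected at normal copy number.
raw = [[10.0, 10.0, 1.0],
       [10.0, 40.0, 10.0],
       [1.0, 10.0, 10.0]]
balanced = balance(raw)
print([round(sum(r), 2) for r in raw])       # [21.0, 60.0, 21.0]: CNV visible
print([round(sum(r), 2) for r in balanced])  # equal row sums: CNV masked
```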

Genome-wide Hi-C maps were visualised using Juicebox (V.1.11.8). Intrachromosomal contacts for chromosome X were extracted at 10 kb resolution using the straw library and visualised as heatmaps rotated by 45°. Hi-C maps were compared with sex-matched and cell-type-matched controls. ENCODE CTCF ChIP-seq data (human mammary fibroblasts and GM12864, a B-lymphoblastoid cell line) were overlaid on the heatmaps.
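For an intrachromosomal region, .hic files store only one triangle of the symmetric contact matrix, and straw returns it as sparse records giving the two bin start positions and a count. A minimal sketch of turning such records into a dense matrix, and of the coordinate change behind a 45°-rotated (triangle) heatmap; representing records as plain tuples and both function names are assumptions for illustration, not the straw API.

```python
from typing import Iterable, List, Tuple


def records_to_dense(records: Iterable[Tuple[int, int, float]],
                     region_start: int, bin_size: int, n_bins: int) -> List[List[float]]:
    """Fill a symmetric n_bins x n_bins matrix from (x_start, y_start, count) records."""
    dense = [[0.0] * n_bins for _ in range(n_bins)]
    for x, y, count in records:
        i = (x - region_start) // bin_size
        j = (y - region_start) // bin_size
        if 0 <= i < n_bins and 0 <= j < n_bins:
            dense[i][j] = count
            dense[j][i] = count  # mirror the stored triangle
    return dense


def rotate_45(i: int, j: int) -> Tuple[float, float]:
    """Map cell (i, j) to (position along the diagonal, distance from it), up to scaling."""
    return (i + j) / 2, abs(j - i) / 2
```

In the rotated view, each TAD appears as a triangle sitting on the horizontal axis, which is how the maps in figures 2 and 3 are drawn.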

All chromosomal positions in this paper are according to GRCh37/hg19.

Results

We generated whole genome data from two patients with GD and previously undescribed SVs at the Xp21.2 locus (figure 1). The SVs differed in size, in complexity and in whether they included NR0B1: P1 has a complex rearrangement of two duplications and two small deletions in the proximity of NR0B1, whereas P2 carries a triplication including NR0B1. Copy number gains and losses were verified by aCGH (online supplemental figures S1 and S2), and WGS split reads at the CNV borders were used to construct continuous breakpoint sequences, thereby determining the orientation and location of the inserted copy number gains and losses (online supplemental figures S3 and S4). Breakpoint sequences were verified by Sanger sequencing (described in the online supplemental material). WGS did not identify any other known genetic cause of 46,XY GD in the patients. In both patients, the SVs were maternally inherited.
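In BWA alignments, a split read records its other alignment segments in the SA tag, which the SAM specification defines as `rname,pos,strand,CIGAR,mapQ,NM;` per segment. The sketch below parses that tag and shows how the strands of the primary and supplementary segments indicate whether a junction joins sequence in the same or in inverted orientation. The function names are hypothetical, and real breakpoint reconstruction also uses the clipping positions encoded in the CIGAR strings.

```python
from typing import Dict, List


def parse_sa_tag(sa_value: str) -> List[Dict[str, object]]:
    """Parse the value of an SA:Z tag into one dict per supplementary segment."""
    segments = []
    for entry in sa_value.strip(";").split(";"):
        if not entry:
            continue
        rname, pos, strand, cigar, mapq, nm = entry.split(",")
        segments.append({"rname": rname, "pos": int(pos), "strand": strand,
                         "cigar": cigar, "mapq": int(mapq), "nm": int(nm)})
    return segments


def junction_orientation(primary_strand: str, segment: Dict[str, object]) -> str:
    """Same strand: segments joined in reference orientation; opposite: inverted."""
    return "same" if segment["strand"] == primary_strand else "inverted"
```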

Figure 1

TAD structure at Xp21.2 and location of copy number variations (CNVs). Hi-C analysis identified an NR0B1 TAD containing IL1RAPL1, MAGEB1-4 and NR0B1, followed by a second TAD including TASL, GK and TAB3. A third TAD contains the FTHL17 and DMD genes (dashed lines). Below, a schematic representation compares the NR0B1 locus CNVs in the patients presented in this study with those previously described in the literature in association with 46,XY gonadal dysgenesis. The green bars represent the duplications and the triplication found in the patients of this study (P1 and P2), and the blue bars represent duplications previously reported by other researchers.4–10 All previously reported duplications include the NR0B1, TASL (CXorf21) and GK genes. The yellow bar marks the deletion reported by Smyk et al.16 A common 195 kb region, highlighted by the purple box, is present in all Xp21.2 CNVs associated with 46,XY GD. Hi-C, high-throughput chromosome conformation capture; TAD, topologically associating domain.

In P1, WGS revealed two major duplications and two small deletions at Xp21.2. One 389 kb duplication maps to a region downstream of NR0B1 containing the MAGEB (MAGE family member B) genes 1–4 and part of IL1RAPL1 (interleukin 1 receptor accessory protein like 1). The second, 447 kb duplication, containing TASL, GK and the 3′ part of TAB3, maps upstream of NR0B1. Between the two duplications, two small deletions of 2.7 kb and 2.2 kb flank an inverted region of 1.2 kb (online supplemental figure S3). Both duplications are inserted upstream of NR0B1. Notably, the 447 kb duplication encompassing TASL, GK and the TAB3 fragment is inserted proximal to NR0B1 in inverted orientation, whereas the 389 kb duplication of MAGEB1-4 and the IL1RAPL1 fragment is inserted further upstream in the orientation of the reference sequence. WGS of both parents showed that the SV was maternally inherited.

Investigation of other known DSD candidate genes in P1 revealed only one rare variant (MAF <0.01) of unknown significance, in ZFPM2 (zinc finger protein, FOG family member 2), transmitted by the mother. ZFPM2 variants are associated with abnormalities in testis determination; however, the patient's SNV (dbSNP: rs202217256) has an average frequency of 0.004 across all populations30 and is listed as benign by ClinVar Miner (online supplemental table S1). Given an incidence of 46,XY GD of around 1.5 per 100 000,31 this SNV is more likely a rare polymorphism than a pathogenic variant causing GD.
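The frequency argument can be made explicit by comparing how many people carry the variant with how many are affected. The sketch below assumes Hardy–Weinberg proportions and treats the reported population frequency of 0.004 as the allele frequency of this autosomal variant; both are simplifying assumptions, and the function names are hypothetical.

```python
def carrier_frequency(allele_freq: float) -> float:
    """Fraction of individuals carrying at least one copy (Hardy-Weinberg)."""
    return 1 - (1 - allele_freq) ** 2


def carriers_per_affected(allele_freq: float, incidence: float) -> float:
    """Number of carriers for every affected individual."""
    return carrier_frequency(allele_freq) / incidence


ratio = carriers_per_affected(0.004, 1.5 / 100_000)
print(f"carriers per affected individual: {ratio:.0f}")
```

Under these assumptions carriers outnumber affected individuals by several hundred to one, which is difficult to reconcile with a highly penetrant cause of 46,XY GD.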

P2 harbours an Xp21.2 triplication, initially identified by qPCR copy number detection (online supplemental methods). In contrast to P1, this CNV includes the NR0B1 gene. The 1.24 Mb triplication includes the genes MAGEB1-4, NR0B1, TASL, GK, TAB3 and part of IL1RAPL1. The triplicated segments are arranged in tandem and are separated by a 49 bp insert (online supplemental figure S4).

Analysis of known DSD candidate genes revealed only SNVs reported as benign or likely benign in ClinVar, with the exception of a missense variant (dbSNP: rs367855747) in oestrogen receptor 2 (ESR2) of as yet undetermined clinical significance. Homozygous and heterozygous ESR2 mutations have recently been described in the context of 46,XY partial and complete GD.32 The effect of this particular heterozygous SNV remains unclear, but given the widely accepted association of NR0B1 copy number gains with 46,XY DSD, it is at most a subordinate factor in the aetiology.

Analysis of the patient's mother by qPCR, aCGH (online supplemental figure S5) and Sanger sequencing showed that she carries a tandem duplication similar in size to P2's triplication. The mother agreed to donate fibroblasts, which were subsequently used for Hi-C analysis.

Hi-C analysis of P1 and P2

We performed Hi-C in LCLs from P1 and in fibroblasts from the mother of P2 to investigate the effects of the SVs on chromatin structure and 3D genome organisation. Hi-C maps of the NR0B1 locus revealed TADs separated by boundaries that insulate the genes into distinct chromatin domains (figure 1). The NR0B1 TAD contains IL1RAPL1, MAGEB1-4 and NR0B1, the adjacent TAD includes TASL, GK and TAB3, and the third TAD (in centromeric direction) contains FTHL17 and DMD (figure 1). In the Hi-C maps of P1 LCLs, duplications appear as intense signal rising from the bottom of the map and deletions as a loss of signal, forming a white V shape between the duplications (figure 2). More importantly, the inverted fragment showed strong interactions between NR0B1 and the region upstream of TASL, indicating ectopic contacts with potential enhancer elements (circled in figure 2). The Hi-C data thus showed the formation of a novel chromatin domain (neo-TAD).

Figure 2

Hi-C analysis and representation of structural variation in P1. Upper panel: Hi-C maps from male control LCLs and from P1 LCLs (10 kb resolution) show the 3D architecture of the Xp21.2 locus in the absence and presence of SVs, respectively. The subtraction map (control vs P1) reveals the differences in contacts. Note the interaction between the region upstream of NR0B1 and the region upstream of TASL, marked with a dashed circle. Genes (black boxes; hg19) are displayed at the bottom, with the TAD organisation represented as grey-white bars on the track below. Lower panel: representation of the wild-type TADs and of the outcome after the complex rearrangement, that is, one shuffled TAD and one neo-TAD. Genes drawn as grey squares were not completely duplicated (TAB3 and IL1RAPL1). The potential enhancer is represented as a red oval, and its interaction with the gene is indicated by an arrow. Hi-C, high-throughput chromosome conformation capture; SVs, structural variations; TAD, topologically associating domain.

In figure 3, Hi-C of fibroblasts from P2's mother displayed a 1.2 Mb tandem duplication, evidenced by the intense interaction between the beginning and end of the duplicated region. The map clearly shows the formation of a neo-TAD in which ectopic pathogenic interactions could take place, as reported in other cases.33 34

Figure 3

Hi-C analysis and representation of structural variation in the mother of P2. Hi-C maps from a female control and from fibroblasts of P2's mother carrying a 1.2 Mb duplication. The Hi-C subtraction between control and P2's mother indicates increased contacts within the duplication. The arrow in the subtraction map marks the interaction between the beginning and end of the duplication, which indicates that this SV is arranged in tandem. Below are two tracks: the first displays the genes and the second the TAD organisation. The schematic representation depicts how the duplication creates a neo-TAD containing TASL, GK, TAB3, part of IL1RAPL1, MAGEB1-4 and NR0B1. The interaction between the potential enhancer (red oval) and the non-cognate gene (NR0B1) is shown with an arrow. Hi-C, high-throughput chromosome conformation capture; TADs, topologically associating domains.
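The subtraction maps in figures 2 and 3 contrast patient and control contacts. One possible way to compute such a map is sketched below, under the assumption that the control matrix is first scaled to the patient's total number of contacts and then subtracted bin by bin, so that positive values mark contacts gained in the patient. The function name, the sign convention and the toy matrices are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def subtraction_map(patient: Matrix, control: Matrix) -> Matrix:
    """Patient minus depth-scaled control; positive values = contacts gained in the patient."""
    total_patient = sum(map(sum, patient))
    total_control = sum(map(sum, control))
    if total_control == 0:
        raise ValueError("control matrix has no contacts")
    scale = total_patient / total_control  # equalise overall sequencing depth
    return [[p - c * scale for p, c in zip(p_row, c_row)]
            for p_row, c_row in zip(patient, control)]


# Toy 2-bin example: bin 0 gains self-contacts in the "patient".
diff = subtraction_map([[8.0, 2.0], [2.0, 4.0]], [[4.0, 2.0], [2.0, 4.0]])
print(diff)
```

Because scaling uses total counts, a strong gain in one region slightly lowers the scaled differences elsewhere, which is why unaffected bins in this toy example turn negative.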

From these Hi-C data, it became evident that our two cases share a minimal overlap of 195 kb (chrX:30 401 819–30 596 386; GRCh37/hg19) harbouring a TAD boundary (chrX:30 510 000–30 530 000; GRCh37/hg19; figure 1). Duplication of this boundary places NR0B1 in the vicinity of the genes and regulatory elements of the neighbouring TAD. To investigate the potential effect of all previously described SVs at the Xp21.2 locus, we modelled their effect on 3D genome and TAD architecture. Remarkably, all previously reported duplications4–9 (online supplemental figure S6), the deletion reported by Smyk et al16 and our two cases share a minimal critical region that does not include NR0B1 but does include the adjacent TAD boundary (figure 1). These data support the hypothesis that enhancer hijacking, rather than gene dosage, could be the underlying cause of GD in patients with a 46,XY karyotype and Xp21.2 SVs.
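Finding a minimal critical region amounts to intersecting the CNV intervals and then asking whether the TAD boundary lies inside the intersection. A minimal sketch using only the two GRCh37/hg19 coordinate ranges given above; the function names are hypothetical, and applying the same check to the other published CNVs would require their breakpoints.

```python
from typing import Iterable, Optional, Tuple

Interval = Tuple[int, int]  # closed interval (start, end) on one chromosome


def common_overlap(intervals: Iterable[Interval]) -> Optional[Interval]:
    """Intersection of closed intervals, or None if they share no position."""
    intervals = list(intervals)
    if not intervals:
        return None
    start = max(s for s, _ in intervals)
    end = min(e for _, e in intervals)
    return (start, end) if start <= end else None


def contains(outer: Interval, inner: Interval) -> bool:
    """True if `inner` lies completely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


minimal_region = (30_401_819, 30_596_386)  # chrX, from the text
tad_boundary = (30_510_000, 30_530_000)    # chrX, from the text
print(minimal_region[1] - minimal_region[0])  # 194567, i.e. about 195 kb
print(contains(minimal_region, tad_boundary))
```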

Discussion

In our study, we report the first patient with 46,XY GD and Xp21.2 duplications that exclude NR0B1, the strongest candidate gene for the phenotype. This case presents a complex duplication and inversion at the Xp21.2 locus. Hi-C data show that the duplication and the inversion include a TAD boundary and lead to the formation of a neo-TAD and to ectopic chromatin contacts between NR0B1 and its neighbouring domain, which includes TASL and several predicted enhancer elements. Our data, together with the recent report of a deletion of this TAD boundary,16 call into question whether gene dosage effects of NR0B1 alone are the underlying mechanism of this form of GD. Instead, we propose that the duplications, the inversion and the deletion all rearrange a TAD boundary and result in enhancer hijacking between NR0B1 and several predicted enhancer elements in the TASL TAD (online supplemental figure S7).

Our findings are in line with several recent studies showing that SVs can rewire the complex 3D chromatin architecture of a locus by deleting or repositioning regulatory elements and/or TAD boundaries, leading to ectopic enhancer–promoter interactions and ultimately to pathogenic effects on development17 19 20 33 35–37 and to oncogenesis.21 22 To our knowledge, this is the first Hi-C analysis of Xp21.2 SVs, and it shows how these SVs alter TAD architecture compared with healthy controls. In controls, NR0B1 lies in a separate TAD from the neighbouring TASL and GK genes with their respective promoter and predicted enhancer regions (online supplemental figure S7). This TAD boundary is highly conserved across many different cell lines (online supplemental figure S8). Disrupting it can shuffle genes and their specific regulatory elements, potentially allowing NR0B1 to adopt the TASL and GK enhancers.

Enhancer hijacking at the NR0B1 locus could lead to upregulation or aberrant spatiotemporal expression of NR0B1, which would in turn decrease SF1-mediated SOX9 expression, impair Sertoli cell differentiation and thereby hinder testis development.38 This could be a disease mechanism shared by all recently published Xp21.2 CNVs associated with GD,4–7 9 16 39 as they share a common 195 kb overlap region spanning the TAD boundary, causing the formation of neo-TADs and shuffled TADs.

Nevertheless, further research is needed to identify definitively the enhancer regions that may be hijacked by NR0B1 after TAD disruption. Moreover, the TAD boundary region contains several CTCF binding sites, so it would be worthwhile to narrow down the critical sites, for instance by inverting them and evaluating the resulting TAD alterations. This approach is based on the knowledge that convergently oriented CTCF sites pair at TAD boundaries. It would also be worthwhile to determine the exact genomic insertion sites of the other reported SVs at this locus and to obtain Hi-C data from more patients with GD in order to test whether this mechanism is truly unifying. Furthermore, this TAD disruption may also affect the MAGEB1-4 genes, which are predominantly expressed in germ cells, and a synergistic effect of NR0B1 and MAGEB1-4 dysregulation in the aetiology of 46,XY GD cannot be excluded.
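The convergence rule behind the proposed inversion experiment can be written as a simple check: two CTCF sites form a convergent pair when the motif at the upstream (lower-coordinate) site points forward ("+") and the motif at the downstream site points in reverse ("-"). The sketch uses hypothetical site records and shows that, in this simplified model, inverting one site is predicted to break the convergent pairing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CTCFSite:
    """Hypothetical CTCF motif record: genomic position and motif strand."""
    position: int
    strand: str  # "+" or "-"


def is_convergent(a: CTCFSite, b: CTCFSite) -> bool:
    """True if the two motifs face each other (forward upstream, reverse downstream)."""
    left, right = sorted((a, b), key=lambda s: s.position)
    return left.strand == "+" and right.strand == "-"


def invert(site: CTCFSite) -> CTCFSite:
    """Return the same site with its motif orientation flipped."""
    return CTCFSite(site.position, "-" if site.strand == "+" else "+")


upstream = CTCFSite(100, "+")
downstream = CTCFSite(5_000, "-")
print(is_convergent(upstream, downstream))          # True
print(is_convergent(upstream, invert(downstream)))  # False: tandem orientation
```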

In both patients, the SVs were maternally inherited, and P2 has healthy siblings. Maternal inheritance of Xp21.2 duplications in GD is well known, and no effects on reproductive development or function in carrier mothers have been reported so far; NR0B1 duplications do not appear to impair ovarian function.5 6 40 41

Despite considerable advances in the understanding of sex development, the genetic aetiology of many cases of 46,XY GD remains unclear.2 42 Our research emphasises that, besides unidentified genes in the gonadal developmental pathway, this may be due to overlooked non-coding and/or regulatory elements that explain the molecular basis of the GD phenotype. This highlights the importance of determining the position and orientation of SVs in order to deduce new enhancer–gene arrangements and to improve our knowledge of genotype–phenotype relationships in this rare disease.

Our data further show the diagnostic potential of whole genome sequencing to provide detailed data on alterations in non-coding regions, allowing accurate breakpoint identification of SVs through interrogation of individual reads. In addition, Hi-C has proven to be a sensitive tool for SV detection in the clinical setting,34 and in our case it aided TAD recognition at the target locus and guided our prediction of the outcome of the SVs in a 3D context.

Ethics statements

Patient consent for publication

Ethics approval

This study involves human participants and was approved by the Ethics Committee of the University of Lübeck (ID: AZ 08-081, Investigation of the molecular pathogenesis and pathophysiology of disorders of sex development; ID: AZ 17-219) and the Ethics Committee of the Christian-Albrechts-Universität zu Kiel (ID: D 410/08). Participants gave informed consent to participate in the study before taking part.

Acknowledgments

The authors are truly grateful to the patients and families of these patients who cooperated in this study. HB and AK acknowledge computational support from the Lübeck OmicsCluster. This work is part of the doctoral thesis of JAM.

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • JAM and VY are joint first authors.

  • JAM and VY contributed equally.

  • Correction notice This article has been corrected since it was first published to indicate co-first authorship.

  • Contributors Conceptualisation: RW; conceived model: JAM, VY, MS and RW; manuscript draft: JAM, VY, MS and RW; editing and review of manuscript: FJK, P-MH, AC, OH, HB, MS and RW; funding acquisition: OH and HB; experiments/data generation: JAM, VY and RW; bioinformatics: AK, HB, KS and NK; figures and visualisations: JAM, VY and RW; patient recruitment: P-MH, OH and AC; guarantor: RW.

  • Funding This work was supported by the Bundesministerium für Bildung und Forschung (BMBF; 01DQ17004) and the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy (EXC 2167-390884018).

  • Disclaimer The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.