Background Early-onset scoliosis (EOS), defined by an onset age of scoliosis less than 10 years, conveys significant health risk to affected children. Identification of the molecular aetiology underlying patients with EOS could provide valuable information for both clinical management and prenatal screening.
Methods In this study, we consecutively recruited a cohort of 447 Chinese patients with operative EOS. We performed exome sequencing (ES) screening on these individuals and their available family members (totaling 670 subjects). Another cohort of 13 patients with idiopathic early-onset scoliosis (IEOS) from the USA who underwent ES was also recruited.
Results After ES data processing and variant interpretation, we detected molecular diagnostic variants in 92 out of 447 (20.6%) Chinese patients with EOS, including 8 patients with molecular confirmation of their clinical diagnosis and 84 patients with molecular diagnoses of previously unrecognised diseases underlying scoliosis. One out of 13 patients with IEOS from the US cohort was molecularly diagnosed. The age at presentation, the number of organ systems involved and the Cobb angle were the three top features predictive of a molecular diagnosis.
Conclusion ES enabled the molecular diagnosis/classification of patients with EOS. Specific clinical features/feature pairs are able to indicate the likelihood of gaining a molecular diagnosis through ES.
- clinical genetics
- molecular genetics
Early-onset scoliosis (EOS), defined as the clinical presentation of scoliosis before 10 years of age, can be associated with progressive pulmonary compromise and severe dysmorphic skeletal appearance if left untreated. The deformed spine impairs pulmonary function and limits or prevents activities of daily life, making EOS an extremely debilitating disease in the paediatric age group.1
The aetiology of EOS includes congenital scoliosis (CS) due to structural spine defects, neuromuscular scoliosis (NMS), syndromic conditions and idiopathic early-onset scoliosis (IEOS).2 3 However, it is difficult to provide a definite aetiological classification or diagnosis solely based on clinical evaluation, especially for syndromic EOS with variable expressivity, such as Ehlers-Danlos syndrome (EDS)4, neurofibromatosis (NF),5 and other neuromuscular conditions like Charcot-Marie-Tooth disease that can have an age-dependent penetrance.6 Identification of the molecular aetiology underlying patients with EOS could provide valuable information for both clinical management and prenatal screening.
In previous studies, we found that compound inheritance of TBX6 variants, consisting of a rare loss-of-function allele and a common non-coding haplotype, could explain around 9.6% of CS cases.7–9 In addition to TBX6, variants in DDR2 10 (spondylometaepiphyseal dysplasia, Mendelian Inheritance in Man [MIM]: 271665), FLNB 11 (spondylocarpotarsal synostosis syndrome, MIM: 272460), RUNX2 12 (cleidocranial dysplasia, MIM: 119600) and numerous other genes are thus far known to cause early-onset scoliotic phenotypes. Moreover, for other diseases and loci, only certain alleles or genotypes may be more likely to cause the EOS phenotype (e.g., homozygous CMT1A duplication and heterozygous CMT1A triplication).6
In this study, we aim to investigate the diagnostic performance of exome sequencing (ES) in patients with EOS and to dissect the genetic architecture of the clinical entity of EOS. We report the molecular diagnostic rates by ES among two cohorts: (1) a Chinese cohort ascertained and studied in Beijing (China) and consisting of 447 patients clinically diagnosed with EOS (encompassing 670 subjects) who underwent orthopaedic surgery and (2) a Texas (US) cohort that consisted of 13 patients with IEOS. The potential association between clinical phenotypical features and molecular diagnostic status of the patients was also investigated.
In the Chinese cohort, we consecutively recruited 447 patients with EOS of Chinese Han ethnicity who underwent spinal surgery at Peking Union Medical College Hospital from 2009 to 2016 as a part of the Deciphering disorders Involving Scoliosis and COmorbidities study (http://www.discostudy.org/). Samples were available from all probands and from the closest relatives of 105 probands, including six parents (male and female) also affected by scoliosis, presenting a potential dominant disease trait pattern (online supplementary figure S1). Patients/research subjects who reported no family history were counted as sporadic cases. Physical examination, X-ray, CT and MRI were performed on each patient to give a prior clinical diagnosis2: (1) CS (congenital scoliosis type I, vertebral malformations [CS I]; congenital scoliosis type II, segmentation defects [CS II]; and congenital scoliosis type III, mixed type [CS III]); (2) NMS; (3) syndromic EOS; and (4) IEOS (online supplementary methods).
ES, data processing and analysis
For the Chinese cohort, ES was performed on DNA extracted from peripheral blood from 342 assumed sporadic singleton cases and 105 cases along with their family members, totaling 670 subjects (online supplementary table S2). The detailed sequencing process and in-house developed Peking Union Medical College Hospital Pipeline14 are described in the online supplementary methods.
US Texas cohort
For the US cohort, ES of 42 samples from 13 EOS families was performed at the Human Genome Sequencing Center at Baylor College of Medicine through the Baylor-Hopkins Centre for Mendelian Genomics initiative. Detailed sequencing methods are described in the online supplementary methods.
For both cohorts, interpretation of single-nucleotide variants (SNVs) and insertions/deletions (indel variant alleles) was adapted from the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG-AMP) guidelines15 (online supplementary methods) and demonstrated in online supplementary figure S2. Variant pathogenicity, expected mode of disease inheritance (eg, autosomal dominant (AD) for monoallelic and autosomal recessive (AR) for biallelic variants) and patient phenotype were taken into consideration. Potential clinically actionable secondary findings from 59 ACMG-recommended genes were analysed.16
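The way the expected mode of inheritance constrains interpretation can be sketched in a few lines. This is only an illustrative check, assuming a hypothetical helper `fits_expected_inheritance` and a simple count of pathogenic alleles per gene; it is not the ACMG-AMP classification procedure itself, which is described in the online supplementary methods.

```python
def fits_expected_inheritance(mode: str, pathogenic_alleles: int) -> bool:
    """Check whether the number of affected gene copies matches the disease mode.

    mode: "AD" (a monoallelic variant is expected to be sufficient) or
          "AR" (biallelic variants are expected).
    pathogenic_alleles: number of gene copies (0, 1 or 2) carrying a
          pathogenic or likely pathogenic variant in the patient.
    """
    if mode == "AD":
        return pathogenic_alleles >= 1
    if mode == "AR":
        return pathogenic_alleles == 2
    raise ValueError(f"unsupported mode of inheritance: {mode}")


# A single heterozygous variant supports an AD diagnosis but not an AR one.
single_het_ad = fits_expected_inheritance("AD", 1)
single_het_ar = fits_expected_inheritance("AR", 1)
```

For AR conditions, the two variants would additionally need to be confirmed in trans, which this sketch takes as given.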
Copy number variant (CNV) allele calling from ES data
For both cohorts, CNVs were computed from ES data using two independent software packages: XHMM17 and Copy Number Inference from Exome Reads.18 Detailed calling and analysis methods are described in online supplementary methods and online supplementary figure S3. Identified pathogenic CNVs were validated using comparative genomic hybridisation microarray (aCGH), as described in online supplementary methods.
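As a rough illustration of how calls from two exome-based CNV callers might be reconciled, the sketch below keeps only calls of the same type that both tools report with at least 50% reciprocal overlap. The `Cnv` record, the helper names, the 50% threshold and the example coordinates are all assumptions made for illustration; the criteria actually used in this study are those given in the online supplementary methods.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cnv:
    chrom: str
    start: int  # 1-based inclusive start
    end: int    # inclusive end
    kind: str   # "DEL" or "DUP"


def reciprocal_overlap(a: Cnv, b: Cnv) -> float:
    """Return the smaller of the two overlap fractions (0.0 if no overlap)."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start + 1), shared / (b.end - b.start + 1))


def consensus_calls(caller_a: List[Cnv], caller_b: List[Cnv],
                    min_overlap: float = 0.5) -> List[Cnv]:
    """Keep calls from caller_a supported by at least one call in caller_b."""
    return [a for a in caller_a
            if any(reciprocal_overlap(a, b) >= min_overlap for b in caller_b)]


# Hypothetical coordinates, for illustration only.
xhmm_calls = [Cnv("16", 29_600_000, 30_200_000, "DEL"),
              Cnv("2", 1_000, 5_000, "DUP")]
conifer_calls = [Cnv("16", 29_650_000, 30_190_000, "DEL")]
supported = consensus_calls(xhmm_calls, conifer_calls)
```

Requiring agreement between callers trades sensitivity for specificity; calls retained this way would still need orthogonal validation such as aCGH.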
Diagnostic findings in the Chinese cohort of operative EOS
In the Chinese cohort, 447 unrelated patients with EOS of Chinese Han ethnicity were recruited, including 342 singletons and 105 cases with first-degree relative samples (table 1). Six families have more than one generation affected by scoliosis, presenting a potential dominant (AD or X-linked [XL]) pattern (online supplementary figure S1). Among the recruited patients, most have congenital (ie, structural) scoliosis; NMS, syndromic conditions and presumed IEOS are less common. The average age at presentation is 3.1 years, and 217/447 (48.5%) are male (table 1).
After ES data processing and variant interpretation, we detected molecular diagnostic variants in 92/447 (20.6%) Chinese patients with EOS (table 1), encompassing 33 disease-causing genes and five genomic regions (table 2).
In the CS group, a molecular diagnosis and potential molecular aetiology were identified in 79/424 (18.6%) patients, including 41 (9.7%) patients with TBX6-associated congenital scoliosis (TACS) and 38 (9.0%) patients with other pathogenic variant alleles/CNVs (table 1). The molecular diagnostic rate was higher in the CS I patient population (45/130, 34.6%) than in either CS II (7/41, 17%) or CS III (27/253, 10.7%), because TACS is predominantly observed in CS I cases (35/130, 26.9%) (online supplementary table S3). For NMS, pathogenic variants in TTN, LMNA and RYR1 were found in three out of five patients. As anticipated, eight patients with clinically diagnosed syndromic EOS, including four patients with neurofibromatosis type 1 (NF1), three patients with achondroplasia (ACH) and one patient with EDS, were all solved by the identification of a specific variant allele in disease genes (NF1, FGFR3 and PLOD1) which could explain their complete phenotypes (table 2). Interestingly, one of the patients with NF1 was found to have a 573 kb deletion CNV spanning the NF1 gene (table 1), which would not be detected using a single-gene test and might be missed by clinical exome sequencing (cES) that can sometimes focus exclusively on SNV alleles. In the IEOS subgroup, causative variants in 2 out of 10 patients were detected (table 1).
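The subgroup yields reported above can be cross-checked against the overall figure with simple arithmetic. The sketch below uses only the counts stated in the text; the variable and function names are illustrative.

```python
# Diagnosed / total counts per clinical subgroup, as reported in the text.
subgroups = {
    "CS": (79, 424),
    "NMS": (3, 5),
    "syndromic EOS": (8, 8),
    "IEOS": (2, 10),
}


def rate(diagnosed: int, total: int) -> float:
    """Diagnostic yield as a percentage, rounded to one decimal place."""
    return round(100 * diagnosed / total, 1)


total_diagnosed = sum(d for d, _ in subgroups.values())
total_patients = sum(n for _, n in subgroups.values())

for name, (d, n) in subgroups.items():
    print(f"{name}: {d}/{n} = {rate(d, n)}%")
print(f"overall: {total_diagnosed}/{total_patients} = "
      f"{rate(total_diagnosed, total_patients)}%")
```

The subgroup counts sum to the 92 diagnosed patients out of 447 reported for the whole Chinese cohort.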
TBX6-associated scoliosis (TACS)
TACS, defined by a combination of a heterozygous 16p11.2 deletion/TBX6 null variant and a hypomorph allele in trans,7 8 accounts for 41/424 (9.7%) CS patients in this study, which is consistent with the results of our previous studies7–9 and recent studies of multiple cohorts.20 21
Of the 41 patients with TACS, 31 harboured a 16p11.2 deletion CNV identified by computational CNV analysis and validated by additional experiments (online supplementary table S4), while the other 10 were carrying TBX6 truncating alleles (nonsense, frameshift or canonical splicing site variants) (online supplementary table S4). All 41 patients carried the T-C-A (rs2289292, rs3809624 and rs3809627) haplotype (frequency=44.4%, Chinese Han population) in trans with the 16p11.2 deletion or null allele (online supplementary table S4), consistent with our previous findings regarding the compound inheritance of TACS.7–9
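The compound inheritance model described above amounts to a simple rule: one null TBX6 allele paired in trans with the T-C-A haplotype. A minimal sketch follows, assuming phasing has already been established and using a hypothetical `Tbx6Genotype` record and `fits_tacs_model` helper that are not part of the study pipeline.

```python
from dataclasses import dataclass
from typing import Tuple

# Alleles at rs2289292, rs3809624 and rs3809627, in that order.
RISK_HAPLOTYPE = ("T", "C", "A")


@dataclass
class Tbx6Genotype:
    has_16p11_2_deletion: bool    # heterozygous deletion encompassing TBX6
    has_truncating_variant: bool  # nonsense, frameshift or canonical splice site
    trans_haplotype: Tuple[str, str, str]  # haplotype on the other chromosome


def fits_tacs_model(g: Tbx6Genotype) -> bool:
    """True when a null allele is paired in trans with the T-C-A haplotype."""
    has_null_allele = g.has_16p11_2_deletion or g.has_truncating_variant
    return has_null_allele and g.trans_haplotype == RISK_HAPLOTYPE


# Illustrative genotypes, not patient data.
deletion_carrier = Tbx6Genotype(True, False, ("T", "C", "A"))
haplotype_only = Tbx6Genotype(False, False, ("T", "C", "A"))
```

Under this rule the common haplotype alone is not sufficient, which matches its high population frequency relative to the rarity of the phenotype.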
From the clinical phenotypical perspective, all (41/41) of the patients presented with hemivertebra or butterfly vertebra at the lower half of the spine, which is in accordance with the reported clinical phenotypical characteristics of TACS and the animal model with Tbx6 compound inheritance.7 8 22
Pathogenic SNVs/indels identified by ES
In addition to the contribution of TACS to the diagnostics, 36 pathogenic and 21 likely pathogenic SNVs/indels in 33 genes were identified in 47 patients in the Chinese cohort (online supplementary table S5).
Variant types include missense (32/57), nonsense (6/57), frameshift (12/57), in-frame indel (1/57) and splice site (6/57) alleles. Disease traits in the 47 patients included AD (34/47), AR (10/47) and X-linked (3/47) (table 2 and online supplementary figure S5). Of the 34 monoallelic AD conditions, 12 variants arose de novo, as confirmed by Sanger sequencing of family trios (online supplementary figure S6); 8 variants were inherited, including 5 from affected parents and 3 from asymptomatic parents without apparent parental mosaicism, implicating incomplete penetrance, although this could not be established with certainty because radiographs from the parents were lacking; the remaining 16 were identified in proband-only cases with unknown inheritance. In addition, dual diagnoses resulting in blended phenotypes23 were found in 2/93 (2%) patients with pathogenic variants and molecular diagnostic findings (online supplementary table S8).
Of the 34 disease-causing genes, only 11 genes (TBX6, NF1, RUNX2, RYR1, COL5A2, PLOD1, MYH3, TRPV4, FGFR3, FLNB and CHD7) were implicated in more than one unrelated patient (table 2), suggesting that the genetic heterogeneity underlying EOS is rather substantial. Critical biological processes such as skeletal system development, extracellular matrix organisation and ossification were identified by Gene Ontology enrichment from analysis of the 34 causative genes (online supplementary figure S7).
ES-based CNV analysis
ES-based CNV prediction was performed by computational tools and confirmed by array comparative genomic hybridisation (aCGH, online supplementary methods). Besides 16p11.2 deletions, pathogenic CNVs were identified in five patients, including two with 16p13.1 duplications, one with 22q11.2 deletion, one with 5q35.3 deletion and one with 17q11.2 deletion (online supplementary table S6, figure S8). NF1 likely acts as a key driver gene within the 17q11.2 deletion CNV, given the NF1 phenotype in this patient (XH821). XH821 had cafe-au-lait spots at birth, left inguinal hernia and mild intellectual disability, a more severe phenotype compared with the other three patients with NF1 solved by NF1 SNV alleles (online supplementary table S6). The patient with the 5q35.3 deletion, the leading cause of Sotos syndrome-1 (MIM: 117550) in reported Japanese patients,24 had congenital heart disease, macrocephaly and rapid growth, consistent with the driver gene being NSD1 (online supplementary table S6). We are unable to localise the causative genes for 16p13.1 duplication and 22q11.2 deletion, but their pathogenic roles and causal relationship with scoliosis are supported by other cases from the Database of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources or DECIPHER.25
Clinically relevant secondary findings
Secondary findings involving the 59 ACMG-recommended genes were identified in nine patients.16 Relevant variants were identified in seven genes, including BRCA1, BRCA2, APC, RET, TSC2, COL3A1 and LDLR (online supplementary table S7). Notably, RYR1 is one of the 59 ACMG-recommended genes for secondary findings but in the present cohort was identified as a primary finding (ie, related to the scoliosis phenotype) in three cases (online supplementary table S5). In addition, one patient (XH984) with hyperglycaemia was found to carry a pathogenic GCK variant that could lead to glucokinase-maturity-onset diabetes of the young (MIM: 125851) (online supplementary table S7).
Diagnostic yield in the US cohort of IEOS
For the US cohort, 1 (non-Hispanic white) of the 13 patients received a molecular diagnosis of Sotos syndrome (MIM: 117550) caused by a de novo heterozygous frameshift variant: c.2051_2055TAAAGdel in NSD1 (online supplementary figure S4). This patient presented with premature birth, mild developmental delay, atrial septal defect and ventricular septal defect. Interestingly, one patient in the Chinese cohort who was first diagnosed with CS was also found to carry a 5q35 deletion encompassing NSD1 and thus was molecularly diagnosed with Sotos syndrome.
Molecular classification of patients with EOS
Through the identification of the pathogenic variants/CNVs and their corresponding Mendelian diseases in patients with a primary clinical diagnosis of CS/NMS/IEOS, 84/447 (18.8%) patients from the Chinese cohort (figure 1) and 1/13 patients from the US IEOS cohort were molecularly classified as either TACS or a syndromic form of EOS. The secondary identification of TACS/syndromic EOS by molecular diagnosis indicates the efficacy of genetic testing when patients exhibit only mild phenotypes rather than the whole phenotypical spectrum of specific Mendelian syndromes, and when regular methods to ascertain a diagnosis have been exhausted. Our results also exemplify the cohort-level changes in EOS classification enabled by ES.
Clinical features of cases with different diagnostic status/categories
In an effort to explore the potential medical indications for genetic testing in patients with EOS, we selected 12 clinical features of significance (online supplementary methods). We hypothesised that the observed pattern of phenotypical features might be used to identify individuals who are more likely to benefit from genomic testing. All patients were divided into three groups according to their diagnostic status and categories: the group without a current molecular diagnosis, the TACS group and the syndromic EOS group. The ability of the 12 clinical features mentioned previously to distinguish these three patient groups was analysed using the random forest algorithm (online supplementary methods). A mean reciprocal rank (MRR=0–1) was determined for each clinical feature, demonstrating the degree of relevance between that feature and each of the patient groups (the more relevant, the higher the rank score) (online supplementary table S9).
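To make the MRR score concrete, the sketch below averages the reciprocal of each feature's rank over several importance orderings, for example one ordering per patient group or per random forest run. This is one plausible reading of the metric, not the study's implementation; the function name and the example orderings are hypothetical, and the exact procedure is described in the online supplementary methods.

```python
from statistics import mean
from typing import Dict, List


def mean_reciprocal_rank(rankings: List[List[str]]) -> Dict[str, float]:
    """Average 1/rank of each feature over several ordered feature lists.

    Each inner list orders features from most to least important.
    A feature missing from a list contributes 0 for that list.
    """
    features = set().union(*rankings)
    scores = {}
    for feature in features:
        reciprocal = [1 / (r.index(feature) + 1) if feature in r else 0.0
                      for r in rankings]
        scores[feature] = mean(reciprocal)
    return scores


# Hypothetical importance orderings (most important first).
rankings = [
    ["age_at_presentation", "cobb_angle", "organ_systems"],
    ["organ_systems", "age_at_presentation", "cobb_angle"],
    ["cobb_angle", "organ_systems", "age_at_presentation"],
]
mrr = mean_reciprocal_rank(rankings)
```

A feature ranked first in every ordering would score 1, and scores approach 0 as a feature is consistently ranked lower, which matches the 0–1 range quoted above.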
As a result, the age of presentation (MRR=0.53), the number of organ systems involved (MRR=0.50) and the Cobb angle (MRR=0.49) were ranked as the most distinct features among the three groups. By independently analysing each of the three features, we found that TACS patients are associated with younger age at presentation (1.56±2.17 years, p=0.001; Wilcoxon signed-rank test), involvement of fewer organ systems (0.07±0.26, p=0.002; Wilcoxon signed-rank test) and smaller Cobb angle (51.41±19.85, p=0.032; Student t-test) than patients without a molecular diagnosis (online supplementary table S10 and figure 2A–C). On the other hand, patients with syndromic EOS are associated with a younger age at presentation (2.50±2.52, p=0.038; Wilcoxon signed-rank test) and involvement of more organ systems (0.54±0.68, p=0.026; Wilcoxon signed-rank test) than the undiagnosed group (online supplementary table S10 and figure 2A/2B).
Here, we provide evidence of the important utility of genetic testing in molecularly diagnosing and classifying a large cohort of patients with EOS. By performing ES on Chinese operative patients with EOS (n=447), followed by rare variant family-based genomics, variant interpretation and computational CNV analysis, a molecular diagnostic rate of 20.6% was achieved. As the major subgroup of EOS, patients with CS received a diagnostic rate of 18.6%, which is higher than that in a recent genetic study on CS using both ES (2/28, 8%) and panel-based sequencing of five genes (10/73, 13.7%).26 The increased diagnostic rate observed in patients with CS by ES is possibly due to the more expansive gene coverage in the exome analysis and the utility of computational CNV analysis. The positive rate for neuromuscular EOS (3/5) is consistent with a previous study on a cohort of neuromuscular disease (18/38, 47.4%).27 As anticipated, ES confirmed the diagnosis of syndromic EOS in 8/8 patients. In our study, we also identified the molecular basis of IEOS in 3/23 patients from both the Chinese and the US cohorts, which demonstrated the power of ES in cases that have eluded a clinical diagnosis.
One of the main obstacles for seeking a molecular diagnosis in patients with EOS is the heterogeneous genetic predisposition within this broad clinical description. Indeed, 93 molecularly diagnosed patients were explained by variants in 34 genes and five distinct CNV regions (table 2 and online supplementary table S6). Because our study is driven by a core clinical endophenotype of EOS, patients in our cohort may arguably tend toward the milder end of the clinical spectrum of each molecularly diagnosed syndrome. Strong genetic heterogeneity and limited clinical phenotype expressivity brought challenges to locating the disease-causing genes and characterising gene-specific phenotypes in the patients.
In addition to the molecular diagnosis made for individuals, ES on a large sample size in our study also provided insights into the genetic and genomic architecture of EOS. Several highly relevant biological processes were identified (online supplementary figure S7), which are instructive for both cES analysis and future discovery of novel disease-associated genes.
By phenotype analysis, we identified the Cobb angle, the number of multisystemic defects and the age of onset as the three top clinical phenotypical features associated with the likelihood of identifying a genetic aetiology for CS. Thus, these may help to stratify which patients with EOS are recommended to undergo genetic testing and what testing should be done. Patients with younger age at presentation, fewer multisystemic defects and a smaller Cobb angle are more likely to be affected with TBX6 compound mutations, that is, TACS.7–9 22 For these patients, an aCGH analysis or clinical chromosomal microarray CNV analysis supplemented by targeted next-generation sequencing would be beneficial and cost-effective. In contrast, patients with younger age at presentation and more multisystemic defects are more likely to have a molecular diagnosis in genes other than TBX6. Given the genetic heterogeneity in this patient group, exome or genome sequencing would be the first-line diagnostic tool for patients with more complex phenotypes.
The identification of a molecular diagnosis for certain genetic disorders from this study could inform clinical management and provide new clinical insights. Three pathogenic RYR1 variants were identified in three patients with EOS (online supplementary table S5), resulting in a molecular diagnosis of central core disease (MIM: 117000), which increased their potential risk of developing malignant hyperthermia (MH) with general anaesthesia.28 One of the three patients (XH696) indeed developed MH during his surgery and fortunately survived after immediate clinical response to dantrolene salvage. Under such a condition, the risk of MH could be readily avoided by alternative anaesthesia if the molecular diagnosis is revealed prior to the surgery.29
Even in patients with clinically diagnosed syndromic disorders, such as ACH and NF1, the exact molecular diagnosis, including presumed pathogenic variant allele contribution to the patient’s disease process that was revealed by ES, could inform clinical management. Of the four patients with NF1, three were molecularly diagnosed with SNV alleles in the NF1 gene, and the other was found to carry a 573 kb genomic deletion CNV spanning NF1, different alleles with potentially different clinical consequences. NF1 caused by 17q11.2 deletion is associated with early-onset of NF30 accompanied by congenital heart malformation and cognitive dysfunction31; that is, a more severe clinical phenotype is potentially anticipated than that caused by NF1 variant alleles that were SNV mutations. Therefore, the precise molecular diagnosis has important management implications and potentially lifelong clinical (cardiovascular, skeletal and malignancy) surveillance issues for this patient with NF. Moreover, the variant information may help contribute to family management and recurrence risk information that they seek. Clinical experience suggests that up to half of NF cases may be due to new mutations. Moreover, the clinically observed phenotype of ‘segmental NF’ may potentially reflect mosaic mutations.32
There are several limitations of our study. First, we enrolled more CS cases in the Chinese cohort; this patient selection hinders us from exploring the real diagnostic rate of other forms in patients with EOS. Second, the use of multiple capture and sequencing platforms for ES resulted in different capture regions and exome coverage across the cohort, which might affect the identification of some small fraction of pathogenic SNVs and CNVs. Due to the small number of individual genetic loci (34 genes of about 20 000 interrogated by ES) involved in the molecular diagnoses, subtle differences in capture design, genomic sequencing or sequence analytical pipelines are not likely to affect our molecular diagnostic rate. Nevertheless, due to the small number of molecularly diagnosed Mendelian conditions by each specific gene, we are unable to perform gene-based phenotypical analysis and thus cannot explain the mechanism of phenotypical characteristics identified from molecularly diagnosed patients.
To conclude, ES enabled the molecular classification of EOS in 92 out of 447 (20.6%) patients from the Chinese cohort and 1/13 patients from the US cohort. Specific clinical features/feature pairs are able to indicate the likelihood of gaining a molecular diagnosis through ES.
We appreciate all of the patients, their families and clinical research coordinators who participated in this project. We thank GeneSeeq Inc. for exome sequencing technical support. We thank Ekitech Ltd. (Beijing) for providing machine learning solutions.
CAW, FZ, JRL, JZ and NW are joint senior authors.
Twitter @LiuPF, @poseypod
SZ, YZ, WC, WL and SW contributed equally.
Contributors NW, JZ and SZ conceived of the project and designed the study. SZ, YZ, WC, WL, SW, LW, LJ, ZW, JC and GL collected and interpreted the data. YZ, YY, JL, HZ, ZY, ZC and JS conducted the statistical analysis. SZ, YZ, YW, ML, JL, YY, YH, ZZ, SL, XL, RD, ZL and YN conducted the bioinformatic analyses. JZ, GQ, YW, NG, HZ, YY, YL, YT, WL, YZ, JL, BY, NZ, KY, XY, SL, YX, JH, JS and SZ recruited the patients. SZ, YZ, ZW and NW wrote the first draft of the manuscript, and AMK, YK, BR, JJR, PL, VRS, JEP, CAW, FZ and JRL critically revised the work for important intellectual content. All authors provided crucial input on several iterations of the manuscript and approved the final version.
Funding This research was funded in part by the National Natural Science Foundation of China (81822030 to NW, 81930068 and 81772299 to ZW, 81672123 and 81972037 to JZ, 31625015, 31571297 and 31771396 to FZ, 81871746 to YW and 81772301 to GQ), Beijing Natural Science Foundation (7191007 to ZW), CAMS Initiative Fund for Medical Sciences (2016-I2M-3-003 to GQ and NW, 2016-I2M-2-006 and 2017-I2M-2-001 to ZW), Tsinghua University-Peking Union Medical College Hospital Initiative Scientific Research Program (to NW), the National Key Research and Development Program of China (No. 2018YFC0910506 to N.W. and Z.W., No. 2016YFC0901501 to SZ), and the National Undergraduates Innovation and Training Program of Peking Union Medical College (2019zlgc0627 to SZ), CAMS Innovation Fund for Graduates (2018-1002-01-09 to YZ). Also supported by the US National Institutes of Health, National Institute of Neurological Disorders and Stroke (NINDS R35 NS105078 to JRL), National Human Genome Research Institute/National Heart, Lung, and Blood Institute (NHGRI/NHLBI UM1 HG006542 to Baylor-Hopkins Center for Mendelian Genomics), the National Human Genome Research Institute (NHGRI K08 HG008986 to JEP), TX Scottish Rite Hospital Research Fund (to CAW), Foundation Cotrel (to CAW), and P01 HD084387 (to CAW).
Competing interests JRL has stock ownership in 23andMe, is a paid consultant for Regeneron Pharmaceuticals and Novartis and is a coinventor on multiple US and European patents related to molecular diagnostics for inherited neuropathies, eye diseases and bacterial genomic fingerprinting. The Department of Molecular and Human Genetics at Baylor College of Medicine derives revenue from the chromosomal microarray analysis (by comparative genomic hybridisation microarray and/or single-nucleotide polymorphism arrays), clinical exome sequencing and whole-genome sequencing offered in the Baylor Genetics Laboratory (http://bmgl.com).
Patient consent for publication Not required.
Ethics approval Written informed consent was provided by each participant in the two cohorts. Approval for the study was obtained from the ethics committee at Peking Union Medical College Hospital (JS-098), the institutional review board of the University of Texas Southwestern Medical Center (STU 112010-150) and Baylor College of Medicine (H-29697).
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement The datasets analysed during the current study are available from the corresponding author upon reasonable request.