Original research
Clinical exome sequencing as the first-tier test for diagnosing developmental disorders covering both CNV and SNV: a Chinese cohort
Xinran Dong1, Bo Liu1, Lin Yang2, Huijun Wang2, Bingbing Wu2,3, Renchao Liu2, Hongbo Chen2, Xiang Chen2, Sha Yu2, Bin Chen2, Sujuan Wang4, Xiu Xu5, Wenhao Zhou1, Yulan Lu1

1. Center for Molecular Medicine of Children's Hospital of Fudan University, Institutes of Biomedical Sciences, Fudan University, Shanghai, China
2. Shanghai Key Laboratory of Birth Defects, Pediatrics Research Institute, Children’s Hospital of Fudan University, Shanghai, China
3. Clinical Genetic Center, Children’s Hospital of Fudan University, Shanghai, China
4. Department of Rehabilitation, Children’s Hospital of Fudan University, Shanghai, China
5. Division of Child Health Care, Children’s Hospital of Fudan University, Shanghai, China

Correspondence to Yulan Lu, Children's Hospital & Institutes of Biomedical Sciences, Fudan University, Shanghai 201102, China; yulanlu{at}fudan.edu.cn

## Abstract

Background Developmental disorders (DDs) are early onset disorders affecting 5%–10% of children worldwide. Chromosomal microarray analysis detecting CNVs is currently recommended as the first-tier test for DD diagnosis. However, this analysis omits a high percentage of disease-causing single nucleotide variations (SNVs) that warrant further sequencing. Currently, next-generation sequencing can be used in clinical scenarios detecting CNVs, and the use of exome sequencing in the DD cohort ahead of the microarray test has not been evaluated.

Methods Clinical exome sequencing (CES) was performed on 1090 unrelated Chinese DD patients who were classified into five phenotype subgroups. CNVs and SNVs were both detected and analysed based on sequencing data.

Results An overall diagnostic rate of 41.38% was achieved with the combinational analysis of CNV and SNV. Over 12.02% of patients were diagnosed based on CNV, which was comparable with the published CMA diagnostic rate, while 0.74% were traditionally elusive cases who had dual diagnosis or apparently homozygous mutations that were clarified. The diagnostic rates among subgroups ranged from 21.82% to 50.32%. The top three recurrent cytobands with diagnostic CNVs were 15q11.2-q13.1, 22q11.21 and 7q11.23. The top three genes with diagnostic SNVs were: MECP2, SCN1A and SCN2A. Both the diagnostic rate and spectrums of CNVs and SNVs showed differences among the phenotype subgroups.

Conclusion With a higher diagnostic rate, more comprehensive observation of variations and lower cost compared with conventional strategies, simultaneous analysis of CNVs and SNVs based on CES showed potential as a new first-tier choice to diagnose DD.

• clinical exome sequencing
• first-tier test
• diagnostic rate
• genetic spectrum
• developmental disorder

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

## Introduction

Developmental disorders (DDs) are a group of early onset disorders affecting 5%–10% of children around the world, primarily including neurodevelopmental disorders with/without congenital anomalies, abnormal growth parameters, dysmorphic features and unusual behavioural phenotypes.1 2 To date, chromosomal microarray analysis (CMA) has been regarded as the gold standard method for detecting CNV and a first-tier diagnostic test for DD with an estimated diagnostic rate of 15%–20%.3 4 However, when no positive result is acquired from the CMA test, most patients usually resort to further gene sequencing to detect small genetic variants such as single nucleotide variations (SNVs) and small insertions/deletions (indels). This process is referred to as a ‘diagnostic odyssey’, resulting in unnecessary expenditures, as well as a waste of resources.

Conventional exome sequencing (ES) mainly focuses on the detection of SNVs and indels. However, with the increased sensitivity and accuracy of detecting CNV through ES data, this method can now be used in clinical scenarios.5–9 Compared with traditional methodologies, CNV detection based on ES offers a more flexible resolution for large-scale parallel assessment. In addition to large CNVs, small CNVs of exon-level deletions can also be identified with high-depth next-generation sequencing (NGS) data10–12; these deletions might not be detected in low-resolution microarray tests. The American College of Medical Genetics and Genomics (ACMG) has offered guidelines for CNV and SNV interpretation,13–15 and several studies based on ES data have been published for genetic diagnoses2 8 9 16–21; however, there are still some limitations: (1) traditional pipelines have been deployed on either SNV or CNV analysis alone, leaving the other part not thoroughly evaluated; (2) ES for CNV diagnosis was performed after microarray test screening, making the performance of purely ES-based diagnosis unclear; and (3) studies of heterogeneous disorders or with small sample sizes have provided insufficient guidance for clinical application and poor estimates of the expected diagnostic yield in particular disorders.

At present, whether a genomic microarray should be conducted before gene sequencing for children with DD has not been determined, and the use of ES alone in the DD cohort has not been thoroughly evaluated.22 Thus, research based on using ES data to obtain both CNV and SNV results to diagnose DD patients is desirable for research and clinical practice. In this study, we performed clinical exome sequencing (CES) on 1090 unrelated children with DD phenotypes, generating both CNV and SNV results to assess the performance of using NGS data alone. Our findings have important implications for the precise diagnosis of DD patients.

## Materials and methods

### Study design and sample collection

The outline of the study design is shown in figure 1. Patients over 1 year old were recruited from the Children’s Hospital of Fudan University between February 2017 and February 2019 after meeting the following inclusion criteria: (1) abnormalities in gross/fine motor, speech/language and cognition; (2) abnormalities in social/personal behavioural and unusual behavioural phenotypes; and (3) abnormalities in intellectual ability.1 2 23 Patients were excluded if they had traumas/central nervous system infections or had a history of maternal exposures/infections. Among the enrolled patients, highly suspected karyotype abnormalities were first evaluated using the karyotype test, and patients with positive results were excluded. Before CES, all patients underwent detailed clinical examinations conducted by experienced geneticists. According to clinical phenotypes, namely, clinical manifestations and the results of particular examinations (eg, head MRI and metabolic tests), patients were divided into different phenotype subgroups. After performing CES, both CNVs and SNVs were detected and analysed for molecular diagnosis.

Figure 1

Outline of the study design. Patients suspected of suffering from DD were enrolled according to the inclusion criteria. A total of 1100 patients were originally recruited in this study cohort. After assessing their clinical phenotypes, karyotype tests were performed on those highly suspected of karyotypic abnormalities. A total of 10 patients with abnormal karyotypes were identified (two Turner syndrome and eight trisomy 21) and excluded. Clinical exome sequencing (CES) was performed on the remaining 1090 patients. Variation detection included conventional SNV detection and NGS data-based CNV detection. Genetic interpretation of the detected variations was then conducted. Collectively, 152 diagnostic CNVs were identified in 139 patients, and 397 diagnostic SNVs were identified in 320 patients. The qPCR/MLPA/CMA and Sanger sequencing were respectively performed for confirmation of CNVs and SNVs. CMA, chromosomal microarray analysis; DD, developmental disorders; MLPA, multiplex ligation-dependent probe amplification; SNVs, single nucleotide variations.

### NGS and data processing

A protocol24 using the Agilent (Santa Clara, California, USA) ClearSeq Inherited Disease panel kit for enrichment followed by NGS targeted on 2742 genes was adapted for the clinical testing of every enrolled proband. For variant calling, GATK best practice was employed for SNV/small indels. CANOES25 and HMZDelFinder12 were separately applied for CNV detection, and the results were merged. The annotation and filtrations of both SNVs and CNVs followed those reported in published works.26 27 Detailed descriptions of the sequencing, variant calling, annotation and filtering processes can be found in the online supplementary notes and online supplementary figure S1.
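The merging step for the two CNV callers can be illustrated with a minimal sketch. It assumes that "merged" means taking the union of overlapping calls of the same type in the same sample; the actual rule is described in the supplementary notes and may differ. The `CnvCall` record, the `merge_calls` helper and all coordinates are hypothetical and are not part of CANOES or HMZDelFinder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CnvCall:
    sample: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # inclusive
    kind: str    # "DEL" or "DUP"
    caller: str  # tool that produced the call


def merge_calls(calls):
    """Union overlapping calls that share sample, chromosome and CNV type."""
    merged = []
    ordered = sorted(calls, key=lambda c: (c.sample, c.chrom, c.kind, c.start, c.end))
    for call in ordered:
        group = (call.sample, call.chrom, call.kind)
        if merged and merged[-1]["group"] == group and call.start <= merged[-1]["end"]:
            # Overlaps the previous merged interval: extend it and record the caller.
            merged[-1]["end"] = max(merged[-1]["end"], call.end)
            merged[-1]["callers"].add(call.caller)
        else:
            merged.append({"group": group, "start": call.start,
                           "end": call.end, "callers": {call.caller}})
    return merged


# Hypothetical coordinates, for illustration only.
calls = [
    CnvCall("demo", "chr7", 1_000, 5_000, "DEL", "CANOES"),
    CnvCall("demo", "chr7", 4_000, 6_000, "DEL", "HMZDelFinder"),
    CnvCall("demo", "chr7", 9_000, 9_500, "DEL", "CANOES"),
]
merged = merge_calls(calls)
for region in merged:
    print(region["group"], region["start"], region["end"], sorted(region["callers"]))
```

Keeping the set of supporting callers on each merged region makes it possible to prioritise calls seen by both tools during later filtering.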

### Criteria for variant classification and diagnosis

A previously described variant classification criteria24 based on the ACMG guidelines and the condition of diagnosis that considered both CNVs and SNVs were applied in the genetic analysis. All of the diagnostic SNVs/small indels were confirmed by Sanger sequencing. Diagnostic CNVs <1 Mb or located on the X chromosome were confirmed by qPCR/multiplex ligation-dependent probe amplification/CMA validation. A detailed description of the variant classification and experimental validation information can be found in online supplementary notes, online supplementary table S1 and online supplementary file 1.

## Results

### Characteristics of enrolled patients

A total of 1100 unrelated patients with DD-related phenotypes were recruited following the inclusion criteria. Karyotype tests were performed in 16 patients highly suspected of karyotypic abnormalities. Ten patients obtained positive results (two Turner syndrome and eight trisomy 21) and were excluded from the following study. The remaining 1090 patients underwent CES. In the sequencing cohort, 661 (60.64%) were men and 429 (39.36%) were women, aged from 1 year to 16 years. Detailed demographic information is shown in table 1. Based on clinical phenotypes, patients were divided into two main groups: an isolated DD group (n=348, 31.93%) and a syndromic DD group (n=742, 68.07%). To obtain more distinct findings, patients in the latter group were further classified into four subgroups according to their clinical manifestations, including DD with malformations (n=316, 42.59%), DD with epilepsy (n=289, 38.95%), DD with behavioural troubles (n=165, 22.24%) and DD with metabolic disorders (n=122, 16.44%). Depending on the patients’ clinical phenotypes, the same patient may have been in different subgroups. The detailed subgroup information is shown in figure 2A.

Table 1

Demographic, clinical characteristics and diagnostic rate of patients

Figure 2

Distribution of diagnostic rate and diagnostic variations in different phenotype groups. (A) Sample classification among different phenotypes. The square area represents the isolated DD group. The four coloured ellipses represent the four subgroups of syndromic DD: green for the DD with metabolic disorder group, blue for the DD with behavioural troubles group, pink for the DD with epilepsy group and yellow for the DD with malformation group. Overlap between the different ellipses shows the overlap of patients among those subgroups, with figures indicating the number of individuals. (B) The distribution of diagnostic variations among the different phenotype groups. Among the diagnosed patients, the proportion of cases explained by diagnostic CNV alone, SNV alone, apparent homozygous mutations formed by CNV and SNV or dual diagnoses with both CNV and SNV varied among the different phenotype groups. AH, apparent homozygous mutation (formed by CNV and SNV); DD, developmental disorder; SNV, single nucleotide variations.

### Diagnostic rate of the cohort

By simultaneously analysing SNV and CNV, an overall diagnostic yield of 41.38% (451/1090, 95% CI 38.43% to 44.37%) was achieved, of which diagnostic CNV alone accounted for 29.05% (131/451), diagnostic SNV alone accounted for 69.18% (312/451), 0.67% (3/451) were dual diagnosed with both diagnostic CNV and SNV and the remaining 1.11% (5/451) were apparently homozygous mutations caused by overlapping CNV and SNV.
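The yield and its confidence interval can be checked with a short calculation. The sketch below uses the Wilson score interval as an assumption, because the method used for the reported interval is not stated here; a different method (for example an exact binomial interval) would give slightly different bounds. The function name `wilson_interval` is illustrative.

```python
import math


def wilson_interval(successes, total, z=1.959964):
    """Wilson score interval for a binomial proportion (z = 1.96 for 95%)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


diagnosed, cohort = 451, 1090
# The four diagnostic categories reported in the text add up to all diagnosed patients.
assert 131 + 312 + 3 + 5 == diagnosed

rate = diagnosed / cohort
low, high = wilson_interval(diagnosed, cohort)
print(f"yield {rate:.2%}, 95% CI {low:.2%} to {high:.2%}")
```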

The diagnostic rate varied in different phenotype groups. In the isolated DD group, the yield was 39.94% (139/348, 95% CI 34.76% to 45.30%), close to that of the syndromic DD group, which was 42.05% (312/742, 95% CI 38.47% to 45.69%), with no significant difference (p=0.554). In the four phenotype subgroups, the diagnostic rate ranged from the lowest, 21.82% (DD with behavioural troubles group), to the highest, 50.32% (DD with malformation group). The diagnostic rate of DD with behavioural troubles was significantly lower than that of the other three subgroups (p values for comparison with the metabolic disorder, epilepsy, and malformations group were 1.5e-05, 1.337e-05 and 2.756e-09, respectively), while these three subgroups showed no significant difference from one another (p>0.05). In addition, the diagnostic rate among females was significantly higher than that among males (p=9.542e-05). The detailed diagnostic rates are given in table 1.

Additionally, the proportions of diagnostic SNV and CNV varied in different groups. In the two main groups, the proportions of patients diagnosed by SNV alone were higher than those diagnosed by CNV alone: 61.15% (85/139) vs 35.25% (49/139) in the isolated DD group (p=2.657e-05) and 72.76% (227/312) vs 26.28% (82/312) in the syndromic DD group (p<2.2e-16). In the four subgroups of syndromic DD, the DD with epilepsy group and the DD with metabolic disorders group were the two groups where diagnostic SNVs accounted for the largest percentages, 86.18% and 78.95%, respectively. Comparatively, the two groups in which diagnostic CNVs occurred at the highest prevalence were the DD with malformation group (38.36%) and the DD with behavioural troubles group (38.89%). Detailed numbers and percentages of patients diagnosed with SNVs and CNVs in all groups can be found in table 1 and figure 2B.

Furthermore, three patients carried both a diagnostic SNV and a diagnostic CNV and were considered dual-diagnosed cases. All three of these patients were classified in the isolated DD group, accounting for 2.16% (3/139) of positive patients in this group. In addition, five patients carried both an SNV and a CNV (heterozygous deletion) that overlapped on the same disease-causing gene and formed apparently homozygous mutations. Two of the five patients were labelled as isolated DD, and the remaining three were in the syndromic DD group, accounting for 1.44% and 0.96%, respectively, of the diagnosed patients in the corresponding groups.
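The logic behind flagging an apparently homozygous mutation can be sketched as an interval check: an SNV is a candidate when it falls inside a heterozygous deletion called in the same sample. This is a simplified illustration under that assumption, not the pipeline used in the study; the record types, the `apparent_homozygous` helper, the gene label and the coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snv:
    sample: str
    chrom: str
    pos: int
    gene: str


@dataclass(frozen=True)
class Deletion:
    sample: str
    chrom: str
    start: int   # inclusive
    end: int     # inclusive
    copies: int  # estimated copy number; 1 means heterozygous deletion


def apparent_homozygous(snvs, deletions):
    """Pair each SNV with any heterozygous deletion that covers it in the same sample."""
    hits = []
    for snv in snvs:
        for deletion in deletions:
            same_locus = deletion.sample == snv.sample and deletion.chrom == snv.chrom
            if same_locus and deletion.copies == 1 and deletion.start <= snv.pos <= deletion.end:
                hits.append((snv, deletion))
    return hits


# Hypothetical records, for illustration only.
snvs = [Snv("demo", "chr19", 15_000, "GENE_A"), Snv("demo", "chr2", 500, "GENE_B")]
deletions = [Deletion("demo", "chr19", 10_000, 18_000, 1)]
hits = apparent_homozygous(snvs, deletions)
for snv, deletion in hits:
    print(snv.gene, snv.pos, "lies within deletion", deletion.start, deletion.end)
```

In practice such a candidate still needs parental testing to confirm that the SNV and the deletion sit on different alleles, as in the pedigree analysis of the representative case.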

### Genetic spectrums of the cohort

While 152 diagnostic CNVs were detected among 139 patients, ranging from single exon (77 bp) to 50.7 Mb variants, 397 diagnostic SNVs were found in 320 patients with variants spanning 178 genes. Details regarding diagnostic SNVs and CNVs of the cohort are given in online supplementary table S2 and online supplementary table S3.

### Spectrums of CNVs

Of the 152 diagnostic CNVs, 108 (71.05%) were deletion variants (5 homozygous/hemizygous deletions and 103 heterozygous deletions) and 44 (28.95%) were duplication variants. Among these CNVs, 123 (80.92%) were larger than 1 Mb, 21 (13.82%) were smaller than 1 Mb and spanned multiple genes, and 8 (5.26%) affected a single gene.

The top three recurrent cytobands with diagnostic CNVs ascertained in the cohort were 15q11.2-q13.1 (n=18, Angelman syndrome: MIM#105830; Prader-Willi syndrome: MIM#176270); 22q11.21 (n=16, DiGeorge syndrome: MIM#188400); 7q11.23 (n=9, Williams-Beuren syndrome: MIM#194050; chromosome 7q11.23 duplication syndrome: MIM#609757). Diagnosed CNVs located on these top three recurrent cytobands accounted for 28.29% (43/152) of all diagnostic CNVs. All of the recurrently occurring diagnostic CNVs identified in the cohort are shown in figure 3A.

Figure 3

Recurrent, diagnostic CNVs and SNVs were identified in the cohort. (A) Location and copy-number of recurrent diagnostic CNVs identified in the cohort. Recurrent (≥2 patients) diagnostic CNVs identified in the cohort are shown with colour bars, demonstrating their chromosomal locations, variation types (red bar: deletion; blue bar: duplication) and number of samples (indicated by the depth of colour bar). CNV-affected cytobands and the number of diagnosed patients are listed on the right. (B) Bar plot of the most frequently occurring genes with diagnostic SNVs among the cohort. Diagnostic SNV-influenced genes that recurrently appeared in ≥5 patients are displayed. The colour in each bar indicates the number of cases diagnosed by the specific variant type (row) in the relevant gene (column). (C) Heatmap of identified recurrent diagnostic CNVs among the different phenotype subgroups. Recurrently identified diagnostic CNVs of the cohort and the corresponding patient subgroups are shown. The colour of each cell indicates the number of cases diagnosed by the specific CNV (row) in the relevant phenotype group (column). (D) Heatmap of identified recurrent genes with diagnostic SNVs among different phenotype subgroups. Recurrently identified genes with diagnostic SNVs in the entire cohort and in the corresponding patient subgroups are shown. The colour of each cell indicates the number of cases diagnosed by the specific gene (row) in the relevant phenotype group (column). DD, developmental disorder; SNVs, single nucleotide variations.

The diagnostic CNVs varied among the groups, and there were recurrent CNVs in every group. A total of four recurrent diagnostic CNVs were observed in more than three phenotype groups: 15q11.2-q13.1 deletion/duplication, 22q11.21 deletion/duplication, 7q11.23 deletion/duplication and 16p11.2 deletion/duplication. Overall, recurrent diagnostic CNVs were observed on 14 different cytobands. Of the 14 cytobands, 12 recurring CNVs involved the isolated DD group and the DD with malformations group and were repeatedly detected in several distinct cytobands. Recurrent diagnostic CNVs in the DD with behavioural troubles group and the DD with metabolic disorder group were mainly focused on 15q11.2-q13.1 and 22q11.21, respectively. In particular, two patients with mosaic CNVs, one with a 1.6 Mb duplication on 5q35.3 (S0016) and the other with a 22.3 Mb deletion on 18q21.31-q23 (S0831), were also detected in the study cohort. Further information about diagnostic CNVs among the different phenotype groups is shown in figure 3C, and the validation results of the cases with mosaic variants are given in online supplementary file 2.

### Spectrums of SNVs

Of the 397 diagnostic SNVs, 233 (58.69%) had been reported as pathogenic (P)/likely-pathogenic (LP) variants in ClinVar or marked as ‘DM’/‘DM?’ in The Human Gene Mutation Database (HGMD), and 164 (41.31%) were novel or produced different amino acids from the reported pathogenic variants. Among the diagnostic SNVs, 211 (53.15%) were missense variants, 81 (20.40%) were frameshift variants, 58 (14.61%) were stop-gained variants, 42 (10.58%) were splicing variants, 4 (1.01%) were in-frame indel variants and 1 (0.25%) was a stop-lost variant.

The top 10 genes with diagnostic variants were MECP2 (n=18), SCN1A (n=13), SCN2A (n=9), TSC2 (n=7), ARID1B (n=6), BRAF (n=6), STXBP1 (n=6), TSC1 (n=6), KCNQ2 (n=5) and NF1 (n=5), making up 20.40% (81/397) of all diagnostic SNVs in our cohort, and all were previously reported disease-causing genes of developmental abnormalities28–31 (figure 3B). Moreover, a total of 114 (114/178, 64.04%) genes appeared only once in the patients diagnosed with SNV.

The genes containing diagnostic SNVs differed among the different phenotype groups. A total of 27 genes containing diagnostic SNVs were identified in more than two patients. Among these genes, 13 recurring genes influenced the isolated DD group, of which MECP2 was most frequently involved. Meanwhile, MECP2 was detected in a number of other phenotype groups, including the DD with malformation, DD with epilepsy and DD with behavioural troubles groups. In the DD with malformation group, there were 16 recurrent genes, and none showed an obvious advantage in proportion, while BRAF, which was the most recurrent gene, only appeared in this group. Sixteen recurrent genes influenced the DD with epilepsy group, and unlike other top recurring genes involving multiple phenotypes, the SCN1A gene was only identified in this group and accounted for the largest proportion. Further diagnostic SNVs identified in the different groups are given in figure 3D.

### Representative cases diagnosed by combinational CNV and SNV analysis

#### Apparently homozygous case: previously unexplained but diagnosed with combinational analysis

Case S0027, a boy with global developmental delay (HP:0001263), was diagnosed with congenital hypothyroidism (HP:0000851). In this patient, a novel SNV located on the splice donor of the 10th exon of the SLC5A5 gene (NM_000453:c.1242+1G>A) and an 8 kb deletion variation that affected the 4th-13th exons of SLC5A5 were both detected. Biallelic mutations in the SLC5A5 gene affect thyroid hormone synthesis in thyrocytes and cause thyroid dyshormonogenesis 1 (MIM#274400) with clinical manifestations of growth retardation, thyroid nodules and hyperplastic, and intellectual disability if untreated in infancy. The pedigree analysis demonstrated that the splicing mutation was paternal, and the deletion was maternal (figure 4A–D). These two variations in combination resulted in an apparently homozygous SNV, which caused the boy’s abnormal phenotypes. Noticing that the patient had a younger brother, we suggested an additional test on his sibling and found exactly the same mutations as those found in the patient. According to the molecular testing results, thyroid tablets were given to the younger brother (50µg/day), and his thyroid function was monitored to ensure that his thyroid-stimulating hormone was maintained in the normal range. Fortunately, the younger brother received timely therapeutic treatments and has not shown any abnormalities to date. In this case, a previously unclear situation in which neither SNV nor CNV alone were able to provide an explanation, the patient’s symptoms were finally clarified due to ES data analysis; as a result, timely intervention was introduced, preventing another tragedy for the family.

Figure 4

Characterisation of an apparently homozygous variant formed by SNV and CNV on the SLC5A5 gene responsible for the patient’s phenotype. (A) A paternal SNV located on the splice donor of the 10th exon of the SLC5A5 gene (NM_000453: c.1242+1G>A) and a maternal 8 kb deletion variation that affected the 4th–13th exons of SLC5A5 were both detected in the child. (B) Normalised exon depth ratio of the family; exons influenced by the deletion are labelled with red dots. (C) Sanger results and (D) qPCR results of the variants in the family. Three pairs of primers were used in the qPCR, three biological replicates were performed for each test and the error bars indicate the variation. *Indicates a significant difference from the control sample (p<0.05, two-sided t-test). The X-axis indicates the value of $2^{-\Delta\Delta C_T}$ during qPCR analysis. SNV, single nucleotide variation.
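The quantity on the X-axis of panel D follows the standard comparative threshold-cycle calculation, in which the target is normalised to a reference amplicon and then to a control sample. The sketch below shows that arithmetic only; the function name and the threshold-cycle values are hypothetical and are not taken from the study.

```python
def relative_quantity(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Return 2^(-ddCT): target amount in the sample relative to the control."""
    delta_sample = ct_target_sample - ct_ref_sample      # dCT in the tested sample
    delta_control = ct_target_control - ct_ref_control   # dCT in the control sample
    return 2 ** -(delta_sample - delta_control)


# Hypothetical threshold cycles: the target amplifies one cycle later in the sample.
ratio = relative_quantity(25.0, 20.0, 24.0, 20.0)
print(ratio)
```

A ratio near 1 is consistent with two copies, whereas a ratio near 0.5 is what a heterozygous deletion of the amplified region is expected to give.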

#### Dual diagnosis case: additional genetic information supplied

In case S0690, a 2-year-old girl with skin rash (HP:0000988), erythema (HP:0010783), hearing impairment (HP:0000365) and growth delay (HP:0001510) was diagnosed with ichthyosis (HP:0008064) and developmental delay (HP:0001263). SNV analysis identified a homozygous mutation in the ALDH3A2 gene (NM_000382:c.1157A>G), which is a previously established pathogenic mutation responsible for Sjogren-Larsson syndrome (MIM#270200) that could impair the skin and central nervous system, causing pruritic ichthyosis and intellectual disability. In addition, a 2.2 Mb heterozygous deletion located on 1q21.1-q21.2 was also identified based on the girl’s sequencing data. The 2.2 Mb deletion caused chromosome 1q21.1 deletion syndrome (MIM#612474), which is characterised by growth delay and intellectual disability (mild to moderate). Both SNV and CNV identified in the girl were previously established variations, and each partially explained her clinical phenotypes, making it a dual diagnosis case. Compared with the limited ability of the CMA test in identifying the CNVs, analysis based on ES data helped to provide a more comprehensive visualisation of the variation landscape.

## Discussion

To date, NGS has significantly changed the molecular diagnosis of rare diseases. Compared with the CMA test, which mainly detects CNVs/absence of heterozygosity (AOH), NGS data-based analysis allows simultaneous detection of SNVs, CNVs and AOH under certain conditions.32 In this study, we performed CES on 1090 DD patients suspected of having genetic disorders and reached an overall diagnostic yield of 41.38%. Previously, Wright et al 2 analysed whole-exome sequencing (WES) data of 1133 undiagnosed DD children. By focusing only on de novo and segregating variants in known DD genes, they achieved a diagnostic yield of 27%. Grozeva et al 28 analysed a cohort of 986 individuals with moderate to severe intellectual disability focusing on 565 known or candidate associated genes, and likely pathogenic variants were found in ~11% of the cases. Gilissen et al 33 applied whole-genome sequencing (WGS) to 50 patients with severe intellectual disability and their unaffected parents, studying both CNVs and SNVs, and reported a diagnostic yield of 42%. To the best of our knowledge, this report describes the largest cohort study from a single clinical centre in China to investigate the genetic spectrum among children suffering from DD, and it is also the first study to display the genetic spectrums of CNVs and SNVs identified from ES data in Chinese DD patients. Additionally, the combinational analysis revealed five apparently homozygous cases, three dual diagnosis cases and two mosaic CNV cases. Apparently homozygous cases have often been left undiagnosed by traditional cytogenetic tests, because the causal SNVs are easily missed when only CMA is applied. However, these situations could be better explained when both CNVs and SNVs were detected. Moreover, the detection of mosaic cases was another situation benefiting from the high coverage of the ES data.
Such cases and current results illustrated the importance of simultaneous analysis of both CNVs and SNVs, which could provide a more comprehensive picture of the molecular landscape for genetic interpretation compared with traditional analysis.

In this study, patients were classified based on their clinical manifestations. Diagnostic rates and genetic spectrums of causal variants showed differences in our study cohort. Among all subgroups, patients in the DD with malformation group achieved the highest diagnostic rate. This finding may be attributable to the specificity of the disease-causing gene and the relatively higher recognition and directivity of patients’ particular phenotypes. However, patients in the DD with behavioural troubles group, who were mainly diagnosed with autism spectrum disorder (ASD), achieved the lowest diagnostic rate. We extracted a behavioural trouble-related gene list containing 66 genes from the DatabasE of genomiC varIation and Phenotype in Humans using Ensembl Resources (DECIPHER) (online supplementary table S4) and compared it with the CES targets. The comparison showed that for ‘confirmed’ genes classified by DECIPHER, only three genes, KMT2E, TBL1XR1 and TRIP12, were not covered by the CES. These three genes were all rarely reported in previous studies, so their absence should have little influence on the diagnosis of DD patients with behavioural troubles in this study.34–37 Additionally, we used ‘behavioural abnormalities’ as the search term and extracted a list containing 99 genes from the OMIM database. We compared the gene list with the CES targets and found that only six relatively newly established behavioural-related genes that diagnosed few patients were not covered, namely, FBXO11, TCF20, POLR2A, USP9X, BCORL1 and C19ORF12. Additionally, we obtained a behavioural trouble-related CNV list from DECIPHER, which contained nine CNVs that were all covered and detectable in this study (online supplementary table S5).
Thus, the relatively low diagnostic rate in this subgroup was unlikely to be caused by known disease-related CNVs/genes not being covered by the gene panel used; it might instead result from the complexity of the pathogenesis of ASD and the insufficiency of relevant genetic studies. Among the classified phenotype subgroups, diagnostic CNV had the highest proportion in the DD with malformation group, the top mutations of which were 15q11.2-q13.1 deletion/duplication, 22q11.21 deletion/duplication and 7q11.23 deletion/duplication. In other words, these DD patients with malformations would benefit most from the CMA test. However, in the DD with epilepsy group, the proportion of diagnostic CNVs was found to be the lowest, while the diagnostic SNVs were relatively more common and recurred in known disease-causing genes, such as SCN1A, SCN2A, TSC2, TSC1 and KCNQ2. The same situation was also found in the DD with metabolic disorder group. For patients with the aforementioned phenotypes, in which CNVs only occurred in small proportions, using CMA as a first-tier test may not be effective and may even aggravate the ‘diagnostic odyssey’ phenomenon. Overall, the diagnostic yield of CNV was considerably lower than that of SNV in this cohort, indicating that traditional CMA as a first-tier test had limitations for diagnosing DD patients.

In developing countries where genetic tests are rarely covered by health insurance, patients will suffer more financial losses from the ‘diagnostic odyssey’, let alone the testing time. As a conventional first-tier test, CMA can reach a diagnostic yield of approximately 15%–20%. Most patients with negative CMA results will turn to additional tests, which are often expensive and time consuming. According to the experience in our centre, a CMA test costs $800 with a turnaround time (TAT) of approximately 3 weeks, which is less competitive than CES, which costs $250 with a TAT that is also approximately 3 weeks. According to our diagnosis results, the 12% of patients with diagnostic CNVs might receive a positive molecular diagnosis if tested with CMA, but the remaining 29% of patients would need further tests to identify the causal SNV. In other words, if all the patients were to follow the traditional method of CMA first and NGS second, nearly two-thirds of patients would have to spend more time and more money. In general, since NGS data can be used to analyse SNV and CNV at the same time, which is more cost-effective to patients, CES as a first-tier method is worth considering. Correspondingly, CES would also help physicians who are tasked with choosing an appropriate genetic test for patients. In less developed countries, experienced genetic clinicians are inadequate in number, and most clinicians, due to their limited genetic knowledge, are unable to choose the most suitable testing approach for their patients. Given that CES makes simultaneously detecting CNV and SNV possible, it could be a better choice for both clinicians and patients.
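The cost argument can be made concrete with a rough per-patient calculation using the prices quoted from our centre. The sketch rests on simplifying assumptions that are not stated in the text: every patient without a diagnostic CNV on CMA proceeds to CES, confirmation tests are ignored and the 12% CNV-diagnosed fraction stops after CMA. The helper name is illustrative.

```python
CMA_COST = 800  # US dollars per test, as quoted above
CES_COST = 250  # US dollars per test, as quoted above


def expected_cost_cma_first(fraction_resolved_by_cma):
    """Average per-patient cost when CMA is done first and CES only if CMA is negative."""
    return CMA_COST + (1 - fraction_resolved_by_cma) * CES_COST


sequential = expected_cost_cma_first(0.12)  # assumption: 12% stop after CMA
ces_first = CES_COST                        # one test covers both CNV and SNV
print(f"CMA then CES: ${sequential:.0f} per patient on average")
print(f"CES first:    ${ces_first} per patient")
```

Under these assumptions the sequential strategy costs roughly four times as much per patient, before any difference in turnaround time is considered.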

CES detection of both CNV and SNV achieved a considerable diagnostic yield in our cohort, illustrating its potential for conventional clinical application. On the one hand, analysing two kinds of variations from a single test enhances the ability of molecular interpretation. On the other hand, this approach reduces the cost and time of clarifying the diagnosis compared with the traditional strategy of sequentially performing CMA and NGS. However, there are still some limitations to this approach that require improvements. ES data generated from capture-based sequencing make it difficult to clearly identify CNV breakpoints. Additionally, capture-based sequencing may result in deviations between different sequencing batches, while the accuracy of CNV detection is highly correlated to the stability of data and robustness of the algorithm. In addition, the ES data have inevitable limitations for certain variations, for example, AOH, uniparental disomy, balanced translocation or inversion, as well as for variants located in the ‘NGS dead zone’. Robust and specific tools still need to be developed, and the completion of this part will be very helpful for the comprehensive assessment of chromosomal diseases. When patients acquire no positive results using CES, WES or WGS could be conducted for further potential variation detection and possibly achieve a positive diagnosis. For example, by comparing with an unpublished internal dataset (2195 DD individuals) that used WES for molecular diagnosis, eight genes (WDR45, DDX3X, AHDC1, ARID2, GNB1, KCNA2, KCNH1 and PURA) not included in the CES target but with detected P/LP variants were identified in at least three patients. These genes were relatively newly discovered compared with the CES design. Additionally, we obtained a list of CNV-related syndromes provided by DECIPHER, which contained 67 expert-curated microdeletion/microduplication syndromes involved in DD (as of 4 September 2019).
We compared the regions of these 67 CNVs with the CES-covered targets. The comparison results showed that three CNVs were not covered by the panel, namely, 12p13.33 microdeletion syndrome, recurrent 16p12.1 microdeletion and Leri-Weill dyschondrosteosis SHOX deletion (four CNVs on chromosome Y were covered by this panel, but CNVs on chromosome Y were not performed in this study). These DD-related CNVs could be missed following the method applied in this study. Detailed comparison results are given in the online supplementary table S6. For WGS, further potential disease-causing variants, such as non-coding region variants and copy neutral structural variations, could be identified.
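A comparison of syndrome regions against panel targets can be sketched as an interval-overlap check. The example below is illustrative only: the text does not state how "covered" was defined, so the sketch assumes that a syndrome region counts as covered when it overlaps at least one target interval. The coordinates, syndrome names and function names are toy placeholders, not real genomic positions or the authors' pipeline.

```python
from typing import Dict, List, Tuple

# (chromosome, start, end) with half-open coordinates [start, end)
Region = Tuple[str, int, int]


def overlaps(a: Region, b: Region) -> bool:
    """Return True when two regions share a chromosome and at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def uncovered_regions(syndromes: Dict[str, Region],
                      targets: List[Region]) -> List[str]:
    """List the syndromes whose region overlaps no capture target.

    Assumption: any overlap counts as coverage; a real analysis might
    require a minimum number of targets or a minimum covered fraction.
    """
    return [name for name, region in syndromes.items()
            if not any(overlaps(region, target) for target in targets)]


# Toy data (hypothetical coordinates, for illustration only).
toy_targets = [("chr1", 1000, 2000), ("chr2", 500, 900)]
toy_syndromes = {
    "syndrome_A": ("chr1", 1500, 5000),  # overlaps the first target
    "syndrome_B": ("chr2", 1000, 3000),  # same chromosome, no overlap
    "syndrome_C": ("chr3", 100, 200),    # chromosome with no targets
}

missed = uncovered_regions(toy_syndromes, toy_targets)
print(missed)
```

With the toy data, `syndrome_B` and `syndrome_C` are reported as not covered. For a full panel with many thousands of targets, sorting the targets or using an interval tree would be more efficient than this pairwise scan.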

In conclusion, by simultaneously analysing SNVs and CNVs based on NGS data, our study reached a high diagnostic yield in children with DD. This approach is more cost-effective than the conventional diagnostic strategy. The subgroups with different phenotypes showed diverse genetic spectrums of both CNV and SNV. The results demonstrated the potential of analysing SNVs and CNVs from NGS data in combination for genetic interpretation, paving the way for a new first-tier test for DD patients.

## Acknowledgments

We are very grateful to the patients’ families for their trust in our laboratory, and would also like to thank Professor FX for his support and mentorship.

## Footnotes

• XD and BL contributed equally.

• Correction notice The article has been corrected since it was published Online First. The funding statement has been amended.

• Contributors YL and WZ conceived the analysis. BL and XD performed the analysis. XD, BL, BC and YL drafted the manuscript. LY, HW and BW reviewed the results. RL, HC and SY performed the validation experiments. SW and XX performed clinical diagnosis, communication with patients and patient care. XC reviewed the clinical phenotypes. All authors read and approved the final manuscript. The authors wish it to be known that, in their opinion, XD and BL should be regarded as joint first authors.

• Funding This work was funded by the National Natural Science Foundation of China (31701152, 31701138), the National Key Research and Development Program (2016YFC0905102, 2018YFC0116903), the Science and Technology Commission of Shanghai Municipality (16ZR1446500), the Shanghai Sailing Program (16YF1401000), the Shanghai Hospital Development Center (SHDC 12017110), the Shanghai Key Laboratory of Birth Defects (13DZ2260600) and Research projects of the Shanghai municipal health and family planning committee (20174Y0026).

• Competing interests None declared.

• Patient consent for publication Not required.

• Ethics approval This study was approved by the ethics committees of Children's Hospital of Fudan University (2014–107 and 2015–130). Counselling was performed by physicians prior to testing. Informed consent was obtained from the parents of each patient.

• Provenance and peer review Not commissioned; externally peer reviewed.

• Data availability statement Data are available on reasonable request. The data that support the findings of this study are either included in the article (or in its supplementary files) or available from the corresponding author on reasonable request. The data are not publicly available due to privacy or ethical restrictions.
