Original research
Contribution of de novo and inherited rare CNVs to very preterm birth
  1. Hilary S Wong1,
  2. Megan Wadon2,
  3. Alexandra Evans2,
  4. George Kirov2,
  5. Neena Modi3,
  6. Michael C O'Donovan2,
  7. Anita Thapar2
  1. 1 Department of Paediatrics, Cambridge University, Cambridge, UK
  2. 2 MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University, Cardiff, UK
  3. 3 Section of Neonatal Medicine, Imperial College London, London, UK
  1. Correspondence to Professor Anita Thapar, MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University, Cardiff, South Glamorgan CF24 2FN, UK; thapar{at}


Background The genomic contribution to adverse health sequelae in babies born very preterm (<32 weeks’ gestation) is unknown. We conducted an investigation of rare copy number variants (CNVs) in infants born very preterm as part of a study to determine the feasibility and acceptability of a larger, well-powered genome-wide investigation in the UK, with follow-up using linked National Health Service records and DNA storage for additional research.

Methods We studied 488 parent–offspring trios. We performed genotyping using Illumina Infinium OmniExpress Arrays. CNV calling and quality control (QC) were undertaken using published protocols. We examined de novo CNVs in infants and the rate of known pathogenic variants in infants, mothers and fathers and compared these with published comparator data. We defined rare pathogenic CNVs as those consistently reported to be associated with clinical phenotypes.

Results We identified 14 de novo CNVs, representing a mutation rate of 2.9%, compared with 2.1% reported in control populations. The median size of these CNVs was much larger than in comparator data (717 kb vs 255 kb). The rate of pathogenic CNVs was 4.3% in infants, 2.7% in mothers and 2.0% in fathers, compared with 2.3% in UK Biobank participants.

Conclusion Our findings suggest that the rate of de novo CNVs, especially rare pathogenic CNVs, could be elevated in those born very preterm. However, we will need to conduct a much larger study to corroborate this conclusion.

  • microarray
  • molecular genetics
  • copy-number
  • developmental



Around 1 in 10 babies are born preterm (<37 weeks’ gestation), and in high-income settings, survival rates are well over 90%.1 The adverse health sequelae of preterm birth are multiple, are especially common in the most immature (very preterm birth, <32 weeks’ gestation) and include respiratory, cardiac, neurological, developmental and neuropsychiatric disorders, especially attention deficit/hyperactivity disorder (ADHD) and autism spectrum disorder (ASD).2–7 Family8 9 and twin studies10 11 have also shown that preterm birth is modestly heritable, highlighting that both the maternal and fetal genome contribute to preterm birth.8 9 However, limited progress has been made in the discovery of genes predisposing to preterm birth.12 Genome-wide molecular genetic studies that could provide novel insights into the relationship between preterm birth and later adverse health outcomes are sparse.

A large maternal genome-wide association study (GWAS) of gestational age at delivery (n=43 568 mothers),13 which included 3331 women who self-reported preterm birth (<37 weeks), identified six maternal loci significantly associated with gestational age and three with preterm birth. A more recent fetal GWAS meta-analysis identified one locus on chromosome 2q13 associated with gestational duration, although no locus reached genome-wide significance for preterm birth as a dichotomous outcome.14 Huusko et al 15 reported that rare variants in the glucocorticoid receptor signalling pathway were common to ten Finnish mothers with recurrent spontaneous preterm births. There have been no well-powered maternal or fetal investigations of rare mutations in preterm birth.

Genome-wide studies have implicated CNVs in developmental delay16 and neuropsychiatric disorders such as ASD17 and ADHD18 19 that are known to be associated with preterm birth. However, such studies have yet to be conducted for preterm birth on a large scale. One study of 454 mothers that adopted a genome-wide, case–control approach comparing those who delivered an infant prior to 34 weeks with those delivering at term found no increased burden or specific CNV associations.20 A more recent study included 270 mothers and infants who were born prior to 37 weeks’ gestation.21 However, this investigation implicated gene pathways but did not report on de novo mutations. Studies of de novo mutations using parent–offspring trios provide an especially powerful approach because of the likely pathogenicity of such mutations and the low background rate in healthy individuals. To date, one small sequencing investigation of 292 preterm (<37 weeks’ gestation) parent–offspring trios suggested an increased burden of fetal de novo rare coding variants in genes involved in early brain development.22 In this investigation, we use data from a pilot study to assess the feasibility of recruiting a much larger UK-wide cohort of very preterm-born infants from UK neonatal units for genomic investigations and for assessing links with later neuropsychiatric disorders using National Health Service record linkage. We report the results of this study separately. Here we present the genome-wide investigation of rare CNVs in 488 very preterm-born infants, using parent–offspring trios.



Between 1 May 2017 and 30 June 2018, we recruited infants born at <32 weeks’ gestation and, where possible, one or both parents from 60 neonatal units in England. Local clinical teams approached potential participants at any point during the infants’ neonatal hospitalisation to explain the study. We excluded families if the pregnancy had arisen from a donor gamete, as trio-based analysis would not be possible. Otherwise, we included all consenting families in order not to bias the study population. Parents provided written informed consent for themselves and their infant to participate in this study.

Clinical staff collected peripheral blood into EDTA specimen tubes using a heel lance or venepuncture from the infants (0.5–1 mL) at the time of medically indicated sampling and by venepuncture (5 mL) from each parent. The samples were posted to the Cardiff laboratory where they were stored prior to DNA extraction.

Laboratory methods

We used nucleon BACC3 genomic kits to extract DNA from both parent and infant blood samples. We performed genotyping using standard protocols and Illumina Infinium OmniExpress Arrays.


CNV calling and QC

We performed CNV calling using protocols we have published previously.23 Briefly, we processed raw intensity data using Illumina GenomeStudio software, with SNPs clustered using the current sample. Log R ratios (LRRs) and B-allele frequencies (BAFs) were calculated and PennCNV software24 was used to call CNVs, adjusting for guanine-cytosine (GC) content following standard protocols.25 We merged adjacent CNV calls that were separated by less than 50% of their combined length.
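The merging rule in the last sentence can be illustrated with a short Python sketch. This is not the pipeline used in the study: the `CnvCall` record, the `merge_adjacent` function and the coordinates in the usage example are hypothetical, and "combined length" is assumed here to mean the sum of the lengths of the two calls, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass
class CnvCall:
    chrom: str
    start: int  # first base of the call (bp)
    end: int    # last base of the call (bp)
    kind: str   # "del" or "dup"


def merge_adjacent(calls, max_gap_fraction=0.5):
    """Merge neighbouring calls of the same type on the same chromosome.

    Assumption: two calls are merged when the gap between them is smaller
    than max_gap_fraction times the summed length of the two calls.
    """
    merged = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.start)):
        prev = merged[-1] if merged else None
        if prev and prev.chrom == call.chrom and prev.kind == call.kind:
            gap = max(0, call.start - prev.end - 1)
            combined = (prev.end - prev.start + 1) + (call.end - call.start + 1)
            if gap < max_gap_fraction * combined:
                # Replace the previous call with the merged span.
                merged[-1] = CnvCall(prev.chrom, prev.start,
                                     max(prev.end, call.end), prev.kind)
                continue
        merged.append(call)
    return merged


# Illustrative coordinates only, not study data.
example = [
    CnvCall("1", 100_000, 199_999, "del"),
    CnvCall("1", 250_000, 349_999, "del"),  # 50 kb gap: merged with the first
    CnvCall("1", 900_000, 949_999, "del"),  # 550 kb gap: kept separate
]
result = merge_adjacent(example)
```

In this example the first two calls are joined into one call spanning 100 000 to 349 999, and the third call is left unchanged.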

Individuals were excluded if the LRR SD was >0.2, the waviness factor was <−0.03 or >0.03, or the total number of CNVs called at three probes or more was >50. If any family member failed these criteria, the whole trio was excluded. We excluded CNVs that occurred in >1% of individuals, had a length of <10 kb or were called by <15 probes, as these can be unreliable. We confirmed family relationships using identity by descent.26 We excluded 16 trios for failing quality control requirements. The total remaining sample comprised 488 trios. We inspected potential de novo CNVs for LRR and BAF patterns for each of the three family members using the Genome Browser tool of GenomeStudio (online supplementary material). Because the patterns were definitive, we did not undertake additional validation.
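The sample-level and CNV-level filters described above could be expressed roughly as follows. This is a minimal sketch, not the authors' code: the `SampleQc` record and the function names are hypothetical, and the LRR SD criterion is applied so that noisier samples (SD above 0.2) fail, which is the conventional direction for this metric and should be read as an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass
class SampleQc:
    sample_id: str
    family_id: str
    lrr_sd: float    # standard deviation of the log R ratio
    waviness: float  # waviness factor
    n_cnvs: int      # CNVs called at three probes or more


def sample_passes(s):
    """Sample-level QC; assumes a high LRR SD marks a noisy sample."""
    return s.lrr_sd <= 0.2 and -0.03 <= s.waviness <= 0.03 and s.n_cnvs <= 50


def passing_trios(samples):
    """Return family IDs in which all three members pass sample QC."""
    by_family = {}
    for s in samples:
        by_family.setdefault(s.family_id, []).append(s)
    return sorted(
        fam for fam, members in by_family.items()
        if len(members) == 3 and all(sample_passes(m) for m in members)
    )


def cnv_passes(length_bp, n_probes, carrier_fraction):
    """CNV-level QC: keep calls of >=10 kb, >=15 probes, in <=1% of individuals."""
    return length_bp >= 10_000 and n_probes >= 15 and carrier_fraction <= 0.01


# Illustrative values only, not study data.
samples = [
    SampleQc("c1", "F1", 0.15, 0.01, 12),
    SampleQc("m1", "F1", 0.17, -0.02, 20),
    SampleQc("f1", "F1", 0.12, 0.00, 8),
    SampleQc("c2", "F2", 0.31, 0.01, 10),  # fails on LRR SD
    SampleQc("m2", "F2", 0.14, 0.01, 10),
    SampleQc("f2", "F2", 0.13, 0.02, 10),
]
kept = passing_trios(samples)
```

Here family F2 is dropped as a whole because one member fails, mirroring the rule that a single failing family member excludes the trio.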


Comparator datasets

In the absence of large genome-wide CNV studies in newborn infants to use as a comparator, we compared our findings with published control population de novo CNV rates.27 28 These were derived from two separate datasets, totalling 3495 parent–proband trios. The first dataset consists of an Icelandic whole population dataset recruited for research conducted at deCODE genetics. Incomplete trios and probands known to be affected with neurodevelopmental/psychiatric disorders were excluded.27 The second dataset consists of 872 unaffected siblings used for research into autism.28

Analysis of pathogenic CNVs

We assessed the rate of pathogenic CNVs in infants and their parents. We provide the list of tested CNVs in online supplementary table S1. Rare pathogenic CNVs were defined as those in regions associated with a clinical phenotype that have been documented to be clinically significant in multiple peer-reviewed publications, consistent with the criteria proposed by the American College of Medical Genetics.29 We used the lists of such CNVs compiled by Cooper et al 16 and Dittwald et al,30 following our previous publication on the UK Biobank,31 and excluding three common CNVs that would only introduce noise. These are 2q13(NPHP1) deletions and duplications, which are only pathogenic when deleted in the homozygous state, and duplications at 15q13.3(CHRNA7), which appear not to be pathogenic. We used the UK Biobank as a comparator group for the rate of CNVs, using the approach we have previously used.31


Description of participants

The study population consisted of 488 parent–offspring trios (figure 1). The infants were born at a median gestational age of 28.7 weeks (range 22 weeks and 6 days to 31 weeks and 6 days’ gestation) and with a median birth weight of 1100 g (range 445 g to 2230 g) (table 1). In 273 (56.0%) cases, preterm birth followed spontaneous onset of labour, and in the remaining cases, delivery was medically indicated due to fetal or maternal pregnancy complications.

Table 1

Characteristics of study participants (proband)

Figure 1

Flow chart of recruitment and establishment of study cohort. VPT, very preterm.

De novo CNV (infants)

We identified 14 de novo CNVs in probands, representing a per diploid genome de novo mutation rate of 2.9%. Mutations included six duplications and eight deletions (table 2). Of the individuals carrying a de novo CNV, five were girls and eight were boys, with one individual possessing two de novo deletion CNVs (table 2).

Table 2

De novo CNV

Overall, 2.7% of individuals had a de novo CNV. The mutation rate for CNVs >10 kb was 2.9% (14/488), and for CNVs >100 kb, 2.7% (13/488). Both de novo CNV rates are higher than those in the comparator datasets (table 3),32 and the median size was over 2.5-fold larger (717 kb vs 255 kb, Mann-Whitney U test p=0.022). The median size was also larger for de novo CNVs >100 kb, although the difference was not statistically significant (812 kb vs 314 kb, p=0.062) (table 3). In our sample, the mutation rate for CNVs among individuals born following spontaneous preterm birth was 3.3% (9/273) and following medically indicated preterm birth 1.9% (4/214). The range of gestational ages for individuals with a de novo CNV was 24–31 weeks.
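The percentages in this paragraph follow directly from the reported counts. The Python snippet below simply redoes that arithmetic; the helper `rate_percent` and the dictionary labels are hypothetical names introduced for illustration, and only counts quoted in the text are used.

```python
def rate_percent(events, n, digits=1):
    """Return events / n as a percentage rounded to the given number of digits."""
    return round(100 * events / n, digits)


# (events, denominator) pairs taken from the counts reported in the text.
reported_counts = {
    "de novo CNVs >10 kb": (14, 488),
    "de novo CNVs >100 kb": (13, 488),
    "spontaneous preterm birth": (9, 273),
    "medically indicated preterm birth": (4, 214),
}

rates = {label: rate_percent(k, n) for label, (k, n) in reported_counts.items()}
for label, value in rates.items():
    print(f"{label}: {value}%")

# Ratio of the reported median sizes (717 kb vs 255 kb).
size_ratio = round(717 / 255, 2)
print(f"median size ratio: {size_ratio}")
```

The computed values (2.9%, 2.7%, 3.3% and 1.9%, with a size ratio of 2.81) agree with the figures quoted above.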

Table 3

Rate and median size of de novo CNVs in very preterm infants and in the comparator group

Several of the de novo CNVs have likely or known causal effects on developmental delay, neuropsychiatric and neurological disorders as well as other health problems (table 2). These were: (1) a 2 Mb de novo duplication on chromosome 17 spanning PMP22 that is known to cause Charcot-Marie-Tooth Type 1A disease (28-week gestation boy); (2) a de novo deletion of 4.9 Mb on the maternally derived copy of chromosome 15, causing Angelman syndrome (31-week gestation boy); and (3) a de novo deletion on chromosome 4 known to cause Wolf-Hirschhorn syndrome (30-week gestation girl). Other de novo CNVs associated with neurodevelopmental disorders were a deletion of 622 kb on 2p16.3 that spanned exons of the NRXN1 gene (associated with schizophrenia and intellectual disability) in a boy born at 29 weeks’ gestation, a deletion of 812 kb on chromosome 3 that spans the CNTN4 gene (implicated in ASD) in a girl born at 25 weeks’ gestation, and a 297 kb 15q11.2 duplication spanning CYFIP1 in a girl born at 30 weeks’ gestation. We also found a very large (>11 Mb) de novo pericentric duplication on chromosome 8 in a boy born at 30 weeks.

All rare pathogenic CNV (transmitted and non-transmitted in parents and probands)

We list pathogenic CNVs in parents and probands in online supplementary table S1 and summarise their frequencies in table 4. There were a total of 21, 13 and 10 known pathogenic CNVs in infants, mothers and fathers, respectively, translating to frequencies of 4.3%, 2.7% and 2.0%. The rate for the same list of CNVs in the UK Biobank is 2.3%.

Table 4

Rates of rare pathogenic CNVs identified in infants born very preterm and their parents and compared with UK Biobank


This is the largest genome-wide study to date of rare CNVs in very preterm infants and parents. We identified 14 de novo CNVs in 13 infants born very preterm, with one individual having two de novo CNVs. Overall, the frequency of de novo CNVs was higher than the rate reported in a healthy comparator group (2.9% vs 2.1%, respectively), and the difference was more pronounced when we only included large CNVs over 100 kb in length (2.7% in very preterm infants vs 1.6% in the comparator group). The latter approach may be a more valid comparison, as the calling of smaller CNVs is more reliant on array type and genotyping quality. De novo CNVs in those born very preterm were also much larger than in the comparator group. In general, large CNVs are more likely to have adverse impacts because they span a longer section of the genome and are therefore more likely to encompass an increased number of genes.33 34 As de novo CNVs have not been subjected to selection pressure, they are also more likely than transmitted CNVs to have a serious adverse effect on the individual.32 35–37

Our sample of children born very preterm has a larger rate and size of de novo CNVs than expected. This suggests that de novo CNVs may arise because of factors acting around the time of conception. Larger, well-powered studies are required to address this possibility as well as longitudinal studies to assess if the CNVs identified contribute to health problems known to be associated with preterm birth such as ADHD, autism and developmental delay. While we observed a higher de novo mutation rate in the ‘spontaneous preterm birth’ subgroup, the aetiological pathways for ‘spontaneous’ and ‘medically indicated’ preterm births may not be distinct.38 Most recognised risk factors such as advanced maternal age and lower socioeconomic status are shared between these categories. The conditions that lead to ‘medically indicated’ deliveries will result in adverse pregnancy outcome (early fetal demise or preterm birth) without intervention. Furthermore, the recurrence risk of spontaneous preterm birth is elevated after medically indicated preterm birth in previous pregnancies, and vice versa, suggesting that the two categories share common aetiologies.39

Although our study sample was not large enough to detect any significantly associated individual CNV, three de novo CNVs that we identified in our very preterm-born cohort are known to cause rare diseases or syndromes that have significant detrimental impacts on development and physical health. Charcot-Marie-Tooth disease, also known as hereditary motor and sensory neuropathy,40 has an approximate prevalence of 1 in 2500.41 Prader-Willi syndrome or Angelman syndrome occurs in approximately 1 in 15 000 births, and Wolf-Hirschhorn syndrome, characterised by growth impairment, developmental delay, intellectual disability, seizures and distinctive physical features, occurs in approximately 1 in 50 000 births.42

The de novo pericentric duplication of chromosome 8 we observed has been reported previously in individuals with a variety of difficulties, including developmental delay,43 although the extent to which it is pathogenic is debated.44 Three additional CNVs are known to be associated with neuropsychiatric disorders and intellectual disability. Exonic deletions of the NRXN1 gene are robustly associated with schizophrenia45 and ASD,46 as well as developmental and language delay.47 These conditions have also been robustly associated with preterm birth.48–50 CNVs affecting CNTN4 51 and CYFIP1 have been implicated in autism.52

The findings suggest that de novo CNVs with adverse clinical consequences may be more common than expected in those born very preterm, and if this is true, the same is likely true for other, smaller forms of de novo exonic mutation. We speculate that these deleterious preterm-associated mutations involve genes with roles in embryonic processes and fetal development, and that the pathogenic consequences of these mutations cover a spectrum of outcomes from early fetal demise to pregnancy complications and fetal health disorders, resulting in preterm birth. Large-scale genomic studies of those born preterm, especially very preterm, potentially using whole exome or whole genome sequencing approaches,12 are needed to test this hypothesis and guide future decisions about the value of newborn genomic screening in this high-risk group. This is a highly topical debate given the many associated ethical, social and legal issues.53 A recent UK report has recommended against screening all babies using whole-genome sequencing. However, the balance of benefits to hazards of undertaking genome analysis in groups that are highly enriched for genetic disorders, some of which are amenable to therapeutic intervention,54 may favour screening. Our data suggest that infants born very preterm may be such a group.

We also examined transmitted rare pathogenic CNVs. Their rate was possibly elevated in the very preterm offspring, as was the rate in their mothers. Twin and family investigations have found a significant maternal and fetal genetic contribution to preterm birth9 10 but only a negligible paternal contribution.11

It is difficult to draw direct comparisons between our study and previous studies because of differences in the methods used. The most comparable investigation, using a trio design and sequencing, suggested an increased burden in preterm birth of de novo mutations in genes involved in early fetal brain development.22 The preterm infants in that study were born at a later gestation (predominantly 32–37 weeks’ gestation) than our study cohort. We found no overlap between the reported regions of de novo mutations and the CNVs identified in our study.

Our study has limitations and strengths. Although larger than most previous studies, it was underpowered to identify individual rare pathogenic variants. We only assessed rare CNVs, which is only one component of the spectrum of genetic variation.55 It was also not possible to investigate maternal de novo mutations in the mother–father–offspring study design. Larger samples, sequencing studies as well as longitudinal follow-up to assess links between preterm birth and later health outcomes will be important for the future especially as such studies may help guide clinical practice.

In summary, preterm birth is a growing public health problem with implications across the life span. However, there have only been a small number of investigations of rare genetic variants,13–15 22 with most studies showing inconsistent results without adequate power. Our findings suggest that the rate of de novo CNVs, especially rare pathogenic variants, could be elevated in those born very preterm. However, a larger study will be necessary to corroborate this conclusion.


We are extremely grateful to the participating families and to staff at all the neonatal units. We would like to thank R Colquhoun at the National Neonatal Data Analysis Unit at Imperial College London and L Bates, J Morgan, N N Vinh, A Evans, L Hopkins, L Tram, S Jaques and S Lewis for laboratory and technical assistance at the Division of Psychological Medicine and Clinical Neurosciences at Cardiff University School of Medicine. This research has been conducted using the UK Biobank Resource under Application Number 14421.



  • HSW and MW are joint first authors.

  • Twitter @hilaryswong, @na

  • Contributors NM and AT conceptualised this study; NM, AT, MCO and HSW designed the study; AT, MCO, NM and HSW obtained funding. HSW coordinated the clinical aspects including sample collection; MW wrote the first draft of the manuscript supervised by AT and MCO, and edited by NM; CNVs were called by AE; CNV analyses were overseen and supervised by GK; all authors contributed to the editing of the manuscript and approved the final version.

  • Funding This pilot study was funded by the Medical Research Council, UK (reference MR/N025288/1).

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Data are available on reasonable request. Anonymised participant data may be available for reuse, subject to research ethics committee approval. Requests should be made to the corresponding author.