Original Article
GWAS on prolonged gestation (post-term birth): analysis of successive Finnish birth cohorts
  1. William Schierding1,2,
  2. Jisha Antony3,
  3. Ville Karhunen4,5,6,7,
  4. Marja Vääräsmäki4,5,6,7,8,
  5. Steve Franks4,5,6,7,
  6. Paul Elliott6,
  7. Eero Kajantie8,9,10,
  8. Sylvain Sebert4,5,11,
  9. Alex Blakemore4,5,6,7,
  10. Julia A Horsfield2,3,
  11. Marjo-Riitta Järvelin4,5,6,7,
  12. Justin M O’Sullivan1,2,
  13. Wayne S Cutfield1,2
  1. 1 University of Auckland, Auckland, New Zealand
  2. 2 Gravida: National Centre for Growth and Development, University of Auckland, Auckland, New Zealand
  3. 3 Department of Pathology, Dunedin School of Medicine, The University of Otago, Dunedin, New Zealand
  4. 4 Center for Life Course Health Research, Faculty of Medicine, University of Oulu, Oulu, Finland
  5. 5 Biocenter Oulu, University of Oulu, Oulu, Finland
  6. 6 Department of Epidemiology and Biostatistics, MRC–PHE Centre for Environment & Health, School of Public Health, Imperial College London, London, UK
  7. 7 Unit of Primary Care, Oulu University Hospital, Oulu, Finland
  8. 8 PEDEGO Research Unit, MRC Oulu, Oulu University Hospital and University of Oulu, Oulu, Finland
  9. 9 Chronic Disease Prevention Unit, National Institute for Health and Welfare, Oulu, Finland
  10. 10 Children’s Hospital, Helsinki University Hospital and University of Helsinki, Helsinki, Finland
  11. 11 Department of Genomics of Complex Diseases School of Public Health, Imperial College London, London, UK
  1. Correspondence to Dr Justin M O’Sullivan, Liggins Institute, University of Auckland, Private Bag 92019, Auckland, New Zealand; justin.osullivan{at}auckland.ac.nz

Abstract

Background Gestation is a crucial timepoint in human development. Deviation from a term gestational age correlates with both acute and long-term adverse health effects for the child. Both being born preterm and post-term, that is, having short and long gestational ages, are heritable and influenced by the prenatal and perinatal environment. Despite the obvious heritable component, specific genetic influences underlying differences in gestational age are poorly understood.

Methods We investigated the genetic architecture of gestational age in 9141 individuals, including 1167 born post-term, across two Northern Finland cohorts born in 1966 or 1986.

Results Here we identify one globally significant intronic genetic variant within the ADAMTS13 gene that is associated with prolonged gestation (p=4.85×10−8). Additional variants that reached suggestive levels of significance were identified within introns at the ARHGAP42 and TKT genes, and in the upstream (5’) intergenic regions of the B3GALT5 and SSBP2 genes. The variants near the ADAMTS13, B3GALT5, SSBP2 and TKT loci are linked to alterations in gene expression levels (cis-eQTLs). Luciferase assays confirmed the allele-specific enhancer activity for the B3GALT5 and TKT loci.

Conclusions Our findings provide the first evidence of a specific genetic influence associated with prolonged gestation. This study forms a foundation for a better understanding of the genetic and long-term health risks faced by induced and post-term individuals. The long-term risks for induced individuals who have a previously overlooked post-term potential may be a major issue for current health providers.

  • gwas
  • post-term
  • prolonged gestation
  • adamts13
  • b3galt5

Introduction

Gestation is a crucial period of human development. Being born too early (preterm, <37 weeks gestation) or too late (post-term, ≥42 weeks gestation) can have significant acute and long-term health consequences.1 While preterm birth has received substantial attention,2–4 post-term birth has been scantily explored despite approximately 3%–5% of all births each year being post-term.5 Prolonged gestation poses a unique set of acute and long-term adverse health outcomes, including an increased need for intervention during labour and risk factors for truncal obesity, insulin resistance, altered lipids and elevated blood pressure.6 7 Thus, there is a vital need to understand the role of genetic variants on post-term birth.

The acute health risks, for both the mother and the child, of being born post-term are well documented (for review see reference 8). Consequently, induction of birth at or before 41 weeks gestation is recommended in order to reduce the acute risks associated with post-term birth.5 8 As a result, approximately 25% of the routine inductions in Australia in 2010 were primarily performed to prevent prolonged pregnancy. However, the rules regarding the decision to induce labour are not consistently applied across different hospitals, reflecting the influence of opinions of individual practitioners and differing staff routines.9 Despite this, induction remains an excellent intervention, and its application has reduced post-term births from approximately 20% of all births in the 1960s10 to the modern-day rate of under 5%.5 11 However, there is a possibility that the long-term risks associated with post-term birth are a part of the genetically informed trajectory for induced individuals. In this case, induction would not change this aspect of the biology of post-term individuals. Family and twin studies attribute 25%–40% of the variation in gestational age to genetic factors12–19 with fetal (26%) and maternal (21%) factors each explaining nearly half of this variation.18 Thus, there is a large population of individuals who have ‘post-term potential’20 and possibly face the long-term health risks of post-term birth without actually having been born post-term (due to pregnancy ending from obstetric management).

Evidence supports the hypothesis that it is the fetus that determines the timing of labour rather than the mother. Therefore, we have investigated the genetic architecture of gestational age in 9141 Northern Finnish (white European) individuals (1167 post-term) across two birth cohorts (Northern Finland Birth Cohort (NFBC) 1966 and NFBC1986). Here we identify intronic genetic variants within the TKT, ARHGAP42 and ADAMTS13 genes and intergenic variants upstream (5’) of the B3GALT5 and SSBP2 genes that are associated with prolonged gestation.

Materials and methods

Subjects

We undertook a discovery-replication study of two successive birth cohorts from Northern Finland (for cohort information including loss-to-follow-up, please see the cohort papers: NFBC1966 (reference 10) and NFBC1986 (reference 11)). Both cohorts were recruited from the two northernmost provinces of Finland (ie, Oulu and Lapland). Each cohort followed participants prospectively from approximately 12–16 weeks of gestation, providing one of the earliest-known cohorts with accurate gestational age determination.10 11

The NFBC1966 dataset consists of 12 231 children born to 12 068 mothers. This cohort represents 96% of all children born in Oulu and Lapland in 1966 with expected delivery dates between 1 January and 31 December 1966. Blood samples of the children in this cohort were collected for genotyping at age 31 (ie, in 1997) and genetic data were available for 5402 individuals. Genotyping was completed using Illumina HumanCNV370DUO Analysis BeadChip and the Beadstudio 3.1 algorithm.

The NFBC1986 dataset consists of a prospectively recruited cohort containing 9432 children born to 9362 mothers. This cohort represents 99% of all available births in Northern Finland between 1 July 1985 and 30 June 1986. Blood samples from the children of the NFBC1986 cohort were collected for genotyping at 16 years of age (ie, in 2002–2003). Genetic data are available for 3739 individuals in total (~500 were selected as representing individuals with Gestational Diabetes Mellitus (GDM), Gestational Hypertension (GHT) and preterm birth; the remaining represented a random sample of the cohort). Genotyping was completed using the OmniExpressExome chip and the Beadstudio 3.1 algorithm.

Defining the post-term dataset

The gestational ages of the individuals in the 1966 and 1986 NFBC cohorts were calculated at their first antenatal visit. For the NFBC1966 cohort, gestational age was calculated through last menstrual period. In the NFBC1986 cohort, gestational age was based on ultrasound at <20 weeks of gestation or on the last menstrual period, with discrepant cases reviewed in detail from medical records as previously described.21 The control cohort was restricted to those born at full-term, which was defined as between 38 0/7 and 40 0/7 weeks of gestation. Children born between 37 0/7 and 37 6/7 weeks (ie, Early term) or 41 0/7 and 41 6/7 weeks (ie, Late term) were excluded to reduce mischaracterisation due to errors in the calculated gestational age. The post-term case cohorts included those individuals born at ≥42 0/7 weeks of gestation. To further reduce the chances of obscuring the genetic potential of gestational age, we excluded individuals from our control cohort who were born early due to induction of labour or other factors: (1) those born from multiple births; (2) those whose mother had gestational diabetes (prediabetes in 1966 cohort) and (3) those whose birth was by planned caesarean section. Gestational diabetes and prediabetes were included as exclusion criteria because they are strong indicators for induced delivery, which could have biased the selection in the 1966 post-term cohort.22
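The case and control definitions above can be written as a short classification rule. The following Python sketch is illustrative only and is not the authors' code: the function names and category labels are hypothetical, gestational age is assumed to be stored in completed days, and the interval from 40 1/7 to 40 6/7 weeks, which the text does not assign to any group, is labelled as unspecified rather than guessed.

```python
DAYS_PER_WEEK = 7


def weeks_days(weeks: int, days: int = 0) -> int:
    """Convert a 'W D/7' gestational age into completed days."""
    return weeks * DAYS_PER_WEEK + days


def classify_gestation(ga_days: int) -> str:
    """Assign a birth to a study category from gestational age in days."""
    if ga_days < weeks_days(37):
        return "preterm"
    if ga_days <= weeks_days(37, 6):
        return "early term (excluded)"
    if ga_days <= weeks_days(40, 0):
        return "full-term control"
    if ga_days < weeks_days(41):
        # 40 1/7 to 40 6/7 weeks is not assigned in the text (assumption).
        return "unspecified"
    if ga_days <= weeks_days(41, 6):
        return "late term (excluded)"
    return "post-term case"


def is_eligible_control(ga_days: int, multiple_birth: bool,
                        maternal_diabetes: bool,
                        planned_caesarean: bool) -> bool:
    """Apply the three control exclusions listed in the text."""
    if classify_gestation(ga_days) != "full-term control":
        return False
    return not (multiple_birth or maternal_diabetes or planned_caesarean)
```

For example, `classify_gestation(weeks_days(42))` falls in the post-term case group, while a singleton born at 39 3/7 weeks without maternal diabetes or planned caesarean section would pass `is_eligible_control`.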

Quality control of genetic data

Genetic data were vetted for quality control. Genetic data for a subject were excluded if: (1) the call rate was <95% (99% if the minor allele frequency <5%); (2) the mean heterozygosity was <0.29; (3) there were multidimensional scaling outliers; (4) the concordance with other DNA samples in the cohort was ≥0.99 (risk of being a duplicated sample); (5) identity by state (IBS) pairwise comparisons were >0.99 with most other samples (suspicion of samples being contaminated); (6) IBS pairwise sharing was >0.20; (7) consent was not given; (8) comparison to medical records identified a gender-genotype mismatch; (9) there was an elevated heterozygosity rate (four or more SD from the mean) or (10) there was significant deviation from the Hardy-Weinberg Equilibrium (p<0.0001).
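As an illustration of criterion (10), the sketch below tests a biallelic variant for deviation from Hardy-Weinberg equilibrium using genotype counts. The text does not state which test was used, so the one-degree-of-freedom chi-square approximation here is an assumption (exact tests are also common), and the function names are hypothetical. Only the p<0.0001 threshold comes from the text.

```python
import math

HWE_P_THRESHOLD = 1e-4  # threshold stated in the text


def hwe_chi_square_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P value for deviation from Hardy-Weinberg proportions (chi-square, 1 df)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic variant: no deviation can be measured
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with one degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def fails_hwe(n_aa: int, n_ab: int, n_bb: int) -> bool:
    """True when the variant deviates significantly from equilibrium."""
    return hwe_chi_square_p(n_aa, n_ab, n_bb) < HWE_P_THRESHOLD
```

Genotype counts of 25, 50 and 25 match the expected proportions exactly and pass, whereas counts of 50, 0 and 50 (no heterozygotes) fail.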

After all QC measures and exclusions were applied, 5402 and 3739 individuals remained in the 1966 and 1986 studies, respectively. These included 1034 post-term individuals and 2375 term-born controls from the NFBC1966 cohort and 133 post-term individuals and 1250 term-born controls from the NFBC1986 cohort.

Imputation of genetic data

Impute V.2 was used to estimate the SNPs that were not sampled directly by the genotyping platform for the NFBC1966 samples. The imputation used HapMap 2 (Build 36) as the reference panel and proper_info>0.4 as the quality metric.23 24 Before imputation, there were 309 948 directly genotyped SNPs. After imputation, 3 855 963 SNPs, including those directly genotyped, were available from the 1966 genotypes for analysis.

Imputation of the missing 1986 genotypes was carried out in two steps: (1) a prephasing step that estimated haplotypes for all available samples using the SHAPEIT program with the 1000 genomes reference panel as the guide and (2) an imputation step (Impute V.2) that imputed the missing alleles directly onto the phased haplotypes.23 24 After imputation, 59 683 063 SNPs, including those directly genotyped, were available from the 1986 genotypes for analysis.

Statistical analysis

SNPTEST V.2 (reference 24) was used to perform all genetic analyses on the imputed genetic data for both cohorts. In the regression analysis, the main effects model tested the association between SNP markers and gestational age. SNP genotypes were coded as 0, 1 or 2 (according to the number of copies of the minor allele) and an additive model of genetic variance was assumed where the effect on the trait of the heterozygote was estimated to be midway between the levels of the two homozygotes. This model fits the best assumption for a post-term phenotype that is thought to have a small amount of genetic variance produced by multiple genetic variants in combination.
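The additive coding described above can be illustrated with a minimal Python sketch. It does not reproduce SNPTEST internals; the function names are hypothetical, genotypes are assumed to be two-character strings such as "AG", and the expected-dosage form for imputed genotypes is a common convention assumed here rather than taken from the text.

```python
from collections import Counter
from typing import Iterable, List


def minor_allele(genotypes: Iterable[str]) -> str:
    """Return the less frequent allele across all genotypes (ties broken alphabetically)."""
    counts = Counter(allele for g in genotypes for allele in g)
    return min(counts, key=lambda allele: (counts[allele], allele))


def additive_codes(genotypes: List[str]) -> List[int]:
    """Code each genotype as 0, 1 or 2 copies of the minor allele."""
    minor = minor_allele(genotypes)
    return [g.count(minor) for g in genotypes]


def expected_dosage(p_het: float, p_hom_minor: float) -> float:
    """Expected minor-allele count from imputed genotype probabilities."""
    return p_het + 2.0 * p_hom_minor
```

Under the additive model, a heterozygote (code 1) is assumed to sit midway between the two homozygotes (codes 0 and 2), so a single regression coefficient per SNP captures the per-allele effect.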

All genetic analyses also accounted for child’s sex, as this was the only trait that has consistently shown a large effect on post-term birth status in any previous post-term studies.15 25–27

Results were considered suggestive of statistical significance in the discovery phase in each cohort at p<1×10−5. In the validation phase, any finding (p<0.05) within the significant linkage disequilibrium (LD) block was considered validation of significance of the locus. These p values were selected because we planned functional analyses to confirm the significant variants.
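The two-stage decision rule can be summarised in a few lines of Python. This is a sketch of the thresholds stated above, not the authors' pipeline: the function name and status labels are hypothetical, and the p values for SNPs in the LD block of the opposite cohort are assumed to be supplied as a plain list.

```python
from typing import Sequence

DISCOVERY_P = 1e-5   # suggestive threshold in the discovery cohort
VALIDATION_P = 0.05  # threshold within the LD block in the other cohort


def classify_locus(discovery_p: float,
                   validation_block_p: Sequence[float]) -> str:
    """Label a locus from its discovery p value and LD-block validation p values."""
    if discovery_p >= DISCOVERY_P:
        return "not suggestive"
    if len(validation_block_p) == 0:
        # No SNP in the LD block was available in the other cohort.
        return "suggestive, untested"
    if min(validation_block_p) < VALIDATION_P:
        return "validated"
    return "suggestive, not validated"
```

A locus with a discovery p value of 3×10−6 and at least one LD-block SNP at p=0.01 in the other cohort would be labelled as validated; the same locus with no measured SNPs in the block would remain untested.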

Quantile–quantile plots were generated (qqman package in R), by plotting the expected distribution of p values versus the observed p values (assuming a uniform distribution), to test for possible sources of p value inflation.
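For readers unfamiliar with the construction, the following Python sketch computes the coordinates of a quantile–quantile plot and the genomic inflation factor (λ) from a list of p values. It is an illustrative equivalent and not the qqman code: the expected quantiles use (i−0.5)/n as an assumed plotting position, λ is taken as the median observed one-degree-of-freedom chi-square statistic divided by its expected median, and all p values are assumed to lie in (0, 1].

```python
import math
from statistics import NormalDist, median
from typing import List, Sequence, Tuple

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with one degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def qq_points(p_values: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (expected, observed) -log10(p) pairs, smallest p value first."""
    observed = sorted(p_values)
    n = len(observed)
    return [(-math.log10((i - 0.5) / n), -math.log10(p))
            for i, p in enumerate(observed, start=1)]


def genomic_inflation(p_values: Sequence[float]) -> float:
    """Estimate lambda as median observed chi-square over the expected median."""
    # A two-sided p value corresponds to the squared normal quantile at p/2.
    chi2 = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN
```

Values of λ close to 1 indicate that the bulk of the test statistics follow the null distribution; values well above 1 point to inflation, for example from population stratification.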

Spatial analysis of the validated SNPs for putative regulatory roles within the genome

Hi-C spatial genomic connectivity data were used to identify the genes to which the SNPs physically connect.28 29 GWAS3D30 was used, with default parameters, to identify physical connections (as captured by proximity ligation) that occurred with the most significant Genome-Wide Association Study (GWAS) SNPs.

Identification of gene expression alterations associated with the GWAS loci

eQTL analysis identifies SNPs that associate with altered expression level(s) of one or more genes.31 The Genotype-Tissue Expression (GTEx) project database V.6 eQTL data are powered for the global examination of larger eQTL effects.32 Therefore, we limited false positives in our trans-eQTL results by only testing eQTLs supported by SNP-gene spatial interactions. Significance levels for this analysis were based on evidence from prior literature:29 32 33 cis-eQTL (genes<1 Mb distance from SNP, p<1×10−4), trans-eQTL (longer distance or interchromosomal, p<1×10−3). Thus, the identification of a SNP-gene spatial interaction that is reinforced by SNP-gene eQTL has two independent sources of evidence verifying the long-distance transcription regulatory functions.
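The cis and trans significance rules above can be expressed as a small Python sketch. The function names are hypothetical, and the distance is measured from the SNP to a single reference coordinate for the gene (for example, the transcription start site), which is an assumption because the text does not say how distance was measured.

```python
CIS_WINDOW_BP = 1_000_000  # genes <1 Mb from the SNP are treated as cis
CIS_P = 1e-4
TRANS_P = 1e-3


def eqtl_class(snp_chrom: str, snp_pos: int,
               gene_chrom: str, gene_pos: int) -> str:
    """Classify a SNP-gene pair as 'cis' or 'trans'."""
    same_chrom = snp_chrom == gene_chrom
    if same_chrom and abs(snp_pos - gene_pos) < CIS_WINDOW_BP:
        return "cis"
    return "trans"  # longer distance or interchromosomal


def is_significant_eqtl(snp_chrom: str, snp_pos: int,
                        gene_chrom: str, gene_pos: int,
                        p_value: float) -> bool:
    """Apply the class-specific p value threshold to a SNP-gene pair."""
    kind = eqtl_class(snp_chrom, snp_pos, gene_chrom, gene_pos)
    threshold = CIS_P if kind == "cis" else TRANS_P
    return p_value < threshold
```

In the study design described above, trans pairs were only tested when a SNP-gene spatial interaction supported them, so a filter of this kind would be applied after the spatial screen rather than genome-wide.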

Luciferase assays

Enhancer activity of the post-term SNPs in proximity to loci (B3GALT5, ARHGAP42, ADAMTS13, SSBP2 and TKT) was measured by luciferase assay. Briefly, the regions spanning each SNP were PCR amplified from genomic DNA obtained from 1000 Genomes samples, cloned into the Gateway-adapted pGL4.23-GW (Addgene Plasmid #60323)34 and sequenced to confirm the genotype. For rs11170213, no sample genotype information could be obtained from the 1000 Genomes, so the region spanning the allele ‘A’ of the SNP was amplified from MCF-7 genomic DNA. The allele ‘C’ version of the SNP was generated by site-directed mutagenesis (SDM) of the cloned pGL4.23 plasmid using the QuikChange mutagenesis protocol (Agilent Technologies). Primers for PCR amplification and SDM are listed in online supplementary table 1. The ADAMTS13 locus could not be amplified by PCR. HeLa cells were seeded at 5×10³ cells per well in a 96-well plate, grown in DMEM media supplemented with 10% fetal bovine serum (Thermo Scientific, #11995-065) 1 day prior to transfection. Cells were cotransfected with the cloned pGL4.23 and Renilla plasmids using Lipofectamine 3000 (Thermo Scientific, #L3000008) and the Promega Dual Glo Luciferase Assay System (#PME2920) was used to measure luciferase activity after 48 hours. Luminescence was normalised to Renilla and expressed relative to the normalised luminescence of empty pGL4.23. Results are from four independent biological replicates.
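The normalisation described above amounts to a ratio of ratios. The Python sketch below shows one way to compute it, together with the mean and SEM across biological replicates; the function names and the numbers in the usage note are hypothetical, and no measured values from the study are reproduced.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def relative_activity(firefly: float, renilla: float,
                      empty_firefly: float, empty_renilla: float) -> float:
    """Firefly/Renilla ratio of a construct relative to the empty vector."""
    construct_ratio = firefly / renilla
    empty_ratio = empty_firefly / empty_renilla
    return construct_ratio / empty_ratio


def mean_and_sem(replicates: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean across biological replicates."""
    n = len(replicates)
    if n < 2:
        raise ValueError("at least two replicates are needed for an SEM")
    return mean(replicates), stdev(replicates) / math.sqrt(n)
```

With illustrative readings of 200 (firefly) and 50 (Renilla) for a construct and 100 and 50 for the empty vector, the relative activity is 2.0, meaning a twofold increase over basal activity. Values below 1 would indicate repression relative to the empty vector.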

Results

Discovery phase

NFBC1966 variants associated with post-term birth

Analysis of the NFBC1966 post-term cohort identified six GWAS peaks that were suggestive of global significance (p<1×10−5, figures 1A and 2A). Two clusters of variants were located within introns in the B3GALT5 (lead SNP rs1534080) and DNHD1 (rs12285957) genes. An additional four clusters of variants were located within intergenic regions on chromosomes 10 (two regions), 12 and 15 (see online supplementary table 2).

Figure 1

Manhattan plots of the discovery phase of the post-term GWAS for the (A) NFBC1966 and (B) NFBC1986 cohorts. The –log10 observed p values (2-tailed) for the GWA (y-axis) are plotted versus the chromosomal position of each SNP (x-axis). The blue line indicates significance for follow-up (p<1×10−5) through cross-validation, while the red line indicates global significance (5×10–8). Only SNPs in the ADAMTS13 locus in the NFBC1986 cohort reach genome-wide significance in the discovery phase. NFBC, Northern Finland Birth Cohort.

Figure 2

Q–Q plots of the quantiles of expected versus observed –log10(p value) of the association with gestational age in the (A) 1966 and (B) 1986 cohort. The negative logarithm of the expected (x-axis) and the observed (y-axis) p values for the GWA analysis is plotted for each SNP (black dots). Deviation from the red line indicates points whose observed values are deviating from the null hypothesis of no true association. Inflation factors (λ) near 1 suggest that population stratification was adequately controlled.  NFBC, Northern Finland Birth Cohort.

NFBC1986 variants associated with post-term birth

Analysis of the NFBC1986 post-term cohort identified 25 GWAS peaks that were suggestive of global significance (p<1×10−5, see online supplementary table 2, figures 1B and 2B). The lead SNPs for 14 of these GWAS peaks were intronic: AC079779.5 (rs72774524), ADAMTS13 (rs655911), ANO4 (rs11609845), ARHGAP42 (rs78598508), ASAH2 (rs75320537), C14orf37/PSMA-AS1 (rs78874632), CTD-2277K2.1 (rs191706929), DCDC2C (rs12612077), DTWD2 (rs17440178), ESR1 (rs117533178), FAT3 (rs7950344), KCNB2 (rs79648768), RIN3 (rs6575274) and TKT (rs4687715). Only the SNP at the ADAMTS13 locus was globally significant (p<5×10−8).

Eleven intergenic loci were also associated with gestational age in the NFBC1986 cohort. These intergenic loci were located ≤116 kbp from a coding exon (gene): AL671972.1 (rs10995050, 7.2 kb downstream), GRIK2 (rs183770336, 724 kb downstream), HMX1 (rs145023824, 75 kb upstream), KCNA5 (rs2239507, 2 kb upstream), LRPPRC (rs62135521, 73 kb upstream), RP11-289F5.1 (rs10780480, 116 kb upstream), RP11-465K16.1 (rs7013779, 40 kb upstream), RP11-644L4.1 (rs72965926, 5.3 kb downstream), SSBP2 (rs2135, 31 kb upstream), ZFR (rs66858738, 7.7 kb upstream) and 7SK (rs11610162, 11 kb upstream).

Of the list of genes located close to the loci that were significantly associated with post-term birth in the 1986 cohort, only ARHGAP42 has previously been associated with a developmental phenotype (age at menarche in a Japanese population, rs12800752).35

Validation of results of the discovery phase

The 6 NFBC1966 loci and 25 NFBC1986 loci were tested for cross-validation (ie, significance) in the opposite cohort. Of these loci, none were associated with gestational age at a p value of ≤1×10−5 in both cohorts. However, the B3GALT5, SSBP2 and TKT GWAS loci (hereafter referred to as post-term loci) were validated as significant in both the 1966 and 1986 cohorts (ie, discovery p<1×10−5 and validation p<0.05, see online supplementary tables 2A and 2B). ADAMTS13 reached global significance (p<5×10−8)36 in the NFBC1986 cohort and was included in the post-term loci for further analyses (see online supplementary table 2B).

The rs78598508 variant, which is located within ARHGAP42 and associated with gestational age in the 1986 cohort, was not measured in the 1966 cohort. Furthermore, rs78598508 was not in strong LD (>0.9 r2) with any other variants. Therefore, rs78598508 could not be tested for cross-validation in this study (see online supplementary table 2). This is a limitation due to the historical use of different platforms for the SNP detection. However, given that ARHGAP42 has previously been associated with a developmental phenotype (age at menarche in a Japanese population, rs12800752),35 it was included in further analyses in this study.

Identification of spatial and functional connections to post-term birth

Spatial associations

We screened Haploreg V.4.1 (1000 genomes haplotype data) to show that none of the lead SNPs in the ADAMTS13, ARHGAP42, B3GALT5, SSBP2 and TKT loci are in LD (r2>0.95 and D’>0.95) with any variants located within exons or critical transcriptional processing sequences (eg, intronic branch sites, polyA signals or transcription termination signals). Thus, there is no evidence that these SNPs directly impact on protein function through aberrant transcript processing. As such, we hypothesised that the post-term SNPs were affecting enhancer regions (ie, short genomic regions that are bound by transcription factors) and altering the transcriptional regulation of distant genes.

Interactions between the lead SNPs at each post-term locus and distant genes were screened for using GWAS3D (figure 3 and see online supplementary table 3). The ADAMTS13 locus spatially connects to the SLC2A6 (9q34), COL5A1 (9q34.2-q34.3) and RABGAP1L (1q24) loci. This is notable as ADAMTS13, SLC2A6, COL5A1, and additional genes within 9q34 have been implicated in the coagulation process and associated with ovarian function.37 The ARHGAP42 locus spatially connects to an intergenic region downstream of FAM133A. The B3GALT5 locus spatially connects to the Down Syndrome Cell Adhesion Molecule (DSCAM) locus. DSCAM is a member of the immunoglobulin superfamily of cell adhesion molecules that are involved in human central and peripheral nervous system development. The SSBP2 locus shows no significant spatial connections in the Hi-C data in GWAS3D. The TKT locus spatially connects to an intergenic region in 21p11.2 which contains predicted open reading frames that encode undefined proteins.

Figure 3

Spatial results from GWAS3D identify four significant spatial connections between loci in the validated GWAS data and distant genomic regions. The SNP associated with the ADAMTS13 locus had multiple spatial connections, SSBP2 had none, while the others only exhibited a single spatial association. Only the spatial connections with high confidence scores are plotted here (thickness of the red line).

Locus-specific transcriptional (eQTL) associations with gene expression

Functional-regulatory roles for the SNPs we identified in this study were refined by testing the spatial SNP-gene pairs for significant eQTLs using the GTEx database V.6. Variants in the ADAMTS13 (tibial nerve tissue, p=1.50×10−8), B3GALT5 (thyroid tissue, p=9.0×10−5) and TKT (left ventricular heart tissue, p=2.0×10−5) loci associate with altered expression changes within these genes, confirming that these SNPs fall within loci that regulate their local gene landscape (see online supplementary table 4). In addition, rs655911 (intronic, ADAMTS13) also showed eQTL associations with the expression of SLC2A6, reinforcing the significance of the spatial connection. The variant associated with the ARHGAP42 locus (rs78598508) showed no significant eQTLs.

The SNP rs2135, which was 44 kb upstream of SSBP2, did not show any evidence of spatial connections to other regions. Moreover, there was no evidence of a cis-eQTL between this SNP and the SSBP2 gene itself. A global survey of eQTL associations with the SSBP2 SNPs, within GTEx, did not identify any globally significant eQTLs (see online supplementary figure 1). Analyses indicated putative eQTLs between the SSBP2 lead SNP (rs2135; 5q14.1) and HBG1 (11p15.5, Lung, 1.10×10–5); HLA-DRB5 (6p21.3, Whole Blood, 2.50×10–5) and FYB (5p13.1, Mucosa of the Oesophagus, 4.00×10–5) (see online supplementary figure 1).

Locus-specific enhancer (Luciferase) associations with gene expression

The lead SNPs that were proximal to ARHGAP42 (rs78598508), B3GALT5 (rs111702173, rs560928), SSBP2 (rs2135) and TKT (rs4687715) were screened for enhancer activity (figure 4). ADAMTS13 (rs655911) was unable to be cloned and could not be tested. Cloning loci with alternate alleles allowed the measurement of the effect of genetic variation on the observed enhancer activity. Luciferase assays in HeLa cells revealed a pronounced enhancer effect for the loci containing ARHGAP42 (rs78598508), B3GALT5 (rs111702173, rs560928) and TKT (rs4687715) and a repressive effect for SSBP2 (rs2135) (figure 4). For rs4687715 and rs560928, the region shows a differential enhancer allelic effect. Therefore, for two of the five loci tested, the enhancer activity associated with these gestational age associated regions is sensitive to the identity of the haplotype at the SNP position. For the remaining three regions, the SNP tested did not have a measurable allelic effect in HeLa cells, but still showed significant enhancer/insulator capabilities.

Figure 4

Post-term associated SNPs show allele specific enhancer and repressor effects. All amplified regions, except that containing rs2135, acted as enhancers. There were significant differences (p<0.0001) between the enhancer activity of the ‘A’ and ‘G’ versions of rs4687715. Similarly, there were significant (p<0.05) differences between the enhancer activity of the ‘C’ and ‘T’ alleles of rs560928 in HeLa cells. Notably, DNA amplicons containing the A and G alleles of rs2135 acted as a repressor of basal activity. PCR-amplified genomic DNA with the indicated SNP variants were assayed for their ability to drive luciferase expression in HeLa cells. The increase in luminescence indicates that competence for transcription depends on the allelic version of these SNPs. Error bars represent ±SEM from four biological replicates and significance was determined by one-way ANOVA. Asterisks above the bars denote significant differences compared with the empty pGL4.23 vector control. Allele specific differences that are significant are indicated. ****p<0.0001, ***p<0.001, **p<0.01, *p<0.05. ANOVA, analysis of variance.

Discussion

Previously, there has been indirect evidence of a genetic component to post-term birth, as children born post-term are more likely to have a sibling or mother born post-term.6 This study is the first to identify specific genetic variants in proximity to the ADAMTS13, B3GALT5, SSBP2 and TKT genes as being associated with prolonged gestation. Data on the spatial connections with these loci, eQTLs and enhancer activity are consistent with these post-term variants acting as functional determinants of gestational length. Thus, the SNPs associated with the ADAMTS13, B3GALT5, SSBP2 and TKT loci may alter the expression of these genes, contributing to the post-term phenotype by affecting regulation of processes involved in human development such as growth and metabolism, and, more specifically, haematopoiesis.

Post-term associated SSBP2 and TKT are linked to alterations in cellular growth, proliferation and metabolism

Alterations in cellular growth, proliferation and metabolism preprogram biological development (eg, pentose phosphate pathway (PPP)) resulting in an amplified risk of chronic non-communicable disease (ie, diseases of long duration and slow progression including cardiovascular disease, diabetes and obesity).38 Proteins encoded by the SSBP2 and TKT genes are involved in cellular growth, proliferation and metabolism and are thus capable of altering developmental trajectories. For example, TKT is involved in carbohydrate metabolism39 and could contribute to the later-in-life increased adiposity and risk of metabolic syndrome in children and adults born post-term.6 7

Post-term associated TKT is linked to alterations in cell cycle and growth

TKT dysregulation has an important role in cellular growth rates, oocyte cell cycle progression and maturation.39 The TKT gene encodes a protein that contributes to the main carbohydrate metabolic pathways by connecting the PPP to glycolysis. This process results in NADPH synthesis. NADPH is part of the control for reactive oxygen species, which were found to be imbalanced in post-term births.40 Underexpression of TKT in maternal mice contributes to pregnancy resulting in fewer progeny, retarded postnatal growth and reduced levels of adipose tissue in offspring.41 This phenotype has similarities to post-term infants, who are typically born lean.14 Collectively, the effects of aberrant TKT expression are consistent with TKT variation in humans contributing to aberrant gestational timing. The links between variation in TKT function and metabolism may help to partially explain the observed links between post-term birth and the later development of symptoms of the metabolic syndrome.6

Post-term associated ADAMTS13 and SSBP2 are linked with haematopoiesis and blood disorders

Alterations in ADAMTS13 and SSBP2 levels could be affecting gestation through alterations in hematopoietic pathways. There is a large developmental aspect to haematopoiesis during different phases of gestation. Maturity of the hematopoietic system occurs late in gestation and tracks with gestational age, with the proportions of fetal haemoglobin decreasing during the progression from preterm to term to post-term (83.93% to 68.59% to 60.03%, respectively).42 In post-term births, cord blood collected at birth shows differences in levels of polycythaemia (increased concentration of haemoglobin in the blood), erythropoietin levels (increased erythropoiesis), mean corpuscular haemoglobin, red blood cell count, neutrophil count and monocyte count.43

The ADAMTS13 gene encodes a protease that has previously been shown to disrupt the regulation of platelet thrombosis by cleaving Von Willebrand factor. Critically, ADAMTS13 variants are also known to cause neonatal platelet disorders. These disorders include Upshaw-Schulman syndrome (haemolytic anaemia) and blood hypercoagulation (thrombophilia), the second of which is associated with fetal loss.37 Additionally, the identification of a significant eQTL between a post-term associated B3GALT5 SNP and B3GALTL expression reinforces the significance of the pathways altered by the ADAMTS13 variants (see online supplementary table 4).35 Specifically, B3GALTL encodes a β-1,3 glucosyltransferase that is required to glycosylate the ADAMTS13 gene product as part of its preprocessing for Von Willebrand factor cleavage.44

Our eQTL analysis showed that the variants in SSBP2 were likely eQTLs with HBG1 and FYB (see online supplementary figure 1). HBG1 is a key component of the fetal haemoglobin locus that is normally expressed in the fetal liver, spleen and bone marrow, as a part of the constitution of fetal haemoglobin (HbF).45 In adults, the beta-globin locus is only accessible (chromatin open, DNase I hypersensitive) in adult erythroid cells; however, HBG1 shows chromatin accessibility in both fetal and adult erythroid cells.45 The FYB gene encodes a hematopoietic-specific protein involved in platelet activation.46 In mice, FYB knockout affects platelet function and causes mild thrombocytopenia.47 Therefore, our results are consistent with rs2135 being involved in dysregulation of genes that contribute to the production and development of fetal blood.

Therefore, the role of SNPs in both the ADAMTS13 and SSBP2 loci could result in a post-term birth phenotype arising through alterations in haematopoiesis. In future work, identifying the regulatory role of these regions could help elucidate the cause-and-effect relationships between alterations of haematopoiesis and gestational length.

Post-term birth versus the rise of more intensive obstetric management of birth

Analysing two cohorts from the same geographical region (Northern Finland) enabled us to control for regional and culture-specific differences in routine management of pregnancies. However, two major confounders remain between the 1966 and 1986 cohorts: (1) technological changes led to better estimation and certainty of gestational age in the 1986 cohort and (2) management practices led to changes in the incidence of induced labour and post-term birth. First, the uncertainty surrounding gestational age prediction in 1966 raises issues around phenotype definition in this cohort. The impact of this ambiguity on our study was minimised through the use of a narrow term gestational age range of 38 0/7 to 40 0/7.

There have been significant changes to obstetric management over the last 50 years, with a shift from conservative monitoring of prolonged pregnancies through to the current recommendations to induce women who are beyond 41 weeks gestation.5 8 48 The induction of labour became a therapeutic option between 1966 and 1986. Therefore, the NFBC1986 cohort contains births that would have been post-term if not for induction and/or Caesarean-section. Consistent with this, we observed a reduction in numbers of post-term births from approximately 20% in the 1966 NFBC cohort to less than 5% in the 1986 NFBC cohort. However, while the induction of labour is an improvement in obstetric management, it is possible that these individuals have ‘post-term potential’ and carry genetic risks that were not mitigated by the act of induction. Therefore, we excluded all term-born induced births from the 1986 cohort from the analyses.

Conclusion

We have identified genetic variants in proximity to the ADAMTS13, B3GALT5, SSBP2 and TKT loci as being associated with post-term birth in two birth cohorts (NFBC1966 and NFBC1986). This finding is consistent with previous observations that suggested there was a genetic component to post-term birth.12 13 15–20 26 Spatial and mRNA expression analyses further provided novel clues about how these loci contribute to the regulation and consequences of post-term birth. This study forms a foundation for a better understanding of the genetic and long-term metabolic health risks faced by induced and post-term individuals. Since nearly 20% of births in the NFBC1966 cohort were post-term, the long-term risks for induced individuals who have a previously overlooked post-term potential may be a major issue for current health providers.

Acknowledgments

The authors would like to thank and acknowledge the contributions of Tuula Ylitalo, Nikman Adli Nor Hashim, Alexessander Da Silva Couto Alves, Andrianos M Yiorkas and Minna Männikkö, each of whom played a significant role in data generation and collaboration at the NFBC.

References

Footnotes

  • JMO’S and WSC contributed equally.

  • Contributors WS, JMO and WSC contributed to the first draft of the manuscript. All authors critically reviewed the manuscript. JA and JAH performed the molecular analyses. WS contributed to bioinformatic analyses. VK, MV, SF, PE, EK, SS, AB and M-RJ generated and reviewed clinical data. WS, JMO and WSC contributed to interpretation of molecular and bioinformatic data. WSC, M-RJ and JMO conceptualised the study.

  • Competing interests None declared.

  • Ethics approval Signed, informed consent and written permission to use their data for scientific research was obtained from the NFBC 1966 study participants at age 31. For the 1986 cohort, adolescents and parents received written and oral information and gave their written informed consent. The studies were approved by the ethics committees of each of the participating medical university study sites in Finland. The research protocols for both the 1966 and 1986 studies were approved by the Ethics Committee of Northern Ostrobothnia Hospital District, Finland.

  • Provenance and peer review Not commissioned; externally peer reviewed.