Introduction

Imprinted genes are defined by the functional nonequivalence of the maternal and paternal copies, which results in monoallelic expression in a parent-of-origin-dependent manner.1, 2, 3 Imprinting is known to be affected in numerous human disorders, ranging from cancer to psychiatric diseases such as autism and schizophrenia.1, 2, 3 Imprinted elements have been identified in large chromosomal segments by parametric methods,4, 5, 6 but none of these novel imprinted elements has been mapped to a region as narrowly restricted as those of the well-characterized genes responsible for Beckwith–Wiedemann syndrome (11p15.5) or Prader–Willi and Angelman syndromes (15q11–q13).1, 2, 3

Cystic fibrosis (CF) is a generalized exocrinopathy caused by two defective copies of the cystic fibrosis transmembrane conductance regulator (CFTR) gene; about 70% of CF-causing chromosomes in Caucasians carry the deletion F508del-CFTR.7 The large clinical variability among F508del-CFTR homozygous CF patients7, 8 points to a major role of environment and genetic background in shaping the disease phenotype.8 However, most CF modifier studies examine a few single-nucleotide polymorphisms, or even a single one, within one candidate gene,8 rather than systematically mapping a region to dissect neighboring, equally plausible candidate genes, an approach that allows the identification of novel CF modulators.9

Previously, we detected a paternally imprinted CF modulator within a 40-cM chromosomal segment 3′ to CFTR.10 In this study, we used a highly refined, customized microsatellite map to confine the paternally imprinted CF modulator to a 2.9-Mb locus on 7q34 in our study cohort, consisting of 29 clinically concordant and 19 clinically discordant F508del-CFTR homozygous sibling pairs and their parents. These sib pairs were selected for their extreme clinical phenotype and their CFTR mutation genotype from a set of 318 affected patient pairs,9, 10 and accordingly represent the 15% most informative phenotypes of the European CF patient-pair population. CF nuclear families were genotyped at 11 microsatellite markers spanning 7q31 to 7qtel. These markers were chosen for their informativeness and typability from 21 candidate marker sequences, which had been selected from the genome database or developed de novo from raw genomic sequence.

Patients and methods

Informative F508del-CFTR homozygous patient pairs

As described in detail previously,11 we recruited CF twin and sibling pairs and their parents from 158 CF clinics in central Europe. Briefly, we selected dizygous F508del-CFTR homozygous CF siblings with extreme clinical phenotypes, as judged by a ranking algorithm based on the two clinical parameters most sensitive for the course and prognosis of CF, that is, weight as percentage of predicted weight for height, and CF population centiles for the forced expiratory volume in 1 s, expressed as percentage of predicted values.11 We classified patient pairs as concordant mildly affected (CON+), concordant severely affected (CON−), concordant with an average disease severity (ND), or discordant, that is, composed of one mildly and one severely affected sibling (DIS). For the analysis of recombination break points on paternal chromosomes, we extended the cohort of 37 CON+, CON− and DIS families that had been used to map CF modulators on 12p13 and 16p12.9 For the analysis of parent-of-origin-specific decay of genomic sharing presented in this paper, all concordant phenotypes – CON+, CON− and ND – were pooled (CONC) and compared with the discordant patient pairs (DIS) of the European CF sibling population. As our analysis required knowledge of the parental origin of the chromosomes, only families in which at least one parent could be recruited, or in which the siblings shared both their chromosomes along 7q, could be enrolled. In total, we evaluated the genotypes of 48 families with the following clinical phenotypes: 11 CON−, 13 CON+, 5 ND and 19 DIS.
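As a quick consistency check on the cohort composition, the sketch below pools the four phenotype classes into CONC and DIS and relates the total to the 318 screened patient pairs mentioned in the Introduction. The family counts come from the text; the function and variable names are illustrative only.

```python
from collections import Counter

# Family counts per clinical phenotype, as reported in the text.
families = Counter({"CON-": 11, "CON+": 13, "ND": 5, "DIS": 19})

# All concordant classes are pooled into CONC; discordant pairs stay DIS.
POOLED = {"CON-": "CONC", "CON+": "CONC", "ND": "CONC", "DIS": "DIS"}


def pool(counts):
    """Collapse the four phenotype classes into CONC versus DIS."""
    pooled = Counter()
    for phenotype, n in counts.items():
        pooled[POOLED[phenotype]] += n
    return pooled


pooled = pool(families)
total = sum(pooled.values())
# Fraction of the 318 screened affected patient pairs retained for this study.
fraction_of_screened = total / 318
print(pooled["CONC"], pooled["DIS"], total, round(fraction_of_screened, 3))
```

The pooled counts reproduce the 29 concordant and 19 discordant pairs, and 48 of 318 pairs is roughly the 15% quoted for the most informative phenotypes.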

Genotyping

All microsatellite markers were typed using PCR amplification with one biotinylated primer and high-resolution direct blotting electrophoresis with subsequent chemiluminescence detection, as described elsewhere (Supplementary Table 1).9

Data evaluation

We compared the observed numbers of recombined and unrecombined chromosomes between concordant and discordant F508del-CFTR homozygous CF sib pairs. Statistical evaluation was performed using the program CLUMP by Sham and Curtis,12 which relies on hypothesis-free permutation analysis by Monte Carlo simulation. Results obtained with CLUMP were Bonferroni-corrected for multiple testing, taking into account the seven independent genomic fragments observed on discordant paternal chromosomes (Figure 1). All further case–control and family-based analyses were performed with the FAMHAP software package,13 which supports family-based analysis as well as association studies on unrelated individuals and on affected sib pairs.13 Nuclear families were analyzed using the Monte Carlo simulation-based association test,14 which can be viewed as an extension of the transmission–disequilibrium test15 both to nuclear families with more than one affected child and to multimarker haplotypes.
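The permutation principle behind the contingency-table comparison can be sketched as follows. This is not the CLUMP code: it computes an ordinary Pearson chi-square for an r × c table (which we assume corresponds to CLUMP's T1 statistic) and estimates its significance by shuffling column labels, which keeps both row and column totals fixed. The example counts (rows concordant/discordant, columns nonrecombined/recombined paternal chromosomes) are hypothetical.

```python
import random


def chi_square(table):
    """Pearson chi-square statistic for an r x c table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            if exp > 0:
                stat += (obs - exp) ** 2 / exp
    return stat


def permutation_p(table, n_perm=10000, seed=1):
    """Monte Carlo P-value: shuffle column labels, keeping both margins fixed."""
    rng = random.Random(seed)
    rows, cols = [], []
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            rows += [i] * count
            cols += [j] * count
    observed = chi_square(table)
    n_rows, n_cols = len(table), len(table[0])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(cols)
        perm = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(rows, cols):
            perm[i][j] += 1
        # Small tolerance so that ties with the observed statistic count as hits.
        if chi_square(perm) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the Monte Carlo estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical counts: [[conc nonrecomb, conc recomb], [disc nonrecomb, disc recomb]]
p = permutation_p([[22, 7], [6, 13]], n_perm=2000)
print(round(p, 4))
```

Permuting labels rather than sampling tables from a parametric distribution avoids relying on the asymptotic chi-square approximation, which is fragile with the small cell counts typical of 48 families.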

Figure 1

Analysis of parent-of-origin-specific decay of sharing and allelic association at CFTR to 7qtel among concordant and discordant sibs. Recombination at markers between CFTR and 7qtel (a and b), as well as allelic association to disease severity (c) and intra-pair discordance (d), is shown. The physical map is displayed below panel (d); the 4.43-Mb candidate region, flanked by markers no. 6 and no. 9, and the 2.87-Mb core candidate region, flanked by markers no. 7 and no. 9, are shown on it as gray and black thick lines. Markers no. 6 to no. 9 are indicated by vertical lines throughout this figure. (a) Comparison of distributions of recombined and nonrecombined paternal CF chromosomes between concordant and discordant CF patient pairs (uncorrected single-locus P-values). Bonferroni's correction for multiple testing of the observed independent genomic fragments yields Pcorr=0.047 for marker no. 7 and Pcorr=0.061 for marker no. 9. (b) Decay of sharing among concordant and discordant F508del-CFTR homozygous sib pairs, visualized as the proportion of nonrecombined paternal chromosomes among concordant and discordant sib pairs. The candidate region and the core candidate region are defined on the basis of this analysis (see text for details). (c) Case–control (CC) comparison of allele distributions between concordant mildly affected sib pairs (CON+) and concordant severely affected sib pairs (CON−). The minimal CC P-value for CON+ versus CON− (P_CC,CON+/CON−) is observed at marker no. 7 (D7Sat3), located centrally in the candidate region (P=0.0005). (d) Case–control (CC) comparison of allele distributions between concordant (CONC) and discordant (DIS) sib pairs.

Identification of DNA motifs

Motifs associated with imprinted genes16 or CTCF-binding sites17 were identified on NT_079596 with the aligner MUMmer.18 Alignments to the shorter consensus sequences of 19–26 bp16 were rejected if they covered less than 80% of the consensus motif, and alignments to the longer consensus sequences of 35–48 bp16 were rejected if they covered less than 65%.16 Alignments to the 14-bp degenerate CTCF consensus sequence17 were accepted only if they covered 11 consecutive bases without gaps or mismatches.
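The length-dependent coverage filter described above can be written as a small rule. In this sketch, the `Alignment` record and the motif identifiers are hypothetical stand-ins for parsed MUMmer output; only the 80% and 65% thresholds and the motif length ranges come from the text.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    motif_id: str       # hypothetical identifier of a consensus motif
    motif_length: int   # length of the consensus motif in bp
    aligned_bases: int  # number of motif bases covered by the alignment


def min_coverage(motif_length):
    """Coverage threshold: 80% for 19-26 bp motifs, 65% for 35-48 bp motifs."""
    if 19 <= motif_length <= 26:
        return 0.80
    if 35 <= motif_length <= 48:
        return 0.65
    raise ValueError(f"no threshold defined for a {motif_length}-bp motif")


def accept(aln):
    """Keep an alignment unless it covers less than the required fraction."""
    return aln.aligned_bases / aln.motif_length >= min_coverage(aln.motif_length)


hits = [
    Alignment("up1", 20, 16),   # 80% of a short motif: kept
    Alignment("up2", 20, 15),   # 75% of a short motif: rejected
    Alignment("int1", 40, 26),  # 65% of a long motif: kept
    Alignment("int2", 40, 25),  # 62.5% of a long motif: rejected
]
accepted = [a.motif_id for a in hits if accept(a)]
print(accepted)
```

Applying a lower threshold to the longer motifs lets partially conserved long elements pass while still demanding a near-complete match for short motifs, which would otherwise occur by chance more often.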

Bioinformatics: repeats in CpG islands

Global GC content of the entire contig was calculated using the program OligoWords19 with a window size of 300 bp and a step size of 150 bp. Repetitive elements were surveyed using eTandem20 on the Galaxy server21 with default settings, except for a maximum repeat size of 50 bp. CpG islands were localized using CpGPlot20 with a minimum island size of 300 bp and otherwise default settings. The two data sets were combined using the Galaxy intersect program. Graphs were plotted using the statistical language R.22
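The sliding-window GC calculation (300-bp windows, 150-bp step) can be sketched as below. This is an illustration of the windowing idea, not OligoWords itself; the helper `gc_rich_windows` is a hypothetical simplification that flags windows above 50% GC, whereas CpGPlot additionally applies an observed/expected CpG criterion that this sketch omits.

```python
def gc_windows(seq, window=300, step=150):
    """Return (start, GC fraction) for overlapping windows along seq."""
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        result.append((start, (chunk.count("G") + chunk.count("C")) / window))
    return result


def gc_rich_windows(windows, threshold=0.5):
    """Start positions of windows whose GC fraction exceeds the threshold."""
    return [start for start, gc in windows if gc > threshold]


# Toy 600-bp sequence: 300 bp of pure GC followed by 300 bp of pure AT.
demo = gc_windows("GC" * 150 + "AT" * 150)
print(demo, gc_rich_windows(demo))
```

With a step of half the window size, each base contributes to two windows, which smooths the profile without losing resolution at island boundaries.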

WWW resources

The URLs for internet resources and databases presented herein are as follows: Genomic sequences were retrieved from NCBI database resources (http://www.ncbi.nlm.nih.gov/sites/entrez?db=nuccore&itool=toolbar).23 microRNA sequences and targets were predicted using miRBase at http://microrna.sanger.ac.uk/sequences/.24, 25 CpG islands and repetitive elements were located using CpGPlot and eTandem from the EMBOSS package,20 available on the Galaxy server at http://main.g2.bx.psu.edu/.21 The Imprinting Gene Catalogue was accessed at http://igc.otago.ac.nz/home.html.26

Results

Monitoring parent-of-origin-specific decay of genomic sharing on 7q31-7qtel

We genotyped 29 clinically concordant and 19 clinically discordant F508del-CFTR homozygous sibling pairs9, 10 at 11 loci spanning a 38-Mb genomic region from the CFTR gene on 7q31 to 7qtel. As CF is an autosomal recessive trait, all CF siblings necessarily share both alleles at the CFTR locus. Starting with IVS17bTA, a highly informative microsatellite marker within the CFTR gene, maternal and paternal chromosomes of both sibs were reconstructed along 7q. We incorporated four previously typed markers10 into an integrated 15-marker map (Supplementary Table 1) and used the information at neighboring markers to resolve loci with uninformative phase. The resulting microsatellite data set, consisting of approximately 2100 individual genotypes, was used to compare the parent-of-origin-specific decay of intra-pair genomic sharing along 7q between concordant and discordant sib pairs (Figure 1a and b).

The frequencies of recombined and nonrecombined maternal chromosomes were similar in concordant and discordant pairs (P>0.26 at all loci, data not shown). In contrast, the distribution of nonrecombined and recombined paternal chromosomes differed significantly between concordant and discordant sib pairs at markers no. 7 (P=0.007), no. 8 (P=0.025), no. 9 (P=0.009) and no. 10 (P=0.018), with a trend at marker no. 6 (P=0.057) (Figure 1a). After Bonferroni's correction for multiple testing of the observed independent genomic fragments, significance was retained at marker no. 7 (Pcorr=0.047). Recombination events on discordant paternal chromosomes were detected 10 Mb more proximally (toward the centromere) than on concordant paternal chromosomes (Figure 1b).
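The Bonferroni step multiplies each single-locus P-value by the number of independent tests, here the seven independent genomic fragments on discordant paternal chromosomes. A minimal sketch:

```python
def bonferroni(p, n_tests):
    """Bonferroni-corrected P-value, capped at 1."""
    return min(1.0, p * n_tests)


# Seven independent genomic fragments observed on discordant paternal chromosomes.
N_FRAGMENTS = 7
for marker, p in [(7, 0.007), (9, 0.009)]:
    print(marker, round(bonferroni(p, N_FRAGMENTS), 3))
```

Using the rounded P-values quoted in the text gives about 0.049 for marker no. 7 and 0.063 for marker no. 9; the slightly lower reported values (0.047 and 0.061) presumably result from correcting the unrounded P-values.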

Owing to the more proximal onset of recombination on paternal chromosomes of discordant pairs, an elevated frequency of recombined paternal, but not maternal, chromosomes was associated with intrapair discordance in the clinical phenotype of F508del-CFTR homozygous siblings at several markers on 7q. This association is consistent with the presence of a paternally imprinted CF modifier in the analyzed genomic area that determines intrapair discordance among CF siblings. To delineate the region that contains the putative imprinted gene(s) and/or their control elements, we defined a candidate region, starting with the marker at which the significant onset of decay of sharing was observed among discordant pairs, and a core candidate region, starting with the marker at which the maximum divergence of sharing between the two contrasting phenotypes was observed. By these definitions, the divergence in sharing of the paternal genomic sequence between concordant and discordant sibs points toward a 4.4-Mb candidate region, flanked by markers no. 6 and no. 9, which encompasses a smaller 2.9-Mb core candidate region, flanked by markers no. 7 and no. 9 (Figure 1).
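The decay-of-sharing curves and the core-candidate rule can be sketched as follows, under a simplified, hypothetical data model: for each sib pair we record the 1-based index of the first marker (counting away from CFTR) at which the two sibs' paternal chromosomes differ, or `None` if they never do. A pair is counted as still sharing at marker k if its first recombination lies beyond k. The per-pair data are invented, and defining the candidate-region start would additionally need a per-marker significance test, which is not shown.

```python
def nonrecombined_fraction(first_recomb, n_markers):
    """Fraction of pairs still sharing the paternal segment at markers 1..n_markers.

    first_recomb holds, per sib pair, the 1-based index of the first marker at
    which the sibs' paternal chromosomes differ, or None if they never differ.
    """
    n = len(first_recomb)
    fractions = []
    for k in range(1, n_markers + 1):
        shared = sum(1 for r in first_recomb if r is None or r > k)
        fractions.append(shared / n)
    return fractions


def max_divergence_marker(conc, disc):
    """1-based marker index where concordant minus discordant sharing peaks."""
    diffs = [c - d for c, d in zip(conc, disc)]
    return max(range(len(diffs)), key=diffs.__getitem__) + 1


# Hypothetical pairs: first recombined paternal marker per pair (None = none).
concordant = [None, None, 9, 10, None, 12, 9, None]
discordant = [6, 7, 7, None, 8, 7, 10, 6]

conc = nonrecombined_fraction(concordant, 12)
disc = nonrecombined_fraction(discordant, 12)
core_start = max_divergence_marker(conc, disc)
print(conc, disc, core_start)
```

Because sharing can only decrease with distance from the obligatorily shared CFTR locus, each curve is monotone, and the maximum gap between the two curves marks where the phenotype-specific decay diverges most.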

Association with CF disease severity is observed within the 2.9-Mb core candidate region

We evaluated the allele distributions of concordant mildly affected, concordant severely affected and discordant patient pairs at all 7q marker loci (Figure 1c and d). Allele frequencies differed significantly at marker no. 7 (P=0.0005) between mildly and severely affected patient pairs (Figure 1c). Two alleles at marker no. 7, D7Sat3-10 and D7Sat3-13, accounted for 35% of CON+ chromosomes, whereas these alleles were observed on fewer than 5% of CON− chromosomes (Table 1). Less prominent allelic associations were observed between concordant and discordant patient pairs (Figure 1d) at markers no. 8 (P=0.024) and no. 12 (P=0.017). As marker no. 7 defines the onset of the 2.9-Mb core candidate region, the colocalization of the association with CF disease severity and the paternally imprinted CF modifier predicted by the parent-of-origin-specific decay of genomic sharing may indicate that both signals arise from the same underlying gene(s).
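The "35% versus fewer than 5%" comparison is a summed allele frequency over the chromosomes of each phenotype group. The sketch below shows that computation on toy allele lists, labeled by repeat allele number; these lists are invented for illustration and are not the study's genotypes (see Table 1 for the observed frequencies).

```python
from collections import Counter


def allele_frequencies(chromosomes):
    """Relative frequency of each allele among a list of typed chromosomes."""
    counts = Counter(chromosomes)
    n = len(chromosomes)
    return {allele: c / n for allele, c in counts.items()}


def combined_frequency(freqs, alleles):
    """Summed frequency of a set of alleles (absent alleles count as 0)."""
    return sum(freqs.get(a, 0.0) for a in alleles)


# Toy D7Sat3 allele calls per chromosome (hypothetical, not study data).
con_plus = ["10"] * 4 + ["13"] * 3 + ["11"] * 8 + ["12"] * 5
con_minus = ["10"] * 1 + ["11"] * 12 + ["12"] * 9

plus_share = combined_frequency(allele_frequencies(con_plus), ["10", "13"])
minus_share = combined_frequency(allele_frequencies(con_minus), ["10", "13"])
print(round(plus_share, 3), round(minus_share, 3))
```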

Table 1 Allele frequencies for concordant mildly (CON+) and concordant severely (CON−) affected F508del-CFTR homozygous patient pairs at marker no. 7 (D7Sat3)

Linkage disequilibrium on 7q differs by phenotype and parental origin

We examined the decay of linkage disequilibrium (LD) on CF chromosomes using the transmission–disequilibrium test (TDT) at all markers along 7q. For maternal transmissions, concordant and discordant families behaved similarly in the TDT, whereas the two phenotypes differed for paternal transmissions (Figure 2). At marker no. 1, located 10 Mb 3′ to CFTR, no imbalance between transmitted and nontransmitted chromosomes was observed for paternal transmissions among discordant pairs or for maternal transmissions in either phenotype. In contrast, the TDT was still significant at marker no. 1 for paternal transmissions of concordant pairs (Table 2). Whereas the decay-of-sharing analysis above monitors only the direct effect on the patient's phenotype of the most recent recombination events, which occurred within the last generation, the TDT reflects recombination events that occurred on ancestral chromosomes. The correlation between the concordant phenotype and the TDT signal that is still detectable on paternal chromosomes 10 Mb from CFTR reflects an enrichment, in this highly selected subpopulation, of those CFTR haplotypes on which CFTR and marker no. 1 are in LD in the present-day CF population. In summary, concordance among CF sibs is observed when the paternal, but not the maternal, genomic region 3′ to CFTR is shared between siblings.
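For orientation, the classical biallelic TDT compares, among heterozygous parents, how often one allele is transmitted versus the other, using a McNemar-type statistic with 1 degree of freedom. The sketch below implements only this textbook case; FAMHAP's 'haptdt' setting used in this study also handles multiallelic markers, haplotypes and multiple affected children, which this sketch does not attempt. The counts `b` and `c` in the example are hypothetical.

```python
import math


def tdt(b, c):
    """Classical biallelic TDT.

    b: heterozygous parents transmitting allele A (and not B)
    c: heterozygous parents transmitting allele B (and not A)
    Returns the chi-square statistic (1 df) and its asymptotic P-value.
    """
    if b + c == 0:
        raise ValueError("no informative transmissions")
    stat = (b - c) ** 2 / (b + c)
    # Upper tail of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


stat, p = tdt(30, 10)
print(stat, round(p, 5))
```

Only heterozygous parents are informative, which is why the TDT is robust to population stratification: each parent's transmitted allele is compared with that same parent's nontransmitted allele.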

Figure 2

Decay of linkage disequilibrium at CFTR to 7qtel in concordant and discordant F508del-CFTR families. The transmission–disequilibrium test (TDT) was carried out for all families and both parental chromosomes (a), restricted to paternal transmissions among discordant (b) and concordant (c) families, and restricted to maternal transmissions among discordant (filled circles) and concordant (open circles) families (d). TDT was carried out using the software package FAMHAP with the setting ‘haptdt’.13 The map was extended to encompass the area 3′ to CFTR by incorporating D7S525,10 and four SNPs in the 5′ (XV2c, KM19) and 3′ (HUG16RS,42 J3.11) regions of the CFTR gene.10 The highly significant TDT at the polymorphic microsatellite marker IVS17bTA within the CFTR gene (P=4.26 × 10⁻²⁶) is expected, as it reflects the imbalance between transmitted F508del-CFTR alleles and nontransmitted wt-CFTR alleles in the autosomal recessive disease CF. However, transmission disequilibrium on paternal, but not on maternal, chromosomes of concordant pairs is still detectable at D7S514 (marker no. 1), located 10 Mb 3′ to CFTR (see also Table 2).

Table 2 Transmission disequilibrium at CFTR (IVS17bTA) and marker no. 1 (D7S514) among families of concordant and discordant F508del-CFTR homozygous CF sib pairs

Exploratory analysis of the candidate region for imprinted genes

Although paternal imprinting has been reported for the 7q32 genes MEST (alias PEG1)27 and COPG2,28 this known imprinted cluster does not colocalize with the candidate region and is therefore unlikely to be responsible for the clinical discordance among CF siblings with recombined paternal chromosomes. We analyzed 7 Mb of genomic sequence of NT_079596, encompassing markers no. 6 to no. 10, for elements that have been described as associated with imprinted genes.

Wang et al16 described 16 consensus motifs of 19–48 bp near or within imprinted genes. Overall, 26 similarities covering at least 80% of the short 19–26 bp consensus sequences or at least 65% of the long 35–48 bp consensus sequences were observed in the analyzed 7-Mb region; alignments with lower coverage were rejected. In total, matches for 8 of the 16 imprinting-associated motifs were recognized in the 7q34 genomic sequence. As Wang et al16 differentiated between upstream, downstream and intronic consensus motifs, we used these annotations to distinguish anchored from unanchored similarities. About half of the extragenic motifs could not be anchored to any nearby gene (seven anchored and eight unanchored similarities to upstream consensus motifs; two anchored and two unanchored similarities to downstream consensus motifs), whereas most intronic motifs were found within genes (five anchored and two unanchored intronic consensus motifs). In total, 11 genes located on the 7-Mb fragment were associated with imprinting-related motifs as described by Wang et al16 and/or predicted to be imprinted by Luedi et al,29, 30 who applied a genome-wide prediction algorithm to identify murine imprinted genes. The growth factor pleiotrophin (PTN) and an aldo-keto reductase (AKR1D1), located immediately before and at the start of the core candidate region, respectively, were the only genes predicted to be paternally imprinted within the region of the murine genome syntenic to human 7q33–qtel,29 suggesting that, besides the MEST/COPG2 area, an additional chromosomal segment on 7q is regulated by paternal imprinting. Pollard et al31 provided experimental evidence for imprinting of KIAA1466 and HSPC049, both located in the candidate region near marker no. 6. In summary, 13 genes within the 4.4-Mb candidate region have been associated with imprinting by either in silico or experimental evidence (Table 3), but on the basis of their annotation none of these genes appears to be directly connected to the pathophysiology of CF or to be a straightforward candidate modifier of CF disease severity.

Table 3 Suggestions for imprinted genes on 7q33–7q34

CCCTC-binding factor consensus sites and CpG islands enriched with repetitive elements cluster in the core candidate region

Ishihara et al17 described a 14-bp evolutionarily conserved consensus sequence for CCCTC-binding factor (CTCF) binding sites from sequences near H19/IGF2. Although this consensus binding site was derived from a single genomic region, CTCF is known to act as a universal chromatin insulator involved in the regulation of both monoallelically and biallelically expressed genes.32 Similarities to this consensus sequence covering at least 11 consecutive bp, with no gap or mismatch tolerated, were observed at eight positions in the analyzed 7-Mb genomic sequence of NT_079596. Of these eight motifs, seven are located distal to marker no. 7, which defines the onset of the 2.9-Mb core candidate region. In other words, the core candidate region, but not the preceding 3 Mb of genomic sequence, is enriched in consensus sites of the evolutionarily conserved CTCF insulator element of the imprinted H19/IGF2 domain (Supplementary Table 2).
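The acceptance rule for CTCF-like sites (an exact, gap-free match over at least 11 consecutive positions of a degenerate 14-bp consensus) can be sketched with IUPAC codes. The consensus string below is a placeholder chosen for illustration and should be replaced by the published Ishihara et al consensus; the sketch also scans only the forward strand, whereas a real search would presumably include the reverse complement.

```python
# IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Placeholder 14-bp degenerate consensus (hypothetical, for illustration only).
CONSENSUS = "CCGCGNGGNGGCAG"


def ctcf_like_sites(seq, consensus, min_run=11):
    """Implied start positions of the full consensus wherever some min_run-bp
    sub-window of it matches seq exactly (no gaps, no mismatches)."""
    seq = seq.upper()
    hits = set()
    for offset in range(len(consensus) - min_run + 1):
        core = consensus[offset:offset + min_run]
        for i in range(len(seq) - min_run + 1):
            if all(seq[i + k] in IUPAC[core[k]] for k in range(min_run)):
                # Report where the full 14-bp consensus would start.
                hits.add(i - offset)
    return sorted(hits)


full_site = "TTTT" + "CCGCGAGGTGGCAG" + "TTTT"
broken_site = "TTTT" + "CCGCGATGTGGCAG" + "TTTT"  # mismatch at consensus position 7
print(ctcf_like_sites(full_site, CONSENSUS), ctcf_like_sites(broken_site, CONSENSUS))
```

Mapping every matching sub-window back to the implied motif start collapses the several overlapping 11-bp hits produced by one complete site into a single reported position.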

An enrichment of CpG islands with repetitive elements near imprinted genes has been observed.33, 34, 35 As the GC content, and hence the density of CpG islands, varies considerably throughout the human genome, we examined a larger 24-Mb region rather than only the 7 Mb encompassing the candidate region and flanking sequences. We compared the positions of GC islands and of tandem repeats, which allowed us to interrogate the entire genomic region for uncommon GC-rich repetitive elements (Supplementary Table 3). Three GC-rich repeat regions are observed, at 7q32.1–32.2, 7q34 and 7q35–36.1 (Figure 3). The cluster of imprinted genes PEG1/COPG2 is located near the GC-rich tandem-repeat-enriched region at 7q32.1–32.2. CNTNAP2, reported as an imprinted gene involved in autism, is located near the GC-rich tandem-repeat-enriched region at 7q35–36.1. The third GC-rich tandem-repeat-enriched region, at 7q34, colocalizes with the 2.9-Mb core candidate region, for which the parent-of-origin-specific decay of genomic sharing predicts a paternally imprinted CF modifier. It is tempting to speculate that these GC-rich repetitive elements are particularly prone to structural alterations upon germline methylation of cytosines, a characteristic of all imprinting control elements identified to date.1, 2, 3 Interestingly, genes that are imprinted in human, mouse and cattle carry repetitive elements that are conserved between the species,36 supporting the hypothesis that these repeats provide a structural signal that results from the primary DNA sequence. Such control elements might provide a robust and easily recognizable structural signal in the DNA, altered upon parent-of-origin-specific methylation, for downstream processes such as protein binding and alignment of noncoding RNAs,37 which establish the permanent silencing of the nonexpressed chromosomal copy of an imprinted gene.1, 2, 3, 38

Figure 3

Density of GC-rich repetitive elements on 7q31.3–q36.1. The overall GC content (a) and the density of repeats within GC islands (b) are shown for a 24-Mb genomic segment on 7q. The 4.43-Mb candidate region is located centrally within this genomic segment (see physical map below (b)). (a) GC content was averaged for overlapping segments of 300 bp with a step size of 150 bp. (b) To display the density of repeats within GC islands, the 24-Mb segment was divided into 48 segments of 500 kb each. GC islands were defined as stretches of 300 bp with an average GC content exceeding 50%. Repetitive elements within GC islands were localized by comparing both data sets. Of the forty-eight 500-kb segments, 18 did not contain any repeats within GC islands; the remaining 30 contained between 1 and 9 such repeats. A list of the GC-rich motifs located within the candidate region is given in Supplementary Table 3. The position of the 7q32 cluster of imprinted genes26 CPA4, PEG1/MEST and its natural antisense transcript MESTIT, COPG2 and its natural antisense transcript COPG2IT, and KLF14 is indicated below the physical map of markers no. 1 to no. 13 by the black box labeled A. The black box labeled B denotes the position of CNTNAP2 on 7q35, reported as an imprinted gene involved in autism.26, 43 Chromosome bands along 7q are shown in the cytogenetic map at the bottom of the figure.
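The panel (b) procedure, intersecting repeat intervals with GC-island intervals and counting hits per 500-kb bin, can be sketched with plain interval arithmetic. Two assumptions are made here that the caption does not state: coordinates are half-open, and a repeat counts as "within" an island if it overlaps it by at least one base (the exact semantics of the Galaxy intersect step may differ). The intervals in the example are hypothetical.

```python
def intersect(repeats, islands):
    """Repeats (start, end) overlapping at least one island (start, end); half-open."""
    out = []
    for r_start, r_end in repeats:
        if any(r_start < i_end and i_start < r_end for i_start, i_end in islands):
            out.append((r_start, r_end))
    return out


def bin_counts(intervals, region_start, region_length, bin_size=500_000):
    """Count intervals per fixed-size bin, assigning each by its start position."""
    n_bins = region_length // bin_size
    counts = [0] * n_bins
    for start, _ in intervals:
        idx = (start - region_start) // bin_size
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


islands = [(100, 500), (1_200_000, 1_200_400)]
repeats = [(150, 200), (600, 650), (1_200_350, 1_200_450), (3_000_000, 3_000_050)]
gc_repeats = intersect(repeats, islands)
counts = bin_counts(gc_repeats, 0, 24_000_000)
print(gc_repeats, sum(1 for c in counts if c == 0))
```

For genome-scale inputs, a sorted sweep or an interval tree would replace the quadratic overlap check, but the counting logic stays the same.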

Discussion

We monitored the parent-of-origin-dependent decay of genomic sharing among CF sib pairs between CFTR and 7qtel. Concordant, but not discordant, CF sib pairs coinherited a segment of the paternal F508del-CFTR chromosome, indicative of a paternally imprinted gene that modifies the course of CF disease. Two independent observations in our study substantiate the description of a paternally imprinted CF modulator on 7q34 that is most likely localized within, or controlled by elements mapping to, the 2.9-Mb core candidate region. First, the more proximal onset of recombination on paternal chromosomes is associated with a discordant clinical phenotype among CF siblings (Figure 1b), whereas the onset of recombination on maternal chromosomes is similar for CONC and DIS pairs. Second, imprinting-associated elements such as CTCF consensus sites and repeat-rich CpG islands are enriched in the 2.9-Mb core candidate region on 7q34 (Supplementary Table 2, Figure 3). Interestingly, the allele distribution at marker D7Sat3, which also defines the onset of the core candidate region, differs significantly between mildly and severely affected CF patient pairs (Figure 1, Table 1), pointing to the presence of at least one modifier in the 7q34 region. It is tempting to speculate that the genetic entities underlying the association signal at D7Sat3 and the parent-of-origin effect mapped to the core candidate region are the same, although so far no evidence beyond positional overlap supports this hypothesis. Beyond the research reported in this paper, the larger 7q region has been implicated as containing a paternally imprinted gene responsible for growth retardation in maternal isodisomy 7 and/or Silver–Russell syndrome,39 and two QTLs for stature have been mapped in the vicinity of the paternally imprinted modulator on 7q34 described here.40, 41 Taken together, these findings suggest that the paternally imprinted 7q34 CF modifier may act on growth and stature.

We propose that the method applied here, that is, monitoring the decay of sharing in phenotypically informative sib pairs using a customized microsatellite map, can aid in the physical fine mapping of imprinted modulators of any disease shaped by parent-of-origin effects (Figure 4). As a quarter of sib pairs can be expected by Mendelian law to share both parental alleles identical by descent at any genomic locus of interest, many large studies of affected patient pairs for diseases such as autism, schizophrenia and bipolar disorder would allow a priori classification of clinically concordant and discordant patient pairs and subsequent sampling of subgroups defined by intrapair sharing of parental haplotypes near a locus of interest. Paternal and maternal chromosomes of these selected subsamples of affected patient pairs can then be followed up on a dense marker map, as accomplished in this study for the paternally imprinted element on 7q34, to delineate novel imprinted genes in the human genome.
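The Mendelian expectation that a quarter of sib pairs share a given locus refers to sharing both parental alleles identical by descent (IBD = 2); sharing of one particular parental haplotype, such as the paternal one, is expected in half of all pairs. The short Monte Carlo sketch below, with an arbitrarily chosen number of simulated pairs and seed, reproduces the textbook 1/4 : 1/2 : 1/4 distribution of IBD 0, 1 and 2.

```python
import random


def ibd_sharing(n_pairs=100_000, seed=7):
    """Simulate Mendelian transmission at one locus and return the fractions of
    sib pairs sharing 0, 1 or 2 parental alleles identical by descent."""
    rng = random.Random(seed)
    counts = [0, 0, 0]
    for _ in range(n_pairs):
        # Each sib independently receives one of the two paternal chromosomes...
        pat_shared = rng.randrange(2) == rng.randrange(2)
        # ...and one of the two maternal chromosomes.
        mat_shared = rng.randrange(2) == rng.randrange(2)
        counts[pat_shared + mat_shared] += 1
    return [c / n_pairs for c in counts]


fractions = ibd_sharing()
print([round(f, 3) for f in fractions])
```

In practice, the pairs retained for fine mapping would be those observed to share the relevant parental haplotype at the anchor locus, as the CFTR locus is for all F508del-CFTR homozygous sibs.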

Figure 4

Mapping imprinted genes by parent-of-origin-specific decay of genomic sharing. The principle of the assay is illustrated by three concordant sib pairs (top panel, sibs shown as gray pictograms) and three discordant sib pairs (bottom panel, sibs shown as pairs of black and white pictograms). Only one parental chromosome, the paternal chromosome for mapping a paternally imprinted gene, is shown next to the sibs. Alleles at five adjacent marker loci are shown as circles; gray for both sibs of a pair denotes a shared chromosomal segment, and black for one sib and white for the other denotes an unshared genomic segment. Intrapair sharing of the first locus in all sib pairs is required to monitor decay of sharing. In this study of CF sibs at 7q, this locus is the CFTR gene, and, as all siblings are F508del-CFTR homozygous, it is also shared across pairs, although such interpair sharing is not required if the method is applied to other diseases. In other words, applying this method to regions and diseases other than CF and 7q31–qtel requires selecting from the complete cohort a subset of affected pairs whose intrapair genomic sharing is ascertained through observed identity by descent of alleles at a marker locus, which is expected for a quarter of any sib pair sample according to Mendelian law. The candidate region, starting with the marker at which significant decay of sharing is observed among discordant pairs, and the core candidate region, starting with the marker at which the maximum divergence of sharing between the two contrasting phenotypes is observed, enclose the imprinted gene(s) and/or the regulatory elements that control imprinting.