Trends in Genetics
Genome Analysis
Defining a genomic radius for long-range enhancer action: duplicated conserved non-coding elements hold the key
Introduction
Candidate regulatory elements are often identified using comparative genomics because sequence conservation is considered to indicate negative selection and functional constraint [1, 2, 3]. However, assigning regulatory elements to genes is a challenging and laborious task. Distal cis-regulatory elements in vertebrates are often located far from the genes they interact with and, in some cases, lie within the introns of neighbouring genes [4, 5, 6, 7, 8]. For example, in the human genome, an enhancer of the Sonic Hedgehog gene (SHH) is found within an intron of a gene that is located 1 Mb away from SHH [4]. Similarly, an enhancer of PAX6 is 200 kb downstream of PAX6 [6]. This makes the association of genes with their potential regulatory elements a significant problem in the human genome.
Conserved non-coding elements (CNEs) in vertebrate genomes have been found to cluster near transcription factors and developmental regulators, indicating that they are involved in vertebrate development [7, 9, 10, 11]. Indeed, most of the CNEs tested experimentally in transient transfection assays appear to function as tissue-specific enhancers [7, 11, 13]. Subsets of CNEs have been found to share sequence similarity and to reside next to, or within a few genes of, transcription factors from the same protein families [9, 10, 11, 12, 13] (G.K. McEwen et al., unpublished data). For example, in a recent analysis, five sets of CNEs were found to share >75% identity over an alignment length of at least 50 bp [10]. These findings suggest that duplicated CNEs (dCNEs) might be cis-regulatory elements that direct tissue-specific expression and, assuming that sequence similarity indicates similarity of function, they are expected to be shared between paralogous genes with common expression patterns [14]. In this article, we propose the comparison of all neighbouring protein-coding genes with each other as an unbiased method to assign dCNEs to specific genes, even when the dCNEs are located hundreds of kilobases away from their predicted targets. Assigning dCNEs to individual genes enables us to calculate the distances that separate these elements from their predicted targets. Assuming that dCNEs function as cis-regulatory elements, we present the first computational analysis that aims to define the genomic radius of regulatory activity for cis-elements involved in early development in the human genome.
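The idea of comparing all neighbouring protein-coding genes with each other can be illustrated with a minimal sketch. It is not the authors' pipeline: the gene names, coordinates, family labels, the 1 Mb default radius and the helper names (`genes_within`, `assign_targets`) are all assumptions made for illustration, and paralogy is reduced here to a shared family label.

```python
from itertools import combinations

def genes_within(position, genes, radius):
    """Names of genes on the same chromosome whose position lies within `radius` bp."""
    chrom, pos = position
    return {name for name, (g_chrom, g_pos) in genes.items()
            if g_chrom == chrom and abs(g_pos - pos) <= radius}

def assign_targets(dcne_family, genes, gene_family, radius=1_000_000):
    """Assign each dCNE to neighbouring genes that have a paralogue near another family member.

    The default radius of 1 Mb is an assumption, not a value taken from the method itself.
    """
    assignments = {name: set() for name in dcne_family}
    for (a, pos_a), (b, pos_b) in combinations(dcne_family.items(), 2):
        near_a = genes_within(pos_a, genes, radius)
        near_b = genes_within(pos_b, genes, radius)
        for gene_a in near_a:
            for gene_b in near_b:
                fam_a = gene_family.get(gene_a)
                # Two distinct genes with the same family label are treated as paralogues.
                if gene_a != gene_b and fam_a is not None and fam_a == gene_family.get(gene_b):
                    assignments[a].add(gene_a)
                    assignments[b].add(gene_b)
    return assignments

# Hypothetical toy data: two copies of a dCNE on different chromosomes.
genes = {
    "geneA1": ("chr1", 1_500_000),
    "geneB": ("chr1", 1_100_000),
    "geneA2": ("chr2", 5_700_000),
    "geneC": ("chr2", 5_050_000),
}
gene_family = {"geneA1": "famA", "geneA2": "famA", "geneB": "famB", "geneC": "famC"}
dcne_family = {"dCNE_1": ("chr1", 1_000_000), "dCNE_2": ("chr2", 5_000_000)}

targets = assign_targets(dcne_family, genes, gene_family)
```

In this toy case each dCNE is assigned to the paralogous gene several hundred kilobases away rather than to its nearest neighbour, which is the situation that proximity-based assignment would miss.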
dCNEs are associated with duplicated genes
To test whether duplicated CNEs are the result of retention of regulatory elements after gene duplication, we used a set of CNEs that is conserved between the human and Fugu genomes, identified by a more sensitive search than previously described in Ref. [11] (supplementary material online and G.K. McEwen et al., unpublished data). The resulting set of DNA elements consists of 267 dCNEs that can be grouped into 129 families of two-to-four members (a mean of two dCNEs per family). For every
dCNEs are retained with duplicated transcription factors
It has previously been shown that genes adjacent to CNEs usually encode transcription factors [9, 10, 11]. We confirmed this ‘enrichment’ for transcription factors (P-value < 10^−22) by comparing the Gene Ontology (GO) annotation [20] of all Fugu genes within 1 Mb of the dCNEs with those in the human genome using GOstat [21] (Table 1 in the supplementary material online). We then assessed the transcription-factor enrichment of genes identified by paralogy mapping compared with that of genes adjacent
Half of dCNEs are associated with genes that are found >250 kb away
So far, we have considered for each dCNE all the genes that are in the genomic radius defined by the most distal enhancer documented in the human genome. To assess the number of predicted target genes for each dCNE across a range of distances, we repeated our analysis every 250 kb up to 2 Mb away (Figure 2). Only half of the dCNEs have predicted target genes within the first 250 kb, whereas 95% of the dCNEs have predicted targets within 1.25 Mb. After that distance, the number of unassigned dCNEs
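The stepwise analysis described in this section (repeating the assignment every 250 kb up to 2 Mb) can be sketched as a cumulative count. This is only an illustration under stated assumptions: the function name `fraction_assigned` is hypothetical, the input is assumed to be the distance from each dCNE to its nearest predicted target (or `None` when no target was found), and the toy distances are invented rather than taken from the study.

```python
def fraction_assigned(distances_bp, step=250_000, max_radius=2_000_000):
    """Fraction of dCNEs with a predicted target within each radius.

    distances_bp: distance in bp from each dCNE to its nearest predicted
    target gene, or None if no target was predicted (assumed input format).
    Returns a list of (radius_bp, fraction) pairs, one per step.
    """
    total = len(distances_bp)
    if total == 0:
        raise ValueError("at least one dCNE is required")
    result = []
    for radius in range(step, max_radius + step, step):
        assigned = sum(1 for d in distances_bp if d is not None and d <= radius)
        result.append((radius, assigned / total))
    return result

# Invented toy distances, for illustration only.
toy_distances = [100_000, 200_000, 300_000, 1_200_000]
curve = fraction_assigned(toy_distances)
```

Plotting the fraction against the radius gives a saturation curve of the kind the text describes, where the point at which the curve flattens suggests a practical search radius.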
Concluding remarks
Non-coding elements that are conserved between human and Fugu are considered to have great regulatory potential. We assigned conserved elements that exist at low copy numbers in the human genome to their probable targets, based on paralogy mapping of all neighbouring protein-coding genes. Our results have shown that these elements are strongly associated with duplicated transcription factors. Most of our candidate regulatory elements could be assigned to individual genes, even when analysing a
Acknowledgements
We thank Ben Lehner for stimulating discussions and for critically reading the article. Part of the analysis described in this article was carried out at the MRC Rosalind Franklin Centre for Genomics Research. This work was supported by the U.K. MRC. T.V. is a Predoctoral Fellow funded by the MRC.
References (25)
Conserved noncoding sequences are reliable guides to regulatory elements. Trends Genet. (2000)
Long-range control of gene expression: emerging mechanisms and disruption in disease. Am. J. Hum. Genet. (2005)
New 3′ elements control Pax6 expression in the developing pretectum, neural retina and olfactory region. Mech. Dev. (2002)
Relationship between the genomic organization and the overlapping embryonic expression patterns of the zebrafish dlx genes. Genomics (1997)
Genomic strategies to identify mammalian regulatory sequences. Nat. Rev. Genet. (2001)
Exploiting human–fish genome comparisons for deciphering gene regulation. Hum. Mol. Genet. (2004)
A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Hum. Mol. Genet. (2003)
Scanning human gene deserts for long-range enhancers. Science (2003)
Comparative analyses of multi-species sequences from targeted genomic regions. Nature (2003)
Ultraconserved elements in the human genome. Science (2004)
Arrays of ultraconserved non-coding regions span the loci of key developmental genes in vertebrate genomes. BMC Genomics
Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol.