Trends in Genetics
Volume 22, Issue 1, January 2006, Pages 5-10

Genome Analysis
Defining a genomic radius for long-range enhancer action: duplicated conserved non-coding elements hold the key

https://doi.org/10.1016/j.tig.2005.10.005

Many conserved non-coding elements (CNEs) in vertebrate genomes have been shown to function as tissue-specific enhancers. However, the target genes of most CNEs are unknown. Here we show that the target genes of duplicated CNEs can be predicted by considering their neighbouring paralogous genes. This enables us to provide the first systematic estimate of the genomic range for distal cis-regulatory interactions in the human genome: half of CNEs are >250 kb away from their associated gene.

Introduction

Candidate regulatory elements are often identified using comparative genomics, because sequence conservation is taken as an indicator of negative selection and functional constraint 1, 2, 3. However, assigning regulatory elements to the genes they control is a challenging and laborious task. Distal cis-regulatory elements in vertebrates are often located far from the genes they interact with and, in some cases, lie within the introns of neighbouring genes 4, 5, 6, 7, 8. For example, in the human genome, an enhancer of the Sonic Hedgehog gene (SHH) is found within an intron of a gene located 1 Mb away from SHH [4]. Similarly, an enhancer of PAX6 lies 200 kb downstream of PAX6 [6]. Such long-range interactions make it difficult to associate human genes with their potential regulatory elements.

Conserved non-coding elements (CNEs) in vertebrate genomes have been found to cluster near genes encoding transcription factors and developmental regulators, indicating that they are involved in vertebrate development 7, 9, 10, 11. Indeed, most of the CNEs that have been tested experimentally in transient transfection assays appear to function as tissue-specific enhancers 7, 11, 13. Subsets of CNEs have been found to share sequence similarity and to reside next to, or within a few genes of, transcription factor genes from the same protein families 9, 10, 11, 12, 13 (G.K. McEwen et al., unpublished data). For example, in a recent analysis, five sets of CNEs were found to share >75% identity over an alignment length of at least 50 bp [10]. These findings suggest that duplicated CNEs (dCNEs) might be cis-regulatory elements that direct tissue-specific expression and, assuming that sequence similarity indicates similarity of function, they are expected to be shared between paralogous genes with common expression patterns [14]. In this article, we propose comparing all neighbouring protein-coding genes with each other as an unbiased method to assign dCNEs to specific genes, even when the dCNEs are located hundreds of kilobases away from their predicted targets. Assigning dCNEs to individual genes enables us to calculate the distances that separate these elements from their predicted targets. Assuming that dCNEs function as cis-regulatory elements, we present the first computational analysis that aims to define the genomic radius of regulatory activity for cis-elements involved in early development in the human genome.
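The idea of assigning dCNEs by comparing their neighbouring genes can be illustrated with a minimal Python sketch. It assumes that each gene has been reduced to a single coordinate and labelled with a paralogy family, and it predicts as targets those genes whose paralogy family occurs near at least two members of the same dCNE family. The classes `Gene` and `Element`, the function `predict_targets` and the "at least two members" rule are hypothetical choices made for illustration; they are not the authors' pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    pos: int             # one representative coordinate, e.g. the TSS (assumption)
    paralog_family: str  # label from some paralogy assignment (hypothetical)


@dataclass(frozen=True)
class Element:
    name: str
    chrom: str
    pos: int


def predict_targets(family, genes, radius):
    """Map each member of a dCNE family to the genes within `radius` bp whose
    paralogy family also occurs near at least one other member of the family."""
    # Genes on the same chromosome within `radius` bp of each dCNE.
    nearby = {
        e.name: [g for g in genes
                 if g.chrom == e.chrom and abs(g.pos - e.pos) <= radius]
        for e in family
    }
    # For each paralogy family, record which dCNEs have one of its genes nearby.
    support = defaultdict(set)
    for elem_name, near_genes in nearby.items():
        for g in near_genes:
            support[g.paralog_family].add(elem_name)
    shared = {fam for fam, elems in support.items() if len(elems) >= 2}
    # Predicted targets: nearby genes that belong to a shared paralogy family.
    return {
        elem_name: sorted(g.name for g in near_genes if g.paralog_family in shared)
        for elem_name, near_genes in nearby.items()
    }


# Toy example: all names and coordinates are invented for illustration.
family = [Element("cne1", "chr1", 1_000_000), Element("cne2", "chr2", 5_000_000)]
genes = [
    Gene("A1", "chr1", 1_300_000, "famA"),
    Gene("X1", "chr1", 1_050_000, "famX"),
    Gene("A2", "chr2", 4_600_000, "famA"),
    Gene("Y2", "chr2", 5_020_000, "famY"),
]
print(predict_targets(family, genes, radius=500_000))
# {'cne1': ['A1'], 'cne2': ['A2']}
```

In the toy data the gene nearest to each element (X1, Y2) is not predicted, whereas the more distant paralogous pair A1/A2 is; with a radius of 250 kb neither element receives a target.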


dCNEs are associated with duplicated genes

To test whether duplicated CNEs result from the retention of regulatory elements after gene duplication, we used a set of CNEs conserved between the human and Fugu genomes, identified using a more sensitive search than that described in Ref. [11] (supplementary material online and G.K. McEwen et al., unpublished data). The resulting set consists of 267 dCNEs that can be grouped into 129 families of two to four members (a mean of two dCNEs per family). For every …
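How the 267 dCNEs were grouped into families is not detailed in this excerpt. A minimal sketch, assuming single-linkage clustering of pairwise similarity hits with a union-find structure, is shown below; `group_families`, the element IDs and the hits are hypothetical. The final line only checks the reported mean family size (267 / 129).

```python
def group_families(elements, hits):
    """Group elements into families by single linkage: any chain of pairwise
    similarity hits puts elements in the same family (union-find)."""
    parent = {e: e for e in elements}

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for e in elements:
        groups.setdefault(find(e), []).append(e)
    # Only groups with two or more members count as duplicated-element families.
    return sorted((sorted(g) for g in groups.values() if len(g) >= 2),
                  key=lambda g: g[0])


# Toy input: element IDs and hits are hypothetical.
elements = ["c1", "c2", "c3", "c4", "c5", "c6"]
hits = [("c1", "c2"), ("c2", "c3"), ("c4", "c5")]
print(group_families(elements, hits))  # [['c1', 'c2', 'c3'], ['c4', 'c5']]

# Mean family size for the counts reported in the text (267 dCNEs, 129 families).
print(round(267 / 129, 2))  # 2.07
```

Single linkage is a permissive choice: two elements that are not directly similar still join the same family if a third element links them, which is one reason a real analysis might add identity or length thresholds such as those quoted for Ref. [10].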

dCNEs are retained with duplicated transcription factors

It has previously been shown that genes adjacent to CNEs usually encode transcription factors 9, 10, 11. We confirmed this ‘enrichment’ for transcription factors (P < 10⁻²²) by comparing the Gene Ontology (GO) annotation [20] of all Fugu genes within 1 Mb of the dCNEs with those in the human genome using GOstat [21] (Table 1 in the supplementary material online). We then assessed the transcription-factor enrichment of genes identified by paralogy mapping compared with that of genes adjacent …
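An enrichment of this kind can be expressed as a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test on a 2 × 2 table. The sketch below illustrates that calculation under this assumption and does not reproduce GOstat's own procedure; the function name `enrichment_p_value` and the counts in the usage line are hypothetical.

```python
from math import comb


def enrichment_p_value(k, population, annotated, sample):
    """One-sided hypergeometric P(X >= k): the probability of drawing at least k
    annotated genes (e.g. transcription factors) when sampling `sample` genes
    without replacement from `population` genes, `annotated` of which carry the term."""
    upper = min(annotated, sample)
    tail = sum(comb(annotated, i) * comb(population - annotated, sample - i)
               for i in range(k, upper + 1))
    # Exact integer arithmetic until the final division keeps small P-values accurate.
    return tail / comb(population, sample)


# Hypothetical counts, not taken from the article: 60 of 200 genes near the
# elements are transcription factors, versus 1,500 of 20,000 genome-wide.
p = enrichment_p_value(k=60, population=20_000, annotated=1_500, sample=200)
print(f"P = {p:.3g}")
```

Working with Python's arbitrary-precision integers avoids summing many tiny floating-point terms, so P-values far below 10⁻¹⁰ are still computed reliably.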

Half of dCNEs are associated with genes that are found >250 kb away

So far, for each dCNE we have considered all genes within the genomic radius defined by the most distal enhancer documented in the human genome. To assess the number of predicted target genes for each dCNE across a range of distances, we repeated our analysis at 250-kb intervals up to 2 Mb (Figure 2). Only half of the dCNEs have predicted target genes within the first 250 kb, whereas 95% of the dCNEs have predicted targets within 1.25 Mb. After that distance, the number of unassigned dCNEs …
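The distance analysis amounts to a cumulative curve: for each radius in 250-kb steps up to 2 Mb, the fraction of dCNEs that have a predicted target within that radius. The sketch assumes that each dCNE has already been summarised by the distance to its closest predicted target (None if no target was found); the function names `cumulative_assignment` and `smallest_radius_for` and the distances are hypothetical and are not the article's data.

```python
def cumulative_assignment(distances, step=250_000, max_radius=2_000_000):
    """For radii step, 2*step, ..., max_radius, return (radius, fraction) pairs
    giving the fraction of dCNEs whose closest predicted target lies within
    that radius. A distance of None means no target was predicted at all."""
    n = len(distances)
    result = []
    for radius in range(step, max_radius + 1, step):
        assigned = sum(1 for d in distances if d is not None and d <= radius)
        result.append((radius, assigned / n))
    return result


def smallest_radius_for(fraction, curve):
    """First radius on the curve at which at least `fraction` of dCNEs are assigned."""
    for radius, assigned in curve:
        if assigned >= fraction:
            return radius
    return None


# Hypothetical distances (bp) to the closest predicted target.
distances = [50_000, 100_000, 200_000, 300_000, 600_000, 900_000, 1_500_000, None]
curve = cumulative_assignment(distances)
for radius, frac in curve:
    print(f"{radius // 1000:>5} kb  {frac:.3f}")
print(smallest_radius_for(0.5, curve))  # 500000
```

On these toy values half of the elements are assigned by 500 kb and the 95% threshold is never reached, so `smallest_radius_for(0.95, curve)` returns None.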

Concluding remarks

Non-coding elements that are conserved between human and Fugu are considered to have great regulatory potential. We assigned conserved elements that exist at low copy numbers in the human genome to their probable targets, based on paralogy mapping of all neighbouring protein-coding genes. Our results have shown that these elements are strongly associated with duplicated transcription factors. Most of our candidate regulatory elements could be assigned to individual genes, even when analysing a …

Acknowledgements

We thank Ben Lehner for stimulating discussions and for critically reading the article. Part of the analysis described in this article was carried out at the MRC Rosalind Franklin Centre for Genomics Research. This work was supported by the U.K. MRC. T.V. is a Predoctoral Fellow funded by the MRC.

References (25)

  • A. Sandelin et al. Arrays of ultraconserved non-coding regions span the loci of key developmental genes in vertebrate genomes. BMC Genomics (2004)
  • A. Woolfe et al. Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol. (2005)