CpG islands in vertebrate genomes

J Mol Biol. 1987 Jul 20;196(2):261-82. doi: 10.1016/0022-2836(87)90689-9.

Abstract

Although vertebrate DNA is generally depleted in the dinucleotide CpG, it has recently been shown that some vertebrate genes contain CpG islands, regions of DNA with a high G+C content and a high frequency of CpG dinucleotides relative to the bulk genome. In this study, a large number of sequences of vertebrate genes were screened for the presence of CpG islands. Each CpG island was then analysed in terms of length, nucleotide composition, frequency of CpG dinucleotides, and location relative to the transcription unit of the associated gene. CpG islands were associated with the 5' ends of all housekeeping genes and many tissue-specific genes, and with the 3' ends of some tissue-specific genes. A few genes contained both 5' and 3' CpG islands, separated by several thousand base-pairs of CpG-depleted DNA. The 5' CpG islands extended through 5'-flanking DNA, exons and introns, whereas most of the 3' CpG islands appeared to be associated with exons. CpG islands were generally found in the same position relative to the transcription unit of equivalent genes in different species, with some notable exceptions. The locations of G/C boxes, composed of the sequence GGGCGG or its reverse complement CCGCCC, were investigated relative to the location of CpG islands. G/C boxes were found to be rare in CpG-depleted DNA and plentiful in CpG islands, where they occurred in 3' CpG islands, as well as in 5' CpG islands associated with tissue-specific and housekeeping genes. G/C boxes were located both upstream and downstream from the transcription start site of genes with 5' CpG islands. Thus, G/C boxes appeared to be a feature of CpG islands in general, rather than a feature of the promoter region of housekeeping genes. 
Two theories for the maintenance of a high frequency of CpG dinucleotides in CpG islands were tested: (1) that CpG islands in methylated genomes are maintained by the structural stability of a high G+C content alone, despite a tendency for 5mCpG to mutate by deamination to TpG+CpA; and (2) that CpG islands associated with exons result from some selective importance of the arginine codon CGX. Neither of these theories could account for the distribution of CpG dinucleotides in the sequences analysed. Possible functions of CpG islands in transcriptional and post-transcriptional regulation of gene expression were discussed, and were related to theories for the maintenance of CpG islands as "methylation-free zones" in germline DNA.
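The quantities named in the abstract (G+C content, frequency of CpG dinucleotides, and G/C boxes of sequence GGGCGG or CCGCCC) can be illustrated with a minimal Python sketch. The abstract itself gives no numeric thresholds or formulas, so everything here is an assumption made for illustration: the function names `cpg_stats` and `find_gc_boxes` are hypothetical, and the observed/expected CpG ratio is computed with the commonly used formulation (number of CpG × N) / (number of C × number of G), which may not match the exact procedure used in the study.

```python
import re


def cpg_stats(seq):
    """Return (G+C fraction, observed/expected CpG ratio) for a DNA string.

    Assumption: observed/expected = (CpG count * N) / (C count * G count),
    a common formulation; the abstract does not specify one.
    """
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G occurred independently is (C * G) / N.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


# Lookahead so that overlapping motifs (e.g. GGGCGGGCGG) are all reported.
GC_BOX = re.compile(r"(?=(GGGCGG|CCGCCC))")


def find_gc_boxes(seq):
    """Return (0-based start, motif) for each G/C box or its reverse complement."""
    return [(m.start(), m.group(1)) for m in GC_BOX.finditer(seq.upper())]
```

In practice such statistics would be computed over a sliding window along a gene sequence and compared against values for the bulk genome; window size and cut-offs are left out here because the abstract does not state them.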

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Base Composition
  • Base Sequence
  • Cattle
  • Cytidine Monophosphate / analogs & derivatives
  • Cytidine Monophosphate / genetics*
  • Cytosine Nucleotides / genetics*
  • Dinucleoside Phosphates*
  • Genes
  • Guanosine / analogs & derivatives*
  • Guanosine / genetics
  • Humans
  • Mice
  • Rats
  • Transcription, Genetic
  • Vertebrates / genetics*

Substances

  • Cytosine Nucleotides
  • Dinucleoside Phosphates
  • Guanosine
  • cytidylyl-3'-5'-guanosine
  • Cytidine Monophosphate