Computational searches for splicing signals

Methods. 2005 Dec;37(4):292-305. doi: 10.1016/j.ymeth.2005.07.011.

Abstract

The removal of introns from pre-mRNA requires as an initial event the accurate molecular recognition of the proper exon-intron borders. It is now evident that RNA sequence elements in addition to the consensus splice site sequences themselves are required for this recognition. Genomic analyses have contributed to the definition of these elements as exonic and intronic splicing enhancers and silencers, comprising what has been called the "splicing code." Many computational methods have been brought to bear in such studies. We describe here some of the methods we have used to discover functional splicing signals. What these methods have in common is a comparison of sequences in and around exons to sequences found elsewhere in the genome. We have especially made use of comparisons to "pseudo exons," intronic sequences resembling exons by virtue of being bounded by sequences indistinguishable from splice sites. Two computational strategies are emphasized: (1) the use of a machine learning technique in which a computational algorithm, a support vector machine, is first trained on known examples and then used to predict sequences associated with splicing; and (2) straight statistical analysis of differences between regions associated with exons and other regions in the genome. In most cases, the predictions made using these methods have been validated by subsequent empirical tests. An attempt has been made to make this description understandable by researchers unfamiliar with computational practice and to include practical references to specific databases and programs.
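
The second strategy, a statistical comparison of exon-associated sequences against other genomic regions, can be illustrated with a minimal sketch. The code below is not the authors' implementation: it assumes that the comparison is made on k-mer frequencies and uses a simple two-proportion z-score as the test statistic. The function names (`count_kmers`, `enrichment_z`) and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def count_kmers(seqs, k=6):
    """Count overlapping k-mers across sequences, skipping any k-mer
    that contains a character other than A, C, G, or T."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def enrichment_z(counts_a, counts_b):
    """Two-proportion z-score per k-mer (an assumed statistic).
    Positive values mean the k-mer is more frequent in set A."""
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    if n_a == 0 or n_b == 0:
        raise ValueError("both sequence sets must contain at least one k-mer")
    scores = {}
    for kmer in set(counts_a) | set(counts_b):
        p_a = counts_a[kmer] / n_a
        p_b = counts_b[kmer] / n_b
        # Pooled proportion under the null hypothesis of equal frequency.
        p = (counts_a[kmer] + counts_b[kmer]) / (n_a + n_b)
        se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
        scores[kmer] = (p_a - p_b) / se if se > 0 else 0.0
    return scores


# Toy, made-up sequences standing in for real exons and pseudo exons.
exons = ["GAAGAAGAAGAA", "TGAAGAAGC"]
pseudo_exons = ["TTTTAGGGTTTT", "CCTAGGGTAC"]

scores = enrichment_z(count_kmers(exons, k=3), count_kmers(pseudo_exons, k=3))
# List k-mers from most exon-enriched to most pseudo-exon-enriched.
for kmer, z in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{kmer}\t{z:+.2f}")
```

In this framing, k-mers with high positive scores would be candidate exonic splicing enhancers and those with strongly negative scores candidate silencers. A real analysis would need far larger sequence sets and a correction for multiple testing.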

Publication types

  • Comparative Study

MeSH terms

  • Alternative Splicing / genetics*
  • Artificial Intelligence
  • Computational Biology / methods*
  • Computational Biology / statistics & numerical data
  • Exons
  • Genetic Code
  • Humans
  • Introns
  • Sequence Alignment
  • Software
  • Statistics as Topic