Prediction of human mRNA donor and acceptor sites from the DNA sequence

https://doi.org/10.1016/0022-2836(91)90380-O

Abstract

Artificial neural networks have been applied to the prediction of splice site location in human pre-mRNA. A joint prediction scheme, in which the prediction of transition regions between introns and exons regulates the cutoff level for splice site assignment, was able to predict splice site locations with confidence levels far better than previously reported in the literature. The problem of predicting donor and acceptor sites in human genes is hampered by the presence of large numbers of false positives: here the distribution of these false splice sites is examined and linked to a possible scenario for the splicing mechanism in vivo. When the presented method detects 95% of the true donor and acceptor sites, it makes less than 0·1% false donor site assignments and less than 0·4% false acceptor site assignments. For the large data set used in this study, this means that on average there are one and a half false donor sites per true donor site and six false acceptor sites per true acceptor site. With the joint assignment method, more than a fifth of the true donor sites and around a quarter of the true acceptor sites could be detected without any accompanying false positive predictions. Highly confident splice sites could not be isolated with a widely used weight matrix method or with separate splice site networks. A complementary relation between the confidence levels of the coding/non-coding network and the separate splice site networks was observed: many weak splice sites had sharp transitions in the coding/non-coding signal, whereas many stronger splice sites had more ill-defined transitions between coding and non-coding.
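The link between the percentage figures and the "false sites per true site" figures in the abstract can be illustrated with a short calculation. The sketch below is not taken from the paper. It assumes that each percentage is the fraction of non-site positions wrongly assigned as sites, and that the per-true-site figure is the number of false assignments divided by the number of true sites. The function names and the data set sizes are hypothetical and chosen only for illustration.

```python
def false_per_true(false_positive_rate, n_non_sites, n_true_sites):
    """Expected number of false assignments per true site.

    Assumption: false_positive_rate is the fraction of non-site
    positions that are wrongly assigned as splice sites.
    """
    return false_positive_rate * n_non_sites / n_true_sites


def implied_non_sites_per_true(ratio_false_per_true, false_positive_rate):
    """Number of non-site positions per true site implied by a given
    false-per-true ratio and false positive rate (same assumption)."""
    return ratio_false_per_true / false_positive_rate


# Hypothetical data set: 1,000 true sites among 1,500,000 non-site positions.
n_true = 1_000
n_non = 1_500_000

# Upper bounds quoted in the abstract: 0.1% (donor) and 0.4% (acceptor).
donor_ratio = false_per_true(0.001, n_non, n_true)
acceptor_ratio = false_per_true(0.004, n_non, n_true)

# Working backwards from the abstract's ratios (1.5 and 6 false per true).
donor_background = implied_non_sites_per_true(1.5, 0.001)
acceptor_background = implied_non_sites_per_true(6.0, 0.004)

print(f"false donor sites per true donor site: {donor_ratio:.2f}")
print(f"false acceptor sites per true acceptor site: {acceptor_ratio:.2f}")
print(f"implied non-sites per true donor site: {donor_background:.0f}")
print(f"implied non-sites per true acceptor site: {acceptor_background:.0f}")
```

Under these assumptions, the donor and acceptor figures both imply a background on the order of 1500 candidate positions per true site. This shows why a false positive rate well below 1% can still produce more false sites than true ones.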


This work was supported in part by the Danish Natural Science Research Council under grants no. J.nr. 11-1173 and 11-8168.
