Functional annotation of a full-length mouse cDNA collection

Abstract

The RIKEN Mouse Gene Encyclopaedia Project, a systematic approach to determining the full coding potential of the mouse genome, involves collection and sequencing of full-length complementary DNAs and physical mapping of the corresponding genes to the mouse genome. We organized an international functional annotation meeting (FANTOM) to annotate the first 21,076 cDNAs to be analysed in this project. Here we describe the first RIKEN clone collection, which is one of the largest described for any organism. Analysis of these cDNAs extends known gene families and identifies new ones.

Figure 1: Phase II full-insert sequencing flow chart.
Figure 2: The criteria used in assigning RIKEN definitions (riken_defs).

Acknowledgements

We thank the following (in alphabetical order) for discussion, encouragement and technical assistance: R. Abagyan, T. Akimura, K. Arakawa, M. Boguski, L. Corbani, T. A. Dragani, J. T. Eppig, S. Fujimori, G. Grillo, T. Haga, T. Hanagaki, S. Hanaoka, S. Hatta, N. Hayatsu, K. Hiramoto, T. Hiraoka, T. Hirozane, Y. Hodoyama, F. Hori, T. Hubbard, R. Hynes, K. Ikeda, K. Ikeo, C. Imamura, K. Imotani, S. Inoue, H. Kato, N. Kikuchi, Y. Kojima, A. Konagaya, M. Kouda, S. Koya, M. Kubota, S. Kumagai, C. Kurihara, M. Kusakabe, F. Licciulli, S. Liuni, L. Maltais, T. Matsuyama, L. McKenzie, A. Miyazaki, K. Mori, M. Muramatsu, M. Nakamura, K. Nomura, N. Nukina, K. Numata, R. Numazaki, M. Ohno, Y. Okuma, H. Ono, C. Owa, Y. Ozawa, G. Pertea, S. Ramachandran, E. M. Rubin, N. Saga, H. Saitou, H. Sakai, C. Sakai, A. Sakurai, H. Sano, D. Sasaki, L. Sato, C. Schneider, J. Schug, T. Shiraki, M. B. Soares, Y. Sogabe, C. Stoeckert, H. Sugawara, R. Sultana, H. Suzuki, M. Tagami, A. Tagawa, F. Takahashi, S. Takaku-Akahira, M. Takeuchi, T. Tanaka, Y. Tateno, Y. Tejima, J. Todd, A. Tomaru, S. Tonegawa, T. Toya, A. Wada, L. Wagner, A. Watahiki, T. Yamamura, T. Yamashita, T. Yao, A. Yasunishi, T. Yokota, S. Yokoyama, A. Yoshiki and K. Yotsutani. We also thank N. Kazuta, Y. Sigemoto, H. Torigoe and T. Washida for secretarial assistance. This study has been mainly supported by a grant for the RIKEN Genome Exploration Research Project and CREST (Core Research for Evolutional Science and Technology) to Y.H. Further support came from ACT-JST (Research and Development for Applying Advanced Computational Science and Technology) of Japan Science and Technology Corporation (JST) to Y.H. and H.M., and the Science and Technology Agency of the Japanese Government to Y.H. and Y.O. (All funds from the Science Technology Agency of the Japanese Government.) 
This work was also supported by a Grant-in-Aid for Scientific Research on Priority Areas and Human Genome Program, from the Ministry of Education, Science and Culture, and by a Grant-in-Aid for a Second Term Comprehensive 10-Year Strategy for Cancer Control from the Ministry of Health and Welfare to Y.H.

Authors’ contributions: J. Kawai and Y. Okazaki organized the phase II team and FANTOM, respectively. A. Shinagawa and H. Bono managed the sequence data production system and the computing system, respectively. J. Quackenbush, P. Carninci, M. J. Brownstein, D. A. Hume, C. Schönbach, H. Suzuki and C. Weitz acted as senior managers of the annotation project.

Author information

Corresponding author

Correspondence to Y. Hayashizaki.

Supplementary information

Supplementary Figure 1

A Distribution of the length of 21,076 insert DNAs (DOC 34 kb)

B Sequence accuracy

Supplementary Figure 2

A SAP domain containing RIKEN clones (DOC 41 kb)

B Phylogenetic tree of the known OATPs and RIKEN clones

Supplementary Figure 3

Alignment of the amino acid sequences of the known OATPs (DOC 135 kb)

Supplementary Figure 4

Mapping of RIKEN clones using Radiation Hybrid (DOC 31 kb)

Supplementary Table 1

DDBJ accession number and MGI ID for RIKEN ID (TXT 801 kb)

Supplementary Table 2

A Full-length evaluation (DOC 80 kb)

B Analysis of Alternative Splicing in Redundant Clone Set

Supplementary Table 3 (DOC 1026 kb)

Supplementary Table 4

A RIKEN clones containing a DSP domain (DOC 997 kb)

B RIKEN clones containing a consensus kinase signature motif

C InterPro Motifs in RIKEN clones

Supplementary Table 5

A Motifs identified by maximum density subgraph analysis (DOC 64 kb)

B UTR Functional Elements

Supplementary Table 6

RH Map of RIKEN Clones using RH databases (DOC 56 kb)

Supplementary Table 7

A Databases that were used for the functional annotation (DOC 82 kb)

B Software that was used during full-length sequencing and the functional annotation

Supplementary Table 8

Strategies of Gene Ontology Assignment (DOC 32 kb)

Supplementary methods

This information is also available at: http://www.gsc.riken.go.jp/e/FANTOM/supplement/ (DOC 32 kb)

About this article

Cite this article

The RIKEN Genome Exploration Research Group Phase II Team and the FANTOM Consortium. Functional annotation of a full-length mouse cDNA collection. Nature 409, 685–690 (2001). https://doi.org/10.1038/35055500

  • DOI: https://doi.org/10.1038/35055500
