VariBench: a benchmark database for variations

Preethy Sasidharan Nair; Mauno Vihinen

doi:10.1002/humu.22204

Hum Mutat. 2013 Jan;34(1):42-9. doi: 10.1002/humu.22204. Epub 2012 Oct 11.

Authors

Preethy Sasidharan Nair¹, Mauno Vihinen

Affiliation

¹ Institute of Biomedical Technology, University of Tampere, Tampere, Finland.

PMID: 22903802

Abstract

Several computational methods have been developed for predicting the effects of variations, for which data are expanding rapidly. Comparing the performance of these tools has been very difficult because the methods have been trained and tested on different datasets. Until now, unbiased and representative benchmark datasets have been missing. We have developed a benchmark database suite, VariBench, to overcome this problem. VariBench contains datasets of experimentally verified, high-quality variation data carefully chosen from the literature and relevant databases. It maps variation positions to different levels (protein, RNA and DNA sequences, and protein three-dimensional structure) and provides identifier mappings to relevant databases. VariBench contains the first benchmark datasets for variation effect analysis, an important field in which many developments are currently under way. VariBench datasets can be used, for example, to test the performance of prediction tools and to train novel machine learning-based tools. New datasets will be included, and the community is encouraged to submit high-quality datasets to the service. VariBench is freely available at http://structure.bmc.lu.se/VariBench.
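Benchmark datasets such as those in VariBench let different prediction tools be scored on the same experimentally verified variants. The following is a minimal Python sketch of one common way to do such a comparison. It counts true/false positives and negatives against the benchmark labels, then derives standard measures such as sensitivity, specificity, accuracy and the Matthews correlation coefficient (MCC). The `BenchmarkVariant` record, the two-class pathogenic/neutral labelling, the variant IDs and the prediction dictionaries are hypothetical illustrations. They do not represent VariBench's file format or any real tool's output.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkVariant:
    # Hypothetical benchmark record: an identifier plus the verified class.
    variant_id: str
    is_pathogenic: bool


def confusion_counts(benchmark, predictions):
    """Return (TP, TN, FP, FN); predictions maps variant_id -> predicted pathogenic."""
    tp = tn = fp = fn = 0
    for v in benchmark:
        pred = predictions[v.variant_id]
        if v.is_pathogenic and pred:
            tp += 1
        elif v.is_pathogenic:
            fn += 1  # pathogenic variant missed by the tool
        elif pred:
            fp += 1  # neutral variant called pathogenic
        else:
            tn += 1
    return tp, tn, fp, fn


def performance(tp, tn, fp, fn):
    """Standard binary classification measures; NaN when a denominator is zero."""
    def ratio(num, den):
        return num / den if den else float("nan")

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Usage with made-up data: two hypothetical tools scored on the same benchmark.
benchmark = [
    BenchmarkVariant("v1", True),
    BenchmarkVariant("v2", True),
    BenchmarkVariant("v3", False),
    BenchmarkVariant("v4", False),
]
tool_a = {"v1": True, "v2": False, "v3": False, "v4": False}
tool_b = {"v1": True, "v2": True, "v3": True, "v4": False}

for name, preds in [("tool_a", tool_a), ("tool_b", tool_b)]:
    print(name, performance(*confusion_counts(benchmark, preds)))
```

Both tools are scored on the identical variant set, so differences in the measures reflect the tools themselves rather than differences in their test data. This is the comparison problem the abstract describes.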

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Benchmarking
Computational Biology / methods
DNA / chemistry
DNA / genetics*
Databases, Genetic*
Genetic Predisposition to Disease / genetics
Genetic Variation*
Humans
Internet
Mutation
Polymorphism, Single Nucleotide
Protein Conformation
Proteins / chemistry
Proteins / genetics*
RNA / chemistry
RNA / genetics*
Reproducibility of Results
Sequence Analysis

Substances

Proteins
RNA
DNA