ppiTrim: constructing non-redundant and up-to-date interactomes

Aleksandar Stojmirović; Yi-Kuo Yu

doi:10.1093/database/bar036

ppiTrim: constructing non-redundant and up-to-date interactomes

Database (Oxford). 2011 Aug 27:2011:bar036. doi: 10.1093/database/bar036. Print 2011.

Authors

Aleksandar Stojmirović¹, Yi-Kuo Yu

Affiliation

¹ National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

Abstract

Robust advances in interactome analysis demand comprehensive, non-redundant and consistently annotated data sets. By non-redundant, we mean that the accounting of evidence for every interaction should be faithful: each independent experimental support is counted exactly once, no more, no less. While many interactions are shared among public repositories, none of them contains the complete known interactome for any model organism. In addition, the annotations of the same experimental result by different repositories often disagree. This brings up the issue of which annotation to keep while consolidating evidences that are the same. The iRefIndex database, including interactions from most popular repositories with a standardized protein nomenclature, represents a significant advance in all aspects, especially in comprehensiveness. However, iRefIndex aims to maintain all information/annotation from original sources and requires users to perform additional processing to fully achieve the aforementioned goals. Another issue has to do with protein complexes. Some databases represent experimentally observed complexes as interactions with more than two participants, while others expand them into binary interactions using spoke or matrix model. To avoid untested interaction information buildup, it is preferable to replace the expanded protein complexes, either from spoke or matrix models, with a flat list of complex members. To address these issues and to achieve our goals, we have developed ppiTrim, a script that processes iRefIndex to produce non-redundant, consistently annotated data sets of physical interactions. Our script proceeds in three stages: mapping all interactants to gene identifiers and removing all undesired raw interactions, deflating potentially expanded complexes, and reconciling for each interaction the annotation labels among different source databases. As an illustration, we have processed the three largest organismal data sets: yeast, human and fruitfly. While ppiTrim can resolve most apparent conflicts between different labelings, we also discovered some unresolvable disagreements mostly resulting from different annotation policies among repositories. Database URL: http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads/ppiTrim.html.

Publication types

Research Support, N.I.H., Extramural
Research Support, N.I.H., Intramural

MeSH terms

Animals
Computational Biology / methods*
Database Management Systems*
Databases, Genetic*
Drosophila
Humans
Internet
Molecular Sequence Annotation*
Protein Interaction Mapping / methods*
Proteins / chemistry

Substances

Proteins

Grants and funding

Intramural NIH HHS/United States