Stockholm Bioinformatics Center seminars

Models for domain functional interplay and Gene Ontology function prediction

by Mr Kristoffer Forslund (Stockholm Bioinformatics Centre)

Europe/Stockholm
RB35

Description
Investigating the relationship between protein domain content and function matters both for understanding domains themselves and for annotating proteins automatically. We present two models of how protein domains combine to yield specific functions, one rule-based and one probabilistic, and demonstrate how both can be used for Gene Ontology (GO) annotation transfer. The rule-based model is an intuitive generalization of the pfam2go mapping and detects cases where a set or motif of domains strictly implies a function. The probabilistic model uses a naive Bayesian network to represent the relationship between domain content and annotation terms, and copes better with incomplete training sets. We implement both models as predictors of GO annotation terms, and on a large-scale dataset the resulting tools are shown to be more effective than conventional best-BLAST-hit annotation transfer. We also present a number of cases where combinations of Pfam-A domains can be shown to significantly predict functional terms that do not follow from any of the individual domains.
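For readers unfamiliar with the two kinds of model, the sketch below gives a rough illustration of each idea for a single GO term. It is not the speaker's implementation. The rule-based part searches for minimal domain sets whose presence always coincides with the term in the training data. The probabilistic part is a plain Bernoulli naive Bayes classifier over domain presence, which is a simpler stand-in for the naive Bayesian network named in the abstract. The domain names (`DomA`, `DomB`, `DomC`), the toy data, and the parameters `max_size`, `min_support` and `alpha` are all hypothetical.

```python
import math
from itertools import combinations

# A protein is modelled as (set of domain names, has_term flag) for ONE GO term.
Protein = tuple[frozenset[str], bool]


def strict_rules(proteins: list[Protein], max_size: int = 2,
                 min_support: int = 2) -> list[frozenset[str]]:
    """Rule-based sketch: minimal domain sets S such that every training
    protein containing S carries the term (seen at least min_support times)."""
    vocabulary = sorted(set().union(*(doms for doms, _ in proteins)))
    rules: list[frozenset[str]] = []
    for k in range(1, max_size + 1):
        for combo in combinations(vocabulary, k):
            s = frozenset(combo)
            if any(r < s for r in rules):  # a smaller set already implies the term
                continue
            hits = [has for doms, has in proteins if s <= doms]
            if len(hits) >= min_support and all(hits):
                rules.append(s)
    return rules


def train_naive_bayes(proteins: list[Protein], alpha: float = 1.0):
    """Probabilistic sketch: Bernoulli naive Bayes over domain presence,
    with Laplace smoothing (alpha) on the prior and on each conditional."""
    vocabulary = sorted(set().union(*(doms for doms, _ in proteins)))
    pos = [doms for doms, has in proteins if has]
    neg = [doms for doms, has in proteins if not has]
    prior = (len(pos) + alpha) / (len(proteins) + 2 * alpha)

    def presence(group):
        # Smoothed P(domain present | class) for every domain in the vocabulary
        return {d: (sum(d in doms for doms in group) + alpha) / (len(group) + 2 * alpha)
                for d in vocabulary}

    return prior, presence(pos), presence(neg)


def posterior(model, domains: frozenset[str]) -> float:
    """P(term | domain content), treating domains as independent given the class."""
    prior, p_pos, p_neg = model
    log_pos, log_neg = math.log(prior), math.log(1 - prior)
    for d in p_pos:
        present = d in domains
        log_pos += math.log(p_pos[d] if present else 1 - p_pos[d])
        log_neg += math.log(p_neg[d] if present else 1 - p_neg[d])
    m = max(log_pos, log_neg)  # subtract the max before exponentiating, for stability
    return math.exp(log_pos - m) / (math.exp(log_pos - m) + math.exp(log_neg - m))


# Hypothetical toy data: the term appears only when DomA and DomB co-occur.
proteins: list[Protein] = [
    (frozenset({"DomA", "DomB"}), True),
    (frozenset({"DomA", "DomB", "DomC"}), True),
    (frozenset({"DomA"}), False),
    (frozenset({"DomB"}), False),
    (frozenset({"DomC"}), False),
    (frozenset({"DomA", "DomC"}), False),
]

rules = strict_rules(proteins)
model = train_naive_bayes(proteins)
print(rules)  # one rule: the pair {DomA, DomB}
print(round(posterior(model, frozenset({"DomA", "DomB"})), 3))  # 0.669
print(round(posterior(model, frozenset({"DomA"})), 3))          # 0.252
```

In this toy set no single domain implies the term, but the pair does. This mirrors, in miniature, the kind of combination effect the abstract describes. The naive Bayes sketch assumes domains are independent given the class, so it only captures such interplay indirectly, through multiplied per-domain evidence. An explicit search over domain sets, as in the rule-based part, is one way to expose combinations directly.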