Speaker
Pierre Barrat-Charlaix
Description
Global coevolutionary models of protein families have become
increasingly popular because they can predict residue-residue
contacts from sequence information, as well as the fitness effects
of amino-acid substitutions and protein-protein interactions. The
central idea in these models is to construct a probability distribution, a
Potts model, that reproduces single and pairwise frequencies of amino
acids found in natural sequences of the protein family. This approach
treats sequences from the family as independent samples, completely
ignoring phylogenetic relations between them. This simplification is
known to potentially bias estimates of the model parameters,
which decreases their biological relevance. Current workarounds for
this problem, such as re-weighting sequences, are poorly understood
and not principled. Here, we propose an inference scheme that takes
the phylogeny of a protein family into account in order to correct biases
in estimating the frequencies of amino acids. Using artificial data, we
show that a Potts model inferred using these corrected frequencies
performs better in predicting contacts and the fitness effects of mutations.
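
For readers unfamiliar with the quantities involved, the following is a minimal sketch of the sequence re-weighting workaround mentioned above and of the single and pairwise frequencies that a Potts model is fitted to. It is not the inference scheme proposed in this talk. The function names, the integer encoding of the alignment, and the 0.8 identity threshold are assumptions made for illustration only.

```python
import numpy as np

def sequence_weights(msa, max_identity=0.8):
    """Down-weight similar sequences (illustrative heuristic, not the talk's method).

    msa: (M, L) integer array, one row per sequence, entries in 0..q-1.
    Each sequence gets weight 1 / (number of sequences, itself included,
    whose fractional identity to it is at least max_identity).
    The threshold value is an assumption.
    """
    # Fraction of identical positions for every pair of sequences, shape (M, M)
    identity = (msa[:, None, :] == msa[None, :, :]).mean(axis=2)
    return 1.0 / (identity >= max_identity).sum(axis=1)

def weighted_frequencies(msa, weights, q):
    """Weighted single-site and pairwise amino-acid frequencies.

    Returns f_i with shape (L, q) and f_ij with shape (L, q, L, q).
    """
    onehot = np.eye(q)[msa]                 # (M, L, q) one-hot encoding
    w = weights / weights.sum()             # normalise weights to sum to 1
    f_i = np.einsum("m,mia->ia", w, onehot)
    f_ij = np.einsum("m,mia,mjb->iajb", w, onehot, onehot)
    return f_i, f_ij

# Toy alignment with q = 3 symbols: the first two sequences are identical,
# so each of them receives half the weight of the other two.
msa = np.array([[0, 1, 2],
                [0, 1, 2],
                [1, 0, 2],
                [2, 2, 0]])
weights = sequence_weights(msa)
f_i, f_ij = weighted_frequencies(msa, weights, q=3)
```

Treating every sequence as an independent sample corresponds to setting all weights equal. The re-weighting heuristic only discounts near-duplicate sequences and does not use the phylogenetic tree itself.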
Primary author
Pierre Barrat-Charlaix