Complex Systems and Biological Physics Seminars

DCA for genome-wide epistasis analysis: the statistical genetics perspective

by Prof. Erik Aurell (KTH)

Description
Direct Coupling Analysis (DCA) is a now widely used method that leverages statistical information from many similar biological systems to draw meaningful conclusions about each system separately. Methodologically, it amounts to learning the parameters of an Ising/Potts model that could have generated the data, and then using those parameters as predictions. DCA has been applied with great success to sequences of homologous proteins and, more recently, to population-wide whole-genome sequencing data. This raises the conceptual question of why DCA works at all.
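To make the distinction between couplings and correlations concrete, here is a minimal Python sketch. It is an illustration only, not the method of the talk or of arXiv:1808.03478. It assumes the simplest DCA estimator, naive mean-field inversion, in which couplings are read off as J_ij ≈ -(C^{-1})_ij from the connected correlation matrix C. It also uses a hypothetical three-spin Ising chain, whose exact correlations are computed by brute-force enumeration, in place of real sequence data. Spins 0 and 2 interact only through spin 1. Their correlation is therefore nonzero, while the inferred direct coupling between them vanishes.

```python
import itertools
import math


def exact_correlations(n, couplings):
    """Exact <s_i s_j> for a zero-field Ising model on n spins (s = +/-1).

    `couplings` maps (i, j) pairs to J_ij; the Boltzmann weight is
    exp(sum_{(i,j)} J_ij * s_i * s_j). With zero field all means vanish,
    so these are also the connected correlations. Brute force: toy sizes only.
    """
    Z = 0.0
    corr = [[0.0] * n for _ in range(n)]
    for s in itertools.product((-1, 1), repeat=n):
        w = math.exp(sum(J * s[i] * s[j] for (i, j), J in couplings.items()))
        Z += w
        for i in range(n):
            for j in range(n):
                corr[i][j] += w * s[i] * s[j]
    return [[c / Z for c in row] for row in corr]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (pure Python)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mean_field_couplings(cov):
    """Naive mean-field DCA estimate: J_ij = -(C^{-1})_ij for i != j."""
    inv = invert(cov)
    n = len(cov)
    return [[0.0 if i == j else -inv[i][j] for j in range(n)] for i in range(n)]


# Hypothetical toy chain 0 - 1 - 2: spins 0 and 2 interact only via spin 1.
true_couplings = {(0, 1): 0.5, (1, 2): 0.8}
C = exact_correlations(3, true_couplings)
J = mean_field_couplings(C)
print(f"correlation C[0][2] = {C[0][2]:.3f}")  # indirect, clearly nonzero
print(f"inferred J[0][2]    = {J[0][2]:.3f}")  # ~0: no direct coupling
```

For this chain the mean-field estimate of J_01 is tanh(0.5)/(1 - tanh²(0.5)) rather than the true 0.5, so the inferred magnitudes are biased. What survives is the separation of direct from indirect interactions. More accurate estimators, such as pseudolikelihood maximization, are commonly used in practice.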

I will argue that, at least for genome-wide data, the answer depends on the state of the population, which in turn depends on the relative rates of mutation, fitness variation and recombination. I will also compare inferred couplings to correlations obtained in a study of about 3,000 genomes of the human pathogen Streptococcus pneumoniae.

This is joint work with Chen-Yi Gao, Fabio Cecconi, Angelo Vulpiani and Hai-Jun Zhou, available as arXiv:1808.03478.