1 November 2010 to 10 December 2010
Nordita
Europe/Stockholm timezone

Statistics for clustering in gene expression data: from statistical significance to biological relevance

6 Dec 2010, 11:55
30m
Nordita


Speaker

Marta Luksza (Universität Köln)

Description

Clustering is widely used to infer putative functional relationships between data elements. An example is gene expression clusters that arise through common biological pathways or shared modes of regulation. In this talk, we discuss elements of a statistical theory of clustering. First, data sets often contain dependencies between the components of data vectors, for example between experimental conditions in gene expression data. How can such genuine correlations be disentangled from the spurious ones that arise due to the presence of clusters? Second, even unrelated objects can form cluster-like structures, simply due to random density fluctuations. How can we distinguish such random clusters from a signal of functional correlations? We discuss how to compute a cluster p-value, using a mapping of clustering to the statistical mechanics of disordered systems. In an application to gene expression data, we find a remarkable link between the statistical significance of a cluster and the functional relationships between its genes.
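
To make the idea of a cluster p-value concrete, the sketch below estimates one empirically by Monte Carlo. It is not the statistical-mechanics mapping discussed in the talk, only a simple stand-in under stated assumptions. The function names (tightest_group_score, cluster_p_value), the score (smallest mean distance from a point to its m nearest neighbours), and the null model (shuffling each column independently) are all hypothetical choices made for illustration. Note that column shuffling also destroys any genuine correlations between experimental conditions, which is exactly the kind of dependency the abstract says a full theory has to treat separately.

```python
import numpy as np

def tightest_group_score(x, m):
    """Smallest mean distance from any point to its m nearest neighbours.

    Lower values mean a tighter cluster-like group somewhere in the data.
    """
    # Pairwise Euclidean distances between all rows of x.
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    d.sort(axis=1)  # column 0 holds each point's zero distance to itself
    return d[:, 1:m + 1].mean(axis=1).min()

def cluster_p_value(x, m=5, n_null=200, seed=0):
    """Empirical p-value for the tightest group in x (assumed null model).

    The null data sets are built by permuting each column independently,
    which keeps the marginal distributions but removes cluster structure.
    """
    rng = np.random.default_rng(seed)
    observed = tightest_group_score(x, m)
    null = np.empty(n_null)
    for i in range(n_null):
        shuffled = np.column_stack([rng.permutation(col) for col in x.T])
        null[i] = tightest_group_score(shuffled, m)
    # Add-one correction so the estimate is never exactly zero.
    return (1 + np.sum(null <= observed)) / (1 + n_null)
```

Because the score is minimised over all points in both the observed and the shuffled data, the comparison accounts for the fact that random density fluctuations will always produce some tightest group. The smallest p-value this estimator can report is 1 / (1 + n_null), so a larger n_null is needed to resolve stronger signals.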
