Speaker
Marta Luksza
(Universität Köln)
Description
Clustering is used widely to infer putative functional
relationships between data elements. An example is gene
expression clusters arising through common biological
pathways or shared modes of regulation. In this talk, we
discuss elements of a statistical theory of clustering.
First, data sets often contain dependencies between the
components of data vectors, for example between experimental
conditions in gene expression data. How can such genuine
correlations be disentangled from the spurious ones that
arise due to the presence of clusters? Second, even unrelated
objects can form cluster-like structures, simply due to
random density fluctuations. How can we distinguish such
random clusters from a signal of functional correlations? We
discuss how to compute a cluster p-value, using a mapping of
clustering to the statistical mechanics of disordered systems.
In an application to gene expression data, we find a
remarkable link between the statistical significance of a
cluster and the functional relationships between its genes.
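
To make the notion of a cluster p-value concrete, the following is a minimal brute-force sketch. It is not the statistical-mechanics mapping discussed in the talk. It assumes a hypothetical cluster score (the largest number of neighbours within a fixed radius of any point) and a hypothetical null model (each data column permuted independently, which keeps the marginal distributions but breaks the association between objects). The function names, the radius, and the number of null samples are all illustrative choices.

```python
import numpy as np


def max_neighbor_count(data, radius):
    """Hypothetical cluster score: the largest number of other points
    lying within `radius` of any single point."""
    diff = data[:, None, :] - data[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    counts = (dist <= radius).sum(axis=1) - 1  # exclude the point itself
    return int(counts.max())


def empirical_cluster_pvalue(data, radius, n_null=200, seed=0):
    """Monte Carlo p-value: the fraction of null data sets whose score is
    at least as large as the observed one (with the usual +1 correction)."""
    rng = np.random.default_rng(seed)
    observed = max_neighbor_count(data, radius)
    exceed = 0
    for _ in range(n_null):
        # Assumed null model: permute every column independently.
        null = np.column_stack([rng.permutation(col) for col in data.T])
        if max_neighbor_count(null, radius) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_null)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic example: one tight cluster on a uniform background.
    cluster = rng.normal(5.0, 0.05, size=(40, 2))
    background = rng.uniform(0.0, 10.0, size=(60, 2))
    data = np.vstack([cluster, background])
    print(empirical_cluster_pvalue(data, radius=0.3))
```

A small p-value here only says that the densest neighbourhood is unlikely under the assumed null. Note that this null also destroys any genuine dependencies between the components of the data vectors, so it does not address the first question raised above.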