7–9 Apr 2011
Europe/Stockholm timezone

Networks of motifs from sequences of symbols

9 Apr 2011, 10:40
30m
FD5


Speaker

Roberta Sinatra (University of Catania)

Description

There are many examples in biology, in linguistics and in the theory of dynamical systems where information resides in, and has to be extracted from, corpora of raw data consisting of sequences of symbols. For instance, a written text in English or in another language is a collection of sentences, each sentence being a sequence of letters from a given alphabet. Not all sequences of letters are possible, since sentences are built from a lexicon containing a certain number of words. In addition, different words are used together in a structured and conventional way. Similarly, in biology, DNA nucleotide or amino acid sequence data can be seen as corpora of strings. Many results have shown that proteins are far from being a random assembly of peptides and that DNA sequences show non-trivial statistical properties. All this gives meaning to the metaphor of DNA and protein sequences as texts written in a still unknown language. Sequences of symbols can also be found in time series generated by dynamical systems. In fact, a trajectory in phase space can be transformed into a sequence of symbols by the so-called “symbolic dynamics” approach. In all the examples mentioned above, the main challenge is to decipher the message contained in the corpora of data sequences and to infer the underlying rules that govern their production. We propose a general method to construct networks out of any symbolic sequential data. The method consists of two steps: first it extracts motifs in a “natural” way, i.e. those recurrent short strings which play the same role that words do in language; then it represents correlations of motifs within sequences as a network. Important information from the original data is embedded in such a network and can be easily retrieved, as we will show through diverse applications to social dialogs, biological examples and dynamical systems.
With respect to previous linguistic methods, our approach does not need a priori knowledge of a given dictionary. This makes the method very general and opens up a wide range of applications, from the study of written text to the analysis of different trajectories in dynamical systems.
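
To make the two-step idea concrete, here is a minimal Python sketch. It is not the method presented in the talk: the abstract does not specify how motifs are selected or how correlations are measured, so both criteria below are assumptions chosen for illustration. Motifs are taken to be strings of fixed length `k` that occur more often than expected if symbols were drawn independently, and two motifs are linked when they appear in the same sequence. The names `extract_motifs`, `motif_network`, `min_ratio` and `min_count` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import prod


def kmers(sequence, k):
    """Yield every contiguous substring of length k."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def extract_motifs(sequences, k=3, min_ratio=2.0, min_count=2):
    """Step 1 (assumed criterion): keep k-mers that are over-represented
    relative to a null model in which symbols occur independently."""
    symbol_counts = Counter(ch for s in sequences for ch in s)
    total_symbols = sum(symbol_counts.values())
    freq = {ch: n / total_symbols for ch, n in symbol_counts.items()}

    kmer_counts = Counter(m for s in sequences for m in kmers(s, k))
    total_kmers = sum(kmer_counts.values())

    motifs = set()
    for kmer, observed in kmer_counts.items():
        # Expected count if each position were an independent draw.
        expected = total_kmers * prod(freq[ch] for ch in kmer)
        if observed >= min_count and observed / expected >= min_ratio:
            motifs.add(kmer)
    return motifs


def motif_network(sequences, motifs, k=3):
    """Step 2 (assumed criterion): weighted undirected edges, where the
    weight is the number of sequences containing both motifs."""
    edges = Counter()
    for s in sequences:
        present = sorted(set(kmers(s, k)) & motifs)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


if __name__ == "__main__":
    corpus = ["abcabcabc", "xyzxyzxyz", "abcxyzabc"]  # toy data
    found = extract_motifs(corpus, k=3)
    network = motif_network(corpus, found, k=3)
    for (a, b), weight in sorted(network.items()):
        print(a, b, weight)
```

No dictionary is supplied in advance: the candidate motifs come only from the statistics of the corpus itself. A time series could be fed to the same functions after mapping each value to a symbol, for example by thresholding, which is again an assumption rather than something stated in the abstract.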
