Statistical Mechanics of Learning and Inference

Erik Aurell (KTH), John Hertz, Mikko Alava, Yasser Roudi

Background and other information

For several years, ideas from statistical mechanics have been used to develop inference techniques for analyzing high-dimensional data. Furthermore, in recent years technological advances in multi-electrode and multi-array recordings have increased the number of elements that can be observed simultaneously in many biological systems. This event aims to bring together scientists interested in applying statistical mechanics to build inference techniques, and in using such techniques to make sense of multi-electrode/multi-array data. It also aims to gather participants from computer and information science working on related ideas.

The venue is Hotel Arkipelag in downtown Mariehamn, the capital of the province of Åland, Finland. The Åland archipelago, lying between Sweden and mainland Finland, is easily reachable by ferry from Stockholm (Sweden), from Turku (Finland), and from Helsinki (Finland). In addition, there are flights from Sweden and Finland.

The workshop begins on a ferry from Stockholm to Mariehamn on Wednesday afternoon, May 26, 2010, arriving late that evening, and then continues in Mariehamn for 2.5 days (Thursday to Saturday noon, May 27-29). Participation costs, to be covered by Workshop participants, comprise travel to and from Stockholm, accommodation in Mariehamn, and a flat Workshop fee (100 Euros).

The Workshop fee covers travel from Stockholm to Mariehamn (ferry ticket), dinner on the ferry, and coffee and lunches at the Workshop. The Workshop fee can be waived for participants on tight budgets (PhD students). The organizers arrange suitable accommodation at the conference venue; please see the Registration page for information. If you prefer to arrange your own accommodation in Mariehamn, please indicate so upon registration. If you choose to travel to Mariehamn other than by the ferry from Stockholm (see, e.g., the Åland official tourism site), please indicate so upon registration and make your own travel arrangements.

For invited speakers the workshop organizers cover travel, accommodation and the Workshop fee. The meeting is generously supported by NORDITA, the KTH Linnaeus Centre ACCESS, the National Graduate School in Materials Physics (Finland) NGSMP, and the National Graduate School in Computational Sciences (Finland) FICS.

The deadline for registration is April 1. We have recently reached the maximum number of participants, so if you have not yet registered, please contact the organizers and explain your situation. For capacity reasons, participation is limited to 50.

We are aware of the recent disruption to air travel caused by the volcanic eruption in Iceland; the meeting will nevertheless go ahead. Please note that the program starts on the ferry from Stockholm, so whether you will be on board is important information for the organizers.

Invited speakers are:

Venkat Chandrasekaran Boston
Marek Cieplak Warsaw
Simona Cocco Paris
Jaakko Hollmen Helsinki
Bert Kappen Nijmegen
Samuel Kaski Helsinki
Enzo Marinari Rome
Amos Maritan Padova
Matteo Marsili Trieste
Rémi Monasson Paris
Pradeep Ravikumar Austin
Angelo Vulpiani Rome
Martin Weigt Turin
Riccardo Zecchina Turin

Practical information about the conference start

The conference begins on the Viking Line ferry leaving Stockholm harbour on May 26, 2010, at 16.45 Swedish time. We will travel on the Viking Line ferry from Stockholm to Helsinki, which stops over in Mariehamn, where we get off.

The trip to Mariehamn from Stockholm takes almost exactly five hours. We will have a lecture room and dinner on the ship.

Please be at the ferry terminal by 16.00 at the latest. Someone from the organizing committee (Erik Aurell, Mikko Alava or Yasser Roudi) will be at the ferry terminal with the tickets by 15.30.

Practical information about the return trip

The return trip from Mariehamn is the responsibility of the individual participants.

However, to simplify things, everyone except those who have explicitly indicated otherwise will be booked on the Viking Line sailing from Mariehamn on Saturday, May 29, at 14.25 (Finnish time), arriving in Stockholm at 18.55 (Swedish time). If you are continuing by air from Stockholm that same evening, allow at least one hour from the ferry terminal to the airport to be on the safe side. Note that the ferry tickets are included in the conference fee.

    • 1
      From gene expressions to genetic networks
      A method based on the principle of entropy maximization is used to identify the gene interaction network with the highest probability of giving rise to experimentally observed transcript profiles [1]. In its simplest form, the method yields the pairwise gene interaction network, but it can also be extended to deduce higher-order correlations. Analysis of microarray data from genes in Saccharomyces cerevisiae chemostat cultures exhibiting energy metabolic oscillations identifies a gene interaction network that reflects the intracellular communication pathways. These pathways adjust cellular metabolic activity and cell division to the limiting nutrient conditions that trigger metabolic oscillations. The success of the present approach in extracting meaningful genetic connections suggests that the maximum entropy principle is a useful concept for understanding living systems, as it is for other complex, nonequilibrium systems. The time-dependent behavior of the genetic network is found to involve only a few fundamental modes [2,3].
      References:
      [1] T. R. Lezon, J. R. Banavar, M. Cieplak, A. Maritan, and N. Fedoroff, Using the principle of entropy maximization to infer genetic interaction networks from gene expression patterns, Proc. Natl. Acad. Sci. USA 103, 19033-19038 (2006).
      [2] N. S. Holter, M. Mitra, A. Maritan, M. Cieplak, J. R. Banavar, and N. V. Fedoroff, Fundamental patterns underlying gene expression profiles: simplicity from complexity, Proc. Natl. Acad. Sci. USA 97, 8409-8414 (2000).
      [3] N. S. Holter, A. Maritan, M. Cieplak, N. V. Fedoroff, and J. R. Banavar, Dynamic modeling of gene expression data, Proc. Natl. Acad. Sci. USA 98, 1693-1698 (2001).
      Speaker: Marek Cieplak (Institute of Physics, Polish Academy of Sciences)
    • 18:00
      coffee break
    • 2
      Information, interaction and inference in finance
      Speaker: Matteo Marsili (ICTP)
    • 3
      High-dimensional Ising model selection using ℓ1-regularized regression
      Speaker: Pradeep Ravikumar (University of Texas, Austin)
    • 10:00
      coffee break
    • 4
      TBA
      Speaker: Venkat Chandrasekaran (MIT)
    • 5
      TBA
      Speaker: Samuel Kaski (Aalto University)
    • 12:00
      Lunch
    • 6
      Optimal control as a graphical model inference problem
      Computing a course of action in the presence of uncertainty is the subject of stochastic optimal control theory. Such computations require the solution of complex partial differential equations and become intractable for most problems. I will introduce a class of control problems whose cost can be expressed as a KL divergence and which can be mapped onto a graphical model inference problem. In this talk, we show how to apply this theory in the context of a delayed choice task and for collaborating agents. We first introduce the KL control framework. Then we show that in a delayed reward task, when the future is uncertain, it is optimal to delay the timing of one's decision. We show preliminary results on human subjects that confirm this prediction. Subsequently, we discuss two-player games, such as the stag-hunt game, where collaboration can improve or worsen as a result of recursive reasoning about the opponent's actions. The Nash equilibria appear as local minima of the optimal cost-to-go, but may disappear when the monetary gain decreases. This behaviour is in agreement with experimental findings in humans. We subsequently extend the setting to delayed rewards and show how cooperation develops as a result of recursive reasoning. Suboptimal cooperation arises as local minima of the objective function.
      Speaker: Bert Kappen (Radboud University Nijmegen)
    • 7
      Modeling Viral Evolution in competition with the Immune System
      Influenza viruses evolve at high speed to escape acquired immunity and infect the same host several times. Contrary to naive expectation, this does not lead to a large diversity in the viral population. Phylogenetic studies show that the viral population displays the characteristics of an "evolving quasispecies" with reduced instantaneous diversity. In this talk I will discuss a simple stochastic model of an evolving viral population that rationalizes the evolving-quasispecies behavior as an emergent feature of the competition between strains with different levels of infectivity.
      Speaker: Silvio Franz (Université Paris-Sud)
    • 16:00
      coffee break
    • 8
      The fluctuation-dissipation relations as an inference tool
      First, we discuss how the fluctuation-dissipation relations (FDRs) hold in a generalized form for any system with a stationary probability distribution. In essence, the FDRs establish a bridge between equilibrium and non-equilibrium properties. We show how FDRs are useful tools for understanding the statistical behaviour of complex systems such as driven granular gases and protein models. In the first case, although the velocity response function of a particle is not proportional to its velocity self-correlation, a generalized form of the fluctuation-dissipation relation holds; this is due to the presence of strong correlations between velocities and spatial density. This happens at high densities and strong inelasticities, but still in the fluid-like (and ergodic) regime. In addition, we discuss how, following a Jarzynski-like approach, i.e. performing a series of "pulling experiments", one can infer the basic features of the underlying dynamics. (Joint work with F. Cecconi, A. Puglisi and D. Villamaina)
      Speaker: Angelo Vulpiani (Università di Roma "La Sapienza")
    • 9
      TBA
      Speaker: Shaomeng Qin (Aalto University)
    • 10
      Intrinsic Limitations of Inverse Inference in Spin Glasses
      We analyze the limits inherent to the inverse reconstruction of a pairwise Ising spin glass based on susceptibility propagation. We establish the conditions under which the susceptibility propagation algorithm is able to reconstruct the characteristics of the network given first- and second-order local observables, evaluate the errors caused by various types of noise in the observed data, and discuss the scaling of the problem with the number of degrees of freedom.
      Speaker: Enzo Marinari (Università di Roma "La Sapienza")
    • 10:00
      coffee break
    • 11
      Statistical physics of optimization under uncertainty
      Optimization under uncertainty deals with the problem of optimizing stochastic cost functions given some partial information on their inputs. These problems are extremely difficult to solve and yet pervade all areas of the technological and natural sciences. We propose a general approach to solve such large-scale stochastic optimization problems and a Survey Propagation-based algorithm that implements it. As an illustration, we apply our method to the stochastic bipartite matching problem, in the two-stage and multi-stage cases. The efficiency of our approach, which does not rely on sampling techniques, allows us to validate the analytical predictions with large-scale numerical simulations. (Joint work with Fabrizio Altarelli, Alfredo Braunstein and Abolfazl Ramezanpour)
      Speaker: Riccardo Zecchina (Politecnico di Torino)
    • 12
      Dynamical TAP equations and the inverse Ising problem
      Recent advances in recording technology allow simultaneous measurement of the activity of many elements in a biological system, e.g. neurons or genes. This has motivated the study of how such recordings can be used to learn about the connectivity between these elements. A useful and powerful platform for studying this problem is the inverse Ising problem: finding the couplings of an Ising model given its means and pairwise correlations, or given samples from the distribution. In this talk, after briefly describing exact and approximate methods for finding the couplings of an equilibrium Ising model, I will describe how a non-equilibrium model can be used to improve the inference of the connections.
      Speaker: Yasser Roudi (Nordita)
    • 12:00
      Lunch
    • 13
      Cluster expansion for the Inverse Ising Problem: application to synthetic and real data.
      I will introduce a procedure to infer the fields and the couplings of a spatially-distributed Ising model, given the magnetizations and pairwise correlations of spins. The algorithm is based on the recursive decomposition of the entropy into contributions coming from clusters of spins. I will explain and validate the procedure on synthetic data sets, and then apply it to experimental data coming from multi-electrode recordings of neural activity. (Work done in collaboration with R. Monasson and S. Leibler)
      Speaker: Simona Cocco (École Normale Supérieure)
    • 14
      Reading out the activity of large neural ensembles: The Ising Decoder
      New technologies such as high-density multi-electrode array recording and multiphoton calcium imaging allow the activity of large numbers of neurons to be monitored. However, analysis tools have lagged behind the experimental technology, with most approaches limited to very small population sizes. In the limit of short time windows, where neuronal activity can be binarized without loss of information, the Ising model provides a useful approach to capturing the information content of large neural ensembles. I will show how maximum entropy models, including the Ising model, fit within the information component analysis framework for studying neural coding, and how the Ising model can be used to decode large neural ensembles. I will highlight some recent advances we have made in scaling up our decoders, and demonstrate the algorithms on in vivo multi-electrode array and two-photon calcium imaging data.
      Speaker: Simon Schultz (Imperial College)
    • 16:00
      coffee break
    • 15
      Exploring Nash Equilibria in Network Games
      Game-theoretic problems defined on graphs may admit many Nash equilibria with very different properties. An example is provided by games of strategic substitutes on networks. Searching for (socially) optimal Nash equilibria in these games is a non-trivial task. I will discuss some algorithmic techniques based on Monte Carlo and Belief Propagation, as well as learning methods by means of which players endogenously organize toward one or more Nash equilibria.
      Speaker: Luca Dall'Asta (ICTP)
    • 16
      Poster Session
    • 17
      Inference of protein-protein interactions from multi-species sequence data using statistical-physics inspired approaches
      Experimental approaches to transient protein interactions are laborious and serendipitous, and our understanding of fundamental questions like the identification of interaction surfaces or the specificity of molecular recognition between interacting proteins is far from complete. We propose a computational approach based on recent techniques from the statistical physics of disordered systems, which exploits the natural sequence variability of homologous proteins across hundreds of species. Using bacterial two-component signal transduction (TCS) as a test case, we show that our method is able (i) to identify inter-protein residue contacts and to facilitate the prediction of protein complex structures, and (ii) to reconstruct a molecular recognition code which elucidates specificity in signal transduction in bacteria.
      Speaker: Martin Weigt (Institute for Scientific Interchange, Torino)
    • 10:00
      break
    • 18
      TBA
      Speaker: Aree Witoelar (Computer Science, University of Groningen)
    • 19
      Mixture modeling of DNA copy number aberrations
      DNA copy number aberrations, i.e. copy number amplifications and copy number deletions, are hallmarks of nearly all advanced tumors. We present a collection of genome-wide DNA copy number amplification data covering over 4500 cases of human neoplasms. The data set has been gathered from scientific journal articles covering a period of ten years and is naturally represented as 0-1 data. We motivate the use of mixture models in probabilistic clustering of amplification data and present a mixture model of multivariate Bernoulli distributions that yields patterns relevant to all cancer types. The appropriate complexity of the mixture model for each chromosome is chosen by a model selection procedure. A methodology to create a naming scheme for the identified patterns is also presented. Results are interpreted, and the diagnostic value of the findings is further investigated in the light of background risk factors.
      Speaker: Jaakko Hollmen (Aalto University)
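
Marek Cieplak's abstract rests on entropy maximization. For continuous expression levels constrained only by their means and pairwise covariances, the maximum-entropy distribution is a multivariate Gaussian whose interaction matrix is the inverse of the covariance matrix. The Python sketch below illustrates only that standard fact; it is not the authors' code, and the function name, the use of a pseudo-inverse for singular covariances, and the synthetic 3-gene chain are assumptions made for illustration.

```python
import numpy as np


def maxent_gaussian_couplings(X):
    """Pairwise maximum-entropy couplings for continuous expression data.

    X has shape (n_samples, n_genes). Fixing only the means and the
    covariance C, the maximum-entropy distribution is the Gaussian
    P(x) ∝ exp(-0.5 (x - mu)^T M (x - mu)) with M = C^{-1}.
    The off-diagonal entries of -M are returned as pairwise couplings.
    """
    X = np.asarray(X, dtype=float)
    C = np.cov(X, rowvar=False)   # genes x genes empirical covariance
    # Assumption: pseudo-inverse, so the sketch still runs when C is
    # singular (e.g. more genes than samples).
    M = np.linalg.pinv(C)
    J = -M
    np.fill_diagonal(J, 0.0)      # self-terms are not interactions
    return J


# Hypothetical 3-gene chain 0 - 1 - 2 with no direct 0-2 interaction.
rng = np.random.default_rng(0)
M_true = np.array([[2.0, -0.8, 0.0],
                   [-0.8, 2.0, -0.8],
                   [0.0, -0.8, 2.0]])
X = rng.multivariate_normal(np.zeros(3), np.linalg.inv(M_true), size=20000)
J = maxent_gaussian_couplings(X)
print(np.round(J, 2))
```

In the chain example the 0-2 entry of J should come out near zero even though genes 0 and 2 are correlated through gene 1; separating direct from indirect correlations is the reason for working with the inverse covariance rather than the covariance itself.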
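
Pradeep Ravikumar's talk title refers to ℓ1-regularized regression for Ising model selection. A common form of this idea, sketched here under assumptions, is neighborhood selection: regress each ±1 spin on all the others with an ℓ1-penalized logistic loss and read the graph off the nonzero weights. The proximal-gradient (ISTA) solver, the AND rule for combining the two regressions of each pair, the penalty lam = 0.1, zero external fields, and the 5-spin chain used as test data are illustrative choices, not details taken from the talk.

```python
import itertools
import numpy as np


def sample_ising_exact(J, h, n_samples, rng):
    """Exact samples from a small ±1 Ising model, P(s) ∝ exp(s.h + 0.5 s.J.s)."""
    N = len(h)
    S = np.array(list(itertools.product([-1, 1], repeat=N)), dtype=float)
    logw = 0.5 * np.einsum("si,ij,sj->s", S, J, S) + S @ h
    p = np.exp(logw - logw.max())
    return S[rng.choice(len(S), size=n_samples, p=p / p.sum())]


def sigmoid(z):
    # Numerically stable logistic function 1 / (1 + exp(-z)).
    return np.exp(-np.logaddexp(0.0, -z))


def l1_logistic(X, y, lam, n_iter=1000):
    """Minimize mean log(1 + exp(-y * X w)) + lam * ||w||_1 by ISTA.

    y is in {-1, +1}; no intercept, which assumes zero external fields.
    """
    n, d = X.shape
    w = np.zeros(d)
    # Step 1/L with L = ||X||_2^2 / (4n), a Lipschitz constant of the loss gradient.
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = -(X.T @ (y * sigmoid(-y * (X @ w)))) / n
        z = w - step * grad
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return w


def select_neighbors(samples, lam, tol=1e-3):
    """Regress each spin on all others; keep an edge only if both ends pick it (AND rule)."""
    n, N = samples.shape
    W = np.zeros((N, N))
    for r in range(N):
        others = [t for t in range(N) if t != r]
        # For ±1 spins, P(s_r | rest) = sigmoid(2 s_r sum_t J_rt s_t), so weights estimate 2 J_rt.
        W[r, others] = l1_logistic(samples[:, others], samples[:, r], lam)
    A = np.abs(W) > tol
    return A & A.T, W / 2.0   # W / 2: shrunken coupling estimates


rng = np.random.default_rng(1)
N = 5
J_true = np.zeros((N, N))
for i in range(N - 1):                       # hypothetical chain 0-1-2-3-4
    J_true[i, i + 1] = J_true[i + 1, i] = 0.6
samples = sample_ising_exact(J_true, np.zeros(N), 10000, rng)
edges, J_hat = select_neighbors(samples, lam=0.1)
print(edges.astype(int))
```

With enough samples the chain edges should be recovered, and the non-neighbor weights are set to exactly zero by the soft-threshold step rather than merely being small.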
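
Bert Kappen's abstract describes control problems whose cost includes a KL divergence and which map onto graphical-model inference. A minimal discrete-time sketch of that idea follows, assuming a finite state space, a fixed horizon T, a state cost R and passive dynamics P; all of these are hypothetical choices, unrelated to the delayed-choice and stag-hunt models in the talk. The optimal cost-to-go is J = -log Z, where Z is computed by a backward sum over the passive dynamics, i.e. an inference-style message pass rather than a minimization.

```python
import numpy as np


def kl_control(P, R, T):
    """Discrete-time KL control over horizon T.

    P[x, y]: passive (uncontrolled) transition probabilities, rows sum to 1.
    R[y]:    state cost paid on entering y at each step t = 1..T.
    Minimizing E_q[sum_t R(x_t)] + KL(q || p) over controlled dynamics q gives
    J_t(x) = -log Z_t(x), with Z_t(x) = sum_y P[x, y] exp(-R[y]) Z_{t+1}(y), Z_T = 1,
    and optimal transitions Q_t[x, y] = P[x, y] exp(-R[y]) Z_{t+1}(y) / Z_t(x).
    """
    K = len(R)
    Z = np.ones((T + 1, K))
    for t in range(T - 1, -1, -1):
        Z[t] = P @ (np.exp(-R) * Z[t + 1])
    Q = np.empty((T, K, K))
    for t in range(T):
        Q[t] = P * (np.exp(-R) * Z[t + 1])[None, :] / Z[t][:, None]
    return -np.log(Z), Q


def expected_cost(P, R, Q):
    """Exact E_q[sum_t R(x_t)] + KL(q || p) for each start state, by backward recursion."""
    V = np.zeros(len(R))
    for t in range(Q.shape[0] - 1, -1, -1):
        V = np.sum(Q[t] * (R[None, :] + np.log(Q[t] / P) + V[None, :]), axis=1)
    return V


# Hypothetical example: 11 positions, diffusive passive dynamics, target near x = 8.
K, T = 11, 6
x = np.arange(K)
P = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
P /= P.sum(axis=1, keepdims=True)
R = 0.1 * (x - 8.0) ** 2
J, Q = kl_control(P, R, T)
print(J[0, 0], expected_cost(P, R, Q)[0])
```

The function expected_cost evaluates the control objective for any controlled dynamics; for the Q returned by kl_control it should reproduce J[0], and mixing Q with the passive dynamics should only increase the cost.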
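
Angelo Vulpiani's abstract relies on the generalized fluctuation-dissipation relation. For a system with stationary density P_s, the linear response of x_i at time t to a small perturbation of x_j at time 0 is R_ij(t) = -<x_i(t) ∂_j ln P_s(x(0))>; when P_s is Gaussian with covariance C(0), this reduces to R(t) = C(t) C(0)^{-1}. The sketch checks this on a linear stochastic process x_{t+1} = A x_t + noise, an assumed toy model for which the response to a perturbation at time 0 is exactly A^t. The matrix A, the noise level and the trajectory length are arbitrary.

```python
import numpy as np


def simulate_linear(A, noise_std, n_steps, rng):
    """x_{t+1} = A x_t + noise: a linear stochastic process with a Gaussian stationary state."""
    d = A.shape[0]
    x = np.zeros((n_steps, d))
    for t in range(1, n_steps):
        x[t] = A @ x[t - 1] + noise_std * rng.standard_normal(d)
    return x


def response_from_correlations(x, t):
    """Generalized FDR for a Gaussian stationary density: R(t) = C(t) C(0)^{-1},
    with C_ij(t) = <x_i(t) x_j(0)> estimated from one stationary trajectory."""
    x = x - x.mean(axis=0)
    n = len(x) - t
    C0 = x[:n].T @ x[:n] / n
    Ct = x[t:t + n].T @ x[:n] / n
    return Ct @ np.linalg.inv(C0)


rng = np.random.default_rng(2)
A = np.array([[0.9, 0.2],
              [-0.1, 0.7]])                       # arbitrary stable, non-symmetric matrix
x = simulate_linear(A, 1.0, 50000, rng)[1000:]   # drop the initial transient
R3 = response_from_correlations(x, 3)
print(np.round(R3, 2))
print(np.round(np.linalg.matrix_power(A, 3), 2))
```

Only correlations of the unperturbed trajectory are used, so the response is inferred without ever perturbing the system, which is the sense in which the FDR serves as an inference tool.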
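
Several talks (Marinari, Roudi, Cocco) concern the inverse Ising problem. As a baseline for the equilibrium case, the naive mean-field (nMF) and TAP approximations invert the connected correlation matrix C directly: J_ij ≈ -(C^{-1})_ij in nMF, and J_ij ≈ -2 (C^{-1})_ij / (1 + sqrt(1 - 8 (C^{-1})_ij m_i m_j)) in TAP. The sketch computes exact magnetizations m and correlations C for a small random model by enumerating all states and then inverts them. It illustrates these standard approximations only, not susceptibility propagation or the cluster expansion described in the abstracts; the model size and coupling strengths are hypothetical.

```python
import itertools
import numpy as np


def exact_moments(J, h):
    """Magnetizations m and connected correlations C of P(s) ∝ exp(s.h + 0.5 s.J.s), by enumeration."""
    N = len(h)
    S = np.array(list(itertools.product([-1, 1], repeat=N)), dtype=float)
    logw = 0.5 * np.einsum("si,ij,sj->s", S, J, S) + S @ h
    p = np.exp(logw - logw.max())
    p /= p.sum()
    m = p @ S
    C = (S * p[:, None]).T @ S - np.outer(m, m)
    return m, C


def inverse_ising_nmf(m, C):
    """Naive mean-field inversion: J_ij = -(C^{-1})_ij for i != j."""
    J = -np.linalg.inv(C)
    np.fill_diagonal(J, 0.0)
    h = np.arctanh(m) - J @ m
    return J, h


def inverse_ising_tap(m, C):
    """TAP inversion: J_ij = -2 (C^{-1})_ij / (1 + sqrt(1 - 8 (C^{-1})_ij m_i m_j))."""
    Cinv = np.linalg.inv(C)
    arg = 1.0 - 8.0 * Cinv * np.outer(m, m)
    np.fill_diagonal(arg, 1.0)   # diagonal is discarded; avoid sqrt of a negative there
    # An off-diagonal arg < 0 would mean TAP has no real solution (strong coupling).
    J = -2.0 * Cinv / (1.0 + np.sqrt(arg))
    np.fill_diagonal(J, 0.0)
    return J


rng = np.random.default_rng(3)
N = 6
J_true = np.triu(rng.normal(0.0, 0.1, size=(N, N)), 1)   # weak random couplings
J_true = J_true + J_true.T
h_true = rng.uniform(-0.3, 0.3, size=N)
m, C = exact_moments(J_true, h_true)
J_nmf, h_nmf = inverse_ising_nmf(m, C)
J_tap = inverse_ising_tap(m, C)
print(np.abs(J_nmf - J_true).max(), np.abs(J_tap - J_true).max())
```

Both approximations are expansions in weak couplings; with stronger couplings, sparse data or noisy correlations their errors grow, which is the regime the more elaborate methods in the talks address.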
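
Yasser Roudi's abstract proposes using a non-equilibrium model to improve the inference. One concrete version, sketched here at the naive mean-field level only, assumes kinetic Ising dynamics with parallel updates, P(s_i(t+1) = +1) = (1 + tanh h_i(t)) / 2 with h_i(t) = H_i + sum_j J_ij s_j(t). At that level the one-step delayed correlations D, the equal-time correlations C and A_ij = δ_ij (1 - m_i^2) are related by D = A J C, so J ≈ A^{-1} D C^{-1}. The TAP correction named in the talk title is not implemented, and the couplings, network size and run length are hypothetical. Because the kinetic model does not require symmetric couplings, the sketch uses an asymmetric J.

```python
import numpy as np


def simulate_kinetic_ising(J, H, n_steps, rng):
    """Parallel Glauber dynamics: P(s_i(t+1) = +1) = (1 + tanh(h_i(t))) / 2,
    with h_i(t) = H_i + sum_j J_ij s_j(t)."""
    N = len(H)
    S = np.empty((n_steps, N))
    S[0] = rng.choice([-1.0, 1.0], size=N)
    for t in range(1, n_steps):
        h = H + J @ S[t - 1]
        S[t] = np.where(rng.random(N) < 0.5 * (1.0 + np.tanh(h)), 1.0, -1.0)
    return S


def kinetic_nmf(S):
    """Naive mean-field inversion for the kinetic Ising model: J = A^{-1} D C^{-1},
    A_ij = delta_ij (1 - m_i^2), D_ij = <ds_i(t+1) ds_j(t)>, C_ij = <ds_i(t) ds_j(t)>."""
    m = S.mean(axis=0)
    dS = S - m
    n = len(S) - 1
    C = dS[:-1].T @ dS[:-1] / n
    D = dS[1:].T @ dS[:-1] / n
    return np.diag(1.0 / (1.0 - m ** 2)) @ D @ np.linalg.inv(C)


rng = np.random.default_rng(4)
N = 10
J_true = rng.normal(0.0, 0.4 / np.sqrt(N), size=(N, N))   # asymmetric couplings
S = simulate_kinetic_ising(J_true, np.zeros(N), 40000, rng)
J_hat = kinetic_nmf(S)
print(np.corrcoef(J_hat.ravel(), J_true.ravel())[0, 1])
```

At the nMF level the estimated couplings come out roughly proportional to the true ones but somewhat rescaled, which is why the check uses a correlation coefficient rather than an absolute error.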
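
Luca Dall'Asta's abstract mentions learning methods by which players organize toward Nash equilibria of strategic-substitutes games on networks. As an assumed example (the abstract does not name a specific game), take the best-shot public-goods game: each player gets payoff 1 if it or a neighbor contributes, minus a cost 0 < c < 1 if it contributes itself. The best response is then to contribute exactly when no neighbor does, and the pure Nash equilibria are the maximal independent sets of the graph. The sketch runs simple asynchronous best-response dynamics, not the Monte Carlo or Belief Propagation methods of the talk, on a hypothetical random graph.

```python
import numpy as np


def best_response_dynamics(adj, rng):
    """Asynchronous best response in the best-shot game, from an all-zero start.

    Best response of player i: contribute (1) iff no neighbor contributes.
    Starting from nobody contributing, only 0 -> 1 moves occur and the
    contributors always form an independent set, so the loop ends after
    at most N switches, at a maximal independent set.
    """
    x = np.zeros(len(adj), dtype=int)
    changed = True
    while changed:
        changed = False
        for i in rng.permutation(len(adj)):
            best = 0 if any(x[j] for j in adj[i]) else 1
            if x[i] != best:
                x[i] = best
                changed = True
    return x


def is_nash_best_shot(adj, x):
    """Pure Nash equilibrium <=> contributors are independent and every
    non-contributor has a contributing neighbor (a maximal independent set)."""
    for i, nbrs in enumerate(adj):
        covered = any(x[j] for j in nbrs)
        if (x[i] == 1 and covered) or (x[i] == 0 and not covered):
            return False
    return True


# Hypothetical random graph: 30 players, each edge present with probability 0.1.
rng = np.random.default_rng(5)
N = 30
upper = np.triu(rng.random((N, N)) < 0.1, 1)
A = upper | upper.T
adj = [list(np.flatnonzero(A[i])) for i in range(N)]
sizes = []
for seed in range(10):
    x = best_response_dynamics(adj, np.random.default_rng(seed))
    assert is_nash_best_shot(adj, x)
    sizes.append(int(x.sum()))
print(sizes)   # number of contributors in each equilibrium reached
```

Different random update orders typically end in different equilibria. Under the payoff assumed above, every player receives the good in any equilibrium, so total payoff is N minus c times the number of contributors, and equilibria with fewer contributors are socially better; finding the best one is a search problem that best-response dynamics alone does not solve.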
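
Jaakko Hollmen's abstract uses a mixture of multivariate Bernoulli distributions to cluster 0-1 amplification profiles. A standard way to fit such a mixture is expectation-maximization (EM); the sketch below is a generic EM implementation under that assumption, not the authors' code. The initialization range, the clipping constant eps and the two synthetic patterns are hypothetical. The abstract says model complexity is chosen per chromosome by a model selection procedure without naming it, so the returned log-likelihood is only a starting point for whatever criterion is used.

```python
import numpy as np


def _log_joint(X, pi, theta):
    # log pi_k + sum_d [x_d log theta_kd + (1 - x_d) log(1 - theta_kd)], shape (n, K)
    return np.log(pi)[None, :] + X @ np.log(theta).T + (1.0 - X) @ np.log(1.0 - theta).T


def bernoulli_mixture_em(X, K, n_iter=200, seed=0, eps=1e-6):
    """EM for a mixture of K multivariate Bernoulli distributions on 0-1 data X (n x d)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)
    theta = rng.uniform(0.25, 0.75, size=(K, d))
    for _ in range(n_iter):
        # E-step: responsibilities r_nk = p(k | x_n)
        lj = _log_joint(X, pi, theta)
        resp = np.exp(lj - np.logaddexp.reduce(lj, axis=1, keepdims=True))
        # M-step: mixing weights and per-component Bernoulli parameters
        Nk = resp.sum(axis=0) + 1e-12
        pi = Nk / Nk.sum()
        theta = np.clip(resp.T @ X / Nk[:, None], eps, 1.0 - eps)
    # Log-likelihood of the data under the final parameters
    loglik = float(np.logaddexp.reduce(_log_joint(X, pi, theta), axis=1).sum())
    return pi, theta, loglik


# Synthetic 0-1 data from two hypothetical "aberration patterns" over 6 loci.
rng = np.random.default_rng(6)
theta_true = np.array([[0.9, 0.9, 0.9, 0.1, 0.1, 0.1],
                       [0.1, 0.1, 0.1, 0.9, 0.9, 0.9]])
z = rng.integers(0, 2, size=1000)
X = (rng.random((1000, 6)) < theta_true[z]).astype(float)
pi, theta, loglik = bernoulli_mixture_em(X, K=2)
print(np.round(theta, 2), loglik)
```

Component labels are arbitrary in a mixture model, so recovered patterns should be compared with the true ones only up to a permutation of the components.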