Speaker
Antonio Celani
(ICTP, Quantitative Life Sciences Unit)
Description
A proper balance between exploration and exploitation is what
allows decisions to achieve high reward, such as payoff
or evolutionary fitness. The Infomax principle postulates
that maximization of information directs the function of
diverse systems, from living systems to artificial neural
networks. While specific applications turn out to be
successful, the validity of information as a proxy for
reward remains unclear. Here, we consider the multi-armed
bandit decision problem, which features arms (slot machines)
with unknown probabilities of success and a player trying to
maximize cumulative payoff by choosing the sequence of arms
to play. We show that an Infomax strategy that optimally
gathers information on the highest probability of success
among the arms saturates known optimal bounds and compares
favorably to existing policies. Conversely, gathering
information on the identity of the best arm in the bandit
leads to a strategy that is vastly suboptimal in terms of
payoff. The choice of the quantity about which Infomax
acquires information is therefore crucial for an effective
tradeoff between exploration and exploitation.
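
To make the bandit setting concrete, the following is a minimal sketch of a Bernoulli multi-armed bandit and of how cumulative regret (the expected payoff lost relative to always playing the best arm) can be measured. It does not implement the Infomax strategy discussed in the talk. Thompson sampling is used here only as a stand-in for an existing policy, and uniform random play as a naive baseline. The function names, the arm probabilities, and the horizon are all illustrative assumptions.

```python
import random


def simulate(probs, horizon, policy, seed=0):
    """Play a Bernoulli bandit for `horizon` rounds and return the cumulative
    pseudo-regret: the sum over rounds of (best success probability minus
    the success probability of the arm actually played)."""
    rng = random.Random(seed)
    n_arms = len(probs)
    wins = [0] * n_arms    # observed successes per arm
    losses = [0] * n_arms  # observed failures per arm
    best = max(probs)
    regret = 0.0
    for _ in range(horizon):
        arm = policy(wins, losses, rng)
        if rng.random() < probs[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
        regret += best - probs[arm]
    return regret


def thompson(wins, losses, rng):
    """Thompson sampling with uniform Beta(1, 1) priors (a standard policy,
    used here as a baseline; it is NOT the Infomax strategy of the talk)."""
    samples = [rng.betavariate(w + 1, l + 1) for w, l in zip(wins, losses)]
    return max(range(len(samples)), key=samples.__getitem__)


def uniform(wins, losses, rng):
    """Pure exploration: pick an arm uniformly at random, ignoring the data."""
    return rng.randrange(len(wins))


# Illustrative (assumed) success probabilities, unknown to the player.
probs = [0.2, 0.5, 0.8]
horizon = 2000

regret_thompson = simulate(probs, horizon, thompson)
regret_uniform = simulate(probs, horizon, uniform)
print(f"Thompson sampling regret: {regret_thompson:.1f}")
print(f"Uniform random regret:    {regret_uniform:.1f}")
```

A policy that balances exploration and exploitation well should accumulate regret much more slowly than the uniform baseline, whose regret grows linearly with the number of rounds.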