Abstract
This paper studies the analytical properties of the reinforcement learning model proposed in Erev and Roth (1998), also termed cumulative reinforcement learning in Laslier et al. (2001). This stochastic model of learning in games accounts for two main elements: the law of effect (positive reinforcement of actions that perform well) and the law of practice (the magnitude of the reinforcement effect decreases with players’ experience). The main results of the paper show that, if the solution trajectories of the underlying replicator equation converge exponentially fast, then, with probability arbitrarily close to one, all the realizations of the reinforcement learning process lie within an ε-band of that solution. The paper improves upon results currently available in the literature by showing that a reinforcement learning process that has been running for some time and is found sufficiently close to a strict Nash equilibrium will reach it with probability one.
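
To make the learning rule under discussion concrete, the sketch below simulates the cumulative reinforcement rule for a two-player game: each player draws actions with probability proportional to accumulated propensities (law of effect) and adds the realised payoff to the propensity of the action played, so that later payoffs shift the mixed strategy by ever smaller amounts (law of practice). It is a minimal illustrative toy, not the paper's analysis; the function name, the uniform initial propensities q0, the positivity of payoffs, and the coordination game used as an example are all assumptions made here for illustration.

    import numpy as np

    def cumulative_reinforcement(payoffs_a, payoffs_b, rounds=5000, q0=1.0, seed=None):
        """Toy simulation of Erev-Roth cumulative reinforcement learning.

        Assumes positive payoffs so that propensities stay positive.
        payoffs_a[i, j] / payoffs_b[i, j]: payoffs to players A and B when
        A plays row i and B plays column j.
        """
        rng = np.random.default_rng(seed)
        q_a = np.full(payoffs_a.shape[0], q0)   # initial propensities, player A
        q_b = np.full(payoffs_b.shape[1], q0)   # initial propensities, player B
        for _ in range(rounds):
            # Law of effect: play each action with probability proportional to its propensity.
            p_a, p_b = q_a / q_a.sum(), q_b / q_b.sum()
            i = rng.choice(len(q_a), p=p_a)
            j = rng.choice(len(q_b), p=p_b)
            # Reinforce the action actually played by the realised payoff.
            # Law of practice: as propensities accumulate, each update matters less.
            q_a[i] += payoffs_a[i, j]
            q_b[j] += payoffs_b[i, j]
        return q_a / q_a.sum(), q_b / q_b.sum()

    # Illustrative coordination game: (top, left) and (bottom, right) are strict Nash equilibria.
    A = np.array([[2.0, 1.0], [1.0, 2.0]])
    x, y = cumulative_reinforcement(A, A, seed=0)
    print(x, y)   # mixed strategies typically concentrate near one of the strict equilibria
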
When
Tuesday 14 June 2011 at noon
Where
Gabor Seminar Room 611
Electrical & Electronic Engineering Building
Imperial College London
South Kensington Campus
London
SW7 2AZ