Abstract: We develop a probabilistic framework for analysing model-based reinforcement learning in the episodic setting. We then apply it to study finite-time-horizon stochastic control problems with linear dynamics but unknown coefficients and a convex, but possibly irregular, objective function. Using probabilistic representations, we study the regularity of the associated cost functions and establish precise estimates for the performance gap incurred when the optimal feedback control derived from estimated model parameters is applied in place of the one derived from the true parameters. Next, we propose a phase-based learning algorithm for which we show how to optimise the exploration-exploitation trade-off. Our algorithm achieves sublinear (or even logarithmic) regret in high probability and in expectation, matching the best possible results in the literature.
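To make the phase-based idea concrete, here is a minimal, purely illustrative sketch (not the algorithm presented in the talk) of an explore-then-exploit scheme for a one-dimensional linear system with unknown coefficients: exploration episodes excite the system with random controls, the coefficients are fitted by least squares, and exploitation episodes apply the certainty-equivalent finite-horizon LQR feedback. All names, constants, and the quadratic cost below are assumptions made for the example.

```python
# Illustrative explore-then-exploit sketch for x_{t+1} = a x_t + b u_t + noise
# with unknown (a, b) and episodic cost sum_t (x_t^2 + u_t^2).
import numpy as np

rng = np.random.default_rng(0)
a_true, b_true = 0.9, 0.5      # unknown dynamics coefficients (ground truth for simulation)
T, n_episodes = 20, 200        # episode horizon and number of episodes
sigma = 0.1                    # noise standard deviation

def run_episode(policy, explore_noise=0.0):
    """Roll out one episode; return observed transitions and the realised cost."""
    x, cost, data = 1.0, 0.0, []
    for t in range(T):
        u = policy(t, x) + explore_noise * rng.standard_normal()
        x_next = a_true * x + b_true * u + sigma * rng.standard_normal()
        cost += x**2 + u**2
        data.append((x, u, x_next))
        x = x_next
    return data, cost

def least_squares_estimate(data):
    """Estimate (a, b) from observed transitions by ordinary least squares."""
    X = np.array([[x, u] for x, u, _ in data])
    y = np.array([x_next for _, _, x_next in data])
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta  # (a_hat, b_hat)

def riccati_gains(a, b):
    """Finite-horizon LQR feedback gains for cost sum(x^2 + u^2), via backward Riccati recursion."""
    P, gains = 1.0, []
    for _ in range(T):
        K = a * b * P / (1.0 + b * b * P)
        gains.append(K)
        P = 1.0 + a * a * P - a * b * P * K
    return gains[::-1]  # gains[t] is the feedback gain applied at time t

history, costs = [], []
for ep in range(n_episodes):
    if ep < 10:   # exploration phase: excite the system with random controls
        data, cost = run_episode(lambda t, x: 0.0, explore_noise=1.0)
    else:         # exploitation phase: certainty-equivalent feedback from estimated coefficients
        a_hat, b_hat = least_squares_estimate(history)
        gains = riccati_gains(a_hat, b_hat)
        data, cost = run_episode(lambda t, x: -gains[t] * x)
    history += data
    costs.append(cost)

print("estimated (a, b):", least_squares_estimate(history))
print("average exploitation cost:", np.mean(costs[10:]))
```

In this toy version the exploration budget (the first 10 episodes) is fixed in advance; the trade-off studied in the talk concerns how such a budget should be chosen and scheduled to obtain sublinear or logarithmic regret.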
This talk is based on several projects with Xin Guo (UC Berkeley), Anran Hu (UC Berkeley), Lukasz Szpruch (U of Edinburgh, Alan Turing Institute) and Tanut Treetanthiploet (Alan Turing Institute).