Publications

Journal article
Kormushev P, Nomoto K, Dong F, Hirota K et al., 2011,
Time Hopping Technique for Faster Reinforcement Learning in Simulations
, International Journal of Cybernetics and Information Technologies, Vol: 11, Pages: 42-59
Conference paper
Kormushev P, Calinon S, Caldwell DG, 2010,
Robot Motor Skill Coordination with EM-based Reinforcement Learning
, Pages: 3232-3237
Conference paper
Filippi S, Cappe O, Garivier A, 2010,
Optimism in Reinforcement Learning and Kullback-Leibler Divergence
, ALLERTON 2010
We consider model-based reinforcement learning in finite Markov Decision Processes (MDPs), focussing on so-called optimistic strategies. In MDPs, optimism can be implemented by carrying out extended value iterations under a constraint of consistency with the estimated model transition probabilities. The UCRL2 algorithm by Auer, Jaksch and Ortner (2009), which follows this strategy, has recently been shown to guarantee near-optimal regret bounds. In this paper, we strongly argue in favor of using the Kullback-Leibler (KL) divergence for this purpose. By studying the linear maximization problem under KL constraints, we provide an efficient algorithm, termed KL-UCRL, for solving KL-optimistic extended value iteration. Using recent deviation bounds on the KL divergence, we prove that KL-UCRL provides the same guarantees as UCRL2 in terms of regret. However, numerical experiments on classical benchmarks show a significantly improved behavior, particularly when the MDP has reduced connectivity. To support this observation, we provide elements of comparison between the two algorithms based on geometric considerations.
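The "linear maximization problem under KL constraints" mentioned in the abstract can be illustrated with a small sketch. This is not the paper's KL-UCRL procedure; it is a minimal illustration that assumes the estimated distribution `p` has full support and the value vector `v` is not constant. Under those assumptions, the Lagrangian stationarity condition gives a maximizer of the form q_i proportional to p_i / (nu - v_i) with nu > max(v), and nu can be found by bisection because KL(p || q) decreases as nu grows. The names `kl`, `kl_optimistic`, `p`, `v`, and `eps` are hypothetical and chosen for illustration only.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kl_optimistic(p, v, eps, iters=200):
    """Approximately maximize sum(q_i * v_i) subject to KL(p || q) <= eps.

    Illustrative assumptions: every p_i > 0 and eps >= 0.
    The returned q always satisfies the KL constraint.
    """
    vmax, vmin = max(v), min(v)
    if eps <= 0 or vmax == vmin:
        return list(p)  # nothing to gain: q = p is optimal

    def q_of(gap):
        # Candidate solution for nu = vmax + gap, with gap > 0.
        w = [pi / (vmax + gap - vi) for pi, vi in zip(p, v)]
        z = sum(w)
        return [wi / z for wi in w]

    span = vmax - vmin
    lo = hi = span
    # Shrink lo until the constraint is violated (guarded against underflow).
    while lo > 1e-12 * span and kl(p, q_of(lo)) <= eps:
        lo /= 2
    # Grow hi until the constraint is satisfied.
    while kl(p, q_of(hi)) > eps:
        hi *= 2
    # Bisection: hi always stays on the feasible side.
    for _ in range(iters):
        mid = (lo + hi) / 2
        if kl(p, q_of(mid)) > eps:
            lo = mid
        else:
            hi = mid
    return q_of(hi)


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]
    v = [0.0, 1.0, 2.0]
    q = kl_optimistic(p, v, eps=0.1)
    print(q, kl(p, q))
```

The sketch shifts probability mass toward states with larger values while staying within the KL budget. Handling states with zero estimated probability needs extra care and is left out here.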
Conference paper
Filippi S, Cappe O, Garivier A, Szepesvari C et al., 2010,
Parametric bandits: The generalized linear case
, Neural Information Processing Systems (NIPS’2010)

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
