Search results

  • Conference paper
    Cully AHR, Demiris Y, 2018, Hierarchical behavioral repertoires with unsupervised descriptors, Genetic and Evolutionary Computation Conference 2018, Publisher: ACM

    Enabling artificial agents to automatically learn complex, versatile and high-performing behaviors is a long-standing challenge. This paper presents a step in this direction with hierarchical behavioral repertoires that stack several behavioral repertoires to generate sophisticated behaviors. Each repertoire of this architecture uses the lower repertoires to create complex behaviors as sequences of simpler ones, while only the lowest repertoire directly controls the agent's movements. This paper also introduces a novel approach to automatically define behavioral descriptors using an unsupervised neural network that organizes the produced high-level behaviors. The experiments show that the proposed architecture enables a robot to learn how to draw digits in an unsupervised manner after having learned to draw lines and arcs. Compared to traditional behavioral repertoires, the proposed architecture reduces the dimensionality of the optimization problems by orders of magnitude and provides behaviors with twice the fitness. More importantly, it enables the transfer of knowledge between robots: a hierarchical repertoire evolved for a robotic arm to draw digits can be transferred to a humanoid robot by simply changing the lowest layer of the hierarchy. This enables the humanoid to draw digits although it has never been trained for this task.
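
    The layering described in this abstract can be pictured with a minimal Python sketch (the repertoires, descriptors and motor parameters below are illustrative toy stand-ins, not the paper's implementation): the lowest repertoire maps a behavioral descriptor to motor parameters, and a higher repertoire stores sequences of lower-level descriptors, so a high-level behavior unrolls into a chain of low-level ones.

        # Toy sketch of a two-level behavioral repertoire (illustrative only).

        # Level 0: descriptor -> motor parameters that directly control the agent.
        low_level = {
            "short_line": [0.1, 0.0, 0.3],   # placeholder motor parameters
            "long_line":  [0.4, 0.0, 0.3],
            "small_arc":  [0.2, 0.5, 0.1],
        }

        # Level 1: high-level descriptor -> sequence of level-0 descriptors.
        high_level = {
            "digit_1": ["long_line"],
            "digit_7": ["short_line", "long_line"],
            "digit_0": ["small_arc", "small_arc", "small_arc", "small_arc"],
        }

        def unroll(behavior):
            """Expand a high-level behavior into the motor commands of level 0."""
            return [low_level[d] for d in high_level[behavior]]

        print(unroll("digit_7"))  # -> [[0.1, 0.0, 0.3], [0.4, 0.0, 0.3]]

    Transferring the hierarchy to another robot then amounts to swapping the level-0 dictionary while keeping the higher levels untouched, which is the mechanism the abstract highlights.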

  • Software
    Cully A, Chatzilygeroudis K, Allocati F, Mouret J-B, Rama R, Papaspyros V et al., 2018, Limbo: A Flexible High-performance Library for Gaussian Processes modeling and Data-Efficient Optimization

    Limbo (LIbrary for Model-Based Optimization) is an open-source C++11 library for Gaussian Processes and data-efficient optimization (e.g., Bayesian optimization) that is designed to be both highly flexible and very fast. It can be used as a state-of-the-art optimization library or to experiment with novel algorithms with “plugin” components. Limbo is currently mostly used for data-efficient policy search in robot learning and online adaptation because computation time matters when using the low-power embedded computers of robots. For example, Limbo was the key library to develop a new algorithm that allows a legged robot to learn a new gait after mechanical damage in about 10-15 trials (2 minutes), and a 4-DOF manipulator to learn neural network policies for goal reaching in about 5 trials. The implementation of Limbo follows a policy-based design that leverages C++ templates: this allows it to be highly flexible without the cost induced by classic object-oriented designs (cost of virtual functions). The regression benchmarks show that the query time of Limbo’s Gaussian processes is several orders of magnitude better than that of GPy (a state-of-the-art Python library for Gaussian processes) for a similar accuracy (the learning time highly depends on the optimization algorithm chosen to optimize the hyper-parameters). The black-box optimization benchmarks demonstrate that Limbo is about 2 times faster than BayesOpt (a C++ library for data-efficient optimization) for a similar accuracy and data-efficiency. In practice, changing one of the components of the algorithms in Limbo (e.g., changing the acquisition function) usually requires changing only a template definition in the source code. This design allows users to rapidly experiment and test new ideas while keeping the software as fast as specialized code. Limbo takes advantage of multi-core architectures to parallelize the internal optimization processes (optimization of the acquisition function).
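
    As a rough illustration of the kind of data-efficient loop Limbo is built for (this is not Limbo's API, which is C++; it is a generic Gaussian-process Bayesian-optimization sketch using scikit-learn, and the objective, budget and acquisition constant are all invented for the example):

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor

        def expensive_function(x):
            # Stand-in for a costly evaluation (e.g., one robot trial).
            return -(x - 0.3) ** 2

        candidates = np.linspace(0.0, 1.0, 200).reshape(-1, 1)   # discretised search space
        X, y = [[0.0]], [expensive_function(0.0)]                 # one initial observation

        for _ in range(10):                                       # small evaluation budget
            gp = GaussianProcessRegressor().fit(np.array(X), np.array(y))
            mu, sigma = gp.predict(candidates, return_std=True)
            ucb = mu + 2.0 * sigma                                # acquisition: upper confidence bound
            x_next = candidates[int(np.argmax(ucb))]
            X.append(list(x_next))
            y.append(expensive_function(x_next[0]))

        print("best input:", X[int(np.argmax(y))], "best value:", max(y))

    Limbo composes the same ingredients (Gaussian process model, acquisition function, inner optimizer) as compile-time template parameters rather than Python objects, which is where the speed and flexibility claims above come from.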

  • Journal article
    Cully AHR, Demiris Y, 2018, Quality and diversity optimization: a unifying modular framework, IEEE Transactions on Evolutionary Computation, Vol: 22, Pages: 245-259, ISSN: 1941-0026

    The optimization of functions to find the best solution according to one or several objectives has a central role in many engineering and research fields. Recently, a new family of optimization algorithms, named Quality-Diversity optimization, has been introduced, and contrasts with classic algorithms. Instead of searching for a single solution, Quality-Diversity algorithms search for a large collection of both diverse and high-performing solutions. The role of this collection is to cover the range of possible solution types as much as possible, and to contain the best solution for each type. The contribution of this paper is threefold. Firstly, we present a unifying framework of Quality-Diversity optimization algorithms that covers the two main algorithms of this family (Multi-dimensional Archive of Phenotypic Elites and Novelty Search with Local Competition), and that highlights the large variety of variants that can be investigated within this family. Secondly, we propose algorithms with a new selection mechanism for Quality-Diversity algorithms that outperforms all the algorithms tested in this paper. Lastly, we present a new collection management mechanism that overcomes the erosion issues observed when using unstructured collections. These three contributions are supported by extensive experimental comparisons of Quality-Diversity algorithms on three different experimental scenarios.
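
    A minimal sketch of the container-plus-selection structure that such Quality-Diversity algorithms share (a toy MAP-Elites-style grid in Python; the task, the descriptor and every parameter are invented purely for illustration):

        import random

        def evaluate(genome):
            """Toy task: fitness is the negated squared norm; the descriptor is the
            first two genes discretised onto a 10x10 grid (both invented here)."""
            fitness = -sum(g * g for g in genome)
            cell = (int(genome[0] * 10) % 10, int(genome[1] * 10) % 10)
            return fitness, cell

        archive = {}  # cell -> (fitness, genome): one elite per type of solution

        random.seed(0)
        for _ in range(5000):
            if archive:   # selection: mutate a random elite from the collection
                parent = random.choice(list(archive.values()))[1]
                genome = [g + random.gauss(0.0, 0.1) for g in parent]
            else:         # bootstrap with a random genome
                genome = [random.uniform(0.0, 1.0) for _ in range(4)]
            fitness, cell = evaluate(genome)
            if cell not in archive or fitness > archive[cell][0]:
                archive[cell] = (fitness, genome)   # keep the best solution per cell

        print(len(archive), "cells filled; best fitness:",
              max(f for f, _ in archive.values()))

    The paper's framework varies exactly these two components: how the collection is stored (structured grid vs. unstructured archive) and how parents are selected from it.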

  • Conference paper
    Zhang F, Cully A, Demiris Y, 2017, Personalized Robot-assisted Dressing using User Modeling in Latent Spaces, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE, ISSN: 2153-0866

    Robots have the potential to provide tremendous support to disabled and elderly people in their everyday tasks, such as dressing. Most recent studies on robotic dressing assistance view dressing as a trajectory planning problem. However, the user movements during the dressing process are rarely taken into account, which often leads to failure of the planned trajectory and may put the user at risk. The main difficulty of taking user movements into account is caused by severe occlusions created by the robot, the user, and the clothes during the dressing process, which prevent vision sensors from accurately detecting the postures of the user in real time. In this paper, we address this problem by introducing an approach that allows the robot to automatically adapt its motion according to the force applied on the robot's gripper caused by user movements. There are two main contributions introduced in this paper: 1) the use of a hierarchical multi-task control strategy to automatically adapt the robot motion and minimize the force applied between the user and the robot caused by user movements; 2) the online update of the dressing trajectory based on the user movement limitations modeled with the Gaussian Process Latent Variable Model in a latent space, and the density information extracted from such latent space. The combination of these two contributions leads to a personalized dressing assistance that can cope with unpredicted user movements during dressing while constantly minimizing the force that the robot may apply on the user. The experimental results demonstrate that the proposed method allows the Baxter humanoid robot to provide personalized dressing assistance for human users with simulated upper-body impairments.
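
    The latent-space density idea can be caricatured as follows. This is a hedged toy sketch only: PCA and kernel density estimation stand in for the paper's Gaussian Process Latent Variable Model and its density information, the "user poses" are synthetic, and the threshold is arbitrary.

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.neighbors import KernelDensity

        rng = np.random.default_rng(0)

        # Synthetic "observed user poses" (e.g., joint angles) around a comfortable posture.
        observed_poses = rng.normal(loc=0.0, scale=0.2, size=(200, 6))

        latent = PCA(n_components=2).fit(observed_poses)             # stand-in latent model
        density = KernelDensity(bandwidth=0.3).fit(latent.transform(observed_poses))

        def is_reachable(pose, threshold=-5.0):
            """Accept a candidate waypoint only if the user pose it requires lies in a
            high-density region of the latent space (threshold chosen arbitrarily)."""
            z = latent.transform(np.asarray(pose).reshape(1, -1))
            return density.score_samples(z)[0] > threshold

        print(is_reachable(np.zeros(6)))       # near the observed postures -> True
        print(is_reachable(np.full(6, 3.0)))   # far outside the user's range -> likely False

    In the paper this check informs the online update of the dressing trajectory, while the force felt at the gripper drives the hierarchical multi-task controller.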

  • Conference paper
    Zambelli M, Fischer T, Petit M, Chang HJ, Cully A, Demiris Y et al., 2016, Towards Anchoring Self-Learned Representations to Those of Other Agents, Workshop on Bio-inspired Social Robot Learning in Home Scenarios, IEEE/RSJ International Conference on Intelligent Robots and Systems, Publisher: Institute of Electrical and Electronics Engineers (IEEE)

    In the future, robots will support humans in their everyday activities. One particular challenge that robots will face is understanding and reasoning about the actions of other agents in order to cooperate effectively with humans. We propose to tackle this using a developmental framework, where the robot incrementally acquires knowledge, and in particular 1) self-learns a mapping between motor commands and sensory consequences, 2) rapidly acquires primitives and complex actions from verbal descriptions and instructions given by a human partner, 3) discovers correspondences between the robot's body and other articulated objects and agents, and 4) employs these correspondences to transfer the knowledge acquired from the robot's point of view to the viewpoint of the other agent. We show that our approach requires very little a priori knowledge to achieve imitation learning and to find corresponding body parts of humans, and allows taking the perspective of another agent. This represents a step towards the emergence of a mirror-neuron-like system based on self-learned representations.

  • Conference paper
    Tarapore D, Clune J, Cully AHR, Mouret J-B et al., 2016, How do different encodings influence the performance of the MAP-Elites algorithm?, Proceedings of the Genetic and Evolutionary Computation Conference 2016, Publisher: ACM, Pages: 173-180

    The recently introduced Intelligent Trial and Error algorithm (IT&E) both improves the ability to automatically generate controllers that transfer to real robots, and enables robots to creatively adapt to damage in less than 2 minutes. A key component of IT&E is a new evolutionary algorithm called MAP-Elites, which creates a behavior-performance map that is provided as a set of "creative" ideas to an online learning algorithm. To date, all experiments with MAP-Elites have been performed with a directly encoded list of parameters: it is therefore unknown how MAP-Elites would behave with more advanced encodings, like HyperNEAT and SUPG. In addition, because we ultimately want robots that respond to their environments via sensors, we investigate the ability of MAP-Elites to evolve closed-loop controllers, which are more complicated, but also more powerful. Our results show that the encoding critically impacts the quality of the results of MAP-Elites, and that the differences are likely linked to the locality of the encoding (the likelihood of generating a similar behavior after a single mutation). Overall, these results improve our understanding of both the dynamics of the MAP-Elites algorithm and how to best harness MAP-Elites to evolve effective and adaptable robotic controllers.
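
    The "locality" mentioned in this abstract (how much the behavior changes after a single mutation) can be probed with a small measurement like the sketch below; the genome, the mutation operator and the genotype-to-behavior mapping are toy stand-ins chosen only to make the measurement concrete.

        import math
        import random

        def behavior(genome):
            """Toy genotype-to-behavior mapping standing in for an encoding
            (a direct encoding would be close to the identity)."""
            return [math.sin(3.0 * g) for g in genome]

        def locality(encoding, n_samples=1000, sigma=0.05):
            """Average behavioral distance caused by a single small mutation:
            lower values indicate a more local (smoother) encoding."""
            random.seed(0)
            total = 0.0
            for _ in range(n_samples):
                genome = [random.uniform(-1.0, 1.0) for _ in range(8)]
                mutant = list(genome)
                i = random.randrange(len(mutant))
                mutant[i] += random.gauss(0.0, sigma)   # single-gene mutation
                total += math.dist(encoding(genome), encoding(mutant))
            return total / n_samples

        print("average behavioral change per mutation:", locality(behavior))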

  • Journal article
    Cully A, Mouret J-B, 2016, Evolving a behavioral repertoire for a walking robot, Evolutionary Computation, Vol: 24, Pages: 59-88, ISSN: 1063-6560

    Numerous algorithms have been proposed to allow legged robots to learn to walk. However, most of these algorithms are devised to learn walking in a straight line, which is not sufficient to accomplish any real-world mission. Here we introduce the Transferability-based Behavioral Repertoire Evolution algorithm (TBR-Evolution), a novel evolutionary algorithm that simultaneously discovers several hundreds of simple walking controllers, one for each possible direction. By taking advantage of solutions that are usually discarded by evolutionary processes, TBR-Evolution is substantially faster than independently evolving each controller. Our technique relies on two methods: (1) novelty search with local competition, which searches for both high-performing and diverse solutions, and (2) the transferability approach, which combines simulations and real tests to evolve controllers for a physical robot. We evaluate this new technique on a hexapod robot. Results show that with only a few dozen short experiments performed on the robot, the algorithm learns a repertoire of controllers that allows the robot to reach every point in its reachable space. Overall, TBR-Evolution introduced a new kind of learning algorithm that simultaneously optimizes all the achievable behaviors of a robot.
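
    The first of the two methods, novelty search with local competition, scores each candidate by how different its behavior is from its nearest neighbours and by how many of those neighbours it outperforms. A small sketch of that scoring (toy descriptors and fitness values, k chosen arbitrarily):

        import math

        def novelty_and_local_competition(candidate, archive, k=3):
            """candidate and archive entries are (descriptor, fitness) pairs.
            Novelty: mean distance to the k nearest descriptors in the archive.
            Local competition: number of those neighbours the candidate outperforms."""
            desc, fit = candidate
            neighbours = sorted(archive, key=lambda e: math.dist(desc, e[0]))[:k]
            novelty = sum(math.dist(desc, d) for d, _ in neighbours) / len(neighbours)
            local_comp = sum(1 for _, f in neighbours if fit > f)
            return novelty, local_comp

        archive = [((0.0, 0.0), 1.0), ((0.1, 0.0), 1.2), ((0.9, 0.8), 0.5), ((0.5, 0.5), 0.9)]
        print(novelty_and_local_competition(((0.2, 0.1), 1.1), archive))

    Optimizing these two scores together is what lets the algorithm keep one high-performing controller per reachable direction instead of a single best walker.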

  • Conference paper
    Maestre C, Cully AHR, Gonzales C, Doncieux S et al., 2015, Bootstrapping interactions with objects from raw sensorimotor data: a Novelty Search based approach, 2015 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), Publisher: IEEE

    Determining in advance all objects that a robot will interact with in an open environment is very challenging, if not impossible. This makes it difficult to develop models that allow the robot to perceive and recognize objects, to interact with them, and to predict how these objects will react to interactions with other objects or with the robot. Developmental robotics proposes to make robots learn such models by themselves through a dedicated exploration step. This raises a chicken-and-egg problem: the robot needs to learn about objects to discover how to interact with them and, to this end, it needs to interact with them. In this work, we propose Novelty-driven Evolutionary Babbling (NovEB), an approach that bootstraps this process and acquires knowledge about objects in the surrounding environment without requiring a priori knowledge about the environment, its objects, or the means to interact with them. Our approach uses an evolutionary algorithm driven by a novelty criterion defined on the raw sensorimotor flow: behaviours, described by a trajectory of the robot's end effector, are generated with the goal of maximizing the novelty of the raw perceptions. The approach is tested on a simulated PR2 robot and compared to random motor babbling.

  • Journal article
    Cully A, Clune J, Tarapore D, Mouret J-B et al., 2015, Robots that can adapt like animals, Nature, Vol: 521, Pages: 503-507, ISSN: 0028-0836

    As robots leave the controlled environments of factories to autonomously function in more complex, natural environments, they will have to respond to the inevitable fact that they will become damaged. However, while animals can quickly adapt to a wide variety of injuries, current robots cannot "think outside the box" to find a compensatory behavior when damaged: they are limited to their pre-specified self-sensing abilities, can diagnose only anticipated failure modes, and require a pre-programmed contingency plan for every type of potential damage, an impracticality for complex robots. Here we introduce an intelligent trial and error algorithm that allows robots to adapt to damage in less than two minutes, without requiring self-diagnosis or pre-specified contingency plans. Before deployment, a robot exploits a novel algorithm to create a detailed map of the space of high-performing behaviors: this map represents the robot's intuitions about what behaviors it can perform and their value. If the robot is damaged, it uses these intuitions to guide a trial-and-error learning algorithm that conducts intelligent experiments to rapidly discover a compensatory behavior that works in spite of the damage. Experiments reveal successful adaptations for a legged robot injured in five different ways, including damaged, broken, and missing legs, and for a robotic arm with joints broken in 14 different ways. This new technique will enable more robust, effective, autonomous robots, and suggests principles that animals may use to adapt to injury.
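
    In spirit, the adaptation step treats the pre-computed map as a prior and refines its expectations from each trial on the damaged robot. The sketch below is a deliberately simplified stand-in, not the paper's exact algorithm: a Gaussian process models the residual between the map's prediction and the measured performance, and an upper-confidence-bound rule picks the next behavior to try; the map, damage model and constants are all invented for the example.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor

        # Toy behavior-performance map: descriptor -> performance predicted before damage.
        descriptors = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
        map_prior = 1.0 - (descriptors[:, 0] - 0.5) ** 2

        def trial_on_damaged_robot(i):
            # Stand-in measurement: the damage penalises half of the behavior space.
            return map_prior[i] - (0.8 if descriptors[i, 0] < 0.5 else 0.1)

        tested, measured = [], []
        for _ in range(8):                                   # a handful of physical trials
            if tested:
                gp = GaussianProcessRegressor().fit(descriptors[tested],
                                                    np.array(measured) - map_prior[tested])
                residual, sigma = gp.predict(descriptors, return_std=True)
            else:
                residual, sigma = np.zeros(len(descriptors)), np.ones(len(descriptors))
            expected = map_prior + residual                  # prior corrected by experience
            i = int(np.argmax(expected + 2.0 * sigma))       # most promising next behavior
            tested.append(i)
            measured.append(trial_on_damaged_robot(i))

        best = tested[int(np.argmax(measured))]
        print("best behavior found:", descriptors[best, 0], "performance:", max(measured))

    Because the map supplies a strong prior over which behaviors are worth trying, only a few real trials are needed, which is the source of the "less than two minutes" claim above.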

  • Journal article
    Koos S, Cully A, Mouret J-B, 2013, Fast damage recovery in robotics with the T-resilience algorithm, The International Journal of Robotics Research, Vol: 32, Pages: 1700-1723, ISSN: 0278-3649

    Damage recovery is critical for autonomous robots that need to operate for a long time without assistance. Most current methods are complex and costly because they require anticipating potential damage in order to have a contingency plan ready. As an alternative, we introduce the T-resilience algorithm, a new algorithm that allows robots to quickly and autonomously discover compensatory behavior in unanticipated situations. This algorithm equips the robot with a self-model and discovers new behaviors by learning to avoid those that perform differently in the self-model and in reality. Our algorithm thus does not identify the damaged parts but it implicitly searches for efficient behaviors that do not use them. We evaluate the T-resilience algorithm on a hexapod robot that needs to adapt to leg removal, broken legs and motor failures; we compare it to stochastic local search, policy gradient and the self-modeling algorithm proposed by Bongard et al. The behavior of the robot is assessed on-board thanks to an RGB-D sensor and a SLAM algorithm. Using only 25 tests on the robot and an overall running time of 20 min, T-resilience consistently leads to substantially better results than the other approaches.
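
    The core idea, preferring behaviors whose performance in the self-model matches reality, can be sketched as a simple filter. This is a hedged toy version: the descriptors, performance values and threshold are invented, and the mismatch of the nearest tested behavior stands in for the algorithm's actual transferability estimate.

        import numpy as np

        # Toy population of behaviors: 1-D descriptor and performance predicted by the self-model.
        descriptors = np.linspace(0.0, 1.0, 100)
        sim_perf = 1.0 - (descriptors - 0.7) ** 2

        # A few behaviors tested on the real, damaged robot (values invented for illustration).
        tested = np.array([5, 30, 55, 80])
        real_perf = sim_perf[tested] - np.where(descriptors[tested] > 0.6, 0.7, 0.05)
        gap = np.abs(sim_perf[tested] - real_perf)           # self-model vs. reality mismatch

        # Crude transferability estimate: mismatch of the nearest tested behavior.
        nearest = np.argmin(np.abs(descriptors[:, None] - descriptors[tested][None, :]), axis=1)
        predicted_gap = gap[nearest]

        # Select the behavior with the best simulated performance among those expected to transfer.
        trusted = predicted_gap < 0.2                        # arbitrary threshold
        best = int(np.argmax(np.where(trusted, sim_perf, -np.inf)))
        print("selected descriptor:", descriptors[best], "predicted mismatch:", predicted_gap[best])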

  • Conference paper
    Cully AHR, Mouret J-B, 2013, Behavioral repertoire learning in robotics, Proceedings of the 15th annual conference on Genetic and evolutionary computation, Publisher: ACM, Pages: 175-182

    Learning in robotics typically involves choosing a simple goal (e.g. walking) and assessing the performance of each controller with regard to this task (e.g. walking speed). However, learning advanced, input-driven controllers (e.g. walking in each direction) requires testing each controller on a large sample of the possible input signals. This costly process makes it difficult to learn useful low-level controllers in robotics. Here we introduce BR-Evolution, a new evolutionary learning technique that generates a behavioral repertoire by taking advantage of the candidate solutions that are usually discarded. Instead of evolving a single, general controller, BR-Evolution thus evolves a collection of simple controllers, one for each variant of the target behavior; to distinguish similar controllers, it uses a performance objective that allows it to produce a collection of diverse but high-performing behaviors. We evaluated this new technique by evolving gait controllers for a simulated hexapod robot. Results show that a single run of the EA quickly finds a collection of controllers that allows the robot to reach each point of the reachable space. Overall, BR-Evolution opens a new kind of learning algorithm that simultaneously optimizes all the achievable behaviors of a robot.

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
