Publications

Conference paper

Lim BWT, Flageat M, Cully A, 2023,

Efficient exploration using model-based quality-diversity with gradients

, Conference on Artificial Life, Publisher: MIT Press, Pages: 1-10

Exploration is a key challenge in Reinforcement Learning, especially in long-horizon, deceptive and sparse-reward environments. For such applications, population-based approaches have proven effective. Methods such as Quality-Diversity deal with this by encouraging novel solutions and producing a diversity of behaviours. However, these methods are driven by either undirected sampling (i.e. mutations) or approximated gradients (i.e. Evolution Strategies) in the parameter space, which makes them highly sample-inefficient. In this paper, we propose Dynamics-Aware QD-Ext (DA-QD-ext) and Gradient and Dynamics Aware QD (GDA-QD), two model-based Quality-Diversity approaches. They extend existing QD methods to use gradients for efficient exploitation and leverage perturbations in imagination for efficient exploration. Our approach takes advantage of the effectiveness of QD algorithms as good data generators to train deep models and uses these models to learn diverse and high-performing populations. We demonstrate that they outperform baseline RL approaches on tasks with deceptive rewards, and maintain the divergent search capabilities of QD approaches while exceeding their performance by ∼1.5 times and reaching the same results with 5 times fewer samples.


Conference paper

Sanguedolce G, Naylor PA, Geranmayeh F, 2023,

Uncovering the potential for a weakly supervised end-to-end model in recognising speech from patient with post-stroke aphasia

, 5th Clinical Natural Language Processing Workshop, Publisher: Association for Computational Linguistics, Pages: 182-190

Post-stroke speech and language deficits (aphasia) significantly impact patients' quality of life. Many with mild symptoms remain undiagnosed, and the majority do not receive the intensive doses of therapy recommended, due to healthcare costs and/or inadequate services. Automatic Speech Recognition (ASR) may help overcome these difficulties by improving diagnostic rates and providing feedback during tailored therapy. However, its performance is often unsatisfactory due to the high variability in speech errors and scarcity of training datasets. This study assessed the performance of Whisper, a recently released end-to-end model, in patients with post-stroke aphasia (PWA). We tuned its hyperparameters to achieve the lowest word error rate (WER) on aphasic speech. WER was significantly higher in PWA compared to age-matched controls (38.5% vs 10.3%, p < 0.001). We demonstrated that worse WER was related to more severe aphasia as measured by expressive (overt naming and spontaneous speech production) and receptive (written and spoken comprehension) language assessments. Stroke lesion size did not affect the performance of Whisper. Linear mixed models accounting for demographic factors, therapy duration, and time since stroke confirmed worse Whisper performance with left hemispheric frontal lesions. We discuss the implications of these findings for how future ASR can be improved in PWA.


Conference paper

Grillotti L, Flageat M, Lim B, Cully A et al., 2023,

Don't bet on luck alone: enhancing behavioral reproducibility of quality-diversity solutions in uncertain domains

, Genetic and Evolutionary Computation Conference (GECCO), Publisher: ACM

Quality-Diversity (QD) algorithms are designed to generate collections of high-performing solutions while maximizing their diversity in a given descriptor space. However, in the presence of unpredictable noise, the fitness and descriptor of the same solution can differ significantly from one evaluation to another, leading to uncertainty in the estimation of such values. Given the elitist nature of QD algorithms, they commonly end up with many degenerate solutions in such noisy settings. In this work, we introduce Archive Reproducibility Improvement Algorithm (ARIA), a plug-and-play approach that improves the reproducibility of the solutions present in an archive. We propose it as a separate optimization module, relying on natural evolution strategies, that can be executed on top of any QD algorithm. Our module mutates solutions to (1) optimize their probability of belonging to their niche, and (2) maximize their fitness. The performance of our method is evaluated on various tasks, including a classical optimization problem and two high-dimensional control tasks in simulated robotic environments. We show that our algorithm enhances the quality and descriptor space coverage of any given archive by at least 50%.


Conference paper

Faldor M, Chalumeau F, Flageat M, Cully A et al., 2023,

MAP-elites with descriptor-conditioned gradients and archive distillation into a single policy

, The Genetic and Evolutionary Computation Conference, Publisher: Association for Computing Machinery, Pages: 138-146

Quality-Diversity algorithms, such as MAP-Elites, are a branch of Evolutionary Computation that generates collections of diverse and high-performing solutions and has been successfully applied to a variety of domains, particularly in evolutionary robotics. However, MAP-Elites performs a divergent search based on random mutations originating from Genetic Algorithms, and thus is limited to evolving populations of low-dimensional solutions. PGA-MAP-Elites overcomes this limitation by integrating a gradient-based variation operator inspired by Deep Reinforcement Learning, which enables the evolution of large neural networks. Although high-performing in many environments, PGA-MAP-Elites fails on several tasks where the convergent search of the gradient-based operator does not direct mutations towards archive-improving solutions. In this work, we present two contributions: (1) we enhance the Policy Gradient variation operator with a descriptor-conditioned critic that improves the archive across the entire descriptor space, (2) we exploit the actor-critic training to learn a descriptor-conditioned policy at no additional cost, distilling the knowledge of the archive into one single versatile policy that can execute the entire range of behaviors contained in the archive. Our algorithm, DCG-MAP-Elites, improves the QD score over PGA-MAP-Elites by 82% on average on a set of challenging locomotion tasks.


Journal article

Flageat M, Chalumeau F, Cully A, 2023,

Empirical analysis of PGA-MAP-Elites for neuroevolution in uncertain domains

, ACM Transactions on Evolutionary Learning and Optimization, Vol: 3, Pages: 1-32, ISSN: 2688-299X

Quality-Diversity algorithms, among them MAP-Elites, have emerged as powerful alternatives to performance-only optimisation approaches as they enable generating collections of diverse and high-performing solutions to an optimisation problem. However, they are often limited to low-dimensional search spaces and deterministic environments. The recently introduced Policy Gradient Assisted MAP-Elites (PGA-MAP-Elites) algorithm overcomes this limitation by pairing the traditional Genetic operator of MAP-Elites with a gradient-based operator inspired by Deep Reinforcement Learning. This new operator guides mutations toward high-performing solutions using policy-gradients. In this work, we propose an in-depth study of PGA-MAP-Elites. We demonstrate the benefits of policy-gradients on the performance of the algorithm and the reproducibility of the generated solutions when considering uncertain domains. We first prove that PGA-MAP-Elites is highly performant in both deterministic and uncertain high-dimensional environments, decorrelating the two challenges it tackles. Secondly, we show that in addition to outperforming all the considered baselines, the collections of solutions generated by PGA-MAP-Elites are highly reproducible in uncertain environments, approaching the reproducibility of solutions found by Quality-Diversity approaches built specifically for uncertain applications. Finally, we propose an ablation and in-depth analysis of the dynamics of the policy-gradients-based variation. We demonstrate that the policy-gradient variation operator is crucial to guaranteeing the performance of PGA-MAP-Elites but is only essential during the early stage of the process, where it finds high-performing regions of the search space.

Conference paper

Chalumeau F, Boige R, Lim BWT, Mace V, Allard M, Flajolet A, Cully A, Pierrot T et al., 2023,

Neuroevolution is a Competitive Alternative to Reinforcement Learning for Skill Discovery

, The 11th International Conference on Learning Representations (ICLR) 2023


Conference paper

Surana S, Lim BWT, Cully A, 2023,

Efficient Learning of Locomotion Skills through the Discovery of Diverse Environmental Trajectory Generator Priors

, IEEE International Conference on Robotics and Automation, ISSN: 2152-4092


Journal article

Grillotti L, Cully A, 2022,

Unsupervised behaviour discovery with quality-diversity optimisation

, IEEE Transactions on Evolutionary Computation, Vol: 26, Pages: 1539-1552, ISSN: 1089-778X

Quality-Diversity algorithms refer to a class of evolutionary algorithms designed to find a collection of diverse and high-performing solutions to a given problem. In robotics, such algorithms can be used for generating a collection of controllers covering most of the possible behaviours of a robot. To do so, these algorithms associate a behavioural descriptor to each of these behaviours. Each behavioural descriptor is used for estimating the novelty of one behaviour compared to the others. In most existing algorithms, the behavioural descriptor needs to be hand-coded, thus requiring prior knowledge about the task to solve. In this paper, we introduce Autonomous Robots Realising their Abilities, an algorithm that uses a dimensionality reduction technique to automatically learn behavioural descriptors based on raw sensory data. The performance of this algorithm is assessed on three robotic tasks in simulation. The experimental results show that it performs similarly to traditional hand-coded approaches without the requirement to provide any hand-coded behavioural descriptor. In the collection of diverse and high-performing solutions, it also manages to find behaviours that are novel with respect to more features than its hand-coded baselines. Finally, we introduce a variant of the algorithm which is robust to the dimensionality of the behavioural descriptor space.


Conference paper

Lim B, Allard M, Grillotti L, Cully A et al., 2022,

QDax: on the benefits of massive parallelization for quality-diversity

, Genetic and Evolutionary Computation Conference (GECCO), Publisher: Association for Computing Machinery, Pages: 128-131

Quality-Diversity (QD) algorithms are a well-known approach to generate large collections of diverse and high-quality policies. However, QD algorithms are also known to be data-inefficient, requiring large amounts of computational resources, and are slow when used in practice for robotics tasks. Policy evaluations are already commonly performed in parallel to speed up QD algorithms but have limited capabilities on a single machine as most physics simulators run on CPUs. With recent advances in simulators that run on accelerators, thousands of evaluations can be performed in parallel on a single GPU/TPU. In this paper, we present QDax, an implementation of MAP-Elites which leverages massive parallelism on accelerators to make QD algorithms more accessible. We show that QD algorithms are ideal candidates and can scale with massive parallelism to be run at interactive timescales. The increase in parallelism does not significantly affect the performance of QD algorithms, while reducing experiment runtimes by two orders of magnitude, turning days of computation into minutes. These results show that QD can now benefit from hardware acceleration, which contributed significantly to the bloom of deep learning.

Conference paper

Lim BWT, Grillotti L, Bernasconi L, Cully A et al., 2022,

Dynamics-aware quality-diversity for efficient learning of skill repertoires

, IEEE International Conference on Robotics and Automation, Publisher: IEEE, Pages: 5360-5366

Quality-Diversity (QD) algorithms are powerful exploration algorithms that allow robots to discover large repertoires of diverse and high-performing skills. However, QD algorithms are sample inefficient and require millions of evaluations. In this paper, we propose Dynamics-Aware Quality-Diversity (DA-QD), a framework to improve the sample efficiency of QD algorithms through the use of dynamics models. We also show how DA-QD can then be used for continual acquisition of new skill repertoires. To do so, we incrementally train a deep dynamics model from experience obtained when performing skill discovery using QD. We can then perform QD exploration in imagination with an imagined skill repertoire. We evaluate our approach on three robotic experiments. First, our experiments show DA-QD is 20 times more sample efficient than existing QD approaches for skill discovery. Second, we demonstrate learning an entirely new skill repertoire in imagination to perform zero-shot learning. Finally, we show how DA-QD is useful and effective for solving a long horizon navigation task and for damage adaptation in the real world. Videos and source code are available at: https://sites.google.com/view/da-qd.

Conference paper

Lim BWT, Reichenbach A, Cully A, 2022,

Learning to walk autonomously via reset-free quality-diversity

, The Genetic and Evolutionary Computation Conference (GECCO)

Quality-Diversity (QD) algorithms can discover large and complex behavioural repertoires consisting of both diverse and high-performing skills. However, the generation of behavioural repertoires has mainly been limited to simulation environments instead of real-world learning. This is because existing QD algorithms need large numbers of evaluations as well as episodic resets, which require manual human supervision and interventions. This paper proposes Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous learning for robotics in open-ended environments. We build on Dynamics-Aware Quality-Diversity (DA-QD) and introduce a behaviour selection policy that leverages the diversity of the imagined repertoire and environmental information to intelligently select behaviours that can act as automatic resets. We demonstrate this through a task of learning to walk within defined training zones with obstacles. Our experiments show that we can learn full repertoires of legged locomotion controllers autonomously without manual resets with high sample efficiency in spite of harsh safety constraints. Finally, using an ablation of different target objectives, we show that it is important for RF-QD to have diverse types of solutions available for the behaviour selection policy over solutions optimised with a specific objective. Videos and code available at this https URL.


Conference paper

Pierrot T, Macé V, Chalumeau F, Flajolet A, Cideron G, Beguir K, Cully A, Sigaud O, Perrin-Gilbert N et al., 2022,

Diversity Policy Gradient for Sample Efficient Quality-Diversity Optimization

, The Genetic and Evolutionary Computation Conference (GECCO)


Conference paper

Allard M, Smith Bize S, Chatzilygeroudis K, Cully A et al., 2022,

Hierarchical Quality-Diversity For Online Damage Recovery

, The Genetic and Evolutionary Computation Conference, Publisher: ACM

Adaptation capabilities, like damage recovery, are crucial for the deployment of robots in complex environments. Several works have demonstrated that using repertoires of pre-trained skills can enable robots to adapt to unforeseen mechanical damages in a few minutes. These adaptation capabilities are directly linked to the behavioural diversity in the repertoire. The more alternatives the robot has to execute a skill, the better are the chances that it can adapt to a new situation. However, solving complex tasks, like maze navigation, usually requires multiple different skills. Finding a large behavioural diversity for these multiple skills often leads to an intractable exponential growth of the number of required solutions. In this paper, we introduce the Hierarchical Trial and Error algorithm, which uses a hierarchical behavioural repertoire to learn diverse skills and leverages them to make the robot more adaptive to different situations. We show that the hierarchical decomposition of skills enables the robot to learn more complex behaviours while keeping the learning of the repertoire tractable. The experiments with a hexapod robot show that our method solves maze navigation tasks with 20% fewer actions in the most challenging scenarios than the best baseline while having 57% fewer complete failures.


Conference paper

Grillotti L, Cully A, 2022,

Relevance-guided unsupervised discovery of abilities with quality-diversity algorithms

, Genetic and Evolutionary Computation Conference (GECCO), Publisher: ACM, Pages: 77-85

Quality-Diversity algorithms provide efficient mechanisms to generate large collections of diverse and high-performing solutions, which have been shown to be instrumental for solving downstream tasks. However, most of those algorithms rely on a hand-coded behavioural descriptor to characterise the diversity, hence requiring prior knowledge about the considered tasks. In this work, we introduce Relevance-guided Unsupervised Discovery of Abilities, a Quality-Diversity algorithm that autonomously finds a behavioural characterisation tailored to the task at hand. In particular, our method introduces a custom diversity metric that leads to higher densities of solutions near the areas of interest in the learnt behavioural descriptor space. We evaluate our approach on a simulated robotic environment, where the robot has to autonomously discover its abilities based on its full sensory data. We evaluated the algorithms on three tasks: navigation to random targets, moving forward with a high velocity, and performing half-rolls. The experimental results show that our method manages to discover collections of solutions that are not only diverse, but also well-adapted to the considered downstream task.


Journal article

Zhang F, Demiris Y, 2022,

Learning garment manipulation policies toward robot-assisted dressing.

, Science Robotics, Vol: 7, Pages: eabm6010-eabm6010, ISSN: 2470-9476

Assistive robots have the potential to support people with disabilities in a variety of activities of daily living, such as dressing. People who have completely lost their upper limb movement functionality may benefit from robot-assisted dressing, which involves complex deformable garment manipulation. Here, we report a dressing pipeline intended for these people and experimentally validate it on a medical training manikin. The pipeline is composed of the robot grasping a hospital gown hung on a rail, fully unfolding the gown, navigating around a bed, and lifting up the user's arms in sequence to finally dress the user. To automate this pipeline, we address two fundamental challenges: first, learning manipulation policies to bring the garment from an uncertain state into a configuration that facilitates robust dressing; second, transferring the deformable object manipulation policies learned in simulation to the real world to leverage cost-effective data generation. We tackle the first challenge by proposing an active pre-grasp manipulation approach that learns to isolate the garment grasping area before grasping. The approach combines prehensile and nonprehensile actions and thus alleviates grasping-only behavioral uncertainties. For the second challenge, we bridge the sim-to-real gap of deformable object policy transfer by approximating the simulator to real-world garment physics. A contrastive neural network is introduced to compare pairs of real and simulated garment observations, measure their physical similarity, and account for simulator parameter inaccuracies. The proposed method enables a dual-arm robot to put back-opening hospital gowns onto a medical manikin with a success rate of more than 90%.

Journal article

Cursi F, Bai W, Yeatman EM, Kormushev P et al., 2022,

GlobDesOpt: a global optimization framework for optimal robot manipulator design

, IEEE Access, Vol: 10, Pages: 5012-5023, ISSN: 2169-3536

Robot design is a major component in robotics, as it allows building robots capable of performing properly in given tasks. However, designing a robot with multiple types of parameters and constraints and defining an optimization function analytically for the robot design problem may be intractable or even impossible. Therefore, black-box optimization approaches are generally preferred. In this work we propose GlobDesOpt, a simple-to-use open-source optimization framework for robot design based on global optimization methods. The framework allows selecting various design parameters and optimizing for both single and dual-arm robots. The functionalities of the framework are shown here to optimally design a dual-arm surgical robot, comparing two different optimization strategies.

Journal article

Wang K, Fei H, Kormushev P, 2022,

Fast online optimization for terrain-blind bipedal robot walking with a decoupled actuated SLIP model

, Frontiers in Robotics and AI, Vol: 9, Pages: 1-11, ISSN: 2296-9144

We present an online optimization algorithm which enables bipedal robots to blindly walk over various kinds of uneven terrains while resisting pushes. The proposed optimization algorithm performs high level motion planning of footstep locations and center-of-mass height variations using the decoupled actuated Spring Loaded Inverted Pendulum (aSLIP) model. The decoupled aSLIP model simplifies the original aSLIP with Linear Inverted Pendulum (LIP) dynamics in horizontal states and spring dynamics in the vertical state. The motion planning can be formulated as a discrete-time Model Predictive Control (MPC) problem and solved at a frequency of 1 kHz. The output of the motion planner is fed into an inverse-dynamics based whole body controller for execution on the robot. A key result of this controller is that the feet of the robot are compliant, which further extends the robot’s ability to be robust to unobserved terrain variations. We evaluate our method in simulation with the bipedal robot SLIDER. Results show the robot can blindly walk over various uneven terrains including slopes, wave fields and stairs. It can also resist pushes of up to 40 N for a duration of 0.1 s while walking on uneven terrain.

Journal article

AlAttar A, Chappell D, Kormushev P, 2022,

Kinematic-model-free predictive control for robotic manipulator target reaching with obstacle avoidance

, Frontiers in Robotics and AI, Vol: 9, Pages: 1-9, ISSN: 2296-9144

Model predictive control is a widely used optimal control method for robot path planning and obstacle avoidance. This control method, however, requires a system model to optimize control over a finite time horizon and possible trajectories. Certain types of robots, such as soft robots, continuum robots, and transforming robots, can be challenging to model, especially in unstructured or unknown environments. Kinematic-model-free control can overcome these challenges by learning local linear models online. This paper presents a novel perception-based robot motion controller, the kinematic-model-free predictive controller, that is capable of controlling robot manipulators without any prior knowledge of the robot’s kinematic structure and dynamic parameters and is able to perform end-effector obstacle avoidance. Simulations and physical experiments were conducted to demonstrate the ability and adaptability of the controller to perform simultaneous target reaching and obstacle avoidance.

Conference paper

Cursi F, Chappell D, Kormushev P, 2022,

Augmenting loss functions of feedforward neural networks with differential relationships for robot kinematic modelling

, Ljubljana, Slovenia, 20th International Conference on Advanced Robotics (ICAR), Publisher: IEEE, Pages: 201-207

Model learning is a crucial aspect of robotics as it enables the use of traditional and consolidated model-based controllers to perform desired motion tasks. However, due to the increasing complexity of robotic structures, modelling robots is becoming more and more challenging, and analytical models are very difficult to build, particularly for redundant robots. Machine learning approaches have shown great capabilities in learning complex mappings and have widely been used in robot model learning and control. Generally, inverse kinematics is learned, directly obtaining the desired control commands given a desired task. However, learning forward kinematics is simpler, allows the computation of the robot Jacobian, and enables the exploitation of the optimality of controllers. Nevertheless, typical learning methods have no knowledge about the differential relationship between the position and velocity mappings. In this work, we present two novel loss functions to train feedforward Artificial Neural Networks (ANNs) which incorporate this information in learning the forward kinematic model of robotic structures, and carry out a comparison with standard ANN training using position data only. Simulation results show that incorporating the knowledge of the velocity mapping improves the suitability of the learnt model for control tasks.

Conference paper

Cursi F, Kormushev P, 2021,

Pre-operative offline optimization of insertion point location for safe and accurate surgical task execution

, Prague, Czech Republic, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2021), Publisher: IEEE, Pages: 4040-4047

In robotically assisted surgical procedures the surgical tool is usually inserted in the patient’s body through a small incision, which acts as a constraint for the motion of the robot, known as Remote Center of Motion (RCM). The location of the insertion point on the patient’s body has huge effects on the performances of the surgical robot. In this work we present an offline pre-operative framework to identify the optimal insertion point location in order to guarantee accurate and safe surgical task execution. The approach is validated using a serial-link manipulator in conjunction with a surgical robotic tool to perform a tumor resection task, while avoiding nearby organs. Results show that the framework is capable of identifying the best insertion point ensuring high dexterity, high tracking accuracy, and safety in avoiding nearby organs.
