Citation

BibTeX format

@article{Wang:2024:10.1109/TASE.2024.3432405,
author = {Wang, Y and Boyle, D},
doi = {10.1109/TASE.2024.3432405},
journal = {IEEE Transactions on Automation Science and Engineering},
title = {Constrained reinforcement learning using distributional representation for trustworthy quadrotor UAV tracking control},
url = {http://dx.doi.org/10.1109/TASE.2024.3432405},
year = {2024}
}

RIS format (EndNote, RefMan)

TY  - JOUR
AB  - Simultaneously accurate and reliable tracking control for quadrotors in complex dynamic environments is challenging. The chaotic nature of aerodynamics, derived from drag forces and moment variations, makes precise identification difficult. Consequently, many existing quadrotor tracking systems treat these aerodynamic effects as simple ‘disturbances’ in conventional control approaches. We propose a novel and interpretable trajectory tracker integrating a distributional Reinforcement Learning (RL) disturbance estimator for unknown aerodynamic effects with a Stochastic Model Predictive Controller (SMPC). Specifically, the proposed estimator ‘Constrained Distributional REinforced-Disturbance-estimator’ (ConsDRED) effectively identifies uncertainties between the true and estimated values of aerodynamic effects. Control parameterization employs simplified affine disturbance feedback to ensure convexity, which is seamlessly integrated with the SMPC. We theoretically guarantee that ConsDRED achieves an optimal global convergence rate, and sublinear rates if constraints are violated with certain error decreases as neural network dimensions increase. To demonstrate practicality, we show convergent training, in simulation and real-world experiments, and empirically verify that ConsDRED is less sensitive to hyperparameter settings compared with canonical constrained RL. Our system substantially improves accumulative tracking errors by at least 70%, compared with the recent art. Importantly, the proposed ConsDRED-SMPC framework balances the trade-off between pursuing high performance and obeying conservative constraints for practical implementations. Note to Practitioners —This work is motivated by challenges in training Reinforcement Learning (RL) for autonomous navigation in unmanned aerial vehicles, but its implications extend to other high-criticality applications in, for example, healthcare and financial services. The implementation of RL algo
AU  - Wang,Y
AU  - Boyle,D
DO  - 10.1109/TASE.2024.3432405
PY  - 2024///
SN  - 1545-5955
TI  - Constrained reinforcement learning using distributional representation for trustworthy quadrotor UAV tracking control
T2  - IEEE Transactions on Automation Science and Engineering
UR  - http://dx.doi.org/10.1109/TASE.2024.3432405
UR  - https://ieeexplore.ieee.org/document/10614102
UR  - http://hdl.handle.net/10044/1/114780
ER  - 
