Evaluating the Robustness of HJB Optimal Feedback Control
Conference: ISR 2020 - 52nd International Symposium on Robotics
12/09/2020 - 12/10/2020, online
Proceedings: ISR 2020
Pages: 8
Language: English
Type: PDF
Authors:
Lutter, Michael; Belousov, Boris (Computer Science Department, Technical University of Darmstadt, Darmstadt, Germany)
Clever, Debora (Computer Science Department, Technical University of Darmstadt, Darmstadt, Germany & ABB Corporate Research Center Ladenburg, Ladenburg, Germany)
Listmann, Kim (ABB Future Labs Switzerland, Baden-Dättwil, Switzerland)
Peters, Jan (Computer Science Department, Technical University of Darmstadt, Darmstadt, Germany & Robot Learning Group, Max Planck Institute for Intelligent Systems, Tübingen, Germany)
Abstract:
Developing and tuning a feedback controller, including dynamics compensation, is a challenging but essential part of many control applications. Describing a control task by a cost function and learning the corresponding optimal controller can simplify this development. However, the currently popular deep reinforcement learning methods for obtaining such controllers automatically do not yield good control policies with respect to the required smoothness, generalization, and robustness to parameter uncertainty of the approximate dynamics model. In this paper, we describe HJB optimal control (HJBopt), a different approach that obtains optimal feedback policies by optimization rather than by repeatedly sampling actions. This approach minimizes the residual of the Hamilton-Jacobi-Bellman differential equation on the complete state domain to obtain an optimal value function, which directly implies a continuous-time optimal policy on the complete state domain. The experiments show that HJBopt learns a good approximation of the optimal policy and that this approximation exhibits much better smoothness and generalization than the deep reinforcement learning baselines. In addition, we show empirically that these characteristics enable HJBopt to obtain much more robust policies than these baselines.
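The idea of fitting a value function by minimizing the Hamilton-Jacobi-Bellman residual over the state domain can be illustrated on a toy problem. The sketch below is not the paper's implementation. It assumes a hypothetical scalar linear system x' = a*x + b*u with running cost q*x^2 + r*u^2, an undiscounted infinite horizon, and a quadratic value function V(x) = p*x^2 with a single parameter p. All names (`policy`, `hjb_residual`, `fit_value_parameter`) and all constants are illustrative assumptions. In this special case the result can be compared with the closed-form solution of the scalar Riccati equation.

```python
import numpy as np

# Hypothetical scalar system x' = a*x + b*u with running cost q*x^2 + r*u^2.
# These constants are illustrative assumptions, not values from the paper.
a, b, q, r = 1.0, 1.0, 1.0, 1.0


def policy(p, x):
    # Action that minimizes the HJB right-hand side for V(x) = p*x^2,
    # i.e. u = -b * V'(x) / (2r) with V'(x) = 2*p*x.
    return -b * (2.0 * p * x) / (2.0 * r)


def hjb_residual(p, x):
    # HJB residual q*x^2 + r*u^2 + V'(x) * (a*x + b*u) under the implied policy.
    u = policy(p, x)
    return q * x**2 + r * u**2 + 2.0 * p * x * (a * x + b * u)


def fit_value_parameter(p0=2.0, lr=0.1, steps=500):
    # Gradient descent on the mean squared residual over a grid of states.
    xs = np.linspace(-1.0, 1.0, 101)  # grid covering the assumed state domain
    p = p0
    for _ in range(steps):
        res = hjb_residual(p, xs)
        # Analytic derivative of the residual with respect to p.
        dres_dp = xs**2 * (2.0 * a - 2.0 * b**2 * p / r)
        grad = np.mean(2.0 * res * dres_dp)
        p -= lr * grad
    return p


p_fit = fit_value_parameter()
# Closed-form stabilizing solution of the scalar Riccati equation, for comparison.
p_riccati = r * (a + np.sqrt(a**2 + b**2 * q / r)) / b**2
print(p_fit, p_riccati)
```

The residual also vanishes at a second, non-stabilizing root, so the initial guess `p0` is chosen here on the side of the stabilizing one. A general method would need a richer function approximator and some way to select the correct solution.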