Current online reinforcement learning algorithms struggle to utilize large and diverse datasets. In contrast, offline reinforcement learning algorithms are designed to learn from such previously collected data, which paves the way for data-driven reinforcement learning. With offline reinforcement learning, it becomes feasible to apply reinforcement learning in domains where interaction with the environment is costly, such as healthcare or autonomous driving. For this reason, we tested one of the latest offline reinforcement learning algorithms, Conservative Q-Learning (CQL), in the autonomous driving environments CarRacing-v0 and CARLA. We evaluated the performance of CQL on different datasets with different values of α, the parameter that controls the conservatism of the algorithm. In doing so, we tested the hypothesis that higher α values perform better as dataset quality increases, whereas lower α values perform better as dataset quality decreases. To this end, we created expert datasets with excellent trajectories and imperfect datasets with noisy trajectories. Furthermore, we compared the performance of CQL with that of behavior cloning and of the state-of-the-art online reinforcement learning algorithm Soft Actor-Critic (SAC).
Mohammd Karam Daaboul
Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB)
Angewandte Technisch-Kognitive Systeme