Human knowledge can be incorporated into a learning model in diverse ways, e.g., by learning from expert data in imitation learning or by reward engineering in deep reinforcement learning. In many applications, however, the expert data covers only part of the search space, namely "normal" behaviors and scenarios. Learning a driving policy from such a limited dataset leaves it vulnerable to novel or out-of-distribution (OOD) inputs and can therefore produce overconfident, dangerous actions. In this micro-project, we focused on learning a policy from expert training data while allowing it to go beyond that data by interacting with a learned environment dynamics model and accounting for uncertainty in the state estimates provided by virtual sensors. To avoid a dramatic distribution shift, we used the uncertainty of the learned environment dynamics to penalize the policy in states that deviate from human behavior.
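As a rough illustration of such an uncertainty penalty, the sketch below shows one common way to implement it (in the style of MOPO-like model-based offline RL): the disagreement among an ensemble of learned one-step dynamics models is used as an uncertainty estimate and subtracted from the predicted reward. All names here (DynamicsEnsemble, the penalty weight LAMBDA) are illustrative assumptions, not the project's actual implementation.

```python
import numpy as np

LAMBDA = 1.0  # assumed weight of the uncertainty penalty (hypothetical)

class DynamicsEnsemble:
    """Ensemble of learned one-step dynamics models.

    Each member maps (state, action) -> (next_state, reward).
    The members are placeholders here; in practice they would be
    neural networks trained on the expert dataset.
    """

    def __init__(self, members):
        self.members = members  # list of callables (s, a) -> (s', r)

    def step(self, state, action):
        preds = [m(state, action) for m in self.members]
        next_states = np.stack([p[0] for p in preds])
        rewards = np.array([p[1] for p in preds])
        # Disagreement between ensemble members serves as an
        # uncertainty estimate: it grows on out-of-distribution
        # (state, action) pairs far from the expert data.
        uncertainty = next_states.std(axis=0).mean()
        # Penalizing the reward discourages the policy from
        # drifting into states the model is unsure about.
        penalized_reward = rewards.mean() - LAMBDA * uncertainty
        return next_states.mean(axis=0), penalized_reward

# Toy usage: two linear models that disagree slightly on the dynamics.
m1 = lambda s, a: (s + a, 1.0)
m2 = lambda s, a: (s + 1.1 * a, 1.0)
ensemble = DynamicsEnsemble([m1, m2])
s_next, r = ensemble.step(np.array([0.0]), np.array([1.0]))
print(s_next, r)  # reward is reduced in proportion to the disagreement
```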
Output
This HumanE-AI-Net micro-project was carried out by the German Research Center for Artificial Intelligence (DFKI; Christian Müller) and Volkswagen AG (Andrii Kleshchonok).