

Coping with the variability of human feedback during interactive learning through ensemble reinforcement learning
This project concerns online behavioral adaptation of robots during interactive learning with humans.
During human-robot interaction in a cooperative context, both the human and the robot may display behavioral adaptation and learning in response to their partner's behavior. The goal is to enable robots to adapt online to such dynamic and interactive situations. In particular, robots shall adapt to each human subject’s specific way of giving feedback during the interaction. Feedback here includes reward, instruction, and demonstration, and can be grouped under the term “teaching signals”.
As an example, when exchanging objects reachable by only one partner (either the human or the robot) in order to place them in various boxes, some human subjects prefer a proactive robot while others prefer the robot to wait for their instructions; some humans only tell the robot when it performs a wrong action, remaining silent when it acts correctly, while others reward every correct action, and so on.
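Such feedback profiles can be expressed as simple teacher models mapping the correctness of a robot action to a scalar teaching signal. The following Python sketch is purely illustrative; the profile names and signal values are our own assumptions, not the project's actual simulation code.

    import random

    def punish_only(correct: bool) -> float:
        """Hypothetical profile: punishes errors, stays silent on correct actions."""
        return -1.0 if not correct else 0.0

    def reward_each_correct(correct: bool) -> float:
        """Hypothetical profile: rewards every correct action, silent on errors."""
        return 1.0 if correct else 0.0

    def stochastic_feedback(correct: bool, p_feedback: float = 0.6) -> float:
        """Hypothetical profile: gives informative feedback only with probability p_feedback."""
        if random.random() >= p_feedback:
            return 0.0  # the teacher stays silent on this trial
        return 1.0 if correct else -1.0

A robot that assumes silence means "wrong" will misread the first profile, while one that assumes silence means "right" will misread the second; this is precisely the variability the architecture described below must cope with.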
The proposed research strategy consists in endowing robots with a cognitive architecture composed of an ensemble of machine learning methods, treated as potential tools that the robot can autonomously select when it deems them appropriate during different phases of the human-robot interaction. In particular, we propose to combine model-based and model-free reinforcement learning, coordinated by a meta-controller which monitors their respective performance and arbitrates between them. Additionally, the architecture shall enable the robot to learn models of various human feedback strategies and use them to tune reinforcement learning online, so that the robot can quickly adapt its behavioral policy.
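A minimal Python sketch of this arbitration scheme is given below. It assumes a hypothetical common expert interface (act, predict_reward, learn) and uses a moving average of reward-prediction error as the arbitration criterion; the criterion actually used in the project may differ. A model-based expert (learning a transition and reward model and planning over it) would plug in through the same interface as the tabular Q-learner shown here.

    import numpy as np

    class QLearner:
        """Minimal tabular model-free expert (Q-learning)."""
        def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, eps=0.1):
            self.q = np.zeros((n_states, n_actions))
            self.alpha, self.gamma, self.eps = alpha, gamma, eps

        def act(self, s):
            if np.random.rand() < self.eps:          # epsilon-greedy exploration
                return np.random.randint(self.q.shape[1])
            return int(np.argmax(self.q[s]))

        def predict_reward(self, s, a):
            return self.q[s, a]                      # crude proxy for the expected outcome

        def learn(self, s, a, r, s2):
            target = r + self.gamma * np.max(self.q[s2])
            self.q[s, a] += self.alpha * (target - self.q[s, a])

    class MetaController:
        """Monitors each expert's recent reward-prediction error and
        hands control to the currently most reliable expert."""
        def __init__(self, experts, decay=0.9):
            self.experts = experts                   # dict: name -> expert
            self.avg_err = {name: 0.0 for name in experts}
            self.decay = decay

        def select(self, s):
            best = min(self.avg_err, key=self.avg_err.get)
            return best, self.experts[best].act(s)

        def update(self, s, a, r, s2):
            # All experts learn from every transition; the meta-controller
            # tracks how well each of them predicted the obtained reward.
            for name, expert in self.experts.items():
                err = abs(r - expert.predict_reward(s, a))
                self.avg_err[name] = self.decay * self.avg_err[name] + (1 - self.decay) * err
                expert.learn(s, a, r, s2)

Arbitration by prediction reliability mirrors the performance monitoring described above: when the situation is well captured by a learned model, the model-based expert tends to win control, while under sudden change the more reactive model-free expert can take over.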
Main results of the micro-project
We designed a new ensemble learning algorithm, combining model-based and model-free reinforcement learning, for on-the-fly robot adaptation during human-robot interaction. The algorithm includes a mechanism for the robot to autonomously detect changes in a human's reward function from the human's observed behavior, and to reset the ensemble learning accordingly. We simulated a series of human-robot interaction scenarios to test the robustness of the algorithm. In scenario 1, the human rewards the robot with various feedback profiles: stochastic reward, non-monotonic reward, or punishment for errors without reward for correct responses. In scenario 2, the human teaches the robot through demonstrations, again with different degrees of stochasticity and levels of human expertise. In scenario 3, we simulated a human-robot cooperation task in which a set of cubes must be put in the right box; the task includes abrupt changes in the target box. Results across these scenarios show the generality of the algorithm.
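The change-detection-and-reset mechanism can be illustrated with a minimal sketch. We assume a CUSUM-style test on the discrepancy between the reward predicted by the robot's current estimates and the feedback actually received; the detection statistic, threshold, and reset policy below are our own assumptions, not necessarily those of the implemented algorithm.

    class RewardChangeDetector:
        """Accumulates reward-prediction surprise; fires when it exceeds
        a threshold, signalling a likely change in the human's reward
        function. Threshold and drift values are illustrative."""
        def __init__(self, threshold=5.0, drift=0.1):
            self.cusum = 0.0
            self.threshold = threshold
            self.drift = drift

        def observe(self, predicted_reward, obtained_reward):
            surprise = abs(obtained_reward - predicted_reward)
            # CUSUM update: accumulate surprise beyond the allowed drift.
            self.cusum = max(0.0, self.cusum + surprise - self.drift)
            if self.cusum > self.threshold:
                self.cusum = 0.0
                return True   # caller should reset the ensemble learners
            return False

When the detector fires, the robot would reinitialize the experts' value estimates and the meta-controller's error traces, allowing it to quickly re-learn the new reward function instead of slowly unlearning the old one.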
How do the results contribute to the objectives of Humane-AI-net?
Humans and robots are bound to cooperate more and more within society. This micro-project addresses a major AI challenge: enabling robots to adapt on-the-fly to different situations and to human users with varying levels of expertise. The solution consists in designing a robot learning algorithm which generalizes to a variety of simple human-robot interaction scenarios. Following the HumanE AI vision, interactive learning puts the human in the loop, prompting human-aware robot behavioral adaptation.
This Humane-AI-Net micro-project was carried out by Sorbonne Université (Mohamed Chetouani and Mehdi Khamassi) and ATHINA (Petros Maragos).