

Charles University
Research and development of a scalable human-machine collaboration system with the goal of executing high-quality actions together
We started to develop a scalable human-machine collaboration system with the goal of executing high-quality actions together (e.g., in rehabilitation exercises). We extracted information from video to support video-based, goal-oriented dialogue, building on a combined video and text database. We also evaluated high-performance body position estimation devices and assessed their real-time estimation capabilities on smartphones and cloud services.
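As a rough illustration of the kind of real-time body position estimation evaluated here, the sketch below runs single-frame 3D pose estimation from a 2D camera image. The choice of MediaPipe Pose is an assumption for illustration; the project summary does not name the specific devices or libraries that were compared.

```python
# Minimal sketch: 3D body pose estimation from a single 2D camera frame,
# using MediaPipe Pose as one example of a smartphone-capable estimator.
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

def estimate_pose(image_path: str):
    """Return 2D image-space and approximate 3D world landmarks for one frame."""
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    # MediaPipe expects RGB input; OpenCV loads images as BGR.
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    with mp_pose.Pose(static_image_mode=True, model_complexity=2) as pose:
        result = pose.process(rgb)
    if result.pose_landmarks is None:
        return None, None
    return result.pose_landmarks, result.pose_world_landmarks

# "exercise_frame.jpg" is a placeholder file name.
landmarks_2d, landmarks_3d = estimate_pose("exercise_frame.jpg")
```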
An additional part of the project focused on the texts collected for the exercises, which we separated into chat-like dialogues and protocol-related dialogues. For the former, we used crowdsourcing and estimated the amount of text needed for the dialogues; this can lead to machine assistance that improves exercise practice and predicts its quality. The protocol-related part of the dialogues concerned navigation: guiding the human partner to move to the optimal position and to take the optimal pose for the recording.
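The separation of chat-like and protocol-related turns can be pictured as a simple text classification step. The sketch below uses a TF-IDF + logistic regression pipeline purely as an illustration; the labels and example utterances are hypothetical and do not come from the project's crowdsourced data.

```python
# Minimal sketch: separating chat-like from protocol-related dialogue turns.
# Training examples and labels are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_turns = [
    "How are you feeling today?",                      # chat-like
    "That exercise went really well!",                 # chat-like
    "Please step one meter to the left.",              # protocol-related
    "Turn to face the camera and raise both arms.",    # protocol-related
]
train_labels = ["chat", "chat", "protocol", "protocol"]

classifier = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
classifier.fit(train_turns, train_labels)

print(classifier.predict(["Move closer to the marked spot on the floor."]))
```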
Human-machine collaboration will soon be ubiquitous, as machines can help in everyday life. Spatial tasks, however, are challenging because of real-time constraints. We therefore optimize the interaction offline, before it happens in real time, to ensure high quality. We present the SPAtial TAsk (SPATA) framework. SPATA is modular, and here we address two connected components: body pose optimization and navigation. Our experiments show that 3D pose estimation using 2D cameras is accurate only when the motion is captured from the right direction and distance. This limitation currently restricts us to simple forms of movement, such as those used in physical rehabilitation exercises. Accurate estimation requires (a) estimation of body size, (b) optimization of body and camera position, (c) navigation assistance to the selected location, and (d) activity capture and error estimation. For (a), an avatar model is used to estimate the body shape and a skeleton model to estimate the body pose. For (b), we use SLAM. For (c), we use a semantic map and optimize a minimal NLP system for human needs, which we test. Finally, for (d), we estimate the accuracy of the motion and propose a visual comparison between the planned and the executed motion pattern. Our SPATA framework is useful for various tasks at home, in gyms, and in other spatial applications; depending on the task, different components can be integrated.
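To make step (d) concrete, the sketch below scores how closely an executed motion follows a planned one by aligning the two pose sequences with dynamic time warping and averaging per-joint distances. The error measure, joint count, and sequence shapes are assumptions for illustration; the summary does not specify the framework's actual comparison metric.

```python
# Minimal sketch of step (d): comparing a planned motion pattern with the
# executed one via a DTW-aligned mean per-joint position error.
import numpy as np

def joint_distance(pose_a: np.ndarray, pose_b: np.ndarray) -> float:
    """Mean Euclidean distance between corresponding joints of two poses (J, 3)."""
    return float(np.linalg.norm(pose_a - pose_b, axis=1).mean())

def dtw_motion_error(planned: np.ndarray, executed: np.ndarray) -> float:
    """DTW-aligned average pose error between two sequences of shape (T, J, 3)."""
    n, m = len(planned), len(executed)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = joint_distance(planned[i - 1], executed[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] / max(n, m)

# Placeholder sequences: 50 / 60 frames, 17 joints, 3D coordinates.
planned = np.random.rand(50, 17, 3)
executed = np.random.rand(60, 17, 3)
print(f"DTW motion error: {dtw_motion_error(planned, executed):.3f}")
```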
This Humane-AI-Net micro-project was carried out by Eötvös Loránd University (ELTE, András Lőrincz) and Charles University Prague (CU, Ondřej Dušek).
Tangible outputs: