We present the first prototype implementation of tools for (a) single target behaviour recognition and (b) group-level behaviour analysis. For single target behaviour recognition, we propose a novel method that combines a new spatial transformer with unsupervised domain adaptation. We then approach group-level behaviour analysis from two perspectives: (i) gaze target detection and (ii) group detection. All details are available in D4.3 on our results page.

Regarding single target behaviour recognition, we have introduced a novel method based on spatial transformers and unsupervised domain adaptation. Domain adaptation is important for handling the domain-shift problem that arises when the training and test domains come from different distributions. The performance of this module was evaluated on publicly available datasets corresponding to realistic environments. This method will be integrated into ARI in the future and will then be tested in human-robot interaction scenarios.
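As an illustration of how such an architecture can be organised, the sketch below combines a spatial transformer front-end with adversarial unsupervised domain adaptation via gradient reversal, which is one common way to realise unsupervised domain adaptation. The layer sizes, the gradient-reversal formulation and the class/domain heads are assumptions made for this example and are not taken from D4.3.

```python
# Minimal sketch (PyTorch): spatial transformer front-end plus a gradient-reversal
# branch for unsupervised domain adaptation. Sizes assume 3x64x64 input crops and
# are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses (and scales) gradients on the way back."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class STNBehaviourNet(nn.Module):
    def __init__(self, num_classes=10, num_domains=2):
        super().__init__()
        # Localisation network predicting a 2x3 affine transform (spatial transformer).
        self.loc = nn.Sequential(
            nn.Conv2d(3, 8, 7), nn.MaxPool2d(2), nn.ReLU(),
            nn.Conv2d(8, 10, 5), nn.MaxPool2d(2), nn.ReLU(),
        )
        self.loc_fc = nn.Sequential(
            nn.Linear(10 * 12 * 12, 32), nn.ReLU(), nn.Linear(32, 6)
        )
        # Initialise the predicted transform to the identity.
        self.loc_fc[-1].weight.data.zero_()
        self.loc_fc[-1].bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))
        # Shared feature extractor, behaviour classifier and domain classifier.
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8), nn.Flatten(),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)
        self.domain_head = nn.Linear(32 * 8 * 8, num_domains)

    def forward(self, x, lambd=1.0):
        # Spatial transformer: predict and apply an affine warp to the input.
        theta = self.loc_fc(torch.flatten(self.loc(x), 1)).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        x = F.grid_sample(x, grid, align_corners=False)
        feats = self.features(x)
        class_logits = self.classifier(feats)
        # The domain branch sees reversed gradients, pushing the features to be
        # domain-invariant between the labelled source and unlabelled target data.
        domain_logits = self.domain_head(GradReverse.apply(feats, lambd))
        return class_logits, domain_logits


# Example: class and domain logits for a batch of 64x64 crops.
class_logits, domain_logits = STNBehaviourNet()(torch.randn(4, 3, 64, 64))
```

During training, the classification loss would be computed on labelled source data only, while the domain loss uses both source and (unlabelled) target data, which is what makes the adaptation unsupervised.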
Regarding group-level behaviour analysis, we have proposed two methods: one concerns gaze target detection and the other concerns group detection. In the future, both will serve as important cues for understanding and detecting group membership and joint group behaviour. Gaze target detection, which is implemented as a multi-modal network using depth and scene images, was tested on publicly available datasets and has already been integrated into ARI; a minimal sketch of such a two-branch fusion network is given below. It will be further improved to handle the domain-shift problem, and then tested in human-robot interaction scenarios as well, most importantly on the dataset collected by the SPRING project. The group detection module is adapted from a state-of-the-art method. It will in particular allow us to correctly detect conversational groups, detect which group is interacting with ARI, and identify the people with whom ARI should interact. We also expect it to improve human-aware navigation, since it should allow modelling group spaces, which can be used to navigate ARI so that it joins groups correctly without interrupting others. The group and individual activity detection modules will be improved by including the objects detected in the scene. The method presented in this study showed promising results on publicly available datasets.
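The sketch below illustrates one way a multi-modal gaze target detection network can fuse scene (RGB) and depth inputs into a gaze heatmap. The encoder/decoder layout, tensor sizes and fusion by channel concatenation are assumptions made for this example; they are not the exact design of the network integrated into ARI.

```python
# Minimal sketch (PyTorch) of a two-branch network fusing scene (RGB) and depth
# images into a single-channel gaze target heatmap. Layout and sizes are illustrative.
import torch
import torch.nn as nn


def encoder(in_channels):
    # Small convolutional encoder, one instance per modality.
    return nn.Sequential(
        nn.Conv2d(in_channels, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    )


class GazeTargetNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.scene_enc = encoder(3)   # RGB scene image branch
        self.depth_enc = encoder(1)   # depth map branch
        # Decoder fuses the concatenated features and upsamples to a heatmap.
        self.decoder = nn.Sequential(
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(16, 1, 1),  # one-channel gaze target heatmap
        )

    def forward(self, scene_rgb, depth):
        fused = torch.cat([self.scene_enc(scene_rgb), self.depth_enc(depth)], dim=1)
        return self.decoder(fused)


# Example: a 224x224 scene image and an aligned depth map yield a 224x224 heatmap.
heatmap = GazeTargetNet()(torch.randn(1, 3, 224, 224), torch.randn(1, 1, 224, 224))
```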