ARI is a high-performance robotic platform designed for a wide range of multimodal expressive gestures and behaviours, making it the ideal social robot for Human-Robot Interaction, perception, cognition and navigation. Its behaviour can be customised through the ROS (Robot Operating System) API and the provided easy-to-use web interface.

General description

ARI-SPRING is an autonomous social humanoid robot whose basic features are:

  • A wheeled mobile base
  • 2 arms, each with a shoulder, an elbow and a hand
  • 1 torso with touchscreen
  • 1 head
  • Height: 160 cm
  • Width: 53 cm
  • Depth: 75 cm
  • Weight: 60 kg

It includes:

  • Vision system: multiple cameras enabling the development of complex visual perception: an RGB camera in the head, fisheye RGB cameras on the front and back of the torso, and a stereo camera on the back;
  • Mobility system: the robot can move on flat ground with its differential drive mobile base, overcome small obstacles and climb small ramps;
  • Social interaction system: text-to-speech, arm gestures, face detection, speech recognition;
  • Navigation system: self-localisation and mapping using an RGB-D sensor on the torso;
  • Web-based interface: integrated in the 10-inch touchscreen and also available on external PCs, tablets or mobile phones.
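The mobility item above mentions a differential drive base. As a minimal sketch of standard differential-drive kinematics (a textbook model, not PAL Robotics' controller code), the functions below convert between wheel angular speeds and body velocities and integrate the resulting planar pose. The wheel radius and wheel separation are hypothetical placeholder values, not ARI's specifications.

```python
import math

# Hypothetical geometry for illustration only; NOT ARI's actual dimensions.
WHEEL_RADIUS_M = 0.08
WHEEL_SEPARATION_M = 0.40


def wheels_to_body(omega_left, omega_right, r=WHEEL_RADIUS_M, b=WHEEL_SEPARATION_M):
    """Forward kinematics: wheel speeds (rad/s) -> (linear m/s, angular rad/s)."""
    v = r * (omega_right + omega_left) / 2.0
    w = r * (omega_right - omega_left) / b
    return v, w


def body_to_wheels(v, w, r=WHEEL_RADIUS_M, b=WHEEL_SEPARATION_M):
    """Inverse kinematics: body velocity command -> wheel speeds (rad/s)."""
    omega_left = (v - w * b / 2.0) / r
    omega_right = (v + w * b / 2.0) / r
    return omega_left, omega_right


def integrate_pose(x, y, theta, v, w, dt):
    """Advance a planar pose under constant (v, w) for dt seconds (exact arc)."""
    if abs(w) < 1e-9:
        # Straight-line motion: avoid dividing by a (near) zero turn rate.
        return x + v * dt * math.cos(theta), y + v * dt * math.sin(theta), theta
    new_theta = theta + w * dt
    x += v / w * (math.sin(new_theta) - math.sin(theta))
    y -= v / w * (math.cos(new_theta) - math.cos(theta))
    return x, y, new_theta


if __name__ == "__main__":
    wl, wr = body_to_wheels(0.5, 0.2)
    print("wheel speeds (rad/s):", wl, wr)
    print("recovered body velocity:", wheels_to_body(wl, wr))
```

On a ROS-based robot the body velocities (v, w) typically correspond to the `linear.x` and `angular.z` fields of a `geometry_msgs/Twist` velocity command, with the wheel-level conversion handled by the base controller rather than by application code.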

Usage requirements (vision, audio)

The SPRING project heavily relies on the robot’s auditory and visual ability. Some of the objectives of the project are the following:

  1. Specific Objective 1.2: Visual detection, localisation and tracking of several persons and objects over long periods of time; audio localisation, tracking, diarisation and enhancement of multiple speakers in adverse acoustic conditions (environmental noise and reverberant rooms).
  2. Specific Objective 1.4: Human behaviour feature detection, e.g. head and eye gaze, facial expressions and body poses, as well as physiological features (heart rate and breathing rhythm) measured from combined audio and visual features.
  3. Specific Objective 2.1: The robot will be able to hold a conversation with several people at the same time.

Requirements for the visual and auditory architecture of the robot have been gathered through discussions with the SPRING partners, in particular Bar-Ilan University (for audio) and CVUT, UNITN and INRIA (for vision). To select the best robot architecture, the partners aimed to satisfy as many requirements as possible. The most prominent are:

For vision:

  • Camera(s) for human behaviour analysis and robot localisation, providing high-quality facial images of several users engaged in human-human and human-robot interaction, with a resolution good enough for facial expression, hand gesture and body gesture analysis. Specifically:
    • A minimum framerate of 30 fps is essential to capture the slightest movements of facial features for facial expression recognition.
    • The camera must have a tunable exposure (aperture) time to avoid motion artifacts. In this respect, low-light conditions may be handled better by the network than strong motion artifacts.
    • A resolution of at least 2 MP (1920×1080) should, under normal conditions, ensure good face definition at a distance of 1.5–2 m.
  • Camera for obstacle avoidance
  • Camera for localisation
  • Camera for self-charging
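The framerate, exposure and resolution requirements above can be related with simple pinhole-camera arithmetic. The sketch below estimates how many pixels a face spans at 1.5–2 m on a 1920-pixel-wide image, and how many pixels a moving face smears across during one exposure (at 30 fps the exposure cannot exceed about 33 ms). The 60° horizontal field of view and 15 cm face width are hypothetical assumptions, not the measured parameters of ARI's cameras, and the model ignores lens distortion, so it does not apply to the fisheye cameras.

```python
import math

# Assumptions (hypothetical, for illustration only): a pinhole camera with a
# 60-degree horizontal field of view and a 15 cm wide face. These are not
# ARI's measured camera parameters.
HFOV_DEG = 60.0
FACE_WIDTH_M = 0.15
IMAGE_WIDTH_PX = 1920  # horizontal resolution of a 2 MP (1920x1080) sensor


def pixels_per_metre(distance_m, hfov_deg=HFOV_DEG, image_width_px=IMAGE_WIDTH_PX):
    """Horizontal image pixels per metre of scene at the given distance."""
    scene_width_m = 2.0 * distance_m * math.tan(math.radians(hfov_deg) / 2.0)
    return image_width_px / scene_width_m


def face_width_px(distance_m, face_width_m=FACE_WIDTH_M):
    """Approximate number of pixels spanned horizontally by a face."""
    return face_width_m * pixels_per_metre(distance_m)


def motion_blur_px(speed_m_s, exposure_s, distance_m):
    """Pixels smeared by lateral motion at speed_m_s during one exposure."""
    return speed_m_s * exposure_s * pixels_per_metre(distance_m)


if __name__ == "__main__":
    frame_period_s = 1.0 / 30.0  # at 30 fps, exposure cannot exceed ~33 ms
    for d in (1.5, 2.0):
        print(f"{d} m: face ~{face_width_px(d):.0f} px wide")
    # A head moving sideways at 0.5 m/s, seen from 2 m:
    print(f"blur at {frame_period_s * 1000:.1f} ms: {motion_blur_px(0.5, frame_period_s, 2.0):.1f} px")
    print(f"blur at 5.0 ms: {motion_blur_px(0.5, 0.005, 2.0):.1f} px")
```

Under these assumptions a face spans roughly 166 px at 1.5 m and 125 px at 2 m, while a head moving at 0.5 m/s smears across about 14 px with a full 33 ms exposure but only about 2 px at 5 ms, which illustrates why a tunable exposure time matters.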

For audio:

Contact us for more details!