Section: Research Program

Natural Interaction with Robotic Systems

Scientific Context

Interaction with the environment is a fundamental requirement for an autonomous robot: the robot must rely on measurements from its onboard sensors and, when available, can benefit from exteroceptive sensors distributed in the environment (e.g., external cameras, motion detectors, beacons) in order to model its surroundings and plan its actions according to its state. In this sense, interaction with the environment also includes interaction between the robot and a sensorized environment (sometimes called “smart”, “connected”, or “robotized”), as well as interaction between humans and this robotic environment. Making decisions when multiple sensors are spread across such environments is still an open question. In many applications, the robot must be able to localize itself while moving, and the environment must fuse the information from its multiple distributed sensors to track robots and humans, analyze their actions and predict their intent.

Predicting the evolution of the environment and of the different agents (robots and humans) that populate it is of primary importance for making good decisions in dynamic environments. However, this remains a challenging problem, notably because we lack robust predictive models of human behavior. Using environmental sensors capable of extracting the main human social or physical signals (e.g., posture or gaze) is one way to simplify the problem for a robot. Combining information from different sensors and viewpoints helps robots understand complex scenes, but it often significantly increases the complexity of the data and of the representations of the environment that can be built from them. At the same time, we aim to control robots or mobile sensors, which means deciding, at each time instant, what to do. A critical constraint is uncertainty, which arises both from incomplete knowledge of the environment and of the other agents (typically humans) sharing it, and from the intrinsic noise of sensors and actuators.

When working close to or directly with humans, robots must be able to interact with them safely, which calls for a mixture of physical and social skills. In particular, robots working outside laboratories must exhibit the social skills needed to interact with people who are not robotics experts. People operating industrial robots are usually specialized operators who receive proper training in programming and operating the machines [31]. In contrast, the potential end-users of robots for service or personal assistance are usually not familiar with new technologies and robots [46]. To introduce robots in these contexts, the robot must be accepted as a reliable, trustworthy and efficient partner; it must be usable by people who are not skilled robotics experts [58], and therefore be endowed with the necessary social skills; and it must be capable of interacting physically with humans, which calls upon its online learning, control and adaptation skills. Despite the growing interest of the robotics community in physical Human-Robot Interaction (HRI) [34] and in social and collaborative HRI [56], [51], there are few examples in the literature of incorporating human signals into the control of movement and interaction forces. There are also very few examples of whole-body control of robot movement that takes human feedback into account [39]. In psychology, there is a substantial literature analyzing the social and cognitive aspects of interaction [36], [56]. Unfortunately, as discussed in [53], most HRI studies focus on verbal communication, and only a few study dyadic interaction involving physical contact with robots. Yet applications such as assistive robotics require a deeper understanding of the intertwined exchange of social and physical signals in order to design suitable robot controllers.

Main Challenges

We are interested here in developing the building blocks of situated Human-Robot Interaction (HRI), addressing both the physical and social dimensions of close interaction and the cognitive aspects related to the analysis and interpretation of human movement and activity.

Combining physical and social signals in robot control is a crucial research topic for assistance robots [55] and robotic co-workers [51]. A major obstacle is the control of physical interaction (more precisely, the control of contact forces) between the robot and the human while both partners are moving. In mobile robots, this problem is usually addressed by planning the robot movement with the human treated as an obstacle or as a target, then delegating the execution of this “high-level” motion to whole-body controllers, in which a mixture of weighted tasks accounts for the robot balance, constraints and desired end-effector trajectories [37].
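To make “a mixture of weighted tasks” concrete, here is a minimal velocity-level sketch, not the controller of [37]. Each task i is assumed to have a Jacobian J_i, a desired task velocity v_i and a scalar weight w_i, and the joint velocities minimize the weighted sum of squared task errors plus a small damping term, i.e. q̇* = (Σ w_i J_iᵀJ_i + λI)⁻¹ Σ w_i J_iᵀv_i. Real whole-body controllers typically solve a quadratic program with inequality constraints (joint limits, contacts, balance), which are omitted here; the function name and the damping value λ are illustrative assumptions.

```python
import numpy as np


def weighted_task_velocities(jacobians, targets, weights, damping=1e-3):
    """Hypothetical helper: solve min_qd sum_i w_i ||J_i qd - v_i||^2 + damping ||qd||^2.

    jacobians: list of (m_i, n) task Jacobians
    targets:   list of (m_i,) desired task velocities
    weights:   list of non-negative scalar task weights
    Returns the joint velocity vector qd of shape (n,).
    """
    n = jacobians[0].shape[1]
    H = damping * np.eye(n)  # damped normal-equation matrix
    g = np.zeros(n)
    for J, v, w in zip(jacobians, targets, weights):
        H += w * J.T @ J
        g += w * J.T @ v
    return np.linalg.solve(H, g)


# Usage: two conflicting tasks act on joint 0 of a 2-DoF system.
J = np.array([[1.0, 0.0]])
qd = weighted_task_velocities([J, J], [np.array([1.0]), np.array([0.0])], [10.0, 1.0])
# qd[0] = 10 / 11.001 (about 0.909); qd[1] = 0.0 since only damping acts on joint 1
```

The higher-weight task dominates without being met exactly, which illustrates why the behavior of such controllers hinges on how the weights are set.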

The first challenge is to make these controllers easier to deploy on real robotic systems: they currently require extensive tuning and can become very complex when they must handle interaction with unknown dynamical systems such as humans. Here, the key is to combine machine learning techniques with such controllers.

The second challenge is to make the robot react and adapt online to human feedback, exploiting the whole set of measurable verbal and non-verbal signals that humans naturally produce during physical or social interaction. Technically, this means finding the optimal policy for adapting the robot controllers online, taking human feedback into account. This requires carefully identifying the significant feedback signals, or suitable metrics of human feedback. In real-world conditions (i.e., outside the research laboratory), the set of available signals is technologically limited by the robot and environment sensors and by the onboard processing capabilities.
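As a hedged illustration of adapting a controller online from human feedback, the sketch below treats the choice among a few candidate controller settings (e.g., interaction stiffness values) as a multi-armed bandit solved with epsilon-greedy exploration. The scalar feedback score is an assumption standing in for whatever metric of human feedback is eventually identified; in the usage example it is produced by a hypothetical simulated function with a known optimum. A bandit ignores the state dependence that a full adaptation policy would capture, and all names here are illustrative.

```python
import random


class EpsilonGreedyAdapter:
    """Hypothetical adapter choosing among candidate controller settings from scalar feedback."""

    def __init__(self, candidates, epsilon=0.1, seed=0):
        self.candidates = list(candidates)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * len(self.candidates)
        self.means = [0.0] * len(self.candidates)
        self.last = None

    def select(self):
        untried = [i for i, c in enumerate(self.counts) if c == 0]
        if untried:  # try every setting once first
            self.last = untried[0]
        elif self.rng.random() < self.epsilon:  # explore a random setting
            self.last = self.rng.randrange(len(self.candidates))
        else:  # exploit the setting with the best mean feedback so far
            self.last = max(range(len(self.candidates)), key=lambda i: self.means[i])
        return self.candidates[self.last]

    def update(self, feedback):
        # Incremental mean of the feedback received for the last selected setting.
        i = self.last
        self.counts[i] += 1
        self.means[i] += (feedback - self.means[i]) / self.counts[i]


def simulated_feedback(stiffness):
    # Hypothetical stand-in for a human-feedback metric, best around 2.0.
    return -(stiffness - 2.0) ** 2


adapter = EpsilonGreedyAdapter([0.5, 1.0, 2.0, 4.0])
for _ in range(200):
    setting = adapter.select()
    adapter.update(simulated_feedback(setting))
best = adapter.candidates[max(range(4), key=lambda i: adapter.means[i])]
```

With real human feedback the score would be noisy and possibly non-stationary, which would favor a decaying epsilon or a sliding-window mean over the plain running mean used here.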

The third challenge is for the robot to identify and track people using its onboard sensors and processing. The goal is to estimate online the position, the posture, or even the moods and intentions of the people surrounding the robot. The main difficulty is doing so online, in real time and in cluttered environments.
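Tracking a person's position online is classically done with a recursive Bayesian filter. The sketch below is a textbook linear Kalman filter with a constant-velocity motion model in the ground plane, assuming position-only detections (e.g., from a laser scanner or an RGB-D person detector) and arbitrarily chosen isotropic noise levels. Data association between several people, a large part of the difficulty in cluttered scenes, is not addressed; the function names are illustrative.

```python
import numpy as np


def make_cv_model(dt, q=0.1, r=0.1):
    """Constant-velocity model for state [px, py, vx, vy] with position measurements."""
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)
    Q = q * np.eye(4)  # isotropic process noise (simplifying assumption)
    R = r * np.eye(2)  # isotropic detection noise (simplifying assumption)
    return F, H, Q, R


def kf_step(x, P, z, F, H, Q, R):
    # Predict with the motion model.
    x = F @ x
    P = F @ P @ F.T + Q
    # Correct with the position detection z.
    y = z - H @ x
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ y
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P


# Usage: a person walking along x at 1 m/s, detected every 0.1 s (noise-free for clarity).
F, H, Q, R = make_cv_model(dt=0.1)
x, P = np.zeros(4), np.eye(4)
for k in range(1, 101):
    z = np.array([0.1 * k, 0.0])
    x, P = kf_step(x, P, z, F, H, Q, R)
# x[2], the estimated x-velocity, has converged close to 1.0
```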

Angle of Attack

Our key idea is to exploit the physical and social signals produced by the human during interaction with the robot and the environment in controlled conditions to learn simple models of human behavior, and then to use these models to optimize the robot movements and actions. In a first phase, we will exploit human physical signals (e.g., posture and force measurements) to identify the elementary posture tasks involved in balance and physical interaction. The identified model will be used as prior knowledge to optimize the robot whole-body control, improving both the robot balance and the control of interaction forces. Technically, we will combine weighted and prioritized controllers with stochastic optimization techniques. To adapt the control of physical interaction online and make it possible with human partners who are not robotics experts, we will exploit verbal and non-verbal signals (e.g., gaze, touch, prosody). The idea here is to estimate online, from these signals, the human intent along with inter-individual factors that the robot can exploit to adapt its behavior, maximizing engagement and acceptability during the interaction.
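One way to read “combining weighted controllers with stochastic optimization” is to treat the task weights as parameters tuned by a derivative-free optimizer over simulated rollouts. The sketch below uses the cross-entropy method (CEM) over log-weights, chosen here only because it is short; it is not necessarily the optimizer the team uses, and CMA-ES or Bayesian optimization are common alternatives. The rollout cost is a hypothetical toy quadratic with a known optimum, standing in for, e.g., a balance or interaction-force error measured in simulation.

```python
import numpy as np


def cross_entropy_method(cost, mean, std, n_samples=50, n_elite=10, n_iters=30, seed=0):
    """Minimize a black-box cost with the cross-entropy method (Gaussian sampling)."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    for _ in range(n_iters):
        samples = rng.normal(mean, std, size=(n_samples, mean.size))
        costs = np.array([cost(s) for s in samples])
        elite = samples[np.argsort(costs)[:n_elite]]  # lowest-cost samples
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6  # small floor avoids a degenerate distribution
    return mean


def toy_rollout_cost(log_weights):
    # Hypothetical stand-in for running the weighted controller in simulation:
    # a quadratic whose optimum corresponds to task weights (10.0, 0.5).
    target = np.array([np.log(10.0), np.log(0.5)])
    return float(np.sum((log_weights - target) ** 2))


best_log_w = cross_entropy_method(toy_rollout_cost, mean=[0.0, 0.0], std=[2.0, 2.0])
best_weights = np.exp(best_log_w)  # close to [10.0, 0.5]
```

Optimizing log-weights keeps the weights positive and lets the search span several orders of magnitude, which matters because task weights in whole-body controllers often differ by factors of 10 to 1000.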

Another promising approach, already investigated in the Larsen team, is the ability of a robot and/or an intelligent space to localize humans in the surrounding environment and to understand their activities. This is an important issue for both safe and efficient human-robot interaction.

Simultaneous Tracking and Activity Recognition (STAR) [57] is an approach we want to develop. The activity of a person is highly correlated with their position, and this approach combines tracking and activity recognition so that each benefits from the other. By tracking the individual, the system can help infer their possible activity, while by estimating the activity of the individual, the system can better predict their future positions (which can be very effective in case of occlusion). This direction has been tested in simulation with particle filters [40], and a promising next step would be to couple STAR with decision-making formalisms such as partially observable Markov decision processes (POMDPs). This would make it possible to formalize problems such as deciding which action to take given an estimate of the human location and activity. It could also formalize other problems linked to the team's active sensing direction: how can the robotic system choose its actions so as to better estimate the human location and activity (for instance, by moving in the environment or by changing the orientation of its cameras)?
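The coupling at the heart of STAR can be illustrated with a small particle filter whose particles carry both a position and a discrete activity label, the activity selecting the motion model. The sketch below is not the filter of [40]: it assumes a one-dimensional corridor, a hypothetical set of three activities with fixed speeds, a simple “sticky” activity transition model and Gaussian position measurements, all chosen for illustration. During an occlusion (no measurement), the activity-conditioned motion model keeps the position prediction moving.

```python
import numpy as np

ACTIVITIES = ["idle", "walk_right", "walk_left"]  # hypothetical activity set
SPEED = np.array([0.0, 1.0, -1.0])  # m/s along the corridor, per activity (assumed)
STAY = 0.95  # probability of not redrawing the activity at a step (assumed)


def star_step(pos, act, z, rng, dt=0.1, motion_std=0.05, obs_std=0.2):
    """One predict/update/resample step for particles (pos, act).
    z is a position measurement, or None when the person is occluded."""
    n = pos.size
    # Activity transition: with prob. 1 - STAY, redraw the activity uniformly.
    redraw = rng.random(n) > STAY
    act = np.where(redraw, rng.integers(0, len(ACTIVITIES), n), act)
    # Activity-conditioned motion model.
    pos = pos + SPEED[act] * dt + rng.normal(0.0, motion_std, n)
    if z is None:
        return pos, act  # occlusion: prediction only
    # Gaussian likelihood weights (log-domain for stability), then multinomial resampling.
    log_w = -0.5 * ((z - pos) / obs_std) ** 2
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    idx = rng.choice(n, size=n, p=w)
    return pos[idx], act[idx]


def activity_posterior(act):
    return np.bincount(act, minlength=len(ACTIVITIES)) / act.size


# Usage: a person walks right at 1 m/s, visible for 3 s, then occluded for 1 s.
rng = np.random.default_rng(0)
n = 1000
pos = rng.normal(0.0, 0.3, n)
act = rng.integers(0, len(ACTIVITIES), n)
for k in range(1, 31):
    pos, act = star_step(pos, act, 0.1 * k, rng)
observed_posterior = activity_posterior(act)  # mass concentrates on "walk_right"
for _ in range(10):
    pos, act = star_step(pos, act, None, rng)
predicted_position = np.median(pos)  # near 4.0 m rather than frozen at 3.0 m
```

The median is used as the point estimate because, during the occlusion, a minority of particles switch activity and drift behind the main cluster.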

Another issue we want to address is human body pose estimation on robots. Human body pose estimation consists of tracking body parts by analyzing a sequence of input images from one or several cameras.

Human posture analysis is highly valuable for human-robot interaction and activity recognition. However, even though the arrival of new sensors such as RGB-D cameras has simplified the problem, it remains a great challenge, especially when it must be done online, on a robot and in realistic conditions (cluttered environments). It is even harder when a robot must bring together different capabilities at both the perception and navigation levels [38]. We will tackle this problem through different techniques, ranging from Bayesian state estimation (particle filtering) to learning, active sensing and distributed sensing.
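For the distributed-sensing side, a simple building block is fusing per-camera estimates of the same body joint. Assuming each camera is already calibrated into a common world frame and provides a Gaussian estimate (μ_i, Σ_i) with independent errors, the fused estimate sums the information matrices: Σ_f = (Σ_i Σ_i⁻¹)⁻¹ and μ_f = Σ_f Σ_i Σ_i⁻¹μ_i. The function name and numbers below are illustrative. With correlated errors this rule is overconfident, and occlusions or mis-detections call for the particle-filtering or learning-based approaches mentioned above.

```python
import numpy as np


def fuse_gaussian_estimates(means, covs):
    """Fuse independent Gaussian estimates of the same 3D point (e.g., a body joint
    seen by several calibrated cameras) by summing their information matrices."""
    info = np.zeros((3, 3))
    info_vec = np.zeros(3)
    for mu, cov in zip(means, covs):
        inv = np.linalg.inv(cov)
        info += inv
        info_vec += inv @ mu
    fused_cov = np.linalg.inv(info)
    return fused_cov @ info_vec, fused_cov


# Usage: camera A is precise laterally but uncertain in depth (z); camera B is moderately
# precise in all directions.
mu_a, cov_a = np.array([1.0, 0.0, 2.0]), np.diag([0.01, 0.01, 0.25])
mu_b, cov_b = np.array([1.2, 0.0, 2.3]), np.diag([0.04, 0.04, 0.04])
mu, cov = fuse_gaussian_estimates([mu_a, mu_b], [cov_a, cov_b])
# mu = [1.04, 0.0, 65.5 / 29]; cov[0, 0] = 0.008 and cov[2, 2] = 1 / 29
```

Each axis is dominated by the camera that is most certain along it, which is the practical benefit of combining viewpoints.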