Section: Scientific Foundations

Perception and autonomous navigation

Autonomy in robotics largely relies on the capability to process the information provided by exteroceptive sensors. Perceiving the surrounding environment involves data acquisition, via sensors endowed with various characteristics and properties, and data processing to extract the information needed to plan and execute actions. In this respect, the fusion of the complementary information provided by different sensors is a central issue. Much research effort is devoted to modeling the environment and constructing maps used, for instance, for localization, estimation, and navigation purposes. Another important category of problems concerns the selection and treatment of the information used by low-level control loops. Much of the processing must be performed in real time, with a good degree of robustness so as to accommodate the large variability of the physical world. Computational efficiency and well-posedness of the algorithms are constant preoccupations.

Advanced perception for robotics   

A key point is to strike the right compromise between the simplicity of the models and the complexity of the real world. For example, numerous computer vision algorithms have been proposed under the implicit assumptions that the observed surfaces are Lambertian and the illumination is uniform. These assumptions are only valid in customized environments. For applications such as the exploration of an outdoor environment, the robustness of vision-based control schemes can be improved by using more realistic photometric models (including color information). Even though such models have already been used in the computer vision and augmented reality communities [45], [67], their applicability to real-time robotic tasks has not been much explored.
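As a minimal illustration of going beyond the constant-illumination assumption, the sketch below (hypothetical function and parameter names) compensates a global affine illumination change I ≈ a·T + b before computing a photometric error; the pure Lambertian, uniformly lit case would correspond to imposing a = 1 and b = 0.

```python
import numpy as np

def affine_photometric_error(template, image):
    """Photometric error after compensating an affine illumination
    change I ~ a * T + b (a: contrast gain, b: bias).

    (a, b) is estimated by least squares; the function returns the
    estimated gain, bias, and the RMS residual. Illustration only."""
    t = template.ravel().astype(float)
    i = image.ravel().astype(float)
    A = np.column_stack([t, np.ones_like(t)])
    (a, b), *_ = np.linalg.lstsq(A, i, rcond=None)
    residual = i - (a * t + b)
    return a, b, np.linalg.norm(residual) / np.sqrt(t.size)

# A synthetic patch seen under a global illumination change:
rng = np.random.default_rng(0)
T = rng.random((8, 8))
I = 1.5 * T + 0.2          # contrast gain and bias
a, b, err = affine_photometric_error(T, I)
```

With a purely affine change as above, the residual vanishes; under the Lambertian assumption (a fixed to 1, b to 0) it would not.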

In the same way that the sensor models currently in use in robotics are often too simple to capture the complexity of the real world, the hypotheses about the geometrical structure of the scene are often restrictive. Most methods assume that the observed environment is rigid [53]. For many applications, such as autonomous navigation in changing and dynamic environments, this assumption is violated. In these cases, it is important to distinguish between the observed global (dominant) motion and the true motion, or even the deformations, of particular objects.
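The simplest instance of this separation can be sketched on synthetic data: with a field of 2-D image flow vectors, a component-wise median gives an estimate of the dominant motion that is robust to the minority of vectors caused by an independently moving object (an illustration only, not a method used in the cited work).

```python
import numpy as np

# Dominant image motion: 1 px/frame to the right; one object moves
# independently upward. The median flow recovers the dominant motion,
# and large residuals flag the truly moving pixels.
rng = np.random.default_rng(1)
flow = np.tile([1.0, 0.0], (20, 1)) + 0.01 * rng.standard_normal((20, 2))
flow[:4] = [0.0, 3.0]                   # an independently moving object
dominant = np.median(flow, axis=0)      # robust dominant-motion estimate
residual = np.hypot(*(flow - dominant).T)
moving = residual > 0.5                 # outliers = independent motion
```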

More generally, the question is how to estimate, robustly and in real time, the information needed for the visual task. Real-time processing of a complete model of a deformable environment (i.e. the three-dimensional shape, the deformations of the surfaces, the textures and colors, and other physical properties that can be perceived by robotic sensors) has not yet been achieved. Recent studies on visual tracking (i.e. tracking of visual cues in the image without feedback control of the camera pose), using a stereo pair of cameras [68] or a single camera [41], are essentially concerned with parametric surfaces. To the best of our knowledge, the use of deformable visual information for navigation or feedback control has been limited to deformable contours [46] or simple articulated planar objects [71].

In many applications, using a single sensor may not be the optimal way to gather the information needed to perform the robot's task. Many exteroceptive sensors provide complementary information (for example, unlike a single camera, a laser telemeter directly measures the distance to an object), while proprioceptive sensors (odometry) are convenient for estimating local displacements of the robot. We participate in the development of “intelligent” devices composed of several complementary sensors well suited to the tasks involved in autonomous robotics. Developing such sensors requires solving several subproblems: calibration, data representation, estimation, and filtering. A theory for the proper integration of multi-sensor information within a general unified framework is still critically lacking.
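An elementary instance of such fusion, in the scalar case and with assumed noise variances, is a single Kalman update combining an odometry-based prediction of a distance with a direct laser range measurement; the fused variance is smaller than either input's.

```python
def fuse(x_pred, var_pred, z, var_z):
    """One scalar Kalman update: fuse a predicted robot-to-obstacle
    distance (from odometry) with a direct range measurement (from a
    laser telemeter). Both inputs are Gaussian: (mean, variance)."""
    k = var_pred / (var_pred + var_z)    # Kalman gain
    x = x_pred + k * (z - x_pred)        # fused mean
    var = (1.0 - k) * var_pred           # fused variance
    return x, var

# Odometry predicts 2.0 m (uncertain); the laser reads 1.8 m (precise):
x, var = fuse(2.0, 0.5, 1.8, 0.05)
```

The fused estimate lies between the two inputs, pulled toward the more precise one, and its variance is below both input variances.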

Reliable robot localization and scene modeling   

Most applications involving mobile robotic systems (ground vehicles, aerial robots, autonomous underwater vehicles, ...) require a reliable localization of the robot in its environment. The problem of localization given a map of the environment in the form of a set of landmarks or, conversely, the problem of constructing a map assuming that the vehicle's situation (position + orientation) is known, has been addressed and solved using a number of different approaches. A more challenging problem arises when neither the robot's path nor the map is known. Localization and mapping must then be considered concurrently. This problem is known as Simultaneous Localization And Mapping (SLAM). In this case, the vehicle starts from an unknown location in an unknown environment and incrementally builds a navigation map of the environment, while simultaneously using this map to update its estimated position. Two tutorials by Hugh Durrant-Whyte and Tim Bailey [40], [49] describe some of the standard methods for solving the SLAM problem, as well as some more recent algorithms. More recently, a new class of approaches has appeared, based on graphical inference techniques which represent the SLAM problem as a set of constraints between robot and landmark poses, and formulate a global optimization algorithm for generating a map from these constraints [59], [69], [74]. Unfortunately, when a robot explores a large-scale environment, such methods lead to a dramatic growth of the state vector during the motion. We are investigating well-founded methods which automatically introduce, when needed, a new local submap while preserving the consistency (in the probabilistic sense) of the global map.
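The graph-based formulation can be illustrated in one dimension (a toy sketch, not any of the cited algorithms): each row of the constraint matrix encodes one relative-pose constraint, and the map is obtained from a single global least-squares solve.

```python
import numpy as np

# Miniature 1-D pose graph: poses x0, x1, x2, with x0 anchored at the
# origin. Odometry says each step is 1.0 m; a loop-closure constraint
# measures x2 - x0 = 2.2 m. Each row of J is one constraint; the map
# is the least-squares solution of J x = z.
J = np.array([
    [ 1.0, 0.0, 0.0],   # prior: x0 = 0
    [-1.0, 1.0, 0.0],   # odometry: x1 - x0 = 1.0
    [ 0.0,-1.0, 1.0],   # odometry: x2 - x1 = 1.0
    [-1.0, 0.0, 1.0],   # loop closure: x2 - x0 = 2.2
])
z = np.array([0.0, 1.0, 1.0, 2.2])
x, *_ = np.linalg.lstsq(J, z, rcond=None)
```

The solution spreads the loop-closure discrepancy over the whole trajectory: x2 lands between the pure odometry estimate (2.0) and the loop-closure measurement (2.2). The growth issue mentioned above appears here as the size of x, which increases with every pose and landmark added.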

Compared to lasers, the use of vision in SLAM provides rich perceptual information and yields a low level of data-association ambiguity. However, real-time visual SLAM has only become possible recently, with faster computers and ways of selecting sparse but distinctive features. The main difficulty comes from the loss of the depth dimension due to the projective model of the camera. Consequently, monocular vision leads to the specific configuration of bearing-only SLAM. In this configuration, only the directions of sight of the landmarks can be measured, which causes observability problems during initialization. It is well known in the computer vision community that specific motions of the camera, or very distant landmarks, also lead to observability problems. To overcome this type of problem, delayed landmark insertion techniques such as local bundle adjustment [48] or particle filtering [47] have been proposed. More recently, undelayed approaches [50], [56], [72] have been investigated. These approaches generally rely on a probabilistic model of the depth distribution along the sight ray and require the use of particle filtering techniques or Gaussian multi-hypothesis methods. Another approach relies on dense representations instead of sparse, landmark-based ones. We are applying these ideas to visual SLAM [70] by stating the problem in terms of the optimization of a warping function directly expressed in the image space. The function parameters capture not only the geometrical and photometrical aspects of the scene but also the camera motion. Robustness is enhanced by using a dense approach that takes advantage of all the information available in the regions of interest, instead of a sparse representation based on features like Harris or SIFT points.
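In the plane, bearing-only landmark initialization reduces to intersecting sight rays; the sketch below (hypothetical helper, noise-free measurements) also exhibits the unobservable case in which the rays are parallel and the depth cannot be recovered.

```python
import numpy as np

def triangulate(p1, theta1, p2, theta2):
    """Intersect two bearing rays to initialize a planar landmark.
    p1, p2: robot positions; theta1, theta2: absolute bearing angles to
    the same landmark. Returns the landmark position, or None when the
    rays are (near-)parallel and the depth is unobservable."""
    d1 = np.array([np.cos(theta1), np.sin(theta1)])
    d2 = np.array([np.cos(theta2), np.sin(theta2)])
    A = np.column_stack([d1, -d2])
    if abs(np.linalg.det(A)) < 1e-9:     # parallel rays: no intersection
        return None
    # Solve p1 + t * d1 = p2 + s * d2 for the ray parameters (t, s):
    t, _ = np.linalg.solve(A, np.asarray(p2, float) - np.asarray(p1, float))
    return np.asarray(p1, float) + t * d1

# A landmark at (2, 1) observed from two distinct positions:
lm = triangulate((0.0, 0.0), np.arctan2(1.0, 2.0),
                 (1.0, 0.0), np.arctan2(1.0, 1.0))
```

When the camera translates along the sight ray, or the landmark is very distant, the two bearings coincide and the helper returns None; this is the observability problem mentioned above, which the cited delayed and undelayed techniques address probabilistically.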

Nevertheless, solving the SLAM problem is not sufficient to guarantee autonomous and safe navigation. The choice of the representation of the map is, of course, essential. The representation has to support the different levels of the navigation process: motion planning, motion execution, and collision avoidance and, at the global level, the definition of an optimal displacement strategy. The original formulation of the SLAM problem is purely metric (since it basically consists in estimating the Cartesian situations of the robot and a set of landmarks) and does not involve complex representations of the environment. However, it is now well recognized that several complementary representations are needed to perform exploration, navigation, mapping, and control tasks successfully. Like several other authors, we proposed [22] to use composite models of the environment which mix topological, metric, and grid-based representations. Each type of representation is well adapted to a particular aspect of autonomous navigation: the metric model allows one to locate the robot precisely and plan Cartesian paths, the topological model captures the accessibility of the different sites in the environment and allows a coarse localization, and the grid representation is useful to characterize the free space and design potential functions used for reactive obstacle avoidance. However, ensuring the consistency of these various representations during the robot's exploration, and merging observations acquired from different viewpoints by several cooperating robots, are difficult problems. This is particularly true when different sensing modalities are involved. New studies are needed to derive efficient algorithms for manipulating these hybrid representations (merging, updating, filtering, ...) while preserving their consistency.
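The grid layer of such a composite model is typically maintained by a log-odds Bayesian update; a minimal sketch, with assumed sensor-model log-odds values:

```python
import numpy as np

# Log-odds occupancy grid update. Each cell stores the log-odds of
# being occupied; evidence from range readings is fused by addition,
# which is the Bayesian update under the usual cell-independence
# assumption. L_OCC / L_FREE are assumed sensor-model values.
L_OCC, L_FREE = 0.85, -0.4

grid = np.zeros((5, 5))          # log-odds 0  <=>  P(occupied) = 0.5
hits = [(2, 2), (2, 2), (2, 3)]  # cells where beam endpoints fell
misses = [(2, 0), (2, 1)]        # cells the beams traversed
for c in hits:
    grid[c] += L_OCC
for c in misses:
    grid[c] += L_FREE

prob = 1.0 - 1.0 / (1.0 + np.exp(grid))   # back to probabilities
```

Repeated hits drive a cell's probability toward 1 and traversals drive it toward 0, while unobserved cells stay at 0.5; thresholding such a grid yields the free space used for the potential functions mentioned above.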

Exploration strategy and reactive navigation   

The exploration of an unknown environment relies on a robot motion strategy that allows the robot to construct a complete representation of the environment in minimal time or, equivalently, with displacements of minimal length. Few works have addressed these aspects so far. Most exploration approaches [44], [55] use a topological representation such as the Generalized Voronoï Diagram (GVD). Assuming an infinite sensor range, the GVD provides an aggregated representation of the environment and an elegant means to solve the optimality problem. Unfortunately, the usual generalized Voronoï diagram, which is based on the L2 metric, does not cope well with real environments and the bounded range of the sensors used in robotic applications. Building topological representations supporting exploration strategies in real time remains a challenging issue which is pursued in ARobAS.
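On a toy grid, the GVD can be obtained by brute force as the set of free cells equidistant (in the L2 sense) from at least two distinct obstacle points; the sketch below (illustration only, ignoring the bounded sensor range discussed above) recovers the perpendicular bisector between two point obstacles.

```python
import numpy as np

def gvd_cells(obstacles, shape, tol=1e-9):
    """Brute-force generalized Voronoi diagram on a small grid: free
    cells whose two nearest obstacle points are equally distant (L2).
    Real maps need an incremental construction and a sensor-range-aware
    metric; this is an illustration of the definition only."""
    obs = np.asarray(obstacles, dtype=float)
    cells = []
    for i in range(shape[0]):
        for j in range(shape[1]):
            d = np.sort(np.hypot(obs[:, 0] - i, obs[:, 1] - j))
            if d[0] > 0 and d[1] - d[0] < tol:  # two equally-near obstacles
                cells.append((i, j))
    return cells

# Two point obstacles: the GVD is the perpendicular bisector between them.
ridge = gvd_cells([(2, 0), (2, 6)], shape=(5, 7))
```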

For large-scale environments and long-term survey missions, the SLAM process can rapidly diverge due to the uncertainties and drift inherent to dead-reckoning methods, and to the unavailability of absolute position measurements (as provided, for example, by a GNSS, which is not operational everywhere at all times). The problem of motion control is rarely considered as a constitutive part of the SLAM problem. We advocate that autonomous navigation and SLAM should not be treated separately, but rather addressed in a unified framework involving perception, modeling, and control. Reactive navigation and sensor-based control constitute the core of our approach. Sensor-based control, whose design relies on modeling the interactions between the robot and its nearby environment, is particularly useful in this case. We have shown in simulation and experimentally [21] that embedding the SLAM problem in a sensor-based control framework amounts to adding constraints on the relative pose between the robot and its local environment. In other words, the sensor-based control approach makes it possible to guarantee, under certain observability conditions, a uniformly bounded estimation error in the localization process. We pursue our research on the design of navigation functions in order to ensure, at the reactive control level, collision-free robot motions and, at the navigation level, to implement a (topologically) complete exploration of the environment in autonomous mode.
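A minimal sketch of the sensor-based control idea, with hypothetical gains: a unicycle robot regulates its sensed lateral distance to a wall while aligning its heading with it, so that the feature stays in view. Keeping the robot at a controlled relative pose with respect to its local environment is precisely the kind of constraint that, embedded in SLAM, helps bound the local estimation error.

```python
import numpy as np

def wall_following_control(d, d_ref, theta, k_d=1.0, k_theta=2.0, v=0.5):
    """Sensor-based control law (hypothetical gains k_d, k_theta):
    drive the lateral distance d to the wall toward d_ref and the
    heading error theta toward zero, at constant forward speed v."""
    omega = -k_theta * theta - k_d * (d - d_ref) * np.cos(theta)
    return v, omega                      # forward speed, turn rate

# Closed-loop simulation: unicycle kinematics with the wall along x.
d, theta, dt = 2.0, 0.3, 0.01
for _ in range(3000):
    v, omega = wall_following_control(d, 1.0, theta)
    d += v * np.sin(theta) * dt          # lateral distance dynamics
    theta += omega * dt                  # heading dynamics
```

In the simulation, both the distance error and the heading error converge to zero; the linearized closed loop has eigenvalues with negative real parts for any positive gains, which is the stability property a navigation function is designed to provide.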