Section: Research Program

Visual servoing

Basically, visual servoing techniques consist of using the data provided by one or several cameras to control the motions of a dynamic system [1]. Such systems are usually robot arms or mobile robots, but can also be virtual robots, or even a virtual camera. A large variety of positioning tasks, or mobile target tracking, can be implemented by controlling from one to all of the degrees of freedom of the system. Whatever the sensor configuration, which can vary from a single camera mounted on the robot end-effector to several free-standing cameras, a set of visual features has to be selected as well as possible from the available image measurements so that the desired degrees of freedom can be controlled. A control law also has to be designed so that these visual features 𝐬(t) reach a desired value 𝐬*, which defines a correct realization of the task. A desired planned trajectory 𝐬*(t) can also be tracked. The control principle is thus to regulate the error vector 𝐬(t) - 𝐬*(t) to zero. With a vision sensor providing 2D measurements, potential visual features are numerous, since both 2D data (coordinates of feature points in the image, moments, ...) and 3D data provided by a localization algorithm exploiting the extracted 2D features can be considered. It is also possible to combine 2D and 3D visual features to take advantage of each approach while avoiding their respective drawbacks.

More precisely, a set 𝐬 of k visual features can be taken into account in a visual servoing scheme if it can be written as:

$$ \mathbf{s} = \mathbf{s}\big(\mathbf{x}(\mathbf{p}(t)),\, \mathbf{a}\big) \tag{1} $$

where 𝐩(t) describes the pose between the camera frame and the target frame at instant t, 𝐱 the image measurements, and 𝐚 a set of parameters encoding potential additional knowledge, if available (for instance a coarse approximation of the camera calibration parameters or, in some cases, the 3D model of the target).
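To make the structure of (1) concrete, here is a minimal sketch of one common choice of features: the normalized image coordinates of a few target points. It rests on several assumptions that do not come from the text. The camera is a pinhole camera with intrinsic matrix `K`. The pose 𝐩 is a rotation `R` and a translation `t` of the target frame in the camera frame. The additional knowledge 𝐚 is the (possibly coarse) `K` plus the 3D model points. The function names `measure_pixels` and `features_from_pixels` are hypothetical.

```python
import numpy as np

def measure_pixels(R, t, model_points, K):
    """Simulate the image measurements x(p): project the target points into pixels.

    R (3x3) and t (3,) give the pose p of the target frame in the camera frame,
    model_points is an (n, 3) array expressed in the target frame (part of a),
    and K is the 3x3 intrinsic matrix of an assumed pinhole camera."""
    P_cam = model_points @ R.T + t          # points expressed in the camera frame
    uvw = P_cam @ K.T                       # homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]         # (n, 2) pixel coordinates

def features_from_pixels(pixels, K):
    """Compute s(x, a): normalized image coordinates obtained with the
    intrinsics K, which play the role of the additional knowledge a."""
    ones = np.ones((pixels.shape[0], 1))
    normalized = np.hstack([pixels, ones]) @ np.linalg.inv(K).T
    return normalized[:, :2].reshape(-1)    # s = (x1, y1, ..., xn, yn)

# Usage: a square target 1 m in front of the camera, facing it.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
model = np.array([[-0.1, -0.1, 0.0], [0.1, -0.1, 0.0],
                  [0.1, 0.1, 0.0], [-0.1, 0.1, 0.0]])
s = features_from_pixels(measure_pixels(np.eye(3), np.array([0.0, 0.0, 1.0]), model, K), K)
```

If only a coarse estimate of `K` is available, `features_from_pixels` returns slightly biased features. This illustrates why 𝐚 appears explicitly in (1).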

The time variation of 𝐬 can be linked to the relative instantaneous velocity 𝐯 between the camera and the scene:

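$$ \dot{\mathbf{s}} = \frac{\partial \mathbf{s}}{\partial \mathbf{p}}\,\dot{\mathbf{p}} = \mathbf{L}_{\mathbf{s}}\,\mathbf{v} \tag{2} $$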

where $\mathbf{L}_{\mathbf{s}}$ is the interaction matrix related to $\mathbf{s}$. This interaction matrix plays an essential role. For instance, consider an eye-in-hand system in which the camera velocity is the input of the robot controller. If the control law is designed to obtain an exponential, decoupled decrease of the error, we obtain:

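$$ \mathbf{v}_c = -\lambda\,\widehat{\mathbf{L}_{\mathbf{s}}}^{+}\,(\mathbf{s} - \mathbf{s}^*) - \widehat{\mathbf{L}_{\mathbf{s}}}^{+}\,\widehat{\frac{\partial \mathbf{s}}{\partial t}} \tag{3} $$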

where λ is a proportional gain that has to be tuned to minimize the time to convergence, $\widehat{\mathbf{L}_{\mathbf{s}}}^{+}$ is the pseudo-inverse of a model or an approximation of the interaction matrix, and $\widehat{\partial \mathbf{s}/\partial t}$ is an estimation of the feature velocity due to a possible motion of the object itself.
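The following minimal sketch applies the control law (3) to the classical case where 𝐬 gathers the normalized coordinates of four image points. It uses the standard 2x6 interaction matrix of a point, ordered for v = (vx, vy, vz, ωx, ωy, ωz). The sketch makes several assumptions: a static target, so the ∂s/∂t estimate is omitted; true depths Z used in $\widehat{\mathbf{L}_{\mathbf{s}}}$; a simple first-order integration of the camera motion. The function names and numerical values are hypothetical.

```python
import numpy as np

def point_interaction_matrix(x, y, Z):
    """2x6 interaction matrix of a normalized image point (x, y) at depth Z,
    for the camera velocity v = (vx, vy, vz, wx, wy, wz)."""
    return np.array([
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x],
    ])

def features(P_cam):
    """s = (x1, y1, ..., xn, yn) with x = X/Z and y = Y/Z."""
    return (P_cam[:, :2] / P_cam[:, 2:3]).reshape(-1)

def interaction_matrix(P_cam):
    """Stack the 2x6 blocks of all points into a (2n)x6 matrix."""
    s = features(P_cam).reshape(-1, 2)
    return np.vstack([point_interaction_matrix(x, y, Z)
                      for (x, y), Z in zip(s, P_cam[:, 2])])

def control_law(L_hat, s, s_star, lam, ds_dt_hat=None):
    """Eq. (3): v_c = -lam * L_hat^+ (s - s*) - L_hat^+ ds_dt_hat."""
    L_pinv = np.linalg.pinv(L_hat)
    v = -lam * L_pinv @ (s - s_star)
    if ds_dt_hat is not None:          # compensate the target's own motion
        v -= L_pinv @ ds_dt_hat
    return v

def rodrigues(w):
    """Rotation matrix of the rotation vector w (axis times angle)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * K @ K

def simulate(P_start, P_goal, lam=1.0, dt=0.05, steps=300):
    """Eye-in-hand camera, static target: apply v_c during dt, then express
    the points in the new camera frame, P' = R^T (P - t)."""
    s_star = features(P_goal)
    P = P_start.copy()
    errors = []
    for _ in range(steps):
        s = features(P)
        errors.append(np.linalg.norm(s - s_star))
        v = control_law(interaction_matrix(P), s, s_star, lam)
        R = rodrigues(v[3:] * dt)      # camera rotation during dt
        P = (P - v[:3] * dt) @ R       # row-vector form of R^T (P - t)
    return errors

# Usage: desired view of a square at 1 m; start rotated and shifted.
P_goal = np.array([[-0.1, -0.1, 1.0], [0.1, -0.1, 1.0],
                   [0.1, 0.1, 1.0], [-0.1, 0.1, 1.0]])
R0 = rodrigues(np.array([0.0, 0.0, 0.2]))
P_start = P_goal @ R0.T + np.array([0.05, -0.03, 0.3])
errors = simulate(P_start, P_goal)
```

The interaction matrix depends on x, y and Z. The closed loop is therefore nonlinear, even though the error norm in `errors` typically decays close to exponentially for small displacements like this one.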

The selected visual features and the corresponding interaction matrix determine the properties of the system's behavior: stability, robustness with respect to noise or to calibration errors, the 3D trajectory of the robot, etc. Usually, the interaction matrix is composed of highly nonlinear terms and does not present any decoupling properties. This is generally the case when 𝐬 is directly chosen as 𝐱. In some cases, this may lead to inadequate robot trajectories or even to motions that are impossible to realize, local minima, task singularities, etc. It is thus extremely important to design adequate visual features for each robot task or application. The ideal case, which is very difficult to obtain, is a constant interaction matrix, which leads to a simple linear control system. To conclude in a few words, visual servoing is basically a nonlinear control problem. Our Holy Grail quest is to transform it into a linear control problem.
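A short derivation shows why a constant interaction matrix yields a linear system. It is a sketch under explicit assumptions: an eye-in-hand system (so 𝐯 = 𝐯c in (2)), a static target and a constant 𝐬* (so the ∂s/∂t term in (3) vanishes), k = 6 features, and a model $\widehat{\mathbf{L}_{\mathbf{s}}} = \mathbf{L}_{\mathbf{s}}$ that is constant and invertible. Substituting (3) into (2) gives:

$$
\mathbf{e} = \mathbf{s} - \mathbf{s}^*, \qquad
\dot{\mathbf{e}} = \dot{\mathbf{s}} = \mathbf{L}_{\mathbf{s}}\,\mathbf{v}_c
= -\lambda\,\mathbf{L}_{\mathbf{s}}\,\widehat{\mathbf{L}_{\mathbf{s}}}^{+}\,\mathbf{e}
= -\lambda\,\mathbf{e}
\quad\Longrightarrow\quad
\mathbf{e}(t) = e^{-\lambda t}\,\mathbf{e}(0)
$$

Each component of the error then decreases independently and exponentially, over the whole workspace. When $\mathbf{L}_{\mathbf{s}}$ varies with the configuration, as for image points, $\mathbf{L}_{\mathbf{s}}\widehat{\mathbf{L}_{\mathbf{s}}}^{+} \neq \mathbf{I}$ away from the point where the model is valid, and the same decrease only holds locally.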

Furthermore, embedding visual servoing in the task function approach makes it possible to efficiently solve the redundancy problems that appear when the visual task does not constrain all the degrees of freedom of the system. It is then possible to perform simultaneously the visual task and secondary tasks such as visual inspection, or joint-limit or singularity avoidance. This formalism can also be used for task sequencing in order to deal with complex high-level applications.
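One common way to exploit redundancy is the classical gradient-projection scheme: a secondary motion is projected onto the null space of the visual task. It is shown here as an illustrative sketch, not necessarily the exact task-function formulation used by the team. With a single image point, the visual task constrains only 2 of the 6 camera degrees of freedom. The secondary velocity `g` is hypothetical; in practice it would typically be minus a scaled gradient of a secondary cost such as a joint-limit criterion. The function names are also hypothetical.

```python
import numpy as np

def point_interaction_matrix(x, y, Z):
    """2x6 interaction matrix of a normalized image point (x, y) at depth Z."""
    return np.array([
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x],
    ])

def null_space_projector(L_hat):
    """Projector I - L^+ L onto the null space of L_hat."""
    return np.eye(L_hat.shape[1]) - np.linalg.pinv(L_hat) @ L_hat

def redundant_control(L_hat, e, lam, g):
    """Primary visual task plus a secondary motion g restricted to the null
    space of L_hat, so that g does not disturb the visual task (to first order)."""
    return -lam * np.linalg.pinv(L_hat) @ e + null_space_projector(L_hat) @ g

# Usage: center one image point while also moving along and about the optical axis.
x, y, Z = 0.05, -0.02, 1.0
L = point_interaction_matrix(x, y, Z)
e = np.array([x, y]) - np.array([0.0, 0.0])     # s - s*, s* at the image center
lam = 0.5
g = np.array([0.0, 0.0, 0.2, 0.0, 0.0, 0.5])    # hypothetical secondary motion
v = redundant_control(L, e, lam, g)
P_null = null_space_projector(L)
```

Because `L` has full row rank here, the visual error still evolves as de/dt = -λe. The projected secondary motion `P_null @ g` produces no first-order feature motion.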