## Section: Scientific Foundations

### Visual servoing

Visual servoing is now a widely used technique in robot control. It consists in using data provided by a vision sensor to control the motions of a robot [43]. Various sensors can be considered, such as perspective cameras, omnidirectional cameras, 2D ultrasound probes or even virtual cameras. This technique is historically embedded in the larger domain of sensor-based control [51], so sensors other than vision sensors can also be used. The approach was first dedicated to the control of robot arms. Today, much more complex systems can be considered, such as humanoid robots, cars, submarines, airships, helicopters and aircraft. Visual servoing is therefore now seen as a powerful approach to controlling the state of dynamic systems.

Classically, to achieve a visual servoing task, a set of visual features $\mathbf{s}$ has to be selected from visual measurements $\mathbf{m}$ extracted from the image. A control law is then designed so that these visual features reach a desired value ${\mathbf{s}}^{*}$ related to the desired state of the system. The control principle is thus to regulate the error vector $\mathbf{e}=\mathbf{s}-{\mathbf{s}}^{*}$ to zero. To build the control law, knowledge of the so-called *interaction matrix* ${\mathbf{L}}_{\mathbf{s}}$ is usually required. This matrix links the time variation of $\mathbf{s}$ to the instantaneous camera velocity $\mathbf{v}$:

$\dot{\mathbf{s}}={\mathbf{L}}_{\mathbf{s}}\,\mathbf{v}+\frac{\partial \mathbf{s}}{\partial t} \qquad (1)$

where the term $\frac{\partial \mathbf{s}}{\partial t}$ describes the non-stationary behavior of $\mathbf{s}$, that is, its variation independent of the camera motion (for instance when the observed target itself moves). Typically, if we try to ensure an exponential decoupled decrease of the error signal and if we consider the camera velocity as the input of the robot controller, the control law is written as follows:

$\mathbf{v}=-\lambda {\widehat{{\mathbf{L}}_{\mathbf{s}}}}^{+}\mathbf{e}-{\widehat{{\mathbf{L}}_{\mathbf{s}}}}^{+}\widehat{\frac{\partial \mathbf{e}}{\partial t}} \qquad (2)$

with $\lambda$ a proportional gain that has to be tuned to minimize the time to convergence, ${\widehat{{\mathbf{L}}_{\mathbf{s}}}}^{+}$ the pseudo-inverse of a model or an approximation of ${\mathbf{L}}_{\mathbf{s}}$, and $\widehat{\frac{\partial \mathbf{e}}{\partial t}}$ an estimate of $\frac{\partial \mathbf{e}}{\partial t}$.
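As a minimal sketch of how the control law (2) might be evaluated numerically, the following Python function computes the velocity command from an estimated interaction matrix. The function name `velocity_command`, its argument names and the default gain are illustrative assumptions, not part of any particular visual servoing library; NumPy's `pinv` stands in for the pseudo-inverse.

```python
import numpy as np


def velocity_command(L_hat, e, lam=0.5, de_dt_hat=None):
    """Evaluate v = -lam * pinv(L_hat) @ e - pinv(L_hat) @ de_dt_hat.

    L_hat     : (k, n) model or approximation of the interaction matrix
    e         : (k,) error vector s - s*
    lam       : proportional gain (hypothetical default)
    de_dt_hat : (k,) estimate of the non-stationary term; zero if omitted
    """
    L_hat = np.asarray(L_hat, dtype=float)
    e = np.asarray(e, dtype=float)
    if de_dt_hat is None:
        # Static target assumed when no estimate is supplied.
        de_dt_hat = np.zeros_like(e)
    de_dt_hat = np.asarray(de_dt_hat, dtype=float)

    L_pinv = np.linalg.pinv(L_hat)  # Moore-Penrose pseudo-inverse
    return -lam * L_pinv @ e - L_pinv @ de_dt_hat


# Toy usage with a made-up 3x3 interaction matrix (not a real camera model).
L_toy = 2.0 * np.eye(3)
e_toy = np.array([2.0, 4.0, 6.0])
v_static = velocity_command(L_toy, e_toy, lam=0.5)
v_moving = velocity_command(L_toy, e_toy, lam=0.5, de_dt_hat=[2.0, 0.0, 0.0])
```

The second call shows the feedforward term at work: the estimate of $\frac{\partial \mathbf{e}}{\partial t}$ only changes the velocity components that compensate for the assumed target motion.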

The behavior of the closed-loop system is then obtained, from (1) and (2), by expressing the time variation of the error $\mathbf{e}$:

$\dot{\mathbf{e}}=-\lambda {\mathbf{L}}_{\mathbf{s}}{\widehat{{\mathbf{L}}_{\mathbf{s}}}}^{+}\mathbf{e}-{\mathbf{L}}_{\mathbf{s}}{\widehat{{\mathbf{L}}_{\mathbf{s}}}}^{+}\widehat{\frac{\partial \mathbf{e}}{\partial t}}+\frac{\partial \mathbf{e}}{\partial t}. \qquad (3)$
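For reference, (3) follows in two steps, under the assumption (not stated explicitly above) that the desired value ${\mathbf{s}}^{*}$ is constant, so that $\dot{\mathbf{e}}=\dot{\mathbf{s}}$ and $\frac{\partial \mathbf{e}}{\partial t}=\frac{\partial \mathbf{s}}{\partial t}$. Equation (1) then reads

$\dot{\mathbf{e}}={\mathbf{L}}_{\mathbf{s}}\,\mathbf{v}+\frac{\partial \mathbf{e}}{\partial t}$

and substituting the control law (2) for $\mathbf{v}$ gives

$\dot{\mathbf{e}}={\mathbf{L}}_{\mathbf{s}}\left(-\lambda {\widehat{{\mathbf{L}}_{\mathbf{s}}}}^{+}\mathbf{e}-{\widehat{{\mathbf{L}}_{\mathbf{s}}}}^{+}\widehat{\frac{\partial \mathbf{e}}{\partial t}}\right)+\frac{\partial \mathbf{e}}{\partial t}$

which expands to (3). In the ideal case where the model and the estimate are exact and ${\mathbf{L}}_{\mathbf{s}}{{\mathbf{L}}_{\mathbf{s}}}^{+}=\mathbf{I}$ (which holds when ${\mathbf{L}}_{\mathbf{s}}$ has full row rank), the last two terms cancel and (3) reduces to $\dot{\mathbf{e}}=-\lambda \mathbf{e}$, the exponential decoupled decrease sought.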

As can be seen, visual servoing explicitly relies on the choice of the visual features $\mathbf{s}$ and hence on the related interaction matrix; that is the key point of this approach. This choice must therefore be made very carefully. In particular, an isomorphism between the camera pose and the visual features is required to ensure that the convergence of the control law leads to the desired state of the system. An optimal choice would consist in finding visual features leading to a diagonal and constant interaction matrix and, consequently, to a linear decoupled system for which the control problem is well known. In that case, the isomorphism as well as global stability would be guaranteed. In addition, since the interaction matrix would no longer contain nonlinearities, a suitable robot trajectory would be ensured.
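To illustrate why a diagonal and constant interaction matrix is attractive, the sketch below integrates the closed-loop error dynamics $\dot{\mathbf{e}}=-\lambda {\mathbf{L}}_{\mathbf{s}}{\widehat{{\mathbf{L}}_{\mathbf{s}}}}^{+}\mathbf{e}$ for a static target with a simple explicit Euler scheme. The matrix values, the gain, the time step and the function name `simulate_closed_loop` are all hypothetical choices made for the illustration; they do not come from a real camera model.

```python
import numpy as np


def simulate_closed_loop(L, L_hat, e0, lam=1.0, dt=1e-3, steps=2000):
    """Euler-integrate e_dot = -lam * L @ pinv(L_hat) @ e (static target)."""
    L = np.asarray(L, dtype=float)
    L_hat = np.asarray(L_hat, dtype=float)
    # Constant closed-loop matrix, valid because L and L_hat are constant here.
    A = -lam * L @ np.linalg.pinv(L_hat)
    e = np.asarray(e0, dtype=float).copy()
    history = [e.copy()]
    for _ in range(steps):
        e = e + dt * (A @ e)
        history.append(e.copy())
    return np.array(history)


# Hypothetical diagonal, constant interaction matrix with an exact model.
L_diag = np.diag([1.0, 2.0, 0.5])
e0 = np.array([1.0, -2.0, 0.5])
traj = simulate_closed_loop(L_diag, L_diag, e0, lam=1.0, dt=1e-3, steps=2000)

# Closed-form solution of e_dot = -lam * e after T = steps * dt = 2 s.
expected = e0 * np.exp(-1.0 * 2.0)
```

With an exact diagonal model, each error component follows its own exponential decay at rate $\lambda$, independently of the others. Passing a non-diagonal or mismatched `L_hat` to the same function is one way to observe how coupling and model errors change the trajectory.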

However, finding such visual features is a very complex problem and it is still an open issue. Basically, this problem consists in building the visual features $\mathbf{s}$ from the nonlinear visual measurements $\mathbf{m}$ so that the interaction matrix related to $\mathbf{s}$ becomes diagonal and constant or, at least, as simple as possible.

On the other hand, robust extraction, matching (between the initial and desired measurements) and real-time spatio-temporal tracking (between successive measurements) have to be ensured, which has proved to be a complex task, as testified by the abundant literature on the subject. Nevertheless, this image processing step is, to date, necessary and is often considered one of the bottlenecks in the expansion of visual servoing. That is why more and more non-geometric visual measurements are being proposed [3].