Section: Scientific Foundations
Statistical learning
Before detailing some issues in statistical learning, let us recall the definition of a few terms.
Glossary
- Machine learning refers to a system capable of the autonomous acquisition and integration of knowledge. This capacity to learn from experience, analytical observation, and other means results in a system that can continuously self-improve and thereby offer increased efficiency and effectiveness. (source: AAAI website, http://www.aaai.org/AITopics/html/machine.html)
- Statistical learning is an approach to machine intelligence that is based on statistical modeling of data. With a statistical model in hand, one applies probability theory and decision theory to obtain an algorithm. This stands in contrast to using training data merely to select among different algorithms, or to using heuristics/"common sense" to design an algorithm.
- Kernel method: generally speaking, a kernel function is a function that maps a pair of points to a real value. Typically, this value is a measure of dissimilarity between the two points. Under a few assumptions, the kernel function implicitly defines a dot product in some function space. This formal property, along with several others, has made these methods very appealing over the last ten years in the field of function approximation. Many classical algorithms have been "kernelized", that is, restated in a much more general way than in their original formulation. Kernels also implicitly induce a representation of the data in a "suitable" space where the problem to solve (classification, regression, ...) is expected to be simpler (non-linearity turns into linearity).
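As a concrete illustration of the kernelization idea, here is a minimal, hypothetical sketch (not code from the project) of kernel ridge regression with a Gaussian (RBF) kernel: the data are accessed only through pairwise kernel evaluations, so plain ridge regression is implicitly carried out in the function space induced by the kernel. The kernel width sigma and the regularization parameter lam are illustrative choices.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """Gaussian (RBF) kernel: k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def kernel_ridge_fit(X, y, lam=1e-2, sigma=1.0):
    """Solve (K + lam I) alpha = y; alpha are the dual weights."""
    K = rbf_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def kernel_ridge_predict(X_train, alpha, X_test, sigma=1.0):
    """Prediction is a weighted sum of kernel evaluations: f(x) = sum_i alpha_i k(x_i, x)."""
    return rbf_kernel(X_test, X_train, sigma) @ alpha

# Toy usage: regress a noisy sine from 50 samples and predict at x = 0.5.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)
alpha = kernel_ridge_fit(X, y)
print(kernel_ridge_predict(X, alpha, np.array([[0.5]])))
```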
The fundamental tools used in SequeL come from the field of statistical learning [73]. We briefly present those most important to us to date, namely kernel-based non parametric function approximation and non parametric Bayesian models.
Kernel methods for non parametric function approximation
In statistics in general, and in applied mathematics, the approximation of a multi-dimensional real function given some samples is a well-known problem (known as regression, interpolation, or function approximation, ...). Regressing a function from data is a key ingredient of our research or, at the very least, a basic component of most of our algorithms. In the context of sequential learning, we have to regress a function while data samples are obtained one at a time, under the constraint that the function can be predicted at any point at any step of the acquisition process. In sequential decision problems, we typically have to learn a value function or a policy.
Many methods have been proposed for this purpose. We are looking for ones suited to the problems we wish to solve. In reinforcement learning, the value function may have areas where the gradient is large; these are areas where the approximation is difficult, yet they are also the areas where the accuracy of the approximation should be highest in order to obtain a good policy (and where, otherwise, a bad choice of action may have catastrophic consequences).
We particularly favor non parametric methods since they make few assumptions about the function to learn. In particular, we have a strong interest in kernel-based approaches to non parametric regression; a minimal sketch of such a sequential regressor is given below.
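As an illustration of the sequential constraint described above, the following hypothetical sketch (not the team's algorithm) shows an online kernel regressor: samples are incorporated one at a time and the function can be queried at any step of the acquisition process. For simplicity it re-solves the regularized kernel system after each new sample; practical variants rely on rank-one updates or on sparsification to keep the cost bounded.

```python
import numpy as np

class OnlineKernelRegressor:
    """Naive online kernel ridge regression with a Gaussian kernel.

    Samples arrive one at a time; the model can be queried at any step.
    Illustrative only: the full linear system is re-solved at each step.
    """

    def __init__(self, sigma=1.0, lam=1e-2):
        self.sigma, self.lam = sigma, lam
        self.X, self.y, self.alpha = [], [], None

    def _k(self, A, B):
        A, B = np.atleast_2d(A), np.atleast_2d(B)
        d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d / (2 * self.sigma ** 2))

    def observe(self, x, y):
        """Incorporate one new sample (x, y)."""
        self.X.append(np.asarray(x, dtype=float))
        self.y.append(float(y))
        X = np.stack(self.X)
        K = self._k(X, X)
        self.alpha = np.linalg.solve(K + self.lam * np.eye(len(X)), np.array(self.y))

    def predict(self, x):
        """Predict f(x) from the samples seen so far."""
        if self.alpha is None:
            return 0.0
        return float(self._k(x, np.stack(self.X)) @ self.alpha)

# Samples arrive one at a time; the model can be queried between any two of them.
rng = np.random.default_rng(1)
model = OnlineKernelRegressor(sigma=0.5)
for t in range(30):
    x = rng.uniform(-2, 2, size=1)
    model.observe(x, np.sin(3 * x[0]) + 0.05 * rng.normal())
    if t % 10 == 9:
        print(t + 1, model.predict(np.array([0.0])))
```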
Non-parametric Bayesian models
Numerous problems in signal processing may be solved efficiently by way of a Bayesian approach. The use of Monte Carlo methods allows us to handle non-linear, as well as non-Gaussian, problems. In their standard form, they require the probability densities to be formulated in a parametric form. For instance, it is common practice to use a Gaussian likelihood because it is handy. However, in some applications such as Bayesian filtering or blind deconvolution, the choice of a parametric form for the density of the noise is often arbitrary. If this choice is wrong, it may have dramatic consequences on the quality of the estimation. To overcome this shortcoming, one possible approach is to consider that this density must also be estimated from the data. A general Bayesian approach then consists in defining a probabilistic space associated with the possible outcomes of the object to be estimated. Applied to density estimation, this means that we need to define a probability measure on the probability density of the noise: such a measure is called a random measure. The classical Bayesian inference procedures can then be used. This approach being non parametric by nature, the associated framework is called non parametric Bayesian.
In particular, mixtures of Dirichlet processes [72] provide a very powerful formalism. Dirichlet processes are a possible random measure, and mixtures of Dirichlet processes are an extension of the well-known finite mixture models. Given a mixture density

$f(x) = \int f(x \mid \theta) \, dG(\theta),$

where $f(\cdot \mid \theta)$ is a parametric density and $G$ is the mixing distribution, the non parametric Bayesian approach treats $G$ itself as a random measure distributed according to a Dirichlet process, $G \sim \mathrm{DP}(\alpha, G_0)$. Since a draw from a Dirichlet process is almost surely discrete, this amounts to a countably infinite mixture $\sum_k \pi_k f(x \mid \theta_k)$ whose weights $\pi_k$ and component parameters $\theta_k$ are random.
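To make the random-measure construction concrete, the following hypothetical sketch draws an approximate sample from a Dirichlet process mixture of one-dimensional Gaussians through the stick-breaking representation, truncated to a finite number of sticks; the base measure (a Gaussian on the component means) and the concentration parameter alpha are illustrative choices, not values used by the team.

```python
import numpy as np

def sample_dp_mixture(n, alpha=1.0, truncation=100, seed=0):
    """Draw n points from a (truncated) Dirichlet process mixture of 1-D Gaussians.

    Stick-breaking: v_k ~ Beta(1, alpha), pi_k = v_k * prod_{j<k} (1 - v_j);
    base measure G0: component means ~ N(0, 3^2), unit component variance.
    """
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, alpha, size=truncation)
    pi = v * np.cumprod(np.concatenate(([1.0], 1.0 - v[:-1])))
    pi /= pi.sum()                                   # renormalize the truncated weights
    means = rng.normal(0.0, 3.0, size=truncation)    # theta_k drawn from G0
    z = rng.choice(truncation, size=n, p=pi)         # component assignments
    return rng.normal(means[z], 1.0), z

data, assignments = sample_dp_mixture(500, alpha=2.0)
print("number of distinct components used:", len(np.unique(assignments)))
```

Note that the number of components effectively present in the sample is itself random, which is precisely what the non parametric prior provides.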
Given a set of observations, the estimation of the parameters of a mixture of Dirichlet processes is performed by way of a Markov chain Monte Carlo (MCMC) algorithm. Dirichlet process mixtures are also widely used in clustering problems: once the parameters of a mixture are estimated, they can be interpreted as the parameters of a specific cluster, which defines a class as well. Dirichlet processes are well known within the machine learning community, but their potential in statistical signal processing still needs to be developed.
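The MCMC estimation mentioned above is frequently implemented as a collapsed Gibbs sampler built on the Chinese restaurant process. The sketch below is a simplified, hypothetical illustration for one-dimensional Gaussian components with known variance and a conjugate Gaussian prior on the means; it resamples each observation's cluster assignment given all the others, so the number of clusters is inferred from the data rather than fixed in advance.

```python
import numpy as np
from math import exp, pi, sqrt

def normal_pdf(x, mean, var):
    return exp(-(x - mean) ** 2 / (2 * var)) / sqrt(2 * pi * var)

def dpm_gibbs(x, alpha=1.0, sigma2=1.0, tau2=9.0, n_iter=50, seed=0):
    """Collapsed Gibbs sampling for a DP mixture of 1-D Gaussians with known
    variance sigma2 and a conjugate N(0, tau2) prior on the component means.
    Returns the final cluster assignments."""
    rng = np.random.default_rng(seed)
    n = len(x)
    z = np.zeros(n, dtype=int)               # start with everything in one cluster
    for _ in range(n_iter):
        for i in range(n):
            z[i] = -1                         # remove x[i]; empty clusters vanish
            labels = [k for k in np.unique(z) if k >= 0]
            probs = []
            for k in labels:
                members = x[z == k]
                nk = len(members)
                post_prec = 1.0 / tau2 + nk / sigma2
                post_mean = (members.sum() / sigma2) / post_prec
                pred_var = 1.0 / post_prec + sigma2
                probs.append(nk * normal_pdf(x[i], post_mean, pred_var))
            # probability of opening a brand-new cluster
            probs.append(alpha * normal_pdf(x[i], 0.0, tau2 + sigma2))
            probs = np.array(probs) / sum(probs)
            choice = rng.choice(len(probs), p=probs)
            z[i] = labels[choice] if choice < len(labels) else (max(labels, default=-1) + 1)
    return z

# Toy usage on data drawn from two well-separated Gaussians.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-4, 1, 100), rng.normal(4, 1, 100)])
z = dpm_gibbs(data)
print("inferred number of clusters:", len(np.unique(z)))
```

On such toy data the sampler typically settles on two clusters, and each inferred component can be read as a cluster, as described above.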
Random Finite Sets for multi-sensor multi-target tracking
In the general multi-sensor multi-target Bayesian framework, an unknown (and possibly varying) number of targets, whose states evolve over time, must be jointly estimated from the measurements delivered by a collection of sensors.
The random finite set theory provides a powerful framework to deal with these issues. Mahler's work on finite sets statistics (FISST) provides a mathematical framework to build multi-object densities and to derive the Bayesian rules for state prediction and state estimation. Randomness in the number of objects and in their states is encapsulated into random finite sets (RFS), namely the multi-target (state) set $X_t$ and the multi-sensor (observation) set $Z_t$. The multi-target Bayes filter then propagates the multi-target posterior density through the prediction and update equations

$f_{t|t-1}(X_t \mid Z^{(t-1)}) = \int f_{t|t-1}(X_t \mid X)\, f_{t-1|t-1}(X \mid Z^{(t-1)})\, \delta X$

$f_{t|t}(X_t \mid Z^{(t)}) = \dfrac{f_t(Z_t \mid X_t)\, f_{t|t-1}(X_t \mid Z^{(t-1)})}{\int f_t(Z_t \mid X)\, f_{t|t-1}(X \mid Z^{(t-1)})\, \delta X}$   (5)

where:
- $X_t = \{x_{1,t}, \dots, x_{n_t,t}\}$ is a multi-target state, i.e. a finite set of elements defined on the single-target state space (the state of a target is usually composed of its position, its velocity, etc.);
- $Z_t$ is the current multi-sensor observation, i.e. a collection of measures produced at time $t$ by all the sensors;
- $Z^{(t)} = (Z_1, \dots, Z_t)$ is the collection of observations up to time $t$;
- $f_{t|t}(X_t \mid Z^{(t)})$ is the current multi-target posterior density in state $X_t$;
- $f_{t|t-1}(X_t \mid X_{t-1})$ is the current multi-target Markov transition density, from state $X_{t-1}$ to state $X_t$;
- $f_t(Z_t \mid X_t)$ is the current multi-sensor/multi-target likelihood function.
Although equations (5) may seem similar to the classical single-sensor/single-target Bayesian equations, they are generally intractable because of the presence of the set integrals. For, a RFS