The scientific objectives of ASPI are the design, analysis and implementation of interacting Monte Carlo methods, also known as particle methods, with a focus on

statistical inference in hidden Markov models, e.g. state or parameter estimation, including particle filtering,

risk evaluation, including simulation of rare events.

The subject as a whole is multidisciplinary, not only because of the many scientific and engineering areas in which particle methods are used, but also because of the diversity of the scientific communities which have already contributed to establishing the foundations of the field:

target tracking, interacting particle systems, empirical processes, genetic algorithms (GA), hidden Markov models and nonlinear filtering, Bayesian statistics, Markov chain Monte Carlo (MCMC) methods, etc.

Intuitively speaking, interacting Monte Carlo methods are sequential simulation methods, in which particles

*explore* the state space by mimicking the evolution of an underlying random process,

*learn* the environment by evaluating a fitness function,

and *interact* so that only the most successful particles (as measured by the value of the fitness function) are allowed to survive and to produce offspring at the next generation.

The effect of this mutation / selection mechanism is to automatically concentrate particles (i.e. the available computing power) in regions of interest of the state space. In the special case of particle filtering, which has numerous applications under the generic heading of positioning, navigation and tracking, in

target tracking, computer vision, mobile robotics, ubiquitous computing and ambient intelligence, sensor networks, etc.,

each particle represents a possible hidden state, and is multiplied or terminated at the next generation on the basis of its consistency with the current observation, as quantified by the likelihood function. These genetic–type algorithms are particularly well suited to situations which combine a prior model of the displacement of the mobile, sensor-based measurements, and a base of reference measurements, for example in the form of a digital map (digital elevation map, attenuation map, etc.). In the most general case, particle methods provide approximations of probability distributions associated with a Feynman–Kac flow, by means of the weighted empirical probability distribution associated with an interacting particle system, with applications that go far beyond filtering, in

simulation of rare events, simulation of conditioned or constrained random variables, interacting MCMC methods, molecular simulation, etc.

ASPI essentially carries out methodological research, rather than activities oriented towards a single application area, with the objective of obtaining generic results with high potential for applications, and of implementing up–to–date results on a few appropriate examples, through collaboration with industrial partners.

The main applications currently considered are geolocalisation and tracking of mobile terminals, terrain–aided navigation, data fusion for indoor localisation, risk assessment for complex hybrid systems, e.g. in air traffic management, and protection of digital documents.

Monte Carlo methods are numerical methods that are widely used in situations where (i) a stochastic (usually Markovian) model is given for some underlying process, and (ii) some
quantity of interest should be evaluated, that can be expressed in terms of the expected value of a functional of the process trajectory, which includes as an important special case the
probability that a given event has occurred. Numerous examples can be found, e.g. in financial engineering (pricing of options and derivative securities),
in performance evaluation of communication networks (probability of buffer overflow), in statistics of hidden
Markov models (state estimation, evaluation of contrast and score functions), etc. Very often in practice, no analytical expression is available for the quantity of interest, but it is possible
to simulate trajectories of the underlying process. The idea behind Monte Carlo methods is to generate independent trajectories of this process or of an alternate instrumental process, and to
build an approximation (estimator) of the quantity of interest in terms of the weighted empirical probability distribution associated with the resulting independent sample. By the law of large
numbers, the above estimator converges as the size N of the sample goes to infinity, with rate 1/√N, and the asymptotic variance can be estimated using an appropriate central limit theorem. To reduce the variance of the estimator, many variance reduction techniques have been proposed.
Still, running independent Monte Carlo simulations can lead to very poor results, because trajectories are generated
*blindly*, and only afterwards are the corresponding weights evaluated. Some of the weights can happen to be negligible, in which case the corresponding trajectories are not going to
contribute to the estimator, i.e. computing power has been wasted.
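As a minimal sketch of the weighted estimator described above, consider a toy problem that is not taken from the text: estimating the probability that a standard Gaussian variable exceeds 4. The function names, the threshold and the shifted Gaussian instrumental distribution are all assumptions chosen for illustration. The naive version generates samples blindly and almost never sees the event, whereas the weighted version samples from an instrumental distribution centred on the threshold and corrects with the density ratio.

```python
import math
import random


def naive_estimate(threshold, n_samples, rng):
    """Fraction of independent N(0, 1) draws that exceed the threshold."""
    hits = sum(1 for _ in range(n_samples) if rng.gauss(0.0, 1.0) > threshold)
    return hits / n_samples


def importance_estimate(threshold, n_samples, rng):
    """Weighted estimate using draws from the instrumental law N(threshold, 1)."""
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            # Weight = density ratio N(0, 1) / N(threshold, 1) evaluated at x.
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n_samples


rng = random.Random(0)
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))  # closed form, for comparison only
print("exact     :", exact)
print("naive     :", naive_estimate(4.0, 100_000, rng))
print("importance:", importance_estimate(4.0, 100_000, rng))
```

With the same sample size, the naive estimate rests on a handful of hits at best, while the weighted estimate uses roughly half of the sample. Choosing a good instrumental distribution is problem dependent, and this sketch does not address that choice.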

A recent and major breakthrough, a brief mathematical presentation of which is given below, has been the introduction of interacting Monte Carlo methods, also known as sequential Monte Carlo (SMC) methods, in which a
whole (possibly weighted) sample, called a *system of particles*, is propagated in time, where the particles

*explore* the state space under the effect of a *mutation* mechanism which mimics the evolution of the underlying process,

and are *replicated* or *terminated*, under the effect of a *selection* mechanism which automatically concentrates the particles, i.e. the available computing power, into regions of interest of the state space.

In full generality, the underlying process is a discrete–time Markov chain, whose state space can be

finite, continuous, hybrid (continuous / discrete), graphical, constrained, time varying, pathwise, etc.,

the only condition being that it can easily be
*simulated*. The very important case of a sampled continuous–time Markov process, e.g. the solution of a stochastic differential equation driven by a Wiener process or a more general Lévy
process, is also covered.

In the special case of particle filtering, originally developed within the tracking community, the algorithms yield a numerical approximation of the optimal filter, i.e. of the conditional
probability distribution of the hidden state given the past observations, as a (possibly weighted) empirical probability distribution of the system of particles. In its simplest version,
introduced in several different scientific communities under the name of *bootstrap filter*, *Monte Carlo filter* or *condensation* (conditional density propagation) algorithm, and which historically has been the first algorithm to include a redistribution step, the selection mechanism is
governed by the likelihood function: at each time step, a particle is more likely to survive and to replicate at the next generation if it is consistent with the current observation. The
algorithms also provide as a by–product a numerical approximation of the likelihood function, and of many other contrast functions for parameter estimation in hidden Markov models, such as the
prediction error or the conditional least–squares criterion.

Particle methods are currently being used in many scientific and engineering areas:

positioning, navigation, and tracking, visual tracking, mobile robotics, ubiquitous computing and ambient intelligence, sensor networks, risk evaluation and simulation of rare events, genetics, molecular simulation, etc.

Other examples of the many applications of particle filtering can be found in the contributed volume and in the special issue of *IEEE Transactions on Signal Processing* devoted to *Monte Carlo Methods for Statistical Signal Processing* in February 2002, which includes a tutorial paper, and in the textbook devoted to applications in target tracking. Applications of sequential Monte Carlo methods to other areas, beyond signal and image processing, e.g. to genetics, have also been reported.

Particle methods are very easy to implement, since it is sufficient in principle to simulate independent trajectories of the underlying process. The subject as a whole is multidisciplinary, not only because of the already mentioned diversity of the scientific and engineering areas in which particle methods are used, but also because of the diversity of the scientific communities which have contributed to establishing the foundations of the field:

target tracking, interacting particle systems, empirical processes, genetic algorithms (GA), hidden Markov models and nonlinear filtering, Bayesian statistics, Markov chain Monte Carlo (MCMC) methods.

The following abstract point of view, developed and extensively studied by Pierre Del Moral, has proved to be extremely fruitful in providing a very general framework for the design and analysis of numerical
approximation schemes, based on systems of branching and / or interacting particles, for nonlinear dynamical systems with values in the space of probability distributions, associated with
Feynman–Kac flows. Feynman–Kac distributions are characterized by a Markov chain and by nonnegative potential functions that play the role of selection functions. They naturally arise whenever
importance sampling is used: this applies for instance to simulation of rare events, to filtering, i.e. to state estimation in hidden Markov models (HMM), etc. To solve *numerically* the recurrent equation satisfied by the Feynman–Kac distributions, and in view of the basic assumption that it is easy to *simulate* random variables according to the Markov transition kernel, i.e. to mimic the evolution of the Markov chain, and that it is easy to *evaluate* the potential functions, the original idea behind particle methods consists of looking for an approximation in the form of a (possibly weighted) empirical probability
distribution associated with a system of particles. The approximation is completely characterized by the set of particle positions and weights, and the algorithm is completely described by the
mechanism which builds this set recursively. In practice, in the simplest version of the algorithm, known as the *bootstrap* algorithm, particles

are selected according to their respective weights (selection step),

move according to the Markov transition kernel (mutation step),

are weighted by evaluating the fitness function (weighting step).
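The three steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the scalar linear Gaussian model (`a`, `sigma_x`, `sigma_y`), the function name `bootstrap_filter` and the choice of a Gaussian likelihood as fitness function are hypothetical and are not taken from the text.

```python
import math
import random


def bootstrap_filter(observations, n_particles, rng, a=0.9, sigma_x=1.0, sigma_y=1.0):
    """Filtered means for the assumed model x_k = a x_{k-1} + noise, y_k = x_k + noise."""
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    weights = [1.0 / n_particles] * n_particles
    means = []
    for y in observations:
        # Selection step: sample with replacement according to the current weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        # Mutation step: move each particle with the Markov transition kernel.
        particles = [a * x + rng.gauss(0.0, sigma_x) for x in particles]
        # Weighting step: evaluate the fitness (here a Gaussian likelihood).
        log_w = [-0.5 * ((y - x) / sigma_y) ** 2 for x in particles]
        shift = max(log_w)  # subtract the maximum to avoid numerical underflow
        weights = [math.exp(lw - shift) for lw in log_w]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted empirical mean as a point estimate of the hidden state.
        means.append(sum(w * x for w, x in zip(weights, particles)))
    return means


# Usage: simulate a short trajectory from the same assumed model, then filter it.
rng = random.Random(1)
x, observations = rng.gauss(0.0, 1.0), []
for _ in range(50):
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    observations.append(x + rng.gauss(0.0, 1.0))
print(bootstrap_filter(observations, 200, rng)[-1])
```

The particle positions and weights at each time are the whole approximation; the filtered mean is just one functional of it.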

The algorithm yields a numerical approximation of the Feynman–Kac distribution as the weighted empirical probability distribution associated with a system of particles, and many asymptotic
results have been proved as the number N of particles (sample size) goes to infinity, using techniques coming from applied probability (interacting particle systems, empirical processes), see e.g. the survey article or the recent textbook, and references therein:

convergence in L^p, convergence as empirical processes indexed by classes of functions, uniform convergence in time, central limit theorem, propagation of chaos, large deviations principle, moderate deviations principle, etc.

Beyond the simplest *bootstrap* version of the algorithm, many algorithmic variations have been proposed, and are commonly used in practice. For instance (i) in the selection step, sampling with replacement could be
replaced with other redistribution schemes so as to reduce the variance (this issue has also been addressed in genetic algorithms), and (ii) to reduce the variance and to save
computational effort, it is often a good idea not to redistribute the particles at each time step, but only when the weights are too far from equidistribution. Even with interacting Monte Carlo
methods, it could happen that some particles generated in one time step have a negligible weight: if this happens for too many particles in the sample, then computing power has been wasted, and
it has been suggested to use importance sampling again in the mutation step, i.e. to let particles explore the state space under the action of an alternate (wrong) mutation kernel, and to weight
the particles according to their likelihood for the true model, so as to compensate for the wrong modeling. More specifically, using an arbitrary importance decomposition results in the
following general algorithm, known as the *sampling with importance resampling* (SIR) algorithm, in which particles

are selected according to their respective weights (selection step),

move according to the proposed importance Markov transition kernel (mutation step),

are weighted by evaluating the corresponding importance function, which now depends on the Markov transition (weighting step).
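The importance decomposition used in the weighting step can be written out explicitly. The notation below is assumed for illustration and does not come from the text: $Q_k(x, dx')$ is the original Markov transition kernel, $g_k$ the potential (fitness) function, $P_k(x, dx')$ the proposed importance kernel and $W_k$ the importance function.

$$
g_k(x')\, Q_k(x, dx') = W_k(x, x')\, P_k(x, dx'),
\qquad
W_k(x, x') = g_k(x')\, \frac{dQ_k(x, \cdot)}{dP_k(x, \cdot)}(x') .
$$

This requires $g_k(x')\, Q_k(x, dx')$ to be absolutely continuous with respect to $P_k(x, dx')$. Taking $P_k = Q_k$ gives $W_k(x, x') = g_k(x')$, which is the bootstrap algorithm; for any other choice the weight depends on both the parent position $x$ and the offspring position $x'$.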

Many of the early convergence results proved in the literature assume that particles are redistributed (i) using sampling with replacement and (ii) at each time step, and move according to the original Markov transition kernel. Systematically studying the impact of the proposed algorithmic variants on the convergence results is still the subject of active research.
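The two variations (i) and (ii) discussed above can be illustrated as follows. This is a sketch under assumptions: systematic resampling is used as one example of an alternative redistribution scheme, the effective sample size $1 / \sum_i w_i^2$ is used as one possible measure of distance from equidistribution, and the threshold of one half and all function names are hypothetical.

```python
import random


def effective_sample_size(weights):
    """Effective number of particles 1 / sum(w_i^2), for normalized weights."""
    return 1.0 / sum(w * w for w in weights)


def systematic_resample(weights, rng):
    """Return particle indices drawn with one uniform and evenly spaced positions."""
    n = len(weights)
    u = rng.random()
    indices = []
    j = 0
    cumulative = weights[0]
    for i in range(n):
        position = (u + i) / n
        # Advance to the first particle whose cumulative weight reaches the position.
        while cumulative < position and j < n - 1:
            j += 1
            cumulative += weights[j]
        indices.append(j)
    return indices


def maybe_resample(particles, weights, rng, threshold=0.5):
    """Redistribute only when the effective sample size drops below threshold * N."""
    n = len(particles)
    if effective_sample_size(weights) >= threshold * n:
        return particles, weights  # weights close enough to equidistribution
    indices = systematic_resample(weights, rng)
    return [particles[i] for i in indices], [1.0 / n] * n


rng = random.Random(0)
print(maybe_resample(["a", "b", "c", "d"], [0.25, 0.25, 0.25, 0.25], rng))
print(maybe_resample(["a", "b", "c", "d"], [0.97, 0.01, 0.01, 0.01], rng))
```

In the first call the weights are uniform, so nothing is redistributed; in the second the effective sample size is close to one and the dominant particle is replicated.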

Hidden Markov models (HMM) form a special case of partially observed stochastic dynamical systems, in which the state of a Markov process (in discrete or continuous time, with finite or continuous state space) should be estimated from noisy observations. The conditional probability distribution of the hidden state given past observations is a well–known example of a normalized (nonlinear) Feynman–Kac distribution. These models are very flexible, because the introduction of latent (unobserved) variables makes it possible to model complex time dependent structures, to take constraints into account, etc. In addition, the underlying Markovian structure makes it possible to use numerical algorithms (particle filtering, Markov chain Monte Carlo (MCMC) methods, etc.) which are computationally intensive but whose complexity is rather small. Hidden Markov models are widely used in various applied areas, such as speech recognition, alignment of biological sequences, tracking in complex environments, modeling and control of networks, digital communications, etc.
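To make the link with Feynman–Kac distributions concrete, here is the standard form of the prediction filter in notation assumed for illustration: $Q_k(x, dx')$ denotes the transition kernel of the hidden chain $(X_k)$, $g_k(x)$ the likelihood of the observation recorded at time $k$ when the hidden state is $x$, and $\mu_k$ the conditional distribution of $X_k$ given the observations up to time $k-1$. For any test function $\phi$, with the observations held fixed inside the expectations,

$$
\langle \mu_n, \phi \rangle
= \frac{\mathbb{E}\big[\phi(X_n) \prod_{k=0}^{n-1} g_k(X_k)\big]}{\mathbb{E}\big[\prod_{k=0}^{n-1} g_k(X_k)\big]},
\qquad
\mu_{k+1}(dx') = \frac{\int \mu_k(dx)\, g_k(x)\, Q_{k+1}(x, dx')}{\int \mu_k(dx)\, g_k(x)} .
$$

The normalization by $\int \mu_k(dx)\, g_k(x)$ is what makes the recursion nonlinear in $\mu_k$.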

Beyond the recursive estimation of a hidden state from noisy observations, the problem arises of statistical inference for HMMs with general state space, including estimation of model parameters, early monitoring and diagnosis of small changes in model parameters, etc.

**Large time asymptotics** A fruitful approach is the asymptotic study, when the observation time increases to infinity, of an extended Markov chain, whose state includes
(i) the hidden state, (ii) the observation, (iii) the prediction filter (i.e. the conditional probability distribution of the hidden state given observations at all previous time
instants), and possibly (iv) the derivative of the prediction filter with respect to the parameter. Indeed, it is easy to express the log–likelihood function, the conditional least–squares
criterion, and many other classical contrast processes, as well as their derivatives with respect to the parameter, as additive functionals of the extended Markov chain.
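For instance, in notation assumed here for illustration, write $\mu_k^\theta$ for the prediction filter under parameter $\theta$ (the conditional distribution of the hidden state at time $k$ given the observations $Y_0, \dots, Y_{k-1}$) and $g^\theta(x, y)$ for the likelihood of observing $y$ when the hidden state is $x$. The log–likelihood of the first $n+1$ observations then reads

$$
\ell_n(\theta) = \sum_{k=0}^{n} \log \int \mu_k^\theta(dx)\, g^\theta(x, Y_k) .
$$

Each term depends only on the pair $(Y_k, \mu_k^\theta)$, that is, on components (ii) and (iii) of the extended Markov chain, which is why ergodic properties of that chain translate into limit theorems for $\ell_n(\theta) / n$.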

The following general approach has been proposed:

first, prove an exponential stability property (i.e. an exponential forgetting property of the initial condition) of the prediction filter and its derivative, for a misspecified model,

from this, deduce a geometric ergodicity property and the existence of a unique invariant probability distribution for the extended Markov chain, hence a law of large numbers and a central limit theorem for a large class of contrast processes and their derivatives, and a local asymptotic normality property,

finally, obtain the consistency (i.e. the convergence to the set of minima of the associated contrast function), and the asymptotic normality of a large class of minimum contrast estimators.

This programme has been completed in the case of a finite state space, and has been generalized under a uniform minorization assumption for the Markov transition kernel, which typically holds only when the state space is compact. Clearly, the whole approach relies on the existence of an exponential stability property of the prediction filter, and the main challenge currently is to get rid of this uniform minorization assumption for the Markov transition kernel, so as to be able to consider more interesting situations, where the state space is noncompact.

**Small noise asymptotics** Another asymptotic approach can also be used, where it is rather easy to obtain interesting explicit results, in terms close to the language of
nonlinear deterministic control theory. Taking the simple example where the hidden state is the solution to an ordinary differential equation, or a
nonlinear state model, and where the observations are subject to additive Gaussian white noise, this approach consists in assuming that the covariance matrices of the state noise and of the
observation noise go simultaneously to zero. Although it is reasonable in many applications to consider that noise covariances are small, this asymptotic approach is less natural than the large time
asymptotics, where it is enough (provided a suitable ergodicity assumption holds) to accumulate observations and to see the expected limit laws (law of large numbers, central limit theorem,
etc.). In contrast, the expressions obtained in the limit (Kullback–Leibler divergence, Fisher information matrix, asymptotic covariance matrix, etc.) take here a much more explicit form than
in the large time asymptotics.

The following results have been obtained using this approach:

the consistency of the maximum likelihood estimator (i.e. the convergence to the set M of global minima of the Kullback–Leibler divergence) has been obtained using large deviations techniques, with an analytical approach,

if the abovementioned set M does not reduce to the true parameter value, i.e. if the model is not identifiable, it is still possible to describe precisely the asymptotic behavior of the estimators: in the simple case where the state equation is a noise–free ordinary differential equation and using a Bayesian framework, it has been shown that (i) if the rank r of the Fisher information matrix I is constant in a neighborhood of the set M, then this set is a differentiable submanifold of codimension r, (ii) the posterior probability distribution of the parameter converges to a random probability distribution in the limit, supported by the manifold M, absolutely continuous w.r.t. the Lebesgue measure on M, with an explicit expression for the density, and (iii) the posterior probability distribution of the suitably normalized difference between the parameter and its projection on the manifold M converges to a mixture of Gaussian probability distributions on the normal spaces to the manifold M, which generalizes the usual asymptotic normality property,

it has been shown that (i) the parameter dependent probability distributions of the observations are locally asymptotically normal (LAN), from which the asymptotic normality of the maximum likelihood estimator follows, with an explicit expression for the asymptotic covariance matrix, i.e. for the Fisher information matrix I, in terms of the Kalman filter associated with the tangent linear Gaussian model, and (ii) the score function (i.e. the derivative of the log–likelihood function w.r.t. the parameter), evaluated at the true value of the parameter and suitably normalized, converges to a Gaussian r.v. with zero mean and covariance matrix I.

The estimation of the small probability of a rare but critical event is a crucial issue in industrial areas such as

nuclear power plants, food industry, telecommunication networks, finance and insurance industry, air traffic management, etc.

In such complex systems, analytical methods cannot be used, and naive Monte Carlo methods are clearly inefficient for accurately estimating very small probabilities. Besides importance sampling, an alternate widespread technique consists in multilevel splitting, where trajectories going towards the critical set are given offspring, thus increasing the number of trajectories that eventually reach the critical set. The Feynman–Kac formalism is well suited for the design and analysis of splitting algorithms for rare event simulation.

**Propagation of uncertainty** Multilevel splitting can be used in static situations. Here, the objective is to learn the probability distribution of an output random
variable Y = F(X), where the function F is only defined pointwise, for instance by a computer programme, and where the probability distribution of the input random variable X is known and easy to simulate from. More specifically, the objective could be to compute the probability of the output random variable exceeding a threshold, or more generally to
evaluate the cumulative distribution function of the output random variable for different output values. This problem is characterized by the lack of an analytical expression for the function,
the computational cost of a single pointwise evaluation of the function, which means that the number of calls to the function should be limited as much as possible, and finally the complexity
and/or unavailability of the source code of the computer programme, which makes any modification very difficult or even impossible, for instance to change the model as in importance sampling
methods.

The key issue is to learn as fast as possible the regions of the input space which contribute most to the computation of the target quantity. The proposed splitting method consists in (i) introducing a sequence of intermediate regions in the input space, implicitly defined by exceeding an increasing sequence of thresholds or levels, (ii) counting the fraction of samples that reach a level given that the previous level has already been reached, and (iii) regenerating the sample at each intermediate step, through redistribution. In this way, the algorithm learns

the transition probability between successive levels, hence the probability of reaching each intermediate level,

and the probability distribution of the input random variable, conditioned on the output variable reaching each intermediate level.

A further remark is that this conditional probability distribution is precisely the optimal (zero variance) importance distribution needed to compute the probability of reaching the considered intermediate level.
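A minimal sketch of static multilevel splitting along the lines of steps (i) to (iii) is given here, under several assumptions that are not in the text: the input is a scalar standard Gaussian variable, the levels are fixed in advance, and the regeneration step combines sampling with replacement among the survivors with a few Metropolis moves that leave the conditioned input distribution invariant. The names `splitting_estimate`, `black_box`, `n_moves` and `rho` are hypothetical.

```python
import math
import random


def splitting_estimate(black_box, levels, n_samples, rng, n_moves=10, rho=0.9):
    """Estimate P(black_box(X) > levels[-1]) for X ~ N(0, 1), with increasing levels."""
    step = math.sqrt(1.0 - rho * rho)
    sample = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    estimate = 1.0
    for k, level in enumerate(levels):
        # (ii) fraction of the current sample whose output exceeds this level
        survivors = [x for x in sample if black_box(x) > level]
        estimate *= len(survivors) / n_samples
        if not survivors:
            return 0.0  # extinction: no sample reached this level
        if k == len(levels) - 1:
            break
        # (iii) regenerate the sample: redistribution among survivors ...
        sample = rng.choices(survivors, k=n_samples)
        # ... followed by Metropolis moves; the proposal is reversible w.r.t. N(0, 1),
        # so a candidate is accepted exactly when it stays above the current level.
        for _ in range(n_moves):
            for i, x in enumerate(sample):
                candidate = rho * x + step * rng.gauss(0.0, 1.0)
                if black_box(candidate) > level:
                    sample[i] = candidate
    return estimate


rng = random.Random(0)
levels = [0.5 * i for i in range(1, 9)]  # 0.5, 1.0, ..., 4.0
print(splitting_estimate(lambda x: x, levels, 2000, rng))
print(0.5 * math.erfc(4.0 / math.sqrt(2.0)))  # exact value for this toy case
```

The function is only ever called pointwise, which is the non–intrusive property stressed in the text. This sketch re-evaluates the function when selecting survivors; when each evaluation is expensive, the output values would be cached alongside the samples instead.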

**Rare event simulation** To be specific, consider a complex dynamical system modelled as a Markov process, whose state can possibly contain continuous components and finite
components (mode, regime, etc.), and the objective is to compute the probability, hopefully very small, that a critical region of the state space is reached by the Markov process before a final
time T, which can be deterministic and fixed, or random (for instance the time of return to a recurrent set, corresponding to a nominal behaviour).

The proposed splitting method consists in (i) introducing a decreasing sequence of intermediate, more and more critical, regions in the state space, (ii) counting the fraction of
trajectories that reach an intermediate region before time T, given that the previous intermediate region has been reached before time T, and (iii) regenerating the population at each stage, through redistribution. In addition to the non–intrusive behaviour of the method, the splitting methods make it possible to
learn the probability distribution of typical critical trajectories, which reach the critical region before final time T, an important feature that methods based on importance sampling usually miss. Many variants have been proposed, depending on whether

the branching rate (number of offspring allocated to a successful trajectory) is fixed, which allows for depth–first exploration of the branching tree, but raises the issue of controlling the population size,

the population size is fixed, which requires a breadth–first exploration of the branching tree, with random (multinomial) or deterministic allocation of offspring, etc.

Just as in the static case, the algorithm learns

the transition probability between successive levels, hence the probability of reaching each intermediate level,

and the entrance probability distribution of the Markov process in each intermediate region.
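The identity behind both items can be written down explicitly. The notation is assumed for illustration: $B_1 \supset B_2 \supset \dots \supset B_n$ is the decreasing sequence of intermediate regions, with $B_n$ the critical region, and $T_k$ is the first time the Markov process enters $B_k$. Since the regions are nested, reaching $B_k$ before time $T$ implies having reached $B_{k-1}$ before time $T$, hence

$$
\mathbb{P}(T_n \le T) = \mathbb{P}(T_1 \le T) \prod_{k=2}^{n} \mathbb{P}(T_k \le T \mid T_{k-1} \le T) .
$$

The algorithm estimates each factor by the fraction of trajectories that reach the next region. As a purely numerical illustration, five factors each close to $0.1$ give a target probability of order $10^{-5}$, although no individual factor corresponds to a rare event.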

Contributions have been made to

minimizing the asymptotic variance, obtained through a central limit theorem, with respect to the shape of the intermediate regions (selection of the importance function), to the thresholds (levels), to the population size, etc.

controlling the probability of extinction (when not even one trajectory reaches the next intermediate level),

designing and studying variants suited for hybrid state space (resampling per mode, marginalization),

and in the static case, to

minimizing the asymptotic variance, obtained through a central limit theorem, with respect to intermediate levels, to the Metropolis kernel introduced in the mutation step, etc.

Among the many application domains of particle methods, or interacting Monte Carlo methods, ASPI has decided to focus on applications in localisation (or positioning), navigation and tracking, which already covers a very broad spectrum of application domains. The objective here is to estimate the position (and also velocity, attitude, etc.) of a mobile object, from the combination of different sources of information, including

a prior dynamical model of typical evolutions of the mobile, such as inertial estimates and prior model for inertial errors,

measurements provided by sensors,

and possibly a digital map providing some useful feature (terrain altitude, power attenuation, etc.) at each possible position.

In some applications, another useful source of information is provided by

a map of constrained admissible displacements, for instance in the form of an indoor building map,

which particle methods can easily handle (map-matching). This Bayesian dynamical estimation problem is also called filtering, and its numerical implementation using particle methods, known as particle filtering, has been introduced by the target tracking community, which has already contributed to many of the most interesting algorithmic improvements and is still very active, and has found applications in

target tracking, integrated navigation, points and / or objects tracking in video sequences, mobile robotics, wireless communications, ubiquitous computing and ambient intelligence, sensor networks, etc.

ASPI is contributing to several applications of particle filtering in positioning, navigation and tracking, such as geolocalisation and tracking in wireless communications, terrain–aided navigation, and data fusion for indoor localisation.

Another application domain of particle methods, or interacting Monte Carlo methods, that ASPI has decided to focus on is the estimation of the small probability of a rare but critical event, in complex dynamical systems. This is a crucial issue in industrial areas such as

nuclear power plants, food industry, telecommunication networks, finance and insurance industry, air traffic management, etc.

In such complex systems, analytical methods cannot be used, and naive Monte Carlo methods are clearly inefficient for accurately estimating very small probabilities. Besides importance sampling, an alternate widespread technique consists in multilevel splitting, where trajectories going towards the critical set are given offspring, thus increasing the number of trajectories that eventually reach the critical set. This approach not only makes it possible to estimate the probability of the rare event, but also provides realizations of the random trajectory, given that it reaches the critical set, i.e. provides realizations of typical critical trajectories, an important feature that methods based on importance sampling usually miss.

ASPI is contributing to several applications of multilevel splitting for rare event simulation, such as risk assessment in air traffic management, and protection of digital documents.

Demos have been developed to illustrate that particle filtering algorithms are efficient, easy to implement, and extremely visual and intuitive by nature, for localisation, navigation and tracking problems in complex environments with geometrical constraints, which would be very difficult to solve with usual Kalman filters. This material has proved very useful in training sessions and seminars that have been organized in response to the demand from industrial partners (SAGEM, CNES and EDF), and also in teaching. At the moment, the following three demos are available:

Inertial position and velocity estimates are known to drift away from their true values, and need to be combined with some external source of information. In this demo, noisy measurements of the terrain height below an aircraft are obtained as the difference between (i) the aircraft altitude above sea level (provided by a pressure sensor) and (ii) the aircraft altitude above the terrain (provided by an altimetric radar), and are compared to the terrain height at any possible point (read from the elevation map). A cloud (swarm) of particles explores various possible trajectories generated from inertial navigation estimates and from a model of inertial navigation errors, and particles are replicated or discarded depending on whether or not the terrain height below the particle (i.e. at the same horizontal position) matches the available noisy measurement of the terrain height below the aircraft.

In this demo, several stations cooperate to locate and track a mobile from noisy angle measurements, in the presence of obstacles (walls, tunnels, etc.), which make the mobile temporarily invisible from one or several stations.

In this demo, a mobile robot is finding its way inside a building, a digital map of which (including walls, doorways, etc.) is provided. The initial position, velocity and orientation of the robot are unknown, and noisy measurements of its rotation and linear displacement are given by an odometer. In addition, a ring of laser sensors detects with some error the distance from the robot to obstacles in sixteen different directions. A cloud (swarm) of particles explores various possible trajectories generated from odometer navigation estimates and from a model of odometer navigation errors, and particles are replicated or discarded depending on whether or not the distance from the particle to obstacles matches the available noisy measurement of the distance from the robot to the obstacles, in all sixteen directions, and depending also on whether the generated trajectories are compatible with the presence of obstacles.

This is a collaboration with Élise Arnaud, from université Joseph Fourier and INRIA Grenoble — Rhône Alpes.

A longstanding problem in particle or sequential Monte Carlo (SMC) methods is to mathematically prove the popular belief that resampling does improve the performance of the estimation (this of course is not always true, and the real question is to clarify classes of problems where resampling helps). A more pragmatic answer to the problem is to use adaptive procedures that have been proposed on the basis of heuristic considerations, where resampling is performed only when it is felt necessary, i.e. when some criterion (effective number of particles, entropy of the sample, etc.) reaches some prescribed threshold. It still remains to mathematically prove the efficiency of such adaptive procedures. Our first contribution has been to consider a design where resampling is performed at some intermediate fixed time instants, and to optimize the asymptotic variance of the estimation error w.r.t. the resampling time instants. The second contribution has been to prove a central limit theorem for particle methods with adaptive resampling, using an interpretation of particle methods where importance weights are interpreted as particles, as long as they are not used for resampling purposes, and to minimize the asymptotic variance w.r.t. the threshold.

This is a collaboration with Valérie Monbet, from université de Bretagne Sud.

Surprisingly, very little was known about the asymptotic behaviour of the ensemble Kalman filter, whereas the asymptotic behaviour of many different classes of particle filters is well understood, as the number of particles goes to infinity. Interpreting the ensemble elements as a population of particles with mean–field interactions, and not only as an instrumental device producing an estimation of the hidden state as the ensemble mean value, it has been possible to prove the convergence of the ensemble Kalman filter, with a rate of order $1/\sqrt{N}$, as the number $N$ of ensemble elements increases to infinity. In addition, the limit of the empirical distribution of the ensemble elements has been exhibited, which differs from the usual Bayesian filter. Several cases have been investigated, from the simple case where the drift coefficient is bounded and globally Lipschitz continuous, to the more realistic case where the drift coefficient is locally Lipschitz continuous, with polynomial growth. In all these cases, the observation coefficient was assumed linear, so that the analysis step for each ensemble element has exactly the same structure as the analysis step of the usual Kalman filter.
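The analysis step with a linear observation coefficient can be sketched as follows, for a scalar state and a scalar observation. This is a minimal illustration assuming the common perturbed–observation variant of the ensemble Kalman filter; the function name and arguments are hypothetical.

```python
import random


def enkf_analysis(ensemble, observation, h, obs_var, rng):
    """Analysis step of a scalar ensemble Kalman filter (perturbed observations).

    Each element is updated with the Kalman formula, where the gain is
    computed from the empirical variance of the ensemble: the elements
    interact only through this empirical moment (mean-field interaction).
    """
    n = len(ensemble)
    mean = sum(ensemble) / n
    variance = sum((x - mean) ** 2 for x in ensemble) / (n - 1)
    gain = variance * h / (h * h * variance + obs_var)
    obs_sd = obs_var ** 0.5
    return [x + gain * (observation + rng.gauss(0.0, obs_sd) - h * x)
            for x in ensemble]
```

In the linear Gaussian toy case (standard Gaussian prior, `h = 1`, `obs_var = 1`, observation equal to 2), the updated ensemble mean approaches the Kalman posterior mean 1 as the ensemble size grows. The difference with the Bayesian filter mentioned above arises in nonlinear models, which this sketch does not cover.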

The next step is to study the asymptotic normality of the estimation error, i.e. to prove a central limit theorem. It is somewhat expected that the asymptotic variance for the ensemble Kalman filter would be smaller than the known asymptotic variance for the various kinds of particle filters, simply because the ensemble Kalman filter follows an essentially parametric approach, where only the first two empirical moments are propagated, whereas particle filters follow a fully nonparametric approach.

This is a collaboration with Christophe Baehr, from the centre national de recherche météorologique (CNRM) of Météo–France.

The motivating application is the estimation of Lagrangian velocity in a turbulent flow: to filter out observation noise, a Bayesian approach is used, with a simplified Langevin model as the prior for the Lagrangian velocity. This model involves local means of the Eulerian velocity field, which can be expressed in terms of the probability distribution of the Lagrangian velocity. Other nonlinear terms in the model, such as the mean pressure gradient, the turbulent kinetic energy $k$ and its dissipation rate $\varepsilon$, are either considered as unknown random variables, with a somewhat arbitrary prior probability distribution, or are related to local Eulerian means and can then be expressed in terms of the probability distribution of the Lagrangian velocity. In other words, the proposed simplified Langevin model is a special example of a nonlinear McKean model with mean–field interactions, where the drift coefficient depends on the probability distribution of the solution.
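To make the notion of mean–field interaction concrete, here is a toy sketch of a particle approximation of a McKean–type model, in which the unknown mean appearing in the drift is replaced by the empirical mean of the particles. The scalar relaxation dynamics and the parameters `relaxation` and `sigma` are assumptions chosen for illustration; this is not the simplified Langevin model used in this work.

```python
import math
import random


def mean_field_step(particles, dt, relaxation, sigma, rng):
    """One Euler step of dX = -relaxation * (X - E[X]) dt + sigma dW.

    The expectation E[X] depends on the distribution of the solution;
    it is approximated here by the empirical mean of the particle system.
    """
    mean = sum(particles) / len(particles)
    noise_scale = sigma * math.sqrt(dt)
    return [x - relaxation * (x - mean) * dt + noise_scale * rng.gauss(0.0, 1.0)
            for x in particles]
```

Without noise (`sigma = 0`), one step leaves the empirical mean unchanged and contracts every particle toward it, which is a quick sanity check of the interaction term.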

The original estimation problem reduces to the estimation of the hidden state in a nonlinear Markov model, and numerical approximations have been studied, with two populations of particles: the first population of particles with mean–field interactions learns the unconditional probability distribution of the hidden state, whereas the second population of particles approximates the Bayesian filter, i.e. the conditional probability distribution of the hidden state given the observations.

Alternatively, since noisy observations are available, the local Eulerian means can be expressed in terms of the conditional probability distribution of the Lagrangian velocity given the observations. This results in a much simpler model, where a single population of particles is sufficient to approximate the Bayesian filter.

Gábor Tardos was the first to give a construction of a fingerprinting code whose length meets the known lower bound. This was a real breakthrough because the construction is very simple. Its efficiency comes from its probabilistic nature. However, although Tardos gave almost no rationale for his choices, many parameters of his code are precisely fine–tuned. We propose the missing rationale supporting the code construction. The key idea is to render the statistics of the scores as independent as possible from the collusion process. Tardos' optimal parameters are rediscovered. This interpretation allows small improvements when some assumptions hold on the collusion process.
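For readers unfamiliar with the construction, the following sketch recalls the original Tardos code as we understand it: position biases drawn from a truncated arcsine distribution, independent Bernoulli symbols, and an accusation score that accumulates over the positions where the pirated copy carries a 1. The function names, the majority collusion strategy and the simulation sizes are assumptions chosen for illustration.

```python
import math
import random


def tardos_biases(length, n_colluders, rng):
    """Draw one bias per position, p = sin^2(r) with r uniform on a truncated interval."""
    cutoff = 1.0 / (300.0 * n_colluders)
    r_min = math.asin(math.sqrt(cutoff))
    return [math.sin(rng.uniform(r_min, math.pi / 2 - r_min)) ** 2
            for _ in range(length)]


def draw_codeword(biases, rng):
    """Each symbol is 1 with probability p, independently over positions."""
    return [1 if rng.random() < p else 0 for p in biases]


def accusation_score(codeword, pirated, biases):
    """Tardos score: only positions where the pirated copy shows a 1 contribute."""
    score = 0.0
    for x, y, p in zip(codeword, pirated, biases):
        if y == 1:
            score += math.sqrt((1 - p) / p) if x == 1 else -math.sqrt(p / (1 - p))
    return score


def simulate_collusion(n_colluders=3, n_innocents=20, error_prob=0.01, seed=0):
    """Majority-vote collusion (an assumed attack), then score every user."""
    rng = random.Random(seed)
    log_term = math.log(1.0 / error_prob)
    length = math.ceil(100 * n_colluders ** 2 * log_term)
    threshold = 20 * n_colluders * log_term
    biases = tardos_biases(length, n_colluders, rng)
    colluders = [draw_codeword(biases, rng) for _ in range(n_colluders)]
    innocents = [draw_codeword(biases, rng) for _ in range(n_innocents)]
    pirated = [1 if 2 * sum(bits) > n_colluders else 0 for bits in zip(*colluders)]
    colluder_scores = [accusation_score(c, pirated, biases) for c in colluders]
    innocent_scores = [accusation_score(c, pirated, biases) for c in innocents]
    return colluder_scores, innocent_scores, threshold
```

In such a simulation, the scores of innocent users are centred around zero, while the scores of colluders are shifted upward. The fine–tuned parameters (cutoff, code length, threshold) control how well the two groups separate.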

This is a collaboration with Gérard Biau, from université Paris 6.

Using covering numbers for compact embeddings in functional spaces, we have obtained explicit rates of convergence for the k–nearest neighbor classifier in (infinite dimensional) function spaces. The key idea is to compute the neighbors using a norm that comes from a functional space with less regularity than the samples. The rates obtained are genuine nonparametric convergence rates, and to our knowledge the first of their kind for k–nearest neighbor classification. This work is still in progress.
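The following toy sketch shows what a k–nearest neighbor classifier on functional data looks like in practice, with curves sampled on a regular grid and neighbors computed in a discretised L2 norm, i.e. a norm that ignores the smoothness of the curves. The grid, the norm and the function names are assumptions made for illustration, and the sketch says nothing about convergence rates.

```python
import math
from collections import Counter


def l2_distance(f, g, dx):
    """Discretised L2 distance between two curves sampled on the same grid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)) * dx)


def knn_classify(query, samples, labels, k, dx):
    """Majority vote among the k sampled curves closest to the query curve."""
    order = sorted(range(len(samples)),
                   key=lambda i: l2_distance(query, samples[i], dx))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def make_toy_dataset(n_points=50):
    """Two classes of smooth curves on [0, 1]: straight lines and sine waves."""
    grid = [j / (n_points - 1) for j in range(n_points)]
    samples, labels = [], []
    for amplitude in (0.8, 1.0, 1.2):
        samples.append([amplitude * t for t in grid])
        labels.append(0)
        samples.append([amplitude * math.sin(2 * math.pi * t) for t in grid])
        labels.append(1)
    return grid, samples, labels
```

For instance, with the toy dataset above and `k = 3`, a sine wave of amplitude 1.1 is assigned to class 1 and a line of slope 0.9 to class 0.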

INRIA contract ALLOC 2399 — May 2007 to August 2010

This FP6 project is coordinated by the National Aerospace Laboratory (NLR) (The Netherlands). The academic partners are University of Cambridge and University of Leicester (United Kingdom), Politecnico di Milano and Università dell'Aquila (Italy), University of Twente (The Netherlands), ETH Zürich (Switzerland), University of Tartu (Estonia), National Technical University of Athens (NTUA) and Athens University of Economics and Business (Greece), Direction des Services de la Navigation Aérienne (DSNA), École Nationale de l'Aviation Civile (ENAC), Eurocontrol Experimental Center (EEC) and INRIA Bretagne–Atlantique (France), and the industrial partners are Honeywell (Czech Republic), Isdefe (Spain), Dedale (France), NATS En Route Ltd. (United Kingdom).

The objective of iFLY is to develop both an advanced airborne self separation design and a highly automated air traffic management (ATM) design for en–route traffic, which takes advantage of autonomous aircraft operation capabilities and which aims to manage a three to six times increase in current en–route traffic levels. The proposed research combines expertise in air transport human factors, safety and economics with analytical and Monte Carlo simulation methodologies. The contribution of ASPI to this project concerns the work package on accident risk assessment methods and their implementation using conditional Monte Carlo methods, especially for large scale stochastic hybrid systems: the design and study of variants suited to hybrid state spaces (resampling per mode, marginalization) are currently under investigation.

INRIA contract ALLOC 2857 — January 2007 to December 2009

This collaboration with Thalès Communications is supported by DGA (Délégation Générale à l'Armement) and is related to the supervision of the CIFRE thesis of Nordine El Baraka.

The overall objective is to study innovative algorithms for terrain–aided navigation, and to demonstrate these algorithms in four different situations involving different platforms, inertial navigation units, sensors and georeferenced databases. The thesis also considers the special use of image sensors (optical, infra–red, radar, sonar, etc.) for navigation tasks, based on correlation between the observed image sequence and a reference image available on–board in the database.

Marginalized particle filters and regularized particle filters have been implemented, and several proposals have been studied to adapt the sample size, such as KLD–sampling, which could be useful in the case of poor initial information, or if the platform flies over a poorly informative area. Besides particle methods, which are proposed as the basic navigation algorithm, simpler algorithms such as the extended Kalman filter (EKF) or the unscented Kalman filter (UKF) have also been investigated.
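As a reminder of how KLD–sampling adapts the sample size, the sketch below computes the number of particles needed so that, with probability 1 − δ, the Kullback–Leibler divergence between the sample–based histogram and the true discretised distribution stays below ε, given the number of histogram bins currently occupied by particles. It uses the usual closed–form approximation of the chi–square quantile; the function name and the default values of `epsilon`, `delta` and `n_min` are assumptions.

```python
import math
from statistics import NormalDist


def kld_sample_size(n_occupied_bins, epsilon=0.05, delta=0.01, n_min=10):
    """Sample size suggested by KLD-sampling for a given number of occupied bins."""
    if n_occupied_bins < 2:
        # The bound is undefined with a single bin: fall back to a minimum size.
        return n_min
    k = n_occupied_bins
    z = NormalDist().inv_cdf(1.0 - delta)  # upper quantile of the standard normal
    a = 2.0 / (9.0 * (k - 1))
    n = (k - 1) / (2.0 * epsilon) * (1.0 - a + math.sqrt(a) * z) ** 3
    return max(n_min, math.ceil(n))
```

The required sample size grows with the number of occupied bins: a widely spread particle cloud (poor initial information, poorly informative area) calls for more particles, and a concentrated cloud for fewer.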

INRIA contract ALLOC 2856 — January 2008 to December 2010

This ANR project is coordinated by Thalès Alenia Space. The academic partners are LAAS (laboratoire d'architecture et d'analyse des systèmes) and the TeSA consortium, including ENAC (école nationale de l'aviation civile). The industrial partners are Microtec and Silicom.

The overall objective is to study and demonstrate information fusion algorithms for localisation of pedestrian users in an indoor environment, where GPS cannot be used. The sought design combines

a pedestrian dead–reckoning (PDR) unit, providing noisy estimates of the linear displacement, angular turn, and possibly of the level change through an additional pressure sensor,

range and / or proximity measurements provided by beacons at fixed and known locations, and possibly indirect distance measurements to access points, through a measure of the signal power attenuation,

constraints provided by an indoor map of the building (map-matching),

collaborative localisation when two users meet and exchange their respective position estimates.

Besides particle methods, which are proposed as the basic information fusion algorithm for the centralized server–based implementation, simpler algorithms such as the extended Kalman filter (EKF) or the unscented Kalman filter (UKF) are investigated, to be used for the local PDA–based implementation with a map of a smaller part of the building. In both cases, constraints are handled with the help of a Voronoi graph. Adapting the sample size using KLD–sampling has also been investigated, which could be useful in the case of poor initial information, or if the user walks in a poorly informative area (open zone, absence of beacons). Preliminary investigations have been made during the internship of Pierre Blanchart (TELECOM Bretagne), and are now continued by Liyun He.

INRIA contract ALLOC 2229 — January 2007 to December 2009.

This ANR project is coordinated by the project–team TEMICS from IRISA / INRIA Bretagne Atlantique. The other partners are LIS–INPG in Grenoble and université de Nice.

There are mainly two strategic axes in NEBBIANO: watermarking and independent component analysis, and watermarking and rare event simulations. To protect copyright owners, user identifiers are embedded in purchased content such as music or movies. This is basically what we mean by watermarking. The watermark should be “invisible” to the standard user, and as difficult to find as possible. When content is found in an illegal place (e.g. a P2P network), the right holders decode the hidden message, find a serial number, and thus they can trace the traitor, i.e. the client who has illegally broadcast their copy. However, the task is not that simple, as dishonest users might collude. For security reasons, anti–collusion codes have to be employed. Yet, these solutions (also called weak traceability codes) have a non–zero probability of error, defined as the probability of accusing an innocent user. This probability should be, of course, extremely low, but it is also a very sensitive parameter: anti–collusion codes get longer (in terms of the number of bits to be hidden in content) as the probability of error decreases. Fingerprint designers have to strike a trade–off, which is hard to conceive when only a rough estimate of the probability of error is known. The major issue for fingerprinting algorithms is that embedding large sequences also implies assessing reliability on a huge amount of data, which may be practically unachievable without using rare event analysis. Our task within this project is to adapt our methods for estimating rare event probabilities to this framework, and to provide watermarking designers with much more accurate false detection probabilities than the bounds currently found in the literature. We have already applied these ideas to some randomized watermarking schemes and obtained much sharper estimates of the probability of accusing an innocent user.
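To give an idea of why dedicated rare event methods are needed, the following sketch estimates a small tail probability with an adaptive multilevel splitting scheme of the "last particle" type, on a toy problem chosen for illustration: the probability that a standard Gaussian variable exceeds a threshold. The toy target, the Metropolis move and all tuning parameters (`n_particles`, `n_moves`, `step`) are assumptions, and this is not the algorithm applied to the watermarking schemes of the project.

```python
import math
import random


def last_particle_estimate(threshold, n_particles=200, n_moves=10,
                           step=0.5, seed=0):
    """Estimate P(X > threshold) for X standard Gaussian by adaptive splitting."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    n_killed = 0
    while True:
        # The current level is the lowest value in the population.
        worst = min(range(n_particles), key=lambda i: xs[i])
        level = xs[worst]
        if level >= threshold:
            break
        # Selection: replace the lowest particle by a copy of a survivor.
        donor = rng.choice([i for i in range(n_particles) if i != worst])
        x = xs[donor]
        # Mutation: Metropolis moves targeting the Gaussian law restricted
        # to values above the current level.
        for _ in range(n_moves):
            proposal = x + step * rng.gauss(0.0, 1.0)
            accept_prob = math.exp(0.5 * (x * x - proposal * proposal))
            if proposal > level and rng.random() < accept_prob:
                x = proposal
        xs[worst] = x
        n_killed += 1
    # Each replacement multiplies the estimate by (1 - 1/N).
    return (1.0 - 1.0 / n_particles) ** n_killed
```

For a threshold of 4, the exact probability is about 3.2e-5. A naive Monte Carlo estimate would need millions of samples to observe even a few such events, whereas the splitting scheme reaches the threshold through a sequence of moderately rare intermediate levels.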

Numerical investigations have been made during the internship of Vincent Bahuon (EURIA, Brest). A patent *“Validation de schémas de verrous numériques en watermarking et fingerprinting”* has been submitted by INRIA and by université de Rennes 2.

INRIA contract ALLOC 2205 — December 2006 to November 2009.

This ANR project is coordinated by Alcatel–Lucent. The other partners are Alcatel Thales III–V Lab, INT Évry, INRIA Bretagne–Atlantique (project–teams ASPI and TEMICS), Kylia, Photline and XLIM (université de Limoges).

The project COHDEQ40 intends to demonstrate the potential of coherent detection associated with digital signal processing for the next generation high density 40Gb/s WDM systems optimized for transparency and flexibility. Key integrated optoelectronics components and specific algorithms will be developed and system evaluation performed. The INRIA task is to develop the signal processing algorithms needed to recover the message on the decoder side. This makes full use of our knowledge of equalization and synchronization techniques involved in digital communications.

A patent *“A decision directed algorithm for adjusting a polarization demultiplexer in a coherent detection optical receivers”* has been jointly submitted in September by Alcatel Lucent and by INRIA.

INRIA contract ALLOC 2801 — January 2008 to December 2010.

This ANR project is coordinated by Alcatel–Lucent. The other partners are E2V, TELECOM ParisTech, LIP (ENS Lyon).

The primary goal of the TCHATER project is to demonstrate a coherent terminal operating at 40Gb/s using *real–time* digital signal processing and efficient polarization division multiplexing. The terminal will benefit next-generation high information-spectral density optical networks, while offering straightforward compatibility with current 10Gbit/s networks. It will require that advanced high–speed electronic components, especially analog–to–digital converters, are designed within the project. Specific algorithms for polarisation demultiplexing and forward error correction with soft decoding will also have to be developed.

Arnaud Guyader is a co–organizer of the Séminaire de Statistiques of the statistics research team of IRMAR (institut de recherche mathématique de Rennes).

François Le Gland has co–organized a special session on adaptive Monte Carlo methods at the meeting of the MAS (modélisation aléatoire et statistique) thematic group of SMAI (société de mathématiques appliquées et industrielles), held in Rennes in September 2008.

François Le Gland was a member of the committee for the PhD thesis of Christophe Baehr (université Paul Sabatier, Toulouse, advisor: Pierre Del Moral).

Arnaud Guyader and François Le Gland are members of the “commission de spécialistes” in applied mathematics (section 26) of université de Rennes 2. Arnaud Guyader is a member of the “commission de spécialistes” in computer science (section 27) of université de Rennes 2. François Le Gland is a member of the “commission de spécialistes” in mathematics (sections 25–26) of INSA (institut national de sciences appliquées) Rennes.

François Le Gland gives a course on Kalman filtering and hidden Markov models, at université de Rennes 1, within the Master SISEA (signal, image, systèmes embarqués, automatique, école doctorale MATISSE), a 3rd year course on Bayesian filtering and particle approximation, at ENSTA (école nationale supérieure de techniques avancées), Paris, within the systems and control module, and a 3rd year course on hidden Markov models, at Télécom Bretagne, Brest.

Arnaud Guyader is a member of the committee of “oraux blancs d'agrégation de mathématiques” for ENS Cachan at Ker Lann.

In addition to the presentations published in proceedings, which are listed at the end of the document, members of ASPI have also given the following presentations.

Frédéric Cérou has given a talk on rare event simulation for a static distribution at the 7th international workshop on Rare Event Simulation (RESIM'08), held in Rennes in September 2008.

Arnaud Guyader has given a talk on the rate of convergence for nearest neighbor classification at the joint meeting of the Statistical Society of Canada and the Société Française de Statistique held in Ottawa in May 2008. He has also given a talk on rare event simulation for a static distribution at the meeting of the MAS (modélisation aléatoire et statistique) thematic group of SMAI (société de mathématiques appliquées et industrielles), held in Rennes in September 2008.

François Le Gland has given several talks on the multilevel splitting approach to rare event simulation at the applied mathematics seminar, université Blaise Pascal, Clermont–Ferrand in April 2008, at the 7th international workshop on Rare Event Simulation (RESIM'08), held in Rennes in September 2008, and at the LSTA (laboratoire de statistique théorique et appliquée) seminar, université Pierre et Marie Curie in November 2008. He has given a talk on sequential data assimilation at the meeting of the GDR Turbulence et Mélange, held at CEMAGREF in Rennes in January 2008. He has given several talks on the large sample asymptotics of the ensemble Kalman filter at the workshop on Ensemble Methods in Meteorology and Oceanography, held at IPSL (institut Pierre–Simon Laplace) in Paris in May 2008, at the 3rd meeting on Meteorology and Applied Mathematics, held at Météo–France in Toulouse in September 2008, and at the STAR (statistique rennaise) meeting, held at ENSAI (école nationale de la statistique et de l'analyse de l'information) in Rennes in December 2008.

Vu Duc Tran has given a talk on ensemble Kalman filter vs. particle filters at the LSTA (laboratoire de statistique théorique et appliquée) student seminar, université Pierre et Marie Curie in March 2008.