## Section: Scientific Foundations

### Statistics of HMM

Hidden Markov models (HMM) form a special case of partially observed stochastic dynamical systems, in which the state of a Markov process (in discrete or continuous time, with finite or continuous state space) should be estimated from noisy observations. The conditional probability distribution of the hidden state given past observations is a well–known example of a normalized (nonlinear) Feynman–Kac distribution, see 3.1. These models are very flexible, because the introduction of latent (unobserved) variables makes it possible to model complex time-dependent structures, to take constraints into account, etc. In addition, the underlying Markovian structure makes it possible to use numerical algorithms (particle filtering, Markov chain Monte Carlo (MCMC) methods, etc.) which are computationally intensive but whose complexity remains moderate. Hidden Markov models are widely used in various applied areas, such as speech recognition, alignment of biological sequences, tracking in complex environments, modeling and control of networks, digital communications, etc.
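As an illustration of the recursive state estimation just mentioned, here is a minimal sketch of a bootstrap particle filter on a hypothetical scalar model with additive Gaussian noise (all model parameters and numerical values below are illustrative, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar state-space model (illustrative parameters):
#   x_t = 0.8 x_{t-1} + state noise,   y_t = x_t + observation noise.
T, N = 50, 1000                       # time horizon, number of particles
x = np.zeros(T); y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.8 * x[t - 1] + rng.normal(scale=0.5)
    y[t] = x[t] + rng.normal(scale=0.3)

# Bootstrap particle filter: propagate, weight by the likelihood, resample.
particles = rng.normal(size=N)
est = np.zeros(T)
for t in range(1, T):
    particles = 0.8 * particles + rng.normal(scale=0.5, size=N)  # propagate
    w = np.exp(-0.5 * ((y[t] - particles) / 0.3) ** 2)           # weight
    w /= w.sum()
    est[t] = float(w @ particles)                                # filter mean
    particles = rng.choice(particles, size=N, p=w)               # resample

print(float(np.mean((est - x) ** 2)))  # filtering mean-square error
```

The cost per time step is linear in the number of particles, which is what makes such computationally intensive methods tractable in practice.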

Beyond the recursive estimation of a hidden state from noisy observations, the problem arises of statistical inference for HMMs with general state space [38], including estimation of model parameters, early monitoring and diagnosis of small changes in model parameters, etc.

**Large time asymptotics** A fruitful approach is the asymptotic study, when the observation
time increases to infinity, of an extended Markov chain, whose
state includes (i) the hidden state, (ii) the observation,
(iii) the prediction filter (i.e. the conditional probability
distribution of the hidden state given observations at all previous
time instants), and possibly (iv) the derivative of the prediction
filter with respect to the parameter.
Indeed, it is easy to express the log–likelihood function,
the conditional least–squares criterion, and many other classical
contrast processes, as well as their derivatives with respect to
the parameter, as additive functionals of the extended Markov chain.
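For a finite state space, the log-likelihood indeed accumulates as an additive functional along the prediction filter recursion. A minimal sketch (the transition and emission matrices below are illustrative, not from the text):

```python
import numpy as np

# Hypothetical 2-state HMM (illustrative matrices).
Q = np.array([[0.9, 0.1], [0.2, 0.8]])   # transition kernel
B = np.array([[0.7, 0.3], [0.4, 0.6]])   # emission probabilities B[state, obs]
obs = [0, 1, 1, 0, 0, 1]

# Prediction filter p_t = P(X_t = . | Y_1, ..., Y_{t-1}); the log-likelihood
# is the sum of the increments log P(Y_t | Y_1, ..., Y_{t-1}).
p = np.array([0.5, 0.5])                 # initial distribution
loglik = 0.0
for y in obs:
    c = float(p @ B[:, y])               # P(Y_t = y | past observations)
    loglik += np.log(c)                  # additive increment
    post = p * B[:, y] / c               # conditional filter (update step)
    p = post @ Q                         # one-step prediction
print(loglik)
```

The pair (prediction filter, observation) is exactly the extended Markov chain described above, and the log-likelihood is an additive functional of it.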

The following general approach has been proposed:

- first, prove an exponential stability property (i.e. an exponential forgetting property of the initial condition) of the prediction filter and of its derivative, for a misspecified model;
- from this, deduce a geometric ergodicity property and the existence of a unique invariant probability distribution for the extended Markov chain, hence a law of large numbers and a central limit theorem for a large class of contrast processes and their derivatives, and a local asymptotic normality property;
- finally, obtain the consistency (i.e. the convergence to the set of minima of the associated contrast function) and the asymptotic normality of a large class of minimum contrast estimators.

This programme has been completed in the case of a finite state space [7], and has been generalized [43] under a uniform minoration assumption for the Markov transition kernel, which typically holds only when the state space is compact. Clearly, the whole approach relies on the existence of an exponential stability property of the prediction filter, and the main challenge currently is to get rid of this uniform minoration assumption for the Markov transition kernel [41], [64], so as to be able to consider more interesting situations, where the state space is noncompact.
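The consistency step of this programme can be illustrated numerically: on a simulated finite-state HMM, the maximum likelihood estimator (a minimum contrast estimator for the log-likelihood contrast) concentrates near the true parameter as the observation time grows. A small sketch, with an illustrative 2-state model parameterized by a switching probability `q` (all values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 2-state HMM parameterized by the switching probability q.
q_true, T = 0.2, 2000
B = np.array([[0.8, 0.2], [0.2, 0.8]])   # emission probabilities B[state, obs]

# Simulate the hidden chain and the observations.
s = 0
obs = np.empty(T, dtype=int)
for t in range(T):
    if rng.random() < q_true:
        s = 1 - s                        # switch state with probability q
    obs[t] = int(rng.random() >= B[s, 0])

def loglik(q):
    """Log-likelihood via the prediction filter recursion."""
    Q = np.array([[1 - q, q], [q, 1 - q]])
    p = np.array([0.5, 0.5])
    ll = 0.0
    for y in obs:
        c = float(p @ B[:, y])
        ll += np.log(c)
        p = (p * B[:, y] / c) @ Q
    return ll

# Maximum likelihood (minimum contrast) estimate over a parameter grid.
grid = np.linspace(0.05, 0.5, 46)
q_hat = grid[int(np.argmax([loglik(q) for q in grid]))]
print(q_hat)
```

With 2000 observations the estimate lands close to the true switching probability, in line with the law of large numbers for the contrast process.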

**Small noise asymptotics** Another asymptotic approach can also be used, in which it is rather easy
to obtain interesting explicit results, in terms close to the language
of nonlinear deterministic control theory [58].
Taking the simple example where the hidden state is the solution of
an ordinary differential equation, or a nonlinear state model, and
where the observations are subject to additive Gaussian white noise,
this approach consists in letting the covariance matrices
of the state noise and of the observation noise go simultaneously
to zero. While it is reasonable in many applications to consider that
the noise covariances are small, this asymptotic approach is less natural
than the large time asymptotics, where it is enough (provided a
suitable ergodicity assumption holds) to accumulate observations
in order to obtain the expected limit laws (law of large numbers, central
limit theorem, etc.). By contrast, the expressions obtained in the
limit (Kullback–Leibler divergence, Fisher information matrix, asymptotic
covariance matrix, etc.) take here a much more explicit form than in the
large time asymptotics.
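The small noise regime can be illustrated on the simplest case mentioned above, a noise-free ordinary differential equation observed in additive Gaussian noise of size `eps`: the estimation error of the (least-squares, here maximum likelihood) estimator shrinks with the noise level. A minimal sketch with a hypothetical scalar model (all values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# Noise-free state equation dx/dt = -theta x, x(0) = 1, so x(t) = exp(-theta t);
# the observations are corrupted by additive Gaussian noise of size eps.
theta_true = 1.0
t = np.linspace(0.0, 2.0, 50)
e = rng.normal(size=t.size)          # fixed noise pattern, scaled by eps

def mle(eps):
    """Least-squares (= maximum likelihood) estimate over a fine grid."""
    y = np.exp(-theta_true * t) + eps * e
    grid = np.linspace(0.5, 1.5, 1001)
    errs = [float(np.sum((y - np.exp(-g * t)) ** 2)) for g in grid]
    return grid[int(np.argmin(errs))]

# As the observation noise goes to zero, the estimation error shrinks.
for eps in (0.1, 0.01):
    print(eps, abs(mle(eps) - theta_true))
```

Here the asymptotics is driven by the noise level rather than by the observation time, which is what makes the limiting quantities (Fisher information, asymptotic covariance) so explicit.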

The following results have been obtained using this approach:

- the consistency of the maximum likelihood estimator (i.e. its convergence to the set $M$ of global minima of the Kullback–Leibler divergence) has been obtained using large deviations techniques, with an analytical approach [55];
- if the above-mentioned set $M$ does not reduce to the true parameter value, i.e. if the model is not identifiable, it is still possible to describe precisely the asymptotic behavior of the estimators [56]: in the simple case where the state equation is a noise-free ordinary differential equation, and using a Bayesian framework, it has been shown that (i) if the rank $r$ of the Fisher information matrix $I$ is constant in a neighborhood of the set $M$, then this set is a differentiable submanifold of codimension $r$; (ii) the posterior probability distribution of the parameter converges in the limit to a random probability distribution supported by the manifold $M$, absolutely continuous w.r.t. the Lebesgue measure on $M$, with an explicit expression for the density; and (iii) the posterior probability distribution of the suitably normalized difference between the parameter and its projection on the manifold $M$ converges to a mixture of Gaussian probability distributions on the normal spaces to the manifold $M$, which generalizes the usual asymptotic normality property;
- it has been shown [65] that (i) the parameter-dependent probability distributions of the observations are locally asymptotically normal (LAN) [61], from which the asymptotic normality of the maximum likelihood estimator follows, with an explicit expression for the asymptotic covariance matrix, i.e. for the Fisher information matrix $I$, in terms of the Kalman filter associated with the tangent linear Gaussian model; and (ii) the score function (i.e. the derivative of the log-likelihood function w.r.t. the parameter), evaluated at the true value of the parameter and suitably normalized, converges to a Gaussian r.v. with zero mean and covariance matrix $I$.
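The role of the Kalman filter in the last item can be sketched concretely: for a linear Gaussian model, the Kalman filter yields the log-likelihood in innovations form, and the score is its derivative w.r.t. the parameter (approximated below by a finite difference). All model parameters are illustrative, not from the text:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical scalar linear Gaussian model (illustrative parameters):
#   x_t = a x_{t-1} + w_t,  y_t = x_t + v_t,  w ~ N(0, q), v ~ N(0, r).
a_true, q, r, T = 0.7, 0.1, 0.2, 500
x = np.zeros(T); y = np.zeros(T)
for t in range(1, T):
    x[t] = a_true * x[t - 1] + rng.normal(scale=np.sqrt(q))
    y[t] = x[t] + rng.normal(scale=np.sqrt(r))

def kalman_loglik(a):
    """Innovations-form log-likelihood computed by the Kalman filter."""
    m, P, ll = 0.0, 1.0, 0.0
    for t in range(1, T):
        m, P = a * m, a * a * P + q                # prediction step
        S = P + r                                  # innovation variance
        ll += -0.5 * (np.log(2 * np.pi * S) + (y[t] - m) ** 2 / S)
        K = P / S                                  # Kalman gain
        m, P = m + K * (y[t] - m), (1 - K) * P     # update step
    return ll

# Score at the true parameter, by central finite difference; suitably
# normalized, it is asymptotically Gaussian per the LAN property.
h = 1e-4
score = (kalman_loglik(a_true + h) - kalman_loglik(a_true - h)) / (2 * h)
print(kalman_loglik(a_true), score)
```

In the small noise setting, the analogous computation on the tangent linear Gaussian model gives an explicit expression for the Fisher information matrix $I$.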