## New Results
### Markov models
#### Spatial risk mapping for rare disease with hidden Markov fields and variational EM

Participants: Florence Forbes, Senan James Doyle.

**Joint work with:** Lamiae Azizi, David Abrial and Myriam
Garrido from INRA Clermont-Ferrand-Theix.

Current risk mapping models for pooled data focus on
the estimated risk for each geographical unit. A risk
classification, *i.e.* grouping of geographical units with
similar risk, is then necessary to easily draw interpretable maps,
with clearly delimited zones in which protection measures can be
applied. As an illustration, we focus on Bovine Spongiform
Encephalopathy (BSE), a disease that threatened bovine production
in Europe and led to drastic cow culling. This example features
typical animal disease risk analysis issues with very low risk
values, small numbers of observed cases and population sizes that
increase the difficulty of an automatic classification. We propose
to handle this task in a spatial clustering framework using a
non-standard discrete hidden Markov model prior designed to favor a
smooth risk variation. The model parameters are estimated using an
EM algorithm and a mean field approximation for which we develop a
new initialization strategy appropriate for spatial Poisson
mixtures. Using both simulated and our BSE data, we show that our
strategy performs well in dealing with low population sizes and
accurately determines high risk regions, both in terms of
localization and risk level estimation.

Main corresponding paper [14].
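
The estimation scheme described above can be illustrated with a minimal sketch. It is not the model of [14]: it assumes a standard Potts prior with a fixed interaction strength `beta` in place of the non-standard prior, and it takes the initial risks as an argument instead of using the proposed initialization strategy. All function names, parameters and data values are hypothetical.

```python
import math

def mean_field_em(cases, pops, neighbors, init_risks, beta=0.5, n_iter=50):
    """Mean-field EM for a K-class spatial Poisson mixture (illustrative).

    cases[i], pops[i]: observed cases and (positive) population size in unit i.
    neighbors[i]: indices of the units adjacent to unit i.
    init_risks: strictly positive initial risk per class.
    beta: fixed Potts interaction strength (an assumption, not estimated).
    """
    n, n_classes = len(cases), len(init_risks)
    risks = list(init_risks)
    # q[i][k] approximates the posterior probability that unit i is in class k.
    q = [[1.0 / n_classes] * n_classes for _ in range(n)]
    for _ in range(n_iter):
        # E-step: sweep over units, each update uses the neighbors' current q.
        for i in range(n):
            log_w = []
            for k in range(n_classes):
                mean = pops[i] * risks[k]
                log_lik = cases[i] * math.log(mean) - mean  # Poisson, up to a constant
                log_w.append(log_lik + beta * sum(q[j][k] for j in neighbors[i]))
            top = max(log_w)  # subtract the max for numerical stability
            w = [math.exp(v - top) for v in log_w]
            total = sum(w)
            q[i] = [v / total for v in w]
        # M-step: weighted cases over weighted population for each class.
        for k in range(n_classes):
            weighted_cases = sum(q[i][k] * cases[i] for i in range(n))
            weighted_pop = sum(q[i][k] * pops[i] for i in range(n))
            if weighted_pop > 0:
                risks[k] = max(weighted_cases / weighted_pop, 1e-12)
    labels = [max(range(n_classes), key=lambda k: q[i][k]) for i in range(n)]
    return risks, labels

# Toy data: six units on a line, low risk on the left, higher risk on the right.
cases = [1, 0, 2, 20, 18, 25]
pops = [1000] * 6
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
risks, labels = mean_field_em(cases, pops, neighbors, init_risks=[0.001, 0.01])
print(risks, labels)
```

With such small counts the result depends strongly on `init_risks`, which is the sensitivity that motivates a dedicated initialization strategy.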

#### Spatial modelling of biodiversity from high-throughput DNA sequence data

Participants: Florence Forbes, Angelika Studeny.

**Joint work with:** Eric Coissac and Pierre Taberlet from LECA (Laboratoire d'Ecologie Alpine) and Alain Viari from EPI Bamboo.

Biodiversity has been acknowledged as a vital resource for ecosystem health and stability, and it faces an unprecedented global decline. To be effective, conservation actions need to be based on reliable and fast analysis. Recent advances in DNA sequencing methods now enable DNA-based identification of multiple species from only a few, potentially degraded, environmental samples (metabarcoding.org, [74]). This offers a new way of assessing biodiversity and is of particular interest where large-scale individual-based diversity assessment is difficult, for example in tropical environments. Because of their comparatively low cost and effort, these methods are characterized by their high throughput; they are expected to produce vast amounts of data as they gain in popularity over the coming years. The specific properties of these data (e.g. bias from sequencing errors, notion of species) and their high dimensionality pose new statistical and computational challenges for biodiversity assessment. This project aims at extending existing summary statistics to data from metabarcoding surveys and, where this is not adequate, at developing new methodology. A special focus is on the spatial mapping of biodiversity and the co-occurrence of species. As a first step, we investigate spatial clustering algorithms based on Markov random fields (software SpaCEM3, http://spacem3.gforge.inria.fr/) to identify regions of high species occurrence, as well as structured additive regression models and their implementation to estimate cross-correlations between species occurrences in space [61], [72], [71]. At present, results have been derived in the form of species occurrence maps, which take into account pairwise cross-correlation, and interaction graphs.

#### Statistical characterization of tree structures based on Markov tree models and multitype branching processes, with applications to tree growth modelling

Participant: Jean-Baptiste Durand.

**Joint work with:** Pierre Fernique (Montpellier 2 University
and CIRAD) and Yann Guédon (CIRAD), Inria Virtual Plants.

The quantity and quality of yields in fruit trees are closely related
to processes of growth and branching, which ultimately determine the
regularity of flowering and the position of flowers. Flowering and
fruiting patterns are explained by statistical dependence between
the nature of a parent shoot (*e.g.* flowering or not) and the
quantity and nature of its child shoots, with potential
effects of covariates. Thus, better characterization of patterns and
dependencies is expected to lead to strategies to control the
demographic properties of the shoots (through varietal selection or crop
management policies), and thus to bring substantial improvements in
the quantity and quality of yields.

Since the connections between shoots can be represented by mathematical trees, statistical models based on multitype branching processes and Markov trees appear as a natural tool to model the dependencies of interest. Formally, the properties of a vertex are summed up using the notion of vertex state. In such models, the numbers of children in each state given the parent state are modeled through discrete multivariate distributions. Model selection procedures are necessary to specify parsimonious distributions. We developed an approach based on probabilistic graphical models to identify and exploit properties of conditional independence between numbers of children in different states, so as to simplify the specification of their joint distribution. The graph building stage was based on exploring the space of possible chain graph models, which required defining a notion of neighbourhood of these graphs. A parametric distribution was associated with each graph. It was obtained by combining families of univariate and multivariate distributions or regression models. These were chosen by model selection procedures among different parametric families.
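
As an illustration of the generation mechanism, the following minimal sketch simulates a multitype branching process under the simplest possible dependence structure: given the parent state, the numbers of children in each state are drawn as independent Poisson variables. This independence, the two states `"F"` (flowering) and `"V"` (vegetative), the mean values and all function names are assumptions made for the example. The graphical-model approach described above exists precisely to relax or justify such independence hypotheses.

```python
import math
import random

def sample_poisson(rng, mean):
    """Draw a Poisson variate with Knuth's multiplication method."""
    limit, count, prod = math.exp(-mean), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return count
        count += 1

def simulate_tree(root_state, mean_children, depth, seed=0):
    """Simulate a multitype branching process for `depth` generations.

    mean_children[parent][child] is the expected number of children in
    state `child` for a parent in state `parent` (hypothetical values).
    Returns a list of (parent_index, state); the root has parent_index None.
    """
    rng = random.Random(seed)
    vertices = [(None, root_state)]
    frontier = [0]  # indices of the vertices of the current generation
    for _generation in range(depth):
        next_frontier = []
        for parent in frontier:
            parent_state = vertices[parent][1]
            # Children counts are independent given the parent state.
            for child_state, mean in mean_children[parent_state].items():
                for _child in range(sample_poisson(rng, mean)):
                    vertices.append((parent, child_state))
                    next_frontier.append(len(vertices) - 1)
        frontier = next_frontier
    return vertices

# Hypothetical means: flowering parents mostly produce vegetative children.
mean_children = {
    "F": {"F": 0.3, "V": 1.2},
    "V": {"F": 0.9, "V": 0.8},
}
tree = simulate_tree("V", mean_children, depth=4, seed=1)
print(len(tree), "vertices")
```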

This work was carried out in the context of Pierre Fernique's first year of PhD (Montpellier 2 University and CIRAD). It was applied to model dependencies between short or long, vegetative or flowering shoots in apple trees. The results highlighted contrasting patterns related to the parent shoot state, with an interpretation in terms of alternation of flowering (see paragraph 6.3.4). It was also applied to the analysis of the connections between cyclic growth and flowering of mango trees. This work will be continued during Pierre Fernique's PhD thesis, with extensions to other fruit tree species and other parametric discrete multivariate families of distributions, including covariates and mixed effects.

#### Statistical characterization of the alternation of flowering in fruit tree species

Participant: Jean-Baptiste Durand.

**Joint work with:** Jean Peyhardi and Yann Guédon (Mixed
Research Unit DAP, Virtual Plants team), Baptiste Guitton, Yan
Holtz and Evelyne Costes (DAP, AFEF team), Catherine Trottier
(Montpellier University).

The aim of this work was to characterize genetic determinisms of the alternation of flowering in apple tree progenies. Data were collected at two scales: at whole tree scale (with annual time step) and at a local scale (the annual shoot or AS, which is the portion of stem grown during a given year). Two replications of each genotype were available.

Indices were proposed to characterize alternation at tree scale. The difficulty is related to early detection of alternating genotypes, in a context where alternation is often concealed by a substantial increase of the number of flowers over consecutive years. To separate correctly the increase of the number of flowers due to aging of young trees from alternation in flowering, our model relied on a parametric hypothesis for the trend (fixed slopes specific to genotype and random slopes specific to replications), which translated into mixed effect modelling. Then, different indices of alternation were computed on the residuals. Clusters of individuals with contrasted patterns of bearing habits were identified.
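
The principle of computing an alternation index on detrended data can be sketched as follows. This is a deliberately simplified illustration: the trend is fitted by ordinary least squares on a single tree rather than by the mixed-effect model described above, and the lag-1 autocorrelation of the residuals is used as one possible index, which is an assumption made here and not necessarily one of the indices that were proposed. The flower counts are made up.

```python
def linear_residuals(counts):
    """Residuals of an ordinary least-squares line fitted against the year index."""
    n = len(counts)
    if n < 2:
        raise ValueError("at least two years are needed to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = sum(counts) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(counts))
    slope = sxy / sxx
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(counts)]

def lag1_autocorrelation(residuals):
    """Lag-1 autocorrelation; values near -1 suggest year-to-year alternation."""
    denom = sum(r * r for r in residuals)
    if denom == 0:
        return 0.0
    return sum(a * b for a, b in zip(residuals, residuals[1:])) / denom

# Hypothetical young tree: flower counts increase overall but alternate.
flowers = [10, 5, 30, 25, 50, 45]
index = lag1_autocorrelation(linear_residuals(flowers))
print(index)  # negative: alternation remains once the trend is removed
```

Computed on the raw counts, the same index would be dominated by the increasing trend, which is why detrending comes first.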

To model alternation of flowering at AS scale, a second-order Markov tree model was built. Its transition probabilities were modelled as generalized linear mixed models, to incorporate the effects of genotypes, year and memory of flowering for the Markovian part, with interactions between these components.
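
A transition probability of this kind can be sketched with a logistic link, assuming a binary state (flowering or not). The coefficient names and values below are hypothetical, and the genotype and year effects are passed in as plain numbers, whereas in a generalized linear mixed model some of them would be estimated as random effects.

```python
import math

def flowering_probability(prev1, prev2, coef, genotype_effect=0.0, year_effect=0.0):
    """P(flowering) given the two previous flowering states (0 or 1).

    The linear predictor combines a second-order memory term (with an
    interaction) and additive genotype and year effects, then goes
    through the inverse logit. All coefficients are illustrative.
    """
    eta = (
        coef["intercept"]
        + coef["prev1"] * prev1
        + coef["prev2"] * prev2
        + coef["prev1:prev2"] * prev1 * prev2
        + genotype_effect
        + year_effect
    )
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients: flowering at the previous step lowers the
# probability of flowering now, which produces alternation.
coef = {"intercept": 1.0, "prev1": -2.5, "prev2": 0.5, "prev1:prev2": 0.0}
p_after_flowering = flowering_probability(1, 0, coef)
p_after_vegetative = flowering_probability(0, 1, coef)
print(p_after_flowering, p_after_vegetative)
```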

Asynchronism of flowering at AS scale was assessed using an entropy-based criterion. The entropy allowed for a characterisation of the roles of local alternation and asynchronism in regularity of flowering at tree scale.
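
The criterion itself is not detailed here. As one possible reading, the sketch below computes, for each year, the binary Shannon entropy of the proportion of flowering ASs within a tree and averages it over years. The function names and the averaging are assumptions made for illustration. Under this reading, a tree whose ASs all flower or all rest in the same year has zero entropy, while a tree with half of its ASs flowering every year reaches the maximum of one bit.

```python
import math

def binary_entropy(p):
    """Shannon entropy (in bits) of a Bernoulli distribution with parameter p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def mean_flowering_entropy(flowering_by_year):
    """Average over years of the entropy of the flowering proportion.

    flowering_by_year: one list per year, with 1 for a flowering AS and
    0 for a vegetative AS (hypothetical data layout).
    """
    entropies = [
        binary_entropy(sum(year) / len(year)) for year in flowering_by_year if year
    ]
    return sum(entropies) / len(entropies) if entropies else 0.0

synchronous = [[1, 1, 1, 1], [0, 0, 0, 0]]   # all ASs alternate together
asynchronous = [[1, 0, 1, 0], [0, 1, 0, 1]]  # ASs alternate out of phase
print(mean_flowering_entropy(synchronous), mean_flowering_entropy(asynchronous))
```

In the second toy tree each AS alternates, yet the tree flowers regularly: this is the kind of interplay between local alternation and asynchronism that the criterion is meant to separate.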

Moreover, our models highlighted significant correlations between indices of alternation at AS and individual scales.

This work was extended by the Master 2 internship of Yan Holtz, supervised by Evelyne Costes and Jean-Baptiste Durand. New progenies were considered, and a methodology based on a lighter measurement protocol was developed and assessed. It consisted in assessing the accuracy of approximating the indices computed from measurements at tree scale by the same indices computed at AS scale. The approximations were shown to be sufficiently accurate to provide an operational strategy for apple tree selection.

As a perspective of this work, patterns in the production of child ASs (numbers of flowering and vegetative children) depending on the type of the parent AS must be analyzed using branching processes and different types of Markov trees, in the context of Pierre Fernique's PhD thesis (see paragraph 6.3.3).