## Section: New Results

### Graphical and Markov models

#### Fast Bayesian network structure learning using quasi-determinism screening

Participants : Thibaud Rahier, Stéphane Girard, Florence Forbes.

**Joint work with:**
Sylvain Marié, Schneider Electric.

Learning the structure of Bayesian networks from data is a NP-Hard problem that involves an optimization task on a super-exponential sized space. In this work, we show that in most real life datasets, a number of the arcs contained in the final structure can be prescreened at low computational cost with a limited impact on the global graph score. We formalize the identification of these arcs via the notion of quasi-determinism, and propose an associated algorithm that reduces the structure learning to a subset of the original variables. We show, on diverse benchmark datasets, that this algorithm exhibits a significant decrease in computational time and complexity for only a little decrease in performance score.

#### Robust graph estimation

Participants : Karina Ashurbekova, Florence Forbes.

**Joint work with:**
Sophie Achard, CNRS, Gipsa-lab.

Graphs are an intuitive way of representing and visualizing the relationships between many variables. A graphical model is a probabilistic model whose conditional independence or other measures of relationship between random variables is given by a graph. Learning graphical models using their observed samples is an important task, and involves both structure and parameter estimation. Generally, graph estimation consists of several steps. First of all, we do not know the distribution of the real data. But we can do an assumption about this distribution. Then the measure of relationship between variables we are interested in can be chosen based on this assumption. All these measures of relationship are related with elements of the covariance or precision matrices. After estimating the covariance/precision matrix the

final graph can be constructed based on elements of this matrix. A lot of graph estimation methods rely on the Gaussian graphical model, in which the random vector Y is assumed to be Gaussian. Under this assumption, the most popular method is the graphical lasso (glasso). In practice, data may deviate from the Gaussian model in various ways. Outliers and heavy tails frequently occur. Contamination of a handful of variables in a few experiments can lead to a drastically wrong graph. So one of our objective is to deal with heavy tailed data using a new family of multivariate heavy-tailed distributions [8] and infer a graph robust to outliers without having to remove them.

#### Spatial mixtures of multiple scaled $t$-distributions

Participants : Florence Forbes, Alexis Arnaud.

**Joint work with:**
Steven Quinito Masnada, Inria Grenoble Rhone-Alpes

The goal is to implement an hidden Markov model version of our recently introduced mixtures of non standard multiple scaled $t$-distributions. The motivation for doing that is the application to multiparametric MRI data for lesion analysis. When dealing with MRI human data, spatial information is of primary importance. For our preliminary study on rat data [15], the results without spatial information were already quite smooth. The main anatomical structures can be identified. We suspect the reason is that the measured parameters already contain a lot of information about the underlying tissues. However, introducing spatial information is always useful and is our ongoing work. In the statistical framework we have developed (mixture models and EM algorithm), it is conceptually straightforward to introduce an additional Markov random field. In addition, when using a Markov random field it is easy to incorporate additional atlas information.

#### Spectral CT reconstruction with an explicit photon-counting detector model: a "one-step" approach

Participants : Florence Forbes, Pierre-Antoine Rodesch.

**Joint work with:**
Veronique Rebuffel and Clarisse Fournier from CEA-LETI Grenoble.

In the context of Pierre-Antoine Rodesh's PhD thesis, we investigate new statistical and optimization methods for tomographic reconstruction from non standard detectors providing multiple energy signals. Recent developments in energy-discriminating Photon-Counting Detector (PCD) enable new horizons for spectral CT. With PCDs, new reconstruction methods take advantage of the spectral information measured through energy measurement bins. However PCDs have serious spectral distortion issues due to charge-sharing, fluorescence escape, pileup effect Spectral CT with PCDs can be decomposed into two problems: a noisy geometric inversion problem (as in standard CT) and an additional PCD spectral degradation problem. The aim of this study is to introduce a reconstruction method which solves both problems simultaneously: a one-step approach. An explicit linear detector model is used and characterized by a Detector Response Matrix (DRM). The algorithm reconstructs two basis material maps from energy-window transmission data. The results prove that the simultaneous inversion of both problems is well performed for simulation data. For comparison, we also perform a standard two-step approach: an advanced polynomial decomposition of measured sinograms combined with a filtered-back projection reconstruction. The results demonstrate the potential uses of this method for medical imaging or for non-destructive control in industry. Preliminary results will be presented at the SPIE medical imaging 2018 conference in Houston, USA [44].

#### Non parametric Bayesian priors for hidden Markov random fields

Participants : Florence Forbes, Julyan Arbel, Hongliang Lu.

Hidden Markov random field (HMRF) models are widely used for image segmentation or more generally for clustering data under spatial constraints. They can be seen as spatial extensions of independent mixture models. As for standard mixtures, one concern is the automatic selection of the proper number of components in the mixture, or equivalently the number of states in the hidden Markov field. A number of criteria exist to select this number automatically based on penalized likelihood (eg. AIC, BIC, ICL etc.) but they usually require to run several models for different number of classes to choose the best one. Other techniques (eg. reversible jump) use a fully Bayesian setting including a prior on the class number but at the cost of prohibitive computational times. In this work, we investigate alternatives based on the more recent field of Bayesian nonparametrics. In particular, Dirichlet process mixture models (DPMM) have emerged as promising candidates for clustering applications where the number of clusters is unknown. Most applications of DPMM involve observations which are supposed to be independent. For more complex tasks such as unsupervised image segmentation with spatial relationships or dependencies between the observations, DPMM are not satisfying.

#### Automated ischemic stroke lesion MRI quantification

Participant : Florence Forbes.

**Joint work with:**
Senan Doyle (Pixyl), Assia Jaillard (CHUGA) , Olivier Heck (CHUGA) , Olivier Detante (CHUGA) and
Michel Dojat (GIN)

Manual delineation by an expert is currently the gold standard for lesion quantification, but is resource-intensive, suffers from inter-rater and intra-rater variability, and does not scale well to large population cohorts. We develop an automated lesion quantification method to assess the efficacy of cell therapy in patients after ischemic stroke. A high-quality sub-acute and chronic stroke dataset was supplied by HERMES. T1-w and 3D-Flair MRIs were acquired from 20 ischemic stroke patients with MCA infarct at 2 and 6 months post-event. Manual delineation was performed by an expert using the Flair image. We propose an unsupervised method employing a hidden Markov random field, with innovations to address the challenges posed by stroke MR scans. We introduce a probabilistic vascular territory atlas, adapted to the patient-specific data in a joint segmentation and registration framework, to model the potential progression and delimitation of vascular accidents. After segmentation, a good correlation is observed between manual and automated lesion volume delineation for the two time points. We therefore propose an unsupervised method with the hypothesis that such a class of methods is more robust to the diversity of images obtained with different sequence parameters and scanners; a particularly sensitive point for multi-center studies. Interestingly, this approach will be used in the European RESSTORE cohort.

#### PyHRF: A python library for the analysis of fMRI data based on local estimation of hemodynamic response function

Participants : Florence Forbes, Jaime Eduardo Arias Almeida, Aina Frau Pascual.

**Joint work with:**
Michel Dojat and Jan Warnking from Grenoble Institute of Neuroscience.

Functional Magnetic Resonance Imaging (fMRI) is a neuroimaging technique that allows the non-invasive study of brain function. It is based on the hemodynamic changes induced by cerebral activity following sensory or cognitive stimulation. The measured signal depends on the variation of blood oxygenation level (BOLD signal) which is related to brain activity: a decrease in deoxyhemoglobin induces an increase in BOLD signal. In fact, the signal is convoluted by the Hemodynamic Response Function (HRF) whose exact form is unknown and fluctuates with various parameters such as age, brain region or physiological conditions. In this work we focused on PyHRF, a software to analyze fMRI data using a joint detection-estimation (JDE) approach. It jointly detects cortical activation and estimates the HRF. In contrast to existing tools, PyHRF estimates the HRF instead of considering it as constant in the entire brain, improving thus the reliability of the results. We investigated a number of real data case to demonstrate that PyHRF was a suitable tool for clinical applications. This implied the definition of guidelines to set some of the parameters required to run the software. We investigated a calibration method by comparing results with the standard SPM software in the case of a fixed HRF. An overview of the package and its performance was presented at the 16th Python in Science Conference (SciPy 2017) in Austin, TX, United States [38].

#### Hidden Markov models for the analysis of eye movements

Participants : Jean-Baptiste Durand, Brice Olivier.

This research theme is supported by a LabEx PERSYVAL-Lab project-team grant.

**Joint work with:**
Anne Guérin-Dugué (GIPSA-lab)
and Benoit Lemaire (Laboratoire de Psychologie et Neurocognition)

In the last years, GIPSA-lab has developed computational models of information search in web-like materials, using data from both eye-tracking and electroencephalograms (EEGs). These data were obtained from experiments, in which subjects had to decide whether a text was related or not to a target topic presented to them beforehand. In such tasks, reading process and decision making are closely related. Statistical analysis of such data aims at deciphering underlying dependency structures in these processes. Hidden Markov models (HMMs) have been used on eye movement series to infer phases in the reading process that can be interpreted as steps in the cognitive processes leading to decision. In HMMs, each phase is associated with a state of the Markov chain. The states are observed indirectly though eye-movements. Our approach was inspired by Simola et al. (2008), but we used hidden semi-Markov models for better characterization of phase length distributions [55]. The estimated HMM highlighted contrasted reading strategies (ie, state transitions), with both individual and document-related variability. However, the characteristics of eye movements within each phase tended to be poorly discriminated. As a result, high uncertainty in the phase changes arose, and it could be difficult to relate phases to known patterns in EEGs.

This is why, as part of Brice Olivier's PhD thesis, we are developing integrated models coupling EEG and eye movements within one single HMM for better identification of the phases. Here, the coupling should incorporate some delay between the transitions in both (EEG and eye-movement) chains, since EEG patterns associated to cognitive processes occur lately with respect to eye-movement phases. Moreover, EEGs and scanpaths were recorded with different time resolutions, so that some resampling scheme must be added into the model, for the sake of synchronizing both processes.

New results were obtained in the standalone analysis of the eye-movements. A comparison between the effects of three types of texts was performed, considering texts either closely related, moderately related or unrelated to the target topic.

Our goal for this coming year is to develop and implement a model for jointly analyzing eye-movements and EEGs in order to improve the discrimination of the reading strategies.

#### Markov models for the analysis of the alternation of flowering in apple tree progenies

Participant : Jean-Baptiste Durand.

This research theme is supported by a Franco-German ANR grant (AlternApp project).

**Joint work with:**
Evelyne Costes (INRA AGAP, AFEF team)

A first study was published to characterize genetic determinisms of the alternation of flowering in apple tree progenies. Data were collected at two scales: at whole tree scale (with annual time step) and a local scale (annual shoots, which correspond to portions of stems that were grown during the same year). One or several replications of each genotype were available.

Three families of indices were proposed for early detection of alternation during the juvenile phase. The first family was based on a trend model and a quantification of the deviation amplitudes and dependency, with respect to the trend. The second family was based on a 2nd-order Markov chain with fixed and random effect in transition probabilities. The third family was based on entropy indices, in which flowering probabilities were corrected from fixed effects using Generalized Linear Models.

This allowed early quantification of alternation from the yearly numbers of inflorescences at tree scale. Some quantitative trait loci (QTL) were found in relation with these indices [40], [20].

New data sets where collected in other F1 progenies. Ancestral relationships between parents of different progenies were taken into account to enhance the power of QTL detection using Bayesian methods. Other QTLs are expected to be found using these new indices and genetic material. However, the amount of replicate per genotype and of data per replicate is quite reduced compared to those of our previous work. This is why we will investigate the loss of power in QTL detection due to a degraded amount of data, by simulating data deletion in our reference results.