• Inria's research teams produce an annual Activity Report presenting their activities and results of the year. These reports list the team members and describe the scientific program, the software developed by the team and the new results of the year. They also describe grants, contracts, and dissemination and teaching activities, and end with the list of publications of the year.

2021 Activity report
Project-Team BIGS
RNSR: 200920955T
Research center:
In partnership with: CNRS, Université de Lorraine
Team name: Biology, genetics and statistics
In collaboration with: Institut Elie Cartan de Lorraine (IECL)
Domain: Digital Health, Biology and Earth
Theme: Computational Biology
Creation of the Project-Team: 2011 January 01

# Keywords

• A3.1. Data
• A3.2. Knowledge
• A3.2.3. Inference
• A3.3. Data and knowledge analysis
• A3.3.1. On-line analytical processing
• A3.3.2. Data mining
• A3.3.3. Big data analysis
• A3.4.1. Supervised learning
• A3.4.2. Unsupervised learning
• A3.4.4. Optimization and learning
• A3.4.7. Kernel methods
• A6. Modeling, simulation and control
• A6.1. Methods in mathematical modeling
• A6.1.2. Stochastic Modeling
• A6.2. Scientific computing, Numerical Analysis & Optimization
• A6.2.3. Probabilistic methods
• A6.2.4. Statistical methods
• A6.4. Automatic control
• A6.4.2. Stochastic control
• B1. Life sciences
• B1.1. Biology
• B1.1.2. Molecular and cellular biology
• B1.1.10. Systems and synthetic biology
• B1.1.11. Plant Biology
• B2.2. Physiology and diseases
• B2.2.1. Cardiovascular and respiratory diseases
• B2.2.3. Cancer
• B2.3. Epidemiology
• B2.4. Therapies

# 1 Team members, visitors, external collaborators

## Research Scientists

• Nicolas Champagnat [Team leader, Inria, Senior Researcher, HDR]
• Coralie Fritsch [Inria, Researcher]
• Ulysse Herbach [Inria, Researcher]
• Bruno Scherrer [Inria, Researcher, HDR]

## Faculty Members

• Thierry Bastogne [Univ de Lorraine, Associate Professor, HDR]
• Sandie Ferrigno [Univ de Lorraine, Associate Professor]
• Anne Gégout Petit [Univ de Lorraine, Professor, HDR]
• Jean-Marie Monnez [Univ de Lorraine, Emeritus, HDR]
• Aurélie Muller-Gueudin [Univ de Lorraine, Associate Professor]
• Sophie Mézières [Univ de Lorraine, Associate Professor]
• Pierre Vallois [Univ de Lorraine, Emeritus, HDR]
• Denis Villemonais [Univ de Lorraine, Associate Professor, HDR]

## Post-Doctoral Fellows

• Leo Darrigade [Inria, from Apr 2021]
• Joseph Lam-Weil [Univ de Lorraine, from Jun 2021]
• William Ocafrain [Inria]

## PhD Students

• Vincent Hass [Univ de Lorraine, Inria until Aug 2021, ATER from Sep 2021]
• Rodolphe Loubaton [Univ de Lorraine, ATER]
• Anouk Rago [Univ de Lorraine, from Oct 2021]
• Nassim Sahki [Univ de Lorraine, Inria until Feb 2021, ATER from Mar 2021 until Aug 2021]
• Nino Vieillard [Google, CIFRE]
• Nicolás Zalduendo Vidal [Inria]

## Technical Staff

• Joseph Lam-Weil [Univ de Lorraine, Engineer, from Apr 2021 until Jun 2021]
• Nicolas Thorr [Inria, Engineer, until Jun 2021]

• Emmanuelle Deschamps [Inria]

# 2 Overall objectives

BIGS is a joint team of Inria, CNRS and Université de Lorraine, via the Institut Élie Cartan de Lorraine (IECL), the UMR 7502 CNRS-UL mathematics laboratory, of which Inria is a strong partner. One member of BIGS, T. Bastogne, comes from the Research Center for Automatic Control of Nancy (CRAN), with which BIGS has strong relations in the domain "Health-Biology-Signal". Our research is mainly focused on stochastic modeling and statistics, with the aim of better understanding biological systems. BIGS involves applied mathematicians whose research interests mainly concern probability and statistics. More precisely, our attention is directed at (1) stochastic modeling, (2) estimation and control for stochastic processes, (3) algorithms and estimation for graph data and (4) regression and machine learning. The main objective of BIGS is to exploit these skills in applied mathematics to provide a better understanding of issues arising in life sciences, with a special focus on (1) tumor growth, (2) photodynamic therapy, (3) population studies of genomic data and of micro-organisms genomics, (4) epidemiology and e-health.

# 3 Research program

## 3.1 Introduction

We give here the main lines of our research, which belongs to the domains of probability and statistics. For clarity, we have structured them into four items. Although this choice was not arbitrary, the boundaries between these items are sometimes fuzzy because each of them deals with modeling and inference and they are all interconnected.

## 3.2 Stochastic modeling

Our aim is to propose relevant stochastic frameworks for the modeling and understanding of biological systems. Stochastic processes are particularly suitable for this purpose. Among them, Markov chains give a first framework for modeling populations of cells 83, 59. Piecewise deterministic processes are non-diffusion processes that are also frequently used in the biological context 49, 58, 51. Among Markov models, we have developed strong expertise in processes derived from Brownian motion and stochastic differential equations 76, 57. For instance, knowledge about Brownian or random walk excursions 82, 74 helps to analyse genetic sequences and to develop inference about them. However, nature provides us with many examples of systems in which the observed signal has a given Hölder regularity that does not correspond to the one we might expect from a system driven by ordinary Brownian motion.

This situation is commonly handled by noisy equations driven by Gaussian processes such as fractional Brownian motion or fractional fields. The basic aspects of these differential equations are now well understood, mainly thanks to the so-called rough paths tools 66, but also through the Russo-Vallois integration techniques 75. The specific issue of Volterra equations driven by fractional Brownian motion, which is central to the problem of subdiffusion within proteins, is addressed in 50. Many generalizations (Gaussian or not) of this model have been recently proposed for some Gaussian locally self-similar fields, or for some non-Gaussian models 62, or for anisotropic models 44.

## 3.3 Estimation and control for stochastic processes

We develop inference methods for the stochastic processes that we use for modeling. Control of stochastic processes is also a way to optimise the administration (dose, frequency) of a therapy.

There are many estimation techniques for diffusion processes or for the coefficients of fractional or multifractional Brownian motion from a set of observations 61, 40, 48. However, the inference problem for diffusions driven by a fractional Brownian motion is still in its infancy. Our team has good expertise in the inference of the jump rate and the kernel of piecewise-deterministic Markov processes (PDMP) 39, 35, 38, 37, but many directions remain to be explored. For instance, previous work assumed a complete observation of jumps and mode, which is unrealistic in practice. We tackle the problem of inference for “hidden PDMP”. For example, in pharmacokinetics modeling, we want to account for the presence of timing noise and for identification from longitudinal data. We have expertise on these subjects 41, and we have also used mixed models to estimate tumor growth 42.

We consider the control of stochastic processes within the framework of Markov Decision Processes 73 and their generalization known as multi-player stochastic games, with a particular focus on infinite-horizon problems. In this context, we are interested in the complexity analysis of standard algorithms, as well as the design and analysis of approximate numerical schemes for large problems in the spirit of 43. Regarding complexity, a central topic of research is the analysis of the Policy Iteration algorithm, which has seen significant progress in recent years 85, 72, 56, 79, but is still not fully understood. For large problems, we have long experience in the sensitivity analysis of approximate dynamic programming algorithms for Markov Decision Processes 81, 80, 77, 65, 78, and we are currently investigating whether and how similar ideas may be adapted to multi-player stochastic games.
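As an illustration of the Policy Iteration algorithm mentioned above, here is a minimal sketch for a finite, discounted Markov Decision Process. The array layout (`P[a, s, s']`, `R[s, a]`), the discount factor and the two-state example are assumptions made for this illustration, not taken from the cited works.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Exact policy iteration for a finite discounted MDP.

    P[a, s, t] is the probability of moving from s to t under action a,
    R[s, a] is the expected reward (hypothetical layout chosen for this sketch).
    """
    n_actions, n_states, _ = P.shape
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = P[policy, states]            # row s is P[policy[s], s, :]
        r_pi = R[states, policy]
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: act greedily with respect to v.
        q = R + gamma * np.einsum("ast,t->sa", P, v)
        new_policy = np.argmax(q, axis=1)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy

# Toy example: action 0 stays in place, action 1 switches state;
# only staying in state 1 is rewarded.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 0.0],
              [1.0, 0.0]])
policy, value = policy_iteration(P, R, gamma=0.9)
```

In this toy problem the greedy policy switches out of state 0 and stays in state 1. The number of improvement steps before termination is precisely the quantity studied in complexity analyses of the algorithm.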

## 3.4 Algorithms and estimation for graph data

A graph data structure consists of a set of nodes, together with a set of pairs of these nodes called edges. This type of data is frequently used in biology because it provides a mathematical representation of many concepts such as biological structures and networks of relationships in a population. Some attention has recently been focused in the group on modeling and inference for graph data.

Network inference is the process of making inference about the link between two variables, taking into account the information about other variables. 84 gives a very good introduction and many references about network inference and mining. Many methods are available to infer and test edges in Gaussian graphical models 84, 67, 54, 55. However, the Gaussian assumption does not hold when dealing with typical “zero-inflated” abundance data, and we want to develop inference in this case.
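In a Gaussian graphical model, an edge between two variables is absent exactly when their partial correlation given all other variables is zero, and partial correlations can be read off the inverse covariance matrix. The sketch below shows this computation on simulated data; the chain structure, sample size and function name are assumptions for the example, and no formal edge test is performed.

```python
import numpy as np

def partial_correlations(X):
    """Partial correlation matrix obtained from the inverse empirical covariance."""
    precision = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Simulated chain x1 -> x2 -> x3: x1 and x3 are correlated,
# but conditionally independent given x2.
rng = np.random.default_rng(0)
n = 20000
x1 = rng.standard_normal(n)
x2 = x1 + rng.standard_normal(n)
x3 = x2 + rng.standard_normal(n)
pcor = partial_correlations(np.column_stack([x1, x2, x3]))
```

Here the estimated partial correlation between `x1` and `x3` is close to zero although their marginal correlation is not, which is the distinction network inference relies on. This approach rests on the Gaussian assumption and is therefore not suited to zero-inflated abundance data.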

Among graphs, trees play a special role because they offer a good model for many biological concepts, from RNA to phylogenetic trees through plant structures. Our research deals with several aspects of tree data. In particular, we work on statistical inference for this type of data under a given stochastic model. We also work on lossy compression of trees via directed acyclic graphs. These methods enable us to compute distances between tree data faster than from the original structures and with a high accuracy.

## 3.5 Regression and machine learning

Regression models and machine learning aim at inferring statistical links between a variable of interest and covariates. In biological studies, it is always important to develop adapted learning methods, both for standard data and for high-dimensional data (sometimes with few observations) and massive or online data.

Many methods are available to estimate conditional quantiles and test dependencies 71, 60. Among them, we have developed nonparametric estimation by local analysis via kernel methods 52, 53, and we want to study the properties of this estimator in order to derive measures of risk such as confidence bands and tests. We also study many other regression models, such as survival analysis and spatio-temporal models with covariates. Among the multiple regression models, we want to develop omnibus tests that examine several assumptions together.
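To make the idea of kernel-based conditional quantile estimation concrete, the following sketch inverts a Nadaraya-Watson estimate of the conditional distribution function. This is a simple stand-in chosen for illustration and not necessarily the local estimator studied by the team; the Gaussian kernel, the bandwidth `h` and the simulated data are assumptions.

```python
import numpy as np

def conditional_quantile(x0, X, Y, alpha, h):
    """Quantile of order alpha of Y given X = x0, by inverting a kernel-weighted CDF."""
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)      # Gaussian kernel weights
    order = np.argsort(Y)
    cdf = np.cumsum(w[order]) / np.sum(w)       # estimated F(y | x0) at the sorted Y
    idx = min(int(np.searchsorted(cdf, alpha)), len(Y) - 1)
    return Y[order][idx]

# Simulated data: Y = 2 X + noise, so the conditional median at x0 = 0.5 is 1.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, 5000)
Y = 2.0 * X + 0.1 * rng.standard_normal(5000)
median_at_half = conditional_quantile(0.5, X, Y, alpha=0.5, h=0.05)
lower = conditional_quantile(0.5, X, Y, alpha=0.1, h=0.05)
upper = conditional_quantile(0.5, X, Y, alpha=0.9, h=0.05)
```

Evaluating the function on a grid of `x0` values for a few orders `alpha` yields reference-curve-like bands. The choice of bandwidth drives the usual bias-variance trade-off.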

Concerning the analysis of high dimensional data, our view on the topic relies on the French data analysis school, specifically on Factorial Analysis tools. In this context, stochastic approximation is an essential tool 64, which allows one to approximate eigenvectors in a stepwise manner 69, 68, 70. BIGS aims at performing accurate classification or clustering by taking advantage of the possibility of updating the information "online" using stochastic approximation algorithms 45. We focus on several incremental procedures for regression and data analysis like linear and logistic regressions and PCA (Principal Component Analysis).
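The stepwise approximation of eigenvectors can be illustrated with a normalized Oja-type recursion for the first principal direction of centered streaming data. This is a generic textbook sketch under assumed step sizes $a_n = 1/n$ and simulated observations, not the specific processes analyzed in the cited works.

```python
import numpy as np

def oja_first_component(observations, rng):
    """Online estimate of the leading eigenvector of E[z z^T] for centered data z."""
    dim = observations.shape[1]
    x = rng.standard_normal(dim)
    x /= np.linalg.norm(x)
    for n, z in enumerate(observations, start=1):
        x += (1.0 / n) * z * (z @ x)    # stochastic step using only the current observation
        x /= np.linalg.norm(x)          # renormalize to stay on the unit sphere
    return x

# Simulated stream whose covariance is diag(9, 1, 1): the first axis dominates.
rng = np.random.default_rng(0)
Z = rng.standard_normal((5000, 3)) * np.array([3.0, 1.0, 1.0])
direction = oja_first_component(Z, rng)
```

Each observation is used once and then discarded, so memory use does not grow with the length of the stream. Further eigenvectors can be obtained by adding an orthonormalization step at each iteration.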

We also focus on the biological context of high-throughput bioassays in which several hundreds or thousands of biological signals are measured for subsequent analysis. We have to account for the inter-individual variability within the modeling procedure. We aim at developing a new solution based on an ARX (Auto Regressive model with eXternal inputs) model structure using the EM (Expectation-Maximisation) algorithm for the estimation of the model parameters.

# 4 Application domains

## 4.1 Tumor growth-oncology

On this topic, we want to propose branching processes to model the appearance of mutations in tumors, through new collaborations with clinicians who measure a particular quantity called circulating tumor DNA (ctDNA). The final purpose is to use ctDNA as an early biomarker of resistance to an immunotherapy treatment: this is the aim of the ITMO project. Another topic is the identification of dynamic networks of gene expression. In the ongoing work on low-grade gliomas, a local database of 400 patients will soon be available to construct models. We plan to extend it through national and international collaborations (Montpellier CHU, Montreal CRHUM). Our aim is to build a decision-aid tool for personalised medicine. In the same context, another topic is the clustering analysis of a brain cartography obtained by sensorial simulations during awake surgery.

## 4.2 Genomic data and micro-organisms population

Despite the 'G' in the name BIGS, genetics is not central in the applications of the team. However, we want to contribute to a better understanding of the correlations between genes through their expression data and of the genetic bases of drug response and disease. We have contributed to methods for detecting proteomics and transcriptomics variables linked with the outcome of a treatment.

## 4.3 Epidemiology and e-health

Much work remains to be done in our ongoing projects in the context of personalized medicine with CHU Nancy. These projects deal with biomarker research, the prognostic value of quantitative variables and events, scoring, and adverse events. We also want to develop our expertise in change-point detection in a project with APHP (Assistance Publique Hôpitaux de Paris) for the detection of adverse events earlier than the clinical signs and symptoms. The clinical relevance of predictive analytics is clear for high-risk patients, such as those with solid organ transplantation or severe chronic respiratory disease. The main challenge is change-point detection in multivariate and heterogeneous signals (for instance daily measures of electrocardiogram, body temperature, spirometry parameters, sleep duration, etc.). Other collaborations with clinicians concern foetopathology, where we want to use our work on conditional distribution functions to explain fetal and child growth. We have data from the "Service de foetopathologie et de placentologie" of the "Maternité Régionale Universitaire" (CHU Nancy).

## 4.4 Dynamics of telomeres

Telomeres are disposable buffers at the ends of chromosomes which are truncated during cell division, so that over time, with each cell division, the telomere ends become shorter. In this way, they are markers of aging. Through a collaboration with Pr A. Benetos, geriatrician at CHU Nancy, we recently obtained data on the distribution of the length of telomeres from blood cells. With members of Inria team TOSCA, we want to work in three connected directions: (1) refine the methodology for the analysis of the available data; (2) propose a dynamical model for the lengths of telomeres and study its mathematical properties (long-term behavior, quasi-stationarity, etc.); and (3) use these properties to develop new statistical methods. A postdoctoral position is already planned in the Lorraine Université d'Excellence (LUE) project GEENAGE (managed by CHU Nancy).

# 5 Social and environmental responsibility

We followed Inria's recommendations to get involved in the fight against COVID-19. We tried to collaborate with the LCPME laboratory with the aim of predicting the number of SARS-CoV-2 positive patients from the Grand Nancy metropolitan area at the Nancy University Hospital from the concentration of SARS-CoV-2 residues in waste water. We encountered difficulties in obtaining raw data, rather than pre-processed indicators, from the Obépine network. We made predictions from the incidence rates available from Santé Publique France. The predictions are available on the siwam website.

We were also involved in the MODCOV19 project, a platform for the coordination of research actions on the modeling of the SARS-CoV-2 (Covid-19) pandemic. In particular, we were responsible for the bibliographic watch group of the coordination committee.

# 6 Highlights of the year

The list of permanent members of the team noticeably increased in 2021, due to the arrival of several researchers from the former Inria team Tosca. These researchers are experts in stochastic modeling and analysis for bio-medical applications. Their arrival strengthened the first axis of our research program. We are currently proposing a new Inria team, Simba, which takes into account these arrivals and the recruitments of the past few years in our team, and more generally on the topic of mathematical biology in Institut Élie Cartan de Lorraine.

# 7 New software and platforms

The team has been developing three new packages.

## 7.1 New software

• Name: A Statistical Methodology to Select Covariates in High-Dimensional Data under Dependence
• Keywords: Biostatistics, Aggregated methods, High Dimensional Data, Personalized medicine, Variable selection
• Functional Description: Two-step variable selection procedure in a context of high-dimensional dependent data with few observations. The first step eliminates the dependency between variables (clustering of variables, followed by factor analysis inside each cluster). The second step consists of variable selection by aggregation of adapted methods.
• News of the Year: This package is a new one.
• Contact: Aurélie Muller
• Participants: Aurélie Muller, Anne Gégout-Petit

### 7.1.2 cvmgof

• Keywords: Regression, Test, Estimators
• Scientific Description: Many goodness-of-fit tests have been developed to assess the different assumptions of a (possibly heteroscedastic) regression model. Most of them are "directional" in that they detect departures from a given assumption of the model. Other tests are "global" (or "omnibus") in that they assess whether a model fits a dataset on all its assumptions. cvmgof focuses on the task of choosing the structural part of the regression function because it contains easily interpretable information about the studied relationship. It implements 2 nonparametric "directional" tests and one nonparametric "global" test, all based on generalizations of the Cramér-von Mises statistic.
• Functional Description: cvmgof is an R library devoted to Cramér-von Mises goodness-of-fit tests. It implements three nonparametric statistical methods based on Cramér-von Mises statistics to estimate and test a regression model.
• News of the Year: New version available on the CRAN website since Jan 11, 2021.
• Contact: Romain Azaïs
• Participants: Sandie Ferrigno, Marie-José Martinez, Romain Azaïs

### 7.1.3 Harissa

• Name: Hartree approximation for inference along with a stochastic simulation algorithm
• Keywords: Gene regulatory networks, Reverse engineering, Molecular simulation
• Functional Description: Harissa is a Python package for both inference and simulation of gene regulatory networks, based on stochastic gene expression with transcriptional bursting. It was implemented in the context of a mechanistic approach to gene regulatory network inference from single-cell data.
• Contact: Ulysse Herbach

# 8 New results

## 8.1 Stochastic modeling

Participants: Nicolas Champagnat, Coralie Fritsch, Anne Gégout-Petit, Vincent Hass, Ulysse Herbach, William Oçafrain, Pierre Vallois, Denis Villemonais, Nicolás Zalduendo Vidal.

### 8.1.1 Reconstruction of epigenetic landscapes from single-cell data

The aim is to better understand how living cells make decisions (e.g., differentiation of a stem cell into a particular specialized type), seeing decision-making as an emergent property of an underlying complex molecular network. Indeed, it is now proven that cells react probabilistically to their environment: cell types do not correspond to fixed states, but rather to “potential wells” of a certain energy landscape (representing the energy of the possible states of the cell) that we are trying to reconstruct. A first paper proposing a reconstruction method has been submitted 26 in the framework of an international collaboration (USA, Switzerland, France). Another paper is about to be submitted 28, dealing more specifically with the inference of the underlying networks.

Joint work with Nan Papili Gao (ETH Zurich), Olivier Gandrillon (ENS Lyon), András Páldi (EPHE, Paris), and Rudiyanto Gunawan (University at Buffalo, New York)

### 8.1.2 Modeling and estimation of circulating tumor DNA (ctDNA) dynamics for detecting resistance to targeted therapies

We continued the ITMO Cancer project, supervised by Nicolas Champagnat, on the modeling of circulating tumor DNA (ctDNA) to detect the appearance of resistance to targeted therapies (personalized medicine). After a phase of investigation of possible scenarios in collaboration with Alexandre Harlé of the Institute of Cancerology of Lorraine (ICL), a final model was selected. Based on a mathematical analysis, the members of the project then designed a statistical inference algorithm (learning the parameters of the model, including the genealogical tree of mutations for each patient), which is intended to be validated on real data currently being acquired at the Nancy CHRU. The general idea is to exploit a “variational principle” that makes it possible to explore the very large discrete space of family trees through a “pivot” space of continuous parameters that are easy to optimize (and reasonably few). A paper detailing the model and its inference is in preparation. This method allows for the reconstruction of intratumoral heterogeneity, i.e. the subclone composition of the tumor. Based on these data, we are currently studying models of stochastic tumor growth, with an emphasis on interactions between the clones, to assess the effects of different treatment strategies.

### 8.1.3 Quasi-stationary distributions

We are continuing our research on quasi-stationary distributions (QSD), that is, distributions of Markov stochastic processes with absorption, which are stationary conditionally on non-absorption. For models of biological populations, absorption corresponds usually to extinction of a (sub-)population. QSDs are fundamental tools to describe the population state before extinction and to quantify the large-time behavior of the probability of extinction.
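For a finite-state Markov chain with absorption, the definition above can be made fully explicit: if $Q$ denotes the transition matrix restricted to the non-absorbed states, a QSD $\nu$ is a probability vector satisfying $\nu Q = \theta \nu$, where $\theta$ is the one-step survival probability under $\nu$. The sketch below computes it as a normalized left eigenvector. The two-state matrix is a made-up example, and $Q$ is assumed irreducible so that the QSD is unique.

```python
import numpy as np

def quasi_stationary_distribution(Q):
    """QSD of a finite absorbed Markov chain.

    Q is the sub-stochastic transition matrix restricted to non-absorbed states
    (assumed irreducible). Returns (nu, theta) with nu @ Q == theta * nu.
    """
    eigvals, eigvecs = np.linalg.eig(Q.T)   # left eigenvectors of Q
    k = np.argmax(eigvals.real)             # Perron eigenvalue
    nu = np.abs(eigvecs[:, k].real)
    return nu / nu.sum(), eigvals[k].real

# Hypothetical example: from each state, absorption occurs with probability 0.2 per step.
Q = np.array([[0.5, 0.3],
              [0.2, 0.6]])
nu, theta = quasi_stationary_distribution(Q)
```

Starting from $\nu$, the probability of not being absorbed after $n$ steps is $\theta^n$, which is how a QSD quantifies the large-time behavior of the extinction probability.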

This year, we solved a general conjecture on the Fleming-Viot particle systems approximating QSDs: in cases where several QSDs exist, it is expected that the stationary distributions of the Fleming-Viot processes approach a particular QSD, called the minimal QSD. We proved in 7 that this holds true for general absorbed Markov processes with soft obstacles. We also obtained in 8 criteria based on Lyapunov functions that make it possible to check the general conditions of 47, which characterize the exponential uniform convergence in total variation of the conditional distributions of an absorbed Markov process to a unique quasi-stationary distribution. Among various applications, we proved that these conditions apply to any logistic Feller diffusion in any dimension conditioned on the non-extinction of all its coordinates. This question had been left partly open since the first work of Cattiaux and Méléard on this topic 46.

Together with M. Benaïm (Univ. Neuchâtel), we studied in 4 stochastic algorithms to approximate quasi-stationary distributions of diffusion processes absorbed at the boundary of a bounded domain. We considered a reinforced version of the diffusion, which is resampled according to its occupation measure when it reaches the boundary. We showed that its occupation measure converges to the unique quasi-stationary distribution of the diffusion process. We also obtained in 24 general criteria ensuring existence, uniqueness and/or exponential convergence properties for quasi-stationary distributions. The criteria were specifically designed to apply to degenerate processes such as hypoelliptic diffusions. We also provided in 25 a counterexample to the uniqueness of a quasi-stationary distribution for a diffusion process which satisfies the weak Hörmander condition.

Together with R. Schott (IECL, Univ. Lorraine), we studied in 6 models of deadlocks in distributed systems, using the approach we developed in 8 to study quasi-stationary distributions, in order to characterize and compute numerically the asymptotic behaviour of the deadlock time and the behaviour of the system before deadlock, both for discrete and for diffusion models.

### 8.1.4 Evolutionary models of food webs

We studied models of food web adaptive evolution in 10. We identified the biomass conversion efficiency as a key mechanism underlying food web evolution and discussed the relevance of such models to study the evolution of food webs.

In collaboration with S. Billiard (Univ. Lille).

### 8.1.5 Adaptive dynamics in biological populations

We studied evolutionary models of bacteria with horizontal transfer in 5. Horizontal transfer is a common mechanism of DNA exchange between micro-organisms that is thought to be responsible for the fast evolution of antibiotic resistance in bacteria or the evolution of virulence in pathogens. We considered a scaling of parameters taking into account the influence of negligible but non-extinct populations, allowing us to study specific phenomena observed in these models (re-emergence of traits, cyclic evolutionary dynamics and evolutionary suicide). This work is done in collaboration with S. Méléard (École Polytechnique) and V.C. Tran (Univ. Paris Est Marne-la-Vallée).

We also worked on general evolutionary models of adaptive dynamics under an assumption of large population and small mutations. This year, we obtained existence, uniqueness and ergodicity results for a centered version of the Fleming-Viot process of population genetics. This is a key step towards recovering variants of the canonical equation of adaptive dynamics, which describes the long-time evolution of the dominant phenotype in the population, under less stringent biological assumptions than in previous works. We plan to complete this work next year.

## 8.2 Optimal control of Markov processes

Participants: Bruno Scherrer, Nino Vieillard.

We consider Offline Reinforcement Learning methods. The problem is to learn a policy from logged transitions of an environment, without any interaction. In the presence of function approximation, and under the assumption of limited coverage of the state-action space of the environment, it is necessary to constrain the policy to visit state-action pairs close to the support of logged transitions.

In 17, we propose an iterative procedure to learn a pseudometric (closely related to bisimulation metrics) from logged transitions, and use it to define this notion of closeness. We show its convergence and extend it to the function approximation setting. We then use this pseudometric to define a new lookup-based bonus in an actor-critic algorithm: PLOFF. This bonus encourages the actor to stay close, in terms of the defined pseudometric, to the support of logged transitions.

In 18, noticing that an agent in this setting should avoid selecting actions whose consequences cannot be predicted from the data, we take inspiration from the literature on bonus-based exploration to design a new offline RL agent. The core idea is to subtract a prediction-based exploration bonus from the reward, instead of adding it for exploration. This allows the policy to stay close to the support of the dataset. We connect this approach to a more common regularization of the learned policy towards the data. Instantiated with a bonus based on the prediction error of a variational autoencoder, we show that our agent is competitive with the state of the art on a set of continuous control locomotion and manipulation tasks.
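The core idea of subtracting a prediction-based bonus from the reward can be sketched as follows. The paper uses the prediction error of a variational autoencoder; here a linear autoencoder (PCA reconstruction error) is swapped in as a crude stand-in so that the example stays short. The function names, the weight `alpha` and the toy dataset are all assumptions for illustration.

```python
import numpy as np

def fit_linear_autoencoder(data, n_components):
    """Fit a PCA-based linear autoencoder on logged state-action vectors."""
    mean = data.mean(axis=0)
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:n_components]

def prediction_bonus(x, mean, components):
    """Squared reconstruction error: small near the logged data, large away from it."""
    centered = x - mean
    reconstruction = centered @ components.T @ components
    return np.sum((centered - reconstruction) ** 2, axis=-1)

def penalized_reward(reward, x, mean, components, alpha=1.0):
    """Offline-RL reward: the bonus is subtracted instead of added."""
    return reward - alpha * prediction_bonus(x, mean, components)

# Toy logged data lying on the line {(t, 2t)} in a 2D state-action space.
t = np.linspace(-1.0, 1.0, 201)
logged = np.column_stack([t, 2.0 * t])
mean, components = fit_linear_autoencoder(logged, n_components=1)

in_support = np.array([1.0, 2.0])
out_of_support = np.array([2.0, -1.0])
r_in = penalized_reward(1.0, in_support, mean, components, alpha=0.1)
r_out = penalized_reward(1.0, out_of_support, mean, components, alpha=0.1)
```

A pair on the logged line keeps its reward, while a pair off the line is penalized in proportion to its reconstruction error, which pushes a learned policy back towards the data. A linear model cannot detect points that are far away along the line itself, which is one reason a richer model such as a variational autoencoder is preferable in practice.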

Joint work with Robert Dadashi, Shideh Rezaeifar, Léonard Hussenot, Olivier Pietquin, Olivier Bachem and Matthieu Geist.

## 8.3 Regression and machine learning

Participants: Thierry Bastogne, Sandie Ferrigno, Anne Gégout-Petit, Aurélie Gueudin, Benoît Lalloué, Jean-Marie Monnez, Nassim Sahki, Sophie Wantz-Mézières.

### 8.3.1 Cramér–von Mises goodness-of-fit tests in regression models

Many goodness-of-fit tests have been developed to assess the different assumptions of a (possibly heteroscedastic) regression model. Most of them are 'directional' in that they detect departures from a given assumption of the model. Other tests are 'global' (or 'omnibus') in that they assess whether a model fits a dataset on all its assumptions. We focus on the task of choosing the structural part of the regression function because it contains easily interpretable information about the studied relationship. We consider 2 nonparametric 'directional' tests and one nonparametric 'global' test, all based on generalizations of the Cramér-von Mises statistic.
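For reference, the classical one-sample Cramér-von Mises statistic, which the regression tests generalize, measures the squared distance between the empirical distribution function and a hypothesized distribution function $F$. For an ordered sample $x_{(1)} \le \dots \le x_{(n)}$ it equals $\frac{1}{12n} + \sum_{i=1}^{n} \big(F(x_{(i)}) - \frac{2i-1}{2n}\big)^2$. The sketch below implements only this classical statistic, not the regression versions provided by cvmgof; the uniform example is an assumption.

```python
import numpy as np

def cramer_von_mises(sample, cdf):
    """Classical one-sample Cramér-von Mises statistic against a hypothesized CDF."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    return 1.0 / (12 * n) + np.sum((cdf(x) - (2 * i - 1) / (2 * n)) ** 2)

def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1]."""
    return np.clip(x, 0.0, 1.0)

# A perfectly spread sample reaches the minimum value 1 / (12 n).
well_spread = cramer_von_mises([0.125, 0.375, 0.625, 0.875], uniform_cdf)
# A sample concentrated at one point gives a larger statistic.
concentrated = cramer_von_mises([0.5, 0.5], uniform_cdf)
```

Large values of the statistic indicate a poor fit. In practice the null distribution is often approximated by bootstrap.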

To perform these goodness-of-fit tests, we developed the R package cvmgof 36, an easy-to-use tool for practitioners, available from the Comprehensive R Archive Network (CRAN). The use of the library is illustrated through a tutorial on real data, and simulation studies are carried out to show how the package can be used to compare the 3 implemented tests. The practitioner can also easily compare the test procedures with different kernel functions, bootstrap distributions, numbers of bootstrap replicates, or bandwidths. The package was updated at the start of 2021; this is its third version. A first article 1 on this work was published in October 2021.

We are now working on nonparametric tests associated with the functional form of the variance of the regression model. For this, we continue to work on the global test of Ducharme and Ferrigno in order to compare its performance with that of directional tests associated with the variance of the model. Many simulations are in progress. This will also make it possible to propose a more general package for validating the regression models used in practice.

To complete this work, it would be interesting to assess the other assumptions of a regression model such as the additivity of the random error term. The implementation of these directional tests would enrich the cvmgof package and offer a complete easy-to-use tool for validating regression models. Moreover, the assessment of the overall validity of the model when using several directional tests could be compared with that done when using only a global test. In particular, the well-known problem of multiple testing could be discussed by comparing the results obtained from multiple test procedures with those obtained when using a global test strategy. Another perspective of this work would be to develop a similar tool for other statistical models widely used in practice such as generalized linear models.

Joint work with Romain Azaïs (Inria, ENS Lyon) and Marie-José Martinez (LJK, Université Grenoble Alpes).

### 8.3.2 Online data analysis

Widening the scope of an eigenvector stochastic approximation process and application to streaming PCA and related methods. This article in collaboration with A. Skiredj was presented in the 2020 Activity Report (Section 8.3.5) and is now published in Journal of Multivariate Analysis 15.

Streaming constrained binary logistic regression with online standardized data. This article in collaboration with E. Albuisson was presented in the 2020 Activity Report (Section 8.3.5) and is now accepted in Journal of Applied Statistics 13.

Construction and update of an online ensemble score involving linear discriminant analysis and logistic regression. This article in collaboration with E. Albuisson was presented in the 2020 Activity Report (Section 8.3.5) and is being submitted 30, 63.

Stochastic approximation of eigenvectors and eigenvalues of the Q-symmetric expectation of a random matrix. Application to streaming PCA. In this analysis, we have studied the convergence of stochastic approximation processes of the Oja type for estimating eigenvectors of the unknown $Q$-symmetric expectation $B$ of a random matrix, the metric $Q$ being unknown. We have established a theorem of a.s. convergence of these processes with assumptions on the noisy observations ${B}_{n}$ of $B$ that are more general than in previous results. The estimation of eigenvectors corresponding to eigenvalues of $B$ in decreasing order is obtained using at step $n$ a Gram-Schmidt orthonormalization with respect to a random metric ${Q}_{n+1}$ such that ${Q}_{n+1}$ converges a.s. to $Q$ as $n$ goes to infinity. We have proved the a.s. convergence of specific processes to corresponding eigenvalues. Corollaries of this theorem apply in particular to cases where $E\left[{B}_{n}|{T}_{n}\right]$ or ${B}_{n}$ converges a.s. to $B$ which were studied by Monnez and Skiredj 15. In the case of a process using only the current observations at each step, we have suggested constructing another process using past and current observations. We have applied these results to the online estimation of principal components in streaming PCA of a random vector, taking into account all the observations up to the current step with possibly different weights assigned to past and current observations.

Other applications to methods related to PCA such as generalized canonical correlation analysis are in progress.

### 8.3.3 Change-point detection thresholds in the sequential context

To apply our change-point detection algorithms to real data, we turned to EMG signal data provided by INRS. The study concerns the development of trapezius muscle myalgia in the workplace. We applied change-point detection to characterize the different computer activities carried out during an experimental day. Our analysis allowed us to characterize activities according to the frequency and amplitude of jumps and to distinguish office activities using the mouse from those using the keyboard. This work was presented in a conference paper 19.
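To make the role of a detection threshold in the sequential setting concrete, here is a minimal sketch of a classical two-sided CUSUM detector for a shift in the mean. This is a generic textbook procedure used only for illustration; it is not claimed to be the method developed by the team, and the function name, the `drift` allowance and the `threshold` value are assumptions of this example.

```python
def cusum_detect(signal, target_mean, drift, threshold):
    """Return the index of the first alarm, or None if no change is flagged.

    Observations are processed one at a time, as in a sequential context.
    """
    g_pos = 0.0  # evidence for an upward shift of the mean
    g_neg = 0.0  # evidence for a downward shift of the mean
    for t, x in enumerate(signal):
        g_pos = max(0.0, g_pos + (x - target_mean) - drift)
        g_neg = max(0.0, g_neg - (x - target_mean) - drift)
        if g_pos > threshold or g_neg > threshold:
            return t
    return None


# Noise-free toy signal: the mean jumps from 0 to 1 at index 50.
toy_signal = [0.0] * 50 + [1.0] * 50
alarm = cusum_detect(toy_signal, target_mean=0.0, drift=0.5, threshold=2.0)
no_alarm = cusum_detect([0.0] * 100, target_mean=0.0, drift=0.5, threshold=2.0)
```

A larger threshold lowers the false alarm rate but lengthens the detection delay, which is the trade-off that threshold calibration has to settle.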

## 8.4 Statistical learning and application in health

Participants: Nicolas Champagnat, Léo Darrigade, Sandie Ferrigno, Coralie Fritsch, Anne Gégout-Petit, Aurélie Muller-Gueudin, Ulysse Herbach, Benoît Lalloué, Rodolphe Loubaton, Jean-Marie Monnez, Anouk Rago, Nicolas Thorr, Pierre Vallois, Sophie Wantz-Mézières.

### 8.4.1 Analysis of diffuse low-grade gliomas growth

With the aim of understanding the growth of low-grade glioma, we investigate multiple fields of information available in clinical practice: patient-related predictors, variables related to tumor tissue and genetics. Monitoring growth through regular MRIs gives us access to many imaging-related variables, including an original one measuring tumor infiltration (thesis defended in 2021: Cyril Brzenczek, CRAN; article in preparation). Our latest efforts have focused on the statistical analysis of the database composed of these variables. We have obtained regional PACTE funding to host this database and use it for teaching, dissemination and development of experimentation tools: the PIANO platform.

Joint work with J.M. Moureaux, Y. Gaudeau (CRAN), F. Rech, L. Taillandier, M. Blonski, T. Obara (CHRU Nancy).

### 8.4.2 Estimation of reference curves for fetal weight

In epidemiology, we are working with INSERM to study fetal development in the last two trimesters of pregnancy. Reference or standard curves are required in this kind of biomedical problem. Values that lie outside the limits of these reference curves may indicate the presence of a disorder. Data are from the French EDEN mother-child cohort (INSERM), a study investigating the prenatal and early postnatal determinants of child health and development. A total of 2002 pregnant women were recruited before 24 weeks of amenorrhoea in two maternity clinics from middle-sized French cities (Nancy and Poitiers). From May 2003 to September 2006, 1899 newborns were then included. The main outcomes of interest are fetal (via ultrasound) and postnatal growth, adiposity development, respiratory health, atopy, behaviour and bone, cognitive and motor development. We are studying fetal weight and height as a function of the gestational age in the third trimester of pregnancy. Classical empirical and parametric methods are first used to construct these curves; polynomial regression, for instance, is one of the most common parametric approaches for modeling growth data, especially during the prenatal period. However, these classical methods require strong assumptions. We therefore propose to work with semi-parametric LMS methods, by modifying the response variable (fetal weight) with, among others, Box–Cox transformations. A first article detailing these methodologies applied to the EDEN data should be submitted next year and is the subject of the communication 31.
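To illustrate how a Box–Cox transformation turns a skewed measurement into a standard score in an LMS model, here is a minimal sketch using the standard LMS formulas, where at a given age $L$ is the Box–Cox power, $M$ the median and $S$ the coefficient of variation. The function names and the numerical parameter values are hypothetical and are not estimates from the EDEN data.

```python
import math
from statistics import NormalDist


def lms_zscore(y, L, M, S):
    """Z-score of a measurement y under the LMS model at a given age."""
    if abs(L) < 1e-12:
        # Limit of the Box-Cox transformation when the power L tends to 0.
        return math.log(y / M) / S
    return ((y / M) ** L - 1.0) / (L * S)


def lms_centile(p, L, M, S):
    """Value of the reference curve of level p (inverse of lms_zscore)."""
    z = NormalDist().inv_cdf(p)
    if abs(L) < 1e-12:
        return M * math.exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)


# Hypothetical parameters at one gestational age (made up for illustration).
L_demo, M_demo, S_demo = 0.8, 1500.0, 0.12
lower = lms_centile(0.03, L_demo, M_demo, S_demo)  # 3rd centile
upper = lms_centile(0.97, L_demo, M_demo, S_demo)  # 97th centile
```

A fetal weight outside the interval between `lower` and `upper` would then fall outside the 3rd to 97th centile band at that age. In a full LMS fit, the three parameters are smooth functions of gestational age; the sketch only covers one age.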

Alternative nonparametric methods such as Nadaraya-Watson kernel estimation, local polynomial estimation, B-splines or cubic splines are also developed in this context to construct these curves. The practical implementation of these methods required work on the smoothing parameters or the choice of knots for the different types of nonparametric estimation. In particular, an optimal choice of these parameters has been proposed. A first version of an R package has then been developed as a tool to construct nonparametric reference curves. It will soon be available on GitHub. In addition, a graphical user interface (GUI) intended for practitioners is being developed to allow intuitive visualization of the results given by the package, and an article is in progress.

Joint work with Myriam Maumy-Bertrand (IRMA, Université de Strasbourg) and INSERM.
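As an illustration of the first of these estimators, here is a minimal sketch of Nadaraya-Watson kernel regression with a Gaussian kernel, together with a leave-one-out cross-validation score for comparing bandwidths. Cross-validation is a common generic criterion and is only an assumption here; it is not necessarily the choice proposed by the team. The data are synthetic and all names are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel-weighted average of the responses ys around the point x0."""
    weights = [gaussian_kernel((x0 - x) / bandwidth) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total


def loo_cv_score(xs, ys, bandwidth):
    """Mean squared leave-one-out prediction error for one bandwidth."""
    err = 0.0
    for i in range(len(xs)):
        x_rest = xs[:i] + xs[i + 1:]
        y_rest = ys[:i] + ys[i + 1:]
        err += (ys[i] - nadaraya_watson(xs[i], x_rest, y_rest, bandwidth)) ** 2
    return err / len(xs)


# Synthetic "gestational age (weeks) versus weight (g)" data, made up here.
ages = [28.0 + 0.5 * i for i in range(25)]
weights_g = [1100.0 + 190.0 * (a - 28.0) + 30.0 * math.sin(3.0 * a) for a in ages]

candidates = [0.5, 1.0, 2.0, 4.0]
best_h = min(candidates, key=lambda h: loo_cv_score(ages, weights_g, h))
fitted_at_34 = nadaraya_watson(34.0, ages, weights_g, best_h)
```

A small bandwidth follows the data closely at the cost of a wiggly curve, while a large one oversmooths; the cross-validation score gives one data-driven way to settle that trade-off.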

### 8.4.3 Construction of parsimonious event risk scores by an ensemble method. An illustration for short-term predictions in chronic heart failure patients from the GISSI-HF trial

This article in collaboration with E. Albuisson and D. Lucci was presented in the 2020 Activity Report (Section 8.4.2) and is now published in Applied Mathematics 14.

### 8.4.4 Prediction of silencing experiments on gene networks for chronic lymphocytic leukemia

We are working with L. Vallat (CHRU Strasbourg) on the inference of dynamical gene networks from RNAseq and proteome data. The goal is to infer a model of gene expression that can predict gene expression in cells where the expression of some genes is silenced (e.g. using siRNA), in order to select the silencing experiments that are most likely to reduce cell proliferation. We expect the selected genes to provide new therapeutic targets for the treatment of chronic lymphocytic leukemia. This year, we addressed the general prediction problem defined above, and we constructed a new gene network model for which such a prediction is possible, together with an inference method for it. Next year, we expect to identify potential therapeutic targets for which silencing experiments could be conducted.

### 8.4.5 A statistical methodology to select covariates in high-dimensional data under dependence. Application to the classification of genetic profiles in oncology

We propose a new methodology for selecting and ranking covariates associated with a variable of interest in a context of high-dimensional data under dependence but few observations. The methodology successively intertwines the clustering of covariates, decorrelation of covariates using Factor Latent Analysis, selection using aggregation of adapted methods and finally ranking. A simulation study shows the interest of the decorrelation inside the different clusters of covariates. We first apply our method to transcriptomic data of 37 patients with advanced non-small-cell lung cancer who have received chemotherapy, to select the transcriptomic covariates that explain the survival outcome of the treatment. Secondly, we apply our method to 79 breast tumor samples to define patient profiles for a new metastatic biomarker and associated gene network in order to personalize the treatments. This work is published in 2 and is implemented in R package ‘ARMADA’.

In collaboration with T. Boukhobza and H. Dumond from CRAN, and B. Bastien from the biopharmaceutical company Transgene.

### 8.4.6 Projects linked with the COVID-19 pandemic

Seroprevalence study. Pierre Vallois is the scientific coordinator of the seroprevalence study COVAL Nancy, held in Nancy in July 2020 in collaboration with CHRU de Nancy (CIC épidémiologie clinique and Laboratoire de Virologie).

Background. The World Health Organisation recommends monitoring the circulation of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). We aimed to estimate anti–SARS-CoV-2 total immunoglobulin (IgT) antibody seroprevalence and describe symptom profiles and in vitro seroneutralization in Nancy, France, in spring 2020.

Methods. Individuals were randomly sampled from electoral lists and invited with household members over 5 years old to be tested for anti–SARS-CoV-2 (IgT, i.e. IgA/IgG/IgM) antibodies by ELISA (Bio-rad). Serum samples were classified according to seroneutralization activity 50 % (NT50) on Vero CCL-81 cells. Age- and sex-adjusted seroprevalence was estimated. Subgroups were compared by chi-square or Fisher exact test and logistic regression.

Results. Among 2006 individuals, 43 were SARS-CoV-2–positive; the raw seroprevalence was 2.1 % (95 % confidence interval 1.5 to 2.9), with adjusted metropolitan and national standardized seroprevalence of 2.5 % (1.8 to 3.3) and 2.3 % (1.7 to 3.1), respectively. Seroprevalence was highest for 20- to 34-year-old participants (4.7 % [2.3 to 8.4]), higher within than outside socially deprived areas (2.5 % vs 1 %, $p=0.02$) and higher with than without intra-family infection ($p<{10}^{-6}$). Moreover, 25 % (23 to 27) of participants presented at least one COVID-19 symptom associated with SARS-CoV-2 positivity ($p<{10}^{-13}$), with anosmia or ageusia highly discriminant (odds ratio 27.8 [13.9 to 54.5]), associated with dyspnea and fever. Among the SARS-CoV-2-positives, 16.3 % (6.8 to 30.7) were asymptomatic. For 31 of these individuals, positive seroneutralization was demonstrated in vitro.

Conclusions. In this population of very low anti-SARS-CoV-2 antibody seroprevalence, a beneficial effect of the lockdown can be assumed, with frequent SARS-CoV-2 seroneutralization among IgT-positive patients.

The results were first posted as a medRxiv preprint 27 and then published in a peer-reviewed international journal 11.

Predictions of SARS-CoV-2 positive patients in hospital. Participants: A. Gégout-Petit, U. Herbach, N. Thorr.
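The raw seroprevalence quoted in the Results above (43 positives among 2006 individuals) can be recomputed with a short sketch. The report does not state which confidence interval method was used, so the Wilson score interval below is an assumption chosen for illustration, and its bounds may differ slightly from the published ones. The age- and sex-adjusted estimates are not reproduced here, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half, center + half


positives, tested = 43, 2006  # counts taken from the Results paragraph
raw = positives / tested
low, high = wilson_interval(positives, tested)
print(f"raw seroprevalence: {100 * raw:.1f} %")
print(f"95 % Wilson interval: {100 * low:.1f} % to {100 * high:.1f} %")
```

With so few positives, intervals based on different methods (exact, Wilson, normal approximation) can disagree in the first decimal, which is why the method should always be reported alongside the interval.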

In collaboration with H. Berry, D. Gemmerlé, T. Lepoutre, D. Maucourt and D. Parsons.

We followed Inria's recommendations to get involved in the fight against COVID-19. We tried to collaborate with the LCPME laboratory in order to predict the number of SARS-CoV-2 positive patients from the Grand Nancy metropolitan area at the Nancy University Hospital from the concentration of SARS-CoV-2 residues in wastewater. We encountered difficulties with the Obépine network in obtaining raw data rather than mere indicators. We made predictions from the incidence rates available from Santé Publique France. The predictions are available on the siwam website. Inria hired Nicolas Thorr as an engineer for 6 months for this project.

# 9 Bilateral contracts and grants with industry

Participants: Bruno Scherrer.

## 9.1 Bilateral contracts with industry

B. Scherrer collaborates with Google Brain on reinforcement learning in the framework of the PhD thesis of Nino Vieillard.

# 10 Partnerships and cooperations

Participants: Nicolas Champagnat, Léo Darrigade, Coralie Fritsch, Anne Gégout-Petit, Ulysse Herbach, Joseph Lam-Weil, Jean-Marie Monnez, Aurélie Muller-Gueudin, Pierre Vallois, Denis Villemonais, Sophie Wantz-Mézières.

## 10.1 International initiatives

### 10.1.1 Participation in International Programs

#### BRN

• Title:
Biostochastic Research Network
• Partner Institution(s):
• Universidad de Valparaiso (Chile) - CIMFAV – Facultad de Ingenieria - Soledad Torres, Rolando Rebolledo.
• CNRS, Inria & Institut Élie Cartan de Lorraine (France) - N. Champagnat, A. Lejay (coordinator for France), D. Villemonais, R. Schott.
• Date/Duration:
2018–2022
• Goal:
Scientific exchange around probabilistic models in population ecology.

## 10.2 National initiatives

• FHU CARTAGE (Fédération Hospitalo Universitaire Cardial and ARTerial AGEing). Leader: Pr Athanase Benetos. Participants: Jean-Marie Monnez, Benoît Lalloué, Anne Gégout-Petit.
• RHU Fight HF (Fighting Heart Failure), located at the University Hospital of Nancy. Leader: Pr Patrick Rossignol. Participants: Jean-Marie Monnez, Benoît Lalloué.
• ITMO Physics, Mathematics applied to Cancer (2017-2022): “Modeling ctDNA dynamics for detecting targeted therapy resistance”. Funding bodies: ITMO Cancer, ITMO Technologies pour la santé de l’alliance nationale pour les sciences de la vie et de la santé (AVIESAN), INCa. Partners: Inria and IECL (Institut Élie Cartan de Lorraine), CHRU Strasbourg, CRAN (Centre de Recherche en Automatique de Nancy) and ICL (Institut de Cancérologie de Lorraine). Leader: N. Champagnat. Participants: L. Darrigade, C. Fritsch, A. Gégout-Petit, U. Herbach, A. Muller-Gueudin, P. Vallois.
• GDR 720 ISIS (funded by CNRS). Leader: Laure Blanc-Féraud. Participant: Sophie Mézières.

## 10.3 Regional initiatives

• CHRU de Nancy. We have good collaborations with several researchers from CHRU de Nancy. We are involved in the telomeres research axis of the LUE Impact GEENAGE project.
• Région Grand-Est. In the context of the Telomere project, Anne Gégout-Petit and Denis Villemonais obtained a grant from the Grand-Est region to hire Joseph Lam-Weil as a post-doctoral fellow. The University of Lorraine and the LUE GEENAGE program supplemented the grant.

# 11 Dissemination

Participants: Thierry Bastogne, Nicolas Champagnat, Léo Darrigade, Sandie Ferrigno, Coralie Fritsch, Anne Gégout-Petit, Vincent Hass, Ulysse Herbach, Joseph Lam-Weil, Rodolphe Loubaton, Jean-Marie Monnez, Aurélie Muller-Gueudin, Pierre Vallois, Denis Villemonais, Sophie Wantz-Mézières, Nicolás Zalduendo Vidal.

## 11.1 Promoting scientific activities

### 11.1.3 Journal

#### Member of the editorial boards

• N. Champagnat served as co-editor-in-chief with Béatrice Laurent-Bonneau (IMT Toulouse) of ESAIM: Probability & Statistics until June. Since then, he has served as an associate editor of this journal.
• N. Champagnat serves as an associate editor of Stochastic Models.

#### Reviewer - reviewing activities.

Here is a selection of the journals for which we regularly write referee reports: Bernoulli, Cell, Medicina, The Annals of Applied Probability, Stochastic Processes and their Applications, Journal de Mathématiques Pures et Appliquées, ALEA - Latin American Journal of Probability and Mathematical Statistics, ESAIM: Probability & Statistics, Journal of Theoretical Biology, Mathematical Biosciences, Journal of Physics A: Mathematical and Theoretical, Current Opinion in Systems Biology, Bioinformatics...

### 11.1.5 Leadership within the scientific community

• A. Gégout-Petit is vice-president of the European Network for Business and Industrial Statistics (ENBIS).

### 11.1.6 Scientific expertise

• C. Fritsch has been a member of the Committee for junior permanent research positions of Inria Nancy - Grand Est.
• A. Gégout-Petit has been a member of several hiring committees: as President for Université Technologique de Compiègne (MCF, 26th section); Sorbonne Université (PR, 26th section); national jury for 46.1 Professor recruitment; University of Luxembourg, Assistant professor in statistics.

### 11.1.7 Research administration

• N. Champagnat is a member of the coordination committee of MODCOV19, a platform coordinating research actions on the modeling of the SARS-CoV-2 (COVID-19) pandemic. He heads the bibliographic awareness group.
• N. Champagnat is a member of the Comité de Centre, the COMIPERS and the Commission Information Scientifique et Technique of Inria Nancy - Grand Est and Responsable Scientifique for the library of Mathematics of the IECL. He is also the local correspondent of the COERLE (Comité Opérationnel d'Évaluation des Risques Légaux et Éthiques) for the Inria Research Center of Nancy - Grand Est.
• C. Fritsch is a member of the Commission du Développement Technologique of Inria Nancy - Grand Est and of the Commission du personnel of IECL. She was a member of the Commission Parité-Égalité of IECL until August. She is the local Radar correspondent for the Inria Research Center of Nancy - Grand Est.
• A. Gégout-Petit is the head of “Institut Élie Cartan de Lorraine” (mathematics laboratory of Université de Lorraine).

## 11.2 Teaching - Supervision - Juries

### 11.2.1 Teaching

BIGS faculty members have teaching obligations at Université de Lorraine and teach at least 192 hours each year. They teach probability and statistics at different levels (Licence, Master, Engineering school). Many of them have pedagogical responsibilities.

• D. Villemonais is the head of the Mathematical Engineering Major of ENSMN, Université de Lorraine, France.
• T. Bastogne is in charge of the research master program “Santé Numérique et Imagerie Médicale” with the Faculty of Medicine, Université de Lorraine, France.
• Master: N. Champagnat, Introduction to Quantitative Finance, 12h, M1, second year of ENSMN, Université de Lorraine, France.
• Master: N. Champagnat, Introduction to Quantitative Finance, 9h, M2, third year of ENSMN, Université de Lorraine, France.
• Master: N. Champagnat, Problèmes inverses, 15h, M1, second year of ENSMN, Université de Lorraine, France.
• Master: S. Ferrigno, Experimental designs, 4.5h, M1, fourth year of EEIGM, Université de Lorraine, France.
• Master: S. Ferrigno, Data analyzing and mining, 63h, M1, second year of ENSMN, Université de Lorraine, France.
• Master: S. Ferrigno, Modeling and forecasting, 43h, M1, second year of ENSMN, Université de Lorraine, France.
• Master: S. Ferrigno, Training projects, 18h, M1/M2, second and third year of ENSMN, Université de Lorraine, France.
• Master: A. Muller-Gueudin, Probability and Statistics, 160h, second year of ENSEM and ENSAIA, Université de Lorraine, France.
• Master: A. Muller-Gueudin, Scientific calculation with Matlab, 20h, second year of ENSAIA, Université de Lorraine, France.
• Master: A. Gégout-Petit, Statistics, modeling, data analysis, 80h, master in applied mathematics, Université de Lorraine, France.
• Master: U. Herbach, Data analyzing and mining tutorial, 18h, M1, second year of ENSMN, Université de Lorraine, France.
• Master: U. Herbach, Introduction to probability theory, 18h, M1, second year of ENSEM (apprenticeship cursus), Université de Lorraine, France.
• Master: R. Loubaton, Analyse de données, 16h, M1, second year of ENSMN, Université de Lorraine, France.
• Master: R. Loubaton, Introduction à l'apprentissage automatique, 6h, M1, second year of ENSMN, Université de Lorraine, France.
• Master: S. Wantz-Mézières, Learning and analysis of medical data, 36h, with J.M. Moureaux, Master SNIM, Université de Lorraine, France.
• Master: D. Villemonais, Probability Theory II, 63h, M1, second year of ENSMN, Université de Lorraine, France.
• Master: D. Villemonais, Stochastic processes, 32h, Master 2 MFA, Université de Lorraine, France.
• Master: D. Villemonais, Modeling and forecasting, 14h, M1, second year of ENSMN, Université de Lorraine, France.
• Licence: S. Wantz-Mézières, Applied mathematics for management, financial mathematics, Probability and Statistics, 160h, IUT Nancy-Charlemagne (L1/L2/L3), Université de Lorraine, France.
• Licence: S. Wantz-Mézières, Probability, 100h, first year in TELECOM Nancy (initial and apprenticeship cursus), Université de Lorraine, France.
• Licence: A. Muller-Gueudin, Statistics, 60h, first year of ENSAIA, Université de Lorraine, France.
• Licence: S. Ferrigno, Descriptive and inferential statistics, 60h, L2, second year of EEIGM, Université de Lorraine, France.
• Licence: S. Ferrigno, Statistical modeling, 60h, L2, second year of EEIGM, Université de Lorraine, France.
• Licence: S. Ferrigno, Mathematical and computational tools, 20h, L3, third year of EEIGM, Université de Lorraine, France.
• Licence: S. Ferrigno, Training projects, 40h, L1/L3, first, second and third year of EEIGM, Université de Lorraine, France.
• Licence: C. Fritsch, Probability Theory tutorial, 40h, L3, first year of ENSMN, Université de Lorraine, France.
• Licence: V. Hass, Complément d'analyse, 38h, L1, FST, Université de Lorraine, France.
• Licence: V. Hass, Analyse numérique et optimisation, 46h, L3, first year of ENSMN, Université de Lorraine, France.
• Licence: V. Hass, Probabilités, 40h, L3, first year of ENSMN, Université de Lorraine, France.
• Licence: V. Hass, Mathématiques FIGIM 1A, 35h, L1/L2, first year of ENSMN, Université de Lorraine, France.
• Licence: V. Hass, Mathématiques FIGIM 2A, 21h, L2, first year of ENSMN, Université de Lorraine, France.
• Licence: U. Herbach, Statistics tutorial, 39h, L3, first year of ENSMN, Université de Lorraine, France.
• Licence: R. Loubaton, Inférence statistique, 21h, L3, first year of ENSMN, Université de Lorraine, France.
• Licence: R. Loubaton, Probabilités, 20h, L2, FST, Université de Lorraine, France.
• Licence: R. Loubaton, Méthodes Numériques, 10h, L2, FST, Université de Lorraine, France.
• Licence: R. Loubaton, Latex, 9h, L2, FST, Université de Lorraine, France.
• Licence: R. Loubaton, Remédiation mathématique, 30h, L3, first year of ENSMN, Université de Lorraine, France.
• Licence: R. Loubaton, Analyse numérique et optimisation, 40h, L3, first year of ENSMN, Université de Lorraine, France.
• Licence: D. Villemonais, Probability Theory, 57h, L3, first year of ENSMN, Université de Lorraine, France.
• Licence: N. Zalduendo Vidal, Probability Theory tutorial, 40h, L3, first year of ENSMN, Université de Lorraine, France.
• Licence: N. Zalduendo Vidal, Numerical Analysis tutorial, 20h, L3, first year of ENSMN, Université de Lorraine, France.

### 11.2.2 Supervision

#### PhD

• PhD: Nassim Sahki, “Data-driven methodology for sequential change-point detection for physiological signals”, grant Inria-Cordis. Defence 29 Nov 2021. Advisors: A. Gégout-Petit, S. Wantz-Mézières 23.
• PhD in progress: Vincent Hass, “Individual-based models in adaptive dynamics and long time evolution under assumptions of rare advantageous mutations”, grant Inria-Cordis. Advisor: N. Champagnat.
• PhD in progress: Rodolphe Loubaton, “Caractérisation des cibles thérapeutiques dans un programme génique tumoral”, grant Région Grand-Est. Advisors: N. Champagnat and L. Vallat (CHRU Strasbourg).
• PhD in progress: Anouk Rago, “Inférence de réseaux de gènes dynamiques et prédiction d’expériences d’interventions biologiques dans des cellules cancéreuses”, grant Région Grand-Est, Inria. Advisors: N. Champagnat, A. Gégout-Petit.
• PhD in progress: Nino Vieillard, "Approximate Dynamic Programming and Deep Reinforcement Learning", CIFRE with Google Brain. Advisors: B. Scherrer, M. Geist (Google Brain).
• PhD in progress: Nicolás Zalduendo Vidal, “Processus de branchement bi-sexués multi-types”, grant Inria-Cordis. Advisors: C. Fritsch, D. Villemonais.

#### Other

• Parcours Recherche: Asmaa Labtaina, “Processus de Markov déterministes par morceaux et leur application à l’expression stochastique des gènes” (full-year research project, M1 ENSMN). Advisor: U. Herbach.
• TER: Mohammed Khatbane and Abdelkabir Bouyghf, “Méthodes variationnelles en apprentissage statistique : l’exemple du modèle Latent Dirichlet Allocation” (research project, M1 Univ. Lorraine). Advisor: U. Herbach.

### 11.2.3 Juries

• PhD: N. Champagnat, reviewer, thesis of Maxime Berger, “Le comportement critique de la quasi-espèce”, Université PSL.
• PhD: N. Champagnat, reviewer, thesis of Felipe Munoz-Hernandez, “Approximation quantitative en grande population de modèles stochastiques avec interaction ou environnement variable”, Institut Polytechnique de Paris.
• PhD: N. Champagnat, reviewer, thesis of Julie Tourniaire, “Spatial dynamics of interfaces in ecology: deterministic and stochastic models”, Institut Polytechnique de Paris.
• PhD: A. Gégout-Petit, president, thesis of A. Conanec Rago, Université de Bordeaux.
• PhD: A. Gégout-Petit, president, thesis of S. Yacheur, Université de Lorraine.
• PhD Prize: A. Gégout-Petit, jury member, 2021 AMIES PhD Prize.
• PhD: B. Scherrer, reviewer, thesis of Yannis Flet-Berliac, “Sample-Efficient Deep Reinforcement Learning for Control, Exploration and Safety”, Université de Lille.
• PhD: B. Scherrer, reviewer, thesis of Marc Etheve, “Using machine learning to solve repeated optimization problems”, CNAM Paris (CIFRE with EDF).

## 11.3 Popularization

### 11.3.1 Education

• S. Ferrigno: Advisor of a group of students (EEIGM), "Traitement statistique de données" project, various high schools, Nancy, 2021
• S. Ferrigno: Advisor of a group of students (EEIGM), "La main à la Pâte" project, Institut médico-éducatif (IME), Commercy, October-November 2021
• S. Mézières: organisation of a research training week on Neurooncology and Numerics, for medical and engineering students, January 2021

### 11.3.2 Interventions

• C. Fritsch made two interventions in the Lycée Cormontaigne in Metz, as part of the “Chiche!” program, in December 2021.

# 12 Scientific production

## 12.1 Publications of the year

### International journals

• 1 cvmgof: an R package for Cramér-von Mises goodness-of-fit tests in regression models. Journal of Statistical Computation and Simulation, October 2021.
• 2 Bérangère Bastien, Taha Boukhobza, Hélène Dumond, Anne Gégout-Petit, Aurélie Muller-Gueudin and Charlène Thiébaut. A statistical methodology to select covariates in high-dimensional data under dependence. Application to the classification of genetic profiles in oncology. Journal of Applied Statistics, 2021, 23.
• 3 Thierry Bastogne. iQbD: a TRL-indexed quality-by-design paradigm for medical device engineering. Journal of Medical Devices, 2021.
• 4 Michel Benaïm, Nicolas Champagnat and Denis Villemonais. Stochastic approximation of quasi-stationary distributions for diffusion processes in a bounded domain. Annales de l'Institut Henri Poincaré (B) Probabilités et Statistiques, 57(2), 2021, 726-739.
• 5 Nicolas Champagnat, Sylvie Méléard and Viet Chi Tran. Stochastic analysis of emergence of evolutionary cyclic behavior in population dynamics with transfer. Annals of Applied Probability, 31(4), 2021, 1820-1867.
• 6 Nicolas Champagnat, René Schott and Denis Villemonais. Analysis of distributed systems via quasi-stationary distributions. Stochastic Analysis and Applications, 36(6), 2021, 981-998.
• 7 Convergence of the Fleming-Viot process toward the minimal quasi-stationary distribution. ALEA: Latin American Journal of Probability and Mathematical Statistics, 18, 2021, 1-15.
• 8 Lyapunov criteria for uniform convergence of conditional distributions of absorbed Markov processes. Stochastic Processes and their Applications, 135, May 2021, 51-74.
• 9 Alexander M. G. Cox, Emma L. Horton, Andreas E. Kyprianou and Denis Villemonais. Stochastic Methods for Neutron Transport Equation III: Generational many-to-one and ${k}_{\mathrm{eff}}$. SIAM Journal on Applied Mathematics, 81(3), May 2021.
• 10 Coralie Fritsch, Sylvain Billiard and Nicolas Champagnat. Identifying conversion efficiency as a key mechanism underlying food webs evolution: a step forward, or backward? Oikos, 130(6), 2021, 904-930.
• 11 Anne Gégout-Petit, Hélène Jeulin, Karine Legrand, Nicolas Jay, Agathe Bochnakian, Pierre Vallois, Evelyne Schvoerer and Francis Guillemin. Seroprevalence of SARS-CoV-2, Symptom Profiles and Sero-Neutralization in a Suburban Area, France. Viruses, 13(6), June 2021, 1076.
• 12 Jean-Philippe Jehl, Pan Dan, Arnaud Voignier, Nguyen Tran, Thierry Bastogne, Pablo Maureira and Franck Cleymand. Transverse isotropic modelling of left-ventricle passive filling: mechanical characterization for epicardial biomaterial manufacturing. Journal of the mechanical behavior of biomedical materials, 119, July 2021, 104492.
• 13 Benoît Lalloué, Jean-Marie Monnez and Eliane Albuisson. Streaming constrained binary logistic regression with online standardized data. Journal of Applied Statistics, 2021.
• 14 Benoît Lalloué, Jean-Marie Monnez, Donata Lucci and Eliane Albuisson. Construction of Parsimonious Event Risk Scores by an Ensemble Method. An Illustration for Short-Term Predictions in Chronic Heart Failure Patients from the GISSI-HF Trial. Applied Mathematics, 12(7), July 2021, 627-653.
• 15 Jean-Marie Monnez and Abderrahman Skiredj. Widening the scope of an eigenvector stochastic approximation process and application to streaming PCA and related methods. Journal of Multivariate Analysis, 182, March 2021, 104694.
• 16 Mathias Waelli, Etienne Minvielle, Maria Ximena Acero, Khouloud Ba and Benoit Lalloue. What matters to patients? A mixed method study of the importance and consideration of oncology patient demands. BMC Health Services Research, 21, 2021, 256.

### International peer-reviewed conferences

• 17 Robert Dadashi, Shideh Rezaeifar, Nino Vieillard, Léonard Hussenot, Olivier Pietquin and Matthieu Geist. Offline Reinforcement Learning with Pseudometric Learning. 38th International Conference on Machine Learning, 139, virtual, France, June 2021.
• 18 Shideh Rezaeifar, Robert Dadashi, Nino Vieillard, Léonard Hussenot, Olivier Bachem, Olivier Pietquin and Matthieu Geist. Offline Reinforcement Learning as Anti-Exploration. 36th AAAI Conference on Artificial Intelligence, Vancouver, Canada, February 2022.
• 19 Nassim Sahki, Anne Gégout-Petit and Sophie Wantz-Mézières. Detection of breaks in EMG signals of upper trapezius muscle activity. JDS 2021 - 52èmes Journées de Statistique de la SFdS, Nice / Virtual, France, June 2021.

### Conferences without proceedings

• 20 Thierry Bastogne, Sanne Bevers, Sander Kooijmans, Lucie Hassler, Samir El Andaloussi, Raymond Schiffelers and Stefaan De Koker. easyQBD: A quality by design SaaS platform. Application to the development of lipid nanoparticles for mRNA delivery. 6th Bioproduction Congress, Lyon, France, September 2021.
• 21 Sanne Bevers, Sander Kooijmans, Elien van de Velde, Martijn Evers, Sofie Seghers, Jerney Gitz-François, Lucie Hassler, Karine Breckpot, Thierry Bastogne, Raymond Schiffelers and Stefaan De Koker. Tuning LNPs to target antigen presenting cells in spleen induces CD8 T-cell responses and tumor regression in mice. 18th CIMT Annual Meeting, Mainz, Germany, May 2021.
• 22 Sander Kooijmans, Sanne Bevers, Elien van de Velde, Martijn J W Evers, Sofie Seghers, Jerney J J M Gitz-François, Lucie Hassler, Karine Breckpot, Thierry Bastogne, Raymond M Schiffelers and Stefaan De Koker. Rationally designed mRNA-loaded lipid nanoparticles provoke strong antitumor T cell immunity which critically depends on specific immune cell subsets. Annual Meeting of the Controlled Release Society, CRS 2021, Virtual, United States, July 2021.

### Doctoral dissertations and habilitation theses

• 23 Data-driven methodology for sequential change-point detection for physiological signals. Université de Lorraine; École doctorale IAEM Lorraine - Informatique, Automatique, Électronique - Électrotechnique, Mathématiques de Lorraine, November 2021.

### Reports & preprints

• 24 Michel Benaïm, Nicolas Champagnat, William Oçafrain and Denis Villemonais. Degenerate processes killed at the boundary of a domain. 2021.
• 25 Michel Benaïm, Nicolas Champagnat, William Oçafrain and Denis Villemonais. Transcritical bifurcation for the conditional distribution of a diffusion process. December 2021.
• 26 Nan Papili Gao, Olivier Gandrillon, András Páldi, Ulysse Herbach and Rudiyanto Gunawan. Universality of cell differentiation trajectories revealed by a reconstruction of transcriptional uncertainty landscapes from single-cell transcriptomic data. February 2021.
• 27 Anne Gégout-Petit, Hélène Jeulin, Karine Legrand, Agathe Bochnakian, Pierre Vallois, Evelyne Schvoerer and Francis Guillemin. Seroprevalence of SARS-CoV-2, symptom profiles and seroneutralization during the first COVID-19 wave in a suburban area, France. June 2021.
• 28 Gene regulatory network inference from single-cell data using a self-consistent proteomic field. October 2021.
• 29 Svante Janson, Cécile Mailler and Denis Villemonais. Fluctuations of balanced urns with infinitely many colours. November 2021.
• 30 Benoît Lalloué, Jean-Marie Monnez and Eliane Albuisson. Construction and update of an online ensemble score involving linear discriminant analysis and logistic regression. February 2021.

### Other scientific publications

• 31 Sandie Ferrigno. Semiparametric reference curves for EDEN cohort. CMStatistics 2021, London, United Kingdom, December 2021.

## 12.2 Other

### Educational activities

• 32 unpublished Jean-Marie Monnez. Cours d'analyse des données et apprentissage : L'analyse en composantes principales. April 2021, Master, France
• 33 unpublished Jean-Marie Monnez. Méthodes de classification non supervisée. April 2021, Master, France

### Software

• 34 software Harissa: tools for mechanistic gene network inference from single-cell data. October 2021, BSD 3-Clause "New" or "Revised" License

## 12.3 Cited publications

• 35 article Romain Azaïs, François Dufour and Anne Gégout-Petit. Non-Parametric Estimation of the Conditional Distribution of the Interjumping Times for Piecewise-Deterministic Markov Processes. Scandinavian Journal of Statistics 41(4), December 2014, 950--969
• 36 software Romain Azaïs, Sandie Ferrigno and Marie-José Martinez. cvmgof: Cramer-von Mises goodness-of-fit tests. 1.0.0, November 2018, CeCILL
• 37 article Romain Azaïs and Aurélie Muller-Gueudin. Optimal choice among a class of nonparametric estimators of the jump rate for piecewise-deterministic Markov processes. Electronic Journal of Statistics, 2016
• 38 article Romain Azaïs. A recursive nonparametric estimator for the transition kernel of a piecewise-deterministic Markov process. ESAIM: Probability and Statistics 18, 2014, 726--749
• 39 inproceedings Romain Azaïs, François Dufour and Anne Gégout-Petit. Nonparametric estimation of the jump rate for non-homogeneous marked renewal processes. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques 49(4), Institut Henri Poincaré, 2013, 1204--1231
• 40 incollection J. M. Bardet, G. Lang, G. Oppenheim, A. Philippe, S. Stoev and M.S. Taqqu. Semi-parametric estimation of the long-range dependence parameter: a survey. Theory and applications of long-range dependence, Birkhäuser Boston, 2003, 557--577
• 41 article Thierry Bastogne, Sophie Mézières-Wantz, Nacim Ramdani, Pierre Vallois and Muriel Barberi-Heyob. Identification of pharmacokinetics models in the presence of timing noise. Eur. J. Control 14(2), 2008, 149--157
• 42 article Thierry Bastogne, Adeline Samson, Pierre Vallois, S Wantz-Mézières, Sophie Pinel, Denise Bechet and Muriel Barberi-Heyob. Phenomenological modeling of tumor diameter growth based on a mixed effects model. Journal of Theoretical Biology 262(3), 2010, 544--552
• 43 book D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996
• 44 article Hermine Biermé, Céline Lacaux and Hans-Peter Scheffler. Multi-operator Scaling Random Fields. Stochastic Processes and their Applications 121(11), MAP5 2011-01, 2011, 2642--2677
• 45 article Hervé Cardot, Peggy Cénac and Jean-Marie Monnez. A fast and recursive algorithm for clustering large datasets with k-medians. Computational Statistics & Data Analysis 56(6), 2012, 1434--1449
• 46 article Patrick Cattiaux and Sylvie Méléard. Competitive or weak cooperative stochastic Lotka--Volterra systems conditioned on non-extinction. Journal of Mathematical Biology 60(6), 2010, 797--829
• 47 article Nicolas Champagnat and Denis Villemonais. Exponential convergence to quasi-stationary distribution and Q-process. Probability Theory and Related Fields 164(1), 46 pages, 2016, 243--283
• 48 article J. F. Coeurjolly. Simulation and identification of the fractional Brownian motion: a bibliographical and comparative study. Journal of Statistical Software 5, 2000, 1--53
• 49 article Mark HA Davis. Piecewise-deterministic Markov processes: A general class of non-diffusion stochastic models. Journal of the Royal Statistical Society. Series B (Methodological), 1984, 353--388
• 50 article Aurélien Deya and Samy Tindel. Rough Volterra equations. I. The algebraic integration setting. Stoch. Dyn. 9(3), 2009, 437--477
• 51 article Marie Doumic, Marc Hoffmann, Nathalie Krell and Lydia Robert. Statistical estimation of a growth-fragmentation model observed on a genealogical tree. Bernoulli 21(3), 2015, 1760--1799
• 52 article Sandie Ferrigno and Gilles Ducharme. Un test d'adéquation global pour la fonction de répartition conditionnelle. C. R. Math. Acad. Sci. Paris 341(5), 2005, 313--316
• 53 article Sandie Ferrigno, Myriam Maumy-Bertrand and Aurélie Muller-Gueudin. Uniform law of the logarithm for the local linear estimator of the conditional distribution function. C. R. Math. Acad. Sci. Paris 348(17-18), 2010, 1015--1019
• 54 article Jerome Friedman, Trevor Hastie and Robert Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3), 2008, 432--441
• 55 article Christophe Giraud, Sylvie Huet and Nicolas Verzelen. Graph selection with GGMselect. Statistical Applications in Genetics and Molecular Biology 11(3), 2012
• 56 inproceedings T.D. Hansen and U. Zwick. Lower Bounds for Howard's Algorithm for Finding Minimum Mean-Cost Cycles. ISAAC (1), 2010, 415--426
• 57 article Samuel Herrmann and Pierre Vallois. From persistent random walk to the telegraph noise. Stoch. Dyn. 10(2), 2010, 161--196
• 58 incollection Jianghai Hu, Wei-Chung Wu and Shankar Sastry. Modeling subtilin production in Bacillus subtilis using stochastic hybrid systems. Hybrid Systems: Computation and Control, Springer, 2004, 417--431
• 59 article Roukaya Keinj, Thierry Bastogne and Pierre Vallois. Multinomial model-based formulations of TCP and NTCP for radiotherapy treatment planning. Journal of Theoretical Biology 279(1), June 2011, 55--62
• 60 book Roger Koenker. Quantile regression. 38, Cambridge University Press, 2005
• 61 book Yury A. Kutoyants. Statistical inference for ergodic diffusion processes. Springer Series in Statistics, London, Springer-Verlag London Ltd., 2004, xiv+481
• 62 article Céline Lacaux. Real Harmonizable Multifractional Lévy Motions. Ann. Inst. Poincaré. 40(3), 2004, 259--277
• 63 inproceedings Benoît Lalloué, Jean-Marie Monnez and Eliane Albuisson. Convergence d'un score d'ensemble en ligne : étude empirique. 52e Journées de Statistique, Société Française de Statistique, Nice, France, July 2020
• 64 incollection Ludovic Lebart. On the Benzecri's method for computing eigenvectors by stochastic approximation (the case of binary data). Compstat 1974 (Proc. Sympos. Computational Statist., Univ. Vienna, Vienna, 1974), Vienna, Physica Verlag, 1974, 202--211
• 65 inproceedings Boris Lesner and Bruno Scherrer. Non-Stationary Approximate Modified Policy Iteration. ICML 2015, Lille, France, July 2015
• 66 book T. Lyons and Z. Qian. System control and rough paths. Oxford Mathematical Monographs, Clarendon Press, 2002
• 67 article Nicolai Meinshausen and Peter Bühlmann. High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 2006, 1436--1462
• 68 article Jean-Marie Monnez. Approximation stochastique en analyse factorielle multiple. Ann. I.S.U.P. 50(3), 2006, 27--45
• 69 article Jean-Marie Monnez. Convergence d'un processus d'approximation stochastique en analyse factorielle. Publ. Inst. Statist. Univ. Paris 38(1), 1994, 37--55
• 70 article Jean-Marie Monnez. Stochastic approximation of the factors of a generalized canonical correlation analysis. Statist. Probab. Lett. 78(14), 2008, 2210--2216
• 71 article EA Nadaraya. On non-parametric estimates of density functions and regression curves. Theory of Probability & Its Applications 10(1), 1965, 186--190
• 72 techreport I. Post and Y. Ye. The simplex method is strongly polynomial for deterministic Markov decision processes. arXiv:1208.5083v2, 2012
• 73 book M. Puterman. Markov Decision Processes. Wiley, New York, 1994
• 74 inproceedings Bernard Roynette, Pierre Vallois and Marc Yor. Brownian penalisations related to excursion lengths, VII. Annales de l'IHP Probabilités et statistiques 45(2), 2009, 421--452
• 75 incollection Francesco Russo and Pierre Vallois. Elements of stochastic calculus via regularization. Séminaire de Probabilités XL, Lecture Notes in Math. 1899, Berlin, Springer, 2007, 147--185
• 76 article Francesco Russo and Pierre Vallois. Stochastic calculus with respect to continuous finite quadratic variation processes. Stochastics: An International Journal of Probability and Stochastic Processes 70(1-2), 2000, 1--40
• 77 inproceedings Bruno Scherrer. Approximate Policy Iteration Schemes: A Comparison. ICML - 31st International Conference on Machine Learning - 2014, Beijing, China, June 2014
• 78 article Bruno Scherrer, Mohammad Ghavamzadeh, Victor Gabillon, Boris Lesner and Matthieu Geist. Approximate Modified Policy Iteration and its Application to the Game of Tetris. Journal of Machine Learning Research 16, to appear, 2015, 1629--1676
• 79 article Bruno Scherrer. Improved and Generalized Upper Bounds on the Complexity of Policy Iteration. Mathematics of Operations Research, Markov decision processes; Dynamic Programming; Analysis of Algorithms, February 2016
• 80 inproceedings Bruno Scherrer and Boris Lesner. On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes. NIPS 2012 - Neural Information Processing Systems, South Lake Tahoe, United States, December 2012
• 81 article Bruno Scherrer. Performance Bounds for Lambda Policy Iteration and Application to the Game of Tetris. Journal of Machine Learning Research 14, January 2013, 1175--1221
• 82 article Pierre Vallois and Charles S. Tapiero. Memory-based persistence in a counting random walk process. Phys. A. 386(1), 2007, 303--307
• 83 article Pierre Vallois. The range of a simple random walk on Z. Advances in Applied Probability, 1996, 1014--1033
• 84 misc Nathalie Villa-Vialaneix. An introduction to network inference and mining. (accessed 22/07/2015), 2015
• 85 article Y. Ye. The Simplex and Policy-Iteration Methods Are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate. Math. Oper. Res. 36(4), 2011, 593--603