Keywords
 A3.1. Data
 A3.2. Knowledge
 A3.2.3. Inference
 A3.3. Data and knowledge analysis
 A3.3.1. Online analytical processing
 A3.3.2. Data mining
 A3.3.3. Big data analysis
 A3.4.1. Supervised learning
 A3.4.2. Unsupervised learning
 A3.4.4. Optimization and learning
 A3.4.7. Kernel methods
 A6. Modeling, simulation and control
 A6.1. Methods in mathematical modeling
 A6.1.2. Stochastic Modeling
 A6.2. Scientific computing, Numerical Analysis & Optimization
 A6.2.3. Probabilistic methods
 A6.2.4. Statistical methods
 A6.4. Automatic control
 A6.4.2. Stochastic control
 B1. Life sciences
 B1.1. Biology
 B1.1.2. Molecular and cellular biology
 B1.1.10. Systems and synthetic biology
 B1.1.11. Plant Biology
 B2.2. Physiology and diseases
 B2.2.1. Cardiovascular and respiratory diseases
 B2.2.3. Cancer
 B2.3. Epidemiology
 B2.4. Therapies
1 Team members, visitors, external collaborators
Research Scientists
 Nicolas Champagnat [Team leader, Inria, Senior Researcher, HDR]
 Coralie Fritsch [Inria, Researcher]
 Ulysse Herbach [Inria, Researcher]
 Bruno Scherrer [Inria, Researcher, HDR]
Faculty Members
 Thierry Bastogne [Univ de Lorraine, Associate Professor, HDR]
 Sandie Ferrigno [Univ de Lorraine, Associate Professor]
 Anne Gégout-Petit [Univ de Lorraine, Professor, HDR]
 Jean-Marie Monnez [Univ de Lorraine, Emeritus, HDR]
 Aurélie Muller-Gueudin [Univ de Lorraine, Associate Professor]
 Sophie Mézières [Univ de Lorraine, Associate Professor]
 Pierre Vallois [Univ de Lorraine, Emeritus, HDR]
 Denis Villemonais [Univ de Lorraine, Associate Professor, HDR]
Post-Doctoral Fellows
 Léo Darrigade [Inria, from Apr 2021]
 Joseph Lam-Weil [Univ de Lorraine, from Jun 2021]
 William Oçafrain [Inria]
PhD Students
 Vincent Hass [Univ de Lorraine, Inria until Aug 2021, ATER from Sep 2021]
 Rodolphe Loubaton [Univ de Lorraine, ATER]
 Anouk Rago [Univ de Lorraine, from Oct 2021]
 Nassim Sahki [Univ de Lorraine, Inria until Feb 2021, ATER from Mar 2021 until Aug 2021]
 Nino Vieillard [Google, CIFRE]
 Nicolás Zalduendo Vidal [Inria]
Technical Staff
 Joseph Lam-Weil [Univ de Lorraine, Engineer, from Apr 2021 until Jun 2021]
 Nicolas Thorr [Inria, Engineer, until Jun 2021]
Administrative Assistant
 Emmanuelle Deschamps [Inria]
2 Overall objectives
BIGS is a joint team of Inria, CNRS and Université de Lorraine, via the Institut Élie Cartan (UMR 7502 CNRS-UL), a mathematics laboratory of which Inria is a strong partner. One member of BIGS, T. Bastogne, comes from the Research Center of Automatic Control of Nancy (CRAN), with which BIGS has strong relations in the domain "Health-Biology-Signal". Our research focuses mainly on stochastic modeling and statistics, with the aim of better understanding biological systems. BIGS involves applied mathematicians whose research interests mainly concern probability and statistics. More precisely, our attention is directed towards (1) stochastic modeling, (2) estimation and control for stochastic processes, (3) algorithms and estimation for graph data and (4) regression and machine learning. The main objective of BIGS is to exploit these skills in applied mathematics to provide a better understanding of issues arising in life sciences, with a special focus on (1) tumor growth, (2) photodynamic therapy, (3) population studies of genomic data and genomics of microorganisms, (4) epidemiology and e-health.
3 Research program
3.1 Introduction
We give here the main lines of our research, which belongs to the domains of probability and statistics. For clarity, we have chosen to structure it in four items. Although this choice was not arbitrary, the boundaries between these items are sometimes fuzzy because each of them deals with modeling and inference and they are all interconnected.
3.2 Stochastic modeling
Our aim is to propose relevant stochastic frameworks for the modeling and the understanding of biological systems. Stochastic processes are particularly suitable for this purpose. Among them, Markov chains give a first framework for the modeling of populations of cells 83, 59. Piecewise deterministic processes are non-diffusion processes also frequently used in the biological context 49, 58, 51. Among Markov models, we have developed strong expertise in processes derived from Brownian motion and stochastic differential equations 76, 57. For instance, knowledge about Brownian or random walk excursions 82, 74 helps to analyse genetic sequences and to develop inference about them. However, nature provides us with many examples of systems in which the observed signal has a given Hölder regularity that does not correspond to the one we might expect from a system driven by ordinary Brownian motion.
This situation is commonly handled by noisy equations driven by Gaussian processes such as fractional Brownian motion or fractional fields. The basic aspects of these differential equations are now well understood, mainly thanks to the so-called rough paths tools 66, but also through the Russo-Vallois integration techniques 75. The specific issue of Volterra equations driven by fractional Brownian motion, which is central to the problem of subdiffusion within proteins, is addressed in 50. Many generalizations (Gaussian or not) of this model have recently been proposed, for some Gaussian locally self-similar fields, for some non-Gaussian models 62, or for anisotropic models 44.
3.3 Estimation and control for stochastic processes
We develop inference methods for the stochastic processes that we use for modeling. Control of stochastic processes is also a way to optimise the administration (dose, frequency) of a therapy.
There are many estimation techniques for diffusion processes or for the coefficients of fractional or multifractional Brownian motion from a set of observations 61, 40, 48. However, the inference problem for diffusions driven by a fractional Brownian motion is still in its infancy. Our team has good expertise in the inference of the jump rate and the kernel of piecewise-deterministic Markov processes (PDMP) 39, 35, 38, 37, but many directions remain to be explored. For instance, previous work assumed a complete observation of jumps and mode, which is unrealistic in practice. We tackle the problem of inference for “hidden PDMP”. For example, in pharmacokinetic modeling, we want to account for the presence of timing noise and to perform identification from longitudinal data. We have expertise on these subjects 41, and we have also used mixed models to estimate tumor growth 42.
We consider the control of stochastic processes within the framework of Markov Decision Processes 73 and their generalization known as multiplayer stochastic games, with a particular focus on infinite-horizon problems. In this context, we are interested in the complexity analysis of standard algorithms, as well as in the design and analysis of approximate numerical schemes for large problems in the spirit of 43. Regarding complexity, a central topic of research is the analysis of the Policy Iteration algorithm, on which significant progress has been made in recent years 85, 72, 56, 79, but which is still not fully understood. For large problems, we have long experience in the sensitivity analysis of approximate dynamic programming algorithms for Markov Decision Processes 81, 80, 77, 65, 78, and we are currently investigating whether and how similar ideas may be adapted to multiplayer stochastic games.
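As an illustration only, the Policy Iteration scheme mentioned above can be sketched for a small finite, discounted Markov Decision Process. The two-state example (`P_TOY`, `R_TOY`), the function name and the iterative evaluation step are hypothetical choices made for this sketch; they are not taken from the cited works.

```python
def policy_iteration(P, R, gamma=0.9, eval_sweeps=500):
    """Policy Iteration for a finite discounted MDP (illustrative sketch).

    P[s][a] is a list of (probability, next_state) pairs and
    R[s][a] is the expected immediate reward of action a in state s.
    """
    n_states = len(P)
    policy = [0] * n_states
    while True:
        # Policy evaluation: iterate the Bellman operator of the current policy.
        V = [0.0] * n_states
        for _ in range(eval_sweeps):
            V = [R[s][policy[s]]
                 + gamma * sum(p * V[t] for p, t in P[s][policy[s]])
                 for s in range(n_states)]
        # Policy improvement: act greedily with respect to V.
        new_policy = []
        for s in range(n_states):
            q = [R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])
                 for a in range(len(P[s]))]
            best = max(q)
            # Keep the current action on (near-)ties to avoid cycling.
            keep = q[policy[s]] >= best - 1e-9
            new_policy.append(policy[s] if keep else q.index(best))
        if new_policy == policy:
            return policy, V
        policy = new_policy


# Hypothetical toy MDP: 2 states, 2 actions, deterministic transitions.
P_TOY = [[[(1.0, 0)], [(1.0, 1)]],   # state 0: stay / move to state 1
         [[(1.0, 1)], [(1.0, 0)]]]   # state 1: stay / move to state 0
R_TOY = [[0.0, 1.0],
         [2.0, 0.0]]

policy, values = policy_iteration(P_TOY, R_TOY, gamma=0.9)
```

On this toy problem the greedy policy stabilizes after two improvement steps. The complexity questions discussed above concern how the number of such steps grows with the size of the problem.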
3.4 Algorithms and estimation for graph data
A graph data structure consists of a set of nodes, together with a set of pairs of these nodes called edges. This type of data is frequently used in biology because it provides a mathematical representation of many concepts such as biological structures and networks of relationships in a population. Some attention has recently been focused in the group on modeling and inference for graph data.
Network inference is the process of making inference about the link between two variables, taking into account the information about other variables. Reference 84 gives a very good introduction and many references about network inference and mining. Many methods are available to infer and test edges in Gaussian graphical models 84, 67, 54, 55. However, the Gaussian assumption does not hold when dealing with typical “zero-inflated” abundance data, and we want to develop inference in this case.
Among graphs, trees play a special role because they offer a good model for many biological concepts, from RNA to phylogenetic trees through plant structures. Our research deals with several aspects of tree data. In particular, we work on statistical inference for this type of data under a given stochastic model. We also work on lossy compression of trees via directed acyclic graphs. These methods enable us to compute distances between tree data faster than from the original structures and with a high accuracy.
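To make the idea of representing a tree by a directed acyclic graph concrete, here is a minimal sketch of the exact (lossless) DAG reduction of an unordered tree, in which identical subtrees are stored only once. It is not the lossy variant mentioned above. The nested-tuple encoding of trees and the function name `dag_compress` are assumptions made for this example.

```python
def dag_compress(tree):
    """Exact DAG reduction of an unordered tree (illustrative sketch).

    A tree is encoded as nested tuples: each node is the tuple of its
    children, and a leaf is the empty tuple ().
    Returns (root_id, nodes), where nodes[i] is the sorted tuple of the
    DAG ids of the children of DAG node i.
    """
    ids = {}     # subtree signature -> DAG node id
    nodes = []   # DAG node id -> subtree signature

    def visit(node):
        # Two subtrees are identified when the multisets of their
        # children ids coincide (children order is ignored).
        signature = tuple(sorted(visit(child) for child in node))
        if signature not in ids:
            ids[signature] = len(nodes)
            nodes.append(signature)
        return ids[signature]

    root = visit(tree)
    return root, nodes


# A tree with 8 nodes: the root has two identical subtrees and one leaf.
leaf = ()
pair = (leaf, leaf)
root_id, dag = dag_compress((pair, pair, leaf))
```

Here the 8-node tree is stored with 3 DAG nodes. Computations that only depend on subtree shapes can then be carried out once per DAG node rather than once per tree node.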
3.5 Regression and machine learning
Regression models and machine learning aim at inferring statistical links between a variable of interest and covariates. In biological studies, it is important to develop learning methods adapted both to standard data and to high-dimensional data (sometimes with few observations), as well as to massive or online data.
Many methods are available to estimate conditional quantiles and test dependencies 71, 60. Among them, we have developed nonparametric estimation by local analysis via kernel methods 52, 53, and we want to study the properties of this estimator in order to derive measures of risk such as confidence bands and tests. We also study many other regression models, such as survival analysis and spatio-temporal models with covariates. For multiple regression models, we want to develop omnibus tests that examine several assumptions together.
Concerning the analysis of high dimensional data, our view on the topic relies on the French data analysis school, specifically on Factorial Analysis tools. In this context, stochastic approximation is an essential tool 64, which allows one to approximate eigenvectors in a stepwise manner 69, 68, 70. BIGS aims at performing accurate classification or clustering by taking advantage of the possibility of updating the information "online" using stochastic approximation algorithms 45. We focus on several incremental procedures for regression and data analysis like linear and logistic regressions and PCA (Principal Component Analysis).
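As a rough illustration of how an eigenvector can be approximated in a stepwise manner from streaming observations, the sketch below implements a normalized Oja-type recursion for the leading principal direction. The step size `1 / (n + 10)`, the synthetic data `SAMPLES` and the function name are hypothetical choices for this example and do not reproduce the processes studied in the cited works.

```python
import math
import random


def oja_leading_eigenvector(samples, dim, seed=0):
    """Normalized Oja-type recursion for the leading eigenvector of E[x x^T].

    `samples` is an iterable of centered observations (lists of length dim),
    processed one at a time, as in a data stream.
    """
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in w))
    w = [c / norm for c in w]
    for n, x in enumerate(samples, start=1):
        step = 1.0 / (n + 10)                      # decreasing step size (assumed)
        proj = sum(xi * wi for xi, wi in zip(x, w))
        # Stochastic approximation step: w <- w + step * (x x^T) w
        w = [wi + step * proj * xi for wi, xi in zip(w, x)]
        # Renormalize to keep w on the unit sphere.
        norm = math.sqrt(sum(c * c for c in w))
        w = [c / norm for c in w]
    return w


# Synthetic centered data with variances 9, 1 and 0.25 along the axes,
# so that the leading principal direction is the first axis.
_rng = random.Random(1)
SAMPLES = [[3.0 * _rng.gauss(0, 1), _rng.gauss(0, 1), 0.5 * _rng.gauss(0, 1)]
           for _ in range(5000)]
w_hat = oja_leading_eigenvector(SAMPLES, dim=3)
```

Each observation is used once and then discarded, which is what makes this kind of recursion suitable for online or very massive data.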
We also focus on the biological context of high-throughput bioassays, in which several hundreds or thousands of biological signals are measured for subsequent analysis. We have to account for the inter-individual variability within the modeling procedure. We aim at developing a new solution based on an ARX (Auto Regressive model with eXternal inputs) model structure, using the EM (Expectation-Maximisation) algorithm for the estimation of the model parameters.
4 Application domains
4.1 Tumor growth-oncology
On this topic, we want to propose branching processes to model the appearance of mutations in tumors, through new collaborations with clinicians who measure a particular quantity called circulating tumor DNA (ctDNA). The final purpose is to use ctDNA as an early biomarker of resistance to an immunotherapy treatment: this is the aim of the ITMO project. Another topic is the identification of dynamic networks of gene expression. In the ongoing work on low-grade gliomas, a local database of 400 patients will soon be available to construct models. We plan to extend it through national and international collaborations (Montpellier CHU, Montreal CRHUM). Our aim is to build a decision-aid tool for personalised medicine. In the same context, another topic is the clustering analysis of a brain cartography obtained by sensorial simulations during awake surgery.
4.2 Genomic data and microorganisms population
Despite the 'G' in the name of BIGS, genetics is not central in the applications of the team. However, we want to contribute to a better understanding of the correlations between genes through their expression data and of the genetic bases of drug response and disease. We have contributed to methods for detecting proteomic and transcriptomic variables linked with the outcome of a treatment.
4.3 Epidemiology and e-health
Much work remains to be done in our ongoing projects with CHU Nancy in the context of personalized medicine. They deal with biomarker research, the prognostic value of quantitative variables and events, scoring, and adverse events. We also want to develop our expertise in rupture detection in a project with AP-HP (Assistance Publique - Hôpitaux de Paris) for the detection of adverse events earlier than the clinical signs and symptoms. The clinical relevance of predictive analytics is obvious for high-risk patients, such as those with solid organ transplantation or severe chronic respiratory disease. The main challenge is rupture detection in multivariate and heterogeneous signals (for instance daily measures of electrocardiogram, body temperature, spirometry parameters, sleep duration, etc.). Other collaborations with clinicians concern foetopathology, and we want to use our work on conditional distribution functions to explain fetal and child growth. We have data from the "Service de foetopathologie et de placentologie" of the "Maternité Régionale Universitaire" (CHU Nancy).
4.4 Dynamics of telomeres
Telomeres are disposable buffers at the ends of chromosomes which are truncated during cell division, so that, over time, the telomere ends become shorter with each cell division. In this way, they are markers of aging. Through a collaboration with Pr A. Benetos, geriatrician at CHU Nancy, we recently obtained data on the distribution of the length of telomeres from blood cells. With members of Inria team TOSCA, we want to work in three connected directions: (1) refine the methodology for the analysis of the available data; (2) propose a dynamical model for the lengths of telomeres and study its mathematical properties (long-term behavior, quasi-stationarity, etc.); and (3) use these properties to develop new statistical methods. A postdoctoral position is already planned in the Lorraine Université d'Excellence (LUE) project GEENAGE (managed by CHU Nancy).
5 Social and environmental responsibility
We followed Inria's recommendations to get involved in the fight against COVID-19. We tried to collaborate with the LCPME laboratory in order to predict the number of SARS-CoV-2 positive patients from the Grand Nancy metropolitan area at the Nancy University Hospital from the concentration of SARS-CoV-2 residues in waste water. We encountered difficulties with the Obépine network in obtaining raw data instead of preprocessed indicators. We made predictions from the incidence rates available from Santé Publique France. The predictions are available on the siwam website.
We were also involved in the MODCOV19 project, a platform for the coordination of research actions on the modeling of the SARS-CoV-2 (Covid-19) pandemic. In particular, we were responsible for the bibliographic awareness group of the coordination committee.
6 Highlights of the year
The list of permanent members of the team grew noticeably in 2021, due to the arrival of several researchers from the former Inria team Tosca. These researchers are experts in stochastic modeling and analysis for biomedical applications. Their arrival strengthened the first axis of our research program. We are currently proposing a new Inria team, Simba, which takes into account these arrivals and the recruitments of the past few years in our team and, more generally, on the topic of mathematical biology in Institut Élie Cartan de Lorraine.
7 New software and platforms
The team has been developing three new packages.
7.1 New software
7.1.1 ARMADA

Name:
A Statistical Methodology to Select Covariates in High-Dimensional Data under Dependence

Keywords:
Biostatistics, Aggregated methods, High Dimensional Data, Personalized medicine, Variable selection

Functional Description:
Two-step variable selection procedure in a context of high-dimensional dependent data with few observations. A first step is dedicated to eliminating the dependency between variables (clustering of variables, followed by factor analysis inside each cluster). A second step consists in variable selection by aggregation of adapted methods.

News of the Year:
This is a new package.

Contact:
Aurélie Muller

Participants:
Aurélie Muller, Anne Gégout-Petit
7.1.2 cvmgof

Keywords:
Regression, Test, Estimators

Scientific Description:
Many goodness-of-fit tests have been developed to assess the different assumptions of a (possibly heteroscedastic) regression model. Most of them are "directional" in that they detect departures from a given assumption of the model. Other tests are "global" (or "omnibus") in that they assess whether a model fits a dataset on all its assumptions. cvmgof focuses on the task of choosing the structural part of the regression function because it contains easily interpretable information about the studied relationship. It implements two nonparametric "directional" tests and one nonparametric "global" test, all based on generalizations of the Cramér-von Mises statistic.

Functional Description:
cvmgof is an R library devoted to Cramér-von Mises goodness-of-fit tests. It implements three nonparametric statistical methods based on Cramér-von Mises statistics to estimate and test a regression model.

News of the Year:
A new version has been available on CRAN since January 11, 2021.

Contact:
Romain Azaïs

Participants:
Sandie Ferrigno, Marie-José Martinez, Romain Azaïs
7.1.3 Harissa

Name:
Hartree approximation for inference along with a stochastic simulation algorithm

Keywords:
Gene regulatory networks, Reverse engineering, Molecular simulation

Functional Description:
Harissa is a Python package for both inference and simulation of gene regulatory networks, based on stochastic gene expression with transcriptional bursting. It was implemented in the context of a mechanistic approach to gene regulatory network inference from single-cell data.

Contact:
Ulysse Herbach
8 New results
8.1 Stochastic modeling
Participants: Nicolas Champagnat, Coralie Fritsch, Anne Gégout-Petit, Vincent Hass, Ulysse Herbach, William Oçafrain, Pierre Vallois, Denis Villemonais, Nicolás Zalduendo Vidal.
8.1.1 Reconstruction of epigenetic landscapes from single-cell data
The aim is to better understand how living cells make decisions (e.g., differentiation of a stem cell into a particular specialized type), seeing decision-making as an emergent property of an underlying complex molecular network. Indeed, it is now established that cells react probabilistically to their environment: cell types do not correspond to fixed states, but rather to “potential wells” of a certain energy landscape (representing the energy of the possible states of the cell) that we are trying to reconstruct. A first paper proposing a reconstruction method has been submitted 26 in the framework of an international collaboration (USA, Switzerland, France). Another paper is about to be submitted 28, dealing more specifically with the inference of the underlying networks.
Joint work with Nan Papili Gao (ETH Zurich), Olivier Gandrillon (ENS Lyon), András Páldi (EPHE, Paris), and Rudiyanto Gunawan (University at Buffalo, New York)
8.1.2 Modeling and estimation of circulating tumor DNA (ctDNA) dynamics for detecting resistance to targeted therapies
This work continues the ITMO Cancer project, supervised by Nicolas Champagnat, which concerns the modeling of circulating tumor DNA (ctDNA) to detect the appearance of resistance to targeted therapies (personalized medicine). After a phase of investigation of possible scenarios in collaboration with Alexandre Harlé of the Institute of Cancerology of Lorraine (ICL), a final model was selected. Based on a mathematical analysis, the members of the project then designed a statistical inference algorithm (learning the parameters of the model, including the genealogical tree of mutations for each patient), which is intended to be validated on real data currently being acquired at the Nancy CHRU. The general idea is to exploit a “variational principle” that makes it possible to explore the very large discrete space of family trees through a “pivot” space of continuous parameters that are easy to optimize (and reasonable in number). A paper detailing the model and its inference is in preparation. This method allows for the reconstruction of intratumoral heterogeneity, i.e. the subclone composition of the tumor. Based on these data, we are currently studying models of stochastic tumor growth, with an emphasis on interactions between the clones, to assess the effects of different treatment strategies.
8.1.3 Quasi-stationary distributions
We are continuing our research on quasi-stationary distributions (QSD), that is, distributions of Markov stochastic processes with absorption which are stationary conditionally on non-absorption. For models of biological populations, absorption usually corresponds to the extinction of a (sub)population. QSDs are fundamental tools to describe the population state before extinction and to quantify the large-time behavior of the probability of extinction.
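For reference, the definition above can be written as follows. The notation is an assumption of this note: $E$ denotes the set of non-absorbed states, $\tau_\partial$ the absorption time of the process $(X_t)_{t \ge 0}$, and $\mathbb{P}_\nu$ its law when $X_0$ is distributed according to $\nu$. A probability measure $\nu$ on $E$ is a QSD if the first identity holds; the second identity is a standard consequence, stating that the absorption time is exponentially distributed under $\mathbb{P}_\nu$.

```latex
\mathbb{P}_\nu\left(X_t \in A \mid t < \tau_\partial\right) = \nu(A)
\quad \text{for all } t \ge 0 \text{ and all measurable } A \subset E,
\qquad
\mathbb{P}_\nu\left(t < \tau_\partial\right) = e^{-\lambda_0 t}
\quad \text{for some } \lambda_0 \ge 0 .
```

The rate $\lambda_0$ is what quantifies the large-time behavior of the extinction probability when the process starts from the QSD.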
This year, we solved a general conjecture on the Fleming-Viot particle systems approximating QSDs: in cases where several QSDs exist, it is expected that the stationary distributions of the Fleming-Viot processes approach a particular QSD, called the minimal QSD. We proved in 7 that this holds true for general absorbed Markov processes with soft obstacles. We also obtained in 8 criteria based on Lyapunov functions that make it possible to check the general conditions of 47, which characterize the exponential uniform convergence in total variation of the conditional distributions of an absorbed Markov process to a unique quasi-stationary distribution. Among the various applications given there, we prove that these conditions apply to any logistic Feller diffusion in any dimension conditioned on the non-extinction of all its coordinates. This question had been left partly open since the first work of Cattiaux and Méléard on this topic 46.
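To illustrate the principle of a Fleming-Viot particle system, the following sketch simulates a discrete-time analogue for a simple random walk on {0, ..., K} absorbed at 0. This toy model, its parameters and the function name are assumptions made for the example; on a finite state space like this one the question of the minimal QSD does not arise, and the sketch is unrelated to the proofs in 7.

```python
import random


def fleming_viot(n_particles=200, K=10, p_up=0.4, n_steps=500, seed=0):
    """Discrete-time Fleming-Viot-type particle system (illustrative sketch).

    Each particle follows a random walk on {0, ..., K}, absorbed at 0 and
    capped at K. A particle that reaches 0 is immediately relocated to the
    position of another particle chosen uniformly at random.
    Returns the empirical distribution of the particles on {1, ..., K}.
    """
    rng = random.Random(seed)
    particles = [max(1, K // 2)] * n_particles
    for _ in range(n_steps):
        for i in range(n_particles):
            move = 1 if rng.random() < p_up else -1
            particles[i] = min(K, particles[i] + move)
            if particles[i] == 0:
                # Particles are updated one at a time, so all the other
                # particles are in {1, ..., K} at this point.
                j = rng.randrange(n_particles - 1)
                if j >= i:
                    j += 1
                particles[i] = particles[j]
    return [particles.count(x) / n_particles for x in range(1, K + 1)]


empirical_qsd = fleming_viot()
```

The returned empirical measure plays the role of an approximation of a quasi-stationary distribution of the absorbed walk, for a large number of particles and a long simulation time.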
Together with M. Benaïm (Univ. Neuchâtel), we studied in 4 stochastic algorithms to approximate quasi-stationary distributions of diffusion processes absorbed at the boundary of a bounded domain. We considered a reinforced version of the diffusion, which is resampled according to its occupation measure when it reaches the boundary. We showed that its occupation measure converges to the unique quasi-stationary distribution of the diffusion process. We also obtained in 24 general criteria ensuring existence, uniqueness and/or exponential convergence properties for quasi-stationary distributions. The criteria were specifically designed to apply to degenerate processes such as hypoelliptic diffusions. We also provided in 25 a counterexample to the uniqueness of a quasi-stationary distribution for a diffusion process which satisfies the weak Hörmander condition.
Together with R. Schott (IECL, Univ. Lorraine), we studied in 6 models of deadlocks in distributed systems, using the approach we developed in 8 to study quasi-stationary distributions, in order to characterize and compute numerically the asymptotic behaviour of the deadlock time and the behaviour of the system before deadlock, both for discrete and for diffusion models.
8.1.4 Evolutionary models of food webs
We studied models of food web adaptive evolution in 10. We identified the biomass conversion efficiency as a key mechanism underlying food web evolution and discussed the relevance of such models to study the evolution of food webs.
In collaboration with S. Billiard (Univ. Lille).
8.1.5 Adaptive dynamics in biological populations
We studied evolutionary models of bacteria with horizontal transfer in 5. Horizontal transfer is a common mechanism of DNA exchange between microorganisms that is thought to be responsible for the fast evolution of antibiotic resistance in bacteria or the evolution of virulence in pathogens. We considered a scaling of parameters taking into account the influence of negligible but non-extinct populations, allowing us to study specific phenomena observed in these models (re-emergence of traits, cyclic evolutionary dynamics and evolutionary suicide). This work is done in collaboration with S. Méléard (École Polytechnique) and V.C. Tran (Univ. Paris Est Marne-la-Vallée).
We also worked on general evolutionary models of adaptive dynamics under an assumption of large population and small mutations. This year, we obtained existence, uniqueness and ergodicity results for a centered version of the Fleming-Viot process of population genetics. This is a key step to recover variants of the canonical equation of adaptive dynamics, which describes the long-time evolution of the dominant phenotype in the population, under less stringent biological assumptions than in previous works. We plan to complete this work next year.
8.2 Optimal control of Markov processes
Participants: Bruno Scherrer, Nino Vieillard.
We consider Offline Reinforcement Learning methods. The problem is to learn a policy from logged transitions of an environment, without any interaction. In the presence of function approximation, and under the assumption of limited coverage of the state-action space of the environment, it is necessary to constrain the policy to visit state-action pairs close to the support of logged transitions.
In 17, we propose an iterative procedure to learn a pseudometric (closely related to bisimulation metrics) from logged transitions, and use it to define this notion of closeness. We show its convergence and extend it to the function approximation setting. We then use this pseudometric to define a new lookup-based bonus in an actor-critic algorithm: PLOFF. This bonus encourages the actor to stay close, in terms of the defined pseudometric, to the support of logged transitions.
In 18, noticing that an agent in this setting should avoid selecting actions whose consequences cannot be predicted from the data, we take inspiration from the literature on bonus-based exploration to design a new offline RL agent. The core idea is to subtract a prediction-based exploration bonus from the reward, instead of adding it for exploration. This allows the policy to stay close to the support of the dataset. We connect this approach to a more common regularization of the learned policy towards the data. Instantiated with a bonus based on the prediction error of a variational autoencoder, we show that our agent is competitive with the state of the art on a set of continuous control locomotion and manipulation tasks.
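The core idea of subtracting a bonus from the reward can be sketched in a few lines. In this simplified illustration, the prediction error of a variational autoencoder is replaced by a much cruder stand-in, namely the Euclidean distance to the nearest logged state-action pair. The function names and the weight `alpha` are hypothetical and do not come from 18.

```python
import math


def nearest_neighbour_bonus(state_action, logged_pairs):
    """Stand-in for a prediction-error bonus: distance between a
    state-action vector and the closest logged state-action vector."""
    return min(math.dist(state_action, z) for z in logged_pairs)


def penalized_reward(reward, state_action, logged_pairs, alpha=1.0):
    """Reward used for offline learning: the bonus is subtracted, so that
    state-action pairs far from the logged data become less attractive."""
    return reward - alpha * nearest_neighbour_bonus(state_action, logged_pairs)


# Hypothetical logged data: two state-action vectors in the plane.
logged = [(0.0, 0.0), (1.0, 0.0)]
in_support = penalized_reward(1.0, (0.0, 0.0), logged)               # 1.0
off_support = penalized_reward(1.0, (0.0, 3.0), logged, alpha=0.5)   # -0.5
```

A pair that belongs to the logged data keeps its reward unchanged, while a pair far from the data is penalized in proportion to its distance, which discourages the learned policy from leaving the support of the dataset.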
Joint work with Robert Dadashi, Shideh Rezaeifar, Léonard Hussenot, Olivier Pietquin, Olivier Bachem and Matthieu Geist.
8.3 Regression and machine learning
Participants: Thierry Bastogne, Sandie Ferrigno, Anne Gégout-Petit, Aurélie Gueudin, Benoît Lalloué, Jean-Marie Monnez, Nassim Sahki, Sophie Wantz-Mézières.
8.3.1 Cramér–von Mises goodness-of-fit tests in regression models
Many goodness-of-fit tests have been developed to assess the different assumptions of a (possibly heteroscedastic) regression model. Most of them are 'directional' in that they detect departures from a given assumption of the model. Other tests are 'global' (or 'omnibus') in that they assess whether a model fits a dataset on all its assumptions. We focus on the task of choosing the structural part of the regression function because it contains easily interpretable information about the studied relationship. We consider two nonparametric 'directional' tests and one nonparametric 'global' test, all based on generalizations of the Cramér-von Mises statistic.
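As background, the classical one-sample Cramér-von Mises statistic, which the tests above generalize to the regression setting, can be computed as in the following sketch. This is the textbook statistic for testing that a sample comes from a given continuous distribution function, not the regression tests implemented in cvmgof; the function name is chosen for this example.

```python
def cramer_von_mises(sample, cdf):
    """Classical one-sample Cramér-von Mises statistic.

    W2 = 1 / (12 n) + sum_i ((2 i - 1) / (2 n) - F(x_(i)))^2,
    where x_(1) <= ... <= x_(n) is the ordered sample and F = cdf is the
    hypothesized distribution function.
    """
    xs = sorted(sample)
    n = len(xs)
    return 1.0 / (12 * n) + sum(
        ((2 * i - 1) / (2 * n) - cdf(x)) ** 2
        for i, x in enumerate(xs, start=1)
    )


def uniform_cdf(x):
    """Distribution function of the uniform law on [0, 1]."""
    return min(1.0, max(0.0, x))


# A sample placed exactly at the points (2 i - 1) / (2 n) attains the
# minimal value 1 / (12 n) of the statistic.
w2 = cramer_von_mises([0.75, 0.25], uniform_cdf)
```

Large values of the statistic indicate a departure of the empirical distribution from the hypothesized one.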
To perform these goodness-of-fit tests, we develop the R package cvmgof 36, an easy-to-use tool for practitioners, available from the Comprehensive R Archive Network (CRAN). The use of the library is illustrated through a tutorial on real data, and simulation studies are carried out in order to show how the package can be exploited to compare the three implemented tests. The practitioner can also easily compare the test procedures with different kernel functions, bootstrap distributions, numbers of bootstrap replicates, or bandwidths. The package was updated at the start of 2021; this is its third version. A first article 1 on this work was published in October 2021.
We are now working on nonparametric tests associated with the functional form of the variance of the regression model. For this, we continue to work on the global test of Ducharme and Ferrigno in order to compare its performance with that of directional tests associated with the variance of the model. Many simulations are in progress. This will also make it possible to propose a more general package-type tool for validating the regression models used in practice.
To complete this work, it would be interesting to assess the other assumptions of a regression model, such as the additivity of the random error term. The implementation of these directional tests would enrich the cvmgof package and offer a complete easy-to-use tool for validating regression models. Moreover, the assessment of the overall validity of the model when using several directional tests could be compared with that done when using only a global test. In particular, the well-known problem of multiple testing could be discussed by comparing the results obtained from multiple test procedures with those obtained when using a global test strategy. Another perspective of this work would be to develop a similar tool for other statistical models widely used in practice, such as generalized linear models.
Joint work with Romain Azaïs (Inria, ENS Lyon) and Marie-José Martinez (LJK, Université Grenoble Alpes).
8.3.2 Online data analysis
Widening the scope of an eigenvector stochastic approximation process and application to streaming PCA and related methods. This article in collaboration with A. Skiredj was presented in the 2020 Activity Report (Section 8.3.5) and is now published in Journal of Multivariate Analysis 15.
Streaming constrained binary logistic regression with online standardized data. This article in collaboration with E. Albuisson was presented in the 2020 Activity Report (Section 8.3.5) and is now accepted in Journal of Applied Statistics 13.
Construction and update of an online ensemble score involving linear discriminant analysis and logistic regression. This article in collaboration with E. Albuisson was presented in the 2020 Activity Report (Section 8.3.5) and is being submitted 30, 63.
Stochastic approximation of eigenvectors and eigenvalues of the $Q$-symmetric expectation of a random matrix. Application to streaming PCA. In this analysis, we studied the convergence of stochastic approximation processes of the Oja type for estimating eigenvectors of the unknown $Q$-symmetric expectation $B$ of a random matrix, the metric $Q$ being unknown. We established a theorem of a.s. convergence of these processes under assumptions on the noisy observations ${B}_{n}$ of $B$ that are more general than in previous results. The estimation of eigenvectors corresponding to eigenvalues of $B$ in decreasing order is obtained using at step $n$ a Gram-Schmidt orthonormalization with respect to a random metric ${Q}_{n+1}$ such that ${Q}_{n+1}$ converges a.s. to $Q$ as $n$ goes to infinity. We proved the a.s. convergence of specific processes to the corresponding eigenvalues. Corollaries of this theorem apply in particular to cases where $E\left[{B}_{n}\mid {T}_{n}\right]$ or ${B}_{n}$ converges a.s. to $B$, which were studied by Monnez and Skiredj 15. In the case of a process using only the current observations at each step, we suggested constructing another process using past and current observations. We applied these results to the online estimation of principal components in streaming PCA of a random vector, taking into account all the observations up to the current step with possibly different weights assigned to past and current observations.
Other applications to methods related to PCA such as generalized canonical correlation analysis are in progress.
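As an illustration of the kind of process discussed above, the following is a minimal Python sketch of an Oja-type stochastic approximation for streaming PCA. It is not the algorithm analysed in the article: it assumes the identity metric (so the Gram-Schmidt step reduces to an ordinary QR orthonormalization), centered observations, and a step size of the form $1/(n+10)$. The function name `oja_streaming_pca` and its parameters are hypothetical.

```python
import numpy as np

def oja_streaming_pca(samples, k, step=lambda n: 1.0 / (n + 10), seed=0):
    """Oja-type estimate of the top-k eigenvectors of B = E[x x^T].

    Simplifying assumptions (not from the report): identity metric Q,
    centered observations, one observation processed per step.
    """
    rng = np.random.default_rng(seed)
    d = samples.shape[1]
    # Random orthonormal starting point (QR plays the role of Gram-Schmidt).
    X, _ = np.linalg.qr(rng.standard_normal((d, k)))
    for n, x in enumerate(samples, start=1):
        B_n = np.outer(x, x)            # noisy observation of B at step n
        X = X + step(n) * (B_n @ X)     # stochastic approximation update
        X, _ = np.linalg.qr(X)          # re-orthonormalize the columns
    return X
```

Each column of the returned matrix is an estimate, up to sign, of one eigenvector, in decreasing order of the associated eigenvalue. Replacing the QR step by an orthonormalization with respect to an estimated metric ${Q}_{n+1}$ would be closer to the setting described above.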
8.3.3 Change-point detection thresholds in the sequential context
To apply our change-point detection algorithms to real data, we turned to EMG signal data provided by INRS. The study concerns the development of trapezius muscle myalgia in the workplace. We apply change-point detection to characterize the different computer activities carried out during an experimental day. Our analysis allowed us to characterize activities according to the frequency and amplitude of jumps and to distinguish office activities using the mouse from those using the keyboard. This work was presented in a conference paper 19.
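To make the notion of a sequential detection threshold concrete, here is a minimal Python sketch of a textbook two-sided CUSUM detector for a shift in the mean. This is a standard technique chosen for illustration, not the team's data-driven method; the function name and the parameters `mean0`, `drift` and `threshold` are assumptions.

```python
def cusum_first_alarm(signal, mean0, drift, threshold):
    """Return the first index at which a two-sided CUSUM statistic exceeds
    `threshold`, or None if no alarm is raised.

    mean0:     assumed pre-change mean of the signal
    drift:     slack subtracted at each step to absorb small fluctuations
    threshold: detection threshold (higher means fewer false alarms,
               but a longer detection delay)
    """
    g_pos = 0.0  # accumulates evidence for an upward jump
    g_neg = 0.0  # accumulates evidence for a downward jump
    for t, x in enumerate(signal):
        g_pos = max(0.0, g_pos + (x - mean0) - drift)
        g_neg = max(0.0, g_neg - (x - mean0) - drift)
        if g_pos > threshold or g_neg > threshold:
            return t
    return None
```

The choice of `threshold` governs the trade-off between false alarms and detection delay, which is the quantity a sequential procedure has to calibrate.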
8.4 Statistical learning and application in health
Participants: Nicolas Champagnat, Léo Darrigade, Sandie Ferrigno, Coralie Fritsch, Anne Gégout-Petit, Aurélie Gueudin, Ulysse Herbach, Benoît Lalloué, Rodolphe Loubaton, Jean-Marie Monnez, Anouk Rago, Nicolas Thorr, Pierre Vallois, Sophie Wantz-Mézières.
8.4.1 Analysis of diffuse low-grade gliomas growth
To understand the growth of low-grade gliomas, we investigate multiple sources of information available in clinical practice: patient-related predictors and variables related to tumor tissue and genetics. Monitoring growth through regular MRIs gives us access to many imaging-related variables, including an original one measuring tumor infiltration (thesis defended in 2021: Cyril Brzenczek, CRAN, article in preparation). Our latest efforts have focused on the statistical analysis of the database composed of these variables. We have obtained regional PACTE funding to host this database and use it for teaching, dissemination and the development of experimentation tools: the PIANO platform.
Joint work with J.M. Moureaux, Y. Gaudeau (CRAN), F. Rech, L. Taillandier, M. Blonski, T. Obara (CHRU Nancy).
8.4.2 Estimation of reference curves for fetal weight
In epidemiology, we are working with INSERM to study fetal development in the last two trimesters of pregnancy. Reference or standard curves are required in this kind of biomedical problem. Values that lie outside the limits of these reference curves may indicate the presence of a disorder. Data are from the French EDEN mother-child cohort (INSERM), a study investigating the prenatal and early postnatal determinants of child health and development. A total of 2002 pregnant women were recruited before 24 weeks of amenorrhoea in two maternity clinics from middle-sized French cities (Nancy and Poitiers). From May 2003 to September 2006, 1899 newborns were then included. The main outcomes of interest are fetal (via ultrasound) and postnatal growth, adiposity development, respiratory health, atopy, behaviour and bone, cognitive and motor development. We are studying fetal weight and height as a function of gestational age in the third trimester of pregnancy. Classical empirical and parametric methods are first used to construct these curves. For instance, polynomial regression is one of the most common parametric approaches for modeling growth data, especially during the prenatal period. However, these classical methods require strong assumptions. We therefore propose to work with semiparametric LMS methods, which transform the response variable (fetal weight) using, among others, Box–Cox transformations. A first article detailing these methodologies applied to the EDEN data should be submitted next year and is the subject of the communication 31.
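The Box–Cox step of the LMS approach can be sketched as follows in Python, assuming the standard LMS formulation in which, at a given gestational age, a measurement $y$ is mapped to a z-score using a Box–Cox power $L$, a median $M$ and a coefficient of variation $S$. In practice $L$, $M$ and $S$ are smooth functions of age; here they are plain numbers, and all names and values used with these functions are hypothetical rather than estimates from the EDEN data.

```python
import math

def lms_zscore(y, L, M, S):
    """z-score of a positive measurement y under the LMS model at one age."""
    if y <= 0 or M <= 0 or S <= 0:
        raise ValueError("y, M and S must be positive")
    if abs(L) < 1e-12:
        # Limit of the Box-Cox transform as the power L tends to 0.
        return math.log(y / M) / S
    return ((y / M) ** L - 1.0) / (L * S)

def lms_centile(z, L, M, S):
    """Inverse map: measurement whose z-score is z (e.g. z = 1.645 for P95)."""
    if abs(L) < 1e-12:
        return M * math.exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)
```

Evaluating `lms_centile` over a grid of ages with age-dependent `L`, `M` and `S` yields one reference curve per chosen centile.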
Alternative nonparametric methods such as Nadaraya-Watson kernel estimation, local polynomial estimation, B-splines or cubic splines are also developed in this context to construct these curves. The practical implementation of these methods required work on the smoothing parameters or the choice of knots for the different types of nonparametric estimation. In particular, an optimal choice of these parameters has been proposed. A first version of an R package has then been developed as a tool to construct nonparametric reference curves. It will soon be available on GitHub. In addition, a graphical user interface (GUI) intended for practitioners is being developed to allow intuitive visualization of the results given by the package, and an article is in progress.
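For readers unfamiliar with the first of these estimators, here is a minimal Python sketch of Nadaraya-Watson kernel regression with a Gaussian kernel. It is independent of the R package mentioned above; the function name is hypothetical, and the bandwidth is passed in directly rather than selected optimally.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_eval, bandwidth):
    """Nadaraya-Watson estimate of E[Y | X = x] with a Gaussian kernel.

    bandwidth is the smoothing parameter: small values follow the data
    closely, large values give a smoother curve.
    """
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    # Scaled distances between every evaluation point and every observation.
    u = (x_eval[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * u ** 2)
    # Locally weighted average of the responses.
    return (weights @ y_train) / weights.sum(axis=1)
```

The estimate is a weighted average of the observed responses, so the choice of `bandwidth` plays the same role as the smoothing parameters discussed above.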
Joint work with Myriam Maumy-Bertrand (IRMA, Université de Strasbourg) and INSERM.
8.4.3 Construction of parsimonious event risk scores by an ensemble method. An illustration for short-term predictions in chronic heart failure patients from the GISSI-HF trial
This article in collaboration with E. Albuisson and D. Lucci was presented in the 2020 Activity Report (Section 8.4.2) and is now published in Applied Mathematics 14.
8.4.4 Prediction of silencing experiments on gene networks for chronic lymphocytic leukemia
We are working with L. Vallat (CHRU Strasbourg) on the inference of dynamical gene networks from RNA-seq and proteome data. The goal is to infer a model of gene expression that can predict gene expression in cells where some genes are silenced (e.g. using siRNA), in order to select the silencing experiments that are most likely to reduce cell proliferation. We expect the selected genes to provide new therapeutic targets for the treatment of chronic lymphocytic leukemia. This year, we addressed the general prediction problem defined above, and proposed an inference method for a new gene network model for which such a prediction is possible. Next year, we expect to identify potential therapeutic targets for which silencing experiments could be conducted.
8.4.5 A statistical methodology to select covariates in high-dimensional data under dependence. Application to the classification of genetic profiles in oncology
We propose a new methodology for selecting and ranking covariates associated with a variable of interest in the context of high-dimensional data under dependence with few observations. The methodology successively combines the clustering of covariates, the decorrelation of covariates using Factor Latent Analysis, selection by aggregation of adapted methods, and finally ranking. A simulation study shows the benefit of decorrelation within the different clusters of covariates. We first apply our method to transcriptomic data from 37 patients with advanced non-small-cell lung cancer who received chemotherapy, to select the transcriptomic covariates that explain the survival outcome of the treatment. Secondly, we apply our method to 79 breast tumor samples to define patient profiles for a new metastatic biomarker and its associated gene network, in order to personalize treatments. This work is published in 2 and is implemented in the R package ‘ARMADA’.
In collaboration with T. Boukhobza and H. Dumond from CRAN, and B. Bastien from the biopharmaceutical company Transgene.
8.4.6 Projects linked with the COVID-19 pandemic
Seroprevalence study. Pierre Vallois is the scientific coordinator of the seroprevalence study COVAL Nancy held in Nancy in July 2020 in collaboration with CHRU de Nancy (CIC épidémiologie clinique and Laboratoire de Virologie).
Background. The World Health Organisation recommends monitoring the circulation of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). We aimed to estimate anti-SARS-CoV-2 total immunoglobulin (IgT) antibody seroprevalence and describe symptom profiles and in vitro seroneutralization in Nancy, France, in spring 2020.
Methods. Individuals were randomly sampled from electoral lists and invited with household members over 5 years old to be tested for anti-SARS-CoV-2 (IgT, i.e. IgA/IgG/IgM) antibodies by ELISA (Bio-Rad). Serum samples were classified according to seroneutralization activity 50 % (NT50) on Vero CCL-81 cells. Age- and sex-adjusted seroprevalence was estimated. Subgroups were compared by chi-square or Fisher exact test and logistic regression.
Results. Among 2006 individuals, 43 were SARS-CoV-2-positive; the raw seroprevalence was 2.1 % (95 % confidence interval 1.5 to 2.9), with adjusted metropolitan and national standardized seroprevalence of 2.5 % (1.8 to 3.3) and 2.3 % (1.7 to 3.1), respectively. Seroprevalence was highest for 20- to 34-year-old participants (4.7 % [2.3 to 8.4]), higher inside than outside socially deprived areas (2.5 % vs 1 %, P=0.02) and higher with than without intrafamily infection ($p<10^{-6}$). Moreover, 25 % (23 to 27) of participants presented at least one COVID-19 symptom associated with SARS-CoV-2 positivity ($p<10^{-13}$), with anosmia or ageusia highly discriminant (odds ratio 27.8 [13.9 to 54.5]), associated with dyspnea and fever. Among the SARS-CoV-2-positive individuals, 16.3 % (6.8 to 30.7) were asymptomatic. For 31 of these individuals, positive seroneutralization was demonstrated in vitro.
Conclusions. In this population of very low anti-SARS-CoV-2 antibody seroprevalence, a beneficial effect of the lockdown can be assumed, with frequent SARS-CoV-2 seroneutralization among IgT-positive patients.
The results were first published on medRxiv 27 and then in a peer-reviewed international journal 11.
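The raw seroprevalence reported above (43 positives among 2006 individuals) and its confidence interval can be checked with a short Python computation. The text does not say which interval method the study used, so the Wilson score interval below is an assumption made for illustration; it gives bounds of roughly 1.6 % and 2.9 %, close to the reported 1.5 to 2.9.

```python
import math

def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score confidence interval for a binomial proportion.

    z is the standard normal quantile (default: two-sided 95 % level).
    """
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half_width, center + half_width

# Raw (unadjusted) seroprevalence from the counts given in the text.
raw = 43 / 2006
low, high = wilson_interval(43, 2006)
```

The adjusted seroprevalences quoted in the text additionally rely on age and sex standardization, which this sketch does not attempt to reproduce.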
Prediction of SARS-CoV-2 positive patients in hospital. Participants: A. Gégout-Petit, U. Herbach, N. Thorr.
In collaboration with H. Berry, D. Gemmerlé, T. Lepoutre, D. Maucourt and D. Parsons.
We followed Inria's recommendations to get involved in the fight against COVID-19. We tried to collaborate with the LCPME laboratory in order to predict the number of SARS-CoV-2 positive patients from the Grand Nancy metropolitan area at the Nancy University Hospital from the concentration of SARS-CoV-2 residues in wastewater. We encountered difficulties with the Obépine network in obtaining raw data rather than mere indicators. We therefore made predictions from the incidence rates available from Santé Publique France. The predictions are available on the siwam website. Inria hired Nicolas Thorr as an engineer for six months for this project.
9 Bilateral contracts and grants with industry
Participants: Bruno Scherrer.
9.1 Bilateral contracts with industry
B. Scherrer collaborates with Google Brain on reinforcement learning in the framework of the PhD thesis of Nino Vieillard.
10 Partnerships and cooperations
Participants: Nicolas Champagnat, Léo Darrigade, Coralie Fritsch, Anne Gégout-Petit, Ulysse Herbach, Joseph Lam-Weil, Jean-Marie Monnez, Aurélie Muller-Gueudin, Pierre Vallois, Denis Villemonais, Sophie Wantz-Mézières.
10.1 International initiatives
10.1.1 Participation in International Programs
BRN

Title:
Biostochastic Research Network

Partner Institution(s):
 Universidad de Valparaiso (Chile)  CIMFAV – Facultad de Ingenieria  Soledad Torres, Rolando Rebolledo.
 CNRS, Inria & Institut Élie Cartan de Lorraine (France)  N. Champagnat, A. Lejay (coordinator for France), D. Villemonais, R. Schott.

Date/Duration:
2018–2022

Goal:
Scientific exchange around probabilistic models in population ecology.
10.2 National initiatives
 FHU CARTAGE (Fédération Hospitalo Universitaire Cardial and ARTerial AGEing). Leader: Pr Athanase Benetos. Participants: Jean-Marie Monnez, Benoît Lalloué, Anne Gégout-Petit.
 RHU Fight HF (Fighting Heart Failure), located at the University Hospital of Nancy. Leader: Pr Patrick Rossignol. Participants: Jean-Marie Monnez, Benoît Lalloué.
 ITMO Physics, Mathematics applied to Cancer (2017-2022): “Modeling ctDNA dynamics for detecting targeted therapy resistance”. Funding bodies: ITMO Cancer, ITMO Technologies pour la santé de l’alliance nationale pour les sciences de la vie et de la santé (AVIESAN), INCa. Partners: Inria and IECL (Institut Élie Cartan de Lorraine), CHRU Strasbourg, CRAN (Centre de Recherche en Automatique de Nancy) and ICL (Institut de Cancérologie de Lorraine). Leader: N. Champagnat. Participants: L. Darrigade, C. Fritsch, A. Gégout-Petit, U. Herbach, A. Muller-Gueudin, P. Vallois.
 GDR 720 ISIS (funded by CNRS). Leader: Laure Blanc-Féraud. Participant: Sophie Mézières.
10.3 Regional initiatives
 CHRU de Nancy. We collaborate closely with several researchers from CHRU de Nancy. We are involved in the telomere research axis of LUE Impact Geenage.
 Région Grand Est. In the context of the Telomere project, Anne Gégout-Petit and Denis Villemonais obtained a grant from the Grand Est region to hire Joseph Lam-Weil as a postdoctoral fellow. Université de Lorraine and the LUE GEENAGE program supplemented the grant.
11 Dissemination
Participants: Thierry Bastogne, Nicolas Champagnat, Léo Darrigade, Sandie Ferrigno, Coralie Fritsch, Anne Gégout-Petit, Vincent Hass, Ulysse Herbach, Joseph Lam-Weil, Rodolphe Loubaton, Jean-Marie Monnez, Aurélie Muller-Gueudin, Pierre Vallois, Denis Villemonais, Sophie Wantz-Mézières, Nicolás Zalduendo Vidal.
11.1 Promoting scientific activities
11.1.1 Scientific events: organisation
Member of the organizing committees
 N. Champagnat, C. Fritsch and U. Herbach organized the workshop Modélisation de l’hétérogénéité tumorale et thérapies ciblées (IECL, Univ. Lorraine, 21–22 Oct.).
 N. Champagnat is a member of the organizing committee of the conference A Random Walk in the Land of Stochastic Analysis and Numerical Probability in honor of Denis Talay (CIRM, Luminy, 3-7 Jan. 2022). Unfortunately, this event has been postponed due to health restrictions.
 N. Champagnat was a member of the organizing committee of the Journée Scientifique FCH "Covid" (Inria Nancy - Grand Est, 28 Sep.).
 U. Herbach co-organized the IBOMAN 2021 conference (Institut Curie, Paris, 25-27 Oct.).
 A. Gégout-Petit co-organized the training session Journée d'études en statistique : Données manquantes (SFdS, Fréjus, 15-19 Nov.).
11.1.2 Scientific events: selection
Member of the conference program committees
 N. Champagnat is a member of the scientific committee for the 53èmes Journées de Statistiques (Lyon, 13-17 June 2022).
11.1.3 Journal
Member of the editorial boards
 N. Champagnat served as co-editor-in-chief, with Béatrice Laurent-Bonneau (IMT Toulouse), of ESAIM: Probability & Statistics until June. Since then, he has served as an associate editor of this journal.
 N. Champagnat serves as an associate editor of Stochastic Models.
Reviewer - reviewing activities
Here is a selection of the journals for which we regularly write referee reports: Bernoulli, Cell, Medicina, The Annals of Applied Probability, Stochastic Processes and their Applications, Journal de Mathématiques Pures et Appliquées, ALEA - Latin American Journal of Probability and Mathematical Statistics, ESAIM: Probability & Statistics, Journal of Theoretical Biology, Mathematical Biosciences, Journal of Physics A: Mathematical and Theoretical, Current Opinion in Systems Biology, Bioinformatics...
11.1.4 Invited talks
 N. Champagnat has been invited to give a talk at the online workshop Biostochastic Networks 2021 in November and at the online Journées Scientifiques du GE2MI sur le thème EDP et modèles biomathématiques in December.
 V. Hass has been invited to give talks at the École de la Chaire MMB in Aussois in June, at the Journées de probabilités in Guidel in June and at the Colloque des Jeunes Probabilistes et Statisticiens (JPS2021) in Saint-Pierre d’Oléron in October.
 R. Loubaton has been invited to give a talk at the Congrès des Jeunes Chercheurs en Mathématiques Appliquées 2021 (CJCMA 2021) in Palaiseau in October.
 N. Zalduendo Vidal has been invited to give a talk at the Research School of the Chaire Modélisation Mathématique et Biodiversité at Aussois in June and at the Etheridge Group Seminar at Oxford University in June.
 U. Herbach has been invited to give a talk at the Biohasard 2021 online conference in June.
11.1.5 Leadership within the scientific community
 A. Gégout-Petit is vice-president of the European Network for Business and Industrial Statistics (ENBIS).
11.1.6 Scientific expertise
 C. Fritsch has been a member of the Committee for junior permanent research positions of Inria Nancy - Grand Est.
 A. Gégout-Petit has been a member of several hiring committees: as President for Université Technologique de Compiègne (MCF, 26th section); Sorbonne Université (PR, 26th section); the national jury for 46.1 Professor recruitment; University of Luxembourg (Assistant Professor in statistics).
11.1.7 Research administration
 N. Champagnat is a member of the coordination committee of MODCOV19, a platform for coordinating research actions on the modeling of the SARS-CoV-2 (COVID-19) pandemic. He heads the bibliographic awareness group.
 N. Champagnat is a member of the Comité de Centre, the COMIPERS and the Commission Information Scientifique et Technique of Inria Nancy - Grand Est, and Responsable Scientifique for the library of Mathematics of the IECL. He is also the local correspondent of the COERLE (Comité Opérationnel d'Évaluation des Risques Légaux et Éthiques) for the Inria Research Center of Nancy - Grand Est.
 C. Fritsch is a member of the Commission du Développement Technologique of Inria Nancy - Grand Est and of the Commission du personnel of IECL. She was a member of the Commission Parité-Égalité of IECL until August. She is the local Radar correspondent for the Inria Research Center of Nancy - Grand Est.
 A. GégoutPetit is the head of “Institut Élie Cartan de Lorraine” (mathematics laboratory of Université de Lorraine).
11.2 Teaching - Supervision - Juries
11.2.1 Teaching
BIGS faculty members have teaching obligations at Université de Lorraine and teach at least 192 hours each year. They teach probability and statistics at different levels (Licence, Master, Engineering school). Many of them have pedagogical responsibilities.
 D. Villemonais is the head of the Mathematical Engineering Major of ENSMN, Université de Lorraine, France.
 T. Bastogne is in charge of the research master program “Santé Numérique et Imagerie Médicale” with the Faculty of Medicine, Université de Lorraine, France.
 Master: N. Champagnat, Introduction to Quantitative Finance, 12h, M1, second year of ENSMN, Université de Lorraine, France.
 Master: N. Champagnat, Introduction to Quantitative Finance, 9h, M2, third year of ENSMN, Université de Lorraine, France.
 Master: N. Champagnat, Problèmes inverses, 15h, M1, second year of ENSMN, Université de Lorraine, France.
 Master: S. Ferrigno, Experimental designs, 4.5h, M1, fourth year of EEIGM, Université de Lorraine, France.
 Master: S. Ferrigno, Data analyzing and mining, 63h, M1, second year of ENSMN, Université de Lorraine, France.
 Master: S. Ferrigno, Modeling and forecasting, 43h, M1, second year of ENSMN, Université de Lorraine, France.
 Master: S. Ferrigno, Training projects, 18h, M1/M2, second and third year of ENSMN, Université de Lorraine, France.
 Master: A. Muller-Gueudin, Probability and Statistics, 160h, second year of ENSEM and ENSAIA, Université de Lorraine, France.
 Master: A. Muller-Gueudin, Scientific calculation with Matlab, 20h, second year of ENSAIA, Université de Lorraine, France.
 Master: A. Gégout-Petit, Statistics, modeling, data analysis, 80h, master in applied mathematics, Université de Lorraine, France.
 Master: U. Herbach, Data analyzing and mining tutorial, 18h, M1, second year of ENSMN, Université de Lorraine, France.
 Master: U. Herbach, Introduction to probability theory, 18h, M1, second year of ENSEM (apprenticeship cursus), Université de Lorraine, France.
 Master: R. Loubaton, Analyse de données, 16h, M1, second year of ENSMN, Université de Lorraine, France.
 Master: R. Loubaton, Introduction à l'apprentissage automatique, 6h, M1, second year of ENSMN, Université de Lorraine, France.
 Master: S. Wantz-Mézières, Learning and analysis of medical data, 36h, with J.M. Moureaux, Master SNIM, Université de Lorraine, France.
 Master: D. Villemonais, Probability Theory II, 63h, M1, second year of ENSMN, Université de Lorraine, France.
 Master: D. Villemonais, Stochastic processes, 32h, Master 2 MFA, Université de Lorraine, France.
 Master: D. Villemonais, Modeling and forecasting, 14h, M1, second year of ENSMN, Université de Lorraine, France.
 Licence: S. Wantz-Mézières, Applied mathematics for management, financial mathematics, Probability and Statistics, 160h, IUT Nancy-Charlemagne (L1/L2/L3), Université de Lorraine, France.
 Licence: S. Wantz-Mézières, Probability, 100h, first year in TELECOM Nancy (initial and apprenticeship cursus), Université de Lorraine, France.
 Licence: A. Muller-Gueudin, Statistics, 60h, first year of ENSAIA, Université de Lorraine, France.
 Licence: S. Ferrigno, Descriptive and inferential statistics, 60h, L2, second year of EEIGM, Université de Lorraine, France.
 Licence: S. Ferrigno, Statistical modeling, 60h, L2, second year of EEIGM, Université de Lorraine, France.
 Licence: S. Ferrigno, Mathematical and computational tools, 20h, L3, third year of EEIGM, Université de Lorraine, France.
 Licence: S. Ferrigno, Training projects, 40h, L1/L3, first, second and third year of EEIGM, Université de Lorraine, France.
 Licence: C. Fritsch, Probability Theory tutorial, 40h, L3, first year of ENSMN, Université de Lorraine, France.
 Licence: V. Hass, Complément d'analyse, 38h, L1, FST, Université de Lorraine, France.
 Licence: V. Hass, Analyse numérique et optimisation, 46h, L3, first year of ENSMN, Université de Lorraine, France.
 Licence: V. Hass, Probabilités, 40h, L3, first year of ENSMN, Université de Lorraine, France.
 Licence: V. Hass, Mathématiques FIGIM 1A, 35h, L1/L2, first year of ENSMN, Université de Lorraine, France.
 Licence: V. Hass, Mathématiques FIGIM 2A, 21h, L2, first year of ENSMN, Université de Lorraine, France.
 Licence: U. Herbach, Statistics tutorial, 39h, L3, first year of ENSMN, Université de Lorraine, France.
 Licence: R. Loubaton, Inférence statistique, 21h, L3, first year of ENSMN, Université de Lorraine, France.
 Licence: R. Loubaton, Probabilités, 20h, L2, FST, Université de Lorraine, France.
 Licence: R. Loubaton, Méthodes Numériques, 10h, L2, FST, Université de Lorraine, France.
 Licence: R. Loubaton, LaTeX, 9h, L2, FST, Université de Lorraine, France.
 Licence: R. Loubaton, Remédiation mathématique, 30h, L3, first year of ENSMN, Université de Lorraine, France.
 Licence: R. Loubaton, Analyse numérique et optimisation, 40h, L3, first year of ENSMN, Université de Lorraine, France.
 Licence: D. Villemonais, Probability Theory, 57h, L3, first year of ENSMN, Université de Lorraine, France.
 Licence: N. Zalduendo Vidal, Probability Theory tutorial, 40h, L3, first year of ENSMN, Université de Lorraine, France.
 Licence: N. Zalduendo Vidal, Numerical Analysis tutorial, 20h, L3, first year of ENSMN, Université de Lorraine, France.
11.2.2 Supervision
PhD
 PhD: Nassim Sahki, “Data-driven methodology for sequential change-point detection for physiological signals”, grant Inria-Cordis. Defence 29 Nov 2021. Advisors: A. Gégout-Petit, S. Wantz-Mézières 23.
 PhD in progress: Vincent Hass, “Individual-based models in adaptive dynamics and long time evolution under assumptions of rare advantageous mutations”, grant Inria-Cordis. Advisor: N. Champagnat.
 PhD in progress: Rodolphe Loubaton, “Caractérisation des cibles thérapeutiques dans un programme génique tumoral”, grant Région Grand Est. Advisors: N. Champagnat and L. Vallat (CHRU Strasbourg).
 PhD in progress: Anouk Rago, “Inférence de réseaux de gènes dynamiques et prédiction d’expériences d’interventions biologiques dans des cellules cancéreuses”, grant Région Grand Est, Inria. Advisors: N. Champagnat, A. Gégout-Petit.
 PhD in progress: Nino Vieillard, "Approximate Dynamic Programming and Deep Reinforcement Learning", CIFRE with Google Brain. Advisors: B. Scherrer, M. Geist (Google Brain).
 PhD in progress: Nicolás Zalduendo Vidal, “Processus de branchement bisexués multitypes”, grant Inria-Cordis. Advisors: C. Fritsch, D. Villemonais.
Other
 Parcours Recherche: Asmaa Labtaina, “Processus de Markov déterministes par morceaux et leur application à l’expression stochastique des gènes” (fullyear research project, M1 ENSMN). Advisor: U. Herbach.
 TER: Mohammed Khatbane and Abdelkabir Bouyghf, “Méthodes variationnelles en apprentissage statistique : l’exemple du modèle Latent Dirichlet Allocation” (research project, M1 Univ. Lorraine). Advisor: U. Herbach.
11.2.3 Juries
 PhD: N. Champagnat, reporter, thesis of Maxime Berger, “Le comportement critique de la quasiespèce”, Université PSL.
 PhD: N. Champagnat, reporter, thesis of Felipe MunozHernandez, “Approximation quantitative en grande population de modèles stochastiques avec interaction ou environnement variable”, Institut Polytechnique de Paris.
 PhD: N. Champagnat, reporter, thesis of Julie Tourniaire, “Spatial dynamics of interfaces in ecology: deterministic and stochastic models”, Institut Polytechnique de Paris.
 PhD: A. Gégout-Petit, president, thesis of A. Conanec Rago, Université de Bordeaux.
 PhD: A. Gégout-Petit, president, thesis of S. Yacheur, Université de Lorraine.
 PhD Prize: A. Gégout-Petit, jury member, 2021 AMIES PhD Prize.
 PhD: B. Scherrer, reporter, thesis of Yannis FletBerliac, “SampleEfficient Deep Reinforcement Learning for Control, Exploration and Safety”, Université de Lille.
 PhD: B. Scherrer, reporter, thesis of Marc Etheve, “Using machine learning to solve repeated optimization problems”, CNAM Paris (CIFRE with EDF).
11.3 Popularization
11.3.1 Education
 S. Ferrigno: Advisor of a group of students (EEIGM), "Traitement statistique de données" project, various high schools, Nancy, 2021
 S. Ferrigno: Advisor of a group of students (EEIGM), "La main à la Pâte" project, Institut médico-éducatif (IME), Commercy, October-November 2021
 S. Mézières: organisation of a research training week on Neuro-oncology and Numerics, for medical and engineering students, January 2021
11.3.2 Interventions
 C. Fritsch made two interventions in the Lycée Cormontaigne in Metz, as part of the “Chiche!” program, in December 2021.
12 Scientific production
12.1 Publications of the year
International journals
 1 articlecvmgof: an R package for Cramérvon Mises goodnessoffit tests in regression models.Journal of Statistical Computation and SimulationOctober 2021
 2 articleA statistical methodology to select covariates in highdimensional data under dependence. Application to the classification of genetic profiles in oncology.Journal of Applied Statistics2021, 23
 3 articleiQbD: a TRLindexed qualitybydesign paradigm for medical device engineering.Journal of Medical Devices2021
 4 articleStochastic approximation of quasistationary distributions for diffusion processes in a bounded domain.Annales de l'Institut Henri Poincaré (B) Probabilités et Statistiques5722021, 726739
 5 articleStochastic analysis of emergence of evolutionary cyclic behavior in population dynamics with transfer.Annals of Applied Probability3142021, 18201867
 6 articleAnalysis of distributed systems via quasistationary distributions.Stochastic Analysis and Applications3662021, 981998
 7 articleConvergence of the FlemingViot process toward the minimal quasistationary distribution.ALEA : Latin American Journal of Probability and Mathematical Statistics182021, 115
 8 articleLyapunov criteria for uniform convergence of conditional distributions of absorbed Markov processes.Stochastic Processes and their Applications135May 2021, 5174
 9 articleStochastic Methods for Neutron Transport Equation III: Generational many-to-one and ${k}_{\mathtt{eff}}$.SIAM Journal on Applied Mathematics813May 2021
 10 articleIdentifying conversion efficiency as a key mechanism underlying food webs evolution: a step forward, or backward?Oikos13062021, 904930
 11 articleSeroprevalence of SARSCoV2, Symptom Profiles and SeroNeutralization in a Suburban Area, France.Viruses136June 2021, 1076
 12 articleTransverse isotropic modelling of leftventricle passive filling: mechanical characterization for epicardial biomaterial manufacturing.Journal of the mechanical behavior of biomedical materials119July 2021, 104492
 13 articleStreaming constrained binary logistic regression with online standardized data.Journal of Applied Statistics2021
 14 articleConstruction of Parsimonious Event Risk Scores by an Ensemble Method. An Illustration for ShortTerm Predictions in Chronic Heart Failure Patients from the GISSIHF Trial.Applied Mathematics127July 2021, 627653
 15 articleWidening the scope of an eigenvector stochastic approximation process and application to streaming PCA and related methods.Journal of Multivariate Analysis182March 2021, 104694
 16 articleWhat matters to patients? A mixed method study of the importance and consideration of oncology patient demands.BMC Health Services Research212021, 256
International peerreviewed conferences
 17 inproceedingsOffline Reinforcement Learning with Pseudometric Learning.38th International Conference on Machine Learning139virtual, FranceJune 2021
 18 inproceedingsOffline Reinforcement Learning as AntiExploration.36th AAAI Conference on Artificial IntelligenceVancouver, CanadaFebruary 2022
 19 inproceedingsDetection of breaks in EMG signals of upper trapezius muscle activity.JDS 2021  52èmes Journées de Statistique de la SFdSNice / Virtual, FranceJune 2021
Conferences without proceedings
 20 inproceedingseasyQBD: A quality by design SaaS platform. Application to the development of lipid nanoparticles for mRNA delivery..6th Bioproduction CongressLyon, FranceSeptember 2021
 21 inproceedingsTuning LNPs to target antigen presenting cells in spleen induces CD8 Tcell responses and tumor regression in mice.18th CIMT Annual MeetingMainz, GermanyMay 2021
 22 inproceedingsRationally designed mRNAloaded lipid nanoparticles provoke strong antitumor T cell immunity which critically depends on specific immune cell subsets.Annual Meeting of the Controlled Release Society, CRS 2021Virtual, United StatesJuly 2021
Doctoral dissertations and habilitation theses
 23 thesisDatadriven methodology for sequential changepoint detection for physiological signals.Université de Lorraine; École doctorale IAEM Lorraine  Informatique, Automatique, Électronique  Électrotechnique, Mathématiques de LorraineNovember 2021
Reports & preprints
 24 miscDegenerate processes killed at the boundary of a domain.2021
 25 miscTranscritical bifurcation for the conditional distribution of a diffusion process.December 2021
 26 miscUniversality of cell differentiation trajectories revealed by a reconstruction of transcriptional uncertainty landscapes from singlecell transcriptomic data.February 2021
 27 miscSeroprevalence of SARSCoV2, symptom profiles and seroneutralization during the first COVID19 wave in a suburban area, France.June 2021
 28 miscGene regulatory network inference from singlecell data using a selfconsistent proteomic field.October 2021
 29 miscFluctuations of balanced urns with infinitely many colours.November 2021
 30 miscConstruction and update of an online ensemble score involving linear discriminant analysis and logistic regression.February 2021
Other scientific publications
 31 inproceedingsSemiparametric reference curves for EDEN cohort.CMStatistics 2021Londres, United KingdomDecember 2021
12.2 Other
Educational activities
 32 unpublishedCours d'analyse des données et apprentissage : L'analyse en composantes principales.April 2021, MasterFrance
 33 unpublishedMéthodes de classification non supervisée.April 2021, MasterFrance
Softwares
 34 softwareHarissa: tools for mechanistic gene network inference from singlecell data.October 2021BSD 3Clause "New" or "Revised" License
12.3 Cited publications
 35 article. Non-Parametric Estimation of the Conditional Distribution of the Inter-jumping Times for Piecewise-Deterministic Markov Processes. Scandinavian Journal of Statistics 41(4), December 2014, 950-969
 36 software. cvmgof: Cramer-von Mises goodness-of-fit tests. Version 1.0.0, November 2018, CeCILL
 37 article. Optimal choice among a class of nonparametric estimators of the jump rate for piecewise-deterministic Markov processes. Electronic Journal of Statistics, 2016
 38 article. A recursive nonparametric estimator for the transition kernel of a piecewise-deterministic Markov process. ESAIM: Probability and Statistics 18, 2014, 726-749
 39 inproceedings. Nonparametric estimation of the jump rate for non-homogeneous marked renewal processes. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques 49(4), Institut Henri Poincaré, 2013, 1204-1231
 40 incollection. Semiparametric estimation of the long-range dependence parameter: a survey. Theory and applications of long-range dependence, Birkhäuser Boston, 2003, 557-577
 41 article. Identification of pharmacokinetics models in the presence of timing noise. Eur. J. Control 14(2), 2008, 149-157. URL: http://dx.doi.org/10.3166/ejc.14.149157
 42 article. Phenomenological modeling of tumor diameter growth based on a mixed effects model. Journal of Theoretical Biology 262(3), 2010, 544-552
 43 book. Neuro-Dynamic Programming. Athena Scientific, 1996
 44 article. Multi-operator Scaling Random Fields. Stochastic Processes and their Applications 121(11), MAP5 2011-01, 2011, 2642-2677
 45 article. A fast and recursive algorithm for clustering large datasets with k-medians. Computational Statistics & Data Analysis 56(6), 2012, 1434-1449
 46 article. Competitive or weak cooperative stochastic Lotka-Volterra systems conditioned on non-extinction. Journal of Mathematical Biology 60(6), 2010, 797-829
 47 article. Exponential convergence to quasi-stationary distribution and Q-process. Probability Theory and Related Fields 164(1), 46 pages, 2016, 243-283
 48 article. Simulation and identification of the fractional Brownian motion: a bibliographical and comparative study. Journal of Statistical Software 5, 2000, 1-53
 49 article. Piecewise-deterministic Markov processes: A general class of non-diffusion stochastic models. Journal of the Royal Statistical Society. Series B (Methodological), 1984, 353-388
 50 article. Rough Volterra equations. I. The algebraic integration setting. Stoch. Dyn. 9(3), 2009, 437-477. URL: http://dx.doi.org/10.1142/S0219493709002737
 51 article. Statistical estimation of a growth-fragmentation model observed on a genealogical tree. Bernoulli 21(3), 2015, 1760-1799
 52 article. Un test d'adéquation global pour la fonction de répartition conditionnelle. C. R. Math. Acad. Sci. Paris 341(5), 2005, 313-316. URL: http://dx.doi.org/10.1016/j.crma.2005.07.003
 53 article. Uniform law of the logarithm for the local linear estimator of the conditional distribution function. C. R. Math. Acad. Sci. Paris 348(17-18), 2010, 1015-1019. URL: http://dx.doi.org/10.1016/j.crma.2010.08.003
 54 article. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3), 2008, 432-441
 55 article. Graph selection with GGMselect. Statistical Applications in Genetics and Molecular Biology 11(3), 2012
 56 inproceedings. Lower Bounds for Howard's Algorithm for Finding Minimum Mean-Cost Cycles. ISAAC (1), 2010, 415-426
 57 article. From persistent random walk to the telegraph noise. Stoch. Dyn. 10(2), 2010, 161-196. URL: http://dx.doi.org/10.1142/S0219493710002905
 58 incollection. Modeling subtilin production in Bacillus subtilis using stochastic hybrid systems. Hybrid Systems: Computation and Control, Springer, 2004, 417-431
 59 article. Multinomial model-based formulations of TCP and NTCP for radiotherapy treatment planning. Journal of Theoretical Biology 279(1), June 2011, 55-62. URL: http://hal.inria.fr/hal00588935/en
 60 book. Quantile regression. 38, Cambridge University Press, 2005
 61 book. Statistical inference for ergodic diffusion processes. Springer Series in Statistics, London, Springer-Verlag London Ltd., 2004, xiv+481
 62 article. Real Harmonizable Multifractional Lévy Motions. Ann. Inst. Poincaré. 40(3), 2004, 259-277
 63 inproceedings. Convergence d'un score d'ensemble en ligne : étude empirique. 52e Journées de Statistique, Société Française de Statistique, Nice, France, July 2020
 64 incollection. On the Benzecri's method for computing eigenvectors by stochastic approximation (the case of binary data). Compstat 1974 (Proc. Sympos. Computational Statist., Univ. Vienna, Vienna, 1974), Vienna, Physica Verlag, 1974, 202-211
 65 inproceedings. Non-Stationary Approximate Modified Policy Iteration. ICML 2015, Lille, France, July 2015
 66 book. System control and rough paths. Oxford mathematical monographs, Clarendon Press, 2002. URL: http://books.google.com/books?id=H9fRQNIngZYC
 67 article. High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 2006, 1436-1462
 68 article. Approximation stochastique en analyse factorielle multiple. Ann. I.S.U.P. 50(3), 2006, 27-45
 69 article. Convergence d'un processus d'approximation stochastique en analyse factorielle. Publ. Inst. Statist. Univ. Paris 38(1), 1994, 37-55
 70 article. Stochastic approximation of the factors of a generalized canonical correlation analysis. Statist. Probab. Lett. 78(14), 2008, 2210-2216. URL: http://dx.doi.org/10.1016/j.spl.2008.01.088
 71 article. On non-parametric estimates of density functions and regression curves. Theory of Probability & Its Applications 10(1), 1965, 186-190
 72 techreport. The simplex method is strongly polynomial for deterministic Markov decision processes. arXiv:1208.5083v2, 2012
 73 book. Markov Decision Processes. Wiley, New York, 1994
 74 inproceedings. Brownian penalisations related to excursion lengths, VII. Annales de l'IHP Probabilités et statistiques 45(2), 2009, 421-452
 75 incollection. Elements of stochastic calculus via regularization. Séminaire de Probabilités XL, 1899, Lecture Notes in Math., Berlin, Springer, 2007, 147-185. URL: http://dx.doi.org/10.1007/9783540711896_7
 76 article. Stochastic calculus with respect to continuous finite quadratic variation processes. Stochastics: An International Journal of Probability and Stochastic Processes 70(1-2), 2000, 1-40
 77 inproceedings. Approximate Policy Iteration Schemes: A Comparison. ICML - 31st International Conference on Machine Learning - 2014, Beijing, China, June 2014
 78 article. Approximate Modified Policy Iteration and its Application to the Game of Tetris. Journal of Machine Learning Research 16, to appear, 2015, 1629-1676
 79 article. Improved and Generalized Upper Bounds on the Complexity of Policy Iteration. Mathematics of Operations Research, Markov decision processes; Dynamic Programming; Analysis of Algorithms, February 2016
 80 inproceedings. On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes. NIPS 2012 - Neural Information Processing Systems, South Lake Tahoe, United States, December 2012
 81 article. Performance Bounds for Lambda Policy Iteration and Application to the Game of Tetris. Journal of Machine Learning Research 14, January 2013, 1175-1221
 82 article. Memory-based persistence in a counting random walk process. Phys. A. 386(1), 2007, 303-307. URL: http://dx.doi.org/10.1016/j.physa.2007.08.027
 83 article. The range of a simple random walk on Z. Advances in Applied Probability, 1996, 1014-1033
 84 misc. An introduction to network inference and mining. (accessed 22/07/2015), 2015. URL: http://www.nathalievilla.org/doc/pdf//wikistatnetwork_compiled.pdf
 85 article. The Simplex and Policy-Iteration Methods Are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate. Math. Oper. Res. 36(4), 2011, 593-603