cvmgof: an R package for Cramér-von Mises goodness-of-fit tests in regression models

BIGS Biology, genetics and statistics

Computational Biology

Digital Health, Biology and Earth

http://team.inria.fr/bigs

Institut Elie Cartan de Lorraine (IECL), CNRS, Université de Lorraine

Creation of the Project-Team: 2011 January 01

Project-Team

Keywords:

A3.1. - Data; A3.2. - Knowledge; A3.2.3. - Inference; A3.3. - Data and knowledge analysis; A3.3.1. - On-line analytical processing; A3.3.2. - Data mining; A3.3.3. - Big data analysis; A3.4.1. - Supervised learning; A3.4.2. - Unsupervised learning; A3.4.4. - Optimization and learning; A3.4.7. - Kernel methods; A6. - Modeling, simulation and control; A6.1. - Methods in mathematical modeling; A6.1.2. - Stochastic Modeling; A6.2. - Scientific computing, Numerical Analysis & Optimization; A6.2.3. - Probabilistic methods; A6.2.4. - Statistical methods; A6.4. - Automatic control; A6.4.2. - Stochastic control

B1. - Life sciences; B1.1. - Biology; B1.1.2. - Molecular and cellular biology; B1.1.10. - Systems and synthetic biology; B1.1.11. - Plant Biology; B2.2. - Physiology and diseases; B2.2.1. - Cardiovascular and respiratory diseases; B2.2.3. - Cancer; B2.3. - Epidemiology; B2.4. - Therapies

Inria center: Nancy - Grand Est

Research scientists

Nicolas Champagnat (Team leader, Inria, Senior Researcher, HDR)
Coralie Fritsch (Inria, Researcher)
Ulysse Herbach (Inria, Researcher)
Bruno Scherrer (Inria, Researcher, HDR)

Faculty members

Thierry Bastogne (Univ de Lorraine, Associate Professor, HDR)
Sandie Ferrigno (Univ de Lorraine, Associate Professor)
Anne Gégout Petit (Univ de Lorraine, Professor, HDR)
Jean-Marie Monnez (Univ de Lorraine, Emeritus, HDR)
Aurélie Muller-Gueudin (Univ de Lorraine, Associate Professor)
Sophie Mézières (Univ de Lorraine, Associate Professor)
Pierre Vallois (Univ de Lorraine, Emeritus, HDR)
Denis Villemonais (Univ de Lorraine, Associate Professor, HDR)

Post-doctoral fellows

Leo Darrigade (Inria, from Apr 2021)
Joseph Lam-Weil (Univ de Lorraine, from Jun 2021)
William Ocafrain (Inria)

PhD students

Vincent Hass (Univ de Lorraine)
Rodolphe Loubaton (Univ de Lorraine, ATER)
Anouk Rago (Univ de Lorraine, from Oct 2021)
Nassim Sahki (Univ de Lorraine)
Nino Vieillard (Google, CIFRE)
Nicolás Zalduendo Vidal (Inria)

Technical staff

Joseph Lam-Weil (Univ de Lorraine, Engineer, from Apr 2021 until Jun 2021)
Nicolas Thorr (Inria, Engineer, until Jun 2021)

Administrative assistant

Emmanuelle Deschamps (Inria)

Overall objectives

BIGS is a joint team of Inria, CNRS and Université de Lorraine, via the Institut Élie Cartan (UMR 7502 CNRS-UL), a laboratory in mathematics of which Inria is a strong partner. One member of BIGS, T. Bastogne, comes from the Research Center of Automatic Control of Nancy (CRAN), with which BIGS has strong relations in the domain "Health-Biology-Signal". Our research mainly focuses on stochastic modeling and statistics, but it also aims at a better understanding of biological systems. BIGS involves applied mathematicians whose research interests mainly concern probability and statistics. More precisely, our attention is directed to (1) stochastic modeling, (2) estimation and control for stochastic processes, (3) algorithms and estimation for graph data and (4) regression and machine learning. The main objective of BIGS is to exploit these skills in applied mathematics to provide a better understanding of issues arising in life sciences, with a special focus on (1) tumor growth, (2) photodynamic therapy, (3) population studies of genomic data and of micro-organism genomics, and (4) epidemiology and e-health.

Research program

Introduction

We give here the main lines of our research, which belongs to the domains of probability and statistics. For clarity, we chose to structure them in four items. Although this choice was not arbitrary, the boundaries between these items are sometimes fuzzy, because each of them deals with modeling and inference and they are all interconnected.

Stochastic modeling

Our aim is to propose relevant stochastic frameworks for the modeling and understanding of biological systems. Stochastic processes are particularly suitable for this purpose. Among them, Markov chains give a first framework for modeling populations of cells 83, 59. Piecewise deterministic processes are non-diffusion processes that are also frequently used in the biological context 49, 58, 51. Among Markov models, we have developed strong expertise in processes derived from Brownian motion and stochastic differential equations 76, 57. For instance, knowledge about Brownian or random walk excursions 82, 74 helps to analyse genetic sequences and to develop inference about them. However, nature provides many examples of systems whose observed signal has a given Hölder regularity that does not correspond to the one expected from a system driven by ordinary Brownian motion.

This situation is commonly handled by noisy equations driven by Gaussian processes such as fractional Brownian motion or fractional fields. The basic aspects of these differential equations are now well understood, mainly thanks to the so-called rough paths tools 66, but also to the Russo-Vallois integration techniques 75. The specific issue of Volterra equations driven by fractional Brownian motion, which is central to the problem of subdiffusion within proteins, is addressed in 50. Many generalizations of this model (Gaussian or not) have recently been proposed: for Gaussian locally self-similar fields, for non-Gaussian models 62, and for anisotropic models 44.
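The following is standard background, not a result of the team. The covariance of fractional Brownian motion $B^H$ with Hurst index $H \in (0,1)$ makes the link with Hölder regularity explicit. The variance of an increment scales as $|t-s|^{2H}$, so sample paths are a.s. Hölder continuous of every order strictly below $H$, and $H = 1/2$ recovers ordinary Brownian motion.

```latex
\mathbb{E}\left[B^H_s B^H_t\right] = \tfrac{1}{2}\left(s^{2H} + t^{2H} - |t-s|^{2H}\right), \quad s, t \ge 0,
\qquad
\mathbb{E}\left[\left(B^H_t - B^H_s\right)^2\right] = |t-s|^{2H}.
```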

Estimation and control for stochastic processes

We develop inference methods for the stochastic processes that we use for modeling. Control of stochastic processes is also a way to optimise the administration (dose, frequency) of therapies.

There are many techniques for estimating diffusion processes or the coefficients of fractional or multifractional Brownian motion from a set of observations 61, 40, 48. However, the inference problem for diffusions driven by a fractional Brownian motion is still in its infancy. Our team has solid expertise in the inference of the jump rate and the kernel of piecewise-deterministic Markov processes (PDMP) 39, 35, 38, 37, but many directions remain to be explored. For instance, previous work assumed complete observation of the jumps and modes, which is unrealistic in practice. We tackle the problem of inference of “hidden PDMP”. For example, in pharmacokinetic modeling, we want to account for the presence of timing noise and to identify models from longitudinal data. We have expertise on these subjects 41, and we have also used mixed models to estimate tumor growth 42.

We consider the control of stochastic processes within the framework of Markov Decision Processes 73 and their generalization known as multi-player stochastic games, with a particular focus on infinite-horizon problems. In this context, we are interested in the complexity analysis of standard algorithms, as well as in the proposition and analysis of approximate numerical schemes for large problems in the spirit of 43. Regarding complexity, a central research topic is the analysis of the Policy Iteration algorithm, which has seen significant progress in recent years 85, 72, 56, 79 but is still not fully understood. For large problems, we have long experience in the sensitivity analysis of approximate dynamic programming algorithms for Markov Decision Processes 81, 80, 77, 65, 78, and we are currently investigating whether and how similar ideas may be adapted to multi-player stochastic games.
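To make the object of this complexity analysis concrete, here is a minimal sketch of textbook Policy Iteration for a discounted finite MDP. The three-state MDP, the discount factor 0.9 and the tie-breaking tolerance are hypothetical choices for illustration. The sketch is not a statement about the team's analyses.

```python
"""Policy Iteration on a small, hypothetical discounted MDP."""

# P[s][a] lists the transition probabilities to each next state and
# R[s][a] is the one-step reward (toy values, for illustration only).
P = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # state 0: action 0 stays, action 1 moves right
    [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],  # state 1
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],  # state 2: both actions stay
]
R = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0]]


def q_value(P, R, V, s, a, gamma):
    """One-step lookahead value of taking action a in state s."""
    return R[s][a] + gamma * sum(p * V[t] for t, p in enumerate(P[s][a]))


def evaluate_policy(P, R, policy, gamma, tol=1e-12):
    """Iterative (Gauss-Seidel) solution of V = R_pi + gamma * P_pi V."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            v = q_value(P, R, V, s, policy[s], gamma)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V


def policy_iteration(P, R, gamma):
    """Alternate policy evaluation and greedy improvement until the policy is stable."""
    policy = [0] * len(P)
    n_iter = 0
    while True:
        n_iter += 1
        V = evaluate_policy(P, R, policy, gamma)
        new_policy = []
        for s in range(len(P)):
            q = [q_value(P, R, V, s, a, gamma) for a in range(len(P[s]))]
            best = max(range(len(q)), key=q.__getitem__)
            # Keep the current action on (near-)ties so that the loop terminates.
            if q[policy[s]] >= q[best] - 1e-9:
                best = policy[s]
            new_policy.append(best)
        if new_policy == policy:
            return policy, V, n_iter
        policy = new_policy


policy, V, n_iter = policy_iteration(P, R, gamma=0.9)
print(policy, [round(v, 6) for v in V], n_iter)  # [1, 1, 0] [8.1, 9.0, 10.0] 3
```

The number of improvement steps (`n_iter`) is the quantity whose worst-case growth complexity analyses of Policy Iteration try to bound.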

Algorithms and estimation for graph data

A graph data structure consists of a set of nodes, together with a set of pairs of these nodes called edges. This type of data is frequently used in biology because it provides a mathematical representation of many concepts, such as biological structures and networks of relationships in a population. The group has recently focused some attention on modeling and inference for graph data.

Network inference is the process of making inference about the link between two variables, taking into account the information about other variables. Reference 84 gives a very good introduction to network inference and mining, with many references. Many methods are available to infer and test edges in Gaussian graphical models 84, 67, 54, 55. However, the Gaussian assumption does not hold for typical “zero-inflated” abundance data, and we want to develop inference for this case.
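For reference, edges are usually read off a Gaussian graphical model as follows; the notation $\Sigma$, $\Omega$ is introduced here. Suppose $X = (X_1, \dots, X_p)$ is Gaussian with covariance $\Sigma$ and precision matrix $\Omega = \Sigma^{-1}$. Then the partial correlation of $X_i$ and $X_j$ given all other variables, and the absence of the edge $(i, j)$, are characterized by:

```latex
\rho_{ij \mid \mathrm{rest}} = -\frac{\Omega_{ij}}{\sqrt{\Omega_{ii}\,\Omega_{jj}}},
\qquad
X_i \perp\!\!\!\perp X_j \mid X_{-\{i,j\}} \iff \Omega_{ij} = 0 .
```

Zero-inflated abundance data break the Gaussian link between $\Omega$ and conditional independence, which is why these tools cannot be applied directly.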

Among graphs, trees play a special role because they offer a good model for many biological concepts, from RNA to phylogenetic trees through plant structures. Our research deals with several aspects of tree data. In particular, we work on statistical inference for this type of data under a given stochastic model. We also work on lossy compression of trees via directed acyclic graphs. These methods enable us to compute distances between tree data faster than from the original structures, with high accuracy.
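As a point of comparison, lossy schemes usually start from exact (lossless) DAG compression, where isomorphic subtrees are merged into a single vertex. The sketch below shows only that exact baseline for unordered rooted trees. The lossy relaxation and the distance computations are not shown, and the list-of-children encoding and the test tree are hypothetical choices.

```python
"""Exact DAG compression of an unordered rooted tree (hash-consing)."""
from typing import Dict, List, Tuple

Tree = List["Tree"]  # a node is represented by the list of its children


def size(tree: Tree) -> int:
    """Number of nodes in the tree."""
    return 1 + sum(size(child) for child in tree)


def compress(tree: Tree) -> Tuple[int, Dict[Tuple[int, ...], int]]:
    """Map every subtree to a DAG vertex; isomorphic subtrees share a vertex.

    Returns the id of the root vertex and a table mapping the sorted tuple of
    children ids (a canonical signature of the subtree) to its vertex id.
    Each key also lists the outgoing edges of that vertex, with multiplicity.
    """
    table: Dict[Tuple[int, ...], int] = {}

    def visit(node: Tree) -> int:
        # Children are unordered, so sorting their ids gives a canonical key.
        key = tuple(sorted(visit(child) for child in node))
        if key not in table:
            table[key] = len(table)
        return table[key]

    return visit(tree), table


def complete_binary(depth: int) -> Tree:
    """Hypothetical test tree: complete binary tree of the given depth."""
    return [] if depth == 0 else [complete_binary(depth - 1), complete_binary(depth - 1)]


t = complete_binary(3)
root, table = compress(t)
print(size(t), len(table))  # 15 tree nodes vs 4 DAG vertices
```

Highly repetitive trees shrink dramatically, which is what makes computations on the DAG cheaper than on the original tree.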

Regression and machine learning

Regression models and machine learning aim at inferring statistical links between a variable of interest and covariates. In biological studies, it is always important to develop adapted learning methods, both for standard data and for high-dimensional data (sometimes with few observations) and very massive or online data.

Many methods are available to estimate conditional quantiles and test dependencies 71, 60. Among them, we have developed nonparametric estimation by local analysis via kernel methods 52, 53, and we want to study the properties of this estimator in order to derive measures of risk such as confidence bands and tests. We also study other regression models, such as survival models and spatio-temporal models with covariates. For these various regression models, we want to develop omnibus tests that examine several assumptions together.
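As an illustration of local kernel estimation of conditional quantiles, here is a minimal sketch of the Nadaraya-Watson type estimator of the conditional distribution function, $\hat F(y \mid x) = \sum_i w_i(x)\,\mathbf{1}\{Y_i \le y\}$. The conditional quantile is obtained by inverting it. The Gaussian kernel, the bandwidth and the simulated data are assumptions, and the estimators of 52, 53 may differ in their details.

```python
"""Kernel (Nadaraya-Watson) estimates of a conditional CDF and quantile."""
import math
import random
from typing import List, Sequence


def gaussian_kernel(u: float) -> float:
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_weights(x: float, xs: Sequence[float], h: float) -> List[float]:
    """Normalized weights K((x - X_i)/h) / sum_j K((x - X_j)/h)."""
    w = [gaussian_kernel((x - xi) / h) for xi in xs]
    total = sum(w)
    return [wi / total for wi in w]


def conditional_cdf(y: float, x: float, xs, ys, h: float) -> float:
    """Estimate of F(y | x) = P(Y <= y | X = x)."""
    return sum(w for w, yi in zip(kernel_weights(x, xs, h), ys) if yi <= y)


def conditional_quantile(alpha: float, x: float, xs, ys, h: float) -> float:
    """Generalized inverse inf{y : F(y | x) >= alpha} of the estimated CDF."""
    cumulative = 0.0
    for yi, w in sorted(zip(ys, kernel_weights(x, xs, h))):
        cumulative += w
        if cumulative >= alpha:
            return yi
    return max(ys)  # guards against rounding when alpha is close to 1


# Hypothetical simulated data: Y = X + Gaussian noise, X uniform on [0, 1].
rng = random.Random(0)
xs = [rng.random() for _ in range(2000)]
ys = [x + rng.gauss(0.0, 0.1) for x in xs]
median_at_half = conditional_quantile(0.5, 0.5, xs, ys, h=0.05)
q90_at_half = conditional_quantile(0.9, 0.5, xs, ys, h=0.05)
print(round(median_at_half, 3), round(q90_at_half, 3))
```

Confidence bands and tests would be built on the sampling distribution of $\hat F(y \mid x)$. That distribution depends on the bandwidth `h`.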

Concerning the analysis of high-dimensional data, our view of the topic relies on the French data analysis school, specifically on factorial analysis tools. In this context, stochastic approximation is an essential tool 64, which allows one to approximate eigenvectors in a stepwise manner 69, 68, 70. BIGS aims at performing accurate classification or clustering by taking advantage of the possibility of updating the information "online" using stochastic approximation algorithms 45. We focus on several incremental procedures for regression and data analysis, such as linear and logistic regression and PCA (Principal Component Analysis).
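A stepwise eigenvector approximation can be sketched with an Oja-type recursion. Each observation $z_n$ updates the current vectors by $(I + a_n z_n z_n^\top)$, and Gram-Schmidt then restores orthonormality. This estimates the leading eigenvectors of $E[z z^\top]$ without storing the data. The step size $a_n = c/(n + n_0)$, the Euclidean metric and the simulated stream with covariance diag(4, 1, 0.25) are assumptions made for this sketch.

```python
"""Oja-type streaming estimate of the leading eigenvectors of E[z z^T]."""
import math
import random
from typing import Iterable, List, Sequence


def gram_schmidt(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Euclidean Gram-Schmidt orthonormalization, processed in order."""
    basis: List[List[float]] = []
    for v in vectors:
        w = list(v)
        for b in basis:
            proj = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        basis.append([wi / norm for wi in w])
    return basis


def oja_streaming_pca(stream: Iterable[Sequence[float]], dim: int, r: int,
                      c: float = 1.0, n0: int = 10, seed: int = 0) -> List[List[float]]:
    """For each observation z: X <- GS((I + a_n z z^T) X), with a_n = c / (n + n0)."""
    rng = random.Random(seed)
    X = gram_schmidt([[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(r)])
    for n, z in enumerate(stream):
        a_n = c / (n + n0)
        updated = []
        for x in X:
            zx = sum(zi * xi for zi, xi in zip(z, x))  # scalar z^T x
            updated.append([xi + a_n * zi * zx for xi, zi in zip(x, z)])
        X = gram_schmidt(updated)
    return X


def hypothetical_stream(n: int, seed: int = 1):
    """Gaussian vectors with covariance diag(4, 1, 0.25): eigenvectors e1, e2, e3."""
    rng = random.Random(seed)
    for _ in range(n):
        yield [2.0 * rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), 0.5 * rng.gauss(0.0, 1.0)]


X = oja_streaming_pca(hypothetical_stream(20000), dim=3, r=2)
print([[round(v, 3) for v in x] for x in X])  # close to +-e1 and +-e2
```

Each step costs $O(r \cdot \text{dim}^2)$ at most and uses only the current observation, which is what makes the approach suitable for online data.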

We also focus on the biological context of high-throughput bioassays, in which several hundreds or thousands of biological signals are measured for later analysis. We have to account for inter-individual variability within the modeling procedure. We aim at developing a new solution based on an ARX (AutoRegressive model with eXternal inputs) model structure, using the EM (Expectation-Maximisation) algorithm to estimate the model parameters.
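To fix ideas on the ARX structure itself, here is a minimal sketch of a first-order single-input ARX model, $y_t = a\,y_{t-1} + b\,u_{t-1} + e_t$, fitted to one simulated individual by ordinary least squares. This is deliberately simpler than the approach described above. The EM-based estimation and the inter-individual variability are not modeled, and the orders (1, 1), the parameter values and the binary input are hypothetical.

```python
"""Least-squares fit of a single-input ARX(1,1) model (single individual)."""
import random
from typing import List, Sequence, Tuple


def simulate_arx(a: float, b: float, u: Sequence[float], noise_sd: float,
                 rng: random.Random) -> List[float]:
    """y_t = a*y_{t-1} + b*u_{t-1} + e_t, with y_0 = 0 and e_t ~ N(0, noise_sd^2)."""
    y = [0.0]
    for t in range(1, len(u)):
        y.append(a * y[t - 1] + b * u[t - 1] + rng.gauss(0.0, noise_sd))
    return y


def fit_arx11(y: Sequence[float], u: Sequence[float]) -> Tuple[float, float]:
    """Solve the 2x2 normal equations for (a, b) by Cramer's rule."""
    s_yy = s_yu = s_uu = r_y = r_u = 0.0
    for t in range(1, len(y)):
        y_prev, u_prev = y[t - 1], u[t - 1]
        s_yy += y_prev * y_prev
        s_yu += y_prev * u_prev
        s_uu += u_prev * u_prev
        r_y += y_prev * y[t]
        r_u += u_prev * y[t]
    det = s_yy * s_uu - s_yu * s_yu
    a_hat = (r_y * s_uu - r_u * s_yu) / det
    b_hat = (s_yy * r_u - s_yu * r_y) / det
    return a_hat, b_hat


rng = random.Random(0)
u = [rng.choice((-1.0, 1.0)) for _ in range(5000)]  # hypothetical binary input
y = simulate_arx(0.8, 0.5, u, noise_sd=0.1, rng=rng)
a_hat, b_hat = fit_arx11(y, u)
print(round(a_hat, 3), round(b_hat, 3))  # close to 0.8 and 0.5
```

In a population setting, an EM algorithm would treat the individual parameters $(a_i, b_i)$ as latent variables rather than fitting each signal separately.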

Application domains

Tumor growth-oncology

On this topic, we want to propose branching processes to model the appearance of mutations in tumors, through new collaborations with clinicians who measure a particular quantity called circulating tumor DNA (ctDNA). The final purpose, which is the aim of the ITMO project, is to use ctDNA as an early biomarker of resistance to an immunotherapy treatment. Another topic is the identification of dynamic networks of gene expression. In the ongoing work on low-grade gliomas, a local database of 400 patients will soon be available to construct models. We plan to extend it through national and international collaborations (Montpellier CHU, Montreal CRHUM). Our aim is to build a decision-aid tool for personalised medicine. In the same context, we work on the clustering analysis of a brain cartography obtained by sensory stimulations during awake surgery.

Genomic data and micro-organisms population

Despite the 'G' in the name BIGS, genetics is not central to the applications of the team. However, we want to contribute to a better understanding of the correlations between genes through their expression data, and of the genetic bases of drug response and disease. We have contributed to methods detecting proteomic and transcriptomic variables linked with the outcome of a treatment.

Epidemiology and e-health

We have much work ahead in our ongoing projects on personalized medicine with CHU Nancy. These projects deal with biomarker research, the prognostic value of quantitative variables and events, scoring, and adverse events. We also want to develop our expertise in rupture (change-point) detection in a project with APHP (Assistance Publique Hôpitaux de Paris), in order to detect adverse events earlier than the clinical signs and symptoms. The clinical relevance of predictive analytics is obvious for high-risk patients, for instance those with solid organ transplantation or severe chronic respiratory disease. The main challenge is rupture detection in multivariate and heterogeneous signals (for instance daily measurements of electrocardiogram, body temperature, spirometry parameters, sleep duration, etc.). Other collaborations with clinicians concern foetopathology, where we want to use our work on conditional distribution functions to explain fetal and child growth. We have data from the "Service de foetopathologie et de placentologie" of the "Maternité Régionale Universitaire" (CHU Nancy).

Dynamics of telomeres

Telomeres are disposable buffers at the ends of chromosomes that are truncated during cell division, so that telomeres become shorter over time with each cell division. In this way, they are markers of aging. Through a collaboration with Pr A. Benetos, geriatrician at CHU Nancy, we recently obtained data on the distribution of telomere lengths in blood cells. With members of the Inria team TOSCA, we want to work in three connected directions: (1) refine the methodology for the analysis of the available data; (2) propose a dynamical model for the lengths of telomeres and study its mathematical properties (long-term behavior, quasi-stationarity, etc.); and (3) use these properties to develop new statistical methods. A postdoc position is already planned within the Lorraine Université d'Excellence (LUE) project GEENAGE (managed by CHU Nancy).

Social and environmental responsibility

We followed Inria's recommendations to get involved in the fight against COVID-19. We tried to collaborate with the LCPME laboratory with the aim of predicting the number of SARS-CoV-2 positive patients from the Grand Nancy metropolitan area at the Nancy University Hospital, based on the concentration of SARS-CoV-2 residues in waste water. We encountered difficulties in obtaining raw data, rather than pre-processed indicators, from the Obépine network. Instead, we made predictions from the incidence rates available on Santé Publique France. The predictions are available on the siwam website.

We were also involved in the MODCOV19 project, a platform coordinating research actions on the modeling of the SARS-CoV-2 (COVID-19) pandemic. In particular, we were responsible for the bibliographic awareness group of the coordination committee.

Highlights of the year

The number of permanent members of the team increased noticeably in 2021, owing to the arrival of several researchers from the former Inria team Tosca. These researchers are experts in stochastic modeling and analysis for biomedical applications. Their arrival strengthened the first axis of our research program. We are currently proposing a new Inria team, Simba, which takes into account these arrivals, the recruitments made in our team over the past few years, and more generally the research on mathematical biology at the Institut Élie Cartan de Lorraine.

New software and platforms

The team has been developing three new packages.

New software

ARMADA

Name:

A Statistical Methodology to Select Covariates in High-Dimensional Data under Dependence

Keywords:

Biostatistics, Aggregated methods, High Dimensional Data, Personalized medicine, Variable selection

Functional Description:

Two-step variable selection procedure for high-dimensional dependent data with few observations. The first step eliminates the dependence between variables (clustering of variables, followed by factor analysis within each cluster). The second step performs variable selection by aggregating adapted methods.
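The "cluster, then aggregate" structure of such a procedure can be illustrated with a toy sketch. This does not reproduce ARMADA: the factor-analysis step inside each cluster is omitted entirely. Both the correlation-threshold clustering (connected components of the graph $|\mathrm{corr}| \ge$ threshold) and the majority-vote aggregation are hypothetical stand-ins chosen for brevity.

```python
"""Toy sketch of a two-step 'cluster, then aggregate' variable selection."""
import math
from collections import Counter
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cluster_variables(columns: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Step 1 (simplified): connected components of the graph |corr| >= threshold."""
    parent = list(range(len(columns)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if abs(pearson(columns[i], columns[j])) >= threshold:
                parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for i in range(len(columns)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def aggregate_selections(selections: Sequence[Sequence[int]], min_votes: int) -> List[int]:
    """Step 2 (simplified): keep variables chosen by at least min_votes methods."""
    votes = Counter(v for selected in selections for v in set(selected))
    return sorted(v for v, count in votes.items() if count >= min_votes)


columns = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10.5], [5, 3, 4, 1, 2]]
print(cluster_variables(columns, threshold=0.9))                 # [[0, 1], [2]]
print(aggregate_selections([[0, 2], [0], [0, 3]], min_votes=2))  # [0]
```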

News of the Year:

This is a new package.

URL:

https://cran.r-project.org/web/packages/armada/

Publications:

hal-02173568, hal-02363338

Contact:

Aurélie Muller

Participants:

Aurélie Muller, Anne Gegout Petit

cvmgof

Keywords:

Regression, Test, Estimators

Scientific Description:

Many goodness-of-fit tests have been developed to assess the different assumptions of a (possibly heteroscedastic) regression model. Most of them are "directional", in that they detect departures from a given assumption of the model. Other tests are "global" (or "omnibus"), in that they assess whether a model fits a dataset on all its assumptions. cvmgof focuses on the task of choosing the structural part of the regression function, because it contains easily interpretable information about the studied relationship. It implements two nonparametric "directional" tests and one nonparametric "global" test, all based on generalizations of the Cramér-von Mises statistic.
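For reference, the classical one-sample Cramér-von Mises statistic generalized by these tests compares the empirical distribution function $F_n$ of $X_1, \dots, X_n$ with a hypothesized continuous distribution $F_0$. With order statistics $X_{(1)} \le \dots \le X_{(n)}$, it has the closed form below. The regression statistics implemented in cvmgof are different and are not reproduced here.

```latex
W_n^2 = n \int \left(F_n(x) - F_0(x)\right)^2 \, dF_0(x)
      = \frac{1}{12n} + \sum_{i=1}^{n} \left(\frac{2i - 1}{2n} - F_0\!\left(X_{(i)}\right)\right)^2 .
```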

Functional Description:

cvmgof is an R library devoted to Cramér-von Mises goodness-of-fit tests. It implements three nonparametric statistical methods based on Cramér-von Mises statistics to estimate and test a regression model.

News of the Year:

New version available on the CRAN website since January 11, 2021.

URL:

https://cran.r-project.org/web/packages/cvmgof/index.html

Publication:

hal-03101612

Contact:

Romain Azais

Participants:

Sandie Ferrigno, Marie-José Martinez, Romain Azais

Harissa

Name:

Hartree approximation for inference along with a stochastic simulation algorithm

Keywords:

Gene regulatory networks, Reverse engineering, Molecular simulation

Functional Description:

Harissa is a Python package for both inference and simulation of gene regulatory networks, based on stochastic gene expression with transcriptional bursting. It was implemented in the context of a mechanistic approach to gene regulatory network inference from single-cell data.
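The following is a minimal sketch of the basic ingredient, transcriptional bursting for a single gene, written as a piecewise-deterministic Markov process. Between bursts, the expression level decays deterministically. Bursts arrive at rate $k$ and add an exponential amount with mean $b$. In this classical model, the stationary law is Gamma with shape $k/d$ and scale $b$, hence mean $kb/d$. This is not Harissa's API or its network model, and all parameter values are hypothetical.

```python
"""Bursty expression of a single gene as a piecewise-deterministic Markov process."""
import math
import random
from typing import List


def simulate_bursty_expression(k_burst: float, decay: float, mean_burst: float,
                               t_end: float, dt_sample: float, seed: int = 0) -> List[float]:
    """Between bursts x(t) decays as exp(-decay * t); bursts arrive at rate k_burst
    and add an exponentially distributed amount with mean mean_burst.
    Returns the values of x on a regular time grid of step dt_sample."""
    rng = random.Random(seed)
    x, t = 0.0, 0.0
    samples: List[float] = []
    next_sample = 0.0
    while next_sample <= t_end:
        wait = rng.expovariate(k_burst)  # time until the next burst
        # Record the grid points reached before the next burst (deterministic decay).
        while next_sample <= t_end and next_sample < t + wait:
            samples.append(x * math.exp(-decay * (next_sample - t)))
            next_sample += dt_sample
        # Decay up to the burst time, then add the burst.
        x = x * math.exp(-decay * wait) + rng.expovariate(1.0 / mean_burst)
        t += wait
    return samples


k, d, b = 2.0, 1.0, 3.0  # hypothetical burst rate, decay rate, mean burst size
samples = simulate_bursty_expression(k, d, b, t_end=20000.0, dt_sample=0.5)
stationary = samples[200:]  # drop a burn-in period (first 100 time units)
empirical_mean = sum(stationary) / len(stationary)
print(round(empirical_mean, 2), "vs", k * b / d)
```

In a network model, the burst rate of each gene would depend on the levels of its regulators. Inference then targets those interactions.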

URL:

https://github.com/ulysseherbach/harissa

Publications:

hal-03370296, hal-03370228, hal-01646910

Contact:

Ulysse Herbach

New results

Stochastic modeling

Participants:

Nicolas Champagnat, Coralie Fritsch, Anne Gégout-Petit, Vincent Hass, Ulysse Herbach, William Oçafrain, Pierre Vallois, Denis Villemonais, Nicolás Zalduendo Vidal

Reconstruction of epigenetic landscapes from single-cell data

The aim is to better understand how living cells make decisions (e.g., the differentiation of a stem cell into a particular specialized type), viewing decision-making as an emergent property of an underlying complex molecular network. Indeed, it is now established that cells react probabilistically to their environment: cell types do not correspond to fixed states, but rather to “potential wells” of a certain energy landscape (representing the energy of the possible states of the cell), which we are trying to reconstruct. A first paper proposing a reconstruction method has been submitted 26, in the framework of an international collaboration (USA, Switzerland, France). Another paper, dealing more specifically with the inference of the underlying networks, is about to be submitted 28.

Joint work with Nan Papili Gao (ETH Zurich), Olivier Gandrillon (ENS Lyon), András Páldi (EPHE, Paris), and Rudiyanto Gunawan (University at Buffalo, New York).

Modeling and estimation of circulating tumor DNA (ctDNA) dynamics for detecting resistance to targeted therapies

Continuation of the ITMO Cancer project, supervised by Nicolas Champagnat, on the modeling of circulating tumor DNA (ctDNA) to detect the appearance of resistance to targeted therapies (personalized medicine). After investigating possible scenarios in collaboration with Alexandre Harlé of the Institute of Cancerology of Lorraine (ICL), a final model was selected. Based on a mathematical analysis, the members of the project then designed a statistical inference algorithm (learning the parameters of the model, including the genealogical tree of mutations for each patient), which is intended to be validated on real data currently being acquired at the Nancy CHRU. The general idea is to exploit a “variational principle” that allows one to explore the very large discrete space of family trees through a “pivot” space of continuous parameters that is easy to optimize (with a reasonable number of parameters). A paper detailing the model and its inference is in preparation. This method allows the reconstruction of intratumoral heterogeneity, i.e., the subclonal composition of the tumor. Based on these data, we are currently studying models of stochastic tumor growth, with an emphasis on interactions between clones, to assess the effects of different treatment strategies.

Quasi-stationary distributions

We are continuing our research on quasi-stationary distributions (QSD), that is, distributions of absorbed Markov processes that are stationary conditionally on non-absorption. For models of biological populations, absorption usually corresponds to the extinction of a (sub-)population. QSDs are fundamental tools to describe the population state before extinction and to quantify the large-time behavior of the probability of extinction.

This year, we solved a general conjecture on Fleming-Viot particle systems approximating QSDs: when several QSDs exist, the stationary distributions of the Fleming-Viot processes are expected to approach a particular QSD, called the minimal QSD. We proved in 7 that this holds true for general absorbed Markov processes with soft obstacles. We also obtained in 8 criteria based on Lyapunov functions for checking the general conditions of 47, which characterize the exponential uniform convergence in total variation of the conditional distributions of an absorbed Markov process to a unique quasi-stationary distribution. Among the various applications given, these conditions are shown to apply to any logistic Feller diffusion in any dimension, conditioned on the non-extinction of all its coordinates. This question had been left partly open since the first work of Cattiaux and Méléard on this topic 46.
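The Fleming-Viot mechanism can be sketched on a toy chain where the QSD is known exactly. The chain is a lazy simple random walk on $\{1, \dots, N\}$, absorbed at $0$ and $N+1$. Its unique QSD is proportional to $\sin(\pi k/(N+1))$, the principal eigenvector of the sub-stochastic transition matrix. Particles evolve independently, and an absorbed particle is moved onto the position of a surviving one. The time-averaged empirical measure then approximates the QSD. The discrete-time variant, the particle number and the horizon are assumptions. This finite chain has a unique QSD, so the minimal-QSD question does not arise here.

```python
"""Fleming-Viot particle approximation of a quasi-stationary distribution."""
import math
import random
from collections import Counter
from typing import List


def lazy_walk_step(x: int, rng: random.Random) -> int:
    """Lazy simple random walk: stay w.p. 1/2, else move +1 or -1 w.p. 1/4 each."""
    u = rng.random()
    if u < 0.5:
        return x
    return x + 1 if u < 0.75 else x - 1


def fleming_viot(n_states: int, n_particles: int, n_steps: int, burn_in: int,
                 seed: int = 0) -> List[float]:
    """Particles move on {1, ..., n_states}, absorbed at 0 and n_states + 1.

    An absorbed particle is immediately moved onto the position of a particle
    chosen uniformly among the survivors. Returns the time-averaged empirical
    distribution of the particles after burn_in steps.
    """
    rng = random.Random(seed)
    positions = [rng.randint(1, n_states) for _ in range(n_particles)]
    counts: Counter = Counter()
    for step in range(n_steps):
        moved = [lazy_walk_step(x, rng) for x in positions]
        alive = [x for x in moved if 1 <= x <= n_states]
        if not alive:
            raise RuntimeError("all particles were absorbed at the same step")
        positions = [x if 1 <= x <= n_states else rng.choice(alive) for x in moved]
        if step >= burn_in:
            counts.update(positions)
    total = sum(counts.values())
    return [counts[k] / total for k in range(1, n_states + 1)]


N = 5
estimate = fleming_viot(N, n_particles=200, n_steps=3000, burn_in=500)
# Exact QSD of this chain: proportional to sin(pi * k / (N + 1)).
weights = [math.sin(math.pi * k / (N + 1)) for k in range(1, N + 1)]
qsd = [w / sum(weights) for w in weights]
print([round(e, 3) for e in estimate])
print([round(q, 3) for q in qsd])
```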

Together with M. Benaïm (Univ. Neuchâtel), we studied in 4 stochastic algorithms that approximate quasi-stationary distributions of diffusion processes absorbed at the boundary of a bounded domain. We considered a reinforced version of the diffusion, which is resampled according to its occupation measure when it reaches the boundary, and showed that its occupation measure converges to the unique quasi-stationary distribution of the diffusion process. We also obtained in 24 general criteria ensuring existence, uniqueness and/or exponential convergence properties for quasi-stationary distributions. These criteria were specifically designed to apply to degenerate processes such as hypoelliptic diffusions. We also provided in 25 a counterexample to the uniqueness of a quasi-stationary distribution for a diffusion process satisfying the weak Hörmander condition.

Together with R. Schott (IECL, Univ. Lorraine), we studied in 6 models of deadlocks in distributed systems, using the approach we developed in 8 for quasi-stationary distributions, in order to characterize and numerically compute the asymptotic behaviour of the deadlock time and the behaviour of the system before deadlock, for both discrete and diffusion models.

Evolutionary models of food webs

We studied models of food web adaptive evolution in 10. We identified the biomass conversion efficiency as a key mechanism underlying food web evolution and discussed the relevance of such models to study the evolution of food webs.

In collaboration with S. Billiard (Univ. Lille).

Adaptive dynamics in biological populations

We studied evolutionary models of bacteria with horizontal transfer in 5. Horizontal transfer is a common mechanism of DNA exchange between micro-organisms, thought to be responsible for the fast evolution of antibiotic resistance in bacteria and of virulence in pathogens. We considered a scaling of parameters that takes into account the influence of negligible but non-extinct populations, which allowed us to study specific phenomena observed in these models (re-emergence of traits, cyclic evolutionary dynamics and evolutionary suicide). This work was done in collaboration with S. Méléard (École Polytechnique) and V.C. Tran (Univ. Paris Est Marne-la-Vallée).

We also worked on general evolutionary models of adaptive dynamics under assumptions of large population and small mutations. This year, we obtained existence, uniqueness and ergodicity results for a centered version of the Fleming-Viot process of population genetics. This is a key step towards recovering variants of the canonical equation of adaptive dynamics, which describes the long-time evolution of the dominant phenotype in the population, under less stringent biological assumptions than in previous works. We plan to complete this work next year.

Optimal control of Markov processes

Participants:

Bruno Scherrer, Nino Vieillard

We consider offline reinforcement learning methods. The problem is to learn a policy from logged transitions of an environment, without any interaction with it. With function approximation, and under the assumption of limited coverage of the state-action space, the policy must be constrained to visit state-action pairs close to the support of the logged transitions.

In 17, we propose an iterative procedure to learn a pseudometric (closely related to bisimulation metrics) from logged transitions, and use it to define this notion of closeness. We show its convergence and extend it to the function approximation setting. We then use this pseudometric to define a new lookup-based bonus in an actor-critic algorithm, PLOFF. This bonus encourages the actor to stay close, in terms of the defined pseudometric, to the support of the logged transitions.

In 18, noticing that an agent in this setting should avoid selecting actions whose consequences cannot be predicted from the data, we take inspiration from the literature on bonus-based exploration to design a new offline RL agent. The core idea is to subtract a prediction-based exploration bonus from the reward, instead of adding it for exploration. This allows the policy to stay close to the support of the dataset. We connect this approach to a more common regularization of the learned policy towards the data. Instantiated with a bonus based on the prediction error of a variational autoencoder, we show that our agent is competitive with the state of the art on a set of continuous control locomotion and manipulation tasks.
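The sign flip at the heart of this idea (reward minus bonus) can be illustrated in a one-step, bandit-like toy setting rather than with an actor-critic agent. As a swapped-in simplification, the bonus is the distance to the nearest logged (state, action) pair instead of a variational autoencoder's prediction error. The optimistic reward model, the logged pairs and the weight `alpha` are all hypothetical.

```python
"""Anti-exploration: subtract a novelty bonus from an estimated reward."""
import math
from typing import Sequence, Tuple

Pair = Tuple[float, float]  # (state, action)


def novelty_bonus(pair: Pair, logged_pairs: Sequence[Pair]) -> float:
    """Distance to the nearest logged (state, action) pair; a hypothetical
    stand-in for the prediction error of a model trained on the logged data."""
    return min(math.dist(pair, p) for p in logged_pairs)


def penalized_value(reward_estimate: float, pair: Pair, logged_pairs: Sequence[Pair],
                    alpha: float) -> float:
    """Anti-exploration objective: estimated reward minus alpha times the bonus."""
    return reward_estimate - alpha * novelty_bonus(pair, logged_pairs)


def greedy_action(state: float, candidates: Sequence[float], reward_model,
                  logged_pairs: Sequence[Pair], alpha: float) -> float:
    """Candidate action maximizing the penalized value in the given state."""
    return max(candidates,
               key=lambda a: penalized_value(reward_model(state, a), (state, a),
                                             logged_pairs, alpha))


def reward_model(state: float, action: float) -> float:
    """Hypothetical learned reward that extrapolates optimistically."""
    return action


# Hypothetical log: only actions in [-0.2, 0.2] were ever taken in state 0.
logged = [(0.0, a) for a in (-0.2, -0.1, 0.0, 0.1, 0.2)]
candidates = [i / 20 for i in range(-20, 21)]
print(greedy_action(0.0, candidates, reward_model, logged, alpha=0.0))  # 1.0 (off the data)
print(greedy_action(0.0, candidates, reward_model, logged, alpha=2.0))  # 0.2 (stays on data)
```

Without the penalty, the greedy choice exploits the model's extrapolation error. With it, the choice stays on the support of the logged actions.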

Joint work with Robert Dadashi, Shideh Rezaeifar, Léonard Hussenot, Olivier Pietquin, Olivier Bachem and Matthieu Geist.

Regression and machine learning

Participants:

Thierry Bastogne, Sandie Ferrigno, Anne Gégout-Petit, Aurélie Gueudin, Benoît Lalloué, Jean-Marie Monnez, Nassim Sahki, Sophie Wantz-Mézières

Cramér-von Mises goodness-of-fit tests in regression models

Many goodness-of-fit tests have been developed to assess the different assumptions of a (possibly heteroscedastic) regression model. Most of them are 'directional', in that they detect departures from a given assumption of the model. Other tests are 'global' (or 'omnibus'), in that they assess whether a model fits a dataset on all its assumptions. We focus on the task of choosing the structural part of the regression function, because it contains easily interpretable information about the studied relationship. We consider two nonparametric 'directional' tests and one nonparametric 'global' test, all based on generalizations of the Cramér-von Mises statistic.

To perform these goodness-of-fit tests, we developed the R package cvmgof 36, an easy-to-use tool for practitioners, available from the Comprehensive R Archive Network (CRAN). The use of the library is illustrated through a tutorial on real data, and simulation studies are carried out to show how the package can be used to compare the three implemented tests. The practitioner can also easily compare the test procedures with different kernel functions, bootstrap distributions, numbers of bootstrap replicates, or bandwidths. The package was updated at the start of 2021; this is its third version. A first article 1 on this work was published in October 2021.
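For readers unfamiliar with how such tests are calibrated, here is a minimal sketch of a Cramér-von Mises-type directional test of a linear regression function with a bootstrap p-value. As an assumption, it uses the marked empirical residual process $R_n(t) = n^{-1/2}\sum_i \hat e_i \mathbf{1}\{X_i \le t\}$ (in the spirit of Stute's approach) with a Rademacher wild bootstrap. These are not the kernel-based statistics implemented in cvmgof, and the simulated data are hypothetical.

```python
"""Cramér-von Mises-type test of a linear regression function (illustrative)."""
import random
from typing import Sequence, Tuple


def ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Intercept and slope of the least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def cvm_statistic(x: Sequence[float], residuals: Sequence[float]) -> float:
    """(1/n) * sum_j R_n(x_j)^2 with R_n(t) = n^{-1/2} * sum_{x_i <= t} e_i,
    i.e. the integral of R_n^2 against the empirical law of x (distinct x assumed)."""
    n = len(x)
    cumulative, total = 0.0, 0.0
    for i in sorted(range(n), key=lambda k: x[k]):
        cumulative += residuals[i]
        total += cumulative ** 2 / n
    return total / n


def linearity_test(x, y, n_boot: int = 500, seed: int = 0) -> Tuple[float, float]:
    """Observed statistic and wild-bootstrap p-value (Rademacher multipliers)."""
    rng = random.Random(seed)
    a, b = ols(x, y)
    fitted = [a + b * xi for xi in x]
    res = [yi - fi for yi, fi in zip(y, fitted)]
    observed = cvm_statistic(x, res)
    exceed = 0
    for _ in range(n_boot):
        # Resample under H0: keep the fitted line, flip residual signs at random.
        y_star = [fi + ei * rng.choice((-1.0, 1.0)) for fi, ei in zip(fitted, res)]
        a_s, b_s = ols(x, y_star)
        res_star = [ys - (a_s + b_s * xi) for ys, xi in zip(y_star, x)]
        if cvm_statistic(x, res_star) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_boot + 1)


rng = random.Random(42)
x = [rng.uniform(-1.0, 1.0) for _ in range(300)]
y_quad = [xi ** 2 + rng.gauss(0.0, 0.1) for xi in x]        # linear model misspecified
y_lin = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.1) for xi in x]  # linear model holds
stat_quad, p_quad = linearity_test(x, y_quad)
stat_lin, p_lin = linearity_test(x, y_lin)
print(round(p_quad, 3), round(p_lin, 3))
```

As in the package, the bootstrap distribution and the number of replicates (`n_boot`) are tuning choices that affect the resulting p-value.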

We are now working on nonparametric tests associated with the functional form of the variance of the regression model. To this end, we continue to work on the global test of Ducharme and Ferrigno, in order to compare its performance with that of directional tests associated with the variance of the model. Many simulations are in progress. This will also make it possible to propose a more general package-type tool for validating the regression models used in practice.

To complete this work, it would be interesting to assess the other assumptions of a regression model such as the additivity of the random error term. The implementation of these directional tests would enrich the cvmgof package and offer a complete easy-to-use tool for validating regression models. Moreover, the assessment of the overall validity of the model when using several directional tests could be compared with that done when using only a global test. In particular, the well-known problem of multiple testing could be discussed by comparing the results obtained from multiple test procedures with those obtained when using a global test strategy. Another perspective of this work would be to develop a similar tool for other statistical models widely used in practice such as generalized linear models.

Joint work with Romain Azaïs (INRIA, ENS Lyon) and Marie-José Martinez (LJK, Université Grenoble Alpes).

Online data analysis

Widening the scope of an eigenvector stochastic approximation process and application to streaming PCA and related methods. This article, in collaboration with A. Skiredj, was presented in the 2020 Activity Report (Section 8.3.5) and is now published in the Journal of Multivariate Analysis 15.

Streaming constrained binary logistic regression with online standardized data. This article, in collaboration with E. Albuisson, was presented in the 2020 Activity Report (Section 8.3.5) and is now accepted in the Journal of Applied Statistics 13.

Construction and update of an online ensemble score involving linear discriminant analysis and logistic regression. This article, in collaboration with E. Albuisson, was presented in the 2020 Activity Report (Section 8.3.5) and is being submitted 30, 63.

Stochastic approximation of eigenvectors and eigenvalues of the Q-symmetric expectation of a random matrix. Application to streaming PCA. In this analysis, we studied the convergence of Oja-type stochastic approximation processes for estimating eigenvectors of the unknown $Q$-symmetric expectation $B$ of a random matrix, the metric $Q$ being unknown. We established a theorem of a.s. convergence of these processes under assumptions on the noisy observations $B_{n}$ of $B$ that are more general than in previous results. The eigenvectors corresponding to the eigenvalues of $B$ in decreasing order are estimated using, at step $n$, a Gram-Schmidt orthonormalization with respect to a random metric $Q_{n+1}$ that converges a.s. to $Q$ as $n$ goes to infinity. We proved the a.s. convergence of specific processes to the corresponding eigenvalues. Corollaries of this theorem apply in particular to the cases where $E[B_{n} \mid T_{n}]$ or $B_{n}$ converges a.s. to $B$, which were studied by Monnez and Skiredj 15. For a process using only the current observations at each step, we suggested constructing another process that uses both past and current observations. We applied these results to the online estimation of principal components in streaming PCA of a random vector, taking into account all the observations up to the current step, with possibly different weights assigned to past and current observations.
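A schematic form of one step of such a process is given below, in notation assumed for this sketch. The step sizes $a_n > 0$ satisfy the usual conditions $\sum_n a_n = \infty$ and $\sum_n a_n^2 < \infty$, and $X^1_n, \dots, X^r_n$ are the current estimates. Each estimate is multiplied by $I + a_n B_n$, and the results are then orthonormalized with respect to the random metric $Q_{n+1}$. The precise processes and assumptions of the analysis may differ.

```latex
\begin{aligned}
Y^{j}_{n+1} &= \left(I + a_n B_n\right) X^{j}_{n}, \qquad j = 1, \dots, r, \\
\left(X^{1}_{n+1}, \dots, X^{r}_{n+1}\right) &= \mathrm{GS}_{Q_{n+1}}\left(Y^{1}_{n+1}, \dots, Y^{r}_{n+1}\right),
\qquad \langle u, v \rangle_{Q_{n+1}} = u^{\top} Q_{n+1} v .
\end{aligned}
```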

Other applications to methods related to PCA such as generalized canonical correlation analysis are in progress.

Change-point detection thresholds in the sequential context

To apply our change-point detection algorithms to real data, we turned to EMG signal data provided by INRS. The study concerns the development of trapezius muscle myalgia in the workplace. We applied change-point detection to characterize the different computer activities carried out during an experimental day. Our analysis allowed us to characterize activities by the frequency and amplitude of jumps, and to distinguish office activities using the mouse from those using the keyboard. This work was presented in a conference paper 19.
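The detection algorithms themselves are not detailed here. As a generic illustration of threshold-based sequential change-point detection, the following Python sketch implements the classical one-sided CUSUM rule for an upward shift in mean. The choice of CUSUM, the function name `cusum_alarm` and its parameters (`mu0`, `delta`, `h`) are assumptions. They do not reproduce the data-driven thresholds developed in the thesis 23.

```python
def cusum_alarm(x, mu0, delta, h):
    """One-sided CUSUM for an upward shift of the mean from mu0 to mu0 + delta.

    Returns the index of the first observation at which the statistic exceeds
    the threshold h, or None if no alarm is raised.
    """
    s = 0.0
    for t, xt in enumerate(x):
        # Increment proportional to the Gaussian log-likelihood ratio (known variance).
        s = max(0.0, s + (xt - mu0 - delta / 2.0))
        if s > h:
            return t
    return None
```

A larger threshold `h` lowers the false-alarm rate at the cost of a longer detection delay. How to choose such thresholds in the sequential context is the topic of this section.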

Statistical learning and application in health

Participants: Nicolas Champagnat, Léo Darrigade, Sandie Ferrigno, Coralie Fritsch, Anne Gégout-Petit, Aurélie Gueudin, Ulysse Herbach, Benoît Lalloué, Rodolphe Loubaton, Jean-Marie Monnez, Anouk Rago, Nicolas Thorr, Pierre Vallois, Sophie Wantz-Mézières

Analysis of diffuse low-grade glioma growth

To understand the growth of low-grade gliomas, we investigate multiple sources of information available in clinical practice: patient-related predictors and variables related to tumor tissue and genetics. Monitoring growth through regular MRIs gives us access to many imaging-related variables, including an original one that measures tumor infiltration (thesis of Cyril Brzenczek, CRAN, defended in 2021; article in preparation). Our latest efforts have focused on the statistical analysis of the database composed of these variables. We have obtained regional PACTE funding to host this database and use it for teaching, dissemination and the development of experimentation tools: the PIANO platform.

Joint work with J.M. Moureaux, Y. Gaudeau (CRAN), F. Rech, L. Taillandier, M. Blonski and T. Obara (CHRU Nancy).

Estimation of reference curves for fetal weight

In epidemiology, we are working with INSERM to study fetal development in the last two trimesters of pregnancy. Reference or standard curves are required for this kind of biomedical problem, because values that lie outside the limits of these curves may indicate the presence of a disorder. The data come from the French EDEN mother-child cohort (INSERM), a study investigating the prenatal and early postnatal determinants of child health and development. In total, 2002 pregnant women were recruited before 24 weeks of amenorrhoea in two maternity clinics in middle-sized French cities (Nancy and Poitiers). Between May 2003 and September 2006, 1899 newborns were then included. The main outcomes of interest are fetal growth (via ultrasound) and postnatal growth, adiposity development, respiratory health, atopy, behaviour, and bone, cognitive and motor development. We are studying fetal weight and height as a function of gestational age in the third trimester of pregnancy. Classical empirical and parametric methods are first used to construct these curves. Polynomial regression, for instance, is one of the most common parametric approaches for modeling growth data, especially during the prenatal period. However, these classical methods require strong assumptions. We therefore propose to work with semi-parametric LMS methods, which transform the response variable (fetal weight) using, among others, Box–Cox transformations. A first article detailing these methods applied to the EDEN data should be submitted next year and is the subject of communication 31.
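For readers unfamiliar with the LMS method, the sketch below shows the standard LMS transformation. At a given gestational age, a measurement $y$ is mapped to a z-score through the Box–Cox power $L$, the median $M$ and the coefficient of variation $S$: $z = \frac{(y/M)^{L} - 1}{L S}$ for $L \neq 0$ and $z = \ln(y/M)/S$ for $L = 0$. Reference centiles are obtained by inverting this map, $y_{\alpha} = M (1 + L S z_{\alpha})^{1/L}$. In practice, $L$, $M$ and $S$ are smooth functions of gestational age. The values used below are hypothetical and are not estimates from the EDEN data.

```python
import math
from statistics import NormalDist

def lms_zscore(y, L, M, S):
    """z-score of a measurement y given the LMS parameters at its gestational age."""
    if L == 0:
        return math.log(y / M) / S
    return ((y / M) ** L - 1.0) / (L * S)

def lms_centile(z, L, M, S):
    """Reference-curve value at standard normal quantile z (inverse of lms_zscore).

    For L != 0 the formula is valid only while 1 + L * S * z > 0.
    """
    if L == 0:
        return M * math.exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)

# Hypothetical LMS parameters at one gestational age (not EDEN estimates).
L_, M_, S_ = 0.5, 2000.0, 0.12
p3, p97 = (lms_centile(NormalDist().inv_cdf(q), L_, M_, S_) for q in (0.03, 0.97))
```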

Alternative nonparametric methods, such as Nadaraya-Watson kernel estimation, local polynomial estimation, B-splines and cubic splines, are also being developed in this context to construct these curves. Implementing these methods in practice required work on the choice of smoothing parameters and knots for the different types of nonparametric estimators. In particular, an optimal choice of these parameters has been proposed. A first version of an R package that provides a tool to construct nonparametric reference curves has then been developed, and it will soon be available on GitHub. In addition, a graphical user interface (GUI) intended for practitioners is being developed to allow intuitive visualization of the results given by the package, and an article is in progress.
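As an illustration of the kernel approach only, here is a Python/NumPy sketch of the Nadaraya-Watson estimator $\hat m_h(x) = \sum_i K((x - X_i)/h)\, Y_i / \sum_i K((x - X_i)/h)$ with a Gaussian kernel. It also shows leave-one-out cross-validation, one common way of choosing the bandwidth $h$. The kernel, the criterion, the toy data and the function names are assumptions. They are not necessarily the optimal choice proposed in this work, and the sketch does not reproduce the R package.

```python
import numpy as np

def nadaraya_watson(x_eval, X, Y, h):
    """Nadaraya-Watson estimate of E[Y | X = x] at each point of x_eval (Gaussian kernel, bandwidth h)."""
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    w = np.exp(-0.5 * ((x_eval[:, None] - X[None, :]) / h) ** 2)  # kernel weights, one row per x
    return (w @ Y) / w.sum(axis=1)

def loo_cv(X, Y, h):
    """Leave-one-out cross-validation error of the Nadaraya-Watson fit for bandwidth h."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    w = np.exp(-0.5 * ((X[:, None] - X[None, :]) / h) ** 2)
    np.fill_diagonal(w, 0.0)  # each observation is excluded from its own fit
    return float(np.mean((Y - (w @ Y) / w.sum(axis=1)) ** 2))

# Choose h on a grid by minimizing the leave-one-out error (toy data, not fetal measurements).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 1.0, 200))
Y = np.sin(2 * np.pi * X) + 0.2 * rng.standard_normal(200)
grid = [0.01, 0.02, 0.05, 0.1, 0.2]
h_best = min(grid, key=lambda h: loo_cv(X, Y, h))
fit = nadaraya_watson([0.25, 0.75], X, Y, h_best)
```

A reference curve would then be built from such smoothed estimates, for example of conditional location and spread. Local polynomial and spline estimators replace the kernel average but face the same smoothing-parameter choice.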

Joint work with Myriam Maumy-Bertrand (IRMA, Université de Strasbourg) and INSERM.

Construction of parsimonious event risk scores by an ensemble method. An illustration for short-term predictions in chronic heart failure patients from the GISSI-HF trial

This article in collaboration with E. Albuisson and D. Lucci was presented in the 2020 Activity Report (Section 8.4.2) and is now published in Applied Mathematics 14.

Prediction of silencing experiments on gene networks for chronic lymphocytic leukemia

We are working with L. Vallat (CHRU Strasbourg) on the inference of dynamical gene networks from RNA-seq and proteome data. The goal is to infer a model of gene expression that can predict gene expression in cells where the expression of some genes is silenced (e.g. using siRNA), in order to select the silencing experiments that are most likely to reduce cell proliferation. We expect the selected genes to provide new therapeutic targets for the treatment of chronic lymphocytic leukemia. This year, we addressed the general prediction problem defined above and proposed an inference method for a new gene network model for which such a prediction is possible. Next year, we expect to identify potential therapeutic targets for which silencing experiments could be conducted.

A statistical methodology to select covariates in high-dimensional data under dependence. Application to the classification of genetic profiles in oncology

We propose a new methodology for selecting and ranking covariates associated with a variable of interest, in a context of high-dimensional data with dependent covariates but few observations. The methodology successively intertwines the clustering of covariates, the decorrelation of covariates using Factor Latent Analysis, selection by aggregating adapted methods and, finally, ranking. A simulation study shows the benefit of decorrelating the covariates within the different clusters. We first applied our method to transcriptomic data from 37 patients with advanced non-small-cell lung cancer who had received chemotherapy, to select the transcriptomic covariates that explain the survival outcome of the treatment. Secondly, we applied it to 79 breast tumor samples to define patient profiles for a new metastatic biomarker and its associated gene network, in order to personalize treatments. This work is published in 2 and is implemented in the R package ‘ARMADA’.

In collaboration with T. Boukhobza and H. Dumond from CRAN, and B. Bastien from the biopharmaceutical company Transgene.

Projects linked with the COVID-19 pandemic

Seroprevalence study

Pierre Vallois is the scientific coordinator of the COVAL Nancy seroprevalence study, held in Nancy in July 2020 in collaboration with CHRU de Nancy (CIC épidémiologie clinique and Laboratoire de Virologie).

Background. The World Health Organisation recommends monitoring the circulation of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). We aimed to estimate anti–SARS-CoV-2 total immunoglobulin (IgT) antibody seroprevalence and describe symptom profiles and in vitro seroneutralization in Nancy, France, in spring 2020.

Methods. Individuals were randomly sampled from electoral lists and invited, with household members over 5 years old, to be tested for anti–SARS-CoV-2 (IgT, i.e. IgA/IgG/IgM) antibodies by ELISA (Bio-Rad). Serum samples were classified according to their 50 % seroneutralization activity (NT50) on Vero CCL-81 cells. Age- and sex-adjusted seroprevalence was estimated. Subgroups were compared using the chi-square or Fisher exact test and logistic regression.

Results. Among 2006 individuals, 43 were SARS-CoV-2–positive. The raw seroprevalence was 2.1 % (95 % confidence interval 1.5 to 2.9), and the adjusted metropolitan and national standardized seroprevalences were 2.5 % (1.8 to 3.3) and 2.3 % (1.7 to 3.1), respectively. Seroprevalence was highest for 20- to 34-year-old participants (4.7 % [2.3 to 8.4]). It was higher within than outside socially deprived areas (2.5 % vs 1 %, $p = 0.02$), and higher for participants with than without intra-family infection ($p < 10^{-6}$). Moreover, 25 % (23 to 27) of participants presented at least one COVID-19 symptom associated with SARS-CoV-2 positivity ($p < 10^{-13}$), with anosmia or ageusia highly discriminant (odds ratio 27.8 [13.9 to 54.5]), associated with dyspnea and fever. Among the SARS-CoV-2-positive individuals, 16.3 % (6.8 to 30.7) were asymptomatic. For 31 of the 43 SARS-CoV-2-positive individuals, positive seroneutralization was demonstrated in vitro.

Conclusions. In this population of very low anti-SARS-CoV-2 antibody seroprevalence, a beneficial effect of the lockdown can be assumed, with frequent SARS-CoV-2 seroneutralization among IgT-positive patients.
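The interval and adjustment steps described in the Methods and Results can be illustrated with two short Python helpers. The first computes a Wilson score interval for a binomial proportion. The second performs a direct standardization of stratum-specific prevalences to a reference population. Both are generic textbook formulas given as assumptions: the report does not state which interval method or standardization scheme the study used, and the function names are hypothetical.

```python
from math import sqrt

def wilson_interval(k, n, z=1.959964):
    """Wilson score interval for a binomial proportion k/n (z = 1.96 gives 95 %)."""
    p = k / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half = z * sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half, centre + half

def direct_standardization(stratum_prevalence, reference_size):
    """Prevalence reweighted by the share of each stratum (e.g. age x sex) in a reference population."""
    total = sum(reference_size)
    return sum(p * w for p, w in zip(stratum_prevalence, reference_size)) / total

low, high = wilson_interval(43, 2006)  # raw count given in the Results paragraph
```

For 43 positives among 2006 participants, this gives roughly 1.6 % to 2.9 %. This is close to, but not identical with, the reported interval of 1.5 to 2.9, which presumably relies on another method.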

The results were first posted as a medRxiv preprint 27 and then published in a peer-reviewed international journal 11.

Predicting SARS-CoV-2-positive patients in hospital

Participants: A. Gégout-Petit, U. Herbach, N. Thorr.

In collaboration with H. Berry, D. Gemmerlé, T. Lepoutre, D. Maucourt and D. Parsons.

Following Inria's recommendations to get involved in the fight against COVID-19, we tried to collaborate with the LCPME laboratory. Our aim was to predict the number of SARS-CoV-2-positive patients from the Grand Nancy metropolitan area at the Nancy University Hospital, using the concentration of SARS-CoV-2 residues in wastewater. We encountered difficulties in obtaining raw data, rather than mere indicators, from the Obépine network, so we made predictions from the incidence rates available on Santé Publique France. The predictions are available on the siwam website. Inria hired Nicolas Thorr as an engineer for 6 months for this project.

Bilateral contracts and grants with industry

Participants: Bruno Scherrer

Bilateral contracts with industry

B. Scherrer collaborates with Google Brain on reinforcement learning in the framework of the PhD thesis of Nino Vieillard.

Partnerships and cooperations

Participants: Nicolas Champagnat, Léo Darrigade, Coralie Fritsch, Anne Gégout-Petit, Ulysse Herbach, Joseph Lam-Weil, Jean-Marie Monnez, Aurélie Muller-Gueudin, Pierre Vallois, Denis Villemonais, Sophie Wantz-Mézières

International initiatives

Participation in International Programs

BRN

Title:

Biostochastic Research Network

Partner Institution(s):

Universidad de Valparaiso (Chile) - CIMFAV – Facultad de Ingenieria - Soledad Torres, Rolando Rebolledo.

CNRS, Inria & Institut Élie Cartan de Lorraine (France) - N. Champagnat, A. Lejay (coordinator for France), D. Villemonais, R. Schott.

Date/Duration:

2018–2022

Goal:

Scientific exchange around probabilistic models in population ecology.

National initiatives

FHU CARTAGE (Fédération Hospitalo-Universitaire Cardiac and ARTerial AGEing). Leader: Pr Athanase Benetos. Participants: Jean-Marie Monnez, Benoît Lalloué, Anne Gégout-Petit.

RHU Fight HF (Fighting Heart Failure), located at the University Hospital of Nancy. Leader: Pr Patrick Rossignol. Participants: Jean-Marie Monnez, Benoît Lalloué.

ITMO Physics, Mathematics applied to Cancer (2017-2022): “Modeling ctDNA dynamics for detecting targeted therapy resistance”. Funding organisms: ITMO Cancer, ITMO Technologies pour la santé de l’alliance nationale pour les sciences de la vie et de la santé (AVIESAN), INCa. Partners: Inria and IECL (Institut Élie Cartan de Lorraine), CHRU Strasbourg, CRAN (Centre de Recherche en Automatique de Nancy) and ICL (Institut de Cancérologie de Lorraine). Leader: N. Champagnat. Participants: L. Darrigade, C. Fritsch, A. Gégout-Petit, U. Herbach, A. Muller-Gueudin, P. Vallois.

GDR 720 ISIS (funded by CNRS). Leader: Laure Blanc-Féraud. Participant: Sophie Mézières.

Regional initiatives

CHRU de Nancy. We have good collaborations with several researchers from CHRU de Nancy. We are involved in the telomere research axis of the LUE Impact project GEENAGE.

Région Grand-Est. In the context of the Telomere project, Anne Gégout-Petit and Denis Villemonais obtained a grant from the Grand-Est region to hire Joseph Lam-Weil as a post-doctoral fellow. The Université de Lorraine and the LUE GEENAGE program supplemented the grant.

Dissemination

Participants: Thierry Bastogne, Nicolas Champagnat, Léo Darrigade, Sandie Ferrigno, Coralie Fritsch, Anne Gégout-Petit, Vincent Hass, Ulysse Herbach, Joseph Lam-Weil, Rodolphe Loubaton, Jean-Marie Monnez, Aurélie Muller-Gueudin, Pierre Vallois, Denis Villemonais, Sophie Wantz-Mézières, Nicolás Zalduendo Vidal

Promoting scientific activities

Scientific events: organisation

Member of the organizing committees

N. Champagnat, C. Fritsch and U. Herbach organized the workshop Modélisation de l’hétérogénéité tumorale et thérapies ciblées (IECL, Univ. Lorraine, 21–22 Oct.).

N. Champagnat is a member of the organizing committee of the conference A Random Walk in the Land of Stochastic Analysis and Numerical Probability in honor of Denis Talay (CIRM, Luminy, 3-7 Jan. 2022). Unfortunately, this event has been postponed due to health restrictions.

N. Champagnat was a member of the organizing committee of the Journée Scientifique FCH "Covid" (Inria Nancy - Grand Est, 28 Sep.).

U. Herbach co-organized the IBOMAN 2021 conference (Institut Curie, Paris, 25-27 Oct.).

A. Gégout-Petit co-organized the training session Journée d'études en statistique : Données manquantes (SFdS, Fréjus, 15-19 Nov.).

Scientific events: selection

Member of the conference program committees

N. Champagnat is a member of the scientific committee for the 53èmes Journées de Statistiques (Lyon, 13-17 June 2022).

Journal

Member of the editorial boards

N. Champagnat served as co-editor-in-chief with Béatrice Laurent-Bonneau (IMT Toulouse) of ESAIM: Probability & Statistics until June. Since then, he has served as an associate editor of this journal.

N. Champagnat serves as an associate editor of Stochastic Models.

Reviewer - reviewing activities.

Here is a selection of the journals for which we regularly write referee reports: Bernoulli, Cell, Medicina, The Annals of Applied Probability, Stochastic Processes and their Applications, Journal de Mathématiques Pures et Appliquées, ALEA - Latin American Journal of Probability and Mathematical Statistics, ESAIM: Probability & Statistics, Journal of Theoretical Biology, Mathematical Biosciences, Journal of Physics A: Mathematical and Theoretical, Current Opinion in Systems Biology, Bioinformatics...

Invited talks

N. Champagnat has been invited to give a talk at the online workshop Biostochastic Networks 2021 in November and at the online Journées Scientifiques du GE2MI sur le thème EDP et modèles biomathématiques in December.

V. Hass has been invited to give talks at the École de la Chaire MMB in Aussois in June, at the Journées de probabilités in Guidel in June and at the Colloque des Jeunes Probabilistes et Statisticiens (JPS2021) in Saint-Pierre d’Oléron in October.

R. Loubaton has been invited to give a talk at the Congrès des Jeunes Chercheurs en Mathématiques Appliquées 2021 (CJC-MA 2021) in Palaiseau in October.

N. Zalduendo Vidal has been invited to give a talk at the Research School of the Chaire Modélisation Mathématique et Biodiversité at Aussois in June and at the Etheridge Group Seminar at Oxford University in June.

U. Herbach has been invited to give a talk at the Biohasard 2021 online conference in June.

Leadership within the scientific community

A. Gégout-Petit is vice-president of the European Network for Business and Industrial Statistics (ENBIS).

Scientific expertise

C. Fritsch has been a member of the Committee for junior permanent research positions of Inria Nancy - Grand Est.

A. Gégout-Petit has been a member of several hiring committees: as president for the Université de Technologie de Compiègne (MCF, 26th section); for Sorbonne Université (PR, 26th section); for the national jury for 46.1 professor recruitment; and for the University of Luxembourg (assistant professor in statistics).

Research administration

N. Champagnat is a member of the coordination committee of MODCOV19, a platform that coordinates research actions on the modeling of the SARS-CoV-2 (COVID-19) pandemic. He heads the bibliographic awareness group.

N. Champagnat is a member of the Comité de Centre, the COMIPERS and the Commission Information Scientifique et Technique of Inria Nancy - Grand Est, and Responsable Scientifique for the mathematics library of the IECL. He is also the local correspondent of the COERLE (Comité Opérationnel d'Évaluation des Risques Légaux et Éthiques) for the Inria Research Center of Nancy - Grand Est.

C. Fritsch is a member of the Commission du Développement Technologique of Inria Nancy-Grand Est and of the Commission du personnel of IECL. She was a member of the Commission Parité-Égalité of IECL until August. She is the local Radar correspondent for the Inria Research Center of Nancy - Grand Est.

A. Gégout-Petit is the head of “Institut Élie Cartan de Lorraine” (mathematics laboratory of Université de Lorraine).

Teaching - Supervision - Juries

Teaching

BIGS faculty members have teaching obligations at Université de Lorraine and teach at least 192 hours each year. They teach probability and statistics at different levels (Licence, Master, engineering school). Many of them have pedagogical responsibilities.

D. Villemonais is the head of the Mathematical Engineering Major of ENSMN, Université de Lorraine, France.

T. Bastogne is in charge of the research master program “Santé Numérique et Imagerie Médicale” with the Faculty of Medicine, Université de Lorraine, France.

Master: N. Champagnat, Introduction to Quantitative Finance, 12h, M1, second year of ENSMN, Université de Lorraine, France.

Master: N. Champagnat, Introduction to Quantitative Finance, 9h, M2, third year of ENSMN, Université de Lorraine, France.

Master: N. Champagnat, Problèmes inverses, 15h, M1, second year of ENSMN, Université de Lorraine, France.

Master: S. Ferrigno, Experimental designs, 4.5h, M1, fourth year of EEIGM, Université de Lorraine, France.

Master: S. Ferrigno, Data analyzing and mining, 63h, M1, second year of ENSMN, Université de Lorraine, France.

Master: S. Ferrigno, Modeling and forecasting, 43h, M1, second year of ENSMN, Université de Lorraine, France.

Master: S. Ferrigno, Training projects, 18h, M1/M2, second and third year of ENSMN, Université de Lorraine, France.

Master: A. Muller-Gueudin, Probability and Statistics, 160h, second year of ENSEM and ENSAIA, Université de Lorraine, France.

Master: A. Muller-Gueudin, Scientific calculation with Matlab, 20h, second year of ENSAIA, Université de Lorraine, France.

Master: A. Gégout-Petit, Statistics, modeling, data analysis, 80h, master in applied mathematics, Université de Lorraine, France.

Master: U. Herbach, Data analyzing and mining tutorial, 18h, M1, second year of ENSMN, Université de Lorraine, France.

Master: U. Herbach, Introduction to probability theory, 18h, M1, second year of ENSEM (apprenticeship track), Université de Lorraine, France.

Master: R. Loubaton, Analyse de données, 16h, M1, second year of ENSMN, Université de Lorraine, France.

Master: R. Loubaton, Introduction à l'apprentissage automatique, 6h, M1, second year of ENSMN, Université de Lorraine, France.

Master: S. Wantz-Mézières, Learning and analysis of medical data, 36h, with J.M. Moureaux, Master SNIM, Université de Lorraine, France.

Master: D. Villemonais, Probability Theory II, 63h, M1, second year of ENSMN, Université de Lorraine, France.

Master: D. Villemonais, Stochastic processes, 32h, Master 2 MFA, Université de Lorraine, France.

Master: D. Villemonais, Modeling and forecasting, 14h, M1, second year of ENSMN, Université de Lorraine, France.

Licence: S. Wantz-Mézières, Applied mathematics for management, financial mathematics, Probability and Statistics, 160h, IUT Nancy-Charlemagne (L1/L2/L3), Université de Lorraine, France.

Licence: S. Wantz-Mézières, Probability, 100h, first year in TELECOM Nancy (initial and apprenticeship tracks), Université de Lorraine, France.

Licence: A. Muller-Gueudin, Statistics, 60h, first year of ENSAIA, Université de Lorraine, France.

Licence: S. Ferrigno, Descriptive and inferential statistics, 60h, L2, second year of EEIGM, Université de Lorraine, France.

Licence: S. Ferrigno, Statistical modeling, 60h, L2, second year of EEIGM, Université de Lorraine, France.

Licence: S. Ferrigno, Mathematical and computational tools, 20h, L3, third year of EEIGM, Université de Lorraine, France.

Licence: S. Ferrigno, Training projects, 40h, L1/L3, first, second and third year of EEIGM, Université de Lorraine, France.

Licence: C. Fritsch, Probability Theory tutorial, 40h, L3, first year of ENSMN, Université de Lorraine, France.

Licence: V. Hass, Complément d'analyse, 38h, L1, FST, Université de Lorraine, France.

Licence: V. Hass, Analyse numérique et optimisation, 46h, L3, first year of ENSMN, Université de Lorraine, France.

Licence: V. Hass, Probabilités, 40h, L3, first year of ENSMN, Université de Lorraine, France.

Licence: V. Hass, Mathématiques FIGIM 1A, 35h, L1/L2, first year of ENSMN, Université de Lorraine, France.

Licence: V. Hass, Mathématiques FIGIM 2A, 21h, L2, first year of ENSMN, Université de Lorraine, France.

Licence: U. Herbach, Statistics tutorial, 39h, L3, first year of ENSMN, Université de Lorraine, France.

Licence: R. Loubaton, Inférence statistique, 21h, L3, first year of ENSMN, Université de Lorraine, France.

Licence: R. Loubaton, Probabilités, 20h, L2, FST, Université de Lorraine, France.

Licence: R. Loubaton, Méthodes Numériques, 10h, L2, FST, Université de Lorraine, France.

Licence: R. Loubaton, LaTeX, 9h, L2, FST, Université de Lorraine, France.

Licence: R. Loubaton, Remédiation mathématique, 30h, L3, first year of ENSMN, Université de Lorraine, France.

Licence: R. Loubaton, Analyse numérique et optimisation, 40h, L3, first year of ENSMN, Université de Lorraine, France.

Licence: D. Villemonais, Probability Theory, 57h, L3, first year of ENSMN, Université de Lorraine, France.

Licence: N. Zalduendo Vidal, Probability Theory tutorial, 40h, L3, first year of ENSMN, Université de Lorraine, France.

Licence: N. Zalduendo Vidal, Numerical Analysis tutorial, 20h, L3, first year of ENSMN, Université de Lorraine, France.

Supervision

PhD

PhD: Nassim Sahki, “Data-driven methodology for sequential change-point detection for physiological signals”, grant Inria-Cordis. Defence 29 Nov 2021. Advisors: A. Gégout-Petit, S. Wantz-Mézières 23.

PhD in progress: Vincent Hass, “Individual-based models in adaptive dynamics and long time evolution under assumptions of rare advantageous mutations”, grant Inria-Cordis. Advisor: N. Champagnat.

PhD in progress: Rodolphe Loubaton, “Caractérisation des cibles thérapeutiques dans un programme génique tumoral”, grant Région Grand-Est. Advisors: N. Champagnat and L. Vallat (CHRU Strasbourg).

PhD in progress: Anouk Rago, “Inférence de réseaux de gènes dynamiques et prédiction d’expériences d’interventions biologiques dans des cellules cancéreuses”, grant Région Grand-Est, Inria. Advisors: N. Champagnat, A. Gégout-Petit.

PhD in progress: Nino Vieillard, "Approximate Dynamic Programming and Deep Reinforcement Learning", CIFRE with Google Brain. Advisors: B. Scherrer, M. Geist (Google Brain).

PhD in progress: Nicolás Zalduendo Vidal, “Processus de branchement bi-sexués multi-types”, grant Inria-Cordis. Advisors: C. Fritsch, D. Villemonais.

Other

Parcours Recherche: Asmaa Labtaina, “Processus de Markov déterministes par morceaux et leur application à l’expression stochastique des gènes” (full-year research project, M1 ENSMN). Advisor: U. Herbach.

TER: Mohammed Khatbane and Abdelkabir Bouyghf, “Méthodes variationnelles en apprentissage statistique : l’exemple du modèle Latent Dirichlet Allocation” (research project, M1 Univ. Lorraine). Advisor: U. Herbach.

Juries

PhD: N. Champagnat, reporter, thesis of Maxime Berger, “Le comportement critique de la quasi-espèce”, Université PSL.

PhD: N. Champagnat, reporter, thesis of Felipe Munoz-Hernandez, “Approximation quantitative en grande population de modèles stochastiques avec interaction ou environnement variable”, Institut Polytechnique de Paris.

PhD: N. Champagnat, reporter, thesis of Julie Tourniaire, “Spatial dynamics of interfaces in ecology: deterministic and stochastic models”, Institut Polytechnique de Paris.

PhD: A. Gégout-Petit, president, thesis of A. Conanec Rago, Université de Bordeaux.

PhD: A. Gégout-Petit, president, thesis of S. Yacheur, Université de Lorraine.

PhD Prize: A. Gégout-Petit, jury member, 2021 AMIES PhD Prize.

PhD: B. Scherrer, reporter, thesis of Yannis Flet-Berliac, “Sample-Efficient Deep Reinforcement Learning for Control, Exploration and Safety”, Université de Lille.

PhD: B. Scherrer, reporter, thesis of Marc Etheve, “Using machine learning to solve repeated optimization problems”, CNAM Paris (CIFRE with EDF).

Popularization

Education

S. Ferrigno: Advisor of a group of students (EEIGM), "Traitement statistique de données" project, various high schools, Nancy, 2021

S. Ferrigno: Advisor of a group of students (EEIGM), "La main à la Pâte" project, Institut médico-éducatif (IME), Commercy, October-November 2021

S. Mézières: organisation of a research training week on Neurooncology and Numerics, for medical and engineering students, January 2021

Interventions

C. Fritsch made two interventions in the Lycée Cormontaigne in Metz, as part of the “Chiche!” program, in December 2021.

[1] Romain Azaïs, Sandie Ferrigno, Marie-José Martinez. cvmgof: an R package for Cramér-von Mises goodness-of-fit tests in regression models. Journal of Statistical Computation and Simulation, October 2021.

[2] Bérangère Bastien, Taha Boukhobza, Hélène Dumond, Anne Gégout-Petit, Aurélie Muller-Gueudin, Charlène Thiébaut. A statistical methodology to select covariates in high-dimensional data under dependence. Application to the classification of genetic profiles in oncology. Journal of Applied Statistics, 2021, 23.

[3] Thierry Bastogne. iQbD: a TRL-indexed quality-by-design paradigm for medical device engineering. Journal of Medical Devices, 2021.

[4] Michel Benaïm, Nicolas Champagnat, Denis Villemonais. Stochastic approximation of quasi-stationary distributions for diffusion processes in a bounded domain. Annales de l'Institut Henri Poincaré (B) Probabilités et Statistiques, 2021, 57(2), 726-739.

[5] Nicolas Champagnat, Sylvie Méléard, Viet Chi Tran. Stochastic analysis of emergence of evolutionary cyclic behavior in population dynamics with transfer. Annals of Applied Probability, 2021, 31(4), 1820-1867.

[6] Nicolas Champagnat, René Schott, Denis Villemonais. Analysis of distributed systems via quasi-stationary distributions. Stochastic Analysis and Applications, 2021, 36(6), 981-998.

[7] Nicolas Champagnat, Denis Villemonais. Convergence of the Fleming-Viot process toward the minimal quasi-stationary distribution. ALEA: Latin American Journal of Probability and Mathematical Statistics, 2021, 18, 1-15.

[8] Nicolas Champagnat, Denis Villemonais. Lyapunov criteria for uniform convergence of conditional distributions of absorbed Markov processes. Stochastic Processes and their Applications, May 2021, 135, 51-74.

[9] Alexander M. G. Cox, Emma L. Horton, Andreas E. Kyprianou, Denis Villemonais. Stochastic Methods for Neutron Transport Equation III: Generational many-to-one and $k_{\mathrm{eff}}$. SIAM Journal on Applied Mathematics, May 2021, 81(3).

[10] Coralie Fritsch, Sylvain Billiard, Nicolas Champagnat. Identifying conversion efficiency as a key mechanism underlying food webs evolution: a step forward, or backward? Oikos, 2021, 130(6), 904-930.

[11] Anne Gégout-Petit, Hélène Jeulin, Karine Legrand, Nicolas Jay, Agathe Bochnakian, Pierre Vallois, Evelyne Schvoerer, Francis Guillemin. Seroprevalence of SARS-CoV-2, Symptom Profiles and Sero-Neutralization in a Suburban Area, France. Viruses, June 2021, 13(6), 1076.

[12] Jean-Philippe Jehl, Pan Dan, Arnaud Voignier, Nguyen Tran, Thierry Bastogne, Pablo Maureira, Franck Cleymand. Transverse isotropic modelling of left-ventricle passive filling: mechanical characterization for epicardial biomaterial manufacturing. Journal of the Mechanical Behavior of Biomedical Materials, July 2021, 119, 104492.

[13] Benoît Lalloué, Jean-Marie Monnez, Eliane Albuisson. Streaming constrained binary logistic regression with online standardized data. Journal of Applied Statistics, 2021.

[14] Benoît Lalloué, Jean-Marie Monnez, Donata Lucci, Eliane Albuisson. Construction of Parsimonious Event Risk Scores by an Ensemble Method. An Illustration for Short-Term Predictions in Chronic Heart Failure Patients from the GISSI-HF Trial. Applied Mathematics, July 2021, 12(7), 627-653.

[15] Jean-Marie Monnez, Abderrahman Skiredj. Widening the scope of an eigenvector stochastic approximation process and application to streaming PCA and related methods. Journal of Multivariate Analysis, March 2021, 182, 104694.

[16] Mathias Waelli, Etienne Minvielle, Maria Ximena Acero, Khouloud Ba, Benoit Lalloue. What matters to patients? A mixed method study of the importance and consideration of oncology patient demands. BMC Health Services Research, 2021, 21, 256.

[17] Robert Dadashi, Shideh Rezaeifar, Nino Vieillard, Léonard Hussenot, Olivier Pietquin, Matthieu Geist. Offline Reinforcement Learning with Pseudometric Learning. 38th International Conference on Machine Learning, virtual, France, June 2021, 139.

[18] Shideh Rezaeifar, Robert Dadashi, Nino Vieillard, Léonard Hussenot, Olivier Bachem, Olivier Pietquin, Matthieu Geist. Offline Reinforcement Learning as Anti-Exploration. 36th AAAI Conference on Artificial Intelligence, Vancouver, Canada, February 2022.

[19] Nassim Sahki, Anne Gégout-Petit, Sophie Wantz-Mézières. Detection of breaks in EMG signals of upper trapezius muscle activity. JDS 2021 - 52èmes Journées de Statistique de la SFdS, Nice / Virtual, France, June 2021.

[20] Thierry Bastogne, Sanne Bevers, Sander Kooijmans, Lucie Hassler, Samir El Andaloussi, Raymond Schiffelers, Stefaan De Koker. easyQBD: A quality by design SaaS platform. Application to the development of lipid nanoparticles for mRNA delivery. 6th Bioproduction Congress, Lyon, France, September 2021.

[21] Sanne Bevers, Sander Kooijmans, Elien van de Velde, Martijn Evers, Sofie Seghers, Jerney Gitz-François, Lucie Hassler, Karine Breckpot, Thierry Bastogne, Raymond Schiffelers, Stefaan De Koker. Tuning LNPs to target antigen presenting cells in spleen induces CD8 T-cell responses and tumor regression in mice. 18th CIMT Annual Meeting, Mainz, Germany, May 2021.

[22] Sander Kooijmans, Sanne Bevers, Elien van de Velde, Martijn J. W. Evers, Sofie Seghers, Jerney J. J. M. Gitz-François, Lucie Hassler, Karine Breckpot, Thierry Bastogne, Raymond M. Schiffelers, Stefaan De Koker. Rationally designed mRNA-loaded lipid nanoparticles provoke strong antitumor T cell immunity which critically depends on specific immune cell subsets. Annual Meeting of the Controlled Release Society, CRS 2021, Virtual, United States, July 2021.

[23] Nassim Sahki. Data-driven methodology for sequential change-point detection for physiological signals. November 2021.

[24] Michel Benaïm, Nicolas Champagnat, William Oçafrain, Denis Villemonais. Degenerate processes killed at the boundary of a domain. 2021.

[25] Michel Benaïm, Nicolas Champagnat, William Oçafrain, Denis Villemonais. Transcritical bifurcation for the conditional distribution of a diffusion process. December 2021.

[26] Nan Papili Gao, Olivier Gandrillon, András Páldi, Ulysse Herbach, Rudiyanto Gunawan. Universality of cell differentiation trajectories revealed by a reconstruction of transcriptional uncertainty landscapes from single-cell transcriptomic data. February 2021.

[27] Anne Gégout-Petit, Hélène Jeulin, Karine Legrand, Agathe Bochnakian, Pierre Vallois, Evelyne Schvoerer, Francis Guillemin. Seroprevalence of SARS-CoV-2, symptom profiles and seroneutralization during the first COVID-19 wave in a suburban area, France. June 2021.

[28] Ulysse Herbach. Gene regulatory network inference from single-cell data using a self-consistent proteomic field. October 2021.

[29] Svante Janson, Cécile Mailler, Denis Villemonais. Fluctuations of balanced urns with infinitely many colours. November 2021.

[30] Benoît Lalloué, Jean-Marie Monnez, Eliane Albuisson. Construction and update of an online ensemble score involving linear discriminant analysis and logistic regression. February 2021.

[31] Sandie Ferrigno. Semiparametric reference curves for EDEN cohort. CMStatistics 2021, London, United Kingdom, December 2021.

[32] Jean-Marie Monnez. Cours d'analyse des données et apprentissage : L'analyse en composantes principales. France, April 2021.

[33] Jean-Marie Monnez. Méthodes de classification non supervisée. France, April 2021.

[34] Ulysse Herbach. Harissa: tools for mechanistic gene network inference from single-cell data. October 2021.

[35] Romain Azaïs, François Dufour, Anne Gégout-Petit. Non-Parametric Estimation of the Conditional Distribution of the Interjumping Times for Piecewise-Deterministic Markov Processes. Scandinavian Journal of Statistics, December 2014, 41(4), 950-969.

[36] Romain Azaïs, Sandie Ferrigno, Marie-José Martinez. cvmgof: Cramer-von Mises goodness-of-fit tests. November 2018.

[37] Romain Azaïs, Aurélie Muller-Gueudin. Optimal choice among a class of nonparametric estimators of the jump rate for piecewise-deterministic Markov processes. Electronic Journal of Statistics, 2016.

[38] Romain Azaïs. A recursive nonparametric estimator for the transition kernel of a piecewise-deterministic Markov process. ESAIM: Probability and Statistics, 2014, 18, 726-749.

[39] Romain Azaïs, François Dufour, Anne Gégout-Petit. Nonparametric estimation of the jump rate for non-homogeneous marked renewal processes. 2013, 49(4), 1204-1231.

[40] J. M. Bardet, G. Lang, G. Oppenheim, A. Philippe, S. Stoev, M.S. Taqqu. Semi-parametric estimation of the long-range dependence parameter: a survey. Birkhäuser Boston, 2003, 557-577.

[41] Thierry Bastogne, Sophie Mézières-Wantz, Nacim Ramdani, Pierre Vallois, Muriel Barberi-Heyob. Identification of pharmacokinetics models in the presence of timing noise. Eur. J. Control, 2008, 14(2), 149-157.

[42] Thierry Bastogne, Adeline Samson, Pierre Vallois, S. Wantz-Mézières, Sophie Pinel, Denise Bechet, Muriel Barberi-Heyob. Phenomenological modeling of tumor diameter growth based on a mixed effects model. Journal of Theoretical Biology, 2010, 262(3), 544-552.

[43] D.P. Bertsekas, J.N. Tsitsiklis. Neurodynamic Programming. Athena Scientific, 1996.

[44] Hermine Biermé, Céline Lacaux, Hans-Peter Scheffler. Multi-operator Scaling Random Fields. Stochastic Processes and their Applications, 2011, 121(11), 2642-2677.

[45] Hervé Cardot, Peggy Cénac, Jean-Marie Monnez. A fast and recursive algorithm for clustering large datasets with k-medians. Computational Statistics & Data Analysis, 2012, 56(6), 1434-1449.

[46] Patrick Cattiaux, Sylvie Méléard. Competitive or weak cooperative stochastic Lotka-Volterra systems conditioned on non-extinction. Journal of Mathematical Biology, 2010, 60(6), 797-829.

[47] Nicolas Champagnat, Denis Villemonais. Exponential convergence to quasi-stationary distribution and Q-process. Probability Theory and Related Fields, 2016, 164(1), 243-283.

[48] J. F. Coeurjolly. Simulation and identification of the fractional Brownian motion: a bibliographical and comparative study. Journal of Statistical Software, 2000, 5, 1-53.

[49] Mark H. A. Davis. Piecewise-deterministic Markov processes: A general class of non-diffusion stochastic models. Journal of the Royal Statistical Society. Series B (Methodological), 1984, 353-388.

[50] Aurélien Deya, Samy Tindel. Rough Volterra equations. I. The algebraic integration setting. Stoch. Dyn., 2009, 9(3), 437-477.

[51] Marie Doumic, Marc Hoffmann, Nathalie Krell, Lydia Robert. Statistical estimation of a growth-fragmentation model observed on a genealogical tree. Bernoulli, 2015, 21(3), 1760-1799.

[52] Sandie Ferrigno, Gilles Ducharme. Un test d'adéquation global pour la fonction de répartition conditionnelle. C. R. Math. Acad. Sci. Paris, 2005, 341(5), 313-316.

[53] Sandie Ferrigno, Myriam Maumy-Bertrand, Aurélie Muller-Gueudin. Uniform law of the logarithm for the local linear estimator of the conditional distribution function. C. R. Math. Acad. Sci. Paris, 2010, 348(17-18), 1015-1019.

[54] Jerome Friedman, Trevor Hastie, Robert Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 2008, 9(3), 432-441.

[55] Christophe Giraud, Sylvie Huet, Nicolas Verzelen. Graph selection with GGMselect. Statistical Applications in Genetics and Molecular Biology, 2012, 11(3).

[56] T.D. Hansen, U. Zwick. Lower Bounds for Howard's Algorithm for Finding Minimum Mean-Cost Cycles. 2010, 415-426.

[57] Samuel Herrmann, Pierre Vallois. From persistent random walk to the telegraph noise. Stoch. Dyn., 2010, 10(2), 161-196.

[58] Jianghai Hu, Wei-Chung Wu, Shankar Sastry. Modeling subtilin production in Bacillus subtilis using stochastic hybrid systems. Springer, 2004, 417-431.

[59] Roukaya Keinj, Thierry Bastogne, P.
Pierre Vallois Journal of Theoretical Biology June 2011 279 1 55-62 Quantile regression R. Roger Koenker 2005 Cambridge university press 38 Statistical inference for ergodic diffusion processes Y. A. Yury A. Kutoyants Springer Series in Statistics 2004 Springer-Verlag London Ltd. xiv+481 Real Harmonizable Multifractional Lévy Motions C. Céline Lacaux Ann. Inst. Poincaré. 2004 40 3 259--277 Convergence d'un score d'ensemble en ligne : étude empirique B. Benôit Lalloué J.-M. Jean-Marie Monnez E. Eliane Albuisson July 2020 On the Benzecri's method for computing eigenvectors by stochastic approximation (the case of binary data) L. Ludovic Lebart 1974 Physica Verlag 202--211 Non-Stationary Approximate Modified Policy Iteration B. Boris Lesner B. Bruno Scherrer July 2015 System control and rough paths T. T. Lyons Z. Z. Qian Oxford mathematical monographs 2002 Clarendon Press High-dimensional graphs and variable selection with the lasso N. Nicolai Meinshausen P. Peter Bühlmann The Annals of Statistics 2006 1436--1462 Approximation stochastique en analyse factorielle multiple J.-M. Jean-Marie Monnez Ann. I.S.U.P. 2006 50 3 27--45 Convergence d'un processus d'approximation stochastique en analyse factorielle J.-M. Jean-Marie Monnez Publ. Inst. Statist. Univ. Paris 1994 38 1 37--55 Stochastic approximation of the factors of a generalized canonical correlation analysis J.-M. Jean-Marie Monnez Statist. Probab. Lett. 2008 78 14 2210--2216 On non-parametric estimates of density functions and regression curves E. EA Nadaraya Theory of Probability & Its Applications 1965 10 1 186--190 The simplex method is strongly polynomial for deterministic Markov decision processes I. I. Post Y. Y. Ye 2012 Markov Decision Processes M. M. Puterman 1994 Wiley, New York Brownian penalisations related to excursion lengths, VII B. Bernard Roynette P. Pierre Vallois M. Marc Yor 2009 45 2 421--452 Elements of stochastic calculus via regularization F. Francesco Russo P. 
Pierre Vallois Lecture Notes in Math. 2007 Springer 1899 147--185 Stochastic calculus with respect to continuous finite quadratic variation processes F. Francesco Russo P. Pierre Vallois Stochastics: An International Journal of Probability and Stochastic Processes 2000 70 1-2 1--40 Approximate Policy Iteration Schemes: A Comparison B. Bruno Scherrer June 2014 Approximate Modified Policy Iteration and its Application to the Game of Tetris B. Bruno Scherrer M. Mohammad Ghavamzadeh V. Victor Gabillon B. Boris Lesner M. Matthieu Geist Journal of Machine Learning Research 2015 16 1629--1676 Improved and Generalized Upper Bounds on the Complexity of Policy Iteration B. Bruno Scherrer Mathematics of Operations Research February 2016 On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes B. Bruno Scherrer B. Boris Lesner December 2012 Performance Bounds for Lambda Policy Iteration and Application to the Game of Tetris B. Bruno Scherrer Journal of Machine Learning Research January 2013 14 1175-1221 Memory-based persistence in a counting random walk process P. Pierre Vallois C. S. Charles S. Tapiero Phys. A. 2007 386 1 303--307 The range of a simple random walk on Z P. Pierre Vallois Advances in applied probability 1996 1014--1033 An introduction to network inference and mining N. Nathalie Villa-Vialaneix 2015 The Simplex and Policy-Iteration Methods Are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate Y. Y. Ye Math. Oper. Res. 2011 36 4 593-603