BIGS - 2020 - Annual activity report

2020

Activity report

Project-Team

BIGS

RNSR: 200920955T

Research center

Nancy - Grand Est

In partnership with:

CNRS, Université de Lorraine

Biology, genetics and statistics

In collaboration with:

Institut Elie Cartan de Lorraine (IECL)

Domain

Digital Health, Biology and Earth

Theme

Computational Biology

Creation of the Team: 2009 January 01, updated into Project-Team: 2011 January 01

Keywords

Computer Science and Digital Science

A3.1. Data
A3.1.1. Modeling, representation
A3.2. Knowledge
A3.2.3. Inference
A3.3. Data and knowledge analysis
A3.3.1. On-line analytical processing
A3.3.2. Data mining
A3.3.3. Big data analysis
A3.4.1. Supervised learning
A3.4.2. Unsupervised learning
A3.4.4. Optimization and learning
A3.4.7. Kernel methods
A6. Modeling, simulation and control
A6.1. Methods in mathematical modeling
A6.1.2. Stochastic Modeling
A6.2. Scientific computing, Numerical Analysis & Optimization
A6.2.3. Probabilistic methods
A6.2.4. Statistical methods
A6.4. Automatic control
A6.4.2. Stochastic control

1 Team members, visitors, external collaborators

Research Scientists

Nicolas Champagnat [Inria, Senior Researcher, from Dec 2020, HDR]
Coralie Fritsch [Inria, Researcher, from Dec 2020]
Ulysse Herbach [Inria, Researcher]
Bruno Scherrer [Inria, Researcher, HDR]

Faculty Members

Anne Gégout Petit [Team leader, Univ de Lorraine, Professor, HDR]
Thierry Bastogne [Univ de Lorraine, Professor, HDR]
Sandie Ferrigno [Univ de Lorraine, Associate Professor]
Sophie Mezieres [Univ de Lorraine, Associate Professor]
Jean-Marie Monnez [Univ de Lorraine, Emeritus, HDR]
Aurélie Muller-Gueudin [Univ de Lorraine, Associate Professor]
Samy Tindel [Univ de Lorraine, Professor, HDR]
Pierre Vallois [Univ de Lorraine, Professor, HDR]
Denis Villemonais [Univ de Lorraine, Associate Professor, from Sep 2020, HDR]

Post-Doctoral Fellows

Emma Horton [Université de Bath - Angleterre, until Nov 2020]
William Ocafrain [Inria, from Dec 2020]

PhD Students

Vincent Hass [Inria, from Dec 2020]
Clémence Karmann [Univ de Lorraine, until Aug 2020]
Rodolphe Loubaton [Univ de Lorraine, from Dec 2020]
Nassim Sahki [Inria]
Nino Vieillard [Google, CIFRE]
Nicolás Zalduendo Vidal [Inria, from Dec 2020]

Technical Staff

Benoît Lalloué [Centre hospitalier universitaire de Nancy, Engineer]
Nicolas Thorr [Inria, Engineer, from Dec 2020]

Interns and Apprentices

Salma Aziz [Inria, from Aug 2020 until Sep 2020]
Alfred Kamdem Tezanlekeu [Inria, from Sep 2020]

Administrative Assistant

Celine Cordier [Inria]

External Collaborators

Céline Lacaux [Univ d'Avignon et des pays du Vaucluse, HDR]
Lionel Lenôtre [Univ de Haute Alsace]

2 Overall objectives

BIGS is a joint team of Inria, CNRS and Université de Lorraine, via the Institut Élie Cartan de Lorraine (IECL), UMR 7502, a CNRS-UL mathematics laboratory of which Inria is a strong partner. One member of BIGS, T. Bastogne, comes from the Research Center for Automatic Control of Nancy (CRAN), with which BIGS has strong relations in the domain "Health-Biology-Signal". Our research focuses mainly on stochastic modeling and statistics, with the aim of better understanding biological systems. BIGS involves applied mathematicians whose research interests mainly concern probability and statistics. More precisely, our attention is directed at (1) stochastic modeling, (2) estimation and control for stochastic processes, (3) algorithms and estimation for graph data and (4) regression and machine learning. The main objective of BIGS is to exploit these skills in applied mathematics to provide a better understanding of issues arising in life sciences, with a special focus on (1) tumor growth, (2) photodynamic therapy, (3) population studies of genomic data and of micro-organisms genomics, (4) epidemiology and e-health.

3 Research program

3.1 Introduction

We give here the main lines of our research, which belongs to the domains of probability and statistics. For clarity, we have chosen to structure them in four items. Although this choice was not arbitrary, the boundaries between these items are sometimes fuzzy because each of them deals with modeling and inference and they are all interconnected.

3.2 Stochastic modeling

Our aim is to propose relevant stochastic frameworks for the modeling and understanding of biological systems. Stochastic processes are particularly suitable for this purpose. Among them, Markov chains give a first framework for the modeling of populations of cells 73, 50. Piecewise deterministic processes are non-diffusion processes that are also frequently used in the biological context 40, 49, 42. Among Markov models, we have developed strong expertise in processes derived from Brownian motion and Stochastic Differential Equations 66, 48. For instance, knowledge about Brownian or random walk excursions 72, 64 helps to analyse genetic sequences and to develop inference about them. However, nature provides us with many examples of systems in which the observed signal has a given Hölder regularity that does not correspond to the one we might expect from a system driven by ordinary Brownian motion.

This situation is commonly handled by noisy equations driven by Gaussian processes such as fractional Brownian motion or fractional fields. The basic aspects of these differential equations are now well understood, mainly thanks to the so-called rough paths tools 56, but also through the Russo-Vallois integration techniques 65. The specific issue of Volterra equations driven by fractional Brownian motion, which is central for the problem of subdiffusion within proteins, is addressed in 41. Many generalizations (Gaussian or not) of this model have been recently proposed for some Gaussian locally self-similar fields, or for some non-Gaussian models 53, or for anisotropic models 37.
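
As an illustration of the kind of driving noise discussed above, the following minimal sketch simulates a fractional Brownian motion path on a grid by Cholesky factorisation of its covariance. It is an assumption-laden example rather than a method used by the team: the function names, the grid and the default parameters are all hypothetical, and the Cholesky approach is only practical for short paths. A Hurst index below 1/2 gives paths rougher than Brownian motion, and an index above 1/2 gives smoother ones.

```python
import math
import random


def fbm_covariance(times, hurst):
    """Covariance matrix of fractional Brownian motion at the given positive times."""
    h2 = 2.0 * hurst
    return [[0.5 * (s ** h2 + t ** h2 - abs(t - s) ** h2) for s in times] for t in times]


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - acc)
            else:
                lower[i][j] = (matrix[i][j] - acc) / lower[j][j]
    return lower


def simulate_fbm(n_steps, hurst, horizon=1.0, seed=0):
    """Return (times, path) for one fBm sample on a regular grid of (0, horizon]."""
    rng = random.Random(seed)
    # Time 0 is excluded so that the covariance matrix stays positive definite.
    times = [horizon * (k + 1) / n_steps for k in range(n_steps)]
    lower = cholesky(fbm_covariance(times, hurst))
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_steps)]
    # Correlate the independent Gaussian draws with the Cholesky factor.
    path = [sum(lower[i][k] * noise[k] for k in range(i + 1)) for i in range(n_steps)]
    return times, path
```

With `hurst=0.5` the covariance reduces to `min(s, t)`, so the sketch then produces ordinary Brownian motion.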

3.3 Estimation and control for stochastic processes

We develop inference about stochastic processes that we use for modeling. Control of stochastic processes is also a way to optimise administration (dose, frequency) of therapy.

There are many estimation techniques for diffusion processes or for the coefficients of fractional or multifractional Brownian motion from a set of observations 52, 33, 39. However, the inference problem for diffusions driven by a fractional Brownian motion is still in its infancy. Our team has good expertise in the inference of the jump rate and the kernel of Piecewise Deterministic Markov Processes (PDMP) 31, 30, 29, 32. However, there are many directions in which to go further. For instance, previous works assumed a complete observation of jumps and mode, which is unrealistic in practice. We tackle the problem of inference for "Hidden PDMP". As an example, in pharmacokinetics modeling inference, we want to account for the presence of timing noise and for identification from longitudinal data. We have expertise on this subject 34, and we have also used mixed models to estimate tumor growth 35.

We consider the control of stochastic processes within the framework of Markov Decision Processes 63 and their generalization known as multi-player stochastic games, with a particular focus on infinite-horizon problems. In this context, we are interested in the complexity analysis of standard algorithms, as well as the proposition and analysis of numerical approximate schemes for large problems in the spirit of 36. Regarding complexity, a central topic of research is the analysis of the Policy Iteration algorithm, which has made significant progress in the last years 75, 62, 47, 69, but is still not fully understood. For large problems, we have a long experience of sensitivity analysis of approximate dynamic programming algorithms for Markov Decision Processes 71, 70, 67, 55, 68, and we currently investigate whether/how similar ideas may be adapted to multi-player stochastic games.
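
For readers unfamiliar with Policy Iteration, here is a minimal tabular sketch for a discounted Markov Decision Process. It is only an illustration under assumptions: the data layout (`transitions[s][a]` as a list of next-state probabilities, `rewards[s][a]` as a scalar), the two-state toy problem and all names are hypothetical, and policy evaluation is done by simple fixed-point iteration rather than by solving a linear system.

```python
def action_value(transitions, rewards, values, gamma, state, action):
    """One-step lookahead value of taking `action` in `state`."""
    expected_next = sum(p * values[t] for t, p in enumerate(transitions[state][action]))
    return rewards[state][action] + gamma * expected_next


def policy_evaluation(transitions, rewards, policy, gamma, tol=1e-10):
    """Iterate the Bellman operator of a fixed policy until it stabilises."""
    values = [0.0] * len(transitions)
    while True:
        new_values = [
            action_value(transitions, rewards, values, gamma, s, policy[s])
            for s in range(len(transitions))
        ]
        if max(abs(a - b) for a, b in zip(new_values, values)) < tol:
            return new_values
        values = new_values


def policy_iteration(transitions, rewards, gamma):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    n_states, n_actions = len(transitions), len(transitions[0])
    policy = [0] * n_states
    while True:
        values = policy_evaluation(transitions, rewards, policy, gamma)
        improved = list(policy)
        for s in range(n_states):
            current = action_value(transitions, rewards, values, gamma, s, policy[s])
            for a in range(n_actions):
                candidate = action_value(transitions, rewards, values, gamma, s, a)
                # Switch only on a strict improvement, which avoids cycling on ties.
                if candidate > current + 1e-9:
                    improved[s], current = a, candidate
        if improved == policy:
            return policy, values
        policy = improved


# Hypothetical toy problem: action 0 stays in place, action 1 moves to the other
# state, and only staying in state 1 is rewarded.
TOY_TRANSITIONS = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
TOY_REWARDS = [[0.0, 0.0], [1.0, 0.0]]
```

Calling `policy_iteration(TOY_TRANSITIONS, TOY_REWARDS, 0.9)` returns the policy that moves from state 0 to state 1 and then stays there. The number of improvement rounds needed in general is the complexity question mentioned above.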

3.4 Algorithms and estimation for graph data

A graph data structure consists of a set of nodes, together with a set of pairs of these nodes called edges. This type of data is frequently used in biology because it provides a mathematical representation of many concepts such as biological structures and networks of relationships in a population. Some attention has recently been focused in the group on modeling and inference for graph data.

Network inference is the process of making inference about the link between two variables while taking into account the information about other variables. Reference 74 gives a very good introduction and many references about network inference and mining. Many methods are available to infer and test edges in Gaussian graphical models 74, 57, 45, 46. However, abundance data are zero-inflated and therefore far from the Gaussian assumption, and we want to develop inference in this case.
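
In the Gaussian case, "taking into account the information about other variables" amounts to looking at partial rather than marginal correlations. The sketch below is a hypothetical three-variable example (a chain X, Z, Y built for illustration, with made-up function names): X and Y are clearly correlated, yet their partial correlation given Z is close to zero, so no direct edge between X and Y would be inferred.

```python
import math
import random


def correlation(x, y):
    """Empirical Pearson correlation of two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = correlation(x, y), correlation(x, z), correlation(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))


def simulate_chain(n, seed=0):
    """Hypothetical chain X -> Z -> Y with independent standard Gaussian noises."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = [a + rng.gauss(0.0, 1.0) for a in x]
    y = [c + rng.gauss(0.0, 1.0) for c in z]
    return x, y, z
```

With more than three variables the same quantity is usually read off the inverse covariance matrix. The zero-inflated setting mentioned above is precisely where this Gaussian shortcut stops being appropriate.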

Among graphs, trees play a special role because they offer a good model for many biological concepts, from RNA to phylogenetic trees through plant structures. Our research deals with several aspects of tree data. In particular, we work on statistical inference for this type of data under a given stochastic model. We also work on lossy compression of trees via directed acyclic graphs. These methods enable us to compute distances between tree data faster than from the original structures and with a high accuracy.

3.5 Regression and machine learning

Regression models and machine learning aim at inferring statistical links between a variable of interest and covariates. In biological studies, it is always important to develop adapted learning methods, both for standard data and for high-dimensional data (sometimes with few observations) and very massive or online data.

Many methods are available to estimate conditional quantiles and test dependencies 61, 51. Among them, we have developed nonparametric estimation by local analysis via kernel methods 43, 44, and we want to study the properties of this estimator in order to derive measures of risk such as confidence bands and tests. We also study many other regression models, such as survival analysis and spatio-temporal models with covariates. Among the multiple regression models, we want to develop omnibus tests that examine several assumptions together.
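
To make the idea of kernel-based conditional quantile estimation concrete, here is a minimal sketch that uses a Nadaraya-Watson type estimate of the conditional distribution function and inverts it. It is a generic textbook construction, not necessarily the estimator studied in 43, 44. The Gaussian kernel, the bandwidth and every name below are assumptions made for the example.

```python
import math
import random


def kernel_weights(x0, xs, bandwidth):
    """Gaussian kernel weights of the observations xs around the point x0."""
    return [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]


def conditional_cdf(x0, y0, xs, ys, bandwidth):
    """Kernel estimate of P(Y <= y0 | X = x0)."""
    weights = kernel_weights(x0, xs, bandwidth)
    below = sum(w for w, y in zip(weights, ys) if y <= y0)
    return below / sum(weights)


def conditional_quantile(x0, tau, xs, ys, bandwidth):
    """Smallest observed y whose estimated conditional CDF reaches tau."""
    weights = kernel_weights(x0, xs, bandwidth)
    total = sum(weights)
    cumulative = 0.0
    for y, w in sorted(zip(ys, weights)):
        cumulative += w
        if cumulative >= tau * total:
            return y
    return max(ys)  # guards against rounding when tau is 1


def simulate_regression(n, seed=0):
    """Hypothetical data: Y = 2 X + Gaussian noise, X uniform on [0, 1]."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys
```

On the simulated data, `conditional_quantile(0.5, 0.5, xs, ys, 0.05)` should be close to 1, the true conditional median at x = 0.5. Plotting several quantile levels against x gives reference curves of the kind used for growth data.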

Concerning the analysis of high dimensional data, our view on the topic relies on the French data analysis school, specifically on Factorial Analysis tools. In this context, stochastic approximation is an essential tool 54, which allows one to approximate eigenvectors in a stepwise manner 59, 58, 60. BIGS aims at performing accurate classification or clustering by taking advantage of the possibility of updating the information "online" using stochastic approximation algorithms 38. We focus on several incremental procedures for regression and data analysis like linear and logistic regressions and PCA (Principal Component Analysis).

We also focus on the biological context of high-throughput bioassays in which several hundreds or thousands of biological signals are measured for a posteriori analysis. We have to account for the inter-individual variability within the modeling procedure. We aim at developing a new solution based on an ARX (Auto Regressive model with eXternal inputs) model structure using the EM (Expectation-Maximisation) algorithm for the estimation of the model parameters.
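
The ARX structure itself can be illustrated on a single signal. The sketch below simulates and fits a first-order model y_t = a y_{t-1} + b u_{t-1} + e_t by ordinary least squares. It deliberately leaves out inter-individual variability and the EM algorithm mentioned above. The model order, the parameter values and the function names are assumptions chosen for the example.

```python
import random


def simulate_arx(n, a, b, noise_sd, seed=0):
    """Simulate a first-order ARX signal driven by a random input sequence."""
    rng = random.Random(seed)
    inputs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    outputs = [0.0]
    for t in range(1, n):
        outputs.append(a * outputs[t - 1] + b * inputs[t - 1] + rng.gauss(0.0, noise_sd))
    return inputs, outputs


def fit_arx(inputs, outputs):
    """Least-squares estimate of (a, b) from the 2x2 normal equations."""
    steps = range(1, len(outputs))
    s_yy = sum(outputs[t - 1] ** 2 for t in steps)
    s_uu = sum(inputs[t - 1] ** 2 for t in steps)
    s_yu = sum(outputs[t - 1] * inputs[t - 1] for t in steps)
    s_y_target = sum(outputs[t - 1] * outputs[t] for t in steps)
    s_u_target = sum(inputs[t - 1] * outputs[t] for t in steps)
    det = s_yy * s_uu - s_yu ** 2
    a_hat = (s_uu * s_y_target - s_yu * s_u_target) / det
    b_hat = (s_yy * s_u_target - s_yu * s_y_target) / det
    return a_hat, b_hat
```

In a population setting, one would let (a, b) vary between individuals around common values, which is where a mixed-effects formulation and an EM-type algorithm come in.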

4 Application domains

4.1 Tumor growth-oncology

On this topic, we want to propose branching processes to model the appearance of mutations in tumors through new collaborations with clinicians. The observed process is the "circulating DNA" (ctDNA). The final purpose is to use ctDNA as an early biomarker of resistance to an immunotherapy treatment. This is the aim of the ITMO project. Another topic is the identification of dynamic networks of expression. In the ongoing work on low-grade gliomas, a local database of 400 patients will soon be available to construct models. We plan to extend it through national and international collaborations (Montpellier CHU, Montreal CRHUM). Our aim is to build a decision-aid tool for personalised medicine. In the same context, there is a topic of clustering analysis of a brain cartography obtained by sensorial simulations during awake surgery.

4.2 Genomic data and micro-organisms population

Despite the 'G' in the name BIGS, genetics is not central in the applications of the team. However, we want to contribute to a better understanding of the correlations between genes through their expression data and of the genetic bases of drug response and disease. We have contributed to methods for detecting proteomics and transcriptomics variables linked with the outcome of a treatment.

4.3 Epidemiology and e-health

Much work remains in our ongoing projects in the context of personalized medicine with CHU Nancy. These projects deal with biomarker research, the prognostic value of quantitative variables and events, scoring, and adverse events. We also want to develop our expertise in rupture detection in a project with APHP (Assistance Publique Hôpitaux de Paris) for the detection of adverse events earlier than the clinical signs and symptoms. The clinical relevance of predictive analytics is obvious for high-risk patients, for instance those with solid organ transplantation or severe chronic respiratory disease. The main challenge is rupture detection in multivariate and heterogeneous signals (for instance daily measures of electrocardiogram, body temperature, spirometry parameters, sleep duration, etc.). Other collaborations with clinicians concern foetopathology, and we want to use our work on conditional distribution functions to explain fetal and child growth. We have data from the "Service de foetopathologie et de placentologie" of the "Maternité Régionale Universitaire" (CHU Nancy).

4.4 Dynamics of telomeres

Telomeres are disposable buffers at the ends of chromosomes which are truncated during cell division, so that the telomere ends become shorter with each cell division over time. In this way, they are markers of aging. Through a collaboration with Prof. A. Benetos, geriatrician at CHU Nancy, we recently obtained data on the distribution of the length of telomeres from blood cells. With members of Inria team TOSCA, we want to work in three connected directions: (1) refine methodology for the analysis of the available data; (2) propose a dynamical model for the lengths of telomeres and study its mathematical properties (long term behavior, quasi-stationarity, etc.); and (3) use these properties to develop new statistical methods. A postdoc position is already planned in the Lorraine Université d'Excellence, LUE project GEENAGE (managed by CHU Nancy).

5 Social and environmental responsibility

We followed Inria's recommendations to get involved in the fight against COVID-19. We responded to the WHO's encouragement, relayed by our mathematical colleagues at the national level, to conduct seroprevalence studies in randomly drawn samples of the population. This is the purpose of the COVAL study described in the results section, initiated by Pierre Vallois.

6 Highlights of the year

The highlight of the year is the merger between BIGS and the members of the former TOSCA team, specialised in modelling for biological sciences and medicine: Nicolas Champagnat, Coralie Fritsch, Denis Villemonais and their post-doc and PhD students. The other highlights of the year are, unsurprisingly, those of the pandemic: most of our teachers devoted a lot of time to distance learning. Other researchers, especially PhD students, suffered from the lack of contacts and meetings. Part of the team was involved in supervising a seroprevalence study. Thanks to the quality of the collaboration with hospital doctors in this study, we are now involved in modelling the amount of coronavirus in wastewater in order to predict the number of hospital admissions.

7 New software and platforms

7.1 New software

7.1.1 Angio-Analytics

Keywords: Health, Cancer, Biomedical imaging
Scientific Description: This tool allows the pharmacodynamic characterization of anti-vascular effects in anti-cancer treatments. It uses time series of in vivo images provided by intra-vital microscopy. Such in vivo images are obtained using skinfold chambers placed on the skin of mice. The automated analysis is split into two steps that were previously performed separately and entirely by hand. The first step corresponds to image processing to identify characteristics of the vascular network. The last step is the system identification of the pharmacodynamic response and the statistical analysis of the model parameters.
Functional Description: Angio-Analytics allows the pharmacodynamic characterization of anti-vascular effects in anti-cancer treatments.
Contact: Thierry Bastogne
Participant: Thierry Bastogne

7.1.2 ARMADA

Name: A Statistical Methodology to Select Covariates in High-Dimensional Data under Dependence
Keywords: Biostatistics, Aggregated methods, High Dimensional Data, Personalized medicine, Variable selection
Functional Description: Two-step variable selection procedure for high-dimensional dependent data with few observations. The first step is dedicated to eliminating dependence between variables (clustering of variables, followed by factor analysis inside each cluster). The second step is a variable selection by aggregation of adapted methods. <https://hal.archives-ouvertes.fr/hal-02173568>
News of the Year: This package is a new one.
URL: https://cran.r-project.org/web/packages/armada/
Publication: hal-02363338
Contacts: Aurélie Muller, Anne Gégout-Petit
Participants: Aurélie Muller, Anne Gégout-Petit

7.1.3 kosel

Name: Variable Selection by Revisited Knockoffs Procedures
Keywords: Variable selection, Regression
Functional Description: Performs variable selection for many types of L1-regularised regressions using the revisited knockoffs procedure. This procedure uses a matrix of knockoffs of the covariates independent from the response variable Y. The idea is to determine if a covariate belongs to the model depending on whether it enters the model before or after its knockoff. The procedure is suitable for a wide range of regressions with various types of response variables. Regression models available are exported from the R packages 'glmnet' and 'ordinalNet'. Based on the paper linked to via the URL below: Gegout A., Gueudin A., Karmann C. (2019) <arXiv:1907.03153>
News of the Year: This package is a new one.
URL: https://cran.r-project.org/web/packages/kosel/kosel.pdf
Publication: hal-01799914
Contacts: Clémence Karmann, Aurélie Muller
Participants: Clémence Karmann, Aurélie Muller, Anne Gégout-Petit

7.1.4 SesIndexCreatoR

Functional Description: This package allows computing and visualizing socioeconomic indices and categories distributions from datasets of socioeconomic variables (These tools were developed as part of the EquitArea Project, a public health program).
URL: http://www.equitarea.org/documents/packages_1.0-0/
Contact: Benoît Lalloué
Participants: Benoît Lalloué, Jean-Marie Monnez, Nolwenn Le Meur, Severine Deguen

7.1.5 In silico

Name: In silico design of nanoparticles for the treatment of cancers by enhanced radiotherapy
Keywords: Bioinformatics, Cancer, Drug development
Functional Description: To speed up the preclinical development of medical engineered nanomaterials, we have designed an integrated computing platform dedicated to the virtual screening of nanostructured materials activated by X-ray making it possible to select nano-objects presenting interesting medical properties faster. The main advantage of this in silico design approach is to virtually screen a lot of possible formulations and to rapidly select the most promising ones. The platform can currently handle the accelerated design of radiation therapy enhancing nanoparticles and medical imaging nano-sized contrast agents as well as the comparison between nano-objects and the optimization of existing materials.
Contact: Thierry Bastogne
Participant: Thierry Bastogne

7.1.6 HSPOR

Name: Hidden Smooth Polynomial Regression for Rupture Detection
Keywords: Polynomial regression, Rupture detection
Functional Description: Several functions that infer, by different methods, a piecewise polynomial regression model under regularity constraints, namely continuity or differentiability of the link function. The implemented functions are either specific to data with two regimes, or generic for any number of regimes, which can be given by the user or learned by the algorithm.
News of the Year: This package is a new one
URL: https://cran.r-project.org/web/packages/HSPOR/
Contact: Florine Greciet
Participants: Florine Greciet, Romain Azais, Anne Gégout-Petit

7.1.7 cvmgof

Keywords: Regression, Test, Estimators
Scientific Description: Many goodness-of-fit tests have been developed to assess the different assumptions of a (possibly heteroscedastic) regression model. Most of them are "directional" in that they detect departures from a given assumption of the model. Other tests are "global" (or "omnibus") in that they assess whether a model fits a dataset on all its assumptions. cvmgof focuses on the task of choosing the structural part of the regression function because it contains easily interpretable information about the studied relationship. It implements 2 nonparametric "directional" tests and one nonparametric "global" test, all based on generalizations of the Cramer-von Mises statistic.
Functional Description: cvmgof is an R library devoted to Cramer-von Mises goodness-of-fit tests. It implements three nonparametric statistical methods based on Cramer-von Mises statistics to estimate and test a regression model.
News of the Year: New version available on the CRAN website since Jan 11 2021. Preprint available on HAL since Jan 7 2021.
URL: https://cran.r-project.org/web/packages/cvmgof/index.html
Publication: hal-03101612v1
Contacts: Sandie Ferrigno, Romain Azais
Participants: Sandie Ferrigno, Marie-José Martinez, Romain Azais

7.1.8 starm R

Name: Spatio-Temporal Autologistic Regression Model, package R
Keywords: Spatio-temporal, Autologistic model
Functional Description: Estimation and model selection of the two-time centered autologistic regression model based on Gegout-Petit A., Guerin-Dubrana L., Li S. "A new centered spatio-temporal autologistic regression model. Application to local spread of plant diseases." 2019 <arXiv:1811.06782>. Application for the spatio-temporal modelling of the spread of a disease on a grid over time.
Contact: Anne Gégout-Petit

8 New results

8.1 Stochastic modelling

Participants: Anne Gégout-Petit, Ulysse Herbach, Sophie Wantz-Mézières, Pierre Vallois.

8.1.1 Modelling of diffuse low-grade gliomas growth

We are continuing our research on the modelling of the growth of diffuse low-grade gliomas. We propose an original MRI-based method to quantify the brain infiltration of gliomas, which is easy for neuro-oncologists to implement and to interpret. The aim is to guide the treatment strategy by giving functional information using only anatomical knowledge and conventional MRI sequences. This work has been the subject of a conference paper 15.

A retrospective survival study with 35 years of follow-up has been carried out 9.

8.1.2 Reconstruction of epigenetic landscapes from single-cell data

The aim is to better understand how living cells make decisions (e.g., differentiation of a stem cell into a particular specialized type), seeing decision-making as an emergent property of an underlying complex molecular network. Indeed, it is now proven that cells react probabilistically to their environment: cell types do not correspond to fixed states, but rather to “potential wells” of a certain energy landscape (representing the energy of the possible states of the cell) that we are trying to reconstruct. A first paper proposing a reconstruction method has been submitted 24 in the framework of an international collaboration (USA, Switzerland, France). Another paper is about to be submitted, dealing more specifically with the inference of the underlying networks.

Joint work with Nan Papili Gao (ETH Zurich), Olivier Gandrillon (ENS Lyon), András Páldi (EPHE, Paris), and Rudiyanto Gunawan (University at Buffalo, New York)

8.2 Optimal control of Markov processes

Participants: Bruno Scherrer, Nino Vieillard.

In 13, we adapt the concept of momentum from optimization to reinforcement learning. Seeing the state-action value functions as an analog of the gradients in optimization, we interpret momentum as an average of consecutive q-functions. We derive Momentum Value Iteration (MoVI), a variation of Value Iteration that incorporates this momentum idea. Our analysis shows that this allows MoVI to average errors over successive iterations. We show that the proposed approach can be readily extended to deep learning. Specifically, we propose a simple improvement on DQN based on MoVI, and evaluate it on Atari games. This work has been published in the AISTATS conference.
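
The averaging idea can be sketched in a tabular setting. The code below is a simplified reading of the description above, not the algorithm as published in 13: the policy is taken greedy with respect to a running average `h` of the successive q-functions, and `q` is then updated with the Bellman operator of that policy. The uniform averaging weights, the data layout and the toy problem are assumptions made for the example.

```python
def momentum_value_iteration(transitions, rewards, gamma, n_iterations):
    """Tabular sketch: act greedily on the running average of the q-functions."""
    n_states, n_actions = len(transitions), len(transitions[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    h = [[0.0] * n_actions for _ in range(n_states)]
    for k in range(1, n_iterations + 1):
        # Greedy policy with respect to the averaged q-function h.
        policy = [max(range(n_actions), key=lambda a: h[s][a]) for s in range(n_states)]
        # One application of the Bellman operator of that policy to q.
        q = [
            [
                rewards[s][a]
                + gamma * sum(p * q[t][policy[t]] for t, p in enumerate(transitions[s][a]))
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
        # Running average of q_0, ..., q_k (assumed uniform weights).
        h = [
            [h[s][a] + (q[s][a] - h[s][a]) / (k + 1) for a in range(n_actions)]
            for s in range(n_states)
        ]
    greedy = [max(range(n_actions), key=lambda a: h[s][a]) for s in range(n_states)]
    return greedy, h, q


# Hypothetical toy problem: action 0 stays in place, action 1 moves to the other
# state, and only staying in state 1 is rewarded.
TOY_TRANSITIONS = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
TOY_REWARDS = [[0.0, 0.0], [1.0, 0.0]]
```

In this exact setting the average only slows things down. Its interest appears when each update of `q` carries an estimation error, since the errors are then averaged rather than followed one at a time.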

Recent Reinforcement Learning (RL) algorithms making use of Kullback-Leibler (KL) regularization as a core component have shown outstanding performance. Yet little is understood theoretically so far about why KL regularization helps. In 12, we study KL regularization within an approximate value iteration scheme and show that it implicitly averages q-values. Leveraging this insight, we provide a very strong performance bound, the very first to combine two desirable aspects: a linear dependency on the horizon (instead of quadratic) and an error propagation term involving an averaging effect of the estimation errors (instead of an accumulation effect). We also study the more general case of an additional entropy regularizer. The resulting abstract scheme encompasses many existing RL algorithms. Some of our assumptions do not hold with neural networks, so we complement this theoretical analysis with an extensive empirical study. This work has been accepted to the NeurIPS conference and selected for oral presentation (selection rate: 1.1% of all submissions).
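
The implicit averaging can be seen from the closed form of a KL-regularized greedy step. The computation below is a standard one, written in notation assumed here ($\lambda > 0$ is a regularization weight, $\pi_0$ is uniform, and estimation errors and the entropy term are ignored). It sketches the mechanism only and does not reproduce the analysis of 12.

$$\pi_{k+1} = \arg\max_{\pi} \; \langle \pi, q_k \rangle - \lambda \, \mathrm{KL}(\pi \,\|\, \pi_k) \quad \Longrightarrow \quad \pi_{k+1}(a \mid s) \propto \pi_k(a \mid s) \, \exp\big( q_k(s,a) / \lambda \big)$$

$$\pi_{k+1}(a \mid s) \propto \exp\Big( \frac{1}{\lambda} \sum_{j=0}^{k} q_j(s,a) \Big) = \exp\Big( \frac{k+1}{\lambda} \cdot \frac{1}{k+1} \sum_{j=0}^{k} q_j(s,a) \Big)$$

The second line follows from the first by induction on $k$: the policy is a softmax of the average of all past q-values, with a temperature $\lambda/(k+1)$ that decreases along the iterations.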

Joint work with Matthieu Geist, Olivier Pietquin, Rémi Munos and Tadashi Kozuno (Google Brain Paris).

8.3 Regression and machine learning

Participants: Thierry Bastogne, Sandie Ferrigno, Anne Gégout-Petit, Clémence Karmann, Benoît Lalloué, Jean-Marie Monnez, Pauline Guyot, Aurélie Gueudin, Sophie Wantz-Mézières.

8.3.1 Cramér–von Mises goodness-of-fit tests in regression models

Many goodness-of-fit tests have been developed to assess the different assumptions of a (possibly heteroscedastic) regression model. Most of them are 'directional' in that they detect departures from a given assumption of the model. Other tests are 'global' (or 'omnibus') in that they assess whether a model fits a dataset on all its assumptions. We focus on the task of choosing the structural part of the regression function because it contains easily interpretable information about the studied relationship. We consider 2 nonparametric 'directional' tests and one nonparametric 'global' test, all based on generalizations of the Cramér–von Mises statistic.

To perform these goodness-of-fit tests, we develop the R package cvmgof (https://hal.archives-ouvertes.fr/hal-02014516), an easy-to-use tool for practitioners, available from the Comprehensive R Archive Network (https://CRAN.R-project.org/package=cvmgof). The use of the library is illustrated through a tutorial on real data and simulation studies are carried out in order to show how the package can be exploited to compare the 3 implemented tests. The practitioner can also easily compare the test procedures with different kernel functions, bootstrap distributions, numbers of bootstrap replicates, or bandwidths. A first article 22 has been submitted on this work.

To complete this work, it would be interesting to assess the other assumptions of a regression model such as the functional form of the variance or the additivity of the random error term. It should be noted that this can already be done using the Ducharme and Ferrigno test implemented in cvmgof since it is a global test. However, it would be relevant to compare the results obtained from the Ducharme and Ferrigno test with those obtained from other directional tests specifically developed to assess one of these assumptions. The implementation of these directional tests would enrich the cvmgof package and offer a complete easy-to-use tool for validating regression models. Moreover, the assessment of the overall validity of the model when using several directional tests could be compared with that done when using only a global test. In particular, the well-known problem of multiple testing could be discussed by comparing the results obtained from multiple test procedures with those obtained when using a global test strategy. Another perspective of this work would be to develop a similar tool for other statistical models widely used in practice such as generalized linear models.

Joint work with Romain Azaïs (Inria, ENS Lyon) and Marie-José Martinez (LJK, Université Grenoble Alpes).

8.3.2 The revisited knockoffs method for variable selection in L1-penalized regressions

We consider the problem of variable selection in regression models. In particular, we are interested in selecting explanatory covariates linked with the response variable and we want to determine which covariates are relevant, that is which covariates are involved in the model. In this framework, we deal with L1-penalized regression models. To handle the choice of the penalty parameter to perform variable selection, we develop a new method based on the knockoffs idea. This revisited knockoffs method is general, suitable for a wide range of regressions with various types of response variables. Besides, it also works when the number of observations is smaller than the number of covariates and gives an order of importance of the covariates. Finally, we provide many experimental results to corroborate our method and compare it with other variable selection methods. This work is published in 5 and is implemented in package ‘kosel’.

The next subsections are dedicated to online data analysis.

8.3.3 Widening the scope of an eigenvector stochastic approximation process and application to streaming PCA and related methods

Accepted in Journal of Multivariate Analysis in October 2020 8.

We prove the almost sure convergence of processes of Oja type to eigenvectors of the expectation of a random matrix while relaxing the i.i.d. assumptions on the observed random matrices. As an application of this generalization, we can perform the online PCA of a random vector Z when there is a data stream of i.i.d. observations of Z, even when both the metric M used and the expectation of Z are unknown and estimated online. Moreover, in order to update the stochastic approximation process at each step, we are no longer bound to using only a mini-batch of observations of Z, but can use all the previous observations up to the current step without storing them. This is useful not only when dealing with streaming data but also with Big Data, as one can process it sequentially as a data stream. In addition, the general framework of this process, unlike other algorithms in the literature, also covers the case of factorial methods related to PCA.
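
For orientation, the classical process of Oja type that this work generalizes can be sketched as follows. The example is deliberately simpler than the setting of 8: the data are assumed to be already centered, the metric is the identity, one observation is used per step, and only the first principal direction is estimated. The step size 1/n, the simulated stream and the names are assumptions.

```python
import math
import random


def simulate_centered_stream(n, seed=0):
    """Hypothetical centered 2D stream whose main axis is the direction (1, 1)."""
    rng = random.Random(seed)
    for _ in range(n):
        shared = rng.gauss(0.0, 2.0)
        yield (shared + rng.gauss(0.0, 0.5), shared + rng.gauss(0.0, 0.5))


def oja_first_eigenvector(stream, dimension):
    """Normalized Oja process for the leading eigenvector of E[x x^T]."""
    w = [1.0] + [0.0] * (dimension - 1)
    for n, x in enumerate(stream, start=1):
        projection = sum(wi * xi for wi, xi in zip(w, x))
        step = 1.0 / n  # decreasing step size
        # Move w towards (x x^T) w, then renormalize to unit length.
        w = [wi + step * projection * xi for wi, xi in zip(w, x)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        w = [wi / norm for wi in w]
    return w
```

Each observation is used once and then discarded, so memory use does not grow with the length of the stream.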

In collaboration with A. Skiredj.

8.3.4 Streaming constrained binary logistic regression with online standardized data

Accepted in "Journal of Applied Statistics" in December 2020 7.

Online learning is a method for analyzing very large datasets ("big data") as well as data streams. In this article, we consider the case of constrained binary logistic regression and show the interest of using processes with an online standardization of the data, in particular to avoid numerical explosions or to allow the use of shrinkage methods. We prove the almost sure convergence of such a process and propose using a piecewise constant step-size such that the latter does not decrease too quickly and does not reduce the speed of convergence. We compare twenty-four stochastic approximation processes with raw or online standardized data on five real or simulated datasets. Results show that, unlike processes with raw data, processes with online standardized data can prevent numerical explosions and yield the best results.
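
The role of online standardization can be illustrated with a small sketch. It is not the constrained process studied in 7: it fits an unconstrained binary logistic regression by stochastic gradient, standardizes each feature with running means and variances, and keeps the step size constant over blocks of observations. The step-size constants, the simulated stream with two features on very different scales and all names are assumptions.

```python
import math
import random


def sigmoid(score):
    """Numerically stable logistic function."""
    if score >= 0.0:
        return 1.0 / (1.0 + math.exp(-score))
    expo = math.exp(score)
    return expo / (1.0 + expo)


def simulate_stream(n, seed=0):
    """Hypothetical stream: two features on very different scales, binary label."""
    rng = random.Random(seed)
    for _ in range(n):
        x = (rng.gauss(1000.0, 200.0), rng.gauss(0.0, 0.01))
        score = 2.0 * (x[0] - 1000.0) / 200.0 - 1.0 * x[1] / 0.01
        yield x, (1 if rng.random() < sigmoid(score) else 0)


def fit_streaming_logistic(stream, n_features, step0=1.0, block=100, power=0.6):
    """Stochastic gradient on online standardized features; intercept in theta[0]."""
    count = 0
    mean = [0.0] * n_features
    m2 = [0.0] * n_features  # running sums of squared deviations (Welford)
    theta = [0.0] * (n_features + 1)
    for x, y in stream:
        count += 1
        for j in range(n_features):
            delta = x[j] - mean[j]
            mean[j] += delta / count
            m2[j] += delta * (x[j] - mean[j])
        if count < 2:
            continue  # no variance estimate yet
        z = [1.0] + [
            (x[j] - mean[j]) / math.sqrt(m2[j] / (count - 1)) if m2[j] > 0.0 else 0.0
            for j in range(n_features)
        ]
        prob = sigmoid(sum(t * v for t, v in zip(theta, z)))
        step = step0 / (1 + count // block) ** power  # piecewise constant
        theta = [t + step * (y - prob) * v for t, v in zip(theta, z)]
    sd = [math.sqrt(m / (count - 1)) for m in m2]
    return theta, mean, sd
```

Fed with the raw features, the same update would multiply gradients by values near 1000, which is the kind of numerical explosion that standardization is meant to prevent.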

In collaboration with E. Albuisson.
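The idea of online standardization combined with a piecewise constant step size can be sketched as follows. This is a simplified illustration, not the processes compared in the article: it omits the constraints, uses a plain stochastic gradient step, and the block length, step-size exponent, function names and simulated data are assumptions.

```python
import math
import random


def sigmoid(t):
    """Logistic function written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def online_logistic(stream, dim, block=100, c=1.0, power=0.75):
    """One-pass logistic regression on online standardized features.

    Returns the coefficients on the standardized scale, intercept last.
    The step size is constant on blocks of `block` observations.
    """
    mean = [0.0] * dim   # running means
    m2 = [0.0] * dim     # running sums of squared deviations (Welford)
    theta = [0.0] * (dim + 1)
    for n, (x, y) in enumerate(stream, start=1):
        for j in range(dim):
            d = x[j] - mean[j]
            mean[j] += d / n
            m2[j] += d * (x[j] - mean[j])
        if n < 2:
            continue     # no variance estimate yet
        z = []
        for j in range(dim):
            sd = math.sqrt(m2[j] / (n - 1))
            z.append((x[j] - mean[j]) / sd if sd > 0 else 0.0)
        z.append(1.0)    # intercept term
        step = c / (1 + n // block) ** power
        err = sigmoid(sum(t * v for t, v in zip(theta, z))) - y
        theta = [t - step * err * v for t, v in zip(theta, z)]
    return theta


def logistic_stream(n, seed=0):
    """Hypothetical stream with two features on very different scales."""
    rng = random.Random(seed)
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        y = 1 if rng.random() < sigmoid(2.0 * z1 - 1.0 * z2) else 0
        yield [1000.0 + 200.0 * z1, 0.01 * z2], y


theta = online_logistic(logistic_stream(10000), dim=2)
```

Because the gradient is computed on standardized features, its size does not depend on the raw scale of each variable, which is the mechanism by which standardization helps avoid numerical explosions.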

8.3.5 Construction and update of an online ensemble score involving linear discriminant analysis and logistic regression

Submitted in February 2021 25, 20.

The aim is to update, upon arrival of new learning data and without storing all of the previously obtained data, the parameters of a score constructed with an ensemble method involving linear discriminant analysis and logistic regression in an online setting. Poisson bootstrap and stochastic approximation processes, the convergence of which has been established theoretically, were used with online standardized data to avoid numerical explosions. The empirical convergence of online ensemble scores to a reference "batch" score was studied on five different datasets from which data streams were simulated, comparing six different processes to construct the online scores. For each score, 50 replications using a total of $10 N$ observations ($N$ being the size of the dataset) were performed to assess the convergence and the stability of the method, computing the mean and standard deviation of a convergence criterion. A complementary study using $100 N$ observations was also performed. The best processes were averaged processes using online standardized data and a piecewise constant step-size.
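The Poisson bootstrap mentioned above replaces resampling with replacement, which requires the whole dataset, by an independent Poisson(1) count drawn for each new observation and each bootstrap replicate. The sketch below applies it to a running mean rather than to a score; the function names and the choice of 20 replicates are assumptions made for the illustration.

```python
import math
import random


def poisson1(rng):
    """Draw from a Poisson(1) distribution (Knuth's multiplication method)."""
    limit = math.exp(-1.0)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def online_bagged_means(stream, n_replicates=20, seed=0):
    """Maintain bootstrap replicates of a running mean in a single pass."""
    rng = random.Random(seed)
    weights = [0] * n_replicates
    means = [0.0] * n_replicates
    for x in stream:
        for b in range(n_replicates):
            k = poisson1(rng)   # number of copies of x in replicate b
            if k == 0:
                continue        # x is out of bag for this replicate
            weights[b] += k
            means[b] += k * (x - means[b]) / weights[b]
    return means
```

In an ensemble score, each replicate would update its own classifier instead of a mean, and the replicates would then be aggregated.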

8.3.6 Change-point detection thresholds in the sequential context

Our work on change-point thresholds for the score-based CUSUM statistic in a sequential context has been published 11. In this paper, we consider the score-based cumulative sum statistic and evaluate the detection performance of several thresholds on simulated data. Three thresholds come from the literature: the Wald constant, the empirical constant, and the conditional empirical instantaneous threshold. Two new thresholds are built by a simulation-based procedure: the first one is instantaneous, the second is a dynamical version of the first. The thresholds' performance, measured by estimates of the mean time between false alarms (MTBFA) and the average detection delay (ADD), is evaluated on independent and autocorrelated data for several scenarios, according to the detection objective and the real change in the data. The simulations allow us to compare the thresholds' results and to see that their performance is robust when a parameter of the pre-change regime is poorly estimated or when the data independence assumption is violated. We also found that the conditional empirical threshold is the best at minimizing the detection delay while maintaining the given false alarm rate. However, on real data, we suggest using the dynamic instantaneous threshold because it is the easiest to build for practical implementation.

Our collaboration with APHP could not succeed because of the great delay in data collection. To apply our algorithms to real data, we turned to some EMG signal data provided by INRS. The study concerns the development of trapezius muscle myalgia in the workplace. We apply change-point detection to characterise different computer activities carried out during an experimental day.
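To make the role of a detection threshold concrete, here is a minimal sketch of a classical one-sided CUSUM recursion with a constant threshold. It is a generic textbook form, not the score-based statistic or any of the five thresholds studied in the paper; the reference drift, the threshold value and the simulated series are assumptions.

```python
import random


def cusum_alarm(xs, mu0=0.0, sigma=1.0, drift=0.5, threshold=10.0):
    """Return the index of the first alarm, or None if no alarm is raised.

    The statistic accumulates standardized deviations from the pre-change
    mean mu0, minus a drift allowance, and is reset to zero when negative.
    """
    w = 0.0
    for t, x in enumerate(xs):
        w = max(0.0, w + (x - mu0) / sigma - drift)
        if w > threshold:
            return t
    return None


def shifted_series(n=400, change=200, shift=1.5, seed=0):
    """Hypothetical Gaussian series whose mean jumps by `shift` at `change`."""
    rng = random.Random(seed)
    return [rng.gauss(shift if t >= change else 0.0, 1.0) for t in range(n)]


alarm = cusum_alarm(shifted_series())
```

Raising the threshold lengthens the mean time between false alarms at the cost of a longer detection delay, which is the trade-off that MTBFA and ADD quantify.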

8.4 Statistical learning and application in health

Participants: Ulysse Herbach, Sandie Ferrigno, Anne Gégout-Petit, Aurélie Muller-Gueudin, Pierre Vallois, Benoît Lalloué, Jean-Marie Monnez, Nicolas Thorr.

8.4.1 Estimation of reference curves for fetal weight

In epidemiology, we are working with INSERM to study fetal development in the last two trimesters of pregnancy. Reference or standard curves are required in this kind of biomedical problem. Values which lie outside the limits of these reference curves may indicate the presence of a disorder. Data are from the French EDEN mother-child cohort (INSERM), a study investigating the prenatal and early postnatal determinants of child health and development. A total of 2002 pregnant women were recruited before 24 weeks of amenorrhoea in two maternity clinics from middle-sized French cities (Nancy and Poitiers). From May 2003 to September 2006, 1899 newborns were then included. The main outcomes of interest are fetal (via ultrasound) and postnatal growth, adiposity development, respiratory health, atopy, behaviour and bone, cognitive and motor development. We are studying fetal weight as a function of gestational age in the second and third trimesters of pregnancy. Classical empirical and parametric methods, such as polynomial regression, are first used to construct these curves. Polynomial regression is one of the most common parametric approaches for modelling growth data, especially during the prenatal period. However, some of these methods require strong assumptions. We therefore propose to work with the semi-parametric LMS method, which modifies the response variable (fetal weight) with a Box-Cox transformation. A first article detailing these methodologies applied to the data is being written.
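For reference, in its standard formulation (which may differ in detail from the variant used in this work), the LMS method summarizes the distribution of the measurement $Y$ at age $t$ by three smooth curves: the Box-Cox power $L(t)$, the median $M(t)$ and the coefficient of variation $S(t)$. Assuming the transformed variable is standard normal, the z-score of an observation and the reference curve of level $\alpha$ are

$$
Z = \frac{\left(Y / M(t)\right)^{L(t)} - 1}{L(t)\, S(t)} \quad \text{if } L(t) \neq 0, \qquad Z = \frac{\ln\left(Y / M(t)\right)}{S(t)} \quad \text{if } L(t) = 0,
$$

$$
C_{\alpha}(t) = M(t)\,\bigl(1 + L(t)\, S(t)\, z_{\alpha}\bigr)^{1 / L(t)} \quad \text{if } L(t) \neq 0,
$$

where $z_{\alpha}$ denotes the standard normal quantile of level $\alpha$.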

Alternative nonparametric methods such as Nadaraya-Watson kernel estimation, local polynomial estimation, B-splines or cubic splines are also developed in this context to construct these curves. The practical implementation of these methods required working on the smoothing parameters or the choice of knots for the different types of nonparametric estimation. In particular, an optimal choice of these parameters has been proposed. A first version of an R package has then been developed as a tool to construct nonparametric reference curves. It should be submitted to CRAN very soon. In addition, a graphical user interface (GUI) intended for practitioners has been developed to allow intuitive visualization of the results given by the package.

Joint work with Myriam Maumy-Bertrand (IRMA, Université de Strasbourg) and INSERM.
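Among the nonparametric estimators listed above, the Nadaraya-Watson estimator is the simplest to write down: it is a kernel-weighted average of the responses. The following sketch uses a Gaussian kernel and a fixed bandwidth chosen by hand; it is written in Python for illustration only and is unrelated to the R package under development.

```python
import math


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Gaussian-kernel Nadaraya-Watson estimate of E[Y | X = x0]."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("bandwidth too small: all kernel weights underflow")
    return sum(w * y for w, y in zip(weights, ys)) / total


# Synthetic check on a regular grid: the estimate at the center of a
# symmetric design reproduces a linear trend exactly.
xs = list(range(11))
ys = [2 * x for x in xs]
estimate = nadaraya_watson(5, xs, ys, bandwidth=1.0)
```

The bandwidth plays the role of the smoothing parameter discussed above: a small value follows the data closely, while a large value produces a smoother curve.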

8.4.2 Construction of parsimonious event risk scores by an ensemble method. An illustration for short-term predictions in chronic heart failure patients from the GISSI-HF trial

Submitted in December 2020 27.

Heart failure (HF) is a major cause of mortality and morbidity worldwide, for which many predictive scores have been defined. Selecting which explanatory variables to include in a given score is a common difficulty, as a balance must be found between statistical fit and practical application. This article presents a methodology for constructing parsimonious event scores combining a stepwise selection of variables with ensemble scores obtained by aggregation of several scores, using several classifiers, bootstrap samples and various modalities of random selection of variables. The stepwise selection allows constructing a succession of scores, with the practitioner able to choose which score best fits his or her needs. The methods proposed herein can be reproduced on any set of variables as long as the training dataset comprises a sufficient number of cases. Three methods were compared in an application to construct parsimonious short-term scores in chronic HF patients. The working sample consisted of 11,411 patient-visit dyads from the GISSI-HF database, with 5,595 events and 5,816 non-events. Sixty-two candidate explanatory variables were studied. Focusing on the fastest method, four scores were constructed, yielding out-of-bag AUCs ranging from 0.81 (26 variables) to 0.76 (2 variables). These results are slightly better than those obtained by other scores reported in the literature using a similar number of variables.

In collaboration with E. Albuisson and D. Lucci.

8.4.3 Modeling and estimation of circulating tumor DNA (ctDNA) dynamics for detecting resistance to targeted therapies

Continuation of the ITMO Cancer project, supervised by Nicolas Champagnat, concerning the modeling of circulating tumor DNA (ctDNA) to detect the appearance of resistance to targeted therapies (personalized medicine). After a phase of investigation of possible scenarios in collaboration with Alexandre Harlé of the Institute of Cancerology of Lorraine (ICL), a final model was selected. Based on a mathematical analysis, the members of the project then designed a statistical inference algorithm (learning the parameters of the model, including the genealogical tree of mutations for each patient), which is intended to be validated on real data currently being acquired at the Nancy CHRU. The general idea is to exploit a “variational principle” that makes it possible to explore the very large discrete space of genealogical trees through a “pivot” space of continuous parameters that are easy to optimize and reasonable in number. An article detailing the model and its inference is currently being written.

In collaboration with N. Champagnat and C. Fritsch.

8.4.4 A statistical methodology to select covariates in high-dimensional data under dependence. Application to the classification of genetic profiles in oncology

We propose a new methodology for selecting and ranking covariates associated with a variable of interest in a context of high-dimensional data under dependence with few observations. The methodology successively intertwines the clustering of covariates, decorrelation of covariates using Factor Latent Analysis, selection using aggregation of adapted methods and finally ranking. A simulation study shows the interest of the decorrelation inside the different clusters of covariates. We first apply our method to transcriptomic data of 37 patients with advanced non-small-cell lung cancer who have received chemotherapy, to select the transcriptomic covariates that explain the survival outcome of the treatment. Secondly, we apply our method to 79 breast tumor samples to define patient profiles for a new metastatic biomarker and associated gene network in order to personalize the treatments. This work is published in 2 and is implemented in the R package ‘ARMADA’.

In collaboration with T. Boukhobza and H. Dumond from CRAN and B. Bastien from the biopharmaceutical company Transgene.

8.4.5 Project linked with the COVID-19 pandemic

Pierre Vallois is the scientific coordinator of the seroprevalence study COVAL Nancy held in Nancy in July 2020 in collaboration with CHRU de Nancy (CIC épidémiologie clinique and Laboratoire de Virologie).

Background. The World Health Organisation recommends monitoring the circulation of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). We aimed to estimate anti–SARS-CoV-2 total immunoglobulin (IgT) antibody seroprevalence and describe symptom profiles and in vitro seroneutralization in Nancy, France, in spring 2020.

Methods. Individuals were randomly sampled from electoral lists and invited with household members over 5 years old to be tested for anti–SARS-CoV-2 (IgT, i.e. IgA/IgG/IgM) antibodies by ELISA (Bio-rad). Serum samples were classified according to seroneutralization activity 50 % (NT50) on Vero CCL-81 cells. Age- and sex-adjusted seroprevalence was estimated. Subgroups were compared by chi-square or Fisher exact test and logistic regression.

Results. Among 2006 individuals, 43 were SARS-CoV-2–positive; the raw seroprevalence was 2.1 % (95 % confidence interval 1.5 to 2.9), with adjusted metropolitan and national standardized seroprevalence 2.5 % (1.8 to 3.3) and 2.3 % (1.7 to 3.1). Seroprevalence was highest for 20- to 34-year-old participants (4.7 % [2.3 to 8.4]), higher within than outside socially deprived areas (2.5 % vs 1 %, $p=0.02$) and higher with than without intra-family infection ($p<10^{-6}$). Moreover, 25 % (23 to 27) of participants presented at least one COVID-19 symptom associated with SARS-CoV-2 positivity ($p<10^{-13}$), with anosmia or ageusia highly discriminant (odds ratio 27.8 [13.9 to 54.5]), associated with dyspnea and fever. Among the SARS-CoV-2-positive individuals, 16.3 % (6.8 to 30.7) were asymptomatic. For 31 of these individuals, positive seroneutralization was demonstrated in vitro.

Conclusions. In this population of very low anti-SARS-CoV-2 antibody seroprevalence, a beneficial effect of the lockdown can be assumed, with frequent SARS-CoV-2 seroneutralization among IgT-positive patients.
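The raw seroprevalence reported in the Results can be checked with a short computation. The study does not state here which interval method was used, so the Wilson score interval below is an assumption, and its bounds may differ slightly from the published ones after rounding.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Proportion and Wilson score interval (default: 95 % level)."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, center - half, center + half


# 43 positive individuals among 2006 tested, as reported in the Results.
p, low, high = wilson_interval(43, 2006)
```

The point estimate `p` rounds to the 2.1 % given above.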

9 Bilateral contracts and grants with industry

9.1 Bilateral contracts with industry

- R. Azaïs, A. Gégout-Petit and F. Greciet collaborated with SAFRAN Aircraft Engines (through a 2016-2019 contract). SAFRAN Aircraft Engines designs and produces aircraft engines. For the design of parts, they have to understand the mechanism of crack propagation under different conditions. BIGS models crack propagation with Piecewise Deterministic Markov Processes (PDMP).

- B. Scherrer collaborates with Google Brain on reinforcement learning in the framework of the PhD thesis of Nino Vieillard.

10 Partnerships and cooperations

10.1 International initiatives

10.1.1 Participation in other international programs

In Fall 2020, Bruno Scherrer was invited for 4 months to Berkeley to participate in the Simons Institute program on the Theory of Reinforcement Learning. Due to the Covid constraints, the semester was eventually held online.

10.2 International research visitors

10.2.1 Visits of international scientists

Juhyun Park (Lancaster University) visited Nancy for one week in the framework of her collaboration with A. Gégout-Petit on statistical tests for paired distributions.

10.3 National initiatives

FHU CARTAGE (Fédération Hospitalo Universitaire Cardial and ARTerial AGEing ; leader : Pr Athanase Benetos), Jean-Marie Monnez, Benoît Lalloué, Anne Gégout-Petit.
RHU Fight HF (Fighting Heart Failure; leader: Pr Patrick Rossignol), located at the University Hospital of Nancy, Jean-Marie Monnez, Benoît Lalloué.
Project "Handle your heart", team responsible for the creation of a drug prescription support software for the treatment of heart failure, head: Jean-Marie Monnez.
A. Gégout-Petit, N. Sahki, S. Mézières are involved in the learning aspect of the clinical protocol "EOLEVAL" with Assistance Publique des Hopitaux de Paris (APHP).
"ITMO Physics, mathematics applied to Cancer" (2017-2019): "Modeling ctDNA dynamics for detecting targeted therapy", Funding organisms: ITMO Cancer, ITMO Technologies pour la santé de l’alliance nationale pour les sciences de la vie et de la santé (AVIESAN), INCa, Leader: N. Champagnat (Inria TOSCA), Participants: A. Gégout-Petit, A. Muller-Gueudin, P. Vallois, U. Herbach.
PEPS AMIES (2019-2020), Etude Biométrique en foetopathologie et développement de l'enfant, Collaboration between Institut Elie Cartan and the CRESS INSERM, S. Ferrigno.
Modular, multivalent and multiplexed tools for dual molecular imaging (2017-2020), Funding organism: ANR, Leader: B Kuhnast (CEA). Participant: T. Bastogne.
Sophie Mézières belongs to GDR 720 ISIS, Funding organism: CNRS, leader: Laure Blanc-Féraud.

10.4 Regional initiatives

CHRU de Nancy. We have good collaborations with several researchers from CHRU de Nancy. We are involved in the telomeres research axis of LUE Impact GEENAGE.
CHRU de Nancy. Joint initiative for the SARS-CoV-2 seroprevalence study COVAL Nancy with CIC épidémiologie. https://clinicaltrials.gov

11 Dissemination

11.1 Promoting scientific activities

11.1.1 Journal

Ulysse Herbach was a guest editor for the journal “Mathematical Biosciences and Engineering” (special edition “Cells as dynamical systems”).

11.1.2 Invited talks

Anne Gégout-Petit was invited to give a plenary talk at “Journées de Statistique”, Nice, France.
Ulysse Herbach was invited to give a plenary talk at the conference “Interplay between Oncology, Mathematics and Numerics”, Paris, France.

11.1.3 Research administration

Anne Gégout-Petit has been the head of “Institut Élie Cartan de Lorraine” (mathematics laboratory of Université de Lorraine) since September 1st.

11.2 Teaching - Supervision - Juries

11.2.1 Teaching

Except for Bruno Scherrer and Ulysse Herbach, BIGS members have teaching obligations at Université de Lorraine and teach at least 192 hours each year. They teach probability and statistics at different levels (Licence, Master, Engineering school). Many of them have pedagogical responsibilities.

A. Gégout-Petit: Head of the Master 2 "Ingénierie Mathématique pour la science des données (Mathematical Engineering for data science)", Université de Lorraine
T. Bastogne is in charge of the research master program "Santé Numérique et Imagerie Médicale" with the Faculty of Medicine, Université de Lorraine, France
Master: S. Ferrigno, Experimental designs, 4.5h, M1, fourth year of EEIGM, Université de Lorraine, France
Master: S. Ferrigno, Data analyzing and mining, 63h, M2, third year of Ecole des Mines, Université de Lorraine, France
Master: S. Ferrigno, Modeling and forecasting, 43h, M1, second year of Ecole des Mines, Université de Lorraine, France
Master: S. Ferrigno, Training projects, 18h, M1/M2, second and third year of Ecole des Mines, Université de Lorraine, France
Master: A. Muller-Gueudin, Probability and Statistics, 160h, second year of ENSEM and ENSAIA, University of Lorraine, France.
Master: A. Muller-Gueudin, Scientific calculation with Matlab, 20h, second year of ENSAIA, University of Lorraine, France.
Master: A. Gégout-Petit, Statistics, modeling, 15h, future teacher, Université de Lorraine, France
Master: A. Gégout-Petit, Statistics, modeling, data analysis, 80h, master in applied mathematics, Université de Lorraine, France
Master: S. Wantz-Mézières, Learning and analysis of medical data, 36h, with J.M. Moureaux, Master SNIM, Université de Lorraine, France
Licence: S. Wantz-Mézières, Applied mathematics for management, financial mathematics, Probability and Statistics, 160h, I.U.T. (L1/L2/L3)
Licence: S. Wantz-Mézières, Probability, 100h, first year in Telecom Nancy engineering school (initial and apprenticeship cursus)
Licence: A. Muller-Gueudin, Statistics, 60h, first year of ENSAIA, University of Lorraine, France.
Licence: S. Ferrigno, Descriptive and inferential statistics, 60h, L2, second year of EEIGM, Université de Lorraine, France
Licence: S. Ferrigno, Statistical modeling, 60h, L2, second year of EEIGM, Université de Lorraine, France
Licence: S. Ferrigno, Mathematical and computational tools, 20h, L3, third year of EEIGM, Université de Lorraine, France
Licence: S. Ferrigno, Training projects, 20h, L1/L3, first, second and third year of EEIGM, Université de Lorraine, France

11.2.2 Supervision

Defended PhD thesis

PhD: Florine Greciet, "Modèles markoviens déterministes par morceaux cachés pour la propagation de fissures", grant CIFRE SAFRAN AIRCRAFT ENGINES, Advisors: R. Azaïs, A. Gégout-Petit, Université de Lorraine, defended in January 2020.

PhD thesis

PhD: Pauline Guyot, "Modélisation et Simulation de l’Electrocardiogramme d’un Patient Numérique", Grant : CIFRE-Cybernano. Advisors: T. Bastogne, E. H. Djermoune.
PhD: Nassim Sahki, "Détection de rupture dans des signaux multivariés pour la prédiction d’événement redouté à partir de paramètres physiologiques recueillis par capteurs connectés après greffe pulmonaire", grant Inria-Cordis. Advisors: A. Gégout-Petit, S. Wantz-Mézières, M. d'Ortho.
PhD: Nino Vieillard, "Deep Reinforcement Learning", CIFRE grant with Google Brain Paris. Advisors: B. Scherrer, M. Geist.

Post-doctoral positions

Benoît Lalloué, contract research engineer for two years, RHU Fight HF, supervised by Jean-Marie Monnez.
Postdoc: Emma Horton, Telomere Modelling, grant LUE GEENAGE. Advisors: A. Gégout-Petit, D. Villemonais. Emma was hired as an Inria researcher (CR) at Bordeaux Sud-Ouest (ASTRAL team).

Other

Master: all BIGS members regularly supervise projects and internships of master IMOI students.
Engineering school: all BIGS members regularly supervise projects of “École des Mines”, ENSEM, EEIGM or Télécom-Nancy students.

11.2.3 Juries

Anne Gégout-Petit was a reviewer and jury member for the PhD defense of Titin Agustin NENGSIH, Strasbourg University, March 16th.
Anne Gégout-Petit was a reviewer and jury member for the HDR defense of Maud Delattre, Paris-Saclay University, November 6th.
Anne Gégout-Petit is a member of the “Jury du prix de thèse AMIES”.
Bruno Scherrer was a jury member for the PhD defense of Matthieu Guillot, G-SCOP lab, Grenoble INP, July 3rd.
Bruno Scherrer was a jury member for the PhD defense of Rituraj Kaushik, July 23rd.

11.3 Popularization

11.3.1 Education

Sandie Ferrigno: Advisor of a group of students (EEIGM), "La main à la Pâte" project, elementary schools, Nancy, January-June 2020.
Sandie Ferrigno: Advisor of a group of students (EEIGM), "Energies renouvelables", "La main à la Pâte" project, Institut médico-éducatif (IME), Commercy, January 2020.
Sandie Ferrigno: Advisor of a group of students (EEIGM), "L'Astronomie", Cgénial project, Collège Paul Verlaine, Malzéville, January 2020.
Sandie Ferrigno: Advisor of a group of students (EEIGM), "Le Chocolat", Cgénial project, Collège de la Craffe, Nancy, January 2020.

11.3.2 Interventions

Sophie Wantz-Mézières took part in the organization of a thematic and multidisciplinary week “Neurosciences, Neuro-oncologie et Numérique” for students from Télécom-Nancy and Faculté de Médecine de Nancy, January 2020.
Bruno Scherrer made detailed simulations of the reform for the retirement system that has been considered by Philippe's government in France 28.

12 Scientific production

12.1 Publications of the year

International journals

1 Jean-Baptiste Barbry, Anne-Sophie Poinsard, Thierry Bastogne and Olivier Balland. Short-term effects of ocular 2% dorzolamide, 0.5% timolol or 0.005% latanoprost on the anterior segment architecture in healthy cats: a prospective study. Open Veterinary Journal, 2020.
2 Bérangère Bastien, Taha Boukhobza, Hélène Dumond, Anne Gégout-Petit, Aurélie Muller-Gueudin and Charlène Thiébaut. A statistical methodology to select covariates in high-dimensional data under dependence. Application to the classification of genetic profiles in oncology. Journal of Applied Statistics, 2021, 23.
3 Aurélien Buessler, Tahar Chouihed, Kévin Duarte, Adrien Bassand, Matthieu Huot-Marchand, Yannick Gottwalles, Alice Pénine, Elies André, Lionel Nace, Déborah Jaeger, Masatake Kobayashi, Stefano Coiro, Patrick Rossignol and Nicolas Girerd. Accuracy of Several Lung Ultrasound Methods for the Diagnosis of Acute Heart Failure in the ED: A Multicenter Prospective Study. Chest 157(1), January 2020, 99-110.
4 Marie Ferrua, Etienne Minvielle, Aude Fourcade, Benoît Lalloué, Claude Sicotte, Mario Di Palma and Olivier Mir. How to Design a Remote Patient Monitoring System? A French Case Study. BMC Health Services Research 20(1), December 2020.
5 Anne Gégout-Petit, Aurélie Gueudin-Muller and Clémence Karmann. The revisited knockoffs method for variable selection in $L_1$-penalized regressions. Communications in Statistics - Simulation and Computation, July 2020.
6 Pauline Guyot, El-Hadi Djermoune, Bruno Chenuel and Thierry Bastogne. A signal demodulation-based method for the early detection of Cheyne-Stokes respiration. PLoS ONE 15(3), March 2020, e0221191.
7 Benoît Lalloué, Jean-Marie Monnez and Eliane Albuisson. Streaming constrained binary logistic regression with online standardized data. Journal of Applied Statistics, 2021.
8 Jean-Marie Monnez and Abderrahman Skiredj. Widening the scope of an eigenvector stochastic approximation process and application to streaming PCA and related methods. Journal of Multivariate Analysis 182, March 2021, 19.
9 Tiphaine Obara, Marie Blonski, Cyril Brzenczek, Sophie Mézières, Yann Gaudeau, Celso Pouget, Guillaume Gauchotte, Antoine Verger, Guillaume Vogin, Jean-Marie Moureaux, Hugues Duffau, Fabien Rech and Luc Taillandier. Adult diffuse low-grade gliomas: 35-year experience at the Nancy France neurooncology unit. Frontiers in Oncology 10, October 2020, 574679.
10 Patrick Rossignol, Kévin Duarte, Nicolas Girerd, Moez Karoui, John J.V. McMurray, Karl Swedberg, Dirk Veldhuisen, Stuart Pocock, Kenneth Dickstein, Faiez Zannad and Bertram Pitt. Cardiovascular risk associated with serum potassium in the context of mineralocorticoid receptor antagonist use in patients with heart failure and left ventricular dysfunction. European Journal of Heart Failure, January 2020.
11 Nassim Sahki, Anne Gégout-Petit and Sophie Wantz-Mézières. Performance study of change-point detection thresholds for cumulative sum statistic in a sequential context. Quality and Reliability Engineering International 1-21, July 2020, 21.

International peer-reviewed conferences

12 Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Rémi Munos and Matthieu Geist. Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning. NeurIPS - 34th Conference on Neural Information Processing Systems, Vancouver / Online, Canada, December 2020.
13 Nino Vieillard, Bruno Scherrer, Olivier Pietquin and Matthieu Geist. Momentum in Reinforcement Learning. AISTATS 2020 - 23rd International Conference on Artificial Intelligence and Statistics, PMLR 108, Palermo / Virtual, Italy, 2020.

Conferences without proceedings

14 Levy Batista, Mathieu Milhem, Thierry Bastogne, Fabien Clanché, Gabin Personeni, Jean-Philippe Jehl and Gérôme Gauchard. A data-driven classification solution for the timed-up and go test in risk falling assessment. EMBC 2020 - 42nd Engineering in Medicine and Biology Conference, Montréal, Canada, July 2020.
15 Cyril Brzenczek, Sophie Wantz-Mézières, Yann Gaudeau, Marie Blonski, Fabien Rech, Tiphaine Obara, Jean-Marie Moureaux and Luc Taillandier. An original MRI-based method to quantify the diffuse low-grade glioma brain infiltration. 10th International Conference on Image Processing Theory, Tools and Applications, IPTA’20, Paris, France, November 2020.
16 Jeanne Deleforterie, Lucie Hassler and Thierry Bastogne. A dendrogram clustering of lipid nanoparticles. 15th annual event of the ETPN - European Technology Platform on Nanomedicine, ETPN2020, Heraklion, Greece, October 2020.
17 Lucie Hassler and Thierry Bastogne. Approche bayésienne du Quality-by-Design appliquée à un bioprocédé d’extraction de principe actif. 5th Bioproduction Congress, Lyon, France, September 2020.
18 Yaël Kolasa, Thierry Bastogne, Jean-Philippe Georges and Sylvain Kubler. Quality-by-design-engineered pBFT consensus configuration for medical device development. EMBC 2020 - 42nd Engineering in Medicine and Biology Conference, Montreal, Canada, July 2020.
19 Yael Kolasa, Eliott Gandiole and Thierry Bastogne. Quality-by-design development of a patient mobility e-monitoring system. 2nd EAI International Conference on Wearables in Healthcare, EAI HealthWear 2020, Virtual, France, 2020.
20 Benoît Lalloué, Jean-Marie Monnez and Eliane Albuisson. Convergence d'un score d'ensemble en ligne : étude empirique. 52e Journées de Statistique, Nice, France, July 2020. https://jds2020.sciencesconf.org/

Doctoral dissertations and habilitation theses

21 Florine Greciet. Piecewise polynomial regression for crack propagation. Université de Lorraine, January 2020.

Reports & preprints

22 Romain Azaïs, Sandie Ferrigno and Marie-José Martinez. cvmgof: an R package for Cramér-von Mises goodness-of-fit tests in regression models. January 2021.
23 Thierry Bastogne. Supplementary material iQbD: a TRL-indexed Quality-by-Design Paradigm for Medical Device Development. September 2020.
24 Nan Papili Gao, Olivier Gandrillon, András Páldi, Ulysse Herbach and Rudiyanto Gunawan. Universality of cell differentiation trajectories revealed by a reconstruction of transcriptional uncertainty landscapes from single-cell transcriptomic data. February 2021.
25 Benoît Lalloué, Jean-Marie Monnez and Eliane Albuisson. Construction and update of an online ensemble score involving linear discriminant analysis and logistic regression. February 2021.
26 Benoît Lalloué and Jean-Marie Monnez. Ensemble methods and online learning for creation and update of prognostic scores in HF patients. November 2020.
27 Benoît Lalloué, Jean-Marie Monnez, Donata Lucci and Eliane Albuisson. Construction of parsimonious event risk scores by an ensemble method. An illustration for short-term predictions in chronic heart failure patients from the GISSI-HF trial. December 2020.
28 Bruno Scherrer. Simulations de carrières et retraites à points dans 3 cadres macro-économiques: modèle du gouvernement Philippe (âge-pivot bloqué), modèle du gouvernement Philippe corrigé (âge-pivot glissant), modèle Destinie2 (avec revalorisation de la fonction publique). INRIA, March 2020.

12.2 Cited publications

29 articleRomainR. Azaïs. A recursive nonparametric estimator for the transition kernel of a piecewise-deterministic Markov processESAIM: Probability and Statistics182014, 726--749
back to text
30 articleRomainR. Azaïs, FrançoisF. Dufour and AnneA. Gégout-Petit. Non-Parametric Estimation of the Conditional Distribution of the Interjumping Times for Piecewise-Deterministic Markov ProcessesScandinavian Journal of Statistics414December 2014, 950--969
HAL DOI back to text
31 inproceedingsRomainR. Azaïs, FrançoisF. Dufour and AnneA. Gégout-Petit. Nonparametric estimation of the jump rate for non-homogeneous marked renewal processesAnnales de l'Institut Henri Poincaré, Probabilités et Statistiques494Institut Henri Poincaré2013, 1204--1231
back to text
32 article RomainR. Azaïs and AurélieA. Muller-Gueudin. Optimal choice among a class of nonparametric estimators of the jump rate for piecewise-deterministic Markov processes Electronic journal of statistics 2016
HAL back to text
33 incollectionJ. M.J. Bardet, G. Lang, G. Oppenheim, A. Philippe, S. Stoev and M.S.M. Taqqu. Semi-parametric estimation of the long-range dependence parameter: a surveyTheory and applications of long-range dependenceBirkhauser Boston2003, 557-577
back to text
34 articleThierryT. Bastogne, SophieS. Mézières-Wantz, NacimN. Ramdani, PierreP. Vallois and MurielM. Barberi-Heyob. Identification of pharmacokinetics models in the presence of timing noiseEur. J. Control1422008, 149--157URL: http://dx.doi.org/10.3166/ejc.14.149-157
DOI back to text
35 articleThierryT. Bastogne, AdelineA. Samson, PierreP. Vallois, SS. Wantz-Mézières, SophieS. Pinel, DeniseD. Bechet and MurielM. Barberi-Heyob. Phenomenological modeling of tumor diameter growth based on a mixed effects modelJournal of theoretical biology26232010, 544--552
back to text
36 book D.P.D. Bertsekas and J.N.J. Tsitsiklis. Neurodynamic Programming Athena Scientific 1996
back to text
37 articleHermineH. Biermé, CélineC. Lacaux and Hans-PeterH.-P. Scheffler. Multi-operator Scaling Random FieldsStochastic Processes and their Applications12111MAP5 2011-012011, 2642-2677
HAL DOI back to text
38 articleHervéH. Cardot, PeggyP. Cénac and Jean-MarieJ.-M. Monnez. A fast and recursive algorithm for clustering large datasets with k-mediansComputational Statistics & Data Analysis5662012, 1434--1449
back to text
39 articleJ. F.J. Coeurjolly. Simulation and identification of the fractional brownian motion: a bibliographical and comparative studyJournal of Statistical Software52000, 1--53
back to text
40 article M. H. A. Davis. Piecewise-deterministic Markov processes: A general class of non-diffusion stochastic models. Journal of the Royal Statistical Society. Series B (Methodological), 1984, 353-388.
41 article A. Deya and S. Tindel. Rough Volterra equations. I. The algebraic integration setting. Stoch. Dyn. 9(3), 2009, 437-477. URL: http://dx.doi.org/10.1142/S0219493709002737
42 article M. Doumic, M. Hoffmann, N. Krell and L. Robert. Statistical estimation of a growth-fragmentation model observed on a genealogical tree. Bernoulli 21(3), 2015, 1760-1799.
43 article S. Ferrigno and G. Ducharme. Un test d'adéquation global pour la fonction de répartition conditionnelle. C. R. Math. Acad. Sci. Paris 341(5), 2005, 313-316. URL: http://dx.doi.org/10.1016/j.crma.2005.07.003
44 article S. Ferrigno, M. Maumy-Bertrand and A. Muller-Gueudin. Uniform law of the logarithm for the local linear estimator of the conditional distribution function. C. R. Math. Acad. Sci. Paris 348(17-18), 2010, 1015-1019. URL: http://dx.doi.org/10.1016/j.crma.2010.08.003
45 article J. Friedman, T. Hastie and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3), 2008, 432-441.
46 article C. Giraud, S. Huet and N. Verzelen. Graph selection with GGMselect. Statistical Applications in Genetics and Molecular Biology 11(3), 2012.
47 inproceedings T. D. Hansen and U. Zwick. Lower Bounds for Howard's Algorithm for Finding Minimum Mean-Cost Cycles. ISAAC (1), 2010, 415-426.
48 article S. Herrmann and P. Vallois. From persistent random walk to the telegraph noise. Stoch. Dyn. 10(2), 2010, 161-196. URL: http://dx.doi.org/10.1142/S0219493710002905
49 incollection J. Hu, W.-C. Wu and S. Sastry. Modeling subtilin production in Bacillus subtilis using stochastic hybrid systems. Hybrid Systems: Computation and Control. Springer, 2004, 417-431.
50 article R. Keinj, T. Bastogne and P. Vallois. Multinomial model-based formulations of TCP and NTCP for radiotherapy treatment planning. Journal of Theoretical Biology 279(1), June 2011, 55-62. URL: http://hal.inria.fr/hal-00588935/en
51 book R. Koenker. Quantile regression. Vol. 38. Cambridge University Press, 2005.
52 book Y. A. Kutoyants. Statistical inference for ergodic diffusion processes. Springer Series in Statistics. London: Springer-Verlag London Ltd., 2004, xiv+481.
53 article C. Lacaux. Real Harmonizable Multifractional Lévy Motions. Ann. Inst. Poincaré 40(3), 2004, 259-277.
54 incollection L. Lebart. On the Benzecri's method for computing eigenvectors by stochastic approximation (the case of binary data). Compstat 1974 (Proc. Sympos. Computational Statist., Univ. Vienna, Vienna, 1974). Vienna: Physica Verlag, 1974, 202-211.
55 inproceedings B. Lesner and B. Scherrer. Non-Stationary Approximate Modified Policy Iteration. ICML 2015, Lille, France, July 2015.
56 book T. Lyons and Z. Qian. System control and rough paths. Oxford mathematical monographs. Clarendon Press, 2002. URL: http://books.google.com/books?id=H9fRQNIngZYC
57 article N. Meinshausen and P. Bühlmann. High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 2006, 1436-1462.
58 article J.-M. Monnez. Approximation stochastique en analyse factorielle multiple. Ann. I.S.U.P. 50(3), 2006, 27-45.
59 article J.-M. Monnez. Convergence d'un processus d'approximation stochastique en analyse factorielle. Publ. Inst. Statist. Univ. Paris 38(1), 1994, 37-55.
60 article J.-M. Monnez. Stochastic approximation of the factors of a generalized canonical correlation analysis. Statist. Probab. Lett. 78(14), 2008, 2210-2216. URL: http://dx.doi.org/10.1016/j.spl.2008.01.088
61 article E. A. Nadaraya. On non-parametric estimates of density functions and regression curves. Theory of Probability & Its Applications 10(1), 1965, 186-190.
62 techreport I. Post and Y. Ye. The simplex method is strongly polynomial for deterministic Markov decision processes. arXiv:1208.5083v2, 2012.
63 book M. Puterman. Markov Decision Processes. Wiley, New York, 1994.
64 inproceedings B. Roynette, P. Vallois and M. Yor. Brownian penalisations related to excursion lengths, VII. Annales de l'IHP Probabilités et statistiques 45(2), 2009, 421-452.
65 incollection F. Russo and P. Vallois. Elements of stochastic calculus via regularization. Séminaire de Probabilités XL. Lecture Notes in Math. 1899. Berlin: Springer, 2007, 147-185. URL: http://dx.doi.org/10.1007/978-3-540-71189-6_7
66 article F. Russo and P. Vallois. Stochastic calculus with respect to continuous finite quadratic variation processes. Stochastics: An International Journal of Probability and Stochastic Processes 70(1-2), 2000, 1-40.
67 inproceedings B. Scherrer. Approximate Policy Iteration Schemes: A Comparison. ICML - 31st International Conference on Machine Learning - 2014, Beijing, China, June 2014.
68 article B. Scherrer, M. Ghavamzadeh, V. Gabillon, B. Lesner and M. Geist. Approximate Modified Policy Iteration and its Application to the Game of Tetris. Journal of Machine Learning Research 16 (to appear), 2015, 1629-1676.
69 article B. Scherrer. Improved and Generalized Upper Bounds on the Complexity of Policy Iteration. Mathematics of Operations Research, February 2016. Keywords: Markov decision processes; Dynamic Programming; Analysis of Algorithms.
70 inproceedings B. Scherrer and B. Lesner. On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes. NIPS 2012 - Neural Information Processing Systems, South Lake Tahoe, United States, December 2012.
71 article B. Scherrer. Performance Bounds for Lambda Policy Iteration and Application to the Game of Tetris. Journal of Machine Learning Research 14, January 2013, 1175-1221.
72 article P. Vallois and C. S. Tapiero. Memory-based persistence in a counting random walk process. Phys. A 386(1), 2007, 303-307. URL: http://dx.doi.org/10.1016/j.physa.2007.08.027
73 article P. Vallois. The range of a simple random walk on Z. Advances in Applied Probability, 1996, 1014-1033.
74 misc N. Villa-Vialaneix. An introduction to network inference and mining (accessed 22/07/2015). 2015. URL: http://www.nathalievilla.org/doc/pdf//wikistat-network_compiled.pdf
75 article Y. Ye. The Simplex and Policy-Iteration Methods Are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate. Math. Oper. Res. 36(4), 2011, 593-603.
