BIGS is a joint team of Inria, CNRS and Université Lorraine, via the Institut Élie Cartan (UMR 7502 CNRS-UL), a mathematics laboratory of which Inria is a strong partner. One member of BIGS, T. Bastogne, comes from the Research Center of Automatic Control of Nancy (CRAN), with which BIGS has strong relations in the domain "Health-Biology-Signal". Our research focuses mainly on stochastic modeling and statistics, with a methodological purpose but also with the aim of better understanding biological systems. BIGS is composed of applied mathematicians whose research interests mainly concern probability and statistics. More precisely, our attention is directed at (1) stochastic modeling, (2) estimation and control for stochastic processes, (3) algorithms and estimation for graph data and (4) regression and machine learning. The main objective of BIGS is to exploit these skills in applied mathematics to provide a better understanding of some issues arising in life sciences, with a special focus on (1) tumor growth, (2) photodynamic therapy, (3) genomic data and micro-organism population studies, (4) epidemiology and e-health.
We give here the main lines of our research, which belongs to the domains of probability and statistics. For clarity, we have chosen to structure them in four items. Although this choice is not arbitrary, the boundaries between these items are sometimes fuzzy because each of them deals with modeling and inference and they are all interconnected.
Our aim is to propose relevant stochastic frameworks for the modeling and the understanding of biological systems. Stochastic processes are particularly suitable for this purpose. Among them, Markov chains give a first framework for the modeling of cell populations. Piecewise deterministic processes are non-diffusion processes that are also frequently used in the biological context. Among Markov models, we have developed strong expertise in processes derived from Brownian motion and stochastic differential equations. For instance, knowledge about Brownian or random walk excursions helps to analyse genetic sequences and to develop inference about them. However, nature provides us with many examples of systems in which the observed signal has a given Hölder regularity that does not correspond to the one we might expect from a system driven by ordinary Brownian motion. This situation is commonly handled by noisy equations driven by Gaussian processes such as fractional Brownian motion or fractional fields. The basic aspects of these differential equations are now well understood, mainly thanks to the so-called rough paths tools, but also through the Russo-Vallois integration techniques. The specific issue of Volterra equations driven by fractional Brownian motion, which is central for the problem of subdiffusion within proteins, has also been addressed. Many generalizations (Gaussian or not) of this model have recently been proposed, for some Gaussian locally self-similar fields, for some non-Gaussian models, or for anisotropic models.
We develop inference about stochastic processes that we use for modeling. Control of stochastic processes is also a way to optimise administration (dose, frequency) of therapy.
There are many estimation techniques for diffusion processes or for the coefficients of fractional or multifractional Brownian motion from a set of observations. However, the inference problem for diffusions driven by a fractional Brownian motion is still in its infancy. Our team has good expertise in the inference of the jump rate and the kernel of Piecewise Deterministic Markov Processes (PDMP). However, there are many directions in which to go further. For instance, previous works assumed a complete observation of jumps and mode, which is unrealistic in practice. We tackle the problem of inference for "Hidden PDMP". As an example, in pharmacokinetic modeling, we want to take into account the presence of timing noise and identification from longitudinal data. We have expertise on these subjects, and we have also used mixed models to estimate tumor growth.
We consider the control of stochastic processes within the framework of Markov Decision Processes and their generalization known as multi-player stochastic games, with a particular focus on infinite-horizon problems. In this context, we are interested in the complexity analysis of standard algorithms, as well as the proposal and analysis of approximate numerical schemes for large problems. Regarding complexity, a central topic of research is the analysis of the Policy Iteration algorithm, on which significant progress has been made in recent years, but which is still not fully understood. For large problems, we have long experience in the sensitivity analysis of approximate dynamic programming algorithms for Markov Decision Processes, and we are currently investigating whether and how similar ideas may be adapted to multi-player stochastic games.
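To make the Policy Iteration algorithm mentioned above concrete, here is a minimal sketch for a finite, discounted, infinite-horizon Markov Decision Process. The data layout (P, R), the discount factor and the function name are illustrative assumptions, not taken from the team's software; policy evaluation is done by simple fixed-point sweeps rather than by solving a linear system.

```python
def policy_iteration(P, R, gamma=0.9, eval_tol=1e-10):
    """Policy Iteration for a small finite MDP (illustrative layout).

    P[s][a] is a list of (probability, next_state) pairs,
    R[s][a] is the expected immediate reward of action a in state s.
    """
    n_states = len(P)
    policy = [0] * n_states
    V = [0.0] * n_states

    def q_value(s, a):
        return R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])

    while True:
        # Policy evaluation: iterate the Bellman operator of the current policy.
        while True:
            delta = 0.0
            for s in range(n_states):
                v_new = q_value(s, policy[s])
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < eval_tol:
                break
        # Policy improvement: switch to a strictly better action where one exists.
        stable = True
        for s in range(n_states):
            best = max(range(len(P[s])), key=lambda a: q_value(s, a))
            if q_value(s, best) > q_value(s, policy[s]) + 1e-9:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


# Toy example: staying in state 1 pays 1 per step, everything else pays 0.
P = [[[(1.0, 0)], [(1.0, 1)]],   # state 0: action 0 stays, action 1 moves to state 1
     [[(1.0, 1)], [(1.0, 0)]]]   # state 1: action 0 stays, action 1 moves to state 0
R = [[0.0, 0.0], [1.0, 0.0]]
policy, V = policy_iteration(P, R)
```

On this toy problem the procedure stops after two improvement rounds. The complexity questions discussed above concern how the number of such rounds grows with the size of the problem.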
A graph data structure consists of a set of nodes, together with a set of pairs of these nodes called edges. This type of data is frequently used in biology because it provides a mathematical representation of many concepts such as biological structures and networks of relationships in a population. Some attention has recently been focused in the group on modeling and inference for graph data.
Network inference is the process of making inference about the link between two variables while taking into account the information about other variables. Good introductions to network inference and mining, with many references, are available in the literature. Many methods are available to infer and test edges in Gaussian graphical models. However, when dealing with abundance data, because of zero inflation, we are far from the Gaussian assumption, and we want to develop inference for this case.
Among graphs, trees play a special role because they offer a good model for many biological concepts, from RNA to phylogenetic trees through plant structures. Our research deals with several aspects of tree data. In particular, we work on statistical inference for this type of data under a given stochastic model. We also work on lossy compression of trees via linear directed acyclic graphs. These methods enable us to compute distances between tree data faster than from the original structures and with a high accuracy.
Regression models and machine learning aim at inferring statistical links between a variable of interest and covariates. In biological studies, it is important to develop suitable learning methods, both for standard data and for high-dimensional data (sometimes with few observations) and very massive or online data.
Many methods are available to estimate conditional quantiles and test dependencies. Among them, we have developed nonparametric estimation by local analysis via kernel methods, and we want to study the properties of this estimator in order to derive risk measures such as confidence bands and tests. We also study many other regression models, such as survival analysis and spatio-temporal models with covariates. Among the multiple regression models, we want to develop omnibus tests that examine several assumptions together.
Concerning the analysis of high-dimensional data, our view on the topic relies on the French data analysis school, specifically on Factorial Analysis tools. In this context, stochastic approximation is an essential tool, which allows one to approximate eigenvectors in a stepwise manner. BIGS aims at performing accurate classification or clustering by taking advantage of the possibility of updating the information "online" using stochastic approximation algorithms. We focus on several incremental procedures for regression and data analysis, such as linear and logistic regression and PCA.
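As an illustration of the stepwise approximation of eigenvectors by stochastic approximation, the sketch below uses Oja's rule, a classical scheme chosen here for its simplicity; it is not claimed to be the process studied by the team. The step-size sequence 1/(n + 10), the online centering and all names are assumptions made for the example.

```python
import math
import random


def online_leading_eigenvector(stream, dim):
    """Approximate the first principal direction of a data stream (Oja's rule)."""
    mean = [0.0] * dim
    w = [1.0 / math.sqrt(dim)] * dim            # arbitrary unit starting vector
    for n, x in enumerate(stream, start=1):
        # The unknown expectation is estimated online and used to center the data.
        mean = [m + (xi - m) / n for m, xi in zip(mean, x)]
        xc = [xi - m for xi, m in zip(x, mean)]
        step = 1.0 / (n + 10)                   # decreasing step size (assumed)
        proj = sum(a * b for a, b in zip(xc, w))
        w = [wi + step * proj * ci for wi, ci in zip(w, xc)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        w = [wi / norm for wi in w]             # keep the process normed
    return w


# Synthetic stream: variance 9 along the first axis, 1 along the second.
rng = random.Random(0)
stream = [(5.0 + 3.0 * rng.gauss(0, 1), -2.0 + rng.gauss(0, 1))
          for _ in range(5000)]
w = online_leading_eigenvector(stream, 2)
```

Each observation is used once and then discarded, which is what makes this kind of procedure suitable for data streams. On the synthetic stream the estimate should align, up to sign, with the first coordinate axis.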
We also focus on the biological context of high-throughput bioassays, in which several hundreds or thousands of biological signals are measured for a posteriori analysis. We have to account for the inter-individual variability within the modeling procedure. We aim at developing a new solution based on an ARX (Auto Regressive model with eXternal inputs) model structure, using the EM (Expectation-Maximisation) algorithm for the estimation of the model parameters.
On this topic, we want to propose branching processes to model the appearance of mutations in tumors, through new collaborations with clinicians. The observed process is the "circulating DNA" (ctDNA). The final purpose is to use ctDNA as an early biomarker of resistance to an immunotherapy treatment. This is the aim of the ITMO project. Another topic is the identification of dynamic networks of expression. We continue our work on low-grade gliomas. The ongoing collaboration with Montpellier CHU and a new one with Montreal CRHUM should provide us with more data. We are also initiating interactions with researchers from Montreal LIO to extend the previous work. We still have much modeling work to do to reach our goal of a decision-aid tool for personalised medicine. In the same context, there is a question of clustering analysis of a brain cartography obtained by sensorial simulations during awake surgery.
Despite the 'G' in the name BIGS, genetics is not central in the applications of the team. However, we want to contribute to a better understanding of the correlations between genes through their expression data, and of the genetic bases of drug response and disease. We have contributed to methods for detecting proteomic and transcriptomic variables linked with the outcome of a treatment.
We have much work to do in our ongoing projects in the context of personalized medicine with "CHU Nancy". They deal with biomarker research, the prognostic value of quantitative variables and events, scoring, and adverse events. We also want to develop our expertise in rupture (change-point) detection in a project with APHP for the detection of adverse events earlier than the clinical signs and symptoms. The clinical relevance of predictive analytics is obvious for high-risk patients, such as those with solid organ transplantation or severe chronic respiratory disease. The main challenge is rupture detection in multivariate and heterogeneous signals (for instance daily measures of electrocardiogram (during 30 minutes), body temperature, spirometry parameters, sleep duration, etc.). Other collaborations with clinicians concern foetopathology, and we want to use our work on conditional distribution functions to explain fetal and child growth. We have data from the "Service de foetopathologie et de placentologie" of the "Maternité Régionale Universitaire" (CHU Nancy).
Telomeres are disposable buffers at the ends of chromosomes which are truncated during cell division, so that, over time, the telomere ends become shorter with each cell division. In this way, they are markers of aging. Through a new collaboration with Pr A. Benetos, geriatrician at CHU Nancy, we recently obtained data on the distribution of the length of telomeres from blood cells. With some members of the Inria team TOSCA, we want to work in three connected directions: (1) refine the methodology for the analysis of the available data; (2) propose a dynamical model for the lengths of telomeres and study its mathematical properties (long term behavior, quasi-stationarity, etc.); and (3) use these properties to develop new statistical methods. A postdoctoral position is already planned in the Lorraine Université d'Excellence (LUE) project GEENAGE (managed by CHU Nancy).
BIGS organised the annual meeting of the European Network for Business and Industrial Statistics (ENBIS): 150 participants and 3 days of conference (3-5 September), plus 3 tutorials.
Romain Azaïs and Florian Bouguet edited the book “Statistical Inference for Piecewise-deterministic Markov Processes”. The idea for this book stemmed from a workshop organized in Nancy in the 2016-17 winter. Two chapters have been co-authored by one or more BIGS members.
T. Bastogne created a new start-up specialized in the automatic analysis of cardiac signals, from cells up to patients.
Participants: A. Gégout-Petit, S. Mézières, Y. Petot, P. Vallois
In the framework of the esca illness of vines, we developed different spatial and spatio-temporal models for different purposes: (1) to study the distribution and the dynamics of esca vines in order to tackle the aggregation and the potential spread of the illness; (2) to propose a spatio-temporal model in order to capture the dynamics of cases and measure the effects of environmental covariates. For purpose (2), we developed an autologistic model (centered in a new way) together with estimators of its parameters, showed their good properties, and proposed a way to choose between several neighborhood models. This work is the subject of a preprint.
In the framework of chalara of ash, and through a collaboration with INRA researchers, we have proposed a mechanistic model of propagation whose parameters are estimated by Bayesian estimation. This work is the subject of a communication.
In a collaboration with physicians from Nancy CHRU, we have studied the benefit of using the whole distribution of telomere lengths, rather than the mean that is usually used, to characterise the ageing of a cell. We have shown that the shape of the distribution can be seen as an individual's signature. This work is the subject of an accepted paper.
We analyse the probabilistic features of the Choquet integral with respect to a capacity over a finite set where the entries are random variables. Despite the amount of studies, the question of uncertainty remains under-considered. Such a question is of first importance in many applications and uses.
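For readers unfamiliar with the object, the following minimal sketch computes the discrete Choquet integral of nonnegative values over a finite set with respect to a capacity. The representation of the capacity as a function on frozensets and all names are illustrative assumptions. When the entries are random variables, as considered above, the integral is itself a random variable whose distribution could be explored, for example, by applying this function to simulated draws.

```python
def choquet_integral(values, capacity):
    """Discrete Choquet integral.

    values:   dict mapping each element of the finite set to a nonnegative number
    capacity: function mapping a frozenset of elements to a number in [0, 1],
              monotone, with capacity(empty set) = 0 and capacity(full set) = 1
    """
    order = sorted(values, key=values.get)      # elements by increasing value
    total, previous = 0.0, 0.0
    for i, element in enumerate(order):
        upper_set = frozenset(order[i:])        # elements with value >= current one
        total += (values[element] - previous) * capacity(upper_set)
        previous = values[element]
    return total


values = {"a": 1.0, "b": 2.0, "c": 6.0}
# An additive, uniform capacity gives back the arithmetic mean.
mean_like = choquet_integral(values, lambda s: len(s) / 3)
# A capacity that is 1 only on the full set gives back the minimum.
min_like = choquet_integral(values, lambda s: 1.0 if len(s) == 3 else 0.0)
# A capacity that is 1 on every nonempty set gives back the maximum.
max_like = choquet_integral(values, lambda s: 1.0 if s else 0.0)
```

The three capacities in the example show how the Choquet integral interpolates between familiar aggregation operators.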
In the multifactorial context of modelling for gliomas, we focused our attention on the acquisition of the tumor diameter from clinically collected data. A 3-D reconstruction via an equivalent sphere from multiple contourings of the tumor allows us to characterize its infiltrating phenotype (infiltration rate, direction of infiltration, evolution of morphology over time); this is current work. Our aim is to incorporate this new factor in the modeling already started (to appear in JBHI, beginning 2019).
A study of a brain cartography obtained by sensorial simulations during awake surgery, carried out with the aid of clustering analysis, is in revision.
Participants: R. Azaïs, F. Bouguet, A. Gégout-Petit, F. Greciet, B. Scherrer
Piecewise-deterministic Markov processes form a class of stochastic models with a sizeable scope of applications. Such processes are defined by a deterministic motion punctuated by random jumps at random times, and offer simple yet challenging models to study. The issue of statistical estimation of the parameters ruling the jump mechanism is far from trivial. Responding to new developments in the field as well as to current research interests and needs, the book “Statistical Inference for Piecewise-deterministic Markov Processes”, edited by Romain Azaïs and Florian Bouguet, gathers 7 chapters by different authors on the topic. The idea for this book stemmed from a workshop organized in Nancy in the 2016-17 winter. Two chapters have been co-authored by one or more BIGS members.
Multiple-step lookahead policies have demonstrated high empirical competence in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control. In previous work, multiple-step greedy policies and their use in vanilla Policy Iteration algorithms were proposed and analyzed. We study multiple-step greedy algorithms in more practical setups: we describe and analyze a stochastic approximation variation and general sensitivity analyses to approximations. We also describe a short study on an Anderson acceleration of the fixed point computation involved in Reinforcement Learning. These contributions resulted in one publication in ICML, one in NeurIPS, and two in EWRL (the European Workshop on Reinforcement Learning).
Participants: A. Gégout-Petit, A. Gueudin, C. Karmann
In order to deal with inference for networks of zero-inflated variables, we have developed a new regression model. We consider the problem of variable selection when the response is ordinal, that is, an ordered categorical variable. In particular, we are interested in selecting quantitative explanatory variables linked with the ordinal response variable, and we want to determine which predictors are relevant. In this framework, we choose to use the polytomous ordinal logistic regression model using cumulative logits, which generalizes logistic regression. We then introduce the Lasso estimation of the regression coefficients using the Frank-Wolfe algorithm. To deal with the choice of the penalty parameter, we use the stability selection method and we develop a new method based on the knockoffs idea. This knockoffs method is general and suitable for any regression and, moreover, gives an order of importance of the covariates. Finally, we provide some experimental results to corroborate our method, and we present an application of this regression method to zero-inflated network inference. This work is the subject of a conference presentation and of a preprint submitted to a journal.
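The cumulative-logit model and its Lasso-penalized criterion can be sketched as follows. The parameterization P(Y <= k | x) = sigmoid(alpha_k - beta.x) is one common sign convention and is an assumption here, as are the function names; the Frank-Wolfe optimization, stability selection and knockoffs steps are not shown.

```python
import math


def ordinal_probabilities(x, beta, thresholds):
    """Category probabilities P(Y = k | x), k = 0..K-1, for a cumulative-logit model.

    thresholds holds the K-1 increasing cut points alpha_k.
    """
    eta = sum(b * xi for b, xi in zip(beta, x))
    cumulative = [1.0 / (1.0 + math.exp(-(a - eta))) for a in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)              # P(Y = k) = P(Y <= k) - P(Y <= k-1)
        previous = c
    return probs


def lasso_objective(data, beta, thresholds, penalty):
    """Negative log-likelihood plus an L1 penalty on the regression coefficients."""
    nll = -sum(math.log(ordinal_probabilities(x, beta, thresholds)[y])
               for x, y in data)
    return nll + penalty * sum(abs(b) for b in beta)


probs_low = ordinal_probabilities([0.0, 0.0], [1.0, -2.0], [-1.0, 1.0])
probs_high = ordinal_probabilities([2.0, 0.0], [1.0, -2.0], [-1.0, 1.0])
```

With two thresholds there are three ordered categories. Increasing the linear predictor shifts probability mass towards the highest category, which is the sense in which the model generalizes logistic regression.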
Participants: E. Albuisson, R. Azaïs (Inria, Lyon), T. Bastogne, L. Batista, K. Duarte, S. Ferrigno, A. Gégout-Petit, P. Guyot, J.-M. Monnez, N. Sahki, S. Mézières
In order to detect changes in the health state of lung-transplanted patients, we have begun to work on breakdowns in multivariate physiological signals. Based on the CUSUM statistics, we have used dynamical detection thresholds. A more general talk about statistical learning and connected patients was given at the workshop "Evaluation des objets en santé connectée".
We consider the analysis of cardiomyocyte signals (cardiac cells) for the cardiotoxicity assessment of new pharmaceutical compounds in preclinical assays. The experimental data are either impedance signals measuring the contractility of cardiomyocytes, field potential signals measuring their functionality, or fluorescence signals measuring the activity of some ion channels such as calcium pumps (Ca2+). At this preclinical level, our main contribution is the estimation of important characteristics such as the field potential duration, or the identification of cardiotoxic events such as early afterdepolarization. We have also developed new methods for the analysis of electrocardiograms at patient level, and more precisely for the estimation of parameters such as the RR and QT intervals in long and noisy signals provided by wearable sensors. We also study the efficacy of a new biomarker in radiotherapy. The objective is to compute a score able to predict the risk of radiosensitivity for patients in radiotherapy. We are also developing a new method to characterize the potential interactions between nanoparticles and biological compounds of complex media such as blood. This new method aims at predicting risks related to the biodistribution and toxicity of the nanoparticles.
We present a methodology for constructing a short-term event risk score from an ensemble predictor using bootstrap samples, two different classification rules (logistic regression and linear discriminant analysis for mixed data, continuous or categorical), and random selections of variables in the construction of predictors. We establish a property of linear discriminant analysis for mixed data and define an event risk measure by an odds-ratio. This methodology is applied to heart failure patients on whom biological, clinical and medical history variables were measured, and the results obtained from our data are detailed.
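The basic CUSUM statistic can be sketched as follows for a single signal and an upward shift of the mean. This is the textbook one-sided version with a constant threshold, shown only to fix ideas; the dynamical thresholds and the multivariate setting mentioned above are not reproduced, and the parameter names are assumptions.

```python
def cusum_alarm(signal, target_mean, drift, threshold):
    """One-sided CUSUM: index of the first alarm for an upward shift, or None.

    target_mean: mean of the signal under normal conditions
    drift:       allowance subtracted at each step (tunes the sensitivity)
    threshold:   the alarm is raised when the statistic exceeds this value
    """
    stat = 0.0
    for t, x in enumerate(signal):
        # Accumulate deviations above target_mean + drift; reset at zero.
        stat = max(0.0, stat + (x - target_mean - drift))
        if stat > threshold:
            return t
    return None


# Toy signal: 20 stable points, then a jump of size 2.
signal = [0.0] * 20 + [2.0] * 10
alarm = cusum_alarm(signal, target_mean=0.0, drift=0.5, threshold=4.0)
```

On this toy signal the statistic stays at zero before the jump, then grows by 1.5 per observation, so the alarm is raised at the third observation after the change (index 22).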
The study addresses the problem of sequential least square multidimensional linear regression, particularly in the case of a data stream, using a stochastic approximation process. To avoid the phenomenon of numerical explosion which can be encountered and to reduce the computing time in order to take into account a maximum of arriving data, we propose using a process with online standardized data instead of raw data and the use of several observations per step or all observations until the current step. Herein, we define and study the almost sure convergence of three processes with online standardized data: a classical process with a variable step-size and use of a varying number of observations per step, an averaged process with a constant step-size and use of a varying number of observations per step, and a process with a variable or constant step-size and use of all observations until the current step. Their convergence is obtained under more general assumptions than classical ones. These processes are compared to classical processes on 11 datasets for a fixed total number of observations used and thereafter for a fixed processing time. Analyses indicate that the third-defined process typically yields the best results.
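The idea of using online standardized data can be sketched in the simplest case of one explanatory variable. The sketch below standardizes only the explanatory variable with running moments (Welford's recursion) and uses a constant step size without averaging; it is a simplified illustration under these assumptions, not one of the three processes studied, and all names are hypothetical.

```python
import math
import random


def online_standardized_regression(stream, step=0.1):
    """Stochastic-approximation least squares on an online-standardized covariate.

    stream yields (x, y) pairs one at a time; returns (slope, intercept)
    expressed on the raw scale of x.
    """
    n, mean, m2 = 0, 0.0, 0.0
    theta0, theta1 = 0.0, 0.0
    std = 0.0
    for x, y in stream:
        # Welford's recursion for the running mean and variance of x.
        n += 1
        d = x - mean
        mean += d / n
        m2 += d * (x - mean)
        std = math.sqrt(m2 / n)
        if std == 0.0:
            continue                            # not enough spread yet
        z = (x - mean) / std                    # standardized with current moments
        error = y - (theta0 + theta1 * z)
        theta0 += step * error                  # gradient step on the squared error
        theta1 += step * error * z
    slope = theta1 / std                        # assumes the stream was not constant
    intercept = theta0 - slope * mean
    return slope, intercept


# Noise-free synthetic stream y = 2x + 1 with x on a large raw scale.
rng = random.Random(1)
xs = [rng.uniform(0.0, 10.0) for _ in range(5000)]
slope, intercept = online_standardized_regression((x, 2.0 * x + 1.0) for x in xs)
```

Standardizing keeps the update well conditioned whatever the raw scale of the data, which is one way to avoid the numerical explosion mentioned above. The estimates are mapped back to the raw scale at the end.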
Many articles were devoted to the problem of estimating recursively the eigenvectors and eigenvalues in decreasing order of the expectation of a random matrix using an i.i.d. sample of it. We make the following contributions. The convergence of a normed process is proved under more general assumptions: the random matrices are not supposed i.i.d. and a new data mini-batch or all data until the current step are taken into account at each step without storing them; three types of processes are studied; this is applied to online principal component analysis of a data stream, assuming that data are realizations of a random vector.
In epidemiology, we are working with clinicians to study fetal development in the last two trimesters of pregnancy. We have data from the "Service de foetopathologie et de placentologie" of the "Maternité Régionale Universitaire" (CHU Nancy) and from the EDEN cohort (INSERM). We propose to use non parametric methods of estimation to obtain reference curves of fetus and child growth. In addition, we want to develop a test, based on Z-scores, to detect any slope breaks in the fetal development curves (work in progress).
Bruno Scherrer has done some consulting for EDF. This was a skill transfer activity involving training and consulting on the theory and algorithms of reinforcement learning, for the Research & Development team of EDF led by Lorenzo Audibert. This R&D team wants to apply reinforcement learning to several EDF problems: optimizing the maintenance of uranium rods in the cores of nuclear power plants, optimizing the control of dams, and optimizing load profiles for a network of electric vehicles. Bruno Scherrer's role was to give them the basics of reinforcement learning theory and to help them use the algorithms of the literature. It was a one-shot action, running in 2018, and formalized through a "framework agreement" between Inria and EDF. This contract brings approximately 12,000 euros to the BIGS team (of which 2,000 are for mission expenses).
R. Azaïs, A. Gégout-Petit and F. Greciet collaborated with SAFRAN Aircraft Engines (through a 2016-2019 contract). SAFRAN Aircraft Engines designs and produces aircraft engines. For the design of parts, the company has to understand the mechanism of crack propagation under different conditions. It called on BIGS to model crack propagation with Piecewise Deterministic Markov Processes (PDMP).
GDR 3475 Analyse Multifractale, Funding organism: CNRS, Leader: S. Jaffard (Université Paris-Est), Céline Lacaux
GDR 3477 Géométrie stochastique, Funding organism: CNRS, Leader: P. Calka (Université Rouen), Céline Lacaux
FHU CARTAGE (Fédération Hospitalo Universitaire Cardial and ARTerial AGEing ; leader : Pr Athanase Benetos), Jean-Marie Monnez
RHU Fight HF (Fighting Heart Failure ; leader : Pr Patrick Rossignol), located at the University Hospital of Nancy, Jean-Marie Monnez
Project "Handle your heart", team responsible for the creation of a drug prescription support software for the treatment of heart failure, head: Jean-Marie Monnez
A. Gégout-Petit, N. Sahki, S. Mézières are involved in the learning aspect of the clinical protocol "EOLEVAL" with Assistance Publique des Hopitaux de Paris (APHP)
"ITMO Physics, mathematics applied to Cancer" (2017-2019): "Modeling ctDNA dynamics for detecting targeted therapy", Funding organisms: ITMO Cancer, ITMO Technologies pour la santé de l’alliance nationale pour les sciences de la vie et de la santé (AVIESAN), INCa, Leader: N. Champagnat (Inria TOSCA), Participants: A. Gégout-Petit, A. Muller-Gueudin, P. Vallois.
PEPS AMIES (2018-2019), "Etude Biométrique en foetopathologie et développement de l'enfant" (biometric study in foetopathology and child development), collaboration between Institut Elie Cartan and CRESS INSERM, S. Ferrigno.
Modular, multivalent and multiplexed tools for dual molecular imaging (2017-2020), Funding organism: ANR, Leader: B Kuhnast (CEA). Participant: T. Bastogne.
Sophie Mézières belongs to GDR 720 ISIS, Funding organism: CNRS, leader: Laure Blanc-Féraud.
T. Bastogne participates in the ASCATIM Project (project FEDER), led by N. Tran and J.-P. Jehl (2018-2021).
A. Gégout-Petit is ETC program coordinator of the European Meeting of Statisticians, Palermo, July 2019.
All BIGS members are regular reviewers for journals in probability, statistics and machine learning such as Bernoulli, Scandinavian Journal of Statistics, Stochastics, Journal of Statistical Planning and Inference, Journal of Theoretical Biology, IEEE Trans. Biomedical Eng., Theoretical Biology and Medical Modelling, Royal Society of Chemistry, Signal Processing: Image Communication, Mathematical Biosciences, LIDA, Annals of Applied Probability, Annals of Operations Research and Journal of Machine Learning Research, as well as for conferences such as ICML, World IFAC Congress, FOSBE, ALCOSP...
Anne Gégout-Petit is a member of the board of the European Regional Council of the Bernoulli Society.
Anne Gégout-Petit is a member of the "Bureau du comité des projets", centre Inria Nancy Grand-Est.
With the exception of R. Azaïs and B. Scherrer, BIGS members have teaching obligations at "Université Lorraine" and teach at least 192 hours each year. They teach probability and statistics at different levels (Licence, Master, Engineering school). Many of them have pedagogical responsibilities.
PhD : Clémence Karmann, "Network inference for zero-inflated models", Grant : Inria-Cordis. Advisors: A. Gégout-Petit, A. Muller-Gueudin.
PhD : Florine Greciet, "Modèles markoviens déterministes par morceaux cachés pour la propagation de fissures", grant CIFRE SAFRAN AIRCRAFT ENGINES, Advisors : R. Azaïs, A. Gégout-Petit.
PhD : Kévin Duarte, "Aide à la décision médicale et télémédecine dans le suivi de l'insuffisance cardiaque", Advisors : J.-M. Monnez and E. Albuisson.
PhD : Pauline Guyot, "Modélisation et Simulation de l’Electrocardiogramme d’un Patient Numérique", Grant : CIFRE-Cybernano. Advisors: T. Bastogne, E. H. Djermoune.
PhD: Nassim Sahki, "Détection de rupture dans des signaux multivariés pour la prédiction d’événement redouté à partir de paramètres physiologiques recueillis par capteurs connectés après greffe pulmonaire", grant Inria-Cordis. Advisors: A. Gégout-Petit, S. Mézières, M. d'Ortho.
Benoît Lalloué, contract research engineer for two years, RHU Fight HF, supervised by Jean-Marie Monnez.
Postdoc: Lionel Lenôtre, Telomer Modelling, grant LUE GEENAGE. Advisors: A. Gégout-Petit, D. Villemonais.
Since October 15, 2018, Jean-Marie Monnez supervises the work in RHU Fight.
Master: all BIGS members regularly supervise projects and internships of Master IMOI students.
Engineering school: all BIGS members regularly supervise projects of "Ecole des Mines", ENSEM or EEIGM students.
T. Bastogne participated in the jury of the PhD defense of Julie Kabil, Université Lorraine, January 2018.
A. Gégout-Petit participated in the jury of the PhD defense of Neska El Haouij, Université de Paris-Saclay, July 2018.
A. Gégout-Petit participated in the jury of the PhD defense of Lucien Bacharach, Université de Paris-Saclay, September 2018.
A. Gégout-Petit was reviewer and participated in the jury of the PhD defense of Maxime Redondin, Université Paris-Est, December 2018.
A. Gégout-Petit participated in the jury of the PhD defense of Aurélie Marton, Université Lorraine, December 2018.
A. Gégout-Petit was reviewer and participated in the jury of the HDR defense of Frédéric Bertrand, Université de Strasbourg, December 2018.
J.-M. Monnez was advisor and participated in the jury of the PhD defense of Kévin Duarte, December 2018.
Bruno Scherrer was reviewer and participated in the jury of the PhD defense of Anna Harutyunyan, at Vrije Universiteit, Bruxelles, March 2018.
S. Ferrigno: Advisor of a group of students (EEIGM), "La main à la Pâte" project, elementary schools, Nancy, January-June 2018
S. Ferrigno: Advisor of a group of students (EEIGM), "La main à la Pâte" project, Institut médico-éducatif (IME), Commercy, September-December 2018
S. Ferrigno: Advisor of a group of students (EEIGM), "Utilisation de la technologie dans la médecine" Cgénial project, Collège Paul Verlaine, Malzéville, December 2018-February 2019.
S. Ferrigno: Advisor of a group of students (EEIGM), "Le cartable connecté" Cgénial project, Collège de la Craffe, Nancy, December 2018-February 2019.