BIGS is a joint team of Inria, CNRS and Université de Lorraine, via the Institut Élie Cartan (UMR 7502 CNRS-Inria-UL). Our research mainly focuses on stochastic modeling and statistics, both for methodological purposes and with the aim of better understanding biological systems. BIGS is composed of applied mathematicians whose research interests mainly concern probability and statistics. More precisely, our attention is directed toward (1) stochastic modeling, (2) estimation and control for stochastic processes, (3) algorithms and estimation for graph data and (4) regression and machine learning. The main objective of BIGS is to exploit these skills in applied mathematics to provide a better understanding of some issues arising in the life sciences, with a special focus on (1) tumor growth, (2) photodynamic therapy, (3) genomic data and the study of micro-organism populations, (4) epidemiology and e-health and (5) telomere dynamics. Each of these items is detailed below.
We present here the main lines of our research, which belongs to the domains of probability and statistics. For clarity, we have chosen to structure them into four items. Although this choice is not arbitrary, the boundaries between these items are sometimes fuzzy, because each of them deals with modeling and inference and they are all interconnected.
Our aim is to propose relevant stochastic frameworks for modeling and understanding biological systems. Stochastic processes are particularly suitable for this purpose. Among them, Markov chains provide a first framework for modeling cell populations , . Piecewise deterministic Markov processes (PDMPs) are non-diffusive processes that are also frequently used in the biological context , , , . Among Markov models, we have also developed strong expertise in processes derived from Brownian motion and in stochastic differential equations , , . For instance, knowledge of Brownian or random walk excursions , helps to analyse genetic sequences and to perform inference on them. However, nature provides us with many examples of systems whose observed signal has a given Hölder regularity that does not correspond to the one expected from a system driven by ordinary Brownian motion. This situation is commonly handled by noisy equations driven by Gaussian processes such as fractional Brownian motion (fBm) or, for higher-dimensional parameters, fractional fields. The basic aspects of these differential equations are now well understood, mainly thanks to the so-called rough paths tools , but also through the Russo-Vallois integration techniques . The specific issue of Volterra equations driven by fBm, which is central to the problem of subdiffusion within proteins, is addressed in . Many generalizations (Gaussian or not) of this model have recently been proposed; see for instance for some Gaussian locally self-similar fields, for some non-Gaussian models, and for anisotropic models. Our team has thus contributed , , , , and still contributes , , , , to this theoretical study: Hölder continuity, fractal dimensions, existence and uniqueness results for differential equations, and study of the laws, to quote a few examples.
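The link between the Hurst index of fBm and the regularity of its paths can be made concrete with a minimal Python sketch, assuming only the standard fBm covariance R(s,t) = (s^{2H} + t^{2H} - |t - s|^{2H}) / 2. It simulates fBm exactly by Cholesky factorisation of this covariance, then recovers H from the ratio of mean squared increments at lags 2 and 1, whose expectation is 2^{2H}. All function names are hypothetical, and the O(n^3) Cholesky step is only meant for short paths; this is an illustration, not one of the team's methods.

```python
import math
import random

def fbm_covariance(times, hurst):
    """Covariance of fractional Brownian motion: (s^2H + t^2H - |t - s|^2H) / 2."""
    h2 = 2.0 * hurst
    return [[0.5 * (s ** h2 + t ** h2 - abs(t - s) ** h2) for t in times] for s in times]

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def simulate_fbm(n, hurst, rng):
    """Exact fBm sample at times 1/n, 2/n, ..., 1 (O(n^3): short paths only)."""
    times = [(i + 1) / n for i in range(n)]
    low = cholesky(fbm_covariance(times, hurst))
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]

def estimate_hurst(path):
    """Quadratic variations: E[(lag-2 increment)^2] / E[(lag-1 increment)^2] = 2^(2H)."""
    m1 = sum((path[i + 1] - path[i]) ** 2 for i in range(len(path) - 1)) / (len(path) - 1)
    m2 = sum((path[i + 2] - path[i]) ** 2 for i in range(len(path) - 2)) / (len(path) - 2)
    return 0.5 * math.log2(m2 / m1)
```

For instance, `estimate_hurst(simulate_fbm(256, 0.7, random.Random(0)))` typically returns a value near 0.7; larger Hurst indices yield smoother, more Hölder-regular paths.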
On the other hand, because longitudinal data are observed for each subject in medicine, we must account for subject-specific random effects and choose adapted models such as mixed-effects models , , . In the context of health care and cost-effectiveness analysis, we are also interested in models for aggregating different criteria. For this purpose, we develop research on fuzzy binary measures and the Choquet integral , .
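To illustrate how a Choquet integral aggregates several criteria, here is a minimal Python sketch of the discrete Choquet integral with respect to a capacity (fuzzy measure) mu, assumed monotone with mu(empty set) = 0 and mu(all criteria) = 1. After sorting the scores increasingly, x_(1) <= ... <= x_(n), it computes the sum over i of (x_(i) - x_(i-1)) * mu({criteria scoring at least x_(i)}), with x_(0) = 0. The criterion names, weights and scores are hypothetical, and the team's models (including their fuzzy binary measures) may differ.

```python
from itertools import combinations

def choquet_integral(scores, capacity):
    """Discrete Choquet integral of non-negative `scores` (criterion -> value)
    with respect to `capacity` (frozenset of criteria -> value in [0, 1])."""
    order = sorted(scores, key=scores.get)  # criteria by increasing score
    total, previous = 0.0, 0.0
    for i, criterion in enumerate(order):
        upper_set = frozenset(order[i:])  # criteria whose score is >= the current one
        total += (scores[criterion] - previous) * capacity[upper_set]
        previous = scores[criterion]
    return total

def capacity_from(criteria, set_function):
    """Tabulate a set function on every subset of `criteria`."""
    return {frozenset(s): set_function(frozenset(s))
            for r in range(len(criteria) + 1)
            for s in combinations(criteria, r)}

# Hypothetical example: three criteria scored on [0, 1].
criteria = ["efficacy", "safety", "cost"]
weights = {"efficacy": 0.5, "safety": 0.3, "cost": 0.2}
scores = {"efficacy": 0.8, "safety": 0.4, "cost": 0.6}

additive = capacity_from(criteria, lambda s: sum(weights[c] for c in s))
minimum = capacity_from(criteria, lambda s: 1.0 if len(s) == len(criteria) else 0.0)
maximum = capacity_from(criteria, lambda s: 1.0 if s else 0.0)
```

With the additive capacity the integral reduces to the weighted mean (0.64 here); the capacity equal to 1 only on the full set returns the minimum score, and the one equal to 1 on every non-empty set returns the maximum. Intermediate, non-additive capacities encode interactions between criteria.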
When one wishes to confront theoretical probabilistic models with real data, statistical tools and control of the dynamics are obviously crucial. Naturally, we develop inference for the stochastic processes we use for modeling; this is at the heart of some of our projects. Control of stochastic processes is also a way to optimise the administration (dose, frequency) of therapies.
The monograph is a good reference on the basic estimation techniques for diffusion processes. Some attention has recently been paid to the estimation of the coefficients of fractional or multifractional Brownian motion from a set of observations. Let us quote for instance the nice surveys , . On the other hand, the inference problem for diffusions driven by a fractional Brownian motion is still in its infancy. A good reference on the question is , which deals with some very particular families of equations that do not cover the cases of interest for us. We have also recently proposed least-squares estimators for this kind of process , . Inference for PDMPs is also a recent subject that we want to develop.
Our team has good expertise in inference of the jump rate and the transition kernel of PDMPs , , , . However, there are many directions in which to go further. For instance, previous works assumed complete observation of the jumps and of the mode, which is unrealistic in practice. We want to tackle the problem of inference for "hidden PDMPs". It would also be interesting to investigate estimation followed by optimal control for ergodic PDMPs.
Regarding inference for pharmacokinetic models, several papers have reported the application of system identification techniques. But two issues were ignored in these previous works: the presence of timing noise and identification from longitudinal data. In , we proposed a bounded-error estimation algorithm based on interval analysis to solve the parameter estimation problem while taking into account uncertainty on the observation time instants. Statistical inference from longitudinal data based on mixed-effects models can be performed by the Monolix software (http://
We consider the control of stochastic processes within the framework of Markov Decision Processes and their generalization known as multi-player stochastic games , with a particular focus on infinite-horizon problems. In this context, we are interested in the complexity analysis of standard algorithms, as well as in the proposition and analysis of approximate numerical schemes for large problems in the spirit of . Regarding complexity, a central research topic is the analysis of the Policy Iteration algorithm, which has seen significant progress in recent years , , , , , but is still not fully understood. For large problems, we have long experience in the sensitivity analysis of approximate dynamic programming algorithms for Markov Decision Processes , , , , and we are currently investigating whether and how similar ideas may be adapted to multi-player stochastic games.
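Since Policy Iteration is central to the complexity questions above, here is a minimal Python sketch of the textbook algorithm for a finite discounted Markov Decision Process: exact policy evaluation by solving (I - gamma P_pi) v = r_pi, followed by greedy improvement, until the policy no longer changes. The two-state MDP is a hypothetical example, and the dense linear solver is only meant for tiny problems.

```python
def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a x = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def policy_iteration(P, R, gamma):
    """P[a][s][s2]: transition probabilities, R[a][s]: expected rewards.
    Returns an optimal deterministic policy and its value function."""
    n_actions, n_states = len(P), len(P[0])
    policy = [0] * n_states
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        a_mat = [[(1.0 if s == s2 else 0.0) - gamma * P[policy[s]][s][s2]
                  for s2 in range(n_states)] for s in range(n_states)]
        v = solve_linear(a_mat, [R[policy[s]][s] for s in range(n_states)])
        # Policy improvement: act greedily with respect to v.
        q = [[R[a][s] + gamma * sum(P[a][s][s2] * v[s2] for s2 in range(n_states))
              for a in range(n_actions)] for s in range(n_states)]
        new_policy = [max(range(n_actions), key=lambda a: q[s][a]) for s in range(n_states)]
        if new_policy == policy:
            return policy, v
        policy = new_policy

# Hypothetical MDP: action 0 stays put, action 1 switches state;
# staying in state 1 pays a reward of 1, everything else pays 0.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 1.0], [0.0, 0.0]]
policy, values = policy_iteration(P, R, gamma=0.9)
```

Here the optimal policy switches out of state 0 and stays in state 1, with values close to 9 and 10 = 1 / (1 - 0.9). The number of improvement rounds such a loop needs is precisely the quantity studied in the complexity analyses.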
A graph data structure consists of a set of nodes, together with a set of (either unordered or ordered) pairs of these nodes called edges. This type of data is frequently used in various application domains (in particular in biology) because it provides a mathematical representation of many concepts, such as physical or biological structures and networks of relationships in a population. The group has recently focused some attention on modeling and inference for graph data.
Suppose that we know the value of
Among graphs, trees play a special role because they offer a good model for many biological concepts, from RNA to phylogenetic trees through plant structures. Our research deals with several aspects of tree data. In particular, we work on statistical inference for this type of data under a given stochastic model (critical Galton-Watson trees, for example): in this context, the structure of the tree depends on an integer-valued distribution that we estimate from the observation of either a single tree or a forest. We also work on lossy compression of trees via linear directed acyclic graphs. These methods enable us to compute distances between tree data faster than from the original structures, and with high accuracy. These results are valuable in the context of very large trees arising, for instance, in plant biology.
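For a fully observed Galton-Watson forest, a natural estimator of the offspring distribution is the empirical one: the proportion of nodes having exactly k children. The Python sketch below assumes a hypothetical critical offspring law (p0, p1, p2) = (0.3, 0.4, 0.3), simulates a forest and computes this estimator. Trees larger than a cap are discarded for convenience, which slightly biases the estimate; this is an illustration, not the team's estimators.

```python
import random
from collections import Counter

def simulate_gw_tree(offspring_probs, rng, max_nodes=10_000):
    """Counts {k: number of nodes with k children} for one Galton-Watson tree,
    explored node by node; returns None if the tree exceeds max_nodes
    (critical trees are finite almost surely but heavy-tailed in size)."""
    ks = range(len(offspring_probs))
    counts, pending, visited = Counter(), 1, 0
    while pending > 0:
        if visited >= max_nodes:
            return None
        k = rng.choices(ks, weights=offspring_probs)[0]
        counts[k] += 1
        visited += 1
        pending += k - 1  # the current node is explored, its k children are queued
    return counts

def estimate_offspring(forest, max_k):
    """Empirical offspring law: p_hat(k) = #{nodes with k children} / #{nodes}."""
    total = Counter()
    for counts in forest:
        total.update(counts)
    n_nodes = sum(total.values())
    return [total[k] / n_nodes for k in range(max_k + 1)]

rng = random.Random(1)
probs = [0.3, 0.4, 0.3]  # hypothetical critical law: mean 0.4 + 2 * 0.3 = 1
trees = (simulate_gw_tree(probs, rng) for _ in range(200))
forest = [t for t in trees if t is not None]
p_hat = estimate_offspring(forest, max_k=2)
```

Only the counts of nodes by number of children are kept, since the estimator does not depend on the rest of the tree shape.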
Regression models and machine learning aim at inferring statistical links between a variable of interest and covariates. They also aim at clustering subjects or variables into homogeneous sets. In biological studies, it is always important to develop adapted learning methods, both for "standard" data and for very massive or online data.
A first approach to the regression of a quantitative variable is the nonparametric estimation of its conditional cumulative distribution function. Many methods are available to estimate conditional quantiles and test dependencies , . Among them, we have developed nonparametric estimation through local polynomial analysis , and we want to study the properties of this estimator in order to derive risk measures such as confidence bands and tests. We also study many other regression models, such as survival analysis and spatio-temporal models with covariates. For multiple regression models, we want to use simulation methods to test the validity of their assumptions. Such tests are called omnibus tests. An omnibus test is an overall test that examines several assumptions together; the best-known omnibus test is the test of Gaussianity (which examines both skewness and kurtosis).
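The Gaussianity test mentioned above can be made concrete with the Jarque-Bera statistic, JB = n/6 (S^2 + (K - 3)^2 / 4), where S and K are the sample skewness and kurtosis; under normality, JB is asymptotically chi-squared with 2 degrees of freedom, whose survival function is exp(-x/2). The Python sketch below is a standard textbook version, given only to illustrate an omnibus test, not the simulation-based tests the team develops.

```python
import math
import random

def jarque_bera(sample):
    """Jarque-Bera omnibus normality test: returns (statistic, asymptotic p-value)."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    stat = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Survival function of the chi-squared law with 2 degrees of freedom.
    return stat, math.exp(-stat / 2.0)

rng = random.Random(0)
skewed = [rng.expovariate(1.0) for _ in range(2000)]
```

Exponential data (skewness 2, kurtosis 9) are rejected with a numerically zero p-value, whereas the symmetric sample [-1, 1] repeated 50 times has zero skewness but kurtosis 1, so the test flags it through the kurtosis term alone (JB = 100/6).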
As regards the analysis of high-dimensional data, our view on the topic relies on the so-called French data analysis school, and more specifically on Factorial Analysis tools. In this context, stochastic approximation is an essential tool (see Lebart's paper ), which allows one to approximate eigenvectors in a stepwise manner. A systematic study of Principal Component and Factorial Analysis was then led by Monnez in the series of papers , , , in which many aspects of the convergence of online processes are analyzed using stochastic approximation techniques. BIGS aims at performing accurate classification or clustering by taking advantage of the possibility of updating the information "online" using stochastic approximation algorithms . We focus on several incremental procedures for regression and data analysis, such as linear and logistic regression and PCA. We also focus on the biological context of high-throughput bioassays, in which several hundreds or thousands of biological signals are measured for subsequent analysis. Extending the modeling conclusions from a sample of wells to the whole population requires accounting for inter-individual variability within the modeling procedure. One solution consists in using mixed-effects models, but up to now no similar approach exists in the field of dynamical system identification. As a consequence, we aim at developing a new solution based on an ARX (Auto Regressive model with eXternal inputs) model structure, using the EM (Expectation-Maximisation) algorithm to estimate the model parameters.
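Stochastic approximation of eigenvectors can be illustrated by Oja's rule, a classical online algorithm for the first principal component: for each centred observation x, with y = w.x, it updates w <- w + eta_t * y * (x - y * w) with decreasing steps eta_t. The Python sketch below relies on assumptions of ours (steps eta_t = 1/(t + 100) and synthetic rotated Gaussian data are hypothetical choices); it is a generic example of the technique, not the specific procedures studied by Monnez.

```python
import math
import random

def oja_first_component(stream, dim, step=lambda t: 1.0 / (t + 100)):
    """Oja's rule: w <- w + eta_t * y * (x - y * w), with y = <w, x>.
    For centred data and decreasing steps, w tends to a unit leading
    eigenvector of the covariance E[x x^T] (up to sign)."""
    w = [1.0] + [0.0] * (dim - 1)
    for t, x in enumerate(stream):
        y = sum(wi * xi for wi, xi in zip(w, x))
        eta = step(t)
        w = [wi + eta * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w

def rotated_gaussian_stream(n, rng):
    """Hypothetical centred 2-D data: variances 4 and 0.25 along axes rotated
    by 45 degrees, so the leading eigenvector is (1, 1) / sqrt(2)."""
    c = 1.0 / math.sqrt(2.0)
    for _ in range(n):
        u, v = 2.0 * rng.gauss(0.0, 1.0), 0.5 * rng.gauss(0.0, 1.0)
        yield (c * (u - v), c * (u + v))
```

Each observation is used once and then discarded, which is what makes this kind of update suitable for massive or streaming data.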
Cancer is the result of inter-dependent multi-scale phenomena, which is the main reason why understanding its spread is still an unsolved problem. In integrative biology, mathematical models play a central role; they help biologists and clinicians answer complex questions through numerical simulations and statistical analyses. The main issue here is to better understand and describe the role of cell damage heterogeneity and associated mutant cell phenotypes in the therapeutic responses of cancer cell populations subjected to radiotherapy sessions during in vitro experiments. Cell heterogeneity is often described as randomness in mathematical modeling, and different representations, such as Markov chains, branching processes and even stochastic differential equations, have recently been used.
Since 1988, some control system scientists and biologists at the Centre de Recherche en Automatique de Nancy (CRAN in short) http://
Next-generation genomic technologies allow clinicians and biomedical researchers to drastically increase the amount of genomic data collected on large cohorts of patients and populations. We want to contribute to a better understanding of the correlations between genes through their expression data, of the structure of RNA, and of the genetic bases of drug response and disease, and to detect significant sequences characterizing a gene. For instance, the biopharmaceutical company Transgene recently contacted us to analyse their genomic and proteomic data, in particular to find markers of the success of the cancer therapies they develop.
Network inference also has applications in the analysis of micro-organism populations, which we apply to micro-organisms inside and around the truffle through a collaboration with INRA Nancy. We also want to study other specific complex microbial communities, such as those found at tree roots, in order to characterize the phenotype of the tree. There are also applications in human health (for instance, identification of networks between bacteria inside the colon).
Through J.-M. Monnez and his collaborator Pr E. Albuisson, BIGS is a stakeholder in projects with the University Hospital of Nancy, namely FHU CARTAGE (Fédération Hospitalo Universitaire Cardial and ARTerial AGEing; leader: Pr Athanase BENETOS), RHU Fight HF (Fighting Heart Failure; leader: Pr Patrick ROSSIGNOL), and "Handle your heart", the team responsible for creating drug prescription support software for the treatment of heart failure. All these projects are in the context of personalized medicine and deal with biomarker research, the prognostic value of quantitative variables and events, and heart failure scoring. Other collaborations with clinicians concern foetopathology and, again, cancer.
A telomere is a region of repetitive, non-coding nucleotide sequences at each end of a chromosome. Telomeres are disposable buffers at the ends of chromosomes that are truncated during cell division, so that the telomere ends become shorter with each cell division. In this way, they are markers of aging. Mathematical modeling of telomere dynamics is recent , , , . Through a collaboration with Pr A. Benetos, geriatrician at CHU Nancy, and some members of the Inria team TOSCA, we want to work in three connected directions: (1) propose a dynamical model for the lengths of telomeres and study its mathematical properties (long-term behavior of the distribution of lengths, quasi-stationarity, etc.); (2) use these properties to develop new statistical methods for estimating the various parameters; and (3) find and use a suitable methodology for the analysis of the available data (Pr Benetos), for instance for the study of the length distribution for a subject and its evolution.
The composition of the team changed this year: Bruno Scherrer (Inria researcher) and Anne Gégout-Petit (Pr) joined the team in January and May, respectively. Samy Tindel moved to Purdue University as full Professor, and Céline Lacaux has been promoted to full Professor at Avignon University. Anne Gégout-Petit has been interim team leader since September.
Keywords: socioeconomic status, multidimensional index, principal component analysis, hierarchical classification, R
Scientific Description
In order to study social inequalities, indices can be used to summarize the multiple dimensions of the socioeconomic status. As a part of the Equit'Area Project, a public health program focused on social and environmental health inequalities, a statistical procedure to create (neighborhood) socioeconomic indices was developed. This procedure uses successive principal components analyses to select variables and create the index. In order to simplify the application of the procedure for non-specialists, the R package SesIndexCreatoR was created. It allows the creation of the index with all the possible options of the procedure, the classification of the resulting index in categories using several classical methods, the visualization of the results, and the generation of automatic reports.
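The core step of such an index, a principal component analysis whose first component score serves as the index, followed by a classification into categories, can be sketched in Python as follows. This is a simplified, hypothetical illustration: a single PCA on standardized variables (first eigenvector of the correlation matrix by power iteration), then quantile classes. It does not reproduce the successive-PCA variable selection of the procedure, and none of these function names belong to the SesIndexCreatoR R package.

```python
import math

def standardize(column):
    """Centre and scale a variable (sample standard deviation)."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (len(column) - 1))
    return [(x - mean) / sd for x in column]

def first_pc_index(columns, iterations=500):
    """First principal component of the correlation matrix (power iteration)
    and the corresponding score of each area, used as the index."""
    z = [standardize(col) for col in columns]
    n, p = len(z[0]), len(z)
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(p)]
            for i in range(p)]
    w = [1.0] * p  # starting vector, assumed not orthogonal to the first eigenvector
    for _ in range(iterations):
        w = [sum(corr[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    scores = [sum(w[j] * z[j][i] for j in range(p)) for i in range(n)]
    return w, scores

def quantile_classes(values, k):
    """Assign each value to one of k classes of (nearly) equal size by rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    classes = [0] * len(values)
    for rank, i in enumerate(order):
        classes[i] = rank * k // len(values)
    return classes
```

The sign of a principal component is arbitrary, so in practice one orients the index so that higher values correspond to greater deprivation before classifying it.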
Functional Description
This package computes and visualizes socioeconomic indices and category distributions from datasets of socioeconomic variables (these tools were developed as part of the EquitArea Project, a public health program).
Participants: Benoît Lalloué, Severine Deguen, Jean-Marie Monnez and Nolwenn Le Meur
Contact: Benoît Lalloué
Keywords: Health - Cancer - Biomedical imaging
Scientific Description
Angio Analytics is registered at the APP (Agence pour la Protection des Programmes) under identification number FR001.280027.000.R.P.2015.000.10000: software for image analysis and statistical analysis of parameters derived from these images.
Angiogenesis is the process by which new blood vessels are created from preexisting ones. But this natural process is also involved, in a chaotic way, in tumor development. Many molecules have shown particular efficiency in inhibiting this phenomenon, hopefully leading either to (i) a reorganization of the neovessels allowing better tumor uptake of cytotoxic molecules (such as chemotherapy) or (ii) a deprivation of the tumor vascular network with a view to starving it. However, characterizing the anti-angiogenic effects of a molecule remains difficult, mainly because the proposed physical modeling approaches have rarely been confronted with in vivo data, which are not directly available. We have developed an original approach to characterize and analyze anti-angiogenic responses in cancerology that allows biologists to account for the spatial and dynamical dimensions of the problem. The proposed solution relies on the combination of a specific in vivo biological protocol using skinfold chambers, image processing and dynamic system identification. An empirical model structure of the anti-angiogenic effect of a tested molecule is selected according to experimental data. Finally, the model is identified and its parameters are used to characterize and compare responses to the tested molecule. The solution has been implemented in software developed in the Matlab environment.
Functional Description
Angio-Analytics allows the pharmacodynamic characterization of anti-vascular effects in anti-cancer treatments.
Participant: Thierry Bastogne
Contact: Thierry Bastogne
Keywords: Bioinformatics - Cancer - Drug development
Functional Description
To speed up the preclinical development of medical engineered nanomaterials, we have designed an integrated computing platform dedicated to the virtual screening of nanostructured materials activated by X-rays, making it possible to select nano-objects with interesting medical properties more quickly. The main advantage of this in silico design approach is to virtually screen many possible formulations and rapidly select the most promising ones. The platform can currently handle the accelerated design of radiotherapy-enhancing nanoparticles and of nano-sized medical imaging contrast agents, as well as the comparison between nano-objects and the optimization of existing materials.
Participant: Thierry Bastogne
Contact: Thierry Bastogne
Participants: P. Vallois, S. Wantz-Mézières
External collaborator: J-S. Giet (IECL, Université de Lorraine)
A cancer tumor can be represented, for simplicity, as an aggregate of cancer cells, each cell behaving according to the same discrete model and independently of the others. Therefore, to measure its size evolution, it seems natural to use tools from population dynamics, for instance the logistic model. This deterministic framework is well known, but the stochastic one is worth investigating. We work with a model in which we suppose that the size
where
Participants: T. Bastogne, P. Vallois
External collaborator: S. Pinel (CRAN, Université de Lorraine)
Cancer is the result of inter-dependent multi-scale phenomena, which is the main reason why understanding its spread is still an unsolved problem. In integrative biology, mathematical models play a central role; they help biologists and clinicians answer complex questions through numerical simulations and statistical analyses. The main issue here is to better understand and describe the role of cell damage heterogeneity and associated mutant cell phenotypes in the therapeutic responses of cancer cell populations subjected to radiotherapy sessions during in vitro experiments. Cell heterogeneity is often described as randomness in mathematical modeling, and different representations, such as Markov chains, branching processes and even stochastic differential equations, have recently been used. Unlike these previous studies, which only focused on the steady-state responses of cell populations, we are interested in modeling the transient behavior after treatment and in identifying the role of mutation heterogeneity in the global dynamic response of the cell populations. We propose to describe the survival response of an in vitro cancer cell culture treated by radiotherapy as a superposition of independent dynamics. Each cell is represented by a finite collection of cell mutation states with possible transitions between them. The population dynamics is given by an age-dependent multi-type branching process. From this representation, we obtain equations satisfied by the average size of the global surviving population as well as by that of the subpopulations associated with 10 mutation phenotypes. This work was presented as a poster at an international congress .
Participant: S. Wantz-Mézières
External collaborators: M. Ben Abdallah, Yann Gaudeau, J.-M. Moureaux (CRAN, Université de Lorraine) and M. Blonski, L. Taillandier (CHU Nancy)
In the framework of a collaboration with neurologists (Luc Taillandier, Marie Blonski, CHU Nancy) and control engineers (Jean-Marie Moureaux, Yann Gaudeau, CRAN), around the thesis supervision of M. Ben Abdallah, our aim is to work out personalized therapeutic strategies for monitoring patients with diffuse low-grade glioma. Regular MRI examinations are used to estimate the tumour volume; we proposed a manual segmentation method and statistically assessed its reproducibility through a subjective test. In order to design a decision-aid tool for the response to chemotherapy, our approach is phenomenological and we used simple regression tools to model and predict the kinetics of tumour growth. We identified two different models. These results open up many perspectives, the main one being modeling with multi-factor models that include biological and anatomopathological factors. This work is currently in progress.
Participant: C. Lacaux
External collaborators: T. Obara and M. Thomassin (CRAN, Université de Lorraine), L. Vinckenboch (Fribourg)
Our project focuses on an innovative application: interstitial photodynamic therapy (PDT) for the treatment of high-grade brain tumors. This strategy requires the insertion of optical fibers to deliver light directly into the tumor tissue to be treated, while nanoparticles are used to carry the photosensitizer into the cancer cells. In order to optimize the intra-cerebral position of the optical fiber, two fundamental questions have to be answered: (1) What shape and position of the light source optimize the damage to malignant cells? (2) Is there a way to identify the physical parameters of the tissue that drive light propagation?
We are obviously not the first to address these issues, and there is nowadays a consensus in favor of the algorithm proposed by L. Wang and S. L. Jacques for simulating light transport in biological tissues. However, our starting point is the observation that the usual methods somewhat lack formalism and provide no formal representations that answer identifiability questions. In , in the framework of homogeneous biological tissues, we propose an alternative Monte Carlo (MC) method to Wang's algorithm, together with a variance reduction method. Interestingly enough, our formulation also allows us to design quite easily a Markov chain Monte Carlo (MCMC) method based on the Metropolis-Hastings algorithm and to handle the inverse problem (of crucial importance for practitioners), which consists in estimating the optical coefficients of the tissue from a series of measurements. We compared the proposed MC and MCMC methods with Wang's algorithm: our MC method is much more consistent. However, MCMC methods induce quick mutations, which paves the way to very promising algorithms in the inhomogeneous case. To handle the inverse problem, we derive a probabilistic representation of the variation of the fluence with respect to the absorption and scattering coefficients. This leads us to implement a Levenberg-Marquardt type algorithm that gives an approximate solution to the inverse problem. Our results open the way to new improvements of Monte Carlo methods in the context of light propagation. They should rather be seen as a starting point for new methods, including in inhomogeneous tissue. This work has been presented in several French seminars (Lille, Avignon, Paris Descartes, Orléans).
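To give a flavour of Monte Carlo light transport, here is a deliberately simplified Python sketch in a homogeneous slab: photons enter normally, travel exponentially distributed free paths with attenuation mu_t = mu_a + mu_s, are absorbed with probability mu_a / mu_t at each interaction, and are otherwise scattered isotropically. It is a hypothetical toy (analog sampling, isotropic scattering, no refractive-index mismatch), neither Wang and Jacques' algorithm nor the method proposed in the team's work; tissue models usually rely on anisotropic phase functions such as Henyey-Greenstein.

```python
import math
import random

def slab_mc(mu_a, mu_s, thickness, n_photons, rng):
    """Analog MC in a homogeneous slab [0, thickness]; only the depth z matters.
    Returns the fractions of photons (reflected, absorbed, transmitted)."""
    mu_t = mu_a + mu_s
    reflected = absorbed = transmitted = 0
    for _ in range(n_photons):
        z, cos_theta = 0.0, 1.0  # photon enters at the surface, along +z
        while True:
            free_path = -math.log(1.0 - rng.random()) / mu_t  # Exp(mu_t) step
            z += cos_theta * free_path
            if z < 0.0:
                reflected += 1
                break
            if z > thickness:
                transmitted += 1
                break
            if rng.random() < mu_a / mu_t:
                absorbed += 1
                break
            cos_theta = 2.0 * rng.random() - 1.0  # isotropic: cos(theta) uniform on [-1, 1]
    return reflected / n_photons, absorbed / n_photons, transmitted / n_photons
```

With mu_s = 0 the transmitted fraction should approach the Beer-Lambert value exp(-mu_a * thickness), a quick sanity check of the sampler before adding scattering.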
Participant: C. Lacaux
External collaborator: G. Samorodnitsky (Cornell, USA)
In extreme value theory, one of the major topics is the study of the limiting behavior of the partial maxima of a stationary sequence. When this sequence is i.i.d., the unique limiting process is well known and called the extremal process. For a long memory stable sequence, the limiting process is a time-changed extremal process, with a simple power time change. Céline Lacaux and Gennady Samorodnitsky have proved in that this limiting process can also be interpreted as a restriction of a self-affine random sup measure. In addition, they have established that this random measure arises as a limit of the partial maxima of the same long memory stable sequence, but in a different space. Their results open the way to new self-similar processes with stationary max-increments. Céline Lacaux presented this work in an invited session of the international conference Extreme Value Analysis at Ann Arbor (June 2015).
Participant: C. Lacaux
External collaborator: H. Biermé (Poitiers)
Hermine Biermé and Céline Lacaux continue their collaboration on the study of anisotropic random fields. They have extended their previous work to the framework of conditionally sub-Gaussian random series. For such anisotropic fields, they have obtained a modulus of continuity and a rate of uniform convergence. Their framework enables the study of, e.g., Gaussian fields, stable random fields and multi-stable random fields. As an invited speaker, Céline Lacaux presented this work at the international conference Adventure in Self-similarity at Cornell University (June 2015) . Another work in progress deals with the simulation of anisotropic Gaussian random fields and the estimation of their parameters using quadratic variations.
Participant: P. Vallois
External collaborators: A. Lagnoux and S. Mercier (Toulouse)
Here we want to identify biologically interesting sequences and compare the results obtained using the local score H_n alone with those obtained using the pair (H_n, L_n), where L_n is the length of the segment that realizes the best score. To this end, we work on the p-values associated with the observed samples.
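The local score itself can be computed in linear time with the Lindley recursion U_0 = 0, U_i = max(0, U_{i-1} + X_i), H_n = max_i U_i. The Python sketch below also tracks the length L_n of a segment achieving H_n. Breaking ties by keeping the first segment found is an assumption of ours (conventions may differ), and the score sequence in the example is hypothetical.

```python
def local_score(scores):
    """Local score H_n = max over contiguous segments of the summed scores,
    computed with the Lindley recursion U_i = max(0, U_{i-1} + X_i).
    Returns (H_n, L_n, start), where L_n is the length of the first segment
    found that achieves H_n (tie-breaking rule assumed here)."""
    best, best_len, best_start = 0, 0, 0
    u, start = 0, 0
    for i, x in enumerate(scores):
        if u + x > 0:
            u += x
        else:
            u, start = 0, i + 1  # restart the excursion after position i
        if u > best:
            best, best_len, best_start = u, i - start + 1, start
    return best, best_len, best_start
```

For example, `local_score([-1, 2, 3, -10, 4, -1, 1])` returns `(5, 2, 1)`: the best segment is `[2, 3]`, of length 2, starting at index 1, while the segments `[4]` and `[4, -1, 1]` only reach 4.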
Participants: T. Bastogne, Y. Petot, P. Vallois
The framework of this work is the PhD thesis of Yann Petit. The first chapter of the thesis is a state of the art identifying the current challenges in medico-economic analyses. A review article should be submitted in spring 2016. We are currently working on the aggregation operators, based on fuzzy measures and the Choquet integral. Theoretical results have been obtained and a publication is planned to be submitted in the second half of 2016. Work continues by introducing probabilities. The next step will be to apply our theoretical results to real clinical cases.
Participant: A. Gégout-Petit
External collaborators: S. Li, L. Guerin-Dubrana (Inra Bordeaux)
In the framework of a collaboration with INRA Bordeaux on the esca disease of vines, Anne Gégout-Petit and Shuxian Li developed different spatial and spatio-temporal models for different purposes: (1) to study the distribution and the dynamics of esca-affected vines in order to assess the aggregation and the potential spread of the disease; (2) to propose a spatio-temporal model capturing the dynamics of cases and measuring the effects of environmental covariates. For this, we propose different hierarchical models with a latent process, associated with Bayesian inference. Part of this research has been submitted to a biology journal . Shuxian Li defended his PhD on December 15th.
Participants: R. Azaïs, A. Gégout-Petit
External collaborators: A.B. Abdessalem, M. Puiggali, M. Touzet (Bordeaux)
Fatigue crack propagation is a stochastic phenomenon due to the inherent uncertainties originating from material properties and environmental conditions. In a recent preprint , we propose to model and predict fatigue crack growth by a piecewise-deterministic Markov process associated with deterministic crack laws from the literature, namely the Paris-Erdogan equation defined by
Participant: S. Tindel
External collaborators: K. Chouk, A. Deya, Y. Hu, L. Khoa, D. Nualart, E. Nualart, F. Xu. (US)
The problem of estimating the coefficients of a general differential equation driven by a Gaussian process is still largely unsolved. To be more specific, the most general (
where
where
To this end, here are the steps we have focused on in 2015:
Some limit theorems for general functionals of Gaussian sequences , or for functionals of a Brownian motion , which give some insight into the asymptotic behavior of systems like ().
Extension of pathwise stochastic integration to processes indexed by the plane in , which helps define noisy systems such as partial differential equations.
Definition of new systems driven by a (spatial) fractional Brownian motion, such as the stochastic PDE considered in .
The local asymptotic normality obtained for the system (), which implies a lower bound on general estimators of the coefficient
Participants: R. Azaïs, A. Muller-Gueudin
A piecewise-deterministic Markov process is a stochastic process whose behavior is governed by an ordinary differential equation punctuated by random jumps occurring at random times. In a recent preprint , we focus on the nonparametric estimation of the jump rate for such a stochastic model observed over a long time interval under an ergodicity condition. More precisely, we introduce an uncountable class (indexed by the deterministic flow) of recursive kernel estimates of the jump rate and we establish their strong pointwise consistency as well as their asymptotic normality. In addition, we propose to choose, within this class, the estimator with minimal variance, which is unfortunately unknown and thus has to be estimated. We also discuss the choice of the bandwidth parameters by cross-validation methods. This work has also been presented at two national workshops.
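As a simplified illustration of nonparametric jump-rate estimation, the Python sketch below simulates a TCP-like PDMP (unit drift between jumps, jump rate lambda(x) = x, and X -> X/2 at each jump; the rate is a hypothetical choice) and estimates lambda(y) with a box-kernel ratio: the number of jumps whose pre-jump value falls in [y - h, y + h], divided by the time the path spends in that window. This ratio estimator is a swapped-in textbook technique, not the recursive kernel estimators of the preprint.

```python
import math
import random

def simulate_tcp(n_jumps, rng, x0=1.0):
    """TCP-like PDMP: dX/dt = 1 between jumps, jump rate lambda(x) = x
    (hypothetical choice) and X -> X / 2 at each jump.
    Returns the (start value, duration) of every inter-jump segment."""
    segments, x = [], x0
    for _ in range(n_jumps):
        e = -math.log(1.0 - rng.random())  # standard exponential variable
        # Invert the cumulative hazard x * t + t^2 / 2 = e for the waiting time t.
        t = -x + math.sqrt(x * x + 2.0 * e)
        segments.append((x, t))
        x = (x + t) / 2.0  # the pre-jump value x + t is halved
    return segments

def jump_rate_estimate(segments, y, h):
    """Box-kernel ratio estimator of lambda(y): jumps with pre-jump value in
    [y - h, y + h] divided by the time spent in that window (unit-speed flow,
    so the time spent equals the length of the overlap)."""
    jumps = sum(1 for x, t in segments if abs(x + t - y) <= h)
    occupation = sum(max(0.0, min(x + t, y + h) - max(x, y - h)) for x, t in segments)
    return jumps / occupation
```

The ratio works because, over a long ergodic trajectory, the expected number of jumps from a window equals the integral of the jump rate against the time spent in that window.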
Participant: R. Azaïs
External collaborators: N. Krell (Rennes), B. de Saporta (Montpellier)
In , we assume that the transition kernel is absolutely continuous with respect to the Lebesgue measure. This condition may not be satisfied in some applications, for instance for the well-known TCP process that appears in the modeling of the Transmission Control Protocol used for data transmission over the Internet. As a consequence, we propose to investigate estimation followed by optimal control for this ergodic process. The particular framework defined by this process allows us to define an optimal policy for the estimation of its jump rate. At present, we have an efficient method for estimating, in an optimal way, the moments of the conditional distribution of the inter-congestion times. This work is still in progress.
Participant: R. Azaïs
External collaborators: B. Delyon, F. Portier
Monte-Carlo methods for estimating an integral assume that the distribution of the random design is known. Unfortunately, some applications generate a design whose density function
when the number
Participants: R. Azaïs, B. Scherrer, S. Tindel, S. Wantz-Mézières
In recent years, Bastogne, Keinj and Vallois designed a Markov model of the evolution of cells under a radiotherapy treatment. We are currently investigating the problem of optimizing the radiotherapy intensity sequence in order to kill as many cancerous cells as possible while preserving as many healthy cells as possible, a problem that fits into the framework of stochastic optimal control. Our preliminary results suggest that, since we are dealing with large populations of cells, the problem can be well approximated by a limiting deterministic optimal control problem. We can solve this problem numerically with a Pontryagin approach, and symbolically (in the simplest cases) by identifying the critical points of some multivariate polynomials. The latter approach allows us to check that the former actually finds globally optimal solutions. This work is in progress.
Participant: B. Scherrer
External collaborators: V. Gabillon, M. Ghavamzadeh, M. Geist, B. Lesner, J. Perolat, O. Pietquin, M. Tagorti
We have provided in (ICML 2015) the first finite-sample analysis of the LSTD(
The long version of our previous work on the analysis of approximate modified policy iteration for optimal control, and its application to the Tetris domain, is now published in JMLR . Extending this family of algorithms to compute approximately optimal non-stationary policies improves the dependency on the discount factor: we provide such improved bounds in , as well as examples showing that our analysis is tight (it cannot be further improved).
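To fix ideas, here is a minimal sketch of exact (non-approximate) modified policy iteration on a small tabular MDP. Each iteration takes a greedy policy with respect to the current value and then applies the policy's Bellman operator m times; m = 1 gives value iteration and m → ∞ gives policy iteration. In the approximate variants analyzed in the papers, these two steps are replaced by classification and regression on sampled data; that part is not reproduced here, and the array layout (P of shape (A, S, S), R of shape (A, S)) is an assumption of the sketch.

```python
import numpy as np


def modified_policy_iteration(P, R, gamma, m, n_iter=200):
    """Exact modified policy iteration on a tabular MDP.

    P: (A, S, S) array, P[a, s, s'] = transition probability.
    R: (A, S) array of expected rewards.
    Returns the final value vector and the last greedy policy.
    """
    A, S, _ = P.shape
    v = np.zeros(S)
    for _ in range(n_iter):
        q = R + gamma * (P @ v)        # (A, S) state-action values
        pi = q.argmax(axis=0)          # greedy policy
        P_pi = P[pi, np.arange(S)]     # (S, S) transitions under pi
        R_pi = R[pi, np.arange(S)]     # (S,) rewards under pi
        for _ in range(m):             # m steps of partial policy evaluation
            v = R_pi + gamma * (P_pi @ v)
    return v, pi
```

After convergence, the returned value satisfies the Bellman optimality equation up to numerical precision, whatever the value of m.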
An original analysis of a variant of approximate modified policy iteration for computing approximate Nash equilibria in the more general setting of two-player zero-sum games was published at ICML 2015 .
Participant: A. Muller-Gueudin
External collaborators: A. Girard, S. Martin, I.C. Morarescu (CRAN, Nancy)
We report here on a starting collaboration with researchers in automatic control in Nancy. We consider networks modeled as graphs whose nodes and edges represent the agents and their interconnections, respectively. The objective is to study the evolution of the opinions of all the agents. The connectivity of the network, the persistence of links and the reciprocity of interactions influence the speed of convergence towards a consensus. The problem of consensus or synchronization is motivated by various applications such as communication networks, power and transport grids, decentralized computing networks, and social or biological networks. We then consider networks of interconnected dynamical systems, called agents, that are partitioned into several clusters. Most of the agents can only update their state continuously, using only the states of agents in their own cluster. In addition, a few agents have the peculiarity of rarely updating their state in a discrete way, by resetting it using the states of agents outside their cluster. In social networks, the opinion of each individual evolves by taking into account the opinions of the members of his or her community. Nevertheless, one or several individuals can change their opinions by interacting with individuals outside their community. These inter-cluster interactions can be seen as resets of the opinions. This leads to network dynamics expressed in terms of reset systems. We suppose that the reset instants arrive stochastically, following a Poisson renewal process. A paper on this work has been accepted in the journal IEEE Transactions on Automatic Control .
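The clustered dynamics with stochastic resets can be sketched as a toy simulation. Everything specific below is a hypothetical choice for illustration: all-to-all coupling inside each cluster, an explicit Euler discretization of the continuous part, a single Poisson process triggering all resets at once, and a reset rule that replaces an agent's state by the mean state of designated agents outside its cluster.

```python
import numpy as np


def simulate_reset_consensus(x0, clusters, reset_links, rate, t_end, dt=1e-3, rng=None):
    """Clustered consensus with stochastic resets (a hypothetical toy version).

    clusters    : list of index lists; inside each cluster every agent follows
                  dx_i/dt = sum_{j in cluster} (x_j - x_i)  (all-to-all coupling).
    reset_links : dict {agent: list of agents outside its cluster}; at each jump time
                  of a Poisson process with intensity `rate`, every such agent resets
                  its state to the mean state of its outside agents.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    t = 0.0
    next_reset = rng.exponential(1.0 / rate)
    while t < t_end:
        # explicit Euler step of the continuous intra-cluster dynamics
        dx = np.zeros_like(x)
        for c in clusters:
            dx[c] = x[c].sum() - len(c) * x[c]
        x += dt * dx
        t += dt
        # discrete resets, computed simultaneously from the pre-reset state
        while next_reset <= t:
            old = x.copy()
            for i, outside in reset_links.items():
                x[i] = old[outside].mean()
            next_reset += rng.exponential(1.0 / rate)
    return x
```

With two clusters that disagree initially and one reset link in each direction, each reset shrinks the gap between the cluster averages, so the whole network drifts towards a common opinion.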
Participants: A. Gégout-Petit, A. Muller-Gueudin
External collaborators: A. Deveau (INRA Nancy), C. Raïssy (Inria Orpailleur)
The objective is to characterize microbial interactions in a particular environment: the truffles.
The truffle provides a habitat for complex bacterial communities. A role for bacteria in the development of truffles has been suggested, but very little is known regarding the structure and the functional potential of the truffle's bacterial communities during truffle maturation.
From a mathematical point of view, two micro-organisms are connected if they are not independent conditionally on the other micro-organisms. Several models fit into this setting, especially Gaussian graphical models, Bayesian networks and graphical log-linear models. However, the data, which can be zero-inflated, require further developments, and new models have to be proposed. Moreover, we are confronted with the problem that
Participant: R. Azaïs
External collaborators: J-B. Durand, C. Godin
A classical compression method for trees is to represent them by directed acyclic graphs.
This approach exploits subtree repeats in the structure and is efficient only for trees with a high level of redundancy.
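The DAG compression described above can be sketched in a few lines by merging isomorphic subtrees. This sketch assumes unordered rooted trees encoded as nested tuples of children (a leaf is the empty tuple); both the encoding and the function names are hypothetical. Each subtree class is identified by the sorted tuple of the classes of its children, so every distinct class is stored only once and the multiplicities in the signature give the DAG edges.

```python
def dag_compress(tree):
    """Compress an unordered rooted tree into a DAG by merging isomorphic subtrees.

    A tree is a nested tuple of its children, a leaf being ().  `table` maps the
    signature of a subtree class (sorted tuple of the ids of its children's classes,
    with multiplicities, i.e. the outgoing DAG edges) to the id of the DAG node.
    """
    table = {}

    def visit(node):
        signature = tuple(sorted(visit(child) for child in node))
        if signature not in table:
            table[signature] = len(table)
        return table[signature]

    return visit(tree), table


def count_nodes(tree):
    """Number of nodes of the original (uncompressed) tree."""
    return 1 + sum(count_nodes(child) for child in tree)
```

On a complete binary tree of height 3, all subtrees of the same height are identical, so the 15 nodes of the tree collapse to 4 DAG nodes, one per height.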
The class of self-nested trees presents remarkable compression properties by this method because of the systematic repetition of subtrees.
In particular, the compressed version of a self-nested tree
Participant: R. Azaïs
External collaborator: A. Genadot (Inria CQFD Bordeaux)
Galton-Watson trees are an elementary model for the genealogy of a branching population and thus play a central role in biology. Critical Galton-Watson trees are generated from a sibling distribution
Participants: S. Ferrigno, A. Muller-Gueudin
External collaborator: M. Maumy-Bertrand (IRMA, Strasbourg)
In this work with Myriam Maumy-Bertrand (IRMA, Strasbourg), we study the conditional cumulative distribution function and an associated nonparametric estimator. The conditional cumulative distribution function has the advantage of completely characterizing the conditional law of the variable under consideration; from it one can obtain the regression function, the density function, the moments and the conditional quantile function. As a nonparametric estimator of this function, we focus on the local polynomial techniques described in Fan and Gijbels . In particular, we use the local linear estimator of the conditional cumulative distribution function.
The objective of this work is to establish uniform asymptotic certainty bands for the conditional cumulative distribution function. To this end, we give the exact rate of strong uniform consistency of the local linear estimator of this function. We show that limit laws of the logarithm are useful in the construction of uniform asymptotic certainty bands for the conditional distribution function. In particular, we use a single bootstrap to construct sharp uniform asymptotic bands for this estimator.
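The local linear estimator has an explicit weighted form: F̂(y|x) = Σ w_i(x) 1{Y_i ≤ y}, with w_i(x) proportional to K_i (S_2 − (X_i − x) S_1), where K_i = K((X_i − x)/h) and S_j = Σ K_i (X_i − x)^j; the weights sum to one. The sketch below computes it with a Gaussian kernel (an assumption) and adds a deliberately simplified band: a naive pair bootstrap giving a band at a fixed x, uniform in y only. It is not the sharp uniform construction of the paper.

```python
import numpy as np


def local_linear_weights(X, x, h):
    """Local linear weights at the point x (Gaussian kernel, bandwidth h); they sum to 1."""
    u = X - x
    k = np.exp(-0.5 * (u / h) ** 2)  # normalizing constant cancels in the ratio
    s0, s1, s2 = np.sum(k), np.sum(k * u), np.sum(k * u**2)
    return k * (s2 - u * s1) / (s0 * s2 - s1**2)


def cond_cdf(X, Y, x, y_grid, h):
    """Local linear estimate of F(y | x) on a grid of y values (not forced to be monotone)."""
    w = local_linear_weights(X, x, h)
    return (w[None, :] * (Y[None, :] <= y_grid[:, None])).sum(axis=1)


def bootstrap_band(X, Y, x, y_grid, h, n_boot=200, level=0.95, rng=None):
    """Naive pair-bootstrap band at a fixed x, uniform over y_grid (simplified sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    est = cond_cdf(X, Y, x, y_grid, h)
    n = X.size
    sups = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample (X_i, Y_i) pairs
        sups[b] = np.max(np.abs(cond_cdf(X[idx], Y[idx], x, y_grid, h) - est))
    c = np.quantile(sups, level)  # bootstrap quantile of the sup-deviation
    return est, np.clip(est - c, 0.0, 1.0), np.clip(est + c, 0.0, 1.0)
```

In a simulated model Y = 2X + 0.25 ε with ε standard normal, the estimate at x = 0.5 should be close to Φ((y − 1)/0.25).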
We illustrate our results with simulations and a study of fetal growth based on 694 fetuses (carefully selected by excluding multiple pregnancies and malformed, macerated or seriously ill fetuses, or those with chromosomal abnormalities) autopsied in fetopathology units of the "Service de foetopathologie et de placentologie" of the Maternité Régionale Universitaire (CHU Nancy, France) between 1996 and 2013.
We presented our results at two international conferences with proceedings: in Lille in June 2015 ("47èmes Journées de Statistique de la SFdS") and in London in December 2015 ("CM Statistics") .
Participants: R. Azaïs, S. Ferrigno
External collaborator: M-J. Martinez Marcoux (LJK, Grenoble)
The aim of this collaboration with Marie-José Martinez Marcoux (LJK, Grenoble) is to compare, through simulations, several methods for testing the validity of a regression model. These tests can be "directional", in that they are designed to detect departures from mainly one given assumption of the model (for example on the regression function, the variance or the error), or global (for example on the conditional distribution function). Building such statistical tests requires nonparametric estimators of various functions (regression, variance, cumulative distribution function). The idea is then to build a tool (an R package) that allows users to test the validity of the model they use through different methods, while varying the parameters associated with the modeling. This work is in progress.
Participant: J-M. Monnez
External collaborators : W. Kihal, B. Lalloué, C. Padilla, D, S. Zmirou-Navier
Everyone is subject to environmental exposures from various sources, with negative health impacts (air, water and soil contamination, noise, etc.) or with positive effects (e.g. green space). Studies considering such complex environmental settings in a global manner are rare. We propose to use statistical factor and cluster analyses to create a composite exposure index with a data-driven approach, in order to assess the environmental burden experienced by populations. We illustrate this approach in a large French metropolitan area. The study was carried out in the Greater Lyon area (France, 1.2 M inhabitants) at the census Block Group (BG) scale. As environmental indicators, we used annual ambient air NO2 concentrations, noise levels and proximity to green spaces, to industrial plants, to polluted sites and to road traffic. They were synthesized using Multiple Factor Analysis (MFA), a data-driven technique without a priori modeling, followed by a Hierarchical Clustering to create BG classes. The first four components of the MFA explained 30, 14, 11 and 9% of the total variance, respectively. Clustering into five classes groups: (1) a particular type of large BGs without population; (2) BGs of green residential areas, with fewer negative exposures than average; (3) BGs of residential areas near midtown; (4) BGs close to industries; and (5) midtown urban BGs, with higher negative exposures than average and fewer green spaces. Other numbers of classes were also tested in order to assess alternative clusterings. We present an approach using statistical factor and cluster analysis techniques, which seem to be overlooked for assessing cumulative exposure in complex environmental settings. Although it cannot be applied directly for risk or health effect assessment, the resulting index can help to identify hot spots of cumulative exposure, to prioritize urban policies or to compare the environmental burden across study areas in an epidemiological framework .
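The MFA-then-clustering pipeline can be sketched as follows, assuming the standard normed form of MFA: each variable is standardized, each group of variables (here, one block per type of environmental indicator) is divided by the square root of its own first eigenvalue so that no group dominates, and a global PCA is then run on the concatenated table. Ward linkage and the block layout are assumptions of the sketch, not necessarily the choices made in the study.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def multiple_factor_analysis(blocks, n_components=4):
    """Normed MFA: blocks is a list of (n_units, p_k) arrays, one per group of indicators.

    Returns the unit scores on the first components and their share of total variance.
    """
    weighted = []
    for B in blocks:
        Z = (B - B.mean(axis=0)) / B.std(axis=0)  # standardize each variable
        # first eigenvalue of the block's correlation matrix
        lam1 = np.linalg.svd(Z / np.sqrt(len(Z)), compute_uv=False)[0] ** 2
        weighted.append(Z / np.sqrt(lam1))       # MFA weighting of the group
    W = np.hstack(weighted)
    _, s, Vt = np.linalg.svd(W / np.sqrt(len(W)), full_matrices=False)  # global PCA
    scores = W @ Vt[:n_components].T
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


def cluster_units(scores, n_classes=5):
    """Hierarchical (Ward) clustering of the MFA scores into at most n_classes classes."""
    return fcluster(linkage(scores, method="ward"), t=n_classes, criterion="maxclust")
```

Clustering on the first MFA components rather than on the raw indicators keeps the balance between groups of variables that the MFA weighting enforces.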
Participant: J-M. Monnez
External collaborator: R. Bar (EDF, R & D)
Consider a data stream and suppose that each data vector is a realization of a random vector whose expectation varies with time, the law of the centered data vector being stationary. Consider the principal component analysis (PCA) of this centered vector, called partial PCA. In this study, online estimators of the direction vectors of the first principal axes are defined by stochastic approximation processes using, at each step, either a data batch or all the data up to the current step. This extends a former result obtained by the second author using one data vector at each step. This is applied to partial generalized canonical correlation analysis by defining a stochastic approximation process of the metric involved in this case, using all the data up to the current step. If the expectation of the data vector varies according to a linear model, a stochastic approximation process of the model parameters is used. All these processes can be run in parallel.
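A classical stochastic approximation process of this kind is Oja's subspace rule, used below with one data batch per step. This is a swapped-in standard technique for illustration, not necessarily the exact process of the study: it assumes the batches are already centered (the partial-PCA centering by a time-varying expectation is not reproduced), and the step sizes a_n = 1/(n + 10) are an arbitrary choice.

```python
import numpy as np


def online_principal_axes(batches, n_axes, step=lambda n: 1.0 / (n + 10), rng=None):
    """Oja-type stochastic approximation of the first principal axes.

    batches: iterable of (b, p) arrays of already-centered data vectors.
    Returns a (p, n_axes) matrix whose orthonormal columns estimate the axes.
    """
    rng = np.random.default_rng() if rng is None else rng
    V = None
    for n, B in enumerate(batches):
        if V is None:  # random orthonormal starting point
            V, _ = np.linalg.qr(rng.standard_normal((B.shape[1], n_axes)))
        C = B.T @ B / len(B)       # batch estimate of the covariance matrix
        V = V + step(n) * (C @ V)  # stochastic power-iteration step
        V, _ = np.linalg.qr(V)     # re-orthonormalize (Gram-Schmidt via QR)
    return V
```

Because the columns are re-orthonormalized in order, the k-th column tracks the k-th principal axis when the corresponding eigenvalues are distinct.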
Moreover, several incremental procedures for linear and logistic regression on a data stream were defined, tested and compared on existing batch data files and on simulated data streams.
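One standard incremental procedure for linear regression is recursive least squares, sketched below as an illustration (it is not claimed to be one of the procedures defined in the study). Each new observation updates the coefficients and an estimate of (XᵀX)⁻¹ through the Sherman-Morrison formula, so the data never need to be stored. The large initial value delta is an assumption; it acts as a very weak ridge prior.

```python
import numpy as np


class RecursiveLeastSquares:
    """Incremental linear regression: the least-squares fit is updated one observation at a time."""

    def __init__(self, p, delta=1e6):
        self.theta = np.zeros(p)
        self.P = delta * np.eye(p)  # approximates (X^T X)^{-1}; large delta = weak prior

    def update(self, x, y):
        x = np.asarray(x, dtype=float)
        Px = self.P @ x
        gain = Px / (1.0 + x @ Px)
        self.theta = self.theta + gain * (y - x @ self.theta)  # correct by the prediction error
        self.P = self.P - np.outer(gain, Px)                   # Sherman-Morrison update
        return self.theta
```

After processing a whole stream, the coefficients coincide, up to the negligible prior, with the batch least-squares solution on the same data.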
Participant: J-M. Monnez
External collaborators: E. Albuisson, B. Pitt, P. Rossignol, F. Zannad (CHU, Nancy)
The purpose of this study was to assess the prognostic value of the estimation of plasma volume or of its variation beyond clinical examination in a post-hoc analysis of EPHESUS (Eplerenone Post-Acute Myocardial Infarction Heart Failure Efficacy and Survival Study).
Assessing congestion after discharge is challenging but of paramount importance to optimize patient management and to prevent hospital readmissions.
The present analysis was performed in a subset of 4,957 patients with
available data (within a full dataset of 6,632 patients). The study endpoint
was cardiovascular death or hospitalization for heart failure (HF) between
months 1 and 3 after post-acute myocardial infarction HF. Estimated plasma
volume variation (
An instantaneous estimation of plasma volume at month 1 was defined and also tested.
Multivariate analysis was performed with stepwise logistic regression.
In HF complicating myocardial infarction, congestion as assessed by the Strauss formula and an instantaneous derived measurement of plasma volume provided a predictive value of early cardiovascular events beyond routine clinical assessment. Prospective trials to assess congestion management guided by this simple tool to monitor plasma volume are warranted .
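The Strauss formula, in its usual form, estimates the percentage change in plasma volume between a baseline visit A and a later visit B from hemoglobin (Hb) and hematocrit (Hct) alone: ΔPV(%) = 100 × (Hb_A / Hb_B) × (1 − Hct_B) / (1 − Hct_A) − 100. It follows from assuming a constant circulating hemoglobin mass (so blood volume is inversely proportional to Hb) and plasma volume = blood volume × (1 − Hct). A minimal sketch, with hematocrit given as a fraction:

```python
def strauss_plasma_volume_change(hb_baseline, hct_baseline, hb_followup, hct_followup):
    """Percentage change in plasma volume between two visits (Strauss formula).

    Hct is a fraction (e.g. 0.42); Hb may be in any unit, as long as both visits
    use the same one.  A positive value indicates plasma volume expansion.
    """
    return 100.0 * (hb_baseline / hb_followup) * (1.0 - hct_followup) / (1.0 - hct_baseline) - 100.0
```

For instance, a fall of hemoglobin from 14 to 12.6 g/dL with hematocrit falling from 0.42 to 0.378 corresponds to an estimated plasma volume increase of about 19%.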
Participant: J-M. Monnez
External collaborator: E. Albuisson (CHU Nancy)
The purpose of this study was to define an event (death or hospitalization) score for heart failure patients, based on biological, clinical and medical history variables. Some of these variables were transformed or winsorized. Two statistical learning methods, logistic regression and linear discriminant analysis, were applied with stepwise selection of variables, and classifiers were aggregated by bagging. Finally, a score taking values between 0 and 100 was established.
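The bagging step can be illustrated with logistic regression only. In this sketch the models are fitted by Newton-Raphson on bootstrap samples, their predicted event probabilities are averaged, and the average is rescaled to 0-100. Linear discriminant analysis, stepwise selection and variable transformations are omitted, and mapping the score as 100 × averaged probability is an assumption, since the exact score construction of the study is not described here.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson (IRLS); X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        H = X.T @ (X * W[:, None]) + 1e-8 * np.eye(X.shape[1])  # tiny ridge for stability
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    return beta


def bagged_score(X_train, y_train, X_new, n_bags=50, rng=None):
    """Average the event probabilities of logistic models fitted on bootstrap samples
    (bagging) and rescale the average to a 0-100 score (hypothetical mapping)."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(y_train)
    probs = np.zeros(len(X_new))
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)  # bootstrap sample of patients
        beta = fit_logistic(X_train[idx], y_train[idx])
        probs += 1.0 / (1.0 + np.exp(-X_new @ beta))
    return 100.0 * probs / n_bags
```

Averaging over bootstrap fits mainly reduces the variance of the predicted risk, which matters when the selected variables change from one sample to another.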
Participant: J-M. Monnez
External collaborator: O. Collignon (LIH, Luxembourg)
In supervised learning, the number of values of a response variable to predict can be high. Clustering them into a few clusters can then be useful to perform relevant supervised classification analyses. On the other hand, selecting relevant covariates is a crucial step to build robust and efficient prediction models, especially when too many covariates are available with regard to the overall sample size. As a first attempt to solve these problems, we had devised in a previous study an algorithm that simultaneously clusters the levels of a categorical response variable into a limited number of clusters and performs forward selection of the best covariates by alternate minimization of Wilks' Lambda. In this paper, we first extend the former version of the algorithm to a more general framework where Wilks' Lambda can be replaced by any model selection criterion. We also turned forward selection into stepwise selection, so that covariates can be removed during the procedure if necessary. Finally, an application of our algorithm to real datasets from peanut allergy studies confirmed previously published results and suggested new findings.
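The covariate-selection half of such an algorithm can be sketched with Wilks' Lambda, Λ = det(W)/det(T), where W is the within-group and T the total sums-of-squares-and-cross-products matrix of the selected covariates; smaller values mean better separation of the groups. The sketch only performs forward selection for a fixed grouping; the alternate step that re-clusters the response levels, and the stepwise removal, are not reproduced.

```python
import numpy as np


def wilks_lambda(X, groups):
    """Wilks' Lambda det(W) / det(T) of the covariates X (n, q) for the grouping `groups`."""
    Xc = X - X.mean(axis=0)
    T = Xc.T @ Xc                      # total SSCP matrix
    W = np.zeros_like(T)
    for g in np.unique(groups):
        Dg = X[groups == g] - X[groups == g].mean(axis=0)
        W += Dg.T @ Dg                 # within-group SSCP matrix
    return np.linalg.det(W) / np.linalg.det(T)


def forward_select(X, groups, n_select):
    """Greedy forward selection of covariates minimizing Wilks' Lambda."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(n_select):
        best = min(remaining, key=lambda j: wilks_lambda(X[:, selected + [j]], groups))
        selected.append(best)
        remaining.remove(best)
    return selected
```

In the generalized version described above, `wilks_lambda` would simply be replaced by another model selection criterion.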
Participants: T. Bastogne, L. Batista
External Collaborator: El-Hadi Djermoune (Université de Lorraine, CRAN)
With the advent of high-throughput technologies, life scientists are starting to grapple with massive data sets, encountering challenges in handling, processing and moving information that were once the domain of astronomers and high-energy physicists . We particularly focus on the statistical analysis of large batches of time series, with applications in preclinical cancer research. Our original contribution consists in developing new dynamical system identification methods suited to the processing of this type of data. System identification is a data-driven modeling approach increasingly used in biology and biomedicine. In this application context, each assay is always repeated to estimate the response variability. Extending the modeling conclusions to the whole population requires accounting for the inter-individual variability within the modeling procedure. One solution consists in using mixed-effects models, but up to now no similar approach exists in the field of dynamical system identification. Our objective is therefore to develop a new identification method integrating mixed effects within an ARX (AutoRegressive model with eXternal inputs) model structure. The parameter estimation step relies on the EM (Expectation-Maximization) algorithm. First simulation results show the relevance of this solution compared with a classical system identification procedure repeated for each subject. This work and derived results were accepted as conference papers .
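For reference, the classical procedure that the mixed-effects method is compared with can be sketched as follows: an ARX model y_t = Σ a_i y_{t−i} + Σ b_j u_{t−j} + e_t is fitted by least squares separately for each subject, and the population mean and covariance of the parameters are then computed from the individual estimates (a two-stage approach). The EM-based mixed-effects estimation itself is not reproduced, and the orders na = nb = 1 are an arbitrary choice.

```python
import numpy as np


def arx_regressors(y, u, na, nb):
    """Regression form of y_t = sum_i a_i y_{t-i} + sum_j b_j u_{t-j} + e_t."""
    start = max(na, nb)
    rows = [np.concatenate([y[t - na:t][::-1], u[t - nb:t][::-1]])
            for t in range(start, len(y))]
    return np.array(rows), y[start:]


def fit_arx(y, u, na=1, nb=1):
    """Least-squares ARX estimate for one subject: [a_1..a_na, b_1..b_nb]."""
    Phi, target = arx_regressors(y, u, na, nb)
    theta, *_ = np.linalg.lstsq(Phi, target, rcond=None)
    return theta


def two_stage_population(subjects, na=1, nb=1):
    """Fit each subject separately, then summarize the inter-individual variability."""
    thetas = np.array([fit_arx(y, u, na, nb) for y, u in subjects])
    return thetas.mean(axis=0), np.cov(thetas, rowvar=False)
```

Unlike this two-stage approach, a mixed-effects formulation estimates the population distribution of the parameters jointly from all subjects, which matters when individual series are short or noisy.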
T. Bastogne, full Professor at Université de Lorraine and BIGS member is head of the startup Cybernano that provides computational solutions for biopharma and nano-medicine. http://
PhotoBrain (2015-17), AGuIX® theranostic nanoparticles for vascular-targeted interstitial photodynamic therapy of brain tumors, Funding organism: EuroNanoMed II, Leader: M. Barberi-Heyob (CRAN), Thierry Bastogne
(2014-16), A library of Near-InfraRed absorbing photosensitizers: tailoring and assessing photophysical and synergetic photodynamic properties, Funding organism: PHC Bosphore - Campus France, Leader: M. Barberi-Heyob (CRAN), Thierry Bastogne
GDR 3475 Analyse Multifractale, Funding organism: CNRS, Leader: S. Jaffard (Université Paris-Est), Céline Lacaux
GDR 3477 Géométrie stochastique, Funding organism: CNRS, Leader: P. Calka (Université Rouen), Céline Lacaux
FHU CARTAGE (Fédération Hospitalo Universitaire Cardial and ARTerial AGEing ; leader : Pr Athanase BENETOS), Jean-Marie Monnez
RHU Fight HF (Fighting Heart Failure ; leader : Pr Patrick ROSSIGNOL), located at the University Hospital of Nancy, Jean-Marie Monnez
Project "Handle your heart", team responsible for the creation of a drug prescription support software for the treatment of heart failure, head: Jean-Marie Monnez
S. Roelly (University of Potsdam) visited P. Vallois in September 2015.
A. Gégout-Petit and P. Vallois supervised an internship of a master IMOI student at the startup SD-Innovation, http://
P. Vallois visited S. Roelly in Potsdam (Germany), March 2015
P. Vallois visited P. Salminen in Turku (Finland), March 2015
P. Vallois visited the Finance department in New York, April 2015
R. Azaïs: Organization of the weekly seminar of the Probability and Statistics group at the Institut Élie Cartan de Lorraine
P. Vallois organized a meeting of the "Fédération Charles Hermite" with insurance companies in Luxembourg, October 2015
A. Gégout-Petit co-organised the day "Méga données pour la santé" for the Fédération Charles Hermite, Nancy, March 2015.
A. Gégout-Petit was chair of 2015 "Forum des jeunes mathématicien-ne-s", Lille November 2015.
P. Vallois is in the editorial board of "Risk and Decision Analysis".
All BIGS members are regular reviewers for journals and conferences in probability, statistics and machine learning, such as: Bernoulli, Scandinavian Journal of Statistics, Stochastics, Journal of Statistical Planning and Inference, Journal of Theoretical Biology, IEEE Trans. Biomedical Eng., Theoretical Biology and Medical Modelling, LIDA, Annals of Applied Probability, Annals of Operations Research and Journal of Machine Learning Research, ICML and IJCAI conferences, ...
P. Vallois: New York, Finance Department, April 2015
P. Vallois: Journées de probabilités, Toulouse, May 2015
P. Vallois: Conference in memory of Marc Yor, June 2015
C. Lacaux: Invited as plenary speaker at Adventure in Self-Similarity, Cornell University, USA, June 2015
C. Lacaux: Invited speaker at 9th International conference on Extreme Value Analysis, Session Max-stable processes and applications, Ann Arbor, Michigan, USA
C. Lacaux: Invited as plenary speaker at 4th Stochastic Geometry Days, Poitiers, August 2015
R. Azaïs: Choix optimal parmi une classe d'estimateurs non paramétriques du taux de saut d'un processus markovien déterministe par morceaux. Rencontres de l'ANR Piece à Saint-Martin-de-Londres (May 2015)
R. Azaïs: Recursive kernel estimates for piecewise-deterministic Markov processes. Rencontres de l'ANR Piece à Tours (November 2015)
A. Muller-Gueudin: Certainty bands for the conditional cumulative distribution function and applications. Séminaire Statistique, Probabilités, Optimisation et Contrôle, Institut de Mathématiques de Bourgogne (June 2015)
A. Gégout-Petit was the president of the French statistical society (SFdS) until June 2015.
C. Lacaux: Elected member of the Conseil National des Universités (section 26) (2011–2015)
R. Azaïs: Member of the Technological Development Committee (Commission de Développement Technologique (CDT) in French), Inria Nancy – Grand Est
C. Lacaux: Member of the board of SMAI-MAS group
A. Gégout-Petit: elected member of the laboratory of mathematics "Institut Elie Cartan de Lorraine"
P. Vallois: head of the "Fédération Charles Hermite", a consortium of three laboratories of Université de Lorraine in mathematics (Institut Élie Cartan), computer science (Loria) and automatic control (CRAN)
With the exception of R. Azaïs and B. Scherrer, BIGS members teach at "Université de Lorraine", at least 200 hours each year. Many of them have pedagogical responsibilities.
Licence: A. Gueudin, Statistics, 45h, L3, first year of ENSAIA, Université de Lorraine, France
Licence: S. Ferrigno, Descriptive and inferential statistics, 60h, L2, second year of EEIGM, Université de Lorraine, France
Licence: S. Ferrigno, Statistical modeling, 60h, L2, second year of EEIGM, Université de Lorraine, France
Licence: S. Ferrigno, Mathematical and computational tools, 20h, L3, third year of EEIGM, Université de Lorraine, France
Licence: S. Ferrigno, Training projects, 33.5h, L1/L2/L3, first, second and third year of EEIGM, Université de Lorraine, France
Licence: A. Gégout-Petit, Permutation test, 20h, Université de Lorraine, France
Licence: P. Vallois, Exercises in probability, 30h, Université de Lorraine, France
Licence: C. Lacaux, Probability, 35h, Université de Lorraine, France
Licence: S. Wantz-Mézières, Probability and Statistics, financial mathematics, 164h, I.U.T, Université de Lorraine, France
Licence: S. Wantz-Mézières, Probability, 60h, first year in Telecom Nancy, Université de Lorraine, France
Licence: J-M. Monnez, Probability and Statistics, financial mathematics, 136h, I.U.T, Université de Lorraine, France
Licence: C. Lacaux, Probability, 35h, L3, first year of Mines Nancy, Université de Lorraine, France
Licence: S. Tindel, Introduction to probability and statistics, 30h, 2nd year for Bioengineering majors (LVE CMI), Université de Lorraine, France
Licence: T. Bastogne, Automatics, 70h
Master: S. Tindel, Stochastic calculus, 60h, M2, Université de Lorraine, France
Master: S. Tindel, Applied mathematics and probability, 60h, 1st year of Telecom Nancy engineering school, Université de Lorraine, France
Master: C. Lacaux, Stochastic Differential Equations, 31h, M1, third year of Mines Nancy, Université de Lorraine, France
Master: A. Gueudin, Probability and Statistics, 202 h, M1, second year of ENSEM and ENSAIA, Université de Lorraine, France
Master: S. Ferrigno, Experimental designs, 3h, M1, fourth year of EEIGM, Université de Lorraine, France
Master: S. Ferrigno, Data analyzing and mining, 63h, M2, third year of Ecole des Mines, Université de Lorraine, France
Master: S. Ferrigno, Modeling and forecasting, 43h, M1, second year of Ecole des Mines, Université de Lorraine, France
Master: S. Ferrigno, Training projects, 18h, M2/M3, second and third year of Ecole des Mines, Université de Lorraine, France
Master: A. Gégout-Petit, Statistics, modeling, 20h, future teacher, Université de Lorraine, France
Master: P. Vallois, Mathematical finance, 70h, Université de Lorraine, France
Master: A. Gégout-Petit, Statistics, modeling, 150h, master in applied mathematics, Université de Lorraine, France
Master: A. Gégout-Petit, Statistics, 20h, future engineer in informatics, Telecom Nancy, Université de Lorraine, France
Master: C. Lacaux, Stochastic Differential Equations, 31.5h, Université de Lorraine, France
Master: J-M. Monnez, Data analysis, statistical learning, Master 2 IMOI (Ingénierie Mathématique et Outils Informatiques), Université de Lorraine, France
Master: J-M. Monnez, Multivariate analysis, Master 2 IFM (Ingénierie de la Finance de Marché), Université de Lorraine, France
Master: R. Azaïs, 20h, Mines de Nancy (TD de Probabilités)
Master: T. Bastogne 120h
B.S: S. Tindel, Differential equations, 90h, 2nd year for various Engineering majors, Purdue University, USA
Doctoral and research: B. Scherrer gave a course on stochastic optimal control at the Machine Learning Summer School organized by Centre International de Mathématiques et d’Informatique de Toulouse
J-M. Monnez: Until June 2015 : Head of the Master 2 "Ingénierie Mathématique et Outils Informatiques (Mathematical Engineering and Computer Tools)", Université de Lorraine
A. Gégout-Petit: since June 2015 : Head of the Master 2 "Ingénierie Mathématique et Outils Informatiques (Mathematical Engineering and Computer Tools)", Université de Lorraine
A. Gégout-Petit created and is now in charge of cursus CMI in applied mathematics for Lorraine University
C. Lacaux was in charge of the cursus Ingénierie Mathématique of École Nationale Supérieure des Mines de Nancy until 09/2015
P. Vallois is head of the "Parcours Mathématiques Financières" of the master "Applied mathematics" of Université de Lorraine
P. Vallois is head of the convention between "Université de Lorraine" and "Université Hammam Sousse" about the organization of the master ISC (Ingénierie de Systèmes Complexes)
T. Bastogne is in charge of the Systèmes & TIC specialization of the master Ingénierie de Systèmes Complexes
T. Bastogne created and is now in charge of the professional master CIIBLE (Cybernétique, Instrumentation, Image en Biologie et medecinE), M2 level, with the Faculty of Medicine of Université de Lorraine
T. Bastogne created and is now in charge of the research master « Biosanté Numérique » with the engineering school "Telecom Nancy"
PhD : Shuxian Li, "Modélisation spatio-temporelle pour l’esca de la vigne à l’échelle de la parcelle", INRA- Université de Bordeaux , defence December, 15, 2015. Advisor : A. Gégout-Petit
PhD in progress : Kévin Duarte, 2013, Jean-Marie Monnez and Eliane Albuisson
PhD : Marwa HAMZA "Caractérisations des familles exponentielles naturelles cubiques : étude des lois Beta généralisées et de certaines lois de Kummer", Université de Lorraine, Université de Sfax, defense May, 15, 2015. Advisors : P. Vallois and A. Hassairi
PhD (2014- ), in progress, Lévy Batista, Grant CIFRE with Cybernano, "Identification de modèles dynamiques linéaires à effets mixtes. Applications aux dynamiques de populations cellulaires", Université de Lorraine, Advisor: T. Bastogne
PhD, in progress, Paul Rétif, "Modélisation, simulation et analyses numériques de l’interaction nanoparticules-rayons X. Applications à la radiothérapie augmentée", Université de Lorraine, CHR Metz-Thionville, Advisor: T. Bastogne, defense expected in March 2016
PhD, in progress, Yann Petot, "Modèle probabiliste d'aide à la décision multicritère pour les études médico-économiques", Université de Lorraine, Advisors : P. Vallois and T. Bastogne
Master: all BIGS members regularly supervise project and internship of master IMOI students
Engineering school: all BIGS members regularly supervise project of "Ecole des Mines ", ENSEM or EEIGM students
PhD: Marthe-Aline Jutand, Université de Bordeaux, "Etudes de phénomènes de transposition didactique de la statistique dans le champ universitaire et ses environnements", December, 15, 2015, Referee : A. Gégout-Petit
PhD: Pierre Colin, AgroParistech, Sanofi, "Méthodes bayésiennes et adaptatives pour la recherche de dose optimale : le développement clinique précoce de thérapies ciblées en oncologie", December, 2015, Referee : A. Gégout-Petit
PhD: Julien Riposo, Université Paris 6 "Computational and Mathematical Methods for Data Analysis in Biology and Finance", September, 2015, Referee: A. Gégout-Petit
PhD: Yingjun DENG , Université Technologique de Troyes "Degradation modeling based on a time-dependent Ornstein-Uhlenbeck Process and Prognosis of system failures", February, 2015, Referee : A. Gégout-Petit
PhD: Jérémy Rohmer, Université de Lorraine, "Importance ranking of parameter uncertainties in geo-hazard assessments ", November, 15, 2015, President: A. Gégout-Petit
B. Scherrer was a reviewer
HdR: Nicolas Champagnat Université de Lorraine, February, 15, 2015, President: P. Vallois
PhD: Aurélie Beal, "Description et sélection de données en grandes dimensions", Université Aix-Marseille, February 2015. Referee: T. Bastogne
Anne Gégout-Petit was a member of four selection committees: CR2 Inria Bordeaux, PR Statistics Université Paris 6, MCF Statistics Toulouse, MCF Psychology and Statistics Grenoble.
A. Gégout-Petit: manager of the project "ZOOM des métiers des mathématiques et de l'informatique", a brochure and videos presenting 22 professionals in mathematics and computer science, in order to promote these two fields among young people.
http://
S. Ferrigno: Advisor of a group of students, "La main à la Pâte" project, elementary schools, Nancy, January-June 2015
S. Ferrigno: Advisor of a group of students, "La main à la Pâte" project, Institut médico-éducatif (IME), Commercy, September-December 2015
R. Azaïs: Animation of a workshop "MATh.en.JEANS", Collège George Chepfer, Villers-lès-Nancy
P. Vallois: popularizing talk at MJC de Toul