BIGS is a joint team of Inria, CNRS and Université de Lorraine, via the Institut Élie Cartan (UMR 7502 CNRS-Inria-UL). Our research mainly focuses on stochastic modeling and statistics, with a methodological purpose but also with the aim of better understanding biological systems. BIGS is composed of applied mathematicians whose research interests mainly concern probability and statistics. More precisely, our attention is directed toward (1) stochastic modeling, (2) estimation and control for stochastic processes, (3) algorithms and estimation for graph data and (4) regression and machine learning. The main objective of BIGS is to exploit these skills in applied mathematics to provide a better understanding of some issues arising in life sciences, with a special focus on (1) tumor growth, (2) photodynamic therapy, (3) genomic data and the study of micro-organism populations, (4) epidemiology and e-health and (5) dynamics of telomeres. Each of these items is detailed below.

We present here the main lines of our research, which belongs to the domains of probability and statistics. For clarity, we have chosen to structure it in four items. Although this choice is not arbitrary, the boundaries between these items are sometimes fuzzy, because each of them deals with modeling and inference and they are all interconnected.

Our aim is to propose relevant stochastic frameworks for the modeling and understanding of biological systems. Stochastic processes are particularly suitable for this purpose. Among them, Markov chains provide a first framework for modeling populations of cells , . Piecewise deterministic processes are non-diffusion processes that are also frequently used in the biological context , , , . Among Markov models, we have also developed strong expertise in processes derived from Brownian motion and stochastic differential equations , , . For instance, knowledge about Brownian or random walk excursions , helps to analyse genetic sequences and to develop inference about them. However, nature provides us with many examples of systems whose observed signal has a given Hölder regularity, which does not correspond to the one we might expect from a system driven by ordinary Brownian motion. This situation is commonly handled by noisy equations driven by Gaussian processes such as fractional Brownian motion or (for higher-dimensional parameters) fractional fields. The basic aspects of these differential equations are now well understood, mainly thanks to the so-called *rough paths* tools , but also by invoking the Russo-Vallois integration techniques . The specific issue of Volterra equations driven by fBm, which is central to the problem of subdiffusion within proteins, is addressed in . Many generalizations (Gaussian or not) of this model have recently been proposed; see for instance for some Gaussian locally self-similar fields, for some non-Gaussian models, for anisotropic models.
Our team has thus contributed , , , , and still contributes , , , , to this theoretical study: Hölder continuity, fractal dimensions, existence and uniqueness results for differential equations, and study of the laws, to quote a few examples. On the other hand, because longitudinal data are observed for each subject in medicine, we have to account for the random effect due to the subject and to choose adapted models such as mixed-effects models , , .
In the context of health care and cost-effectiveness analysis, we are also interested in models for the aggregation of different criteria. For this purpose, we develop research on fuzzy binary measures and the Choquet integral , .

When one wants to confront theoretical probabilistic models with real data, statistical tools and control of the dynamics are obviously crucial. As a matter of course, we develop inference for the stochastic processes that we use for modeling; this is at the heart of some of our projects. Control of stochastic processes is also a way to optimise the administration (dose, frequency) of a therapy.

The monograph is a good reference on the basic estimation techniques for diffusion processes. Some attention has been paid recently to the estimation of the coefficients of fractional or multifractional Brownian motion from a set of observations. Let us quote for instance the nice surveys , . On the other hand, the inference problem for diffusions driven by a fractional Brownian motion is still in its infancy. A good reference on the question is , dealing with some very particular families of equations, which do not cover the cases of interest for us. We have also recently proposed least-squares estimators for this kind of process , . Inference for PDMPs (piecewise deterministic Markov processes) is also a recent subject that we want to develop.
Our team has good expertise in the inference of the jump rate and the transition kernel of PDMPs , , , . However, there are many directions in which to go further. For instance, previous works assumed a complete observation of the jumps and of the mode, which is unrealistic in practice. We want to tackle the problem of inference for "hidden PDMPs". It would also be interesting to investigate estimation followed by optimal control for ergodic PDMPs.
Regarding inference for pharmacokinetic models, several papers have reported the application of system identification techniques. But two issues were ignored in these previous works: the presence of timing noise and identification from longitudinal data. In , we have proposed a bounded-error estimation algorithm based on interval analysis to solve the parameter estimation problem while taking into account the uncertainty on observation time instants. Statistical inference from longitudinal data based on mixed-effects models can be performed by the *Monolix* software (http://

We consider the control of stochastic processes within the framework of Markov Decision Processes and their generalization known as multi-player stochastic games , with a particular focus on infinite-horizon problems. In this context, we are interested in the complexity analysis of standard algorithms, as well as in the design and analysis of approximate numerical schemes for large problems in the spirit of . Regarding complexity, a central research topic is the analysis of the Policy Iteration algorithm, which has progressed significantly in recent years , , , , , but is still not fully understood. For large problems, we have long experience in the sensitivity analysis of approximate dynamic programming algorithms for Markov Decision Processes , , , , , and we are currently investigating whether and how similar ideas may be adapted to multi-player stochastic games.
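To make the object of this complexity analysis concrete, here is a minimal sketch of Howard's Policy Iteration for a finite discounted Markov Decision Process: it alternates exact policy evaluation (one linear solve) and greedy policy improvement until no state can be strictly improved. The array layout (`P[a, s, t]`, `R[a, s]`) and the function name are assumptions made for illustration; the code is not taken from the team's work.

```python
import numpy as np

def policy_iteration(P, R, gamma, max_iter=1000):
    """Policy Iteration for a finite discounted MDP (sketch).

    P[a, s, t]: probability of moving from s to t under action a.
    R[a, s]:    expected immediate reward of action a in state s.
    Returns a deterministic policy (one action per state) and its value.
    """
    n_states = P.shape[1]
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)
    for _ in range(max_iter):
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = P[policy, states, :]
        r_pi = R[policy, states]
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: act greedily with respect to v.
        q = R + gamma * (P @ v)              # shape (A, S)
        greedy = np.argmax(q, axis=0)
        # Stop when no state can be strictly improved (avoids cycling on ties).
        if np.all(q[greedy, states] <= q[policy, states] + 1e-12):
            return policy, v
        policy = greedy
    return policy, v
```

On a two-state example where staying in state 1 pays 1 per step and moving from state 0 to state 1 is free, the sketch returns the policy [1, 0] (move, then stay) with values [9, 10] for gamma = 0.9.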

A graph data structure consists of a set of nodes, together with a set of (either unordered or ordered) pairs of these nodes called edges. This type of data is frequently used in various application domains (in particular in biology) because it provides a mathematical representation of many concepts, such as physical or biological structures and networks of relationships in a population. The group has recently focused some attention on modeling and inference for graph data.

Suppose that we know the value of

Among graphs, trees play a special role because they offer a good model for many biological concepts, from RNA to phylogenetic trees through plant structures. Our research deals with several aspects of tree data. In particular, we work on statistical inference for this type of data under a given stochastic model (critical Galton-Watson trees, for example): in this context, the structure of the tree depends on an integer-valued distribution that we estimate from the observation of either a single tree or a forest. We also work on the lossy compression of trees via linear directed acyclic graphs. These methods allow us to compute distances between tree data faster than from the original structures, and with high accuracy. These results are valuable in the context of very large trees arising, for instance, in plant biology.
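When every tree of a forest is observed entirely, the classical maximum likelihood estimator of a Galton-Watson offspring distribution is simply the proportion of nodes having k children. The sketch below illustrates that baseline only, with trees encoded as nested Python lists (an assumed representation in which each node is the list of its children); the team's estimators, for instance for trees conditioned on their size, may differ.

```python
from collections import Counter

def offspring_counts(tree, counts):
    """Add to `counts` the number of nodes of `tree` having k children.
    A tree is a nested list: each node is the list of its children."""
    counts[len(tree)] += 1
    for child in tree:
        offspring_counts(child, counts)

def estimate_offspring_distribution(forest):
    """Empirical offspring law over a fully observed forest:
    p_k = (number of nodes with k children) / (number of nodes)."""
    counts = Counter()
    for tree in forest:
        offspring_counts(tree, counts)
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}
```

For example, a forest made of a root with two leaf children and of a single isolated leaf gives p_0 = 0.75 and p_2 = 0.25. The recursion is kept for readability; very deep trees would require an explicit stack.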

Regression models and machine learning aim at inferring statistical links between a variable of interest and covariates. They also aim at clustering subjects or variables into homogeneous sets. In biological studies, it is always important to develop adapted learning methods, both for "standard" data and for very massive or online data.

A first approach to the regression of a quantitative variable is the nonparametric estimation of its conditional cumulative distribution function. Many methods are available to estimate conditional quantiles and test dependencies , . Among them, we have developed nonparametric estimation through local polynomial analysis , and we want to study the properties of this estimator in order to derive measures of risk such as confidence bands and tests. We also study many other regression models, such as survival analysis and spatio-temporal models with covariates. For multiple regression models, we want to test the validity of their assumptions by means of simulation methods. Tests of this kind are called omnibus tests. An omnibus test is an overall test that examines several assumptions together; the best-known omnibus test is the one for testing Gaussianity (which examines both skewness and kurtosis ).
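The Gaussianity omnibus test mentioned above can be illustrated with the Jarque-Bera statistic, which combines the sample skewness S and kurtosis K as JB = n/6 (S^2 + (K - 3)^2 / 4) and is asymptotically chi-square with 2 degrees of freedom under normality. This is a minimal pure-Python sketch given for illustration (the function name is hypothetical), not the team's simulation-based procedure for regression assumptions.

```python
import math

def jarque_bera(sample):
    """Jarque-Bera omnibus test of normality, combining sample skewness
    and sample kurtosis. Returns (statistic, asymptotic p-value)."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    stat = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Under normality, stat is asymptotically chi-square with 2 degrees of
    # freedom, whose survival function is exp(-x / 2).
    return stat, math.exp(-stat / 2.0)
```

For the sample [1, 2, 3, 4, 5], the skewness is 0 and the kurtosis is 1.7, so the statistic is 5/6 x 0.4225, about 0.352. The asymptotic p-value is only reliable for large samples; small samples are usually calibrated by simulation.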

As far as the analysis of high-dimensional data is concerned, our view on the topic relies on the so-called *French data analysis school*, and more specifically on Factorial Analysis tools. In this context, stochastic approximation is an essential tool (see Lebart's paper ), which allows one to approximate eigenvectors in a stepwise manner. A systematic study of Principal Component and Factorial Analysis was then led by Monnez in the series of papers , , , in which many aspects of the convergence of online processes are analyzed using stochastic approximation techniques. BIGS aims at performing accurate classification or clustering by taking advantage of the possibility of updating the information "online" using stochastic approximation algorithms . We focus on several incremental procedures for regression and data analysis, such as linear and logistic regression and PCA.
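As an illustration of approximating an eigenvector in a stepwise manner, here is Oja's rule, a classical stochastic approximation scheme for the leading principal axis of centered data, with Robbins-Monro steps a / (b + n). It is a minimal sketch under these assumptions (centered data, hypothetical function and parameter names), not one of the processes studied by Monnez.

```python
import numpy as np

def oja_first_component(stream, dim, a=0.5, b=50.0, seed=0):
    """Stochastic approximation of the leading eigenvector of E[x x^T]
    (Oja's rule). Each centered observation x updates the current estimate
    once, so the data never need to be stored."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=dim)
    w /= np.linalg.norm(w)
    for n, x in enumerate(stream, start=1):
        step = a / (b + n)               # Robbins-Monro step sizes
        y = w @ x
        w = w + step * y * (x - y * w)   # Oja update
        w /= np.linalg.norm(w)           # renormalise for numerical stability
    return w
```

The returned unit vector is defined up to its sign, so it should be compared with the true principal axis through the absolute value of their inner product.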
We also focus on the biological context of high-throughput bioassays, in which several hundreds or thousands of biological signals are measured for subsequent analysis. Extending the modeling conclusions from a sample of wells to the whole population requires accounting for the inter-individual variability within the modeling procedure. One solution consists in using mixed-effects models, but up to now no similar approach exists in the field of dynamical system identification. As a consequence, we aim at developing a new solution based on an ARX (Auto Regressive model with eXternal inputs) model structure, using the EM (Expectation-Maximisation) algorithm to estimate the model parameters.

Cancer is the result of interdependent multi-scale phenomena, and this is mainly why the understanding of its spread is still an unsolved problem. In integrative biology, mathematical models play a central role; they help biologists and clinicians answer complex questions through numerical simulations and statistical analyses. The main issue here is to better understand and describe the role of cell damage heterogeneity and of the associated mutant cell phenotypes in the therapeutic responses of cancer cell populations subjected to radiotherapy sessions during *in vitro* experiments. Cell heterogeneity is often described as randomness in mathematical modeling, and different representations, such as Markov chains, branching processes and even stochastic differential equations, have recently been used.

Since 1988, some control system scientists and biologists at the CRAN

Next-generation genomic technologies allow clinicians and biomedical researchers to drastically increase the amount of genomic data collected on large cohorts of patients and populations. We want to contribute to a better understanding of the correlations between genes through their expression data, of the structure of RNA, and of the genetic bases of drug response and disease, and to detect significant sequences characterizing a gene. For instance, the biopharmaceutical company Transgene recently contacted us to analyse their genomic and proteomic data, in particular to find markers of the success of the therapies that they develop against cancer.

Network inference also has applications to the analysis of micro-organism populations, which we apply to micro-organisms inside and around the truffle through a collaboration with INRA Nancy. We also want to study other specific complex microbial communities, such as those found at tree roots, in order to characterize the phenotype of the tree. There are also applications in human health (for instance, the identification of networks of bacteria inside the colon).

Through J.-M. Monnez and his collaborator Pr E. Albuisson, BIGS is a stakeholder in projects with the University Hospital of Nancy, namely FHU CARTAGE (Fédération Hospitalo Universitaire Cardial and ARTerial AGEing; leader: Pr Athanase BENETOS), RHU Fight HF (Fighting Heart Failure; leader: Pr Patrick ROSSIGNOL), and "Handle your heart", a team responsible for the creation of drug prescription support software for the treatment of heart failure. All these projects are in the context of personalized medicine and deal with biomarker research, the prognostic value of quantitative variables and events, and the scoring of heart failure. Other collaborations with clinicians concern foetopathology and, again, cancer.

A telomere is a region of repetitive, non-coding nucleotide sequences at each end of a chromosome. Telomeres are disposable buffers at the ends of chromosomes which are truncated during cell division, so that, over time, the telomere ends become shorter with each cell division. In this way, they are markers of aging. Mathematical modeling of telomere dynamics is recent , , , . Through a collaboration with Pr A. Benetos, geriatrician at CHU Nancy, and some members of the Inria team TOSCA, we want to work in three connected directions: (1) propose a dynamical model for the lengths of telomeres and study its mathematical properties (long-term behavior of the distribution of lengths, quasi-stationarity, etc.); (2) use these properties to develop new statistical methods for estimating the various parameters; and (3) find and use a suitable methodology for the analysis of the available data (Pr Benetos), for instance to study the length distribution for a subject and its evolution.

The BIGS team organised a two-day workshop, "Rencontres des équipes Inria travaillant sur le cancer" (meeting of the Inria teams working on cancer), which took place in Paris in March. Ten Inria teams were present. The program is available at https://

Keywords: statistical analysis, ordered trees

Scientific Description

The Matlab toolbox AGH provides methods for the statistical analysis of ordered trees from their Harris paths in a user-friendly environment. More precisely, it allows users to easily compute estimators of the relative scale of trees which share the same shape. These estimators were introduced for Galton-Watson trees conditioned on their number of nodes but may be computed for any ordered tree. The theoretical study of these estimators is presented in the associated paper, which should be consulted in parallel.
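For readers unfamiliar with Harris paths, the following Python sketch (not part of the Matlab toolbox) computes the sequence of heights visited by the depth-first contour exploration of an ordered tree; the Harris path is the piecewise-linear interpolation of this sequence. The nested-list encoding and the function name are assumptions, and normalisation conventions (padding, time or height rescaling) may differ from those used in AGH.

```python
def harris_path(tree):
    """Heights visited by the contour (depth-first) exploration of an ordered
    tree encoded as nested lists (each node is the ordered list of its
    children). A tree with n nodes yields 2n - 1 heights, starting and ending
    at 0."""
    heights = [0]

    def explore(node, depth):
        for child in node:
            heights.append(depth + 1)   # go down to the child
            explore(child, depth + 1)
            heights.append(depth)       # come back up to the parent

    explore(tree, 0)
    return heights
```

For a root whose first child has one child and whose second child is a leaf, the contour visits the heights 0, 1, 2, 1, 0, 1, 0.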

Functional Description

The Matlab toolbox AGH provides methods for statistical analysis of ordered trees from their Harris paths in a user-friendly environment.

Participants: Romain Azaïs, Alexandre Genadot, Benoît Henry

Contact: romain.azais@inria.fr

Participant: A. Gégout-Petit

External collaborators: Y. Cao, S. Li, L. Guerin-Dubrana (Inra Bordeaux)

In the framework of a collaboration with INRA Bordeaux on the esca disease of grapevines, Anne Gégout-Petit and Shuxian Li developed different spatial and spatio-temporal models for different purposes: (1) study the distribution and the dynamics of esca-affected vines in order to address the aggregation and the potential spread of the disease; (2) propose a spatio-temporal model in order to capture the dynamics of cases and measure the effects of environmental covariates. For purpose (1), we propose different tests based on join count statistics; a paper has been accepted for publication . We also developed a two-step centered autologistic model for the study of the dynamics of propagation. This work was presented as an invited paper in and is in preparation for publication.
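The principle of a join count test can be sketched as follows for a rectangular vineyard grid coded 1 (diseased) or 0 (healthy) with rook adjacency: count the pairs of adjacent diseased vines (BB joins) and compare this count with its distribution under random relabelling of the vines. The grid representation, the adjacency choice, the permutation calibration and the function names are assumptions for illustration; the tests of the accepted paper may use other neighbourhoods or asymptotic approximations.

```python
import random

def bb_join_count(grid):
    """Number of rook-adjacent pairs of cells that are both diseased (value 1)."""
    rows, cols = len(grid), len(grid[0])
    count = 0
    for i in range(rows):
        for j in range(cols):
            if grid[i][j] == 1:
                if i + 1 < rows and grid[i + 1][j] == 1:
                    count += 1
                if j + 1 < cols and grid[i][j + 1] == 1:
                    count += 1
    return count

def join_count_test(grid, n_perm=999, seed=0):
    """One-sided permutation test of spatial aggregation: is the observed BB
    join count larger than under random relabelling of the same cells?"""
    rng = random.Random(seed)
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    observed = bb_join_count(grid)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        permuted = [values[r * cols:(r + 1) * cols] for r in range(rows)]
        if bb_join_count(permuted) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Keeping the number of diseased vines fixed in each permutation conditions the test on the observed prevalence, so only the spatial arrangement is tested.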

Participant: S. Wantz-Mézières

External collaborators: J.-M. Moureaux, Y. Gaudeau, M. Ben Abdallah, M. Ouqamra (CRAN, Université de Lorraine), L. Taillandier, M. Blonski (CHU Nancy)

The collaboration with neurologists (CHU Nancy) and control engineers (CRAN) continued this year and led to the PhD defense of M. Ben Abdallah on December 12, 2016 , . We complemented the modeling approach with a data analysis approach. In the framework of a Master 2 project, supervised and unsupervised methods confirmed our results on our local database. This encourages us to continue our work by extending the database through a collaboration with Montpellier CHU. Our perspectives are to validate multi-factor models, including biological and anatomopathological factors, and to design a decision-aid tool for practitioners.

Participant: Céline Lacaux

External collaborator: Gennady Samorodnitsky

In extreme value theory, one of the major topics is the study of the limiting behavior of the partial maxima of a stationary sequence. When this sequence is i.i.d., the unique limiting process is well known and called the extremal process. For a long memory stable sequence, the limiting process is obtained as a simple power time change of the extremal process. Céline Lacaux and Gennady Samorodnitsky have proved that this limiting process can also be interpreted as a restriction of a self-affine random sup measure. In addition, they have established that this random measure arises as a limit of the partial maxima of the same long memory stable sequence, but in a different space. Their results open the way to new self-similar processes with stationary max-increments.

Participant: Céline Lacaux

External collaborator: Hermine Biermé

Operator scaling Gaussian random fields, as anisotropic generalizations of self-similar fields, attract increasing interest in the literature. Up to now, such models were only defined through stochastic integrals, without explicit knowledge of their covariance functions. As a consequence, one of the drawbacks is that no exact simulation method had been proposed. To fill this gap, Hermine Biermé and Céline Lacaux have recently exhibited explicit covariance functions, as anisotropic generalizations of those of fractional Brownian fields. This allows them to propose a fast and exact method to synthesize operator scaling Gaussian random fields with such a covariance function. Their algorithm is based on the well-known circulant embedding matrix method. This is a first step toward popularizing operator scaling Gaussian random fields in anisotropic spatial data modeling.
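The circulant embedding idea can be illustrated in the simplest, one-dimensional isotropic case: exact simulation of fractional Gaussian noise (the increments of fractional Brownian motion) from its explicit covariance. The covariance is embedded in a 2n x 2n circulant matrix whose eigenvalues are obtained by an FFT; when they are nonnegative, one complex FFT yields two independent exact samples. This sketch does not reproduce the anisotropic operator scaling fields of Biermé and Lacaux, and the function names are hypothetical.

```python
import numpy as np

def fgn_autocovariance(k, hurst):
    """Autocovariance of unit-variance fractional Gaussian noise at lags k."""
    k = np.abs(k).astype(float)
    return 0.5 * ((k + 1) ** (2 * hurst) - 2 * k ** (2 * hurst)
                  + np.abs(k - 1) ** (2 * hurst))

def circulant_embedding_fgn(n, hurst, rng):
    """Exact simulation of n steps of fGn by circulant embedding.
    Returns two independent samples (real and imaginary parts)."""
    gamma = fgn_autocovariance(np.arange(n + 1), hurst)
    # First row of the 2n x 2n circulant matrix: gamma(0..n), gamma(n-1..1).
    row = np.concatenate([gamma, gamma[-2:0:-1]])
    m = row.size
    eigenvalues = np.fft.fft(row).real
    if eigenvalues.min() < -1e-10:
        raise ValueError("circulant embedding is not nonnegative definite")
    eigenvalues = np.clip(eigenvalues, 0.0, None)
    z = rng.normal(size=m) + 1j * rng.normal(size=m)
    y = np.fft.fft(np.sqrt(eigenvalues / m) * z)
    return y.real[:n], y.imag[:n]
```

A fractional Brownian motion path on a regular grid is then obtained by cumulative summation (`np.cumsum`) of a returned noise sample, followed by the usual self-similar rescaling of the time step.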

Participant: P. Vallois

External collaborators: A. Lagnoux and S. Mercier (Toulouse)

In an article accepted in Bioinformatics, the goal is to illustrate different results on the local score distribution under an i.i.d. model, especially the one based on the pair (local score, length) and the one on the local score position. We use statistical tests to measure how well different approximations of the local score distribution fit simulated sequences. In particular, our simulations show that the popular Karlin & Altschul approximation is not accurate in a wide range of situations.
We add to the local score the length of the segment that realises it, and we study the induced changes with numerical simulations. We also study the specificity and sensitivity of the different methods. We introduce a new one-dimensional statistic which is a function of
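For a sequence of scores, the local score is the maximal sum over all contiguous segments, and it can be computed in one pass with the Lindley recursion U_k = max(0, U_{k-1} + X_k). The following sketch also records the start and length of one segment realising the maximum, that is, the pair (local score, length) discussed above. The function name and the tie-breaking rule (the first maximum is kept) are assumptions.

```python
def local_score(scores):
    """Local score: maximal sum of scores over all contiguous segments,
    computed with the Lindley recursion U_k = max(0, U_{k-1} + X_k).
    Also returns the start index and length of a segment realising it."""
    best, best_start, best_len = 0, 0, 0
    u, start = 0, 0
    for k, x in enumerate(scores):
        if u + x <= 0:
            u, start = 0, k + 1          # the excursion ends: restart after k
        else:
            u += x
            if u > best:
                best, best_start, best_len = u, start, k - start + 1
    return best, best_start, best_len
```

For the scores [1, -2, 3, -1, 2, -5, 1], the local score is 4, realised by the segment of length 3 starting at index 2.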

Participants: Romain Azaïs, Florian Bouguet, Anne Gégout-Petit, Florine Greciet, Aurélie Muller-Gueudin

External participants: Michel Benaïm (Université de Neuchâtel), Bertrand Cloez (Inra-SupAgro MISTEA), Alexandre Genadot (Inria CQFD, Université de Bordeaux)

A piecewise-deterministic Markov process is a stochastic process whose behavior is governed by an ordinary differential equation punctuated by random jumps occurring at random times. This class of stochastic processes offers a wide range of applications, especially in biology (for example, kinetic dietary exposure models and bacterial growth). BIGS members mainly work on statistical inference techniques for these stochastic processes , , which is an essential step to build relevant application models. We also investigate the probabilistic properties of these processes , as well as their application in reliability to crack growth in certain alloys, in the industrial context of the PhD thesis of Florine Greciet with SAFRAN Aircraft Engines .

In a preprint recently accepted for publication in the Electronic Journal of Statistics , we focus on the nonparametric estimation of the jump rate of piecewise-deterministic Markov processes observed over a long time interval under an ergodicity condition. More precisely, we introduce an uncountable class (indexed by the deterministic flow) of recursive kernel estimates of the jump rate and we establish their strong pointwise consistency as well as their asymptotic normality. In addition, we propose to choose, among this class, the estimator with the minimal variance; this variance is unfortunately unknown and thus remains to be estimated. We also discuss the choice of the bandwidth parameters by cross-validation methods. In , we state a new characterization of the jump rate when the transition kernel only charges a discrete subset of the state space. We deduce from this result a competitive nonparametric technique for estimating this feature of interest, and we state the uniform convergence in probability of the estimator. Both methodologies have been illustrated on numerical examples and real data.
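To illustrate jump rate estimation, here is a minimal sketch on a hypothetical PDMP: the state grows at unit speed and jumps back to 0 at rate lambda(x) = x, so inter-jump durations follow a Rayleigh law. The rate is estimated by a ratio of kernels, namely the smoothed number of jumps near x divided by the smoothed time spent near x. This is a classical ratio estimator, not the recursive class of estimators of the paper; the model, the names and the bandwidth are assumptions.

```python
import math
import random

def simulate_age_process(n_jumps, seed=0):
    """Hypothetical PDMP: dx/dt = 1, jumps back to 0 at rate lambda(x) = x.
    Returns the successive pre-jump states, i.e. the inter-jump durations."""
    rng = random.Random(seed)
    # Invert the cumulative rate x**2 / 2 = E, with E ~ Exp(1).
    return [math.sqrt(2.0 * rng.expovariate(1.0)) for _ in range(n_jumps)]

def gaussian_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def kernel_jump_rate(x, pre_jump_states, h):
    """Ratio-of-kernels estimate of the jump rate at x (Gaussian kernel).
    Between jumps the path runs linearly from 0 to tau, so the occupation
    integral of the kernel has the closed form Phi(x/h) - Phi((x - tau)/h)."""
    jumps = sum(math.exp(-0.5 * ((x - t) / h) ** 2) for t in pre_jump_states)
    jumps /= h * math.sqrt(2.0 * math.pi)
    occupation = sum(gaussian_cdf(x / h) - gaussian_cdf((x - t) / h)
                     for t in pre_jump_states)
    return jumps / occupation
```

With 20,000 simulated jumps and h = 0.1, the estimates at x = 0.5 and x = 1 should be close to the true rates 0.5 and 1; for a general flow, the occupation integral would have to be computed numerically along the path.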

Participant: Romain Azaïs

External participants: Bernard Delyon (Université Rennes 1), François Portier (Télécom ParisTech)

Suppose that a mobile sensor describes a Markovian trajectory in the ambient space. At each time, the sensor measures an attribute of interest, e.g., the temperature. Using only the location history of the sensor and the associated measurements, the aim of the paper is to estimate the average value of the attribute over the space. In contrast to classical probabilistic integration methods, e.g., Monte Carlo, the proposed approach does not require any knowledge of the distribution of the sensor trajectory. Probabilistic bounds on the convergence rates of the estimator are established. These rates are better than the traditional “root
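One natural estimator of this type, assumed here for illustration and not necessarily the one analysed in the paper, weights each measurement phi(X_i) by the inverse of a leave-one-out kernel estimate of the sampling density at X_i, so that no knowledge of the trajectory's law is needed. The sketch estimates the integral of phi over the real line (dividing by the volume of a bounded domain would give the spatial average). The Gaussian kernel, the bandwidth and the function name are assumptions.

```python
import numpy as np

def markov_design_integral(points, values, bandwidth):
    """Estimate the integral of phi from pairs (X_i, phi(X_i)) collected along
    a trajectory of unknown law: each value is weighted by the inverse of a
    leave-one-out Gaussian kernel estimate of the sampling density at X_i."""
    n = points.size
    diffs = (points[:, None] - points[None, :]) / bandwidth
    kernel = np.exp(-0.5 * diffs ** 2) / np.sqrt(2.0 * np.pi)
    np.fill_diagonal(kernel, 0.0)                      # leave one out
    density = kernel.sum(axis=1) / ((n - 1) * bandwidth)
    return np.mean(values / density)
```

For instance, with an AR(1) trajectory X_{i+1} = 0.5 X_i + N(0, 1) and phi(x) = exp(-x^2), the estimate should be close to sqrt(pi). The pairwise kernel matrix costs O(n^2) memory, which is acceptable for a sketch but not for long trajectories.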

Participant: T. Bastogne

Photodynamic therapy (PDT) is an alternative cancer treatment that involves the administration of a photosensitizing agent, which is activated by light at a specific wavelength. After a sequence of photoreactions, this illumination causes the production of reactive oxygen species, which are responsible for the death of tumor cells but also for the degradation of the photosensitizing agent, which then loses its fluorescence properties. This phenomenon is commonly known as photobleaching and can be considered as an indicator of therapy efficiency. In , we present the design and validation of a real-time controller able to track a preset photobleaching trajectory by modulating the width of the light impulses during the treatment sessions. This innovative solution was validated by in vivo experiments, which showed a significant improvement in the reproducibility of the inter-individual photobleaching kinetics. We believe that this approach could lead to personalized photodynamic therapy modalities in the near future.

Participant: T. Bastogne

The growth of computational environments dedicated to the simulation of nanoparticle (NP)-X-ray interactions has opened new perspectives in the computer-aided design of nanostructured materials for biomedical applications. Several published studies have shown a crucial need for standardization of these numerical simulations . That is why we proposed to perform a multivariate robustness analysis in . A gold nanoparticle (GNP) of 100 nm diameter was selected as a standard nano-system activated by an X-ray source placed just below the NP. Two response variables were examined: the dose enhancement in seven different spatial regions of interest around the NP and the duration of the experiments. Nine factors were pre-identified as potentially critical. A Plackett-Burman design of numerical experiments was applied to estimate and test the effects of each simulation factor on the examined responses. Four factors, namely the working volume, the spatial resolution, the spatial cutoff and the computational mode (parallelization), do not significantly affect the dose deposition results, and only the last one may reduce the computational duration. The energy cutoff may cause significant variations of the dose enhancement in some specific regions of interest: the higher the cutoff, the closer to the GNP the secondary particles stop. By contrast, the Auger effect, the choice of the physical medium and the fluence level clearly appear as critical simulation parameters. Consequently, these four factors must be examined before comparing and interpreting any simulation results coming from different simulation sessions.
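For nine factors, the classical Plackett-Burman screening design has 12 runs: 11 cyclic shifts of a fixed generator row plus a row at the low level, giving mutually orthogonal, balanced columns. We assume here that this standard 12-run design is the relevant one; the mapping of the nine simulation factors to columns is not given in the text, and the names below are hypothetical. Each main effect is estimated as the mean response at the high level minus the mean response at the low level.

```python
import numpy as np

# Generator row of the classical 12-run Plackett-Burman design (11 factors max).
PB12_GENERATOR = [1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]

def plackett_burman_12(n_factors):
    """12-run two-level screening design: 11 cyclic shifts of the generator
    row followed by a row of -1; keep the first n_factors columns."""
    if not 1 <= n_factors <= 11:
        raise ValueError("the 12-run design screens at most 11 factors")
    rows = [np.roll(PB12_GENERATOR, shift) for shift in range(11)]
    rows.append(-np.ones(11, dtype=int))
    return np.array(rows)[:, :n_factors]

def main_effects(design, response):
    """Main effect of each factor: mean response at +1 minus mean at -1."""
    response = np.asarray(response, dtype=float)
    return np.array([response[col == 1].mean() - response[col == -1].mean()
                     for col in design.T])
```

Because the columns are orthogonal and balanced, each main effect is estimated free of the other main effects (though it may be confounded with interactions), which is what makes such designs economical for screening.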

This in-silico approach was tested in to screen radiosensitizing nanoparticles and the results have been validated by in vitro assays.


Participant: Bruno Scherrer

We have made two contributions to the analysis of Approximate Dynamic Programming algorithms for Markov Games.
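The exact counterpart of the approximate dynamic programming schemes analysed here is value iteration on the Shapley operator. In the simpler turn-based case, where each state is controlled by a single player, the operator reduces to a max or a min per state, as in the following sketch. The array layout and the names are assumptions, and the sketch is the exact baseline rather than one of the two contributions themselves.

```python
import numpy as np

def game_value_iteration(P, R, owner, gamma, tol=1e-10, max_iter=10_000):
    """Value iteration for a discounted zero-sum turn-based stochastic game.

    P[a, s, t], R[a, s]: transitions and rewards (paid by the minimiser to the
    maximiser); owner[s] = +1 if the maximiser controls state s, -1 otherwise.
    """
    v = np.zeros(P.shape[1])
    for _ in range(max_iter):
        q = R + gamma * (P @ v)                  # shape (A, S)
        new_v = np.where(owner > 0, q.max(axis=0), q.min(axis=0))
        if np.max(np.abs(new_v - v)) < tol:      # the operator is a gamma-contraction
            return new_v
        v = new_v
    return v
```

Approximate schemes replace the exact update of v by a projected or sampled version; their sensitivity analysis quantifies how the per-iteration errors propagate to the final value and policies.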

Participant: Aurélie Muller-Gueudin

We report here on a collaboration with researchers in automatic control in Nancy (CRAN).

We consider here networks modeled as graphs, with nodes and edges representing the agents and their interconnections, respectively. The connectivity of the network, the persistence of links and the reciprocity of interactions influence the convergence speed towards a consensus.

The problem of consensus or synchronization is motivated by various applications, such as communication networks, power and transport grids, decentralized computing networks, and social or biological networks.

We then consider networks of interconnected dynamical systems, called agents, that are partitioned into several clusters. Most of the agents can only update their state continuously, using only the states of agents within their cluster. In addition, a few agents have the peculiarity of rarely updating their states in a discrete way, by resetting them using the states of agents outside their cluster. In social networks, the opinion of each individual evolves by taking into account the opinions of the members of their community. Nevertheless, one or several individuals can change their opinion by interacting with individuals outside their community. These inter-cluster interactions can be seen as resets of the opinions. This leads to network dynamics expressed in terms of reset systems. We suppose that the reset instants arrive stochastically following a Poisson renewal process.
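The reset dynamics can be simulated with a short sketch under explicit assumptions: agents within a cluster follow the continuous consensus flow dx_i/dt = sum over the cluster of (x_j - x_i), integrated by explicit Euler steps, and at the jump times of a Poisson process one designated gateway agent per cluster resets its state to the mean of all gateway states. The choice of gateways, the reset rule and all names are hypothetical and do not reproduce the model of the paper.

```python
import numpy as np

def simulate_reset_consensus(x0, clusters, gateways, rate, horizon, dt=1e-3, seed=0):
    """Hypothetical clustered network with continuous intra-cluster consensus
    and inter-cluster resets at the jump times of a Poisson process.
    Returns the state at time `horizon`."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    t = 0.0
    next_reset = rng.exponential(1.0 / rate)
    while t < horizon:
        # Continuous intra-cluster dynamics (explicit Euler step).
        dx = np.zeros_like(x)
        for members in clusters:
            dx[members] = x[members].sum() - len(members) * x[members]
        x += dt * dx
        t += dt
        # Discrete inter-cluster resets at the Poisson jump times.
        while next_reset <= t:
            x[gateways] = x[gateways].mean()
            next_reset += rng.exponential(1.0 / rate)
    return x
```

With two clusters of three agents starting around 1 and 11, each reset (once the clusters have internally agreed) shrinks the gap between cluster averages by a factor of about 2/3, so the spread decreases geometrically with the number of resets; the Euler step must satisfy dt times the cluster size below 2 for stability.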

We have an accepted paper in the journal IEEE Transactions on Automatic Control .

Participant: Romain Azaïs

External participants: Jean-Baptiste Durand (ENSIMAG, Inria MISTIS), Christophe Godin (Inria Virtual Plants), Benoît Henry (Inria TOSCA, then Madynes), Alexandre Genadot (Université de Bordeaux, Inria CQFD)

Tree-structured data naturally appear in various fields, particularly in biology, where plants and blood vessels may be described by trees, but also in computer science, because XML documents form a tree structure. Among trees, the class of self-nested trees presents remarkable compression properties because of the systematic repetition of subtrees in their structure. In a recent preprint , we provide a better combinatorial characterization of this specific family of trees. We show that self-nested trees may be considered as a good approximation class for unordered trees. In addition, we compare our approximation algorithms with a competitive approach from the literature on a simulated dataset. On the other hand, the paper is devoted to the estimation of the relative scale of ordered trees that share the same layout. The theoretical study is carried out for the stochastic model of conditioned Galton-Watson trees. New estimators are introduced and their consistency is established. A comparison is made with an existing approach from the literature. A simulation study shows the good behavior of our procedure on finite sample sizes. An application to the analysis of revisions of Wikipedia articles is also considered through real data.
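The compression properties of self-nested trees can be seen through the DAG reduction of an unordered tree, which merges isomorphic subtrees into a single vertex. A known characterization, used in the sketch below, is that a tree is self-nested when all its subtrees of the same height are isomorphic, i.e. when its DAG reduction has exactly one vertex per height (a linear DAG). Trees are encoded as nested lists, an assumed representation, and the function names are hypothetical.

```python
def dag_reduction(tree, classes=None):
    """Merge isomorphic subtrees of an unordered tree (nested lists of
    children). Returns the class id of the root and the dictionary mapping
    each canonical signature (sorted multiset of the children's class ids)
    to its class id; the classes are the vertices of the DAG reduction."""
    if classes is None:
        classes = {}
    signature = tuple(sorted(dag_reduction(child, classes)[0] for child in tree))
    if signature not in classes:
        classes[signature] = len(classes)
    return classes[signature], classes

def height(tree):
    return 0 if not tree else 1 + max(height(child) for child in tree)

def is_self_nested(tree):
    """Self-nested: one isomorphism class of subtrees per height, i.e. the
    DAG reduction has exactly height + 1 vertices."""
    _, classes = dag_reduction(tree)
    return len(classes) == height(tree) + 1
```

For instance, a root with a chain of length 1 and a leaf as children is self-nested, whereas a root whose two children of height 1 carry one and two leaves respectively is not, since its DAG reduction has two vertices at height 1.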

Participants: A. Gégout-Petit, A. Muller-Gueudin, Y. Shi

External collaborator: B. Bastien (Transgene, Strasbourg)

In order to select factors linked to the efficiency of a treatment in a high-dimensional context (about 100,000 covariates), we have developed a new methodology to select and rank covariates associated with a variable of interest for high-dimensional data with dependent covariates but few observations. The methodology successively chains rough selection, clustering of variables, decorrelation of variables using Factor Latent Analysis, selection using the aggregation of adapted methods and finally ranking through bootstrap replications. A simulation study shows the benefit of decorrelation inside the different clusters of covariates. The methodology was applied to select, among genomic and proteomic covariates, those linked to the success of an immunotherapy treatment for lung cancer. A paper on the subject is in preparation.
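The final ranking step can be sketched as follows: on each bootstrap replication, select the k covariates most correlated with the response, then rank covariates by their selection frequency. Absolute Pearson correlation is used here as a simple stand-in (an assumption) for the aggregation of adapted selection methods; the earlier steps (rough selection, clustering, decorrelation by Factor Latent Analysis) are not reproduced, and the function name is hypothetical.

```python
import numpy as np

def bootstrap_selection_frequency(X, y, k, n_boot=200, seed=0):
    """Frequency with which each covariate is among the k covariates most
    correlated (in absolute value) with y over bootstrap replications.
    Returns covariate indices sorted by decreasing frequency, and the
    frequencies themselves."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    freq = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        Xc = X[idx] - X[idx].mean(axis=0)
        yc = y[idx] - y[idx].mean()
        # Absolute correlation of each column with y (small constant avoids 0/0).
        corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
        freq[np.argsort(corr)[-k:]] += 1
    freq /= n_boot
    order = np.argsort(-freq, kind="stable")
    return order, freq
```

Selection frequencies are more stable than a single selection when covariates are many and observations few, which is the motivation for ranking through bootstrap replications.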

Participant: J.-M. Monnez

External collaborator: O. Collignon (LIH Luxembourg)

In supervised learning, the number of values of a response variable to predict can be very high. Grouping these values into a few clusters can be useful to perform accurate supervised classification analyses. On the other hand, selecting relevant covariates is a crucial step to build robust and efficient prediction models. We propose in this paper an algorithm that simultaneously groups the values of a response variable into a limited number of clusters and selects, stepwise, the best covariates that discriminate this clustering. These objectives are achieved by alternate optimization of a user-defined selection criterion. This process extends a former version of the algorithm to a more general framework. Moreover, possible further developments are discussed in detail .

Participants: J.-M. Monnez, K. Duarte

External collaborator: E. Albuisson (CHU, Nancy)

The purpose of this study was to define a short-term event (death or hospitalization) score for heart failure patients based on the observation of biological, clinical and medical history variables. Some of them were transformed or winsorized. Two statistical learning methods, logistic regression and linear discriminant analysis, were applied on bootstrap samples, together with different variable selection methods. Aggregation of classifiers and out-of-bag validation were used. Finally, a score taking values between 0 and 100 was established and an odds ratio was defined in order to support medical decision (writing in progress).
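A minimal sketch of the aggregation and validation steps, under explicit assumptions: logistic regressions fitted by Newton-Raphson on bootstrap samples, aggregated by averaging, with each subject's out-of-bag predicted probabilities averaged and rescaled to a score between 0 and 100. Variable transformation, winsorization, linear discriminant analysis and variable selection are omitted, this rescaling is not necessarily the score defined in the study, and the function names are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, ridge=1e-6):
    """Logistic regression by Newton-Raphson (IRLS); X includes an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        hessian = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        beta += np.linalg.solve(hessian, X.T @ (y - p))
    return beta

def bagged_score(X, y, n_boot=100, seed=0):
    """Bootstrap-aggregated logistic classifiers. Returns the averaged
    coefficients, a 0-100 score per subject built from out-of-bag predicted
    probabilities (NaN if never out of bag), and the out-of-bag accuracy."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xi = np.column_stack([np.ones(n), X])
    betas, oob_sum, oob_cnt = [], np.zeros(n), np.zeros(n)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        beta = fit_logistic(Xi[idx], y[idx])
        betas.append(beta)
        oob = np.setdiff1d(np.arange(n), idx)        # subjects not drawn
        oob_sum[oob] += 1.0 / (1.0 + np.exp(-Xi[oob] @ beta))
        oob_cnt[oob] += 1
    seen = oob_cnt > 0
    prob = oob_sum[seen] / oob_cnt[seen]
    accuracy = np.mean((prob > 0.5) == (y[seen] == 1))
    score = np.full(n, np.nan)
    score[seen] = 100.0 * prob                       # score on a 0-100 scale
    return np.mean(betas, axis=0), score, accuracy
```

Out-of-bag predictions only use models that did not see the subject, so the resulting accuracy is an honest validation estimate without a separate test set.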

Participants: J.-M. Monnez, K. Duarte

External collaborator: E. Albuisson

We consider the problem of sequential least-squares multidimensional linear regression using a stochastic approximation process. The choice of the step size may be crucial in this type of process. In order to avoid the risk of numerical explosion, we define three processes with a variable or a constant step size and establish their convergence. Finally, these processes are compared to classical processes on 11 datasets, 6 with a continuous output and 5 with a binary output, first for a fixed total number of observations and then for a fixed processing time. It appears that the third process, with a very simple choice of the step size, usually gives the best results (paper to be submitted).
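A generic stochastic approximation process for sequential least-squares regression updates the coefficients after each observation (x, y) by theta <- theta + a_n (y - x'theta) x. In this hedged sketch, the step a_n = c / n^alpha is divided by 1 + |x|^2, one simple way (assumed here) to limit numerical explosion, and the iterates are averaged (Polyak-Ruppert averaging). It is not one of the three processes defined in the study, and the names are hypothetical.

```python
import numpy as np

def sequential_least_squares(stream, dim, c=1.0, alpha=0.75):
    """Robbins-Monro process for least-squares linear regression, updated one
    observation (x, y) at a time, with Polyak-Ruppert averaging. The step
    c / n**alpha is normalised by 1 + |x|^2 to limit the risk of numerical
    explosion when covariates are large."""
    theta = np.zeros(dim)
    theta_bar = np.zeros(dim)
    for n, (x, y) in enumerate(stream, start=1):
        step = c / n ** alpha
        residual = y - x @ theta
        theta = theta + step * residual * x / (1.0 + x @ x)
        theta_bar += (theta - theta_bar) / n          # running average
    return theta_bar
```

Each update costs O(dim) operations and needs no storage of past observations, which is the point of processing data streams online.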

Participants: T. Bastogne, L. Batista

System identification is a data-driven modeling approach increasingly used in biology and biomedicine. In this context, each assay is repeated in order to estimate the variability of the response. Extending the modeling conclusions to the whole population requires accounting for inter-individual variability within the modeling procedure. One solution is to use mixed-effects models, but until now no similar approach existed in the field of dynamical system identification. We propose a new solution based on an ARX (AutoRegressive model with eXternal inputs) structure, using the EM (Expectation-Maximization) algorithm to estimate the model parameters. Simulations show the relevance of this solution compared with the classical procedure in which system identification is repeated separately for each subject.
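The exact mixed-effects ARX model is not detailed here. As an illustrative sketch only, assume a first-order ARX structure y_i(t) = a_i y_i(t-1) + b_i u_i(t-1) + e_i(t), in which each subject's parameter vector theta_i = (a_i, b_i) equals a population value beta plus a Gaussian random effect eta_i ~ N(0, Omega), and the noise is i.i.d. N(0, s2). Conditioning on the initial output, this is a linear mixed-effects model in the ARX regressors, so both EM steps have closed forms. The names `arx_regressors` and `em_mixed_arx` are hypothetical.

```python
import numpy as np

def arx_regressors(y, u):
    """First-order ARX: y(t) = a*y(t-1) + b*u(t-1) + e(t). Returns (Phi, z)."""
    Phi = np.column_stack([y[:-1], u[:-1]])
    return Phi, y[1:]

def em_mixed_arx(subjects, n_iter=200):
    """EM for z_i = Phi_i (beta + eta_i) + e_i, eta_i ~ N(0, Omega), e_i ~ N(0, s2 I).

    subjects: list of (Phi_i, z_i) pairs sharing the same number p of ARX parameters.
    Returns the population parameters beta, the random-effect covariance Omega and s2.
    """
    p = subjects[0][0].shape[1]
    Phi_all = np.vstack([P for P, _ in subjects])
    z_all = np.concatenate([z for _, z in subjects])
    N, n_tot = len(subjects), len(z_all)
    # Initialization: pooled least squares and a unit random-effect covariance.
    beta = np.linalg.lstsq(Phi_all, z_all, rcond=None)[0]
    Omega = np.eye(p)
    s2 = float(np.var(z_all - Phi_all @ beta))
    A = sum(P.T @ P for P, _ in subjects)          # constant across iterations
    for _ in range(n_iter):
        # E-step: Gaussian posterior N(m_i, V_i) of each subject's random effect eta_i.
        Omega_inv = np.linalg.inv(Omega)
        post = []
        for P, z in subjects:
            V = np.linalg.inv(P.T @ P / s2 + Omega_inv)
            m = V @ P.T @ (z - P @ beta) / s2
            post.append((m, V))
        # M-step: closed-form maximizers of the expected complete-data log-likelihood.
        rhs = sum(P.T @ (z - P @ m) for (P, z), (m, _) in zip(subjects, post))
        beta = np.linalg.solve(A, rhs)
        Omega = sum(np.outer(m, m) + V for m, V in post) / N
        # tr(P V P^T) is computed as sum((P V) * P) to avoid forming a T x T matrix.
        s2 = sum(np.sum((z - P @ (beta + m)) ** 2) + np.sum((P @ V) * P)
                 for (P, z), (m, V) in zip(subjects, post)) / n_tot
    return beta, Omega, s2
```

For comparison, the classical procedure amounts to calling `np.linalg.lstsq(Phi_i, z_i)` separately for each subject; the EM approach instead pools information across subjects and directly returns the inter-individual covariance `Omega`.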

In parallel, we applied the mixed-effects modeling approach to the analysis of in vivo responses in order to identify prognostic biomarkers of tumor regrowth after photodynamic therapy. This application corroborated the practical relevance of our model-based approach.

Participants: S. Ferrigno, A. Muller-Gueudin, M. Maumy-Bertrand (IRMA, Strasbourg)

In this work, we study the conditional cumulative distribution function and a nonparametric estimator of this function. The conditional cumulative distribution function has the advantage of completely characterizing the conditional law of the random variable under consideration; the regression function, the conditional density, the conditional moments and the conditional quantile function can all be derived from it. As a nonparametric estimator of this function, we focus on the local polynomial techniques described in Fan and Gijbels [ref]. In particular, we use the local linear estimator of the conditional cumulative distribution function.

The objective of this work is to establish uniform asymptotic certainty bands for the conditional cumulative distribution function. To this end, we derive the exact rate of strong uniform consistency of the local linear estimator of this function (writing in progress).
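For reference, the local linear estimator of F(y | x) regresses the indicators 1{Y_i <= y} on (1, X_i - x) by kernel-weighted least squares and keeps the intercept. In closed form, F_hat(y | x) = sum_i w_i(x) 1{Y_i <= y}, with w_i(x) = K_i (S_2 - (X_i - x) S_1) / (S_0 S_2 - S_1^2), K_i = K((X_i - x)/h) and S_k = sum_j K_j (X_j - x)^k. The sketch below assumes a scalar covariate and a Gaussian kernel; the function name and the bandwidth used in the example are illustrative and not taken from the work described above.

```python
import numpy as np

def local_linear_cdf(x0, y0, X, Y, h):
    """Local linear estimate of F(y0 | X = x0), Gaussian kernel, bandwidth h.

    Equivalent to a kernel-weighted least-squares fit of 1{Y_i <= y0} on
    (1, X_i - x0), returning the fitted intercept.
    """
    d = X - x0
    k = np.exp(-0.5 * (d / h) ** 2)                 # kernel weights (constant factor cancels)
    s0, s1, s2 = k.sum(), (k * d).sum(), (k * d * d).sum()
    w = k * (s2 - s1 * d) / (s0 * s2 - s1 ** 2)     # equivalent-kernel weights, summing to 1
    return float(np.sum(w * (Y <= y0)))
```

Because the weights w_i(x) can be negative, the estimate may fall slightly outside [0, 1] or fail to be monotone in y; it is often clipped or monotonized when a proper distribution function is required.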

Participants: R. Azaïs, S. Ferrigno, M.-J. Martinez Marcoux (LJK, Grenoble)

The aim of this newly started collaboration is to compare, through simulations, several methods for testing the validity of a regression model. These tests can be either "directional", i.e. designed to detect departures from mainly one given assumption of the model (for example on the regression function, the variance or the error), or global (for example based on the conditional distribution function). Building such statistical tests requires nonparametric estimators of various functions (regression function, variance, cumulative distribution function). The goal is then to build a tool (an R package) that allows users to test the validity of their model through different methods and with varying modeling parameters. This work is currently in progress.

Participants: A. Gégout-Petit, A. Muller-Gueudin, Y. Shi

Transgene (Euronext: TNG), part of Institut Mérieux, is a publicly traded French biopharmaceutical company focused on discovering and developing targeted immunotherapies for the treatment of cancer and infectious diseases. B. Bastien, head of its biostatistics team, asked BIGS to select, among genomic and proteomic expression data, the covariates linked to the success of a lung cancer treatment. This subject was the topic of Y. Shi's master's thesis, and a paper on it is in preparation.

Participants: T. Bastogne, L. Batista, P. Vallois

In another collaboration with Transgene (Euronext: TNG), B. Bastien, head of the biostatistics team, asked BIGS to model tumor growth data collected in vivo and to measure the effect of the treatment on tumor dynamics.

Participants: R. Azaïs, A. Gégout-Petit, F. Greciet

SAFRAN Aircraft Engines designs and produces aircraft engines. For the design of engine parts, the company needs to understand the mechanisms of crack propagation under different conditions. It called on BIGS to model crack propagation with piecewise deterministic Markov processes (PDMP). This is the subject of F. Greciet's PhD, funded by ANRT. F. Greciet presented her work at a Fédération Charles Hermite day on November 23. She was a laureate of the "Mathématiques, oxygène du monde numérique" poster challenge.
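The PDMP studied in the PhD is not described here. Purely to illustrate what a piecewise deterministic Markov process looks like, the toy sketch below lets a crack length follow a deterministic exponential growth flow whose rate depends on a regime, while the regime jumps at exponentially distributed random times. Realistic crack-growth laws (for instance Paris-type laws) and the model actually studied may differ; the model, the function `simulate_switching_pdmp` and its parameters are hypothetical.

```python
import numpy as np

def simulate_switching_pdmp(a0, rates, switch_rate, t_max, rng):
    """Toy PDMP: crack length a(t) driven by a regime k in {0, ..., K-1}, K >= 2.

    Between jumps the motion is deterministic: da/dt = rates[k] * a, hence
    a(t + s) = a(t) * exp(rates[k] * s). The regime jumps after exponential
    waiting times of intensity switch_rate, to a uniformly chosen different regime.
    Returns the jump times (plus t_max), the crack lengths and the regimes at those times.
    """
    t, a, k = 0.0, a0, 0
    K = len(rates)
    times, lengths, regimes = [t], [a], [k]
    while True:
        s = rng.exponential(1.0 / switch_rate)      # waiting time until the next jump
        if t + s >= t_max:
            a *= np.exp(rates[k] * (t_max - t))     # follow the flow up to the horizon
            times.append(t_max); lengths.append(a); regimes.append(k)
            break
        t += s
        a *= np.exp(rates[k] * s)                   # deterministic motion between jumps
        k = (k + rng.integers(1, K)) % K            # random jump to another regime
        times.append(t); lengths.append(a); regimes.append(k)
    return np.array(times), np.array(lengths), np.array(regimes)
```

In such a model the randomness enters only through the jump times and post-jump regimes, which is what makes PDMPs convenient for inference from partially observed trajectories.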

*PEPS AMIES* (2016), Apprentissage supervisé pour l'aide au diagnostic (supervised learning for diagnosis support), collaboration between Institut Elie Cartan and the start-up SD Innovation, Frouard. Participants: A. Gégout-Petit, P. Vallois

*Popart (2016-2017)* In the framework of a collaboration with A. Deveau of Inra Nancy, A. Gégout-Petit and A. Muller-Gueudin take part in the Inra project "Popart" ("Microbial Ecosystems & Metaomics" call, 2016): "Regulation of the Poplar microbiome by its host: is the immune system involved?". The aim is to develop methodology for inferring the regulation network between micro-organisms around the poplar. A specific feature of the data is zero inflation, which has to be taken into account.

*Intérêt des antiangiogènes dans la potentialisation des thérapies par rayonnement dans le cas des glioblastomes* (value of antiangiogenic agents in potentiating radiation therapy for glioblastomas) (2016). Funding organism: Ligue contre le Cancer (CCIR-GE). Leader: N. Thomas (CRAN, U. Lorraine). Participants: C. Lacaux and A. Muller-Gueudin

(2014-16), A library of Near-InfraRed absorbing photosensitizers: tailoring and assessing photophysical and synergetic photodynamic properties, Funding organism: PHC Bosphore - Campus France, Leader: M. Barberi-Heyob (CRAN), Thierry Bastogne

GDR 3475 Analyse Multifractale, Funding organism: CNRS, Leader: S. Jaffard (Université Paris-Est), Céline Lacaux

GDR 3477 Géométrie stochastique, Funding organism: CNRS, Leader: P. Calka (Université de Rouen), Céline Lacaux

FHU CARTAGE (Fédération Hospitalo-Universitaire CARdiac and ARTerial AGEing; leader: Pr Athanase Benetos), Jean-Marie Monnez

RHU Fight HF (Fighting Heart Failure; leader: Pr Patrick Rossignol), located at the University Hospital of Nancy, Jean-Marie Monnez

Project "Handle your heart", team in charge of creating drug prescription support software for the treatment of heart failure, head: Jean-Marie Monnez

Photobrain project. AGuIX theranostic nanoparticles for vascular-targeted interstitial photodynamic therapy of brain tumors, project **EuroNanoMed II**, resp.: M. Barberi-Heyob, (2015-2017), participant: T. Bastogne.

NanoBit Project. Nanoscintillator-Porphyrin Complexes for Bimodal RadioPhotoDynamic Therapy, project **EuroNanoMed II**, resp.: P. Juzenas, (2016-2018), participant: T. Bastogne.

The BIGS team organised a two-day workshop "Rencontres des équipes Inria travaillant sur le cancer" (meeting of Inria teams working on cancer) in Paris, attended by 10 Inria teams. https://

A. Gégout-Petit co-organised the "Health Session" of the Fédération Charles Hermite-Enterprises day, Nancy, January 2016.

Céline Lacaux participated in the organisation of the following events:

European Study Group with Industry, 117th edition, May 2016, Avignon.

Session *Statistics* of the 14th Colloque Franco-Roumain de Mathématiques Appliquées, August 2016, Iasi.

Conference of GDR 3475 Analyse Multifractale, September 2016, Avignon.

A. Gégout-Petit is chair of the 2017 "Congrès Francophone International de l'Enseignement de la Statistique" (CFIES), Grenoble, September 2017.

P. Vallois is on the editorial board of "Risk and Decision Analysis".

All BIGS members are regular reviewers for journals in probability, statistics and machine learning such as Bernoulli, Scandinavian Journal of Statistics, Stochastics, Journal of Statistical Planning and Inference, Journal of Theoretical Biology, IEEE Trans. Biomedical Eng., Theoretical Biology and Medical Modelling, Royal Society of Chemistry, Signal Processing: Image Communication, Mathematical Biosciences, LIDA, Annals of Applied Probability, Annals of Operations Research and Journal of Machine Learning Research, as well as for conferences such as ICML, World IFAC Congress, FOSBE and ALCOSP.

Anne Gégout-Petit was invited to the SADA'2016 conference in Cotonou.

Bruno Scherrer was an invited speaker at the EWRL'2016 workshop in Barcelona.

Thierry Bastogne was invited on 19 October 2015 by Pr. Luc Leyns to the Vrije Universiteit Brussel to give a talk on the *Statistical Analysis of Cell Impedance Signals for the Characterization of Anti-Cancer Drug Effects*.

Romain Azaïs was an invited speaker at SSIAB 2016 in Rennes.

S. Ferrigno: *Nouvelles approches d'estimation de la croissance en Foetopathologie* (new approaches to growth estimation in fetopathology). Journée "Nouvelle approche de la croissance foetale", Maternité Régionale du CHRU de Nancy (Sept 2016).

Anne Gégout-Petit is a member of the board of the European Regional Council of the Bernoulli Society.

Céline Lacaux is in charge of the *Statistics team*, Laboratory of Mathematics of Avignon (since September 2016).

T. Bastogne: scientific expert in Biostatistics and Signal Processing in Nanomedicine for CYBERnano (start-up).

A. Gégout-Petit: elected member of the laboratory of mathematics "Institut Elie Cartan de Lorraine".

Céline Lacaux is

member of the board of the SMAI-MAS group,

elected member of the council of the Laboratory of Mathematics of Avignon,

AMIES correspondent for Avignon,

member of the scientific committee of GDR 3477 Stochastic Geometry.

A. Gégout-Petit : Head of the Master 2 "Ingénierie Mathématique et Outils Informatiques (Mathematical Engineering and Computer Tools)", Université de Lorraine

A. Gégout-Petit created and is now in charge of the CMI curriculum in applied mathematics at Université de Lorraine.

P. Vallois is head of the "Parcours Mathématiques Financières" of the master "Applied mathematics" of Université de Lorraine

P. Vallois is in charge of the agreement between Université de Lorraine and Université Hammam Sousse on the organization of the master ISC (Ingénierie de Systèmes Complexes).

T. Bastogne is in charge of the "Systèmes & TIC" specialization of the master Ingénierie de Systèmes Complexes.

T. Bastogne created and is now in charge of the M2 professional master CIIBLE (Cybernétique, Instrumentation, Image en Biologie et medecinE), run with the Faculty of Medicine of Université de Lorraine.

T. Bastogne created and is now in charge of the research master "Biosanté Numérique" with the engineering school Telecom Nancy.

Master: S. Ferrigno, Experimental designs, 4.5h, M1, fourth year of EEIGM, Université de Lorraine, France

Master: S. Ferrigno, Data analyzing and mining, 63h, M2, third year of Ecole des Mines, Université de Lorraine, France

Master: S. Ferrigno, Modeling and forecasting, 43h, M1, second year of Ecole des Mines, Université de Lorraine, France

Master: S. Ferrigno, Training projects, 18h, M1/M2, second and third year of Ecole des Mines, Université de Lorraine, France

Master: A. Muller-Gueudin, Probability and Statistics, 160h, second year of ENSEM and ENSAIA, University of Lorraine, France.

Master: A. Muller-Gueudin, Scientific calculation with Matlab, 20h, second year of ENSAIA, University of Lorraine, France.

Master: R. Azaïs, Machine learning, 20h, M2, Université de Lorraine and third year of Telecom Nancy, France.

Master: R. Azaïs, Machine learning, 20h, M1, second year of Ecole des Mines, Université de Lorraine, France.

Master: J.-M. Monnez, Multivariate Analysis, Master 2 IFM (Ingénierie de la Finance de Marché), until June 2016.

Master: A. Gégout-Petit, Statistics, modeling, 15h, future teachers, Université de Lorraine, France

Master: A. Gégout-Petit, Statistics, modeling, data analysis, 80h, master in applied mathematics, Université de Lorraine, France

Licence: S. Wantz-Mézières, Applied mathematics for management, financial mathematics, Probability and Statistics, 160h, I.U.T. (L1/L2/L3)

Licence: S. Wantz-Mézières, Probability, 100h, first year in Telecom Nancy engineering school (initial and apprenticeship programmes)

Master: J.-M. Monnez, Data Analysis, Statistical Learning, Master 2 IMOI (Ingénierie Mathématique et Outils Informatiques), until June 2016.

Licence: A. Muller-Gueudin, Statistics, 60h, first year of ENSAIA, University of Lorraine, France.

Licence: S. Ferrigno, Descriptive and inferential statistics, 60h, L2, second year of EEIGM, Université de Lorraine, France

Licence: S. Ferrigno, Statistical modeling, 60h, L2, second year of EEIGM, Université de Lorraine, France

Licence: S. Ferrigno, Mathematical and computational tools, 20h, L3, third year of EEIGM, Université de Lorraine, France

Licence: S. Ferrigno, Training projects, 20h, L1/L3, first, second and third year of EEIGM, Université de Lorraine, France

Licence: C. Lacaux: Probability and Statistics, 75h, L3, University of Avignon.

Licence: C. Lacaux: Numerical simulation in probability, 36.75h, L3, University of Avignon.

Licence: C. Lacaux: Probability and Statistics, 22.5h, L1, University of Avignon.

Licence: C. Lacaux: Statistical techniques applied to SVT (life and earth sciences), 25.5h, L3, University of Avignon.

Licence: C. Lacaux: Statistics, 24h, L2, University of Avignon.

PhD: Clémence Karmann, "Network inference for zero-inflated models", grant: Inria-Cordis. Advisors: A. Gégout-Petit, A. Muller-Gueudin.

PhD: Florine Gréciet, "Modèles markoviens déterministes par morceaux cachés pour la propagation de fissures", CIFRE grant SAFRAN AIRCRAFT ENGINES, Advisors: R. Azaïs, A. Gégout-Petit.

PhD: Kévin Duarte, "Aide à la décision médicale et télémédecine dans le suivi de l'insuffisance cardiaque", Advisors: J.-M. Monnez and E. Albuisson.

PhD: P. Retif. Modeling, digital simulation and analysis of nanoparticle-X-ray interaction. Applications to augmented radiotherapy. PhD thesis, Université de Lorraine, March 2016.

Post-doc: Florian Bouguet. Advisors: Romain Azaïs, Anne Gégout-Petit, Aurélie Muller-Gueudin.

Post-doc: Benoît Henry (starting in Dec. 2016). Advisors: Romain Azaïs with Inria team Madynes.

Master: Yaojie Shi, Toulouse School of economics, 2016. « Analyse de données transcriptomiques et protéomiques en oncologie », Advisors: A. Gégout-Petit, A. Muller-Gueudin, B. Bastien (Société Transgene, Strasbourg).

Master: Yuyan Cao, Toulouse School of economics, 2016. "Spatio-temporal Bayesian models for the analysis of esca disease", with Inra Bordeaux. Advisors: A. Gégout-Petit, L. Guérin-Dubrana.

Master: Félicie Bonte, Master Ecologie, Lille, AgroParisTech Nancy and Museum National d'Histoire Naturelle, 2016. « Etude des modifications de la croissance et du développement des plantes herbacées en forêt en réponse aux changements globaux », co-supervised with AgroParisTech Nancy. Advisors: A. Gégout-Petit, Jean-Claude Gégout, Serge Muller.

Master: all BIGS members regularly supervise projects and internships of master IMOI students

Engineering school: all BIGS members regularly supervise projects of Ecole des Mines, ENSEM or EEIGM students

HDR, Bruno Scherrer, "Contributions algorithmiques au contrôle optimal stochastique à temps discret et horizon infini", Université de Lorraine, July, 2016, Examiner, A. Gégout-Petit

HDR, Corine Hahn, "Penser la question didactique pour la formation en alternance dans l'enseignement supérieur. Dispositifs frontières, Statistique et Management." Université Lyon 2, May, 2016, Examiner, A. Gégout-Petit

PhD, Florian Bouguet, "Etude quantitative de processus de Markov déterministes par morceaux issus de la modélisation", Université de Rennes, June, 2016, Referee: A. Gégout-Petit

PhD, Houda Ghamlouch, "Modélisation de la dégradation, maintenance conditionnelle et pronostic : usage des processus de diffusion", Université Technologique de Troyes, June, 2016, Referee: A. Gégout-Petit

PhD, Etienne Baratchart, "Etude quantitative des aspects dynamiques et spatiaux du développement métastatique à l'aide de modèles mathématiques", Université de Bordeaux, February, 2016, Referee: A. Gégout-Petit

PhD, Johann Cuenin, Sur les modèles Tweedie multivariés, Université de Besançon, December, Examiner: A. Gégout-Petit

PhD: Clémence Chamard-Jovenin, Impact d'une surexpression d'ERα36 et/ou d'une exposition aux alkylphénols sur la physiopathologie de la glande mammaire, Université de Lorraine, December 9, 2016. Examiner: A. Muller-Gueudin.

PhD: M. Ben Abdallah, Université de Lorraine, "Un modèle de l'évolution des gliomes diffus de bas grade sous chimiothérapie", December 12, 2016, jury member: S. Wantz-Mézières.

PhD: Marc Bourotte, Générateur stochastique de temps multisite basé sur un champ gaussien multivarié, INRA, BioSP team, July 4, 2016, President: C. Lacaux.

PhD: Nhu Dang, Estimation des indices de stabilité et d'autosimilarité par variations de puissances négatives, Laboratoire Jean Kuntzmann Grenoble, July 5, 2016, Examiner: C. Lacaux.

A. Gégout-Petit is involved in promoting mathematics studies at Université de Lorraine. She was very active in producing the video promoting mathematical studies http://

A. Gégout-Petit participated in the round table "Table ronde Bourse aux technologues, Big data et industrie du futur", Ecole des Mines de Nancy, November 2016.

Leading MATh.en.JEANS workshops in middle schools (collèges) in the Nancy area (Romain Azaïs, Clémence Karmann)

S. Ferrigno: Advisor of a group of students, "La main à la Pâte" project, elementary schools, Nancy, January-June 2016

S. Ferrigno: Advisor of a group of students, "La main à la Pâte" project, Institut médico-éducatif (IME), Commercy, September-December 2016

S. Ferrigno: Advisor of a group of students, "De Léonard de Vinci au Drone" project, Collège Paul Verlaine, Malzéville, December 2016-February 2017.