The core component of our scientific agenda is the development of statistical and probabilistic methods for the modeling and optimization of complex systems. These systems require mathematical representations that are in essence dynamic and stochastic, with discrete and/or continuous variables. Their complexity poses genuine scientific challenges that can be addressed through complementary approaches and methodologies:

Modeling: design and analysis of realistic and tractable models for such complex real-life systems and various probabilistic phenomena;

Estimation: developing theoretical and computational procedures to estimate the parameters and evaluate the performance of the system;

Optimization: developing theoretical and numerical control tools to optimize the performance of complex systems such as computer systems and communication networks.

The scientific objective of the team is to provide mathematical tools for the modeling and optimization of complex systems. These systems require mathematical representations which are in essence dynamic, multi-model and stochastic. This complexity poses genuine scientific challenges in the domain of modeling and optimization. More precisely, our research activities focus on stochastic optimization and on (parametric, semi-parametric, multidimensional) statistics, which are complementary and interlinked topics. It is essential to develop simultaneously statistical methods for the estimation of the models and control methods for their optimization.

Stochastic modeling: Markov chains, Piecewise Deterministic Markov Processes (PDMPs), Markov Decision Processes (MDPs).

The mathematical representation of a complex system is a preliminary step toward our final goal, the optimization of its performance. For example, in order to optimize the predictive maintenance of a system, it is necessary to choose an adequate model for its representation. The modeling step is crucial before any estimation or computation of quantities related to the optimization. For this we have to represent all the different regimes of the system and the behavior of the physical variables under each of these regimes. Moreover, we must also select the dynamic variables which have a potential effect on the physical variables and the quantities of interest. The team CQFD works on the theory of Piecewise Deterministic Markov Processes (PDMPs) and on Markov Decision Processes (MDPs). These two classes of systems form general families of controlled stochastic processes suitable for modeling sequential decision-making problems in the continuous-time (PDMPs) and discrete-time (MDPs) contexts. They appear in many fields, such as engineering, computer science, economics and operations research, and constitute a powerful class of processes for the modeling of complex systems.

Estimation methods: estimation for PDMPs; estimation in non- and semi-parametric regression models.

To the best of our knowledge, there is no general theory for estimating the parameters of PDMPs, although a large number of tools already exist for sub-classes of PDMPs such as point processes and marked point processes. To fill the gap between these specific models and the general class of PDMPs, new theoretical and mathematical developments will be on the agenda of the whole team. In the framework of non-parametric or quantile regression, we focus on kernel estimators or kernel local linear estimators for complete or censored data. New strategies for estimating semi-parametric models via recursive estimation procedures have also received increasing interest recently. The advantage of the recursive approach is to take into account the successive arrivals of information and to refine, step after step, the implemented estimation algorithms. These recursive methods do not require restarting the calculation of the parameter estimates from scratch when new data are added to the base: the idea is to use only the previous estimates and the new data to refresh the estimation. The gain in computing time can be substantial, and such approaches have many applications.
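As a minimal illustration of this recursive principle (not the team's actual estimators; the model, dimensions and noise level below are made up for the sketch), recursive least squares refreshes a regression estimate with each new observation instead of refitting on the whole data set:

```python
import numpy as np

def rls_update(theta, P, x, y):
    """One recursive least-squares step: refresh the estimate with a single
    new observation (x, y) instead of refitting on all past data."""
    Px = P @ x
    k = Px / (1.0 + x @ Px)          # gain vector (Sherman-Morrison update)
    theta = theta + k * (y - x @ theta)
    P = P - np.outer(k, Px)
    return theta, P

rng = np.random.default_rng(0)
true_beta = np.array([2.0, -1.0])
theta = np.zeros(2)
P = 1e6 * np.eye(2)                  # large P encodes diffuse initial uncertainty
for _ in range(500):
    x = rng.normal(size=2)
    y = x @ true_beta + 0.1 * rng.normal()
    theta, P = rls_update(theta, P, x, y)
print(np.round(theta, 1))            # close to the true coefficients
```

Each update costs a few matrix-vector products, independently of the number of observations already processed, which is precisely the gain discussed above.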

Dimension reduction: dimension-reduction via SIR and related methods, dimension-reduction via multidimensional and classification methods.

Most dimension-reduction approaches seek lower-dimensional subspaces that minimize the loss of some statistical information. This can be achieved in a modeling framework or in an exploratory data analysis context.

In the modeling framework, we focus our attention on semi-parametric models in order to combine the advantages of parametric and nonparametric modeling. On the one hand, the parametric part of the model allows a suitable interpretation for the user. On the other hand, the functional part of the model offers a lot of flexibility.
In this project, we are especially interested in the semi-parametric regression model

Methods of dimension reduction are also important tools in the fields of data analysis, data mining and machine learning. They provide a way to understand and visualize the structure of complex data sets. Traditional methods include, among others, principal component analysis for quantitative variables and multiple correspondence analysis for qualitative variables. New techniques have also been proposed to address challenging tasks involving many irrelevant and redundant variables and often comparatively few observation units. In this context, we focus on the problem of constructing synthetic variables, whose goals include increasing the predictive performance and building more compact variable subsets. Clustering of variables is used for feature construction: the idea is to replace a group of "similar" variables by a cluster centroid, which becomes a feature. The most popular algorithms include K-means and hierarchical clustering. For a review, see, e.g., the textbook of Duda
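A toy Python sketch of this feature-construction idea (all data, dimensions and cluster counts are illustrative): the variables are clustered with a plain K-means, and each group is replaced by its centroid, which becomes a synthetic feature:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data: 6 observed variables generated from 2 latent factors.
n = 200
f = rng.normal(size=(n, 2))
X = np.column_stack([f[:, 0] + 0.1 * rng.normal(size=n) for _ in range(3)] +
                    [f[:, 1] + 0.1 * rng.normal(size=n) for _ in range(3)])

Z = (X - X.mean(0)) / X.std(0)       # standardize each variable

def kmeans(points, k, iters=50, seed=0):
    """Plain K-means; here the 'points' are the variables (columns of Z)."""
    r = np.random.default_rng(seed)
    centers = points[r.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        lab = np.argmin(((points[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([points[lab == j].mean(0) if np.any(lab == j)
                            else centers[j] for j in range(k)])
    return lab

labels = kmeans(Z.T, k=2)            # cluster the 6 variables into 2 groups
features = np.column_stack([Z[:, labels == j].mean(1) for j in range(2)])
print(features.shape)                # 2 synthetic features replace 6 variables
```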

Stochastic optimal control: optimal stopping, impulse control, continuous control, linear programming.

The first objective is to focus on the development of computational methods.

In the continuous-time context, stochastic control theory has, from the numerical point of view, been mainly concerned with Stochastic Differential Equations (SDEs for short). From both the practical and theoretical points of view, the numerical developments for this class of processes are extensive and largely complete. They capitalize on the connection between SDEs and second-order partial differential equations (PDEs for short) and on the fact that the properties of the latter equations are very well understood. It is, however, hard to deny that the development of computational methods for the control of PDMPs has received little attention. One of the main reasons is that the role played by the familiar PDEs in diffusion models is played here by certain systems of integro-differential equations, for which there is not (and cannot be) a unified theory such as the one for PDEs, as emphasized by M.H.A. Davis in his book. To the best knowledge of the team, there is only one attempt to tackle this difficult problem, by O.L.V. Costa and M.H.A. Davis. The originality of our project consists in studying this unexplored area. It is very important to stress that these numerical developments will give rise to many theoretical issues, such as the types of approximation, convergence results and rates of convergence.

The theory of MDPs has reached a rather high degree of maturity, although the classical tools, such as value iteration, policy iteration and linear programming, and their various extensions, are often not directly applicable in practice. We believe that theoretical progress on MDPs must go hand in hand with the corresponding numerical developments. Solving MDPs numerically is therefore a challenging and important problem from both the theoretical and practical points of view. To meet this challenge, the fields of neural networks, neuro-dynamic programming and approximate dynamic programming have recently become active areas of research. Such methods found their roots in heuristic approaches, but theoretical convergence results are mainly obtained in the context of finite MDPs. Hence, an ambitious challenge is to investigate such numerical problems for models with general state and action spaces. Our motivation is to develop theoretically consistent computational approaches for approximating optimal value functions and finding optimal policies.
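For a finite MDP, the classical value iteration mentioned above is simply a fixed-point iteration of the Bellman operator, which contracts at rate equal to the discount factor. A self-contained sketch on a made-up two-state, two-action model (all numbers are illustrative):

```python
import numpy as np

# A tiny finite MDP: 2 states, 2 actions; P[a, s, s'] is the transition
# kernel under action a, R[s, a] the one-step reward.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9                          # discount factor

V = np.zeros(2)
while True:
    Q = R + gamma * (P @ V).T        # Bellman operator: Q[s, a]
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break                        # sup-norm contraction guarantees convergence
    V = V_new
policy = Q.argmax(axis=1)            # greedy policy w.r.t. the optimal values
print(np.round(V, 4), policy)
```

The difficulty discussed above is precisely that, on general (non-finite) state and action spaces, the maximization and the expectation in the Bellman operator can no longer be computed exactly, which is where approximation schemes enter.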

An effort has been devoted to the development of efficient computational methods in the setting of communication networks. These are complex dynamical systems composed of several interacting nodes that exhibit important congestion phenomena as their level of interaction grows. The dynamics of such systems are affected by the randomness of their underlying events (e.g., arrivals of HTTP requests to a web server) and are described stochastically in terms of queueing network models. These are mathematical tools that allow one to predict the performance achievable by the system, to optimize the network configuration, to perform capacity-planning studies, etc. These objectives are usually difficult to achieve without a mathematical model because Internet systems are huge in size. However, because of the exponential growth of their state spaces, an exact analysis of queueing network models is generally difficult to obtain. Given this complexity, we have developed analyses in limiting regimes of practical interest (e.g., the system size grows to infinity). This approach helps to obtain a simpler mathematical description of the system under investigation, which leads directly to efficient, though approximate, computational methods and also allows one to investigate other aspects such as Nash equilibria.
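As an elementary illustration of the kind of prediction a queueing model provides (a single M/M/1 queue with made-up rates, far simpler than the networks studied here), the stationary mean waiting time ρ/(μ−λ) can be checked against a simulation of the Lindley recursion:

```python
import numpy as np

lam, mu = 0.5, 1.0                   # arrival and service rates (illustrative)
rho = lam / mu
wq_theory = rho / (mu - lam)         # M/M/1 mean waiting time in queue

# Cross-check against a Lindley-recursion simulation of the same queue.
rng = np.random.default_rng(0)
n = 500_000
A = rng.exponential(1 / lam, n)      # interarrival times
S = rng.exponential(1 / mu, n)       # service times
w, total = 0.0, 0.0
for i in range(n):
    total += w
    w = max(0.0, w + S[i] - A[i])    # waiting time of the next customer
wq_sim = total / n
print(wq_theory, round(wq_sim, 2))
```

For a network of interacting queues no such closed formula is generally available, which motivates the limiting-regime analyses described above.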

The second objective of the team is to study some theoretical aspects related to MDPs such as convex analytical methods and singular perturbation. Analysis of various problems arising in MDPs leads to a large variety of interesting mathematical problems.

Our abilities in probability and statistics apply naturally to industry in particular in studies of dependability and safety.

An illustrative example which gathers all the topics of the team is a collaboration started in May 2010 with Thales Optronique on the subject of *optimization of the maintenance of a digital camera equipped with HUMS* (Health and Usage Monitoring Systems). This subject is very interesting for us because it combines many aspects of our project. Classification tools will be used to select significant variables as a first step in the modeling of a digital camera. The model will then be analysed and estimated in order to optimize the maintenance.

A second example concerns the optimization of the maintenance date for an aluminum metallic structure subject to corrosion. It is a structure of a strategic ballistic missile that is stored in a nuclear submarine missile launcher in peacetime and inspected with a given periodicity. The security requirements for this structure are very strong. The mechanical stress exerted on the structure depends on its thickness. It is thus crucial to monitor the evolution of the thickness of the structure over time, and to intervene before failure.

A third example is the minimization of the acoustic signature of a submarine. The submarine has to choose its trajectory in order to minimize, at each time step, its observability by a surface ship following an unknown random trajectory.

However, the spectrum of applications of the team's topics is larger and may concern many other fields. Indeed, nonparametric and semi-parametric regression methods can be used in biometry, econometrics or engineering, for instance. Gene selection from microarray data and text categorization are two typical application domains of dimension reduction, among others. We had, for instance, the opportunity via the scientific program PRIMEQUAL to work on air-quality data and to use dimension-reduction techniques such as principal component analysis (PCA) and positive matrix factorization (PMF) for pollution source identification and quantification.

Mixed data types arise when observations are described by a mixture of numerical and categorical variables. The R package PCAmixdata extends standard multivariate analysis methods to incorporate this type of data. The key techniques included in the package are PCAmix (PCA of a mixture of numerical and categorical variables), PCArot (rotation in PCAmix) and MFAmix (multiple factor analysis for mixed data within a dataset). The MFAmix procedure handles a mixture of numerical and categorical variables within a group, something which was not possible with the standard MFA procedure. We have also included, in the new version of the package, techniques to project new observations onto the principal components of the three methods.
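The underlying idea of principal component analysis for mixed data can be sketched as follows (a rough Python illustration of the flavor, not the PCAmixdata implementation, which relies on a generalized SVD with specific metrics; data and dimensions are made up): standardize the numerical columns, center and weight the indicator columns of the categorical variable by their level frequencies, and run an ordinary SVD on the concatenation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
num = rng.normal(size=(n, 2))                      # 2 numerical variables
cat = rng.integers(0, 3, size=n)                   # 1 categorical variable, 3 levels

Z_num = (num - num.mean(0)) / num.std(0)           # standardized numerical part
G = np.eye(3)[cat]                                 # indicator (dummy) matrix
freq = G.mean(0)
Z_cat = (G - freq) / np.sqrt(freq)                 # centered, frequency-weighted dummies

Z = np.hstack([Z_num, Z_cat])                      # mixed-data matrix
U, s, Vt = np.linalg.svd(Z / np.sqrt(n), full_matrices=False)
scores = np.sqrt(n) * U * s                        # principal component scores
print(scores.shape)
```

With this weighting, each numerical variable and each categorical variable contributes comparably to the total inertia, which is the point of mixed-data PCA.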

QuantifQuantile is an R package that performs quantization-based quantile regression. The functions of the package allow the user to construct an optimal grid of N quantizers and to estimate conditional quantiles. This estimation requires a data-driven selection of the size N of the grid, which is implemented in the functions. An illustration of the selection of N is available, and graphical output of the resulting estimated curves or surfaces (depending on the dimension of the covariate) is provided directly via the plot function.

Biips is a software platform for automatic Bayesian inference with interacting particle systems. Biips allows users to define their statistical model in the BUGS probabilistic programming language, as well as to add custom functions or samplers within this language. It then runs sequential Monte Carlo based algorithms (particle filters, particle independent Metropolis-Hastings, particle marginal Metropolis-Hastings) in a black-box manner so as to approximate the posterior distribution of interest as well as the marginal likelihood. The software is developed in C++ with interfaces to R, Matlab and Octave.

Creation of the Associate Team Inria CDSS (2014-2016) with the University of Sao Paulo, Brazil.

We propose a new numerical approximation of the Kalman–Bucy filter for semi-Markov jump linear systems. This approximation is based on the selection of typical trajectories of the driving semi-Markov chain of the process by using an optimal quantization technique. The main advantage of this approach is that it makes pre-computations possible. We derive a Lipschitz property for the solution of the Riccati equation and a general result on the convergence of perturbed solutions of semi-Markov switching Riccati equations when the perturbation comes from the driving semi-Markov chain. Based on these results, we prove the convergence of our approximation scheme in a general infinite countable state space framework and derive an error bound in terms of the quantization error and the time discretization step. We employ the proposed filter in a magnetic levitation example with Markovian failures and compare its performance with both the Kalman–Bucy filter and the Markovian linear minimum mean squares estimator. This work was presented at an international conference and has been submitted to an international journal.
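For readers unfamiliar with the objects involved, the sketch below shows a plain scalar discrete-time Kalman filter and its Riccati variance recursion. It is a deliberately simplified stand-in, with made-up parameters, for the continuous-time semi-Markov jump linear setting treated above:

```python
import numpy as np

# Scalar linear-Gaussian model: x_{k+1} = a x_k + w_k,  y_k = x_k + v_k.
a, q, r = 0.9, 0.1, 0.5              # dynamics, process and observation noise variances

rng = np.random.default_rng(0)
T = 300
x = np.zeros(T); y = np.zeros(T)
for k in range(1, T):
    x[k] = a * x[k-1] + rng.normal(scale=np.sqrt(q))
    y[k] = x[k] + rng.normal(scale=np.sqrt(r))

m, P = 0.0, 1.0                      # filter mean and variance
est = np.zeros(T)
for k in range(1, T):
    m_pred, P_pred = a * m, a * P * a + q          # prediction step
    K = P_pred / (P_pred + r)                      # Kalman gain
    m = m_pred + K * (y[k] - m_pred)               # measurement update
    P = (1 - K) * P_pred                           # Riccati update of the variance
    est[k] = m
print(round(float(np.mean((est - x)**2)), 3))
```

In the jump linear setting, the coefficients (a, q, r) themselves switch with the driving semi-Markov chain, so the Riccati recursion depends on the whole regime trajectory; this is what the quantization-based pre-computation addresses.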

We are interested in the optimization of a launcher integration process. It comprises several steps, from the production of the subassemblies to the final launch. The four subassemblies go through various types of operations such as preparation, integration, control and storage. These operations are split up into three workshops. Due to possible breakdowns or staff issues, the time spent in each workshop is assumed to be random. So is the time needed to deliver the subassemblies, for similar reasons including, e.g., shipping delays. We also have to deal with constraints related to the architecture of the assembly process itself. Indeed, we have to take into account waiting policies between workshops. The workshops may work in parallel but can be blocked if their output is not transferred to the next workshop in line. Storage capacity for output products is limited.

Our goal is to find the best delivery rates for the subassemblies, the best choice of architecture (regarding stock capacities) and the best times at which to stop and restart the workshops, so as to carry out twelve launches a year according to a predetermined schedule at minimal cost. To solve this problem, we choose a mathematical model particularly suitable for optimization under randomness: Markov decision processes (MDPs).

We have implemented a numerical simulator of the process based on the MDP model. It provides the fullest information possible on the process at any time. The simulator has first been validated with deterministic histories. Random histories have then been run with exponentially distributed delivery times for the subassemblies and several families of random laws for the time spent in each workshop. Using Monte Carlo simulations, we obtain the distribution of the launch times. Preliminary optimization results allow choosing stock capacities and delivery rates that satisfy the launch schedule. Work is still in progress concerning cost minimization. It was presented at Airbus internal PhD seminar in November 2014.

We consider the optimal stopping problem for a continuous finite-dimensional state space Markov chain under partial observation. Our aim is to build a numerical approximation of the value function. To do so, we first translate the problem into the Partially Observed Markov Decision Process (POMDP) framework. Then, we define the equivalent fully observed Markov Decision Process (MDP) on an infinite-dimensional state space. Finally, we propose a discretization scheme based on the discretization of an underlying measure, to obtain a finite-dimensional problem, and a discretization of the resulting state space, to obtain a fully discrete model that is numerically tractable. We prove the convergence of the approximation procedure. This work is still in progress and was presented at the workshop

The goal of this work is to predict the state of alertness of an individual by analyzing brain activity through electroencephalographic data (EEG) captured with 58 electrodes. Alertness is characterized here as a binary variable that can be in a "normal" or "relaxed" state. We collected data from 44 subjects before and after a relaxation practice, giving a total of 88 records. After a pre-processing and data validation step, we analyzed each record and discriminated the alertness states using our proposed "slope criterion". Afterwards, several common methods for supervised classification (k nearest neighbors, decision trees (CART), random forests, PLS and discriminant sparse PLS) were applied as predictors of the state of alertness of each subject. The proposed "slope criterion" was further refined using a genetic algorithm to select the EEG electrodes that are most important in terms of classification accuracy. Results show that the proposed strategy derives accurate predictive models of alertness.

This work has been published as a book chapter.

We propose a novel class of algorithms for low-rank matrix completion. Our approach builds on novel penalty functions on the singular values of the low-rank matrix. By exploiting a mixture-model representation of this penalty, we show that a suitably chosen set of latent variables enables us to derive an EM algorithm to obtain a Maximum A Posteriori estimate of the completed low-rank matrix. The resulting algorithm is an iterative soft-thresholding algorithm which iteratively adapts the shrinkage coefficients associated with the singular values.
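The plain, non-adaptive version of such an iteration is the soft-thresholded SVD used in Soft-Impute-type methods; our algorithm differs in how the shrinkage coefficients are adapted, but the sketch below (synthetic data, illustrative sizes and threshold) shows the basic mechanism:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, rank = 30, 20, 2
M = rng.normal(size=(n, rank)) @ rng.normal(size=(rank, p))   # true low-rank matrix
mask = rng.random((n, p)) < 0.6                               # 60% of entries observed

def soft_impute(Y, mask, lam, iters=200):
    """Iterative soft-thresholded SVD: fill the missing entries with the
    current estimate, then shrink the singular values by lam."""
    X = np.zeros_like(Y)
    for _ in range(iters):
        Z = np.where(mask, Y, X)
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        X = (U * np.maximum(s - lam, 0.0)) @ Vt
    return X

X = soft_impute(M, mask, lam=0.5)
err = np.linalg.norm((X - M)[~mask]) / np.linalg.norm(M[~mask])
print(round(err, 2))                  # relative error on the missing entries
```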

This work is in collaboration with Francois Caron from the University of Oxford. It has been presented at the national conference of the French Statistical Society.

The analysis and measurement of quality of life may be approached in two complementary ways. The first, based on surveys of individuals, concerns the analysis of levels of life satisfaction. We focus here on the second, based on national data, which analyses the living conditions of people. The aim is to create composite indices of living conditions. According to the authors, the components of quality of life are related to different themes (groups of variables): "Family conditions", "Employment", "Housing", etc. For this purpose, dimension-reduction methods are particularly suitable. Multiple Factor Analysis (MFA) is a method designed to handle data structured into groups of quantitative variables. In our study, each theme is composed of a group of quantitative and/or categorical variables. Since our data are naturally structured into groups of variables, we develop an extension of MFA for mixed data types, called MFAmix. The principal components from MFAmix are then our composite indices for measuring quality of life. However, the creation of these indices raises two questions. How many principal components should be kept to create the indices? How can a limited number of variables be selected so as to obtain similar indices that are easier to interpret? We propose answers to these questions in this communication.

This work is in collaboration with Vanessa Kuentz from Irstea. It has been presented at the French R users meeting (Rencontres R) and at the international conference COMPSTAT 2014.

We consider Jackson queueing networks with finite buffer constraints (JQN) and analyze the efficiency of sampling from their stationary distribution. In the context of exact sampling, the monotonicity structure of JQNs ensures that such efficiency is of the order of the 'coupling time' (or meeting time) of two extremal sample paths. In the context of approximate sampling, it is given by the 'mixing time'. Under a condition on the drift of the stochastic process underlying a JQN, which we call hyper-stability, in our main result we show that the coupling time is polynomial in both the number of queues and the buffer sizes. We then use this result to show that the mixing time of JQNs behaves similarly up to a given precision threshold. Our proof relies on a recursive formula relating the coupling times of trajectories that start from network states at 'distance one', and it can be used to analyze the coupling and mixing times of other Markovian networks, provided that they are monotone. An illustrative example is given in the context of JQNs with blocking mechanisms. This work has been published in an international journal.

We consider a queueing system composed of a dispatcher that deterministically routes jobs to a set of non-observable queues working in parallel. In this setting, the fundamental problem is which policy the dispatcher should implement to minimize the stationary mean waiting time of the incoming jobs. We present a structural property that holds in the classic scaling of the system, where the network demand (arrival rate of jobs) grows proportionally with the number of queues. Assume that each queue of type

This work proposes a model to study the interaction of price competition and congestion in the cloud computing marketplace. Specifically, we propose a three-tier market model that captures a marketplace with users purchasing services from Software-as-a-Service (SaaS) providers, which in turn purchase computing resources from either Platform-as-a-Service (PaaS) providers or Infrastructure-as-a-Service (IaaS) providers. Within each level, we define and characterize competitive equilibria. Further, we use these characterizations to understand the relative profitability of SaaSs and PaaSs/IaaSs, and to understand the impact of price competition on the user-experienced performance, i.e., the 'price of anarchy' of the cloud marketplace. Our results highlight that both of these depend fundamentally on the degree to which congestion results from shared or dedicated resources in the cloud. This work has been submitted to an international journal, and a preliminary version has been published.

Cloud computing is an emerging technology that allows users to access computing resources on a pay-per-use basis. The main challenges in this area are efficient performance management and the minimization of energy costs. In this work we model the service provisioning problem of Cloud Platform-as-a-Service systems as a Generalized Nash Equilibrium Problem and show that a potential function for the game exists. Moreover, we prove that the social optimum problem is convex and we derive some properties of social optima from the corresponding Karush-Kuhn-Tucker system. Next, we propose a distributed solution algorithm based on best response dynamics and prove its convergence to generalized Nash equilibria. Finally, we numerically evaluate equilibria in terms of their efficiency with respect to the social optimum of the Cloud by varying our algorithm's initial solution. Numerical results show that our algorithm is scalable and very efficient, and can thus be adopted for the run-time management of very large scale systems. This work has been published in an international journal.

We consider a discrete-time constrained Markov decision process under the discounted cost optimality criterion. The state and action spaces are assumed to be Borel spaces, while the cost and constraint functions might be unbounded. We are interested in approximating numerically the optimal discounted constrained cost. To this end, we suppose that the transition kernel of the Markov decision process is absolutely continuous with respect to some probability measure

We study a nonparametric method for estimating the conditional density associated with the jump rate of a piecewise-deterministic Markov process. In our framework, the estimation needs only one observation of the process within a long time interval. Our method relies on a generalization of Aalen's multiplicative intensity model. We prove the uniform consistency of our estimator, under some reasonable assumptions related to the primitive characteristics of the process. A simulation study illustrates the behavior of our estimator. This work has been published in the Scandinavian Journal of Statistics.

We consider a discrete-time Markov decision process with Borel state and action spaces, and possibly unbounded cost function. We assume that the Markov transition kernel is absolutely continuous with respect to some probability measure

This work is in keeping with the topic of two papers which treated dynamic reliability problems and were presented at previous conferences. Its aim is to confirm the potential of a method which combines the high modeling ability of piecewise deterministic processes with the great computing power of Monte Carlo simulation. This method is now applied to a simplified but realistic offshore oil production system, a hybrid system combining continuous-time and discrete-time dynamics. The results thus obtained have been compared, for validation purposes, with those given by an ad hoc Petri net model. This work has been published in an international journal.

We propose in this paper a numerical method which computes the trajectory of a vehicle subject to some mission objectives. The method is applied to a submarine whose goal is to best detect one or several targets (taking into account signal attenuation due to acoustic propagation) and/or to minimize its own detection range as perceived by the other targets. Our approach is based on dynamic programming for a finite-horizon Markov decision process. The position and the velocity of the targets are supposed to be known only up to a random estimation error, as a Kalman-type filter is used to estimate these quantities from the measurements given by the on-board sonar. We also take into account information on the environment through a sound propagation code. A quantization method is applied to fully discretize the problem and solve it numerically. This work is still in progress and was presented at an international conference.

We propose a numerical method for the optimal design and maintenance of the heated hold-up tank system. A multi-objective problem is framed to consider simultaneously the objectives of maximizing the operating profit and maximizing the reliability. The system consists of a tank containing a fluid whose level is controlled by three components: two inlet pumps and one outlet valve. A thermal power source heats up the fluid. The failure rates of the components depend on the temperature, the positions of the three components determine the liquid level in the tank, and the liquid level determines the temperature. We model the system by a piecewise deterministic Markov process. To find the optimal maintenance interval, the non-dominated sorting genetic algorithm II (NSGA-II) is used. This work is still in progress and was presented at an international conference.

We use quantization to construct a nonparametric estimator of the conditional quantiles of a scalar response Y given a d-dimensional vector of covariates X. First we focus on the population level and show how optimal quantization of X, which consists in discretizing X by projecting it on an appropriate grid of N points, makes it possible to approximate the conditional quantiles of Y given X. We show that this approximation becomes arbitrarily good as N goes to infinity and provide a rate of convergence for the approximation error. Then we turn to the sample case and define an estimator of conditional quantiles based on quantization ideas. We prove that this estimator is consistent for its fixed-N population counterpart. The results are illustrated on a numerical example. This work is in collaboration with Davy Paindaveine from Université Libre de Bruxelles. It has been presented at the national conference of the French Statistical Society and at the international conference on computational statistics.
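The idea can be sketched in a few lines of Python (univariate covariate, a toy model, and a fixed N; the actual method involves a data-driven choice of N): quantize X with Lloyd's algorithm, then estimate the conditional quantile at x by the empirical quantile of the responses falling in the same quantization cell:

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 5000, 20
X = rng.uniform(0, 1, n)
Y = np.sin(2 * np.pi * X) + 0.2 * rng.normal(size=n)   # toy regression model

# Lloyd's algorithm: an optimal (quadratic) quantization grid of N points for X.
grid = np.quantile(X, (np.arange(N) + 0.5) / N)        # reasonable initialization
for _ in range(50):
    cells = np.argmin(np.abs(X[:, None] - grid[None, :]), axis=1)
    grid = np.array([X[cells == j].mean() for j in range(N)])
cells = np.argmin(np.abs(X[:, None] - grid[None, :]), axis=1)

def cond_quantile(x, alpha):
    """Estimated alpha-quantile of Y given X = x: empirical quantile of the
    responses whose covariate falls in the same quantization cell as x."""
    j = np.argmin(np.abs(grid - x))
    return np.quantile(Y[cells == j], alpha)

print(round(cond_quantile(0.25, 0.5), 2))              # median near sin(pi/2) = 1
```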

A promising nonparametric estimator of conditional quantiles based on optimal quantization was recently introduced, with a focus almost exclusively on its theoretical properties. We now (i) discuss its practical implementation (by proposing in particular a method to properly select the corresponding smoothing parameter, namely the number of quantizers) and (ii) investigate how its finite-sample performance compares with that of classical kernel or nearest-neighbor competitors. Monte Carlo studies show that the quantization-based estimator competes well in all cases (in terms of mean squared error) and tends to dominate its competitors as soon as the covariate is not uniformly distributed over its support. We also apply our approach to a real data set. While most of the paper focuses on the case of a univariate covariate, we also briefly discuss the multivariate case and provide an illustration for bivariate regressors. This work is in collaboration with Davy Paindaveine from Université Libre de Bruxelles. It has been presented at the national conference of the French Statistical Society and at the international conference on computational statistics.

Quantile regression allows one to assess the impact of a covariate X on a response Y. An important application is the construction of reference curves and conditional prediction intervals for Y. A new nonparametric quantile regression method based on the concept of optimal quantization was recently developed. We now describe an R package, called QuantifQuantile, that performs quantization-based quantile regression. We describe the various functions of the package and provide examples. This work is in collaboration with Davy Paindaveine from Université Libre de Bruxelles. It has been presented at the national conference on the R software.

Identifying the specific effects of contaminants in a multi-stress field context remains a challenge in ecotoxicology. In this context, "omics" technologies, by allowing the simultaneous measurement of numerous biological endpoints, could help unravel the in situ toxicity of contaminants. In this study, wild Atlantic eels were sampled in 8 sites presenting a broad contamination gradient in France and Canada. The global hepatic transcriptome of the animals was determined by RNA-Seq. In parallel, the contamination level of the fish with respect to 8 metals and 25 organic pollutants was determined. Factor analysis for multiple testing was used to identify genes that are most likely to be related to a single factor. Among the variables analyzed, arsenic (As), cadmium (Cd), lindane (γ-HCH) and the hepato-somatic index (HSI) were found to be the main factors affecting the eels' transcriptome. Genes associated with As exposure were involved in the mechanisms that have been described during As vasculotoxicity in mammals. Genes correlated with Cd were involved in the cell cycle and energy metabolism. For γ-HCH, genes were involved in lipolysis and cell growth. Genes associated with HSI were involved in protein, lipid and iron metabolism. Our study proposes specific gene signatures of pollutants and their impacts on fish exposed to multi-stress conditions.

This work is in collaboration with G. Durrieu from Vannes University and R. Coudret. It will be published in Ecotoxicology .

A data-driven bandwidth choice for a kernel density estimator, called the critical bandwidth, is investigated. This procedure allows the estimate to have as many modes as are assumed for the density to be estimated. Both Gaussian and uniform kernels are considered. For the Gaussian kernel, asymptotic results are given. For the uniform kernel, an argument against these properties is mentioned. These theoretical results are illustrated with a simulation study that compares the kernel estimators relying on the critical bandwidth with one that uses a plug-in method to select its bandwidth. An estimator that consists of estimates of density contour clusters and takes assumptions on the number of modes into account is also considered. Finally, the methodology is illustrated using environmental monitoring data.

This work is in collaboration with G. Durrieu from Vannes University and R. Coudret. It will be published in Communications in Statistics - Simulation and Computation.
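As an illustration of the idea (a sketch, not the code used in the paper), the critical bandwidth for a Gaussian kernel can be approximated by bisecting on h while counting the modes of the kernel density estimate on a grid:

```python
import numpy as np

def n_modes(x, h, grid):
    """Number of local maxima of a Gaussian kernel density estimate on a grid."""
    dens = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / h) ** 2).sum(axis=1)
    interior = dens[1:-1]
    return int(np.sum((interior > dens[:-2]) & (interior > dens[2:])))

def critical_bandwidth(x, k=1, tol=1e-3):
    """Smallest h (up to tol) such that the Gaussian KDE has at most k modes."""
    grid = np.linspace(x.min() - 1, x.max() + 1, 400)
    lo, hi = 1e-3, x.max() - x.min() + 1.0   # the KDE is unimodal for large h
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if n_modes(x, mid, grid) <= k:
            hi = mid    # few enough modes: try a smaller bandwidth
        else:
            lo = mid
    return hi

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(6, 1, 200)])
h1 = critical_bandwidth(x, k=1)   # forcing unimodality requires a large bandwidth
h2 = critical_bandwidth(x, k=2)   # allowing two modes permits a smaller one
```

The bisection relies on the fact, specific to the Gaussian kernel, that the number of modes is non-increasing in the bandwidth; this is precisely why the uniform kernel behaves differently, as discussed above.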

A semiparametric regression model of a q-dimensional multivariate response y on a p-dimensional covariate x is considered. A new approach based on sliced inverse regression (SIR) is proposed for estimating the effective dimension reduction (EDR) space without requiring a prespecified parametric model. The estimated EDR space is shown to converge at the root-n rate. The choice of the dimension of the EDR space is discussed. Moreover, a way to cluster components of y related to the same EDR space is provided, so that the proposed multivariate SIR method can be used properly on each cluster instead of being blindly applied to all components of y. The numerical performance of multivariate SIR is illustrated on a simulation study. Applications to a remote sensing dataset and to the Minneapolis elementary schools data are also provided. Although the proposed methodology relies on SIR, it opens the door to new regression approaches with a multivariate response.

This work is in collaboration with S. Girard from Inria MISTIS team and R. Coudret. It is published in CSDA .
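The basic single-response SIR step underlying this approach can be sketched as follows; this is a textbook version for a univariate response, not the multivariate procedure of the paper:

```python
import numpy as np

def sir(X, y, n_slices=10, d=1):
    """Sliced inverse regression: estimate a basis of the EDR space."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    # whiten the covariates
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ W
    # slice the response and collect the slice means of the whitened covariates
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)
    M = sum(len(s) / n * np.outer(Z[s].mean(axis=0), Z[s].mean(axis=0))
            for s in slices)
    # leading eigenvectors of M, mapped back to the original scale
    vals, vecs = np.linalg.eigh(M)
    return W @ vecs[:, ::-1][:, :d]     # columns span the estimated EDR space

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 5))
beta = np.array([1.0, 2.0, 0.0, 0.0, 0.0]) / np.sqrt(5)
y = (X @ beta) ** 3 + 0.1 * rng.normal(size=1000)
b_hat = sir(X, y, d=1).ravel()
b_hat /= np.linalg.norm(b_hat)
cos = abs(b_hat @ beta)   # |cosine| between estimate and true direction
```

The eigenvalues of M also drive the choice of the dimension d of the EDR space discussed above: directions with eigenvalues near zero carry no information about y.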

Nonparametric regression is a powerful tool to estimate nonlinear relations between some predictors and a response variable. However, when the number of predictors is high, nonparametric estimators may suffer from the curse of dimensionality. In this chapter, we show how a dimension reduction method (namely Sliced Inverse Regression) can be combined with nonparametric kernel regression to overcome this drawback. The methods are illustrated on simulated datasets as well as on an astronomy dataset, using the R software.

This work is in collaboration with S. Girard from Inria MISTIS team .

As part of optimizing reliability, Thales Optronics now includes systems that monitor the state of its equipment. The aim of this paper is to use a hidden Markov model to detect as soon as possible a change of state of optronic equipment in order to propose maintenance before failure. For this, we carefully observe the dynamics of a variable called "cool down time", denoted Tmf, which reflects the state of the cooling system. Indeed, the Tmf is an indirect observation of the hidden state of the system: the latter is modelled by a Markov chain and the Tmf is a noisy function of it. Thanks to filtering equations, we obtain the probability that an appliance is in the degraded state at time t, given the history of the Tmf up to that moment. We have evaluated the numerical behavior of our approach on simulated data. We have then applied this methodology to our real data and checked that the results are consistent with reality. This method can be implemented in a HUMS (Health and Usage Monitoring System). This simple example of a HUMS would allow Thales Optronics to improve its maintenance system: the company would be able to recall appliances estimated to be in a degraded state and avoid inspecting too soon those estimated to be in a stable state.

This work is in collaboration with A. Gegout-Petit from Lorraine University. It is published in Journal de la SFdS .
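The filtering recursion involved is the standard discrete-state one. The sketch below uses a hypothetical two-state model with Gaussian observations; all numbers are illustrative placeholders, not the Thales data:

```python
import numpy as np

def filter_probs(obs, P, mu, sigma, p0):
    """Discrete HMM filter: p_t(i) = P(state_t = i | obs_1..t)."""
    p = np.array(p0, float)
    out = []
    for o in obs:
        pred = p @ P                                          # prediction step
        lik = np.exp(-0.5 * ((o - mu) / sigma) ** 2) / sigma  # Gaussian likelihoods
        p = pred * lik
        p /= p.sum()                                          # Bayes normalization
        out.append(p.copy())
    return np.array(out)

# hypothetical 2-state model: 0 = stable, 1 = degraded
P = np.array([[0.98, 0.02],    # slow drift towards degradation
              [0.00, 1.00]])   # degradation is absorbing
mu = np.array([5.0, 8.0])      # mean cool-down time in each state
sigma = np.array([0.5, 0.5])
obs = [5.1, 4.9, 5.0, 7.8, 8.2, 8.0]   # the Tmf drifts upward at t = 4
probs = filter_probs(obs, P, mu, sigma, [1.0, 0.0])
# probs[t, 1] is the filtered probability of being degraded given obs up to t
```

A maintenance rule of the kind described above would then recall the appliance as soon as probs[t, 1] exceeds a chosen threshold.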

We investigate the asymptotic behavior of the Nadaraya-Watson estimator for the estimation of the regression function in a semiparametric regression model. On the one hand, we make use of the recursive version of the sliced inverse regression method for the estimation of the unknown parameter of the model. On the other hand, we implement a recursive Nadaraya-Watson procedure for the estimation of the regression function which takes into account the previous estimation of the parameter of the semiparametric regression model. We establish the almost sure convergence as well as the asymptotic normality for our Nadaraya-Watson estimator. We also illustrate our semiparametric estimation procedure on simulated data.

This work is in collaboration with B. Bercu from Bordeaux University and T.M.N Nguyen. It is published in Statistics .
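A recursive Nadaraya-Watson estimator can be sketched as follows. This is a toy version with a fixed evaluation point and an assumed parametric bandwidth schedule; the paper's procedure additionally plugs in the recursive SIR estimate of the model parameter:

```python
import math
import random

class RecursiveNW:
    """Recursive Nadaraya-Watson estimator of m(x0) = E[Y | X = x0].

    Each new observation updates two running sums, so earlier kernel
    evaluations never need to be recomputed (bandwidth h_n = c * n^(-1/5))."""
    def __init__(self, x0, c=1.0):
        self.x0, self.c, self.n = x0, c, 0
        self.num, self.den = 0.0, 0.0

    def update(self, x, y):
        self.n += 1
        h = self.c * self.n ** (-0.2)
        k = math.exp(-0.5 * ((x - self.x0) / h) ** 2) / h   # Gaussian kernel
        self.num += k * y
        self.den += k

    def estimate(self):
        return self.num / self.den

random.seed(3)
est = RecursiveNW(x0=0.5, c=0.5)
for _ in range(5000):
    x = random.uniform(0, 1)
    est.update(x, math.sin(math.pi * x) + 0.1 * random.gauss(0, 1))
# est.estimate() approaches m(0.5) = sin(pi / 2) = 1 as observations accumulate
```

The recursive form is what makes the almost sure convergence and asymptotic normality results above natural to state: the estimator is updated online as a stochastic approximation scheme.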

Novelty Search (NS) is a unique approach towards search and optimization, where an explicit objective function is replaced by a measure of solution novelty to provide the selective pressure in an artificial evolutionary system. However, NS has mostly been used in evolutionary robotics, while its applicability to classic machine learning problems remains largely unexplored. This work presents an NS-based Genetic Programming (GP) algorithm for supervised classification, with the following noteworthy contributions. It is shown that NS can solve real-world classification tasks, validated over several commonly used benchmarks. These results are made possible by using a domain-specific behavioral descriptor, closely related to the concept of semantics in GP. Moreover, two new variants of the NS algorithm are proposed, Probabilistic NS (PNS) and a variant of Minimum Criterion NS (MCNS). The former models the behavior of each solution as a random vector, eliminating all the NS parameters and reducing the computational overhead of the traditional NS algorithm; the latter uses a standard objective function to constrain the search and bias the process towards high-performance solutions. The paper also discusses the effects of NS on an important GP phenomenon, bloat. In particular, results indicate that some variants of the NS approach can have a beneficial effect on the search process by curtailing code growth. See .
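The core of NS, scoring a solution by the sparseness of its behavior descriptor rather than by an objective, can be sketched in a few lines. This is a generic k-nearest-neighbor novelty measure, not the specific descriptors of the paper:

```python
def novelty(behavior, others, k=3):
    """Novelty score of a behavior descriptor: mean Euclidean distance to its
    k nearest neighbors among behaviors seen so far (archive + population)."""
    dists = sorted(sum((a - b) ** 2 for a, b in zip(behavior, other)) ** 0.5
                   for other in others)
    return sum(dists[:k]) / min(k, len(dists))

# toy archive of 2-D behavior descriptors
archive = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0)]
crowded = novelty((0.05, 0.05), archive)  # behavior in a dense region
sparse = novelty((2.0, 2.0), archive)     # behavior far from everything seen
```

Selection then favors high-novelty individuals, which is exactly the pressure that the MCNS variant described above combines with a standard objective.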

The objective of the present work is to develop a method able to automatically determine mental states of vigilance, i.e., a person's state of alertness. Such a task is relevant to diverse domains where a person is expected or required to be in a particular state of mind. For instance, pilots and medical staff are expected to be in a highly alert state, and the proposed method could help detect possible deviations from this expected state. This work poses a binary classification problem where the goal is to distinguish between a "relaxed" state and a baseline ("normal") state from the study of electroencephalographic (EEG) signals collected with a small number of electrodes. The EEG of 58 subjects in the two alertness states (116 recordings) was collected via a cap with 58 electrodes. After a data validation step, 19 subjects were retained for further analysis. A genetic algorithm was used to select a subset of electrodes. Common spatial pattern (CSP) coupled with linear discriminant analysis (LDA) was used to build a decision rule and thus predict the alertness of the subjects. Different subset sizes were investigated, and the best compromise between the number of selected electrodes and the quality of the solution was obtained with 9 electrodes. Even if the present approach is costly in computation time (the GA search), it makes it possible to construct a decision rule that provides an accurate and fast prediction of the alertness state of an unseen individual. See , .

The canonical approach to fitness evaluation in Genetic Programming (GP) is to use a static training set to determine fitness, based on a cost function (e.g. root-mean-squared error) averaged over all cases. However, motivated by different goals, researchers have recently proposed several techniques that focus selective pressure on a subset of fitness cases at each generation. These approaches can be described as fitness-case sampling techniques, where the training set is sampled, in some way, to determine fitness. This paper presents a comprehensive evaluation of several sampling methods using benchmark problems and real-world problems. The algorithms considered here are Interleaved Sampling, Random Interleaved Sampling, Lexicase Selection and a newly proposed technique called Keep-Worst Interleaved Sampling (KW-IS). The algorithms are extensively evaluated based on test performance, overfitting and bloat. Results suggest that sampling techniques can improve performance in terms of testing error, bloat and overfitting compared to standard GP. Some of the best results were achieved by Lexicase Selection and Keep-Worst Interleaved Sampling, which obtained good results on overfitting and bloat and exhibit a good compromise among the considered performance measures. Results also show that on these problems overfitting correlates strongly with bloat.

Since its introduction, Geometric Semantic Genetic Programming (GSGP) has aroused the interest of numerous researchers and several studies have demonstrated that GSGP is able to effectively optimize training data by means of small variation steps, that also have the effect of limiting overfitting. In order to speed up the search process, in this paper we propose a system that integrates a local search strategy into GSGP (called GSGP-LS). Furthermore, we present a hybrid approach, that combines GSGP and GSGP-LS, aimed at exploiting both the optimization speed of GSGP-LS and the ability to limit overfitting of GSGP. The experimental results we present, performed on a set of complex real-life applications, show that GSGP-LS achieves the best training fitness while converging very quickly, but severely overfits; GSGP converges very slowly, but is basically not affected by overfitting. The best overall results were achieved with the hybrid approach, allowing the search to converge quickly, while also exhibiting a noteworthy ability to limit overfitting. These results are encouraging, and suggest that future GSGP algorithms should focus on finding the correct balance between the greedy optimization of a local search strategy and the more robust geometric semantic operators.

We are interested in the optimization of a launcher integration process. It comprises several steps, from the production of the subassemblies to the final launch. The four subassemblies go through various types of operations such as preparation, integration, control and storage. These operations are split up into three workshops. Due to possible breakdowns or staff issues, the time spent in each workshop is assumed to be random, as is the time needed to deliver the subassemblies, for similar reasons, e.g. shipping delays. We also have to deal with constraints related to the architecture of the assembly process itself. Indeed, we have to take into account waiting policies between workshops. The workshops may work in parallel but can be blocked if their output is not transferred to the next workshop in line. Storage capacity for output products is limited.

Our goal is to find the best delivery rates for the subassemblies, the best choice of architecture (regarding stock capacities) and the best times at which to stop and restart the workshops, in order to carry out twelve launches a year according to a predetermined schedule at minimal cost. To solve this problem, we choose a mathematical model particularly suitable for optimization under randomness: Markov decision processes (MDPs).

We have implemented a numerical simulator of the process based on the MDP model. It provides the fullest information possible on the process at any time. The simulator has first been validated with deterministic histories. Random histories have then been run with exponentially distributed delivery times for the subassemblies and several families of random laws for the time spent in each workshop. Using Monte Carlo simulations, we obtain the distribution of the launch times. Preliminary optimization results allow choosing stock capacities and delivery rates that satisfy the launch schedule.
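A Monte Carlo study of this kind can be sketched as follows. The structure (wait for four deliveries, then three workshops in series, no blocking) and all distributions and parameters below are deliberately simplified placeholders, not the industrial model:

```python
import random

def launch_time(rng):
    """Toy model of one assembly cycle: delivery of four subassemblies,
    then three workshops in series. All parameters are illustrative."""
    # integration starts once all four subassemblies have arrived
    delivery = max(rng.expovariate(1 / 5.0) for _ in range(4))
    # random time spent in each of the three workshops
    workshops = sum(rng.gammavariate(2.0, 3.0) for _ in range(3))
    return delivery + workshops

rng = random.Random(4)
samples = [launch_time(rng) for _ in range(20000)]
mean_t = sum(samples) / len(samples)
p95 = sorted(samples)[int(0.95 * len(samples))]  # 95% quantile of the launch time
```

The empirical distribution of the samples plays the role of the launch-time distribution mentioned above; checking a quantile such as p95 against the schedule is what drives the choice of stock capacities and delivery rates.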

In this context, the PhD thesis of Christophe Nivot (2013-2016) is funded by the Chaire Inria-Astrium-EADS IW-Conseil régional d'Aquitaine.

Integrated maintenance, failure intensity, optimisation.

As part of optimizing reliability, Thales Optronics includes systems that monitor the state of its equipment. This function is performed by a HUMS (Health and Usage Monitoring System). The collaboration is the subject of the PhD of Alizé Geeraert (CIFRE). The aim of this thesis is to implement in the HUMS a program, based on observations, that can determine the state of the system, optimize maintenance operations and evaluate the failure risk of a mission.

The Chaire is funding the PhD thesis of Christophe Nivot on the optimization of the assembly line of a launcher. The process comprises several steps, from the production of the subassemblies to the final launch. The aim of the thesis is to find the best delivery rates for the subassemblies, the best choice of architecture (regarding stock capacities) and the best times at which to stop and restart the workshops, in order to carry out twelve launches a year according to a predetermined schedule at minimal cost.

The topic of the project is "Advanced statistical methods for the analysis of multidimensional databases of human brain imaging". The project focuses on the analysis of variability factors driving hemispheric specialization (HS) of the brain, a human-specific character, for which a dedicated database has recently been built by GIN (Neurofunctional Imaging Group). GIN provides the database and performs genotyping of fifty loci potentially affecting HS. The "Probability and Statistics" group (EPS) of the LabEx CPU works on the methodological development of statistical tools to analyze these high-dimensional data. Interactions between GIN and EPS make it possible to identify and characterize the best variables, to perform additional analyses, and to suggest appropriate additional variables, especially at the voxel level. GIN is also involved in the interpretation of the statistical results generated throughout the project.

Dr Solveig Badillo was hired as a postdoctoral researcher on this project in May 2014, for 20 months.

The ANR project ADAPTEAU has been obtained for the period 2012-2016 and will start in January 2012.

ADAPTEAU aims to contribute to the analysis and management of global change impacts and adaptation patterns in River-Estuarine Environments (REEs) by interpreting the scientific challenges associated with climate change in terms of: i) scale mismatches; ii) uncertainty and cognitive biases between social actors; iii) interdisciplinary dialogue on the "adaptation" concept; iv) critical insights on adaptive governance and actions, v) understanding the diversity of professional, social and economic practices vis-à-vis global change. The project aims to build an integrative and interdisciplinary framework involving biophysical and social sciences, as well as stakeholders and civil society partners. The main objective is to identify adaptive strategies able to face the stakes of global change in REEs, on the basis of what we call ‘innovative adaptation options’.

We consider the adaptation of Social-Ecological Systems (SES) to the expected variations of the hydrological regimes (floods / low flows) of the Garonne-Gironde REE, a salient issue in SW France, yet with a high potential for genericity. The ADAPTEAU project is organised as follows:

Achieve and confront socio-economic and environmental assessments of expected CC impacts on the Garonne-Gironde river-estuarine continuum (task 1);

Identify the emerging ‘innovative adaptation options’ endorsed by various social, economic, political actors of the territory (depolderisation, ‘room for rivers’ strategies, changes in economic activities, agricultural systems or social practices), then test their environmental, economic and social robustness through a selected subset (task 2);

Build adaptation scenarios through collaboration between scientists, representatives of administrations and civil society, and discuss them in pluralistic arenas in order to evaluate their social and economic feasibility, as well as the most appropriate governance modes (task 3);

Disseminate the adaptation strategies to academics and managers, as well as to the broader society (task 4).

The expected results are the definition and diffusion of new regional-scale reference frameworks for the discussion of adaptation scenarios in REE and other SESs, as well as action guidelines to better address climate change stakes.

The CQFD team works on tasks 1 and 3.

ANR Piece (2013-2016), of the program *Jeunes chercheuses et jeunes chercheurs* of the French National Research Agency (ANR), led by F. Malrieu (Univ. Tours). Piecewise Deterministic Markov Processes (PDMPs) are non-diffusive stochastic processes which naturally appear in many areas of application, such as communication networks, neuron activity, biological populations or the reliability of complex systems. Their mathematical study has been intensively carried out over the past two decades, but many challenging problems remain completely open. This project aims at federating a group of experts with different backgrounds (probability, statistics, analysis, partial differential equations, modeling) in order to pool everyone's knowledge and create new tools to study PDMPs. The main lines of the project relate to estimation, simulation and asymptotic behavior (long time, large populations, multi-scale problems) in the various contexts of application.

Statistical methods have become more and more popular in signal and image processing over the past decades. These methods have been able to tackle various applications such as speech recognition, object tracking, image segmentation or restoration, classification, clustering, etc. We propose here to investigate the use of Bayesian nonparametric methods in statistical signal and image processing. Similarly to Bayesian parametric methods, this set of methods is concerned with the elicitation of prior and computation of posterior distributions, but now on infinite-dimensional parameter spaces. Although these methods have become very popular in statistics and machine learning over the last 15 years, their potential is largely underexploited in signal and image processing. The aim of the overall project, which gathers researchers in applied probabilities, statistics, machine learning and signal and image processing, is to develop a new framework for the statistical signal and image processing communities. Based on results from statistics and machine learning we aim at defining new models, methods and algorithms for statistical signal and image processing. Applications to hyperspectral image analysis, image segmentation, GPS localization, image restoration or space-time tomographic reconstruction will allow various concrete illustrations of the theoretical advances and validation on real data coming from realistic contexts.
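A standard entry point to the Bayesian nonparametric priors mentioned above is the Dirichlet process, whose weights can be simulated by the (truncated) stick-breaking construction. This is a generic illustration of the concept, not a method from the project:

```python
import random

def stick_breaking(alpha, n_atoms, rng):
    """Truncated stick-breaking construction of Dirichlet process weights:
    repeatedly break off a Beta(1, alpha) fraction of the remaining stick."""
    weights, remaining = [], 1.0
    for _ in range(n_atoms):
        v = rng.betavariate(1, alpha)
        weights.append(remaining * v)
        remaining *= 1 - v
    return weights

rng = random.Random(5)
w = stick_breaking(alpha=2.0, n_atoms=100, rng=rng)
# w is a (nearly) normalized infinite-dimensional mixture weight vector,
# truncated at 100 atoms; smaller alpha concentrates mass on fewer atoms
```

Pairing each weight with an atom drawn from a base measure yields a random discrete distribution, which is the building block for the nonparametric mixture models used in segmentation and clustering applications.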

**IRSES FP7 MARIE CURIE ACOBSEC**: http://

Over the last decade, Human-Computer Interaction (HCI) has grown and matured as a field. Gone are the days when only a mouse and keyboard could be used to interact with a computer. The most ambitious of such interfaces are Brain-Computer Interaction (BCI) systems. BCI’s goal is to allow a person to interact with an artificial system using brain activity. A common approach towards BCI is to analyze, categorize and interpret Electroencephalography (EEG) signals in such a way that they alter the state of a computer. ACoBSEC’s objective is to study the development of computer systems for the automatic analysis and classification of mental states of vigilance; i.e., a person’s state of alertness. Such a task is relevant to diverse domains, where a person is required to be in a particular state. This problem is not a trivial one. In fact, EEG signals are known to be noisy, irregular and tend to vary from person to person, making the development of general techniques a very difficult scientific endeavor. Our aim is to develop new search and optimization strategies, based on evolutionary computation (EC) and genetic programming (GP) for the automatic induction of efficient and accurate classifiers. EC and GP are search techniques that can reach good solutions in multi-modal, non-differentiable and discontinuous spaces; and such is the case for the problem addressed here. This project combines the expertise of research partners from five converging fields: Classification, Neurosciences, Signal Processing, Evolutionary Computation and Parallel Computing in Europe (France Inria, Portugal INESC-ID, Spain UNEX) and South America (Mexico ITT, CICESE). The exchange program goals and milestones give a comprehensive strategy for the strengthening of current scientific relations amongst partners, as well as for the construction of long-lasting scientific relationships that produce high quality theoretical and applied research.

**Numerical methods for Markov decision processes (2013-2015)**. This project is funded by the Gobierno de España, Dirección General de Investigación Científica y Técnica (reference number: MTM2012-31393) for three years to support the scientific collaboration between Tomas Prieto-Rumeau, Jonatha Anselmi and François Dufour. This research project is concerned with numerical methods for Markov decision processes (MDPs). Namely, we are interested in numerically approximating the optimal value function and the optimal controls for different classes of constrained and unconstrained MDPs. Our methods are based on combining the linear programming formulation of an MDP with a discretization procedure, referred to as quantization, of the probability distribution underlying the random transitions of the dynamic system. We are concerned with optimality criteria such as the total expected cost criterion (for finite horizon problems) and, on the other hand, the total expected discounted cost and average cost optimality criteria (for infinite horizon problems).
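For intuition, the discounted-cost criterion on a finite (e.g. quantized) state space can be solved by standard dynamic programming. The sketch below uses value iteration on a tiny two-state, maintenance-flavored MDP with made-up numbers; the project itself relies on the linear programming formulation:

```python
def value_iteration(P, c, gamma=0.9, tol=1e-8):
    """Value iteration for a finite discounted-cost MDP.
    P[a][i][j]: transition probabilities, c[a][i]: one-step costs."""
    n = len(c[0])
    V = [0.0] * n
    while True:
        Vn = [min(c[a][i] + gamma * sum(P[a][i][j] * V[j] for j in range(n))
                  for a in range(len(c)))
              for i in range(n)]
        if max(abs(Vn[i] - V[i]) for i in range(n)) < tol:
            return Vn
        V = Vn

# two states (0 = good, 1 = worn), two actions: action 1 costs more now
# but keeps the system in the cheap state (all numbers are illustrative)
P = [[[0.5, 0.5], [0.1, 0.9]],   # action 0: do nothing, system degrades
     [[0.9, 0.1], [0.9, 0.1]]]   # action 1: repair
c = [[0.0, 2.0], [1.0, 1.5]]
V = value_iteration(P, c)        # optimal discounted cost from each state
```

The linear programming formulation used in the project computes the same value function as the solution of a linear program over occupation measures, which is what makes the quantization step compatible with constrained criteria.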

**Control of Dynamic Systems Subject to Stochastic Jumps** USP-COFECUB grant (2013-2014).
This collaboration is also supported by the **Associate Team Inria: CDSS (2014-2016)**.
The main goal of this joint cooperation is to study the control of dynamic systems subject to stochastic jumps. Three topics are considered.
In the first topic we study the control problem of piecewise deterministic Markov processes (PDMPs) under constraints. In this case the main goal is to obtain a theoretical formulation of the equivalence between the original constrained optimal control problem of PDMPs and an infinite-dimensional static linear optimization problem over a space of occupation measures of the controlled process. This topic is carried out by F. Dufour at Inria and O. Costa at USP.
In the second topic we focus on numerical methods for solving control and filtering problems related to Markov jump linear systems (MJLS). This project allows a first cooperation between B. de Saporta and E. Costa. The third research subject focuses on quantum control using Lyapunov-like stochastic methods and is conducted by P. Rouchon and P. Pereira da Silva.

Tomas Prieto-Rumeau (Department of Statistics and Operations Research, UNED, Madrid, Spain) visited the team for two weeks in 2014. The main subject of the collaboration is the approximation of Markov decision processes.

Oswaldo Costa (Escola Politécnica da Universidade de São Paulo, Brazil) collaborates with the team on the theoretical aspects of continuous control of piecewise deterministic Markov processes. He visited the team for two weeks in 2014, supported by the USP-COFECUB grant and the Associate Team Inria CDSS.

Alexey Piunovskiy (University of Liverpool) visited the team for six weeks in 2014. The main subject of the collaboration is the linear programming approach for Markov decision processes. This research was supported by the Cluster of Excellence CPU.

Giuliano Casale (Imperial College) was invited from December 10th to December 12th, 2014, to continue his collaboration with Jonatha Anselmi.

Leonardo Trujillo (ITT Tijuana, Mexico) visited the team for one month in October 2014 to continue his collaboration with Pierrick Legrand.

François Dufour visited Alexey Piunovskiy (University of Liverpool) to continue his work on the linear programming approach for Markov decision processes.

Pierrick Legrand visited Leonardo Trujillo (ITT Tijuana, Mexico) in November 2014.

Jonatha Anselmi has been a member of the program committees of the 8th International Conference on Performance Evaluation Methodologies and Tools (VALUETOOLS) and of the 21st International Conference on Analytical and Stochastic Modelling Techniques and Applications (ASMTA).

Pierrick Legrand has been a member of the program committees of the EVOLVE 2015 International Conference and of the Genetic and Evolutionary Computation Conference (GECCO 2014).

All the members of the team are regular reviewers for several conferences in applied probability and statistics.

F. Dufour has been an associate editor of SIAM Journal on Control and Optimization since 2009.

J. Saracco has been an associate editor of the journal Case Studies in Business, Industry and Government Statistics (CSBIGS) since 2006.

All the members of the team are regular reviewers for several journals in applied probability and statistics.

Licence : F. Dufour, Probabilités et statistiques, 16 heures, niveau L3, Institut Polytechnique de Bordeaux, école ENSEIRB-MATMECA, France.

Master : F. Dufour, Méthodes numériques pour la fiabilité, 24 heures, niveau M1, Institut Polytechnique de Bordeaux, école ENSEIRB-MATMECA, France.

Master : F. Dufour, Probabilités, 20 heures, niveau M1, Institut Polytechnique de Bordeaux, école ENSEIRB-MATMECA, France.

Licence : J. Anselmi, Probabilités, 16 heures, niveau L3, Institut Polytechnique de Bordeaux, école ENSEIRB-MATMECA, France.

Licence : M. Chavent, Statistique descriptive, 36 ETD, niveau L1, Bordeaux University, France

Licence : M. Chavent, Modélisation statistique, 18 ETD, niveau L3, Bordeaux University, France

Master : M. Chavent, Analyse des données 2, 25 ETD, niveau M2, Bordeaux University, France

Master : M. Chavent, Scoring, 21 ETD, niveau M2, Bordeaux University, France

Licence: J. Saracco, Descriptive statistics, 10.5h, L3, First year of ENSC, France

Licence: J. Saracco, Mathematical statistics, 20h, L3, First year of ENSC, France

Licence: J. Saracco, Data analysis (multidimensional statistics), 20h, L3, First year of ENSC, France

Licence: J. Saracco, Mathematics (complement of linear algebra), 20h, L3, First year of ENSC, France

Master: J. Saracco, Statistical modeling, 20h, M1, Second year of ENSC, France

Master: J. Saracco, training project, 20h, M1, Second year of ENSC, France

Licence : B. de Saporta, Logiciels scientifiques 15h ETD, M1, université Montpellier 2, France

Master : B. de Saporta, Processus de Markov, 31,5h ETD, M2, université Montpellier 2, France

P. Legrand, Mathématiques générales (course coordinator), Licence 1 SCIMS (138 hours)

P. Legrand, Informatique pour les mathématiques (course coordinator), Licence 1 SCIMS (36 hours)

P. Legrand, Complément d'Algèbre/Espaces Eucl. (course coordinator), Licence 2 SCIMS (54 hours)

PhD completed : Karim Claudio, Un outil d'aide à la maîtrise des pertes dans les réseaux d'eau potable : mise en place d'un modèle de fuite multi-état en secteur hydraulique instrumenté, supervised by J. Saracco and V. Couallier.

PhD in progress : Amaury Labenne, Approche Statistique du diagnostic territorial par la notion de qualité de vie, supervised by M. Chavent, J. Saracco and V. Kuentz.

PhD in progress : Adrien Todeschini, Elaboration et validation d’un système de recommandation bayésien, supervised by F. Caron and M. Chavent.

PhD in progress : Isabelle Charlier, Optimal quantization applied to conditional quantile estimation, University of Bordeaux and Université Libre de Bruxelles, supervised by J. Saracco and D. Paindaveine.

PhD in progress : Christophe Nivot, Optimisation de la chaîne de montage du futur lanceur européen, September 2013, supervised by B. de Saporta and F. Dufour.

PhD in progress : Alizé Geeraert, Contrôle optimal des processus Markoviens déterministes par morceaux et application à la maintenance, University of Bordeaux, September 2014, supervised by B. de Saporta and F. Dufour.

Nicolas Antunes: Application d'algorithmes prédictifs à l'identification de niches écoculturelles des populations du passé : approche ethnoarchéologique. Funded by the ERC grant of F. D'Errico. Co-supervision: D'Errico, Del Moral, Legrand. This thesis uses GARP-type algorithms to predict the existence of ecological niches from climatological data. 2011-2014.

Emigdio Z. Flores Lopez, "Classification of mental states with genetic programming", PhD in engineering sciences. Funded by a Conacyt (Consejo Nacional de Ciencia y Tecnología) national scholarship for PNPC programs (Programa Nacional de Posgrados de Calidad), Mexico. Co-supervision: L. Trujillo (50%), P. Legrand (50%). 2013-2016.

B. de Saporta was a member of the PhD defense jury of Coralie Fritsch, université Montpellier 2, France.

M. Chavent was a member of the CR2 hiring committee (concours) of Inria Bordeaux - Sud-Ouest.

J. Saracco was a member of the PhD defense jury of Hussein Hashem, Brunel University, UK.

J. Saracco was a member of the PhD defense jury of Karim Claudio, Bordeaux University, France.

J. Saracco was a member of various juries for positions in French universities (Bordeaux, professor; Poitiers, assistant professor; Orléans, professor) in April-May 2014.

M. Chavent and J. Saracco are elected members of the CNU section 26.

B. de Saporta was an elected member of the CNU section 26 until September 2014.

J. Saracco is vice-president of the French Statistical Society (SFdS).