CQFD is a joint INRIA team with the University of Bordeaux (UB1, UB2 and UB4), CNRS (IMB, UMR 5251) and the Institut Polytechnique de Bordeaux.

Economic, scientific and military competition leads many industrial sectors to design ever more efficient and reliable processes and equipment.

Reliability and quality control, and more generally dependability and safety, have become a crucial area in the field of industrial engineering. The concept of reliability is an acquisition of the 20th century. Initially, reliability theory was developed to meet the needs of the electronics industry, since the first complex systems appeared in this field of engineering. Such systems have a huge number of components, which made their global reliability very low in spite of their relatively reliable individual components. This led to a specialized applied mathematical discipline which makes it possible to evaluate a priori various reliability indexes at the design stage, to choose an optimal system structure, to improve maintenance methods, and to estimate reliability on the basis of dedicated testing or field operation.

Our objective is to apply probabilistic and statistical tools from estimation and control theory to dependability and safety. We wish to investigate the following fields:

1) Design and analysis of realistic and accurate random models for dependability. In particular, we will study parametric models for dynamic reliability and semi- or non-parametric models for quality control.

2) Implementation of estimation algorithms in relation to our stochastic models, and evaluation of reliability indexes.

3) Design of control methods for maintenance and reconfiguration.

We stress the fact that points 1) and 2) are strongly interlinked. Indeed, designing mathematical models for reliability is an important and basic research field. However, our models will be justified and validated in practice by point 2). In particular, the feasibility of the estimation routines and the quality of the evaluation of the reliability indexes will be crucial. Point 3) deals with control through practical issues of maintenance and reconfiguration. This last point both legitimates and builds on the first two: only after modelling, identifying and evaluating reliability indexes shall we be able to compute a cost criterion to be optimized.

Our team was just awarded a grant from the ARPEGE program of the French National Agency of Research (ANR): project "FAUTOCOES", number ANR-09-SEGI-004, starting in October 2009.

An important contract was signed in June 2009 between the EDF R&D team ICAME and the INRIA teams ALEA and CQFD. It deals with new statistical methods for recursive adaptive prediction of the electricity consumption of EDF customers. The contract is divided into five work packages and will last until 2011. The total value of the contract is 140,000 euros.

F. Dufour has been appointed Associate Editor of the SIAM Journal on Control and Optimization.

The team CQFD was deeply involved in the organization of the "41e Journées de Statistique", the annual event of the Société Française de Statistique, which took place in Bordeaux in 2009: 450 participants, 20 plenary sessions, 230 contributed talks, and a budget of 100,000 euros.
Reliability and Quality Control was one of the main topics of the meeting. More details are available via the link
http://

The team CQFD organized the "5e Rencontre Statistiques Mathématiques Bordeaux-Santander-Toulouse-Valladolid" in June 2009. The goal of this annual workshop is to strengthen the scientific exchanges between our universities. The universities of Pau and Montpellier also participated in the workshop. More details are available via the link
http://

The teams ALEA and CQFD, as well as INRIA Bordeaux Sud-Ouest, are deeply involved in the organization of the "Journées MAS 2010". This biennial meeting is the most important event for French probabilists and statisticians. It will take place at the University Bordeaux 1 in September 2010. The topic of this meeting is stochastic algorithms and combinatorics.
More details are available via the link
http://

In dependability and safety theory, modeling is a key step for studying the properties of the physical processes involved. Nowadays, it appears necessary to take dependencies into account explicitly and realistically, that is, the dynamic interactions between the physical parameters of the system (for example: pressure, temperature, flow rate, level, ...) and the functional and dysfunctional behavior of its components. Classically, the models described in the dependability and safety literature do not take such interactions into account. A first set of methods used in reliability theory is the so-called combinatorial approaches (fault trees, event trees, reliability diagrams and networks, ...), which can be used to identify and evaluate the combinations of events leading to the occurrence of other desirable or undesirable events. These powerful methods suffer from the fact that such combinations do not take the order of occurrence into account, in the sense that they eliminate any notion of dependency between events. A second set of methods is described by finite state Markov (or semi-Markov) models. In this context, the system is described by a fixed number of components which can be in different states. For any component, the set of its possible states is assumed to be finite (generally it contains only two elements: an operational state and a failure state). One of the main limitations encountered with such models is their difficulty in correctly modeling physical processes involving deterministic behavior. To overcome these difficulties, dynamic reliability was introduced in 1980 as a powerful mathematical framework capable of explicitly handling interactions between components and process variables. Nowadays in the literature, the multi-model approach appears as a natural framework in which to formulate dynamic reliability problems.
The behavior of the physical model is thus described by different modes of operation from nominal to failure states with intermediate dysfunctional regimes. For a large class of industrial processes, the layout of operational or accident sequences generally comes from the occurrence of two types of events:

The first type of event is directly linked to a deterministic evolution of the physical parameters of the process.

The second type of event is purely stochastic and usually corresponds to random demands or failures of system components.

In both cases, these events will induce jumps in the behavior of the system leading to stable or unstable trajectories for the process.

The first one is deterministic. From the mathematical point of view, it is given by the fact that the trajectory hits the boundary of E. From the physical point of view, it can be seen as a modification of the mode of operation when a physical parameter reaches a prescribed level (for example when the pressure of a tank reaches the critical value).

The second one is stochastic. It models the random nature of failures or inputs that modify the mode of operation of the system.
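The interplay between these two jump mechanisms can be sketched with a toy one-dimensional PDMP. Everything below is illustrative and of our own choosing (linear deterministic flow, a fixed boundary level, an exponential failure clock, reset to zero after each jump); it is a sketch of the mechanism, not a model of any real system:

```python
import random

def simulate_pdmp(p0=0.0, rate=0.5, growth=1.0, p_max=10.0, horizon=100.0, seed=0):
    """Toy one-dimensional PDMP: a pressure-like variable grows linearly
    (deterministic flow) until either it hits the boundary p_max
    (deterministic jump) or an exponential failure clock rings first
    (stochastic jump). Both events reset the variable to 0. Returns the
    (jump_time, trigger) events observed before the horizon."""
    rng = random.Random(seed)
    t, p, events = 0.0, p0, []
    while True:
        t_boundary = (p_max - p) / growth   # time for the flow to hit the boundary
        t_failure = rng.expovariate(rate)   # exponential failure clock
        dt = min(t_boundary, t_failure)
        if t + dt > horizon:
            break
        t += dt
        events.append((round(t, 3), "boundary" if t_boundary <= t_failure else "failure"))
        p = 0.0                             # post-jump reset of the physical variable
    return events

events = simulate_pdmp()
```

Between jumps the trajectory is purely deterministic; randomness enters only through the jump times and, in richer models, the post-jump state.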

As it has been illustrated above, the key asset of this mathematical model is that it takes naturally into account the two kinds of events previously described. Several examples can be found in , , , and . Most stochastic processes presented in T. Aven and U. Jensen are special cases of piecewise deterministic Markov processes.

In conclusion, it is claimed that piecewise deterministic Markov processes provide a general framework to study dynamic reliability problems. Their dynamical properties allow explicit time dependencies, in contrast with piecewise constant jump Markov processes. Consequently, these processes are really suitable for modeling real phenomena of dynamic reliability.

The probabilistic background offers a very suitable framework to evaluate material quality from the dependability and safety point of view. One can classically characterize the performance of a system by several indicators: availability, reliability, maintainability, safety, etc. Evaluating all these indicators is crucial: it makes it possible to compute a certain *cost* in order to measure the performance of the system. Hence, the well-known topic in control called *robustness* is given emphasis. In this framework, it is necessary to define the concepts of subsystem and sensitivity:

Which subsystems have the greatest impact on the cost?

How does the cost sensitivity evolve with respect to modifications of one or several components of the system?

For instance, evaluating the production availability of a factory is a vital concern for the industrial world. This notion complements the more classical notions of instantaneous availability and asymptotic availability. Production availability is a probabilistic measure of the regularity of production. Previously, its calculation was usually based on the naive hypothesis that the production level associated with each operating regime (operational, damaged, partial breakdown, ...) was reached instantaneously as soon as the system entered that regime. Consequently, production availability was modeled via a discrete random variable, and a typical trajectory of the production availability was piecewise constant. It was shown in on a large set of real cases that this hypothesis is not realistic. In fact, the production level evolves continuously and is influenced by the operating regime as well as by internal variables of the system such as pressure, temperature, etc.

Quite obviously, it is necessary to take into account the naturally continuous dynamics of the indicators of Reliability, Availability and Maintainability (RAM). In particular, we shall see that the so-called Piecewise Deterministic Markov Processes are very suitable tools to define and evaluate RAM indicators for physical systems.
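The difference between a piecewise-constant and a continuous production level can be made concrete with a small computation. In the sketch below, the trapezoidal trajectory and all numbers are illustrative (a nominal run, a breakdown, a linear ramp back to nominal), not taken from a real case:

```python
def production_availability(segments, nominal, horizon):
    """Average production level over [0, horizon], as a fraction of nominal.
    `segments` is a list of (start_time, start_level, end_time, end_level)
    pieces describing a continuous, piecewise-linear production trajectory
    (the level ramps back up after a regime change instead of jumping)."""
    produced = 0.0
    for t0, y0, t1, y1 in segments:
        t1c = min(t1, horizon)
        if t1c <= t0:
            continue
        # area under the trapezoid on [t0, t1c], by linear interpolation
        y1c = y0 + (y1 - y0) * (t1c - t0) / (t1 - t0)
        produced += 0.5 * (y0 + y1c) * (t1c - t0)
    return produced / (nominal * horizon)

# Nominal run, breakdown at t=4, linear ramp back to nominal by t=6.
segments = [(0, 100, 4, 100), (4, 0, 6, 100), (6, 100, 10, 100)]
print(production_availability(segments, nominal=100, horizon=10))  # → 0.9
```

A piecewise-constant model that snapped instantly back to nominal at t=4 would instead report full availability over the same window, which is exactly the bias discussed above.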

In the domain of safety and dependability, the notions of control and maintainability play a prominent role in the design of reliable systems. This maintainability can be *active* or *passive*.

In this context of vulnerability, the usual way to make the system more tolerant towards failures is to introduce redundancies. We improve the reliability of a system not only by improving the reliability of its components but also through their redundant organization. This is commonly called *passive maintainability*. However, it is not always possible to introduce direct physical redundancy, which clearly restricts the usefulness of this approach. For example, it seems impossible to put motor units or pressure transducers in the same place on certain structures such as oil wells or communication satellites.

A second approach, more realistic and promising, is *active maintainability*. It is organized in the following two steps:

Detection and identification of failures,

Reconfiguration of the system.

In this context, PDMPs are especially well adapted to modeling real physical systems. In fact, a natural approach is first to draw up a list of possible failures or breakdowns. This leads to a set of regimes or modes for the system described in section . Then, after the detection and identification of all those regimes, the control or maintainability process will be in a position to react and maintain the system in a damaged but acceptable mode. However, this modeling approach is subject to a number of limitations in terms of the *efficiency/complexity* trade-off. More precisely, if a non-identified breakdown occurs, this approach can fail dramatically. A simple way to rectify this situation is to include this kind of failure in the list of possible regimes. However, this increases the complexity of the model. Therefore, a compromise must be sought during the modeling phase.

A classical aim of reliability theory is the study of censored survival data. In this context, several parametric, semi-parametric, and non-parametric models for survival function estimation have already been proposed. In this project, we focus our attention on another aspect of reliability: Statistical Quality Control (SQC). More precisely, we wish to develop non-parametric and semi-parametric modeling in order to provide tolerance curves and hyper surfaces.

Tolerance curves are used in industry to predict the performance of the manufacturing process from external measures such as temperature or pressure. They are particularly useful when quality control comes late (long manufacturing time, intermediate storage, ...) or relies on small samples. Tolerance curves provide inspectors with a tool to check whether the evaluated parameters lie within the interval required by the specifications, and make the inspection organization more efficient. Because of their graphical representation, they are particularly easy to use.

A tolerance interval differs from the well-known confidence interval. A confidence interval gives information about the position of the mean value of the parameter, whereas a tolerance interval gives information about the position of the parameter itself and the probability for this parameter to lie in the interval. Let Y be the random variable which represents this parameter and X the covariate (temperature, pressure, ...). To take these covariates into account in the evaluation of the tolerance interval of Y, the conditional distribution of Y given X is studied. When X is real-valued, the conditional quantiles of Y given X are used to build tolerance curves, and when X is multidimensional, they are used to build tolerance hyper surfaces. Finally, when Y itself is multivariate, several parameters are studied simultaneously and multivariate or spatial conditional quantiles are used to build a tolerance region.
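To fix ideas on the real-valued case, here is a crude local-window estimate of a tolerance curve: at each grid point, the empirical quantiles of Y are taken over the observations whose covariate falls in a window around that point. The bandwidth, coverage level, and simulated data are all ours; in practice, kernel or local polynomial conditional quantile estimators are used instead:

```python
import random

def tolerance_curve(xs, ys, grid, h=0.5, alpha=0.10):
    """Crude local-window estimate of a (1 - alpha) tolerance curve:
    at each grid point x, keep the observations with |X - x| < h and
    take the empirical alpha/2 and 1 - alpha/2 quantiles of Y."""
    curve = []
    for x in grid:
        local = sorted(y for xi, y in zip(xs, ys) if abs(xi - x) < h)
        if len(local) < 10:
            curve.append((x, None, None))   # too few points: no estimate
            continue
        lo = local[int((alpha / 2) * (len(local) - 1))]
        hi = local[int((1 - alpha / 2) * (len(local) - 1))]
        curve.append((x, lo, hi))
    return curve

# Simulated data: Y depends linearly on the covariate X, plus noise.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(2000)]
ys = [2 * x + rng.gauss(0, 1) for x in xs]
curve = tolerance_curve(xs, ys, grid=[2, 5, 8])
```

Each pair (lo, hi) brackets where most of the Y values lie given the covariate value, which is exactly the information an inspector reads off a tolerance curve.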

Three types of modeling can be used to define conditional univariate or multivariate quantiles. Parametric modeling has the advantage of giving results that are easy to interpret, but the parametric shape of the conditional distribution of Y given X has to be predetermined. Non-parametric modeling is more flexible because it relaxes the constraint on the conditional distribution; however, in practice the results are difficult to interpret. Semi-parametric modeling is therefore a compromise between these two types of modeling and gives results that are easy to interpret. Real indices X' are incorporated in order to reduce the dimension of the explicative part of the model, and no parametric structure is imposed on the link between Y and X'.

The choice of parametric, non-parametric or semi-parametric modeling is thus a key point in the estimation of tolerance curves, hyper surfaces or regions.

Non-parametric conditional quantile estimation is usually based on kernel or local polynomial estimation methods (see for instance , ) and suffers, like all local smoothing methods, from what is called the *curse of dimensionality*. Indeed, when the dimension p of the covariate X increases, the dispersion of the data increases and the quality of the estimation decreases. Another drawback is that graphical representation is possible only when the dimension of X is equal to 1 or 2: tolerance curves are obtained in 2D when p = 1, and tolerance hyper surfaces are obtained in 3D when p = 2.

To avoid these two drawbacks, semi-parametric modeling can be used. The following two-step (one parametric and one non-parametric) methodology for conditional quantile estimation is proposed. First of all, the Euclidean parameter used to reduce X to X' is estimated. Next, the functional parameters used in the non-parametric conditional quantile estimation are estimated. More precisely, the semi-parametric approach combines the SIR (sliced inverse regression) method with kernel estimation of the conditional quantile. This methodology allows graphical representation of the tolerance curves in 2D or surfaces in 3D, with the index X' easy to interpret.
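Once the index direction has been estimated (by SIR, say), the second step reduces to one-dimensional conditional quantile estimation on the index. The sketch below illustrates that second step only, with a local-window quantile in place of a kernel estimator; the data, the true direction, and all tuning values are simulated assumptions of ours:

```python
import random

def index_tolerance_band(xs, ys, b, x_new, h=0.3, alpha=0.10):
    """Second step of the two-step method: project the covariates on the
    estimated direction b to get the index X' = b.x, then take local
    empirical quantiles of Y around the index value of the new point."""
    t_new = sum(bi * xi for bi, xi in zip(b, x_new))
    local = sorted(y for x, y in zip(xs, ys)
                   if abs(sum(bi * xi for bi, xi in zip(b, x)) - t_new) < h)
    lo = local[int((alpha / 2) * (len(local) - 1))]
    hi = local[int((1 - alpha / 2) * (len(local) - 1))]
    return lo, hi

# Simulated single-index data: Y depends on X only through b.X.
rng = random.Random(6)
xs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4000)]
ys = [x1 + 2 * x2 + rng.gauss(0, 0.2) for x1, x2 in xs]
b = (1 / 5 ** 0.5, 2 / 5 ** 0.5)   # index direction, assumed already estimated
lo, hi = index_tolerance_band(xs, ys, b, x_new=(0.5, 0.5))
```

The smoothing happens on the one-dimensional index rather than on the full covariate, which is how the approach sidesteps the curse of dimensionality.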

We want to focus our attention on dimension-reduction approaches for quality control problems (sliced inverse regression approaches, clustering of variables, ...). We continue to work on non-parametric and semi-parametric estimation of tolerance curves and hyper surfaces, via estimation of (multivariate) conditional quantiles. Moreover, another point of interest is the introduction of recursive methods into the estimation process in order to be able to handle data streams; at present we are developing recursive methods in a semi-parametric model for the estimation of tolerance curves.

The following examples illustrate the importance of dependability and safety in various fields.

A first example concerns oil production in deep water. We have already worked with IFREMER on the reliability of oil rigs, and with IFP (French Oil Institute) on risk assessment and control for the extraction of hydrocarbons from submarine deposits that are hard to exploit due to their depth.

A second example, in the military field, concerns combat aircraft with "relaxed static stability". These aircraft are slightly aerodynamically unstable by design: they will quickly depart from level and controlled flight unless the pilot constantly works to keep them in trim. While this enhances maneuverability, it is very wearing on a pilot relying on a mechanical flight control system. Hence, such aircraft are highly vulnerable to sensor or on-board computer failures.

A third example deals with quality control linked with the biomedical and biometric studies led by CERIES (Centre de Recherches et d'Investigations Epidermiques et Sensorielles), the research center of CHANEL on human skin. The knowledge of tolerance curves for numerous biophysical skin parameters is crucial for CERIES insofar as it enables CHANEL chemists to develop new cosmetic products better adapted to the intended target: for example, elderly women on the Asian market, or young Caucasian or African-American women. Thus, knowing the skin features of a person is enough to decide whether or not they fit within the reference limits of the various CHANEL cosmetic products.

A last example concerns the development of air quality control strategies, a major concern for human health. To this end, air pollution sources have to be accurately identified and quantified. We already worked in 2007-2008 on a scientific project initiated by the French Ministry of Ecology and Sustainable Development. We have also worked on the statistical and quality control parts of a study financed by VNF (Voies Navigables de France) concerning a satisfaction survey of sailors on the "canal des deux mers" in the south of France. Another possible application concerns the use of tolerance curves in industrial quality control processes. We have already had discussions with Danone on this subject.

Most of the statistical methods for dimension reduction and quality control have been implemented in R: variables clustering (Chavent, Kuentz), cluster-SIR (Kuentz, Saracco), bootstrap choice of parameters for SIR-related methods (Liquet, Saracco), geometric multivariate (conditional) quantiles (Chaouch, Saracco), recursive SIR (Bercu, Nguyen, Saracco), sample selection models (Chavent, Liquet, Saracco), and bagging SIR (Kuentz, Liquet, Saracco). Development of an R package is in progress.

The long run average continuous control problem of piecewise deterministic Markov processes (PDMPs) taking values in a general Borel space, with a compact action space depending on the state variable, is investigated. The control variable acts on the jump rate and transition measure of the PDMP, and the running and boundary costs are assumed to be positive but not necessarily bounded. As far as we are aware, this is the first time that this kind of problem has been considered in the literature. Indeed, results are available for the long run average cost problem, but for impulse control: see O.L.V. Costa , D. Gatarek and the book by M.H.A. Davis (and the references therein). On the other hand, the continuous control problem has been studied only for discounted costs by A. Almudevar , M.H.A. Davis , , M.A.H. Dempster and J.J. Ye , , L. Forwick, M. Schäl, and M. Schmitz , M. Schäl , A.A. Yushkevich , .

This work is under revision for the SIAM Journal on Control and Optimization. The following results have been obtained in collaboration with Oswaldo Luis do Valle Costa from the Escola Politécnica da Universidade de São Paulo, Brazil. In particular,

An optimality equation has been obtained for the long run average cost, in terms of a discrete-time optimality equation related to the embedded Markov chain given by the post-jump location of the PDMP.

The existence of a feedback measurable selector for the discrete-time optimality equation has been shown by establishing a connection between this equation and an integro-differential equation.

Sufficient conditions have been derived for the existence of a solution to a discrete-time optimality inequality and of an ordinary optimal feedback control for the long run average cost, using the so-called vanishing discount approach .

Two examples are presented, based on the capacity expansion model analyzed in , (34.45), and by the authors from the stability point of view in , . The first example illustrates the fact that the technical assumptions of the paper are satisfied, so that there exists a solution to the optimality inequality. The second example shows that, by using our new result (see Proposition 8.7), there exists a solution to the average-cost optimality equation.

This work is concerned with the existence of an optimal control strategy for the long run average continuous control problem of piecewise deterministic Markov processes (PDMPs). It has been accepted for publication in the Journal of Applied Probability . These results have been obtained in collaboration with Oswaldo Luis do Valle Costa from the Escola Politécnica da Universidade de São Paulo, Brazil. The present work can be seen as a continuation of the results derived in . In , sufficient conditions were derived to ensure the existence of an optimal control by using the vanishing discount approach. These conditions were mainly expressed in terms of the relative difference of the -discount value functions. The main goal of this paper is to derive tractable conditions, directly related to the primitive data of the PDMP, ensuring the existence of an optimal control. Our main assumptions are written in terms of some integro-differential inequalities related to the so-called expected growth condition, and of geometric convergence of the post-jump location kernel associated with the PDMP.

The present work is a continuation of a series of papers by the authors: . Part of this work will be presented at the 48th IEEE Conference on Decision and Control, Shanghai, China, 2009, . These results have been obtained in collaboration with Oswaldo Luis do Valle Costa from the Escola Politécnica da Universidade de São Paulo, Brazil. The main goal of this paper is to apply the so-called policy iteration algorithm (PIA) to the long run average continuous control problem of piecewise deterministic Markov processes taking values in a general Borel space, with a compact action space depending on the state variable. The PIA has received considerable attention in the literature and consists of three steps: initialization; policy evaluation, which is related to the Poisson equation (PE) associated with the transition law defining the Markov decision process; and policy improvement. In our context, the policy evaluation step is connected to a kind of PE which we call a pseudo-Poisson equation. This equation is clearly different from the classical PE encountered in the literature on discrete-time Markov control processes. However, although different, we can show in section that this pseudo-Poisson equation still has the good properties required to guarantee the convergence of the policy iteration algorithm. These results are not straightforward to obtain due to the specific structure of this discrete-time optimality equation. Finally, the PIA is studied in detail. It is first shown that the convergence of the PIA to a solution satisfying the optimality equation holds under some classical hypotheses. It is then shown that this optimal solution yields an optimal control strategy for the average control problem for the continuous-time PDMP in feedback form.
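The three-step loop of the PIA can be sketched in a much simpler setting than the paper's: a finite, discounted Markov decision process, where policy evaluation amounts to solving a linear fixed-point equation instead of the pseudo-Poisson equation. The chain, rewards and discount factor below are toy values of ours:

```python
def policy_iteration(P, r, gamma=0.9):
    """Policy iteration on a finite MDP: evaluate the current policy by
    iterating its Bellman operator, then improve greedily, until the
    policy is stable. P[s][a] is the transition row, r[s][a] the reward."""
    nS, nA = len(r), len(r[0])
    pi = [0] * nS                      # initialization: arbitrary policy
    while True:
        # policy evaluation: V_pi as the fixed point of its Bellman operator
        V = [0.0] * nS
        for _ in range(500):
            V = [r[s][pi[s]] + gamma * sum(P[s][pi[s]][t] * V[t] for t in range(nS))
                 for s in range(nS)]
        # policy improvement: greedy with respect to V
        new_pi = [max(range(nA), key=lambda a: r[s][a] + gamma *
                      sum(P[s][a][t] * V[t] for t in range(nS)))
                  for s in range(nS)]
        if new_pi == pi:
            return pi, V
        pi = new_pi

# Toy 2-state, 2-action MDP: action 0 pays more now, action 1 drifts to state 1.
P = [[[0.9, 0.1], [0.1, 0.9]],
     [[0.9, 0.1], [0.1, 0.9]]]
r = [[1.0, 0.0], [2.0, 1.0]]
pi, V = policy_iteration(P, r)
```

In the paper's average-cost PDMP setting, the evaluation step above is replaced by solving the pseudo-Poisson equation, and the greedy step is taken with respect to the relative value function; the skeleton of the loop is the same.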

We have developed a computational method for optimal stopping of a piecewise deterministic Markov process by using a quantization technique for Markov chains, see . Optimal stopping problems have been studied for PDMPs in , , , , , . In , the author defines an operator related to the first jump time of the process, which we shall call the first jump operator, and shows that the value function of the optimal stopping problem is a fixed point of this operator. Based on a probabilistic interpretation of this jump operator, we designed a numerical scheme to approximate the value function. The originality of our work is two-fold. On the one hand, instead of using a fixed time-discretization grid for our continuous-time process, we first compute the quantization of an underlying discrete-time Markov chain which naturally appears in our problem, and only then derive path-adapted time-discretization grids. On the other hand, some functionals of the underlying Markov chain in our jump operator are not Lipschitz-continuous, as they have jumps (they are typically indicator functions of thresholds). We proved the convergence of our procedure and computed an error bound. We implemented our procedure on a simple example and obtained very good results.
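The fixed-point structure behind this scheme can be illustrated, in a much simplified form, by backward dynamic programming for optimal stopping on a finite chain (think of the states as the cells of a quantization grid). The chain and rewards below are a toy example of ours, not the scheme of the paper:

```python
def optimal_stopping_values(P, reward, n_steps):
    """Backward dynamic programming for optimal stopping on a finite
    (e.g. quantized) Markov chain: V <- max(g, P V), starting from V = g.
    Returns the value function as a list indexed by state."""
    V = list(reward)
    for _ in range(n_steps):
        cont = [sum(p * v for p, v in zip(row, V)) for row in P]  # continuation value
        V = [max(g, c) for g, c in zip(reward, cont)]             # stop or continue
    return V

# Toy 3-state chain: state 2 is absorbing and carries the highest reward.
P = [[0.5, 0.4, 0.1],
     [0.2, 0.5, 0.3],
     [0.0, 0.0, 1.0]]
reward = [0.0, 1.0, 2.0]
V = optimal_stopping_values(P, reward, n_steps=20)
```

Each iteration applies the "stop now or continue" operator once; in our actual work, the analogous operator is the first jump operator of the PDMP, applied on path-adapted quantization grids.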

This work will appear in the Annals of Applied Probability . Parts of this work were presented at the conference Optimal Stopping with Applications in Turku, Finland, in June 2009, and at the International Conference on Analysis and Design of Hybrid Systems in Zaragoza, Spain, in September 2009.

We have already shown that Piecewise Deterministic Markov Processes (PDMP) are very suitable for the modeling of reliability problems, see for example and .

In this work, we study the relevance of PDMPs in another application context: the modeling of crack propagation.

The well-known empirical Paris-Erdogan law has been adapted in order to take into account the random nature of crack propagation, even in a controlled environment. For some authors, C and m are random variables; for others, they are random processes. We propose here to use a PDMP to model the evolution of the crack length: during a random time T, the length a evolves according to a deterministic evolution equation . At time T, new parameters (m_2, C_2) are chosen through a Markov matrix, and the length a then follows the new evolution . The propagation parameters (m_i, C_i) belong to a finite set. We use the model to predict the evolution of the propagation using only the first 10 measurements of the crack. This method consists in reducing the number of possible regimes (m_i, C_i) by keeping those closest to the beginning of the crack.
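A toy simulation of this regime-switching model can be sketched as follows, assuming the standard Paris-Erdogan form da/dN = C (ΔK)^m with ΔK = sqrt(pi a) for a unit stress range. All regime values, the switching rate, and the Markov matrix are illustrative choices of ours, not the values fitted to the Virkler data:

```python
import math
import random

def simulate_crack(a0=9.0, a_final=49.8, dN=10.0, seed=3):
    """Toy regime-switching crack model: the length a follows the
    Paris-Erdogan law da/dN = C * dK**m with dK = sqrt(pi * a)
    (unit stress range), integrated by Euler steps of dN cycles.
    At exponentially spaced cycle counts, the regime (m, C) is redrawn
    through a Markov matrix Q. All numbers are illustrative."""
    regimes = [(2.8, 2.0e-4), (3.2, 1.0e-4)]   # hypothetical (m, C) pairs
    Q = [[0.3, 0.7], [0.6, 0.4]]               # Markov matrix on the regimes
    rng = random.Random(seed)
    r = 0                                      # current regime index
    T = rng.expovariate(1.0 / 300.0)           # cycle count of the next switch
    a, N, path = a0, 0.0, [(0.0, a0)]
    while a < a_final and N < 2e5:
        if N >= T:                             # stochastic jump: redraw (m, C)
            r = rng.choices([0, 1], weights=Q[r])[0]
            T = N + rng.expovariate(1.0 / 300.0)
        m, C = regimes[r]
        a += dN * C * math.sqrt(math.pi * a) ** m   # Euler step of the Paris law
        N += dN
        path.append((N, a))
    return path

path = simulate_crack()   # (cycle count, crack length) pairs from 9 mm onwards
```

Between switches, the crack growth is purely deterministic; only the regime changes are random, which is exactly the PDMP structure described above.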

We have fitted this model to the Virkler data: 68 replicate constant-amplitude crack propagation tests conducted on 2024-T3 aluminum alloy. The data were recorded starting at a crack length of 9 mm up to a final length of 49.8 mm, producing 164 data points for each replicate. For each replicate, we chose by simulated annealing the best vector (m_1, C_1, T, m_2, C_2) such that the theoretical crack is close to the experimental one. We use empirical statistics on the 68 vectors (m_1, C_1, T, m_2, C_2) to assess the model.

The simulation results are very satisfactory: the set of simulated cracks includes the experimental ones, even with only 4 regimes. The prediction results are also excellent: 64 of the 68 cracks are well predicted from the first 10 measurements.

This work is part of a technical report with EADS Astrium on some mechanical and mathematical aspects of crack propagation in alloys, and has been submitted for publication .

Bifurcating autoregressive (BAR) processes are an adaptation of autoregressive (AR) processes to binary tree structured data. They were first introduced by Cowan and Staudte for cell lineage data, where each individual in one generation gives birth to two offspring in the next generation. Cell lineage data typically consist of observations of some quantitative characteristic of the cells over several generations of descendants from an initial cell. BAR processes take into account both inherited and environmental effects to explain the evolution of the quantitative characteristic under study.

There are several results on statistical inference and asymptotic properties of estimators for BAR models in the literature. For maximum likelihood inference on small independent trees, see Huggins and Basawa . For maximum likelihood inference on a single large tree, see Huggins for the original BAR model, Huggins and Basawa for higher order Gaussian BAR models, and Zhou and Basawa for exponential first-order BAR processes. We also refer the reader to Zhou and Basawa for the LS parameter estimation. In all those papers, the BAR process is supposed to be stationary. In Guyon , the LS estimator is also investigated, but the process is not stationary, and the author makes intensive use of the tree structure and Markov chain theory.

We extended our previous results to p-th order (p ≥ 2) BAR processes. More precisely, we have carried out a sharp analysis of the asymptotic properties of the least squares (LS) estimators of the unknown parameters of p-th order BAR processes via a martingale approach based on the generation-wise filtration. We have established the almost sure convergence of our LS estimators with a sharp rate of convergence, together with the quadratic strong law and the central limit theorem.

This work is published in the Electronic Journal of Probability , and was presented in several seminars in France, as well as at the conference Mathematical models for cell division held in Paris in March 2009.
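For intuition, a first-order BAR process and its LS estimator can be sketched as follows (first-order is the simplest case of the p-th order processes studied above; the coefficients, noise level and tree depth are illustrative):

```python
import random

def simulate_bar(n_gen=12, a=(0.5, 0.8), b=(0.6, 0.4), sigma=0.5, seed=4):
    """First-order bifurcating autoregressive process on a binary tree
    (nodes 1 .. 2**n_gen - 1): cell k has daughters 2k and 2k+1, each an
    AR(1) of the mother with its own intercept a[side] and slope b[side]."""
    rng = random.Random(seed)
    X = {1: rng.gauss(0, 1)}
    for k in range(1, 2 ** (n_gen - 1)):
        for side in (0, 1):
            X[2 * k + side] = a[side] + b[side] * X[k] + rng.gauss(0, sigma)
    return X

def ls_estimate(X, side):
    """Least squares for (a_side, b_side): regress daughters on mothers."""
    pairs = [(X[k], X[2 * k + side]) for k in X if 2 * k + side in X]
    n = len(pairs)
    mx = sum(p for p, _ in pairs) / n
    my = sum(c for _, c in pairs) / n
    b = (sum((p - mx) * (c - my) for p, c in pairs)
         / sum((p - mx) ** 2 for p, _ in pairs))
    return my - b * mx, b

X = simulate_bar()
a0_hat, b0_hat = ls_estimate(X, side=0)
```

The estimator pools all mother-daughter pairs of one side of the tree, which is the tree-structured analogue of the usual AR(1) regression; the martingale analysis above controls its fluctuations generation by generation.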

The usefulness of persistent excitation is well known in the control community. Thanks to a persistently excited adaptive tracking control, we show that it is possible to avoid the strong controllability assumption recently proposed by the authors for multivariate ARX models. We establish the almost sure convergence of both the least squares and the weighted least squares estimators of the unknown parameters. A central limit theorem and a law of iterated logarithm are also provided. All of this asymptotic analysis is related to the Schur complement of a suitable limiting matrix.

This work has been recently submitted to the International Journal of Control, and it will be presented at the 48th IEEE Conference on Decision and Control, Shanghai, China, 2009.

We propose a new concept of strong controllability related to the Schur complement of a suitable limiting matrix. This new notion allows us to extend the previous convergence results associated with multidimensional ARX models in adaptive tracking. On the one hand, we carry out a sharp analysis of the almost sure convergence for both least squares and weighted least squares algorithms. On the other hand, we also provide a central limit theorem and a law of iterated logarithm for these two algorithms. Our asymptotic results are illustrated by numerical simulations.

This work has been recently submitted to Automatica , and it was presented at the 47th IEEE Conference on Decision and Control, Cancun, Mexico, 2008.
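The role of excitation in parameter estimation can be illustrated on a scalar ARX model with a standard recursive least squares scheme (a textbook algorithm, not the weighted algorithm analyzed above; the model and all numbers are ours). The white input plays the role of the persistently exciting signal:

```python
import random

def rls_arx(n=5000, theta=(0.6, 0.4), sigma=0.1, seed=5):
    """Recursive least squares for a scalar ARX(1,1) model
    y_{t+1} = theta1 * y_t + theta2 * u_t + noise, with a persistently
    exciting white input u_t."""
    rng = random.Random(seed)
    y = 0.0
    th = [0.0, 0.0]                      # parameter estimate
    P = [[100.0, 0.0], [0.0, 100.0]]     # inverse-information matrix
    for _ in range(n):
        u = rng.gauss(0, 1)              # persistent excitation
        phi = [y, u]                     # regression vector
        y_next = theta[0] * y + theta[1] * u + rng.gauss(0, sigma)
        # standard RLS update: gain K = P phi / (1 + phi' P phi)
        Pphi = [P[0][0] * phi[0] + P[0][1] * phi[1],
                P[1][0] * phi[0] + P[1][1] * phi[1]]
        denom = 1.0 + phi[0] * Pphi[0] + phi[1] * Pphi[1]
        err = y_next - (th[0] * phi[0] + th[1] * phi[1])
        th = [th[i] + Pphi[i] * err / denom for i in (0, 1)]
        P = [[P[i][j] - Pphi[i] * Pphi[j] / denom for j in (0, 1)] for i in (0, 1)]
        y = y_next
    return th

th_hat = rls_arx()
```

Without the exciting component of the input, the information matrix can stay singular in some directions and the corresponding parameters are never identified; this is the phenomenon the excitation assumption rules out.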

Concerning the dimension reduction framework which is a useful tool for the quality control part of the project, some results have been obtained for Sliced Inverse Regression (SIR) and related methods such as Sliced Average Variance Estimation (SAVE).

SIR and related approaches were introduced in order to reduce the dimensionality of regression problems. In a general semiparametric regression framework, these methods determine linear combinations of a set of explanatory variables X related to the response variable Y, without losing information on the conditional distribution of Y given X. They are based on a "slicing step" in both the population and sample versions. They are sensitive to the choice of the number H of slices, and this is particularly true for the SIR-II and SAVE methods. At the moment there are neither theoretical results nor practical techniques that allow the user to choose an appropriate number of slices. In , we propose an approach based on the quality of the estimation of the effective dimension reduction (EDR) space: the squared trace correlation between the true EDR space and its estimate can be used as a measure of goodness of estimation. We introduce a naïve bootstrap estimate of the squared trace correlation criterion to allow the selection of an "optimal" number of slices. Moreover, this criterion can also simultaneously select the corresponding suitable dimension K (the number of linear combinations of X). From a practical point of view, the choice of these two parameters H and K is essential. We propose a 3D graphical tool, implemented in R, which can be useful for selecting a suitable couple (H, K). In this article, we focus on the SIR, SIR-II and SAVE methods. We indicate how the proposed criterion can be used in practice. A simulation study is performed to illustrate the behaviour of this approach and the need to select the number H of slices and the dimension K properly. The `R` codes are available.
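
As a rough illustration of the slicing step, here is a minimal textbook SIR estimator on simulated data. This is a sketch only: the model, the dimensions and the number of slices are arbitrary, and the bootstrap selection of H and K described above is not implemented.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, H = 2000, 5, 10          # sample size, dimension, number of slices
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2)  # true EDR direction
X = rng.normal(size=(n, p))    # Gaussian predictor: linearity condition holds
Y = (X @ beta) ** 3 + rng.normal(scale=0.1, size=n)

# standardize the predictor: Z has identity covariance
mu = X.mean(axis=0)
Linv = np.linalg.inv(np.linalg.cholesky(np.cov(X, rowvar=False)))
Z = (X - mu) @ Linv.T

# slicing step: H slices of roughly equal size ordered by Y,
# then the weighted covariance of the slice means of Z
order = np.argsort(Y)
M = np.zeros((p, p))
for h in np.array_split(order, H):
    m = Z[h].mean(axis=0)
    M += (len(h) / n) * np.outer(m, m)

# leading eigenvector of M, back-transformed to the X scale
vals, vecs = np.linalg.eigh(M)
b_hat = Linv.T @ vecs[:, -1]
b_hat /= np.linalg.norm(b_hat)

# cosine between estimated and true EDR direction (close to 1)
print(abs(b_hat @ beta))
```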

In the theory of sufficient dimension reduction, SIR is a well-known technique that makes it possible to reduce the dimensionality of regression problems by determining linear combinations of a p-dimensional explanatory variable X related to a response variable Y. However, it is based on a crucial condition on the marginal distribution of the predictor X, often called the linearity condition. From both a theoretical and a practical point of view, this condition appears to be a limitation. Using an idea of Li et al. (2004) in the Ordinary Least Squares framework, we propose in  to cluster the predictor space so that the linearity condition approximately holds in the different partitions. We then apply SIR in each cluster and finally estimate the dimension reduction subspace by combining these individual estimates. We give asymptotic properties of the corresponding estimator. We show with a simulation study that the proposed approach, referred to as cluster-based SIR, improves the estimation of the EDR basis. We also propose an iterative implementation of cluster-based SIR and show in simulations that it increases the quality of the estimator. Finally, the methodology is applied to the horse mussel data, and the comparison of the predictions obtained on test samples shows the superiority of cluster-based SIR over SIR.

Most common estimation methods for sample selection models rely heavily on parametric and normality assumptions. We consider in  a multivariate semiparametric sample selection model and develop a geometric approach to the estimation of the slope vectors in the outcome equation and in the selection equation. Contrary to most existing methods, we deal symmetrically with both slope vectors. Moreover, the estimation method is link-free and distribution-free. It works in two main steps: a multivariate sliced inverse regression step and a canonical analysis step. We establish √n-consistency and asymptotic normality of the estimates. We describe how to estimate the observation and selection link functions. The theory is illustrated with a simulation study.

In a multidimensional setting, the lack of an objective basis for ordering multivariate observations is a major problem in extending the notion of quantiles. Conditional quantiles are required in various biomedical and industrial problems. Numerous alternative definitions of (conditional) quantiles for multidimensional variables have been proposed in the statistical literature. In , we focus on the notions of geometric quantile and conditional geometric quantile, based on the minimization of a loss function. Asymptotic results have been obtained, and an implementation in R allowed us to demonstrate the good numerical performance of the proposed estimators on simulated and real datasets.
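
The geometric quantile at the origin is the familiar spatial median, the minimizer of the sum of Euclidean distances to the observations. A standard Weiszfeld iteration (a generic textbook sketch, not the estimators or the R implementation of the paper) computes it as follows:

```python
import numpy as np

def geometric_median(X, tol=1e-8, max_iter=500):
    """Weiszfeld iteration for the spatial median, i.e. the geometric
    quantile at u = 0: argmin_q sum_i ||x_i - q||."""
    q = X.mean(axis=0)
    for _ in range(max_iter):
        d = np.linalg.norm(X - q, axis=1)
        d = np.maximum(d, 1e-12)            # guard against division by zero
        w = 1.0 / d
        q_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(q_new - q) < tol:
            return q_new
        q = q_new
    return q

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 3))
med = geometric_median(X)
print(med)    # close to the origin for a centred spherical sample
```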

It has been found that, for a variety of probability distributions, there is a surprising linear relation between mode, mean and median. Here, the relation between the mode, mean and median regression functions is assumed to follow a simple parametric model. In , we propose a semiparametric conditional mode (mode regression) estimator for an unknown (unimodal) conditional distribution function in the context of a regression model, so that any m-step-ahead mean and median forecasts can then be substituted into the resulting model to deliver an m-step-ahead mode prediction. In the semiparametric model, least squares estimators (LSEs) of the model parameters and the simultaneous estimation of the unknown mean and median regression functions by the local linear kernel method are combined to infer the parametric and nonparametric components of the proposed model. The asymptotic normality of these estimators is derived, and the asymptotic distribution of the parameter estimates is also given and shown to attain the usual parametric rates in spite of the presence of the nonparametric component in the model. These results are applied to obtain a data-based test for the dependence of the mode regression on the mean and median regressions under a regression model.

**Clustering of variables.**

Clustering of variables is studied as a way to arrange variables into homogeneous clusters, thereby organizing data into meaningful structures. Once the variables are clustered into groups such that the variables within a cluster are similar to one another, the selection of a subset of variables becomes possible. Several specific methods have been developed for the clustering of numerical variables. However, far fewer methods have been proposed for categorical variables. In , we extend the criterion used by Vigneau and Qannari (2003) in their Clustering around Latent Variables approach for numerical variables to the case of categorical data. The homogeneity criterion of a cluster of categorical variables is defined as the sum of the correlation ratios between the categorical variables and a latent variable, which in this case is a numerical variable. We show that the latent variable maximizing the homogeneity of a cluster can be obtained with Multiple Correspondence Analysis. Different algorithms for the clustering of categorical variables are proposed: an iterative relocation algorithm, and ascendant and divisive hierarchical clustering. The proposed methodology is illustrated by a real data application to the satisfaction of pleasure craft operators.
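
The correlation ratio underlying this homogeneity criterion is simply the between-class share of the variance of the numerical latent variable. A minimal sketch on toy data (not the authors' implementation):

```python
import numpy as np

def correlation_ratio(labels, y):
    """Correlation ratio eta^2 of a numerical variable y with respect to
    a categorical variable: between-class variance over total variance."""
    labels, y = np.asarray(labels), np.asarray(y, dtype=float)
    grand = y.mean()
    total = ((y - grand) ** 2).sum()
    between = sum(
        (labels == c).sum() * (y[labels == c].mean() - grand) ** 2
        for c in np.unique(labels)
    )
    return between / total

# toy check: y is fully determined by the category, so eta^2 = 1
labels = np.array(["a", "a", "b", "b"])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(correlation_ratio(labels, y))   # 1.0
```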

**Rotation in Multiple Correspondence Analysis**

Multiple Correspondence Analysis (MCA) is a well-known multivariate method for the statistical description of categorical data. Similarly to what is done in Principal Component Analysis (PCA) and Factor Analysis, the MCA solution can be rotated to increase the simplicity of the components. The idea behind a rotation is to find subsets of variables that coincide more clearly with the rotated components. This implies that maximizing component simplicity can help in factor interpretation and in the clustering of variables. In , we propose a two-dimensional analytic solution for rotation in MCA. Similarly to what is done by Kaiser (1958) for PCA, this planar solution is computed by a practical algorithm applying successive pairwise planar rotations to optimize the rotation criterion. This criterion is a varimax-based one relying on the correlation ratios between the categorical variables and the MCA components. A simulation study is used to illustrate the proposed solution. An application to a real data set shows the possible benefits of using rotation in MCA.
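
A standard varimax rotation in its usual SVD formulation gives the flavour of Kaiser's scheme. The loading matrix below is arbitrary, and this generic sketch does not implement the paper's criterion based on correlation ratios.

```python
import numpy as np

def varimax(L, gamma=1.0, max_iter=100, tol=1e-8):
    """Kaiser's varimax rotation of a loading matrix L (SVD formulation).
    Returns the rotated loadings and the orthogonal rotation matrix R."""
    p, k = L.shape
    R = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        LR = L @ R
        # gradient of the varimax criterion with respect to the rotation
        G = L.T @ (LR ** 3 - (gamma / p) * LR @ np.diag((LR ** 2).sum(axis=0)))
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt
        if s.sum() < crit * (1 + tol):
            break
        crit = s.sum()
    return L @ R, R

# arbitrary illustrative loading matrix
L0 = np.array([[0.8, 0.2], [0.7, 0.3], [0.1, 0.9], [0.2, 0.8]])
rotated, R = varimax(L0)
```

The rotation matrix R is orthogonal, so the rotated solution spans the same component space while concentrating each variable's loading on fewer components.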

An important contract was signed in June 2009 between the EDF R&D team ICAME and the INRIA teams ALEA and CQFD. It deals with new statistical methods for the recursive adaptive prediction of the electricity consumption of EDF customers. The contract is divided into five lots and will last until 2011. The first lot was delivered in November 2009. The total value of the contract is 140,000 euros.

We are working on the statistical and quality control parts (sampling and data collection, survey weights, sampling errors) of a study financed by VNF (Voies Navigables de France) concerning a satisfaction survey of sailors on the "canal des deux mers" in the south of France (total value of the contract: 6,000 euros).

The goal of this project is to propose and study an approach to evaluate the probability of occurrence of events defined by the crossing of a threshold. To this end, we have started collaborating with Marie Touzet of the LMP (Laboratoire de Mécanique Physique). Together, we have investigated the various laws suited to modelling the propagation of a crack in alloys. We have proposed modelling the evolution of the crack length by Piecewise Deterministic Markov Processes. This work will give rise to the technical report .

Jérome Saracco is the leader of a research project financed by the Région Aquitaine for three years (2007-2009), named *Estimation recursive pour des modèles semiparamétriques en Statistique* (recursive estimation for semiparametric models in statistics), with a total amount of 120,000 euros including the PhD grant of Thi Mong Ngoc Nguyen.

The goal of the project "FAUTOCOES" (number ANR-09-SEGI-004) of the ARPEGE program of the French National Research Agency (ANR) can be described as follows. Today, complex technological processes must maintain an acceptable behavior in the event of random structural perturbations, such as failures or component degradation. Aerospace engineering provides numerous examples of such situations: an aircraft has to pursue its mission even if some gyroscopes are out of order, and a space shuttle has to succeed in its re-entry with a failed on-board computer. Failed or degraded operating modes are part of an embedded system's history and should therefore be accounted for during control synthesis.

These few basic examples show that complex systems like embedded systems are inherently vulnerable to failure of components and their reliability has to be improved through fault-tolerant control. Embedded systems require mathematical representations which are in essence dynamic, multi-model and stochastic. This increasing complexity poses a genuine scientific challenge:

to model explicitly and realistically the dynamical interactions existing between the physical state variables defining the system: pressure, temperature, flow rate, intensity, etc, and the functional and dysfunctional behavior of its components;

to estimate the performance of the system through the evaluation of reliability indexes such as availability, quality, and safety;

to optimize the control to prevent system failures, as well as to maintain the system function when a failure has occurred.

Our aim is to meet the previously mentioned challenge by using the framework of piecewise deterministic Markov processes (PDMP's in short) with an emphasis on probabilistic and deterministic numerical methods. More precisely, our objectives are

to use the framework of piecewise deterministic Markov processes to model complex physical systems and phenomena;

to compute expectations of functionals of the process in order to evaluate the performance of the system;

to develop theoretical and numerical control tools for PDMP's to optimize the performance and/or to maintain system function when a failure has occurred.
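
The second objective, computing expectations of functionals of a PDMP, can be illustrated by naive Monte Carlo on a toy process. All dynamics, rates and the threshold below are invented for illustration; this is not the crack-propagation or embedded-system model of the project.

```python
import numpy as np

rng = np.random.default_rng(3)

def crosses_before(T=10.0, lam=0.5, rate=0.3, threshold=5.0):
    """One trajectory of a toy PDMP: the state grows deterministically
    (dx/dt = rate * x) between the jumps of a Poisson clock of intensity
    lam, and each jump halves the state. Returns True if the state
    crosses the threshold before time T."""
    t, x = 0.0, 1.0
    while True:
        s = rng.exponential(1.0 / lam)     # waiting time to the next jump
        horizon = min(t + s, T)
        # the flow is increasing, so it crosses iff its endpoint does
        if x * np.exp(rate * (horizon - t)) >= threshold:
            return True
        x *= np.exp(rate * (horizon - t))
        t = horizon
        if t >= T:
            return False
        x *= 0.5                           # jump: the state is halved

# naive Monte Carlo estimate of the probability of crossing before T,
# i.e. the expectation of an indicator functional of the process
p_hat = np.mean([crosses_before() for _ in range(2000)])
print(p_hat)
```

More sophisticated quantization-based schemes are the object of the project; plain Monte Carlo is shown here only as the baseline way to evaluate such a reliability index.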

Bernard Bercu is the leader of the ECOS Action between France and Mexico entitled *Problèmes de rupture et le contrôle adaptatif pour les modèles de régression* (change-point problems and adaptive control for regression models). The Universities of Bordeaux 1, Toulouse 3, Paris 5, Mexico, and Puebla are involved in this action for four years.

B. Bercu belongs to the MAS thematic group of the SMAI.

B. Bercu is an assistant director of the Institute of Mathematics of Bordeaux. He is also a member of the IMB council as well as of the UFR council of the University of Bordeaux.

B. Bercu is the director of the applied mathematics department of the University of Bordeaux.

M. Chavent is webmaster of the web site of the SFDS (Société Française de Statistique).

M. Chavent is a member of the scientific committee of the conference EGC'09.

M. Chavent is vice-secretary and member of the administration council of the SFDS (Société Française de Statistique).

B. de Saporta belongs to the board of SMAI-MAS group. She is webmaster of their website.

B. de Saporta is a member of the organizing committee of enigmath 2008 and 2009, a free online mathematical quiz.
http://

B. de Saporta presented an animation at Cap Science for the *fête de la science* 2009.

F. Dufour has been an associate editor of the SIAM Journal on Control and Optimization since 2009.

F. Dufour is a member of the IFAC Technical Committee TC 1.4 Stochastic Systems for the 2008-2011 term.

F. Dufour is a member of the scientific council of the engineering school ENSEIRB-MATMECA.

A. Gégout-Petit is in charge of promoting the "Licence MASS" (applied mathematics degree) of the University of Bordeaux 2 to secondary school pupils.

A. Gégout-Petit is an elected member of the administration council of the SFdS.

A. Gégout-Petit is a member of the administration team of the web domain emath.fr.

A. Gégout-Petit is an elected member of the CEVU (Conseil des Etudes et de la Vie Universitaire) of the University of Bordeaux 2.

J. Saracco is a member of the administration council of the University of Bordeaux 4.

J. Saracco is a reviewer for Computational Statistics and Data Analysis, Journal of Multivariate Analysis, The Annals of Statistics, Statistica Sinica, Biometrika, Journal of the American Statistical Association.

The CQFD team was heavily involved in the organization of the "41e Journées de Statistique", the annual event of the Société Française de Statistique, which took place in Bordeaux in 2009.

The teams ALEA and CQFD, as well as INRIA Bordeaux Sud-Ouest, are very involved in the organization of the "8e Journées MAS de la SMAI",
http://

B. Bercu was the organizer of the "Fifth Meeting of Mathematical Statistics" between the Universities of Bordeaux, Montpellier, Santander, Valladolid, Pau, and Toulouse in June 2009.

B. de Saporta co-organized with Christian Paroissin a workshop on applied mathematics in dependability and safety.

B. Bercu gave contributed talks at

Orsay University, February 2009

University of L'Aquila, Italy, March 2009

Bordeaux University, June 2009

Journées Algorithmes stochastiques, November 2009

Rennes University, December 2009

48th IEEE CDC Conference, Shanghai, December 2009

M. Chavent gave contributed talks at

IFCS 2009 (11th Conference of the International Federation of Classification Societies), Dresden, Germany, March 2009.

41èmes journées de Statistique, Bordeaux, May 2009.

XIIIth Applied Stochastic Models and Data Analysis (ASMDA2009) International Conference, Vilnius, Lithuania, July 2009.

SFC 2009 (XVIèmes Rencontres de la Société Francophone de Classification), Grenoble, France, September 2009.

B. de Saporta gave contributed talks at

the seminar of probability and statistics of Lille 1 university

the seminar of applied mathematics of IMB

the workshop Mathematical models for cell division at IHP Paris

the symposium on Optimal Stopping with Applications in Turku, Finland

the 3rd IFAC Conference on Analysis and Design of Hybrid Systems, Zaragoza, Spain.

F. Dufour spent two weeks at the Escola Politecnica, Universidade de Sao Paulo, Brazil (June 2009), working with O.L.V. Costa.

F. Dufour spent two weeks at the University of Wisconsin, Milwaukee, USA, in February and October 2009, working with R. Stockbridge.

F. Dufour gave a contributed talk at the 48th IEEE CDC Conference, Shanghai, December 2009.

F. Dufour chaired the 48th CDC/28th CCC session "Stochastic Control I".

A. Gégout-Petit gave contributed talks at

41èmes journées de Statistique, Bordeaux, May 2009.

Causal modeling workshop, Oslo, September 2009.

J. Saracco gave contributed talks at

IFCS 2009 (11th Conference of the International Federation of Classification Societies), Dresden, Germany, March 2009.

41èmes journées de Statistique, Bordeaux, May 2009.

XIIIth Applied Stochastic Models and Data Analysis (ASMDA2009) International Conference, Vilnius, Lithuania, July 2009.

Troisièmes Rencontres des Jeunes Statisticiens (sous l'égide de la SFDS), Aussois, France, September 2009.

B. Bercu, A. Brandejsky, M. Chavent, B. de Saporta, F. Dufour, A. Gégout-Petit, J. Saracco, and H. Zhang teach graduate probability and statistics in the track "Statistique et Fiabilité" (Statistics and Reliability) of the Master "Ingénierie Mathématique Statistique et Economique" at the Universities of Bordeaux 1, 2, 4, and the Institut Polytechnique de Bordeaux.

M. Chavent and A. Gégout-Petit teach statistics in the Licence MASS of the University of Bordeaux 2.

A. Gégout-Petit is an academic tutor for students' work placements in companies or research organizations.

B. de Saporta teaches undergraduate mathematics and postgraduate probability and finance at the University Montesquieu Bordeaux 4 (economics).

J. Saracco teaches statistics in the Magistère d'Economie et de Finance Internationale (MAGEFI) at the University of Bordeaux 4, as well as linear algebra and statistics in the Licence of Economics, University of Bordeaux 4.