The research domain for the select project is statistics. Statistical methodology has made great progress over the past few decades, with a variety of statistical learning software packages that support many different methods and algorithms. Users now face the problem of choosing among them, to select the most appropriate method for their data sets and objectives. Model selection is an important but difficult problem, both theoretically and practically. Classical model selection criteria, which use penalized minimum-contrast criteria with fixed penalties, are often based on unrealistic assumptions.

select aims to provide efficient model selection criteria with data-driven penalty terms. In this context, select expects to improve the toolkit of statistical model selection criteria from both theoretical and practical perspectives. Currently, select is focusing its effort on variable selection in statistical learning, non-linear regression models with random effects, hidden-structure models and supervised classification. Its domains of application concern reliability, curve classification, phylogeny analysis and classification in genetics. New developments in select's activities concern applications in biostatistics (statistical analysis of fMRI data, population pharmacology) and population genetics.

Pascal Massart has been nominated senior member of the Institut Universitaire de France.

We learned from the applications we treated that some assumptions currently used in asymptotic theory for model selection are often irrelevant in practice. For instance, it is not realistic to assume that the target belongs to the family of models in competition. Moreover, in many situations, it is useful to make the size of the model depend on the sample size, which makes the asymptotic analysis break down. An important aim of select is to propose model selection criteria which take these practical constraints into account.

An important purpose of select is to build and analyze penalized log-likelihood model selection criteria that are efficient when the number of models in competition grows to infinity with the number of observations. Concentration inequalities are a key tool for that purpose and lead to data-driven penalty choice strategies. A major goal of select is to deepen the analysis of data-driven penalties from both the theoretical and the practical side. There is no universal way of calibrating penalties, but there are several general ideas that we want to develop, including heuristics derived from the Gaussian theory, special strategies for variable selection and resampling methods.

Choosing a model is not only difficult theoretically. From a practical point of view, it is important to design model selection criteria that accommodate situations in which the data probability distribution P is unknown and which take the model user's purpose into account. Most standard model selection criteria assume that P belongs to one of a set of models, without considering the purpose of the model. By also considering the model user's purpose, we avoid or overcome certain theoretical difficulties and can produce flexible model selection criteria with data-driven penalties. Such criteria are useful in supervised classification and hidden-structure models.

The Bayesian approach to statistical problems is fundamentally probabilistic. A joint probability distribution is used to describe the relationships among all the unknowns and the data. Inference is then based on the posterior distribution i.e. the conditional probability distribution of the parameters given the observed data. Exploiting the internal consistency of the probability framework, the posterior distribution extracts the relevant information in the data and provides a complete and coherent summary of post-data uncertainty. Using the posterior to solve specific inference and decision problems is then straightforward, at least in principle.
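
As a minimal illustration of how a posterior distribution summarizes post-data uncertainty, the sketch below updates a Beta prior on a success probability after observing binomial data. The conjugate Beta-Binomial setting, the uniform prior and the function names are assumptions chosen for simplicity; they are not taken from the team's work.

```python
def beta_binomial_posterior(successes, trials, prior_a=1.0, prior_b=1.0):
    """Return the (a, b) parameters of the Beta posterior.

    Assumes a Beta(prior_a, prior_b) prior on the success probability
    and a binomial likelihood (conjugate update).
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    return prior_a + successes, prior_b + (trials - successes)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1.0))


# Hypothetical data: 7 successes out of 10 trials, uniform prior.
a, b = beta_binomial_posterior(successes=7, trials=10)
posterior_mean = beta_mean(a, b)
posterior_variance = beta_variance(a, b)
```

The posterior parameters carry the whole inference here: point estimates and uncertainty measures are both read off the same distribution.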

Mathematical modelling of the dynamic processes involved in biological processes constitutes an important application in biostatistics. Mixed effect models are very useful for modelling the variability within a population of these dynamic processes. Several statistical issues can be studied related to these models, such as parameter estimation, model selection (covariate model through the specification of fixed effect structure, covariance model for random effects), models defined by Ordinary or Stochastic Differential Equations, left censored models, hidden Markov models as well as design optimization for the trial itself.

A key goal of select is to produce methodological contributions in statistics. For this reason, the select team works with applications that serve as an important source of interesting practical problems and require innovative methodologies to address them. Most of our applications involve contracts with industrial partners, e.g. in reliability and pharmacology, although we also have several more academic collaborations, e.g. genomics, genetics and neuroimaging.

The classification of complex data such as curves, functions, spectra and time series is an important field. Standard data analysis questions are being revisited to define new strategies that take the functional nature of the data into account. Functional data analysis addresses a variety of applied problems, including longitudinal studies, analysis of fMRI data and spectral calibration.

We are focusing on unsupervised classification. In addition to standard questions such as the choice of the number of clusters, the norm for measuring the distance between two observations, and the vectors for representing clusters, we must also address a major computational problem: the functional nature of the data makes it necessary to design efficient anytime algorithms.

For several years, select has collaborated with the EDF-DER *Maintenance des Risques Industriels* group. An important theme concerns the resolution of inverse problems using simulation tools to analyse uncertainty in highly complex physical systems. A collaboration on an analogous topic is being developed with Dassault Aviation.

The other major theme concerns probabilistic modelling in fatigue analysis, in the context of a research collaboration with SAFRAN, a high-technology group (aerospace propulsion, aircraft equipment, defense security, communications).

Since 2007, select has participated in a working group with the Neurospin team (CEA-INSERM-INRIA) on Classification, Statistics and fMRI (functional Magnetic Resonance Imaging) analysis. In this framework, two theses are co-supervised by select and Neurospin researchers (Merlin Keller since October 2006 and Vincent Michel since October 2007). The aim of this research is to determine which parts of the brain are activated by different types of stimuli. A model selection approach is useful to avoid "false-positive" detections.

Pharmacokinetic (PK) and pharmacodynamic (PD) studies (studies investigating the dose-concentration and concentration-effect relationships of drugs) show for many drugs a large variability of pharmacokinetic and pharmacodynamic parameters between individuals. Pharmacokinetic parameters describe processes such as absorption, diffusion and metabolism of drugs. The so-called "population PK/PD approach" has been developed to characterize and quantify this variability. We have developed a complete methodology for the analysis of PK/PD data using a maximum likelihood approach.

An important application is the study of anti-HIV treatment. The efficiency of antiretroviral treatments, whether in HIV or hepatitis B or C pathologies, is quantified by the decrease in viral loads. Models have been developed to describe the time course of this decrease through a system of ordinary differential equations (ODEs), taking into account the physiology of viral replication and the action mechanisms of the different therapeutic options. There is large inter-patient variability in these pathologies, and the joint study of viral load decrease through mixed effect models in a set of patients provides a better understanding of differences in the response to treatment.
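
To make the idea of an ODE description of viral load decrease concrete, here is a small sketch based on the textbook target-cell-limited model (uninfected cells, infected cells, free virus), integrated with a classical Runge-Kutta scheme. The model, the parameter values and the function names are illustrative assumptions; the text does not specify which system the team uses, and the mixed-effects layer (patient-specific parameters) is deliberately left out.

```python
def viral_dynamics_rhs(state, params):
    """Right-hand side of a target-cell-limited model (assumed, textbook form)."""
    target, infected, virus = state
    production, death, infection, clearance_i, burst, clearance_v = params
    new_infections = infection * target * virus
    return (
        production - death * target - new_infections,   # uninfected cells
        new_infections - clearance_i * infected,        # infected cells
        burst * infected - clearance_v * virus,         # free virus
    )


def rk4_step(rhs, state, params, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = rhs(state, params)
    k2 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k1)), params)
    k3 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k2)), params)
    k4 = rhs(tuple(s + h * k for s, k in zip(state, k3)), params)
    return tuple(
        s + h * (a + 2.0 * b + 2.0 * c + d) / 6.0
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(state0, params, t_end, h=0.01):
    """Integrate the system from time 0 to t_end and return the final state."""
    state = tuple(state0)
    for _ in range(int(round(t_end / h))):
        state = rk4_step(viral_dynamics_rhs, state, params, h)
    return state


# Hypothetical values: a fully effective treatment sets the infection rate to 0.
params_on_treatment = (10.0, 0.01, 0.0, 0.5, 100.0, 3.0)
initial_state = (1000.0, 3.0, 100.0)  # (target cells, infected cells, virus)
final_state = simulate(initial_state, params_on_treatment, t_end=10.0)
```

In a population analysis, each patient would have their own parameter vector drawn from a common distribution; the sketch only covers the structural model for one individual.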

mixmod is being developed in collaboration with Christophe Biernacki, Florent Langrognet (Université de Franche-Comté) and Gérard Govaert (Université de Technologie de Compiègne). The mixmod (mixture modelling) software fits mixture models to a given data set for either clustering or discriminant analysis. mixmod uses a large variety of algorithms to estimate mixture parameters, e.g., EM, Classification EM, and Stochastic EM. They can be combined into different strategies that lead to a sensible maximum of the likelihood (or completed likelihood) function. Moreover, different information criteria for choosing a parsimonious model (e.g. the number of mixture components) are included, some of them favoring a cluster analysis viewpoint and others a discriminant analysis viewpoint. Many Gaussian models for continuous variables and multinomial models for discrete variables are available. Written in C++, mixmod is interfaced with Scilab and Matlab. The software, the statistical documentation and the user guide are available on the Internet at the following address: http://
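
The EM algorithm mentioned above can be sketched for the simplest case, a two-component univariate Gaussian mixture. This is a plain-Python illustration, not mixmod code or its API; the initialisation rule, the variance floor and all names are assumptions made for the example.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, n_iter=200):
    """Fit a two-component univariate Gaussian mixture with plain EM."""
    data = sorted(data)
    n = len(data)
    # Crude initialisation (an assumption): split the sorted sample in two halves.
    half = n // 2
    means = [sum(data[:half]) / half, sum(data[half:]) / (n - half)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E step: posterior probability that each point comes from component 0.
        resp = []
        for x in data:
            p0 = weights[0] * normal_pdf(x, means[0], variances[0])
            p1 = weights[1] * normal_pdf(x, means[1], variances[1])
            resp.append(p0 / (p0 + p1))
        # M step: weighted proportions, means and variances.
        for k in (0, 1):
            r = resp if k == 0 else [1.0 - t for t in resp]
            nk = sum(r)
            weights[k] = nk / n
            means[k] = sum(ri * x for ri, x in zip(r, data)) / nk
            var_k = sum(ri * (x - means[k]) ** 2 for ri, x in zip(r, data)) / nk
            variances[k] = max(var_k, 1e-6)  # floor to avoid degenerate components
    loglik = sum(
        math.log(
            weights[0] * normal_pdf(x, means[0], variances[0])
            + weights[1] * normal_pdf(x, means[1], variances[1])
        )
        for x in data
    )
    return weights, means, variances, loglik


# Synthetic sample: two well-separated groups (hypothetical data).
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
sample += [rng.gauss(5.0, 1.0) for _ in range(200)]
weights, means, variances, loglik = em_two_gaussians(sample)
```

Classification EM and Stochastic EM differ in the E step: the former assigns each point to its most probable component, the latter draws the assignment at random from the posterior probabilities.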

Since November 2008, a new expert engineer, Jean-François Si Abdallah, has been hired for two years to continue enriching the software, improve its performance and code a proper graphical user interface for mixmod, Version 1 of which will be available at the end of 2010.

monolix (http://

Parameter estimation (computing the maximum likelihood estimator of the parameters, without any approximation of the model, computing standard errors for the maximum likelihood estimator),

Model selection (comparing several models using some information criteria (AIC, BIC), testing hypotheses using the Likelihood Ratio Test, testing parameters using the Wald Test),

Goodness of fit plots,

Data simulation.
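
The information criteria named in the model selection item follow simple formulas, sketched below. The log-likelihoods, parameter counts and model names are hypothetical numbers chosen for the example, and the code is not monolix's implementation; in particular, the sample size to use in BIC for mixed effects models is a modelling choice that this sketch does not address.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion (smaller is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion (smaller is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def select_model(candidates, n_obs, criterion="bic"):
    """Return the name of the candidate with the smallest criterion value.

    candidates maps a model name to (maximised log-likelihood, number of parameters).
    """
    def score(name):
        log_likelihood, n_params = candidates[name]
        if criterion == "aic":
            return aic(log_likelihood, n_params)
        return bic(log_likelihood, n_params, n_obs)

    return min(candidates, key=score)


# Hypothetical fits of two nested models to the same 120 observations.
candidates = {
    "one-compartment": (-512.3, 4),
    "two-compartment": (-508.9, 6),
}
best_by_aic = select_model(candidates, n_obs=120, criterion="aic")
best_by_bic = select_model(candidates, n_obs=120, criterion="bic")
```

With these numbers the two criteria disagree: AIC prefers the richer model while BIC, whose penalty grows with the sample size, prefers the smaller one.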

Several stochastic algorithms are used in monolix, including Stochastic Approximation of EM (SAEM), Importance Sampling, MCMC, and Simulated Annealing. Theoretical properties of the proposed algorithms and practical applications were published in several papers.

Marc Lavielle has presented the software on several occasions:

PAGE meeting, Berlin, June 2010,

Sheffield University (UK), January 2010,

Sanofi-Aventis (France), October 2010.

Version 3.2 of monolix has been available since November 2010. This version of the software was developed thanks to the financial support of Novartis, Roche, J&J, Sanofi-Aventis, Astrazeneca and Exprimo. It was presented during the MONOLIX Day on November 26th at the Holiday Inn hotel, Canal de la Villette, Paris.

The MONOLIX Project consists primarily of developing the next versions of the MONOLIX software with a view to raising its level of functionalities and responding to major requirements of the bio-pharmaceutical industry.

The MONOLIX Project is carried out by INRIA and sponsored by industry.

The MONOLIX Scientific Guidance Committee involves representatives of the sponsors.

We have obtained from INRIA Saclay-Île-de-France an ADT (Action Développement Logiciel) to hire two engineers (Benoît Charles until June 2011, Hector Mesa until December 2010).

We have obtained from DIGITEO an OMTE (Opération de Maturation Technico-Économique) to fund one engineer (Kaelig Chatel from September 2009 to March 2011), as well as market assessment and intellectual property coaching.

In collaboration with Serge Cohen (IPANEMA Soleil), Erwan Le Pennec has proposed a novel way to use spatial information for hyperspectral image segmentation. They rely on a Gaussian mixture model in which the proportions are piecewise constant in the spatial domain. They proposed a penalization scheme to select simultaneously the number of clusters, the segmentation and the type of covariance structure. This scheme has been implemented within MIXMOD and used successfully on conservation science data. Their algorithm is supported by theoretical work on conditional density estimation. They have derived oracle inequalities relating a tensorized Kullback-like divergence of the estimation error to a deterministic quantity measuring a penalized tensorial Kullback divergence approximation property of the model collection. They obtain those results for their spatial Gaussian mixture model as well as for more classical histogram-type conditional density models. The theoretical penalty is known up to a factor that has to be determined numerically. Serge Cohen and Erwan Le Pennec are investigating a slope heuristic approach to handle this issue. They next want to tackle the issue of dimension reduction in an unsupervised segmentation context.

In collaboration with Marie-Laure Martin-Magniette (URGV and UMR AgroParisTech/INRA MIA 518) and Cathy Maugis (INSA Toulouse), the variable selection procedure developed for model-based clustering has been extended to supervised classification. Identifiability and consistency of the model have been proved. The forward selection procedure leads to dramatically improved error rates for quadratic discriminant analysis. These variable selection procedures are used in particular for genomics applications, as the result of a collaboration with researchers of URGV (Evry Genopole).

Jean-Patrick Baudry, in collaboration with Cathy Maugis (INSA Toulouse) and Bertrand Michel (Université Paris 6), has developed a Matlab package for the use of the slope heuristics of Birgé and Massart (dimension jump, slope estimation, ...). The aim is twofold: first, to propose solutions to the difficulties raised by its practical application and, second, to provide a user-friendly way to apply the slope heuristics. They have submitted an overview of the slope heuristics to introduce this package.
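
The dimension-jump variant of the slope heuristics can be sketched as follows: for each penalty constant on a grid, select the model minimising the penalized contrast, locate the constant at which the selected dimension drops the most, and use twice that constant for the final selection. This is a plain-Python illustration, not the Matlab package; it assumes a penalty shape proportional to the model dimension, and the contrast values are synthetic.

```python
def select_index(contrasts, dims, kappa):
    """Index of the model minimising contrast + kappa * dimension."""
    return min(range(len(dims)), key=lambda i: contrasts[i] + kappa * dims[i])


def dimension_jump(contrasts, dims, kappa_grid):
    """Return (kappa_hat, index of the finally selected model).

    kappa_hat is the grid value right after the largest drop in selected
    dimension; the final penalty uses the constant 2 * kappa_hat.
    """
    selected = [dims[select_index(contrasts, dims, k)] for k in kappa_grid]
    jumps = [selected[i] - selected[i + 1] for i in range(len(selected) - 1)]
    i_max = max(range(len(jumps)), key=lambda i: jumps[i])
    kappa_hat = kappa_grid[i_max + 1]
    final_index = select_index(contrasts, dims, 2.0 * kappa_hat)
    return kappa_hat, final_index


# Synthetic empirical contrasts (hypothetical): a bias term that vanishes
# quickly, plus a linear decrease with slope -0.45 for large dimensions.
dims = list(range(1, 21))
contrasts = [100.0 * 0.5 ** d - 0.45 * d for d in dims]
kappa_grid = [j / 10.0 for j in range(1, 21)]

kappa_hat, final_index = dimension_jump(contrasts, dims, kappa_grid)
selected_dimension = dims[final_index]
```

On a grid, the estimated constant can only be located up to the grid step, so a finer grid gives a value closer to the true slope of the contrast.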

Jean-Patrick Baudry and Gilles Celeux, in collaboration with Ana Maria Ferreira and Margarida Cardoso (Lisbon University), proposed a model selection criterion that is helpful when one seeks a solution closely related to an external classification available *a priori*. The development of the methodology and applications to real data in various fields, notably the professional development field, are in progress.

Pascal Massart and Caroline Meynet have analyzed the performance of the Lasso as regards its $\ell_1$-regularization properties in a general Gaussian framework which includes the fixed design regression and white noise frameworks. They have provided an $\ell_1$-oracle inequality satisfied by the Lasso in the case of a finite dictionary. This result does not require any assumption and proves that the Lasso performs almost as well as the deterministic Lasso, provided that the regularization parameter is properly chosen. They proposed a new estimator to deal with countably infinite dictionaries. It is an $\ell_0$-penalized estimator among a sequence of Lasso estimators associated with a dyadic sequence of growing truncated dictionaries. They provided an oracle inequality satisfied by this estimator and proved that it performs as well as greedy algorithms, with the advantage of being adaptive.

In collaboration with Professor Abdallah Mkhadri (University of Marrakesh, Morocco), Gilles Celeux supervised the thesis of Mohammed El Anbari, which concerns regularisation methods in linear regression. This year, together with Abdallah Mkhadri, Mohammed El Anbari proposed a method to simultaneously select variables and favor a grouping effect, where strongly correlated predictors tend to be in or out of the model together. Numerical experiments showed that their method can be preferred to Elastic-Net when the number of variables is less than or equal to the sample size, and remains competitive otherwise. Moreover, they have proposed AdaGril, an extension of the adaptive Elastic Net which incorporates information redundancy among correlated variables for model selection and estimation. Under weak conditions, they have established an oracle property of AdaGril. Numerical experiments show that in some cases AdaGril outperforms its competitors.

In collaboration with Jean-Michel Marin (Université de Montpellier) and Christian P. Robert (CEREMADE, Université Paris Dauphine), Gilles Celeux and Mohammed El Anbari highlight, through numerical experiments, the interest of Bayesian regularization methods using hierarchical non-informative priors, compared with standard regularisation methods, in a poorly informative context.

Jean-Michel Poggi has supervised, since September 2007, the PhD thesis of Robin Genuer, dedicated to Random Forests and related algorithms for variable selection in regression or classification. The Random Forest method, due to Leo Breiman in 2001, proceeds by aggregating decision trees built under two random perturbations. The first one perturbs the learning sample according to the bootstrap principle, and the second one acts on the covariate space by randomly choosing a small number of explanatory variables to split a tree node. Surprisingly, this algorithm is extremely powerful for regression and classification problems, not only for prediction but also for variable selection purposes. The PhD thesis, completed in November 2010, is organized along three directions:

The theoretical direction concerns the mathematical understanding of the reasons for this remarkable behaviour. A simplified version of random forests, called purely random forests, which is easier to handle theoretically, has been considered. In the regression context with a one-dimensional predictor space, both random trees and random forests have been proved to reach the minimax rate of convergence. In addition, it is proven that, compared to random trees, random forests improve accuracy by reducing the estimator variance by a factor of three fourths.

The second direction, methodological, aims at improving the knowledge of how to tune the parameters and at proposing, with Jean-Michel Poggi and Christine Tuleau-Malot, a new variable selection strategy. It includes computer-intensive simulations and comparisons based on well-known real data sets. Two classical issues of variable selection are considered. The first one is to find important variables for interpretation, and the second one is to design a good parsimonious prediction model. The main contribution is twofold: to provide some experimental insights into the behavior of the variable importance index based on random forests, and to propose a strategy involving a ranking of explanatory variables using the random forests importance score together with a stepwise ascending variable introduction strategy.

The last one is applied in nature and takes place within the joint working group between select and Neurospin (INRIA, CEA) dedicated to statistical methods for new fMRI data, in order to improve knowledge of brain activity. It aims to develop ad hoc variable selection strategies.
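
The two random perturbations that define a random forest, as described before the three directions above, can be sketched with the standard library alone: a bootstrap draw of the learning sample for each tree, and a random subset of candidate variables at each node. The function names and the sizes used are assumptions for illustration; tree growing and aggregation are not shown.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (first perturbation)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def out_of_bag_indices(n_samples, in_bag):
    """Rows never drawn in the bootstrap sample, usable for error estimation."""
    chosen = set(in_bag)
    return [i for i in range(n_samples) if i not in chosen]


def candidate_features(n_features, mtry, rng):
    """Pick mtry distinct variables to try at one node (second perturbation)."""
    return rng.sample(range(n_features), mtry)


rng = random.Random(42)
in_bag = bootstrap_indices(100, rng)          # learning sample of one tree
oob = out_of_bag_indices(100, in_bag)         # observations left out of that tree
features_at_node = candidate_features(n_features=20, mtry=4, rng=rng)
```

The out-of-bag observations are what importance scores based on random forests typically rely on: the prediction error is compared before and after permuting one variable among the rows a tree has not seen.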

Unsupervised segmentation is an issue similar to unsupervised classification, with an added spatial aspect. Functional data are acquired at points in a spatial domain and the goal is to segment the domain into homogeneous regions. The range of applications includes hyperspectral images in conservation sciences, fMRI data and all spatialized functional data. Erwan Le Pennec is focusing on how to handle the spatial component, from both the theoretical and the practical points of view, as well as on the choice of the number of clusters. Furthermore, as functional data require heavy computation, numerically efficient algorithms have to be proposed.

After studying the state of the art on variational methods, Christine Keribin, in collaboration with Gilles Celeux and Gérard Govaert (UTC Compiègne), has compared the behavior of the variational method with that of the Stochastic EM algorithm for the latent block model (biclustering, i.e. simultaneous classification of the rows and columns of a matrix). This has been done on benchmark data and is leading to the use of the SEM algorithm to find an appropriate initialization for the VB algorithm. This study will be continued on real data and extended to the construction of model selection methods to choose a relevant latent block model.

In the computer experiments field, the goal is to approximate an expensive black box function from a limited number of evaluations. The choice of these evaluations i.e. the choice of a design of (computer) experiments is a major issue.

Previous work has justified choosing a design satisfying the maximin criterion by using results from approximation theory. In the case where the black box function is to be approximated on a hypercubic domain, the standard strategy consists of taking a maximin design within a class of Latin hypercube designs. However, Latin hypercube sampling is pointless in a non-hypercubic domain. Yves Auffray, Pierre Barbillon and Jean-Michel Marin proposed a simulated annealing algorithm, implemented in *C*, which aims at obtaining a maximin design in any bounded connected domain. They proved the convergence of their algorithm. This year Yves Auffray and Pierre Barbillon, in collaboration with Jean-Michel Marin (Université de Montpellier), have considered estimating the probability of rare events in the context of computer experiments. These rare events depend on the output of a physical model with random input variables. Since the model is only known through an expensive black box function, a crude Monte Carlo estimator does not perform well. Two strategies have been developed to cope with this difficulty: a Bayesian estimate and an importance sampling method. Both methods rely on Kriging metamodeling. They are able to achieve sharp upper confidence bounds on the rare event probability. These methods have been applied to a toy example and to a real case study which consists of finding an upper bound on the probability that the trajectory of an airborne load collides with the aircraft that has released it.
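
The reason a crude Monte Carlo estimator struggles with rare events, and the way importance sampling helps, can be seen on a toy problem: estimating the probability that a standard Gaussian variable exceeds a high threshold. This sketch is only an illustration under simplifying assumptions: it uses a known closed-form model and a shifted Gaussian proposal, with no Kriging metamodel and no expensive black box, so it is not the method developed by the authors.

```python
import math
import random


def crude_monte_carlo(threshold, n, rng):
    """Fraction of standard Gaussian draws that exceed the threshold."""
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > threshold)
    return hits / n


def importance_sampling(threshold, n, rng):
    """Estimate P(X > threshold) for X ~ N(0, 1), sampling from N(threshold, 1)."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            # Likelihood ratio phi(x) / phi(x - threshold).
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n


def exact_tail(threshold):
    """Exact value of P(X > threshold) for a standard Gaussian."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


rng = random.Random(1)
threshold = 4.0
estimate_is = importance_sampling(threshold, 20000, rng)
estimate_crude = crude_monte_carlo(threshold, 20000, rng)
reference = exact_tail(threshold)
```

With 20000 draws, the crude estimator expects fewer than one exceedance of the threshold, so its value is typically zero or a coarse multiple of 1/20000, whereas about half of the shifted draws contribute to the importance sampling estimate.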

In the framework of a CIFRE convention with EDF (MRI department), Shuai Fu has started a thesis supervised by Gilles Celeux. It concerns the resolution of non-linear inverse problems for the quantification of uncertainties in a physical model. Noisy observed data $Y$ depend on non-observed data $X$ through a known but complex function $H$. The aim is to estimate the parameters of the probability distribution of $X$ and the variance of the noise. This problem has a missing data structure and has been solved with a method coupling the Stochastic EM algorithm with an MCMC method and a Kriging approximation of the function $H$, under identifiability constraints. In order to overcome those limitations, Shuai Fu embedded the inverse problem in a Bayesian framework through a hybrid MCMC algorithm. Preliminary numerical experiments show the importance of choosing a smart design. Assessing the quality of the approximation and defining efficient adaptive designs are the next steps to be considered.

Moreover, a complete numerical study on Bayesian inference for inverse problems in industrial uncertainty analysis has been carried out in the context of the EHPOC project by Rémi Fouchereau, with the help of Pierre Barbillon, Gilles Celeux and Shuai Fu.

In collaboration with Farouk Mhamdi and Meriem Jaidane (ENIT, Tunis, Tunisia), Jean-Michel Poggi proposed a method for trend extraction from seasonal time series through the Empirical Mode Decomposition (EMD). Experimental comparisons of trend extraction based on EMD, X11, X12 and the Hodrick-Prescott filter are conducted. First results show the eligibility of the blind EMD trend extraction method. The real Tunisian peak load is also used to illustrate the extraction of the intrinsic trend.

In collaboration with Mina Aminghafari (Amirkabir University, Tehran), Jean-Michel Poggi used wavelets for the statistical forecasting of time series. Recent approaches involve wavelet decompositions in order to handle non-stationary time series. They studied and extended an approach proposed by Renaud et al. to estimate the prediction equation by direct regression of the process on the Haar non-decimated wavelet coefficients depending on its past values. The new variants are used first for stationary data and then for stationary data contaminated by a deterministic trend.

Jean-Michel Poggi is the supervisor (with A. Antoniadis) of the PhD thesis of Jairo Cugliari-Duhalde, which takes place under a CIFRE convention with EDF. It is strongly related to the use of wavelets together with curve clustering in order to perform accurate load consumption forecasting. The thesis develops methodological and applied aspects linked to the electrical context, as well as theoretical ones, by introducing exogenous variables in the context of nonparametric time series forecasting.

This research takes place as part of a collaboration with Neurospin (http://

Vincent Michel began his PhD in October 2007 under the supervision of Gilles Celeux, Christine Keribin (Paris XI / Select) and Bertrand Thirion (Parietal). His thesis focused on the different approaches of statistical learning for studying the neural code of cognitive functions, based on brain functional Magnetic Resonance Imaging (fMRI) data. In particular, he has studied the spatial organization of the neural entities involved in different cognitive tasks, with a special interest in the visual system. The three main methodological contributions of this thesis are the following:

A *Bayesian* approach for sparsity-inducing regularization, called *Multi-Class Sparse Bayesian Regression – MCBR*. This approach is a generalization of the two principal Bayesian regularization techniques, *Bayesian Ridge Regression* and *Automatic Relevance Determination*.

He proposed an approach, called *supervised clustering*, that includes spatial information in the prediction framework and yields clustered weighted maps. It can be used with any prediction function for high-dimensional data.

He implemented both sparsity and spatially informed regularization within the same framework. He proposed a generalization of *Total Variation regularization* for prediction tasks and showed its good performance in the case of *fMRI* data analysis.

In addition to these methodological directions, Vincent Michel has also worked on the implementation of the algorithms studied and detailed in this thesis and contributed to *Scikit-learn*, an open-source library for statistical learning (http://

Moreover, Vincent Michel and Robin Genuer examine the value of random forests for such problems and present a new approach for predicting a behavioral variable from functional Magnetic Resonance Imaging (fMRI) data. The difficulty comes from the huge number of image voxels that may provide relevant information, relative to the limited number of available images. Based on Random Forests, the approach provides an accurate auto-calibrated framework for selecting a reduced set of jointly informative regions.


**Mixed effects hidden Markov models**: MHMM have recently been developed as an extension of hidden Markov models to population studies. In the pharmacometric area, mixed hidden Markov models would provide an accurate description of longitudinal data collected during certain clinical trials, especially when distinct (hidden) disease stages are supposed to condition the distribution of some biological markers. Those particular models are quite easily interpretable and could even reveal similarities in the biological process that governs certain pathologies.

Our work mainly aimed at developing and evaluating a complete methodology for estimating parameters in those new models. Our algorithms were applied in the clinical context of epilepsy, to model daily seizure counts in epileptic patients and to assess the effects of a given anti-epileptic drug on the evolution of the epileptic symptoms.

**Mixture models and model mixtures**: a patient population is usually heterogeneous with respect to response to drug therapy. In any clinical efficacy trial, patients who respond, those who partially respond and those who do not respond present very different profiles. The diversity of the observed kinetics then cannot be adequately explained by the inter-patient variability of some parameters alone, and mixtures are a relevant alternative in such situations.

We have extended the SAEM algorithm for mixture models and model mixtures. We have applied the proposed methodology for analyzing viral load data arising from HIV infected patients. We propose to describe these viral load data with a mixture of three models. Indeed, the data seem to exhibit three different typical profiles: responders, non-responders and rebounders.

**Non-Linear Mixed Effects Models with Stochastic Differential Equations**: the use of stochastic differential equations in non-linear mixed effects models enables the decomposition of the intra-patient variability into some residual errors and some dynamical system variability.

We have developed a new algorithm based on the Stochastic Approximation EM (SAEM) method with the Kalman Filter for linear SDE systems and with the Extended Kalman Filter for nonlinear SDE systems.

select has a contract with EDF regarding modelling uncertainty in deterministic models.

select has a contract with EDF regarding wavelet analysis of the electrical load consumption for the aggregation and disaggregation of curves to improve total signal prediction.

Several pharmaceutical companies have already joined the Monolix software project and signed a contract with INRIA:

Novartis

Roche

Sanofi-Aventis

Johnson & Johnson

Astrazeneca

Exprimo

select has a contract with SAFRAN - MESSIER-DOWTY, a high-technology group (aerospace propulsion, aircraft equipment, defense security, communications), regarding reliability modelling of aircraft equipment (collaboration with Patrick Pamphile, Université Paris-Sud).

select has a contract with Total regarding the short-time Fourier transform for spurious signal detection.

The project GAS was selected by the DIGITEO consortium in the framework of the “Domaines d'Intérêt Majeur” call of the Région Ile-de-France. The main partner is GEOMETRICA. The other partners of the project are the Ecole Polytechnique (F. Nielsen) and select. The project intends to explore and develop new research at the intersection of information geometry, computational geometry and statistics. It started in September 2008 and its expected duration is two years. In this setting, Pascal Massart is the co-supervisor, with Frédéric Chazal (GEOMETRICA), of the thesis of Claire Caillerie (GEOMETRICA).

Select has been involved in the project EHPOC (industrial platform deliveries) of the pole SYSTEMATIC. An engineer, Rémi Fouchereau, has been appointed to carry out a complete study on Bayesian inference for inverse problems in industrial uncertainty analysis.

select runs a working group on model selection and statistical analysis of genomics data with the Biometrics group of the Institut National Agronomique Paris-Grignon (INAPG).

Pascal Massart is co-organizing a working group at ENS (Ulm) on Statistical Learning. This year the group focused on regularisation methods in regression. Most select members are involved in this working group.

select runs a working group on Classification, Statistics and fMRI imaging with Neurospin.

Gilles Celeux and Pascal Massart are members of the PASCAL (Pattern Analysis, Statistical Learning and Computational Learning) network.

Gilles Celeux is one of the co-organizers of the Working Group on Model-Based Clustering.

Gilles Celeux is Editor-in-Chief of *Statistics and Computing*. He is Associate Editor of *CSBIGS* and *La Revue Modulad*.

Pascal Massart is Associate Editor of *Annals of Statistics*, *Confluentes Mathematici*, and *Foundations and Trends in Machine Learning*.

Jean-Michel Poggi is Associate Editor of *Journal of Statistical Software*, *Journal de la SFdS* and *CSBIGS*.

Gilles Celeux was an invited speaker at the Fall session of the Swiss Statistical Seminar in Bern and at the Summer Model-Based Clustering working group in Grenoble.

Marc Lavielle was an invited speaker at MAS 2010 in Bordeaux and Population PK 2010 in Amsterdam, and a participant in invited sessions at ISB 2010 in Montpellier and ERCIM 2010 in London.

Jean-Michel Poggi was an invited speaker at SIS 2010, the Scientific Meeting of the Italian Statistical Society, in Padua.

Gilles Celeux is a member of the scientific council of the MIA Department of INRA. He has been Guest Editor of the Special Issue on Mixture Models of the Revue Modulad (Number 40, July 2009). He was a member of the AERES evaluation council of the BIAT Department (Unité de Biométrie et intelligence artificielle) of INRA Toulouse.

Marc Lavielle is a member of the Haut Conseil des Biotechnologies.

Marc Lavielle is director of the GDR (Groupement de Recherche) "Statistique et Santé", Research Unit 3067 of the CNRS.

Marc Lavielle is a member of the council of the SMAI (Société de Mathématiques Appliquées et Industrielles).

Marc Lavielle is a member of the ITMO Committee on Public Health.

Marc Lavielle was the head of the AERES evaluation committee of the MISTEA team.

Marc Lavielle was a member of the evaluation committee of the mathematical department of the Université Libre de Bruxelles.

Pascal Massart is a member of the scientific council of the French Mathematical Society.

Pascal Massart is a member of the scientific council of the Mathematical Department of the Ecole Normale Supérieure de Paris.

Pascal Massart was a member of the scientific committee of the European Meeting of Statisticians in Piraeus.

Jean-Michel Poggi co-chairs the Probability and Statistics seminar of the "laboratoire de Mathématiques d'Orsay", the ECAIS seminar (Extraction de connaissances : approches informatiques et statistiques) of IUT de Paris 5 Descartes, and the "Séminaire Parisien de Statistique".

Jean-Michel Poggi is a member of the Council of the French statistical society (SFdS).

Jean-Michel Poggi is a member of the Board of the "Environment group" of the French statistical society (SFdS).

All select members teach various courses at different universities, in particular in the M2 “Modélisation stochastique et statistique” of University Paris-Sud.