The research domain for the select project is statistics. Statistical methodology has made great progress over the past few decades, with a variety of statistical learning software packages that support many different methods and algorithms. Users now face the problem of choosing among them, to select the most appropriate method for their data sets and objectives. The problem of model selection is an important but difficult problem both theoretically and practically. Classical model selection criteria, which use penalized minimum-contrast criteria with fixed penalties, are often based on unrealistic assumptions.

select aims to provide efficient model selection criteria with data-driven penalty terms. In this context, select expects to improve the toolkit of statistical model selection criteria from both theoretical and practical perspectives. Currently, select is focusing its effort on variable selection in statistical learning, hidden-structure models and supervised classification. Its domains of application concern reliability, curve classification, phylogenetic analysis and classification in genetics. New developments concern applications in biostatistics (statistical analysis of medical images) and population genetics.

We learned from the applications we treated that some assumptions currently used in the asymptotic theory of model selection are often irrelevant in practice. For instance, it is not realistic to assume that the target belongs to the family of models in competition. Moreover, in many situations it is useful to let the size of the model depend on the sample size, which makes the asymptotic analysis break down. An important aim of select is to propose model selection criteria that take these practical constraints into account.

An important purpose of select is to build and analyze penalized log-likelihood model selection criteria that are efficient when the number of models in competition grows to infinity with the number of observations. Concentration inequalities are a key tool for that purpose and lead to data-driven penalty choice strategies. A major goal of select is to deepen the analysis of data-driven penalties from both the theoretical and the practical side. There is no universal way of calibrating penalties, but there are several general ideas that we want to develop, including heuristics derived from the Gaussian theory, special strategies for variable selection, and resampling methods.
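To fix ideas, here is a minimal sketch (in Python) of one such data-driven calibration, the slope heuristic; the function name, the `frac_large` threshold and the synthetic inputs are illustrative choices, not the team's actual implementation. The idea: for the most complex models the log-likelihood grows roughly linearly in the model dimension, the fitted slope kappa estimates the minimal penalty, and twice that slope gives the calibrated penalty.

```python
import numpy as np

def slope_heuristic_select(dims, logliks, frac_large=0.5):
    """Select a model via the slope heuristic (sketch).

    dims, logliks: dimensions and maximized log-likelihoods over the
    model collection. On the most complex models, loglik ~ kappa*dim + c;
    the calibrated penalty is then 2*kappa*dim. Returns the selected index.
    """
    dims = np.asarray(dims, float)
    logliks = np.asarray(logliks, float)
    big = dims >= np.quantile(dims, 1 - frac_large)
    # least-squares slope of loglik against dimension on the large models
    kappa, _ = np.polyfit(dims[big], logliks[big], 1)
    crit = -logliks + 2.0 * kappa * dims
    return int(np.argmin(crit))
```

On a toy collection where the log-likelihood saturates at dimension 5 and then only grows through overfitting, the procedure recovers dimension 5.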

Choosing a model is not only difficult theoretically. From a practical point of view, it is important to design model selection criteria that accommodate situations in which the data probability distribution P is unknown, and that take the model user's purpose into account. Most standard model selection criteria assume that P belongs to one of a set of models, without considering the purpose of the model. By also considering the model user's purpose, we avoid or overcome certain theoretical difficulties and can produce flexible model selection criteria with data-driven penalties. Such criteria are useful in supervised classification and hidden-structure models.

The Bayesian approach to statistical problems is fundamentally probabilistic. A joint probability distribution is used to describe the relationships among all the unknowns and the data. Inference is then based on the posterior distribution i.e. the conditional probability distribution of the parameters given the observed data. Exploiting the internal consistency of the probability framework, the posterior distribution extracts the relevant information in the data and provides a complete and coherent summary of post-data uncertainty. Using the posterior to solve specific inference and decision problems is then straightforward, at least in principle.

A key goal of select is to produce methodological contributions in statistics. For this reason, the select team works with applications that serve as an important source of interesting practical problems and require innovative methodologies to address them. Most of our applications involve contracts with industrial partners, e.g. in reliability, although we also have several more academic collaborations, e.g. genomics, genetics and image analysis.

The field of classification for complex data such as curves, functions, spectra and time series is important. Standard data analysis questions are being revisited to define new strategies that take the functional nature of the data into account. Functional data analysis addresses a variety of applied problems, including longitudinal studies, analysis of fMRI data and spectral calibration.

We are focusing on unsupervised classification. In addition to standard questions such as the choice of the number of clusters, the norm for measuring the distance between two observations, and the vectors for representing clusters, we must also address a major computational problem. The functional nature of the data requires the design of efficient anytime algorithms.

For several years, select has collaborated with the EDF-DER *Maintenance des Risques Industriels* group. An important theme concerns the resolution of inverse problems using simulation tools to analyze uncertainty in highly complex physical systems.

The other major theme concerns probabilistic modeling in fatigue analysis, in the context of a research collaboration with SAFRAN, a high-technology group (aerospace propulsion, aircraft equipment, defense, security, communications).

Moreover, a collaboration has started with Dassault Aviation on modal analysis of mechanical structures, which aims at identifying the vibration behavior of structures under dynamic excitations. From an algorithmic viewpoint, modal analysis amounts to estimation in parametric models on the basis of measured excitations and structural responses. As the literature and existing implementations show, the model selection problem attached to this estimation is currently treated by a rather heavy and very heuristic procedure. select's penalization-based model selection tools will be tested on this problem.

Since Yves Rozenholc joined select, we have been involved in quantifying tumor microcirculation to monitor treatments in cancer. Dynamic Contrast Enhanced (DCE) imaging provides information on the qualities of a vascular network. It enables biostatisticians to design biomarkers that can be used for diagnosis, prognosis and treatment monitoring. To make robust tumoral microcirculation biomarkers available in DCE imaging, Yves Rozenholc is developing several tools for denoising and clustering the dynamics found in DCE imaging sequences, for estimation in the blood flow model, and for testing equality of the survival functions coming from two DCE imaging sequences.

For many years, select has collaborated with Marie-Laure Martin-Magniette (URGV) on the analysis of genomic data. An important theme of this collaboration is using statistically sound model-based clustering methods to discover groups of co-expressed genes from microarray and high-throughput sequencing data. In particular, identifying biological entities that share similar profiles across several treatment conditions, such as co-expressed genes, may help identify groups of genes that are involved in the same biological processes. Yann Vasseur started a thesis co-supervised by Gilles Celeux and Marie-Laure Martin-Magniette on this topic, which is also an interesting investigation domain for the latent block model developed by select. In addition, select is involved in the ANR "jeunes chercheurs" project MixStatSeq, directed by Cathy Maugis (INSA Toulouse), which is concerned with the statistical analysis and clustering of RNA-Seq genomic data.

A collaboration has started with Pascale Tubert-Bitter, Ismael Ahmed and Mohamed Sedki (Pharmacoepidemiology and Infectious Diseases, PhEMI) on the analysis of pharmacovigilance data. In this framework, the objective is to detect as soon as possible potential associations between drugs and adverse effects that appear after the marketing authorization of these drugs. Instead of working on aggregated data (contingency tables), as is usually the case, the developed approach deals with the individual data, which may carry more information. Valérie Robert started a thesis co-supervised by Gilles Celeux and Christine Keribin on this topic, which has led to a new model-based clustering method inspired by the latent block model.

A study has been carried out by Jean-Michel Poggi, Benjamin Auder and Bruno Portier (INSA de Rouen), in the context of a collaboration between Air Normand, Orsay University and INSA de Rouen. It is an application of sequential prediction: to build the prediction, the question is how to optimally combine, before each forecast, the predictions of a set of experts. The study is original not only because of the specific field of application and the adaptation to the concrete working context of a regional air quality monitoring agency; its main originality is that the initial set of experts contains both experts coming from statistical models, built by means of different methods and different predictors, and experts coming from deterministic physico-chemical models. The interest of this kind of sequential prediction method in this specific context is under investigation, and the first results on three monitoring stations are promising.
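A classical instance of such expert combination is the exponentially weighted average forecaster, sketched below in Python; this is a textbook aggregation rule given for illustration, and the learning rate `eta` and function name are assumptions, not the rule actually used in the study.

```python
import numpy as np

def ewa_forecast(expert_preds, y, eta=0.5):
    """Exponentially weighted average aggregation of expert forecasts.

    expert_preds: array (T, K) of the K experts' predictions at each step.
    y: array (T,) of observed values, revealed after each prediction.
    Returns the aggregated predictions, an array of shape (T,).
    """
    T, K = expert_preds.shape
    log_w = np.zeros(K)                 # log-weights, start uniform
    agg = np.empty(T)
    for t in range(T):
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        agg[t] = w @ expert_preds[t]    # convex combination of the experts
        # square-loss update: experts with large errors lose weight
        log_w -= eta * (expert_preds[t] - y[t]) ** 2
    return agg
```

With one accurate and one biased expert, the aggregated forecast quickly concentrates its weight on the accurate one.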

Ancient materials, encountered in archaeology, paleontology and cultural heritage, are often complex, heterogeneous and poorly characterized before their physico-chemical analysis. A technique of choice to gather as much physico-chemical information as possible is spectro-microscopy, or spectral imaging, where a full spectrum made of more than a thousand samples is measured for each pixel. The data produced are tensorial, with two or three spatial dimensions and one or more spectral dimensions, and they require the combination of an "image" approach with a "curve analysis" approach. Since 2010, select has collaborated with Serge Cohen (IPANEMA) on the development of conditional density estimation through Gaussian mixture models (GMM) and non-asymptotic model selection to perform stochastic segmentation of such tensorial datasets. This technique enables spatial and spectral information to be accounted for simultaneously, while producing statistically sound information on the morphological and physico-chemical aspects of the studied samples.

Mixture model, cluster analysis, discriminant analysis

mixmod is being developed in collaboration with Christophe Biernacki, Florent Langrognet (Université de Franche-Comté) and
Gérard Govaert (Université de Technologie de Compiègne). mixmod (mixture modelling) software fits mixture models to a given data set with either a clustering or a
discriminant analysis purpose.
mixmod uses a large variety of algorithms to estimate mixture parameters, e.g., EM, Classification EM, and Stochastic EM.
They can be combined to create different strategies that lead to a sensible maximum of the likelihood (or completed likelihood) function.
Moreover, different information criteria for choosing a parsimonious model (e.g., the number of mixture components) are included, some of them favoring either a cluster analysis or a discriminant analysis viewpoint.
Many Gaussian models for continuous variables and multinomial models for discrete variables are available.
Written in C++, mixmod is interfaced with Matlab. The software, the
statistical documentation and also the user guide are available on the
Internet at the following address:
http://
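The criterion-based choice of the number of mixture components that such software performs can be illustrated with a short, self-contained Python sketch: a plain NumPy EM for a one-dimensional Gaussian mixture, with BIC used to pick the number of components. This is an illustration of the principle only, not mixmod's actual C++ code; the function names and the quantile-based initialization are assumptions of the sketch.

```python
import numpy as np

def em_gmm_1d(x, k, n_iter=200):
    """EM for a 1-D Gaussian mixture with k components; returns the log-likelihood."""
    # deterministic quantile initialization keeps the sketch reproducible
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    var = np.full(k, np.var(x))
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)        # E-step
        nk = np.maximum(resp.sum(axis=0), 1e-12)             # M-step
        pi, mu = nk / len(x), (resp * x[:, None]).sum(axis=0) / nk
        var = np.maximum((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk, 1e-6)
    dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return np.log(dens.sum(axis=1)).sum()

def select_k_bic(x, k_max=4):
    """Choose the number of components by minimizing BIC."""
    n = len(x)
    bics = []
    for k in range(1, k_max + 1):
        n_params = 3 * k - 1     # k means, k variances, k-1 free weights
        bics.append(-2 * em_gmm_1d(x, k) + n_params * np.log(n))
    return int(np.argmin(bics)) + 1
```

On data drawn from two well-separated Gaussians, BIC recovers two components.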

Since 2010, mixmod has had its own graphical user interface. A version of mixmod in R is now available at
http://

Erwan Le Pennec, with the help of Serge Cohen, has proposed a spatial extension in which the mixture weights can vary spatially.

Benjamin Auder contributes to the software engineering of mixmod. He implemented an interface to test mathematical libraries (Armadillo, Eigen, ...) as replacements for NEWMAT. He contributed to the continuous integration setup using the Jenkins tool and prepared an automated testing framework for unit and non-regression tests.

This year, it was decided to create mixmodstore, which offers companion programs for mixmod. For instance, the program MixmodCombi of Jean-Patrick Baudry (Université Paris 6) and Gilles Celeux, which derives a hierarchical clustering from a mixture, has been associated with Rmixmod.

Mixture model, block cluster analysis

Blockcluster is a software package devoted to model-based block clustering. It is developed by the MODAL team (Inria Lille). With Parmeet Bhatia (Inria Lille), Vincent Brault has added a Bayesian point of view for binary, categorical and continuous data with the variational Bayes algorithm. It has been enriched by a full Bayesian version using a Gibbs sampler. This Gibbs sampler, coupled with the variational Bayes algorithm, provides more stable solutions that are less dependent on the starting values of the algorithm. An exact expression of the ICL criterion has been provided. This criterion, or BIC, is used for selecting a relevant block clustering.

Unsupervised segmentation is an issue similar to unsupervised classification, with an added spatial aspect. Functional data are acquired at points of a spatial domain, and the goal is to segment the domain into homogeneous regions. The range of applications includes hyperspectral images in conservation science, fMRI data and all spatialized functional data. Erwan Le Pennec and Lucie Montuelle are focusing on how to handle the spatial component, from both the theoretical and the practical point of view. They study in particular the choice of the number of clusters. Furthermore, as functional data require heavy computation, they need to propose numerically efficient algorithms. With Serge Cohen and an École polytechnique intern, some progress has been made on the use of logistic weights in the hyperspectral setting.

Lucie Montuelle has studied a model of mixtures of Gaussian regressions in which the proportions are modeled using logistic weights. Using maximum likelihood estimators, a model selection procedure has been applied, supported by a theoretical guarantee. Numerical experiments have been conducted for regression mixtures with parametric logistic weights, using EM and Newton algorithms. This work is published in the *Electronic Journal of Statistics*.

Another subject considered by Erwan Le Pennec and Lucie Montuelle was obtaining oracle inequalities in deviation for model selection aggregation in the fixed-design regression framework. Exponential weights are widely used but sub-optimal. They aggregate linear estimators and penalize the Stein unbiased risk estimate used in the exponential weights to derive such inequalities. Furthermore, if the infinity norm of the regression function is known and taken into account in the penalty, then a sharp oracle inequality is available. PAC-Bayesian tools and concentration inequalities play a key role in this work. These results may be found in a prepublication on arXiv and in Lucie Montuelle's PhD thesis.

In collaboration with Sylvain Arlot, Matthieu Lerasle and Patricia Reynaud-Bouret (CNRS), Nelo Molter Magalhães considers the estimator selection problem
with the

Émilie Devijver and Pascal Massart focused on the Lasso for high-dimensional finite mixture regression models.
An

Pascal Massart and Clément Levrard continue their work on the properties of the

Among selection methods for nonparametric estimators, a recent one is the Goldenshluger-Lepski procedure. This method proposes a data-driven choice of

The well-documented and consistent variable selection procedure for model-based cluster analysis and classification that Cathy Maugis (INSA Toulouse) designed during her PhD thesis in select makes use of stepwise algorithms, which are painfully slow in high dimensions. In order to circumvent this drawback, Gilles Celeux, in collaboration with Mohammed Sedki (Université Paris XI) and Cathy Maugis, proposed to rank the variables using a lasso-like penalization adapted to the Gaussian mixture model context. Using this ranking to select the variables, they avoid the combinatorial explosion of stepwise procedures. After tests on challenging simulated and real data sets, their algorithm has been finalized and shows good performance.

In collaboration with Jean-Michel Marin (Université de Montpellier) and Olivier Gascuel (LIRMM), Gilles Celeux has continued research
aiming to select a short list of models rather than a single model. This short list of models is declared to be compatible with the data using a

Vincent Brault, PhD student of Gilles Celeux and Christine Keribin, defended his thesis on the Latent Block Model (LBM) for categorical data.
Their work investigated a Gibbs algorithm to avoid solutions with empty clusters on synthetic as well as real data (Congressional Voting Records and genomic data).
They detailed the link between the information criteria ICL and BIC, compared them on synthetic and real data, and conjectured that these criteria are both consistent
for LBM, which is not a standard behavior. Hence, ICL is to be preferred for LBM. This work is now published in *Statistics and Computing*.

Vincent Brault has completed a detailed bibliographical review on co-clustering with Aurore Lomet (UTC), which is currently under revision. He has also worked in collaboration with Mahendra Mariadassou (INRA) to review the state of the art on theoretical results for latent and stochastic block models.

Vincent Brault, Christine Keribin and Mahendra Mariadassou have started a collaboration to tackle the consistency and asymptotic normality of the maximum likelihood and variational estimators in stochastic and latent block models.

Gilles Celeux has started a collaboration with Jean-Patrick Baudry on strategies to avoid the traps of the EM algorithm in mixture analysis.
They analyze the effect of spurious local maximizers and the regularized algorithms designed to avoid these spurious solutions.
They explore the link between the degree of regularization and the slope heuristics.
Moreover, they propose and study strategies to initialize the EM algorithm embedding the solution with

Erwan Le Pennec is supervising Solenne Thivin in her CIFRE thesis with Michel Prenat and Thales Optronique. The aim is target detection on complex backgrounds such as clouds or the sea. Their approach is a local one based on test decision theory. They have obtained theoretical and numerical results on a segmentation-based approach in which a simple Markov field testing procedure is used in each cell of a data-driven partition. They have also obtained experimental results on unsupervised classification of images (or patches), with the aim of better calibrating the detection procedure. The classification is based on features defined in the cloud texture modeling activity.

Erwan Le Pennec and Michel Prenat have also collaborated on cloud texture modeling using a non-parametric approach. Such a model could be used to better calibrate the detection procedure: it can provide more examples than those acquired, and it could be the basis of an ensemble method.

In 2014, in the framework of a CIFRE agreement with Snecma-SAFRAN, Rémy Fouchereau defended a thesis on the modeling of fatigue lifetime, supervised by Gilles Celeux and Patrick Pamphile. In the aircraft, space and nuclear industries, fatigue testing is the basic tool for analyzing the fatigue lifetime of a given material, component or structure. A sample of the material is subjected to cyclic loading S (stress, force, strain, etc.) by a testing machine, which counts N, the number of cycles to failure. Fatigue test results are plotted on an S-N curve. A probabilistic model for the construction of S-N curves is proposed. In general, fatigue test results are widely scattered in the high-cycle fatigue region, and "duplex" S-N curves appear in the very-high-cycle region. This is why classical models, from fracture mechanics on the one hand and probability theory on the other, do not fit the S-N curve over the whole range of cycles. They proposed a probabilistic model based on a fracture mechanics approach: few parameters are required and they are easily interpreted by mechanical or materials engineers. This model has been applied to both simulated and real fatigue test data sets, and the S-N curves are well fitted over the whole range of cycles. The parameters have been estimated using the EM algorithm, combining the Newton-Raphson optimization method with Monte Carlo integral estimation. The model has then been improved by taking production process information into account, thanks to a clustering approach. Thus, they have provided engineers with a probabilistic tool for the reliability design of mechanical parts, but also with a diagnostic tool for material elaboration.

For two years, select has collaborated with CEA on the estimation of the battery state of charge (SoC). For vehicles powered by an electric motor, SoC estimation is essential to guarantee vehicle autonomy as well as safe utilization. The aim is to create a reliable SoC model that closely fits the battery dynamics in embedded applications (e.g., electric vehicles). Jana Kalawoun started a thesis supervised by Gilles Celeux, Patrick Pamphile and Maxime Montaru (CEA) on this topic. The SoC is modeled by a switching Markov state-space model. The parameters are estimated by combining the EM algorithm with particle filter methods. The model is validated using real-life electric vehicle data and has been shown to be clearly superior to a simple state-space model. The optimal number of battery modes is then identified using different model selection criteria, such as BIC or the slope heuristics.

Yves Auffray and Gilles Celeux proposed a solution to a reliability problem on the brakes of Dassault's F7X business jet. As the original brake version showed poor reliability performance, an increased inspection frequency for the brakes had been decided and, after a while, a new brake version was adopted. The new version has not shown any failure since its adoption. The question was then: is it possible to relax the brake inspection frequency?

On the basis of the failure data of the first brake version, the parameters of a Weibull law were estimated:

A Weibull model for the new brakes was then estimated. The shape parameter being conservatively left unchanged, the scale parameter was estimated so that the probability of the no-failure event amounts to 0.05. This led to

From the resulting Weibull model, dates

Dassault has adopted this far less constraining inspection calendar.
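The scale calibration described above can be sketched in Python. The actual failure data are not reproduced in this report, so the fleet size, service times and shape parameter in the usage line below are made-up illustrative numbers, and the function name is an assumption of the sketch.

```python
import math

def scale_for_no_failure(times, shape, p_no_failure=0.05):
    """Weibull scale such that the probability of observing zero failures
    among units with service times `times` equals p_no_failure, the shape
    parameter being held fixed.

    P(no failure) = prod_i exp(-(t_i/scale)^shape)
                  = exp(-sum_i (t_i/scale)^shape).
    """
    s = sum(t ** shape for t in times)
    return (s / -math.log(p_no_failure)) ** (1.0 / shape)

# hypothetical fleet: 40 brakes with 1000 flight hours each, shape fixed at 2.0
eta = scale_for_no_failure([1000.0] * 40, shape=2.0)
```

Plugging the returned scale back into the survival formula recovers a no-failure probability of 0.05 exactly.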

In collaboration with Florence Jaffrezic and Andrea Rau (INRA, animal genetics department), Mélina Gallopin is a third-year PhD student under the supervision of Gilles Celeux. Her thesis is concerned with modeling and model selection in the analysis of RNA-seq data. This year, they proposed a model selection criterion for model-based clustering of annotated gene expression data, an ICL-like criterion taking the annotations into account. They are also working on an objective comparison of discrete modeling and continuous modeling after transformation for RNA-seq data, based on a comparison of the (possibly penalized) likelihoods of the models in competition.

The subject of Yann Vasseur's PhD thesis, supervised by Gilles Celeux and Marie-Laure Martin-Magniette (INRA URGV), is the inference of a regulatory network over
Transcription Factors (TFs), which are specific genes, of *Arabidopsis thaliana*. To that end, a transcriptome dataset in which the numbers of TFs
and statistical units are roughly equal is available. The first aim consists of reducing the dimension of the network to avoid high-dimensional difficulties. Representing this network
with a Gaussian graphical model, the following procedure has been defined:

*Selection step*: choosing the set of TFs regulators (supports) of each TF.

*Classification step*: deducing co-factor groups (TFs with similar expression levels) from these supports.

Thus, the reduced network would be built on the co-factor groups. Currently, several selection methods based on Gauss-LASSO and resampling procedures have been applied to the dataset. The study of the stability and the parameter calibration of these methods is in progress. The TFs are clustered with the latent block model into a number of co-factor groups selected with the BIC or the exact ICL criterion.

In collaboration with Marie-Laure Martin-Magniette, Cathy Maugis and Andrea Rau, Gilles Celeux studied
gene expression data obtained from high-throughput sequencing technology. They focus on the question of clustering
digital gene expression profiles as a means to discover groups of co-expressed genes. They propose a Poisson
mixture model using a rigorous framework for parameter estimation as well as the choice of the appropriate number
of clusters. They illustrate co-expression analyses using this approach on two real RNA-seq datasets.
A set of simulation studies also compares the performance of the proposed model with that of several related
approaches developed to cluster RNA-seq or serial analysis of gene expression data.
The proposed method is implemented in the open-source R package `HTSCluster`, available on CRAN.
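HTSCluster is an R package; the core idea of fitting a Poisson mixture by EM can nevertheless be illustrated with a short Python sketch. This is a one-dimensional illustration of the principle only, not the package's multivariate model; the function name and the quantile initialization are assumptions of the sketch.

```python
import numpy as np

def poisson_mixture_em(counts, k, n_iter=200):
    """EM for a k-component Poisson mixture on count data (1-D sketch).
    Returns the mixing proportions and the Poisson means."""
    x = np.asarray(counts, float)
    # deterministic, well-separated initialization on quantiles of the counts
    lam = np.maximum(np.quantile(x, (np.arange(k) + 0.5) / k), 0.5)
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities from log-densities (log(x!) cancels out)
        logp = np.log(pi) + x[:, None] * np.log(lam) - lam
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted proportions and means
        nk = np.maximum(resp.sum(axis=0), 1e-12)
        pi = nk / len(x)
        lam = np.maximum((resp * x[:, None]).sum(axis=0) / nk, 1e-6)
    return pi, lam
```

On counts drawn from two well-separated Poisson distributions, the estimated means recover the two underlying rates.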

In collaboration with Pascale Tubert-Bitter, Ismael Ahmed and Mohamed Sedki, Gilles Celeux and Christine Keribin have started research on the detection of associations between drugs and adverse events, in the framework of the PhD of Valérie Robert. First, this team developed a model-based clustering method, inspired by the latent block model, that co-clusters the rows and columns of two binary tables while imposing the same row classification. It thereby highlights subgroups of individuals sharing the same drug profile, and subgroups of adverse effects and drugs with strong interactions. Besides, sufficient conditions are provided to obtain the identifiability of the model, and experiments have been carried out on simulated data.

In collaboration with Farouk Mhamdi and Meriem Jaïdane (ENIT, Tunis, Tunisia), Jean-Michel Poggi proposed a method for trend extraction from seasonal time series through Empirical Mode Decomposition (EMD). Experimental comparisons of trend extraction based on EMD, X11, X12 and the Hodrick-Prescott filter have been conducted. First results show the suitability of the blind EMD trend extraction method. Real Tunisian peak-load data are also used to illustrate the extraction of the intrinsic trend.

Jean-Michel Poggi is co-supervising, with Anestis Antoniadis (Université Joseph Fourier, Grenoble), the PhD thesis of Vincent Thouvenot, funded by a CIFRE agreement with EDF. The industrial motivation of this work is the recent development by EDF of new technologies for measuring power consumption, which acquire consumption data at different levels of the network. The thesis focuses on the development of new statistical methods for predicting power consumption by exploiting the different levels of aggregation of network data collection. From the mathematical point of view, the work develops generalized additive models for this kind of aggregated data and for the modeling of functional data, closely combining nonparametric estimation with variable selection using various penalization methods.

Jean-Michel Poggi and Pascal Massart are the co-advisors of the PhD thesis of Émilie Devijver, strongly motivated by the same kind of industrial forecasting problems in electricity, which is dedicated to curve clustering for prediction. A natural framework to explore this question is mixtures of regression models for functional data. They extend to functional data the recent work by Bühlmann and coauthors dealing with the simultaneous estimation of mixture regression models in the scalar case using Lasso-type methods. It is based on the technical tools of the work of Caroline Meynet (who completed her thesis at Orsay under the direction of P. Massart), which deals with the clustering of functional data using Lasso methods, simultaneously choosing the number of clusters and selecting significant wavelet coefficients. They also propose a procedure based on a low-rank estimator. Simulations and benchmark data studies have been conducted for high-dimensional finite mixture regression models.

Jean-Michel Poggi is co-supervising, with Meriem Jaïdane and Raja Ghozi (ENIT, Tunisia) and, from the industrial side, Sylvie Sevestre-Ghalila (CEA LinkLab), the PhD thesis of Neska El Haouij, funded by a CIFRE-like agreement with CEA LinkLab. The industrial motivation of this work is the recent development of new technologies for sensory, environmental and physiological measurements to explain and improve driving tasks. The thesis aims to explain, through objectivization, the sensory aspects involved in automated decisions in the car interior. It focuses on the use and development of experimental designs and statistical methods to quantify and explain driving ability, with modeling based on functional explanatory factors. The statistical contributions of this work will involve nonparametric estimation and variable and/or model selection.

Yves Rozenholc and Christine Keribin work on genomic tumoral alterations and supervised a Master's student, Yi Liu. The study of genomic DNA alterations (recurrent regions of alteration, patterns of instability) contributes to tumor classification and is becoming of great importance for the personalization of cancer treatments. The use of Single-Nucleotide Polymorphism (SNP) arrays or Next Generation Sequencing (NGS) techniques allows the simultaneous estimation of segmented copy number (CN) and B-allele frequency (BAF) profiles along the whole genome. In this context, Popova (2009) proposed the GAP method, based on pattern recognition with (BAF, CN) maps, to detect the genotype status of each segment in complex tumoral genome profiles. It takes into account the fact that the observations on these maps are necessarily placed on centers that depend, up to a proper scaling of the CN, only on the unknown proportion of non-tumoral tissue in the sample. Being deterministic and manually tuned, this method appears sensitive to noise. To overcome this drawback, they set up a mixture model, allowing the automatic estimation of the proportion of non-tumoral tissue and a genotype test for each segment along the whole genome. They develop the estimation with an adapted EM algorithm that has been tested on simulated data. This work has already been presented (ERCIM 14, SEQBIO 14) and opens many potential developments.

select has a contract with SAFRAN-SNECMA, a high-technology group (aerospace propulsion, aircraft equipment, defense, security, communications), regarding the reliability modeling of aircraft equipment.

select has a contract with Thales Optronique on target detection on complex backgrounds.

Pascal Massart is co-organizing a working group at ENS (Ulm) on Statistical Learning.

Christine Keribin coordinates the twice-monthly SFdS rendez-vous "Methods and Software".

Gilles Celeux and Christine Keribin have started a collaboration with the Pharmacoepidemiology and Infectious Diseases team (PhEMI, INSERM).

select is participating in the ANR MixStatSeq project.

Gilles Celeux is one of the co-organizers of the international Working Group on Model-Based Clustering. This year this workshop took place in Dublin (Ireland).

Yves Rozenholc was invited to the Department of Statistics of the University of Haifa for three weeks, to the Department of Mathematics of Eindhoven University for one week, and to the Institute of Statistics, Biostatistics and Actuarial Sciences of the Université catholique de Louvain.


Jean-Michel Poggi was Guest Editor (with R. Kenett and A. Pasanisi) of the Special Issue on Graphical Causality Models: Trees, Bayesian Networks and Big Data, in Quality Technology and Quantitative Management (QTQM).

Jean-Michel Poggi was Editor (with A. Antoniadis and X. Brossat) of a Lecture Notes in Statistics volume: Modeling and Stochastic Learning for Forecasting in High Dimension, Springer.

Jean-Michel Poggi was organizer and president of the scientific committee (with R. Kenett and A. Pasanisi) of the ENBIS-SFdS 2014 Spring Meeting on Graphical Causality Models: Trees, Bayesian Networks and Big Data, IHP, Paris, 9-11 April 2014.

Jean-Michel Poggi was Organizer of the meeting Horizons de la Statistique, Paris, IHP, 21 January 2014.

Jean-Michel Poggi was organizer of the ERCIM 2014 session Electricity Load Forecasting, Pisa, 6-8 December 2014.

Yves Rozenholc was the scientific coordinator and organizer of the third edition of the school "Tumoral Genome Analysis", 12-19 May 2014.

Gilles Celeux is Editor-in-Chief of *Journal de la SFdS*.
He is Associate Editor of *Statistics and Computing* and *CSBIGS*.

Pascal Massart is Associate Editor of *Annals of Statistics*,
*Confluentes Mathematici*, and *Foundations and Trends in Machine Learning*.

Jean-Michel Poggi is Associate Editor of *Journal of Statistical Software*, *Journal de la SFdS* and *CSBIGS*.

The members of the team reviewed numerous papers for numerous international journals.

All the select members are teaching in various courses of different universities and in particular in the Master 2 “Modélisation stochastique et statistique” of University Paris-Sud.

PhD: Vincent Brault, Estimation et sélection de modèle pour le modèle des blocs latents, Université Paris-Sud, September 2014, Gilles Celeux and Christine Keribin

PhD: Rémi Fouchereau, Modélisation probabiliste des courbes S-N, Université Paris-Sud, March 2014, Gilles Celeux and Patrick Pamphile

PhD: Lucie Montuelle, Inégalités d'oracle et mélanges, Université Paris-Sud, December 2014, Erwan Le Pennec

PhD: Clément Levrard, Quantification vectorielle en grande dimension : vitesses de convergence et sélection de variables, Université Paris-Sud, September 2014, Pascal Massart and Gérard Biau (UPMC)

PhD in progress: Émilie Devijver, 2012, Pascal Massart and Jean-Michel Poggi

PhD in progress: Jana Kalawoun, 2012, Gilles Celeux and Patrick Pamphile

PhD in progress: Nelo Molter Magalhães, 2011, Pascal Massart

PhD in progress: Solenne Thivin, 2012, Erwan Le Pennec

PhD in progress: Valérie Robert, 2013, Gilles Celeux and Christine Keribin

PhD in progress: Yann Vasseur, 2013, Gilles Celeux and Marie-Laure Martin-Magniette (URGV)

PhD in progress: Neska El Haouij, 2014, Jean-Michel Poggi and Meriem Jaïdane, Raja Ghozi (ENIT, Tunisia) and Sylvie Sevestre-Ghalila (CEA LinkLab), joint ENIT-UPS thesis

PhD in progress: Vincent Thouvenot, 2012, Jean-Michel Poggi and Anestis Antoniadis (Univ. Joseph Fourier, Grenoble)

Gilles Celeux and Valérie Robert have written an article on statistics in basketball, to appear in the *Journal de la SFdS* special issue on sport and statistics.