The is2 team conducts research in statistical modelling for
industrial and medical applications. Its work covers
diagnosis, reliability, industrial failure time analysis, models
for hospital lengths of stay, statistical image analysis and
finance. The chosen methods involve generalized linear models,
hidden structure models identified by stochastic algorithms,
adaptive estimation, conditionally heteroscedastic models for
time series, and dynamical systems identification.

Hidden structure models are useful for taking into account
heterogeneity in data. They concern many domains of statistical
methodology (finite mixture analysis, hidden Markov models, random
effect models, ...). Owing to their missing data structure,
they raise specific difficulties for both estimating the model
parameters and assessing model performance. The is2 team
works on both aspects. We design specific
algorithms for estimating the parameters of missing structure
models and we propose and study specific criteria for choosing the
most relevant missing structure models in several contexts.

Regression aims at modelling the relation between a response variable and some regressors. Several regression models generalize the classical linear model, such as:

Mixed linear models

Generalized linear models

ARCH models and GLM-ARCH

Generalized linear mixed models.
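For reference, the models listed above can be written in their usual textbook form. The notation is an assumption of this summary and does not come from the report: $X$ and $Z$ are design matrices, $\beta$ the fixed effects, $u$ the random effects, $g$ a link function and $z_t$ a unit-variance white noise.

```latex
\begin{align*}
\text{Linear model:} \quad & y = X\beta + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I) \\
\text{Mixed linear model:} \quad & y = X\beta + Zu + \varepsilon, \qquad u \sim \mathcal{N}(0, G) \\
\text{Generalized linear model:} \quad & g\bigl(\mathbb{E}[y]\bigr) = X\beta \\
\text{ARCH}(q): \quad & \varepsilon_t = \sigma_t z_t, \qquad \sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 \\
\text{Generalized linear mixed model:} \quad & g\bigl(\mathbb{E}[y \mid u]\bigr) = X\beta + Zu
\end{align*}
```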

Our goal here is to propose wavelet-based estimators aimed at
characterizing and analyzing the scaling law structures of processes
or systems. The compression/dilation operator, at the core of
wavelet analysis, makes it possible to identify complex scale organizations,
such as mono-fractals, high order
statistics governed by power laws (e.g. multi-fractals), or
more generally cascade type constructions of measures and
processes.
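As an illustration of how a wavelet transform exposes a scaling law, the following minimal sketch estimates a self-similarity exponent H by regressing the log-energy of Haar detail coefficients on the scale index. It is a textbook log-scale regression and not the estimators developed by the team; the function names, the choice of the Haar wavelet and the range of levels are assumptions made for the example.

```python
import math
import random


def haar_detail_energies(signal, max_level):
    """Return the mean squared Haar detail coefficient at levels 1..max_level."""
    approx = list(signal)
    energies = []
    for _ in range(max_level):
        n_pairs = len(approx) // 2
        if n_pairs < 1:
            break
        details = [(approx[2 * k] - approx[2 * k + 1]) / math.sqrt(2) for k in range(n_pairs)]
        approx = [(approx[2 * k] + approx[2 * k + 1]) / math.sqrt(2) for k in range(n_pairs)]
        energies.append(sum(d * d for d in details) / n_pairs)
    return energies


def estimate_hurst(signal, first_level=3, last_level=8):
    """Least-squares slope of log2(energy) against the level j, mapped to H.

    For a self-similar process with stationary increments, the detail energy
    of an L2-normalised wavelet grows like 2**(j * (2H + 1)).
    """
    energies = haar_detail_energies(signal, last_level)
    levels = list(range(first_level, len(energies) + 1))
    logs = [math.log2(energies[j - 1]) for j in levels]
    mean_j = sum(levels) / len(levels)
    mean_l = sum(logs) / len(logs)
    slope = (sum((j - mean_j) * (l - mean_l) for j, l in zip(levels, logs))
             / sum((j - mean_j) ** 2 for j in levels))
    return (slope - 1.0) / 2.0


def random_walk(n, seed=0):
    """Discrete Brownian motion (H = 0.5), used here only as a test signal."""
    rng = random.Random(seed)
    x, path = 0.0, []
    for _ in range(n):
        x += rng.gauss(0.0, 1.0)
        path.append(x)
    return path
```

Calling `estimate_hurst(random_walk(2 ** 14))` should return a value close to 0.5. The coarsest levels are skipped by default because the Haar energies of a discrete random walk only approach the asymptotic power law after the first few octaves.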

An important application domain for is2 is
reliability and industrial lifetime analysis. This activity is
developed mainly through collaborations with the EDF research
department and the laboratory LCFR of CEA / Cadarache.

A secondary domain of applications concerns biomedical statistics and molecular biology.

Joint work with Christophe Biernacki and Florent Langrognet (Université de Franche-Comté) and Gérard Govaert (Université de Technologie de Compiègne).

MixMod (Mixture Modelling) software fits multivariate Gaussian
mixtures to a given data set with either a density estimation, a
cluster analysis or a discriminant analysis point of view. This
software is original in three ways.

A large variety of algorithms for estimating the mixture parameters is available (EM, Classification EM, Stochastic EM), and they can be combined into different strategies for reaching a sensible maximum of the likelihood function.

Moreover, 28 different mixture models can be considered according to different assumptions on the component variance matrix eigenvalue decomposition.

Finally, different information criteria for choosing a parsimonious model, some of them favoring a cluster analysis viewpoint, are included.
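To make the estimation and model choice steps concrete, here is a minimal sketch of plain EM for a univariate Gaussian mixture, followed by a BIC computation. It is not MixMod code and does not use MixMod's interface: the function names, the quantile-based initialisation, the fixed number of iterations and the variance floor are all assumptions made for the example. MixMod itself works with multivariate data and also offers the Classification EM and Stochastic EM variants.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gaussian_mixture(data, n_components, n_iter=200):
    """Plain EM for a univariate Gaussian mixture.

    Returns (weights, means, variances, loglik).
    """
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    ordered = sorted(data)
    weights = [1.0 / n_components] * n_components
    # Assumed initialisation: evenly spaced empirical quantiles as starting centres.
    means = [ordered[(2 * k + 1) * n // (2 * n_components)] for k in range(n_components)]
    variances = [overall_var] * n_components
    for _ in range(n_iter):
        # E step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M step: weighted updates of proportions, means and variances.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # guard against degenerate components
    loglik = sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)))
        for x in data
    )
    return weights, means, variances, loglik


def bic(loglik, n_components, n_obs):
    """BIC written as -2 log L + (free parameters) * log n; lower is better."""
    n_params = 3 * n_components - 1  # K-1 proportions, K means, K variances
    return -2.0 * loglik + n_params * math.log(n_obs)
```

A typical use is to fit the mixture for several values of `n_components` and keep the one with the smallest `bic` value. The sign convention (smaller is better) is a choice made here.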

Written in C++, MixMod is easily interfaced with
Scilab and Matlab. It can be downloaded at the following URL: http://www-math.univ-fcomte.fr/MIXMOD/index.htm.

The Extremes software is a toolbox dedicated to the
modelling of extremal events. It offers extreme quantile estimation
procedures and model selection methods.

Joint work with Jean-Michel Marin and Christian Robert (Ceremade, Paris Dauphine).

Missing variable models are typical benchmarks for new computational techniques, because their ill-posed nature offers a characteristic testing ground for these techniques. Their special features also allow for an easier calibration of most computational techniques, by virtue of the data completion they naturally offer. The potential of this approach and its specifics in missing data problems have been studied in settings of increasing difficulty, in comparison with existing approaches.

Joint work with Jorge Marques and Jacinto Nascimento (ISR-IST, Lisbon).

Object tracking problems can be addressed by describing the motion of the object with a set of dynamical models capable of describing it reliably. We propose a hidden Markov model for object tracking. An exact technique to estimate the models involved in the object's motion has been designed. Learning is achieved via an EM algorithm. We extend the study to determine a reliable number of models, using the BIC criterion, when the motion contains model uncertainty.

Joint work with Yann Guédon (Cirad - Montpellier).

Preliminary studies show that modelling the whole tree growth period with a single linear mixed model was not satisfactory: the model was difficult to interpret because of its high number of parameters. Moreover, the exploratory analysis of tree growth data shows that the tree growth period is made up of a succession of growth phases. Consequently, we are developing a new family of statistical models: mixed multiphasic models. They result from the combination of two types of models:

- a Markov chain which models the succession of growth phases,

- linear mixed models associated with each state of the underlying Markov chain, each linear mixed model modelling a growth phase, with climatic covariates and individual random effects.

Joint work with Frédérique Letué (Grenoble).

We organized a working group to compare, from a non-asymptotic point of view, classical model selection criteria such as BIC with the random criteria proposed by Birgé and Massart. E. Lebarbier has studied, with this approach, the problem of detecting change-points in the mean of a signal corrupted by additive Gaussian noise. Moreover, a procedure for selecting the number of clusters in an unsupervised context, extending the results of E. Lebarbier, is in progress.
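The change-point problem mentioned above can be sketched as a penalized least-squares segmentation: for each number of segments, dynamic programming finds the partition with the smallest residual sum of squares, and a penalty then selects the number of segments. This is only an illustration under stated assumptions. The function names are hypothetical, and `penalty_per_segment` is a placeholder constant supplied by the user, not the calibrated penalty studied in the working group.

```python
def segment_cost_table(signal):
    """Prefix sums so the squared error of any segment is available in O(1)."""
    s1, s2 = [0.0], [0.0]
    for x in signal:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(i, j):
        """Residual sum of squares of signal[i:j] around its own mean."""
        n = j - i
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / n

    return cost


def best_segmentation(signal, n_segments):
    """Dynamic programming for the least-squares partition into n_segments pieces.

    Returns (rss, change_points), where change_points holds the start index
    of every segment except the first.
    """
    n = len(signal)
    cost = segment_cost_table(signal)
    inf = float("inf")
    # best[k][j]: minimal RSS for signal[:j] split into k segments.
    best = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    arg = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = best[k - 1][i] + cost(i, j)
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    arg[k][j] = i
    # Backtrack the segment boundaries.
    change_points, j = [], n
    for k in range(n_segments, 1, -1):
        j = arg[k][j]
        change_points.append(j)
    return best[n_segments][n], sorted(change_points)


def select_n_segments(signal, max_segments, penalty_per_segment):
    """Pick the number of segments minimising RSS + penalty * n_segments."""
    scores = {}
    for k in range(1, max_segments + 1):
        rss, cps = best_segmentation(signal, k)
        scores[k] = (rss + penalty_per_segment * k, cps)
    k_best = min(scores, key=lambda k: scores[k][0])
    return k_best, scores[k_best][1]
```

The cubic cost of the inner loops is acceptable for signals of a few hundred points. Longer signals would call for a pruned search.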

Joint work with Mike Titterington (Glasgow, Scotland) and Christian Robert (CEREMADE, Paris Dauphine).

The deviance information criterion (DIC) introduced by Spiegelhalter et al. is directly inspired by linear and generalized linear models, but it is not so naturally defined for missing data models. We have reassessed the criterion for such models, testing the behavior of various extensions in the cases of mixture and random effect models.
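For reference, the standard definition of the criterion can be recalled as follows, up to a standardizing term that depends on the data only. The notation is an assumption of this summary: $\overline{D(\theta)}$ denotes the posterior mean of the deviance and $\bar{\theta}$ the posterior mean of the parameter.

```latex
\begin{align*}
D(\theta) &= -2 \log f(y \mid \theta), \\
p_D &= \overline{D(\theta)} - D(\bar{\theta}), \\
\mathrm{DIC} &= \overline{D(\theta)} + p_D = D(\bar{\theta}) + 2\, p_D .
\end{align*}
```

In a missing data model, several densities can play the role of $f$ (for instance the observed or the complete likelihood), which is one way to see why the definition is less clear-cut in that setting.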

Linear Discriminant Analysis is a reference method in supervised classification. In cases where it performs poorly, alternative methods are required. We propose a method based on estimating the density of each group using a mixture of spherical Gaussian distributions. The features of this model are flexibility, simplicity and parsimony. Moreover, we propose choosing the numbers of mixture components with a Bayesian Entropy Criterion which is a penalized likelihood criterion taking into account the classification task.

With the Ensimag student Nicolas Bousquet, we studied this year Bayesian inference for the Bertholon distribution, a competing risk model involving an exponential and a Weibull distribution. This Bayesian inference has been performed through importance sampling schemes that take advantage of the missing data structure of the model.
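The competing risk structure can be illustrated by simulation, reading the model as the minimum of an exponential and a Weibull lifetime. This is a minimal sketch and does not reproduce the importance sampling schemes of the study; the function names and the parameter names `eta0` (exponential mean), `eta1` (Weibull scale) and `beta` (Weibull shape) are assumptions made for the example.

```python
import math
import random


def bertholon_survival(t, eta0, eta1, beta):
    """P(T > t) for T = min(Exponential(mean eta0), Weibull(scale eta1, shape beta))."""
    return math.exp(-t / eta0 - (t / eta1) ** beta)


def sample_bertholon(n, eta0, eta1, beta, seed=0):
    """Draw n lifetimes together with the latent cause of each failure.

    The cause label (0 = exponential risk, 1 = Weibull risk) is unobserved in
    practice, which is what gives the model its missing data structure.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        t_exp = rng.expovariate(1.0 / eta0)      # exponential with mean eta0
        t_weib = rng.weibullvariate(eta1, beta)  # scale eta1, shape beta
        draws.append((min(t_exp, t_weib), 0 if t_exp < t_weib else 1))
    return draws
```

Comparing the empirical fraction of simulated lifetimes above some `t` with `bertholon_survival(t, ...)` gives a quick consistency check of the two functions.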

We studied simulated scenarios to detect, via statistical inference, the causes of flaws or incidents in nuclear plants. We essentially made use of mixture analysis and statistical tests. We used expert opinions to decide between alternative flaw scenarios.

Joint work with Mhamed El Aroui (ISG, Tunis).

We introduce a quasi-conjugate Bayes approach for estimating
Generalized Pareto Distribution (GPD) parameters, distribution tails and
extreme quantiles within the Peaks-Over-Threshold framework.
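To show how GPD parameters translate into an extreme quantile in the Peaks-Over-Threshold setting, here is a minimal sketch of the usual plug-in formula. It does not implement the quasi-conjugate Bayes estimator: the scale `sigma` and shape `xi` are taken as given, and the function and argument names are assumptions made for the example.

```python
import math


def gpd_cdf(y, sigma, xi):
    """Generalized Pareto cdf of an excess y >= 0 over the threshold."""
    if abs(xi) < 1e-12:
        return 1.0 - math.exp(-y / sigma)
    return 1.0 - (1.0 + xi * y / sigma) ** (-1.0 / xi)


def pot_quantile(p, threshold, sigma, xi, n_obs, n_exceed):
    """Quantile of order p (close to 1) from a GPD fitted to threshold excesses.

    Uses P(X > x) ~= (n_exceed / n_obs) * (1 - G(x - threshold)) for x above
    the threshold, so p should exceed 1 - n_exceed / n_obs.
    """
    ratio = n_obs * (1.0 - p) / n_exceed
    if abs(xi) < 1e-12:
        return threshold - sigma * math.log(ratio)
    return threshold + sigma / xi * (ratio ** (-xi) - 1.0)
```

The quantile is obtained by inverting the tail approximation stated in the docstring, so plugging the result back into `gpd_cdf` recovers the target tail probability `1 - p`.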

Joint work with Nicolas Devictor (CEA - Cadarache).

During this second year of J. Jacques's thesis, two research directions identified during the first year have been explored. The first concerns the impact of model uncertainty on the results of sensitivity analysis. This uncertainty, which can be due to the use of a simplified model or to a process mutation, is treated as a mutation of the initial model. We have therefore listed all possible mutations and analyzed their impact on sensitivity indices. Applications are in progress. The second research direction concerns sensitivity analysis for models with non-independent inputs. Investigations have led us to introduce multidimensional sensitivity indices, which are under study.
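As background for the sensitivity indices discussed here, the following minimal sketch estimates classical first-order (Sobol) indices by a pick-freeze Monte Carlo scheme. It covers only the standard setting of independent inputs, taken here to be uniform on [0, 1], and not the multidimensional indices for non-independent inputs introduced in the thesis. The function name and defaults are assumptions made for the example.

```python
import random


def first_order_indices(model, n_inputs, n_samples=20000, seed=0):
    """Pick-freeze Monte Carlo estimate of first-order Sobol indices.

    Inputs are assumed independent and uniform on [0, 1].
    """
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    b = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    y_a = [model(x) for x in a]
    mean_a = sum(y_a) / n_samples
    var = sum((y - mean_a) ** 2 for y in y_a) / n_samples
    indices = []
    for i in range(n_inputs):
        # Keep input i from sample A, redraw every other input from sample B.
        y_mix = [model(xb[:i] + [xa[i]] + xb[i + 1:]) for xa, xb in zip(a, b)]
        mean_mix = sum(y_mix) / n_samples
        cov = sum((ya - mean_a) * (ym - mean_mix) for ya, ym in zip(y_a, y_mix)) / n_samples
        indices.append(cov / var)
    return indices
```

For the additive model `lambda x: x[0] + 2.0 * x[1]`, the exact first-order indices are 0.2 and 0.8, since the two inputs contribute variances 1/12 and 4/12 to a total of 5/12. The estimates should fall close to these values.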

Joint work with Alain Viari (Inria Rhône-Alpes) and Eduardo Rocha (Institut Pasteur-ABI, Paris).

We investigated part of the exploratory analysis of bacterial genomes, beyond gene detection. Our goal is to link proximities among genes on the chromosome with genetic mechanisms of the cell. We reviewed the main work in progress on the subject in order to suggest the best-suited formalism. We focused on the notion of neighborhood in a broad sense, which leads to specific mathematical tools such as renewal processes. We need to interact with biologists, who are the only ones in a position to judge the relevance of our work.

Data variability can be important in micro-array data analysis. Thus, when clustering gene expression profiles, it can be judicious to make use of repeated data. In this work, the problem of analyzing repeated data in the model-based cluster analysis context is considered. Linear mixed models are chosen to take into account data variability. A mixture of these models is considered. This leads to a large range of possible models depending on the assumptions made on both the covariance structure of the observations and the mixture model. The maximum likelihood estimation of this family of models through the EM algorithm is presented. The problem of selecting a particular mixture of linear mixed models is considered using penalized likelihood criteria. Illustrative Monte Carlo experiments are presented and an application to the clustering of gene expression profiles is detailed. All these experiments highlight the value of mixtures of linear mixed models for taking data variability into account in a cluster analysis context. They also show encouraging results for the BIC criterion in selecting a relevant model.

Joint work with Pierre Pollak (Department of Neurology, Grenoble University Hospital).

Detailed title: Double-blind multicentric study of bilateral subthalamic nucleus deep brain stimulation in Parkinson's disease.

After the introduction of levodopa therapy, many patients with idiopathic Parkinson's disease (PD) develop progressive disabling motor complications whose clinical, social and economic impacts impair health-related quality of life (HR-QOL). Deep brain stimulation of the subthalamic nucleus (STN) is a surgical alternative to medical treatment, but its high initial cost has limited its diffusion for many years. The French SPARK study group conducted a prospective multicentric study of STN stimulation in advanced PD in order to assess the safety and efficiency of this technique, as well as its social and economic impacts.

Joint work with Christophe Lenoir and Bernard Swynghedauw (Inserm, Paris).

This study deals with the analysis of heart rate in mammals. In particular, we investigate the role of the autonomic nervous system in mouse hearts. To this end, we identified and characterized the action of pharmacological autonomic blockades (propranolol and atropine) on the baseline heart rate. To cope with the nature of the experimental setup (repeated and incomplete measures), we resorted to statistical mixed effect models. An article presenting the results of our study has been submitted to Cardiovascular Research.

Joint work with Serge Iovleff (Lille).

We focus on nonlinear PCA based on manifold approximation of the
set of points introduced in

Joint work with Anatoli Iouditski (Imag, Grenoble); Pierre Jacob, Ludovic Menneteau (Montpellier) and Alexandre Nazin (IPU, Moscow, Russia).

The first part of our work consists in building nonparametric
estimates of the boundary of some support based on the extreme
values of the sample.

Joint work with P. Flandrin (CNRS, ENS Lyon) and P. Oliveira (IST-ISR, Lisbon).

Empirical Mode Decomposition (EMD) is a complex signal analysis
algorithm, recently proposed by Huang et al. (1998). Because so
far no theory seems appropriate to mathematically formalize the
concept of EMD, only extensive numerical simulations have been
performed in order to assess the method.

Joint work with Rudolf Riedi (Rice University, Houston (TX), USA).

In this study, we stated and proved the relation between the decay of the fat-tailed probability distribution of a random variable and the finiteness of its high order moments. We also proposed a simple and efficient wavelet-based estimator for determining the range of orders for which moments exist, given a finite sample of a random variable with unknown distribution.
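The simplest instance of this relation is the pure power-law tail, recalled here as a textbook fact and not as the general result of the study. The tail exponent $\alpha$ and the constant $C$ are notation introduced for the example.

```latex
\[
\mathbb{P}(|X| > x) \sim C\, x^{-\alpha} \quad (x \to \infty,\ \alpha > 0)
\quad \Longrightarrow \quad
\Bigl( \mathbb{E}|X|^{q} < \infty \iff q < \alpha \Bigr), \qquad q > 0,
\]
\[
\text{since} \quad \mathbb{E}|X|^{q} = \int_0^{\infty} q\, x^{q-1}\, \mathbb{P}(|X| > x)\, dx
\quad \text{and} \quad \int_1^{\infty} x^{q-1-\alpha}\, dx < \infty \iff q < \alpha .
\]
```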

Joint work with Anestis Antoniadis (Imag - Grenoble) and Andrey Feuerverger (University of Toronto, Canada).

In this work, we exploited the sparsity of wavelet representations
to propose an

The purpose of our work is the development of dynamic factor models for multivariate financial time series, and the incorporation of stochastic volatility components for latent factor processes. The models are direct generalizations of univariate stochastic volatility models, and represent specific varieties of models recently discussed in the growing multivariate stochastic volatility literature.
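One common way of writing such a model is sketched below. This is a standard specification from the factor stochastic volatility literature, given only for orientation, and it is an assumption that it matches the variant studied here. In this notation, $y_t$ is the vector of observed series, $B$ the loading matrix, $f_t$ the latent factors and $h_{i,t}$ the log-volatility of factor $i$.

```latex
\begin{align*}
y_t &= B f_t + \varepsilon_t, & \varepsilon_t &\sim \mathcal{N}(0, \Psi), \\
f_{i,t} &= \exp(h_{i,t}/2)\, z_{i,t}, & z_{i,t} &\sim \mathcal{N}(0, 1), \\
h_{i,t} &= \mu_i + \phi_i\,(h_{i,t-1} - \mu_i) + \eta_{i,t}, & \eta_{i,t} &\sim \mathcal{N}(0, \tau_i^2).
\end{align*}
```

With a single series and a single factor loaded with coefficient one and no idiosyncratic noise, the last two lines reduce to the usual univariate stochastic volatility model.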

This "CRECO" contract with the Reliability group of EDF R&D Chatou concerned Bayesian modelling and inference, through MCMC methods, of the statistical distributions of flaws in PWR vessels.

This "CRECO" contract with the Reliability group of EDF R&D Chatou concerned statistical inference for the study of material features causing flaws in nuclear equipment.

This contract with the Reliability group of EDF R&D Chatou
concerned statistical inference for extremal events. It funded the
development of the Extremes software.

This contract with the LCFR (Laboratoire de Conduite et Fiabilité des Réacteurs) of CEA/Cadarache/DER concerned sensitivity analysis and model uncertainty. It funded the thesis of Julien Jacques for three years.

is2 participates in the weekly statistical seminar of
Grenoble. G. Celeux is one of the organizers, and several lecturers
have been invited in this context.

P. Gonçalvès is involved in two thematic regional programs:
« Application de l'Analyse en Ondelettes à l'Acoustique et à
la Turbulence » headed by V. Perrier (ensimag-inpg) and
« Diagnostic Acoustique de la Vorticité dans les Écoulements
Turbulents » headed by C. Baudet (legi-ujf). Both programs
have reached their third and final year.

P. Gonçalvès is a member of the imag research group
entitled « Analyse Multirésolution, Ondelettes et
Applications », headed by V. Perrier (ensimag-inpg).

The activities of the fima reliability group continued. It is a collaboration between LMC-SMS and is2.

G. Celeux left Inria Rhône-Alpes to join Inria Futurs on
September 1st. He aims to create a new Inria team in Orsay,
in association with the Statistics and Probability team of the
Mathematics Department of the University Paris-Sud. This Inria
team, called select, will mainly be concerned with model
selection in statistical learning.

C. Lavergne is a member of the "Institut de Mathématiques et de Modélisation", Montpellier, UMR CNRS 5149. He supervised a PhD on generalized linear mixed models and the choice of selection criteria.

P. Gonçalvès has a collaboration with the U572 team of Inserm at the Lariboisière hospital (Paris). In particular, he
co-supervised a PhD work on mouse heart rate analysis.

P. Gonçalvès has been on leave since September 1st at the Instituto de Sistemas e Robótica of Instituto Superior
Técnico, Lisbon (Portugal).

G. Celeux has a collaboration with this institution and he was a referee for the PhD thesis of Jacinto Nascimento in January 2003.

G. Celeux has joint work with M. Titterington (Glasgow University).

P. Gonçalvès has joint work with:

- R. Riedi (Rice Univ., USA)

- A. Feuerverger (Univ. of Toronto, CA).

P. Gonçalvès is an Associate Editor of IEEE Signal
Processing Letters.

P. Gonçalvès is co-organizing the "Wavelet And Multifractal Analysis" summer school to be held in Cargèse (Corsica, France) from July 19th to 31st, 2004.

G. Celeux was an invited lecturer for the INSERM workshop on
"Statistical methods for microarray data" (Lalonde, May 2003).

G. Celeux lectured on multidimensional statistics in the dea MIMB,
UJF university of Grenoble.

P. Gonçalvès lectured a graduate course on Time-Frequency and Multi-resolution Analysis at ENSERG.

G. Celeux was an invited speaker at the CLADAG meeting in Bologna (September 2003).