## New Results

### Mixture models

#### Taking into account the curse of dimensionality

Participant : Stéphane Girard.

**Joint work with:** Bouveyron, C. (Université Paris 1), Fauvel, M. (ENSAT Toulouse)

In the PhD work of Charles Bouveyron (co-advised by Cordelia Schmid from the Inria LEAR team) [53], we propose new Gaussian models of high dimensional data for classification purposes. We assume that the data live in several groups located in subspaces of lower dimension. Two different strategies arise:

- the introduction in the model of a dimension reduction constraint for each group;
- the use of parsimonious models obtained by requiring different groups to share the same values of some parameters.

This modelling yields a new supervised classification method called High Dimensional Discriminant Analysis (HDDA) [4]. Some versions of this method have been tested on the supervised classification of objects in images. This approach has been adapted to the unsupervised classification framework, and the related method is named High Dimensional Data Clustering (HDDC) [3]. The description of the associated R package is published in [11]. Our recent work consists of introducing kernels into these methods to deal with nonlinear data classification [27], [45].

#### Robust mixture modelling using skewed multivariate distributions with variable amounts of tailweight

Participants : Florence Forbes, Darren Wraith.

Clustering concerns the assignment of each of $n$, possibly multidimensional, observations to one of $K$ groups. A popular way to approach this task is via a parametric finite mixture model. While the vast majority of the work on such mixtures has been based on Gaussian mixture models, in many applications the tails of normal distributions are shorter than appropriate, or parameter estimates are affected by atypical observations (outliers). In such cases, the multivariate Student $t$ distribution is motivated as a heavy-tailed alternative to the multivariate Gaussian distribution. The additional flexibility of the multivariate $t$ comes from introducing a degrees of freedom parameter ($\nu$), which can be viewed as a robustness tuning parameter.

A useful representation of the $t$-distribution is as a so-called
*infinite mixture of scaled Gaussians* or *Gaussian scale
mixture*,

$$p(y; \mu, \Sigma, \theta) = \int_0^\infty \mathcal{N}_M(y; \mu, \Sigma/w) \, f_W(w; \theta) \, dw ,$$

where $\mathcal{N}_M(\cdot\,; \mu, \Sigma/w)$ denotes the $M$-dimensional Gaussian distribution with mean $\mu$ and covariance $\Sigma/w$, and $f_W$ is the probability distribution of a univariate positive variable $W$ referred to as the weight variable. When $f_W$ is a Gamma distribution $\mathcal{G}(\nu/2, \nu/2)$, where $\nu$ denotes the degrees of freedom, we recover the multivariate $t$ distribution. The weight variable in this case effectively acts to govern the tail behaviour of the distributional form, from light tails (the Gaussian limit as $\nu \to \infty$) to heavy tails ($\nu$ small).
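As an illustration of this representation, the following minimal sketch (with assumed toy parameter values) samples from a multivariate $t$ by first drawing the Gamma weight and then the correspondingly scaled Gaussian:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_t_gsm(mu, Sigma, nu, n):
    """Draw n samples from the multivariate t via its Gaussian scale
    mixture form: w ~ Gamma(nu/2, rate nu/2), then y | w ~ N(mu, Sigma/w)."""
    M = len(mu)
    # numpy's gamma takes a scale parameter, so rate nu/2 becomes scale 2/nu
    w = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
    z = rng.multivariate_normal(np.zeros(M), Sigma, size=n)
    return mu + z / np.sqrt(w)[:, None]

y = sample_t_gsm(np.zeros(2), np.eye(2), nu=3.0, n=50000)
```

Small values of $\nu$ make small weights $w$ frequent, inflating the covariance $\Sigma/w$ for those draws and thereby producing the heavy tails.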

For many applications, the distribution of the data may also be
highly asymmetric in addition to being heavy tailed (or affected
by outliers). A natural extension to the Gaussian scale mixture
case is to consider *location and scale Gaussian
mixtures* of the form,

$$p(y; \mu, \Sigma, \beta, \theta) = \int_0^\infty \mathcal{N}_M\big(y; \, \mu + \beta/w, \, \Sigma/w\big) \, f_W(w; \theta) \, dw ,$$

where $\beta$ is an additional $M$-dimensional vector parameter
for skewness, and the determinant of $\Sigma$ equals 1 for
parameter identifiability. When $f_W$ is a Generalized Inverse
Gaussian (GIG) distribution, we recover
the family of Generalized Hyperbolic (GH) distributions. Depending
on the parameter choice for the GIG, special cases of the GH
family include the multivariate GH distribution with hyperbolic
margins, the normal inverse Gaussian distribution, the multivariate
hyperbolic distribution, the hyperboloid distribution, the
hyperbolic skew-$t$ distribution, and the normal Gamma distribution,
amongst others. For applied problems, the most popular of these
forms appears to be the normal inverse Gaussian (NIG) distribution,
with extensive use in financial applications. Another distributional
form allowing for skewness and heavy or light tails is the
multivariate skew-$t$, in its different variants. Most of
these distributional forms can also be represented as
*location and scale Gaussian mixtures*.
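Sampling from the NIG member of this family can be sketched as follows. The parameters `delta` and `gamma` of the inverse-Gaussian mixing law are illustrative choices, and the mixing variable `v` drawn below plays the role of the reciprocal weight $1/w$ of the representation above:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_nig(mu, Sigma, beta, delta, gamma, n):
    """Sketch: sample a multivariate normal inverse Gaussian (NIG) as a
    location and scale Gaussian mixture.  v ~ inverse-Gaussian, then
    y | v ~ N(mu + v * beta, v * Sigma); v corresponds to 1/w above."""
    M = len(mu)
    # numpy parameterizes the inverse Gaussian (Wald) by mean and shape
    v = rng.wald(mean=delta / gamma, scale=delta**2, size=n)
    z = rng.multivariate_normal(np.zeros(M), Sigma, size=n)
    return mu + v[:, None] * beta + np.sqrt(v)[:, None] * z

y = sample_nig(np.zeros(2), np.eye(2), beta=np.array([1.0, 0.0]),
               delta=1.0, gamma=1.0, n=20000)
```

With a nonzero skewness vector $\beta$, the first coordinate above is shifted and asymmetric while the second remains symmetric, illustrating how $\beta$ acts dimension by dimension on the location.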

Although the above approaches provide great flexibility for modelling highly asymmetric and heavy-tailed data, they assume the weight variable $W$ to be univariate, and hence each dimension is governed by the same amount of tailweight. There have been various approaches to address this issue in the statistics literature, for both symmetric and asymmetric distributional forms. [66] proposes a dependent bivariate $t$-distribution with marginals of different degrees of freedom, but the tractability of the extension to the multivariate case is unclear. Additional proposals are reviewed in chapters 4 and 5 of [67], but these formulations tend to be appreciably more complicated, often already in the expression of the probability density function. Increasingly, there has been much research on copula approaches to account for flexible distributional forms, but the choice of which copula to use in this setting, and the applicability to (even) moderate dimensions, is also not clear. In general these papers take various approaches whose relationships have been characterized in the bivariate case by [73]. However, most of the existing approaches suffer either from the non-existence of a closed-form pdf or from a difficult generalization to more than two dimensions.

In this work, we show that the location and scale mixture
representation can be further explored, and we propose a framework
that is considerably simpler than those previously proposed, with
distributions exhibiting interesting properties. Using the normal
inverse Gaussian (NIG) distribution as an example, we extend the
standard *location and scale mixture of Gaussians*
representation to allow the tail behaviour to be set or
estimated differently in each dimension of the variable space. The
key elements of the approach are the introduction of
multidimensional weights and a decomposition of the scale matrix
$\Sigma$ which facilitates their separate
estimation while allowing for arbitrary correlation between
dimensions.
We outline an approach for maximum
likelihood estimation of the parameters via the EM algorithm, and
explore the performance of the approach on several simulated and
real data sets in the context of clustering.
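The multiple-weights idea can be illustrated with a simulation sketch. The parameterization below (orthogonal `D`, diagonal `A`, independent Gamma weights per dimension) is an assumed simplification for illustration, not the exact model of the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_multi_tail(mu, D, A, nus, n):
    """Sketch: Sigma = D A D^T with D orthogonal and A diagonal; each
    dimension m gets its own weight w_m ~ Gamma(nu_m/2, rate nu_m/2),
    so each axis of the decomposition has its own amount of tailweight."""
    nus = np.asarray(nus, dtype=float)
    M = len(mu)
    W = rng.gamma(shape=nus / 2.0, scale=2.0 / nus, size=(n, M))
    z = rng.standard_normal((n, M))
    # scale each coordinate by its own weight, then rotate by D
    x = (np.sqrt(np.diag(A)) * z) / np.sqrt(W)
    return mu + x @ D.T

# one very heavy-tailed axis (nu = 2.5) and one nearly Gaussian axis (nu = 50)
y = sample_multi_tail(np.zeros(2), np.eye(2), np.eye(2),
                      nus=[2.5, 50.0], n=20000)
```

The empirical fourth moment of the first coordinate dwarfs that of the second, which a common scalar weight could not produce.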

#### Robust clustering for high dimensional data

Participants : Florence Forbes, Darren Wraith, Minwoo Lee.

For clustering problems, parametric mixture models are a popular approach. In particular, Gaussian mixture models are widely used in fields such as data mining, pattern recognition, machine learning, and statistical analysis. The modelling and computational flexibility of the Gaussian mixture model makes it possible to represent a rich class of densities and provides a simple mathematical form for cluster models.

Despite the success of Gaussian mixtures, their parameter estimates can be severely affected by outliers. By adding a degrees of freedom (dof) parameter, which acts as a robustness tuning parameter, the $t$ distribution yields more robust clustering. Although adopting the $t$ distribution sacrifices closed-form solutions, estimation remains tractable by representing the $t$ distribution as a Gaussian scale mixture (GSM), which consists of a Gaussian random vector weighted by a hidden scaling variable. Recent work using the multivariate $t$ distribution has shown this improved robustness.

Along with robustness from the $t$ distribution, efficient handling of high dimensional data is critical in practice. High dimensional data often make most clustering methods perform poorly. To overcome the curse of dimensionality, Bouveyron et al. [54] proposed model-based high dimensional data clustering (HDDC). HDDC searches for the intrinsic dimension of each class with the BIC criterion or the scree test of Cattell; this limits the number of parameters by taking into account only the specific subspace in which each class is located. The parameterization makes HDDC not only computationally efficient but also robust with respect to ill-conditioning or singularity of the empirical covariance matrices.
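An intrinsic-dimension step of the scree-test kind can be sketched as follows; the specific elbow rule and threshold below are assumptions for illustration, not necessarily the exact HDDC criterion:

```python
import numpy as np

rng = np.random.default_rng(3)

def intrinsic_dim_scree(X, threshold=0.2):
    """Cattell-style scree test (one common variant, assumed here):
    order the eigenvalues of the empirical covariance and keep
    dimensions up to the last eigenvalue drop that is at least
    `threshold` times the largest drop."""
    eigvals = np.sort(np.linalg.eigvalsh(np.cov(X.T)))[::-1]
    gaps = -np.diff(eigvals)            # drops between consecutive eigenvalues
    keep = np.nonzero(gaps >= threshold * gaps.max())[0]
    return int(keep.max()) + 1

# data living near a 2-dimensional subspace of R^5
Z = 5.0 * rng.standard_normal((5000, 2))
X = np.hstack([Z, 0.1 * rng.standard_normal((5000, 3))])
```

On this example the two dominant eigenvalues sit far above the noise floor, so the rule recovers dimension 2; restricting each class to such a subspace is what keeps the HDDC parameter count small.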

This work proposes an approach that combines robust clustering with HDDC. Using mixtures of multivariate $t$ distributions on the basis of HDDC yields robust high dimensional clustering methods that can capture various kinds of density models. Further, extending the mixture model with a separate weight distribution for each dimension, we propose a more flexible model applicable to a wider range of data. We suggest a model-based approach for this method.

#### Partially Supervised Mapping: A Unified Model for Regression and Dimensionality Reduction

Participant : Florence Forbes.

**Joint work with:** Antoine Deleforge and Radu Horaud from the
Inria Perception team.

We cast dimensionality reduction and regression in a unified
latent variable model. We propose a two-step strategy consisting
of characterizing a non-linear *reversed* output-to-input
regression with a generative piecewise-linear model, followed by
Bayes inversion to obtain an output density given an input. We
describe and analyze the most general case of this model, namely
when only some components of the output variables are observed
while the other components are latent. We provide two EM inference
procedures and their initialization. Using simulated and real
data, we show that the proposed method outperforms several
existing ones.
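The Bayes-inversion step can be illustrated on a hypothetical one-dimensional toy model: a generative piecewise-affine mapping $t \to x$ whose inversion yields a Gaussian mixture posterior on $t$. All numbers below are illustrative, and this omits the partially latent output components of the full model:

```python
import numpy as np

# toy generative piecewise-affine model t -> x with K = 2 components
pi = np.array([0.5, 0.5])        # component priors
mu_t = np.array([-1.0, 1.0])     # Gaussian prior on t within each component
s_t = np.array([0.5, 0.5])       # prior std deviations
A = np.array([2.0, -1.0])        # affine maps: x = A_k * t + b_k + noise
b = np.array([0.0, 3.0])
s_x = 0.1                        # observation noise std

def posterior_t_given_x(x):
    """Bayes inversion: p(t | x) is again a Gaussian mixture; return its
    component weights, means and variances."""
    # marginal of x under component k: N(A_k mu_t + b_k, A_k^2 s_t^2 + s_x^2)
    m_x = A * mu_t + b
    v_x = A**2 * s_t**2 + s_x**2
    log_w = np.log(pi) - 0.5 * np.log(2 * np.pi * v_x) - 0.5 * (x - m_x)**2 / v_x
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # standard Gaussian conditioning within each component
    v_post = 1.0 / (1.0 / s_t**2 + A**2 / s_x**2)
    m_post = v_post * (mu_t / s_t**2 + A * (x - b) / s_x**2)
    return w, m_post, v_post
```

For an observation generated by the first piece, e.g. `x = -2.0`, nearly all posterior mass lands on that component and the posterior mean of `t` concentrates near `-1`, which is the inversion behaviour the two-step strategy relies on.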

#### Variational EM for Binaural Sound-Source Separation and Localization

Participant : Florence Forbes.

**Joint work with:** Antoine Deleforge and Radu Horaud from the
Inria Perception team.

We addressed the problem of sound-source separation and localization in real-world conditions with two microphones. Both tasks are solved within a unified formulation using supervised mapping. While the parameters of the direct mapping are learned during a training stage that uses sources emitting white noise (calibration), the inverse mapping is estimated using a variational EM formulation. The proposed algorithm can deal with natural sound sources such as speech, which are known to yield sparse spectrograms, and is able to locate multiple sources in both azimuth and elevation. Extensive experiments with real data show that the method outperforms the state of the art in both separation and localization.