

Section: New Results

Mixture models

Taking into account the curse of dimensionality

Participant: Stéphane Girard.

Joint work with: Bouveyron, C. (Université Paris 1), Fauvel, M. (ENSAT Toulouse)

In the PhD work of Charles Bouveyron (co-advised by Cordelia Schmid from the Inria LEAR team) [53], we propose new Gaussian models of high-dimensional data for classification purposes. We assume that the data live in several groups located in subspaces of lower dimension. Two different strategies arise:

  • the introduction in the model of a dimension-reduction constraint for each group;

  • the use of parsimonious models obtained by requiring different groups to share the same values of some parameters.

This modelling yields a new supervised classification method called High Dimensional Discriminant Analysis (HDDA) [4]. Several versions of this method have been tested on the supervised classification of objects in images. The approach has also been adapted to the unsupervised classification framework, yielding a method named High Dimensional Data Clustering (HDDC) [3]. The description of the associated R package is published in [11]. Our recent work consists in introducing a kernel into these methods to handle nonlinear data classification [27], [45].
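To make the dimension-reduction constraint concrete, here is a minimal numerical sketch of the kind of constrained Gaussian model underlying HDDA, assuming each class keeps its d leading eigen-directions and replaces the remaining eigenvalues by a single shared noise level; function names and interface are illustrative and do not reproduce the published R package:

```python
import numpy as np

def fit_subspace_gaussian(X, d):
    """Fit one class of an HDDA-like model: keep the d leading
    eigenpairs of the empirical covariance (the class-specific
    subspace) and average the remaining eigenvalues into a single
    noise variance b. Illustrative sketch only."""
    mu = X.mean(axis=0)
    eigval, Q = np.linalg.eigh(np.cov(X, rowvar=False))
    order = np.argsort(eigval)[::-1]        # eigenvalues in decreasing order
    eigval, Q = eigval[order], Q[:, order]
    a = eigval[:d]                          # variances inside the subspace
    b = eigval[d:].mean()                   # common variance outside it
    return mu, Q[:, :d], a, b

def log_density(x, mu, Qd, a, b):
    """Log Gaussian density for Sigma = Qd diag(a) Qd' + b (I - Qd Qd')."""
    p = len(x)
    diff = x - mu
    proj = Qd.T @ diff                      # coordinates in the subspace
    maha = np.sum(proj**2 / a) + (diff @ diff - proj @ proj) / b
    logdet = np.sum(np.log(a)) + (p - len(a)) * np.log(b)
    return -0.5 * (maha + logdet + p * np.log(2 * np.pi))
```

Supervised classification (HDDA) then assigns x to the class maximizing this log-density plus the log class proportion; HDDC iterates the same fit inside an EM loop.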

Robust mixture modelling using skewed multivariate distributions with variable amounts of tailweight

Participants: Florence Forbes, Darren Wraith.

Clustering concerns the assignment of each of N, possibly multidimensional, observations y_1, ..., y_N to one of K groups. A popular way to approach this task is via a parametric finite mixture model. While the vast majority of work on such mixtures has been based on Gaussian mixture models, in many applications the tails of normal distributions are shorter than appropriate, or the parameter estimates are affected by atypical observations (outliers). In such cases, the multivariate Student t distribution is motivated as a heavy-tailed alternative to the multivariate Gaussian distribution. The additional flexibility of the multivariate t comes from an extra degrees-of-freedom (dof) parameter, which can be viewed as a robustness tuning parameter.

A useful representation of the t-distribution is as a so-called infinite mixture of scaled Gaussians or Gaussian scale mixture,

p(y;\mu,\Sigma,\theta) = \int_0^{\infty} \mathcal{N}_M(y;\mu,\Sigma/w)\, f_W(w;\theta)\, dw \qquad (5)

where 𝒩_M(·; μ, Σ/w) denotes the M-dimensional Gaussian distribution with mean μ and covariance Σ/w, and f_W is the probability distribution of a univariate positive variable W referred to as the weight variable. When f_W is a Gamma distribution 𝒢(ν/2, ν/2), where ν denotes the degrees of freedom, we recover the multivariate t distribution. The weight variable W in this case effectively governs the tail behaviour of the distribution, from light tails (ν → ∞) to heavy tails (ν → 0), depending on the value of ν.
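As a concrete illustration of (5), one can simulate from the multivariate t by drawing the Gamma weight first and the conditional Gaussian second; this is a minimal sketch assuming the rate parameterization 𝒢(ν/2, ν/2):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_t_gsm(n, mu, Sigma, nu):
    """Multivariate t via the Gaussian scale mixture route of (5):
    W ~ Gamma(nu/2, rate=nu/2), then y | w ~ N(mu, Sigma / w)."""
    w = rng.gamma(shape=nu / 2, scale=2 / nu, size=n)   # rate nu/2 <=> scale 2/nu
    z = rng.standard_normal((n, len(mu)))
    L = np.linalg.cholesky(Sigma)
    return mu + (z @ L.T) / np.sqrt(w)[:, None]         # conditional cov Sigma/w

y = sample_t_gsm(10_000, np.zeros(2), np.eye(2), nu=3.0)  # small nu: heavy tails
```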

For many applications, the distribution of the data may also be highly asymmetric in addition to being heavy tailed (or affected by outliers). A natural extension to the Gaussian scale mixture case is to consider location and scale Gaussian mixtures of the form,

p(y;\mu,\Sigma,\theta) = \int_0^{\infty} \mathcal{N}_M(y;\mu + w\beta\Sigma,\, w\Sigma)\, f_W(w;\theta)\, dw \qquad (6)

where β is an additional M-dimensional vector parameter for skewness, and the determinant of Σ is constrained to equal 1 for parameter identifiability. When f_W is a Generalized Inverse Gaussian distribution GIG(w; λ, δ, γ), we recover the family of Generalized Hyperbolic (GH) distributions. Depending on the parameter choices for the GIG, special cases of the GH family include: the multivariate GH distribution with hyperbolic margins (λ = 1); the normal inverse Gaussian distribution (λ = −1/2); the multivariate hyperbolic distribution (λ = (M+1)/2); the hyperboloid distribution (λ = 0); the hyperbolic skew-t distribution (λ = −ν/2, γ = 0); and the normal gamma distribution (λ > 0, μ = 0, δ = 0), amongst others. For applied problems, the most popular of these forms appears to be the normal inverse Gaussian (NIG) distribution, which has seen extensive use in financial applications. Other distributional forms allowing for skewness and heavy or light tails include various forms of the multivariate skew-t; most of these can also be represented as location and scale Gaussian mixtures.
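For instance, simulating from the NIG member of the family only requires a GIG draw for the weight followed by the conditional Gaussian in (6). The sketch below is ours: it maps GIG(λ, δ, γ) onto SciPy's geninvgauss parameterization and reads the skew term βΣ as Σβ for a symmetric Σ; names and the mapping are assumptions, not code from the paper:

```python
import numpy as np
from scipy.stats import geninvgauss

rng = np.random.default_rng(0)

def sample_nig(n, mu, Sigma, beta, delta, gamma):
    """NIG as a location and scale Gaussian mixture, cf. (6):
    W ~ GIG(lambda=-1/2, delta, gamma), then
    y | w ~ N(mu + w * Sigma @ beta, w * Sigma).
    GIG(lambda, delta, gamma) is sampled as (delta/gamma) *
    geninvgauss(p=lambda, b=delta*gamma) -- our mapping assumption."""
    w = (delta / gamma) * geninvgauss.rvs(p=-0.5, b=delta * gamma,
                                          size=n, random_state=rng)
    z = rng.standard_normal((n, len(mu)))
    L = np.linalg.cholesky(Sigma)
    return mu + np.outer(w, Sigma @ beta) + np.sqrt(w)[:, None] * (z @ L.T)
```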

Although the above approaches provide great flexibility for modelling highly asymmetric and heavy-tailed data, they assume f_W to be a univariate distribution, and hence each dimension is governed by the same amount of tailweight. There have been various approaches to address this issue in the statistics literature, for both symmetric and asymmetric distributional forms. [66] proposes a dependent bivariate t-distribution with marginals of different degrees of freedom, but the tractability of the extension to the multivariate case is unclear. Additional proposals are reviewed in chapters 4 and 5 of [67], but these formulations tend to be appreciably more complicated, often already at the level of the probability density function. Increasingly, there has been much research on copula approaches to account for flexible distributional forms, but the choice of copula and the applicability to (even) moderate dimensions are also not clear. These papers take various approaches whose relationships have been characterized, in the bivariate case, by [73]. However, most of the existing approaches suffer either from the lack of a closed-form pdf or from a difficult generalization to more than two dimensions.

In this work, we show that the location and scale mixture representation can be explored further, and we propose a framework that is considerably simpler than those previously proposed, with distributions exhibiting interesting properties. Using the normal inverse Gaussian (NIG) distribution as an example, we extend the standard location and scale mixture of Gaussians representation to allow the tail behaviour to be set or estimated differently in each dimension of the variable space. The key elements of the approach are the introduction of multidimensional weights and a decomposition of the matrix Σ in (6), which together facilitate separate estimation and allow for arbitrary correlation between dimensions. We outline an approach for maximum likelihood estimation of the parameters via the EM algorithm and explore the performance of the approach on several simulated and real data sets in the context of clustering.
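A hedged sketch of the multidimensional-weight idea under a t-like choice of f_W: decompose Σ = D A Dᵀ, give each eigen-direction its own Gamma weight (hence its own dof ν_m), and let D carry the correlation between dimensions. This illustrates the construction only; it is not the authors' estimation code:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_multiscale_t(n, mu, Sigma, nus):
    """Per-dimension tailweight: Sigma = D diag(A) D', and the m-th
    eigen-direction gets its own weight W_m ~ Gamma(nu_m/2, rate=nu_m/2),
    so tails can be heavy in some directions and light in others."""
    A, D = np.linalg.eigh(Sigma)                     # Sigma = D diag(A) D'
    nus = np.asarray(nus, dtype=float)
    w = rng.gamma(shape=nus / 2, scale=2 / nus, size=(n, len(nus)))
    z = rng.standard_normal((n, len(mu)))
    return mu + (np.sqrt(A / w) * z) @ D.T           # cov D diag(A/w) D' given w
```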

Robust clustering for high dimensional data

Participants: Florence Forbes, Darren Wraith, Minwoo Lee.

Parametric mixture models are a popular approach to clustering. In particular, Gaussian mixture models are widely used in fields such as data mining, pattern recognition, machine learning, and statistical analysis. The modelling and computational flexibility of the Gaussian mixture model makes it possible to represent a rich class of densities, and provides a simple mathematical form for cluster models.

Despite the success of Gaussian mixtures, their parameter estimates can be severely affected by outliers. Replacing the Gaussian with the multivariate t distribution, which adds a degrees-of-freedom (dof) parameter acting as a robustness tuning parameter, improves the robustness of clustering. Although adopting the t distribution sacrifices the closed-form solution, estimation remains tractable by representing the t distribution as a Gaussian scale mixture (GSM), i.e., a Gaussian random vector weighted by a hidden positive scaling variable. Recent work using the multivariate t distribution has demonstrated this improved robustness.

Along with the robustness brought by the t distribution, efficient handling of high-dimensional data is critical in practice. High-dimensional data often make most clustering methods perform poorly. To overcome the curse of dimensionality, Bouveyron et al. [54] proposed model-based high-dimensional data clustering (HDDC). HDDC searches for the intrinsic dimension of each class with the BIC criterion or Cattell's scree test; this limits the number of parameters by taking into account only the specific subspace in which each class is located. The parameterization makes HDDC not only computationally efficient but also robust to ill-conditioning or singularity of the empirical covariance matrix.
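For illustration, here is a small Cattell-style scree test for picking the intrinsic dimension of one class from the eigenvalues of its empirical covariance; the 0.2 gap threshold is an illustrative default, not the value used in HDDC:

```python
import numpy as np

def cattell_scree_dim(eigvals, threshold=0.2):
    """Keep dimensions as long as the eigenvalue gap stays above a
    fraction `threshold` of the largest gap (Cattell-style scree test)."""
    ev = np.sort(np.asarray(eigvals))[::-1]
    gaps = -np.diff(ev)                       # successive eigenvalue drops
    keep = np.where(gaps >= threshold * gaps.max())[0]
    return int(keep[-1]) + 1                  # last 'large' drop closes the subspace

# three strong directions out of ten -> intrinsic dimension 3
X = np.random.default_rng(0).standard_normal((200, 10)) * np.array(
    [5, 4, 3] + [0.3] * 7)
d = cattell_scree_dim(np.linalg.eigvalsh(np.cov(X, rowvar=False)))
```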

This work proposes an approach that combines robust clustering with HDDC. Building a mixture of multivariate t distributions on the basis of HDDC yields robust high-dimensional clustering methods that can capture a variety of density models. Further, by extending the mixture so that each dimension has its own t weight, we propose a more flexible model applicable to a wider range of data. The whole construction remains model-based.

Partially Supervised Mapping: A Unified Model for Regression and Dimensionality Reduction

Participant: Florence Forbes.

Joint work with: Antoine Deleforge and Radu Horaud from the Inria Perception team.

We cast dimensionality reduction and regression in a unified latent variable model. We propose a two-step strategy consisting of characterizing a non-linear reversed output-to-input regression with a generative piecewise-linear model, followed by Bayes inversion to obtain an output density given an input. We describe and analyze the most general case of this model, namely when only some components of the output variables are observed while the other components are latent. We provide two EM inference procedures and their initialization. Using simulated and real data, we show that the proposed method outperforms several existing ones.
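The Bayes inversion step can be illustrated with standard Gaussian algebra. Assume a learned piecewise-affine forward model: for component k, x ~ N(c_k, Γ_k) and y | x ~ N(A_k x + b_k, Σ_k). Then p(x | y) is again a Gaussian mixture, computed below; variable names are illustrative and this sketch is not the authors' implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal

def invert_piecewise_affine(y, pis, cs, Gammas, As, bs, Sigmas):
    """Posterior mixture p(x | y) for a mixture of affine regressions:
    weights, means and covariances of the inverse conditional density."""
    weights, means, covs = [], [], []
    for pi, c, G, A, b, S in zip(pis, cs, Gammas, As, bs, Sigmas):
        # evidence of y under component k: N(A c + b, S + A G A')
        weights.append(pi * multivariate_normal.pdf(y, A @ c + b,
                                                    S + A @ G @ A.T))
        # Gaussian posterior of x within component k
        Ginv, Sinv = np.linalg.inv(G), np.linalg.inv(S)
        P = np.linalg.inv(Ginv + A.T @ Sinv @ A)     # posterior covariance
        means.append(P @ (Ginv @ c + A.T @ Sinv @ (y - b)))
        covs.append(P)
    weights = np.array(weights)
    return weights / weights.sum(), means, covs
```

A point estimate of x given y is then the weight-averaged sum of the component means.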

Variational EM for Binaural Sound-Source Separation and Localization

Participant: Florence Forbes.

Joint work with: Antoine Deleforge and Radu Horaud from the Inria Perception team.

We addressed the problem of sound-source separation and localization in real-world conditions with two microphones. Both tasks are solved within a unified formulation based on supervised mapping. While the parameters of the direct mapping are learned during a training stage that uses sources emitting white noise (calibration), the inverse mapping is estimated with a variational EM formulation. The proposed algorithm can deal with natural sound sources such as speech, which are known to yield sparse spectrograms, and is able to locate multiple sources in both azimuth and elevation. Extensive experiments with real data show that the method outperforms the state of the art in both separation and localization.