Section: New Results

Regression and classification methods

Participants: Gérard Biau, Olivier Catoni, Ilaria Giulini.

Metric-based decision procedures

It is now well known that much of the statistical performance of regression and classification algorithms depends on the metric chosen to measure proximity between data points. Through his work, Gérard Biau has become convinced that, well beyond traditional distances, (dis)similarities and reproducing kernel metrics, one must now attempt to define proximities generated by the sample itself. Such metrics are necessarily random, and they force us to rethink the nature of the resulting estimates, as shown for example in the preliminary article [12].
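To make the idea of a proximity generated by the sample itself more concrete, here is a minimal Python sketch built on one particular construction that is an assumption, not the method of [12]: a random-forest-style proximity, where two points count as close when they fall into the same cell of randomly built partitions whose split points are sample medians along randomly chosen coordinates. The functions `build_tree`, `leaf_id`, `proximity` and `proximity_regression`, and the parameters `n_trees` and `depth`, are hypothetical names chosen for illustration.

```python
import numpy as np

# Illustrative assumption: random median-split trees stand in for a generic
# sample-generated proximity; this is not the construction of [12].

def build_tree(X, rng, depth):
    """Recursively split the sample at the median of a randomly drawn coordinate.

    Returns None for a leaf, otherwise (coordinate, threshold, left, right).
    """
    if depth == 0 or len(X) < 2:
        return None
    j = rng.integers(X.shape[1])          # random coordinate
    t = np.median(X[:, j])                # split point taken from the sample
    left, right = X[X[:, j] <= t], X[X[:, j] > t]
    if len(right) == 0:                   # all values tied: stop splitting
        return None
    return (j, t, build_tree(left, rng, depth - 1), build_tree(right, rng, depth - 1))

def leaf_id(tree, x):
    """Encode the root-to-leaf path followed by x as a string of 'L'/'R'."""
    path = ""
    while tree is not None:
        j, t, left, right = tree
        if x[j] <= t:
            path, tree = path + "L", left
        else:
            path, tree = path + "R", right
    return path

def proximity(X, x, n_trees=100, depth=3, seed=0):
    """Fraction of random trees in which x shares a leaf with each sample point."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(len(X))
    for _ in range(n_trees):
        tree = build_tree(X, rng, depth)
        lx = leaf_id(tree, x)
        counts += np.array([leaf_id(tree, xi) == lx for xi in X], dtype=float)
    return counts / n_trees

def proximity_regression(X, Y, x, **kwargs):
    """Proximity-weighted average of the responses (a local averaging estimate)."""
    w = proximity(X, x, **kwargs)
    return float(np.dot(w, Y) / w.sum())  # w.sum() > 0: every leaf holds sample points
```

Because the partitions depend on both the sample and an auxiliary random draw, the weights are themselves random, and the regression estimate is a locally weighted average whose notion of "local" is dictated by the data rather than fixed in advance.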

Unsupervised classification in reproducing kernel Hilbert spaces

In her PhD, started in September 2012, Ilaria Giulini uses dimension-free estimates of the principal components of an i.i.d. sample of points in a reproducing kernel Hilbert space (RKHS) to derive new unsupervised clustering algorithms, based on dimension reduction by nonlinear coordinate smoothing along aggregated principal components. The dimension-free estimates are obtained from PAC-Bayes bounds derived from thresholded exponential moments.
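The overall pipeline (map the sample into an RKHS, estimate principal components there, cluster in the reduced coordinates) can be sketched in Python as follows. This is a simplified stand-in under explicit assumptions: a Gaussian kernel with a user-chosen bandwidth, the ordinary empirical kernel PCA (eigendecomposition of the centered Gram matrix) in place of the dimension-free PAC-Bayes estimates, and plain Lloyd k-means in place of coordinate smoothing along aggregated principal components. All function names are hypothetical.

```python
import numpy as np

# Simplified stand-in: empirical kernel PCA + k-means, NOT the robust
# PAC-Bayes estimates or the coordinate-smoothing method described above.

def gaussian_gram(X, bandwidth):
    """Gram matrix K[i, j] = exp(-|x_i - x_j|^2 / (2 * bandwidth^2))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def kernel_pca_coordinates(K, n_components):
    """Coordinates of the sample points along the top empirical principal axes.

    Centering K corresponds to centering the points in the RKHS; if
    Kc = V diag(lam) V^T, the projections are V diag(sqrt(lam)).
    """
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    vals, vecs = np.linalg.eigh(H @ K @ H)            # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:n_components]
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))

def farthest_point_init(Z, k):
    """Deterministic seeding: each new center is the point farthest from the others."""
    centers = [Z[0]]
    for _ in range(1, k):
        d2 = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d2)])
    return np.array(centers)

def kernel_cluster(X, n_clusters, n_components, bandwidth, n_iter=100):
    """Cluster X by Lloyd's k-means on its kernel principal coordinates."""
    Z = kernel_pca_coordinates(gaussian_gram(X, bandwidth), n_components)
    centers = farthest_point_init(Z, n_clusters)
    for _ in range(n_iter):
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        new = np.array([Z[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(n_clusters)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels
```

The empirical eigendecomposition used here carries no dimension-free guarantee; replacing it with estimates controlled by PAC-Bayes bounds is precisely where the approach described above departs from this baseline.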