Section: New Results
Large-Margin Metric Learning for Partitioning Problems
Participants: Rémi Lajugie [correspondent], Sylvain Arlot, Francis Bach.
In [31] we consider unsupervised partitioning problems, such as clustering, image segmentation, video segmentation and other change-point detection problems. We focus on partitioning problems based explicitly or implicitly on the minimization of Euclidean distortions, which include mean-based change-point detection, K-means, spectral clustering and normalized cuts. Our main goal is to learn a Mahalanobis metric for these unsupervised problems, leading to feature weighting and/or selection. This is done in a supervised way by assuming the availability of several potentially partially labelled datasets that share the same metric. We cast the metric learning problem as a large-margin structured prediction problem, with proper definition of regularizers and losses, leading to a convex optimization problem which can be solved efficiently with iterative techniques. We provide experiments showing how learning the metric may significantly improve the partitioning performance on synthetic examples and in bioinformatics, video segmentation and image segmentation problems.
Unsupervised partitioning problems are ubiquitous in machine learning and other data-oriented fields such as computer vision, bioinformatics or signal processing. They include (a) traditional unsupervised clustering problems, with the classical K-means algorithm, hierarchical linkage methods [61] and spectral clustering [80], (b) unsupervised image segmentation problems where two neighboring pixels are encouraged to be in the same cluster, with mean-shift techniques [51] or normalized cuts [84], and (c) change-point detection problems adapted to multivariate sequences (such as video) where segments are composed of contiguous elements, with typical window-based algorithms [54] and various methods looking for a change in the mean of the features (see, e.g., [49]).
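To illustrate the last family, mean-based change-point detection with a known number of segments can be solved exactly by dynamic programming over per-segment squared distortions. The sketch below is a minimal, assumption-laden simplification (univariate signal, Euclidean distortion, fixed number of segments K; the names `segment_cost` and `changepoints` are illustrative), not the implementation used in [31]:

```python
import numpy as np

def segment_cost(X):
    # Precompute cumulative sums so that the squared deviation of X[i:j]
    # from its own mean can be evaluated in O(1).
    S = np.concatenate([[0.0], np.cumsum(X)])
    S2 = np.concatenate([[0.0], np.cumsum(X ** 2)])
    def cost(i, j):
        s = S[j] - S[i]
        return (S2[j] - S2[i]) - s * s / (j - i)
    return cost

def changepoints(X, K):
    # Dynamic program: D[k, j] = best distortion of X[:j] split into k segments.
    X = np.asarray(X, dtype=float)
    n = len(X)
    cost = segment_cost(X)
    D = np.full((K + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    back = np.zeros((K + 1, n + 1), dtype=int)
    for k in range(1, K + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = D[k - 1, i] + cost(i, j)
                if c < D[k, j]:
                    D[k, j] = c
                    back[k, j] = i
    # Backtrack the segment boundaries.
    bps, j = [], n
    for k in range(K, 0, -1):
        j = back[k, j]
        bps.append(j)
    return sorted(bps[:-1])  # drop the leading boundary at 0
```

On a signal with a single jump in the mean, e.g. ten zeros followed by ten fives, `changepoints(X, 2)` recovers the boundary at index 10.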
All the algorithms mentioned above rely on a specific distance (or, more generally, a similarity measure) on the space of configurations. A good metric is crucial to the performance of these partitioning algorithms, and its choice is heavily problem-dependent. While such a metric was originally chosen manually (often by trial and error), recent work has considered learning it directly from data. Without any supervision, the problem is ill-posed, and methods based on generative models may learn a metric or reduce dimensionality (see, e.g., [53]), but typically with no guarantee that they lead to better partitions. In [31], we follow [41], [95], [40] and consider the goal of learning a metric for potentially several partitioning problems sharing the same metric, assuming that several fully or partially labelled partitioned datasets are available during the learning phase. While such labelled datasets are typically expensive to produce, there are several scenarios where they have already been built, often for evaluation purposes; these include video segmentation and image segmentation tasks, as well as change-point detection tasks in bioinformatics (see [62]).
We consider partitioning problems based explicitly or implicitly on the minimization of Euclidean distortions, which include K-means, spectral clustering and normalized cuts, and mean-based change-point detection. We make the following contributions:

We review and unify several partitioning algorithms, and cast them as the maximization of a linear function of a rescaled equivalence matrix, which can be solved by algorithms based on spectral relaxations or dynamic programming.

Given fully labelled datasets, we cast the metric learning problem as a large-margin structured prediction problem, with proper definition of regularizers, losses and efficient loss-augmented inference.

Given partially labelled datasets, we propose an algorithm that iterates between labelling the full datasets given a metric and learning the metric given the fully labelled datasets. We also consider extensions that allow changes in the full distribution of univariate time series (rather than changes only in the mean), with application to bioinformatics.

We provide experiments where we show how learning the metric may significantly improve the partitioning performance in synthetic examples, video segmentation and image segmentation problems.
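The first contribution can be made concrete with a small numerical check: for a fixed partition y, the K-means distortion is an affine function of the rescaled equivalence matrix M(y), whose (i, j) entry is 1/|c| when i and j fall in the same cluster c and 0 otherwise. The sketch below (plain NumPy; the name `equivalence_matrix` is illustrative, and the identity metric is used — with a Mahalanobis metric B the second trace would become tr(M X B Xᵀ)) verifies the identity on a toy dataset; it is a didactic simplification, not the machinery of [31]:

```python
import numpy as np

def equivalence_matrix(labels):
    # Rescaled equivalence matrix: M[i, j] = 1/|c| if i and j share cluster c.
    labels = np.asarray(labels)
    M = np.zeros((len(labels), len(labels)))
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        M[np.ix_(idx, idx)] = 1.0 / len(idx)
    return M

# Distortion as an affine function of M:  tr(X X^T) - tr(M X X^T).
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
labels = [0, 0, 1, 1]
M = equivalence_matrix(labels)
linear_form = np.trace(X @ X.T) - np.trace(M @ X @ X.T)

# Direct K-means distortion of the same partition: squared distances to cluster means.
direct = 0.0
for c in np.unique(labels):
    idx = np.flatnonzero(np.asarray(labels) == c)
    mu = X[idx].mean(axis=0)
    direct += ((X[idx] - mu) ** 2).sum()

assert np.isclose(linear_form, direct)
```

Since tr(X Xᵀ) does not depend on the partition, minimizing the distortion over partitions amounts to maximizing the linear function tr(M(y) X Xᵀ) of M(y), which is what enables the spectral-relaxation and dynamic-programming solvers mentioned above.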
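The alternating scheme of the third contribution can be caricatured as follows. This toy keeps only the structure of the iteration — complete the missing labels given the current metric, then refit the metric given all partitions — and replaces the paper's large-margin metric step with a crude inverse-within-class-variance reweighting of a diagonal metric; all function names and the update rule are illustrative assumptions:

```python
import numpy as np

def kmeans(X, K, iters=20):
    # Plain Lloyd iterations with deterministic init on the first K points.
    C = X[:K].astype(float).copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - C[None, :, :]) ** 2).sum(-1), axis=1)
        for k in range(K):
            if np.any(labels == k):
                C[k] = X[labels == k].mean(axis=0)
    return labels

def within_class_variance(x, y):
    # Average squared deviation from the class mean, one feature at a time.
    return sum(((x[y == c] - x[y == c].mean()) ** 2).sum()
               for c in np.unique(y)) / len(x)

def alternate_metric(X_unlab, X_lab, y_lab, K, rounds=3):
    # Toy alternation: (i) complete the labels of the unlabelled dataset under
    # the current diagonal metric w; (ii) heuristically refit w from both
    # partitions (a stand-in for the paper's large-margin step).
    d = X_lab.shape[1]
    w = np.ones(d)
    y_lab = np.asarray(y_lab)
    for _ in range(rounds):
        y_unlab = kmeans(X_unlab * np.sqrt(w), K)                      # step (i)
        wc = np.array([within_class_variance(X_lab[:, j], y_lab)
                       + within_class_variance(X_unlab[:, j], y_unlab)
                       for j in range(d)])
        w = 1.0 / (wc + 1e-8)                                          # step (ii)
        w = d * w / w.sum()
    return w, y_unlab
```

On data whose first feature carries the cluster structure while the second is noise, the loop concentrates the diagonal metric on the informative feature and the completed labels align with the true clusters.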