## Section: New Results

### Recent results on sparse representations

Keywords: sparse approximation, high dimension, scalable algorithms, dictionary design, sample complexity

The team has been very active in the field of sparse representations, with contributions ranging from theoretical results to algorithmic design and software. This field is at the core of the ERC project PLEASE (Projections, Learning and Sparsity for Efficient data processing; see Section 8.2.1.1).

#### A new framework for sparse representations: analysis sparse models

Participants : Rémi Gribonval, Nancy Bertin, Srdan Kitic, Cagdas Bilen, Laurent Albera.

In the past decade there has been a great interest in a synthesis-based model for signals, based on
sparse and redundant representations. Such a model assumes that the signal of interest can be represented
as a linear combination of a *few* columns from a given matrix (the dictionary). An alternative
*analysis-based* model can be envisioned, where an analysis operator multiplies the signal,
leading to a *cosparse* outcome. Within the SMALL FET-Open project, we initiated a research programme
dedicated to this analysis model, in the context of a generic missing data problem (e.g., compressed
sensing, inpainting, source separation, etc.). We obtained a uniqueness result for the solution of
this problem, based on properties of the analysis operator and the measurement matrix. We also considered
a number of pursuit algorithms for solving the missing data problem, including an ${\ell}^{1}$-based method and a new
greedy method called GAP (Greedy Analysis Pursuit). Our simulations demonstrated the appeal of the
analysis model, and the success of the pursuit techniques presented.
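The contrast between the two models can be sketched numerically: in the synthesis view a signal is built from a few dictionary columns, while in the analysis view an operator applied to the signal produces many zeros (a *cosparse* outcome). The sketch below uses a finite-difference operator and a piecewise-constant signal, a standard textbook illustration not taken from the cited papers:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 64, 128, 5

# Synthesis model: x = D @ alpha with a k-sparse coefficient vector alpha.
D = rng.standard_normal((n, m)) / np.sqrt(n)   # toy random dictionary
alpha = np.zeros(m)
alpha[rng.choice(m, k, replace=False)] = rng.standard_normal(k)
x_synth = D @ alpha
assert np.count_nonzero(alpha) == k            # few nonzero synthesis coefficients

# Analysis (cosparse) model: Omega @ x has many zeros.
# A first-order finite-difference operator annihilates piecewise-constant signals.
Omega = np.eye(n - 1, n, k=1) - np.eye(n - 1, n)       # (n-1) x n difference operator
x_cosparse = np.repeat([1.0, -2.0, 0.5, 3.0], n // 4)  # 4 constant pieces
z = Omega @ x_cosparse
cosparsity = int(np.sum(np.abs(z) < 1e-12))    # number of zero analysis coefficients
print(cosparsity)  # (n-1) - 3 = 60 zeros: only 3 jumps between the pieces
```

Here the signal is not sparse itself, yet its analysis coefficients are almost all zero, which is the structure the pursuit algorithms above exploit.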

These results have been published in conferences and in a journal paper [100]. Other algorithms based on iterative cosparse projections [83], as well as extensions of GAP to deal with noise and structure in the cosparse representation, have been developed, with applications to toy MRI reconstruction problems and to acoustic source localization and reconstruction from few measurements [101].

Successful applications of the cosparse approach to sound source localization, audio declipping and brain imaging have been developed this year. In particular, we compared the performance of several cosparse recovery algorithms in the context of sound source localization [97] and demonstrated the efficiency of the approach in situations where usual methods fail ([37], see paragraph 6.6.3). The approach was also shown to be applicable to the hard declipping problem [49]. Its application to EEG brain imaging was investigated as well, and a paper was published at MLSP 2014 [28] (see paragraph 6.6.4).

#### Theoretical results on sparse representations

Participants : Rémi Gribonval, Anthony Bourrier, Pierre Machart, Yann Traonmilin, Gilles Puy.

Main collaboration: Mike Davies (University of Edinburgh), Patrick Perez (Technicolor R&I France), Tomer Peleg (The Technion)

**Fundamental performance limits for ideal decoders in high-dimensional linear inverse problems:**
The primary challenge in linear inverse problems is to design stable and robust "decoders" to reconstruct high-dimensional vectors from a low-dimensional observation through a linear operator. Sparsity, low-rank, and related assumptions are typically exploited to design decoders whose performance is then bounded in terms of some measure of deviation from the idealized model, typically using a norm.
We characterized the fundamental performance limits that can be expected from an ideal decoder given a general model, i.e., a general subset of "simple" vectors of interest. First, we extended the notion of instance optimality of a decoder to settings where one only wishes to reconstruct some part of the original high-dimensional vector from a low-dimensional observation. This covers practical settings such as medical imaging of a region of interest, or audio source separation when one is only interested in estimating the contribution of a specific instrument to a musical recording. We defined instance optimality relative to a model well beyond the traditional framework of sparse recovery, and characterized the existence of an instance optimal decoder in terms of joint properties of the model and the considered linear operator [106], [105].

This year, both noiseless and noise-robust settings were considered in the journal paper [16]. We showed, somewhat surprisingly, that the existence of noise-aware instance optimal decoders for all noise levels implies the existence of a noise-blind decoder. A consequence of our results is that for models rich enough to contain an orthonormal basis, the existence of an L2/L2 instance optimal decoder is only possible when the linear operator is not substantially dimension-reducing. This covers well-known cases (sparse vectors, low-rank matrices) as well as a number of seemingly new situations (for instance, structured sparsity and sparse inverse covariance matrices). We exhibited an operator-dependent norm which, under a model-specific generalization of the Restricted Isometry Property (RIP), always yields a feasible instance optimality property, and which implies instance optimality with certain familiar atomic norms such as the ${\ell}^{1}$ norm.
Current work explores the existence of convex decoders for general union-of-subspaces models under generalized RIP assumptions, as well as conditions ensuring that random low-dimensional projections satisfy the RIP even when the projection maps an infinite-dimensional space to a finite-dimensional one. Envisioned applications are in compressive learning (see Section 6.4).
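To give a rough feel for the kind of property involved, the sketch below empirically estimates a RIP-like constant of a random Gaussian operator on k-sparse vectors, checking how well it preserves squared norms. The dimensions and the Monte-Carlo check are arbitrary illustrative choices, not those of the cited papers:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 256, 1024, 10       # measurements, ambient dimension, sparsity

# Random Gaussian operator, scaled so that E||A x||^2 = ||x||^2.
A = rng.standard_normal((n, m)) / np.sqrt(n)

# Monte-Carlo estimate of a RIP-like constant on random k-sparse vectors:
# how far ||A x||^2 / ||x||^2 deviates from 1 over many sparse draws.
ratios = []
for _ in range(2000):
    x = np.zeros(m)
    x[rng.choice(m, k, replace=False)] = rng.standard_normal(k)
    ratios.append(np.linalg.norm(A @ x) ** 2 / np.linalg.norm(x) ** 2)
ratios = np.asarray(ratios)
delta_hat = max(1.0 - ratios.min(), ratios.max() - 1.0)
print(round(delta_hat, 2))    # small when n is large compared to k log(m/k)
```

A small estimated constant means the random projection acts as a near-isometry on the sparse model, which is the regime where instance optimal decoding becomes possible.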

**Connections between sparse approximation and Bayesian estimation:** Penalized least squares regression is often used for signal denoising and inverse problems, and is commonly interpreted in a Bayesian framework as a Maximum A Posteriori (MAP) estimator, the
penalty function being the negative logarithm of the prior. For example, the widely used quadratic
program (with an ${\ell}^{1}$ penalty) associated with the LASSO / Basis Pursuit Denoising is very often
considered as MAP estimation under a Laplacian prior in the context of additive white Gaussian noise
(AWGN) reduction.
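The scalar case makes this interpretation concrete: the minimizer of the ${\ell}^{1}$-penalized least squares objective is given by soft-thresholding, which is exactly the MAP estimate under a Laplacian prior and unit-variance Gaussian noise. A minimal sketch with a brute-force check on a grid (illustrative values):

```python
import numpy as np

def soft_threshold(y, lam):
    """Closed-form minimizer of 0.5 * (y - x)**2 + lam * |x| over x."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

# Brute-force check: the same value minimizes the scalar LASSO objective,
# i.e. maximizes the posterior of a Laplacian prior exp(-lam * |x|) under
# unit-variance Gaussian noise, so it is the MAP estimate.
y, lam = 1.7, 0.5
grid = np.linspace(-5.0, 5.0, 200001)
objective = 0.5 * (y - grid) ** 2 + lam * np.abs(grid)
x_map = grid[np.argmin(objective)]
print(soft_threshold(y, lam), x_map)  # both ≈ 1.2
```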

In 2011 we obtained a result [85] highlighting the fact that, while this is *one* possible Bayesian interpretation, there can be other
equally acceptable Bayesian interpretations. Therefore, solving a penalized least squares regression
problem with penalty $\phi(x)$ need not be interpreted as assuming a prior $C \cdot \exp(-\phi(x))$ and
using the MAP estimator. In particular, we showed that for *any* prior $P_X$, the minimum mean
square error (MMSE) estimator is the solution of a penalized least squares problem with some penalty
$\phi(x)$, which can be interpreted as the MAP estimator with the prior $C \cdot \exp(-\phi(x))$.
Vice versa, for *certain* penalties $\phi(x)$, the solution of the penalized least squares problem
is indeed the MMSE estimator, with a certain prior $P_X$. In
general, however, $dP_X(x) \ne C \cdot \exp(-\phi(x))\,dx$.
In 2013, we extended this result to general inverse problems [88], [86], [87].
This year, we worked on the characterization of such relations beyond the Gaussian noise model, with the objective of understanding whether similar results hold when the quadratic data-fidelity term is replaced with other convex losses.
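The distinction between the two estimators can be checked numerically in the scalar case: for a Laplacian prior under unit-variance Gaussian noise, the posterior mode (MAP) and the posterior mean (MMSE) of the *same* prior differ, consistent with the result above. A sketch using brute-force grid integration (illustrative values, not from the cited papers):

```python
import numpy as np

# Scalar denoising y = x + noise, noise ~ N(0, 1), Laplacian prior on x
# (illustrative choices; not the setting of any specific cited paper).
y = 2.0
x = np.linspace(-10.0, 10.0, 400001)
posterior = np.exp(-0.5 * (y - x) ** 2) * np.exp(-np.abs(x))  # unnormalized

x_map = x[np.argmax(posterior)]                     # posterior mode = MAP estimate
x_mmse = np.sum(x * posterior) / np.sum(posterior)  # posterior mean = MMSE estimate
print(round(x_map, 3), round(x_mmse, 3))
# The MAP estimate soft-thresholds y (here x_map = 1.0), while the MMSE
# estimate differs from it -- consistent with dP_X(x) != C exp(-phi(x)) dx.
```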

#### Algorithmic and theoretical results on dictionary learning

Participants : Rémi Gribonval, Nancy Bertin, Srdan Kitic, Cagdas Bilen, Luc Le Magoarou, Melanie Ducoffe.

Main collaboration (theory for dictionary learning): Rodolphe Jenatton, Francis Bach (Equipe-projet SIERRA, Inria Paris), Martin Kleinsteuber, Matthias Seibert (TU Munich)

Main collaboration (dictionary learning for gesture recognition): Anatole Lecuyer, Ferran Argelaguet (EPI HYBRID, Rennes)

**Theoretical guarantees for dictionary learning:** An important practical problem in sparse modeling is to choose the
adequate dictionary to model a class of signals or images of interest.
While diverse heuristic techniques have been proposed in the literature to learn a dictionary
from a collection of training samples, there are few existing results that provide an adequate
mathematical understanding of the behaviour of these techniques and of their ability to recover an ideal
dictionary from which the training samples may have been generated.

Beyond our pioneering work [89], [109], [6] on this topic, which concentrated on the noiseless case for non-overcomplete dictionaries, this year we obtained new results showing the relevance of an ${\ell}^{1}$-penalized cost function for the locally stable identification of overcomplete incoherent dictionaries, in the presence of noise and outliers [54]. Moreover, we established new sample complexity bounds for dictionary learning and other related matrix factorization schemes (including PCA, NMF, and structured sparsity) [55], [46], [38].
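To fix ideas, the ${\ell}^{1}$-penalized dictionary learning objective can be minimized by a generic alternating scheme: ISTA sparse coding steps interleaved with a least-squares dictionary update. The sketch below is a hypothetical baseline on noiseless synthetic data, not the specific algorithm analyzed in the cited papers:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, k, N = 20, 30, 3, 400

# Ground-truth dictionary with unit-norm atoms and k-sparse training samples.
D_true = rng.standard_normal((n, m))
D_true /= np.linalg.norm(D_true, axis=0)
A_true = np.zeros((m, N))
for j in range(N):
    A_true[rng.choice(m, k, replace=False), j] = rng.standard_normal(k)
X = D_true @ A_true

def ista_step(D, X, A, lam, L):
    """One proximal-gradient step on 0.5*||X - D A||_F^2 + lam*||A||_1."""
    A = A - D.T @ (D @ A - X) / L
    return np.sign(A) * np.maximum(np.abs(A) - lam / L, 0.0)

# Alternate ISTA sparse coding with a least-squares dictionary update.
D = X[:, rng.choice(N, m, replace=False)]          # init from random samples
D /= np.linalg.norm(D, axis=0)
A_hat, lam = np.zeros((m, N)), 0.1
for _ in range(100):
    L = np.linalg.norm(D, 2) ** 2                  # gradient Lipschitz constant
    for _ in range(5):
        A_hat = ista_step(D, X, A_hat, lam, L)
    D = X @ A_hat.T @ np.linalg.pinv(A_hat @ A_hat.T + 1e-8 * np.eye(m))
    D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)

rel_residual = np.linalg.norm(X - D @ A_hat) / np.linalg.norm(X)
print(round(rel_residual, 3))   # the fit improves over the iterations
```

The identifiability question studied in [54] is precisely when such a scheme, started near the true dictionary, is guaranteed to recover it.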

**Learning computationally efficient dictionaries:** Classical dictionary learning is limited to small-scale problems. Inspired by usual fast transforms, we proposed a general dictionary structure that allows cheaper manipulation, together with an algorithm to learn such dictionaries and their fast implementation [50]. A preprint is available [56], a paper will appear at ICASSP 2015, and a journal paper is in preparation.
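The computational benefit of such structured dictionaries can be sketched as follows: writing the dictionary as a product of a few sparse factors, in the spirit of fast transforms such as the FFT, reduces the cost of a matrix-vector product from $n^2$ to the total number of nonzeros in the factors. The structure and sizes below are illustrative assumptions, not those of the cited work:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 256          # square dictionary for simplicity
J, s = 4, 4      # number of factors, nonzeros per row in each factor

# Hypothetical multi-layer structure: D = S1 @ S2 @ ... @ SJ with sparse
# factors, mimicking the butterfly structure of fast transforms.
factors = []
for _ in range(J):
    S = np.zeros((n, n))
    for i in range(n):
        S[i, rng.choice(n, s, replace=False)] = rng.standard_normal(s)
    factors.append(S)

# Applying the structured dictionary costs sum_j nnz(S_j) multiplications,
# versus n*n for an unstructured dense dictionary of the same size.
dense_cost = n * n
structured_cost = sum(int(np.count_nonzero(S)) for S in factors)
print(dense_cost, structured_cost)  # 65536 vs J*n*s = 4096: 16x fewer multiplies

# The two representations agree on any input vector.
D = np.linalg.multi_dot(factors)
x = rng.standard_normal(n)
y_struct = x
for S in reversed(factors):
    y_struct = S @ y_struct
assert np.allclose(D @ x, y_struct)
```

The learning problem is then to fit both the factors and their sparsity patterns to training data, rather than a single dense matrix.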

**Operator learning for cosparse representations:**
Besides standard dictionary learning, we also considered learning in the context of the cosparse model.
The overall problem is to learn a low-dimensional signal model from a collection of training samples.
The mainstream approach is to learn an overcomplete dictionary to provide good approximations of
the training samples using sparse synthesis coefficients. This famous sparse model has a less well known
counterpart, in analysis form, called the cosparse analysis model. In this new model, signals are
characterized by their parsimony in a transformed domain using an overcomplete analysis operator.

In specific situations, when prior information on the operator is available, it is possible to express the operator in parametric form and to learn this parameter. For instance, in the sound source localization problem, we showed that the unknown speed of sound can be learned jointly during cosparse recovery, under mild conditions. This work was presented at the iTwist'14 workshop [48].

**Dictionary learning for gesture modeling:**
In collaboration with the HYBRID project-team (internship of Melanie Ducoffe), we explored the potential of dictionary learning for motion tracking. Motion tracking technology, especially on commodity hardware, requires robust gesture recognition algorithms to fully exploit the benefits of natural user interfaces. We proposed a gesture recognition algorithm based on the sparse representation of motion data, with a learning phase that builds a dictionary of basic gestures. A paper is in preparation.