## Section: New Results

### Recent results on sparse representations

The team has had a substantial activity ranging from theoretical results to algorithmic design and software contributions in the field of sparse representations, which is at the core of the FET-Open European project (FP7) SMALL (Sparse Models, Algorithms and Learning for Large-Scale Data, see Section 7.2.1) and the ANR project ECHANGE (ECHantillonnage Acoustique Nouvelle GEnération, see Section 6.3.1).

#### A new framework for sparse representations: analysis sparse models

Participants : Rémi Gribonval, Sangnam Nam.

Main collaboration: Mike Davies (Univ. Edinburgh), Michael Elad (The Technion), Hadi Zayyani (Sharif University)

In the past decade there has been a great interest in a synthesis-based model for signals, based on
sparse and redundant representations. Such a model assumes that the signal of interest can be composed
as a linear combination of *few* columns from a given matrix (the dictionary). An alternative
*analysis-based* model can be envisioned, where an analysis operator multiplies the signal,
leading to a *cosparse* outcome. Within the SMALL project, we initiated a research programme
dedicated to this analysis model, in the context of a generic missing data problem (compressed
sensing, inpainting, source separation, etc.). We obtained a uniqueness result for the solution of
this problem, based on properties of the analysis operator and the measurement matrix. We also considered
a number of pursuit algorithms for solving the missing data problem, including an L1-based and a new
greedy method called GAP (Greedy Analysis Pursuit). Our simulations demonstrated the appeal of the
analysis model and the success of the pursuit techniques presented. These results have been published
in international conferences [63], [64], [91], [92], and a journal paper submitted to Applied and Computational Harmonic Analysis is under revision [103]. Other algorithms based on iterative cosparse projections [57], as well as extensions of GAP that deal with noise and structure in the cosparse representation, have been developed, with applications to toy MRI reconstruction problems and to acoustic source localization and reconstruction from few measurements (submitted to ICASSP 2012).
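As a toy illustration of this analysis model (the signal, sizes, and finite-difference operator below are ours, chosen purely for illustration), a piecewise-constant signal is dense in the Dirac basis, yet its analysis coefficients are mostly zero:

```python
import numpy as np

# Illustrative sketch of the analysis (cosparse) model, with a
# first-order finite-difference analysis operator chosen for the example.
n = 100
x = np.concatenate([np.full(40, 1.0), np.full(35, -2.0), np.full(25, 0.5)])

# Finite-difference analysis operator, shape (n-1, n): row i computes x[i+1]-x[i].
Omega = np.eye(n - 1, n, k=1) - np.eye(n - 1, n)

z = Omega @ x                                 # analysis coefficients
cosparsity = int(np.sum(np.abs(z) < 1e-12))   # number of zero coefficients
print(cosparsity)                             # 97: only the 2 jumps are nonzero
```

The signal itself has no zero entries, but 97 of its 99 analysis coefficients vanish; the cosparse model exploits exactly this structure.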

#### Theoretical results on sparse representations and dictionary learning

Participants : Rémi Gribonval, Sangnam Nam, Nancy Bertin.

Main collaboration: Karin Schnass (EPFL), Mike Davies (University of Edinburgh), Volkan Cevher (EPFL), Simon Foucart (Université Paris 5, Laboratoire Jacques-Louis Lions), Charles Soussen (Centre de Recherche en Automatique de Nancy, CRAN), Jérôme Idier (Institut de Recherche en Communications et en Cybernétique de Nantes, IRCCyN), Cédric Herzet (Equipe-projet FLUMINANCE, INRIA - CEMAGREF, Rennes), Morten Nielsen (Department of Mathematical Sciences, Aalborg), Gilles Puy, Pierre Vandergheynst, Yves Wiaux (EPFL), Mehrdad Yaghoobi, Rodolphe Jenatton, Francis Bach (Equipe-projet SIERRA, INRIA, Paris), Boaz Ophir, Michael Elad (Technion), Mark D. Plumbley (Queen Mary, University of London)

**Sparse recovery conditions for Orthogonal Least Squares:**
We pursued our investigation of conditions on an overcomplete dictionary which guarantee that
certain ideal sparse decompositions can be recovered by some specific optimization
principles / algorithms. This year, we extended Tropp's analysis of Orthogonal Matching Pursuit
(OMP) using the Exact Recovery Condition (ERC) to a first exact recovery analysis of Orthogonal
Least Squares (OLS). We showed that when ERC is met, OLS is guaranteed to exactly recover the
unknown support. Moreover, we provided a closer look at the analysis of both OMP and OLS when ERC
is not fulfilled. We showed that there exist dictionaries for which some subsets are never recovered
with OMP. This phenomenon, which also appears with ${\ell}_{1}$ minimization, does not occur for OLS.
Finally, numerical experiments based on our theoretical analysis showed that none of the considered
algorithms is uniformly better than the other. This work has been submitted for publication
in a journal [108].
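The two selection rules can be contrasted in a short didactic sketch (this is our toy reimplementation, not the code studied in the paper; `greedy_recover` and all sizes are illustrative): OMP selects the atom most correlated with the current residual, while OLS selects the atom whose inclusion minimizes the new least-squares residual.

```python
import numpy as np

def greedy_recover(D, y, k, rule="omp"):
    """Toy sketch of OMP vs OLS support recovery on a dictionary D."""
    support = []
    r = y.copy()
    for _ in range(k):
        if rule == "omp":
            # OMP: atom most correlated with the residual.
            j = int(np.argmax(np.abs(D.T @ r)))
        else:
            # OLS: try every candidate atom, keep the best new residual.
            best, j = np.inf, -1
            for c in range(D.shape[1]):
                if c in support:
                    continue
                S = support + [c]
                res = y - D[:, S] @ np.linalg.lstsq(D[:, S], y, rcond=None)[0]
                if np.linalg.norm(res) < best:
                    best, j = np.linalg.norm(res), c
        support.append(j)
        # Both variants re-fit all selected coefficients by least squares.
        r = y - D[:, support] @ np.linalg.lstsq(D[:, support], y, rcond=None)[0]
    return sorted(support)

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 40))
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
y = D[:, [3, 17, 31]] @ np.array([1.0, -2.0, 1.5])
print(greedy_recover(D, y, 3, "omp"), greedy_recover(D, y, 3, "ols"))
```

On typical random Gaussian dictionaries both rules tend to succeed; the theoretical analysis above concerns the worst-case dictionaries where their behaviours provably differ.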

**New links between the Restricted Isometry Property and nonlinear approximations:**
It is now well known that sparse or compressible vectors can be stably recovered from their
low-dimensional projection, provided the projection matrix satisfies a Restricted Isometry
Property (RIP). We establish new implications of the RIP with respect to nonlinear approximation
in a Hilbert space with a redundant frame. The main ingredients of our approach are: a) Jackson
and Bernstein inequalities, associated with the characterization of certain approximation spaces
in terms of interpolation spaces; b) a new proof that for overcomplete frames which satisfy a Bernstein
inequality, these interpolation spaces are nothing but the collection of vectors admitting a
representation in the dictionary with compressible coefficients; c) the proof that the RIP implies
Bernstein inequalities. As a result, we obtain that in most overcomplete random Gaussian dictionaries
with fixed aspect ratio, just as in any orthonormal basis, the error of best $m$-term approximation
of a vector decays at a certain rate if, and only if, the vector admits a compressible expansion in
the dictionary. Yet, for mildly overcomplete dictionaries with a one-dimensional kernel, we give
examples where the Bernstein inequality holds, but the same inequality fails for even the smallest
perturbation of the dictionary. This work has been submitted for publication in a journal
[102].
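The equivalence between approximation rates and compressible expansions can be checked numerically in the simplest orthonormal case (a toy computation with our own choice of decay exponent, not taken from the cited work): coefficients decaying like $i^{-s}$ yield a best $m$-term error decaying like $m^{1/2-s}$.

```python
import numpy as np

# In an orthonormal basis the best m-term approximation keeps the m largest
# coefficients, so the error is the l2 norm of the coefficient tail.
s = 1.5
c = np.arange(1, 10001, dtype=float) ** (-s)      # compressible coefficients
tail = np.sqrt(np.cumsum((c ** 2)[::-1])[::-1])   # tail[m] = best m-term error
for m in (10, 100, 1000):
    # tail[m] * m^(s - 1/2) should be roughly constant across m.
    print(m, tail[m] * m ** (s - 0.5))
```

The statement above extends this basis phenomenon to most overcomplete random Gaussian dictionaries, where "compressible expansion in the dictionary" replaces "compressible coefficients in the basis".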

**Performance guarantees for compressed sensing with spread spectrum techniques:**
We advocate a compressed sensing strategy that consists of multiplying the signal of interest
by a wide bandwidth modulation before projection onto randomly selected vectors of an orthonormal
basis. Firstly, in a digital setting with random modulation, considering a whole class of sensing
bases including the Fourier basis, we prove that the technique is universal in the sense that the
required number of measurements for accurate recovery is optimal and independent of the sparsity
basis. This universality stems from a drastic decrease of coherence between the sparsity and the
sensing bases, which for a Fourier sensing basis relates to a spread of the original signal spectrum
by the modulation (hence the name "spread spectrum"). The approach is also efficient as sensing
matrices with fast matrix multiplication algorithms can be used, in particular in the case of
Fourier measurements. Secondly, these results are confirmed by a numerical analysis of the phase
transition of the ${\ell}_{1}$-minimization problem. Finally, we show that the spread spectrum technique remains
effective in an analog setting with chirp modulation for application to realistic Fourier imaging.
We illustrate these findings in the context of radio interferometry and magnetic resonance imaging.
This work has been presented at a conference [93] and accepted for
publication in a journal [105].
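The digital acquisition chain described above can be sketched as follows (variable names, sizes, and the random ±1 modulation are our illustrative choices): the signal is multiplied pointwise by a wideband pseudo-random sequence, then a random subset of its Fourier coefficients is measured.

```python
import numpy as np

# Sketch of the spread spectrum measurement operator: pointwise random
# modulation followed by random partial Fourier sampling.
rng = np.random.default_rng(0)
n, n_meas = 256, 64

x = np.zeros(n)
x[[5, 40, 200]] = [1.0, -0.5, 2.0]                 # sparse in the Dirac basis

m = rng.choice([-1.0, 1.0], size=n)                # wideband random modulation
omega = rng.choice(n, size=n_meas, replace=False)  # random subset of Fourier rows

def sense(v):
    """Measure: select n_meas Fourier coefficients of the modulated signal."""
    return np.fft.fft(m * v, norm="ortho")[omega]

y = sense(x)
print(y.shape)
```

Because the modulation spreads the signal spectrum, the coherence between the sparsity basis and the Fourier sensing basis drops, which is the source of the universality result; the FFT keeps the operator fast to apply.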

**Dictionary learning :** An important practical problem in sparse modeling is to choose the
adequate dictionary to model a class of signals or images of interest.
While diverse heuristic techniques have been proposed in the literature to learn a dictionary
from a collection of training samples, few existing results provide an adequate
mathematical understanding of the behaviour of these techniques and of their ability to recover an ideal
dictionary from which the training samples may have been generated.

In 2008, we initiated a pioneering work on this topic, concentrating in particular on the fundamental theoretical question of the identifiability of the learned dictionary. Within the framework of the Ph.D. of Karin Schnass, we developed an analytic approach which was published at the conference ISCCSP 2008 [13] and allowed us to describe "geometric" conditions which guarantee that a (non-overcomplete) dictionary is "locally identifiable" by ${\ell}^{1}$ minimization.

In a second step, we focused on estimating the number of sparse training samples which is typically sufficient to guarantee identifiability (by ${\ell}^{1}$ minimization), and obtained the following, somewhat surprising, result: whereas previous studies seemed to require a combinatorially large number of training samples to guarantee identifiability, the local identifiability condition is typically satisfied as soon as the number of training samples is roughly proportional to the ambient signal dimension. The outline of this second result was published in conferences [12], [25], and the full results have been published in the journal paper [15].

This year we have worked on extending the results to noisy training samples with outliers. A journal paper is in preparation, and the results will be presented at a workshop at NIPS 2011.

**Analysis Operator Learning for Overcomplete Cosparse Representations:**
Besides standard dictionary learning, we also considered learning in the context of the cosparse model.
We considered the problem of learning a low-dimensional signal model from a collection of training samples.
The mainstream approach would be to learn an overcomplete dictionary to provide good approximations of
the training samples using sparse synthesis coefficients. This famous sparse model has a less well-known
counterpart, in analysis form, called the cosparse analysis model. In this new model, signals are
characterized by their parsimony in a transformed domain using an overcomplete analysis operator.
We proposed two approaches to learn an analysis operator from a training corpus, both published in
the conference EUSIPCO 2011 [79] , [67] .

The first one uses a constrained optimization program based on ${\ell}^{1}$ minimization. We derive a practical learning algorithm, based on projected subgradients, and demonstrate its ability to robustly recover a ground truth analysis operator, provided the training set is of sufficient size. A local optimality condition is derived, providing preliminary theoretical support for the well-posedness of the learning problem under appropriate conditions. Extensions to deal with noisy training samples are currently being investigated, and a journal paper is in preparation.

In the second approach, analysis "atoms" are learned sequentially by identifying directions that are orthogonal to a subset of the training data. We demonstrate the effectiveness of the algorithm in three experiments, treating synthetic data and real images, showing a successful and meaningful recovery of the analysis operator.
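The second approach can be illustrated with a toy computation (our own sketch, with illustrative sizes): when a subset of the training vectors lies in a hyperplane, the sought analysis atom is the right singular vector of that subset associated with the smallest singular value.

```python
import numpy as np

# Toy sketch: recover an analysis "atom" as the direction orthogonal to a
# subset of the training data, read off from an SVD.
rng = np.random.default_rng(0)
d = 8
a_true = rng.standard_normal(d)
a_true /= np.linalg.norm(a_true)              # ground-truth analysis atom

# Training vectors drawn from the hyperplane orthogonal to a_true.
Y = rng.standard_normal((50, d))
Y -= np.outer(Y @ a_true, a_true)             # project onto the hyperplane

_, _, Vt = np.linalg.svd(Y)
a_est = Vt[-1]                                # smallest right singular vector
print(abs(a_est @ a_true))                    # close to 1: recovered up to sign
```

In practice the subsets of training data co-annihilated by each atom are unknown and must themselves be identified, which is what makes the sequential learning procedure non-trivial.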

**Connections between sparse approximation and Bayesian estimation:** Penalized least squares regression is often used for signal denoising and inverse problems, and
is commonly interpreted in a Bayesian framework as a Maximum A Posteriori (MAP) estimator, the
penalty function being the negative logarithm of the prior. For example, the widely used quadratic
program (with an ${\ell}^{1}$ penalty) associated to the LASSO / Basis Pursuit Denoising is very often
considered as MAP estimation under a Laplacian prior in the context of additive white Gaussian noise
(AWGN) reduction.

A first result, which has been published in IEEE Transactions on Signal Processing [35] ,
highlights the fact that, while this is *one* possible Bayesian interpretation, there can be other
equally acceptable Bayesian interpretations. Therefore, solving a penalized least squares regression
problem with penalty $\phi(x)$ need not be interpreted as assuming a prior $C \cdot \exp(-\phi(x))$ and
using the MAP estimator. In particular, we showed that for *any* prior ${P}_{X}$, the minimum mean
square error (MMSE) estimator is the solution of a penalized least squares problem with some penalty
$\phi(x)$, which can be interpreted as the MAP estimator with the prior $C \cdot \exp(-\phi(x))$.
Vice versa, for *certain* penalties $\phi(x)$, the solution of the penalized least squares problem
is indeed the MMSE estimator, with a certain prior ${P}_{X}$. In
general, $dP_X(x) \ne C \cdot \exp(-\phi(x))\,dx$.
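The scalar Gaussian-noise case gives a simple numerical illustration of this distinction (our own discretized computation, not code from the paper): under a Laplacian prior, the MAP estimator is soft thresholding, while the MMSE estimator (the posterior mean) is a different, smooth shrinkage rule.

```python
import numpy as np

# Scalar denoising y = x + noise, noise ~ N(0, sigma^2), x ~ Laplacian(lam).
sigma, lam = 1.0, 1.0
t = np.linspace(-30.0, 30.0, 20001)            # discretization grid for x
prior = 0.5 * lam * np.exp(-lam * np.abs(t))   # Laplacian prior density

def mmse(y):
    """Posterior mean, computed by discretizing the posterior on the grid."""
    w = prior * np.exp(-(y - t) ** 2 / (2 * sigma ** 2))  # unnormalized posterior
    return (t * w).sum() / w.sum()

def map_est(y):
    """MAP estimate: soft thresholding with threshold lam * sigma^2."""
    return np.sign(y) * max(abs(y) - lam * sigma ** 2, 0.0)

for y in (0.5, 2.0):
    print(y, map_est(y), mmse(y))
```

The MAP output is exactly zero below the threshold whereas the posterior mean never is, which is one concrete sense in which the "Laplacian prior + MAP" reading of the ${\ell}^{1}$ penalty is only one of several Bayesian interpretations.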

A second result, obtained in collaboration with Prof. Mike Davies and Prof. Volkan Cevher (a paper is under revision), characterizes the "compressibility" of various probability distributions, with applications to underdetermined linear regression (ULR) problems and sparse modeling. We identified simple characteristics of probability distributions whose independent and identically distributed (iid) realizations are (resp. are not) compressible, i.e., can be well approximated by sparse vectors. We proved that many priors whose MAP Bayesian interpretation is sparsity-inducing (such as the Laplacian distribution or generalized Gaussian distributions with exponent $p \le 1$) are in a way inconsistent and do not generate compressible realizations. To show this, we identified non-trivial undersampling regions in ULR settings where, with high probability, the simple least squares solution outperforms oracle sparse estimation in data error when the data is generated from a sparsity-inducing prior such as the Laplacian distribution.

#### Wavelets on graphs

Participant : Rémi Gribonval.

Main collaboration: Pierre Vandergheynst, David Hammond (EPFL)

Within the framework of the SMALL project 7.2.1, we investigated the possibility of developing sparse representations of functions defined on graphs, by defining an extension of the traditional wavelet transform which is valid for data defined on a graph.

There are many problems where data is collected through a graph structure: scattered or non-uniform sampling, sensor networks, data on sampled manifolds, or even social networks and databases. Motivated by the wealth of potential applications of sparse representations to these problems, the partners set out a program to generalize wavelets to graphs. More precisely, we have introduced a new notion of wavelet transform for data defined on the vertices of an undirected graph. Our construction uses the spectral theory of the graph Laplacian as a generalization of the classical Fourier transform. The basic ingredient of wavelets, multi-resolution, is defined in the spectral domain via operator-valued functions that can be naturally dilated. These in turn define wavelets by acting on impulses localized at any vertex. We have analyzed the localization of these wavelets in the vertex domain and shown that our multi-resolution produces functions that are indeed concentrated at will around a specified vertex. Our theory allowed us to construct both an equivalent of the continuous wavelet transform and discrete wavelet frames.

Computing the spectral decomposition can however be numerically expensive for large graphs. We have shown that, by approximating the spectrum of the wavelet generating operator with polynomial expansions, applying the forward wavelet transform and its transpose can be approximated through iterated applications of the graph Laplacian. Since in many cases the graph Laplacian is sparse, this results in a very fast algorithm. Our implementation also uses recurrence relations for computing polynomial expansions, which results in even faster algorithms. Finally, we have proved how numerical errors are precisely controlled by the properties of the desired spectral graph wavelets. Our algorithms have been implemented in a Matlab toolbox that has been released in parallel to the main theoretical article [16] . We also plan to include this toolbox in the SMALL project numerical platform.
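The fast-transform idea can be sketched on a small path graph (our illustrative kernel and graph; we use a plain least-squares polynomial fit where the actual implementation uses Chebyshev expansions and recurrences): applying $g(L)x$ is replaced by $p(L)x$ for a polynomial $p$ fitted to the spectral kernel $g$, so only repeated multiplications by the (typically sparse) Laplacian are needed.

```python
import numpy as np

# Combinatorial Laplacian of a path graph on n vertices.
n = 30
L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L[0, 0] = L[-1, -1] = 1.0

rng = np.random.default_rng(0)
x = rng.standard_normal(n)
g = lambda lam: np.exp(-lam)                 # illustrative spectral kernel

# Exact filtering g(L) x: requires the full eigendecomposition.
lam, U = np.linalg.eigh(L)
exact = U @ (g(lam) * (U.T @ x))

# Polynomial approximation: fit g on the spectral interval, apply by Horner.
grid = np.linspace(0.0, lam.max(), 200)
coeffs = np.polyfit(grid, g(grid), 8)        # degree-8 fit of the kernel
approx = np.zeros(n)
for c in coeffs:                             # Horner's rule evaluates p(L) x
    approx = L @ approx + c * x

print(np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```

The polynomial version never diagonalizes $L$; each Horner step is one Laplacian multiplication, which is cheap when the graph is sparse, and the approximation error is governed by how well $p$ matches $g$ on the spectrum.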

We now foresee many applications. On one hand we will use non-local graph wavelets constructed from the set of patches in an image (or even an audio signal) to perform de-noising or in general restoration. An interesting aspect in this case, would be to understand how wavelets estimated from corrupted signals deviate from clean wavelets. In a totally different direction, we will also explore the applications of spectral graph wavelets constructed from brain connectivity graphs obtained from whole brain tractography. Our preliminary results show that graph wavelets yield a representation that is very well adapted to how the information flows in the brain along neuronal structures.

#### Algorithmic breakthrough in sparse approximation: LoCOMP

Participants : Rémi Gribonval, Frédéric Bimbot, Ronan Le Boulch.

Main collaborations: Pierre Vandergheynst (EPFL), Boris Mailhé (former team member, now with Queen Mary University, London)

Our team had already made a substantial breakthrough in 2005 when first releasing the Matching Pursuit ToolKit (MPTK, see Section 5.3), which allowed for the first time the application of the Matching Pursuit algorithm to large-scale data such as hours of CD-quality audio signals. In 2008, we designed a variant of Matching Pursuit called LoCOMP (for LOw Complexity Orthogonal Matching Pursuit, also readable as Local Orthogonal Matching Pursuit), tailored to shift-invariant dictionaries. LoCOMP has been shown to achieve an approximation quality very close to that of a full Orthogonal Matching Pursuit while retaining a much lower computational complexity, of the order of that of plain Matching Pursuit. The complexity reduction is substantial, from one day of computation to 15 minutes for a typical audio signal [20], [19]. The main effort this year has been to integrate this algorithm into MPTK to ensure its dissemination and exploitation, and a journal paper has been published [22].
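For reference, the core Matching Pursuit loop that MPTK accelerates can be sketched as follows (a didactic global version, not the MPTK or LoCOMP code; LoCOMP's novelty, the local least-squares update restricted to atoms overlapping the newly selected one, is not shown):

```python
import numpy as np

def matching_pursuit(D, y, n_iter):
    """Plain Matching Pursuit: D has unit-norm columns (atoms), y is the signal."""
    r = y.astype(float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_iter):
        corr = D.T @ r                        # correlations with the residual
        j = int(np.argmax(np.abs(corr)))      # best atom
        coeffs[j] += corr[j]                  # greedy coefficient update
        r -= corr[j] * D[:, j]                # subtract the atom's contribution
    return coeffs, r

# With an orthonormal dictionary, MP recovers the signal exactly.
D = np.eye(5)
y = np.array([0.0, 3.0, 0.0, -1.0, 0.0])
coeffs, r = matching_pursuit(D, y, 2)
print(coeffs, np.linalg.norm(r))
```

For shift-invariant dictionaries over long signals, each iteration of this loop touches the whole signal; the locality exploited by LoCOMP is what brings the cost down to near that of plain Matching Pursuit while approaching Orthogonal Matching Pursuit quality.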