Section: Scientific Foundations
Sparse representations
wavelet, dictionary, adaptive decomposition, optimisation, parsimony, non-linear approximation, pursuit, greedy algorithm, computational complexity, Gabor atom, data-driven learning, principal component analysis, independent component analysis
Over the past decade, there has been intense and interdisciplinary research activity in the investigation of sparsity and methods for sparse representations, involving researchers in signal processing, applied mathematics and theoretical computer science. This has led to the establishment of sparse representations as a key methodology for addressing engineering problems in all areas of signal and image processing, from data acquisition to processing, storage, transmission and interpretation, well beyond the original applications in enhancement and compression. Among the existing sparse approximation algorithms, L1-optimisation principles (Basis Pursuit, LASSO) and greedy algorithms (e.g., Matching Pursuit and its variants) have in particular been extensively studied and proven to have good decomposition performance, provided that the sparse signal model is satisfied with sufficient accuracy.
The large family of audio signals includes a wide variety of temporal and spectral structures: objects of variable duration, ranging from almost stationary regimes (for instance, the note of a violin) to short transients (like a percussion hit). The spectral structure can be mainly harmonic (vowels) or noise-like (fricative consonants). More generally, the diversity of timbres results in a large variety of fine structures for the signal and its spectrum, as well as for its temporal and spectral envelopes. In addition, a majority of audio signals are composite, i.e. they result from the mixture of several sources (voice and music, mixing of several tracks, useful signal and background noise). Audio signals may also have undergone various kinds of degradation: distortion, adverse recording conditions, media deterioration, coding and transmission errors, etc.
Sparse representations provide a framework which has proven increasingly fruitful for capturing, analysing, decomposing and separating audio signals.
Redundant systems and adaptive representations
Traditional methods for signal decomposition are generally based on the description of the signal in a given basis (i.e. a linearly independent, spanning and fixed representation system for the whole signal space). In such a basis, the representation of the signal is unique (for example, a Fourier basis, Dirac basis, orthogonal wavelets, ...). By contrast, an adaptive representation in a redundant system consists of finding an optimal decomposition of the signal (in the sense of a criterion to be defined) in a generating system (or dictionary) comprising a number of elements (much) higher than the dimension of the signal.
Let D = {g_k, k = 1, ..., K} be a dictionary of K unit-norm elements (or atoms) of the signal space R^N, with K > N. If the dictionary spans the signal space, any signal x admits at least one exact decomposition x = sum_k alpha_k g_k and, since the system is redundant, this decomposition is not unique. We will denote as alpha = (alpha_1, ..., alpha_K) the vector of decomposition coefficients. The principles of the adaptive decomposition then consist in selecting, among all possible decompositions, the best one, i.e. the one which satisfies a given criterion (for example a sparsity criterion) for the signal under consideration, hence the concept of adaptive decomposition (or representation). In some cases, a maximum of m atoms is allowed in the decomposition, with m much smaller than N, and the decomposition is then only approximate.
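As a minimal illustration of this non-uniqueness, the following Python sketch uses a hypothetical 3-atom dictionary in a 2-dimensional signal space (toy data chosen for the example, not taken from the text): the same signal admits several exact decompositions, among which an adaptive scheme may select the sparsest.

```python
import numpy as np

# Hypothetical dictionary: K = 3 unit-norm atoms in R^2 (K > N, redundant).
D = np.array([[1.0, 0.0, np.sqrt(0.5)],
              [0.0, 1.0, np.sqrt(0.5)]])

x = np.array([1.0, 1.0])  # signal to decompose

# Two different exact decompositions of the same signal:
alpha_dense = np.array([1.0, 1.0, 0.0])            # uses two atoms
alpha_sparse = np.array([0.0, 0.0, np.sqrt(2.0)])  # uses a single atom

assert np.allclose(D @ alpha_dense, x)
assert np.allclose(D @ alpha_sparse, x)

# An adaptive decomposition selects the "best" one under a chosen
# criterion, here the smallest number of non-zero coefficients:
best = min([alpha_dense, alpha_sparse], key=np.count_nonzero)
print(best)  # the single-atom decomposition
```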
Sparsity criteria
Obtaining a single solution for the equation above requires the introduction of a constraint on the coefficients alpha_k, typically through the minimization of a cost function f(alpha). Among the most commonly used functions, let us quote the various l_p norms ||alpha||_p = (sum_k |alpha_k|^p)^(1/p). Let us recall that for p tending to 0, ||alpha||_p^p tends to the number of non-zero coefficients, which directly measures sparsity but leads to a combinatorial optimization problem. The minimization of the quadratic norm (p = 2) admits a simple closed-form solution via the pseudoinverse, but tends to spread the energy of the signal over many coefficients. An intermediate approach consists in minimizing the norm ||alpha||_1, which favours sparse solutions while remaining a convex, computationally tractable problem. Other criteria can be taken into account and, as long as the function f is convex, the optimization problem remains tractable. Finally, let us note that the theory of non-linear approximation offers a framework in which links can be established between the sparsity of exact decompositions and the quality of approximate representations with a limited number of terms.
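The contrast between the quadratic and the l1 criteria can be made concrete with a toy numerical sketch (the 3-atom dictionary below is hypothetical, chosen only for the example): the minimum quadratic-norm solution given by the pseudoinverse is dense, while a 1-sparse exact decomposition exists and has a smaller l1 norm.

```python
import numpy as np

# Hypothetical redundant dictionary (N = 2, K = 3) and signal.
D = np.array([[1.0, 0.0, np.sqrt(0.5)],
              [0.0, 1.0, np.sqrt(0.5)]])
x = np.array([1.0, 1.0])

# Minimizing the quadratic (l2) norm has a closed form: the
# pseudoinverse solution alpha = D^+ x. It spreads energy over all atoms.
alpha_l2 = np.linalg.pinv(D) @ x
assert np.allclose(D @ alpha_l2, x)                      # exact reconstruction
assert np.count_nonzero(np.abs(alpha_l2) > 1e-10) == 3   # but dense

# Yet a 1-sparse exact decomposition exists...
alpha_sparse = np.array([0.0, 0.0, np.sqrt(2.0)])
assert np.allclose(D @ alpha_sparse, x)
# ...and the l1 criterion indeed prefers it:
assert np.sum(np.abs(alpha_sparse)) < np.sum(np.abs(alpha_l2))
print(alpha_l2, alpha_sparse)
```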
Decomposition algorithms
Three families of approaches are conventionally used to obtain an (optimal or sub-optimal) decomposition of a signal in a redundant system.
The “Best Basis” approach consists in constructing the dictionary as a union of orthonormal bases, typically organized in a tree structure (wavelet packets, local cosines), and in selecting, within this structured dictionary, the basis best adapted to the signal according to the chosen criterion. The search is computationally efficient, but the decomposition is constrained to lie in one of the predefined bases.
The “Basis Pursuit” approach minimizes the norm ||alpha||_1 of the coefficients under the reconstruction constraint. This convex problem can be solved exactly, for instance by linear programming, at a higher computational cost.
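As an illustration of this formulation, here is a sketch of Basis Pursuit via the standard linear-programming reformulation alpha = u - v with u, v >= 0 (the dictionary and signal are hypothetical toy data, and SciPy's linprog stands in for a generic LP solver):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(D, x):
    """Minimize ||alpha||_1 subject to D @ alpha = x, by rewriting
    alpha = u - v with u, v >= 0 and solving the resulting LP."""
    N, K = D.shape
    c = np.ones(2 * K)            # objective: sum(u) + sum(v) = ||alpha||_1
    A_eq = np.hstack([D, -D])     # equality constraint: D @ (u - v) = x
    res = linprog(c, A_eq=A_eq, b_eq=x, bounds=(0, None))
    uv = res.x
    return uv[:K] - uv[K:]

# Toy redundant dictionary (N = 2, K = 3); the third atom alone explains x.
D = np.array([[1.0, 0.0, np.sqrt(0.5)],
              [0.0, 1.0, np.sqrt(0.5)]])
x = np.array([1.0, 1.0])
alpha = basis_pursuit(D, x)
print(np.round(alpha, 6))  # l1 minimization concentrates on the third atom
```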
The “Matching Pursuit” approach consists in optimizing the decomposition of the signal incrementally, by searching at each step for the element of the dictionary which has the best correlation with the signal to be decomposed, and then subtracting from the signal the contribution of this element. This procedure is repeated on the residual thus obtained, until the number of (linearly independent) components is equal to the dimension of the signal. The coefficients of the decomposition are the successive correlations selected at each iteration.
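The procedure can be sketched in a few lines of Python (the dictionary and signal below are hypothetical toy data, and a fixed iteration budget with a tolerance replaces the full stopping rule described above):

```python
import numpy as np

def matching_pursuit(x, D, n_iter=10, tol=1e-10):
    """Plain Matching Pursuit: at each step, pick the (unit-norm) atom
    best correlated with the current residual and subtract its
    contribution. Returns the coefficient vector alpha."""
    residual = x.astype(float).copy()
    alpha = np.zeros(D.shape[1])
    for _ in range(n_iter):
        corr = D.T @ residual            # correlations with all atoms
        k = np.argmax(np.abs(corr))      # best-matching atom
        if abs(corr[k]) < tol:           # residual (almost) fully explained
            break
        alpha[k] += corr[k]              # accumulate its coefficient
        residual -= corr[k] * D[:, k]    # subtract its contribution
    return alpha

# Toy example: dictionary of 3 unit-norm atoms in R^2.
D = np.array([[1.0, 0.0, np.sqrt(0.5)],
              [0.0, 1.0, np.sqrt(0.5)]])
x = np.array([2.0, 2.0])   # collinear with the third atom
alpha = matching_pursuit(x, D)
print(alpha)               # the mass concentrates on the third atom
```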
Intermediate approaches can also be considered, using hybrid algorithms which seek a compromise between computational complexity, sparsity of the solution and simplicity of implementation.
Dictionary construction
The choice of the dictionary strongly conditions the efficiency of the resulting sparse decompositions.
The choice of the dictionary can rely on a priori considerations. For instance, some redundant systems may require less computation than others to evaluate projections of the signal on the elements of the dictionary. For this reason, Gabor atoms, wavelet packets and local cosines have interesting properties. Moreover, some general hint on the signal structure can contribute to the design of the dictionary elements: any knowledge on the distribution and the frequency variation of the energy of the signals, or on the position and the typical duration of the sound objects, can help guide the choice of the dictionary (harmonic molecules, chirplets, atoms with predetermined positions, ...).
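As an illustration, a redundant Gabor dictionary of the kind mentioned above might be built as follows (the grid of time positions, frequencies and scales is an arbitrary choice made for this sketch):

```python
import numpy as np

def gabor_atom(n, center, freq, scale):
    """A discrete Gabor atom: a Gaussian window modulated by a cosine,
    normalized to unit energy."""
    t = np.arange(n)
    g = np.exp(-0.5 * ((t - center) / scale) ** 2) * np.cos(2 * np.pi * freq * t)
    return g / np.linalg.norm(g)

# Hypothetical dictionary: atoms on a grid of time positions,
# frequencies and scales, yielding far more atoms than dimensions.
N = 64
atoms = [gabor_atom(N, c, f, s)
         for c in range(0, N, 8)                 # 8 time positions
         for f in np.linspace(0.05, 0.45, 6)     # 6 normalized frequencies
         for s in (4.0, 8.0)]                    # 2 scales
D = np.stack(atoms, axis=1)
print(D.shape)  # (64, 96): K = 96 atoms > N = 64 dimensions
```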
Conversely, in other contexts, it can be desirable to build the dictionary with data-driven approaches, i.e. from training examples of signals belonging to the same class (for example, the same speaker or the same musical instrument). In this respect, Principal Component Analysis (PCA) offers interesting properties, but other approaches can be considered (in particular the direct optimization of the sparsity of the decomposition, or of properties of the approximation error with a limited number of elements).
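A minimal sketch of the PCA route on synthetic training data (the low-rank class model and all problem sizes are assumptions made for the example; practical systems would rather learn overcomplete dictionaries):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n_train = 32, 500

# Synthetic training class: signals concentrated in a 3-dim subspace,
# plus a small amount of noise (a modelling assumption for the sketch).
basis = np.linalg.qr(rng.standard_normal((N, 3)))[0]
X = basis @ rng.standard_normal((3, n_train)) \
    + 0.01 * rng.standard_normal((N, n_train))

# PCA: leading eigenvectors of the empirical covariance form an
# orthonormal dictionary adapted to this class of signals.
cov = X @ X.T / n_train
eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalue order
D = eigvecs[:, ::-1][:, :3]              # top-3 principal directions

# The learned atoms capture almost all of the training energy:
energy_ratio = np.sum((D.T @ X) ** 2) / np.sum(X ** 2)
print(round(energy_ratio, 3))
```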
In some cases, the training of the dictionary can require stochastic optimization, but EM-like approaches can also be of interest when the redundant representation approach can be formulated within a probabilistic framework.
Extensions of the adaptive representation techniques can also be envisaged by generalizing the approach to probabilistic dictionaries, i.e. dictionaries comprising vectors which are random variables rather than deterministic signals. Within this framework, the signal is modelled as the realization of a random combination of dictionary elements, and the decomposition becomes a statistical estimation problem.
Compressive sensing
The theoretical results around sparse representations have laid the foundations for a new research field called compressed sensing, which emerged primarily in the USA. Compressed sensing investigates ways to sample signals at roughly their (lower) information rate rather than at the standard Shannon-Nyquist rate.
In a nutshell, the principle of compressed sensing is, at the acquisition step, to use as samples a number of random linear projections of the signal. Provided that the underlying phenomenon under study is sufficiently sparse, it is possible to recover it with good precision from only a few of these random samples. In a way, compressed sensing can be seen as a generalized sampling theory, in which one trades bandwidth (i.e. the number of samples) for computational power. There are a number of cases where the latter is much more accessible than the former; this trade may therefore result in a significant overall gain in terms of cost, reliability and/or precision.
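A toy sketch of this principle (all problem sizes are arbitrary choices, and Orthogonal Matching Pursuit stands in for a generic sparse decoder such as Basis Pursuit): a 5-sparse signal in dimension 256 is recovered from 100 random linear projections.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit: greedily recover a k-sparse vector
    from measurements y = A @ x, re-fitting the coefficients on the
    selected support at each step by least squares."""
    residual, support = y.copy(), []
    for _ in range(k):
        j = np.argmax(np.abs(A.T @ residual))   # most correlated column
        support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef     # project out the support
    x_hat = np.zeros(A.shape[1])
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(1)
n, m, k = 256, 100, 5            # ambient dim, measurements, sparsity
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)

A = rng.standard_normal((m, n)) / np.sqrt(m)  # random linear projections
y = A @ x                                     # m << n "compressed" samples

x_hat = omp(A, y, k)
print(np.max(np.abs(x_hat - x)))  # numerically zero when recovery succeeds
```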