## Section: Research Program

### Data Dimensionality Reduction

Manifolds, locally linear embedding, non-negative matrix factorization, principal component analysis

A fundamental problem in many data processing tasks (compression, classification, indexing) is to find a suitable representation of the data. Such a representation often aims at reducing the dimensionality of the input data so that tractable processing methods can then be applied. Well-known methods for data dimensionality reduction include principal component analysis (PCA) and independent component analysis (ICA). The methodologies central to several of the proposed research problems will instead be based on sparse representations, on locally linear embedding (LLE) and on the non-negative matrix factorization (NMF) framework.

The objective of *sparse representations* is to find a sparse approximation of a given input signal. In theory, given a full-rank matrix $A\in {\mathbb{R}}^{m\times n}$ with $m\ll n$ and a vector $\mathbf{b}\in {\mathbb{R}}^{m}$, one seeks the solution of $\min\{\|\mathbf{x}\|_{0}\,:\,A\mathbf{x}=\mathbf{b}\},$ where $\|\mathbf{x}\|_{0}$ denotes the $L_{0}$ norm of $\mathbf{x}$, i.e. the number of non-zero components in $\mathbf{x}$.
Since the system is underdetermined, there exist many solutions $\mathbf{x}$ to $A\mathbf{x}=\mathbf{b}$. The problem is to find the sparsest one, i.e. the one for which $\mathbf{x}$ has the fewest non-zero components. In practice, one actually seeks an approximate, and thus possibly even sparser, solution of $\min\{\|\mathbf{x}\|_{0}\,:\,\|A\mathbf{x}-\mathbf{b}\|_{p}\le \rho \},$ for some $\rho \ge 0$ characterizing an admissible reconstruction error.
The norm index $p$ is usually 2, but could be 1 or $\infty $ as well. Except for the exhaustive combinatorial approach, there is no known method to find the exact solution under general conditions on the dictionary $A$. Searching for the sparsest representation is hence infeasible in practice: both problems are computationally intractable (NP-hard in general). Pursuit algorithms have been introduced as heuristic methods which aim at finding approximate solutions to the above problems with tractable complexity.
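
To make the idea of a pursuit algorithm concrete, the following is a minimal NumPy sketch of one well-known greedy variant, orthogonal matching pursuit, applied to the approximate problem above with $p=2$. It is an illustration under stated assumptions rather than the method used in this program: the function name `omp`, its parameters `rho` and `max_atoms`, and the random test dictionary are all hypothetical choices made for the example.

```python
import numpy as np

def omp(A, b, rho=1e-8, max_atoms=None):
    """Greedy sketch of orthogonal matching pursuit (hypothetical helper).

    Heuristically looks for a sparse x such that ||A x - b||_2 <= rho.
    """
    m, n = A.shape
    if max_atoms is None:
        max_atoms = m  # at most m columns are needed when A has full rank
    b = np.asarray(b, dtype=float)
    col_norms = np.linalg.norm(A, axis=0)
    support = []                # indices of the selected columns (atoms)
    coeffs = np.zeros(0)
    residual = b.copy()
    while np.linalg.norm(residual) > rho and len(support) < max_atoms:
        # Select the column most correlated with the current residual.
        correlations = np.abs(A.T @ residual) / col_norms
        if support:
            correlations[support] = 0.0  # never pick the same atom twice
        support.append(int(np.argmax(correlations)))
        # Re-fit all selected coefficients by least squares.
        coeffs, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)
        residual = b - A[:, support] @ coeffs
    x = np.zeros(n)
    if support:
        x[support] = coeffs
    return x

# Toy example (assumed data): a 3-sparse vector observed through a random dictionary.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
x_true = np.zeros(50)
x_true[[3, 17, 41]] = [1.5, -2.0, 0.7]
b = A @ x_true
x_hat = omp(A, b)
print("non-zeros:", np.count_nonzero(x_hat))
print("residual :", np.linalg.norm(A @ x_hat - b))
```

The greedy selection gives no guarantee of returning the sparsest solution; it only trades optimality for a cost of at most `max_atoms` least-squares fits.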

*Non-negative matrix factorization* (NMF) is a non-negative approximate data representation (D.D. Lee and H.S. Seung, “Learning the parts of objects by non-negative matrix factorization”, Nature 401, 6755, (Oct. 1999), pp. 788-791). NMF aims at finding an approximate factorization $V\approx WH$ of a non-negative input data matrix $V$ into non-negative matrices $W$ and $H$, where the columns of $W$ can be seen as *basis vectors* and those of $H$ as the coefficients of the linear approximation of the input data. Unlike in other linear representations such as PCA and ICA, the non-negativity constraint makes the representation purely additive: basis vectors are combined without subtraction.
Classical data representation methods like PCA or vector quantization (VQ) can be expressed in the same matrix factorization framework $V\approx WH$, the differences arising from the constraints placed on the $W$ and $H$ matrices. In VQ, each column of $H$ is constrained to be a unary vector, with a single non-zero coefficient equal to 1. In PCA, the columns of $W$ are constrained to be orthonormal and the rows of $H$ to be orthogonal to each other. These methods of data-dependent dimensionality reduction will be at the core of our visual data analysis and compression activities.
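
As an illustration of the factorization $V\approx WH$, the sketch below uses the multiplicative update rules commonly attributed to Lee and Seung for the squared Euclidean error. It is a minimal example under assumptions, not a reference implementation: the function name `nmf`, the parameters `rank`, `n_iter`, `eps` and `seed`, and the synthetic matrix `V` are hypothetical.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Sketch of NMF with multiplicative updates (hypothetical helper).

    Returns non-negative W (m x rank) and H (rank x n) with V ~= W @ H.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Strictly positive random initialization keeps the updates well defined.
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Element-wise updates; eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy example (assumed data): a non-negative matrix of rank at most 3.
rng = np.random.default_rng(1)
V = rng.random((6, 3)) @ rng.random((3, 8))

W0, H0 = nmf(V, rank=3, n_iter=0)   # random initialization only
W, H = nmf(V, rank=3)               # same initialization, 200 updates
err_init = np.linalg.norm(V - W0 @ H0)
err_final = np.linalg.norm(V - W @ H)
print("error before updates:", err_init)
print("error after updates :", err_final)
```

Because every update multiplies by a non-negative ratio, $W$ and $H$ stay non-negative throughout, which is what makes the resulting representation purely additive.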