## Section: New Results

### Dimension-free principal component analysis

Participants: Olivier Catoni, Ilaria Giulini.

In a work in progress, Ilaria Giulini, as part of her PhD studies, proved the following dimension-free inequality, related to Principal Component Analysis in high dimension. Given an i.i.d. sample ${X}_{i}$, $1\le i\le n$, of vector-valued random variables ${X}_{i}\in {\mathbf{R}}^{d}$, there exists an estimator $\widehat{N}$ of the quadratic form $N\left(\theta \right)=\mathbf{E}\left(\langle \theta ,X\rangle^{2}\right)$ such that, for any $n\le {10}^{20}$, with probability at least $1-2\epsilon$, for any $\theta \in {\mathbf{R}}^{d}$,

where $G=\mathbf{E}\left(X{X}^{\top}\right)$ is the Gram matrix and $\kappa =\sup \left\{\frac{\mathbf{E}\left(\langle \theta ,X\rangle^{4}\right)}{\mathbf{E}\left(\langle \theta ,X\rangle^{2}\right)^{2}},\;\theta \in {\mathbf{R}}^{d}\setminus \mathrm{Ker}\left(G\right)\right\}$ is a kurtosis coefficient. This result shows that the expected energy in direction $\theta $ can be estimated at a rate independent of the dimension of the ambient space ${\mathbf{R}}^{d}$. It is obtained using PAC-Bayes inequalities with Gaussian perturbations of the parameter. The same bound holds in a Hilbert space of infinite dimension, opening the possibility of a rigorous mathematical study of kernel principal component analysis, where the data are represented in a possibly infinite-dimensional reproducing kernel Hilbert space.
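The quantities controlled by the bound can be illustrated numerically. The sketch below is an assumption-laden illustration, not the estimator $\widehat{N}$ of the paper (which relies on PAC-Bayesian Gaussian perturbations and is not reproduced here): it computes the naive plug-in estimate of $N(\theta)=\mathbf{E}\left(\langle\theta,X\rangle^{2}\right)$ via the empirical Gram matrix, together with the empirical version of the kurtosis ratio appearing in $\kappa$. All variable names and the synthetic Gaussian data are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic i.i.d. sample of n vectors in R^d (illustrative data,
# not from the paper).
n, d = 500, 50
X = rng.standard_normal((n, d))

# Empirical Gram matrix, approximating G = E(X X^T).
G_hat = X.T @ X / n

def plugin_N(theta):
    """Naive plug-in estimate of N(theta) = E(<theta, X>^2)."""
    proj = X @ theta
    return np.mean(proj ** 2)

def kurtosis_ratio(theta):
    """Empirical version of E(<theta,X>^4) / E(<theta,X>^2)^2,
    whose supremum over directions outside Ker(G) defines kappa."""
    proj = X @ theta
    return np.mean(proj ** 4) / np.mean(proj ** 2) ** 2

theta = rng.standard_normal(d)

# The plug-in estimate is exactly the quadratic form of the
# empirical Gram matrix: theta^T G_hat theta.
assert np.isclose(plugin_N(theta), theta @ G_hat @ theta)

print(plugin_N(theta), kurtosis_ratio(theta))
```

For Gaussian data the kurtosis ratio in a fixed direction concentrates near 3; the dimension-free bound of the text replaces this naive plug-in estimator by a robust one whose deviations are controlled uniformly in $\theta$.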