CLASSIC - 2013 - Annual activity report


Project-Team Classic


Section: New Results

Dimension-free principal component analysis

Participants: Olivier Catoni, Ilaria Giulini.

In a work in progress, Ilaria Giulini, as part of her PhD studies, proved the following dimension-free inequality, related to principal component analysis (PCA) in high dimension. Given an i.i.d. sample $X_{i}$, $1 \leq i \leq n$, of vector-valued random variables $X_{i} \in \mathbf{R}^{d}$, there exists an estimator $\hat{N}$ of the quadratic form $N(\theta) = \mathbf{E}(\langle \theta, X \rangle^{2})$ such that, for any $n \leq 10^{20}$, with probability at least $1 - 2\epsilon$, for any $\theta \in \mathbf{R}^{d}$,

$$\mathbf{1}(4\mu < 1)\left|\frac{\hat{N}(\theta)}{N(\theta)} - 1\right| \leq \frac{\mu}{1 - 4\mu},$$

where

$$\mu = \sqrt{\frac{2.07(\kappa - 1)}{n}\left[\log(\epsilon^{-1}) + 4.3 + \frac{1.6\,\|\theta\|^{2}\,\mathbf{Tr}(G)}{N(\theta)}\right]} + \sqrt{\frac{184\,\kappa\,\|\theta\|^{2}\,\mathbf{Tr}(G)}{n\,N(\theta)}},$$

where $G = \mathbf{E}(X X^{\top})$ is the Gram matrix and $\kappa = \sup\left\{\frac{\mathbf{E}(\langle \theta, X \rangle^{4})}{\mathbf{E}(\langle \theta, X \rangle^{2})^{2}} : \theta \in \mathbf{R}^{d} \setminus \mathbf{Ker}(G)\right\}$ is a kurtosis coefficient. This result shows that the expected energy $N(\theta)$ in direction $\theta$ can be estimated at a rate that does not depend on the dimension of the ambient space $\mathbf{R}^{d}$: the bound involves the distribution of $X$ only through $\kappa$ and the ratio $\|\theta\|^{2}\,\mathbf{Tr}(G)/N(\theta)$, and $d$ never appears explicitly. It is obtained using PAC-Bayes inequalities with Gaussian parameter perturbations. The same bound holds in an infinite-dimensional Hilbert space, which opens the way to a rigorous mathematical study of kernel principal component analysis of random data, where the data are represented in a possibly infinite-dimensional reproducing kernel Hilbert space.
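To see what "dimension-free" means in practice, the bound can simply be evaluated numerically. The Python sketch below only evaluates the displayed formulas for $\mu$ and $\mu/(1 - 4\mu)$; it does not implement the estimator $\hat{N}$, whose construction is not described here. The helper names (`mu_bound`, `relative_error_bound`, `trace_ratio`) are hypothetical, and the example assumes a diagonal Gram matrix with eigenvalues $1/k^{2}$, the direction $\theta = e_{1}$, $n = 10^{6}$, $\epsilon = 0.01$ and $\kappa = 3$. The last value holds for a centered Gaussian $X$, since $\langle \theta, X \rangle$ is then centered normal and its fourth moment is three times its squared variance.

```python
import math


def mu_bound(n, kappa, eps, trace_ratio):
    """Evaluate the quantity mu of the displayed bound.

    trace_ratio stands for ||theta||^2 Tr(G) / N(theta). Together with kappa,
    it is the only way the distribution of X enters the formula: the ambient
    dimension d never appears.
    """
    if n > 10**20:
        # The result is stated only for n <= 10^20.
        raise ValueError("the bound is stated for n <= 10^20")
    first = math.sqrt(
        2.07 * (kappa - 1) / n
        * (math.log(1 / eps) + 4.3 + 1.6 * trace_ratio)
    )
    second = math.sqrt(184 * kappa * trace_ratio / n)
    return first + second


def relative_error_bound(n, kappa, eps, trace_ratio):
    """Return mu / (1 - 4 mu), the bound on |N_hat / N - 1|, or None if 4 mu >= 1.

    When 4 mu >= 1 the indicator in the inequality vanishes and nothing is claimed.
    """
    mu = mu_bound(n, kappa, eps, trace_ratio)
    if 4 * mu >= 1:
        return None
    return mu / (1 - 4 * mu)


def trace_ratio(theta, gram_diag):
    """Compute ||theta||^2 Tr(G) / N(theta) for a diagonal Gram matrix.

    Hypothetical helper: G = diag(gram_diag), so N(theta) = sum_k g_k theta_k^2.
    """
    norm2 = sum(t * t for t in theta)
    n_theta = sum(g * t * t for g, t in zip(gram_diag, theta))
    return norm2 * sum(gram_diag) / n_theta


if __name__ == "__main__":
    # Illustrative values only; kappa = 3 corresponds to a centered Gaussian X.
    n, kappa, eps = 10**6, 3.0, 0.01
    for d in (10, 1_000, 100_000):
        # Eigenvalues 1/k^2 are summable, so Tr(G) < pi^2 / 6 for every d.
        gram_diag = [1.0 / k**2 for k in range(1, d + 1)]
        theta = [1.0] + [0.0] * (d - 1)  # first coordinate direction e_1
        r = trace_ratio(theta, gram_diag)
        print(d, round(r, 4), relative_error_bound(n, kappa, eps, r))
```

Because the eigenvalues $1/k^{2}$ are summable, $\|\theta\|^{2}\,\mathbf{Tr}(G)/N(\theta)$ stays below $\pi^{2}/6$ for every $d$, so the printed bound barely changes as $d$ grows from 10 to 100 000. A Gram matrix whose trace grows with $d$ would instead make the bound degrade.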
