Section: New Results

Dimension free principal component analysis

Participants : Olivier Catoni, Ilaria Giulini.

In a work in progress, Ilaria Giulini, as part of her PhD studies, proved the following dimension-free inequality, related to Principal Component Analysis in high dimension. Given an i.i.d. sample $X_i$, $1 \le i \le n$, of vector-valued random variables $X_i \in \mathbf{R}^d$, there exists an estimator $\hat{N}$ of the quadratic form $N(\theta) = \mathbf{E}\bigl(\langle \theta, X \rangle^2\bigr)$ such that for any $n \le 10^{20}$, with probability at least $1 - 2\epsilon$, for any $\theta \in \mathbf{R}^d$,

$$\mathbb{1}\bigl(4\mu < 1\bigr) \, \left| \frac{\hat{N}(\theta)}{N(\theta)} - 1 \right| \le \frac{\mu}{1 - 4\mu},$$

where

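$$\mu = \sqrt{\frac{2.07 \, (\kappa - 1)}{n} \left[ \log(\epsilon^{-1}) + 4.3 + 1.6 \, \frac{\|\theta\|^2 \, \mathbf{Tr}(G)}{N(\theta)} \right]} + \sqrt{\frac{184 \, \kappa \, \|\theta\|^2 \, \mathbf{Tr}(G)}{n \, N(\theta)}},$$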

where $G = \mathbf{E}\bigl(X X^\top\bigr)$ is the Gram matrix and where $\kappa = \sup \left\{ \dfrac{\mathbf{E}\bigl(\langle \theta, X \rangle^4\bigr)}{\mathbf{E}\bigl(\langle \theta, X \rangle^2\bigr)^2} \, : \, \theta \in \mathbf{R}^d \setminus \mathbf{Ker}(G) \right\}$ is a kurtosis coefficient. This result proves that the expected energy in direction $\theta$ can be estimated at a rate that is independent of the dimension of the ambient space $\mathbf{R}^d$. It is obtained using PAC-Bayes inequalities with Gaussian parameter perturbations. The same bound holds in a Hilbert space of infinite dimension, opening the possibility of a rigorous mathematical study of kernel principal component analysis of random data, where the data are represented in a possibly infinite-dimensional reproducing kernel Hilbert space.
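
To make the quantities in the statement concrete, the following Python sketch computes the energy $N(\theta) = \theta^\top G \theta$ in a direction $\theta$, its plain empirical counterpart, and an empirical kurtosis ratio along $\theta$. This is only an illustration under stated assumptions: the Gaussian sample, the function names, and the use of the naive empirical Gram matrix are all choices made here for the example. In particular, the empirical plug-in is not the estimator $\hat{N}$ of the result above, whose construction is not described in this section.

```python
import numpy as np


def empirical_gram(X):
    """Empirical Gram matrix (1/n) * sum_i X_i X_i^T for a sample X of shape (n, d)."""
    n = X.shape[0]
    return X.T @ X / n


def quadratic_form(G, theta):
    """Energy in direction theta: theta^T G theta."""
    return float(theta @ G @ theta)


def directional_kurtosis(X, theta):
    """Empirical ratio E<theta, X>^4 / (E<theta, X>^2)^2 along one direction."""
    proj = X @ theta
    return float(np.mean(proj**4) / np.mean(proj**2) ** 2)


# Hypothetical setup: centered Gaussian sample with decaying coordinate scales,
# so that Tr(G) stays bounded as the dimension d grows.
rng = np.random.default_rng(0)
n, d = 2000, 50
scales = 1.0 / np.arange(1, d + 1)
X = rng.standard_normal((n, d)) * scales

G_true = np.diag(scales**2)          # exact Gram matrix of this synthetic model
theta = np.zeros(d)
theta[0] = 1.0                       # direction of largest energy

N_true = quadratic_form(G_true, theta)
N_emp = quadratic_form(empirical_gram(X), theta)
relative_error = abs(N_emp / N_true - 1.0)   # the quantity controlled in the bound
kappa_theta = directional_kurtosis(X, theta)

print(f"N(theta) = {N_true:.3f}, empirical = {N_emp:.3f}")
print(f"relative error = {relative_error:.4f}, kurtosis along theta = {kappa_theta:.2f}")
```

For a Gaussian sample the kurtosis ratio along any direction is close to 3, and the relative error depends on the sample size and on $\|\theta\|^2 \, \mathbf{Tr}(G) / N(\theta)$ rather than on $d$ itself, which is the behaviour the inequality quantifies.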