Section: New Results

V-fold cross-validation and V-fold penalization in least-squares density estimation

Participant: Sylvain Arlot [correspondent].

In [22], we study V-fold cross-validation for model selection in least-squares density estimation. The goal is to provide theoretical grounds for choosing V so as to minimize the least-squares risk of the selected estimator. We first prove a non-asymptotic oracle inequality for V-fold cross-validation and its bias-corrected version (V-fold penalization), with an upper bound that decreases as a function of V. In particular, this result implies that V-fold penalization is asymptotically optimal. Then, we compute the variance of V-fold cross-validation and related criteria, as well as the variance of key quantities for model selection performance. We show that these variances depend on V like 1+1/(V-1) (at least in some particular cases), which suggests that performance improves substantially from V=2 to V=5 or 10 and is then almost constant. Overall, this explains the common advice to take V=10 (at least in our setting and when computational power is limited), as confirmed by some simulation experiments.
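To make the setting concrete, here is a minimal Python sketch of V-fold cross-validation with the least-squares contrast, used to choose the number of bins of a regular histogram on [0, 1]. It is an illustration under stated assumptions, not the procedure or code of [22]: the function names, the restriction to regular histograms, the Beta-distributed sample, and the candidate bin counts are all hypothetical choices. The helper variance_factor only evaluates the V-dependent factor 1+1/(V-1) quoted above, not a full variance.

```python
import random


def vfold_cv_histogram(data, n_bins, V, seed=0):
    """V-fold CV estimate of the least-squares criterion ||s_hat||^2 - 2 P_val(s_hat)
    for a regular histogram with n_bins bins on [0, 1] (illustrative sketch)."""
    if V < 2 or V > len(data):
        raise ValueError("V must satisfy 2 <= V <= len(data)")
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    folds = [idx[j::V] for j in range(V)]  # V blocks of nearly equal size

    def bin_of(x):
        # Clamp so that x == 1.0 falls in the last bin (data assumed in [0, 1]).
        return min(int(x * n_bins), n_bins - 1)

    total = 0.0
    for j in range(V):
        val = folds[j]
        train = [i for k in range(V) if k != j for i in folds[k]]
        p = [0.0] * n_bins  # training frequencies per bin
        q = [0.0] * n_bins  # validation frequencies per bin
        for i in train:
            p[bin_of(data[i])] += 1.0 / len(train)
        for i in val:
            q[bin_of(data[i])] += 1.0 / len(val)
        # With bin width 1/n_bins, the histogram takes the value n_bins * p_k on bin k, so
        # ||s_hat||^2 = n_bins * sum(p_k^2) and P_val(s_hat) = n_bins * sum(p_k * q_k).
        total += n_bins * sum(pk * pk - 2.0 * pk * qk for pk, qk in zip(p, q))
    return total / V


def variance_factor(V):
    """V-dependent factor 1 + 1/(V-1) mentioned in the text (equals V/(V-1))."""
    return 1.0 + 1.0 / (V - 1)


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.betavariate(2, 5) for _ in range(500)]  # hypothetical data on [0, 1]
    candidates = [1, 2, 4, 8, 16, 32, 64]                 # hypothetical model collection
    best = min(candidates, key=lambda d: vfold_cv_histogram(sample, d, V=5))
    print("selected number of bins:", best)
    for V in (2, 5, 10, 20):
        print(V, variance_factor(V))
```

The factor equals 2 for V=2, 1.25 for V=5, and about 1.11 for V=10, with limit 1 as V grows. Most of the possible reduction is therefore already obtained at V=5 or 10, which is consistent with the advice discussed above.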

Collaboration with Matthieu Lerasle (CNRS, University Nice Sophia Antipolis).