Section: New Results
Choice of V for V-Fold Cross-Validation in Least-Squares Density Estimation
Participant: Sylvain Arlot [correspondent].
Collaboration with Matthieu Lerasle.
The paper studies V-fold cross-validation for model selection in least-squares density estimation. The goal is to provide theoretical grounds for choosing V in order to minimize the least-squares loss of the selected estimator. We first prove a non-asymptotic oracle inequality for V-fold cross-validation and its bias-corrected version (V-fold penalization). In particular, this result implies that V-fold penalization is asymptotically optimal in the nonparametric case. Then, we compute the variance of V-fold cross-validation and related criteria, as well as the variance of key quantities for model selection performance. We show that these variances depend on V like 1 + 4/(V - 1), at least in some particular cases, suggesting that the performance increases much from V = 2 to V = 5 or 10, and then is almost constant. Overall, this can explain the common advice to take V = 5 or V = 10 (at least in our setting and when the computational power is limited), as supported by some simulation experiments. An oracle inequality and exact formulas for the variance are also proved for Monte-Carlo cross-validation, also known as repeated cross-validation, where the parameter V is replaced by the number B of random splits of the data.
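To make the selection criterion concrete, the following is a minimal sketch of V-fold cross-validation for least-squares density estimation, here used to choose the number of bins of a histogram estimator on [0, 1]. The sampling distribution, the candidate bin counts, and all function names are illustrative assumptions, not the paper's actual experimental setup; the criterion estimated is the least-squares risk of the held-out estimator, that is, the integral of the squared estimate minus twice its empirical mean on the test fold.

```python
import numpy as np

rng = np.random.default_rng(0)

def hist_density(train, n_bins):
    """Least-squares histogram density estimator on [0, 1]."""
    counts, edges = np.histogram(train, bins=n_bins, range=(0.0, 1.0))
    bin_width = 1.0 / n_bins
    heights = counts / (len(train) * bin_width)  # piecewise-constant density
    return heights, edges

def vfold_criterion(x, n_bins, V):
    """V-fold CV estimate of the least-squares risk (up to the constant
    integral of the true density squared): ∫ f_hat^2 - 2 ∫ f_hat dP."""
    n = len(x)
    folds = np.array_split(rng.permutation(n), V)
    crit = 0.0
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        heights, edges = hist_density(x[train_mask], n_bins)
        bin_width = 1.0 / n_bins
        quad = np.sum(heights ** 2) * bin_width   # ∫ f_hat^2
        # locate each test point's bin and average the estimated density there
        which = np.clip(np.searchsorted(edges, x[test_idx], side="right") - 1,
                        0, n_bins - 1)
        lin = heights[which].mean()               # empirical ∫ f_hat dP
        crit += quad - 2.0 * lin
    return crit / V

# Select the bin count minimizing the V-fold criterion (illustrative data).
x = rng.beta(2, 5, size=500)
V = 5
scores = {D: vfold_criterion(x, D, V) for D in (2, 4, 8, 16, 32, 64)}
best_D = min(scores, key=scores.get)
print("selected number of bins:", best_D)
```

The same skeleton covers the Monte-Carlo (repeated) cross-validation variant discussed above: instead of a partition into V folds, one would draw B independent random train/test splits and average the criterion over them.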