Section:
New Results
Choice of V for V-Fold Cross-Validation in Least-Squares Density Estimation
Participant: Sylvain Arlot [correspondent].
Collaboration with Matthieu Lerasle.
The paper [30] studies V-fold cross-validation for model selection in
least-squares density estimation. The goal is to provide theoretical
grounds for choosing V in order to minimize the least-squares loss of the
selected estimator. We first prove a non-asymptotic oracle inequality for
V-fold cross-validation and its bias-corrected version (V-fold
penalization). In particular, this result implies that V-fold
penalization is asymptotically optimal in the nonparametric case.
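As an illustration, here is a minimal sketch of the V-fold cross-validation criterion for least-squares density estimation, assuming regular histogram estimators on [0, 1]; the helper names (hist_density, ls_risk, vfold_criterion) are hypothetical, not from the paper. For a density estimator f, the least-squares risk estimated on held-out points is the integral of f squared minus twice the empirical mean of f, which the code evaluates in closed form for histograms. This is a plain-Python restatement of the criterion, not the paper's experimental code.

    import numpy as np

    def hist_density(train, m):
        # Least-squares histogram estimator on [0, 1] with m equal bins:
        # piecewise-constant height = (relative frequency) / (bin width 1/m).
        counts, _ = np.histogram(train, bins=m, range=(0.0, 1.0))
        return counts / len(train) * m

    def ls_risk(heights, val, m):
        # Empirical least-squares risk  int f^2 - 2 * mean(f(X))  of the
        # histogram with the given bin heights, on validation points `val`.
        idx = np.clip((val * m).astype(int), 0, m - 1)  # bin of each point
        return np.sum(heights ** 2) / m - 2.0 * np.mean(heights[idx])

    def vfold_criterion(x, m, V):
        # Average hold-out risk over the V folds (same split for every model).
        folds = np.array_split(np.random.default_rng(0).permutation(len(x)), V)
        crit = 0.0
        for fold in folds:
            mask = np.ones(len(x), dtype=bool)
            mask[fold] = False
            crit += ls_risk(hist_density(x[mask], m), x[fold], m)
        return crit / V

    rng = np.random.default_rng(1)
    x = rng.beta(2.0, 5.0, size=500)  # sample from an "unknown" density on [0, 1]
    best_m = min(range(1, 40), key=lambda m: vfold_criterion(x, m, V=5))
    print("number of bins selected by 5-fold CV:", best_m)

The bias-corrected variant (V-fold penalization) would instead add a data-driven penalty to the empirical risk; it is omitted here for brevity.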
Then, we compute the variance of V-fold cross-validation and related
criteria, as well as the variance of key quantities for model selection
performance. We show that these variances depend on V like
1 + 4/(3(V-1)), at least in some particular cases, suggesting that the
performance improves considerably from V = 2 to V = 5 or 10, and is then
almost constant.
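To make this dependence concrete, the factor 1 + 4/(3(V-1)) can be tabulated for common choices of V (a plain numerical check, not code from the paper):

    # Variance factor 1 + 4 / (3 (V - 1)) as a function of V.
    for V in (2, 5, 10, 20, 100):
        print(f"V = {V:3d}: 1 + 4/(3(V-1)) = {1 + 4 / (3 * (V - 1)):.3f}")
    # V = 2 -> 2.333, V = 5 -> 1.333, V = 10 -> 1.148, V = 100 -> 1.013:
    # most of the decrease happens between V = 2 and V = 5 or 10.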
Overall, this can explain the common advice to take V = 5, at least in our
setting and when computational power is limited, as supported by some
simulation experiments. An oracle inequality and exact formulas for the
variance are also proved for Monte-Carlo cross-validation, also known as
repeated cross-validation, where the parameter V is replaced by the number
B of random splits of the data.
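A Monte-Carlo cross-validation criterion can be sketched along the same lines, reusing the hypothetical hist_density and ls_risk helpers from the sketch above; n_val, the size of each validation set, is a free parameter of the method and is an assumption here, not a value from the paper.

    def mccv_criterion(x, m, B, n_val, rng):
        # Monte-Carlo (repeated) cross-validation: average the hold-out risk
        # over B independent uniformly random train/validation splits.
        crit = 0.0
        for _ in range(B):
            perm = rng.permutation(len(x))
            val, train = x[perm[:n_val]], x[perm[n_val:]]
            crit += ls_risk(hist_density(train, m), val, m)
        return crit / B

    best_m = min(range(1, 40),
                 key=lambda m: mccv_criterion(x, m, B=20, n_val=100,
                                              rng=np.random.default_rng(2)))
    print("number of bins selected by Monte-Carlo CV:", best_m)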