SIERRA - 2015 - Annual activity report

Section: New Results

Choice of $V$ for $V$-Fold Cross-Validation in Least-Squares

Participant: Sylvain Arlot [correspondent].

Collaboration with Matthieu Lerasle.

The paper [30] studies $V$-fold cross-validation for model selection in least-squares density estimation. The goal is to provide theoretical grounds for choosing $V$ in order to minimize the least-squares loss of the selected estimator. We first prove a non-asymptotic oracle inequality for $V$-fold cross-validation and its bias-corrected version ($V$-fold penalization). In particular, this result implies that $V$-fold penalization is asymptotically optimal in the nonparametric case. Then, we compute the variance of $V$-fold cross-validation and related criteria, as well as the variance of key quantities for model selection performance. We show that these variances depend on $V$ like $1 + 4/(V - 1)$, at least in some particular cases, suggesting that the performance improves substantially from $V = 2$ to $V = 5$ or $10$, and then is almost constant. Overall, this can explain the common advice to take $V = 5$ (at least in our setting and when the computational power is limited), as supported by some simulation experiments. An oracle inequality and exact formulas for the variance are also proved for Monte-Carlo cross-validation, also known as repeated cross-validation, where the parameter $V$ is replaced by the number $B$ of random splits of the data.
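To make the dependence on $V$ concrete, the short sketch below evaluates the factor $1 + 4/(V - 1)$ for a few values of $V$. The function name `variance_factor` and the chosen values of $V$ are illustrative assumptions, not taken from the paper; the sketch shows only the shape of the quoted factor, not the full variance formulas of [30].

```python
def variance_factor(V: int) -> float:
    """Return 1 + 4 / (V - 1), the V-dependent factor quoted in the summary.

    Hypothetical helper for illustration only: it reproduces the stated
    dependence on V, not the complete variance expressions of the paper.
    """
    if V < 2:
        raise ValueError("V-fold cross-validation needs V >= 2")
    return 1.0 + 4.0 / (V - 1)


# Illustrative values of V (an assumption, chosen to span small to large V).
for V in (2, 5, 10, 20, 100):
    print(f"V = {V:3d}: factor = {variance_factor(V):.3f}")
```

The factor drops from 5 at $V = 2$ to 2 at $V = 5$ and about 1.44 at $V = 10$, and it can never fall below its limit of 1, which is consistent with the observation that most of the gain is obtained by $V = 5$ or $10$.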
