

Section: Research Program

Local regression techniques

Participants: S. Ferrigno, A. Muller-Gueudin.

In the context where a response variable Y is to be related to a set of regressors X, one of the general goals of statistics is to provide the end user with a model that is useful for predicting Y for various values of X. Except in the simplest situations, determining a good model involves many steps. For example, to predict the value of Y as a function of the covariate X, statisticians have developed models such as the regression model with random regressors:

Y = g(X, θ) + σ(X) ϵ.

Many assumptions must be made to arrive at such a model. Some require careful thought, for example those related to the functional form of g(·,θ). Others are made more casually, as is often the case for those related to the functional form of σ(·) or to the distribution of the random error term ϵ. Finally, some assumptions are made purely for convenience. Hence the need for methods that can assess whether a model is consistent with the data it is supposed to fit. These methods fall under the banner of goodness-of-fit tests. Most existing tests are directional, in the sense that they can detect departures from only one or a few aspects of a null model. For example, many tests have been proposed in the literature to assess the validity of a postulated structural part g(·,θ). Some authors have also proposed tests on the variance term σ(·) (cf. [51]). Procedures for testing the normality of the errors ϵ_i are also available, but much less work has been done on the other assumptions. The need for a global test that can evaluate the validity of the model structure as a whole therefore emerges quite naturally.

With these preliminaries in mind, let us observe that one quantity which embodies all the information about the joint behavior of (X,Y) is the conditional cumulative distribution function, defined by

F(y | x) = P(Y ≤ y | X = x).

The nonparametric estimation of this function is thus of primary importance. To this end, note that modern estimators are usually based on the local polynomial approach, which has been recognized as superior to classical estimators of Nadaraya-Watson type and as competitive with recent approaches based on splines and other methods. In recent works [41], [42], we address the following questions:

  • Construction of a global test by means of a Cramér-von Mises statistic.

  • Optimal bandwidth of the kernel used for approximation purposes.

We also obtain sharp estimates on the conditional distribution function in [43].
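To make the two families of estimators mentioned above concrete, the following Python sketch compares a Nadaraya-Watson (local constant) estimate and a local linear estimate of F(y | x) on simulated data. It is an illustration only and is not the procedure of [41], [42] or [43]: the Gaussian kernel, the fixed bandwidth h = 0.1, the functions g(x) = 2x and σ(x) = 0.5 + 0.5x, the sample size, and all function names are assumptions chosen for the example.

```python
import math
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel (an assumption)."""
    return np.exp(-0.5 * u**2) / math.sqrt(2.0 * math.pi)


def nw_conditional_cdf(x0, y0, X, Y, h):
    """Nadaraya-Watson estimate of F(y0 | x0): kernel-weighted mean of 1{Y <= y0}."""
    w = gaussian_kernel((X - x0) / h)
    return float(np.sum(w * (Y <= y0)) / np.sum(w))


def local_linear_conditional_cdf(x0, y0, X, Y, h):
    """Local linear estimate of F(y0 | x0).

    Fits 1{Y <= y0} on (1, X - x0) by kernel-weighted least squares and
    returns the intercept, clipped to [0, 1] because a local linear fit
    is not guaranteed to stay inside that interval.
    """
    d = X - x0
    w = gaussian_kernel(d / h)
    z = (Y <= y0).astype(float)
    s0, s1, s2 = w.sum(), (w * d).sum(), (w * d**2).sum()
    t0, t1 = (w * z).sum(), (w * d * z).sum()
    # Closed-form intercept of the 2x2 weighted least-squares system.
    est = (s2 * t0 - s1 * t1) / (s0 * s2 - s1**2)
    return float(np.clip(est, 0.0, 1.0))


def true_conditional_cdf(x0, y0):
    """Exact F(y0 | x0) for the illustrative model with standard normal errors."""
    z = (y0 - 2.0 * x0) / (0.5 + 0.5 * x0)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Simulate Y = g(X) + sigma(X) * eps with hypothetical g and sigma.
rng = np.random.default_rng(0)
n = 5000
X = rng.uniform(0.0, 1.0, size=n)
Y = 2.0 * X + (0.5 + 0.5 * X) * rng.standard_normal(n)

x0, y0, h = 0.5, 1.0, 0.1
print("true value      :", true_conditional_cdf(x0, y0))
print("Nadaraya-Watson :", nw_conditional_cdf(x0, y0, X, Y, h))
print("local linear    :", local_linear_conditional_cdf(x0, y0, X, Y, h))
```

At an interior point with a uniform design, as here, the two estimates are typically close. The advantage usually attributed to the local linear fit is its better behavior near the boundary of the support of X and under non-uniform designs. The bandwidth is fixed by hand in this sketch, whereas its data-driven choice is precisely one of the questions listed above.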