Section:
Scientific Foundations
Nearest neighbor estimates
This additional topic was not present in the initial list of objectives,
and has emerged only recently.
In pattern recognition and statistical learning, also known as machine
learning, nearest neighbor (NN) algorithms are among the simplest and yet
most powerful algorithms available.
Basically, given a training set of data, i.e. an n–sample of i.i.d. object–feature pairs, with real–valued features,
the question is how to generalize,
that is how to guess the feature associated with any new object.
To achieve this, one chooses some integer k_n smaller than n, and
takes the mean value of the features associated with the k_n objects
that are nearest to the new object, for some given metric.
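As a minimal sketch of this averaging rule, assuming for concreteness that objects are vectors in R^d compared with the Euclidean metric (the names knn_regress, X_train, y_train below are illustrative only, not taken from the works cited here):

    import numpy as np

    def knn_regress(X_train, y_train, x_new, k):
        # Illustrative sketch of k-nearest neighbor regression.
        # X_train: (n, d) array of objects, y_train: (n,) array of real-valued features,
        # x_new: (d,) new object, k: number of neighbors, with k < n.
        # Euclidean distances from the new object to every training object.
        dists = np.linalg.norm(X_train - x_new, axis=1)
        # Indices of the k nearest training objects.
        nearest = np.argsort(dists)[:k]
        # The estimate is the mean of the corresponding features.
        return y_train[nearest].mean()

Any other metric on the object space can be substituted for the Euclidean distance without changing the structure of the estimator.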
In general, there is no way to guess exactly the value of the feature
associated with the new object, and the minimal error that can be achieved
is that of the Bayes estimator. This estimator cannot be computed, for lack
of knowledge of the distribution of the object–feature pair, but it serves
as a benchmark to characterize the strength of the method.
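In the usual notation, with X the object, Y the real–valued feature and squared error as the loss, the Bayes estimator mentioned above is the regression function and its error is the Bayes risk:

    r(x) = E[Y | X = x],    L* = E[(Y - r(X))^2],

and no estimator based on the sample can achieve a mean squared error smaller than L*.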
So the best that can be expected is that the NN estimator converges to the
Bayes estimator as the sample size n grows. This is what has been proved in
great generality by Stone [69] for the mean square convergence, provided
that the object is a finite–dimensional random variable, the feature is a
square–integrable random variable, k_n goes to infinity and the ratio
k_n/n goes to 0.
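In this notation, and writing r_n for the k_n–nearest neighbor estimator built on the n–sample, Stone's result can be summarized as

    E[(r_n(X) - r(X))^2] → 0   as n → ∞,

provided k_n → ∞ and k_n/n → 0, whatever the distribution of the pair (X, Y) with X taking values in a finite–dimensional space and E[Y^2] < ∞.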
The nearest neighbor estimator is not the only local averaging estimator
with this property, but it is arguably the simplest.
The asymptotic behavior when the sample size grows is well understood in
finite dimension, but the situation is radically different in
general infinite dimensional spaces, when the objects to be classified
are functions, images, etc.
Nearest neighbor classification in infinite dimension
In finite dimension, the k_n–nearest neighbor classifier is universally
consistent, i.e. its probability of error converges to the Bayes risk as n
goes to infinity, whatever the joint probability distribution of the
object–feature pair, provided that k_n goes to infinity and the ratio
k_n/n goes to zero.
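Concretely, writing L_n for the probability of error of the k_n–nearest neighbor classifier and L* for the Bayes risk, universal consistency in finite dimension reads

    E[L_n] → L*   as n → ∞,   whenever k_n → ∞ and k_n/n → 0,

whatever the joint distribution of the object–feature pair.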
Unfortunately, this result is no longer valid in general metric spaces,
and the objective is to find reasonable sufficient conditions for weak
consistency to hold. Even in finite dimension, there exist exotic distances
such that the nearest neighbor does not even get closer (in the sense of
the distance) to the point of interest, so the first condition is that the
state space be complete for the metric.
Some regularity of the regression function is required next. Clearly,
continuity would be too strong a requirement, since it is not needed in
finite dimension, so a weaker form of regularity is assumed. The following
consistency result has been obtained: if the metric space is separable and
if some Besicovitch condition holds, then the nearest neighbor classifier
is weakly consistent.
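A common formulation of such a Besicovitch condition, given here to indicate the kind of regularity involved rather than as the exact hypothesis used in the work summarized here, is the following: with μ the distribution of the object, η the regression function and B(x, δ) the closed ball of radius δ centered at x,

    (1 / μ(B(x, δ))) ∫_{B(x, δ)} |η(y) - η(x)| μ(dy) → 0   as δ → 0,

for μ–almost every x, or in a suitable probabilistic sense.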
Note that the Besicovitch condition is always fulfilled in finite
dimensional vector spaces (this result is known as the Besicovitch
theorem), and that a counterexample [3] can be given in an infinite
dimensional space with a Gaussian measure (in this case, the nearest
neighbor classifier is clearly not consistent). Finally, a simple example
has been found which satisfies the Besicovitch condition while having a
noncontinuous regression function.
Rates of convergence of the functional k–nearest neighbor estimator
Motivated by a broad range of potential applications, such as regression
on curves, rates of convergence of the k–nearest neighbor estimator
of the regression function, based on n independent copies of the
object–feature pair, have been investigated
when the object is in a suitable ball in some functional space.
Using compact embedding theory, explicit and general finite sample bounds
can be obtained for the expected squared difference between the k–nearest
neighbor estimator and the Bayes regression function, in a very general
setting. The results have also been
particularized to classical function spaces such as Sobolev spaces,
Besov spaces and reproducing kernel Hilbert spaces.
The rates obtained are genuine nonparametric convergence rates,
and to the best of our knowledge the first of their kind for k–nearest
neighbor regression.
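To give the flavor of such bounds, here is a standard bias–variance decomposition for k–nearest neighbor regression, a generic sketch rather than the exact statement of [1] or [2]: if the regression function r is L–Lipschitz for the metric d and the conditional variance of the feature is bounded by σ^2, then

    E[(r_n(X) - r(X))^2] ≤ σ^2/k + L^2 E[d(X, X_(k)(X))^2],

where X_(k)(X) denotes the k–th nearest neighbor of X among the n sample objects. Compact embedding arguments then control the second term through covering numbers of the functional ball, which is what yields explicit nonparametric rates in the function spaces mentioned above.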
This emerging topic has produced several theoretical
advances [1], [2]
in collaboration with Gérard Biau (université Pierre et Marie Curie,
ENS Paris and EPI CLASSIC, Inria Paris–Rocquencourt),
and a possible target application domain has been identified
in the statistical analysis of recommendation systems, which would
be a source of interesting problems.