If variable kernel widths are often good for kernel regression, why are they generally not good for kernel density estimation?

This question is prompted by discussion elsewhere.

Variable kernels are often used in local regression. For example, loess is widely used and works well as a regression smoother, and is based on a kernel of variable width that adapts to data sparsity.
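To make "variable width" concrete, here is a minimal sketch (Python/NumPy assumed; it is a k-nearest-neighbour kernel smoother in the spirit of loess, not loess itself): the bandwidth at each evaluation point is the distance to its k-th nearest data point, so the window widens automatically where the data are sparse.

    # Sketch of a variable-width (nearest-neighbour) kernel smoother.
    # Not loess: just a Gaussian-weighted local mean whose bandwidth at
    # each grid point is the distance to the k-th nearest data point.
    import numpy as np

    def knn_kernel_smoother(x, y, grid, k=20):
        x, y, grid = map(np.asarray, (x, y, grid))
        est = np.empty(len(grid))
        for j, g in enumerate(grid):
            d = np.abs(x - g)
            h = max(np.sort(d)[k - 1], 1e-12)   # adaptive bandwidth
            w = np.exp(-0.5 * (d / h) ** 2)     # Gaussian weights
            est[j] = np.sum(w * y) / np.sum(w)
        return est

    # usage: x dense near 0, sparse in the tail
    rng = np.random.default_rng(0)
    x = np.sort(rng.exponential(1.0, 200))
    y = np.sin(x) + rng.normal(0, 0.2, x.size)
    grid = np.linspace(x.min(), x.max(), 100)
    yhat = knn_kernel_smoother(x, y, grid, k=20)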

On the other hand, variable kernels are usually thought to lead to poor estimators in kernel density estimation (see Terrell and Scott, 1992).

Is there an intuitive reason why they would work well for regression but not for density estimation?

Answer

There seem to be two different questions here, which I’ll try to split:

1) How is KS, kernel smoothing, different from KDE, kernel density estimation?
Well, say I have an estimator / smoother / interpolator

est( xi, fi -> gridj, estj )

and also happen to know the “real” density f() at the xi. Then running
est( xi, f(xi) )
must give an estimate of f(): a KDE.
It may well be that KSs and KDEs are evaluated differently —
different smoothness criteria, different norms —
but I don’t see a fundamental difference. What am I missing ?
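One way to read the pseudocode above as running code (a sketch, Python/NumPy/SciPy assumed; the bandwidth h, the grid, and the plain Nadaraya-Watson smoother standing in for est() are all illustrative choices):

    # est( xi, f(xi) ): feed the known density values through a fixed-width
    # Nadaraya-Watson kernel smoother, and compare with an ordinary KDE
    # built from the same xi (which never sees the values f(xi)).
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)
    xi = rng.normal(0.0, 1.0, 300)        # sample points
    fi = norm.pdf(xi)                     # the "real" density f() at the xi
    grid = np.linspace(-4, 4, 81)
    h = 0.4                               # one fixed bandwidth for both

    # kernel smoother est( xi, f(xi) ) evaluated on the grid
    w = norm.pdf((grid[:, None] - xi[None, :]) / h)
    smooth_of_f = (w * fi).sum(axis=1) / w.sum(axis=1)

    # ordinary KDE from the xi alone
    kde = norm.pdf((grid[:, None] - xi[None, :]) / h).mean(axis=1) / h

    # both columns approximate norm.pdf(grid)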

2) How does dimension affect estimation or smoothing, intuitively?
Here’s a toy example, just to help intuition.
Consider a box of N=10000 points in a uniform grid,
and a window, a line or square or cube, of W=64 points within it:

                1d          2d          3d          4d
---------------------------------------------------------------
data            10000       100x100     22x22x22    10x10x10x10
side            10000       100         22          10
window          64          8x8         4x4x4       2.8^4
side ratio      .64 %       8 %         19 %        28 %
dist to win     5000        47          13          7

Here “side ratio” is window side / box side,
and “dist to win” is a rough estimate of the mean distance
of a random point in the box to a randomly-placed window.
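A short sketch to check the arithmetic (Python/NumPy assumed; “dist to win” is read here as the Monte Carlo mean distance from a random point in the box to the centre of a randomly placed window, which is only one reading of the rough figures above, so the last column will not match the table exactly):

    # Recompute the table: box of N = 10000 grid points, window of W = 64.
    import numpy as np

    N, W, M = 10_000, 64, 100_000
    rng = np.random.default_rng(2)

    for d in (1, 2, 3, 4):
        box_side = N ** (1 / d)                 # side of the data box
        win_side = W ** (1 / d)                 # side of the window
        ratio = win_side / box_side             # "side ratio"
        pts = rng.uniform(0, box_side, (M, d))  # random points in the box
        ctr = rng.uniform(0, box_side, (M, d))  # random window centres
        dist = np.linalg.norm(pts - ctr, axis=1).mean()
        print(f"{d}d  side {box_side:7.1f}  window side {win_side:5.2f}  "
              f"side ratio {100 * ratio:5.2f} %  dist {dist:7.1f}")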

Does this make any sense at all ?
(A picture or applet would really help: anyone ?)

The idea is that a fixed-size window within a fixed-size box
has very different nearness to the rest of the box in 1d, 2d, 3d, and 4d.
This is for a uniform grid;
maybe the strong dependence on dimension carries over to
other distributions, maybe not.
Anyway, it looks like a strong general effect, an aspect of the curse of dimensionality.

Attribution
Source: Link, Question Author: Rob Hyndman, Answer Author: denis
