This question is prompted by discussion elsewhere.
Variable kernels are often used in local regression. For example, loess is widely used and works well as a regression smoother, and is based on a kernel of variable width that adapts to data sparsity.
On the other hand, variable kernels are usually thought to lead to poor estimators in kernel density estimation (see Terrell and Scott, 1992).
Is there an intuitive reason why they would work well for regression but not for density estimation?
There seem to be two different questions here, which I’ll try to split:
1) How is KS, kernel smoothing, different from KDE, kernel density estimation?
Well, say I have an estimator / smoother / interpolator
est( xi, fi -> gridj, estj )
and also happen to know the “real” densityf() at the xi. Then running
est( x, densityf )
must give an estimate of densityf(): a KDE.
It may well be that KSs and KDEs are evaluated differently —
different smoothness criteria, different norms —
but I don’t see a fundamental difference. What am I missing?
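To make the est( x, densityf ) idea concrete, here is a minimal sketch, not a definitive implementation. It assumes a fixed-bandwidth Gaussian kernel and a Nadaraya-Watson style weighted average; the names kernel_smooth, normal_pdf and the bandwidth value are made up for illustration. It also assumes, as above, that the "real" density values are available at the xi, which is not the case in ordinary KDE.

```python
import math

def kernel_smooth(xs, fs, grid, bandwidth):
    """Kernel-weighted average of fs at each grid point (Nadaraya-Watson form).

    Hypothetical stand-in for est( xi, fi -> gridj, estj ); Gaussian kernel assumed.
    """
    est = []
    for g in grid:
        # Gaussian weight of every sample point relative to this grid point
        weights = [math.exp(-0.5 * ((g - x) / bandwidth) ** 2) for x in xs]
        total = sum(weights)
        est.append(sum(w * f for w, f in zip(weights, fs)) / total)
    return est

def normal_pdf(x):
    """Standard normal density, playing the role of the known densityf()."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# Sample points xi on [-3, 3] and the assumed-known density values at them
xs = [-3.0 + 0.1 * i for i in range(61)]
fs = [normal_pdf(x) for x in xs]

grid = [-1.0, 0.0, 1.0]
smoothed = kernel_smooth(xs, fs, grid, bandwidth=0.2)

for g, s in zip(grid, smoothed):
    print(g, round(s, 4), round(normal_pdf(g), 4))
```

The smoothed values land close to the true density at the grid points, slightly flattened near the peak, which is the usual smoothing bias.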
2) How does dimension affect estimation or smoothing, intuitively?
Here’s a toy example, just to help intuition.
Consider a box of N=10000 points in a uniform grid,
and a window, a line or square or cube, of W=64 points within it:
                   1d        2d          3d            4d
---------------------------------------------------------------
data            10000   100x100    22x22x22   10x10x10x10
side            10000       100          22            10
window             64       8x8       4x4x4         2.8^4
side ratio      .64 %       8 %        19 %          28 %
dist to win      5000        47          13             7
Here “side ratio” is window side / box side,
and “dist to win” is a rough estimate of the mean distance
of a random point in the box to a randomly-placed window.
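The "side" and "side ratio" figures can be checked with a few lines; this is only a sketch, and the function name box_and_window_sides is made up for illustration. It assumes the box and the window are both d-dimensional cubes, so each side is the d-th root of the point count. The "dist to win" figures are rough estimates and are not reproduced here.

```python
def box_and_window_sides(n_points, window_points, dim):
    """Side lengths of a cubic box and cubic window holding the given point counts."""
    box_side = n_points ** (1.0 / dim)
    window_side = window_points ** (1.0 / dim)
    return box_side, window_side, window_side / box_side

N = 10000   # points in the box
W = 64      # points in the window

ratios = {}
for dim in (1, 2, 3, 4):
    box_side, window_side, ratio = box_and_window_sides(N, W, dim)
    ratios[dim] = ratio
    print(f"{dim}d  box side {box_side:8.1f}  window side {window_side:5.1f}"
          f"  side ratio {100 * ratio:5.2f} %")
```

The ratio is (W/N)^(1/d), so it grows quickly with d even though W/N stays fixed at 0.64 %. In 3d the unrounded box side is about 21.5, which puts the ratio a little under 19 %.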
Does this make any sense at all?
(A picture or applet would really help: anyone?)
The idea is that a fixed-size window within a fixed-size box
has very different nearness to the rest of the box, in 1d 2d 3d 4d.
This is for a uniform grid;
maybe the strong dependence on dimension carries over to
other distributions, maybe not.
Anyway, it looks like a strong general effect, an aspect of the curse of dimensionality.