After going through some rather terse mathematics, I think I have some intuition for kernel density estimation. But I am also aware that estimating a multivariate density for more than three variables might not be a good idea, in terms of the statistical properties of its estimators.
So, in what sorts of situations would I want to estimate, say, a bivariate density using non-parametric methods? Is it worthwhile to worry about estimating it for more than two variables?
If you can point to some useful links on applications of multivariate density estimation, that'd be great.
One typical application of density estimation is novelty detection, a.k.a. outlier detection, where the idea is that you only (or mostly) have data of one type, but you are interested in very rare, qualitatively distinct data that deviate significantly from those common cases.
Examples are fraud detection, detection of failures in systems, and so on. These are situations where it is very hard and/or expensive to gather data of the sort you are interested in: the rare cases, i.e. those with a low probability of occurring.
Most of the time you are not interested in accurately estimating the exact distribution, but in the relative odds (how likely a given sample is to be an actual outlier vs. not being one).
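To make this concrete, here is a minimal sketch of density-based novelty detection with a bivariate Gaussian kernel density estimate. Everything in it is an assumption made for illustration: the synthetic "normal" data, the hand-picked bandwidth h, the 1% cut-off, and the helper names log_kde and is_novel. Only the ranking of points by estimated density matters here, not the exact density values.

```python
import math
import random

def log_kde(x, data, h):
    """Log of a Gaussian product-kernel density estimate at point x."""
    d = len(x)
    # Normalising constant of the kernel, plus the 1/n averaging term.
    log_norm = -d * math.log(h) - 0.5 * d * math.log(2 * math.pi) - math.log(len(data))
    # Exponent of each kernel, combined with log-sum-exp for numerical stability.
    exps = [-0.5 * sum(((xi - pi) / h) ** 2 for xi, pi in zip(x, p)) for p in data]
    m = max(exps)
    return log_norm + m + math.log(sum(math.exp(e - m) for e in exps))

random.seed(0)
# Synthetic "common" data: 300 draws from a standard bivariate Gaussian.
train = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(300)]
h = 0.5  # bandwidth, chosen by hand for this illustration

# Score the training points themselves and take a low quantile as the cut-off.
# (Each point is scored against data that includes itself, which is slightly
# optimistic; leave-one-out scoring would be stricter.)
scores = sorted(log_kde(p, train, h) for p in train)
threshold = scores[int(0.01 * len(scores))]

def is_novel(x):
    """Flag x when its estimated log-density falls below the cut-off."""
    return log_kde(x, train, h) < threshold

print(is_novel((0.1, -0.2)))  # a point near the bulk of the data
print(is_novel((6.0, 6.0)))   # a point far away from all training data
```

The point near the centre should come out as ordinary and the distant one as novel. In practice the bandwidth and the cut-off would have to be tuned for the problem at hand.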
There are dozens of tutorials and reviews on the topic. This one might be a good one to start with.
EDIT: some people seem to find it odd to use density estimation for outlier detection.
Let us first agree on one thing: when somebody fits a mixture model to their data, they are actually performing density estimation. A mixture model represents a probability distribution.
kNN and GMM are actually related: they are two methods of estimating such a probability density. This is the underlying idea for many approaches in novelty detection. For example, this one based on kNNs, this other one based on Parzen windows (which stresses this very idea at the beginning of the paper), and many others.
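As a rough illustration of why a kNN distance is itself a density estimate, the sketch below compares the classical k-nearest-neighbour estimate, k / (n * volume of the ball reaching the k-th neighbour), with a Parzen-window estimate on the same synthetic 2-D data. The function names, the data, and the values of k and h are assumptions chosen for this example and are not taken from the papers mentioned above.

```python
import math
import random

def knn_density(x, data, k):
    """k-NN density estimate in 2-D: k / (n * area of the disc reaching the k-th neighbour).

    Assumes x does not coincide with k or more data points (the radius would be 0).
    """
    dists = sorted(math.dist(x, p) for p in data)
    r_k = dists[k - 1]
    return k / (len(data) * math.pi * r_k ** 2)

def parzen_density(x, data, h):
    """Parzen-window estimate in 2-D with an isotropic Gaussian kernel of width h."""
    total = sum(math.exp(-0.5 * (math.dist(x, p) / h) ** 2) for p in data)
    return total / (len(data) * 2 * math.pi * h ** 2)

random.seed(1)
# Synthetic data: 500 draws from a standard bivariate Gaussian.
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(500)]

# One query point in the bulk of the data and one far out in the tail.
for point in [(0.0, 0.0), (3.0, 3.0)]:
    print(point, knn_density(point, data, k=10), parzen_density(point, data, h=0.5))
```

Both estimators should assign a much lower density to the tail point than to the central one: a large distance to the k-th neighbour and a low Parzen density are two views of the same quantity.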
It seems to me (but this is just my personal perception) that most, if not all, approaches rely on this idea. How else would you express the idea of an anomalous/rare event?