Does anyone use the L_1 or L_.5 metrics for clustering, rather than L_2?

Aggarwal et al., "On the surprising behavior of distance metrics in high dimensional space", said (in 2001) that L_1 is consistently preferable to the Euclidean distance metric L_2 for high-dimensional data mining applications, and claimed that L_.5 or L_.1 can be better yet.

Reasons for using L_1 or L_.5 could be theoretical or experimental, e.g. sensitivity to outliers / Kabán's papers, or programs run on real or synthetic data (reproducible, please).

An example or a picture would help my layman's intuition.

This question is a follow-up to Bob Durrant's answer to When-is-nearest-neighbor-meaningful-today. As he says, the choice of p will be both data and application dependent; nonetheless, reports of real experience would be useful.

Notes added Tuesday 7 June:

I stumbled across "Statistical data analysis based on the L1-norm and related methods", Dodge ed., 2002, 454 p., ISBN 3764369205: dozens of conference papers.

Can anyone analyze distance concentration for i.i.d. exponential features? One reason for exponentials is that |exp – exp| \sim exp; another (non-expert) is that it's the max-entropy distribution \ge 0; a third is that some real data sets, in particular SIFTs, look roughly exponential.
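As a reproducibility aside, the |exp – exp| \sim exp claim is easy to check by simulation. A minimal sketch (the sample size and tolerances are my own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.exponential(scale=1.0, size=n)
y = rng.exponential(scale=1.0, size=n)
d = np.abs(x - y)  # |exp - exp|

# If |X - Y| is again Exp(1), then mean = std = 1 and median = log 2.
print("mean  :", d.mean())             # close to 1
print("std   :", d.std())              # close to 1
print("median:", np.quantile(d, 0.5))  # close to log 2 ~ 0.693
```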

**Answer**

The key here is understanding the "curse of dimensionality" the paper references. From Wikipedia:

> when the number of dimensions is very large, nearly all of the high-dimensional space is "far away" from the centre, or, to put it another way, the high-dimensional unit space can be said to consist almost entirely of the "corners" of the hypercube, with almost no "middle"

As a result, it starts to get tricky to think about which points are close to which other points, because they're all more or less equally far apart. This is the problem in the first paper you linked to.
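This concentration is easy to reproduce on synthetic data. A minimal sketch, assuming uniform points in the unit hypercube and measuring the relative contrast (Dmax – Dmin)/Dmin of Euclidean distances from the origin (the function and parameter names are mine):

```python
import numpy as np

def relative_contrast(dim, n=1000, seed=0):
    """(Dmax - Dmin) / Dmin of the Euclidean distances from the
    origin of n points drawn uniformly from the unit hypercube."""
    rng = np.random.default_rng(seed)
    pts = rng.random((n, dim))
    dist = np.linalg.norm(pts, axis=1)
    return (dist.max() - dist.min()) / dist.min()

for dim in (2, 10, 100, 1000):
    print(dim, relative_contrast(dim))
# The contrast shrinks as the dimension grows: in high dimensions
# the nearest and the farthest point are almost equally far away.
```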

The problem with high p is that it emphasizes the larger values: five squared and four squared are nine units apart, but one squared and two squared are only three units apart. So the larger dimensions (things in the corners) dominate everything else, and you lose contrast. This inflation of large distances is what you want to avoid. With a fractional p, the emphasis is on differences in the smaller dimensions, the dimensions that actually have intermediate values, which gives you more contrast.
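This contrast argument can be checked numerically by varying p. A hedged sketch on uniform synthetic data (the same point set is reused for every p via a fixed seed; the function name is mine):

```python
import numpy as np

def lp_contrast(p, dim=100, n=1000, seed=0):
    """Relative contrast (Dmax - Dmin) / Dmin of Minkowski L_p
    distances from the origin, for uniform points in the unit hypercube."""
    rng = np.random.default_rng(seed)  # same seed -> same points for each p
    pts = rng.random((n, dim))
    dist = (pts ** p).sum(axis=1) ** (1.0 / p)
    return (dist.max() - dist.min()) / dist.min()

for p in (0.5, 1, 2, 4):
    print(p, lp_contrast(p))
# Smaller p keeps more contrast between the nearest and the farthest
# neighbour, matching what Aggarwal et al. report.
```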

**Attribution**

*Source: Link, Question Author: denis, Answer Author: David J. Harris*