I often need make $(x,y)$ scatter plots that have many ($>10^5$) points. I’ve experimented with different ways of representing this many points that capture the distribution while getting around the messiness of actually putting a million points in an image. I’ve done the obvious things like thinning points, and plotting the local density of points as either contours or a colour map. I’ve found that thinning tends to be ineffective as the distributions I deal with are often exponentially peaked, so I just get the same problem in a narrower area, and local density representations are sensitive to the details of smoothing/sampling used to get the density field.

Lately I’ve been thinking about trying a different strategy, but I haven’t been able to come up with an algorithm to make it work (which is where this question comes in). This sort of plot, which I’ve heard cosmologists colloquially call a ‘banana plot’, is the inspiration, and I think is fairly common in Bayesian analysis:

Here the contours show the $1\sigma$ and $2\sigma$ confidence intervals for a combination of two parameters. In this application a 2D probability density function is generated, so making the plot is easy. I want to do something similar.

I’d like to draw contours enclosing 99%, 95% and 68% (and more generally any percentage) of the data points in an $(x,y)$ dataset. Obviously such a contour is not unique, so an extra constraint is needed. I think something like minimising the area or maximising the average density of points inside the contour should give the desired effect. Does anyone know an algorithm that would produce such a contour? I eventually want to make a plot using matplotlib, so bonus points if you happen to know such an algorithm that is implemented in python, but I’ll do the implementation myself if I have to.

Not terribly familiar with the tags here on CV.SE, so if I’m missing any obvious ones please suggest/edit them in.

**Answer**

**Attribution***Source : Link , Question Author : Kyle , Answer Author : Community*