# How to get ellipse region from bivariate normal distributed data?

I have data which looks like:

I tried to apply normal distribution (kernel density estimation works better, but I don’t need such great precision) on it and it works quite well. Density plot makes a ellipse.

I need to get that ellipse function to decide if a point lies within the ellipse’s region or not. How to do that?

R or Mathematica code are welcomed.

Corsario provides a good solution in a comment: use the kernel density function to test for inclusion within a level set.

Another interpretation of the question is that it requests a procedure to test for inclusion within the ellipses created by a bivariate normal approximation to the data. To get started, let’s generate some data that look like the illustration in the question:

library(mvtnorm) # References rmvnorm()
set.seed(17)
p <- rmvnorm(1000, c(250000, 20000), matrix(c(100000^2, 22000^2, 22000^2, 6000^2),2,2))


The ellipses are determined by the first and second moments of the data:

center <- apply(p, 2, mean)
sigma <- cov(p)


The formula requires inversion of the variance-covariance matrix:

sigma.inv = solve(sigma, matrix(c(1,0,0,1),2,2))


The ellipse “height” function is the negative of the logarithm of the bivariate normal density:

ellipse <- function(s,t) {u<-c(s,t)-center; u %*% sigma.inv %*% u / 2}


(I have ignored an additive constant equal to $\log(2\pi\sqrt{\det(\Sigma)})$.)

To test this, let’s draw some of its contours. That requires generating a grid of points in the x and y directions:

n <- 50
x <- (0:(n-1)) * (500000/(n-1))
y <- (0:(n-1)) * (50000/(n-1))


Compute the height function at this grid and plot it:

z <- mapply(ellipse, as.vector(rep(x,n)), as.vector(outer(rep(0,n), y, +)))
plot(p, pch=20, xlim=c(0,500000), ylim=c(0,50000), xlab="Packets", ylab="Flows")
contour(x,y,matrix(z,n,n), levels=(0:10), col = terrain.colors(11), add=TRUE)


Evidently it works. Therefore, the test to determine whether a point $(s,t)$ lies inside an elliptical contour at level $c$ is

ellipse(s,t) <= c


Mathematica does the job in the same way: compute the variance-covariance matrix of the data, invert that, construct the ellipse function, and you’re all set.