How to estimate $P(x\le0)$ from $n$ samples of $x$?

Suppose we have $n$ samples $x_i$ of a random variable:
$$x \sim \mathcal N(\mu,\sigma^2) $$

Based on the samples, we want to estimate the probability that $x$ is negative:
$$P(x\le0)$$

Intuitively, I would first estimate:
$$\hat \mu=\frac 1 n \sum x_i$$
$$\hat \sigma^2={\frac 1 {n-1} \sum (x_i-\hat \mu)^2}$$

and then calculate:

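$$\widehat{P}(x\le0)=\frac1{\sqrt{2\pi\hat\sigma^2}} \int_{-\infty}^0 \exp\!\left(-\frac{(x-\hat\mu)^2}{2\hat\sigma^2}\right) dx = \Phi\!\left(-\frac{\hat\mu}{\hat\sigma}\right)$$

where $\Phi$ denotes the standard normal CDF.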
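The integral is the fitted normal CDF evaluated at 0, i.e. $\Phi(-\hat\mu/\hat\sigma)$ with $\Phi$ the standard normal CDF, so no numerical integration is needed. A minimal Python sketch of this plug-in estimate using the standard library's `statistics.NormalDist`; the function name `plug_in_estimate` and the sample values are hypothetical.

```python
import statistics
from statistics import NormalDist


def plug_in_estimate(samples):
    """Plug-in estimate of P(x <= 0), assuming x ~ N(mu, sigma^2)."""
    mu_hat = statistics.fmean(samples)
    sigma_hat = statistics.stdev(samples)  # square root of the 1/(n-1) sample variance
    # Integral of the fitted normal density from -inf to 0 equals the fitted CDF at 0
    return NormalDist(mu_hat, sigma_hat).cdf(0.0)


samples = [1.0, 2.0, 3.0]  # mu_hat = 2, sigma_hat = 1
print(round(plug_in_estimate(samples), 4))  # Phi(-2) -> 0.0228
```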

However, $\hat \mu$ and $\hat \sigma$ have variance! If I use this method, I suspect I am ignoring that variance and making an incorrect estimate.

Is this reasoning right?

If so, how can I estimate $P(x\le0)$ more accurately, taking $\text {VAR}[\hat \mu]$ and $\text {VAR}[\hat \sigma]$ into account?
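One way to see the variance the question refers to is to simulate it: draw many independent samples of size $n$ from a known normal distribution and look at the spread of the resulting plug-in estimates. A minimal Python sketch; the parameter values ($\mu=1$, $\sigma=1$, $n=20$, 2000 repetitions), the seed, and the function names are arbitrary, hypothetical choices.

```python
import random
import statistics
from statistics import NormalDist


def plug_in_estimate(samples):
    """Normal-model plug-in estimate of P(x <= 0)."""
    mu_hat = statistics.fmean(samples)
    sigma_hat = statistics.stdev(samples)
    return NormalDist(mu_hat, sigma_hat).cdf(0.0)


def simulate_spread(mu, sigma, n, reps, seed=0):
    """Draw `reps` independent samples of size n; return the plug-in estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        samples = [rng.gauss(mu, sigma) for _ in range(n)]
        estimates.append(plug_in_estimate(samples))
    return estimates


true_p = NormalDist(1.0, 1.0).cdf(0.0)  # Phi(-1), about 0.1587
est = simulate_spread(mu=1.0, sigma=1.0, n=20, reps=2000)
print(f"true P                 = {true_p:.4f}")
print(f"mean of estimates      = {statistics.fmean(est):.4f}")
print(f"std. dev. of estimates = {statistics.stdev(est):.4f}")
```

The standard deviation printed last is the sampling variability of $\widehat{P}$ that comes from the variability of $\hat\mu$ and $\hat\sigma$.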

Answer

The method you are using is very close to the maximum likelihood estimator (MLE), which has good estimation properties when the underlying parametric model is correct. The MLE has a property called functional invariance: the MLE of a function of the parameters is that function applied to the MLE of the parameters. Your method uses the sample variance, which is a bias-corrected version of the MLE of the true variance (it divides by $n-1$ instead of $n$), but your estimator should still have reasonable properties if the underlying model is correct. Of course, you are correct that your estimator has some variance, but that is true of any estimator in this situation.
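To make the comparison with the MLE concrete: by functional invariance, the MLE of $P(x\le0)$ under the normal model is $\Phi(-\hat\mu/\hat\sigma_{\text{MLE}})$, where $\hat\sigma_{\text{MLE}}$ divides by $n$ rather than $n-1$. A minimal Python sketch computing both versions; the function name `estimate_p_nonpositive` and the sample values are hypothetical.

```python
import statistics
from statistics import NormalDist


def estimate_p_nonpositive(samples, use_mle=True):
    """Normal-model estimate of P(x <= 0) with either variance estimator."""
    mu_hat = statistics.fmean(samples)
    # pstdev divides by n (the MLE); stdev divides by n - 1 (bias-corrected variance)
    sigma_hat = statistics.pstdev(samples) if use_mle else statistics.stdev(samples)
    return NormalDist(mu_hat, sigma_hat).cdf(0.0)


samples = [0.4, 1.9, -0.3, 2.2, 1.1]
print(estimate_p_nonpositive(samples, use_mle=True))   # MLE of P(x <= 0)
print(estimate_p_nonpositive(samples, use_mle=False))  # the question's version
```

The two $\hat\sigma$ values differ only by the factor $\sqrt{(n-1)/n}$, so the two estimates converge as $n$ grows.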

If you are confident that your data are from an exchangeable sequence (i.e., they follow an IID model), then I would recommend you seriously consider using the empirical estimator instead:

$$\widehat{\mathbb{P}(X \leqslant 0)} = \frac{1}{n} \sum_{i=1}^n \mathbb{I}(x_i \leqslant 0).$$
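The empirical estimator is just the fraction of observations at or below zero. A minimal Python sketch; the function name `empirical_estimate` and the sample values are hypothetical.

```python
def empirical_estimate(samples):
    """Fraction of samples that are <= 0; makes no distributional assumption."""
    return sum(1 for x in samples if x <= 0) / len(samples)


samples = [0.4, 1.9, -0.3, 2.2, 0.0]
print(empirical_estimate(samples))  # 2 of 5 samples are <= 0 -> 0.4
```

Note that if no observation is at or below zero, this estimate is exactly 0, which can happen for small $n$ when the true probability is small.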

This latter estimator also has good properties, but crucially, it does not rely on the assumption that the data are from a normal distribution. The empirical estimator is consistent for any underlying distribution (which your estimator is not), which makes it highly robust to model misspecification.
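To illustrate the consistency claim, here is a minimal Python sketch under an assumed non-normal model: $X = E - 0.5$ with $E$ exponential with rate 1, so the true value is $P(X \le 0) = 1 - e^{-0.5} \approx 0.3935$. The fitted mean and standard deviation converge to $0.5$ and $1$, so the normal-model estimate converges to $\Phi(-0.5) \approx 0.3085$ instead. The distribution, seed, and sample size are hypothetical choices.

```python
import math
import random
import statistics
from statistics import NormalDist


def plug_in_estimate(samples):
    """Normal-model plug-in estimate of P(x <= 0)."""
    return NormalDist(statistics.fmean(samples), statistics.stdev(samples)).cdf(0.0)


def empirical_estimate(samples):
    """Fraction of samples that are <= 0."""
    return sum(1 for x in samples if x <= 0) / len(samples)


rng = random.Random(1)
n = 100_000
# Non-normal data (assumed for illustration): X = E - 0.5, E ~ Exponential(rate 1)
samples = [rng.expovariate(1.0) - 0.5 for _ in range(n)]

true_p = 1 - math.exp(-0.5)  # P(E <= 0.5), about 0.3935
print(f"true P       = {true_p:.4f}")
print(f"empirical    = {empirical_estimate(samples):.4f}")
print(f"normal model = {plug_in_estimate(samples):.4f}")  # tends to Phi(-0.5), about 0.3085
```

Increasing $n$ shrinks the empirical estimate's error toward zero, while the normal-model estimate stays near the wrong limit.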

Attribution
Source: Link, Question Author: elemolotiv, Answer Author: Ben
