# Approximation of a probability distribution

I have a continuous random variable $$X$$ that can easily be sampled. I don’t have any other assumption on $$X$$. Let’s say I have sampled $$X$$ and I have constructed the set $$S$$. We can assume that $$S$$ is as big as needed.

I want to be able to approximate its probability distribution. By this I mean that I would like to “guess” a probability distribution, such that if it is sampled it will give me a set of values $$T$$, which is statistically equivalent to $$S$$. I do understand that this is still a vague question, so I am happy with any practical solution.

I guess the obvious solution is to approximate the PDF by the “histogram” of $$S$$. I assume that if $$S$$ is big enough, the approximation will be good enough. But is there anything more clever that can be done?

Is there any known and trusted method to do that? For example, can I use the first few moments to improve my guess?
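For concreteness, the histogram idea can be sketched in a few lines of numpy: estimate bin probabilities from the sample, then draw a bin and a uniform point inside it. The gamma draw below is only a stand-in for the unknown $$X$$; in practice you would use your own sample.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.gamma(2.0, size=10_000)  # stand-in for the sample S of X

# Histogram estimate of the PDF: bin counts normalized to probabilities
counts, edges = np.histogram(s, bins=50)
probs = counts / counts.sum()

# Draw new values T: pick a bin with its estimated probability,
# then a uniform point inside that bin
bins = rng.choice(len(probs), size=10_000, p=probs)
t = rng.uniform(edges[bins], edges[bins + 1])

# T should reproduce summary statistics of S reasonably well
print(s.mean(), t.mean())
```

This is a piecewise-constant density estimate, so it smooths within bins; finer bins trade that smoothing against noisier bin probabilities.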

The histogram approximation might be better than you think. The simplest “histogram” approximation is to use a discrete distribution with a point mass of $$1/n$$ at each observation. This is the empirical distribution, and the corresponding CDF $$\hat F_n$$ is the empirical cumulative distribution function (ECDF). With iid data, the ECDF enjoys a number of properties, one of which is the Dvoretzky–Kiefer–Wolfowitz inequality:
$$P\left(\sup_{x\in\mathbb R} |\hat F_n(x) - F(x)| > \epsilon\right) \leq 2e^{-2n\epsilon^2}.$$
This means that the probability of the largest deviation being greater than some $$\epsilon$$ decreases exponentially in $$n$$. Since you have access to lots of samples you can make this probability tiny even for a very small $$\epsilon$$.
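The bound is easy to check numerically. In the sketch below (a toy check, with $$X$$ taken to be standard normal so the true $$F$$ is known), the observed largest deviation $$\sup_x |\hat F_n(x) - F(x)|$$ is computed at the sample points and compared with the DKW tail probability for a chosen $$\epsilon$$:

```python
import numpy as np
from math import erf

rng = np.random.default_rng(0)
n = 10_000
eps = 0.02
s = rng.normal(size=n)  # stand-in for a sample of X; here X ~ N(0, 1)

# ECDF at the sorted sample points: F_n(x_(i)) = i / n
x = np.sort(s)
ecdf = np.arange(1, n + 1) / n

# True CDF of N(0, 1) via the error function
F = np.array([0.5 * (1 + erf(v / np.sqrt(2))) for v in x])

# Largest deviation between ECDF and true CDF,
# checking both sides of each jump of the step function
d = np.max(np.maximum(np.abs(ecdf - F), np.abs(ecdf - 1 / n - F)))

# DKW bound on P(sup |F_n - F| > eps)
bound = 2 * np.exp(-2 * n * eps**2)
print(d, bound)
```

With $$n = 10{,}000$$ and $$\epsilon = 0.02$$ the bound is already below $$10^{-3}$$, and the observed deviation is typically of order $$1/\sqrt{n}$$.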
Sampling from $$\hat F_n$$ is equivalent to taking a bootstrap sample from your data, and the quality of $$\hat F_n$$ as an estimator of $$F$$ is a big part of why bootstrapping works so well.
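Concretely, drawing from $$\hat F_n$$ is just resampling your data uniformly with replacement, e.g. with `numpy.random.Generator.choice` (the exponential draw below is only a placeholder for your actual sample):

```python
import numpy as np

rng = np.random.default_rng(42)
s = rng.exponential(size=5_000)  # placeholder for your sample S

# Sampling from \hat F_n: each draw picks one of the n observations
# with probability 1/n, i.e. uniform resampling with replacement
t = rng.choice(s, size=5_000, replace=True)

# The resample T closely reproduces summary statistics of S
print(s.mean(), t.mean())
```

Every value in the resample is one of the original observations, which is exactly the point-mass structure of $$\hat F_n$$.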