# How to use/interpret empirical distribution?

First of all, I’d like to apologize for the vague title; I couldn’t come up with a better one just now. Please feel free to change it, or advise me how to change it, so that it better fits the core of the question.

Now to the question itself. I have been working on a piece of software in which I use an empirical distribution for sampling. Now that it is implemented, however, I am not sure how to interpret it all. Allow me to describe what I have done, and why:

I run a series of calculations on each object in a set, yielding a final score per object. On its own, however, the score is very ad hoc. To make sense of the score of a particular object, I repeat the score calculation a large number of times (N = 1000) with mock, randomly generated values, which yields 1000 mock scores. I then estimate an empirical “score distribution” for that particular object from these 1000 mock scores.

I have implemented this in Java (the rest of the software is also written in Java) using the Apache Commons Math library, in particular the EmpiricalDistImpl class. According to the documentation, this class uses:

> what amounts to the Variable Kernel Method with Gaussian smoothing:
>
> Digesting the input file
>
> 1. Pass the file once to compute min and max.
> 2. Divide the range from min-max into binCount “bins.”
> 3. Pass the data file again, computing bin counts and univariate statistics (mean, std dev.) for each of the bins
> 4. Divide the interval (0,1) into subintervals associated with the bins, with the length of a bin’s subinterval proportional to its count.
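
To check my own understanding of the four digesting steps, here is a minimal plain-Java sketch of how I read them. It is not the library's actual code: the class `BinDigestSketch`, its `Bin` helper and the method names are made up for illustration, and it works on an in-memory array rather than a file.

```java
import java.util.Arrays;

// Illustrative only: NOT the Apache Commons Math implementation.
// All names here are hypothetical.
class BinDigestSketch {

    /** Per-bin summary: count, mean and sample standard deviation. */
    static final class Bin {
        int count;
        double sum;
        double sumSq;

        double mean() {
            return count == 0 ? Double.NaN : sum / count;
        }

        double stdDev() {
            if (count < 2) return 0.0;
            double m = mean();
            double variance = (sumSq - count * m * m) / (count - 1);
            return Math.sqrt(Math.max(variance, 0.0)); // guard against rounding below zero
        }
    }

    /** Steps 1-3: find min and max, split the range into bins, collect per-bin statistics. */
    static Bin[] digest(double[] data, int binCount) {
        double min = Arrays.stream(data).min().getAsDouble();
        double max = Arrays.stream(data).max().getAsDouble();
        double width = (max - min) / binCount;

        Bin[] bins = new Bin[binCount];
        for (int i = 0; i < binCount; i++) bins[i] = new Bin();

        for (double x : data) {
            // The maximum value would land one past the last bin, so clamp it.
            int i = width == 0 ? 0 : Math.min((int) ((x - min) / width), binCount - 1);
            bins[i].count++;
            bins[i].sum += x;
            bins[i].sumSq += x * x;
        }
        return bins;
    }

    /** Step 4: upper bounds of the subintervals of (0,1), each with length proportional to its bin count. */
    static double[] upperBounds(Bin[] bins, int n) {
        double[] bounds = new double[bins.length];
        int cumulative = 0;
        for (int i = 0; i < bins.length; i++) {
            cumulative += bins[i].count;
            bounds[i] = (double) cumulative / n;
        }
        return bounds;
    }

    public static void main(String[] args) {
        double[] mockScores = {1.0, 2.0, 3.0, 4.0, 10.0};
        Bin[] bins = digest(mockScores, 3);
        System.out.println(Arrays.toString(upperBounds(bins, mockScores.length)));
    }
}
```

If I read the documentation correctly, the per-bin mean and standard deviation are what the Gaussian smoothing uses later, and the subinterval bounds decide which bin a uniform draw in (0,1) falls into.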

Now my question is: does it make sense to sample from this distribution in order to calculate some sort of expected value? In other words, how much can I trust or rely on this distribution? Could I, for instance, draw conclusions about the significance of observing a score $S$ by checking it against the distribution?
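
To make “checking the distribution” concrete, here is a minimal sketch of the kind of calculation I have in mind. It works directly on the raw mock scores rather than on the smoothed distribution from the library, and the class and method names are hypothetical. The `+1` in the tail fraction is one common convention for Monte Carlo estimates, used so that the result is never exactly zero; I am assuming here that larger scores count as more extreme.

```java
// Illustrative sketch; class and method names are hypothetical.
class EmpiricalTailSketch {

    /** Mean of the mock scores: a plain Monte Carlo estimate of the expected score. */
    static double mean(double[] mockScores) {
        double sum = 0.0;
        for (double s : mockScores) sum += s;
        return sum / mockScores.length;
    }

    /**
     * Fraction of mock scores at least as large as the observed score,
     * with +1 added to numerator and denominator (an assumed convention).
     */
    static double upperTailFraction(double[] mockScores, double observed) {
        int atLeast = 0;
        for (double s : mockScores) {
            if (s >= observed) atLeast++;
        }
        return (atLeast + 1.0) / (mockScores.length + 1.0);
    }

    public static void main(String[] args) {
        double[] mockScores = {1.0, 2.0, 3.0, 4.0, 5.0};
        System.out.println(mean(mockScores));
        System.out.println(upperTailFraction(mockScores, 4.0));
    }
}
```

Whether a small tail fraction can be read as “significant” presumably depends on whether the randomly generated inputs represent a sensible baseline for the object, which is part of what I am unsure about.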

I realize that this is perhaps an unorthodox way of looking at a problem like this, but I think it would be interesting to get a better grip on the concept of empirical distributions, and on how they can and can’t be used in analysis.