I am trying to estimate the mean of a more-or-less Gaussian distribution via sampling. I have no prior knowledge about its mean or its variance. Each sample is expensive to obtain. How do I dynamically decide how many samples I need to get a certain level of confidence/accuracy? Alternatively, how do I know when I can stop taking samples?
All the answers to questions like this that I can find seem to presume some knowledge of the variance, but I need to discover that along the way as well. Others are geared towards taking polls, and it's not clear to me (beginner that I am) how that generalizes: my mean isn't within [0,1], etc.
I think this is probably a simple question with a well-known answer, but my Google-fu is failing me. Even just telling me what to search for would be helpful.
You need to search for ‘Bayesian adaptive designs’. The basic idea is as follows:
1. Initialize the prior for the parameters of interest. Before any data are collected, your priors would be diffuse. As additional data come in, you reset the prior to be the posterior that corresponds to the 'prior + data up to that point in time'.
2. Collect data.
3. Compute the posterior based on data + priors. The posterior is then used as the prior in step 1 if you actually collect additional data.
4. Assess whether your stopping criteria are met. Stopping criteria could include something like requiring that the 95% credible interval be no wider than $\pm \epsilon$ units for the parameters of interest. You could also have more formal loss functions associated with the parameters of interest and compute the expected loss with respect to the posterior distribution for the parameter of interest.
You then repeat steps 1, 2 and 3 till your stopping criteria from step 4 are met.
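To make the loop above concrete, here is a minimal sketch in Python for the Gaussian case with unknown mean and variance. It rests on several assumptions that are not part of the answer itself: a conjugate Normal-Inverse-Gamma prior with hyperparameters `(m, k, a, b)`, a Monte Carlo approximation of the credible interval, one sample per iteration, and a minimum of five samples before the stopping rule is checked. The names `update`, `credible_interval`, `estimate_mean`, and `draw_sample` are hypothetical, chosen only for illustration.

```python
import random


def update(state, x):
    """Conjugate Normal-Inverse-Gamma update for a single observation x.

    state = (m, k, a, b): mean location, pseudo-count, shape, rate.
    """
    m, k, a, b = state
    k_new = k + 1.0
    m_new = (k * m + x) / k_new
    a_new = a + 0.5
    b_new = b + k * (x - m) ** 2 / (2.0 * k_new)
    return (m_new, k_new, a_new, b_new)


def credible_interval(state, rng, level=0.95, draws=4000):
    """Approximate equal-tailed credible interval for the mean by simulation."""
    m, k, a, b = state
    mus = []
    for _ in range(draws):
        # Precision tau ~ Gamma(shape=a, rate=b); gammavariate takes a scale.
        tau = rng.gammavariate(a, 1.0 / b)
        # Given tau, the mean is Normal(m, 1 / (k * tau)).
        mus.append(rng.gauss(m, (1.0 / (k * tau)) ** 0.5))
    mus.sort()
    lo = mus[int((1.0 - level) / 2.0 * draws)]
    hi = mus[int((1.0 + level) / 2.0 * draws) - 1]
    return lo, hi


def estimate_mean(draw_sample, epsilon, min_samples=5, max_samples=10_000, seed=0):
    """Sample until the 95% credible interval for the mean is within +/- epsilon."""
    rng = random.Random(seed)
    # Step 1: diffuse prior (an assumed choice; tune to your problem's scale).
    state = (0.0, 1e-3, 1e-3, 1e-3)
    n = 0
    while n < max_samples:
        # Step 2: collect one (expensive) sample.
        x = draw_sample()
        n += 1
        # Step 3: the posterior becomes the prior for the next round.
        state = update(state, x)
        # Step 4: check the stopping criterion once a few samples are in.
        if n >= min_samples:
            lo, hi = credible_interval(state, rng)
            if (hi - lo) / 2.0 <= epsilon:
                return state[0], n, (lo, hi)
    return state[0], n, credible_interval(state, rng)


if __name__ == "__main__":
    data_rng = random.Random(42)
    mean, n, (lo, hi) = estimate_mean(lambda: data_rng.gauss(10.0, 2.0), epsilon=0.5)
    print(f"posterior mean {mean:.3f} after {n} samples, 95% CI ({lo:.3f}, {hi:.3f})")
```

The minimum sample count guards against stopping on an accidentally tight interval when the first few observations happen to land close together. The simulated interval carries some Monte Carlo noise, so for a tight `epsilon` you may want more draws or an exact Student-t quantile from a statistics library.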