# What are the best ways to generate Bayesian prior estimates using beliefs of non-statisticians?

I work with a lot of qualitative researchers and designers, many of whom interact with users and develop strong, often accurate intuitions about how the data should look. I frequently try to quantify their intuitions so that we can integrate their beliefs with new data.

Asking the right question is tough, and the way I ask the question changes what the priors look like. I have a few different methods (mostly for proportions):

• ask them to make wagers on the probability of different hypotheses, then turn those odds into a Bayes factor
• ask "how many people out of X would do Y?"
• go in reverse: show them fake new data, ask for their posterior beliefs, and back out the prior from that
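For the proportion questions, one simple recipe (a sketch, not the only possible mapping) is to read "k out of N would do Y" as a Beta(k, N − k) prior, where N controls how much weight the intuition carries. The same conjugacy lets you back out a prior from an elicited posterior; the numbers below are hypothetical.

```r
# "About 3 out of 10 users would do Y" -> Beta(3, 7) prior on the proportion
k <- 3; N <- 10
a <- k
b <- N - k
a / (a + b)               # implied prior mean: 0.3
qbeta(c(.05, .95), a, b)  # implied 90% prior interval

# Reverse elicitation: show fake data (x successes in n trials), elicit the
# posterior, then back out the prior by Beta-Binomial conjugacy:
#   posterior Beta(a_post, b_post)  =>  prior Beta(a_post - x, b_post - (n - x))
x <- 12; n <- 20
a_post <- 15; b_post <- 15            # hypothetical elicited posterior
c(a_post - x, b_post - (n - x))      # implied prior: Beta(3, 7)
```

If the backed-out prior has a non-positive parameter, their stated posterior is inconsistent with the fake data under this model, which is itself a useful thing to surface.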

Clearly this isn’t just an academic exercise; it’s a way to create engagement with new data.

What questions would you ask someone who doesn’t know much about stats to accurately quantify their beliefs into a Bayesian prior and how do you go from their answer to a prior (R code would be nice)?

This is a good question. I’m going to use a simple example to illustrate my approach.

Suppose I am working with someone who needs to provide priors on the mean and the variance for a Gaussian likelihood. Something like

$$y \sim \mathcal{N}(\mu, \sigma^2)$$

The question is: “What are this person’s priors on $\mu$ and $\sigma^2$?”

For the mean I might ask “Gimme a range on what you think the expected value might be”. They might say “between 20 and 30”. I’m then free to interpret that as I see fit (perhaps as the interquartile range of the prior on $\mu$).
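Turning that stated range into hyperparameters is a one-liner. A sketch, assuming we treat "20 to 30" as the 25th and 75th percentiles of a normal prior on $\mu$:

```r
# Treat the elicited range as the IQR of a normal prior on mu,
# then solve for the mean and sd of that prior.
lo <- 20; hi <- 30
mu_mean  <- (lo + hi) / 2                  # midpoint: 25
mu_sigma <- (hi - lo) / (2 * qnorm(.75))   # ~7.41
qnorm(c(.25, .75), mu_mean, mu_sigma)      # recovers c(20, 30)
```

Reading the range as a 90% or 99% interval instead just swaps `qnorm(.75)` for `qnorm(.95)` or `qnorm(.995)`, which is worth asking the person about explicitly.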

Now, I’ll use R (or more likely Stan) to simulate possible scenarios in order to further narrow down what a realistic prior is. So for example, my colleague says $\mu$ is between 20 and 30. Now I have to decide on a prior for $\sigma$. So, I may show them the following four prior predictive plots and say “which of these looks most realistic and why?”

They might say “the first is much too variable, and the last two are much too precise. The second looks more realistic, but it is too concentrated at 25!”

At this point, I’ll go back and adjust the prior for the mean while narrowing in on a prior for the variance.

This is called a “prior predictive check”: sampling from the prior to make sure the priors actually reflect the state of the knowledge. The process can be slow, but if your collaborators have no data or statistical expertise, what else can they expect of you? Not every parameter can be given a flat prior.
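The check itself needs nothing fancy; a sketch in plain R, assuming a normal prior on $\mu$ and a gamma prior on $\sigma$ (the same families used in the Stan program below):

```r
# Prior predictive sampling: draw parameters from the priors,
# then draw data from the likelihood given those parameters.
set.seed(1)
n_draws <- 10000
mu      <- rnorm(n_draws, mean = 25, sd = 1)     # prior on mu
sigma   <- rgamma(n_draws, shape = 5, rate = 1)  # prior on sigma
y       <- rnorm(n_draws, mu, sigma)             # prior predictive draws
hist(y, breaks = 50)
```

If the histogram puts mass on values the collaborator considers impossible (negative counts, absurd magnitudes), the priors need revisiting before any data arrives.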

Stan code used to generate samples:

```stan
data {
  real mu_mean;
  real mu_sigma;
  real sigma_alpha;
  real sigma_beta;
}
generated quantities {
  real mu = normal_rng(mu_mean, mu_sigma);
  real sigma = gamma_rng(sigma_alpha, sigma_beta);
  real y = normal_rng(mu, sigma);
}
```


R code used to generate figures:

```r
library(rstan)
library(tidyverse)
library(patchwork)

# Compile the Stan program above (the file name here is illustrative)
scode = stan_model("prior_predictive.stan")

make_plot = function(x){
  # Fixed_param: no estimation, just forward draws from the priors
  fit1 = sampling(scode, data = x, algorithm = 'Fixed_param', iter = 10000, chains = 1)

  t1 = tibble(y = extract(fit1)$y)

  p1 = t1 %>%
    ggplot(aes(y)) +
    geom_histogram() +
    xlim(0, 50)

  return(p1)
}

d1 = list(mu_mean = 25, mu_sigma = 1, sigma_alpha = 5, sigma_beta = 1)
d2 = list(mu_mean = 25, mu_sigma = 1, sigma_alpha = 3, sigma_beta = 1)
d3 = list(mu_mean = 25, mu_sigma = 1, sigma_alpha = 1, sigma_beta = 1)
d4 = list(mu_mean = 25, mu_sigma = 1, sigma_alpha = .1, sigma_beta = 2)
d = list(d1, d2, d3, d4)

y = map(d, make_plot)

# patchwork composes plots with `+`, so reduce over that operator
reduce(y, `+`)
```