# How do Bayesian Statistics handle the absence of priors?

This question was inspired by two recent interactions I had, one here in CV, the other over at economics.se.

There, I had posted an answer to the well-known “Envelope Paradox” (mind you, not as the “correct answer” but as the answer flowing from specific assumptions about the structure of the situation). After a time a user posted a critical comment, and I engaged in conversation trying to understand his point. It was obvious that he was thinking the Bayesian way and kept talking about priors, and then it dawned on me, and I said to myself: “Wait a minute, who said anything about any prior? In the way I have formulated the problem, there are no priors here; they just don’t enter the picture, and don’t need to.”

Recently, I saw this answer here in CV, about the meaning of Statistical Independence. I commented to the author that his sentence

“… if events are statistically independent then (by definition) we
cannot learn about one from observing the other.”

was blatantly wrong. In a comment exchange, he kept returning to the issue of (his words)

“Wouldn’t “learning” mean changing our beliefs about a thing based on
observation of another? If so, doesn’t independence (definitionally)
preclude this?”

Once again, it was obvious that he was thinking the Bayesian way, and that he considered it self-evident that we start with some beliefs (i.e. a prior), and then the issue is how we can change/update them. But how is the very first belief created?

Since science must conform to reality, I note that situations exist where the human beings involved have no priors (I, for one, walk into situations without any prior all the time, and please don’t argue that I do have priors but just don’t realize it; let’s spare ourselves bogus psychoanalysis here).

Since I happen to have heard the term “uninformative priors”, I break my question into two parts, and I am pretty certain that users here who are savvy in Bayesian theory know exactly what I am about to ask:

Q1: Is the absence of a prior equivalent (in the strict theoretical sense) to having an uninformative prior?

If the answer to Q1 is “Yes” (with some elaboration please), then it means that the Bayesian approach is applicable universally and from the beginning, since whenever the human being involved declares “I have no priors” we can substitute in its place a prior that is uninformative for the case at hand.

But if the answer to Q1 is “No”, then Q2 comes along:

Q2 : If the answer to Q1 is “No”, does this mean that, in cases where there are no priors, the Bayesian approach is not applicable from the beginning, and we have to first form a prior by some non-Bayesian way, so that we can subsequently apply the Bayesian approach?

Q1: Is the absence of a prior equivalent (in the strict theoretical sense) to having an uninformative prior?

No.

First, there is no mathematical definition of an “uninformative prior”. The term is only used informally to describe some priors.

For example, the Jeffreys prior is often called “uninformative”. It generalizes the uniform prior used for translation-invariant problems. The Jeffreys prior adapts to the (information-theoretic) Riemannian geometry of the model, so it does not depend on the parametrization, only on the geometry of the manifold (in the space of distributions) that the model forms. It might be perceived as canonical, but it is only a choice: it is simply the uniform prior with respect to that Riemannian structure. It is not absurd to define “uninformative = uniform” as a simplification of the question. This applies to many cases and helps to ask a clear and simple question.

Doing Bayesian inference without a prior is like asking “how can I guess $E(X)$ without any assumption about the distribution of $X$, knowing only that $X$ takes values in $[0,1]$?” This question obviously makes no sense. If you answer 0.5, you probably have a distribution in mind.
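For reference, here is a short sketch of the standard textbook definition behind that statement, written for a one-dimensional parameter; the notation ($\pi_J$, $I$, $h$) is introduced here for illustration and is not part of the original answer. The Jeffreys prior is defined from the Fisher information:

$$\pi_J(\theta) \propto \sqrt{I(\theta)}, \qquad I(\theta) = -E_\theta\!\left[\frac{\partial^2}{\partial\theta^2}\log p(X\mid\theta)\right].$$

Under a smooth one-to-one reparametrization $\varphi = h(\theta)$, the Fisher information transforms as $I(\varphi) = I(\theta)\left(\frac{d\theta}{d\varphi}\right)^2$, hence

$$\sqrt{I(\varphi)} = \sqrt{I(\theta)}\,\left|\frac{d\theta}{d\varphi}\right|,$$

which is exactly the change-of-variables rule for a density. This is the sense in which the prior does not depend on the parametrization. For $X\sim N(\mu,1)$ we have $\log p(x\mid\mu) = -\tfrac{1}{2}(x-\mu)^2 + \text{const}$, so $I(\mu) = 1$ and $\pi_J(\mu)\propto 1$, the flat prior on the location parameter.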

The Bayesian and frequentist approaches simply answer different questions. Take estimation, which is perhaps the simplest example:

• Frequentist (for example): “How can I estimate $\theta$ such that my answer has the smallest error (averaged only over $x$) in the worst case (over $\theta$)?” This leads to minimax estimators.

• Bayesian: “How can I estimate $\theta$ such that my answer has the smallest error on average (over $\theta$)?” This leads to Bayes estimators. But the question is incomplete: it must specify “average in what sense?” Thus the question is only complete once it contains a prior.

In short, the frequentist approach (in this form) aims at worst-case control and does not need a prior. The Bayesian approach aims at average control and requires a prior to say “average in what sense?”.
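To make the contrast concrete, here is a minimal sketch in Python. It is not from the original answer: the binomial model, the sample size `n = 20`, squared-error loss, and the choice of a uniform prior for the “average” are all assumptions picked for illustration. It compares the worst-case risk and the prior-averaged risk of three estimators of a success probability $\theta$: the sample proportion, the posterior mean under a uniform prior, and the classical minimax estimator for this problem.

```python
import math


def binom_pmf(n, k, theta):
    """P(X = k) for X ~ Binomial(n, theta)."""
    return math.comb(n, k) * theta**k * (1 - theta) ** (n - k)


def risk(estimator, n, theta):
    """Squared-error risk E[(estimator(X, n) - theta)^2], averaged over x only."""
    return sum(
        binom_pmf(n, k, theta) * (estimator(k, n) - theta) ** 2
        for k in range(n + 1)
    )


def mle(k, n):
    # Sample proportion.
    return k / n


def bayes_uniform(k, n):
    # Posterior mean under a uniform prior on theta, i.e. Beta(1, 1).
    return (k + 1) / (n + 2)


def minimax(k, n):
    # Classical minimax estimator for squared error; its risk is constant in theta.
    r = math.sqrt(n)
    return (k + r / 2) / (n + r)


n = 20   # illustrative sample size (assumption)
m = 200  # number of grid intervals on [0, 1]
grid = [i / m for i in range(m + 1)]

summary = {}
for name, est in [("mle", mle), ("bayes_uniform", bayes_uniform), ("minimax", minimax)]:
    risks = [risk(est, n, t) for t in grid]
    worst = max(risks)
    # Trapezoid rule: approximate average risk under the uniform prior on [0, 1].
    average = (sum(risks) - 0.5 * (risks[0] + risks[-1])) / m
    summary[name] = (worst, average)
    print(f"{name:14s} worst-case risk = {worst:.5f}   average risk = {average:.5f}")
```

Under these assumptions the minimax rule has the smallest worst-case risk, while the uniform-prior Bayes rule has the smallest risk averaged over that prior. Changing the prior changes which rule wins the “average” comparison, which is the sense in which the Bayesian question is incomplete without one.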

Q2 : If the answer to Q1 is “No”, does this mean that, in cases where there are no priors, the Bayesian approach is not applicable from the beginning, and we have to first form a prior by some non-Bayesian way, so that we can subsequently apply the Bayesian approach?

Yes.

But beware of canonical prior constructions. They might sound mathematically appealing, but they are not automatically realistic from a Bayesian point of view. A mathematically nice prior may actually correspond to a dumb belief system. For example, if you study $X\sim N(\mu,1)$, the Jeffreys prior on $\mu$ is uniform, and if $\mu$ is, say, the average height of people, a flat prior over the whole real line is not a very realistic belief system. However, with even a few observations the problem disappears quite fast, and the choice is not very important.

In my opinion, the real difficulties with prior specification arise in more complicated problems. What is important here is to understand what a given prior says.
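As a rough check of how quickly the prior washes out in this model, here is a small Python sketch. Everything numeric in it is a hypothetical choice for illustration (the true mean `172.0`, the “realistic” prior $N(170, 10^2)$, the five observations, the seed); only the model $X\sim N(\mu,1)$ comes from the text. It compares the posterior for $\mu$ under the flat prior with the posterior under the proper normal prior, using the standard conjugate update.

```python
import random


def posterior_flat(xs):
    """Posterior of mu under the flat prior: N(mean(xs), 1/n) when sigma = 1."""
    n = len(xs)
    return sum(xs) / n, 1.0 / n


def posterior_normal(xs, prior_mean, prior_var):
    """Posterior of mu under a N(prior_mean, prior_var) prior (conjugate update)."""
    n = len(xs)
    precision = 1.0 / prior_var + n  # prior precision + data precision (sigma = 1)
    mean = (prior_mean / prior_var + sum(xs)) / precision
    return mean, 1.0 / precision


random.seed(0)
true_mu = 172.0  # hypothetical true mean, in units where the standard deviation is 1
data = [random.gauss(true_mu, 1.0) for _ in range(5)]

flat_mean, flat_var = posterior_flat(data)
# Hypothetical "realistic" prior: centered at 170 with standard deviation 10.
info_mean, info_var = posterior_normal(data, prior_mean=170.0, prior_var=10.0**2)

gap = abs(flat_mean - info_mean)
print(f"flat prior:   mean = {flat_mean:.4f}, variance = {flat_var:.4f}")
print(f"normal prior: mean = {info_mean:.4f}, variance = {info_var:.4f}")
print(f"difference in posterior means = {gap:.5f}")
```

With these assumed numbers the two posteriors are nearly identical after five observations, because the data precision ($n$) dominates the prior precision ($1/100$). A much tighter or badly centered prior would take longer to be overridden.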