For a given inference problem, we know that a Bayesian approach usually differs in both form and results from a frequentist approach. Frequentists (a group that usually includes me) often point out that their methods don’t require a prior and hence are more “data driven” than “judgement driven”. Of course, Bayesians can point to non-informative priors or, being pragmatic, just use a really diffuse prior.
My concern, especially after feeling a hint of smugness at my frequentist objectivity, is that perhaps my purportedly “objective” methods can be formulated in a Bayesian framework, albeit with some unusual prior and data model. In that case, am I just blissfully ignorant of the preposterous prior and model my frequentist method implies?
If a Bayesian pointed out such a formulation, I think my first reaction would be to say “Well, that’s nice that you can do that, but that’s not how I think about the problem!” However, who cares how I think about it, or how I formulate it? If my procedure is statistically/mathematically equivalent to some Bayesian model, then I am implicitly (unwittingly!) performing Bayesian inference.
Actual Question Below
This realization substantially undermined any temptation to be smug. However, I’m not sure if it’s true that the Bayesian paradigm can accommodate all frequentist procedures (again, provided the Bayesian chooses a suitable prior and likelihood). I know the converse is false.
I ask this because I recently posted a question about conditional inference, which led me to the following paper: here (see 3.9.5,3.9.6)
They point out Basu’s well-known result that there can be more than one ancillary statistic, raising the question of which “relevant subset” is most relevant. Even worse, they show two examples where, even if you have a unique ancillary statistic, it does not eliminate the presence of other relevant subsets.
They go on to conclude that only Bayesian methods (or methods equivalent to them) can avoid this problem, allowing unproblematic conditional inference.
It may not be the case that Bayesian Stats $\supset$ Frequentist Stats; that’s my question to this group. But it does appear that the fundamental choice between the two paradigms lies less in philosophy than in goals: do you need high conditional accuracy or low unconditional error:
High conditional accuracy seems applicable when we have to analyze a singular instance — we want to be right for THIS particular inference, despite the fact that this method may not be appropriate or accurate for the next dataset (hyper-conditionality/specialization).
Low unconditional error is appropriate when we are willing to make conditionally incorrect inferences in some cases, so long as our long run error is minimized or controlled. Honestly, after writing this, I’m not sure why I would want this unless I were strapped for time and couldn’t do a Bayesian analysis…hmmm.
I tend to favor likelihood-based frequentist inference, since I get some (asymptotic/approximate) conditionality from the likelihood function but don’t need to fiddle with a prior. However, I’ve become increasingly comfortable with Bayesian inference, especially if I see the prior as a regularization term for small sample inference.
Sorry for the aside. Any help for my main problem is appreciated.
I would argue that frequentists are indeed often “implicit/unwitting Bayesians”, as in practice we often want to perform probabilistic reasoning about things that don’t have a long run frequency. The classic example is Null Hypothesis Significance Testing (NHST), where what we really want to know is the relative probabilities of the Null and Research Hypotheses being true, but we can’t do this in a frequentist setting as the truth of a particular hypothesis has no (non-trivial) long run frequency: it is either true or it isn’t. Frequentist NHSTs get around this by substituting a different question, “what is the probability of observing an outcome at least as extreme under the null hypothesis?”, and then comparing the answer to a pre-determined threshold. However, this procedure does not logically allow us to conclude anything about whether $H_0$ or $H_1$ is true, and in drawing such a conclusion we are actually stepping out of a frequentist framework into a (usually subjective) Bayesian one, where we conclude that the probability of observing such an extreme value under $H_0$ is so low that we can no longer believe that $H_0$ is likely to be true (note this is implicitly assigning a probability to a particular hypothesis).
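To make the substitution concrete, the two quantities can be written side by side. This is only a sketch in notation I am introducing here: $T$ is a test statistic, $t_{\text{obs}}$ its observed value, and $D$ the observed data.

$$p = P(T \ge t_{\text{obs}} \mid H_0), \qquad P(H_0 \mid D) = \frac{P(D \mid H_0)\,P(H_0)}{P(D \mid H_0)\,P(H_0) + P(D \mid H_1)\,P(H_1)}$$

The first conditions on $H_0$ and needs no prior. The second is a statement about $H_0$ itself and cannot be evaluated without $P(H_0)$ and $P(H_1)$, which is exactly the step that is taken informally when a small $p$ is read as evidence against $H_0$.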
Note it isn’t actually true that frequentist procedures don’t have subjectivity or priors. In NHSTs the threshold on the p-value, $\alpha$, serves much the same purpose as the priors $p(H_0)$ and $p(H_1)$ in a Bayesian analysis. This is illustrated by the much-discussed XKCD cartoon:
The main reason the frequentist’s conclusion is unreasonable is that the value of $\alpha$ does not represent a reasonable state of knowledge regarding the detector and/or solar physics (we know that it is extremely unlikely that the sun has exploded, and rather less unlikely that the detector has a false alarm). Note that in this case the conclusion that the sun has exploded is inferred from a low p-value (a Bayesian inference) but is not logically entailed by it. The subjectivity is still there, but it is not stated explicitly in the analysis and is often neglected.
Arguably, confidence intervals are often used (and interpreted) as intervals in which we can expect to see the observations with a given probability, which again is a Bayesian interpretation.
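A rough numerical sketch of the cartoon may help. It assumes the setup as I recall it: the detector lies only when two dice both come up six, so with probability 1/36. The prior of one in a billion and the function name `posterior_exploded` are made up purely for illustration.

```python
from fractions import Fraction


def posterior_exploded(prior, p_lie=Fraction(1, 36)):
    """P(sun exploded | detector says yes), by Bayes' rule.

    prior : assumed prior probability that the sun has exploded
    p_lie : assumed probability that the detector lies (double six)
    """
    like_exploded = 1 - p_lie  # detector tells the truth
    like_not_exploded = p_lie  # detector lies
    numerator = like_exploded * prior
    return numerator / (numerator + like_not_exploded * (1 - prior))


# Frequentist reading: p-value = P(detector says yes | sun has NOT exploded)
p_value = Fraction(1, 36)
alpha = Fraction(1, 20)
rejects_null = p_value < alpha  # True: "the sun has exploded"

# Bayesian reading with a hypothetical one-in-a-billion prior
prior = Fraction(1, 10**9)
post = posterior_exploded(prior)

print(f"p-value = {float(p_value):.4f}, reject H0 at alpha=0.05: {rejects_null}")
print(f"posterior P(exploded | yes) = {float(post):.2e}")
```

With an even prior of 1/2 the same function returns 35/36, so the two analyses only agree when the prior is far more generous to the explosion than anything we actually believe. The threshold $\alpha = 0.05$ is standing in for that prior without saying so.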
Ideally statisticians ought to be aware of the benefits and disadvantages of both approaches and be prepared to use the right framework for the application at hand. Basically we should aim to use the analysis that provides the most direct answer to the question we actually want answered (and not quietly substitute a different one), so a frequentist approach is probably most efficient where we actually are interested in long-run frequencies and Bayesian methods where that is not the case.
I suspect that most frequentist questions can be answered by a Bayesian, as there is nothing to stop a Bayesian from answering questions like “what is the probability of observing a result at least as extreme if $H_0$ is true?”. However, I’ll need to do a bit of reading on that one. Interesting question.