Some Bayesians attack frequentist inference by arguing that “there is no unique sampling distribution”, because it depends on the intentions of the researcher (Kruschke, Aguinis, & Joo, 2012, p. 733).
For instance, say a researcher starts data collection, but his funding is unexpectedly cut after 40 participants. How would the sampling distribution (and the resulting CIs and p-values) even be defined here? Would we just assume each constituent sample has N = 40? Or would the sampling distribution consist of samples of different sizes, each determined by the other random times at which his funding might have been cut?
The t, F, chi-square (etc.) null distributions found in textbooks all assume that N is fixed and constant across the constituent samples, but this may not be true in practice. Every different stopping procedure (e.g., stopping after a certain time interval, or when my assistant gets tired) seems to imply a different sampling distribution, and using the ‘tried and true’ fixed-N distributions would then be inappropriate.
How damaging is this criticism to the legitimacy of frequentist CIs and p-values? Are there theoretical rebuttals? If the very concept of the sampling distribution can be attacked in this way, the entire edifice of frequentist inference seems tenuous.
Any scholarly references are greatly appreciated.
Typically you’d carry out inference conditional on the actual sample size n, because it’s ancillary to the parameters of interest; i.e. it contains no information about their true values, only affecting the precision with which you can measure them. Cox (1958), “Some Problems Connected with Statistical Inference”, Ann. Math. Statist. 29, 2 is usually cited as first explicating what’s sometimes known as the Conditionality Principle, though it was implicit in much earlier work, harking back to Fisher’s idea of “relevant subsets”.
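The point about ancillarity can be illustrated with a toy simulation (the coin-flip choice between n = 10 and n = 100, and the known σ = 1, are my own illustrative setup, not from Cox's paper): because the distribution of n is free of μ, the usual CI computed conditional on the n you actually got has its nominal coverage within each stratum, so conditioning loses nothing.

```python
import math
import random

random.seed(0)
mu = 3.0          # true mean (known only to the simulation)
z = 1.96          # nominal 95% normal CI, sigma = 1 assumed known
cover = {10: [0, 0], 100: [0, 0]}

for _ in range(20_000):
    n = random.choice([10, 100])   # sample size decided by a coin flip:
                                   # ancillary -- its distribution is free of mu
    xbar = random.gauss(mu, 1.0 / math.sqrt(n))   # distribution of the sample mean
    lo, hi = xbar - z / math.sqrt(n), xbar + z / math.sqrt(n)
    cover[n][0] += lo <= mu <= hi
    cover[n][1] += 1

for n, (hit, tot) in cover.items():
    print(n, round(hit / tot, 3))   # coverage close to 0.95 conditional on either n
```

Conditional coverage is approximately 95% whether n turned out to be 10 or 100, which is why inference conditional on the realized n is the natural frequentist analysis here.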
If your researcher’s funding was cut off because results so far were disappointing, then of course n isn’t ancillary. Perhaps the simplest illustration of the problem is estimation of a Bernoulli probability from either a binomial (fixed no. of trials) or negative binomial (fixed no. of successes) sampling scheme. The sufficient statistic is the same under either, but its distribution differs. How would you analyze an experiment where you didn’t know which was followed? Berger & Wolpert (1988), The Likelihood Principle discuss the implications of this & other stopping rules for inference.
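The standard numerical illustration runs as follows (the particular numbers — 9 successes, 3 failures, testing θ = ½ — are an illustrative choice, not from the references above). The likelihood is proportional to θ⁹(1 − θ)³ under both schemes, so a likelihood-based analysis cannot distinguish them, yet the one-sided p-values differ:

```python
from math import comb

theta = 0.5          # null hypothesis: P(success) = 1/2
s, f = 9, 3          # observed: 9 successes, 3 failures (illustrative numbers)

# Under BOTH schemes the likelihood is proportional to theta^s * (1 - theta)^f,
# so the sufficient statistic is identical -- but the p-values are not.

# Binomial: n = s + f = 12 trials fixed in advance; p-value = P(successes >= 9)
p_binom = sum(comb(s + f, k) * theta**k * (1 - theta)**(s + f - k)
              for k in range(s, s + f + 1))

# Negative binomial: sample until the f-th failure; p-value = P(successes >= 9),
# where P(X = x) = C(x + f - 1, f - 1) * theta^x * (1 - theta)^f
p_negbin = 1 - sum(comb(x + f - 1, f - 1) * theta**x * (1 - theta)**f
                   for x in range(s))

print(round(p_binom, 4), round(p_negbin, 4))  # 0.073 vs 0.0327
```

At the conventional 5% level the two stopping rules lead to opposite decisions from identical data, which is exactly the tension the Likelihood Principle literature turns on.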
You might want to think about what happens if you don’t take any sampling distribution into account. Armitage (1961), “Comment on ‘Consistency in Statistical Inference and Decision’ by Smith”, JRSS B, 23, 1 pointed out that if you sample $x$ from a normal distribution until $\sqrt{n}\,\bar{x} \geq k$, the likelihood ratio for testing the mean $\mu = 0$ vs $\mu \neq 0$ satisfies $\frac{L(0)}{L(\bar{x})} \leq e^{-k^2/2}$, so the researcher can set a bound on it in advance by an appropriate choice of $k$. Only a frequentist analysis can take the distribution of the likelihood ratio under this rather unfair-seeming sampling scheme into account. See the responses of Kerridge (1963), “Bounds for the frequency of misleading Bayes inferences”, Ann. Math. Statist., 34; Cornfield (1966), “Sequential trials, sequential analysis, and the likelihood principle”, The American Statistician, 20, 2; & Kadane (1996), “Reasoning to a foregone conclusion”, JASA, 91, 435.
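A minimal simulation sketch of Armitage's scheme (k = 2 and the cap of 10,000 draws are arbitrary choices for illustration; the law of the iterated logarithm says the stopping rule would fire eventually with probability 1 under the null, the cap just keeps the run finite): every run that stops delivers a likelihood ratio against $\mu = 0$ at or below $e^{-k^2/2}$, even though the data were in fact generated under $\mu = 0$.

```python
import math
import random

random.seed(1)
k = 2.0             # stop as soon as sqrt(n) * xbar >= k
max_n = 10_000      # give up after this many draws to keep each run finite
trials = 200
lrs = []            # likelihood ratios L(0)/L(xbar) at the stopping time

for _ in range(trials):
    s, n = 0.0, 0
    while n < max_n:
        s += random.gauss(0.0, 1.0)   # data generated under the null, mu = 0
        n += 1
        if s / math.sqrt(n) >= k:     # s / sqrt(n) equals sqrt(n) * xbar
            xbar = s / n
            # For N(mu, 1) data, L(0)/L(xbar) = exp(-n * xbar^2 / 2)
            lrs.append(math.exp(-n * xbar**2 / 2))
            break

print(f"{len(lrs)}/{trials} runs under mu = 0 stopped; every one has "
      f"L(0)/L(xbar) <= {math.exp(-k**2 / 2):.3f}")
```

The Bayesian or pure-likelihood analyst who ignores the stopping rule sees what looks like strong evidence against the true null in every stopped run; only by considering the sampling distribution induced by the stopping rule does the unfairness of the scheme become visible.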
Pointing out the dependence of frequentist inference on a researcher’s intentions is a handy dig at people (if there still are any) who get on their high horse about the “subjectivity” of Bayesian inference. Personally, I can live with it; the performance of a procedure over a long series of repetitions is always going to be something more or less notional, which doesn’t detract from its being a useful thing to consider (“a calibration of the likelihood” was how Cox described p-values). From the dates of the references you might have noticed that these issues aren’t very new; attempts to settle them by a priori argumentation have largely died down (except on the Internet, always behind the times except in trivial matters) & been replaced by acknowledgement that neither Bayesian nor frequentist statistics are going to collapse under the weight of their internal contradictions, & that there’s more than one useful way to apply probability theory to inference from noisy data.
PS: Thinking to add a counter-balance to Berger & Wolpert I happened upon Cox & Mayo (2010), “Objectivity and Conditionality in Frequentist Inference” in Error and Inference. There’s quite likely an element of wishful thinking in my assertion that the debate has died down, but it’s striking how little new there is to be said on the matter after half a century or so. (All the same, this is a concise & eloquent defence of frequentist ideas.)