# Are sampling distributions legitimate for inference?

Some Bayesians attack frequentist inference by stating that “there is no unique sampling distribution” because it depends on the intentions of the researcher (Kruschke, Aguinis, & Joo, 2012, p. 733).

For instance, say a researcher starts data collection, but his funding was unexpectedly cut after 40 participants. How would the sampling distributions (and subsequent CIs and p-values) even be defined here? Would we just assume each constituent sample has N = 40? Or would it consist of samples with different N, with each size determined by other random times his funding may have been cut?

The null distributions found in textbooks (t, F, chi-square, etc.) all assume that N is fixed and constant across the constituent samples, but this may not be true in practice. With every different stopping procedure (e.g., after a certain time interval or until my assistant gets tired) there seems to be a different sampling distribution, so using these ‘tried and true’ fixed-N distributions seems inappropriate.

How damaging is this criticism to the legitimacy of frequentist CIs and p-values? Are there theoretical rebuttals? It seems that by attacking the concept of the sampling distribution, the entire edifice of frequentist inference is tenuous.

Any scholarly references are greatly appreciated.

Typically you’d carry out inference conditional on the actual sample size $n$, because it’s ancillary to the parameters of interest; i.e. it contains no information about their true values, only affecting the precision with which you can measure them. Cox (1958), “Some Problems Connected with Statistical Inference”, Ann. Math. Statist. 29, 2 is usually cited as first explicating what’s sometimes known as the Conditionality Principle, though it was implicit in much earlier work, harking back to Fisher’s idea of “relevant subsets”.
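
As a rough illustration of why conditioning on an ancillary $n$ is harmless, the sketch below simulates a study whose sample size is decided by something unrelated to the results, then checks how often the usual fixed-$n$ interval covers the true mean. Everything specific here is an assumption made for the example, not part of the argument above: the function name `coverage_with_random_n`, the uniform range for $n$, normal data with known $\sigma$, and the use of a z-interval rather than a t-interval.

```python
import random
from statistics import NormalDist, fmean


def coverage_with_random_n(reps=4000, mu=0.3, sigma=1.0, level=0.95, seed=1):
    """Estimate coverage of the fixed-n z-interval when n itself is random.

    Assumption: n is drawn independently of the data (e.g. funding is cut at
    an arbitrary time), so n is ancillary for mu.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided normal critical value
    hits = 0
    for _ in range(reps):
        n = rng.randint(10, 80)  # hypothetical stopping time, unrelated to results
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        half_width = z * sigma / n ** 0.5  # interval computed as if n had been fixed
        if abs(fmean(xs) - mu) <= half_width:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    # Should land close to the nominal level despite n varying between studies.
    print(coverage_with_random_n())
```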

If your researcher’s funding was cut off because results so far were disappointing, then of course $n$ isn’t ancillary. Perhaps the simplest illustration of the problem is estimation of a Bernoulli probability from either a binomial (fixed number of trials) or negative binomial (fixed number of successes) sampling scheme. The sufficient statistic is the same under either, but its distribution differs. How would you analyze an experiment where you didn’t know which was followed? Berger & Wolpert (1988), The Likelihood Principle, discuss the implications of this and other stopping rules for inference.
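
To make the binomial versus negative binomial contrast concrete, here is a small sketch with made-up data: 9 successes and 3 failures, testing $\theta = 0.5$ against $\theta > 0.5$. The data, the one-sided test, and both function names are assumptions chosen for illustration. Under the inverse scheme the code stops at the 3rd failure, which mirrors "fixed number of successes" with the labels swapped. The counts are identical under both schemes, yet the tail probabilities differ.

```python
from math import comb


def binomial_p_value(successes, n, theta0=0.5):
    """P(X >= successes) when the number of trials n was fixed in advance."""
    return sum(
        comb(n, k) * theta0**k * (1 - theta0) ** (n - k)
        for k in range(successes, n + 1)
    )


def negative_binomial_p_value(successes, failures, theta0=0.5):
    """P(at least `successes` successes occur before the `failures`-th failure).

    Equivalent event: at most failures - 1 failures among the first
    successes + failures - 1 trials.
    """
    m = successes + failures - 1
    return sum(
        comb(m, j) * (1 - theta0) ** j * theta0 ** (m - j)
        for j in range(failures)
    )


if __name__ == "__main__":
    # Same data (9 successes, 3 failures), two sampling schemes.
    print(binomial_p_value(9, 12))           # 299/4096, about 0.073
    print(negative_binomial_p_value(9, 3))   # 67/2048, about 0.033
```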

You might want to think about what happens if you don’t take any sampling distribution into account. Armitage (1961), “Comment on ‘Consistency in Statistical Inference and Decision’ by Smith”, JRSS B, 23, 1 pointed out that if you sample $x$ from a normal distribution (with known unit variance) until $\sqrt{n} \bar{x} \geq k$, the likelihood ratio for testing that the mean $\mu=0$ vs $\mu\neq0$ is $\frac{L(0)}{L(\bar{x})} = \mathrm{e}^{-n\bar{x}^2/2} \leq \mathrm{e}^{-k^2/2}$, so the researcher can set a bound on this in advance by an appropriate choice of $k$. Only a frequentist analysis can take the distribution of the likelihood ratio under this rather unfair-seeming sampling scheme into account. See the responses of Kerridge (1963), “Bounds for the frequency of misleading Bayes inferences”, Ann. Math. Stat., 34; Cornfield (1966), “Sequential trials, sequential analysis, and the likelihood principle”, The American Statistician, 20, 2; and Kadane (1996), “Reasoning to a foregone conclusion”, JASA, 91, 435.
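
A minimal simulation of this kind of optional stopping might look as follows. It is a sketch under stated assumptions rather than Armitage's own construction: the data are generated with $\mu = 0$ and unit variance, the rule is the two-sided variant (stop once $|\sqrt{n}\,\bar{x}| \geq k$), sampling is capped at a hypothetical `max_n`, and the names `sample_until_boundary` and `simulate` are invented for the example.

```python
import math
import random


def sample_until_boundary(k, max_n, rng):
    """Draw N(0, 1) observations until |sqrt(n) * xbar| >= k or max_n is hit.

    Returns (stopped, n, lr), where lr = L(0) / L(xbar) = exp(-n * xbar^2 / 2).
    """
    total = 0.0
    z = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(0.0, 1.0)   # the true mean really is 0
        z = total / math.sqrt(n)       # equals sqrt(n) * xbar
        if abs(z) >= k:
            return True, n, math.exp(-z * z / 2)
    return False, max_n, math.exp(-z * z / 2)


def simulate(k=2.0, max_n=1000, reps=500, seed=0):
    """Run many independent studies that all use the same stopping rule."""
    rng = random.Random(seed)
    return [sample_until_boundary(k, max_n, rng) for _ in range(reps)]


if __name__ == "__main__":
    k = 2.0
    runs = simulate(k=k)
    rate = sum(stopped for stopped, _, _ in runs) / len(runs)
    print("fraction of studies that reached the boundary:", rate)
    print("guaranteed bound on the likelihood ratio:", math.exp(-k * k / 2))
```

Every run that reaches the boundary reports a likelihood ratio no larger than $\mathrm{e}^{-k^2/2}$ against a null hypothesis that is in fact true, and the fraction of such runs grows as `max_n` is raised. That fraction is exactly the kind of quantity a sampling distribution describes.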