# Interpretation of confidence interval

Note: apologies in advance if this is a duplicate; I didn’t find a similar question in my search.

Say we have a true parameter p. A confidence interval C(X) is a RV that contains p, say 95% of the time. Now suppose we observe X and compute C(X). The common answer seems to be that it is incorrect to interpret this as having a “95% chance of containing p” since it “either does or it doesn’t contain p”

However, let’s say I pick a card from the top of a shuffled deck and leave it face down. Intuitively I think of the probability of this card being the Ace of Spades as 1/52, even though in reality “it either is or it isn’t the Ace of Spades.” Why can’t I apply this reasoning to the example of the confidence interval?

Or if it is not meaningful to talk of the “probability” of the card being the ace of spades since it “is or it isn’t”, I would still lay 51:1 odds that it isn’t the ace of spades. Is there another word to describe this information? How is this concept different than “probability”?

edit: Maybe to be more clear, from a bayesian interpretation of probability, if I’m told that a random variable contains p 95% of the time, given the realization of that random variable (and no other information to condition on) is it correct to say the random variable has a 95% probability of containing p?

edit: also, from a frequentist interpretation of probability, let’s say the frequentist agrees not to say anything like “there is a 95% probability that the confidence interval contains p”. Is it still logical for a frequentist to have a “confidence” that the confidence interval contains p?

Let alpha be the significance level and let t = 1 - alpha be the confidence level. Let K(t) be the frequentist’s “confidence” that the confidence interval contains p. It makes sense that K(t) should be increasing in t. When t = 1 (that is, 100%), the frequentist should have certainty (by definition) that the confidence interval contains p, so we can normalize K(1) = 1. Similarly, K(0) = 0. Presumably K(0.95) is somewhere between 0 and 1 and K(0.999999) is greater. In what way would the frequentist consider K different from P (the probability distribution)?

I think lots of conventional accounts of this matter are not clear.

Let's say you take a sample of size $100$ and get a $95\%$ confidence interval for $p$.

Then you take another sample of $100$, independent of the first, and get another $95\%$ confidence interval for $p$.

What changes is the confidence interval; what does not change is $p$. That means that in frequentist methods, one says the confidence interval is “random” but $p$ is “fixed” or “constant”, i.e. not random. In frequentist methods, such as the method of confidence intervals, one assigns probabilities only to things that are random.

So $\Pr(L<p<U)=0.95$ and $(L,U)$ is a confidence interval. ($L=$ "lower" and $U=$ "upper".) Take a new sample and $L$ and $U$ change but $p$ does not.
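The repeated-sampling reading can be checked with a quick simulation. This is only a sketch under assumptions that are mine, not part of the setup above: the data are normal with a known standard deviation `sigma`, the true value is `p = 42.0`, and the interval is the usual $z$-interval for a mean. In every trial $p$ stays fixed while $L$ and $U$ are recomputed from a fresh sample of size $100$.

```python
import random
import statistics

def confidence_interval(sample, sigma, z=1.96):
    """Approximate 95% z-interval for the mean, assuming sigma is known."""
    half_width = z * sigma / len(sample) ** 0.5
    center = statistics.fmean(sample)
    return center - half_width, center + half_width

def coverage(p, sigma, n, trials, seed=0):
    """Fraction of repeated samples whose interval (L, U) contains the fixed p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # p never changes; only the sample (and hence L and U) does.
        sample = [rng.gauss(p, sigma) for _ in range(n)]
        lower, upper = confidence_interval(sample, sigma)
        hits += lower < p < upper
    return hits / trials

# Hypothetical values chosen only for illustration.
print(coverage(p=42.0, sigma=8.0, n=100, trials=2000))
```

The printed fraction should land near $0.95$. It describes the procedure over many samples, not any single computed pair of endpoints.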

Let's say in a particular instance you have $L=40.53$ and $U=43.61$. In frequentist methods one would not assign a probability to the statement $40.53<p<43.61$, other than a probability of $0$ or $1$, because nothing here is random: $40.53$ is not random, $p$ is not random (since it won't change if we take a new sample), and $43.61$ is not random.

In practice, people do behave as if they're $95\%$ sure that $p$ is between $40.53$ and $43.61$. And as a practical matter, that may often make sense. But sometimes it doesn't. One such case is if numbers as large as $40$ or more are known in advance to be improbable, or if they are known to be highly probable. If one can assign some prior probability distribution to $p$, one uses Bayes' theorem to get a credible interval, which may differ from the confidence interval because of prior knowledge of which ranges of values of $p$ are probable or improbable.

It can also actually happen that the data themselves (the things that change if a new sample is taken) can tell you that $p$ is unlikely to be, or even certain not to be, as big as $40$. That can happen even in cases in which the pair $(L,U)$ is a sufficient statistic for $p$. That phenomenon can be dealt with in some instances by Fisher's method of conditioning on an ancillary statistic.

An example of this last phenomenon is when the sample consists of just two independent observations that are uniformly distributed in the interval $\theta\pm1/2$. Then the interval from the smaller of the two observations to the larger is a $50\%$ confidence interval. But if the distance between them is $0.001$, it would be absurd to be anywhere near $50\%$ sure that $\theta$ is between them, and if the distance is $0.999$, one would reasonably be almost $100\%$ sure $\theta$ is between them. The distance between them would be the ancillary statistic on which one would condition.
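The uniform example can be illustrated numerically. The sketch below is a simulation under assumed values (the true `theta = 3.0`, the seed, and the bin edges are arbitrary choices of mine). It draws many pairs of observations, records the distance between them and whether the interval between them contains $\theta$, and then compares the overall coverage with the coverage among pairs whose distance falls in a given range.

```python
import random

def simulate(theta, trials, seed=0):
    """Return (distance, covered) pairs for two uniform draws on theta +/- 1/2."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        x1 = rng.uniform(theta - 0.5, theta + 0.5)
        x2 = rng.uniform(theta - 0.5, theta + 0.5)
        lo, hi = min(x1, x2), max(x1, x2)
        results.append((hi - lo, lo < theta < hi))
    return results

def coverage(results, d_min=0.0, d_max=1.0):
    """Coverage among pairs whose distance d satisfies d_min <= d < d_max."""
    hits = [covered for d, covered in results if d_min <= d < d_max]
    return sum(hits) / len(hits)

results = simulate(theta=3.0, trials=200_000)
print("overall:         ", coverage(results))
print("distance < 0.05: ", coverage(results, 0.0, 0.05))
print("distance >= 0.5: ", coverage(results, 0.5, 1.0))
```

Overall the interval should cover $\theta$ about half the time, as the $50\%$ confidence level promises. Conditional on the distance $d$, a short calculation gives coverage $d/(1-d)$ for $d<1/2$ and $1$ for $d\ge 1/2$, so the small-distance bin should show coverage close to $0$ and the large-distance bin coverage of essentially $1$.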