I have a couple of questions about prediction and tolerance intervals.
Let’s agree on the definition of the tolerance intervals first: We are given a confidence level, say 90%, the percentage of the population to capture, say 99%, and a sample size, say 20. The probability distribution is known, say normal for convenience. Now, given the above three numbers (90%, 99% and 20) and the fact that the underlying distribution is normal, we can compute the tolerance number k. Given a sample (x_1,x_2,\ldots,x_{20}) with mean \bar{x} and standard deviation s, the tolerance interval is \bar{x}\pm ks. If this tolerance interval captures 99% of the population, then the sample (x_1,x_2,\ldots,x_{20}) is called a success and the requirement is that 90% of the samples are successes.
Comment: 90% is the a priori probability for a sample to be a success. 99% is the conditional probability that a future observation will be in the tolerance interval, given that the sample is a success.
My questions:
Can we see prediction intervals as tolerance intervals? Looking on the web I got conflicting answers on this, not to mention that nobody really defined the prediction intervals carefully. So, if you have a precise definition of the prediction interval (or a reference), I would appreciate it. What I understood is that a 99% prediction interval, for instance, does not capture 99% of all future values for all samples. This would be the same as a tolerance interval that captures 99% of the population with 100% probability.
In the definitions I found for a 90% prediction interval, 90% is the a priori probability, given a sample, say (x_1,x_2,\ldots,x_{20}) (size is fixed), and a single future observation y, that y will be in the prediction interval. So, it seems that the sample and the future value are given at the same time. This contrasts with the tolerance interval, where the sample is given first and is a success with a certain probability; then, given that the sample is a success, a future value is drawn and falls into the tolerance interval with a certain probability. I am not sure if the above definition of the prediction interval is right or not, but it seems counterintuitive (at least).
Any help?
Answer
Your definitions appear to be correct.
The book to consult about these matters is Statistical Intervals by Gerald Hahn & William Meeker (1991). I quote:
A prediction interval for a single future observation is an interval that will, with a specified degree of confidence, contain the next (or some other prespecified) randomly selected observation from a population.
[A] tolerance interval is an interval that one can claim to contain at least a specified proportion, p, of the population with a specified degree of confidence, 100(1-\alpha)\%.
Here are restatements in standard mathematical terminology. Let the data \mathbf{x}=(x_1,\ldots,x_n) be considered a realization of independent random variables \mathbf{X}=(X_1,\ldots,X_n) with common cumulative distribution function F_\theta. (\theta appears as a reminder that F may be unknown but is assumed to lie in a given set of distributions \{F_\theta \vert \theta \in \Theta\}). Let X_0 be another random variable with the same distribution F_\theta and independent of the first n variables.

A prediction interval (for a single future observation), given by endpoints [l(\mathbf{x}), u(\mathbf{x})], has the defining property that
\inf_\theta\{{\Pr}_\theta(X_0 \in [l(\mathbf{X}), u(\mathbf{X})])\} = 100(1-\alpha)\%.
Specifically, {\Pr}_\theta refers to the n+1 variate distribution of (X_0, X_1, \ldots, X_n) determined by the law F_\theta. Note the absence of any conditional probabilities: this is a full joint probability. Note, too, the absence of any reference to a temporal sequence: X_0 very well may be observed in time before the other values. It does not matter.
I’m not sure which aspect(s) of this may be “counterintuitive.” If we conceive of selecting a statistical procedure as an activity to be pursued before collecting data, then this is a natural and reasonable formulation of a planned two-step process, because both the data (X_i, i=1,\ldots,n) and the “future value” X_0 need to be modeled as random.

A tolerance interval, given by endpoints (L(\mathbf{x}), U(\mathbf{x})], has the defining property that
\inf_\theta\{{\Pr}_\theta\left(F_\theta(U(\mathbf{X})) - F_\theta(L(\mathbf{X})) \ge p\right)\} = 100(1-\alpha)\%.
Note the absence of any reference to X_0: it plays no role.
When \{F_\theta\} is the set of Normal distributions, there exist prediction intervals of the form
l(\mathbf{x}) = \bar{x} - k(\alpha, n) s, \quad u(\mathbf{x}) = \bar{x} + k(\alpha, n) s
(\bar{x} is the sample mean and s is the sample standard deviation). Values of the function k, which Hahn & Meeker tabulate, do not depend on the data \mathbf{x}. There are other prediction interval procedures, even in the Normal case: these are not the only ones.
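The joint-probability reading of the prediction interval can be checked numerically. The following is a minimal simulation sketch, not a definitive implementation. It assumes the usual Normal-theory choice k(\alpha, n) = t_{1-\alpha/2, n-1}\sqrt{1 + 1/n}, with the Student t quantile hard-coded from a standard table (1.7291 for \alpha = 0.10 and n = 20). The function names, seed, and trial count are illustrative and are not taken from Hahn & Meeker.

```python
import math
import random

# Assumption: 0.95 quantile of Student's t with 19 degrees of freedom,
# copied from a standard table (two-sided 90% interval, n = 20).
T_95_19 = 1.7291


def mean_sd(xs):
    """Return the sample mean and the (n - 1)-denominator standard deviation."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return m, s


def prediction_factor(n, t_quantile):
    """k(alpha, n) for the Normal-theory interval xbar +/- k * s."""
    return t_quantile * math.sqrt(1.0 + 1.0 / n)


def prediction_coverage(n=20, t_quantile=T_95_19, mu=0.0, sigma=1.0,
                        trials=20000, seed=1):
    """Estimate Pr(X_0 in [l(X), u(X)]) over the joint law of (X_0, X_1..X_n)."""
    rng = random.Random(seed)
    k = prediction_factor(n, t_quantile)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]  # the sample
        x0 = rng.gauss(mu, sigma)                      # the "future" value
        m, s = mean_sd(xs)
        if m - k * s <= x0 <= m + k * s:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Both estimates should be close to 0.90, whatever mu and sigma are.
    print(prediction_coverage())
    print(prediction_coverage(mu=5.0, sigma=3.0))
```

Each trial draws the sample and X_0 together, and no conditioning occurs anywhere: the 90% is a statement about the pair. Changing mu and sigma should leave the estimate essentially unchanged, which is the role of the infimum over \theta in the definition.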
Similarly, there exist tolerance intervals of the form
L(\mathbf{x}) = \bar{x} - K(\alpha, n, p) s, \quad U(\mathbf{x}) = \bar{x} + K(\alpha, n, p) s.
There are other tolerance interval procedures: these are not the only ones.
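The tolerance-interval property can be simulated in the same spirit. This sketch is only illustrative: it assumes Howe's approximation K \approx z_{(1+p)/2}\sqrt{\nu(1 + 1/n)/\chi^2_{\alpha,\nu}} with \nu = n - 1, which is one common approximation rather than the exact tabulated factor, and it hard-codes the lower 10% point of the chi-squared distribution with 19 degrees of freedom (11.651) from a standard table. All function names and defaults are hypothetical.

```python
import math
import random
from statistics import NormalDist

# Assumption: lower 10% quantile of chi-squared with 19 degrees of freedom,
# copied from a standard table (confidence 90%, n = 20).
CHI2_10_19 = 11.651


def mean_sd(xs):
    """Return the sample mean and the (n - 1)-denominator standard deviation."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return m, s


def howe_factor(n, p, chi2_lower):
    """Approximate two-sided tolerance factor K (Howe's approximation)."""
    z = NormalDist().inv_cdf((1.0 + p) / 2.0)
    nu = n - 1
    return z * math.sqrt(nu * (1.0 + 1.0 / n) / chi2_lower)


def tolerance_confidence(n=20, p=0.99, chi2_lower=CHI2_10_19,
                         mu=0.0, sigma=1.0, trials=20000, seed=1):
    """Estimate Pr(F(U(X)) - F(L(X)) >= p): the share of 'successful' samples."""
    rng = random.Random(seed)
    pop = NormalDist(mu, sigma)
    K = howe_factor(n, p, chi2_lower)
    successes = 0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        m, s = mean_sd(xs)
        # Population content of this sample's interval; no X_0 is drawn.
        content = pop.cdf(m + K * s) - pop.cdf(m - K * s)
        if content >= p:
            successes += 1
    return successes / trials


if __name__ == "__main__":
    print(howe_factor(20, 0.99, CHI2_10_19))  # roughly 3.37
    print(tolerance_confidence())             # should be close to 0.90
```

No future observation is simulated here: each sample is scored only by how much of the population its interval contains, and the confidence level is the fraction of samples whose content reaches p.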
Noting the similarity among these pairs of formulas, we may solve the equation
k(\alpha, n) = K(\alpha', n, p).
This allows one to reinterpret a prediction interval as a tolerance interval (in many different possible ways by varying \alpha' and p) or to reinterpret a tolerance interval as a prediction interval (only now \alpha usually is uniquely determined by \alpha' and p). This may be one origin of the confusion.
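To illustrate the "many different possible ways," one can fix the multiplier of a 90% prediction interval for n = 20 and estimate, for several choices of p, the confidence 1 - \alpha' that the same interval would carry as a tolerance interval. The sketch below does this by simulation rather than by solving the equation exactly. The multiplier assumes k = t_{0.95, 19}\sqrt{1 + 1/20} with the table value t_{0.95, 19} = 1.7291, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

# Assumption: multiplier of a two-sided 90% Normal prediction interval, n = 20,
# built from the table value t_{0.95, 19} = 1.7291.
K_PRED = 1.7291 * math.sqrt(1.0 + 1.0 / 20)


def mean_sd(xs):
    """Return the sample mean and the (n - 1)-denominator standard deviation."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return m, s


def implied_confidence(k, p_values, n=20, trials=20000, seed=1):
    """For a fixed multiplier k, estimate Pr(content >= p) for each p.

    The same simulated samples are reused for every p, so the estimates
    are automatically non-increasing in p.
    """
    rng = random.Random(seed)
    std = NormalDist()  # standard Normal population; content is scale-free
    counts = {p: 0 for p in p_values}
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m, s = mean_sd(xs)
        content = std.cdf(m + k * s) - std.cdf(m - k * s)
        for p in p_values:
            if content >= p:
                counts[p] += 1
    return {p: c / trials for p, c in counts.items()}


if __name__ == "__main__":
    for p, conf in implied_confidence(K_PRED, [0.80, 0.90, 0.95]).items():
        print(f"p = {p:.2f}: estimated confidence 1 - alpha' = {conf:.3f}")
```

Each printed pair (p, 1 - \alpha') is a different tolerance-interval reading of the one prediction interval: asking for a larger proportion p lowers the confidence that can be claimed.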
Attribution
Source : Link , Question Author : Ioannis Souldatos , Answer Author : whuber