How do I find values not given in (interpolate in) statistical tables?

Often people use programs to obtain p-values, but sometimes – for whatever reason – it may be necessary to obtain a critical value from a set of tables.

Given a statistical table with a limited number of significance levels, and a limited number of degrees of freedom, how do I obtain approximate critical values at other significance levels or degrees of freedom (such as with t, chi-square, or F tables)?

That is, how do I find the values “in between” the values in a table?


This answer is in two main parts: firstly, using linear interpolation, and secondly, using transformations for more accurate interpolation. The approaches discussed here are suitable for hand calculation when you have limited tables available, but if you’re implementing a computer routine to produce p-values, there are much better approaches (if tedious when done by hand) that should be used instead.

If you knew that the 10% (one tailed) critical value for a z-test was 1.28 and the 20% critical value was 0.84, a rough guess at the 15% critical value would be half-way between – (1.28+0.84)/2 = 1.06 (the actual value is 1.0364), and the 12.5% value could be guessed at halfway between that and the 10% value (1.28+1.06)/2 = 1.17 (actual value 1.15+). This is exactly what linear interpolation does – but instead of ‘half-way between’, it looks at any fraction of the way between two values.

Univariate linear interpolation

Let’s look at the case of simple linear interpolation.

So we have some function (say of x) that we think is approximately linear near the value we’re trying to approximate, and we have a value of the function either side of the value we want. For example, suppose we know that y = 9.3 at x = 8 and y = 15.6 at x = 20, and we want an approximate y at x = 16.

The two x-values whose y’s we know are 12 (20-8) apart. See how the x-value (the one that we want an approximate y-value for) divides that difference of 12 up in the ratio 8:4 (16-8 and 20-16)? That is, it’s 2/3 of the distance from the first x-value to the last. If the relationship were linear, the corresponding range of y-values would be in the same ratio.

[Figure: linear interpolation]

So \frac{y_{16}-9.3}{15.6-9.3} should be about the same as \frac{16-8}{20-8}.

That is, \frac{y_{16}-9.3}{15.6-9.3} \approx \frac{16-8}{20-8}
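Rearranging that relationship for the unknown y gives the usual interpolation formula. As a minimal sketch (the function name `lerp` and its argument names are my own, purely illustrative), using the points (8, 9.3) and (20, 15.6) from the example:

```python
def lerp(x, x0, y0, x1, y1):
    """Linearly interpolate y at x from the known points (x0, y0) and (x1, y1)."""
    fraction = (x - x0) / (x1 - x0)  # how far x lies along [x0, x1]; 2/3 here
    return y0 + fraction * (y1 - y0)


# Known: y = 9.3 at x = 8 and y = 15.6 at x = 20; wanted: y at x = 16.
y16 = lerp(16, 8, 9.3, 20, 15.6)
print(round(y16, 2))  # 13.5
```

The same function works for any fraction of the way between the two tabulated points, not just the 2/3 used here.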



An example with statistical tables: if we have a t-table with the following critical values for 12 df:

\begin{array}{ c c }
(2\text{-tail})& \\
α & t\\
0.01 & 3.05\\
0.02 & 2.68\\
0.05 & 2.18\\
0.10 & 1.78
\end{array}

We want the critical value of t with 12 df and a two-tail alpha of 0.025. That is, we interpolate between
the 0.02 and the 0.05 row of that table:

\begin{array}{ c c }
α & t\\
0.02 & 2.68\\
0.025 & \text{?}\\
0.05 & 2.18
\end{array}

The value at “\text{?}” is the t_{0.025} value that we wish to use linear interpolation to approximate. (By t_{0.025} I actually mean the 1-0.025/2 point of the inverse cdf of a t_{12} distribution.)

As before, 0.025 divides the interval from 0.02 to 0.05 in the ratio (0.025-0.02) to (0.05-0.025) (i.e. 1:5) and the unknown t-value should divide the t range 2.68 to 2.18 in the same ratio; equivalently, 0.025 occurs (0.025-0.02)/(0.05-0.02) = 1/6th of the way along the x-range, so the unknown t-value should occur 1/6th of the way along the t-range.

That is \frac{t_{0.025}-2.68}{2.18-2.68} \approx \frac{0.025-0.02}{0.05-0.02} or equivalently

t_{0.025} \approx 2.68 + (2.18-2.68) \frac{0.025-0.02}{0.05-0.02} = 2.68 - 0.5 \cdot \frac{1}{6} \approx 2.60

The actual answer is 2.56 … which is not particularly close because the function we’re approximating isn’t very close to linear in that range (nearer \alpha = 0.5 it is).
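For anyone who wants to check the arithmetic, here is a small sketch of the same calculation. The variable names are mine; the numbers are the two table rows for 12 df quoted above.

```python
# Tabulated two-tail critical values for 12 df (the 0.02 and 0.05 rows).
alpha_lo, t_lo = 0.02, 2.68
alpha_hi, t_hi = 0.05, 2.18
alpha = 0.025  # the significance level we want

# Fraction of the way along the alpha range (1/6 in this example).
fraction = (alpha - alpha_lo) / (alpha_hi - alpha_lo)

# The same fraction of the way along the t range.
t_linear = t_lo + fraction * (t_hi - t_lo)
print(round(t_linear, 2))  # 2.6
```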

[Figure: linear interpolation of critical value in t-tables]

Better approximations via transformation

We can replace linear interpolation by other functional forms; in effect, we transform to a scale where linear interpolation works better. In this case, in the tail, many tabulated critical values are more nearly linear in the \log of the significance level. After we take logs, we simply apply linear interpolation as before. Let’s try that on the above example:

\begin{array}{ c c c }
α & \log(α)& t\\
0.02 & -3.912 & 2.68\\
0.025& -3.689 & t_{0.025}\\
0.05 & -2.996 & 2.18
\end{array}


\begin{eqnarray*}
\frac{t_{0.025}-2.68}{2.18-2.68} &\approx& \frac{\log(0.025)-\log(0.02)}{\log(0.05)-\log(0.02)} \\
&=& \frac{-3.689 - (-3.912)}{-2.996 - (-3.912)}
\end{eqnarray*}

or equivalently

\begin{eqnarray*}
t_{0.025} &\approx& 2.68 + (2.18-2.68) \frac{-3.689 - (-3.912)}{-2.996 - (-3.912)}\\
&=& 2.68 - 0.5 \cdot 0.243 \approx 2.56
\end{eqnarray*}

This is correct to the quoted number of figures. That is because, when we transform the x-scale logarithmically, the relationship is almost linear:

[Figure: linear interpolation in log alpha]
Indeed, visually the curve (grey) lies neatly on top of the straight line (blue).
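The log-scale calculation can be sketched as ordinary linear interpolation applied after transforming the significance levels. The helper name `interpolate_on_scale` and its `transform` argument are illustrative choices of mine, not anything standard:

```python
import math


def interpolate_on_scale(x, x0, y0, x1, y1, transform=lambda v: v):
    """Linear interpolation in y after mapping the x-values through `transform`."""
    u, u0, u1 = transform(x), transform(x0), transform(x1)
    return y0 + (u - u0) / (u1 - u0) * (y1 - y0)


# Two-tail alpha = 0.025 at 12 df, from the 0.02 and 0.05 table rows,
# interpolating linearly in log(alpha) rather than in alpha itself.
t_log = interpolate_on_scale(0.025, 0.02, 2.68, 0.05, 2.18, transform=math.log)
print(round(t_log, 2))  # 2.56
```

Leaving `transform` at its default gives plain linear interpolation in alpha, so the two approaches are easy to compare side by side.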

In some cases, the logit of the significance level (\text{logit}(\alpha)=\log(\frac{α}{1-α})=\log(\frac{1}{1-α}-1)) may work well over a wider range but is usually not necessary (we usually only care about accurate critical values when \alpha is small enough that \log works quite well).
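As a rough illustration of the logit variant, here is the same 12 df example interpolated on the logit scale (variable names are mine). At these small significance levels it gives essentially the same answer as the log scale, which is consistent with the remark that it is usually not necessary:

```python
import math


def logit(p):
    """Log-odds of a probability p, for 0 < p < 1."""
    return math.log(p / (1 - p))


# Same table rows as before: alpha = 0.02 -> 2.68, alpha = 0.05 -> 2.18.
u, u0, u1 = logit(0.025), logit(0.02), logit(0.05)
t_logit = 2.68 + (u - u0) / (u1 - u0) * (2.18 - 2.68)
print(round(t_logit, 2))  # 2.56
```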

Interpolation across different degrees of freedom

t, chi-square and F tables also have degrees of freedom, where not every df (\nu-) value is tabulated. The critical values mostly^\dagger aren’t accurately represented by linear interpolation in the df. Indeed, often it’s more nearly the case that the tabulated values are linear in the reciprocal of df, 1/\nu.

(In old tables you’d often see a recommendation to work with 120/\nu – the constant on the numerator makes no difference, but was more convenient in pre-calculator days because 120 has a lot of factors, so 120/\nu is often an integer, making the calculation a bit simpler.)

Here’s how interpolation in the reciprocal of the df (“inverse interpolation”) performs on 5% critical values of F_{4,\nu} between \nu = 60 and 120. That is, only the endpoints participate in the interpolation in 1/\nu. For example, to compute the critical value for \nu=80, we take (and note that here F represents the inverse of the cdf):

F_{4,80,.95} \approx F_{4,60,.95} + \frac{1/80 - 1/60}{1/120 - 1/60} \cdot (F_{4,120,.95}-F_{4,60,.95})

[Figure: inverse interpolation in df]

(Compare with diagram here)
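The formula for \nu = 80 can be sketched as follows. The function name is my own, and the two endpoint values (2.53 and 2.45) are an assumption: they are the two-decimal entries that typical 5% F tables list for (4, 60) and (4, 120), so substitute whatever your own table gives.

```python
def interpolate_in_reciprocal_df(nu, nu_lo, value_lo, nu_hi, value_hi):
    """Interpolate a critical value linearly in 1/nu between two tabulated df."""
    weight = (1 / nu - 1 / nu_lo) / (1 / nu_hi - 1 / nu_lo)
    return value_lo + weight * (value_hi - value_lo)


# Assumed two-decimal table entries for the 5% point of F(4, nu).
f_60, f_120 = 2.53, 2.45

f_80 = interpolate_in_reciprocal_df(80, 60, f_60, 120, f_120)
print(round(f_80, 2))  # 2.49
```

Note that 80 is one third of the way from 60 to 120 in \nu, but exactly half-way in 1/\nu, which is why the weight here comes out as 0.5.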

^\dagger Mostly but not always. Here’s an example where linear interpolation in df is better, and an explanation of how to tell from the table that linear interpolation is going to be accurate.

Here’s a piece of a chi-squared table

            Probability less than the critical value
 df           0.90      0.95     0.975      0.99     0.999
______   __________________________________________________

 40         51.805    55.758    59.342    63.691    73.402
 50         63.167    67.505    71.420    76.154    86.661
 60         74.397    79.082    83.298    88.379    99.607
 70         85.527    90.531    95.023   100.425   112.317

Imagine we wish to find the 5% critical value (the 95th percentile) for 57 degrees of freedom.

Looking closely, we see that the 5% critical values in the table progress almost linearly here:

[Figure: 5% critical values plotted against df]

(the green line joins the values for 50 and 60 df; you can see it touches the dots for 40 and 70)

So linear interpolation will do very well. But of course we don’t have time to draw the graph; how to decide when to use linear interpolation and when to try something more complicated?

As well as the values either side of the one we seek, take the next nearest value (70 in this case). If the middle tabulated value (the one for df=60) is close to linear between the end values (50 and 70), then linear interpolation will be suitable. In this case the values are equispaced so it’s especially easy: is (x_{50,0.95}+x_{70,0.95})/2 close to x_{60,0.95}?

We find that (67.505+90.531)/2 = 79.018. Compared with the actual value for 60 df, 79.082, this is accurate to almost three full figures, which is usually pretty good for interpolation, so in this case you’d stick with linear interpolation. With the finer step for the value we need, we would now expect to have effectively three-figure accuracy.
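That check is easy to mechanise. A minimal sketch, using the 0.95 column of the table above (the dictionary and variable names are just for illustration):

```python
# 95th percentiles of chi-squared from the table, keyed by df.
chi2_95 = {50: 67.505, 60: 79.082, 70: 90.531}

# If the values were exactly linear in df, the 60 df entry would equal
# the average of the 50 df and 70 df entries (the df are equispaced).
midpoint_guess = (chi2_95[50] + chi2_95[70]) / 2
relative_error = abs(midpoint_guess - chi2_95[60]) / chi2_95[60]

print(round(midpoint_guess, 3), f"{relative_error:.2%}")  # 79.018 0.08%
```

What counts as a small enough relative error is a judgment call that depends on how many figures you need; there is no fixed threshold implied here.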

So we get: \frac{x-67.505}{79.082-67.505} \approx \frac{57-50}{60-50} or

x\approx 67.505+(79.082-67.505)\cdot \frac{57-50}{60-50}\approx 75.61.

The actual value is 75.62375, so we indeed got 3 figures of accuracy and were only out by 1 in the fourth figure.
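The 57 df calculation, written out as a short sketch (variable names are mine; the two values are the 50 and 60 df entries from the 0.95 column):

```python
# Tabulated 95th percentiles either side of the df we want.
df_lo, x_lo = 50, 67.505
df_hi, x_hi = 60, 79.082

# 57 is 7/10 of the way from 50 to 60, so take 7/10 of the way between the values.
x_57 = x_lo + (57 - df_lo) / (df_hi - df_lo) * (x_hi - x_lo)
print(round(x_57, 2))  # 75.61
```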

More accurate interpolation still may be had by using methods of finite differences (in particular, via divided differences), but this is probably overkill for most hypothesis testing problems.
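For the curious, here is a rough sketch of what a divided-difference approach might look like, applied to the four 0.95-column entries from the chi-squared table above. This is a generic Newton divided-difference interpolation written for illustration (the function name is mine), not a recommendation for routine use:

```python
def newton_interpolate(xs, ys, x):
    """Evaluate the Newton divided-difference polynomial through (xs, ys) at x."""
    coeffs = list(ys)
    n = len(xs)
    # Build the divided differences in place: after pass k,
    # coeffs[i] holds f[x_{i-k}, ..., x_i] for every i >= k.
    for k in range(1, n):
        for i in range(n - 1, k - 1, -1):
            coeffs[i] = (coeffs[i] - coeffs[i - 1]) / (xs[i] - xs[i - k])
    # Evaluate the Newton form with nested multiplication.
    result = coeffs[-1]
    for i in range(n - 2, -1, -1):
        result = result * (x - xs[i]) + coeffs[i]
    return result


dfs = [40, 50, 60, 70]
chi2_95 = [55.758, 67.505, 79.082, 90.531]  # 0.95 column of the table

x_57_newton = newton_interpolate(dfs, chi2_95, 57)
print(round(x_57_newton, 2))  # 75.62
```

Using all four tabulated points picks up the slight curvature that the two-point linear interpolation ignores, which is where the extra accuracy comes from.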

If your degrees of freedom go past the ends of your table, this question discusses that problem.

Source : Link , Question Author : Glen_b , Answer Author : Glen_b
