Understanding the p-value in multiple linear regression

Minitab’s website explains the p-value in multiple linear regression analysis as follows.

The p-value for each term tests the null hypothesis that the coefficient is equal to zero (no effect). A low p-value (< 0.05) indicates that you can reject the null hypothesis. In other words, a predictor that has a low p-value is likely to be a meaningful addition to your model because changes in the predictor’s value are related to changes in the response variable.

For example, I have a fitted MLR model
$y=0.46753{{X}_{1}}-0.2668{{X}_{2}}+1.6193{{X}_{3}}+4.5424{{X}_{4}}+14.48$, and the output is shown below. A value of $y$ can then be calculated using this equation.

               Estimate        SE        tStat       pValue
               ________    ______    _________    _________

(Intercept)       14.48    5.0127       2.8886    0.0097836
x1              0.46753    1.2824      0.36458      0.71967
x2              -0.2668    3.3352    -0.079995      0.93712
x3               1.6193    9.0581      0.17877      0.86011
x4               4.5424    2.8565       1.5902       0.1292

Based on the introduction above, the null hypothesis is that the coefficient equals 0. My understanding is that the coefficient, for example the coefficient of $X_{4}$, is set to 0 and a second response $y_{2}$ is calculated as $y_{2}=0.46753{{X}_{1}}-0.2668{{X}_{2}}+1.6193{{X}_{3}}+0{{X}_{4}}+14.48$.
Then a paired t-test is conducted on $y$ and $y_{2}$, but the p-value of this t-test is 6.9e-12, which does not equal 0.1292 (the p-value of the coefficient of $X_{4}$).

Can anyone help me with the correct understanding? Many thanks!


This is incorrect for several reasons:

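  1. The model “without” X4 will not necessarily have the same coefficient estimates for the other variables. Setting the X4 coefficient to 0 while keeping the other estimates is not the same as fitting the reduced model. Fit the reduced model and see for yourself.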

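  2. The statistical test for the coefficient does not concern the “mean” values of Y obtained from 2 predictions. When both models are fit by least squares with an intercept, the predicted $Y$ will always have the same grand mean, so a paired t-test on the two sets of predictions has a test statistic of exactly 0 (a one-sided p-value of 0.5, or 1 for the usual two-sided test). The same holds for the residuals. Your t-test gave a different value because, per the point above, your $y_{2}$ did not come from a refitted model.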

  3. The statistical test which is conducted for the statistical significance of the coefficient is a one sample t-test. This is confusing since we do not have a “sample” of multiple coefficients for X4, but we have an estimate of the distributional properties of such a sample using the central limit theorem. The mean and standard error describe the location and shape of such a limiting distribution. If you take the column “Estimate” and divide by “SE”, you get the “tStat” column. Comparing that to a t distribution with the residual degrees of freedom (approximately a standard normal distribution in large samples) gives you the p-values in the 4th column.

  4. A criticism of Minitab’s help page. Such a help file could not, in a paragraph, summarize years of statistical training, so I need not contend with the whole thing. But to say that a “predictor” is “a meaningful addition” is vague and probably incorrect. The rationale for choosing which variables to include in a multivariable model is subtle and relies on scientific reasoning and not statistical inference.
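
As a rough check of point 3, the sketch below recomputes the test statistics and two-sided p-values from the “Estimate” and “SE” columns of the table in the question. It uses only the Python standard library. The residual degrees of freedom are not stated in the question, so `DF_RESIDUAL = 18` is an assumption, picked because it appears consistent with the reported p-values; the helper names (`normal_two_sided_p`, `t_pdf`, `t_two_sided_p`, `summarize`) are illustrative and not taken from any statistics package.

```python
import math

# Coefficient table copied from the question: name -> (estimate, standard error).
coefs = {
    "(Intercept)": (14.48, 5.0127),
    "x1": (0.46753, 1.2824),
    "x2": (-0.2668, 3.3352),
    "x3": (1.6193, 9.0581),
    "x4": (4.5424, 2.8565),
}

# ASSUMPTION: the question does not report the residual degrees of freedom.
# 18 is a guess that appears consistent with the reported p-values.
DF_RESIDUAL = 18


def normal_two_sided_p(t):
    """Two-sided p-value under a standard normal: P(|Z| > |t|)."""
    return math.erfc(abs(t) / math.sqrt(2))


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value P(|T| > |t|), integrating the density on [0, |t|]
    with Simpson's rule (steps must be even)."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3.0)


def summarize(table, df):
    """Return name -> (t statistic, normal p-value, t-distribution p-value)."""
    out = {}
    for name, (est, se) in table.items():
        t = est / se  # the test statistic is just estimate / standard error
        out[name] = (t, normal_two_sided_p(t), t_two_sided_p(t, df))
    return out


results = summarize(coefs, DF_RESIDUAL)
for name, (t, p_norm, p_t) in results.items():
    print(f"{name:<12} t = {t:9.5f}   normal p = {p_norm:.5f}   t p = {p_t:.5f}")
```

If the assumed degrees of freedom are right, the t-distribution p-values should land close to the “pValue” column, while the normal approximation comes out somewhat smaller for the larger test statistics. None of this involves predicting $y$ with a coefficient set to 0; the test uses only the estimate and its standard error.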

Source : Link , Question Author : user2230101 , Answer Author : AdamO
