OLS regression results: p-values > 0.10, how to proceed?

In the Python statsmodels documentation there is an example with the goal:

We want to know whether literacy rates (Literacy column) in the 85 French departments (Departments) are associated with per capita wagers on the Royal Lottery (Lottery) in the 1820s. We need to control for the level of wealth (Wealth) in each department, and we also want to include a series of dummy variables on the right-hand side of our regression equation to control for unobserved heterogeneity due to regional effects (Region; 0/1 dummies for E, N, S and W, with the remaining region as the reference level). The model is estimated using ordinary least squares regression (OLS).

OLS Regression Results
==============================================================================
Dep. Variable:                Lottery   R-squared:                       0.338
Model:                            OLS   Adj. R-squared:                  0.287
Method:                 Least Squares   F-statistic:                     6.636
Date:                Tue, 02 Feb 2021   Prob (F-statistic):           1.07e-05
Time:                        07:07:06   Log-Likelihood:                -375.30
No. Observations:                  85   AIC:                             764.6
Df Residuals:                      78   BIC:                             781.7
Df Model:                           6                                         
Covariance Type:            nonrobust                                         
===============================================================================
                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
Intercept      38.6517      9.456      4.087      0.000      19.826      57.478
Region[T.E]   -15.4278      9.727     -1.586      0.117     -34.793       3.938
Region[T.N]   -10.0170      9.260     -1.082      0.283     -28.453       8.419
Region[T.S]    -4.5483      7.279     -0.625      0.534     -19.039       9.943
Region[T.W]   -10.0913      7.196     -1.402      0.165     -24.418       4.235
Literacy       -0.1858      0.210     -0.886      0.378      -0.603       0.232
Wealth          0.4515      0.103      4.390      0.000       0.247       0.656
==============================================================================
Omnibus:                        3.049   Durbin-Watson:                   1.785
Prob(Omnibus):                  0.218   Jarque-Bera (JB):                2.694
Skew:                          -0.340   Prob(JB):                        0.260
Kurtosis:                       2.454   Cond. No.                         371.
==============================================================================
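
For readers who want to produce a summary of this shape themselves, here is a minimal sketch using the statsmodels formula API. It assumes the formula `Lottery ~ Literacy + Wealth + Region`, and it runs on a synthetic stand-in DataFrame (made-up values, five region labels C, E, N, S, W) so that it works offline. The numbers it prints will therefore not match the table above. Only the structure (coefficient names, degrees of freedom) is meant to correspond.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data: the values are invented for illustration only.
# With network access, the real data can be loaded instead, e.g.
#   import statsmodels.api as sm
#   df = sm.datasets.get_rdataset("Guerry", "HistData").data
#   df = df[["Lottery", "Literacy", "Wealth", "Region"]].dropna()
rng = np.random.default_rng(0)
n = 85
df = pd.DataFrame({
    # Cycle through the five labels so that every region is present.
    "Region": np.resize(["C", "E", "N", "S", "W"], n),
    "Literacy": rng.uniform(10, 75, size=n),
    "Wealth": rng.uniform(1, 86, size=n),
})
# Assumed data-generating process: Lottery depends on Wealth only.
df["Lottery"] = 20 + 0.45 * df["Wealth"] + rng.normal(0, 20, size=n)

# A string-valued column is treated as categorical: one level becomes the
# baseline and the others get 0/1 dummies named Region[T.<level>].
res = smf.ols("Lottery ~ Literacy + Wealth + Region", data=df).fit()
print(res.summary())

# The quantity the stated goal is about: the Literacy coefficient,
# adjusted for Wealth and Region, with its 95% confidence interval.
literacy_coef = res.params["Literacy"]
literacy_ci = res.conf_int().loc["Literacy"]
print(f"Literacy: {literacy_coef:.4f} "
      f"(95% CI {literacy_ci[0]:.4f} to {literacy_ci[1]:.4f})")
```

Four dummies appear for five regions because the first level alphabetically is absorbed into the intercept as the reference category, which is why the table lists only E, N, S and W.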

Prob (F-statistic) is 1.07e-05, so the null hypothesis (H0: all coefficients other than the intercept are equal to zero) is rejected: there is statistically significant evidence that the independent variables, taken together, are related to the dependent variable. But apart from the intercept, only Wealth has a p-value < 0.05.

Should the model be used as is? Or should all independent variables except Wealth be removed? What should be done based on the goal “We want to know whether literacy … We need to control for the level of wealth (Wealth) in each department …”?

Answer

Assuming that there are no problems with the model assumptions, the model should be used as it is. Insignificant variables should not be removed. Removing them would invalidate any tests that are run within the reduced model, because those tests do not account for the fact that the variables were selected using the same data. (Removing insignificant variables seems to be a common practice, but that doesn’t make it better. Occasionally there are reasons to remove variables, for example that they are potentially expensive to observe in the future when the model is used for prediction, or that the number of observations is too small to fit the full model with reasonable reliability, but I don’t see such reasons here; even in such cases there are often better criteria than significance.)

Attribution
Source: Link, Question Author: Anne Maier, Answer Author: Christian Hennig
