Is it realistic for all variables to be highly significant in a multiple regression model?

I want to regress fuel economy on engine displacement, fuel type, 2- vs. 4-wheel drive, horsepower, manual vs. automatic transmission, and the number of speeds. My data set (link) contains vehicles from 2012-2014.

  • fuelEconomy: fuel economy in miles per gallon
  • engineDisplacement: engine size in liters
  • fuelStd: 1 for gas, 0 for diesel
  • wheelDriveStd: 1 for 2-wheel drive, 0 for 4-wheel drive
  • hp: horsepower
  • transStd: 1 for automatic, 0 for manual
  • transSpeed: number of transmission speeds

R code and output:

reg = lm(fuelEconomy ~ engineDisplacement + fuelStd + wheelDriveStd + hp + 
                       transStd + transSpeed, data = a)
summary(reg)
Call:
lm(formula = fuelEconomy ~ engineDisplacement + fuelStd + wheelDriveStd + 
    hp + transStd + transSpeed, data = a)

Residuals:
     Min       1Q   Median       3Q      Max 
-10.2765  -2.3142  -0.0655   2.0944  15.8637 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)        48.147115   0.542910  88.683  < 2e-16 ***
engineDisplacement -3.673549   0.091272 -40.248  < 2e-16 ***
fuelStd            -6.613112   0.403989 -16.370  < 2e-16 ***
wheelDriveStd       2.778134   0.137775  20.164  < 2e-16 ***
hp                 -0.005884   0.001008  -5.840 5.86e-09 ***
transStd           -0.351853   0.157570  -2.233   0.0256 *  
transSpeed         -0.080365   0.052538  -1.530   0.1262    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.282 on 2648 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.7802,    Adjusted R-squared:  0.7797 
F-statistic:  1566 on 6 and 2648 DF,  p-value: < 2.2e-16
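Each "t value" in the table is the estimate divided by its standard error, and "Pr(>|t|)" is the two-sided tail probability of a t distribution with the residual degrees of freedom (2648 here). A quick check in R, using only numbers copied from the summary above:

```r
# Numbers copied from the summary(reg) output above
est <- c(transStd = -0.351853, transSpeed = -0.080365)
se  <- c(transStd =  0.157570, transSpeed =  0.052538)
df_resid <- 2648  # residual degrees of freedom reported by summary()

# t value = Estimate / Std. Error
t_val <- est / se

# Two-sided p-value: probability of |T| at least this large under H0: beta = 0
p_val <- 2 * pt(abs(t_val), df = df_resid, lower.tail = FALSE)

round(t_val, 3)
signif(p_val, 3)
```

This reproduces the reported t values (about -2.233 and -1.530) and p-values (about 0.0256 and 0.126). With roughly 2,650 observations the standard errors are small, so even modest coefficients can be distinguishable from zero; a small p-value indicates a detectable effect, not necessarily a large one.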

  1. Are the results realistic, or am I doing something wrong, given that most of the variables are highly statistically significant?
  2. Are there other models better suited to this purpose?
  3. Can such a result be used for interpretation?
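To illustrate how the fitted equation is read, the sketch below plugs a hypothetical vehicle (2.0 L gas engine, 2-wheel drive, 150 hp, 6-speed automatic; these values are illustrative assumptions, not taken from the data set) into the coefficients reported above. Each 0/1 indicator shifts the prediction by its coefficient; for example, fuelStd = 1 (gas) lowers predicted fuel economy by about 6.6 mpg relative to diesel, holding the other variables fixed.

```r
# Coefficients copied from the summary(reg) output above
b <- c("(Intercept)" = 48.147115, engineDisplacement = -3.673549,
       fuelStd = -6.613112, wheelDriveStd = 2.778134, hp = -0.005884,
       transStd = -0.351853, transSpeed = -0.080365)

# Hypothetical vehicle (illustrative values only), in the same order as b
x <- c("(Intercept)" = 1, engineDisplacement = 2.0, fuelStd = 1,
       wheelDriveStd = 1, hp = 150, transStd = 1, transSpeed = 6)

# Linear predictor: sum of coefficient * covariate value
pred_mpg <- sum(b * x)
pred_mpg
```

This gives about 35.2 mpg. With the fitted object itself, the same number would come from `predict(reg, newdata = data.frame(...))`, which also provides intervals via its `interval` argument. The prediction is only meaningful if such a vehicle lies within the range covered by the 2012-2014 data.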

Answer

I know very little about the mechanics and physics involved, but the first thing I would look at is the regression diagnostics, in particular the plot of residuals vs. fitted values, in which we would like to see no overall pattern.
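For the model in the question, `plot(reg, which = 1)` draws this residuals-vs-fitted plot. Because the original data set is not available here, the sketch below uses simulated stand-in data (an assumption for illustration only) in which fuel economy falls off nonlinearly with displacement, so a straight-line fit leaves a visible curve in the residuals.

```r
set.seed(1)
n <- 500

# Simulated stand-in data, NOT the question's data set:
# mpg decreases like 1/displacement, so a linear fit is misspecified
sim <- data.frame(engineDisplacement = runif(n, 1, 6))
sim$fuelEconomy <- 10 + 40 / sim$engineDisplacement + rnorm(n, sd = 2)

fit_lin <- lm(fuelEconomy ~ engineDisplacement, data = sim)

# Base-R diagnostic plot: residuals vs fitted values (which = 1)
plot(fit_lin, which = 1)

# Numeric version of the same check: a lowess smooth of residuals on
# fitted values should stay close to zero if the linear form is adequate
res <- residuals(fit_lin)
fv  <- fitted(fit_lin)
trend <- lowess(fv, res)
range(trend$y)  # a clearly non-flat smooth signals curvature
```

Here the smooth dips below zero in the middle and rises at the ends, the U-shaped pattern that suggests a missing nonlinear term.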

You have fitted a linear model, so each covariate is assumed to have a linear association with fuelEconomy. Is this supported by the underlying mechanical and physical theory? Could there be any nonlinear associations? If so, you could consider models with nonlinear terms, transforming certain variables, or an additive model. Even if the associations are plausibly linear within your actual data set, be very wary of extrapolating the results beyond the range of your data.
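A minimal sketch of those alternatives, again on simulated stand-in data (an assumption, not the question's data): a quadratic term via `poly()`, a log-transformed covariate, a natural spline via `splines::ns()`, and, if the recommended package mgcv is installed, a penalized additive model with `mgcv::gam()`. The variable names mirror the question's, but the data-generating formula is hypothetical.

```r
library(splines)  # ships with base R; provides ns() for natural cubic splines

set.seed(2)
n <- 500
# Simulated stand-in data, NOT the question's data set: curved
# displacement effect, roughly linear horsepower effect
sim <- data.frame(engineDisplacement = runif(n, 1, 6),
                  hp = runif(n, 100, 400))
sim$fuelEconomy <- 10 + 40 / sim$engineDisplacement - 0.01 * sim$hp +
  rnorm(n, sd = 2)

models <- list(
  linear    = lm(fuelEconomy ~ engineDisplacement + hp, data = sim),
  quadratic = lm(fuelEconomy ~ poly(engineDisplacement, 2) + hp, data = sim),
  logX      = lm(fuelEconomy ~ log(engineDisplacement) + hp, data = sim),
  spline    = lm(fuelEconomy ~ ns(engineDisplacement, df = 4) + hp, data = sim)
)

# Same response and data in every model, so AIC values are comparable
# (lower is better)
aics <- sapply(models, AIC)
sort(aics)

# Penalized additive model with smooth terms, only if mgcv is available
if (requireNamespace("mgcv", quietly = TRUE)) {
  library(mgcv)
  gam_fit <- gam(fuelEconomy ~ s(engineDisplacement) + s(hp), data = sim)
  print(summary(gam_fit))
}

# Extrapolation check: predictions are only trustworthy inside these ranges
sapply(sim[c("engineDisplacement", "hp")], range)
```

Transforming a covariate rather than the response keeps the AIC values comparable; a model for log(fuelEconomy) would be on a different scale and would need a different comparison.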

Attribution
Source: Link, Question Author: Bert, Answer Author: Robert Long