I want to regress the fuel economy on engine displacement, fuel type, 2 vs. 4 wheel drive, horsepower, manual vs. automatic transmission, and the number of speeds. My data set (link) contains vehicles from 2012-2014.

- `fuelEconomy`: in miles per gallon
- `engineDisplacement`: engine size in liters
- `fuelStd`: 1 for gas, 0 for diesel
- `wheelDriveStd`: 1 for 2-wheel drive, 0 for 4-wheel drive
- `hp`: horsepower
- `transStd`: 1 for automatic, 0 for manual
- `transSpeed`: number of speeds

R-code:

    reg = lm(fuelEconomy ~ engineDisplacement + fuelStd + wheelDriveStd + hp + transStd + transSpeed, data = a)
    summary(reg)

    Call:
    lm(formula = fuelEconomy ~ engineDisplacement + fuelStd + wheelDriveStd + hp + transStd + transSpeed, data = a)

    Residuals:
         Min       1Q   Median       3Q      Max
    -10.2765  -2.3142  -0.0655   2.0944  15.8637

    Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
    (Intercept)        48.147115   0.542910  88.683  < 2e-16 ***
    engineDisplacement -3.673549   0.091272 -40.248  < 2e-16 ***
    fuelStd            -6.613112   0.403989 -16.370  < 2e-16 ***
    wheelDriveStd       2.778134   0.137775  20.164  < 2e-16 ***
    hp                 -0.005884   0.001008  -5.840 5.86e-09 ***
    transStd           -0.351853   0.157570  -2.233   0.0256 *
    transSpeed         -0.080365   0.052538  -1.530   0.1262
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 3.282 on 2648 degrees of freedom
      (1 observation deleted due to missingness)
    Multiple R-squared:  0.7802,  Adjusted R-squared:  0.7797
    F-statistic:  1566 on 6 and 2648 DF,  p-value: < 2.2e-16

- Are the results realistic or am I doing something wrong here as most of the variables are highly statistically significant?
- Are other models better to use for this purpose?
- Is such a result usable for interpretation?

**Answer**

I know very little about the mechanics and physics involved, but the first thing I would look at is the regression diagnostics, in particular the plot of residuals vs. fitted values, which ideally shows no overall pattern.
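As a rough sketch of that check: the snippet below fits a small linear model and draws the residuals against the fitted values. Since the question's data frame `a` is not available here, it uses R's built-in `mtcars` data as a stand-in, so the variable names (`mpg`, `disp`, `hp`, `am`) and the object name `fit` are assumptions for illustration, not the original model.

```r
# Stand-in data: the built-in mtcars set, NOT the question's data frame `a`.
fit <- lm(mpg ~ disp + hp + am, data = mtcars)

# Residuals vs. fitted values: look for curvature or a funnel shape.
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. fitted")
abline(h = 0, lty = 2)                                   # reference line at zero
lines(lowess(fitted(fit), resid(fit)), col = "red")     # smoothed trend

# Built-in equivalent of the plot above:
# plot(fit, which = 1)
```

With the original model, replacing `fit` by `reg` should give the corresponding plot. A smoothed trend that bends away from the zero line would hint at a nonlinear association that the model is missing.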

You have fitted a linear model so that each covariate has a linear association with `fuelEconomy`. Is this supported by the underlying mechanical and physical theory? Could there be any nonlinear association(s)? If so, then you could consider models with nonlinear terms, transforming certain variables, or you could consider using an additive model. Even if the associations are plausibly linear within your actual dataset, be very wary of extrapolating the results beyond your data limits.
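To make those three options concrete, here is a minimal sketch, again on the built-in `mtcars` data rather than the question's data frame `a` (all variable and object names are assumptions for illustration). It fits a quadratic term, a log-transformed model, and an additive model via `gam()` from the `mgcv` package, which normally ships with R as a recommended package.

```r
library(mgcv)  # provides gam() and s(); a recommended package bundled with R

# Stand-in data: built-in mtcars, NOT the question's data frame `a`.
lin  <- lm(mpg ~ disp + hp + am, data = mtcars)                 # all-linear baseline
quad <- lm(mpg ~ poly(disp, 2) + hp + am, data = mtcars)        # nonlinear (quadratic) term
logm <- lm(log(mpg) ~ log(disp) + hp + am, data = mtcars)       # transformed variables
add  <- gam(mpg ~ s(disp, k = 5) + s(hp, k = 5) + am,
            data = mtcars)                                      # additive model with smooths

anova(lin, quad)      # F-test: does the quadratic term improve on the linear fit?
AIC(lin, quad, add)   # logm is left out: its response is on a different scale
summary(add)          # effective degrees of freedom near 1 suggest a roughly linear effect
```

Which of these is appropriate depends on the subject-matter theory and on the residual plots. None of them makes predictions outside the observed range of the covariates any safer.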

**Attribution**

*Source: Link, Question Author: Bert, Answer Author: Robert Long*