I have run a multiple regression in which the model as a whole is significant and explains about 13% of the variance. However, I need to find the amount of variance explained by each significant predictor. How can I do this using R?

Here’s some sample data and code:

```
D = data.frame(
  dv  = c( 0.75, 1.00, 1.00, 0.75, 0.50, 0.75, 1.00, 1.00, 0.75, 0.50 ),
  iv1 = c( 0.75, 1.00, 1.00, 0.75, 0.75, 1.00, 0.50, 0.50, 0.75, 0.25 ),
  iv2 = c( 0.882, 0.867, 0.900, 0.333, 0.875, 0.500, 0.882, 0.875, 0.778, 0.867 ),
  iv3 = c( 1.000, 0.067, 1.000, 0.933, 0.875, 0.500, 0.588, 0.875, 1.000, 0.467 ),
  iv4 = c( 0.889, 1.000, 0.905, 0.938, 0.833, 0.882, 0.444, 0.588, 0.895, 0.812 ),
  iv5 = c( 18, 16, 21, 16, 18, 17, 18, 17, 19, 16 )
)
fit = lm( dv ~ iv1 + iv2 + iv3 + iv4 + iv5, data=D )
summary( fit )
```

Here’s the output with my actual data:

```
Call:
lm(formula = posttestScore ~ pretestScore + probCategorySame +
    probDataRelated + practiceAccuracy + practiceNumTrials, data = D)

Residuals:
    Min      1Q  Median      3Q     Max
-0.6881 -0.1185  0.0516  0.1359  0.3690

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.77364    0.10603    7.30  8.5e-13 ***
iv1          0.29267    0.03091    9.47  < 2e-16 ***
iv2          0.06354    0.02456    2.59   0.0099 **
iv3          0.00553    0.02637    0.21   0.8340
iv4         -0.02642    0.06505   -0.41   0.6847
iv5         -0.00941    0.00501   -1.88   0.0607 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.18 on 665 degrees of freedom
Multiple R-squared:  0.13,  Adjusted R-squared:  0.123
F-statistic: 19.8 on 5 and 665 DF,  p-value: <2e-16
```

This question has been answered here, but the accepted answer only addresses uncorrelated predictors, and while there is an additional response that addresses correlated predictors, it only provides a general hint, not a specific solution. I would like to know what to do if my predictors are correlated.

**Answer**

With correlated predictors, the percentage explained by each one depends on the order in which they are entered.

If you specify a particular order, you can compute this trivially in R (e.g. via the `update` and `anova` functions, see below), but a different order of entry would yield potentially very different answers.
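To see the order dependence on the sample data from the question, here is a minimal sketch that fits the same model with the predictors in two different orders and lines up the sequential sums of squares. The helper `seq_ss` and the names `fit_a` and `fit_b` are mine, and the reversed order is just an arbitrary choice for illustration.

```r
# Sample data from the question
D <- data.frame(
  dv  = c(0.75, 1.00, 1.00, 0.75, 0.50, 0.75, 1.00, 1.00, 0.75, 0.50),
  iv1 = c(0.75, 1.00, 1.00, 0.75, 0.75, 1.00, 0.50, 0.50, 0.75, 0.25),
  iv2 = c(0.882, 0.867, 0.900, 0.333, 0.875, 0.500, 0.882, 0.875, 0.778, 0.867),
  iv3 = c(1.000, 0.067, 1.000, 0.933, 0.875, 0.500, 0.588, 0.875, 1.000, 0.467),
  iv4 = c(0.889, 1.000, 0.905, 0.938, 0.833, 0.882, 0.444, 0.588, 0.895, 0.812),
  iv5 = c(18, 16, 21, 16, 18, 17, 18, 17, 19, 16)
)

# Hypothetical helper: sequential (order-dependent) sums of squares, by term name
seq_ss <- function(fit) {
  a <- anova(fit)
  setNames(a[["Sum Sq"]], rownames(a))
}

# Same predictors, two orders of entry
fit_a <- lm(dv ~ iv1 + iv2 + iv3 + iv4 + iv5, data = D)
fit_b <- lm(dv ~ iv5 + iv4 + iv3 + iv2 + iv1, data = D)

ss_a <- seq_ss(fit_a)
ss_b <- seq_ss(fit_b)

# One row per order, columns matched by term name
print(round(rbind(order_a = ss_a, order_b = ss_b[names(ss_a)]), 5))
```

The residual and total sums of squares are the same in both rows, since it is the same fitted model; only the way the explained part is split among the predictors changes.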

[One possibility might be to average across all orders or something, but it would get unwieldy and might not be answering a particularly useful question.]
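For what it's worth, the averaging idea can be sketched in base R as below. This is only an illustration on the question's sample data: `perms`, `ivs`, `ss_by_order` and `avg_pct` are names I made up, and with p predictors it fits p! models, so it really does get unwieldy quickly. (As far as I know, the `relaimpo` package offers this kind of averaging through `calc.relimp` with `type = "lmg"`, but I have not used it here.)

```r
# Sample data from the question
D <- data.frame(
  dv  = c(0.75, 1.00, 1.00, 0.75, 0.50, 0.75, 1.00, 1.00, 0.75, 0.50),
  iv1 = c(0.75, 1.00, 1.00, 0.75, 0.75, 1.00, 0.50, 0.50, 0.75, 0.25),
  iv2 = c(0.882, 0.867, 0.900, 0.333, 0.875, 0.500, 0.882, 0.875, 0.778, 0.867),
  iv3 = c(1.000, 0.067, 1.000, 0.933, 0.875, 0.500, 0.588, 0.875, 1.000, 0.467),
  iv4 = c(0.889, 1.000, 0.905, 0.938, 0.833, 0.882, 0.444, 0.588, 0.895, 0.812),
  iv5 = c(18, 16, 21, 16, 18, 17, 18, 17, 19, 16)
)

# Hypothetical helper: all permutations of a vector, as a list
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

ivs <- c("iv1", "iv2", "iv3", "iv4", "iv5")

# Sequential SS for every order of entry: one column per order, one row per predictor
ss_by_order <- sapply(perms(ivs), function(ord) {
  a <- anova(lm(reformulate(ord, response = "dv"), data = D))
  setNames(a[["Sum Sq"]], rownames(a))[ivs]
})

# Average over orders, as a percentage of the total sum of squares
total_ss <- sum((D$dv - mean(D$dv))^2)
avg_pct  <- 100 * rowMeans(ss_by_order) / total_ss
print(round(avg_pct, 2))
```

The averaged percentages add up to 100 times the R-squared of the full model, because every individual order already does.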

—

As Stat points out, with a single model, if you’re after one variable at a time, you can just use `anova` to produce the incremental sums of squares table. This would follow on from your code:

```
anova(fit)

Analysis of Variance Table

Response: dv
          Df   Sum Sq  Mean Sq F value Pr(>F)
iv1        1 0.033989 0.033989  0.7762 0.4281
iv2        1 0.022435 0.022435  0.5123 0.5137
iv3        1 0.003048 0.003048  0.0696 0.8050
iv4        1 0.115143 0.115143  2.6294 0.1802
iv5        1 0.000220 0.000220  0.0050 0.9469
Residuals  4 0.175166 0.043791
```

—

So there we have the incremental variance explained; how do we get the proportion?

Pretty trivially: divide each sum of squares by their total (and multiply by 100 for percentage of variance explained). Because the Residuals row is included, that total is the total sum of squares.

Here I’ve displayed it as an added column to the anova table:

```
af <- anova(fit)
afss <- af$"Sum Sq"
print(cbind(af,PctExp=afss/sum(afss)*100))

          Df       Sum Sq      Mean Sq    F value    Pr(>F)      PctExp
iv1        1 0.0339887640 0.0339887640 0.77615140 0.4280748  9.71107544
iv2        1 0.0224346357 0.0224346357 0.51230677 0.5137026  6.40989591
iv3        1 0.0030477233 0.0030477233 0.06959637 0.8049589  0.87077807
iv4        1 0.1151432643 0.1151432643 2.62935731 0.1802223 32.89807550
iv5        1 0.0002199726 0.0002199726 0.00502319 0.9468997  0.06284931
Residuals  4 0.1751656402 0.0437914100         NA        NA 50.04732577
```

—

If you decide you want several particular orders of entry, you can do something even more general like this (which also allows you to enter or remove groups of variables at a time if you wish):

```
m5 = fit
m4 = update(m5, ~ . - iv5)
m3 = update(m4, ~ . - iv4)
m2 = update(m3, ~ . - iv3)
m1 = update(m2, ~ . - iv2)
m0 = update(m1, ~ . - iv1)
anova(m0,m1,m2,m3,m4,m5)

Analysis of Variance Table

Model 1: dv ~ 1
Model 2: dv ~ iv1
Model 3: dv ~ iv1 + iv2
Model 4: dv ~ iv1 + iv2 + iv3
Model 5: dv ~ iv1 + iv2 + iv3 + iv4
Model 6: dv ~ iv1 + iv2 + iv3 + iv4 + iv5
  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1      9 0.35000
2      8 0.31601  1  0.033989 0.7762 0.4281
3      7 0.29358  1  0.022435 0.5123 0.5137
4      6 0.29053  1  0.003048 0.0696 0.8050
5      5 0.17539  1  0.115143 2.6294 0.1802
6      4 0.17517  1  0.000220 0.0050 0.9469
```

(Such an approach might also be automated, e.g. via loops and the use of `get`. You can add and remove variables in multiple orders if needed.)
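One way the loop version might look, as a rough sketch on the question's sample data: here I keep the nested models in a list and hand them to `anova` with `do.call`, which sidesteps `get` altogether. The names `order_in`, `models` and `tab` are my own; change `order_in` to try a different order of entry.

```r
# Sample data from the question
D <- data.frame(
  dv  = c(0.75, 1.00, 1.00, 0.75, 0.50, 0.75, 1.00, 1.00, 0.75, 0.50),
  iv1 = c(0.75, 1.00, 1.00, 0.75, 0.75, 1.00, 0.50, 0.50, 0.75, 0.25),
  iv2 = c(0.882, 0.867, 0.900, 0.333, 0.875, 0.500, 0.882, 0.875, 0.778, 0.867),
  iv3 = c(1.000, 0.067, 1.000, 0.933, 0.875, 0.500, 0.588, 0.875, 1.000, 0.467),
  iv4 = c(0.889, 1.000, 0.905, 0.938, 0.833, 0.882, 0.444, 0.588, 0.895, 0.812),
  iv5 = c(18, 16, 21, 16, 18, 17, 18, 17, 19, 16)
)

# Chosen order of entry (an assumption for illustration; reorder as needed)
order_in <- c("iv1", "iv2", "iv3", "iv4", "iv5")

# Start from the intercept-only model, then add one predictor per step
models <- list(lm(dv ~ 1, data = D))
for (k in seq_along(order_in)) {
  f <- reformulate(order_in[seq_len(k)], response = "dv")
  models[[k + 1]] <- lm(f, data = D)
}

# Compare the whole nested sequence in one table
tab <- do.call(anova, models)
print(tab)
```

With the order shown, the "Sum of Sq" column should match the single-model `anova(fit)` table, since both are sequential sums of squares for the same order.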

… and then scale to percentages as before.
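One wrinkle when scaling the multi-model table: its "Sum of Sq" column has no residual row (and an `NA` in the first row), so dividing by the column sum would not give the same thing as before. A sketch of one way around that, assuming you want percentages of the total sum of squares, is to divide by the RSS of the intercept-only model. The names `tab`, `total_ss` and `pct` are mine.

```r
# Sample data from the question
D <- data.frame(
  dv  = c(0.75, 1.00, 1.00, 0.75, 0.50, 0.75, 1.00, 1.00, 0.75, 0.50),
  iv1 = c(0.75, 1.00, 1.00, 0.75, 0.75, 1.00, 0.50, 0.50, 0.75, 0.25),
  iv2 = c(0.882, 0.867, 0.900, 0.333, 0.875, 0.500, 0.882, 0.875, 0.778, 0.867),
  iv3 = c(1.000, 0.067, 1.000, 0.933, 0.875, 0.500, 0.588, 0.875, 1.000, 0.467),
  iv4 = c(0.889, 1.000, 0.905, 0.938, 0.833, 0.882, 0.444, 0.588, 0.895, 0.812),
  iv5 = c(18, 16, 21, 16, 18, 17, 18, 17, 19, 16)
)

# Nested models, built by dropping one term at a time
m5 <- lm(dv ~ iv1 + iv2 + iv3 + iv4 + iv5, data = D)
m4 <- update(m5, ~ . - iv5)
m3 <- update(m4, ~ . - iv4)
m2 <- update(m3, ~ . - iv3)
m1 <- update(m2, ~ . - iv2)
m0 <- update(m1, ~ . - iv1)

tab <- anova(m0, m1, m2, m3, m4, m5)

# The RSS of the intercept-only model is the total sum of squares
total_ss <- tab$RSS[1]

# Percentage of total variance added at each step (NA for the first row)
pct <- 100 * tab[["Sum of Sq"]] / total_ss
print(cbind(tab, PctExp = pct))
```

The percentages for the steps then agree with the `PctExp` column from the single-model table, and what is left over (100 minus their sum) is the residual share.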

(NB. The fact that I explain how to do these things should not necessarily be taken as advocacy of everything I explain.)

**Attribution**

*Source: Link, Question Author: baixiwei, Answer Author: Glen_b*