In all the (regression) random forest papers I’ve read, when it comes time to gather the predictions of all the trees, we take the average value as the prediction.

My question is why do we do that?

Is there a statistical justification for taking the average?

EDIT: To clarify the question, I know it’s possible to use other aggregation functions (we use the mode for classification); I’m mostly interested in whether there is a theoretical justification for choosing the average.
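To make the two aggregation rules from the question concrete, here is a minimal sketch in NumPy. The per-tree predictions are made-up numbers, not output of any real forest:

```python
import numpy as np

# Hypothetical per-tree predictions for a single test point.
# Regression forest: each tree outputs a real value; aggregate with the mean.
reg_preds = np.array([3.1, 2.8, 3.4, 2.9, 3.0])
forest_reg_prediction = reg_preds.mean()

# Classification forest: each tree votes for a class; aggregate with the mode
# (the majority vote among the trees).
clf_preds = np.array([1, 0, 1, 1, 0])
values, counts = np.unique(clf_preds, return_counts=True)
forest_clf_prediction = values[np.argmax(counts)]

print(forest_reg_prediction)  # 3.04
print(forest_clf_prediction)  # 1
```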

**Answer**

I’ve always thought about the averaging in terms of the bias–variance tradeoff. If I remember correctly, Leo Breiman hinted at this in the Random Forests paper with his statement that such forests “… are more robust with respect to noise.”

The explanation goes like this: you are taking a bunch of trees that are grown to full depth, with no pruning, so you know each one will be biased on its own. However, the random sampling that induces each tree in the forest should produce under-bias about as often as over-bias. So by taking an average you cancel the bias of the individual trees, the over- and under-biases offsetting each other. Hopefully in the process you also reduce the variance relative to any single tree, so the overall variance of the forest prediction should be reduced as well.
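The variance-reduction part of this argument can be illustrated with a toy simulation. This is not a real forest: each "tree" is stood in for by an unbiased noisy estimator of a target value, and the noise terms are assumed independent (real trees share training data, so their errors are correlated and the reduction is smaller in practice):

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 5.0
n_trees, n_repeats = 100, 2000

# Each row is one simulated forest; each column is one "tree's" prediction.
# Gaussian noise stands in for the randomness of bootstrap samples and
# feature subsets; independence across columns is an idealizing assumption.
tree_preds = true_value + rng.normal(0.0, 1.0, size=(n_repeats, n_trees))

single_tree_var = tree_preds[:, 0].var()        # variance of one tree
forest_var = tree_preds.mean(axis=1).var()      # variance of the averaged prediction

print(single_tree_var)  # close to 1.0
print(forest_var)       # close to 1.0 / n_trees
```

Under the independence assumption the variance of the average shrinks by a factor of roughly `n_trees`, which is the usual statistical motivation for averaging.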

As indicated by the other answers to the post, this might not be the only reason for averaging.

**Attribution**
*Source: Link, Question Author: Bar, Answer Author: Lucas Roberts*