I’m running an experiment where I’m gathering (independent) samples in parallel. I compute the variance of each group of samples, and now I want to combine them all to find the total variance of all the samples.

I’m having a hard time finding a derivation for this, as I’m not sure of the terminology. I think of it as a partition of one RV.

So I want to find $\operatorname{Var}(X)$ from $\operatorname{Var}(X_1), \operatorname{Var}(X_2), \ldots, \operatorname{Var}(X_n)$, where $X = [X_1, X_2, \ldots, X_n]$.

EDIT: The partitions are not the same size/cardinality, but the partition sizes sum to the number of samples in the overall sample set.

EDIT 2: There is a formula for a parallel computation here, but it only covers the case of a partition into two sets, not n sets.
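Assuming the two-set formula referred to is the usual pairwise update on (count, mean, sum of squared deviations), as in Chan, Golub and LeVeque's parallel variance algorithm, it extends to any number of sets by applying it repeatedly. A minimal R sketch; `merge_two` and `merge_all` are hypothetical helper names, and the simulated data are for illustration only:

```r
# Pairwise update for two groups, each summarised by its size n,
# mean and sample variance var (as R's var() computes it).
merge_two <- function(a, b) {
  # Sum of squared deviations from the group's own mean; a single value contributes 0
  ss <- function(s) if (s$n > 1) (s$n - 1) * s$var else 0
  n <- a$n + b$n
  delta <- b$mean - a$mean
  m2 <- ss(a) + ss(b) + delta^2 * a$n * b$n / n
  list(n = n, mean = a$mean + delta * b$n / n, var = m2 / (n - 1))
}

# Folding the pairwise update over a list of groups handles any number of them.
merge_all <- function(groups) Reduce(merge_two, groups)

# Example: simulated data split into three groups of unequal size.
set.seed(1)
x <- rnorm(37)
parts <- split(x, rep(1:3, c(10, 15, 12)))
summaries <- lapply(parts, function(p) list(n = length(p), mean = mean(p), var = var(p)))
combined <- merge_all(summaries)
all.equal(combined$var, var(x))
# [1] TRUE
```

The order in which groups are merged does not matter, so the same fold can be run as a tree reduction across workers.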

**Answer**

The formula is fairly straightforward if all the sub-samples have the same sample size. If you had $g$ sub-samples of size $k$ (for a total of $gk$ samples), then the variance of the combined sample depends on the mean $E_j$ and variance $V_j$ of each sub-sample:

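$$\operatorname{Var}(X_1,\ldots,X_{gk}) = \frac{k-1}{gk-1}\left(\sum_{j=1}^{g} V_j + \frac{k(g-1)}{k-1}\operatorname{Var}(E_j)\right),$$

where $\operatorname{Var}(E_j)$ means the sample variance of the $g$ sub-sample means.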
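To see where the coefficients come from, split the total sum of squares into within- and between-group parts. Here $X_{ji}$ (notation introduced for this sketch) is the $i$-th value in sub-sample $j$, and $\bar E$ is the grand mean, which with equal sizes is the plain average of the $E_j$:

$$
\begin{aligned}
\sum_{j=1}^{g}\sum_{i=1}^{k}(X_{ji}-\bar E)^2
  &= \sum_{j=1}^{g}(k-1)V_j + k\sum_{j=1}^{g}(E_j-\bar E)^2 \\
  &= (k-1)\sum_{j=1}^{g}V_j + k(g-1)\operatorname{Var}(E_j).
\end{aligned}
$$

Dividing by $gk-1$ and factoring out $k-1$ gives the formula above; with $g=k=10$ this is the `9/99*(sum(vs) + 10*var(mns))` used in the R demonstration.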

A demonstration in R:

```
> x <- rnorm(100)
> g <- gl(10,10)
> mns <- tapply(x, g, mean)
> vs <- tapply(x, g, var)
> 9/99*(sum(vs) + 10*var(mns))
[1] 1.033749
> var(x)
[1] 1.033749
```

If the sample sizes are not equal, the formula is not so nice.

**EDIT: formula for unequal sample sizes**

If there are $g$ sub-samples, each with $k_j$ elements ($j=1,\ldots,g$), for a total of $n=\sum_j k_j$ values, then

$$\operatorname{Var}(X_1,\ldots,X_n) = \frac{1}{n-1}\left(\sum_{j=1}^{g}(k_j-1)V_j + \sum_{j=1}^{g}k_j\left(\bar X_j-\bar X\right)^2\right),$$

where $\bar X = \left(\sum_{j=1}^{g} k_j\bar X_j\right)/n$ is the weighted average of all the means (and equals the mean of all values).
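As a small hand-checkable illustration (values chosen here, not from the original), take the sub-samples $\{1,2,3\}$ and $\{4,6\}$, so $k_1=3$, $\bar X_1=2$, $V_1=1$ and $k_2=2$, $\bar X_2=5$, $V_2=2$. Then $n=5$ and $\bar X = (3\cdot 2 + 2\cdot 5)/5 = 3.2$:

$$
\operatorname{Var} = \frac{1}{4}\Big(\underbrace{2\cdot 1 + 1\cdot 2}_{\text{within}} + \underbrace{3(2-3.2)^2 + 2(5-3.2)^2}_{\text{between}}\Big) = \frac{4 + 10.8}{4} = 3.7,
$$

which matches the sample variance of $1,2,3,4,6$ computed directly.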

Again, a demonstration:

```
> k <- rpois(10, lambda=10)
> n <- sum(k)
> g <- factor(rep(1:10, k))
> x <- rnorm(n)
> mns <- tapply(x, g, mean)
> vs <- tapply(x, g, var)
> 1/(n-1)*(sum((k-1)*vs) + sum(k*(mns-weighted.mean(mns,k))^2))
[1] 1.108966
> var(x)
[1] 1.108966
```
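Because only the size, mean and variance of each sub-sample enter the formula, each parallel worker only needs to report those three numbers. A minimal R sketch wrapping the formula; `combine_var` is a hypothetical helper name, not an existing R function:

```r
# Combined sample variance from per-group summaries only:
# sizes k, means m and sample variances v (one entry per group).
combine_var <- function(k, m, v) {
  v[k < 2] <- 0                          # a single value has no within-group spread (var() gives NA)
  n <- sum(k)
  grand_mean <- sum(k * m) / n           # weighted mean of the group means
  within  <- sum((k - 1) * v)            # pooled within-group sum of squares
  between <- sum(k * (m - grand_mean)^2) # between-group sum of squares
  (within + between) / (n - 1)
}

# Sub-samples {1, 2, 3} and {4, 6}:
combine_var(k = c(3, 2), m = c(2, 5), v = c(1, 2))
# [1] 3.7
var(c(1, 2, 3, 4, 6))
# [1] 3.7
```

With the objects from the demonstration above, `combine_var(k, mns, vs)` should reproduce `var(x)`.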

By the way, these formulas are easy to derive: write the desired variance as the scaled sum of $(X_{ji}-\bar X)^2$, introduce $\bar X_j$ by writing each term as $[(X_{ji}-\bar X_j)-(\bar X-\bar X_j)]^2$, expand with the square-of-a-difference formula, and simplify.
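Spelled out, with $X_{ji}$ the $i$-th value of sub-sample $j$, the cross term vanishes because deviations from a sub-sample's own mean sum to zero:

$$
\begin{aligned}
\sum_{j=1}^{g}\sum_{i=1}^{k_j}(X_{ji}-\bar X)^2
&= \sum_{j=1}^{g}\sum_{i=1}^{k_j}\big[(X_{ji}-\bar X_j)-(\bar X-\bar X_j)\big]^2\\
&= \sum_{j=1}^{g}\sum_{i=1}^{k_j}(X_{ji}-\bar X_j)^2
 - 2\sum_{j=1}^{g}(\bar X-\bar X_j)\underbrace{\sum_{i=1}^{k_j}(X_{ji}-\bar X_j)}_{=0}
 + \sum_{j=1}^{g}k_j(\bar X_j-\bar X)^2\\
&= \sum_{j=1}^{g}(k_j-1)V_j + \sum_{j=1}^{g}k_j(\bar X_j-\bar X)^2 .
\end{aligned}
$$

Dividing by $n-1$ gives the formula for unequal sample sizes; setting every $k_j=k$ recovers the equal-size version.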

**Attribution** *Source: Link, Question Author: gallamine, Answer Author: gallamine*