# How to calculate the variance of a partition of variables

I’m running an experiment where I gather (independent) samples in parallel. I compute the variance of each group of samples, and now I want to combine them all to find the total variance of all the samples.

I’m having a hard time finding a derivation for this, as I’m not sure of the terminology. I think of it as a partition of one RV.

So I want to find $Var(X)$ from $Var(X_1)$, $Var(X_2)$, …, and $Var(X_n)$, where $X = [X_1, X_2, \dots, X_n]$.

EDIT: The partitions are not the same size/cardinality, but the sum of the partition sizes equals the number of samples in the overall sample set.

EDIT 2: There is a formula for a parallel computation here, but it only covers the case of a partition into two sets, not $n$ sets.

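The formula is fairly straightforward if all the sub-samples have the same sample size. If you have $g$ sub-samples of size $k$ (for a total of $gk$ values), then the variance of the combined sample depends on the mean $E_j$ and variance $V_j$ of each sub-sample:

$$ Var(X_1,\ldots,X_{gk}) = \frac{k-1}{gk-1}\left(\sum_{j=1}^g V_j + \frac{k(g-1)}{k-1}Var(E_j)\right), $$

where $Var(E_j)$ denotes the sample variance of the sub-sample means. For $g = k = 10$, as in the demonstration below, the factor $k(g-1)/(k-1)$ is exactly 10.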

A demonstration in R:

> x <- rnorm(100)
> g <- gl(10,10)
> mns <- tapply(x, g, mean)
> vs <- tapply(x, g, var)
> 9/99*(sum(vs) + 10*var(mns))
[1] 1.033749
> var(x)
[1] 1.033749

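The demonstration above uses $g = k = 10$, which hides the general coefficient in front of the variance of the means. As a rough check under different settings, here is a sketch with $g = 4$ sub-samples of size $k = 25$. The seed and the names `g_count`, `grp`, and `combined` are my own choices for illustration, not part of the original answer.

```r
# Sketch: equal-size sub-samples with g != k, so the general coefficient matters.
set.seed(1)                      # assumption: seed chosen only for reproducibility
g_count <- 4                     # number of sub-samples (g)
k <- 25                          # size of each sub-sample (k)
x <- rnorm(g_count * k)
grp <- gl(g_count, k)            # factor with g_count levels, each repeated k times
mns <- tapply(x, grp, mean)      # sub-sample means E_j
vs  <- tapply(x, grp, var)       # sub-sample variances V_j

# Var(X) = [(k-1) * sum(V_j) + k*(g-1) * Var(E_j)] / (g*k - 1)
combined <- ((k - 1) * sum(vs) + k * (g_count - 1) * var(mns)) / (g_count * k - 1)
all.equal(combined, var(x))      # compares up to floating-point tolerance
```

The comparison uses `all.equal` rather than `==` because the two quantities are computed along different floating-point paths.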

If the sample sizes are not equal, the formula is not so nice.

EDIT: formula for unequal sample sizes

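If there are $g$ sub-samples, each with $k_j$, $j=1,\ldots,g$, elements for a total of $n=\sum_j k_j$ values, then

$$ Var(X_1,\ldots,X_n) = \frac{1}{n-1}\left(\sum_{j=1}^g (k_j-1)V_j + \sum_{j=1}^g k_j(\bar{X}_j-\bar{X})^2\right), $$

where $\bar{X}_j$ is the mean of sub-sample $j$ and $\bar{X} = (\sum_{j=1}^gk_j\bar{X}_j)/n$ is the weighted average of all the means (and is equal to the mean of all values).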

Again, a demonstration:

> k <- rpois(10, lambda=10)
> n <- sum(k)
> g <- factor(rep(1:10, k))
> x <- rnorm(n)
> mns <- tapply(x, g, mean)
> vs <- tapply(x, g, var)
> 1/(n-1)*(sum((k-1)*vs) + sum(k*(mns-weighted.mean(mns,k))^2))
[1] 1.108966
> var(x)
[1] 1.108966

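For the parallel setting in the question, each worker only needs to report its sample size, mean, and variance. A minimal sketch of a combiner along those lines follows. `combine_variance` is a hypothetical helper name (not a base R function), the three simulated chunks are made up for illustration, and every sub-sample is assumed to hold at least two values so that its variance is defined.

```r
# Hypothetical helper (not from base R): combine per-group summaries into
# the overall sample variance without revisiting the raw data.
# Assumes every group has at least two values (var() is NA for a single value).
combine_variance <- function(k, means, vars) {
  n <- sum(k)                                  # total number of values
  grand_mean <- sum(k * means) / n             # weighted mean = mean of all values
  within  <- sum((k - 1) * vars)               # within-group sum of squares
  between <- sum(k * (means - grand_mean)^2)   # between-group sum of squares
  (within + between) / (n - 1)
}

# Usage: three workers with different sample sizes (illustrative data)
set.seed(42)
chunks <- list(rnorm(5), rnorm(12, mean = 3), rnorm(8, sd = 2))
k     <- sapply(chunks, length)
means <- sapply(chunks, mean)
vars  <- sapply(chunks, var)

all.equal(combine_variance(k, means, vars), var(unlist(chunks)))
```

Because the function takes any number of groups at once, it covers the $n$-set case directly and does not need to be applied pairwise.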

By the way, these formulas are easy to derive by writing the desired variance as the scaled sum of $(X_{ji}-\bar{X})^2$, then introducing $\bar{X}_j$: $[(X_{ji}-\bar{X}_j)+(\bar{X}_j-\bar{X})]^2$, expanding the square of the sum, and simplifying (the cross term vanishes because the deviations $X_{ji}-\bar{X}_j$ sum to zero within each sub-sample).
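
Written out, the decomposition might look like this. It is a sketch in which $X_{ji}$ denotes the $i$-th value of sub-sample $j$, $\bar{X}_j$ and $V_j$ its mean and variance, and $\bar{X}$ the mean of all $n$ values:

$$
\begin{aligned}
(n-1)\,Var(X) &= \sum_{j=1}^{g}\sum_{i=1}^{k_j}(X_{ji}-\bar{X})^2 \\
&= \sum_{j=1}^{g}\sum_{i=1}^{k_j}\left[(X_{ji}-\bar{X}_j)+(\bar{X}_j-\bar{X})\right]^2 \\
&= \sum_{j=1}^{g}\sum_{i=1}^{k_j}(X_{ji}-\bar{X}_j)^2
 + 2\sum_{j=1}^{g}(\bar{X}_j-\bar{X})\sum_{i=1}^{k_j}(X_{ji}-\bar{X}_j)
 + \sum_{j=1}^{g}k_j(\bar{X}_j-\bar{X})^2 \\
&= \sum_{j=1}^{g}(k_j-1)V_j + \sum_{j=1}^{g}k_j(\bar{X}_j-\bar{X})^2 .
\end{aligned}
$$

The middle term drops out because $\sum_{i}(X_{ji}-\bar{X}_j)=0$ for every $j$. When all $k_j = k$, $\bar{X}$ is the plain average of the sub-sample means, so $\sum_j(\bar{X}_j-\bar{X})^2=(g-1)Var(E_j)$, which gives the equal-size formula.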