How to calculate the variance of a partition of variables

I’m running an experiment where I’m gathering (independent) samples in parallel. I compute the variance of each group of samples, and now I want to combine them all to find the total variance of all the samples.

I’m having a hard time finding a derivation for this, as I’m not sure of the terminology. I think of it as a partition of one RV.

So I want to find $\mathrm{Var}(X)$ from $\mathrm{Var}(X_1), \mathrm{Var}(X_2), \dots, \mathrm{Var}(X_n)$, where $X = [X_1, X_2, \dots, X_n]$.

EDIT: The partitions are not the same size/cardinality, but the partition sizes sum to the number of samples in the overall sample set.

EDIT 2: There is a formula for a parallel computation here, but it only covers the case of a partition into two sets, not n sets.

Answer

The formula is fairly straightforward if all the sub-samples have the same sample size. If you have $g$ sub-samples of size $k$ (for a total of $gk$ samples), then the variance of the combined sample depends on the mean $E_j$ and variance $V_j$ of each sub-sample:
$$\mathrm{Var}(X_1, \dots, X_{gk}) = \frac{k-1}{gk-1}\left(\sum_{j=1}^{g} V_j + \frac{k(g-1)}{k-1}\,\mathrm{Var}(E_j)\right),$$
where $\mathrm{Var}(E_j)$ means the sample variance of the $g$ sub-sample means.

A demonstration in R:

> x <- rnorm(100)
> g <- gl(10,10)
> mns <- tapply(x, g, mean)
> vs <- tapply(x, g, var)
> 9/99*(sum(vs) + 10*var(mns))
[1] 1.033749
> var(x)
[1] 1.033749

If the sample sizes are not equal, the formula is not so nice.

EDIT: formula for unequal sample sizes

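If there are $g$ sub-samples, each with $k_j$, $j = 1, \dots, g$, elements for a total of $n = \sum_{j=1}^{g} k_j$ values, then
$$\mathrm{Var}(X_1, \dots, X_n) = \frac{1}{n-1}\left(\sum_{j=1}^{g} (k_j - 1) V_j + \sum_{j=1}^{g} k_j (\bar{X}_j - \bar{X})^2\right),$$
where $\bar{X} = \left(\sum_{j=1}^{g} k_j \bar{X}_j\right)/n$ is the weighted average of all the means (and equals the mean of all values).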

Again, a demonstration:

> k <- rpois(10, lambda=10)
> n <- sum(k)
> g <- factor(rep(1:10, k))
> x <- rnorm(n)
> mns <- tapply(x, g, mean)
> vs <- tapply(x, g, var)
> 1/(n-1)*(sum((k-1)*vs) + sum(k*(mns-weighted.mean(mns,k))^2))
[1] 1.108966
> var(x)
[1] 1.108966
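
In the parallel setting from the question, each worker would typically report only its group size, mean, and variance rather than the raw samples. Here is a minimal sketch of the unequal-size formula packaged as a function. The name `combine_var` and its arguments are illustrative assumptions, not part of any package. The variances are assumed to use the usual $n-1$ denominator, as R's `var` does.

```r
# Combine per-group summaries into the sample variance of the pooled data.
# k: group sizes, m: group means, v: group sample variances (n - 1 denominator).
# combine_var is a hypothetical helper name used only for this illustration.
combine_var <- function(k, m, v) {
  stopifnot(length(k) == length(m), length(k) == length(v))
  n <- sum(k)
  grand_mean <- sum(k * m) / n              # weighted average of the group means
  # A group of size 1 has var() == NA; it adds nothing to the within-group term.
  within <- sum(ifelse(k > 1, (k - 1) * v, 0))
  between <- sum(k * (m - grand_mean)^2)    # spread of group means around the grand mean
  (within + between) / (n - 1)
}

# Compare against var() on the pooled data for groups of different sizes.
set.seed(1)
groups <- list(rnorm(5), rnorm(12), rnorm(1), rnorm(30))
k <- sapply(groups, length)
m <- sapply(groups, mean)
v <- sapply(groups, var)
agrees <- isTRUE(all.equal(combine_var(k, m, v), var(unlist(groups))))
print(agrees)
```

Note that the group means are needed as well as the variances and sizes. The variances alone do not determine the variance of the combined sample.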

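By the way, these formulas are easy to derive by writing the desired variance as the scaled sum of $(X_{ji} - \bar{X})^2$, then introducing $\bar{X}_j$ to get $[(X_{ji} - \bar{X}_j) + (\bar{X}_j - \bar{X})]^2$, expanding the square, and simplifying. The cross term vanishes because the deviations $X_{ji} - \bar{X}_j$ sum to zero within each sub-sample.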
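
Written out, the derivation might look like the following sketch, where $X_{ji}$ denotes the $i$-th value in sub-sample $j$ (an indexing assumed here to match the notation above):

```latex
\begin{align*}
(n-1)\,\mathrm{Var}(X_1,\dots,X_n)
  &= \sum_{j=1}^{g}\sum_{i=1}^{k_j} (X_{ji}-\bar{X})^2 \\
  &= \sum_{j=1}^{g}\sum_{i=1}^{k_j} \bigl[(X_{ji}-\bar{X}_j) + (\bar{X}_j-\bar{X})\bigr]^2 \\
  &= \sum_{j=1}^{g}\sum_{i=1}^{k_j} (X_{ji}-\bar{X}_j)^2
     + 2\sum_{j=1}^{g} (\bar{X}_j-\bar{X}) \underbrace{\sum_{i=1}^{k_j} (X_{ji}-\bar{X}_j)}_{=\,0}
     + \sum_{j=1}^{g} k_j (\bar{X}_j-\bar{X})^2 \\
  &= \sum_{j=1}^{g} (k_j-1)\,V_j + \sum_{j=1}^{g} k_j (\bar{X}_j-\bar{X})^2 .
\end{align*}
```

Setting every $k_j = k$ gives $n = gk$ and $\sum_j k(\bar{X}_j - \bar{X})^2 = k(g-1)\,\mathrm{Var}(E_j)$, which recovers the equal-size formula.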

Attribution
Source : Link , Question Author : gallamine , Answer Author : gallamine
