Computation of new standard deviation using old standard deviation after change in dataset

I have an array of $n$ real values with mean $\mu_{old}$ and standard deviation $\sigma_{old}$. If an element $x_i$ of the array is replaced by another element $x_j$, then the new mean will be

$$\mu_{new} = \mu_{old} + \frac{x_j - x_i}{n}$$

The advantage of this approach is that it requires a constant amount of computation regardless of the value of $n$. Is there a similar way to calculate $\sigma_{new}$ from $\sigma_{old}$, like the computation of $\mu_{new}$ from $\mu_{old}$?

Answer

A section in the Wikipedia article on “Algorithms for calculating variance” shows how to compute the variance if elements are added to your observations. (Recall that the standard deviation is the square root of the variance.) Assume that you append $x_{n+1}$ to your array, then

$$\sigma_{new}^2 = \sigma_{old}^2 + (x_{n+1} - \mu_{new})(x_{n+1} - \mu_{old}).$$

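EDIT: The above formula seems to be wrong, see comment. A recurrence of this form applies to the sum of squared deviations $\sum_k (x_k - \mu)^2$, not to the variance itself.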

Now, replacing an element means adding an observation and removing another one; both can be computed with the formula above. However, keep in mind that problems of numerical stability may ensue; the quoted article also proposes numerically stable variants.
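As a rough illustration of the add-then-remove idea, here is a minimal Python sketch. It assumes you track the count, the mean, and the sum of squared deviations `m2` (the quantity that Welford-style updates operate on) rather than the variance itself. The function names are hypothetical and not taken from any library.

```python
import math


def add_value(n, mean, m2, x):
    """Welford-style update after appending x. Returns (n, mean, m2)."""
    n_new = n + 1
    mean_new = mean + (x - mean) / n_new
    # m2 is the sum of squared deviations from the mean, not the variance.
    m2_new = m2 + (x - mean) * (x - mean_new)
    return n_new, mean_new, m2_new


def remove_value(n, mean, m2, x):
    """Reverse of add_value: state after deleting x (requires n >= 2)."""
    n_new = n - 1
    mean_new = (n * mean - x) / n_new
    m2_new = m2 - (x - mean_new) * (x - mean)
    return n_new, mean_new, m2_new


def replace_value(n, mean, m2, x_old, x_new):
    """Replace x_old by x_new: add the new value, then remove the old one."""
    state = add_value(n, mean, m2, x_new)
    return remove_value(*state, x_old)


def sample_std(n, m2):
    """Sample standard deviation from m2; clamps tiny negative rounding error."""
    return math.sqrt(max(m2, 0.0) / (n - 1))
```

Each call does a fixed amount of work regardless of `n`. Subtracting in `remove_value` can lose precision when the values are large relative to their spread, which is the kind of stability issue mentioned above.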

To derive the formula by yourself, compute $(n-1)(\sigma_{new}^2 - \sigma_{old}^2)$ using the definition of sample variance and substitute $\mu_{new}$ by the formula you gave when appropriate. This gives you $\sigma_{new}^2 - \sigma_{old}^2$ in the end, and thus a formula for $\sigma_{new}$ given $\sigma_{old}$ and $\mu_{old}$. In my notation, I assume you replace the element $x_n$ by $x_n'$:

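$$
\begin{aligned}
\sigma^2 &= (n-1)^{-1} \sum_k (x_k - \mu)^2 \\
(n-1)(\sigma_{new}^2 - \sigma_{old}^2) &= \sum_{k=1}^{n-1} \left((x_k - \mu_{new})^2 - (x_k - \mu_{old})^2\right) \\
&\quad + \left((x_n' - \mu_{new})^2 - (x_n - \mu_{old})^2\right) \\
&= \sum_{k=1}^{n-1} \left((x_k - \mu_{old} - n^{-1}(x_n' - x_n))^2 - (x_k - \mu_{old})^2\right) \\
&\quad + \left((x_n' - \mu_{old} - n^{-1}(x_n' - x_n))^2 - (x_n - \mu_{old})^2\right)
\end{aligned}
$$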

The $x_k$ terms in the sum transform into something that depends on $\mu_{old}$, but you’ll have to work the equation a little more to derive a neat result. This should give you the general idea.
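For reference, working the equation through (a step not spelled out in the answer) appears to collapse the sum to

$$(n-1)(\sigma_{new}^2 - \sigma_{old}^2) = (x_n' - x_n)\,(x_n' - \mu_{new} + x_n - \mu_{old}).$$

Below is a minimal Python sketch under that assumption, using the sample variance with the $n-1$ denominator. The function name `replace_update` and its parameters are made up for illustration.

```python
import math


def replace_update(n, mean_old, std_old, x_old, x_new):
    """O(1) update of mean and sample standard deviation when x_old is
    replaced by x_new in an array of n values (requires n >= 2)."""
    delta = x_new - x_old
    mean_new = mean_old + delta / n
    var_old = std_old ** 2
    # (n-1) * (var_new - var_old) = delta * (x_new - mean_new + x_old - mean_old)
    var_new = var_old + delta * (x_new - mean_new + x_old - mean_old) / (n - 1)
    # Clamp tiny negative values that can arise from floating-point rounding.
    return mean_new, math.sqrt(max(var_new, 0.0))
```

A quick sanity check is to compare the result against a full recomputation, for example with `statistics.stdev` on the modified array. For long chains of replacements, rounding error may accumulate, so an occasional full recomputation is probably a sensible safeguard.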

Attribution
Source: Link, Question Author: user, Answer Author: krlmlr
