How to combine standard errors for correlated variables

What is the formula for calculating the standard error of a quantity (A) that is the ratio of 2 quantities (A = B/C) if B and C are correlated?

According to page 2 of http://www.met.rdg.ac.uk/~swrhgnrj/combining_errors.pdf the formula for independent variables would be:

$$\left(\frac{\sigma_A}{A}\right)^2 = \left(\frac{\sigma_B}{B}\right)^2 + \left(\frac{\sigma_C}{C}\right)^2.$$

However, how do I account for the covariance of B and C?

Answer

I find a little algebraic manipulation of the following nature to provide a congenial path to solving problems like this — where you know the covariance matrix of variables (B,C) and wish to estimate the variance of some function of them, such as B/C. (This is often called the “Delta Method.”)
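
For reference, the general first-order delta-method statement being specialized here is

$$\operatorname{Var}\bigl(f(B,C)\bigr) \approx \nabla f(\beta,\gamma)^{\mathsf T}\, \Sigma\, \nabla f(\beta,\gamma),$$

where Σ is the covariance matrix of (B,C) and (β,γ) their mean vector; taking f(b,c) = b/c, whose gradient at (β,γ) is (1/γ, −β/γ²), yields exactly the formula derived below.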

Write

$$B = \beta + X, \qquad C = \gamma + Y$$

where β is the expectation of B and γ that of C. This makes (X,Y) a zero-mean random variable with the same variances and covariance as (B,C). Seemingly nothing is accomplished, but this decomposition is algebraically suggestive, as in

$$A = \frac{B}{C} = \frac{\beta+X}{\gamma+Y} = \left(\frac{\beta}{\gamma}\right)\frac{1+X/\beta}{1+Y/\gamma}.$$

That is, A is proportional to a ratio of two numbers that might both be close to unity. This is the circumstance that permits an approximate calculation of the variance of A based only on the covariance matrix of (B,C).

Right away this division by γ shows the futility of attempting a solution when γ ≈ 0. (See https://stats.stackexchange.com/a/299765/919 for illustrations of what goes wrong when dividing one random variable by another that has a good chance of coming very close to zero.)

Assuming γ is reasonably far from 0, the foregoing expression also hints at the possibility of approximating the second fraction using the MacLaurin series for (1 + Y/γ)⁻¹, which will be possible provided there is little chance that |Y/γ| ≥ 1 (outside the range of absolute convergence of this expansion). In other words, further suppose the distribution of C is concentrated between 0 and 2γ. In this case the series gives

$$\frac{1+X/\beta}{1+Y/\gamma} = (1+X/\beta)\left(1-(Y/\gamma)+O\left((Y/\gamma)^2\right)\right) = 1+X/\beta-Y/\gamma+O\left((X/\beta)(Y/\gamma)^2\right).$$

We may neglect the last term provided the chance of (X/β)(Y/γ)² being large is tiny. This is tantamount to supposing most of the probability of Y is very close to 0 relative to γ and that X and Y² are not too strongly correlated. In this case

$$\operatorname{Var}(A) \approx \left(\frac{\beta}{\gamma}\right)^2 \operatorname{Var}\left(1+X/\beta-Y/\gamma\right) = \left(\frac{\beta}{\gamma}\right)^2\left(\frac{1}{\beta^2}\operatorname{Var}(B)+\frac{1}{\gamma^2}\operatorname{Var}(C)-\frac{2}{\beta\gamma}\operatorname{Cov}(B,C)\right) = \frac{1}{\gamma^2}\operatorname{Var}(B)+\frac{\beta^2}{\gamma^4}\operatorname{Var}(C)-\frac{2\beta}{\gamma^3}\operatorname{Cov}(B,C).$$
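
As a quick R sketch of this formula (the helper name approx_var_ratio is hypothetical, not part of the original answer), it can be packaged and checked against the numbers used later:

# Delta-method approximation to Var(B/C) given the means (beta, gamma)
# of (B, C) and their covariance matrix Sigma.
# approx_var_ratio is a hypothetical name introduced for illustration.
approx_var_ratio <- function(beta, gamma, Sigma) {
  Sigma[1, 1] / gamma^2 +
    beta^2 * Sigma[2, 2] / gamma^4 -
    2 * beta / gamma^3 * Sigma[1, 2]
}

Sigma <- matrix(c(1, -0.9, -0.9, 1), 2)  # Var(B) = Var(C) = 1, Cov(B, C) = -0.9
approx_var_ratio(5, 10, Sigma)  # 0.0215, the value quoted below
approx_var_ratio(5, 4, Sigma)   # 0.301, the formula's value when gamma = 4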


You might wonder why I fuss over the assumptions. They matter. One way to check them is to generate jointly Normally distributed variates B and C in a simulation: it will provide a good estimate of the variance of A and, to the extent A appears approximately Normally distributed, will confirm that the three assumptions needed to rely on this result do indeed hold.

For instance, with the covariance matrix

$$\Sigma = \begin{pmatrix} 1 & -0.9 \\ -0.9 & 1 \end{pmatrix}$$

and means (β, γ) = (5, 10), the approximation does OK (left panel):

[Figure: histograms of 100,000 simulated values of A with fitted Normal density curves; left panel: γ = 10, right panel: γ = 4]

The variance of these 100,000 simulated values is 0.0233, close to the formula's value of 0.0215. But reducing γ from 10 to 4, which looks innocent enough (4 is still four standard deviations of C away from 0), has profound effects due to the strong correlation of B and C, as seen in the right-hand histogram. Evidently C has a small but appreciable chance of being nearly 0, creating large values of B/C (both negative and positive). This is a case where we should not neglect the XY² term in the MacLaurin expansion. Now the variance of these 100,000 simulated values of A is 2.200 but the formula gives 0.301, far too small.

This is the R code that generated the first figure. Changing gamma <- 10 to gamma <- 4 in the third line generates the second figure.

n <- 1e5   # Simulation size
beta <- 5
gamma <- 10
Sigma <- matrix(c(1, -0.9, -0.9, 1), 2)

library(MASS) # for mvrnorm, the multivariate Normal generator

bc <- mvrnorm(n, c(beta, gamma), Sigma)
A <- bc[, 1] / bc[, 2]
#
# Report the simulated and approximate variances.
#
signif(c(`Var(A)`=var(A), 
  Approx=(Sigma[1,1]/gamma^2 + beta^2*Sigma[2,2]/gamma^4 - 2*beta/gamma^3*Sigma[1,2])),
  3)

hist(A, freq=FALSE, breaks=50, col="#f0f0f0")                    # empirical density of A
curve(dnorm(x, mean(A), sd(A)), col="SkyBlue", lwd=2, add=TRUE)  # Normal fit for comparison
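
To reproduce the second figure, only gamma changes; a minimal continuation of the script above (simulated variances will differ slightly from run to run):

gamma <- 4  # the only change from the first figure
bc <- mvrnorm(n, c(beta, gamma), Sigma)
A <- bc[, 1] / bc[, 2]
var(A)  # around 2.2 in the run reported above, versus 0.301 from the formula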

Attribution
Source: Link, Question Author: Ralphael M., Answer Author: whuber
