What is the formula for calculating the standard error of a quantity (A) that is the ratio of 2 quantities (A = B/C) if B and C are correlated?
According to page 2 of http://www.met.rdg.ac.uk/~swrhgnrj/combining_errors.pdf
the formula for independent variables would be:
However, how do I account for the covariance of B and C?
I find a little algebraic manipulation of the following nature to provide a congenial path to solving problems like this — where you know the covariance matrix of variables (B,C) and wish to estimate the variance of some function of them, such as B/C. (This is often called the “Delta Method.”)
where β is the expectation of B and γ that of C. This makes (X,Y) a zero-mean random variable with the same variances and covariance as (B,C). Seemingly nothing is accomplished, but this decomposition is algebraically suggestive, as in
That is, A is proportional to a ratio of two numbers that might both be close to unity. This is the circumstance that permits an approximate calculation of the variance of A based only on the covariance matrix of (B,C).
Right away this division by γ shows the futility of attempting a solution when γ≈0. (See https://stats.stackexchange.com/a/299765/919 for illustrations of what goes wrong when dividing one random variable by another that has a good chance of coming very close to zero.)
Assuming γ is reasonably far from 0, the foregoing expression also hints at the possibility of approximating the second fraction using the MacLaurin series for (1+Y/γ)−1, which will be possible provided there is little change that |Y/γ|≥1 (outside the range of absolute convergence of this expansion). In other words, further suppose the distribution of C is concentrated between 0 and 2γ. In this case the series gives
We may neglect the last term provided the chance that (X/β)(Y/γ)2 being large is tiny. This is tantamount to supposing most of the probability of Y is very close to γ and that X and Y2 are not too strongly correlated. In this case
You might wonder why I fuss over the assumptions. They matter. One way to check them is to generate Normally distributed variates B and C in a simulation: it will provide a good estimate of the variance of A and, to the extent A appears approximately Normally distributed, will confirm the three bold assumptions needed to rely on this result do indeed hold.
For instance, with the covariance matrix (1−0.9−0.91) and means (β,γ)=(5,10), the approximation does OK (left panel):
The variance of these 100,000 simulated values is 0.0233, close to the formula’s value of 0.0215. But reducing γ from 10 to 4, which looks innocent enough (4 is still four standard deviations of C away from 0) has profound effects due to the strong correlation of B and C, as seen in the right hand histogram. Evidently C has a small but appreciable chance of being nearly 0, creating large values of B/C (both negative and positive). This is a case where we should not neglect the XY2 term in the MacLaurin expansion. Now the variance of these 100,000 simulated values of A is 2.200 but the formula gives 0.301, far too small.
This is the
R code that generated the first figure. A small change in the third line generates the second figure.
n <- 1e5 # Simulation size beta <- 5 gamma <- 10 Sigma <- matrix(c(1, -0.9, -0.9, 1), 2) library(MASS) #mvrnorm bc <- mvrnorm(n, c(beta, gamma), Sigma) A <- bc[, 1] / bc[, 2] # # Report the simulated and approximate variances. # signif(c(`Var(A)`=var(A), Approx=(Sigma[1,1]/gamma^2 + beta^2*Sigma[2,2]/gamma^4 - 2*beta/gamma^3*Sigma[1,2])), 3) hist(A, freq=FALSE, breaks=50, col="#f0f0f0") curve(dnorm(x, mean(A), sd(A)), col="SkyBlue", lwd=2, add=TRUE)