# Generate pairs of random numbers uniformly distributed and correlated

I would like to generate pairs of random numbers with a certain correlation. However, the usual approach of using a linear combination of two normal variables is not valid here, because a linear combination of uniform variables is no longer uniformly distributed. I need both variables to be uniform.

Any ideas on how to generate pairs of uniform variables with a given correlation?

I’m not aware of a universal method to generate correlated random variables with any given marginal distributions. So, I’ll propose an ad hoc method to generate pairs of uniformly distributed random variables with a given (Pearson) correlation.
Without loss of generality, I assume that the desired marginal distribution is standard uniform (i.e., the support is $[0, 1]$).

The proposed approach relies on the following:
a) For standard uniform random variables $U_1$ and $U_2$ with respective distribution functions $F_1$ and $F_2$, we have $F_i(U_i) = U_i$, for $i = 1, 2$.
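Thus, by definition, Spearman’s rho is

$$\rho_{\rm S}(U_1, U_2) = \operatorname{corr}\big(F_1(U_1), F_2(U_2)\big) = \operatorname{corr}(U_1, U_2).$$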

So, for standard uniform variables, Spearman’s rho and Pearson’s correlation coefficient are equal (although their sample versions may differ).
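
As a quick numerical illustration of (a), the sketch below builds a pair with standard uniform margins by a simple mixture that is assumed here only for this check and is not part of the proposed method: with probability $p$ set $U_2 = U_1$, otherwise draw $U_2$ as an independent uniform. Both margins remain standard uniform, and the population Pearson correlation and Spearman’s rho are both equal to $p$, so the two sample estimates should be close to $p$ (and to each other) without being identical.

```r
## Illustrative mixture construction (an assumption for this check only):
## U2 equals U1 with probability p, otherwise U2 is an independent uniform.
set.seed(1)
n <- 1e5
p <- 0.5
u1 <- runif(n)
same <- runif(n) < p                 # pairs where U2 copies U1
u2 <- ifelse(same, u1, runif(n))     # U2 is still standard uniform
pearson  <- cor(u1, u2)                       # sample Pearson correlation
spearman <- cor(u1, u2, method = "spearman")  # sample Spearman's rho
c(pearson = pearson, spearman = spearman)
```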

b) If $X_1, X_2$ are random variables with continuous margins and a Gaussian copula with (Pearson) correlation coefficient $\rho$, then Spearman’s rho is

$$\rho_{\rm S}(X_1, X_2) = \frac{6}{\pi}\arcsin\left(\frac{\rho}{2}\right).$$

This makes it easy to generate random variables that have a desired value of Spearman’s rho.
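
The relation in (b) can be checked numerically with a minimal sketch like the one below (the values of `rho` and `n` are arbitrary choices). Because Spearman’s rho depends only on ranks, it is the same for a bivariate normal pair $(Z_1, Z_2)$ as for $(\Phi(Z_1), \Phi(Z_2))$, so the check can be done directly on the normals.

```r
## Numerical check of rho_S = (6/pi) * asin(rho/2) for a Gaussian copula
set.seed(2)
n <- 1e5
rho <- 0.7                                   # Pearson correlation of the normals
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)  # corr(z1, z2) = rho
spearman.sample <- cor(z1, z2, method = "spearman")
spearman.theory <- (6 / pi) * asin(rho / 2)
c(sample = spearman.sample, theory = spearman.theory)
```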

The approach is to generate data from the Gaussian copula with a correlation coefficient $\rho$ chosen so that Spearman’s rho equals the desired correlation of the uniform random variables.

## Simulation algorithm
Let $r$ denote the desired level of correlation, and $n$ the number of pairs to be generated.
The algorithm is:

1. Compute $\rho = 2\sin (r \pi/6)$.
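2. Generate a pair of random variables from the Gaussian copula with correlation $\rho$ (e.g., draw a bivariate normal pair with correlation $\rho$ and apply the standard normal distribution function $\Phi$ to each component).
3. Repeat step 2 $n$ times.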
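
As a worked instance of step 1, take the running example $r = 0.6$. Step 1 simply inverts the relation in (b), $r = \frac{6}{\pi}\arcsin(\rho/2)$, and the inversion is valid because $r\pi/6$ lies in $[-\pi/6, \pi/6]$, inside the principal range of $\arcsin$:

$$
\rho = 2\sin\left(\frac{0.6\,\pi}{6}\right) = 2\sin\left(\frac{\pi}{10}\right) \approx 0.618,
\qquad
\frac{6}{\pi}\arcsin\left(\frac{\rho}{2}\right) = \frac{6}{\pi}\cdot\frac{\pi}{10} = 0.6 = r.
$$

More generally, $|r| \le 1$ gives $|\rho| \le 2\sin(\pi/6) = 1$, so step 1 always returns a valid correlation coefficient for the underlying normal pair.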

## Example
The following R code implements this algorithm with a target correlation $r = 0.6$ and $n = 500$ pairs.

```r
## Initialization and parameters
set.seed(123)
r <- 0.6                            # Target (Spearman) correlation
n <- 500                            # Number of samples

## Functions
gen.gauss.cop <- function(r, n){
  rho <- 2 * sin(r * pi/6)        # Pearson correlation of the underlying normals
  P <- toeplitz(c(1, rho))        # Correlation matrix
  d <- nrow(P)                    # Dimension
  ## Generate sample
  U <- pnorm(matrix(rnorm(n*d), ncol = d) %*% chol(P))
  return(U)
}

## Data generation and visualization
U <- gen.gauss.cop(r = r, n = n)
pairs(U, diag.panel = function(x){
  h <- hist(x, plot = FALSE)
  rect(head(h$breaks, -1), 0, tail(h$breaks, -1), h$counts/max(h$counts))})
```


In the resulting figure, the diagonal panels show histograms of $U_1$ and $U_2$, and the off-diagonal panels show scatter plots of $U_1$ against $U_2$.

By construction, the random variables have uniform margins and a population correlation coefficient equal to $r$. Due to sampling variability, however, the correlation coefficient of the simulated data is not exactly equal to $r$:

```r
cor(U)[1, 2]
# [1] 0.5337697
```
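
In two dimensions, `matrix(rnorm(n*d), ncol = d) %*% chol(P)` in `gen.gauss.cop` is equivalent to setting $Z_2 = \rho Z_1 + \sqrt{1-\rho^2}\,\varepsilon$ with independent standard normals $Z_1$ and $\varepsilon$. The sketch below uses that form with a much larger sample (the size `n.big` and the name `U.big` are arbitrary choices) to check that each margin has mean close to $1/2$ and variance close to $1/12$, as a standard uniform should, and that the sample correlation is close to $r$.

```r
## Large-sample check: uniform margins and correlation r (2-variable case)
set.seed(3)
r <- 0.6
n.big <- 2e5
rho <- 2 * sin(r * pi / 6)                        # step 1 of the algorithm
z1 <- rnorm(n.big)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n.big)   # bivariate normal, correlation rho
U.big <- cbind(pnorm(z1), pnorm(z2))              # Gaussian copula sample
colMeans(U.big)        # both should be close to 1/2
apply(U.big, 2, var)   # both should be close to 1/12
cor(U.big)[1, 2]       # should be close to r = 0.6
```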


Note that `gen.gauss.cop` can be extended to more than two variables by replacing the $2 \times 2$ matrix `P` with a larger correlation matrix whose off-diagonal entries are transformed as in step 1.
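
A minimal sketch of such an extension, assuming the target is a full $d \times d$ correlation matrix `R.target` for the uniforms (the function name `gen.gauss.cop.d` and the example matrix are hypothetical). Step 1 is applied elementwise; `chol()` stops with an error if the transformed matrix is not positive definite, which may happen for some target matrices, so this is a sketch rather than a general solution.

```r
## Hypothetical generalization of gen.gauss.cop to d >= 2 variables.
## R.target: d x d matrix of target correlations between the uniforms.
gen.gauss.cop.d <- function(R.target, n) {
  P <- 2 * sin(R.target * pi / 6)   # elementwise step 1
  diag(P) <- 1                      # guard against rounding in 2 * sin(pi/6)
  d <- nrow(P)
  ## chol() fails if P is not positive definite
  Z <- matrix(rnorm(n * d), ncol = d) %*% chol(P)
  pnorm(Z)                          # uniform margins
}

set.seed(4)
R.target <- matrix(c( 1.0, 0.5, -0.3,
                      0.5, 1.0,  0.2,
                     -0.3, 0.2,  1.0), nrow = 3)
U3 <- gen.gauss.cop.d(R.target, n = 1e5)
round(cor(U3), 2)   # should be close to R.target
```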

## Simulation study
The following simulation study, repeated for target correlations $r = -0.5, 0.1, 0.6$ (the code shows $r = 0.6$), suggests that the sample correlation coefficient concentrates around the target correlation as the sample size $n$ increases.

```r
## Simulation
set.seed(921)
r <- 0.6                                                # Target correlation
n <- c(10, 50, 100, 500, 1000, 5000); names(n) <- n     # Number of samples
S <- 1000                                               # Number of simulations

res <- sapply(n,
              function(n, r, S){
                replicate(S, cor(gen.gauss.cop(r, n))[1, 2])
              },
              r = r, S = S)
boxplot(res, xlab = "Sample size", ylab = "Correlation")
abline(h = r, col = "red")
```
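
To summarize the boxplot numerically, the sketch below is a smaller, self-contained variant (the helper `sim.cor` is hypothetical and mirrors the two-variable case of `gen.gauss.cop`; it uses fewer sample sizes and $S = 500$ replications). It computes the mean and standard deviation of the sample correlation for each $n$: the spread should shrink roughly like $1/\sqrt{n}$, while the mean approaches $r$.

```r
## Hypothetical helper: sample correlation of one simulated data set
sim.cor <- function(r, n) {
  rho <- 2 * sin(r * pi / 6)                    # step 1
  z1 <- rnorm(n)
  z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)   # bivariate normal, correlation rho
  cor(pnorm(z1), pnorm(z2))                     # Pearson correlation of the uniforms
}

set.seed(921)
r <- 0.6
n <- c(10, 100, 1000); names(n) <- n   # sample sizes
S <- 500                               # replications per sample size
res <- sapply(n, function(m) replicate(S, sim.cor(r, m)))
summ <- rbind(mean = colMeans(res), sd = apply(res, 2, sd))
round(summ, 3)
```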