I would like to generate pairs of random numbers with a certain correlation. However, the usual approach of using a linear combination of two normal variables is not valid here, because a linear combination of uniform variables is no longer a uniformly distributed variable. I need the two variables to be uniform.

Any idea on how to generate pairs of uniform variables with a given correlation?

**Answer**

I’m not aware of a universal method to generate correlated random variables with any given marginal distributions. So, I’ll propose an ad hoc method to generate pairs of uniformly distributed random variables with a given (Pearson) correlation.

Without loss of generality, I assume that the desired marginal distribution is standard uniform (i.e., the support is [0,1]).

The proposed approach relies on the following:

a) For standard uniform random variables $U_1$ and $U_2$ with respective distribution functions $F_1$ and $F_2$, we have $F_i(U_i) = U_i$ for $i = 1, 2$.

Thus, by definition Spearman’s rho is

$$\rho_S(U_1, U_2) = \operatorname{corr}\big(F_1(U_1), F_2(U_2)\big) = \operatorname{corr}(U_1, U_2).$$

So, Spearman’s rho and Pearson’s correlation coefficient are equal (sample versions might however differ).
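The following small Python sketch gives a numerical illustration of point a). It uses only the standard library. The helper names `pearson`, `ranks` and `spearman` are illustrative assumptions and not part of the original answer, and so is the toy sampling scheme. The scheme draws pairs with standard uniform margins in which, with probability $p$, $U_2$ simply copies $U_1$. For this toy pair, both the population Pearson correlation and Spearman's rho equal $p$. The two sample coefficients should come out close but not identical.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(xs):
    """1-based ranks; ties occur with probability zero for continuous data."""
    order = sorted(range(len(xs)), key=xs.__getitem__)
    out = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out


def spearman(xs, ys):
    """Sample Spearman's rho, computed as the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Toy dependent pair with standard uniform margins (an assumption, for
# illustration only): with probability p, U2 copies U1; otherwise U2 is an
# independent uniform draw. Population correlation (both kinds) is p.
rng = random.Random(1)
p, n = 0.5, 20000
u1 = [rng.random() for _ in range(n)]
u2 = [u if rng.random() < p else rng.random() for u in u1]

print(round(pearson(u1, u2), 3), round(spearman(u1, u2), 3))
```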

b) If $X_1, X_2$ are random variables with continuous margins and a Gaussian copula with (Pearson) correlation coefficient $\rho$, then Spearman’s rho is

$$\rho_S(X_1, X_2) = \frac{6}{\pi}\arcsin\left(\frac{\rho}{2}\right).$$

This makes it easy to generate random variables that have a desired value of Spearman’s rho.

The approach is to generate data from the Gaussian copula with a correlation coefficient $\rho$ chosen so that Spearman’s rho equals the desired correlation. Data drawn from a copula have standard uniform margins, so by a) their Pearson correlation equals this Spearman’s rho.
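The relation in b) is easy to invert. Solving $r = \frac{6}{\pi}\arcsin(\rho/2)$ for $\rho$ gives $\rho = 2\sin(r\pi/6)$, which is the formula used in the algorithm. Here is a minimal Python sketch of both directions (the function names are hypothetical):

```python
import math


def spearman_from_pearson(rho):
    """Spearman's rho of a bivariate Gaussian copula with Pearson correlation rho."""
    return 6.0 / math.pi * math.asin(rho / 2.0)


def pearson_for_target_spearman(r):
    """Inverse map: the copula correlation rho whose Spearman's rho equals r."""
    return 2.0 * math.sin(r * math.pi / 6.0)


rho = pearson_for_target_spearman(0.6)
print(round(rho, 6))                         # 0.618034
print(round(spearman_from_pearson(rho), 6))  # 0.6
```

Because $2\sin(\pi/6) = 1$, any target $r \in [-1, 1]$ maps to a valid $\rho \in [-1, 1]$. The two values are close but not equal: for $r = 0.6$, $\rho \approx 0.618$.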

**Simulation algorithm**

Let r denote the desired level of correlation, and n the number of pairs to be generated.

The algorithm is:

- Compute $\rho = 2\sin(r\pi/6)$.
- Generate a pair of random variables from the Gaussian copula with correlation $\rho$ (e.g., with this approach).
- Repeat step 2 $n$ times.
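Here is a Python sketch of the three steps, using only the standard library. The names `gen_uniform_pairs` and `std_normal_cdf` are hypothetical. Step 2 works the same way as in the R example: it draws two standard normals with correlation $\rho$ and applies the standard normal CDF $\Phi$ to each. $\Phi$ is computed from `math.erf`.

```python
import math
import random


def std_normal_cdf(z):
    """Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gen_uniform_pairs(r, n, rng=random):
    """Return n pairs (u1, u2) with standard uniform margins whose
    correlation (Pearson = Spearman) is r, via a Gaussian copula."""
    rho = 2.0 * math.sin(r * math.pi / 6.0)  # step 1: copula correlation
    s = math.sqrt(1.0 - rho * rho)
    pairs = []
    for _ in range(n):                       # step 3: repeat step 2 n times
        # step 2: two standard normals with correlation rho ...
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + s * rng.gauss(0.0, 1.0)
        # ... each mapped to [0, 1] by the normal CDF
        pairs.append((std_normal_cdf(z1), std_normal_cdf(z2)))
    return pairs


# Usage, mirroring the R example (r = 0.6, n = 500). The values will not
# match R's output because the random number generators differ.
pairs = gen_uniform_pairs(0.6, 500, random.Random(123))
```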

**Example**

The following R code implements this algorithm with a target correlation $r = 0.6$ and $n = 500$ pairs.

```
## Initialization and parameters
set.seed(123)
r <- 0.6  # Target (Spearman = Pearson) correlation of the uniforms
n <- 500  # Number of pairs

## Functions
gen.gauss.cop <- function(r, n){
  rho <- 2 * sin(r * pi/6)  # Pearson correlation of the underlying normals
  P <- toeplitz(c(1, rho))  # Correlation matrix
  d <- nrow(P)              # Dimension
  ## Generate sample: correlated normals via the Cholesky factor of P,
  ## then the standard normal CDF maps each margin to [0, 1]
  U <- pnorm(matrix(rnorm(n*d), ncol = d) %*% chol(P))
  return(U)
}

## Data generation and visualization
U <- gen.gauss.cop(r = r, n = n)
pairs(U, diag.panel = function(x){
  ## Histogram on the diagonal, with counts rescaled to [0, 1]
  h <- hist(x, plot = FALSE)
  rect(head(h$breaks, -1), 0, tail(h$breaks, -1), h$counts/max(h$counts))
})
```

In the figure below, the diagonal panels show histograms of $U_1$ and $U_2$, and the off-diagonal panels show scatter plots of $U_1$ against $U_2$.

By construction, the random variables have uniform margins and a population correlation coefficient equal to $r$. Because of sampling variability, however, the sample correlation coefficient of the simulated data is not exactly equal to $r$.

```
cor(U)[1, 2]
# [1] 0.5337697
```

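Note that the `gen.gauss.cop` function should work with more than two variables simply by using a larger correlation matrix `P`, with each off-diagonal entry transformed as $2\sin(r\pi/6)$. This requires the resulting matrix to be positive definite so that `chol(P)` succeeds.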
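For more than two variables, the same idea applies entrywise. Each target correlation $r_{ij}$ is mapped to $\rho_{ij} = 2\sin(r_{ij}\pi/6)$, and the resulting matrix must be positive definite for the Cholesky factorization to exist. Below is a Python sketch under those assumptions. The names are hypothetical, and a hand-written Cholesky factorization keeps the sketch dependency-free. The sketch simply raises an error if the transformed matrix is not positive definite; it does not attempt to repair it.

```python
import math
import random


def cholesky_lower(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    k = len(a)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def std_normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gen_uniform_vectors(R, n, rng=random):
    """n draws of a k-vector with standard uniform margins, where R[i][j] is
    the target (Spearman = Pearson) correlation of components i and j."""
    k = len(R)
    # Transform every off-diagonal target: rho_ij = 2 sin(pi r_ij / 6).
    P = [[1.0 if i == j else 2.0 * math.sin(math.pi * R[i][j] / 6.0)
          for j in range(k)] for i in range(k)]
    L = cholesky_lower(P)
    out = []
    for _ in range(n):
        e = [rng.gauss(0.0, 1.0) for _ in range(k)]            # independent normals
        z = [sum(L[i][m] * e[m] for m in range(i + 1)) for i in range(k)]  # cov P
        out.append([std_normal_cdf(zi) for zi in z])            # uniform margins
    return out


# Usage with an illustrative 3 x 3 target matrix (an assumption).
R = [[1.0, 0.6, -0.3],
     [0.6, 1.0, 0.2],
     [-0.3, 0.2, 1.0]]
X = gen_uniform_vectors(R, 1000, random.Random(42))
```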

**Simulation study**

The following simulation study, repeated for target correlations $r = -0.5$, $0.1$ and $0.6$, suggests that the sampling distribution of the correlation coefficient concentrates around the target correlation as the sample size $n$ increases.

```
## Simulation
set.seed(921)
r <- 0.6  # Target correlation
n <- c(10, 50, 100, 500, 1000, 5000); names(n) <- n  # Sample sizes
S <- 1000  # Number of simulations per sample size
## For each sample size, compute S sample correlations
res <- sapply(n,
              function(n, r, S){
                replicate(S, cor(gen.gauss.cop(r, n))[1, 2])
              },
              r = r, S = S)
boxplot(res, xlab = "Sample size", ylab = "Correlation")
abline(h = r, col = "red")  # Target correlation
```
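The same experiment can be sketched in Python with the standard library. This version uses hypothetical names and fewer replications `S` for speed. Instead of drawing a boxplot, it summarizes each sample size by the mean and standard deviation of the simulated correlations. The standard deviation should shrink as $n$ grows.

```python
import math
import random
import statistics


def std_normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gen_pairs(r, n, rng):
    """Gaussian-copula pairs with uniform margins and target correlation r."""
    rho = 2.0 * math.sin(r * math.pi / 6.0)
    s = math.sqrt(1.0 - rho * rho)
    out = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + s * rng.gauss(0.0, 1.0)
        out.append((std_normal_cdf(z1), std_normal_cdf(z2)))
    return out


def sample_corr(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def simulate(r, sizes, S, seed=921):
    """Map each sample size n to (mean, sd) of S simulated sample correlations."""
    rng = random.Random(seed)
    summary = {}
    for n in sizes:
        cors = [sample_corr(gen_pairs(r, n, rng)) for _ in range(S)]
        summary[n] = (statistics.mean(cors), statistics.stdev(cors))
    return summary


for n, (m, sd) in simulate(0.6, [10, 50, 100, 500], S=200).items():
    print(f"n = {n:4d}: mean = {m:.3f}, sd = {sd:.3f}")
```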

**Attribution**: *Source: Link, Question Author: Pythonist, Answer Author: Glen_b*