# Simulating data to fit a mediation model

I am interested in finding a procedure to simulate data that are consistent with a specified mediation model. According to the general linear structural equation model framework for testing mediation models first outlined by Barron and Kenny (1986) and described elsewhere such as Judd, Yzerbyt, & Muller (2013), mediation models for outcome $Y$, mediator $\newcommand{\med}{\rm med} \med$, and predictor $X$ and are governed by the following three regression equations:

The indirect effect or mediation effect of $X$ on $Y$ through $\med$ can either be defined as $b_{22}b_{32}$ or, equivalently, as $b_{12}-b_{32}$. Under the old framework of testing for mediation, mediation was established by testing $b_{12}$ in equation 1, $b_{22}$ in equation 2, and $b_{32}$ in equation 3.

So far, I have attempted to simulate values of $\med$ and $Y$ that are consistent with values of the various regression coefficients using rnorm in R, such as the code below:

x   <- rep(c(-.5, .5), 50)
med <- 4 + .7 * x + rnorm(100, sd = 1)

# Check the relationship between x and med
mod <- lm(med ~ x)
summary(mod)

y <- 2.5 + 0 * x + .4 * med + rnorm(100, sd = 1)

# Check the relationships between x, med, and y
mod <- lm(y ~ x + med)
summary(mod)

# Check the relationship between x and y -- not present
mod <- lm(y ~ x)
summary(mod)


However, it seems that sequentially generating $\med$ and $Y$ using equations 2 and 3 is not enough, since I am left with no relationship between $X$ and $Y$ in regression equation 1 (which models a simple bivariate relationship between $X$ and $Y$) using this approach. This is important because one definition of the indirect (i.e., mediation) effect is $b_{12}-b_{32}$, as I describe above.

Can anyone help me find a procedure in R to generate variables $X$, $\med$, and $Y$ that satisfy constraints that I set using equations 1, 2, and 3?

This is quite straightforward. The reason you have no relationship between $$xx$$ and $$yy$$ using your approach is because of the code:

y <- 2.5 + 0 * x + .4 * med + rnorm(100, sd = 1)


If you want some relationship between $$xx$$ and $$yy$$ even when $${\rm med}{\rm med}$$ is included (that is, you want partial mediation), you would simply use a non-zero value for $$b_{32}b_{32}$$ instead. For example, you could substitute the following code for the above:

y <- 2.5 + 3 * x + .4 * med + rnorm(100, sd = 1)


Thus, $$b_{32}b_{32}$$ has been changed from $$00$$ to $$33$$. (Of course some other, specific value would probably be more relevant, depending on your situation, I just picked $$33$$ off the top of my head.)

Edit:
With respect to the marginal $$x\rightarrow yx\rightarrow y$$ relationship being non-significant, that is just a function of statistical power. Since the causal force of $$xx$$ is passed entirely through $${\rm med}{\rm med}$$ in your original setup, you have lower power than you might otherwise. Nonetheless, the effect is still real in some sense. When I ran your original code (after having set the seed using 90 as a value that I again just picked off the top of my head), I did get a significant effect:

set.seed(90)
x <- rep(c(-.5, .5), 50)
med <- 4 + .7 * x + rnorm(100, sd = 1)

# Check the relationship between x and med
mod <- lm(med ~ x)
summary(mod)

y <- 2.5 + 0 * x + .4 * med + rnorm(100, sd = 1)

# Check the relationships between x, med, and y
mod <- lm(y ~ x + med)
summary(mod)

# Check the relationship between x and y -- not present
mod <- lm(y ~ x)
summary(mod)

...
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.8491     0.1151  33.431   <2e-16 ***
x             0.5315     0.2303   2.308   0.0231 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

...


To get more power, you can increase the $$NN$$ you are using, or use smaller error values (i.e., use sd= values less than the default 1 in the rnorm() calls).