Simulating data to fit a mediation model

I am interested in finding a procedure to simulate data that are consistent with a specified mediation model. According to the general linear structural equation model framework for testing mediation models first outlined by Barron and Kenny (1986) and described elsewhere such as Judd, Yzerbyt, & Muller (2013), mediation models for outcome Y, mediator \newcommand{\med}{\rm med} \med, and predictor X and are governed by the following three regression equations:
\begin{align}
Y &= b_{11} + b_{12}X + e_1 \tag{1} \\
\med &= b_{21} + b_{22}X + e_2 \tag{2} \\
Y &= b_{31} + b_{32}X + b_{32} \med + e_3 \tag{3}
\end{align}

The indirect effect or mediation effect of X on Y through \med can either be defined as b_{22}b_{32} or, equivalently, as b_{12}-b_{32}. Under the old framework of testing for mediation, mediation was established by testing b_{12} in equation 1, b_{22} in equation 2, and b_{32} in equation 3.

So far, I have attempted to simulate values of \med and Y that are consistent with values of the various regression coefficients using rnorm in R, such as the code below:

x   <- rep(c(-.5, .5), 50)
med <- 4 + .7 * x + rnorm(100, sd = 1) 

# Check the relationship between x and med
mod <- lm(med ~ x)
summary(mod)

y <- 2.5 + 0 * x + .4 * med + rnorm(100, sd = 1)

# Check the relationships between x, med, and y
mod <- lm(y ~ x + med)
summary(mod)

# Check the relationship between x and y -- not present
mod <- lm(y ~ x)
summary(mod)

However, it seems that sequentially generating \med and Y using equations 2 and 3 is not enough, since I am left with no relationship between X and Y in regression equation 1 (which models a simple bivariate relationship between X and Y) using this approach. This is important because one definition of the indirect (i.e., mediation) effect is b_{12}-b_{32}, as I describe above.

Can anyone help me find a procedure in R to generate variables X, \med, and Y that satisfy constraints that I set using equations 1, 2, and 3?

Answer

This is quite straightforward. The reason you have no relationship between x and y using your approach is because of the code:

y <- 2.5 + 0 * x + .4 * med + rnorm(100, sd = 1)

If you want some relationship between x and y even when {\rm med} is included (that is, you want partial mediation), you would simply use a non-zero value for b_{32} instead. For example, you could substitute the following code for the above:

y <- 2.5 + 3 * x + .4 * med + rnorm(100, sd = 1)

Thus, b_{32} has been changed from 0 to 3. (Of course some other, specific value would probably be more relevant, depending on your situation, I just picked 3 off the top of my head.)


Edit:
With respect to the marginal x\rightarrow y relationship being non-significant, that is just a function of statistical power. Since the causal force of x is passed entirely through {\rm med} in your original setup, you have lower power than you might otherwise. Nonetheless, the effect is still real in some sense. When I ran your original code (after having set the seed using 90 as a value that I again just picked off the top of my head), I did get a significant effect:

set.seed(90)
x <- rep(c(-.5, .5), 50)
med <- 4 + .7 * x + rnorm(100, sd = 1) 

# Check the relationship between x and med
mod <- lm(med ~ x)
summary(mod)

y <- 2.5 + 0 * x + .4 * med + rnorm(100, sd = 1)

# Check the relationships between x, med, and y
mod <- lm(y ~ x + med)
summary(mod)

# Check the relationship between x and y -- not present
mod <- lm(y ~ x)
summary(mod)

...
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.8491     0.1151  33.431   <2e-16 ***
x             0.5315     0.2303   2.308   0.0231 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

...

To get more power, you can increase the N you are using, or use smaller error values (i.e., use sd= values less than the default 1 in the rnorm() calls).

Attribution
Source : Link , Question Author : Patrick S. Forscher , Answer Author : gung – Reinstate Monica

Leave a Comment