# What is non-parametric structural equation modeling?

I have been reading some of Judea Pearl's work, which is excellent. In his papers, he suggests that "non-parametric SEM (structural equation modeling)" is a way of estimating associations from DAGs. His writing suggests, to me, that it is agreed such a method exists or may exist, but that, for the time being, how to fit such a model is irrelevant. I take the "non-parametric" bit to differentiate the approach from "plain vanilla" SEM à la Muthén & Muthén's Mplus or R's lavaan package, which is parametric in the sense that estimates come from maximizing a joint normal likelihood.

The presumed or implemented method, however, is very relevant to me. I’d like to know exactly how we can more or less model complex, high dimensional structural equations. Part of my barrier to understanding is that I don’t understand, computationally, how SEMs are fit, except that (despite a common misinterpretation) they are not just sequences of regression models.

I know that at times "non-parametric" is simply a matter of interpretation. For instance, linear regression can be seen as non-parametric in the sense that it simply summarizes a first-order trend, a summary intrinsic to any bivariate relationship, whatever the distribution and whatever the (possibly curvilinear) underlying trend. On the other hand, if interest lies in determining a non-linear relationship between two variables, penalized splines provide excellent non-linear smoothing. However, SEM does not focus on mean differences; it looks at the covariance between features.

Of the acceptable method(s), what constitutes non-linear SEM? Is it just a matter of interpretation as above? Or do we need to use non-linear modeling with splines and penalties? What about robust covariance?

I have a feeling that you're looking for piecewise SEM, as I've heard it mentioned with reference to Pearl in the past. It is literally sequences of regressions, with some graph theory to tie things together. There are also distribution-free estimators of typical SEM models, though they don't really perform any better than ML with robust variance estimates.

Ultimately, however, it does depend on interpretation. "Non-parametric SEM" isn't a widely used term, and different authors will use it differently.

> The presumed or implemented method, however, is very relevant to me. I'd like to know exactly how we can more or less model complex, high dimensional structural equations. Part of my barrier to understanding is that I don't understand, computationally, how SEMs are fit, except that (despite a common misinterpretation) they are not just sequences of regression models.

Maximum likelihood! We construct a distribution function $f(X;\theta)$ for the data, and maximize it over $\theta$. In SEM, this is done jointly and automatically. However, it's useful to understand how this can be done from marginal/conditional distributions:

Let's say that we have the variables $x$, $z$, and $y$, and we want to fit a mediation model to these data. Assuming normality, we construct the distributions as follows:

$$y \mid z, x \sim \mathcal{N}(\alpha_y + \beta_0 x + \beta_1 z,\ \sigma_y^2)$$
$$z \mid x \sim \mathcal{N}(\alpha_z + \beta_2 x,\ \sigma_z^2)$$
$$x \sim \mathcal{N}(\alpha_x,\ \sigma_x^2)$$

We then combine these into a single distribution using the probability chain rule, yielding $f(y,z,x;\theta)$, where $\theta$ includes all the model parameters. This is our likelihood function, $L(\theta;\text{data})$, and we simply (or not-so-simply) need to find its maximum.
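The chain-rule construction above can be sketched numerically. This is a minimal illustration, not any package's internals: it simulates data from the mediation model, builds the joint log-likelihood from the three conditional/marginal normal densities, and maximizes it with a general-purpose optimizer. All variable and parameter names follow the equations above.

```python
# Fit the mediation model x -> z -> y by maximizing the joint likelihood
# built from the factorization f(y, z, x) = f(y | z, x) f(z | x) f(x).
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(0.0, 1.0, n)                    # x ~ N(alpha_x, sigma_x^2)
z = 0.5 * x + rng.normal(0.0, 1.0, n)          # z | x ~ N(alpha_z + b2*x, sigma_z^2)
y = 0.3 * x + 0.7 * z + rng.normal(0.0, 1.0, n)

def neg_loglik(theta):
    a_y, b0, b1, a_z, b2, a_x, ls_y, ls_z, ls_x = theta
    # log f(y | z, x) + log f(z | x) + log f(x), summed over observations
    ll = (stats.norm.logpdf(y, a_y + b0 * x + b1 * z, np.exp(ls_y)).sum()
          + stats.norm.logpdf(z, a_z + b2 * x, np.exp(ls_z)).sum()
          + stats.norm.logpdf(x, a_x, np.exp(ls_x)).sum())
    return -ll

# log-sigma parameters keep the scales positive during optimization
res = optimize.minimize(neg_loglik, np.zeros(9), method="BFGS")
a_y, b0, b1, a_z, b2, a_x, *_ = res.x
```

Because the model is recursive, the estimates here coincide with what you would get from the three regressions fit separately; the point of the joint likelihood is that it generalizes to models where that is no longer true.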

With SEM we add some assumptions so that we can actually identify the distribution (e.g. that the model is recursive, which means the opposite of what you would think it does), and specify all the relations at once as a matrix equation with a particular form. For instance, with a LISREL model:

1. $$\eta = B\eta + \Gamma\xi + \zeta$$
2. $$y = \Lambda_y\eta + \epsilon$$
3. $$x = \Lambda_x\xi + \delta$$

Or, in words:

1. Latent DVs = (coefs * latent DVs) + (coefs * latent IVs) + error,
2. Observed DVs = (coefs * latent DVs) + error,
3. Observed IVs = (coefs * latent IVs) + error

For some more detail on constructing these matrices, see this; alternatively, this is slightly 'softer'. Modern software often constructs these matrices for you from some easier-to-read form, such as lavaan's or Mplus's syntax.
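Under normality, fitting works through the model-implied covariance matrix $\Sigma(\theta)$, which is assembled from the LISREL matrices above. The following is a hypothetical sketch of that assembly (the `implied_cov` function and the numeric values are illustrative, not any package's API), assuming a recursive model so that $(I - B)$ is invertible:

```python
# Model-implied covariance of the observed (y, x) vector from the
# LISREL matrices: eta = B eta + Gamma xi + zeta, y = Lam_y eta + eps,
# x = Lam_x xi + delta.
import numpy as np

def implied_cov(B, Gamma, Lam_y, Lam_x, Phi, Psi, Th_eps, Th_delta):
    I = np.eye(B.shape[0])
    A = np.linalg.inv(I - B)                    # reduced-form multiplier
    cov_eta = A @ (Gamma @ Phi @ Gamma.T + Psi) @ A.T
    S_yy = Lam_y @ cov_eta @ Lam_y.T + Th_eps   # observed DVs
    S_yx = Lam_y @ A @ Gamma @ Phi @ Lam_x.T    # cross-covariances
    S_xx = Lam_x @ Phi @ Lam_x.T + Th_delta     # observed IVs
    return np.block([[S_yy, S_yx], [S_yx.T, S_xx]])

# Toy model: one latent DV, one latent IV, two indicators each
Sigma = implied_cov(
    B=np.array([[0.0]]), Gamma=np.array([[0.6]]),
    Lam_y=np.array([[1.0], [0.8]]), Lam_x=np.array([[1.0], [0.9]]),
    Phi=np.array([[1.0]]), Psi=np.array([[0.5]]),
    Th_eps=0.3 * np.eye(2), Th_delta=0.3 * np.eye(2),
)
```

ML estimation then amounts to choosing $\theta$ so that $\Sigma(\theta)$ is as close as possible (in the likelihood sense) to the sample covariance matrix.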

We can also estimate the parameters, with the same equation setup, using alternative methods such as weighted least squares, which don't obviously depend on a distribution. However, these methods don't actually tend to perform any better than maximum likelihood with robust variances.

> Of the acceptable method(s), what constitutes non-linear SEM?

Non-linear SEM typically refers to SEM that contains latent interactions or polynomial effects. Such models can be considerably more difficult to estimate, and are not supported by most SEM programs (Mplus and OpenMx being the exceptions).

> Is it just a matter of interpretation as above?

As mentioned previously, yes. Although it’s also a matter of how the term is typically used, if such a precedent exists.

> Or do we need to use non-linear modeling with splines and penalties?

No.

> What about robust covariance?

ML with robust (co)variance estimates generally performs similarly to (or even better than) "distribution-free" methods such as WLS. The idea is that ML point estimates are consistent even if the distributional model is misspecified, provided that the object of inference in the misspecified model is the true object of interest. The problem isn't with the estimates; it's with the variances. The usual variance estimator (simply inverting the information matrix) underestimates the variances when the model is misspecified. To address this, we simply replace it with a consistent variance estimator, such as the bootstrap or a sandwich estimator.
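The sandwich idea can be sketched in the simplest possible setting, a linear model with heteroskedastic errors (a toy illustration of the principle, not the SEM-specific machinery): the "bread" is the inverse information, and the "meat" corrects for the misspecified variance.

```python
# Naive vs. sandwich (HC0-style) variance for OLS/ML regression estimates
# when the constant-variance assumption is wrong.
import numpy as np

rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
# Heteroskedastic errors: the constant-variance "model" is misspecified,
# but the point estimates remain consistent.
e = rng.normal(size=n) * (1.0 + np.abs(X[:, 1]))
y = X @ np.array([1.0, 2.0]) + e

beta = np.linalg.solve(X.T @ X, X.T @ y)        # ML/OLS point estimates
resid = y - X @ beta

bread = np.linalg.inv(X.T @ X)                  # inverse "information"
naive_var = bread * (resid @ resid / (n - 2))   # assumes constant variance
meat = X.T @ (X * resid[:, None] ** 2)          # sum of x_i x_i' e_i^2
sandwich_var = bread @ meat @ bread             # robust variance
```

With errors whose variance grows in $|x|$, the naive variances are too small for the slope, while the sandwich version remains consistent; the point estimates `beta` are unchanged either way.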