When a CFA model runs into a “covariance matrix was not positive definite” problem, is it due to the dataset or the model?

I am testing several CFA measurement models with lavaan in R. Previous research has reported 1-factor, 3-factor, and 4-factor structures for the questionnaire I am investigating.

In my dataset, both the 1-factor and 3-factor models fit acceptably, but the 3-factor model fits best (lowest AIC).

However, I could not evaluate the 4-factor model because the “covariance matrix was not positive definite”. A closer look revealed that one factor has a correlation greater than 1 with another factor.

My question is: why is the covariance matrix not positive definite for the 4-factor model in my dataset? Is this problem unique to my dataset?

I am still a beginner, so I may be missing some important details.

Thank you very much for the fast responses, everyone!
Update: The sample has 904 participants, and there are 28 observed variables (items) in the model.

Answer

The covariance matrix of the data is always non-negative definite; there is no doubt about that. However, the model-implied covariance matrix may not be when some parameters take values outside their natural ranges, and a factor correlation above 1 is exactly such a value. This can happen for a number of reasons.
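The distinction between the data covariance matrix and a matrix built from parameter estimates can be checked numerically. The Python sketch below is only an illustration under stated assumptions: the data are simulated (not the asker's), the factor correlation matrix `phi` is hypothetical with one correlation set to 1.02, and `is_positive_definite` is a helper defined here that uses a Cholesky decomposition as the test.

```python
import numpy as np

def is_positive_definite(m: np.ndarray) -> bool:
    """Return True if the symmetric matrix m is positive definite (Cholesky succeeds)."""
    try:
        np.linalg.cholesky(m)
        return True
    except np.linalg.LinAlgError:
        return False

# Simulated data (NOT the asker's dataset): 904 cases, 4 variables.
rng = np.random.default_rng(0)
data = rng.normal(size=(904, 4))
sample_cov = np.cov(data, rowvar=False)  # covariance matrix of the data

print(np.linalg.eigvalsh(sample_cov).min() >= 0)  # True: non-negative definite

# Hypothetical factor correlation matrix: factors 1 and 2 correlate at 1.02,
# a value outside the natural range [-1, 1].
phi = np.array([
    [1.00, 1.02, 0.50, 0.40],
    [1.02, 1.00, 0.45, 0.35],
    [0.50, 0.45, 1.00, 0.30],
    [0.40, 0.35, 0.30, 1.00],
])
print(is_positive_definite(phi))      # False
print(np.linalg.eigvalsh(phi).min())  # a negative eigenvalue
```

A single correlation above 1 is enough: the 2 x 2 block for factors 1 and 2 already has determinant 1 - 1.02² < 0, so no choice of the remaining entries can make the matrix positive definite.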

  1. Your 4-factor model may be misspecified, i.e., it does not fit the data well.
  2. Your model is OK; it is just that your sample happens to favor high values of the correlation parameter. To distinguish between 1 and 2, you need a way to test whether the correlation in question is significantly greater than 1, which is not a trivial endeavor (doi: 10.1177/0049124112442138): few packages computed the standard errors properly when that paper was written, and I don’t know whether the current version of lavaan does.
  3. lavaan computes numeric derivatives (as does any other software) by evaluating the function at the parameter $\pm$ a small step. While the current value of the parameter may be kosher, the step may push it over the limit and produce a matrix that is not positive definite. (Analytic derivatives are available for the multivariate normal case, but binary/ordinal variables require numeric integration over the distributions of latent variables and do not lend themselves to analytic differentiation. So this depends on your model.)
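Points 2 and 3 above can be sketched with a toy Python example. This is not lavaan's fitting function: `log_det_corr` is a hypothetical stand-in objective (the log-determinant of a 2 x 2 factor correlation matrix, which only exists when the matrix is positive definite), `central_difference` is the generic "parameter ± a small step" derivative, and `one_sided_wald_p` is a plain one-sided Wald z-test that simply assumes a valid standard error is available, which, as noted above, is the hard part. The estimate 1.03 and standard error 0.02 are made up.

```python
import math
import numpy as np

def log_det_corr(r: float) -> float:
    """Toy objective: log-determinant of [[1, r], [r, 1]] via Cholesky.
    Raises LinAlgError unless the matrix is positive definite (|r| < 1)."""
    phi = np.array([[1.0, r], [r, 1.0]])
    chol = np.linalg.cholesky(phi)
    return 2.0 * float(np.sum(np.log(np.diag(chol))))

def central_difference(f, x: float, h: float) -> float:
    """Numeric derivative: evaluates f at x + h and at x - h."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def one_sided_wald_p(estimate: float, se: float, bound: float = 1.0) -> float:
    """P-value for H0: parameter <= bound vs H1: parameter > bound,
    ASSUMING se is a valid (e.g. robust) standard error."""
    z = (estimate - bound) / se
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability

r = 0.9995  # an admissible value, just below the limit of 1

try:
    # r + h = 1.0005 crosses the limit, so the Cholesky step fails
    print(central_difference(log_det_corr, r, h=1e-3))
except np.linalg.LinAlgError:
    print("a step of 1e-3 left the admissible region")

# With a smaller step both evaluations stay below 1 and the derivative exists
print(central_difference(log_det_corr, r, h=1e-4))

# Hypothetical estimate and standard error, not taken from the question
print(one_sided_wald_p(estimate=1.03, se=0.02))  # about 0.067
```

The same parameter value is fine or fatal depending only on the step size, which is why a solution sitting right at the boundary can break the derivative computations even when the estimate itself is admissible.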

I think you can argue that, due to the lack of convergence, your 4-factor model does not work well and is not a contender in your model selection.

Attribution
Source: Link, Question Author: Edo Jaya, Answer Author: StasK
