# What are the assumptions of factor analysis?

I want to check if I really understood [classic, linear] factor analysis (FA), especially assumptions that are made before (and possibly after) FA.

Some of the variables should be correlated to begin with, and the relations between them are possibly linear. After doing factor analysis, the data are normally distributed (bivariate normal distribution for each pair), there is no correlation between factors (common and specific), and there is no correlation between variables from one factor and variables from other factors.

Is it correct?

Input data assumptions of linear FA (I’m not speaking here about internal assumptions/properties of the FA model or about checking the fitting quality of results).

1. Scale (interval or ratio) input variables. That means the items are either continuous measures or are conceptualized as continuous while measured on a discrete quantitative scale. No ordinal data in linear FA (read). Binary data should also be avoided (see this, this). Linear FA assumes that the latent common and unique factors are continuous. Therefore the observed variables which they load should be continuous too.
2. Correlations are linear. Linear FA may be performed on any SSCP-type association matrix: Pearson correlation, covariance, cosine, etc. (though some methods/implementations might restrict you to Pearson correlations only). Note that these are all linear-algebra products. Although the magnitude of a covariance coefficient reflects more than just the linearity of a relation, the modeling in linear FA is linear in nature even when covariances are used: variables are linear combinations of factors, and thus linearity is implied in the resulting associations. If you see or suspect that nonlinear associations prevail, don’t do linear FA, or try to linearize them first by some transformation of the data. And don’t base linear FA on Spearman or Kendall correlations (Pt. 4 there).
3. No outliers. That’s as with any nonrobust method. Pearson correlation and similar SSCP-type associations are sensitive to outliers, so watch out.
4. Reasonably high correlations are present. FA is the analysis of correlatedness, so it is of no use when all or almost all correlations are weak. However, what counts as a “reasonably high correlation” depends on the field of study. There is also an interesting and varied question of whether very high correlations should be accepted (their effect on PCA, for example, is discussed here). To test statistically whether the data are not uncorrelated, Bartlett’s test of sphericity can be used.
5. Partial correlations are weak, and factors can be sufficiently well defined. FA assumes that factors are more general than just loading pairs of correlated items. In fact, there is even advice not to extract factors that load decently on fewer than 3 items in exploratory FA; and in confirmatory FA only a structure with 3+ items per factor is guaranteed to be identified. A technical problem of extraction called a Heywood case has the too-few-items-on-factor situation as one of the reasons behind it. The Kaiser-Meyer-Olkin (KMO) “sampling adequacy measure” estimates how weak the partial correlations in the data are relative to the full correlations; it can be computed for every item and for the whole correlation matrix. The common factor analysis model assumes that pairwise partial correlations are small enough not to be bothered about and modelled, and that they all fall into the population noise for individual correlation coefficients, which we do not regard any differently from the sample noise for them (see). And read also.
6. No multicollinearity. The FA model assumes that each item possesses a unique factor and that those factors are orthogonal. Therefore 2 items must define a plane, 3 items a 3d space, etc.: p correlated vectors must span a p-dimensional space to accommodate their p mutually perpendicular unique components. So, no singularity, for theoretical reasons$$^1$$ (and hence automatically n observations > p variables, it goes without saying; and better n>>p). Multicollinearity that is not complete is allowed, though; yet it may cause computational problems in most FA algorithms (see also).
7. Distribution. In general, linear FA does not require normality of the input data. Moderately skewed distributions are acceptable. Bimodality is not a contraindication. Normality is indeed assumed for the unique factors in the model (they serve as regression errors), but not for the common factors and the input data (see also). Still, multivariate normality of the data can be required as an additional assumption by some methods of extraction (namely, maximum likelihood) and by some asymptotic tests.
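Points 4 and 5 can be checked numerically. Below is a minimal sketch in Python (NumPy and SciPy) using the usual textbook formulas for Bartlett's test of sphericity and the KMO measure. The function names `bartlett_sphericity` and `kmo` and the simulated one-factor data are illustrative assumptions, not part of any particular FA package.

```python
import numpy as np
from scipy import stats


def bartlett_sphericity(X):
    """Bartlett's test of H0: the population correlation matrix is identity.

    X is an (n observations) x (p variables) array.
    Returns (chi-square statistic, degrees of freedom, p-value).
    """
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    # slogdet is numerically safer than log(det(R)) for near-singular R
    _, logdet = np.linalg.slogdet(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return chi2, df, stats.chi2.sf(chi2, df)


def kmo(X):
    """Kaiser-Meyer-Olkin measure: per-item values and the overall value."""
    R = np.corrcoef(X, rowvar=False)
    R_inv = np.linalg.inv(R)
    # Partial correlation of items i, j controlling for all other items
    d = np.sqrt(np.diag(R_inv))
    P = -R_inv / np.outer(d, d)
    # Only off-diagonal elements enter the KMO sums
    np.fill_diagonal(R, 0.0)
    np.fill_diagonal(P, 0.0)
    r2, p2 = R ** 2, P ** 2
    per_item = r2.sum(axis=0) / (r2.sum(axis=0) + p2.sum(axis=0))
    overall = r2.sum() / (r2.sum() + p2.sum())
    return per_item, overall


# Illustration on simulated data: 6 items driven by one common factor
# (loadings of 0.7 are an arbitrary choice for the example)
rng = np.random.default_rng(0)
factor = rng.standard_normal((500, 1))
X = 0.7 * factor + np.sqrt(1 - 0.7 ** 2) * rng.standard_normal((500, 6))

chi2, df, p_value = bartlett_sphericity(X)
per_item, overall = kmo(X)
print(f"Bartlett: chi2={chi2:.1f}, df={df}, p={p_value:.3g}")
print(f"KMO overall={overall:.3f}, per item={np.round(per_item, 3)}")
```

A small Bartlett p-value only says that the correlations are not all zero. It does not say they are large enough to be worth factoring, so it is best read together with KMO and with the correlations themselves.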

$$^1$$ ULS/minres methods of FA can work with a singular and even a non-p.s.d. correlation matrix, but strictly theoretically such an analysis is dubious, in my view.
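Whether a correlation matrix is singular or not positive semidefinite (point 6 and the footnote) can be read off its eigenvalues. The following is a rough sketch; the helper name `check_correlation_matrix` and the tolerance `tol=1e-8` are assumptions made for illustration, and a sensible tolerance depends on the data.

```python
import numpy as np


def check_correlation_matrix(R, tol=1e-8):
    """Flag singularity and non-positive-semidefiniteness of a symmetric matrix."""
    eigvals = np.linalg.eigvalsh(R)  # eigenvalues in ascending order
    min_eig = eigvals[0]
    return {
        "min_eigenvalue": min_eig,
        "singular": min_eig < tol,   # a (near-)zero or negative eigenvalue
        "psd": min_eig > -tol,       # no clearly negative eigenvalue
    }


rng = np.random.default_rng(1)
A = rng.standard_normal((200, 3))

# Complete multicollinearity: the 4th variable is the sum of the first two
X_collinear = np.column_stack([A, A[:, 0] + A[:, 1]])
print(check_correlation_matrix(np.corrcoef(X_collinear, rowvar=False)))

# A matrix of "correlations" that no real data set could produce
R_bad = np.array([[1.0, 0.9, -0.9],
                  [0.9, 1.0, 0.9],
                  [-0.9, 0.9, 1.0]])
print(check_correlation_matrix(R_bad))
```

A correlation matrix computed from complete data is always positive semidefinite up to rounding. A clearly negative eigenvalue usually points to a matrix assembled in some other way, for example from pairwise-deleted data.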