Is there a rationale for the number of observations per cluster in a random effects model? I have a sample size of 1,500 with 700 clusters modeled as an exchangeable random effect. I have the option to merge clusters in order to build fewer, but larger, clusters. I wonder how I can choose the minimum sample size per cluster so as to obtain meaningful results when predicting the random effect for each cluster. Is there a good paper that explains this?

**Answer**

**TL;DR**: The minimum sample size per cluster in a mixed-effects model is 1, provided that the number of clusters is adequate and the proportion of singleton clusters is not “too high”.

Longer version:

In general, the number of clusters is more important than the number of observations per cluster. With 700 clusters, you clearly have no problem there.

Small cluster sizes are quite common, especially in social science surveys that follow stratified sampling designs, and there is a body of research that has investigated cluster-level sample size.

While increasing the cluster size increases the statistical power to estimate the random effects (Austin & Leckie, 2018), small cluster sizes do not lead to serious bias (Bell et al., 2008; Clarke, 2008; Clarke & Wheaton, 2007; Maas & Hox, 2005). Thus, the minimum sample size per cluster is 1.
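What small clusters do cost you is precision when predicting the cluster-level effects: the predicted random intercept (the BLUP) is shrunk towards zero, and the amount of shrinkage depends on the cluster size. As a rough sketch (assuming, purely for illustration, a random-intercept model with random-intercept variance `tau2` and residual variance `sigma2` both equal to 1):

```
# Shrinkage factor for the BLUP of cluster j's random intercept:
#   lambda_j = n_j * tau2 / (n_j * tau2 + sigma2)
# where n_j is the size of cluster j. The variances below are assumed values.
tau2   <- 1                 # random-intercept variance (assumed)
sigma2 <- 1                 # residual variance (assumed)
n_j    <- c(1, 2, 5, 20)    # illustrative cluster sizes
lambda <- n_j * tau2 / (n_j * tau2 + sigma2)
round(lambda, 2)            # 0.50 0.67 0.83 0.95
```

So a singleton cluster still yields a prediction, just a more heavily shrunken one.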

In particular, Bell et al. (2008) performed a Monte Carlo simulation study with proportions of singleton clusters (clusters containing only a single observation) ranging from 0% to 70%, and found that, provided the number of clusters was large (~500), the small cluster sizes had almost no impact on bias and Type I error control.

They also reported very few problems with model convergence under any of their modelling scenarios.

For the particular scenario in the OP, I would suggest running the model with 700 clusters in the first instance. Unless there was a clear problem with this, I would be disinclined to merge clusters. I ran a simple simulation in R:

Here we create a clustered dataset with a residual variance of 1, a single fixed effect also equal to 1, and 700 clusters, of which 690 are singletons and 10 have just 2 observations. We run the simulation 1000 times and inspect the histograms of the estimated fixed effect and the estimated residual standard deviation.

```
library(lme4)

set.seed(15)

# 700 clusters with 1 observation each, then duplicate the last 10 rows
# so that clusters 691-700 have 2 observations (690 singletons in total)
dtB <- expand.grid(Subject = 1:700, measure = c(1))
dtB <- rbind(dtB, dtB[691:700, ])

fixef.v <- numeric(1000)
ranef.v <- numeric(1000)

for (i in 1:1000) {
  dtB$x <- rnorm(nrow(dtB), 0, 1)
  # true model: random intercept = Subject/100, fixed slope = 1, residual SD = 1
  dtB$y <- dtB$Subject / 100 + rnorm(nrow(dtB), 0, 1) + dtB$x * 1
  fm0B <- lmer(y ~ x + (1 | Subject), data = dtB)
  fixef.v[i] <- fixef(fm0B)[[2]]             # estimated fixed effect (slope)
  ranef.v[i] <- attr(VarCorr(fm0B), "sc")    # estimated residual SD
}

hist(fixef.v, breaks = 15)
hist(ranef.v, breaks = 15)
```

As you can see, the fixed effect is very well estimated, while the residual standard deviation appears to be a little downward-biased, but not drastically so:

```
> summary(fixef.v)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.6479  0.9439  0.9992  1.0005  1.0578  1.2544
> summary(ranef.v)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.2796  0.7745  0.9004  0.8993  1.0212  1.4837
```

The OP specifically mentions the estimation of cluster-level random effects. In the simulation above, the random effects were created simply as the value of each `Subject`'s ID (scaled down by a factor of 100). Obviously these are not normally distributed, which violates an assumption of linear mixed effects models; however, we can extract the (conditional modes of) the cluster-level effects and plot them against the actual `Subject` IDs:

```
# conditional modes of the 700 cluster-level random intercepts
re <- ranef(fm0B)[[1]][, 1]
# repeat the last 10 values so the vector lines up with the 710 rows of dtB
dtB$re <- append(re, re[691:700])
hist(dtB$re)
plot(dtB$re, dtB$Subject)
```

The histogram departs from normality somewhat, but this is due to the way we simulated the data. There is still a reasonable relationship between the estimated and actual random effects.
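If you would rather quantify that relationship than eyeball the plot, you can (continuing with the `fm0B` object from the final simulation iteration above) correlate the conditional modes with the true effects used to generate the data:

```
# True cluster effects used in the simulation (Subject ID / 100; the model's
# intercept absorbs their mean, but centring does not affect the correlation)
true_re <- (1:700) / 100
est_re  <- ranef(fm0B)[[1]][, 1]
cor(true_re, est_re)
```

Given the shrinkage applied to singleton clusters, the correlation will be below 1, but it should still be reasonably high in this scenario.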

References:

Austin, P. C., & Leckie, G. (2018). *The effect of number of clusters and cluster size on statistical power and Type I error rates when testing random effects variance components in multilevel linear and logistic regression models*. Journal of Statistical Computation and Simulation, 88(16), 3151-3163. DOI: 10.1080/00949655.2018.1504945

Bell, B. A., Ferron, J. M., & Kromrey, J. D. (2008). *Cluster size in multilevel models: the impact of sparse data structures on point and interval estimates in two-level models*. JSM Proceedings, Section on Survey Research Methods, 1122-1129.

Clarke, P. (2008). *When can group level clustering be ignored? Multilevel models versus single-level models with sparse data*. Journal of Epidemiology and Community Health, 62(8), 752-758.

Clarke, P., & Wheaton, B. (2007). *Addressing data sparseness in contextual population research using cluster analysis to create synthetic neighborhoods*. Sociological Methods & Research, 35(3), 311-351.

Maas, C. J., & Hox, J. J. (2005). *Sufficient sample sizes for multilevel modeling*. Methodology, 1(3), 86-92.

**Attribution**
*Source: Link, Question Author: Roccer, Answer Author: Robert Long*