Confidence intervals around a centroid with modified Gower similarity

I would like to obtain 95% confidence intervals for centroids based on Gower similarity between some mulivariate samples (community data from sediment cores). I have so far used the vegan{} package in R to obtain modified Gower similarity between cores (based on Anderson 2006; now included in R as part of vegdist()). Does anyone know how I can calculate 95% confidence intervals for the centroids of, for example, sampling sites, based on modified Gower similarity?

Additionally, if possible, I would like to plot these 95% CIs on a PCO that shows the centroids, so it’s evident if they’re overlapping.

To get modified Gower similarity, I used:

dat.mgower <- vegdist(decostand(dat, "log"), "altGower")

But as far as I know, you don’t get centroids from vegdist(). I need to get centroids, then 95% CIs, then plot them… in R. Help!

Anderson, M. J., K. E. Ellingsen, and B. H. McArdle. 2006. Multivariate dispersion as a measure of beta diversity. Ecology Letters 9:683–693.

Answer

I’m not immediately clear what centroid you want, but the centroid that comes to mind is the point in multivariate space at the centre of the mass of the points per group. About this you want a 95% confidence ellipse. Both aspects can be computed using the ordiellipse() function in vegan. Here is a modified example from ?ordiellipse but using a PCO as a means to embed the dissimilarities in an Euclidean space from which we can derive centroids and confidence ellipses for groups based on the Nature Management variable Management.

require(vegan)
data(dune)
dij <- vegdist(decostand(dune, "log"), method = "altGower")
ord <- capscale(dij ~ 1) ## This does PCO

data(dune.env) ## load the environmental data

Now we display the first 2 PCO axes and add a 95% confidence ellipse based on the standard errors of the average of the axis scores. We want standard errors so set kind="se" and use the conf argument to give the confidence interval required.

plot(ord, display = "sites", type = "n")
stats <- with(dune.env,
              ordiellipse(ord, Management, kind="se", conf=0.95, 
                          lwd=2, draw = "polygon", col="skyblue",
                          border = "blue"))
points(ord)
ordipointlabel(ord, add = TRUE)

Notice that I capture the output from ordiellipse(). This returns a list, one component per group, with details of the centroid and ellipse. You can extract the center component from each of these to get at the centroids

> t(sapply(stats, `[[`, "center"))
         MDS1       MDS2
BF -1.2222687  0.1569338
HF -0.6222935 -0.1839497
NM  0.8848758  1.2061265
SF  0.2448365 -1.1313020

Notice that the centroid is only for the 2d solution. A more general option is to compute the centroids yourself. The centroid is just the individual averages of the variables or in this case the PCO axes. As you are working with the dissimilarities, they need to be embedded in an ordination space so you have axes (variables) that you can compute averages of. Here the axis scores are in columns and the sites in rows. The centroid of a group is the vector of column averages for the group. There are several ways of splitting the data but here I use aggregate() to split the scores on the first 2 PCO axes into groups based on Management and compute their averages

scrs <- scores(ord, display = "sites")
cent <- aggregate(scrs ~ Management, data = dune.env, FUN = mean)
names(cent)[-1] <- colnames(scrs)

This gives:

> cent
  Management       MDS1       MDS2
1         BF -1.2222687  0.1569338
2         HF -0.6222935 -0.1839497
3         NM  0.8848758  1.2061265
4         SF  0.2448365 -1.1313020

which is the same as the values stored in stats as extracted above. The aggregate() approach generalises to any number of axes, e.g.:

> scrs2 <- scores(ord, choices = 1:4, display = "sites")
> cent2 <- aggregate(scrs2 ~ Management, data = dune.env, FUN = mean)
> names(cent2)[-1] <- colnames(scrs2)
> cent2
  Management       MDS1       MDS2       MDS3       MDS4
1         BF -1.2222687  0.1569338 -0.5300011 -0.1063031
2         HF -0.6222935 -0.1839497  0.3252891  1.1354676
3         NM  0.8848758  1.2061265 -0.1986570 -0.4012043
4         SF  0.2448365 -1.1313020  0.1925833 -0.4918671

Obviously, the centroids on the first two PCO axes don’t change when we ask for more axes, so you could compute the centroids over all axes once, then use what ever dimension you want.

You can add the centroids to the above plot with

points(cent[, -1], pch = 22, col = "darkred", bg = "darkred", cex = 1.1)

The resulting plot will now look like this

use of ordiellipse

Finally, vegan contains the adonis() and betadisper() functions that are designed to look at differences in means and variances of multivariate data in ways very similar to Marti’s papers/software. betadisper() is closely linked to the content of the paper you cite and can also return the centroids for you.

Attribution
Source : Link , Question Author : Margaret , Answer Author : Gavin Simpson

Leave a Comment