I am trying to determine if caterpillars that eat a natural diet (monkeyflower) are more resistant to predators (ants) than caterpillars that eat an artificial diet (a mixture of wheat germ and vitamins). I did a trial study with a small sample size (20 caterpillars; 10 per diet). I weighed each caterpillar prior to the experiment. I offered a pair of caterpillars (one per diet) to a group of ants for a period of five minutes, and counted the number of times that each caterpillar was rejected. I repeated this process ten times.

This are what my data look like (A = artificial diet, N = natural diet):

`Trial A_Weight N_Weight A_Rejections N_Rejections 1 0.0496 0.1857 0 1 2 0.0324 0.1112 0 2 3 0.0291 0.3011 0 2 4 0.0247 0.2066 0 3 5 0.0394 0.1448 3 1 6 0.0641 0.0838 1 3 7 0.0360 0.1963 0 2 8 0.0243 0.145 0 3 9 0.0682 0.1519 0 3 10 0.0225 0.1571 1 0`

I am attempting to run an ANOVA in R. This is what my code looks like (0 = Artificial diet, 1 = Natural diet; all vectors are organized with data for the ten artificial diet caterpillars first, followed by data for the ten natural diet caterpillars):

`diet <- factor (rep (c (0, 1), each = 10) rejections <- c(0,0,0,0,3,1,0,0,0,1,1,2,2,3,1,3,2,3,3,0) weight <- c(0.0496,0.0324,0.0291,0.0247,0.0394,0.0641,0.036,0.0243,0.0682,0.0225,0.1857,0.1112,0.3011,0.2066,0.1448,0.0838,0.1963,0.145,0.1519,0.1571) all.data <- data.frame(Diet=diet, Rejections = rejections, Weight = weight) fit.all <- lm(Rejections ~ Diet * Weight, all.data) anova(fit.all)`

And these are what my results look like:

`Analysis of Variance Table Response: Rejections Df Sum Sq Mean Sq F value Pr(>F) Diet 1 11.2500 11.2500 9.8044 0.006444 ** Weight 1 0.0661 0.0661 0.0576 0.813432 Diet:Weight 1 0.0748 0.0748 0.0652 0.801678 Residuals 16 18.3591 1.1474 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1`

My questions are:

- Is ANOVA appropriate here? I realize the small sample size would be an issue with any statistical test; this is just a trial study that I’d like to run stats on for a class presentation. I do plan to redo this study with a larger sample size.
- Have I entered my data into R correctly?
- Is this telling me that diet is significant, but weight is not?

**Answer**

**tl;dr** @whuber is right that diet and weight are confounded in your analysis: this is what the picture looks like.

The fat points + ranges show the mean and bootstrap confidence intervals for diet alone; the gray line + confidence interval shows the overall relationship with weight; the individual lines + CI show the relationships with weight for each group. There’s more rejection for diet=N, but those individuals also have higher weights.

Going into the gory mechanical details: you’re on the right track with your analysis, but (1) when you test the effect of diet, you have to take the effect of weight into account, and *vice versa*; by default R does a *sequential* ANOVA, which tests the effect of diet alone; (2) for data like this you should probably be using a Poisson generalized linear model (GLM), although it doesn’t make too much difference to the statistical conclusions in this case.

If you look at `summary()`

rather than `anova()`

, which tests marginal effects, you’ll see that nothing looks particularly significant (you also have to be careful with testing main effects in the presence of an interaction: in this case the effect of diet is evaluated at a weight of *zero*: probably not sensible, but since the interaction is non-significant (although it has a large effect!) it may not make much difference.

```
summary(fit.lm <- lm(rejections~diet*weight,data=dd2))
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.3455 0.9119 0.379 0.710
## dietN 1.9557 1.4000 1.397 0.182
## weight 3.9573 21.6920 0.182 0.858
## dietN:weight -5.7465 22.5013 -0.255 0.802
```

Centering the weight variable:

```
dd2$cweight <- dd2$weight-mean(dd2$weight)
summary(fit.clm <- update(fit.lm,rejections~diet*cweight))
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.7559 1.4429 0.524 0.608
## dietN 1.3598 1.5318 0.888 0.388
## cweight 3.9573 21.6920 0.182 0.858
## dietN:cweight -5.7465 22.5013 -0.255 0.802
```

No huge changes in the story here.

```
car::Anova(fit.clm,type="3")
## Response: rejections
## Sum Sq Df F value Pr(>F)
## (Intercept) 0.3149 1 0.2744 0.6076
## diet 0.9043 1 0.7881 0.3878
## cweight 0.0382 1 0.0333 0.8575
## diet:cweight 0.0748 1 0.0652 0.8017
## Residuals 18.3591 16
```

There is some argument about whether so-called “type 3” tests make sense; they don’t always, although centering the weight helps. Type 2 analysis, which tests the main effects after taking the interaction out of the model, may be more defensible. In this case diet and cweight are tested in the presence of each other, but without the interactions included.

```
car::Anova(fit.clm,type="2")
## Response: rejections
## Sum Sq Df F value Pr(>F)
## diet 4.1179 1 3.5888 0.07639 .
## cweight 0.0661 1 0.0576 0.81343
## diet:cweight 0.0748 1 0.0652 0.80168
## Residuals 18.3591 16
```

We can see that if we analyzed diet **ignoring the effects of weight** we would get a highly significant result – this is essentially what you found in your analysis, because of the sequential ANOVA.

```
fit.lm_diet <- update(fit.clm,. ~ diet)
car::Anova(fit.lm_diet)
## Response: rejections
## Sum Sq Df F value Pr(>F)
## diet 11.25 1 10.946 0.003908 **
## Residuals 18.50 18
```

It would be more standard to fit this kind of data to a Poisson GLM (`glm(rejections~diet*cweight,data=dd2,family=poisson)`

), but in this case it doesn’t make very much difference to the conclusions.

By the way, it’s better to rearrange your data programmatically rather than by hand if you can. For reference, this is how I did it (sorry for the amount of “magic” I used):

```
library(tidyr)
library(dplyr)
dd <- read.table(header=TRUE,text=
"Trial A_Weight N_Weight A_Rejections N_Rejections
1 0.0496 0.1857 0 1
2 0.0324 0.1112 0 2
3 0.0291 0.3011 0 2
4 0.0247 0.2066 0 3
5 0.0394 0.1448 3 1
6 0.0641 0.0838 1 3
7 0.0360 0.1963 0 2
8 0.0243 0.145 0 3
9 0.0682 0.1519 0 3
10 0.0225 0.1571 1 0
")
## pick out weight and rearrange to long format
dwt <- dd %>% select(Trial,A_Weight,N_Weight) %>%
gather(diet,weight,-Trial) %>%
mutate(diet=gsub("_.*","",diet))
## ditto, rejections
drej <- dd %>% select(Trial,A_Rejections,N_Rejections) %>%
gather(diet,rejections,-Trial) %>%
mutate(diet=gsub("_.*","",diet))
## put them back together
dd2 <- full_join(dwt,drej,by=c("Trial","diet"))
```

Plotting code:

```
dd_sum <- dd2 %>% group_by(diet) %>%
do(data.frame(weight=mean(.$weight),
rbind(mean_cl_boot(.$rejections))))
library(ggplot2); theme_set(theme_bw())
ggplot(dd2,aes(weight,rejections,colour=diet))+
geom_point()+
scale_colour_brewer(palette="Set1")+
scale_fill_brewer(palette="Set1")+
geom_pointrange(data=dd_sum,aes(y=y,ymin=ymin,ymax=ymax),
size=4,alpha=0.5,show.legend=FALSE)+
geom_smooth(method="lm",aes(fill=diet),alpha=0.1)+
geom_smooth(method="lm",aes(group=1),colour="darkgray",
alpha=0.1)+
scale_y_continuous(limits=c(0,3),oob=scales::squish)
```

**Attribution***Source : Link , Question Author : Meow , Answer Author : Ben Bolker*