# How to run two-way ANOVA on data with neither normality nor equality of variance in R?

I am working on my master's thesis at the moment and planned on running the statistics with SigmaPlot. However, after spending some time with my data I came to the conclusion that SigmaPlot might not be fit for my problem (I may be mistaken), so I started my first attempts in R, which did not exactly make it easier.

The plan was to run a simple two-way ANOVA on my data, which come from 3 different proteins and 8 different treatments applied to them, so my two factors are protein and treatment. I tested for normality using both

``````
> shapiro.test(time)
``````

and

``````
> ks.test(time, "norm", mean=mean(time), sd=sqrt(var(time)))
``````

In both cases (maybe not surprising) I ended up with a non-normal distribution.

That left me with the question of which test to use for equality of variances. I came up with

``````
> chisq.test(time)
``````

and the result was that I don’t have equality of variance in my data either.

I tried different data transformations (log, centering, standardization), none of which solved my problems with the variances.

Now I am at a loss as to how to conduct the ANOVA to test which proteins and which treatments differ significantly from each other. I found something about the Kruskal-Wallis test, but only for one factor (?). I also found things about ranking or randomization, but not yet how to implement those techniques in R.

Does anyone have a suggestion what I should do?

Edit: thank you for your answers. I am a little overwhelmed by the reading (it just seems to be getting more and more instead of less), but I will of course keep going.

Here is an example of my data, as suggested (I am very sorry for the format; I couldn’t figure out another solution or a place to put a file. I am still new to all this.):

``````
protein treatment   time
A   con 2329.0
A   HY  1072.0
A   CL1 4435.0
A   CL2 2971.0
A   CL1-HY sim  823.5
A   CL2-HY sim  491.5
A   CL1+HY mix  2510.5
A   CL2+HY mix  2484.5
A   con 2454.0
A   HY  1180.5
A   CL1 3249.7
A   CL2 2106.7
A   CL1-HY sim  993.0
A   CL2-HY sim  817.5
A   CL1+HY mix  1981.0
A   CL2+HY mix  2687.5
B   con 1482.0
B   HY  2084.7
B   CL1 1498.0
B   CL2 1258.5
B   CL1-HY sim  1795.7
B   CL2-HY sim  1804.5
B   CL1+HY mix  1633.0
B   CL2+HY mix  1416.3
B   con 1339.0
B   HY  2119.0
B   CL1 1093.3
B   CL2 1026.5
B   CL1-HY sim  2315.5
B   CL2-HY sim  2048.5
B   CL1+HY mix  1465.0
B   CL2+HY mix  2334.5
C   con 1614.8
C   HY  1525.5
C   CL1 426.3
C   CL2 1192.0
C   CL1-HY sim  1546.0
C   CL2-HY sim  874.5
C   CL1+HY mix  1386.0
C   CL2+HY mix  364.5
C   con 1907.5
C   HY  1152.5
C   CL1 639.7
C   CL2 1306.5
C   CL1-HY sim  1515.0
C   CL2-HY sim  1251.0
C   CL1+HY mix  1350.5
C   CL2+HY mix  1230.5
``````

This may be more of a comment than an answer, but it won’t fit as a comment. We may be able to help you here, but this may take a few iterations; we need more information.

First, what is your response variable?

Second, note that the marginal distribution of your response does not have to be normal; rather, the distribution conditional on the model (i.e., the residuals) should be, and it is not clear that you have examined your residuals. Furthermore, normality is the least important assumption of a linear model (e.g., an ANOVA); the residuals do not need to be perfectly normal. Tests of normality are not generally worthwhile (see here for a discussion on CV); plots are much better. I would try a qq-plot of your residuals. In `R` this is done with `qqnorm()`, or try `qqPlot()` in the `car` package. It’s also worth considering the manner in which the residuals are non-normal: skewness is more damaging than excess kurtosis, in particular if the skews alternate directions amongst the groups.
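As a minimal sketch of what examining the residuals could look like: the code below fits a two-way model and plots its residuals. It uses R's built-in `warpbreaks` data purely as a stand-in (that is an assumption for illustration, not your data); with your data the formula would presumably be `time ~ protein * treatment`.

```r
# Stand-in data: built-in warpbreaks (two factors, one response).
# For the data in the question, the formula would be time ~ protein * treatment.
fit <- aov(breaks ~ wool * tension, data = warpbreaks)
res <- residuals(fit)

# Normal qq-plot of the residuals (not of the raw response)
qqnorm(res)
qqline(res)

# Residuals against fitted values: a funnel shape hints at unequal variances
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

Points that follow the reference line reasonably well in the first plot are usually good enough; systematic curvature points to skewness.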

If there really is a problem worth worrying about, a transformation is a good strategy. Taking the log of your raw data is one option, but not the only one. Note that centering and standardizing aren’t really transformations in this sense, because they only shift and rescale the data without changing the shape of the distribution. You want to look into the Box-Cox family of power transformations. And remember, the result doesn’t have to be perfectly normal, just good enough.
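One way to explore the Box-Cox family is `boxcox()` from the `MASS` package, which profiles the likelihood over the power parameter lambda. The sketch below again uses the built-in `warpbreaks` data as a stand-in, and the grid of lambda values is an arbitrary choice; the response has to be strictly positive.

```r
library(MASS)  # recommended package, shipped with standard R installations

# Stand-in data; for the question's data use time ~ protein * treatment
fit <- lm(breaks ~ wool * tension, data = warpbreaks)

# Profile log-likelihood over a grid of lambda values, without plotting
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)

# Grid value of lambda with the highest log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]
lambda_hat

# Rough reading: lambda near 0 suggests log(y), near 0.5 sqrt(y), near 1 no change
```

In practice people tend to round lambda to a convenient value (0, 0.5, 1) rather than use the exact maximum, then refit the model on the transformed response and look at the residuals again.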

Next, I don’t follow your use of the chi-squared test for homogeneity of variance, although it may be perfectly fine. I would suggest you use Levene’s test (use `leveneTest()` in `car`). Heterogeneity is more damaging than non-normality, but the ANOVA is pretty robust if the heterogeneity is minor. A standard rule of thumb is that the largest group variance can be up to four times the smallest without posing strong problems. A good transformation should also address heterogeneity.
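To make the variance check concrete, here is a sketch that computes the largest-to-smallest cell variance ratio and a median-centered Levene's test by hand in base R, as a one-way ANOVA on absolute deviations from the cell medians. As far as I know this is the calculation `leveneTest()` in `car` performs by default, but treat that equivalence as an assumption. The built-in `warpbreaks` data serves as a stand-in.

```r
# Stand-in data; for the question's data the cells would be protein x treatment
cell <- interaction(warpbreaks$wool, warpbreaks$tension)

# Rule of thumb: ratio of the largest to the smallest cell variance
cell_var <- tapply(warpbreaks$breaks, cell, var)
var_ratio <- max(cell_var) / min(cell_var)
var_ratio

# Levene's test by hand: ANOVA on absolute deviations from each cell's median
abs_dev <- abs(warpbreaks$breaks - ave(warpbreaks$breaks, cell, FUN = median))
levene <- anova(lm(abs_dev ~ cell))
levene
```

One caution: with only two observations per cell, the two absolute deviations within a cell are identical, so this test degenerates when the groups are the individual cells, and the cell variances themselves are estimated very imprecisely.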

If these strategies are insufficient, I would probably explore robust regression before trying a non-parametric approach.
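"Robust regression" covers several methods; one common choice in R is `rlm()` from `MASS`, which fits an M-estimator (Huber's by default) that down-weights observations with large residuals. The sketch below is only one possible reading of that suggestion, again with the built-in `warpbreaks` data as a stand-in.

```r
library(MASS)  # recommended package, shipped with standard R installations

# Stand-in data; for the question's data use time ~ protein * treatment
fit_ols <- lm(breaks ~ wool * tension, data = warpbreaks)
fit_rob <- rlm(breaks ~ wool * tension, data = warpbreaks)  # Huber M-estimator

# Compare ordinary least squares and robust coefficient estimates
round(cbind(OLS = coef(fit_ols), robust = coef(fit_rob)), 2)

# Final weights from the fit: values below 1 mark down-weighted observations
head(sort(fit_rob$w), 5)
```

If the two sets of coefficients are close, a few unusual observations are probably not driving the ordinary ANOVA results.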

If you can edit your question and say more about your data, I may be able to update this to provide more specific information.