# Does regression work on data that isn’t normally distributed?

I’m trying to see whether variables x and y, together or separately, significantly affect Q_7 (the histogram for which is above). I’ve run a Shapiro-Wilk normality test and got the following:

    shapiro.test(Q_7)
    ## data:  Q_7
    ## W = 0.68439, p-value < 2.2e-16


With this distribution, will the following regression work? Or is there another test I should be doing?

    lm(Q_7 ~ x*y)


If your regression model is $y = X\beta + \varepsilon$, where $X$ is your matrix of regressor variables, $y$ is the (vector of) data to be explained, $\beta$ is a vector of coefficients on the regressors and $\varepsilon$ is random variability (typically considered noise), then the assumption of Normality applies strictly to $\varepsilon$, not to $y$ (edit: well, strictly speaking it applies to the conditional distribution $y|X$, which is the distribution of $\varepsilon$ shifted by the mean $X\beta$, but not to the marginal distribution of $y$). In other words, the data should be Normally distributed once the effects of the regressors have been accounted for, but not (necessarily) before.
What you’re testing here is the distribution of $y$, whereas what you want to test is the distribution of $\varepsilon$. Of course you don’t know $\varepsilon$, but you can estimate it by running the regression and examining the distribution of the residuals $\hat\varepsilon=y-X\hat\beta$ (where $\hat\beta$ are the estimated coefficients from the regression). These residuals $\hat\varepsilon$ are an estimate of $\varepsilon$, and so their distribution will be an approximation of the distribution of $\varepsilon$.
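
As a rough illustration of checking residuals rather than the raw response, here is a minimal sketch in R. The original data are not available, so it uses simulated data: the sample size, the distributions of the regressors and the coefficients are all made-up assumptions, and only the names `Q_7`, `x` and `y` are taken from the question. Note that `y` here is a regressor, as in the question's `lm` call, not the response vector $y$ of the formulas above.

```r
set.seed(1)
n <- 500

# Hypothetical regressors (assumed for illustration only):
# a binary indicator and a right-skewed continuous variable.
x <- rbinom(n, 1, 0.2)
y <- rexp(n)

# Simulated response: the errors are Normal, but the marginal
# distribution of Q_7 is skewed and bimodal because of x and y.
Q_7 <- 1 + 5 * x + 2 * y + rnorm(n, sd = 0.5)

# Fit the model from the question and extract the residuals,
# i.e. the estimates of the error term.
fit <- lm(Q_7 ~ x * y)
res <- residuals(fit)

# Shapiro-Wilk on the raw response (what the question tested) ...
p_marginal <- shapiro.test(Q_7)$p.value
# ... versus on the residuals (what the assumption is about).
p_resid <- shapiro.test(res)$p.value

print(c(marginal = p_marginal, residuals = p_resid))

# A Q-Q plot of the residuals is a useful visual complement to the test.
qqnorm(res)
qqline(res)
```

In this simulated setting the raw response is clearly non-Normal even though the errors are Normal by construction, which is the situation described above. With real data you would run the same two residual checks on your own fitted model.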