# GLM: verifying a choice of distribution and link function

I have a generalized linear model that adopts a Gaussian distribution and log link function. After fitting the model, I check the residuals: QQ plot, residuals vs predicted values, histogram of residuals (acknowledging that due caution is needed). Everything looks good. This seems to suggest (to me) that the choice of a Gaussian distribution was quite reasonable. Or, at least, that the residuals are consistent with the distribution I used in my model.

Q1: Would it be going too far to state that it validates my choice of distribution?

I chose a log link function because my response variable is always positive, but I’d like some sort of confirmation that it was a good choice.

Q2: Are there any checks, analogous to examining the residuals for the choice of distribution, that can support my choice of link function? (Choosing a link function seems a bit arbitrary to me, as the only guidelines I can find are quite vague and hand-wavy, presumably for good reason.)

Within that framework, the canonical link for a Gaussian model would be the identity link. In this case you rejected that possibility, presumably for theoretical reasons. I suspect your thinking was that $Y$ cannot take negative values (note that "cannot" is a stronger claim than "does not happen to"). If so, the log is a reasonable choice a priori, but it does more than keep the predicted mean of $Y$ from becoming negative: it also imposes a specific shape on the curvilinear relationship, because the mean becomes the exponential of the linear predictor. A standard plot of residuals vs. fitted values (perhaps with a loess fit overlaid) will help you judge whether the intrinsic curvature in your data is a reasonable match for the specific curvature imposed by the log link. As I mentioned, you can also try any other transformation that meets your theoretical criteria and compare the two fits directly.