Say I’m doing a negative binomial regression and I’m trying to fit my model via gradient descent.
Given that the Poisson and Negative Binomial have the same inverse link, how might I account for the dispersion?
P.S. I need to implement this via GD because I'm working with encrypted ML; I can't easily do inverses or transposes within my problem setup.
I like looking at the Stata docs when I am implementing these models' loss functions in other guises. So here on pg 11 Stata has a likelihood function for their version of negative binomial regression.
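Written out to match the code further down (a transcription under that assumption, not a quote from the manual), the log-likelihood is:

$$
\ln L = \sum_{j} \Big[ \ln\Gamma(m + y_j) - \ln\Gamma(y_j + 1) - \ln\Gamma(m) + m \ln p_j + y_j \ln(1 - p_j) \Big],
\qquad m = \frac{1}{\alpha}, \quad p_j = \frac{1}{1 + \alpha \mu_j}
$$

Here $y_j$ is the observed count for observation $j$.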
Where α is the dispersion estimate and μ_j is the mean estimate (same as for Poisson regression). This is a variant called the NB2 distribution. I have an example on my blog of using this as a loss function in a pytorch deep learning model. Here is the code for torch tensors; it should be easy to translate to other languages:
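```python
# pytorch loss function
import torch

def nb2_loss(actual, log_pred, disp):
    # disp is log(alpha), so alpha = exp(disp) is always positive
    m = 1/disp.exp()               # m = 1/alpha
    mu = log_pred.exp()            # mean, same log link as Poisson
    p = 1/(1 + disp.exp()*mu)
    # per-observation log-likelihood
    ll = torch.lgamma(m + actual) - torch.lgamma(actual+1) - torch.lgamma(m)
    ll += m*torch.log(p) + actual*torch.log(1-p)
    # return the mean negative log-likelihood so it can be minimized
    return -ll.mean()
```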
Just a few notes: I parameterize log(α) as `disp` in this loss function, which constrains the α parameter to always be positive. Like all models that use backpropagation, you need decent starting parameters. I think a starting value somewhere between 0 and 1 works well for the problems I have dealt with.
For a second note, I have had a terrible time using pytorch's stochastic gradient descent in all my experiments with this (even with Poisson regression and fake data, where I know good starting points). So at this point I always just default to the Adam optimizer (but again, good starting points for all parameters are important).
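As a rough illustration of the points above, here is a minimal sketch that fits the NB2 loss with Adam on simulated data. The data-generating values (`true_b0`, `true_b1`, `true_alpha`), the sample size, the learning rate, and the number of steps are all made up for the example, not taken from any real problem. The loss function is repeated so the block runs on its own.

```python
import torch

def nb2_loss(actual, log_pred, disp):
    # NB2 negative log-likelihood; disp is log(alpha)
    m = 1 / disp.exp()
    mu = log_pred.exp()
    p = 1 / (1 + disp.exp() * mu)
    ll = torch.lgamma(m + actual) - torch.lgamma(actual + 1) - torch.lgamma(m)
    ll = ll + m * torch.log(p) + actual * torch.log(1 - p)
    return -ll.mean()

torch.manual_seed(0)

# Hypothetical simulated data: one covariate, log link
n = 2000
true_b0, true_b1, true_alpha = 1.0, 0.5, 0.5
x = torch.randn(n)
mu = torch.exp(true_b0 + true_b1 * x)
# NB2 drawn as a gamma-Poisson mixture: the gamma has mean mu and
# variance alpha * mu^2, so the counts have variance mu + alpha * mu^2
lam = torch.distributions.Gamma(1 / true_alpha, 1 / (true_alpha * mu)).sample()
y = torch.poisson(lam)

# Starting values: coefficients at 0, disp = log(alpha) at 0 (alpha = 1)
beta = torch.zeros(2, requires_grad=True)
disp = torch.zeros(1, requires_grad=True)

opt = torch.optim.Adam([beta, disp], lr=0.05)
initial_loss = nb2_loss(y, beta[0] + beta[1] * x, disp).item()

for step in range(500):
    opt.zero_grad()
    log_pred = beta[0] + beta[1] * x
    loss = nb2_loss(y, log_pred, disp)
    loss.backward()
    opt.step()

final_loss = nb2_loss(y, beta[0] + beta[1] * x, disp).item()
alpha_hat = disp.exp().item()
print("beta:", beta.detach().tolist(), "alpha:", alpha_hat)
print("loss:", initial_loss, "->", final_loss)
```

Swapping `torch.optim.Adam` for `torch.optim.SGD` in the same script is a quick way to see how sensitive the fit is to the optimizer and learning rate on your own data.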