Tweedie p parameter Interpretation

From Wikipedia ( we know that

The Tweedie distributions include a number of familiar distributions as well as some unusual ones, each being specified by the domain of the index parameter. We have the
normal distribution, p = 0,
Poisson distribution, p = 1,
compound Poisson–gamma distribution, 1 < p < 2,
gamma distribution, p = 2,
positive stable distributions, 2 < p < 3,
inverse Gaussian distribution, p = 3,
positive stable distributions, p > 3, and
extreme stable distributions, p = ∞.

For 0 < p < 1 no Tweedie model exists.

My question is can we determine significance from the optimal p if it is between 1-2. For instance, if we say that Poisson controls the frequency in which our Y appears and Gamma controls the size of our Y if it appears and if the log-likelihood of the model has an optimal value for p of 1.1 then is it fair to say that the data is driven more by frequency than size and vice versa?

so for example, if you run this code:

x <- rep(1:100,times=10)
freq <- sapply(x,function(arg) rpois(1,.25*arg))
size <- sapply(x,function(arg) rgamma(1,100*arg))
y <- freq*size
df <- data.frame(cbind(x,y))

tweediep <- tweedie.profile(y~x, link.power = 0, data = df, 
    do.smooth = FALSE, do.plot = TRUE)

I get the result 1.3. Does that mean that this Y is more “driven by” frequency than severity? Does it mean something else altogether? Is there nothing I can discern from this estimated value of the parameter p ?

Edit: This is not a complete answer so I’m posting it as an edit and clarification rather than an answer to my own question.

I have attempted to anecdotally answer my own question via a limited experiment. Specifically, I want to create dummy data and a “frequency driven” and “severity driven” model from this data and observe the results to better understand the p parameter.


#creating random variables

#creating parameters based off a linear combination of variables plus a noise term - all exponentiated

#freq is the number of occurances, sev is the size of the occurances



#load library if needed

#store all the data in a single data frame

This is my “base model”. I now find the optimal parameter p.

tweediep<-tweedie.profile(formula = results1~x1+x2+x3+x4,link.power = 0,do.smooth = TRUE,do.plot = TRUE)

I get a result of 1.616327 . I went ahead and found the results model coefficients. Strictly for the purposes of answering my question, I don’t believe actually fitting the glm is necessary but perhaps the diagnostics of the glm will lead to more insight.

baseGLM<-glm(results1~x1+x2+x3+x4,data=df1,family=tweedie(var.power = tweediep$xi.max,link.power = 0))

It’s possible that the model isn’t “fit well enough” for this type of analysis to say anything meaningful, but to continue this line of reasoning, let’s try fitting a “frequency driven model”.

#freq driven





tweediep2<-tweedie.profile(formula = results2~x1+x2+x3+x4,link.power = 0,do.smooth = TRUE,do.plot = TRUE)

here our p parameter value is 1.718367 which is higher than our base value. Again, we fit the model.

GLMfreqDriven<-glm(results2~x1+x2+x3+x4,data=df1,family=tweedie(var.power = tweediep2$xi.max,link.power = 0))

Finally, let’s make our model more severity driven.

#severity driven





tweediep3<-tweedie.profile(formula = results3~x1+x2+x3+x4,link.power = 0,do.smooth = TRUE,do.plot = TRUE)

Here our p parameter is 1.591937.

This is the opposite of my initial hypothesis. I originally thought that the more frequency driven a model is, the closer the p parameter would be to 1 (since 1 is a true Poisson model), and the more severity driven the model is, closer the p parameter is to 2. In my, admittedly limited, experiment the opposite was true. The Poisson distribution does allow zeros as a response variable. Perhaps the p parameter has nothing to do with “frequency driven” or “severity driven”, but is strictly a function of the number of zeros in the data (for p parameter values between 1 and 2).

One possible limitation I thought of with respect to my experiment design is that would be interesting to see how the results change if I force the results1, results2, and results3 variables to all have the same mean. The data here was fit arbitrarily. Several of instances where I am dividing or multiplying by a constant, I am doing so to get the data in such a way that the algorithm converges without error. I would be more confident if the results (1.61, 1.72, 1.59) were more spread out.

In conclusion, I can see possible explanations for results of my limited experiment, but would really be appreciative if someone could further validate and explain how the p parameter is affected by the data. Any discussion as to the differences between creating separate frequency and severity models versus creating compound Poisson-gamma models may also prove to be very insightful.

Thank you


The Generalized Linear Models with Examples in R book by Peter Dunn and Gordon Smyth contains an illuminating discussion of Tweedie distributions.

If I maybe so blunt, and summarize your excellent question:

What is the relation of p and the underlying Poisson-Gamma model for a Tweedie distribution with 1<p<2?

As you already note in your question, the Tweedie distribution with 1<p<2 can be understood as a Poisson-Gamma model. To make it more concrete what this means, let's assume that

the observed y is
Dunn & Smyth give an example for this model, where N is the number of insurance claims and zi is the average cost for each claim. In that case the model would describe the total insurance payout.

The relation of p to the parameters of the Poisson and Gamma distribution is

where μ and ϕ are the mean and overdispersion parameters from the generalized linear model definition.

Source : Link , Question Author : Eric , Answer Author : const-ae

Leave a Comment