As part of the output of a generalised linear model, the null and residual deviance are used to evaluate the model. I often see the formulas for these quantities expressed in terms of the log likelihood of the saturated model, for example in https://stats.stackexchange.com/a/113022/22199 and in the question "Logistic Regression: How to obtain a saturated model".
The saturated model, as far as I understand it, is the model that perfectly fits the observed response. Thus, in most places I have seen, the loglikelihood of the saturated model is always given as zero.
Yet, the way the formula for deviance is given suggests that sometimes this quantity is nonzero. (If it were always zero, why bother including it?)
In what cases can it be non zero? If it is never nonzero, why include it in the formula for deviance?
Answer
If you really meant loglikelihood, then the answer is: it’s not always zero.
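For example, consider Poisson data: $y_i \sim \text{Poisson}(\mu_i)$, $i = 1, \ldots, n$. The loglikelihood for $Y = (y_1, \ldots, y_n)$ is given by:
$$\ell(\mu; Y) = -\sum_{i=1}^n \mu_i + \sum_{i=1}^n y_i \log \mu_i - \sum_{i=1}^n \log(y_i!). \tag{$*$}$$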
Differentiate $\ell(\mu; Y)$ in $(*)$ with respect to $\mu_i$ and set the derivative to 0 (this is how we obtain the MLE for the saturated model):
$$-1 + \frac{y_i}{\mu_i} = 0.$$
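Solving this for $\mu_i$ gives $\hat{\mu}_i = y_i$. Substituting $\hat{\mu}_i$ back into $(*)$ for $\mu_i$ shows that the loglikelihood of the saturated model is:
$$\ell(\hat{\mu}; Y) = \sum_{i=1}^n y_i(\log y_i - 1) - \sum_{i=1}^n \log(y_i!) \neq 0$$
unless the $y_i$ take very special values (for example, all $y_i = 0$, with the convention $0 \log 0 = 0$).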
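The Poisson calculation above can be checked numerically. The following is a minimal sketch in Python using only the standard library; the function name `poisson_loglik` and the counts in `y` are made up for illustration and are not part of the original answer.

```python
import math

def poisson_loglik(y, mu):
    """Poisson loglikelihood: sum of -mu_i + y_i*log(mu_i) - log(y_i!)."""
    total = 0.0
    for yi, mi in zip(y, mu):
        # Convention 0 * log(0) = 0, needed when y_i = mu_i = 0.
        term = yi * math.log(mi) if yi > 0 else 0.0
        total += -mi + term - math.lgamma(yi + 1)  # lgamma(y + 1) = log(y!)
    return total

y = [2, 0, 5, 1, 3]              # hypothetical counts, for illustration only

# Saturated model: one parameter per observation, mu_hat_i = y_i.
ll_sat = poisson_loglik(y, y)

# Closed form from the derivation: sum y_i*(log y_i - 1) - sum log(y_i!).
closed_form = sum(
    (yi * (math.log(yi) - 1) if yi > 0 else 0.0) - math.lgamma(yi + 1)
    for yi in y
)

print(ll_sat, closed_form)       # the two values agree and are negative
```

Each summand is the log of a Poisson probability, so it can only be zero when that probability equals one, which is why the total vanishes only in degenerate cases such as all counts being zero.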
In the help page of the R function glm, under the item deviance, the documentation explains this issue as follows:
"deviance: up to a constant, minus twice the maximized loglikelihood. Where sensible, the constant is chosen so that a saturated model has deviance zero."
Notice that it is the deviance, not the loglikelihood, of the saturated model that is chosen to be zero.
Probably, what you really wanted to confirm is that “the deviance of the saturated model is always given as zero”, which is true, since the deviance, by definition (see Section 4.5.1 of Categorical Data Analysis (2nd Edition) by Alan Agresti), is the likelihood ratio statistic comparing a specified GLM to the saturated model. The constant mentioned in the R documentation is actually twice the maximized loglikelihood of the saturated model.
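To make the distinction concrete, here is a small Python sketch of the deviance as a likelihood ratio statistic against the saturated model, for the Poisson case. The helper names and the data are hypothetical, and the intercept-only fit is taken to be the sample mean, which is the Poisson MLE for a common mean.

```python
import math

def poisson_loglik(y, mu):
    """Poisson loglikelihood, using the convention 0 * log(0) = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(mi) if yi > 0 else 0.0
        total += -mi + term - math.lgamma(yi + 1)
    return total

def poisson_deviance(y, mu):
    """Deviance = 2 * (saturated loglikelihood - fitted loglikelihood)."""
    return 2.0 * (poisson_loglik(y, y) - poisson_loglik(y, mu))

y = [2, 0, 5, 1, 3]                       # hypothetical counts
null_mu = [sum(y) / len(y)] * len(y)      # intercept-only fit: common mean

dev_saturated = poisson_deviance(y, y)    # zero by construction
dev_null = poisson_deviance(y, null_mu)   # positive: the null deviance

print(dev_saturated, dev_null)
```

The saturated loglikelihood is nonzero here, yet it cancels exactly when the saturated model is compared with itself, which is the sense in which the constant is chosen so that a saturated model has deviance zero.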
Regarding your statement “Yet, the way the formula for deviance is given suggests that sometimes this quantity is non zero.”, this is probably due to loose usage of the term deviance. For instance, in R, the likelihood ratio statistic for comparing two arbitrary (nested) models M1 and M2 is also referred to as deviance. If we closely followed the definition given in Agresti’s book, it would be more precisely termed the difference between the deviance of M1 and the deviance of M2.
Conclusion

The loglikelihood of the saturated model is in general nonzero.

The deviance (in its original definition) of the saturated model is zero.

The deviance reported by software (such as R) is in general nonzero, because it actually means something else (the difference between deviances).
What follows is the derivation for the general exponential-family case, together with another concrete example. Suppose that the data come from an exponential family (see Modern Applied Statistics with S, Chapter 7):
$$f(y_i; \theta_i, \varphi) = \exp\left[A_i\bigl(y_i\theta_i - \gamma(\theta_i)\bigr)/\varphi + \tau(y_i, \varphi/A_i)\right], \tag{1}$$
where the $A_i$ are known prior weights and $\varphi$ is the dispersion (scale) parameter (in many cases, such as the binomial and Poisson, this parameter is known, while in others, such as the normal and Gamma, it is unknown). Then the loglikelihood is given by:
$$\ell(\theta, \varphi; Y) = \sum_{i=1}^n A_i\bigl(y_i\theta_i - \gamma(\theta_i)\bigr)/\varphi + \sum_{i=1}^n \tau(y_i, \varphi/A_i).$$
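As in the Poisson example, the saturated model’s parameters can be estimated by solving the following score equation:
$$0 = U(\theta_i) = \frac{\partial \ell(\theta, \varphi; Y)}{\partial \theta_i} = \frac{A_i\bigl(y_i - \gamma'(\theta_i)\bigr)}{\varphi}.$$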
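Denote the solution of the above equation by $\hat{\theta}_i$. Then the general form of the loglikelihood of the saturated model (treating the scale parameter as a constant) is:
$$\ell(\hat{\theta}, \varphi; Y) = \sum_{i=1}^n A_i\bigl(y_i\hat{\theta}_i - \gamma(\hat{\theta}_i)\bigr)/\varphi + \sum_{i=1}^n \tau(y_i, \varphi/A_i). \tag{$**$}$$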
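As a consistency check, the Poisson example can be read as a special case of this general form. The identification below is a sketch under the assumptions $A_i = 1$ and $\varphi = 1$:

$$
\begin{aligned}
\theta_i &= \log \mu_i, \qquad \gamma(\theta_i) = e^{\theta_i}, \qquad \tau(y_i, \varphi) = -\log(y_i!), \\
\gamma'(\hat{\theta}_i) &= e^{\hat{\theta}_i} = y_i \;\Longrightarrow\; \hat{\theta}_i = \log y_i, \\
\sum_{i=1}^n \bigl(y_i\hat{\theta}_i - \gamma(\hat{\theta}_i)\bigr) &= \sum_{i=1}^n y_i(\log y_i - 1).
\end{aligned}
$$

So the first term of the saturated loglikelihood reproduces the first sum found in the Poisson example, and it is not zero in general.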
In my previous answer, I incorrectly stated that the first term on the right side of $(**)$ is always zero; the Poisson example above shows that this is wrong. For a more complicated example, consider the Gamma distribution $\Gamma(\alpha, \beta)$ treated in the appendix.
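Appendix: proof that the first term in the loglikelihood of the saturated Gamma model is nonzero. Given
$$f(y; \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} e^{-\beta y} y^{\alpha - 1}, \quad y > 0, \ \alpha > 0, \ \beta > 0,$$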
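we must first reparameterize so that $f$ has the exponential family form (1). It can be verified that, letting
$$\varphi = \frac{1}{\alpha}, \quad \theta = -\frac{\beta}{\alpha},$$
$f$ has the representation:
$$f(y; \theta, \varphi) = \exp\left[\frac{\theta y - (-\log(-\theta))}{\varphi} + \tau(y, \varphi)\right],$$
where
$$\tau(y, \varphi) = -\frac{\log \varphi}{\varphi} + \left(\frac{1}{\varphi} - 1\right)\log y - \log \Gamma(\varphi^{-1}).$$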
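Therefore, the MLEs of the saturated model are $\hat{\theta}_i = -\frac{1}{y_i}$.
Hence
$$\sum_{i=1}^n \frac{1}{\varphi}\left[\hat{\theta}_i y_i - (-\log(-\hat{\theta}_i))\right] = \sum_{i=1}^n \frac{1}{\varphi}\left[-1 - \log(y_i)\right] \neq 0,$$
unless the $y_i$ take very special values.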
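The Gamma reparameterization and the resulting first term can also be checked numerically. This is a minimal Python sketch with made-up parameter values and responses; the function names are hypothetical, and `math.lgamma` is used for $\log \Gamma(\cdot)$.

```python
import math

def gamma_logpdf(y, alpha, beta):
    """Log density of Gamma(alpha, beta) in the shape/rate form."""
    return (alpha * math.log(beta) - math.lgamma(alpha)
            - beta * y + (alpha - 1) * math.log(y))

def expfam_logpdf(y, theta, phi):
    """Same log density in exponential-family form, gamma(theta) = -log(-theta)."""
    tau = (-math.log(phi) / phi + (1 / phi - 1) * math.log(y)
           - math.lgamma(1 / phi))
    return (theta * y + math.log(-theta)) / phi + tau

alpha, beta = 2.5, 1.7                  # arbitrary illustrative values
phi, theta = 1 / alpha, -beta / alpha   # reparameterization from the text
y = [0.4, 1.3, 2.2, 5.0]                # hypothetical positive responses

# The two parameterizations should agree pointwise.
max_gap = max(
    abs(gamma_logpdf(v, alpha, beta) - expfam_logpdf(v, theta, phi)) for v in y
)

# First term of the saturated loglikelihood, with theta_hat_i = -1 / y_i.
first_term = sum(((-1 / v) * v + math.log(1 / v)) / phi for v in y)
closed_form = sum((-1 - math.log(v)) / phi for v in y)

print(max_gap, first_term, closed_form)
```

The dispersion `phi` is held fixed here, matching the derivation's choice to treat the scale parameter as a constant.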
Attribution
Source : Link , Question Author : Alex , Answer Author : Zhanxiong