Type of inference to use with log-linear Poisson glm on contingency table frequency counts

I was doing some log-linear models to test for interactions/associations in multiway contingency tables (based on the tutorial here, http://ww2.coastal.edu/kingw/statistics/R-tutorials/loglin.html). I was doing this using a Poisson glm on the observed frequencies as well as with MASS‘s loglm. I was just wondering though what type of hypothesis test would make most sense here, sequential type I using anova() (not good since p values there depend on the order of the factors in the model), type III test using Anova() in car (independent of the order of the factors in the model) or using drop1 starting from the most complex model?

E.g. using the Titanic passenger survival data

library(COUNT)
data(titanic)
titanic=droplevels(titanic)
head(titanic)
mytable=xtabs(~class+age+sex+survived, data=titanic)
ftable(mytable)
                       survived  no yes
class     age    sex                   
1st class child  women            0   1
                 man              0   5
          adults women            4 140
                 man            118  57
2nd class child  women            0  13
                 man              0  11
          adults women           13  80
                 man            154  14
3rd class child  women           17  14
                 man             35  13
          adults women           89  76
                 man            387  75
freqdata=data.frame(mytable)
fullmodel=glm(Freq~SITE*SEX*MORTALITY,family=poisson,data=freqdata)

Would the most sensible test for interactions between the different categorical factors then be given by type I SS as in

anova(fullmodel, test="Chisq")
Analysis of Deviance Table

Model: poisson, link: log

Response: Freq

Terms added sequentially (first to last)


                       Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                                      23    2173.33              
class                   2   231.18        21    1942.15 < 2.2e-16 ***
age                     1  1072.61        20     869.54 < 2.2e-16 ***
sex                     1   137.74        19     731.80 < 2.2e-16 ***
survived                1    77.61        18     654.19 < 2.2e-16 ***
class:age               2    32.41        16     621.78 9.178e-08 ***
class:sex               2    29.61        14     592.17 3.719e-07 ***
age:sex                 1     6.09        13     586.09   0.01363 *  
class:survived          2   132.69        11     453.40 < 2.2e-16 ***
age:survived            1    25.58        10     427.81 4.237e-07 ***
sex:survived            1   312.93         9     114.89 < 2.2e-16 ***
class:age:sex           2     4.04         7     110.84   0.13250    
class:age:survived      2    35.45         5      75.39 2.002e-08 ***
class:sex:survived      2    73.71         3       1.69 < 2.2e-16 ***
age:sex:survived        1     1.69         2       0.00   0.19421    
class:age:sex:survived  2     0.00         0       0.00   1.00000 

or using type III SS using car‘s Anova :

library(car)
library(afex)
set_sum_contrasts()
Anova(fullmodel, test="LR", type="III")
Analysis of Deviance Table (Type III tests)

Response: Freq
                       LR Chisq Df Pr(>Chisq)    
class                    37.353  2  7.744e-09 ***
age                       5.545  1  0.0185317 *  
sex                       0.000  1  0.9999999    
survived                  1.386  1  0.2390319    
class:age                 5.476  2  0.0646851 .  
class:sex                 0.000  2  1.0000000    
age:sex                   0.000  1  0.9999888    
class:survived           16.983  2  0.0002052 ***
age:survived              0.056  1  0.8126973    
sex:survived              0.000  1  0.9999953    
class:age:sex             0.000  2  1.0000000    
class:age:survived        3.461  2  0.1771673    
class:sex:survived        0.000  2  1.0000000    
age:sex:survived          0.000  1  0.9999905    
class:age:sex:survived    0.000  2  1.0000000    

or using single term deletions and LRTs with drop1 :

fullmodel=glm(Freq~class+age+sex+survived+class:age+class:sex+class:survived+age:sex+age:survived+sex:survived, family=poisson, data=freqdata)
drop1(fullmodel,test="Chisq")
Single term deletions

Model:
Freq ~ class + age + sex + survived + class:age + class:sex + 
    class:survived + age:sex + age:survived + sex:survived
               Df Deviance    AIC     LRT  Pr(>Chi)    
<none>              114.89 249.01                      
class:age       2   162.76 292.89  47.877 4.016e-11 ***
class:sex       2   115.74 245.86   0.850    0.6537    
class:survived  2   230.95 361.08 116.067 < 2.2e-16 ***
age:sex         1   114.89 247.02   0.008    0.9294    
age:survived    1   134.39 266.52  19.505 1.003e-05 ***
sex:survived    1   427.81 559.94 312.927 < 2.2e-16 ***

?

[This last result appears to match that of MASS‘s loglm, as should be the case :

fullmodel=loglm(~class+age+sex+survived+class:age+class:sex+class:survived+age:sex+age:survived+sex:survived, mytable)
stepAIC(fullmodel) 
drop1(fullmodel,test="Chisq")
Single term deletions

Model:
~class + age + sex + survived + class:age + class:sex + class:survived + 
    age:sex + age:survived + sex:survived
               Df    AIC     LRT  Pr(>Chi)    
<none>            144.89                      
class:age       2 188.76  47.877 4.016e-11 ***
class:sex       2 141.74   0.850    0.6537    
class:survived  2 256.95 116.067 < 2.2e-16 ***
age:sex         1 142.89   0.008    0.9294    
age:survived    1 162.39  19.505 1.003e-05 ***
sex:survived    1 455.81 312.927 < 2.2e-16 ***

]

(Any other more elegant ways btw to specify a model with main effects + all first order interaction effects?)

Any thoughts what would be the best way to analyse such multiway contingency tables, and adequately test for associations for unbalanced data sets?

EDIT: based on the answer below I went for the drop1 solution :

fullmodel=glm(Freq~class+age+sex+survived+class:age+class:sex+class:survived+age:sex+age:survived+sex:survived, family=poisson, data=freqdata)
drop1(fullmodel,test="Chisq")

which is equivalent to the log-linear model in MASS :

fullmodel=loglm(~class+age+sex+survived+class:age+class:sex+class:survived+age:sex+age:survived+sex:survived, mytable)
stepAIC(fullmodel) 
drop1(fullmodel,test="Chisq")

Answer

If I had to choose based on how you set this up, I guess I would go with Anova(), but neither makes much sense. The order R enters the variables into the model is standardized and arbitrary. I would not use that to define the tests I would run.

Instead, use ?loglin or ?loglm in the MASS library, and then drop the specific variables / combinations that you are interested in testing. There is an R loglm tutorial using the Titanic dataset here.

As @DWin notes, the sequential vs not distinction corresponds to meaningfully different hypotheses. So that cannot be answered except by the researcher. The standard version of this point in the R world is Venables’ paper, Exegesis on linear models (pdf). Given that you state you just wonder “which factors are associated with each other”, that seems less like a conditional inference and more like dropping the specified association from the full model and testing that, or perhaps testing associations dropped from stripped down models where the other variables aren’t included at all.

Attribution
Source : Link , Question Author : Tom Wenseleers , Answer Author : Community

Leave a Comment