How can I assess GEE/logistic model fit when covariates have some missing data?

I have fit two generalized estimating equation (GEE) models to my data:

1) Model 1: The outcome is a longitudinal Yes/No variable (A), measured in years 1 to 5, with a longitudinal continuous predictor (B) measured in the same years.

2) Model 2: The outcome is the same longitudinal Yes/No variable (A), but the predictor (B) is fixed at its year 1 value, i.e. forced to be time-invariant.

Because the longitudinal predictor is missing at a few time points for some cases, model 2 uses more data points than model 1.

I would like to know what comparisons I can validly make between the odds ratios, p-values and fit of the two models, e.g.:

  • If the OR for predictor B is bigger in model 1, can I validly say that the association between A and B is stronger in model 1?

  • How can I assess which is the better model for my data? Am I correct in thinking that QIC/AIC and pseudo R-squared values should not be compared across models if the number of observations is not the same?

Any help would be greatly appreciated.


I would definitely try multiple imputation (e.g. with mice or Amelia in R), possibly with several alternative imputation methods.

In the worst-case scenario, you can treat it as a sensitivity analysis.
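
If it helps to see what happens after imputation, here is a minimal sketch of how the per-imputation results are usually combined with Rubin's rules. It assumes you have already fitted the same model to each of m imputed data sets and extracted the log odds ratio for B and its standard error. The numbers and the function name `pool_rubin` are hypothetical, for illustration only, and the interval uses a normal approximation rather than the t-based reference distribution. In R, mice provides `pool()` for this step.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and squared standard errors (Rubin's rules)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)


# Hypothetical log odds ratios for predictor B from m = 5 imputed data sets
log_or = [0.42, 0.47, 0.39, 0.45, 0.44]
se = [0.15, 0.16, 0.15, 0.14, 0.15]

pooled, pooled_se = pool_rubin(log_or, [s ** 2 for s in se])

# Normal-approximation interval on the log scale, then back-transformed
lo, hi = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled OR = {math.exp(pooled):.2f} "
      f"(approx. 95% CI {math.exp(lo):.2f} to {math.exp(hi):.2f})")
```

Because every imputed data set contains the same cases, both models can then be fitted to the same number of observations.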

Source : Link , Question Author : N26 , Answer Author : Giuseppe Biondi-Zoccai
