Pooling calibration plots after multiple imputation

I would like advice on pooling calibration plots and statistics after multiple imputation. When developing statistical models to predict a future event (e.g. using data from hospital records to predict survival or events after hospital discharge), one can expect anywhere from a little to a lot of missing information. Multiple imputation is one way of handling this, but it requires pooling the test statistics from each imputed dataset while taking into account the additional variability due to the inherent uncertainty of imputation.

I understand there are several calibration statistics (Hosmer-Lemeshow, Harrell’s Emax, the estimated calibration index, etc.) for which the ‘regular’ Rubin’s rules for pooling might apply.
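For reference, Rubin’s rules for a scalar estimate can be sketched as below. This is a minimal sketch, assuming the statistic is approximately normally distributed and that each imputed dataset yields both a point estimate and a squared standard error. The function name `pool_rubin` and the input numbers are hypothetical and purely illustrative, not results from any real analysis.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool one scalar statistic across m imputed datasets with Rubin's rules.

    estimates: the point estimate from each imputed dataset
    variances: the squared standard error from each imputed dataset
    Returns the pooled estimate and its total variance.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()             # pooled point estimate
    within = u.mean()            # average within-imputation variance
    between = q.var(ddof=1)      # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total

# Hypothetical values of one calibration statistic from 5 imputed datasets
estimates = [0.90, 1.00, 1.10, 0.95, 1.05]
variances = [0.01, 0.01, 0.01, 0.01, 0.01]
pooled_estimate, total_variance = pool_rubin(estimates, variances)
```

Whether a given calibration statistic is close enough to normal for this to be appropriate is a separate question that the sketch does not settle.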

However, these statistics are often overall measures of calibration, which do not show the specific regions where the model is miscalibrated. For this reason, I’d rather look at a calibration plot. Regrettably, I am clueless as to how to ‘pool’ the plots or the data behind them (predicted probability and observed outcome per individual), and I can’t find much in the biomedical literature (the field I am familiar with) or here on CrossValidated. Of course, looking at each imputed dataset’s calibration plot could be an answer, but that becomes quite bothersome (to present) when many imputed datasets are created.

I would therefore like to ask whether there are techniques that produce a single calibration plot, pooled after multiple imputation.


[…] if your n is 1,000 and you have 5 MI datasets, why not create a single calibration plot from the stacked 5,000 records and compare observed and expected in whatever fashion you like in those 5,000?
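The stacking idea above could be sketched roughly as follows. This is a minimal sketch, not the answerer’s actual code: the function name `stacked_calibration`, the choice of ten equal-sized bins, and the simulated data are all assumptions made for illustration. It also assumes the outcome itself was not imputed, so the same observed outcomes are repeated in each imputed dataset. It only computes the points of the plot, not confidence intervals.

```python
import numpy as np

def stacked_calibration(pred_list, obs_list, n_bins=10):
    """Stack predictions and outcomes from all imputed datasets, then
    return the mean predicted probability and the observed event rate
    within n_bins equal-sized groups ordered by predicted probability."""
    p = np.concatenate([np.asarray(a, dtype=float) for a in pred_list])
    y = np.concatenate([np.asarray(a, dtype=float) for a in obs_list])
    order = np.argsort(p)                    # sort stacked rows by prediction
    groups = np.array_split(order, n_bins)   # equal-sized risk groups
    mean_pred = np.array([p[g].mean() for g in groups])
    obs_rate = np.array([y[g].mean() for g in groups])
    return mean_pred, obs_rate

# Simulated example only: n = 1,000 individuals and 5 imputed datasets
rng = np.random.default_rng(0)
n, m = 1000, 5
true_p = rng.uniform(0.05, 0.95, size=n)
y = rng.binomial(1, true_p)
# Predictions differ slightly per imputed dataset; outcomes are identical
pred_list = [np.clip(true_p + rng.normal(0, 0.02, size=n), 0.01, 0.99)
             for _ in range(m)]
obs_list = [y] * m

mean_pred, obs_rate = stacked_calibration(pred_list, obs_list)
```

Plotting `obs_rate` against `mean_pred` together with the diagonal gives one calibration plot based on all 5,000 stacked records.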

Regarding references:

No references. We recently published a paper in which we stated without
proof that we obtained inference for bootstrap standard errors and
multiple imputation by pooling them together in this fashion. I think
you can state that the purpose of the analysis is to test at the 0.05
level whether the expected/observed ratio or difference is within
a normal distributional range, and that quantile estimates are
invariant to the sample size, so testing based on the 95% CI is not
affected by pooling.

Source : Link , Question Author : IWS , Answer Author :
