Distinguishing missing at random (MAR) from missing completely at random (MCAR)

I’ve had these two explained to me multiple times, and they continue to cook my brain. Missing Not at Random makes sense to me, and Missing Completely at Random makes sense; it’s Missing at Random that doesn’t as much.

What gives rise to data that would be MAR but not MCAR?


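Missing at random (MAR) means that the missingness can be explained by variables on which you have full information. MCAR is the stricter case where missingness does not depend on any of the data, observed or unobserved. MAR is not a testable assumption, but there are cases where it is reasonable and cases where it is not.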
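To make the distinction concrete, here is a minimal Python simulation using only the standard library. The function names, the response probabilities (0.7, 0.9, 0.4), and the use of the sign of x as strata are illustrative assumptions, not anything from the answer. x stands in for a fully observed variable such as a demographic, and y for the answer that may be missing. The second check is the one that matters: under MAR the bias in the observed y disappears once you condition on x, while under MNAR it does not. In real data you never see the missing y, so this is a demonstration rather than a test, which is consistent with MAR being untestable.

```python
import random

def simulate(mechanism, n=100_000, seed=0):
    """Toy data: x is always observed, y is sometimes missing.

    `mechanism` picks how the chance of observing y is set; the labels and
    probabilities are illustrative assumptions, not from a real survey.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = x + rng.gauss(0.0, 1.0)          # y is correlated with x
        if mechanism == "MCAR":
            p_obs = 0.7                      # same chance for everyone
        elif mechanism == "MAR":
            p_obs = 0.9 if x > 0 else 0.4    # depends only on the observed x
        elif mechanism == "MNAR":
            p_obs = 0.9 if y > 0 else 0.4    # depends on y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        rows.append((x, y, rng.random() < p_obs))
    return rows

def mean(values):
    return sum(values) / len(values)

def observed_mean_gap(mechanism):
    """Mean of observed y minus mean of all y (complete-case bias)."""
    rows = simulate(mechanism)
    return mean([y for _, y, seen in rows if seen]) - mean([y for _, y, _ in rows])

def within_stratum_gap(mechanism):
    """Largest complete-case bias after conditioning on the sign of x."""
    rows = simulate(mechanism)
    gaps = []
    for positive in (True, False):
        stratum = [(y, seen) for x, y, seen in rows if (x > 0) == positive]
        gaps.append(mean([y for y, seen in stratum if seen]) - mean([y for y, _ in stratum]))
    return max(abs(g) for g in gaps)

for m in ("MCAR", "MAR", "MNAR"):
    print(m, round(observed_mean_gap(m), 3), round(within_stratum_gap(m), 3))
```

Under MCAR both numbers should sit near zero. Under MAR the overall gap is clearly positive but the within-stratum gap is near zero, because once you know x the missingness carries no further information about y. Under MNAR conditioning on x leaves a sizable gap.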

For example, take political opinion polls. Many people refuse to answer. If you assume that the reasons people refuse to answer are entirely based on demographics, and if you have those demographics for each person, then the data are MAR (but not MCAR, because whether someone answers depends on who they are). Some of the reasons people refuse are known to be demographic (for instance, people at both low and high incomes are less likely to answer than those in the middle), but there is no way to know whether that is the full explanation.
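A sketch of the polling example under the MAR assumption, again in plain Python. Every number in the hypothetical `GROUPS` table (population shares, response rates, support rates) is made up; only the shape, lower response at both income extremes, follows the answer. Because response depends only on the recorded income group, respondents within a group are representative of that group, so reweighting each group's respondent average by its population share (post-stratification, swapped in here purely for illustration) removes the bias that the naive average carries.

```python
import random

# Hypothetical income groups: name -> (population share, response rate, support rate).
# Response is lower at both income extremes, as in the polling example.
GROUPS = {
    "low":    (0.3, 0.3, 0.70),
    "middle": (0.5, 0.8, 0.40),
    "high":   (0.2, 0.3, 0.55),
}

def run_poll(n=200_000, seed=1):
    """Simulate a poll; return respondents' answers (1 = supports), keyed by income group."""
    rng = random.Random(seed)
    names = list(GROUPS)
    shares = [GROUPS[g][0] for g in names]
    answers = {g: [] for g in names}
    for _ in range(n):
        g = rng.choices(names, weights=shares)[0]   # income recorded for everyone
        _, response_rate, support_rate = GROUPS[g]
        supports = 1 if rng.random() < support_rate else 0
        if rng.random() < response_rate:             # MAR: depends only on g
            answers[g].append(supports)
    return answers

def estimates(answers):
    """Return (naive respondent mean, post-stratified mean, true population mean)."""
    pooled = [a for group in answers.values() for a in group]
    naive = sum(pooled) / len(pooled)
    # Weight each group's respondent mean by that group's population share
    weighted = sum(GROUPS[g][0] * sum(a) / len(a) for g, a in answers.items())
    truth = sum(share * support for share, _, support in GROUPS.values())
    return naive, weighted, truth

naive, weighted, truth = estimates(run_poll())
print(f"naive={naive:.3f} weighted={weighted:.3f} truth={truth:.3f}")
```

With these made-up numbers the true support is 0.52; the naive respondent average should come out around 0.47, dragged toward the over-represented middle group, while the reweighted estimate lands close to 0.52. If part of the refusal were driven by the opinion itself (MNAR), reweighting on income alone could not fix it.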

So the question becomes whether that explanation is full enough. Methods such as multiple imputation, which assume MAR, often work better than simpler approaches such as dropping incomplete cases, as long as the data are not badly missing not at random.
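One simple way to carry out multiple imputation under MAR is sketched below, assuming income group is the only variable that explains the missingness. Each missing value is drawn from observed donors in the same group using the approximate Bayesian bootstrap (resample the donor pool, then draw from the resample) so the imputations reflect uncertainty about each group's distribution, and the m completed-data means are pooled with Rubin's rules. The data generator, group parameters, m = 20, and the helper names `make_data`, `impute_once`, and `mi_mean` are all hypothetical; real analyses would typically use a dedicated multiple-imputation library and a richer imputation model.

```python
import random
import statistics

# Hypothetical groups: name -> (population share, response rate, mean of y).
GROUPS = {"low": (0.3, 0.3, 2.0), "middle": (0.5, 0.8, 0.0), "high": (0.2, 0.3, 1.0)}

def make_data(n=50_000, seed=2):
    """Return (rows, true_mean); rows are (group, y) with y=None when missing."""
    rng = random.Random(seed)
    names = list(GROUPS)
    shares = [GROUPS[g][0] for g in names]
    rows = []
    for _ in range(n):
        g = rng.choices(names, weights=shares)[0]      # group is always observed
        _, response_rate, mu = GROUPS[g]
        y = rng.gauss(mu, 1.0)
        # MAR: missingness depends on the observed group only
        rows.append((g, y if rng.random() < response_rate else None))
    true_mean = sum(share * mu for share, _, mu in GROUPS.values())
    return rows, true_mean

def impute_once(rows, rng):
    """Approximate Bayesian bootstrap hot-deck within each group.

    Assumes every group has at least one observed donor.
    """
    donors = {}
    for g, y in rows:
        if y is not None:
            donors.setdefault(g, []).append(y)
    # Resample each donor pool first, then draw imputations from the resample
    pools = {g: rng.choices(ys, k=len(ys)) for g, ys in donors.items()}
    return [y if y is not None else rng.choice(pools[g]) for g, y in rows]

def mi_mean(rows, m=20, seed=3):
    """Pool m completed-data means with Rubin's rules; return (estimate, total variance)."""
    rng = random.Random(seed)
    estimates, within = [], []
    for _ in range(m):
        completed = impute_once(rows, rng)
        n = len(completed)
        est = sum(completed) / n
        s2 = sum((v - est) ** 2 for v in completed) / (n - 1)
        estimates.append(est)
        within.append(s2 / n)                    # sampling variance of this mean
    q_bar = statistics.fmean(estimates)          # pooled point estimate
    w = statistics.fmean(within)                 # average within-imputation variance
    b = statistics.variance(estimates)           # between-imputation variance
    return q_bar, w + (1 + 1 / m) * b            # Rubin's total variance

rows, truth = make_data()
observed = [y for _, y in rows if y is not None]
print("complete-case mean:", round(sum(observed) / len(observed), 3))
print("MI estimate and variance:", mi_mean(rows), "truth:", round(truth, 3))
```

Here the complete-case mean is pulled toward the heavily responding middle group (about 0.44 in expectation versus a true 0.8), while the pooled estimate should land near 0.8. The imputations only help because the group explains the missingness; if it did not, they would inherit the same bias.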

Source: Link, Question Author: Fomite, Answer Author: Peter Flom
