Information gain and mutual information: different or equal?

I’m very confused about the difference between information gain and mutual information. To make it even more confusing, I can find sources defining them as identical and others which explain their differences:

Information gain and Mutual information are the same:

  • Feature Selection: Information Gain VS Mutual Information
  • An introduction to information retrieval: “Show that mutual information and information gain are equivalent”, page 285, exercise 13.13.
  • “It is thus known as the information gain, or more commonly the mutual information between X and Y” –> CS769 Spring 2010 Advanced Natural Language Processing, “Information Theory”, lecturer: Xiaojin Zhu
  • “Information gain is also called expected mutual information” –> “Feature Selection Methods for Text Classification”, Nicolette Nicolosi

They are different:

  • little bit of confusion

I could still find other sources defending the opposite thesis, but I think these are enough. Can anyone enlighten me about the real difference / equality of these two measures?

EDIT: another related question:

Information gain, mutual information and related measures


There are two types of Mutual Information:

  • Pointwise Mutual Information and
  • Expected Mutual Information

The pointwise Mutual Information between the values $x$ and $y$ of two random variables $X$ and $Y$ can be defined as:

$$\operatorname{pmi}(x;y) := \log \frac{p(x,y)}{p(x)\,p(y)}$$
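As a quick sketch of the pointwise quantity, here is a computation on a toy joint distribution (the numbers and the `pmi` helper are made up for illustration; log base 2 gives the result in bits, but any base works as long as it is used consistently):

```python
import math

# Toy joint distribution p(X=x, Y=y) over two binary variables (made-up numbers)
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def marginal_x(x):
    # p(x) = sum over y of p(x, y)
    return sum(v for (xi, _), v in p.items() if xi == x)

def marginal_y(y):
    # p(y) = sum over x of p(x, y)
    return sum(v for (_, yi), v in p.items() if yi == y)

def pmi(x, y):
    """Pointwise mutual information of the single outcome pair (x, y)."""
    return math.log2(p[(x, y)] / (marginal_x(x) * marginal_y(y)))

# (0, 0) occurs more often than independence would predict, so its PMI is positive
print(pmi(0, 0))
```

Note that PMI is a number for one particular outcome pair $(x, y)$, not a summary of the whole distribution; it can be negative when a pair co-occurs less often than independence predicts.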

The expected Mutual Information between two random variables $X$ and $Y$ can be defined as the Kullback-Leibler divergence between $p(X,Y)$ and $p(X)p(Y)$:

$$I(X;Y) := D_{\mathrm{KL}}\bigl(p(X,Y)\,\|\,p(X)p(Y)\bigr) = \sum_{x}\sum_{y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$$

Sometimes you find Information Gain defined as $I(X;Y) := H(Y) - H(Y \mid X)$, with the entropy $H(Y)$ and the conditional entropy $H(Y \mid X)$, so

$$H(Y) - H(Y \mid X) = -\sum_{y} p(y) \log p(y) + \sum_{x}\sum_{y} p(x,y) \log p(y \mid x) = \sum_{x}\sum_{y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$$

Note: $p(y) = \sum_{x} p(x,y)$ and $p(y \mid x) = \frac{p(x,y)}{p(x)}$

So expected Mutual Information and Information Gain are the same (with both definitions above).
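As a numerical sanity check of that equivalence, here is a sketch computing both definitions on a toy joint distribution (made-up numbers, two binary variables):

```python
import math

# Toy joint distribution p(X=x, Y=y) over two binary variables (made-up numbers)
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

# Marginals p(x) and p(y)
px = {x: sum(v for (xi, _), v in p.items() if xi == x) for x in (0, 1)}
py = {y: sum(v for (_, yi), v in p.items() if yi == y) for y in (0, 1)}

# Definition 1: expected MI as the KL divergence between p(X,Y) and p(X)p(Y)
mi_kl = sum(v * math.log2(v / (px[x] * py[y])) for (x, y), v in p.items())

# Definition 2: information gain H(Y) - H(Y|X), using p(y|x) = p(x,y) / p(x)
h_y = -sum(v * math.log2(v) for v in py.values())
h_y_given_x = -sum(v * math.log2(v / px[x]) for (x, y), v in p.items())
mi_ig = h_y - h_y_given_x

# Both definitions yield the same number
assert abs(mi_kl - mi_ig) < 1e-12
print(mi_kl, mi_ig)
```

Swapping the roles of $X$ and $Y$ (i.e. $H(X) - H(X \mid Y)$) gives the same value as well, since the final sum is symmetric in $x$ and $y$.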

Source : Link , Question Author : jcsun , Answer Author : chris elgoog
