Information gain and mutual information: different or equal?

I’m very confused about the difference between information gain and mutual information. What makes it even more confusing is that I can find some sources defining them as identical and others explaining their differences:

Information gain and Mutual information are the same:

  • Feature Selection: Information Gain VS Mutual Information
  • An introduction to information retrieval: “Show that mutual information and information gain are equivalent”, page 285, exercise 13.13.
  • “It is thus known as the information gain, or more commonly the mutual information between X and Y” –> CS769 Spring 2010 Advanced Natural Language Processing, “Information Theory”, lecturer: Xiaojin Zhu
  • “Information gain is also called expected mutual information” –> “Feature Selection Methods for Text Classification”, Nicolette Nicolosi, http://www.cs.rit.edu/~nan2563/feature_selection.pdf

They are different:

  • little bit of confusion

I could find still other sources arguing for either position, but I think these are enough. Can anyone enlighten me about the real difference between, or equality of, these two measures?

EDIT: another related question:

Information gain, mutual information and related measures

Answer

There are two types of Mutual Information:

  • Pointwise Mutual Information and
  • Expected Mutual Information

The pointwise Mutual Information between the values of two random variables can be defined as:

$$\mathrm{pMI}(x;y) := \log \frac{p(x,y)}{p(x)\,p(y)}$$
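
To make the definition concrete, here is a minimal Python sketch; the joint distribution `p_xy` is a made-up example, not taken from any of the sources above:

```python
import math

# Hypothetical joint distribution over X in {0,1} and Y in {0,1} (made-up numbers)
p_xy = {(0, 0): 0.4, (0, 1): 0.1,
        (1, 0): 0.1, (1, 1): 0.4}

# Marginals p(x) and p(y), obtained by summing out the other variable
p_x = {x: sum(p for (xi, _), p in p_xy.items() if xi == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yi), p in p_xy.items() if yi == y) for y in (0, 1)}

def pmi(x, y):
    """Pointwise mutual information of a single value pair (x, y)."""
    return math.log(p_xy[(x, y)] / (p_x[x] * p_y[y]))

print(pmi(0, 0))  # > 0: (0, 0) co-occurs more often than under independence
print(pmi(0, 1))  # < 0: (0, 1) co-occurs less often than under independence
```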

The expected Mutual Information between two random variables X and Y can be defined as the Kullback-Leibler divergence between p(X,Y) and p(X)p(Y):

$$\mathrm{eMI}(X;Y) := \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$$
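
In other words, eMI is the expectation of the pointwise MI under the joint distribution. Continuing the made-up example from above (the numbers are illustrative only):

```python
import math

# Same made-up joint distribution and its marginals as in the pMI sketch
p_xy = {(0, 0): 0.4, (0, 1): 0.1,
        (1, 0): 0.1, (1, 1): 0.4}
p_x = {0: 0.5, 1: 0.5}
p_y = {0: 0.5, 1: 0.5}

# eMI(X;Y) = sum_{x,y} p(x,y) * log( p(x,y) / (p(x) p(y)) ),
# i.e. KL( p(X,Y) || p(X) p(Y) )
emi = sum(p * math.log(p / (p_x[x] * p_y[y]))
          for (x, y), p in p_xy.items())
print(emi)  # ≈ 0.193 nats for this example
```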

Sometimes you find the definition of Information Gain as $I(X;Y) := H(Y) - H(Y \mid X)$, with the entropy $H(Y)$ and the conditional entropy $H(Y \mid X)$, so

$$
\begin{aligned}
I(X;Y) &= H(Y) - H(Y \mid X) \\
&= -\sum_y p(y)\log p(y) + \sum_{x,y} p(x)\,p(y \mid x)\log p(y \mid x) \\
&= \sum_{x,y} p(x,y)\log p(y \mid x) - \sum_y \Big(\sum_x p(x,y)\Big)\log p(y) \\
&= \sum_{x,y} p(x,y)\log p(y \mid x) - \sum_{x,y} p(x,y)\log p(y) \\
&= \sum_{x,y} p(x,y)\log\frac{p(y \mid x)}{p(y)} \\
&= \sum_{x,y} p(x,y)\log\frac{p(y \mid x)\,p(x)}{p(y)\,p(x)} \\
&= \sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(y)\,p(x)} \\
&= \mathrm{eMI}(X;Y)
\end{aligned}
$$

Note: $p(y) = \sum_x p(x,y)$

So expected Mutual Information and Information Gain are the same (with both definitions above).
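
As a quick numerical sanity check of this equality (again using the made-up joint distribution from the sketches above, not data from any of the cited sources):

```python
import math

p_xy = {(0, 0): 0.4, (0, 1): 0.1,
        (1, 0): 0.1, (1, 1): 0.4}
p_x = {0: 0.5, 1: 0.5}
p_y = {0: 0.5, 1: 0.5}

# Information gain: H(Y) - H(Y|X), with p(y|x) = p(x,y) / p(x)
h_y = -sum(p * math.log(p) for p in p_y.values())
h_y_given_x = -sum(p * math.log(p / p_x[x]) for (x, y), p in p_xy.items())
info_gain = h_y - h_y_given_x

# Expected mutual information: KL( p(X,Y) || p(X) p(Y) )
emi = sum(p * math.log(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items())

print(info_gain, emi)  # both ≈ 0.193 nats, matching the derivation
```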

Attribution
Source : Link , Question Author : jcsun , Answer Author : chris elgoog
