I’m very confused about the difference between information gain and mutual information. To make it even more confusing, I can find some sources defining them as identical and others explaining their differences:
Information gain and mutual information are the same:
- Feature Selection: Information Gain VS Mutual Information
- An introduction to information retrieval: “Show that mutual information and information gain are equivalent”, page 285, exercise 13.13.
- “It is thus known as the information gain, or more commonly the mutual information between X and Y” –> CS769 Spring 2010 Advanced Natural Language Processing, “Information Theory”, lecturer: Xiaojin Zhu
- “Information gain is also called expected mutual information” –> “Feature Selection Methods for Text Classification”
They are different:
- Yang –> “A Comparative Study on Feature Selection in Text Categorization” –> they are treated separately, and mutual information is even discarded because it performs very poorly compared to IG
- citing Yang –> “An Extensive Empirical Study of Feature Selection Metrics for Text Classification” –> http://www.jmlr.org/papers/volume3/forman03a/forman03a_full.pdf
- little bit of confusion
I could find still more sources defending one thesis or the other, but I think these are enough. Can anyone enlighten me about the real difference/equality of these two measures?
EDIT: another related question
There are two types of Mutual Information:
- Pointwise Mutual Information
- Expected Mutual Information
The pointwise mutual information between two values x and y of the random variables X and Y can be defined as:

$$\operatorname{pmi}(x;y) := \log\frac{p(x,y)}{p(x)\,p(y)}$$
The expected mutual information between two random variables X and Y can be defined as the Kullback–Leibler divergence between p(X,Y) and p(X)p(Y):

$$I(X;Y) := D_{\mathrm{KL}}\big(p(X,Y)\,\|\,p(X)\,p(Y)\big) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}$$
Sometimes you find information gain defined as $I(X;Y) := H(Y) - H(Y\mid X)$, with the entropy $H(Y)$ and the conditional entropy $H(Y\mid X)$.
Expanding the entropies shows why the two coincide: since $p(y\mid x) = p(x,y)/p(x)$ and $\sum_x p(x,y) = p(y)$,

$$H(Y) - H(Y\mid X) = -\sum_{y} p(y)\log p(y) + \sum_{x,y} p(x,y)\log p(y\mid x) = \sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}$$

So expected mutual information and information gain are the same (with both definitions above).