When looking at the eigenvectors of the covariance matrix, we get the directions of maximum variance (the first eigenvector is the direction in which the data varies the most, etc.); this is called principal component analysis (PCA).
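
As a minimal sketch of the PCA statement above: the eigenvectors of the sample covariance matrix, sorted by decreasing eigenvalue, give the directions of decreasing variance. It uses NumPy, and the toy data and the helper name `pca_directions` are illustrative assumptions rather than anything from a specific library.

```python
import numpy as np

def pca_directions(X):
    """Eigen-decompose the sample covariance of X (rows = samples).

    Returns eigenvalues in descending order and the matching
    eigenvectors as columns. Hypothetical helper for illustration.
    """
    cov = np.cov(X, rowvar=False)            # np.cov centers the data itself
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder to descending variance
    return eigvals[order], eigvecs[:, order]

# Toy data (assumption): strongly stretched along one axis, then rotated by 45 degrees.
rng = np.random.default_rng(0)
Z = rng.normal(size=(2000, 2)) * np.array([3.0, 0.5])
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = Z @ R.T

eigvals, eigvecs = pca_directions(X)
first_pc = eigvecs[:, 0]                     # direction of maximum variance
expected = np.array([np.cos(theta), np.sin(theta)])
alignment = abs(first_pc @ expected)         # close to 1 if the directions agree
print(eigvals, first_pc, alignment)
```

The sign of an eigenvector is arbitrary, which is why the alignment is checked through an absolute value.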

I was wondering what it would mean to look at the eigenvectors and eigenvalues of the mutual information matrix. Would they point in the direction of maximum entropy?

**Answer**

While it is not a direct answer (it concerns *pointwise* mutual information), have a look at the paper relating **word2vec** to a **singular value decomposition** of the PMI matrix:

- O. Levy, Y. Goldberg, Neural Word Embedding as Implicit Matrix Factorization

> We analyze skip-gram with negative-sampling (SGNS), a word embedding method introduced by Mikolov et al., and show that it is implicitly factorizing a word-context matrix, whose cells are the pointwise mutual information (PMI) of the respective word and context pairs, shifted by a global constant. We find that another embedding method, NCE, is implicitly factorizing a similar matrix, where each cell is the (shifted) log conditional probability of a word given its context. We show that using a sparse Shifted Positive PMI word-context matrix to represent words improves results on two word similarity tasks and one of two analogy tasks. When dense low-dimensional vectors are preferred, exact factorization with SVD can achieve solutions that are at least as good as SGNS’s solutions for word similarity tasks. On analogy questions SGNS remains superior to SVD. We conjecture that this stems from the weighted nature of SGNS’s factorization.
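
To make the idea in the abstract concrete, here is a rough sketch of building a shifted positive PMI matrix from word-context co-occurrence counts and factorizing it with a truncated SVD. It is an illustration under assumptions, not the authors' code: the count matrix is made up, the function names `sppmi_matrix`, `svd_embeddings` and `cosine` are hypothetical, and scaling the left singular vectors by the square root of the singular values is just one common choice. In the paper's setting the shift is log k, with k the number of negative samples; the toy example uses a shift of 0 because the counts are tiny.

```python
import numpy as np

def sppmi_matrix(counts, shift=0.0):
    """Shifted positive PMI: max(PMI(w, c) - shift, 0).

    counts[w, c] is the co-occurrence count of word w with context c.
    Cells with zero count have PMI = -inf and are mapped to 0.
    """
    counts = np.asarray(counts, dtype=float)
    p_wc = counts / counts.sum()                  # joint probabilities
    p_w = p_wc.sum(axis=1, keepdims=True)         # word marginals
    p_c = p_wc.sum(axis=0, keepdims=True)         # context marginals
    out = np.zeros_like(p_wc)
    mask = counts > 0
    pmi = np.log(p_wc[mask] / (p_w * p_c)[mask])  # PMI only where defined
    out[mask] = np.maximum(pmi - shift, 0.0)
    return out

def svd_embeddings(M, dim):
    """Dense word vectors from a truncated SVD: U_d * sqrt(S_d)."""
    U, S, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :dim] * np.sqrt(S[:dim])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up counts: words 0 and 1 share contexts, as do words 2 and 3.
counts = np.array([[10,  8,  0,  1],
                   [ 9, 11,  1,  0],
                   [ 0,  1, 12,  9],
                   [ 1,  0,  8, 10]])

M = sppmi_matrix(counts, shift=0.0)
emb = svd_embeddings(M, dim=2)
sim_same = cosine(emb[0], emb[1])    # words with similar contexts
sim_diff = cosine(emb[0], emb[2])    # words with different contexts
print(sim_same, sim_diff)
```

Words that co-occur with the same contexts end up with nearly parallel vectors, while words from the other group are close to orthogonal. That is the sense in which the singular vectors of a PMI-type matrix capture shared association structure rather than a direction of "maximum entropy".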

**Attribution** *Source: Link, Question Author: kmace, Answer Author: Piotr Migdal*