Suppose I have two sets $X$ and $Y$ and a joint probability distribution $p(x,y)$ over these sets. Let $p(x)$ and $p(y)$ denote the marginal distributions over $X$ and $Y$ respectively.

The mutual information between X and Y is defined to be:

$$I(X;Y)=\sum_{x,y} p(x,y)\cdot\log\left(\frac{p(x,y)}{p(x)\,p(y)}\right),$$
i.e. it is the average value of the pointwise mutual information $\mathrm{pmi}(x,y)\equiv\log\left(\frac{p(x,y)}{p(x)\,p(y)}\right)$.

Suppose I know upper and lower bounds on $\mathrm{pmi}(x,y)$: i.e. I know that for all $x,y$ the following holds:

$$-k\le\log\left(\frac{p(x,y)}{p(x)\,p(y)}\right)\le k.$$
What upper bound does this imply on $I(X;Y)$? Of course it implies $I(X;Y)\le k$, but I would like a tighter bound if possible. This seems plausible to me because $p$ defines a probability distribution, so $\mathrm{pmi}(x,y)$ cannot take its maximum value (or even be non-negative) for every pair $x,y$.

**Answer**

My contribution consists of an example. It illustrates some limits on how the mutual information can be bounded given bounds on the pointwise mutual information.

Take $X=Y=\{1,\ldots,n\}$ and $p(x)=1/n$ for all $x\in X$. For any $m\in\{1,\ldots,n/2\}$ let $k>0$ be the solution of the equation

$$m e^{k}+(n-m)e^{-k}=n.$$

Then we place point mass $e^{k}/n^{2}$ on $nm$ points in the product space $\{1,\ldots,n\}^{2}$ in such a way that there are $m$ of these points in each row and each column. (This can be done in several ways. Start, for instance, with the first $m$ points in the first row, and then fill out the remaining rows by shifting the $m$ points one to the right, with a cyclic boundary condition, for each row.) We place point mass $e^{-k}/n^{2}$ on the remaining $n^{2}-nm$ points. The sum of these point masses is

$$\frac{nm}{n^{2}}e^{k}+\frac{n^{2}-nm}{n^{2}}e^{-k}=\frac{m e^{k}+(n-m)e^{-k}}{n}=1,$$

so they give a probability measure. All the marginal point probabilities are

$$\frac{m}{n^{2}}e^{k}+\frac{n-m}{n^{2}}e^{-k}=\frac{1}{n},$$

so both marginal distributions are uniform.
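The construction above is easy to check numerically. Below is a small sketch in NumPy (the function name `pmi_extreme_joint` and the example values $n=8$, $m=2$ are my own choices, not from the original answer); it solves for $e^{k}$ via the quadratic $m x^{2}-n x+(n-m)=0$ obtained by multiplying the defining equation by $e^{k}$, builds the joint distribution with the cyclic-shift placement, and confirms that the total mass is $1$ and both marginals are uniform.

```python
import numpy as np

def pmi_extreme_joint(n, m):
    """Build the joint distribution from the construction: each row and
    each column has m cells of mass e^k/n^2 and n-m cells of mass
    e^-k/n^2, with k > 0 solving m*e^k + (n-m)*e^-k = n."""
    # Multiply the constraint by e^k: m*x^2 - n*x + (n-m) = 0 with x = e^k.
    # The root x >= 1 gives k >= 0.
    x = (n + np.sqrt(n**2 - 4 * m * (n - m))) / (2 * m)
    k = np.log(x)
    p = np.full((n, n), np.exp(-k) / n**2)
    for i in range(n):
        for j in range(m):
            # cyclic shift: row i carries mass e^k/n^2 at columns i, ..., i+m-1 (mod n)
            p[i, (i + j) % n] = np.exp(k) / n**2
    return p, k

p, k = pmi_extreme_joint(8, 2)
print(p.sum())        # total mass, should be 1 up to rounding
print(p.sum(axis=0))  # column marginals, each should be 1/8
print(p.sum(axis=1))  # row marginals, each should be 1/8
```

For $n=8$, $m=2$ the quadratic gives $e^{k}=3$, i.e. $k=\log 3$, and each row sums to $2\cdot 3/64 + 6\cdot(1/3)/64 = 1/8$ as claimed.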

By the construction it is clear that $\mathrm{pmi}(x,y)\in\{-k,k\}$ for all $x,y\in\{1,\ldots,n\}$, and (after some computation)

$$I(X;Y)=k\,\frac{nm}{n^{2}}e^{k}-k\,\frac{n^{2}-nm}{n^{2}}e^{-k}=k\left(\frac{1-e^{-k}}{e^{k}-e^{-k}}\,(e^{k}+e^{-k})-e^{-k}\right),$$

with the mutual information behaving as $k^{2}/2$ for $k\to 0$ and as $k$ for $k\to\infty$.
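The closed form can also be checked against a direct evaluation of $I(X;Y)=\sum_{x,y}p(x,y)\log\frac{p(x,y)}{p(x)p(y)}$. The sketch below (again with my own example values $n=8$, $m=2$, for which $k=\log 3$) rebuilds the joint distribution and compares the two; in this case both equal $\tfrac{1}{2}\log 3$.

```python
import numpy as np

def mutual_info(p):
    """I(X;Y) = sum_{x,y} p(x,y) * log(p(x,y) / (p(x) p(y))), in nats."""
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    return float((p * np.log(p / (px * py))).sum())

n, m = 8, 2
x = (n + np.sqrt(n**2 - 4 * m * (n - m))) / (2 * m)  # e^k, root >= 1
k = np.log(x)

# Closed form from the answer: k * ((1 - e^-k)/(e^k - e^-k) * (e^k + e^-k) - e^-k)
closed = k * ((1 - np.exp(-k)) / (np.exp(k) - np.exp(-k))
              * (np.exp(k) + np.exp(-k)) - np.exp(-k))

# Rebuild the joint distribution with the cyclic-shift placement.
p = np.full((n, n), np.exp(-k) / n**2)
for i in range(n):
    for j in range(m):
        p[i, (i + j) % n] = np.exp(k) / n**2

print(mutual_info(p))  # should agree with `closed` up to rounding
print(closed)
```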

**Attribution**

*Source: Link, Question Author: Florian, Answer Author: NRH*