Mutual information¶
Given two random variables \(X\) and \(Y\), we may ask: if we know \(X\), how much do we learn about \(Y\)? We can quantify this as the KL divergence between the joint distribution \(p(X,Y)\) and the factored distribution \(p(X)p(Y)\):
\(I(X;Y) = \mathrm{KL}\big(p(X,Y) \,\|\, p(X)p(Y)\big) = \sum_{x}\sum_{y} p(x,y)\log\frac{p(x,y)}{p(x)p(y)}\)
This is always non-negative, since KL divergence is non-negative
\(I(X;Y) = 0\) if and only if \(p(X,Y) = p(X)p(Y)\), i.e. \(X\) and \(Y\) are independent
The sum above is only defined for discrete random variables (see below for the continuous case)
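A minimal sketch (assuming the joint distribution is given as a small NumPy array) of computing MI directly from this definition:

```python
import numpy as np

def mutual_information(p_xy):
    """Mutual information (in nats) of a discrete joint distribution.

    p_xy: 2-D array where p_xy[i, j] = p(X=i, Y=j), entries summing to 1.
    """
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p(y)
    mask = p_xy > 0                          # skip zero cells (0 log 0 = 0)
    return np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x * p_y)[mask]))

# A dependent joint distribution vs. an independent one.
p_dep = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
p_ind = np.outer([0.5, 0.5], [0.5, 0.5])
print(mutual_information(p_dep))  # > 0: knowing X tells us something about Y
print(mutual_information(p_ind))  # 0:   the joint factorises, so no information is shared
```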
Pointwise mutual information¶
This defines MI between two events (not random variables).
Measures the discrepancy between these events occurring together, \(p(x,y)\), compared to what would be expected by chance if they were independent, \(p(x)p(y)\): \(\mathrm{PMI}(x,y) = \log\frac{p(x,y)}{p(x)p(y)}\)
Alternatively, we can view it as the amount we learn from updating the prior \(p(x)\) into the posterior \(p(x|y)\), since \(\mathrm{PMI}(x,y) = \log\frac{p(x|y)}{p(x)}\)
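A small sketch with hypothetical probabilities (the word co-occurrence numbers below are made up for illustration), showing how PMI compares a joint probability with what independence would predict:

```python
import math

def pmi(p_xy, p_x, p_y):
    """Pointwise mutual information of two events: log p(x,y) / (p(x) p(y))."""
    return math.log(p_xy / (p_x * p_y))

# Hypothetical example: two words that co-occur more often than chance.
p_x, p_y = 0.01, 0.02        # marginal probabilities of each word
p_xy = 0.001                 # joint probability of seeing both together
print(pmi(p_xy, p_x, p_y))   # log(0.001 / 0.0002) = log(5) > 0: positive association
```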
Continuous random variables¶
We have to discretize them first. Unfortunately, this involves choosing the number of bins and finding their boundaries.
One approach, the maximal information coefficient (MIC), tries many different bin sizes and locations and takes the maximum MI achieved.
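As an illustrative sketch (not the full MIC algorithm), one can bin two continuous samples with np.histogram2d at several grid resolutions and keep the largest MI estimate; MIC additionally normalizes each estimate and searches over bin boundary placements, not just bin counts:

```python
import numpy as np

def mi_from_counts(counts):
    """MI (in nats) estimated from a 2-D histogram of joint counts."""
    p_xy = counts / counts.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x * p_y)[mask]))

def max_mi_over_bins(x, y, bin_counts=(4, 8, 16, 32)):
    """Crude MIC-flavoured search: try several grid resolutions, keep the best MI."""
    best = 0.0
    for b in bin_counts:
        counts, _, _ = np.histogram2d(x, y, bins=b)
        best = max(best, mi_from_counts(counts))
    return best

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = x ** 2 + 0.1 * rng.normal(size=1000)   # nonlinear dependence, near-zero correlation
print(max_mi_over_bins(x, y))              # well above the estimate for independent data
```

Note that with finite samples, finer grids inflate the raw MI estimate; this is roughly why MIC normalizes each estimate by the log of the grid size and bounds how fine the grid may be before taking the maximum.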