MLE for Bayesian Networks

Here we look at how to learn the parameters of a Bayesian network by maximum likelihood estimation (MLE).

Suppose we are given m i.i.d. samples \(D=\{x^{(1)}, \cdots, x^{(m)} \}\) and a Bayesian network over n variables:

\[ p(x) = \prod_{i=1}^n \theta_{x_i|x_{\text{pa}(i)}} \]
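For concreteness (an illustrative example, not from the original notes), a three-variable chain \(x_1 \rightarrow x_2 \rightarrow x_3\) has \(\text{pa}(1)=\emptyset\), \(\text{pa}(2)=\{1\}\), and \(\text{pa}(3)=\{2\}\), so it factorizes as:

\[ p(x_1, x_2, x_3) = \theta_{x_1} \cdot \theta_{x_2|x_1} \cdot \theta_{x_3|x_2} \]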

Now we want to find the MLE of the parameters (the conditional probability distributions). The likelihood is:

\[ L(\theta|D) = \prod_{j=1}^m \prod_{i=1}^n \theta_{x_i^{(j)}|x_{\text{pa}(i)}^{(j)}} \]

Taking logs and collecting identical terms (each parameter appears once for every sample in which its \((x_i, x_{\text{pa}(i)})\) configuration occurs), we get:

\[ \log L(\theta|D) = \sum_{i=1}^n\sum_{x_{\text{pa}(i)}}\sum_{x_i} \#(x_i,x_{\text{pa}(i)}) \cdot \log \theta_{x_i|x_{\text{pa}(i)}} \]

where \(\#(x_i, x_{\text{pa}(i)})\) is the number of samples in which \(x_i\) and \(x_{\text{pa}(i)}\) take those particular values.

We can decompose the maximization of the log-likelihood into separate maximizations, one for each local conditional distribution. If the variables are discrete, the MLE has a closed-form solution:

\[ \theta_{x_i|x_{\text{pa}(i)}}^* = \frac{\#(x_i, x_{\text{pa}(i)})}{\#(x_{\text{pa}(i)})} \]
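As a quick sanity check on this closed form, here is a minimal sketch in Python (the function name, the toy dataset, and the chain structure \(x_1 \rightarrow x_2\) are all my own illustrative assumptions):

```python
from collections import Counter

def mle_cpd(samples, child, parents):
    """Closed-form MLE for one CPD: #(x_i, x_pa(i)) / #(x_pa(i)).

    samples: list of dicts mapping variable name -> discrete value.
    Returns {(child_value, parent_values): estimated probability}.
    """
    joint = Counter()   # counts #(x_i, x_pa(i))
    parent = Counter()  # counts #(x_pa(i))
    for s in samples:
        pa = tuple(s[p] for p in parents)
        joint[(s[child], pa)] += 1
        parent[pa] += 1
    return {k: v / parent[k[1]] for k, v in joint.items()}

# Hypothetical toy data for a chain x1 -> x2.
D = [{"x1": 0, "x2": 1}, {"x1": 0, "x2": 0},
     {"x1": 1, "x2": 1}, {"x1": 0, "x2": 1}]

print(mle_cpd(D, "x1", []))      # marginal theta_{x1}: {(0, ()): 0.75, (1, ()): 0.25}
print(mle_cpd(D, "x2", ["x1"]))  # theta_{x2|x1}, estimated per parent configuration
```

Each CPD is estimated from its own counts only, which is exactly the per-factor decomposition above.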

Even when the variables are not discrete, the log-likelihood is still a sum over the factors, so we can estimate each factor separately.
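As one concrete instance of the continuous case (a sketch under my own assumptions, not from the original notes): for a linear-Gaussian CPD \(p(x_i \mid x_{\text{pa}(i)}) = \mathcal{N}(w^\top x_{\text{pa}(i)} + b, \sigma^2)\), maximizing that factor's log-likelihood reduces to least squares, so each node can be fit independently of the others:

```python
import numpy as np

def fit_linear_gaussian_cpd(X_pa, x_i):
    """MLE for a linear-Gaussian CPD p(x_i | x_pa) = N(w @ x_pa + b, sigma^2).

    X_pa: (m, k) array of parent values; x_i: (m,) array of child values.
    Maximizing this factor's log-likelihood is equivalent to least squares.
    """
    m = X_pa.shape[0]
    A = np.hstack([X_pa, np.ones((m, 1))])    # append intercept column
    coef, *_ = np.linalg.lstsq(A, x_i, rcond=None)
    w, b = coef[:-1], coef[-1]
    resid = x_i - A @ coef
    sigma2 = resid @ resid / m                # MLE variance (no df correction)
    return w, b, sigma2

# Hypothetical data: x2 = 2*x1 + 1 + noise.
rng = np.random.default_rng(0)
x1 = rng.normal(size=(200, 1))
x2 = 2 * x1[:, 0] + 1 + 0.1 * rng.normal(size=200)
print(fit_linear_gaussian_cpd(x1, x2))
```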