## Mean-Field Variational Approximation for Graphical Models

I remember the hard time I had when I tried to understand the mean-field variational approximation for Dynamic Tree Bayesian Networks (DTBNs). I tried to find a good paper to read. There were so many of them out there, so many that I did not know which one to begin with. Unfortunately I began with a bad one ^_^. I felt like I was walking in a deep forest, dark, lost and hopeless…hahaha. Now that I get through of it already, I would not want anyone to get lost like I did, and I would like to save some precious time for you. There are a whole bunch of great authors on this topic and I’m not having much time to write my own version of mean-field tutorial anyway. In stead I think it’s a good idea to compile a reading list on this topic so that you can enjoy walking on this beautiful forest as deep as you want. Please remember that this is just a starting point.

**Check list **

- familiar with graphical models –> You may refer to an old post of mine “A good introduction to graphical models“
- understand the concept of probabilistic inference. At least you can perform an inference on a simple graphical model
- Understand the core idea of more efficient inference methods like Forward-Backward (FB) algorithm, Belief Propagation (BP), Factor Graph (FG) –>
**READ [7] then [2]**and ask yourself the following:- know how they can be derived
- know why the methods do/don’t work on a circumstance
- know when/why the methods give the exact/approximate solution on a circumstance
- Understand how FB is derived and be able to use FB appropriately since FB will be used a lot in the examples later
- Realize that FB is a special case of BP, and BP is a special case of FG. At this point you might only adopt FG to use since it’s easy
- Know that the result of FG and FB is just the joint prob of the variable of interest and the evidence given a set of parameters

- Understand parameter estimation using maximum likelihood (ML) and Expectation-Maximization (EM) algorithm. –>
**READ [3] then [1]**from this point on having a good background on matrix calculus is very helpful.- Why do we need EM when we have ML?
- There are so many ways to write the EM, BUT….what is the benefit of writing EM update equations in terms of expectation of hidden variable w.r.t the prob. distribution of the hidden variables themselves?
- From the update equations obtained from ML and EM, find the similarity between the terms!!!
- Now that you see the similarity, you will insight exactly why we need EM algorithm and the beauty of the framework!!!
- From there you will see EM in disguise of ML, and here you will get the insight of how can you go beyond EM algorithm.
- Realize that M step of EM is very similar to that of ML, but the problem usually comes at the E step!!!
- E-step involves mainly on the inference technique. The intractable computation begin to surface at this step. That’s why I previously emphasized that we must have a good foundation of performing inference.
- See the connection between the expectation terms and the joint of the variable of interest with the evidence…They are almost the same thing!
- Practice deriving the update rules for parameter estimation in various situation/model/structure until you feel familiar with it.

- Mean-field approximation for factorial HMM (fHMM). Work on the mean-field approximation on a simple model. –>
**READ [1] and see the detail in [4]**- For an fHMM, try to derive the EM update equations to estimate the parameters. See which part (E-step or M-step) of the EM algorithm that is intractable?
- Why E-step is intractable?
- Is E-step of fHMM the same as that from HMM? Why?
- Get insight why and what behavior of the mean-field apprx makes the intractable tractable?
- We have a complete control over Q but we cannot enumerate P, how can we know that how Q and P are closed together when we cannot enumerate P? Explain it!
- Why do we use Dkl(Q|P) in stead of Dkl(P|Q)? Hint: It’s about the computational complexity.

- Advanced example: Mean-Field Dynamic Trees (MFDT). Extend your intuition of the mean field approximation. –>
**READ [5] and see the detail in [6]**.- Derive your own MFDT and compare to the original MFDT in the paper
- You might find that the MFDT derived in the original may not be correct!!! There are something wrong in it…please find it.
- Figure out how exactly you can use/code the update equations.
- Think forward on how could you make the approximation more accurate! –> At this point you may want to step ahead to “structured variational approximation”

**Bibliography**

[1] “An introduction to hidden Markov models and Bayesian networks” by Zoubin Ghahramani

[2] “Factor graphs and the sum-product algorithm” by F. R. Kschischang, B. J. Frey, H. A. Loeliger

[3] “A Gentle Tutorial on the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models” by J. Bilmes

[4] “Factorial Hidden Markov Models” by Zoubin Ghahramani, Michael I. Jordan Machine Learning, Vol. 29, No. 2. (1 November 1997), pp. 245-273.

[5] “MFDTs: mean field dynamic trees” Adams, N.J. and Storkey, A.J. and Ghahramani, Z. and Williams, C.K.I. (2000)

[6] “Dynamic Trees: A Hierarchical Probabilistic Approach to Image Modelling” PhD Thesis by NJ Adams

[7] “Probabilistic reasoning in intelligent systems: Networks of plausible inference” (1988) J Pearl – Morgan Kauffmann, Los Altos, CA