## Variational Bayesian Gaussian Mixture Model (VBGMM)

The EM algorithm for Gaussian mixture models (EMGMM) has long been a very popular tool in statistics and machine learning. In EMGMM, the model parameters are estimated by maximum likelihood (ML). More recently, however, there has been a need to place prior distributions on the model parameters. The GMM then becomes a hierarchical Bayesian model whose layers, from root to leaf, are the parameters, the mixture proportions, and the observations, respectively. Traditionally, inference in this hierarchical model requires either challenging integration techniques or stochastic sampling techniques (e.g. MCMC). Sampling in particular takes a lot of computation time.
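To make the hierarchy concrete, here is the joint distribution written out, assuming the conjugate priors used in Bishop's treatment [1]: a Dirichlet prior on the mixing proportions and a Gaussian-Wishart prior on each component's mean and precision. The hyperparameter names ($\alpha_0, m_0, \beta_0, W_0, \nu_0$) follow that text; other choices of prior are possible.

$$
\begin{aligned}
p(X, Z, \pi, \mu, \Lambda) &= p(X \mid Z, \mu, \Lambda)\, p(Z \mid \pi)\, p(\pi)\, p(\mu \mid \Lambda)\, p(\Lambda), \\
p(\pi) &= \mathrm{Dir}(\pi \mid \alpha_0), \\
p(\mu, \Lambda) &= \prod_{k=1}^{K} \mathcal{N}\!\left(\mu_k \mid m_0, (\beta_0 \Lambda_k)^{-1}\right) \mathcal{W}(\Lambda_k \mid W_0, \nu_0), \\
p(Z \mid \pi) &= \prod_{n=1}^{N} \prod_{k=1}^{K} \pi_k^{z_{nk}}, \\
p(X \mid Z, \mu, \Lambda) &= \prod_{n=1}^{N} \prod_{k=1}^{K} \mathcal{N}\!\left(x_n \mid \mu_k, \Lambda_k^{-1}\right)^{z_{nk}}.
\end{aligned}
$$

Here $z_{nk} \in \{0, 1\}$ indicates whether observation $x_n$ was generated by component $k$.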

Fortunately, there are approximation techniques that make inference fast and still give a good approximate solution, for example the mean-field variational approximation. The variational approximation is very well explained in Chapter 10 of Bishop's classic machine learning textbook [1], which includes a very good example on variational Bayesian Gaussian mixture models. Bishop did a great job of explaining and deriving VBGMM, but for a beginner the algebra of the derivation can be challenging. Since the derivation contains a lot of interesting steps that can be applied to other variational approximations, and the text skips some details, I decided to “fill in” the missing parts and make a derivation tutorial out of it, which is available here [pdf]. I also made available the detailed derivations of some of the examples that precede the VBGMM section in the text [1] [pdf]. VBGMM first appeared in an excellent paper [2]. For the introduction and for more detail on the interpretation of the model, please refer to the original paper or Bishop’s textbook.
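As a quick reminder of what the mean-field approximation assumes, here is the general recipe specialised to the mixture model, with latent assignments $Z$, mixing proportions $\pi$, means $\mu$ and precisions $\Lambda$. The notation follows Bishop [1]; this is only a summary of the starting point, not the full derivation.

$$
\begin{aligned}
\ln p(X) &= \mathcal{L}(q) + \mathrm{KL}\!\left(q \,\|\, p(Z, \pi, \mu, \Lambda \mid X)\right), \\
q(Z, \pi, \mu, \Lambda) &= q(Z)\, q(\pi, \mu, \Lambda), \\
\ln q^{*}(Z) &= \mathbb{E}_{\pi, \mu, \Lambda}\!\left[\ln p(X, Z, \pi, \mu, \Lambda)\right] + \text{const}, \\
\ln q^{*}(\pi, \mu, \Lambda) &= \mathbb{E}_{Z}\!\left[\ln p(X, Z, \pi, \mu, \Lambda)\right] + \text{const}.
\end{aligned}
$$

Because the KL term is non-negative, maximising the lower bound $\mathcal{L}(q)$ pulls $q$ toward the true posterior. The two optimal factors depend on each other, so they are updated in turn, which gives an EM-like iteration.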

A MATLAB implementation of VBGMM is available from Prof. Kevin Murphy’s group at UBC [link] or [link]. The code requires the Netlab toolbox (a collection of very good MATLAB code for machine learning).
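For readers who just want to see the shape of the updates, below is a minimal sketch in plain Python, assuming one-dimensional data so that the Wishart quantities reduce to scalars. It is not the MATLAB code linked above; the function names `digamma` and `fit_vbgmm_1d`, the quantile-based initialisation, and the default hyperparameters are my own illustrative choices. The updates are the standard ones from Bishop's Section 10.2 [1] with D = 1.

```python
import math
import random


def digamma(x):
    """Digamma function for x > 0, via recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:  # shift x upward until the series is accurate
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series


def fit_vbgmm_1d(x, K, iters=100, alpha0=1.0, beta0=1.0, nu0=1.0):
    """Mean-field VB for a 1-D Gaussian mixture (illustrative sketch only)."""
    N = len(x)
    mean = sum(x) / N
    var = sum((v - mean) ** 2 for v in x) / N
    # Assumed prior: centred on the data, prior mean precision nu0 * W0 = 1 / var.
    m0, W0 = mean, 1.0 / (nu0 * var)

    # Initial variational parameters: component means at evenly spaced quantiles.
    xs = sorted(x)
    alpha = [alpha0 + N / K] * K
    beta = [beta0 + N / K] * K
    nu = [nu0 + N / K] * K
    m = [xs[int((k + 0.5) * N / K)] for k in range(K)]
    W = [1.0 / (nu[k] * var) for k in range(K)]

    for _ in range(iters):
        # Variational E-step: responsibilities from expected log parameters.
        psi_sum = digamma(sum(alpha))
        ln_pi = [digamma(a) - psi_sum for a in alpha]
        ln_lam = [digamma(nu[k] / 2.0) + math.log(2.0 * W[k]) for k in range(K)]
        r = []
        for xn in x:
            # The -0.5 * ln(2 * pi) constant is dropped; it cancels on normalising.
            ln_rho = [
                ln_pi[k] + 0.5 * ln_lam[k]
                - 0.5 / beta[k]
                - 0.5 * nu[k] * W[k] * (xn - m[k]) ** 2
                for k in range(K)
            ]
            top = max(ln_rho)  # subtract the max for numerical stability
            w = [math.exp(v - top) for v in ln_rho]
            s = sum(w)
            r.append([v / s for v in w])

        # Variational M-step: update the Dirichlet and Gaussian-Wishart factors.
        for k in range(K):
            Nk = sum(rn[k] for rn in r) + 1e-10  # guard against empty components
            xbar = sum(rn[k] * xn for rn, xn in zip(r, x)) / Nk
            Sk = sum(rn[k] * (xn - xbar) ** 2 for rn, xn in zip(r, x)) / Nk
            alpha[k] = alpha0 + Nk
            beta[k] = beta0 + Nk
            nu[k] = nu0 + Nk
            m[k] = (beta0 * m0 + Nk * xbar) / beta[k]
            W_inv = 1.0 / W0 + Nk * Sk + beta0 * Nk / (beta0 + Nk) * (xbar - m0) ** 2
            W[k] = 1.0 / W_inv

    total = sum(alpha)
    return {
        "weights": [a / total for a in alpha],             # E[pi_k]
        "means": list(m),                                  # E[mu_k]
        "precisions": [nu[k] * W[k] for k in range(K)],    # E[Lambda_k]
    }


if __name__ == "__main__":
    random.seed(0)
    data = [random.gauss(-5.0, 1.0) for _ in range(200)]
    data += [random.gauss(5.0, 1.0) for _ in range(200)]
    fit = fit_vbgmm_1d(data, K=2)
    print(fit["weights"], fit["means"], fit["precisions"])
```

The sketch runs a fixed number of iterations rather than monitoring the lower bound, which keeps it short. A more careful implementation would track the bound to check convergence and would handle multivariate data.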

[1] “Pattern Recognition and Machine Learning” (2006) by Christopher Bishop [link]

[2] “A Variational Bayesian Framework for Graphical Models” (NIPS 1999) by Hagai Attias [link]

There was a typo in Eq. (38): the ‘ln’ should have been removed. Thanks to Flash Arthur for pointing that out.

Awesome ：）

Hi, Bot, what you said is exactly right: ‘for a beginner, the algebra of the derivation can be challenging’!

I am reading some chapters of Bishop’s book [1], and some of the derivation steps are skipped, which confused me for a long time!

I am so lucky to have found your blog. I am sure what you have done will help a lot of people, especially beginners like me.

Besides, I am really interested in one of your papers, ‘CLOSED-FORM CAUCHY-SCHWARZ PDF DISTANCE FOR MIXTURE OF GAUSSIANS’, because I am doing some experiments using the Kullback-Leibler divergence to compute the distance between GMMs.

I will read your paper soon!! 🙂

Thank you.

Leo