EM-algorithm for Mixture of Gaussian (EMGMM) has been a very popular model in statistics and machine learning. In EMGMM, the model parameters are still estimated using maximum likelihood (ML) method, however, recently there has been a need to put the prior probability on the model parameters. So, the GMM becomes a hierarchical Bayesian model whose root layer to leaf layer are the parameters, the mixture proportion and the observation respectively. Originally this hierarchical model can be infered using some challenging integration techniques or stochastic sampling techniques (e.g. MCMC). The latter case takes a lot of computational time to sample from the distribution.
Fortunately, there is an approximation technique that can help to make fast inference and give a good approximate solution for example mean-field variational approximation. The variational approximation is very well explained in chapter9 of the classic machine learning textbook  by Bishop. There is a very good example on the variational Bayesian Gaussian mixture models. In fact, Bishop did a great job on explaining and deriving VBGMM, however, for a beginner, the algebra of the derivation can be challenging. Since the derivation contains a lot of interesting things that can be applied to other variational approximations, and the text skipped some details, so I decided to “fill in” the missing part and make a derivation tutorial out of it which is available here [pdf]. I also made the details of the derivation of some examples prior to VBGMM section in the text  available as well [pdf]. Originally, VBGMM is firstly appeard in an excellent paper . Again, for the introduction and for more detail on the interpretation of the model, please refer to the original paper or Bishop’s textbook.
I have heard a lot of about Gaussian mean shift clustering as it is very popular, easy to implement, and one of the best for object tracking in video. Recently I have been working on structure learning, and one of my recent work is to say (in fact, boast, wahahahaha…just kidding) that structure learning can be a more general case of those classic clustering algorithms like k-mean, LBG-vector quantization, mean shift and EM algorithm. So I have to find some algorithms to compare with mine. The algorithm that I want to compare with has to come up with the number of clusters automatically once the covariance matrix or kernel width is known, so I think about LBG-VQ and Gaussian mean shift (GMS). When deriving GMS, I found that it’s very similar to the mean update equation of EM algorithm except that in EM we try to estimate the , so we take derivative w.r.t. . Unlike EM, in GMS, we take derivative w.r.t. each sample point in the space. However, in Gaussian distribution we have the term whose derivative and have opposite sign, therefore when equating the derivative to zero, we will get similar update equation.
the update equation as
which very similar to mean update in EM algorithm for mixture of Gaussian.
I came across with an interesting algorithm for extracting feature descriptors from an image or a video file. The LBP looks very simple and easy to program, but I haven’t had a chance to try it myself. A lot of people claim that it’s pretty good.
Scholarpedia gives very good and short overview of this method: