## A Good Introduction to MapReduce

MapReduce is a framework for efficiently processing parallelizable tasks on a cluster or grid. A good introduction can be found at the link below.

http://en.wikipedia.org/wiki/MapReduce#Example

In a sense, the MapReduce framework is very similar to message-passing algorithms in graphical models. The Map and Reduce steps are comparable to building the (tree) structure and marginalizing the messages, respectively. So I think MapReduce could make inference feasible for large-scale graphical models.
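To make the analogy concrete, here is a minimal single-process sketch of the MapReduce data flow: map, then shuffle (group by key), then reduce. It runs on one machine, not a cluster, and the names `map_reduce`, `word_mapper`, and `marginal_mapper` are hypothetical. It is applied twice: first to the word-count example from the Wikipedia page, then to marginalizing a joint table P(x, y) split into chunks. The second case shows that summing values per key is exactly the marginalization step of message passing.

```python
from collections import defaultdict
from itertools import chain


def map_reduce(chunks, mapper, reducer):
    """Single-process simulation of the MapReduce data flow (no real cluster)."""
    # Map phase: each chunk independently emits (key, value) pairs.
    mapped = chain.from_iterable(mapper(chunk) for chunk in chunks)
    # Shuffle phase: group together all values that share a key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Reduce phase: combine each key's values into one result.
    return {key: reducer(values) for key, values in groups.items()}


# Word count, as in the Wikipedia example: each chunk is one line of text.
def word_mapper(line):
    return [(word, 1) for word in line.split()]


counts = map_reduce(["the cat sat", "the dog sat"], word_mapper, sum)


# Marginalization: each chunk holds ((x, y), p) entries of a joint table P(x, y).
# Emitting (x, p) and summing per key gives the marginal P(x) = sum_y P(x, y).
def marginal_mapper(entries):
    return [(x, p) for (x, _y), p in entries]


joint_chunks = [
    [((0, 0), 0.1), ((0, 1), 0.2)],
    [((1, 0), 0.3), ((1, 1), 0.4)],
]
marginal = map_reduce(joint_chunks, marginal_mapper, sum)
```

Because each chunk is mapped independently, the map phase is the part a real cluster would parallelize. The per-key reduce is where the summing-out happens.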

## Irregular Tree Structure Bayesian Networks (ITSBNs)


This is my ongoing work on structured image segmentation. I'm about to publish the algorithm, so the details will be posted after submission. ^_^ Please wait.

<details will appear soon>

Here are some results:

### How to use the ITSBN toolbox

- Install the Bayes Net Toolbox (BNT) for MATLAB and VLFeat. Let’s say we put them in the folders Z:\research\FullBNT-1.0.4 and Z:\research\vlfeat-0.9.9, respectively.
- Download and unpack the ITSBN toolbox. Let’s say the folder location is “Z:\research\ITSBN”. The folder contains some MATLAB files and two subdirectories: Gaussian2Dplot and QuickShift_ImageSegmentation.
- Put all the color images to be segmented together in one folder. In this example, we use the test images from the Berkeley Segmentation Dataset (BSDS500), located in ‘Z:\research\BSR\BSDS500\data\images\test’.
- Open the file main_ITSBNImageSegm.m in MATLAB and make sure that all the paths point to their corresponding folders:
  - `vlfeat_dir = 'Z:\research\vlfeat-0.9.9\toolbox/vl_setup';`
  - `BNT_dir = 'Z:\research\FullBNT-1.0.4';`
  - `image_dataset_dir = 'Z:\research\BSR\BSDS500\data\images\test';`

- Run main_ITSBNImageSegm.m. When it finishes, you should see folders of segmented images in ‘Z:\research\ITSBN’.

## Image Segmentation using Gaussian Mixture Models

Today I wanted to compare my image segmentation results against a traditional method, and of course, the Gaussian Mixture Model (GMM) is my victim. I really hoped that GMM would be beaten badly and my algorithm would look super smart against it. However, the result from GMM is really good, much better than I originally expected! The features are L\*a\*b\* pixel values and x-y pixel locations (all features are standardized). I tested my GMM segmentation code on several images, and the results are pretty good given its short runtime. I can’t wait to share my MATLAB code. Some results are shown below:
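For readers who want to try the idea before my MATLAB code is posted, here is a minimal pure-Python sketch of GMM segmentation on standardized [L\*, a\*, b\*, x, y] features. It is not my actual code. For brevity it assumes diagonal covariances and a deterministic farthest-point initialization. The function names (`standardize`, `gmm_segment`) and the toy "image" are hypothetical. Each pixel gets the class with the highest posterior responsibility.

```python
import math


def standardize(features):
    """Z-score every feature column (zero mean, unit variance)."""
    n, d = len(features), len(features[0])
    means = [sum(f[j] for f in features) / n for j in range(d)]
    stds = [math.sqrt(sum((f[j] - means[j]) ** 2 for f in features) / n) or 1.0
            for j in range(d)]
    return [[(f[j] - means[j]) / stds[j] for j in range(d)] for f in features]


def log_gauss_diag(x, mu, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mu, var))


def gmm_segment(features, k, iters=50):
    """EM for a diagonal-covariance GMM; returns one hard label per pixel."""
    x = standardize(features)
    n, d = len(x), len(x[0])
    # Farthest-point initialization (an assumed, deterministic choice).
    means = [list(x[0])]
    while len(means) < k:
        far = max(x, key=lambda p: min(sum((a - b) ** 2 for a, b in zip(p, m))
                                       for m in means))
        means.append(list(far))
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities p(class c | pixel i), computed via log-sum-exp.
        resp = []
        for xi in x:
            logs = [math.log(weights[c]) + log_gauss_diag(xi, means[c], variances[c])
                    for c in range(k)]
            top = max(logs)
            probs = [math.exp(l - top) for l in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate mixing weights, means, and diagonal variances.
        for c in range(k):
            nc = sum(r[c] for r in resp) + 1e-10
            weights[c] = nc / n
            means[c] = [sum(r[c] * xi[j] for r, xi in zip(resp, x)) / nc
                        for j in range(d)]
            variances[c] = [sum(r[c] * (xi[j] - means[c][j]) ** 2
                                for r, xi in zip(resp, x)) / nc + 1e-6
                            for j in range(d)]
    # Label each pixel with its most responsible class from the final E-step.
    return [max(range(k), key=lambda c: r[c]) for r in resp]


# Usage on a tiny synthetic "image": dark left half, bright right half.
features = []
for row in range(5):
    for col in range(10):
        lab = (20.0, 0.0, 0.0) if col < 5 else (80.0, 10.0, -10.0)
        features.append([*lab, col, row])  # [L*, a*, b*, x, y]
labels = gmm_segment(features, k=2)
```

Including x and y in the feature vector is what makes the segments spatially compact. Standardizing keeps the color and position scales from dominating each other.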

## Balanced-Tree Structure Bayesian Networks (TSBNs) for Image Segmentation

In this work, I made a generalization of the Gaussian mixture model (GMM) by putting a balanced-tree-structured prior over the class variables (mixture components), in the hope that the induced correlation among the hidden variables would suppress noise in the resulting segmentation. Unlike the supervised image classification in [1], this work focuses on fully unsupervised segmentation using TSBNs. It is interesting to see how the data “self-organize” according to the initial structure given by the TSBN.
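To illustrate how the tree prior couples the class variables, here is a minimal sketch of exact inference on a small balanced tree using two-pass (upward/downward) sum-product. This is generic belief propagation, not the BNT routine my code calls. It assumes one transition matrix shared by all parent-child edges. The leaf evidence stands in for the GMM class likelihoods p(feature | class). All names (`tree_marginals`, `sticky`, `evidence`) are hypothetical.

```python
def tree_marginals(children, root, prior, trans, lik):
    """Two-pass sum-product on a tree with one shared K x K transition matrix.

    children: dict node -> list of child nodes
    prior:    root prior over K classes
    trans:    trans[s][t] = P(child = t | parent = s)
    lik:      dict node -> evidence likelihood per class (missing = no evidence)
    Returns dict node -> posterior marginal over the K classes.
    """
    k = len(prior)

    def normalize(v):
        z = sum(v)
        return [val / z for val in v]

    # Upward pass: beta[n] summarizes evidence in n's subtree; up[c] is c's message to its parent.
    beta, up = {}, {}

    def upward(n):
        b = list(lik.get(n, [1.0] * k))
        for c in children.get(n, []):
            upward(c)
            up[c] = normalize([sum(trans[s][t] * beta[c][t] for t in range(k))
                               for s in range(k)])
            b = [b[s] * up[c][s] for s in range(k)]
        beta[n] = normalize(b)

    upward(root)

    # Downward pass: alpha[n] summarizes all evidence outside n's subtree.
    alpha = {root: list(prior)}

    def downward(n):
        for c in children.get(n, []):
            pre = [alpha[n][s] * lik.get(n, [1.0] * k)[s] for s in range(k)]
            for other in children[n]:
                if other != c:  # combine messages from c's siblings only
                    pre = [pre[s] * up[other][s] for s in range(k)]
            alpha[c] = normalize([sum(pre[s] * trans[s][t] for s in range(k))
                                  for t in range(k)])
            downward(c)

    downward(root)
    return {n: normalize([alpha[n][s] * beta[n][s] for s in range(k)]) for n in beta}


# Usage: a balanced binary tree with 4 leaves and 2 classes.
children = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
sticky = [[0.9, 0.1], [0.1, 0.9]]  # hypothetical: a child tends to copy its parent's class
evidence = {3: [0.9, 0.1], 4: [0.9, 0.1], 5: [0.9, 0.1], 6: [0.4, 0.6]}
post = tree_marginals(children, 0, [0.5, 0.5], sticky, evidence)
```

Leaf 6's own evidence mildly favors class 1. Its posterior nevertheless ends up at about 0.80 for class 0, because the tree pulls it toward its neighbors. This is the noise-suppression effect the balanced-tree prior is meant to provide.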

The MATLAB code is available here. It calls inference routines in the Bayes Net Toolbox (BNT), so you will need to install that toolbox before using my TSBN code.

[1] X. Feng, C. K. I. Williams, and S. N. Felderhof, “Combining Belief Networks and Neural Networks for Scene Segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 467-483, April 2002.

## Conditional Random Fields (CRFs) and Voxels on LIDAR

I found some interesting papers that apply the concept of over-segmented data (voxels) and CRFs to terrestrial LIDAR point clouds.

Volumetric Visualization of Multiple-return Lidar Data: Using Voxels by Jason Stoker [link]

Discriminative Learning of Markov Random Fields for Segmentation of 3D Scan Data by D. Anguelov, B. Taskar, V. Chatalbashev, D. Koller, D. Gupta, G. Heitz, and A. Ng. (CVPR2005)

Conditional Random Field for 3D point clouds with Adaptive Data Reduction by E. H. Lim and D. Suter (CW2007)

Multi-scale Conditional Random Fields for Over-segmented Irregular 3D Point Clouds Classification by E. H. Lim and D. Suter (CVPR2008)

3D terrestrial LIDAR classifications with super-voxels and multi-scale Conditional Random Fields by E. H. Lim and D. Suter (CVPR2008)

Shape-based Recognition of 3D Point Clouds in Urban Environments by Aleksey Golovinskiy et al. (ICCV2009)
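The papers above each use their own over-segmentation schemes (for example, super-voxels). The simplest version of the idea is a regular voxel grid, sketched below under that assumption. It bins points into cubes, computes one centroid per occupied voxel as a candidate node feature, and links 26-connected occupied voxels as candidate CRF edges. The function names and the voxel size are hypothetical.

```python
import math
from collections import defaultdict


def voxelize(points, size):
    """Group 3D points into cubic voxels with edge length `size`.

    Returns dict voxel index (i, j, k) -> list of point indices."""
    voxels = defaultdict(list)
    for idx, (x, y, z) in enumerate(points):
        key = (math.floor(x / size), math.floor(y / size), math.floor(z / size))
        voxels[key].append(idx)
    return dict(voxels)


def voxel_centroids(points, voxels):
    """Centroid of the points inside each occupied voxel."""
    return {key: tuple(sum(points[i][d] for i in members) / len(members)
                       for d in range(3))
            for key, members in voxels.items()}


def voxel_neighbors(voxels):
    """Pairs of 26-connected occupied voxels (candidate CRF edges)."""
    edges = set()
    for (i, j, k) in voxels:
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                for dk in (-1, 0, 1):
                    nb = (i + di, j + dj, k + dk)
                    if nb != (i, j, k) and nb in voxels:
                        edges.add(tuple(sorted([(i, j, k), nb])))
    return edges


points = [(0.1, 0.2, 0.3), (0.4, 0.1, 0.2), (1.2, 0.1, 0.1), (5.0, 5.0, 5.0)]
voxels = voxelize(points, size=1.0)
centroids = voxel_centroids(points, voxels)
edges = voxel_neighbors(voxels)
```

A CRF built on voxels instead of raw points has far fewer nodes, which is the main computational motivation for over-segmentation.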

## Variational Bayesian Gaussian Mixture Model (VBGMM)

The EM algorithm for Gaussian mixture models (EMGMM) has been very popular in statistics and machine learning. In EMGMM, the model parameters are estimated by maximum likelihood (ML); recently, however, there has been a need to put prior distributions on the model parameters. The GMM then becomes a hierarchical Bayesian model whose layers, from root to leaf, are the parameters, the latent mixture assignments, and the observations, respectively. Inference in this hierarchical model requires either challenging integration techniques or stochastic sampling techniques (e.g., MCMC), and the latter takes a lot of computational time to sample from the distribution.
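For concreteness, here is that hierarchy written in the conjugate form used in Bishop's textbook (Chapter 10). The symbols follow his notation rather than anything defined in this post: mixing weights $\boldsymbol{\pi}$, component means $\boldsymbol{\mu}_k$ and precisions $\boldsymbol{\Lambda}_k$, and one-hot assignments $\mathbf{Z}$. The first two lines form the root (parameter) layer, the third line is the latent assignment layer, and the last line is the observation layer.

$$
\begin{aligned}
p(\boldsymbol{\pi}) &= \mathrm{Dir}(\boldsymbol{\pi}\mid\alpha_0),\\
p(\boldsymbol{\mu},\boldsymbol{\Lambda}) &= \prod_{k=1}^{K}\mathcal{N}\big(\boldsymbol{\mu}_k\mid\mathbf{m}_0,(\beta_0\boldsymbol{\Lambda}_k)^{-1}\big)\,\mathcal{W}(\boldsymbol{\Lambda}_k\mid\mathbf{W}_0,\nu_0),\\
p(\mathbf{Z}\mid\boldsymbol{\pi}) &= \prod_{n=1}^{N}\prod_{k=1}^{K}\pi_k^{z_{nk}},\\
p(\mathbf{X}\mid\mathbf{Z},\boldsymbol{\mu},\boldsymbol{\Lambda}) &= \prod_{n=1}^{N}\prod_{k=1}^{K}\mathcal{N}\big(\mathbf{x}_n\mid\boldsymbol{\mu}_k,\boldsymbol{\Lambda}_k^{-1}\big)^{z_{nk}}.
\end{aligned}
$$

ML-based EM treats $\boldsymbol{\pi}$, $\boldsymbol{\mu}$, and $\boldsymbol{\Lambda}$ as fixed unknowns. In the Bayesian version, they are random variables that must be integrated out.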

Fortunately, there are approximation techniques, such as the mean-field variational approximation, that make inference fast and still give a good approximate solution. The variational approximation is very well explained in Chapter 10 of Bishop’s classic machine learning textbook [1], which includes a very good worked example on the variational Bayesian Gaussian mixture model. Bishop does a great job of explaining and deriving VBGMM; for a beginner, however, the algebra of the derivation can be challenging. The derivation contains many techniques that carry over to other variational approximations, and the text skips some details. So I decided to “fill in” the missing parts and turn them into a derivation tutorial, which is available here [pdf]. I have also written up the detailed derivations of some examples that precede the VBGMM section of [1], available here [pdf]. VBGMM first appeared in an excellent paper by Attias [2]. Again, for an introduction and for more detail on the interpretation of the model, please refer to the original paper or Bishop’s textbook.
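The mean-field idea is easiest to see in one of the examples that precede VBGMM in [1]: a univariate Gaussian with unknown mean and precision (Section 10.1.3). Here is a minimal sketch of that example, not VBGMM itself. The posterior is approximated as $q(\mu)q(\tau)$ under the conjugate prior $p(\mu\mid\tau)=\mathcal{N}(\mu_0,(\lambda_0\tau)^{-1})$, $p(\tau)=\mathrm{Gam}(a_0,b_0)$. The two factors are updated in turn until $\mathbb{E}[\tau]$ stops changing. The function name and default hyperparameters are illustrative assumptions.

```python
def vb_gaussian(data, mu0=0.0, lambda0=1.0, a0=1.0, b0=1.0, iters=50):
    """Mean-field VB for a univariate Gaussian with unknown mean and precision.

    Approximates p(mu, tau | data) by q(mu) q(tau), where
    q(mu) = N(mu_N, 1 / lambda_N) and q(tau) = Gamma(a_N, b_N).
    Returns (mu_N, lambda_N, a_N, b_N).
    """
    n = len(data)
    xbar = sum(data) / n
    # q(mu)'s mean and q(tau)'s shape do not depend on the other factor.
    mu_n = (lambda0 * mu0 + n * xbar) / (lambda0 + n)
    a_n = a0 + (n + 1) / 2
    e_tau = a0 / b0  # initial guess for E[tau]
    for _ in range(iters):
        # Update q(mu): its precision depends on the current E[tau].
        lambda_n = (lambda0 + n) * e_tau
        # Update q(tau) using E[(x - mu)^2] = (x - mu_N)^2 + 1/lambda_N,
        # and likewise for the prior term lambda0 * (mu - mu0)^2.
        e_sq = sum((x - mu_n) ** 2 for x in data) + n / lambda_n
        e_prior = lambda0 * ((mu_n - mu0) ** 2 + 1 / lambda_n)
        b_n = b0 + 0.5 * (e_sq + e_prior)
        e_tau = a_n / b_n
    return mu_n, (lambda0 + n) * e_tau, a_n, b_n


mu_n, lambda_n, a_n, b_n = vb_gaussian([1.0, 2.0, 3.0, 4.0, 5.0],
                                       mu0=0.0, lambda0=1e-3, a0=1e-3, b0=1e-3)
```

Each update uses only expectations under the other factor. VBGMM follows the same pattern with more factors.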

A MATLAB implementation of VBGMM is available from Prof. Kevin Murphy’s group at UBC [link] or [link]. The code requires the Netlab toolbox (a collection of very good MATLAB code for machine learning).

[1] “Pattern Recognition and Machine Learning” (2006) by Christopher Bishop [link]

[2] “A Variational Bayesian Framework for Graphical Models” (NIPS 1999) by Hagai Attias [link]

## EM vs. mean-field variational approximation

While deriving mean-field (MF) approximate inference for many problems, I observed that MF gives update equations very similar to those of EM. The similarity appears not only in the results but also in the derivation process. This afternoon my adviser was out of the office ^_^, so I had some free time to figure out what is going on between EM and MF. I decided to use a simple model, the Gaussian Mixture Model (GMM), for the explanation. Here is the write-up [link]
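As a quick taste of the similarity, here are the responsibility updates for a GMM in Bishop's notation (Chapters 9 and 10 of "Pattern Recognition and Machine Learning"). The notation may differ from my write-up. Here $\boldsymbol{\Lambda}_k=\boldsymbol{\Sigma}_k^{-1}$ and $D$ is the data dimension. EM plugs point estimates of $\pi_k$, $\boldsymbol{\mu}_k$, and $\boldsymbol{\Lambda}_k$ into the log-density. MF keeps exactly the same form but replaces each term with its expectation under the current variational factors.

$$
\begin{aligned}
\text{EM:}\quad \ln\rho_{nk} &= \ln\pi_k + \tfrac{1}{2}\ln\lvert\boldsymbol{\Lambda}_k\rvert - \tfrac{D}{2}\ln(2\pi) - \tfrac{1}{2}(\mathbf{x}_n-\boldsymbol{\mu}_k)^{\top}\boldsymbol{\Lambda}_k(\mathbf{x}_n-\boldsymbol{\mu}_k),\\
\text{MF:}\quad \ln\rho_{nk} &= \mathbb{E}[\ln\pi_k] + \tfrac{1}{2}\mathbb{E}\big[\ln\lvert\boldsymbol{\Lambda}_k\rvert\big] - \tfrac{D}{2}\ln(2\pi) - \tfrac{1}{2}\mathbb{E}_{\boldsymbol{\mu}_k,\boldsymbol{\Lambda}_k}\big[(\mathbf{x}_n-\boldsymbol{\mu}_k)^{\top}\boldsymbol{\Lambda}_k(\mathbf{x}_n-\boldsymbol{\mu}_k)\big],\\
r_{nk} &= \frac{\rho_{nk}}{\sum_{j=1}^{K}\rho_{nj}}.
\end{aligned}
$$

In both cases the normalized $r_{nk}$ plays the role of the E-step responsibility. The difference lies only in whether the parameters enter as point values or as expectations.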