### Archive

Posts Tagged ‘itl’

## Hand posture recognition using minimum divergence classifier

A reviewer suggested that my colleague and I apply our accepted work to a real-world application. “Bro, we’ve got less than 4 days to apply our work to a real-world problem…what should we do?” We spent 10 minutes discussing several possible problems, such as automatic video segmentation, CD cover search, human gesture recognition, and some other funny-crazy ideas. Finally, given our curiosity and the time constraint, we ended up with static hand posture recognition. Fortunately, the data set is not too difficult to find on the internet. A million thanks to Triesch and von der Malsburg for the wonderful hand posture database, which saved our lives.

Originally we found that the divergence between two Gaussian mixture models (GMMs) can be computed efficiently using the Cauchy-Schwarz divergence ($D_{CS}$), as it gives a closed-form expression for any pair of GMMs. Of course, we can’t get this awesome property with the Kullback-Leibler divergence ($D_{KL}$)…why? Read our paper [1] ^_^ Yay! In short, the $D_{KL}$ formulation puts the mixture sum inside a logarithm, which does not allow the Gaussian integral trick, hence a closed-form expression is not available.
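To make the "Gaussian integral trick" a bit more concrete, here is a minimal sketch in Python (not our MATLAB code). It relies on the standard identity $\int \mathcal{N}(x|\mu_a,\Sigma_a)\,\mathcal{N}(x|\mu_b,\Sigma_b)\,dx = \mathcal{N}(\mu_a|\mu_b,\Sigma_a+\Sigma_b)$ and on the definition $D_{CS}(p,q) = -\log\frac{\int pq}{\sqrt{\int p^2 \int q^2}}$. To stay within the standard library it assumes diagonal covariances, and the function names and the `(weight, mean, variances)` representation are made up for illustration.

```python
import math

def gauss_overlap(mu_a, var_a, mu_b, var_b):
    """Integral of the product of two diagonal-covariance Gaussians.

    Equals N(mu_a | mu_b, var_a + var_b), which factorizes over dimensions
    when both covariances are diagonal (an assumption of this sketch).
    """
    value = 1.0
    for ma, va, mb, vb in zip(mu_a, var_a, mu_b, var_b):
        v = va + vb
        value *= math.exp(-0.5 * (ma - mb) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
    return value

def cross_term(p, q):
    """Integral of p(x) q(x) dx for mixtures given as (weight, mean, variances) triples."""
    return sum(wp * wq * gauss_overlap(mp, vp, mq, vq)
               for wp, mp, vp in p
               for wq, mq, vq in q)

def cs_divergence(p, q):
    """Cauchy-Schwarz divergence: -log( int pq / sqrt(int p^2 * int q^2) )."""
    return (-math.log(cross_term(p, q))
            + 0.5 * math.log(cross_term(p, p))
            + 0.5 * math.log(cross_term(q, q)))

# Toy 2-D mixtures (made-up numbers).
p = [(0.5, (0.0, 0.0), (1.0, 1.0)), (0.5, (3.0, 3.0), (1.0, 2.0))]
q = [(1.0, (1.0, 1.0), (2.0, 2.0))]

print(cs_divergence(p, p))  # zero up to rounding, since the two mixtures are identical
print(cs_divergence(p, q))  # positive, since the mixtures differ
```

Every term is a finite double sum over component pairs, so no sampling or numerical integration is needed. For widely separated mixtures the cross term can underflow, so a more careful version would work in the log domain.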

In this work, we use a minimum divergence classifier to recognize the hand postures. Please see our paper for more details. We finished our experiment on the second day, so we had some time left to make a fancy plot summarizing our work, which we would like to share with you below. The classification accuracies using $D_{CS}$ and $D_{KL}$ are 95% and 92% respectively, and the former also runs about 10 times faster. The figures below also suggest that our proposed method outperforms $D_{KL}$ when it comes to clustering, as it gives more discriminative power.
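The idea behind a minimum divergence classifier fits in a few lines: fit one density model per class, fit one to the query, and pick the class whose model is closest in divergence. The sketch below is a toy illustration only. It uses single 1-D Gaussians `(mean, variance)` instead of the GMMs from the paper, and the class labels, numbers and function names are all made up.

```python
import math

def cs_divergence_1d(a, b):
    """Cauchy-Schwarz divergence between two 1-D Gaussians given as (mean, variance)."""
    def overlap(m1, v1, m2, v2):
        # Integral of the product of two 1-D Gaussians: N(m1 | m2, v1 + v2).
        v = v1 + v2
        return math.exp(-0.5 * (m1 - m2) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
    return (-math.log(overlap(*a, *b))
            + 0.5 * math.log(overlap(*a, *a))
            + 0.5 * math.log(overlap(*b, *b)))

def classify(query_model, class_models, divergence=cs_divergence_1d):
    """Return the label whose class model has the smallest divergence to the query model."""
    return min(class_models,
               key=lambda label: divergence(query_model, class_models[label]))

# Hypothetical class models and query, for illustration only.
class_models = {"fist": (0.0, 1.0), "open_palm": (5.0, 1.5)}
label = classify((4.2, 1.2), class_models)
print(label)  # open_palm
```

Because `divergence` is just a parameter, the same `classify` function could be run with a KL-based estimate instead, which is how the two measures can be compared on equal footing.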

Similarity matrix calculated by Cauchy-Schwarz divergence
Similarity matrix calculated by Kullback-Leibler divergence

[1] K. Kampa, E. Hasanbelliu and J. C. Principe, “Closed-form Cauchy-Schwarz pdf Divergence for Mixture of Gaussians,” Proc. of the International Joint Conference on Neural Networks (IJCNN 2011). [pdf] [BibTex]

We make our code available to anyone under a Creative Commons license [.zip]

We also collected some interesting links to hand posture/gesture databases here:

The following papers and documents can be helpful:

A Bimodal Face and Body Gesture Database for Automatic Analysis of Human Nonverbal Affective Behavior
Hatice Gunes and Massimo Piccardi, Computer Vision Research Group,
University of Technology, Sydney (UTS)

A Color Hand Gesture Database for Evaluating and Improving Algorithms on Hand Gesture and Posture Recognition

Hand Detection and Gesture Recognition using ASL Gestures
Supervisor: Andre L. C. Barczak
Student: Dakuan CUI
Massey University


## Closed-Form Expression for Divergence of Gaussian Mixture Models

This report presents an efficient approach to calculating the difference between two probability density functions (pdfs), each of which is a mixture of Gaussians (MoG). Unlike the Kullback-Leibler divergence ($D_{KL}$), the Cauchy-Schwarz (CS) pdf divergence measure ($D_{CS}$) proposed here gives an analytic closed-form expression for MoGs. This property of $D_{CS}$ makes fast and efficient calculation possible, which is highly desirable in real-world applications where the dimensionality of the data/features is very high. We show that $D_{CS}$ follows similar trends to $D_{KL}$, but can be computed much faster, especially when the dimensionality is high. Moreover, the proposed method is shown to significantly outperform $D_{KL}$ in classifying real-world 2D and 3D objects based on distances alone. Full paper [pdf]. MATLAB code is available here.

## Gaussian Mean Shift Clustering

I have heard a lot about Gaussian mean shift clustering, as it is very popular, easy to implement, and one of the best methods for object tracking in video. Recently I have been working on structure learning, and one of my recent works claims (in fact, boasts, wahahahaha…just kidding) that structure learning can be a more general case of classic clustering algorithms like k-means, LBG vector quantization, mean shift and the EM algorithm. So I have to find some algorithms to compare with mine. The algorithm I compare against has to determine the number of clusters automatically once the covariance matrix or kernel width is known, so I thought of LBG-VQ and Gaussian mean shift (GMS). When deriving GMS, I found that it is very similar to the mean update equation of the EM algorithm, except that in EM we estimate $\mu$, so we take the derivative w.r.t. $\mu$, whereas in GMS we take the derivative w.r.t. each sample point in the space. However, the Gaussian contains the term $(x_i-\mu)$, whose derivatives $\frac{\partial}{\partial\mu}$ and $\frac{\partial}{\partial x_{i}}$ have opposite signs; the sign drops out when the derivative is set to zero, so we get a similar update equation.

The update equation is

$x^{(t+1)}=\frac{\sum_{i=1}^{N}\mathcal{N}(x^{(t)}|x_{i},\Lambda^{-1})x_{i}}{\sum_{i'=1}^{N}\mathcal{N}(x^{(t)}|x_{i'},\Lambda^{-1})}$

which is very similar to the mean update in the EM algorithm for a mixture of Gaussians.
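The update above can be sketched in a few lines of plain Python (this is not the MATLAB code linked below). The sketch assumes an isotropic kernel, i.e. $\Lambda^{-1}=\sigma^{2}I$, and the function names, tolerance and toy data are made up for illustration.

```python
import math

def gms_step(x, data, sigma):
    """One mean shift update: a Gaussian-weighted average of all samples."""
    inv_var = 1.0 / sigma ** 2
    # The Gaussian normalizing constants cancel between numerator and denominator,
    # so unnormalized kernel values are enough.
    weights = [math.exp(-0.5 * inv_var * sum((a - b) ** 2 for a, b in zip(x, xi)))
               for xi in data]
    total = sum(weights)
    return tuple(sum(w * xi[k] for w, xi in zip(weights, data)) / total
                 for k in range(len(x)))

def gms_mode(x, data, sigma, tol=1e-8, max_iter=500):
    """Iterate the update starting from x until the step is shorter than tol."""
    for _ in range(max_iter):
        x_new = gms_step(x, data, sigma)
        step = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_new)))
        x = x_new
        if step < tol:
            break
    return x

# Two well-separated toy blobs in 2-D (made-up numbers).
data = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0),
        (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
modes = [gms_mode(point, data, sigma=1.0) for point in data]
for point, mode in zip(data, modes):
    print(point, "->", tuple(round(c, 3) for c in mode))
```

Points that end up at (numerically) the same mode belong to the same cluster, so the number of clusters falls out of the kernel width `sigma` rather than being specified in advance.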

Here is a short report [pdf] on GMS which shows the preliminary results and how to derive GMS. MATLAB code for GMS is available here.

## Dimensionality reduction using graphical models

This idea popped up when I was writing my project proposal “Automatic Algorithm for Finding Dynamic Trees Bayesian Networks structure using ITL.” I think this idea is pretty trivial, but it is good to know and may inspire some new ideas! 🙂

Using graphical models can reduce the dimensionality, because connecting nodes with some relationships means you have already guessed or assumed the structure along the dimensions encoded by the nodes. For example, if we would like to segment an RGB image, we can compare two approaches:

1. 5D-approach: Here we extract the important features from the image, namely R, G, B, x-coordinate and y-coordinate, 5 features in total. That means we have to cluster points (vectors) in a 5-D feature space! One important drawback of a high-dimensional space is that the points may be very sparse in the feature space, which is sometimes not adequate for a good result.
2. 3D-approach: Here the nodes encode the positions X and Y of the pixels in the image. Consequently, each node carries only a 3-D distribution (not 5-D). However, the price for having 2 fewer dimensions is that we have to assume the relationships among the nodes, which implies a relationship in the X and Y dimensions (in disguise). Therefore, if the assumption is good, the result is OK, but if the assumption is bad, the result will be bad too. Note, however, that in graphical models the complexity is determined by the size of the biggest table, so the dimensionality can still be huge if the causality is very complicated. But that is a different story, since graphical models are probabilistic methods which, somehow, cannot avoid these kinds of huge joint pdfs.
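As a rough illustration of the difference in data layout between the two approaches (not of the segmentation itself), here is a small Python sketch. The nested-list image format and the function names are assumptions made for this example, and the 3-D version only shows where the position goes; the relationships among nodes still have to be specified separately.

```python
def five_d_features(image):
    """5D-approach: one (r, g, b, x, y) vector per pixel.

    `image` is assumed to be a nested list with image[y][x] == (r, g, b).
    """
    return [(r, g, b, x, y)
            for y, row in enumerate(image)
            for x, (r, g, b) in enumerate(row)]

def three_d_nodes(image):
    """3D-approach: position becomes the node identity, leaving a 3-D color vector per node."""
    return {(x, y): (r, g, b)
            for y, row in enumerate(image)
            for x, (r, g, b) in enumerate(row)}

# A made-up 2x2 image.
image = [[(255, 0, 0), (250, 5, 0)],
         [(0, 0, 255), (0, 10, 250)]]

features = five_d_features(image)
nodes = three_d_nodes(image)
print(features[1])    # (250, 5, 0, 1, 0)
print(nodes[(1, 0)])  # (250, 5, 0)
```

In the first layout the clustering algorithm sees x and y as two more coordinates; in the second, x and y only decide which node a color vector belongs to, and the spatial structure has to come from the assumed edges between nodes.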