A reviewer suggested that my colleague and I apply our accepted work to a real-world application. “Bro, we’ve got less than 4 days to apply our work to a real-world problem…what should we do?” We spent 10 minutes discussing several possible problems such as automatic video segmentation, CD cover search, human gesture recognition and some other funny, crazy ideas. Finally, driven by our curiosity and the time constraint, we ended up with static hand posture recognition. Fortunately, the data set was not too difficult to find on the internet. A million thanks to Triesch and Von Der Malsburg for the wonderful hand posture database, which saved our lives.
Originally we found that a divergence measure between 2 Gaussian mixture models (GMMs) can be calculated efficiently using the Cauchy-Schwarz divergence ($D_{CS}$), as it gives a closed-form expression for any pair of GMMs. Of course, we can’t get this awesome property with the Kullback-Leibler divergence ($D_{KL}$)…why? Read our paper ^_^ Yay! In short, the $D_{KL}$ formulation does not allow the Gaussian integral trick, hence a closed-form expression is not possible.
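To make the Gaussian integral trick concrete, here is a minimal sketch for 1-D mixtures only (the paper handles the general multivariate case, so treat this as an illustration, not our released code). It relies on the identity that the integral of a product of two Gaussians is itself a Gaussian evaluated at one mean, $\int N(x; m_1, v_1) N(x; m_2, v_2) dx = N(m_1; m_2, v_1 + v_2)$, and on the definition $D_{CS}(p, q) = -\log\left(\int pq / \sqrt{\int p^2 \int q^2}\right)$. The function names and the `(weight, mean, variance)` representation are my own choices for this example.

```python
import math

def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gmm_overlap(p, q):
    """Closed-form integral of p(x) * q(x) over the real line.

    Each mixture is a list of (weight, mean, variance) triples.
    Uses: integral of N(x; m1, v1) * N(x; m2, v2) dx = N(m1; m2, v1 + v2).
    """
    return sum(wp * wq * gauss_pdf(mp, mq, vp + vq)
               for wp, mp, vp in p
               for wq, mq, vq in q)

def cs_divergence(p, q):
    """Cauchy-Schwarz divergence: -log( int pq / sqrt(int p^2 * int q^2) )."""
    return -math.log(gmm_overlap(p, q)
                     / math.sqrt(gmm_overlap(p, p) * gmm_overlap(q, q)))

# Two made-up mixtures, purely for illustration.
p = [(0.5, -1.0, 0.5), (0.5, 2.0, 1.0)]
q = [(0.3, 0.0, 1.0), (0.7, 3.0, 0.8)]
print(cs_divergence(p, p))  # zero up to floating-point rounding
print(cs_divergence(p, q))  # positive, since p and q differ
```

No sampling or numerical integration is needed: the cost is one Gaussian evaluation per pair of components.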
In this work, we use a minimum divergence classifier to recognize the hand postures. Please see our paper for more details. We finished our experiment on the second day, so we had some time left to make a fancy plot summarizing our work, which we would like to share with you below. The classification accuracies using $D_{CS}$ and $D_{KL}$ are 95% and 92% respectively, and the former method also gives a much better computational run-time, about 10 times faster. The figures below also suggest that our proposed method outperforms $D_{KL}$ when it comes to clustering, as the proposed method gives more discriminative power.
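A minimum divergence classifier can be sketched in a few lines. This is a toy version under my own assumptions, not the experiment code: each class is summarized by one 1-D mixture, the query is a mixture fitted to the test sample, and the label is the class whose model has the smallest divergence to the query. The class names and all numbers are made up, and `classify` is a hypothetical helper name.

```python
import math

def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def overlap(p, q):
    """Closed-form integral of p(x) * q(x) for mixtures of (weight, mean, variance)."""
    return sum(wp * wq * gauss_pdf(mp, mq, vp + vq)
               for wp, mp, vp in p for wq, mq, vq in q)

def cs_divergence(p, q):
    """Cauchy-Schwarz divergence between two 1-D Gaussian mixtures."""
    return -math.log(overlap(p, q) / math.sqrt(overlap(p, p) * overlap(q, q)))

def classify(query, class_models, divergence=cs_divergence):
    """Return the label whose model has the smallest divergence to the query."""
    return min(class_models, key=lambda label: divergence(query, class_models[label]))

# Hypothetical per-class mixtures (made-up numbers, not from the experiment).
class_models = {
    "posture_a": [(0.6, -2.0, 0.5), (0.4, 0.0, 1.0)],
    "posture_b": [(0.5, 3.0, 0.7), (0.5, 5.0, 0.9)],
}
query = [(0.5, -1.8, 0.6), (0.5, 0.3, 1.1)]
print(classify(query, class_models))  # prints "posture_a"
```

Passing the divergence as a parameter makes it easy to swap in another measure for comparison.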
 K. Kampa, E. Hasanbelliu and J. C. Principe, “Closed-form Cauchy-Schwarz pdf Divergence for Mixture of Gaussians,” Proc. of the International Joint Conference on Neural Networks (IJCNN 2011). [pdf] [BibTex]
We make our code available to anyone under a Creative Commons license [.zip]
We also collected some interesting links to hand posture/gesture databases here:
The following papers and documents can be helpful:
A Bimodal Face and Body Gesture Database for Automatic Analysis of Human Nonverbal Affective Behavior
Hatice Gunes and Massimo Piccardi Computer Vision Research Group,
University of Technology, Sydney (UTS)
A Color Hand Gesture Database for Evaluating and Improving Algorithms on Hand Gesture and Posture Recognition
Farhad Dadgostar, Andre L. C. Barczak, Abdolhossein Sarrafzadeh
Hand Detection and Gesture Recognition using ASL Gestures
Supervisor: Andre L. C. Barczak
Student: Dakuan CUI
This report presents an efficient approach to calculate the difference between two probability density functions (pdfs), each of which is a mixture of Gaussians (MoG). Unlike Kullback-Leibler divergence ($D_{KL}$), the authors propose that the Cauchy-Schwarz (CS) pdf divergence measure ($D_{CS}$) can give an analytic closed-form expression for MoG. This property of the $D_{CS}$ makes fast and efficient calculations possible, which is desired tremendously in real-world applications where the dimensionality of the data/features is very high. We show that $D_{CS}$ follows similar trends as $D_{KL}$, but can be computed much faster, especially when the dimensionality is high. Moreover, the proposed method is shown to significantly outperform $D_{KL}$ in classifying real-world 2D and 3D objects based on distances alone. Full paper [pdf]. MATLAB code is available here.
I have heard a lot about Gaussian mean shift clustering as it is very popular, easy to implement, and one of the best methods for object tracking in video. Recently I have been working on structure learning, and one of my recent works is to say (in fact, boast, wahahahaha…just kidding) that structure learning can be a more general case of those classic clustering algorithms like k-means, LBG vector quantization, mean shift and the EM algorithm. So I have to find some algorithms to compare with mine. The algorithm that I want to compare with has to come up with the number of clusters automatically once the covariance matrix or kernel width is known, so I thought about LBG-VQ and Gaussian mean shift (GMS). When deriving GMS, I found that it’s very similar to the mean update equation of the EM algorithm, except that in EM we try to estimate the mean $\mu$, so we take the derivative w.r.t. $\mu$. Unlike EM, in GMS we take the derivative w.r.t. each sample point $x$ in the space. However, in the Gaussian distribution we have the term $(x - \mu)$, whose derivatives w.r.t. $x$ and $\mu$ have opposite signs; therefore, when equating the derivative to zero, we get a similar update equation.
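For an isotropic Gaussian kernel of width $\sigma$ and samples $x_1, \dots, x_N$, we can write the update equation as

$$x^{(t+1)} = \frac{\sum_{i=1}^{N} x_i \exp\left(-\|x^{(t)} - x_i\|^2 / 2\sigma^2\right)}{\sum_{i=1}^{N} \exp\left(-\|x^{(t)} - x_i\|^2 / 2\sigma^2\right)},$$

which is very similar to the mean update in the EM algorithm for a mixture of Gaussians.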
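Here is a minimal sketch of Gaussian mean shift as a clustering method, assuming an isotropic kernel of width `sigma`. Each update replaces a point by the kernel-weighted average of all samples, and every sample is labeled by the mode it converges to, so the number of clusters is never given in advance. The helper names, the tolerance and the merge radius are arbitrary choices of mine for this example.

```python
import math

def mean_shift_step(x, samples, sigma):
    """One Gaussian mean shift update: kernel-weighted average of the samples."""
    weights = [math.exp(-sum((a - b) ** 2 for a, b in zip(x, s)) / (2.0 * sigma ** 2))
               for s in samples]
    total = sum(weights)
    return tuple(sum(w * s[d] for w, s in zip(weights, samples)) / total
                 for d in range(len(x)))

def find_mode(x, samples, sigma, tol=1e-9, max_iter=1000):
    """Iterate the update until the point stops moving."""
    for _ in range(max_iter):
        new_x = mean_shift_step(x, samples, sigma)
        if math.dist(new_x, x) < tol:
            return new_x
        x = new_x
    return x

def cluster(samples, sigma, merge_radius=1e-3):
    """Label each sample by the mode it converges to; modes closer than merge_radius are merged."""
    modes, labels = [], []
    for s in samples:
        m = find_mode(s, samples, sigma)
        for k, known in enumerate(modes):
            if math.dist(m, known) < merge_radius:
                labels.append(k)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels

# Two small, well-separated groups of made-up 2-D points.
samples = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
modes, labels = cluster(samples, sigma=0.5)
print(len(modes), labels)  # 2 [0, 0, 0, 1, 1, 1]
```

The kernel width is the only knob: with a much larger width on the same data (for example `sigma=10.0`), the two groups merge into a single mode.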
— This idea popped up when I was writing my project proposal “Automatic Algorithm for Finding Dynamic Trees Bayesian Networks structure using ITL.” I think this idea is pretty trivial but good to know in order to inspire some new ideas! 🙂
Using graphical models can reduce the dimensionality, because connecting nodes with some relationships means you have already guessed or assumed the structure for the dimensions encoded in the nodes. For example, if we would like to do image segmentation on an RGB image, we can compare two approaches:
- 5D-approach: Here we extract the important features from the image, which are R, G, B, x-coordinate and y-coordinate, 5 features in total. That means we will have to cluster points (vectors) in a 5-D feature space! One important drawback of using a high-dimensional space is that the points can become very sparse in the feature space, which sometimes is not adequate to provide a good result.
- 3D-approach: Here we use the nodes to encode the positions X and Y of the pixels in the image. Consequently each node takes only a 3-D distribution (not 5-D). However, the price we pay for having 2 fewer dimensions is that we have to assume the relationships among the nodes, which implies the relationship in the X and Y dimensions (in disguise). Therefore, if the assumption is good, the result is OK, but if the assumption is bad, then the result will be bad too. However, in graphical models the complexity is determined by the size of the biggest table, so the dimensionality can be huge if the causality is very complicated. But that is a different story, since graphical models are probabilistic methods which, somehow, cannot avoid having these kinds of huge joint pdfs.
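The difference between the two representations can be sketched as follows. This is only an illustration of the data layout, assuming the image is a plain list of rows of `(r, g, b)` tuples; both function names are hypothetical, and the node structure itself (the links between nodes) is not modeled here.

```python
def pixels_to_5d(image):
    """5D-approach: flatten an RGB image into (r, g, b, x, y) feature vectors."""
    return [(r, g, b, x, y)
            for y, row in enumerate(image)
            for x, (r, g, b) in enumerate(row)]

def pixels_to_3d_nodes(image):
    """3D-approach: position lives in the node index, each node observes only (r, g, b)."""
    return {(x, y): (r, g, b)
            for y, row in enumerate(image)
            for x, (r, g, b) in enumerate(row)}

# A made-up 2x2 image.
image = [[(255, 0, 0), (250, 5, 0)],
         [(0, 0, 255), (0, 10, 250)]]
print(pixels_to_5d(image)[1])             # (250, 5, 0, 1, 0)
print(pixels_to_3d_nodes(image)[(1, 0)])  # (250, 5, 0)
```

In the first case a clustering algorithm sees position as two extra coordinates; in the second, position only enters through whatever relationships we assume between the nodes.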
Today I had a chance to talk with Sudir and Shalom @ CNEL after our ITL class.
The discussion was about finding a good structure for DTs using ITL unsupervised learning. Shalom and I discussed Sinisa’s work, using wavelets as features in DTs, and how to make images at different scales that still have some correlation between the same pixels across scales.
- How to incorporate ITL unsupervised clustering into the Dynamic Trees Bayesian Networks. First I read the paper from Jenssen; the paper is very easy to understand and gives a very clear idea of how we can use ITL to build a DAG in an unsupervised way. However, the work is pretty heuristic and computationally expensive, since we have to tune the sigma and find the direction of the force for each point in the space, which takes a very long time. Sudir’s work is more convenient since it gives closed-form update rules for each point; therefore, I will use his segmentation method to build the structure. For this work, the only parameter we will have to play with is sigma: a smaller sigma gives more clusters, whereas a bigger sigma gives fewer clusters. Consequently we will use a small sigma at the leaf nodes and a big sigma at the root node.
- We may use the wavelet coefficients as features for finding the DT structure. There are 2 classic papers to read: 1) “Wavelet-based statistical signal processing using hidden Markov models” (1998) by Crouse, M.S., Nowak, R.D. and Baraniuk, R.G. and 2) “Approximating discrete probability distributions with dependence trees” (1968) by Chow, C. and Liu, C.
- We can make observations at each scale by using wavelets to decompose the image into several scales; then we might be able to reconnect the links.
- We can also make observations at each scale by applying Gaussian blur and resampling (Sinisa’s work) to the image, then finding some relationship between the upper-scale pixels and the lower-scale pixels. However, Gaussian blur may not be a powerful method because it does not show much about the relationship across scales, whereas wavelets and resampling might do a good job on that.
- If we really want to make a multi-scale image by blurring, we will have to design a blur function that boosts or preserves the relationship between the across-scale pixels. How can we design such a blur function? Of course it might be an adaptive one!!!
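As a starting point for the blur-and-resample idea, here is a rough sketch of a multi-scale pyramid on a grayscale image stored as a list of rows. It is not Sinisa’s method, only a plain fixed Gaussian blur followed by 2x downsampling, with a helper that lists which fine-scale pixels sit under a coarse-scale pixel. That fixed quadtree link is exactly the assumption an adaptive scheme would replace; all names and default parameters are mine.

```python
import math

def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    k = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]

def blur_rows(image, kernel):
    """Convolve each row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [[sum(kernel[j + radius] * row[min(max(x + j, 0), width - 1)]
                 for j in range(-radius, radius + 1))
             for x in range(width)]
            for row in image]

def transpose(image):
    """Swap rows and columns."""
    return [list(col) for col in zip(*image)]

def gaussian_blur(image, sigma=1.0, radius=2):
    """Separable blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma, radius)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))

def downsample(image):
    """Keep every second pixel in each direction."""
    return [row[::2] for row in image[::2]]

def pyramid(image, levels):
    """Blur-and-resample pyramid; scales[0] is the finest level."""
    scales = [image]
    for _ in range(levels - 1):
        scales.append(downsample(gaussian_blur(scales[-1])))
    return scales

def children(x, y, fine_width, fine_height):
    """Fine-scale pixels covered by coarse pixel (x, y) under 2x downsampling."""
    return [(cx, cy)
            for cy in (2 * y, 2 * y + 1) for cx in (2 * x, 2 * x + 1)
            if cx < fine_width and cy < fine_height]

# A made-up 8x8 test pattern.
image = [[float((x + y) % 4) for x in range(8)] for y in range(8)]
scales = pyramid(image, levels=3)
print([(len(s), len(s[0])) for s in scales])  # [(8, 8), (4, 4), (2, 2)]
print(children(1, 1, 8, 8))  # [(2, 2), (3, 2), (2, 3), (3, 3)]
```

Comparing a coarse pixel with its `children` at the finer scale is one simple way to measure how much cross-scale relationship a given blur keeps.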