Archive

Posts Tagged ‘machine learning’

BigML: a machine learning “sandbox”

Today I found an interesting website, BigML, which seems to offer a playground for people, especially ML researchers, to experiment with standard machine learning techniques on their own data sets or even on their businesses.

http://blog.bigml.com/2012/07/04/introducing-bigmls-free-machine-learning-sandbox/

The main website is here:

https://bigml.com/

You can try BigML for free in development mode, though I think the 1 MB limit on the training data set is pretty restrictive.

A good Introduction on MapReduce

MapReduce is a framework for efficiently processing a task that can be parallelized across a cluster or grid. A good introduction can be found at the link below.

http://en.wikipedia.org/wiki/MapReduce#Example

In a sense, the MapReduce framework is very similar to the message-passing algorithm in graphical models, where Map and Reduce are comparable to building the (tree) structure and marginalizing the messages, respectively. So I think MapReduce could make inference feasible for large-scale graphical models.
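To make the roles of Map and Reduce concrete, here is a minimal single-machine sketch of the classic word-count example in Python (no cluster framework involved; the function names are only for illustration):

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.split()]

def reduce_phase(pairs):
    """Reduce: sum the counts for each key (word)."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

documents = ["the cat sat", "the dog sat"]
intermediate = [pair for doc in documents for pair in map_phase(doc)]
print(reduce_phase(intermediate))   # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```

In a real MapReduce system, the map calls run in parallel on different machines and the framework groups the intermediate pairs by key before the reduce step.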

Awesome seminars at UW

April 3, 2012 1 comment

There are some fascinating seminars sponsored by UW, and most of them are recorded:

CSE Colloquia:
Every Tuesday 3:30 pm
https://www.cs.washington.edu/htbin-post/mvis/mvis/Colloquia#current

Yahoo! Machine Learning Seminar
Every Tuesday from 12 – 1 pm
http://ml.cs.washington.edu/seminars

UWTV: Research/Technology/Discovery Channel
Broadcasts all the new findings, research and technology for free!
http://www.uwtv.org/

 

 

 

Cluster Evaluation using Adjusted Rand Index (ARI)

August 17, 2011 Leave a comment

Here are the 2 partitions mentioned in Example 1 of the tutorial paper "Details of the Adjusted Rand index and Clustering algorithms, Supplement to the paper 'An empirical study on Principal Component Analysis for clustering gene expression data' (to appear in Bioinformatics)" [pdf].

Partitions U (ground truth) and V (predicted)

I think what they did in the example is exactly the following:

a = |{(4,5), (7,8), (7,9), (7,10), (8,9), (8,10), (9,10)}| = (2 choose 2) + (4 choose 2) = 7

b = |{(1,2), (3,4), (3,5), (6,4), (6,5), (3,6)}| = 6

c = |{(1,3), (2,4), (2,5), (6,7), ..., (6,10)}| = 7

d = |{(1,4), ..., (1,10), (2,3), (2,6), ..., (2,10), (3,7), ..., (3,10), (4,7), ..., (4,10), (5,7), ..., (5,10)}| = 25

where (i,j) denotes the pair (or edge) between node i and node j. They then use these values of a, b, c and d to evaluate the Rand index and the adjusted Rand index.
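For reference, here is a minimal Python sketch of this pair-counting evaluation. It assumes the usual convention that a counts pairs placed together in both U and V, b and c count pairs placed together in only one partition, and d counts pairs separated in both; the labelings at the bottom are made up for illustration, not the ones from the paper.

```python
from itertools import combinations

def pair_counts(u, v):
    """Classify every pair of samples according to two labelings u and v."""
    a = b = c = d = 0
    for i, j in combinations(range(len(u)), 2):
        same_u, same_v = u[i] == u[j], v[i] == v[j]
        if same_u and same_v:
            a += 1          # together in both partitions
        elif same_u:
            b += 1          # together in U only
        elif same_v:
            c += 1          # together in V only
        else:
            d += 1          # separated in both
    return a, b, c, d

def rand_index(u, v):
    a, b, c, d = pair_counts(u, v)
    return (a + d) / (a + b + c + d)

def adjusted_rand_index(u, v):
    """ARI = (a - E[a]) / (max_a - E[a]), correcting the Rand index for chance."""
    a, b, c, d = pair_counts(u, v)
    n_pairs = a + b + c + d
    expected_a = (a + b) * (a + c) / n_pairs
    max_a = ((a + b) + (a + c)) / 2.0
    return (a - expected_a) / (max_a - expected_a)

u = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]   # hypothetical ground-truth labels
v = [0, 0, 1, 1, 1, 1, 2, 2, 2, 2]   # hypothetical predicted labels
print(rand_index(u, v), adjusted_rand_index(u, v))
```

The same numbers can be cross-checked against scikit-learn's adjusted_rand_score.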

Effects of adding loading factors to a covariance matrix

July 29, 2011 Leave a comment

From my previous post, we know that the update equation for the covariance matrix might not be numerically stable because the matrix may not be positive definite. An easy way to stabilize the algorithm is to add a relatively small positive number, a.k.a. a loading factor, to the diagonal entries of the covariance matrix. But does the loading factor affect the likelihood or the convergence of the EM algorithm?

Apparently, adding the loading factor to the covariance matrix does impact the log-likelihood value. I ran some experiments on this issue, and the results are shown in the learning curve (log-likelihood curve) of ITSBN with the EM algorithm below. The factor is applied to the matrix only when the determinant of the covariance matrix is smaller than 10^{-6}. Five different factors are used in this experiment: 10^{-8}, 10^{-6}, 10^{-4}, 10^{-3} and 10^{-2}. The results show that the learning curves are still monotonically increasing* and level off near the end. Furthermore, we found that the level-off values are highly associated with the value of the factor: the bigger the factor, the smaller the level-off value. This suggests that we should pick the smallest factor possible in order to stay as close to the ideal learning curve as possible. Note that the loading factor is not added to the covariance matrix until the second iteration.

log-likelihood curve with different loading factors

* Though I don't think this is always the case, because the factor is not added to the matrix consistently, and hence when it is added it might pull the log-likelihood down to a lower value. However, it is empirically shown that the log-likelihood is still monotonically increasing even when the factor is big.
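For concreteness, here is a minimal Python sketch of the rule used in this experiment; the function name is just illustrative, and the 10^{-6} determinant check and the loading-factor value follow the description above.

```python
import numpy as np

def regularize_covariance(cov, loading_factor=1e-6, det_threshold=1e-6):
    """Add a small loading factor to the diagonal of a covariance matrix,
    but only when its determinant indicates near-singularity."""
    if np.linalg.det(cov) < det_threshold:
        cov = cov + loading_factor * np.eye(cov.shape[0])
    return cov
```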

What makes a covariance matrix NOT positive definite in the EM algorithm?

July 29, 2011 Leave a comment

There are many plausible reasons. One common one is that at least one Gaussian component does not have its cluster members in close affinity. This situation occurs when the data clusters are very narrow relative to the distance between clusters; in other words, when the intra-cluster distance is much smaller than the inter-cluster distance. Let's assume we have 3 data clusters A, B and C, with A and B almost merged into each other and both very far away from C, and we want to cluster the data into 3 components using the EM algorithm. Suppose the initial locations of the 3 centroids are in the middle of the space among the three clusters, and it happens that one centroid has no "nearest" members. This also means that 2 components would be quite sufficient to model the whole data set rather than 3. Let's say the deserted centroid is the one labeled '2'. In this case, the posterior marginal distribution of each data sample will have a large value for label 1 or label 3, but no sample gives a large value for label 2; in fact, to be more precise, the posterior marginal for label 2 will be virtually zero for all data samples. Unfortunately, the update equation for a covariance matrix weights each atom (i.e., (x_i-\mu_2)(x_i-\mu_2)^{\top}) of the updated covariance matrix by its corresponding class posterior marginal p(z_i = c_2 | evidence), and hence gives a zero matrix for the covariance matrix of class label 2. So, as you can see, it is not always easy to use EM to cluster data whose clusters are really far separated.
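A minimal Python sketch of the covariance update makes this failure mode visible. The function below is my own illustration (not code from the post), but it follows the standard EM M-step for one Gaussian component:

```python
import numpy as np

def m_step_covariance(X, resp_c, mu_c):
    """Standard EM M-step covariance update for one component c.
    X: (N, D) data, resp_c: (N,) responsibilities p(z_i = c | evidence), mu_c: (D,) mean.
    If resp_c is essentially zero for every sample (a 'deserted' component),
    the weighted sum of atoms collapses to a near-zero, non-positive-definite matrix."""
    diff = X - mu_c                               # each row is x_i - mu_c
    cov = (resp_c[:, None] * diff).T @ diff       # sum_i resp_i (x_i - mu_c)(x_i - mu_c)^T
    return cov / max(resp_c.sum(), np.finfo(float).eps)
```

Adding a loading factor to the diagonal, as in the previous post, is one way to keep such a matrix positive definite.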

Simple Classification Models: LDA, QDA and Linear Regression

July 28, 2011 Leave a comment

Finally, my website was set free from the hacker, at least for now ^_^. In my backup directory, I found some notes I made for the Pattern Recognition class I taught in Spring 2010. The notes contain the details of the derivation of

  • Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) [pdf]
  • Linear Regression for Classification [pdf]

Hope this can be useful.
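If you just want to try these classifiers rather than derive them, scikit-learn ships ready-made implementations; a minimal usage sketch on made-up data looks like this:

```python
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)

# Two made-up Gaussian classes, just for illustration
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

for model in (LinearDiscriminantAnalysis(), QuadraticDiscriminantAnalysis()):
    print(type(model).__name__, model.fit(X, y).score(X, y))
```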

Hand posture recognition using minimum divergence classifier

May 8, 2011 4 comments

A reviewer suggested that my colleague and I apply our accepted work to some real-world application. "Bro, we've got less than 4 days to apply our work to a real-world problem... what should we do?" We spent 10 minutes discussing several possible problems such as automatic video segmentation, CD cover search, human gesture recognition and some other funny-crazy ideas. Finally, with our curiosity and the time constraint, we ended up with static hand posture recognition. Fortunately, the data set is not too difficult to find on the internet. A million thanks to Triesch and Von Der Malsburg for the wonderful hand posture database, which saved our lives.

Originally, we found that the divergence between two Gaussian mixture models (GMMs) can be computed efficiently using the Cauchy-Schwarz divergence (D_{CS}), as it gives a closed-form expression for any pair of GMMs. Of course, we can't get this awesome property with the Kullback-Leibler divergence (D_{KL})... why? Read our paper [1] ^_^ Yay! In short, the D_{KL} formulation does not allow the Gaussian integral trick, hence a closed-form expression is not possible.
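To make the closed form concrete, here is a small Python sketch (my own illustration, not the code released with the paper). Each inner product between mixtures reduces to a double sum over Gaussian integrals via int N(x; m_1, S_1) N(x; m_2, S_2) dx = N(m_1; m_2, S_1 + S_2):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_inner_product(wp, mp, Sp, wq, mq, Sq):
    """Closed-form integral of p(x) q(x) for two GMMs given as (weights, means, covs)."""
    total = 0.0
    for w1, m1, S1 in zip(wp, mp, Sp):
        for w2, m2, S2 in zip(wq, mq, Sq):
            total += w1 * w2 * multivariate_normal.pdf(m1, mean=m2, cov=S1 + S2)
    return total

def cauchy_schwarz_divergence(p, q):
    """D_CS(p, q) = -log( <p,q> / sqrt(<p,p> <q,q>) ), with every term in closed form."""
    pq = gmm_inner_product(*p, *q)
    pp = gmm_inner_product(*p, *p)
    qq = gmm_inner_product(*q, *q)
    return -np.log(pq / np.sqrt(pp * qq))

# Two toy 2-D GMMs given as (weights, means, covariances)
p = ([0.5, 0.5], [np.zeros(2), np.ones(2)], [np.eye(2), np.eye(2)])
q = ([1.0], [np.array([2.0, 2.0])], [np.eye(2)])
print(cauchy_schwarz_divergence(p, q))
```

No numerical integration or sampling is needed, which is exactly what makes the D_{CS} route fast compared with estimating D_{KL}.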

In this work, we use a minimum-divergence classifier to recognize the hand postures; please see our paper for more details. We finished our experiment on the second day, so we had some time left to make a fancy plot summarizing our work, which we would like to share with you below. The classification accuracies using D_{CS} and D_{KL} are 95% and 92% respectively, and the former also gives a much better computational run-time, about 10 times faster. The figures below also suggest that our proposed method outperforms D_{KL} when it comes to clustering, as it gives more discriminative power.

Similarity matrix calculated by Cauchy-Schwarz divergence
Similarity matrix calculated by Kullback-Leibler divergence

[1] K. Kampa, E. Hasanbelliu and J. C. Principe, “Closed-form Cauchy-Schwarz pdf Divergence for Mixture of Gaussians,” Proc. of the International Joint Conference on Neural Networks (IJCNN 2011). [pdf] [BibTex]

We make our code available to anyone under a Creative Commons agreement [.zip]

We also collected some interesting links to hand posture/gesture databases here:

http://www.datehookup.com/content-analyzing-body-language-gesture-recognition.htm
http://www-prima.inrialpes.fr/FGnet/data/03-Pointing/index.html#Gesture%20Vocabulary
http://www.idiap.ch/resource/gestures/
http://www.iis.ee.ic.ac.uk/~tkkim/ges_db.htm
ftp://mi.eng.cam.ac.uk/pub/CamGesData/
http://www.csc.kth.se/~danik/gesture_database/

The following papers and documents can be helpful:

A Bimodal Face and Body Gesture Database for Automatic Analysis of Human Nonverbal Affective Behavior
Hatice Gunes and Massimo Piccardi Computer Vision Research Group,
University of Technology, Sydney (UTS)

A Color Hand Gesture Database for Evaluating and Improving Algorithms on Hand Gesture and Posture Recognition
Farhad Dadgostar, Andre L. C. Barczak, Abdolhossein Sarrafzadeh

Hand Detection and Gesture Recognition using ASL Gestures
Supervisor: Andre L. C. Barczak
Student: Dakuan CUI
Massey University

How to install Greg Mori’s superpixel MATLAB code?

February 28, 2011 11 comments

This short note aims to show you how to use the superpixel code from Greg Mori, which is observed to give very good results and is used by a bunch of computer vision researchers. However, the installation process can be challenging sometimes ^_^, so I figured it'd be nice to document the process so that it will be easier for absolute beginners to use the code and, more importantly... so I can come back and read it when I forget how to do it.

I have MATLAB R2010a installed on 32-bit Ubuntu 10.04 LTS (the Lucid Lynx).

I download Mori's code and extract the zipped file to a folder called superpixels. The folder is located at
/home/student1/MATLABcodes/superpixels

Next I download the boundary detector code from the link
http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/segbench/code/segbench.tar.gz
I extract it to a folder called segbench, which I then put inside the superpixels folder:
/home/student1/MATLABcodes/superpixels/segbench
—————————————————————————–
Here are the instructions from the two READMEs:

README (Mori’s)

– Run mex on *.c in yu_imncut directory
– Obtain mfm-pb boundary detector code from
http://www.cs.berkeley.edu/projects/vision/grouping/segbench/
– Change path names in sp_demo.m and pbWrapper.m
– Get a fast processor and lots of RAM
– Run sp_demo.m

README (segbench’s)
(1) For the image and segmentation reading routines in the Dataset
directory to work, make sure you edit Dataset/bsdsRoot.m to point to
your local copy of the BSDS dataset.

(2) Run ‘gmake install’ from this directory to build everything.  You
should then probably put the lib/matlab directory in your MATLAB path.

(3) Read the Benchmark/README file.
———————————————————————————————————————

According to the README instruction
– Run mex on *.c in yu_imncut directory
I run mex on all the .c files in the folder
/home/student1/MATLABcodes/superpixels/yu_imncut
I don't know why the command mex *.c does not work, so I have to run mex on every file one by one. Each time I run mex, I get the message
Warning: You are using gcc version “4.4.3-4ubuntu5)”.  The version
currently supported with MEX is “4.2.3”.
For a list of currently supported compilers see:
http://www.mathworks.com/support/compilers/current_release/
However, it seems to work fine since I can see all the .mexglx files show up in the folder. So I assume I did it correctly and move on to the next step.

In this step,
– Obtain mfm-pb boundary detector code from
http://www.cs.berkeley.edu/projects/vision/grouping/segbench/
I already have the code, so I follow the README (segbench's). First, I do
(1) For the image and segmentation reading routines in the Dataset
directory to work, make sure you edit Dataset/bsdsRoot.m to point to
your local copy of the BSDS dataset.
So, I go to the file /home/student1/MATLABcodes/superpixels/segbench/Dataset/bsdsRoot.m and change the root to
root = '/home/student1/MATLABcodes/superpixels';
which contains the image I want to segment, "img_000070.jpg".

Next, I do (2) in README (segbench’s)
(2) Run ‘gmake install’ from this directory to build everything.  You
should then probably put the lib/matlab directory in your MATLAB path.
Now, at the folder student1@student1-desktop:~/MATLABcodes/superpixels/segbench$,
we need to make the matlab command visible in this shell, so we export the MATLAB path
student1@student1-desktop:~/MATLABcodes/superpixels/segbench$ PATH=$PATH:/usr/share/matlabr2010a/bin
student1@student1-desktop:~/MATLABcodes/superpixels/segbench$ export PATH
then run make install; this time I got quite a long message in the terminal
student1@student1-desktop:~/MATLABcodes/superpixels/segbench$ make install
Then you will notice some files in the folder
/home/student1/MATLABcodes/superpixels/segbench/lib/matlab
What you have to do here is add the path in MATLAB by typing in the command window
addpath('/home/student1/MATLABcodes/superpixels/segbench/lib/matlab');

Next, (3) Read the Benchmark/README file. I found that we don't have to do anything in this step, so just skip it.

Now it's the last step:
Change path names in sp_demo.m and pbWrapper.m
So, go to the folder /home/student1/MATLABcodes/superpixels and change the paths:
in pbWrapper.m I make the path point to '/home/student1/MATLABcodes/superpixels/segbench/lib/matlab'
in sp_demo.m I make the path point to '/home/student1/MATLABcodes/superpixels/yu_imncut'

Now run the file sp_demo.m. Unfortunately, you will get some error messages because of a function called spmd. This happens because MATLAB R2010a has a built-in spmd whose number of input arguments differs from that of the spmd in the toolbox. One way to get around this is to rename spmd.c in the toolbox to spmd2.c, compile it with mex spmd2.c, and then replace every call to spmd(...) with spmd2(...). If you encounter more errors from this point on, don't panic; they are probably caused by the same spmd issue, so just do the same thing and it will work fine.

That’s it! Enjoy Greg Mori’s code!

——————————————————————————

For Windows users, please refer to Thanapong's blog at the URL below:
http://blog.thanapong.in.th/arah/?p=29

Image Segmentation using Gaussian Mixture Model (GMM) and BIC

February 27, 2011 3 comments

A while ago, I was amazed by the image segmentation results obtained using Gaussian mixture models (GMMs), because a GMM gives pretty good results on normal/natural images. There are some results in my previous post. Of course, GMM is not the best tool for this job, but look at its speed and ease of implementation: it's pretty good in that sense. However, one problem with GMM is that we need to pick the number of components. In general, the more components we assume, the better the log-likelihood of the GMM. In that case, we would simply send the number of components to infinity, right? Well... nothing good comes out of that, because the segments would not be meaningful; in fact, we would overfit the data, which is bad.

Therefore, the Bayesian Information Criterion (BIC) is introduced as a cost function composed of 2 terms: 1) the negative log-likelihood and 2) the model complexity. Please see my old post. You will see that BIC prefers a model that fits the data well while its complexity remains small; in other words, the model with the smallest BIC is the winner. Simple as that. Here is the MATLAB code. Below are some results from sweeping the number of components from 2 to 10. Unfortunately, the results are not what I (and maybe other audiences) would desire or expect. As a human, my attention just focuses on the skier, the snow, the sky/clouds and perhaps, in the worst case, the shadows, so the suitable number of components should be 3-4. Instead, BIC declares the 9-component model the winner, which is far from what I expected. So, can I say that the straightforward BIC might not be a good model for image segmentation, in particular with respect to human perception? Well... give GMM-BIC a break; I think it is too early to blame BIC because I haven't used other, more sophisticated features like texture, shape or color histograms, which might improve the results. The question is: what are the suitable features and number of components that make the segmentation results from GMM-BIC similar to human perception? The MATLAB code is made available here.
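The experiment above uses the MATLAB code linked in this post; for readers who prefer Python, a rough sketch of the same BIC sweep with scikit-learn would look like the following. The raw per-pixel RGB features and the file name are assumptions made for illustration, not necessarily what the original experiment used.

```python
import numpy as np
from PIL import Image
from sklearn.mixture import GaussianMixture

# Use raw RGB values as per-pixel features (a deliberate simplification)
img = np.asarray(Image.open("skier.jpg"), dtype=float) / 255.0   # hypothetical file name
pixels = img.reshape(-1, 3)

# Sweep the number of components and keep the model with the smallest BIC
best_k, best_bic, best_model = None, np.inf, None
for k in range(2, 11):
    gmm = GaussianMixture(n_components=k, covariance_type="full", random_state=0).fit(pixels)
    bic = gmm.bic(pixels)
    if bic < best_bic:
        best_k, best_bic, best_model = k, bic, gmm

# The segmentation map assigns each pixel to its most likely component
labels = best_model.predict(pixels).reshape(img.shape[:2])
print("BIC picks", best_k, "components")
```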

original image

Plot of BIC of model using 7-10 components