### Archive

Posts Tagged ‘programming’

## BigML: a machine learning “sandbox”

Today I found an interesting website, BigML. It seems to offer a playground where people, especially ML researchers, can experiment with standard machine learning techniques on their own data sets, or even on their business data.

http://blog.bigml.com/2012/07/04/introducing-bigmls-free-machine-learning-sandbox/

The main website is here:

https://bigml.com/

You can try BigML for free in development mode, but I think the 1 MB limit on the training data set is pretty restrictive.

Categories: iDea

## How to Install Eclipse and Android SDK on Ubuntu 10.04 LTS

January 17, 2012

Before walking you through the installation process in detail, it’s good to see the big picture:

1. Android development needs Java, so we need Java installed on Ubuntu
• If you are using a 64-bit Linux machine (e.g. 64-bit Ubuntu), you may need to install ia32-libs.
2. You will also need a Java development environment (IDE); Eclipse is recommended
3. The last thing is to install the Android Development Tools (ADT) plugin in Eclipse

All the information regarding the installation can be found at the Android developer website

http://developer.android.com

Here is the installation in detail:

1. Install Java
   1. If you are using 64-bit Ubuntu, you will need to install the ia32-libs package:
      sudo apt-get install ia32-libs
   2. Next, install Java:
      sudo apt-get install sun-java6-jdk

   A tricky part is that when the Java installation is almost done, you will need to select <Ok>, which can be done by clicking in the terminal and pressing Tab until <Ok> is highlighted, then pressing Enter.

2. Next, you will need a Java development environment. Eclipse is recommended because there is an Android Development Tools (ADT) plugin available for it.
   1. I download Eclipse IDE for Java Developers (both 32- and 64-bit versions are available)
   2. Extract the downloaded file eclipse-java-indigo-SR1-linux-gtk-x86_64.tar.gz, and you will get a folder called eclipse, containing all the necessary files
   3. Since Eclipse is built to be portable, you can move the extracted eclipse folder to any location you want. It is recommended to move the folder to your home directory /home/yourusername/
3. Launch Eclipse; we will now install the ADT plugin
   1. Go to Help>Install New Software..
   2. Add the repository URL given on the ADT plugin page
   3. Tick the newly added plugin, and click Next until done
4. Next, we will install the Android software development kit (Android SDK)
   1. Download the SDK from http://developer.android.com/sdk/index.html
   2. Extract android-sdk_r16-linux.tgz in the home folder (/home/yourusername/)
   3. The extracted folder is android-sdk-linux
   4. Now go back to Eclipse and open Window>Preferences
   5. On the left panel, click Android
   6. In SDK Location, select the folder where you extracted the SDK (the android-sdk-linux folder from the previous steps)
   7. Click Apply and OK, then wait a few minutes for the program to update
5. Retrieve all the necessary files for the SDK
   1. Go to Window>Android SDK Manager
   2. Tick all the Android versions that apply, and click Install packages
6. Done! You may now proceed to build a new project.
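The command-line part of the steps above (unpacking the two downloaded archives into the home directory) can be sketched as a small shell helper. This is only a sketch under stated assumptions: `extract_archive` is a hypothetical helper name, and `~/Downloads` is an assumed download location; the archive names are the ones mentioned in the steps.

```shell
# extract_archive ARCHIVE [DEST]: unpack a .tar.gz/.tgz archive into DEST
# (defaults to the home directory). Hypothetical helper, not part of any SDK.
extract_archive() {
    archive="$1"
    dest="${2:-$HOME}"
    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi
    # Create the destination if needed, then unpack into it
    mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}

# Example calls (assumed download location; commented out so nothing runs by accident):
# extract_archive ~/Downloads/eclipse-java-indigo-SR1-linux-gtk-x86_64.tar.gz   # gives ~/eclipse
# extract_archive ~/Downloads/android-sdk_r16-linux.tgz                         # gives ~/android-sdk-linux
# ~/eclipse/eclipse &                                                           # launch Eclipse
```

Keeping both folders directly under the home directory matches the locations used in the Eclipse preferences step.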

Categories: iDo, Tutorials

From my previous post, we know that the update equation for the covariance matrix might not be numerically stable because the matrix may fail to be positive definite. An easy way to stabilize the algorithm is to add a relatively small positive number, a.k.a. a loading factor, to the diagonal entries of the covariance matrix. But does the loading factor affect the likelihood or the convergence of the EM algorithm?

Apparently, adding the loading factor to the covariance matrix does impact the log-likelihood value. I ran some experiments on the issue; the results are shown in the learning curve (log-likelihood curve) of ITSBN with the EM algorithm below. The factor is applied to the matrix only when the determinant of the covariance matrix is smaller than $10^{-6}$. Five different factors were used in this experiment: $10^{-8}, 10^{-6}, 10^{-4}, 10^{-3}, 10^{-2}$. The results show that the learning curves are still monotonically increasing* and level off near the end. Furthermore, the level-off value is strongly associated with the value of the factor: the bigger the factor, the smaller the level-off value. This suggests that we should pick the smallest factor that still stabilizes the algorithm, in order to stay as close to the ideal learning curve as possible. Note that the loading factor is not added to the covariance matrix until the second iteration.

* I don’t think this is always the case, though: the factor is not added at every iteration, so at an iteration where it is added it might pull the log-likelihood down to a lower value. Empirically, however, the log-likelihood is still monotonically increasing even when the factor is big.
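The stabilization rule described above can be written compactly. This is only a restatement of the rule in the text; the symbols $\Sigma_k$ (covariance matrix of component $k$), $\epsilon$ (loading factor) and $I$ (identity matrix) are notation introduced here, not taken from the original experiments.

$$
\Sigma_k \leftarrow
\begin{cases}
\Sigma_k + \epsilon I, & \det(\Sigma_k) < 10^{-6} \\
\Sigma_k, & \text{otherwise}
\end{cases}
\qquad \epsilon \in \{10^{-8}, 10^{-6}, 10^{-4}, 10^{-3}, 10^{-2}\}
$$

Adding $\epsilon I$ shifts every eigenvalue of $\Sigma_k$ up by $\epsilon$, so a symmetric positive semidefinite matrix becomes positive definite; the price is that the fitted Gaussian is slightly wider than the data suggest, which is consistent with the lower level-off values seen for larger factors.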

## What makes a covariance matrix NOT positive definite in the EM algorithm?

There are many plausible reasons. One common reason is that at least one Gaussian component has no cluster members close to it. This situation occurs when the data clusters are very narrow relative to the distance between clusters; in other words, when the intra-cluster distance is much smaller than the inter-cluster distance.

Let’s assume we have 3 data clusters A, B and C, with A and B almost merged into each other and very far away from C. We want to cluster the data into 3 components using the EM algorithm. Suppose the initial locations of the 3 centroids are in the middle of the space among the three clusters, and it happens that one centroid has no “nearest” members. This also means that 2 components are quite sufficient to model the whole data, rather than 3. Let’s assume the deserted centroid is labeled by the ID ‘2’. In that case, the posterior marginal distribution of each data sample will have a big value for either label 1 or label 3, but no sample gives a big value for label 2. In fact, to be more precise, the posterior marginal for label 2 will be virtually zero for all the data samples. Unfortunately, the update equation for a covariance matrix weights each atom (i.e., $(x_i-\mu_2)(x_i-\mu_2)^{\top}$) of the updated covariance matrix with its corresponding class posterior marginal $p(x_i=c_2|evidence)$, and hence gives a zero matrix for the covariance matrix of class label 2. So, as you have seen, using EM to cluster data that are really far separated is not always easy.
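For reference, the standard EM covariance update for a Gaussian mixture has the form below. This assumes the update used in the post follows the same pattern; $\gamma_{i2}$ is shorthand introduced here for the posterior marginal $p(x_i=c_2|evidence)$, and $N$ is the number of data samples.

$$
\Sigma_2 = \frac{\sum_{i=1}^{N} \gamma_{i2}\,(x_i-\mu_2)(x_i-\mu_2)^{\top}}{\sum_{i=1}^{N} \gamma_{i2}}
$$

When $\gamma_{i2} \approx 0$ for every sample, both sums vanish, so in floating point the result is a zero (or undefined) matrix rather than a positive definite one.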

## Using recursion in MATLAB

This week I wound up coding the sum-product algorithm in MATLAB. All went well, and there were some interestingly simple but powerful techniques I would like to share. We all know that writing a function with recursion can save a lot of time, and is a classic technique in C++. I just realized that we can do the same in MATLAB, and the way to do it is very similar to that in C++.

### Example1: “Calculate the summation at each node in a binary tree”

I have a 3-level binary tree whose nodes are connected as follows: node 1 is the parent of nodes 2 and 3, node 2 is the parent of nodes 4 and 5, and node 3 is the parent of nodes 6 and 7. Let’s assume that nodes 4 – 7 are instantiated with the numbers 4, 5, 6 and 7 respectively. For a node n, we want to calculate the summation of its descendants at the leaf level. Let’s name the function fn_recurs_sum_tree(tree, n), where the variable “tree” is the binary tree structure with nodes 4-7 instantiated as above, and n denotes the node of interest. More specifically, tree is a cell array of size 7 x 1, where tree{n} returns the value stored in node n of the tree. Here is an example of the code:

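function total = fn_recurs_sum_tree(tree,n)
% Return the sum of the leaf values below node n (children of n are 2n and 2n+1).
if ~isempty(tree{n,1})
    % Node n already holds a value (a leaf): return it
    total = tree{n,1};
else
    % Otherwise add up the sums of the left and right subtrees
    total = fn_recurs_sum_tree(tree,2*n) + fn_recurs_sum_tree(tree,2*n+1);
end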

### Example2: “Calculate the summation at every node in a binary tree”

What if we want to find the summation at every node in the network? Of course, we would not call the function fn_recurs_sum_tree(tree, n) separately for n = 1, 2 and 3, as that would not be efficient when the number of nodes is large. One technique is to call the function at the root node (i.e., n = 1) so that all the summations are accumulated from the bottom to the top. The price to pay is that the cell array tree has to be passed into, and returned from, every call. Here is an example.

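function [total, tree] = fn_recurs_sum_tree2(tree,n)
% Return the sum below node n and the tree with every visited node filled in.
if ~isempty(tree{n,1})
    total = tree{n,1};
else
    % Pass the updated tree through both recursive calls so the sums are kept
    [sum1, tree] = fn_recurs_sum_tree2(tree,2*n);
    [sum2, tree] = fn_recurs_sum_tree2(tree,2*n+1);
    total = sum1 + sum2;
    tree{n,1} = total;   % store the sum at this node
end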

Here is some test code:

% #### example code ######
% initialize the tree: nodes 4-7 hold the values 4-7, nodes 1-3 start empty
tree = cell(7,1);
for n = 4:7
    tree{n,1} = n;
end

% Calculate the sum for a single node 2
total = fn_recurs_sum_tree(tree,2)

% Calculate the sum for the whole network
[total, tree] = fn_recurs_sum_tree2(tree,1)
This technique is very useful when you have to deal with trees. Hope this helps! The sample code is available here:

Just copy all the code, put the files in the same folder, then run example1.

Categories: Tutorials

## pdflatex gives error when compiled with .eps figures

The command pdflatex cannot include figures with the .eps extension; it takes .pdf figures (as well as .png and .jpg). Unfortunately I mostly output my figures in .eps, which means that I have to convert all of them to .pdf. If there are only a handful of figures, converting them manually one by one is acceptable, using the command (in the terminal)

epstopdf figure_file.eps

However, when there are hundreds of figures, we may not want to do that. Instead, one might simply write a batch file to convert all the figures automatically. Instructions for writing a batch file can be found on this web page:

http://commandwindows.com/batch.htm

My own very simple code is available to try here. Advanced users can list all the file names inside a directory and use a for-loop to convert the files one by one. Some people suggest using ImageMagick, but I found that not so convenient :-P.
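The for-loop approach mentioned above can be sketched in a few lines of shell. This is a minimal sketch, not the code linked in the post: `convert_all_eps` and the `CONVERTER` variable are hypothetical names introduced here (the override only exists so the loop can be tried without TeX installed). It relies on `epstopdf figure.eps` writing `figure.pdf` next to the input file.

```shell
# convert_all_eps [DIR]: run epstopdf on every .eps file in DIR (default: current dir).
# CONVERTER is a hypothetical override hook; by default the real epstopdf is used.
convert_all_eps() {
    dir="${1:-.}"
    converter="${CONVERTER:-epstopdf}"
    count=0
    for f in "$dir"/*.eps; do
        [ -e "$f" ] || continue          # no .eps files: the glob stays literal, skip it
        "$converter" "$f" || return 1    # stop at the first figure that fails to convert
        count=$((count + 1))
    done
    echo "converted $count file(s)"
}

# Example call (assumed folder name):
# convert_all_eps ./figures
```

Stopping at the first failure keeps a broken figure from being silently skipped before pdflatex is run.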

## How to install Greg Mori’s superpixel MATLAB code?

This short note shows how to use the superpixel code from Greg Mori, which is observed to give very good results and is used by many computer vision researchers. However, the installation process can be challenging sometimes ^_^, so I figured it’d be nice to document it, so that absolute beginners can use the code more easily and, more importantly, so that I can come back and read this when I forget how to do it.

I have MATLAB R2010a installed on 32-bit Ubuntu 10.04 LTS (Lucid Lynx).

I download Mori’s code and extract the zipped file to a folder called superpixels. The folder is located at
/home/student1/MATLABcodes/superpixels

I also download the segbench code from
http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/segbench/code/segbench.tar.gz
I extract it to a folder called segbench, then put it inside the superpixels folder:
/home/student1/MATLABcodes/superpixels/segbench
—————————————————————————–
Here are the instructions from the README files in the two folders

– Run mex on *.c in yu_imncut directory
– Obtain mfm-pb boundary detector code from

– Change path names in sp_demo.m and pbWrapper.m
– Get a fast processor and lots of RAM
– Run sp_demo.m

(1) For the image and segmentation reading routines in the Dataset
directory to work, make sure you edit Dataset/bsdsRoot.m to point to
your local copy of the BSDS dataset.

(2) Run ‘gmake install’ from this directory to build everything.  You
should then probably put the lib/matlab directory in your MATLAB path.

———————————————————————————————————————

– Run mex on *.c in yu_imncut directory
I run mex on all the .c files in the folder
/home/student1/MATLABcodes/superpixels/yu_imncut
I don’t know why the command mex *.c does not work, so I have to run mex on every file one by one. Each time I run mex, I get the message
Warning: You are using gcc version “4.4.3-4ubuntu5)”.  The version
currently supported with MEX is “4.2.3”.
For a list of currently supported compilers see:

However, it seems to work fine, since I can see all the .mexglx files show up in the folder. So I assume I did it correctly and go on to the next step.
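Running mex on each file by hand can be replaced by a short loop. A likely reason `mex *.c` fails is that mex treats several source files given together as parts of one MEX-file, whereas here each .c file is a separate function. The sketch below assumes the `mex` script from MATLAB's bin directory is on the PATH; `mex_each` and the `MEX` variable are hypothetical names introduced for illustration.

```shell
# mex_each [DIR]: compile every .c file in DIR separately with mex.
# MEX is a hypothetical override hook; by default the real mex script is used.
mex_each() {
    dir="${1:-.}"
    mex_cmd="${MEX:-mex}"
    (
        cd "$dir" || exit 1              # work inside DIR so outputs land next to the sources
        for f in *.c; do
            [ -e "$f" ] || continue      # no .c files: skip the literal glob
            echo "compiling $f"
            "$mex_cmd" "$f" || exit 1    # one mex call per source file
        done
    )
}

# Example call, using the folder from this post:
# mex_each /home/student1/MATLABcodes/superpixels/yu_imncut
```

The subshell keeps the `cd` from changing the directory of the calling terminal session.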

In this step,
– Obtain mfm-pb boundary detector code from

(1) For the image and segmentation reading routines in the Dataset
directory to work, make sure you edit Dataset/bsdsRoot.m to point to
your local copy of the BSDS dataset.
So, I go to the file /home/student1/MATLABcodes/superpixels/segbench/Dataset/bsdsRoot.m and change the root to
root = '/home/student1/MATLABcodes/superpixels';
which contains the image I want to segment, “img_000070.jpg”

Next, I do (2) in README (segbench’s)
(2) Run ‘gmake install’ from this directory to build everything.  You
should then probably put the lib/matlab directory in your MATLAB path.
Now, in the folder
student1@student1-desktop:~/MATLABcodes/superpixels/segbench$
we need to make MATLAB visible in this folder, so we export the MATLAB path
student1@student1-desktop:~/MATLABcodes/superpixels/segbench$ PATH=$PATH:/usr/share/matlabr2010a/bin
student1@student1-desktop:~/MATLABcodes/superpixels/segbench$ export PATH
Then I use make install; this time I get quite a long message in the terminal
student1@student1-desktop:~/MATLABcodes/superpixels/segbench$ make install
Then you will notice some files in the folder
/home/student1/MATLABcodes/superpixels/segbench/lib/matlab
What you have to do here is to addpath in MATLAB by typing in the command window

Next, (3) Read the Benchmark/README file. I found that we don’t have to do anything in this step. So just skip this.

Now it’s the last step:
Change path names in sp_demo.m and pbWrapper.m
So, go to the folder /home/student1/MATLABcodes/superpixels and change the paths:
in pbWrapper.m, I make the path point to '/home/student1/MATLABcodes/superpixels/segbench/lib/matlab'
in sp_demo.m, I make the path point to
'/home/student1/MATLABcodes/superpixels/yu_imncut'

Now run the file sp_demo.m. Unfortunately you will get some error messages because of a function called spmd. This happens because MATLAB R2010a has its own spmd, which takes a different number of input arguments from the spmd in the toolbox. One way around this is to rename spmd.c in the toolbox to spmd2.c, compile it using mex spmd2.c, and then replace every call spmd(…) with spmd2(…). If you encounter more errors from this point on, don’t panic: they probably come from the same spmd issue, so just do the same thing and it will work fine.
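To find the calls that still need renaming, the .m files can be searched from the terminal. This is a sketch only: `find_spmd_calls` is a hypothetical helper name, and it merely lists the call sites so they can be edited by hand; it changes no files.

```shell
# find_spmd_calls [DIR]: list every line in DIR/*.m that still calls spmd(...).
# Lines already changed to spmd2(...) do not match, so rerunning shows what is left.
find_spmd_calls() {
    dir="${1:-.}"
    # /dev/null is added so grep always prints the file name, even for a single .m file
    grep -n 'spmd(' "$dir"/*.m /dev/null
}

# Example call, using the folder from this post:
# find_spmd_calls /home/student1/MATLABcodes/superpixels/yu_imncut
```

Each output line has the form file:line:text, which makes it easy to jump to the call in an editor.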

That’s it! Enjoy Greg Mori’s code!

——————————————————————————

For Windows users, please refer to Thanapong’s blog, whose URL is given below:
http://blog.thanapong.in.th/arah/?p=29