### Archive

Archive for July, 2011

From my previous post, we know that the update equation for covariance matrix might not be numerically stable because of the matrix not being positive definite. An easy way to stabilize the algorithm is to add a relatively small positive number a.k.a. loading factor to the diagonal entries of the covariance matrix. But, Does the factor loading affect the likelihood or the convergence of the EM algorithm?

Apparently, adding the loading factor to the covariance matrix does impact the log-likelihood value. I made some experiments on the issue, and let me share the results with you as seen in the learning curve (log-likelihood curve) of ITSBN with EM algorithm below. The factor is applied to the matrix only when the determinant of the covariance matrix is smaller than $10^{-6}$. There are 5 different factors used in this experiment listed as follows; $10^{-8}, 10^{-6}, 10^{-4}, 10^{-3}, 10^{-2}$. The results show that the learning curves are still monotonically increasing* and level off near the end. Furthermore, we found that the level-off value are highly associated with the value of the factor. The bigger the factor, the smaller the level-off value. This suggested that we should pick smallest value of factor as possible in order to stay as close as the ideal learning curve as possible. Note that the loading factor is not added to the covariance matrix until the second iteration.

* Though I don’t think this is always the case because the factor is not consistently added to the matrix, and hence when it is added, it might pull the log-likelihood up to a low value. However, it is empirically shown that the log-likelihood is still monotonically increasing when the factor is big.

## What make a covariance matrix NOT positive definite in the EM algorithm?

There are so many plausible reasons. One common reason is that there is at least one Gaussian component not having its cluster members in a close affinity. This situation occurs when the data clusters spread very narrow with respect to the distance between each cluster; in other words, when the intra-cluster distance is much smaller than inter-cluster distance. Let’s assume we have 3 data clusters A, B and C, with A and B are almost merged to each other and very far away from C. We want to cluster the data into 3 components using the EM algorithm.  Suppose the initial locations of the 3 clusters are at the middle of the space among the three clusters, and it occurs that there is one centroid not having its “nearest” members. This also means that it is quite sufficient to use only 2 components to model the whole data rather than 3. Let’s assume the deserted centroid is labeled by the ID ‘2’. In which case, the posterior marginal distribution of each data sample will either have big value for label 1 or 3, but there is no sample give big value for label 2. In fact, to be more precise, the posterior marginal for the label 2 will be virtually zero for all the data samples. Unfortunately the update equation for a covariance matrix weights each atom (i.e., $(x_i-\mu_2)(x_i-\mu_2)^{\top}$)  of updated covariance matrix with its corresponding class posterior marginal $p(x_i=c_2|evidence)$, and hence give zero matrix for covariance matrix of class label 2. So, as you have seen, it is not always an easy case to use EM to cluster the really-far-separated data.

## Simple Classification Models: LDA, QDA and Linear Regression

Finally, my website was set free from the hacker–at least for now ^_^. In my backup directory, I found some notes I made for the Pattern Recognition class I taught in Spring 2010. The notes contains the details of the derivation of

• Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) [pdf]
• Linear Regression for Classification [pdf]

Hope this can be useful.

## Using recursion in MATLAB

This week I wind up with coding sum-product algorithm in MATLAB. All went well, and there were some interestingly simple but powerful techniques I would like to share. We all know that programming function with recursion can save a lot of time, and is a classic technique in C++. I just realized that we can do so in MATLAB too, and the way to do it is very similar to that in C++.

### Example1: “Calculate the summation at each node in a binary tree”

I have a 3-level binary tree whose nodes are connected as follows: node1 is the parent of node 2 and 3, node 2 is the parent of node 4 and 5, node 3 is the parent of node 6 and 7. Let’s assume that nodes 4 – 7 are instantiated with number 4, 5, 6 and 7 respectively. We want to calculate for a node n the summation of its corresponding children in the leaf level. Let’s name the function fn_recurs_sum_tree(tree, n) where the variable “tree” is the binary tree structure with node 4-7 instantiated as above, and n denotes the node of interest. More specifically, tree is a cell array of the size 7 x 1, where tree{n} returns the value stored in the node n of the tree. Here is the example of the code

function sum = fn_recurs_sum_tree(tree,n) if ~isempty(tree{n,1})     sum = tree{n,1}; else     sum = fn_recurs_sum_tree(tree,2*n) + fn_recurs_sum_tree(tree,2*n+1); end

### Example2: “Calculate the summation at every node in a binary tree”

What if we want to find the summation at every node in the network? Of course, we would not call the function fn_sum_bin_tree(tree, n) for n=1, 2 and 3 as that would not be efficient when the number of node is large. One technique is to call the function at the root node (i.e., n = 1) so that all the summation is accumulated from bottom to the top. The price to pay is to deal with how to pass the cell array tree into such a function. Here is the example.

function [sum, tree] = fn_recurs_sum_tree2(tree,n) if ~isempty(tree{n,1})     sum = tree{n,1}; else     [sum1, tree] = fn_recurs_sum_tree2(tree,2*n);     [sum2, tree] = fn_recurs_sum_tree2(tree,2*n+1);     sum = sum1 + sum2;     tree{n,1} = sum; end

Here are some test codes:

% #### example code ######
% initial the tree
tree = cell(7,1);
for n = 4:7
tree{n,1} = n;
end

% Calculate the sum for a single node 2
sum = fn_recurs_sum_tree(tree,2)

% Calculate the sum for the whole network
[sum, tree] = fn_recurs_sum_tree2(tree,1)

This technique is very useful when you have to deal with tree. So, hope this helps! Sample codes are made available here:

Just copy all the codes, put them in the same folder, then run example1.

Categories: Tutorials