Contrastive Divergence for Parameter Estimation in DSCRFs

Recently, I have been working on Deformable-Structure Conditional Random Fields (DSCRFs) for image classification: CRFs that can change their graph structure to fit the data (the image) while simultaneously inferring the pixel labels. One problem with this approach is parameter estimation (I guess everyone in the field knows what I’m talking about ^_^), so I’m looking for optimization algorithms to deal with it. Of course, the first thing I thought of was variational approximation. I have already tried some, such as mean-field and structured variational methods, but there is one popular method, “Contrastive Divergence” (CD), that I have heard about and wanted to try. I have read some papers on it, and here are the ones I really recommend reading.

  1. “Notes on Contrastive Divergence” by Oliver Woodford [pdf]: For me, this paper is the best: precise, intuitive, and it makes you hungry to know more!
  2. “Training Products of Experts by Minimizing Contrastive Divergence” by Geoffrey E. Hinton [pdf]: I guess this is the original CD paper. It was the first paper I read on the topic, and it did a good job of explaining the math underlying CD; however, I did not have an intuitive idea of what CD really is after that first reading. Surprisingly, after I read [1] and then came back to [2], I found that I could put the pieces together and get a better intuition for the topic. So, I really recommend reading [1] before [2].
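
For a feel of what CD does in practice, here is a minimal sketch of a single CD-1 update. Note the assumptions: it uses a tiny binary restricted Boltzmann machine (the usual textbook setting for CD), not a DSCRF, and all names (`cd1_update`, `W`, `b`, `c`, `lr`) are my own illustrative choices rather than anything taken from the papers above. The idea is to compare statistics of the data with statistics after one step of Gibbs sampling, instead of running the chain to equilibrium.

```python
import math
import random

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def sample_bernoulli(probs, rng):
    # Draw a binary vector with the given per-unit probabilities.
    return [1.0 if rng.random() < p else 0.0 for p in probs]

def hidden_probs(v, W, c):
    # p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i * W[i][j])
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]

def visible_probs(h, W, b):
    # p(v_i = 1 | h) = sigmoid(b_i + sum_j W[i][j] * h_j)
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]

def cd1_update(v0, W, b, c, lr, rng):
    """One CD-1 step on a single binary training vector v0 (updates in place)."""
    # Positive phase: hidden statistics driven by the data.
    ph0 = hidden_probs(v0, W, c)
    h0 = sample_bernoulli(ph0, rng)
    # Negative phase: one Gibbs step gives a "reconstruction" v1.
    v1 = sample_bernoulli(visible_probs(h0, W, b), rng)
    ph1 = hidden_probs(v1, W, c)
    # Update: data statistics minus one-step reconstruction statistics.
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b[i] += lr * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])
    return v1

# Usage: 3 visible units, 2 hidden units, parameters initialized to zero.
rng = random.Random(0)
W = [[0.0, 0.0] for _ in range(3)]
b = [0.0, 0.0, 0.0]
c = [0.0, 0.0]
reconstruction = cd1_update([1.0, 0.0, 1.0], W, b, c, lr=0.1, rng=rng)
```

In a real training loop this update would be averaged over mini-batches and repeated for many epochs; the sketch only shows the shape of one step.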

Video lecture related to this topic:

“Using Fast Weights to Improve Persistent Contrastive Divergence” by Tijmen Tieleman

