SharpRBM, Restricted Boltzmann Machines in C#/.NET

Years ago I was looking through many of the lectures on GoogleTechTalks and I came across one by Geoffrey E. Hinton about Restricted Boltzmann Machines (RBMs), a very interesting method for training neural networks (NNs) to do things like image categorization, among other interesting tasks.

Anyway, I wanted my own RBM to play with, but I was unable to find any useful C# code on the internet. I put the idea aside, but recently I stumbled across it again and decided to pick it back up.

SharpRBM

I’ve started developing my own implementation of RBMs that I call SharpRBM; it’s hosted at CodePlex. I thought I’d blog about my experiments as I work on my implementation. Right now it can produce image classifiers that work fairly well. I won’t discuss how RBMs work in any detail here; Hinton et al. do a great job of that.

For RBM implementations to be scalable, they should be implemented using something like CUDA, where a GPU is used to train the network in parallel. But for now I just want a proof of concept and something to play around with.

Feature detectors

One funny thing about RBMs is that you don’t train them with labeled data; you train them to create their own feature detectors from the “raw” data. Then you use some other training method (back-prop, GA, whatever really) to match the labels to the constructed feature detectors.
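To make that concrete, here is a minimal sketch of what a single contrastive-divergence (CD-1) training step for a binary RBM might look like. This is only an illustration: the class and its members (MiniRbm, TrainStep and so on) are names I’ve made up for this post, not SharpRBM’s actual API.

    using System;

    // A stripped-down binary RBM with CD-1 training. Weights are stored as
    // [visible, hidden]; each hidden unit's column of weights is one
    // feature detector.
    public class MiniRbm
    {
        private readonly int _visibleCount, _hiddenCount;
        private readonly float[,] _weights;
        private readonly Random _rng = new Random();

        public MiniRbm(int visibleCount, int hiddenCount)
        {
            _visibleCount = visibleCount;
            _hiddenCount = hiddenCount;
            _weights = new float[visibleCount, hiddenCount];
            // Small random initial weights.
            for (int v = 0; v < visibleCount; v++)
                for (int h = 0; h < hiddenCount; h++)
                    _weights[v, h] = (float)(_rng.NextDouble() * 0.02 - 0.01);
        }

        private static float Sigmoid(float x) => 1f / (1f + (float)Math.Exp(-x));

        // Up-pass: how strongly each feature detector responds to an image.
        public float[] HiddenProbabilities(float[] visible)
        {
            var probs = new float[_hiddenCount];
            for (int h = 0; h < _hiddenCount; h++)
            {
                float sum = 0f;
                for (int v = 0; v < _visibleCount; v++)
                    sum += visible[v] * _weights[v, h];
                probs[h] = Sigmoid(sum);
            }
            return probs;
        }

        // Down-pass: run the same weights in reverse.
        public float[] VisibleProbabilities(float[] hidden)
        {
            var probs = new float[_visibleCount];
            for (int v = 0; v < _visibleCount; v++)
            {
                float sum = 0f;
                for (int h = 0; h < _hiddenCount; h++)
                    sum += hidden[h] * _weights[v, h];
                probs[v] = Sigmoid(sum);
            }
            return probs;
        }

        // Turn probabilities into a binary 0/1 state.
        public float[] Sample(float[] probs)
        {
            var sample = new float[probs.Length];
            for (int i = 0; i < probs.Length; i++)
                sample[i] = _rng.NextDouble() < probs[i] ? 1f : 0f;
            return sample;
        }

        // One CD-1 step: measure visible/hidden correlations on the data
        // (positive phase), reconstruct once, measure again on the
        // reconstruction (negative phase), and nudge the weights by the
        // difference.
        public void TrainStep(float[] data, float learningRate)
        {
            float[] hidden0 = HiddenProbabilities(data);
            float[] visible1 = VisibleProbabilities(Sample(hidden0));
            float[] hidden1 = HiddenProbabilities(visible1);
            for (int v = 0; v < _visibleCount; v++)
                for (int h = 0; h < _hiddenCount; h++)
                    _weights[v, h] += learningRate *
                        (data[v] * hidden0[h] - visible1[v] * hidden1[h]);
        }
    }

The feature detector images later in this post are essentially those weight columns drawn as pixels, one image per hidden unit.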

What are feature detectors? Well, one feature detector might recognize a nose when it sees one, others might recognize mouths, eyes and ears. The highest level training, the level using labeled data, might conclude that when the feature detectors “see” eyes, ears, a mouth and a nose together, there’s a face in the picture! Even though the training of the feature detectors had absolutely no knowledge of faces to begin with.

These early experiments will not demonstrate classification using the generated feature detectors.

Reconstructing images from feature detectors

Once the feature detectors are trained, we can show them an image and each feature detector will either come on or not. From this inner state we can reconstruct the image the network “imagines” by using the weights in reverse. This gives you a really good sense of what the RBM thinks of the picture space it’s working with, and how much of it it has understood.
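In code, a reconstruction is just an up-pass followed by a down-pass through the same weights. Using the hypothetical MiniRbm sketch from above (given a trained instance called rbm and an input image):

    // Up-pass: activate the feature detectors from the image, then
    // down-pass: run the same weights in reverse to see what the hidden
    // state "imagines" the image looked like.
    float[] hidden = rbm.HiddenProbabilities(image);
    float[] reconstruction = rbm.VisibleProbabilities(rbm.Sample(hidden));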

The number of feature detectors determines how well the RBM is able to reproduce the image it saw: too few and you’ll get an ugly mixed version; too many and it takes forever to train the network.

I want to see…

The RBM can be allowed to “dream” by repeatedly activating the feature detectors, generating a reconstruction from the internal state of the feature detectors, and then activating the feature detectors again from that reconstruction. If you do this for a while, you’ll see images that the RBM considers probable/plausible. For numbers, this looks really funny, as it dreams about numbers that look like the numbers it has previously seen.
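The dream loop itself is tiny. Sampling a binary hidden state each round keeps the chain wandering instead of settling into a fixed point (same hypothetical MiniRbm as above; RenderToBitmap is a stand-in for whatever display code you have):

    // Seed with any image, then bounce between hidden and visible states.
    float[] state = startImage;
    for (int step = 0; step < 1000; step++)
    {
        float[] hiddenSample = rbm.Sample(rbm.HiddenProbabilities(state));
        state = rbm.VisibleProbabilities(hiddenSample);
        if (step % 50 == 0)
            RenderToBitmap(state); // watch the dream evolve
    }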

[Embedded video from youtube.com]

The real reason I started this is that I want to see the kinds of faces an RBM dreams of when it’s been trained on faces… But we’re not there yet.

This is what my version looks like when it’s trying to train an RBM to create feature detectors for numbers;

image

Tests

For this demo run, I’m using 10 input images that I drew myself using Paint (glorious, glorious MS Paint). Each experiment is run for several hundred generations, until it looks (to me) like it’s converged on a solution. These images are very hard to create feature detectors for because they’re all very dissimilar. A small set of handwritten numbers would be easier, but these tests are meant to probe the limits of feature detectors, so a harder set is better.

Using 2 feature detectors

Using just 2 feature detectors is not enough for the RBM to be able to reconstruct the images. Note how the reconstructions and the feature detectors are mixes of several of the input images;

image

Using 3 feature detectors

Using three is still not enough, but the images are clearer.

image

Using 4 feature detectors

Again, not particularly well reconstructed, but we’re closer.

image

Using 5 feature detectors

Closer, but still no cigar!

image

Using 6 feature detectors

Pretty close, actually! Note how the feature detectors are very chaotic, but the regenerated images aren’t! This is (probably) because the network has been forced to re-use the feature detectors to mean many different things. Recombining them still makes a fairly clean image, because one feature detector can be subtracted from another (by using negative weights) to arrive at a cleaned-up image.

image

Using 7 feature detectors

This one’s actually worse than the previous attempt – which might be due to the fact that I didn’t run it for as long. But it might also be some random thing that happens when the number of feature detectors is so low.

image

Using 8 feature detectors

Oh, we’re so close now! Feature detectors are still chaotic because they’re storing multiple actual features at once.

image


Using 9 feature detectors

Closer still

image

Using 10 feature detectors

Very good reconstruction, but the detectors are still chaotic! I was hoping that by now, we’d have nice clear detectors that identified one picture each, but alas. Not yet. Now we’ll pick up the pace and add more detectors for each experiment. Next up, 15 detectors!

image

10 and then 15 detectors

So, it seems my expectation of specialized feature detectors was incorrect;

image image

50 and then 100 detectors

Still no specialization, and note how the detectors are starting to get washed out. The gray areas of a detector mean that the detector doesn’t care about the region in question. With very few detectors, most pixels in most detectors actually count; with many detectors, some pixels can be left unused. It also makes the detectors duller and duller to look at as you move forward…

image

image

In closing

Very odd indeed: why aren’t my feature detectors specializing? Maybe the first level of RBM nodes doesn’t specialize; maybe that’s for the subsequent levels to do? Also, at the time of this writing, SharpRBM isn’t quite finished: it doesn’t implement biases, which are probably required going forward. That’s something I’ll work on, and some of the results I’m getting might be because of this.
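For reference, here is roughly where biases would slot into the earlier sketch: each unit gets a learned per-unit offset added to its weighted sum (again an illustration with made-up names, not SharpRBM code):

    // Hidden activation with a bias term; visible units would get a
    // matching visibleBias on the down-pass. During CD-1 the biases are
    // nudged much like the weights, e.g.
    //   hiddenBias[h] += learningRate * (hidden0[h] - hidden1[h]);
    float HiddenActivation(float[] visible, float[,] weights, float[] hiddenBias, int h)
    {
        float sum = hiddenBias[h]; // the unit's default tendency to switch on
        for (int v = 0; v < visible.Length; v++)
            sum += visible[v] * weights[v, h];
        return 1f / (1f + (float)Math.Exp(-sum));
    }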

[Edit: Biases have been added and they didn’t do a thing for specialization]

More experimenting to follow.

[Edit: The current thinking is that the lack of specialization is due to the fact that this is the first layer, adding another layer might make things more agreeable! I’m working on a second layer, which should be simple enough, if I can just find the time]


5 Responses to SharpRBM, Restricted Boltzmann Machines in C#/.NET

  1. Kyle Berry says:

    Hello,
    I started a discussion about an issue I am having on your http://sharprbm.codeplex.com/ site. I’m not sure if you get emailed about it, but you will probably receive this message. If you don’t mind, please take a look at it and tell me what you think.

    Thanks,
    Kyle

  2. Jeff Haynes says:

    I’ve just started looking at SharpRBM, but at first glance there are a number of places where you appear to create variables and then not use them (and I wonder if these are bugs). You might consider going over your code with ReSharper or something similar and seeing if there are some obvious places where this is happening.

  3. Magnus says:

    Hi,

    Nice code! However, I’m wondering about this line:
    _reconstructedModel = new float[weights.LowerLayerSize];
    Shouldn’t that have the size of UpperLayerSize, since in the code you are referencing upperValues[upper] (_reconstructedModel[upper])?

    Regards
    Magnus

    • mfagerlund says:

      Don’t remember off the top of my head, but when you go through the levels, the upper layer is the lower layer for the next layer over. Clear as mud 😉
