SharpRBM, Restricted Boltzmann Machines in C#/.NET
September 14, 2010
Years ago I was looking through lectures on GoogleTechTalks and came across one by Geoffrey E. Hinton about Restricted Boltzmann Machines (RBMs), a very interesting method for training neural networks (NNs) to do things like image categorization, among other interesting tasks.
Anyway, I wanted my own RBM to play with, but I was unable to find any useful C# code on the internet. I put the idea aside, but recently I stumbled across it again and decided to pick it back up.
I’ve started developing my own implementation of RBMs that I call SharpRBM; it’s hosted at CodePlex. I thought I’d blog about my experiments as I work on my implementation. Right now it can produce image classifiers that work fairly well. I won’t discuss how RBMs work in any detail here; Hinton et al. do a great job of that.
For RBM implementations to be scalable, they should be implemented using something like CUDA, where a GPU trains the network in parallel, but for now I just want a proof of concept and something to play around with.
One funny thing about RBMs is that you don’t train them with labeled data; you train them to create their own feature detectors from the “raw” data. Then you use any other training method (back-prop, GA, whatever really) to match the labels to the constructed feature detectors.
What are feature detectors? Well, one feature detector might recognize a nose when it sees one, others might recognize mouths, eyes and ears. The highest level training, the level using labeled data, might conclude that when the feature detectors “see” eyes, ears, a mouth and a nose together, there’s a face in the picture! Even though the training of the feature detectors had absolutely no knowledge of faces to begin with.
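For the curious, the unsupervised training procedure Hinton describes is called contrastive divergence. SharpRBM itself is C#, but here’s a minimal numpy sketch of one CD-1 weight update; the function name `cd1_step` and all parameters are my own for illustration, and like the early SharpRBM it omits bias terms:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, v0, lr=0.1):
    """One contrastive-divergence (CD-1) update on a batch of visible vectors v0."""
    # positive phase: drive the hidden "feature detectors" from the raw data
    h0_prob = sigmoid(v0 @ W)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # negative phase: reconstruct the visibles, then the hidden probabilities
    v1_prob = sigmoid(h0 @ W.T)
    h1_prob = sigmoid(v1_prob @ W)
    # nudge weights toward the data correlations, away from the model's own
    W = W + lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / len(v0)
    return W
```

Note that no labels appear anywhere in the update; the network only ever sees the raw images.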
These early experiments will not demonstrate classification using the generated feature detectors.
Reconstructing images from feature detectors
Once the feature detectors are trained, we can show them an image and each feature detector will either come on or not. From this inner state we can reconstruct the image the network “imagines” by using the weights in reverse. This gives you a really good sense of what the RBM thinks of the picture space it’s working with, and how much of it it has understood.
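As a sketch (in Python rather than C#, with names of my own choosing), “using the weights in reverse” just means multiplying the binary hidden state by the transpose of the weight matrix:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruct(W, v, threshold=0.5):
    """Show an image to the feature detectors, then run the weights in reverse."""
    h = sigmoid(v @ W) > threshold          # each detector either comes on or not
    return sigmoid(h.astype(float) @ W.T)   # per-pixel "imagined" intensities
```

The reconstruction comes back as per-pixel probabilities, which is what makes the blended, ghostly images below possible.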
The number of feature detectors determines how well the RBM is able to reproduce the image it saw: use too few and you’ll get an ugly mixed version; use too many and it takes forever to train the network.
I want to see…
The RBMs can be allowed to “dream” by repeatedly activating the feature detectors, generating a reconstruction from the internal state of the feature detectors, and then activating the feature detectors again from the reconstruction. If you do this for a while, you’ll see images that the RBM considers probable/plausible. For numbers, this looks really funny as it dreams about numbers that look like numbers it has previously seen.
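Sketched in Python (again with hypothetical names, not SharpRBM’s actual API), dreaming is just alternating sampling between the visible and hidden layers:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dream(W, v, steps=50):
    """Bounce activations between the layers; return the final 'dream' image."""
    v_prob = v
    for _ in range(steps):
        h_prob = sigmoid(v @ W)                                # activate detectors
        h = (rng.random(h_prob.shape) < h_prob).astype(float)  # sample on/off
        v_prob = sigmoid(h @ W.T)                              # reconstruct pixels
        v = (rng.random(v_prob.shape) < v_prob).astype(float)  # sample an image
    return v_prob  # the probabilities make a smoother image than the binary sample
```

Seed it with noise, let it run, and the images it settles into are the ones the network considers plausible.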
The real reason I started this is that I want to see the kinds of faces an RBM dreams of when it’s been trained on faces… But we’re not there yet.
This is what my version looks like when it’s trying to train an RBM to create feature detectors for numbers:
For this demo run, I’m using 10 input images that I drew myself in Paint, glorious glorious MS Paint. Each experiment runs for several hundred generations, until it looks (to me) like it has converged on a solution. These images are very hard to create feature detectors for because they’re all very dissimilar. A small set of handwritten digits would be easier, but these tests are meant to probe the limits of feature detectors, so a harder set is better.
Using 2 feature detectors
Using just 2 feature detectors is not enough for the RBM to be able to reconstruct the images. Note how the reconstructions and the feature detectors are mixes of several of the input images:
Using 3 feature detectors
Using three is still not enough, but the images are clearer.
Again, not particularly well reconstructed, but we’re closer.
Using 5 feature detectors
Closer, but still no cigar!
Using 6 feature detectors
Pretty close, actually! Note how the feature detectors are very chaotic, but the regenerated images aren’t! This is (probably) because the network has been forced to re-use the feature detectors to mean many different things. Recombining them still makes a fairly clean image, because one feature detector can be subtracted from another (via negative weights) to cancel out the noise.
Using 7 feature detectors
This one’s actually worse than the previous attempt, which might be because I didn’t run it for as long. But it might also be some random effect of having so few feature detectors.
Using 8 feature detectors
Oh, we’re so close now! Feature detectors are still chaotic because they’re storing multiple actual features at once.
Using 9 feature detectors
Using 10 feature detectors
Very good reconstruction, but the detectors are still chaotic! I was hoping that by now, we’d have nice clear detectors that identified one picture each, but alas. Not yet. Now we’ll pick up the pace and add more detectors for each experiment. Next up, 15 detectors!
10 and then 15 detectors
So, it seems my expectation of specialized feature detectors was incorrect:
50 and then 100 detectors
Still no specialization, and note how the detectors are starting to get washed out. The gray areas of a detector mean that the detector doesn’t care about the region in question. With very few detectors, most pixels in most detectors actually count; with many detectors, some pixels can be left unused. It also makes the detectors duller and duller to look at as you move forward…
Very odd indeed; why aren’t my feature detectors specializing? Maybe the first level of RBM nodes doesn’t specialize; maybe that’s for the subsequent levels to do? Also, at the time of this writing, SharpRBM isn’t quite finished: it doesn’t implement biases, which are probably required going forward. That’s something I’ll work on, and some of the results I’m getting might be because of this.
[Edit: Biases have been added and they didn’t do a thing for specialization]
More experimenting to follow.
[Edit: The current thinking is that the lack of specialization is due to the fact that this is the first layer, adding another layer might make things more agreeable! I’m working on a second layer, which should be simple enough, if I can just find the time]