Deep Learning - Exercise 13

Learning goal: Get a better intuition for hierarchical (deep) unsupervised learning

Recap

In exercise 11 the task was to learn features (synonyms: filters, neurons, ...) in an unsupervised fashion.

We saw that Hebb's rule alone was not enough for this task. However, we were able to achieve unsupervised learning of features with three ingredients (a minimal code sketch follows the list):

  1. an appropriate update mechanism for each feature. Here we chose a cumulative moving average to make sure that, on the one hand, each feature is influenced by all the image patches it has seen before and, on the other hand, becomes more and more robust (adapts slower and slower).
  2. a winner-takes-all adaptation mechanism. In each step we chose the filter that was most similar to the input image patch and adapted it slightly towards the input pattern, while all the other filters remained unchanged.
  3. an initialization phase which makes sure that all the filters start from "natural" image patches (input patterns) before learning begins. We saw that it is crucial to start with "natural" image patches; otherwise one feature will win again and again and will be the only one that is ever adapted, resulting in a useless feature (filter) bank.
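To make the recap concrete, here is a minimal sketch of one such update step. The names (`filters`, `counts`, `patch`) and the dot product as similarity measure are my own choices; your solution from exercise 11 may differ in detail.

```python
import numpy as np

def wta_cma_update(filters, counts, patch):
    """One winner-takes-all step with a cumulative moving average.

    filters : (K, D) array, one row per feature (initialized with "natural" patches)
    counts  : (K,)   array, how often each feature has won so far
    patch   : (D,)   flattened input image patch
    """
    # Ingredient 2: only the most similar feature (the "winner") is adapted.
    k = int(np.argmax(filters @ patch))
    # Ingredient 1: cumulative moving average -- the winner moves towards the
    # patch with step size 1/n, so it adapts slower and slower over time.
    counts[k] += 1
    filters[k] += (patch - filters[k]) / counts[k]
    return k
```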

Stacking unsupervised feature learners

Now let's try to understand whether there is a benefit to stacked, hierarchical feature learning.

For this, implement a base class FeatureLearner that encapsulates exactly the functionality we developed in exercise 11: a single layer of unsupervised feature learning.
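One possible interface for this base class is sketched below. The method names and the `input_dim` parameter are my suggestions, not something prescribed by the exercise; the update logic is the one recapped above.

```python
import numpy as np

class FeatureLearner:
    """A single layer of unsupervised feature learning (exercise 11 style)."""

    def __init__(self, n_features, input_dim, rng=None):
        self.rng = rng if rng is not None else np.random.default_rng()
        self.filters = np.zeros((n_features, input_dim))
        self.counts = np.zeros(n_features)

    def initialize(self, inputs):
        """Ingredient 3: start every filter from a randomly drawn "natural" input.

        inputs : sequence of flattened input vectors (e.g. image patches)
        """
        idx = self.rng.choice(len(inputs), size=len(self.filters), replace=False)
        self.filters = np.array([inputs[i] for i in idx], dtype=float)

    def train_step(self, x):
        """Ingredients 1 + 2: winner-takes-all update with a cumulative moving average."""
        k = int(np.argmax(self.filters @ x))
        self.counts[k] += 1
        self.filters[k] += (x - self.filters[k]) / self.counts[k]

    def transform(self, x):
        """Feature responses for one input vector (the input for the next layer)."""
        return self.filters @ x
```

Because the class only assumes a flat input vector, the same code can learn on raw image patches (layer 1) or on feature responses coming from the layer below (layer 2 and up).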

Then use this class FeatureLearner to build a model in which several of these layers are stacked on top of each other: layer i gets its input from the features (learned unsupervisedly) of layer i-1 below.

Further, let's copy a principle from Convolutional Neural Networks: increase the receptive field (filter size) of the features when going from layer i-1 to layer i, so that features learned in higher layers "see" a larger portion of the image and can represent higher-level object features.
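One possible way to wire two such layers together, reusing the FeatureLearner sketch above: layer 1 sees 8x8 image patches, and layer 2 sees a 2x2 grid of neighbouring layer-1 responses, i.e. an effective 16x16 receptive field on the image. The concrete sizes are just examples, not a requirement.

```python
import numpy as np

K1, K2 = 16, 32
layer1 = FeatureLearner(n_features=K1, input_dim=8 * 8)
layer2 = FeatureLearner(n_features=K2, input_dim=2 * 2 * K1)

def layer2_input(image, y, x):
    """Layer-2 input for the 16x16 region with top-left corner (y, x):
    the concatenated layer-1 responses of its four 8x8 sub-patches."""
    responses = []
    for dy in (0, 8):
        for dx in (0, 8):
            patch = image[y + dy:y + dy + 8, x + dx:x + dx + 8].ravel()
            responses.append(layer1.transform(patch))
    return np.concatenate(responses)

# Training would then proceed greedily, layer by layer: first initialize and
# train layer1 on 8x8 patches, then (with layer1 frozen) initialize and train
# layer2 on layer2_input(...) vectors.
```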

To evaluate experimentally whether stacking layers is beneficial or not, here is my first idea (a sketch of the comparison follows the list):

  • Download two types of videos for training: plane (spotting) vs. ship (spotting) videos. Also download some plane and ship videos that you do not use during training and set them aside for evaluation.
  • Train a shallow unsupervised feature learner (only 1 layer of unsupervised feature learning) and connect the learned features to a simple classifier (e.g. a Perceptron classifier).
  • Train a deep unsupervised feature learner (several layers of unsupervised feature learning) and connect the features of the last layer to a simple classifier (e.g. a Perceptron classifier).
  • Using the evaluation videos, randomly choose frames from the ship and plane videos (so you know the ground truth, i.e. whether a ship or a plane is visible in the frame) and compare the classification rate of the shallow vs. the deep unsupervised feature learner. Key question: is there a benefit to stacking unsupervised feature learners?
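A rough sketch of this comparison is given below. It assumes that each frame has already been turned into one descriptor vector (for instance by averaging the top-layer feature responses over all patches of the frame) and that scikit-learn's Perceptron is used as the simple classifier; both choices are only examples.

```python
import numpy as np
from sklearn.linear_model import Perceptron

def classification_rate(train_X, train_y, test_X, test_y):
    """Fit the simple classifier on frame descriptors and return its accuracy.

    train_X / test_X : (N, D) arrays of frame descriptors, produced either by
                       the shallow (1-layer) or the deep (stacked) learner
    train_y / test_y : labels, e.g. 0 = plane frame, 1 = ship frame
    """
    clf = Perceptron()
    clf.fit(train_X, train_y)
    return clf.score(test_X, test_y)

# Key comparison -- same frames and labels, only the descriptors differ:
# acc_shallow = classification_rate(Xtr_shallow, y_train, Xte_shallow, y_test)
# acc_deep    = classification_rate(Xtr_deep,    y_train, Xte_deep,    y_test)
```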

Note

This last exercise is highly experimental. I have not solved it myself, so I do not know what the result will be. However, it will be exciting to see whether we can observe a benefit of deep unsupervised feature learning here as well!