Deformable Parts Model

Original Paper

Felzenszwalb, P. F., Girshick, R. B., McAllester, D., and Ramanan, D.
Object Detection with Discriminatively Trained Part Based Models.
PAMI, vol. 32, no. 9, pp. 1627-1645, September 2010

pdf

Paper website

How does it work?

The paper gives a very generic and formal description of DPMs.

The OpenCV documentation contains the following description text:


The object detector described below has been initially proposed by P.F. Felzenszwalb in [Felzenszwalb2010]. It is based on a Dalal-Triggs detector that uses a single filter on histogram of oriented gradients (HOG) features to represent an object category. This detector uses a sliding window approach, where a filter is applied at all positions and scales of an image. The first innovation is enriching the Dalal-Triggs model using a star-structured part-based model defined by a “root” filter (analogous to the Dalal-Triggs filter) plus a set of parts filters and associated deformation models. The score of one of star models at a particular position and scale within an image is the score of the root filter at the given location plus the sum over parts of the maximum, over placements of that part, of the part filter score on its location minus a deformation cost measuring the deviation of the part from its ideal location relative to the root. Both root and part filter scores are defined by the dot product between a filter (a set of weights) and a subwindow of a feature pyramid computed from the input image. Another improvement is a representation of the class of models by a mixture of star models. The score of a mixture model at a particular position and scale is the maximum over components, of the score of that component model at the given location.
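To make the scoring rule in the quote above more concrete, here is a minimal NumPy sketch of it. It is only an illustration under simplifying assumptions, not the OpenCV or the authors' implementation: root and parts share one feature map at a single scale, the deformation cost is a plain quadratic with made-up weights `(cy, cx)`, and the names `filter_response`, `star_score` and `mixture_score` are hypothetical.

```python
import numpy as np

def filter_response(features, filt):
    """Dot product of a filter with every same-sized subwindow of a feature map.

    features: (H, W, D) feature map (e.g. HOG cells), filt: (h, w, D) weights.
    Returns an (H - h + 1, W - w + 1) map of filter scores.
    """
    H, W, _ = features.shape
    h, w, _ = filt.shape
    out = np.empty((H - h + 1, W - w + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(features[y:y + h, x:x + w] * filt)
    return out

def star_score(features, root, parts, ry, rx):
    """Score of one star model with its root filter placed at (ry, rx).

    parts: list of (filt, (oy, ox), (cy, cx)); (oy, ox) is the ideal offset
    of the part relative to the root, (cy, cx) are assumed quadratic
    deformation weights.
    """
    h, w, _ = root.shape
    # Root filter score at the given location.
    score = np.sum(features[ry:ry + h, rx:rx + w] * root)
    for filt, (oy, ox), (cy, cx) in parts:
        resp = filter_response(features, filt)
        ys, xs = np.mgrid[0:resp.shape[0], 0:resp.shape[1]]
        # Deformation cost grows with the distance from the ideal placement.
        cost = cy * (ys - (ry + oy)) ** 2 + cx * (xs - (rx + ox)) ** 2
        # Maximum over placements of (part filter score - deformation cost).
        score += np.max(resp - cost)
    return score

def mixture_score(features, components, ry, rx):
    """Maximum over components (root, parts) of the star-model score."""
    return max(star_score(features, root, parts, ry, rx)
               for root, parts in components)
```

The brute-force maximum over all placements keeps the sketch short. The paper computes the same quantity efficiently with a generalized distance transform, and it places the part filters at twice the resolution of the root filter; both are omitted here.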

In the OpenCV Adventure blog you can find this explanation:


Latent SVM is a system built to recognize objects by matching both 1. the HOG models, which consist of the 'whole' object and a few of its 'parts', and 2. the position of parts. The learned positions of object-parts and the 'exact' position of the whole object are the Latent Variables. The 'exact' position is with regard to the annotated bounding box from the input image. As an example, a human figure could be modeled by its outline-shape (whole-body head-to-toe) together with its parts (head, upper-body, left arm, right arm, left lower limb, right lower limb, feet).

The HOG descriptor for the whole body is the Root Filter and those for the body parts are the Parts Filters.

The target function is the best response by scanning a window over an image. The responses consist of the outputs from all the filters. The search for the best match is done in a multi-scale image pyramid. The classifier is trained iteratively using a coordinate-descent method by holding some components constant while training the others. The components are Model Parameters (Filters Positions, Sizes), weight coefficients and error constants. The iteration process is a bit complicated - so much to learn! One important thing to note is that the positive samples are composed by moving the parts around within an allowable distance. There is a set of latent variables for this (size of the movable-region, center of all the movable-regions, quadratic loss function coefficients). Being able to consider the 'movable' parts is what I think being 'deformable' means.
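As a small illustration of the multi-scale search mentioned in the quote, the sketch below lists the scale factors of an image pyramid. It assumes the common scheme of a fixed number of levels per octave (each octave halves the image size); the function name `pyramid_scales` and the parameters `min_size` and `interval` are hypothetical and only serve this example.

```python
def pyramid_scales(height, width, min_size, interval=10):
    """Scale factors 2**(-i / interval) for an image pyramid.

    Levels are added until the shorter image side, after scaling,
    would drop below min_size pixels. `interval` is the assumed
    number of pyramid levels per octave.
    """
    scales = []
    i = 0
    while True:
        s = 2.0 ** (-i / interval)
        if min(height, width) * s < min_size:
            break
        scales.append(s)
        i += 1
    return scales

# Usage: every scale would get its own feature map, and the filters are
# then slid over each of these maps.
for s in pyramid_scales(480, 640, min_size=64, interval=5):
    print(f"scale {s:.3f} -> {int(480 * s)} x {int(640 * s)}")
```

A finer `interval` gives more levels per octave and therefore a denser search over object sizes, at the price of more feature maps to compute.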

Video explanations


57:14 min intro by Pedro Felzenszwalb

Reference Code

OpenCV

You can use the OpenCV Latent SVM implementation for testing the approach.

If you configure OpenCV with cmake so that the examples are built as well, you will find an example binary:

opencv\bin\Release\cpp-example-latentsvm_multidetect.exe

You have to call it like this:

cpp-example-latentsvm_multidetect.exe <imgfolder> <modelfolder>

The various pre-trained part-based models have to be downloaded separately (!) from the OpenCV extra git repository. Unfortunately, they are not included in the normal OpenCV git repository.

But if I call it with the car and person models, I get very poor results. Here is an example:

LibPaBOD

LibPaBOD provides significantly better results, but only comes with one pre-trained object model (upper body part).

I only tested the binary. Here are the results for the same picture:

Other implementations (interesting, but not yet tested by myself)

 
public/deformable_parts_model_dpm.txt · Last modified: 2013/12/25 23:02 (external edit)