This post is part of a series on developing an SVM classifier for object detection:
- Part 1: SVMs, HOG features, and feature extraction
- Part 2: Sliding window technique and heatmaps
- Part 3: Feature descriptor code and OpenCV vs scikit-image HOG functions
- Part 4: Training the SVM classifier
- Part 5: Implementing the sliding window search
- Part 6: Heatmaps and object identification
The previous posts provided a high-level overview of SVMs (support vector machines), HOG (histogram of oriented gradients) features, and the basic algorithm the object detection pipeline uses to find objects in images via a sliding window search and heatmap. If you’re not familiar with those topics, read those posts first. In this post, we’ll begin to look at the actual code, which is available on GitHub along with a readme that summarizes the project and explains how to use the pipeline.
I won’t go through every line of code, as that would be excessive, but I will try to touch on key implementation details.
Prerequisites
I’ll assume that you already have Python 3 and OpenCV installed. Installing OpenCV is not trivial, so read the documentation and/or check out this helpful post by Adrian Rosebrock for instructions.
You’ll need the following Python packages: numpy, scikit-image 0.14+, scikit-learn, and scipy. As of this writing, scikit-image 0.14 is the development version (not released). These are the instructions for installing it. The previous versions of scikit-image don’t support multichannel images for HOG feature extraction. If you don’t plan to use multichannel images or you don’t plan to use the scikit-image HOG function (OpenCV also has this functionality, as we’ll see below), the stable release of scikit-image should be fine.
This post also assumes some basic familiarity with Python and object-orientation in Python (classes, etc.).
Feature descriptor
HOG features
The features we extract from the images form the basis of the object detection
pipeline, so let’s begin by looking at descriptor.py, a file that defines a
Descriptor class. This allows us to create a Descriptor object that stores
information about the features we wish to extract. Any time we want to obtain a
feature vector from an image, we can simply pass it to the object’s
getFeatureVector() method, which returns the feature vector.
Referring to descriptor.py, you’ll notice that the first item in this class
definition is another class definition for a nested class called
_skHOGDescriptor. The leading underscore in the name indicates that the class
is intended for internal use. This class simply serves as a wrapper for the
scikit-image HOG function, skimage.feature.hog(), which takes our desired HOG
parameters and returns the HOG features.
In the nested class, we store these parameters as instance variables (in the
__init__() method on lines 20-29), then define a method named compute() on
line 31 that takes an image and feeds the stored HOG parameters to
skimage.feature.hog(), which returns the feature vector as a 1D numpy array.
For example, if the HOG feature vector contained 1000 features, the shape of
that array would be (1000,). Before returning the HOG feature vector on line 38,
we add a dimension to this 1D array via np.expand_dims() to make it a 2D array.
For our 1000-element feature vector, that gives the array a shape of (1000, 1):
the same number of elements, but an extra dimension.
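To make the shape bookkeeping concrete, here is a minimal sketch of such a wrapper. It is not the code from descriptor.py: the class name SkHOGDescriptor, the parameter defaults, and the random 64x64 single-channel test image are all assumptions chosen for illustration.

```python
import numpy as np
from skimage.feature import hog


class SkHOGDescriptor:
    """Hypothetical stand-in for the nested wrapper class described above."""

    def __init__(self, orientations=9, pixels_per_cell=(8, 8),
                 cells_per_block=(2, 2), block_norm="L2-Hys"):
        # Store the HOG parameters once so compute() can reuse them per image.
        self.orientations = orientations
        self.pixels_per_cell = pixels_per_cell
        self.cells_per_block = cells_per_block
        self.block_norm = block_norm

    def compute(self, image):
        # skimage.feature.hog() returns a flat (1D) feature vector by default.
        features = hog(image,
                       orientations=self.orientations,
                       pixels_per_cell=self.pixels_per_cell,
                       cells_per_block=self.cells_per_block,
                       block_norm=self.block_norm)
        # Add a trailing axis: shape (n,) becomes (n, 1).
        return np.expand_dims(features, axis=1)


rng = np.random.default_rng(0)
image = rng.random((64, 64))  # assumed image size, single channel
features = SkHOGDescriptor().compute(image)
# 7 x 7 blocks * 2 x 2 cells per block * 9 orientations = 1764 features
print(features.shape)
```

With these assumed settings the printed shape has 1764 rows and a single column, so any code downstream can treat the result as a 2D column vector.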
What’s the point of this? Aside from scikit-image, OpenCV also provides a way
to compute the HOG features of an image via the cv2.HOGDescriptor class. A
cv2.HOGDescriptor object is instantiated with the desired HOG parameters, and
it possesses a compute() method that takes an image and returns the HOG feature
vector in the form of a 2D array.
My program gives the user the opportunity to choose whether to use the
scikit-image HOG implementation or the OpenCV HOG implementation. By wrapping
the scikit-image HOG function in a nested class, we can instantiate a single HOG
descriptor object later based on whichever implementation the user selects,
e.g., HOGDescriptor = _skHOGDescriptor() for scikit-image or
HOGDescriptor = cv2.HOGDescriptor() for OpenCV. We can then use the resulting
HOGDescriptor object the same way regardless of the library chosen. This can be
seen on lines 88-114 of descriptor.py.
What’s the difference between the two implementations? For starters, they take different parameters, and the OpenCV implementation more closely follows the original technique by Dalal and Triggs. Documentation for the OpenCV HOG object is poor, and documentation for the Python version (as opposed to C++) is nonexistent. For a concise and informative explanation of the OpenCV version’s input arguments, see this post by Satya Mallick. Note that, in the OpenCV implementation, block size is in pixels (not cells), and the block stride for block normalization must be set, also in pixels.
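Because the pixel-based units are easy to get wrong, it can help to work out the expected feature count by hand. The sketch below is plain arithmetic, not an OpenCV call: the helper names are hypothetical, and it assumes the window, block, stride, and cell sizes are all given as (width, height) in pixels and divide evenly.

```python
def block_size_in_pixels(cells_per_block, pixels_per_cell):
    """Convert a block size given in cells into a block size in pixels."""
    return (cells_per_block[0] * pixels_per_cell[0],
            cells_per_block[1] * pixels_per_cell[1])


def hog_descriptor_length(win_size, block_size, block_stride, cell_size, nbins):
    """Expected number of HOG features for one window; all sizes in pixels."""
    # Number of block positions along each axis as the block slides by its stride.
    blocks_x = (win_size[0] - block_size[0]) // block_stride[0] + 1
    blocks_y = (win_size[1] - block_size[1]) // block_stride[1] + 1
    # Each block contributes one histogram of nbins values per cell.
    cells_per_block = ((block_size[0] // cell_size[0])
                       * (block_size[1] // cell_size[1]))
    return blocks_x * blocks_y * cells_per_block * nbins


# A 2 x 2-cell block with 8 x 8-pixel cells is a 16 x 16-pixel block.
block_size = block_size_in_pixels((2, 2), (8, 8))
length = hog_descriptor_length(win_size=(64, 64), block_size=block_size,
                               block_stride=(8, 8), cell_size=(8, 8), nbins=9)
print(block_size, length)  # (16, 16) 1764
```

A block stride of one cell (8 pixels here) gives overlapping blocks, which is why a 64x64 window yields 7 block positions per axis rather than 4.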
Three other major differences between the OpenCV and scikit-image HOG implementations (and ones which I believe have the greatest practical impact for our purposes) are as follows:
- OpenCV HOG permits the use of both signed and unsigned gradients; scikit-image HOG is limited to unsigned gradients.
- OpenCV HOG does not support 2-channel images (but 1-channel, 3-channel, and 4-channel are fine). scikit-image 0.14 supports multichannel images with any number of channels.
- OpenCV’s HOG implementation is blazing fast compared to scikit-image’s, even though the latter is written with Cython. During development and testing, I found OpenCV HOG to be 4 to 5 times faster than its scikit-image counterpart.
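For readers unsure what the signed/unsigned distinction means in practice, here is a small illustration. It is not taken from either library: the function gradient_orientation is a hypothetical helper, and it only shows how the two conventions map a gradient direction to an angle before it is binned into a histogram.

```python
import math


def gradient_orientation(gx, gy, signed):
    """Orientation of a gradient (gx, gy) in degrees.

    Unsigned gradients fold opposite directions together (0-180 degrees);
    signed gradients keep them apart (0-360 degrees).
    """
    angle = math.degrees(math.atan2(gy, gx))  # in the range (-180, 180]
    return angle % 360.0 if signed else angle % 180.0


# Two gradients pointing in exactly opposite directions, e.g. a dark-to-light
# edge and the same edge with the brightness reversed.
up_right = (1.0, 1.0)
down_left = (-1.0, -1.0)

unsigned_a = gradient_orientation(*up_right, signed=False)
unsigned_b = gradient_orientation(*down_left, signed=False)
signed_a = gradient_orientation(*up_right, signed=True)
signed_b = gradient_orientation(*down_left, signed=True)

# Unsigned: both land at roughly 45 degrees. Signed: roughly 45 vs. 225.
print(unsigned_a, unsigned_b, signed_a, signed_b)
```

With unsigned gradients the two edges fall into the same histogram bin, whereas signed gradients keep them apart, which only matters if the direction of the contrast carries information about the object.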
Color channel histogram features and spatial features
Moving on, we obtain color channel histogram features on lines 141-146 and
spatial features on lines 148-152 in the Descriptor class’s getFeatureVector()
method.
Note how we make use of np.histogram() for the histogram of each channel, being
sure to specify range=(0, 255) since it’s assumed we’re working with 8-bit
images. For spatial features, the image is resized, then flattened with
spatial_image.ravel(). We utilize np.hstack() to append features to the feature
vector array.
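The following sketch shows how these pieces can fit together. It is an approximation rather than the getFeatureVector() code itself: the function name, the bin count, and the test image are assumptions, and strided subsampling stands in for a proper image resize so that the example needs only numpy.

```python
import numpy as np


def color_and_spatial_features(image, hist_bins=16, spatial_step=4):
    """Hypothetical helper: per-channel histograms plus downsampled pixels."""
    feature_vector = np.array([])

    # One histogram per color channel over the full 8-bit range.
    for channel in range(image.shape[2]):
        hist, _ = np.histogram(image[:, :, channel],
                               bins=hist_bins, range=(0, 255))
        feature_vector = np.hstack((feature_vector, hist))

    # Stand-in for resizing: keep every spatial_step-th pixel in each direction.
    spatial_image = image[::spatial_step, ::spatial_step, :]
    # Flatten the small image and append its raw pixel values.
    feature_vector = np.hstack((feature_vector, spatial_image.ravel()))
    return feature_vector


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)  # assumed input
features = color_and_spatial_features(image)
# 3 channels * 16 bins + 16 * 16 * 3 spatial values = 816 features
print(features.shape)  # (816,)
```

Passing the range explicitly keeps the bin edges identical for every image; without it, np.histogram() would derive the edges from each image’s own minimum and maximum, and the resulting features would not be comparable across images.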
That about sums up the feature descriptor. The next post will examine the
functions that extract features (via the Descriptor object discussed in this
post) from our sample images and train the SVM classifier.