We have discussed color-based features (Hue-Saturation histograms) and shape-based features (HOG features) as significant and robust features for classifying different species of wasps.
Up until now, however, we have used only global color histograms and ignored local information. Today we experiment with the idea of spatially connected blocks, as used in HOG features.
The idea is simple. Instead of using the pixel counts from the entire image for our histogram, we take local "blocks" of the image and extract the color features from each of these blocks. Furthermore, we can let these blocks overlap to capture information about the connectedness of these regions.
In our implementation, we slide blocks only horizontally, since the wasps are already oriented horizontally and cropped to a small region of interest (as discussed in the last blog post). We reuse our existing color-histogram implementation (a 2-D histogram with 30 hue bins and 32 saturation bins) to compute the feature for each block.
The features computed at each block are then concatenated into a single feature vector that captures the spatial layout of the wasp's colors.
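To make the block scheme concrete, here is a minimal Python/OpenCV sketch of the feature extraction. The helper name, the normalization step, and the OpenCV calls are illustrative assumptions rather than our exact implementation, and the sketch assumes all images share the same width so every feature vector has the same length.

```python
import cv2
import numpy as np

def block_color_histogram(image_bgr, window_length, window_stride,
                          hue_bins=30, sat_bins=32):
    """Slide a horizontal window across the image and concatenate the
    Hue-Saturation histogram of each block into one feature vector."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    width = hsv.shape[1]
    features = []
    for x in range(0, width - window_length + 1, window_stride):
        block = hsv[:, x:x + window_length]
        # 2-D histogram: 30 hue bins x 32 saturation bins, matching our
        # existing global color-histogram setup (OpenCV hue range is 0-180).
        hist = cv2.calcHist([block], [0, 1], None,
                            [hue_bins, sat_bins],
                            [0, 180, 0, 256])
        cv2.normalize(hist, hist)  # normalization choice is an assumption
        features.append(hist.flatten())
    return np.concatenate(features)
```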
Performance improved greatly (by roughly 10 percentage points) and is at an all-time high of about 97%!
The model was trained and validated with 10-fold cross-validation for a variety of windowLength and windowStride values. The average accuracies are listed below, followed by some visual examples of how the images are divided into blocks.
| windowLength | windowStride | Average accuracy |
| --- | --- | --- |
| image.length/2 | image.length/2 | 0.904891 (33.3/36.8) |
| image.length/2 | image.length/4 | 0.932065 (34.3/36.8) |
| image.length/4 | image.length/8 | 0.970109 (35.7/36.8) |
| image.length/8 | image.length/16 | 0.964674 (35.5/36.8) |
| image.length/4 | image.length/6 | 0.953804 (35.1/36.8) |
[Figure: example images divided into blocks for windowLength = image.length/2 with windowStride = image.length/2; windowLength = image.length/2 with windowStride = image.length/4; and windowLength = image.length/4 with windowStride = image.length/8.]
A variety of other values were tried for windowLength and windowStride, but none outperformed a window length of image.length/4 with a stride of image.length/8. Here is the confusion matrix for the best-performing setup.
Average accuracy: 0.970109 (35.7/36.8)
*Note: the number of test images for each class is one-tenth of the actual number of samples in the database, since 10-fold cross-validation holds out one-tenth of the data per fold.
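For reference, the evaluation can be reproduced with something like the scikit-learn sketch below. The SVM classifier, the PCA step, and the component count are stand-ins, since this post does not spell out the model or the projection, and `images`/`labels` are assumed placeholders for our dataset.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Assumed placeholders: `images` holds the cropped wasp images and
# `labels` their species labels; window sizes match the best setup above.
window_length = images[0].shape[1] // 4
window_stride = images[0].shape[1] // 8
X = np.array([block_color_histogram(img, window_length, window_stride)
              for img in images])
y = np.array(labels)

# PCA stands in for the lower-dimensional projection mentioned later in
# this post; fitting it inside the pipeline keeps each fold leak-free.
model = make_pipeline(PCA(n_components=50), SVC())
scores = cross_val_score(model, X, y, cv=10)
print("Average accuracy: %.6f" % scores.mean())
```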
Goals for next week:
- It would be interesting to automate the classifier over a range of window and stride lengths to determine the optimal values, but each run is expensive (~1 minute to extract features from all images and project them to a lower-dimensional space, and ~10 seconds to train the model); a sketch of such a sweep follows this list.
- Another idea is to stop limiting ourselves to horizontal blocks and experiment with vertical strides as well.
- We need to begin testing our approach on the full set of classes (35 classes) as opposed to only the 11 classes we have been working on. We have held off so far because several of those classes lack training data.
- HOG features still capture important shape information that color histograms do not. We can continue looking into how to combine the two feature types.
- Ultimately, future work depends on how the project can scale to the raw images (with wings). While we most likely don't have time to work on this aspect of the project, there are many interesting ideas and solutions to be explored.
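As a starting point for the first goal, a naive sweep could look like the sketch below, building on the earlier snippets. The fraction lists are hypothetical candidates chosen to match the notation above, not values we have committed to testing.

```python
import itertools

# Hypothetical candidate sizes, expressed as fractions of the image length.
length_fracs = [1/2, 1/4, 1/8]
stride_fracs = [1/2, 1/4, 1/6, 1/8, 1/16]

best_acc, best_params = 0.0, None
for lf, sf in itertools.product(length_fracs, stride_fracs):
    if sf > lf:
        continue  # a stride longer than the window would skip pixels
    X = np.array([block_color_histogram(img,
                                        int(img.shape[1] * lf),
                                        int(img.shape[1] * sf))
                  for img in images])
    acc = cross_val_score(make_pipeline(PCA(n_components=50), SVC()),
                          X, y, cv=10).mean()
    print("length=%.3f stride=%.4f accuracy=%.6f" % (lf, sf, acc))
    if acc > best_acc:
        best_acc, best_params = acc, (lf, sf)
print("best:", best_acc, best_params)
```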