Friday, August 19, 2011

Future work

Here's a list of things I would really like to see implemented.
I hope that if I have time over the school year I can experiment with some of these items. If not, then I hope the project will be continued by somebody else.

Important and feasible

  • Probabilistic output of SVMs
    • Design a more reliable probability function, or find another SVM library that supports probabilistic output; the current implementation was rushed and is naive.
    • Switch to one-against-one SVM and see if performance improves
  • Graphical user interface and visual representation of results.

Important and exploratory

  • Extending current system to original images with wings
    • I did this very briefly and achieved about 75% accuracy, though that is only a rough estimate. There is a lot of room for creativity in solving this particular problem, and I'll list a few ideas below. The main idea is to use some sort of pre-processing or feature extraction step to crop out the wasp body.
      • Part-based detection by training on wasp bodies using SVMs
      • Bilateral symmetry detection, centroids, other geometrical symmetry
      • Identify wings and then segment out or ignore
      • Any other ideas for "ignoring" the wings at feature extraction stage
  • Finding optimal values for tuning parameters (almost all hard-coded numbers were chosen manually rather than optimized automatically).
I will edit this list as things come to mind.

Combining shape and color features (at long last)

Our current system simply outputs a single class label for a given input image. We instead propose outputting a "likelihood" for every class.

For example, instead of "someimage.jpg" = Stenodynerus chinesis,
we might say "someimage.jpg" is 90% likely to be Stenodynerus chinesis, 5% likely to be ..., etc...

Creating these "confidence" vectors allows us to combine shape and color features, assuming they are conditionally independent given the class. With a uniform class prior, Bayes' rule then gives

P( class | shape AND color ) ∝ P( class | shape ) * P( class | color )

where the products are renormalized to sum to one over all classes.
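Here is a minimal sketch of the combination step, assuming each classifier already yields a per-class probability vector (the function name and types are ours, not from the project code):

```cpp
#include <vector>
#include <numeric>

// Combine per-class probabilities from two independent feature channels.
// pShape[i] = P(class i | shape), pColor[i] = P(class i | color).
// Under conditional independence (and a uniform prior), the combined
// posterior is proportional to the elementwise product.
std::vector<double> combineConfidences(const std::vector<double>& pShape,
                                       const std::vector<double>& pColor)
{
    std::vector<double> combined(pShape.size());
    for (size_t i = 0; i < pShape.size(); ++i)
        combined[i] = pShape[i] * pColor[i];

    // Renormalize so the scores sum to one again.
    double total = std::accumulate(combined.begin(), combined.end(), 0.0);
    if (total > 0.0)
        for (size_t i = 0; i < combined.size(); ++i)
            combined[i] /= total;
    return combined;
}
```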

For this, we switched our implementation to one-vs-all SVMs. We only looked briefly into probabilistic output, and with limited time remaining we chose the one-vs-all approach for its simplicity: each of its binary classifiers naturally yields one score per class.

Results of one-vs-all SVMs using color features (mean ± stddev):
92.906 ± 4.534 % (406/437)

Results of one-vs-all SVMs using HOG features (mean ± stddev):
86.499 ± 4.843 % (378/437)

Finally, one-vs-all SVMs using combined color and HOG features:
96.110 ± 1.810 % (420/437)

When we combine the features, accuracy improves and the standard deviation decreases. This just about wraps up my project for the summer. This blog was used mainly for documentation purposes. You can find the full research paper here, or the PowerPoint presentation here.

Thanks for reading!

Wednesday, August 10, 2011

Better results for HOG

I finally got the HOGDescriptor class in OpenCV to work. Something strange was happening when I linked against the debug build of the object detection library in OpenCV 2.3; once I switched it out for the release build, the errors stopped.

We now have an all-time-high HOG accuracy of 0.883295 (38.6/43.7).
After testing a few different parameter settings, we found that the following worked best:

  • Cell size = 8x8 pixels
  • Block size = 3x3 cells (9 cells per block)
  • Number of bins = 7
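Roughly how the HOGDescriptor would be constructed with these settings; the window size below is a placeholder, since the post doesn't state the rescaled image dimensions:

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

int main()
{
    cv::Size winSize(64, 64);     // hypothetical; images are rescaled to a fixed size first
    cv::Size blockSize(24, 24);   // 3x3 cells of 8x8 pixels each
    cv::Size blockStride(8, 8);   // blocks overlap, stepping one cell at a time
    cv::Size cellSize(8, 8);
    int nbins = 7;                // orientation bins

    cv::HOGDescriptor hog(winSize, blockSize, blockStride, cellSize, nbins);

    cv::Mat img = cv::imread("someimage.jpg", 0);  // load as grayscale
    cv::resize(img, img, winSize);

    std::vector<float> descriptor;
    hog.compute(img, descriptor);  // concatenated block histograms
    return 0;
}
```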


Now to continue working on how to combine these with the improved color features.

*Note: I found out how to automate the "heat map" style confusion matrix in Excel. I will use this as the standard format for confusion matrices from now on (even though there isn't much time left).

Monday, August 8, 2011

Overlapping grid of color feature blocks

We have discussed color-based features (Hue-Saturation histograms) and shape-based features (HOG) as significant and robust features for classifying different species of wasps.

Up until now, we have used only global color histograms and never any local information. Today we experiment with the idea of spatially connected blocks, as used in HOG features.

The idea is simple. Instead of using the pixel counts from the entire image for our histogram, we take local "blocks" of the image and extract the color features from each of these blocks. Furthermore, we can let these blocks overlap to capture information about the connectedness of these regions.

In our implementation, we slide blocks horizontally only, since the wasps are already oriented horizontally and confined to a tight region of interest (as discussed in the last post). We reuse the existing color histogram code (a 2-D histogram with 30 hue bins and 32 saturation bins) to calculate the feature for each block.

The features calculated at each block are then concatenated into a single feature vector that captures the spatial layout of the wasps' colors. A sketch of this procedure is below.
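Here is a minimal sketch of the block-wise extraction, assuming the 30x32 H-S histogram described above; the function name, L1 normalization, and full-height blocks are our illustrative choices:

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Extract H-S histograms from overlapping horizontal blocks and
// concatenate them into one feature vector. windowLength and
// windowStride follow the naming used in this post.
std::vector<float> blockColorFeature(const cv::Mat& bgr,
                                     int windowLength, int windowStride)
{
    cv::Mat hsv;
    cv::cvtColor(bgr, hsv, CV_BGR2HSV);

    int histSize[] = {30, 32};                      // hue, saturation bins
    float hueRange[] = {0, 180}, satRange[] = {0, 256};
    const float* ranges[] = {hueRange, satRange};
    int channels[] = {0, 1};

    std::vector<float> feature;
    for (int x = 0; x + windowLength <= hsv.cols; x += windowStride) {
        // Each block is a full-height vertical strip; blocks slide
        // horizontally and overlap when windowStride < windowLength.
        cv::Mat block = hsv(cv::Rect(x, 0, windowLength, hsv.rows));
        cv::Mat hist;
        cv::calcHist(&block, 1, channels, cv::Mat(), hist, 2, histSize, ranges);
        cv::normalize(hist, hist, 1.0, 0.0, cv::NORM_L1);
        feature.insert(feature.end(), hist.begin<float>(), hist.end<float>());
    }
    return feature;
}
```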

Performance improved greatly (~10%) and is at an all-time high of about 97%!

The model was trained and validated using 10-fold cross validation under a variety of windowLength and windowStride values. Below are some visual examples of how the images are organized into blocks, and the average accuracies are as follows:


windowLength = image.length/2, windowStride = image.length/2
0.904891 (33.3/36.8)

windowLength = image.length/2, windowStride = image.length/4
0.932065 (34.3/36.8)

windowLength = image.length/4, windowStride = image.length/8
0.970109 (35.7/36.8)

windowLength = image.length/8, windowStride = image.length/16
0.964674 (35.5/36.8)

windowLength = image.length/4, windowStride = image.length/6
0.953804 (35.1/36.8)

[Figures: example block layouts for windowLength = image.length/2 with stride image.length/2, windowLength = image.length/2 with stride image.length/4, and windowLength = image.length/4 with stride image.length/8.]
A variety of other values were tried for windowLength and windowStride, but none outperformed a window length of image.length/4 and a stride of image.length/8. Here is the corresponding confusion matrix for the best-performing setup.
Average accuracy: 0.970109 (35.7/36.8)

*Note: the number of test images for each class is one-tenth of the actual number of samples in the database, due to using 10-fold cross validation.
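For reference, a simplified sketch of the 10-fold index bookkeeping (real folds are usually shuffled and stratified by class; this version is deliberately bare):

```cpp
#include <vector>

// Fold f holds out every tenth sample for testing and trains on the rest.
void tenFoldSplits(int numSamples)
{
    for (int f = 0; f < 10; ++f) {
        std::vector<int> trainIdx, testIdx;
        for (int i = 0; i < numSamples; ++i)
            (i % 10 == f ? testIdx : trainIdx).push_back(i);
        // train on trainIdx, evaluate on testIdx, then average the
        // per-fold accuracies to get the reported mean and stddev
    }
}
```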

Goals for next week:
  • It would be interesting to automate the classifier over a number of different window and stride lengths to determine optimal values, but each run takes a significant amount of time (~1 minute to extract features from all images and project them to a lower-dimensional space, and ~10 seconds to train the model).
  • Another interesting experiment is to not limit ourselves to horizontal blocks; we could try vertical strides as well.
  • We need to begin testing our approach on the full set of classes (35 classes) as opposed to only the 11 classes we have been working on. We have held off so far because several of the classes lack training data.
  • HOG features still present important shape information that color histograms do not capture. We can continue looking into how to combine these features.
  • Ultimately, future work depends on how the project can scale to the raw images (with wings). While we most likely don't have time to work on this aspect of the project, there are many interesting ideas and solutions to be explored.

Sunday, August 7, 2011

Selecting a region of interest

What I had tried earlier was selecting a region of interest close to the center of the wasp. While this showed improvement, it was not a well-defined procedure, since the center varied from wasp to wasp depending on the wings.

But now that we have manually segmented off the wings and are proceeding with the less difficult problem, we can elegantly choose a bounding box for our region of interest.

We do this by first creating a binary image and then choosing the region with the largest area (assumed to be the wasp body). This is accomplished via the connected components algorithm, findContours in OpenCV. Once the blob is found, we simply return the "tightest" bounding box for that region, as in the sketch below.
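A minimal sketch of this step; the post doesn't specify how the binary image is produced, so the Otsu threshold here is an assumption:

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Find the largest connected component in a binarized image and return
// its tightest bounding box (assumed to enclose the wasp body).
cv::Rect waspBoundingBox(const cv::Mat& gray)
{
    cv::Mat binary;
    cv::threshold(gray, binary, 0, 255, CV_THRESH_BINARY | CV_THRESH_OTSU);

    std::vector<std::vector<cv::Point> > contours;
    cv::findContours(binary, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE);
    if (contours.empty())
        return cv::Rect();

    // Keep the contour with the largest area.
    size_t best = 0;
    double bestArea = 0.0;
    for (size_t i = 0; i < contours.size(); ++i) {
        double area = cv::contourArea(contours[i]);
        if (area > bestArea) { bestArea = area; best = i; }
    }
    return cv::boundingRect(contours[best]);
}
```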
Here are a few examples:

[Images: example bounding boxes fitted around the segmented wasp bodies.]

The images are cropped according to these bounding boxes, so there is significantly less data to process at runtime. The cropped images must all be rescaled to one size so that the HOG algorithm produces feature vectors of equal length. For now, we rescale each cropped image to the average size of the rectangular bounding boxes, as sketched below.
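Continuing the sketch above (avgSize stands in for the average bounding-box size computed over the database):

```cpp
#include <opencv2/opencv.hpp>

// Crop to the detected bounding box and rescale to a common size so
// that HOG yields fixed-length feature vectors.
cv::Mat cropAndRescale(const cv::Mat& img, const cv::Rect& box,
                       const cv::Size& avgSize)
{
    cv::Mat cropped = img(box).clone();
    cv::Mat resized;
    cv::resize(cropped, resized, avgSize);
    return resized;
}
```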

After testing the aforementioned pre-processing steps, we note a slight increase in performance for H-S histogram features and a significant improvement for HOG features.

Here are the results for Color Histograms:
With an overall accuracy of 0.872283 (32.1/36.8) - our highest yet again! Improved by about 1%.

And here are the results for HOG features:
With an overall accuracy of 0.817935 (30.1/36.8) - over 10% improvement from last time!

Recognition accuracy is beginning to reach a point where it may be practical and preferred over manual recognition. I will continue looking into how to combine the two features into a single weighted feature vector.

More improvements!

Made some good progress today. Will post up results tomorrow.

Things to look into next week:

  • HOGDescriptor class for OpenCV
  • Combining HOG feature vector with Color histogram feature vector
  • Part modeling

Friday, August 5, 2011

Re-factoring and multiClass SVM

First, I spent the last two days re-factoring, commenting, and reorganizing all the code I've written.
It's much cleaner now, though I'm a bit embarrassed that I had to look up how to properly write header files and manage dependencies.

Other than that, I replaced the k-nearest neighbor model in the classifier with a multiclass Support Vector Machine implementation. I used the one-vs-one approach described here:
http://en.wikipedia.org/wiki/Support_vector_machine#Multiclass_SVM
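In the one-vs-one scheme, a binary SVM is trained for every pair of classes, and a test sample is assigned to the class that wins the most pairwise votes. A minimal sketch of the voting step, where the pairwise-trained CvSVM container and the +1/-1 labeling are our illustrative assumptions:

```cpp
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <vector>

// svm[i][j] (i < j) is a binary SVM trained on classes i and j only,
// labeled +1 for class i and -1 for class j.
int predictOneVsOne(const std::vector<std::vector<CvSVM*> >& svm,
                    int numClasses, const cv::Mat& sample)
{
    std::vector<int> votes(numClasses, 0);
    for (int i = 0; i < numClasses; ++i)
        for (int j = i + 1; j < numClasses; ++j) {
            float label = svm[i][j]->predict(sample);
            if (label > 0) ++votes[i]; else ++votes[j];
        }
    // The class with the most pairwise wins is the prediction.
    return (int)(std::max_element(votes.begin(), votes.end()) - votes.begin());
}
```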

Validated the model with 10-fold cross validation and here are the results, using only color histograms:
Average accuracy: 0.861413 (31.7/36.8) - best so far!


And here are the results using only HOG features:
Average accuracy: 0.703804 (25.9/36.8) - not bad for a naive approach.
The dimensionality of the H-S histogram features is 960, whereas the HOG features are a whopping 16524... In cutting-edge recognition work, feature lengths are often under 100 due to the curse of dimensionality. The classifier spends a good amount of time extracting the features and then reducing them with PCA (about 1 minute).
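A minimal sketch of the PCA reduction using OpenCV (the 100 retained components in the example are an illustrative choice, not the value used in the project):

```cpp
#include <opencv2/opencv.hpp>

// Reduce high-dimensional feature vectors with PCA. 'features' holds
// one row per training image.
cv::Mat reduceWithPCA(const cv::Mat& features, int nComponents)
{
    cv::PCA pca(features, cv::Mat(), CV_PCA_DATA_AS_ROW, nComponents);
    cv::Mat reduced;
    pca.project(features, reduced);   // rows are now nComponents wide
    return reduced;
}

// Example: cv::Mat lowDim = reduceWithPCA(allFeatures, 100);
```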

I'm still uneasy about what I should do for the next two weeks. I've narrowed it down to the things I feel I should focus on, but I'm unsure whether I will have enough time to finish everything:
  • Look into metric learning or other feature weighting approaches to combine HOG features with color histogram features.
  • Instead of taking the HOG feature from the entire image, find a bounding rectangle and take the HOG feature from that region.
  • Continue part-based recognition by further work on part detectors.
    • Create a user-interface for easy training of bounding boxes for different parts.
    • Increase speed of template matching
  • Reduce the dimensionality of the color histogram and HOG features without hindering performance.