Friday, August 19, 2011

Future work

Here's a list of things I would really like to see implemented.
I hope that if I have time over the school year I can experiment with some of these items. If not, then I hope the project will be continued by somebody else.

Important and feasible

  • Probabilistic output of SVMs
    • Design a more reliable function, or find another SVM library that supports probabilistic output; the current implementation was rushed and is naive.
    • Switch to one-against-one SVM and see if performance improves
  • Graphical user interface and visual representation of results.
Important and exploratory
  • Extending the current system to the original images (with wings)
    • I did this very briefly and was able to achieve about 75% accuracy. This is just a rough estimate. There is a lot of room for creativity in solving this particular problem and I'll discuss a few ideas. The main idea is to do some sort of pre-processing or feature extraction to crop out the wasp body.
      • Part-based detection by training on wasp bodies using SVMs
      • Bilateral symmetry detection, centroids, other geometrical symmetry
      • Identify wings and then segment out or ignore
      • Any other ideas for "ignoring" the wings at feature extraction stage
  • Finding optimal values for tuning parameters (almost all hard-coded numbers were chosen manually, and not automatically optimized).
I will edit this list as things come to mind.

Combining shape and color features (at long last)

Our current system implementation simply outputs a class label for a given example image. We instead propose outputting a "likelihood" for every class.

For example, instead of "someimage.jpg" = Stenodynerus chinesis,
we might say "someimage.jpg" is 90% likely to be Stenodynerus chinesis, 5% likely to be ..., etc...

Creating these "confidence" vectors allows us to combine shape and color features, assuming the two are independent given the class. Up to a normalizing constant (and assuming a uniform class prior):
P( class | shape AND color ) ∝ P( class | shape ) * P( class | color )
The products are then renormalized so that the confidences for each image sum to one.
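
As a concrete illustration, here is a minimal sketch (a hypothetical helper, not our exact code) of combining two per-class confidence vectors by elementwise product followed by renormalization:

```cpp
#include <vector>
#include <cstddef>

// Combine per-class confidences from the shape and color classifiers by
// elementwise product, then renormalize so the result sums to one.
std::vector<double> combineConfidences(const std::vector<double>& pShape,
                                       const std::vector<double>& pColor)
{
    std::vector<double> combined(pShape.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < pShape.size(); ++i) {
        combined[i] = pShape[i] * pColor[i];
        sum += combined[i];
    }
    if (sum > 0.0)
        for (std::size_t i = 0; i < combined.size(); ++i)
            combined[i] /= sum;   // normalize so confidences sum to one
    return combined;
}
```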

For this, we switched our implementation to one-vs-all SVMs. We only looked briefly into probabilistic output, and given the limited time, we chose the one-vs-all approach for its simplicity.

Results of one-vs-all SVMs using color features, with standard deviation:
92.906 ± 4.534% (406/437)

Results of one-vs-all SVMs using HOG features, with standard deviation:
86.499 ± 4.843% (378/437)

Finally, one-vs-all SVMs using combined color and HOG features:
96.110 ± 1.810% (420/437)


When we combine the features, accuracy improves and the standard deviation decreases. This just about wraps up my project for the summer. This blog was used mainly for documentation purposes. You can find the full research paper here, or the PowerPoint presentation here.

Thanks for reading!

Wednesday, August 10, 2011

Better results for HOG

I was finally able to get the HOGDescriptor class in OpenCV to work. Something strange was happening when I used the debug build of the object detection library in OpenCV 2.3; once I switched it out for the release build, the errors stopped.

We now have an all-time high HOG accuracy of 0.883295 (38.6/43.7).
After testing a few different parameter settings, we found that these worked best (a sketch of the corresponding descriptor setup follows the list):

  • Cell size = 8x8
  • Block size = 3x3 (9 cells per block)
  • number of bins = 7
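
For reference, here is a minimal sketch of constructing an OpenCV HOGDescriptor with these parameters. The 96x96 window size is an assumption standing in for whatever size the cropped images are rescaled to:

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

int main()
{
    // Parameters from above: 8x8 cells, 3x3 cells per block (24x24 px), 7 bins.
    // The 96x96 window size is an assumption (the rescaled image size).
    cv::HOGDescriptor hog(cv::Size(96, 96),   // winSize
                          cv::Size(24, 24),   // blockSize: 3x3 cells of 8x8 px
                          cv::Size(8, 8),     // blockStride
                          cv::Size(8, 8),     // cellSize
                          7);                 // nbins

    cv::Mat img = cv::imread("wasp.png", 0);  // load as grayscale
    cv::resize(img, img, hog.winSize);        // one window per image

    std::vector<float> descriptor;
    hog.compute(img, descriptor);
    return 0;
}
```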


Now to continue working on how to combine these with the improved color features.

*Note: I found out how to automate the "heat map" style confusion matrix in Excel. I will be using this as the standard format for confusion matrices from now on (even though there isn't much time left).

Monday, August 8, 2011

Overlapping grid of color feature blocks

We have discussed Color-based features (Hue-Saturation Histograms) and Shape-based features (HOG features) as significant and robust features in classifying different species of wasps.

Up until now, we have used only global color histograms, never any local information. Today we experiment with the idea of spatially connected blocks, as used in HOG features.

The idea is simple. Instead of using the pixel counts from the entire image for our histogram, we take local "blocks" of the image and extract the color features from each of these blocks. Furthermore, we can let these blocks overlap to capture information about the connectedness of these regions.

In our implementation, we only consider blocks along the horizontal direction, since the wasps are already oriented horizontally and contained within a small region of interest (as discussed in the last post). We also use our existing implementation for extracting color histograms (a 2-D histogram with 30 hue bins and 32 saturation bins) to calculate the feature for each block.

The features calculated at each block are then concatenated to make one feature that captures spatial information of the colors of the wasps.
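
A minimal sketch of this block-based extraction, assuming OpenCV's calcHist and the 30x32 hue-saturation binning from our existing implementation (the per-block L1 normalization is an assumption, not something the post specifies):

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Slide a horizontal window across the image, compute a 30x32 H-S histogram
// in each block, and concatenate the flattened histograms into one feature.
std::vector<float> blockColorFeature(const cv::Mat& bgr,
                                     int windowLength, int windowStride)
{
    cv::Mat hsv;
    cv::cvtColor(bgr, hsv, CV_BGR2HSV);

    int histSize[] = {30, 32};                        // hue bins, saturation bins
    float hueRange[] = {0, 180}, satRange[] = {0, 256};
    const float* ranges[] = {hueRange, satRange};
    int channels[] = {0, 1};

    std::vector<float> feature;
    for (int x = 0; x + windowLength <= hsv.cols; x += windowStride) {
        cv::Mat block = hsv(cv::Rect(x, 0, windowLength, hsv.rows));
        cv::Mat hist;
        cv::calcHist(&block, 1, channels, cv::Mat(), hist, 2, histSize, ranges);
        cv::normalize(hist, hist, 1, 0, cv::NORM_L1);  // per-block normalization (assumption)
        hist = hist.reshape(1, 1);                     // flatten to a 1x960 row
        feature.insert(feature.end(), hist.begin<float>(), hist.end<float>());
    }
    return feature;
}
```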

Performance improved greatly (~10%) and is at an all-time high of about 97%!

The model was trained and validated using 10-fold cross validation under a variety of windowLength and windowStride values. The average accuracies are as follows, followed by some visual examples of how the images are organized into blocks:


windowLength = image.length/2, windowStride = image.length/2
0.904891 (33.3/36.8)

windowLength = image.length/2, windowStride = image.length/4
0.932065 (34.3/36.8)

windowLength = image.length/4, windowStride = image.length/8
0.970109 (35.7/36.8)

windowLength = image.length/8, windowStride = image.length/16
0.964674 (35.5/36.8)

windowLength = image.length/4, windowStride = image.length/6
0.953804 (35.1/36.8)

[Block layout illustrations for (windowLength, windowStride) = (image.length/2, image.length/2), (image.length/2, image.length/4), and (image.length/4, image.length/8)]

A variety of other values were tried for windowLength and windowStride, but none outperformed a window length of image.length/4 and stride length of image.length/8. Here is the corresponding confusion matrix for the best performing setup.
Average accuracy: 0.970109 (35.7/36.8)

*Note: the number of test images for each class is one-tenth of the actual number of samples in the database, due to 10-fold cross validation.

Goals for next week:
  • It would be interesting to automate the classifier over a number of different window and stride lengths to determine optimal values, but the classifier takes a significant amount of time (~1 minute to extract features from all images and project them to a lower-dimensional space, and ~10 seconds to train the model).
  • Another interesting idea is to not limit ourselves to horizontal blocks; we could experiment with vertical strides as well.
  • We need to begin testing our approach on the full set of classes (35 classes) as opposed to only the 11 classes we have been working with. We have held off so far because several of the classes lack sufficient training data.
  • HOG features still present important shape information that color histograms do not capture. We can continue looking into how to combine these features.
  • Ultimately, future work depends on how the project can scale to the raw images (with wings). While we most likely don't have time to work on this aspect of the project, there are many interesting ideas and solutions to be explored.

Sunday, August 7, 2011

Selecting a region of interest

What I tried earlier was to select a region of interest close to the center of the wasp. While this showed improvement, it was not a well-defined procedure, since finding the center varied from wasp to wasp depending on the wings.

But now that we have segmented the wings off manually and are proceeding with a less difficult problem, we can elegantly choose a bounding box for our region of interest.

We do this by first creating a binary image, and then choosing the region with the largest area (assumed to be the wasp body). This is accomplished via the connected-components algorithm, or findContours in OpenCV. Once the blob is found, we simply return the "tightest" bounding box for that region.
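
A minimal sketch of this step, under the assumption that the wasp body is darker than the background (hence the inverted threshold; Otsu's method here stands in for whatever threshold we actually use):

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Binarize the image, find the largest connected region (assumed to be the
// wasp body) with findContours, and return its tightest bounding box.
cv::Rect waspBoundingBox(const cv::Mat& gray)
{
    cv::Mat binary;
    cv::threshold(gray, binary, 0, 255, CV_THRESH_BINARY_INV | CV_THRESH_OTSU);

    std::vector<std::vector<cv::Point> > contours;
    cv::findContours(binary, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE);

    int best = -1;
    double bestArea = 0.0;
    for (int i = 0; i < (int)contours.size(); ++i) {
        double area = cv::contourArea(contours[i]);
        if (area > bestArea) { bestArea = area; best = i; }
    }
    return best >= 0 ? cv::boundingRect(cv::Mat(contours[best])) : cv::Rect();
}
```
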
Here are a few examples:

[Images: bounding boxes around segmented wasp bodies]

The images are cropped according to these bounding boxes, so there is significantly less data to process during runtime. These cropped images must all be rescaled to one size, so that the HOG algorithm can produce feature vectors of equal length. For now, we have chosen to rescale each cropped image to the average size of the rectangular bounding boxes.

After testing the aforementioned pre-processing steps, we note a slight increase in performance for H-S histogram features, and a significant performance improvement for HOG features.

Here are the results for Color Histograms:
With an overall accuracy of 0.872283 (32.1/36.8) - our highest yet again! Improved by about 1%.

And here are the results for HOG features:
With an overall accuracy of 0.817935 (30.1/36.8) - showing over 10% improvement from last time!

Recognition accuracy is beginning to reach a point where it may be practical and preferred over manual recognition. I will continue looking into how to combine the two features into a single weighted feature vector.

More improvements!

Made some good progress today. Will post up results tomorrow.

Things to look into next week:

  • HOGDescriptor class for OpenCV
  • Combining HOG feature vector with Color histogram feature vector
  • Part modeling

Friday, August 5, 2011

Re-factoring and multiClass SVM

So first, I spent the last two days re-factoring, commenting, and reorganizing all the code I've written.
It's much cleaner now, though I'm a bit embarrassed that I had to look up how to properly write header files and understand dependencies.

Other than that, I replaced the k-nearest neighbor model from the classifier with a multiclass Support Vector Machine implementation. I used the one-vs-one approach described here:
http://en.wikipedia.org/wiki/Support_vector_machine#Multiclass_SVM
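
To make the voting scheme concrete, here is a hypothetical sketch of the prediction step, assuming one trained CvSVM per class pair (svms[i][j] for classes i vs. j; not our exact code):

```cpp
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <vector>

// One-vs-one prediction: each pairwise SVM votes for one of its two
// classes; the class with the most votes wins.
int predictOneVsOne(const std::vector<std::vector<CvSVM*> >& svms,
                    const cv::Mat& sample, int numClasses)
{
    std::vector<int> votes(numClasses, 0);
    for (int i = 0; i < numClasses; ++i)
        for (int j = i + 1; j < numClasses; ++j) {
            int winner = (int)svms[i][j]->predict(sample);  // returns a class label
            ++votes[winner];
        }
    return (int)(std::max_element(votes.begin(), votes.end()) - votes.begin());
}
```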

Validated the model with 10-fold cross validation and here are the results, using only color histograms:
Average accuracy: 0.861413 (31.7/36.8) - best so far!


And here are the results using only HOG features:
Average accuracy: 0.703804 (25.9/36.8) - Not bad for the naive approach.
The dimensionality of the H-S histogram features is 960, whereas the HOG features are a whopping 16524... In cutting-edge recognition, feature lengths are often less than 100 due to the curse of dimensionality. The classifier spends a good amount of time extracting the features and then reducing them with PCA (about 1 minute).
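
For reference, a minimal sketch of this reduction with OpenCV's PCA class, where each row of 'features' is one image's feature vector; the number of retained components here is illustrative, not a value we actually tuned:

```cpp
#include <opencv2/opencv.hpp>

// Project an N x D feature matrix (one row per image, CV_32F) onto its
// first k principal components. k = 100 is illustrative, not tuned.
cv::Mat reduceDimensionality(const cv::Mat& features, int k)
{
    cv::PCA pca(features, cv::Mat(), CV_PCA_DATA_AS_ROW, k);
    return pca.project(features);  // N x k matrix of projected features
}
```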

I'm still edgy about what I should do for the next two weeks. I've narrowed it down to some of the things I feel I should focus on, but I'm unsure as to whether I will have enough time to finish everything.
  • Look into metric learning or other feature weighting approaches to combine HOG features with color histogram features.
  • Instead of computing the HOG feature over the entire image, find a bounding rectangle and compute the HOG feature within it.
  • Could also continue part-modeled recognition by continuing work on part detectors.
    • Create a user-interface for easy training of bounding boxes for different parts.
    • Increase speed of template matching
  • Reduce dimensionality of Color histogram features and HOG features, without hindering performance.

Wednesday, August 3, 2011

Template matching using HOG Features

Last time we talked about using Histogram of Oriented Gradients features to describe shape. Today we will show the results of using HOG features to detect shapes in images.

First we need to label our training images by placing bounding boxes for positive and negative examples.
In this example, the "petiole" of an insect is a positive label (green bounding box), and the negative labels (red bounding boxes) are random patches that give the model an idea of what is "not a petiole."

We calculate the HOG feature at each window:

[Image: HOG features computed at each labeled window]

Then we train on the positive and negative labels using a Support Vector Machine, which is a binary classifier (e.g., is it X or is it Y?). In our case, we care about whether a patch is a petiole or not.

We scan through the image using a brute-force sliding window approach, calculating the HOG features as we go. We keep track of the window with the best score, and return that as our match.
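
A hypothetical sketch of that search loop, assuming a trained two-class CvSVM whose positive decision values indicate "petiole" (the stride and the use of the signed decision value as a score are assumptions):

```cpp
#include <opencv2/opencv.hpp>
#include <cfloat>
#include <vector>

// Brute-force sliding window: compute the HOG feature at each window,
// score it with the SVM's decision value, and keep the best window.
cv::Rect findBestMatch(const cv::Mat& gray, const cv::HOGDescriptor& hog,
                       const CvSVM& svm, float scoreThreshold)
{
    cv::Rect best;
    float bestScore = -FLT_MAX;
    const int stride = 4;  // assumed step size in pixels
    for (int y = 0; y + hog.winSize.height <= gray.rows; y += stride)
        for (int x = 0; x + hog.winSize.width <= gray.cols; x += stride) {
            cv::Rect window(x, y, hog.winSize.width, hog.winSize.height);
            std::vector<float> desc;
            hog.compute(gray(window).clone(), desc);     // clone: contiguous patch
            cv::Mat sample = cv::Mat(desc).t();          // 1 x D row vector
            float score = svm.predict(sample, true);     // signed decision value
            if (score > bestScore) { bestScore = score; best = window; }
        }
    // Require a sufficiently high score, in case there is no good match.
    return bestScore >= scoreThreshold ? best : cv::Rect();
}
```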

Successful petiole matches


We also require the best score to exceed a threshold, to handle the case where there is no good match.
Successfully identified lack of petiole

 
Incorrectly identified petiole due to lighting

Tuesday, July 26, 2011

Visualizing HoG Features

Finished writing some of the code for HoG features. Still haven't completely understood/implemented the portion for detection via sliding window.

The following outlines our process: (more information at http://en.wikipedia.org/wiki/Histogram_of_oriented_gradients)

Gradient maps are first computed using the Sobel operator or 1-D derivative masks.

[Images: gradient maps in the x- and y-directions]

Using these gradient maps, we can calculate a magnitude and orientation for every pixel. We bin these pixels based on the angle of orientation, using nine bins (increments of 20 degrees, 0 to 180).
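
A minimal sketch of this per-pixel step using OpenCV's Sobel and cartToPolar; the fold to unsigned angles in [0, 180) follows the binning described above:

```cpp
#include <opencv2/opencv.hpp>
#include <algorithm>

// Compute per-pixel gradient magnitude and a 9-bin orientation index
// (unsigned angles, bins of 20 degrees over [0, 180)).
void gradientMagnitudeAndBins(const cv::Mat& gray, cv::Mat& magnitude, cv::Mat& bin)
{
    cv::Mat gx, gy, angle;
    cv::Sobel(gray, gx, CV_32F, 1, 0);                // gradient in x-direction
    cv::Sobel(gray, gy, CV_32F, 0, 1);                // gradient in y-direction
    cv::cartToPolar(gx, gy, magnitude, angle, true);  // angle in degrees, [0, 360)

    bin.create(gray.size(), CV_32S);
    for (int y = 0; y < gray.rows; ++y)
        for (int x = 0; x < gray.cols; ++x) {
            float a = angle.at<float>(y, x);
            if (a >= 180.0f) a -= 180.0f;             // fold to unsigned [0, 180)
            bin.at<int>(y, x) = std::min(8, (int)(a / 20.0f));
        }
}
```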

Then we group pixels together into "cells" and the orientation of the cell is determined by the magnitudes of the pixels within that cell. Here are some visual representations of HoG features of an image.
The bin with the highest magnitude for each cell is chosen to represent that cell:

[Images: original image, and HoG visualizations with 16x16, 8x8, and 4x4 pixel cells]

Here are some more examples (original on left, HoG on right)

In recent years, a lot of people have reached out to me for guidance on how to generate these HOG visualizations. Please take a look at https://github.com/Porkbutts/Vespidae-Wasp-Classification/blob/master/Project/practice.cpp#L223 if you are interested.

Working with manually pre-processed images

I got the photoshopped wasps without wings/legs/antennae. Some of them are not so clean, so I have excluded those, but most of them are very good.

After experimenting with classification on these images, here are some of the results. Using only color histograms, classification improved to about 80%. This is probably because wing color varies from specimen to specimen; with the wings removed, the classifier no longer has to deal with that variation.


Using simple HoG features, classification is about 55% accurate. Here is the confusion matrix using a nearest neighbor classifier on only HoG features:

[Image: confusion matrix for HoG features with a nearest neighbor classifier]

Note that, even though classification results are not very high, they make sense.
Many examples from class 3 are misclassified as class 8 due to their similar body type:

[Images: class 3 and class 8 examples]

Similarly, examples from class 4 are misclassified as class 5:

[Images: class 4 and class 5 examples]

Using Histograms of Oriented Gradients, the image is divided into many overlapping blocks. This type of approach could also be applied to the color histograms. From there we can scan the image for body parts such as the abdomen, head, etc., if we have a database of trained parts.

Friday, July 22, 2011

Restructuring the project

Sorry I haven't had much time to update the blog.
I am attaching some of the slides from my weekly progress report as a series of images as most of what I have to say would be redundant. I discuss the workflow necessary for a fully automated image classification system, and the step in that pipeline that I will be focusing on.


[Slides: weekly progress report on the automated classification pipeline]

At the start of the week, I tried to look into identifying different parts of the insect, such as head/thorax/abdomen. Professor David Kriegman (UCSD Computer Vision) pointed me into the direction of a part-modeling paper: http://www.cs.cornell.edu/~dph/papers/pict-struct-ijcv.pdf
Professor Serge Belongie (UCSD Computer Vision) suggested I take a look at HoG features (Histogram of Oriented gradients: http://en.wikipedia.org/wiki/Histogram_of_oriented_gradients#Gradient_computation)

I also looked into this paper on part-modeling using HoG features:
http://ttic.uchicago.edu/~dmcallester/lsvm-pami.pdf

I've been trying to write code for extracting HoG features myself, since OpenCV only provides code for extracting descriptors and I would like to manipulate the features manually and play around with them. I hope to have the HoG feature code implemented by next week, along with some images of the histograms for demonstration.