artificial intelligence – The Web and all that Jazz

Time does fly by when you don’t work on the weekend. It’s been almost a week since my last post–so you know I’ve been busy coding. Today, I got stuck on caching, so here I am, blogging.

In Nerd Time 8, I mentioned the algorithm for content-aware image resizing. For those of you who didn’t hear about it a couple of weeks ago, watch the movie. It seems pretty magical at first. It basically computes an energy function for the image to decide which parts it can cut out if it needs to.

I don’t know if the rest of you had the same thought, but content-aware image resizing is essentially image summarization. You’re throwing out less important information in the picture in favor of preserving informational features of the image.

They use an energy function as a metric to determine which seams–and in what order–to remove from the picture to reduce its size.
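To picture how a seam gets picked, here is a rough Ruby sketch. It assumes a seam is a top-to-bottom path with one pixel per row that shifts at most one column between rows, and it finds the cheapest one with dynamic programming. The method name `lowest_energy_seam` and the tiny energy grid are made up for illustration.

```ruby
# Find the vertical seam with the lowest total energy.
# `energy` is a hypothetical 2D array of per-pixel energy values.
# Returns the column index chosen in each row, from top to bottom.
def lowest_energy_seam(energy)
  height = energy.length
  width  = energy.first.length

  # cost[y][x] is the total energy of the cheapest seam ending at (x, y).
  cost = [energy.first.dup]
  (1...height).each do |y|
    row = Array.new(width) do |x|
      lo = [x - 1, 0].max
      hi = [x + 1, width - 1].min
      energy[y][x] + cost[y - 1][lo..hi].min
    end
    cost << row
  end

  # Start at the cheapest cell in the bottom row and walk back up.
  x = cost.last.each_with_index.min.last
  seam = [x]
  (height - 1).downto(1) do |y|
    lo = [x - 1, 0].max
    hi = [x + 1, width - 1].min
    x = (lo..hi).min_by { |c| cost[y - 1][c] }
    seam.unshift(x)
  end
  seam
end

energy = [
  [5, 1, 9],
  [9, 2, 1],
  [1, 8, 7]
]
seam = lowest_energy_seam(energy)
puts seam.inspect
```

Removing that seam and repeating on the narrower image is one way to get the "in what order" part: the cheapest seam goes first, then the next cheapest, and so on.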

The most surprising thing to me was that the basic energy function is just the magnitude of the gradient. The gradient of an image tells you how fast the colors are changing as you move across the image. This means the sky would be smooth and slowly varying (low frequency), and the trees would be rough and quickly varying (high frequency). Therefore, the basic gradient energy function just allows you to selectively cut out the low frequency parts of the image, while the seam selection keeps the remaining objects undistorted and the image coherent.
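To make the gradient idea concrete, here is a minimal Ruby sketch. It assumes a grayscale image stored as a 2D array of brightness values, and it approximates the gradient magnitude by summing the absolute horizontal and vertical differences between neighboring pixels. The name `gradient_energy` and the sample image are my own, so treat this as an illustration rather than the exact function from the movie.

```ruby
# Approximate gradient-magnitude energy for a grayscale image.
# `image` is a hypothetical 2D array of brightness values (rows of numbers).
def gradient_energy(image)
  height = image.length
  width  = image.first.length
  Array.new(height) do |y|
    Array.new(width) do |x|
      # Neighbors are clamped at the border so edge pixels reuse themselves.
      left  = image[y][[x - 1, 0].max]
      right = image[y][[x + 1, width - 1].min]
      up    = image[[y - 1, 0].max][x]
      down  = image[[y + 1, height - 1].min][x]
      (right - left).abs + (down - up).abs
    end
  end
end

# Flat "sky" on the left, a sharp edge on the right.
image = [
  [10, 10, 10, 200],
  [10, 10, 10, 200],
  [10, 10, 10, 200]
]
energy = gradient_energy(image)
energy.each { |row| puts row.inspect }
```

The flat columns come out with zero energy and the columns around the edge come out high, which is why the smooth parts are the first to go.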

Apparently, this metric works pretty well, even compared to other metrics like entropy, which is the standard measure of information content. It works mostly because of the assumption that high content areas of the scene will be high frequency, while background regions, like sky, road, or wall, are generally low frequency. If you had a picture of me and someone else holding up a flag with a forest as a backdrop, the gradient energy function would cut out the flag first, not the trees, since the flag is smoother than the forest.

This puts text summarization into clearer focus for me. There are two competing goals in text summarization: 1) reduce the amount of text, and 2) keep the information content high and coherent. Content-aware image resizing achieves both goals by finding a computable metric that distinguishes important regions from unimportant ones. So by comparison, we should be able to do the same with text.

However, we don’t know what the gradient between words means, if anything, or how it would fare as a measure of information content. We also don’t have a good way of judging the coherence of a piece of text; different people will come up with different summaries. In an image, we can look and just tell. This is because we judge all pieces of an image in parallel, and we have a database of images in our heads to compare against to tell if something ‘looks right’ or not.

One can tell how far apart one color is from another simply by measuring the distance between the hex values that represent them. However, words that have similar letters may have completely different meanings.
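Here is a small Ruby sketch of that contrast. It assumes colors are six-digit hex strings and measures Euclidean distance over the red, green, and blue channels, which is only one of several reasonable choices. For words, it just counts the positions where two equal-length words differ. The method names `hex_to_rgb`, `color_distance`, and `letter_difference` are made up for this example.

```ruby
# Split a hex color such as "#87ceeb" into [red, green, blue] integers.
def hex_to_rgb(hex)
  hex.delete("#").scan(/../).map { |pair| pair.to_i(16) }
end

# Euclidean distance between two hex colors (an assumed metric; other
# color spaces may match human perception better).
def color_distance(hex_a, hex_b)
  a = hex_to_rgb(hex_a)
  b = hex_to_rgb(hex_b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Number of positions where two equal-length words differ.
def letter_difference(word_a, word_b)
  word_a.chars.zip(word_b.chars).count { |x, y| x != y }
end

near_colors = color_distance("#87ceeb", "#87cefa")  # two sky blues
far_colors  = color_distance("#000000", "#ffffff")  # black vs. white
near_words  = letter_difference("angel", "angle")   # two swapped letters

puts near_colors
puts far_colors
puts near_words
```

A small color distance really does mean the colors look alike, but "angel" and "angle" are only two letters apart and mean nothing alike.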

The difference between the almost right word & the right word is really a large matter–it’s the difference between the lightning bug and the lightning. – Mark Twain

I suspect that one would need a gradient map for words, or a way to use the etymology of words, to measure how far apart their meanings are. Generating this map has been difficult, as far as I know.

Many people have used co-occurrences of words to map words to meanings, since it makes sense that words related to each other would appear in the same text. However, it was found that even if two words had the same meaning, they might occur with different frequencies, thus throwing off the validity of the gradient map.
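As a toy illustration of the co-occurrence idea, the Ruby sketch below counts which words share a sentence and compares words by the cosine similarity of their count vectors. The sentences, the method names `cooccurrence_vectors` and `cosine`, and the choice of cosine similarity are all my own assumptions, not a description of any particular system.

```ruby
# Build co-occurrence counts: two words co-occur when they share a sentence.
def cooccurrence_vectors(sentences)
  vectors = Hash.new { |hash, word| hash[word] = Hash.new(0) }
  sentences.each do |sentence|
    words = sentence.downcase.scan(/[a-z]+/).uniq
    words.permutation(2) { |a, b| vectors[a][b] += 1 }
  end
  vectors
end

# Cosine similarity between two sparse count vectors stored as hashes.
def cosine(u, v)
  dot   = u.sum { |word, count| count * v.fetch(word, 0) }
  norm  = ->(vec) { Math.sqrt(vec.values.sum { |c| c**2 }) }
  denom = norm.(u) * norm.(v)
  denom.zero? ? 0.0 : dot / denom
end

sentences = [
  "the car drove down the road",
  "the automobile drove down the road",
  "the car needs fuel",
  "the cat sat on the mat"
]
vectors  = cooccurrence_vectors(sentences)
car_auto = cosine(vectors["car"], vectors["automobile"])
car_cat  = cosine(vectors["car"], vectors["cat"])

puts car_auto
puts car_cat
```

Here "car" lands closer to "automobile" than to "cat", but the two synonyms still don’t score a perfect 1.0, because "car" shows up in one more sentence than "automobile" does. That is the frequency problem in miniature.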

Photosynth presentation | Venture Itch

I think this is a bit of old news, since I wasn’t running Windows XP in order to view the demo at their website. For the last 10 years or so, I’ve thought that computer vision was still trapped in the realm of research labs. But things are starting to bear fruit. Image registration (lining up images) isn’t an easy task, since lighting, shape, and perspective all have to be taken into account. It becomes especially difficult if you have to do it in 3-space, as is done in the demo. However, it seems like everything’s preprocessed, so it looks fast.

I don’t think it’s a far stretch to say that you can also register people’s faces, so you can find all the images with your face in it, taken from different perspectives.

I also wouldn’t be surprised if all the tagging of people going on in Facebook photos is training a classifier to recognize and register people’s faces.

The ones that push innovation and create new markets are the ones that open up possibilities, and show others what was previously thought impossible.


[115, 117, 109, 109, 97, 114, 105, 122, 97, 116, 105, 111, 110].map{|c| c.chr}.join

Photosynth: stitching photos in 3D