Thursday, August 24, 2006

Photo Tourism

Though it is customary to bash Microsoft, in fact they do a great deal of fascinating work, either directly or in concert with others -- which this site, from the University of Washington, amply illustrates.

In a nutshell, the site demonstrates a process that takes many different images of a given object -- in their examples, the Trevi Fountain and the Cathedral of Notre Dame -- and builds a mathematical model that segments each image into layers: here's the background, here's the major focus, here's what's in front of the major focus. Then the images are linked so that they all have the same focal point (obviously, this won't work if picture A is from the front of the cathedral and picture B is from the back; they've got to have common features). A representative three-dimensional model is drawn showing the key points that exist in all pictures, and you use that model to select a feature -- say, the main doors of the cathedral. The relevant portions of the pictures that show that area are then retrieved and displayed.
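
To make that last step concrete, here is a toy sketch in Python of how the retrieval might work. It is not the site's actual code. It assumes the hard part is already done, meaning that each photo has been matched against the shared model, so we know which numbered key points it shows. The file names, point numbers, and function names are all made up for illustration.

```python
from collections import defaultdict

def build_point_index(photo_points):
    """Map each key-point id to the set of photos in which it is visible."""
    index = defaultdict(set)
    for photo, points in photo_points.items():
        for point in points:
            index[point].add(photo)
    return index

def photos_showing(index, selected_points, min_shared=1):
    """Return photos that show at least min_shared of the selected points.

    Photos that show more of the selected points come first; ties are
    broken by file name so the order is predictable.
    """
    counts = defaultdict(int)
    for point in selected_points:
        for photo in index.get(point, ()):
            counts[photo] += 1
    hits = [(photo, n) for photo, n in counts.items() if n >= min_shared]
    hits.sort(key=lambda item: (-item[1], item[0]))
    return [photo for photo, _ in hits]

# Hypothetical data: which model key points each photo shows.
photo_points = {
    "front_a.jpg": {1, 2, 3, 4},
    "front_b.jpg": {3, 4, 5},
    "side.jpg": {5, 6},
}
index = build_point_index(photo_points)

# Suppose key points 3 and 4 sit on the main doors of the cathedral.
main_doors = {3, 4}
matching = photos_showing(index, main_doors)
```

Selecting the doors pulls back the two front views and skips the side view, which shows neither point. The real system presumably does far more (finding the key points in the first place, for one), but the lookup itself can be this simple.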

The concept sounds absurdly simple, yet it demonstrates an ingenious approach that, I think, likely has applications far beyond what's shown. I'm not bright enough to suggest what those applications might be, but I'd bet serious money on them. Tied in with image recognition, this could be awesome.

2 comments:

genderist said...

We've had some serious Microsoft hate in this house over the last couple of weeks...

Cerulean Bill said...

It comes as a surprise to me to find myself saying nice things about them. But this is intriguing technology of a type I've not seen before, and I do seriously think it can lead to some amazing leaps in image recognition and manipulation. I thought they deserved some credit, even if they do eventually try to corner the market and squeeze it hard. I'll still think of them collectively as MicroShaft.