A prototype surveillance camera, developed at UCLA, has the ability to give a live text description of whatever crosses its path. It provides a more efficient way to search video, and gives me a good excuse to buy a wide-brimmed hat.
The system, called I2T (Image to Text), takes images or video frames as input, runs them through a series of algorithms, and produces a textual analysis of what was there. But how does the software know? A vast human-generated rubric:
In 2005, Zhu established the nonprofit Lotus Hill Institute in Ezhou, China, and, with some support from the Chinese government, recruited about 20 graduates of local art colleges to work full-time to annotate a library of images to aid computer vision systems. The result is a database of more than two million images containing objects that have been identified and classified into more than 500 categories.
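The article doesn't publish I2T's internals, but the basic flow it describes (frame in, object detection against an annotated category database, sentence out) can be loosely illustrated with a toy sketch. Everything below is invented for illustration: the category names, the stand-in "database," and the functions are not from I2T or the Lotus Hill data.

```python
# Toy stand-in for an annotated category database (the real Lotus
# Hill library has 500+ categories; these labels are made up).
CATEGORY_DB = {
    "person": "pedestrian",
    "car": "vehicle",
    "dog": "animal",
}

def detect_objects(frame):
    """Stand-in for the detection step; a real system would run
    vision algorithms over the frame's pixels. Here a 'frame' is
    just a list of object labels."""
    return frame

def describe(frame):
    """Match detected objects against the category database and
    render a textual analysis of the scene."""
    labels = detect_objects(frame)
    categories = [CATEGORY_DB.get(label, "unknown object") for label in labels]
    return "Scene contains: " + ", ".join(categories) + "."

print(describe(["person", "car"]))
```

The point of the sketch is only the shape of the pipeline: the hard part, which the 20 annotators' two-million-image library addresses, is making the detection and category-matching steps work on real footage.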
Even with that much granularity, I2T still doesn't have a large enough database to correctly assess a dynamic situation, which is why it's not yet ready for commercialisation. And it will likely never match human analytic abilities. But it's a very cool technology and fun for stoking the paranoiac flames. [Technology Review]