New MIT Tech Can Read Pages Inside Closed Books… And Beat Internet CAPTCHAs

How does one read a book without opening it? Why would you want to read a closed book in the first place? While not a common problem, it’s enough of one that MIT research scientist Barmak Heshmat decided to have a crack at it. The result is a system that uses terahertz radiation, femto-photography and air to read characters from a closed book, along with an algorithm that can give CAPTCHAs a run for their money.

But first, why closed books? Seems like a lot of trouble to go to when you could just, you know, open the book. Unfortunately, a lot of old and delicate tomes exist in libraries and vaults that would not survive the opening process, let alone exposure outside of controlled environments.

As Heshmat explains, the technique had to overcome a number of challenges:

In order to read through a closed book you have to do four things: first, you have to have radiation that goes through the paper. So the paper has to be slightly transparent in this frequency range. Second, you have to have the time resolution to distinguish between different pages … Third, you have to have the spectral information of different inks — for example, the ink should be visible on that range of frequencies. And the fourth one is recognising the characters themselves.

Terahertz radiation solved the first problem: it passes through paper, and because “different chemicals absorb different frequencies” of the radiation, according to MIT’s press release, the ink responds differently from the page it is printed on.

Femto-photography handled the second requirement, and the third was a matter of finding the relevant spectral data for different inks and having it on hand.
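To get a feel for why telling pages apart is a timing problem, here is a rough back-of-the-envelope sketch in Python. It is not the MIT team's method: it simply assumes each page sends back an echo, and that the echo from a deeper page arrives later by the round-trip travel time through the paper and the air gap in front of it. The page thickness, gap width and refractive index below are made-up illustrative values, and the function names are hypothetical.

```python
C = 299_792_458.0  # speed of light in vacuum, metres per second


def round_trip_delay(thickness_m, refractive_index=1.0):
    """Time for a pulse to cross a layer and come back (seconds)."""
    return 2.0 * thickness_m * refractive_index / C


def page_echo_times(n_pages, page_thickness_m, gap_m, paper_index):
    """Arrival time of each page's echo, relative to the first page.

    Assumes every page is identical and separated by the same air gap,
    which a real book would not guarantee.
    """
    per_page = (round_trip_delay(page_thickness_m, paper_index)
                + round_trip_delay(gap_m))
    return [i * per_page for i in range(n_pages)]


# Illustrative values only (assumptions, not figures from the research):
# 100-micrometre pages, 20-micrometre air gaps, paper index of 1.5.
echoes = page_echo_times(9, 100e-6, 20e-6, 1.5)
for page, t in enumerate(echoes, start=1):
    print(f"page {page}: echo arrives {t * 1e12:.2f} ps after page 1")
```

Under these assumed numbers, neighbouring pages are separated by roughly a picosecond, which gives a sense of the time resolution the second requirement calls for.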

As for the last one, the research team came up with their own character recognition algorithms. Because they had to deal with images that would be heavily distorted and noisy, they ended up with something that would interest many a spammer:

“It’s actually kind of scary,” Heshmat says of the letter-interpretation algorithm. “A lot of websites have these letter certifications [captchas] to make sure you’re not a robot, and this algorithm can get through a lot of them.”

The end result is a system that can distinguish 20 individual pages, although the radiation isn’t strong enough to read text deeper than nine or so. The release mentions that advances in the associated technologies should improve these numbers, but for now, it’s a neat achievement.

[MIT, via TechCrunch]