Researchers Just Created The Most Amazing Lip-Reading Software

November 10, 2016 at 4:00 pm

One of the most unsettling moments in Stanley Kubrick’s 2001: A Space Odyssey is when it’s revealed that HAL 9000 can read lips, leaving no secrets between the astronauts and the ship’s computer. That might have been science fiction, but 15 years after the events of that film, researchers in the real world have finally taught computers how to read lips.

LipNet, developed by researchers at the University of Oxford Computer Science Department, isn’t the first software designed to predict what a person is saying by analysing the movement of their lips. But it’s by far the most accurate, achieving an impressive 93.4 per cent accuracy, compared to just 52 per cent accuracy achieved by an experienced human lip reader.

So what’s the “secret sauce” that makes LipNet so adept at reading lips? Here’s the researchers’ abstract, which explains what makes their approach different, and better:

Lipreading is the task of decoding text from the movement of a speaker’s mouth. Traditional approaches separated the problem into two stages: designing or learning visual features, and prediction. More recent deep lipreading approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman, 2016a). All existing works, however, perform only word classification, not sentence-level sequence prediction. Studies have shown that human lipreading performance increases for longer words (Easton & Basala, 1982), indicating the importance of features capturing temporal context in an ambiguous communication channel. Motivated by this observation, we present LipNet, a model that maps a variable-length sequence of video frames to text, making use of spatiotemporal convolutions, an LSTM recurrent network, and the connectionist temporal classification loss, trained entirely end-to-end.

So what does all that mean in English? Based on previous research, the computer scientists realised that humans are better at reading lips, and deciphering what’s being said, when longer words are spoken. So instead of analysing footage of someone speaking on a word-by-word basis, LipNet goes one step further by taking entire sentences into consideration, then using deep learning techniques to work back and decipher each word.
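For the curious, here is a rough idea of how the “connectionist temporal classification” part of that abstract turns a guess for every video frame into readable text. This is a minimal Python sketch of greedy CTC decoding in general, not LipNet’s actual code: the function name, the `_` blank symbol and the per-frame guesses are all made up for illustration.

```python
# Assumed symbol for the CTC "blank" label (meaning "no character on this frame").
BLANK = "_"


def ctc_greedy_decode(frame_labels):
    """Collapse runs of repeated labels, then drop blanks.

    frame_labels holds one predicted character per video frame
    (hypothetical data here, not real model output).
    """
    decoded = []
    previous = None
    for label in frame_labels:
        # Keep a label only when it differs from the previous frame's
        # label and is not the blank symbol.
        if label != previous and label != BLANK:
            decoded.append(label)
        previous = label
    return "".join(decoded)


# Made-up predictions for 23 frames: the same letter often spans several frames.
frames = list("bb_iii_nn__  bb_lluu_ee")
print(ctc_greedy_decode(frames))  # prints "bin blue"
```

A blank between two identical letters keeps them separate, so `l_l` decodes to `ll` while `ll` collapses to a single `l`. That is what lets a network emit one guess per frame for a whole sentence without anyone marking where each word starts and ends.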

But what does this mean for those of us outside academia? Running on a smartphone, fed a live feed from a body-worn camera, LipNet could serve as an amazing tool for the hearing impaired. Even if they already know how to lip read, it could help boost their understanding while watching someone speak. And those without lip reading skills wouldn’t be frustrated when a person they’re speaking to doesn’t know sign language.

[Cornell University Library via Laughing Squid]
