As convenient as it is to ask Siri to skip to the next track or load up songs from your favourite artist without pulling out your phone, there are times when verbally interacting with smart assistants isn’t an option. So researchers at Cornell University developed a wearable smart camera that can detect voice commands even when the user doesn’t mutter a sound.
Voice-activated assistants and their ability to effortlessly understand voice commands continue to improve year after year, but the one thing they’ve all been very good at from the start is understanding simple commands. One of the best reasons to opt for wireless earbuds from Apple, Google, and Amazon is easy access to each company’s smart assistant through trigger words, so the experience is entirely hands-free.
But for those times when you don’t want to bark commands out loud (like when packed into a crowded subway car) or don’t want anyone to know you’re asking Siri to queue up your Celine Dion greatest hits playlist, the SpeeChin is an interesting alternative.
Designed by Cheng Zhang, assistant professor of information science in the Cornell Ann S. Bowers College of Computing and Information Science, and Cornell University doctoral student Ruidong Zhang, the SpeeChin is a compact infrared camera hanging on a necklace that’s worn at chest level. The camera points upwards, capturing high-contrast video of the wearer’s chin movements, which, after some training, can be used to figure out what someone is saying without them making any sound. Not only is the camera’s location more covert than mounting a camera to someone’s face to record their mouth movements, it also sits at an angle where other people’s faces can’t be captured, minimising privacy concerns.
The researchers tested the SpeeChin with 20 participants; 10 of them spoke 54 simple phrases including digits and common voice assistant commands in English, and 10 spoke 44 simple words and phrases in Mandarin Chinese. After a training period, the chin-tracking camera was able to recognise commands in English with 90.5% accuracy, and commands in Mandarin Chinese with 91.6% accuracy. That was with the participants uttering the various phrases while remaining stationary. When asked to speak the phrases while walking, accuracy dropped as a result of variations in each person’s movements, including their gait and the added motion of their heads.
It’s a problem that could potentially be resolved with a longer training session that includes participants both standing and walking while working through the library of phrases and commands, as well as improved camera hardware better able to track chin movements through higher resolution or faster frame rates. Here’s hoping the researchers continue to develop the technology, because with more advanced silent speech recognition capabilities, the world would be a more peaceful place where no one had to make a sound.