Going by names like Siri, Cortana, and Google Now, advanced algorithms and technologies that would have baffled engineers and scientists half a century ago now rest in the palm of our hands. Talking with technology is the future of computing, mainly because that's the way we're built to communicate.
Google's short eight-minute documentary, Behind the Mic: The Science of Talking With Computers, explores humanity's obsession with conversing with machines and the challenges of developing language learning algorithms. At its most basic, this drive for verbal interaction with our tech comes from how human speech develops, as Google computer scientist Geoffrey Hinton explains in the video:
We come into this world with the innate abilities to learn how to interact with other sentient beings. Suppose you had to interact with other people by writing little messages to them. It would be a real pain. That's how we interact with computers. It's much easier to talk to them. It's just so much easier if the computer can understand what we're saying.
Despite decades of keyboards making us more comfortable with the convenience of typing than with actually talking, a recent Google study shows that teenage smartphone users are more likely to use voice search than their parents' generation.
But how we got to this point is actually a 62-year-long epic, starting in 1952, when Bell Laboratories developed a machine that could recognise only spoken digits, and only from one specific speaker. Carnegie Mellon's Harpy speech recognition system and other mathematical approaches, like the Hidden Markov model, then began the slow trek toward what we recognise today as speech recognition.
The video goes on to explain how modern technology slices and dices language down to phonemes, the building blocks of speech, but that impressive feat of engineering represents only part of creating real discourse between man and machine. The next big step will be language learning, which Google's engineers and scientists seem convinced will come in the form of neural nets that essentially mimic the way our brains interpret language.
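A rough picture of that phoneme-slicing step: a neural net assigns each slice of audio a probability over possible phonemes. The single-layer "net" below is a deliberately tiny stand-in, with hypothetical hand-picked weights and made-up features; production systems learn millions of parameters across many layers.

```python
import math

# Toy phoneme classification: a single linear layer plus softmax turns
# one acoustic frame's features into P(phoneme | frame).
# Weights and features are invented; real systems learn them from data.

PHONEMES = ["k", "ae", "t"]

# Hypothetical learned weights: one row per phoneme, one column per feature.
WEIGHTS = [
    [2.0, -1.0, 0.5],   # "k"
    [-0.5, 1.5, 0.0],   # "ae"
    [0.0, -0.5, 2.0],   # "t"
]

def classify_frame(features):
    """Return a probability for each phoneme via linear scores + softmax."""
    scores = [sum(w * f for w, f in zip(row, features)) for row in WEIGHTS]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return {p: e / total for p, e in zip(PHONEMES, exps)}

frame = [1.0, 0.2, 0.1]            # made-up acoustic features for one frame
probs = classify_frame(frame)
print(max(probs, key=probs.get))   # → k
```

Stringing these per-frame guesses into words is where models like the HMM above come in, which is why the two techniques historically worked in tandem.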
Although speech recognition has plenty of room for improvement, the next time you curse Siri for misquoting your conversation, take a second to marvel at the interaction that's actually taking place. [9to5Google]