Germany announced this week that it will begin testing voice recognition software in its screening of refugees seeking asylum. The approach may help speed up the processing of hundreds of thousands of migrants, but some experts fear that the imperfect technology could cause more harm than good.
Germany has used speech analysis since 1998 to help determine asylum seekers' countries of origin. Reportedly, around 60 per cent of refugees arrive without the required identification papers. Traditionally, if there are doubts about an applicant's stated homeland, a recording of the applicant in conversation is sent to a linguist for verification. With a major refugee crisis hitting the region, there's now a backlog of 430,000 applications to verify.
But automated speech analysis isn't just an altruistic effort to move more people through the system faster. It's also part of an initiative to detect applicants who might falsely claim to be from a country like Syria in order to receive preferential treatment. If the technology proves inadequate, that's a problem for advocates of relaxed immigration policy, who don't want anyone wrongly turned away, and it's a problem for hardliners, who fear nefarious characters being smuggled in for the purposes of terrorism.
The system that will be tested over two weeks was designed for use by banks and insurance companies. Ideally, it should be able to recognise where an applicant is from based on a sample of their speech. That analysis would then be paired with other "indicators" and might raise a flag for further review.
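The reporting doesn't describe the software's internals, but the flagging logic it outlines can be sketched in a few lines: compare a speech sample's features against reference profiles for each region, pick the best match, and flag low-confidence results for human review. Everything here is illustrative and invented, including the feature names, the example profiles, and the threshold.

```python
# Purely illustrative sketch of the flagging pipeline described above.
# The "dialect profiles" and phonetic features are hypothetical; a real
# system would use acoustic models trained on large speech corpora.
from math import sqrt

# Toy profiles: relative frequencies of a few invented phonetic markers.
PROFILES = {
    "levantine": {"q_glottal": 0.8, "emphatic_s": 0.6, "vowel_shift": 0.2},
    "maghrebi":  {"q_glottal": 0.1, "emphatic_s": 0.5, "vowel_shift": 0.7},
}

def cosine(a, b):
    """Cosine similarity between two sparse feature dicts."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(sample, threshold=0.9):
    """Return (best_region, score, needs_human_review).

    A score below the threshold raises a flag for further review,
    mirroring the "indicators" step described in the article.
    """
    scores = {region: cosine(sample, prof) for region, prof in PROFILES.items()}
    best = max(scores, key=scores.get)
    return best, scores[best], scores[best] < threshold

# A sample whose features sit close to the "levantine" profile.
sample = {"q_glottal": 0.75, "emphatic_s": 0.55, "vowel_shift": 0.25}
region, score, review = classify(sample)
```

The sketch also makes the critics' point concrete: the output is only as good as the reference profiles, and a speaker whose features drift (age, time abroad, coaching) can land on the wrong side of the threshold.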
Some linguists insist that the nuances of language can't be captured accurately by current software. Monika Schmid, a professor of linguistics at the University of Essex, told Deutsche Welle that many factors complicate the analysis of speech, including age, use of slang, and the human tendency to adapt the way one speaks. She gave an example from a recent study her team conducted: they asked native German speakers to judge whether an audio clip came from another native German speaker. All of the samples were, in fact, from native speakers, but ones who had lived abroad for at least five years. Again and again, the test group did not believe the samples were from native speakers. Think about when Madonna comes back to the U.S. after spending a weekend in England.
Applicants could also fake a dialect if they hope to pass as coming from a country that gets fast-tracked. "I don't see how automated software can distinguish whether a person uses a certain word or pronounces it in a particular way because this is part of their own repertoire or because they were primed to do so by the interviewer or interpreter," Schmid says. "Identifying the region of origin for anyone based on their speech is an extremely complex task. Both humans and machines can easily be wrong, but humans are probably better at realising this."
Dirk Hovy, a computer scientist at the University of Copenhagen, agrees that making this sort of system accurate is a monumental task. "Creating a perfect dataset is virtually impossible because language is constantly changing," he told Die Welt.