For the first time, two AI systems built to process and respond to human language (created by Microsoft and Chinese e-commerce giant Alibaba, respectively) have outscored humans in a reading comprehension test designed by Stanford researchers.
A still from a video promoting Alibaba’s AI. (Screenshot: Alibaba)
The Stanford Question Answering Dataset, or SQuAD, is composed of more than 100,000 questions, each following a brief reading passage. Created in 2016, SQuAD is used as a benchmark to measure AI’s progress in natural language processing. After reading excerpts from Wikipedia, the systems answer questions such as “What is the Latin name for Black Death?” and “How many actors have played Doctor Who?” Both Microsoft’s and Alibaba’s AI systems outscored humans in the latest round of testing. Alibaba’s AI scored 82.440 and Microsoft’s scored 82.650, with humans trailing both at 82.304.
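To give a sense of what a score like 82.650 might mean, here is a minimal Python sketch of exact-match scoring. The article does not say how the scores are calculated, so this assumes they are the percentage of questions answered with an exact match, one of the metrics the SQuAD leaderboard reports. The function names, question IDs and toy data are hypothetical, and the normalisation is a simplified stand-in rather than the official evaluation script.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match_score(predictions, references):
    """Return the percentage of questions whose predicted answer matches
    any accepted reference answer after normalisation.

    predictions: dict mapping question ID -> predicted answer string
    references:  dict mapping question ID -> list of accepted answer strings
    """
    hits = 0
    for qid, answers in references.items():
        predicted = normalize(predictions.get(qid, ""))
        if any(predicted == normalize(answer) for answer in answers):
            hits += 1
    return 100.0 * hits / len(references)


# Hypothetical toy data: two questions, one answered correctly.
references = {
    "q1": ["John F. Kennedy", "Kennedy"],
    "q2": ["Stanford"],
}
predictions = {
    "q1": "john f. kennedy",
    "q2": "Microsoft",
}

print(exact_match_score(predictions, references))  # 50.0
```

Under this assumption, a score of 82.650 would mean the system's answer matched an accepted answer on roughly 83 of every 100 questions, a margin over the human figure of well under one question in a hundred.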
Alibaba’s system may have finished second, but it’s more than qualified to handle its day job: working in sales. The company’s AI team reportedly works closely with the developers of Ali Xiaomi, a chatbot that answers customer questions about products. Alibaba says that at peak times, Ali Xiaomi can handle 95 per cent of its online customer questions. For now, the goal is to create a new class of responsive AI to help with a variety of online tasks. Similarly, Microsoft has used AI to boost the capabilities of its office suite.
“The technology underneath can be gradually applied to numerous applications,” said Alibaba’s chief scientist Luo Si in a statement, “such as customer service, museum tutorials and online responses to medical inquiries from patients, decreasing the need for human input in an unprecedented way.”
There’s something grimly familiar about an exceedingly well-trained AI trapped in a nine-to-five job answering sales questions. But while the scores are certainly impressive, AI is far from comprehending things in the way we understand the concept. As MIT Technology Review’s Jamie Condliffe puts it, AI can only recognise answers in terms of patterns or structures. So while it can correctly answer “Who was elected US president in November 1960?” it doesn’t actually know who “John F. Kennedy” is.