How Google Has Turned Language Translation Into A Maths Problem

Language translation is a notoriously difficult task for humans, let alone computers. But in trying to solve that problem, Google has stumbled across a clever trick that involves treating languages like maps, and it really, really works.

Technology Review has shone a light on a Google research paper published on the arXiv server. It spells out how the search giant’s most recent translation work plots words on language maps (think of a space full of words, arranged in some kind of logical order) and then simply uses linear operations to switch between languages. Technology Review explains:

The new approach is relatively straightforward. It relies on the notion that every language must describe a similar set of ideas, so the words that do this must also be similar. For example, most languages will have words for common animals such as cat, dog, cow and so on. And these words are probably used in the same way in sentences such as “a cat is an animal that is smaller than a dog.”

The same is true of numbers. The image above shows the vector representations of the numbers one to five in English and Spanish and demonstrates how similar they are.

This is an important clue. The new trick is to represent an entire language using the relationship between its words. The set of all the relationships, the so-called “language space”, can be thought of as a set of vectors that each point from one word to another. And in recent years, linguists have discovered that it is possible to handle these vectors mathematically. For example, the operation ‘king’ - ‘man’ + ‘woman’ results in a vector that is similar to ‘queen’.
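To make the ‘king’ - ‘man’ + ‘woman’ arithmetic concrete, here is a minimal Python sketch. The two-dimensional vectors are made up by hand for illustration (one axis loosely stands for "royalty", the other for "gender"); real word vectors are learned from large amounts of text and have hundreds of dimensions. The `analogy` helper is a hypothetical name, not something from the paper.

```python
import math

# Hand-made toy vectors (an assumption for illustration, not real embeddings).
# Axis 0 loosely means "royalty", axis 1 loosely means "gender".
vectors = {
    "king":  (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man":   (0.0, 1.0),
    "woman": (0.0, -1.0),
    "apple": (-1.0, 0.1),
}

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c):
    """Return the word whose vector is most similar to a - b + c."""
    target = tuple(x - y + z
                   for x, y, z in zip(vectors[a], vectors[b], vectors[c]))
    # Skip the three input words so the answer is a different word.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

print(analogy("king", "man", "woman"))  # prints: queen
```

With these toy numbers the result lands exactly on ‘queen’; with learned vectors the match is only approximate, which is why the nearest word is picked by similarity rather than by equality.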

Turns out that lots of languages share a huge number of similarities when they’re mapped out in this way, which means the problem isn’t about finding the right words, but about finding the right way to map one vector space to another. In other words, translation’s no longer a problem of linguistics, but of mathematics.
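The idea of mapping one vector space onto another can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word vectors below are invented, the Spanish space is simply the English one rotated by 90 degrees, and the matrix is fitted by plain gradient descent on squared error over a tiny seed dictionary. The names `fit_linear_map` and `translate` are hypothetical, and the researchers' actual training procedure may differ.

```python
import math

# Invented 2-D vectors (assumption): the Spanish space is the English
# space rotated by 90 degrees, i.e. (x, y) -> (-y, x).
en = {"one": (1.0, 0.0), "two": (0.0, 1.0), "three": (1.0, 1.0),
      "four": (2.0, 1.0), "five": (1.0, 2.0)}
es = {"uno": (0.0, 1.0), "dos": (-1.0, 0.0), "tres": (-1.0, 1.0),
      "cuatro": (-1.0, 2.0), "cinco": (-2.0, 1.0)}

# Small seed dictionary used for fitting; "five" is deliberately held out.
train_pairs = [("one", "uno"), ("two", "dos"),
               ("three", "tres"), ("four", "cuatro")]

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def cosine(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) *
                  math.sqrt(sum(q * q for q in b)))

def fit_linear_map(pairs, dim=2, lr=0.05, steps=500):
    """Fit W minimising sum ||W x - z||^2 by batch gradient descent."""
    W = [[0.0] * dim for _ in range(dim)]
    for _ in range(steps):
        grad = [[0.0] * dim for _ in range(dim)]
        for src, tgt in pairs:
            x, z = en[src], es[tgt]
            err = [p - q for p, q in zip(matvec(W, x), z)]
            for i in range(dim):
                for j in range(dim):
                    grad[i][j] += 2 * err[i] * x[j]
        for i in range(dim):
            for j in range(dim):
                W[i][j] -= lr * grad[i][j]
    return W

def translate(word, W):
    """Map an English vector into the Spanish space, return nearest word."""
    mapped = matvec(W, en[word])
    return max(es, key=lambda w: cosine(mapped, es[w]))

W = fit_linear_map(train_pairs)
print(translate("five", W))
```

The point of holding out "five" is that the map is learned from four word pairs only, yet it still sends the unseen word to the right neighbourhood in the other language. That is the sense in which the problem becomes one of finding the right transformation rather than the right words.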

The neat thing is that the technique doesn’t really make any assumptions about the languages involved; it just interrogates the way their vector spaces are related to each other. That makes it incredibly versatile. But the algorithms are still in their early days; there’s a way to go yet. [arXiv via Technology Review]

Picture: dylancantwell/Flickr