When Microsoft and Skype revealed Skype Translator in May, everyone displayed awe and wonder at a service that could finally traverse the language barrier. The premise was that the Skype Translator app would convert speech in real time, allowing fluid conversation between partners who speak different languages.
Accomplishing something so monumental (and also releasing the beta later this year) is, in itself, a massive challenge. However, there's another layer to this science fiction Babel fish, and that's learning the differences between how people write and how they speak.
Teresa Chong with IEEE Spectrum spoke with the Microsoft development team in Redmond, Wash., about how exactly Skype Translator will handle all the "ums," "ahs," "you knows," and "likes" that pockmark everyday speech, as well as the vocal inflections that separate a question from a statement. Chong highlights the main problem:
The gap exists between translating text and translating speech because some of the best machine translation systems today are taught using large volumes of high-quality text, which does not include the awkwardness that speech recognition systems deal with.
First, Microsoft took the traditional approach, but instead of only mapping phrases between languages, the team went a step further and mapped individual words as well. This helped overcome grammatical inconsistencies across languages. However, this soon brought them to social media, where each platform -- primarily Facebook, SMS, and Twitter -- posed a unique challenge. The researchers adapted a "social media text normalization platform" to their existing system and improved text translation by six per cent, with one developer saying "it really did move the needle on understanding and translating that type of data better."
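To give a rough sense of what "text normalization" can mean in practice, here is a toy Python sketch. It is not Microsoft's system: the `SLANG` table and the `normalize` function are hypothetical names invented for illustration, and a real platform would presumably learn its mappings from data rather than a hand-written dictionary.

```python
import re

# Hypothetical slang table, purely for illustration.
SLANG = {"u": "you", "r": "are", "gr8": "great", "2moro": "tomorrow", "pls": "please"}

def normalize(text: str) -> str:
    """Rewrite informal social media text into something closer to standard prose."""
    text = re.sub(r"https?://\S+", "", text)        # drop URLs
    text = re.sub(r"[@#](\w+)", r"\1", text)        # keep the word, drop the @ or #
    text = re.sub(r"([a-zA-Z])\1{2,}", r"\1", text) # collapse letters repeated 3+ times
    tokens = re.findall(r"\w+|[^\w\s]", text)       # split into words and punctuation
    words = [SLANG.get(t.lower(), t) for t in tokens]
    out = " ".join(words)
    return re.sub(r"\s+([^\w\s])", r"\1", out)      # reattach punctuation to the word before it

print(normalize("u r gr8"))
print(normalize("soooo gr8 #Skype, pls see http://example.com/x"))
```

The cleaned-up text would then be handed to a translation system trained on more formal writing, which is the gap the researchers describe.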
This is another example of how social media is indispensable for research, not just in the social sciences, but in computer science as well. Hopefully, Microsoft will also benefit from Reddit, Imgur, and Twitch's recent endeavours with the Digital Ecologies Research Partnership (of course, called "DERP"), which gives researchers access to community-driven data across their platforms.
Machine language learning is constantly evolving, and now our hashtags, posts, tweets, and digital slang will be a part of Skype Translator's future. [IEEE Spectrum]