Cornell Student Articles on Topical Affairs

Could Artificial Intelligence Become the Future of Translation?

A lot of people watch Game of Thrones. It’s one of the most popular shows on the planet right now, if not the most popular, and as such, it probably receives some of the most widespread and intense scrutiny of any show currently airing. With all that in mind, it might be tempting to assume that the showrunners, directors, and others in charge of the set and filming would be more careful than they have been recently, when someone left a Starbucks cup in a shot that went on to air on HBO in front of millions of viewers. However, there was recently another gaffe that likely went unnoticed by everyone but a smaller section of the show’s viewers, namely Spanish speakers. In one very tense scene, one of the characters yells (in English) “She can’t see us!” What was the Spanish translation? “Sicansios!” This may seem all well and good to an average English-speaking viewer, but there’s one minor flaw: “Sicansios” is not a word in any language, let alone Spanish.

How did this happen? The problem lies in the original actor’s thick Scottish accent, as well as a number of issues endemic to the business of quick-and-dirty professional media translation that should look familiar to anyone who’s had a job in the last two thousand years: unskilled labor, impressively low wages, and in this specific instance, indecipherable source material. This raises the question: Will artificial intelligence solve these issues once and for all? Will it put translators out of work? The answer is, predictably, maybe, but not any time soon.

The most enterprising and forward-thinking (and optimistic, probably) in the field say that online translation will become an unstoppable juggernaut as soon as one to three years from now, taking up the vast majority of translation work. However, here we have the benefit of hindsight: That prediction is from August of 2018, and almost a year later, not much seems to have changed. Facebook, impressively, has forged ahead and passed some important milestones on its way to challenging Google and other online translation businesses for ultimate dominance, and the top companies have allegedly already implemented the newest and shiniest neural-net and deep-learning-powered artificial intelligence algorithms into their translation services. And yet, if one were to type even a lightly informal, totally innocuous, and potentially commonly used phrase like “Hey man, any chance I could crash at your place tonight?” into Google Translate and send it into, say, Mandarin Chinese and back, the transformed English phrase is “Hey, man, I have a chance to hit your place tonight?” That, far from a casual request for a free couch, sounds more like a nervous would-be mobster trying to tip off their next potential target. It’s a big difference.

Machine translation systems are still full of issues like these, even in the era of nascent algorithmic dominance of absolutely everything, which puts a bit of a damper on the excited claims that machines are about to make translation near-perfect. That’s not for lack of trying. Examples of total failure abound (call them growing pains), and nearly all of them are hilarious. Take, for instance, Wikipedia, which has partnered with Google to translate English pages into other languages: the translation algorithm somehow turned “village pump,” a reasonable and not at all offensive English phrase, into “bomb the village” in Portuguese.

Other issues may be more insidious and slightly less hilarious, however. Part of what made Facebook’s above-mentioned progress toward improving its translation algorithms possible was a form of machine learning wherein the machine learns from existing data as well as the data that it produces itself. It turns out that this is somehow teaching machines to be sexist: when translating from a non-gendered language into a gendered one, they fill in the missing gender with the most stereotypical choice. A language that has no “he” or “she” but only something closer to “it,” for instance, might yield “he is an engineer” for one sentence and “she is a nurse” for another, even though neither pronoun was gendered in the original.

All this is to say that machine learning translation still has a long way to go. Anyone who speaks more than one language beyond a perfunctory level can make immediate and fairly accurate guesses regarding what will pose difficulties for machine translations. Take Indonesian, for instance, a language awash with cryptic and sometimes intimidating concatenations: In English, we might use an initialism like F.B.I. or an acronym like NASA (the latter, spoken as a word, is also likely to confuse an algorithm), but in Indonesian, it’s much more common to string together the first few letters of each word. So instead of BBC, you might have BritBroadCorp, for British Broadcasting Corporation. These words are common to the point of ubiquity, and used everywhere from social media to official government communiques. And the machine translator, predictably, chokes on them.
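
For the curious, here is a rough sketch in Python of why these blends are so hard on a machine. It builds a blend by clipping a few letters from each word, then checks it against a tiny word list. The function names, the three-letter default, and the word list are all made up for illustration, and real Indonesian blends don't always clip the same number of letters. The example phrase, “Pusat Kesehatan Masyarakat” (community health center), really is shortened to “Puskesmas” in everyday use.

```python
def clip_blend(phrase: str, take: int = 3) -> str:
    """Join the first `take` letters of each word into a single blended word.

    Assumption for illustration: every word contributes the same number of
    letters. Real blends are less regular than this.
    """
    clipped = "".join(word[:take].lower() for word in phrase.split())
    return clipped.capitalize()


# Hypothetical toy lexicon: a lookup that only knows whole dictionary words.
KNOWN_WORDS = {"pusat", "kesehatan", "masyarakat"}


def lookup_fails(token: str) -> bool:
    """Return True when the token is not a whole word in the toy lexicon."""
    return token.lower() not in KNOWN_WORDS


blend = clip_blend("Pusat Kesehatan Masyarakat")
print(blend)                # Puskesmas
print(lookup_fails(blend))  # True: the blend matches none of the known words
```

Every piece of the blend comes from a word the lexicon knows, yet the blend itself matches nothing. A real translation system is far more sophisticated than a word list, but the sketch shows the basic shape of the problem.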

And yet, human translators gave us the now-infamous “Sicansios,” so what’s to be done? Just because machines aren’t perfect doesn’t mean they won’t get better in the future, and for now the best thing to do is realize that language itself is vague, ambiguous, and generally difficult, and some error is probably inevitable no matter how talented the translator is. And, for now, those talented humans are still the best at what they do.
