3 reasons why AI won’t replace human translators… yet

Editor’s note: A version of this piece by Jonathan Rechtman originally appeared on the World Economic Forum Agenda website. All opinions expressed are those of the author.

I’ve been a proud simultaneous interpreter for the better part of a decade, but I’ll admit it’s not the flashiest of jobs. We speak in hushed voices, tucked away in small booths in the corners of conference rooms, the polyglot wallflowers of the global economy.

Grumbles grew into a roar recently when an interpreter in Shanghai took to social media to protest the misleading marketing of “AI-powered translation” at an international conference. The translation was, in fact, a voice-to-text transcription of the human interpreter’s work. The post went viral on Chinese social media and created a scandal around iFlyTek, the promoter of the mislabeled technology and one of China’s leading AI and natural language processing (NLP) companies.

The public response revealed the extent to which machine superiority in the field is already taken for granted. People seem genuinely shocked that in this day and age, interpreting still requires human professionals to perform actual knowledge work.

Didn’t Google Translate solve this problem years ago? Or Skype Translator? Or any of a dozen wearable translation devices on the market claiming to be the next “Babel Fish”?

AI consistently outperforms humans at driving cars, diagnosing cancer, shooting free throws and predicting crop yields (not to mention chess, Go, poker and Jeopardy). But when it comes to translation and interpreting, the most sophisticated technology on earth is still by far the human brain.

How come? There are three reasons.

1. Language is subjective

Artificial intelligence typically excels at tasks that are rooted in objective reality. Whether identifying elusive signal patterns in data sets or navigating complex road conditions, machines function best when confronted with clear mathematical or physical rules that govern their decision-making.

Natural languages, by contrast, are subjective constructs invented by groups of humans to communicate with each other. They often exhibit rule-like behavior (grammar and conjugation, for example), but these rules are grounded only in convention, not objective reality, and they are constantly evolving.

We humans may have forfeited our lead in recognizing tumors or judging credit risk, but we still have, and may always have, the final authority over what is or isn’t “natural” in a natural language. This authority is reflected in the metric of choice for evaluating machine translation algorithms, BLEU (bilingual evaluation understudy). BLEU scores a candidate translation by how many of its word sequences also appear in one or more professional human reference translations. “The closer a machine translation is to a professional human translation, the better it is,” concede the framework’s inventors.
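To make that dependence concrete, here is a minimal, simplified sentence-level version of BLEU in Python. It assumes whitespace tokenization and no smoothing. The function names are illustrative, and established toolkits compute BLEU over whole test sets with their own tokenization. Every number it produces comes from comparing the machine output with human references. It clips n-gram precisions, takes their geometric mean, and scales the result down with a brevity penalty for overly short output.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count every n-gram (as a tuple of words) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=4):
    """Simplified, unsmoothed sentence-level BLEU (illustrative sketch).

    Score = brevity penalty * geometric mean of clipped n-gram precisions
    for n = 1..max_n, where all precisions are measured against human references.
    """
    cand = candidate.split()              # assumption: whitespace tokenization
    refs = [r.split() for r in references]
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        total = sum(cand_counts.values())
        if total == 0:                    # candidate shorter than n words
            return 0.0
        # Clip each candidate n-gram count by its highest count in any one reference
        max_ref_counts = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
        if clipped == 0:                  # no smoothing: any zero precision zeroes the score
            return 0.0
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty against the closest reference length (ties go to the shorter one)
    c = len(cand)
    r = min((len(ref) for ref in refs), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum / max_n)
```

A candidate identical to the human reference scores 1.0. “the cat sat on the mat” scored against the reference “the cat is on the mat” gets about 0.71 with `max_n=2`. At no point does the calculation ask whether a translation is good in itself, only whether it resembles the human one. Because this sketch is unsmoothed, that same short pair scores 0 at the default `max_n=4`, since no four-word sequence matches. That is one reason BLEU is normally reported over whole test sets rather than single sentences.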

Human translation doesn’t just set the standard; it necessarily is the standard.

2. Big data doesn’t have a big sense of humor

Any translator will tell you that jokes, puns and sly innuendo (as well as nuanced cultural references) are among the hardest bits to get over the language barrier. Yet without them, our quality of expression becomes much poorer. From an interpreter’s standpoint, tone of voice and body language also directly inform a speaker’s intent and have to be accurately analyzed and conveyed in the target language as well.

This is challenging for humans, but it’s currently impossible for machines.

The move from statistical, phrase-based machine translation to neural networks has yielded significant improvements in overall quality. But neural machine translation is even more dependent on huge sets of training data than its predecessor models. And since the biggest bilingual datasets available are from official translations of government documents and religious texts, these algorithms have a pitifully low exposure to humor, wordplay and non-verbal expression.

Most disturbingly, neural machine translation often doesn’t confess its mistakes. Rather, like an ill-prepared schoolchild, it tries to fudge through them. When Google Translate started offering biblical prophecies in exchange for junk input, experts attributed the errors to neural networks’ preference for fluency over accuracy.

These “false positives” are far more insidious than clumsier and more obvious mistakes, as audiences in the target language might never realize a glitch has occurred and might attribute the outlandishness of the renegade translation to the original text itself.

3. Listen up, bots

The challenges above make it difficult enough to perform machine translation on a piece of static text. Asking a computer to interpret live speech simultaneously adds several layers of complexity, the most obvious being automatic speech recognition (ASR).

Yes, Siri, Alexa and their ilk seem to be pretty competent conversationalists these days. But your witty repartee with robots is typically constrained to a narrow set of contexts and conditions: short, command-based interactions involving a finite vocabulary in a controlled environment. Most live conferences and business discussions, on the other hand, feature speech that is spontaneous, continuous and highly context-dependent – traits that send the error rate of most ASR programs through the roof.

Hilarious and humiliating examples abound. Giving a speech in Beijing earlier this year, hedge fund guru Ray Dalio reflected on his mis-forecasts as a young trader.

“How arrogant!” he thundered to the crowd. “How could I be so arrogant?”

The real-time subtitling program valiantly struggled to render his rhetorical device.

“How?” the subtitles asked. “Aragon, I looked at myself and i”.

Recent advances in the field are promising, and many experts predict that the word error rate of ASR software will reach parity with human transcribers in the near future. Not all word errors are the same, though. Fudging “alright” into “all right” might be an inconsequential mistake, while confusing “today” with “Tuesday” would likely cause a substantial mix-up. Even with fewer word errors, machines remain far more likely than humans to commit semantic errors that misrepresent the intended meaning of a speech.
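Word error rate, the standard ASR yardstick, counts the substitutions, deletions and insertions needed to turn a machine transcript into a human reference transcript, then divides by the number of reference words. The minimal Python sketch below computes it as a word-level edit distance. It assumes whitespace tokenization and lowercase text, and the function name and example sentences are hypothetical. It shows why parity on this metric says little about meaning.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein (edit) distance. Illustrative sketch."""
    ref = reference.split()               # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = fewest edits turning the first i reference words into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i                    # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j                    # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution (or match)
    return dist[len(ref)][len(hyp)] / len(ref)
```

Take the hypothetical reference “that is alright we meet today”. The transcript “that is all right we meet today” scores a WER of 2/6 (one substitution plus one insertion). “that is alright we meet tuesday” scores only 1/6, even though it is the one that would send everyone to the wrong meeting.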

Not a human, not yet a robot

Humans have long made a pastime of reflecting on our perceived superiority – over animals, over each other and more recently over machines. It’s a dark pastime, to be sure, and an inevitably foolish one.

I don’t doubt that the day may come when computers develop a human-like command of our natural languages. I don’t doubt that one day interpreters and translators, along with copywriters, editors, radio hosts and other professionals in the language economy, may find their jobs on the robot’s chopping block.

But that day is further away than most people think. Language work – always part art, part science – is surprisingly defensible against these early iterations of AI.

As in so many other industries, we language professionals should focus our attention on using AI/NLP technologies to increase the efficiency, quality and cost-competitiveness of our labor. Computer-assisted translation tools are already widely used among text translators, and while many bristle at the suggestion, no doubt simultaneous interpreters could benefit from some combination of speech recognition and translation memory technology. For the foreseeable future, at least, these tools would serve as a complement, not an alternative, to human output.

And as long as there are humans in the interpreter’s booth, let’s have the decency to give them the credit they deserve.