I’ve been a proud simultaneous interpreter for the better part of a decade, but I admit that it’s not the flashiest of jobs. We speak in hushed voices, tucked away in small booths in the corners of conference rooms, the polyglot wallflowers of the global economy.
Grumbles grew into a roar recently when an interpreter in Shanghai took to social media to protest the misleading marketing of “AI-powered translation” at an international conference. The translation was, in fact, a voice-to-text transcription of the human interpreter’s work. The post went viral on Chinese social media and created a scandal around iFlyTek, the promoter of the mislabeled technology and one of China’s leading AI and natural language processing (NLP) companies.
The public response revealed the extent to which machine superiority in the field is already taken for granted. People seem genuinely shocked that in this day and age, interpreting still requires human professionals to perform actual knowledge work.
Artificial intelligence typically excels at tasks that are rooted in objective reality. Whether identifying elusive signal patterns in data sets or navigating complex road conditions, machines function best when confronted with clear mathematical or physical rules that govern their decision-making.
Natural languages, by contrast, are subjective constructs invented by groups of humans to communicate with each other. They often exhibit rule-like behavior (grammar and conjugation, for example), but these rules are grounded only in convention, not objective reality, and they are constantly evolving.
Humans may have forfeited our lead in recognizing tumors or judging credit risk, but we still have, and may always have, the final authority over what is or isn’t “natural” in a natural language. This authority is reflected in the metric of choice for evaluating machine translation algorithms – BLEU (bilingual evaluation understudy) – which scores candidate translations based on their similarity to a human professional’s work. “The closer a machine translation is to a professional human translation, the better it is,” concede the framework’s inventors.
Human translation doesn’t just set the standard; it necessarily is the standard.
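To make the BLEU idea concrete, here is a minimal sketch of how such a score can be computed against a single human reference. It is a simplification for illustration, not the full metric: it assumes whitespace tokenization, one reference, n-grams only up to length 2, and no smoothing. The function names `ngrams` and `simple_bleu` and the example sentences are made up for this sketch.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Simplified single-reference BLEU (illustrative only).

    Geometric mean of clipped n-gram precisions for n = 1..max_n,
    multiplied by a brevity penalty. No smoothing is applied.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the human reference.
        matched = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any zero precision zeroes the score
        log_precisions.append(math.log(matched / total))

    # Brevity penalty: candidates shorter than the reference are penalized.
    if len(cand) > len(ref):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)


human = "the cat sat on the mat"
print(round(simple_bleu("the cat sat on the mat", human), 3))  # 1.0
print(round(simple_bleu("the cat is on the mat", human), 3))   # 0.707
```

The score can only ever say how close the machine came to the human's wording. With no human reference, there is nothing to measure against.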
2. Big data doesn’t have a big sense of humor
Any translator will tell you that jokes, puns and sly innuendo (as well as nuanced cultural references) are among the hardest bits to get over the language barrier. Yet without them, our quality of expression becomes much poorer. From an interpreter’s standpoint, tone of voice and body language also directly inform a speaker’s intent and have to be accurately analyzed and conveyed in the target language as well.
This is challenging for humans, but it’s currently impossible for machines.
Most disturbingly, neural machine translation often doesn’t confess its mistakes. Rather, like an ill-prepared schoolchild, it tries to fudge through them. When Google Translate started offering biblical prophecies in exchange for junk input, experts attributed the errors to neural networks’ preference for fluency over accuracy.
These “false positives” are far more insidious than clumsier and more obvious mistakes, as audiences in the target language might never realize a glitch has occurred and might attribute the outlandishness of the renegade translation to the original text itself.
3. Listen up, bots
The challenges above make it difficult enough to perform machine translation on a piece of static text. Asking a computer to interpret live speech simultaneously adds several layers of complexity, the most obvious being automatic speech recognition (ASR).
Yes, Siri, Alexa and their ilk seem to be pretty competent conversationalists these days. But your witty repartee with robots is typically constrained to a narrow set of contexts and conditions: short, command-based interactions involving a finite vocabulary in a controlled environment. Most live conferences and business discussions, on the other hand, feature speech that is spontaneous, continuous and highly context-dependent – traits that send the error rate of most ASR programs through the roof.
“How arrogant!” he thundered to the crowd. “How could I be so arrogant?”
The real-time subtitling program valiantly struggled to render his rhetorical device.
“How?” the subtitles asked. “Aragon, I looked at myself and i”.
Recent advances in the field are promising, and many experts predict that the word error rate of ASR software will reach parity with human transcribers in the near future. Not all word errors are the same, though. Fudging “alright” into “all right” might be an inconsequential mistake, while confusing “today” with “Tuesday” would likely cause a substantial mix-up. Even with fewer word errors, machines remain far more likely than humans to commit semantic errors that misrepresent the intended meaning of a speech.
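A rough sketch of how word error rate is commonly computed shows why the number alone can mislead. It assumes the usual definition (word-level edit distance divided by the number of reference words) and plain whitespace tokenization. The function name `word_error_rate` and the example sentences are invented for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance (substitutions, insertions, deletions)
    divided by the number of words in the reference transcript."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


spoken = "the meeting is today alright"
harmless = "the meeting is today all right"  # meaning preserved
harmful = "the meeting is tuesday alright"   # meaning changed

print(f"{word_error_rate(spoken, harmless):.2f}")  # 0.40
print(f"{word_error_rate(spoken, harmful):.2f}")   # 0.20
```

In this toy case the harmless transcript scores twice as badly as the one that moves the meeting to the wrong day, because the metric counts words, not meaning.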
Not a human, not yet a robot
Humans have long made a pastime of reflecting on our perceived superiority – over animals, over each other and more recently over machines. It’s a dark pastime, to be sure, and an inevitably foolish one.
I don’t doubt that the day may come when computers develop a human-like command of our natural languages. I don’t doubt that one day interpreters and translators, along with copywriters, editors, radio hosts and other professionals in the language economy, may find their jobs on the robot’s chopping block.
But that day is further away than most people think. Language work – always part art, part science – is surprisingly defensible against these early iterations of AI.
Like our peers in so many other industries, we language professionals should focus our attention on using AI/NLP technologies to increase the efficiency, quality and cost-competitiveness of our labor. Computer-assisted translation tools are already widely used among text translators, and while many bristle at the suggestion, no doubt simultaneous interpreters could benefit from some combination of speech recognition and translation memory technology. For the foreseeable future, at least, these tools would serve as a complement, not an alternative, to human output.
And as long as there are humans in the interpreter’s booth, let’s have the decency to give them the credit they deserve.