Sogou doubles down on AI with launch of real-time translation and transcription devices

At the start of the Sogou Partner Conference in Beijing, the audience heard a recording that sounded like American President Donald Trump congratulating Sogou on its recent IPO and calling it a step forward for American-Chinese relations. We wondered whether President Trump had started watching CCTV instead of Fox News until we realized that the recording was synthesized. Well played, Sogou.

The playful recording signaled the high spirits of the search company, which had a successful IPO on the New York Stock Exchange in November 2017. In his conference keynote, CEO Wang Xiaochuan announced two new products: the Sogou Travel Translator and the Sogou Smart Translation Recorder. Targeting Chinese globetrotters, the Travel Translator offers real-time speech and image translation for 17 languages, including Chinese, English, German and Arabic. The Translation Recorder offers instant speech-to-text transcription and translation of the recorded text, also for 17 languages.

The Sogou Travel Translator (L) and Smart Translation Recorder (R) (Image credit: Sogou)

“We think the translator is a cut above the rest in the market,” said Wang Xiaochuan in an interview after the conference, when asked how the new products compare with the translation offered by competitors. This is a bold claim, considering that others in the market include the Google Pixel Buds and Chinese voice recognition company iFlytek’s range of translation hardware and software.

Sogou has the data to back that up, however, at least for the Chinese language. Its Sogou Chinese keyboard holds over 70% of the input method market, and while the company trails Baidu in the search market, Sogou’s platform is China’s largest voice search engine. According to Sogou CTO Yang Hongtao, these and other Sogou services together handle over 200 million voice requests each day, generating around 240 thousand hours of voice data per day, all of which will help Sogou refine its natural language processing technology.

Sogou CTO Yang Hongtao sharing some stats on Sogou at the Partner Conference. (Image credit: TechNode)

The Sogou Travel Translator and the Smart Translation Recorder represent Sogou’s first foray into AI-powered hardware products and will be available for pre-order on JD.com from March 12. TechNode tested the Travel Translator after the conference. It returned an accurate English translation for “Where is the coffee shop?” spoken in Chinese (咖啡店在哪里?). However, when we spoke the same phrase in English, the syntax of the Chinese translation was not quite right. A staff member explained that the Travel Translator will continue to be improved.

Sogou has started to focus on AI research and development in recent years. In 2017, it unveiled an IBM Watson-like robot called Sogou Wangzai (or Sogou Doggy). The canine robot appears on Jiangsu TV’s Who’s Still Standing to battle contestants in general knowledge trivia. Baidu also has a robot, Xiaodu, which has appeared on various TV game shows since 2014. Unavoidably, the two robots have drawn comparisons. On Zhihu, China’s version of Quora, some netizens thought Wangzai’s antics made it the more charming of the two.

Sogou Wangzai’s first appearance on Jiangsu TV’s Who’s Still Standing. (Image credit: screen capture from iQiyi)

“I just want to talk with this pretty girl here,” Wangzai said to host Guo Xiaomin during its debut on Who’s Still Standing. When we asked Wangzai at the Sogou conference who the current US President is, it gave the correct answer.

Another Sogou natural language processing technology showcased at the conference was lip reading. Sogou is the first company in China to develop this technology, which IBM and Google’s DeepMind have been working on for a while. Applications for this technology include assistance for people with hearing or speech disabilities, silent dictation, and subtitling.

A visitor at the Sogou Partner Conference trying out the lip reading AI. (Image credit: TechNode)

TechNode gave Sogou’s lip reading AI a go. When we read a scripted Chinese text, the accuracy was almost 100%. However, when we went off script and said “How’s the weather?” in Chinese (天气怎么样), the AI picked up the “how” (怎么样) but not the rest of the sentence. This is to be expected, as Sogou’s accuracy rate for unscripted, natural lip reading is 60%. For lip reading in scripted scenarios, such as driving or giving commands to smart home devices, Sogou claims an accuracy rate of 90%. For comparison, Google’s DeepMind can annotate English TV footage with 46.8% accuracy.

There is no doubt about AI’s role in Sogou’s future development, as CEO Wang Xiaochuan explained: “We are entering a new era where AI technology is unlocking a world of possibilities. We will continue to explore ways that we can leverage Sogou’s expertise in AI, particularly in natural language processing, to develop other smart hardware solutions to everyday challenges.”