5 min read
Toutiao is making fake news to train its anti-fake news AI
Toutiao’s AI software did not generate this headline, but for the 20 million pieces of content that flow through the platform each day, headline generation and AB testing are just two of the AI services Toutiao uses to get more people tapping.
Speaking to foreign journalists for the first time as head of the Jinri Toutiao AI Lab and vice president of the app’s owner Bytedance, Dr. Ma Wei-Ying talked about the tech that his lab is working on, why it has a bot that generates fake news and what it knows about its users.
Jinri Toutiao is a news recommendation app that is trained and updated in real time on a user’s behavior. Unlike search engines, Ma pointed out, its search function is individual rather than one ranking for everyone.
“This is the democratization of content creation,” said Ma, putting Bytedance in line with other Chinese tech companies that have recently declared themselves as content companies. “Toutiao is becoming a new information platform for people to find information and connect with information. People are using their smartphones not just to access information, but to create information. They don’t need their own website–they can use Toutiao to directly upload and publish the information and content they create.”
The tremendous amount of data generated by users and creators allows the training of neuro-network models. Applying AI to the data gathered is generating a better understanding of the world these users are in. “We are moving from a digital representation of the world to a semantic representation of the world”.
Ma believes the system is going to improve across the board. “Content creation will be fundamentally revolutionized in next few years” as AI allows the “mining of human intelligence to close the feedback loop” of each stage of the lifecycle of content creation, moderation, dissemination, and consumption. Here’s how.
Make fake news to beat fake news
Bytedance has a different approach to tackling fake news: writing it. The AI lab that Ma heads has developed a bot that uses the company’s growing database of real fake news stories to generate its own fake fake news. It then has another bot for detecting fake news which is trained by analyzing its counterpart’s fake feed, and by drawing on a matching database of real news. “One is good at writing, which means this also helps us to advance machine writing, and the other is machine reading. These two can push each other to improve by using the label data and assimilated data through our algorithms,” said Ma.
Ma believes that having two competing algorithms allows them each to improve. Toutiao lets users report what they believe to be fake news and analyzes comments to detect whether they suggest the content might be fake. When the system identifies a piece of fake news that has got through, it will notify all who have read it that they had read something fake.
Bytedance is using this “dual-learning” technique in other ways. It machine translates news from Chinese into English, then has another program to translate that article from English into Chinese to improve both processes. Fake news can also be translated to allow the algorithms to train for Toutiao’s global expansion. Other aspects of global expansion are language-independent, such as video, meaning those algorithms have already been trained on large numbers of Chinese users.
In the future, the culmination of analyzing successful pieces, building a database of popular topics, and developing machine writing will mean Toutiao will be able to automatically generate articles for its readers on their favorite subjects.
Better algorithms, better articles
“We adjust our strategy every week. It’s a constant experiment,” said Ma. The system is monitoring in real time and is also working to predict if a piece of content will be a success. Algorithms offer four headlines to article writers then conduct AB testing to determine which is having the most impact. But not all articles are subject to algorithms due to the computing power involved. Only when a piece starts to gain traction will it get extra help.
Machine learning is used for viral prediction. It compares incoming articles with previous content that has taken off and as the machine learning proves successful, the accuracy of the system increases with constant feedback. Ma acknowledged that care has to be taken to prevent the algorithms from distorting the popularity of particular elements of content or stopping content from new users getting through who have yet to establish a positive profile from the system.
Automated sports commentary
Object recognition in video is also finely developed to fuel more personalization. Bytedance is working on smarter, personalized sports coverage, explained Ma. The current one-feed-fits-all approach will be replaced with a tailored viewing experience when fan data recognizes an interest in, for example, a particular player. Coverage will focus more on that player, with the end goal being a personalized, automated commentary and onscreen captions.
Location, location, location. And time.
Toutiao builds up an idea of users’ lives including their whereabouts and habits. As well as understanding what content the user is interested in, the AI adjusts recommendations based on current and historic location. Ma gave an example of this which shows the sophistication of the tool. Chinese people living in the US, using Toutiao as part of their everyday lives there, are generating a footprint. Then suddenly Chinese New Year comes around and the location changes from the US to somewhere in China. The news may change accordingly there and then, but once the user heads back to the States, the software assumes that the user’s location at Chinese New Year was significant to them, and probably their hometown. Once back in the US, if any news stories crop up in their supposed hometowns, they will show up in the users’ feeds.
Time is used as a gauge for what is appropriate to send. Algorithms work out when a person is busy and so the app will not bombard them with too much content and will save it until they are free. On a larger scale, the data is providing profiles of cities and areas of cities in terms of people’s working habits. On an individual scale, these patterns can suggest what a person’s occupation is, but the data is anonymized. The system generates a user ID per smartphone, made up of a billion factors and which only an algorithm can identify.
Moderation and government relations
In a separate briefing, Bytedance senior vice-president for corporate development Liu Zhen revealed that of the 20 million pieces of content uploaded to Toutiao each day, 90% are machine moderated. Meaning the other 2 million pieces are human-reviewed. Although Toutiao has been working on its moderation for five years, humans are and always will be needed, according to Ma.
“We have a very good communication channel between the company and the government. So far we’ve been working very hard because we are a new platform, a new kind of application exploring a new frontier. Things have been going quite smoothly because the communication channel is very open and very healthy,” said Ma.