On October 20, Chinese travel service and experience-sharing platform Mafengwo (literally “wasps’ nest”) was reported to have faked more than 18 million of the 21 million “original” user comments and blog articles the company claims to own. The claims were made (in Chinese) by independent WeChat media account Xiaosheng Bibi in a report produced jointly with data group Hooray Data (乎睿数据).
Xiaosheng Bibi and Hooray Data said that a close investigation using Python and natural language processing (NLP) analyses found the 18 million pieces of user-generated content (UGC) to be either plagiarized from rivals or completely faked.
“Mafengwo’s plagiarizing is worse than we imagined,” Xiaosheng Bibi said in the report. “We set up the rule for our data analysis that, in terms of content, copying the exact same words would be part of our analysis. If there is one sentence different from any ten sentences analyzed, we say the copy hypothesis fails.”
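The rule quoted above can be illustrated with a minimal Python sketch. It is not Hooray Data's code, which the report does not publish. The sentence splitter, the function names `split_sentences` and `is_copy`, and the choice to compare the first ten sentences in order are all assumptions made for illustration.

```python
import re

# Assumption: sentences end at common Latin or CJK terminators.
SENTENCE_END = re.compile(r"[.!?。！？]+")


def split_sentences(text):
    """Split text into stripped, non-empty sentences."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def is_copy(candidate, source, sample_size=10):
    """Return True only if the sampled sentences match word for word.

    Hypothetical reading of the quoted rule: take up to `sample_size`
    sentences from each text and reject the copy hypothesis as soon as
    a single sentence differs.
    """
    cand = split_sentences(candidate)[:sample_size]
    src = split_sentences(source)[:sample_size]
    if not cand or len(cand) != len(src):
        return False
    return all(a == b for a, b in zip(cand, src))
```

Under this strict reading, a review counts as copied only when every compared sentence is identical, so a paraphrased or partially edited review would not be flagged.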
The report said it found 7,454 fake user accounts used to copy content from other sites, including Ctrip, eLong, Meituan Dianping, Agoda, and Yelp (with Google Translate used to translate comments for Mafengwo’s domestic site). The copied data includes 5.72 million comments and blog stories on restaurants and bars and 12.2 million comments and stories on hotels, together around 85% of all comments Mafengwo shows on the platform. Hooray Data didn’t specify other analysis techniques in the report.
Hooray Data also showed that most of Mafengwo’s comments on both food and restaurants were made on weekdays, a pattern different from other major platforms, which usually see peaks on weekends. As for time of day, according to the details of original comments on Mafengwo, users tend to comment between 10 am and noon as well as from 2 pm to 5 pm. The finding is the opposite of the pattern at Meituan Dianping, which receives the most comments during meal times. The team suspects Mafengwo staff faked the content during company working hours.
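A timing check of this kind could be sketched in Python as follows. This is an illustration, not the team's method. The function names `hourly_counts` and `office_hours_share` are hypothetical, the input is assumed to be a list of `datetime` objects in local time, and the office-hour windows simply reuse the 10 am to noon and 2 pm to 5 pm ranges reported above.

```python
from collections import Counter
from datetime import datetime

# Assumption: the windows reported in the article, as [start, end) hours.
OFFICE_WINDOWS = [(10, 12), (14, 17)]


def hourly_counts(timestamps):
    """Count comments per hour of day (0-23)."""
    return Counter(ts.hour for ts in timestamps)


def office_hours_share(timestamps):
    """Fraction of comments posted on weekdays inside the office windows."""
    if not timestamps:
        return 0.0

    def in_office(ts):
        # weekday() is 0 for Monday through 4 for Friday.
        is_weekday = ts.weekday() < 5
        in_window = any(start <= ts.hour < end for start, end in OFFICE_WINDOWS)
        return is_weekday and in_window

    return sum(in_office(ts) for ts in timestamps) / len(timestamps)
```

Comparing the share for one platform against a meal-time share computed the same way for another would be one way to express the contrast the report describes.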
Mafengwo responded (in Chinese) earlier this morning, saying that the media report was an “organized attack” and that the company would defend its interests through legal means. Additionally, Mafengwo said user comments make up only 2.9% of the total related data the company owns, which significantly contradicts the “third-party” findings. Mafengwo added that “extremely few” of its accounts are suspected of copying content. TechNode reached out to Mafengwo for further details but did not receive a reply by publication.
The original report posted on WeChat by Xiaosheng Bibi and Hooray Data is now labeled as “disputed content” by the WeChat platform. Hooray Data, claiming that Mafengwo is now “destroying evidence,” has opened access to the 7,454 suspicious accounts it discovered to prove its findings.
“Due diligence into internet-business companies’ data is becoming more and more important. Behind-the-scenes dirty marketing approaches will finally be history. In the future, Xiaosheng Bibi and Hooray Data will tear down more ‘emperor’s new clothes’,” Xiaosheng Bibi and Hooray Data said on their subscription account.
According to a public conversation in the comments section, the investigative team said they would soon release more information about other companies and tactics. TechNode reached out to them as well to get more details but did not get a reply by the time of publication.
Established in 2006, Mafengwo gained massive fame during the 2018 World Cup season by investing heavily in media advertising. The company hoped to raise another $300 million, which would boost its valuation to as much as $2.5 billion. Its existing investors include Temasek Holdings.