On October 20, Chinese travel service and experience-sharing platform Mafengwo (literally “wasps’ nest”) was accused of faking more than 18 million of the 21 million pieces of “original” content, including user comments and blog articles, that the company claims to own. The allegations came from independent WeChat media account Xiaosheng Bibi in a report (in Chinese) produced jointly with data group Hooray Data (乎睿数据).

Xiaosheng Bibi and Hooray Data said that a close investigation, using Python and natural language processing (NLP) analysis, found the 18 million pieces of user-generated content (UGC) to be either plagiarized from rivals or entirely fabricated.

“Mafengwo’s plagiarizing is worse than we imagined,” Xiaosheng Bibi said in the report. “We set up the rule for our data analysis that, in terms of content, copying the exact same words would be part of our analysis. If there is one sentence different from any ten sentences analyzed, we say the copy hypothesis fails.”
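
The rule quoted above can be illustrated with a short Python sketch. The report does not publish its code, so everything here is an assumption made for illustration: the function names `split_sentences` and `copy_hypothesis_holds` are hypothetical, sentences are split on common Chinese and Western end punctuation, and the "ten sentences analyzed" are taken to be the first ten sentences of the suspect text. Under those assumptions, a text counts as copied only if every sampled sentence appears word for word in the source.

```python
import re

# Assumption: sentences end at common Chinese or Western terminators.
_SENTENCE_END = re.compile(r"[。！？!?.]+")


def split_sentences(text):
    """Return the non-empty sentences in `text`, stripped of whitespace."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def copy_hypothesis_holds(candidate, source, sample_size=10):
    """Return True only if each of the first `sample_size` sentences of
    `candidate` appears verbatim among the sentences of `source`.

    A single differing sentence makes the copy hypothesis fail, which
    mirrors the strict rule described in the quote.
    """
    sample = split_sentences(candidate)[:sample_size]
    if not sample:
        # Nothing to compare, so there is no evidence of copying.
        return False
    source_sentences = set(split_sentences(source))
    return all(sentence in source_sentences for sentence in sample)


if __name__ == "__main__":
    original = "The beach was lovely. We stayed three nights! Would go again?"
    altered = "The beach was lovely. We stayed four nights! Would go again?"
    print(copy_hypothesis_holds(original, original))  # identical text
    print(copy_hypothesis_holds(altered, original))   # one sentence differs
```

A rule this strict favors precision over recall: lightly paraphrased copies would not be flagged, so any count produced this way would be a conservative one.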

Runhua Zhao

Runhua Zhao is a technology reporter based in Beijing. Connect with her via email: runhuazhao@technode.com