Companies’ drive for data could put citizens at risk

Michael Shu, general manager at BYD’s Auto Intelligent Ecology Institute. (Image credit: TechCrunch中国)

In China, shared power banks, cars, and vending machines have more in common than is evident at first glance. The technologies aren’t new, far from it. But they have been given new life by the internet and, more specifically, by data.

Data is the lifeblood of artificial intelligence (AI), for which China has huge ambitions. The country has developed an outline to become the world leader in AI technologies by 2030. Companies are looking to cash in on this trend, but by developing data-first strategies they could be putting the country’s citizens at risk.

Speaking at TechCrunch Shenzhen 2018, Ren Mu, chief marketing officer at Laidian Technology, a firm that provides shared power banks, said that some players in the industry believe creating access to data-driven products is more important than the power bank sharing itself.  

He added that by recording users’ rental patterns, power bank rentals will create the second-largest market for credit data generation, after bike sharing. The approach is similar to that of platforms such as Alibaba’s Sesame Credit, and it creates opportunities for new players who wish to use the data in future businesses beyond power banks, such as retail.

Data is also an important asset for vending machine companies, which use it to gain insights into purchasing behavior in defined geographical areas.

Once collected, the data can help companies enter more retail scenarios in related areas because it informs management decision-making, an executive from a leading automatic vending machine manufacturer based in Guangdong province told TechNode.

“Many new retail startups, particularly those involved with hardware and cloud services, essentially run data businesses,” said the vending machine executive, who requested anonymity citing his sensitive position within the industry.

“This is not a secret in the industry anymore,” he added. “Now, it’s a strategy.” 

In the automobile industry, data could be used not just to train autonomous driving models, but also for entertainment purposes.

Michael Shu, general manager of the Auto Intelligent Ecology Institute at Chinese carmaker BYD, told the audience at TechCrunch that in the past, the machines we used every day merely needed to protect users’ physical safety. “If the physical performance was good, it will be accepted,” he said.

“Once your vehicle becomes smart there are other types of problems, like cybersecurity issues … There is always a zero-sum game between you and hackers,” said Shu.

Peril of data

Data quickly becomes an asset that could raise a company’s valuation and market potential, which in turn incentivizes more data collection across all available digital channels, including those that skirt existing regulation.

In a country that is flush with data, overcollection could have severe consequences for Chinese individuals. Data breaches and the sale of personal information have become a means of making a quick buck among the country’s data thieves, with both the sophistication and scope of these breaches increasing.

In August, more than 130 million customers were affected by a data breach in which 13 hotels operated by Huazhu Group had personal data and booking information stolen, the most significant such leak in five years. Just a week earlier, Ruizhi Huasheng, a third-party developer for Chinese mobile operators, was exposed for collecting three billion pieces of personal user information from Tencent, WeChat, and Alibaba, among others. It had placed malicious software on the mobile operators’ servers.

In a discussion at TechCrunch focusing specifically on the technical challenges of managing credit risk in a country with little traditional credit data, Shi Hongzhe, vice president at online lending company LexinFintech, said the company is hoping to use “weak connections”—data without a direct relation to the result of a computation—to build up risk models.

The “weak connections,” as acknowledged by engineers and managers in the field, could include the speed at which a user types her ID number. At a fintech forum organized by Blue Whale Media in Beijing in August, Souyidai, an online lending platform run by internet and gaming market leader NetEase, said the company has built around 8,000 dimensions to detect user needs and risks. These cover both strong connections, including traditional risk analysis elements such as identity and credit record, and weak connections.

A large user base makes the analysis models built on these connections both easier to develop and more accurate. At the Blue Whale event, a spokesperson from the finance affiliate of retail giant JD said the company was serving 400 million users and deals with 800 terabytes of new data generated by those users per day.

Flimsy framework

Protections do exist, but at this stage, the framework serves as a roadmap for future development. According to China’s 2017 Cybersecurity Law, collection needs to be legal, justified, and necessary. It defines personal data as including but not limited to an individual’s name, birth date, and ID information.

A year after the law was implemented, a set of standards for personal data privacy was published. The standards are modeled on the European Union’s General Data Protection Regulation (GDPR) and state that data collection needs to be minimal, retention needs to be short, and usage limited. The standards don’t require compliance but could be used by government agencies to determine if firms are abiding by the rules.

Other forms of data, such as “weak connections” that fall outside the categories defined in the Cybersecurity Law, could be used to infer personal information. Under what is known as the mosaic theory, seemingly innocuous forms of data can be combined or connected to become identifying information, an increasingly important consideration as the world enters an age of AI implementation.
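To make the mosaic effect concrete, here is a minimal sketch in Python. It is an illustration only: the records, field names, and the `unique_count` helper are hypothetical and do not come from any company or dataset mentioned in this article. It counts how many people in a toy dataset can be singled out as more innocuous-looking fields are combined.

```python
from collections import Counter

# Hypothetical toy records. No field is a name, birth date, or ID number.
records = [
    {"district": "Nanshan", "birth_year": 1990, "phone_model": "A"},
    {"district": "Nanshan", "birth_year": 1990, "phone_model": "B"},
    {"district": "Nanshan", "birth_year": 1985, "phone_model": "A"},
    {"district": "Futian", "birth_year": 1990, "phone_model": "A"},
    {"district": "Futian", "birth_year": 1985, "phone_model": "B"},
    {"district": "Futian", "birth_year": 1985, "phone_model": "B"},
]


def unique_count(rows, fields):
    """Count rows whose combination of values for `fields` appears exactly once."""
    keys = [tuple(row[f] for f in fields) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1)


if __name__ == "__main__":
    for fields in (
        ["district"],
        ["district", "birth_year"],
        ["district", "birth_year", "phone_model"],
    ):
        print(fields, unique_count(records, fields))
```

In this toy data, no single field singles anyone out, but each added field makes more records unique, even though none of the fields would count as personal data on its own.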

Kai-Fu Lee, former head of Google China, refers to China as the Saudi Arabia of data. The Middle Eastern country has made a name for itself as an oil-producing giant. The lesson would appear to be that, just as oil has had detrimental effects on the world around us, data could do similar damage to users’ security and privacy if left unchecked.

Additional reporting: Christopher Udemans. 

Clarification: This article has been updated to clarify the context of LexinFintech Vice President Shi Hongzhe’s comments. He was referring only to the technical challenges of assessing credit risks in China, where many people have no credit history.