Data for sale:
- Three hundred and three hours of English and Chinese voice data collected from mobile phones, on sale at the Beijing International Big Data Exchange. Price negotiable.
- A collection of adult faces intended for AI training, provided on the Shanxi Data Exchange Platform. Price to be negotiated offline.
- A collection of pictures of the Chinese flag taken by mobile phone for AI model training, for sale on the Shanxi platform. Price to be negotiated offline.
- 20KB of licensing and penalty histories of various companies, detailing the types and duration of the licenses and the decision-making authority, supplied by the Beijing Financial Holdings Group to the Beijing International Big Data Exchange. Price negotiable.
- A list of COVID-19 testing centers in Beijing, complete with addresses, phone numbers, details on how to make an appointment, and booking links. Viewable online for free.
More data and more data exchanges are to come. There are at least 15 of these online marketplaces for big data in some stage of development in China. Established by city or regional governments, these pilot projects are either state-run or operated on a mixed public-private basis. Government agencies are the primary sources of the datasets. Whether private companies will be willing to put their data up for sale is an open question. Buyers may be individuals or companies. Qualifications for purchases are unclear, although various data exchanges ask potential buyers to register first.
Data as ‘seed capital’
Data exchanges are a key element in China’s ambitious digital plans. The 14th Five-Year Plan, released in March, set forth Beijing’s plan to integrate technology into development, digitizing everything from industrial production to agriculture to municipal governance. The digitization plan even extends to culture, sports, and lifestyle services such as libraries, hospitals, and nursing homes.
Chinese leaders hope to develop a new kind of data economy, one in which data is traded as easily as ball bearings or pork bellies, according to Kendra Schaefer, head of tech policy research at strategic advisory Trivium China in Beijing. Schaefer, whose work focuses on government infrastructure and domestic informatization, has been researching China’s data marketplaces. She explained that China hopes to “supercharge innovation” by making more data available to companies and to the general public.
In November 2019, the Fourth Plenary Session of the Nineteenth Central Committee of the Communist Party characterized data as a “factor of production.” Experts identify this decision as a key change in economic strategy. It means that economic planners see data, and access to data, as being as important as land, labor, capital, and energy.
For now, most of the data available on exchanges comes from the state. The government sees data collected by state agencies as “seed capital,” Schaefer told TechNode. But as the exchanges mature, the government hopes to see private data sources join and replace the state as the primary sellers of data.
Shandong Province’s Rizhao City Big Data Development Bureau wrote in an analysis of trends in big data that governments at all levels have accumulated a “large volume” of data on the public. “How to put this data to use, better support government decision-making and public services, and to lead and promote the development of big data is key to the overall situation” (our translation).
Artificial intelligence (AI) development, in particular, reveals the importance of data to China’s technological growth. According to the Rizhao Bureau analysis, published on its WeChat account in April, recent important developments in artificial intelligence primarily stem from the “large volumes and high quality of the data that have been mined and analyzed.” The article states that it is often difficult for individual entities to gather enough high quality data on their own for effective research. It is only through “open sharing and the circulation of data across domains that we can create datasets with complete information,” the piece said.
How do data exchanges work?
A data exchange is an “experimental shopping mall for data and data services,” in Schaefer’s words. The exchanges own and operate these malls and also act as middlemen, negotiating agreements with data providers, such as government agencies and private companies, to sell their wares on the platforms. Schaefer explained that a major purpose of the data exchanges is to function as a “platform where government agencies put all their [data]… and then everybody knows where to go and how to get it and how to access it.”
TechNode’s research on exchanges’ websites and Chinese news sources has identified 15 planned exchanges. Based on news reports and exchange websites, 12 have some evidence of opening in the past six years, such as an opening ceremony, press conference, or anniversary. However, TechNode was only able to find four Web platforms listing datasets for sale. These are based in Beijing, Qingdao, Shanghai, and Shanxi. It is unclear whether and how the other trading platforms conduct business.
Potential customers on these four platforms search for various types of data separated into categories. They can use public data from state agencies for free. Current cost structures for other datasets, geared toward financial companies or AI training, are left vague, with prices for many datasets listed as “negotiable” or with a note asking potential buyers to contact the exchanges directly.
Big players in big data
China’s newest trading platform, the Beijing International Big Data Exchange, formally opened at the end of March. Run by the city government, it provides data from Beijing municipality and lists as “partners” on its website both state-owned enterprises, such as China Electronics Corporation (CEC), and private companies, such as Tencent Cloud and JD. Tencent Cloud appears to be providing technical support and infrastructure for data sharing, but TechNode’s research did not find data for sale from Tencent, JD, or any other big private company on the Beijing, Shanghai, Shanxi, or Qingdao platform websites.
According to Schaefer, China’s major tech companies are “often tangentially involved in a huge variety of government projects,” and the manner of their involvement is “not always obvious.” She said that while partnerships could take multiple forms such as advising, sharing data, or building the platform infrastructure, it is also possible these tech majors are only providing technical support.
The 15 regional and municipal exchanges identified by TechNode are listed in the table below.
| Exchange | Location | Opening date or status |
|---|---|---|
| Beijing International Big Data Exchange | Beijing | March 31, 2021 |
| Shanghai Data Exchange Corporation | Shanghai | April 2016 |
| Chongqing Big Data Exchange | Chongqing | Planning began September 2015 |
| Northern Region Big Data Exchange | Tianjin | Planning began 2019 |
| Beibuwan Big Data Exchange | Guangxi | Aug. 11, 2020 |
| East Lake Big Data Trading Center | Wuhan, Hubei Province | July 2015 |
| Guiyang Big Data Exchange | Guiyang, Guizhou Province | April 14, 2015 |
| Harbin City Big Data Center | Harbin, Heilongjiang Province | September 2019 |
| Hebei Big Data Exchange Center | Chengde, Hebei Province | Dec. 3, 2015 |
| Henan Big Data Exchange | Henan Province | April 17, 2018 |
| Huadong Jiangsu Big Data Exchange Center | Yancheng, Jiangsu Province | Circa early 2018 |
| Qingdao Big Data Exchange Center | Qingdao, Shandong Province | Planning began April 2015 |
| Shanxi Data Exchange Platform | Taiyuan, Shanxi Province | July 2020 |
| Wuhan Yangtze River Big Data Exchange | Wuhan, Hubei Province (Optics Valley high-tech development zone) | Aug. 28, 2015 |
| Zhejiang Big Data Exchange Center | Hangzhou, Zhejiang Province | September 2016 |
At least nine other cities and provinces are working to develop the ability to process data transactions, based on participation in the 2021 Joint Working Conference of the National Data Exchange Center in Shanghai, according to Chinese tech and investment publication Yaosu Jiaoyi Zhi Jia (The Factor Investor).
Two Chinese cities are also planning international exchange partnerships, both with Singapore. Puyang Science and Technology Intel reported that Tianjin began preparations to build a Northern Region Big Data Exchange in 2019. According to tech publication Cyberspace Ninghe, this exchange platform will be located in the future Sino-Singapore Tianjin Eco-city. China and Singapore are collaborating to develop the sustainable city on a plot of previously unusable land in Tianjin. The Singaporean government also announced back in September 2019 a partnership with Chongqing to create the China-Singapore (Chongqing) International Data Channel.
Big data, big problems
Some trading platforms are up and running, but they still need to resolve major problems regarding privacy protection, private sector sourcing, and integration with other exchanges if they are to operate as envisioned.
One big problem with selling data: it’s not always clear whether you own it. Lawyers say that Chinese law doesn’t provide clear rules about what data can be owned or how that ownership works.
One key issue is the difference between “personal information” and “data.” The Chinese civil code gives citizens privacy rights to protect “personal private information”—conversations, medical information, ID numbers, faces, names. You can let a company use your face or your name, but you can’t sell them.
But when a company collects information about thousands or millions of people and properly anonymizes it, it can become data—a critical factor of production that the state wants bought and sold. Making clear rules for this informational alchemy is vital to make a data economy work, experts say.
Is it anonymous?
Much valuable data starts its life as personal private information. For example, mobile phone data is listed as a source of audio and image data under the Beijing Exchange’s artificial intelligence category. Many of these datasets are provided by Datatang, a state-owned AI services provider listed on Beijing’s NEEQ stock exchange.
While these datasets do not have names or contact information tied to each voice and image, that alone does not mean they are anonymized. The most recent draft of China’s Personal Information Protection Law, released in April, stated that personal information is any type of information “recognizable or potentially recognizable as being related to a person” (our translation). According to Camille Boullenois, a consultant with the European research consultancy Sinolytics, data is not considered anonymized if it can be used “alone or in combination with other data” to lead to re-identification of a person.
Even when direct identifiers such as a name or ID number are removed, it is possible for voice or image data to be combined with other information and traced back to its source. Boullenois explained that the risk of re-identification is “very difficult to assess.” That risk also often changes with time. For example, she said, if new data is added to the exchange in a few years, it might enable new combinations leading to re-identification.
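The “combination” risk Boullenois describes can be illustrated with a toy example. The Python sketch below is purely hypothetical and is not how any exchange actually evaluates its datasets: it assumes a released dataset with no names but with a few ordinary attributes left in (district, birth year, gender, often called quasi-identifiers), plus a separate, invented dataset that does carry names. It computes a simple k-anonymity score (the size of the smallest group of records sharing the same attributes) and then joins the two datasets to see which records match exactly one named person. All field names and records are made up for illustration.

```python
from collections import Counter

# Hypothetical "anonymized" release: no names, but quasi-identifiers remain.
released = [
    {"district": "Haidian", "birth_year": 1985, "gender": "F", "clip": "voice_001"},
    {"district": "Haidian", "birth_year": 1985, "gender": "F", "clip": "voice_002"},
    {"district": "Chaoyang", "birth_year": 1990, "gender": "M", "clip": "voice_003"},
]

# Hypothetical auxiliary data obtained elsewhere, which does include names.
auxiliary = [
    {"name": "Person A", "district": "Chaoyang", "birth_year": 1990, "gender": "M"},
    {"name": "Person B", "district": "Haidian", "birth_year": 1985, "gender": "F"},
    {"name": "Person C", "district": "Haidian", "birth_year": 1985, "gender": "F"},
]

# Attributes assumed to be shared by both datasets (illustrative choice).
QUASI_IDS = ("district", "birth_year", "gender")


def key(record):
    """Project a record onto its quasi-identifier attributes."""
    return tuple(record[field] for field in QUASI_IDS)


def k_anonymity(records):
    """Size of the smallest group of records with identical quasi-identifiers."""
    return min(Counter(key(r) for r in records).values())


def link(released, auxiliary):
    """Map each released clip to a name when exactly one named person matches."""
    names_by_key = {}
    for person in auxiliary:
        names_by_key.setdefault(key(person), []).append(person["name"])
    matches = {}
    for record in released:
        candidates = names_by_key.get(key(record), [])
        if len(candidates) == 1:
            matches[record["clip"]] = candidates[0]
    return matches


print(k_anonymity(released))      # 1
print(link(released, auxiliary))  # {'voice_003': 'Person A'}
```

A k of 1 means at least one record is unique on those attributes, and in this toy case the third clip is tied to a name even though the release itself contained none. Swapping in a different auxiliary dataset can change the result, which mirrors Boullenois’s point that re-identification risk shifts as new data becomes available.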
Another significant hurdle to creating a viable exchange is that much of the most useful information comes from private companies, which are “reticent” to sell their data, Schaefer said. She explained that since the technologies to protect data assets and the regulations around ownership are not yet fully developed, companies see it as a “huge risk” to contribute data on the basis of an agreement or contract with the government. “There is not a lot of legal support for data protection right now,” she said.
Nonetheless, it’s possible companies will be willing to work with data exchanges in the future. “Depending on what data we’re talking about, [a company’s] evaluation changes,” Schaefer said. She suggested that while tech companies will try to avoid sharing the crown jewels with exchange platforms, they may be happy to give data that is “just sitting around” if they “can’t find another use for it.”
Legally and technologically, there is no nationally integrated system for data sharing. Gestures have been made toward inter-city and international data sharing through initiatives and conferences, but in reality the exchange platforms are scattered and unconnected. Writing in January for the Research Institute for Modern Digital Cities, Chinese analysts Li Chunguang and Wang Shuo argue that resources are unevenly distributed across regions: local governments lack the talent to support digital transformation, and key research centers are located in major cities outside regional development areas. The two analysts say these gaps will hinder the growth of big data.
Still in beta
More regulations are on the way, and companies are waiting for them before doing business on data marketplaces. Of the three major laws planned to govern data in China, the 2017 Cybersecurity Law has gone into effect and the 2021 Data Security Law has been finalized, while the Personal Information Protection Law is currently in its second draft. According to Schaefer, taken together, the three laws will serve as the “foundation” of Chinese privacy and data regulation. She explained that finalizing the laws governing data is “one of China’s top priorities,” and she expects to see an “explosion of regulations” when all three laws are in effect. “We’re really close to having that foundation there,” she said.
On its WeChat account, the Rizhao Big Data Development Bureau highlighted the importance of these laws in creating a process for big data usage: “In considering systemization, ensuring consistency, and avoiding fragmentation, formulating special data security laws [and] personal privacy protection [laws] is necessary.” But the bureau’s note also said that laws and regulations will add friction to data sharing and transactions, “inevitably [increasing] the cost of data circulation” and “[decreasing] the effectiveness of data integration.”