How one US data company is helping feed China’s hungry AI

US-based Remark Holdings wants to bring all the data from all the major social networks around the world onto one platform. The company has amassed data on 1.3 billion people from social and consumer sites including Facebook, Twitter, Instagram, Sina Weibo, Alibaba, Baidu and Tencent, and you are likely one of them. It is currently developing artificial intelligence algorithms and, in addition to China's big data players, works with international consumer brands and fintech companies in China.

How were they able to amass so much data? How can they turn this mess of social posts and random shopping decisions into pure AI-driven gold? And why are Chinese tech companies relying on an AI company from the US?

How AI recreates our online persona

“What we are working on together serves both sides with two purposes,” explained Remark’s CTO Jason Wei. “One is to marry the data from both sides into useful models.”

Predictive models built with AI are used to make business decisions, most notably in marketing. They are applied across many industries and are even used to turn Key Opinion Leaders (KOLs), China's social media influencers, into effective marketing tools.

One reason Remark works with Alibaba is that Alibaba has the best online retail data, Wei said. But that alone is not enough: to create a full picture, Remark also needs to connect a person's social behavior with offline consumer insights. To cover these angles, it uses its data partnerships with Tencent and Baidu, as well as brands such as H&M, Aston Martin, and Uniqlo. As Wei explained, training AI requires not just a large volume of data but also variety.

“Marry the online retail data with social data and offline retail data and that will create a model which is able to analyze consumers in pretty much 360 degrees,” said Wei.

Data from those three sources allows Remark to identify a consumer through a specific ID and link his or her consumer behavior across all of them.

“The reason why Alibaba and Tencent both invited us is the data that we have. Now when we take our data and join it with their data we have over 11 or 12,000 different data points on how we can identify a person’s behavioral history,” added Shing Tao, Remark’s chairman and CEO.
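To make the idea of linking records through a shared ID more concrete, here is a minimal Python sketch. Remark has not described how its system works, so this is only an illustration under stated assumptions: the field name `consumer_id`, the function `link_profiles`, and the attributes below are all hypothetical, and the sketch assumes every source already tags its records with the same ID. In practice, recognizing the same person across different platforms is usually the hard part.

```python
from collections import defaultdict


def link_profiles(*sources):
    """Merge records that share a (hypothetical) consumer_id into one profile each."""
    profiles = defaultdict(dict)
    for source in sources:
        for record in source:
            # Assumption: every record already carries the same shared ID.
            profiles[record["consumer_id"]].update(record)
    return dict(profiles)


# Hypothetical sample records from the three kinds of sources in the article.
online_retail = [
    {"consumer_id": "u1", "orders_last_90d": 12},
    {"consumer_id": "u2", "orders_last_90d": 3},
]
social = [
    {"consumer_id": "u1", "posts_per_week": 5},
]
offline_retail = [
    {"consumer_id": "u1", "store_visits": 4},
    {"consumer_id": "u2", "store_visits": 1},
]

profiles = link_profiles(online_retail, social, offline_retail)
for consumer_id, profile in profiles.items():
    print(consumer_id, profile)
```

When two sources share a field name, the later source simply overwrites the earlier one here; a real system would need an explicit rule for resolving such conflicts.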

How AI decides if we’re creditworthy

Another thing Remark does well is credit rating based on alternative data (what you buy, what you share online, who your friends are, and what kinds of services you use) to assess your ability to repay loans. These data types are often used when credit data is absent, which in China is often the case: only 25% of the population has a credit history. Remark targets younger people, who are socially active online. It offers this data to China's small microlending companies and will soon offer it to big banks as well.

“These companies have grown their market really fast and although everybody claims that they have risk management they don’t really use them. The reason is that they just tried to grab the market and they could afford that by charging really high APR (annual percentage rate) to cover their loss,” said Wei.

But since the government stepped in to regulate these lenders and lower their rates, the fintech companies have been looking to reduce the risk of the loans they offer. That's where Remark steps in with its abundance of data. However, unlike in marketing, data alone is not enough when deciding who gets credit, a potentially life-changing decision.

“Looking at trends—and there will be some regulation on that just like in the US—social behavior still cannot be a decision-making factor in the (credit assessment) process because of privacy and other concerns,” said Wei. But privacy is not the only reason why AI-driven decisions must come with caveats.

How AI makes mistakes

According to Wei, today's AI is much more advanced than the first generation, AI 1.0, which could learn what people were doing but couldn't handle situations it hadn't encountered. An example of this first-generation AI is IBM's Deep Blue. Today's AI can make predictions about things it has never seen before. But it still relies on people.

“The AI will start with human experience and will be limited to human experience or what you call bias—it depends on what kind of samples you feed to the AI,” Wei explained. For instance, when Alibaba tried to build predictive models using only its own data, it found that the models just weren't usable. It needed more data, which brings its own difficulties.

“Training AI is a really painful process,” Wei said. With billions and billions of data points, one would need many people to pick out the right samples. And even if someone could hire all these people, human inconsistency hurts the process: what one person considers a positive sample, another may consider negative.

“Since the human is part of the training process the bias is not going to be avoidable,” Wei said. “That’s why machine learning will lead us to the next generation of AI in which we believe we will take humans out of the process. This means we will use machines to determine the samples for the machine to learn. It will learn much faster than when feeding the samples by humans.”

For now, Remark aims to widen the use of its Kankan platform with around 15 more products within the next 15 months. These include facial recognition and natural language processing products, to be applied in fintech, live-streaming filters, public safety, and one area that seems especially popular in China: surveillance.