Wally Wang is at his most animated when describing how his company’s fraud prediction system is like an unaccompanied child visiting a zoo, and that is when the unsupervised machine-learning approach starts to make sense: “There’s no mom to teach a child what a tiger is. The child by intelligence will automatically get the point of how to recognize a tiger or a goose. The connection will be built by the child. That’s how the AI part plays here, the algorithm does the detection… We build up a model, making the child more intelligent, to be able to tell automatically when an animal is evolving into another species.”
DataVisor uses unsupervised machine learning to predict fraud attacks on companies. Founded by two women originally from China, DataVisor claims that using big data AI rather than databases and blacklists gives it greater accuracy and makes it better able to deal with the fast-changing world of online fraud. Tackling the attacks on China’s vast e-commerce sector has allowed it to build models that can be used anywhere in the world and win clients such as Pinterest. The global big data security market is estimated to reach $26.85 billion by 2022, according to a report by MarketsAndMarkets.
Cyber attacks on businesses come in many guises. At one end of the spectrum are the widely reported ransomware attacks such as WannaCry and data thefts such as those experienced by Equifax and Uber. But businesses also face a constant bombardment of small-scale hits, from fraudulent transactions to fake reviews and account registrations, organized groups abusing new promotions, and even their own staff fiddling the figures for their own achievements.
DataVisor was founded in December 2013 by CEO Xie Yinglian and CTO Yu Fang, both of whom hold PhDs in computer science from US universities. Their solution plugs into a company’s data feed, then uses machine learning to look for signs of imminent attacks on the company.
The company was founded in the US, but its Beijing office is large and has the air of an imminent client visit. Here, Wally, Head of Business Development and Product, explained to us why it’s better to leave your systems unsupervised.
A child at a zoo–how machine learning works for attack detection
If you have a database of known threats–or animals in the zoo–what happens when a new animal turns up that isn’t in the database? Other anti-fraud systems work from blacklists of known offenders and databases of threats or patterns of behavior. These rely on predetermined labels which, if detected, alert the system that an attack is happening or has happened. This is known as supervised detection. Unsupervised machine learning, on the other hand, needs no predefined labels, which DataVisor argues works better: some animals even the mom won’t recognize.
“We are not defining what is strange. We are not defining the database or rule base (meaning if they switch hands or the device operating system is too old), we are not using those predefined rules because we think those rules will easily be got around by the fraudsters,” said Wang. “Instead we build up a model, making the child more intelligent, to be able to tell automatically when an animal is evolving into another species.”
DataVisor lets client data flow–whether it’s emails, SMS, app use, or phone numbers used for account registration–through its system and applies machine learning algorithms to start building models based on that particular kind of data. This means the company does not use the data directly to create labels for certain data patterns or to build a database of threats or cases. Instead, it looks for groups or clusters forming and for trends emerging, and these are then used to build models.
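To make the idea of "looking for groups or clusters forming" concrete, here is a minimal sketch of one common label-free technique: linking accounts that share an attribute value and flagging unusually large linked groups. It is an illustration only, not DataVisor's actual algorithm, which the article does not describe. The function name `cluster_accounts`, the attribute keys `ip` and `device`, and the `min_size` threshold are all hypothetical.

```python
from collections import defaultdict


def cluster_accounts(events, min_size=3):
    """Group accounts that share any attribute value (hypothetical sketch).

    events: iterable of (account_id, {attribute_name: value}) pairs.
    Returns a list of sorted account-id lists with at least min_size members.
    No labels or blacklists are used; groups emerge from the data alone.
    """
    parent = {}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (attribute_name, value) -> first account seen with it
    for account, attrs in events:
        parent.setdefault(account, account)
        for token in attrs.items():
            if token in first_seen:
                union(first_seen[token], account)
            else:
                first_seen[token] = account

    groups = defaultdict(set)
    for account in parent:
        groups[find(account)].add(account)

    big = [sorted(g) for g in groups.values() if len(g) >= min_size]
    return sorted(big, key=lambda g: g[0])


# Assumed toy data: a1/a2 share an IP, a2/a3 share a device, b1 stands alone.
events = [
    ("a1", {"ip": "10.0.0.1", "device": "d1"}),
    ("a2", {"ip": "10.0.0.1", "device": "d2"}),
    ("a3", {"ip": "10.0.0.9", "device": "d2"}),
    ("b1", {"ip": "10.0.0.5", "device": "d7"}),
]
print(cluster_accounts(events))  # [['a1', 'a2', 'a3']]
```

Nothing here says what fraud looks like in advance. The three linked accounts surface only because of how they relate to each other, which is the property the zoo analogy points at.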
“We do not directly use client data, but accumulate models relevant to attacks, trends of new attacks,” said Wang on re-using models in other scenarios. Any data is deleted after six months, but the models are kept and reapplied for similar scenarios. At the time of the interview, the company had analyzed over 2.2 billion users and over 600 billion events (data points and data entries).
China as a training ground
As we’re all used to hearing, China has raced ahead with many areas of online life, with e-commerce being one of them. The sector has brought its own headaches for Chinese companies, in part because of the cutthroat competition. Government organizations and financial institutions are working on ways to prevent fraud in China and companies are also stepping up their defenses. Promotions are relentless in China, cropping up everywhere and through any channel: SMS, WeChat messages and channels, payment methods, and banking apps.
Coupon fraud is more common in East Asia, in part because of sheer volume. Half of the global e-commerce transaction volume is in the region. As well as offering small-scale coupons for a few RMB off a purchase, Chinese retailers run large sales events throughout the year.
The scale of promotions such as the November 11 Singles’ Day means fraud attempts on China’s e-commerce are international. With the potential savings on offer to users who manage to snatch up the heavily discounted goods, there are profits to be made in reselling the items. Hackers not just from within China, but also from Russia, Southeast Asia, and Taiwan target the country’s e-commerce retailers. Chinese fraudsters sometimes use people outside China to get around certain regulations and to use different IP addresses.
“All of [the large online retailers] see a great chunk of registrations happening in October, fake users registering a month ahead,” said Wang. “They behave like normal users, leaving a few reviews, making some small purchases and then, for the big promotions, they use bots to make quick purchases faster than humans. They grab things with a large discount then sell them on.” Often, the goods bought by the fake users to build their profile are actually purchased from fake stores; no goods ever change hands in building the bot network.
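The "great chunk of registrations" Wang describes is the kind of signal that can be spotted without predefined rules about who the fraudsters are. The sketch below flags days whose registration count sits far above the recent norm, using the median and the median absolute deviation as a robust baseline. This is a generic outlier check under assumed inputs, not DataVisor's method. The function name `spike_days`, the threshold of 5, and the sample counts are all hypothetical.

```python
from statistics import median


def spike_days(daily_counts, threshold=5.0):
    """Return the days whose count is an upward outlier (hypothetical sketch).

    daily_counts: dict mapping a day label to its number of registrations.
    A day is flagged when it exceeds the median by more than `threshold`
    median absolute deviations (MAD).
    """
    counts = list(daily_counts.values())
    if not counts:
        return []
    med = median(counts)
    mad = median([abs(c - med) for c in counts])
    scale = mad if mad > 0 else 1.0  # avoid dividing by zero on flat data
    return [day for day, c in daily_counts.items()
            if (c - med) / scale > threshold]


# Assumed toy data: one day in early October with a surge of sign-ups.
registrations = {
    "09-28": 100, "09-29": 104, "09-30": 98,
    "10-01": 101, "10-02": 450, "10-03": 97,
}
print(spike_days(registrations))  # ['10-02']
```

A flagged day is only a starting point. The accounts registered on it would still need to be examined together, for example for shared devices or synchronized purchases, before being treated as a bot network.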
Even in East Asia, there are very different approaches to fraud. In China, DataVisor has encountered more cellphone fraud than in other regions. In South Korea, it is very difficult to buy cellphone numbers, especially for foreigners. Compare this to areas of Southeast Asia where SIM cards are available from any corner store and don’t require registration.
For this reason, fighting fraud has to take a neutral approach with different models for different areas. The models learn from that specific region’s data, regardless of channel. With such different contexts, argues Wang, machine learning models work whereas databases and blacklists would not.
Social media is another frontier for online fraud. We’ve all had unusual friend requests online or unsolicited offers for all manner of goods and services. Social marketing, where friends and connections recommend products in group chats or via feeds, is a growing phenomenon (and one harnessed by companies such as Pinduoduo, an e-commerce platform that saw huge social commerce success when it developed a WeChat mini program). This area is also well plied by fraudsters, whom DataVisor confronts by monitoring the platforms’ data.
Being able to tackle these issues in China, the frontier of e-commerce and social media, is now part of DataVisor’s case for acquiring clients in other parts of the world. If you can deal with the international onslaught on the world’s most sophisticated e-commerce platforms, you can probably deal with issues in another country. This helped the company win clients such as Pinterest and Yelp in the US and Tokopedia and Traveloka in Indonesia.
Why prediction is the best weapon in the battle for good vs bad
Not relying on hand-written rules or databases means, in DataVisor’s view, that there is less chance of adding errors to the system. Supervised fraud monitoring can lead to false alarms and growing “noise” (errors and misleading figures) in the data.
“What’s the difference between AI and the previous technology?” said Wang, raising his eyebrows. “Ours is easier to make predictions instead of drawing conclusions, where the conclusions are learning from prior knowledge and trying to summarize it and draw up some rules, but with fraud detection that doesn’t really work. Because fraudsters are always coming up with new ways, so the best solution is to predict the future.”