Wally Wang is at his most animated when he describes how his company’s fraud prediction system is like an unaccompanied child visiting a zoo, and it is then that the unsupervised machine-learning approach starts to make sense: “There’s no mom to teach a child what a tiger is. The child, by intelligence, will automatically get the point of how to recognize a tiger or a goose. The connection will be built by the child. That’s how the AI part plays here: the algorithm does the detection… We build up a model, making the child more intelligent, to be able to tell automatically when an animal is evolving into another species.”
DataVisor uses unsupervised machine learning to predict fraud attacks on companies. Founded by two women originally from China, DataVisor claims that using big data AI rather than databases and blacklists gives it greater accuracy and makes it better able to deal with the fast-changing world of online fraud. Tackling the attacks on China’s vast e-commerce sector has allowed it to build models that can be used anywhere in the world and to win clients such as Pinterest. The global big data security market is estimated to reach $26.85 billion by 2022, according to a report by MarketsandMarkets.
Cyber attacks on businesses come in many guises. At one end of the spectrum are widely reported ransomware attacks such as WannaCry and data thefts such as those experienced by Equifax and Uber. But businesses also face a constant bombardment of small-scale hits: fraudulent transactions, fake reviews and account registrations, organized groups abusing new promotions, and even their own staff fiddling the figures to inflate their own performance.
DataVisor was founded in December 2013 by CEO Xie Yinglian and CTO Yu Fang, both of whom hold PhDs in computer science from US universities. Their solution plugs into a company’s data feed and then uses machine learning to look for signs of imminent attacks on the company.
Although the company was founded in the US, its Beijing office is large, with a sense of an imminent client visit. Here, Wally Wang, Head of Business Development and Product, explained to us why it’s better to leave your systems unsupervised.

A child at a zoo: how machine learning works for attack detection
If you have a database of known threats (or animals in the zoo), what happens when a new animal turns up that isn’t in the database? Other anti-fraud systems rely on blacklists of known offenders and databases of threats or behavior patterns. These work with predetermined labels that, when matched, alert the system that an attack is happening or has happened. This is known as supervised detection. Unsupervised machine learning, on the other hand, needs no predefined labels, and DataVisor argues that it works better: some animals even the mom won’t recognize.
“We are not defining what is strange. We are not defining the database or rule base (meaning if they switch hands or the device operating system is too old), we are not using those predefined rules because we think those rules will easily be got around by the fraudsters,” said Wang. “Instead we build up a model, making the child more intelligent, to be able to tell automatically when an animal is evolving into another species.”
DataVisor lets client data flow through its system, whether it’s emails, SMS, app use, or phone numbers used for account registration, and applies machine learning algorithms to start building models based on that particular kind of data. This means it does not use the data directly by creating labels for certain data patterns or building a database of threats or cases; instead, it looks for groups or clusters forming and trends emerging. These are then used to build models.
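The idea of spotting groups forming without labels can be illustrated with a small sketch. This is not DataVisor’s algorithm, which the article does not describe in detail; it is an assumed, minimal example that links accounts sharing any attribute value (the hypothetical fields `device` and `ip`) with a union-find structure and flags groups above a size threshold. No account carries a label and none is on a blacklist; the suspicious group emerges only from the connections between accounts.

```python
from collections import defaultdict


def find(parent, x):
    # Union-find lookup with path halving
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    # Merge the groups containing a and b
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra


def suspicious_groups(accounts, keys, min_size=3):
    """Link accounts that share any value for the given keys (hypothetical
    attributes) and return groups of at least min_size account IDs."""
    parent = {acc["id"]: acc["id"] for acc in accounts}
    seen = {}  # (attribute, value) -> first account ID observed with it
    for acc in accounts:
        for k in keys:
            v = acc.get(k)
            if v is None:
                continue
            if (k, v) in seen:
                union(parent, seen[(k, v)], acc["id"])
            else:
                seen[(k, v)] = acc["id"]
    groups = defaultdict(list)
    for acc in accounts:
        groups[find(parent, acc["id"])].append(acc["id"])
    return sorted(sorted(g) for g in groups.values() if len(g) >= min_size)


# Hypothetical sample data: no labels, no blacklist entries
accounts = [
    {"id": "a1", "device": "d9", "ip": "10.0.0.5"},
    {"id": "a2", "device": "d9", "ip": "10.0.0.6"},
    {"id": "a3", "device": "d7", "ip": "10.0.0.6"},
    {"id": "a4", "device": "d1", "ip": "192.168.1.2"},
    {"id": "a5", "device": "d2", "ip": "172.16.0.9"},
]
print(suspicious_groups(accounts, keys=["device", "ip"]))  # prints [['a1', 'a2', 'a3']]
```

On the sample data, a1 and a2 are chained through a shared device and a2 and a3 through a shared IP address, while a4 and a5 stay isolated. Shared attributes such as an office network address would produce false positives, so a production system would weigh far more signals than this sketch does.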
“We do not directly use client data, but accumulate models relevant to attacks, trends of new attacks,” said Wang of reusing models in other scenarios. All data is deleted after six months, but the models are kept and reapplied in similar scenarios. At the time of the interview, the company had analyzed over 2.2 billion users and over 600 billion events (data points and data entries).
China as a training ground
As we’re all used to hearing, China has raced ahead in many areas of online life, e-commerce among them. The sector has brought its own headaches for Chinese companies, in part because of the cutthroat competition. Government organizations and financial institutions are working on ways to prevent fraud in China, and companies are also stepping up their defenses. Promotions are relentless in China, cropping up everywhere and through every channel: SMS, WeChat messages and channels, payment methods, and banking apps.
Coupon fraud is more common in East Asia, in part because of sheer volume: half of global e-commerce transaction volume is in the region. As well as offering small-scale coupons for a few RMB off a purchase, Chinese retailers run large sales events throughout the year.

The scale of promotions such as the November 11 Singles’ Day means that fraud attempts on China’s e-commerce platforms are international. With the potential savings on offer to users who manage to snatch up the heavily discounted goods, there are profits to be made in reselling the items. Hackers not just from within China but also from Russia, Southeast Asia, and Taiwan target the country’s e-commerce retailers. Chinese fraudsters sometimes work with people outside China to get around certain regulations and to use different IP addresses.
“All of [the large online retailers] see a great chunk of registrations happening in October, fake users registering a month ahead,” said Wang. “They behave like normal users, leaving a few reviews, making some small purchases and then, for the big promotions, they use bots to make quick purchases faster than humans. They grab things with a large discount then sell them on.” Often, the goods the fake users buy to build their profiles are actually purchased from fake stores; no goods ever change hands in building the bot network.
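The pre-registration pattern Wang describes shows up as a surge in sign-ups weeks before a big sale. One generic way to flag such surges is sketched below, assuming only hypothetical daily registration counts: each day is compared with a robust baseline (the median and median absolute deviation of the preceding week). This is a standard anomaly-detection technique swapped in for illustration, not DataVisor’s model.

```python
from statistics import median


def flag_surges(daily_counts, window=7, k=5.0):
    """Return the indices of days whose count exceeds the trailing-window
    median by more than k scaled median absolute deviations (MAD)."""
    flagged = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        med = median(history)
        mad = median([abs(x - med) for x in history])
        # 1.4826 scales MAD to match a standard deviation for normal data;
        # the floor of 1.0 avoids a zero spread on perfectly flat history
        spread = 1.4826 * max(mad, 1.0)
        if daily_counts[i] > med + k * spread:
            flagged.append(i)
    return flagged


# Hypothetical daily registration counts with a two-day burst
counts = [100, 104, 98, 101, 99, 103, 97, 100, 420, 415, 102]
print(flag_surges(counts))  # prints [8, 9]
```

The median-based baseline is chosen over the mean so that the first day of a sustained burst does not inflate the baseline and hide the next one: with a plain mean and standard deviation (k = 3), the second burst day in this sample would slip under the threshold. Because the fake accounts then behave like normal users, registration volume would only ever be one signal among many.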
Even within East Asia, fraudsters take very different approaches. In China, DataVisor has encountered more cellphone fraud than in other regions. In South Korea, it is very difficult to buy cellphone numbers, especially for foreigners. By contrast, in parts of Southeast Asia, SIM cards are available from any corner store and don’t require registration.
For this reason, fighting fraud has to take a neutral approach, with different models for different areas. The models learn from that specific region’s data, regardless of channel. In such varied contexts, Wang argues, machine learning models work where databases and blacklists would not.
Social media is another frontier for online fraud. We’ve all had unusual friend requests online or unsolicited offers for all manner of goods and services. Social marketing, where friends and connections recommend products in group chats or via feeds, is a growing phenomenon, and one harnessed by companies such as Pinduoduo, an e-commerce platform that saw huge social commerce success when it developed a WeChat mini program. Fraudsters also ply this area heavily, and DataVisor confronts them by monitoring the platforms’ data.
For companies such as the dating apps Momo, Blued, and Tantan, DataVisor works on early detection of problem accounts that could be used to send bad friend requests and unwanted solicitations.
Being able to tackle these issues in China, the frontier of e-commerce and social media, is now part of DataVisor’s case for winning clients in other parts of the world. If you can deal with the international onslaught on the world’s most sophisticated e-commerce platforms, you can probably deal with issues in another country. This has helped the company win clients such as Pinterest and Yelp in the US and Tokopedia and Traveloka in Indonesia.
Why prediction is the best weapon in the battle for good vs bad
According to DataVisor, not using rules or databases removes a major source of errors in the system. Supervised fraud monitoring can lead to false alarms and growing “noise” (errors and misleading figures) in the data.
“What’s the difference between AI and the previous technology?” said Wang, raising his eyebrows. “Ours is easier to make predictions instead of drawing conclusions, where the conclusions are learning from prior knowledge and trying to summarize it and draw up some rules, but with fraud detection that doesn’t really work. Because fraudsters are always coming up with new ways, so the best solution is to predict the future.”
