Unsupervised Machine Learning and smart data compression help AI deal with big data

As the use of big data grows, data protection and AI's role in the smart management of data are becoming increasingly pressing concerns. The coming trend in data technology goes beyond reacting to abuse or passing legislation that establishes and consolidates rights over how data is used. Data scientists and startups are taking a greater leap: enabling machine learning to detect potential data fraud before it takes place, and building fast data storage engines.

Yu Fang, co-founder and CTO at data fraud and abuse detection startup DataVisor, and Guo Kuan, CEO of Terark, a leading Chinese data storage startup, discussed Big Data, AI, and Data Protection on July 3 at TechCrunch Hangzhou.

“People are more familiar with supervised machine learning,” Yu, a former member of Microsoft’s research team, said when explaining the motivation behind her team’s Unsupervised Machine Learning (UML) technology.

In supervised learning, according to Yu, a person guides the machine to learn fixed patterns by supplying machine-readable labels and known examples. That process is slow and less sensitive to shifts in data abuse or cyberattack patterns. UML, by contrast, looks at correlations and complex behavior across the data to adaptively build knowledge of changing situations and deliberate fraud schemes. Machines can then also draw comprehensive maps of possible crime rings: related fraudulent accounts and potentially improper actions linked to the same key nodes.
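The idea of linking accounts through shared key nodes can be illustrated with a minimal sketch. This is not DataVisor's method, which the talk did not detail; it is a generic connected-components grouping, and the account names, attribute types, and the `find_rings` helper are all hypothetical. No labels are supplied: accounts are grouped only because they share an attribute value such as an IP address or a device ID.

```python
from collections import defaultdict


def find_rings(accounts, min_size=3):
    """Group accounts that share any attribute value (a "key node").

    accounts: dict mapping account name -> set of (kind, value) attributes.
    Returns a sorted list of groups (each a sorted list of account names)
    that contain at least min_size accounts.
    """
    # Union-find structure: every account starts in its own group.
    parent = {name: name for name in accounts}

    def find(x):
        # Walk to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Merge every account with the first account seen using the same attribute.
    first_seen = {}
    for name, attrs in accounts.items():
        for key in attrs:
            if key in first_seen:
                union(first_seen[key], name)
            else:
                first_seen[key] = name

    groups = defaultdict(set)
    for name in accounts:
        groups[find(name)].add(name)
    return sorted(sorted(g) for g in groups.values() if len(g) >= min_size)


# Hypothetical sample data: a1, a2 and a3 are linked through a shared IP
# address and a shared device, while b1 and b2 share nothing with anyone.
accounts = {
    "a1": {("ip", "10.0.0.1"), ("device", "d1")},
    "a2": {("ip", "10.0.0.1"), ("device", "d2")},
    "a3": {("device", "d2")},
    "b1": {("ip", "10.0.0.9")},
    "b2": {("ip", "10.0.0.8")},
}

rings = find_rings(accounts)
print(rings)
```

Here `a3` never shares an IP address with `a1`, yet both land in the same group because `a2` connects them. A production system would weigh many more signals and would not treat every shared attribute as suspicious.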

Terark’s compression algorithm allows database performance that is 200 times faster. Guo said during the discussion: “Our algorithm is 3-5 years ahead of other companies, so at the moment we have few competitors.” The company’s technology has already been adopted by Alibaba Cloud. Guo added that the era of big data has already begun, and that it began earlier than most people realize.

As the collection of big data moves into standardized frameworks, where legislation tightens and dominant players control the major resources, behind-the-scenes AI innovations such as smart data management and the construction of data infrastructure platforms still have few strong players. The technological barriers to entry will be high, but the industry will gradually offer better rewards.