How can companies use the latest deep learning technologies to defend against large, coordinated online fraud? DataVisor’s Ting-Fang Yen and Arthur Meng presented a real-time, scalable online fraud detection solution backed by deep learning techniques.
Deep learning has been growing in popularity and is mostly used in the fields of computer vision and natural language processing. Ting-Fang and Arthur discuss how deep learning can also be applied to security and fraud detection problems, and why it outperforms traditional blacklists and machine learning approaches.
Here we present a real-time, scalable online fraud detection solution backed by deep learning techniques. Today, most deep learning applications appear in actively studied fields such as computer vision and natural language processing. Our current solution is one of the few production examples in which deep learning models are applied to security problems. Our results demonstrate that the deep learning solution significantly outperforms traditional blacklist and machine learning approaches at terabyte data scale.
Online fraud is largely orchestrated by organized crime rings. Coordinated malicious user accounts, either newly created or obtained via account hijacking, actively target modern online services for real-world financial gain. Existing fraud solutions either rely on reputation lists to block known suspicious activities, or require extensive feature engineering by human analysts for model training. These approaches neither adapt well to changing fraud patterns nor scale to large data volumes. At DataVisor, we analyze activities from billions of accounts across global online services to detect fraud and abuse. This data gives us unique insights into the online fraud landscape that allow us to tackle coordinated fraud attacks holistically.
Our deep learning solution is based on digital information commonly collected by online services, including IP addresses, user-agent strings, email domains, and user nicknames. We build a general fraud detection framework that can identify fraudulent activities in log data containing all, or a subset, of this common digital information. Because it relies only on this common information, the model is agnostic to the specific application or service from which data queries originate. We discuss the design and implementation of our deep learning pipeline, based on Spark and TensorFlow, which is built to fit our multi-cloud, real-time production requirements. We also demonstrate how our system outperforms traditional solutions, including blacklists and machine learning methods.
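To make the idea of a service-agnostic input concrete, here is a minimal sketch of how string fields such as these could be turned into fixed-length integer sequences that a neural network can consume. It is an illustration only, not DataVisor's actual pipeline: the field names, the per-field lengths, and the character-level encoding are all assumptions, since the talk description lists only the kinds of signals used and not a schema or model architecture.

```python
from typing import Dict, List, Optional

# Hypothetical field names and maximum lengths (assumptions for illustration).
FIELD_LENGTHS: Dict[str, int] = {
    "ip": 15,
    "user_agent": 64,
    "email_domain": 32,
    "nickname": 24,
}

PAD = 0      # padding, also used for a field that is absent from the log record
UNKNOWN = 1  # any character outside printable ASCII


def encode_text(value: Optional[str], length: int) -> List[int]:
    """Map a string to a fixed-length list of integer character ids."""
    if value is None:
        return [PAD] * length
    ids: List[int] = []
    for ch in value.lower()[:length]:
        code = ord(ch)
        # Printable ASCII (32..126) maps to ids 2..96; anything else is UNKNOWN.
        ids.append(code - 30 if 32 <= code <= 126 else UNKNOWN)
    return ids + [PAD] * (length - len(ids))


def encode_event(event: Dict[str, str]) -> List[int]:
    """Concatenate the encoded fields in a fixed order.

    Fields missing from the event become all-padding segments, so records
    that carry only a subset of the fields still yield vectors of equal size.
    """
    encoded: List[int] = []
    for name, length in FIELD_LENGTHS.items():
        encoded.extend(encode_text(event.get(name), length))
    return encoded
```

For example, `encode_event({"ip": "10.0.0.1", "nickname": "bob"})` returns a vector of the same length as a record with all four fields present, with the user-agent and email-domain segments filled with padding. Vectors like this could then feed an embedding layer in a framework such as TensorFlow; that step is left out here because the source does not describe the model.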