Can Your Known Fraud Tools Fight Unknown Fraud?

Alex Niu

A Director of Solution Engineering at DataVisor, Alex has over 8 years of experience in the financial industry in the area of risk management analytics. As the director of Decision Science at American Express, Alex developed and implemented machine learning solutions in credit and fraud applications.

The digital economy is seeing an unprecedented rise in sophisticated, previously unknown fraud attacks. These attacks are well orchestrated and massive, and they use modern technologies to mimic legitimate account behavior. Our research team's recent Fraud Index Report gathered signals between April and June 2018 from over one billion active user accounts, 1.5 million email domains, thousands of device types, and hundreds of cloud hosting providers and data centers. The analysis showed a rise in coordinated fraud, resulting in larger fraud-related losses. We also saw that fraudsters are getting better at making fake accounts appear “real.” Fraud is geographically distributed, but its geolocation may not be what it seems: fraudulent users could originate from a cloud service, likely either to mask the attack origin or to scale up their operations.

Existing solutions are reactive because they rely on historical attack patterns or experience. Rules and supervised machine learning models alike are largely based on labels from already observed attacks or on knowledge of past experience. Supervised machine learning, simplistically put, automates rule creation. As newer fraud patterns emerge, the model is retuned, but only after the loss has already been incurred. Both approaches suffer from the curse of early victims: attacks are identified after the fact and only then addressed with newer rules or retuned models. Fraudsters, however, are agile and change their attack patterns constantly, and these models cannot adapt in real time to unknown types of attack. Because these techniques are set up to address known fraud patterns only, they are less effective against new ones.

Adoption of unsupervised machine learning, which can provide early detection of unknown fraud, has been growing steadily. According to Gartner, 50% of companies will use unsupervised machine learning by 2021. So why is unsupervised machine learning becoming a sought-after technology for fraud detection?

1. Proactive, Not Reactive, Response: Unsupervised machine learning models do not require historical loss experience or labeled training data, and they look for the “unknown” without any preconceived bias. They are proactive in detecting changing attack patterns and can often provide 30-50% additional detection results over existing systems by catching new attacks early, even at account application or registration time.

2. Correlates All Accounts in Real Time: Unsupervised machine learning processes events and account activities to analyze correlations and similarities across millions or hundreds of millions of accounts, all in real time. While other approaches look at accounts in isolation, the algorithm here is set up to reveal hidden structures across abusive, fraudulent, or money-laundering accounts.

Viewing data in isolation is ineffective when accounts operate in stealth mode. For example, some accounts may conduct very low-volume activity to stay under the radar and could incubate for days or months before striking. By analyzing the global population of accounts simultaneously, however, unsupervised machine learning reveals the subtle correlations among them and helps make timely decisions.
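To make the idea of analyzing accounts together rather than in isolation more concrete, here is a minimal sketch in Python. It is not DataVisor's algorithm; it is a simplified stand-in that links accounts sharing any attribute value and reports the resulting connected groups. The function name `find_linked_groups`, the attribute keys `ip` and `device_id`, and the sample data are all hypothetical.

```python
from collections import defaultdict


def find_linked_groups(accounts, keys, min_size=3):
    """Group accounts that are connected through shared attribute values.

    accounts: dict mapping account id -> dict of attributes (hypothetical schema).
    keys: attribute names to link on.
    Returns a list of sets of account ids, largest group first.
    """
    parent = {acct_id: acct_id for acct_id in accounts}

    def find(x):
        # Follow parent pointers to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Index accounts by (attribute, value) so shared values are easy to find.
    by_value = defaultdict(list)
    for acct_id, attrs in accounts.items():
        for key in keys:
            value = attrs.get(key)
            if value is not None:
                by_value[(key, value)].append(acct_id)

    # Every pair of accounts sharing a value ends up in the same group.
    for members in by_value.values():
        for other in members[1:]:
            union(members[0], other)

    groups = defaultdict(set)
    for acct_id in accounts:
        groups[find(acct_id)].add(acct_id)
    linked = [g for g in groups.values() if len(g) >= min_size]
    return sorted(linked, key=len, reverse=True)


# Hypothetical sample: a1 and a2 share an IP, a2 and a3 share a device.
accounts = {
    "a1": {"ip": "203.0.113.7", "device_id": "d-1"},
    "a2": {"ip": "203.0.113.7", "device_id": "d-2"},
    "a3": {"ip": "198.51.100.4", "device_id": "d-2"},
    "a4": {"ip": "192.0.2.55", "device_id": "d-9"},
}
groups = find_linked_groups(accounts, ["ip", "device_id"], min_size=3)
print(groups)
```

Taken one at a time, none of these accounts stands out; a1 and a3 do not even share a value directly. The link only appears when the whole population is examined. A production system would presumably weight attributes and ignore very common values, which this sketch does not attempt.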

3. Low Retuning Overhead: Traditional supervised approaches must constantly update their models to keep up with fraudsters. Producing labels often requires data science skills as well as deep fraud domain expertise, and model tuning is a time-intensive process. Because new, sophisticated attacks often involve many different types of events and steps, fast and effective manual rule derivation becomes impossible. The unsupervised machine learning approach doesn’t need frequent retuning, since its predictive power is not based on intelligence derived from historical experience. It is proactive and catches unknown fraud by constantly adapting to evolving attack patterns in order to maintain high performance and reliability.

As an example, let’s examine a real-world fraud ring of over 200 credit card accounts at a large bank that slipped through other detection systems. These accounts resided in low-risk regions, had high FICO scores, matched bureau data, and were not in the existing fraud database. They showed no risky signals resembling any previously seen attacks, so they passed through the existing fraud detection systems. When unsupervised machine learning was applied, however, it examined not only the usual dimensions of data but also other digital attributes of the credit card applications, and it uncovered subtle correlations that indicated the presence of a fraud ring. All the email addresses followed the same pattern: the account holder’s first name, last-name initial, and birthday. The IP addresses were all associated with high-risk data centers. The accounts all used an old iPhone, and all of them performed their activity with Chrome, even though Safari is the default browser on iPhones. The correlation analysis linked these innocuous-looking accounts and exposed fraud that was hitherto unknown.

Figure 1: Individual application looks legitimate.
Figure 2: However, subtle, hidden correlation exists in digital traces when viewed in a bigger picture.
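The correlations in this example can be illustrated with a short Python sketch. This is a deliberately simplified stand-in (exact-match grouping on a coarse fingerprint), not the unsupervised model described above. The field names, the `MMDD` birthday format, the device model "iPhone 5", and the sample applications are all assumptions made for illustration.

```python
from collections import defaultdict


def email_follows_pattern(app):
    """True if the email local part is first name + last initial + birthday.

    The birthday is assumed to appear as MMDD; the source does not say which
    format the fraud ring actually used.
    """
    local_part = app["email"].split("@", 1)[0].lower()
    expected = (app["first_name"] + app["last_name"][:1]).lower() + app["birthday_mmdd"]
    return local_part == expected


def fingerprint(app):
    # Coarse digital fingerprint built from the attributes named in the example.
    return (
        email_follows_pattern(app),
        app["ip_is_datacenter"],
        app["device_model"],
        app["browser"],
    )


def suspicious_clusters(applications, min_size=3):
    """Return fingerprints shared by at least min_size applications."""
    clusters = defaultdict(list)
    for app in applications:
        clusters[fingerprint(app)].append(app["id"])
    return {fp: ids for fp, ids in clusters.items() if len(ids) >= min_size}


def make_app(app_id, first, last, mmdd, email, datacenter, device, browser):
    # Helper for building hypothetical sample records.
    return {
        "id": app_id, "first_name": first, "last_name": last,
        "birthday_mmdd": mmdd, "email": email,
        "ip_is_datacenter": datacenter, "device_model": device, "browser": browser,
    }


applications = [
    make_app("r1", "Maria", "Lopez", "0412", "marial0412@example.com", True, "iPhone 5", "Chrome"),
    make_app("r2", "John", "Smith", "1130", "johns1130@example.org", True, "iPhone 5", "Chrome"),
    make_app("r3", "Wei", "Chen", "0705", "weic0705@example.net", True, "iPhone 5", "Chrome"),
    make_app("n1", "Ann", "Brown", "0101", "abrown@example.com", False, "Pixel 3", "Chrome"),
    make_app("n2", "Tom", "Reed", "0922", "tom.reed88@example.com", False, "iPhone X", "Safari"),
]
clusters = suspicious_clusters(applications)
print(clusters)
```

No single field here is alarming on its own; the signal is that several unrelated applicants share the same unlikely combination. Exact matching is brittle, so a real system would more plausibly use similarity measures across many more attributes.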

In this cat-and-mouse game with fraudsters, reactive solutions are no longer an option. We need to stay ahead of fraudsters and build better defenses. You can learn more about how digital banks can stay ahead of fraud in our eBook.