Guest Post: AML Data Quality – The Challenge of Fitting a Square Peg into a Round Hole

By DataVisor April 17, 2017

As mentioned in my previous articles, traditional rule-based transaction monitoring systems (TMS) have architectural limitations that make them prone to false positives and false negatives.

This article focuses on the third drawback of existing TMS solutions: how their inflexible data models lead to poor data quality, resulting in additional false positives and false negatives.

Many of us working in the anti-money laundering (AML) technology space have experienced the frustration of spending hours retrofitting new data types to squeeze them into the rigid data model of a TMS. Unfortunately, the more effort we spend retrofitting data, the more likely we are to introduce data quality issues. Further, when the retrofit isn’t completed in a timely fashion, we’re exposed to the risk of large fines from regulators. That said, there’s hope on the horizon from machine learning solutions that are more forgiving of disparate data formats.

Square peg in a round hole

Sending data from source systems to many existing TMS is like trying to fit a square peg in a round hole. There are two major reasons for this.

First, a TMS requires a lot of data of many different types. Financial institutions typically have many disparate customer, account and transaction systems that feed data into the TMS to satisfy monitoring requirements. Second, existing TMS have a monolithic data model that’s generally difficult to adjust without significant customization.

This forces the financial institution to change its data to conform. However, this is difficult because each source system will have its own unique characteristics and ultimately serve a different business purpose. For example, a mortgage lending application may function differently than a system handling retail demand deposit accounts (DDA). Furthermore, each system will have its own data model or way to store and update information.

Unfortunately, these challenges result in a long, arduous process filled with subtle gotchas. Potential AML events get missed, and the institution is left exposed to huge fines from regulators. For example, imagine that a financial institution purchased a commercial loans company. The financial institution must integrate the acquired company’s data into its existing TMS, but the process takes longer than anticipated. During a regulatory exam, the regulator discovers that the purchased company’s data is still not being monitored by the existing TMS. The regulator views the acquired firm’s lack of integration into the existing AML framework as a red flag and decides to probe the program more deeply than it had in the past.

Even worse, the more the data is reshaped to fit the TMS data model, the greater the likelihood of developing additional data issues. And as you know, this will lead to false positives and false negatives down the line.

The best solution is to minimize data transformations. If the files are kept as close to the source system’s original format as possible, data integrity issues stay isolated to that system. A certain degree of data transformation will still be required before the detection algorithms are run, but this can be accomplished within the TMS itself. However, this requires a TMS that is not based on a monolithic data model and that has some flexibility and adaptability.
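
As a rough illustration of keeping data close to its source format, the sketch below stores each record exactly as delivered and derives a normalized value only at detection time. The `SourceRecord` type, the system names ("loans", "dda") and the field names are hypothetical and chosen only for this example; they do not describe any particular TMS.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class SourceRecord:
    """A record kept as the source system delivered it (hypothetical type)."""
    source_system: str
    raw: Dict[str, Any]  # untouched payload from the source system


def amount_in_dollars(rec: SourceRecord) -> float:
    """Derive a normalized amount on read, leaving rec.raw unchanged."""
    if rec.source_system == "loans":
        # Assumption for this sketch: the loans system reports integer cents.
        return rec.raw["amt_cents"] / 100
    if rec.source_system == "dda":
        # Assumption for this sketch: the DDA system reports decimal strings.
        return float(rec.raw["amount"])
    raise ValueError(f"no amount mapping for {rec.source_system!r}")
```

Because the raw payload is never rewritten, a mapping mistake can be corrected in one function without re-extracting data from the source system.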

How unsupervised machine learning (UML) leads to a more flexible TMS

There are some promising AI-based TMS solutions that are designed to solve this data inconsistency problem. Using unsupervised machine learning (UML) allows the TMS to have flexible data requirements. (For more information about how UML works in the context of AML, read my first blog post on the subject.)

To understand why, consider their differences. Traditional TMS with rule-based models look for specific scenarios and require specific fields structured in certain ways to map them to their internal data model. UML does not have a strict data model that inputs must adhere to; rather, it works with the data that it’s given.

Consider the scenario where an account was previously dormant and then suddenly began transacting very quickly. A rule would require several highly specific data fields and encode strict thresholds in order to try to match the scenario. However, the rigidity of the data fields makes the initial integration difficult, which increases the likelihood of data quality issues. A secondary issue is the strict thresholds, which lead to false positives and false negatives.
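
To make the rigidity concrete, here is a minimal sketch of what such a rule might look like. The `Txn` fields and the three threshold values are assumptions invented for this illustration, not taken from any real TMS.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List

# Hypothetical thresholds; a real rule would be tuned per institution.
DORMANT_DAYS = 180      # minimum silence before the burst
BURST_WINDOW_DAYS = 7   # look-back window for the burst
BURST_MIN_TXNS = 10     # minimum transactions inside the window


@dataclass
class Txn:
    # The rule only works if every source system supplies exactly these fields.
    account_id: str
    posted: date
    amount: float


def dormant_then_active(txns: List[Txn], today: date) -> bool:
    """Flag an account that was silent for DORMANT_DAYS and then posted
    at least BURST_MIN_TXNS transactions within the burst window."""
    window_start = today - timedelta(days=BURST_WINDOW_DAYS)
    recent = [t for t in txns if window_start < t.posted <= today]
    older = [t for t in txns if t.posted <= window_start]
    if len(recent) < BURST_MIN_TXNS or not older:
        return False  # too few recent transactions, or no history to compare
    gap = min(t.posted for t in recent) - max(t.posted for t in older)
    return gap.days >= DORMANT_DAYS
```

With these values, an account that posts nine transactions after a year of silence is not flagged while one that posts ten is, which is the kind of threshold edge that produces false negatives.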



On the other hand, a TMS that leverages UML can take in a variety of data fields to find hidden networks of accounts with anomalous behavior. For example, UML may uncover a network of accounts that were previously dormant and started transacting quickly.



Note this example is simplified, as in practice the UML model would take into account hundreds to thousands of different data attributes to uncover the network.
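
The toy sketch below shows the general idea of grouping accounts without a fixed schema. It is a simple stand-in (Jaccard similarity over whatever fields each record carries, followed by connected components), not the algorithm used by any specific vendor. The function names, the 0.6 similarity threshold and the sample fields are all assumptions made for this example.

```python
from itertools import combinations
from typing import Any, Dict, List, Set


def similarity(a: Set, b: Set) -> float:
    """Jaccard similarity between two sets of (field, value) pairs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_networks(records: Dict[str, Dict[str, Any]],
                  threshold: float = 0.6,
                  min_size: int = 3) -> List[Set[str]]:
    """Link accounts whose attributes overlap strongly, then return the
    connected groups with at least min_size members."""
    # Each account becomes a set of (field, value) pairs; no fixed schema.
    tokens = {acct: set(attrs.items()) for acct, attrs in records.items()}
    neighbors = {acct: set() for acct in records}
    for a, b in combinations(records, 2):
        if similarity(tokens[a], tokens[b]) >= threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)

    seen, networks = set(), []
    for start in records:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first walk over linked accounts
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbors[node] - component)
        seen |= component
        if len(component) >= min_size:
            networks.append(component)
    return networks


accounts = {
    "A1": {"was_dormant": True, "reactivated_week": "2017-W14", "counterparty": "X9"},
    "A2": {"was_dormant": True, "reactivated_week": "2017-W14", "counterparty": "X9", "branch": "NY"},
    "A3": {"was_dormant": True, "reactivated_week": "2017-W14", "counterparty": "X9"},
    "B1": {"was_dormant": False, "counterparty": "K2", "product": "mortgage"},
    "B2": {"was_dormant": False, "counterparty": "M7"},
}
networks = find_networks(accounts)
```

Note that the records do not share the same set of fields: "A2" carries an extra `branch` and "B1" an extra `product`, yet no mapping step is needed before the accounts can be compared.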

There are three major benefits of using UML to power or supplement a TMS. First, with low data integration effort required, there are fewer chances to make mistakes that lead to data quality issues (and ultimately, false positives and false negatives). Second, it’s faster to get the TMS up and running. And third, it’s much easier to add new data fields or entirely new use cases over time. This makes it easier to keep up with changing business logic (for example, when new product offerings are launched) and with relentless criminals adapting their methods.

The future of TMS technology

Ultimately, detecting money laundering is extremely complex. To make matters worse, customers, customer behaviors, product offerings, regulatory requirements, and even institutions themselves are in a constant state of change. We must consider that the tools we use to fight financial crime today not only limit our technical capabilities, but may actually influence the way we think about the problem itself. As Marshall McLuhan said, “We shape our tools and afterwards our tools shape us.” It’s time we got some better tools.

Keith Furst is the Founder of Data Derivatives and has years of experience within a variety of financial institutions, including Tier One wholesale banks, investment banks, foreign bank branches, commercial banks, retail banks, broker-dealers, prepaid card providers and merchant acquirers, with a focus on implementing, fine-tuning and validating financial crime systems. His forte is transaction monitoring, customer due diligence, fraud and market abuse systems, and his work has included custom data analytics that identified suspicious activity outside of the traditional surveillance models.
