Digital Fraud Wiki

Your source for the latest fraud intelligence, insights, research, and commentary.

Supervised Machine Learning

Machine learning is a branch of artificial intelligence that enables algorithms to learn from existing data and then apply that knowledge to new data. Supervised learning is the most common type of machine learning. It requires labeled training data, and the goal of training is to label new, previously unseen data (test data) correctly. Supervised machine learning (SML) is used to discover patterns and insights in a set of data in order to make predictions about future outcomes.

With supervised machine learning, the algorithm builds a mathematical model from a set of data that contains both the inputs and the desired outputs. As an example, if the task were determining whether an image contained a particular object, the training data for a supervised learning algorithm would include images with and without that object (the input), and each image would have a label (the output) designating whether it contained the object.
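The input/label relationship described above can be sketched in a few lines of Python. This is only an illustration, not a production approach: it uses a nearest-centroid classifier (chosen here for brevity), and the two numeric features and the label names `object` / `no_object` are hypothetical stand-ins for real image features.

```python
from statistics import mean


def fit_centroids(inputs, labels):
    """Learn one centroid (mean feature vector) per label from labeled examples."""
    grouped = {}
    for features, label in zip(inputs, labels):
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Hypothetical labeled training data: each input is paired with its desired output.
train_inputs = [(0.9, 0.8), (0.8, 0.9), (0.1, 0.2), (0.2, 0.1)]
train_labels = ["object", "object", "no_object", "no_object"]

model = fit_centroids(train_inputs, train_labels)

# Apply the learned model to a new, unlabeled input.
prediction = predict(model, (0.85, 0.75))
print(prediction)
```

The model never sees a rule such as "high values mean object"; it derives the decision boundary entirely from the labeled examples.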

SML is so named because the process of “learning” from a training dataset is a “supervised” process—similar to the way a teacher guides learning for their students. Supervised learning requires that an algorithm’s possible outputs are already known and that all of the data used to train the algorithm is already labeled with correct answers. 

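SML tasks fall into two categories: regression, which predicts a continuous value, and classification, which predicts a discrete label. In both cases, the main task of supervised machine learning is to define a model that minimizes prediction error.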
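As a rough sketch of what "minimizing prediction error" means in the regression case, the example below fits a straight line by ordinary least squares and compares its mean squared error with that of a naive baseline that always predicts the average. The data points are made up for illustration, and least squares is just one of many possible error criteria.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    variance = sum((x - x_mean) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_mean - slope * x_mean
    return slope, intercept


def mean_squared_error(ys, predictions):
    """Average squared difference between true values and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


# Hypothetical training data: inputs xs with known outputs ys.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

slope, intercept = fit_line(xs, ys)
fitted_error = mean_squared_error(ys, [slope * x + intercept for x in xs])

# Baseline: ignore the input and always predict the mean output.
y_mean = sum(ys) / len(ys)
baseline_error = mean_squared_error(ys, [y_mean] * len(ys))

print(f"slope={slope:.2f} intercept={intercept:.2f}")
print(f"fitted MSE={fitted_error:.4f} baseline MSE={baseline_error:.4f}")
```

For classification, the same idea applies with a different error measure, such as the fraction of misclassified examples.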

SML Use Cases

There are many ways SML can be used across industries. Examples include:

  • Optimizing product-level price points
  • Classifying loan default risk 
  • Determining whether or not a skin lesion is malignant
  • Providing a decision framework for screening new job candidates
  • Detecting fraudulent transaction activity
  • Classifying emails as spam 
  • Analyzing consumer sentiment

Advantages of SML for Fraud Detection and Prevention

Adding supervised machine learning capabilities to fraud detection efforts offers improvements over rules-based systems because of the ability to generalize patterns from previous instances of fraud. SML models can leverage many more features than a manually created rule and simultaneously weight features more accurately. 

SML is especially effective for use cases in which the problem is well defined and does not change over time. When there is an abundance of readily acquirable, high-quality data to train models on, SML can be relied upon to produce good results. Examples include image recognition, natural language processing, and predictive algorithms for everything from stock prices to the weather.

Limitations of SML for Fraud Detection and Prevention 

When a machine learning model is trained on historical cases, it remains bound to the data defined in those cases. What makes fraud detection a unique challenge for SML is that fraud is a “moving target.” Sophisticated fraudsters continually evolve their techniques and tactics, and because SML requires labeled examples of existing attacks to function, SML-based approaches ultimately offer limited value: they can only detect fraud based on features and attributes that have already been defined and trained on. SML is unable to address new and unknown fraud, which dooms it to being solely a reactive solution.
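A toy example can make this limitation concrete. The sketch below assumes a deliberately simple model that learns a single transaction-amount threshold from labeled history; the amounts, the labels, and the "many small transactions" tactic are all hypothetical and chosen only to illustrate the point.

```python
def learn_threshold(amounts, is_fraud):
    """Pick the amount threshold that misclassifies the fewest labeled examples."""
    def errors(threshold):
        return sum(
            (amount >= threshold) != label
            for amount, label in zip(amounts, is_fraud)
        )

    return min(sorted(set(amounts)), key=errors)


def detection_rate(threshold, fraud_amounts):
    """Fraction of fraudulent transactions flagged by the learned threshold."""
    flagged = sum(amount >= threshold for amount in fraud_amounts)
    return flagged / len(fraud_amounts)


# Hypothetical labeled history: past fraud involved large transactions.
historical_amounts = [20, 35, 50, 900, 1200, 1500]
historical_labels = [False, False, False, True, True, True]

threshold = learn_threshold(historical_amounts, historical_labels)

# Fraud that resembles the labeled history is caught...
known_pattern_rate = detection_rate(threshold, [1000, 1300])

# ...but a new tactic (many small transactions) has no labels to learn from.
new_pattern_rate = detection_rate(threshold, [15, 25, 30])

print(threshold, known_pattern_rate, new_pattern_rate)
```

Until examples of the new tactic are labeled and the model is retrained, the learned threshold has no way to flag it.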

Significant limitations that make SML inadequate for combating sophisticated modern fraud include:

  1. Prior knowledge of fraud attacks needed
  2. Continual retuning required
  3. Delayed time-to-value
  4. Limited by manual feature engineering

Modern digital fraud is complex, coordinated, and operates at massive scale. SML approaches, which of necessity rely on historical data and labels, can neither keep pace with this fraud nor match its scale. As a component in a larger, more comprehensive fraud management solution, SML can deliver meaningful value, but as a standalone strategy it is too reactive and too slow to adapt to be a viable defense against modern fraud. For these reasons, unsupervised machine learning has emerged as one of the most important techniques for proactive fraud prevention.