
Automated Feature Engineering

DataVisor Threat Blog

By Swetha Basavaraj May 14, 2019


About Swetha Basavaraj
Swetha is a senior product manager at DataVisor. She has more than 10 years of experience leading teams as a product manager, entrepreneur, and engineer, launching new B2B products at Yahoo, IBX (now Tradeshift), Volvo Cars, and IBM. Her past and current work focuses on building scalable enterprise products using the latest technologies, including machine learning.

How to achieve exceptional model accuracy in minutes instead of months with automated feature engineering and unsupervised machine learning.

Sophisticated modeling practices are critical for modern fraud management, and the ability to detect, deter, and defeat massive-scale coordinated attacks is made possible by the power of AI and machine learning.

All modeling processes have three main steps:

  1. Data Collection and Cleansing
  2. Feature Engineering
  3. Model Building and Evaluation

In this post, we’ll discuss feature engineering, which is one of the most important and valuable steps for achieving the highest-quality results.

Feature Engineering

Let’s begin by establishing a basic definition of feature engineering. A feature is a characteristic that can help solve a problem using machine learning. The process of extracting such features from a raw dataset is called feature engineering. There is an art to this process, and final results depend on how well this step is managed. Domain expertise and data insights help create the right features that produce the best possible results.

The challenges of manual processes
Feature engineering is often still performed manually by data scientists. A data scientist analyzes the data and then, based on domain expertise and experience, decides which features to create. The goal is better model results, but because so many candidate features are available, overfitting is a common problem: a model with too many features or parameters fits noise in the training data and performs poorly on new data. Successful feature engineering requires adequate tools and technical skills, and even then the process can be labor-intensive and time-consuming.

The benefits of automation
Where there is a clear problem to solve, and domain expertise that can be applied, it is possible to standardize certain features that can be used for building models. These features can be automatically derived or extracted from raw data. For example, the IP address is essential for fraud detection. For each IP address in the raw data, we should be able to derive additional features such as ip_prefix, ip_city, ip_country, check_ip_from_datacenter, and more. In this way, we can begin to develop automated processes that increase both efficiency and accuracy.
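
To make the IP example concrete, here is a minimal sketch in Python of deriving a few of these features from a single address. It is an illustration only, not how dCube computes them: the function name derive_ip_features, the DATACENTER_NETWORKS list, and the GEO_TABLE lookup are hypothetical stand-ins, and the addresses come from reserved documentation ranges. A real system would rely on maintained geolocation and hosting-provider data.

```python
import ipaddress

# Hypothetical reference data for illustration only (reserved documentation ranges).
DATACENTER_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
GEO_TABLE = {
    "198.51.100.0/24": {"ip_city": "Springfield", "ip_country": "US"},
}


def derive_ip_features(ip: str, prefix_len: int = 24) -> dict:
    """Derive simple features from one IP address string."""
    addr = ipaddress.ip_address(ip)
    # ip_prefix: the enclosing network of the given prefix length.
    prefix = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    # Geo lookup keyed by prefix; returns None values when the prefix is unknown.
    geo = GEO_TABLE.get(str(prefix), {})
    return {
        "ip_prefix": str(prefix),
        "ip_city": geo.get("ip_city"),
        "ip_country": geo.get("ip_country"),
        "check_ip_from_datacenter": any(addr in net for net in DATACENTER_NETWORKS),
    }


features = derive_ip_features("198.51.100.23")
```

Because every derived value depends only on the raw IP field and fixed reference data, the same function can be applied to every record without manual work.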

Automated Feature Engineering with dCube

dCube, DataVisor’s comprehensive fraud detection platform, provides the necessary tools for modeling (data management, feature engineering, model review) and also automates the feature engineering process by providing hundreds of derived features based on the data and its mapping. The higher the data quality, the better these derived features will be. The derived features can include:

  • Transform Features
  • Aggregated Features
  • Global Intelligence Network Features

Transform features
Transform features are created from one or more of the existing attributes of the raw data.

Example: From “event_time,” a user should be able to get derived features such as minute, hour, day, week, month, year, and date, automatically.
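
As a rough sketch of what such a transform could look like, the Python snippet below expands one timestamp into the derived fields listed above. The function name derive_time_features and the ISO 8601 input format are assumptions made for illustration, and the week number follows the ISO calendar; none of this is taken from dCube.

```python
from datetime import datetime


def derive_time_features(event_time: str) -> dict:
    """Expand an ISO 8601 timestamp string into calendar features."""
    ts = datetime.fromisoformat(event_time)
    return {
        "minute": ts.minute,
        "hour": ts.hour,
        "day": ts.day,
        "week": ts.isocalendar()[1],  # ISO week number (assumed convention)
        "month": ts.month,
        "year": ts.year,
        "date": ts.date().isoformat(),
    }


features = derive_time_features("2019-05-14T09:30:00")
```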

Aggregated features
To create aggregated features, records are grouped by the value of an attribute, and a feature is computed from the grouped records over a specific period of time. dCube automatically calculates several out-of-the-box aggregated features based on the attributes available for feature engineering.

Example: For a particular device, the total amount of all transactions over $500 processed within a 7-day period.
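
The example above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not dCube’s implementation: the function name total_large_amount_by_device, the record fields (device_id, amount, event_time), and the choice of a window that ends at as_of are all hypothetical.

```python
from datetime import datetime, timedelta


def total_large_amount_by_device(transactions, as_of, min_amount=500.0, window_days=7):
    """Sum, per device, the amounts above min_amount in the window ending at as_of."""
    window_start = as_of - timedelta(days=window_days)
    totals = {}
    for tx in transactions:
        in_window = window_start < tx["event_time"] <= as_of
        if in_window and tx["amount"] > min_amount:
            device = tx["device_id"]
            totals[device] = totals.get(device, 0.0) + tx["amount"]
    return totals


# Made-up sample records for illustration.
transactions = [
    {"device_id": "dev-1", "amount": 700.0, "event_time": datetime(2019, 5, 10, 8, 0)},
    {"device_id": "dev-1", "amount": 900.0, "event_time": datetime(2019, 5, 13, 17, 0)},
    {"device_id": "dev-1", "amount": 200.0, "event_time": datetime(2019, 5, 12, 9, 0)},  # under $500
    {"device_id": "dev-1", "amount": 800.0, "event_time": datetime(2019, 5, 1, 9, 0)},   # outside window
    {"device_id": "dev-2", "amount": 600.0, "event_time": datetime(2019, 5, 14, 9, 0)},
]
totals = total_large_amount_by_device(transactions, as_of=datetime(2019, 5, 14, 12, 0))
```

The same grouping pattern extends to other aggregations, such as counts or distinct values per device, IP, or account.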

Global Intelligence Network features
These features are derived from fraud data and patterns observed in our Global Intelligence Network (GIN), which comprises data from more than 4 billion global accounts.

Example: GIN provides a reputation score for each IP in the raw data, based on global data and distribution. This score is based on the ratio of users detected as fraudulent to total users on a specific IP.
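
The ratio described above can be illustrated with a short Python sketch. The actual GIN scoring is not specified in this post, so this only shows the basic detected-to-total ratio; the function name ip_reputation_scores and the (ip, user_id, detected) event layout are assumptions.

```python
from collections import defaultdict


def ip_reputation_scores(events):
    """Return, per IP, the share of distinct users flagged as detected."""
    users_by_ip = defaultdict(set)
    detected_by_ip = defaultdict(set)
    for ip, user_id, detected in events:
        users_by_ip[ip].add(user_id)
        if detected:
            detected_by_ip[ip].add(user_id)
    return {
        ip: len(detected_by_ip[ip]) / len(users)
        for ip, users in users_by_ip.items()
    }


# Made-up sample events: (ip, user_id, detected)
events = [
    ("203.0.113.9", "u1", True),
    ("203.0.113.9", "u2", False),
    ("203.0.113.9", "u3", True),
    ("203.0.113.9", "u4", False),
    ("198.51.100.23", "u5", False),
]
scores = ip_reputation_scores(events)
```

A higher ratio means a larger share of the users seen on that IP were detected, which is why the value can serve as a reputation signal.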

Conclusion

Models are only as good as their data and features. Feature engineering becomes more efficient and effective when the features most important for fraud detection, as determined by extensive domain expertise, are created automatically. When paired with sophisticated unsupervised machine learning algorithms, automated feature engineering can deliver exceptional model accuracy in minutes instead of months.

