
Managing Thousands of Spark Workers in the Cloud: DataVisor Presents at SAIS 2018

By Claire Zhou June 18, 2018


About Claire Zhou
Claire is a Product Marketing Manager at DataVisor with over 5 years of marketing experience in security and fin-tech. She is passionate about empowering enterprise customers with AI-based solutions. Her expertise spans data analytics, cybersecurity, and fraud prevention. Claire has an MBA from UCLA.

DataVisor’s Yuhao Zheng, Tech Lead Manager, and Boduo Li, Senior Research Scientist, Infrastructure, discuss how DataVisor leverages Spark’s scalability and portability to protect 4 billion accounts from fraud, abuse, and money laundering. Watch the video to learn about managing 2000+ Spark workers in clusters, as well as DataVisor’s proprietary SparkGenerator, an automated Spark management platform that optimally balances cost and resource allocation.

Click here to get more information about the session >>

Session Abstract:

At DataVisor, we fight online fraud, abuse, and money laundering using an unsupervised machine learning approach that clusters millions of users. To support this computationally intensive workload, DataVisor uses Spark as the mainstay of its computation infrastructure. The scalability and portability of our Spark infrastructure are critical to our company as we expand our business. In this talk, we will present the story of how we manage our Spark infrastructure at scale.

At peak time, we have 2000+ Spark workers online, and we group these workers into ~50 clusters of various sizes. One benefit of this grouping is data isolation, which is critical to DataVisor because we process data from multiple customers. The other is cost and performance: we want to provide just enough resources to each Spark application. When a cluster is under-provisioned, Spark applications fail due to out-of-memory or out-of-disk errors. However, we also want to avoid unnecessary over-provisioning, as it dramatically increases our cloud cost.
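To make the sizing trade-off concrete, here is a minimal sketch in Python. It is an illustration only, not DataVisor's actual method. It assumes a hypothetical application profile with estimated memory and disk needs, hypothetical per-worker capacities, and an assumed safety headroom. It picks the smallest worker count that covers both estimates.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AppProfile:
    """Hypothetical resource estimate for one Spark application."""
    name: str
    memory_gb: float
    disk_gb: float


def workers_needed(profile, worker_memory_gb, worker_disk_gb, headroom=0.25):
    """Return the smallest worker count that covers memory and disk.

    `headroom` is an assumed safety margin (25% by default) meant to reduce
    the risk of out-of-memory or out-of-disk failures from under-provisioning,
    without adding more workers than the estimate calls for.
    """
    if worker_memory_gb <= 0 or worker_disk_gb <= 0:
        raise ValueError("worker capacities must be positive")
    scale = 1.0 + headroom
    by_memory = math.ceil(profile.memory_gb * scale / worker_memory_gb)
    by_disk = math.ceil(profile.disk_gb * scale / worker_disk_gb)
    # The tighter of the two constraints decides the cluster size.
    return max(1, by_memory, by_disk)


# Example values are made up for illustration.
profile = AppProfile("nightly-clustering", memory_gb=900, disk_gb=2000)
print(workers_needed(profile, worker_memory_gb=64, worker_disk_gb=100))
```

In this example disk is the binding constraint, so adding memory per worker would not reduce the worker count. A real system would also need to derive the profile itself, for example from previous runs.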

Next, we will present our DataVisor SparkGenerator (DSG), which is designed to automatically manage our Spark infrastructure. The responsibilities of DSG include (a) launching and shutting down Spark clusters to maximize concurrency and minimize cost, (b) intelligently assigning Spark applications to the proper clusters according to each application’s profile, (c) managing the dependencies among Spark applications so that our pipeline runs smoothly and efficiently, and (d) running all of the Spark workers on Spot instances, reducing cloud computation cost by over 80% versus on-demand instances.
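Responsibilities (b) and (c) can be sketched in a few lines of Python. This is a hypothetical illustration, not the DSG implementation: the application names, cluster names, and worker counts are all invented, and the "best fit" rule is an assumed placement policy. Dependency ordering uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each application maps to the set it depends on.
DEPENDENCIES = {
    "feature-extraction": {"ingest"},
    "clustering": {"feature-extraction"},
    "scoring": {"clustering"},
    "report": {"scoring", "ingest"},
}

# Hypothetical worker requirement per application (its "profile").
REQUIRED_WORKERS = {
    "ingest": 4,
    "feature-extraction": 30,
    "clustering": 100,
    "scoring": 20,
    "report": 2,
}

# Hypothetical clusters and their worker counts.
CLUSTER_WORKERS = {"small": 8, "medium": 32, "large": 128}


def pick_cluster(app, required, clusters):
    """Best fit: the smallest cluster with enough workers for the app."""
    candidates = [(size, name) for name, size in clusters.items()
                  if size >= required[app]]
    if not candidates:
        raise RuntimeError(f"no cluster can host {app}")
    return min(candidates)[1]


def plan(dependencies, required, clusters):
    """Return (application, cluster) pairs in a dependency-respecting order.

    Simplification: applications run one at a time, so each cluster's full
    capacity is available to every application in turn.
    """
    order = TopologicalSorter(dependencies).static_order()
    return [(app, pick_cluster(app, required, clusters)) for app in order]


for app, cluster in plan(DEPENDENCIES, REQUIRED_WORKERS, CLUSTER_WORKERS):
    print(f"{app} -> {cluster}")
```

Choosing the smallest cluster that fits keeps large clusters free for large applications, which is one simple way to avoid over-provisioning. Running independent applications concurrently would additionally require tracking how many workers each cluster has free.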

