The power of machine learning to fight fraud


As the threat landscape becomes more complex, fraud and security teams find it significantly more difficult to solve the problem efficiently. Partnering with the right security vendor and leveraging the insights they provide in the most efficient way is key to making accurate, real-time risk decisions.

The current threat landscape

I often see scammers portrayed as teenagers in hoodies, confined to their bedrooms, eyes fixed on their computer screens and fingers typing a complex series of scripted commands. Within minutes, they've cracked the code! For those of us who grew up in the 1980s, the movie WarGames influenced both us and the web security industry. Looking back, watching that movie in my youth may well have given me a taste for web security (it definitely sounded cool at the time).

In my experience, reality differs from the glamorous picture Hollywood offers us. We do deal with some script kiddies, but mostly we face experienced developers who are capable of reverse engineering complex protections through trial and error. They have resources at their disposal:

  • They use the same cloud infrastructure that reputable companies use and are experts at scaling and load balancing.
  • Some of them are skilled enough to apply computer vision to defeat visual challenges.
  • Some organizations have hired multiple developers and offer 24/7 support.

Basically, scammers conduct their activities in the same way legitimate companies do, and as long as the economics remain in their favor, the attacks will continue.

Subpar scam solutions make it easy for bad actors to sneak through

Unfortunately for many companies, their fraud prevention solutions are ill-equipped to distinguish between legitimate consumers and bad actors, leaving consumers’ digital accounts more vulnerable to exploitation.

A fraud detection product should look at traffic from multiple angles to cover as much of the attack surface as possible. For example, simple tricks like monitoring a client's request rate catch volumetric and other unsophisticated attacks, but attackers learned long ago to circumvent them by distributing their traffic through proxy services. A ruleset can help detect signals typically associated with fraudulent activity, but the more advanced scammers have refined their strategies over the years, making such rulesets less effective, especially when they are not updated quickly enough. The detection layer must therefore consider multiple signals and use algorithms that automatically detect anomalies and score traffic accordingly.
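To make the multi-signal idea concrete, here is a minimal sketch of combining several weak signals into one bounded risk score. The signal names, thresholds, and weights are all illustrative assumptions, not any vendor's actual detection logic:

```python
from dataclasses import dataclass

# Hypothetical per-client signals; field names and weights are
# illustrative only, not a real detection product's schema.
@dataclass
class TrafficSignals:
    requests_per_minute: float
    distinct_ips_last_hour: int
    headless_browser: bool
    failed_logins: int

def risk_score(s: TrafficSignals) -> float:
    """Combine several weak signals into one risk score in [0, 1]."""
    score = 0.0
    if s.requests_per_minute > 60:    # faster than a human plausibly clicks
        score += 0.3
    if s.distinct_ips_last_hour > 5:  # session hopping across proxies
        score += 0.3
    if s.headless_browser:            # automation fingerprint
        score += 0.2
    if s.failed_logins > 3:           # credential-stuffing pattern
        score += 0.2
    return min(score, 1.0)

bot = TrafficSignals(requests_per_minute=200, distinct_ips_last_hour=12,
                     headless_browser=True, failed_logins=10)
human = TrafficSignals(requests_per_minute=4, distinct_ips_last_hour=1,
                       headless_browser=False, failed_logins=0)
print(risk_score(bot), risk_score(human))
```

Note how no single signal is decisive: a proxy-balanced attacker evades the rate check but still accumulates score from the other signals, which is the point of looking at traffic from multiple angles.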

Machine learning algorithms to the rescue

Leveraging machine learning (ML) is the way to go to speed up and automate detection, keep pace with attackers' continuously shifting strategies, and curb fraudulent activity. However, it is not as easy as it seems. Anyone can design and deploy machine learning algorithms to detect anomalies, but few can do so with high accuracy and, in particular, a low false positive rate. Inaccurate detection typically results in the web security team not trusting the results, not applying an appropriate countermeasure, and ultimately allowing the attacker to continue the attack.

Developing an accurate machine learning model is complex. If you decide to use a supervised model to detect known bad or good activity, you need accurately labeled data. This may sound easy to obtain, but unfortunately that is not always the case:

  • Data can be labeled by an offline job that looks at a client’s activity history and, through that lens, identifies anomalies.
  • Data can be manually labeled by a group of people, but training a team to consistently assess and label data can be time-consuming, costly, and challenging.
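The first option, an offline labeling job, can be sketched as follows. The session fields and the "5× the typical value" heuristic are assumptions for illustration; a real pipeline would use far richer features:

```python
import statistics

# Sketch of an offline labeling job: label a client's sessions by
# comparing each one against that client's own history. The field
# names and the 5x heuristic are illustrative assumptions.
def label_sessions(history: list[dict]) -> list[dict]:
    """Mark a session 'anomalous' when its request count far exceeds
    the client's typical session, else 'normal'."""
    # Median rather than mean, so a single huge outlier session
    # does not inflate the baseline it is compared against.
    typical = statistics.median(s["requests"] for s in history)
    for s in history:
        s["label"] = "anomalous" if s["requests"] > 5 * typical else "normal"
    return history

sessions = [{"requests": 20}, {"requests": 25},
            {"requests": 18}, {"requests": 900}]
labeled = label_sessions(sessions)
print([s["label"] for s in labeled])  # → ['normal', 'normal', 'normal', 'anomalous']
```

Even in this toy version, the robustness choice matters: a mean-based baseline would be dragged up by the 900-request session and quietly mislabel it as normal, which is exactly the kind of subtle labeling error the next paragraph warns about.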

In either case, incorrect labeling compromises the accuracy of the ML model. At Arkose Labs, we use the feedback loop we get from the results of challenging users. We also draw on typical traffic patterns over time, our knowledge of the internet ecosystem, and typical legitimate user behavior. Combining these multiple sources of truth helps us maintain a high level of accuracy.

As a design principle, and for better explainability, I like to keep things simple. For this reason, I prefer to use unsupervised or statistical models wherever possible. Many anomalies can be detected this way with good accuracy. Most of the time, as long as your understanding of the data and your assumptions are correct, the model output will be accurate and easier to manage.

Consuming the output of a fraud detection system

At Arkose Labs, we strive to be as transparent as possible about our detection and share all evidence with our customers. Some trust our judgment and let us decide when to issue a challenge to mitigate the activity. Others prefer to use us as a source of intelligence, consuming our signals: a combination of risk assessments, classifications, and the list of detected anomalies. They typically ingest Arkose Labs data into their own machine learning models, which can combine input from other vendors and apply their own decision engine. For such a model to be successful, understanding each vendor’s results and how they were derived is key to designing the most accurate model and delivering the best user experience.
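A customer-side decision engine of the kind described above might look like the following sketch. The vendor names, score fields, weights, and action thresholds are all hypothetical, not real APIs:

```python
# Hypothetical decision engine that ingests risk scores from several
# vendors plus an internal model and applies the customer's own policy.
# All names, weights, and thresholds are illustrative assumptions.
def decide(signals: dict[str, float],
           weights: dict[str, float],
           challenge_at: float = 0.5,
           block_at: float = 0.8) -> str:
    """Map a weighted average of vendor risk scores to an action."""
    total_weight = sum(weights[k] for k in signals)
    combined = sum(signals[k] * weights[k] for k in signals) / total_weight
    if combined >= block_at:
        return "block"
    if combined >= challenge_at:
        return "challenge"
    return "allow"

weights = {"vendor_a": 0.5, "vendor_b": 0.3, "internal_model": 0.2}
print(decide({"vendor_a": 0.9, "vendor_b": 0.7, "internal_model": 0.8}, weights))
print(decide({"vendor_a": 0.1, "vendor_b": 0.2, "internal_model": 0.0}, weights))
```

The weights are where "understanding each vendor's results and how they were derived" pays off: a vendor whose score is well-calibrated and explainable can be weighted more heavily than one that emits an opaque number.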

The Arkose Labs Advantage

Whichever model you decide to go with, Arkose Labs can help protect your critical endpoints and keep attackers at bay. Our research team is constantly looking for innovative ways to process the data and further improve detection accuracy. Book a demo today for more details.

*** This is a syndicated blog from Security Bloggers Network Arkose Labs written by David Senecal. Read the original post at:

