HPC and big data convergence in fraud detection

HPC Insights


The convergence of two established disciplines can be an explosively creative force. A great example in the world of tech is the merging of HPC with big data and machine learning. Though in many ways this convergence is still in its early stages, it is already delivering concrete, real-world benefits in fraud detection, helping to save financial firms hundreds of millions of dollars.


Unsurprisingly, PayPal is one of the companies at the forefront of this convergence. As an online transaction processor conceived on the Internet, PayPal has grown up exposed to virtually every cyber security threat and type of fraud imaginable. Because of this, the company has been aggressively pursuing a security strategy that combines HPC and big data technologies since as early as 2001. Although PayPal keeps the details of its fraud protection systems secret, it has been very open about leveraging the flexibility of the open-source H2O machine learning framework in conjunction with its big data infrastructure, which gathers more than 20 terabytes of log data every day.
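
PayPal does not publish the internals of its pipeline, but as a rough illustration of how an H2O cluster sits alongside a Hadoop-style log store, the sketch below pulls a sample of log-derived transaction data into an in-memory H2O frame ready for modelling. The HDFS path and the "is_fraud" column are invented for this example and are not PayPal's.

```python
# Hypothetical sketch only: loading log-derived transaction data into H2O.
# The HDFS path and the 'is_fraud' column are invented, not PayPal's.
import h2o

h2o.init()  # start, or attach to, an H2O cluster

# H2O can ingest directly from HDFS; a local CSV path works the same way.
transactions = h2o.import_file("hdfs://namenode/logs/transactions_sample.csv")

# Treat the fraud label as a categorical target for classification.
transactions["is_fraud"] = transactions["is_fraud"].asfactor()
print(transactions.dim)  # [rows, columns] loaded into the cluster
```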

To gain insight from this massive Hadoop dataset, PayPal, which handles over 13 million online monetary transactions per day, combines three types of machine learning (linear, nonlinear, and deep learning) to help identify and stop fraudsters. The company estimates that in just the first few years of deploying its fraud detection systems, it saved over $700 million in fraudulent transactions that would otherwise have gone unnoticed.
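
The blend of model families PayPal describes can be approximated in open-source H2O with a generalised linear model (linear), gradient-boosted trees (nonlinear), and a feed-forward network (deep learning), combined into a stacked ensemble. The sketch below is illustrative only; the file, column names, and hyperparameters are assumptions, not PayPal's configuration. Training the base models with identical cross-validation settings is what allows them to be stacked into a single scorer at the end.

```python
# Illustrative only: one way to combine linear, nonlinear and deep learning models
# with open-source H2O. The file and column names are hypothetical, not PayPal's.
import h2o
from h2o.estimators import (H2OGeneralizedLinearEstimator,
                            H2OGradientBoostingEstimator,
                            H2ODeepLearningEstimator,
                            H2OStackedEnsembleEstimator)

h2o.init()
transactions = h2o.import_file("transactions_sample.csv")       # invented file
transactions["is_fraud"] = transactions["is_fraud"].asfactor()  # classification target

y = "is_fraud"
x = [c for c in transactions.columns if c != y]
train, valid = transactions.split_frame(ratios=[0.8], seed=42)

# Identical cross-validation settings let the three base models be stacked later.
common = dict(nfolds=5, fold_assignment="Modulo",
              keep_cross_validation_predictions=True, seed=42)

linear    = H2OGeneralizedLinearEstimator(family="binomial", **common)       # linear
nonlinear = H2OGradientBoostingEstimator(ntrees=200, **common)               # nonlinear
deep      = H2ODeepLearningEstimator(hidden=[64, 64], epochs=10, **common)   # deep learning

for model in (linear, nonlinear, deep):
    model.train(x=x, y=y, training_frame=train, validation_frame=valid)

# Stack the three base learners into a single fraud scorer.
ensemble = H2OStackedEnsembleEstimator(base_models=[linear, nonlinear, deep])
ensemble.train(x=x, y=y, training_frame=train)
print(ensemble.model_performance(valid).auc())  # held-out AUC as a sanity check
```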

Though PayPal may have been one of the first to recognise the value of converging HPC and big data technologies, today virtually all the major financial services firms are seeking novel ways to combine these technologies to protect themselves. Another high-profile example is MasterCard, which has a staggering 2.2 billion cards in use in 330 countries, and handles roughly 160 million transactions per hour, or 52 billion transactions a year. MasterCard, much like PayPal, employs a hybrid machine learning approach that combines supervised and unsupervised learning with traditional big data technologies such as Hadoop and Spark to examine the location, spending habits, and travel patterns of the customer before each purchase is made. According to Nick Curcuru, Vice President of Global Big Data Consulting at MasterCard, the company's infrastructure applies 1.9 million distinct rules to each transaction, processing every one in just milliseconds.
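
The internals of MasterCard's decisioning engine are proprietary, but the hybrid pattern described above, hand-written rules plus unsupervised and supervised models, can be sketched with Spark ML (Spark being one of the technologies the company is reported to use). Every path, column name, threshold, and rule in the sketch below is a hypothetical stand-in, not MasterCard's actual logic.

```python
# Hypothetical sketch of a hybrid approach: simple rules, an unsupervised
# clustering step, and a supervised classifier in Spark ML. Not MasterCard's
# system; all paths, columns, thresholds and rules are invented.
from pyspark.sql import SparkSession, functions as F
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans
from pyspark.ml.classification import GBTClassifier

spark = SparkSession.builder.appName("fraud-sketch").getOrCreate()
tx = spark.read.parquet("hdfs://namenode/tx/history")  # invented path

# 1. Rule layer: flag transactions that trip a simple hand-written check.
tx = tx.withColumn(
    "rule_flag",
    (F.col("amount") > 5000) & (F.col("country") != F.col("home_country")))

# 2. Unsupervised layer: cluster spending behaviour; membership in an unusual
#    cluster can feed an anomaly signal.
assembler = VectorAssembler(inputCols=["amount", "hour_of_day", "merchant_risk"],
                            outputCol="features")
features = assembler.transform(tx)
clusters = KMeans(k=20, seed=42, featuresCol="features",
                  predictionCol="cluster").fit(features)
scored = clusters.transform(features)  # adds a 'cluster' id per transaction

# 3. Supervised layer: gradient-boosted trees trained on labelled historical fraud.
gbt = GBTClassifier(labelCol="is_fraud", featuresCol="features", maxIter=50)
model = gbt.fit(scored.filter(F.col("is_fraud").isNotNull()))
decisions = model.transform(scored)  # fraud probability per transaction in 'probability'
```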

Other major institutions have developed their own mix of technologies to protect themselves against fraud. Citibank, for example, has recently invested in machine learning companies such as Feedzai, Cylance and Ayasdi to bolster its fraud detection capability, and has announced that it will open a branch of its Global Innovation Lab at a London WeWork, specialising in the development of big data and high performance computing technologies.

The quest for better fraud protection is never-ending, however, as credit card fraud continues to grow in severity. Credit card fraud in Europe caused €1.8 billion in losses in 2016 (with the UK and France accounting for 73% of that figure). In the US, credit card fraud has been a persistent problem, and even debit cards, until now relatively safe, are starting to see a rise in fraud.

The problem of controlling fraud is compounded by demands from consumers, who want faster, easier payments and greater flexibility. This pressure was almost certainly a factor when the Payment Card Industry (PCI) Security Standards Council, the body that sets security standards for credit and debit card transactions, decided to allow PINs to be entered on mobile phones. The decision may benefit the user experience, but it also introduces a new level of vulnerability into the transaction process, and has therefore been met with some scepticism.

Delivering an improved user experience while maintaining maximum security will require better fraud detection systems. This points to a need for stronger, more deeply converged systems that can extract deeper insight from customer and transaction data. Realising a closer union of HPC, big data, and machine learning will require greater harmony between once-heterogeneous computational infrastructures, and perhaps a shift in perception as well. Challenges aside, major players are already making serious commitments to this convergence, and envision a not-too-distant future in which all three workloads can be processed seamlessly on a single system.

Verne Global, which provides its clients in these fields with low-cost, fully sustainable power resources, is in an ideal position to help facilitate this transformation, a role we're eager to fulfil.


Written by Spencer Lamb


Spencer is Verne Global's Director of Research and heads up our high performance computing work with European research and scientific organisations. He is also a member of the European Technology Platform for High Performance Computing (ETP4HPC).

