How data is collected and analysed is changing at exponential rates. In industry — and I know this from talking and consulting with hundreds of companies — so much data is being collected that many companies are either confused by, or overwhelmed by, all of the ways to leverage and analyse their data.
Comparing what it meant to analyse data in the early days of high-performance computing (HPC) to what it means today is like comparing apples and oranges – similar, but so different.
This article will provide a brief history of the evolution of HPC data analytics, then focus more substantively on recent use cases of applied/industrial analytics, and go into some technical detail about the architectures and tools available to drive today’s data analytics.
Data analytics come in many different forms. Even the largest of companies have traditionally analysed data manually in spreadsheets, or have used relatively rudimentary visual analytics in infoviz format.
Though AI has been around for decades, leveraging the power of AI (ML and DL), particularly in an HPC environment, is relatively new, especially in industry. Add opportunities in geospatial analysis and more sophisticated visualisation, in harmony with increased computing power, and analytics are taking on a deeper, more sophisticated capability, providing greater business intelligence more efficiently as we head into the next industrial revolution.
Examples of Applied/Corporate HPC Data Analytics
From DataFloq (4 Ground Breaking Use Cases of Big Data and High Performance Computing, 26 July 2015) “One company that deals with these volumes is PayPal. On a daily basis, they deal with 10+ million logins, 13 million transactions and 300 variables that are calculated per event to find a potentially fraudulent transaction. Thanks to High-Performance Data Analytics, PayPal saved in their first year of production over $700 million in fraudulent transactions that they would not have detected previously.”
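The per-event scoring PayPal describes can be made concrete with a toy sketch. Everything below – the feature names, weights, and review threshold – is a hypothetical illustration, not PayPal’s actual method; real systems compute hundreds of variables per event and use far more sophisticated models.

```python
# Toy fraud-scoring sketch. All feature names, weights, and the
# threshold are hypothetical illustrations of per-event scoring.

def score_event(event, weights):
    """Weighted sum of per-event features -> fraud risk score."""
    return sum(weights[name] * event.get(name, 0.0) for name in weights)

# Hypothetical per-event features (real systems compute ~300 variables).
weights = {
    "amount_zscore": 0.5,   # how unusual the amount is for this account
    "new_device": 1.2,      # 1.0 if the login came from an unseen device
    "geo_velocity": 0.8,    # implied travel speed between consecutive logins
}

event = {"amount_zscore": 2.0, "new_device": 1.0, "geo_velocity": 0.5}
risk = score_event(event, weights)
flagged = risk > 2.0        # hypothetical threshold for manual review
```

At 13 million transactions a day, even this trivial arithmetic becomes an HPC problem once every event must be scored against hundreds of features in near-real time.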
In the area of medical imaging, “Pfizer uses machine learning for immuno-oncology research about how the body’s immune system can fight cancer.” (Built-In, 1 February 2019, Ultra Modern Medicine: 5 Examples of Machine Learning in Healthcare) In terms of industrial impact, Pfizer collaborated with a Chinese tech startup “to develop an artificial intelligence-powered platform to model small-molecule drugs as part of its discovery and development efforts. The project will combine quantum mechanics and machine learning to help predict the pharmaceutical properties of a broad range of molecular compounds.”
Autonomous vehicles are evolving quickly, yet with seemingly so far still to go. Deep learning impacts many elements of autonomous driving, according to Forbes’ 20 August 2018 article, 10 Amazing Examples Of How Deep Learning AI Is Used In Practice? “There's not just one AI model at work as an autonomous vehicle drives down the street. Some deep-learning models specialise in street signs while others are trained to recognise pedestrians. As a car navigates down the road, it can be informed by up to millions of individual AI models that allow the car to act.”
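The many-models-per-vehicle idea from the Forbes quote can be sketched in a few lines: several specialised detectors run on the same camera frame and their outputs are pooled into one scene description. The detector names and stub outputs here are hypothetical stand-ins for real trained networks.

```python
# Sketch: an autonomous-vehicle perception stack runs many specialised
# models on the same frame and merges their outputs. The detectors
# below are hypothetical stubs standing in for trained networks.

def sign_detector(frame):
    return [{"label": "stop_sign", "confidence": 0.97}]

def pedestrian_detector(frame):
    return [{"label": "pedestrian", "confidence": 0.91}]

def perceive(frame, detectors):
    """Run every specialised detector and pool the detections."""
    detections = []
    for detect in detectors:
        detections.extend(detect(frame))
    return detections

scene = perceive(frame=None, detectors=[sign_detector, pedestrian_detector])
# A planner would then act on the pooled scene, e.g. brake when any
# safety-critical object is detected with high confidence.
should_brake = any(d["confidence"] > 0.9 for d in scene)
```

In a real vehicle these detectors run concurrently on dedicated accelerators, which is exactly where the HPC-style hardware discussed below comes in.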
There’s so much more, of course. From crop protection to drug discovery to insurance cost controls to smart manufacturing, AI now plays an important role in industrial advancements and successes.
At least partially due to the data deluge, analytics today are driven more frequently by advanced computing resources. CPUs have traditionally run most analytics jobs, and even that evolution has seen significant changes.
In a 24 August 2016 insideHPC article, The Evolution of HPC, the shift away from the need for huge CPU clusters was described: “Instead of a monolithic CPU that manages MPI or SHMEM communication a programmable co-design presents a new model that blurs the lines between discrete cluster components (i.e. the server, accelerators, and the network). A network co-design model allows data algorithms to be executed more efficiently using smart interface cards and switches.”
Meanwhile, GPUs are now very much in the analytics mix, depending on what is being analysed. Our recent upgrades to our evergreen iForge industrial cluster have expanded the use of both Skylake CPUs and NVIDIA V100 GPUs. Benchmarking across domains, including on a variety of data analytics workflows, has shown up to 60% performance uplift with GPUs.
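Uplift figures like the 60% above come from timing the same workload on both architectures and comparing wall-clock times. A minimal, hypothetical timing harness is sketched below; the stand-in workload is a placeholder for whatever analytics kernel is actually being benchmarked on the CPU and GPU nodes.

```python
import time

def benchmark(workload, repeats=3):
    """Time a workload and return the best wall-clock run,
    a common basis for reporting speedups."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        best = min(best, time.perf_counter() - start)
    return best

# Stand-in workload; in practice this would be the analytics kernel,
# run once built for CPUs and once built for GPUs.
def workload():
    sum(i * i for i in range(100_000))

baseline = benchmark(workload)
# Speedup is reported as baseline_time / accelerated_time; a 60%
# uplift corresponds to a ratio of roughly 1.6.
```

Running the same harness against the GPU build of the kernel, and dividing the two best times, yields the kind of cross-domain uplift numbers quoted above.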
AI has expanded its reach into today’s HPC-driven analytics thanks to the confluence of machine learning and deep learning with other domains such as modeling & simulation and bioinformatics. Oil & gas, healthcare, financial services, agriculture and more sectors take advantage of advances in analytics capabilities to create enhanced services and generate more business.
HPC drives sophisticated data analytics, and AI needs the power of HPC to collect and analyse in this era of data deluge. For industry, the more intelligence derived from data, in as short a time as possible, the better the products and services and, ultimately, the ROI.
The next five years should be more of the same as we approach Exascale capacity to drive more data to deeper, more sophisticated solutions. Think about what is happening now, then add significantly more compute power, more tools, more experts, and more companies driving our evolution through their increasing applied needs. Now things are advancing exponentially. It’s going to be an incredible ride!
Brendan McGinty is Director of Industry for the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign.