Demand for high performance computing (HPC) is growing fast - and you might expect it to become a generic commodity. But leading research sites like CERN still make extreme demands.
Customers needing a lot of computing muscle to handle large data sets or perform giant calculations have tended to run their own supercomputers on-premises. But increasingly, HPC is available in the cloud, either in bulk from the giant hyperscalers or in a specialist form such as that provided by my blog hosts, Verne Global (read my previous blog, which touches on this), and more people are starting to use cloud HPC as an option.
HPC is normally a good fit for a remote data center or a cloud service. In most cases, say wind tunnel modelling for plane and car development, the results aren’t needed in microseconds, so it makes sense to run the calculations where power and space are cheap, in a large facility with all the economies of scale.
Not everyone has that option though. In my latest feature for Datacenter Dynamics’s magazine, I had the privilege of talking to people whose needs absolutely ruled out a remote service.
Scientists probing the properties of the universe at the European particle physics laboratory, CERN, found they needed powerful computing resources as close as humanly possible to the Large Hadron Collider (LHC) - the 27km ring of superconducting magnets which provided new evidence for the structure of matter when it detected the Higgs boson in 2012.
The LHC operates by smashing particles together, and detecting the fragments and fresh particles created in those collisions. Next year, CERN will start a short series of shutdowns during which it will upgrade the LHC to eventually operate at higher powers (so-called “high luminosity”), generating more Higgs bosons, and more answers to fundamental science questions.
Two of the experiments using the LHC aren’t waiting for the full upgrade - they are going to install more equipment during next year’s shutdown, to collect more data from the existing LHC when it switches back on.
The LHCb (for “beauty”) experiment is looking for reasons why the Universe we see is composed mostly of matter, with very little antimatter, by studying quarks produced by collisions in the LHC. The LHC creates vast numbers of these - and LHCb manages to capture data on one million of them every second, for analysis by a team of 700 scientists.
That sounds like a lot - but the LHCb team wants more. The LHC actually creates 40 million collisions each second, and the experiment chooses on the fly which ones to save. The team wants to collect the data from all of the collisions, and sift them all for interesting cases that are currently being missed.
Another experiment looking at conditions after the Big Bang, called ALICE, wants to do the same thing. The trouble is, this means transmitting and processing an awful lot of data. The LHCb wants to collect around 40 Tbits per second - and to make the job harder, this data comes from clusters of instruments on the end of thousands of optical fibers.
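A quick back-of-the-envelope check shows what those rates imply per collision. This is my own arithmetic based only on the two figures quoted above (40 million collisions per second, roughly 40 Tbit/s of readout); the per-collision payload it derives is an estimate, not an official CERN number:

```python
# Back-of-the-envelope arithmetic for the LHCb rates quoted above.
# The per-collision payload is derived, not an official figure.

collisions_per_second = 40_000_000   # LHC collision rate quoted above
total_rate_tbps = 40                 # target readout, terabits per second

bits_per_collision = total_rate_tbps * 1e12 / collisions_per_second
kilobytes_per_collision = bits_per_collision / 8 / 1e3

print(f"{bits_per_collision:.0f} bits per collision")
print(f"about {kilobytes_per_collision:.0f} kB of detector data per collision")
# -> 1000000 bits per collision, i.e. about 125 kB each
```

In other words, each collision carries on the order of a megabit of detector data - which is why the cabling, not the computing, becomes the binding constraint.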
The fiber cabling alone is so expensive that it would be impractical to run it even as far as the main CERN computing center, so handling the expanded LHCb data in the cloud is simply out of the question. Instead, the experiment has set up its own small data center, located directly above the LHCb experiment, which sits 100 metres below ground near the village of Ferney-Voltaire, France.
“It is not cost-efficient to transport more than 40 Tbps over 3km!” Niko Neufeld of the LHCb’s online team told me. Placing the IT resources directly above the experiment kept the cable down to 300m. The experiment processes the data in six shipping-container-sized modules (from Automation Datacenter Facilities of Belgium) which sit next to the cooling systems for the LHC’s superconducting magnets.
The fibers are terminated in two I/O modules, where a horde of purpose-built NIC cards decodes the data feed and the data from each collision is matched up. From there, the “events” are passed to four other modules, where more conventional, general-purpose GPUs check their characteristics to find the ones of interest.
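The filtering step above can be pictured as a simple stream selection. The sketch below is purely illustrative - the event fields, the threshold, and the selection rule are all invented for this example, and the real LHCb selection runs far more sophisticated physics criteria on GPUs:

```python
# Toy illustration of on-the-fly event selection ("triggering").
# Field names and the energy threshold are hypothetical, chosen only
# to show the shape of the problem: keep a few events, drop the rest.

from dataclasses import dataclass

@dataclass
class Event:
    event_id: int
    energy_gev: float   # hypothetical per-collision summary quantity

def select_interesting(events, threshold_gev=50.0):
    """Keep only events whose summary energy exceeds the threshold."""
    return [e for e in events if e.energy_gev > threshold_gev]

# A tiny stand-in for the incoming event stream.
stream = [Event(1, 12.0), Event(2, 87.5), Event(3, 51.2), Event(4, 3.3)]
kept = select_interesting(stream)
print([e.event_id for e in kept])   # -> [2, 3]
```

The engineering challenge is that this conceptually simple filter must run over tens of millions of events per second, which is what pushes the work onto dedicated NICs and GPUs sitting metres from the detector.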
Neufeld says this is specialist IT, with few wider applications, if any. The protocols have to handle masses of data gathered in an environment swimming in radiation. Even nuclear power stations are less demanding.
But I’m not so sure. The capacity required is way beyond applications we currently envisage, such as self-driving cars and the Internet of Things (where data rates are well below the Tbps level). But the fact that what LHCb is doing is even possible could mean that someone, somewhere, will find another use for it.
But for the rest of us, for the foreseeable future, cloud HPC looks like the way to go...