CERN - When science can't use the cloud

Scientific Research Insights

Demand for high performance computing (HPC) is growing fast - and you might expect it to become a commodity. But leading research sites like CERN still make extreme demands.

Customers needing a lot of computing muscle to handle large data sets or perform giant calculations have tended to run their own supercomputers on-premise. But increasingly, HPC is available in the cloud - either in bulk from the giant hyperscalers, or in a specialist form such as that provided by my blog hosts, Verne Global (read my previous blog, which touches on this) - and more people are starting to use cloud HPC as an option.

HPC is normally a good fit for a remote data center or a cloud service. In most cases, say wind tunnel modelling for plane and car development, the results aren’t needed in microseconds, so it makes sense to run the calculations where power and space are cheap, in a large facility with all the economies of scale.

Not everyone has that option though. In my latest feature for Datacenter Dynamics’s magazine, I had the privilege of talking to people whose needs absolutely ruled out a remote service.

Scientists probing the properties of the universe at the European particle physics laboratory, CERN, found they needed powerful computing resources as close as humanly possible to the Large Hadron Collider (LHC) - the 27km ring of superconducting magnets which provided new evidence for the structure of matter when it detected the Higgs boson in 2012.

The LHC operates by smashing particles together, and detecting the fragments and fresh particles created in those collisions. Next year, CERN will start a short series of shutdowns during which it will upgrade the LHC to eventually operate at higher powers (so-called “high luminosity”), generating more Higgs bosons, and more answers to fundamental science questions.

Two of the experiments using the LHC aren’t waiting for the full upgrade - they are going to install more equipment during next year’s shutdown, to collect more data from the existing LHC when it switches back on.

The LHCb (for “beauty”) experiment is looking for reasons why the Universe we see is composed mostly of matter, with very little antimatter, by studying quarks produced by collisions in the LHC. The LHC creates vast numbers of these - and LHCb manages to capture data on one million of them every second, for analysis by a team of 700 scientists.

That sounds like a lot - but the LHCb team wants more. The LHC actually creates 40 million collisions each second, and the experiment chooses on the fly which ones to save. The LHCb team wants to collect the data from all of the collisions, and sift them all for interesting cases that are currently being missed.

Another experiment, called ALICE, which looks at conditions just after the Big Bang, wants to do the same thing. The trouble is, this means transmitting and processing an awful lot of data. LHCb wants to collect around 40 Tbit per second - and to make the job harder, this data comes from clusters of instruments on the end of thousands of optical fibers.
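To put those figures in perspective, here is some back-of-the-envelope arithmetic using only the numbers quoted above (40 million collisions per second, roughly 40 Tbit/s aggregate) - a rough sketch, not official CERN figures:

```python
# Rough arithmetic from the figures in the article:
# 40 million collisions/s, ~40 Tbit/s aggregate data rate.
collisions_per_second = 40_000_000
aggregate_tbps = 40  # terabits per second

# Implied data per collision if the full rate were captured
bits_per_collision = aggregate_tbps * 1e12 / collisions_per_second
print(f"~{bits_per_collision / 8e3:.0f} KB per collision")  # ~125 KB

# Sustained volume if nothing were filtered out
bytes_per_day = aggregate_tbps * 1e12 / 8 * 86_400
print(f"~{bytes_per_day / 1e15:.0f} PB per day")  # ~432 PB
```

That works out to hundreds of petabytes per day before filtering - which is why the on-the-fly selection described below matters so much.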

The fiber alone is so expensive that it would not be practical to run it even as far as the main CERN computing center, so handling the expanded LHCb data in the cloud is simply out of the question. Instead, the experiment has set up its own small data center, located directly above the LHCb experiment, which sits 100 metres below ground near the village of Ferney-Voltaire, France.

“It is not cost-efficient to transport more than 40 Tbps over 3km!” Niko Neufeld of the LHCb’s online team told me. Placing the IT resources directly above the experiment keeps the cable run down to 300m. The experiment processes the data in six shipping-container-sized modules (from Automation Datacenter Facilities of Belgium), which sit next to the cooling systems for the LHC’s superconducting magnets.

The fibers are terminated in two I/O modules, where a horde of purpose-built NICs decodes the data feed and the data from each collision is matched up. From there the “events” are passed to four other modules, where more conventional general-purpose GPUs check their characteristics to find the ones of interest.
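The two-stage flow described above - match fragments from many fibers into events, then select the interesting ones - can be sketched in miniature. This is a toy illustration, not CERN's actual software; the function names, fragment format and selection rule are all my own inventions:

```python
# Toy sketch of the two-stage flow described above (illustrative only,
# not LHCb's real code): fragments arriving from different fibers are
# matched into "events" by collision ID, then a selection pass keeps
# only events whose characteristics look interesting.
from collections import defaultdict

def build_events(fragments):
    """Stage 1: group fragments from different fibers by collision ID."""
    events = defaultdict(list)
    for collision_id, payload in fragments:
        events[collision_id].append(payload)
    return events

def select(events, is_interesting):
    """Stage 2: keep only events that pass the selection criterion."""
    return {cid: parts for cid, parts in events.items() if is_interesting(parts)}

# Example: fragments tagged with (hypothetical) collision IDs and
# sub-detector names; keep only events with fragments from 2+ sources.
fragments = [(1, "calo"), (2, "tracker"), (1, "tracker"), (3, "muon")]
events = build_events(fragments)
kept = select(events, lambda parts: len(parts) >= 2)
print(sorted(kept))  # [1]
```

In the real system, of course, both stages run on dedicated hardware - the matching in the I/O modules and the selection on the GPU farm - at rates no single machine could touch.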

Neufeld says this is specialist IT, with few wider applications, if any. The protocols have to handle masses of data gathered in an environment swimming in radiation. Even nuclear power stations are less demanding.

But I’m not so sure. The capacity required is way beyond applications we currently envisage, such as self-driving cars and the Internet of Things (where data rates are well below the Tbps level). But the fact that what LHCb is doing is even possible could mean that someone, somewhere, will find another use for it.

But for the rest of us, for the foreseeable future, cloud HPC looks like the way to go...

Written by Peter Judge (Guest)

Peter Judge is the Global Editor at Datacenter Dynamics. His main interests are networking, security, mobility and cloud. You can follow Peter at: @judgecorp
