Heavy metal and tag lines – Rumours from the trade show floor


On the cusp of spring I regularly refresh my GPU technology suntan at the Nvidia GPU Technology Conference (GTC) in San Jose. This year was fascinating, as the speed and scale of both the AI and virtual reality industries have leapt forward. Here are my takeaways...

On that note, my favourite announcement this year was the “Heavy Metal” DGX-2 monster GPU box, which under the bonnet provides:

  • 16 GPUs burning 10 kW of power
  • 2,000 TFLOPS
  • A consistent memory model across all GPUs, connected with the new NVSwitch
  • 300 GB/s chip-to-chip communication, 12 times the speed of PCIe
  • 350 lbs!
  • And all for only $399,000
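A quick back-of-envelope calculation, using only the numbers from the list above, puts those specs in perspective:

```python
# Back-of-envelope numbers for the DGX-2, taken from the spec list above.
flops = 2_000e12       # 2,000 TFLOPS
power_w = 10_000       # 10 kW
price_usd = 399_000

gflops_per_watt = flops / power_w / 1e9
usd_per_tflops = price_usd / 2_000

print(f"{gflops_per_watt:.0f} GFLOPS per watt")   # 200
print(f"${usd_per_tflops:.2f} per TFLOPS")        # $199.50
```

Roughly 200 GFLOPS per watt, and under $200 per TFLOPS of headline compute.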

Not since the days of mainframes and minicomputers have I seen such a heavy solution. An early internet router from Motorola had a similar weight and needed riggers to install it and the floor certified to bear it, which is perhaps one reason why Cisco made better progress in that market. However, I do sense that there is an appetite for this box in the AI community, despite its weight. Jen-Hsun clearly had a marketing dude whispering in his ear, because for each of the ten new products announced we were exposed to the new tag-line, “The more you buy, the more you save”, at least twice. It echoed around the hallways for the remainder of the show. Whether it was effective, only time will tell.

In-between sessions I was happy to be interviewed by the team at InsideHPC about how AI and GPU use are developing, and how AI is helping 'turbo-charge' traditional high performance computing (HPC) applications. You can watch the video interview here.

The two other announcements that resonated with me were the integration with Kubernetes for GPU container management, and the DRIVE™ Constellation, available in Q3, which simulates a multitude of sensors for autonomous vehicle testing. The first server runs NVIDIA DRIVE Sim software to simulate a self-driving vehicle’s sensors, such as cameras, lidar and radar. The second contains a powerful NVIDIA DRIVE Pegasus™ AI car computer that runs the complete autonomous vehicle software stack and processes the simulated data as if it were coming from the sensors of a car driving on the road. With this solution NVIDIA both streamlines the amount of road testing necessary to achieve full test-regime coverage - a hot topic following the recent unfortunate crash in Arizona - and provides a one-stop GPU environment for autonomous vehicle developers.
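On the Kubernetes side, GPU-backed containers are scheduled by requesting GPUs as an extended resource via NVIDIA's device plugin. A minimal pod-spec sketch (the pod name and container image are illustrative, not from the announcement):

```yaml
# Minimal sketch: request one NVIDIA GPU for a container through the
# nvidia.com/gpu extended resource (requires the NVIDIA device plugin).
apiVersion: v1
kind: Pod
metadata:
  name: dnn-training                        # illustrative name
spec:
  containers:
  - name: trainer
    image: nvcr.io/nvidia/tensorflow:18.03  # illustrative image tag
    resources:
      limits:
        nvidia.com/gpu: 1                   # GPUs for this container
```

The scheduler then places the pod only on nodes with a free GPU, which is what makes shared GPU clusters practical.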

Most attendees were heavy-duty technologists keen to learn the newest GPU exploitation techniques. Nevertheless, my 20-year-old EE degree allowed me to ingest, at a high level, lots of exciting advancements. The following NVIDIA slide shows the rapid evolution of deep neural network (DNN) training techniques over the last few years:

Walking the halls and waiting in line for lunch, I heard a lot of chatter about capsule networks. Here is a good explanation for science students. If you can understand this, then you are a candidate to become one of the next 1,000,000 GPU developers!

As the conference progressed I sensed an interesting tension resulting from the NVIDIA 1080ti End-user License Agreement (EULA) update, which restricted its use in data center environments to activity other than crypto-currency mining. A few developers had prototyped their solutions using these inexpensive video GPUs and now wanted to scale up, but felt unable to do so without moving to V100-class GPUs, which cost ten times as much and deliver far more performance. Others, who had already passed the prototype-to-production milestone, were only too keen to embrace the V100, DGX-1 and DGX-2 class devices and exploit their deep neural networks to the full. This was especially so for well-funded larger companies.

A particularly interesting panel discussed data science best practices as a by-product of promoting the DGX-2. I discovered that data scientists have short attention spans and don’t like to be kept waiting, something I can really relate to. Consequently they focus on projects where the DNN training takes less than a couple of days. When it takes much longer, they focus on finding a new gig with the necessary hardware to train the DNN in under two days. Hence the best practice is to provide a local workstation with a couple of GPUs for prototyping, and a heavy-duty GPU cloud for full production DNN training:

Note: Thanks to Deepgram’s Scott Stephenson for his Speech DNN Training Insight.

Additionally, the current fad of moving everything to the cloud does not scale well to DNN training with GPUs. On AWS, adding GPUs speeds up DNN training until the fifth one is added, at which point the solution becomes limited by memory communication between the GPUs. Without InfiniBand or other memory-bandwidth accelerators - not available on AWS - the training process then slows down compared to four GPUs, and costs more!
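A toy scaling model makes the effect concrete (the constants below are my own illustrative numbers, not AWS benchmarks): compute time divides across GPUs, while gradient-exchange time grows with every extra GPU sharing the same limited bandwidth.

```python
# Toy model of data-parallel DNN training time vs. GPU count when
# inter-GPU bandwidth is the bottleneck. Constants are illustrative.
def epoch_time(n_gpus, compute_s=100.0, comm_s_per_link=7.0):
    """Compute splits across GPUs; gradient-exchange cost grows with
    each extra GPU contending for the same limited bandwidth."""
    return compute_s / n_gpus + comm_s_per_link * (n_gpus - 1)

times = {n: epoch_time(n) for n in range(1, 9)}
best = min(times, key=times.get)
print(best)                 # fastest configuration in this model: 4 GPUs
print(times[5] > times[4])  # a 5th GPU makes each epoch slower: True
```

In this model the sweet spot is four GPUs; beyond that, each additional GPU adds more communication time than it removes in compute time, exactly the pattern described above.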

I spent two days trying to convince people in the hallways that there was a much better tag-line than “The more you buy, the more you save”, but it’s not clear that Verne Global’s “All AI training roads lead to Iceland” made a dent versus Jen-Hsun’s keynote. Please help me out and tell your GPU friends! In the meantime, here is a great keynote live-blog summary, a good 15-minute video summary, and my video interview with InsideHPC is here. Enjoy!

Let’s chat at the Rise of AI conference in Berlin on May 17th - bob.fletcher@verneglobal.com

Written by Bob Fletcher


Bob, a veteran of the telecommunications and technology industries, is Verne Global's VP of Strategy. He has a keen interest in HPC and the continuing evolution of AI and deep neural networks (DNN). He's based in Boston, Massachusetts.

