On the cusp of spring I regularly refresh my GPU technology suntan at the NVIDIA GPU Technology Conference (GTC) in San Jose. This year was fascinating, as the speed and scale of both the AI and Virtual Reality industries have leapt forward. Here are my takeaways...
Speaking of scale, my favourite announcement this year was the “Heavy Metal” DGX-2 monster GPU box, which under the bonnet provides:
- 16 GPUs burning 10kW of power
- 2,000 TFLOPS (2 petaFLOPS) of deep learning performance
- Consistent memory model across all GPUs connected with the new NVSwitch
- 300 GB/s chip-to-chip communication at 12 times the speed of PCIe
- 350 lbs!
- And all for only $399,000
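To put those headline numbers in perspective, the short Python sketch below simply divides them out. It takes the quoted figures at face value (aggregate peak deep learning throughput rather than sustained performance, and the list price), so treat the ratios as rough marketing arithmetic, not benchmarks.

```python
# Back-of-the-envelope ratios for the DGX-2 headline specs quoted above.
# Inputs are the published marketing figures; outputs are plain divisions.
GPUS = 16
PEAK_TFLOPS = 2_000   # aggregate peak deep learning throughput
POWER_KW = 10.0
PRICE_USD = 399_000


def dgx2_ratios() -> dict:
    """Return simple per-GPU, per-kW and per-dollar ratios."""
    return {
        "tflops_per_gpu": PEAK_TFLOPS / GPUS,
        "tflops_per_kw": PEAK_TFLOPS / POWER_KW,
        "usd_per_tflop": PRICE_USD / PEAK_TFLOPS,
        "usd_per_gpu": PRICE_USD / GPUS,
    }


if __name__ == "__main__":
    for name, value in dgx2_ratios().items():
        print(f"{name}: {value:,.1f}")
```

That works out to roughly 125 TFLOPS per GPU, 200 TFLOPS per kilowatt and just under $200 per peak TFLOPS, or about $25,000 per GPU slot.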
Not since the days of mainframes and minicomputers have I seen such a heavy solution. An early internet router from Motorola was of similar weight; it needed riggers to install it and a certified floor to stand on. Perhaps that is one reason why Cisco made better progress in that market. However, I do sense an appetite for this box in the AI community, despite its weight. NVIDIA CEO Jen-Hsun Huang clearly had a marketing dude whispering in his ear, because for each of the ten new products announced we were exposed to the new tag-line “The more you buy, the more you save” at least twice. It echoed around the hallways for the remainder of the show. Whether it was effective, only time will tell.
Between sessions I was happy to be interviewed by the team at InsideHPC about how I see AI and GPU use developing, and how AI is helping to 'turbo-charge' traditional high performance computing (HPC) applications. You can watch the video interview here.
Two other announcements resonated with me: the integration with Kubernetes for GPU container management, and DRIVE™ Constellation, available in Q3, which simulates a multitude of sensors for autonomous vehicle testing. DRIVE Constellation is built from two servers. The first runs NVIDIA DRIVE Sim software to simulate a self-driving vehicle’s sensors, such as cameras, lidar and radar. The second contains a powerful NVIDIA DRIVE Pegasus™ AI car computer that runs the complete autonomous vehicle software stack and processes the simulated data as if it were coming from the sensors of a car driving on the road. With this solution NVIDIA both reduces the amount of road testing needed to achieve full test-regime coverage (a hot topic following the recent unfortunate crash in Arizona) and provides a one-stop GPU environment for autonomous vehicle developers.
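For readers new to GPU scheduling on Kubernetes, here is a minimal sketch of a Pod manifest that asks for GPUs, built as a Python dictionary and printed as JSON (a format `kubectl apply -f` accepts). It assumes the cluster runs NVIDIA's device plugin, which advertises GPUs as the extended resource `nvidia.com/gpu`. The pod name, container image tag and `nvidia-smi` command are hypothetical placeholders, and the sketch illustrates generic Kubernetes GPU requests rather than the specific integration NVIDIA announced.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int) -> dict:
    """Return a Kubernetes Pod manifest that requests `gpus` NVIDIA GPUs.

    Assumes the NVIDIA device plugin exposes GPUs as `nvidia.com/gpu`.
    """
    if gpus < 1:
        raise ValueError("request at least one GPU")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "trainer",
                    "image": image,          # hypothetical placeholder image
                    "command": ["nvidia-smi"],  # placeholder: just list the GPUs
                    # GPUs are requested under `limits`; the scheduler only
                    # places the pod on a node with enough unallocated GPUs.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = gpu_pod_manifest("dnn-training", "nvidia/cuda:9.0-base", 2)
    print(json.dumps(manifest, indent=2))
```

Saving the printed output to a file and applying it with `kubectl apply -f` would ask the scheduler for a node with two free GPUs.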
Most attendees were heavy-duty technologists keen to learn the newest GPU exploitation techniques. Nevertheless, my 20-year-old EE degree allowed me to absorb, at a high level, lots of exciting advancements. The following NVIDIA slide shows the rapid evolution of deep neural network (DNN) training techniques over the last few years:
Walking the halls and waiting in line for lunch, I heard a lot of chatter about capsule networks. Here is a good explanation for science students. If you can understand it, then you are a candidate to become one of the next 1,000,000 GPU developers!
As the conference progressed I sensed an interesting tension resulting from NVIDIA's End-User License Agreement (EULA) update for the GeForce GTX 1080 Ti, which barred its use in data centers for anything other than cryptocurrency mining. A few developers had prototyped their solutions on these inexpensive consumer graphics cards and now wanted to scale up, but felt they could not do so without moving to V100-class GPUs, which cost 10 times as much, albeit with much more performance. Others who had already passed the prototype-to-production milestone were only too keen to embrace V100, DGX-1 and DGX-2 class devices and exploit their deep neural networks to the full. This was especially true of larger, well-funded companies.
A particularly interesting panel discussed data science best practices while promoting the DGX-2. I discovered that data scientists have short attention spans and don’t like to be kept waiting, something I can really relate to. Consequently they focus on projects where DNN training takes less than a couple of days. When it takes much longer, they start looking for a new gig with the hardware needed to train the DNN in under two days. Hence the best practice is to provide a local workstation with a couple of GPUs for prototyping and a heavy-duty GPU cloud for full-scale production DNN training:
Note: Thanks to Deepgram’s Scott Stephenson for his speech DNN training insights.
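The two-day rule of thumb can be turned into a quick sizing check. This is a hypothetical sketch: the 48-hour threshold comes from the panel anecdote, while the function names and throughput figures are illustrative assumptions. Real sustained throughput is usually well below a GPU's peak rating.

```python
# Hypothetical sizing helper for the "train in under two days" rule of thumb.
TWO_DAYS_HOURS = 48.0


def training_hours(total_pflop: float, sustained_tflops: float) -> float:
    """Wall-clock hours to execute `total_pflop` petaFLOP (1e15 operations)
    at `sustained_tflops` TFLOPS (1e12 operations per second)."""
    if sustained_tflops <= 0:
        raise ValueError("throughput must be positive")
    seconds = total_pflop * 1e15 / (sustained_tflops * 1e12)
    return seconds / 3600.0


def where_to_train(total_pflop: float, workstation_tflops: float,
                   cloud_tflops: float, limit_hours: float = TWO_DAYS_HOURS) -> str:
    """Prefer the local workstation, fall back to the GPU cloud, else flag it."""
    if training_hours(total_pflop, workstation_tflops) <= limit_hours:
        return "workstation"
    if training_hours(total_pflop, cloud_tflops) <= limit_hours:
        return "cloud"
    return "neither"  # shrink the job or find bigger hardware


if __name__ == "__main__":
    # Assumed sustained throughputs: 20 TFLOPS locally, 400 TFLOPS in the cloud.
    for job in (1_000, 20_000, 100_000):
        print(job, "PFLOP ->", where_to_train(job, workstation_tflops=20, cloud_tflops=400))
```

With those assumed throughputs, a 1,000 PFLOP job fits on the workstation (about 14 hours), a 20,000 PFLOP job needs the cloud (again about 14 hours there), and a 100,000 PFLOP job misses the two-day window on both.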
Additionally, the current fad for moving everything to the cloud does not translate well to GPU-based DNN training. On AWS, adding GPUs speeds up training until the fifth GPU is added. At that point the job becomes limited by memory communication between the GPUs, and without InfiniBand or another high-bandwidth interconnect (not available on AWS), training slows down compared with four GPUs and costs more!
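Why would a fifth GPU make things slower? A toy cost model shows the shape of the effect: the compute per training step divides across GPUs, but the time spent exchanging data between them grows as GPUs are added. The coefficients below are invented so that the optimum lands at four GPUs, echoing the anecdote above; they are not AWS measurements, and linear communication growth is a deliberate simplification of real all-reduce behaviour.

```python
def step_time(n_gpus: int, compute: float = 1.0, comm: float = 0.06) -> float:
    """Toy model of one training step's wall-clock time (arbitrary units).

    compute: work that splits evenly across GPUs (assumed value).
    comm: extra synchronisation time added per additional GPU (assumed value).
    """
    if n_gpus < 1:
        raise ValueError("need at least one GPU")
    return compute / n_gpus + comm * (n_gpus - 1)


def relative_cost(n_gpus: int, **params: float) -> float:
    """Cost is proportional to GPUs rented multiplied by time per step."""
    return n_gpus * step_time(n_gpus, **params)


def best_gpu_count(max_gpus: int, **params: float) -> int:
    """Number of GPUs (1..max_gpus) that minimises step time in this model."""
    return min(range(1, max_gpus + 1), key=lambda n: step_time(n, **params))


if __name__ == "__main__":
    for n in range(1, 9):
        print(f"{n} GPUs: step time {step_time(n):.3f}, relative cost {relative_cost(n):.2f}")
    print("fastest configuration:", best_gpu_count(8), "GPUs")
```

With these made-up numbers, a step takes 0.43 time units on four GPUs and 0.44 on five, while the rented GPU-time (and therefore the bill) jumps from 1.72 to 2.20 units. A faster interconnect corresponds to a smaller `comm` value, which pushes the optimum out to more GPUs.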
I spent two days trying to convince people in the hallways that there was a much better tag-line than “The more you buy, the more you save”, but it’s not clear that Verne Global’s “All AI training roads lead to Iceland” made a dent against Jen-Hsun’s keynote. Please help me out and tell your GPU friends! In the meantime, here are a great live-blog summary of the keynote, a good 15-minute video summary and my video interview with InsideHPC. Enjoy!