NVIDIA DGX A100 - The Universal System for AI Infrastructure

NVIDIA’s third-generation DGX system – the DGX A100 – represents a massive improvement in all areas of the underlying architecture. The result is a 6RU beast that can flexibly perform every AI infrastructure task: data analytics, model training, and inference. This is achieved through new acceleration in the GPU and networking, as well as a new flexible architecture. In this post we take a look under the hood to explain what makes the DGX A100 a truly special innovation from NVIDIA.

NVIDIA A100 GPU

The starting point for the DGX A100 is the new A100 Tensor Core GPU. This new GPU from NVIDIA delivers up to 20x the performance of the previous Volta GPUs in TF32 training and INT8 inference. Running at 19.5 TFLOPS, its FP64 performance is 2.5x higher than that of the previous Tesla V100 (Volta) units. In addition to all this extra horsepower, each A100 can be split into 7 separate GPU instances, which can be used individually or combined across GPUs. As of November 2020, there are two versions available, with either 40GB or 80GB of memory per GPU.
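As a quick check of the 2.5x FP64 figure, the ratio can be worked out in a few lines of Python. This is a minimal sketch: the 19.5 TFLOPS value comes from this post, while the 7.8 TFLOPS value for the V100 is an assumption taken from NVIDIA's published V100 peak FP64 figure and is not stated here.

```python
# Peak FP64 throughput in TFLOPS.
A100_FP64_TFLOPS = 19.5  # from this post
V100_FP64_TFLOPS = 7.8   # assumption: NVIDIA's published V100 peak, not stated in this post

speedup = A100_FP64_TFLOPS / V100_FP64_TFLOPS
print(f"A100 vs V100 FP64 speedup: {speedup:.1f}x")
```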

Networking

Each A100 packs 12 NVLink connections, making each A100 capable of 600 GB/s bi-directional bandwidth between any two GPUs in the DGX A100. All GPUs are connected with six next generation NVSwitches, giving an overall 4.8TB/s bi-directional bandwidth. What does this mean in practice? The system could transfer 426 hours of HD video in a single second!
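The bandwidth figures above follow from simple multiplication, sketched below. The link count, GPU count and totals come from this post. The 50 GB/s per-link figure (25 GB/s in each direction) is an assumption based on NVIDIA's third-generation NVLink specification. The last step back-calculates the video bitrate implied by the "426 hours" comparison; that bitrate is derived here, not quoted from NVIDIA.

```python
NVLINKS_PER_GPU = 12   # from this post
GB_S_PER_LINK = 50     # assumption: third-gen NVLink, 25 GB/s per direction
GPUS_PER_SYSTEM = 8    # from this post

# Bi-directional NVLink bandwidth per GPU, in GB/s.
per_gpu_gb_s = NVLINKS_PER_GPU * GB_S_PER_LINK

# Aggregate bi-directional bandwidth across all GPUs, in TB/s.
total_tb_s = GPUS_PER_SYSTEM * per_gpu_gb_s / 1000

# Bitrate an "HD video" must have for 426 hours of it to total 4.8 TB.
video_seconds = 426 * 3600
implied_mbit_s = total_tb_s * 1e12 * 8 / video_seconds / 1e6

print(f"Per GPU: {per_gpu_gb_s} GB/s")
print(f"System:  {total_tb_s} TB/s")
print(f"Implied HD video bitrate: {implied_mbit_s:.0f} Mbit/s")
```

The implied bitrate comes out at roughly 25 Mbit/s, which is in the range of a high-quality HD stream.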

The DGX A100 also comes with 9 Mellanox ConnectX-6 NICs, each providing 200 Gb/s of network bandwidth.

Memory, CPU, and Storage

The cache, processing memory and on-board GPU memory have all been increased to enable the massive accelerated GPU computing capabilities in the new DGX A100. The November 2020 update doubled the memory available, giving you two configurations to choose from. The DGX A100 can be configured with 40GB or 80GB of memory per A100 GPU, for a system total of 320GB or 640GB. The storage likewise comes in two options, 15TB or 30TB of Gen4 NVMe SSD, to hold large data sets and feed the data-hungry A100 GPUs. The DGX A100 runs dual 64-core AMD Rome CPUs with 1TB or 2TB of RAM.
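The system totals follow from the per-GPU memory and the eight A100 GPUs in each DGX A100 (the GPU count is given in the MIG section of this post). A minimal sketch of that arithmetic:

```python
GPUS_PER_SYSTEM = 8             # from the MIG section of this post
PER_GPU_MEMORY_GB = (40, 80)    # the two A100 options named in this post

# Total GPU memory per system for each configuration, in GB.
totals_gb = {per_gpu: GPUS_PER_SYSTEM * per_gpu for per_gpu in PER_GPU_MEMORY_GB}

for per_gpu, total in totals_gb.items():
    print(f"{GPUS_PER_SYSTEM} x {per_gpu}GB = {total}GB of GPU memory")
```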

MIG – Multi-instance GPU

The DGX A100 packs 8 of these new A100 GPUs, providing a huge boost in processing power at all levels and up to 56 GPU instances to work with, either singly or combined as required for the workload. This multi-instance capability is provided in a flexible manner, and is the real secret sauce that makes the DGX A100 a universal AI engine.

Using MIG, you can optimise GPU utilisation, expand access to more users, and guarantee quality of service and performance. The 7 GPU instances in each A100 can be combined to run workloads in parallel, from a few all the way up to using all 56 GPU instances at once. This flexibility is what allows the DGX A100 to adapt and serve the needs at each stage of the AI processing pipeline, from analytics to training to inference.
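To make the partitioning idea concrete, here is a simplified capacity model of MIG on a single 40GB A100. It is a sketch under stated assumptions: the profile names and slice counts are taken from NVIDIA's published MIG profile table rather than from this post, and the helper functions are hypothetical names introduced for illustration. The model only checks that a layout fits within 7 compute slices and 8 memory slices. Real MIG placement rules are stricter, so treat a passing result as necessary but not sufficient.

```python
# (compute slices, memory slices) per MIG profile on a 40GB A100.
# Assumption: values follow NVIDIA's published MIG profile table.
PROFILES = {
    "1g.5gb": (1, 1),
    "2g.10gb": (2, 2),
    "3g.20gb": (3, 4),
    "4g.20gb": (4, 4),
    "7g.40gb": (7, 8),
}
COMPUTE_SLICES = 7   # matches the 7 instances per GPU quoted in this post
MEMORY_SLICES = 8    # assumption: from NVIDIA's MIG documentation
GPUS_PER_SYSTEM = 8  # from this post


def fits_on_one_gpu(layout):
    """Return True if the profiles in `layout` fit the slice budget of one GPU."""
    compute = sum(PROFILES[name][0] for name in layout)
    memory = sum(PROFILES[name][1] for name in layout)
    return compute <= COMPUTE_SLICES and memory <= MEMORY_SLICES


def max_instances_per_system():
    """Largest instance count: every GPU split into its smallest profile."""
    smallest_compute = PROFILES["1g.5gb"][0]
    return GPUS_PER_SYSTEM * (COMPUTE_SLICES // smallest_compute)


print(fits_on_one_gpu(["1g.5gb"] * 7))                     # seven small instances
print(fits_on_one_gpu(["4g.20gb", "3g.20gb"]))             # one large, one medium
print(fits_on_one_gpu(["3g.20gb", "3g.20gb", "1g.5gb"]))   # exceeds the memory slices
print(max_instances_per_system())
```

The third layout fits the compute budget but needs 9 memory slices, which is why the model rejects it. The last line reproduces the 56 instances quoted above.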

Elastic Infrastructure

AI workloads evolve over time. In the beginning, there are massive data sets to crunch and analytics work to be done. Then the workload shifts to training the machine learning models. Finally, inference takes centre stage and the workload shifts again. Typically, these three workloads have been performed on separate clusters of CPUs and/or GPUs, depending on the workload. This often leaves resources under-utilised as the work moves through the stages, and it can also leave teams short of the resources they need for the job at hand. The new DGX A100 solves this problem through MIG, allowing the infrastructure to be more elastic. With the DGX A100 and MIG you can compose the infrastructure in a way that meets your needs at each stage of the AI workload. This innovation has the potential to replace racks of CPU clusters with one or two DGX A100s.

Get Started in AI

If you are early in your AI journey, download this eBook from NVIDIA and XENON, How to Get Started in AI. It explains how the software stack scales across the NVIDIA platforms, the learning models, and how to start your AI journey.

Take the Next Step

Check out the full specs and details, or download the DGX A100 datasheet. XENON is the ANZ NVIDIA Elite partner, with vast experience of deploying DGX systems.

To learn how the DGX A100 can accelerate your time to insight, contact the XENON team today.

Have a look under the hood in NVIDIA's video, "Introducing NVIDIA DGX A100", available on YouTube.

First published 20-May-2020. Last Post Update: 19-Nov-2020
