NVIDIA Tesla Volta: The World’s Most Advanced Data Center GPU

22 May 2017


Tesla V100: The AI Computing and HPC Powerhouse

The NVIDIA® Tesla® V100 accelerator is the world’s highest-performing parallel processor, designed to power the most computationally intensive HPC, Artificial Intelligence, and graphics workloads.

The GV100 GPU comprises 21.1 billion transistors on an 815 mm² die, fabricated on a new TSMC 12 nm FFN high-performance manufacturing process customized for NVIDIA. Compared to its predecessor, the Pascal-generation GP100, GV100 delivers considerably more compute performance and adds many new features. It also improves GPU resource utilization, further simplifying GPU programming and application porting, and is an extremely power-efficient processor, delivering exceptional performance per watt. Figure 1 shows Tesla V100 performance for deep learning training and inference using the ResNet-50 deep neural network.


Figure 1: Left: Tesla V100 trains the ResNet-50 deep neural network 2.4x faster than Tesla P100. Right: Given a target latency per image of 7ms, Tesla V100 is able to perform inference using the ResNet-50 deep neural network 3.7x faster than Tesla P100. (Measured on pre-production Tesla V100.)


COMPARISONS:

Table 1 compares NVIDIA® Tesla® accelerators over the past five years.

| Tesla Product | Tesla K40 | Tesla M40 | Tesla P100 | Tesla V100 |
|---|---|---|---|---|
| GPU | GK180 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GV100 (Volta) |
| SMs | 15 | 24 | 56 | 80 |
| TPCs | 15 | 24 | 28 | 40 |
| FP32 Cores/SM | 192 | 128 | 64 | 64 |
| FP32 Cores/GPU | 2880 | 3072 | 3584 | 5120 |
| FP64 Cores/SM | 64 | 4 | 32 | 32 |
| FP64 Cores/GPU | 960 | 96 | 1792 | 2560 |
| Tensor Cores/SM | NA | NA | NA | 8 |
| Tensor Cores/GPU | NA | NA | NA | 640 |
| GPU Boost Clock | 810/875 MHz | 1114 MHz | 1480 MHz | 1455 MHz |
| Peak FP32 TFLOP/s* | 5.04 | 6.8 | 10.6 | 15 |
| Peak FP64 TFLOP/s* | 1.68 | 2.1 | 5.3 | 7.5 |
| Peak Tensor Core TFLOP/s* | NA | NA | NA | 120 |
| Texture Units | 240 | 192 | 224 | 320 |
| Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2 |
| Memory Size | Up to 12 GB | Up to 24 GB | 16 GB | 16 GB |
| L2 Cache Size | 1536 KB | 3072 KB | 4096 KB | 6144 KB |
| Shared Memory Size/SM | 16 KB / 32 KB / 48 KB | 96 KB | 64 KB | Configurable up to 96 KB |
| Register File Size/SM | 256 KB | 256 KB | 256 KB | 256 KB |
| Register File Size/GPU | 3840 KB | 6144 KB | 14336 KB | 20480 KB |
| TDP | 235 W | 250 W | 300 W | 300 W |
| Transistors | 7.1 billion | 8 billion | 15.3 billion | 21.1 billion |
| GPU Die Size | 551 mm² | 601 mm² | 610 mm² | 815 mm² |
| Manufacturing Process | 28 nm | 28 nm | 16 nm FinFET+ | 12 nm FFN |


Table 1: Tesla V100 Compared to Prior Generation Tesla Accelerators.
(* Peak TFLOP/s rates are based on GPU Boost clock.)
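The peak FP32 and FP64 rates in Table 1 follow directly from core counts and boost clocks, since each core can retire one fused multiply-add (FMA, two floating-point operations) per clock. A quick sanity check, using values from the table:

```python
# Peak throughput = cores x 2 FLOPs per fused multiply-add x boost clock.
# Core counts and boost clocks are taken from Table 1 above.
def peak_tflops(cores: int, boost_clock_mhz: float) -> float:
    """Peak TFLOP/s assuming one FMA (2 FLOPs) per core per clock."""
    return cores * 2 * boost_clock_mhz * 1e6 / 1e12

print(round(peak_tflops(5120, 1455), 1))  # V100 FP32: 14.9 (listed as 15)
print(round(peak_tflops(3584, 1480), 1))  # P100 FP32: 10.6
print(round(peak_tflops(2880, 875), 2))   # K40 FP32 at max boost: 5.04
```

The same formula applied to the 2560 FP64 cores gives roughly 7.4 TFLOP/s, which Table 1 rounds up to 7.5.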

Key Features

Key compute features of Tesla V100 include the following:

  • Tensor Cores
    An Exponential Leap in Performance

    New Streaming Multiprocessor (SM) Architecture Optimized for Deep Learning. Volta features a major redesign of the SM processor architecture at the center of the GPU. The new Volta SM is 50% more energy efficient than the previous-generation Pascal design, enabling major boosts in FP32 and FP64 performance in the same power envelope. New Tensor Cores designed specifically for deep learning deliver up to 12x higher peak TFLOP/s for training. With independent, parallel integer and floating-point datapaths, the Volta SM is also much more efficient on workloads that mix computation with addressing calculations. Volta’s new independent thread scheduling capability enables finer-grain synchronization and cooperation between parallel threads. Finally, a new combined L1 data cache and shared memory subsystem significantly improves performance while also simplifying programming.
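    The 120 Tensor Core TFLOP/s figure in Table 1 can be reproduced from the per-clock work of each unit: per NVIDIA's Volta architecture documentation, each Tensor Core performs a 4x4x4 matrix multiply-accumulate per clock (64 FMAs). A minimal sketch:

    ```python
    # Each Volta Tensor Core performs a 4x4x4 matrix multiply-accumulate per
    # clock: 4*4*4 = 64 FMAs, i.e. 128 floating-point operations.
    tensor_cores = 640                 # 8 per SM x 80 SMs (Table 1)
    flops_per_clock = 4 * 4 * 4 * 2    # 64 FMAs x 2 FLOPs each = 128
    boost_clock_hz = 1455e6

    peak = tensor_cores * flops_per_clock * boost_clock_hz / 1e12
    print(round(peak, 1))              # -> 119.2 TFLOP/s, quoted as "up to 120"
    ```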

  • HBM2 Memory: Faster, Higher Efficiency
    Volta’s highly tuned 16 GB HBM2 memory subsystem delivers 900 GB/s peak memory bandwidth. The combination of a new-generation HBM2 memory from Samsung and a new-generation memory controller in Volta provides 1.5x the delivered memory bandwidth of Pascal GP100, with greater than 95% memory bandwidth efficiency on many workloads.
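    The 900 GB/s figure is consistent with the 4096-bit HBM2 interface in Table 1; the implied per-pin data rate (not stated in the article, so derived here as a back-of-envelope check) works out to roughly 1.76 Gb/s:

    ```python
    # Peak bandwidth = bus width (bits) x per-pin data rate / 8 bits per byte.
    # Solving for the pin rate from the quoted 900 GB/s peak:
    bus_width_bits = 4096          # HBM2 interface width from Table 1
    peak_bw_gb_s = 900             # quoted peak memory bandwidth

    pin_rate_gbps = peak_bw_gb_s * 8 / bus_width_bits
    print(round(pin_rate_gbps, 2))  # -> 1.76 Gb/s per pin
    ```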
  • Volta Multi-Process Service
    Volta Multi-Process Service (MPS) is a new feature of the Volta GV100 architecture that provides hardware acceleration of critical components of the CUDA MPS server, enabling improved performance, isolation, and better quality of service (QoS) for multiple compute applications sharing the GPU. Volta MPS also triples the maximum number of MPS clients, from 16 on Pascal to 48 on Volta.
  • NVIDIA® NVLink™
    Scalability for Rapid Time-to-Solution

    Second-Generation NVLink™. The second generation of NVIDIA’s NVLink high-speed interconnect delivers higher bandwidth, more links, and improved scalability for multi-GPU and multi-GPU/CPU system configurations. GV100 supports up to six NVLink links, each carrying 25 GB/s in each direction, for a total of 300 GB/s of bidirectional bandwidth. NVLink now supports CPU mastering and cache-coherence capabilities with IBM POWER9 CPU-based servers. The new NVIDIA DGX-1 with V100 AI supercomputer uses NVLink to deliver greater scalability for ultra-fast deep learning training.
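    The 300 GB/s aggregate only adds up when both directions of each full-duplex link are counted:

    ```python
    # NVLink 2.0 aggregate bandwidth on GV100, counting both directions.
    links = 6                 # maximum NVLink links on GV100
    per_direction_gb_s = 25   # per link, per direction
    directions = 2            # NVLink is full duplex

    total_gb_s = links * per_direction_gb_s * directions
    print(total_gb_s)         # -> 300 GB/s aggregate bidirectional bandwidth
    ```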

  • Enhanced Unified Memory and Address Translation Services
    Unified Memory technology in Volta GV100 includes new access counters that allow memory pages to be migrated more accurately to the processor that accesses them most frequently, improving efficiency for memory ranges shared between processors. On IBM POWER platforms, new Address Translation Services (ATS) support allows the GPU to access the CPU’s page tables directly.
  • Cooperative Groups and New Cooperative Launch APIs
    Cooperative Groups is a new programming model introduced in CUDA 9 for organizing groups of communicating threads. It lets developers express the granularity at which threads communicate, helping them write richer, more efficient parallel decompositions. Basic Cooperative Groups functionality is supported on all NVIDIA GPUs since Kepler. Pascal and Volta include support for new Cooperative Launch APIs that enable synchronization amongst CUDA thread blocks, and Volta adds support for new synchronization patterns.
  • Volta-Optimized Software
    GPU-Accelerated Frameworks and Applications

    New versions of deep learning frameworks such as Caffe2, MXNet, CNTK, TensorFlow, and others harness the performance of Volta to deliver dramatically faster training times and higher multi-node training performance. Volta-optimized versions of GPU-accelerated libraries such as cuDNN, cuBLAS, and TensorRT leverage the new features of the Volta GV100 architecture to deliver higher performance for both deep learning and High Performance Computing (HPC) applications. The NVIDIA CUDA Toolkit version 9.0 includes new APIs and support for Volta features to provide even easier programmability.

  • Maximum Performance and Maximum Efficiency Modes
    In Maximum Performance mode, the Tesla V100 accelerator operates unconstrained up to its TDP (Thermal Design Power) level of 300 W, accelerating applications that require the fastest computational speed and highest data throughput. Maximum Efficiency mode allows data center managers to tune the power usage of their Tesla V100 accelerators for optimal performance per watt. A not-to-exceed power cap can be set across all GPUs in a rack, reducing power consumption dramatically while still delivering excellent rack-level performance.

For more information, please contact XENON, or call 1300 030 888 to speak to a solutions architect.
