NVIDIA GB200 (Grace Blackwell 200, 2024-Q4)

Product Overview

NVIDIA GB200 (Grace Blackwell 200) is NVIDIA's second-generation Grace Blackwell Superchip, entering mass production Q4 2024. It pairs a B200 GPU with an Arm Grace 72-core CPU via NV-HBI 900 GB/s high-speed interconnect — 1 Superchip = 1 GPU + 1 CPU. 72 GB200 units form an NVL72 rack delivering 1 EFLOPS FP4 sparse, NVLink 5 130 TB/s interconnect, and ConnectX-8 800G networking — NVIDIA's 2024–2025 data center AI flagship.

Generational comparison:

GH200 (2023-Q3): Grace + H100, FP8 1 PF sparse, NVLink 4 60 TB/s
GB200 (2024-Q4): Grace + B200, FP4 10 PF sparse, NVLink 5 130 TB/s, 1 EFLOPS / NVL72
GB300 (2025 H2): Vera 88-core + B300 Ultra + ConnectX-9 1.6T, 1.08 EFLOPS / NVL72 (page available)

Core Specifications

Parameter	Value
Architecture	Grace Blackwell 200 Superchip
GPU Chip	1× B200 (Blackwell)
CPU Chip	1× Arm Grace (72-core Neoverse V2)
NV-HBI	900 GB/s bidirectional CPU-GPU interconnect
CPU-GPU Coherent Memory	Unified addressing
GPU Memory	192GB HBM3E
GPU Bandwidth	8 TB/s
CPU Memory	480GB LPDDR5X (on Grace)
CPU Bandwidth	512 GB/s
FP4 Sparse	20 PFLOPS (B200 single GPU ×2 Superchip)
FP8 Dense	4.5 PFLOPS
BF16 Dense	2.25 PFLOPS
TDP (per Superchip)	1000W
Form Factor	Board-integrated (non-removable)
Mass Production	2024-Q4
Unit Price	~$60,000–70,000 (Superchip module)

GB200 NVL72 Rack

Parameter	Configuration
Superchip Count	72× GB200
GPU Count	72× B200
CPU Count	72× Arm Grace (72 cores × 72 = 5,184 cores)
Total HBM	13.8 TB HBM3E
Total LPDDR5X	34.6 TB
NVLink 5 Domain	130 TB/s fully connected
ConnectX-8 Egress	72× 800G = 57.6 Tb/s
FP4 Sparse Total	1,440 PFLOPS (72× 20 PFLOPS)
FP8 Dense Total	324 PFLOPS (72× 4.5 PFLOPS)
Rack TDP	~120 kW
Rack Count	8 (standard data center row)
Price	~$3M / rack (estimated)

GB200 NVL576 (8-Rack)

Parameter	Configuration
Superchip Count	576× GB200
GPU Count	576× B200
NVLink 5 Domain	Cross-rack 130 TB/s
Total HBM	110 TB
FP4 Sparse Total	11.52 EFLOPS (576× 20 PFLOPS)
FP8 Dense Total	2.6 EFLOPS (576× 4.5 PFLOPS)
Rack TDP	960 kW
Price	~$24M

GB200 NVL576 advantage: 8 racks, 576 GPUs sharing a single 130 TB/s NVLink domain — the largest single AI compute domain in the industry as of 2024, critical for trillion-parameter LLM training.

GH200 → GB200 → GB300 Comparison

Metric	GH200 (2023-Q3)	GB200 (2024-Q4)	GB300 (2025 H2)
GPU	H100	B200	B300 Ultra
CPU	Grace 72-core	Grace 72-core	Vera 88-core
GPU Memory	96GB HBM3	192GB HBM3E	288GB HBM3E
GPU Bandwidth	3.35 TB/s	8 TB/s	10 TB/s
NVLink Domain	60 TB/s	130 TB/s	130 TB/s
Networking	ConnectX-7 400G	ConnectX-8 800G	ConnectX-9 1.6T
FP4 Sparse	N/A (FP8 2 PF)	10 PF	15 PF
FP8 Dense	1 PF	2.25 PF	3.75 PF
TDP (Superchip)	1000W	1000W	1200W

ConnectX-8 800G Networking

Dimension	Specification
Speed	800 Gb/s single port (2× ConnectX-7)
Port Count	2–4 per Superchip
Protocol	InfiniBand NDR / RoCE v2
Latency	< 0.5 μs
GPUDirect	GPU-NIC direct DMA
Congestion Control	SHARP v3
2024 Deployments	ORNL Aurora successor, CSCS Alps, EuroHPC

ConnectX-8 upgrade: 2× speed (400G → 800G), GPUDirect RDMA 3.0, supports NVLink over IB (cross-rack NVLink).

Arm Grace 72-Core

Dimension	Specification
Architecture	Arm Neoverse V2
Core Count	72 cores
L3 Cache	Shared 192MB
LPDDR5X	480GB
Bandwidth	512 GB/s
TDP	200W (CPU only)
PCIe	Gen5 ×32
Features	SVE2 enhanced

Grace vs Vera upgrade: Vera is Grace's successor (88-core + 256MB L3 + 480GB LPDDR5X). GB200 still uses Grace 72-core; only GB300 upgrades to Vera.

GB200 Use Cases

✅ Trillion-parameter LLM training (NVL576 domain, 130 TB/s NVLink)
✅ MoE model training (expert parallel + tensor parallel)
✅ Ultra-large-scale RLHF (576 GPU synchronized)
✅ Multimodal large models (video + text + image)
✅ AI for Science (climate, materials, life sciences)
✅ Cloud AI services (CoreWeave, Lambda, OVHcloud)
❌ Small-scale inference (cost-prohibitive)
❌ China market (export controls)

GB200 Customers

Meta: Llama 4 / 5 training (>$10B order)
Microsoft Azure: OpenAI GPT-5 + Copilot
Google Cloud: Gemini 1.5 / 2.0
AWS: Anthropic Claude 4 + Bedrock
CoreWeave: 30K+ GB200 deployed (H1 2025)
xAI Grok 3: Colossus cluster 100K+ GB200
Oracle Cloud: OCI deployment
Lambda Labs: Lambda 1-Click Cluster

Vendor Information

Parameter	Value
Company	NVIDIA Corporation
Product Page	https://www.nvidia.com/en-us/data-center/grace-blackwell/
CEO	Jensen Huang
Foundry	TSMC 4NP (B200) + TSMC N3 (Grace)
2024-Q4 Mass Production	Yes
Price	Superchip ~$60–70K, NVL72 ~$3M

GB200 vs GB300

Metric	GB200 (2024-Q4)	GB300 (2025 H2)
GPU	B200	B300 Ultra
CPU	Grace 72-core	Vera 88-core
GPU Memory	192GB HBM3E	288GB HBM3E
GPU Bandwidth	8 TB/s	10 TB/s
Networking	ConnectX-8 800G	ConnectX-9 1.6T
FP4 Sparse	10 PF	15 PF
FP8 Dense	2.25 PF	3.75 PF
TDP (Superchip)	1000W	1200W

GB300 upgrade: GPU memory +50% (192→288GB), compute +50% (FP8 2.25→3.75 PF), networking 2× (800G→1.6T), comparable price.

Key Features

NVL72 domain: 72 GPUs sharing 130 TB/s NVLink
NVL576 domain: 576 GPUs across 8 racks NVLink
ConnectX-8 800G: 800G single port
Arm Grace 72-core: CPU + GPU unified memory addressing
FP4 10 PFLOPS: Inference optimized
Drawbacks: 1000W TDP, export controls, CUDA-only software

NVIDIA GB300 (Grace Blackwell 300) — Next generation
NVIDIA B200 — Same-generation GPU
NVIDIA B300 Ultra — Next-generation GPU
NVIDIA H100 — Previous generation
NVIDIA H200 — Contemporary
NVIDIA Rubin R200 — 2026
AMD MI400 — Competitor
Huawei Ascend 920 — Domestic comparison
Groq 3 LPX (Post-NVIDIA Acquisition) — Groq 3rd gen

Product Overview​

Core Specifications​

GB200 NVL72 Rack​

GB200 NVL576 (8-Rack)​

GH200 → GB200 → GB300 Comparison​

ConnectX-8 800G Networking​

Arm Grace 72-Core​

GB200 Use Cases​

GB200 Customers​

Vendor Information​

GB200 vs GB300​

Key Features​

Related Cards​