NVIDIA RTX 5090 (Blackwell Consumer Flagship)

Product Overview

The NVIDIA RTX 5090, unveiled at CES in January 2025, is the flagship that brings the Blackwell architecture to consumer GPUs for the first time. With 32GB of GDDR7 memory, 21,760 CUDA cores, and a 575W TDP, it delivers 3,352 TOPS of AI compute (FP4, sparse), about 2.5× that of the RTX 4090.

It is positioned for local LLM inference (70B-class models), Stable Diffusion XL training, and consumer AI development.

Core Specifications

Parameter	Value
Architecture	Blackwell (GB202)
Process Node	TSMC 4N (custom 5nm)
CUDA Cores	21,760
Tensor Cores	680 (5th Gen)
RT Cores	170 (4th Gen)
Base Clock	2.01 GHz
Boost Clock	2.41 GHz
Memory	32 GB GDDR7
Memory Bandwidth	1,792 GB/s (28 Gbps × 512-bit)
FP32 Compute	104.8 TFLOPS
FP16 Tensor	419 TFLOPS (sparse)
FP8 Tensor	838 TFLOPS (sparse)
FP4 Tensor	3,352 TOPS (sparse)
INT8 Tensor	1,676 TOPS
TDP	575 W
Power Connector	1× 16-pin (12V-2x6)
MSRP	$1,999
Launch Date	2025-01-30
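
The memory bandwidth and FP32 rows can be recomputed from the other rows in the table. The sketch below does this in Python. The function names are illustrative, and the FP32 formula assumes 2 floating-point operations (one fused multiply-add) per CUDA core per clock at the listed boost clock.

```python
def memory_bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s: per-pin data rate times bus width, over 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8


def fp32_tflops(cuda_cores: int, boost_clock_ghz: float,
                flops_per_core_per_clock: int = 2) -> float:
    """Peak FP32 throughput in TFLOPS, assuming one FMA (2 FLOPs) per core per clock."""
    return cuda_cores * flops_per_core_per_clock * boost_clock_ghz / 1000


bandwidth = memory_bandwidth_gb_s(28, 512)   # 28 Gbps per pin, 512-bit bus
fp32 = fp32_tflops(21_760, 2.41)             # core count and boost clock from the table
print(f"Memory bandwidth: {bandwidth:.0f} GB/s")
print(f"FP32 compute: {fp32:.1f} TFLOPS")
```

With the rounded 2.41 GHz boost clock the FP32 result lands slightly above the listed 104.8 TFLOPS, which presumably reflects the unrounded clock.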

RTX 5090 vs RTX 4090 Comparison

Metric	RTX 5090	RTX 4090	Improvement
Architecture	Blackwell	Ada Lovelace	New gen
CUDA Cores	21,760	16,384	1.33×
Memory	32GB GDDR7	24GB GDDR6X	1.33×
Memory Bandwidth	1,792 GB/s	1,008 GB/s	1.78×
FP16 Tensor	419 TFLOPS	165 TFLOPS	2.5×
FP4 Tensor	3,352 TOPS	N/A	New
TDP	575W	450W	1.28×
Price	$1,999	$1,599	1.25×
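
The Improvement column is the ratio of the RTX 5090 value to the RTX 4090 value. A minimal sketch that recomputes those ratios from the table, with a hypothetical helper name:

```python
# (RTX 5090 value, RTX 4090 value), copied from the comparison table
metrics = {
    "CUDA Cores": (21_760, 16_384),
    "Memory (GB)": (32, 24),
    "Memory Bandwidth (GB/s)": (1_792, 1_008),
    "FP16 Tensor (TFLOPS)": (419, 165),
    "TDP (W)": (575, 450),
    "Price (USD)": (1_999, 1_599),
}


def improvement(new: float, old: float) -> float:
    """Ratio of the newer card's value to the older card's value."""
    return new / old


ratios = {name: round(improvement(new, old), 2) for name, (new, old) in metrics.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```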

Blackwell New Features

FP4 Precision Support

Native FP4 Tensor Cores (first time on consumer GPUs).
Reduces inference memory footprint by 50% (vs FP8).
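A 70B LLM quantized to FP4 needs about 35GB for weights alone (70B parameters × 0.5 bytes) and roughly 40GB with runtime overhead, so it exceeds 32GB of VRAM unless some layers are offloaded to system memory or the model is compressed further.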
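
The 50% reduction follows from halving the bits stored per weight. The sketch below estimates weight storage at each precision. It assumes weights dominate memory and ignores the KV cache, activations, and the per-block scaling factors that real FP4 formats add. The function name is illustrative.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (10^9 bytes), excluding KV cache and overhead."""
    return num_params * bits_per_weight / 8 / 1e9


for label, bits in (("FP16", 16), ("FP8", 8), ("FP4", 4)):
    print(f"70B @ {label}: {weight_footprint_gb(70e9, bits):.0f} GB")
```

For a 70B model this gives about 35 GB at FP4, half of the 70 GB needed at FP8.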

DLSS 4 Multi Frame Generation

Multi Frame Generation: Generates up to 3 additional frames per rendered frame (vs 1 additional frame with DLSS 3).
Gaming-only, but showcases Blackwell's compute power.

GDDR7 Memory

28 Gbps speed (vs GDDR6X 21 Gbps).
1,792 GB/s bandwidth = 1.78× RTX 4090 (1,008 GB/s).
Alleviates the memory-bound bottleneck in LLM inference.
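
Single-stream LLM decoding is typically memory-bound because each generated token has to read roughly all of the weights once. Under that simplifying assumption, bandwidth divided by weight size gives an upper bound on tokens per second. The sketch below uses a hypothetical 20 GB model, and the function name is illustrative. Real throughput will be lower because of KV-cache reads and kernel overhead.

```python
def decode_tokens_per_second_bound(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on decode speed when every token reads all weights once."""
    return bandwidth_gb_s / weights_gb


weights_gb = 20.0  # hypothetical model size, not a figure from this page
rtx_5090 = decode_tokens_per_second_bound(1_792, weights_gb)
rtx_4090 = decode_tokens_per_second_bound(1_008, weights_gb)
print(f"RTX 5090 bound: {rtx_5090:.1f} tok/s")
print(f"RTX 4090 bound: {rtx_4090:.1f} tok/s")
print(f"Ratio: {rtx_5090 / rtx_4090:.2f}x")
```

The ratio of the two bounds equals the bandwidth ratio regardless of model size, which is why bandwidth matters more than raw compute for this workload.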

LLM Inference Performance

Model	Quantization	RTX 5090 (32GB)	RTX 4090 (24GB)	Improvement
Llama 3 8B	FP16	~95 tok/s	~70 tok/s	1.36×
Llama 3 70B	FP4	~28 tok/s	OOM	N/A (OOM on RTX 4090)
Llama 3 70B	INT4	~22 tok/s	~15 tok/s	1.47×
Mixtral 8x7B	INT4	~45 tok/s	~32 tok/s	1.41×
Qwen 2.5 72B	FP4	~26 tok/s	OOM	N/A (OOM on RTX 4090)

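The 32GB of VRAM is the key enabler for 70B-class models, although a ~40GB FP4 footprint still exceeds 32GB, so the model cannot reside entirely in VRAM without further compression or partial offload to system memory.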
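
Because a ~40 GB model is larger than 32 GB, a common approach in local inference runtimes is to keep as many layers as fit on the GPU and offload the rest to system memory. The sketch below estimates that split. The layer count (80), the equal layer sizes, and the 4 GB reserve for KV cache and buffers are assumptions for illustration, not figures from this page.

```python
import math


def layers_on_gpu(total_weights_gb: float, num_layers: int,
                  vram_gb: float, reserved_gb: float) -> int:
    """Number of equally sized layers that fit in VRAM after the reserved space."""
    per_layer_gb = total_weights_gb / num_layers
    usable_gb = max(vram_gb - reserved_gb, 0.0)
    return min(num_layers, math.floor(usable_gb / per_layer_gb))


# Assumed: ~40 GB quantized model, 80 layers, 4 GB reserved on a 32 GB card
gpu_layers = layers_on_gpu(total_weights_gb=40.0, num_layers=80,
                           vram_gb=32.0, reserved_gb=4.0)
print(f"{gpu_layers} of 80 layers on GPU, {80 - gpu_layers} offloaded")
```

Offloaded layers run from system memory over PCIe, so throughput drops as the offloaded share grows.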

Vendor Information

Parameter	Value
Vendor	NVIDIA Corporation
Product Page	https://www.nvidia.com/en-us/geforce/graphics-cards/50-series/rtx-5090/
MSRP	$1,999 (Founders Edition)
Target Market	Consumer AI, creators, researchers, local LLM

Use Cases

✅ Local 70B LLM inference (FP4 quantized, 32GB VRAM)
✅ Stable Diffusion XL / Flux training and inference
✅ Video production (DaVinci Resolve AI acceleration)
✅ 8K gaming + frame generation
❌ Data center (use H100/B200 instead)
❌ Multi-GPU training (lacks NVLink)

Related Cards

NVIDIA RTX 4090: Previous consumer flagship
NVIDIA RTX 5080: Same-generation sub-flagship
NVIDIA B200: Same-generation data center GPU
NVIDIA H100 NVL: 94GB HBM3 data center card, typically deployed as NVLink-bridged pairs
