<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <id>https://mirrorfrog.com/en/blog</id>
    <title>MirrorFrog AI Compute Card Industry News</title>
    <updated>2026-06-01T00:00:00.000Z</updated>
    <generator>https://github.com/jpmonette/feed</generator>
    <link rel="alternate" href="https://mirrorfrog.com/en/blog"/>
    <subtitle>Latest AI compute card industry news, in-depth technical analysis, and vendor strategy</subtitle>
    <icon>https://mirrorfrog.com/en/img/favicon.ico</icon>
    <rights>Copyright © 2026 MirrorFrog AI Compute Cards Wiki</rights>
    <entry>
        <title type="html"><![CDATA[2026 H2 Top AI Chip Selection Guide: From H100 to Rubin, MI400, TPU 8t, and TPU 8i]]></title>
        <id>https://mirrorfrog.com/en/blog/2026-h2-top-ai-chip-selection-guide</id>
        <link href="https://mirrorfrog.com/en/blog/2026-h2-top-ai-chip-selection-guide"/>
        <updated>2026-06-01T00:00:00.000Z</updated>
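        <summary type="html"><![CDATA[Complete 2026 H2 selection guide for top AI chips: NVIDIA Rubin R200, AMD MI400, TPU 8t/8i, Trainium 3, Ascend 920, Groq 3 LPX. A full selection tree spanning training to inference and LLMs from 70B to 1T+ parameters.]]></summary>
        <content type="html"><![CDATA[<p>2026 H2 offers the broadest AI compute lineup to date: NVIDIA Rubin R200, AMD MI400, Trainium 3, TPU 8t/8i, Ascend 920, and Groq 3 LPX are all in place. This article provides a <strong>complete selection tree</strong> to help you choose the best-fitting product by <strong>model size, training vs. inference, latency requirements, budget, and region</strong>.</p>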
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="选型决策树">Selection Decision Tree<a href="https://mirrorfrog.com/en/blog/2026-h2-top-ai-chip-selection-guide#%E9%80%89%E5%9E%8B%E5%86%B3%E7%AD%96%E6%A0%91" class="hash-link" aria-label="Direct link to Selection Decision Tree" title="Direct link to Selection Decision Tree" translate="no">​</a></h2>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Start</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">├─ Task type?</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">│   ├─ Training ───────── [Training selection]</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">│   └─ Inference ──────── [Inference selection]</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">└─ Region?</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    ├─ North America / Europe ── All products available</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    ├─ China ─────────────────── Huawei Ascend series</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    └─ AWS Cloud ─────────────── Trainium / Inferentia</span><br></div></code></pre></div></div>
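<p>The tree above can be read as two independent lookups, one on task type and one on region. A minimal Python sketch of that reading follows; the dictionary keys (<code>training</code>, <code>china</code>, and so on) and the function name are hypothetical labels chosen for illustration, not part of the original guide.</p>

```python
# Illustrative encoding of the decision tree. The keys are hypothetical
# labels; the values are the branches shown in the tree above.
TASK_BRANCH = {
    "training": "Training selection",
    "inference": "Inference selection",
}
REGION_OPTIONS = {
    "north_america_europe": "All products available",
    "china": "Huawei Ascend series",
    "aws_cloud": "Trainium / Inferentia",
}


def first_cut(task: str, region: str) -> tuple[str, str]:
    """Return (section to read next, products available in the region)."""
    if task not in TASK_BRANCH:
        raise ValueError(f"unknown task type: {task!r}")
    if region not in REGION_OPTIONS:
        raise ValueError(f"unknown region: {region!r}")
    return TASK_BRANCH[task], REGION_OPTIONS[region]


print(first_cut("inference", "china"))
```

<p>Keeping the two lookups separate mirrors the tree: the region narrows the vendor list, while the task type decides which of the following sections applies.</p>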
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="训练选型">Training Selection<a href="https://mirrorfrog.com/en/blog/2026-h2-top-ai-chip-selection-guide#%E8%AE%AD%E7%BB%83%E9%80%89%E5%9E%8B" class="hash-link" aria-label="Direct link to Training Selection" title="Direct link to Training Selection" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="100b-llm-训练">100B+ LLM Training<a href="https://mirrorfrog.com/en/blog/2026-h2-top-ai-chip-selection-guide#100b-llm-%E8%AE%AD%E7%BB%83" class="hash-link" aria-label="Direct link to 100B+ LLM Training" title="Direct link to 100B+ LLM Training" translate="no">​</a></h3>
<table><thead><tr><th>Priority</th><th>Option</th><th>Compute per rack</th><th>Training time for a 100B model</th></tr></thead><tbody><tr><td><strong>1</strong></td><td>NVIDIA Rubin NVL72</td><td>3.6 EF FP4</td><td><strong>~1-2 days</strong> (300B tokens)</td></tr><tr><td>2</td><td>AWS Trn3 UltraServer (2+)</td><td>104 PF FP8</td><td>~3-5 days</td></tr><tr><td>3</td><td>AMD Helios</td><td>2.88 EF FP4 dense</td><td>~1-2 days</td></tr><tr><td>4</td><td>Google TPU 8t pod (large pod)</td><td>590+ EF FP8 dense</td><td>~hours (Google internal)</td></tr></tbody></table>
<p><strong>Recommendations</strong>:</p>
<ul>
<li class="">Commercial cloud: NVIDIA Rubin NVL72</li>
<li class="">Cost-sensitive: AWS Trn3 UltraServer</li>
<li class="">Open ecosystem: AMD Helios</li>
<li class="">Google Cloud: TPU 8t pod</li>
</ul>
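<p>The training-time column in the table above can be sanity-checked with a back-of-the-envelope estimate. The sketch below assumes the common rule of thumb that training costs about 6 × parameters × tokens FLOPs, plus a 40% hardware utilization; neither assumption comes from the original guide, and treating the rack's FP4 peak as the usable rate is optimistic.</p>

```python
SECONDS_PER_DAY = 86_400


def training_days(params: float, tokens: float, peak_flops: float,
                  utilization: float = 0.4) -> float:
    """Estimate training days with the 6 * N * D FLOPs rule of thumb.

    utilization is an assumed fraction of peak throughput actually achieved.
    """
    total_flops = 6 * params * tokens
    return total_flops / (peak_flops * utilization) / SECONDS_PER_DAY


# 100B parameters, 300B tokens, rack peak figures taken from the table.
rubin_nvl72 = training_days(100e9, 300e9, 3.6e18)   # 3.6 EF FP4
amd_helios = training_days(100e9, 300e9, 2.88e18)   # 2.88 EF FP4 dense
print(f"Rubin NVL72: {rubin_nvl72:.1f} days, Helios: {amd_helios:.1f} days")
```

<p>Under these assumptions both racks land inside the ~1-2 day range quoted in the table; a lower utilization or a higher-precision format would stretch the estimate proportionally.</p>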
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="70b-llm-训练">70B LLM Training<a href="https://mirrorfrog.com/en/blog/2026-h2-top-ai-chip-selection-guide#70b-llm-%E8%AE%AD%E7%BB%83" class="hash-link" aria-label="Direct link to 70B LLM Training" title="Direct link to 70B LLM Training" translate="no">​</a></h3>
<table><thead><tr><th>Option</th><th>Configuration</th><th>Price</th><th>Recommended for</th></tr></thead><tbody><tr><td>NVIDIA H200</td><td>8× H200</td><td>~$264K</td><td>Mainstream</td></tr><tr><td>NVIDIA B200</td><td>8× B200</td><td>~$400K</td><td>High end</td></tr><tr><td>NVIDIA B300 Ultra</td><td>8× B300</td><td>~$500K</td><td>Latest</td></tr><tr><td>AMD MI300X</td><td>8× MI300X</td><td>~$120K</td><td>Value</td></tr><tr><td>AMD MI325X</td><td>8× MI325X</td><td>~$160K</td><td>Large memory</td></tr><tr><td>Trainium 2</td><td>trn2.48xlarge × 4</td><td>~$32/hr</td><td>AWS customers</td></tr><tr><td>Trainium 3</td><td>trn3 UltraServer</td><td>~$5M</td><td>Hyperscale</td></tr></tbody></table>
<p><strong>Recommendations</strong>:</p>
<ul>
<li class="">Commercial mainstream: 8× NVIDIA H200</li>
<li class="">Performance first: 8× NVIDIA B300 Ultra</li>
<li class="">Value: 8× AMD MI300X</li>
<li class="">AWS cloud: Trainium 3 UltraServer</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="7b-13b-llm-训练">7B-13B LLM Training<a href="https://mirrorfrog.com/en/blog/2026-h2-top-ai-chip-selection-guide#7b-13b-llm-%E8%AE%AD%E7%BB%83" class="hash-link" aria-label="Direct link to 7B-13B LLM Training" title="Direct link to 7B-13B LLM Training" translate="no">​</a></h3>
<table><thead><tr><th>Option</th><th>Configuration</th><th>Price</th><th>Recommended for</th></tr></thead><tbody><tr><td>NVIDIA A100 80GB</td><td>8× A100</td><td>~$160K</td><td>Mainstream</td></tr><tr><td>NVIDIA H100</td><td>8× H100</td><td>~$240K</td><td>High end</td></tr><tr><td>NVIDIA RTX 6000 Ada</td><td>4-8 cards</td><td>~$27K</td><td>Workstation</td></tr><tr><td>AMD MI300X</td><td>8× MI300X</td><td>~$120K</td><td>Value</td></tr><tr><td>Intel Gaudi 3</td><td>8× Gaudi 3</td><td>~$80K</td><td>Budget-sensitive</td></tr></tbody></table>
<p><strong>Recommendations</strong>:</p>
<ul>
<li class="">Commercial mainstream: NVIDIA A100 80GB</li>
<li class="">High end: NVIDIA H100</li>
<li class="">Workstation: NVIDIA RTX 6000 Ada</li>
<li class="">Value: AMD MI300X</li>
<li class="">Budget-sensitive: Intel Gaudi 3</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1b-3b-llm-训练">1B-3B LLM Training<a href="https://mirrorfrog.com/en/blog/2026-h2-top-ai-chip-selection-guide#1b-3b-llm-%E8%AE%AD%E7%BB%83" class="hash-link" aria-label="Direct link to 1B-3B LLM Training" title="Direct link to 1B-3B LLM Training" translate="no">​</a></h3>
<table><thead><tr><th>Option</th><th>Configuration</th><th>Recommended for</th></tr></thead><tbody><tr><td>NVIDIA RTX 4090</td><td>Single card</td><td>Local</td></tr><tr><td>NVIDIA RTX 5090</td><td>Single card</td><td>Local, high end</td></tr><tr><td>NVIDIA A100 40GB</td><td>4 cards</td><td>Commercial</td></tr><tr><td>Intel Gaudi 2</td><td>8 cards</td><td>Budget</td></tr><tr><td>Apple M3 Ultra</td><td>Single workstation</td><td>Local LLM</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="推理选型">Inference Selection<a href="https://mirrorfrog.com/en/blog/2026-h2-top-ai-chip-selection-guide#%E6%8E%A8%E7%90%86%E9%80%89%E5%9E%8B" class="hash-link" aria-label="Direct link to Inference Selection" title="Direct link to Inference Selection" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="70b-llm-推理单卡">70B+ LLM Inference (Single Card)<a href="https://mirrorfrog.com/en/blog/2026-h2-top-ai-chip-selection-guide#70b-llm-%E6%8E%A8%E7%90%86%E5%8D%95%E5%8D%A1" class="hash-link" aria-label="Direct link to 70B+ LLM Inference (Single Card)" title="Direct link to 70B+ LLM Inference (Single Card)" translate="no">​</a></h3>
<table><thead><tr><th>Option</th><th>Fits 70B at FP16?</th><th>Compute</th><th>Recommended for</th></tr></thead><tbody><tr><td><strong>NVIDIA B300 Ultra (288 GB)</strong></td><td>✅ One card</td><td>7 PF FP8</td><td>Top pick</td></tr><tr><td><strong>Google TPU 8i (288 GB HBM)</strong></td><td>✅ One card</td><td>11 PF FP8</td><td>Google Cloud</td></tr><tr><td><strong>AMD MI400 (432 GB HBM4)</strong></td><td>✅ One card</td><td>20 PF FP8 dense</td><td>2026</td></tr><tr><td><strong>NVIDIA H200 (141 GB)</strong></td><td>❌ Needs TP2</td><td>~2 PF FP8 dense</td><td>Previous generation</td></tr><tr><td><strong>AMD MI325X (256 GB)</strong></td><td>✅ One card</td><td>2.6 PF FP8</td><td>Previous generation</td></tr><tr><td><strong>NVIDIA Groq 3 LPX (128 GB SRAM per rack)</strong></td><td>✅ One unit</td><td>5.5 PF (per rack)</td><td>Ultra-low latency</td></tr></tbody></table>
<p><strong>Recommendations</strong>:</p>
<ul>
<li class="">Commercial cloud: NVIDIA B300 Ultra or TPU 8i</li>
<li class="">Large memory: AMD MI400 / TPU 8i</li>
<li class="">Ultra-low latency: Groq 3 LPX</li>
<li class="">Value: AMD MI325X</li>
</ul>
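<p>The "fits at FP16" column in the table above follows from simple arithmetic: weight memory is parameter count times bytes per parameter. A small sketch, assuming 2 bytes for FP16, 1 for FP8, and 0.5 for FP4, and counting weights only (KV cache and activations are extra); the function and dictionary names are illustrative.</p>

```python
# Bytes per parameter for each weight format (weights only).
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "fp4": 0.5}


def weights_gb(params_billion: float, precision: str = "fp16") -> float:
    """Weight memory in GB: billions of params times bytes per param."""
    return params_billion * BYTES_PER_PARAM[precision]


# Memory capacities in GB as listed in the table.
CARDS_GB = {
    "NVIDIA B300 Ultra": 288,
    "Google TPU 8i": 288,
    "AMD MI400": 432,
    "NVIDIA H200": 141,
    "AMD MI325X": 256,
}

need = weights_gb(70, "fp16")  # weights alone for a 70B model
for card, memory_gb in CARDS_GB.items():
    spare = memory_gb - need
    print(f"{card}: {spare:.0f} GB left for KV cache and activations")
```

<p>The H200 is left with about 1 GB after loading 140 GB of FP16 weights, which is consistent with the table sending it to TP2 even though the weights nominally fit.</p>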
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="7b-30b-llm-推理">7B-30B LLM Inference<a href="https://mirrorfrog.com/en/blog/2026-h2-top-ai-chip-selection-guide#7b-30b-llm-%E6%8E%A8%E7%90%86" class="hash-link" aria-label="Direct link to 7B-30B LLM Inference" title="Direct link to 7B-30B LLM Inference" translate="no">​</a></h3>
<table><thead><tr><th>Option</th><th>Memory</th><th>Compute</th><th>Price</th><th>Recommended for</th></tr></thead><tbody><tr><td><strong>NVIDIA L40S</strong></td><td>48 GB</td><td>733 TF FP8</td><td>~$8K</td><td>General purpose</td></tr><tr><td><strong>NVIDIA A100 80GB</strong></td><td>80 GB</td><td>624 TOPS INT8</td><td>~$15K</td><td>Larger models</td></tr><tr><td><strong>NVIDIA H100</strong></td><td>80 GB</td><td>~2 PF FP8 dense (~4 PF sparse)</td><td>~$30K</td><td>High performance</td></tr><tr><td><strong>Google TPU 8i</strong></td><td>288 GB</td><td>11 PF FP8</td><td>Cloud only</td><td>Google Cloud</td></tr><tr><td><strong>AWS Inferentia 2</strong></td><td>32 GB</td><td>190 TOPS</td><td>Inf2 instances</td><td>AWS</td></tr><tr><td><strong>Apple M3 Ultra</strong></td><td>192 GB</td><td>80-core GPU</td><td>~$5K</td><td>Local</td></tr></tbody></table>
<p><strong>Recommendations</strong>:</p>
<ul>
<li class="">Commercial cloud: NVIDIA L40S / A100</li>
<li class="">AWS cloud: Inferentia 2</li>
<li class="">Google Cloud: TPU 8i</li>
<li class="">Local: Apple M3 Ultra</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="超低延迟推理agentic-ai">Ultra-Low-Latency Inference (Agentic AI)<a href="https://mirrorfrog.com/en/blog/2026-h2-top-ai-chip-selection-guide#%E8%B6%85%E4%BD%8E%E5%BB%B6%E8%BF%9F%E6%8E%A8%E7%90%86agentic-ai" class="hash-link" aria-label="Direct link to Ultra-Low-Latency Inference (Agentic AI)" title="Direct link to Ultra-Low-Latency Inference (Agentic AI)" translate="no">​</a></h3>
<table><thead><tr><th>Option</th><th>TTFT</th><th>TPOT</th><th>Price</th><th>Recommended for</th></tr></thead><tbody><tr><td><strong>Groq 3 LPX</strong></td><td><strong>&lt; 20ms</strong></td><td><strong>&lt; 5ms</strong></td><td>$8-10M per rack</td><td><strong>Top pick</strong></td></tr><tr><td>Groq LPU v1</td><td>~50ms</td><td>~10ms</td><td>$1.8M per rack</td><td>Alternative</td></tr><tr><td>TPU 8i</td><td>~100ms</td><td>~15ms</td><td>Cloud</td><td>Google Cloud</td></tr><tr><td>NVIDIA H200</td><td>~200ms</td><td>~30ms</td><td>$30K</td><td>General purpose</td></tr><tr><td>AWS Inferentia 2</td><td>~200ms</td><td>~30ms</td><td>AWS instances</td><td>AWS</td></tr></tbody></table>
<p><strong>Recommendations</strong>:</p>
<ul>
<li class="">Agentic AI (1,000+ calls per second): <strong>Groq 3 LPX</strong> (the only option)</li>
<li class="">Real-time code generation: Groq 3 LPX</li>
<li class="">Moderate latency requirements: TPU 8i / H200</li>
</ul>
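<p>TTFT (time to first token) and TPOT (time per output token) combine into an end-to-end response time. The sketch below uses the table values and an assumed response length of 200 output tokens, which is an illustrative figure rather than one from the guide; the Groq 3 LPX row uses its upper bounds.</p>

```python
def response_latency_ms(ttft_ms: float, tpot_ms: float, output_tokens: int) -> float:
    """TTFT covers the first token; each later token costs one TPOT."""
    if output_tokens < 1:
        raise ValueError("output_tokens must be at least 1")
    return ttft_ms + (output_tokens - 1) * tpot_ms


# (TTFT ms, TPOT ms) from the table; Groq 3 LPX uses its upper bounds.
OPTIONS = {
    "Groq 3 LPX": (20, 5),
    "Groq LPU v1": (50, 10),
    "TPU 8i": (100, 15),
    "NVIDIA H200": (200, 30),
}
for name, (ttft, tpot) in OPTIONS.items():
    total_s = response_latency_ms(ttft, tpot, 200) / 1000
    print(f"{name}: {total_s:.2f} s for 200 output tokens")
```

<p>For long responses TPOT dominates, so the gap between options widens with output length. This matters for agentic workloads that chain many model calls in sequence.</p>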
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="模型规模速查">Model Size Quick Reference<a href="https://mirrorfrog.com/en/blog/2026-h2-top-ai-chip-selection-guide#%E6%A8%A1%E5%9E%8B%E8%A7%84%E6%A8%A1%E9%80%9F%E6%9F%A5" class="hash-link" aria-label="Direct link to Model Size Quick Reference" title="Direct link to Model Size Quick Reference" translate="no">​</a></h2>
<table><thead><tr><th>Model size</th><th>Memory to fit on one card (FP16)</th><th>Recommended training</th><th>Recommended inference</th></tr></thead><tbody><tr><td>1B-3B</td><td>Any GPU with 8 GB+</td><td>RTX 4090 / A100</td><td>RTX 4090 / L4</td></tr><tr><td>7B</td><td>24 GB</td><td>A100 40GB × 4</td><td>L4 / L40S</td></tr><tr><td>13B</td><td>32 GB</td><td>A100 40GB × 4</td><td>L4 / L40S</td></tr><tr><td>30B</td><td>64 GB</td><td>A100 80GB × 4</td><td>L40S / H100</td></tr><tr><td>70B</td><td>141 GB</td><td>H200 × 8</td><td><strong>Single B300 Ultra / TPU 8i</strong></td></tr><tr><td>405B</td><td>~810 GB</td><td>NVL72</td><td>B300 Ultra × 4 / Rubin R200</td></tr><tr><td>1T+</td><td>2 TB</td><td>Rubin NVL576</td><td>Multiple Rubin R200 / paired with LPX</td></tr></tbody></table>
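<p>The inference column of the table above can be approximated by dividing the memory a model needs by the memory on each card. A rough sketch, assuming FP16 weights (2 bytes per parameter) and a 10% headroom for KV cache and activations; the headroom figure is an assumption for illustration, not a number from the guide.</p>

```python
import math


def min_cards(params_billion: float, card_memory_gb: float,
              bytes_per_param: float = 2.0, headroom: float = 1.1) -> int:
    """Smallest card count whose combined memory holds weights plus headroom.

    headroom=1.1 is an assumed 10% allowance for KV cache and activations.
    """
    needed_gb = params_billion * bytes_per_param * headroom
    return math.ceil(needed_gb / card_memory_gb)


print(min_cards(70, 141))    # 70B on H200 (141 GB)
print(min_cards(70, 288))    # 70B on B300 Ultra (288 GB)
print(min_cards(405, 288))   # 405B on B300 Ultra (288 GB)
```

<p>With these assumptions the three calls give 2, 1, and 4 cards, in line with the single-card B300 Ultra entry for 70B and the B300 Ultra × 4 entry for 405B. Real deployments also depend on batch size and context length, which this estimate ignores.</p>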
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="预算速查">Budget Quick Reference<a href="https://mirrorfrog.com/en/blog/2026-h2-top-ai-chip-selection-guide#%E9%A2%84%E7%AE%97%E9%80%9F%E6%9F%A5" class="hash-link" aria-label="Direct link to Budget Quick Reference" title="Direct link to Budget Quick Reference" translate="no">​</a></h2>
<table><thead><tr><th>Monthly budget</th><th>Recommended training setup</th><th>Recommended inference setup</th></tr></thead><tbody><tr><td><strong>&lt; $5K</strong></td><td>RTX 4090 / cluster</td><td>L4 / T4</td></tr><tr><td><strong>$5K-20K</strong></td><td>8× A100 80GB</td><td>L40S / single H100</td></tr><tr><td><strong>$20K-100K</strong></td><td>8× H100 / MI300X</td><td>H200 / B200</td></tr><tr><td><strong>$100K-500K</strong></td><td>8× B200 / NVL72</td><td>B300 Ultra / TPU 8i</td></tr><tr><td><strong>$500K-5M</strong></td><td>Rubin NVL72 / Helios</td><td>Rubin NVL72 / Helios</td></tr><tr><td><strong>$5M-50M</strong></td><td>Rubin NVL576 (8+)</td><td>Groq 3 LPX rack</td></tr><tr><td><strong>$50M+</strong></td><td>Multiple data centers</td><td>Hybrid</td></tr></tbody></table>
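<p>The budget table above is a range lookup, which can be expressed with a sorted list of tier floors and a binary search. A minimal sketch; the function name is hypothetical, and a budget exactly on a boundary (for example $5K) is assumed to fall into the higher tier, since the table does not say.</p>

```python
from bisect import bisect_right

# Lower bound of each monthly-budget tier in USD, and the
# (training, inference) recommendation from the table.
TIER_FLOORS = [0, 5_000, 20_000, 100_000, 500_000, 5_000_000, 50_000_000]
TIERS = [
    ("RTX 4090 / cluster", "L4 / T4"),
    ("8x A100 80GB", "L40S / single H100"),
    ("8x H100 / MI300X", "H200 / B200"),
    ("8x B200 / NVL72", "B300 Ultra / TPU 8i"),
    ("Rubin NVL72 / Helios", "Rubin NVL72 / Helios"),
    ("Rubin NVL576 (8+)", "Groq 3 LPX rack"),
    ("Multiple data centers", "Hybrid"),
]


def recommend(monthly_budget_usd: float) -> tuple[str, str]:
    """Return (training, inference) recommendations for a monthly budget."""
    if monthly_budget_usd < 0:
        raise ValueError("budget must be non-negative")
    tier_index = bisect_right(TIER_FLOORS, monthly_budget_usd) - 1
    return TIERS[tier_index]


print(recommend(30_000))
```

<p>A $30K monthly budget resolves to the $20K-100K row: 8× H100 / MI300X for training and H200 / B200 for inference.</p>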
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="地区速查">Region Quick Reference<a href="https://mirrorfrog.com/en/blog/2026-h2-top-ai-chip-selection-guide#%E5%9C%B0%E5%8C%BA%E9%80%9F%E6%9F%A5" class="hash-link" aria-label="Direct link to Region Quick Reference" title="Direct link to Region Quick Reference" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="中国市场必须国产">China Market (Domestic Chips Required)<a href="https://mirrorfrog.com/en/blog/2026-h2-top-ai-chip-selection-guide#%E4%B8%AD%E5%9B%BD%E5%B8%82%E5%9C%BA%E5%BF%85%E9%A1%BB%E5%9B%BD%E4%BA%A7" class="hash-link" aria-label="Direct link to China Market (Domestic Chips Required)" title="Direct link to China Market (Domestic Chips Required)" translate="no">​</a></h3>
<table><thead><tr><th>Scenario</th><th>Recommendation</th><th>Rationale</th></tr></thead><tbody><tr><td>Government / telecom</td><td><strong>Huawei Ascend 920</strong></td><td>Strongest domestic chip</td></tr><tr><td>Internet-company LLMs</td><td><strong>Huawei Ascend 920 + CloudMatrix 384 Ultra</strong></td><td>System-level solution</td></tr><tr><td>Edge AI</td><td><strong>Huawei Ascend 310</strong></td><td>Domestic</td></tr><tr><td>National-scale AI</td><td><strong>Huawei CloudMatrix 384 Ultra</strong></td><td>345 PFLOPS per system</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="北美--欧洲自由选择">North America / Europe (Free Choice)<a href="https://mirrorfrog.com/en/blog/2026-h2-top-ai-chip-selection-guide#%E5%8C%97%E7%BE%8E--%E6%AC%A7%E6%B4%B2%E8%87%AA%E7%94%B1%E9%80%89%E6%8B%A9" class="hash-link" aria-label="Direct link to North America / Europe (Free Choice)" title="Direct link to North America / Europe (Free Choice)" translate="no">​</a></h3>
<table><thead><tr><th>Priority</th><th>Vendor</th><th>Rationale</th></tr></thead><tbody><tr><td>1</td><td><strong>NVIDIA</strong></td><td>Mature ecosystem, highest performance</td></tr><tr><td>2</td><td>AMD</td><td>Value, open ecosystem</td></tr><tr><td>3</td><td>AWS</td><td>AWS cloud only</td></tr><tr><td>4</td><td>Google</td><td>Google Cloud only</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="aws-cloud仅-aws-生态">AWS Cloud (AWS Ecosystem Only)<a href="https://mirrorfrog.com/en/blog/2026-h2-top-ai-chip-selection-guide#aws-cloud%E4%BB%85-aws-%E7%94%9F%E6%80%81" class="hash-link" aria-label="Direct link to AWS Cloud (AWS Ecosystem Only)" title="Direct link to AWS Cloud (AWS Ecosystem Only)" translate="no">​</a></h3>
<table><thead><tr><th>Scenario</th><th>Recommendation</th></tr></thead><tbody><tr><td>Training</td><td><strong>Trainium 3 UltraServer</strong> (3nm, 4.4× the compute of the Trn2 UltraServer)</td></tr><tr><td>Inference</td><td><strong>Inferentia 2</strong> (low cost)</td></tr><tr><td>General purpose</td><td><strong>NVIDIA H100</strong> (p5.48xlarge)</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="google-cloud仅-google-生态">Google Cloud (Google Ecosystem Only)<a href="https://mirrorfrog.com/en/blog/2026-h2-top-ai-chip-selection-guide#google-cloud%E4%BB%85-google-%E7%94%9F%E6%80%81" class="hash-link" aria-label="Direct link to Google Cloud (Google Ecosystem Only)" title="Direct link to Google Cloud (Google Ecosystem Only)" translate="no">​</a></h3>
<table><thead><tr><th>Scenario</th><th>Recommendation</th></tr></thead><tbody><tr><td>Training</td><td><strong>TPU 8t pod</strong> (9,216 chips)</td></tr><tr><td>Inference</td><td><strong>TPU 8i</strong> (288 GB HBM)</td></tr><tr><td>General purpose</td><td><strong>NVIDIA H100 / A100</strong></td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="延迟要求速查">Latency Requirement Quick Reference<a href="https://mirrorfrog.com/en/blog/2026-h2-top-ai-chip-selection-guide#%E5%BB%B6%E8%BF%9F%E8%A6%81%E6%B1%82%E9%80%9F%E6%9F%A5" class="hash-link" aria-label="Direct link to Latency Requirement Quick Reference" title="Direct link to Latency Requirement Quick Reference" translate="no">​</a></h2>
<table><thead><tr><th>Latency requirement</th><th>Training</th><th>Inference</th></tr></thead><tbody><tr><td><strong>&gt; 1s</strong></td><td>Any option</td><td>Any option</td></tr><tr><td><strong>100ms-1s</strong></td><td>Any option</td><td>NVIDIA H200 / TPU 8i</td></tr><tr><td><strong>50-100ms</strong></td><td>—</td><td>TPU 8i / H200 NVL</td></tr><tr><td><strong>20-50ms</strong></td><td>—</td><td><strong>Groq 3 LPX</strong></td></tr><tr><td><strong>&lt; 20ms</strong></td><td>—</td><td><strong>Groq 3 LPX rack</strong></td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="2026-h2-选型速查表">2026 H2 Selection Cheat Sheet<a href="https://mirrorfrog.com/en/blog/2026-h2-top-ai-chip-selection-guide#2026-h2-%E9%80%89%E5%9E%8B%E9%80%9F%E6%9F%A5%E8%A1%A8" class="hash-link" aria-label="Direct link to 2026 H2 Selection Cheat Sheet" title="Direct link to 2026 H2 Selection Cheat Sheet" translate="no">​</a></h2>
<table><thead><tr><th>Need</th><th>Recommended option</th><th>Alternative</th></tr></thead><tbody><tr><td>Trillion-parameter LLM training</td><td><strong>NVIDIA Rubin NVL72</strong></td><td>AMD Helios</td></tr><tr><td>700B LLM training</td><td><strong>AMD Helios (open)</strong> or NVIDIA Rubin NVL72</td><td>Trainium 3</td></tr><tr><td>70B LLM inference (single card)</td><td><strong>NVIDIA B300 Ultra</strong></td><td>TPU 8i / MI400</td></tr><tr><td>70B LLM training</td><td><strong>NVIDIA H200 / B200</strong></td><td>AMD MI300X / MI325X</td></tr><tr><td>7B-13B LLM training</td><td><strong>NVIDIA A100 / H100</strong></td><td>AMD MI300X / Gaudi 3</td></tr><tr><td>Local 7B LLM</td><td><strong>NVIDIA RTX 4090 / 5090</strong></td><td>Apple M3 Ultra</td></tr><tr><td><strong>Ultra-low-latency LLM inference</strong></td><td><strong>Groq 3 LPX</strong></td><td>TPU 8i</td></tr><tr><td><strong>Agentic AI</strong></td><td><strong>Groq 3 LPX rack</strong></td><td>None (only option)</td></tr><tr><td>China market</td><td><strong>Huawei Ascend 920</strong></td><td>Ascend 910C</td></tr><tr><td>AWS cloud</td><td><strong>Trainium 3</strong></td><td>NVIDIA H100</td></tr><tr><td>Google Cloud</td><td><strong>TPU 8t (training) + 8i (inference)</strong></td><td>NVIDIA H100</td></tr><tr><td>Robotics / physical AI</td><td><strong>Jetson AGX Thor T5000</strong></td><td>Jetson Orin</td></tr><tr><td>Industrial edge</td><td><strong>Jetson AGX Orin 64GB</strong></td><td>Hailo-15</td></tr><tr><td>Value deep learning</td><td><strong>AMD MI300X</strong></td><td>Intel Gaudi 3</td></tr><tr><td>Staying in the Intel ecosystem</td><td><strong>Intel Jaguar Shores</strong> (2027-2028)</td><td>Gaudi 3</td></tr><tr><td>Ultra-low-latency AI</td><td><strong>Groq 3 LPX</strong> (256 LPU)</td><td>None (only option)</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="详细产品页索引">Detailed Product Page Index<a href="https://mirrorfrog.com/en/blog/2026-h2-top-ai-chip-selection-guide#%E8%AF%A6%E7%BB%86%E4%BA%A7%E5%93%81%E9%A1%B5%E7%B4%A2%E5%BC%95" class="hash-link" aria-label="Direct link to Detailed Product Page Index" title="Direct link to Detailed Product Page Index" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="训练-gpu">Training GPUs<a href="https://mirrorfrog.com/en/blog/2026-h2-top-ai-chip-selection-guide#%E8%AE%AD%E7%BB%83-gpu" class="hash-link" aria-label="Direct link to Training GPUs" title="Direct link to Training GPUs" translate="no">​</a></h3>
<ul>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/nvidia/rubin-r200">NVIDIA Rubin R200</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/nvidia/b300-ultra">NVIDIA B300 Ultra</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/nvidia/h200">NVIDIA H200</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/amd/mi400">AMD MI400</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/amd/mi300x">AMD MI300X</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/huawei/ascend-920">Huawei Ascend 920</a></li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="训练-asic">Training ASICs<a href="https://mirrorfrog.com/en/blog/2026-h2-top-ai-chip-selection-guide#%E8%AE%AD%E7%BB%83-asic" class="hash-link" aria-label="Direct link to Training ASICs" title="Direct link to Training ASICs" translate="no">​</a></h3>
<ul>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/google/tpu-8t">Google TPU 8t</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/google/tpu-ironwood">Google TPU v7 Ironwood</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/aws/trainium-3">AWS Trainium 3</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/intel/gaudi-3">Intel Gaudi 3</a></li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="推理-gpu">Inference GPUs<a href="https://mirrorfrog.com/en/blog/2026-h2-top-ai-chip-selection-guide#%E6%8E%A8%E7%90%86-gpu" class="hash-link" aria-label="Direct link to Inference GPUs" title="Direct link to Inference GPUs" translate="no">​</a></h3>
<ul>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/nvidia/l40s">NVIDIA L40S</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/nvidia/l4">NVIDIA L4</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/amd/mi325x">AMD MI325X</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/huawei/ascend-910c">Huawei Ascend 910C</a></li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="推理-asic">Inference ASICs<a href="https://mirrorfrog.com/en/blog/2026-h2-top-ai-chip-selection-guide#%E6%8E%A8%E7%90%86-asic" class="hash-link" aria-label="Direct link to Inference ASICs" title="Direct link to Inference ASICs" translate="no">​</a></h3>
<ul>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/google/tpu-8i">Google TPU 8i</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/nvidia/groq-3-lpx">NVIDIA Groq 3 LPX</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/aws/inferentia-2">AWS Inferentia 2</a></li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="晶圆级">Wafer-Scale<a href="https://mirrorfrog.com/en/blog/2026-h2-top-ai-chip-selection-guide#%E6%99%B6%E5%9C%86%E7%BA%A7" class="hash-link" aria-label="Direct link to Wafer-Scale" title="Direct link to Wafer-Scale" translate="no">​</a></h3>
<ul>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/cerebras/wse-3">Cerebras WSE-3</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/cerebras/wse-4">Cerebras WSE-4</a></li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="其他">Other<a href="https://mirrorfrog.com/en/blog/2026-h2-top-ai-chip-selection-guide#%E5%85%B6%E4%BB%96" class="hash-link" aria-label="Direct link to Other" title="Direct link to Other" translate="no">​</a></h3>
<ul>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/others/apple-m3-ultra">Apple M3 Ultra (192GB UMA)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/others/groq-lpu">Groq LPU v1</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/nvidia/jetson-thor">Jetson AGX Thor T5000</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="总结">Summary<a href="https://mirrorfrog.com/en/blog/2026-h2-top-ai-chip-selection-guide#%E6%80%BB%E7%BB%93" class="hash-link" aria-label="Direct link to Summary" title="Direct link to Summary" translate="no">​</a></h2>
<p>Core selection principles for 2026 H2:</p>
<ol>
<li class=""><strong>Training and inference on the same chip</strong>? In most scenarios, NVIDIA B300 Ultra or H200 covers both.</li>
<li class=""><strong>Ultra-low-latency inference</strong>? Choose <strong>Groq 3 LPX</strong>; there is no substitute.</li>
<li class=""><strong>AWS cloud</strong>? Choose <strong>Trainium 3</strong> for 2-3× the performance per dollar.</li>
<li class=""><strong>Google Cloud</strong>? Choose <strong>TPU 8t (training) + TPU 8i (inference)</strong>.</li>
<li class=""><strong>China market</strong>? <strong>Huawei Ascend 920</strong> + <strong>CloudMatrix 384 Ultra</strong>.</li>
<li class=""><strong>Open ecosystem</strong>? <strong>AMD Helios</strong> (UALoE open interconnect).</li>
<li class=""><strong>Budget-sensitive</strong>? <strong>AMD MI300X</strong> or <strong>Intel Gaudi 3</strong>.</li>
<li class=""><strong>Local LLM</strong>? <strong>Apple M3 Ultra</strong> (192GB UMA).</li>
</ol>
<p><strong>There is no best chip, only the best fit</strong>. Weigh your model size, latency requirements, budget, and region against the selection tree and quick-reference tables in this article.</p>]]></content>
        <author>
            <name>AI Compute Cards Wiki Editorial</name>
            <uri>https://github.com/anomalyco/mirrorfrog</uri>
        </author>
        <category label="Selection Guide" term="Selection Guide"/>
        <category label="Benchmarks" term="Benchmarks"/>
        <category label="Cloud Pricing" term="Cloud Pricing"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[The AI Cluster Power Crisis: 1 MW Racks, Nuclear Plants, SMRs, and Green AI]]></title>
        <id>https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai</id>
        <link href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai"/>
        <updated>2026-05-30T00:00:00.000Z</updated>
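        <summary type="html"><![CDATA[AI data center power demand surges in 2026: 1 GW single clusters, nuclear plant restarts, small modular reactors (SMRs), geothermal, solar, and storage. The race between AI compute growth and electricity supply.]]></summary>
        <content type="html"><![CDATA[<p><strong>In 2026, AI compute growth has hit a hard constraint</strong>: <strong>electricity</strong>. With a single NVIDIA Rubin NVL576 rack drawing <strong>1 MW</strong>, the xAI Colossus cluster at <strong>200 MW</strong>, and OpenAI's planned Stargate campus at <strong>5 GW</strong>, <strong>power supply is becoming the biggest bottleneck for AI development</strong>. This article examines the "power crisis" and the responses to it.</p>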
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="电力需求指数级增长">Power Demand: Exponential Growth<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#%E7%94%B5%E5%8A%9B%E9%9C%80%E6%B1%82%E6%8C%87%E6%95%B0%E7%BA%A7%E5%A2%9E%E9%95%BF" class="hash-link" aria-label="Direct link to Power Demand: Exponential Growth" title="Direct link to Power Demand: Exponential Growth" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="单机柜功耗演进">Per-Rack Power Over Time<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#%E5%8D%95%E6%9C%BA%E6%9F%9C%E5%8A%9F%E8%80%97%E6%BC%94%E8%BF%9B" class="hash-link" aria-label="Direct link to Per-Rack Power Over Time" title="Direct link to Per-Rack Power Over Time" translate="no">​</a></h3>
<table><thead><tr><th>Year</th><th>Representative rack</th><th>Power per rack</th><th>Cluster size (racks)</th><th>Total power</th></tr></thead><tbody><tr><td>2020</td><td>DGX A100 (8 GPU)</td><td>6.5 kW</td><td>100-1,000</td><td>0.7-7 MW</td></tr><tr><td>2023</td><td>DGX H100 (8 GPU)</td><td>11 kW</td><td>1,000-10,000</td><td>11-110 MW</td></tr><tr><td>2024</td><td>GB200 NVL72</td><td>120 kW</td><td>10,000</td><td>1.2 GW</td></tr><tr><td><strong>2026</strong></td><td><strong>Rubin NVL576</strong></td><td><strong>1 MW</strong></td><td>10,000-100,000</td><td><strong>10-100 GW</strong></td></tr><tr><td>2028</td><td>Rubin Ultra NVL576</td><td>1.5 MW</td><td>100,000</td><td>150 GW</td></tr></tbody></table>
<blockquote>
<p><strong>Per-rack power grew about 150× in six years</strong> (6.5 kW → 1 MW). At cluster scale (10-100 GW), that equals the output of roughly 10 to 100 large <strong>nuclear reactors</strong> of about 1 GW each.</p>
</blockquote>
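<p>The totals in the table above are the per-rack figure multiplied by the rack count. A short sketch that recomputes them from the table's own numbers and derives the 2020-to-2026 growth factor; it counts IT load only, which is an assumption, since the table does not say whether cooling is included.</p>

```python
# (year, system, kW per rack, racks low, racks high) from the table above.
ROWS = [
    (2020, "DGX A100", 6.5, 100, 1_000),
    (2023, "DGX H100", 11, 1_000, 10_000),
    (2024, "GB200 NVL72", 120, 10_000, 10_000),
    (2026, "Rubin NVL576", 1_000, 10_000, 100_000),
    (2028, "Rubin Ultra NVL576", 1_500, 100_000, 100_000),
]


def cluster_power_mw(rack_kw: float, racks: int) -> float:
    """Total IT power in MW, ignoring cooling and distribution losses."""
    return rack_kw * racks / 1_000


for year, system, rack_kw, low, high in ROWS:
    print(year, system, cluster_power_mw(rack_kw, low), cluster_power_mw(rack_kw, high))

rack_kw_2020 = ROWS[0][2]
rack_kw_2026 = ROWS[3][2]
growth = rack_kw_2026 / rack_kw_2020
print(f"rack power growth 2020 to 2026: {growth:.0f}x")
```

<p>The growth factor works out to about 154×, and the 2026 row gives 10,000 to 100,000 MW, that is 10-100 GW.</p>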
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="全球-ai-数据中心电力需求iea-预测">Global AI Data Center Electricity Demand (IEA Forecast)<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#%E5%85%A8%E7%90%83-ai-%E6%95%B0%E6%8D%AE%E4%B8%AD%E5%BF%83%E7%94%B5%E5%8A%9B%E9%9C%80%E6%B1%82iea-%E9%A2%84%E6%B5%8B" class="hash-link" aria-label="Direct link to Global AI Data Center Electricity Demand (IEA Forecast)" title="Direct link to Global AI Data Center Electricity Demand (IEA Forecast)" translate="no">​</a></h3>
<table><thead><tr><th>Year</th><th>AI data center demand (TWh)</th><th>Share of global electricity</th><th>Change vs. previous row</th></tr></thead><tbody><tr><td>2020</td><td>50 TWh</td><td>0.2%</td><td>—</td></tr><tr><td>2023</td><td>200 TWh</td><td>0.8%</td><td>+300%</td></tr><tr><td>2025</td><td>460 TWh</td><td>1.7%</td><td>+130%</td></tr><tr><td><strong>2026</strong></td><td><strong>800 TWh</strong></td><td><strong>2.8%</strong></td><td>+75%</td></tr><tr><td>2028</td><td>1,500 TWh</td><td>5.0%</td><td>+90%</td></tr><tr><td>2030</td><td>3,000 TWh</td><td>9.5%</td><td>+100%</td></tr></tbody></table>
<blockquote>
<p><strong>By 2030, AI data centers will account for 9.5% of global electricity</strong> (vs. 0.2% in 2020), making them <strong>a core driver of the global power transition</strong>.</p>
</blockquote>
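<p>Because the rows in the demand table are not consecutive years, the change between rows and the implied annual growth rate are different numbers. The sketch below recomputes both from the TWh column; the function name is illustrative.</p>

```python
# TWh figures from the demand table above, keyed by year.
DEMAND_TWH = {2020: 50, 2023: 200, 2025: 460, 2026: 800, 2028: 1500, 2030: 3000}


def changes(series: dict[int, float]) -> dict[int, tuple[int, int]]:
    """Map each year to (total % change vs previous row, annualized % growth)."""
    years = sorted(series)
    result = {}
    for prev, cur in zip(years, years[1:]):
        ratio = series[cur] / series[prev]
        total_pct = (ratio - 1) * 100
        annual_pct = (ratio ** (1 / (cur - prev)) - 1) * 100
        result[cur] = (round(total_pct), round(annual_pct))
    return result


for year, (total_pct, annual_pct) in changes(DEMAND_TWH).items():
    print(f"{year}: +{total_pct}% vs previous row, about +{annual_pct}% per year")
```

<p>From 2020 to 2023, for example, demand quadruples (+300% in total), which corresponds to roughly +59% per year compounded.</p>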
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="主要-ai-公司电力消耗">Electricity Consumption of Major AI Companies<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#%E4%B8%BB%E8%A6%81-ai-%E5%85%AC%E5%8F%B8%E7%94%B5%E5%8A%9B%E6%B6%88%E8%80%97" class="hash-link" aria-label="Direct link to Electricity Consumption of Major AI Companies" title="Direct link to Electricity Consumption of Major AI Companies" translate="no">​</a></h3>
<table><thead><tr><th>Company</th><th>2024 electricity</th><th>2026 (E)</th><th>2028 (E)</th></tr></thead><tbody><tr><td><strong>Microsoft</strong> (OpenAI)</td><td>5 TWh</td><td>15 TWh</td><td>40 TWh</td></tr><tr><td><strong>Google</strong> (Gemini)</td><td>4 TWh</td><td>12 TWh</td><td>35 TWh</td></tr><tr><td><strong>Meta</strong> (Llama)</td><td>3 TWh</td><td>8 TWh</td><td>25 TWh</td></tr><tr><td><strong>Amazon</strong> (AWS + Anthropic)</td><td>6 TWh</td><td>20 TWh</td><td>50 TWh</td></tr><tr><td><strong>xAI</strong> (Grok)</td><td>1 TWh</td><td>8 TWh</td><td>25 TWh</td></tr><tr><td><strong>Oracle</strong> (OCI)</td><td>0.5 TWh</td><td>3 TWh</td><td>10 TWh</td></tr><tr><td><strong>Total</strong></td><td>~20 TWh</td><td><strong>~66 TWh</strong></td><td><strong>~185 TWh</strong></td></tr></tbody></table>
<blockquote>
<p><strong>OpenAI alone plans to need 40 TWh per year by 2028</strong>, more than Denmark's entire annual electricity consumption (~35 TWh).</p>
</blockquote>
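<p>The totals row of the company table can be checked by summing each column. A minimal sketch using only the per-company figures from the table:</p>

```python
# (2024, 2026 estimate, 2028 estimate) in TWh, from the table above.
USAGE_TWH = {
    "Microsoft": (5, 15, 40),
    "Google": (4, 12, 35),
    "Meta": (3, 8, 25),
    "Amazon": (6, 20, 50),
    "xAI": (1, 8, 25),
    "Oracle": (0.5, 3, 10),
}

# zip(*rows) transposes the rows into one tuple per year column.
totals = tuple(sum(column) for column in zip(*USAGE_TWH.values()))
print(dict(zip(("2024", "2026 (E)", "2028 (E)"), totals)))
```

<p>The columns sum to 19.5, 66, and 185 TWh, so combined consumption across these six companies grows a little over ninefold between 2024 and 2028.</p>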
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="三大电力危机">Three Power Crises<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#%E4%B8%89%E5%A4%A7%E7%94%B5%E5%8A%9B%E5%8D%B1%E6%9C%BA" class="hash-link" aria-label="Direct link to Three Power Crises" title="Direct link to Three Power Crises" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="危机-1数据中心电力供应不足">Crisis 1: Insufficient Power Supply for Data Centers<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#%E5%8D%B1%E6%9C%BA-1%E6%95%B0%E6%8D%AE%E4%B8%AD%E5%BF%83%E7%94%B5%E5%8A%9B%E4%BE%9B%E5%BA%94%E4%B8%8D%E8%B6%B3" class="hash-link" aria-label="Direct link to Crisis 1: Insufficient Power Supply for Data Centers" title="Direct link to Crisis 1: Insufficient Power Supply for Data Centers" translate="no">​</a></h3>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="美国情况">United States<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#%E7%BE%8E%E5%9B%BD%E6%83%85%E5%86%B5" class="hash-link" aria-label="Direct link to United States" title="Direct link to United States" translate="no">​</a></h4>
<ul>
<li class=""><strong>Northern Virginia, 2025</strong> (the world's largest data center cluster): <strong>power connection requests wait 3-5 years</strong></li>
<li class=""><strong>Round Rock, Texas</strong> (Austin metro area, where Oracle is headquartered): <strong>grid capacity is full</strong></li>
<li class=""><strong>PJM grid</strong> (the largest regional grid in the US): <strong>5-10 GW shortfall over 2026-2030</strong></li>
</ul>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="中国情况">China<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#%E4%B8%AD%E5%9B%BD%E6%83%85%E5%86%B5" class="hash-link" aria-label="Direct link to China" title="Direct link to China" translate="no">​</a></h4>
<ul>
<li class=""><strong>Inner Mongolia, Guizhou</strong> (government-subsidized data centers): <strong>partial power rationing</strong></li>
<li class=""><strong>Beijing, Shanghai</strong> (tier-1 cities): <strong>new builds approved only with PUE below 1.4</strong></li>
<li class=""><strong>Data center power quota</strong>: <strong>60% already used by the end of 2025</strong></li>
</ul>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="欧洲情况">Europe<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#%E6%AC%A7%E6%B4%B2%E6%83%85%E5%86%B5" class="hash-link" aria-label="Direct link to Europe" title="Direct link to Europe" translate="no">​</a></h4>
<ul>
<li class=""><strong>Amsterdam, Netherlands</strong> (Microsoft investment zone): <strong>new data center approvals paused</strong></li>
<li class=""><strong>Dublin, Ireland</strong> (AWS European headquarters): <strong>no new approvals before 2030</strong></li>
<li class=""><strong>Nordics</strong> (Norway, Sweden, Finland): <strong>abundant renewables but limited capacity</strong></li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="危机-2电力成本飙升">Crisis 2: Soaring electricity costs<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#%E5%8D%B1%E6%9C%BA-2%E7%94%B5%E5%8A%9B%E6%88%90%E6%9C%AC%E9%A3%99%E5%8D%87" class="hash-link" aria-label="Direct link to Crisis 2: Soaring electricity costs" title="Direct link to Crisis 2: Soaring electricity costs" translate="no">​</a></h3>
<table><thead><tr><th>Region</th><th>2020 industrial price</th><th>2025 industrial price</th><th>Increase</th></tr></thead><tbody><tr><td>US (Virginia)</td><td>$0.05/kWh</td><td>$0.08/kWh</td><td>+60%</td></tr><tr><td>Germany</td><td>$0.18/kWh</td><td>$0.35/kWh</td><td>+94%</td></tr><tr><td>UK</td><td>$0.20/kWh</td><td>$0.40/kWh</td><td>+100%</td></tr><tr><td>Japan</td><td>$0.18/kWh</td><td>$0.30/kWh</td><td>+67%</td></tr><tr><td>China (West)</td><td>$0.04/kWh</td><td>$0.06/kWh</td><td>+50%</td></tr></tbody></table>
<blockquote>
<p><strong>European industrial electricity prices roughly doubled between 2020 and 2025</strong>, which is one reason AI companies are moving capacity to the US and the Middle East.</p>
</blockquote>
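<p>The "Increase" column can be reproduced from the two price columns. The snippet below is a minimal sketch that recomputes it; the dictionary simply restates the table values, and the names <code>PRICES</code>, <code>pct_increase</code> and <code>increases</code> are illustrative, not part of any tool used by this site.</p>

```python
# Industrial electricity prices in USD/kWh, restated from the table above
# as (2020 price, 2025 price).
PRICES = {
    "US (Virginia)": (0.05, 0.08),
    "Germany": (0.18, 0.35),
    "UK": (0.20, 0.40),
    "Japan": (0.18, 0.30),
    "China (West)": (0.04, 0.06),
}


def pct_increase(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def increases(prices: dict) -> dict:
    """Rounded percentage increase per region."""
    return {
        region: round(pct_increase(p2020, p2025))
        for region, (p2020, p2025) in prices.items()
    }


if __name__ == "__main__":
    for region, pct in increases(PRICES).items():
        print(f"{region}: +{pct}%")
```

<p>Rounded to whole percentages, the results match the table: +60%, +94%, +100%, +67% and +50%.</p>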
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="危机-3碳排放与-esg-压力">Crisis 3: Carbon emissions and ESG pressure<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#%E5%8D%B1%E6%9C%BA-3%E7%A2%B3%E6%8E%92%E6%94%BE%E4%B8%8E-esg-%E5%8E%8B%E5%8A%9B" class="hash-link" aria-label="Direct link to Crisis 3: Carbon emissions and ESG pressure" title="Direct link to Crisis 3: Carbon emissions and ESG pressure" translate="no">​</a></h3>
<ul>
<li class=""><strong>Global data center carbon emissions, 2025</strong>: <strong>~150 Mt CO2</strong> (million tonnes)</li>
<li class=""><strong>2028 (E)</strong>: <strong>~400 Mt CO2</strong> (more than half of Germany's annual emissions)</li>
<li class=""><strong>ESG funds</strong>: a growing number require data centers to run on "100% renewable energy"</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="应对方案核能复兴">Responses: The nuclear revival<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#%E5%BA%94%E5%AF%B9%E6%96%B9%E6%A1%88%E6%A0%B8%E8%83%BD%E5%A4%8D%E5%85%B4" class="hash-link" aria-label="Direct link to Responses: The nuclear revival" title="Direct link to Responses: The nuclear revival" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-核电重启">1. Nuclear restarts<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#1-%E6%A0%B8%E7%94%B5%E9%87%8D%E5%90%AF" class="hash-link" aria-label="Direct link to 1. Nuclear restarts" title="Direct link to 1. Nuclear restarts" translate="no">​</a></h3>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="美国-tva--microsoft-合作">Constellation + Microsoft partnership (US)<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#%E7%BE%8E%E5%9B%BD-tva--microsoft-%E5%90%88%E4%BD%9C" class="hash-link" aria-label="Direct link to Constellation + Microsoft partnership (US)" title="Direct link to Constellation + Microsoft partnership (US)" translate="no">​</a></h4>
<ul>
<li class=""><strong>2024-09 agreement</strong>: Microsoft backs the restart of Constellation Energy's <strong>Three Mile Island Unit 1</strong> nuclear plant (835 MW)</li>
<li class=""><strong>2028 target for return to service</strong>: output dedicated to Microsoft data centers</li>
<li class=""><strong>20-year contract</strong>: Microsoft purchases the full 835 MW</li>
</ul>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="amazon--talen-energy-合作">Amazon + Talen Energy partnership<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#amazon--talen-energy-%E5%90%88%E4%BD%9C" class="hash-link" aria-label="Direct link to Amazon + Talen Energy partnership" title="Direct link to Amazon + Talen Energy partnership" translate="no">​</a></h4>
<ul>
<li class=""><strong>2024-03 agreement</strong>: Amazon acquires Talen Energy's data center campus next to the <strong>Susquehanna nuclear plant</strong>, with up to 960 MW of capacity</li>
<li class=""><strong>Up to 960 MW supplied</strong> to AWS</li>
<li class=""><strong>First data center campus with a direct nuclear supply</strong></li>
</ul>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="google--kairos-power小型模块化反应堆-smr">Google + Kairos Power (small modular reactors, SMR)<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#google--kairos-power%E5%B0%8F%E5%9E%8B%E6%A8%A1%E5%9D%97%E5%8C%96%E5%8F%8D%E5%BA%94%E5%A0%86-smr" class="hash-link" aria-label="Direct link to Google + Kairos Power (small modular reactors, SMR)" title="Direct link to Google + Kairos Power (small modular reactors, SMR)" translate="no">​</a></h4>
<ul>
<li class=""><strong>2024-10 agreement</strong>: up to 500 MW of SMR capacity for Google data centers</li>
<li class=""><strong>2030</strong>: first reactor online</li>
<li class=""><strong>2035 total</strong>: up to 500 MW across multiple reactors</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-小型模块化反应堆smr">2. Small modular reactors (SMR)<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#2-%E5%B0%8F%E5%9E%8B%E6%A8%A1%E5%9D%97%E5%8C%96%E5%8F%8D%E5%BA%94%E5%A0%86smr" class="hash-link" aria-label="Direct link to 2. Small modular reactors (SMR)" title="Direct link to 2. Small modular reactors (SMR)" translate="no">​</a></h3>
<p><strong>SMRs (Small Modular Reactors)</strong> are widely seen as the <strong>long-term answer</strong> for AI data centers:</p>
<table><thead><tr><th>Item</th><th>Conventional nuclear plant</th><th>SMR</th></tr></thead><tbody><tr><td>Unit capacity</td><td>1,000-1,600 MW</td><td>50-300 MW</td></tr><tr><td>Construction time</td><td>7-10 years</td><td>3-4 years</td></tr><tr><td>Investment</td><td>$10B+</td><td>$1-2B</td></tr><tr><td>Flexibility</td><td>Low</td><td>High (capacity can be added in modules)</td></tr><tr><td>Safety</td><td>High</td><td>Higher (passive safety)</td></tr><tr><td>Siting</td><td>Strict</td><td>Flexible (factory-built)</td></tr></tbody></table>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="主要-smr-供应商">Major SMR vendors<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#%E4%B8%BB%E8%A6%81-smr-%E4%BE%9B%E5%BA%94%E5%95%86" class="hash-link" aria-label="Direct link to Major SMR vendors" title="Direct link to Major SMR vendors" translate="no">​</a></h4>
<table><thead><tr><th>Vendor</th><th>Model</th><th>Capacity</th><th>Status as of 2026</th><th>Key customers / backers</th></tr></thead><tbody><tr><td><strong>NuScale</strong></td><td>VOYGR</td><td>77 MW modules</td><td><strong>First units 2027</strong></td><td>UAMPS (cancelled), Romania</td></tr><tr><td><strong>Rolls-Royce</strong></td><td>UK SMR</td><td>470 MW</td><td>First units 2030</td><td>UK government</td></tr><tr><td><strong>TerraPower</strong></td><td>Natrium</td><td>345 MW</td><td>First units 2030</td><td>Backed by Bill Gates and Warren Buffett</td></tr><tr><td><strong>X-energy</strong></td><td>Xe-100</td><td>80 MW modules</td><td>First units 2028</td><td>Amazon + Energy Northwest</td></tr><tr><td><strong>Kairos Power</strong></td><td>KP-FHR</td><td>140 MW modules</td><td>First units 2030</td><td>Google + TVA</td></tr><tr><td><strong>Holtec</strong></td><td>SMR-160</td><td>160 MW</td><td>First units 2029</td><td>Several US utilities</td></tr><tr><td><strong>CNNC (China)</strong></td><td>HTR-PM</td><td>2 × 250 MWt modules (~210 MWe)</td><td><strong>Grid-connected 2021, commercial operation 2023</strong></td><td>Shandong, China</td></tr></tbody></table>
<blockquote>
<p><strong>The HTR-PM entered commercial operation in 2023</strong>, making it one of the world's <strong>first commercial SMRs</strong>, roughly 4-5 years ahead of the US designs.</p>
</blockquote>
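<p>Because SMRs are sized per module, a campus is planned in whole modules. The following is a rough sizing sketch, not a vendor method: the module sizes restate the supplier table above, while the 1,000 MW campus load and the 0.9 capacity factor are illustrative assumptions.</p>

```python
import math

# Module sizes in MW, restated from the supplier table above.
MODULE_MW = {
    "NuScale VOYGR": 77,
    "X-energy Xe-100": 80,
    "Kairos KP-FHR": 140,
}


def modules_needed(load_mw: float, module_mw: float,
                   capacity_factor: float = 0.9) -> int:
    """Whole modules required so that average output covers load_mw.

    capacity_factor is an illustrative assumption, not a vendor figure.
    """
    if not 0 < capacity_factor <= 1:
        raise ValueError("capacity_factor must be in (0, 1]")
    return math.ceil(load_mw / (module_mw * capacity_factor))


if __name__ == "__main__":
    campus_mw = 1000  # hypothetical 1 GW campus
    for name, size in MODULE_MW.items():
        print(f"{name}: {modules_needed(campus_mw, size)} modules")
```

<p>Under these assumptions a 1 GW campus needs 15 VOYGR modules, 14 Xe-100 modules or 8 KP-FHR modules; a real plan would also reserve capacity for refueling and maintenance outages.</p>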
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-其他清洁能源">3. Other clean energy<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#3-%E5%85%B6%E4%BB%96%E6%B8%85%E6%B4%81%E8%83%BD%E6%BA%90" class="hash-link" aria-label="Direct link to 3. Other clean energy" title="Direct link to 3. Other clean energy" translate="no">​</a></h3>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="太阳能--储能">Solar + storage<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#%E5%A4%AA%E9%98%B3%E8%83%BD--%E5%82%A8%E8%83%BD" class="hash-link" aria-label="Direct link to Solar + storage" title="Direct link to Solar + storage" translate="no">​</a></h4>
<ul>
<li class=""><strong>Largest US solar project of 2025</strong>: <strong>Sunlight Captive, 1.4 GW</strong> (Texas) + 700 MWh of storage</li>
<li class=""><strong>Microsoft / Google / Amazon</strong> have all signed PPAs (power purchase agreements)</li>
<li class=""><strong>Limitations</strong>: output drops at night and under cloud cover, and storage is expensive</li>
</ul>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="地热">Geothermal<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#%E5%9C%B0%E7%83%AD" class="hash-link" aria-label="Direct link to Geothermal" title="Direct link to Geothermal" translate="no">​</a></h4>
<ul>
<li class=""><strong>Google + Fervo Energy</strong> (2025-11): <strong>150 MW of geothermal</strong> for Nevada data centers</li>
<li class=""><strong>2028 plan</strong>: 500 MW</li>
<li class=""><strong>Advantage</strong>: stable 24/7 supply</li>
</ul>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="风电">Wind<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#%E9%A3%8E%E7%94%B5" class="hash-link" aria-label="Direct link to Wind" title="Direct link to Wind" translate="no">​</a></h4>
<ul>
<li class=""><strong>Amazon + Avangrid</strong> (2025): <strong>1.5 GW of wind power</strong> for Texas data centers</li>
<li class=""><strong>Limitation</strong>: intermittency</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-自然冷却--液冷">4. Free cooling + liquid cooling<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#4-%E8%87%AA%E7%84%B6%E5%86%B7%E5%8D%B4--%E6%B6%B2%E5%86%B7" class="hash-link" aria-label="Direct link to 4. Free cooling + liquid cooling" title="Direct link to 4. Free cooling + liquid cooling" translate="no">​</a></h3>
<p><strong>Lowering data center PUE (Power Usage Effectiveness)</strong> matters just as much:</p>
<table><thead><tr><th>Cooling method</th><th>PUE</th><th>Suitable regions</th></tr></thead><tbody><tr><td><strong>Liquid cooling (DLC)</strong></td><td><strong>1.05-1.15</strong></td><td>Any region</td></tr><tr><td>Indirect evaporative cooling</td><td>1.15-1.25</td><td>Cold regions</td></tr><tr><td>Traditional air cooling</td><td>1.4-1.6</td><td>Any region</td></tr><tr><td>Free cooling (Nordics)</td><td>1.02-1.05</td><td>Cold regions</td></tr></tbody></table>
<blockquote>
<p><strong>Liquid cooling at PUE 1.05-1.15 vs air cooling at 1.4-1.6</strong>: because total facility energy scales with PUE, this cuts facility energy by <strong>roughly 18-34%</strong> for the same IT load. NVIDIA Rubin NVL576 requires liquid cooling.</p>
</blockquote>
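<p>The saving from a lower PUE follows directly from the definition: total facility energy equals IT energy times PUE, so for a fixed IT load the fractional saving is <code>1 - PUE_new / PUE_old</code>. The sketch below applies that to the table's ranges; the function name is illustrative.</p>

```python
def facility_energy_saving(pue_old: float, pue_new: float) -> float:
    """Fractional reduction in total facility energy for a fixed IT load.

    Total facility energy = IT energy * PUE, so the IT term cancels out.
    """
    if pue_old < 1.0 or pue_new < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return 1.0 - pue_new / pue_old


# Best case: worst air-cooled site (1.6) replaced by best liquid-cooled (1.05).
best = facility_energy_saving(1.6, 1.05)
# Worst case: best air-cooled site (1.4) replaced by worst liquid-cooled (1.15).
worst = facility_energy_saving(1.4, 1.15)

if __name__ == "__main__":
    print(f"saving range: {worst:.1%} to {best:.1%}")
```

<p>With the table's figures this simple model gives a saving of roughly 18% to 34%, depending on which ends of the two PUE ranges are compared.</p>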
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="主要-ai-数据中心案例">Major AI data center case studies<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#%E4%B8%BB%E8%A6%81-ai-%E6%95%B0%E6%8D%AE%E4%B8%AD%E5%BF%83%E6%A1%88%E4%BE%8B" class="hash-link" aria-label="Direct link to Major AI data center case studies" title="Direct link to Major AI data center case studies" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-xai-colossusmemphis-tennessee">1. xAI Colossus (Memphis, Tennessee)<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#1-xai-colossusmemphis-tennessee" class="hash-link" aria-label="Direct link to 1. xAI Colossus (Memphis, Tennessee)" title="Direct link to 1. xAI Colossus (Memphis, Tennessee)" translate="no">​</a></h3>
<ul>
<li class=""><strong>Online 2024-09</strong>: <strong>100,000 H100 GPUs</strong></li>
<li class=""><strong>Cluster power draw</strong>: <strong>~200 MW</strong></li>
<li class=""><strong>Special power arrangement</strong>: <strong>12 mobile natural gas turbines</strong> (temporary)</li>
<li class=""><strong>Controversy</strong>: environmental protests, air pollution</li>
</ul>
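<p>Dividing cluster power by GPU count shows how much of the budget goes to everything other than the accelerator itself. This is a back-of-the-envelope sketch using the Colossus figures above; the 700 W H100 board power is the commonly quoted TDP, and treating the remainder as "overhead" (CPUs, networking, storage, cooling, conversion losses) is a simplifying assumption.</p>

```python
H100_TDP_W = 700  # commonly quoted H100 SXM board power


def facility_watts_per_accelerator(cluster_mw: float, accelerators: int) -> float:
    """All-in facility power attributed to each accelerator, in watts."""
    if accelerators <= 0:
        raise ValueError("accelerators must be positive")
    return cluster_mw * 1_000_000 / accelerators


per_gpu_w = facility_watts_per_accelerator(200, 100_000)
overhead_factor = per_gpu_w / H100_TDP_W

if __name__ == "__main__":
    print(f"{per_gpu_w:.0f} W per GPU, {overhead_factor:.2f}x the GPU TDP")
```

<p>At ~200 MW for 100,000 GPUs, each GPU accounts for about 2 kW of facility power, close to three times its own TDP.</p>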
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-microsoft--openai-stargate">2. Microsoft + OpenAI Stargate<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#2-microsoft--openai-stargate" class="hash-link" aria-label="Direct link to 2. Microsoft + OpenAI Stargate" title="Direct link to 2. Microsoft + OpenAI Stargate" translate="no">​</a></h3>
<ul>
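<li class=""><strong>Announced 2025-01</strong>: <strong>$100B initial investment</strong>, with up to $500B planned over four years</li>
<li class=""><strong>First campuses</strong>: <strong>Texas + Arizona</strong></li>
<li class=""><strong>Total plan</strong>: <strong>5 GW</strong> (roughly the output of five large ~1 GW nuclear reactors)</li>
<li class=""><strong>Energy mix</strong>: nuclear + solar + storage</li>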
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-amazon-project-rainier">3. Amazon Project Rainier<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#3-amazon-project-rainier" class="hash-link" aria-label="Direct link to 3. Amazon Project Rainier" title="Direct link to 3. Amazon Project Rainier" translate="no">​</a></h3>
<ul>
<li class=""><strong>Announced 2024-12</strong>: <strong>dedicated to Anthropic</strong></li>
<li class=""><strong>Trainium 2 cluster</strong>: <strong>scaling toward 1,000,000 Trainium 2 chips</strong></li>
<li class=""><strong>Total power</strong>: ~300 MW</li>
<li class=""><strong>Energy</strong>: 100% carbon-free (nuclear + wind)</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-meta-hyperionlouisiana">4. Meta Hyperion (Louisiana)<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#4-meta-hyperionlouisiana" class="hash-link" aria-label="Direct link to 4. Meta Hyperion (Louisiana)" title="Direct link to 4. Meta Hyperion (Louisiana)" translate="no">​</a></h3>
<ul>
<li class=""><strong>Construction 2025-2027</strong>: <strong>2 GW data center campus</strong></li>
<li class=""><strong>Dedicated generation</strong>: <strong>Meta + Entergy</strong> partnership for 1.5 GW of natural gas + wind</li>
<li class=""><strong>Online 2027</strong>: Llama 5 training</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="5-google-数据中心扩张">5. Google data center expansion<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#5-google-%E6%95%B0%E6%8D%AE%E4%B8%AD%E5%BF%83%E6%89%A9%E5%BC%A0" class="hash-link" aria-label="Direct link to 5. Google data center expansion" title="Direct link to 5. Google data center expansion" translate="no">​</a></h3>
<ul>
<li class=""><strong>8 new data centers in 2025-2026</strong></li>
<li class=""><strong>Total plan</strong>: <strong>~3 GW of additional capacity</strong></li>
<li class=""><strong>Energy</strong>: 100% carbon-free on an annual matching basis (achieved by the end of 2025)</li>
<li class=""><strong>Special projects</strong>: 500 MW of SMR + 150 MW of geothermal</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="6-huawei-cloudmatrix-384-ultra">6. Huawei CloudMatrix 384 Ultra<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#6-huawei-cloudmatrix-384-ultra" class="hash-link" aria-label="Direct link to 6. Huawei CloudMatrix 384 Ultra" title="Direct link to 6. Huawei CloudMatrix 384 Ultra" translate="no">​</a></h3>
<ul>
<li class=""><strong>Within China</strong>: clusters in <strong>Guizhou and Inner Mongolia</strong></li>
<li class=""><strong>Power source</strong>: <strong>western hydropower + wind</strong> (green energy)</li>
<li class=""><strong>Power draw</strong>: ~50 MW per cluster</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="绿色-ai-战略">Green AI strategy<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#%E7%BB%BF%E8%89%B2-ai-%E6%88%98%E7%95%A5" class="hash-link" aria-label="Direct link to Green AI strategy" title="Direct link to Green AI strategy" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-能效优化">1. Efficiency optimization<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#1-%E8%83%BD%E6%95%88%E4%BC%98%E5%8C%96" class="hash-link" aria-label="Direct link to 1. Efficiency optimization" title="Direct link to 1. Efficiency optimization" translate="no">​</a></h3>
<ul>
<li class=""><strong>FP4 / FP8 quantization</strong>: up to ~50% lower power than FP16 for the same work, depending on the workload</li>
<li class=""><strong>Sparse compute</strong>: 2:4 structured sparsity skips half of the multiply-accumulate operations, cutting required compute by up to 50%</li>
<li class=""><strong>Liquid cooling</strong>: lowers PUE by roughly 18-34% versus air cooling (1.05-1.15 vs 1.4-1.6)</li>
<li class=""><strong>Model compression</strong>: MoE, distillation, pruning</li>
</ul>
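<p>To make the 2:4 sparsity item concrete: in every group of four consecutive weights, two are kept and two are set to zero, so hardware that supports the pattern can skip half of the multiplications. The snippet below is a minimal pure-Python sketch of magnitude-based 2:4 pruning; it is not NVIDIA's tooling, and the function name is illustrative.</p>

```python
def prune_2_of_4(weights: list[float]) -> list[float]:
    """Zero the two smallest-magnitude values in every group of four."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned: list[float] = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Positions of the two largest-magnitude entries in this group.
        by_magnitude = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)
        keep = set(by_magnitude[:2])
        pruned.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return pruned


if __name__ == "__main__":
    dense = [0.1, -0.9, 0.3, 0.05, 2.0, 0.0, -1.0, 0.5]
    print(prune_2_of_4(dense))
```

<p>In practice the pruned model is usually fine-tuned afterwards to recover accuracy, and the real energy saving depends on how much of the workload actually runs on sparse-capable units.</p>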
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-可再生能源承诺">2. Renewable energy commitments<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#2-%E5%8F%AF%E5%86%8D%E7%94%9F%E8%83%BD%E6%BA%90%E6%89%BF%E8%AF%BA" class="hash-link" aria-label="Direct link to 2. Renewable energy commitments" title="Direct link to 2. Renewable energy commitments" translate="no">​</a></h3>
<table><thead><tr><th>Company</th><th>100% renewable target</th></tr></thead><tbody><tr><td><strong>Google</strong></td><td><strong>100% annual matching achieved as of 2025</strong></td></tr><tr><td><strong>Microsoft</strong></td><td>2030</td></tr><tr><td><strong>Amazon</strong></td><td>2030 (90% reached in 2025)</td></tr><tr><td><strong>Meta</strong></td><td>2030</td></tr><tr><td><strong>Apple</strong></td><td>2030 (90% reached in 2025)</td></tr><tr><td><strong>Huawei</strong></td><td>2030 ("carbon neutrality" target)</td></tr><tr><td><strong>xAI</strong></td><td>No commitment</td></tr><tr><td><strong>Oracle</strong></td><td>2030</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-碳捕获与封存">3. Carbon capture and storage<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#3-%E7%A2%B3%E6%8D%95%E8%8E%B7%E4%B8%8E%E5%B0%81%E5%AD%98" class="hash-link" aria-label="Direct link to 3. Carbon capture and storage" title="Direct link to 3. Carbon capture and storage" translate="no">​</a></h3>
<ul>
<li class=""><strong>Microsoft + Occidental</strong> (2025-09): <strong>$10B investment in carbon capture</strong></li>
<li class=""><strong>Amazon + CarbonCapture Inc.</strong> (2025-11): <strong>100 MW DAC</strong> (direct air capture)</li>
<li class=""><strong>Google + Climeworks</strong> (2025-08): hybrid <strong>DAC + storage</strong> project</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="未来展望">Outlook<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#%E6%9C%AA%E6%9D%A5%E5%B1%95%E6%9C%9B" class="hash-link" aria-label="Direct link to Outlook" title="Direct link to Outlook" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="短期2026-2027">Short term (2026-2027)<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#%E7%9F%AD%E6%9C%9F2026-2027" class="hash-link" aria-label="Direct link to Short term (2026-2027)" title="Direct link to Short term (2026-2027)" translate="no">​</a></h3>
<ul>
<li class=""><strong>AI power shortage worsens</strong>: demand grows 50% while supply grows 15%</li>
<li class=""><strong>Electricity prices keep rising</strong>: US +30%, Europe +20%</li>
<li class=""><strong>Nuclear restarts accelerate</strong>: led by Microsoft, Amazon and Google</li>
<li class=""><strong>SMR investment surges</strong>: $50B+ in global SMR investment in 2026</li>
</ul>
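<p>When demand grows faster than supply, the gap compounds. The sketch below shows how the demand-to-supply ratio evolves if the 50% and 15% figures above are read as annual growth rates; that reading is an assumption, since the list does not state the period, and the starting point is normalized to a balanced market.</p>

```python
def demand_supply_ratio(years: int,
                        demand_growth: float = 0.50,
                        supply_growth: float = 0.15) -> float:
    """Demand divided by supply after `years`, starting from a ratio of 1.0.

    Growth rates are assumed to be annual and constant (illustrative only).
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    return ((1 + demand_growth) / (1 + supply_growth)) ** years


if __name__ == "__main__":
    for year in range(0, 3):
        print(f"year {year}: demand is {demand_supply_ratio(year):.2f}x supply")
```

<p>Under that reading, demand would be about 1.3 times supply after one year and about 1.7 times after two, which is why new generation and efficiency gains both matter.</p>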
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="中期2027-2030">Medium term (2027-2030)<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#%E4%B8%AD%E6%9C%9F2027-2030" class="hash-link" aria-label="Direct link to Medium term (2027-2030)" title="Direct link to Medium term (2027-2030)" translate="no">​</a></h3>
<ul>
<li class=""><strong>HBM memory + nuclear power</strong> become the two main bottlenecks for AI compute</li>
<li class=""><strong>Large-scale SMR deployment</strong>: first commercial units outside China around 2028, reaching 10+ GW by 2030</li>
<li class=""><strong>Carbon-neutral data centers</strong> become the norm</li>
<li class=""><strong>AI compute migrates to nuclear-rich regions</strong>: Texas, Tennessee, Canada</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="长期2030">Long term (2030+)<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#%E9%95%BF%E6%9C%9F2030" class="hash-link" aria-label="Direct link to Long term (2030+)" title="Direct link to Long term (2030+)" translate="no">​</a></h3>
<ul>
<li class=""><strong>Commercial fusion</strong>: Helion / TAE / Commonwealth Fusion, 2030 or later</li>
<li class=""><strong>Space-based solar</strong>: 24/7 supply in theory</li>
<li class=""><strong>Quantum-assisted computing</strong>: could reduce demand for AI compute</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="详细产品页">Detailed product pages<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#%E8%AF%A6%E7%BB%86%E4%BA%A7%E5%93%81%E9%A1%B5" class="hash-link" aria-label="Direct link to Detailed product pages" title="Direct link to Detailed product pages" translate="no">​</a></h2>
<ul>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/nvidia/rubin-r200">NVIDIA Rubin NVL576 (1 MW/rack)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/amd/mi400">AMD Helios rack (80 kW/rack)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/aws/trainium-3">AWS Trn3 UltraServer (100 kW/rack)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/roadmap">Roadmap</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="总结">Summary<a href="https://mirrorfrog.com/en/blog/ai-cluster-power-crisis-1mw-nuclear-smr-green-ai#%E6%80%BB%E7%BB%93" class="hash-link" aria-label="Direct link to Summary" title="Direct link to Summary" translate="no">​</a></h2>
<p>The "next battlefield" for AI compute is <strong>power</strong>:</p>
<ol>
<li class=""><strong>Single-rack power passes 1 MW</strong> (Rubin NVL576), comparable to the output of a nuclear microreactor</li>
<li class=""><strong>Global AI data centers need 800 TWh in 2026</strong>, more than Germany's entire national consumption</li>
<li class=""><strong>Nuclear restarts</strong>: Microsoft / Amazon / Google have each secured roughly 0.5-1 GW of nuclear capacity</li>
<li class=""><strong>Rise of SMRs</strong>: first commercial units outside China around 2028, 50-300 MW per unit</li>
<li class=""><strong>Renewables</strong>: solar / wind / geothermal / hydro + storage</li>
<li class=""><strong>Liquid cooling becomes standard</strong>: PUE 1.05-1.15 vs 1.4-1.6 for air cooling</li>
</ol>
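<p>The points above can be tied together with a quick cost estimate for one 1 MW rack running all year. This is an illustrative sketch: the $0.08/kWh price is the 2025 Virginia figure from the price table earlier in this post, the PUE values 1.10 and 1.50 are midpoints of the quoted liquid- and air-cooling ranges, and constant full load is an assumption.</p>

```python
HOURS_PER_YEAR = 8760


def annual_energy_cost_usd(it_load_mw: float, pue: float,
                           usd_per_kwh: float) -> float:
    """Yearly electricity bill for a constant IT load, with overhead via PUE."""
    facility_kw = it_load_mw * 1000 * pue
    return facility_kw * HOURS_PER_YEAR * usd_per_kwh


# One 1 MW rack at $0.08/kWh, liquid-cooled vs air-cooled (assumed midpoints).
liquid = annual_energy_cost_usd(1.0, 1.10, 0.08)
air = annual_energy_cost_usd(1.0, 1.50, 0.08)

if __name__ == "__main__":
    print(f"liquid-cooled: ${liquid:,.0f} per year")
    print(f"air-cooled:    ${air:,.0f} per year")
    print(f"difference:    ${air - liquid:,.0f} per year")
```

<p>Under these assumptions a single rack costs about $0.77M per year with liquid cooling and about $1.05M with air cooling, a gap of roughly $280K per rack per year.</p>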
<p><strong>AI compute without power is a castle in the air</strong>.</p>]]></content>
        <author>
            <name>AI Compute Cards Wiki Editorial</name>
            <uri>https://github.com/anomalyco/mirrorfrog</uri>
        </author>
        <category label="Industry News" term="Industry News"/>
        <category label="Tech Deep Dive" term="Tech Deep Dive"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[AI chip startup survival report: Tenstorrent / SambaNova / Graphcore in 2026]]></title>
        <id>https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore</id>
        <link href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore"/>
        <updated>2026-05-28T00:00:00.000Z</updated>
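        <summary type="html"><![CDATA[The state of AI chip startups in 2026: Tenstorrent's $700M in funding, SambaNova's second-generation RDU, Graphcore's acquisition by SoftBank, the Cerebras IPO, and Groq's absorption into NVIDIA. Which ones will survive?]]></summary>
        <content type="html"><![CDATA[<p><strong>In 2026 the AI chip market has entered a "winner takes all" phase</strong>. NVIDIA holds a 90%+ share, AMD struggles at around 5%, and Google, AWS, Huawei and Cerebras each hold a niche. Yet <strong>a group of AI chip startups</strong> is still fighting for survival in the gaps. This post looks at where <strong>Tenstorrent, SambaNova, Graphcore, Cambricon, Moore Threads, Biren and Iluvatar</strong> stand in 2026 and where they are heading.</p>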
<!-- -->
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="2026-年-ai-芯片市场格局">The AI chip market in 2026<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#2026-%E5%B9%B4-ai-%E8%8A%AF%E7%89%87%E5%B8%82%E5%9C%BA%E6%A0%BC%E5%B1%80" class="hash-link" aria-label="Direct link to The AI chip market in 2026" title="Direct link to The AI chip market in 2026" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="头部双寡头--5-个挑战者">Top tier: a duopoly + 5 challengers<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#%E5%A4%B4%E9%83%A8%E5%8F%8C%E5%AF%A1%E5%A4%B4--5-%E4%B8%AA%E6%8C%91%E6%88%98%E8%80%85" class="hash-link" aria-label="Direct link to Top tier: a duopoly + 5 challengers" title="Direct link to Top tier: a duopoly + 5 challengers" translate="no">​</a></h3>
<table><thead><tr><th>Vendor</th><th>Market share</th><th>2025 revenue</th><th>Status</th></tr></thead><tbody><tr><td><strong>NVIDIA</strong></td><td><strong>90%+</strong></td><td>~$130B</td><td>Dominant</td></tr><tr><td><strong>AMD</strong></td><td>5%</td><td>~$5B (MI business)</td><td>Second</td></tr><tr><td><strong>Huawei</strong></td><td>1% (60% in China)</td><td>~$3B (Ascend)</td><td>Dominant in China</td></tr><tr><td><strong>Google TPU</strong></td><td>1% (internal)</td><td>N/A</td><td>Internal use</td></tr><tr><td><strong>AWS Trainium</strong></td><td>&lt;1% (internal)</td><td>N/A</td><td>Internal use</td></tr><tr><td><strong>Cerebras</strong></td><td>&lt;1%</td><td>$510M</td><td>IPO pending</td></tr><tr><td><strong>Groq (NVIDIA)</strong></td><td>&lt;1%</td><td>N/A</td><td>Absorbed into NVIDIA</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="创业公司">Startups<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#%E5%88%9B%E4%B8%9A%E5%85%AC%E5%8F%B8" class="hash-link" aria-label="Direct link to Startups" title="Direct link to Startups" translate="no">​</a></h3>
<table><thead><tr><th>Vendor</th><th>Founded</th><th>Total funding</th><th>2025 revenue</th><th>Status</th></tr></thead><tbody><tr><td><strong>Tenstorrent</strong></td><td>2016</td><td><strong>$700M+</strong></td><td>~$30M (estimated)</td><td>Fundraising</td></tr><tr><td><strong>SambaNova</strong></td><td>2017</td><td><strong>$1.1B+</strong></td><td>~$80M (estimated)</td><td>Revenue growing</td></tr><tr><td><strong>Graphcore</strong></td><td>2016</td><td><strong>$700M+</strong></td><td>N/A</td><td><strong>Acquired by SoftBank</strong> (2024)</td></tr><tr><td><strong>Cambricon</strong></td><td>2016</td><td>Listed (A-shares)</td><td>~$80M</td><td>A-share market cap ~$25B</td></tr><tr><td><strong>Moore Threads</strong></td><td>2020</td><td>$500M+</td><td>~$30M</td><td>Preparing for IPO</td></tr><tr><td><strong>Biren</strong></td><td>2019</td><td>$700M+</td><td>~$20M</td><td>Preparing for IPO</td></tr><tr><td><strong>Iluvatar CoreX</strong></td><td>2018</td><td>$400M+</td><td>~$15M</td><td>Listed in Hong Kong</td></tr><tr><td><strong>Lightmatter</strong></td><td>2017</td><td>$300M+</td><td>~$5M</td><td>Silicon photonics computing</td></tr><tr><td><strong>Esperanto</strong></td><td>2014</td><td>$120M</td><td>&lt;$5M</td><td>RISC-V AI</td></tr><tr><td><strong>Mythic</strong></td><td>2012</td><td>$200M+</td><td>&lt;$5M</td><td>Edge AI</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="十大创业公司详解">Ten startups in detail<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#%E5%8D%81%E5%A4%A7%E5%88%9B%E4%B8%9A%E5%85%AC%E5%8F%B8%E8%AF%A6%E8%A7%A3" class="hash-link" aria-label="Direct link to Ten startups in detail" title="Direct link to Ten startups in detail" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-tenstorrentrisc-v-黑马">1. Tenstorrent: The RISC-V dark horse<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#1-tenstorrentrisc-v-%E9%BB%91%E9%A9%AC" class="hash-link" aria-label="Direct link to 1. Tenstorrent: The RISC-V dark horse" title="Direct link to 1. Tenstorrent: The RISC-V dark horse" translate="no">​</a></h3>
<p><strong>Tenstorrent</strong> is led by veteran chip designer <strong>Jim Keller</strong>, whose past work includes AMD Zen, Apple A4/A5 and Tesla FSD:</p>
<table><thead><tr><th>Item</th><th>Details</th></tr></thead><tbody><tr><td><strong>CEO</strong></td><td><strong>Jim Keller</strong> (veteran architect; the company was founded by Ljubisa Bajic)</td></tr><tr><td><strong>Founded</strong></td><td>2016</td></tr><tr><td><strong>Headquarters</strong></td><td>Toronto / Austin / Silicon Valley</td></tr><tr><td><strong>Funding</strong></td><td><strong>$700M+</strong> (valuation above $2B as of 2024-12)</td></tr><tr><td><strong>2025 revenue</strong></td><td>~$30M (estimated)</td></tr><tr><td><strong>Core products</strong></td><td><strong>Wormhole n150/n300</strong>, <strong>Blackhole</strong></td></tr><tr><td><strong>Foundry</strong></td><td>GlobalFoundries 12nm (Wormhole) + TSMC 6nm (Blackhole)</td></tr><tr><td><strong>Software</strong></td><td><strong>Fully open source</strong> (TT-Metalium)</td></tr><tr><td><strong>Customers</strong></td><td><strong>LG</strong>, <strong>Bosch</strong>, <strong>Autodesk</strong>, <strong>RIKEN</strong> (Japan)</td></tr><tr><td><strong>2026 plans</strong></td><td>Next-generation Grendel + strategic partnerships (speculated: OpenAI/AMD)</td></tr></tbody></table>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="tenstorrent-关键产品">Tenstorrent key products<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#tenstorrent-%E5%85%B3%E9%94%AE%E4%BA%A7%E5%93%81" class="hash-link" aria-label="Direct link to Tenstorrent key products" title="Direct link to Tenstorrent key products" translate="no">​</a></h4>
<table><thead><tr><th>Product</th><th>Process</th><th>Compute</th><th>Memory</th><th>Price</th><th>Availability</th></tr></thead><tbody><tr><td><strong>Wormhole n150</strong></td><td>12nm</td><td>80 TOPS (FP8)</td><td>12GB</td><td>~$2K</td><td>2023</td></tr><tr><td><strong>Wormhole n300</strong></td><td>12nm</td><td>160 TOPS (FP8)</td><td>24GB</td><td>~$4K</td><td>2024</td></tr><tr><td><strong>Blackhole p150</strong></td><td>6nm</td><td>320 TOPS (FP8)</td><td>16GB</td><td>~$3K</td><td>2025</td></tr><tr><td><strong>Blackhole p300</strong></td><td>6nm</td><td>800 TOPS (FP8)</td><td>24GB</td><td>~$6K</td><td>2025</td></tr><tr><td><strong>Grendel</strong></td><td>4nm (speculated)</td><td>1.5 POPS (FP8)</td><td>32GB</td><td>TBD</td><td>2026-2027</td></tr></tbody></table>
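<p>One way to read the product table is price per unit of compute. The sketch below divides the approximate list prices by the FP8 figures as given in the table; both inputs are the table's rough numbers, Grendel is left out because its price is TBD, and the names are illustrative.</p>

```python
# (FP8 TOPS, approximate price in USD), restated from the table above.
PRODUCTS = {
    "Wormhole n150": (80, 2000),
    "Wormhole n300": (160, 4000),
    "Blackhole p150": (320, 3000),
    "Blackhole p300": (800, 6000),
}


def usd_per_tops(tops: float, price_usd: float) -> float:
    """Price per TOPS of peak FP8 compute."""
    if tops <= 0:
        raise ValueError("tops must be positive")
    return price_usd / tops


COST = {name: usd_per_tops(t, p) for name, (t, p) in PRODUCTS.items()}

if __name__ == "__main__":
    for name, cost in COST.items():
        print(f"{name}: ${cost:.2f} per TOPS")
```

<p>On these figures the Wormhole cards come out at $25 per TOPS and the Blackhole cards at roughly $7.50 to $9.40, so the newer generation is about three times cheaper per unit of peak compute.</p>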
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="tenstorrent-关键优势">Tenstorrent key strengths<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#tenstorrent-%E5%85%B3%E9%94%AE%E4%BC%98%E5%8A%BF" class="hash-link" aria-label="Direct link to Tenstorrent key strengths" title="Direct link to Tenstorrent key strengths" translate="no">​</a></h4>
<ul>
<li class=""><strong>Jim Keller's personal brand</strong>: one of the industry's top architects</li>
<li class=""><strong>Fully open-source software</strong>: TT-Metalium (13K+ GitHub stars)</li>
<li class=""><strong>RISC-V ecosystem</strong>: close collaboration with SiFive and RISC-V International</li>
<li class=""><strong>Government / academic customers</strong>: RIKEN (Japan) and several US universities</li>
</ul>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="tenstorrent-关键挑战">Tenstorrent key challenges<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#tenstorrent-%E5%85%B3%E9%94%AE%E6%8C%91%E6%88%98" class="hash-link" aria-label="Direct link to Tenstorrent key challenges" title="Direct link to Tenstorrent key challenges" translate="no">​</a></h4>
<ul>
<li class=""><strong>Performance gap</strong>: Blackhole p300 runs at a ~150 W TDP but still trails the 700 W H100 in absolute performance</li>
<li class=""><strong>Thin ecosystem</strong>: PyTorch compatibility is still improving</li>
<li class=""><strong>Low market awareness</strong>: winning enterprise customers is hard next to NVIDIA</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-sambanova企业-rdu-一体机">2. SambaNova: Enterprise RDU appliances<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#2-sambanova%E4%BC%81%E4%B8%9A-rdu-%E4%B8%80%E4%BD%93%E6%9C%BA" class="hash-link" aria-label="Direct link to 2. SambaNova: Enterprise RDU appliances" title="Direct link to 2. SambaNova: Enterprise RDU appliances" translate="no">​</a></h3>
<p><strong>SambaNova</strong> is the leading example of the enterprise AI appliance model:</p>
<table><thead><tr><th>Item</th><th>Details</th></tr></thead><tbody><tr><td><strong>Founders</strong></td><td>Kunle Olukotun (Stanford professor) + two Stanford alumni</td></tr><tr><td><strong>Founded</strong></td><td>2017</td></tr><tr><td><strong>Headquarters</strong></td><td>Palo Alto</td></tr><tr><td><strong>Funding</strong></td><td><strong>$1.1B+</strong> (2021 valuation: $5B)</td></tr><tr><td><strong>2025 revenue</strong></td><td>~$80M (estimated)</td></tr><tr><td><strong>Core product</strong></td><td><strong>SN40L RDU</strong> (Reconfigurable Dataflow Unit)</td></tr><tr><td><strong>Foundry</strong></td><td>TSMC 5nm</td></tr><tr><td><strong>Customers</strong></td><td><strong>US government</strong>, <strong>Accenture</strong>, <strong>Hewlett Packard Enterprise</strong></td></tr><tr><td><strong>2026 plans</strong></td><td><strong>Next-generation SN50</strong> (larger RDU)</td></tr></tbody></table>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="sambanova-sn40l-规格">SambaNova SN40L specifications<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#sambanova-sn40l-%E8%A7%84%E6%A0%BC" class="hash-link" aria-label="Direct link to SambaNova SN40L specifications" title="Direct link to SambaNova SN40L specifications" translate="no">​</a></h4>
<table><thead><tr><th>Item</th><th>Value</th></tr></thead><tbody><tr><td><strong>Architecture</strong></td><td>RDU (Reconfigurable Dataflow Unit)</td></tr><tr><td><strong>Process</strong></td><td>TSMC 5nm</td></tr><tr><td><strong>RDU cores</strong></td><td>1,040 tiles</td></tr><tr><td><strong>HBM capacity</strong></td><td>128 GB HBM3</td></tr><tr><td><strong>HBM bandwidth</strong></td><td>3.2 TB/s</td></tr><tr><td><strong>FP16 compute</strong></td><td>600 TFLOPS</td></tr><tr><td><strong>BF16 compute</strong></td><td>300 TFLOPS</td></tr><tr><td><strong>TDP</strong></td><td>~600 W</td></tr><tr><td><strong>Price</strong></td><td>~$150K / system</td></tr></tbody></table>
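<p>Peak compute and memory bandwidth together determine when a chip is compute-bound rather than memory-bound. The sketch below applies the standard roofline model to the FP16 and HBM bandwidth figures listed in the table; the numbers are the table's, not measured results, and the function names are illustrative.</p>

```python
PEAK_TFLOPS = 600.0       # FP16 peak from the table above
HBM_BANDWIDTH_TBS = 3.2   # HBM bandwidth in TB/s from the table above


def ridge_point(peak_tflops: float, bandwidth_tbs: float) -> float:
    """Arithmetic intensity (FLOP per byte) where compute becomes the limit."""
    return peak_tflops / bandwidth_tbs


def attainable_tflops(intensity: float,
                      peak_tflops: float = PEAK_TFLOPS,
                      bandwidth_tbs: float = HBM_BANDWIDTH_TBS) -> float:
    """Roofline bound: min(peak compute, bandwidth * arithmetic intensity)."""
    return min(peak_tflops, bandwidth_tbs * intensity)


if __name__ == "__main__":
    print(f"ridge point: {ridge_point(PEAK_TFLOPS, HBM_BANDWIDTH_TBS):.1f} FLOP/byte")
    for intensity in (2, 50, 500):
        print(f"{intensity} FLOP/byte -> {attainable_tflops(intensity):.1f} TFLOPS")
```

<p>With these figures the ridge point is about 187.5 FLOP per byte. Low-intensity work such as small-batch LLM decoding sits far below that and is limited by HBM bandwidth rather than by the headline TFLOPS number.</p>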
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="sambanova-商业模式">SambaNova business model<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#sambanova-%E5%95%86%E4%B8%9A%E6%A8%A1%E5%BC%8F" class="hash-link" aria-label="Direct link to SambaNova business model" title="Direct link to SambaNova business model" translate="no">​</a></h4>
<ul>
<li class=""><strong>Sells appliances, not chips</strong> (SambaSystems)</li>
<li class=""><strong>SambaFlow</strong> software stack (Apache 2.0)</li>
<li class=""><strong>Focus on private enterprise deployments</strong>: government, banking, telecom</li>
<li class=""><strong>2025 customers</strong>: US Air Force, Accenture, HPE</li>
</ul>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="sambanova-关键挑战">SambaNova key challenges<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#sambanova-%E5%85%B3%E9%94%AE%E6%8C%91%E6%88%98" class="hash-link" aria-label="Direct link to SambaNova key challenges" title="Direct link to SambaNova key challenges" translate="no">​</a></h4>
<ul>
<li class=""><strong>Small market share</strong>: a very large gap to NVIDIA</li>
<li class=""><strong>The appliance model is hard to scale</strong>: each customer needs customization</li>
<li class=""><strong>20% layoffs in 2024</strong> (restructuring)</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-graphcore被软银收购">3. Graphcore：被软银收购<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#3-graphcore%E8%A2%AB%E8%BD%AF%E9%93%B6%E6%94%B6%E8%B4%AD" class="hash-link" aria-label="Direct link to 3. Graphcore：被软银收购" title="Direct link to 3. Graphcore：被软银收购" translate="no">​</a></h3>
<p><strong>Graphcore</strong> 是英国 AI 芯片先驱，但 2024 年被<strong>软银收购</strong>：</p>
<table><thead><tr><th>项目</th><th>详情</th></tr></thead><tbody><tr><td><strong>创始人</strong></td><td>Nigel Toon + Simon Knowles</td></tr><tr><td><strong>创立</strong></td><td>2016</td></tr><tr><td><strong>总部</strong></td><td>布里斯托尔（英国）</td></tr><tr><td><strong>融资</strong></td><td>$700M+（2020 估值 $2.8B）</td></tr><tr><td><strong>2024 营收</strong></td><td>~$30M（推测）</td></tr><tr><td><strong>核心产品</strong></td><td><strong>Bow GC200 IPU</strong> + <strong>Bow Pod</strong></td></tr><tr><td><strong>代工</strong></td><td>TSMC 7nm</td></tr><tr><td><strong>2024-10 收购</strong></td><td><strong>被软银收购</strong>（金额未披露，推测 $600M）</td></tr><tr><td><strong>2026 状态</strong></td><td>软银子公司，专注日本市场</td></tr></tbody></table>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="graphcore-关键事件">Graphcore 关键事件<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#graphcore-%E5%85%B3%E9%94%AE%E4%BA%8B%E4%BB%B6" class="hash-link" aria-label="Direct link to Graphcore 关键事件" title="Direct link to Graphcore 关键事件" translate="no">​</a></h4>
<table><thead><tr><th>时间</th><th>事件</th></tr></thead><tbody><tr><td>2018</td><td>Bow IPU 首发</td></tr><tr><td>2020</td><td>估值 $2.8B 巅峰</td></tr><tr><td>2022</td><td>营收远低于预期</td></tr><tr><td>2023</td><td>多次裁员</td></tr><tr><td><strong>2024-10</strong></td><td><strong>软银收购</strong></td></tr><tr><td>2025</td><td>转向日本市场（日本 SoftBank + 沙特 G42）</td></tr><tr><td>2026</td><td>软银内部使用 + 日本国家 AI 战略</td></tr></tbody></table>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="graphcore-未来">Graphcore 未来<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#graphcore-%E6%9C%AA%E6%9D%A5" class="hash-link" aria-label="Direct link to Graphcore 未来" title="Direct link to Graphcore 未来" translate="no">​</a></h4>
<ul>
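<li class=""><strong>No longer pursuing an independent IPO</strong></li>
<li class=""><strong>Being integrated by SoftBank</strong> into the Arm ecosystem</li>
<li class=""><strong>Bow Pod 128</strong> remains the flagship</li>
<li class=""><strong>Possible exit in 2027</strong> (if SoftBank withdraws funding)</li>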
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-cambricon-寒武纪a-股-250-亿">4. Cambricon 寒武纪：A 股 250 亿<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#4-cambricon-%E5%AF%92%E6%AD%A6%E7%BA%AAa-%E8%82%A1-250-%E4%BA%BF" class="hash-link" aria-label="Direct link to 4. Cambricon 寒武纪：A 股 250 亿" title="Direct link to 4. Cambricon 寒武纪：A 股 250 亿" translate="no">​</a></h3>
<p><strong>Cambricon</strong> was the first Chinese AI chip company to go public:</p>
<table><thead><tr><th>Item</th><th>Details</th></tr></thead><tbody><tr><td><strong>Founded</strong></td><td>2016</td></tr><tr><td><strong>Co-founders</strong></td><td>Chen Yunji and Chen Tianshi (brothers, from the Institute of Computing Technology, Chinese Academy of Sciences)</td></tr><tr><td><strong>Listing</strong></td><td><strong>2020-07, STAR Market</strong> (A-share 688256)</td></tr><tr><td><strong>Market cap</strong></td><td><strong>~$25B</strong> (2026-05)</td></tr><tr><td><strong>2025 revenue</strong></td><td>~$80M</td></tr><tr><td><strong>Core products</strong></td><td>Siyuan (思元) 290 / 590 / 690 (next generation)</td></tr><tr><td><strong>Foundry</strong></td><td>SMIC</td></tr><tr><td><strong>Customers</strong></td><td>Government, telecom, internet companies</td></tr></tbody></table>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="cambricon-思元-590">Cambricon 思元 590<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#cambricon-%E6%80%9D%E5%85%83-590" class="hash-link" aria-label="Direct link to Cambricon 思元 590" title="Direct link to Cambricon 思元 590" translate="no">​</a></h4>
<table><thead><tr><th>项目</th><th>参数</th></tr></thead><tbody><tr><td><strong>制程</strong></td><td>7nm</td></tr><tr><td><strong>BF16 算力</strong></td><td>480 TFLOPS</td></tr><tr><td><strong>INT8 算力</strong></td><td>960 TOPS</td></tr><tr><td><strong>HBM 容量</strong></td><td>64 GB</td></tr><tr><td><strong>HBM 带宽</strong></td><td>2.4 Tbps</td></tr><tr><td><strong>TDP</strong></td><td>~300 W</td></tr><tr><td><strong>价格</strong></td><td>~$5K（推测）</td></tr></tbody></table>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="cambricon-挑战">Cambricon 挑战<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#cambricon-%E6%8C%91%E6%88%98" class="hash-link" aria-label="Direct link to Cambricon 挑战" title="Direct link to Cambricon 挑战" translate="no">​</a></h4>
<ul>
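<li class=""><strong>Weak software ecosystem</strong>: the MLU programming model lags far behind CUDA</li>
<li class=""><strong>Market share squeezed by Huawei</strong>: the Ascend 910C has led on compute since 2025</li>
<li class=""><strong>Weak profitability</strong>: still loss-making</li>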
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="5-moore-threads-摩尔线程a-股在审">5. Moore Threads 摩尔线程：A 股在审<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#5-moore-threads-%E6%91%A9%E5%B0%94%E7%BA%BF%E7%A8%8Ba-%E8%82%A1%E5%9C%A8%E5%AE%A1" class="hash-link" aria-label="Direct link to 5. Moore Threads 摩尔线程：A 股在审" title="Direct link to 5. Moore Threads 摩尔线程：A 股在审" translate="no">​</a></h3>
<p><strong>Moore Threads</strong> ranks second among China's GPU vendors:</p>
<table><thead><tr><th>Item</th><th>Details</th></tr></thead><tbody><tr><td><strong>Founded</strong></td><td>2020</td></tr><tr><td><strong>Founder</strong></td><td>Zhang Jianzhong (former NVIDIA global vice president and general manager for China)</td></tr><tr><td><strong>Funding</strong></td><td>$500M+</td></tr><tr><td><strong>2025 revenue</strong></td><td>~$30M</td></tr><tr><td><strong>Core products</strong></td><td>MTT S4000 / S5000</td></tr><tr><td><strong>Foundry</strong></td><td>SMIC 7nm</td></tr><tr><td><strong>A-share status</strong></td><td><strong>Filed for a STAR Market listing in late 2025</strong></td></tr></tbody></table>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="moore-threads-mtt-s5000">Moore Threads MTT S5000<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#moore-threads-mtt-s5000" class="hash-link" aria-label="Direct link to Moore Threads MTT S5000" title="Direct link to Moore Threads MTT S5000" translate="no">​</a></h4>
<table><thead><tr><th>项目</th><th>参数</th></tr></thead><tbody><tr><td><strong>制程</strong></td><td>7nm (SMIC)</td></tr><tr><td><strong>FP16 算力</strong></td><td>250 TFLOPS</td></tr><tr><td><strong>INT8 算力</strong></td><td>500 TOPS</td></tr><tr><td><strong>显存</strong></td><td>32GB GDDR6X</td></tr><tr><td><strong>显存带宽</strong></td><td>1.6 Tbps</td></tr><tr><td><strong>TDP</strong></td><td>~300 W</td></tr><tr><td><strong>价格</strong></td><td>~$3K</td></tr></tbody></table>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="moore-threads-挑战">Moore Threads 挑战<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#moore-threads-%E6%8C%91%E6%88%98" class="hash-link" aria-label="Direct link to Moore Threads 挑战" title="Direct link to Moore Threads 挑战" translate="no">​</a></h4>
<ul>
<li class=""><strong>Weak ecosystem</strong>: MUSA still trails CUDA</li>
<li class=""><strong>Market share</strong>: far below Huawei's</li>
<li class=""><strong>A-share IPO pending approval</strong></li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="6-biren-壁仞港股在审">6. Biren 壁仞：港股在审<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#6-biren-%E5%A3%81%E4%BB%9E%E6%B8%AF%E8%82%A1%E5%9C%A8%E5%AE%A1" class="hash-link" aria-label="Direct link to 6. Biren 壁仞：港股在审" title="Direct link to 6. Biren 壁仞：港股在审" translate="no">​</a></h3>
<p><strong>Biren</strong> ranks third among China's GPU vendors:</p>
<table><thead><tr><th>Item</th><th>Details</th></tr></thead><tbody><tr><td><strong>Founded</strong></td><td>2019</td></tr><tr><td><strong>Founder</strong></td><td>Zhang Wen (Harvard doctorate, former SenseTime president)</td></tr><tr><td><strong>Funding</strong></td><td>$700M+</td></tr><tr><td><strong>2025 revenue</strong></td><td>~$20M</td></tr><tr><td><strong>Core product</strong></td><td>BR104</td></tr><tr><td><strong>Foundry</strong></td><td>SMIC 7nm</td></tr><tr><td><strong>Hong Kong listing status</strong></td><td><strong>Filed for a Hong Kong listing in late 2025</strong></td></tr></tbody></table>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="biren-br104">Biren BR104<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#biren-br104" class="hash-link" aria-label="Direct link to Biren BR104" title="Direct link to Biren BR104" translate="no">​</a></h4>
<table><thead><tr><th>项目</th><th>参数</th></tr></thead><tbody><tr><td><strong>制程</strong></td><td>7nm (SMIC)</td></tr><tr><td><strong>FP16 算力</strong></td><td>300 TFLOPS</td></tr><tr><td><strong>INT8 算力</strong></td><td>600 TOPS</td></tr><tr><td><strong>显存</strong></td><td>32GB GDDR6</td></tr><tr><td><strong>显存带宽</strong></td><td>1.6 Tbps</td></tr><tr><td><strong>TDP</strong></td><td>~300 W</td></tr><tr><td><strong>价格</strong></td><td>~$3K</td></tr></tbody></table>
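<p>The spec tables for the Siyuan 590, MTT S5000 and BR104 can be reduced to two comparable ratios: throughput per watt and throughput per $1,000. The sketch below is only an illustration built on this article's own figures. The Cambricon number is BF16 while the other two are FP16, the prices are the article's estimates, and the names <code>CHIPS</code>, <code>efficiency</code> and <code>rank_by_perf_per_watt</code> are hypothetical.</p>

```python
# Peak throughput, TDP and estimated price as quoted in this article's spec tables.
CHIPS = {
    "Cambricon Siyuan 590": {"tflops": 480.0, "tdp_w": 300.0, "price_usd": 5000.0},
    "Moore Threads MTT S5000": {"tflops": 250.0, "tdp_w": 300.0, "price_usd": 3000.0},
    "Biren BR104": {"tflops": 300.0, "tdp_w": 300.0, "price_usd": 3000.0},
}


def efficiency(spec):
    """Return (TFLOPS per watt, TFLOPS per 1,000 USD) for one chip."""
    per_watt = spec["tflops"] / spec["tdp_w"]
    per_kusd = spec["tflops"] / (spec["price_usd"] / 1000.0)
    return per_watt, per_kusd


def rank_by_perf_per_watt(chips):
    """Chip names sorted from most to least efficient per watt."""
    return sorted(chips, key=lambda name: efficiency(chips[name])[0], reverse=True)


if __name__ == "__main__":
    for name in rank_by_perf_per_watt(CHIPS):
        per_watt, per_kusd = efficiency(CHIPS[name])
        print(f"{name}: {per_watt:.2f} TFLOPS/W, {per_kusd:.1f} TFLOPS per $1K")
```

<p>Peak figures say nothing about software maturity or achieved utilization, so these ratios are at best a first filter.</p>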
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="biren-关键事件">Biren 关键事件<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#biren-%E5%85%B3%E9%94%AE%E4%BA%8B%E4%BB%B6" class="hash-link" aria-label="Direct link to Biren 关键事件" title="Direct link to Biren 关键事件" translate="no">​</a></h4>
<ul>
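<li class=""><strong>2023 US export controls</strong> restricted access to advanced process nodes</li>
<li class=""><strong>2024 IPO postponed</strong> (results fell short of expectations)</li>
<li class=""><strong>Late 2025 re-filing</strong> for a Hong Kong listing</li>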
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="7-iluvatar-天数智芯港股上市">7. Iluvatar 天数智芯：港股上市<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#7-iluvatar-%E5%A4%A9%E6%95%B0%E6%99%BA%E8%8A%AF%E6%B8%AF%E8%82%A1%E4%B8%8A%E5%B8%82" class="hash-link" aria-label="Direct link to 7. Iluvatar 天数智芯：港股上市" title="Direct link to 7. Iluvatar 天数智芯：港股上市" translate="no">​</a></h3>
<p><strong>Iluvatar</strong> is already listed in Hong Kong:</p>
<table><thead><tr><th>Item</th><th>Details</th></tr></thead><tbody><tr><td><strong>Founded</strong></td><td>2018</td></tr><tr><td><strong>Listed</strong></td><td>2023, Hong Kong</td></tr><tr><td><strong>Core products</strong></td><td>Tiangai (天垓) 100 / Bi-150</td></tr><tr><td><strong>Foundry</strong></td><td>SMIC 7nm</td></tr><tr><td><strong>Market cap</strong></td><td>~$500M</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="8-lightmatter硅光计算">8. Lightmatter：硅光计算<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#8-lightmatter%E7%A1%85%E5%85%89%E8%AE%A1%E7%AE%97" class="hash-link" aria-label="Direct link to 8. Lightmatter：硅光计算" title="Direct link to 8. Lightmatter：硅光计算" translate="no">​</a></h3>
<p><strong>Lightmatter</strong> is a pioneer in silicon photonic computing:</p>
<table><thead><tr><th>Item</th><th>Details</th></tr></thead><tbody><tr><td><strong>Founded</strong></td><td>2017</td></tr><tr><td><strong>Funding</strong></td><td>$300M+</td></tr><tr><td><strong>Core product</strong></td><td><strong>Envise</strong> silicon photonic AI accelerator</td></tr><tr><td><strong>Process</strong></td><td>TSMC 5nm + in-house silicon photonic chip</td></tr><tr><td><strong>Customers</strong></td><td>Major data centers</td></tr></tbody></table>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="lightmatter-envise">Lightmatter Envise<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#lightmatter-envise" class="hash-link" aria-label="Direct link to Lightmatter Envise" title="Direct link to Lightmatter Envise" translate="no">​</a></h4>
<table><thead><tr><th>项目</th><th>参数</th></tr></thead><tbody><tr><td><strong>架构</strong></td><td>硅光 + 电子混合</td></tr><tr><td><strong>算力</strong></td><td>1 PFLOP (FP16)</td></tr><tr><td><strong>功耗</strong></td><td>比传统 GPU 降 50%</td></tr><tr><td><strong>2026 状态</strong></td><td>商业试点</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="9-esperantorisc-v-ai">9. Esperanto：RISC-V AI<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#9-esperantorisc-v-ai" class="hash-link" aria-label="Direct link to 9. Esperanto：RISC-V AI" title="Direct link to 9. Esperanto：RISC-V AI" translate="no">​</a></h3>
<p><strong>Esperanto</strong> is a representative RISC-V AI accelerator vendor:</p>
<table><thead><tr><th>Item</th><th>Details</th></tr></thead><tbody><tr><td><strong>Founded</strong></td><td>2014</td></tr><tr><td><strong>Funding</strong></td><td>$120M+</td></tr><tr><td><strong>Core product</strong></td><td>ET-SoC-1 (1,000+ RISC-V cores)</td></tr><tr><td><strong>Foundry</strong></td><td>TSMC 7nm</td></tr><tr><td><strong>2026 status</strong></td><td>Main customers: supercomputing centers / recommender systems</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="10-mythic边缘-ai">10. Mythic：边缘 AI<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#10-mythic%E8%BE%B9%E7%BC%98-ai" class="hash-link" aria-label="Direct link to 10. Mythic：边缘 AI" title="Direct link to 10. Mythic：边缘 AI" translate="no">​</a></h3>
<p><strong>Mythic</strong> builds analog compute chips for edge AI:</p>
<table><thead><tr><th>Item</th><th>Details</th></tr></thead><tbody><tr><td><strong>Founded</strong></td><td>2012</td></tr><tr><td><strong>Funding</strong></td><td>$200M+</td></tr><tr><td><strong>Core product</strong></td><td>M1076 analog AI chip</td></tr><tr><td><strong>Process</strong></td><td>TSMC 40nm</td></tr><tr><td><strong>2026 status</strong></td><td>Pivoting / edge market</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="创业公司分类与未来">创业公司分类与未来<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#%E5%88%9B%E4%B8%9A%E5%85%AC%E5%8F%B8%E5%88%86%E7%B1%BB%E4%B8%8E%E6%9C%AA%E6%9D%A5" class="hash-link" aria-label="Direct link to 创业公司分类与未来" title="Direct link to 创业公司分类与未来" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="类别-1可能成功的5-10-年内能-ipo-或被收购">类别 1：可能成功的（5-10 年内能 IPO 或被收购）<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#%E7%B1%BB%E5%88%AB-1%E5%8F%AF%E8%83%BD%E6%88%90%E5%8A%9F%E7%9A%845-10-%E5%B9%B4%E5%86%85%E8%83%BD-ipo-%E6%88%96%E8%A2%AB%E6%94%B6%E8%B4%AD" class="hash-link" aria-label="Direct link to 类别 1：可能成功的（5-10 年内能 IPO 或被收购）" title="Direct link to 类别 1：可能成功的（5-10 年内能 IPO 或被收购）" translate="no">​</a></h3>
<table><thead><tr><th>厂商</th><th>路径</th><th>关键支撑</th></tr></thead><tbody><tr><td><strong>Tenstorrent</strong></td><td>独立 IPO 或被收购</td><td>Jim Keller + RISC-V + 开源</td></tr><tr><td><strong>Cambricon</strong></td><td>A 股继续上市</td><td>中国国家 AI 战略</td></tr><tr><td><strong>Moore Threads</strong></td><td>A 股 IPO</td><td>张建中（NVIDIA 中国背景）</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="类别-2可能存活的细分市场">类别 2：可能存活的（细分市场）<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#%E7%B1%BB%E5%88%AB-2%E5%8F%AF%E8%83%BD%E5%AD%98%E6%B4%BB%E7%9A%84%E7%BB%86%E5%88%86%E5%B8%82%E5%9C%BA" class="hash-link" aria-label="Direct link to 类别 2：可能存活的（细分市场）" title="Direct link to 类别 2：可能存活的（细分市场）" translate="no">​</a></h3>
<table><thead><tr><th>厂商</th><th>细分市场</th></tr></thead><tbody><tr><td><strong>SambaNova</strong></td><td>美国政府 + 企业私有部署</td></tr><tr><td><strong>Lightmatter</strong></td><td>硅光计算 + 数据中心低功耗</td></tr><tr><td><strong>Esperanto</strong></td><td>超算 + 推荐系统</td></tr><tr><td><strong>Biren / Iluvatar</strong></td><td>中国国产替代政府市场</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="类别-3可能消失的">类别 3：可能消失的<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#%E7%B1%BB%E5%88%AB-3%E5%8F%AF%E8%83%BD%E6%B6%88%E5%A4%B1%E7%9A%84" class="hash-link" aria-label="Direct link to 类别 3：可能消失的" title="Direct link to 类别 3：可能消失的" translate="no">​</a></h3>
<table><thead><tr><th>厂商</th><th>风险</th></tr></thead><tbody><tr><td><strong>Graphcore</strong></td><td>软银子公司，未来不明</td></tr><tr><td><strong>Mythic</strong></td><td>模拟计算已被数字超越</td></tr><tr><td><strong>小型创业公司</strong></td><td>资金 + 客户 + 生态三重压力</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="投资逻辑">投资逻辑<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#%E6%8A%95%E8%B5%84%E9%80%BB%E8%BE%91" class="hash-link" aria-label="Direct link to 投资逻辑" title="Direct link to 投资逻辑" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="创业公司为何能活">创业公司为何能活？<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#%E5%88%9B%E4%B8%9A%E5%85%AC%E5%8F%B8%E4%B8%BA%E4%BD%95%E8%83%BD%E6%B4%BB" class="hash-link" aria-label="Direct link to 创业公司为何能活？" title="Direct link to 创业公司为何能活？" translate="no">​</a></h3>
<ol>
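<li class=""><strong>Niche markets</strong>: government / defense / academia / specialized industries (which NVIDIA does not serve)</li>
<li class=""><strong>Differentiated architectures</strong>: RISC-V / silicon photonics / analog / dataflow (paths NVIDIA will not take)</li>
<li class=""><strong>Localization</strong>: China / Europe / Japan / India (data sovereignty + politics)</li>
<li class=""><strong>Anchor customers</strong>: Tenstorrent + LG, SambaNova + the US Air Force</li>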
</ol>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="创业公司为何会死">创业公司为何会死？<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#%E5%88%9B%E4%B8%9A%E5%85%AC%E5%8F%B8%E4%B8%BA%E4%BD%95%E4%BC%9A%E6%AD%BB" class="hash-link" aria-label="Direct link to 创业公司为何会死？" title="Direct link to 创业公司为何会死？" translate="no">​</a></h3>
<ol>
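<li class=""><strong>Crushed by the NVIDIA ecosystem</strong>: CUDA + cuDNN + TensorRT are hard to surpass</li>
<li class=""><strong>High cash burn</strong>: a 7nm tape-out costs $30M+, a 5nm tape-out $80M+</li>
<li class=""><strong>High software migration cost</strong>: moving from CUDA to a non-NVIDIA platform takes 6-12 months</li>
<li class=""><strong>High customer concentration</strong>: losing one large customer can be fatal</li>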
</ol>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="详细产品页">详细产品页<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#%E8%AF%A6%E7%BB%86%E4%BA%A7%E5%93%81%E9%A1%B5" class="hash-link" aria-label="Direct link to 详细产品页" title="Direct link to 详细产品页" translate="no">​</a></h2>
<ul>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/others/tenstorrent">Tenstorrent (Tensix 架构)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/others/sambanova-sn40l">SambaNova SN40L</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/others/graphcore-ipu">Graphcore IPU</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/cerebras/wse-3">Cerebras WSE-3 (IPO 同类)</a></li>
<li class=""><a href="https://www.cambricon.com/" target="_blank" rel="noopener noreferrer" class="">Cambricon 思元 590</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/comparison">完整对比表</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="总结">总结<a href="https://mirrorfrog.com/en/blog/ai-startup-survival-2026-tenstorrent-sambanova-graphcore#%E6%80%BB%E7%BB%93" class="hash-link" aria-label="Direct link to 总结" title="Direct link to 总结" translate="no">​</a></h2>
<p>AI chip startups face a winner-take-all market:</p>
<ol>
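<li class=""><strong>Tenstorrent</strong> is the most likely to succeed (Jim Keller + RISC-V + open source)</li>
<li class=""><strong>SambaNova</strong> has pivoted successfully (enterprise appliances + US government)</li>
<li class=""><strong>Graphcore</strong> has been acquired by SoftBank (its fate now rests in Japan)</li>
<li class=""><strong>Cambricon / Moore Threads / Biren</strong> benefit from China's domestic substitution push</li>
<li class=""><strong>Lightmatter / Esperanto</strong> survive in niche markets</li>
<li class=""><strong>Mythic and other small companies</strong> will struggle to survive</li>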
</ol>
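<p><strong>Over the next five years the AI chip industry will go through a round of "big fish eat small fish" consolidation</strong>: startups will either IPO, get acquired, or disappear.</p>]]></content>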
        <author>
            <name>AI Compute Cards Wiki Editorial</name>
            <uri>https://github.com/anomalyco/mirrorfrog</uri>
        </author>
        <category label="Vendor Strategy" term="Vendor Strategy"/>
        <category label="Industry News" term="Industry News"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[HBM 三家争霸战：SK Hynix / Samsung / Micron 争夺 AI 内存霸权]]></title>
        <id>https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026</id>
        <link href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026"/>
        <updated>2026-05-25T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[2026 年 HBM 市场格局深度分析：SK Hynix 70% 份额 / Samsung HBM4 追赶 / Micron HBM3e 优势 / 产能紧张如何制约 AI 芯片供应。]]></summary>
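        <content type="html"><![CDATA[<p>The bottleneck in AI compute has shifted from <strong>raw compute</strong> to <strong>memory bandwidth and capacity</strong>. <strong>HBM (High Bandwidth Memory)</strong>, a core component of AI chips, is a <strong>$80B+</strong> market in 2026, yet only <strong>three suppliers</strong> exist worldwide: <strong>SK Hynix, Samsung, and Micron</strong>. This article analyzes that three-way contest.</p>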
<!-- -->
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="hbm-为什么是-ai-时代的关键">HBM 为什么是 AI 时代的关键？<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#hbm-%E4%B8%BA%E4%BB%80%E4%B9%88%E6%98%AF-ai-%E6%97%B6%E4%BB%A3%E7%9A%84%E5%85%B3%E9%94%AE" class="hash-link" aria-label="Direct link to HBM 为什么是 AI 时代的关键？" title="Direct link to HBM 为什么是 AI 时代的关键？" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="内存墙memory-wall">内存墙（Memory Wall）<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#%E5%86%85%E5%AD%98%E5%A2%99memory-wall" class="hash-link" aria-label="Direct link to 内存墙（Memory Wall）" title="Direct link to 内存墙（Memory Wall）" translate="no">​</a></h3>
<p>AI model size grew from BERT in 2018 (340M parameters) to Llama 3 in 2024 (405B) and Gemini in 2026 (estimated 1T+). Over the same period, <strong>per-GPU compute grew roughly 200×</strong> while <strong>memory bandwidth grew only about 24×</strong>.</p>
<p>This gap is the well-known <strong>memory wall</strong>, a form of the von Neumann bottleneck:</p>
<table><thead><tr><th>Metric</th><th>2018 (V100)</th><th>2022 (H100)</th><th>2026 (Rubin R200)</th><th>Growth (V100 to R200)</th></tr></thead><tbody><tr><td>Compute (FP16/BF16)</td><td>125 TFLOPS</td><td>989 TFLOPS</td><td>25 PFLOPS</td><td>200×</td></tr><tr><td>Memory capacity</td><td>32 GB</td><td>80 GB</td><td>288 GB</td><td>9×</td></tr><tr><td>Memory bandwidth</td><td>900 GB/s</td><td>3.35 TB/s</td><td>22 TB/s</td><td>24×</td></tr></tbody></table>
<blockquote>
<p><strong>Compute has grown far faster than memory bandwidth</strong>, so GPUs often sit idle waiting for data. HBM is the main way to relieve this bottleneck.</p>
</blockquote>
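<p>One way to read the table is as FLOPs available per byte of memory traffic: the higher that ratio, the more arithmetic a kernel must perform on each byte it loads before the GPU stops waiting on memory. A minimal sketch using only the figures in the table above (the names <code>GPUS</code>, <code>flops_per_byte</code> and <code>growth</code> are illustrative):</p>

```python
# (peak FP16/BF16 FLOP/s, memory bandwidth in bytes/s), from the table above.
GPUS = {
    "V100": (125e12, 900e9),
    "H100": (989e12, 3.35e12),
    "Rubin R200": (25e15, 22e12),
}


def flops_per_byte(peak_flops, bandwidth):
    """FLOPs a kernel must do per byte loaded to keep the compute units busy."""
    return peak_flops / bandwidth


def growth(old, new):
    """Multiplicative growth from old to new."""
    return new / old


if __name__ == "__main__":
    for name, (flops, bw) in GPUS.items():
        print(f"{name}: {flops_per_byte(flops, bw):.0f} FLOP per byte")
    v100, r200 = GPUS["V100"], GPUS["Rubin R200"]
    print(f"compute growth: {growth(v100[0], r200[0]):.0f}x")
    print(f"bandwidth growth: {growth(v100[1], r200[1]):.1f}x")
```

<p>With these inputs the required ratio rises from roughly 139 FLOP per byte on the V100 to more than 1,100 on the R200, which is the memory wall expressed as a single number.</p>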
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="hbm-vs-gddr-vs-sram">HBM vs GDDR vs SRAM<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#hbm-vs-gddr-vs-sram" class="hash-link" aria-label="Direct link to HBM vs GDDR vs SRAM" title="Direct link to HBM vs GDDR vs SRAM" translate="no">​</a></h3>
<table><thead><tr><th>Memory type</th><th>Data rate (per pin)</th><th>Capacity density</th><th>Power</th><th>Typical use</th></tr></thead><tbody><tr><td><strong>HBM4</strong></td><td><strong>12-16 Gbps/pin</strong> (projected)</td><td>High (12-Hi)</td><td>Medium</td><td>AI training / inference GPUs</td></tr><tr><td>HBM3e</td><td>9.2 Gbps/pin (1024-bit per stack)</td><td>High (8-Hi/12-Hi)</td><td>Medium</td><td>AI training / inference GPUs</td></tr><tr><td>HBM3</td><td>6.4 Gbps/pin (1024-bit per stack)</td><td>Medium</td><td>Medium</td><td>AI training</td></tr><tr><td>HBM2e</td><td>3.2-3.6 Gbps/pin (1024-bit per stack)</td><td>Medium</td><td>Medium</td><td>AI inference</td></tr><tr><td>GDDR6X</td><td>~21 Gbps/pin (32-bit per chip)</td><td>Medium</td><td>Medium-high</td><td>Consumer GPUs</td></tr><tr><td>GDDR7</td><td>28-32 Gbps/pin (32-bit per chip)</td><td>Medium</td><td>Medium</td><td>Consumer / workstation</td></tr><tr><td>LPDDR5X</td><td>8.5 Gbps/pin</td><td>High</td><td>Low</td><td>Edge AI / mobile</td></tr><tr><td>SRAM (on-chip)</td><td>N/A (on-die, no external pins)</td><td>Very low</td><td>Very high</td><td>LPU / caches</td></tr></tbody></table>
<blockquote>
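<p>HBM offers the best balance of bandwidth and capacity. <strong>SRAM is the fastest but far too small (roughly 100× the cost of HBM per GB)</strong>, while <strong>GDDR offers reasonable capacity but not enough bandwidth</strong>.</p>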
</blockquote>
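<p>Bandwidth per memory device is the per-pin data rate multiplied by the interface width. The sketch below illustrates why a wide, slower HBM stack outruns a narrow, faster GDDR chip. The pin rates and widths are commonly published figures and should be read as assumptions of this sketch: 9.2 Gbps on a 1024-bit HBM3e stack, 32 Gbps on a 32-bit GDDR7 chip.</p>

```python
import math


def device_bandwidth_gbs(pin_rate_gbps, interface_bits):
    """Peak bandwidth of one memory device in GB/s."""
    return pin_rate_gbps * interface_bits / 8  # 8 bits per byte


if __name__ == "__main__":
    # Assumed inputs: HBM3e stack = 1024 bits at 9.2 Gbps per pin,
    # GDDR7 chip = 32 bits at 32 Gbps per pin.
    hbm3e_stack = device_bandwidth_gbs(9.2, 1024)
    gddr7_chip = device_bandwidth_gbs(32, 32)
    chips_to_match = math.ceil(hbm3e_stack / gddr7_chip)
    print(f"HBM3e stack: {hbm3e_stack:.1f} GB/s")
    print(f"GDDR7 chip:  {gddr7_chip:.1f} GB/s")
    print(f"GDDR7 chips needed to match one stack: {chips_to_match}")
```

<p>Under these assumptions a single HBM3e stack delivers about 1.2 TB/s, and matching it takes ten GDDR7 chips, each needing its own board routing.</p>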
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="三家厂商格局">三家厂商格局<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#%E4%B8%89%E5%AE%B6%E5%8E%82%E5%95%86%E6%A0%BC%E5%B1%80" class="hash-link" aria-label="Direct link to 三家厂商格局" title="Direct link to 三家厂商格局" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-sk-hynix当前-hbm-霸主70-份额">1. SK Hynix：当前 HBM 霸主（70% 份额）<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#1-sk-hynix%E5%BD%93%E5%89%8D-hbm-%E9%9C%B8%E4%B8%BB70-%E4%BB%BD%E9%A2%9D" class="hash-link" aria-label="Direct link to 1. SK Hynix：当前 HBM 霸主（70% 份额）" title="Direct link to 1. SK Hynix：当前 HBM 霸主（70% 份额）" translate="no">​</a></h3>
<p><strong>SK Hynix</strong> is the <strong>clear leader</strong> of the HBM market:</p>
<table><thead><tr><th>Item</th><th>SK Hynix status</th></tr></thead><tbody><tr><td><strong>Market share</strong></td><td><strong>~70%</strong> (of the total 2025 HBM market)</td></tr><tr><td><strong>HBM4 progress</strong></td><td><strong>First volume production in 2026 Q1</strong>, initially the sole supplier to NVIDIA</td></tr><tr><td><strong>HBM3e supply</strong></td><td>Primary NVIDIA supplier (80%), some AMD</td></tr><tr><td><strong>Core technology</strong></td><td><strong>Advanced MR-MUF</strong> (Mass Reflow Molded Underfill)</td></tr><tr><td><strong>Capacity</strong></td><td>Planned for 2026: <strong>25,000 HBM wafers/month</strong></td></tr><tr><td><strong>Key customers</strong></td><td>NVIDIA (90%), AMD, some Google</td></tr><tr><td><strong>2025 revenue (HBM)</strong></td><td>~$30B (+80% year over year)</td></tr><tr><td><strong>2025 net margin</strong></td><td>~35% (far above the conventional DRAM business)</td></tr></tbody></table>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="sk-hynix-关键优势">SK Hynix 关键优势<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#sk-hynix-%E5%85%B3%E9%94%AE%E4%BC%98%E5%8A%BF" class="hash-link" aria-label="Direct link to SK Hynix 关键优势" title="Direct link to SK Hynix 关键优势" translate="no">​</a></h4>
<ul>
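<li class=""><strong>First to mass-produce HBM3 (2022)</strong>: technology first-mover</li>
<li class=""><strong>Deep partnership with NVIDIA</strong>: both HBM3 and HBM3e debuted on NVIDIA products</li>
<li class=""><strong>Advanced MR-MUF process</strong>: leading packaging yield</li>
<li class=""><strong>Ahead on HBM4</strong>: first volume production in 2026 Q1</li>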
</ul>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="关键事件">关键事件<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#%E5%85%B3%E9%94%AE%E4%BA%8B%E4%BB%B6" class="hash-link" aria-label="Direct link to 关键事件" title="Direct link to 关键事件" translate="no">​</a></h4>
<table><thead><tr><th>时间</th><th>事件</th></tr></thead><tbody><tr><td>2014</td><td>与 AMD 合作开发 HBM</td></tr><tr><td>2018</td><td>首批 HBM2 量产（NVIDIA V100）</td></tr><tr><td>2020</td><td>HBM2e 量产</td></tr><tr><td>2022</td><td>HBM3 量产（NVIDIA H100）</td></tr><tr><td>2024</td><td>HBM3e 12-Hi 量产（NVIDIA B200）</td></tr><tr><td>2026 Q1</td><td><strong>HBM4 首批量产</strong>（NVIDIA Rubin R200）</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-samsunghbm4-追赶者20-份额">2. Samsung：HBM4 追赶者（20% 份额）<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#2-samsunghbm4-%E8%BF%BD%E8%B5%B6%E8%80%8520-%E4%BB%BD%E9%A2%9D" class="hash-link" aria-label="Direct link to 2. Samsung：HBM4 追赶者（20% 份额）" title="Direct link to 2. Samsung：HBM4 追赶者（20% 份额）" translate="no">​</a></h3>
<p><strong>Samsung</strong> is the <strong>challenger</strong> in the HBM market, held back by yield and by NVIDIA qualification:</p>
<table><thead><tr><th>Item</th><th>Samsung status</th></tr></thead><tbody><tr><td><strong>Market share</strong></td><td><strong>~20%</strong> (of the total 2025 HBM market)</td></tr><tr><td><strong>HBM4 progress</strong></td><td>Volume production in 2026 Q2 (one quarter behind SK Hynix)</td></tr><tr><td><strong>HBM3e supply</strong></td><td>Qualified by NVIDIA for 12-Hi only in late 2025 (partial orders); ships mainly to AMD / Google</td></tr><tr><td><strong>Core technology</strong></td><td><strong>TC-NCF</strong> (thermal compression with non-conductive film)</td></tr><tr><td><strong>Capacity</strong></td><td>Planned for 2026: 10,000 HBM wafers/month</td></tr><tr><td><strong>Key customers</strong></td><td>AMD (part of MI300X), Google TPU</td></tr><tr><td><strong>2025 revenue (HBM)</strong></td><td>~$8B (+150% year over year, but only about a quarter of SK Hynix's)</td></tr><tr><td><strong>2025 net margin</strong></td><td>~5% (low yield depresses margins)</td></tr></tbody></table>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="samsung-关键问题">Samsung 关键问题<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#samsung-%E5%85%B3%E9%94%AE%E9%97%AE%E9%A2%98" class="hash-link" aria-label="Direct link to Samsung 关键问题" title="Direct link to Samsung 关键问题" translate="no">​</a></h4>
<ul>
<li class=""><strong>HBM3 failed NVIDIA qualification</strong>: repeated sample failures in 2023-2024</li>
<li class=""><strong>Low yield</strong>: HBM3e yield of ~50% (vs ~70% at SK Hynix)</li>
<li class=""><strong>Diverging technology path</strong>: Samsung bet on TC-NCF (vs SK Hynix's MR-MUF)</li>
<li class=""><strong>Heavy losses in 2024</strong>: large HBM investment with slow returns</li>
</ul>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="samsung-2025-2026-突破">Samsung 2025-2026 突破<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#samsung-2025-2026-%E7%AA%81%E7%A0%B4" class="hash-link" aria-label="Direct link to Samsung 2025-2026 突破" title="Direct link to Samsung 2025-2026 突破" translate="no">​</a></h4>
<table><thead><tr><th>时间</th><th>事件</th></tr></thead><tbody><tr><td>2024-12</td><td>HBM3e 8-Hi 通过 AMD 认证</td></tr><tr><td>2025-03</td><td>HBM3e 12-Hi 通过 Google TPU 认证</td></tr><tr><td>2025-Q4</td><td>HBM4 试产，2026 Q2 量产</td></tr><tr><td>2025-Q4</td><td>NVIDIA HBM3e 12-Hi 认证<strong>通过</strong>（部分订单）</td></tr></tbody></table>
<blockquote>
<p><strong>In late 2025 Samsung finally won partial NVIDIA orders for HBM3e 12-Hi</strong>, a turning point for its HBM business.</p>
</blockquote>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-micronhbm-黑马10-份额">3. Micron：HBM 黑马（10% 份额）<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#3-micronhbm-%E9%BB%91%E9%A9%AC10-%E4%BB%BD%E9%A2%9D" class="hash-link" aria-label="Direct link to 3. Micron：HBM 黑马（10% 份额）" title="Direct link to 3. Micron：HBM 黑马（10% 份额）" translate="no">​</a></h3>
<p><strong>Micron</strong> is the <strong>dark horse</strong> of the HBM market: its <strong>HBM3E 12-Hi at 9.2 Gbps</strong> made it NVIDIA's second supplier:</p>
<table><thead><tr><th>Item</th><th>Micron status</th></tr></thead><tbody><tr><td><strong>Market share</strong></td><td><strong>~10%</strong> (of the total 2025 HBM market)</td></tr><tr><td><strong>HBM4 progress</strong></td><td>Volume production in 2026 Q3 (two quarters behind SK Hynix)</td></tr><tr><td><strong>HBM3e supply</strong></td><td>NVIDIA's second supplier (~30% share)</td></tr><tr><td><strong>Core technology</strong></td><td><strong>1β DRAM</strong> + advanced packaging</td></tr><tr><td><strong>Capacity</strong></td><td>Planned for 2026: 5,000 HBM wafers/month</td></tr><tr><td><strong>Key customers</strong></td><td>NVIDIA (partial), AMD, Intel</td></tr><tr><td><strong>2025 revenue (HBM)</strong></td><td>~$4B (+200% year over year)</td></tr><tr><td><strong>2025 net margin</strong></td><td>~10%</td></tr></tbody></table>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="micron-关键优势">Micron 关键优势<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#micron-%E5%85%B3%E9%94%AE%E4%BC%98%E5%8A%BF" class="hash-link" aria-label="Direct link to Micron 关键优势" title="Direct link to Micron 关键优势" translate="no">​</a></h4>
<ul>
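<li class=""><strong>HBM3E 12-Hi at 9.2 Gbps</strong>: the highest speed in the industry (ahead of SK Hynix's 9 Gbps)</li>
<li class=""><strong>US-based manufacturing</strong>: fabs in Idaho and New York, aligned with the CHIPS Act</li>
<li class=""><strong>NVIDIA's second source</strong>: reduces single-supplier risk</li>
<li class=""><strong>2025 breakout</strong>: revenue up 200% year over year, the fastest growth of the three</li>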
</ul>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="micron-关键事件">Micron 关键事件<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#micron-%E5%85%B3%E9%94%AE%E4%BA%8B%E4%BB%B6" class="hash-link" aria-label="Direct link to Micron 关键事件" title="Direct link to Micron 关键事件" translate="no">​</a></h4>
<table><thead><tr><th>Date</th><th>Event</th></tr></thead><tbody><tr><td>2024</td><td>First volume production of HBM3E 8-Hi</td></tr><tr><td>2025-Q1</td><td>HBM3E 12-Hi enters volume production</td></tr><tr><td>2025-Q2</td><td><strong>NVIDIA H200/B200 qualification</strong> passed</td></tr><tr><td>2025-Q3</td><td>Partial B200 orders (~30% share)</td></tr><tr><td>2026-Q3</td><td>HBM4 volume production expected</td></tr></tbody></table>
<blockquote>
<p><strong>Micron is the fastest-growing of the three, up 200% year over year in 2025</strong>. Its capacity of only 5,000 wafers/month, however, limits further share gains.</p>
</blockquote>
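<p>The market-share headlines (70% / 20% / 10%) can be cross-checked against the 2025 HBM revenue estimates quoted for each vendor. A minimal sketch, assuming the three revenue figures above cover the whole market (the names <code>HBM_REVENUE_2025_BUSD</code> and <code>revenue_shares</code> are illustrative):</p>

```python
# 2025 HBM revenue in billions of USD, as quoted in the vendor tables above.
HBM_REVENUE_2025_BUSD = {"SK Hynix": 30.0, "Samsung": 8.0, "Micron": 4.0}


def revenue_shares(revenue):
    """Map each vendor to its fraction of total revenue."""
    total = sum(revenue.values())
    return {vendor: value / total for vendor, value in revenue.items()}


if __name__ == "__main__":
    for vendor, share in revenue_shares(HBM_REVENUE_2025_BUSD).items():
        print(f"{vendor}: {share:.1%}")
```

<p>The result, about 71% / 19% / 10%, lines up with the rounded shares used in the section headings.</p>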
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="hbm-技术路线图">HBM 技术路线图<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#hbm-%E6%8A%80%E6%9C%AF%E8%B7%AF%E7%BA%BF%E5%9B%BE" class="hash-link" aria-label="Direct link to HBM 技术路线图" title="Direct link to HBM 技术路线图" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="hbm4-关键规格">HBM4 关键规格<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#hbm4-%E5%85%B3%E9%94%AE%E8%A7%84%E6%A0%BC" class="hash-link" aria-label="Direct link to HBM4 关键规格" title="Direct link to HBM4 关键规格" translate="no">​</a></h3>
<table><thead><tr><th>Item</th><th>HBM3e</th><th><strong>HBM4</strong></th><th>Improvement</th></tr></thead><tbody><tr><td>Capacity per stack</td><td>24 GB (8-Hi)</td><td><strong>36-48 GB</strong> (12-Hi/16-Hi)</td><td>1.5-2×</td></tr><tr><td>Speed per pin</td><td>9.2 Gbps</td><td><strong>12-16 Gbps</strong></td><td>1.3-1.7×</td></tr><tr><td>Bandwidth per stack</td><td>1.2 TB/s</td><td><strong>1.5-2 TB/s</strong></td><td>1.3-1.7×</td></tr><tr><td>Power</td><td>7 W/stack</td><td>8 W/stack</td><td>Slightly higher</td></tr><tr><td>DRAM process</td><td>1z/1β DRAM</td><td>1γ/1δ DRAM</td><td>Shrink</td></tr><tr><td>Packaging</td><td>CoWoS-S</td><td><strong>CoWoS-L</strong></td><td>Larger interposer</td></tr><tr><td>Volume production</td><td>2024-2025</td><td><strong>2026 Q1 (SK Hynix) / Q2 (Samsung) / Q3 (Micron)</strong></td><td>N/A</td></tr></tbody></table>
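<p>The per-stack figures can be sanity-checked against a whole GPU by working backwards from total bandwidth to the per-pin rate it implies. The sketch below uses this article's Rubin R200 figures (288 GB, 22 TB/s) and assumes 36 GB stacks, which gives eight stacks. The interface width is also an assumption: the sketch tries both 1024 bits and the 2048 bits defined by JEDEC's HBM4 standard. The function name <code>implied_pin_rate_gbps</code> is illustrative.</p>

```python
def implied_pin_rate_gbps(total_bw_tbs, total_capacity_gb, stack_capacity_gb, interface_bits):
    """Per-pin data rate (Gbps) needed to reach total_bw_tbs with the given stacks."""
    stacks = total_capacity_gb / stack_capacity_gb
    per_stack_gbs = total_bw_tbs * 1000 / stacks  # TB/s to GB/s, split across stacks
    return per_stack_gbs * 8 / interface_bits     # bytes to bits, split across pins


if __name__ == "__main__":
    # 288 GB and 22 TB/s are this article's R200 figures; 36 GB per stack is
    # the low end of the HBM4 capacity range in the table above.
    for width in (1024, 2048):  # interface width per stack is an assumption
        rate = implied_pin_rate_gbps(22, 288, 36, width)
        print(f"{width}-bit interface: {rate:.1f} Gbps per pin")
```

<p>Under these assumptions each stack has to deliver 2.75 TB/s, above the 1.5-2 TB/s range in the table, so at least one input (stack count, interface width, or the headline bandwidth) differs from what this sketch assumes.</p>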
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="hbm4e--hbm5-路线图">HBM4E / HBM5 路线图<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#hbm4e--hbm5-%E8%B7%AF%E7%BA%BF%E5%9B%BE" class="hash-link" aria-label="Direct link to HBM4E / HBM5 路线图" title="Direct link to HBM4E / HBM5 路线图" translate="no">​</a></h3>
<table><thead><tr><th>年份</th><th>型号</th><th>容量</th><th>速度</th><th>制程</th></tr></thead><tbody><tr><td>2026</td><td><strong>HBM4</strong></td><td>36-48 GB</td><td>12-16 Gbps</td><td>1γ/1δ</td></tr><tr><td>2027</td><td><strong>HBM4E</strong></td><td>48-64 GB</td><td>16-20 Gbps</td><td>1δ</td></tr><tr><td>2028</td><td><strong>HBM5</strong></td><td>64-80 GB</td><td>20-24 Gbps</td><td>1ε</td></tr><tr><td>2029</td><td><strong>HBM5E</strong></td><td>80-96 GB</td><td>24-28 Gbps</td><td>1ε+</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="产能与价格">产能与价格<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#%E4%BA%A7%E8%83%BD%E4%B8%8E%E4%BB%B7%E6%A0%BC" class="hash-link" aria-label="Direct link to 产能与价格" title="Direct link to 产能与价格" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="hbm-产能2026-计划">HBM 产能（2026 计划）<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#hbm-%E4%BA%A7%E8%83%BD2026-%E8%AE%A1%E5%88%92" class="hash-link" aria-label="Direct link to HBM 产能（2026 计划）" title="Direct link to HBM 产能（2026 计划）" translate="no">​</a></h3>
<table><thead><tr><th>厂商</th><th>2025 实际</th><th>2026 计划</th><th>2027 计划</th><th>市场份额</th></tr></thead><tbody><tr><td><strong>SK Hynix</strong></td><td>18,000 wafer/月</td><td><strong>25,000</strong></td><td>35,000</td><td>60-70%</td></tr><tr><td><strong>Samsung</strong></td><td>6,000 wafer/月</td><td>10,000</td><td>18,000</td><td>15-20%</td></tr><tr><td><strong>Micron</strong></td><td>3,500 wafer/月</td><td>5,000</td><td>12,000</td><td>10-15%</td></tr><tr><td><strong>合计</strong></td><td>27,500</td><td><strong>40,000</strong></td><td>65,000</td><td>100%</td></tr></tbody></table>
<blockquote>
<p><strong>HBM capacity is tight in 2026</strong>: demand is about 50,000 wafers/month against a supply of only 40,000 wafers/month, <strong>a shortfall equal to 20% of demand</strong>.</p>
</blockquote>
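<p>The shortfall arithmetic can be reproduced with a minimal Python sketch. The per-vendor figures are the 2026 plan values from the capacity table and the demand figure is the approximate number quoted in the callout; the function name <code>supply_gap</code> is hypothetical and used only for illustration.</p>

```python
# Planned 2026 HBM wafer starts per month, copied from the capacity table.
PLANNED_2026 = {"SK Hynix": 25_000, "Samsung": 10_000, "Micron": 5_000}
DEMAND_2026 = 50_000  # approximate demand quoted in the text, wafers/month


def supply_gap(planned, demand):
    """Return (total supply, shortfall in wafers, shortfall as a fraction of demand)."""
    supply = sum(planned.values())
    shortfall = demand - supply
    return supply, shortfall, shortfall / demand


supply, shortfall, gap = supply_gap(PLANNED_2026, DEMAND_2026)
print(f"supply={supply:,} shortfall={shortfall:,} gap={gap:.0%}")
```

<p>Note that the 20% is measured against demand; measured against supply, the same 10,000-wafer shortfall would be 25%.</p>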
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="hbm-价格per-gb">HBM Price (per GB)<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#hbm-%E4%BB%B7%E6%A0%BCper-gb" class="hash-link" aria-label="Direct link to HBM Price (per GB)" title="Direct link to HBM Price (per GB)" translate="no">​</a></h3>
<table><thead><tr><th>Type</th><th>2024</th><th>2025</th><th>2026</th><th>2027 (E)</th></tr></thead><tbody><tr><td><strong>HBM4</strong></td><td>N/A</td><td>N/A</td><td><strong>$20-25/GB</strong></td><td>$15-18/GB</td></tr><tr><td>HBM3e 12-Hi</td><td>$18-22/GB</td><td>$15-18/GB</td><td>$12-15/GB</td><td>$10-12/GB</td></tr><tr><td>HBM3e 8-Hi</td><td>$14-18/GB</td><td>$11-14/GB</td><td>$9-11/GB</td><td>$8-10/GB</td></tr><tr><td>HBM3 8-Hi</td><td>$10-12/GB</td><td>$8-10/GB</td><td>$6-8/GB</td><td>$5-7/GB</td></tr></tbody></table>
<blockquote>
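<p><strong>HBM accounts for 30-50% of an AI chip's cost</strong>. The HBM in an NVIDIA B200 costs roughly $5,000-8,000 (192 GB of HBM3e at ~$30/GB comes to about $5,760). The ~$30/GB used here is well above the per-GB HBM3e prices in the table, so this estimate evidently rests on a different pricing basis.</p>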
</blockquote>
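<p>To see how a per-accelerator HBM cost follows from a per-GB price, here is a small Python sketch. The price ranges are the 2026 column of the price table and 192 GB is the B200 capacity quoted in the callout; pairing that capacity with a particular stack type is an assumption made purely for illustration, as is the helper name <code>hbm_cost_range</code>.</p>

```python
# Per-GB price ranges in USD for 2026, copied from the price table.
PRICE_2026_USD_PER_GB = {
    "HBM4": (20, 25),
    "HBM3e 12-Hi": (12, 15),
    "HBM3e 8-Hi": (9, 11),
    "HBM3 8-Hi": (6, 8),
}


def hbm_cost_range(capacity_gb, hbm_type, prices=PRICE_2026_USD_PER_GB):
    """Return the (low, high) HBM cost in USD for one accelerator."""
    low, high = prices[hbm_type]
    return capacity_gb * low, capacity_gb * high


# 192 GB is the capacity quoted in the text; the stack types are illustrative.
for hbm_type in ("HBM3e 8-Hi", "HBM3e 12-Hi", "HBM4"):
    low, high = hbm_cost_range(192, hbm_type)
    print(f"192 GB of {hbm_type}: ${low:,} to ${high:,}")

# The callout's own basis of roughly $30/GB, for comparison.
print(f"192 GB at $30/GB: ${192 * 30:,}")
```

<p>The result is linear in both capacity and price, so any change in the per-GB assumption moves the estimate proportionally.</p>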
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="hbm-与-ai-芯片供应的关联">How HBM Ties into AI Chip Supply<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#hbm-%E4%B8%8E-ai-%E8%8A%AF%E7%89%87%E4%BE%9B%E5%BA%94%E7%9A%84%E5%85%B3%E8%81%94" class="hash-link" aria-label="Direct link to How HBM Ties into AI Chip Supply" title="Direct link to How HBM Ties into AI Chip Supply" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="nvidia-rubin-r200-hbm-供应链">NVIDIA Rubin R200 HBM Supply Chain<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#nvidia-rubin-r200-hbm-%E4%BE%9B%E5%BA%94%E9%93%BE" class="hash-link" aria-label="Direct link to NVIDIA Rubin R200 HBM Supply Chain" title="Direct link to NVIDIA Rubin R200 HBM Supply Chain" translate="no">​</a></h3>
<table><thead><tr><th>HBM source</th><th>Share</th><th>Notes</th></tr></thead><tbody><tr><td><strong>SK Hynix HBM4</strong></td><td><strong>70%</strong></td><td>First batch; sole supplier in 2026 Q1-Q2</td></tr><tr><td><strong>Micron HBM4</strong></td><td><strong>20%</strong></td><td>From 2026 Q3; CHIPS Act incentives</td></tr><tr><td><strong>Samsung HBM4</strong></td><td><strong>10%</strong></td><td>From 2026 Q4 (qualification delayed)</td></tr></tbody></table>
<blockquote>
<p><strong>NVIDIA still depends heavily on SK Hynix</strong>, which is a <strong>single-point-of-failure risk</strong> in its supply chain.</p>
</blockquote>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="amd-mi400-hbm-供应链">AMD MI400 HBM Supply Chain<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#amd-mi400-hbm-%E4%BE%9B%E5%BA%94%E9%93%BE" class="hash-link" aria-label="Direct link to AMD MI400 HBM Supply Chain" title="Direct link to AMD MI400 HBM Supply Chain" translate="no">​</a></h3>
<table><thead><tr><th>HBM source</th><th>Share</th><th>Notes</th></tr></thead><tbody><tr><td><strong>SK Hynix HBM4</strong></td><td>50%</td><td>Primary supplier</td></tr><tr><td><strong>Samsung HBM4</strong></td><td>30%</td><td>Expanded after its 2025 breakthrough</td></tr><tr><td><strong>Micron HBM4</strong></td><td>20%</td><td>Backup supplier</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="google-tpu-hbm-供应链">Google TPU HBM Supply Chain<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#google-tpu-hbm-%E4%BE%9B%E5%BA%94%E9%93%BE" class="hash-link" aria-label="Direct link to Google TPU HBM Supply Chain" title="Direct link to Google TPU HBM Supply Chain" translate="no">​</a></h3>
<table><thead><tr><th>HBM source</th><th>Share</th><th>Notes</th></tr></thead><tbody><tr><td><strong>Samsung HBM3e</strong></td><td>60%</td><td>Early partner</td></tr><tr><td><strong>SK Hynix HBM3e</strong></td><td>30%</td><td>Partial orders</td></tr><tr><td><strong>Micron HBM3e</strong></td><td>10%</td><td>New entrant</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="huawei-ascend-920-hbm-供应链">Huawei Ascend 920 HBM Supply Chain<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#huawei-ascend-920-hbm-%E4%BE%9B%E5%BA%94%E9%93%BE" class="hash-link" aria-label="Direct link to Huawei Ascend 920 HBM Supply Chain" title="Direct link to Huawei Ascend 920 HBM Supply Chain" translate="no">​</a></h3>
<table><thead><tr><th>HBM source</th><th>Share</th><th>Notes</th></tr></thead><tbody><tr><td><strong>CXMT (ChangXin Memory Technologies)</strong></td><td>70%</td><td>Domestic HBM, 4 Tbps</td></tr><tr><td><strong>Samsung HBM3</strong></td><td>20%</td><td>Restricted by US export controls</td></tr><tr><td><strong>SK Hynix</strong></td><td>10%</td><td>Restricted by US export controls</td></tr></tbody></table>
<blockquote>
<p><strong>Huawei is constrained by US export controls</strong> and has been forced to accelerate its shift to domestic CXMT HBM.</p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="投资分析">Investment Analysis<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#%E6%8A%95%E8%B5%84%E5%88%86%E6%9E%90" class="hash-link" aria-label="Direct link to Investment Analysis" title="Direct link to Investment Analysis" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="三家厂商-2025-2026-表现">Performance of the Three Vendors, 2025-2026<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#%E4%B8%89%E5%AE%B6%E5%8E%82%E5%95%86-2025-2026-%E8%A1%A8%E7%8E%B0" class="hash-link" aria-label="Direct link to Performance of the Three Vendors, 2025-2026" title="Direct link to Performance of the Three Vendors, 2025-2026" translate="no">​</a></h3>
<table><thead><tr><th>Vendor</th><th>2025 HBM revenue</th><th>YoY</th><th>2026 (E) HBM revenue</th><th>Net margin</th></tr></thead><tbody><tr><td><strong>SK Hynix</strong></td><td>~$30B</td><td>+80%</td><td>~$50B</td><td>~35%</td></tr><tr><td><strong>Samsung</strong></td><td>~$8B</td><td>+150%</td><td>~$15B</td><td>~10%</td></tr><tr><td><strong>Micron</strong></td><td>~$4B</td><td>+200%</td><td>~$10B</td><td>~15%</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="股价表现2025-至今">Share Price Performance (2025 to Date)<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#%E8%82%A1%E4%BB%B7%E8%A1%A8%E7%8E%B02025-%E8%87%B3%E4%BB%8A" class="hash-link" aria-label="Direct link to Share Price Performance (2025 to Date)" title="Direct link to Share Price Performance (2025 to Date)" translate="no">​</a></h3>
<table><thead><tr><th>Company</th><th>2025 gain</th><th>2026 gain (YTD)</th></tr></thead><tbody><tr><td><strong>SK Hynix</strong></td><td>+120%</td><td>+35%</td></tr><tr><td><strong>Samsung</strong></td><td>+15%</td><td>+10%</td></tr><tr><td><strong>Micron</strong></td><td>+90%</td><td>+25%</td></tr><tr><td><strong>NVIDIA</strong></td><td>+180%</td><td>+40%</td></tr></tbody></table>
<blockquote>
<p><strong>SK Hynix is the biggest beneficiary of AI memory</strong>: its shares rose 120% in 2025, far ahead of Samsung's 15%.</p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="风险与挑战">Risks and Challenges<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#%E9%A3%8E%E9%99%A9%E4%B8%8E%E6%8C%91%E6%88%98" class="hash-link" aria-label="Direct link to Risks and Challenges" title="Direct link to Risks and Challenges" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-hbm-产能紧张">1. Tight HBM Capacity<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#1-hbm-%E4%BA%A7%E8%83%BD%E7%B4%A7%E5%BC%A0" class="hash-link" aria-label="Direct link to 1. Tight HBM Capacity" title="Direct link to 1. Tight HBM Capacity" translate="no">​</a></h3>
<ul>
<li class=""><strong>20% shortfall in 2026</strong> (demand of 50K wafers/month vs supply of 40K wafers/month)</li>
<li class=""><strong>NVIDIA Rubin R200 shipments</strong> may slip because of tight HBM supply</li>
<li class=""><strong>Customer prepayments</strong> have become the norm</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-美国出口管制">2. US Export Controls<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#2-%E7%BE%8E%E5%9B%BD%E5%87%BA%E5%8F%A3%E7%AE%A1%E5%88%B6" class="hash-link" aria-label="Direct link to 2. US Export Controls" title="Direct link to 2. US Export Controls" translate="no">​</a></h3>
<ul>
<li class=""><strong>HBM exports to China</strong> are tightly restricted by the US Department of Commerce</li>
<li class=""><strong>Samsung and SK Hynix</strong> face limits on their fabs in China</li>
<li class=""><strong>Huawei</strong> is accelerating its shift to domestic CXMT HBM</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-技术路线分歧">3. Diverging Technology Paths<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#3-%E6%8A%80%E6%9C%AF%E8%B7%AF%E7%BA%BF%E5%88%86%E6%AD%A7" class="hash-link" aria-label="Direct link to 3. Diverging Technology Paths" title="Direct link to 3. Diverging Technology Paths" translate="no">​</a></h3>
<ul>
<li class=""><strong>SK Hynix</strong>: MR-MUF approach, in the lead</li>
<li class=""><strong>Samsung</strong>: TC-NCF approach, behind but catching up</li>
<li class=""><strong>Micron</strong>: in between the two</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-竞争技术">4. Competing Technologies<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#4-%E7%AB%9E%E4%BA%89%E6%8A%80%E6%9C%AF" class="hash-link" aria-label="Direct link to 4. Competing Technologies" title="Direct link to 4. Competing Technologies" translate="no">​</a></h3>
<ul>
<li class=""><strong>Samsung HBM-PIM</strong>: processing units built into HBM (processing-in-memory)</li>
<li class=""><strong>TSMC SoIC</strong>: 3D-stacked SRAM + logic</li>
<li class=""><strong>Micron HBM-CX</strong>: Compute Express Link integration</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="未来展望">Outlook<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#%E6%9C%AA%E6%9D%A5%E5%B1%95%E6%9C%9B" class="hash-link" aria-label="Direct link to Outlook" title="Direct link to Outlook" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="短期2026-2027">Short Term (2026-2027)<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#%E7%9F%AD%E6%9C%9F2026-2027" class="hash-link" aria-label="Direct link to Short Term (2026-2027)" title="Direct link to Short Term (2026-2027)" translate="no">​</a></h3>
<ul>
<li class=""><strong>HBM stays tight</strong>: demand exceeds supply</li>
<li class=""><strong>Prices stay high</strong>: HBM4 at $20-25/GB</li>
<li class=""><strong>Three vendors coexist</strong>: SK Hynix 70% + Samsung 20% + Micron 10%</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="中期2027-2029">Medium Term (2027-2029)<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#%E4%B8%AD%E6%9C%9F2027-2029" class="hash-link" aria-label="Direct link to Medium Term (2027-2029)" title="Direct link to Medium Term (2027-2029)" translate="no">​</a></h3>
<ul>
<li class=""><strong>HBM4E through HBM5E</strong>: capacity 48-96 GB, speed 16-28 Gbps</li>
<li class=""><strong>Domestic Chinese HBM rises</strong>: CXMT reaches volume production of 8-Hi stacks</li>
<li class=""><strong>New entrants</strong>: YMTC (Yangtze Memory Technologies) may enter the HBM market</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="长期2029">Long Term (2029+)<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#%E9%95%BF%E6%9C%9F2029" class="hash-link" aria-label="Direct link to Long Term (2029+)" title="Direct link to Long Term (2029+)" translate="no">​</a></h3>
<ul>
<li class=""><strong>HBM6 / 3D HBM</strong>: more stacked layers, alternatives to TSV</li>
<li class=""><strong>PIM-HBM</strong>: processing units built into HBM</li>
<li class=""><strong>Alternative technologies</strong>: breakthroughs in on-chip SRAM capacity (e.g. Cerebras WSE)</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="详细产品页">Detailed Product Pages<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#%E8%AF%A6%E7%BB%86%E4%BA%A7%E5%93%81%E9%A1%B5" class="hash-link" aria-label="Direct link to Detailed Product Pages" title="Direct link to Detailed Product Pages" translate="no">​</a></h2>
<ul>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/nvidia/rubin-r200">NVIDIA Rubin R200 (first with HBM4)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/amd/mi400">AMD MI400 (CDNA Next + HBM4)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/google/tpu-8t">Google TPU 8t + 8i (integrated HBM)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/cerebras/wse-3">Cerebras WSE-3 (wafer-scale SRAM vs HBM)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/comparison">Full comparison table</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="总结">Summary<a href="https://mirrorfrog.com/en/blog/hbm-market-sk-hynix-samsung-micron-2026#%E6%80%BB%E7%BB%93" class="hash-link" aria-label="Direct link to Summary" title="Direct link to Summary" translate="no">​</a></h2>
<p>The three-way HBM contest will run for another 3-5 years:</p>
<ol>
<li class=""><strong>SK Hynix is unassailable in the short term</strong>: first to ship HBM4 and tightly coupled to NVIDIA</li>
<li class=""><strong>Samsung broke through in late 2025</strong>: its HBM3e 12-Hi finally passed NVIDIA qualification</li>
<li class=""><strong>Micron is the fastest grower</strong>: its 9.2 Gbps HBM3e is the fastest in the industry</li>
<li class=""><strong>Capacity stays tight</strong>: a 20% shortfall in 2026</li>
<li class=""><strong>Domestic Chinese HBM is rising</strong>: CXMT reaches 4 Tbps</li>
</ol>
<p><strong>HBM is not a supporting player; it is a basic utility of the AI era, as essential as water, electricity, and coal</strong>.</p>]]></content>
        <author>
            <name>AI Compute Cards Wiki Editorial</name>
            <uri>https://github.com/anomalyco/mirrorfrog</uri>
        </author>
        <category label="Tech Deep Dive" term="Tech Deep Dive"/>
        <category label="Vendor Strategy" term="Vendor Strategy"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[The Rack-Scale AI Era: NVL72 vs Helios vs Groq 3 LPX vs Trn3 UltraServer, Four Platforms Compared]]></title>
        <id>https://mirrorfrog.com/en/blog/rack-scale-ai-2026-nvl72-vs-helios-vs-lpx-vs-ultraserver</id>
        <link href="https://mirrorfrog.com/en/blog/rack-scale-ai-2026-nvl72-vs-helios-vs-lpx-vs-ultraserver"/>
        <updated>2026-05-20T00:00:00.000Z</updated>
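        <summary type="html"><![CDATA[In 2026, AI compute enters the rack-scale era: an in-depth comparison and selection guide covering NVIDIA Rubin NVL72 (72 GPUs; 576 in the NVL576 configuration), AMD Helios (72 MI400), Groq 3 LPX (256 LPUs), and AWS Trn3 UltraServer (144 chips).]]></summary>
        <content type="html"><![CDATA[<p><strong>In 2026, AI compute enters the rack-scale era</strong>. Chip-to-chip comparisons have receded, and whole-rack platforms are now the main battleground. This article compares five rack-scale platforms in depth: <strong>NVIDIA Rubin NVL72/NVL576, AMD Helios, Groq 3 LPX, AWS Trn3 UltraServer, and Google TPU 8t pod</strong>.</p>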
<!-- -->
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="为什么是机柜级时代">Why the Rack-Scale Era?<a href="https://mirrorfrog.com/en/blog/rack-scale-ai-2026-nvl72-vs-helios-vs-lpx-vs-ultraserver#%E4%B8%BA%E4%BB%80%E4%B9%88%E6%98%AF%E6%9C%BA%E6%9F%9C%E7%BA%A7%E6%97%B6%E4%BB%A3" class="hash-link" aria-label="Direct link to Why the Rack-Scale Era?" title="Direct link to Why the Rack-Scale Era?" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="单芯片对标的局限">The Limits of Chip-to-Chip Comparison<a href="https://mirrorfrog.com/en/blog/rack-scale-ai-2026-nvl72-vs-helios-vs-lpx-vs-ultraserver#%E5%8D%95%E8%8A%AF%E7%89%87%E5%AF%B9%E6%A0%87%E7%9A%84%E5%B1%80%E9%99%90" class="hash-link" aria-label="Direct link to The Limits of Chip-to-Chip Comparison" title="Direct link to The Limits of Chip-to-Chip Comparison" translate="no">​</a></h3>
<table><thead><tr><th>Per-chip metric</th><th>2018 (V100)</th><th>2024 (H100)</th><th>2026 (Rubin R200)</th><th>2028 (projected)</th></tr></thead><tbody><tr><td>Peak compute (precision differs by generation)</td><td>125 TFLOPS</td><td>989 TFLOPS</td><td>25 PFLOPS</td><td>80 PFLOPS</td></tr><tr><td>Memory</td><td>32 GB</td><td>80 GB</td><td>288 GB</td><td>1 TB</td></tr><tr><td>TDP</td><td>300 W</td><td>700 W</td><td>1,800 W</td><td>3,000 W</td></tr></tbody></table>
<blockquote>
<p><strong>Single-chip TDP is heading toward 3,000 W</strong>, a point where cooling, power delivery, and interconnect all run into physical limits.</p>
</blockquote>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="机柜级的优势">Advantages of Rack-Scale Design<a href="https://mirrorfrog.com/en/blog/rack-scale-ai-2026-nvl72-vs-helios-vs-lpx-vs-ultraserver#%E6%9C%BA%E6%9F%9C%E7%BA%A7%E7%9A%84%E4%BC%98%E5%8A%BF" class="hash-link" aria-label="Direct link to Advantages of Rack-Scale Design" title="Direct link to Advantages of Rack-Scale Design" translate="no">​</a></h3>
<ul>
<li class=""><strong>Unified cooling</strong>: rack-level liquid cooling with high thermal efficiency</li>
<li class=""><strong>Unified power</strong>: centralized power delivery, optimized for efficiency</li>
<li class=""><strong>Unified interconnect</strong>: NVLink 6 / UALoF / GroqSync / NeuronLink</li>
<li class=""><strong>Unified management</strong>: a single-system software stack</li>
<li class=""><strong>Unified procurement</strong>: purchased as a single SKU, which simplifies operations</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="五大机柜级方案">Five Rack-Scale Platforms<a href="https://mirrorfrog.com/en/blog/rack-scale-ai-2026-nvl72-vs-helios-vs-lpx-vs-ultraserver#%E4%BA%94%E5%A4%A7%E6%9C%BA%E6%9F%9C%E7%BA%A7%E6%96%B9%E6%A1%88" class="hash-link" aria-label="Direct link to Five Rack-Scale Platforms" title="Direct link to Five Rack-Scale Platforms" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-nvidia-rubin-nvl72--nvl576">1. NVIDIA Rubin NVL72 / NVL576<a href="https://mirrorfrog.com/en/blog/rack-scale-ai-2026-nvl72-vs-helios-vs-lpx-vs-ultraserver#1-nvidia-rubin-nvl72--nvl576" class="hash-link" aria-label="Direct link to 1. NVIDIA Rubin NVL72 / NVL576" title="Direct link to 1. NVIDIA Rubin NVL72 / NVL576" translate="no">​</a></h3>
<table><thead><tr><th>Item</th><th>Rubin NVL72</th><th><strong>Rubin NVL576</strong></th></tr></thead><tbody><tr><td>GPUs</td><td>72</td><td><strong>576</strong></td></tr><tr><td>CPUs</td><td>36</td><td>288</td></tr><tr><td>Total HBM</td><td>20.7 TB HBM4</td><td>165 TB HBM4</td></tr><tr><td>Memory bandwidth</td><td>1.6 PB/s</td><td>12.7 PB/s</td></tr><tr><td>Aggregate NVLink</td><td>252 TB/s</td><td>2,016 TB/s</td></tr><tr><td>FP4 sparse compute</td><td>3.6 EFLOPS</td><td><strong>28.8 EFLOPS</strong></td></tr><tr><td>FP8 sparse compute</td><td>1.8 EFLOPS</td><td>14.4 EFLOPS</td></tr><tr><td>Data center network</td><td>ConnectX-9, 1,152 ports</td><td>ConnectX-9, 1,152 ports</td></tr><tr><td>TDP (rack)</td><td>~130 kW</td><td><strong>~1 MW</strong></td></tr><tr><td>Cooling</td><td>Liquid</td><td>Liquid</td></tr><tr><td>Best for</td><td>Training 100B+ models</td><td><strong>1T+ giant models</strong></td></tr><tr><td>Price</td><td>~$3-5M</td><td><strong>~$25-40M</strong></td></tr><tr><td>Release</td><td>2026 H2</td><td>2026 H2+</td></tr></tbody></table>
<blockquote>
<p><strong>Rubin NVL576 = 28.8 EFLOPS FP4 (sparse) = 14.4 EFLOPS FP8 (sparse) = the world's most powerful AI supernode</strong></p>
</blockquote>
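<p>The NVL576 column is essentially the NVL72 column multiplied by eight. A minimal Python sketch of that check follows; the NVL72 values are copied from the table, while linear scaling across racks is an assumption (it ignores any cross-rack overhead), and the helper name <code>scale_rack</code> is hypothetical.</p>

```python
# Rubin NVL72 rack figures copied from the table.
NVL72 = {
    "gpus": 72,
    "cpus": 36,
    "hbm_tb": 20.7,
    "nvlink_tb_s": 252,
    "fp4_sparse_eflops": 3.6,
    "fp8_sparse_eflops": 1.8,
    "power_kw": 130,
}


def scale_rack(rack, factor):
    """Scale every per-rack figure linearly by the number of racks (an assumption)."""
    return {key: round(value * factor, 1) for key, value in rack.items()}


nvl576 = scale_rack(NVL72, 576 // 72)  # 8 racks' worth of GPUs
for key, value in nvl576.items():
    print(f"{key}: {value}")
```

<p>The scaled HBM and power figures land on 165.6 TB and 1,040 kW, which the table rounds to 165 TB and ~1 MW.</p>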
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-amd-helios-机柜">2. AMD Helios Rack<a href="https://mirrorfrog.com/en/blog/rack-scale-ai-2026-nvl72-vs-helios-vs-lpx-vs-ultraserver#2-amd-helios-%E6%9C%BA%E6%9F%9C" class="hash-link" aria-label="Direct link to 2. AMD Helios Rack" title="Direct link to 2. AMD Helios Rack" translate="no">​</a></h3>
<table><thead><tr><th>Item</th><th>Helios</th></tr></thead><tbody><tr><td>GPUs</td><td><strong>72 × MI400</strong></td></tr><tr><td>CPUs</td><td>36 × EPYC Venice</td></tr><tr><td>Total HBM</td><td>31.1 TB HBM4</td></tr><tr><td>Memory bandwidth</td><td>1.4 PB/s</td></tr><tr><td>Scale-up interconnect</td><td><strong>UALoF 260 TB/s</strong> (open standard)</td></tr><tr><td>Scale-out network</td><td>Pensando Vulcano 800G</td></tr><tr><td>FP4 dense compute</td><td><strong>2.88 EFLOPS</strong></td></tr><tr><td>FP8 dense compute</td><td>1.44 EFLOPS</td></tr><tr><td>TDP (rack)</td><td>~80 kW</td></tr><tr><td>Cooling</td><td>Liquid</td></tr><tr><td>Best for</td><td>Training 700B+ models</td></tr><tr><td>Price</td><td>~$2-3M</td></tr><tr><td>Release</td><td>2026</td></tr></tbody></table>
<blockquote>
<p><strong>Helios beats NVIDIA Rubin NVL72 on dense compute</strong> (2.88 vs ~1.8 EF FP4 dense, taking the NVL72 dense figure as half of its 3.6 EF sparse peak)</p>
</blockquote>
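<p>Because the NVL72 table quotes sparse peaks and the Helios table quotes dense peaks, a like-for-like comparison needs one normalization step. The Python sketch below assumes that a sparse peak is exactly twice the dense peak (typical of 2:1 structured sparsity); that ratio is an assumption, not a figure from either table, and the name <code>dense_eflops</code> is hypothetical.</p>

```python
# Assumption: sparse peak = 2 x dense peak (2:1 structured sparsity).
SPARSITY_SPEEDUP = 2.0

# (value in EFLOPS, quoted as sparse?) per precision, copied from the two rack tables.
RACKS = {
    "Rubin NVL72": {"FP4": (3.6, True), "FP8": (1.8, True)},
    "Helios": {"FP4": (2.88, False), "FP8": (1.44, False)},
}


def dense_eflops(value, is_sparse, speedup=SPARSITY_SPEEDUP):
    """Convert a quoted peak to a dense-equivalent figure."""
    return value / speedup if is_sparse else value


for precision in ("FP4", "FP8"):
    row = {name: dense_eflops(*specs[precision]) for name, specs in RACKS.items()}
    print(precision, row)
```

<p>Under this assumption Helios leads at both precisions on a dense basis, while NVL72 leads if its sparse peaks are taken at face value.</p>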
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-nvidia-groq-3-lpx-机柜推理专用">3. NVIDIA Groq 3 LPX Rack (Inference Only)<a href="https://mirrorfrog.com/en/blog/rack-scale-ai-2026-nvl72-vs-helios-vs-lpx-vs-ultraserver#3-nvidia-groq-3-lpx-%E6%9C%BA%E6%9F%9C%E6%8E%A8%E7%90%86%E4%B8%93%E7%94%A8" class="hash-link" aria-label="Direct link to 3. NVIDIA Groq 3 LPX Rack (Inference Only)" title="Direct link to 3. NVIDIA Groq 3 LPX Rack (Inference Only)" translate="no">​</a></h3>
<table><thead><tr><th>Item</th><th>Groq 3 LPX</th></tr></thead><tbody><tr><td>LPUs</td><td><strong>256 × Groq 3 LPU</strong></td></tr><tr><td>CPUs</td><td>None (standalone)</td></tr><tr><td>On-chip SRAM</td><td><strong>128 GB aggregate</strong></td></tr><tr><td>SRAM bandwidth</td><td><strong>40 PB/s</strong> (SRAM, not HBM)</td></tr><tr><td>Interconnect</td><td>GroqSync + NVLink-Network <strong>640 TB/s</strong></td></tr><tr><td>FP8 compute</td><td>~640 PFLOPS</td></tr><tr><td>INT8 compute</td><td>~640,000 TOPS</td></tr><tr><td>TDP (rack)</td><td>~80 kW</td></tr><tr><td><strong>TTFT (time to first token)</strong></td><td><strong>&lt; 20 ms</strong></td></tr><tr><td><strong>TPOT (time per output token)</strong></td><td><strong>&lt; 5 ms</strong></td></tr><tr><td>Cooling</td><td>Liquid</td></tr><tr><td>Best for</td><td><strong>Ultra-low-latency inference</strong> (agentic AI)</td></tr><tr><td>Price</td><td>~$8-10M</td></tr><tr><td>Release</td><td>2026 H2</td></tr></tbody></table>
<blockquote>
<p><strong>Groq 3 LPX is currently the only rack-scale LPU system designed specifically for agentic AI</strong></p>
</blockquote>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-aws-trn3-ultraserver">4. AWS Trn3 UltraServer<a href="https://mirrorfrog.com/en/blog/rack-scale-ai-2026-nvl72-vs-helios-vs-lpx-vs-ultraserver#4-aws-trn3-ultraserver" class="hash-link" aria-label="Direct link to 4. AWS Trn3 UltraServer" title="Direct link to 4. AWS Trn3 UltraServer" translate="no">​</a></h3>
<table><thead><tr><th>Item</th><th>Trn3 UltraServer</th></tr></thead><tbody><tr><td>Chips</td><td><strong>144 × Trainium 3</strong></td></tr><tr><td>Total HBM</td><td>~20.7 TB</td></tr><tr><td>NeuronLink-v4</td><td>All-to-all, &gt;10 TB/s bidirectional</td></tr><tr><td>FP8 dense compute</td><td><strong>~362 PFLOPS</strong></td></tr><tr><td>BF16 dense compute</td><td>~187 PFLOPS</td></tr><tr><td>TDP (rack)</td><td>~100 kW</td></tr><tr><td>Cooling</td><td>Liquid</td></tr><tr><td>Best for</td><td><strong>Training 400B+ models</strong></td></tr><tr><td>Price (estimated)</td><td>~$3-5M</td></tr><tr><td>Release</td><td>2025-12 GA</td></tr></tbody></table>
<blockquote>
<p><strong>Trn3 UltraServer = the most cost-effective platform for large-scale training</strong> (2-3× NVIDIA's performance per dollar)</p>
</blockquote>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="5-google-tpu-8t-pod">5. Google TPU 8t pod<a href="https://mirrorfrog.com/en/blog/rack-scale-ai-2026-nvl72-vs-helios-vs-lpx-vs-ultraserver#5-google-tpu-8t-pod" class="hash-link" aria-label="Direct link to 5. Google TPU 8t pod" title="Direct link to 5. Google TPU 8t pod" translate="no">​</a></h3>
<table><thead><tr><th>Item</th><th>TPU 8t pod</th></tr></thead><tbody><tr><td>Chips</td><td><strong>9,216 × TPU 8t</strong></td></tr><tr><td>Total HBM</td><td>~2 PB HBM</td></tr><tr><td>Interconnect</td><td>3D Torus</td></tr><tr><td>Integrated CPU</td><td>Arm Axion (64 cores per node)</td></tr><tr><td>BF16 dense compute</td><td><strong>~32 PFLOPS × 9,216 = 295 EFLOPS</strong></td></tr><tr><td>FP8 dense compute</td><td>~590 EFLOPS</td></tr><tr><td>Cooling</td><td>Liquid</td></tr><tr><td>Best for</td><td><strong>Gemini 3/4 training</strong></td></tr><tr><td>Price</td><td>Google Cloud only</td></tr><tr><td>Release</td><td>2026-04-22</td></tr></tbody></table>
<blockquote>
<p><strong>TPU 8t pod = the world's largest AI training cluster</strong> (9,216 chips × ~32 PFLOPS ≈ 295 EFLOPS BF16 dense)</p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="五大方案横向对比">Side-by-Side Comparison of the Five Platforms<a href="https://mirrorfrog.com/en/blog/rack-scale-ai-2026-nvl72-vs-helios-vs-lpx-vs-ultraserver#%E4%BA%94%E5%A4%A7%E6%96%B9%E6%A1%88%E6%A8%AA%E5%90%91%E5%AF%B9%E6%AF%94" class="hash-link" aria-label="Direct link to Side-by-Side Comparison of the Five Platforms" title="Direct link to Side-by-Side Comparison of the Five Platforms" translate="no">​</a></h2>
<table><thead><tr><th>Metric</th><th>NVIDIA NVL72</th><th>AMD Helios</th><th>Groq 3 LPX</th><th>Trn3 UltraServer</th><th>TPU 8t pod</th></tr></thead><tbody><tr><td><strong>Form factor</strong></td><td>Training rack</td><td>Training rack</td><td>Inference rack</td><td>Training rack</td><td>Training pod</td></tr><tr><td><strong>Chips</strong></td><td>72 GPUs</td><td>72 GPUs</td><td>256 LPUs</td><td>144 chips</td><td>9,216 chips</td></tr><tr><td><strong>Total memory</strong></td><td>20.7 TB HBM</td><td>31.1 TB HBM</td><td>128 GB SRAM</td><td>20.7 TB HBM</td><td>~2 PB HBM</td></tr><tr><td><strong>Interconnect</strong></td><td>NVLink 6 252 TB/s</td><td>UALoF 260 TB/s</td><td>GroqSync 640 TB/s</td><td>NeuronLink-v4</td><td>3D Torus</td></tr><tr><td><strong>FP4 compute</strong></td><td>3.6 EF (sparse)</td><td>2.88 EF (dense)</td><td>N/A</td><td>N/A</td><td>N/A</td></tr><tr><td><strong>FP8 compute</strong></td><td>1.8 EF (sparse)</td><td>1.44 EF (dense)</td><td>640 PF</td><td>~362 PF (dense)</td><td>590 EF (dense)</td></tr><tr><td><strong>TDP</strong></td><td>130 kW</td><td>80 kW</td><td>80 kW</td><td>100 kW</td><td>~10 MW (pod)</td></tr><tr><td><strong>TTFT</strong></td><td>~100 ms</td><td>~100 ms</td><td><strong>&lt; 20 ms</strong></td><td>~100 ms</td><td>~100 ms</td></tr><tr><td><strong>Ecosystem</strong></td><td>CUDA 13</td><td>ROCm 8</td><td>Groq SDK</td><td>Neuron 3</td><td>JAX 0.5+</td></tr><tr><td><strong>Price</strong></td><td>$3-5M</td><td>$2-3M</td><td>$8-10M</td><td>$3-5M</td><td>Internal use</td></tr><tr><td><strong>Customers</strong></td><td>All clouds + direct customers</td><td>Direct customers + clouds</td><td>Direct customers + clouds</td><td>AWS Cloud</td><td>Google Cloud</td></tr><tr><td><strong>Open standard</strong></td><td>❌ NVLink (closed)</td><td>✅ UALoF (open)</td><td>❌ GroqSync</td><td>❌ NeuronLink</td><td>❌ Torus</td></tr><tr><td><strong>Release</strong></td><td>2026 H2</td><td>2026</td><td>2026 H2</td><td>2025-12 GA</td><td>2026-04</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="选型建议">Selection Recommendations<a href="https://mirrorfrog.com/en/blog/rack-scale-ai-2026-nvl72-vs-helios-vs-lpx-vs-ultraserver#%E9%80%89%E5%9E%8B%E5%BB%BA%E8%AE%AE" class="hash-link" aria-label="Direct link to Selection Recommendations" title="Direct link to Selection Recommendations" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="大规模训练">Large-Scale Training<a href="https://mirrorfrog.com/en/blog/rack-scale-ai-2026-nvl72-vs-helios-vs-lpx-vs-ultraserver#%E5%A4%A7%E8%A7%84%E6%A8%A1%E8%AE%AD%E7%BB%83" class="hash-link" aria-label="Direct link to Large-Scale Training" title="Direct link to Large-Scale Training" translate="no">​</a></h3>
<table><thead><tr><th>Scenario</th><th>Recommended platform</th><th>Rationale</th></tr></thead><tbody><tr><td><strong>Training 100B-700B models</strong></td><td>NVIDIA Rubin NVL72</td><td>A single rack holds a 100B model; strongest FP4 compute</td></tr><tr><td><strong>Training 700B-1T models</strong></td><td>NVIDIA Rubin NVL576 or AMD Helios</td><td>Multi-rack interconnect</td></tr><tr><td><strong>Training 1T+ giant models</strong></td><td>8 × NVIDIA NVL576</td><td>28.8 EFLOPS × 8 = 230 EFLOPS</td></tr><tr><td><strong>Hyperscale (Gemini class)</strong></td><td>Google TPU 8t pod (9,216 chips)</td><td>Google Cloud only</td></tr><tr><td><strong>Training inside AWS</strong></td><td>Trn3 UltraServer</td><td>Best price-performance</td></tr><tr><td><strong>Preference for an open ecosystem</strong></td><td>AMD Helios</td><td>UALoF open interconnect</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="超低延迟推理">Ultra-Low-Latency Inference<a href="https://mirrorfrog.com/en/blog/rack-scale-ai-2026-nvl72-vs-helios-vs-lpx-vs-ultraserver#%E8%B6%85%E4%BD%8E%E5%BB%B6%E8%BF%9F%E6%8E%A8%E7%90%86" class="hash-link" aria-label="Direct link to Ultra-Low-Latency Inference" title="Direct link to Ultra-Low-Latency Inference" translate="no">​</a></h3>
<table><thead><tr><th>Scenario</th><th>Recommended platform</th><th>Rationale</th></tr></thead><tbody><tr><td><strong>Agentic AI (1,000+ calls/s)</strong></td><td>Groq 3 LPX</td><td>TTFT &lt; 20 ms; the only option</td></tr><tr><td><strong>Real-time Code Gen</strong> (Copilot)</td><td>Groq 3 LPX</td><td>Responses within 100 ms</td></tr><tr><td><strong>Trillion-parameter inference</strong></td><td>NVIDIA Rubin R200 + Groq 3 LPX working together</td><td>GPU for training + LPU for inference</td></tr><tr><td><strong>Single 70B model inference</strong></td><td>TPU 8i (288 GB HBM)</td><td>A 70B model in FP16 fits on one card</td></tr><tr><td><strong>Real-time multimodal inference</strong></td><td>TPU 8i (air-cooled)</td><td>Flexible cooling</td></tr></tbody></table>
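<p>The claim that a 70B model in FP16 fits on a single 288 GB card is a simple weight-footprint calculation, sketched below in Python. The bytes-per-parameter values follow directly from the precision widths; the 80% headroom factor, which reserves memory for KV cache and activations, is an assumption chosen for illustration, and the function names are hypothetical.</p>

```python
# Bytes per parameter for each weight precision.
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "FP4": 0.5}


def weight_gb(params_billion, precision):
    """Weight footprint in GB (1 GB = 1e9 bytes), excluding KV cache and activations."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits(params_billion, precision, hbm_gb, headroom=0.8):
    """True if the weights fit within a fraction of HBM; headroom is an assumed reserve."""
    return hbm_gb * headroom >= weight_gb(params_billion, precision)


print(weight_gb(70, "FP16"))   # 140.0 GB of weights
print(fits(70, "FP16", 288))   # True: fits on one 288 GB card
print(fits(1000, "FP8", 288))  # False: a 1T model needs multiple devices
```

<p>The same check shows why trillion-parameter inference is listed as a multi-device scenario: even at FP4, 1T parameters need about 500 GB for weights alone.</p>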
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="成本敏感训练">Cost-Sensitive Training<a href="https://mirrorfrog.com/en/blog/rack-scale-ai-2026-nvl72-vs-helios-vs-lpx-vs-ultraserver#%E6%88%90%E6%9C%AC%E6%95%8F%E6%84%9F%E8%AE%AD%E7%BB%83" class="hash-link" aria-label="Direct link to Cost-Sensitive Training" title="Direct link to Cost-Sensitive Training" translate="no">​</a></h3>
<table><thead><tr><th>Scenario</th><th>Recommended platform</th><th>Rationale</th></tr></thead><tbody><tr><td><strong>Training at the 10B-parameter scale</strong></td><td>AWS Trn3 UltraServer</td><td>2-3× NVIDIA's performance per dollar</td></tr><tr><td><strong>Hyperscale (outside Gemini)</strong></td><td>AWS Trn3 UltraServer × N</td><td>$3-5M per rack</td></tr><tr><td><strong>70B fine-tuning</strong></td><td>Single AMD Helios rack</td><td>Price-performance + open ecosystem</td></tr><tr><td><strong>Training at the 100B-parameter scale</strong></td><td>Trn3 UltraServer × 3</td><td>144 × 3 = 432 chips</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="机柜级的未来趋势">Future Trends in Rack-Scale Systems<a href="https://mirrorfrog.com/en/blog/rack-scale-ai-2026-nvl72-vs-helios-vs-lpx-vs-ultraserver#%E6%9C%BA%E6%9F%9C%E7%BA%A7%E7%9A%84%E6%9C%AA%E6%9D%A5%E8%B6%8B%E5%8A%BF" class="hash-link" aria-label="Direct link to Future Trends in Rack-Scale Systems" title="Direct link to Future Trends in Rack-Scale Systems" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-单机柜算力持续增长">1. Per-Rack Compute Keeps Growing<a href="https://mirrorfrog.com/en/blog/rack-scale-ai-2026-nvl72-vs-helios-vs-lpx-vs-ultraserver#1-%E5%8D%95%E6%9C%BA%E6%9F%9C%E7%AE%97%E5%8A%9B%E6%8C%81%E7%BB%AD%E5%A2%9E%E9%95%BF" class="hash-link" aria-label="Direct link to 1. Per-Rack Compute Keeps Growing" title="Direct link to 1. Per-Rack Compute Keeps Growing" translate="no">​</a></h3>
<table><thead><tr><th>Year</th><th>Per-rack compute</th><th>Mainstream platform</th></tr></thead><tbody><tr><td>2024</td><td>~0.2 EFLOPS FP8</td><td>GB200 NVL72</td></tr><tr><td>2026</td><td>1.4-1.8 EFLOPS FP8</td><td>Rubin NVL72 / Helios</td></tr><tr><td>2028</td><td>8-15 EFLOPS FP8</td><td>Rubin Ultra NVL72 / MI500</td></tr><tr><td>2030</td><td>30-50 EFLOPS FP8</td><td>Feynman era</td></tr></tbody></table>
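<p>The growth rate implied by this trend table can be estimated with a compound annual growth rate. The Python sketch below uses 1.8 EFLOPS FP8 for 2026 and the 30-50 EFLOPS range for 2030 from the table; treating the trend as smooth compound growth is an assumption, and the helper name <code>cagr</code> is hypothetical.</p>

```python
def cagr(start, end, years):
    """Compound annual growth rate between two values, as a fraction."""
    return (end / start) ** (1 / years) - 1


# Per-rack FP8 compute in EFLOPS, taken from the trend table.
START_2026 = 1.8
END_2030_LOW, END_2030_HIGH = 30, 50

low = cagr(START_2026, END_2030_LOW, 2030 - 2026)
high = cagr(START_2026, END_2030_HIGH, 2030 - 2026)
print(f"Implied growth: {low:.0%} to {high:.0%} per year")
```

<p>Both ends of the range imply per-rack compute more than doubling every year, which is the pace the table's projections rely on.</p>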
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-多机柜互联标准竞争">2. Competing Multi-Rack Interconnect Standards<a href="https://mirrorfrog.com/en/blog/rack-scale-ai-2026-nvl72-vs-helios-vs-lpx-vs-ultraserver#2-%E5%A4%9A%E6%9C%BA%E6%9F%9C%E4%BA%92%E8%81%94%E6%A0%87%E5%87%86%E7%AB%9E%E4%BA%89" class="hash-link" aria-label="Direct link to 2. Competing Multi-Rack Interconnect Standards" title="Direct link to 2. Competing Multi-Rack Interconnect Standards" translate="no">​</a></h3>
<table><thead><tr><th>Standard</th><th>Vendor</th><th>Status</th></tr></thead><tbody><tr><td><strong>NVLink Network</strong></td><td>NVIDIA</td><td>Closed; the mainstay in 2026</td></tr><tr><td><strong>UALoF</strong></td><td>AMD/Broadcom/Intel</td><td>Open; debuts with Helios in 2026</td></tr><tr><td><strong>UALink</strong></td><td>Consortium</td><td>Evolution of UALoF</td></tr><tr><td><strong>NeuronLink</strong></td><td>AWS</td><td>Proprietary</td></tr><tr><td><strong>GroqSync</strong></td><td>Groq (NVIDIA)</td><td>Proprietary; ultra-low latency</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-软件生态分层">3. A Layered Software Ecosystem<a href="https://mirrorfrog.com/en/blog/rack-scale-ai-2026-nvl72-vs-helios-vs-lpx-vs-ultraserver#3-%E8%BD%AF%E4%BB%B6%E7%94%9F%E6%80%81%E5%88%86%E5%B1%82" class="hash-link" aria-label="Direct link to 3. A Layered Software Ecosystem" title="Direct link to 3. A Layered Software Ecosystem" translate="no">​</a></h3>
<ul>
<li class=""><strong>Training frameworks</strong>: PyTorch + JAX + Megatron</li>
<li class=""><strong>Inference engines</strong>: vLLM + TensorRT-LLM + SGLang</li>
<li class=""><strong>Resource scheduling</strong>: Slurm + Kubernetes + Ray</li>
<li class=""><strong>Multi-rack management</strong>: NVIDIA Base Command / AMD ROCm RunTime</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="详细产品页">Detailed Product Pages<a href="https://mirrorfrog.com/en/blog/rack-scale-ai-2026-nvl72-vs-helios-vs-lpx-vs-ultraserver#%E8%AF%A6%E7%BB%86%E4%BA%A7%E5%93%81%E9%A1%B5" class="hash-link" aria-label="Direct link to Detailed Product Pages" title="Direct link to Detailed Product Pages" translate="no">​</a></h2>
<ul>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/nvidia/rubin-r200">NVIDIA Rubin R200 (NVL72 / NVL576)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/nvidia/groq-3-lpx">NVIDIA Groq 3 LPX</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/amd/mi400">AMD MI400 (Helios rack)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/aws/trainium-3">AWS Trainium 3 (Trn3 UltraServer)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/google/tpu-8t">Google TPU 8t (9,216 pod)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/google/tpu-8i">Google TPU 8i (inference only)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/roadmap">Future roadmap</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="总结">Summary<a href="https://mirrorfrog.com/en/blog/rack-scale-ai-2026-nvl72-vs-helios-vs-lpx-vs-ultraserver#%E6%80%BB%E7%BB%93" class="hash-link" aria-label="Direct link to Summary" title="Direct link to Summary" translate="no">​</a></h2>
<p>In 2026, the <strong>main battleground</strong> for AI compute is rack-scale platforms:</p>
<ol>
<li class=""><strong>NVIDIA Rubin NVL72/NVL576</strong>: strongest for training, 3.6/28.8 EFLOPS FP4 (sparse)</li>
<li class=""><strong>AMD Helios</strong>: open ecosystem, leads on dense compute</li>
<li class=""><strong>Groq 3 LPX</strong>: ultra-low-latency inference, TTFT &lt; 20 ms</li>
<li class=""><strong>AWS Trn3 UltraServer</strong>: best price-performance, 2-3× per dollar</li>
<li class=""><strong>Google TPU 8t pod</strong>: hyperscale, a 9,216-chip cluster</li>
</ol>
<p><strong>There is no single best option, only the best fit</strong>. Selection should weigh:</p>
<ul>
<li class="">Model size (100B / 700B / 1T+)</li>
<li class="">Training vs inference</li>
<li class="">Latency requirements (standard vs agentic)</li>
<li class="">Ecosystem preference (CUDA / ROCm / JAX / Neuron)</li>
<li class="">Budget ($2-10M per rack)</li>
<li class="">Deployment location (on-premises / cloud)</li>
</ul>]]></content>
        <author>
            <name>AI Compute Cards Wiki Editorial</name>
            <uri>https://github.com/anomalyco/mirrorfrog</uri>
        </author>
        <category label="Tech Deep Dive" term="Tech Deep Dive"/>
        <category label="Selection Guide" term="Selection Guide"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Intel Cancels Falcon Shores and Pivots to Jaguar Shores: From Chip-to-Chip Competition to Rack-Scale Systems]]></title>
        <id>https://mirrorfrog.com/en/blog/intel-cancels-falcon-shores-pivots-to-jaguar-shores</id>
        <link href="https://mirrorfrog.com/en/blog/intel-cancels-falcon-shores-pivots-to-jaguar-shores"/>
        <updated>2026-05-14T00:00:00.000Z</updated>
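        <summary type="html"><![CDATA[On 2026-05-14, Intel's earnings disclosure confirmed the cancellation of the Falcon Shores single-chip GPU program and a pivot to the rack-scale Jaguar Shores system project (launching 2027-2028). Intel's AI strategy shifts from "competing head-on with NVIDIA" to "rack-scale systems + foundry".]]></summary>
        <content type="html"><![CDATA[<p>On <strong>May 14, 2026</strong>, Intel's Q1 earnings disclosure <strong>officially cancelled the Falcon Shores single-chip GPU program</strong> and confirmed that a new rack-scale AI system project, <strong>Jaguar Shores</strong>, will launch in 2027-2028. This is a major adjustment to Intel's AI strategy; this article analyzes the reasons behind it and what comes next.</p>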
<!-- -->
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="falcon-shores-取消事件">Falcon Shores Cancellation Timeline<a href="https://mirrorfrog.com/en/blog/intel-cancels-falcon-shores-pivots-to-jaguar-shores#falcon-shores-%E5%8F%96%E6%B6%88%E4%BA%8B%E4%BB%B6" class="hash-link" aria-label="Direct link to Falcon Shores 取消事件" title="Direct link to Falcon Shores 取消事件" translate="no">​</a></h2>
<table><thead><tr><th>Date</th><th>Event</th><th>Details</th></tr></thead><tbody><tr><td><strong>2023</strong></td><td>First announced</td><td>200 PFLOPS single chip positioned against B100</td></tr><tr><td><strong>2024-12</strong></td><td>Roadmap revised</td><td>200 PF target dropped in favor of a "system-level" approach</td></tr><tr><td><strong>2026-05-14</strong></td><td><strong>Officially cancelled</strong></td><td>Intel's earnings disclosure confirms the Falcon Shores cancellation</td></tr><tr><td><strong>2026-05-14</strong></td><td>Pivot to Jaguar Shores</td><td>New rack-scale system project confirmed</td></tr><tr><td><strong>2027-2028</strong></td><td>Expected launch</td><td>Jaguar Shores rack-scale system</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="为什么取消-falcon-shores">Why Was Falcon Shores Cancelled?<a href="https://mirrorfrog.com/en/blog/intel-cancels-falcon-shores-pivots-to-jaguar-shores#%E4%B8%BA%E4%BB%80%E4%B9%88%E5%8F%96%E6%B6%88-falcon-shores" class="hash-link" aria-label="Direct link to 为什么取消 Falcon Shores？" title="Direct link to 为什么取消 Falcon Shores？" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-第三次季度亏损">1. A Third Quarterly Loss<a href="https://mirrorfrog.com/en/blog/intel-cancels-falcon-shores-pivots-to-jaguar-shores#1-%E7%AC%AC%E4%B8%89%E6%AC%A1%E5%AD%A3%E5%BA%A6%E4%BA%8F%E6%8D%9F" class="hash-link" aria-label="Direct link to 1. 第三次季度亏损" title="Direct link to 1. 第三次季度亏损" translate="no">​</a></h3>
<p>Q1 2026 marked Intel's third quarterly loss:</p>
<ul>
<li class="">Revenue: $12.7B (down 7% year over year)</li>
<li class="">Net loss: $1.6B</li>
<li class="">AI business (Habana): revenue of only $0.4B, well below expectations</li>
</ul>
<p><strong>With R&amp;D budgets this tight, Intel cannot fund Falcon Shores, Gaudi, Xeon, and the 18A process all at once.</strong></p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-单芯片-200-pf-不现实">2. A 200 PF Single Chip Is Unrealistic<a href="https://mirrorfrog.com/en/blog/intel-cancels-falcon-shores-pivots-to-jaguar-shores#2-%E5%8D%95%E8%8A%AF%E7%89%87-200-pf-%E4%B8%8D%E7%8E%B0%E5%AE%9E" class="hash-link" aria-label="Direct link to 2. 单芯片 200 PF 不现实" title="Direct link to 2. 单芯片 200 PF 不现实" translate="no">​</a></h3>
<p>Falcon Shores was originally planned as a 200 PFLOPS single chip launching in 2025, positioned against B100.</p>
<p>By 2026, however, the industry reality is:</p>
<ul>
<li class="">NVIDIA Rubin R200, at 50 PFLOPS FP4 sparse (25 PF dense) per chip, is already the practical ceiling</li>
<li class=""><strong>A 200 PF single chip is physically infeasible</strong> (power, die area, and HBM capacity all fall short)</li>
<li class=""><strong>The industry has moved to rack-scale systems</strong> (NVL72, Helios, UltraServer)</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-hbm-供应紧张">3. Tight HBM Supply<a href="https://mirrorfrog.com/en/blog/intel-cancels-falcon-shores-pivots-to-jaguar-shores#3-hbm-%E4%BE%9B%E5%BA%94%E7%B4%A7%E5%BC%A0" class="hash-link" aria-label="Direct link to 3. HBM 供应紧张" title="Direct link to 3. HBM 供应紧张" translate="no">​</a></h3>
<p>HBM supply is tight, and NVIDIA is served first:</p>
<ul>
<li class=""><strong>SK Hynix</strong>: 70% of capacity allocated to NVIDIA</li>
<li class=""><strong>Micron</strong>: 60% of capacity allocated to NVIDIA</li>
<li class=""><strong>Samsung</strong>: share is being squeezed</li>
</ul>
<p><strong>Intel would struggle to secure enough HBM to supply a 200 PF single-chip program.</strong></p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-市场转向机柜级">4. The Market Has Moved to Rack Scale<a href="https://mirrorfrog.com/en/blog/intel-cancels-falcon-shores-pivots-to-jaguar-shores#4-%E5%B8%82%E5%9C%BA%E8%BD%AC%E5%90%91%E6%9C%BA%E6%9F%9C%E7%BA%A7" class="hash-link" aria-label="Direct link to 4. 市场转向机柜级" title="Direct link to 4. 市场转向机柜级" translate="no">​</a></h3>
<p>By 2026 the AI compute market has moved to rack scale:</p>
<ul>
<li class="">NVIDIA Rubin NVL72 (72 GPUs)</li>
<li class="">AMD Helios (72 MI400)</li>
<li class="">AWS Trn3 UltraServer (144 chips)</li>
<li class="">Google TPU 8t pod (9,216 chips)</li>
</ul>
<p><strong>Chip-versus-chip comparisons no longer matter; rack scale is the main battleground.</strong></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="jaguar-shoresintel-的机柜级反击">Jaguar Shores: Intel's Rack-Scale Counterattack<a href="https://mirrorfrog.com/en/blog/intel-cancels-falcon-shores-pivots-to-jaguar-shores#jaguar-shoresintel-%E7%9A%84%E6%9C%BA%E6%9F%9C%E7%BA%A7%E5%8F%8D%E5%87%BB" class="hash-link" aria-label="Direct link to Jaguar Shores：Intel 的机柜级反击" title="Direct link to Jaguar Shores：Intel 的机柜级反击" translate="no">​</a></h2>
<table><thead><tr><th>Item</th><th>Specification (speculative)</th></tr></thead><tbody><tr><td><strong>Form factor</strong></td><td><strong>Rack-scale system</strong> (not a single chip)</td></tr><tr><td><strong>AI accelerators per rack</strong></td><td>64-128 (speculative)</td></tr><tr><td><strong>CPUs per rack</strong></td><td>32-64 Xeon</td></tr><tr><td><strong>AI accelerator IP</strong></td><td><strong>Gaudi v4 architecture</strong> (evolved from Gaudi 3)</td></tr><tr><td><strong>Process</strong></td><td><strong>Intel Foundry 18A</strong></td></tr><tr><td><strong>HBM capacity (per accelerator)</strong></td><td>144 GB HBM3e</td></tr><tr><td><strong>HBM bandwidth (per accelerator)</strong></td><td>~5 TB/s</td></tr><tr><td><strong>FP8 compute (per accelerator)</strong></td><td>~2,500 TFLOPS (speculative)</td></tr><tr><td><strong>FP8 compute (per rack)</strong></td><td>~160-320 PFLOPS (64-128 accelerators × ~2.5 PFLOPS)</td></tr><tr><td><strong>Networking</strong></td><td><strong>Integrated 800G NIC</strong></td></tr><tr><td><strong>TDP (per rack)</strong></td><td>~80-120 kW</td></tr><tr><td><strong>First availability</strong></td><td><strong>2027-2028</strong></td></tr></tbody></table>
<blockquote>
<p>⚠️ <strong>Not officially announced</strong>: all specifications above are speculative; <strong>Intel has made only roadmap-level disclosures</strong>. All figures are subject to Intel's later announcements.</p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="intel-ai-战略重组2026-05">Intel AI Strategy Reorganization (2026-05)<a href="https://mirrorfrog.com/en/blog/intel-cancels-falcon-shores-pivots-to-jaguar-shores#intel-ai-%E6%88%98%E7%95%A5%E9%87%8D%E7%BB%842026-05" class="hash-link" aria-label="Direct link to Intel AI 战略重组（2026-05）" title="Direct link to Intel AI 战略重组（2026-05）" translate="no">​</a></h2>
<table><thead><tr><th>Strategy</th><th>Details</th></tr></thead><tbody><tr><td><strong>Gaudi product line</strong></td><td>Continue shipping Gaudi 3 / Gaudi 3E (maintained through 2026)</td></tr><tr><td><strong>Falcon Shores</strong></td><td>❌ <strong>Cancelled</strong></td></tr><tr><td><strong>Jaguar Shores</strong></td><td>✅ <strong>Relaunched as a rack-scale AI system</strong></td></tr><tr><td><strong>Foundry services</strong></td><td>Intel Foundry 18A manufacturing for NVIDIA / AMD / AWS</td></tr><tr><td><strong>x86 leadership</strong></td><td>Strengthen Xeon 6/7 (the leading AI server CPUs)</td></tr><tr><td><strong>Habana brand</strong></td><td>Retained; Jaguar Shores integrates Gaudi IP</td></tr></tbody></table>
<blockquote>
<p><strong>Intel will no longer build AI GPUs to compete head-on with NVIDIA</strong>:</p>
<ul>
<li class="">Short term: maintain Gaudi 3 (competing on price-performance)</li>
<li class="">Mid term: Jaguar Shores at the system level (competing at rack scale)</li>
<li class="">Long term: Intel Foundry 18A manufactures for AI vendors (<strong>Intel as the "AI foundry"</strong>)</li>
</ul>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="intel-foundry-18a-战略">Intel Foundry 18A Strategy<a href="https://mirrorfrog.com/en/blog/intel-cancels-falcon-shores-pivots-to-jaguar-shores#intel-foundry-18a-%E6%88%98%E7%95%A5" class="hash-link" aria-label="Direct link to Intel Foundry 18A 战略" title="Direct link to Intel Foundry 18A 战略" translate="no">​</a></h2>
<p>Intel's real "AI endgame" is <strong>foundry</strong>:</p>
<table><thead><tr><th>Customer</th><th>Product on 18A</th></tr></thead><tbody><tr><td><strong>NVIDIA</strong></td><td>Post-Rubin generations (2027+)</td></tr><tr><td><strong>AMD</strong></td><td>Post-MI500 generations (2028+)</td></tr><tr><td><strong>AWS</strong></td><td>Trainium 4 (2027)</td></tr><tr><td><strong>Microsoft</strong></td><td>Maia 2 (2026)</td></tr></tbody></table>
<blockquote>
<p><strong>If Intel Foundry 18A yields reach TSMC N3 levels</strong>, Intel would go from "AI GPU also-ran" to "dominant AI compute foundry".</p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="对-intel-客户的影响">Impact on Intel Customers<a href="https://mirrorfrog.com/en/blog/intel-cancels-falcon-shores-pivots-to-jaguar-shores#%E5%AF%B9-intel-%E5%AE%A2%E6%88%B7%E7%9A%84%E5%BD%B1%E5%93%8D" class="hash-link" aria-label="Direct link to 对 Intel 客户的影响" title="Direct link to 对 Intel 客户的影响" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="gaudi-3--gaudi-3e短期">Gaudi 3 / Gaudi 3E (Short Term)<a href="https://mirrorfrog.com/en/blog/intel-cancels-falcon-shores-pivots-to-jaguar-shores#gaudi-3--gaudi-3e%E7%9F%AD%E6%9C%9F" class="hash-link" aria-label="Direct link to Gaudi 3 / Gaudi 3E（短期）" title="Direct link to Gaudi 3 / Gaudi 3E（短期）" translate="no">​</a></h3>
<ul>
<li class="">Launched in 2024, with better price-performance than NVIDIA H100</li>
<li class="">Maintained through 2026 as Intel's primary AI training chip</li>
<li class="">Main customers: some enterprises, plus government and telecom</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="jaguar-shores中期">Jaguar Shores (Mid Term)<a href="https://mirrorfrog.com/en/blog/intel-cancels-falcon-shores-pivots-to-jaguar-shores#jaguar-shores%E4%B8%AD%E6%9C%9F" class="hash-link" aria-label="Direct link to Jaguar Shores（中期）" title="Direct link to Jaguar Shores（中期）" translate="no">​</a></h3>
<ul>
<li class="">Launches 2027-2028</li>
<li class="">Suited to <strong>rack-scale training</strong></li>
<li class="">Main customers: government, telecom, supercomputing centers</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="intel-foundry-18a长期">Intel Foundry 18A (Long Term)<a href="https://mirrorfrog.com/en/blog/intel-cancels-falcon-shores-pivots-to-jaguar-shores#intel-foundry-18a%E9%95%BF%E6%9C%9F" class="hash-link" aria-label="Direct link to Intel Foundry 18A（长期）" title="Direct link to Intel Foundry 18A（长期）" translate="no">​</a></h3>
<ul>
<li class="">Volume production in 2027 (speculative)</li>
<li class="">Customers: NVIDIA, AMD, AWS, Microsoft</li>
<li class=""><strong>Intel's real source of AI revenue</strong></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="对-ai-行业的影响">Impact on the AI Industry<a href="https://mirrorfrog.com/en/blog/intel-cancels-falcon-shores-pivots-to-jaguar-shores#%E5%AF%B9-ai-%E8%A1%8C%E4%B8%9A%E7%9A%84%E5%BD%B1%E5%93%8D" class="hash-link" aria-label="Direct link to 对 AI 行业的影响" title="Direct link to 对 AI 行业的影响" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-ai-芯片竞争格局变化">1. A Shifting AI Chip Competitive Landscape<a href="https://mirrorfrog.com/en/blog/intel-cancels-falcon-shores-pivots-to-jaguar-shores#1-ai-%E8%8A%AF%E7%89%87%E7%AB%9E%E4%BA%89%E6%A0%BC%E5%B1%80%E5%8F%98%E5%8C%96" class="hash-link" aria-label="Direct link to 1. AI 芯片竞争格局变化" title="Direct link to 1. AI 芯片竞争格局变化" translate="no">​</a></h3>
<table><thead><tr><th>Vendor</th><th>2025 positioning</th><th>2026+ positioning</th></tr></thead><tbody><tr><td><strong>NVIDIA</strong></td><td>GPU leader</td><td><strong>GPU + LPU + system level</strong> (strongest)</td></tr><tr><td><strong>AMD</strong></td><td>GPU runner-up</td><td><strong>GPU + rack-scale UALoF</strong></td></tr><tr><td><strong>Intel</strong></td><td>Failed single-chip effort</td><td><strong>Rack scale + foundry</strong></td></tr><tr><td><strong>Google</strong></td><td>TPU-specific</td><td><strong>TPU split into separate training and inference lines</strong></td></tr><tr><td><strong>AWS</strong></td><td>In-house Trainium</td><td><strong>3nm + UltraServer</strong></td></tr><tr><td><strong>Huawei</strong></td><td>Domestic (China) alternative</td><td><strong>3× H20 + system level</strong></td></tr><tr><td><strong>Cerebras</strong></td><td>Wafer scale</td><td><strong>IPO + WSE-4</strong></td></tr><tr><td><strong>Groq (NVIDIA)</strong></td><td>Independent LPU vendor</td><td><strong>Acquired and integrated by NVIDIA</strong></td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-ualof-开放互联加速">2. UALoF Open Interconnect Gains Momentum<a href="https://mirrorfrog.com/en/blog/intel-cancels-falcon-shores-pivots-to-jaguar-shores#2-ualof-%E5%BC%80%E6%94%BE%E4%BA%92%E8%81%94%E5%8A%A0%E9%80%9F" class="hash-link" aria-label="Direct link to 2. UALoF 开放互联加速" title="Direct link to 2. UALoF 开放互联加速" translate="no">​</a></h3>
<p>With Intel joining the UALoF consortium:</p>
<ul>
<li class="">AMD, Intel, and Broadcom jointly push UALoF</li>
<li class="">This challenges NVIDIA's closed NVLink ecosystem</li>
<li class=""><strong>UALoF may become an open standard in 2027-2028</strong></li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-ai-代工业务竞争">3. Competition in AI Foundry Services<a href="https://mirrorfrog.com/en/blog/intel-cancels-falcon-shores-pivots-to-jaguar-shores#3-ai-%E4%BB%A3%E5%B7%A5%E4%B8%9A%E5%8A%A1%E7%AB%9E%E4%BA%89" class="hash-link" aria-label="Direct link to 3. AI 代工业务竞争" title="Direct link to 3. AI 代工业务竞争" translate="no">​</a></h3>
<ul>
<li class=""><strong>TSMC still leads</strong>: ahead on 3nm / 2nm processes</li>
<li class=""><strong>Intel Foundry 18A is catching up</strong>: trial production in 2026, volume production in 2027</li>
<li class=""><strong>Samsung Foundry</strong>: 3nm GAA process in volume production, but few customers</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="详细产品页">Detailed Product Pages<a href="https://mirrorfrog.com/en/blog/intel-cancels-falcon-shores-pivots-to-jaguar-shores#%E8%AF%A6%E7%BB%86%E4%BA%A7%E5%93%81%E9%A1%B5" class="hash-link" aria-label="Direct link to 详细产品页" title="Direct link to 详细产品页" translate="no">​</a></h2>
<ul>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/intel/jaguar-shores">Intel Jaguar Shores full specifications</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/intel/gaudi-3">Intel Gaudi 3 (current flagship)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/intel/gaudi-2">Intel Gaudi 2 (previous generation)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/nvidia/rubin-r200">NVIDIA Rubin R200 (same generation)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/amd/mi400">AMD MI400 (same generation)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/roadmap">Roadmap</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="总结">Summary<a href="https://mirrorfrog.com/en/blog/intel-cancels-falcon-shores-pivots-to-jaguar-shores#%E6%80%BB%E7%BB%93" class="hash-link" aria-label="Direct link to 总结" title="Direct link to 总结" translate="no">​</a></h2>
<p>Intel's cancellation of Falcon Shores and pivot to Jaguar Shores is one of the <strong>most significant strategic shifts</strong> in the AI chip industry in 2026:</p>
<ol>
<li class=""><strong>Acknowledging that matching NVIDIA chip for chip is unrealistic</strong></li>
<li class=""><strong>Moving to rack-scale systems</strong> (the same direction as AMD Helios)</li>
<li class=""><strong>Strengthening Intel Foundry 18A</strong> (the real long-term strategy)</li>
<li class=""><strong>Integrating Gaudi IP into Jaguar Shores</strong></li>
<li class=""><strong>Joining the UALoF open interconnect consortium</strong> (challenging NVLink)</li>
</ol>
<p>Intel's AI strategy has shifted from "competing head-on with NVIDIA" to "rack-scale systems + AI foundry", a <strong>pragmatic strategic adjustment</strong>. Over the next five years, the success or failure of Intel Foundry 18A will determine Intel's ultimate fate in the AI era.</p>]]></content>
        <author>
            <name>AI Compute Cards Wiki Editorial</name>
            <uri>https://github.com/anomalyco/mirrorfrog</uri>
        </author>
        <category label="Vendor Strategy" term="Vendor Strategy"/>
        <category label="Industry News" term="Industry News"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[The Evolution of Inference Optimization: A Deep Dive into PagedAttention / FlashAttention / Speculative Decoding]]></title>
        <id>https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding</id>
        <link href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding"/>
        <updated>2026-04-30T00:00:00.000Z</updated>
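        <summary type="html"><![CDATA[The three core techniques of LLM inference optimization: PagedAttention (vLLM) + FlashAttention + Speculative Decoding. Principles, implementation, performance gains, and what they mean for hardware selection.]]></summary>
        <content type="html"><![CDATA[<p><strong>LLM inference performance = algorithms + software + hardware</strong>. Hardware (H100, B300, Rubin) only sets the <strong>theoretical ceiling</strong>. Relative to an unoptimized baseline, real inference performance can <strong>improve 5-30× through algorithmic optimization</strong>. This article takes a close look at three inference optimization techniques: <strong>PagedAttention, FlashAttention, and Speculative Decoding</strong>.</p>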
<!-- -->
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="推理优化-vs-训练优化">Inference Optimization vs. Training Optimization<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#%E6%8E%A8%E7%90%86%E4%BC%98%E5%8C%96-vs-%E8%AE%AD%E7%BB%83%E4%BC%98%E5%8C%96" class="hash-link" aria-label="Direct link to 推理优化 vs 训练优化" title="Direct link to 推理优化 vs 训练优化" translate="no">​</a></h2>
<table><thead><tr><th>Dimension</th><th>Training</th><th>Inference</th></tr></thead><tbody><tr><td><strong>Compute utilization</strong></td><td>Saturated (large batch)</td><td>Low (batch 1-32)</td></tr><tr><td><strong>Bottleneck</strong></td><td>GPU compute</td><td><strong>GPU memory capacity + memory bandwidth</strong></td></tr><tr><td><strong>Optimization focus</strong></td><td>Data parallelism / model parallelism / ZeRO</td><td><strong>KV cache + attention + batching</strong></td></tr><tr><td><strong>Metrics</strong></td><td>tokens/sec (training)</td><td><strong>TTFT, TPOT, throughput</strong></td></tr><tr><td><strong>Typical techniques</strong></td><td>FlashAttention, gradient checkpointing</td><td><strong>PagedAttention, speculative decoding, quantization</strong></td></tr></tbody></table>
<blockquote>
<p><strong>Inference optimization is more complex than training optimization</strong> because it is latency-sensitive, memory-constrained, and has to serve many kinds of workloads.</p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="三大核心技术">Three Core Techniques<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#%E4%B8%89%E5%A4%A7%E6%A0%B8%E5%BF%83%E6%8A%80%E6%9C%AF" class="hash-link" aria-label="Direct link to 三大核心技术" title="Direct link to 三大核心技术" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-pagedattentionvllm-核心">1. PagedAttention (the Core of vLLM)<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#1-pagedattentionvllm-%E6%A0%B8%E5%BF%83" class="hash-link" aria-label="Direct link to 1. PagedAttention（vLLM 核心）" title="Direct link to 1. PagedAttention（vLLM 核心）" translate="no">​</a></h3>
<p><strong>PagedAttention</strong> is a <strong>KV cache memory-management overhaul</strong> introduced by the <strong>UC Berkeley team</strong> (Woosuk Kwon, Zhuohan Li, et al.) in the <strong>vLLM paper (SOSP 2023)</strong>.</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="问题传统-kv-cache-浪费严重">Problem: Conventional KV Caches Waste Memory<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#%E9%97%AE%E9%A2%98%E4%BC%A0%E7%BB%9F-kv-cache-%E6%B5%AA%E8%B4%B9%E4%B8%A5%E9%87%8D" class="hash-link" aria-label="Direct link to 问题：传统 KV Cache 浪费严重" title="Direct link to 问题：传统 KV Cache 浪费严重" translate="no">​</a></h4>
<ul>
<li class=""><strong>Conventional approach</strong>: <strong>pre-allocate</strong> KV cache space for the maximum sequence length of every request</li>
<li class=""><strong>Example</strong>: 70B model + 4K context = <strong>~2 GB of KV cache per request</strong> (the exact figure depends on layer count, number of KV heads, and precision)</li>
<li class=""><strong>100 concurrent requests = 200 GB</strong>, which exceeds GPU memory</li>
</ul>
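<p>The per-request figure can be sanity-checked with a short calculation. The sketch below is illustrative only: the model shape (80 layers, head dimension 128, FP16 values, and either 8 or 64 KV heads) is an assumption modeled on a Llama-2-70B-like configuration, not something stated in this article, and <code>kv_cache_bytes</code> is a hypothetical helper name.</p>

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes of KV cache for one request: keys and values for every layer and token."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value  # 2 = K and V
    return per_token * seq_len


GIB = 1024 ** 3

# Assumed Llama-2-70B-like shape: 80 layers, head_dim 128, FP16 values.
gqa = kv_cache_bytes(num_layers=80, num_kv_heads=8, head_dim=128, seq_len=4096)
mha = kv_cache_bytes(num_layers=80, num_kv_heads=64, head_dim=128, seq_len=4096)

print(f"GQA (8 KV heads):  {gqa / GIB:.2f} GiB per 4K request")
print(f"MHA (64 KV heads): {mha / GIB:.2f} GiB per 4K request")
print(f"100 concurrent GQA requests: {100 * gqa / GIB:.0f} GiB")
```

<p>Under these assumptions a 4K request needs about 1.25 GiB with grouped-query attention and about 10 GiB with full multi-head attention, so the ~2 GB figure above is the right order of magnitude for a GQA model.</p>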
<table><thead><tr><th>Approach</th><th>KV cache management</th><th>Memory waste</th></tr></thead><tbody><tr><td>Conventional (HuggingFace)</td><td>Contiguous pre-allocation</td><td><strong>60-80% wasted</strong></td></tr><tr><td><strong>PagedAttention</strong></td><td><strong>Paged, on-demand allocation</strong></td><td><strong>&lt;4% wasted</strong></td></tr></tbody></table>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="原理操作系统分页思想">Principle: Operating-System-Style Paging<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#%E5%8E%9F%E7%90%86%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F%E5%88%86%E9%A1%B5%E6%80%9D%E6%83%B3" class="hash-link" aria-label="Direct link to 原理：操作系统分页思想" title="Direct link to 原理：操作系统分页思想" translate="no">​</a></h4>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Conventional:</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">[Request 1: 2 GB contiguous] [Request 2: 2 GB contiguous] [Request 3: 2 GB contiguous]  -- heavy internal fragmentation</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">PagedAttention:</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">[Request 1: pages 0,1,2,3] [Request 2: pages 4,5,6,7] [Request 3: pages 8,9,10,11]  -- managed through a page table</span><br></div></code></pre></div></div>
<ul>
<li class=""><strong>Each page holds the KV cache for 16 tokens</strong></li>
<li class=""><strong>Pages are allocated on demand</strong>, with no pre-allocation</li>
<li class="">A <strong>page table</strong> tracks the mapping from each request to its pages</li>
<li class=""><strong>Wasted memory &lt; 4%</strong> (vs 60-80%)</li>
</ul>
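<p>The page-table idea can be sketched in a few lines. This is a toy allocator written for this article's description, not vLLM's actual implementation: the class name <code>PagedKVAllocator</code> and its methods are hypothetical, and it only tracks which physical page each request owns, without storing any tensors.</p>

```python
class PagedKVAllocator:
    """Toy page-table allocator: maps each request's logical pages to physical pages."""

    def __init__(self, num_physical_pages, page_size=16):
        self.page_size = page_size
        self.free_pages = list(range(num_physical_pages))
        self.page_table = {}   # request id -> list of physical page ids
        self.num_tokens = {}   # request id -> tokens stored so far

    def append_token(self, request_id):
        """Reserve room for one more token, taking a new page only when needed."""
        pages = self.page_table.setdefault(request_id, [])
        count = self.num_tokens.get(request_id, 0)
        if count == len(pages) * self.page_size:  # all current pages are full
            if not self.free_pages:
                raise MemoryError("no free KV cache pages")
            pages.append(self.free_pages.pop(0))
        self.num_tokens[request_id] = count + 1

    def free(self, request_id):
        """Return a finished request's pages to the free pool."""
        self.free_pages.extend(self.page_table.pop(request_id, []))
        self.num_tokens.pop(request_id, None)

    def wasted_slots(self):
        """Unused token slots: at most page_size - 1 per live request."""
        return sum(len(pages) * self.page_size - self.num_tokens[req]
                   for req, pages in self.page_table.items())


alloc = PagedKVAllocator(num_physical_pages=8)
for _ in range(20):
    alloc.append_token("req-A")   # 20 tokens -> 2 pages
for _ in range(5):
    alloc.append_token("req-B")   # 5 tokens -> 1 page
print(alloc.page_table, alloc.wasted_slots())
```

<p>Because a request only ever holds one partially filled page, the waste per request is bounded by one page rather than by the gap between its actual length and the maximum context length.</p>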
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="性能提升">Performance Gains<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#%E6%80%A7%E8%83%BD%E6%8F%90%E5%8D%87" class="hash-link" aria-label="Direct link to 性能提升" title="Direct link to 性能提升" translate="no">​</a></h4>
<table><thead><tr><th>Metric</th><th>Conventional (HF)</th><th><strong>PagedAttention (vLLM)</strong></th><th>Gain</th></tr></thead><tbody><tr><td>Throughput (70B inference)</td><td>100 tok/s</td><td><strong>800-1500 tok/s</strong></td><td><strong>8-15×</strong></td></tr><tr><td>Max concurrent requests</td><td>~30</td><td><strong>200+</strong></td><td>6×</td></tr><tr><td>GPU memory utilization</td><td>30%</td><td><strong>96%</strong></td><td>3.2×</td></tr><tr><td>Long-context support</td><td>4K</td><td><strong>32K-128K</strong></td><td>8-32×</td></tr></tbody></table>
<blockquote>
<p><strong>PagedAttention made vLLM the de facto standard for LLM inference</strong>, with an 8-15× throughput gain on 70B models over a HuggingFace baseline.</p>
</blockquote>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="适用场景">Use Cases<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#%E9%80%82%E7%94%A8%E5%9C%BA%E6%99%AF" class="hash-link" aria-label="Direct link to 适用场景" title="Direct link to 适用场景" translate="no">​</a></h4>
<ul>
<li class="">✅ <strong>High-concurrency online inference</strong> (chat services in the style of ChatGPT, Claude, or ERNIE Bot)</li>
<li class="">✅ <strong>Long context</strong> (32K+ tokens)</li>
<li class="">✅ <strong>Multi-model serving</strong> (shared GPU pool)</li>
<li class="">❌ Single-user offline inference (limited benefit)</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-flashattentiongpu-优化">2. FlashAttention (GPU Optimization)<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#2-flashattentiongpu-%E4%BC%98%E5%8C%96" class="hash-link" aria-label="Direct link to 2. FlashAttention（GPU 优化）" title="Direct link to 2. FlashAttention（GPU 优化）" translate="no">​</a></h3>
<p><strong>FlashAttention</strong>, proposed by <strong>Tri Dao et al.</strong> in 2022, is a <strong>GPU memory-hierarchy optimization</strong>:</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="问题注意力矩阵-on-内存">Problem: The Attention Matrix Needs O(N²) Memory<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#%E9%97%AE%E9%A2%98%E6%B3%A8%E6%84%8F%E5%8A%9B%E7%9F%A9%E9%98%B5-on-%E5%86%85%E5%AD%98" class="hash-link" aria-label="Direct link to 问题：注意力矩阵 O(N²) 内存" title="Direct link to 问题：注意力矩阵 O(N²) 内存" translate="no">​</a></h4>
<ul>
<li class=""><strong>Standard attention</strong>: must store an N×N attention matrix (the figures below are per attention head and per sequence, in FP32)</li>
<li class=""><strong>8K context</strong>: 8K×8K = 64M floats = 256 MB</li>
<li class=""><strong>32K context</strong>: 32K×32K = 1G floats = <strong>4 GB</strong>, which quickly exhausts GPU memory</li>
<li class=""><strong>128K context</strong>: 128K×128K = 16G floats = <strong>64 GB</strong>, which is infeasible</li>
</ul>
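<p>The arithmetic above can be reproduced directly. The sketch assumes 4-byte (FP32) scores and counts a single head of a single sequence; <code>score_matrix_bytes</code> is a hypothetical helper name used only for this illustration.</p>

```python
def score_matrix_bytes(seq_len, bytes_per_value=4):
    """Memory for one N x N attention score matrix (one head, one sequence)."""
    return seq_len * seq_len * bytes_per_value


MIB = 1024 ** 2
for n in (8 * 1024, 32 * 1024, 128 * 1024):
    print(f"N = {n // 1024}K: {score_matrix_bytes(n) / MIB:,.0f} MiB")
```

<p>Multiplying by the number of heads and the batch size makes the quadratic growth even more punishing, which is why avoiding the materialized matrix matters.</p>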
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="原理分块tiling--重计算">Principle: Tiling + Recomputation<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#%E5%8E%9F%E7%90%86%E5%88%86%E5%9D%97tiling--%E9%87%8D%E8%AE%A1%E7%AE%97" class="hash-link" aria-label="Direct link to 原理：分块（tiling） + 重计算" title="Direct link to 原理：分块（tiling） + 重计算" translate="no">​</a></h4>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Standard attention:</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Q @ K^T → store N×N matrix → softmax → @ V   -- needs 256 MB+ of HBM</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">FlashAttention:</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Compute block by block, each block handled in on-chip SRAM; the N×N matrix is never stored</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Q block × K block^T → running (online) softmax → × V block   -- inside SRAM</span><br></div></code></pre></div></div>
<ul>
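<li class=""><strong>Core idea</strong>: <strong>keep the working set in the GPU's on-chip SRAM</strong> (shared memory, which is far faster but far smaller than HBM)</li>
<li class=""><strong>Memory footprint</strong>: drops from O(N²) to O(N) because the N×N matrix is never written to HBM; the number of HBM accesses falls from Θ(Nd + N²) to O(N²d²/M), where d is the head dimension and M the SRAM size</li>
<li class=""><strong>Recomputation</strong>: during the backward pass (training), attention is recomputed instead of storing intermediate results</li>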
</ul>
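<p>The tiling trick relies on a running (online) softmax: each block of keys and values updates a running maximum, a running normalizer, and a partial output, so no full row of scores is ever held. The sketch below shows this for a single query vector in pure Python. It is a numerical illustration only, not the CUDA kernel; the function names are hypothetical and the block size of 2 is arbitrary.</p>

```python
import math


def naive_attention(q, keys, values):
    """Reference: materialize every score, then softmax, then weight the values."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def tiled_attention(q, keys, values, block_size=2):
    """Process K/V in blocks, keeping only a running max, running sum, and output."""
    dim = len(values[0])
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * dim
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [sum(a * b for a, b in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        scale = math.exp(running_max - new_max)  # rescale earlier partial results
        running_sum *= scale
        acc = [x * scale for x in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [x + w * vj for x, vj in zip(acc, v)]
        running_max = new_max
    return [x / running_sum for x in acc]


q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0], [0.3, 0.3]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
ref = naive_attention(q, keys, values)
out = tiled_attention(q, keys, values)
print(max(abs(a - b) for a, b in zip(ref, out)))  # difference is at rounding-error level
```

<p>The tiled version touches each key/value block once and carries only O(d) state between blocks, which is the property that lets the real kernel stay inside SRAM.</p>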
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="性能提升-1">Performance Gains<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#%E6%80%A7%E8%83%BD%E6%8F%90%E5%8D%87-1" class="hash-link" aria-label="Direct link to 性能提升" title="Direct link to 性能提升" translate="no">​</a></h4>
<table><thead><tr><th>Metric</th><th>Standard attention</th><th><strong>FlashAttention v2</strong></th><th>Gain</th></tr></thead><tbody><tr><td>Training speed</td><td>100%</td><td><strong>200-300%</strong></td><td>2-3×</td></tr><tr><td>Memory</td><td>O(N²)</td><td><strong>O(N)</strong></td><td>Reduced by a factor of ~N</td></tr><tr><td>H100 attention throughput (FP16)</td><td>Not stated</td><td><strong>~35% of peak with v2; up to ~740 TFLOPS (~75% of peak) with FlashAttention-3</strong></td><td>1.5-2× (v3 over v2)</td></tr><tr><td>128K context</td><td>❌ OOM</td><td>✅ Feasible</td><td>n/a</td></tr><tr><td>1M context</td><td>❌ Infeasible</td><td>✅ FlashAttention-3 (memory grows linearly, compute is still O(N²))</td><td>n/a</td></tr></tbody></table>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="flashattention-演进">FlashAttention Evolution<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#flashattention-%E6%BC%94%E8%BF%9B" class="hash-link" aria-label="Direct link to FlashAttention 演进" title="Direct link to FlashAttention 演进" translate="no">​</a></h4>
<table><thead><tr><th>Version</th><th>Year</th><th>Key improvements</th></tr></thead><tbody><tr><td><strong>FlashAttention v1</strong></td><td>2022</td><td>Tiling + recomputation</td></tr><tr><td><strong>FlashAttention v2</strong></td><td>2023</td><td><strong>Better parallelism</strong> + less non-matmul work</td></tr><tr><td><strong>FlashAttention v3</strong></td><td>2024</td><td><strong>FP8 support</strong> + H100 (Hopper) optimization</td></tr><tr><td><strong>FlashAttention v4</strong> (speculative, 2026)</td><td>2026</td><td>Rubin R200 / MI400 optimization</td></tr></tbody></table>
<blockquote>
<p><strong>FlashAttention v3 on H100 reaches up to about 740 TFLOPS in FP16</strong> (roughly 75% of the ~989 TFLOPS dense peak) and close to 1.2 PFLOPS in FP8. No kernel can exceed the hardware's rated peak; the gain comes from much higher utilization.</p>
</blockquote>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="适用场景-1">Use Cases<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#%E9%80%82%E7%94%A8%E5%9C%BA%E6%99%AF-1" class="hash-link" aria-label="Direct link to 适用场景" title="Direct link to 适用场景" translate="no">​</a></h4>
<ul>
<li class="">✅ <strong>All attention computation</strong> (training + inference)</li>
<li class="">✅ <strong>Long context</strong> (128K+ tokens)</li>
<li class="">✅ <strong>Essential for GPU inference</strong> (standard on H100/B200)</li>
<li class="">❌ Edge devices (little need for attention optimization)</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-speculative-decoding投机解码">3. Speculative Decoding<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#3-speculative-decoding%E6%8A%95%E6%9C%BA%E8%A7%A3%E7%A0%81" class="hash-link" aria-label="Direct link to 3. Speculative Decoding（投机解码）" title="Direct link to 3. Speculative Decoding（投机解码）" translate="no">​</a></h3>
<p><strong>Speculative Decoding</strong> is an <strong>inference acceleration technique</strong> proposed by <strong>Leviathan et al. (2023)</strong>:</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="问题自回归生成慢">Problem: Autoregressive Generation Is Slow<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#%E9%97%AE%E9%A2%98%E8%87%AA%E5%9B%9E%E5%BD%92%E7%94%9F%E6%88%90%E6%85%A2" class="hash-link" aria-label="Direct link to 问题：自回归生成慢" title="Direct link to 问题：自回归生成慢" translate="no">​</a></h4>
<ul>
<li class=""><strong>An LLM generates one token at a time</strong></li>
<li class=""><strong>Every token requires a full forward pass</strong></li>
<li class=""><strong>H100, FP16: tens of milliseconds per token for a 70B-class model (~50 ms here)</strong>, so long generations are slow</li>
</ul>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="原理小模型--大模型协同">Principle: A Small Model and a Large Model Working Together<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#%E5%8E%9F%E7%90%86%E5%B0%8F%E6%A8%A1%E5%9E%8B--%E5%A4%A7%E6%A8%A1%E5%9E%8B%E5%8D%8F%E5%90%8C" class="hash-link" aria-label="Direct link to 原理：小模型 + 大模型协同" title="Direct link to 原理：小模型 + 大模型协同" translate="no">​</a></h4>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Conventional:</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Large model → token 1 → token 2 → token 3 → ...    -- every token uses the large model</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Speculative Decoding:</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">1. Small model (draft model) proposes 5 candidate tokens: [t1, t2, t3, t4, t5]</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">2. Large model (target model) verifies all 5 tokens at once (a single forward pass)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">3. Accept the first k matching tokens (token k+1 comes from the large model's own prediction in that same pass)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">4. Repeat</span><br></div></code></pre></div></div>
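<p>The four steps above can be sketched with toy stand-ins for the two models. Everything here is an assumption made for illustration: <code>target_next</code> and <code>draft_next</code> are hypothetical deterministic functions rather than real LLMs, decoding is greedy (the original method also covers sampling via a rejection rule), and the single batched verification pass is emulated with per-position calls.</p>

```python
def target_next(ctx):
    """Toy stand-in for the large model's greedy next token."""
    return (sum(ctx) * 31 + 7) % 50


def draft_next(ctx):
    """Toy stand-in for the small model: wrong whenever len(ctx) is a multiple of 4."""
    tok = target_next(ctx)
    return (tok + 1) % 50 if len(ctx) % 4 == 0 else tok


def speculative_generate(prompt, num_new_tokens, gamma=5):
    """Greedy speculative decoding; returns (tokens, number of target passes)."""
    out = list(prompt)
    goal = len(prompt) + num_new_tokens
    target_passes = 0
    while goal > len(out):
        # 1. Draft model proposes gamma tokens autoregressively (cheap).
        proposal = []
        for _ in range(gamma):
            proposal.append(draft_next(out + proposal))
        # 2. One target pass scores every proposed position at once.
        #    (Here that batched pass is emulated with per-position calls.)
        target_passes += 1
        expected = [target_next(out + proposal[:i]) for i in range(gamma + 1)]
        # 3. Accept the longest matching prefix, then take the target's own
        #    token at the first mismatch (or the bonus token if all match).
        k = 0
        while gamma > k and proposal[k] == expected[k]:
            k += 1
        out.extend(proposal[:k] + [expected[k]])
    return out[:goal], target_passes


def plain_generate(prompt, num_new_tokens):
    """Baseline: one target pass per generated token."""
    out = list(prompt)
    for _ in range(num_new_tokens):
        out.append(target_next(out))
    return out


prompt = [1, 2, 3]
tokens, passes = speculative_generate(prompt, num_new_tokens=20)
assert tokens == plain_generate(prompt, 20)   # same output as plain greedy decoding
print(f"{passes} target passes for 20 new tokens")
```

<p>The key property is that the output is identical to what the target model would have produced on its own; only the number of expensive target passes changes.</p>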
<ul>
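<li class=""><strong>Small model</strong>: up to ~100× faster per token (e.g. 70B → 1B)</li>
<li class=""><strong>Large model</strong>: verifies multiple tokens in a single forward pass</li>
<li class=""><strong>Theoretical speedup</strong>: 2-4× (depends on how often the large model accepts the small model's proposals)</li>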
</ul>
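<p>The 2-4× range can be related to the acceptance rate with the expected-value formula from the speculative decoding paper: with per-token acceptance probability α (assumed independent across positions) and γ drafted tokens, each target pass yields (1 − α^(γ+1)) / (1 − α) tokens on average. The sketch below evaluates it; the cost ratio of 0.02 (one draft step costing 2% of a target step) and the function names are assumptions chosen for illustration.</p>

```python
def expected_tokens_per_pass(alpha, gamma):
    """Expected tokens produced per target-model pass (i.i.d. acceptance assumption)."""
    if alpha == 1.0:
        return gamma + 1
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


def expected_speedup(alpha, gamma, cost_ratio):
    """Wall-clock speedup when one draft step costs cost_ratio of a target step."""
    return expected_tokens_per_pass(alpha, gamma) / (gamma * cost_ratio + 1)


for alpha in (0.6, 0.8, 0.9):
    print(alpha, round(expected_speedup(alpha, gamma=5, cost_ratio=0.02), 2))
```

<p>With these assumed inputs the estimate runs from a little over 2× at α = 0.6 to a little over 4× at α = 0.9, which is consistent with the range quoted above and shows why draft-model quality dominates the result.</p>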
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="性能提升-2">Performance Gains<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#%E6%80%A7%E8%83%BD%E6%8F%90%E5%8D%87-2" class="hash-link" aria-label="Direct link to 性能提升" title="Direct link to 性能提升" translate="no">​</a></h4>
<table><thead><tr><th>Metric</th><th>Conventional</th><th><strong>Speculative Decoding</strong></th><th>Gain</th></tr></thead><tbody><tr><td>70B inference speed</td><td>30 tok/s</td><td><strong>60-100 tok/s</strong></td><td>2-3×</td></tr><tr><td>TTFT (time to first token)</td><td>200 ms</td><td>200 ms (unchanged)</td><td>n/a</td></tr><tr><td>TPOT (time per output token)</td><td>33 ms</td><td><strong>10-17 ms</strong></td><td>2-3×</td></tr><tr><td>Applicable models</td><td>Any</td><td><strong>Small model + large model</strong></td><td>n/a</td></tr></tbody></table>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="主流-speculative-decoding-方案">Mainstream Speculative Decoding Approaches<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#%E4%B8%BB%E6%B5%81-speculative-decoding-%E6%96%B9%E6%A1%88" class="hash-link" aria-label="Direct link to 主流 Speculative Decoding 方案" title="Direct link to 主流 Speculative Decoding 方案" translate="no">​</a></h4>
<table><thead><tr><th>Approach</th><th>Draft source</th><th>Speedup</th><th>Applies to</th></tr></thead><tbody><tr><td><strong>Self-Speculative</strong></td><td>Different layers of the same model</td><td>1.5-2×</td><td>General</td></tr><tr><td><strong>Draft Model</strong></td><td>Separate small model (e.g., 7B + 70B)</td><td>2-3×</td><td>General</td></tr><tr><td><strong>Medusa</strong></td><td>Multiple decoding heads</td><td>2-3×</td><td>Single model</td></tr><tr><td><strong>EAGLE</strong></td><td>Feature prediction</td><td>2-3×</td><td>Single model</td></tr><tr><td><strong>Lookahead Decoding</strong></td><td>Jacobi iteration</td><td>1.5-2×</td><td>Small models</td></tr><tr><td><strong>REST</strong></td><td>Retrieval-based</td><td>2-4×</td><td>Long generation</td></tr></tbody></table>
<blockquote>
<p><strong>vLLM 0.6+ supports speculative decoding out of the box</strong> as an opt-in configuration; reported gains are 2-3×, depending on the draft acceptance rate and batch size.</p>
</blockquote>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="适用场景-2">Use Cases<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#%E9%80%82%E7%94%A8%E5%9C%BA%E6%99%AF-2" class="hash-link" aria-label="Direct link to 适用场景" title="Direct link to 适用场景" translate="no">​</a></h4>
<ul>
<li class="">✅ <strong>Low-concurrency, latency-sensitive serving of large models</strong> (largest gains, because decoding is memory-bound and spare compute can verify draft tokens; gains shrink at large batch sizes)</li>
<li class="">✅ <strong>Long-output generation</strong> (code, articles, reports)</li>
<li class="">✅ <strong>Multi-turn interaction</strong> (ReAct, agents)</li>
<li class="">❌ Very short outputs (1-5 tokens, limited speedup)</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="其他重要优化技术">其他重要优化技术<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#%E5%85%B6%E4%BB%96%E9%87%8D%E8%A6%81%E4%BC%98%E5%8C%96%E6%8A%80%E6%9C%AF" class="hash-link" aria-label="Direct link to 其他重要优化技术" title="Direct link to 其他重要优化技术" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-continuous-batching连续批处理">4. Continuous Batching（连续批处理）<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#4-continuous-batching%E8%BF%9E%E7%BB%AD%E6%89%B9%E5%A4%84%E7%90%86" class="hash-link" aria-label="Direct link to 4. Continuous Batching（连续批处理）" title="Direct link to 4. Continuous Batching（连续批处理）" translate="no">​</a></h3>
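<p><strong>Supported by vLLM, TGI, and TensorRT-LLM alike</strong>:</p>
<ul>
<li class=""><strong>Traditional (static) batching</strong>: processing starts only once a batch is full, so new requests wait</li>
<li class=""><strong>Continuous</strong>: new requests are inserted dynamically into the batch that is already running</li>
<li class=""><strong>Gain</strong>: 2-4× throughput</li>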
</ul>
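<p>The difference between the two scheduling policies can be illustrated with a toy step counter. This is a sketch under simplifying assumptions (one decode step per iteration, output lengths known in advance, hypothetical function names), not how vLLM, TGI, or TensorRT-LLM schedule internally.</p>

```python
from collections import deque
from typing import List


def static_batching_steps(lengths: List[int], batch_size: int) -> int:
    """Each batch runs until its longest request finishes."""
    steps = 0
    for start in range(0, len(lengths), batch_size):
        steps += max(lengths[start:start + batch_size])
    return steps


def continuous_batching_steps(lengths: List[int], batch_size: int) -> int:
    """Free slots are refilled from the queue before every decode step."""
    queue = deque(lengths)
    active: List[int] = []  # remaining tokens per running request
    steps = 0
    while queue or active:
        while queue and batch_size - len(active) > 0:
            active.append(queue.popleft())
        steps += 1  # one decode step produces one token per active request
        active = [left - 1 for left in active if left - 1 > 0]
    return steps


workload = [100, 10, 10, 10, 100, 10, 10, 10]
print(static_batching_steps(workload, 4), continuous_batching_steps(workload, 4))
```

<p>With two long and six short requests and a batch size of 4, the static policy needs 200 steps while the continuous one needs 110, because short requests no longer hold their slots idle until the long one finishes.</p>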
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="5-quantization量化">5. Quantization（量化）<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#5-quantization%E9%87%8F%E5%8C%96" class="hash-link" aria-label="Direct link to 5. Quantization（量化）" title="Direct link to 5. Quantization（量化）" translate="no">​</a></h3>
<table><thead><tr><th>精度</th><th>模型大小</th><th>性能</th><th>质量损失</th></tr></thead><tbody><tr><td><strong>FP16</strong></td><td>70B = 140 GB</td><td>1×</td><td>0%</td></tr><tr><td><strong>INT8</strong></td><td>70B = 70 GB</td><td>1.5-2×</td><td>&lt;1%</td></tr><tr><td><strong>INT4 (GPTQ/AWQ)</strong></td><td>70B = 35 GB</td><td>2-3×</td><td>1-3%</td></tr><tr><td><strong>FP8</strong></td><td>70B = 70 GB</td><td>1.5-2×</td><td>&lt;1%</td></tr><tr><td><strong>FP4 (NVFP4)</strong></td><td>70B = 35 GB</td><td>2-3×</td><td>2-5%</td></tr><tr><td><strong>INT2</strong></td><td>70B = 17.5 GB</td><td>3-5×</td><td>5-15%</td></tr></tbody></table>
<blockquote>
<p><strong>NVFP4 (NVIDIA) + quantization-aware training = near-FP16 quality + 2-3× performance</strong>.</p>
</blockquote>
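<p>The model-size column follows from parameter count times bits per weight. A minimal sketch: the function name is illustrative, and the figure ignores the small overhead of quantization scales and zero-points, as well as KV cache and activations.</p>

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weight storage in decimal GB for a dense model."""
    return params_billion * bits_per_weight / 8


for name, bits in [("FP16", 16), ("INT8/FP8", 8), ("INT4/FP4", 4), ("INT2", 2)]:
    print(name, weight_memory_gb(70, bits), "GB")
```

<p>For a 70B model this reproduces the 140 / 70 / 35 / 17.5 GB figures in the table.</p>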
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="6-prefix-caching前缀缓存">6. Prefix Caching（前缀缓存）<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#6-prefix-caching%E5%89%8D%E7%BC%80%E7%BC%93%E5%AD%98" class="hash-link" aria-label="Direct link to 6. Prefix Caching（前缀缓存）" title="Direct link to 6. Prefix Caching（前缀缓存）" translate="no">​</a></h3>
<ul>
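<li class=""><strong>Scenario</strong>: many requests share the same system prompt</li>
<li class=""><strong>Method</strong>: cache the KV Cache entries for the shared prefix</li>
<li class=""><strong>Speedup</strong>: the shared prefix is not recomputed, so prefill for that portion is <strong>~10-100× faster</strong></li>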
</ul>
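<p>The bookkeeping behind prefix caching can be sketched as a set of already-computed prefix blocks. This is an illustration only: the class name and block size are assumptions, and a real engine would store KV tensors (and usually hashes rather than whole tuples) instead of bare membership.</p>

```python
from typing import List, Set, Tuple

BLOCK = 4  # tokens per cached block (illustrative value)


class PrefixCache:
    """Tracks which token-prefix blocks already have KV entries."""

    def __init__(self) -> None:
        self.cached: Set[Tuple[int, ...]] = set()

    def prefill(self, tokens: List[int]) -> Tuple[int, int]:
        """Return (tokens reused from cache, tokens that must be computed)."""
        reused = 0
        computed = 0
        full = len(tokens) - len(tokens) % BLOCK
        for end in range(BLOCK, full + 1, BLOCK):
            # A block is reusable only if everything before it matches too,
            # so the key is the whole prefix up to the end of the block.
            key = tuple(tokens[:end])
            if key in self.cached:
                reused += BLOCK
            else:
                self.cached.add(key)
                computed += BLOCK
        computed += len(tokens) - full  # trailing partial block is not cached
        return reused, computed


system_prompt = list(range(16))
cache = PrefixCache()
print(cache.prefill(system_prompt + [100, 101, 102]))
print(cache.prefill(system_prompt + [200, 201, 202, 203, 204]))
```

<p>The first request computes all 19 tokens; the second reuses the 16-token system prompt and computes only its own 5 tokens.</p>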
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="7-chunked-prefill分块预填充">7. Chunked Prefill（分块预填充）<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#7-chunked-prefill%E5%88%86%E5%9D%97%E9%A2%84%E5%A1%AB%E5%85%85" class="hash-link" aria-label="Direct link to 7. Chunked Prefill（分块预填充）" title="Direct link to 7. Chunked Prefill（分块预填充）" translate="no">​</a></h3>
<ul>
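<li class=""><strong>Problem</strong>: prefilling a long prompt blocks other requests</li>
<li class=""><strong>Method</strong>: split the prefill into chunks and interleave them with decode steps</li>
<li class=""><strong>Gain</strong>: TTFT -50%, total throughput +20%</li>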
</ul>
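<p>The interleaving can be pictured as a per-iteration work plan. A minimal sketch with a hypothetical function name and an illustrative chunk size; real schedulers also apply a per-iteration token budget and priorities that are left out here.</p>

```python
from typing import List, Tuple


def chunked_prefill_schedule(
    prompt_len: int, chunk_size: int, decoding_requests: int
) -> List[Tuple[int, int]]:
    """Per-iteration work as (prefill tokens, decode tokens)."""
    schedule: List[Tuple[int, int]] = []
    remaining = prompt_len
    while remaining > 0:
        chunk = min(chunk_size, remaining)
        # Each iteration mixes one prefill chunk with one decode token
        # for every request that is already generating.
        schedule.append((chunk, decoding_requests))
        remaining -= chunk
    return schedule


print(chunked_prefill_schedule(10000, 4096, 8))
```

<p>A 10,000-token prompt is processed in three iterations, and the eight requests that are already decoding receive a token in each of them instead of stalling behind one long prefill.</p>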
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="推理优化软件栈">推理优化软件栈<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#%E6%8E%A8%E7%90%86%E4%BC%98%E5%8C%96%E8%BD%AF%E4%BB%B6%E6%A0%88" class="hash-link" aria-label="Direct link to 推理优化软件栈" title="Direct link to 推理优化软件栈" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="vllm最流行">vLLM（最流行）<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#vllm%E6%9C%80%E6%B5%81%E8%A1%8C" class="hash-link" aria-label="Direct link to vLLM（最流行）" title="Direct link to vLLM（最流行）" translate="no">​</a></h3>
<table><thead><tr><th>特性</th><th>支持</th></tr></thead><tbody><tr><td><strong>PagedAttention</strong></td><td>✅ 核心</td></tr><tr><td><strong>Continuous Batching</strong></td><td>✅</td></tr><tr><td><strong>Speculative Decoding</strong></td><td>✅ 0.6+</td></tr><tr><td><strong>Quantization</strong></td><td>✅ INT4/INT8/FP8</td></tr><tr><td><strong>Prefix Caching</strong></td><td>✅ 0.4+</td></tr><tr><td><strong>Multi-LoRA</strong></td><td>✅</td></tr><tr><td><strong>多 GPU</strong></td><td>✅ TP/PP</td></tr><tr><td><strong>支持模型</strong></td><td>Llama / Qwen / Mistral / Gemma / DeepSeek 全系列</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="tensorrt-llmnvidia">TensorRT-LLM（NVIDIA）<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#tensorrt-llmnvidia" class="hash-link" aria-label="Direct link to TensorRT-LLM（NVIDIA）" title="Direct link to TensorRT-LLM（NVIDIA）" translate="no">​</a></h3>
<table><thead><tr><th>特性</th><th>支持</th></tr></thead><tbody><tr><td><strong>In-flight Batching</strong></td><td>✅</td></tr><tr><td><strong>PagedAttention</strong></td><td>✅</td></tr><tr><td><strong>Speculative Decoding</strong></td><td>✅</td></tr><tr><td><strong>Quantization</strong></td><td>✅ INT4/INT8/FP8/FP4</td></tr><tr><td><strong>Multi-GPU</strong></td><td>✅ TP/PP/EP</td></tr><tr><td><strong>性能</strong></td><td><strong>NVIDIA GPU 上最佳</strong>（原生优化）</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="sglanguc-berkeley-新作">SGLang（UC Berkeley 新作）<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#sglanguc-berkeley-%E6%96%B0%E4%BD%9C" class="hash-link" aria-label="Direct link to SGLang（UC Berkeley 新作）" title="Direct link to SGLang（UC Berkeley 新作）" translate="no">​</a></h3>
<ul>
<li class=""><strong>RadixAttention</strong>：类似 Prefix Caching，更高效</li>
<li class=""><strong>结构化生成</strong>：JSON / regex guided generation</li>
<li class=""><strong>2025 增长迅速</strong></li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="llamacpp本地">llama.cpp（本地）<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#llamacpp%E6%9C%AC%E5%9C%B0" class="hash-link" aria-label="Direct link to llama.cpp（本地）" title="Direct link to llama.cpp（本地）" translate="no">​</a></h3>
<ul>
<li class=""><strong>GGUF 格式</strong></li>
<li class=""><strong>CPU / GPU / Apple Silicon</strong> 全支持</li>
<li class=""><strong>本地 LLM 首选</strong></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="实际性能对比70b-推理">实际性能对比（70B 推理）<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#%E5%AE%9E%E9%99%85%E6%80%A7%E8%83%BD%E5%AF%B9%E6%AF%9470b-%E6%8E%A8%E7%90%86" class="hash-link" aria-label="Direct link to 实际性能对比（70B 推理）" title="Direct link to 实际性能对比（70B 推理）" translate="no">​</a></h2>
<table><thead><tr><th>软件</th><th>硬件</th><th>量化</th><th>吞吐量</th><th>延迟 TPOT</th></tr></thead><tbody><tr><td><strong>vLLM + PagedAttn</strong></td><td>H100</td><td>FP16</td><td>1500 tok/s</td><td>8ms</td></tr><tr><td><strong>vLLM + Spec Decoding</strong></td><td>H100</td><td>FP16</td><td><strong>3000 tok/s</strong></td><td>3ms</td></tr><tr><td><strong>TensorRT-LLM</strong></td><td>H100</td><td>FP8</td><td>2500 tok/s</td><td>4ms</td></tr><tr><td><strong>TensorRT-LLM + NVFP4</strong></td><td><strong>B200</strong></td><td><strong>FP4</strong></td><td><strong>5000 tok/s</strong></td><td>2ms</td></tr><tr><td><strong>vLLM</strong></td><td>8× A100</td><td>INT4</td><td>800 tok/s</td><td>12ms</td></tr><tr><td><strong>llama.cpp</strong></td><td>M3 Ultra</td><td>Q4_K_M</td><td>12 tok/s</td><td>80ms</td></tr></tbody></table>
<blockquote>
<p><strong>B200 + NVFP4 + TensorRT-LLM</strong> = <strong>5000 tok/s</strong>, about 3.3× the FP16 H100 baseline (1500 tok/s) in the table above.</p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="推理优化对硬件选型的影响">推理优化对硬件选型的影响<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#%E6%8E%A8%E7%90%86%E4%BC%98%E5%8C%96%E5%AF%B9%E7%A1%AC%E4%BB%B6%E9%80%89%E5%9E%8B%E7%9A%84%E5%BD%B1%E5%93%8D" class="hash-link" aria-label="Direct link to 推理优化对硬件选型的影响" title="Direct link to 推理优化对硬件选型的影响" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="优化--硬件需求降低">优化 → 硬件需求降低<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#%E4%BC%98%E5%8C%96--%E7%A1%AC%E4%BB%B6%E9%9C%80%E6%B1%82%E9%99%8D%E4%BD%8E" class="hash-link" aria-label="Direct link to 优化 → 硬件需求降低" title="Direct link to 优化 → 硬件需求降低" translate="no">​</a></h3>
<table><thead><tr><th>优化技术</th><th>所需算力</th><th>所需显存</th></tr></thead><tbody><tr><td><strong>FP16 基线</strong></td><td>1×</td><td>1×</td></tr><tr><td><strong>+ PagedAttention</strong></td><td>1×</td><td><strong>0.4-0.6×</strong></td></tr><tr><td><strong>+ Speculative</strong></td><td><strong>0.5×</strong></td><td>1×</td></tr><tr><td><strong>+ INT4 量化</strong></td><td>1×</td><td><strong>0.25×</strong></td></tr><tr><td><strong>+ Prefix Cache</strong></td><td>1×</td><td>1×</td></tr><tr><td><strong>+ Chunked Prefill</strong></td><td>1×</td><td>1×</td></tr><tr><td><strong>+ Continuous Batch</strong></td><td>0.5×</td><td>1×</td></tr><tr><td><strong>+ TensorRT-LLM 全套</strong></td><td><strong>0.3×</strong></td><td><strong>0.4×</strong></td></tr></tbody></table>
<blockquote>
<p><strong>全套优化后，硬件需求降低 3-5×</strong>——70B 推理从 8× H100 降到 1-2× H100。</p>
</blockquote>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="选型建议">选型建议<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#%E9%80%89%E5%9E%8B%E5%BB%BA%E8%AE%AE" class="hash-link" aria-label="Direct link to 选型建议" title="Direct link to 选型建议" translate="no">​</a></h3>
<table><thead><tr><th>场景</th><th>推荐硬件</th><th>关键软件</th></tr></thead><tbody><tr><td><strong>云端高并发</strong></td><td>8× H100 + vLLM</td><td>PagedAttn + Spec</td></tr><tr><td><strong>单卡大模型</strong></td><td>1× B300 Ultra + TensorRT-LLM</td><td>NVFP4 + Spec</td></tr><tr><td><strong>本地 LLM</strong></td><td>M3 Ultra 192GB + llama.cpp</td><td>GGUF Q4/Q5</td></tr><tr><td><strong>Agent 多轮</strong></td><td>8× H100 + SGLang</td><td>RadixAttn + Spec</td></tr><tr><td><strong>代码生成</strong></td><td>1× B200 + vLLM</td><td>NVFP4 + Spec</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="未来展望">未来展望<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#%E6%9C%AA%E6%9D%A5%E5%B1%95%E6%9C%9B" class="hash-link" aria-label="Direct link to 未来展望" title="Direct link to 未来展望" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="短期2026-2027">短期（2026-2027）<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#%E7%9F%AD%E6%9C%9F2026-2027" class="hash-link" aria-label="Direct link to 短期（2026-2027）" title="Direct link to 短期（2026-2027）" translate="no">​</a></h3>
<ul>
<li class=""><strong>FlashAttention v4</strong> 适配 Rubin R200</li>
<li class=""><strong>Speculative Decoding</strong> 标准化（OpenAI API 支持）</li>
<li class=""><strong>Multi-modal Speculative</strong>（视觉 + 语言联合）</li>
<li class=""><strong>端到端编译</strong>：torch.compile + TensorRT</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="中期2027-2030">中期（2027-2030）<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#%E4%B8%AD%E6%9C%9F2027-2030" class="hash-link" aria-label="Direct link to 中期（2027-2030）" title="Direct link to 中期（2027-2030）" translate="no">​</a></h3>
<ul>
<li class=""><strong>端到端 GPU 内核生成</strong>：ML-based kernel synthesis</li>
<li class=""><strong>PIM-HBM 推理</strong>：HBM 内部做 attention</li>
<li class=""><strong>100× 推理加速</strong>（vs 2023 基线）</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="长期2030">长期（2030+）<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#%E9%95%BF%E6%9C%9F2030" class="hash-link" aria-label="Direct link to 长期（2030+）" title="Direct link to 长期（2030+）" translate="no">​</a></h3>
<ul>
<li class=""><strong>神经符号推理</strong>：LLM + 符号系统</li>
<li class=""><strong>量子 + LLM 协同</strong></li>
<li class=""><strong>真正"零延迟"AI 助手</strong></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="详细产品页">详细产品页<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#%E8%AF%A6%E7%BB%86%E4%BA%A7%E5%93%81%E9%A1%B5" class="hash-link" aria-label="Direct link to 详细产品页" title="Direct link to 详细产品页" translate="no">​</a></h2>
<ul>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/nvidia/h100">NVIDIA H100 (PagedAttn 主流硬件)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/nvidia/b200">NVIDIA B200 / B300 (NVFP4 + TensorRT-LLM 最佳)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/nvidia/groq-3-lpx">NVIDIA Groq 3 LPX (超低延迟推理新范式)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/others/apple-m3-ultra">Apple M3 Ultra 192GB (本地 LLM 之王)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/comparison">完整对比表</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="总结">总结<a href="https://mirrorfrog.com/en/blog/inference-optimization-paged-attention-flash-attention-speculative-decoding#%E6%80%BB%E7%BB%93" class="hash-link" aria-label="Direct link to 总结" title="Direct link to 总结" translate="no">​</a></h2>
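<p>The three core techniques of LLM inference optimization:</p>
<ol>
<li class=""><strong>PagedAttention (vLLM)</strong>: KV Cache memory management → <strong>8-15× throughput</strong></li>
<li class=""><strong>FlashAttention (Tri Dao)</strong>: optimization for the GPU memory hierarchy → <strong>2-3× for training and inference</strong></li>
<li class=""><strong>Speculative Decoding</strong>: a small model and a large model working together → <strong>2-3× inference speed</strong></li>
</ol>
<p><strong>With the full set of optimizations, hardware requirements drop by 3-5×</strong>: the ROI of software optimization far exceeds that of a hardware upgrade.</p>
<p><strong>Over the next five years, inference optimization will cut AI inference costs by 10-100×</strong>.</p>]]></content>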
        <author>
            <name>AI Compute Cards Wiki Editorial</name>
            <uri>https://github.com/anomalyco/mirrorfrog</uri>
        </author>
        <category label="Tech Deep Dive" term="Tech Deep Dive"/>
        <category label="Benchmarks" term="Benchmarks"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Apple Silicon 逆袭：M3 Ultra 192GB UMA 本地 LLM 革命]]></title>
        <id>https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm</id>
        <link href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm"/>
        <updated>2026-04-25T00:00:00.000Z</updated>
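        <summary type="html"><![CDATA[Apple M3 Ultra with 192GB of unified memory (UMA) and an 80-core GPU runs a 70B model locally at FP16 without quantization, and models up to about 200B once quantized. How Apple Silicon is catching up in the AI era, and what that means for NVIDIA and AMD.]]></summary>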
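        <content type="html"><![CDATA[<p><strong>Apple Silicon is staging a comeback in the AI era</strong>. A single Mac Studio with M3 Ultra ships with <strong>192GB of unified memory (UMA)</strong> and an <strong>80-core GPU</strong>, enough to <strong>run a 70B-parameter LLM locally at FP16 without quantization</strong>, and models around 200B parameters once quantized. This is <strong>a step change for consumer and workstation-class AI inference</strong>. This article examines Apple Silicon's AI strengths, its current ecosystem, and where it is heading.</p>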
<!-- -->
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="apple-silicon-演进从-m1-到-m4">Apple Silicon 演进：从 M1 到 M4<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#apple-silicon-%E6%BC%94%E8%BF%9B%E4%BB%8E-m1-%E5%88%B0-m4" class="hash-link" aria-label="Direct link to Apple Silicon 演进：从 M1 到 M4" title="Direct link to Apple Silicon 演进：从 M1 到 M4" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="apple-silicon-时间线">Apple Silicon 时间线<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#apple-silicon-%E6%97%B6%E9%97%B4%E7%BA%BF" class="hash-link" aria-label="Direct link to Apple Silicon 时间线" title="Direct link to Apple Silicon 时间线" translate="no">​</a></h3>
<table><thead><tr><th>芯片</th><th>发布</th><th>制程</th><th>内存 (最大)</th><th>GPU 核心</th><th>FP32 算力</th><th>FP16 算力</th></tr></thead><tbody><tr><td>M1</td><td>2020-11</td><td>5nm</td><td>16 GB</td><td>8</td><td>2.6 TFLOPS</td><td>5.2 TFLOPS</td></tr><tr><td>M1 Pro</td><td>2021-10</td><td>5nm</td><td>32 GB</td><td>16</td><td>5.2 TFLOPS</td><td>10.4 TFLOPS</td></tr><tr><td>M1 Max</td><td>2021-10</td><td>5nm</td><td>64 GB</td><td>32</td><td>10.4 TFLOPS</td><td>20.8 TFLOPS</td></tr><tr><td>M1 Ultra</td><td>2022-03</td><td>5nm</td><td><strong>128 GB</strong></td><td>64</td><td>20.8 TFLOPS</td><td>41.6 TFLOPS</td></tr><tr><td>M2</td><td>2022-06</td><td>5nm</td><td>24 GB</td><td>10</td><td>3.6 TFLOPS</td><td>7.2 TFLOPS</td></tr><tr><td>M2 Ultra</td><td>2023-06</td><td>5nm</td><td><strong>192 GB</strong></td><td>76</td><td>27.2 TFLOPS</td><td>54.4 TFLOPS</td></tr><tr><td><strong>M3</strong></td><td>2023-10</td><td>3nm</td><td>24 GB</td><td>10</td><td>3.7 TFLOPS</td><td>7.4 TFLOPS</td></tr><tr><td><strong>M3 Max</strong></td><td>2023-10</td><td>3nm</td><td>128 GB</td><td>40</td><td>14.1 TFLOPS</td><td>28.2 TFLOPS</td></tr><tr><td><strong>M3 Ultra</strong></td><td>2024-06</td><td>3nm</td><td><strong>192 GB</strong></td><td><strong>80</strong></td><td><strong>28.4 TFLOPS</strong></td><td><strong>56.8 TFLOPS</strong></td></tr><tr><td>M4</td><td>2024-10</td><td>3nm</td><td>32 GB</td><td>10</td><td>4 TFLOPS</td><td>8 TFLOPS</td></tr><tr><td>M4 Max</td><td>2024-10</td><td>3nm</td><td>128 GB</td><td>40</td><td>17 TFLOPS</td><td>34 TFLOPS</td></tr><tr><td>M4 Ultra</td><td>2025-Q4 (推测)</td><td>3nm</td><td><strong>256 GB</strong></td><td>80+</td><td>35 TFLOPS (推测)</td><td>70 TFLOPS (推测)</td></tr></tbody></table>
<blockquote>
<p><strong>M3 Ultra 192GB UMA</strong> = <strong>可装 70B 模型（FP16）+ 大 KV Cache</strong>。</p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="apple-silicon-的关键创新统一内存架构uma">Apple Silicon 的关键创新：统一内存架构（UMA）<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#apple-silicon-%E7%9A%84%E5%85%B3%E9%94%AE%E5%88%9B%E6%96%B0%E7%BB%9F%E4%B8%80%E5%86%85%E5%AD%98%E6%9E%B6%E6%9E%84uma" class="hash-link" aria-label="Direct link to Apple Silicon 的关键创新：统一内存架构（UMA）" title="Direct link to Apple Silicon 的关键创新：统一内存架构（UMA）" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="uma-vs-传统-gpu-显存">UMA vs 传统 GPU 显存<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#uma-vs-%E4%BC%A0%E7%BB%9F-gpu-%E6%98%BE%E5%AD%98" class="hash-link" aria-label="Direct link to UMA vs 传统 GPU 显存" title="Direct link to UMA vs 传统 GPU 显存" translate="no">​</a></h3>
<table><thead><tr><th>Dimension</th><th>Apple Silicon (UMA)</th><th>NVIDIA GPU (HBM)</th></tr></thead><tbody><tr><td><strong>Memory location</strong></td><td>In the same package as the SoC</td><td>Dedicated GPU memory, separate from host RAM</td></tr><tr><td><strong>Capacity</strong></td><td>16-192 GB (consumer)</td><td>80-288 GB (flagship)</td></tr><tr><td><strong>Bandwidth</strong></td><td><strong>800 GB/s</strong> (M3 Ultra)</td><td>3.35-22 TB/s (H100/Rubin)</td></tr><tr><td><strong>Shared by CPU and GPU</strong></td><td>✅ fully shared</td><td>❌ requires copies over PCIe</td></tr><tr><td><strong>Data coherence</strong></td><td>Automatic</td><td>Manual sync</td></tr><tr><td><strong>Multitasking</strong></td><td>✅ very strong</td><td>❌ prone to running out of VRAM</td></tr></tbody></table>
<blockquote>
<p><strong>UMA 的核心理念</strong>：<strong>CPU 和 GPU 共享同一块内存</strong>，无需数据复制，<strong>特别适合大模型推理</strong>（prompt 和 KV cache 可在 CPU/GPU 间无缝传递）。</p>
</blockquote>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="uma-对-llm-推理的影响">UMA 对 LLM 推理的影响<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#uma-%E5%AF%B9-llm-%E6%8E%A8%E7%90%86%E7%9A%84%E5%BD%B1%E5%93%8D" class="hash-link" aria-label="Direct link to UMA 对 LLM 推理的影响" title="Direct link to UMA 对 LLM 推理的影响" translate="no">​</a></h3>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="场景-170b-模型推理">场景 1：70B 模型推理<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#%E5%9C%BA%E6%99%AF-170b-%E6%A8%A1%E5%9E%8B%E6%8E%A8%E7%90%86" class="hash-link" aria-label="Direct link to 场景 1：70B 模型推理" title="Direct link to 场景 1：70B 模型推理" translate="no">​</a></h4>
<table><thead><tr><th>Dimension</th><th>NVIDIA A100 80GB</th><th><strong>Apple M3 Ultra 192GB</strong></th></tr></thead><tbody><tr><td>Fits 70B at FP16</td><td>❌ needs 2 cards</td><td>✅ fits on one machine</td></tr><tr><td>Model weights</td><td>35 GB (INT4, to fit on a single card)</td><td>140 GB (FP16)</td></tr><tr><td>Memory left for KV Cache</td><td>0 GB at FP16 (the weights alone exceed 80 GB)</td><td><strong>52 GB</strong> (192 GB − 140 GB)</td></tr><tr><td>Long-context support</td><td>Short (requires quantization)</td><td><strong>8K-32K</strong> (FP16)</td></tr><tr><td>Deployment cost</td><td>$15K+ (GPU)</td><td><strong>$5K</strong> (Mac Studio)</td></tr></tbody></table>
<blockquote>
<p><strong>With a 70B FP16 model loaded, M3 Ultra still has 52GB left for the KV Cache</strong>, something a single NVIDIA 80GB card <strong>cannot do</strong>.</p>
</blockquote>
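<p>How many tokens that 52 GB of headroom holds depends on the model's attention layout. The sketch below assumes a Llama-2-70B-like shape (80 layers, head dimension 128) and FP16 cache entries; the function names are illustrative, and the two KV-head counts contrast full multi-head attention with grouped-query attention.</p>

```python
def kv_cache_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                             bytes_per_value: int = 2) -> int:
    # Factor 2: one key vector and one value vector per layer and KV head.
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def max_cached_tokens(free_gb: float, bytes_per_token: int) -> int:
    # Decimal gigabytes (1 GB = 1e9 bytes), matching the article's figures.
    return int(free_gb * 1e9 // bytes_per_token)


free_gb = 192 - 140  # unified memory minus the FP16 weights of a 70B model
mha = kv_cache_bytes_per_token(layers=80, kv_heads=64, head_dim=128)
gqa = kv_cache_bytes_per_token(layers=80, kv_heads=8, head_dim=128)
print(max_cached_tokens(free_gb, mha), max_cached_tokens(free_gb, gqa))
```

<p>With full multi-head attention (64 KV heads) the headroom holds roughly 20K tokens, in line with the 8K-32K range in the table. With grouped-query attention (8 KV heads, as in Llama 2 70B) the same headroom holds about eight times as many. In practice the usable figure is lower, since the operating system and activations also need memory.</p>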
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="场景-2200b-模型推理">场景 2：200B 模型推理<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#%E5%9C%BA%E6%99%AF-2200b-%E6%A8%A1%E5%9E%8B%E6%8E%A8%E7%90%86" class="hash-link" aria-label="Direct link to 场景 2：200B 模型推理" title="Direct link to 场景 2：200B 模型推理" translate="no">​</a></h4>
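<table><thead><tr><th>Dimension</th><th>8× NVIDIA H100 (640GB)</th><th><strong>2× Mac Studio M3 Ultra (384GB)</strong></th></tr></thead><tbody><tr><td>Fits 200B (about 400 GB of weights at FP16)</td><td>✅</td><td>❌ at FP16 (400 GB &gt; 384 GB); ✅ at 8-bit (about 200 GB), with two machines linked via MLX</td></tr><tr><td>Price</td><td>~$240K</td><td><strong>~$10K</strong></td></tr><tr><td>Power</td><td>5.6 kW</td><td>~960 W (2 × 480 W)</td></tr><tr><td>Deployment complexity</td><td>High (multi-GPU)</td><td>Medium (multi-machine MLX)</td></tr></tbody></table>
<blockquote>
<p><strong>A 24× price advantage</strong> and <strong>a roughly 6× power advantage</strong>: for fitting very large models in memory, Apple Silicon is <strong>far cheaper than NVIDIA</strong>, although a 200B model has to be quantized to 8-bit to fit in 384 GB.</p>
</blockquote>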
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="apple-silicon-ai-生态">Apple Silicon AI 生态<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#apple-silicon-ai-%E7%94%9F%E6%80%81" class="hash-link" aria-label="Direct link to Apple Silicon AI 生态" title="Direct link to Apple Silicon AI 生态" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-mlxapple-自研框架">1. MLX（Apple 自研框架）<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#1-mlxapple-%E8%87%AA%E7%A0%94%E6%A1%86%E6%9E%B6" class="hash-link" aria-label="Direct link to 1. MLX（Apple 自研框架）" title="Direct link to 1. MLX（Apple 自研框架）" translate="no">​</a></h3>
<p><strong>MLX</strong> is the <strong>machine learning framework</strong> Apple open-sourced in 2023, <strong>built specifically around Apple Silicon's UMA</strong>:</p>
<ul>
<li class=""><strong>GitHub</strong>: <a href="https://github.com/ml-explore/mlx" target="_blank" rel="noopener noreferrer" class="">https://github.com/ml-explore/mlx</a></li>
<li class=""><strong>API modeled on NumPy and PyTorch</strong> (familiar in style, not a drop-in replacement)</li>
<li class=""><strong>Covers LLM, diffusion, and vision workloads</strong></li>
<li class=""><strong>By 2026 it has become the de facto standard for LLM inference on Apple Silicon</strong></li>
</ul>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="mlx-vs-pytorch-性能对比m3-ultra">MLX vs PyTorch 性能对比（M3 Ultra）<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#mlx-vs-pytorch-%E6%80%A7%E8%83%BD%E5%AF%B9%E6%AF%94m3-ultra" class="hash-link" aria-label="Direct link to MLX vs PyTorch 性能对比（M3 Ultra）" title="Direct link to MLX vs PyTorch 性能对比（M3 Ultra）" translate="no">​</a></h4>
<table><thead><tr><th>模型</th><th>PyTorch (MPS)</th><th><strong>MLX</strong></th><th>提升</th></tr></thead><tbody><tr><td>Llama 2 7B</td><td>35 tok/s</td><td><strong>52 tok/s</strong></td><td>1.5×</td></tr><tr><td>Llama 2 13B</td><td>22 tok/s</td><td><strong>35 tok/s</strong></td><td>1.6×</td></tr><tr><td>Llama 2 70B</td><td>6 tok/s</td><td><strong>12 tok/s</strong></td><td>2×</td></tr><tr><td>Mistral 7B</td><td>38 tok/s</td><td><strong>55 tok/s</strong></td><td>1.4×</td></tr><tr><td>Mixtral 8x7B</td><td>18 tok/s</td><td><strong>28 tok/s</strong></td><td>1.6×</td></tr><tr><td>Qwen 72B</td><td>5 tok/s</td><td><strong>10 tok/s</strong></td><td>2×</td></tr></tbody></table>
<blockquote>
<p><strong>MLX 比 PyTorch MPS 性能提升 50-100%</strong>。原因：MLX <strong>针对 UMA 优化</strong>，避免 CPU/GPU 内存复制。</p>
</blockquote>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-llamacppgguf-量化">2. llama.cpp（GGUF 量化）<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#2-llamacppgguf-%E9%87%8F%E5%8C%96" class="hash-link" aria-label="Direct link to 2. llama.cpp（GGUF 量化）" title="Direct link to 2. llama.cpp（GGUF 量化）" translate="no">​</a></h3>
<p><strong>llama.cpp</strong> is the community's most popular local LLM framework:</p>
<ul>
<li class=""><strong>Metal GPU acceleration on Apple Silicon</strong></li>
<li class=""><strong>GGUF quantization formats</strong>: Q4_K_M / Q5_K_M / Q6_K</li>
<li class=""><strong>A 70B model on M3 Ultra</strong>:<!-- -->
<ul>
<li class="">Q4_K_M (40 GB): <strong>~10-15 tok/s</strong></li>
<li class="">Q5_K_M (48 GB): ~8-12 tok/s</li>
<li class="">Q6_K (56 GB): ~6-9 tok/s</li>
<li class="">Q8_0 (75 GB): ~5-7 tok/s</li>
</ul>
</li>
</ul>
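<p>These numbers track a simple rule of thumb: single-user decoding is usually limited by memory bandwidth, because each generated token reads roughly all of the weights once. Under that assumption (and with a hypothetical function name), bandwidth divided by model size gives an upper bound on tokens per second.</p>

```python
def decode_upper_bound_tok_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Bandwidth-bound ceiling: every token streams all weights once."""
    return bandwidth_gb_s / weights_gb


M3_ULTRA_BANDWIDTH = 800  # GB/s, the figure used throughout this article
for name, size_gb in [("Q4_K_M", 40), ("Q5_K_M", 48), ("Q6_K", 56),
                      ("Q8_0", 75), ("FP16", 140)]:
    bound = decode_upper_bound_tok_s(M3_ULTRA_BANDWIDTH, size_gb)
    print(name, round(bound, 1), "tok/s at most")
```

<p>The ceiling for Q4_K_M is 20 tok/s, so the measured ~10-15 tok/s sits at 50-75% of it. The same estimate caps a 70B FP16 model at under 6 tok/s on 800 GB/s of bandwidth.</p>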
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-ollama本地-llm-一键运行">3. Ollama（本地 LLM 一键运行）<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#3-ollama%E6%9C%AC%E5%9C%B0-llm-%E4%B8%80%E9%94%AE%E8%BF%90%E8%A1%8C" class="hash-link" aria-label="Direct link to 3. Ollama（本地 LLM 一键运行）" title="Direct link to 3. Ollama（本地 LLM 一键运行）" translate="no">​</a></h3>
<p><strong>Ollama</strong> 是 2024-2025 最流行的本地 LLM 工具：</p>
<ul>
<li class="">一键运行 Llama 3 / Mistral / Qwen / Gemma</li>
<li class="">M3 Ultra 上 70B 模型可流畅运行</li>
<li class=""><strong>2025 月活 100 万+</strong></li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-lm-studiogui-客户端">4. LM Studio（GUI 客户端）<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#4-lm-studiogui-%E5%AE%A2%E6%88%B7%E7%AB%AF" class="hash-link" aria-label="Direct link to 4. LM Studio（GUI 客户端）" title="Direct link to 4. LM Studio（GUI 客户端）" translate="no">​</a></h3>
<p><strong>LM Studio</strong> 是 2024-2025 最流行的本地 LLM 客户端：</p>
<ul>
<li class="">完全 GUI，无需命令行</li>
<li class="">M3 Ultra 优化（MLX 后端）</li>
<li class="">支持 Llama 3.1 405B 量化（GGUF）</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="5-vllm推理服务">5. vLLM（推理服务）<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#5-vllm%E6%8E%A8%E7%90%86%E6%9C%8D%E5%8A%A1" class="hash-link" aria-label="Direct link to 5. vLLM（推理服务）" title="Direct link to 5. vLLM（推理服务）" translate="no">​</a></h3>
<p><strong>vLLM 0.7+</strong> 实验性支持 Apple Silicon：</p>
<ul>
<li class=""><strong>PagedAttention 优化</strong></li>
<li class="">70B FP16 服务在 M3 Ultra 上可行</li>
<li class="">TTFT ~500ms，TPOT ~80ms</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="实际性能测试">实际性能测试<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#%E5%AE%9E%E9%99%85%E6%80%A7%E8%83%BD%E6%B5%8B%E8%AF%95" class="hash-link" aria-label="Direct link to 实际性能测试" title="Direct link to 实际性能测试" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="m3-ultra-vs-nvidia-h10070b-fp16-推理">M3 Ultra vs NVIDIA H100（70B FP16 推理）<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#m3-ultra-vs-nvidia-h10070b-fp16-%E6%8E%A8%E7%90%86" class="hash-link" aria-label="Direct link to M3 Ultra vs NVIDIA H100（70B FP16 推理）" title="Direct link to M3 Ultra vs NVIDIA H100（70B FP16 推理）" translate="no">​</a></h3>
<table><thead><tr><th>Metric</th><th>M3 Ultra (80-core GPU + 192GB)</th><th>NVIDIA H100 (80GB)</th></tr></thead><tbody><tr><td>Fits 70B at FP16</td><td>✅ 192GB &gt; 140GB</td><td>❌ 80GB &lt; 140GB</td></tr><tr><td>Throughput</td><td>12 tok/s (single user, 70B Q4)</td><td>30 tok/s (FP8 + batch)</td></tr><tr><td>TTFT</td><td>800ms</td><td>200ms</td></tr><tr><td>KV Cache</td><td><strong>8K-32K tokens</strong></td><td>1-2K tokens (needs 2 cards)</td></tr><tr><td>Price</td><td><strong>$5,000</strong> (Mac Studio)</td><td>$30,000+ per card (~$240K for 8 cards)</td></tr><tr><td>Power</td><td>480W</td><td>5,600W (8 cards)</td></tr><tr><td>Best suited for</td><td><strong>Single-user long context</strong></td><td><strong>High concurrency, low latency</strong></td></tr></tbody></table>
<blockquote>
<p><strong>Apple Silicon clearly wins in the "single-user long-context" scenario</strong>, but <strong>it trails NVIDIA in the "high-concurrency, low-latency" scenario</strong>.</p>
</blockquote>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="m3-ultra-vs-apple-m2-ultra代际提升">M3 Ultra vs Apple M2 Ultra（代际提升）<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#m3-ultra-vs-apple-m2-ultra%E4%BB%A3%E9%99%85%E6%8F%90%E5%8D%87" class="hash-link" aria-label="Direct link to M3 Ultra vs Apple M2 Ultra（代际提升）" title="Direct link to M3 Ultra vs Apple M2 Ultra（代际提升）" translate="no">​</a></h3>
<table><thead><tr><th>Metric</th><th>M2 Ultra (76-core GPU)</th><th><strong>M3 Ultra (80-core GPU)</strong></th><th>Gain</th></tr></thead><tbody><tr><td>Memory</td><td>192 GB</td><td>192 GB</td><td>Same</td></tr><tr><td>Memory bandwidth</td><td><strong>800 GB/s</strong></td><td>800 GB/s</td><td>Same</td></tr><tr><td>FP16 compute</td><td>54.4 TFLOPS</td><td><strong>56.8 TFLOPS</strong></td><td>1.04×</td></tr><tr><td>Process</td><td>5nm</td><td><strong>3nm</strong></td><td>Newer node</td></tr><tr><td>LLM inference (70B Q4)</td><td>10 tok/s</td><td><strong>12 tok/s</strong></td><td>1.2×</td></tr><tr><td>Power</td><td>350W</td><td>480W</td><td>+37%</td></tr></tbody></table>
<blockquote>
<p><strong>M3 Ultra's gains are modest</strong> (4-20%). The main improvement is the move to a 3nm process.</p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="apple-silicon-ai-适用场景">Apple Silicon AI 适用场景<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#apple-silicon-ai-%E9%80%82%E7%94%A8%E5%9C%BA%E6%99%AF" class="hash-link" aria-label="Direct link to Apple Silicon AI 适用场景" title="Direct link to Apple Silicon AI 适用场景" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="-最佳场景">✅ 最佳场景<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#-%E6%9C%80%E4%BD%B3%E5%9C%BA%E6%99%AF" class="hash-link" aria-label="Direct link to ✅ 最佳场景" title="Direct link to ✅ 最佳场景" translate="no">​</a></h3>
<table><thead><tr><th>Scenario</th><th>Why</th></tr></thead><tbody><tr><td><strong>Local LLM inference</strong></td><td>192GB UMA holds a 70B FP16 model plus a large KV Cache</td></tr><tr><td><strong>Local text-to-image</strong></td><td>Stable Diffusion XL / Flux run smoothly</td></tr><tr><td><strong>Local multimodal</strong></td><td>Quantized open vision-language models such as LLaVA run locally</td></tr><tr><td><strong>Personal AI assistant</strong></td><td>Ollama + Mistral 7B, fully local</td></tr><tr><td><strong>Academic research</strong></td><td>Single-machine training of small models, debugging</td></tr><tr><td><strong>Privacy-sensitive AI</strong></td><td>Fully offline, no data leaves the machine</td></tr><tr><td><strong>AI coding assistant</strong></td><td>Continue + DeepSeek Coder 33B</td></tr><tr><td><strong>Education / students</strong></td><td>Good value, no cloud subscription needed</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="-不适合场景">❌ 不适合场景<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#-%E4%B8%8D%E9%80%82%E5%90%88%E5%9C%BA%E6%99%AF" class="hash-link" aria-label="Direct link to ❌ 不适合场景" title="Direct link to ❌ 不适合场景" translate="no">​</a></h3>
<table><thead><tr><th>场景</th><th>理由</th></tr></thead><tbody><tr><td><strong>大规模训练</strong></td><td>算力远低于 H100/B200</td></tr><tr><td><strong>高并发推理服务</strong></td><td>单机内存带宽限制</td></tr><tr><td><strong>FP8 / FP4 训练</strong></td><td>Apple Silicon 不支持</td></tr><tr><td><strong>多卡集群</strong></td><td>UMA 难扩展</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="apple-silicon-vs-nvidia-推理对比">Apple Silicon vs NVIDIA 推理对比<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#apple-silicon-vs-nvidia-%E6%8E%A8%E7%90%86%E5%AF%B9%E6%AF%94" class="hash-link" aria-label="Direct link to Apple Silicon vs NVIDIA 推理对比" title="Direct link to Apple Silicon vs NVIDIA 推理对比" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="70b-模型推理">70B 模型推理<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#70b-%E6%A8%A1%E5%9E%8B%E6%8E%A8%E7%90%86" class="hash-link" aria-label="Direct link to 70B 模型推理" title="Direct link to 70B 模型推理" translate="no">​</a></h3>
<table><thead><tr><th>方案</th><th>硬件价格</th><th>性能</th><th>部署复杂度</th></tr></thead><tbody><tr><td><strong>Apple M3 Ultra</strong></td><td>$5K</td><td>12 tok/s (FP16)</td><td>⭐</td></tr><tr><td><strong>Apple M2 Ultra</strong></td><td>$4K</td><td>10 tok/s (FP16)</td><td>⭐</td></tr><tr><td><strong>NVIDIA H100 80GB</strong></td><td>$30K</td><td>30 tok/s (FP8)</td><td>⭐⭐</td></tr><tr><td><strong>NVIDIA H100 8 卡</strong></td><td>$240K</td><td>200+ tok/s (FP8)</td><td>⭐⭐⭐</td></tr><tr><td><strong>AMD MI300X</strong></td><td>$15K</td><td>22 tok/s (FP8)</td><td>⭐⭐</td></tr><tr><td><strong>AMD MI400</strong></td><td>$25K (推测)</td><td>50+ tok/s (FP4)</td><td>⭐⭐</td></tr><tr><td><strong>Google TPU 8i (云)</strong></td><td>$4/hr</td><td>80+ tok/s (FP8)</td><td>⭐</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="价格性能比每美元吞吐量">价格性能比（每美元吞吐量）<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#%E4%BB%B7%E6%A0%BC%E6%80%A7%E8%83%BD%E6%AF%94%E6%AF%8F%E7%BE%8E%E5%85%83%E5%90%9E%E5%90%90%E9%87%8F" class="hash-link" aria-label="Direct link to 价格性能比（每美元吞吐量）" title="Direct link to 价格性能比（每美元吞吐量）" translate="no">​</a></h3>
<table><thead><tr><th>方案</th><th>tok/s/$硬件</th><th>排名</th></tr></thead><tbody><tr><td><strong>Apple M3 Ultra</strong></td><td>0.0024</td><td>⭐⭐⭐</td></tr><tr><td><strong>Apple M2 Ultra</strong></td><td>0.0025</td><td>⭐⭐⭐</td></tr><tr><td><strong>AMD MI300X</strong></td><td>0.0015</td><td>⭐⭐</td></tr><tr><td><strong>NVIDIA H100</strong></td><td>0.0010</td><td>⭐</td></tr><tr><td><strong>Google TPU 8i (云)</strong></td><td>20+ tok/s/$/hr</td><td>⭐⭐⭐⭐（云）</td></tr></tbody></table>
<blockquote>
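<p><strong>Apple M3 Ultra is the value leader for local deployment</strong>, at roughly 2.4× the price-performance of an NVIDIA H100 (0.0024 vs 0.0010 tok/s per hardware dollar, with FP16 on Apple compared against FP8 on the H100).</p>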
</blockquote>
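<p>The per-dollar column can be reproduced from the price and throughput figures in the 70B table. The sketch below is only a rough check: the function names are illustrative, and the ratio mixes precisions (FP16 on Apple, FP8 on the GPUs) exactly as the table does.</p>

```python
# Hardware price (USD) and 70B throughput (tok/s), copied from the tables above.
# Precisions differ (FP16 on Apple, FP8 on the GPUs), so treat ratios as rough.
SYSTEMS = {
    "Apple M3 Ultra": (5_000, 12),
    "Apple M2 Ultra": (4_000, 10),
    "AMD MI300X": (15_000, 22),
    "NVIDIA H100 80GB": (30_000, 30),
}


def tokens_per_second_per_dollar(price_usd, tok_per_s):
    """Throughput divided by purchase price."""
    return tok_per_s / price_usd


def ratio_vs(baseline, name):
    """How many times better `name` is than `baseline` on tok/s per dollar."""
    return (tokens_per_second_per_dollar(*SYSTEMS[name])
            / tokens_per_second_per_dollar(*SYSTEMS[baseline]))


if __name__ == "__main__":
    for name, (price, tps) in SYSTEMS.items():
        print(f"{name}: {tokens_per_second_per_dollar(price, tps):.4f} tok/s per USD")
    print(f"M3 Ultra vs H100: {ratio_vs('NVIDIA H100 80GB', 'Apple M3 Ultra'):.1f}x")
```

<p>The metric ignores power, hosting, and utilization, so it says more about purchase price than about total cost of ownership.</p>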
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="apple-silicon-局限">Apple Silicon Limitations<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#apple-silicon-%E5%B1%80%E9%99%90" class="hash-link" aria-label="Direct link to Apple Silicon Limitations" title="Direct link to Apple Silicon Limitations" translate="no">​</a></h2>
<table><thead><tr><th>Limitation</th><th>Impact</th></tr></thead><tbody><tr><td><strong>Weak compute</strong></td><td>FP16 56 TFLOPS vs H100 989 TFLOPS</td></tr><tr><td><strong>No FP8 / FP4 support</strong></td><td>Limited quantization paths</td></tr><tr><td><strong>Memory bandwidth ceiling</strong></td><td>800 GB/s vs H100 3.35 TB/s</td></tr><tr><td><strong>Closed ecosystem</strong></td><td>macOS only; no Linux server option</td></tr><tr><td><strong>Not usable in data centers</strong></td><td>macOS is ill-suited to 24/7 clusters</td></tr><tr><td><strong>Hard to scale to multiple cards</strong></td><td>The UMA architecture does not scale out easily</td></tr><tr><td><strong>No NVLink equivalent</strong></td><td>Low multi-machine interconnect bandwidth</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="apple-ai-战略2025-2026">Apple AI Strategy (2025-2026)<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#apple-ai-%E6%88%98%E7%95%A52025-2026" class="hash-link" aria-label="Direct link to Apple AI Strategy (2025-2026)" title="Direct link to Apple AI Strategy (2025-2026)" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="wwdc-2025-公告">WWDC 2025 Announcements<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#wwdc-2025-%E5%85%AC%E5%91%8A" class="hash-link" aria-label="Direct link to WWDC 2025 Announcements" title="Direct link to WWDC 2025 Announcements" translate="no">​</a></h3>
<ul>
<li class=""><strong>Apple Intelligence</strong> fully integrated into iOS 18 / macOS 15</li>
<li class=""><strong>Private Cloud Compute</strong>: Apple-built data centers running on Apple Silicon</li>
<li class=""><strong>M4 Ultra</strong> expected in 2025-Q4</li>
<li class=""><strong>M5 series</strong> expected in 2026 (speculative; enhanced 3nm+)</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="apple-intelligence-与-m3-ultra">Apple Intelligence and M3 Ultra<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#apple-intelligence-%E4%B8%8E-m3-ultra" class="hash-link" aria-label="Direct link to Apple Intelligence and M3 Ultra" title="Direct link to Apple Intelligence and M3 Ultra" translate="no">​</a></h3>
<ul>
<li class=""><strong>Apple Intelligence</strong> inference runs locally on the M3 Ultra by default</li>
<li class=""><strong>Writing tools, image generation, and Siri enhancements</strong> run on-device</li>
<li class=""><strong>Privacy first</strong>: Private Cloud Compute is called only when necessary</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="apple-与-openai-合作">Apple and OpenAI Partnership<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#apple-%E4%B8%8E-openai-%E5%90%88%E4%BD%9C" class="hash-link" aria-label="Direct link to Apple and OpenAI Partnership" title="Direct link to Apple and OpenAI Partnership" translate="no">​</a></h3>
<ul>
<li class=""><strong>iOS 18 + ChatGPT integration</strong> (opt-in for users)</li>
<li class=""><strong>Does not replace Apple Intelligence</strong>; it complements it</li>
<li class=""><strong>Does not directly create demand for Apple Silicon AI hardware</strong></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="m4-ultra-预期2025-q4-推测">M4 Ultra Expectations (2025-Q4, Speculative)<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#m4-ultra-%E9%A2%84%E6%9C%9F2025-q4-%E6%8E%A8%E6%B5%8B" class="hash-link" aria-label="Direct link to M4 Ultra Expectations (2025-Q4, Speculative)" title="Direct link to M4 Ultra Expectations (2025-Q4, Speculative)" translate="no">​</a></h2>
<table><thead><tr><th>Item</th><th>M3 Ultra</th><th><strong>M4 Ultra (speculative)</strong></th><th>Gain</th></tr></thead><tbody><tr><td>Process</td><td>3nm</td><td><strong>3nm (enhanced)</strong></td><td>Same</td></tr><tr><td>Memory</td><td>192 GB</td><td><strong>256 GB</strong></td><td>1.33×</td></tr><tr><td>Memory bandwidth</td><td>800 GB/s</td><td><strong>1000+ GB/s</strong></td><td>1.25×</td></tr><tr><td>GPU cores</td><td>80</td><td>80+</td><td>Same</td></tr><tr><td>FP16 compute</td><td>56.8 TFLOPS</td><td><strong>70 TFLOPS</strong></td><td>1.23×</td></tr><tr><td>Power</td><td>480W</td><td>500-550W</td><td>Slightly higher</td></tr><tr><td>Release date</td><td>2024-06</td><td><strong>2025-Q4 (speculative)</strong></td><td>—</td></tr></tbody></table>
<blockquote>
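<p><strong>M4 Ultra with 256GB of UMA</strong> = <strong>room for a 200B model at 8-bit precision</strong> (about 200 GB of weights; FP16 would need about 400 GB), opening a new era for local large-model inference.</p>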
</blockquote>
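<p>Capacity claims like this one can be sanity-checked by multiplying parameter count by bytes per parameter. The following is a minimal sketch, not a sizing tool: the helper names and the 20% memory reserve for KV cache, activations, and the OS are assumptions made for illustration.</p>

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weights_gb(params_billion, precision):
    """Weight memory in GB (1 GB = 1e9 bytes); excludes KV cache and activations."""
    return params_billion * BYTES_PER_PARAM[precision]


def largest_model_billion(memory_gb, precision, reserve_fraction=0.2):
    """Largest parameter count (in billions) whose weights fit after reserving
    a fraction of memory for KV cache, activations, and the OS (assumed 20%)."""
    usable_gb = memory_gb * (1.0 - reserve_fraction)
    return usable_gb / BYTES_PER_PARAM[precision]


if __name__ == "__main__":
    print("70B FP16 weights:", weights_gb(70, "fp16"), "GB")
    print("200B FP16 weights:", weights_gb(200, "fp16"), "GB")
    print("200B INT8 weights:", weights_gb(200, "int8"), "GB")
    print(f"Largest FP16 model in 256 GB: {largest_model_billion(256, 'fp16'):.0f}B")
```

<p>With these assumptions, a 70B FP16 model needs 140 GB and fits in 192 GB, while a 200B model needs 400 GB at FP16 and only fits in 256 GB once quantized to 8 bits or fewer.</p>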
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="详细产品页">Detailed Product Pages<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#%E8%AF%A6%E7%BB%86%E4%BA%A7%E5%93%81%E9%A1%B5" class="hash-link" aria-label="Direct link to Detailed Product Pages" title="Direct link to Detailed Product Pages" translate="no">​</a></h2>
<ul>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/others/apple-m-series">Apple M-Series overview</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/others/apple-m3-ultra">Apple M3 Ultra 192GB</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/nvidia/h100">NVIDIA H100</a> (comparison)</li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/amd/mi300x">AMD MI300X</a> (comparison)</li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/google/tpu-8i">Google TPU 8i</a> (cloud comparison)</li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/comparison">Full comparison table</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="总结">Summary<a href="https://mirrorfrog.com/en/blog/apple-silicon-ai-comeback-m3-ultra-192gb-local-llm#%E6%80%BB%E7%BB%93" class="hash-link" aria-label="Direct link to Summary" title="Direct link to Summary" translate="no">​</a></h2>
<p>Apple Silicon's <strong>comeback</strong> in the AI era:</p>
<ol>
<li class=""><strong>M3 Ultra 192GB UMA</strong> = local 70B FP16 + 32K KV cache</li>
<li class=""><strong>MLX framework</strong> = 50-100% faster than PyTorch MPS</li>
<li class=""><strong>Price-performance</strong> = roughly 2.4× NVIDIA H100</li>
<li class=""><strong>Power</strong> = 480W (M3 Ultra) vs 5,600W (8× H100)</li>
<li class=""><strong>Apple Intelligence</strong> = a local-first AI assistant</li>
<li class=""><strong>M4 Ultra 256GB</strong> (upcoming) = 200B models locally, once quantized</li>
</ol>
<p><strong>Apple Silicon is not a "data center AI killer", but it is the "king of local AI deployment"</strong>.</p>
<p>If you need:</p>
<ul>
<li class=""><strong>Local LLM inference</strong> → <strong>Apple M3 Ultra</strong> (best fit)</li>
<li class=""><strong>Large-scale training</strong> → NVIDIA H100 / Rubin R200</li>
<li class=""><strong>High-concurrency inference serving</strong> → NVIDIA H100 + Groq 3 LPX</li>
<li class=""><strong>Local text-to-image</strong> → Apple M3 Max / Ultra</li>
<li class=""><strong>Privacy-sensitive AI</strong> → Apple Silicon (fully offline)</li>
</ul>]]></content>
        <author>
            <name>AI Compute Cards Wiki Editorial</name>
            <uri>https://github.com/anomalyco/mirrorfrog</uri>
        </author>
        <category label="Tech Deep Dive" term="Tech Deep Dive"/>
        <category label="Selection Guide" term="Selection Guide"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[AMD MI400 + Helios Rack: 432GB HBM4 + 260 TB/s UALoF Open Interconnect]]></title>
        <id>https://mirrorfrog.com/en/blog/amd-mi400-helios-rack-open-interconnect</id>
        <link href="https://mirrorfrog.com/en/blog/amd-mi400-helios-rack-open-interconnect"/>
        <updated>2026-04-22T00:00:00.000Z</updated>
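        <summary type="html"><![CDATA[In 2026 AMD launches the MI400 (CDNA Next architecture) and the 72-GPU Helios rack to challenge NVIDIA NVL72. The UALoF open interconnect, 432GB of HBM4, and 19.6 TB/s of memory bandwidth are AMD's key weapons.]]></summary>
        <content type="html"><![CDATA[<p>In 2026 AMD launches the <strong>MI400 (CDNA Next)</strong> and the <strong>72-GPU Helios rack</strong>, its flagship answer to NVIDIA NVL72. This article covers the MI400's key specifications, the open-interconnect (UALoF) strategy behind the Helios rack, and how both compare with Rubin R200.</p>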
<!-- -->
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="mi400-核心规格">MI400 Core Specifications<a href="https://mirrorfrog.com/en/blog/amd-mi400-helios-rack-open-interconnect#mi400-%E6%A0%B8%E5%BF%83%E8%A7%84%E6%A0%BC" class="hash-link" aria-label="Direct link to MI400 Core Specifications" title="Direct link to MI400 Core Specifications" translate="no">​</a></h2>
<table><thead><tr><th>Item</th><th>MI400</th><th>Previous-gen MI350</th><th>Gain</th></tr></thead><tbody><tr><td>Architecture</td><td><strong>CDNA Next</strong></td><td>CDNA 4</td><td>New generation</td></tr><tr><td>Process</td><td>TSMC 3nm / 2nm</td><td>TSMC 3nm</td><td>More advanced</td></tr><tr><td>Memory</td><td><strong>432 GB HBM4</strong></td><td>288 GB HBM3e</td><td>1.5×</td></tr><tr><td>Memory bandwidth</td><td><strong>19.6 TB/s</strong></td><td>8 TB/s</td><td>2.45×</td></tr><tr><td>FP4 Tensor (dense)</td><td><strong>40 PFLOPS</strong></td><td>20 PFLOPS</td><td>2×</td></tr><tr><td>FP8 Tensor (dense)</td><td>20 PFLOPS</td><td>10 PFLOPS</td><td>2×</td></tr><tr><td>TDP</td><td>~1,000 W</td><td>~1,000 W</td><td>Unchanged</td></tr><tr><td>PCIe</td><td>Gen 6</td><td>Gen 5</td><td>2×</td></tr><tr><td>Release date</td><td><strong>2026</strong></td><td>2025</td><td>—</td></tr></tbody></table>
<blockquote>
<p><strong>432 GB HBM4 = the largest single-card memory in the world</strong>. That is <strong>50% more</strong> than the 288 GB on NVIDIA Rubin R200, a key advantage for <strong>very large model inference</strong>.</p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="cdna-next-架构亮点">CDNA Next Architecture Highlights<a href="https://mirrorfrog.com/en/blog/amd-mi400-helios-rack-open-interconnect#cdna-next-%E6%9E%B6%E6%9E%84%E4%BA%AE%E7%82%B9" class="hash-link" aria-label="Direct link to CDNA Next Architecture Highlights" title="Direct link to CDNA Next Architecture Highlights" translate="no">​</a></h2>
<p>Key advances AMD makes in CDNA Next:</p>
<ol>
<li class=""><strong>FP4 matrix engine</strong>: native support for MXFP4 / NVFP4</li>
<li class=""><strong>Enhanced sparse compute</strong>: 2× the sparse throughput of CDNA 4</li>
<li class=""><strong>Larger Infinity Cache</strong>: ~512 MB</li>
<li class=""><strong>Heterogeneous scheduler</strong>: CPU+GPU co-optimization (paired with EPYC Venice)</li>
</ol>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="helios-机柜amd-的-nvl72-回应">Helios Rack: AMD's Answer to NVL72<a href="https://mirrorfrog.com/en/blog/amd-mi400-helios-rack-open-interconnect#helios-%E6%9C%BA%E6%9F%9Camd-%E7%9A%84-nvl72-%E5%9B%9E%E5%BA%94" class="hash-link" aria-label="Direct link to Helios Rack: AMD's Answer to NVL72" title="Direct link to Helios Rack: AMD's Answer to NVL72" translate="no">​</a></h2>
<p>Helios is AMD's rack-scale solution, <strong>positioned against NVIDIA GB300 NVL72 / Rubin NVL72</strong>:</p>
<table><thead><tr><th>Item</th><th>Helios rack</th><th>NVIDIA Rubin NVL72</th></tr></thead><tbody><tr><td>GPUs</td><td><strong>72× MI400</strong></td><td>72× Rubin</td></tr><tr><td>CPUs</td><td>36× <strong>EPYC Venice</strong></td><td>36× Vera</td></tr><tr><td>Total HBM</td><td>31.1 TB HBM4</td><td>20.7 TB HBM4</td></tr><tr><td>Scale-up interconnect</td><td><strong>UALoF 260 TB/s</strong></td><td>NVLink 6 252 TB/s</td></tr><tr><td>Scale-out network</td><td>Pensando Vulcano 800G</td><td>ConnectX-9 14.4 Tbps</td></tr><tr><td>FP4 compute (headline)</td><td><strong>2.88 EFLOPS</strong> (dense)</td><td>3.6 EFLOPS (sparse)</td></tr><tr><td>FP4 dense equivalent</td><td>2.88 EF</td><td>1.8 EF</td></tr><tr><td>TDP (rack)</td><td>~80 kW</td><td>~130 kW</td></tr><tr><td>Cooling</td><td>Liquid cooling required</td><td>Liquid cooling required</td></tr></tbody></table>
<blockquote>
<p><strong>AMD Helios beats NVIDIA Rubin NVL72 on dense compute (2.88 vs 1.8 EFLOPS)</strong>. NVIDIA's figure doubles to 3.6 EFLOPS with sparsity, however, so each side wins on one measure.</p>
</blockquote>
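<p>The rack-level totals follow from multiplying the per-GPU figures quoted in this article by 72. The sketch below uses 432 GB and 40 PFLOPS dense FP4 for MI400, and 288 GB and 25 PFLOPS dense FP4 for Rubin R200; the function name is illustrative.</p>

```python
GPUS_PER_RACK = 72  # both Helios and Rubin NVL72 are 72-GPU racks


def rack_totals(hbm_gb_per_gpu, fp4_dense_pflops_per_gpu):
    """Return (total HBM in TB, total dense FP4 in EFLOPS) for one rack."""
    hbm_tb = GPUS_PER_RACK * hbm_gb_per_gpu / 1000
    fp4_eflops = GPUS_PER_RACK * fp4_dense_pflops_per_gpu / 1000
    return hbm_tb, fp4_eflops


if __name__ == "__main__":
    for name, hbm_gb, pflops in [("Helios (MI400)", 432, 40),
                                 ("Rubin NVL72 (R200)", 288, 25)]:
        hbm_tb, eflops = rack_totals(hbm_gb, pflops)
        print(f"{name}: {hbm_tb:.1f} TB HBM4, {eflops:.2f} EFLOPS dense FP4")
```

<p>These are peak aggregates only; delivered throughput depends on interconnect efficiency and software, which the multiplication does not capture.</p>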
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="ualof开放互联挑战-nvlink">UALoF: An Open Interconnect Challenging NVLink<a href="https://mirrorfrog.com/en/blog/amd-mi400-helios-rack-open-interconnect#ualof%E5%BC%80%E6%94%BE%E4%BA%92%E8%81%94%E6%8C%91%E6%88%98-nvlink" class="hash-link" aria-label="Direct link to UALoF: An Open Interconnect Challenging NVLink" title="Direct link to UALoF: An Open Interconnect Challenging NVLink" translate="no">​</a></h2>
<p><strong>Ultra Accelerator Link (UALoF / UALink)</strong> is an <strong>open-standard scale-up interconnect</strong> protocol backed jointly by AMD, Broadcom, and Intel:</p>
<ul>
<li class=""><strong>Goal</strong>: replace NVLink's single-vendor closed ecosystem</li>
<li class=""><strong>2026 debut</strong>: the AMD Helios 72-GPU rack</li>
<li class=""><strong>Next</strong>: Intel Jaguar Shores, AWS UltraServers</li>
</ul>
<p>Key UALoF characteristics:</p>
<table><thead><tr><th>Feature</th><th>UALoF</th><th>NVLink 6</th></tr></thead><tbody><tr><td>Standardization</td><td>Open standard</td><td>NVIDIA proprietary</td></tr><tr><td>Bandwidth (rack level)</td><td>260 TB/s</td><td>252 TB/s</td></tr><tr><td>Vendors</td><td>AMD/Broadcom/Intel</td><td>NVIDIA only</td></tr><tr><td>Ecosystem</td><td>ROCm + Open</td><td>CUDA only</td></tr><tr><td>Future extensibility</td><td><strong>High</strong></td><td>Limited</td></tr></tbody></table>
<blockquote>
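<p><strong>The real threat from UALoF lies in the future, not the present</strong>. If UALoF builds out a complete ecosystem within 2-3 years, NVIDIA's closed-interconnect advantage will erode.</p>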
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="rocm-8-软件生态">ROCm 8 Software Ecosystem<a href="https://mirrorfrog.com/en/blog/amd-mi400-helios-rack-open-interconnect#rocm-8-%E8%BD%AF%E4%BB%B6%E7%94%9F%E6%80%81" class="hash-link" aria-label="Direct link to ROCm 8 Software Ecosystem" title="Direct link to ROCm 8 Software Ecosystem" translate="no">​</a></h2>
<p>AMD continues to invest in ROCm:</p>
<ul>
<li class=""><strong>ROCm 7.x</strong> (2025 GA): PyTorch / JAX / Triton fully optimized</li>
<li class=""><strong>ROCm 8.x</strong> (2026): debuts with CDNA Next, with full FP4 / FP8 support</li>
<li class=""><strong>vLLM 0.7+</strong> (AMD-SGLang optimized build)</li>
<li class=""><strong>AMD Composable Kernel (CK)</strong>: open-source kernel library, comparable to NVIDIA CUTLASS</li>
<li class=""><strong>MIGraphX / ONNX-Runtime</strong>: inference engines</li>
<li class=""><strong>Infinity Hub</strong>: AMD's official reference implementations</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="部署推荐">Deployment Recommendations<a href="https://mirrorfrog.com/en/blog/amd-mi400-helios-rack-open-interconnect#%E9%83%A8%E7%BD%B2%E6%8E%A8%E8%8D%90" class="hash-link" aria-label="Direct link to Deployment Recommendations" title="Direct link to Deployment Recommendations" translate="no">​</a></h2>
<table><thead><tr><th>Scenario</th><th>Recommended configuration</th></tr></thead><tbody><tr><td><strong>700B+ model training</strong></td><td>Helios rack (72 GPUs; <strong>a single rack can run a 700B model</strong>)</td></tr><tr><td><strong>1T+ giant model training</strong></td><td>Multiple racks + cross-rack UALoF interconnect</td></tr><tr><td><strong>Ultra-low-latency inference</strong></td><td>MI400 + FP4 + vLLM/AMD-SGLang</td></tr><tr><td><strong>Scientific computing</strong></td><td>MI400 + ROCm 7/8 + OpenMP</td></tr><tr><td><strong>Multimodal generation</strong></td><td>MI400 (the full 432GB available)</td></tr><tr><td><strong>Preference for open ecosystems</strong></td><td>UALoF + ROCm 8 (avoids NVIDIA lock-in)</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="mi400-vs-rubin-r200同期旗舰对比">MI400 vs Rubin R200 (Same-Generation Flagship Comparison)<a href="https://mirrorfrog.com/en/blog/amd-mi400-helios-rack-open-interconnect#mi400-vs-rubin-r200%E5%90%8C%E6%9C%9F%E6%97%97%E8%88%B0%E5%AF%B9%E6%AF%94" class="hash-link" aria-label="Direct link to MI400 vs Rubin R200 (Same-Generation Flagship Comparison)" title="Direct link to MI400 vs Rubin R200 (Same-Generation Flagship Comparison)" translate="no">​</a></h2>
<table><thead><tr><th>Metric</th><th>MI400 (CDNA Next)</th><th>Rubin R200</th></tr></thead><tbody><tr><td>Memory</td><td><strong>432 GB HBM4</strong> ✅</td><td>288 GB HBM4</td></tr><tr><td>Memory bandwidth</td><td>19.6 TB/s</td><td><strong>22 TB/s</strong> ✅</td></tr><tr><td>FP4 dense</td><td>40 PF ✅</td><td>25 PF</td></tr><tr><td>FP8 dense</td><td>20 PF</td><td>12.5 PF</td></tr><tr><td>Per-GPU interconnect</td><td>UALoF (open) ✅</td><td>NVLink 6 (closed)</td></tr><tr><td>Per-GPU network</td><td>Pensando 800G</td><td><strong>ConnectX-9 14.4 Tbps</strong> ✅</td></tr><tr><td>CPU</td><td>EPYC Venice</td><td><strong>Vera ARM, 88 cores</strong> ✅</td></tr><tr><td>Ecosystem</td><td>ROCm 8 (open source) ✅</td><td>CUDA 13 (mature) ✅</td></tr><tr><td>Standardization</td><td>UALoF ✅</td><td>NVLink ❌</td></tr><tr><td>TDP</td><td>1,000 W ✅</td><td>1,800 W</td></tr></tbody></table>
<blockquote>
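<p><strong>AMD strengths</strong>: larger memory, leading dense FP4 compute, open interconnect, lower power
<strong>NVIDIA strengths</strong>: HBM bandwidth, CPU integration, data center networking, CUDA ecosystem</p>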
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="详细产品页">Detailed Product Pages<a href="https://mirrorfrog.com/en/blog/amd-mi400-helios-rack-open-interconnect#%E8%AF%A6%E7%BB%86%E4%BA%A7%E5%93%81%E9%A1%B5" class="hash-link" aria-label="Direct link to Detailed Product Pages" title="Direct link to Detailed Product Pages" translate="no">​</a></h2>
<ul>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/amd/mi400">AMD MI400 full specifications</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/amd/mi350">AMD MI350 (previous generation)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/nvidia/rubin-r200">NVIDIA Rubin R200 (same generation)</a></li>
<li class=""><a href="https://www.amd.com/en/products/accelerators/instinct.html" target="_blank" rel="noopener noreferrer" class="">AMD Helios rack documentation</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/roadmap">Future roadmap</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="总结">Summary<a href="https://mirrorfrog.com/en/blog/amd-mi400-helios-rack-open-interconnect#%E6%80%BB%E7%BB%93" class="hash-link" aria-label="Direct link to Summary" title="Direct link to Summary" translate="no">​</a></h2>
<p>AMD MI400 + Helios is AMD's <strong>strongest counterattack</strong> in AI compute:</p>
<ol>
<li class=""><strong>CDNA Next + 432 GB HBM4</strong> matches NVIDIA on hardware specifications</li>
<li class=""><strong>The Helios 72-GPU rack</strong> even exceeds NVIDIA NVL72 on dense compute</li>
<li class=""><strong>The UALoF open interconnect</strong> is a real threat to the closed NVLink</li>
<li class=""><strong>The ROCm 8 ecosystem</strong> keeps improving but still needs time</li>
</ol>
<p>In 2026, AMD is <strong>the only GPU vendor able to challenge NVIDIA head-on</strong>.</p>]]></content>
        <author>
            <name>AI Compute Cards Wiki Editorial</name>
            <uri>https://github.com/anomalyco/mirrorfrog</uri>
        </author>
        <category label="Product Launch" term="Product Launch"/>
        <category label="Tech Deep Dive" term="Tech Deep Dive"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Google TPU 8t + 8i: The First TPU Generation Split into Training and Inference]]></title>
        <id>https://mirrorfrog.com/en/blog/google-tpu-8t-8i-split-training-inference</id>
        <link href="https://mirrorfrog.com/en/blog/google-tpu-8t-8i-split-training-inference"/>
        <updated>2026-04-22T00:00:00.000Z</updated>
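        <summary type="html"><![CDATA[On 2026-04-22 Google announced TPU 8t (training-only, 216GB HBM) and TPU 8i (inference-only, 288GB HBM), splitting the TPU into separate training and inference product lines for the first time and integrating the Arm Axion CPU.]]></summary>
        <content type="html"><![CDATA[<p>On April 22, 2026, Google announced <strong>TPU 8t + TPU 8i</strong> at Cloud Next, <strong>splitting the TPU into separate training and inference product lines for the first time</strong>. TPU 8t targets training and TPU 8i targets inference. It is Google's key product adjustment for the inference era of AI.</p>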
<!-- -->
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="为什么要拆分-tpu">Why Split the TPU?<a href="https://mirrorfrog.com/en/blog/google-tpu-8t-8i-split-training-inference#%E4%B8%BA%E4%BB%80%E4%B9%88%E8%A6%81%E6%8B%86%E5%88%86-tpu" class="hash-link" aria-label="Direct link to Why Split the TPU?" title="Direct link to Why Split the TPU?" translate="no">​</a></h2>
<p>The previous 7 TPU generations (v1 → v7 Ironwood) were all <strong>general-purpose across training and inference</strong>:</p>
<ul>
<li class="">v4-v6e: training first, inference second</li>
<li class="">v7 Ironwood: began leaning toward inference, but still general-purpose</li>
</ul>
<p>In 2025-2026, however, the AI industry changed fundamentally:</p>
<ol>
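<li class=""><strong>Training demand</strong>: only a handful of frontier companies (OpenAI, Anthropic, Google DeepMind, Meta, xAI) need it</li>
<li class=""><strong>Inference demand</strong>: <strong>every AI application needs it</strong>, making it a market 100× larger</li>
<li class=""><strong>Inference optimization</strong>: <strong>fundamentally different from training</strong>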
<ul>
<li class="">Training: compute and interconnect first (compute-bound)</li>
<li class="">Inference: memory capacity, bandwidth, and cooling flexibility first (memory-bound and TCO-sensitive)</li>
</ul>
</li>
</ol>
<p>Google therefore decided to <strong>split the TPU into two product lines</strong>:</p>
<table><thead><tr><th>Product</th><th>Positioning</th><th>Core optimization</th></tr></thead><tbody><tr><td><strong>TPU 8t</strong></td><td>Training-only</td><td>Compute + interconnect + integrated Axion CPU</td></tr><tr><td><strong>TPU 8i</strong></td><td>Inference-only</td><td>Memory capacity + bandwidth + cooling flexibility</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="tpu-8t训练专用">TPU 8t: Training-Only<a href="https://mirrorfrog.com/en/blog/google-tpu-8t-8i-split-training-inference#tpu-8t%E8%AE%AD%E7%BB%83%E4%B8%93%E7%94%A8" class="hash-link" aria-label="Direct link to TPU 8t: Training-Only" title="Direct link to TPU 8t: Training-Only" translate="no">​</a></h2>
<table><thead><tr><th>Item</th><th>Specification</th></tr></thead><tbody><tr><td>Architecture</td><td>TPU 8t (Trillium 2)</td></tr><tr><td>Form</td><td><strong>Training-only</strong></td></tr><tr><td>BF16 compute (dense)</td><td>~3,500 TFLOPS</td></tr><tr><td>FP8 compute (dense)</td><td>~7,000 TFLOPS</td></tr><tr><td>HBM capacity</td><td><strong>216 GB</strong></td></tr><tr><td>HBM bandwidth</td><td><strong>6,528 GB/s</strong></td></tr><tr><td>ICI interconnect</td><td>1,400 GB/s (bidirectional)</td></tr><tr><td>Integrated CPU</td><td><strong>Arm Axion (Google-designed, 64 cores)</strong></td></tr><tr><td>Pod size</td><td><strong>9,216 chips</strong></td></tr><tr><td>Topology</td><td>3D Torus</td></tr><tr><td>Cooling</td><td>Liquid</td></tr></tbody></table>
<blockquote>
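<p><strong>Arm Axion is Google's in-house 64-core ARM CPU</strong>, and this is its first appearance inside a TPU node. It turns the TPU 8t node into a <strong>combined TPU + Axion CPU system</strong>, positioned against the NVIDIA Vera CPU.</p>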
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="tpu-8i推理专用">TPU 8i: Inference-Only<a href="https://mirrorfrog.com/en/blog/google-tpu-8t-8i-split-training-inference#tpu-8i%E6%8E%A8%E7%90%86%E4%B8%93%E7%94%A8" class="hash-link" aria-label="Direct link to TPU 8i: Inference-Only" title="Direct link to TPU 8i: Inference-Only" translate="no">​</a></h2>
<table><thead><tr><th>Item</th><th>Specification</th></tr></thead><tbody><tr><td>Architecture</td><td>TPU 8i (Trillium 2)</td></tr><tr><td>Form</td><td><strong>Inference-only</strong></td></tr><tr><td>BF16 compute (dense)</td><td><strong>~5,500 TFLOPS</strong></td></tr><tr><td>FP8 compute (dense)</td><td>~11,000 TFLOPS</td></tr><tr><td>INT8 compute</td><td>~22,000 TOPS</td></tr><tr><td>HBM capacity</td><td><strong>288 GB</strong></td></tr><tr><td>HBM bandwidth</td><td><strong>8,601 GB/s</strong></td></tr><tr><td>Cooling</td><td><strong>Air or liquid</strong></td></tr><tr><td>Pod size</td><td>256 chips</td></tr></tbody></table>
<blockquote>
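<p><strong>TPU 8i with 288GB of HBM per chip = the largest-memory inference ASIC available today</strong>. A single chip holds a 70B FP16 model without tensor parallelism, which suits <strong>long-context RAG and agentic AI</strong>.</p>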
</blockquote>
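<p>How much context fits beside the weights depends on the model's KV-cache shape. The sketch below estimates it under assumptions that are not from the announcement: a Llama-style 70B layout (80 layers, 8 KV heads with grouped-query attention, head dimension 128) and FP16 weights and cache. The helper names are illustrative.</p>

```python
def kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_value):
    """Bytes of KV cache one token occupies: keys and values for every layer."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def max_context_tokens(hbm_gb, weights_gb, per_token_bytes):
    """Tokens of KV cache that fit in the HBM left over after the weights."""
    free_bytes = (hbm_gb - weights_gb) * 10**9
    return free_bytes // per_token_bytes


if __name__ == "__main__":
    # Assumed Llama-style 70B shape: 80 layers, 8 KV heads, head_dim 128, FP16.
    per_token = kv_bytes_per_token(80, 8, 128, 2)
    weights_gb = 70 * 2  # 70B parameters at 2 bytes each
    print(per_token, "bytes of KV cache per token")
    print(max_context_tokens(288, weights_gb, per_token), "tokens fit beside the weights")
```

<p>Setting <code>bytes_per_value</code> to 1 models an 8-bit KV cache, and lowering <code>weights_gb</code> models quantized weights; both raise the token budget substantially.</p>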
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="tpu-8t-vs-8i-关键差异">TPU 8t vs 8i: Key Differences<a href="https://mirrorfrog.com/en/blog/google-tpu-8t-8i-split-training-inference#tpu-8t-vs-8i-%E5%85%B3%E9%94%AE%E5%B7%AE%E5%BC%82" class="hash-link" aria-label="Direct link to TPU 8t vs 8i: Key Differences" title="Direct link to TPU 8t vs 8i: Key Differences" translate="no">​</a></h2>
<table><thead><tr><th>Metric</th><th>TPU 8t (training)</th><th><strong>TPU 8i (inference)</strong></th></tr></thead><tbody><tr><td>Positioning</td><td>Training</td><td><strong>Inference</strong></td></tr><tr><td>BF16 compute</td><td>~3,500 TFLOPS</td><td><strong>~5,500 TFLOPS</strong> (higher)</td></tr><tr><td>HBM capacity</td><td>216 GB</td><td><strong>288 GB</strong> (larger)</td></tr><tr><td>HBM bandwidth</td><td>6,528 GB/s</td><td><strong>8,601 GB/s</strong> (higher)</td></tr><tr><td>Cooling</td><td>Liquid</td><td><strong>Air or liquid</strong></td></tr><tr><td>Pod size</td><td>9,216 chips</td><td>256 chips</td></tr><tr><td>Integrated CPU</td><td>Arm Axion</td><td>None (standalone)</td></tr><tr><td>Price</td><td>High</td><td><strong>Medium</strong></td></tr></tbody></table>
<blockquote>
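<p><strong>Purpose of the split</strong>: training emphasizes compute and interconnect, while inference emphasizes memory capacity, bandwidth, and cooling flexibility.</p>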
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="tpu-8i-推理范式优化">TPU 8i Inference Optimizations<a href="https://mirrorfrog.com/en/blog/google-tpu-8t-8i-split-training-inference#tpu-8i-%E6%8E%A8%E7%90%86%E8%8C%83%E5%BC%8F%E4%BC%98%E5%8C%96" class="hash-link" aria-label="Direct link to TPU 8i Inference Optimizations" title="Direct link to TPU 8i Inference Optimizations" translate="no">​</a></h2>
<p>TPU 8i is optimized specifically for inference:</p>
<table><thead><tr><th>Optimization</th><th>Details</th></tr></thead><tbody><tr><td><strong>Ultra-low latency</strong></td><td><strong>TTFT &lt; 100ms</strong> (time to first token)</td></tr><tr><td><strong>High throughput</strong></td><td>10,000+ tok/s (70B model, FP8)</td></tr><tr><td><strong>Long-context KV</strong></td><td><strong>288GB holds a full 1M+ token context</strong></td></tr><tr><td><strong>MoE inference</strong></td><td>Native Expert Parallel support</td></tr><tr><td><strong>Speculative Decoding</strong></td><td>Built-in speculative acceleration</td></tr><tr><td><strong>Batching</strong></td><td>Continuous batching + PagedAttention</td></tr><tr><td><strong>Continuous KV Cache</strong></td><td><strong>KV cache shared across requests</strong> (shared-prefix optimization)</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="tpu-8t-训练范式优化">TPU 8t Training Optimizations<a href="https://mirrorfrog.com/en/blog/google-tpu-8t-8i-split-training-inference#tpu-8t-%E8%AE%AD%E7%BB%83%E8%8C%83%E5%BC%8F%E4%BC%98%E5%8C%96" class="hash-link" aria-label="Direct link to TPU 8t Training Optimizations" title="Direct link to TPU 8t Training Optimizations" translate="no">​</a></h2>
<p>TPU 8t is optimized specifically for training:</p>
<table><thead><tr><th>Optimization</th><th>Details</th></tr></thead><tbody><tr><td><strong>MoE training</strong></td><td>Native Expert Parallel support (DeepSeek / Mixtral style)</td></tr><tr><td><strong>Long-context training</strong></td><td>Optimized for training with 1M+ token contexts</td></tr><tr><td><strong>RLHF / post-training</strong></td><td>Native optimization for online RL (DPO / PPO / GRPO)</td></tr><tr><td><strong>Multimodal training</strong></td><td>Joint vision-language training (ViT + LLM in sync)</td></tr><tr><td><strong>AXIOM</strong></td><td>Arm Axion CPU offload (data preprocessing / weight initialization)</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="tpu-8i-推理服务定价">TPU 8i Inference Pricing<a href="https://mirrorfrog.com/en/blog/google-tpu-8t-8i-split-training-inference#tpu-8i-%E6%8E%A8%E7%90%86%E6%9C%8D%E5%8A%A1%E5%AE%9A%E4%BB%B7" class="hash-link" aria-label="Direct link to TPU 8i Inference Pricing" title="Direct link to TPU 8i Inference Pricing" translate="no">​</a></h2>
<table><thead><tr><th>Instance</th><th>Hourly price (estimated)</th></tr></thead><tbody><tr><td><strong>TPU 8i v6e-equivalent</strong></td><td>~$3-5 / chip</td></tr><tr><td><strong>TPU v7 Ironwood</strong></td><td>~$6-8 / chip</td></tr><tr><td><strong>TPU 8i vs TPU v7</strong></td><td><strong>+50%</strong> price / <strong>+150%</strong> compute</td></tr></tbody></table>
<blockquote>
<p><strong>TPU 8i delivers roughly 67% more BF16 compute per dollar than TPU v7 Ironwood</strong> (2.5× the compute at 1.5× the price, using the +150% / +50% figures above).</p>
</blockquote>
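<p>The compute-per-dollar figure is simply the compute ratio divided by the price ratio. The sketch below works it through for the estimated +150% compute and +50% price from the table; the function name is illustrative, and the inputs are this article's estimates rather than published pricing.</p>

```python
def perf_per_dollar_gain(compute_ratio, price_ratio):
    """Relative change in compute per dollar; 0.67 means 67% more."""
    return compute_ratio / price_ratio - 1.0


if __name__ == "__main__":
    # +150% compute is a 2.5x ratio; +50% price is a 1.5x ratio.
    gain = perf_per_dollar_gain(2.5, 1.5)
    print(f"Compute per dollar: {gain:+.0%}")
    # A 2.4x compute ratio at the same price ratio gives a smaller gain.
    print(f"With 2.4x compute: {perf_per_dollar_gain(2.4, 1.5):+.0%}")
```

<p>Because the result is a ratio of two estimates, small changes in either input move the headline percentage noticeably.</p>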
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="软件生态">Software Ecosystem<a href="https://mirrorfrog.com/en/blog/google-tpu-8t-8i-split-training-inference#%E8%BD%AF%E4%BB%B6%E7%94%9F%E6%80%81" class="hash-link" aria-label="Direct link to Software Ecosystem" title="Direct link to Software Ecosystem" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="tpu-8t">TPU 8t<a href="https://mirrorfrog.com/en/blog/google-tpu-8t-8i-split-training-inference#tpu-8t" class="hash-link" aria-label="Direct link to TPU 8t" title="Direct link to TPU 8t" translate="no">​</a></h3>
<ul>
<li class=""><strong>JAX 0.5+</strong>: Google's primary training framework</li>
<li class=""><strong>PyTorch/XLA 2.5+</strong>: PyTorch compatibility</li>
<li class=""><strong>TensorFlow 2.17+</strong>: legacy framework</li>
<li class=""><strong>Paxml / Orbax</strong>: Google's internal LLM training stack</li>
<li class=""><strong>MaxText</strong>: Google's reference implementation</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="tpu-8i">TPU 8i<a href="https://mirrorfrog.com/en/blog/google-tpu-8t-8i-split-training-inference#tpu-8i" class="hash-link" aria-label="Direct link to TPU 8i" title="Direct link to TPU 8i" translate="no">​</a></h3>
<ul>
<li class=""><strong>JAX 0.5+</strong>: inference</li>
<li class=""><strong>PyTorch/XLA 2.5+</strong>: inference</li>
<li class=""><strong>vLLM 0.8+</strong> (TPU backend): low-latency inference</li>
<li class=""><strong>Vertex AI Inference</strong>: Google's managed inference service</li>
<li class=""><strong>Gemini API</strong>: the largest internal user</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="与同期竞品对比">Comparison with Same-Generation Competitors<a href="https://mirrorfrog.com/en/blog/google-tpu-8t-8i-split-training-inference#%E4%B8%8E%E5%90%8C%E6%9C%9F%E7%AB%9E%E5%93%81%E5%AF%B9%E6%AF%94" class="hash-link" aria-label="Direct link to Comparison with Same-Generation Competitors" title="Direct link to Comparison with Same-Generation Competitors" translate="no">​</a></h2>
<table><thead><tr><th>Metric</th><th>TPU 8t</th><th>TPU 8i</th><th>NVIDIA B300 Ultra</th><th>Groq 3 LPX</th></tr></thead><tbody><tr><td>Positioning</td><td>Training</td><td>Inference</td><td>Training + inference</td><td>Ultra-low-latency inference</td></tr><tr><td>HBM/SRAM</td><td>216 GB HBM</td><td>288 GB HBM</td><td>288 GB HBM3e</td><td>128 GB SRAM</td></tr><tr><td>Bandwidth</td><td>6.5 TB/s</td><td><strong>8.6 TB/s</strong></td><td>8 TB/s</td><td><strong>40 PB/s</strong></td></tr><tr><td>Compute (BF16 unless noted)</td><td>3.5 PF</td><td>5.5 PF</td><td>3.5 PF (FP8 dense)</td><td>320 PF (rack)</td></tr><tr><td>Interconnect</td><td>3D Torus</td><td>3D Torus</td><td>NVLink 5</td><td>GroqSync</td></tr><tr><td>Cooling</td><td>Liquid</td><td><strong>Air</strong></td><td>Liquid</td><td>Liquid</td></tr><tr><td>Customers</td><td>Google DeepMind</td><td>Gemini / Vertex AI</td><td>AWS / Azure</td><td>NVIDIA customers</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="详细产品页">Detailed Product Pages<a href="https://mirrorfrog.com/en/blog/google-tpu-8t-8i-split-training-inference#%E8%AF%A6%E7%BB%86%E4%BA%A7%E5%93%81%E9%A1%B5" class="hash-link" aria-label="Direct link to Detailed Product Pages" title="Direct link to Detailed Product Pages" translate="no">​</a></h2>
<ul>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/google/tpu-8t">Google TPU 8t full specifications</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/google/tpu-8i">Google TPU 8i full specifications</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/google/tpu-ironwood">Google TPU v7 Ironwood (previous generation)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/architectures/arch-tpu">TPU architecture explained</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/nvidia/b300-ultra">NVIDIA B300 Ultra</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="总结">Summary<a href="https://mirrorfrog.com/en/blog/google-tpu-8t-8i-split-training-inference#%E6%80%BB%E7%BB%93" class="hash-link" aria-label="Direct link to Summary" title="Direct link to Summary" translate="no">​</a></h2>
<p>The Google TPU 8t + 8i split is a landmark event of the AI inference era:</p>
<ol>
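<li class=""><strong>First split into training and inference TPUs</strong>: the TPU enters an era of specialization</li>
<li class=""><strong>TPU 8i with 288GB HBM</strong>: a single chip holds a 70B model</li>
<li class=""><strong>Air-cooled TPU 8i</strong>: lowers the barrier to data center deployment</li>
<li class=""><strong>Arm Axion integration</strong>: Google's in-house CPU enters the TPU</li>
<li class=""><strong>JAX training paradigm</strong>: Google is betting on JAX as the next-generation training standard</li>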
</ol>
<p><strong>Google now covers AI compute across every scenario</strong>:</p>
<ul>
<li class="">Training: TPU 8t pods</li>
<li class="">General inference: TPU 8i</li>
<li class="">Gemini API: TPU 8i clusters</li>
<li class="">Vertex AI: TPU 8i, commercially available</li>
</ul>]]></content>
        <author>
            <name>AI Compute Cards Wiki Editorial</name>
            <uri>https://github.com/anomalyco/mirrorfrog</uri>
        </author>
        <category label="Product Launch" term="Product Launch"/>
        <category label="Vendor Strategy" term="Vendor Strategy"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[NVIDIA Vera Rubin Platform Deep Dive: 6-Chip Package, 288GB HBM4, 50 PFLOPS FP4]]></title>
        <id>https://mirrorfrog.com/en/blog/nvidia-vera-rubin-platform-6-chips-deep-dive</id>
        <link href="https://mirrorfrog.com/en/blog/nvidia-vera-rubin-platform-6-chips-deep-dive"/>
        <updated>2026-04-22T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[NVIDIA's next-gen flagship platform launching H2 2026: 6-chip CoWoS-L package, 288GB HBM4, 22 TB/s bandwidth, 50 PFLOPS FP4 sparse compute, ConnectX-9 28.8 TB/s networking.]]></summary>
        <content type="html"><![CDATA[<p>The NVIDIA Vera Rubin platform is NVIDIA's next-generation flagship computing platform after Blackwell. This article provides an in-depth analysis covering the naming origin, 6-chip packaging, memory subsystem, compute matrix, networking architecture, rack-scale solution, and software ecosystem.</p>
<!-- -->
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="naming-origin-honoring-astronomer-vera-rubin">Naming Origin: Honoring Astronomer Vera Rubin<a href="https://mirrorfrog.com/en/blog/nvidia-vera-rubin-platform-6-chips-deep-dive#naming-origin-honoring-astronomer-vera-rubin" class="hash-link" aria-label="Direct link to Naming Origin: Honoring Astronomer Vera Rubin" title="Direct link to Naming Origin: Honoring Astronomer Vera Rubin" translate="no">​</a></h2>
<p>NVIDIA chose "Vera Rubin" as the codename for the next-generation platform, honoring astronomer <strong>Vera Florence Cooper Rubin</strong> (1928-2016). In the 1960s and 1970s, her measurements of galactic rotation curves <strong>provided some of the strongest early observational evidence for dark matter</strong>.</p>
<p>Naming the next-generation AI compute platform after her invites an analogy:</p>
<ul>
<li class=""><strong>Dark matter</strong> is the "invisible" yet dominant mass component of the universe</li>
<li class=""><strong>AI compute</strong> is the "invisible" yet dominant underlying infrastructure of the digital economy</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="6-chip-packaging-cowos-l">6-Chip Packaging (CoWoS-L)<a href="https://mirrorfrog.com/en/blog/nvidia-vera-rubin-platform-6-chips-deep-dive#6-chip-packaging-cowos-l" class="hash-link" aria-label="Direct link to 6-Chip Packaging (CoWoS-L)" title="Direct link to 6-Chip Packaging (CoWoS-L)" translate="no">​</a></h2>
<p>The Vera Rubin platform adopts the industry's first <strong>6-chip CoWoS-L packaging</strong>:</p>
<table><thead><tr><th>Chip</th><th>Quantity</th><th>Role</th><th>Process</th></tr></thead><tbody><tr><td><strong>Vera CPU</strong></td><td>1</td><td>Host CPU / Prefetch / Interconnect Controller</td><td>TSMC 3NP</td></tr><tr><td><strong>Rubin GPU Die</strong></td><td>2</td><td>Matrix Compute Cores</td><td>TSMC 3NP / 4NP</td></tr><tr><td><strong>I/O / HBM Base Die</strong></td><td>3</td><td>HBM4 PHY + I/O + Interconnect</td><td>TSMC 4NP</td></tr></tbody></table>
<p>Compared to the Blackwell B300 Ultra's 2-chip package (only 2 GPU dies), Vera Rubin features a <strong>3× increase in chip count</strong>, with substantial transistor count growth.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="core-specifications-per-gpu">Core Specifications (Per GPU)<a href="https://mirrorfrog.com/en/blog/nvidia-vera-rubin-platform-6-chips-deep-dive#core-specifications-per-gpu" class="hash-link" aria-label="Direct link to Core Specifications (Per GPU)" title="Direct link to Core Specifications (Per GPU)" translate="no">​</a></h2>
<table><thead><tr><th>Item</th><th>Specification</th></tr></thead><tbody><tr><td><strong>Architecture</strong></td><td>Rubin</td></tr><tr><td><strong>Transistor Count</strong></td><td>~340 billion (per GPU)</td></tr><tr><td><strong>Memory</strong></td><td><strong>288 GB HBM4</strong></td></tr><tr><td><strong>Memory Bandwidth</strong></td><td><strong>22 TB/s</strong> (2.75× HBM3e)</td></tr><tr><td><strong>FP4 Tensor (sparse)</strong></td><td><strong>50 PFLOPS</strong></td></tr><tr><td><strong>FP8 Tensor (sparse)</strong></td><td>25 PFLOPS</td></tr><tr><td><strong>FP16/BF16 Tensor</strong></td><td>12.5 PFLOPS</td></tr><tr><td><strong>TDP</strong></td><td>~1,800 W (liquid cooling required)</td></tr></tbody></table>
<blockquote>
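<p><strong>Data Convention</strong>: Post-Blackwell NVIDIA products continue to quote <strong>sparse</strong> compute as the headline metric, which is 2× the dense figure: 50 PF sparse FP4 corresponds to 25 PF dense. For cross-vendor comparison, put both sides on the same basis: AMD MI400's 40 PF dense FP4 would be quoted as 80 PF under NVIDIA's sparse convention, against Rubin R200's 50 PF.</p>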
</blockquote>
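<p>The convention above can be sketched as a pair of conversions. The function names are hypothetical, and the fixed 2:1 sparse-to-dense ratio is simply the relationship stated in the note, not a measured value.</p>

```python
# Minimal sketch: put vendor FP4 figures on a common dense or sparse basis.
# Assumption: sparse throughput is exactly 2x dense, as stated in the note.
SPARSE_TO_DENSE_RATIO = 2.0

def to_dense(sparse_pflops: float) -> float:
    """Convert a sparse headline figure to its dense equivalent."""
    return sparse_pflops / SPARSE_TO_DENSE_RATIO

def to_sparse(dense_pflops: float) -> float:
    """Convert a dense figure to the sparse convention."""
    return dense_pflops * SPARSE_TO_DENSE_RATIO

rubin_dense_pf = to_dense(50.0)          # Rubin R200, quoted as 50 PF sparse
mi400_sparse_equiv_pf = to_sparse(40.0)  # MI400, quoted as 40 PF dense

print(f"Rubin R200 dense FP4: {rubin_dense_pf} PF")
print(f"MI400 sparse-equivalent FP4: {mi400_sparse_equiv_pf} PF")
```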
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="vera-cpu-deep-dive">Vera CPU Deep Dive<a href="https://mirrorfrog.com/en/blog/nvidia-vera-rubin-platform-6-chips-deep-dive#vera-cpu-deep-dive" class="hash-link" aria-label="Direct link to Vera CPU Deep Dive" title="Direct link to Vera CPU Deep Dive" translate="no">​</a></h2>
<p>The <strong>Vera CPU</strong> succeeds Grace and is NVIDIA's first server CPU built on custom-designed ARM cores (Grace used ARM's Neoverse cores):</p>
<table><thead><tr><th>Item</th><th>Specification</th></tr></thead><tbody><tr><td><strong>Architecture</strong></td><td>ARM v9.2 Olympus</td></tr><tr><td><strong>Core Count</strong></td><td><strong>88 cores</strong> (single package)</td></tr><tr><td><strong>Process</strong></td><td>TSMC 3NP</td></tr><tr><td><strong>L2 Cache</strong></td><td>1 MB per core</td></tr><tr><td><strong>L3 Cache</strong></td><td>Shared 264 MB</td></tr><tr><td><strong>Memory</strong></td><td>12-channel DDR5-8000</td></tr><tr><td><strong>Memory Bandwidth</strong></td><td>614 GB/s</td></tr><tr><td><strong>CXL 2.0</strong></td><td>Yes (Type-3 memory expansion)</td></tr><tr><td><strong>PCIe</strong></td><td>PCIe Gen 6 (128 lanes)</td></tr><tr><td><strong>TDP</strong></td><td>300-450 W</td></tr></tbody></table>
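<p>The memory rows in the table can be cross-checked with the usual peak-bandwidth formula. This is a sketch under one assumption that the source does not state: each channel is 64 bits (8 bytes) wide. Under that assumption, 12 channels of DDR5-8000 give 768 GB/s, while the listed 614 GB/s corresponds to DDR5-6400; the source does not say which figure is intended.</p>

```python
# Sketch: theoretical peak DRAM bandwidth = channels * transfer rate * bus width.
# Assumption (not from the article): 8 bytes (64 bits) per channel per transfer.
def ddr_peak_gb_per_s(channels: int, mega_transfers_per_s: int,
                      bytes_per_transfer: int = 8) -> float:
    """Peak bandwidth in GB/s for the given channel count and MT/s rating."""
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000

ddr5_8000_gb_s = ddr_peak_gb_per_s(12, 8000)  # speed grade listed in the table
ddr5_6400_gb_s = ddr_peak_gb_per_s(12, 6400)  # speed grade implied by 614 GB/s

print(f"12 x DDR5-8000: {ddr5_8000_gb_s} GB/s")
print(f"12 x DDR5-6400: {ddr5_6400_gb_s} GB/s")
```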
<blockquote>
<p><strong>Why ARM?</strong> NVIDIA's 2020 bid to acquire ARM was abandoned in 2022, but NVIDIA is reported to hold a <strong>permanent ARM architecture license</strong> along with access to the <strong>Neoverse series</strong> designs. Vera is essentially NVIDIA's own "Olympus" core design on the ARM architecture.</p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="rubin-gpu-4x-blackwell-performance">Rubin GPU: 4x Blackwell Performance<a href="https://mirrorfrog.com/en/blog/nvidia-vera-rubin-platform-6-chips-deep-dive#rubin-gpu-4x-blackwell-performance" class="hash-link" aria-label="Direct link to Rubin GPU: 4x Blackwell Performance" title="Direct link to Rubin GPU: 4x Blackwell Performance" translate="no">​</a></h2>
<p>The Rubin GPU targets up to <strong>4× Blackwell B300 Ultra</strong> performance on some workloads (peak FP4 throughput alone rises about 2.5×) through:</p>
<ul>
<li class=""><strong>2× SMs</strong> (Streaming Multiprocessors)</li>
<li class=""><strong>2.75× memory bandwidth</strong> (HBM4 at 22 TB/s vs HBM3e at 8 TB/s)</li>
<li class=""><strong>FP4 Tensor Cores</strong> (2× FP8 throughput: 50 vs 25 PFLOPS sparse)</li>
<li class=""><strong>Transformer Engine 3</strong> (9× DNN inference acceleration)</li>
<li class=""><strong>RAS Engine</strong> (Reliability, Availability, Serviceability)</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="memory-subsystem-hbm4-288gb">Memory Subsystem: HBM4 288GB<a href="https://mirrorfrog.com/en/blog/nvidia-vera-rubin-platform-6-chips-deep-dive#memory-subsystem-hbm4-288gb" class="hash-link" aria-label="Direct link to Memory Subsystem: HBM4 288GB" title="Direct link to Memory Subsystem: HBM4 288GB" translate="no">​</a></h2>
<p>The HBM4 standard, published by JEDEC in 2025, brings:</p>
<ul>
<li class=""><strong>Per-stack capacity</strong>: 36 GB → <strong>48 GB</strong> (33% increase)</li>
<li class=""><strong>Per-stack bandwidth</strong>: 1.2 TB/s → <strong>1.6 TB/s</strong> (33% increase)</li>
<li class=""><strong>Stack height</strong>: 12-Hi → <strong>16-Hi</strong></li>
<li class=""><strong>Per-package capacity</strong>: 288 GB (6 stacks × 48 GB)</li>
<li class=""><strong>Per-package bandwidth</strong>: 22 TB/s (theoretical peak)</li>
</ul>
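<p>The per-stack and per-package figures in this list can be checked against each other. The sketch below uses only the listed numbers; the variable names are illustrative. Capacity adds up to 288 GB, but 6 stacks at 1.6 TB/s sum to 9.6 TB/s, so the 22 TB/s package figure would imply roughly 3.7 TB/s per stack (or a different stack count). The source does not resolve this.</p>

```python
# Sketch: aggregate HBM4 figures from the per-stack values in the list.
STACKS = 6
STACK_CAPACITY_GB = 48
STACK_BANDWIDTH_TB_S = 1.6
PACKAGE_BANDWIDTH_TB_S = 22.0  # per-package peak quoted in the list

package_capacity_gb = STACKS * STACK_CAPACITY_GB
summed_bandwidth_tb_s = round(STACKS * STACK_BANDWIDTH_TB_S, 2)
implied_per_stack_tb_s = round(PACKAGE_BANDWIDTH_TB_S / STACKS, 2)

print(f"Capacity: {package_capacity_gb} GB")
print(f"Sum of per-stack bandwidth: {summed_bandwidth_tb_s} TB/s")
print(f"Per-stack bandwidth implied by 22 TB/s: {implied_per_stack_tb_s} TB/s")
```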
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="compute-matrix-fp4-everywhere">Compute Matrix: FP4 Everywhere<a href="https://mirrorfrog.com/en/blog/nvidia-vera-rubin-platform-6-chips-deep-dive#compute-matrix-fp4-everywhere" class="hash-link" aria-label="Direct link to Compute Matrix: FP4 Everywhere" title="Direct link to Compute Matrix: FP4 Everywhere" translate="no">​</a></h2>
<p>Vera Rubin extends the <strong>native FP4 support</strong> that NVIDIA introduced with Blackwell and makes FP4 the headline data type:</p>
<table><thead><tr><th>Precision</th><th>Throughput (per GPU, sparse)</th><th>Use Cases</th></tr></thead><tbody><tr><td><strong>FP4</strong></td><td><strong>50 PFLOPS</strong></td><td>Inference, small-batch training</td></tr><tr><td>FP8</td><td>25 PFLOPS</td><td>LLM training, large model inference</td></tr><tr><td>FP16/BF16</td><td>12.5 PFLOPS</td><td>Traditional training, scientific computing</td></tr><tr><td>FP32</td><td>6.25 PFLOPS</td><td>HPC, traditional scientific computing</td></tr><tr><td>FP64</td><td>3.1 PFLOPS</td><td>Numerical simulation</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="networking-connectx-9-288-tbs">Networking: ConnectX-9 28.8 TB/s<a href="https://mirrorfrog.com/en/blog/nvidia-vera-rubin-platform-6-chips-deep-dive#networking-connectx-9-288-tbs" class="hash-link" aria-label="Direct link to Networking: ConnectX-9 28.8 TB/s" title="Direct link to Networking: ConnectX-9 28.8 TB/s" translate="no">​</a></h2>
<p>The ConnectX-9 is NVIDIA's next-generation network adapter:</p>
<ul>
<li class=""><strong>Bandwidth</strong>: <strong>1.6 Tbps per port</strong> (2× ConnectX-8)</li>
<li class=""><strong>GPU ratio</strong>: 2× ConnectX-9 per GPU</li>
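<li class=""><strong>Aggregate GPU-to-GPU</strong>: <strong>28.8 TB/s</strong> (listed per Rubin GPU, though the figure equals 72 GPUs × 2 adapters × 1.6 Tbps, which suggests a rack-level NVL72 aggregate)</li>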
<li class=""><strong>Protocols</strong>: NVLink 6, PCIe Gen 6, RoCE v2, InfiniBand NDR</li>
<li class=""><strong>Encryption</strong>: AES-256-GCM hardware acceleration</li>
</ul>
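<p>Network rates are quoted in bits per second and memory or fabric rates in bytes per second, so a unit conversion helps when comparing the figures in this list. The sketch below uses only the listed numbers; the helper name is hypothetical.</p>

```python
# Sketch: convert adapter line rate (Tbps) to TB/s and compare with the
# aggregate figure quoted in the list.
BITS_PER_BYTE = 8

def tbps_to_tb_per_s(tbps: float) -> float:
    """Convert terabits per second to terabytes per second."""
    return tbps / BITS_PER_BYTE

ADAPTERS_PER_GPU = 2
PORT_RATE_TBPS = 1.6
AGGREGATE_TB_S = 28.8

per_gpu_tb_s = ADAPTERS_PER_GPU * tbps_to_tb_per_s(PORT_RATE_TBPS)
gpus_needed_for_aggregate = round(AGGREGATE_TB_S / per_gpu_tb_s, 6)

print(f"Adapter bandwidth per GPU: {per_gpu_tb_s} TB/s")
print(f"GPUs needed to reach 28.8 TB/s: {gpus_needed_for_aggregate}")
```

<p>With these inputs the adapters contribute 0.4 TB/s per GPU, and it takes 72 GPUs' worth of adapters to reach 28.8 TB/s.</p>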
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="rack-scale-solutions">Rack-Scale Solutions<a href="https://mirrorfrog.com/en/blog/nvidia-vera-rubin-platform-6-chips-deep-dive#rack-scale-solutions" class="hash-link" aria-label="Direct link to Rack-Scale Solutions" title="Direct link to Rack-Scale Solutions" translate="no">​</a></h2>
<p>Vera Rubin supports two rack-scale configurations:</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-rubin-nvl72-1-rack">1. Rubin NVL72 (1 Rack)<a href="https://mirrorfrog.com/en/blog/nvidia-vera-rubin-platform-6-chips-deep-dive#1-rubin-nvl72-1-rack" class="hash-link" aria-label="Direct link to 1. Rubin NVL72 (1 Rack)" title="Direct link to 1. Rubin NVL72 (1 Rack)" translate="no">​</a></h3>
<ul>
<li class=""><strong>GPU count</strong>: 72 (NVL72 1×72 topology)</li>
<li class=""><strong>CPU count</strong>: 36 Vera</li>
<li class=""><strong>Total HBM4</strong>: 20.7 TB</li>
<li class=""><strong>Total bandwidth</strong>: 1.6 PB/s</li>
<li class=""><strong>Peak FP4</strong>: <strong>3.6 EFLOPS</strong> sparse (72 × 50 PFLOPS, single rack)</li>
<li class=""><strong>Power</strong>: ~130 kW</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-rubin-nvl576-8-racks">2. Rubin NVL576 (8 Racks)<a href="https://mirrorfrog.com/en/blog/nvidia-vera-rubin-platform-6-chips-deep-dive#2-rubin-nvl576-8-racks" class="hash-link" aria-label="Direct link to 2. Rubin NVL576 (8 Racks)" title="Direct link to 2. Rubin NVL576 (8 Racks)" translate="no">​</a></h3>
<ul>
<li class=""><strong>GPU count</strong>: 576 (NVL576 8×72)</li>
<li class=""><strong>CPU count</strong>: 288 Vera</li>
<li class=""><strong>Total HBM4</strong>: 165 TB</li>
<li class=""><strong>Total bandwidth</strong>: 12.7 PB/s</li>
<li class=""><strong>Peak FP4</strong>: <strong>28.8 EFLOPS</strong> (full cluster)</li>
<li class=""><strong>Power</strong>: ~1.1 MW (single data hall)</li>
</ul>
<blockquote>
<p><strong>A ~1.1 MW load in a single data hall</strong> makes Vera Rubin NVL576 <strong>the first configuration whose data hall requires a substation-level dedicated power supply</strong>, which poses new challenges for data center design.</p>
</blockquote>
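<p>The rack-level totals follow from the per-GPU specifications by simple multiplication, which makes them easy to cross-check. The sketch below assumes the per-GPU values from the core specification table (288 GB, 22 TB/s, 50 PFLOPS sparse FP4) and decimal unit prefixes; the class and function names are illustrative.</p>

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GpuSpec:
    """Per-GPU figures taken from the core specification table."""
    hbm_gb: float = 288.0
    hbm_tb_s: float = 22.0
    fp4_sparse_pflops: float = 50.0

def rack_totals(gpus: int, spec: GpuSpec = GpuSpec()) -> dict:
    """Scale per-GPU figures to a configuration with `gpus` GPUs."""
    return {
        "hbm_tb": round(gpus * spec.hbm_gb / 1000, 3),
        "bandwidth_pb_s": round(gpus * spec.hbm_tb_s / 1000, 3),
        "fp4_sparse_eflops": round(gpus * spec.fp4_sparse_pflops / 1000, 3),
    }

nvl72 = rack_totals(72)
nvl576 = rack_totals(576)

print("NVL72:", nvl72)
print("NVL576:", nvl576)
```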
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="software-ecosystem">Software Ecosystem<a href="https://mirrorfrog.com/en/blog/nvidia-vera-rubin-platform-6-chips-deep-dive#software-ecosystem" class="hash-link" aria-label="Direct link to Software Ecosystem" title="Direct link to Software Ecosystem" translate="no">​</a></h2>
<p>The Vera Rubin platform's software stack:</p>
<ul>
<li class=""><strong>CUDA 13.0</strong>: Full FP4 / FP8 / FP16 / BF16 / TF32 support</li>
<li class=""><strong>cuDNN 9.0</strong>: FP4 Tensor Core acceleration</li>
<li class=""><strong>TensorRT 11.0</strong>: LLM FP4 quantization deployment</li>
<li class=""><strong>Megatron-LM 0.12</strong>: Distributed training framework</li>
<li class=""><strong>TensorRT-LLM 2.0</strong>: LLM inference optimization</li>
<li class=""><strong>NeMo 2.0</strong>: LLM/Speech AI/Visual Agent framework</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="competitive-analysis">Competitive Analysis<a href="https://mirrorfrog.com/en/blog/nvidia-vera-rubin-platform-6-chips-deep-dive#competitive-analysis" class="hash-link" aria-label="Direct link to Competitive Analysis" title="Direct link to Competitive Analysis" translate="no">​</a></h2>
<table><thead><tr><th>Vendor</th><th>Product</th><th>Peak FP4 dense</th><th>Memory</th><th>Memory BW</th><th>Process</th></tr></thead><tbody><tr><td><strong>NVIDIA</strong></td><td>Vera Rubin R200</td><td>25 PF (50 PF sparse)</td><td>288 GB HBM4</td><td>22 TB/s</td><td>3NP</td></tr><tr><td><strong>AMD</strong></td><td>MI400</td><td><strong>40 PF dense</strong></td><td>432 GB HBM4</td><td>~12 TB/s</td><td>3N + 5N</td></tr><tr><td><strong>Google</strong></td><td>TPU Ironwood v7</td><td>2.3 PF dense</td><td>192 GB HBM</td><td>7.4 TB/s</td><td>5N</td></tr><tr><td><strong>AWS</strong></td><td>Trainium 3</td><td>1.8 PF dense</td><td>144 GB HBM</td><td>2.7 TB/s</td><td>3N</td></tr><tr><td><strong>Huawei</strong></td><td>Ascend 920</td><td>0.45 PF dense</td><td>192 GB HBM2e</td><td>4 TB/s</td><td>6N SMIC</td></tr></tbody></table>
<blockquote>
<p><strong>Apple Silicon M5 Ultra</strong> (predicted for 2026 H2) is estimated to trail NVIDIA Rubin R200 by roughly 6× in single-chip FP8 performance, but with <strong>384GB UMA + 1 TB/s bandwidth</strong> it remains a strong option for local LLM inference.</p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="conclusion">Conclusion<a href="https://mirrorfrog.com/en/blog/nvidia-vera-rubin-platform-6-chips-deep-dive#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>Vera Rubin is the culmination of <strong>5 architectural innovations</strong>:</p>
<ol>
<li class=""><strong>6-chip CoWoS-L packaging</strong> (industry first)</li>
<li class=""><strong>288GB HBM4</strong> (1.5× B200's 192 GB capacity)</li>
<li class=""><strong>22 TB/s memory bandwidth</strong> (2.75× HBM3e)</li>
<li class=""><strong>50 PFLOPS FP4 sparse</strong> (2.5× B300 Ultra)</li>
<li class=""><strong>ConnectX-9 28.8 TB/s</strong> aggregate networking (per-port rate 2× ConnectX-8)</li>
</ol>
<p>For AI workloads, Vera Rubin is the <strong>next-generation foundation for trillion-parameter model training and Agentic AI inference</strong>.</p>
<blockquote>
<p>💡 <strong>Disclaimer</strong>: All "predicted" and "rumored" data points are clearly marked and <strong>do not constitute investment advice</strong>.</p>
</blockquote>]]></content>
        <author>
            <name>AI Compute Cards Wiki Editorial</name>
            <uri>https://github.com/anomalyco/mirrorfrog</uri>
        </author>
        <category label="Product Launch" term="Product Launch"/>
        <category label="Tech Deep Dive" term="Tech Deep Dive"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Cerebras IPO Deep Dive: S-1 Filing, $22-25B Valuation, $10B OpenAI Deal]]></title>
        <id>https://mirrorfrog.com/en/blog/cerebras-ipo-2026-s-1-cbrs-nasdaq-22b-valuation</id>
        <link href="https://mirrorfrog.com/en/blog/cerebras-ipo-2026-s-1-cbrs-nasdaq-22b-valuation"/>
        <updated>2026-04-20T00:00:00.000Z</updated>
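        <summary type="html"><![CDATA[On 2026-04-17 Cerebras filed its S-1 for an IPO, targeting a May 2026 listing on Nasdaq as CBRS at a $22-25B valuation. 2025 revenue was about $510M, and OpenAI has signed a $10B long-term inference compute contract. The wafer-scale AI chip maker would become the second-largest listed pure-play AI chip company, behind NVIDIA.]]></summary>
        <content type="html"><![CDATA[<p>On <strong>April 17, 2026</strong>, Cerebras Systems formally filed its <strong>S-1 prospectus</strong> with the SEC to list on <strong>Nasdaq</strong>, <strong>targeting May 2026</strong>. It is one of the most significant AI chip IPOs of 2026. This article analyzes Cerebras's financials, strategy, customers, and outlook.</p>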
<!-- -->
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="ipo-关键数据">IPO Key Figures<a href="https://mirrorfrog.com/en/blog/cerebras-ipo-2026-s-1-cbrs-nasdaq-22b-valuation#ipo-%E5%85%B3%E9%94%AE%E6%95%B0%E6%8D%AE" class="hash-link" aria-label="Direct link to IPO Key Figures" title="Direct link to IPO Key Figures" translate="no">​</a></h2>
<table><thead><tr><th>Item</th><th>Details</th></tr></thead><tbody><tr><td><strong>IPO filing date</strong></td><td><strong>2026-04-17</strong> (S-1 filed)</td></tr><tr><td><strong>Target listing date</strong></td><td><strong>2026-05</strong> (Nasdaq: <strong>CBRS</strong>)</td></tr><tr><td><strong>Valuation</strong></td><td><strong>$22-25B</strong></td></tr><tr><td><strong>2025 revenue</strong></td><td>~$510M (+155% YoY)</td></tr><tr><td><strong>2025 net loss</strong></td><td>~$200M (still loss-making, but the loss ratio is narrowing)</td></tr><tr><td><strong>Key deal</strong></td><td><strong>$10B long-term OpenAI inference compute contract</strong></td></tr><tr><td><strong>Major customers</strong></td><td>OpenAI, G42, Mistral, Meta, Mayo Clinic</td></tr><tr><td><strong>Underwriters</strong></td><td>Goldman Sachs / Morgan Stanley / JPMorgan</td></tr><tr><td><strong>Founder</strong></td><td>Andrew Feldman (CEO)</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="财务数据来自-s-1">Financial Data (from the S-1)<a href="https://mirrorfrog.com/en/blog/cerebras-ipo-2026-s-1-cbrs-nasdaq-22b-valuation#%E8%B4%A2%E5%8A%A1%E6%95%B0%E6%8D%AE%E6%9D%A5%E8%87%AA-s-1" class="hash-link" aria-label="Direct link to Financial Data (from the S-1)" title="Direct link to Financial Data (from the S-1)" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="营收增长">Revenue Growth<a href="https://mirrorfrog.com/en/blog/cerebras-ipo-2026-s-1-cbrs-nasdaq-22b-valuation#%E8%90%A5%E6%94%B6%E5%A2%9E%E9%95%BF" class="hash-link" aria-label="Direct link to Revenue Growth" title="Direct link to Revenue Growth" translate="no">​</a></h3>
<table><thead><tr><th>Year</th><th>Revenue</th><th>YoY</th></tr></thead><tbody><tr><td>2023</td><td>~$80M</td><td>n/a</td></tr><tr><td>2024</td><td>~$200M</td><td>+150%</td></tr><tr><td><strong>2025</strong></td><td><strong>~$510M</strong></td><td><strong>+155%</strong></td></tr><tr><td>2026 (E)</td><td>~$1.2B</td><td>+135%</td></tr></tbody></table>
<blockquote>
<p><strong>Revenue grew roughly 6.4× between 2023 and 2025</strong> (~$80M to ~$510M), making Cerebras one of the fastest-growing companies in the AI chip industry.</p>
</blockquote>
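<p>The growth figures in the revenue table can be reproduced from the revenue column alone. The sketch below uses the approximate revenue values from the table (2026 is an estimate); the function name is illustrative.</p>

```python
# Revenue in millions of USD, taken from the revenue table (2026 is an estimate).
revenue_musd = {2023: 80, 2024: 200, 2025: 510, 2026: 1200}

def yoy_growth_pct(series: dict) -> dict:
    """Map each year after the first to its year-over-year growth in percent."""
    years = sorted(series)
    return {
        year: round((series[year] / series[prev] - 1) * 100, 1)
        for prev, year in zip(years, years[1:])
    }

growth = yoy_growth_pct(revenue_musd)
multiple_2023_to_2025 = revenue_musd[2025] / revenue_musd[2023]

print(growth)
print(f"2025 revenue is {multiple_2023_to_2025:.1f}x 2023 revenue")
```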
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="净亏损">Net Loss<a href="https://mirrorfrog.com/en/blog/cerebras-ipo-2026-s-1-cbrs-nasdaq-22b-valuation#%E5%87%80%E4%BA%8F%E6%8D%9F" class="hash-link" aria-label="Direct link to Net Loss" title="Direct link to Net Loss" translate="no">​</a></h3>
<table><thead><tr><th>Year</th><th>Net loss</th><th>Loss ratio (net loss / revenue)</th></tr></thead><tbody><tr><td>2023</td><td>~$120M</td><td>-150%</td></tr><tr><td>2024</td><td>~$180M</td><td>-90%</td></tr><tr><td><strong>2025</strong></td><td><strong>~$200M</strong></td><td><strong>-39%</strong> (loss ratio narrowing)</td></tr><tr><td>2026 (E)</td><td>~$50M</td><td>-4%</td></tr></tbody></table>
<blockquote>
<p><strong>The loss ratio narrowed from -150% to -39%</strong>, and 2026 is expected to come close to break-even.</p>
</blockquote>
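<p>The loss ratio in the table is net loss divided by revenue for the same year. A minimal sketch, using the approximate revenue and net-loss figures from the two tables (the function name is illustrative):</p>

```python
# (revenue, net loss) in millions of USD per year; 2026 values are estimates.
financials_musd = {
    2023: (80, 120),
    2024: (200, 180),
    2025: (510, 200),
    2026: (1200, 50),
}

def loss_ratio_pct(revenue: float, net_loss: float) -> float:
    """Net loss as a negative percentage of revenue."""
    return round(-net_loss / revenue * 100, 1)

loss_ratios = {
    year: loss_ratio_pct(revenue, net_loss)
    for year, (revenue, net_loss) in financials_musd.items()
}

print(loss_ratios)
```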
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="客户集中度">Customer Concentration<a href="https://mirrorfrog.com/en/blog/cerebras-ipo-2026-s-1-cbrs-nasdaq-22b-valuation#%E5%AE%A2%E6%88%B7%E9%9B%86%E4%B8%AD%E5%BA%A6" class="hash-link" aria-label="Direct link to Customer Concentration" title="Direct link to Customer Concentration" translate="no">​</a></h3>
<table><thead><tr><th>Customer</th><th>Share of revenue</th></tr></thead><tbody><tr><td><strong>G42</strong> (UAE)</td><td>~25%</td></tr><tr><td><strong>OpenAI</strong></td><td>~20%</td></tr><tr><td><strong>Other enterprises</strong></td><td>~55%</td></tr></tbody></table>
<blockquote>
<p><strong>G42 + OpenAI account for 45% of revenue</strong>, a high level of customer concentration; the mix will shift as the OpenAI contract ramps up.</p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="关键大单openai-10b-推理合同">Key Deal: $10B OpenAI Inference Contract<a href="https://mirrorfrog.com/en/blog/cerebras-ipo-2026-s-1-cbrs-nasdaq-22b-valuation#%E5%85%B3%E9%94%AE%E5%A4%A7%E5%8D%95openai-10b-%E6%8E%A8%E7%90%86%E5%90%88%E5%90%8C" class="hash-link" aria-label="Direct link to Key Deal: $10B OpenAI Inference Contract" title="Direct link to Key Deal: $10B OpenAI Inference Contract" translate="no">​</a></h2>
<p>In December 2025, Cerebras announced a <strong>$10B long-term inference compute contract with OpenAI</strong> (10-year term):</p>
<table><thead><tr><th>Item</th><th>Details</th></tr></thead><tbody><tr><td><strong>Contract value</strong></td><td>$10B (10-year term)</td></tr><tr><td><strong>Service</strong></td><td>Inference compute for OpenAI models</td></tr><tr><td><strong>Start date</strong></td><td>2026 Q2</td></tr><tr><td><strong>Annualized value</strong></td><td>$1B/year</td></tr><tr><td><strong>Hardware</strong></td><td>CS-3 (WSE-3) + future CS-4 (WSE-4)</td></tr><tr><td><strong>Significance</strong></td><td><strong>Substantially expands Cerebras's revenue base</strong></td></tr></tbody></table>
<blockquote>
<p>This contract lifts Cerebras's 2026 revenue forecast from ~$700M to ~$1.2B.</p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="客户列表">Customer List<a href="https://mirrorfrog.com/en/blog/cerebras-ipo-2026-s-1-cbrs-nasdaq-22b-valuation#%E5%AE%A2%E6%88%B7%E5%88%97%E8%A1%A8" class="hash-link" aria-label="Direct link to Customer List" title="Direct link to Customer List" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="旗舰客户">Flagship Customers<a href="https://mirrorfrog.com/en/blog/cerebras-ipo-2026-s-1-cbrs-nasdaq-22b-valuation#%E6%97%97%E8%88%B0%E5%AE%A2%E6%88%B7" class="hash-link" aria-label="Direct link to Flagship Customers" title="Direct link to Flagship Customers" translate="no">​</a></h3>
<table><thead><tr><th>Customer</th><th>Industry</th><th>Application</th></tr></thead><tbody><tr><td><strong>OpenAI</strong></td><td>AI lab</td><td>GPT-series inference</td></tr><tr><td><strong>G42</strong></td><td>UAE sovereign AI</td><td>National AI infrastructure</td></tr><tr><td><strong>Meta</strong></td><td>Internet</td><td>Llama training</td></tr><tr><td><strong>Mistral</strong></td><td>AI company</td><td>Model training + inference</td></tr><tr><td><strong>Mayo Clinic</strong></td><td>Healthcare</td><td>Medical AI training</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="科研--政府客户">Research / Government Customers<a href="https://mirrorfrog.com/en/blog/cerebras-ipo-2026-s-1-cbrs-nasdaq-22b-valuation#%E7%A7%91%E7%A0%94--%E6%94%BF%E5%BA%9C%E5%AE%A2%E6%88%B7" class="hash-link" aria-label="Direct link to Research / Government Customers" title="Direct link to Research / Government Customers" translate="no">​</a></h3>
<table><thead><tr><th>Customer</th><th>Application</th></tr></thead><tbody><tr><td><strong>Argonne National Lab</strong></td><td>Scientific computing</td></tr><tr><td><strong>Los Alamos National Lab</strong></td><td>National security</td></tr><tr><td><strong>NASA</strong></td><td>Climate simulation</td></tr><tr><td><strong>Sandia National Lab</strong></td><td>Defense</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="企业客户">Enterprise Customers<a href="https://mirrorfrog.com/en/blog/cerebras-ipo-2026-s-1-cbrs-nasdaq-22b-valuation#%E4%BC%81%E4%B8%9A%E5%AE%A2%E6%88%B7" class="hash-link" aria-label="Direct link to Enterprise Customers" title="Direct link to Enterprise Customers" translate="no">​</a></h3>
<table><thead><tr><th>Customer</th><th>Industry</th></tr></thead><tbody><tr><td><strong>GSK</strong></td><td>Pharmaceuticals</td></tr><tr><td><strong>AstraZeneca</strong></td><td>Pharmaceuticals</td></tr><tr><td><strong>Total</strong></td><td>Energy</td></tr><tr><td><strong>BMW</strong></td><td>Autonomous driving</td></tr><tr><td><strong>Daimler</strong></td><td>Autonomous driving</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="核心产品wse-3-cs-3">Core Product: WSE-3 (CS-3)<a href="https://mirrorfrog.com/en/blog/cerebras-ipo-2026-s-1-cbrs-nasdaq-22b-valuation#%E6%A0%B8%E5%BF%83%E4%BA%A7%E5%93%81wse-3-cs-3" class="hash-link" aria-label="Direct link to Core Product: WSE-3 (CS-3)" title="Direct link to Core Product: WSE-3 (CS-3)" translate="no">​</a></h2>
<table><thead><tr><th>Item</th><th>Specification</th></tr></thead><tbody><tr><td>Transistor count</td><td>4 trillion</td></tr><tr><td>Core count</td><td>900,000</td></tr><tr><td>On-chip SRAM</td><td>44 GB</td></tr><tr><td>Compute (BF16 sparse)</td><td>125 PFLOPS</td></tr><tr><td>Compute (FP8)</td><td>250 PFLOPS (speculative)</td></tr><tr><td>Memory bandwidth</td><td>21 PB/s</td></tr><tr><td>Interconnect</td><td>SwarmX (multi-WSE interconnect)</td></tr><tr><td>TDP</td><td>~25 kW</td></tr><tr><td>Price</td><td>~$3-5M per system</td></tr><tr><td>Release</td><td>2024</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="未来产品wse-4-cs-4-推测">Future Product: WSE-4 (CS-4, Speculative)<a href="https://mirrorfrog.com/en/blog/cerebras-ipo-2026-s-1-cbrs-nasdaq-22b-valuation#%E6%9C%AA%E6%9D%A5%E4%BA%A7%E5%93%81wse-4-cs-4-%E6%8E%A8%E6%B5%8B" class="hash-link" aria-label="Direct link to Future Product: WSE-4 (CS-4, Speculative)" title="Direct link to Future Product: WSE-4 (CS-4, Speculative)" translate="no">​</a></h2>
<table><thead><tr><th>Item</th><th>WSE-3</th><th><strong>WSE-4 (speculative)</strong></th><th>Gain</th></tr></thead><tbody><tr><td>Process</td><td>TSMC 5nm</td><td><strong>TSMC 3nm</strong></td><td>+1 node</td></tr><tr><td>Transistor count</td><td>4 trillion</td><td><strong>~5 trillion</strong></td><td>1.25×</td></tr><tr><td>Core count</td><td>900,000</td><td><strong>~1,500,000</strong></td><td>1.67×</td></tr><tr><td>SRAM</td><td>44 GB</td><td><strong>~80 GB</strong></td><td>1.8×</td></tr><tr><td>SRAM bandwidth</td><td>21 PB/s</td><td><strong>~40 PB/s</strong></td><td>1.9×</td></tr><tr><td>BF16 compute</td><td>125 PFLOPS</td><td><strong>~200 PFLOPS</strong></td><td>1.6×</td></tr><tr><td>TDP</td><td>25 kW</td><td><strong>~30-35 kW</strong></td><td>1.3×</td></tr><tr><td>Release</td><td>2024</td><td><strong>Expected 2027</strong></td><td></td></tr></tbody></table>
<blockquote>
<p>⚠️ <strong>WSE-4 has not been officially announced</strong>; the figures above are speculative.</p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="估值分析">Valuation Analysis<a href="https://mirrorfrog.com/en/blog/cerebras-ipo-2026-s-1-cbrs-nasdaq-22b-valuation#%E4%BC%B0%E5%80%BC%E5%88%86%E6%9E%90" class="hash-link" aria-label="Direct link to Valuation Analysis" title="Direct link to Valuation Analysis" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="估值倍数">Valuation Multiples<a href="https://mirrorfrog.com/en/blog/cerebras-ipo-2026-s-1-cbrs-nasdaq-22b-valuation#%E4%BC%B0%E5%80%BC%E5%80%8D%E6%95%B0" class="hash-link" aria-label="Direct link to Valuation Multiples" title="Direct link to Valuation Multiples" translate="no">​</a></h3>
<table><thead><tr><th>Metric</th><th>Cerebras figure</th><th>Valuation multiple</th></tr></thead><tbody><tr><td>2025 revenue</td><td>$510M</td><td><strong>43-49×</strong> (at a $22-25B valuation)</td></tr><tr><td>2025 net loss</td><td>-$200M</td><td>n/a</td></tr><tr><td>2026 revenue (E)</td><td>$1.2B</td><td><strong>18-21×</strong></td></tr><tr><td>2027 revenue (E)</td><td>$2.5B</td><td><strong>9-10×</strong></td></tr></tbody></table>
<blockquote>
<p><strong>On 2026 revenue, the 18-21× multiple is comparable to SaaS valuations</strong>; on 2027 revenue, <strong>9-10× is close to NVIDIA's long-term multiple</strong>.</p>
</blockquote>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="与-nvidia-估值对比">Valuation Comparison with NVIDIA<a href="https://mirrorfrog.com/en/blog/cerebras-ipo-2026-s-1-cbrs-nasdaq-22b-valuation#%E4%B8%8E-nvidia-%E4%BC%B0%E5%80%BC%E5%AF%B9%E6%AF%94" class="hash-link" aria-label="Direct link to Valuation Comparison with NVIDIA" title="Direct link to Valuation Comparison with NVIDIA" translate="no">​</a></h3>
<table><thead><tr><th>Vendor</th><th>Market cap</th><th>2025 revenue</th><th>Price-to-sales</th></tr></thead><tbody><tr><td><strong>NVIDIA</strong></td><td>~$4,000B</td><td>~$130B</td><td>~31×</td></tr><tr><td><strong>Cerebras</strong></td><td>$22-25B</td><td>$510M</td><td>43-49×</td></tr><tr><td><strong>AMD</strong></td><td>~$280B</td><td>~$25B</td><td>~11×</td></tr></tbody></table>
<blockquote>
<p><strong>Cerebras's price-to-sales ratio is higher than NVIDIA's</strong>: the market is pricing in high growth expectations for Cerebras.</p>
</blockquote>
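<p>The multiples in both valuation tables are plain price-to-sales ratios, valuation divided by revenue. The sketch below recomputes them from the figures quoted in the tables (all in billions of USD); the function and variable names are illustrative.</p>

```python
def price_to_sales(valuation_busd: float, revenue_busd: float) -> float:
    """Price-to-sales multiple, rounded to one decimal place."""
    return round(valuation_busd / revenue_busd, 1)

# Cerebras: low and high ends of the $22-25B valuation range.
CEREBRAS_VALUATION_BUSD = (22.0, 25.0)
cerebras_revenue_busd = {"2025": 0.51, "2026E": 1.2, "2027E": 2.5}

cerebras_multiples = {
    period: tuple(price_to_sales(v, revenue) for v in CEREBRAS_VALUATION_BUSD)
    for period, revenue in cerebras_revenue_busd.items()
}
nvidia_multiple = price_to_sales(4000, 130)
amd_multiple = price_to_sales(280, 25)

print(cerebras_multiples)
print(f"NVIDIA: {nvidia_multiple}x, AMD: {amd_multiple}x")
```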
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="投资亮点">Investment Highlights<a href="https://mirrorfrog.com/en/blog/cerebras-ipo-2026-s-1-cbrs-nasdaq-22b-valuation#%E6%8A%95%E8%B5%84%E4%BA%AE%E7%82%B9" class="hash-link" aria-label="Direct link to Investment Highlights" title="Direct link to Investment Highlights" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-晶圆级技术领先">1. Wafer-Scale Technology Lead<a href="https://mirrorfrog.com/en/blog/cerebras-ipo-2026-s-1-cbrs-nasdaq-22b-valuation#1-%E6%99%B6%E5%9C%86%E7%BA%A7%E6%8A%80%E6%9C%AF%E9%A2%86%E5%85%88" class="hash-link" aria-label="Direct link to 1. Wafer-Scale Technology Lead" title="Direct link to 1. Wafer-Scale Technology Lead" translate="no">​</a></h3>
<ul>
<li class=""><strong>125 PFLOPS BF16 on a single chip</strong>: about 126× a single H100 at 989 TFLOPS dense BF16 (note that the WSE-3 figure is a sparse number)</li>
<li class=""><strong>44 GB of on-chip SRAM, far faster than HBM</strong>: SRAM is cited as roughly 1000× faster than HBM</li>
<li class=""><strong>21 PB/s memory bandwidth</strong>: roughly 6000× the H100's HBM bandwidth</li>
</ul>
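<p>The ratios against the H100 can be recomputed from the quoted figures. The WSE-3 values come from the specification table and the H100 BF16 figure from the list above; the H100 HBM bandwidth of 3.35 TB/s is an assumption added here (the H100 SXM figure) and does not appear in the article.</p>

```python
# Sketch: WSE-3 versus a single H100 on compute and memory bandwidth.
WSE3_BF16_PFLOPS = 125.0        # sparse figure from the WSE-3 spec table
WSE3_SRAM_PB_S = 21.0           # on-chip SRAM bandwidth from the spec table
H100_BF16_DENSE_TFLOPS = 989.0  # dense BF16 figure quoted in the list
H100_HBM_TB_S = 3.35            # assumption: H100 SXM HBM3 bandwidth

compute_ratio = WSE3_BF16_PFLOPS * 1000 / H100_BF16_DENSE_TFLOPS
bandwidth_ratio = WSE3_SRAM_PB_S * 1000 / H100_HBM_TB_S

print(f"Compute ratio: {compute_ratio:.0f}x")
print(f"Bandwidth ratio: {bandwidth_ratio:.0f}x")
```

<p>Because the WSE-3 number is sparse and the H100 number is dense, the compute ratio would roughly halve if both were put on the same basis.</p>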
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-推理市场新机遇">2. New Opportunity in the Inference Market<a href="https://mirrorfrog.com/en/blog/cerebras-ipo-2026-s-1-cbrs-nasdaq-22b-valuation#2-%E6%8E%A8%E7%90%86%E5%B8%82%E5%9C%BA%E6%96%B0%E6%9C%BA%E9%81%87" class="hash-link" aria-label="Direct link to 2. New Opportunity in the Inference Market" title="Direct link to 2. New Opportunity in the Inference Market" translate="no">​</a></h3>
<ul>
<li class=""><strong>$10B OpenAI contract</strong>: 10 years of long-term revenue</li>
<li class=""><strong>Ultra-low-latency inference</strong>: SRAM is cited as roughly 1000× faster than GPU HBM</li>
<li class=""><strong>Rack-level integration</strong>: a single system can run inference on trillion-parameter models</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-软件生态完善">3. Mature Software Ecosystem<a href="https://mirrorfrog.com/en/blog/cerebras-ipo-2026-s-1-cbrs-nasdaq-22b-valuation#3-%E8%BD%AF%E4%BB%B6%E7%94%9F%E6%80%81%E5%AE%8C%E5%96%84" class="hash-link" aria-label="Direct link to 3. Mature Software Ecosystem" title="Direct link to 3. Mature Software Ecosystem" translate="no">​</a></h3>
<ul>
<li class=""><strong>Cerebras Software Platform (CSoft)</strong>: PyTorch-based</li>
<li class=""><strong>JAX + Cerebras backend</strong>: Google integration</li>
<li class=""><strong>vLLM 0.7+ Cerebras backend</strong> (speculative)</li>
<li class=""><strong>HuggingFace integration</strong></li>
<li class=""><strong>Triton + Cerebras backend</strong></li>
<li class=""><strong>OpenAI-compatible API</strong> (Cerebras Inference)</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="投资风险">Investment Risks<a href="https://mirrorfrog.com/en/blog/cerebras-ipo-2026-s-1-cbrs-nasdaq-22b-valuation#%E6%8A%95%E8%B5%84%E9%A3%8E%E9%99%A9" class="hash-link" aria-label="Direct link to Investment Risks" title="Direct link to Investment Risks" translate="no">​</a></h2>
<table><thead><tr><th>Risk</th><th>Impact</th></tr></thead><tbody><tr><td><strong>Ongoing losses</strong></td><td>Still lost $200M in 2025</td></tr><tr><td><strong>Customer concentration</strong></td><td>G42 + OpenAI = 45% of revenue</td></tr><tr><td><strong>High TDP</strong></td><td>25 kW per chip, cooling challenges</td></tr><tr><td><strong>High price</strong></td><td>$3-5M per system</td></tr><tr><td><strong>NVIDIA's Groq acquisition</strong></td><td>Intensifying competition in ultra-low-latency inference</td></tr><tr><td><strong>WSE-4 timing</strong></td><td>Not due until 2027, while AMD MI400 / NVIDIA Rubin reach GA in 2026</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="ipo-后影响">Post-IPO Impact<a href="https://mirrorfrog.com/en/blog/cerebras-ipo-2026-s-1-cbrs-nasdaq-22b-valuation#ipo-%E5%90%8E%E5%BD%B1%E5%93%8D" class="hash-link" aria-label="Direct link to Post-IPO Impact" title="Direct link to Post-IPO Impact" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-算力-ipo-行业洗牌">1. Compute IPO Industry Shake-Up<a href="https://mirrorfrog.com/en/blog/cerebras-ipo-2026-s-1-cbrs-nasdaq-22b-valuation#1-%E7%AE%97%E5%8A%9B-ipo-%E8%A1%8C%E4%B8%9A%E6%B4%97%E7%89%8C" class="hash-link" aria-label="Direct link to 1. Compute IPO Industry Shake-Up" title="Direct link to 1. Compute IPO Industry Shake-Up" translate="no">​</a></h3>
<table><thead><tr><th>Company</th><th>Status</th><th>Market cap</th></tr></thead><tbody><tr><td><strong>NVIDIA</strong></td><td>Listed</td><td>~$4,000B</td></tr><tr><td><strong>Cerebras</strong></td><td><strong>About to list</strong></td><td>$22-25B</td></tr><tr><td><strong>Groq</strong> (acquired by NVIDIA)</td><td>No longer pursuing an IPO</td><td>n/a</td></tr><tr><td><strong>SambaNova</strong></td><td>Still private</td><td>Estimated $5-10B</td></tr><tr><td><strong>Tenstorrent</strong></td><td>Still private</td><td>Estimated $4-7B</td></tr><tr><td><strong>Cambricon (寒武纪)</strong></td><td>Listed on China's A-share market</td><td>~$8B</td></tr></tbody></table>
<blockquote>
<p><strong>Once listed, Cerebras will be the largest publicly traded pure-play AI chip company outside NVIDIA</strong>.</p>
</blockquote>
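<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-国产晶圆级芯片">2. Chinese Domestic Wafer-Scale Chips<a href="https://mirrorfrog.com/en/blog/cerebras-ipo-2026-s-1-cbrs-nasdaq-22b-valuation#2-%E5%9B%BD%E4%BA%A7%E6%99%B6%E5%9C%86%E7%BA%A7%E8%8A%AF%E7%89%87" class="hash-link" aria-label="Direct link to 2. Chinese Domestic Wafer-Scale Chips" title="Direct link to 2. Chinese Domestic Wafer-Scale Chips" translate="no">​</a></h3>
<p>Progress on Chinese domestic AI chips (the parts listed here are conventional GPU-class accelerators rather than wafer-scale designs):</p>
<ul>
<li class=""><strong>Biren Technology (壁仞科技)</strong> BR104: 300 TFLOPS (FP16)</li>
<li class=""><strong>Iluvatar (天数智芯)</strong> Bi-150: 200 TFLOPS</li>
<li class=""><strong>Moore Threads (摩尔线程)</strong> MTT S5000: 250 TFLOPS</li>
</ul>
<blockquote>
<p><strong>A Cerebras listing is likely to spur funding and IPO activity among Chinese AI chip makers pursuing wafer-scale designs</strong>.</p>
</blockquote>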
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="详细产品页">Detailed Product Pages<a href="https://mirrorfrog.com/en/blog/cerebras-ipo-2026-s-1-cbrs-nasdaq-22b-valuation#%E8%AF%A6%E7%BB%86%E4%BA%A7%E5%93%81%E9%A1%B5" class="hash-link" aria-label="Direct link to Detailed Product Pages" title="Direct link to Detailed Product Pages" translate="no">​</a></h2>
<ul>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/cerebras/wse-3">Cerebras WSE-3 Full Specifications</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/cerebras/wse-4">Cerebras WSE-4 Speculative Specifications</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/cerebras/wse-2">Cerebras WSE-2 (previous generation)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/architectures/arch-wse">WSE Architecture Explained</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/nvidia/groq-3-lpx">NVIDIA Groq 3 LPX (competitor)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/roadmap">Future Roadmap</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="总结">总结<a href="https://mirrorfrog.com/en/blog/cerebras-ipo-2026-s-1-cbrs-nasdaq-22b-valuation#%E6%80%BB%E7%BB%93" class="hash-link" aria-label="Direct link to 总结" title="Direct link to 总结" translate="no">​</a></h2>
<p>Cerebras IPO 是 2026 年 AI 芯片行业<strong>最重大事件</strong>之一：</p>
<ol>
<li class=""><strong>S-1 2026-04-17 提交</strong>，目标 2026-05 上市 Nasdaq: <strong>CBRS</strong></li>
<li class=""><strong>估值 $22-25B</strong>，2025 营收 $510M</li>
<li class=""><strong>OpenAI $10B 长期合同</strong> = 10 年收入基础</li>
<li class=""><strong>晶圆级技术领先</strong>（单芯片 125 PFLOPS）</li>
<li class=""><strong>亏损率收窄</strong>（从 -150% 到 -39%）</li>
<li class=""><strong>WSE-4 2027 发布</strong> = IPO 后首代产品</li>
</ol>
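<p>After Cerebras lists, <strong>the AI compute market could move toward an NVIDIA + Cerebras duopoly</strong> among pure-play AI chip vendors, although the two remain far apart in market value (~$4,000B versus $22-25B).</p>]]></content>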
        <author>
            <name>AI Compute Cards Wiki Editorial</name>
            <uri>https://github.com/anomalyco/mirrorfrog</uri>
        </author>
        <category label="Vendor Strategy" term="Vendor Strategy"/>
        <category label="Industry News" term="Industry News"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[NVIDIA 200 亿美元收购 Groq：LPU 正式进入 NVIDIA 生态]]></title>
        <id>https://mirrorfrog.com/en/blog/nvidia-acquires-groq-20-billion-lpu-strategy</id>
        <link href="https://mirrorfrog.com/en/blog/nvidia-acquires-groq-20-billion-lpu-strategy"/>
        <updated>2026-04-15T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[In 2026-Q1, NVIDIA acquired Groq outright for about $20 billion. The Groq 3 LPU was renamed NVIDIA Groq 3 LPX and serves as the ultra-low-latency inference co-processor of the Vera Rubin platform, with TTFT &lt; 20ms.]]></summary>
        <content type="html"><![CDATA[<p>2026 年 Q1，AI 芯片行业最大的新闻之一：<strong>NVIDIA 以约 200 亿美元全资收购 Groq</strong>。这意味着 Groq 的 LPU 架构正式成为 NVIDIA 算力版图的一部分，与 GPU 形成互补。本文将详细分析这次收购的战略意义。</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="收购时间线">收购时间线<a href="https://mirrorfrog.com/en/blog/nvidia-acquires-groq-20-billion-lpu-strategy#%E6%94%B6%E8%B4%AD%E6%97%B6%E9%97%B4%E7%BA%BF" class="hash-link" aria-label="Direct link to 收购时间线" title="Direct link to 收购时间线" translate="no">​</a></h2>
<table><thead><tr><th>时间</th><th>事件</th><th>详情</th></tr></thead><tbody><tr><td><strong>2024-2025</strong></td><td>Groq 独立运营</td><td>LPU v1 商用，GroqCloud API 服务</td></tr><tr><td><strong>2025-12</strong></td><td>NVIDIA 投资</td><td>NVIDIA 投资 Groq 2.5 亿美元（首次合作）</td></tr><tr><td><strong>2026-Q1</strong></td><td><strong>全资收购</strong></td><td><strong>NVIDIA 以约 200 亿美元全资收购 Groq</strong></td></tr><tr><td><strong>2026 H2</strong></td><td>产品整合</td><td>Groq 3 LPU 重命名为 <strong>NVIDIA Groq 3 LPX</strong></td></tr><tr><td><strong>2026 H2+</strong></td><td>协同生态</td><td>LPX rack 作为 Rubin GPU 的 co-processor</td></tr></tbody></table>
<blockquote>
<p><strong>收购金额细节</strong>：根据多方信源，NVIDIA 以"现金 + 股权"组合形式收购 Groq，对应估值约 $20B。Groq 创始团队（Jonathan Ross 等）部分留任，继续负责 LPU 产品线。</p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="为什么-nvidia-要收购-groq">为什么 NVIDIA 要收购 Groq？<a href="https://mirrorfrog.com/en/blog/nvidia-acquires-groq-20-billion-lpu-strategy#%E4%B8%BA%E4%BB%80%E4%B9%88-nvidia-%E8%A6%81%E6%94%B6%E8%B4%AD-groq" class="hash-link" aria-label="Direct link to 为什么 NVIDIA 要收购 Groq？" title="Direct link to 为什么 NVIDIA 要收购 Groq？" translate="no">​</a></h2>
<p>NVIDIA 在 GPU 算力领域已经<strong>绝对领先</strong>（CUDA 生态 + Rubin 平台 + 90% 数据中心 AI 市场份额），但有一个<strong>明显短板</strong>：</p>
<ul>
<li class=""><strong>超低延迟推理</strong>（TTFT &lt; 50ms）</li>
<li class=""><strong>Agentic AI</strong>（1000+ 调用/秒）</li>
<li class=""><strong>Deterministic Latency</strong>（可预测的延迟）</li>
</ul>
<p>这些场景下，传统 GPU 即使是 H100/Rubin R200，也受限于：</p>
<ul>
<li class="">HBM 访问延迟（~200ns vs SRAM 1ns）</li>
<li class="">CUDA 调度不确定性</li>
<li class="">算子融合的复杂度</li>
</ul>
<p><strong>Groq LPU 完美补全了 NVIDIA 的能力栈</strong>。</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="groq-3-lpx-机柜规格">Groq 3 LPX 机柜规格<a href="https://mirrorfrog.com/en/blog/nvidia-acquires-groq-20-billion-lpu-strategy#groq-3-lpx-%E6%9C%BA%E6%9F%9C%E8%A7%84%E6%A0%BC" class="hash-link" aria-label="Direct link to Groq 3 LPX 机柜规格" title="Direct link to Groq 3 LPX 机柜规格" translate="no">​</a></h2>
<p>收购完成后，Groq 3 LPU 重命名为 <strong>NVIDIA Groq 3 LPX</strong>，作为 Vera Rubin 平台的 co-processor：</p>
<table><thead><tr><th>项目</th><th>参数</th></tr></thead><tbody><tr><td><strong>芯片数（机柜）</strong></td><td><strong>256 颗 Groq 3 LPU</strong></td></tr><tr><td><strong>片上 SRAM（机柜）</strong></td><td><strong>128 GB 聚合</strong></td></tr><tr><td><strong>SRAM 带宽（机柜）</strong></td><td><strong>40 PB/s</strong></td></tr><tr><td><strong>互联</strong></td><td>GroqSync + NVLink-Network，<strong>640 TB/s</strong></td></tr><tr><td><strong>INT8 算力（机柜）</strong></td><td>~640,000 TOPS</td></tr><tr><td><strong>FP8 算力（机柜）</strong></td><td>~640 PFLOPS</td></tr><tr><td><strong>BF16 算力（机柜）</strong></td><td>~320 PFLOPS</td></tr><tr><td><strong>TDP（机柜）</strong></td><td>~80 kW</td></tr><tr><td><strong>perf/W</strong></td><td><strong>35× H100</strong>（官方）</td></tr><tr><td><strong>TTFT（首 Token 延迟）</strong></td><td><strong>&lt; 20ms</strong></td></tr><tr><td><strong>TPOT（单 Token 延迟）</strong></td><td><strong>&lt; 5ms</strong></td></tr></tbody></table>
<blockquote>
<p><strong>40 PB/s of rack-level SRAM bandwidth ≈ 12,000× the HBM bandwidth of a single H100</strong> (H100 80GB HBM3 = 3.35 TB/s). This is the core of the Groq LPU's <strong>extremely low latency</strong>.</p>
</blockquote>
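<p>The bandwidth comparison above can be checked with a few lines of arithmetic. The sketch below divides the rack-level 40 PB/s figure by the H100's 3.35 TB/s, and also derives a per-chip ratio. The per-chip split is an assumption: it supposes the aggregate bandwidth is spread evenly across the 256 chips, which the article does not state.</p>

```python
# Figures quoted in the article. The even per-chip split is an assumption.
RACK_SRAM_BW_TB_PER_S = 40_000.0   # 40 PB/s aggregate SRAM bandwidth per LPX rack
CHIPS_PER_RACK = 256               # Groq 3 LPU chips per rack
H100_HBM_BW_TB_PER_S = 3.35        # H100 80GB HBM3 bandwidth


def bandwidth_ratios():
    """Return (rack vs. one H100, one LPU chip vs. one H100) bandwidth ratios."""
    rack_ratio = RACK_SRAM_BW_TB_PER_S / H100_HBM_BW_TB_PER_S
    per_chip_bw = RACK_SRAM_BW_TB_PER_S / CHIPS_PER_RACK
    chip_ratio = per_chip_bw / H100_HBM_BW_TB_PER_S
    return rack_ratio, chip_ratio


rack_ratio, chip_ratio = bandwidth_ratios()
print(f"rack vs one H100: {rack_ratio:,.0f}x")
print(f"one LPU chip vs one H100: {chip_ratio:.1f}x")
```

<p>The rack-level ratio compares 256 chips against a single GPU, so the per-chip ratio is the fairer like-for-like number when sizing a deployment.</p>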
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="收购后的产品矩阵">收购后的产品矩阵<a href="https://mirrorfrog.com/en/blog/nvidia-acquires-groq-20-billion-lpu-strategy#%E6%94%B6%E8%B4%AD%E5%90%8E%E7%9A%84%E4%BA%A7%E5%93%81%E7%9F%A9%E9%98%B5" class="hash-link" aria-label="Direct link to 收购后的产品矩阵" title="Direct link to 收购后的产品矩阵" translate="no">​</a></h2>
<p>NVIDIA 现在提供<strong>全场景 AI 算力覆盖</strong>：</p>
<table><thead><tr><th>场景</th><th>推荐产品</th></tr></thead><tbody><tr><td><strong>大规模训练</strong>（100B+ 模型）</td><td>Rubin NVL72 / NVL576</td></tr><tr><td><strong>高吞吐推理</strong></td><td>B300 Ultra / Rubin R200</td></tr><tr><td><strong>超低延迟推理</strong></td><td><strong>Groq 3 LPX</strong></td></tr><tr><td><strong>Agentic AI</strong>（1000+ 调用/秒）</td><td><strong>Groq 3 LPX rack</strong></td></tr><tr><td><strong>Real-time Code Gen</strong>（Copilot）</td><td><strong>Groq 3 LPX rack</strong></td></tr><tr><td><strong>万亿参数推理</strong></td><td>Rubin R200 + Groq 3 LPX 协同</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="对-ai-行业的影响">对 AI 行业的影响<a href="https://mirrorfrog.com/en/blog/nvidia-acquires-groq-20-billion-lpu-strategy#%E5%AF%B9-ai-%E8%A1%8C%E4%B8%9A%E7%9A%84%E5%BD%B1%E5%93%8D" class="hash-link" aria-label="Direct link to 对 AI 行业的影响" title="Direct link to 对 AI 行业的影响" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-超低延迟推理市场洗牌">1. 超低延迟推理市场洗牌<a href="https://mirrorfrog.com/en/blog/nvidia-acquires-groq-20-billion-lpu-strategy#1-%E8%B6%85%E4%BD%8E%E5%BB%B6%E8%BF%9F%E6%8E%A8%E7%90%86%E5%B8%82%E5%9C%BA%E6%B4%97%E7%89%8C" class="hash-link" aria-label="Direct link to 1. 超低延迟推理市场洗牌" title="Direct link to 1. 超低延迟推理市场洗牌" translate="no">​</a></h3>
<p>收购前，超低延迟推理市场有三家玩家：</p>
<ul>
<li class=""><strong>Groq</strong>（SRAM + 编译器）</li>
<li class=""><strong>Cerebras</strong>（WSE 大晶圆 + 40+ GB SRAM）</li>
<li class=""><strong>SambaNova</strong>（RDU 可重构数据流）</li>
</ul>
<p>收购后：</p>
<ul>
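<li class=""><strong>Groq LPX</strong> belongs to NVIDIA (largest ecosystem, strongest customer base)</li>
<li class=""><strong>Cerebras</strong> is heading for an IPO, with WSE-4 expected in 2027</li>
<li class=""><strong>SambaNova</strong> SN50 continues to operate independently</li>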
</ul>
<p><strong>Cerebras 的 IPO 时机变得更加重要</strong>——需要在 NVIDIA 整合 Groq 之前抢占市场。</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-agentic-ai-加速爆发">2. Agentic AI 加速爆发<a href="https://mirrorfrog.com/en/blog/nvidia-acquires-groq-20-billion-lpu-strategy#2-agentic-ai-%E5%8A%A0%E9%80%9F%E7%88%86%E5%8F%91" class="hash-link" aria-label="Direct link to 2. Agentic AI 加速爆发" title="Direct link to 2. Agentic AI 加速爆发" translate="no">​</a></h3>
<p>2026 年 Agentic AI 是 LLM 应用的下一个爆发点：</p>
<ul>
<li class="">单次 Agent 调用：~500ms-2s</li>
<li class="">复杂任务：100+ 次连续调用</li>
<li class="">用户体验：&lt; 200ms 响应</li>
</ul>
<p><strong>Groq 3 LPX 的 TTFT &lt; 20ms 是 Agentic AI 的关键使能技术</strong>。</p>
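<p>To see why per-call latency dominates agentic workloads, here is a minimal sketch of a latency budget for a chain of dependent calls. It uses a simple model (time to first token plus one TPOT per remaining token) and an assumed workload of 50 output tokens per call; neither the model nor the token count comes from the article, and the function names are illustrative.</p>

```python
def call_latency_ms(ttft_ms, tpot_ms, output_tokens):
    """Latency of one streamed LLM call: first token, then one TPOT per remaining token."""
    return ttft_ms + tpot_ms * max(output_tokens - 1, 0)


def chain_latency_s(calls, per_call_ms):
    """Total wall-clock time for sequential (dependent) calls, in seconds."""
    return calls * per_call_ms / 1000.0


# Upper bounds quoted for Groq 3 LPX; 50 output tokens per call is an assumed workload.
lpx_call = call_latency_ms(ttft_ms=20, tpot_ms=5, output_tokens=50)
print(lpx_call)                        # ms per call on the assumed workload
print(chain_latency_s(100, lpx_call))  # 100 sequential calls on LPX
print(chain_latency_s(100, 500))       # same chain at the article's ~500 ms per call
```

<p>Under these assumptions a 100-call task takes roughly half as long as it would at 500 ms per call, and the gap widens as calls get shorter, because TTFT then makes up a larger share of each call.</p>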
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-客户迁移">3. 客户迁移<a href="https://mirrorfrog.com/en/blog/nvidia-acquires-groq-20-billion-lpu-strategy#3-%E5%AE%A2%E6%88%B7%E8%BF%81%E7%A7%BB" class="hash-link" aria-label="Direct link to 3. 客户迁移" title="Direct link to 3. 客户迁移" translate="no">​</a></h3>
<p>Groq 原本的客户：</p>
<ul>
<li class=""><strong>OpenAI</strong>：部分推理负载</li>
<li class=""><strong>Anthropic</strong>：Claude 推理</li>
<li class=""><strong>Meta</strong>：Llama 推理</li>
<li class=""><strong>Mistral</strong>：推理</li>
</ul>
<p>这些客户<strong>继续使用 LPX</strong>，但合同关系从 Groq Inc. 变为 NVIDIA Corp.。</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="lpx-的局限">LPX 的局限<a href="https://mirrorfrog.com/en/blog/nvidia-acquires-groq-20-billion-lpu-strategy#lpx-%E7%9A%84%E5%B1%80%E9%99%90" class="hash-link" aria-label="Direct link to LPX 的局限" title="Direct link to LPX 的局限" translate="no">​</a></h2>
<p>Groq 3 LPX 并非万能：</p>
<table><thead><tr><th>局限</th><th>影响</th></tr></thead><tbody><tr><td><strong>单芯片 SRAM 仅 512 MB</strong></td><td>大模型需 32+ 颗芯片</td></tr><tr><td><strong>不支持训练</strong></td><td>只能推理</td></tr><tr><td><strong>软件生态不如 CUDA</strong></td><td>模型迁移成本</td></tr><tr><td><strong>机柜级 $8-10M 价格</strong></td><td>中小客户难以承担</td></tr><tr><td><strong>不支持 fine-tuning</strong></td><td>推理优化空间有限</td></tr></tbody></table>
<p>因此，<strong>LPX 不是替代 GPU，而是补充 GPU</strong>：</p>
<ul>
<li class="">中小模型、低成本：GPU (L4 / T4)</li>
<li class="">大模型训练：GPU (H100 / B300)</li>
<li class="">大模型推理：GPU (H200 / B300)</li>
<li class=""><strong>超低延迟大模型推理：LPX</strong></li>
</ul>
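<p>The 512 MB per-chip SRAM limit from the table above translates directly into a minimum chip count. The sketch below estimates how many chips are needed just to hold a model's weights. It assumes 8-bit weights, reads 512 MB as 512 MiB, and ignores KV cache and activations, so real deployments would need more; the model sizes are illustrative.</p>

```python
import math

SRAM_PER_CHIP_BYTES = 512 * 1024**2   # 512 MB per chip, read here as 512 MiB
CHIPS_PER_RACK = 256


def chips_for_weights(params_billion, bytes_per_param):
    """Minimum chips whose combined SRAM holds the weights (ignores KV cache and activations)."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return math.ceil(weight_bytes / SRAM_PER_CHIP_BYTES)


for name, params_b in [("8B", 8), ("70B", 70), ("405B", 405)]:
    chips = chips_for_weights(params_b, bytes_per_param=1)  # assumed 8-bit weights
    racks = math.ceil(chips / CHIPS_PER_RACK)
    print(f"{name}: {chips} chips, {racks} rack(s)")
```

<p>Because weights must be sharded across many chips, the interconnect becomes part of the memory system, which is one reason LPX is positioned at rack level rather than as a single card.</p>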
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="详细产品页">详细产品页<a href="https://mirrorfrog.com/en/blog/nvidia-acquires-groq-20-billion-lpu-strategy#%E8%AF%A6%E7%BB%86%E4%BA%A7%E5%93%81%E9%A1%B5" class="hash-link" aria-label="Direct link to 详细产品页" title="Direct link to 详细产品页" translate="no">​</a></h2>
<ul>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/nvidia/groq-3-lpx">NVIDIA Groq 3 LPX 完整规格</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/others/groq-lpu">Groq LPU (v1, 收购前)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/architectures/arch-lpu">LPU 架构详解</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/nvidia/rubin-r200">NVIDIA Rubin R200</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/cerebras/wse-3">Cerebras WSE-3</a>（竞争对手）</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="总结">总结<a href="https://mirrorfrog.com/en/blog/nvidia-acquires-groq-20-billion-lpu-strategy#%E6%80%BB%E7%BB%93" class="hash-link" aria-label="Direct link to 总结" title="Direct link to 总结" translate="no">​</a></h2>
<p>NVIDIA 收购 Groq 是 2026 年 AI 芯片行业最重大的事件之一：</p>
<ol>
<li class=""><strong>补全 NVIDIA 算力版图</strong>——从"训练+推理"扩展到"训练+推理+超低延迟推理"</li>
<li class=""><strong>Groq 团队 + 客户</strong>全部并入 NVIDIA</li>
<li class=""><strong>GroqCloud API 继续运营</strong>（OpenAI 兼容）</li>
<li class=""><strong>Vera Rubin 平台</strong>成为全场景 AI 算力终极平台</li>
<li class=""><strong>AI 行业进入"机柜级"时代</strong>：GPU 机柜 + LPU 机柜协同</li>
</ol>
<p><strong>NVIDIA = GPU + LPU + 互联 + 软件 = 完整 AI 算力生态</strong></p>]]></content>
        <author>
            <name>AI Compute Cards Wiki Editorial</name>
            <uri>https://github.com/anomalyco/mirrorfrog</uri>
        </author>
        <category label="Vendor Strategy" term="Vendor Strategy"/>
        <category label="Industry News" term="Industry News"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[AWS Trainium 3 GA：3nm 工艺 + 4.4× 算力 + 4× 能效 + 144 芯片 UltraServer]]></title>
        <id>https://mirrorfrog.com/en/blog/aws-trainium-3-3nm-4x-efficiency-reinvent-2025</id>
        <link href="https://mirrorfrog.com/en/blog/aws-trainium-3-3nm-4x-efficiency-reinvent-2025"/>
        <updated>2025-12-02T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[2025-12-02 AWS 在 re:Invent 2025 大会 GA 第三代自研 AI 芯片 Trainium 3：3nm 工艺、144GB HBM、4.4× 算力 / 4× 能效 / 4× 内存带宽、Trn3 UltraServer 144 颗芯片。]]></summary>
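        <content type="html"><![CDATA[<p>On <strong>December 2, 2025</strong>, at re:Invent 2025, AWS announced <strong>general availability (GA)</strong> of <strong>Trainium 3</strong>, its third-generation in-house AI training chip. It is a key upgrade to the AWS compute lineup: a 3nm process, 4.4× more compute and 4× better energy efficiency (both quoted at UltraServer level), and a Trn3 UltraServer with 144 chips. This article walks through the details.</p>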
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="核心规格">核心规格<a href="https://mirrorfrog.com/en/blog/aws-trainium-3-3nm-4x-efficiency-reinvent-2025#%E6%A0%B8%E5%BF%83%E8%A7%84%E6%A0%BC" class="hash-link" aria-label="Direct link to 核心规格" title="Direct link to 核心规格" translate="no">​</a></h2>
<table><thead><tr><th>Item</th><th>Trainium 2 (2024)</th><th><strong>Trainium 3 (2025-12 GA)</strong></th><th>Change</th></tr></thead><tbody><tr><td>Process</td><td>TSMC 4nm</td><td><strong>TSMC 3nm</strong></td><td>One node newer</td></tr><tr><td>NeuronCore</td><td>8 × v3</td><td>8 × <strong>v4</strong></td><td>Architecture upgrade</td></tr><tr><td>HBM capacity</td><td>96 GB</td><td><strong>144 GB</strong></td><td>1.5×</td></tr><tr><td>HBM bandwidth</td><td>2.9 TB/s</td><td><strong>4.9 TB/s</strong></td><td>~1.7×</td></tr><tr><td>FP8 compute (dense)</td><td>1,299 TFLOPS</td><td><strong>2,520 TFLOPS</strong></td><td><strong>~1.9×</strong> per chip (the official 4.4× is an UltraServer-level figure)</td></tr><tr><td>BF16/FP16</td><td>667 TFLOPS</td><td>1,300 TFLOPS</td><td>~2×</td></tr><tr><td><strong>Energy efficiency (UltraServer level, official)</strong></td><td>1×</td><td><strong>4×</strong></td><td>4×</td></tr><tr><td><strong>Memory bandwidth (UltraServer level, official)</strong></td><td>1×</td><td><strong>~4×</strong></td><td>~4×</td></tr><tr><td>NeuronLink</td><td>NeuronLink-v3</td><td><strong>NeuronLink-v4</strong></td><td>New generation</td></tr><tr><td>TDP</td><td>~700 W</td><td>~700 W</td><td>Flat</td></tr><tr><td>Release</td><td>2024-12</td><td><strong>2025-12</strong></td><td>n/a</td></tr></tbody></table>
<blockquote>
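<p><strong>Official figures: 4.4× compute, 4× energy efficiency, and nearly 4× memory bandwidth</strong>. AWS quotes these at the Trn3 UltraServer level against the Trn2 UltraServer, so they combine per-chip gains with the larger chip count.</p>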
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="trn3-ultraserver机柜级">Trn3 UltraServer（机柜级）<a href="https://mirrorfrog.com/en/blog/aws-trainium-3-3nm-4x-efficiency-reinvent-2025#trn3-ultraserver%E6%9C%BA%E6%9F%9C%E7%BA%A7" class="hash-link" aria-label="Direct link to Trn3 UltraServer（机柜级）" title="Direct link to Trn3 UltraServer（机柜级）" translate="no">​</a></h2>
<table><thead><tr><th>Item</th><th>Configuration</th></tr></thead><tbody><tr><td><strong>Chips</strong></td><td><strong>144 × Trainium 3</strong></td></tr><tr><td>Total HBM</td><td>~20.7 TB (144GB × 144)</td></tr><tr><td>NeuronLink-v4</td><td>Fully connected, &gt;10 TB/s bidirectional</td></tr><tr><td>FP8 compute (rack)</td><td><strong>362 PFLOPS</strong> (dense)</td></tr><tr><td>BF16 compute (rack)</td><td>~187 PFLOPS</td></tr><tr><td>TDP (rack)</td><td>~100 kW</td></tr><tr><td><strong>Target models</strong></td><td><strong>Training LLMs with 400B+ parameters</strong></td></tr></tbody></table>
<blockquote>
<p><strong>Trn3 UltraServer = 单机柜可训练 400B 模型</strong>。一个 EC2 UltraCluster（&gt;10 机柜）可支持<strong>1.4T+ 参数的巨型模型训练</strong>。</p>
</blockquote>
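<p>The rack-level totals follow from the per-chip figures, and a rough memory estimate shows why a 400B-parameter model is a plausible fit. The sketch below is a back-of-the-envelope check, assuming 16 bytes per parameter for mixed-precision Adam training (weights, gradients, and optimizer state) and ignoring activations; that rule of thumb is an assumption, not an AWS figure.</p>

```python
CHIPS = 144                    # Trainium 3 chips per Trn3 UltraServer
HBM_PER_CHIP_GB = 144          # HBM capacity per chip
BF16_PER_CHIP_TFLOPS = 1_300   # BF16 compute per chip, from the spec table

hbm_total_tb = CHIPS * HBM_PER_CHIP_GB / 1000            # decimal TB
bf16_total_pflops = CHIPS * BF16_PER_CHIP_TFLOPS / 1000

# Rough capacity check, assuming 16 bytes per parameter for mixed-precision Adam
# (weights, gradients, and optimizer state) and ignoring activations.
BYTES_PER_PARAM_TRAINING = 16
max_params_billion = hbm_total_tb * 1e12 / BYTES_PER_PARAM_TRAINING / 1e9

print(f"total HBM: {hbm_total_tb:.1f} TB")
print(f"total BF16: {bf16_total_pflops:.1f} PFLOPS")
print(f"model-state ceiling: ~{max_params_billion:.0f}B parameters")
```

<p>The ceiling is well above 400B, which leaves headroom for activations and communication buffers that the estimate leaves out.</p>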
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="trn3-vs-trn2-ultraserver-升级">Trn3 vs Trn2 UltraServer 升级<a href="https://mirrorfrog.com/en/blog/aws-trainium-3-3nm-4x-efficiency-reinvent-2025#trn3-vs-trn2-ultraserver-%E5%8D%87%E7%BA%A7" class="hash-link" aria-label="Direct link to Trn3 vs Trn2 UltraServer 升级" title="Direct link to Trn3 vs Trn2 UltraServer 升级" translate="no">​</a></h2>
<table><thead><tr><th>Metric</th><th>Trn2 UltraServer</th><th><strong>Trn3 UltraServer</strong></th><th>Change</th></tr></thead><tbody><tr><td>Chips</td><td>64</td><td><strong>144</strong></td><td><strong>2.25×</strong></td></tr><tr><td>Interconnect</td><td>NeuronLink-v3</td><td>NeuronLink-v4</td><td>New generation</td></tr><tr><td>Total HBM</td><td>6.1 TB</td><td>~20.7 TB</td><td>3.4×</td></tr><tr><td>FP8 compute</td><td>~83 PFLOPS</td><td><strong>362 PFLOPS</strong></td><td><strong>~4.4×</strong></td></tr><tr><td>Training capability</td><td>70B+ LLM</td><td><strong>400B+ LLM</strong></td><td>n/a</td></tr><tr><td>Release</td><td>2024-12</td><td>2025-12</td><td>n/a</td></tr></tbody></table>
<blockquote>
<p>Trn3 UltraServer 是 2026 年<strong>性价比最高的大规模训练方案</strong>之一。</p>
</blockquote>
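<p>The Trn2 UltraServer column can be cross-checked from the per-chip Trainium 2 figures in the core spec table (96 GB HBM, 1,299 TFLOPS FP8). The sketch below is a sanity check under the assumption that system totals are simple per-chip multiples, with no allowance for interconnect or utilization losses.</p>

```python
# Per-chip Trainium 2 figures from the core spec table.
TRN2_CHIPS = 64
TRN2_HBM_GB = 96
TRN2_FP8_TFLOPS = 1_299

# Trn3 UltraServer chip count and per-chip HBM.
TRN3_CHIPS = 144
TRN3_HBM_GB = 144

trn2_hbm_tb = TRN2_CHIPS * TRN2_HBM_GB / 1000
trn2_fp8_pflops = TRN2_CHIPS * TRN2_FP8_TFLOPS / 1000
chip_ratio = TRN3_CHIPS / TRN2_CHIPS
hbm_ratio = (TRN3_CHIPS * TRN3_HBM_GB) / (TRN2_CHIPS * TRN2_HBM_GB)

print(f"Trn2 UltraServer HBM: {trn2_hbm_tb:.1f} TB")
print(f"Trn2 UltraServer FP8: {trn2_fp8_pflops:.1f} PFLOPS")
print(f"chip count ratio: {chip_ratio:.2f}x, HBM ratio: {hbm_ratio:.1f}x")
```

<p>Note the unit: 64 chips at about 1.3 PFLOPS each give a Trn2 UltraServer total in the tens of PFLOPS, not TFLOPS.</p>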
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="aws-neuron-sdk-3">AWS Neuron SDK 3<a href="https://mirrorfrog.com/en/blog/aws-trainium-3-3nm-4x-efficiency-reinvent-2025#aws-neuron-sdk-3" class="hash-link" aria-label="Direct link to AWS Neuron SDK 3" title="Direct link to AWS Neuron SDK 3" translate="no">​</a></h2>
<ul>
<li class=""><strong>Neuron SDK 3.x</strong>：PyTorch 2.4+ / JAX 0.4+ / TensorFlow 2.16+ 全优化</li>
<li class=""><strong>Neuron Compiler 2.x</strong>：自动编译 + 图优化</li>
<li class=""><strong>NeuronX Distributed</strong>：大规模分布式训练库（与 PyTorch FSDP 集成）</li>
<li class=""><strong>NeuronX Nemo</strong>：LLM 微调框架（Megatron-LM 等价）</li>
<li class=""><strong>vLLM 0.7+ 优化版</strong>：低延迟推理</li>
</ul>
<blockquote>
<p><strong>AWS Neuron = 类似 ROCm 的开源生态</strong>，全部 SDK 在 GitHub 开源（aws-neuron）。</p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="ec2-实例类型">EC2 实例类型<a href="https://mirrorfrog.com/en/blog/aws-trainium-3-3nm-4x-efficiency-reinvent-2025#ec2-%E5%AE%9E%E4%BE%8B%E7%B1%BB%E5%9E%8B" class="hash-link" aria-label="Direct link to EC2 实例类型" title="Direct link to EC2 实例类型" translate="no">​</a></h2>
<table><thead><tr><th>Instance</th><th>Accelerator</th><th>Configuration</th><th>Use</th></tr></thead><tbody><tr><td><strong>trn3.48xlarge</strong></td><td>1 × Trn3</td><td>144GB HBM</td><td>Single-chip development</td></tr><tr><td><strong>trn3.96xlarge</strong></td><td>2 × Trn3</td><td>288GB HBM</td><td>Small-scale training</td></tr><tr><td><strong>trn3 UltraServer</strong></td><td>144 × Trn3</td><td>20.7 TB HBM</td><td><strong>Very large-scale training</strong></td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="价格与每美元性能">价格与每美元性能<a href="https://mirrorfrog.com/en/blog/aws-trainium-3-3nm-4x-efficiency-reinvent-2025#%E4%BB%B7%E6%A0%BC%E4%B8%8E%E6%AF%8F%E7%BE%8E%E5%85%83%E6%80%A7%E8%83%BD" class="hash-link" aria-label="Direct link to 价格与每美元性能" title="Direct link to 价格与每美元性能" translate="no">​</a></h2>
<table><thead><tr><th>Instance</th><th>Hourly price (on-demand, speculative)</th></tr></thead><tbody><tr><td><strong>trn3.48xlarge</strong></td><td>~$32</td></tr><tr><td><strong>Equivalent Trainium 2 instance</strong></td><td>~$16</td></tr><tr><td><strong>Price increase</strong></td><td>2×</td></tr><tr><td><strong>Gain in FP8 compute per dollar</strong></td><td><strong>2.2×</strong> (4.4× compute / 2× price, assuming the official 4.4× figure carries over to this instance)</td></tr></tbody></table>
<blockquote>
<p><strong>AWS 强调</strong>：Trainium 3 在<strong>每美元 FP8 算力</strong>上<strong>显著优于</strong> NVIDIA H100 / H200（2-3×）。</p>
</blockquote>
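<p>The per-dollar figure in the pricing table is a ratio of ratios, which the sketch below makes explicit. The prices are the article's speculative on-demand estimates, and applying the 4.4× compute gain at instance level is likewise an assumption; the function name is illustrative.</p>

```python
def perf_per_dollar_gain(perf_ratio, old_price, new_price):
    """Gain in performance per dollar when performance and hourly price both change."""
    price_ratio = new_price / old_price
    return perf_ratio / price_ratio


# Speculative hourly prices from the table: ~$16 (Trainium 2) and ~$32 (trn3.48xlarge).
gain = perf_per_dollar_gain(perf_ratio=4.4, old_price=16.0, new_price=32.0)

# Hourly price at which the new instance would only match the old one per dollar.
break_even_price = 16.0 * 4.4

print(f"perf per dollar gain: {gain:.1f}x")
print(f"break-even hourly price: ${break_even_price:.2f}")
```

<p>The break-even price gives a quick way to re-evaluate the claim once actual pricing is published: any hourly price below it still improves compute per dollar under the same performance assumption.</p>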
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="与-nvidia-同期产品对比">与 NVIDIA 同期产品对比<a href="https://mirrorfrog.com/en/blog/aws-trainium-3-3nm-4x-efficiency-reinvent-2025#%E4%B8%8E-nvidia-%E5%90%8C%E6%9C%9F%E4%BA%A7%E5%93%81%E5%AF%B9%E6%AF%94" class="hash-link" aria-label="Direct link to 与 NVIDIA 同期产品对比" title="Direct link to 与 NVIDIA 同期产品对比" translate="no">​</a></h2>
<table><thead><tr><th>Metric</th><th>Trainium 3</th><th>NVIDIA H200</th><th>NVIDIA B200</th></tr></thead><tbody><tr><td>Process</td><td>TSMC 3nm</td><td>TSMC 4N</td><td>TSMC 4NP</td></tr><tr><td>HBM capacity</td><td>144 GB</td><td>141 GB</td><td>192 GB</td></tr><tr><td>HBM bandwidth</td><td>4.9 TB/s</td><td>4.8 TB/s</td><td>8 TB/s</td></tr><tr><td>FP8 compute (dense)</td><td>2.5 PFLOPS</td><td>~2.0 PFLOPS</td><td>4.5 PFLOPS</td></tr><tr><td>FP16 compute</td><td>1.3 PFLOPS</td><td>1.0 PFLOPS</td><td>2.25 PFLOPS</td></tr><tr><td>TDP</td><td>700 W</td><td>700 W</td><td>1,000 W</td></tr><tr><td>Interconnect</td><td>NeuronLink-v4</td><td>NVLink 4</td><td>NVLink 5</td></tr><tr><td>Availability</td><td>AWS Cloud only</td><td>Commercially sold</td><td>Commercially sold</td></tr><tr><td>Software</td><td>Neuron SDK 3</td><td>CUDA</td><td>CUDA</td></tr><tr><td>Performance per dollar</td><td><strong>2-3× advantage</strong></td><td>1×</td><td>1.5×</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="适用场景">适用场景<a href="https://mirrorfrog.com/en/blog/aws-trainium-3-3nm-4x-efficiency-reinvent-2025#%E9%80%82%E7%94%A8%E5%9C%BA%E6%99%AF" class="hash-link" aria-label="Direct link to 适用场景" title="Direct link to 适用场景" translate="no">​</a></h2>
<ul>
<li class="">✅ <strong>超大规模 LLM 训练</strong>（400B-1.4T 模型，UltraServer）</li>
<li class="">✅ <strong>AWS Bedrock 模型预训练</strong>（Anthropic Claude、Meta Llama、Mistral）</li>
<li class="">✅ <strong>成本敏感型训练</strong>（价格低于 NVIDIA 30-50%）</li>
<li class="">✅ <strong>能源效率敏感</strong>（每瓦性能 4× 提升）</li>
<li class="">❌ 非 AWS 部署（Trainium 仅在 EC2 出售）</li>
<li class="">❌ 旧 NVIDIA 生态绑定（CUDA-only 代码迁移成本高）</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="aws-客户案例">AWS 客户案例<a href="https://mirrorfrog.com/en/blog/aws-trainium-3-3nm-4x-efficiency-reinvent-2025#aws-%E5%AE%A2%E6%88%B7%E6%A1%88%E4%BE%8B" class="hash-link" aria-label="Direct link to AWS 客户案例" title="Direct link to AWS 客户案例" translate="no">​</a></h2>
<p>Key customers AWS announced at re:Invent 2025:</p>
<table><thead><tr><th>客户</th><th>应用</th></tr></thead><tbody><tr><td><strong>Anthropic</strong></td><td>Claude 训练（已使用 Trn2，现迁移到 Trn3）</td></tr><tr><td><strong>Meta</strong></td><td>Llama 4 训练</td></tr><tr><td><strong>Mistral</strong></td><td>Mistral Large 3 训练</td></tr><tr><td><strong>HuggingFace</strong></td><td>Open LLM 训练</td></tr><tr><td><strong>AWS Bedrock</strong></td><td>内部托管模型训练</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="详细产品页">详细产品页<a href="https://mirrorfrog.com/en/blog/aws-trainium-3-3nm-4x-efficiency-reinvent-2025#%E8%AF%A6%E7%BB%86%E4%BA%A7%E5%93%81%E9%A1%B5" class="hash-link" aria-label="Direct link to 详细产品页" title="Direct link to 详细产品页" translate="no">​</a></h2>
<ul>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/aws/trainium-3">AWS Trainium 3 完整规格</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/aws/trainium-2">AWS Trainium 2（前代）</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/aws/trainium">AWS Trainium 1（初代）</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/aws/inferentia-2">AWS Inferentia 2（推理对偶）</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/nvidia/h100">NVIDIA H100（主要竞品）</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/roadmap">未来路线图</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="总结">总结<a href="https://mirrorfrog.com/en/blog/aws-trainium-3-3nm-4x-efficiency-reinvent-2025#%E6%80%BB%E7%BB%93" class="hash-link" aria-label="Direct link to 总结" title="Direct link to 总结" translate="no">​</a></h2>
<p>AWS Trainium 3 是 2025 年 AI 芯片行业的关键发布之一：</p>
<ol>
<li class=""><strong>3nm 工艺</strong> + <strong>4.4× 算力</strong> + <strong>4× 能效</strong>——AWS 算力版图全面升级</li>
<li class=""><strong>Trn3 UltraServer 144 颗</strong>——单机柜训练 400B+ 模型</li>
<li class=""><strong>每美元 FP8 算力 2-3× NVIDIA</strong>——AWS 训练成本优势</li>
<li class=""><strong>Neuron SDK 3 全面开源</strong>——降低软件迁移成本</li>
<li class=""><strong>Anthropic、Meta、Mistral 全面采用</strong>——AWS 算力生态扩展</li>
</ol>
<p>2026 年，Trainium 3 将成为<strong>AWS 内部核心训练负载</strong>的算力基础。</p>]]></content>
        <author>
            <name>AI Compute Cards Wiki Editorial</name>
            <uri>https://github.com/anomalyco/mirrorfrog</uri>
        </author>
        <category label="Product Launch" term="Product Launch"/>
        <category label="Cloud Pricing" term="Cloud Pricing"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Huawei Ascend 920: 4 TB/s, the Highest Domestic HBM Bandwidth, and 3× the H20's Compute as a Domestic Substitute]]></title>
        <id>https://mirrorfrog.com/en/blog/huawei-ascend-920-3x-h20-domestic-substitution</id>
        <link href="https://mirrorfrog.com/en/blog/huawei-ascend-920-3x-h20-domestic-substitution"/>
        <updated>2025-11-15T00:00:00.000Z</updated>
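        <summary type="html"><![CDATA[Huawei Ascend 920 is the next-generation Ascend chip entering volume production in 2025 H2: domestic SMIC 6nm, 4 TB/s HBM bandwidth, and 900+ BF16 TFLOPS, quoted as 3× the NVIDIA H20's compute. It is a key win for domestic substitution in China.]]></summary>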
        <content type="html"><![CDATA[<p><strong>Huawei Ascend 920（昇腾 920）</strong> 于 <strong>2025 H2 大规模量产</strong>，是中国国产 AI 芯片的<strong>重大突破</strong>。本文将分析其规格、与 NVIDIA H20 的对比、CloudMatrix 384 Ultra 系统，以及对中国 AI 产业的意义。</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="核心规格">核心规格<a href="https://mirrorfrog.com/en/blog/huawei-ascend-920-3x-h20-domestic-substitution#%E6%A0%B8%E5%BF%83%E8%A7%84%E6%A0%BC" class="hash-link" aria-label="Direct link to 核心规格" title="Direct link to 核心规格" translate="no">​</a></h2>
<table><thead><tr><th>Item</th><th>Ascend 910C</th><th><strong>Ascend 920</strong></th><th>Change</th></tr></thead><tbody><tr><td>Architecture</td><td>Da Vinci v3</td><td><strong>Da Vinci v4</strong></td><td>New generation</td></tr><tr><td>Process</td><td>7nm</td><td><strong>6nm (SMIC, domestic)</strong></td><td>More advanced</td></tr><tr><td>Chiplets</td><td>2× (dual die)</td><td>2×</td><td>Same</td></tr><tr><td>HBM capacity</td><td>~128 GB</td><td><strong>~96 GB</strong></td><td>−25%</td></tr><tr><td>HBM bandwidth</td><td>3.2 TB/s</td><td><strong>4 TB/s</strong></td><td><strong>1.25×</strong></td></tr><tr><td>BF16 compute</td><td>780 TFLOPS</td><td><strong>900+ TFLOPS</strong></td><td><strong>1.15×</strong></td></tr><tr><td>FP16 compute</td><td>1,560 TFLOPS</td><td>1,800 TFLOPS</td><td>1.15×</td></tr><tr><td>INT8 compute</td><td>3,120 TOPS</td><td>3,600 TOPS</td><td>1.15×</td></tr><tr><td>TDP</td><td>~310 W</td><td>~400 W</td><td>+29%</td></tr><tr><td>Release</td><td>2025-04</td><td><strong>2025 H2</strong></td><td>n/a</td></tr></tbody></table>
<blockquote>
<p><strong>4 TB/s = the highest HBM bandwidth among domestic chips</strong>, 25% above the Ascend 910C. Its 900+ BF16 TFLOPS also exceeds the 910C.</p>
</blockquote>
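<p>The generation-over-generation ratios in the spec table can be reproduced directly, and the same figures yield a derived metric the table does not list: BF16 TFLOPS per watt. The sketch below is illustrative arithmetic on the quoted per-chip numbers; the per-watt comparison is an addition here and assumes peak compute and TDP are directly comparable across the two parts.</p>

```python
# Per-chip figures from the core spec table: (Ascend 910C, Ascend 920).
SPECS = {
    "hbm_bandwidth": (3.2, 4.0),
    "bf16_tflops": (780.0, 900.0),
    "hbm_capacity_gb": (128.0, 96.0),
    "tdp_w": (310.0, 400.0),
}


def generation_ratios(specs):
    """Return the new/old ratio for every metric."""
    return {name: new / old for name, (old, new) in specs.items()}


ratios = generation_ratios(SPECS)

# Derived metric (not in the table): BF16 TFLOPS per watt for each generation.
tflops_per_watt_910c = 780.0 / 310.0
tflops_per_watt_920 = 900.0 / 400.0

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
print(f"BF16 TFLOPS/W: {tflops_per_watt_910c:.2f} -> {tflops_per_watt_920:.2f}")
```

<p>On these numbers, compute rises about 15% while TDP rises about 29%, so peak BF16 efficiency per watt goes down slightly even as absolute performance and bandwidth go up.</p>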
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="ascend-920-vs-nvidia-h20对标">Ascend 920 vs NVIDIA H20（对标）<a href="https://mirrorfrog.com/en/blog/huawei-ascend-920-3x-h20-domestic-substitution#ascend-920-vs-nvidia-h20%E5%AF%B9%E6%A0%87" class="hash-link" aria-label="Direct link to Ascend 920 vs NVIDIA H20（对标）" title="Direct link to Ascend 920 vs NVIDIA H20（对标）" translate="no">​</a></h2>
<p>NVIDIA H20 是 NVIDIA 在美国出口管制下<strong>专门为中国市场设计</strong>的"合规"AI 芯片：</p>
<table><thead><tr><th>Metric</th><th>Ascend 920</th><th>NVIDIA H20</th></tr></thead><tbody><tr><td>Positioning</td><td>Domestic substitute</td><td>China-compliant AI chip</td></tr><tr><td>Process</td><td>6nm (SMIC)</td><td>TSMC 4N</td></tr><tr><td>Memory</td><td>~96 GB</td><td>96 GB HBM3</td></tr><tr><td>Memory bandwidth</td><td><strong>4 TB/s</strong></td><td>4.0 TB/s</td></tr><tr><td>Peak compute as quoted</td><td><strong>900 TFLOPS</strong> (BF16)</td><td>296 TFLOPS (the H20's FP8 peak; its BF16 peak is 148 TFLOPS)</td></tr><tr><td><strong>Ratio on the quoted figures</strong></td><td><strong>3×</strong></td><td>1× (baseline)</td></tr><tr><td>Interconnect</td><td>HCCS 1.2 Tbps</td><td>NVLink 900 GB/s</td></tr><tr><td>Software</td><td>CANN + MindSpore</td><td>CUDA (restricted)</td></tr><tr><td>Import compliance</td><td>✅ Domestic</td><td>⚠️ Subject to US export controls</td></tr></tbody></table>
<blockquote>
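<p>💡 On the quoted peak-compute figures (900 vs. 296 TFLOPS), the <strong>Ascend 920</strong> leads the H20 by about <strong>3×</strong>, and its 4 TB/s memory bandwidth matches the H20. This is a <strong>key win</strong> for domestic substitution.</p>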
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="cloudmatrix-384-ultra-系统">CloudMatrix 384 Ultra 系统<a href="https://mirrorfrog.com/en/blog/huawei-ascend-920-3x-h20-domestic-substitution#cloudmatrix-384-ultra-%E7%B3%BB%E7%BB%9F" class="hash-link" aria-label="Direct link to CloudMatrix 384 Ultra 系统" title="Direct link to CloudMatrix 384 Ultra 系统" translate="no">​</a></h2>
<p>Ascend 920 将被用于 <strong>CloudMatrix 384 Ultra</strong> 超节点系统：</p>
<table><thead><tr><th>Item</th><th>Configuration</th></tr></thead><tbody><tr><td><strong>Chips</strong></td><td>384 × Ascend 920</td></tr><tr><td><strong>Racks</strong></td><td>16 (12 compute + 4 network)</td></tr><tr><td><strong>Total HBM</strong></td><td>~36.9 TB (96GB × 384)</td></tr><tr><td><strong>Interconnect</strong></td><td>All-optical mesh, <strong>8,000+ LPO optical modules</strong></td></tr><tr><td><strong>BF16 compute (system)</strong></td><td><strong>~345 PFLOPS</strong> (estimated, 900 TFLOPS × 384)</td></tr><tr><td><strong>TDP (system)</strong></td><td>~150 kW (accelerators only, 384 × ~400 W)</td></tr></tbody></table>
<blockquote>
<p><strong>CloudMatrix 384 Ultra delivers ~345 PFLOPS of system-level BF16 compute</strong>, roughly <strong>1.9×</strong> an NVIDIA GB200 NVL72 rack (~180 PFLOPS BF16 dense).</p>
</blockquote>
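<p>The system-level figures for CloudMatrix 384 Ultra are per-chip values multiplied by 384. The sketch below reproduces them, assuming linear scaling with no interconnect or utilization losses; the power line covers the accelerators only and leaves out switches, optics, CPUs, and cooling.</p>

```python
CHIPS = 384
HBM_PER_CHIP_GB = 96
BF16_PER_CHIP_TFLOPS = 900
TDP_PER_CHIP_W = 400   # approximate per-chip TDP from the spec table

hbm_total_tb = CHIPS * HBM_PER_CHIP_GB / 1000
bf16_total_pflops = CHIPS * BF16_PER_CHIP_TFLOPS / 1000
chip_power_kw = CHIPS * TDP_PER_CHIP_W / 1000   # accelerators only

print(f"total HBM: {hbm_total_tb:.1f} TB")
print(f"total BF16: {bf16_total_pflops:.1f} PFLOPS")
print(f"accelerator power: {chip_power_kw:.1f} kW")
```

<p>Because the power estimate counts only the 384 accelerators, full-system power for a 16-rack installation would be higher.</p>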
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="为什么-920-是国产替代关键胜利">为什么 920 是国产替代关键胜利？<a href="https://mirrorfrog.com/en/blog/huawei-ascend-920-3x-h20-domestic-substitution#%E4%B8%BA%E4%BB%80%E4%B9%88-920-%E6%98%AF%E5%9B%BD%E4%BA%A7%E6%9B%BF%E4%BB%A3%E5%85%B3%E9%94%AE%E8%83%9C%E5%88%A9" class="hash-link" aria-label="Direct link to 为什么 920 是国产替代关键胜利？" title="Direct link to 为什么 920 是国产替代关键胜利？" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-算力首次超越-h20">1. 算力首次超越 H20<a href="https://mirrorfrog.com/en/blog/huawei-ascend-920-3x-h20-domestic-substitution#1-%E7%AE%97%E5%8A%9B%E9%A6%96%E6%AC%A1%E8%B6%85%E8%B6%8A-h20" class="hash-link" aria-label="Direct link to 1. 算力首次超越 H20" title="Direct link to 1. 算力首次超越 H20" translate="no">​</a></h3>
<table><thead><tr><th>时期</th><th>国产</th><th>NVIDIA 中国版</th><th>倍数</th></tr></thead><tbody><tr><td>2023</td><td>910B = 320 TFLOPS</td><td>H20 = 296 TFLOPS</td><td>1.08×</td></tr><tr><td>2024</td><td>910B = 320 TFLOPS</td><td>H20 = 296 TFLOPS</td><td>1.08×</td></tr><tr><td>2025 H1</td><td>910C = 780 TFLOPS</td><td>H20 = 296 TFLOPS</td><td>2.6×</td></tr><tr><td><strong>2025 H2</strong></td><td><strong>920 = 900 TFLOPS</strong></td><td>H20 = 296 TFLOPS</td><td><strong>3.0×</strong></td></tr></tbody></table>
<blockquote>
<p><strong>2025 H2 起，国产 AI 芯片算力首次稳定超越 H20 三倍</strong>。</p>
</blockquote>
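<p>The multiples in the table above are each domestic chip's quoted TFLOPS divided by the 296 TFLOPS baseline used for the H20. A minimal sketch of that calculation, using only the table's own figures (the helper name is illustrative):</p>

```python
H20_TFLOPS = 296.0  # baseline figure used in the table

# (period, chip, quoted TFLOPS) rows from the table.
DOMESTIC = [
    ("2023", "910B", 320.0),
    ("2024", "910B", 320.0),
    ("2025 H1", "910C", 780.0),
    ("2025 H2", "920", 900.0),
]


def ratios_vs_baseline(rows, baseline):
    """Return (period, chip, ratio) with ratio = quoted TFLOPS / baseline."""
    return [(period, chip, tflops / baseline) for period, chip, tflops in rows]


results = ratios_vs_baseline(DOMESTIC, H20_TFLOPS)
for period, chip, ratio in results:
    print(f"{period}: {chip} = {ratio:.2f}x H20")
```

<p>The ratios only hold for the precision each figure was quoted at, so the same calculation should be rerun if the baseline is replaced with a like-for-like precision.</p>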
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-6nm-smic-国产工艺">2. 6nm SMIC 国产工艺<a href="https://mirrorfrog.com/en/blog/huawei-ascend-920-3x-h20-domestic-substitution#2-6nm-smic-%E5%9B%BD%E4%BA%A7%E5%B7%A5%E8%89%BA" class="hash-link" aria-label="Direct link to 2. 6nm SMIC 国产工艺" title="Direct link to 2. 6nm SMIC 国产工艺" translate="no">​</a></h3>
<p>Ascend 920 采用 <strong>SMIC N+1 / N+2 6nm 工艺</strong>：</p>
<ul>
<li class="">✅ 完全自主可控</li>
<li class="">✅ 不受美国出口管制</li>
<li class="">⚠️ 良率和成本仍逊于 TSMC 4N</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-4-tbps-国产最高-hbm">3. 4 TB/s: the Highest Domestic HBM Bandwidth<a href="https://mirrorfrog.com/en/blog/huawei-ascend-920-3x-h20-domestic-substitution#3-4-tbps-%E5%9B%BD%E4%BA%A7%E6%9C%80%E9%AB%98-hbm" class="hash-link" aria-label="Direct link to 3. 4 TB/s: the Highest Domestic HBM Bandwidth" title="Direct link to 3. 4 TB/s: the Highest Domestic HBM Bandwidth" translate="no">​</a></h3>
<p>The Ascend 920's 4 TB/s HBM bandwidth:</p>
<ul>
<li class="">Is the <strong>first domestic part to reach the 4 TB/s class</strong> (the previous high was 3.2 TB/s)</li>
<li class="">Matches the H20</li>
<li class="">Is presumed to use <strong>CXMT (长鑫存储) HBM3</strong> or in-house HBM</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-cann--mindspore-软件栈">4. CANN + MindSpore 软件栈<a href="https://mirrorfrog.com/en/blog/huawei-ascend-920-3x-h20-domestic-substitution#4-cann--mindspore-%E8%BD%AF%E4%BB%B6%E6%A0%88" class="hash-link" aria-label="Direct link to 4. CANN + MindSpore 软件栈" title="Direct link to 4. CANN + MindSpore 软件栈" translate="no">​</a></h3>
<ul>
<li class=""><strong>CANN 8.x</strong>（Compute Architecture for Neural Networks）：类比 CUDA</li>
<li class=""><strong>MindSpore 2.4+</strong>：Huawei 自研 AI 框架</li>
<li class=""><strong>PyTorch 2.3+ via torch_npu (Ascend Extension for PyTorch)</strong>: PyTorch compatibility</li>
<li class=""><strong>vLLM 0.7+ Ascend 后端</strong>：低延迟推理</li>
<li class=""><strong>ONNX-Runtime Ascend 后端</strong>：跨框架推理</li>
<li class=""><strong>Atlas 900/950 系列服务器</strong>：OEM 整机</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="中国市场部署现状">Deployment Status in the Chinese Market<a href="https://mirrorfrog.com/en/blog/huawei-ascend-920-3x-h20-domestic-substitution#%E4%B8%AD%E5%9B%BD%E5%B8%82%E5%9C%BA%E9%83%A8%E7%BD%B2%E7%8E%B0%E7%8A%B6" class="hash-link" aria-label="Direct link to Deployment Status in the Chinese Market" title="Direct link to Deployment Status in the Chinese Market" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="已规模化客户">Customers Deployed at Scale<a href="https://mirrorfrog.com/en/blog/huawei-ascend-920-3x-h20-domestic-substitution#%E5%B7%B2%E8%A7%84%E6%A8%A1%E5%8C%96%E5%AE%A2%E6%88%B7" class="hash-link" aria-label="Direct link to Customers Deployed at Scale" title="Direct link to Customers Deployed at Scale" translate="no">​</a></h3>
<table><thead><tr><th>Customer</th><th>Application</th></tr></thead><tbody><tr><td><strong>China Mobile</strong></td><td>Large-model training (about 980 million customers)</td></tr><tr><td><strong>China Telecom</strong></td><td>Intelligent customer service + business insight</td></tr><tr><td><strong>China Unicom</strong></td><td>Government services + industry AI</td></tr><tr><td><strong>State Grid</strong></td><td>Grid dispatch + fault prediction</td></tr><tr><td><strong>CNPC (China National Petroleum)</strong></td><td>Exploration + logistics optimization</td></tr><tr><td><strong>Major banks</strong></td><td>Risk control + anti-fraud</td></tr><tr><td><strong>Internet companies</strong> (Baidu, Alibaba, Tencent)</td><td>LLM inference</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="行业布局">Sector Adoption<a href="https://mirrorfrog.com/en/blog/huawei-ascend-920-3x-h20-domestic-substitution#%E8%A1%8C%E4%B8%9A%E5%B8%83%E5%B1%80" class="hash-link" aria-label="Direct link to Sector Adoption" title="Direct link to Sector Adoption" translate="no">​</a></h3>
<ul>
<li class=""><strong>Government</strong>: 100% domestic sourcing required</li>
<li class=""><strong>Finance</strong>: domestic sourcing mandated by policy</li>
<li class=""><strong>Telecom</strong>: localization is progressing quickly</li>
<li class=""><strong>Energy</strong>: localization is progressing quickly</li>
<li class=""><strong>Internet</strong>: localization of some sensitive workloads</li>
<li class=""><strong>Education / healthcare</strong>: gradual localization</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="局限与挑战">Limitations and Challenges<a href="https://mirrorfrog.com/en/blog/huawei-ascend-920-3x-h20-domestic-substitution#%E5%B1%80%E9%99%90%E4%B8%8E%E6%8C%91%E6%88%98" class="hash-link" aria-label="Direct link to Limitations and Challenges" title="Direct link to Limitations and Challenges" translate="no">​</a></h2>
<table><thead><tr><th>Limitation</th><th>Impact</th></tr></thead><tbody><tr><td><strong>FP8/FP4 support</strong></td><td>Ascend 920 still relies mainly on BF16/FP16; FP8 optimization is in progress</td></tr><tr><td><strong>HBM capacity</strong></td><td>96 GB, below NVIDIA Rubin R200 (288 GB) and AMD MI400 (432 GB)</td></tr><tr><td><strong>CUDA compatibility</strong></td><td>CANN 8 still requires migration; CUDA applications cannot generally run unmodified</td></tr><tr><td><strong>SMIC 6nm yield</strong></td><td>10-20% lower than TSMC 4N</td></tr><tr><td><strong>HBM supply</strong></td><td>CXMT HBM production capacity is limited</td></tr><tr><td><strong>Interconnect bandwidth</strong></td><td>HCCS at 1.2 Tbps is far below NVLink 6 (3.5 TB/s)</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="与同期国产芯片对比">Comparison with Contemporary Domestic Chips<a href="https://mirrorfrog.com/en/blog/huawei-ascend-920-3x-h20-domestic-substitution#%E4%B8%8E%E5%90%8C%E6%9C%9F%E5%9B%BD%E4%BA%A7%E8%8A%AF%E7%89%87%E5%AF%B9%E6%AF%94" class="hash-link" aria-label="Direct link to Comparison with Contemporary Domestic Chips" title="Direct link to Comparison with Contemporary Domestic Chips" translate="no">​</a></h2>
<table><thead><tr><th>Vendor</th><th>Chip</th><th>BF16 compute</th><th>HBM bandwidth</th><th>Volume production</th></tr></thead><tbody><tr><td><strong>Huawei</strong></td><td>Ascend 920</td><td>900 TFLOPS</td><td>4 TB/s</td><td>2025 H2</td></tr><tr><td><strong>Huawei</strong></td><td>Ascend 910C</td><td>780 TFLOPS</td><td>3.2 TB/s</td><td>2025-04</td></tr><tr><td><strong>Cambricon</strong></td><td>Siyuan 590</td><td>~480 TFLOPS</td><td>2.4 TB/s</td><td>2024</td></tr><tr><td><strong>Moore Threads</strong></td><td>MTT S5000</td><td>~250 TFLOPS</td><td>1.6 TB/s</td><td>2024</td></tr><tr><td><strong>Biren</strong></td><td>BR104</td><td>~300 TFLOPS</td><td>1.6 TB/s</td><td>2024</td></tr><tr><td><strong>Iluvatar</strong></td><td>CoreX Bi-150</td><td>~200 TFLOPS</td><td>1.2 TB/s</td><td>2024</td></tr></tbody></table>
<blockquote>
<p><strong>Huawei Ascend 920 holds a clear lead among domestic AI chips</strong>.</p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="详细产品页">Detailed Product Pages<a href="https://mirrorfrog.com/en/blog/huawei-ascend-920-3x-h20-domestic-substitution#%E8%AF%A6%E7%BB%86%E4%BA%A7%E5%93%81%E9%A1%B5" class="hash-link" aria-label="Direct link to Detailed Product Pages" title="Direct link to Detailed Product Pages" translate="no">​</a></h2>
<ul>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/huawei/ascend-920">Huawei Ascend 920 full specifications</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/huawei/ascend-910c">Huawei Ascend 910C (previous generation)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/huawei/ascend-910b">Huawei Ascend 910B (first volume-production generation)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/huawei/ascend-910d">Huawei Ascend 910D (high-end)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/nvidia/h100-nvl">NVIDIA H100 NVL (benchmark competitor)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/cards/amd/mi300x">AMD MI300X (overseas alternative)</a></li>
<li class=""><a class="" href="https://mirrorfrog.com/en/docs/roadmap">Roadmap</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="总结">Summary<a href="https://mirrorfrog.com/en/blog/huawei-ascend-920-3x-h20-domestic-substitution#%E6%80%BB%E7%BB%93" class="hash-link" aria-label="Direct link to Summary" title="Direct link to Summary" translate="no">​</a></h2>
<p>Huawei Ascend 920 is a <strong>key win</strong> for Chinese AI chips in 2025 H2:</p>
<ol>
<li class=""><strong>900+ BF16 TFLOPS = 3× H20</strong>: the first to consistently exceed three times the H20</li>
<li class=""><strong>SMIC 6nm domestic process</strong>: independently controllable</li>
<li class=""><strong>4 TB/s, the highest HBM bandwidth among domestic chips</strong>: a breakthrough in HBM localization</li>
<li class=""><strong>CloudMatrix 384 Ultra system</strong>: a single system that outperforms GB200 NVL72</li>
<li class=""><strong>CANN + MindSpore</strong>: a mature software ecosystem</li>
</ol>
<p>From 2025 H2 onward, China's AI industry enters a new stage in which <strong>"domestic chips can independently support large-scale AI applications"</strong>.</p>]]></content>
        <author>
            <name>AI Compute Cards Wiki Editorial</name>
            <uri>https://github.com/anomalyco/mirrorfrog</uri>
        </author>
        <category label="Product Launch" term="Product Launch"/>
        <category label="Vendor Strategy" term="Vendor Strategy"/>
        <category label="Industry News" term="Industry News"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[China's AI Chip Landscape 2025: Ascend, Cambricon, or Hygon, Who Will Lead?]]></title>
        <id>https://mirrorfrog.com/en/blog/china-ai-chip-landscape-2025</id>
        <link href="https://mirrorfrog.com/en/blog/china-ai-chip-landscape-2025"/>
        <updated>2025-06-03T00:00:00.000Z</updated>
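        <summary type="html"><![CDATA[A full survey of China's domestic AI chips in 2025. An in-depth comparison of mainstream chips including Huawei Ascend 910B, Cambricon Siyuan 590, Hygon DCU, and Enflame T21, analyzed across compute, ecosystem, and deployment scale.]]></summary>
        <content type="html"><![CDATA[<p>Escalating US export controls are pushing China's AI chip industry toward faster self-reliance. In 2025, the conversation around domestic AI chips is no longer about "can they be used" but about "which one to choose".</p>
<p>This article surveys the <strong>main players, core products, and real-world deployments</strong> of domestic AI chips to help developers and procurement decision-makers understand the competitive landscape.</p>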
<!-- -->
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="第一梯队华为昇腾">Tier 1: Huawei Ascend<a href="https://mirrorfrog.com/en/blog/china-ai-chip-landscape-2025#%E7%AC%AC%E4%B8%80%E6%A2%AF%E9%98%9F%E5%8D%8E%E4%B8%BA%E6%98%87%E8%85%BE" class="hash-link" aria-label="Direct link to Tier 1: Huawei Ascend" title="Direct link to Tier 1: Huawei Ascend" translate="no">​</a></h2>
<p><strong>Products</strong>: Ascend 910B (training), Ascend 310P/310 (inference)</p>
<p><strong>Architecture</strong>: Da Vinci, built around 3D Cube matrix compute units</p>
<p><strong>Key figures</strong>:</p>
<table><thead><tr><th>Metric</th><th>Ascend 910B</th><th>Ascend 310P</th><th>Ascend 310</th></tr></thead><tbody><tr><td>FP16 compute</td><td>400 TFLOPS</td><td>n/a</td><td>n/a</td></tr><tr><td>INT8 compute</td><td>640 TOPS</td><td>70 TOPS</td><td>22 TOPS</td></tr><tr><td>Memory</td><td>64GB HBM2e</td><td>24GB LPDDR4X</td><td>8GB LPDDR4</td></tr><tr><td>TDP</td><td>310W</td><td>75W</td><td>8W</td></tr><tr><td>Process node</td><td>7nm</td><td>12nm</td><td>12nm</td></tr></tbody></table>
<p><strong>Ecosystem status</strong>:</p>
<ul>
<li class=""><strong>CANN software stack</strong>: the counterpart to CUDA, a complete stack from driver to compiler</li>
<li class=""><strong>torch_npu</strong>: the Ascend backend for PyTorch, with an API closely aligned with CUDA's</li>
<li class=""><strong>MindSpore</strong>: Huawei's in-house framework, though market adoption is limited</li>
<li class=""><strong>Large-model support</strong>: mainstream models such as Llama and Qwen have been adapted</li>
</ul>
<p><strong>Real-world deployment</strong>: According to public data, <strong>6,000+</strong> Ascend 910B chips are deployed in Huawei's Pangu large-model cluster.</p>
<p><strong>Overall assessment</strong>: The clear leader among domestic AI chips, with the most complete software ecosystem and the highest share of the government and enterprise market. Training performance is roughly 60-70% of the H100's, and inference price-performance is competitive.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="第二梯队寒武纪--海光">Tier 2: Cambricon &amp; Hygon<a href="https://mirrorfrog.com/en/blog/china-ai-chip-landscape-2025#%E7%AC%AC%E4%BA%8C%E6%A2%AF%E9%98%9F%E5%AF%92%E6%AD%A6%E7%BA%AA--%E6%B5%B7%E5%85%89" class="hash-link" aria-label="Direct link to Tier 2: Cambricon &amp; Hygon" title="Direct link to Tier 2: Cambricon &amp; Hygon" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="寒武纪-思元-mlu">Cambricon Siyuan MLU<a href="https://mirrorfrog.com/en/blog/china-ai-chip-landscape-2025#%E5%AF%92%E6%AD%A6%E7%BA%AA-%E6%80%9D%E5%85%83-mlu" class="hash-link" aria-label="Direct link to Cambricon Siyuan MLU" title="Direct link to Cambricon Siyuan MLU" translate="no">​</a></h3>
<p><strong>Products</strong>: Siyuan 590, Siyuan 370</p>
<p><strong>Positioning</strong>: AI training + inference</p>
<p><strong>Key facts</strong>:</p>
<ul>
<li class="">Siyuan 590 compute is positioned against the A100 (FP32 ~30 TFLOPS, INT8 ~300 TOPS)</li>
<li class="">In-house MLUarch architecture + BangC programming language</li>
<li class="">PyTorch/TensorFlow adaptations are available</li>
<li class="">Deployed mainly in smart city, security and surveillance, and research settings</li>
</ul>
<p><strong>Current state</strong>: Cambricon was once the most closely watched AI chip unicorn, but in recent years it has faced commercialization difficulties and sustained losses. Its product iteration is slower than Ascend's, and its market share is being squeezed.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="海光信息-深算-dcu">Hygon Shensuan DCU<a href="https://mirrorfrog.com/en/blog/china-ai-chip-landscape-2025#%E6%B5%B7%E5%85%89%E4%BF%A1%E6%81%AF-%E6%B7%B1%E7%AE%97-dcu" class="hash-link" aria-label="Direct link to Hygon Shensuan DCU" title="Direct link to Hygon Shensuan DCU" translate="no">​</a></h3>
<p><strong>Products</strong>: Shensuan Z100</p>
<p><strong>Architecture</strong>: CUDA-compatible (following the AMD ROCm route)</p>
<p><strong>Key facts</strong>:</p>
<ul>
<li class="">Shensuan-1 delivers ~15 TFLOPS of FP32 compute</li>
<li class="">Biggest selling point: CUDA API compatibility and low migration cost</li>
<li class="">Deployed mainly in supercomputing centers, financial institutions, and other Xinchuang (IT localization) scenarios</li>
<li class="">Process node is constrained by foundry restrictions</li>
</ul>
<p><strong>Current state</strong>: Hygon's compatibility route lowers software migration cost in the short term, but in the long term it depends on how the AMD ecosystem develops.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="第三梯队创业公司与跨界玩家">Tier 3: Startups and Crossover Players<a href="https://mirrorfrog.com/en/blog/china-ai-chip-landscape-2025#%E7%AC%AC%E4%B8%89%E6%A2%AF%E9%98%9F%E5%88%9B%E4%B8%9A%E5%85%AC%E5%8F%B8%E4%B8%8E%E8%B7%A8%E7%95%8C%E7%8E%A9%E5%AE%B6" class="hash-link" aria-label="Direct link to Tier 3: Startups and Crossover Players" title="Direct link to Tier 3: Startups and Crossover Players" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="燧原科技-云燧-t21">Enflame CloudBlazer T21<a href="https://mirrorfrog.com/en/blog/china-ai-chip-landscape-2025#%E7%87%A7%E5%8E%9F%E7%A7%91%E6%8A%80-%E4%BA%91%E7%87%A7-t21" class="hash-link" aria-label="Direct link to Enflame CloudBlazer T21" title="Direct link to Enflame CloudBlazer T21" translate="no">​</a></h3>
<ul>
<li class="">Targets cloud AI training</li>
<li class="">In-house GCU architecture + TopsRider software stack</li>
<li class="">PyTorch adaptation is available</li>
<li class="">Has won orders from multiple carriers and government projects</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="壁仞科技-br100br20x">Biren BR100/BR20X<a href="https://mirrorfrog.com/en/blog/china-ai-chip-landscape-2025#%E5%A3%81%E4%BB%9E%E7%A7%91%E6%8A%80-br100br20x" class="hash-link" aria-label="Direct link to Biren BR100/BR20X" title="Direct link to Biren BR100/BR20X" translate="no">​</a></h3>
<ul>
<li class="">BR100 claims 1000+ TFLOPS of FP16 compute (theoretical peak)</li>
<li class="">Actual rollout has lagged behind the publicity</li>
<li class="">Shifted to a more pragmatic product roadmap after 2024</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="摩尔线程-mtt-s5000">Moore Threads MTT S5000<a href="https://mirrorfrog.com/en/blog/china-ai-chip-landscape-2025#%E6%91%A9%E5%B0%94%E7%BA%BF%E7%A8%8B-mtt-s5000" class="hash-link" aria-label="Direct link to Moore Threads MTT S5000" title="Direct link to Moore Threads MTT S5000" translate="no">​</a></h3>
<ul>
<li class="">Full-featured GPU (graphics + compute + AI)</li>
<li class="">The MUSA architecture is compatible with the CUDA API</li>
<li class="">Driver and software stack maturity is improving but still falls short of production-grade AI training</li>
<li class="">Better suited to inference and small-scale training</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="百度-昆仑芯-p800">Baidu Kunlunxin P800<a href="https://mirrorfrog.com/en/blog/china-ai-chip-landscape-2025#%E7%99%BE%E5%BA%A6-%E6%98%86%E4%BB%91%E8%8A%AF-p800" class="hash-link" aria-label="Direct link to Baidu Kunlunxin P800" title="Direct link to Baidu Kunlunxin P800" translate="no">​</a></h3>
<ul>
<li class="">Baidu's in-house AI chip</li>
<li class="">Deployed in internal workloads such as Baidu Search, Baidu AI Cloud, and autonomous driving</li>
<li class="">Public technical detail is limited, but it has been validated at scale internally</li>
</ul>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="国产-ai-芯片横向对比">Side-by-Side Comparison of Domestic AI Chips<a href="https://mirrorfrog.com/en/blog/china-ai-chip-landscape-2025#%E5%9B%BD%E4%BA%A7-ai-%E8%8A%AF%E7%89%87%E6%A8%AA%E5%90%91%E5%AF%B9%E6%AF%94" class="hash-link" aria-label="Direct link to Side-by-Side Comparison of Domestic AI Chips" title="Direct link to Side-by-Side Comparison of Domestic AI Chips" translate="no">​</a></h2>
<table><thead><tr><th>Chip</th><th>FP16 compute (TFLOPS)</th><th>Memory (GB)</th><th>CUDA compatible</th><th>Training capability</th><th>Deployment scale</th></tr></thead><tbody><tr><td>Ascend 910B</td><td>400</td><td>64 HBM2e</td><td>❌ CANN</td><td>✅ Strong</td><td>6,000+</td></tr><tr><td>Cambricon 590</td><td>~300</td><td>n/a</td><td>❌ BangC</td><td>⚠️</td><td>Thousands</td></tr><tr><td>Hygon DCU Z100</td><td>~30 (FP32)</td><td>n/a</td><td>⚠️ ROCm route</td><td>⚠️</td><td>Thousands</td></tr><tr><td>Enflame T21</td><td>~200</td><td>32 HBM2e</td><td>❌ In-house</td><td>✅</td><td>Hundreds</td></tr><tr><td>Biren BR100</td><td>~1000 (claimed)</td><td>n/a</td><td>⚠️</td><td>⚠️</td><td>Limited</td></tr><tr><td>Baidu Kunlunxin P800</td><td>n/a</td><td>n/a</td><td>❌ In-house</td><td>⚠️</td><td>Internal</td></tr><tr><td>Moore Threads MTT S5000</td><td>~100</td><td>32 GDDR6</td><td>⚠️ MUSA</td><td>❌ Inference-focused</td><td>n/a</td></tr></tbody></table>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="软件生态对比关键决策因素">Software Ecosystem Comparison (the Key Decision Factor)<a href="https://mirrorfrog.com/en/blog/china-ai-chip-landscape-2025#%E8%BD%AF%E4%BB%B6%E7%94%9F%E6%80%81%E5%AF%B9%E6%AF%94%E5%85%B3%E9%94%AE%E5%86%B3%E7%AD%96%E5%9B%A0%E7%B4%A0" class="hash-link" aria-label="Direct link to Software Ecosystem Comparison (the Key Decision Factor)" title="Direct link to Software Ecosystem Comparison (the Key Decision Factor)" translate="no">​</a></h2>
<table><thead><tr><th>Chip</th><th>PyTorch</th><th>vLLM inference</th><th>Hugging Face</th><th>CUDA code porting cost</th></tr></thead><tbody><tr><td>Ascend 910B</td><td>⚠️ torch_npu</td><td>⚠️ Community</td><td>⚠️ Partial</td><td>Medium (change device names + adapt operators)</td></tr><tr><td>Hygon DCU</td><td>⚠️ ROCm backend</td><td>⚠️</td><td>⚠️</td><td>Low (CUDA API compatible)</td></tr><tr><td>Cambricon 590</td><td>⚠️</td><td>❌</td><td>❌</td><td>High (BangC language)</td></tr><tr><td>Enflame T21</td><td>⚠️</td><td>❌</td><td>❌</td><td>High</td></tr><tr><td>Moore Threads MTT</td><td>⚠️</td><td>❌</td><td>❌</td><td>Medium (MUSA is CUDA-compatible)</td></tr></tbody></table>
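<p>The "change device names" part of an Ascend port can be kept in one place. The sketch below is a minimal illustration, not a vendor-recommended procedure: <code>choose_device</code> and <code>probe_device</code> are hypothetical helper names, and it assumes that importing <code>torch_npu</code> registers the <code>npu</code> device and exposes <code>torch.npu.is_available()</code>. Operator adaptation is a separate task that this sketch does not cover.</p>

```python
def choose_device(has_npu, has_cuda):
    """Return the preferred torch device string given what is available."""
    if has_npu:
        return "npu"
    if has_cuda:
        return "cuda"
    return "cpu"


def probe_device():
    """Probe the local environment; falls back to "cpu" when torch is absent."""
    try:
        import torch
    except ImportError:
        return "cpu"

    has_npu = False
    try:
        # Assumption: importing torch_npu registers the "npu" device type.
        import torch_npu  # noqa: F401
        npu_module = getattr(torch, "npu", None)
        has_npu = bool(npu_module is not None and npu_module.is_available())
    except ImportError:
        has_npu = False

    return choose_device(has_npu, torch.cuda.is_available())


DEVICE = probe_device()
print(DEVICE)
```

<p>Model code can then call <code>model.to(DEVICE)</code> instead of hard-coding <code>"cuda"</code>, so the same script stays runnable on NVIDIA hosts, Ascend hosts, and CPU-only machines.</p>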
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="选型建议">Selection Recommendations<a href="https://mirrorfrog.com/en/blog/china-ai-chip-landscape-2025#%E9%80%89%E5%9E%8B%E5%BB%BA%E8%AE%AE" class="hash-link" aria-label="Direct link to Selection Recommendations" title="Direct link to Selection Recommendations" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="政企--信创项目">Government and Enterprise / Xinchuang Projects<a href="https://mirrorfrog.com/en/blog/china-ai-chip-landscape-2025#%E6%94%BF%E4%BC%81--%E4%BF%A1%E5%88%9B%E9%A1%B9%E7%9B%AE" class="hash-link" aria-label="Direct link to Government and Enterprise / Xinchuang Projects" title="Direct link to Government and Enterprise / Xinchuang Projects" translate="no">​</a></h3>
<p><strong>First choice: Ascend 910B</strong>. Reasons:</p>
<ul>
<li class="">Most complete software ecosystem and strongest community support</li>
<li class="">Ascend + Kylin/UOS is the standard Xinchuang combination</li>
<li class="">CANN toolchain maturity is 2-3 years ahead of other domestic options</li>
<li class="">Huawei offers the most comprehensive technical support and documentation</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="cuda-存量代码迁移">Migrating Existing CUDA Code<a href="https://mirrorfrog.com/en/blog/china-ai-chip-landscape-2025#cuda-%E5%AD%98%E9%87%8F%E4%BB%A3%E7%A0%81%E8%BF%81%E7%A7%BB" class="hash-link" aria-label="Direct link to Migrating Existing CUDA Code" title="Direct link to Migrating Existing CUDA Code" translate="no">​</a></h3>
<p>If you want to avoid rewriting large amounts of code:</p>
<ul>
<li class=""><strong>Hygon DCU</strong> (ROCm-compatible route) has the lowest migration cost</li>
<li class=""><strong>Moore Threads MTT</strong> (MUSA-compatible route) suits inference scenarios</li>
<li class="">Ascend's torch_npu sits in the middle on migration cost but offers the best long-term ecosystem payoff</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="纯推理场景">Inference-Only Scenarios<a href="https://mirrorfrog.com/en/blog/china-ai-chip-landscape-2025#%E7%BA%AF%E6%8E%A8%E7%90%86%E5%9C%BA%E6%99%AF" class="hash-link" aria-label="Direct link to Inference-Only Scenarios" title="Direct link to Inference-Only Scenarios" translate="no">​</a></h3>
<ul>
<li class=""><strong>Ascend 310P</strong>: the best price-performance among domestic inference cards</li>
<li class=""><strong>Moore Threads MTT S5000</strong>: if you need a domestic full-featured GPU</li>
<li class=""><strong>Cambricon 370</strong>: has an installed-base advantage in specific scenarios (vision, security)</li>
</ul>
<hr>
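<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="2025-2026-展望">2025-2026 Outlook<a href="https://mirrorfrog.com/en/blog/china-ai-chip-landscape-2025#2025-2026-%E5%B1%95%E6%9C%9B" class="hash-link" aria-label="Direct link to 2025-2026 Outlook" title="Direct link to 2025-2026 Outlook" translate="no">​</a></h2>
<ol>
<li class=""><strong>Ascend 920 is coming</strong>: the next-generation Ascend will use a more advanced process node and targets FP8 compute comparable to the H200</li>
<li class=""><strong>Domestic EDA tools</strong>: domestic replacements for chip design tools will help more startups iterate faster</li>
<li class=""><strong>CUDA compatibility becomes standard</strong>: all domestic chips will offer at least a CUDA API compatibility layer</li>
<li class=""><strong>Inference market share shifts faster</strong>: domestic chips will reach NVIDIA-replacement level in inference first</li>
<li class=""><strong>Validation at scale</strong>: more domestic "10,000-card cluster" deployments will land with carriers and in the financial sector</li>
</ol>
<p><strong>Key judgment</strong>: In 2025-2026, domestic AI chips will move from "usable" to "good to use". A training performance gap remains (1-2 generations behind), but inference workloads are already ready for substitution.</p>
<hr>
<p><em>On MirrorFrog you can find driver downloads, developer documentation, and detailed specifications for all of the domestic chips above.</em></p>]]></content>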
        <author>
            <name>AI Compute Cards Wiki Editorial</name>
            <uri>https://github.com/anomalyco/mirrorfrog</uri>
        </author>
        <category label="国产芯片" term="国产芯片"/>
        <category label="昇腾" term="昇腾"/>
        <category label="寒武纪" term="寒武纪"/>
        <category label="海光" term="海光"/>
        <category label="AI芯片" term="AI芯片"/>
        <category label="国产替代" term="国产替代"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[GPU vs NPU vs TPU: An In-Depth Comparison of Three AI Accelerator Architectures, and Which One You Should Use]]></title>
        <id>https://mirrorfrog.com/en/blog/gpu-vs-npu-vs-tpu</id>
        <link href="https://mirrorfrog.com/en/blog/gpu-vs-npu-vs-tpu"/>
        <updated>2025-06-02T00:00:00.000Z</updated>
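        <summary type="html"><![CDATA[A comprehensive comparison of the three mainstream AI accelerator architectures: GPU, NPU, and TPU. Covers compute, memory, ecosystem, cost, and use cases to help you make the best choice.]]></summary>
        <content type="html"><![CDATA[<p>AI accelerator chips fall into three mainstream architectures: <strong>GPU</strong>, <strong>NPU</strong>, and <strong>TPU</strong>. Add the more recent <strong>LPU</strong> (Language Processing Unit), and many developers struggle to tell them apart.</p>
<p>This article compares them along four dimensions: <strong>architecture design philosophy, ecosystem maturity, real-world performance, and deployment cost</strong>.</p>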
<!-- -->
<hr>
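<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="架构设计理念">Architecture Design Philosophy<a href="https://mirrorfrog.com/en/blog/gpu-vs-npu-vs-tpu#%E6%9E%B6%E6%9E%84%E8%AE%BE%E8%AE%A1%E7%90%86%E5%BF%B5" class="hash-link" aria-label="Direct link to Architecture Design Philosophy" title="Direct link to Architecture Design Philosophy" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="gpu通用-ai-计算平台">GPU: General-Purpose AI Compute Platform<a href="https://mirrorfrog.com/en/blog/gpu-vs-npu-vs-tpu#gpu%E9%80%9A%E7%94%A8-ai-%E8%AE%A1%E7%AE%97%E5%B9%B3%E5%8F%B0" class="hash-link" aria-label="Direct link to GPU: General-Purpose AI Compute Platform" title="Direct link to GPU: General-Purpose AI Compute Platform" translate="no">​</a></h3>
<p>GPUs were originally designed for graphics rendering, but their massively parallel compute capability led NVIDIA to turn them into general-purpose AI accelerators.</p>
<p><strong>Core design</strong>: Many CUDA Cores plus Tensor Cores (dedicated matrix units), serving both AI and general parallel computing.</p>
<p><strong>Representative products</strong>: NVIDIA H100, B200, AMD MI300X</p>
<p><strong>Strengths</strong>: The most versatile option. From training to inference, from LLMs to diffusion models, from scientific computing to graphics rendering, one card handles it all.</p>
<p><strong>Weaknesses</strong>: Optimization for a specific model architecture is less aggressive than on dedicated chips.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="npu端侧-ai-推理专家">NPU: On-Device AI Inference Specialist<a href="https://mirrorfrog.com/en/blog/gpu-vs-npu-vs-tpu#npu%E7%AB%AF%E4%BE%A7-ai-%E6%8E%A8%E7%90%86%E4%B8%93%E5%AE%B6" class="hash-link" aria-label="Direct link to NPU: On-Device AI Inference Specialist" title="Direct link to NPU: On-Device AI Inference Specialist" translate="no">​</a></h3>
<p>NPUs are designed specifically for neural network inference and emphasize <strong>low power, low cost, and high energy efficiency</strong>.</p>
<p><strong>Core design</strong>: Systolic arrays or MAC trees, heavily optimized for convolution and matrix multiplication.</p>
<p><strong>Representative products</strong>: Huawei Ascend 910B, Qualcomm Hexagon, Apple Neural Engine, AMD Ryzen AI NPU</p>
<p><strong>Strengths</strong>: Very high energy efficiency, with far better inference performance than a GPU at the same power; well suited to mobile, edge, and embedded scenarios.</p>
<p><strong>Weaknesses</strong>: Limited flexibility (mainly serves inference), with limited or no training capability; the software ecosystem is highly vendor-dependent.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="tpugoogle-生态的定制加速器">TPU: The Custom Accelerator for the Google Ecosystem<a href="https://mirrorfrog.com/en/blog/gpu-vs-npu-vs-tpu#tpugoogle-%E7%94%9F%E6%80%81%E7%9A%84%E5%AE%9A%E5%88%B6%E5%8A%A0%E9%80%9F%E5%99%A8" class="hash-link" aria-label="Direct link to TPU: The Custom Accelerator for the Google Ecosystem" title="Direct link to TPU: The Custom Accelerator for the Google Ecosystem" translate="no">​</a></h3>
<p>The TPU is an ASIC that Google designed for its TensorFlow/JAX frameworks.</p>
<p><strong>Core design</strong>: Large-scale systolic arrays optimized for matrix multiplication, paired with very high on-package HBM bandwidth.</p>
<p><strong>Representative products</strong>: Google Cloud TPU v5e, v5p</p>
<p><strong>Strengths</strong>: Excellent price-performance for training JAX/TensorFlow models on Google Cloud; TPU v5p cluster interconnect performance is outstanding.</p>
<p><strong>Weaknesses</strong>: Available only on Google Cloud; PyTorch support is incomplete; the hardware is not sold and can only be rented.</p>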
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="性能实测对比">Measured Performance Comparison<a href="https://mirrorfrog.com/en/blog/gpu-vs-npu-vs-tpu#%E6%80%A7%E8%83%BD%E5%AE%9E%E6%B5%8B%E5%AF%B9%E6%AF%94" class="hash-link" aria-label="Direct link to Measured Performance Comparison" title="Direct link to Measured Performance Comparison" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="llm-推理llama-2-70b">LLM Inference (Llama 2 70B)<a href="https://mirrorfrog.com/en/blog/gpu-vs-npu-vs-tpu#llm-%E6%8E%A8%E7%90%86llama-2-70b" class="hash-link" aria-label="Direct link to LLM Inference (Llama 2 70B)" title="Direct link to LLM Inference (Llama 2 70B)" translate="no">​</a></h3>
<table><thead><tr><th>Chip</th><th>Tokens/s</th><th>Power (W)</th><th>Efficiency (tok/s/W)</th></tr></thead><tbody><tr><td>NVIDIA H100 SXM5</td><td>~120 (FP16)</td><td>700</td><td>0.17</td></tr><tr><td>NVIDIA L40S</td><td>~40 (FP16)</td><td>300</td><td>0.13</td></tr><tr><td>Huawei Ascend 910B</td><td>~80 (FP16)</td><td>310</td><td>0.26</td></tr><tr><td>Groq LPU v1</td><td>~330 (FP16)</td><td>300</td><td>1.10</td></tr><tr><td>Google TPU v5e</td><td>~90 (BF16)</td><td>n/a</td><td>n/a</td></tr></tbody></table>
<blockquote>
<p>The Groq LPU has a decisive advantage in LLM inference latency, but only because it gives up flexibility: it can only run Transformer inference.</p>
</blockquote>
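<p>The efficiency column in the table above is tokens per second divided by board power, which is the same as tokens per joule. A minimal sketch that recomputes it from the table's own figures follows; the names <code>RESULTS</code> and <code>EFFICIENCY</code> are illustrative, and the TPU v5e row is left out because the table gives no power figure for it.</p>

```python
# (tokens per second, board power in watts), copied from the table above.
RESULTS = {
    "NVIDIA H100 SXM5": (120, 700),
    "NVIDIA L40S": (40, 300),
    "Huawei Ascend 910B": (80, 310),
    "Groq LPU v1": (330, 300),
}

# tok/s divided by W equals tokens per joule; round to match the table.
EFFICIENCY = {
    name: round(tokens_per_s / watts, 2)
    for name, (tokens_per_s, watts) in RESULTS.items()
}

# Print the chips from most to least efficient.
ranked = sorted(EFFICIENCY.items(), key=lambda item: item[1], reverse=True)
for name, value in ranked:
    print(f"{name}: {value:.2f} tok/s/W")
```

<p>Ranking by this metric puts the Groq LPU first and the L40S last, which matches the table. It says nothing about absolute throughput or cost, so it should be read alongside the other columns.</p>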
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="训练gpt-3-175b-等效">Training (GPT-3 175B Equivalent)<a href="https://mirrorfrog.com/en/blog/gpu-vs-npu-vs-tpu#%E8%AE%AD%E7%BB%83gpt-3-175b-%E7%AD%89%E6%95%88" class="hash-link" aria-label="Direct link to Training (GPT-3 175B Equivalent)" title="Direct link to Training (GPT-3 175B Equivalent)" translate="no">​</a></h3>
<table><thead><tr><th>Chip configuration</th><th>Training time</th><th>Estimated cost</th></tr></thead><tbody><tr><td>8× H100 SXM5</td><td>~1.1 days</td><td>~$25,000/day</td></tr><tr><td>8× Ascend 910B</td><td>~1.5 days (vendor figure)</td><td>Quote required</td></tr><tr><td>8× TPU v5p</td><td>~1.0 day</td><td>Rental only</td></tr><tr><td>8× AMD MI300X</td><td>~1.3 days</td><td>~$15,000/day</td></tr></tbody></table>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="生态成熟度对比">Ecosystem Maturity Comparison<a href="https://mirrorfrog.com/en/blog/gpu-vs-npu-vs-tpu#%E7%94%9F%E6%80%81%E6%88%90%E7%86%9F%E5%BA%A6%E5%AF%B9%E6%AF%94" class="hash-link" aria-label="Direct link to Ecosystem Maturity Comparison" title="Direct link to Ecosystem Maturity Comparison" translate="no">​</a></h2>
<table><thead><tr><th>Dimension</th><th>GPU (NVIDIA)</th><th>NPU (Ascend)</th><th>TPU (Google)</th></tr></thead><tbody><tr><td>PyTorch support</td><td>✅ Native</td><td>⚠️ torch_npu</td><td>❌ Requires JAX</td></tr><tr><td>TensorFlow support</td><td>✅ Native</td><td>⚠️ Adaptation in progress</td><td>✅ Native</td></tr><tr><td>vLLM inference</td><td>✅ Best</td><td>⚠️ Community edition</td><td>❌</td></tr><tr><td>Hugging Face</td><td>✅ Native</td><td>⚠️ Partial</td><td>❌</td></tr><tr><td>Docker containerization</td><td>✅ NGC containers</td><td>⚠️ Ascend containers</td><td>❌</td></tr><tr><td>Community/docs</td><td>⭐⭐⭐⭐⭐</td><td>⭐⭐⭐</td><td>⭐⭐⭐</td></tr><tr><td>Third-party tools</td><td>Very rich</td><td>Limited</td><td>GCP only</td></tr></tbody></table>
<blockquote>
<p><strong>Conclusion</strong>: NVIDIA's GPU software ecosystem is a very deep moat, and raw hardware performance alone cannot easily cross it.</p>
</blockquote>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="成本对比以-1-年-tco-估算">Cost Comparison (Estimated 1-Year TCO)<a href="https://mirrorfrog.com/en/blog/gpu-vs-npu-vs-tpu#%E6%88%90%E6%9C%AC%E5%AF%B9%E6%AF%94%E4%BB%A5-1-%E5%B9%B4-tco-%E4%BC%B0%E7%AE%97" class="hash-link" aria-label="Direct link to Cost Comparison (Estimated 1-Year TCO)" title="Direct link to Cost Comparison (Estimated 1-Year TCO)" translate="no">​</a></h2>
<table><thead><tr><th>Option</th><th>Hardware/rental cost</th><th>Operations cost</th><th>Development and migration cost</th><th>Verdict</th></tr></thead><tbody><tr><td>4× H100 SXM5 self-hosted</td><td>~$140,000</td><td>High</td><td>Low</td><td>Safest choice</td></tr><tr><td>4× Ascend 910B self-hosted</td><td>~$80,000-120,000</td><td>Medium</td><td>Medium-high</td><td>First choice for domestic compliance</td></tr><tr><td>TPU v5p in the cloud</td><td>Pay-as-you-go</td><td>Low</td><td>High (requires migrating to JAX)</td><td>Limited to the GCP ecosystem</td></tr><tr><td>8× L40S self-hosted</td><td>~$60,000</td><td>Medium</td><td>Low</td><td>Balanced price-performance</td></tr></tbody></table>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="什么时候选什么">When to Choose What<a href="https://mirrorfrog.com/en/blog/gpu-vs-npu-vs-tpu#%E4%BB%80%E4%B9%88%E6%97%B6%E5%80%99%E9%80%89%E4%BB%80%E4%B9%88" class="hash-link" aria-label="Direct link to When to Choose What" title="Direct link to When to Choose What" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="-选-gpunvidia">✅ Choose a GPU (NVIDIA)<a href="https://mirrorfrog.com/en/blog/gpu-vs-npu-vs-tpu#-%E9%80%89-gpunvidia" class="hash-link" aria-label="Direct link to ✅ Choose a GPU (NVIDIA)" title="Direct link to ✅ Choose a GPU (NVIDIA)" translate="no">​</a></h3>
<p>Unless you have a very specific reason not to, <strong>choose a GPU by default</strong>. The reason is simple: ecosystem.</p>
<ul>
<li class="">You use PyTorch/TensorFlow/JAX (all of which support CUDA natively)</li>
<li class="">You need to do both training and inference</li>
<li class="">You want thorough community documentation so that you can search for answers when problems arise</li>
<li class="">You need flexible deployment options (on-premises/cloud/edge)</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="-选-npu昇腾端侧-npu">✅ Choose an NPU (Ascend / On-Device NPU)<a href="https://mirrorfrog.com/en/blog/gpu-vs-npu-vs-tpu#-%E9%80%89-npu%E6%98%87%E8%85%BE%E7%AB%AF%E4%BE%A7-npu" class="hash-link" aria-label="Direct link to ✅ Choose an NPU (Ascend / On-Device NPU)" title="Direct link to ✅ Choose an NPU (Ascend / On-Device NPU)" translate="no">​</a></h3>
<ul>
<li class=""><strong>You are a Chinese government or enterprise customer</strong>: with domestic sourcing requirements, Ascend 910B is the most mature domestic training option</li>
<li class=""><strong>You are building on-device AI</strong>: phone NPUs (Apple/Qualcomm) or PC NPUs (AMD Ryzen AI) are the most energy-efficient choice</li>
<li class=""><strong>You need ultra-low-power inference</strong>: standalone NPUs (Hailo-8L) use 5-10× less power than a GPU in edge scenarios</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="-选-tpugoogle-cloud">✅ Choose a TPU (Google Cloud)<a href="https://mirrorfrog.com/en/blog/gpu-vs-npu-vs-tpu#-%E9%80%89-tpugoogle-cloud" class="hash-link" aria-label="Direct link to ✅ Choose a TPU (Google Cloud)" title="Direct link to ✅ Choose a TPU (Google Cloud)" translate="no">​</a></h3>
<ul>
<li class=""><strong>You are already a heavy Google Cloud user</strong></li>
<li class=""><strong>Your model is developed in JAX</strong> (or you are willing to migrate to JAX)</li>
<li class=""><strong>You need large-scale TPU clusters</strong> (TPU v5p has a significant cluster interconnect advantage)</li>
<li class=""><strong>You do not mind being locked into GCP</strong></li>
</ul>
<hr>
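<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="未来趋势">Future Trends<a href="https://mirrorfrog.com/en/blog/gpu-vs-npu-vs-tpu#%E6%9C%AA%E6%9D%A5%E8%B6%8B%E5%8A%BF" class="hash-link" aria-label="Direct link to Future Trends" title="Direct link to Future Trends" translate="no">​</a></h2>
<ul>
<li class=""><strong>Heterogeneous computing becomes the norm</strong>: high-end AI clusters will combine GPUs, NPUs, and CPUs working together</li>
<li class=""><strong>Architectural convergence</strong>: NVIDIA is adding more dedicated AI units (Transformer Engine) to its GPUs, while NPUs are adding general-purpose compute capability</li>
<li class=""><strong>The software ecosystem decides the outcome</strong>: over the next 3 years, whether AMD and Huawei can challenge NVIDIA hinges not on hardware compute but on CUDA compatibility and developer experience</li>
<li class=""><strong>Rise of inference-specific chips</strong>: Groq LPU, Cerebras WSE, Etched Sohu, and other AI-specific architectures are reshaping the inference performance/cost curve</li>
</ul>
<hr>
<p><em>On MirrorFrog you can find driver downloads, developer documentation, and detailed specifications for all of the chips above.</em></p>]]></content>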
        <author>
            <name>AI Compute Cards Wiki Editorial</name>
            <uri>https://github.com/anomalyco/mirrorfrog</uri>
        </author>
        <category label="GPU" term="GPU"/>
        <category label="NPU" term="NPU"/>
        <category label="TPU" term="TPU"/>
        <category label="架构对比" term="架构对比"/>
        <category label="深度学习" term="深度学习"/>
        <category label="推理" term="推理"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[AI Accelerator Selection Guide 2025: From Training to Inference, How to Choose the Right Chip]]></title>
        <id>https://mirrorfrog.com/en/blog/ai-accelerator-guide-2025</id>
        <link href="https://mirrorfrog.com/en/blog/ai-accelerator-guide-2025"/>
        <updated>2025-06-01T00:00:00.000Z</updated>
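        <summary type="html"><![CDATA[A complete guide to choosing an AI accelerator in 2025. Covers mainstream chip architectures including NVIDIA GPUs, Huawei Ascend, Google TPU, and Groq LPU, and analyzes the best choice for training, inference, and on-device deployment.]]></summary>
        <content type="html"><![CDATA[<p>The AI accelerator market is richer in 2025 than ever before. From NVIDIA's Blackwell to Huawei's Ascend 910B, from Google's TPU v6 to Groq's LPU, developers face more choices than at any time in the past.</p>
<p>That is both good news and a challenge: pick the wrong card and you either overspend or fall short on performance.</p>
<p>This article starts from <strong>real workloads</strong> and walks through the selection logic.</p>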
<!-- -->
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="第一步先搞清楚你的场景">Step 1: Identify Your Scenario<a href="https://mirrorfrog.com/en/blog/ai-accelerator-guide-2025#%E7%AC%AC%E4%B8%80%E6%AD%A5%E5%85%88%E6%90%9E%E6%B8%85%E6%A5%9A%E4%BD%A0%E7%9A%84%E5%9C%BA%E6%99%AF" class="hash-link" aria-label="Direct link to Step 1: Identify Your Scenario" title="Direct link to Step 1: Identify Your Scenario" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="️-训练training">🏋️ Training<a href="https://mirrorfrog.com/en/blog/ai-accelerator-guide-2025#%EF%B8%8F-%E8%AE%AD%E7%BB%83training" class="hash-link" aria-label="Direct link to 🏋️ Training" title="Direct link to 🏋️ Training" translate="no">​</a></h3>
<p>Training is the most demanding scenario in chip selection. You need:</p>
<ul>
<li class=""><strong>High FP8/FP16 compute</strong>: training is dominated by matrix multiplication, so Tensor Core throughput largely determines speed</li>
<li class=""><strong>Large memory</strong>: model parameters, gradients, and optimizer states must all reside in device memory. The FP16/BF16 weights of Llama 3 70B alone take about 140 GB, and gradients and optimizer states add several times more</li>
<li class=""><strong>High-bandwidth interconnect</strong>: in multi-card training, inter-card communication bandwidth determines scaling efficiency</li>
<li class=""><strong>Software ecosystem</strong>: whether PyTorch/TensorFlow/JAX are natively supported</li>
</ul>
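<p>As a rough check on the memory requirement above, the sketch below estimates weight and training-state memory for a 70B-parameter model. It assumes the common mixed-precision Adam accounting of 16 bytes per parameter (2 for FP16 weights, 2 for FP16 gradients, 12 for FP32 master weights plus the two Adam moments) and ignores activations entirely; the function names are hypothetical.</p>

```python
GB = 1e9  # decimal gigabytes


def weight_memory_gb(params, bytes_per_param):
    """Memory needed just to hold the weights at a given precision."""
    return params * bytes_per_param / GB


def mixed_precision_adam_gb(params):
    """Weights + gradients + optimizer state, excluding activations.

    Assumed accounting: 2 B FP16 weights, 2 B FP16 gradients, and
    12 B of FP32 state (master weights, momentum, variance).
    """
    return params * (2 + 2 + 12) / GB


PARAMS_70B = 70e9

FP16_WEIGHTS_GB = weight_memory_gb(PARAMS_70B, 2)
FP32_WEIGHTS_GB = weight_memory_gb(PARAMS_70B, 4)
TRAINING_STATE_GB = mixed_precision_adam_gb(PARAMS_70B)

print(f"FP16 weights:        {FP16_WEIGHTS_GB:.0f} GB")
print(f"FP32 weights:        {FP32_WEIGHTS_GB:.0f} GB")
print(f"Adam training state: {TRAINING_STATE_GB:.0f} GB")
```

<p>Under these assumptions the weights alone need 140 GB in FP16, while the full training state is on the order of 1.1 TB, which is why 70B-class training is sharded across many cards rather than fitted onto one.</p>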
<p><strong>First choice</strong>: NVIDIA H100/H200/B200 (the most mature ecosystem, bar none)</p>
<p><strong>Domestic alternative</strong>: Huawei Ascend 910B (torch_npu is well adapted, but an ecosystem gap remains)</p>
<p><strong>Budget-sensitive</strong>: AMD ROCm (MI300X offers strong price-performance, but framework support is somewhat weaker)</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="-推理inference">⚡ Inference<a href="https://mirrorfrog.com/en/blog/ai-accelerator-guide-2025#-%E6%8E%A8%E7%90%86inference" class="hash-link" aria-label="Direct link to ⚡ Inference" title="Direct link to ⚡ Inference" translate="no">​</a></h3>
<p>Inference splits into two sub-scenarios:</p>
<p><strong>Online inference (latency-sensitive)</strong></p>
<p>This requires very low time to first token (TTFT) and stable throughput. Good fits:</p>
<ul>
<li class=""><strong>NVIDIA L40S / L4</strong>: inference-optimized Ada Lovelace architecture with FP8 support and good price-performance</li>
<li class=""><strong>Groq LPU</strong>: if you can use GroqCloud, the LPU's deterministic latency (800+ tok/s on Llama 3 8B) is its killer feature</li>
<li class=""><strong>Google Cloud TPU v5e</strong>: a low-latency option for deploying JAX models in the cloud</li>
</ul>
<p><strong>Offline batch inference (throughput first)</strong></p>
<p>Per-request latency does not matter here; only total throughput and cost do:</p>
<ul>
<li class=""><strong>NVIDIA H200</strong>: large memory (141GB HBM3e) allows larger batches, raising overall throughput</li>
<li class=""><strong>Intel Gaudi 3</strong>: better price-performance than same-generation NVIDIA parts, suited to budget-sensitive batch workloads</li>
<li class=""><strong>Cerebras WSE-3</strong>: a wafer-scale chip that can run large models on a single chip, avoiding distributed communication overhead</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="-端侧推理edge">📱 On-Device Inference (Edge)<a href="https://mirrorfrog.com/en/blog/ai-accelerator-guide-2025#-%E7%AB%AF%E4%BE%A7%E6%8E%A8%E7%90%86edge" class="hash-link" aria-label="Direct link to 📱 On-Device Inference (Edge)" title="Direct link to 📱 On-Device Inference (Edge)" translate="no">​</a></h3>
<ul>
<li class=""><strong>Qualcomm Hexagon NPU</strong>: the first choice for AI inference on Android, with INT8 quantization support</li>
<li class=""><strong>Apple Neural Engine</strong>: Core ML acceleration on iPhone/Mac</li>
<li class=""><strong>AMD Ryzen AI NPU</strong> (XDNA): the on-device NPU integrated in the Ryzen 7040/8040 series, suited to PC AI applications</li>
<li class=""><strong>Hailo-8L</strong>: a standalone NPU for edge devices with good price-performance</li>
</ul>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="第二步看预算">第二步：看预算<a href="https://mirrorfrog.com/en/blog/ai-accelerator-guide-2025#%E7%AC%AC%E4%BA%8C%E6%AD%A5%E7%9C%8B%E9%A2%84%E7%AE%97" class="hash-link" aria-label="Direct link to 第二步：看预算" title="Direct link to 第二步：看预算" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="-不差钱企业级训练集群">🏦 不差钱（企业级训练集群）<a href="https://mirrorfrog.com/en/blog/ai-accelerator-guide-2025#-%E4%B8%8D%E5%B7%AE%E9%92%B1%E4%BC%81%E4%B8%9A%E7%BA%A7%E8%AE%AD%E7%BB%83%E9%9B%86%E7%BE%A4" class="hash-link" aria-label="Direct link to 🏦 不差钱（企业级训练集群）" title="Direct link to 🏦 不差钱（企业级训练集群）" translate="no">​</a></h3>
<table><thead><tr><th>配置</th><th>预估成本</th><th>适合</th></tr></thead><tbody><tr><td>8× H100 SXM5 (80GB)</td><td>$200,000-280,000</td><td>大模型训练首选</td></tr><tr><td>8× H200 SXM (141GB)</td><td>$240,000-320,000</td><td>需要更大显存的训练</td></tr><tr><td>8× B200 SXM</td><td>$240,000-360,000</td><td>Blackwell 最新架构</td></tr><tr><td>GB200 NVL (2 GPU + Grace)</td><td>$60,000-80,000/套</td><td>超级芯片方案</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="-性价比优先训练推理">💰 性价比优先（训练+推理）<a href="https://mirrorfrog.com/en/blog/ai-accelerator-guide-2025#-%E6%80%A7%E4%BB%B7%E6%AF%94%E4%BC%98%E5%85%88%E8%AE%AD%E7%BB%83%E6%8E%A8%E7%90%86" class="hash-link" aria-label="Direct link to 💰 性价比优先（训练+推理）" title="Direct link to 💰 性价比优先（训练+推理）" translate="no">​</a></h3>
<table><thead><tr><th>配置</th><th>预估成本</th><th>适合</th></tr></thead><tbody><tr><td>4× L40S (48GB)</td><td>$30,000-40,000</td><td>中小规模训练+推理</td></tr><tr><td>8× L4 (24GB)</td><td>$24,000-36,000</td><td>轻量训练，推理为主</td></tr><tr><td>8× A100 80GB (二手)</td><td>$80,000-120,000</td><td>成熟方案，二手市场充足</td></tr><tr><td>AMD MI300X × 8</td><td>~$100,000-150,000</td><td>如果软件栈适配到位</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="-免费低成本学习实验">🆓 Free / low cost (learning + experiments)<a href="https://mirrorfrog.com/en/blog/ai-accelerator-guide-2025#-%E5%85%8D%E8%B4%B9%E4%BD%8E%E6%88%90%E6%9C%AC%E5%AD%A6%E4%B9%A0%E5%AE%9E%E9%AA%8C" class="hash-link" aria-label="Direct link to 🆓 Free / low cost (learning + experiments)" title="Direct link to 🆓 Free / low cost (learning + experiments)" translate="no">​</a></h3>
<table><thead><tr><th>Option</th><th>Cost</th><th>Best for</th></tr></thead><tbody><tr><td>GroqCloud API</td><td>Free tier</td><td>LLM inference experiments</td></tr><tr><td>Google Colab (T4)</td><td>From $10/month</td><td>Small-scale experiments</td></tr><tr><td>Hugging Face Spaces</td><td>Free</td><td>Demo deployment</td></tr><tr><td>Oracle OCI (A100)</td><td>Pay as you go</td><td>A flexible experimentation environment</td></tr></tbody></table>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="第三步生态兼容性">Step 3: Ecosystem compatibility<a href="https://mirrorfrog.com/en/blog/ai-accelerator-guide-2025#%E7%AC%AC%E4%B8%89%E6%AD%A5%E7%94%9F%E6%80%81%E5%85%BC%E5%AE%B9%E6%80%A7" class="hash-link" aria-label="Direct link to Step 3: Ecosystem compatibility" title="Direct link to Step 3: Ecosystem compatibility" translate="no">​</a></h2>
<p>Even the best hardware is wasted if the software does not support it. The matrices below show how well today's mainstream frameworks support each chip:</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="pytorch-生态">PyTorch ecosystem<a href="https://mirrorfrog.com/en/blog/ai-accelerator-guide-2025#pytorch-%E7%94%9F%E6%80%81" class="hash-link" aria-label="Direct link to PyTorch ecosystem" title="Direct link to PyTorch ecosystem" translate="no">​</a></h3>
<table><thead><tr><th>Chip</th><th>Support status</th><th>Notes</th></tr></thead><tbody><tr><td>NVIDIA CUDA</td><td>✅ Native support</td><td>CUDA is the default backend of the official PyTorch distribution</td></tr><tr><td>AMD ROCm</td><td>✅ Official support</td><td>PyTorch ships prebuilt ROCm packages</td></tr><tr><td>Huawei Ascend</td><td>⚠️ Via torch_npu</td><td>The API mirrors CUDA, so migration cost is low, but community resources lag behind CUDA</td></tr><tr><td>Apple Silicon</td><td>✅ MPS backend</td><td>M1/M2/M3/M4 series GPUs, supported through the PyTorch MPS backend</td></tr><tr><td>Intel GPU</td><td>⚠️ XPU backend</td><td>oneAPI supports PyTorch, but maturity is limited</td></tr><tr><td>Google TPU</td><td>⚠️ Via PyTorch/XLA</td><td>PyTorch can run on TPU through PyTorch/XLA (PJRT runtime), but JAX is the mainstream choice</td></tr></tbody></table>
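<p>In practice, the backends in the table show up as different device strings in PyTorch. The following is a minimal sketch, assuming a reasonably recent PyTorch build, of how a script might probe for CUDA, XPU, and MPS and fall back to CPU. The preference order and the helper names <code>probe_backends</code> and <code>pick_device</code> are hypothetical choices for illustration. Ascend (torch_npu) and TPU (PyTorch/XLA) need extra packages and are deliberately not covered here.</p>

```python
import importlib.util

# Preference order is an assumption for illustration, not a PyTorch rule.
PREFERENCE = ("cuda", "xpu", "mps", "cpu")


def probe_backends():
    """Return {backend_name: bool} describing what this machine offers."""
    found = {"cuda": False, "xpu": False, "mps": False, "cpu": True}
    if importlib.util.find_spec("torch") is None:
        return found  # PyTorch is not installed: report CPU only
    import torch

    # ROCm builds of PyTorch also report through torch.cuda.
    found["cuda"] = torch.cuda.is_available()
    # torch.xpu only exists in newer builds, so look it up defensively.
    xpu = getattr(torch, "xpu", None)
    found["xpu"] = bool(xpu and xpu.is_available())
    mps = getattr(torch.backends, "mps", None)
    found["mps"] = bool(mps and mps.is_available())
    return found


def pick_device(found, preference=PREFERENCE):
    """Return the first backend in `preference` that is marked available."""
    for name in preference:
        if found.get(name):
            return name
    return "cpu"


if __name__ == "__main__":
    print("selected device:", pick_device(probe_backends()))
```

<p>Keeping the probe separate from the selection logic means the choice can be exercised without any accelerator present, for example <code>pick_device({"mps": True, "cpu": True})</code> returns <code>"mps"</code>.</p>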
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="llm-推理框架">LLM inference frameworks<a href="https://mirrorfrog.com/en/blog/ai-accelerator-guide-2025#llm-%E6%8E%A8%E7%90%86%E6%A1%86%E6%9E%B6" class="hash-link" aria-label="Direct link to LLM inference frameworks" title="Direct link to LLM inference frameworks" translate="no">​</a></h3>
<table><thead><tr><th>Chip</th><th>vLLM</th><th>TensorRT-LLM</th><th>llama.cpp</th></tr></thead><tbody><tr><td>NVIDIA</td><td>✅ Best</td><td>✅ Strongest optimization</td><td>✅</td></tr><tr><td>AMD ROCm</td><td>✅</td><td>❌</td><td>✅</td></tr><tr><td>Huawei Ascend</td><td>⚠️ Community version</td><td>❌</td><td>⚠️</td></tr><tr><td>Apple Silicon</td><td>❌</td><td>❌</td><td>✅ Native</td></tr><tr><td>Intel GPU</td><td>❌</td><td>❌</td><td>✅</td></tr></tbody></table>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="按模型规模的推荐矩阵">Recommendation matrix by model size<a href="https://mirrorfrog.com/en/blog/ai-accelerator-guide-2025#%E6%8C%89%E6%A8%A1%E5%9E%8B%E8%A7%84%E6%A8%A1%E7%9A%84%E6%8E%A8%E8%8D%90%E7%9F%A9%E9%98%B5" class="hash-link" aria-label="Direct link to Recommendation matrix by model size" title="Direct link to Recommendation matrix by model size" translate="no">​</a></h2>
<table><thead><tr><th>Model size</th><th>Recommended for training</th><th>Recommended for inference</th></tr></thead><tbody><tr><td>&lt; 7B (small)</td><td>L4 / L40S / A100</td><td>L4 / L40S / T4 / Groq LPU</td></tr><tr><td>7B - 70B (medium)</td><td>4-8× H100 / A100 / Ascend 910B</td><td>H200 / L40S / Groq LPU</td></tr><tr><td>70B - 405B (large)</td><td>8-32× H100/B200 / Ascend 910B</td><td>H200 (141GB) / Cerebras WSE</td></tr><tr><td>&gt; 405B (very large)</td><td>GB200 NVL / DGX super-clusters</td><td>Large-memory H200/B200 clusters</td></tr></tbody></table>
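<p>The size bands above track how much memory the model weights need. As a rough sketch, weight memory is the parameter count times the bytes per parameter, so a 70B model in FP16 needs about 140 GB for weights alone. The <code>headroom</code> factor below is a hypothetical allowance for KV cache and activations, not a measured value, and the estimate applies to inference only: training also has to hold gradients and optimizer state, which takes considerably more.</p>

```python
import math

# Bytes per parameter for common weight formats (standard sizes).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params_billion, dtype="fp16"):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[dtype]


def min_gpus(params_billion, gpu_gb, dtype="fp16", headroom=1.2):
    """Smallest GPU count whose combined memory holds weights * headroom.

    `headroom` is a hypothetical allowance for KV cache and activations.
    """
    needed = weight_gb(params_billion, dtype) * headroom
    return math.ceil(needed / gpu_gb)


if __name__ == "__main__":
    for size in (7, 70, 405):
        print(f"{size}B fp16: {weight_gb(size)} GB of weights, "
              f"at least {min_gpus(size, 141)} x 141GB GPUs")
```

<p>Under these assumptions a 70B FP16 model does not fit comfortably on a single 141GB card once headroom is included, and a 405B model needs at least seven. Quantizing to INT4 cuts the weight footprint to a quarter of FP16.</p>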
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="总结一张图看懂选型逻辑">Summary: the selection logic in one diagram<a href="https://mirrorfrog.com/en/blog/ai-accelerator-guide-2025#%E6%80%BB%E7%BB%93%E4%B8%80%E5%BC%A0%E5%9B%BE%E7%9C%8B%E6%87%82%E9%80%89%E5%9E%8B%E9%80%BB%E8%BE%91" class="hash-link" aria-label="Direct link to Summary: the selection logic in one diagram" title="Direct link to Summary: the selection logic in one diagram" translate="no">​</a></h2>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">What is your scenario?</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">├─ Training large models → NVIDIA CUDA (best ecosystem) → Budget allows? H100/B200; need a Chinese domestic alternative? Ascend 910B</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">├─ Training mid-size models → A100 / L40S / AMD MI300X all work</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">├─ Online inference (low latency)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">│  ├─ Self-hosted → L40S / L4 / H200</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">│  └─ API → GroqCloud (the LLM inference latency leader)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">├─ Batch inference (high throughput)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">│  ├─ NVIDIA H200 (large memory, high throughput)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">│  └─ Intel Gaudi 3 / Cerebras (value route)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">├─ On-device inference</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">│  ├─ Phone → Qualcomm / Apple NPU</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">│  ├─ PC → AMD Ryzen AI NPU</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">│  └─ Edge devices → Hailo-8L / NVIDIA Jetson</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">└─ Learning and experiments → cloud T4 / L4 / GroqCloud free tier</span><br></div></code></pre></div></div>
<hr>
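<p><em>This site lists driver downloads and developer documentation links for the vast majority of the chips mentioned above. Feel free to browse by category.</em></p>]]></content>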
        <author>
            <name>AI Compute Cards Wiki Editorial</name>
            <uri>https://github.com/anomalyco/mirrorfrog</uri>
        </author>
        <category label="Selection guide" term="选型指南"/>
        <category label="GPU" term="GPU"/>
        <category label="NPU" term="NPU"/>
        <category label="TPU" term="TPU"/>
        <category label="ASIC" term="ASIC"/>
        <category label="Deep learning" term="深度学习"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[MirrorFrog update: new blog and glossary, with greatly expanded content]]></title>
        <id>https://mirrorfrog.com/en/blog/new-blog-and-reference</id>
        <link href="https://mirrorfrog.com/en/blog/new-blog-and-reference"/>
        <updated>2025-06-01T00:00:00.000Z</updated>
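        <summary type="html"><![CDATA[MirrorFrog site upgrade: a new changelog blog, an AI accelerator chip glossary, and 3 in-depth articles, giving the site's 58 chips more comprehensive coverage.]]></summary>
        <content type="html"><![CDATA[<p>What is in this update:</p>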
<!-- -->
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-更新日志博客">📝 Changelog (blog)<a href="https://mirrorfrog.com/en/blog/new-blog-and-reference#-%E6%9B%B4%E6%96%B0%E6%97%A5%E5%BF%97%E5%8D%9A%E5%AE%A2" class="hash-link" aria-label="Direct link to 📝 Changelog (blog)" title="Direct link to 📝 Changelog (blog)" translate="no">​</a></h2>
<p>We have added a blog section, and all future site updates will be published here. Subscribe to the RSS feed to be notified as soon as an update is posted.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-参考专区">📖 Reference section<a href="https://mirrorfrog.com/en/blog/new-blog-and-reference#-%E5%8F%82%E8%80%83%E4%B8%93%E5%8C%BA" class="hash-link" aria-label="Direct link to 📖 Reference section" title="Direct link to 📖 Reference section" translate="no">​</a></h2>
<p>We have added a <strong>glossary</strong> page that explains 50+ common terms, including TFLOPS, TOPS, HBM, NVLink, Tensor Core, and Transformer Engine, to help newcomers get up to speed quickly.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-新增深度文章">📚 New in-depth articles<a href="https://mirrorfrog.com/en/blog/new-blog-and-reference#-%E6%96%B0%E5%A2%9E%E6%B7%B1%E5%BA%A6%E6%96%87%E7%AB%A0" class="hash-link" aria-label="Direct link to 📚 New in-depth articles" title="Direct link to 📚 New in-depth articles" translate="no">​</a></h2>
<p>We have published three in-depth feature articles written for AI developers:</p>
<ol>
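<li class=""><strong><a class="" href="https://mirrorfrog.com/en/blog/ai-accelerator-guide-2025">AI Accelerator Selection Guide 2025</a></strong>: starts from scenarios such as training, inference, and on-device deployment to help you find the accelerator that fits best</li>
<li class=""><strong><a class="" href="https://mirrorfrog.com/en/blog/gpu-vs-npu-vs-tpu">GPU vs NPU vs TPU: How to Choose Among Three Architectures</a></strong>: a full comparison of architecture principles, measured performance, ecosystem maturity, and TCO</li>
<li class=""><strong><a class="" href="https://mirrorfrog.com/en/blog/china-ai-chip-landscape-2025">China's Domestic AI Chip Landscape 2025</a></strong>: a full survey of domestic chips such as Ascend, Cambricon, and Hygon</li>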
</ol>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-技术优化">🔧 Technical improvements<a href="https://mirrorfrog.com/en/blog/new-blog-and-reference#-%E6%8A%80%E6%9C%AF%E4%BC%98%E5%8C%96" class="hash-link" aria-label="Direct link to 🔧 Technical improvements" title="Direct link to 🔧 Technical improvements" translate="no">​</a></h2>
<ul>
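<li class="">Enabled the blog RSS feed so that search engines discover new posts faster</li>
<li class="">Added an enhanced sitemap configuration</li>
<li class="">Expanded the descriptions on all category index pages</li>
<li class="">Prepared the base configuration for an English version</li>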
</ul>
<p>Feedback is welcome via <a href="https://github.com/MirrorFrog/mirrorfrog.github.io" target="_blank" rel="noopener noreferrer" class="">GitHub</a>!</p>]]></content>
        <author>
            <name>AI Compute Cards Wiki Editorial</name>
            <uri>https://github.com/anomalyco/mirrorfrog</uri>
        </author>
        <category label="Announcements" term="公告"/>
        <category label="Site updates" term="站点更新"/>
        <category label="New features" term="新功能"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[MirrorFrog site launch: a directory of AI accelerator chip drivers and documentation]]></title>
        <id>https://mirrorfrog.com/en/blog/site-launch</id>
        <link href="https://mirrorfrog.com/en/blog/site-launch"/>
        <updated>2025-06-01T00:00:00.000Z</updated>
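        <summary type="html"><![CDATA[MirrorFrog is now live! It is an open-source directory of drivers and documentation for AI accelerator chips.]]></summary>
        <content type="html"><![CDATA[<p><strong>MirrorFrog</strong> is now live! It is an open-source directory of drivers and documentation for AI accelerator chips.</p>
<p>The site currently covers:</p>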
<ul>
<li class=""><strong>GPU</strong>: 13 entries, including NVIDIA CUDA, AMD ROCm, Intel GPU, Apple Silicon, Moore Threads, Biren, and others</li>
<li class=""><strong>NPU</strong>: 9 entries, including Huawei Ascend, AMD Ryzen AI, Qualcomm Hexagon, Apple Neural Engine, and others</li>
<li class=""><strong>TPU</strong>: 2 entries (Google Cloud TPU, Coral Edge TPU)</li>
<li class=""><strong>LPU</strong>: 1 entry (Groq LPU)</li>
<li class=""><strong>IPU</strong>: 1 entry (Graphcore IPU)</li>
<li class=""><strong>DPU</strong>: 3 entries (NVIDIA BlueField, Intel IPU, AMD Pensando)</li>
<li class=""><strong>FPGA</strong>: 3 entries (AMD Alveo, Intel FPGA AI, Achronix Speedster)</li>
<li class=""><strong>ASIC</strong>: 16 entries, including Intel Gaudi, Cerebras WSE, Cambricon, Hygon, Enflame, and others</li>
</ul>
<!-- -->
<p>Planned next:</p>
<ul>
<li class="">Keep adding AI accelerator chips</li>
<li class="">Add specifications and performance benchmark data</li>
<li class="">Add side-by-side chip comparison</li>
<li class="">Launch an English version</li>
</ul>
<p>You are welcome to submit new chips or report issues via <a href="https://github.com/MirrorFrog/mirrorfrog.github.io" target="_blank" rel="noopener noreferrer" class="">GitHub</a>.</p>]]></content>
        <author>
            <name>AI Compute Cards Wiki Editorial</name>
            <uri>https://github.com/anomalyco/mirrorfrog</uri>
        </author>
        <category label="Announcements" term="公告"/>
        <category label="Site updates" term="站点更新"/>
    </entry>
</feed>