Capacity Desk

Right-sizing GPUs and provisioned throughput

PTU Utilization · Weighted Avg

GPU Node Util · 14d × Pool (heatmap; pools: p4d-ml-1, p4d-ml-2, p5-train, g5-infer, a10g-dev; scale: Idle, 30%, 60%, 85%+)

PTU Recommendations

Model	Utilization	Action	Status
Sonnet · Prod	81%	Hold	Hold
Opus · Agents	38%	−2 PTU	Down
Haiku · Triage	76%	Hold	Hold
GPT-4o · RAG	89%	+1 PTU	Up
Llama · Self-hosted	42%	Bin-pack	Down
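The recommendations read like a threshold rule on utilization. The sketch below is a minimal version of such a rule and rests on several assumptions. The cut-offs are hypothetical: scale up at 85% or above, scale down below 50%. Any change is sized to bring utilization back to a hypothetical 70% target. The page gives neither the real thresholds nor the current PTU counts, so the `ptus` values in the example are made up for illustration. The `self_hosted` flag, which maps an under-used self-hosted model to bin-packing rather than a PTU cut, is also an assumption.

```python
import math
from dataclasses import dataclass

# Hypothetical policy constants: not taken from the project.
SCALE_UP_AT = 0.85       # utilization at or above this triggers an increase
SCALE_DOWN_BELOW = 0.50  # utilization below this triggers a decrease
TARGET_UTIL = 0.70       # utilization the resized deployment should land at


@dataclass
class Deployment:
    name: str
    utilization: float  # weighted average over the lookback window, 0..1
    ptus: int           # currently provisioned units (hypothetical)
    self_hosted: bool = False


def recommend(d: Deployment) -> tuple[str, str]:
    """Return (action, status) for one deployment under the hypothetical policy."""
    if SCALE_DOWN_BELOW <= d.utilization < SCALE_UP_AT:
        return "Hold", "Hold"
    if d.self_hosted:
        # Self-hosted models run on GPU nodes, not PTUs: consolidate or add nodes.
        if d.utilization < SCALE_DOWN_BELOW:
            return "Bin-pack", "Down"
        return "Add nodes", "Up"
    # Units needed so that current load runs at TARGET_UTIL (never below one unit).
    needed = max(1, math.ceil(d.ptus * d.utilization / TARGET_UTIL))
    delta = needed - d.ptus
    if delta > 0:
        return f"+{delta} PTU", "Up"
    if delta < 0:
        return f"-{-delta} PTU", "Down"
    return "Hold", "Hold"  # already at the one-unit floor


if __name__ == "__main__":
    # Utilization values from the table; PTU counts are hypothetical.
    fleet = [
        Deployment("Sonnet · Prod", 0.81, ptus=10),
        Deployment("Opus · Agents", 0.38, ptus=6),
        Deployment("Haiku · Triage", 0.76, ptus=4),
        Deployment("GPT-4o · RAG", 0.89, ptus=2),
        Deployment("Llama · Self-hosted", 0.42, ptus=0, self_hosted=True),
    ]
    for d in fleet:
        action, status = recommend(d)
        print(f"{d.name}\t{d.utilization:.0%}\t{action}\t{status}")
```

Sizing the change against a target utilization gives the size of a step, such as −2 or +1, as well as its direction. A pure threshold rule would only say "up" or "down".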

Capacity Forecast vs. Actual · QPS (series: Forecast, Actual, Provisioned capacity)

Commitment Mix · Last 6 Months (PTU, On-demand, Reserved)

Eff. $/req: −24%
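The −24% headline is a change in blended cost per request. The sketch below shows that arithmetic. It assumes "effective $/req" means total spend across PTU, reserved, and on-demand capacity divided by the requests served in the same period. The `Period` type and all spend and request figures are hypothetical placeholders, not data from the project.

```python
from dataclasses import dataclass


@dataclass
class Period:
    """Spend and traffic for one billing period (all figures hypothetical)."""
    ptu_spend: float
    reserved_spend: float
    on_demand_spend: float
    requests: int

    def cost_per_request(self) -> float:
        # Blended cost: every commitment type is charged to the same request pool.
        total = self.ptu_spend + self.reserved_spend + self.on_demand_spend
        return total / self.requests


def pct_change(before: float, after: float) -> float:
    """Relative change; negative means cheaper per request."""
    return (after - before) / before


if __name__ == "__main__":
    # Placeholder numbers: an all-on-demand baseline vs. a mixed commitment period.
    baseline = Period(ptu_spend=0, reserved_spend=0, on_demand_spend=100_000, requests=10_000_000)
    current = Period(ptu_spend=60_000, reserved_spend=20_000, on_demand_spend=11_200, requests=12_000_000)
    change = pct_change(baseline.cost_per_request(), current.cost_per_request())
    print(f"Eff. $/req: {current.cost_per_request():.4f} ({change:+.0%})")
```

Dividing total spend by total requests means idle committed capacity still counts against the metric. Under this definition, low PTU utilization shows up directly as a higher effective $/req.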

Overview

Built a Python capacity model that joins three inputs (historical utilization, the engineering roadmap, and provider pricing) and outputs a recommended commitment mix (PTU, reserved, on-demand) for each model family, with sensitivity analysis.
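The sketch below covers the commitment-mix step for a single model family. It uses a deliberately simplified cost model that the page does not spell out. PTUs are a fixed hourly charge per unit of throughput. Any demand above committed capacity spills to on-demand at a per-request price. Reserved capacity, commitment terms, and the roadmap input are left out. All constants, the function names, and the 24-hour QPS profile are hypothetical. The sensitivity analysis re-runs the choice with demand scaled down and up.

```python
import math

# Hypothetical prices and capacities (placeholders, not real provider pricing).
PTU_COST_PER_HOUR = 2.0          # $ per PTU-hour
QPS_PER_PTU = 5.0                # sustained queries/sec one PTU can serve
ON_DEMAND_COST_PER_REQ = 0.0002  # $ per request that spills to on-demand


def period_cost(ptus: int, hourly_qps: list[float]) -> float:
    """Total cost over an hourly demand profile with `ptus` committed units."""
    capacity = ptus * QPS_PER_PTU
    committed = ptus * PTU_COST_PER_HOUR * len(hourly_qps)
    # Requests above committed capacity in each hour go to on-demand.
    overflow_reqs = sum(max(0.0, q - capacity) * 3600 for q in hourly_qps)
    return committed + overflow_reqs * ON_DEMAND_COST_PER_REQ


def best_ptus(hourly_qps: list[float]) -> tuple[int, float]:
    """Brute-force the PTU count (0 up to peak coverage) that minimizes cost."""
    upper = math.ceil(max(hourly_qps) / QPS_PER_PTU)
    candidates = ((k, period_cost(k, hourly_qps)) for k in range(upper + 1))
    return min(candidates, key=lambda kc: kc[1])


def sensitivity(hourly_qps: list[float],
                scales: tuple[float, ...] = (0.8, 1.0, 1.2)) -> dict[float, tuple[int, float]]:
    """Re-optimize the PTU count with demand scaled by each factor."""
    return {s: best_ptus([q * s for q in hourly_qps]) for s in scales}


if __name__ == "__main__":
    # Hypothetical 24-hour profile: overnight, business hours, evening.
    profile = [10.0] * 8 + [30.0] * 8 + [20.0] * 8
    for scale, (k, cost) in sensitivity(profile).items():
        print(f"demand x{scale:.1f}: {k} PTU, ${cost:,.2f}/day")
```

With these placeholder prices, one PTU costs $2.00 per hour. Serving its full hourly capacity on-demand would cost 5 × 3600 × $0.0002 = $3.60. An extra PTU therefore pays off only when its expected utilization exceeds about 2.0 / 3.6 ≈ 56%. The brute-force search applies that break-even implicitly. The sensitivity sweep shows how far the optimal commitment moves if the forecast is off by ±20%.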

Tools

Python
Snowflake
AWS CUR
Kubecost
NVIDIA DCGM
EKS
Anthropic / OpenAI PTU