Capacity Desk
Right-sizing GPUs and provisioned throughput
PTU Utilization · Weighted Avg
GPU Node Util · 14d × Pool (heatmap; scale: Idle · 30% · 60% · 85%+; pools: p4d-ml-1, p4d-ml-2, p5-train, g5-infer, a10g-dev)
PTU Recommendations
| Model | Util | Action | Status |
|---|---|---|---|
| Sonnet · Prod | 81% | Hold | Hold |
| Opus · Agents | 38% | −2 PTU | Down |
| Haiku · Triage | 76% | Hold | Hold |
| GPT-4o · RAG | 89% | +1 PTU | Up |
| Llama · Self-hosted | 42% | Bin-pack | Down |
Capacity Forecast vs. Actual · QPS (series: Forecast · Actual · Provisioned capacity)
Commitment Mix · Last 6 Months (series: PTU · On-demand · Reserved)
Eff. $/req: −24%
Overview
Built a Python capacity model that joins three inputs (historical utilization, the engineering roadmap, and provider pricing) and outputs a recommended commitment mix per model family with sensitivity analysis.
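
As a rough illustration of the idea, and not the model described above, the sketch below picks a committed capacity level for one model family by brute-force cost minimization, then repeats the choice under different growth assumptions as a simple sensitivity analysis. Everything in it is an assumption made for the example: the names `Pricing`, `recommend_commitment`, and `sensitivity`, the reduction of the roadmap to a single growth multiplier, demand expressed in PTU-equivalent units per hour, and a two-price world (committed versus on-demand).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical price points, both per capacity unit per hour."""
    committed_per_unit_hour: float   # paid for every committed unit, used or not
    on_demand_per_unit_hour: float   # paid only for demand above the commitment


def total_cost(demand, committed, pricing):
    """Cost of serving an hourly demand series with a fixed commitment."""
    cost = 0.0
    for d in demand:
        overflow = max(0.0, d - committed)  # spills to on-demand
        cost += committed * pricing.committed_per_unit_hour
        cost += overflow * pricing.on_demand_per_unit_hour
    return cost


def recommend_commitment(history, growth, pricing):
    """Return (committed_units, cost) minimizing cost on projected demand.

    `history` is hourly demand in capacity units; `growth` is an assumed
    multiplier standing in for the roadmap. Ties go to the smaller commitment.
    """
    projected = [d * growth for d in history]
    candidates = range(0, math.ceil(max(projected)) + 1)
    best = min(candidates, key=lambda c: total_cost(projected, c, pricing))
    return best, total_cost(projected, best, pricing)


def sensitivity(history, growth_scenarios, pricing):
    """Recommended commitment under each growth scenario."""
    return {
        g: recommend_commitment(history, g, pricing)[0]
        for g in growth_scenarios
    }
```

Calling `sensitivity(history, [0.5, 1.0, 1.5], pricing)` returns a mapping from each growth scenario to its recommended commitment, which shows how far the recommendation moves if the roadmap assumption is wrong. A real model would add reserved-instance terms, commitment durations, and per-family pricing; this sketch deliberately leaves those out.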
Tools
- Python
- Snowflake
- AWS CUR
- Kubecost
- NVIDIA DCGM
- EKS
- Anthropic / OpenAI PTU