
Capacity Desk

Right-sizing GPUs and provisioned throughput

PTU Utilization · Weighted Avg
74% (▲ from 52% in Q3)
GPU Node Util · 14d × Pool
Legend: Idle, 30%, 60%, 85%+
Pools: p4d-ml-1, p4d-ml-2, p5-train, g5-infer, a10g-dev
PTU Recommendations
Model                  Util   Action     Status
Sonnet · Prod          81%    Hold       Hold
Opus · Agents          38%    −2 PTU     Down
Haiku · Triage         76%    Hold       Hold
GPT-4o · RAG           89%    +1 PTU     Up
Llama · Self-hosted    42%    Bin-pack   Down
Capacity Forecast vs. Actual · QPS
Series: Forecast, Actual, Provisioned capacity
Commitment Mix · Last 6 Months
Series: PTU, On-demand, Reserved
Eff. $/req: −24%
Overview

Built a Python capacity model that joins three inputs (historical utilization, the engineering roadmap, and provider pricing) and outputs a recommended commitment mix per model family with sensitivity analysis.

Tools
  • Python
  • Snowflake
  • AWS CUR
  • Kubecost
  • NVIDIA DCGM
  • EKS
  • Anthropic / OpenAI PTU