Inference burn — worked example (ballpark)
Picture a B2B SaaS support agent: four hundred conversations a day, eight turns average, ~750 tokens in and ~450 tokens out per turn (rough — your prompts vary). That is about 3,200 turns/day × 1,200 tokens ≈ 3.8M tokens/day → ~115M tokens/month.
Pricing moves whenever providers ship new models — treat numbers below as order-of-magnitude, not accounting truth. At an illustrative blended rate of roughly $4 per million tokens for a frontier-class model (mix of input and output), ~115M tokens works out to about $460 a month in raw API spend — before retrieval, before redundancy, before human review of edge cases.
Now double your estimate for retries, evaluation runs, and staging environments. Then add 30% ego margin because marketing will ask for “smarter answers” (longer outputs) after launch. That ~$460 baseline is now roughly $1,200 a month before you've shipped a single improvement.
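If you want to poke at the assumptions, the whole estimate fits in a few lines. This sketch just re-runs the arithmetic above; every constant (token counts, the $4/M blended rate, the multipliers) is a placeholder you should replace with your own numbers.

```python
# Back-of-envelope inference cost estimate for the support-agent example above.
# All constants are illustrative assumptions, not real pricing.

conversations_per_day = 400
turns_per_conversation = 8
tokens_in_per_turn = 750        # assumed prompt + context tokens per turn
tokens_out_per_turn = 450       # assumed completion tokens per turn
days_per_month = 30

blended_price_per_million = 4.00  # assumed blended $/1M tokens, frontier-class model

retry_eval_multiplier = 2.0       # retries, evaluation runs, staging environments
scope_creep_margin = 1.3          # the 30% "ego margin" for longer outputs

# Volume: turns -> tokens/day -> tokens/month
turns_per_day = conversations_per_day * turns_per_conversation            # 3,200
tokens_per_day = turns_per_day * (tokens_in_per_turn + tokens_out_per_turn)  # ~3.84M
tokens_per_month = tokens_per_day * days_per_month                        # ~115M

# Cost: raw API spend, then the loaded figure with multipliers applied
base_monthly_cost = tokens_per_month / 1_000_000 * blended_price_per_million
loaded_monthly_cost = base_monthly_cost * retry_eval_multiplier * scope_creep_margin

print(f"tokens/month:  {tokens_per_month:,.0f}")          # ~115,200,000
print(f"base API cost: ${base_monthly_cost:,.0f}/month")   # ~$461
print(f"loaded cost:   ${loaded_monthly_cost:,.0f}/month") # ~$1,198
```

Change one input at a time — turns per conversation and output length tend to move the total far more than the per-token rate does.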