Cost & LLM Strategy
Tiered model routing to minimize cost without sacrificing quality
LLM cost is controllable if you treat it as an engineering problem, not a fixed API bill. Route by complexity, cache aggressively, and push deterministic logic out of the model entirely.
Design Principles
Four rules that govern every inference decision in the system.
Use cheapest viable model
Default to the smallest model that meets accuracy thresholds. Premium models are the exception, not the rule.
Reserve premium for complex reasoning
Only invoke expensive models for nuanced dispute resolution and multi-step policy interpretation.
Reduce tokens aggressively
Structured retrieval, summarization caches, and pre-filtered context keep token counts low without sacrificing relevance.
Push deterministic checks to rules engine
Policy checks that are binary (refund ≤ $X, within Y days) go through a rules engine — no LLM needed.
Model Routing
Three tiers — route by case complexity, pay only for the intelligence you need.
Tier 0: Templates, classification, lightweight summarization
Tier 1: Contextual response drafting + policy retrieval
Tier 2: Nuanced reasoning for complex disputes and edge cases
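The routing decision itself can be a few lines of code. A sketch, assuming a complexity score from an upstream classifier and a dispute flag; the thresholds and model names are illustrative assumptions:

```python
from enum import IntEnum

class Tier(IntEnum):
    TEMPLATE = 0    # templates, classification, lightweight summarization
    DRAFT = 1       # contextual response drafting + policy retrieval
    REASONING = 2   # nuanced reasoning for complex disputes and edge cases

# Hypothetical tier -> model mapping; substitute real model identifiers.
MODEL_BY_TIER = {
    Tier.TEMPLATE: "small-model",
    Tier.DRAFT: "mid-model",
    Tier.REASONING: "premium-model",
}

def route(complexity: float, is_dispute: bool) -> Tier:
    """Route by case complexity; the premium tier is the exception, not the rule."""
    if is_dispute and complexity >= 0.8:
        return Tier.REASONING
    if complexity >= 0.4:
        return Tier.DRAFT
    return Tier.TEMPLATE
```

Because routing is ordinary code, the thresholds can be tuned against accuracy metrics without touching any model.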
Cost at Scale
Projected monthly LLM cost at 1M tickets/month.
Key insight: Tier 2 handles only 5% of tickets but drives ~54% of total LLM cost. Every 1pp reduction in premium-model invocation rate saves ~$540/month at scale.
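The arithmetic behind these figures can be checked directly. A sketch with an illustrative tier mix and per-ticket costs — these exact numbers are assumptions chosen to reproduce the stated ~$5,100/month total, ~54% Tier 2 share, and ~$540 saving per percentage point:

```python
TICKETS_PER_MONTH = 1_000_000

# Illustrative assumptions (not measured values): tier mix and $ per ticket.
MIX = {0: 0.70, 1: 0.25, 2: 0.05}
COST = {0: 0.00085, 1: 0.007, 2: 0.055}

# Monthly spend per tier and in total.
monthly = {t: TICKETS_PER_MONTH * MIX[t] * COST[t] for t in MIX}
total = sum(monthly.values())        # ~= $5,095/month, i.e. ~$0.005/ticket
tier2_share = monthly[2] / total     # ~= 0.54

# Moving 1pp of traffic from Tier 2 down to Tier 0:
saving_per_pp = TICKETS_PER_MONTH * 0.01 * (COST[2] - COST[0])  # ~= $541/month
```

The lever is clear from the spread in per-ticket costs: Tier 2 is roughly 60x the cost of Tier 0 under these assumptions, so small shifts in the mix dominate everything else.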
Cost Controls
Five mechanisms that prevent cost creep as volume scales.
Hard token-budget caps per policy category prevent runaway token spend on low-value cases.
Cap retrieved chunks at 3–5 per query; truncate context to essential history only.
Model output capped at 300 tokens for Tier 0, 600 for Tier 1, 1,200 for Tier 2.
Case summaries generated once and cached — subsequent turns reference the summary, not raw history.
Audits run in batch during low-traffic hours to avoid premium-rate inference costs.
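Two of these controls — per-tier output caps and one-time summary caching — can be sketched together. The cache backend and the summarization/history helpers below are stand-ins (an in-process LRU and stub functions); a production system would use a shared cache and real LLM calls:

```python
from functools import lru_cache

# Per-tier output caps from the controls above.
MAX_OUTPUT_TOKENS = {0: 300, 1: 600, 2: 1_200}

def fetch_history(case_id: str) -> str:
    # Stub for the ticket-store lookup.
    return f"raw history for case {case_id}"

def summarize(text: str) -> str:
    # Stub for the one paid summarization call per case.
    return text[:200]

@lru_cache(maxsize=10_000)
def case_summary(case_id: str) -> str:
    """Generated once per case; subsequent turns hit the cache
    and reference the summary instead of re-sending raw history."""
    return summarize(fetch_history(case_id))

def completion_params(tier: int) -> dict:
    """Request parameters enforcing the per-tier output cap."""
    return {"max_tokens": MAX_OUTPUT_TOKENS[tier]}
```

The cap is applied at request time, so no prompt change or model change can silently exceed the tier budget.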
KPI Dashboard
Five metrics that keep cost visible and actionable for ops and engineering.
Blended LLM cost per case across all tiers, for cases where the copilot was engaged.
End-to-end cost including audit, routing, and any supervisor review time.
Percentage of cases escalated to Tier 2 — target <5%, alert at >8%.
Segmented token usage reveals which markets or verticals drive disproportionate cost.
Pareto chart of cost per case against suggestion acceptance rate — find the efficient frontier.
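The escalation-rate metric above comes with explicit thresholds (target <5%, alert at >8%), so the dashboard check is a small function. A sketch; the status labels are illustrative:

```python
TARGET_RATE = 0.05  # target: <5% of cases escalated to Tier 2
ALERT_RATE = 0.08   # alert ops when escalation exceeds 8%

def escalation_status(tier2_cases: int, total_cases: int) -> str:
    """Classify the Tier 2 escalation rate for the KPI dashboard."""
    rate = tier2_cases / total_cases
    if rate > ALERT_RATE:
        return "alert"
    if rate > TARGET_RATE:
        return "warn"
    return "ok"
```

Keeping the thresholds in code (rather than in a chart annotation) means the same values drive both the dashboard and automated alerting.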
$0.005 per ticket at scale
At 1M tickets/month, tiered routing keeps total LLM cost under $5,100/month. The cost structure is dominated by Tier 2 — every improvement in classifier accuracy that correctly routes cases to lower tiers compounds into significant savings.