Cost & LLM Strategy

Tiered model routing to minimize cost without sacrificing quality

LLM cost is controllable if you treat it as an engineering problem, not a fixed API bill. Route by complexity, cache aggressively, and push deterministic logic out of the model entirely.

Design Principles

Four rules that govern every inference decision in the system.

Use cheapest viable model

Default to the smallest model that meets accuracy thresholds. Premium models are the exception, not the rule.

Reserve premium for complex reasoning

Only invoke expensive models for nuanced dispute resolution and multi-step policy interpretation.

Reduce tokens aggressively

Structured retrieval, summarization caches, and pre-filtered context keep token counts low without sacrificing relevance.

Push deterministic checks to rules engine

Policy checks that are binary (refund ≤ $X, within Y days) go through a rules engine — no LLM needed.

Model Routing

Three tiers — route by case complexity, pay only for the intelligence you need.

Tier 0: Low Complexity
Templates, classification, lightweight summarization
Ticket share ~70% · Avg tokens/case ~900 · Cost/1M tokens $0.60

Tier 1: Medium Complexity
Contextual response drafting + policy retrieval
Ticket share ~25% · Avg tokens/case ~2,200 · Cost/1M tokens $3.00

Tier 2: High Complexity
Nuanced reasoning for complex disputes and edge cases
Ticket share ~5% · Avg tokens/case ~4,500 · Cost/1M tokens $12.00
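The routing itself can be a thin function in front of the tier table. A minimal sketch, assuming a single scalar complexity score in [0, 1] from an upstream classifier — the score, thresholds, and model names are illustrative, not from the source:

```python
# Tier table mirroring the three tiers above; "model" values are
# placeholders, not real model identifiers.
TIERS = {
    0: {"model": "small-tier0",   "cost_per_1m_tokens": 0.60},
    1: {"model": "mid-tier1",     "cost_per_1m_tokens": 3.00},
    2: {"model": "premium-tier2", "cost_per_1m_tokens": 12.00},
}

def route(complexity: float) -> int:
    """Map a classifier's complexity score in [0, 1] to a tier.

    Thresholds are assumptions, chosen so a typical score distribution
    yields roughly the 70 / 25 / 5 split described above.
    """
    if complexity < 0.70:
        return 0
    if complexity < 0.95:
        return 1
    return 2
```

The Tier 2 threshold is the highest-leverage knob here: every case kept out of the premium tier avoids the $12.00/1M-token rate entirely.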

Cost at Scale

Projected monthly LLM cost at 1M tickets/month.

1M tickets/month: ~$5,028/mo total (~$0.005 per ticket)

Tier 0: 700K tickets · 630M tokens · $378 (7.5% of cost)
Tier 1: 250K tickets · 550M tokens · $1,650 (32.8% of cost)
Tier 2: 50K tickets · 225M tokens · $2,700 (53.7% of cost)
Audit: ~30K tickets · ~75M tokens · $300 (6% of cost)
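The projection above is straight arithmetic: tickets × tokens/case × rate per tier. A short sketch that reproduces it (the audit line item is taken directly from the table rather than derived):

```python
# (tickets/month, avg tokens per case, $ per 1M tokens) for each tier.
TIERS = [
    (700_000, 900,   0.60),   # Tier 0: 630M tokens -> $378
    (250_000, 2_200, 3.00),   # Tier 1: 550M tokens -> $1,650
    (50_000,  4_500, 12.00),  # Tier 2: 225M tokens -> $2,700
]
AUDIT_COST = 300.00           # taken as-is from the breakdown above

def projected_monthly_cost() -> float:
    llm = sum(t * tok / 1_000_000 * rate for t, tok, rate in TIERS)
    return llm + AUDIT_COST   # 378 + 1,650 + 2,700 + 300 = 5,028

def savings_per_pp_shift() -> float:
    """Gross Tier 2 saving from shifting 1pp of volume (10K tickets) down."""
    return 10_000 * 4_500 / 1_000_000 * 12.00   # 45M tokens at $12/1M = $540
```

The same arithmetic yields the $540/month figure quoted below the table: 1pp of 1M tickets is 10K tickets, or 45M Tier 2 tokens at $12.00/1M.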

Key insight: Tier 2 handles only 5% of tickets but drives ~54% of total LLM cost. Every one-percentage-point reduction in the premium-model invocation rate saves ~$540/month at this scale.

Cost Controls

Five mechanisms that prevent cost creep as volume scales.

Token budget per case type

Hard caps per policy category prevent runaway token spend on low-value cases.

Max context window + retrieval chunk limits

Cap retrieved chunks at 3–5 per query; truncate context to essential history only.

Response length caps

Model output capped at 300 tokens for Tier 0, 600 for Tier 1, 1,200 for Tier 2.
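The chunk limit and the per-tier output caps can both be enforced at request-build time. A minimal sketch — the request shape is illustrative, not a specific provider's API:

```python
# Guardrails from the controls above: 3-5 retrieved chunks per query and
# per-tier output caps of 300 / 600 / 1,200 tokens.
MAX_CHUNKS = 5
MAX_OUTPUT_TOKENS = {0: 300, 1: 600, 2: 1_200}

def build_request(tier: int, chunks: list[str], query: str) -> dict:
    """Assemble a capped completion request (field names are placeholders)."""
    return {
        "context": chunks[:MAX_CHUNKS],         # hard retrieval-chunk limit
        "prompt": query,
        "max_tokens": MAX_OUTPUT_TOKENS[tier],  # response length cap by tier
    }
```

Enforcing the caps in one builder function means no call site can accidentally bypass them, which is what keeps spend predictable as volume grows.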

Reuse summaries across session

Case summaries generated once and cached — subsequent turns reference the summary, not raw history.

Batch post-resolution audits off-peak

Audits run in batch during low-traffic hours to avoid premium-rate inference costs.

KPI Dashboard

Five metrics that keep cost visible and actionable for ops and engineering.

Cost per assisted case

Blended LLM cost across all tiers for cases where the copilot was engaged.

Cost per resolved case

End-to-end cost including audit, routing, and any supervisor review time.

Premium-model invocation rate

Percentage of cases escalated to Tier 2 — target <5%, alert at >8%.

Tokens per case by market/product

Segmented token usage reveals which markets or verticals drive disproportionate cost.

Cost vs quality frontier

Pareto chart of cost per case against suggestion acceptance rate — find the efficient frontier.
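The premium-model invocation KPI above has explicit thresholds (target <5%, alert at >8%), so its monitoring reduces to a threshold check. A minimal sketch — the three status labels are illustrative:

```python
# Thresholds from the KPI definition: target <5%, alert at >8%.
TARGET_RATE = 0.05
ALERT_RATE = 0.08

def premium_invocation_status(tier2_cases: int, total_cases: int) -> str:
    """Classify the current Tier 2 invocation rate against the KPI bands."""
    rate = tier2_cases / total_cases
    if rate > ALERT_RATE:
        return "alert"   # above the 8% alert threshold
    if rate > TARGET_RATE:
        return "watch"   # above target but below alert
    return "ok"          # within the <5% target
```

Running this per market/product segment, as the tokens-per-case KPI suggests, pinpoints which segments are pushing the blended rate above target.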

$0.005 per ticket at scale

At 1M tickets/month, tiered routing keeps total LLM cost under $5,100/month. The cost structure is dominated by Tier 2 — every improvement in classifier accuracy that correctly routes cases to lower tiers compounds into significant savings.