Delivery Plan

Three phases from foundation to mature governance over 18 months

Ship fast, prove value, then scale. Every phase has measurable gates — if the data doesn't support continuing, we stop and recalibrate before burning more budget.

Phase 1: Day 0–180
Phase 2: Day 181–365
Phase 3: Day 366–540

Phase 1: Build & Validate

Day 0–180 — foundation spikes, controlled pilot, then hardening.

1. Foundation (Day 0–60)
  • Establish baseline metrics across SG GrabFood support
  • Spike 1 (2 weeks): Policy retrieval engine — target >90% precision on top-20 policies
  • Spike 2 (3 weeks): Deviation taxonomy classifier — map override categories to risk tiers
  • Spike 3 (3 weeks): Copilot panel v1 — inline suggestions in agent desktop
  • Stand up inference gateway with model routing and token tracking
2. Pilot (Day 60–120)
  • Market: Singapore, GrabFood vertical
  • 50 agents in treatment group + 50 control agents
  • 8-week controlled experiment with weekly data reviews
  • Go/no-go decision gate at Day 90
3. Harden (Day 120–180)
  • Activate risk scoring with approval gates for Tier H deviations
  • Launch post-resolution audit pipeline (sampled)
  • Supervisor queue with SLA-based routing
  • Weekly model calibration cycles begin

Go / No-Go Gates — Day 90

Five criteria that must pass before the pilot graduates to hardening. Miss any gate and we pause.

  • Policy adherence: ≥5pp improvement
  • CSAT: no regression >−0.1
  • P50 resolution time: no regression >+5%
  • Agent adoption: >70% using suggestions
  • Risk precision: >75% true positive

All five gates must pass at Day 90. Failure on any metric triggers a pause-and-recalibrate cycle.
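
The all-or-nothing rule above can be sketched as a small check. This is a minimal illustration, not the pilot's actual tooling: the `PilotResults` fields, the function names, and the sign conventions (negative CSAT delta means a drop, positive resolution-time change means slower) are assumptions made for the example. Only the thresholds come from the gates listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PilotResults:
    """Hypothetical container for Day 90 pilot metrics (treatment vs. control)."""
    adherence_lift_pp: float          # policy adherence lift, in percentage points
    csat_delta: float                 # CSAT change; negative means regression
    p50_resolution_change_pct: float  # P50 resolution time change in %; positive means slower
    suggestion_adoption_rate: float   # share of agents using suggestions, 0..1
    risk_true_positive_rate: float    # share of risk flags that were true positives, 0..1


def evaluate_gates(r: PilotResults) -> dict[str, bool]:
    """Return pass/fail per gate, using the thresholds stated in the plan."""
    return {
        "policy_adherence": r.adherence_lift_pp >= 5.0,
        "csat": r.csat_delta >= -0.1,
        "p50_resolution_time": r.p50_resolution_change_pct <= 5.0,
        "agent_adoption": r.suggestion_adoption_rate > 0.70,
        "risk_precision": r.risk_true_positive_rate > 0.75,
    }


def failed_gates(r: PilotResults) -> list[str]:
    """Names of the gates that did not pass, in declaration order."""
    return [name for name, passed in evaluate_gates(r).items() if not passed]


def decide(r: PilotResults) -> str:
    """All five gates must pass; any miss triggers pause-and-recalibrate."""
    return "go" if not failed_gates(r) else "pause-and-recalibrate"


if __name__ == "__main__":
    # Illustrative numbers only, not pilot data.
    example = PilotResults(6.2, -0.05, 2.0, 0.65, 0.81)
    print(decide(example), failed_gates(example))
```

Returning the list of failed gates alongside the decision gives the weekly data review a concrete starting point for recalibration.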

Phase 2: Expand & Learn

Day 181–365 — multi-market rollout, ML upgrades, and gamification.

1. Expand to Malaysia and Indonesia markets
2. Gamification layer — badges for confirmed good judgment
3. ML classifier upgrade from rules-based to trained model
4. Compliance dashboard for ops managers
5. Identify graduation candidates for reduced friction (Band A)

Phase 3: Scale & Optimize

Day 366–540 — self-serve migration, upstream prevention, and platform maturity.

1. Migrate ≥3 high-volume flows to self-serve with copilot fallback
2. Upstream prevention — feed deviation patterns back to product teams
3. Cost optimization — model routing tuned by case complexity
4. Platform hardening — multi-region, failover, audit trail
5. Mature governance — quarterly calibration, policy version control

Supervisor Capacity Model

How many supervisor FTEs are needed at each scale for Tier H review queues.

Scale        Scope         Tickets/month   Tier H reviews   Supervisor need
Pilot        ~50 agents    ~15K/month      ~225 reviews     <0.25 FTE
Full Scale   All markets   1M/month        ~15K reviews     ~15 FTE

Assumption: ~1.5% of all tickets produce Tier H deviations requiring supervisor review. At ~10 min/review, one FTE handles ~1,000 reviews/month. As ML precision improves, false positives drop and FTE needs decrease.
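
The arithmetic behind these figures can be reproduced with a short sketch. The constants are taken from the assumption above; the function and constant names are illustrative, and the model is linear, so it does not capture the expected drop in false positives as ML precision improves.

```python
# Constants from the stated assumption.
TIER_H_RATE = 0.015           # ~1.5% of tickets produce Tier H deviations
MINUTES_PER_REVIEW = 10       # ~10 min per supervisor review
REVIEWS_PER_FTE_MONTH = 1000  # one FTE handles ~1,000 reviews/month


def tier_h_reviews(tickets_per_month: float) -> float:
    """Expected Tier H reviews per month at a given ticket volume."""
    return tickets_per_month * TIER_H_RATE


def supervisor_fte(tickets_per_month: float) -> float:
    """Supervisor FTEs needed to clear the monthly Tier H review queue."""
    return tier_h_reviews(tickets_per_month) / REVIEWS_PER_FTE_MONTH


def review_hours_per_fte_month() -> float:
    """Review time implied by the per-FTE throughput (about 167 hours/month)."""
    return REVIEWS_PER_FTE_MONTH * MINUTES_PER_REVIEW / 60


if __name__ == "__main__":
    for label, tickets in [("Pilot", 15_000), ("Full Scale", 1_000_000)]:
        print(f"{label}: {tier_h_reviews(tickets):,.0f} reviews, "
              f"{supervisor_fte(tickets):.3f} FTE")
```

Because FTE need scales linearly with both ticket volume and the Tier H rate, halving the rate through better classifier precision would halve the supervisor requirement at any scale.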

Feasibility Matrix

Technical feasibility and confidence level for each core capability.

Capability                      Feasibility   Confidence
Real-time policy suggestions    High          Medium
Deviation risk scoring          Medium        Medium
Selective approval gates        High          High
Post-resolution audit           High          Medium
Multi-market policy engine      Medium        Low–Med
Cost-optimized model routing    High          Medium

Prove it small, scale it fast

The 180-day Build & Validate phase is designed to produce clear, measurable proof that selective friction works before we commit to multi-market expansion. Every gate is quantifiable and every phase has a kill switch.