Delivery Plan

Three phases from foundation to mature governance over 18 months

Ship fast, prove value, then scale. Every phase has measurable gates — if the data doesn't support continuing, we stop and recalibrate before burning more budget.

Phase 1: Day 0–180
Phase 2: Day 181–365
Phase 3: Day 366–540

Phase 1: Build & Validate

Day 0–180 — foundation spikes, controlled pilot, then hardening.

Foundation

Day 0–60

Establish baseline metrics across SG GrabFood support
Spike 1 (2 weeks): Policy retrieval engine — target >90% precision on top-20 policies
Spike 2 (3 weeks): Deviation taxonomy classifier — map override categories to risk tiers
Spike 3 (3 weeks): Copilot panel v1 — inline suggestions in agent desktop
Stand up inference gateway with model routing and token tracking

Pilot

Day 60–120

Market: Singapore, GrabFood vertical
50 agents in treatment group + 50 control agents
8-week controlled experiment with weekly data reviews
Go/no-go decision gate at Day 90
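
The plan does not say how agents are split between the two pilot groups. One common approach is seeded random assignment, sketched below. The function name, the agent ID format, and the choice of randomization are illustrative assumptions, not part of the plan.

```python
import random

def assign_groups(agent_ids, treatment_size=50, control_size=50, seed=0):
    """Randomly split agents into disjoint treatment and control groups.

    Hypothetical helper: the plan only fixes the group sizes (50 + 50).
    A fixed seed keeps the assignment reproducible for later audits.
    """
    if len(agent_ids) < treatment_size + control_size:
        raise ValueError("not enough agents for the requested group sizes")
    rng = random.Random(seed)
    shuffled = list(agent_ids)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    treatment = shuffled[:treatment_size]
    control = shuffled[treatment_size:treatment_size + control_size]
    return treatment, control

# Example with placeholder agent IDs.
agents = [f"agent-{i:03d}" for i in range(100)]
treatment, control = assign_groups(agents)
```

Randomizing at the agent level, rather than letting agents opt in, helps keep the treatment-versus-control comparison at the Day 90 gate from being skewed by self-selection.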

Harden

Day 120–180

Activate risk scoring with approval gates for Tier H deviations
Launch post-resolution audit pipeline (sampled)
Supervisor queue with SLA-based routing
Weekly model calibration cycles begin
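
As a rough illustration of SLA-based routing, the supervisor queue can be modeled as a priority queue ordered by review deadline. This is a minimal sketch under stated assumptions: the class names, the per-item SLA durations, and earliest-deadline-first ordering are hypothetical and not specified by the plan.

```python
import heapq
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass(order=True)
class ReviewItem:
    due_at: datetime                       # only field used for ordering
    ticket_id: str = field(compare=False)

class SupervisorQueue:
    """Earliest-deadline-first queue for Tier H deviation reviews (hypothetical)."""

    def __init__(self):
        self._heap = []

    def submit(self, ticket_id, flagged_at, sla):
        # The deadline is the flag time plus the SLA allowed for this item.
        heapq.heappush(self._heap, ReviewItem(flagged_at + sla, ticket_id))

    def next_review(self):
        # Return the ticket whose deadline is closest, or None if the queue is empty.
        if not self._heap:
            return None
        return heapq.heappop(self._heap).ticket_id

# Example with placeholder SLA values.
queue = SupervisorQueue()
start = datetime(2024, 1, 1, 9, 0)
queue.submit("T-1", start, timedelta(minutes=60))
queue.submit("T-2", start + timedelta(minutes=10), timedelta(minutes=15))
first = queue.next_review()  # "T-2": flagged later, but its deadline is earlier
```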

Go / No-Go Gates — Day 90

Five criteria that must pass before the pilot graduates to hardening. Miss any gate and we pause.

Gate	Threshold
Policy adherence	≥5pp improvement
CSAT	No regression beyond −0.1
P50 resolution time	No regression beyond +5%
Agent adoption	>70% using suggestions
Risk precision	>75% true positive

All five gates must pass at Day 90. Failure on any metric triggers a pause-and-recalibrate cycle.
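
The all-or-nothing rule can be written as a small check. This is a minimal sketch, assuming the metrics arrive as treatment-minus-control deltas. The field names and the reading of each threshold (for example, that CSAT may drop by at most 0.1 and P50 resolution time may rise by at most 5%) are interpretations for illustration, not definitions taken from the plan.

```python
from dataclasses import dataclass

@dataclass
class PilotMetrics:
    # Hypothetical field names; deltas are treatment minus control.
    adherence_delta_pp: float        # policy adherence change, percentage points
    csat_delta: float                # CSAT change (negative = regression)
    p50_resolution_delta_pct: float  # P50 resolution time change in % (positive = slower)
    adoption_rate: float             # share of treatment agents using suggestions, 0..1
    risk_precision: float            # share of risk flags that are true positives, 0..1

def evaluate_gates(m: PilotMetrics) -> dict[str, bool]:
    """Return pass/fail for each of the five Day 90 gates."""
    return {
        "policy_adherence": m.adherence_delta_pp >= 5.0,
        "csat": m.csat_delta >= -0.1,
        "p50_resolution_time": m.p50_resolution_delta_pct <= 5.0,
        "agent_adoption": m.adoption_rate > 0.70,
        "risk_precision": m.risk_precision > 0.75,
    }

def go_no_go(m: PilotMetrics) -> tuple[str, list[str]]:
    """All gates must pass; any failure means pause and recalibrate."""
    failed = [name for name, passed in evaluate_gates(m).items() if not passed]
    return ("go" if not failed else "pause", failed)

# Example with made-up numbers: CSAT regresses by 0.2, so the pilot pauses.
decision, failed = go_no_go(PilotMetrics(6.0, -0.2, 2.0, 0.80, 0.80))
```

Returning the list of failed gates alongside the decision makes it clear which metric the recalibration cycle should target.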

Phase 2: Expand & Learn

Day 181–365 — multi-market rollout, ML upgrades, and gamification.

Expand to Malaysia and Indonesia markets

Gamification layer — badges for confirmed good judgment

ML classifier upgrade from rules-based to trained model

Compliance dashboard for ops managers

Identify graduation candidates for reduced friction (Band A)

Phase 3: Scale & Optimize

Day 366–540 — self-serve migration, upstream prevention, and platform maturity.

Migrate ≥3 high-volume flows to self-serve with copilot fallback

Upstream prevention — feed deviation patterns back to product teams

Cost optimization — model routing tuned by case complexity

Platform hardening — multi-region, failover, audit trail

Mature governance — quarterly calibration, policy version control

Supervisor Capacity Model

How many supervisor FTEs are needed at each scale for Tier H review queues.

Scale	Scope	Tickets/month	Tier H reviews/month	Supervisor need
Pilot	~50 agents	~15K	~225	<0.25 FTE
Full Scale	All markets	1M	~15K	~15 FTE

Assumption: ~1.5% of all tickets produce Tier H deviations requiring supervisor review. At ~10 min/review, one FTE handles ~1,000 reviews/month. As ML precision improves, false positives drop and FTE needs decrease.
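
The arithmetic behind these figures can be reproduced with a short calculation. This is a sketch using only the stated assumptions (a 1.5% Tier H rate and ~1,000 reviews per FTE per month). The function and parameter names are illustrative.

```python
def supervisor_need(tickets_per_month: float,
                    tier_h_rate: float = 0.015,
                    reviews_per_fte_month: float = 1000.0) -> tuple[float, float]:
    """Return (Tier H reviews per month, supervisor FTEs needed).

    Defaults mirror the stated assumptions: ~1.5% of tickets need review,
    and one FTE handles ~1,000 reviews/month at ~10 minutes each.
    """
    reviews = tickets_per_month * tier_h_rate
    return reviews, reviews / reviews_per_fte_month

for label, tickets in [("Pilot", 15_000), ("Full scale", 1_000_000)]:
    reviews, fte = supervisor_need(tickets)
    print(f"{label}: ~{reviews:,.0f} reviews/month, ~{fte:.3f} FTE")
```

With the default inputs this gives roughly 225 reviews and 0.225 FTE for the pilot, and roughly 15,000 reviews and 15 FTE at full scale. Lowering `tier_h_rate` shows how fewer false positives would reduce the FTE requirement.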

Feasibility Matrix

Technical feasibility and confidence level for each core capability.

Capability	Feasibility	Confidence
Real-time policy suggestions	High	Medium
Deviation risk scoring	Medium	Medium
Selective approval gates	High	High
Post-resolution audit	High	Medium
Multi-market policy engine	Medium	Low–Med
Cost-optimized model routing	High	Medium

Prove it small, scale it fast

The 180-day foundation phase is designed to produce clear, measurable proof that selective friction works — before committing to multi-market expansion. Every gate is quantifiable and every phase has a kill switch.