The workflow

How three agents stay in sync without talking to each other.

The ring coordinates through shared state, not chatter. A single database is the brain; routing flags are the nervous system; human sign-off is the gate on anything that leaves the building. Here's the whole machine.

Shared state

One database, one source of truth

All three peers share a single Postgres database exposed through a data-API layer. Each entity has a clear owner — the agent allowed to write it — and the others are read-only or blind to it entirely. This is how the ring stays coherent across days, weeks, and phase boundaries without ever holding everything in one context window.

| Entity | What it holds | Owner (writes) |
| --- | --- | --- |
| signals | Raw inbound from Haleon / Microsoft / backlog / scout observations | CoS + 5 scouts (tier_0_reader) |
| briefs | The 5-3-2-1 morning & end-of-day brief rows | CoS |
| comms_drafts | Outbound queue, gated on Charlie sign-off | CoS |
| quiet_watch_state | Silent-stakeholder tracking | CoS |
| meeting_preps · steerco_preps | Pre-meeting prep docs | CoS |
| adrs | Architecture decision records + DB-backed review_stage axis | SA |
| compliance_state | Per-use-case regulatory posture | SA |
| reuse_catalog | Cross-pod reusable patterns | SA |
| decisions | Narrative log; peer routing is queryable data: decision_class, owning_peer, adr_id, narrative (ISS-14) | CoS / SA |
| agent_runs | Per-turn audit spine. Class enum (7): {turn, rotation, snapshot, ado_drain, audit, correction, scout_sweep}. Multi-writer; gap-scan key (session_id, session_turn_seq) | multi-writer (every role writes its own); Steward audits |
| rotation_log · agent_snapshots | Rotation handover records + Historian snapshots; cross-referenced from agent_runs via detail_ref (detail-row-first ordering, HS-4) | Steward (Operator for Steward self-rotation) |
| cost_telemetry | Whole-UTC-day cost cells; derived projection of agent_runs tokens via sp_rollup_cost_telemetry | derived (rollup SP) |
| ado_writeback_queue | Pending ADO ops; gated default pending_approval; drained by ado-scribe through CAS-guarded sp_update_ado_writeback | CoS |
| tick_lease · scout_lease · drainer_lease | Three independent single-flight mutexes (re-entrant same-holder CAS, TTL-expiry steal) | CoS (via shim entities) |
| scout_watermark · scout_enable_flags | Per-source scan watermark; 3-greens DB gate (cost_breaker_live / drain_proven / scout_proven) read by sp_check_scout_enabled() | scouts + Charlie |
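
The lease rows in the table are described as single-flight mutexes with a re-entrant same-holder CAS and a TTL-expiry steal. A minimal in-memory sketch of that acquisition rule follows. The `Lease` class, the `try_acquire` function, and the holder names are hypothetical; in the real system this would presumably be one conditional UPDATE against the lease table rather than Python state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """One row of a hypothetical lease table (for example tick_lease)."""
    holder: Optional[str] = None
    expires_at: float = 0.0


def try_acquire(lease: Lease, holder: str, now: float, ttl: float) -> bool:
    """Try to take or renew the lease; return True on success.

    Three cases succeed: the lease is free, the caller already holds it
    (re-entrant renewal), or the current holder's TTL has run out (steal).
    """
    free = lease.holder is None
    same_holder = lease.holder == holder
    expired = now >= lease.expires_at
    if free or same_holder or expired:
        lease.holder = holder
        lease.expires_at = now + ttl
        return True
    return False


lease = Lease()
results = [
    try_acquire(lease, "cos-1", now=0.0, ttl=60.0),   # free: acquired
    try_acquire(lease, "cos-2", now=10.0, ttl=60.0),  # held and live: refused
    try_acquire(lease, "cos-1", now=20.0, ttl=60.0),  # same holder: renewed
    try_acquire(lease, "cos-2", now=90.0, ttl=60.0),  # TTL passed: stolen
]
```

Keeping the three leases as separate rows means a stuck scout sweep cannot block the tick or the drainer; each mutex expires and recovers on its own.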

The shared brain spans ~29 base tables (22 originals + the Batch 3–7 substrate: rotation_log, agent_snapshots, the three leases, scout_watermark, scout_enable_flags) exposed through 52 DAB entities (base reads + *In shim views + action-views like CostRollupDayIn / CostBreakerRollingIn / ScoutEnabledCheckIn). A handful of cross-cutting entities (pod intel, customer health, risks, recipient registry) are read by more than one peer but still have a single writer. The rule never changes: one writer per row, everyone else reads or stays blind.
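
The one-writer rule can be pictured as a lookup that runs before any write. The sketch below is illustrative only: the `OWNERS` map copies a few rows from the table above, while `can_write`, `write_row`, and the in-memory `store` are hypothetical stand-ins for whatever permission layer the data API actually applies.

```python
# Owner sets taken from the ownership table; only a subset is shown.
OWNERS = {
    "briefs": {"CoS"},
    "comms_drafts": {"CoS"},
    "adrs": {"SA"},
    "compliance_state": {"SA"},
    "reuse_catalog": {"SA"},
    "decisions": {"CoS", "SA"},  # shared table, but each row has one writer
}


def can_write(role: str, entity: str) -> bool:
    """True if `role` is listed as an owner of `entity`."""
    return role in OWNERS.get(entity, set())


def write_row(store: dict, role: str, entity: str, row: dict) -> dict:
    """Append a row, stamped with its writer, or refuse the write."""
    if not can_write(role, entity):
        raise PermissionError(f"{role} may not write {entity}")
    stamped = {**row, "written_by": role}
    store.setdefault(entity, []).append(stamped)
    return stamped


store = {}
write_row(store, "SA", "adrs", {"title": "Data residency"})
```

A CoS attempt to write `adrs` raises `PermissionError` in this sketch, which is the behaviour the "everyone else reads or stays blind" rule implies.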

Who can talk to whom

Communication topology

The peers never call each other directly. The Chief of Staff and Solution Architect coordinate by writing routing flags onto decisions rows. The Steward is one-way: it reads health metadata and only ever "speaks" by spawning a replacement. Charlie is the only node everyone talks to directly.

[Figure: Communication topology. Chief of Staff (signal triage) and Solution Architect (ADR work) exchange routing flags through the decisions table in two directions: needs_sa_review going to the architect, and needs_narrative_for_steerco coming back. Both also surface escalations directly to Charlie (human). The Steward (health-watch · rotation) reads agent_runs metadata from both and spawns replacements.]
Rose arcs = routing flags through the decisions table (the only peer channel). (Peer routing is now persisted as data — decision_class, owning_peer, adr_id, narrative on decisions (ISS-14); the per-handoff flag values shown are illustrative shorthand over those columns.) Grey dashes = escalations to Charlie. Teal dashes = the Steward reading metadata and spawning replacements.
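
Since routing lives in columns on decisions rather than in messages, a handoff is just an insert on one side and a query on the other. The sketch below uses an in-memory SQLite table as a stand-in for the Postgres one. The column names come from the text; the column values ("technical", "steerco_narrative", "SA", "CoS"), the helper functions, and the idea that `owning_peer` is what the receiving peer filters on are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE decisions (
        id             INTEGER PRIMARY KEY,
        decision_class TEXT NOT NULL,
        owning_peer    TEXT NOT NULL,
        adr_id         INTEGER,
        narrative      TEXT
    )
    """
)


def write_decision(conn, decision_class, owning_peer, narrative, adr_id=None):
    """Insert a new decisions row and return its id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO decisions (decision_class, owning_peer, adr_id, narrative) "
            "VALUES (?, ?, ?, ?)",
            (decision_class, owning_peer, adr_id, narrative),
        )
    return cur.lastrowid


def queue_for(conn, peer):
    """Return (id, decision_class, adr_id) for rows routed to `peer`, oldest first."""
    return conn.execute(
        "SELECT id, decision_class, adr_id FROM decisions "
        "WHERE owning_peer = ? ORDER BY id",
        (peer,),
    ).fetchall()


# CoS routes a technical concern to the architect (the needs_sa_review direction).
write_decision(conn, "technical", "SA", "Haleon data-residency concern")
# SA later writes its own hand-back row (the needs_narrative_for_steerco direction).
write_decision(conn, "steerco_narrative", "CoS", "Residency ADR ratified", adr_id=7)

sa_queue = queue_for(conn, "SA")
cos_queue = queue_for(conn, "CoS")
```

The hand-back is a second row rather than an update to the first, which keeps each row single-writer. How a row is marked as consumed is not described in the source, so the sketch leaves that out.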

End to end

A day in the life of the ring

Follow one signal — a Haleon stakeholder raising a data-residency concern — as it moves through the ring from inbound ping to ratified decision and customer-ready reply. (Today the ring is on-demand only. The timeline below shows the intended cadence; the recurring daily-brief cron and the 15-min ring-tick are not yet enabled — see Roadmap.)

  • 07:00
    CoS Morning brief. Overnight signals are metabolised into a 5-3-2-1 brief and posted to Charlie via Teams. The residency concern lands as one of the top-5 risks.
  • 07:12
    CoS Routing. The concern is decision-shaped and technical, so CoS writes a decisions row with routing=needs_sa_review — and purges the technical detail from its own working memory.
  • 09:30
    SA Picks up the queue. SA reads the routed row, checks the reuse catalog for an existing residency pattern, finds none, and opens a new ADR in draft.
  • 10:05
    SA Adversarial review. SA spawns the Reviewer skill. It pushes back on the first draft; SA revises. On pass, the ADR moves to reviewed.
  • 10:40
    SA Compliance gate. The decision touches a consumer-health surface, so the Compliance Checker runs (mandatory). It returns needs-review and a compliance_state row is written.
  • 11:15
    Steward Health tick. A routine metadata scan: both peers green, turn counts nominal, cost within band. An agent_runs audit row is written. Silent — no Teams.
  • 14:00
    👤 Charlie ratifies. SA surfaces the reviewed, compliance-checked ADR. Charlie acks; the ADR becomes ratified and SA writes a hand-back row with routing=needs_narrative_for_steerco.
  • 14:20
    CoS Drafts the reply. CoS picks up the narrative, drafts a reply to the Haleon stakeholder, runs it through the Outbound Voice skill, and queues it at awaiting_signoff.
  • 14:25
    👤 Charlie sends. He reviews the draft, tweaks one line, and approves. Only now does anything leave the building. (Human gate.)
  • 17:00
    CoS End-of-day brief. The residency thread is closed out in the brief, the ratified ADR noted, and it's added to the Steerco prep accumulation for the week.
Notice what never happened: the two agents never sent each other a message, no agent ever made an architecture call outside the Reviewer + Compliance gates, and nothing reached the customer without Charlie's explicit approval.
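
The human gate at 14:20 and 14:25 can be sketched as a small state machine on a comms_drafts row. Only the `awaiting_signoff` status appears in the text; the other status names, the `CommsDraft` class, and the three functions are hypothetical and show just one way the gate could be enforced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommsDraft:
    body: str
    status: str = "draft"              # hypothetical starting status
    approved_by: Optional[str] = None


def queue_for_signoff(draft: CommsDraft) -> None:
    """CoS parks a finished draft in the outbound queue."""
    if draft.status != "draft":
        raise ValueError(f"cannot queue a draft in status {draft.status!r}")
    draft.status = "awaiting_signoff"


def approve(draft: CommsDraft, approver: str) -> None:
    """Only the human approver can move a draft past the gate."""
    if approver != "Charlie":
        raise PermissionError("only Charlie can sign off outbound comms")
    if draft.status != "awaiting_signoff":
        raise ValueError(f"nothing to approve in status {draft.status!r}")
    draft.status = "approved"
    draft.approved_by = approver


def send(draft: CommsDraft) -> None:
    """Sending is refused unless an approval is already recorded."""
    if draft.status != "approved":
        raise PermissionError("draft has not been signed off")
    draft.status = "sent"


draft = CommsDraft("Reply to Haleon on data residency")
queue_for_signoff(draft)
approve(draft, "Charlie")
send(draft)
```

In this sketch there is no code path from `awaiting_signoff` to `sent` that skips `approve`, which mirrors the claim that nothing reaches the customer without explicit approval.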

Staying healthy over the long haul

The rotation lifecycle

A long-lived agent's context degrades over time — it re-asks questions, contradicts earlier decisions, loses the thread. The Steward exists to catch that and swap in a fresh successor without losing institutional memory. There are two ceremonies, and they're deliberately kept distinct.

① Context-degradation rotation

Any-time, threshold-driven, low-ceremony. Triggered by the two-signal rule on health metadata. The dying peer emits an ## Open state block; a successor is spawned, adopts that state, and is smoke-tested before the old one is archived.
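
The text does not spell out the two-signal rule, so the sketch below rests on an assumption: a peer goes RED when at least two independent health signals trip at once, or when a hard flag (a self-flag or a Charlie request) is set. The signal names and the `classify` function are hypothetical.

```python
def classify(signals: dict, hard_flag: bool = False):
    """Return (status, tripped_signal_names) for one peer.

    Assumption: two or more tripped signals, or a hard flag, means RED.
    A single tripped signal is reported but does not trigger rotation.
    """
    tripped = sorted(name for name, fired in signals.items() if fired)
    is_red = hard_flag or len(tripped) >= 2
    return ("RED" if is_red else "GREEN", tripped)


# Hypothetical signal names, loosely based on the symptoms described above.
healthy = classify({"turn_count_high": False, "cost_out_of_band": False})
one_off = classify({"turn_count_high": True, "cost_out_of_band": False})
degraded = classify({"turn_count_high": True, "repeated_questions": True})
```

Requiring two signals rather than one is a common way to avoid rotating a peer on a single noisy metric; whether that is the motivation here is not stated in the source.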

② Phase-boundary rotation

Heavier, calendar-driven at the Pilot→Scale and Scale→Transform markers. Includes a fuller Historian snapshot and a phase-carryover document. Both delivery peers rotate together — a clean break for the new phase — and the Steward rotates last, with Charlie's ack.

The ceremony, step by step

The executable ceremony lives in the generic rotate-role skill (~/.copilot/m-skills/rotate-role/SKILL.md), driven by ~/.copilot/m-workflows/haleon/workflow.json. The old loom-specific rotate-clawpilot-role is a deprecated forwarding shim.

  1. Classify. The dying peer is marked RED via the two-signal rule (or a hard self-flag / Charlie request).
  2. Freeze & request state. Confirm the peer is idle, then ask it for its ## Open state block plus swap-protocol additions. Address by sessionId, not name — the name is about to be reused (D8).
  3. Snapshot. Narrative roles (SA / Architect) get a Historian pass that produces a structured snapshot child; mechanical roles (CoS / Steward) get an inline summary built from the freeze response. The snapshot is persisted to the dedicated agent_snapshots table (no longer the runtime ledger).
  4. Spawn the successor. Rotator spawns from the same role brief (resolved from workflow.json), handed the predecessor's open state to adopt.
  5. Await boot ack (D3 — no boot-turn race). Wait for the successor's ## Context health green ack before releasing the SDK handle. Release with keepPendingTurn:false.
  6. Status-probe re-adopt (D4 — no orphan-turn race). A lightweight ping cleanly re-adopts the successor handle before the heavy readiness smoke-test.
  7. Smoke-test. If it fails, both predecessor and failed successor stay alive — the successor is archived (delete:false), never deleted — and Charlie is pinged.
  8. Archive. On a clean smoke-test, the predecessor is archived (delete:false, never deleted) and Charlie gets one Teams notification. The ZOMBIE branch (D5) defaults to cold-spawn + archive; transcript-synthesis or delete:true requires an explicit Charlie ack quoted in rotation_log.notes.
  9. Broadcast. The successor announces its new sessionId to the surviving peers so the ring re-links — for a Steward rotation, only after Charlie's ack.
  10. Record (Step 9b — detail-row-first, HS-4). Write rotation_log and agent_snapshots rows FIRST, capture their ids, then INSERT the agent_runs spine row with class='rotation' and detail_ref already populated. The runtime ledger (plan.md) is updated as a human-readable companion.
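
The detail-row-first ordering in the final step can be sketched with an in-memory SQLite database standing in for Postgres. The table names, `class`, `detail_ref`, `session_id`, `session_turn_seq`, and `notes` come from the text; every other column, the `snapshot_id` link, and the choice to point `detail_ref` at the rotation_log row are assumptions made for illustration.

```python
import sqlite3


def open_db():
    """Create the three tables with a deliberately minimal, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE agent_snapshots (id INTEGER PRIMARY KEY, role TEXT, body TEXT);
        CREATE TABLE rotation_log (
            id INTEGER PRIMARY KEY, role TEXT, snapshot_id INTEGER, notes TEXT
        );
        CREATE TABLE agent_runs (
            id INTEGER PRIMARY KEY,
            session_id TEXT,
            session_turn_seq INTEGER,
            class TEXT,
            detail_ref INTEGER NOT NULL  -- the spine row cannot exist without its detail
        );
        """
    )
    return conn


def record_rotation(conn, role, session_id, turn_seq, snapshot_body):
    """Write the detail rows first, then the agent_runs spine row that points at them."""
    with conn:  # one transaction: all three rows land, or none do
        snap_id = conn.execute(
            "INSERT INTO agent_snapshots (role, body) VALUES (?, ?)",
            (role, snapshot_body),
        ).lastrowid
        rot_id = conn.execute(
            "INSERT INTO rotation_log (role, snapshot_id, notes) VALUES (?, ?, ?)",
            (role, snap_id, ""),
        ).lastrowid
        run_id = conn.execute(
            "INSERT INTO agent_runs (session_id, session_turn_seq, class, detail_ref) "
            "VALUES (?, ?, 'rotation', ?)",
            (session_id, turn_seq, rot_id),
        ).lastrowid
    return run_id


conn = open_db()
run_id = record_rotation(conn, "SA", "sess-42", 17, "## Open state ...")
```

Writing the details first means an auditor scanning agent_runs should never find a rotation row whose `detail_ref` is empty or dangling.
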
The cardinal rule of rotation: the rotator spawns the successor — the role being rotated never spawns itself. And the Steward never rotates itself without Charlie's explicit acknowledgement; per workflow.json's selfRotationRotator:"operator", Operator (Charlie's hands-on session) is the rotator-of-record for Steward self-rotation — the outgoing Steward NEVER spawns its own successor. That's single-point-of-failure protection for the ring.
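
The cardinal rule reduces to a small lookup: who is allowed to act as rotator for a given role. In the sketch below only the `selfRotationRotator` key and its `"operator"` value are quoted from the text; the rest of the dictionary shape, the role strings, and the `rotator_for` function are hypothetical.

```python
# Only this key/value pair is taken from the text; a real workflow.json holds more.
WORKFLOW = {"selfRotationRotator": "operator"}


def rotator_for(role: str, workflow: dict, charlie_ack: bool = False) -> str:
    """Return the rotator-of-record for `role`; a role never rotates itself."""
    if role == "steward":
        # Steward self-rotation needs Charlie's explicit ack and an outside rotator.
        if not charlie_ack:
            raise PermissionError("Steward rotation requires Charlie's explicit ack")
        return workflow["selfRotationRotator"]
    # Delivery peers (CoS, SA) are rotated by the Steward.
    return "steward"


sa_rotator = rotator_for("sa", WORKFLOW)
steward_rotator = rotator_for("steward", WORKFLOW, charlie_ack=True)
```

In neither branch does the function return the role that was passed in, which is the property the rule protects.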
