Fogbreak · the BREAKER harness
the rails for your AI

BREAKER

Most AI agents are a clever model on a loose leash. BREAKER is the leash, the rails, and the map — it lets the AI do real work end-to-end, but never lets it do anything that isn't checked, signed, and reversible.

Boundary-Regulated Execution & Agent Knowledge Engine Routing

deterministic rails + AI judgement | ✅ all 6 planes BUILT · 101 tests · merged

First, the big idea: the harness is the moat

Everyone can use the same AI models — the model is the easy, commodity part. The hard, valuable part is everything wrapped around it: the rules, the memory, the permissions, the recovery. That wrapper is called the harness.

~98% of a top coding agent (Claude Code, per a teardown) is harness; only ~2% is the AI itself

So the smartest move isn't a better model — it's a better harness. BREAKER is Fogbreak's harness, and it's where our durable advantage lives. It's built from six interlocking parts (we call them planes) — and every one of them sits on top of tech we already have running: Kestra, Omnigent, the Leash, and our knowledge layers.

Grounded in: a Claude Code v2.1.88 teardown · arXiv:2605.18747 "Code as Agent Harness" · 4-model research (GLM · GPT-5.5 · Gemini · Perplexity).

The six planes

1 The Blueprint

📐 Manifest

One document describes a whole work-area (a Shore): its steps, who's allowed to do what, which AI agents run it, and where it pauses for you. From that single blueprint, BREAKER builds both the steady assembly-line and the AI workers — so the two can never drift apart.


How it actually works

  • One BreakerManifest (YAML) compiles to: Kestra FlowDefinitions (the deterministic rail) + Omnigent AgentSpecs (the AI council) + Leash gate policy + bounded op-vocab + knowledge bindings.
  • Single source of truth → the harness is generated, not hand-wired (today we hand-author Kestra YAML and agent configs separately — they drift).
  • The Manifest is the search space for self-improvement (Plane 6 can propose Manifest patches).
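
To make the "one blueprint, several artifacts" idea concrete, here is a minimal sketch of a manifest compiler. Every type and field name below (Manifest, Step, Compiled, compileManifest) is a hypothetical stand-in; the real BreakerManifest schema and the Kestra / Omnigent output formats are not shown in this document. The sketch only illustrates two properties named above: validation happens before anything is produced (all-or-nothing), and the same input always yields the same output (idempotent).

```typescript
// Hypothetical shapes for illustration; the real BreakerManifest schema is not shown here.
type Step = { id: string; kind: "task" | "pause"; agent?: string };
type AgentDecl = { name: string; allowedOps: string[] };
type Manifest = { shore: string; steps: Step[]; agents: AgentDecl[] };

type Compiled = {
  flow: { id: string; tasks: { id: string; type: string }[] }; // stand-in for a Kestra flow
  agentSpecs: { name: string; ops: string[] }[];                // stand-in for Omnigent specs
  gatePolicy: { pauseAt: string[] };                            // where a human gate applies
  opVocab: string[];                                            // bounded op vocabulary
};

function compileManifest(m: Manifest): Compiled {
  // Validate everything first, so a bad manifest yields no artifacts at all.
  const names = m.agents.map((a) => a.name);
  for (const s of m.steps) {
    if (s.agent !== undefined && names.indexOf(s.agent) < 0) {
      throw new Error(`step "${s.id}" references unknown agent "${s.agent}"`);
    }
  }
  // The op-vocab is the sorted, de-duplicated union of every agent's allowed ops.
  const opVocab: string[] = [];
  for (const a of m.agents) {
    for (const op of a.allowedOps) {
      if (opVocab.indexOf(op) < 0) opVocab.push(op);
    }
  }
  opVocab.sort();
  return {
    flow: {
      id: `${m.shore}-flow`,
      tasks: m.steps.map((s) => ({ id: s.id, type: s.kind === "pause" ? "Pause" : "AgentTask" })),
    },
    agentSpecs: m.agents.map((a) => ({ name: a.name, ops: a.allowedOps.slice().sort() })),
    gatePolicy: { pauseAt: m.steps.filter((s) => s.kind === "pause").map((s) => s.id) },
    opVocab,
  };
}

// Usage: one manifest in, every artifact out.
const demoManifest: Manifest = {
  shore: "real-estate",
  steps: [
    { id: "draft-listing", kind: "task", agent: "writer" },
    { id: "owner-approval", kind: "pause" },
  ],
  agents: [{ name: "writer", allowedOps: ["listing.update", "listing.read"] }],
};
const compiled = compileManifest(demoManifest);
```

Because the flow, the agent specs, and the gate policy all come from the same input in one pass, they cannot disagree with each other, which is the drift problem the second bullet describes.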

Why it's novel

  • No vendor compiles one domain blueprint into both the workflow engine and the agents. (Amazon's Kiro does spec→tasks→code — close, but not both rails + agents, and not receipted.)

Status — ✅ now BUILT

  • The BreakerManifest compiler is BUILT (services/breaker/src/compiler/): one typed manifest → Kestra flows + Omnigent AgentSpecs + gate policy + op-vocab + knowledge bindings; all-or-nothing, idempotent, cross-artifact consistent, carries no model id/secret. Proven on the real-estate Shore.

Generalizes our existing per-Shore ShoreManifest + op-vocab.ts.

2 The Two Hands

🤝 Execution

Two hands work together: a steady hand (the workflow engine) that does the same thing every time, and a creative hand (the AI) that figures things out. Neither is the boss. Every move passes through four safety gates.


The two co-equal layers

  • Kestra DAG (deterministic rail: stages, If, LoopUntil, Pause) ⊕ Omnigent agent loop (AI: AIAgent, council, swap-model). The DAG isn't a wrapper around the agent; the agent isn't a task in the DAG — they're co-equal and both emit receipts.

The four boundaries

  • Tool — 6-phase ALLOW / ASK / DENY + a bounded op-vocabulary: the AI may only emit ops from a fixed list; off-vocab → rejected (could_not_do). Constrains the action space, not just output shape.
  • Sandbox — bwrap / seatbelt + egress MITM (Omnigent).
  • Flow — Kestra Pause = human gate that can wait days, resume from any device.
  • Spend — every model call routes through Ferry (BYOK + metering); the spend event itself becomes a receipt.
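
The Tool boundary can be sketched as a lookup against a fixed vocabulary. This is an illustration under stated assumptions, not the real gate: the op names and their verdicts are invented, the six phases mentioned above are not modeled, and gateOp is a hypothetical function name. What it does show is the key property: an op outside the list is never evaluated at all, it is rejected as could_not_do.

```typescript
type Verdict = "ALLOW" | "ASK" | "DENY";
type GateResult =
  | { status: "ok"; verdict: Verdict }
  | { status: "could_not_do"; reason: string };

// Hypothetical vocabulary: these op names and verdicts are invented for illustration.
const OP_VOCAB: { op: string; verdict: Verdict }[] = [
  { op: "listing.read", verdict: "ALLOW" },
  { op: "listing.update", verdict: "ASK" },
  { op: "listing.delete", verdict: "DENY" },
];

function gateOp(op: string): GateResult {
  for (const entry of OP_VOCAB) {
    if (entry.op === op) return { status: "ok", verdict: entry.verdict };
  }
  // Off-vocab ops are rejected outright; there is no "best effort" path.
  return { status: "could_not_do", reason: `op "${op}" is not in the vocabulary` };
}

const readResult = gateOp("listing.read");   // in vocabulary, allowed
const shellResult = gateOp("shell.exec");    // not in vocabulary, rejected
```

The design choice is that the vocabulary limits which actions exist for the AI, rather than checking the shape of whatever it happens to emit.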

Status — ✅ now BUILT

  • The governed-turn runtime is BUILT (services/breaker/src/execution/, PR #130): runGovernedTurn drives one turn with the Kestra DAG ⊕ Omnigent loop co-equal, emitting one signed Leash receipt per consequential transition into the live D1LeashStore (this closes the live-emission gap) and verifying prior state with gatedRead (fail-closed). All four boundaries are enforced at runtime; agents only propose via a typed Work Envelope, only BREAKER commits; wired to the real OmnigentSSE / KestraRestFlowSource seams (no mocks).

Precedent: Temporal's "deterministic workflow / LLM-in-activity" split (used by OpenAI, Replit, Cursor).

3 The Receipt Chain · "BRE"

🔗 Leash

Every meaningful action gets a tamper-proof, signed receipt, chained to the one before it — a notarized logbook nothing can be secretly edited out of. If the chain ever breaks, BREAKER stops. This is the "Boundary-Regulated Execution": the AI can only move forward through signed, checked steps.


How it actually works

  • Every state transition (not just every tool call) → a hash-chained, ECDSA-P256-signed receipt, cross-linking Kestra executionId ⊕ Omnigent conv_id.
  • Receipt-gated: downstream consumers verify the chain segment before reading new state; a broken chain fail-closes the harness.
  • Receipts are typed — task-execution vs evolution:* — so an auditor can answer "why did BREAKER change?"
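
A minimal sketch of the chaining and the fail-closed read follows. It makes loud assumptions: the Receipt fields are hypothetical, the ECDSA-P256 signature is omitted entirely, and the digest is a tiny non-cryptographic stand-in (FNV-1a) used only so the example runs without dependencies. A real chain would use a cryptographic hash plus a signature. The function names appendReceipt and verifyChain are illustrative; gatedRead borrows the name used in this document, but its signature here is a guess.

```typescript
type Receipt = {
  seq: number;
  kind: string;        // e.g. "task-execution" or "evolution:promote"
  executionId: string; // Kestra side of the cross-link
  convId: string;      // Omnigent side of the cross-link
  payload: string;
  prevHash: string;
  hash: string;
};

// Stand-in digest (32-bit FNV-1a). NOT cryptographic; an assumption for this sketch only.
function digest(s: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = (h + (h << 1) + (h << 4) + (h << 7) + (h << 8) + (h << 24)) >>> 0;
  }
  return h.toString(16);
}

const GENESIS = "genesis";

function body(r: { seq: number; kind: string; executionId: string; convId: string; payload: string; prevHash: string }): string {
  return [r.seq, r.kind, r.executionId, r.convId, r.payload, r.prevHash].join("|");
}

function appendReceipt(chain: Receipt[], kind: string, executionId: string, convId: string, payload: string): Receipt[] {
  const last = chain[chain.length - 1];
  const partial = { seq: chain.length, kind, executionId, convId, payload, prevHash: last ? last.hash : GENESIS };
  return chain.concat([{ ...partial, hash: digest(body(partial)) }]);
}

function verifyChain(chain: Receipt[]): boolean {
  let prev = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const r = chain[i];
    // Each receipt must sit at its own index, point at its predecessor, and match its own digest.
    if (!r || r.seq !== i || r.prevHash !== prev || r.hash !== digest(body(r))) return false;
    prev = r.hash;
  }
  return true;
}

// Fail-closed: state is read only after the chain verifies.
function gatedRead<T>(chain: Receipt[], read: () => T): T {
  if (!verifyChain(chain)) throw new Error("receipt chain broken: refusing to read state");
  return read();
}

let demoChain: Receipt[] = [];
demoChain = appendReceipt(demoChain, "task-execution", "exec-1", "conv-1", "stage:start");
demoChain = appendReceipt(demoChain, "evolution:promote", "exec-1", "conv-1", "route-weights:v2");
```

Editing any earlier receipt changes its digest, which no longer matches the prevHash stored in the next receipt, so verification fails and gatedRead throws instead of returning state.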

Status — ✅ now BUILT

  • ✅ The verifier was already proven (exec-plane/leash/verify.py, CI-gated). And now the runtime emission + the fail-closed gate are BUILT (services/breaker/src/leash/): 5 transition types emit signed receipts, gatedRead halts fail-closed on any break, and a TS verifier port is conformance-proven ≡ the Python verifier — both directions (Python accepts a TS-authored chain; TS reproduces Python's verdicts byte-for-byte). The single highest-leverage unblock is closed.

Why it's novel

  • Durable-execution rivals (Temporal/DBOS/Restate) have logs, but mutable ones. None chain + sign across the engine⊕agent boundary. This is BREAKER's sharpest edge.

4 The Navigator · "AKER"

🧭 Knowledge Router

Instead of one dumb search, BREAKER picks the right way to find each answer — and remembers which routes worked (well-worn paths glow brighter). The AI's memory lives outside its short-term window and is pulled in only when needed.

Engine                Query shape
grep / FastContext    exact symbol · recent file
BGE-M3 + Qdrant       fuzzy / meaning
GraphRAG              "how does X relate to Y"
Graphiti              "what changed / when"
OKF files             canonical definitions
stigmergy             well-worn paths brighten

Two kinds of routing

  • Knowledge-engine routing — a deterministic classifier sends each query to the engine that wins for that query shape (table above).
  • Harness routing — per task, route to the best agent backend (Claude SDK / Codex / Gemini / OpenHands…) on cost · risk · fit · prior success. We can be vendor-neutral; no vendor will route to its competitors.
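
Knowledge-engine routing can be sketched as an ordered rule list where the first matching query shape wins. The regular expressions below are invented for illustration and are far cruder than a classifier tuned on a labeled set; routeQuery and RULES are hypothetical names. Stigmergy reinforcement and receipting of each decision are not modeled here.

```typescript
type Engine = "grep/FastContext" | "BGE-M3+Qdrant" | "GraphRAG" | "Graphiti" | "OKF";

// Hypothetical rules, checked in order; the first match wins.
const RULES: { engine: Engine; pattern: RegExp }[] = [
  { engine: "Graphiti", pattern: /\b(what changed|when did|history of)\b/i },
  { engine: "GraphRAG", pattern: /\b(relate|relates|related to|depends on)\b/i },
  { engine: "OKF", pattern: /^(define|definition of)\b/i },
  // Symbol-like text: camelCase, snake_case, a call like foo(, or a trailing file extension.
  { engine: "grep/FastContext", pattern: /[a-z][A-Z]|[a-z]_[a-z]|\w\(|\.\w{1,4}$/ },
];

function routeQuery(query: string): Engine {
  const q = query.trim();
  for (const rule of RULES) {
    if (rule.pattern.test(q)) return rule.engine;
  }
  return "BGE-M3+Qdrant"; // safe fallback: semantic search
}

const routed = [
  routeQuery("what changed in the pricing flow last week"),
  routeQuery("how does the Manifest relate to the Leash"),
  routeQuery("define Shore"),
  routeQuery("runGovernedTurn"),
  routeQuery("listings that feel overpriced for the area"),
];
```

Because the rules are plain data and the function has no hidden state, the same query always routes the same way, which is what makes each decision auditable after the fact.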

Why it matters

  • Research finding (arXiv:2605.15184, "Is Grep All You Need?"): the harness, not the retriever, decides accuracy — so the routing decision is the thing to own, govern, and make learnable. The router itself is receipted + reinforced by what evals well.

Status — ✅ v0 BUILT

  • Knowledge Router v0 is BUILT (services/breaker/src/knowledge-router/): deterministic per-query engine routing (≥95% on a labeled set) + vendor-neutral per-task harness routing, every decision receipted, context as a just-in-time projection, stigmergy-reinforced, safe fallback.

5 One Pen · Many Eyes

⚖️ Convergence

Many AI agents can read and argue, but only one is allowed to actually write — so they can't trip over each other. And when two agents genuinely disagree, BREAKER doesn't just pick one: it runs both as a real experiment and learns which was right.


How it actually works

  • Council agents share one Loro CRDT state object + Leash correlation IDs so they converge instead of diverging. Pattern: scout → plan → verify → ONE writer → merge gate (matches Cognition's "writes stay single-threaded" lesson).
  • Divergence = fuel (your directive): a detector flags material disagreement → forks parallel receipted A/B branches → a deterministic oracle (tests/eval, never the AI judging itself) picks the winner → winner merges, loser is kept as a learning, and the result reinforces routing.
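
The divergence path can be sketched as a small pure function. Everything here is a simplified assumption: the Proposal shape, the text-comparison detector, and the scoring oracle are invented for illustration, and resolveDivergence is a hypothetical name. Receipts, the CRDT state, and the single-writer merge gate are left out. The point it illustrates is that a deterministic scoring function decides, and a tie never produces a silent pick.

```typescript
type Proposal = { agent: string; patch: string };
type Outcome =
  | { kind: "agreed"; merged: Proposal }
  | { kind: "winner"; merged: Proposal; learning: Proposal }
  | { kind: "no-winner"; kept: Proposal[] };

// The oracle is an ordinary deterministic function (tests or evals), never a model.
type Oracle = (p: Proposal) => number;

// Hypothetical detector: any difference in the patch text counts as material.
function detectDivergence(a: Proposal, b: Proposal): boolean {
  return a.patch.trim() !== b.patch.trim();
}

function resolveDivergence(a: Proposal, b: Proposal, oracle: Oracle): Outcome {
  if (!detectDivergence(a, b)) return { kind: "agreed", merged: a };
  // Run both arms and score each with the same oracle.
  const scoreA = oracle(a);
  const scoreB = oracle(b);
  if (scoreA === scoreB) return { kind: "no-winner", kept: [a, b] };
  return scoreA > scoreB
    ? { kind: "winner", merged: a, learning: b }
    : { kind: "winner", merged: b, learning: a };
}

// Invented oracle: counts how many required checks a patch satisfies.
const checkOracle: Oracle = (p) =>
  (p.patch.indexOf("validate") >= 0 ? 1 : 0) + (p.patch.indexOf("retry") >= 0 ? 1 : 0);

const armA: Proposal = { agent: "planner", patch: "add validate step" };
const armB: Proposal = { agent: "verifier", patch: "add validate step with retry" };
const abOutcome = resolveDivergence(armA, armB, checkOracle);
```

The losing proposal stays in the result as a learning rather than being discarded, matching the "loser is kept" rule above.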

Status — ✅ now BUILT

  • The convergence runtime is BUILT (services/breaker/src/convergence/, PR #129): a shared Loro CRDT council state with one deterministic writer behind an invariant merge gate (the CRDT's only mutation path), every apply sealed by a council_apply receipt; it implements the CouncilStateApply port that Plane 2 calls. The convergence protocol (scout→plan→verify→one-writer→gate→receipt) and the divergence→A/B runtime (detectDivergence → receipted A/B arms → a deterministic oracle decides → winner merged via the writer + route reinforced; no-winner → both kept, never a silent pick) are built. 25 convergence tests.

6 It Gets Better · Safely

🌱 Evolution

BREAKER improves itself over time — better prompts, routing, skills — but every change is tested against a fixed yardstick, signed, and reversible. It can learn, but it can't secretly rewrite itself.


The self-improvement loop

  • propose → oracle-evaluate → canary → promote / roll-back. Every change is a falsifiable prediction + a typed evolution:* receipt. The optimizer can never edit its own oracle.

Three change tiers (the safety gradient)

  • Auto after eval: prompt wording, route weights, memory weights.
  • Canary + auto-rollback: agent configs, skill patches, DAG logic.
  • Human-approved: new tools, new spend, new op-vocab verbs, oracle changes.
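
The three tiers can be read as a lookup from change kind to the gate it must pass. The sketch below encodes the lists above directly; the ChangeKind labels, the Signals fields, and the function names tierFor and decide are hypothetical, and a failed eval is treated as a roll-back in every tier, which is an assumption rather than something stated here.

```typescript
type ChangeKind =
  | "prompt-wording" | "route-weights" | "memory-weights"
  | "agent-config" | "skill-patch" | "dag-logic"
  | "new-tool" | "new-spend" | "op-vocab-verb" | "oracle-change";
type Tier = "auto-after-eval" | "canary" | "human-approved";
type Decision = "promote" | "rollback" | "await-human";

function tierFor(kind: ChangeKind): Tier {
  switch (kind) {
    case "prompt-wording":
    case "route-weights":
    case "memory-weights":
      return "auto-after-eval";
    case "agent-config":
    case "skill-patch":
    case "dag-logic":
      return "canary";
    case "new-tool":
    case "new-spend":
    case "op-vocab-verb":
    case "oracle-change":
      return "human-approved";
  }
}

// Hypothetical signals gathered during the propose → evaluate → canary steps.
type Signals = { evalPassed: boolean; canaryHealthy: boolean; humanApproved: boolean };

function decide(kind: ChangeKind, s: Signals): Decision {
  if (!s.evalPassed) return "rollback"; // assumption: no tier promotes a failed eval
  const tier = tierFor(kind);
  if (tier === "auto-after-eval") return "promote";
  if (tier === "canary") return s.canaryHealthy ? "promote" : "rollback";
  return s.humanApproved ? "promote" : "await-human";
}

const promptDecision = decide("prompt-wording", { evalPassed: true, canaryHealthy: false, humanApproved: false });
const dagDecision = decide("dag-logic", { evalPassed: true, canaryHealthy: false, humanApproved: false });
const toolDecision = decide("new-tool", { evalPassed: true, canaryHealthy: true, humanApproved: false });
```

Riskier kinds of change need more evidence before promotion, and nothing in the top tier moves without a human, however well it evaluates.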

Grounded in the 2026 SOTA

  • Darwin-Gödel-Machine, ADAS, AlphaEvolve (evolutionary, gated) · Reflexion / Self-Refine · DSPy/MIPRO (prompt opt) · Voyager (skill libraries) · A-MEM (memory evolution) · Agentic Harness Engineering (the observability skeleton). Each maps onto a Fogbreak primitive (Leash / stigmergy / Manifest / GraphRAG / op-vocab).

Status — ✅ Phase-1 BUILT

  • Evolution Plane Phase-1 is BUILT (services/breaker/src/evolution/): council disagreement → receipted A/B arms → a deterministic oracle picks the winner (never the AI judging itself) → routing reinforced, loser kept as a learning. The optimizer is structurally barred from editing the oracle/verifier/promotion policy (proven by test). No code changes yet — prompt/route A/B only, so it's provably safe.

Start at harness level (safe); gate code self-modification behind sandbox + holdout evals + human review.

Where BREAKER sits in the real world

There's already a category of tools that give AI "durable rails" — they survive crashes and replay reliably. Our engine Kestra is one of them. BREAKER is the member of that category that is also cryptographically governed, knowledge-native, and self-improving.

The category: durable-execution-for-agents

Temporal (OpenAI · Replit · Cursor) | DBOS | Restate | Dapr Agents | Diagrid (NVIDIA · HSBC) | Inngest | Kestra ← our engine

They give: crash-survival, deterministic replay, audit logs.

BREAKER adds, on top:

  • 🔗 Leash: cryptographic, not just durable logs
  • 📐 Manifest: compiles rails and agents
  • 🧭 Knowledge Router
  • 🛡 bounded op-vocab
  • 🌱 Evolution plane

They're durable + replayable. BREAKER is durable + cryptographic + knowledge-routed + self-evolving.

Built on what we already run

Kestra      the deterministic engine, which already ships a native AI-Agent loop
Omnigent    the agent meta-harness over 7+ coding agents + model gateway + HITL
Leash       the signed hash-chained receipt store (verifier proven today)
Ferry       BYOK keys + per-call metering & spend gates
Knowledge   OKF · BGE-M3 · GraphRAG · Graphiti · stigmergy · FastContext

Next: build your Council by hand, in the Forge

The foundation is in. The next surface makes it visible and editable: a canvas in the Forge where you drag agents and wire their fan-outs — a supervisor delegating to workers, a swarm handing off, an evaluator looping on a judge. Pick a premade Council blueprint, then rearrange it freely.

The trick: the picture is the Manifest

The tree you draw isn't a diagram of the system — it is a BreakerManifest. Arrange it visually → the compiler (Plane 1, already built) turns it into real Kestra flows + Omnigent agents. Zero new orchestration — Kestra already ships the agent loop, agent-as-tool nesting, A2A, and judged loops; the canvas just edits the blueprint they run from.

nodes = agents | edges = delegate / hand-off / escalate | fan-out = parallel workers | judge node = the loop gate
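
The mapping from a drawn tree to a blueprint can be sketched as a pure conversion. The canvas and manifest shapes below are hypothetical (the real BreakerManifest fields are not shown in this document), and canvasToManifest is an illustrative name. It applies the four correspondences listed above: each node becomes an agent, delegate edges become its delegation targets, more than one target is treated as a parallel fan-out, and a judge node is marked as the loop gate.

```typescript
type EdgeKind = "delegate" | "hand-off" | "escalate";
type CanvasNode = { id: string; role: "agent" | "judge" };
type CanvasEdge = { from: string; to: string; kind: EdgeKind };

// Hypothetical manifest fragment produced from the canvas.
type ManifestAgent = {
  name: string;
  delegatesTo: string[];
  handsOffTo: string[];
  escalatesTo: string[];
  parallel: boolean; // true when the agent fans out to more than one worker
  loopGate: boolean; // true for judge nodes
};

function targets(edges: CanvasEdge[], from: string, kind: EdgeKind): string[] {
  return edges.filter((e) => e.from === from && e.kind === kind).map((e) => e.to);
}

function canvasToManifest(nodes: CanvasNode[], edges: CanvasEdge[]): { agents: ManifestAgent[] } {
  const ids = nodes.map((n) => n.id);
  // Reject dangling edges before producing anything.
  for (const e of edges) {
    if (ids.indexOf(e.from) < 0 || ids.indexOf(e.to) < 0) {
      throw new Error(`edge ${e.from} -> ${e.to} references a missing node`);
    }
  }
  return {
    agents: nodes.map((n) => {
      const delegatesTo = targets(edges, n.id, "delegate");
      return {
        name: n.id,
        delegatesTo,
        handsOffTo: targets(edges, n.id, "hand-off"),
        escalatesTo: targets(edges, n.id, "escalate"),
        parallel: delegatesTo.length > 1,
        loopGate: n.role === "judge",
      };
    }),
  };
}

// A supervisor fanning out to two workers, with a judge that escalates back.
const canvasManifest = canvasToManifest(
  [
    { id: "supervisor", role: "agent" },
    { id: "worker-a", role: "agent" },
    { id: "worker-b", role: "agent" },
    { id: "judge", role: "judge" },
  ],
  [
    { from: "supervisor", to: "worker-a", kind: "delegate" },
    { from: "supervisor", to: "worker-b", kind: "delegate" },
    { from: "judge", to: "supervisor", kind: "escalate" },
  ],
);
```

Since the conversion is a pure function of nodes and edges, rearranging the picture and recompiling is the whole edit loop; no separate orchestration layer is involved in this sketch.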

Watch it run — for real

Because the Leash is now live, the canvas lights up from the real run — per-node badges (pending → running → done / failed), animated delegation edges, cost + token per node — streamed from genuine Kestra execution states + Omnigent runs + signed Leash receipts. Not a mock. The cozy World and this schematic tree become two views of one true feed.

React Flow + BlockSuite canvas (already in our stack) | blueprint gallery (borrowed UX: n8n · Langflow · Sim Studio) | live overlay (LangSmith-Studio-grade)

The market gap (research, 2026-06-21): no product unifies a named-pattern gallery + free-form editing + a live receipt-backed run-overlay + framework-native export for agent trees. Fogbreak can — because the export target (the Manifest) and the receipts (the Leash) already exist. Grounded in docs/design/2026-06-21-visual-council-trees.md + off-meter swarm-trees research.

Phase-0 grounded research (2026-06-23) → foundation BUILT (PR #114, 4 planes) → all 6 planes BUILT + MERGED (2026-06-24): Plane 2 Execution (PR #130) + Plane 5 Convergence (PR #129) · 101 tests · GLM-authored off-meter, Claude-orchestrated. Next surface: the Forge visual Council builder.
a harness that learns — but cannot secretly mutate.