BREAKER

1 The Blueprint

📐 Manifest

One document describes a whole work-area (a Shore): its steps, who's allowed to do what, which AI agents run it, and where it pauses for you. From that single blueprint, BREAKER builds both the steady assembly-line and the AI workers — so the two can never drift apart.

How it actually works

One BreakerManifest (YAML) compiles to: Kestra FlowDefinitions (the deterministic rail) + Omnigent AgentSpecs (the AI council) + Leash gate policy + bounded op-vocab + knowledge bindings.
Single source of truth → the harness is generated, not hand-wired (today we hand-author Kestra YAML and agent configs separately — they drift).
The Manifest is the search space for self-improvement (Plane 6 can propose Manifest patches).
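
As a rough illustration of the single-source-of-truth idea above, the sketch below derives all five outputs from one in-memory manifest. Everything in it is hypothetical: the `Manifest` and `Step` field names, the `compile_manifest` function, and the output shapes are assumptions made for illustration. None of it is the real BreakerManifest schema, Kestra flow format, or Omnigent AgentSpec format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    # Hypothetical step shape: "agent" steps run an AI worker,
    # "pause" steps wait for a human, "task" steps are plain deterministic work.
    id: str
    kind: str
    agent: Optional[str] = None


@dataclass
class Manifest:
    # Hypothetical stand-in for a BreakerManifest; not the real schema.
    shore: str
    steps: List[Step]
    op_vocab: List[str]
    knowledge: List[str] = field(default_factory=list)


def compile_manifest(m: Manifest) -> dict:
    """Derive every harness artifact from the one manifest so they share a source."""
    agents = sorted({s.agent for s in m.steps if s.kind == "agent" and s.agent})
    return {
        "flow": {"id": m.shore, "tasks": [{"id": s.id, "kind": s.kind} for s in m.steps]},
        "agent_specs": [{"name": a, "allowed_ops": list(m.op_vocab)} for a in agents],
        "gate_policy": {"pause_at": [s.id for s in m.steps if s.kind == "pause"]},
        "op_vocab": list(m.op_vocab),
        "knowledge_bindings": list(m.knowledge),
    }


manifest = Manifest(
    shore="invoices",
    steps=[
        Step("extract", "agent", agent="scout"),
        Step("approve", "pause"),
        Step("post", "task"),
    ],
    op_vocab=["read_doc", "draft_entry"],
    knowledge=["okf/invoice-terms"],
)
artifacts = compile_manifest(manifest)
```

Because the flow, the agent specs, and the gate policy are all computed from the same `steps` list, a change to the manifest changes them together, which is the property the hand-authored setup lacks.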

Why it's novel

No vendor compiles one domain blueprint into both the workflow engine and the agents. (Amazon's Kiro does spec→tasks→code, which is close, but it does not produce both the workflow rail and the agents, and its output is not receipted.)

Generalizes our existing per-Shore ShoreManifest + op-vocab.ts.

2 The Two Hands

🤝 Execution

Two hands work together: a steady hand (the workflow engine) that does the same thing every time, and a creative hand (the AI) that figures things out. Neither is the boss. Every move passes through four safety gates.

The two co-equal layers

Kestra DAG (deterministic rail: stages, If, LoopUntil, Pause) ⊕ Omnigent agent loop (AI: AIAgent, council, swap-model). The DAG isn't a wrapper around the agent; the agent isn't a task in the DAG — they're co-equal and both emit receipts.

The four boundaries

Tool — 6-phase ALLOW / ASK / DENY + a bounded op-vocabulary: the AI may only emit ops from a fixed list; off-vocab → rejected (could_not_do). Constrains the action space, not just output shape.
Sandbox — bwrap / seatbelt + egress MITM (Omnigent).
Flow — Kestra Pause = human gate that can wait days, resume from any device.
Spend — every model call routes through Ferry (BYOK + metering); the spend event itself becomes a receipt.
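
The Tool boundary can be pictured with a small sketch: an op is first checked against the fixed vocabulary, and only then against an ALLOW / ASK / DENY policy. The vocabulary entries, the policy table, and the `gate` function are hypothetical, and the sketch does not model the six phases mentioned above. Defaulting unlisted ops to DENY is also an assumption.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"
    DENY = "deny"


# Hypothetical vocabulary and policy; a real Shore would define its own.
OP_VOCAB = {"read_file", "write_file", "send_email"}
POLICY = {"read_file": Decision.ALLOW, "write_file": Decision.ASK}


def gate(op: str) -> dict:
    """Check an op the AI emitted against the fixed vocabulary, then the policy."""
    if op not in OP_VOCAB:
        # Off-vocabulary ops are rejected before any policy is consulted.
        return {"op": op, "status": "could_not_do", "reason": "off-vocab"}
    # In-vocabulary ops with no explicit rule default to DENY (assumed fail-closed).
    decision = POLICY.get(op, Decision.DENY)
    return {"op": op, "status": decision.value}
```

The point of the first check is that the model cannot invent a new kind of action: an unknown verb never reaches the policy table at all.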

Precedent: Temporal's "deterministic workflow / LLM-in-activity" split (used by OpenAI, Replit, Cursor).

3 The Receipt Chain · "BRE"

🔗 Leash

Every meaningful action gets a tamper-proof, signed receipt, chained to the one before it: a notarized logbook from which nothing can be secretly edited out. If the chain ever breaks, BREAKER stops. This is "Boundary-Regulated Execution": the AI can only move forward through signed, checked steps.

How it actually works

Every state transition (not just every tool call) → a hash-chained, ECDSA-P256-signed receipt, cross-linking Kestra executionId ⊕ Omnigent conv_id.
Receipt-gated: downstream consumers verify the chain segment before reading new state; a broken chain fail-closes the harness.
Receipts are typed — task-execution vs evolution:* — so an auditor can answer "why did BREAKER change?"
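
A minimal sketch of the chaining and fail-closed read described above, using only the Python standard library. One substitution is made loudly: the source specifies ECDSA-P256 signatures, while this sketch uses HMAC-SHA256 as a stand-in so it stays dependency-free. The receipt field names and the functions `append_receipt`, `verify_chain`, and `read_state` are hypothetical and are not taken from exec-plane/leash/verify.py.

```python
import hashlib
import hmac
import json

KEY = b"demo-key"  # hypothetical shared key; the real design signs with ECDSA-P256
GENESIS = "0" * 64


def _digest(body: dict) -> str:
    # Canonical JSON so the same body always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def _sign(digest: str) -> str:
    return hmac.new(KEY, digest.encode(), hashlib.sha256).hexdigest()


def append_receipt(chain: list, kind: str, payload: dict) -> dict:
    """Append a receipt whose hash covers its kind, payload, and predecessor hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    body = {"kind": kind, "payload": payload, "prev": prev}
    digest = _digest(body)
    receipt = {**body, "hash": digest, "sig": _sign(digest)}
    chain.append(receipt)
    return receipt


def verify_chain(chain: list) -> bool:
    """Recompute every link; any edit, reorder, or deletion breaks a hash or a prev pointer."""
    prev = GENESIS
    for r in chain:
        body = {"kind": r["kind"], "payload": r["payload"], "prev": r["prev"]}
        if r["prev"] != prev or _digest(body) != r["hash"]:
            return False
        if not hmac.compare_digest(_sign(r["hash"]), r["sig"]):
            return False
        prev = r["hash"]
    return True


def read_state(chain: list) -> dict:
    """Receipt-gated read: refuse to return state if the chain does not verify."""
    if not verify_chain(chain):
        raise RuntimeError("receipt chain broken; failing closed")
    return chain[-1]["payload"] if chain else {}
```

A receipt's `kind` would carry the type distinction (for example `task-execution` versus an `evolution:*` value), and its payload is where a Kestra executionId and an Omnigent conv_id could be cross-linked.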

Status

✅ The verifier is already proven (exec-plane/leash/verify.py, CI-gated, tamper fixtures fail-closed). 🔨 The one piece to build: live runtime emission from Kestra/Omnigent — the single highest-leverage unblock.

Why it's novel

Durable-execution rivals (Temporal/DBOS/Restate) have logs — but mutable ones. None chain + sign across the engine⊕agent boundary. This is BREAKER's sharpest edge.

4 The Navigator · "AKER"

🧭 Knowledge Router

Instead of one dumb search, BREAKER picks the right way to find each answer — and remembers which routes worked (well-worn paths glow brighter). The AI's memory lives outside its short-term window and is pulled in only when needed.

grep / FastContext: exact symbol · recent file
BGE-M3 + Qdrant: fuzzy / meaning
GraphRAG: "how does X relate to Y"
Graphiti: "what changed / when"
OKF files: canonical definitions
stigmergy: well-worn paths brighten

Two kinds of routing

Knowledge-engine routing — a deterministic classifier sends each query to the engine that wins for that query shape (table above).
Harness routing — per task, route to the best agent backend (Claude SDK / Codex / Gemini / OpenHands…) on cost · risk · fit · prior success. We can be vendor-neutral; no vendor will route to its competitors.
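
Knowledge-engine routing could look something like the sketch below: an ordered list of rules maps a query shape to an engine, with the embedding search as the fallback, plus a small decay-and-deposit counter for the "well-worn paths" idea. The engine names come from the list earlier in this section; the regex rules, the `route` and `reinforce` functions, and the decay value are all hypothetical choices for illustration.

```python
import re

# Hypothetical rules, checked in order; the first match wins, so routing is deterministic.
RULES = [
    (re.compile(r"\b(what changed|when did|since)\b", re.I), "Graphiti"),
    (re.compile(r"\b(relate|depend|connect)", re.I), "GraphRAG"),
    (re.compile(r"\b(define|definition of)\b", re.I), "OKF files"),
    # Looks like a code symbol: a call, a dotted name, or camelCase.
    (re.compile(r"[A-Za-z_]\w*\(|\w+\.\w+|[a-z]+[A-Z]\w*"), "grep / FastContext"),
]
DEFAULT_ENGINE = "BGE-M3 + Qdrant"  # fuzzy / meaning-based fallback


def route(query: str) -> str:
    """Return the engine for this query shape; same query, same answer, every time."""
    for pattern, engine in RULES:
        if pattern.search(query):
            return engine
    return DEFAULT_ENGINE


trail = {}  # engine name -> pheromone-style weight


def reinforce(engine: str, success: bool, decay: float = 0.9) -> float:
    """Fade the old weight, then deposit 1.0 if the route evaluated well."""
    trail[engine] = trail.get(engine, 0.0) * decay + (1.0 if success else 0.0)
    return trail[engine]
```

In this sketch the trail weights are only recorded; how they would feed back into rule order or tie-breaking is left open, since the source does not say.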

Why it matters

Research finding (arXiv:2605.15184, "Is Grep All You Need?"): the harness, not the retriever, decides accuracy — so the routing decision is the thing to own, govern, and make learnable. The router itself is receipted + reinforced by what evals well.

5 One Pen · Many Eyes

⚖️ Convergence

Many AI agents can read and argue, but only one is allowed to actually write — so they can't trip over each other. And when two agents genuinely disagree, BREAKER doesn't just pick one: it runs both as a real experiment and learns which was right.

How it actually works

Council agents share one Loro CRDT state object + Leash correlation IDs so they converge instead of diverging. Pattern: scout → plan → verify → ONE writer → merge gate (matches Cognition's "writes stay single-threaded" lesson).
Divergence = fuel (your directive): a detector flags material disagreement → forks parallel receipted A/B branches → a deterministic oracle (tests/eval, never the AI judging itself) picks the winner → winner merges, loser is kept as a learning, and the result reinforces routing.
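
The divergence step can be sketched as follows: two disagreeing branches are both run, and a fixed test suite, not a model, decides between them. The `oracle` test cases, the two branch functions, and `run_ab` are hypothetical examples; a real oracle would be the Shore's own tests or evals.

```python
def oracle(fn) -> int:
    """Deterministic oracle: count passing fixed test cases. No model is asked to judge."""
    cases = [("Hello World", "hello-world"), ("  A  B ", "a-b"), ("x", "x")]
    passed = 0
    for arg, want in cases:
        try:
            if fn(arg) == want:
                passed += 1
        except Exception:
            pass  # a crashing branch simply scores lower
    return passed


def branch_a(s: str) -> str:
    # One agent's proposal: replace each space with a hyphen.
    return s.lower().replace(" ", "-")


def branch_b(s: str) -> str:
    # The other agent's proposal: split on whitespace runs, then join.
    return "-".join(s.lower().split())


def run_ab(branches: dict) -> dict:
    """Score every branch with the oracle and pick a winner reproducibly."""
    scores = {name: oracle(fn) for name, fn in branches.items()}
    # Sort by score, then by name, so ties resolve the same way on every run.
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return {"winner": ranked[0], "scores": scores, "learnings": ranked[1:]}


result = run_ab({"agent_a": branch_a, "agent_b": branch_b})
```

Here `agent_b` wins because its branch handles repeated and leading spaces; `agent_a` is kept in `learnings` rather than discarded, mirroring the "loser is kept as a learning" rule.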

6 It Gets Better · Safely

🌱 Evolution

BREAKER improves itself over time — better prompts, routing, skills — but every change is tested against a fixed yardstick, signed, and reversible. It can learn, but it can't secretly rewrite itself.

The self-improvement loop

propose → oracle-evaluate → canary → promote / roll-back. Every change is a falsifiable prediction + a typed evolution:* receipt. The optimizer can never edit its own oracle.
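
One way to picture the loop above is the sketch below. A proposal carries its prediction up front, the oracle is passed in as a read-only dependency, and the outcome is returned with a typed receipt. The `Proposal` class, the `evolve` function, and the specific receipt type strings (`evolution:rejected`, `evolution:rolled-back`, `evolution:promoted`) are hypothetical; the source only says receipts are typed `evolution:*`.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Proposal:
    name: str
    predicted_gain: float  # the falsifiable prediction, stated before evaluation
    apply: Callable[[dict], dict]  # returns a new config; must not mutate its input


def evolve(
    config: dict,
    proposal: Proposal,
    oracle: Callable[[dict], float],
    canary: Callable[[dict], bool],
) -> Tuple[dict, dict]:
    """Return (config_to_run, receipt). The oracle is only called, never modified."""
    baseline = oracle(config)
    candidate = proposal.apply(dict(config))  # work on a copy so rollback is trivial
    gain = oracle(candidate) - baseline
    receipt = {"change": proposal.name, "predicted": proposal.predicted_gain, "gain": gain}
    if gain < proposal.predicted_gain:
        # The prediction was falsified: keep the old config.
        return config, {**receipt, "type": "evolution:rejected"}
    if not canary(candidate):
        # Passed the eval but failed in limited rollout: roll back.
        return config, {**receipt, "type": "evolution:rolled-back"}
    return candidate, {**receipt, "type": "evolution:promoted"}
```

Because `evolve` never writes to the live config and takes the oracle as an argument, both rollback and the "cannot edit its own oracle" rule fall out of the function's shape in this sketch, rather than depending on the optimizer behaving well.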

Three change tiers (the safety gradient)

Auto after eval: prompt wording, route weights, memory weights.
Canary + auto-rollback: agent configs, skill patches, DAG logic.
Human-approved: new tools, new spend, new op-vocab verbs, oracle changes.
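
The three tiers above amount to a lookup from change kind to required gate. The sketch below encodes that lookup; the string keys are hypothetical labels for the change kinds listed, and sending unknown kinds to the strictest tier is an assumption rather than something the source states.

```python
from enum import Enum


class Tier(Enum):
    AUTO_AFTER_EVAL = 1
    CANARY_WITH_ROLLBACK = 2
    HUMAN_APPROVED = 3


# Change kinds taken from the three tiers listed above; key spellings are hypothetical.
TIER_BY_KIND = {
    "prompt_wording": Tier.AUTO_AFTER_EVAL,
    "route_weights": Tier.AUTO_AFTER_EVAL,
    "memory_weights": Tier.AUTO_AFTER_EVAL,
    "agent_config": Tier.CANARY_WITH_ROLLBACK,
    "skill_patch": Tier.CANARY_WITH_ROLLBACK,
    "dag_logic": Tier.CANARY_WITH_ROLLBACK,
    "new_tool": Tier.HUMAN_APPROVED,
    "new_spend": Tier.HUMAN_APPROVED,
    "new_op_vocab_verb": Tier.HUMAN_APPROVED,
    "oracle_change": Tier.HUMAN_APPROVED,
}


def tier_for(kind: str) -> Tier:
    # Assumption: an unrecognized change kind gets the strictest tier.
    return TIER_BY_KIND.get(kind, Tier.HUMAN_APPROVED)
```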

Grounded in the 2026 SOTA

Darwin-Gödel-Machine, ADAS, AlphaEvolve (evolutionary, gated) · Reflexion / Self-Refine · DSPy/MIPRO (prompt opt) · Voyager (skill libraries) · A-MEM (memory evolution) · Agentic Harness Engineering (the observability skeleton). Each maps onto a Fogbreak primitive (Leash / stigmergy / Manifest / GraphRAG / op-vocab).

Start at harness level (safe); gate code self-modification behind sandbox + holdout evals + human review.

First, the big idea: the harness is the moat

Where BREAKER sits in the real world

The category: durable-execution-for-agents

BREAKER adds, on top:

Built on what we already run