Fogbreak · the BREAKER harness
the rails for your AI

BREAKER

Most AI agents are a clever model on a loose leash. BREAKER is the leash, the rails, and the map: it lets the AI do real work end-to-end, but never lets it do anything that isn't checked, signed, and reversible.

Boundary-Regulated Execution & Agent Knowledge Engine Routing

deterministic rails + AI judgement

First, the big idea: the harness is the moat

Everyone can use the same AI models; the model is the easy, commodity part. The hard, valuable part is everything wrapped around it: the rules, the memory, the permissions, the recovery. That wrapper is called the harness.

~98% of a top coding agent (Claude Code, per a teardown) is harness; only ~2% is the AI itself

So the smartest move isn't a better model; it's a better harness. BREAKER is Fogbreak's harness, and it's where our durable advantage lives. It's built from six interlocking parts (we call them planes), and every one of them sits on top of tech we already have running: Kestra, Omnigent, the Leash, and our knowledge layers.

Grounded in: a Claude Code v2.1.88 teardown · arXiv:2605.18747 "Code as Agent Harness" · 4-model research (GLM · GPT-5.5 · Gemini · Perplexity).

The six planes

1 The Blueprint

📐 Manifest

One document describes a whole work-area (a Shore): its steps, who's allowed to do what, which AI agents run it, and where it pauses for you. From that single blueprint, BREAKER builds both the steady assembly-line and the AI workers, so the two can never drift apart.


How it actually works

  • One BreakerManifest (YAML) compiles to: Kestra FlowDefinitions (the deterministic rail) + Omnigent AgentSpecs (the AI council) + Leash gate policy + bounded op-vocab + knowledge bindings.
  • Single source of truth → the harness is generated, not hand-wired (today we hand-author Kestra YAML and agent configs separately, so they drift).
  • The Manifest is the search space for self-improvement (Plane 6 can propose Manifest patches).
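
A minimal sketch of the "one blueprint, many artifacts" idea, assuming a much-simplified manifest. The names BreakerManifest fields, Step, and compile_manifest are hypothetical illustrations, and the output dictionaries are not the real Kestra or Omnigent schemas; the point is only that the flow, the agent specs, and the gates are all derived from the same object.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Step:
    name: str
    kind: str                    # "task" (deterministic), "agent" (AI), or "pause" (human gate)
    agent: Optional[str] = None  # which agent runs the step, if any


@dataclass(frozen=True)
class BreakerManifest:
    shore: str
    steps: Tuple[Step, ...]
    op_vocab: FrozenSet[str]


def compile_manifest(m: BreakerManifest) -> dict:
    """Derive every artifact from the one manifest so they cannot drift apart."""
    flow = {
        "id": f"{m.shore}-flow",
        "tasks": [{"id": s.name, "kind": s.kind} for s in m.steps],
    }
    agents = [
        {"name": s.agent, "step": s.name, "allowed_ops": sorted(m.op_vocab)}
        for s in m.steps
        if s.kind == "agent"
    ]
    gates = [s.name for s in m.steps if s.kind == "pause"]
    return {"flow": flow, "agents": agents, "gates": gates}


manifest = BreakerManifest(
    shore="invoicing",
    steps=(
        Step("extract", "agent", agent="scout"),
        Step("approve", "pause"),
        Step("post", "task"),
    ),
    op_vocab=frozenset({"read_invoice", "draft_entry"}),
)
artifacts = compile_manifest(manifest)
```

Because the manifest is immutable and the compile step is a pure function, a proposed Manifest patch can be compiled and diffed before anything is deployed.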

Why it's novel

  • No vendor compiles one domain blueprint into both the workflow engine and the agents. (Amazon's Kiro does spec→tasks→code: close, but not both rails + agents, and not receipted.)

Generalizes our existing per-Shore ShoreManifest + op-vocab.ts.

2 The Two Hands

🀝 Execution

Two hands work together: a steady hand (the workflow engine) that does the same thing every time, and a creative hand (the AI) that figures things out. Neither is the boss. Every move passes through four safety gates.


The two co-equal layers

  • Kestra DAG (deterministic rail: stages, If, LoopUntil, Pause) ⊕ Omnigent agent loop (AI: AIAgent, council, swap-model). The DAG isn't a wrapper around the agent, and the agent isn't a task in the DAG: they're co-equal and both emit receipts.

The four boundaries

  • Tool: 6-phase ALLOW / ASK / DENY plus a bounded op-vocabulary. The AI may only emit ops from a fixed list; off-vocab → rejected (could_not_do). This constrains the action space, not just output shape.
  • Sandbox: bwrap / seatbelt + egress MITM (Omnigent).
  • Flow: Kestra Pause = human gate that can wait days and resume from any device.
  • Spend: every model call routes through Ferry (BYOK + metering); the spend event itself becomes a receipt.
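
The Tool boundary can be sketched as a lookup against a fixed vocabulary. This is an illustration under assumptions: the op names and the OP_VOCAB table are hypothetical, and the six phases mentioned above are collapsed into a single check here because the source does not enumerate them.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    ASK = "ask"    # pause for a human decision
    DENY = "deny"


# Hypothetical vocabulary: op name -> verdict. Anything absent is off-vocab.
OP_VOCAB = {
    "read_file": Verdict.ALLOW,
    "write_file": Verdict.ASK,
    "delete_branch": Verdict.DENY,
}


def gate(op: str) -> dict:
    """Check one proposed op against the fixed vocabulary before it runs."""
    verdict = OP_VOCAB.get(op)
    if verdict is None:
        # Off-vocab: rejected outright rather than coerced into a known op.
        return {"op": op, "status": "could_not_do", "reason": "off-vocab"}
    return {"op": op, "status": verdict.value}


decisions = [gate(op) for op in ("read_file", "write_file", "rm_rf")]
```

The design choice worth noticing is that the check happens on the op name itself, before execution, so an agent cannot reach an action that is not on the list no matter how its output is shaped.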

Precedent: Temporal's "deterministic workflow / LLM-in-activity" split (used by OpenAI, Replit, Cursor).

3 The Receipt Chain · "BRE"

🔗 Leash

Every meaningful action gets a tamper-proof, signed receipt, chained to the one before it: a notarized logbook that nothing can be secretly edited out of. If the chain ever breaks, BREAKER stops. This is the "Boundary-Regulated Execution": the AI can only move forward through signed, checked steps.


How it actually works

  • Every state transition (not just every tool call) → a hash-chained, ECDSA-P256-signed receipt, cross-linking Kestra executionId ⊕ Omnigent conv_id.
  • Receipt-gated: downstream consumers verify the chain segment before reading new state; a broken chain fail-closes the harness.
  • Receipts are typed (task-execution vs evolution:*), so an auditor can answer "why did BREAKER change?"
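
A minimal sketch of a hash-chained, signed receipt log with a fail-closed verifier. To stay standard-library-only it signs with HMAC-SHA256 as a stand-in for ECDSA-P256, so it shows the chaining and verification logic, not the real signature scheme. The field names, the demo key, and the function names are assumptions, and this is not the code in exec-plane/leash/verify.py.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64
KEY = b"demo-key"  # hypothetical; the real design signs with an ECDSA-P256 key


def _digest(body: dict) -> str:
    """Hash a canonical JSON encoding so key order cannot change the result."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def _sign(digest: str) -> str:
    return hmac.new(KEY, digest.encode(), hashlib.sha256).hexdigest()


def append_receipt(chain: list, kind: str, payload: dict) -> dict:
    """Append a typed receipt that commits to the hash of the previous one."""
    prev = chain[-1]["hash"] if chain else GENESIS
    body = {"kind": kind, "payload": payload, "prev": prev}
    digest = _digest(body)
    receipt = {**body, "hash": digest, "sig": _sign(digest)}
    chain.append(receipt)
    return receipt


def verify_chain(chain: list) -> bool:
    """Return False on the first broken link, bad hash, or bad signature."""
    prev = GENESIS
    for r in chain:
        body = {"kind": r["kind"], "payload": r["payload"], "prev": r["prev"]}
        digest = _digest(body)
        if r["prev"] != prev or r["hash"] != digest:
            return False
        if not hmac.compare_digest(r["sig"], _sign(digest)):
            return False
        prev = digest
    return True


chain = []
append_receipt(chain, "task-execution", {"executionId": "ex-1", "conv_id": "c-1"})
append_receipt(chain, "evolution:prompt", {"change": "reworded system prompt"})
ok_before = verify_chain(chain)
chain[0]["payload"]["executionId"] = "ex-999"  # simulate tampering
ok_after = verify_chain(chain)
```

A consumer that calls verify_chain before reading new state, and refuses to proceed on False, gets the fail-closed behaviour described above.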

Status

  • ✅ The verifier is already proven (exec-plane/leash/verify.py, CI-gated, tamper fixtures fail-closed).
  • 🔨 The one piece to build: live runtime emission from Kestra/Omnigent, the single highest-leverage unblock.

Why it's novel

  • Durable-execution rivals (Temporal/DBOS/Restate) have logs, but mutable ones. None chain + sign across the engine⊕agent boundary. This is BREAKER's sharpest edge.

4 The Navigator · "AKER"

🧭 Knowledge Router

Instead of one dumb search, BREAKER picks the right way to find each answer, and remembers which routes worked (well-worn paths glow brighter). The AI's memory lives outside its short-term window and is pulled in only when needed.

Engine                Best for
grep / FastContext    exact symbol · recent file
BGE-M3 + Qdrant       fuzzy / meaning
GraphRAG              "how does X relate to Y"
Graphiti              "what changed / when"
OKF files             canonical definitions
stigmergy             well-worn paths brighten

Two kinds of routing

  • Knowledge-engine routing: a deterministic classifier sends each query to the engine that wins for that query shape (table above).
  • Harness routing: per task, route to the best agent backend (Claude SDK / Codex / Gemini / OpenHands…) on cost · risk · fit · prior success. We can be vendor-neutral; no vendor will route to its competitors.
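
Both kinds of routing can be sketched as small deterministic functions. Everything here is an assumption made for illustration: the regular expressions are a toy stand-in for the real classifier, and the scoring formula and weights in pick_backend are hypothetical, since the source names the factors (cost, risk, fit, prior success) but not how they combine.

```python
import re


def route_query(query: str) -> str:
    """Toy rule-based classifier: same query in, same engine out."""
    q = query.lower().strip()
    if re.search(r"\b(relate|depends on|connect)", q):
        return "GraphRAG"              # "how does X relate to Y"
    if re.search(r"\b(changed|when|since|history)\b", q):
        return "Graphiti"              # "what changed / when"
    if re.search(r"\b(define|definition|what is)\b", q):
        return "OKF files"             # canonical definitions
    if re.fullmatch(r"[A-Za-z_][\w.]*(\(\))?", query.strip()):
        return "grep / FastContext"    # a bare identifier: exact symbol
    return "BGE-M3 + Qdrant"           # everything else: fuzzy / meaning


def pick_backend(candidates: dict, weights: dict) -> str:
    """Score each backend; fit and prior success help, cost and risk hurt."""
    def score(f: dict) -> float:
        return (
            weights["fit"] * f["fit"]
            + weights["prior_success"] * f["prior_success"]
            - weights["cost"] * f["cost"]
            - weights["risk"] * f["risk"]
        )
    # Iterating names in sorted order makes ties resolve deterministically.
    return max(sorted(candidates), key=lambda name: score(candidates[name]))


engine = route_query("how does Ferry relate to Leash")
backend = pick_backend(
    {
        "backend-a": {"cost": 0.8, "risk": 0.2, "fit": 0.9, "prior_success": 0.7},
        "backend-b": {"cost": 0.3, "risk": 0.2, "fit": 0.7, "prior_success": 0.6},
    },
    {"cost": 1.0, "risk": 1.0, "fit": 1.0, "prior_success": 1.0},
)
```

In this example the cheaper backend-b wins despite a slightly lower fit, which is the kind of trade-off a vendor-neutral router is free to make.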

Why it matters

  • Research finding (arXiv:2605.15184, "Is Grep All You Need?"): the harness, not the retriever, decides accuracy, so the routing decision is the thing to own, govern, and make learnable. The router itself is receipted + reinforced by what evals well.
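
One way to picture "reinforced by what evals well" is a trail weight per (query shape, engine) pair that moves toward each new eval score. This is a sketch under assumptions: the source does not specify the update rule, so the exponential moving average, the RouteTrails class, and its parameters are all hypothetical.

```python
from collections import defaultdict


class RouteTrails:
    """Keeps one weight per (query shape, engine) trail."""

    def __init__(self, rate: float = 0.1, start: float = 0.5):
        self.rate = rate
        self.weights = defaultdict(lambda: start)

    def reinforce(self, query_shape: str, engine: str, eval_score: float) -> None:
        """Move the trail weight a step toward the latest eval score in [0, 1]."""
        key = (query_shape, engine)
        self.weights[key] = (1 - self.rate) * self.weights[key] + self.rate * eval_score

    def best(self, query_shape: str, engines: list) -> str:
        """Pick the engine whose trail is currently brightest for this shape."""
        return max(engines, key=lambda e: self.weights[(query_shape, e)])


trails = RouteTrails()
for _ in range(3):
    trails.reinforce("exact-symbol", "grep", 1.0)    # evaluated well
trails.reinforce("exact-symbol", "vector", 0.2)      # evaluated poorly
choice = trails.best("exact-symbol", ["vector", "grep"])
```

Because old scores fade as new ones arrive, a route that stops evaluating well loses its lead over time instead of staying preferred forever.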

5 One Pen · Many Eyes

⚖️ Convergence

Many AI agents can read and argue, but only one is allowed to actually write, so they can't trip over each other. And when two agents genuinely disagree, BREAKER doesn't just pick one: it runs both as a real experiment and learns which was right.


How it actually works

  • Council agents share one Loro CRDT state object + Leash correlation IDs so they converge instead of diverging. Pattern: scout → plan → verify → ONE writer → merge gate (matches Cognition's "writes stay single-threaded" lesson).
  • Divergence = fuel (your directive): a detector flags material disagreement → forks parallel receipted A/B branches → a deterministic oracle (tests/eval, never the AI judging itself) picks the winner → winner merges, loser is kept as a learning, and the result reinforces routing.
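
The divergence path can be sketched as: run each competing proposal against the same fixed test cases and let the pass count decide. The function name, the result shape, and the example agents are hypothetical; the sketch leaves out the CRDT state and the receipts and shows only the deterministic-oracle step.

```python
from typing import Callable, Dict, List, Tuple


def resolve_divergence(
    proposals: Dict[str, Callable],
    oracle_cases: List[Tuple[tuple, object]],
) -> dict:
    """Score every branch on the same fixed cases; no model judges itself."""
    scores = {}
    for agent, candidate in proposals.items():
        passed = 0
        for args, expected in oracle_cases:
            try:
                if candidate(*args) == expected:
                    passed += 1
            except Exception:
                pass  # a crashing branch simply fails that case
        scores[agent] = passed
    # Sorted iteration keeps tie-breaking deterministic.
    winner = max(sorted(scores), key=lambda a: scores[a])
    learnings = [a for a in sorted(scores) if a != winner]
    return {"winner": winner, "scores": scores, "learnings": learnings}


# Two agents disagree on how to clamp a value into [lo, hi].
proposals = {
    "agent_a": lambda x, lo, hi: max(lo, min(x, hi)),
    "agent_b": lambda x, lo, hi: min(x, hi),  # forgets the lower bound
}
cases = [((5, 0, 3), 3), ((-1, 0, 3), 0), ((2, 0, 3), 2)]
result = resolve_divergence(proposals, cases)
```

Only the winner would go on to the single writer and the merge gate; the losing branch is returned under learnings rather than discarded.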

6 It Gets Better · Safely

🌱 Evolution

BREAKER improves itself over time (better prompts, routing, skills), but every change is tested against a fixed yardstick, signed, and reversible. It can learn, but it can't secretly rewrite itself.


The self-improvement loop

  • propose → oracle-evaluate → canary → promote / roll-back. Every change is a falsifiable prediction + a typed evolution:* receipt. The optimizer can never edit its own oracle.
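
The loop above can be sketched as a single function that either promotes a patch or leaves the config untouched. This is an illustration under assumptions: evolve, the receipt fields, and the "evolution:proposal" type name are hypothetical, and the oracle and canary are passed in as plain callables so the function has no way to alter them.

```python
from typing import Callable


def evolve(
    current: dict,
    patch: dict,
    oracle: Callable[[dict], float],
    canary: Callable[[dict], bool],
    predicted_gain: float,
) -> dict:
    """propose -> oracle-evaluate -> canary -> promote / roll-back."""
    candidate = {**current, **patch}
    baseline = oracle(current)
    score = oracle(candidate)
    receipt = {
        "kind": "evolution:proposal",
        "patch": patch,
        "predicted_gain": predicted_gain,
        "baseline": baseline,
        "score": score,
    }
    if score - baseline < predicted_gain:
        # The falsifiable prediction failed: keep the current config.
        return {"config": current, "receipt": {**receipt, "outcome": "rejected"}}
    if not canary(candidate):
        return {"config": current, "receipt": {**receipt, "outcome": "rolled-back"}}
    return {"config": candidate, "receipt": {**receipt, "outcome": "promoted"}}


def fixed_oracle(cfg: dict) -> float:
    """Stand-in for a held-out eval suite."""
    return 0.9 if cfg["prompt"] == "v2" else 0.7


promoted = evolve({"prompt": "v1"}, {"prompt": "v2"}, fixed_oracle, lambda c: True, 0.1)
rejected = evolve({"prompt": "v1"}, {"prompt": "v2"}, fixed_oracle, lambda c: True, 0.5)
rolled = evolve({"prompt": "v1"}, {"prompt": "v2"}, fixed_oracle, lambda c: False, 0.1)
```

Every branch returns a receipt, including the failures, so the record of what was tried survives even when nothing changes.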

Three change tiers (the safety gradient)

  • Auto after eval: prompt wording, route weights, memory weights.
  • Canary + auto-rollback: agent configs, skill patches, DAG logic.
  • Human-approved: new tools, new spend, new op-vocab verbs, oracle changes.
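
The three tiers map naturally onto a lookup table. The change-kind identifiers below are hypothetical spellings of the items listed above, and sending unknown kinds to the strictest tier is an assumption made here to match the fail-closed posture described elsewhere, not something the source states.

```python
TIERS = {
    "auto_after_eval": {"prompt_wording", "route_weights", "memory_weights"},
    "canary_auto_rollback": {"agent_config", "skill_patch", "dag_logic"},
    "human_approved": {"new_tool", "new_spend", "new_op_vocab_verb", "oracle_change"},
}


def tier_for(change_kind: str) -> str:
    """Return the approval tier a proposed change must pass through."""
    for tier, kinds in TIERS.items():
        if change_kind in kinds:
            return tier
    # Assumption: an unrecognized kind of change gets the strictest treatment.
    return "human_approved"


plan = {kind: tier_for(kind) for kind in ("route_weights", "dag_logic", "oracle_change")}
```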

Grounded in the 2026 SOTA

  • Darwin Gödel Machine, ADAS, AlphaEvolve (evolutionary, gated) · Reflexion / Self-Refine · DSPy/MIPRO (prompt opt) · Voyager (skill libraries) · A-MEM (memory evolution) · Agentic Harness Engineering (the observability skeleton). Each maps onto a Fogbreak primitive (Leash / stigmergy / Manifest / GraphRAG / op-vocab).

Start at harness level (safe); gate code self-modification behind sandbox + holdout evals + human review.

Where BREAKER sits in the real world

There's already a category of tools that give AI "durable rails": they survive crashes and replay reliably. Our engine Kestra is one of them. BREAKER is the member of that category that is also cryptographically governed, knowledge-native, and self-improving.

The category: durable-execution-for-agents

  • Temporal (OpenAI · Replit · Cursor)
  • DBOS
  • Restate
  • Dapr Agents
  • Diagrid (NVIDIA · HSBC)
  • Inngest
  • Kestra ← our engine

They give: crash-survival, deterministic replay, audit logs.

BREAKER adds, on top:

  • 🔗 Leash: cryptographic, not just durable logs
  • 📐 Manifest: compiles rails and agents
  • 🧭 Knowledge Router
  • 🛡 bounded op-vocab
  • 🌱 Evolution plane

They're durable + replayable. BREAKER is durable + cryptographic + knowledge-routed + self-evolving.

Built on what we already run

Kestra       the deterministic engine, which already ships a native AI-Agent loop
Omnigent     the agent meta-harness over 7+ coding agents + model gateway + HITL
Leash        the signed hash-chained receipt store (verifier proven today)
Ferry        BYOK keys + per-call metering & spend gates
Knowledge    OKF · BGE-M3 · GraphRAG · Graphiti · stigmergy · FastContext

🌫 Fogbreak Β· the BREAKER harness β€” Boundary-Regulated Execution & Agent Knowledge Engine Routing
Phase-0 grounded research (2026-06-23). Co-designed with GLM-5.2 · GPT-5.5 · Gemini · Perplexity, off the Anthropic meter.
a harness that learns, but cannot secretly mutate.