
Agent loops and backpressure

A coding agent produces fast and you review slow. When the only feedback that bites comes from you, every mistake becomes review work. Backpressure is feedback that reaches the agent before it reaches you: a failing scenario the agent reads and repairs on its own. When the agent runs in a loop, unattended, the same signals become the loop’s worklist and its stopping condition.

The four commands below read the raw-run file your framework already writes (raw-run.json in the examples on this page). They do not replace your test runner or scheduler. They supply the verification an agent loop needs to run without you watching every turn.

| Loop phase | Command | What it gives the agent |
| --- | --- | --- |
| Discover | triage | a ranked worklist: failing scenarios, regressions first, each with the code it covers |
| Act | check | a fast per-turn signal: passing collapses to a count, failing expands to intent + error + covers |
| Terminate | goal | a behavioral definition-of-done it cannot fake |
| Remember | traceability-matrix | requirement coverage on disk that survives between runs |

Run check after the tests on every change. Passing scenarios collapse to a count line. Each failing scenario expands to its Given/When/Then, the step that broke, the error, and the product code it covers.

executable-stories check .executable-stories/raw-run.json --baseline reports/previous.json
✓ 47 passed ✗ 1 failed (48 scenarios)
✗ Expired session redirects to login (regressed)
src/auth/session.story.test.ts:31
Given an expired session
When the user opens /dashboard
✗ Then they are redirected to /login
→ expected redirect to /login, received 200
covers: src/auth/session.ts, src/middleware/auth.ts
ticket: AUTH-123
⚠ 1 regressed since baseline (was passing).

check exits 5 when any scenario fails, so the agent’s loop can react before a human reads the report. Pass --no-fail to report only, or --check-format json for structured output the agent can parse. A baseline adds the “N regressed / N fixed” deltas.
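A loop can branch on that exit code without parsing the report. The sketch below shows one way to do it in TypeScript. `TurnAction` and `actionForCheckExit` are hypothetical names, and the sketch assumes that check exits 0 when nothing failed and that any other code means the tool itself could not run; only exit 5 is stated above.

```typescript
// Hypothetical helper: turn the exit code of `executable-stories check`
// into the loop's next move. Only exit 5 (a scenario failed) comes from
// the documentation; treating 0 as "all passed" and every other value as
// a tool error is an assumption of this sketch.
type TurnAction = "proceed" | "repair" | "abort";

function actionForCheckExit(status: number | null): TurnAction {
  if (status === 0) return "proceed"; // assumed: no failing scenarios
  if (status === 5) return "repair"; // documented: at least one scenario failed
  return "abort"; // assumed: crash, bad arguments, or missing raw-run file
}
```

If the command is launched with Node's `child_process.spawnSync`, the `status` field of its result can be passed straight in. That field is `null` when the process was terminated by a signal, which this sketch treats as a reason to abort.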

The scheduled automation that opens the loop needs a queue of what to work on, not a full report. triage lists failing scenarios, regressions first, each carrying its covers paths so the loop knows which files to point a fixer at.

executable-stories triage .executable-stories/raw-run.json \
--baseline reports/last-green.json --triage-format json

Failures with no covers are flagged: the loop cannot route them to code, so they need a covers annotation or a human first. triage always exits 0 — it reports work, it does not gate.
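The routing rule described here can be sketched as a small function that groups failing scenarios by the files they cover and sets aside the ones with no covers. The `TriageItem` shape is hypothetical: the JSON that `--triage-format json` actually emits may use different field names, so treat this as an illustration of the routing idea rather than a parser for the real output.

```typescript
// Hypothetical shape for one triage entry. The real JSON produced by
// `--triage-format json` may differ; adjust the field names to match it.
interface TriageItem {
  scenario: string;
  covers: string[];
}

interface RoutedWork {
  byFile: Map<string, string[]>; // covered file -> failing scenario names
  needsHuman: string[]; // failures that carry no covers paths
}

function routeTriage(items: TriageItem[]): RoutedWork {
  const byFile = new Map<string, string[]>();
  const needsHuman: string[] = [];
  for (const item of items) {
    if (item.covers.length === 0) {
      // Cannot be routed to code: needs an annotation or a person first.
      needsHuman.push(item.scenario);
      continue;
    }
    for (const file of item.covers) {
      const queued = byFile.get(file) ?? [];
      queued.push(item.scenario);
      byFile.set(file, queued);
    }
  }
  return { byFile, needsHuman };
}
```

Input order is preserved within each file's queue, so if triage lists regressions first, each fixer sees them first as well.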

A /goal-style loop runs until a verifiable condition holds. goal expresses that condition in behavior. It is met when the required scenarios pass, nothing regressed (with --no-regressions), and no scenario was removed, disabled, or had steps deleted versus the baseline.

executable-stories goal .executable-stories/raw-run.json \
--require-tickets US-101 --baseline reports/last-green.json --no-regressions
GOAL: not met
ticket:US-101: 2/3 scenarios pass (1 failing)
regressions: 0
ratchet: clean (0 scenarios removed/weakened)

Exit 0 means met, 5 means not yet, so a loop runs until the verdict flips. Declare the target with --require-tags, --require-tickets, or --require-scenarios; with none given, the goal is “every scenario passes”.
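A minimal sketch of a loop driven by those two exit codes follows. `loopUntilGoal` and `runTurn` are hypothetical: `runTurn` stands in for whatever lets the agent work, runs the tests, runs goal, and returns its exit code. The turn budget is an assumption of the sketch, not a feature of the CLI.

```typescript
// Hypothetical loop driver. `runTurn` performs one agent turn and returns
// the exit code of `executable-stories goal ...` for that turn.
type LoopResult = { met: boolean; turns: number };

function loopUntilGoal(runTurn: () => number, maxTurns: number): LoopResult {
  for (let turn = 1; turn <= maxTurns; turn++) {
    const exitCode = runTurn();
    if (exitCode === 0) return { met: true, turns: turn }; // goal met
    if (exitCode !== 5) {
      // Anything other than 0 or 5 is outside the documented contract.
      throw new Error(`goal exited with unexpected code ${exitCode}`);
    }
    // Exit 5: not met yet, so take another turn.
  }
  return { met: false, turns: maxTurns }; // budget spent, hand back to a human
}
```

Bounding the loop keeps an agent that never reaches the goal from running indefinitely; what happens when the budget runs out is left to the caller.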

The ratchet matters for an unattended loop. An agent that can make “done” true by deleting the failing scenario will eventually try it. With a baseline, goal refuses a “done” that dropped, skipped, or shortened a scenario. Disable it with --no-ratchet if you need to.
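To make the idea concrete, here is a conceptual sketch of a ratchet comparison. It is not how executable-stories implements the check: `ScenarioSnapshot` is a hypothetical, simplified view of a scenario, and comparing step counts is only one possible way to detect deleted steps.

```typescript
// Conceptual sketch only, not the tool's implementation.
// ScenarioSnapshot is a hypothetical, simplified record of one scenario.
interface ScenarioSnapshot {
  id: string;
  skipped: boolean;
  stepCount: number;
}

function ratchetViolations(
  baseline: ScenarioSnapshot[],
  current: ScenarioSnapshot[],
): string[] {
  const now = new Map<string, ScenarioSnapshot>();
  for (const s of current) now.set(s.id, s);

  const violations: string[] = [];
  for (const before of baseline) {
    const after = now.get(before.id);
    if (!after) {
      violations.push(`${before.id}: removed`);
    } else if (after.skipped && !before.skipped) {
      violations.push(`${before.id}: skipped`);
    } else if (after.stepCount < before.stepCount) {
      violations.push(`${before.id}: steps deleted`);
    }
  }
  return violations; // empty means the ratchet is clean
}
```

New scenarios in the current run never count as violations here; only scenarios that existed in the baseline can be weakened.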

Tomorrow’s run reads where today’s stopped. The behavior artifacts on disk are that memory. The traceability matrix is the requirement-first view: each ticket, the scenarios that verify it, the code they cover, and whether they pass, plus any scenario linked to no requirement.

executable-stories format reports/raw-run.json \
--format traceability-matrix --output-dir reports --output-name index

See the Agent artifact contract for the StoryReport, scenario index, and behavior manifest an agent also reads.

The fastest backpressure runs without anyone asking. Add it to CLAUDE.md or AGENTS.md:

After changing code:
- run the tests
- run: executable-stories check .executable-stories/raw-run.json --baseline reports/last-green.json
- fix every failure before continuing. Do not edit or skip a scenario to make it pass.
Stopping condition for this task:
- executable-stories goal .executable-stories/raw-run.json --require-tickets <TICKET> --baseline reports/last-green.json --no-regressions
- the task is done only when this exits 0.

A loop running unattended is also a loop making mistakes unattended. goal makes “done” a checked predicate over behavior and the ratchet stops the obvious cheats, but a green goal is evidence, not a guarantee you specified the right behavior. The Given/When/Then in check and the requirement view in traceability-matrix are there so you can read what the loop produced in behavior terms, fast, instead of reverse-engineering diffs. Build the loop, and stay the engineer who reads what it made.

  • Agent artifact contract: the StoryReport, scenario index, and behavior manifest.
  • MCP server: get_failing_scenarios, get_scenarios_for_paths, get_behavior_diff, run_scenario.
  • Release confidence: the before-PR and release gates (compare, gate-release).