Harness Engineering

BSW.DevSpark's harness runtime is an optional CLI execution layer for repeatable engineering workflows. It is additive: the prompt-first slash-command workflow remains unchanged, while the CLI adds a way to validate, execute, and inspect declarative workflow specs.

This page documents what is currently implemented in the repository.

CLI required: every devspark terminal command on this page requires the BSW.DevSpark CLI. Install it once with:
uv tool install devspark-cli --force --find-links https://bsw-devspark.bswhive.com/dist/cli/
Prompt-first slash commands (/devspark.*) work without the CLI and are documented in the Implementation Lifecycle Guide.

devspark.run — Development Workflow Aliases

devspark run <alias> is the fastest path through the spec-driven development cycle when the CLI is installed. It chains atomic prompts into a single terminal command with built-in pause points and structured artifact output.

These are CLI-only commands. There is no /devspark.run slash command and no backing file exists in .claude/commands/. Without the CLI, run the atomic commands manually in sequence (see Without the CLI below).

Available Aliases

Alias	Chains	Pause point	Output
`create-spec`	`specify → plan → tasks → analyze`	After `analyze` — review before implementing	Reviewable spec artifact
`execute-plan`	`implement → create-pr → pr-review`	After `create-pr` — confirm PR before review runs	Pull request
`suggest-improvement`	`capture-context → classify-improvement → create-issue → (assign-agent) → (implement)`	No pause checkpoint; a confirmation prompt appears unless `--yes` is passed	GitHub issue link
`full-cycle`	`specify → plan → tasks → critic → analyze → tasks (remediate) → implement → create-pr → pr-review`	None — `autonomy.level: autonomous` with guardrails instead	Pull request

full-cycle is intentionally different from the other three: it has no pause_after/review_after steps at all. Safety comes from guardrails (file/line/path limits — see Autonomy Model), which hard-block rather than pause under autonomy.level: autonomous. Like the others, devspark run full-cycle only sequences steps and tracks telemetry — it expects an agent already driving the conversation to execute each one. For execution with no agent watching, see Full Unattended Lifecycle below.

Usage

# Start a new feature from scratch
devspark run create-spec

# Execute an existing plan through to a reviewed PR
devspark run execute-plan

# File a workflow improvement against bswhealth/bsw.devspark
devspark run suggest-improvement
devspark run suggest-improvement --yes    # skip confirmation prompt

Pause and Resume

create-spec and execute-plan pause at defined checkpoints so a human can review before the workflow continues. When a pause fires, the CLI prints the exact resume command:

devspark resume <run_id>

Pause state is saved at .documentation/telemetry/runs/<run_id>.json. On resume, BSW.DevSpark verifies the persisted schema_version, workflow_id, and context_checksum — any mismatch exits with code 25 (EXIT_RESUME_FAILED).
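The resume check described above can be sketched as follows. This is a minimal illustration, not the CLI's actual code: the three field names and the exit code come from the text, while the SHA-256-over-canonical-JSON checksum and the function names are assumptions made for the example.

```python
import hashlib
import json

EXIT_RESUME_FAILED = 25  # exit code named in the text above


def context_checksum(context: dict) -> str:
    """Hypothetical checksum: SHA-256 over canonical (sorted-key) JSON."""
    canonical = json.dumps(context, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_resume(state: dict, schema_version: str, workflow_id: str, context: dict) -> int:
    """Return 0 when the persisted pause state matches, else EXIT_RESUME_FAILED."""
    expected = {
        "schema_version": schema_version,
        "workflow_id": workflow_id,
        "context_checksum": context_checksum(context),
    }
    for field, value in expected.items():
        # Any single mismatch is enough to refuse the resume.
        if state.get(field) != value:
            return EXIT_RESUME_FAILED
    return 0
```

A caller would load .documentation/telemetry/runs/<run_id>.json, pass the parsed object as state, and exit with the returned code.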

Active paused runs can be listed with:

devspark runs list

Full Development Cycle with devspark.run

Used in sequence, create-spec and execute-plan cover the entire feature lifecycle:

devspark run create-spec
# → review analyze output, then resume or continue:
devspark run execute-plan
# → review and merge PR, then release:
devspark release <version>

For the full command order including release, see the Full Development → Release Cycle on the home page.

Without the CLI

Use the atomic slash commands directly in your agent:

`devspark.run` alias	Manual equivalent (no CLI required)
`create-spec`	`/devspark.specify` → `/devspark.plan` → `/devspark.tasks` → `/devspark.analyze`
`execute-plan`	`/devspark.implement` → `/devspark.create-pr` → `/devspark.pr-review`
`suggest-improvement`	`/devspark.specify` with improvement framing, then file the issue manually
`full-cycle`	`/devspark.specify` → `/devspark.plan` → `/devspark.tasks` → `/devspark.critic` → `/devspark.analyze` → `/devspark.tasks` (re-run, merges gate findings) → `/devspark.implement` → `/devspark.create-pr` → `/devspark.pr-review`

When to Use It

Use the harness runtime when you need terminal-driven execution, repeatable local validation, or a structured audit trail for a workflow that should run the same way more than once.

Good fits:

validate a harness spec before using it in a repeatable workflow
run a repo or app-scoped engineering sequence and capture artifacts
inspect why a prior run failed, retried, or aborted
verify adapter availability on a new machine

Less suitable fits:

ad hoc product work that already fits the prompt-first /devspark.* flow
one-off changes where a full execution spec would add more overhead than value

Command Surface

These are CLI commands, not slash commands.

devspark doctor
devspark harness validate sample.harness.yaml
devspark harness run sample.harness.yaml --dry-run
devspark harness trace latest
devspark adapter list
devspark adapter doctor
devspark adapter default claude_code

`devspark doctor`

Checks whether the current environment is ready for harness workflows.

Current checks include:

Python 3.11+
pydantic importability
compatible project layout
readable and valid agents-registry.json
git availability
required local CLIs for agent integrations that declare requires_cli

The command accepts both installed-project layouts with .devspark/ and compatible source checkouts with .documentation/, pyproject.toml, and src/devspark_cli/.
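As an illustration of the two accepted layouts, the check could look roughly like this. The marker paths are the ones listed above; the function name and the "installed"/"source" labels are hypothetical, and the real doctor command may apply further checks.

```python
from pathlib import Path
from typing import Optional


def detect_layout(root: Path) -> Optional[str]:
    """Classify a project root as 'installed', 'source', or None (incompatible)."""
    # Installed-project layout: a .devspark/ directory at the root.
    if (root / ".devspark").is_dir():
        return "installed"
    # Source checkout: all three markers must be present.
    source_markers = [
        root / ".documentation",
        root / "pyproject.toml",
        root / "src" / "devspark_cli",
    ]
    if all(marker.exists() for marker in source_markers):
        return "source"
    return None
```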

`devspark harness validate`

Loads a YAML or JSON harness spec, validates it against the current Pydantic model and schema expectations, and exits without executing any steps.

Use it before committing a new spec or before a real run.

`devspark harness run`

Executes a harness spec sequentially, evaluates validations after each step, persists artifacts, and returns structured exit codes.

Important current behavior:

exit codes: 0 (complete), 1 (failed), 2 (aborted), 3 (validation error)
--dry-run writes a run record without executing step actions
--adapter overrides the adapter for executable steps
--adapter-default uses the saved user default adapter when present
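For scripts that wrap the command, the exit codes listed above can be mapped to names as in this sketch. The enum and helper names are hypothetical; only the four code values and the --dry-run flag come from this page. Other commands use other codes (for example, 25 on a failed resume), so unknown values are reported rather than rejected.

```python
import subprocess
from enum import IntEnum


class HarnessExit(IntEnum):
    """Exit codes documented for `devspark harness run`."""
    COMPLETE = 0
    FAILED = 1
    ABORTED = 2
    VALIDATION_ERROR = 3


def describe_exit(code: int) -> str:
    """Translate a numeric exit code into a readable label."""
    try:
        return HarnessExit(code).name.lower()
    except ValueError:
        return f"unknown ({code})"


def run_harness(spec_path: str, dry_run: bool = True) -> str:
    """Invoke the CLI (must be installed and on PATH) and label the result."""
    command = ["devspark", "harness", "run", spec_path]
    if dry_run:
        command.append("--dry-run")
    completed = subprocess.run(command, check=False)
    return describe_exit(completed.returncode)
```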

`devspark harness trace`

Reads events.jsonl from a prior run and renders the recorded event stream. Use an explicit run ID or latest.
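Because events.jsonl holds one JSON object per line, it can also be read directly when the CLI is not at hand. The sketch below assumes each event carries "type" and "step" keys; those field names are hypothetical, so adjust them to what your run actually recorded.

```python
import json
from pathlib import Path
from typing import Iterator, List


def read_events(events_file: Path) -> Iterator[dict]:
    """Yield one parsed event per non-empty line of a JSON Lines file."""
    with events_file.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def render_events(events_file: Path) -> List[str]:
    """Render events as numbered text lines, in recorded order."""
    rendered = []
    for index, event in enumerate(read_events(events_file), start=1):
        # "type" and "step" are assumed field names, not a documented schema.
        rendered.append(f"{index:03d} {event.get('type', '?')} {event.get('step', '-')}")
    return rendered
```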

`devspark adapter list`

Lists the built-in adapters, whether each is available on the current machine, and the currently saved default.

`devspark adapter doctor`

Produces normalized readiness states for each adapter:

ready
write_approval_required
write_incompatible
unavailable

Use this before hands-off lifecycle runs to confirm the selected adapter can execute write-required stages without interactive approval.
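A pre-flight gate built on these states might look like the sketch below. Treating only ready as acceptable for unattended runs is an interpretation of the guidance above, not documented CLI behavior, and the function name is hypothetical.

```python
READINESS_STATES = (
    "ready",
    "write_approval_required",
    "write_incompatible",
    "unavailable",
)


def can_run_hands_off(state: str) -> bool:
    """Return True only when the adapter can write without interactive approval."""
    if state not in READINESS_STATES:
        raise ValueError(f"unknown readiness state: {state!r}")
    # Assumption: every state other than "ready" needs a human or a different adapter.
    return state == "ready"
```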

`devspark adapter default`

Persists a local default adapter in the user's config directory. This does not modify .devspark/ or .documentation/, so upgrades do not overwrite the preference.

Built-In Adapters

The current built-in adapters are:

noop
manual
claude_code
codex
copilot
cursor

`noop`

Safe default for contract tests, dry runs, and environments without an AI tool installed.

`manual`

Displays the prompt for a human operator and waits for an acknowledgement keypress. It requires a TTY. In non-interactive contexts it fails clearly instead of silently skipping the gate.

`claude_code`, `codex`, `copilot`, `cursor`

These adapters call the corresponding local CLI (claude, codex, copilot, cursor-agent) if it is installed. Prompt content is sent through standard input rather than as a command-line argument, which avoids Windows command-length issues for larger prompts. Run devspark adapter list to see which of these are actually available on the current machine.
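The stdin approach can be illustrated with Python's subprocess module. This is a generic sketch, not the adapters' actual code: the helper name and timeout are assumptions, and the flags each real tool needs are not shown here.

```python
import subprocess
from typing import List


def send_prompt(command: List[str], prompt: str, timeout: float = 60.0) -> str:
    """Run a local CLI, feed the prompt on standard input, and return its stdout."""
    completed = subprocess.run(
        command,
        input=prompt,         # delivered via stdin, so argv length limits do not apply
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout
```

Because the prompt never appears in the argument list, a prompt of several hundred kilobytes is passed the same way as a one-line prompt.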

Spec Model

Harness specs are YAML or JSON documents with:

apiVersion: devspark.ai/v1
kind: HarnessSpec
name
scope
defaults
steps
telemetry
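To make the top-level shape concrete, here is a parsed spec expressed as a Python dict with a small structural check. The key names, apiVersion, and kind come from the list above; treating all seven keys as required, the nested values, and the step fields are assumptions for illustration. The real validation is the Pydantic model behind devspark harness validate.

```python
from typing import List

TOP_LEVEL_KEYS = ("apiVersion", "kind", "name", "scope", "defaults", "steps", "telemetry")

# Hypothetical minimal spec; nested values are placeholders, not a documented schema.
EXAMPLE_SPEC = {
    "apiVersion": "devspark.ai/v1",
    "kind": "HarnessSpec",
    "name": "example",
    "scope": {"type": "repo"},
    "defaults": {},
    "steps": [{"id": "check", "type": "validation"}],
    "telemetry": {},
}


def check_top_level(spec: dict) -> List[str]:
    """Return a list of structural problems; an empty list means the shape looks right."""
    problems = [f"missing key: {key}" for key in TOP_LEVEL_KEYS if key not in spec]
    if "apiVersion" in spec and spec["apiVersion"] != "devspark.ai/v1":
        problems.append(f"unexpected apiVersion: {spec['apiVersion']!r}")
    if "kind" in spec and spec["kind"] != "HarnessSpec":
        problems.append(f"unexpected kind: {spec['kind']!r}")
    return problems
```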

The checked-in examples are sample.harness.yaml (minimal, demonstrates each validation rule type) and full-cycle.harness.yaml (the full specify-through-pr-review lifecycle as agent_task steps — see Full Unattended Lifecycle below).

Step types currently implemented:

agent_task
validation
human_gate

Validation rule types currently implemented:

always.pass
file.exists
file.contains
command.exit_code
json.schema
git.clean
regex.match
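The file-oriented rule types can be pictured with a small evaluator. This sketch covers only always.pass, file.exists, file.contains, and regex.match; the rule field names (type, path, text, pattern) and the multiline regex matching are assumptions, not the runtime's documented schema.

```python
import re
from pathlib import Path

FILE_RULES = ("file.exists", "file.contains", "regex.match")


def evaluate_rule(rule: dict, root: Path) -> bool:
    """Evaluate one validation rule against files under root."""
    kind = rule["type"]
    if kind == "always.pass":
        return True
    if kind not in FILE_RULES:
        raise ValueError(f"rule type not covered by this sketch: {kind}")
    target = root / rule["path"]
    if kind == "file.exists":
        return target.is_file()
    if not target.is_file():
        return False  # a missing file cannot contain or match anything
    text = target.read_text(encoding="utf-8")
    if kind == "file.contains":
        return rule["text"] in text
    # regex.match: MULTILINE lets ^ and $ anchor on individual lines (an assumption).
    return re.search(rule["pattern"], text, re.MULTILINE) is not None
```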

Scope Resolution

Harness runs support repository scope and application scope.

scope.type: repo writes under the repository's .documentation/devspark/runs/
scope.type: app requires a valid multi-app registry and resolves the documentation root through the existing scope system

Current guardrails:

the repository root is derived from the spec path, not the caller's current working directory
malformed or path-invalid multi-app registries fail clearly instead of being treated as missing
ambiguous scope resolution is surfaced as a harness spec error
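Deriving the repository root from the spec path rather than the working directory can be sketched as an upward search from the spec file. Using a .git entry as the root marker is an assumption made for this example; the runtime may rely on different markers.

```python
from pathlib import Path
from typing import Optional


def find_repo_root(spec_path: Path) -> Optional[Path]:
    """Walk upward from the spec file until a directory containing .git is found."""
    start = spec_path.resolve().parent
    for candidate in (start, *start.parents):
        # .git can be a directory or, in worktrees, a file; exists() covers both.
        if (candidate / ".git").exists():
            return candidate
    return None
```

The caller's current working directory never enters the calculation, so the same spec resolves to the same root no matter where the command is launched.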

Run Artifacts

By default, telemetry writes to .documentation/devspark/runs/<run-id>/.

Current artifact layout includes:

spec.resolved.yaml
context.json
events.jsonl
result.json
adapter-doctor.json
decision-packet.json
steps/<step-id>/prompt.md when a prompt was materialized
steps/<step-id>/output.txt when an adapter produced output
steps/<step-id>/stdout.txt for command.exit_code validation output

Conditional artifacts:

no-change-explainer.md when the workflow completed but delivery evidence was unmet
max-pass-failure-report.md when hands-off convergence reaches max passes without resolution

Runs are retained with a user-configurable limit. The default retention limit is 20.
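One way to picture the retention limit is a prune step that keeps the newest run directories and removes the rest. The default of 20 comes from the text; ordering by directory modification time and the function name are assumptions, and the runtime may choose which runs to drop differently.

```python
import shutil
from pathlib import Path
from typing import List

DEFAULT_RETENTION = 20  # default limit stated above


def prune_runs(runs_dir: Path, keep: int = DEFAULT_RETENTION) -> List[str]:
    """Delete all but the `keep` most recently modified run directories."""
    if keep < 0:
        raise ValueError("keep must be zero or greater")
    runs = sorted(
        (entry for entry in runs_dir.iterdir() if entry.is_dir()),
        key=lambda entry: entry.stat().st_mtime,
        reverse=True,  # newest first
    )
    removed = []
    for stale in runs[keep:]:
        shutil.rmtree(stale)
        removed.append(stale.name)
    return removed
```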

Retry and Validation Behavior

After each executable step, the runner evaluates the declared validations.

error-severity failures block success
warning-severity failures are recorded but do not block the run
retry policies can request another attempt on validation failure
retry repair prompts append a ## Validation Errors section to the next adapter prompt
requireHumanAfter can force a manual pause after a configured attempt count
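The interplay of severities, retries, and repair prompts listed above can be sketched as a small loop. The ## Validation Errors heading and the error/warning split come from the text; the function names, the tuple shape of a finding, and the attempt limit are hypothetical, and requireHumanAfter is left out for brevity.

```python
from typing import Callable, List, Tuple

Finding = Tuple[str, str]  # (severity, message); severity is "error" or "warning"


def build_repair_prompt(prompt: str, errors: List[str]) -> str:
    """Append a '## Validation Errors' section listing what the last attempt got wrong."""
    bullet_list = "\n".join(f"- {message}" for message in errors)
    return f"{prompt}\n\n## Validation Errors\n{bullet_list}\n"


def run_step(
    prompt: str,
    execute: Callable[[str], str],
    validate: Callable[[str], List[Finding]],
    max_attempts: int = 3,
) -> Tuple[bool, int]:
    """Run a step, retrying with a repair prompt while error-severity findings remain."""
    current_prompt = prompt
    for attempt in range(1, max_attempts + 1):
        output = execute(current_prompt)
        findings = validate(output)
        errors = [message for severity, message in findings if severity == "error"]
        if not errors:
            return True, attempt  # warnings alone do not block success
        current_prompt = build_repair_prompt(prompt, errors)
    return False, max_attempts
```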

If a run is interrupted, the current implementation preserves the artifacts already written and records the run as aborted.

Operator Guidance

Recommended flow for a new spec:

Run devspark doctor on the target machine.
Validate the spec with devspark harness validate <spec.yaml>.
Run a dry run first with devspark harness run <spec.yaml> --dry-run.
Inspect the generated artifacts and the resolved spec.
Execute a real run only after the adapter and validation behavior are what you expect.

For adapter-driven runs, prefer explicit adapters in the spec when reproducibility matters across machines. Use a saved adapter default when you want a machine-local convenience setting.

Full Unattended Lifecycle

full-cycle.harness.yaml chains all nine lifecycle steps (specify → plan → tasks → critic → analyze → tasks (remediate) → implement → create-pr → pr-review) as agent_task steps, with one human_gate before create-pr that --hands-off bypasses automatically. Unlike devspark run full-cycle (see Available Aliases above), this actually executes each step non-interactively via the chosen adapter's CLI.

devspark adapter doctor                                              # confirm write-capable adapter
devspark harness validate full-cycle.harness.yaml
devspark harness run full-cycle.harness.yaml --dry-run                # resolve without executing
devspark harness run full-cycle.harness.yaml --adapter claude_code --hands-off

Validation rules deliberately avoid hardcoding a feature directory path — none exists yet when the spec is authored — and instead resolve the most recently modified .documentation/specs/*/ directory at validation time (command.exit_code rules using $(ls -td .documentation/specs/*/ | head -1)).
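For readers less familiar with the shell idiom, the lookup performed by ls -td .documentation/specs/*/ | head -1 is roughly equivalent to the Python sketch below. The function name is hypothetical; the spec itself uses the shell form inside command.exit_code rules.

```python
from pathlib import Path
from typing import Optional


def latest_spec_dir(repo_root: Path) -> Optional[Path]:
    """Return the most recently modified directory under .documentation/specs/."""
    specs = repo_root / ".documentation" / "specs"
    if not specs.is_dir():
        return None
    candidates = [
        entry
        for entry in specs.iterdir()
        # The shell glob */ skips dot-directories, so do the same here.
        if entry.is_dir() and not entry.name.startswith(".")
    ]
    return max(candidates, key=lambda entry: entry.stat().st_mtime, default=None)
```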

Convergence-loop caveat: --hands-off automatically runs run_stage_revalidation_loop (max 3 passes) after the step loop completes. As currently implemented, that loop only re-checks run.findings bookkeeping; it does not re-invoke the critic or analyze prompts to regenerate findings. In practice, full-cycle.harness.yaml therefore gets one remediation pass per run: the remediate-gates step, which re-runs /devspark.tasks to merge gates/critic.md and gates/analyze.md into a ## Gate Remediation task phase (see Gate Remediation Merge below). If findings remain open after that pass, the delivery-gate check should block create-pr readiness rather than silently proceeding. If a run reports an unmet delivery status, check no-change-explainer.md or max-pass-failure-report.md.

Gate Remediation Merge

/devspark.tasks detects whether tasks.md already exists. On a re-run (tasks.md already exists), it merges the findings: YAML from gates/critic.md and gates/analyze.md, dedupes overlapping findings, sorts them by severity, and appends a ## Gate Remediation task phase with tasks tagged (resolves: <finding_id>). Both gate files already use the Shared Review Resolution Contract shape (finding_id, severity, description, recommended_action, execution_mode, status, outcome). /devspark.implement flips each referenced finding's status to resolved in the originating gate file as its task completes, so a subsequent /devspark.critic or /devspark.analyze re-run reports fewer open findings instead of repeating the same ones. This is prompt-level behavior (in templates/commands/tasks.md and templates/commands/implement.md), not harness-runtime code: it works whether /devspark.tasks is invoked as a slash command, via devspark run, or via devspark harness run.
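Although the merge is performed by the prompt rather than by code, its logic can be illustrated in Python. The finding fields come from the contract shape above. Deduplicating by finding_id, skipping findings already marked resolved, the severity scale, and the checkbox task format are all assumptions made for this sketch.

```python
from typing import Dict, List

# Hypothetical severity scale; unknown values sort last.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def merge_findings(*gate_findings: List[dict]) -> List[dict]:
    """Merge findings from several gate files, dedupe by finding_id, sort by severity."""
    merged: Dict[str, dict] = {}
    for findings in gate_findings:
        for finding in findings:
            if finding.get("status") == "resolved":
                continue  # already handled by an earlier implement pass
            merged.setdefault(finding["finding_id"], finding)  # first occurrence wins
    return sorted(
        merged.values(),
        key=lambda f: (SEVERITY_ORDER.get(f["severity"], len(SEVERITY_ORDER)), f["finding_id"]),
    )


def render_phase(findings: List[dict]) -> str:
    """Render the remediation phase with one tagged task per finding."""
    lines = ["## Gate Remediation", ""]
    for finding in findings:
        lines.append(f"- [ ] {finding['recommended_action']} (resolves: {finding['finding_id']})")
    return "\n".join(lines)
```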

Hands-Off Troubleshooting

If a run fails with write_incompatible_adapter, switch to a write-capable non-interactive adapter and rerun devspark adapter doctor.
If delivery_status is unmet, review no-change-explainer.md and ensure changes exist under src/ or test/.
If convergence fails after max passes, inspect max-pass-failure-report.md and resolve remaining findings manually before retrying — see the convergence-loop caveat under Full Unattended Lifecycle.
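The delivery_status rule mentioned in the list above (changes must exist under src/ or test/) can be sketched as a path-prefix check. The met and unmet values and the two prefixes come from this page; the function name and the way changed paths are collected are assumptions.

```python
from typing import Iterable

DELIVERY_PREFIXES = ("src/", "test/")


def delivery_status(changed_paths: Iterable[str]) -> str:
    """Return 'met' when at least one changed path lies under src/ or test/."""
    for path in changed_paths:
        normalized = path.replace("\\", "/")  # tolerate Windows separators
        if normalized.startswith(DELIVERY_PREFIXES):
            return "met"
    return "unmet"
```

A run that only touched documentation, for example delivery_status(["README.md"]), would report unmet and produce no-change-explainer.md.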

Relationship to the Prompt Workflow

The harness runtime does not replace BSW.DevSpark's prompt-first lifecycle.

use slash commands to define, plan, implement, review, and release work
use the harness runtime when you need repeatable terminal-driven execution and traceable run artifacts

That separation is intentional: prompt workflows manage human and agent collaboration, while the harness runtime executes declarative engineering flows.

Test Coverage

The harness runtime is covered by two kinds of tests, both under tests/:

pytest test modules (run via `pytest tests/`)

These use standard def test_* functions and are picked up automatically by the test runner.

File	Tests	What it covers
`test_delivery_status_contract.py`	2	Delivery gate logic: `unmet` when no `src/` or `test/` changes; `met` when `src/` changes present
`test_convergence_loop_contract.py`	2	Finding state transitions (`open`, `resolved`, `deferred`); stage iteration record structure

Run: pytest tests/ -v

Runnable contract scripts (run directly via `python`)

These use a main() entry point and validate end-to-end CLI behavior through typer.testing.CliRunner or subprocess. CI runs them in the contract-validation job.

File	What it covers
`test_harness_validation_contract.py`	`devspark harness validate` — loads and validates a spec YAML against the schema
`test_harness_spec_contract.py`	Spec model parsing, field validation, and constraint checking
`test_harness_runner_contract.py`	Full harness run lifecycle — artifacts written, exit codes, retry and abort paths
`test_harness_adapters_contract.py`	Adapter routing via `agents-registry.json`, step-level adapter resolution
`test_adapter_doctor_contract.py`	`devspark adapter doctor` — normalized readiness states (`ready`, `write_approval_required`, `write_incompatible`, `unavailable`)
`test_hands_off_lifecycle_contract.py`	`--hands-off` flag — write-incompatible adapter triggers abort; `decision-packet.json` and `result.json` artifacts created

Run individually: python tests/test_harness_runner_contract.py

Run all: python tests/test_harness_validation_contract.py && python tests/test_harness_spec_contract.py && python tests/test_harness_runner_contract.py && python tests/test_harness_adapters_contract.py && python tests/test_adapter_doctor_contract.py && python tests/test_hands_off_lifecycle_contract.py