harness
the control plane behind agentic builds
Most agent demos work. Most agent deployments fail. The gap between them is the control plane, and that is the part nobody puts in the slide deck.
Harness is the control plane that closes that gap. It is a Python-first product with one shared core and three thin surfaces: API, CLI, and UI. The point is to make agent systems boring enough to run in production.
what the core does
A single runtime takes an intent, selects the right pack, builds a plan, checks policy on every step, requests approvals, executes connectors, handles exceptions, stores receipts, and reports KPIs.
Every action follows one rule: it is policy-gated, replayable, observable, and auditable. Get that rule right once and the rest of the system inherits it.
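The loop described above can be sketched in a few lines. This is a minimal illustration, not Harness's actual runtime: `Step`, `Receipt`, `run_plan`, and the connector names are all hypothetical, and the policy and approval functions are stand-ins for whatever a real pack would supply. The part worth noticing is that no step reaches a connector without passing the policy gate, and every outcome leaves a receipt, including the denied one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Step:
    connector: str              # hypothetical connector name, e.g. "billing"
    action: str                 # hypothetical action name
    needs_approval: bool = False


@dataclass(frozen=True)
class Receipt:
    step: Step
    policy_decision: str        # "allow" or "deny"
    approved_by: Optional[str]  # None when no approval was needed or given
    result: Optional[str]       # None when the step did not execute


def run_plan(
    steps: List[Step],
    policy: Callable[[Step], bool],
    approve: Callable[[Step], Optional[str]],
    connectors: Dict[str, Callable[[str], str]],
) -> List[Receipt]:
    """Run steps in order. Every step passes the policy gate before it executes."""
    receipts: List[Receipt] = []
    for step in steps:
        if not policy(step):
            # Denied steps still leave a receipt, then the plan stops.
            receipts.append(Receipt(step, "deny", None, None))
            break
        approver = approve(step) if step.needs_approval else None
        if step.needs_approval and approver is None:
            # Allowed by policy but not approved: record it and stop.
            receipts.append(Receipt(step, "allow", None, None))
            break
        result = connectors[step.connector](step.action)
        receipts.append(Receipt(step, "allow", approver, result))
    return receipts


# Assumed example data: two fake connectors and a policy that blocks deletes.
steps = [
    Step("crm", "lookup_customer"),
    Step("billing", "issue_refund", needs_approval=True),
    Step("billing", "delete_account"),
]
connectors = {
    "crm": lambda action: f"crm:{action}:ok",
    "billing": lambda action: f"billing:{action}:ok",
}
receipts = run_plan(
    steps,
    policy=lambda step: step.action != "delete_account",
    approve=lambda step: "ops-lead",
    connectors=connectors,
)
for receipt in receipts:
    print(receipt.step.action, receipt.policy_decision, receipt.approved_by)
```

Stopping at the first denied or unapproved step is one design choice among several. A real runtime might instead route the step to an exception handler.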
packs as plugins
A customer ops pack, finance ops pack, founder ops pack, or industry-specific pack each defines its own intents, playbooks, policies, connectors, KPIs, and exception handlers.
The product grows through packs. The core runtime stays stable. That separation is the whole game. Teams ship new agent capabilities by writing a pack, never by patching the runtime.
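One way to picture that separation is a registry that maps intents to packs. The sketch below is an assumption about shape, not the real pack format: `Pack`, `PackRegistry`, and the example intent names are invented for illustration. The runtime only ever calls `select`, so adding a capability means registering a new pack and touching nothing else.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class Pack:
    # Hypothetical pack shape: only the fields needed for routing are modeled.
    name: str
    intents: FrozenSet[str]
    connectors: Tuple[str, ...] = ()
    kpis: Tuple[str, ...] = ()


class PackRegistry:
    """Maps each intent to exactly one pack."""

    def __init__(self) -> None:
        self._by_intent: Dict[str, Pack] = {}

    def register(self, pack: Pack) -> None:
        # Check for conflicts first so a failed registration changes nothing.
        for intent in pack.intents:
            if intent in self._by_intent:
                owner = self._by_intent[intent].name
                raise ValueError(f"intent {intent!r} already claimed by {owner!r}")
        for intent in pack.intents:
            self._by_intent[intent] = pack

    def select(self, intent: str) -> Pack:
        try:
            return self._by_intent[intent]
        except KeyError:
            raise LookupError(f"no pack handles intent {intent!r}") from None


# Assumed example packs; the names are illustrative only.
registry = PackRegistry()
registry.register(Pack("customer_ops", frozenset({"refund_order", "update_address"})))
registry.register(Pack("finance_ops", frozenset({"reconcile_invoice"})))

print(registry.select("refund_order").name)
```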
the three surfaces
The API stays thin. A small set of endpoints for intents, approvals, executions, and receipts. No business logic lives there.
The CLI gives operators clean JSON output. Pipe it into monitoring, scripts, other agents, dashboards.
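A rough sketch of what JSON-first output could look like, using only the standard library. The `harness` program name, the `receipts` subcommand, and the `--execution-id` flag are assumptions made up for this example, and the empty receipt list stands in for a real lookup against the core.

```python
import argparse
import json
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical command layout: one subcommand is enough to show the idea.
    parser = argparse.ArgumentParser(prog="harness")
    sub = parser.add_subparsers(dest="command", required=True)
    receipts = sub.add_parser("receipts", help="list receipts for an execution")
    receipts.add_argument("--execution-id", required=True)
    return parser


def run(argv: Optional[List[str]] = None) -> str:
    """Parse arguments and return one line of JSON for stdout."""
    args = build_parser().parse_args(argv)
    payload = {
        "command": args.command,
        "execution_id": args.execution_id,
        "receipts": [],  # placeholder: a real CLI would ask the core runtime
    }
    # sort_keys keeps the output stable, which helps scripts and diffs.
    return json.dumps(payload, sort_keys=True)


print(run(["receipts", "--execution-id", "exec-1"]))
```

Because the output is a single JSON object on one line, it can be piped straight into a tool like `jq` or read by another agent without screen scraping.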
The UI focuses on the work that matters. Approvals. Exceptions. Execution timelines. Receipts. Rollback. Pack status. Nothing else makes it onto the screen.
what lives in the core, and why
Typed contracts. Deterministic plans. Policy decisions. Approval gates. Connector isolation. Receipts. Rollback tokens. KPI tracking.
These are the parts that have to be right. Get them wrong once and the next hundred packs inherit the same problem. Get them right and packs become cheap to write.
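Two of those items, typed contracts and deterministic plans, can be illustrated together. The sketch below assumes a plan is a frozen dataclass and that "deterministic" means the same inputs always produce the same fingerprint. `PlanStep`, `Plan`, `make_step`, and `fingerprint` are hypothetical names, and hashing canonical JSON with SHA-256 is one possible approach rather than a description of how Harness does it.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class PlanStep:
    connector: str
    action: str
    params: Tuple[Tuple[str, Any], ...]  # sorted (key, value) pairs


@dataclass(frozen=True)
class Plan:
    intent: str
    steps: Tuple[PlanStep, ...]


def make_step(connector: str, action: str, **params: Any) -> PlanStep:
    # Sorting by key means argument order never changes the resulting step.
    return PlanStep(connector, action, tuple(sorted(params.items())))


def fingerprint(plan: Plan) -> str:
    """Hash a canonical JSON form of the plan (values must be JSON-serializable)."""
    canonical = json.dumps(asdict(plan), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same step, parameters given in a different order.
plan_a = Plan("refund_order", (make_step("billing", "issue_refund", amount=20, currency="USD"),))
plan_b = Plan("refund_order", (make_step("billing", "issue_refund", currency="USD", amount=20),))

print(fingerprint(plan_a) == fingerprint(plan_b))
```

A stable fingerprint gives approvals and receipts something concrete to point at: an approver signs off on a specific plan hash, and a replay can confirm it is running the same plan.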
why this design matters
For agents to work in real operations, a team has to see what the agent planned, what it touched, which policy allowed it, who approved it, what changed, and how to replay or roll back the action.
That is the difference between a demo agent and an operational agent. A demo calls an LLM, returns a string, and lands a screenshot in a deck. An operational agent runs against real systems with real consequences. The control plane is what makes that safe.
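The questions in that list map naturally onto fields of a receipt. Here is a small sketch under heavy assumptions: a plain dict stands in for the real system, and `Receipt`, `Ledger`, and the field names are invented for illustration. Each change records what was planned, what it touched, which policy allowed it, who approved it, and the before and after values, plus a token that undoes it.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Dict

_MISSING = object()  # marks "key did not exist before the change"


@dataclass(frozen=True)
class Receipt:
    planned: str         # what the agent intended to do
    touched: str         # which key in the target system changed
    policy: str          # which policy allowed the change
    approved_by: str     # who approved it
    before: Any
    after: Any
    rollback_token: str


class Ledger:
    """Applies changes to a dict-backed system and keeps a receipt for each."""

    def __init__(self, system: Dict[str, Any]) -> None:
        self.system = system
        self.receipts: Dict[str, Receipt] = {}

    def apply(self, key: str, value: Any, *, planned: str, policy: str, approved_by: str) -> Receipt:
        token = uuid.uuid4().hex
        before = self.system.get(key, _MISSING)
        receipt = Receipt(planned, key, policy, approved_by, before, value, token)
        self.system[key] = value
        self.receipts[token] = receipt
        return receipt

    def rollback(self, token: str) -> Receipt:
        # Each token can be used once; a second attempt raises KeyError.
        receipt = self.receipts.pop(token)
        if receipt.before is _MISSING:
            self.system.pop(receipt.touched, None)
        else:
            self.system[receipt.touched] = receipt.before
        return receipt


# Assumed example: upgrade a customer's tier, then roll it back.
system = {"plan_tier": "basic"}
ledger = Ledger(system)
receipt = ledger.apply(
    "plan_tier", "pro",
    planned="upgrade customer to pro",
    policy="tier-changes-allowed",
    approved_by="ops-lead",
)
print(system["plan_tier"])
ledger.rollback(receipt.rollback_token)
print(system["plan_tier"])
```

Real connectors rarely allow a clean restore like this, so an actual rollback token would more likely point at a compensating action than at a stored previous value.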
the shape
A monorepo Python package. One deployable service exposing API and UI. A CLI entrypoint for operators. The whole thing fits in a single deploy until load forces a split.
Four rules of thumb: start simple, make the core strict, make packs easy to add, make every action accountable.
why this is the right layer
Most agent frameworks optimize for the first 80% of the work, the part that shows up in a demo. Prompt orchestration. Tool calls. Memory. They are useful for prototypes. They are also where teams get stuck when they try to ship.
The remaining 20% is the operational work. Who approved this? What policy applied? Can we roll it back? Did it touch sensitive data? Was the action expected? Can a different agent replay it tomorrow with the same result?
The teams winning this round are the ones building that 20% as the foundation. The prompts and models become commodity. The control plane is the asset.