Build with AI, Run Deterministic

Pattern · design-philosophy

Core principle: AI belongs at the construction layer (write the workflow, generate the spec, design the UI, codify the policy). The runtime should be deterministic, auditable, and thermodynamically efficient. The agent’s job is to write the workflow, not be the workflow.

The articulation

Matt Dean (Trabian, May 9 2026):

“Our thesis with AI is that AI should be used to build as much as possible and run as little as possible. By which I mean, you shouldn’t have to be able to use AI to do anything that’s deterministic. So our whole platform is built around this idea that you should be able to use deterministic, auditable, thermodynamically efficient, ultimately, tools to be able to run your workflows… but that AI should be able to help you create the right thing in the first place.”

“We’re not trying to run an agent to reinvent the wheel every single time you need to display a widget.”

In Mesh: the AI agent is the workflow constructor (operator-facing, conversational, gathers requirements, drafts the spec) but the runtime is Temporal + TypeScript — deterministic, durable, replayable. AI builds; Temporal runs.
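
The split between constructor and runtime can be sketched in plain TypeScript. This is a minimal illustration, not Mesh's spec format and not the Temporal SDK: the `Step`, `WorkflowSpec`, and `runWorkflow` names are hypothetical, and the interpreter only stands in for a durable runtime. The point it shows is that the agent's output is data, and executing that data involves no model call.

```typescript
// Hypothetical spec shape: what a construction-layer agent might emit.
// None of these names come from Mesh or the Temporal SDK.
type Step =
  | { kind: "require"; field: string }
  | { kind: "set"; field: string; value: string }
  | { kind: "branch"; field: string; equals: string; then: Step[]; otherwise: Step[] };

interface WorkflowSpec {
  name: string;
  steps: Step[];
}

type State = Readonly<Record<string, string>>;

interface RunResult {
  state: State;
  trace: string[]; // ordered record of every step taken
}

// Pure interpreter: no clock, no randomness, no network, no model call.
function runWorkflow(spec: WorkflowSpec, input: State): RunResult {
  const trace: string[] = [];
  const exec = (steps: Step[], state: State): State => {
    let current = state;
    for (const step of steps) {
      switch (step.kind) {
        case "require":
          if (!(step.field in current)) {
            throw new Error(`${spec.name}: missing required field "${step.field}"`);
          }
          trace.push(`require ${step.field}: ok`);
          break;
        case "set":
          current = { ...current, [step.field]: step.value };
          trace.push(`set ${step.field}=${step.value}`);
          break;
        case "branch": {
          const taken = current[step.field] === step.equals;
          trace.push(`branch ${step.field}==${step.equals}: ${taken}`);
          current = exec(taken ? step.then : step.otherwise, current);
          break;
        }
      }
    }
    return current;
  };
  return { state: exec(spec.steps, input), trace };
}

// Example spec, as an agent might draft it after a requirements conversation.
const referAFriend: WorkflowSpec = {
  name: "refer-a-friend",
  steps: [
    { kind: "require", field: "referrerId" },
    {
      kind: "branch",
      field: "friendOpenedAccount",
      equals: "yes",
      then: [{ kind: "set", field: "bonus", value: "25" }],
      otherwise: [{ kind: "set", field: "bonus", value: "0" }],
    },
  ],
};
```

Calling `runWorkflow(referAFriend, { referrerId: "u1", friendOpenedAccount: "yes" })` twice yields the same state and the same three-line trace both times, which is the property the runtime is meant to guarantee.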

Why the principle matters

Three properties get worse the more AI you push into the runtime:

Determinism. Same input → same output is a non-negotiable invariant for financial workflows, audit trails, and regulated environments. LLM inference is non-deterministic in practice: sampling temperature, model updates, and prompt drift can each change the output for the same input.
Auditability. Compliance, post-mortems, and debugging all depend on being able to reconstruct exactly what the system did. Deterministic code has a deterministic trace. Agent-in-the-loop runtime traces include “the model said X this time”, which can be logged but not re-derived from the inputs.
Thermodynamic efficiency. LLM inference at runtime burns tokens (and dollars and latency) for behavior that, once specified, can be encoded in microseconds of CPU. Re-paying for the same decision every request is the wrong cost curve.

The fourth, implied: legibility. A deterministic workflow is one a human can read, modify, and trust without watching it run. An agent-in-the-loop runtime is opaque.
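
The auditability point can be made concrete with a replay check. The sketch below is an assumption-laden illustration: `verifyByReplay`, `approvalsRequired`, and the threshold are hypothetical, and `makeSampledStub` is only an offline stand-in for a sampled model call, not a real LLM client. It shows that a deterministic decision can be re-derived from its recorded input, while a sampled one can only be trusted as logged.

```typescript
interface AuditRecord<I, O> {
  input: I;
  output: O;
}

// Re-derive the output from the recorded input and compare with what was logged.
function verifyByReplay<I, O>(record: AuditRecord<I, O>, run: (input: I) => O): boolean {
  return JSON.stringify(run(record.input)) === JSON.stringify(record.output);
}

interface Transfer {
  amountCents: number;
  currency: string;
}

// Hypothetical deterministic policy: large transfers need a second approver.
function approvalsRequired(t: Transfer): number {
  return t.amountCents >= 1_000_000 ? 2 : 1;
}

// Stand-in for a sampled model call: alternates its answer on each invocation
// to mimic "the model said X this time". Not a real inference client.
function makeSampledStub(): (t: Transfer) => number {
  let calls = 0;
  return () => (calls++ % 2 === 0 ? 1 : 2);
}

const transfer: Transfer = { amountCents: 250_000, currency: "USD" };

// Deterministic path: the record can be checked long after the fact.
const ruleRecord: AuditRecord<Transfer, number> = {
  input: transfer,
  output: approvalsRequired(transfer),
};
const ruleReplays = verifyByReplay(ruleRecord, approvalsRequired);

// Sampled path: replaying the same input gives a different answer.
const sampled = makeSampledStub();
const sampledRecord: AuditRecord<Transfer, number> = {
  input: transfer,
  output: sampled(transfer),
};
const sampledReplays = verifyByReplay(sampledRecord, sampled);
```

Here `ruleReplays` is `true` and `sampledReplays` is `false`: the first record is evidence an auditor can reproduce, the second is only a log entry.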

The construction layer is where AI’s leverage lives

Inverting the principle: what gets better the more AI you push into construction?

Specification quality. AI as consultant — surfaces edge cases, asks clarifying questions, builds the spec faster than a human PM/engineer working alone (cf. Mesh’s “check for gaps” feature).
Cost of customization. A workflow that takes a human engineer 8 weeks to build can be drafted in a 30-min conversation by an operator. The cost curve of building drops to near-zero; the cost curve of running stays low because the output is just code.
Iteration speed. Building a refer-a-friend program, a treasury-management onboarding flow, or a custom file ingest follows the same conversational shape, and all of them resolve to deterministic Temporal workflows.
Domain-expert leverage. A bank ops person who can describe the workflow in English now gets the workflow they described. The translator (engineer) is no longer in the critical path.

Who else has converged on this

Temporal itself is a bet on this principle: durable execution is “the runtime is just code, structured so that it can be replayed deterministically.” Stripe, Netflix, and Coinbase use it. OpenAI uses Temporal to run Codex; that is, even the AI-product company runs its agentic system on a deterministic substrate.
Cursor / Aider / Claude Code are construction-layer AI: they generate code, and no model sits in the execution path of the code they generate. The artifact is deterministic.
Sora / Stable Diffusion / other image and video models are mostly construction-layer: produce the asset once, ship the artifact, and don’t regenerate it on every page load.
The “AI-built CRUD app” pattern — generate the schema, generate the migrations, generate the screens; the app at runtime is plain code with no LLM in the request path.

The counterexamples (chatbots, agentic copilots, anything ReAct-style) are intentionally agent-in-runtime because the task itself is open-ended: conversation, or tool use whose steps can’t be fixed in advance. But for any task that can be specified, AI in construction beats AI in runtime.

Where this is the wrong principle

Not all problems can be specified ahead of time. AI in the runtime is correct when:

The task is open-ended classification or generation (summarize this email, classify this support ticket, draft this reply).
The input space is too large to enumerate (parse arbitrary natural-language intent).
The cost of being slightly wrong is low and human review catches errors.
Latency and cost are acceptable.

The rule of thumb: if you can write the deterministic code, you should — and you should use AI to write it. If you can’t write the deterministic code (because the task itself is fuzzy), AI at runtime is the right tool.
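
One way to apply the rule of thumb is a rules-first router. The following is a sketch under stated assumptions: the labels, patterns, and the injected `fallback` function are all hypothetical, and a real model fallback would be asynchronous. Whatever can be written down as a rule is handled deterministically, and only the unmatched, fuzzy remainder reaches a model.

```typescript
type Label = "billing" | "fraud" | "other";

interface Rule {
  pattern: RegExp;
  label: Label;
}

interface Decision {
  label: Label;
  path: "rule" | "model"; // recorded so the audit trail shows which layer decided
}

// Rules of this kind are what the construction layer would draft and a human would review.
// No "g" flag: RegExp.test stays stateless across calls.
const rules: Rule[] = [
  { pattern: /\b(refund|invoice|charge)\b/i, label: "billing" },
  { pattern: /\b(stolen|unauthori[sz]ed)\b/i, label: "fraud" },
];

// `fallback` is an injected stand-in for a runtime model call (assumption).
function classify(text: string, fallback: (text: string) => Label): Decision {
  for (const rule of rules) {
    if (rule.pattern.test(text)) {
      return { label: rule.label, path: "rule" };
    }
  }
  return { label: fallback(text), path: "model" };
}
```

Tracking how often `path` comes back as `"model"` gives a rough measure of how much of the task is still fuzzy; recurring fallback cases are candidates for new rules.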

How this changes engineering work

If you take the principle seriously, the shape of an engineering team changes:

Less: integration plumbing, per-tenant customization labor, workflow boilerplate, theming variations. These are construction-layer tasks AI now does.
More: specifying invariants and policies that the construction layer obeys; building the substrate that AI generates onto (the Temporal layer, the type system, the testing harness); designing the human-AI handoff for safe construction (review gates, simulation modes, rollback).

The center of gravity moves from “writing the code” to “designing the system that AI writes the code into.” This is the same shift Augmentation-Over-Automation describes at the design-philosophy level — but sharper, applied to runtime architecture.
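
The review gates, simulation modes, and rollback mentioned above might look roughly like the sketch below. Every name here (`Draft`, `Gate`, `reviewDraft`, `Registry`) is hypothetical and chosen for illustration; it is one possible shape for a promotion gate, not a description of any existing system.

```typescript
interface Draft<I, O> {
  version: number;
  run: (input: I) => O; // the AI-generated, deterministic artifact
}

interface Gate<I, O> {
  samples: { input: I; expected: O }[]; // simulation mode: run on sample data
  invariants: { name: string; holds: (output: O) => boolean }[]; // policies the draft must obey
  approvedBy: string | null; // human review gate
}

type Verdict = { ok: true } | { ok: false; reason: string };

function reviewDraft<I, O>(draft: Draft<I, O>, gate: Gate<I, O>): Verdict {
  for (const sample of gate.samples) {
    const output = draft.run(sample.input);
    for (const inv of gate.invariants) {
      if (!inv.holds(output)) {
        return { ok: false, reason: `invariant violated: ${inv.name}` };
      }
    }
    if (JSON.stringify(output) !== JSON.stringify(sample.expected)) {
      return { ok: false, reason: `simulation mismatch in v${draft.version}` };
    }
  }
  if (gate.approvedBy === null) {
    return { ok: false, reason: "no human approval recorded" };
  }
  return { ok: true };
}

class Registry<I, O> {
  private readonly history: Draft<I, O>[] = [];

  // Only drafts that pass every gate become the active version.
  promote(draft: Draft<I, O>, gate: Gate<I, O>): Verdict {
    const verdict = reviewDraft(draft, gate);
    if (verdict.ok) this.history.push(draft);
    return verdict;
  }

  // Roll back to the previously promoted version, if there is one.
  rollback(): void {
    if (this.history.length > 1) this.history.pop();
  }

  active(): Draft<I, O> | undefined {
    return this.history[this.history.length - 1];
  }
}
```

The design choice worth noting is that the gate is itself deterministic code: the agent may draft freely, but what reaches the runtime is decided by checks a human wrote and can read.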

Tactical questions when applying

Where in our stack is AI currently in the runtime that could be moved to construction? (e.g., classification logic that runs on every request and rarely changes.)
Where in our stack do we run AI-generated code that nobody has ever audited? (Construction without review is its own failure mode.)
What’s the “spec → deterministic code” pipeline? If absent, AI’s construction-layer leverage doesn’t compound.
Are we paying for inference at runtime when we could pay it once at construction? (Cost smell.)
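
The cost question reduces to a break-even calculation. The sketch below is a simplification with made-up figures: the function name and all three cost inputs are assumptions for illustration (expressed in integer micro-dollars to avoid floating-point noise), not measurements from any real system.

```typescript
// Number of requests after which paying once at construction is cheaper than
// paying for inference on every request. All costs are in the same unit.
function breakEvenRequests(
  buildCost: number, // one-time cost to generate and review the deterministic code
  inferencePerRequest: number, // cost of a model call on each request
  deterministicPerRequest: number, // cost of running the generated code on each request
): number {
  const savedPerRequest = inferencePerRequest - deterministicPerRequest;
  if (savedPerRequest <= 0) return Infinity; // construction never pays for itself
  return Math.ceil(buildCost / savedPerRequest);
}

// Illustrative, invented figures in micro-dollars:
// 50 USD to build, 0.002 USD per inference, 0.00001 USD per deterministic run.
const requestsToBreakEven = breakEvenRequests(50_000_000, 2_000, 10);
```

With these invented numbers `requestsToBreakEven` is 25126, so a decision made more often than that is cheaper to codify than to re-infer. The real inputs will differ, and the calculation ignores latency, which usually strengthens the case further.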

Connections

Augmentation-Over-Automation — the same insight at the design-philosophy level (AI as teammate, not replacement). Build-AI-Run-Deterministic is the architectural specialization: AI augments the engineer building the system, not the system at runtime.
AI-Ready-Engineering — code health, TDD, supervisory workflows are the substrate that makes AI-built artifacts auditable. Without this substrate, AI construction degrades.
Use-Equals-Build — when an operator describes a workflow in conversation and the system materializes it, the user IS extending the system. The construction-layer AI is the mechanism that makes Use=Build real for non-engineers.
Smalltalk-Integrated-Environment-Brief — Mesh’s roadmap of “hit play and step through the workflow with sample data” is a Smalltalk inspector for AI-built workflows. The breadboard mentality.
Doorman-Fallacy — agent-in-the-runtime patterns often fall into the fallacy: you see a task being done and try to make AI “do that task at every request,” missing that the task is best specified once.
Layer-Cake — the AI substrate stratifies into construction-time vs run-time the way the rest of the stack does. Don’t conflate the layers.
Plug-Me-In-Archetype — Mesh is a plug-me-in because the runtime is deterministic and constructible. Plug-me-ins that put AI in the runtime are much harder to integrate into regulated environments.
Capability-Autonomy-Risk-Triangle — Build-AI-Run-Deterministic is one of the cleanest ways to split the triangle by phase: high autonomy at build time, low autonomy at runtime. Lets you compose otherwise-incompatible corners of the triangle.
Vibe-Coding-to-Agentic-Engineering — closest sibling. Both express the same architectural insight at different scales: AI at the construction/spec layer, deterministic execution at the artifact layer. Agentic engineering generalizes Build-AI-Run-Deterministic from runtime systems to the SDLC itself.