
What's an agent runtime platform, exactly?

Five layers stacked: LLM, harness, agent, runtime, runtime platform. The four moving parts of any platform. Why the category became inevitable in 2026.

Appstrate
Tags: category, runtime, architecture

The five layers of an agent runtime platform

When we say "agent runtime platform", what are we actually talking about?

To answer, we need to lay down the vocabulary first. Five layers stack on top of each other. We'll walk through them in order, lowest level to highest:

  1. LLM: the model that thinks (Claude Opus 4.7, GPT-5, Gemini 2.5...).
  2. Harness: the rigging that lets the LLM act (loop, tools, system prompts, memory).
  3. Agent: an LLM in a harness, pointed at a goal.
  4. Runtime: the environment that hosts the agent (process, filesystem, credentials).
  5. Runtime platform: a multi-tenant runtime that serves many users safely, behind an API.

Each layer wraps the previous one. The full stack at a glance:

[Diagram: nested boxes, from outermost to innermost: RUNTIME PLATFORM, RUNTIME, AGENT, HARNESS, LLM (+ GOAL).]

The sections that follow zoom in on each layer, bottom to top.

1. LLM

An LLM (Large Language Model) is a model that takes text in and gives text back. Claude Opus 4.7, GPT-5, Gemini 2.5, Mistral Large: all LLMs. Ask it a question, it answers. Ask for a summary, it summarizes. Ask it to write code, it writes.

[Diagram: INPUT (text): "What's the capital of Peru?" → LLM (Claude Opus 4.7) → OUTPUT (text): "Lima."]

That's it. No actions, no memory between calls, no arms to fetch info from the web.

Naming an LLM takes three levels: a provider (Anthropic, OpenAI, Google...), a model (Opus, Sonnet, GPT-5, Gemini...) and a version of that model (Sonnet 4.6, GPT-5-mini...). Through the API, we call the precise combination by its identifier: claude-sonnet-4-6, gpt-5-mini.

  • What it can do: understand, write, translate, follow instructions, reason out loud.
  • What it can't do alone: query a real database, send an email, write a file to disk, call an external API.

An LLM is locked in its box. Its world is the text we hand it. Nothing else.

2. Harness

An LLM on its own only produces text. To act on the world, it needs rigging strapped around it. That rigging is called a harness (literally, the gear you put on a horse to steer it).

A harness has four ingredients:

  • the tools the LLM can call (read a file, send an email, run a SQL query, hit an API);
  • the loop that calls the LLM again and again: the LLM picks a tool, the harness dispatches the call and brings back the result, the LLM observes, adjusts its strategy, and goes again;
  • the system prompts: persistent instructions the harness injects on every call to the LLM to frame its behavior and set the rules;
  • the memory: the context that builds up turn after turn ("I already read this file, here's what was in it"), and sometimes a long-term memory across sessions ("this user prefers short answers").

Visually, the harness wraps the LLM:

[Diagram: HARNESS (tools + loop + prompts + memory) wrapping the LLM (Claude Opus 4.7).]

Without a harness, the LLM answers once and stops. With a harness, it has arms, eyes, memory, and the ability to take multiple turns.

The LLM never leaves its text-only box, even with tools. Everything the harness does around it is text: the system prompts it injects, the memory it accumulates, even the tools. A tool follows the same rule as the LLM: text in, text out.

The LLM writes the call, the harness reads that text, runs the command, captures the result, hands it back as text. The LLM stays in its world of text; the harness translates in both directions.

[Diagram: the LLM emits a tool call as text, read_file("/etc/hosts"); the HARNESS reads the LLM's text, runs it, and hands the result back as text, "127.0.0.1 localhost...".]
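To make the translation step concrete, here is a minimal sketch of a harness dispatching one tool call. It assumes the LLM writes its call as a small JSON object with "name" and "arguments" keys; that wire format, the TOOLS registry and the dispatch function are illustrative choices for this post, not the format of any particular provider or harness.

```python
import json

# Hypothetical tool: text arguments in, text out.
def read_file(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()

# Hypothetical registry of the tools this harness exposes to the LLM.
TOOLS = {"read_file": read_file}

def dispatch(tool_call_text: str) -> str:
    """Parse a tool call written by the LLM, run it, hand the result back as text."""
    try:
        call = json.loads(tool_call_text)      # the LLM's text
        tool = TOOLS[call["name"]]             # look up the requested tool
        return str(tool(**call["arguments"]))  # the result, as text again
    except Exception as exc:
        # Failures go back to the LLM as text too, so it can adjust and retry.
        return f"error: {exc!r}"
```

Note that even errors travel back as text: from the LLM's side of the box, a failed tool call is just one more string to read.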

But this harness doesn't run on its own: it needs something to aim at. A harness pointed at a goal is called an agent. The next section unpacks that.

3. Agent

A harness runs the LLM in a loop. Without direction, that loop spins in place; with a goal, it can accomplish something. That's an agent.

LLM + Harness + Goal = Agent.

[Diagram: AGENT = HARNESS (tools + loop + prompts + memory) wrapping the LLM, pointed at a GOAL: "write useful code".]

The goal is what we ask the LLM: "write useful code", "triage this support ticket", "reconcile the invoices". The LLM reads it in its prompt, picks a tool from its harness, watches the result, picks the next tool, until it converges. Important: we give a goal, not a recipe. That's the difference, as Cursor puts it, between dictating every turn to someone and giving them the destination while they follow GPS. The LLM decides the steps itself, by trial and error if it has to. A human can validate at certain checkpoints, but by default it moves on its own. The same LLM in the same kind of harness, with two different goals, gives two different agents.
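The equation can be written down as a data structure. This is only a sketch: the Harness and Agent classes, their field names and the new_harness helper are invented for illustration and do not come from Claude Code, Cursor or Pi.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Harness:
    tools: dict[str, Callable[[str], str]]           # name -> text-in, text-out tool
    system_prompt: str                               # injected on every LLM call
    memory: list[str] = field(default_factory=list)  # builds up turn after turn

@dataclass
class Agent:
    model: str        # the LLM, by API identifier
    harness: Harness
    goal: str         # a destination, not a recipe

def new_harness() -> Harness:
    """Same kind of harness every time, with a fresh memory."""
    return Harness(
        tools={"echo": lambda text: text},
        system_prompt="Work step by step. Stop when the goal is met.",
    )

# Same LLM, same kind of harness, two goals: two different agents.
coder = Agent("claude-sonnet-4-6", new_harness(), "write useful code")
triager = Agent("claude-sonnet-4-6", new_harness(), "triage this support ticket")
```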

And here's the agent in motion, turn by turn:

[Diagram: GOAL → LLM ("which tool do I pick?") picks a TOOL (read a file, hit an API, send an email) → the tool returns a result → LLM ("made progress? keep going?") → goal reached: stop; goal not reached: back to picking a tool.]

Anthropic puts it more formally: "systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks." Simon Willison puts it shorter: "an LLM agent runs tools in a loop to achieve a goal." Both say the same thing.

The best-known harnesses are Claude Code and Cursor, plus Pi, the open‑source harness used by OpenClaw (and Appstrate). When you fire up Claude Code with a task, that's the stack that kicks in: Claude Opus 4.7 (the LLM) inside the Claude Code harness (Anthropic's loop, tools, prompts), with the task as the goal. That's an agent.

4. Runtime

An agent (LLM + harness + goal) doesn't run in a vacuum. It needs a computer to run on: an active process, a filesystem to read and write to, a terminal to run commands, credentials to authenticate, and somewhere to persist its state between runs. That computer, virtual or physical, has a name: a runtime.

Take the previous diagram and wrap it again:

[Diagram: RUNTIME (process + filesystem + terminal + credentials) wrapping the AGENT: HARNESS (tools + loop + prompts + memory) around the LLM, pointed at the GOAL "write useful code".]

Claude Code, Cursor and Pi, which we just described as harnesses, don't need a separate runtime: they use your laptop (or a VPS) as the runtime. You install one app (the harness); your computer provides the rest (the process, the filesystem, the terminal, your credentials). Hence the install in a single command.

The most minimal harness + runtime pair you can write fits in ten lines of Python: a while loop that calls an LLM, reads the requested tool, runs it, returns the result, repeats. The code in the loop is the harness; the Python process that hosts it is the runtime; in practice the two ship together as a single program. Claude Code is a polished version of the same idea: UI, guardrails, session management, but harness and runtime stay bundled in the same app.
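Here is roughly what that loop looks like. It is a sketch under stated assumptions: the "tool argument" and "DONE answer" text protocol is made up for this example, and fake_llm is a scripted stand-in for a real model call so the code runs offline. It comes out a little longer than ten lines because of that stand-in and a turn limit.

```python
from typing import Callable

def run_agent(
    llm: Callable[[list[str]], str],
    tools: dict[str, Callable[[str], str]],
    goal: str,
    max_turns: int = 10,
) -> str:
    transcript = [f"GOAL: {goal}"]              # memory: grows turn after turn
    for turn in range(max_turns):               # the loop, with a safety cap
        reply = llm(transcript)                 # text in, text out
        if reply.startswith("DONE "):           # the model says the goal is reached
            return reply[len("DONE "):]
        name, _, arg = reply.partition(" ")     # e.g. "shout hello"
        tool = tools.get(name)
        result = tool(arg) if tool else f"unknown tool: {name}"
        transcript += [f"CALL: {reply}", f"RESULT: {result}"]
    return "stopped: turn limit reached"

# Scripted stand-in for a real LLM: call one tool, then report its result.
def fake_llm(transcript: list[str]) -> str:
    last = transcript[-1]
    if last.startswith("RESULT: "):
        return "DONE " + last[len("RESULT: "):]
    return "shout hello"

answer = run_agent(fake_llm, {"shout": str.upper}, "say hello loudly")
print(answer)  # HELLO
```

Everything inside run_agent is harness; the Python process you launch it in is the runtime. Swap fake_llm for a function that calls a real model and the structure stays the same.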

These runtimes share a common trait: they run for one user, on their machine, with their credentials. The limit shows up the moment we want to run an agent for several people, in production.

5. Runtime platform

As long as the agent serves one person, your laptop (or a VPS) is enough as a runtime. But the day the agent has to serve a team, customers, hundreds of users in parallel, we can't just clone VPSes: too expensive, too slow to provision, unmanageable at scale. We need a different class of runtime, capable of launching hundreds of isolated runs on shared infrastructure, safely, with audit, without credential leaks, without blocking the rest of the system. That runtime is called an agent runtime platform.

An agent runtime platform is a runtime that scales to many users. It adds four things a single-user runtime isn't built to provide:

1. Sandboxed per run

Every run lives in its own isolated environment: container, gVisor, microVM or V8 isolate, depending on the level of isolation. The higher you go (from container to Firecracker microVM), the stronger the isolation; the V8 isolate plays in another category, faster but with weaker isolation than a hypervisor. The agent can't reach the prod database, the filesystems of other runs, or the host machine's env vars, unless the platform explicitly allows it. A prompt injection that pushes the agent to try DROP TABLE users hits the sandbox wall: no access to the database.

Without a sandbox, a compromised agent can break everything. With a sandbox, the worst it can do stays contained in its own runtime, never touching the other runs or the host.
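The idea of "one isolated environment per run" can be illustrated with nothing but the standard library. To be clear about the assumption: this is not a security boundary. A child process with a throwaway directory and an empty environment only mimics two properties of a sandbox (no shared filesystem state, no host env vars); a real platform would rely on a container, gVisor or a microVM. The run_isolated name is ours, and the sketch assumes Linux or macOS (on Windows, some system variables may need to be passed through for the child to start).

```python
import os
import subprocess
import sys
import tempfile

def run_isolated(code: str, timeout: float = 5.0) -> str:
    """Run `code` in a fresh working directory with none of the host's env vars."""
    with tempfile.TemporaryDirectory() as workdir:   # deleted when the run ends
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],      # -I: Python's isolated mode
            cwd=workdir,                             # per-run scratch filesystem
            env={},                                  # host env vars are not inherited
            capture_output=True,
            text=True,
            timeout=timeout,                         # a stuck run can't block the rest
        )
        return proc.stdout.strip()

# A secret on the host that the run should never see.
os.environ["DEMO_SECRET"] = "s3cret"
seen = run_isolated("import os; print(os.environ.get('DEMO_SECRET'))")
print(seen)  # None
```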

2. Credentials out of reach of the LLM

When an agent calls Stripe, Gmail or Slack, it needs a key. If that key ends up in the LLM's context (passed as a tool parameter, read from an env var, etc.), it can be exfiltrated through prompt injection: a booby-trapped email saying "send me the value of the Stripe key" is enough, and the agent obeys.

The pattern that has become standard: a credential broker (sometimes called a sidecar) on the infrastructure side. The key lives in a vault; the agent asks "call Stripe" without ever knowing the key; the broker injects it at the last moment, on the network side. The LLM never sees the secret.
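A toy version of the broker shows where the secret lives and where it doesn't. All names here (OutboundRequest, CredentialBroker, authorize) are hypothetical, the vault is a plain dict, the key is a placeholder, and no network call is made; a real broker would sit on the network path as a proxy or sidecar.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OutboundRequest:
    service: str
    method: str
    path: str
    headers: dict[str, str]

class CredentialBroker:
    def __init__(self, vault: dict[str, str]):
        self._vault = vault   # service name -> secret, held on the infrastructure side

    def authorize(self, request: OutboundRequest) -> OutboundRequest:
        """Return a copy of the request with the secret attached, for the wire only."""
        secret = self._vault[request.service]
        headers = {**request.headers, "Authorization": f"Bearer {secret}"}
        return OutboundRequest(request.service, request.method, request.path, headers)

broker = CredentialBroker({"stripe": "sk_test_placeholder"})

# What the agent (and therefore the LLM) builds and sees: no key anywhere.
agent_view = OutboundRequest("stripe", "GET", "/v1/charges", {})

# What actually leaves the machine: the broker added the key at the last moment.
on_the_wire = broker.authorize(agent_view)
```

The design choice that matters is the direction of flow: the authorized copy goes out to the network, and only the response body comes back into the LLM's context.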

3. Multi-tenant, audit included

The platform is built from day one around organizations, users, API keys, permissions, per-org quotas, and an audit trail that logs every call: who launched what, when, on what data. Impersonation lets a human support agent replay a run as a given customer. Idempotency lets a call be retried without firing twice.

All this machinery becomes necessary the moment an agent serves more than one person. Bolting it onto a single-tenant runtime after the fact is the best way to ship public CVEs by the dozen.
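Two of these pieces, idempotency and the audit trail, fit in a short sketch. It assumes an in-memory store and an idempotency key supplied by the caller and scoped per organization; RunService, launch and the audit record fields are illustrative names, not Appstrate's API.

```python
import time
from dataclasses import dataclass, field

@dataclass
class RunService:
    # (org, idempotency key) -> run id
    runs: dict[tuple[str, str], str] = field(default_factory=dict)
    # one record per call: who launched what, when
    audit: list[dict] = field(default_factory=list)

    def launch(self, org: str, user: str, agent: str, idempotency_key: str) -> str:
        key = (org, idempotency_key)       # keys are scoped to the organization
        replay = key in self.runs
        if not replay:                     # a retry must not fire a second run
            self.runs[key] = f"run_{len(self.runs) + 1}"
        self.audit.append({
            "ts": time.time(), "org": org, "user": user,
            "agent": agent, "run": self.runs[key], "replay": replay,
        })
        return self.runs[key]

svc = RunService()
first = svc.launch("acme", "alice", "invoice-reconciler", "key-123")
retry = svc.launch("acme", "alice", "invoice-reconciler", "key-123")  # same run
other = svc.launch("globex", "bob", "invoice-reconciler", "key-123")  # other org
```

The retry returns the original run instead of starting a new one, yet it still leaves its own line in the audit trail.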

4. Behind an API

The platform exposes its agents as a network service: POST /runs launches a run, GET /runs/:id queries its status, an SSE stream follows progress live, a webhook fires at the end. Any app can plug in without knowing the internal details of the agent. A single-user runtime, by contrast, stays trapped inside its terminal and won't take calls from elsewhere.

This exposure flips the agent from a personal tool into an infrastructure primitive: a service consumed from a SaaS product, a cron, another app, without having to fire up Claude Code by hand.
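The two core endpoints can be sketched as a plain function, without a web framework. The routes POST /runs and GET /runs/:id come from the description above; the status codes, the "queued" status and the field names are assumptions made for the example, and the SSE stream and webhook are left out.

```python
import itertools
from typing import Optional

RUNS: dict[str, dict] = {}       # in-memory stand-in for the platform's run store
_next_id = itertools.count(1)

def handle(method: str, path: str, body: Optional[dict] = None) -> tuple[int, dict]:
    """Route one request; returns (HTTP status, JSON-like body)."""
    if method == "POST" and path == "/runs":
        run_id = f"run_{next(_next_id)}"
        RUNS[run_id] = {"id": run_id, "status": "queued", "input": body or {}}
        return 202, RUNS[run_id]             # accepted: the run proceeds asynchronously
    if method == "GET" and path.startswith("/runs/"):
        run = RUNS.get(path[len("/runs/"):])
        return (200, run) if run else (404, {"error": "run not found"})
    return 404, {"error": "no such route"}

# Any caller (a SaaS backend, a cron job) needs only these two calls.
status, created = handle("POST", "/runs", {"agent": "invoice-reconciler"})
status, fetched = handle("GET", f"/runs/{created['id']}")
```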

Recap: the stack, inside out

Take the previous diagram and wrap it once more. Except this time, the same platform runs several runtimes in parallel, one per user or per run:

[Diagram: RUNTIME PLATFORM (multi-tenant + sandbox + audit + API) containing RUNTIME (run #1) and RUNTIME (run #2), each wrapping an AGENT (LLM pointed at a GOAL), and as many runtimes as runs.]

Each layer wraps the previous one. Claude Code bundles a harness (Anthropic's code), an agent (the whole thing pointed at "code"), and a single-user runtime (your laptop): one stack for one person. A runtime platform like Appstrate or Bedrock adds the layer above: running several of those stacks in parallel, for multiple users, with no run able to see the others.

Runtime vs Framework vs Workflow builder

Three categories that get confused often. The deciding question: where does the code run, and who owns the process?

|                         | Framework (LangChain)  | Workflow builder (n8n)    | Runtime platform (Appstrate, Bedrock)         |
|-------------------------|------------------------|---------------------------|-----------------------------------------------|
| Where the agent runs    | Inside the app process | On the builder's server   | In an isolated sandbox managed by the runtime |
| Who handles isolation   | The app team           | Shared process by default | The runtime (Docker / microVM)                |
| Who handles credentials | The app team           | The server's env vars     | A credential broker the agent can't read      |
| Multi-tenant            | The team's job         | Workspaces, limited       | First-class: orgs / apps / end-users          |
| State between runs      | The team's job         | Per-workflow variables    | Memory and state, persisted, scoped           |
| What ships              | Code                   | A JSON graph              | An agent (prompt + schema + tools)            |

For a one-shot agent on a laptop, a framework is enough. For five steps repeated every Tuesday, a workflow builder. To serve hundreds of users with their credentials, in production, with an audit trail and an SLA: a runtime platform.

Why 2026, and why open‑source

In 2026, AWS, Anthropic, Google, Microsoft and OpenAI all shipped their own managed agent runtime platform. The category became inevitable for three reasons:

  • Agents are out of the prototype phase. Claude Code, OpenClaw and Antigravity made autonomous coding agents an everyday tool. Companies want the same for invoice reconciliation, support triage, marketing ops. That's production, and it demands production infrastructure.
  • LLMs got capable enough to be dangerous. An agent with filesystem access and a broken prompt can wipe a folder; with a Gmail OAuth token and a booby-trapped email, it can send the invoices to a competitor. The defense that holds: sandbox it.
  • Multi-tenant is harder than it looks. Storing tokens is 5% of the work. Then comes per-user isolation, impersonation, rate limits, audit, webhooks, idempotency. A runtime platform packages all of that.

These managed products (Bedrock AgentCore, Claude Managed Agents, Vertex AI Agent Engine, Foundry Agent Service) are well-built but closed: you run on their infra, with their models, under their pricing. For plenty of teams, that's an acceptable trade-off. For others (data residency, model freedom, cost control, branded agents inside your own SaaS), it isn't anymore.

An open‑source agent runtime platform keeps the same architecture (sandbox, credential broker, multi-tenant, API) but lifts the constraints: your infra, your models, your license.

That's what we're building with Appstrate. One-command install, Apache 2.0, the same binary running on a laptop or in an air-gapped datacenter, end-users and webhooks first-class.
