What is an Agent System? A Complete Guide for 2026
An agent system is a software architecture that pairs a large language model with tools, memory, and a control loop — so that the model can act, not just answer. This guide walks through the four-layer anatomy, shows where agents outperform traditional workflows, and covers the production patterns that actually ship in industrial environments.
A working definition
An agent system is a runtime architecture in which one or more language models are given the ability to take actions — read data, call APIs, update systems, and decide what to do next — inside a controlled loop. It has four required parts: a model, a harness, memory, and tools. Remove any one and the system breaks.
That definition matters because the word "agent" has been stretched to mean everything from a chatbot that calls one function to a fully autonomous fleet managing a warehouse. For an operations leader deciding what to build, the differences are load-bearing. A chatbot does not need a harness. A production system absolutely does.
An agent system = model + harness + memory + tools, operating in a loop that can observe, decide, act, and learn without a human driving each step.
The four-layer anatomy
Every production agent system, regardless of vendor or framework, is built from the same four layers. The rest of this article walks through each in order, because the failure modes tend to cluster by layer and the economics do too.
| Layer | Role | Typical components | What breaks if it's weak |
|---|---|---|---|
| Model | Reasoning and language | Claude, GPT, Gemini, open-source variants | Hallucinations, poor judgment, high cost |
| Harness | The control loop | Agent SDK, LangGraph, custom runtime | Unsafe actions, runaway loops, no observability |
| Memory | State across steps and sessions | Vector DB, key-value store, session cache, files | Forgets context, repeats work, no learning |
| Tools | The agent's hands | APIs, MCP servers, RPC, SQL queries, scripts | Can think, can't do; bottleneck at the last mile |
1. The model
The model is the reasoning engine. In 2026 the dominant production options are Claude (Anthropic), GPT (OpenAI), Gemini (Google), and a small number of open-source models fine-tuned for agentic work. The choice of model matters less than operators typically assume — most of the performance gap between a "good" agent and a "bad" one comes from the layers above and below the model, not the model itself.
That said, the frontier models differ meaningfully in three ways that affect industrial deployments: context window (how much history the agent can hold at once), tool-use reliability (how consistently the model calls tools in the expected format), and cost per token. For most industrial use cases, the first two matter more than the third until volume crosses about 10,000 agent runs per month.
2. The harness
The agent harness is the single most under-appreciated part of an agent system. It is the code that wraps the model and runs the loop: take a task, call the model, read the model's tool-call output, execute the tool, feed the result back, ask the model what to do next, stop when the task is complete, and log every step along the way.
Without a harness, a language model is a text generator. With a harness, it is an agent. The harness is where you enforce:
- Step limits. No more than N iterations before human review.
- Tool allow-lists. This agent can read from the ERP, but it cannot write to the ledger.
- Cost caps. Kill the run if it exceeds a token budget.
- Retry logic. Network failures, rate limits, malformed tool output.
- Observability. Every prompt, every tool call, every result, persisted and queryable.
- Approval gates. Pause and wait for a human before actions above a risk threshold.
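The loop and the checks above can be sketched in a few dozen lines. This is a minimal illustration, not a production harness: `call_model` is a stub standing in for a real model client, and the tool registry is a placeholder for your actual integrations.

```python
# Minimal harness sketch: step limit, tool allow-list, cost cap, and a log.
# call_model is a stub standing in for a real model client.

MAX_STEPS = 5
TOKEN_BUDGET = 10_000
ALLOWED_TOOLS = {"read_erp": lambda q: f"erp:{q}"}  # read-only by design

def call_model(history):
    # Stub: a real client would return the model's next tool call or answer.
    if any(msg.startswith("tool:") for msg in history):
        return {"done": True, "answer": "task complete", "tokens": 200}
    return {"done": False, "tool": "read_erp", "arg": "open POs", "tokens": 300}

def run(task):
    history, spent = [task], 0
    for step in range(MAX_STEPS):                       # step limit
        out = call_model(history)
        spent += out["tokens"]
        if spent > TOKEN_BUDGET:                        # cost cap
            return {"status": "killed", "reason": "token budget"}
        if out["done"]:
            return {"status": "ok", "answer": out["answer"], "steps": step + 1}
        if out["tool"] not in ALLOWED_TOOLS:            # tool allow-list
            return {"status": "blocked", "reason": out["tool"]}
        result = ALLOWED_TOOLS[out["tool"]](out["arg"])
        history.append(f"tool:{out['tool']} -> {result}")  # observability log
    return {"status": "escalated", "reason": "step limit"}

print(run("triage supplier ticket #123"))
```

Everything the bullet list demands lives in this one function, which is the point: the harness is small, but it is where every safety property is enforced.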
Popular harnesses in 2026 include the Claude Agent SDK, OpenAI's Assistants API, LangGraph, and a growing ecosystem of custom runtimes. For industrial deployments, teams increasingly build their own — because the harness is where your safety model lives, and that should not be a vendor dependency.
3. Memory

Memory is what separates an agent that can do a single task from an agent that can work through a shift. There are five kinds of memory in a production agent system, and most teams only think about two of them.
- Working memory. The current context window. Everything the model sees right now.
- Episodic memory. What happened in prior runs. "Last time I triaged a supplier like this, here's what I did."
- Semantic memory. Facts about the world. Specs, policies, product catalogs, SOPs.
- Procedural memory. How to do things. Learned patterns, preferred tool sequences.
- User memory. The preferences and working style of the operator the agent is paired with.
Most early agent projects build only working memory and a basic semantic retrieval over a vector database. That is enough for proof-of-concept. It is not enough for production. The agents that actually become useful in industrial settings are the ones that accumulate episodic memory — the log of "what I tried, what worked, what the operator corrected" — and feed it back into the system.
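What "accumulating episodic memory" means in practice can be sketched as a small store that records each run's outcome and surfaces prior episodes — corrections first — when a similar task arrives. The names (`Episode`, `EpisodicStore`, `supplier_triage`) are illustrative, not from any particular framework.

```python
# Sketch of an episodic memory store: each run logs what the agent tried,
# whether the operator accepted it, and any correction. Prior episodes of
# the same kind are recalled to seed the next run's context.
from dataclasses import dataclass, field

@dataclass
class Episode:
    task_kind: str        # e.g. "supplier_triage"
    action: str           # what the agent did
    outcome: str          # "accepted" | "corrected"
    correction: str = ""  # the operator's fix, if any

@dataclass
class EpisodicStore:
    episodes: list = field(default_factory=list)

    def record(self, ep):
        self.episodes.append(ep)

    def recall(self, task_kind, limit=3):
        # Corrections sort first: they carry the strongest signal
        # about what the next run should do differently.
        same = [e for e in self.episodes if e.task_kind == task_kind]
        same.sort(key=lambda e: e.outcome != "corrected")
        return same[:limit]

store = EpisodicStore()
store.record(Episode("supplier_triage", "routed to sourcing", "accepted"))
store.record(Episode("supplier_triage", "auto-approved", "corrected",
                     "commodity buys over $50k still need review"))
print(store.recall("supplier_triage")[0].correction)
```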
We go deeper on memory patterns in the Agent Memory Complete Guide.
4. Tools
Tools are the agent's hands. Every action an agent can take in the world is a tool call: query a database, read a PDF, call an API, post to Slack, place an order, run a SQL query, update a Jira ticket. The quality of the tool layer usually determines whether an agent is useful or theater.
Two shifts in 2025–2026 changed how tools are built. First, the Model Context Protocol (MCP) gave agents a standard way to discover and use tools across vendors. Second, tool-use reliability became a frontier-model priority — modern frontier models call tools with the correct schema well above 95% of the time, where two years ago that number was 70–80%.
For industrial operators, the practical implication is that you should expect your agent to call every tool you give it, correctly, the first time. If it doesn't, the bug is almost always in the tool description, the schema, or the surrounding context — not in the model.
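Since the description and schema are where tool-call bugs live, it is worth seeing what a well-described tool looks like. The sketch below uses the JSON-schema style common to most agent frameworks and MCP servers; the tool name and fields are hypothetical.

```python
# Illustrative tool definition in the JSON-schema style most agent
# frameworks and MCP servers use. The description tells the model
# *when* to use the tool, not just what it does — that context is
# what makes first-call accuracy possible.
lookup_work_orders = {
    "name": "lookup_work_orders",
    "description": (
        "Return the most recent work orders for a piece of equipment. "
        "Use this BEFORE deciding whether to dispatch a technician, so "
        "you can see whether the fault was already serviced recently."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "equipment_id": {
                "type": "string",
                "description": "Asset tag as it appears in the CMMS, e.g. 'PUMP-104'",
            },
            "limit": {
                "type": "integer",
                "description": "Maximum number of records to return",
                "default": 5,
            },
        },
        "required": ["equipment_id"],
    },
}
```

When an agent misuses a tool defined like this, the first fix to try is sharpening the description, not swapping the model.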
Agent vs. workflow vs. automation
The most common mistake in enterprise AI projects in 2026 is using an agent where a workflow would have done the job better. Agents are powerful because they can handle unknown inputs and novel situations. They are expensive because every step runs an inference call, and they are harder to reason about because their behavior is stochastic.
A useful three-way distinction:
| | Automation | Workflow | Agent system |
|---|---|---|---|
| Decisions | None — scripted | Pre-defined branches | Dynamic, at runtime |
| Input | Structured, predictable | Structured, known variants | Unstructured or novel |
| Cost per run | Near zero | Near zero | $0.01 – $1.00+ |
| Debuggability | Deterministic | Deterministic | Probabilistic, needs traces |
| When to use | The task never changes | Known variants, clear rules | Judgment required, open input |
The best industrial agent systems are not 100% agentic. They are workflows with agents at the decision points.
In a well-designed system, the deterministic backbone does the heavy lifting — data movement, state transitions, queues — and the agent is called only when judgment is required. A procurement system, for example, might have a workflow that receives a purchase request, validates the fields, routes to the right approver, and updates the ERP. The agent is called inside that workflow to answer one question: "Is this a commodity purchase or does it require a sourcing review?" That is a one-sentence job for an agent, and the rest is plumbing.
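The procurement example can be sketched directly: a deterministic workflow that validates and routes, with the agent invoked at exactly one decision point. `classify_purchase` is a stub standing in for the one-sentence model call; everything else is plain code.

```python
# Sketch of the procurement example: deterministic backbone, agent at
# the single judgment point. classify_purchase is a stub for the real
# model call ("commodity purchase, or sourcing review?").
def classify_purchase(request):
    # Stub: a real implementation would ask the model this one question.
    return "commodity" if request["amount"] < 10_000 else "sourcing_review"

def handle_purchase_request(request):
    # Deterministic backbone: validation and routing are plain code.
    missing = [f for f in ("item", "amount", "requester") if f not in request]
    if missing:
        return {"status": "rejected", "missing": missing}
    # The one judgment call goes to the agent.
    route = classify_purchase(request)
    approver = "auto" if route == "commodity" else "sourcing-team"
    return {"status": "routed", "route": route, "approver": approver}

print(handle_purchase_request({"item": "gloves", "amount": 400, "requester": "lee"}))
```

Note the cost implication: only one line in this flow incurs an inference call. The rest runs at near-zero cost, which is exactly the "workflow with an agent at the decision point" shape.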
Seven production patterns
Across the deployments we've seen in the past eighteen months, most useful industrial agent systems fall into one of seven patterns. These are worth knowing because they compose — a real system usually combines three or four of them.
1. Triage agent
Reads an incoming item (ticket, email, PO, inspection report), classifies it, decides where it goes. Replaces a human triage role; escalates unclear cases. The simplest pattern, and often the highest-ROI first deployment.
2. Extraction agent
Pulls structured data from unstructured documents — contracts, spec sheets, COAs, safety reports, maintenance logs. Writes results into a system of record. This is where most industrial companies get their first durable win.
3. Research and summarize
Pulls data from multiple sources, synthesizes, produces a structured output. Think: weekly supplier risk report, daily shift-handoff summary, equipment-history briefing before a site visit.
4. Orchestration agent
Coordinates several sub-agents or tools to accomplish a multi-step task. The "manager" pattern. Powerful but dangerous if under-constrained — this is where teams most often over-invest too early.
5. Conversational operator
A chat interface sitting in front of a data system — Slack or Teams, often — that operators use to query, update, and take action through natural language. The "Jarvis for ops" pattern.
6. Monitor and respond
Runs on a schedule or trigger, checks a condition, takes action or alerts. The successor to traditional monitoring, with the ability to interpret state instead of just threshold-checking it.
7. Continuous worker
Runs in a loop, working through a queue, handling each item with judgment. Closest to the science-fiction image of an agent. Appropriate for well-bounded, high-volume tasks with strong guardrails.
What this looks like in industrial operations
When deploying agent systems, industrial operators have three advantages that pure-software companies lack: bounded domains, clear ROI, and existing systems of record. That combination makes industrial agents both easier to justify and easier to instrument than the consumer-facing versions that get more press.
The use cases with the clearest track record as of 2026:
- Predictive-maintenance triage. An agent reads incoming telemetry alerts, cross-references recent work-order history, and decides whether to dispatch, schedule, or ignore.
- Supplier qualification. An agent reads a supplier's documentation package, checks it against your standards, and produces a qualification memo with gap analysis.
- Shop-floor Q&A. An agent fronts your SOPs, equipment manuals, and safety docs; operators ask questions in plain language.
- Work-order drafting. An agent takes a technician's voice note or photo, drafts a work order, and submits it for approval.
- Procurement exception handling. An agent handles the 80% of purchase requests that are routine and escalates the 20% that aren't.
- Quality-report generation. An agent takes the raw data from an inspection run and produces the formatted report the customer expects.
Each of these is a pattern-1, pattern-2, or pattern-3 agent — triage, extraction, or research. Pattern-4 orchestration agents and pattern-7 continuous workers are where industrial operators should be cautious in 2026. The technology works; the governance and risk models often haven't caught up. We cover these in detail in AI Agents for Industrial Operations.
Cost economics
Runtime cost for an industrial agent has three components:
- Inference. Roughly $0.003 to $0.075 per 1,000 tokens depending on model tier. A typical agent run consumes 5,000 to 50,000 tokens across its loop, so cost per run falls roughly in the $0.015 to $3.75 range.
- Infrastructure. Vector DB, queues, logging, observability. Usually $100 to $2,000 per month for a well-instrumented single-agent system.
- Development. Building the harness, tools, and evals. The largest cost by far in year one; amortized rapidly after.
A useful benchmark: an industrial triage agent handling 1,000 decisions per day, with a well-designed prompt and caching strategy, typically runs at $50–$500 per month in inference cost. The ROI calculation then compares that against the loaded cost of the human triage role being augmented — which in most industrial environments is somewhere between $40 and $120 per hour fully loaded.
A triage role at $60/hour fully loaded, spending two hours per day on a task an agent could handle for $5 in inference: $120/day in labor vs. $5 in inference. Payback on a $15,000 build: roughly 130 working days, or about six months.
Most industrial agents pay back faster than that because they also eliminate back-and-forth queues and handoff delays, not just the labor.
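The payback arithmetic from the example above, made explicit so you can substitute your own numbers:

```python
# Payback arithmetic from the triage example above.
hourly_rate = 60                                   # fully loaded $/hour
hours_per_day = 2
labor_per_day = hourly_rate * hours_per_day        # $120/day in labor
inference_per_day = 5                              # $/day in inference
daily_saving = labor_per_day - inference_per_day   # $115/day
build_cost = 15_000
payback_days = build_cost / daily_saving           # ~130 working days
print(round(payback_days))
```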
How to start
A reliable sequence for industrial operators getting started with agent systems in 2026:
- Map the candidates. Look for tasks that are high-volume, judgment-based, and touch unstructured data. Triage, extraction, and drafting tasks top the list.
- Pick one and instrument it. Before building the agent, instrument the manual version. How long does it take? What fraction of cases are routine? What does "success" look like?
- Build the tools first. Before wiring up the agent, make sure the underlying APIs, queries, and integrations work. 80% of agent-project time is really tool-layer time.
- Start with a narrow harness. One model, three tools, strict step limits, human approval on every action. Expand once you've seen it work on 100 real cases.
- Build the eval loop. You cannot improve what you cannot measure. A production agent without an eval harness is a liability, not an asset.
- Promote gradually. Shadow mode → assisted mode (human approves) → autonomous with review → autonomous. Most agents should live in step 3 for a long time.
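The promotion ladder in the last step can be made explicit in the harness as a gate on every proposed action. A minimal sketch, with the level names and `dispatch` function as illustrative inventions:

```python
# Sketch of the promotion ladder as an explicit gate: the same proposed
# action is handled differently at each autonomy level.
SHADOW, ASSISTED, REVIEWED, AUTONOMOUS = range(4)

def dispatch(action, level, approve=None):
    if level == SHADOW:
        # Observe only: log what the agent would have done.
        return {"executed": False, "logged": action}
    if level == ASSISTED:
        # A human approves before anything runs.
        if approve and approve(action):
            return {"executed": True, "logged": action}
        return {"executed": False, "logged": action}
    # REVIEWED and AUTONOMOUS both execute; REVIEWED queues for audit.
    return {"executed": True, "logged": action,
            "needs_review": level == REVIEWED}

print(dispatch("close ticket #88", SHADOW))
```

Keeping the level as a single explicit value makes promotion (and demotion, after an incident) a one-line configuration change rather than a rebuild.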
Find out where agent systems fit in your operations.
Our free AI Assessment walks through your tools, your workflows, and your team — and gives you a personalized map of where agent systems produce real ROI.
Take the AI Assessment →