Agent Memory: The Complete 2026 Guide

Agent memory is how an AI agent holds and retrieves context across steps, sessions, and users. Most agent projects that plateau at "interesting prototype" do so because they got memory wrong. This guide walks through the five memory types, the storage choices behind each, and the retrieval patterns that make agents actually useful in production.

What is agent memory?

Agent memory is the set of mechanisms an AI agent uses to hold and retrieve information across time. It is what lets an agent remember what happened yesterday, what the user prefers, what it learned from a previous failure, and what the facts of the domain actually are.

When people say "the model has a memory problem," they usually mean one of five different problems. Conflating them is the reason so many early agent projects feel stuck. The model isn't forgetting — it never had access in the first place.

Core idea

A language model has no built-in memory. Everything it "knows" at inference time is either in its weights (fixed at training) or in its context window (reset every call). Agent memory is the engineering around that gap.

The five types of agent memory

Borrowing a classification from cognitive science and adapting it to agent systems, there are five memory types in any production-grade agent. Most teams only build the first two. The difference between a prototype and a durable agent system is usually how many of the five are actually implemented.

| Type | Holds | Lifespan | Typical storage |
| --- | --- | --- | --- |
| Working | Current task state | One agent run | Context window |
| Episodic | Prior runs and outcomes | Indefinite | Relational or document DB |
| Semantic | Facts about the world | Indefinite | Vector DB + file store |
| Procedural | How to do things | Indefinite | Prompts, skills, playbooks |
| User | Operator preferences | Per user, indefinite | Key-value, profile files |

1. Working memory

Working memory is the context window — everything the model can see in the current inference call. It is the only memory the model actually reads from; every other type has to be retrieved and injected into this window.

In 2026, frontier models ship context windows between 200,000 and 1 million tokens. That sounds like a lot. It isn't. A busy industrial agent working through a multi-step task — reading docs, calling tools, reviewing prior decisions — can fill 200K tokens in a single run. And the cost scales with the context size on every call.

Two working-memory engineering problems dominate production: deciding what goes into the window (and what gets evicted or compressed when the budget runs out), and controlling cost, since every token in context is re-billed on each call.

2. Episodic memory

Episodic memory is the log of what the agent actually did — prior runs, the inputs they started from, the tools they called, the outcomes, and any corrections the operator made. It is the single highest-leverage memory type for industrial agents and the one most commonly neglected.

Why it matters: an agent without episodic memory re-discovers the same edge cases every time. An agent with episodic memory can answer "have I seen a case like this before?" — which is the foundation for everything from few-shot learning at runtime to continuous improvement.

Good episodic memory stores, per run:

  1. Task inputs (structured where possible)
  2. The sequence of tool calls and results
  3. The final output
  4. Outcome signal (success / failure / corrected)
  5. Human corrections or annotations
  6. Latency and cost

Relational databases are usually the right storage here, not vector DBs. You query by agent ID, user ID, task type, and outcome — not by similarity. Occasionally you want similarity search over episodes (find the three most similar prior cases), which is where a small embedding index over episode summaries helps. But the structured log is the backbone.
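A minimal sketch of this episodic log, using SQLite as the relational store. The table layout follows the six per-run fields listed above; table and column names are illustrative, not a prescribed schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE episodes (
        id INTEGER PRIMARY KEY,
        agent_id TEXT, user_id TEXT, task_type TEXT,
        inputs TEXT,          -- structured task inputs as JSON
        tool_calls TEXT,      -- ordered tool calls and results as JSON
        output TEXT,
        outcome TEXT,         -- 'success' | 'failure' | 'corrected'
        correction TEXT,      -- human annotation, if any
        latency_ms INTEGER, cost_usd REAL,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")

def log_episode(agent_id, user_id, task_type, inputs, tool_calls,
                output, outcome, correction=None, latency_ms=0, cost_usd=0.0):
    conn.execute(
        "INSERT INTO episodes (agent_id, user_id, task_type, inputs,"
        " tool_calls, output, outcome, correction, latency_ms, cost_usd)"
        " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (agent_id, user_id, task_type, json.dumps(inputs),
         json.dumps(tool_calls), output, outcome, correction,
         latency_ms, cost_usd))
    conn.commit()

# The backbone query: structured filters by ID, type, and outcome,
# not similarity search.
def recent_failures(agent_id, task_type, limit=3):
    return conn.execute(
        "SELECT inputs, output, correction FROM episodes"
        " WHERE agent_id = ? AND task_type = ? AND outcome != 'success'"
        " ORDER BY created_at DESC LIMIT ?",
        (agent_id, task_type, limit)).fetchall()
```

The same table supports the occasional similarity case by keeping a small embedding index over episode summaries on the side; the structured log remains the source of truth.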

3. Semantic memory

Semantic memory is facts about the domain: product catalogs, specifications, SOPs, equipment manuals, safety documents, regulatory requirements, vendor details, customer profiles. It is the memory most teams build first, usually as a RAG (Retrieval-Augmented Generation) pipeline.

Semantic memory has three technical components:

  1. Ingest. Convert source documents into retrievable chunks. Chunking strategy matters a lot — headings, tables, and code blocks each need their own handling.
  2. Index. Typically a vector database (pgvector, Pinecone, Weaviate, Qdrant) plus a keyword index (BM25) for hybrid search.
  3. Retrieve. At agent runtime, pull the top-k relevant chunks and inject them into context with clear source attribution.

For industrial operators, the ingest layer usually dwarfs the other two in effort. Real industrial docs are PDFs with tables, scans, engineering drawings, form fields, and version-control metadata. A semantic memory pipeline that handles a clean markdown corpus is four to ten times simpler than one that handles a real documentation set.
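The retrieve step can be sketched as a toy hybrid scorer that blends a vector-similarity score with a keyword-overlap score. Everything here is a stand-in: `embed()` fakes an embedding with a bag-of-words vector, the keyword score approximates BM25, and the 0.6/0.4 blend is an arbitrary illustrative weighting.

```python
import math
from collections import Counter

def embed(text):
    # Placeholder embedding: a real system would call an embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values())) *
            math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def keyword_score(query, chunk):
    # Crude keyword overlap standing in for a BM25 index.
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / len(q) if q else 0.0

def hybrid_retrieve(query, chunks, k=2, alpha=0.6):
    qv = embed(query)
    scored = [(alpha * cosine(qv, embed(c)) +
               (1 - alpha) * keyword_score(query, c), c) for c in chunks]
    return [c for _, c in sorted(scored, key=lambda s: s[0], reverse=True)[:k]]
```

In production the two scores come from separate indexes (vector DB plus BM25) and are merged by rank fusion; the shape of the computation is the same.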

Scoping an agent memory system?

We help industrial operators map their existing documentation, choose the right storage pattern, and design retrieval that actually works for their workload. Start with the assessment.

Take the AI Assessment →

4. Procedural memory

Procedural memory is how the agent does things — the learned patterns, preferred tool sequences, and playbooks for common situations. In most implementations it lives in prompts, skill files, and capability descriptions rather than a database.

Three practical implementations:

  1. Prompt instructions. Stable rules and preferred tool sequences baked into the system prompt, always present.
  2. Skill files. Self-contained capability descriptions loaded into context only when the task calls for them.
  3. Playbooks. Step-by-step procedures for recurring task types, retrieved by category.

The economic argument for procedural memory is simple: each stored procedure is a compressed form of what would otherwise be a long chain of reasoning. An agent with good procedural memory completes routine tasks in one or two tool calls that would otherwise take six or seven.

5. User memory

User memory is what the agent knows about the specific operator it is working with — their role, their preferences, their working style, the shortcuts they like. For single-operator agents this is trivial; for agents that serve many users it is the memory type that makes the product feel personal.

In industrial environments user memory tends to cluster around three things:

  1. Role and permissions. What this operator is allowed to do, what data they can see.
  2. Context preferences. Units (metric vs. imperial), format (brief vs. detailed), language.
  3. Working history. Recent tasks, frequently referenced items, open threads.

A useful pattern: keep user memory in a small structured profile that is always injected into context, plus a longer history that is retrieved on demand. This keeps the constant-cost portion small while still allowing deep personalization when needed.
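The two-tier pattern can be sketched as follows. The profile fields, the example history entries, and the naive keyword match (standing in for real retrieval) are all illustrative.

```python
profile = {
    "role": "maintenance lead",
    "units": "metric",
    "format": "brief",
}

history = [
    "2026-01-12: reviewed pump P-104 vibration report",
    "2026-01-13: opened work order WO-2291",
]

def always_inject_block(profile: dict) -> str:
    # Constant-cost portion: injected into every call, kept small.
    return "\n".join(f"{k}: {v}" for k, v in profile.items())

def retrieve_history(query: str, history: list, k: int = 1) -> list:
    # On-demand portion: fetched only when the task needs it.
    q = set(query.lower().split())
    scored = sorted(history,
                    key=lambda h: -len(q & set(h.lower().split())))
    return scored[:k]
```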

Storage choices, by memory type

One of the most common mistakes in early agent design is picking a vector database and then trying to fit every memory type into it. Different memories need different stores.

| Memory type | Query pattern | Best storage | Why |
| --- | --- | --- | --- |
| Working | Linear read | Context window | No storage — just the current prompt |
| Episodic | Structured filter + time | Postgres / SQLite | Rich queries by ID, time, outcome |
| Semantic | Similarity | Vector DB + BM25 | Unstructured content, hybrid retrieval |
| Procedural | Category + keyword | File system / prompt store | Versioning matters more than search |
| User | Key lookup | KV / small table | Low latency, always-injected |

The retrieval policy matters more than the storage technology.

A well-chosen Postgres table beats a poorly tuned vector DB for episodic queries. A small JSON file beats both for user preferences. The question is never "what's the trendy database?" — it's "what query will I actually run against this?"

Retrieval patterns that work

Retrieval is the hard part of memory. Storage is mostly solved; getting the right thing back into context at the right time is where agents succeed or fail.

Always-inject

Some memory should be in every call: the user's role, the task instructions, any safety constraints, the agent's identity. Keep this block small — 500 to 2,000 tokens — and measure carefully.

Query-time retrieval

At the start of the task, run one or more retrievals against the relevant memory stores, rank by relevance, and inject the top matches. For semantic memory use hybrid retrieval (vector + keyword); for episodic use structured filters.

In-loop retrieval

Let the agent itself call a retrieval tool when it needs to look something up. This works well with modern frontier models that handle tool use reliably. The advantage: the agent retrieves only when needed. The cost: one additional round-trip per retrieval.
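A sketch of exposing retrieval as a tool, using a generic JSON-schema-style tool definition rather than any particular vendor's API; the tool name and dispatch shape are illustrative.

```python
retrieval_tool = {
    "name": "search_memory",
    "description": "Look up relevant documents or prior episodes.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "store": {"type": "string", "enum": ["semantic", "episodic"]},
        },
        "required": ["query", "store"],
    },
}

def handle_tool_call(name: str, args: dict, stores: dict):
    # Dispatch the agent's mid-loop call to the matching memory store.
    if name == "search_memory":
        return stores[args["store"]](args["query"])
    raise ValueError(f"unknown tool: {name}")
```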

Summarization loops

For long-running agents, summarize older context into a short paragraph that replaces the raw turns. This is the only practical way to let an agent run for hours without blowing the context budget.
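A minimal sketch of one compaction step, assuming a rough four-characters-per-token estimate; `summarize()` stands in for a real model call, and the budget and keep-recent values are illustrative.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token.
    return len(text) // 4

def summarize(turns: list) -> str:
    # Placeholder: a real implementation would call the model here.
    return "Summary of earlier turns: " + "; ".join(t[:30] for t in turns)

def compact(turns: list, budget: int = 1000, keep_recent: int = 4) -> list:
    # Collapse the oldest turns into one summary when over budget,
    # keeping the most recent turns verbatim.
    if sum(estimate_tokens(t) for t in turns) <= budget or len(turns) <= keep_recent:
        return turns
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    return [summarize(old)] + recent
```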

Write-back

At the end of every run, write episodic memory (the log), update user memory (new preferences), and optionally distill procedural memory (if this episode revealed a reusable pattern). Most early implementations skip this step. That's why they feel like goldfish.
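A sketch of the write-back step, using in-memory stand-ins for the stores: a list for the episodic log and a dict for the user profile. All names are illustrative.

```python
episode_log: list = []
user_prefs: dict = {}

def write_back(run: dict, learned_prefs=None) -> None:
    # 1. Persist the episode so the next run can find it.
    episode_log.append({
        "task": run["task"],
        "outcome": run["outcome"],
        "output": run["output"],
    })
    # 2. Fold any newly observed preferences into the user profile.
    if learned_prefs:
        user_prefs.update(learned_prefs)
```

Wiring this into the end of every run is a small amount of code; skipping it is what makes an agent start from zero each time.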

Memory patterns in industrial agents

Three worked examples from industrial deployments. Each combines multiple memory types; none could work with just one.

Maintenance triage agent

An agent that reads incoming maintenance tickets and decides whether to dispatch, schedule, or escalate.

The biggest driver of quality is episodic memory. When a ticket arrives, the agent first asks "what's the most similar ticket we've seen in the last 90 days, and what happened?" That single query cuts the error rate roughly in half versus a pure semantic-only version.

Supplier qualification agent

An agent that reads a supplier's documentation package (certifications, spec sheets, test reports, financials) and produces a qualification memo.

Procedural memory does the most work here — the checklist encodes 90% of the qualification logic, and the agent's job is to faithfully apply it while noting where the supplier's docs are ambiguous.

Shift-handoff summarizer

An agent that runs at shift change and produces a briefing for the incoming team: open issues, equipment status, safety notes, unusual events.

Five common memory pitfalls

  1. Using a vector DB for everything. Vector search is for unstructured similarity; it is the wrong tool for "find all runs by user X in the last week."
  2. Never writing back. An agent that doesn't persist what it did will feel amnesic regardless of how much you put in at ingest time.
  3. Stuffing the context window. More tokens is not more intelligence. Past a certain volume, the model's attention degrades and latency balloons. Prefer precise retrieval to generous dumping.
  4. No memory versioning. SOPs change. Product specs change. Without version metadata on semantic memory, your agent confidently quotes stale policies.
  5. No eval on retrieval. Most teams evaluate the agent's final output. Few evaluate whether the retrieval returned the right source material. Fix the upstream quality problem before tuning the downstream one.

Frequently asked questions

What is agent memory?

Agent memory is the set of mechanisms an AI agent uses to hold and retrieve information across steps, sessions, and users. It includes working memory (current context window), episodic memory (past runs), semantic memory (facts about the world), procedural memory (how to do things), and user memory (preferences of the operator).

What is the difference between short-term and long-term memory?

Short-term memory is the context window — volatile, bounded by the model's token limit. Long-term memory persists beyond the current run and is stored outside the model, usually in a vector database, key-value store, or relational database. Long-term memory is retrieved and injected into the context window as needed.

Do I need a vector database for agent memory?

Only for semantic memory over unstructured content. For episodic memory (prior runs), a relational database is usually better because you query by agent ID, user ID, and time range. For structured facts, key-value or SQL is simpler. Many production agents use no vector database at all.

How do agents remember things across sessions?

Three mechanisms work together: persist relevant state to long-term storage at the end of every run, build a retrieval layer that pulls the most relevant state back into context at the start of the next run, and use a summarizer that compresses long histories into a short running summary.

How is agent memory different from RAG?

RAG is one technique for providing semantic memory — retrieve documents by similarity and inject them into context. Agent memory is broader and also includes episodic, procedural, and user memory. RAG is a subset. Most production agents use RAG plus at least one other memory pattern.

Which memory types should I build first?

For most industrial agents: working memory (start here — it's free) plus episodic memory. Build semantic (RAG) only after you have episodic, because episodic will tell you which semantic sources the agent actually needs.
Build on this

Designing agent memory for your operations?

Our free AI Assessment maps your existing documentation, data, and workflows — and identifies the right memory pattern for your first agent deployment.

Take the AI Assessment →