
Your AI Doesn't Remember You. Here's Why That Matters.

March 5, 2026 · AI Architecture · Memory Systems

Ask ChatGPT what you told it yesterday. It has no idea. Ask it what your job title is, what project you're working on, what you asked it to do three messages ago in a different conversation. Blank stare.

This isn't a bug. It's how every major AI assistant works. Each conversation is a blank slate. The model has no persistent memory, no context from previous sessions, no understanding of who you are beyond what you've typed in the current window. And for most use cases, that's fine. You ask a question, you get an answer, you move on.

But the moment you try to build something that acts as a genuine assistant — something that knows your preferences, tracks your projects, remembers decisions you made last week — you hit a wall that no amount of prompt engineering can fix.

The Problem with Stateless AI

When I started building Mimir, my AI assistant, the first version was stateless. Every conversation started fresh. And it was immediately obvious how limiting that was. I'd tell it about a project on Monday, and by Tuesday it had no idea what I was talking about. I'd set preferences — "always format my todos with due dates" — and the next session, gone.

The obvious solution is to stuff everything into the system prompt. Just append all the context the model might need. Your name, your preferences, your recent conversations, your project state. Some people do this. It works until it doesn't.

The problem with putting everything in the system prompt is that "everything" grows fast. And the more context you add, the worse the model performs at actually using any of it.

I covered this in my previous article about the three-agent pattern — context noise degrades performance. The same principle applies to memory. If you dump 50 memories into every request, the model drowns in information that's mostly irrelevant to the current task.

Six Layers of Memory

The solution I landed on was a six-layer memory system. Each layer serves a different purpose, has a different lifespan, and gets injected into context only when it's relevant.

1. Session RAM

This is the working memory for the current conversation. It tracks things like "the user just mentioned three emails — here are their IDs" or "we're currently editing a file at this path." It's structured data, not natural language. It expires when the session ends. Think of it as the scratchpad the assistant uses while working on your request.
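As a concrete sketch, session RAM can be little more than a typed scratchpad keyed by slot names. This is an illustration of the idea, not Mimir's actual code; the `SessionRAM` class and its slot names are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class SessionRAM:
    """Scratchpad scoped to one conversation (hypothetical shape)."""
    session_id: str
    slots: dict[str, Any] = field(default_factory=dict)

    def set(self, key: str, value: Any) -> None:
        self.slots[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        return self.slots.get(key, default)

    def clear(self) -> None:
        # Runs when the session ends; nothing here outlives the conversation.
        self.slots.clear()

# Structured data, not natural language: IDs and paths, ready for tool calls.
ram = SessionRAM(session_id="s-42")
ram.set("last_email_ids", ["e101", "e102", "e103"])
ram.set("editing_file", "/notes/roadmap.md")
```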

2. Session Memory

Slightly longer-lived than RAM. This captures context that's relevant for the duration of a conversation but doesn't need to persist forever. "The user is in a bad mood today" or "we've been debugging this specific error for the last 20 minutes." It helps the assistant maintain coherence across a long conversation without polluting permanent storage.

3. Semantic Memory

This is the long-term knowledge base. Facts, preferences, decisions, lessons learned. "Imtiaz prefers bullet points over paragraphs." "The Q2 roadmap was finalized on March 3rd." "Sarah's role is VP of Engineering." These are stored as embeddings and retrieved via semantic search — meaning the system finds relevant memories based on meaning, not keyword matching.

This is the layer that makes the assistant feel like it actually knows you. When you mention Sarah, it pulls up what it knows about Sarah. When you ask about a project, it retrieves the decisions you've made about that project. Not because it was told to — because the semantic search found them relevant.
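To make the retrieval mechanics concrete, here is a minimal store-and-search sketch. A real system would call an embedding model and a vector store; the toy bag-of-words `embed` function below stands in for both, and every name here is illustrative:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" so the example is self-contained;
    # a real system would call an embedding model here.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticMemory:
    def __init__(self) -> None:
        self.memories: list[tuple[str, Counter]] = []

    def store(self, fact: str) -> None:
        self.memories.append((fact, embed(fact)))

    def search(self, query: str, top_k: int = 3) -> list[str]:
        # Rank by similarity of meaning (here approximated by word overlap).
        q = embed(query)
        ranked = sorted(self.memories, key=lambda m: cosine(q, m[1]), reverse=True)
        return [fact for fact, _ in ranked[:top_k]]

mem = SemanticMemory()
mem.store("Imtiaz prefers bullet points over paragraphs")
mem.store("Sarah's role is VP of Engineering")
mem.store("The Q2 roadmap was finalized on March 3rd")
```

When you mention Sarah, `mem.search("who is Sarah", top_k=1)` surfaces the Sarah fact and nothing else; that is the whole trick, scaled up.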

4. Persistent Memory

Hard facts that should never expire. Your timezone. Your work hours. Your communication preferences. These are always available, injected into every request because they're universally relevant. The key difference from semantic memory: persistent memories aren't searched — they're always present.
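The distinction shows up in how the prompt gets assembled. A rough sketch, where the fact strings and the `build_system_prompt` helper are invented for illustration:

```python
PERSISTENT_FACTS = [
    "Timezone: Europe/Stockholm",
    "Work hours: 09:00-17:00 on weekdays",
    "Formatting: todos always get due dates",
]

def build_system_prompt(base: str, retrieved: list[str]) -> str:
    # Persistent facts ride along on every request; retrieved semantic
    # memories are appended only when a search turned something up.
    parts = [base, "", "Known facts about the user:"]
    parts += [f"- {fact}" for fact in PERSISTENT_FACTS]
    if retrieved:
        parts += ["", "Context relevant to this request:"]
        parts += [f"- {m}" for m in retrieved]
    return "\n".join(parts)
```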

5. Conversation Store

A searchable archive of past conversations. Not every message — that would be enormous — but summaries and key exchanges. When you say "remember that thing we discussed last week about the database migration?" the system can actually search its conversation history and find it.
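A minimal sketch of such an archive. Naive keyword matching stands in here for the semantic search a real store would use, and the class shape is illustrative:

```python
import datetime
from dataclasses import dataclass

@dataclass
class ConversationSummary:
    date: datetime.date
    summary: str

class ConversationStore:
    """Archive of per-conversation summaries, not raw message logs."""
    def __init__(self) -> None:
        self.archive: list[ConversationSummary] = []

    def add(self, date: datetime.date, summary: str) -> None:
        self.archive.append(ConversationSummary(date, summary))

    def search(self, query: str) -> list[ConversationSummary]:
        # Keyword match as a stand-in; production would rank by embeddings.
        terms = query.lower().split()
        return [c for c in self.archive
                if any(t in c.summary.lower() for t in terms)]
```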

6. Project RAM

Working context for active projects. When you're in the middle of a multi-day project, the system maintains state — what's been done, what's pending, what decisions have been made. This is more structured than semantic memory and more persistent than session RAM. It lives as long as the project is active.
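In code, project RAM can be little more than a structured record that outlives sessions and freezes when the project wraps up. A hypothetical shape:

```python
from dataclasses import dataclass, field

@dataclass
class ProjectRAM:
    """Working state for one active project (illustrative, not Mimir's code)."""
    name: str
    done: list[str] = field(default_factory=list)
    pending: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    active: bool = True

    def archive(self) -> None:
        # Project finished: freeze the state instead of deleting it.
        self.active = False
```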

The Retrieval Problem

Having six layers of memory is useless if you can't retrieve the right memories at the right time. This is where most memory implementations fall apart. They either retrieve too much (noise problem) or too little (the assistant seems forgetful).

The approach that works: let the Planner agent decide what to search for. Remember the three-agent pipeline from my earlier article? The Planner has access to memory search tools. When it receives a request, it can proactively search for relevant context before creating the execution plan. It might search for the person mentioned, the project referenced, or the topic being discussed.

This means memory retrieval is intentional, not automatic. The system doesn't blindly inject the top 10 most similar memories into every request. It searches for what's relevant to this specific request, and only injects what it finds useful.
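Here is roughly what intentional retrieval looks like. In the real pipeline the Planner is an LLM choosing its own search queries; the `extract_entities` helper below is a crude stand-in for that decision, and everything in this sketch is illustrative:

```python
def extract_entities(request: str) -> list[str]:
    # Crude stand-in: in practice the Planner LLM decides what to search for.
    return [w.strip("?,.'") for w in request.split() if w[:1].isupper()]

def plan(request: str, memory_search) -> dict:
    # Intentional retrieval: query memory for what this request actually
    # mentions, instead of injecting the top-N memories into every prompt.
    queries = [request] + extract_entities(request)
    seen, context = set(), []
    for q in queries:
        for hit in memory_search(q, top_k=2):
            if hit not in seen:
                seen.add(hit)
                context.append(hit)
    return {"request": request, "context": context}
```

Note that `memory_search` is just a callable; any of the layers above that exposes a search interface can plug in here.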

The best memory system isn't the one that remembers the most. It's the one that retrieves the right thing at the right time.

What This Means for AI Products

If you're building an AI product that interacts with the same users repeatedly — a customer support bot, an internal tool, a personal assistant — memory isn't optional. It's the difference between a tool and an assistant.

You don't need six layers. But you probably need at least two: something short-term for conversation coherence, and something long-term for user knowledge. And you definitely need a retrieval strategy that's smarter than "dump everything into the prompt."

  1. Start with semantic memory. Store facts as embeddings, retrieve by similarity. This alone transforms the user experience.
  2. Separate storage from retrieval. Just because you stored something doesn't mean it should be in every request. Search intentionally.
  3. Give memories a lifespan. Not everything needs to live forever. Session context should expire. Project context should archive when the project ends.
  4. Deduplicate aggressively. Users will tell you the same thing multiple times. Your memory system needs to recognize "I already know this" and update rather than duplicate.
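The dedup point deserves a sketch of its own: an upsert that replaces near-duplicates instead of appending them. A real system would compare embeddings; stdlib `difflib` string similarity stands in here, and the threshold is a made-up number you would tune:

```python
from difflib import SequenceMatcher

class MemoryStore:
    def __init__(self, threshold: float = 0.85):
        self.facts: list[str] = []
        self.threshold = threshold

    def upsert(self, fact: str) -> str:
        # If a near-duplicate already exists, update it in place
        # rather than storing a second copy.
        for i, existing in enumerate(self.facts):
            sim = SequenceMatcher(None, existing.lower(), fact.lower()).ratio()
            if sim >= self.threshold:
                self.facts[i] = fact
                return "updated"
        self.facts.append(fact)
        return "inserted"
```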

Mimir currently has over 900 memories. It knows my work schedule, my communication preferences, the people I work with, the projects I'm tracking, and the decisions I've made over the past several months. When I start a new session, it doesn't feel like starting over. It feels like picking up where we left off.

That's what memory gives you. Not just recall — continuity.

This is the third article in a series about building reliable AI applications. Previous: The 3-Agent Pattern and Costume Change vs. New Actor.

Also worth reading: Context Windows Are a Lie — on why injecting all memories into context doesn't work.