
AI Agent Memory Management: How to Build a Knowledge System Your Agents Won't Outgrow

Architecture · 11 min read
memory · agents · context engineering · CLAUDE.md · tutorial

Your agent just rewrote a module you refactored last week. It added back the deprecated API call, reintroduced the old naming convention, and opened a PR with a confident commit message. The code compiled. The tests passed. And none of it was what you wanted.

The problem wasn't the model. It was the memory. Your agent started a fresh session with zero knowledge of what happened yesterday, last week, or last month. It had no idea you'd already solved this problem. It just... solved it again, differently.

This is the most common failure mode in production agent systems, and it's entirely preventable.

The Two Memory Problems

AI agents face two distinct memory challenges that pull in opposite directions.

Session amnesia is what happens when agents forget everything between sessions. Each conversation starts from scratch. Decisions evaporate. Patterns learned through painful debugging disappear. Your agent makes the same mistakes on Monday that you corrected on Friday.

Context pollution is the opposite problem. You stuff everything into the agent's context window hoping it'll remember what matters, but instead it drowns in irrelevant details. An agent tasked with writing a unit test doesn't need your deployment runbook. A documentation agent doesn't need your database migration history. Too much context is almost as bad as none — it dilutes the signal and burns tokens.

The solution is a two-layer memory architecture: a small, always-loaded summary file that gives every session baseline context, paired with a structured directory of detailed notes that agents can search when they need depth.

Evergreen Files vs. Daily Logs

The foundation of agent memory is a file called MEMORY.md that lives at the root of your agent's memory directory. This file is automatically loaded into every conversation. Think of it as your agent's working memory — the stuff it should always know.

~/.claude/projects/your-project/memory/
├── MEMORY.md              # Always loaded, <200 lines
├── debugging.md           # Detailed notes by topic
├── patterns.md            # Confirmed conventions
├── architecture.md        # System design decisions
├── 2026-03-10.md          # Daily log
├── 2026-03-09.md          # Daily log
└── 2026-03-08.md          # Daily log

The memory system classifies files into two categories based on a simple pattern: if the filename matches a date format (like 2026-03-10.md), it's a daily log. Everything else is an evergreen file.
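A minimal sketch of that classification rule in Python (the regex and function name are illustrative, not ClawPort's actual API):

```python
import re

# Filenames matching YYYY-MM-DD.md are daily logs; everything else is evergreen.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}\.md$")

def classify(filename: str) -> str:
    """Return 'daily' for date-named files, 'evergreen' for the rest."""
    return "daily" if DATE_RE.match(filename) else "evergreen"
```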

Daily logs capture session-specific context — what you worked on today, decisions made, bugs found. They're high-value when fresh and decay rapidly. You rarely need Tuesday's debugging notes the following month.

Evergreen files capture durable knowledge — architectural decisions, confirmed patterns, user preferences, project conventions. These change slowly and stay relevant for months. A file documenting that "this project uses Vitest, not Jest" is useful indefinitely.

This classification matters because the two types have different lifecycles, different staleness thresholds, and different pruning rules.

The 200-Line Rule

Here's the constraint that shapes everything else: MEMORY.md gets truncated at 200 lines. Lines 201 and beyond simply don't exist as far as your agent is concerned. This isn't a soft guideline — it's a hard limit in Claude Code's memory loading.

This means your primary memory file needs to be ruthlessly curated. It should contain:

  • Project name, location, and stack
  • Core conventions (test runner, linter, deployment)
  • Key file paths the agent needs regularly
  • Active decisions and their status
  • Links to detailed topic files

It should not contain full debugging histories, complete API documentation, or anything that belongs in a topic-specific file.

The memory health system enforces this with two thresholds:

| Check | Threshold | Severity |
|-------|-----------|----------|
| MEMORY.md line count | >200 lines | Critical — content is being silently truncated |
| MEMORY.md line count | 150-200 lines | Warning — approaching the limit |
| Individual file size | >100 KB | Critical — too large for efficient context loading |
| Individual file size | 50-100 KB | Warning — consider splitting |
| Total memory directory | >1 MB | Critical — memory sprawl |
| Total memory directory | 500 KB - 1 MB | Warning — time to prune |

The 150-line warning gives you a 50-line buffer. When you hit it, audit your MEMORY.md for anything that could move to a topic file. The line count check is the single most impactful health indicator because a truncated memory file means your agent is silently losing context every session.
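The thresholds above translate directly into a check. This is a sketch under the assumption that a plain line count is what matters; the function name is illustrative:

```python
def check_line_count(text: str, limit: int = 200, warn_at: int = 150) -> str:
    """Return 'critical' past the hard limit, 'warning' inside the
    50-line buffer, and 'ok' below it."""
    n = len(text.splitlines())
    if n > limit:
        return "critical"  # lines past 200 are silently truncated
    if n >= warn_at:
        return "warning"   # time to move content into topic files
    return "ok"
```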

Temporal Decay

Not all memories are equally valuable over time. A debugging insight from yesterday is more relevant than one from two months ago. Temporal decay formalizes this intuition.

The default configuration uses a 30-day half-life, meaning a memory entry's relevance score drops by 50% every 30 days. A note from today has full weight. The same note 30 days from now has half the weight. After 60 days, a quarter. This decay curve ensures recent context naturally surfaces above older content.
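The curve is a standard exponential half-life, which takes one line to express:

```python
def temporal_decay(age_days: float, half_life_days: float = 30.0) -> float:
    """Relevance multiplier that halves every half-life."""
    # temporal_decay(0) == 1.0, temporal_decay(30) == 0.5, temporal_decay(60) == 0.25
    return 0.5 ** (age_days / half_life_days)
```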

When agents search memory, the system uses hybrid search that combines two signals:

  • Vector similarity (weight: 0.7) — semantic meaning of the query vs. memory entries
  • Text matching (weight: 0.3) — keyword overlap for exact terms and identifiers

The temporal decay multiplier is applied after these scores are combined, so a highly relevant old entry can still surface if the semantic match is strong enough. But when two entries are equally relevant, the fresher one wins.

To prevent search results from being too similar, the system uses MMR (Maximal Marginal Relevance) deduplication with a lambda of 0.7. This balances relevance against diversity — you get the most relevant results without five near-identical entries about the same debugging session.
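Greedy MMR selection can be sketched as follows. The data layout (a precomputed pairwise similarity matrix) is an assumption for illustration, not how the real system stores candidates:

```python
def mmr(query_scores, pairwise_sim, k, lam=0.7):
    """Greedy Maximal Marginal Relevance selection.

    query_scores[i]  : relevance of candidate i to the query.
    pairwise_sim[i][j]: similarity between candidates i and j.
    Picks k indices, trading relevance (lam) against redundancy (1 - lam).
    """
    selected = []
    remaining = list(range(len(query_scores)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            # Penalize by similarity to the closest already-selected result.
            redundancy = max((pairwise_sim[i][j] for j in selected), default=0.0)
            return lam * query_scores[i] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With lambda at 0.7, a near-duplicate of an already-selected entry loses out to a less relevant but distinct one.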

Score = (0.7 × vector_similarity + 0.3 × text_match) × temporal_decay

The search cache holds up to 256 entries to avoid redundant embedding calls. For most projects, this means repeated queries within a session hit cache instead of recomputing.

Stale Content Detection

Memory that isn't maintained becomes a liability. Stale entries don't just waste space — they can actively mislead agents. A six-month-old note about your API schema is worse than no note at all if the schema has changed.

The staleness rules differ by file type because daily logs and evergreen files have different expected lifecycles:

Daily logs:

  • Older than 60 days → Warning — likely no longer relevant, candidate for archival
  • 30-60 days old → Info — review for any durable insights worth promoting to evergreen

Evergreen files:

  • Not updated in 90+ days → Info — verify content is still accurate

The asymmetry is intentional. Daily logs are expected to become irrelevant quickly — that's their nature. A 60-day-old daily log almost certainly contains nothing you need. Evergreen files get a longer runway because their content is meant to be durable, but even durable knowledge needs periodic review.

A healthy pruning cycle looks like this:

  1. Weekly: scan daily logs older than 30 days, extract any insights worth keeping into evergreen files
  2. Monthly: review evergreen files for accuracy, remove or update anything that's drifted from reality
  3. Quarterly: audit the entire memory directory against the health checks

The goal isn't to minimize memory size — it's to maximize the signal-to-noise ratio. A 200-line MEMORY.md full of current, accurate context is worth more than a 2,000-line document full of half-truths.

Memory Health Scoring

The health scoring system runs 10 checks against your memory directory and produces a composite score from 0 to 100. Each check has a severity level, and each severity level has a fixed point deduction:

| Severity | Point Deduction |
|----------|-----------------|
| Critical | -20 per finding |
| Warning | -10 per finding |
| Info | -3 per finding |

A perfect score of 100 means every check passed with no findings. Here are the 10 checks:

  1. MEMORY.md exists — Critical if missing. Without this file, agents have no baseline context.
  2. MEMORY.md line count — Critical above 200, Warning at 150-200. The truncation check.
  3. MEMORY.md file size — Critical above 100 KB, Warning at 50-100 KB.
  4. Individual file sizes — Critical above 100 KB per file, Warning at 50-100 KB.
  5. Total directory size — Critical above 1 MB, Warning at 500 KB-1 MB.
  6. Stale daily logs — Warning for logs older than 60 days, Info for 30-60 days.
  7. Stale evergreen files — Info for files not updated in 90+ days.
  8. File count — Warning if the memory directory has an excessive number of files.
  9. Duplicate content — Warning if multiple files contain substantially similar content.
  10. Orphaned references — Info if MEMORY.md links to files that don't exist.

The scoring is deliberately harsh on critical findings. A single critical issue drops you from 100 to 80. Two critical issues put you at 60. This reflects reality: a truncated MEMORY.md or a missing memory file fundamentally undermines the entire system. Warning-level issues are less severe individually but accumulate — two warnings equal one critical in point impact.
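The deduction table reduces to a few lines of arithmetic; whether the real scorer floors at zero is an assumption here:

```python
DEDUCTIONS = {"critical": 20, "warning": 10, "info": 3}

def health_score(findings: list[str]) -> int:
    """Composite 0-100 score: a fixed deduction per finding, floored at 0."""
    return max(0, 100 - sum(DEDUCTIONS[sev] for sev in findings))
```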

In practice, most healthy projects score between 80 and 95. A score below 70 means something structural needs attention. Below 50 suggests the memory system isn't being maintained at all.

Safe Writes

Writing to the memory directory sounds simple, but in a multi-agent system it's surprisingly dangerous. Two agents writing to the same file simultaneously can corrupt it. An agent writing bad data can poison the memory for every subsequent session. A path traversal bug could let an agent write outside its designated directory.

The safe write system addresses these risks at four levels:

Path validation ensures every write targets a file within the designated memory directory. Paths are resolved and canonicalized before any I/O. Attempts to write to ../../etc/passwd or any path outside the memory directory are rejected before they reach the filesystem.
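A sketch of the canonicalize-then-check pattern (requires Python 3.9+ for `Path.is_relative_to`; the function name is illustrative):

```python
from pathlib import Path

def safe_target(memory_dir: Path, relative: str) -> Path:
    """Resolve a requested path and refuse anything that escapes memory_dir."""
    root = memory_dir.resolve()
    target = (root / relative).resolve()  # canonicalize before any I/O
    if not target.is_relative_to(root):
        raise ValueError(f"path escapes memory directory: {relative}")
    return target
```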

Git snapshots capture the state of the memory directory before writes. If an agent corrupts a file, you can restore the previous version. This is lightweight — it's a git add and git commit of the memory directory, not a full repository snapshot.
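The snapshot itself is just two git commands. A sketch, assuming the memory directory is already its own git repository:

```python
import subprocess

def snapshot(memory_dir: str, message: str = "memory snapshot") -> None:
    """Commit the current state of the memory directory before a write."""
    subprocess.run(["git", "-C", memory_dir, "add", "-A"], check=True)
    subprocess.run(
        ["git", "-C", memory_dir, "commit", "-m", message, "--allow-empty"],
        check=True,
    )
```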

Atomic writes use a write-to-temp-then-rename pattern. The system writes to a temporary file in the same directory, then atomically renames it to the target path. This prevents partial writes — you either get the complete new content or the old content, never a half-written file.
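In Python the pattern looks like this, relying on `os.replace` being atomic when source and target are on the same filesystem:

```python
import os
import tempfile

def atomic_write(path: str, content: str) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    dir_name = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dir_name, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())     # make sure the bytes hit disk first
        os.replace(tmp, path)        # atomic rename within the same filesystem
    except BaseException:
        os.unlink(tmp)               # clean up the temp file on failure
        raise
```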

Conflict detection checks whether the target file has been modified since the agent last read it. If another agent (or a human) has changed the file in the meantime, the write is flagged as a potential conflict rather than silently overwriting.
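One way to sketch this read-token handshake, using a content hash as the version token (the real system might track modification times instead):

```python
import hashlib
import os

def read_with_token(path: str) -> tuple[str, str]:
    """Read a file and return (content, token) fingerprinting what was read."""
    content = open(path).read()
    return content, hashlib.sha256(content.encode()).hexdigest()

def write_if_unchanged(path: str, new_content: str, token: str) -> bool:
    """Write only if the file still matches the token; otherwise flag a conflict."""
    current = open(path).read() if os.path.exists(path) else ""
    if hashlib.sha256(current.encode()).hexdigest() != token:
        return False  # someone else changed the file since we read it
    with open(path, "w") as f:
        f.write(new_content)
    return True
```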

These safeguards add minimal overhead — a few milliseconds per write — but prevent the class of failures that can cascade through a multi-agent system. A corrupted MEMORY.md doesn't just affect one session. It affects every session until someone notices and fixes it.

Designing Your Memory Architecture

The system described above isn't hypothetical. It's the architecture that ClawPort implements for managing agent memory at scale. But regardless of whether you use ClawPort or build your own, the principles are the same:

Start with the 200-line constraint. Everything else follows from the fact that your always-loaded memory is finite. This forces you to be intentional about what goes in the primary file versus topic files.

Classify aggressively. Daily logs and evergreen files serve different purposes. Don't dump everything into one flat directory. Use the date-based classification to automate staleness detection.

Score and prune regularly. Memory health isn't a one-time setup. It's an ongoing practice. The 10-check scoring system gives you a quantifiable measure of memory quality that you can track over time.

Protect writes. In single-agent setups, write safety feels like overkill. The moment you add a second agent, it becomes essential. Build the safeguards before you need them.

The difference between an agent that makes you productive and one that creates more work than it saves often comes down to memory. An agent with well-maintained context is an agent that learns from your project's history instead of repeating it.

Building AI agent teams? ClawPort gives you the memory browser, health dashboard, and safe write infrastructure to keep your agents' context sharp. Free and open source.