Clawdbot's Memory System: Why 'Files as Memory' and How It Retrieves

Target Audience

You want to use Clawdbot as a “long-term online personal assistant,” and you quickly run into two practical problems:

  • Chat context keeps growing, but models have hard window limits
  • You need it to “remember” decisions, preferences, and project status, but you don’t want to hand that memory to a cloud black box

Clawdbot’s approach: treat memory as ordinary files in your workspace, and build a local index for it.

First, Two Concepts: Context vs Memory

Context (Single Request Context)

Context is everything the model can see in the current request: the system prompt, conversation history, tool results, attachments, and so on.

Its characteristics: temporary, limited, and expensive (a larger context means a higher cost).

Memory (Persisted Long-term Memory)

Memory is content persisted to disk, such as:

  • MEMORY.md (long-term accumulation)
  • memory/*.md (daily incremental records)
  • Session transcripts (indexed or not, depending on configuration)

Its characteristics: persistent, growable, retrievable.
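A hedged sketch of what this can look like on disk, assuming a `~/clawd` workspace like the one mentioned later in this article; the file names are illustrative:

```
~/clawd/
├── MEMORY.md           # long-term conclusions and constraints
└── memory/
    ├── 2024-05-01.md   # daily incremental notes
    └── 2024-05-02.md
```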

Benefits of “Files as Memory”

From an engineering perspective, this design is simple, but it brings three benefits:

  1. Auditable: you can open the files directly and see exactly what it remembers
  2. Versionable: you can manage them with git (at least the key rules and conclusions)
  3. Controllable: no dependency on an external vector database; the index is derived data and can be rebuilt if corrupted

How Retrieval Works (Implementation Perspective)

In the current implementation, the index is a SQLite database: vector retrieval can use the sqlite-vec extension, and keyword retrieval uses FTS5.

A simplified but realistic pipeline:

  1. Scan memory source files in the workspace (Markdown, optionally session transcripts)
  2. Chunk the content
  3. Compute embeddings (via OpenAI, Gemini, or a local embedding model)
  4. Write to SQLite (chunks / embedding_cache / optional chunks_fts / chunks_vec)
  5. Query both legs in parallel:
    • Vector similarity (semantic)
    • BM25 (keyword)
  6. Merge the two result lists by weight

Default index path (configurable):

  • ~/.clawdbot/memory/{agentId}.sqlite

(Corresponding config: agents.defaults.memorySearch.store.path)
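As a sketch, that dotted path would nest like this in a JSON-style config; the surrounding file format and key layout are assumptions here, only the key path and default value come from the text above:

```json
{
  "agents": {
    "defaults": {
      "memorySearch": {
        "store": {
          "path": "~/.clawdbot/memory/{agentId}.sqlite"
        }
      }
    }
  }
}
```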

Isolation in Multi-Agent Scenarios

In multi-agent scenarios, it is generally recommended to isolate configuration, workspace, and index:

  • Each agent has its own workspace (source files)
  • Each agent has its own SQLite index (derived data)

This prevents “personal assistant” and “work assistant” from polluting each other’s context and memory.

How You Should Use It (Actionable Advice)

1) Write Rules into Persistent Files

If you want it to behave consistently over the long term, write these into the workspace’s guidance files:

  • Who you are and your preferences (keep them few and precise)
  • Boundaries you want it to observe (e.g., which commands must ask)
  • What you want it to check regularly (heartbeat/cron task lists)

2) Write “Conclusions” in MEMORY.md, “Process” in Daily Notes

A simple rule of thumb:

  • memory/YYYY-MM-DD.md is for the daily flow: what happened and what was tried
  • MEMORY.md is for reusable conclusions and long-term constraints
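A hedged illustration of the split; the file names, dates, and contents are made up for the example:

```markdown
<!-- memory/2024-05-02.md — process: the raw flow of the day -->
- Investigated the flaky deploy; root cause was a stale build cache.
- Vendor reply drafted, waiting on pricing.

<!-- MEMORY.md — conclusions: distilled and reusable -->
## Deploys
- Clear the build cache before every deploy (flaky otherwise; see 2024-05-02).
```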

3) Don’t Panic if the Index Is Corrupted: It’s Derived Data

As long as the source files exist, the index can be rebuilt. What actually needs backing up:

  • ~/.clawdbot/ (configuration and credentials)
  • ~/clawd/ (workspace)
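A hedged sketch of a backup that follows this advice. Temporary demo directories stand in for the real ~/.clawdbot and ~/clawd, and the derived SQLite index directory is excluded because it can be rebuilt:

```shell
set -eu
# Demo stand-ins for ~/.clawdbot (config/credentials) and ~/clawd (workspace).
root=$(mktemp -d)
mkdir -p "$root/.clawdbot/memory" "$root/clawd/memory"
echo '{"token":"..."}'    > "$root/.clawdbot/auth.json"
echo '# long-term memory' > "$root/clawd/MEMORY.md"
touch "$root/.clawdbot/memory/main.sqlite"   # derived index: safe to skip

# Back up config + workspace; exclude the rebuildable index.
tar -czf "$root/backup.tar.gz" -C "$root" --exclude='.clawdbot/memory' .clawdbot clawd
tar -tzf "$root/backup.tar.gz"
```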

References and Extensions

  • Related implementation: src/memory/manager.ts, src/memory/memory-schema.ts (Clawdbot source code)
  • (Pending) Clawd-style workspace and memory files: /en/docs/start/clawd/

Source material: docs_doc/_docs/docs/clawdbot/2015780646770323543.md