Modern AI systems increasingly rely on external memory.

This shifts “learning” away from parameters.


The Memory Paradigm

[Diagram: a brain (the model) connected to a notebook (external memory), annotated with the RAG, CAG, and Engram memory types. Caption: Store facts outside the brain.]

Why External Memory?

Most “learning new facts” should not modify weights.

Weights are for generalization. They encode reasoning patterns, language structure, and capability.

Memory is for storage. It holds specific facts, documents, and experiences.

If you store everything in weights:

  • You create interference
  • You risk forgetting
  • You must retrain

If you store facts in memory:

  • No forgetting
  • Fast updates
  • Survives model upgrades

Retrieval-Augmented Generation (RAG)

Documents are embedded into vectors. At query time:

  1. Embed the query
  2. Search the vector database
  3. Retrieve relevant documents
  4. Inject into prompt
  5. Generate grounded response
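The five steps above can be sketched end to end. This is a toy illustration, not a production pipeline: `embed` is a bag-of-words stand-in for a real embedding model, and the "vector database" is just an in-memory list searched by cosine similarity.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "The capital of Australia is Canberra.",
    "Python is a programming language.",
]
index = [(embed(d), d) for d in docs]           # documents embedded ahead of time

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)                            # 1. embed the query
    ranked = sorted(index, key=lambda p: cosine(q, p[0]), reverse=True)
    return [d for _, d in ranked[:k]]           # 2-3. search + retrieve

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))        # 4. inject into prompt
    return f"Context:\n{context}\n\nQuestion: {query}"

# 5. the grounded prompt is what gets sent to the model for generation
print(build_prompt("What is the capital of Australia?"))
```

Any real system would swap in a learned embedding model and an approximate nearest-neighbor index, but the control flow is the same.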

The model does not need to remember facts internally. It retrieves them on demand.

RAG Benefits

| Benefit | Description |
| --- | --- |
| No forgetting | External storage, not weights |
| Persistent | Survives restarts and model changes |
| Scalable | Add documents without retraining |
| Verifiable | Can cite sources |

RAG Challenges

  • Retrieval precision (wrong docs = bad answers)
  • Latency (search takes time)
  • Index maintenance
  • Chunk boundaries
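The chunk-boundary problem is usually mitigated with overlapping windows, so a fact that straddles one boundary still appears intact in an adjacent chunk. A minimal sketch (the size and overlap values here are arbitrary):

```python
def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks with overlap, so content that
    straddles a boundary survives intact in at least one chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap       # advance by the non-overlapping stride
    return chunks

pieces = chunk("x" * 500, size=200, overlap=50)
print(len(pieces))  # 4
```

Overlap trades index size for recall; sentence- or section-aware splitting is a common refinement.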

Cache-Augmented Generation (CAG)

Instead of retrieving from a vector database, CAG caches previously processed context or key-value (KV) attention states, so repeated context never has to be re-embedded or re-searched.

Use cases:

  • Repeated knowledge tasks
  • Multi-turn conversations
  • Pre-computed context windows

Benefits over RAG:

  • Often faster (no embedding + search)
  • More deterministic
  • Good for structured repeated workflows

Trade-offs:

  • Less flexible
  • Cache management complexity

Engram-Style Memory

Recent proposals (e.g., DeepSeek research) introduce conditional memory modules with direct indexing.

Instead of scanning long context or searching vectors:

  • Memory slots indexed directly
  • O(1) lookup instead of O(n) attention
  • Separates static knowledge from dynamic reasoning

The goal: Constant-time memory access that doesn’t scale with context length.

This changes the compute story:

  • Don’t waste attention on “known facts”
  • Reserve compute for reasoning
  • Avoid context rot

Model Editing

A related technique: surgically patch specific facts without full fine-tuning.

Example: The model says “The capital of Australia is Sydney.” You edit the specific association to “Canberra” without retraining.
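The Sydney-to-Canberra edit can be illustrated on a toy linear associative memory. This is a ROME-flavored sketch, not the actual algorithm: a rank-one update makes the weight matrix map the key vector for "Australia" to the value vector for "Canberra" instead of "Sydney". All vectors here are hypothetical two-dimensional stand-ins.

```python
def matvec(W, k):
    return [sum(w * x for w, x in zip(row, k)) for row in W]

def rank_one_edit(W, k, v_new):
    """Return W' = W + (v_new - W k) k^T / (k^T k), so that W' k = v_new."""
    Wk = matvec(W, k)
    kk = sum(x * x for x in k)
    return [[w + (v_new[i] - Wk[i]) * k[j] / kk
             for j, w in enumerate(row)] for i, row in enumerate(W)]

key_australia  = [1.0, 0.0]     # hypothetical key vector for "Australia"
value_canberra = [0.0, 1.0]     # desired value vector
W = [[1.0, 0.0], [0.0, 0.0]]    # currently maps key -> "Sydney" value [1, 0]

W = rank_one_edit(W, key_australia, value_canberra)
print(matvec(W, key_australia))  # [0.0, 1.0] == value_canberra
```

The side-effect risk in the cons below is visible even here: any other key with a component along `key_australia` shifts too, which is why consistency is not guaranteed.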

Pros:

  • Targeted fixes
  • Fast

Cons:

  • Side effects possible
  • Consistency not guaranteed

The Key Distinction

| Aspect | Weight Learning | Memory Learning |
| --- | --- | --- |
| Location | Parameters | External storage |
| Persistence | Model lifetime | Storage lifetime |
| Forgetting risk | High | None |
| Update speed | Slow (training) | Fast (database) |
| Survives model change? | No | Yes |

When to Use What

| Situation | Approach |
| --- | --- |
| Need new reasoning capability | Weight-based (fine-tune) |
| Need to know new facts | Memory-based (RAG) |
| Need domain expertise | Weight-based (LoRA) |
| Need to cite sources | Memory-based (RAG) |
| Frequently changing data | Memory-based (RAG/CAG) |

References

| Concept | Paper |
| --- | --- |
| RAG | Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (Lewis et al., 2020) |
| Engram | Engram: Conditional Memory via Scalable Lookup (DeepSeek, 2025) |
| REALM | REALM: Retrieval-Augmented Language Model Pre-Training (Guu et al., 2020) |
| Model Editing | Editing Factual Knowledge in Language Models (De Cao et al., 2021) |

Coming Next

In Part 5, we’ll examine context engineering and recursive reasoning: ICL, RLM, and techniques that prevent context rot during inference.


The brain stays stable. The notebook grows.