Modern AI systems increasingly rely on external memory.

This shifts “learning” away from parameters.


The Memory Paradigm

[Diagram: a brain (the model) connected to a notebook (external memory), annotated with the RAG, CAG, and Engram memory types. Caption: Store facts outside the brain.]

Why External Memory?

Most “learning new facts” should not modify weights.

Weights are for generalization. They encode reasoning patterns, language structure, and capability.

Memory is for storage. It holds specific facts, documents, and experiences.

If you store everything in weights:

  • You create interference
  • You risk forgetting
  • You must retrain

If you store facts in memory:

  • No forgetting
  • Fast updates
  • Survives model upgrades

Retrieval-Augmented Generation (RAG)

Documents are embedded into vectors. At query time:

  1. Embed the query
  2. Search the vector database
  3. Retrieve relevant documents
  4. Inject into prompt
  5. Generate grounded response
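The five steps above can be sketched end to end. This is a toy illustration, not a production pipeline: `embed` is a bag-of-words stand-in for a real embedding model, and the "vector database" is just an in-memory list searched by cosine similarity.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "The capital of Australia is Canberra.",
    "Python is a programming language.",
]
index = [(embed(d), d) for d in docs]           # documents embedded ahead of time

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)                            # 1. embed the query
    ranked = sorted(index, key=lambda p: cosine(q, p[0]), reverse=True)
    return [d for _, d in ranked[:k]]           # 2-3. search + retrieve

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))        # 4. inject into prompt
    return f"Context:\n{context}\n\nQuestion: {query}"

# 5. the grounded prompt is what gets sent to the model for generation
print(build_prompt("What is the capital of Australia?"))
```

Any real system would swap in a learned embedding model and an approximate nearest-neighbor index, but the control flow is the same.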

The model does not need to remember facts internally. It retrieves them on demand.

RAG Benefits

| Benefit | Description |
| --- | --- |
| No forgetting | External storage, not weights |
| Persistent | Survives restarts and model changes |
| Scalable | Add documents without retraining |
| Verifiable | Can cite sources |

RAG Challenges

  • Retrieval precision (wrong docs = bad answers)
  • Latency (search takes time)
  • Index maintenance
  • Chunk boundaries
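The chunk-boundary problem is usually mitigated with overlapping windows, so a fact that straddles one boundary still appears intact in an adjacent chunk. A minimal sketch (the size and overlap values here are arbitrary):

```python
def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks with overlap, so content that
    straddles a boundary survives intact in at least one chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap       # advance by the non-overlapping stride
    return chunks

pieces = chunk("x" * 500, size=200, overlap=50)
print(len(pieces))  # 4
```

Overlap trades index size for recall; sentence- or section-aware splitting is a common refinement.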

Cache-Augmented Generation (CAG)

Instead of retrieving from a vector database, CAG caches previously processed context or key-value (KV) attention states, so repeated context never has to be re-embedded or re-searched.

Use cases:

  • Repeated knowledge tasks
  • Multi-turn conversations
  • Pre-computed context windows

Benefits over RAG:

  • Often faster (no embedding + search)
  • More deterministic
  • Good for structured repeated workflows

Trade-offs:

  • Less flexible
  • Cache management complexity

Engram-Style Memory

Recent proposals (e.g., DeepSeek research) introduce conditional memory modules with direct indexing.

Instead of scanning long context or searching vectors:

  • Memory slots indexed directly
  • O(1) lookup instead of O(n) attention
  • Separates static knowledge from dynamic reasoning

The goal: Constant-time memory access that doesn’t scale with context length.

This changes the compute story:

  • Don’t waste attention on “known facts”
  • Reserve compute for reasoning
  • Avoid context rot

Model Editing

A related technique: surgically patch specific facts without full fine-tuning.

Example: The model says “The capital of Australia is Sydney.” You edit the specific association to “Canberra” without retraining.
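The Sydney-to-Canberra edit can be illustrated on a toy linear associative memory. This is a ROME-flavored sketch, not the actual algorithm: a rank-one update makes the weight matrix map the key vector for "Australia" to the value vector for "Canberra" instead of "Sydney". All vectors here are hypothetical two-dimensional stand-ins.

```python
def matvec(W, k):
    return [sum(w * x for w, x in zip(row, k)) for row in W]

def rank_one_edit(W, k, v_new):
    """Return W' = W + (v_new - W k) k^T / (k^T k), so that W' k = v_new."""
    Wk = matvec(W, k)
    kk = sum(x * x for x in k)
    return [[w + (v_new[i] - Wk[i]) * k[j] / kk
             for j, w in enumerate(row)] for i, row in enumerate(W)]

key_australia  = [1.0, 0.0]     # hypothetical key vector for "Australia"
value_canberra = [0.0, 1.0]     # desired value vector
W = [[1.0, 0.0], [0.0, 0.0]]    # currently maps key -> "Sydney" value [1, 0]

W = rank_one_edit(W, key_australia, value_canberra)
print(matvec(W, key_australia))  # [0.0, 1.0] == value_canberra
```

The side-effect risk in the cons below is visible even here: any other key with a component along `key_australia` shifts too, which is why consistency is not guaranteed.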

Pros:

  • Targeted fixes
  • Fast

Cons:

  • Side effects possible
  • Consistency not guaranteed

The Key Distinction

| Aspect | Weight Learning | Memory Learning |
| --- | --- | --- |
| Location | Parameters | External storage |
| Persistence | Model lifetime | Storage lifetime |
| Forgetting risk | High | None |
| Update speed | Slow (training) | Fast (database) |
| Survives model change? | No | Yes |

When to Use What

| Situation | Approach |
| --- | --- |
| Need new reasoning capability | Weight-based (fine-tune) |
| Need to know new facts | Memory-based (RAG) |
| Need domain expertise | Weight-based (LoRA) |
| Need to cite sources | Memory-based (RAG) |
| Frequently changing data | Memory-based (RAG/CAG) |

References

| Concept | Paper |
| --- | --- |
| RAG | Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (Lewis et al., 2020) |
| Engram | Engram: Conditional Memory via Scalable Lookup (DeepSeek, 2025) |
| REALM | REALM: Retrieval-Augmented Language Model Pre-Training (Guu et al., 2020) |
| Model Editing | Editing Factual Knowledge in Language Models (De Cao et al., 2021) |

Coming Next

In Part 5, we’ll examine context engineering and recursive reasoning: ICL, RLM, and techniques that prevent context rot during inference.


The brain stays stable. The notebook grows.