# How AI Learns Part 1: The Many Meanings of Learning
592 words • 3 min read

When people say, “AI learned something,” they usually mean one of at least four very different things.
Large Language Models (LLMs) do not learn in one single way. They learn at different time scales, in different locations, and with very different consequences. To understand modern AI systems—especially agents—we need to separate these layers.
| Resource | Link |
|---|---|
| Related | ICL Revisited, RLM, Engram |
## Four Time Scales of Learning

### 1. Pretraining (Years)
This is the foundation.
The model trains on massive datasets using gradient descent. The result is a set of weights—billions of parameters—encoding statistical structure of language and knowledge.
This learning:
- Is slow and expensive
- Persists across restarts
- Cannot easily be reversed
- Is vulnerable to interference if modified later
Think of this as long-term biological memory.
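The mechanics of weight-based learning can be sketched in a few lines. This is an illustrative toy (a linear model trained by gradient descent in NumPy), not real pretraining; the point is that everything learned lives in `W`, and it persists because `W` does.

```python
import numpy as np

# Toy weight-based learning: gradient descent on a linear model.
# Pretraining does the same thing at vastly larger scale - the learned
# "knowledge" is nothing but the final values of the weights.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))        # the weights: this IS the memory
x = rng.normal(size=4)
x /= np.linalg.norm(x)             # one input example (normalized)
y_true = rng.normal(size=4)        # its target output

lr = 0.1
for _ in range(200):
    error = W @ x - y_true         # prediction error
    W -= lr * np.outer(error, x)   # gradient step: update the weights

print(float(np.abs(W @ x - y_true).max()))  # error shrinks toward 0
```

The same properties listed above fall out of this picture: the knowledge is baked into `W` (persistent), and any later change to `W` risks disturbing everything else it encodes (interference).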
### 2. Fine-Tuning (Days to Weeks)
Fine-tuning modifies the weights further, but with narrower data.
This includes:
- Instruction tuning (following directions)
- Alignment methods (Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimization (DPO))
- Domain adaptation
- Parameter-efficient methods like Low-Rank Adaptation (LoRA)
This is still weight-based learning.
It persists across restarts. It risks catastrophic forgetting. It modifies the brain itself.
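Why adapters in particular can be modular and reversible is easiest to see in code. Below is a minimal NumPy sketch of LoRA's core idea, not the real implementation: the pretrained weight stays frozen, and training only touches a small low-rank pair whose product is added on top.

```python
import numpy as np

# LoRA sketch: effective weight = W (frozen) + B @ A (trainable, low rank).
# Dropping B @ A restores the original model exactly, which is why
# adapter learning is reversible in a way full fine-tuning is not.
d, r = 512, 8                       # hidden size, adapter rank (r << d)
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))         # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                # trainable up-projection (zero init)

def forward(x):
    return W @ x + B @ (A @ x)      # base path plus adapter path

x = rng.normal(size=d)
print(np.allclose(forward(x), W @ x))  # True: zero-init adapter is a no-op
print(A.size + B.size, "adapter params vs", W.size, "frozen params")
```

Here the adapter holds 8,192 trainable parameters against 262,144 frozen ones, which is why LoRA training is cheap and why the adapter can be swapped in and out as a unit.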
### 3. Memory-Based Learning (Seconds to Minutes)
This is where many modern systems shift.
Instead of changing weights, they store information externally:
- RAG (Retrieval-Augmented Generation)
- CAG (Cache-Augmented Generation)
- Vector databases
- Engram-style memory modules
The model retrieves relevant memory per query.
The brain stays stable. The notebook grows.
This learning:
- Persists across restarts
- Survives model upgrades
- Does not cause forgetting
- Is fast
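A minimal sketch of this pattern, assuming a toy bag-of-words `embed()` in place of a real embedding model, a plain list in place of a vector database, and made-up facts: "learning" is appending to the store, and nothing about the model changes.

```python
import numpy as np

# Memory-based learning in miniature: learn = append to external store,
# recall = retrieve the nearest entries per query. embed() is a toy
# hashing-trick bag of words, standing in for a real embedding model.
def embed(text, dim=256):
    v = np.zeros(dim)
    for word in text.lower().split():
        v[hash(word) % dim] += 1.0       # count words into hashed buckets
    return v / (np.linalg.norm(v) or 1.0)

memory = []                              # stand-in for a vector database

def remember(fact):
    memory.append((fact, embed(fact)))   # "learning" touches no weights

def retrieve(query, k=1):
    q = embed(query)
    ranked = sorted(memory, key=lambda m: -float(q @ m[1]))
    return [fact for fact, _ in ranked[:k]]

remember("the deploy script lives in tools/deploy.sh")
remember("the staging cluster runs kubernetes 1.29")
print(retrieve("what kubernetes version is on the staging cluster"))
```

Because the notes live outside the model, a model upgrade (a better `embed()`, a better LLM) only requires re-embedding the store; the facts themselves survive. That is the "survives model upgrades" property from the list above.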
### 4. In-Context Learning (Milliseconds)
This is temporary reasoning scaffolding.
Information exists only in the prompt window.
It:
- Does not update weights
- Does not persist across sessions
- Is powerful but fragile
- Suffers from context rot
This is working memory.
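In code terms, in-context "learning" is just string assembly: the examples exist only in the prompt that gets sent, and nothing survives the call. The prompt format below is an illustrative assumption.

```python
# In-context learning: the "knowledge" is only ever text in the prompt
# window. No weights change; close the session and it is gone.
examples = [
    ("cold", "antonym: hot"),
    ("fast", "antonym: slow"),
]

def build_prompt(query):
    lines = ["Answer in the same style as the examples."]
    for word, answer in examples:
        lines.append(f"{word} -> {answer}")
    lines.append(f"{query} -> ")
    return "\n".join(lines)

prompt = build_prompt("dark")
print(prompt)
# The examples shape the next response, but they live nowhere except
# this string - that is both the power and the fragility.
```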
## Why This Matters
Most discussions collapse all of this into “the model learned.”
But:
- Updating weights risks forgetting
- Updating memory does not
- Updating prompts does not persist
- Updating adapters can be modular and reversible
Continuous learning systems must coordinate all four.
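As a sketch of what that coordination means in practice (the routing rules here are illustrative assumptions, not a real system): each new piece of information should go to the layer whose persistence profile fits it.

```python
# Toy router for a continuous-learning system. The four layers mirror
# the mechanisms in this article; the keys are hypothetical categories.
ROUTES = {
    "needed this conversation only": "in-context (prompt window)",
    "durable fact about the world":  "external memory (RAG / Engram)",
    "new skill or style":            "adapter (LoRA fine-tune)",
    "broad new capability":          "full fine-tune / pretraining",
}

def route(info_kind):
    return ROUTES[info_kind]

print(route("durable fact about the world"))  # -> external memory (RAG / Engram)
```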
## Persistence Comparison
| Mechanism | Persists Across Chat? | Persists Across Restart? | Persists Across Model Change? |
|---|---|---|---|
| Pretraining | Yes | Yes | No |
| Fine-tune | Yes | Yes | No |
| LoRA | Yes | Yes | Usually |
| Distillation | Yes | Yes | No |
| ICL | No | No | No |
| RAG | Yes | Yes | Yes |
| Engram | Yes | Yes | Yes |
| CAG | Yes | Yes | Yes |
That last column is subtle but powerful for agents.
## References
| Concept | Paper |
|---|---|
| LoRA | LoRA: Low-Rank Adaptation of Large Language Models (Hu et al. 2021) |
| RAG | Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (Lewis et al. 2020) |
| ICL | What Can Transformers Learn In-Context? A Case Study of Simple Function Classes (Garg et al. 2022) |
| Engram | Engram: Conditional Memory via Scalable Lookup (DeepSeek 2025) |
| DPO | Direct Preference Optimization: Your Language Model Is Secretly a Reward Model (Rafailov et al. 2023) |
## Coming Next
In Part 2, we’ll examine the two fundamental failure modes that arise from confusing these layers: catastrophic forgetting and context rot.
Learning happens in layers of permanence.
Part 1 of the How AI Learns series. View all parts | Next: Part 2 →