How AI Learns Part 6: Toward Continuous Learning

Continuous learning aims to absorb new information and skills over time without losing old capabilities. The key: learn often in memory, consolidate carefully in weights. Periodic consolidation, not constant updates.

Continuous learning aims to:

Learn new skills
Retain old skills
Avoid retraining from scratch
Avoid catastrophic forgetting

Resource	Link
Related	Sleepy Coder Part 1, Sleepy Coder Part 2

The Continuous Learning Loop

Figure: The continuous learning loop. Agent → Logs → Evaluate → Cluster → Train → Validate → Deploy, with a Memory branch. Periodic consolidation, not constant updates.

The Core Tradeoff

Goal	Description
Plasticity	Learn new things quickly
Stability	Retain old things reliably

You cannot maximize both simultaneously. The art is in the balance.
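One common way to make the tradeoff concrete is a weighted objective. This is a generic formulation, not the rule used by any specific method below:

```latex
\mathcal{L}(\theta) = \mathcal{L}_{\text{new}}(\theta) + \lambda \, \mathcal{R}_{\text{old}}(\theta), \qquad \lambda \ge 0
```

Here $\mathcal{L}_{\text{new}}$ is the loss on new data (plasticity), $\mathcal{R}_{\text{old}}$ penalizes drift away from old capabilities (stability), and $\lambda$ sets the balance: $\lambda = 0$ learns freely and risks forgetting, while a very large $\lambda$ barely moves from the old solution.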

Approaches to Continuous Learning

1. Replay-Based Methods

Keep (or synthesize) some old data. Periodically retrain on old + new.

How it works:

Store representative examples from each task
Mix old data into new training batches
Periodically consolidate
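The steps above can be sketched with a fixed-size buffer. This is a minimal sketch, assuming an in-memory list and reservoir sampling as the selection policy (the post does not prescribe one); `ReplayBuffer`, `mixed_batch`, and `replay_fraction` are hypothetical names.

```python
import random


class ReplayBuffer:
    """Fixed-capacity store of past examples. Reservoir sampling gives every
    example seen so far an equal chance of being kept."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            # Keep the new example with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example

    def mixed_batch(self, new_batch, replay_fraction=0.5):
        """Return new examples plus old ones, so that roughly
        `replay_fraction` of the final batch is replayed data."""
        if not 0 <= replay_fraction < 1:
            raise ValueError("replay_fraction must be in [0, 1)")
        n_old = int(len(new_batch) * replay_fraction / (1 - replay_fraction))
        old = self.rng.sample(self.items, min(n_old, len(self.items)))
        batch = list(new_batch) + old
        self.rng.shuffle(batch)
        return batch


# Usage: 1000 examples from an old task, then a new-task batch of 8.
buf = ReplayBuffer(capacity=100)
for i in range(1000):
    buf.add(("task_a", i))
batch = buf.mixed_batch([("task_b", i) for i in range(8)], replay_fraction=0.5)
```

Storage stays bounded at `capacity` no matter how many examples stream past, which is where the storage and governance costs in the table below come from: whatever sits in the buffer is retained data.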

Recent work: FOREVER adapts replay timing using “model-centric time” (based on optimizer update magnitude) rather than fixed training steps.
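FOREVER's exact rule is not reproduced here. The sketch below only illustrates the general idea of a clock that advances with update magnitude instead of step count; the L2-norm measure, the threshold, and the names `UpdateMagnitudeClock` and `replay_steps` are assumptions.

```python
import math


class UpdateMagnitudeClock:
    """Hypothetical replay trigger: time advances by how far the parameters
    moved (L2 norm of each update), not by one tick per optimizer step."""

    def __init__(self, replay_every):
        self.replay_every = replay_every  # assumed threshold, in norm units
        self.elapsed = 0.0

    def tick(self, old_params, new_params):
        """Advance the clock by one update; return True when replay is due."""
        moved = math.sqrt(sum((n - o) ** 2 for o, n in zip(old_params, new_params)))
        self.elapsed += moved
        if self.elapsed >= self.replay_every:
            self.elapsed -= self.replay_every
            return True
        return False


def replay_steps(step_size, n_steps, replay_every=1.0):
    """Simulate constant-size updates; return the steps at which replay fires."""
    clock = UpdateMagnitudeClock(replay_every)
    params, fired = [0.0, 0.0], []
    for step in range(1, n_steps + 1):
        new_params = [params[0] + step_size, params[1]]
        if clock.tick(params, new_params):
            fired.append(step)
        params = new_params
    return fired


big = replay_steps(0.5, 8)     # fast-moving model: replays every 2 steps
small = replay_steps(0.25, 8)  # slow-moving model: replays every 4 steps
```

With a fixed step schedule both runs would replay at the same steps; here the model that is changing faster gets reminded of old data more often.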

Pros	Cons
Strong retention	Storage costs
Conceptually simple	Privacy concerns
Well-understood	Data governance complexity

2. Replay-Free Regularization

Constrain weight updates to avoid interference, without storing old data.
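The classic example of this family is Elastic Weight Consolidation (EWC, listed in the references), which adds a quadratic penalty weighting each parameter's drift from its old value by an importance estimate (the diagonal Fisher information). A minimal sketch of the penalty alone, assuming the Fisher values were already estimated after the previous task; the function names are illustrative.

```python
def ewc_penalty(params, anchor_params, fisher, lam):
    """EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    params: current parameters theta
    anchor_params: parameters theta* saved after the previous task
    fisher: per-parameter importance F_i (diagonal Fisher estimate)
    lam: stability strength (lambda)
    """
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )


def total_loss(new_task_loss, params, anchor_params, fisher, lam):
    # Plasticity term (fit the new task) plus stability term (stay near theta*).
    return new_task_loss + ewc_penalty(params, anchor_params, fisher, lam)
```

Moving a parameter whose importance is zero costs nothing, so unimportant weights stay free to learn while important ones are anchored. No old data is stored, only `anchor_params` and `fisher`.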

Efficient Lifelong Learning Algorithm (ELLA) (Jan 2026): Regularizes updates using subspace de-correlation. Reduces interference while allowing transfer.
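ELLA's exact regularizer is not reproduced here. As a rough illustration of subspace de-correlation in general, the sketch measures how much a proposed update overlaps with directions assumed important for old tasks, and shows the stricter alternative of projecting that overlap away (similar in spirit to gradient-projection methods). The orthonormal-basis representation and the function names are assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def overlap_penalty(update, old_basis):
    """Soft version: squared overlap sum_k (u_k . delta)^2 between a proposed
    update and an orthonormal basis of old-task directions.
    Zero means the update is fully de-correlated from the old subspace."""
    return sum(dot(u, update) ** 2 for u in old_basis)


def project_out(update, old_basis):
    """Hard version: remove the update's component inside the old subspace.
    Assumes the old_basis vectors are orthonormal."""
    result = list(update)
    for u in old_basis:
        c = dot(u, result)
        result = [r - c * ui for r, ui in zip(result, u)]
    return result


old_basis = [[1.0, 0.0, 0.0]]  # direction assumed to matter for old tasks
update = [3.0, 4.0, 0.0]       # proposed update for the new task
safe_update = project_out(update, old_basis)
```

The soft penalty can be added to the training loss and traded off against new-task accuracy; the hard projection forbids interference entirely but also blocks any transfer along old directions.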

Share (Feb 2026): Maintains a single evolving shared low-rank subspace. Integrates new tasks without storing many adapters.
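To illustrate the storage idea (one shared subspace, a few numbers per task), here is a hypothetical sketch that grows an orthonormal basis by Gram-Schmidt up to a rank budget. It is not the Share paper's algorithm; updates are flattened to plain vectors, and `SharedSubspace`, `max_rank`, and `tol` are invented names.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class SharedSubspace:
    """Hypothetical sketch: one orthonormal basis shared by every task,
    grown by Gram-Schmidt up to a rank budget. Each task keeps only its
    coefficients on that basis instead of a full adapter."""

    def __init__(self, max_rank, tol=1e-9):
        self.max_rank = max_rank
        self.tol = tol
        self.basis = []   # shared orthonormal directions
        self.coeffs = {}  # task name -> coordinates in the shared basis

    def add_task(self, name, delta):
        residual = list(delta)
        for u in self.basis:
            c = dot(u, residual)
            residual = [r - c * x for r, x in zip(residual, u)]
        norm = math.sqrt(dot(residual, residual))
        # Grow the basis only for a genuinely new direction, within budget.
        if norm > self.tol and len(self.basis) < self.max_rank:
            self.basis.append([r / norm for r in residual])
        self.coeffs[name] = [dot(u, delta) for u in self.basis]

    def reconstruct(self, name):
        """Approximate the task's update from the shared basis."""
        c = self.coeffs[name]
        dim = len(self.basis[0])
        return [sum(c[k] * self.basis[k][i] for k in range(len(c))) for i in range(dim)]


space = SharedSubspace(max_rank=2)
space.add_task("a", [2.0, 0.0, 0.0])
space.add_task("b", [1.0, 3.0, 0.0])
space.add_task("c", [4.0, 4.0, 0.0])  # already spanned: no new direction
space.add_task("d", [0.0, 0.0, 5.0])  # new direction, but rank budget is full
```

Task "c" is stored exactly at no extra basis cost, while task "d" is lost because the budget is exhausted. That tension between a constant memory footprint and capacity is one reason evaluation of such methods is tricky.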

Pros	Cons
No replay needed	Still active research
Privacy-friendly	Evaluation complexity
Constant memory	Subtle failure modes

3. Modular Adapters

Keep base model frozen. Train task-specific adapters. Merge or switch as needed.

Evolution:

Low-Rank Adaptation (LoRA): Individual adapters per task
Shared LoRA spaces: Adapters share subspace
Adapter banks: Library of skills to compose
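In LoRA, a frozen weight matrix W is augmented with a trainable low-rank product B·A scaled by alpha / r. The sketch shows only the forward pass plus a dictionary-based bank for switching and rollback; training is omitted, and `AdapterBank` and its methods are hypothetical names rather than any library's API.

```python
def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]


class AdapterBank:
    """Frozen base weight W plus named low-rank adapters (A, B, alpha).
    Output with adapter `name`: y = W x + (alpha / r) * B (A x)."""

    def __init__(self, base_weight):
        self.W = base_weight  # frozen: never modified by this class
        self.adapters = {}    # name -> (A, B, alpha)

    def add(self, name, A, B, alpha):
        self.adapters[name] = (A, B, alpha)

    def remove(self, name):
        # Rollback is just dropping the adapter; W is untouched.
        self.adapters.pop(name, None)

    def forward(self, x, name=None):
        y = matvec(self.W, x)
        if name is not None:
            A, B, alpha = self.adapters[name]
            r = len(A)                       # rank = number of rows of A
            delta = matvec(B, matvec(A, x))  # B (A x), never forms B A
            y = [yi + (alpha / r) * di for yi, di in zip(y, delta)]
        return y


bank = AdapterBank([[1.0, 0.0], [0.0, 1.0]])
bank.add("fix", A=[[1.0, 1.0]], B=[[1.0], [0.0]], alpha=2.0)  # rank-1 adapter
```

Because the base weight is never written, every adapter is an independent, versionable file, and switching skills is a dictionary lookup. The cost is the routing and proliferation problem in the table below.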

Pros	Cons
Modular, versioned	Adapter proliferation
Low forgetting risk	Routing complexity
Easy rollback	Composition challenges

4. Memory-First Learning

Store experiences in external memory. Only consolidate to weights what’s proven stable.

Pattern:

New information → Memory (fast)
Validated patterns → Adapters (slow)
Fundamental capabilities → Weights (rare)

This separates the speed of learning from the permanence of changes.
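A hypothetical sketch of that separation: every observation lands in memory immediately, and a pattern is queued for slower layers only after repeated validation. The class name and both thresholds are illustrative assumptions; actual adapter training and weight updates are out of scope here.

```python
class TieredLearner:
    """Hypothetical memory-first learner. Observations are stored in fast
    memory at once; a pattern is queued for slower, more permanent layers
    only after it has been validated enough times."""

    def __init__(self, adapter_threshold=3, weight_threshold=10):
        self.memory = {}            # pattern -> count of validated successes
        self.adapter_queue = set()  # candidates for the next adapter training run
        self.weight_queue = set()   # rare candidates for core-weight consolidation
        self.adapter_threshold = adapter_threshold
        self.weight_threshold = weight_threshold

    def observe(self, pattern, validated):
        count = self.memory.get(pattern, 0) + (1 if validated else 0)
        self.memory[pattern] = count  # fast path: stored now, no training
        if count >= self.adapter_threshold:
            self.adapter_queue.add(pattern)
        if count >= self.weight_threshold:
            self.weight_queue.add(pattern)


learner = TieredLearner()
for _ in range(3):
    learner.observe("retry flaky network calls", validated=True)
for _ in range(5):
    learner.observe("one-off guess", validated=False)
```

The unvalidated pattern is still remembered and retrievable, but it never reaches the adapter queue, so noise cannot leak into the weights.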

The Practical Loop

A working continuous learning system:

Run agent (with Recursive Language Model (RLM) context management)
Collect traces: prompts, tool calls, outcomes, failures
Score outcomes: tests, static analysis, user signals
Cluster recurring failure patterns
Train lightweight updates (LoRA/adapters)
Validate retention (did old skills degrade?)
Deploy modular update (with rollback capability)

This is not real-time learning. It’s periodic consolidation.

Human analogy: Sleep. Process experiences, consolidate important patterns, prune noise.
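One pass of the loop above can be sketched as a single function. This is a minimal sketch under stated assumptions: traces are dicts with hypothetical `outcome` and `signature` fields, exact-match grouping stands in for real clustering (for example, embedding-based), and `train_fn` and `eval_fn` are placeholders for your own training and retention-suite code.

```python
from collections import Counter


def consolidation_cycle(traces, train_fn, eval_fn, baseline, min_cluster=3, tolerance=0.01):
    """One periodic consolidation pass (all names hypothetical).

    traces: list of dicts like {"outcome": "fail", "signature": "timeout"}
    train_fn: trains a lightweight update from a list of failure traces
    eval_fn: scores an update on a retention suite (higher is better)
    baseline: retention score of the currently deployed model
    """
    # Score and cluster: group failures by a coarse signature.
    failures = [t for t in traces if t["outcome"] == "fail"]
    counts = Counter(t["signature"] for t in failures)
    recurring = {sig for sig, n in counts.items() if n >= min_cluster}
    if not recurring:
        return "skip", None
    # Train only on recurring failure patterns, not one-off noise.
    update = train_fn([t for t in failures if t["signature"] in recurring])
    # Validate retention before deploying; otherwise keep the old version.
    if eval_fn(update) < baseline - tolerance:
        return "rollback", update
    return "deploy", update


traces = [{"outcome": "fail", "signature": "timeout"}] * 3 + [
    {"outcome": "pass", "signature": ""},
    {"outcome": "fail", "signature": "typo"},
]
```

Nothing here runs per query: the function is meant to be invoked on a schedule over accumulated traces, which is the periodic consolidation the section describes.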

Time Scales of Update

Frequency	What Changes	Method
Every query	Nothing (inference only)	-
Per session	Memory	Retrieval-Augmented Generation (RAG)/Engram
Daily	Adapters (maybe)	Lightweight Parameter-Efficient Fine-Tuning (PEFT)
Weekly	Validated adapters	Reviewed updates
Monthly	Core weights	Major consolidation

Most systems should:

Update memory frequently
Update adapters occasionally
Update core weights rarely
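The cadence can be expressed as a cron-style schedule. This sketch assumes "daily", "weekly", and "monthly" map to 1-, 7-, and 30-day periods; the layer names come from the table, while `SCHEDULE` and `due_updates` are hypothetical.

```python
# (period in days, layer, method), ordered from most to least frequent.
SCHEDULE = [
    (1, "adapters", "lightweight PEFT"),
    (7, "validated adapters", "reviewed updates"),
    (30, "core weights", "major consolidation"),
]


def due_updates(day):
    """Layers eligible for an update on a given day (day 0 = start).
    Memory is updated every session, so it is always included.
    'Due' only means eligible: the table marks daily adapter updates as optional."""
    due = ["memory"]
    for period, layer, _method in SCHEDULE:
        if day > 0 and day % period == 0:
            due.append(layer)
    return due


# Over a 30-day window: 30 adapter slots, 4 reviewed slots, 1 core update.
adapter_days = [d for d in range(1, 31) if "adapters" in due_updates(d)]
reviewed_days = [d for d in range(1, 31) if "validated adapters" in due_updates(d)]
core_days = [d for d in range(1, 31) if "core weights" in due_updates(d)]
```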

Evaluation Is Critical

Continuous learning without continuous evaluation is dangerous.

Required:

Retention tests (what got worse?)
Forward transfer tests (what got better?)
Regression detection
Rollback capability

Without these, you’re flying blind.
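A minimal sketch of a retention gate covering the checklist above, assuming each evaluation suite yields one score where higher is better; the suite names and the `tolerance` value are illustrative.

```python
def evaluate_update(before, after, tolerance=0.02):
    """Compare per-suite scores before and after an update.

    regressed: suites that dropped by more than `tolerance` (retention test)
    improved: suites that went up, including new suites compared against 0
    (forward transfer)
    decision: roll back on any regression, otherwise deploy
    """
    regressed = sorted(s for s in before if after.get(s, 0.0) < before[s] - tolerance)
    improved = sorted(s for s in after if after[s] > before.get(s, 0.0))
    decision = "rollback" if regressed else "deploy"
    return {"regressed": regressed, "improved": improved, "decision": decision}


before = {"python": 0.80, "sql": 0.70}
report = evaluate_update(before, {"python": 0.81, "sql": 0.60, "rust": 0.50})
```

Here the update helped Python and added Rust but cost ten points on SQL, so the gate rolls it back. A single aggregate score would likely have hidden that regression.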

References

Concept	Paper
ELLA	Subspace Learning for Lifelong ML (2024)
Share	Shared LoRA Subspaces (2025)
FOREVER	Model-Centric Replay (2024)
EWC	Overcoming Catastrophic Forgetting in Neural Networks (Kirkpatrick et al., 2017)

Coming Next

In Part 7, we’ll put it all together: designing a practical continuous learning agent with layered architecture, logging, feedback loops, and safety.

Learn often in memory. Consolidate carefully in weights.