A robust continuous learning agent contains:

  • Core model (rarely updated)
  • Adapters (modular skills)
  • External memory (facts)
  • Context manager (Recursive Language Model (RLM)-style)
  • Logging & evaluation loop
Resource Link
Related RLM | Engram | Sleepy Coder

The Layered Architecture

Four-layer architecture showing Context Manager, External Memory, Adapters, and Core Weights with feedback and evaluation loops
Continuous learning is layered coordination.

Layer by Layer

Layer 4: Core Weights (Bottom)

The foundation. Trained once, changed rarely.

Aspect Details
Contains General reasoning, language, base knowledge
Update frequency Months or never
Update method Full fine-tune or major consolidation
Risk of change High (forgetting, capability shifts)

Rule: Don’t touch this unless you have a very good reason.

Layer 3: Adapters (Parameter-Efficient Fine-Tuning (PEFT) / Low-Rank Adaptation (LoRA))

Modular skills that plug into the base.

Aspect Details
Contains Task-specific capabilities
Update frequency Weekly to monthly
Update method Lightweight PEFT training
Risk of change Medium (isolated, but validate)

Rule: Train adapters for validated, recurring patterns. Version them. Enable rollback.

Layer 2: External Memory

Facts, experiences, and retrieved knowledge.

Aspect Details
Contains Documents, logs, structured data
Update frequency Continuous
Update method Database writes
Risk of change Low (doesn’t affect weights)

Rule: Store experiences here first. Memory is cheap and safe.

Layer 1: Context Manager (Top)

The RLM-style interface that rebuilds focus each step.

Aspect Details
Contains Current context, retrieved data, active state
Update frequency Per call
Update method Reconstruction from memory + query
Risk of change None (ephemeral)

Rule: Don’t drag context forward. Rebuild it.

The Feedback Loop

Logging

Capture everything the agent does:

  • Prompts received
  • Actions taken
  • Tool calls made
  • Errors encountered
  • User signals

This is your training data.

Evaluation

Before any update reaches production:

Check Purpose
Retention tests Did old skills degrade?
Forward transfer Did new skills improve?
Regression suite Known failure cases
Safety checks Harmful outputs?

Without evaluation, you’re updating blind.

Deployment

Updates should be:

  • Modular: Can isolate and rollback
  • Versioned: Know what changed when
  • Staged: Test before full rollout
  • Monitored: Track post-deployment metrics

The Error Flow

Where do errors go?

Error occurs
    ↓
Log it (immediate)
    ↓
Store in memory (same day)
    ↓
Pattern emerges over multiple occurrences
    ↓
Train adapter update (weekly/monthly)
    ↓
Validate update (before deployment)
    ↓
Deploy with rollback capability

Errors feed into memory first. Only validated, recurring improvements reach adapters. Core weights almost never change.

What This Architecture Achieves

Problem Solution
Catastrophic forgetting Core weights frozen; adapters isolated
Context rot RLM rebuilds focus each step
Hallucination Memory grounds responses
Slow adaptation Memory updates continuously
Unsafe changes Evaluation before deployment

Design Principles

1. Separate Storage from Reasoning

Facts belong in memory. Reasoning belongs in weights. Don’t blur them.

2. Separate Speed from Permanence

Fast learning (memory) is temporary. Slow learning (weights) is permanent. Match the update speed to the desired permanence.

3. Evaluate Before Consolidating

Every update to adapters or weights must be validated. Regressions are silent killers.

4. Enable Rollback

Version everything. If an update causes problems, you must be able to undo it.

5. Log Everything

You cannot improve what you cannot measure. Structured logging is the foundation of continuous learning.

The Big Picture

AI does not learn in one place.

It learns in layers:

  • Permanent (weights)
  • Modular (adapters)
  • External (memory)
  • Temporary (context)

Continuous learning is not constant weight updates.

It is careful coordination across time scales.

Continuous learning systems don’t constantly retrain. They carefully consolidate what works.

References

Concept Paper
LoRA LoRA: Low-Rank Adaptation (Hu et al. 2021)
RAG Retrieval-Augmented Generation (Lewis et al. 2020)
RLM Recursive Language Models (Zhou et al. 2024)
Share Shared LoRA Subspaces (2025)
Engram Engram: Conditional Memory (DeepSeek 2025)

Series Summary

Part Key Insight
1. Time Scales Learning happens at different layers and speeds
2. Forgetting vs Rot Different failures need different fixes
3. Weight-Based Change the brain carefully
4. Memory-Based Store facts outside the brain
5. Context & RLM Rebuild focus instead of dragging baggage
6. Continuous Learning Learn in memory, consolidate in weights
7. Full Architecture Layered coordination enables safe improvement

Continuous learning is layered coordination.