Large context windows are not a complete solution.

As context grows:

  • Attention dilutes
  • Errors compound
  • Reasoning quality degrades
Resource Link
Related RLM | ICL Revisited

The Context Problem

Transformers have finite attention. With limited attention heads and capacity, the model cannot attend equally to everything. As tokens accumulate:

  • Earlier instructions lose influence
  • Patterns average toward generic responses
  • Multi-step reasoning fails

This is context rot—not forgetting weights, but losing signal in noise.

In-Context Learning (ICL)

The model adapts temporarily via examples in the prompt.

Aspect ICL
Updates weights? No
Persists across sessions? No
Speed Instant
Mechanism Activations, not gradients

ICL is powerful but ephemeral. It’s working memory, not learning.

Limitation: As context grows, ICL examples compete with other content for attention.

Recursive Language Models (RLM)

Circular flow diagram showing LLM connected to Tools, Memory, Context, and Evaluation in a recursive loop
Rebuild context each step instead of dragging it forward.

RLMs decompose reasoning into multiple passes. Instead of dragging entire context forward:

  1. Query relevant memory
  2. Retrieve what’s needed now
  3. Execute tools
  4. Evaluate results
  5. Reconstruct focused context
  6. Repeat

This treats context as a dynamic environment, not a static blob.

Why RLM Works

Traditional approach:

[System prompt + 50k tokens of history + query]

RLM approach:

[System prompt + retrieved relevant context + current query]

Each reasoning step starts fresh with focused attention.

Context Engineering Techniques

Technique How It Helps
Summarization Compress old context, preserve essentials
Chunking Process in segments, aggregate results
Retrieval Pull relevant content, not everything
Tool offloading Store state externally, query on demand
Structured prompts Clear sections, explicit priorities

Tool Use as Context Management

Tools aren’t just for actions—they’re for state management.

Instead of keeping everything in context:

  • Store in files, databases, or structured formats
  • Query when needed
  • Return focused results

This converts unbounded context into bounded queries.

The Agent Loop

Modern agents combine these ideas:

while not done:
    # 1. Assess current state
    relevant = retrieve_from_memory(query)

    # 2. Build focused context
    context = [system_prompt, relevant, current_task]

    # 3. Reason
    action = llm(context)

    # 4. Execute
    result = execute_tool(action)

    # 5. Update memory
    memory.store(result)

    # 6. Evaluate
    if goal_achieved(result):
        done = True

Each iteration rebuilds context. No rot accumulation.

Test-Time Adaptation

A related technique: temporarily update weights during inference.

Aspect Test-Time Learning
Updates weights? Yes, lightly (LoRA)
Persists? No (rolled back)
Purpose Adapt to input distribution

This sits between ICL (no updates) and fine-tuning (permanent updates).

Key Insight

Context is not a static buffer. It’s a dynamic workspace.

Systems that treat context as “append everything” will rot. Systems that actively manage context stay coherent.

References

Concept Paper
RLM Recursive Language Models (Zhou et al. 2024)
ICL What Can Transformers Learn In-Context? (Garg et al. 2022)
Test-Time Training TTT for Language Models (2024)
Chain-of-Thought Chain-of-Thought Prompting (Wei et al. 2022)

Coming Next

In Part 6, we’ll connect all of this to continuous learning: replay methods, subspace regularization, adapter evolution, and consolidation loops.


Rebuild focus instead of dragging baggage.