How AI Learns Part 5: Context Engineering & Recursive Reasoning

Large context windows are not a complete solution.
As context grows:
- Attention dilutes
- Errors compound
- Reasoning quality degrades
The Context Problem
Transformers have finite attention: with a fixed number of heads and limited capacity, the model cannot attend equally to everything. As tokens accumulate:
- Earlier instructions lose influence
- Patterns average toward generic responses
- Multi-step reasoning fails
This is context rot—not forgetting weights, but losing signal in noise.
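The dilution is easy to see with a toy softmax: hold one instruction token at a fixed relevance score and grow the number of filler tokens competing with it. A minimal sketch (the scores here are invented for illustration, not real attention logits):

```python
import math

def softmax(scores):
    """Normalize raw scores into attention weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def instruction_weight(n_filler, instr_score=2.0, filler_score=1.0):
    """Attention weight on a single instruction token surrounded by
    n_filler lower-scored filler tokens."""
    weights = softmax([instr_score] + [filler_score] * n_filler)
    return weights[0]

for n in [10, 100, 1000, 10000]:
    print(f"{n:>6} filler tokens -> instruction weight {instruction_weight(n):.5f}")
```

Even though the instruction stays strictly more relevant than every filler token, its share of attention shrinks toward zero as the context grows.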
In-Context Learning (ICL)
The model adapts temporarily via examples in the prompt.
| Aspect | ICL |
|---|---|
| Updates weights? | No |
| Persists across sessions? | No |
| Speed | Instant |
| Mechanism | Activations, not gradients |
ICL is powerful but ephemeral. It’s working memory, not learning.
Limitation: As context grows, ICL examples compete with other content for attention.
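From the prompt side, ICL is nothing more than examples placed in context before the query; the "learning" happens in activations when the model reads them. A minimal sketch of prompt assembly (the sentiment task and labels are invented for illustration):

```python
def build_icl_prompt(examples, query, instruction="Classify the sentiment."):
    """Assemble a few-shot prompt. The model adapts to the examples at
    inference time; no weights are updated and nothing persists."""
    lines = [instruction]
    for text, label in examples:
        lines.append(f"Input: {text}\nLabel: {label}")
    lines.append(f"Input: {query}\nLabel:")
    return "\n\n".join(lines)

examples = [
    ("Great service, will return.", "positive"),
    ("Cold food and a long wait.", "negative"),
]
prompt = build_icl_prompt(examples, "The staff were friendly.")
print(prompt)
```

Every example spends tokens, which is exactly why ICL competes with the rest of the context for attention.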
Recursive Language Models (RLM)
RLMs decompose reasoning into multiple passes. Instead of dragging entire context forward:
- Query relevant memory
- Retrieve what’s needed now
- Execute tools
- Evaluate results
- Reconstruct focused context
- Repeat
This treats context as a dynamic environment, not a static blob.
Why RLM Works
Traditional approach:
[System prompt + 50k tokens of history + query]
RLM approach:
[System prompt + retrieved relevant context + current query]
Each reasoning step starts fresh with focused attention.
Context Engineering Techniques
| Technique | How It Helps |
|---|---|
| Summarization | Compress old context, preserve essentials |
| Chunking | Process in segments, aggregate results |
| Retrieval | Pull relevant content, not everything |
| Tool offloading | Store state externally, query on demand |
| Structured prompts | Clear sections, explicit priorities |
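Chunking and summarization compose naturally: summarize each segment independently, then summarize the partial summaries. A sketch of that map-reduce pattern, with a placeholder summarizer standing in for what would be an LLM call in practice (`toy_summarize` and the integer "tokens" are illustrative only):

```python
def chunk(tokens, size):
    """Split a long token list into fixed-size segments."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def map_reduce_summarize(tokens, size, summarize):
    """Summarize each chunk independently (map), then aggregate the
    partial summaries into one final summary (reduce)."""
    partials = [summarize(c) for c in chunk(tokens, size)]
    return summarize([t for p in partials for t in p])

# Placeholder summarizer: keep the first 3 tokens of its input.
toy_summarize = lambda toks: toks[:3]
doc = list(range(100))
print(map_reduce_summarize(doc, size=25, summarize=toy_summarize))
```

No single call ever sees the full document; the context handed to each step stays bounded by the chunk size.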
Tool Use as Context Management
Tools aren’t just for actions—they’re for state management.
Instead of keeping everything in context:
- Store in files, databases, or structured formats
- Query when needed
- Return focused results
This converts unbounded context into bounded queries.
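A sketch of the idea using an in-memory SQLite table as the external store (the `notes` schema and its contents are invented for illustration):

```python
import sqlite3

# Tool offloading: state lives in an external store, not in the prompt.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE notes (topic TEXT, content TEXT)")
db.executemany(
    "INSERT INTO notes VALUES (?, ?)",
    [("rlm", "decompose reasoning into passes"),
     ("icl", "adapts via prompt examples"),
     ("rot", "signal lost in long context")],
)

def query_notes(topic):
    """Bounded query: return only the rows relevant right now."""
    rows = db.execute(
        "SELECT content FROM notes WHERE topic = ?", (topic,)
    ).fetchall()
    return [content for (content,) in rows]

print(query_notes("rlm"))
```

The store can grow without bound; what enters the context is only the focused result of each query.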
The Agent Loop
Modern agents combine these ideas:
```python
while not done:
    # 1. Assess current state
    relevant = retrieve_from_memory(query)
    # 2. Build focused context
    context = [system_prompt, relevant, current_task]
    # 3. Reason
    action = llm(context)
    # 4. Execute
    result = execute_tool(action)
    # 5. Update memory
    memory.store(result)
    # 6. Evaluate
    if goal_achieved(result):
        done = True
```
Each iteration rebuilds context. No rot accumulation.
Test-Time Adaptation
A related technique: temporarily update weights during inference.
| Aspect | Test-Time Learning |
|---|---|
| Updates weights? | Yes, lightly (LoRA) |
| Persists? | No (rolled back) |
| Purpose | Adapt to input distribution |
This sits between ICL (no updates) and fine-tuning (permanent updates).
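The "update, then roll back" pattern is the essential mechanic. A toy sketch using a plain weight dict as a stand-in for LoRA adapters (no real model or gradients involved; the numbers are illustrative):

```python
import copy

class TestTimeAdapter:
    """Apply a temporary update to a weight dict during inference,
    then restore the original weights on exit."""

    def __init__(self, weights):
        self.weights = weights
        self._snapshot = None

    def __enter__(self):
        self._snapshot = copy.deepcopy(self.weights)
        return self

    def adapt(self, deltas, lr=0.1):
        """Nudge weights toward the current input distribution."""
        for key, delta in deltas.items():
            self.weights[key] += lr * delta

    def __exit__(self, *exc):
        # Roll back: the adaptation never persists past this input.
        self.weights.clear()
        self.weights.update(self._snapshot)

weights = {"w": 1.0}
with TestTimeAdapter(weights) as tta:
    tta.adapt({"w": 5.0})
    print("during inference:", weights["w"])  # temporarily updated
print("after rollback:", weights["w"])        # original restored
```

A real implementation would compute `deltas` from a gradient step on an auxiliary objective; the snapshot-and-restore discipline is what separates this from fine-tuning.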
Key Insight
Context is not a static buffer. It’s a dynamic workspace.
Systems that treat context as “append everything” will rot. Systems that actively manage context stay coherent.
References
| Concept | Paper |
|---|---|
| RLM | Recursive Language Models (Zhou et al. 2024) |
| ICL | What Can Transformers Learn In-Context? (Garg et al. 2022) |
| Test-Time Training | TTT for Language Models (2024) |
| Chain-of-Thought | Chain-of-Thought Prompting (Wei et al. 2022) |
Coming Next
In Part 6, we’ll connect all of this to continuous learning: replay methods, subspace regularization, adapter evolution, and consolidation loops.
Rebuild focus instead of dragging baggage.
Part 5 of the How AI Learns series.