RLM: Recursive Language Models for Massive Context
995 words • 5 min read • Abstract

What happens when your data won’t fit in a context window? RLM expands the workspace instead of cramming everything into limited memory. This post covers the MIT paper, my Rust implementation, and six video demonstrations.
| Resource | Link |
|---|---|
| Paper | arXiv:2512.24601 |
| Code | rlm-project |
| Playlist | RLM Implementations |
The Problem: Context Limits
Large language models have a hard limit. They can only process so much text at once.
Imagine a cookie jar that holds 100 cookies. What if you need to search through ten thousand? When you force too much in, the model forgets things—this is called context rot.
Bigger models help, but the limit always exists. We need a different approach.
The RLM Solution
Recursive Language Models flip the problem. Instead of bigger jars, use better tools.
The data stays in a context box. The model gets tools to peek inside:
| Tool | Purpose |
|---|---|
slice |
Get a character range |
find |
Search for text |
regex |
Pattern matching |
count |
Count occurrences |
llm_query |
Ask a sub-LLM to analyze a chunk |
Small, focused, deliberate. The model thinks about what it needs, then asks for just that.
The Results
From the MIT paper—on tasks that don’t fit in context:
| Approach | Accuracy |
|---|---|
| Standard prompting | 0% |
| RLM | 87-91% |
Results hold across GPT-4, Claude, Llama, Mistral, and Gemini.
My Implementation: Four Capability Levels
I built a Rust implementation with four capability levels:
| Level | Name | Description |
|---|---|---|
| L1 | DSL | Built-in commands (find, regex, count) |
| L2 | WASM | LLM generates Rust → compiles to WebAssembly sandbox |
| L3 | CLI | LLM generates Rust → compiles to native binary |
| L4 | LLM | Recursive delegation to sub-LLMs |
Each level trades off safety for capability:
- L1 is instant but limited to predefined operations
- L2 runs custom code but in a sandboxed environment
- L3 breaks free for large datasets that would timeout in WASM
- L4 uses LLM reasoning for semantic analysis
The Video Series
Six videos demonstrate RLM in action:
1. RLM Explained
The foundational video. Covers the MIT paper, the cookie jar analogy, and benchmark results showing 0% → 91% accuracy improvement.
Key insight: Expand the workspace, not the context.
2. War and Peace Demo
Can AI read all of War and Peace to find a hidden secret? The full text is 3.2 MB with 65,666 lines—way too big for any context window.
RLM finds “the password to Prince Andrei’s secret vault” in just 2 iterations using only 3,000 tokens. That’s 100% savings compared to sending the full document.
3. WASM Sandboxing
What if your LLM could write custom analysis code on the fly? Level 2 demonstrates WebAssembly sandboxing.
The LLM writes Rust code that compiles to WASM and runs in a secure sandbox. Demos include:
- Error ranking in logs
- Response time percentiles
- Unique IP counting
Trade-offs: ASCII only, 64MB memory limit, subset of Rust.
4. Native CLI Binaries
When 5,000 lines would timeout in WASM, Level 3 breaks free. Native Rust binaries process massive datasets with no limits.
Four CLI demos:
- Error ranking: Hash map counts error types
- Unique IPs: Hash set finds distinct addresses
- Percentiles: Sort and index for p50/p95/p99
- Word frequency: Tokenize, filter stop words, count
5. Detective Mystery Demo
A murder at the manor. Seven suspects. Dozens of clues. Can an LLM solve it?
Level 4 delegates reasoning to sub-LLMs. Instead of code execution, the model calls other models to:
- Analyze witness statements
- Compare alibis
- Draw conclusions
Watch as L4 examines each suspect and identifies the killer.
6. Large Context Processing
War and Peace is 3MB—far too large for any context window. This video shows Level 4 extracting noble family relationships from the entire novel.
The process:
- L3 extracts relationship sentences (father, mother, son, daughter…)
- L4 analyzes filtered data with sub-LLMs
- Final output: structured family trees
Three million characters → structured family trees in ~90 seconds.
Architecture
┌─────────────┐ ┌─────────────────┐ ┌─────────────┐
│ Client │────▶│ RLM Server │────▶│ Root LLM │
│ /visualize │ │ (Rust/Axum) │ │ (DeepSeek) │
└─────────────┘ └────────┬────────┘ └─────────────┘
│
┌────────▼────────┐
│ Command Executor │
│ slice, find, │
│ regex, count, │
│ llm_query... │
└────────┬────────┘
│
┌──────────────┼──────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Ollama │ │ Ollama │ │ Ollama │
│ (local) │ │ (remote) │ │ (other) │
└──────────┘ └──────────┘ └──────────┘
Sub-LM Pool (for llm_query)
Quick Start
cd rlm-orchestrator
# Configure providers in config.toml
export DEEPSEEK_API_KEY="your-key"
# Run the server
cargo run --bin rlm-server
# Open visualizer
open http://localhost:8080/visualize
The Cookie Jar Analogy
Think of it like this:
- Old way: Dump everything on the table, then dig through the mess
- RLM way: Use a scoop—grab just the cookies you need
The key insight is simple: expand the workspace, not the context.
Resources
- RLM Paper (arXiv:2512.24601) - Zhang, Kraska, Khattab (MIT CSAIL)
- rlm-project Repository
- rlm-project Wiki
- RLM Implementations Playlist
- ELI5: What is RLM?
When context windows aren’t enough, RLM gives your LLM tools to explore. Six videos, four capability levels, one insight: expand the workspace, not the context.
Part 3 of the Machine Learning series. View all parts | Next: Part 4 →





