What happens when your data won’t fit in a context window? Recursive Language Models (RLMs) expand the workspace instead of cramming everything into limited memory. This post covers the MIT paper, my Rust implementation, and six video demonstrations.

Resource    Link
Paper       arXiv:2512.24601
Code        rlm-project
Playlist    RLM Implementations

The Problem: Context Limits

Large language models have a hard limit. They can only process so much text at once.

Imagine a cookie jar that holds 100 cookies. What if you need to search through ten thousand? When you force too much in, the model forgets things—this is called context rot.

Bigger models help, but the limit always exists. We need a different approach.

The RLM Solution

Recursive Language Models flip the problem. Instead of bigger jars, use better tools.

The data stays in a context box. The model gets tools to peek inside:

Tool        Purpose
slice       Get a character range
find        Search for text
regex       Pattern matching
count       Count occurrences
llm_query   Ask a sub-LLM to analyze a chunk

Small, focused, deliberate. The model thinks about what it needs, then asks for just that.
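
To make this concrete, here’s a minimal Rust sketch of how such a tool set could be modeled. The Command enum and execute function are my own illustration, not the actual rlm-project types.

// Illustrative sketch; the real rlm-project types may differ.
enum Command {
    Slice { start: usize, end: usize },           // get a character range
    Find { needle: String },                      // search for text
    Regex { pattern: String },                    // pattern matching
    Count { needle: String },                     // count occurrences
    LlmQuery { chunk: String, question: String }, // ask a sub-LLM about a chunk
}

fn execute(cmd: &Command, context: &str) -> String {
    match cmd {
        // Byte-indexed slice; assumes the range falls on character boundaries.
        Command::Slice { start, end } => context[*start..*end].to_string(),
        Command::Find { needle } => match context.find(needle.as_str()) {
            Some(pos) => format!("found at byte {pos}"),
            None => "not found".to_string(),
        },
        Command::Count { needle } => context.matches(needle.as_str()).count().to_string(),
        // Regex and LlmQuery would dispatch to the regex crate and a sub-LLM client.
        _ => todo!(),
    }
}

Each call returns a small, targeted result instead of the whole document.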

The Results

From the MIT paper—on tasks that don’t fit in context:

Approach             Accuracy
Standard prompting   0%
RLM                  87-91%

Results hold across GPT-4, Claude, Llama, Mistral, and Gemini.

My Implementation: Four Capability Levels

I built a Rust implementation with four capability levels:

Level   Name   Description
L1      DSL    Built-in commands (find, regex, count)
L2      WASM   LLM generates Rust → compiles to WebAssembly sandbox
L3      CLI    LLM generates Rust → compiles to native binary
L4      LLM    Recursive delegation to sub-LLMs

Each level trades off safety for capability:

  • L1 is instant but limited to predefined operations
  • L2 runs custom code but in a sandboxed environment
  • L3 breaks free for large datasets that would time out in WASM
  • L4 uses LLM reasoning for semantic analysis
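
One way to picture the escalation policy, as a hedged sketch rather than the orchestrator’s actual logic:

// Illustrative only: pick the lowest level that can handle the task.
#[derive(Debug, Clone, Copy)]
enum Capability {
    Dsl,  // L1: predefined commands, instant, safest
    Wasm, // L2: LLM-generated Rust in a 64 MB sandbox
    Cli,  // L3: LLM-generated Rust as a native binary
    Llm,  // L4: recursive delegation to sub-LLMs
}

fn choose_level(needs_custom_code: bool, data_bytes: usize, needs_semantics: bool) -> Capability {
    if needs_semantics {
        Capability::Llm
    } else if needs_custom_code && data_bytes > 64 * 1024 * 1024 {
        Capability::Cli // too big for the WASM sandbox's 64 MB memory limit
    } else if needs_custom_code {
        Capability::Wasm
    } else {
        Capability::Dsl
    }
}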

The Video Series

Six videos demonstrate RLM in action:

1. RLM Explained

The foundational video. Covers the MIT paper, the cookie jar analogy, and benchmark results showing 0% → 91% accuracy improvement.

Key insight: Expand the workspace, not the context.


2. War and Peace Demo

Can AI read all of War and Peace to find a hidden secret? The full text is 3.2 MB with 65,666 lines—way too big for any context window.

RLM finds “the password to Prince Andrei’s secret vault” in just 2 iterations using only 3,000 tokens. That’s a savings of more than 99% compared to sending the full document.


3. WASM Sandboxing

What if your LLM could write custom analysis code on the fly? Level 2 demonstrates WebAssembly sandboxing.

The LLM writes Rust code that compiles to WASM and runs in a secure sandbox. Demos include:

  • Error ranking in logs
  • Response time percentiles
  • Unique IP counting

Trade-offs: ASCII only, 64MB memory limit, subset of Rust.
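
To give a feel for the generated code, here’s a hypothetical std-only error-ranking function, the kind of small, self-contained Rust that fits those constraints. This is my sketch, not code captured from the demo.

use std::collections::HashMap;

// Hypothetical L2-style analysis: count error lines and rank them by frequency.
fn rank_errors(log: &str) -> Vec<(String, usize)> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for line in log.lines() {
        if let Some(idx) = line.find("ERROR") {
            // Key on the text after "ERROR" so identical errors group together.
            let key = line[idx..].trim().to_string();
            *counts.entry(key).or_insert(0) += 1;
        }
    }
    let mut ranked: Vec<_> = counts.into_iter().collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1)); // most frequent first
    ranked
}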


4. Native CLI Binaries

When 5,000 lines would time out in WASM, Level 3 breaks free. Native Rust binaries process massive datasets without the sandbox’s memory and runtime limits.

Four CLI demos:

  • Error ranking: Hash map counts error types
  • Unique IPs: Hash set finds distinct addresses
  • Percentiles: Sort and index for p50/p95/p99
  • Word frequency: Tokenize, filter stop words, count
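
The percentile demo, for example, boils down to a sort-and-index, which a native binary handles comfortably at any size. A rough sketch of that idea (mine, not the demo’s exact code):

use std::io::{self, BufRead};

// Nearest-rank percentile over an ascending, non-empty slice.
fn percentile(sorted: &[f64], p: f64) -> f64 {
    let idx = ((p / 100.0) * (sorted.len() - 1) as f64).round() as usize;
    sorted[idx]
}

fn main() {
    // Read one response time per line from stdin.
    let mut times: Vec<f64> = io::stdin()
        .lock()
        .lines()
        .filter_map(|l| l.ok()?.trim().parse().ok())
        .collect();
    times.sort_by(|a, b| a.partial_cmp(b).unwrap());
    for p in [50.0, 95.0, 99.0] {
        println!("p{p}: {}", percentile(&times, p));
    }
}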

5. Detective Mystery Demo

A murder at the manor. Seven suspects. Dozens of clues. Can an LLM solve it?

Level 4 delegates reasoning to sub-LLMs. Instead of code execution, the model calls other models to:

  • Analyze witness statements
  • Compare alibis
  • Draw conclusions

Watch as L4 examines each suspect and identifies the killer.
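
The shape of the flow is simple: one focused sub-query per suspect, then a final synthesis over the summaries. Here’s a hypothetical sketch assuming an llm_query(model, prompt) helper; the orchestrator’s real API will differ.

// Stub standing in for a real sub-LLM call.
fn llm_query(_model: &str, prompt: &str) -> String {
    format!("[sub-LLM answer to: {:.60}...]", prompt)
}

fn solve_mystery(case_file: &str, suspects: &[&str]) -> String {
    let mut findings = Vec::new();
    for suspect in suspects {
        // In practice only the excerpt about this suspect would be passed, not the whole file.
        let prompt = format!(
            "From the case file below, summarize {suspect}'s statements and alibi, \
             and flag any contradictions.\n\n{case_file}"
        );
        findings.push(llm_query("sub-model", &prompt));
    }
    // The final call reasons over the per-suspect summaries, not the raw text.
    llm_query(
        "sub-model",
        &format!(
            "Given these findings, who is the killer and why?\n\n{}",
            findings.join("\n---\n")
        ),
    )
}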


6. Large Context Processing

War and Peace is 3MB—far too large for any context window. This video shows Level 4 extracting noble family relationships from the entire novel.

The process:

  1. L3 extracts relationship sentences (father, mother, son, daughter…)
  2. L4 analyzes filtered data with sub-LLMs
  3. Final output: structured family trees

Three million characters → structured family trees in ~90 seconds.
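
A condensed sketch of that pipeline, with illustrative names rather than the demo’s real code: stage one filters lines by relationship keywords, stage two turns each chunk of filtered lines into a prompt for llm_query.

// Stage 1 (L3-style): keep only lines that mention a family relationship.
// Stage 2 (L4-style): each chunk of filtered lines becomes one sub-LLM prompt.
fn relationship_prompts(novel: &str, chunk_lines: usize) -> Vec<String> {
    let keywords = ["father", "mother", "son", "daughter", "brother", "sister"];
    let filtered: Vec<&str> = novel
        .lines()
        .filter(|line| {
            let lower = line.to_lowercase();
            keywords.iter().any(|k| lower.contains(k))
        })
        .collect();
    filtered
        .chunks(chunk_lines)
        .map(|chunk| {
            format!(
                "List the noble family relationships stated in these sentences:\n{}",
                chunk.join("\n")
            )
        })
        .collect()
}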


Architecture

┌─────────────┐     ┌─────────────────┐     ┌─────────────┐
│   Client    │────▶│  RLM Server     │────▶│  Root LLM   │
│  /visualize │     │  (Rust/Axum)    │     │  (DeepSeek) │
└─────────────┘     └────────┬────────┘     └─────────────┘
                             │
                    ┌────────▼─────────┐
                    │ Command Executor │
                    │  slice, find,    │
                    │  regex, count,   │
                    │  llm_query...    │
                    └────────┬─────────┘
                             │
              ┌──────────────┼──────────────┐
              ▼              ▼              ▼
        ┌──────────┐  ┌──────────┐  ┌──────────┐
        │  Ollama  │  │  Ollama  │  │  Ollama  │
        │ (local)  │  │ (remote) │  │ (other)  │
        └──────────┘  └──────────┘  └──────────┘
               Sub-LLM Pool (for llm_query)
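
The loop behind the diagram: describe the context and tools to the root LLM, execute whatever command it returns, append the result, and repeat until the model declares an answer. A bare-bones sketch with stubbed root_llm and run_command functions (names are mine, not the server’s actual API):

// Stubs standing in for the real LLM client and command executor.
fn root_llm(_transcript: &str) -> String {
    "FINAL: (stub)".to_string()
}
fn run_command(cmd: &str, _context: &str) -> String {
    format!("(stub result for {cmd})")
}

fn orchestrate(question: &str, context: &str) -> String {
    // The context itself is never inlined; the model only sees its size.
    let mut transcript = format!(
        "Question: {question}\nContext: {} bytes. Reply with one command, or FINAL: <answer>.",
        context.len()
    );
    loop {
        let reply = root_llm(&transcript);
        if let Some(answer) = reply.strip_prefix("FINAL:") {
            return answer.trim().to_string(); // the model is done exploring
        }
        let observation = run_command(&reply, context);
        transcript.push_str(&format!("\n> {reply}\n{observation}"));
    }
}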

Quick Start

cd rlm-orchestrator

# Configure providers in config.toml
export DEEPSEEK_API_KEY="your-key"

# Run the server
cargo run --bin rlm-server

# Open visualizer
open http://localhost:8080/visualize

Think of it like this:

  • Old way: Dump everything on the table, then dig through the mess
  • RLM way: Use a scoop—grab just the cookies you need

The key insight is simple: expand the workspace, not the context.

Resources

  • Paper: arXiv:2512.24601
  • Code: rlm-project
  • Playlist: RLM Implementations

When context windows aren’t enough, RLM gives your LLM tools to explore. Six videos, four capability levels, one insight: expand the workspace, not the context.