Small Models (1/6): 976 Parameters Beat Billions
703 words • 4 min read

The best large language models score zero on hard mazes. A model with under 1,000 parameters scores 85 percent.
This is Part 1 of the Small Models, Big Brains series, exploring how tiny models with clever architectures outperform massive ones on specific tasks.
| Resource | Link |
|---|---|
| Paper | Tiny Recursive Model |
| Code | train-trm |
| Video | 976 parameters is more than billions?! |
Why LLMs Fail at Mazes
Large language models generate one token at a time. They cannot backtrack. One wrong move and the entire solution fails.
Maze solving requires:
- Exploring dead ends
- Backtracking when stuck
- Maintaining spatial awareness
- Planning multiple steps ahead
Autoregressive generation is fundamentally incompatible with these requirements.
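To make the contrast concrete, here is a rough Rust sketch. The `Answer` type and the `pick_next`/`refine` closures are illustrative stand-ins, not anything from the TRM paper or the train-trm repo:

```rust
// Illustrative contrast only: these types and closures are stand-ins,
// not part of the train-trm codebase.
type Answer = Vec<u8>;

// Autoregressive decoding: tokens are appended one at a time and never
// revised, so a single wrong move poisons everything that follows.
fn solve_autoregressive(len: usize, pick_next: impl Fn(&Answer) -> u8) -> Answer {
    let mut out = Answer::new();
    for _ in 0..len {
        let tok = pick_next(&out); // greedy, one shot
        out.push(tok);             // no way to undo it later
    }
    out
}

// Iterative refinement: the whole candidate answer is rewritten each cycle,
// so earlier mistakes can still be corrected on later passes.
fn solve_iterative(init: Answer, cycles: usize, refine: impl Fn(&Answer) -> Answer) -> Answer {
    let mut answer = init;
    for _ in 0..cycles {
        answer = refine(&answer); // revise everything, every cycle
    }
    answer
}
```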
Meet TRM: The Tiny Recursive Model
The Tiny Recursive Model (TRM) has just 976 parameters. Instead of getting bigger, it thinks in loops.
```
Input → Think → Act → Think → Act → ... → Output
```
A simple two-layer network that iterates until the solution emerges.
The Architecture
TRM alternates between two phases:
| Phase | Purpose |
|---|---|
| Think | Update internal latent state by processing input, current answer, and previous state |
| Act | Update the answer based on the refined latent state |
This process repeats for multiple cycles, progressively improving the output.
```rust
TRMConfig {
    input_dim: 5,
    output_dim: 5,
    hidden_dim: 16,
    latent_dim: 16,
    l_layers: 2, // Network depth
    h_cycles: 3, // Outer think-act cycles
    l_cycles: 4, // Inner think cycles
}
```
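As a sketch of how those cycles might nest: the `think` and `act` helpers below are hypothetical placeholders for the two small networks described above, not the repo's actual API.

```rust
// Hypothetical shape of the recursion; `think` and `act` stand in for the
// two-layer networks described above and are placeholders here.
struct Trm {
    h_cycles: usize, // outer think-act cycles
    l_cycles: usize, // inner think cycles
}

impl Trm {
    fn forward(&self, input: &[f32], mut answer: Vec<f32>, mut latent: Vec<f32>) -> Vec<f32> {
        for _ in 0..self.h_cycles {
            // Think: refine the latent state from input, current answer, and prior latent.
            for _ in 0..self.l_cycles {
                latent = self.think(input, &answer, &latent);
            }
            // Act: update the answer from the refined latent state.
            answer = self.act(&latent, &answer);
        }
        answer
    }

    fn think(&self, _input: &[f32], _answer: &[f32], latent: &[f32]) -> Vec<f32> {
        latent.to_vec() // placeholder for the small network's forward pass
    }

    fn act(&self, _latent: &[f32], answer: &[f32]) -> Vec<f32> {
        answer.to_vec() // placeholder for the answer-update network
    }
}
```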
The Secret: Deep Supervision
The key insight isn’t just recursion—it’s supervising every step, not just the final answer.
Traditional training:
```
Input → [black box] → Final Output → Loss
```
TRM training:
```
Input → Step 1 → Loss₁
      → Step 2 → Loss₂
      → Step 3 → Loss₃
      → ...
      → Final → Lossₙ
```
Every iteration gets feedback. The model learns to make progress at each step.
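A minimal sketch of what that looks like as a training objective, with `step` and `loss_fn` as assumed placeholders rather than the repo's actual functions:

```rust
// Deep supervision sketch: a loss is accumulated after every refinement
// step instead of only after the final one.
fn deep_supervised_loss(
    steps: usize,
    mut answer: Vec<f32>,
    target: &[f32],
    step: impl Fn(&[f32]) -> Vec<f32>,
    loss_fn: impl Fn(&[f32], &[f32]) -> f32,
) -> f32 {
    let mut total = 0.0;
    for _ in 0..steps {
        answer = step(&answer);            // one think-act refinement
        total += loss_fn(&answer, target); // feedback at this step, not just the end
    }
    total
}
```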
Results
| Model | Maze Accuracy |
|---|---|
| GPT-4 | ~0% on hard mazes |
| Claude | ~0% on hard mazes |
| TRM (976 params) | 85% |
Iteration beats scale.
Running the Code
The train-trm repo provides a complete Rust implementation:
```bash
# Clone and build
git clone https://github.com/softwarewrighter/train-trm
cd train-trm
./scripts/build.sh --release

# Train a model
./scripts/train.sh --epochs 1000 --lr 0.01

# Evaluate
./scripts/eval.sh

# Or launch the web UI
cargo install --locked trunk
./scripts/web-serve.sh
```
The web UI includes interactive maze visualization with solution paths and real-time training charts.
Implementation Details
| Metric | Value |
|---|---|
| Primary Language | Rust |
| Source Files | 21 .rs files |
| Estimated Size | ~2.5 KLOC |
| Also Includes | HTML (web UI), Shell scripts |
| Build System | Cargo + Trunk (WASM) |
| Dependencies | ndarray, serde, clap, wasm-bindgen |
Good for you if: You want to learn Rust ML from scratch, experiment with recursive architectures, or need a web-based training visualization.
Complexity: Moderate. Clean Rust code with good documentation. The neural network is implemented from scratch (no PyTorch/TensorFlow), making it educational but requiring Rust familiarity.
Key Takeaways
- Parameter count isn't everything. Architecture and training strategy matter more for certain tasks.
- Recursion enables backtracking. By iterating, TRM can explore and refine solutions.
- Deep supervision accelerates learning. Feedback at every step, not just the end.
- Task-specific models excel. TRM isn't a general-purpose LLM; it's optimized for maze-like reasoning.
What’s Next
Part 2 explores MobileLLM and running AI completely offline on your Android phone.
Part 1 of the Small Models, Big Brains series. View all parts | Next: Part 2 →
