Small Models (4/6): This AI Has a Visible Brain
842 words • 5 min read

LLMs are black boxes. Baby Dragon Hatchling (BDH) is different—a brain-inspired language model with sparse, interpretable activations.
Train it on Shakespeare and actually see what’s happening inside.
This is Part 4 of the Small Models, Big Brains series, exploring interpretability through sparsity.
| Resource | Link |
|---|---|
| Paper | Pathway (Sparse Coding) |
| Original Code | pathwaycom/bdh |
| Fork (with tools) | softwarewrighter/bdh |
| Video | This AI Has a Visible Brain |
The Black Box Problem
Modern neural networks are opaque:
- Billions of parameters
- Dense activations everywhere
- No clear mapping from neurons to concepts
- “It works, but we don’t know why”
This isn’t just an academic concern. We’re deploying AI systems we don’t understand.
Baby Dragon Hatchling: A Different Approach
BDH takes inspiration from biological brains, which use sparse coding:
| Biological Brains | Dense Neural Networks |
|---|---|
| ~1-5% neurons active | ~100% neurons active |
| Energy efficient | Computationally expensive |
| Interpretable patterns | Distributed, opaque |
| Robust to noise | Brittle |
Sparse Activations
BDH enforces 80% sparsity—only 20% of neurons are active for any given token.
```
Dense Network: [████████████████████] 100% active
BDH:           [████░░░░░░░░░░░░░░░░] 20% active
```
This constraint forces the network to learn meaningful, localized representations.
Training on Shakespeare
The demo trains BDH on Shakespeare’s works:
Training Progress:
```
Epoch   1: Loss 0.86
Epoch  50: Loss 0.54
Epoch 100: Loss 0.38
Epoch 200: Loss 0.22
```
Loss drops from 0.86 to 0.22—the architecture works.
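Assuming the reported loss is per-token cross-entropy in nats (the usual PyTorch convention; if the training script reports bits, use `2**loss` instead), you can translate it into perplexity:

```python
import math

# Per-token perplexity = exp(cross-entropy loss in nats).
for epoch, loss in [(1, 0.86), (50, 0.54), (100, 0.38), (200, 0.22)]:
    print(f"epoch {epoch:3d}: loss {loss:.2f} -> perplexity {math.exp(loss):.2f}")
```

A drop from roughly 2.4 to roughly 1.2 in perplexity means the model goes from being quite unsure about the next token to being close to certain.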
Seeing Inside the Model
With sparse activations, you can actually inspect what neurons mean:
```python
# Which neurons fire for "love"?
activations = model.forward("love")
active_neurons = activations.nonzero()

# Inspecting the active set, you might find, for example:
#   Neuron 47:  fires for emotional words
#   Neuron 112: fires for abstract nouns
#   Neuron 203: fires for relationship terms
```
When only 20% of neurons fire, each one carries interpretable meaning.
Running the Code
The bdh repository is a fork of Pathway’s original with added inspection tools:
```bash
git clone https://github.com/softwarewrighter/bdh
cd bdh
pip install -r requirements.txt

# Train on Shakespeare
python train.py --dataset shakespeare --sparsity 0.8

# Inspect activations
python inspect.py --model checkpoint.pt --text "To be or not to be"
```
A GPU (NVIDIA CUDA or Apple Silicon MPS) is recommended for reasonable training times.
Why Sparsity Enables Interpretability
Dense Networks
Every neuron participates in every computation. The “meaning” of any single neuron is distributed across all inputs it ever sees.
```
Input: "cat"  → All neurons contribute → Output
Input: "dog"  → All neurons contribute → Output
Input: "love" → All neurons contribute → Output
```
Trying to understand one neuron means understanding everything.
Sparse Networks
Only a small subset of neurons fire for each input. Neurons develop specialization.
```
Input: "cat"  → Neurons [12, 47, 89] fire   → Output
Input: "dog"  → Neurons [12, 52, 89] fire   → Output
Input: "love" → Neurons [47, 112, 203] fire → Output
```
Neuron 12 might mean “animal.” Neuron 47 might mean “emotional/living.” You can actually trace meaning.
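This kind of tracing can be done mechanically: intersect the active-neuron sets for two inputs, and the shared neurons are candidates for shared concepts. A minimal sketch, using the hypothetical neuron IDs from the traces above:

```python
# Hypothetical active-neuron sets (the IDs from the example traces).
active = {
    "cat":  {12, 47, 89},
    "dog":  {12, 52, 89},
    "love": {47, 112, 203},
}

def shared(a: str, b: str) -> set[int]:
    """Neurons that fire for both inputs -- candidates for shared concepts."""
    return active[a] & active[b]

print(shared("cat", "dog"))    # neurons shared by the two animals
print(shared("cat", "love"))   # the "living/emotional" overlap
```

With dense activations this intersection would be the entire network and tell you nothing; sparsity is what makes the overlap informative.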
Comparison with Other Sparse Architectures
| Model | Sparsity Type | Purpose |
|---|---|---|
| Mixture of Experts | Routing sparsity | Efficiency |
| Top-k attention | Attention sparsity | Memory |
| BDH | Activation sparsity | Interpretability |
BDH’s sparsity is specifically designed for understanding, not just efficiency.
Implementation Details
| Metric | Value |
|---|---|
| Primary Language | Python |
| Source Files | 9 .py files |
| Estimated Size | ~1.5 KLOC |
| Framework | PyTorch |
| Build System | pip / requirements.txt |
| GPU Support | CUDA, MPS (Apple Silicon) |
Good for you if: You want to experiment with sparse neural architectures, study interpretability techniques, or train small language models with visible internals.
Complexity: Low-Moderate. Standard PyTorch project structure. The sparse activation mechanism is well-documented. Fork includes additional inspection tools not in the original.
Key Takeaways
- **Sparsity enables interpretability.** When fewer neurons fire, each one means more.
- **Brain-inspired design works.** Biological neural coding principles transfer to AI.
- **Interpretability doesn’t require sacrifice.** BDH learns effectively despite constraints.
- **We can build AI we understand.** Black boxes aren’t inevitable.
Current Limitations
- Early research stage
- Smaller scale than production models
- Training requires more epochs
- Not yet competitive with dense models on benchmarks
But the principle is sound: constraint breeds clarity.
What’s Next
Part 5 dives into the 1B parameter sweet spot—comparing TinyLlama, Llama 3.2, StableLM, and Pythia.
Resources
- Pathway Paper
- Original Pathway Code
- bdh Repository (with inspection tools)
- Video: This AI Has a Visible Brain
Part 4 of the Small Models, Big Brains series. View all parts | Next: Part 5 →
