LLMs are black boxes. Baby Dragon Hatchling (BDH) is different—a brain-inspired language model with sparse, interpretable activations.

Train it on Shakespeare and actually see what’s happening inside.

This is Part 4 of the Small Models, Big Brains series, exploring interpretability through sparsity.

Resource Link
Paper Pathway (Sparse Coding)
Original Code pathwaycom/bdh
Fork (with tools) softwarewrighter/bdh
Video This AI Has a Visible Brain
Video

The Black Box Problem

Modern neural networks are opaque:

  • Billions of parameters
  • Dense activations everywhere
  • No clear mapping from neurons to concepts
  • “It works, but we don’t know why”

This isn’t just an academic concern. We’re deploying AI systems we don’t understand.

Baby Dragon Hatchling: A Different Approach

BDH takes inspiration from biological brains, which use sparse coding:

Biological Brains Dense Neural Networks
~1-5% neurons active ~100% neurons active
Energy efficient Computationally expensive
Interpretable patterns Distributed, opaque
Robust to noise Brittle

Sparse Activations

BDH enforces 80% sparsity—only 20% of neurons are active for any given token.

Dense Network:    [████████████████████] 100% active
BDH:              [████░░░░░░░░░░░░░░░░]  20% active

This constraint forces the network to learn meaningful, localized representations.

Training on Shakespeare

The demo trains BDH on Shakespeare’s works:

Training Progress:
Epoch 1:   Loss 0.86
Epoch 50:  Loss 0.54
Epoch 100: Loss 0.38
Epoch 200: Loss 0.22

Loss drops from 0.86 to 0.22—the architecture works.

Seeing Inside the Model

With sparse activations, you can actually inspect what neurons mean:

# Which neurons fire for "love"?
activations = model.forward("love")
active_neurons = activations.nonzero()

# Neuron 47: fires for emotional words
# Neuron 112: fires for abstract nouns
# Neuron 203: fires for relationship terms

When only 20% of neurons fire, each one carries interpretable meaning.

Running the Code

The bdh repository is a fork of Pathway’s original with added inspection tools:

git clone https://github.com/softwarewrighter/bdh
cd bdh
pip install -r requirements.txt

# Train on Shakespeare
python train.py --dataset shakespeare --sparsity 0.8

# Inspect activations
python inspect.py --model checkpoint.pt --text "To be or not to be"

GPU recommended (Nvidia or Apple Silicon) for reasonable training times.

Why Sparsity Enables Interpretability

Dense Networks

Every neuron participates in every computation. The “meaning” of any single neuron is distributed across all inputs it ever sees.

Input: "cat"  → All neurons contribute → Output
Input: "dog"  → All neurons contribute → Output
Input: "love" → All neurons contribute → Output

Trying to understand one neuron means understanding everything.

Sparse Networks

Only a small subset of neurons fire for each input. Neurons develop specialization.

Input: "cat"  → Neurons [12, 47, 89] fire → Output
Input: "dog"  → Neurons [12, 52, 89] fire → Output
Input: "love" → Neurons [47, 112, 203] fire → Output

Neuron 12 might mean “animal.” Neuron 47 might mean “emotional/living.” You can actually trace meaning.

Comparison with Other Sparse Architectures

Model Sparsity Type Purpose
Mixture of Experts Routing sparsity Efficiency
Top-k attention Attention sparsity Memory
BDH Activation sparsity Interpretability

BDH’s sparsity is specifically designed for understanding, not just efficiency.

Implementation Details

Metric Value
Primary Language Python
Source Files 9 .py files
Estimated Size ~1.5 KLOC
Framework PyTorch
Build System pip / requirements.txt
GPU Support CUDA, MPS (Apple Silicon)

Good for you if: You want to experiment with sparse neural architectures, study interpretability techniques, or train small language models with visible internals.

Complexity: Low-Moderate. Standard PyTorch project structure. The sparse activation mechanism is well-documented. Fork includes additional inspection tools not in the original.

Key Takeaways

  1. Sparsity enables interpretability. When fewer neurons fire, each one means more.

  2. Brain-inspired design works. Biological neural coding principles transfer to AI.

  3. Interpretability doesn’t require sacrifice. BDH learns effectively despite constraints.

  4. We can build AI we understand. Black boxes aren’t inevitable.

Current Limitations

  • Early research stage
  • Smaller scale than production models
  • Training requires more epochs
  • Not yet competitive with dense models on benchmarks

But the principle is sound: constraint breeds clarity.

What’s Next

Part 5 dives into the 1B parameter sweet spot—comparing TinyLlama, Llama 3.2, StableLM, and Pythia.

Resources