Deepseek Papers (2/3): Engram - Conditional Memory for Transformers

Implementing Deepseek's Engram paper on conditional memory. Instead of recomputing common patterns through O(n²) attention, Engram looks them up in O(1) time. Our LoRA-based behavioral approximation cuts training loss by 58% after about 10 seconds of fine-tuning.

Deepseek publishes papers. I implement them. This paper tackles another fundamental transformer problem: redundant computation.

This post covers my implementation of Engram (Conditional Memory via Scalable Lookup), which runs on both Apple Silicon and NVIDIA GPUs.

Resource	Link
Paper	arXiv:2601.07372
Code	engram-poc
Video 1	Engram Part 1
Video 2	Engram Part 2
Comments	Discord

The Problem: Redundant Computation

LLMs waste compute reconstructing patterns they’ve seen before:

Style rules repeated across files
Common code idioms re-derived each call
Boilerplate knowledge injected repeatedly

Attention computes everything from scratch every time. For recurring patterns, this is wasteful.

The Engram Solution: O(1) Lookup

Engram introduces conditional memory as a complementary sparsity axis. Instead of recomputing common patterns through attention, look them up in O(1) time.

Think of it as a cache for the model’s learned patterns:

Without Engram	With Engram
Recompute pattern every call	Look up cached result
O(n²) attention	O(1) deterministic lookup
Implicit knowledge	Explicit, inspectable memory
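
To make the lookup idea concrete, here is a toy sketch of a memory addressed by a hash of the last few tokens. It is not the paper's architecture and not code from engram-poc: the class name `NgramMemory`, the slot count, and the choice of `blake2b` are all assumptions for illustration. The only point is that lookup cost depends on the n-gram size, not on how long the context is.

```python
import hashlib


class NgramMemory:
    """Toy fixed-size memory addressed by a hash of the trailing n-gram.

    Hypothetical illustration only; not the Engram module from the paper.
    """

    def __init__(self, num_slots=1024, n=2):
        self.num_slots = num_slots
        self.n = n
        self.slots = [None] * num_slots  # each slot holds (key, value) or None

    def _slot(self, key):
        # Hash only the n-gram key, so the cost does not grow with context length.
        data = "\x1f".join(key).encode("utf-8")
        digest = hashlib.blake2b(data, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.num_slots

    def write(self, tokens, value):
        key = tuple(tokens[-self.n:])
        # A colliding key simply overwrites the slot in this toy version.
        self.slots[self._slot(key)] = (key, value)

    def read(self, tokens):
        key = tuple(tokens[-self.n:])
        entry = self.slots[self._slot(key)]
        # Compare the stored key so a hash collision is reported as a miss.
        if entry is not None and entry[0] == key:
            return entry[1]
        return None


mem = NgramMemory(num_slots=1024, n=2)
mem.write(["status", "for", "Not", "Found"], "404")
# Different prefix, same trailing 2-gram, so this is a hit.
print(mem.read(["what", "is", "Not", "Found"]))
```

Storing the key next to the value keeps the memory inspectable: a miss is an explicit `None` rather than a silently wrong answer.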

The PoC Approach

The full Engram paper describes in-model memory. The engram-poc repo approximates the benefits through behavioral fine-tuning:

Pattern Injection: Training data encodes lookup-like patterns
LoRA Adapters: Low-rank adapters learn to recognize the patterns and respond to them consistently
Evaluation: Compare baseline vs tuned model
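
A quick back-of-the-envelope sketch shows why the LoRA step is cheap. LoRA freezes a weight matrix of shape d_out × d_in and trains two low-rank factors, B (d_out × r) and A (r × d_in), so the trainable count is r · (d_out + d_in). The dimensions and rank below are illustrative assumptions, not values read from the repo's configs.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def full_params(d_out, d_in):
    """Parameters in the frozen weight matrix that LoRA adapts."""
    return d_out * d_in


# Illustrative values only (assumed, not taken from engram-poc).
d_out, d_in, rank = 576, 576, 8

lora = lora_trainable_params(d_out, d_in, rank)
full = full_params(d_out, d_in)
print(f"LoRA: {lora} trainable vs {full} frozen ({lora / full:.1%} of the matrix)")
```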

Pattern Categories

The PoC includes 131 patterns across 4 categories:

Category	Examples
Code Idioms	`for i in range(` → `len(items)):`
Factual Recall	`HTTP status for 'Not Found'?` → `404`
Format Transforms	`snake_case: getUserName` → `get_user_name`
Error Fixes	`Fix: if x = 5:` → `if x == 5:`
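
As a minimal sketch of pattern injection, the four examples in the table can be turned into prompt/completion training lines like this. The field names (`category`, `input`, `output`, `prompt`, `completion`) and the JSONL layout are assumptions for illustration; the repo keeps its patterns in YAML and its actual schema may differ.

```python
import json

# Hypothetical in-memory form of the pattern data; one entry per table row.
PATTERNS = [
    {"category": "code_idioms", "input": "for i in range(", "output": "len(items)):"},
    {"category": "factual_recall", "input": "HTTP status for 'Not Found'?", "output": "404"},
    {"category": "format_transforms", "input": "snake_case: getUserName", "output": "get_user_name"},
    {"category": "error_fixes", "input": "Fix: if x = 5:", "output": "if x == 5:"},
]


def to_training_lines(patterns):
    """Serialize each pattern as one JSON line with prompt/completion keys."""
    return [
        json.dumps({"prompt": p["input"], "completion": p["output"]})
        for p in patterns
    ]


lines = to_training_lines(PATTERNS)
for line in lines:
    print(line)
```

Keeping each pattern as a short, exact input/output pair is what makes the fine-tuned behavior look like a lookup rather than free-form generation.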

Results

Training on SmolLM-135M-Instruct:

Metric	Value
Training Examples	337
Training Time	~10 seconds (M-series Mac)
Loss Reduction	58.2% (4.34 → 1.82)
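
The loss reduction is just the relative drop between the two losses. Recomputing it from the rounded values in the table gives roughly 58%, within rounding of the reported 58.2% (which presumably comes from unrounded losses). The perplexity lines assume the loss is mean cross-entropy in nats per token, which is an assumption on my part.

```python
import math


def relative_reduction(before, after):
    """Fractional drop from `before` to `after`."""
    return (before - after) / before


before, after = 4.34, 1.82  # rounded losses from the table above

reduction = relative_reduction(before, after)
print(f"Loss reduction: {reduction:.1%}")

# Only meaningful if the loss is mean cross-entropy in nats per token (assumed).
ppl_before = math.exp(before)
ppl_after = math.exp(after)
print(f"Perplexity: {ppl_before:.1f} -> {ppl_after:.1f}")
```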

Behavioral change:

Prompt: Complete: for i in range(

Baseline:     "Here is a Python function that implements this approach..."
Engram-tuned: "len(items)):"

The tuned model produces direct, pattern-completing responses instead of verbose explanations.
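
One way to put a number on that behavioral change is a prefix-match score: does the response start with the expected completion? The sketch below uses canned stand-in functions that echo the two sample outputs above in place of real models. The function name `prefix_match_accuracy` is hypothetical, and the repo's own evaluation metric may differ.

```python
def prefix_match_accuracy(generate, cases):
    """Fraction of cases whose generated text starts with the expected completion."""
    hits = 0
    for prompt, expected in cases:
        if generate(prompt).strip().startswith(expected):
            hits += 1
    return hits / len(cases)


CASES = [("Complete: for i in range(", "len(items)):")]


def baseline(prompt):
    # Stand-in for the untuned model; returns the sample baseline output.
    return "Here is a Python function that implements this approach..."


def tuned(prompt):
    # Stand-in for the LoRA-tuned model; returns the sample tuned output.
    return "len(items)):"


print("baseline:", prefix_match_accuracy(baseline, CASES))
print("tuned:   ", prefix_match_accuracy(tuned, CASES))
```

Swapping the stand-ins for real generation calls keeps the scoring logic unchanged.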

Running the Engram Demo

git clone https://github.com/softwarewrighter/engram-poc
cd engram-poc

# Apple Silicon
uv venv && source .venv/bin/activate
uv pip install -r requirements.txt
./scripts/run_all.sh

# NVIDIA GPU (separate directory)
cd unsloth-nvidia
uv venv && source .venv/bin/activate
uv pip install torch --index-url https://download.pytorch.org/whl/cu124
uv pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
./scripts/run_all.sh

Implementation Details

Metric	Value
Primary Language	Python
Source Files	24 `.py`, 10 `.sh`, 6 `.yaml`
Estimated Size	~3.0 KLOC
Frameworks	MLX-LM, Unsloth
Platforms	Apple Silicon, NVIDIA CUDA
Key Features	LoRA fine-tuning, pattern evaluation, interactive demo

Good for you if: You want to experiment with LoRA fine-tuning, understand behavioral pattern injection, or compare MLX vs Unsloth workflows.

Complexity: Moderate. Includes extensive documentation and video recording guides. Pattern data is human-readable YAML.

Key Takeaways

Engram reduces redundant computation. O(1) lookup for recurring patterns beats recomputing through attention.
LoRA makes experimentation accessible. Fine-tune small models in seconds on a laptop.
Cross-platform matters. The repo runs on Apple Silicon and NVIDIA, with different tooling for each.
Deepseek publishes useful research. Their papers address real problems with practical solutions.

What’s Next

Part 3 will cover Engram Revisited: what happened when we moved from behavioral emulation to a real hash-based memory implementation. Spoiler: it works, but not everywhere.

Resources

Implementing papers is the best way to understand them. Clone the repo and run the demo yourself.