Personal Software #8: One Ring to Rule Them All --- sw-launcher's Memory Profiles, Heap Budgets, and a Working Scenario A

Why this matters — AI coding agents working across multiple sw-embed repos do not have the patience or the pattern-matching to get the load plan right by inspection. They will happily write cor24-run --load-binary out.bin@0 --load-binary app.p24@0x10000 --patch 0x12=0x10000 --entry 0 from scratch every time, sometimes inventing flags that don’t exist. The fix is not better agent prompts; it is removing the freedom to invent. sw-launch run <scenario> is the only verb the agent gets, the TOML is the only place memory-layout decisions live, and the schema makes oversized heaps argue for themselves before the validator accepts them.
| Resource | Link |
|---|---|
| Repo | sw-cli-tools/sw-launcher |
| 13-repo survey | docs/survey/index.md · schema-gaps.md |
| Memory stance | docs/memory-stance.md · docs/heap-analysis.md |
| COR24 emulator | sw-embed/cor24-rs |
| Driven projects | sw-cor24-pcode · sw-cor24-ocaml · sw-cor24-pascal · sw-cor24-basic |
| Related AI Tools post | AI Tools #3: sw-checklist — Reining In AI Coding Agents With a Code-Metrics Ratchet |
| Comments | Discord |
The Problem: Every Language Has Its Own Loader
The COR24 is a 24-bit machine with 1 MiB of SRAM, a 3 KiB EBR hardware stack, and an MMIO aperture at 0xFF0000. The host-side cor24-run emulator accepts a small surface — --load-binary path@hex_addr, --patch hex_addr=hex_value, --uart-input "...", --entry, --speed, -n — and that’s the universe.
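To make that "universe" concrete, here is a minimal sketch of rendering a load plan into a cor24-run command line using only the flags named above. The LoadPlan type and the cor24_run_argv helper are hypothetical names for illustration, not the launcher's real internals, and writing every address with a 0x prefix is an assumption about what the emulator accepts.

```python
from dataclasses import dataclass, field


@dataclass
class LoadPlan:
    """Hypothetical container for one emulator invocation (illustration only)."""
    loads: list = field(default_factory=list)    # (path, load address) pairs
    patches: list = field(default_factory=list)  # (address, value) pairs
    entry: int = 0


def cor24_run_argv(plan):
    """Render a load plan using only --load-binary, --patch and --entry."""
    argv = ["cor24-run"]
    for path, addr in plan.loads:
        argv += ["--load-binary", f"{path}@{addr:#x}"]
    for addr, value in plan.patches:
        argv += ["--patch", f"{addr:#x}={value:#x}"]
    argv += ["--entry", f"{plan.entry:#x}"]
    return argv


# The runtime + p-code shape quoted at the top of this post.
plan = LoadPlan(
    loads=[("out.bin", 0x0), ("app.p24", 0x10000)],
    patches=[(0x12, 0x10000)],
    entry=0x0,
)
print(" ".join(cor24_run_argv(plan)))
```

Generating the command from structured data is what removes the room for invented flags: there is no code path that can emit a flag the renderer does not know about.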
What changes between repos is what gets loaded where, which runtime word has to be patched to point at the layer above it, and how source and data ride the UART. The Phase 0 survey looked at thirteen working repos and aggregated the patterns. A few from the comparison table:
| repo | loads | patches | UART src | UART data | heap | stack | approx SRAM |
|---|---|---|---|---|---|---|---|
| basic | 1 | 0 | yes | no | emb | hw EBR | ~64 KiB |
| forth | 0 | 0 | yes | no | emb | hw EBR | ~256 KiB |
| macrolisp | 0–1 | 0 | yes | snapshot | emb | hw EBR | ~512 KiB |
| ocaml | 2 | 2 | yes | post-EOT | emb+res | emb in pvm | ~512 KiB |
| pascal | 1–2 | 1 | yes | no | emb | emb | ~64 KiB |
| plsw | 0 | 0 | yes | no | emb | hw EBR | ~1 MiB |
| snobol4 | 1–3 | 0 | yes | mode-flag | emb | emb | ~128 KiB |
| monitor | many | 0 | no | no | emb | hw EBR | ~64 KiB |
| tuplet | 3 | 2 | yes | image@0x080000 | res | emb in pvm | ~768 KiB |
Every project’s scripts/run-*.sh re-encodes one of these shapes. None of them validate. None of them cache. None of them notice when the OCaml heap and the DSL heap overlap. And every AI agent that touches these scripts adds its own subtle variation, because the shell script is the spec.
Two Axes, Five Shapes
The original PRD assumed one axis with three points (A: single image, B: runtime+image, C: nested interpreter). The survey says it is actually two axes:
- Build axis: hand-written assembly, compiled from a higher-level language, snapshot rehydrated by host tooling, or composite of N modules linked host-side.
- Run axis: one-shot batch (kick off and check UART), interactive REPL through UART, interactive shell with a resident process model, or edit-then-run via a resident editor.
The cross product yields five primitive shapes that cover everything sw-embed has written so far. The first three were already in the day-zero design; the last two emerged from the survey:
- Single image at zero, UART source. Heap and stack embedded in the image. (apl, basic, forth, plsw, smalltalk-delegated.)
- Runtime + image + patch. Native COR24 runtime at 0 plus a p-code image at a higher address with a code_ptr-style patch. (pascal single-unit and multi-unit, the OCaml/tuplet pattern without the heap patch.)
- Nested interpreter with heap-limit patch and UART-after-EOT data. Adds a second patch (heap limit), and the UART payload is <source> + EOT + <runtime data>. (ocaml, tuplet.)
- Multi-module composite image. The launcher loads N independently assembled modules at contiguous bases (snobol4 via link24) or at fixed slot addresses (macrolisp’s multi-module demo, monitor’s program registry). Linking happens host-side, not via patches.
- Resident shell + paste-and-go. Monitor at 0, sws shell at 0x20000, programs at fixed slots, all preloaded together; transfer of control happens inside the emulator via a service-vector / trampoline (mon_invoke_program) and never returns to the host runner. (monitor, script, yocto-ed.)
A scenario picks one shape; the schema makes that pick explicit instead of implied by which shell script you happen to run.
Schema v1.1: Partition Grid (Considered, Then Rejected)
The first revision after the survey, schema v1.1, divided the 1 MiB SRAM into eight fixed partitions of 128 KiB and four regions per partition (code/heap/spare/stack, 32 KiB each). Most existing repos already align to obvious partition boundaries (0x000000, 0x010000, 0x040000, 0x080000, 0x0F0000), so re-stating those addresses in (partition, region) coordinates was mostly a labeling change.
It was the wrong move. The grid canonized a layout without taking a position on the budgets, which let oversized heaps express themselves as multi-cell claims and call it normal:
```toml
# v1.1: OCaml's 252 KiB heap, expressed as four contiguous cells.
# Schema accepts it, validator passes, nothing argues back.
[layers.ocaml_interp.segments.value_heap]
kind = "heap"
grows = "down"
claims = [
  { partition = 0, region = "spare" },
  { partition = 0, region = "stack" },
  { partition = 1, region = "code" },
  { partition = 1, region = "heap" },
]
```
That’s the OCaml interpreter’s current heap_limit = 0x03F000 written as a partition-cell list. Pinning down the layout this way looked like progress. It was actually normalization of the bug.
Schema v1.2: Memory Profiles + Heap Budgets
The second revision flipped the prior. From docs/memory-stance.md:
The COR24 board emulator targets 1 MiB SRAM. That is more, not less, than every machine these re-implemented languages were originally designed for: Forth in 4–16 KiB, BASIC in 4 KiB (Altair) to 32 KiB (MS BASIC for IBM PC), APL/360 in <128 KiB per partition, Smalltalk-72/76 in 128–512 KiB including the bitmap display, Macrolisp on a PDP-10 with 256 KiB total. The IBM PC shipped in 1981 with 16–256 KiB. By 1985 measure, 1 MiB and a tiny monitor is a luxurious environment.
If Maclisp on a PDP-10 fit in 256 KiB total (runtime, interpreter, and program), then the COR24 macrolisp’s ~288 KiB heap is not a constraint problem. Something has gone soft. The 1 MiB ceiling does not need to be raised. The heaps need to be shrunk.
v1.2 makes that the schema’s stance. Three concrete changes:
1. The fixed grid is gone. Replaced with named memory profiles. Each profile is an ordered list of partitions of arbitrary size, each with its own list of named regions of arbitrary kind and size, plus a budget block:
```toml
[memory_profiles.compiled-app]
description = "Single image at 0; small heap; small stack."

[[memory_profiles.compiled-app.partitions]]
name = "code"
base = "0x000000"
size = "0x010000"   # 64 KiB
regions = [
  { name = "code",   kind = "code", size = "auto" },
  { name = "static", kind = "data", size = "auto" },
]

[[memory_profiles.compiled-app.partitions]]
name = "heap"
base = "0x010000"
size = "0x008000"   # 32 KiB
regions = [{ name = "heap", kind = "heap", size = "0x008000" }]

[memory_profiles.compiled-app.budget]
code_max  = "0x008000"   # 32 KiB
heap_max  = "0x004000"   # 16 KiB
stack_max = "0x002000"   # 8 KiB
total_max = "0x010000"   # 64 KiB
justification_required = true
```
2. Five default profiles ship with the launcher, each sized per docs/heap-analysis.md:
| profile | code+data | heap | stack | total | example use |
|---|---|---|---|---|---|
| compiled-app | <= 32 KiB | <= 16 KiB | <= 8 KiB | <= 64 KiB | BASIC echo program |
| interpreter-only | <= 64 KiB | <= 64 KiB | <= 16 KiB | <= 160 KiB | APL, Forth, Smalltalk |
| repl-inline-compile | <= 128 KiB | <= 256 KiB | <= 32 KiB | <= 448 KiB | OCaml + GC, Tuplet (post-fix) |
| compiler-image | <= 256 KiB | <= 64 KiB | <= 32 KiB | <= 384 KiB | PL/SW (post-fix) |
| resident-shell | <= 64 KiB per slot, up to 8 slots | per-program | shared 8 KiB | <= 512 KiB | monitor + sws + N programs |
A scenario picks a profile by name; the validator enforces that profile’s budget. Layers cite partitions and regions by name, not by hex.
3. Heaps over 32 KiB must argue for themselves through a heap_justification block:
```toml
[layers.ocaml_interp.heap_justification]
category = "gc-slack"
note = "Mark/sweep GC; sized for working set + 2x slack."
measured_floor_kib = 64
tracking_issue = "sw-cor24-ocaml#28"
```
Five categories, in roughly descending order of merit:
| category | accepted? | meaning |
|---|---|---|
| algorithmic-floor | yes | Working set genuinely requires this size. |
| bytecode-image | yes | Heap is mostly read-only data, not allocations. |
| gc-slack | yes (with measured_floor_kib) | Sized for floor + slack between collections. |
| dead-leak | warn; rejected by --strict | Allocations that never get freed. |
| algorithmic-bloat | warn; rejected by --strict | Pointer width, boxing, dispatch tables, etc. |
The default category for an undocumented oversized heap is dead-leak — because the heap-analysis pass found that all three of the demanding repos (ocaml, macrolisp, plsw) match exactly that pattern, and the first job of a budget is to refuse to normalize them.
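The budget and justification rules above can be sketched as one small function. This is an illustration under stated assumptions, not the code in validate.rs: the E-codes are the ones listed in the validation table later in this post, the severity attached to E0030 is a guess, and the two lowercase labels (suspect-category, gc-slack-needs-floor) are hypothetical placeholders for cases where the post does not name a code.

```python
KIB = 1024
JUSTIFY_ABOVE = 32 * KIB                      # heaps over 32 KiB must justify
SUSPECT = {"dead-leak", "algorithmic-bloat"}  # warn; rejected by --strict


def check_heap(heap_bytes, heap_max, justification=None, strict=False):
    """Return a list of (code, severity) findings for one heap claim."""
    findings = []

    # Budget: hard overshoot, or a warning at >= 80% of heap_max.
    # Whether --strict promotes the 80% warning is not modeled here.
    if heap_bytes > heap_max:
        findings.append(("E0028", "error"))
    elif heap_bytes * 5 >= heap_max * 4:
        findings.append(("E0029", "warn"))

    # Paperwork: only heaps strictly over 32 KiB need it.
    if heap_bytes > JUSTIFY_ABOVE:
        if justification is None:
            findings.append(("E0030", "error"))  # severity is an assumption
        else:
            category = justification.get("category")
            if category in SUSPECT:
                if strict:
                    findings.append(("E0034", "error"))
                else:
                    findings.append(("suspect-category", "warn"))  # hypothetical label
            elif category == "gc-slack" and "measured_floor_kib" not in justification:
                findings.append(("gc-slack-needs-floor", "error"))  # hypothetical label
    return findings


ocaml = {"category": "gc-slack", "measured_floor_kib": 64}
print(check_heap(252 * KIB, 256 * KIB, ocaml))
print(check_heap(252 * KIB, 256 * KIB, {"category": "dead-leak"}, strict=True))
print(check_heap(252 * KIB, 256 * KIB))
```

With the repl-inline-compile heap_max of 256 KiB, a justified 252 KiB heap still draws the 80% warning, which is the intended pressure: the paperwork gets the heap accepted, it does not make it comfortable.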
What the Heap Analysis Found
docs/heap-analysis.md walks every repo with a claimed heap > 32 KiB and assigns it a category, a historical benchmark, and a shrinkage backlog:
| repo | current claim | category | historical floor | post-fix target |
|---|---|---|---|---|
| ocaml | ~252 KiB heap | dead-leak | OCaml-on-PDP-10 < 256 KiB total (1973) | <= 64 KiB after GC |
| tuplet | inherits ocaml | dead-leak | n/a (downstream) | shrinks with ocaml |
| macrolisp | ~288 KiB BSS | dead-leak + bloat | Maclisp 256 KiB total | <= 64 KiB heap |
| plsw | ~1 MiB image | algorithmic-bloat | UCSD Pascal in 64 KiB; Turbo Pascal 1.0 in 33.5 KiB | <= 256 KiB image |
| snobol4 | ~76 KiB internal | floor + dead-leak | SNOBOL4 in 64–256 KiB total | <= 64 KiB |
| forth | dictionary | algorithmic-floor | Forth kernels in 4–16 KiB | <= 16 KiB typical |
| basic | embedded DIM 64–128 KiB | algorithmic-floor | Altair 4K BASIC | <= 32 KiB |
OCaml’s GC work in sw-cor24-ocaml#28 is the one in flight. After it lands, tuplet’s heap_limit should shrink, not stay where it is. Macrolisp’s mark byte should be a mark bit (8x reduction). PL/SW’s compiler-output redundancy is fixable in one pass through the transpiler. The schema must support the current sizes transitionally, but the analysis doc must not normalize them. The current sizes are evidence of work to do, not the spec for the launcher.
Layers Are Composites, Not Blobs
Every other piece of the schema survives both revisions. A layer is still (artifact?) + (segments), and segments still have lifecycles:
- Embedded segments live inside the artifact. pvm.s reserves eval_stack, call_stack, and a small heap_seg statically; they are part of pvm.bin and the loader does not allocate them again, but the validator has to know they exist (resolved through the artifact’s listing) so the global overlap check sees them.
- Reserved segments are allocated by the loader at a configured cell with a configured size and zero-filled. The OCaml interpreter’s value heap, the Pascal eval stack, the DSL heap on top: none of these fit in the COR24’s 3 KiB EBR; they live in high SRAM cells and the runtime gets patched to point at them.
- Patched is the verb that ties the two together. A reserved segment with value = "self.address" (or self.end for down-growing heaps) tells the loader to write its own resolved address into the runtime’s heap_base / heap_limit symbol, so the runtime knows where to find the heap the loader just allocated for it.
Patches in v1.2 also accept two value forms that the survey demanded:
- value = "sidecar:<path>" reads a build-time-resolved address from a small text file. ocaml and tuplet today write build/code_ptr_addr.txt and build/heap_limit_addr.txt during their build; the schema makes that explicit and includes the sidecar in the cache key.
- target = "<upstream-layer>.<symbol>" lets a downstream layer reference a symbol in an upstream layer’s listing, including upstream layers from a vendored repo (Phase 2 step 002 just landed the resolver for this; tuplet wants pvm symbols from sw-cor24-ocaml’s build, not its own).
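The patch value forms described above (self.address, self.end, and sidecar:<path>) can be sketched as a small resolver. The Segment type and resolve_patch_value are hypothetical names, self.end is assumed to mean one past the last byte, and the sidecar is assumed to hold a single hex address as text; the post only says it is "a small text file", so treat the format as a guess.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Segment:
    """A reserved segment after the loader has picked its address."""
    base: int
    size: int


def resolve_patch_value(value, segment, root=Path(".")):
    """Turn a patch 'value' string into the integer written into the runtime."""
    if value == "self.address":
        return segment.base
    if value == "self.end":
        # Assumption: exclusive end, i.e. where a down-growing heap starts.
        return segment.base + segment.size
    if value.startswith("sidecar:"):
        # Assumption: the sidecar holds one hex address, with or without 0x.
        text = (Path(root) / value[len("sidecar:"):]).read_text().strip()
        return int(text, 16)
    raise ValueError(f"unsupported patch value: {value!r}")


heap = Segment(base=0x030000, size=0x00F000)   # illustrative numbers only
print(hex(resolve_patch_value("self.address", heap)))
print(hex(resolve_patch_value("self.end", heap)))

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "heap_limit_addr.txt").write_text("0x03F000\n")
    print(hex(resolve_patch_value("sidecar:heap_limit_addr.txt", heap, root=tmp)))
```

Keeping all three forms behind one function is also what lets the sidecar contents feed the cache key: the resolver is the single place that knows a file on disk influenced the patch.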
The Wizard’s Spellbook — sw-launch.toml

The whole config is one file at the project root. A trimmed Scenario A example, written against the compiled-app profile, that actually runs end-to-end on a real cor24-run:
```toml
[scenarios.echo]
target = "cor24"
memory_profile = "compiled-app"
layers = ["program", "stdin"]
entry = "0x000000"

[scenarios.echo.run]
timeout_ms = 2000
max_cycles = 200_000
halt_on = "uart-eot"

[scenarios.echo.expect]
uart_contains = ["A"]
exit_code = 0

[layers.program]
kind = "assembler"
source = "local"
input = "src/echo.s"
tool = "assembler"
artifact = "echo.bin"

[layers.program.segments.code]
kind = "code"
claims = [{ partition = "code", region = "code" }]

[layers.stdin]
kind = "data"
input = "tests/echo-input.txt"
load.method = "uart"
load.max_bytes = 1024
```
sw-launch run echo walks the layer DAG, builds each layer (or pulls it from the in-process memoization cache), assembles the load plan, invokes cor24-run, captures UART output, and checks the expectations. Today this prints A and exits 0. The CLI surface is small enough to memorize:
```text
sw-launch run <scenario>      Build (with cache) and execute, check expectations.
sw-launch build <scenario>    Build all layers; do not execute.
sw-launch check <scenario>    Validate config + lock; no tools run.
sw-launch graph <scenario>    Print layer DAG (text or --json).
sw-launch cache list          List cached artifacts.
sw-launch vendor sync         Resolve and pin all dependencies.
sw-launch doctor              Verify host tools (cor24-run, pa24r, pl24r) found.
```
Every flag agents used to invent is now either a TOML field or a --profile. There is no --load-binary to mistype.
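If an agent harness or CI script wants to enforce the "one verb" rule mechanically, a thin wrapper is enough. This is a hypothetical helper, not something shipped with sw-launcher; it only uses the scenario verbs from the listing above and the --config flag that appears in the Phase 1 test invocation.

```python
# Verbs from the CLI listing that take a scenario name.
SCENARIO_VERBS = ("run", "build", "check", "graph")


def sw_launch_argv(verb, scenario, config=None):
    """Build an argv for sw-launch; refuses anything outside the listed verbs."""
    if verb not in SCENARIO_VERBS:
        raise ValueError(f"unknown verb {verb!r}; expected one of {SCENARIO_VERBS}")
    if not scenario or scenario.startswith("-"):
        raise ValueError("scenario must be a plain name, not a flag")
    argv = ["sw-launch", verb, scenario]
    if config is not None:
        argv += ["--config", str(config)]
    return argv


argv = sw_launch_argv("run", "echo", "tests/fixtures/scenario_a/sw-launch.toml")
print(" ".join(argv))
# To execute for real: subprocess.run(argv, check=False) and inspect returncode.
```

The wrapper has no parameter for load addresses or patches on purpose; anything memory-related has to go through the TOML, where the validator can see it.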
Phase 1: What Runs Today
The Phase 1 saga (sw-launcher-phase1, ten steps) closed clean on April 28. End state:
- sw-launch run echo --config tests/fixtures/scenario_a/sw-launch.toml exits 0 and prints A (the captured UART output).
- sw-launch check echo validates the scenario without spawning the emulator.
- sw-launch build echo assembles every assembler-kind layer under <config-dir>/.sw-launch/build/<scenario>/<layer>/ with a sha256-keyed in-process memoization cache.
- 60 tests across 11 binaries, including 2 end-to-end against the real cor24-run binary, all green.
- validate.rs implements 17 stable error codes, each with a negative test asserting the exact code and span.
The integration tests ran against cor24-run 0.1.0, rustc 1.94.1, edition 2024, on Darwin 24.6.0. Every one of those versions is recorded in the repo’s status.md so future-me knows what “Phase 1 worked” actually meant.
Phase 2 (sw-launcher-phase2) is open and seeded with five steps; step 001 (PCode tool with SourceSpec resolution) and step 002 (cross-layer listing-symbol patches resolve at scenario validate time) just landed. Phase 2’s target is end-to-end Scenario B: COR24 runtime at 0 plus a p-code blob at a higher address with a code_ptr patch — the smallest meaningful test that the launcher can express the layered shape that today’s sw-cor24-pcode and sw-cor24-pascal demos use.
Validation: Catch Collisions Before the Emulator Does
The validator is the most important verb. sw-launch check reads the TOML and the lockfile, walks every segment of every layer in the scenario, computes absolute address ranges (resolving profile partitions to addresses, embedded segments through the producing layer’s listing, sidecars from disk), and runs the rules. Each rule has a stable error code so agents can match on it without scraping prose.
A subset, selected for what they catch:
| Code | Rule |
|---|---|
| E0003 | No two memory ranges overlap (across embedded and reserved segments). |
| E0004 | UART layers declare max_bytes and the input fits. |
| E0005 | Layer kind and load method are compatible. |
| E0006 | Every patch resolves — symbol exists, segment exists, topo order is right. |
| E0011 | Reserved stack/heap/bss lies inside regions.sram, never touches regions.ebr_stack or regions.mmio. |
| E0014 | embedded = true segments declare a symbol that resolves in the producing layer’s listing. |
| E0023 | Resident-mode mismatch (a layer claims a slot the resident shell doesn’t expose). |
| E0028 | Heap-budget overshoot (heap exceeds the profile’s heap_max). |
| E0029 | Heap >= 80% of heap_max (warn). |
| E0030 | Missing heap_justification for heap > 32 KiB. |
| E0031 | Layer cites a partition or region the profile does not declare. |
| E0032 | Profile self-overlap (the profile’s own partitions collide). |
| E0033 | Total claimed SRAM > 1 MiB. |
| E0034 | heap_justification.category = "dead-leak" or "algorithmic-bloat" under --strict. |
--strict mode promotes most warnings to errors and always rejects dead-leak and algorithmic-bloat justifications. CI runs strict; local check runs lax so the shrinkage work can land incrementally without breaking the build.
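The overlap rule (E0003) is the one that catches the "OCaml heap and DSL heap collide" class of bug, so it is worth a sketch. This is a plain sort-and-sweep over resolved address ranges, written as an assumption about how such a check could work rather than a copy of validate.rs; the Range type and the addresses in the example are made up for illustration.

```python
from typing import NamedTuple


class Range(NamedTuple):
    owner: str   # "layer.segment" label used in the diagnostic
    base: int
    size: int

    @property
    def end(self):
        return self.base + self.size   # exclusive end


def find_overlaps(ranges):
    """Sweep ranges in address order and report each one that starts
    before the furthest-reaching earlier range has ended."""
    findings = []
    furthest = None
    for r in sorted(ranges, key=lambda r: r.base):
        if furthest is not None and r.base < furthest.end:
            findings.append(("E0003", furthest.owner, r.owner))
        if furthest is None or r.end > furthest.end:
            furthest = r
    return findings


ranges = [
    Range("dsl.heap", 0x038000, 0x008000),               # illustrative addresses
    Range("pvm.code", 0x000000, 0x010000),
    Range("ocaml_interp.value_heap", 0x030000, 0x00F000),
]
for code, a, b in find_overlaps(ranges):
    print(f"{code}: {a} overlaps {b}")
```

The sweep reports every range that collides with something before it, not every colliding pair, which is enough to fail the check and name an offender. Embedded and reserved segments go into the same list, which is why the validator needs the artifact listings before it can run this rule.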
Caching, Vendoring, and the Lockfile
The cost of not caching is that every test re-assembles pvm.s, every demo re-builds the host toolchain, every CI run wastes minutes. Phase 1 ships an in-process memoization cache (sha256 keyed on (input bytes, tool version, args, output filename)) inside the assembler tool wrapper; Phase 4 will lift that to a persistent on-disk cache at ~/.cache/sw-launch/. The key formula already accounts for the hard cases:
```text
layer_key = sha256(
    schema_version
  | normalize_toml(layer_config)
  | hash_each(input_files)
  | tool_version_hash
  | dependency_layer_hashes (in topo order)
  | resolved_address_or_uart_marker
  | sidecar_contents (if any)
)
```
A cache hit is sound — if the key matches, the artifact would be byte-identical to a fresh build, so reusing it can never produce a different scenario result. Sidecar contents are in the key because v1.2’s value = "sidecar:..." patch source needs the cache to invalidate when the sidecar changes.
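A minimal sketch of that key formula, using only the standard library. Two things here are my assumptions rather than the launcher's behavior: canonical JSON with sorted keys stands in for normalize_toml, and each field is length-prefixed so that adjacent fields cannot run together and produce the same byte stream from different inputs.

```python
import hashlib
import json


def _field(data):
    """Length-prefix one field (8-byte big-endian length, then the bytes)."""
    return len(data).to_bytes(8, "big") + data


def layer_key(schema_version, layer_config, input_files, tool_version_hash,
              dependency_hashes, address_or_uart, sidecar=b""):
    """Hash every input that could change the built artifact."""
    h = hashlib.sha256()
    h.update(_field(schema_version.encode()))
    # Stand-in for normalize_toml: key order must not affect the hash.
    canonical = json.dumps(layer_config, sort_keys=True, separators=(",", ":"))
    h.update(_field(canonical.encode()))
    h.update(len(input_files).to_bytes(8, "big"))
    for content in input_files:                 # file bytes, in declared order
        h.update(_field(hashlib.sha256(content).digest()))
    h.update(_field(tool_version_hash.encode()))
    h.update(len(dependency_hashes).to_bytes(8, "big"))
    for dep in dependency_hashes:               # caller supplies topo order
        h.update(_field(dep.encode()))
    h.update(_field(address_or_uart.encode()))
    h.update(_field(sidecar))
    return h.hexdigest()


config = {"kind": "assembler", "input": "src/echo.s", "artifact": "echo.bin"}
key = layer_key("1.2", config, [b"; echo.s contents"], "toolhash", [], "0x000000")
print(key)
```

Changing the sidecar bytes, the tool version hash, or any input file produces a different key, while reordering the keys of the layer config does not.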
The vendor side is symmetric. The survey was unflattering — ten of thirteen repos pin nothing at all, and one (tuplet) inherits its pins transitively from sw-cor24-ocaml. Only sw-cor24-ocaml has a real vendor/<tool>/<version>/active.env model with commit SHAs. v1.2 makes the OCaml-style vendored model the default: sw-launch vendor sync (Phase 4) will resolve declared dependencies (sibling:, vendor:, eventually git:) and write sw-launch.lock with each artifact’s commit hash and SHA. sw-launch doctor will record the observed version of each PATH-resolved tool too, so drift is visible even when nothing is explicitly pinned.
What This Buys
The point is not that the TOML is shorter than the shell script — it is often longer. The point is what changes about the system:
- One verb for agents. sw-launch run <scenario> replaces the ten variants of cor24-run --load-binary ... that agents kept reinventing.
- Validation before the emulator. The most expensive failure mode (“ran for forty seconds, traps silently, zero output”) is replaced by a fast check that names the colliding region and the offending TOML span.
- Heaps argue for themselves. A 252 KiB heap is not waved through; it has to declare a category, a measured floor, and a tracking issue. --strict rejects dead-leak and algorithmic-bloat outright.
- A schema for memory layout. Reserved heaps, embedded stacks, profile-named partitions, sidecar patches, and cross-layer symbol references are all first-class in TOML. A new language port describes its memory shape; it does not write a new shell loader.
- Vendor versions visible. Sibling-repo dependencies are pinned by commit hash in the lockfile, not by “whatever was checked out at 3pm.” PATH-resolved tools have their observed versions recorded too.
- Phase 1 actually runs. The “what runs today” section is not a roadmap; it is what cargo test exercises, end-to-end, against a real emulator.
The wizard metaphor still fits the post marker: thirteen repos, each with its own ring of power, all in the end answering to one. sw-launcher is the one ring in the boring sense — the one place to encode the load plan — not in the corrupting sense, hopefully. But the more accurate metaphor for v1.2 is that the schema is a budget officer: every heap that wants more than 32 KiB has to file paperwork, and the default verdict on the paperwork is “this is a leak; prove otherwise.”
Where It Sits in the Personal-Software Toolkit
sw-checklist, now part of the AI Tools series, sits at a different layer of the same problem: AI agents working alone produce too much variety in places where uniformity is cheaper. sw-checklist constrains the shape of the code (function/file/module/crate size limits); sw-launcher constrains the shape of the load plan and the memory budget (which addresses, which patches, how big a heap can grow before it has to file paperwork). Both are accidental complexity in Brooks’s sense. Both are paying rent.
Phase 0 (survey) and Phase 1 (Scenario A end-to-end) are done. Schema v1.1 was a wrong turn that taught the right lesson: do not canonize the layout without taking a position on the budgets. Schema v1.2 takes that position, in writing, with five default profiles sized against historical implementations from the 1970s and 1980s. Phase 2 (Scenario B, runtime + p-code blob with cross-layer patches) is in flight; Phase 3 brings Scenario C (nested interpreter, heap-limit-only patches — the tuplet shape); Phase 4 brings persistent caching and vendor sync; Phase 5 brings the resident-shell composite (D + E).
The wizard is just the post marker. The one ring is just the metaphor. The interesting part is the budget officer.
Part 8 of the Personal Software series.
Comments or questions? SW Lab Discord or YouTube @SoftwareWrighter.