midi-cli-rs: Music Generation for AI Coding Agents

Personal Software via Vibe Coding: a music tool for AI agents. midi-cli-rs provides mood presets (suspense, upbeat, calm, jazz) so agents can generate complete audio compositions from simple commands. No music theory required.

AI coding agents can write code, generate images, and produce text. But what about music? When I needed background audio for explainer videos, I wanted a tool that AI agents could use directly—no music theory required.

Resource	Link
Video	midi-cli-rs Explainer
Examples	Listen to Samples
Code	midi-cli-rs

The Problem

Generating music programmatically is hard. Traditional approaches require understanding music theory, MIDI specifications, instrument mappings, and audio synthesis. That’s a lot to ask of an AI agent that just needs a 5-second intro.

I wanted something simpler: a CLI tool where an agent could say “give me 5 seconds of suspenseful music” and get a usable WAV file.

The Solution: Mood Presets

midi-cli-rs solves this with mood presets—curated musical generators that produce complete compositions from a single command:

# Generate a 5-second suspenseful intro
midi-cli-rs preset --mood suspense --duration 5 -o intro.wav

# Upbeat outro with specific key
midi-cli-rs preset -m upbeat -d 7 --key C --seed 42 -o outro.wav

Six moods are available:

Mood	Character
`suspense`	Low drones, tremolo strings, tension
`eerie`	Sparse tones, diminished harmony
`upbeat`	Rhythmic chords, energetic
`calm`	Warm pads, gentle arpeggios
`ambient`	Textural drones, pentatonic bells
`jazz`	Walking bass, brushed drums, piano trio

Each mood generates multi-layer compositions with appropriate instruments, rhythms, and harmonies. The --seed parameter ensures reproducibility—same seed, same output. Different seeds produce meaningful variations in melody contour, rhythm patterns, and instrument choices.

Melodic Variation

The presets don’t just randomize notes—they use a contour-based variation system. Changing the seed produces melodies that follow different shapes (ascending, descending, arch, wave) while staying musically coherent. This means you can generate multiple versions of a mood and pick the one that fits best.
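To make the idea concrete, here is a minimal sketch of how a seed could pick a contour and then drive a melody along it. This is not the code in midi-cli-rs: the names (`Contour`, `XorShift64`, `target_degree`, `melody`), the C major scale, and the small xorshift generator standing in for the `rand` crate are all assumptions made for illustration.

```rust
// Hypothetical sketch: seed -> contour -> melody. Not midi-cli-rs's actual code.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Contour {
    Ascending,
    Descending,
    Arch,
    Wave,
}

/// Tiny xorshift PRNG so the example needs no crates (the real tool lists `rand`).
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // Scramble the seed; xorshift must never start from an all-zero state.
        let mixed = seed.wrapping_mul(0x9E37_79B9_7F4A_7C15) ^ 0xD1B5_4A32_D192_ED03;
        XorShift64(if mixed == 0 { 1 } else { mixed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// One octave of C major as MIDI note numbers (assuming C4 = 60).
const SCALE: [u8; 7] = [60, 62, 64, 65, 67, 69, 71];

/// Scale-degree index (0..=6) that the contour aims for at step `i` of `len`.
fn target_degree(contour: Contour, i: usize, len: usize) -> usize {
    let last = len.saturating_sub(1).max(1);
    let ramp = i * 6 / last; // rises from 0 to 6 across the phrase
    match contour {
        Contour::Ascending => ramp,
        Contour::Descending => 6 - ramp,
        Contour::Arch => {
            if ramp <= 3 { ramp * 2 } else { (6 - ramp) * 2 }
        }
        Contour::Wave => [3, 5, 3, 1][i % 4],
    }
}

/// The seed picks the contour, then adds a one-step jitter around each target.
fn melody(seed: u64, len: usize) -> (Contour, Vec<u8>) {
    let mut rng = XorShift64::new(seed);
    let contour = match rng.next_u64() % 4 {
        0 => Contour::Ascending,
        1 => Contour::Descending,
        2 => Contour::Arch,
        _ => Contour::Wave,
    };
    let notes = (0..len)
        .map(|i| {
            let jitter = (rng.next_u64() % 3) as i64 - 1; // -1, 0, or +1
            let degree = (target_degree(contour, i, len) as i64 + jitter).clamp(0, 6);
            SCALE[degree as usize]
        })
        .collect();
    (contour, notes)
}

fn main() {
    for seed in 1..=3u64 {
        let (contour, notes) = melody(seed, 8);
        println!("seed {}: {:?} {:?}", seed, contour, notes);
    }
}
```

Because every random choice flows from the seeded generator, calling `melody` twice with the same seed returns the same notes, while the contour keeps each variation on a recognizable shape instead of a random walk.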

How It Works

The tool generates MIDI programmatically, then renders to WAV using FluidSynth:

Mood Preset → MIDI Generation → FluidSynth → WAV Output

MIDI generation uses the midly crate to create standard MIDI files. Each preset generates multiple tracks with different instruments, note patterns, and dynamics.

Audio rendering calls FluidSynth as a subprocess with a SoundFont (a file of instrument samples). FluidSynth is LGPL-licensed, so running it as a separate process rather than linking against it avoids licensing complications: subprocess execution doesn’t trigger copyleft.
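The subprocess step can be sketched with `std::process::Command`. This is an illustration rather than the tool's actual code: the function name `fluidsynth_command`, the 44100 Hz sample rate, and the file names are assumptions. The flags themselves (`-n`, `-i`, `-F`, `-r`) are standard FluidSynth command-line options.

```rust
use std::path::Path;
use std::process::Command;

/// Build (but do not run) a FluidSynth invocation that renders `midi` to `wav`.
/// Hypothetical helper; midi-cli-rs may assemble its command differently.
fn fluidsynth_command(soundfont: &Path, midi: &Path, wav: &Path) -> Command {
    let mut cmd = Command::new("fluidsynth");
    cmd.arg("-n") // no MIDI input driver
        .arg("-i") // no interactive shell
        .arg("-F") // fast-render to a file instead of playing live
        .arg(wav)
        .arg("-r") // sample rate in Hz (assumed value)
        .arg("44100")
        .arg(soundfont)
        .arg(midi);
    cmd
}

fn main() {
    let cmd = fluidsynth_command(
        Path::new("soundfont.sf2"),
        Path::new("song.mid"),
        Path::new("song.wav"),
    );
    // Print the command instead of running it, so the sketch works
    // even where FluidSynth is not installed.
    println!("{:?}", cmd);
    // To actually render, call `cmd.status()` and check the exit code.
}
```

Options are placed before the SoundFont and MIDI file so that argument parsing does not depend on how the platform handles options that follow positional arguments.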

Note-Level Control

When presets aren’t enough, you can specify exact notes:

# Note format: PITCH:DURATION:VELOCITY[@OFFSET]
midi-cli-rs generate \
    --notes "C4:0.5:80@0,E4:0.5:80@0.5,G4:0.5:80@1,C5:1:90@1.5" \
    -i piano -t 120 -o arpeggio.wav

Or pipe JSON on stdin, which scales better to complex multi-track arrangements (the example below is a single piano track):

echo '{"tempo":90,"instrument":"piano","notes":[
  {"pitch":"C4","duration":0.5,"velocity":80,"offset":0},
  {"pitch":"E4","duration":0.5,"velocity":80,"offset":0.5},
  {"pitch":"G4","duration":1,"velocity":90,"offset":1}
]}' | midi-cli-rs generate --json -o output.wav
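For a sense of what the compact `PITCH:DURATION:VELOCITY[@OFFSET]` format carries, here is a rough sketch of a parser for it. It is not the parser in midi-cli-rs. The `Note` struct, the function names, the C4 = 60 pitch convention, and support for `#`/`b` accidentals are assumptions; durations and offsets are kept as plain numbers without assuming their unit.

```rust
/// One parsed note. Field names are hypothetical.
#[derive(Debug, Clone, PartialEq)]
struct Note {
    midi_pitch: u8,
    duration: f64,
    velocity: u8,
    offset: Option<f64>,
}

/// Convert a name like "C4" or "F#3" to a MIDI note number (assuming C4 = 60).
fn parse_pitch(s: &str) -> Option<u8> {
    let mut chars = s.chars();
    let base: i32 = match chars.next()?.to_ascii_uppercase() {
        'C' => 0,
        'D' => 2,
        'E' => 4,
        'F' => 5,
        'G' => 7,
        'A' => 9,
        'B' => 11,
        _ => return None,
    };
    let rest = chars.as_str();
    let (accidental, octave_str) = if let Some(r) = rest.strip_prefix('#') {
        (1, r)
    } else if let Some(r) = rest.strip_prefix('b') {
        (-1, r)
    } else {
        (0, rest)
    };
    let octave: i32 = octave_str.parse().ok()?;
    let midi = (octave + 1) * 12 + base + accidental;
    if (0..=127).contains(&midi) { Some(midi as u8) } else { None }
}

/// Parse one `PITCH:DURATION:VELOCITY[@OFFSET]` item.
fn parse_note(spec: &str) -> Option<Note> {
    // Split off the optional "@OFFSET" suffix first.
    let (body, offset) = match spec.split_once('@') {
        Some((b, o)) => (b, Some(o.trim().parse::<f64>().ok()?)),
        None => (spec, None),
    };
    let mut parts = body.trim().split(':');
    let midi_pitch = parse_pitch(parts.next()?)?;
    let duration: f64 = parts.next()?.parse().ok()?;
    let velocity: u8 = parts.next()?.parse().ok()?;
    // Reject extra fields and velocities outside the MIDI range 0..=127.
    if parts.next().is_some() || velocity > 127 {
        return None;
    }
    Some(Note { midi_pitch, duration, velocity, offset })
}

/// Parse a comma-separated list; any malformed item fails the whole list.
fn parse_notes(list: &str) -> Option<Vec<Note>> {
    list.split(',').map(parse_note).collect()
}

fn main() {
    let notes = parse_notes("C4:0.5:80@0,E4:0.5:80@0.5,G4:0.5:80@1,C5:1:90@1.5");
    println!("{:#?}", notes);
}
```

Under the C4 = 60 assumption, the arpeggio above maps to MIDI pitches 60, 64, 67, and 72.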

Web UI

For interactive composition, there’s a browser-based interface:

midi-cli-rs serve  # Starts on http://127.0.0.1:3105

The Presets tab lets you adjust mood, key, duration, intensity, and tempo with immediate audio preview. Click the clock button to generate a time-based seed for unique but reproducible results.

The Melodies tab provides note-by-note composition with keyboard shortcuts:

a-g for note pitch
[ / ] to adjust duration
+ / - to change octave
Tab to navigate between notes

For AI Agents

The CLI is designed for AI agent usage:

Simple commands: One line generates complete audio
Reproducible: Seed values ensure consistent output
Self-documenting: --help includes agent-specific instructions
Composable: Generate tracks separately, combine with ffmpeg

# AI agent workflow
midi-cli-rs preset -m suspense -d 5 --seed 1 -o intro.wav
midi-cli-rs preset -m upbeat -d 10 --seed 2 -o main.wav
ffmpeg -i intro.wav -i main.wav -filter_complex concat=n=2:v=0:a=1 final.wav
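An agent that drives the CLI from code rather than a shell could assemble the same three commands programmatically. The sketch below only builds and prints them. The `Segment` struct and helper names are hypothetical, and only flags shown in the workflow above are used.

```rust
use std::process::Command;

/// One preset render in the workflow. Hypothetical type for this sketch.
struct Segment {
    mood: &'static str,
    seconds: u32,
    seed: u64,
    output: &'static str,
}

/// Build `midi-cli-rs preset -m <mood> -d <seconds> --seed <seed> -o <output>`.
fn preset_command(seg: &Segment) -> Command {
    let mut cmd = Command::new("midi-cli-rs");
    cmd.arg("preset")
        .arg("-m")
        .arg(seg.mood)
        .arg("-d")
        .arg(seg.seconds.to_string())
        .arg("--seed")
        .arg(seg.seed.to_string())
        .arg("-o")
        .arg(seg.output);
    cmd
}

/// Build the ffmpeg call that joins the rendered files end to end.
fn concat_command(inputs: &[&str], output: &str) -> Command {
    let mut cmd = Command::new("ffmpeg");
    for input in inputs {
        cmd.arg("-i").arg(input);
    }
    cmd.arg("-filter_complex")
        .arg(format!("concat=n={}:v=0:a=1", inputs.len()))
        .arg(output);
    cmd
}

/// Collect a command's arguments as strings for logging or inspection.
fn args_of(cmd: &Command) -> Vec<String> {
    cmd.get_args()
        .map(|a| a.to_string_lossy().into_owned())
        .collect()
}

fn main() {
    let segments = [
        Segment { mood: "suspense", seconds: 5, seed: 1, output: "intro.wav" },
        Segment { mood: "upbeat", seconds: 10, seed: 2, output: "main.wav" },
    ];
    for seg in &segments {
        println!("midi-cli-rs {}", args_of(&preset_command(seg)).join(" "));
    }
    let outputs: Vec<&str> = segments.iter().map(|s| s.output).collect();
    println!("ffmpeg {}", args_of(&concat_command(&outputs, "final.wav")).join(" "));
    // To execute for real, call `.status()` on each Command in order.
}
```

Recording the seed alongside each segment is what makes the whole pipeline repeatable: rerunning the same list regenerates the same files.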

SoundFont Quality Matters

The quality of generated audio depends heavily on the SoundFont used. SoundFonts are collections of audio samples for each instrument—a tiny SoundFont with compressed samples will sound thin and artificial, while a larger one with high-quality recordings produces professional results.

SoundFont	Size	Quality	License
TimGM6mb	~6MB	Basic	GPL v2
GeneralUser GS	~30MB	Good	Permissive
FluidR3_GM	~140MB	Very Good	MIT
MuseScore_General	~200MB	Excellent	MIT

For anything beyond quick prototypes, use a quality SoundFont. The difference is dramatic—the same MIDI file can sound like a toy keyboard or a real instrument depending on the samples.

The tool auto-detects SoundFonts in common locations (~/.soundfonts/, /opt/homebrew/share/soundfonts/, etc.). You can also specify one explicitly with --soundfont.
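Auto-detection of this kind usually amounts to checking an explicit path first and then scanning a list of directories. The sketch below shows one way that could look. It is an assumption about the approach, not the tool's actual lookup logic: the function names, the `.sf2` extension filter, and the alphabetical tie-break are all illustrative.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Directories to scan, based on the two locations named above.
fn default_search_dirs() -> Vec<PathBuf> {
    let mut dirs = Vec::new();
    if let Some(home) = std::env::var_os("HOME") {
        dirs.push(PathBuf::from(home).join(".soundfonts"));
    }
    dirs.push(PathBuf::from("/opt/homebrew/share/soundfonts"));
    dirs
}

/// Return the explicit SoundFont if given and present; otherwise the first
/// `.sf2` file (alphabetically) in the first directory that contains one.
fn find_soundfont(explicit: Option<&Path>, search_dirs: &[PathBuf]) -> Option<PathBuf> {
    if let Some(path) = explicit {
        // An explicit but missing path is treated as "not found" rather than
        // silently falling back to auto-detection.
        return if path.is_file() { Some(path.to_path_buf()) } else { None };
    }
    for dir in search_dirs {
        let entries = match fs::read_dir(dir) {
            Ok(entries) => entries,
            Err(_) => continue, // directory missing or unreadable: try the next
        };
        let mut fonts: Vec<PathBuf> = entries
            .filter_map(|entry| entry.ok())
            .map(|entry| entry.path())
            .filter(|p| {
                p.extension()
                    .and_then(|ext| ext.to_str())
                    .map_or(false, |ext| ext.eq_ignore_ascii_case("sf2"))
            })
            .collect();
        fonts.sort();
        if let Some(first) = fonts.into_iter().next() {
            return Some(first);
        }
    }
    None
}

fn main() {
    match find_soundfont(None, &default_search_dirs()) {
        Some(path) => println!("using SoundFont: {}", path.display()),
        None => println!("no SoundFont found; pass one with --soundfont"),
    }
}
```

Picking the alphabetically first file is arbitrary. A real implementation might instead prefer larger or known-good SoundFonts, given how much the choice affects audio quality.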

Technical Details

Built with the Rust 2024 edition, using permissively licensed dependencies:

Crate	Purpose
midly	MIDI file generation
clap	CLI argument parsing
serde	JSON serialization
rand	Randomization for presets
axum	Web server (for `serve` command)

FluidSynth is called as a subprocess for WAV rendering, keeping the main codebase MIT-licensed.

Try It

Listen to sample outputs, or build locally:

git clone https://github.com/softwarewrighter/midi-cli-rs.git
cd midi-cli-rs
cargo build --release
./target/release/midi-cli-rs preset -m jazz -d 5 -o jazz.wav

Requires FluidSynth for WAV output (brew install fluid-synth on macOS).

Series: Personal Software (Part 2)

Previous: cat-finder: Local ML in Rust

Next: midi-cli-rs: Custom Mood Packs

Disclaimer

You are responsible for how you use generated audio. Ensure you have the appropriate rights and permissions for any commercial or public use. This tool generates MIDI data algorithmically—how you render and distribute the final audio is your responsibility.

Be aware that algorithmic composition can inadvertently produce sequences similar to existing copyrighted works. Whether you use this tool, AI generation, or compose by hand, you must verify that your output doesn’t infringe on existing copyrights before public release or commercial use. Protect yourself legally.