You Already Talk to Your AI. Why Are You Still Typing?
You're in the zone. Three Claude Code agents running in parallel — one refactoring auth, one writing tests, one reviewing a PR. You need to send each of them a different task.
So you alt-tab. Type the task. Alt-tab to the next terminal. Type another task. Alt-tab again. By the time you've context-switched three times, your flow state is gone.
What if you could just speak?
Hold RightCommand+1. Speak the task. Release. The task arrives at your Claude Code agent as a structured markdown prompt. The agent starts working. You never left your editor.

Hold another hotkey. Speak. Release. A different agent picks up a different task. You're still looking at your code.
This is SpeechButton: one hotkey per destination, each with its own AI transform pipeline. Voice-driven multi-agent coding that no other tool offers.
How It Works: 60 Seconds
SpeechButton is a macOS push-to-talk engine built in Rust. You hold a hotkey, speak, release. Text appears wherever you want it — instantly.
The key insight: different hotkeys route to different destinations with different transforms.
```
RightCommand    → paste at cursor      (raw dictation, no transform)
RightCommand+1  → Claude Code agent    (transform: structure as task)
RightCommand+2  → Slack #dev channel   (transform: casual tone)
RightCommand+3  → Linear               (transform: format as issue)
RightCommand+4  → Git commit message   (transform: conventional commit format)
```
Each hotkey has three components:
- Hotkey — which key combination triggers it
- Transform — a markdown prompt file that shapes the raw transcription before delivery
- Destination — where the transformed text goes (paste, exec, Python integration script)
The transform pipeline is simple: your speech → SpeechButton's STT engine → raw text → processed through Local AI with your prompt file → destination. Any `.md` file in `prompts/` is a valid transform.
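The substitution step is simple to picture. Here is a minimal Python sketch, assuming a plain `{{transcription}}` placeholder replacement; the `apply_transform` helper is illustrative, not a SpeechButton API:

```python
# Illustrative sketch of the transform step. The helper and template are
# placeholders for explanation, not SpeechButton's actual internals.
TEMPLATE = (
    "Structure the following spoken task as a clear developer task.\n"
    "\n"
    "Input: {{transcription}}"
)

def apply_transform(raw_text: str, template: str) -> str:
    """Fill the {{transcription}} slot in a prompt template with raw STT text."""
    return template.replace("{{transcription}}", raw_text)

# raw speech -> STT -> raw text -> prompt handed to the Local AI engine
raw_text = "fix the race condition in the auth middleware"
prompt = apply_transform(raw_text, TEMPLATE)
```

The Local AI engine then runs the filled-in prompt and routes the result to the hotkey's destination.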
The Multi-Agent Developer Setup
Here's a real `config.toml` for a developer working with multiple AI coding agents:
```toml
# ~/.config/speechbutton/config.toml

[global]
model = "parakeet-tdt-0.6b-v3-int8"  # Apple Neural Engine, 100% offline
language = "auto"
auto_punctuation = true

# Default hotkey: paste raw text at cursor
# Hold RightCommand, speak, release → text appears in your editor
[[hotkey]]
key = "RightCommand"
name = "default"
paste = "accessibility"

# Channel 1: Claude Code — structured task for AI agent
# Hold RightCommand+1, speak task, release → agent receives structured prompt
[[hotkey]]
key = "RightCommand"
channel = "1"
name = "claude-agent"
transform = "prompts/claude_code_task.md"
exec = "integrations/send_claude_code.py"

# Channel 2: Slack — casual dev message
# Hold RightCommand+2, speak, release → message sent to Slack channel
[[hotkey]]
key = "RightCommand"
channel = "2"
name = "slack-dev"
transform = "prompts/slack_message.md"
exec = "SLACK_WEBHOOK_URL=https://hooks.slack.com/services/xxx integrations/send_slack.py"

# Channel 3: Linear — create issue from voice
# Hold RightCommand+3, describe bug, release → Linear issue created
[[hotkey]]
key = "RightCommand"
channel = "3"
name = "linear-issue"
transform = "prompts/linear_issue.md"
exec = "LINEAR_API_KEY=lin_api_xxx integrations/send_linear.py"

# Channel 4: Git commit — conventional commit from voice
# Hold RightCommand+4, describe change, release → formatted commit message pasted
[[hotkey]]
key = "RightCommand"
channel = "4"
name = "git-commit"
transform = "prompts/conventional_commit.md"
exec = "paste"

[vad]
enabled = false

[ptt]
chunking_enabled = true
chunk_silence_sec = 1.0
```
Five hotkeys. Five destinations. Each with its own AI-powered transform. All configured in a single TOML file.
The Transform Pipeline: Where the Magic Happens
The real power is in the transforms. A transform is a markdown prompt file in your `prompts/` folder. SpeechButton's Local AI engine reads your speech, applies the prompt, and routes the result to your integration script. No cloud API calls required — everything runs on-device.
Transform 1: Spoken task → Structured agent prompt
You say:

"Okay, so the refresh tokens are expiring but user sessions stay active, and after like thirty minutes the dashboard just throws a 500. It should invalidate the session and send the user back to login. Probably the auth middleware, maybe a race between the token refresh and the session read."

The `prompts/claude_code_task.md` prompt shapes this into a clear task:
```markdown
# prompts/claude_code_task.md
# Transform a spoken task description into a structured developer task.
# Output ONLY the structured task in markdown, no commentary.

Structure the following spoken task as a clear developer task with these
sections: ## Problem, ## Expected Behavior, ## Steps to Investigate.
Be concise and actionable. Output markdown only.

Input: {{transcription}}
```
Your agent receives:
```markdown
## Problem
Refresh tokens expire but user sessions remain active. After ~30 minutes,
accessing the dashboard returns HTTP 500.

## Expected Behavior
When a refresh token expires, the session should be invalidated gracefully
and the user redirected to login.

## Steps to Investigate
1. Check token refresh logic in auth middleware
2. Verify session invalidation is triggered on token expiry
3. Look for race condition between token refresh and session read
```
Instead of your raw stream-of-consciousness, the agent gets a structured task it can act on immediately. The transform runs entirely on-device via Local AI. You spent zero time formatting.
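What does an integration script look like? The actual script interface isn't shown here, so as a hedged sketch, assume the transformed text arrives on the script's stdin; a minimal `send_claude_code.py` might then hand the task to a tmux pane running a Claude Code session:

```python
#!/usr/bin/env python3
# Hypothetical integration script. The stdin contract and the tmux pane
# name are illustrative assumptions, not SpeechButton's documented interface.
import subprocess
import sys

TMUX_TARGET = "claude-agent"  # pane where a Claude Code session is running

def build_command(task: str) -> list[str]:
    # Deliver the task by "typing" it into the tmux pane, then pressing Enter.
    return ["tmux", "send-keys", "-t", TMUX_TARGET, task, "Enter"]

def main() -> None:
    task = sys.stdin.read().strip()  # transformed text assumed to arrive here
    if task:
        subprocess.run(build_command(task), check=True)

if __name__ == "__main__":
    main()
```

The tmux pane name is a placeholder; any delivery mechanism (AppleScript, a named pipe, an HTTP endpoint) would slot into the same script shape.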
Transform 2: Conventional commit from voice
You say:

"So I added rate limiting to the API, it's Redis-based, sliding window, a hundred requests per minute per user."

The `prompts/conventional_commit.md` prompt:
```markdown
# prompts/conventional_commit.md
# Convert a spoken description into a conventional commit message.
# Output ONLY the commit message on a single line, no commentary.

Convert the following to a conventional commit message using the format:
type(scope): subject

Types: feat, fix, refactor, perf, test, docs, chore
Keep the subject under 72 characters. Output one line only.

Input: {{transcription}}
```
Output pasted at cursor:
```
feat(api): add Redis sliding-window rate limiting (100 req/min/user)
```
Hold RightCommand+4, describe your change naturally, release. A perfect commit message. No more staring at `git commit -m "` trying to remember the conventional commit format.
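If you want to sanity-check that output, the one-line format the prompt enforces is easy to validate. A purely illustrative checker, not part of SpeechButton:

```python
import re

# type(scope): subject -- types mirror the prompt above; scope is optional.
COMMIT_RE = re.compile(r"^(feat|fix|refactor|perf|test|docs|chore)(\([\w\-]+\))?: \S.*$")

def is_conventional(message: str) -> bool:
    """Check the one-line commit message shape the transform is asked to emit."""
    first_line = message.splitlines()[0] if message else ""
    return bool(COMMIT_RE.match(first_line)) and len(first_line) <= 72
```

A hook like this could gate the pasted message before it reaches `git commit`.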
Transform 3: 100% Offline with Local AI
All transforms run through SpeechButton's built-in Local AI engine — no data leaves your machine. The `[local_ai]` section is optional; Local AI is enabled by default:
```toml
[local_ai]
# auto_load = true
# server_address = "127.0.0.1:11435"
```
Your `prompts/*.md` files are the only configuration needed. Same pipeline, same config structure — zero cloud dependency. Your voice, your prompts, your Mac. Nothing leaves.
This is useful for:
- Working on proprietary codebases with strict data policies
- Air-gapped development environments
- Simply preferring that your spoken task descriptions never touch a server
Real Workflow: Multi-Agent Bug Fix
Here's a real scenario. You're debugging a production issue. Two agents, four hotkeys, five minutes.

Hold RightCommand+1 and describe the suspected WebSocket connection leak. Release. Agent #1 starts investigating. You keep reading logs.

Hold RightCommand+1 again and ask for regression tests around the leak. Release. Agent #2 starts writing tests. You haven't typed a single character.

Hold RightCommand+2 and give a quick status update. Release. Your team gets a casual, well-formatted Slack message. You're still reading code.

Hold RightCommand+3 and describe the bug. Release. A formatted Linear issue is created with title, description, and priority. All from voice.

Hold RightCommand+4 and describe the fix. Release. The commit message lands at your cursor:

```
fix(ws): implement Drop for ConnectionGuard to prevent connection leak
```

Five minutes. Two agents working in parallel. One Slack update. One Linear issue. One commit. Zero typing.
Why 7ms Matters for Developers
Other dictation tools take 200ms+ to start recording after you press the hotkey. That means if you immediately start talking — and developers always do — you lose the first word.
"Fix the race condition" becomes "the race condition." You have to re-dictate.
SpeechButton captures audio in 7ms. The word "Fix" is there. Every time. This isn't a marketing number — it's measured from hotkey press to first audio sample captured.
For a developer who sends 50+ voice commands a day to AI agents, losing the first word 50 times means re-dictating 50 times. At 7ms, you never re-dictate. The compound time savings are significant, but more importantly: it doesn't break your flow.
Config as Code: Your AI Agent Can Set This Up
SpeechButton's `config.toml` is a plain text file. That means your AI agent can read it, modify it, and configure new hotkeys programmatically.
Ask Claude Code: "Add a new SpeechButton hotkey on RightCommand+5 that sends to the #alerts Slack channel with an urgent tone transform."
Claude reads your `config.toml` and adds:

```toml
[[hotkey]]
key = "RightCommand"
channel = "5"
name = "slack-alerts"
transform = "prompts/urgent_tone.md"
exec = "SLACK_WEBHOOK_URL=https://hooks.slack.com/services/xxx/alerts integrations/send_slack.py"
```
And writes the `prompts/urgent_tone.md` prompt file. Your voice-routing setup evolves with your workflow — configured by the same AI agents you're routing voice to.
No GUI clicking. No settings menus. A TOML file and markdown prompts. The way developers configure tools.
What No Competitor Can Do
SuperWhisper and Wispr Flow are dictation tools. They transcribe speech and paste text. That's it.
SpeechButton is a voice routing engine. Each hotkey is a programmable channel with its own transform pipeline and destination. The combination of:
- Per-hotkey routing — different hotkeys → different destinations
- Markdown prompt transforms — simple `.md` files in `prompts/`, no scripting required
- Multi-agent support — send structured tasks to different AI agents via Python integration scripts
- Config as code — TOML file, editable by humans and AI agents
- 100% offline — Apple Neural Engine + Local AI, no cloud required
- 7ms capture — never lose the first word
...doesn't exist in any other product. This isn't dictation. It's a voice-first developer interface.
Get Started in 2 Minutes
- Download SpeechButton — free 15 minutes/day, no account needed
- Set your default hotkey — RightCommand for paste-at-cursor dictation
- Add your first channel — edit `~/.config/speechbutton/config.toml`, add a RightCommand+1 hotkey pointing to your AI agent
- Write a prompt file — create `prompts/claude_code_task.md` with the task structuring prompt above
- Hold, speak, release — your agent receives a structured task
That's it. You're voice-coding.
Ready to stop typing?
Download SpeechButton
Free for 15 minutes/day. Pro ($7.99/mo) removes the limit.
Requires macOS 15+ (Sequoia) and Apple Silicon (M1/M2/M3/M4).