You Already Talk to Your AI. Why Are You Still Typing?
You're in the zone. Three Claude Code agents running in parallel — one refactoring auth, one writing tests, one reviewing a PR. You need to send each of them a different task.
So you alt-tab. Type the task. Alt-tab to the next terminal. Type another task. Alt-tab again. By the time you've context-switched three times, your flow state is gone.
What if you could just speak?
Hold RightCommand+1. Speak the task. Release. The task arrives at your Claude Code agent as a structured markdown prompt. The agent starts working. You never left your editor.

Hold another hotkey. Speak. Release. A different agent picks up a different task. You're still looking at your code.
This is SpeechButton: one hotkey per destination, each with its own AI transform pipeline. Voice-driven multi-agent coding that no other tool offers.
How It Works: 60 Seconds
SpeechButton is a macOS push-to-talk engine built in Rust. You hold a hotkey, speak, release. Text appears wherever you want it — instantly.
The key insight: different hotkeys route to different destinations with different transforms.
```
RightCommand    → paste at cursor      (raw dictation, no transform)
RightCommand+1  → Claude Code agent    (transform: structure as task)
RightCommand+2  → Slack #dev channel   (transform: casual tone)
RightCommand+3  → Linear               (transform: format as issue)
RightCommand+4  → Git commit message   (transform: conventional commit format)
```
Each hotkey has three components:
- Hotkey — which key combination triggers it
- Transform — a markdown prompt file that shapes the raw transcription before delivery
- Destination — where the transformed text goes (paste, exec, Python integration script)
The transform pipeline is simple: your speech → SpeechButton's STT engine → raw text → processed through Local AI with your prompt file → destination. Any `.md` file in `prompts/` is a valid transform.
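The substitution step is simple to picture. Here is a minimal Python sketch, assuming a plain `{{transcription}}` placeholder replacement; the `apply_transform` helper is illustrative, not a SpeechButton API:

```python
# Illustrative sketch of the transform step. The helper and template are
# placeholders for explanation, not SpeechButton's actual internals.
TEMPLATE = (
    "Structure the following spoken task as a clear developer task.\n"
    "\n"
    "Input: {{transcription}}"
)

def apply_transform(raw_text: str, template: str) -> str:
    """Fill the {{transcription}} slot in a prompt template with raw STT text."""
    return template.replace("{{transcription}}", raw_text)

# raw speech -> STT -> raw text -> prompt handed to the Local AI engine
raw_text = "fix the race condition in the auth middleware"
prompt = apply_transform(raw_text, TEMPLATE)
```

The Local AI engine then runs the filled-in prompt and routes the result to the hotkey's destination.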
The Multi-Agent Developer Setup
Here's a real `config.toml` for a developer working with multiple AI coding agents:
```toml
# ~/.config/speechbutton/config.toml

[global]
model = "parakeet-tdt-0.6b-v3-int8"  # Apple Neural Engine, 100% offline
language = "auto"
auto_punctuation = true

# Default hotkey: paste raw text at cursor
# Hold RightCommand, speak, release → text appears in your editor
[[hotkey]]
key = "RightCommand"
name = "default"
paste = "accessibility"

# Channel 1: Claude Code — structured task for AI agent
# Hold RightCommand+1, speak task, release → agent receives structured prompt
[[hotkey]]
key = "RightCommand"
channel = "1"
name = "claude-agent"
transform = "prompts/claude_code_task.md"
exec = "integrations/send_claude_code.py"

# Channel 2: Slack — casual dev message
# Hold RightCommand+2, speak, release → message sent to Slack channel
[[hotkey]]
key = "RightCommand"
channel = "2"
name = "slack-dev"
transform = "prompts/slack_message.md"
exec = "SLACK_WEBHOOK_URL=https://hooks.slack.com/services/xxx integrations/send_slack.py"

# Channel 3: Linear — create issue from voice
# Hold RightCommand+3, describe bug, release → Linear issue created
[[hotkey]]
key = "RightCommand"
channel = "3"
name = "linear-issue"
transform = "prompts/linear_issue.md"
exec = "LINEAR_API_KEY=lin_api_xxx integrations/send_linear.py"

# Channel 4: Git commit — conventional commit from voice
# Hold RightCommand+4, describe change, release → formatted commit message pasted
[[hotkey]]
key = "RightCommand"
channel = "4"
name = "git-commit"
transform = "prompts/conventional_commit.md"
exec = "paste"

[vad]
enabled = false

[ptt]
chunking_enabled = true
chunk_silence_sec = 1.0
```
Five hotkeys. Five destinations. Each with its own AI-powered transform. All configured in a single TOML file.
The Transform Pipeline: Where the Magic Happens
The real power is in the transforms. A transform is a markdown prompt file in your `prompts/` folder. SpeechButton's Local AI engine reads your speech, applies the prompt, and routes the result to your integration script. No cloud API calls required — everything runs on-device.
Transform 1: Spoken task → Structured agent prompt
You say:

"Okay, so the refresh tokens are expiring but user sessions stay active, and after like thirty minutes the dashboard just throws a 500. It should invalidate the session and send the user back to login. Probably the auth middleware, maybe a race between the token refresh and the session read."

The `prompts/claude_code_task.md` prompt shapes this into a clear task:
```markdown
# prompts/claude_code_task.md
# Transform a spoken task description into a structured developer task.
# Output ONLY the structured task in markdown, no commentary.

Structure the following spoken task as a clear developer task with these
sections: ## Problem, ## Expected Behavior, ## Steps to Investigate.
Be concise and actionable. Output markdown only.

Input: {{transcription}}
```
Your agent receives:
```markdown
## Problem
Refresh tokens expire but user sessions remain active. After ~30 minutes,
accessing the dashboard returns HTTP 500.

## Expected Behavior
When a refresh token expires, the session should be invalidated gracefully
and the user redirected to login.

## Steps to Investigate
1. Check token refresh logic in auth middleware
2. Verify session invalidation is triggered on token expiry
3. Look for race condition between token refresh and session read
```
Instead of your raw stream-of-consciousness, the agent gets a structured task it can act on immediately. The transform runs entirely on-device via Local AI. You spent zero time formatting.
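What does an integration script look like? The actual script interface isn't shown here, so as a hedged sketch, assume the transformed text arrives on the script's stdin; a minimal `send_claude_code.py` might then hand the task to a tmux pane running a Claude Code session:

```python
#!/usr/bin/env python3
# Hypothetical integration script. The stdin contract and the tmux pane
# name are illustrative assumptions, not SpeechButton's documented interface.
import subprocess
import sys

TMUX_TARGET = "claude-agent"  # pane where a Claude Code session is running

def build_command(task: str) -> list[str]:
    # Deliver the task by "typing" it into the tmux pane, then pressing Enter.
    return ["tmux", "send-keys", "-t", TMUX_TARGET, task, "Enter"]

def main() -> None:
    task = sys.stdin.read().strip()  # transformed text assumed to arrive here
    if task:
        subprocess.run(build_command(task), check=True)

if __name__ == "__main__":
    main()
```

The tmux pane name is a placeholder; any delivery mechanism (AppleScript, a named pipe, an HTTP endpoint) would slot into the same script shape.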
Transform 2: Conventional commit from voice
You say:

"So I added rate limiting to the API, it's Redis-based, sliding window, a hundred requests per minute per user."

The `prompts/conventional_commit.md` prompt:
```markdown
# prompts/conventional_commit.md
# Convert a spoken description into a conventional commit message.
# Output ONLY the commit message on a single line, no commentary.

Convert the following to a conventional commit message using the format:
type(scope): subject

Types: feat, fix, refactor, perf, test, docs, chore
Keep the subject under 72 characters. Output one line only.

Input: {{transcription}}
```
Output pasted at cursor:
```
feat(api): add Redis sliding-window rate limiting (100 req/min/user)
```
Hold RightCommand+4, describe your change naturally, release. A perfect commit message. No more staring at `git commit -m "` trying to remember the conventional commit format.
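If you want to sanity-check that output, the one-line format the prompt enforces is easy to validate. A purely illustrative checker, not part of SpeechButton:

```python
import re

# type(scope): subject -- types mirror the prompt above; scope is optional.
COMMIT_RE = re.compile(r"^(feat|fix|refactor|perf|test|docs|chore)(\([\w\-]+\))?: \S.*$")

def is_conventional(message: str) -> bool:
    """Check the one-line commit message shape the transform is asked to emit."""
    first_line = message.splitlines()[0] if message else ""
    return bool(COMMIT_RE.match(first_line)) and len(first_line) <= 72
```

A hook like this could gate the pasted message before it reaches `git commit`.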
Transform 3: 100% Offline with Local AI
All transforms run through SpeechButton's built-in Local AI engine — no data leaves your machine. The `[local_ai]` section is optional; Local AI is enabled by default:
```toml
[local_ai]
# auto_load = true
# server_address = "127.0.0.1:11435"
```
Your `prompts/*.md` files are the only configuration needed. Same pipeline, same config structure — zero cloud dependency. Your voice, your prompts, your Mac. Nothing leaves.
This is useful for:
- Working on proprietary codebases with strict data policies
- Air-gapped development environments
- Simply preferring that your spoken task descriptions never touch a server
Real Workflow: Multi-Agent Bug Fix
Here's a real scenario. You're debugging a production issue. Two agents, four hotkeys, five minutes.

Hold RightCommand+1 and describe the suspected WebSocket connection leak. Release. Agent #1 starts investigating. You keep reading logs.

Hold RightCommand+1 again and ask for regression tests around the leak. Release. Agent #2 starts writing tests. You haven't typed a single character.

Hold RightCommand+2 and give a quick status update. Release. Your team gets a casual, well-formatted Slack message. You're still reading code.

Hold RightCommand+3 and describe the bug. Release. A formatted Linear issue is created with title, description, and priority. All from voice.

Hold RightCommand+4 and describe the fix. Release. The commit message lands at your cursor:

```
fix(ws): implement Drop for ConnectionGuard to prevent connection leak
```

Five minutes. Two agents working in parallel. One Slack update. One Linear issue. One commit. Zero typing.
Why 7ms Matters for Developers
Other dictation tools take 200ms+ to start recording after you press the hotkey. That means if you immediately start talking — and developers always do — you lose the first word.
"Fix the race condition" becomes "the race condition." You have to re-dictate.
SpeechButton captures audio in 7ms. The word "Fix" is there. Every time. This isn't a marketing number — it's measured from hotkey press to first audio sample captured.
For a developer who sends 50+ voice commands a day to AI agents, losing the first word 50 times means re-dictating 50 times. At 7ms, you never re-dictate. The compound time savings are significant, but more importantly: it doesn't break your flow.
Config as Code: Your AI Agent Can Set This Up
SpeechButton's `config.toml` is a plain text file. That means your AI agent can read it, modify it, and configure new hotkeys programmatically.
Ask Claude Code: "Add a new SpeechButton hotkey on RightCommand+5 that sends to the #alerts Slack channel with an urgent tone transform."
Claude reads your `config.toml` and adds:

```toml
[[hotkey]]
key = "RightCommand"
channel = "5"
name = "slack-alerts"
transform = "prompts/urgent_tone.md"
exec = "SLACK_WEBHOOK_URL=https://hooks.slack.com/services/xxx/alerts integrations/send_slack.py"
```
And writes the `prompts/urgent_tone.md` prompt file. Your voice-routing setup evolves with your workflow — configured by the same AI agents you're routing voice to.
No GUI clicking. No settings menus. A TOML file and markdown prompts. The way developers configure tools.
What No Competitor Can Do
SuperWhisper and Wispr Flow are dictation tools. They transcribe speech and paste text. That's it.
SpeechButton is a voice routing engine. Each hotkey is a programmable channel with its own transform pipeline and destination. The combination of:
- Per-hotkey routing — different hotkeys → different destinations
- Markdown prompt transforms — simple `.md` files in `prompts/`, no scripting required
- Multi-agent support — send structured tasks to different AI agents via Python integration scripts
- Config as code — TOML file, editable by humans and AI agents
- 100% offline — Apple Neural Engine + Local AI, no cloud required
- 7ms capture — never lose the first word
...doesn't exist in any other product. This isn't dictation. It's a voice-first developer interface.
Get Started in 2 Minutes
- Download SpeechButton — free 15 minutes/day, no account needed
- Set your default hotkey — RightCommand for paste-at-cursor dictation
- Add your first channel — edit `~/.config/speechbutton/config.toml`, add a RightCommand+1 hotkey pointing to your AI agent
- Write a prompt file — create `prompts/claude_code_task.md` with the task structuring prompt above
- Hold, speak, release — your agent receives a structured task
That's it. You're voice-coding.
Ready to stop typing?
Download SpeechButton
Free for 15 minutes/day. Pro ($7.99/mo) removes the limit.
Requires macOS 15+ (Sequoia) and Apple Silicon (M1/M2/M3/M4).