Your Agents Are Waiting for Instructions. Stop Typing Them.
You have Claude Code agents running on remote servers. One is refactoring the payment module. Another is fixing CI. A third is writing tests for the new API. Each agent sits in a terminal, ready for its next task.
The bottleneck is you. You type the task. Switch terminals. Type another task. Switch again. Copy-paste context between windows. By the time you've dispatched three tasks, five minutes are gone and your flow state with them.
What if you could just speak?
"Refactor the payment webhook handler — extract the Stripe-specific logic into a separate adapter so we can add PayPal later."
Release the hotkey. SpeechButton transcribes in 7ms, the built-in local AI structures your words into a task, and the task is dispatched to your Claude Code agent. The agent starts working on your server. You never switched windows.
"The CI is failing on the auth integration tests — looks like the test database isn't being seeded. Fix the setup script."
A different hotkey, a different integration method, a different agent. Two tasks dispatched in 15 seconds.
Zero typing. Zero context switching. Four integration methods to fit your workflow.
Four Ways to Reach Claude Code
SpeechButton supports four integration methods, each suited to different workflows. Pick the one that fits, or mix them across hotkeys.
| Method | Best for | Streaming | Multi-turn | Remote |
|---|---|---|---|---|
| CLI one-shot | Quick, self-contained tasks | — | — | — |
| CLI --resume | Continuing a conversation | — | ✓ | — |
| Remote Control API (recommended) | Full control, running sessions | ✓ | ✓ | ✓ |
| Channels (MCP) | Bidirectional, webhooks | ✓ | ✓ | — |
All four use the same local AI transform (Gemma 4): your voice goes through a built-in on-device model that structures your spoken task into a clean prompt before it's dispatched. No cloud, no API key, no cost for the transform step.
The Architecture: Voice → Local AI Transform (.md) → Claude Code
The Local AI Transform is the simplest path: no API calls, no latency, no cost. SpeechButton's built-in Gemma 4 model structures your voice into a clean task before the integration exec runs.
Setup: config.toml + Integration Scripts
One config file with hotkey channels. Exec scripts live in `integrations/`. Prompt files live in `prompts/`. The Local AI Transform is built in — just point `transform` at your `.md` file.
SpeechButton config.toml
Two hotkeys — one for CLI one-shot, one for Remote Control. Both use .md prompt files for offline task structuring.
```toml
[global]
model = "parakeet-tdt-0.6b-v3-int8"
language = "auto"
auto_punctuation = true

[[hotkey]]
key = "RightCommand"
name = "default"
paste = "accessibility"

[[hotkey]]
key = "RightCommand"
channel = "1"
name = "claude-code"
transform = "prompts/claude_code_task.md"
exec = "integrations/send_claude_code.py"

[[hotkey]]
key = "RightCommand"
channel = "3"
name = "claude-remote"
transform = "prompts/default.md"
exec = "integrations/send_claude_remote.py"
```
The Local AI Transform runs a built-in small language model on your Mac using a .md prompt file to structure your voice into a developer task prompt — completely offline, no API key, no per-request cost.
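What does such a `.md` prompt look like? Here is a minimal sketch of a task-structuring prompt in the spirit of `prompts/claude_code_task.md` (the wording below is an assumption, not SpeechButton's shipped prompt):

```markdown
You are a formatter for spoken developer tasks.
Rewrite the transcript below into a structured task with these sections:

## Problem
## Expected Behavior
## Steps to Investigate

Keep technical identifiers exactly as spoken.
Resolve spoken numbers from context (e.g. "five hundred" means HTTP 500).
Output only the structured task, with no preamble and no questions.
```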
CLI One-Shot
The simplest method. Starts a new Claude Code process, runs the task, exits. Best for self-contained tasks where you don't need context from a prior conversation.
```python
#!/usr/bin/env python3
"""Send text to Claude Code CLI as a one-shot prompt."""
import os
import shutil
import subprocess
import sys


def find_claude_binary():
    """Locate the claude CLI on PATH or in common install locations."""
    found = shutil.which("claude")
    if found:
        return found
    for path in [os.path.expanduser("~/.local/bin/claude"), "/opt/homebrew/bin/claude"]:
        if os.path.exists(path):
            return path
    print("claude CLI not found", file=sys.stderr)
    sys.exit(1)


def main():
    task = sys.stdin.read().strip()
    if not task:
        sys.exit(0)
    claude = find_claude_binary()
    result = subprocess.run(
        [claude, "--print", "--bare", "-p", task],
        capture_output=True,
        text=True,
        timeout=120,
    )
    if result.returncode == 0:
        response = result.stdout.strip()
        print(f"Claude: {response.splitlines()[0][:100]}" if response else "Done")
    else:
        print(f"Error: {result.stderr[:100]}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()
```
```toml
# ⌘+1 → Claude Code one-shot
[[hotkey]]
key = "RightCommand"
channel = "1"
name = "claude-code"
transform = "prompts/claude_code_task.md"
exec = "integrations/send_claude_code.py"
```
You say:
"Add a missing index on the users table email column — write and run the migration."
Claude Code spins up, writes the migration, applies it, exits. No session to manage.
CLI --resume
Resumes a previous Claude Code session by ID. The agent remembers the full prior context — what files it read, what it changed, what it was thinking. Use this when you want to continue a conversation that's already in progress.
```bash
#!/bin/bash
# integrations/send_claude_resume.sh SESSION_ID
# Resumes a Claude Code session and sends a follow-up task
SESSION_ID="$1"
TASK=$(cat)

claude --bare -p "$TASK" \
  --resume "$SESSION_ID" \
  --allowedTools "Read,Edit,Bash" \
  --output-format json \
  2>/dev/null \
  | jq -r '.result // "Done"'
```
```toml
# ⌘+2 → Resume Claude Code session
[[hotkey]]
key = "RightCommand"
channel = "2"
name = "claude-resume"
transform = "prompts/claude_code_task.md"
exec = "integrations/send_claude_resume.sh SESSION_ID_HERE"
```
You say (follow-up to an earlier task):
"Good. Now add a composite index on user_id and created_at as well — same migration pattern."
The agent picks up exactly where it left off. Set the session ID in your config and that hotkey becomes a voice channel into a persistent conversation.
Remote Control API (recommended)
The most powerful method. Posts a message directly into a running Claude Code session via the Anthropic Remote Control API. The agent is already running, already has context, and responds immediately with streaming output. No startup latency, no lost state.
How to get your session ID
- Start Claude Code with `claude --remote-control`
- The session ID appears in the startup output, or is auto-detected from `bridge-pointer.json`
- Auth token is read automatically from `~/.claude/.credentials.json`
```python
#!/usr/bin/env python3
"""Send voice command to a running Claude Code Remote Control session.

Usage: echo "task text" | python3 send_claude_remote.py [session_id]

If session_id is omitted, auto-detects from bridge-pointer.json.
Auth token is read from ~/.claude/.credentials.json automatically.
"""
import json
import pathlib
import sys
import urllib.request
import uuid


def find_session_id():
    """Scan common locations for bridge-pointer.json."""
    candidates = [
        pathlib.Path.home() / ".claude" / "bridge-pointer.json",
        pathlib.Path("/tmp/claude-bridge-pointer.json"),
    ]
    for p in candidates:
        if p.exists():
            return json.loads(p.read_text())["sessionId"]
    raise RuntimeError(
        "No session ID provided and bridge-pointer.json not found. "
        "Start Claude Code with --remote-control or pass session ID as argument."
    )


def resolve_session_id(args):
    """Return session ID from CLI arg or auto-detect."""
    if len(args) > 1 and args[1] not in ("", "SESSION_ID_HERE"):
        return args[1]
    return find_session_id()


def get_oauth_token():
    """Read OAuth access token from Claude Code credentials file."""
    creds_path = pathlib.Path.home() / ".claude" / ".credentials.json"
    if not creds_path.exists():
        raise FileNotFoundError(
            f"Credentials not found at {creds_path}. Run 'claude login' first."
        )
    return json.loads(creds_path.read_text())["claudeAiOauth"]["accessToken"]


def get_org_uuid(token):
    """Fetch organization UUID from Anthropic OAuth profile endpoint."""
    req = urllib.request.Request(
        "https://api.anthropic.com/api/oauth/profile",
        headers={"Authorization": f"Bearer {token}"},
    )
    profile = json.loads(urllib.request.urlopen(req).read())
    return profile["organization"]["uuid"]


def send_message(session_id, text, token, org_uuid):
    """POST a user message event to the Remote Control API."""
    data = json.dumps({
        "events": [{
            "uuid": str(uuid.uuid4()),
            "session_id": session_id,
            "type": "user",
            "parent_tool_use_id": None,
            "message": {"role": "user", "content": text},
        }]
    }).encode()
    req = urllib.request.Request(
        f"https://api.anthropic.com/v1/sessions/{session_id}/events",
        data=data,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "anthropic-version": "2023-06-01",
            "anthropic-beta": "ccr-byoc-2025-07-29",
            "x-organization-uuid": org_uuid,
        },
    )
    urllib.request.urlopen(req)


def main():
    text = sys.stdin.read().strip()
    if not text:
        sys.exit(0)
    try:
        session_id = resolve_session_id(sys.argv)
        token = get_oauth_token()
        org_uuid = get_org_uuid(token)
        send_message(session_id, text, token, org_uuid)
        print(f"Sent to RC session {session_id[:8]}...: {text[:60]}")
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()
```
```toml
# ⌘+3 → Remote Control session (recommended)
[[hotkey]]
key = "RightCommand"
channel = "3"
name = "claude-remote"
transform = "prompts/default.md"
exec = "integrations/send_claude_remote.py"
```
You say (to a running agent):
"There's a race condition in the session middleware — the refresh token is being read before the write completes, causing a 500 after about thirty minutes."
The running agent receives the message and starts investigating immediately — with full context of everything it's already read in this session. Response streams back in real time. No startup latency.
Channels (MCP)
Bidirectional communication via MCP server. Push events into a running Claude Code session through a local webhook. Two-way: Claude can reply back through the channel.
```bash
# Start Claude Code with the channel server loaded
claude --dangerously-load-development-channels server:speechbutton
```

```bash
# Push a task into the session through the local webhook
curl -X POST http://localhost:8788 \
  -H "Content-Type: application/json" \
  -d '{"message": "Fix the auth bug in login.rs"}'
```
MCP Channel Server (TypeScript)
The channel needs an MCP server that listens for HTTP requests and pushes them into the Claude Code session. Here's a minimal implementation:
```typescript
// MCP channel server for SpeechButton voice commands
// Install: bun add @modelcontextprotocol/sdk
import { Server } from "@modelcontextprotocol/sdk/server/index.js"
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js"

const mcp = new Server(
  { name: "speechbutton", version: "0.0.1" },
  {
    capabilities: { experimental: { "claude/channel": {} } },
    instructions:
      "Voice commands from SpeechButton arrive as <channel> tags. Execute them.",
  },
)

await mcp.connect(new StdioServerTransport())

// HTTP webhook — SpeechButton POSTs voice tasks here
Bun.serve({
  port: 8788,
  hostname: "127.0.0.1",
  async fetch(req) {
    const body = await req.text()
    await mcp.notification({
      method: "notifications/claude/channel",
      params: {
        content: body,
        meta: {
          source: "speechbutton",
          type: "voice",
          timestamp: new Date().toISOString(),
        },
      },
    })
    return new Response("ok")
  },
})
```
Register in .mcp.json
Add the channel server to your project's MCP config so Claude Code loads it on startup:
```json
{
  "mcpServers": {
    "speechbutton": {
      "command": "bun",
      "args": ["./speechbutton-channel.ts"]
    }
  }
}
```
SpeechButton config.toml
Add a hotkey channel that sends voice to the MCP channel server:
```toml
# Channel 4: Claude Code via MCP Channel (bidirectional)
# Hold RightCommand, tap 4, speak → task pushed into running session
[[hotkey]]
key = "RightCommand"
channel = "4"
name = "claude-channel"
transform = "prompts/claude_code_task.md"
exec = "curl -s -X POST http://localhost:8788 -d @-"
```
When SpeechButton sends a voice command via curl POST localhost:8788, it arrives in Claude Code as:
```xml
<channel source="speechbutton" type="voice">
  Fix the auth bug in login.rs
</channel>
```
Channels are ideal for automation pipelines: CI/CD triggers, file watchers, or any system that needs to push tasks into a running agent. The MCP server handles message routing and session management.
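Any script that can make an HTTP POST can play the SpeechButton role in such a pipeline. A minimal Python sketch, assuming the webhook shape from the curl example above (`push_task` is my own name, not part of any SDK):

```python
import json
import urllib.request


def push_task(message: str, url: str = "http://localhost:8788") -> bytes:
    """POST a task into the local MCP channel webhook and return its reply."""
    data = json.dumps({"message": message}).encode()
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

A CI hook could call `push_task("Investigate the failing auth suite")` after a red build, and the task lands in the running session just like a voice command.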
The Power of Local AI Transform
The transform pipeline is what makes this more than dictation. Setting transform to a .md prompt file activates SpeechButton's Local AI Transform using the built-in Gemma 4 model. It structures your voice before anything leaves your Mac.
You say:

"there's a race condition in the session middleware the refresh token is being read before the write completes causing a five hundred after about thirty minutes"

The Local AI Transform structures it into:

```markdown
## Problem
Race condition in session middleware: refresh token read before write completes → HTTP 500 after ~30 minutes.

## Expected Behavior
Token refresh and read should be atomic.

## Steps to Investigate
- Check concurrent access in session middleware
- Look for missing locks on token write
- Verify 30-minute window matches token expiry
```
The structured version makes Claude Code 2–3x more effective. It doesn't waste turns asking clarifying questions. It doesn't misinterpret "five hundred" as a number instead of an HTTP status code.
The Local AI Transform model runs entirely on your Apple Silicon — no network call, no API key, no per-use cost. Each hotkey channel can use a different .md prompt file for specialized formatting.
And each channel can have a different transform:
- ⌘1 — `prompts/claude_code_task.md` — general task structuring (Problem/Expected/Steps)
- ⌘2 — `prompts/bug_report.md` — bug report format (Reproduce/Expected/Actual)
- ⌘3 — `prompts/test_spec.md` — test specification (Given/When/Then)
Same voice. Different structure. Different agent. All from config.toml.
Real Workflow: Voice-Controlling 3 Remote Agents
You're leading a sprint. Three agents on your build server, each in its own worktree. You orchestrate them by voice from your MacBook.
9:00 — Morning kickoff (CLI one-shot)
"Start implementing the GraphQL subscription for real-time order updates. Use the existing OrderEvent type and add a WebSocket transport layer. Check how the REST endpoint works in src/api/orders.rs and mirror the data model."
Agent #1 starts reading the codebase and implementing.
9:01 — Bug fix in parallel (Remote Control API)
"Users are reporting that CSV exports are truncated at 10,000 rows. The export handler in src/export/csv.rs probably has a hardcoded limit. Find it, remove it, add a streaming writer so memory doesn't blow up on large exports."
Sent directly into the running RC session. Agent #2 responds within a second. You're drinking coffee.
9:02 — Test coverage (CLI --resume)
"We're missing integration tests for the billing webhook. Write tests that cover successful payment, failed payment, subscription upgrade, and subscription cancellation. Use the existing test fixtures in tests/fixtures/billing."
Agent #3 resumes its previous session and continues from prior context. Three agents, three tasks, under two minutes.
9:30 — Check-in and course correction (Remote Control API)
"How's the GraphQL subscription going? If you've got the basic query working, add cursor-based pagination before the mutation handler."
The agent continues with updated instructions. Full prior context. You corrected course in 5 seconds.
10:00 — Merge and ship
"If the CSV fix is tested and passing, create a PR with the title 'fix: remove hardcoded row limit in CSV export, add streaming writer'."
The agent runs tests, commits, and opens a PR. From your voice to a pull request.
Why This Only Works with 7ms Capture
When you're orchestrating agents by voice, you're giving rapid-fire instructions. Hold hotkey, speak, release. Hold another hotkey, speak, release. The rhythm is fast — you're thinking out loud, dispatching as ideas form.
With slower capture tools, the first word of every instruction gets clipped. "Start implementing the GraphQL..." becomes "implementing the GraphQL..." Your agent misses the verb, and with it the intent.

With 7ms capture, every word lands. The first syllable of "Start" is there, and the instruction is complete. Over a morning of 30+ voice dispatches, that is the difference between agents that do what you meant and agents that guess.
100% Offline Voice Capture
Here's exactly what stays on your Mac and what goes to the cloud:
| Component | Where it runs | Data sent externally |
|---|---|---|
| Voice capture | Your Mac | ✓ Nothing |
| Speech-to-text (Parakeet V3) | Apple Neural Engine | ✓ Nothing |
| Local AI Transform — Gemma 4 (.md prompt file) | Apple Silicon (on-device) | ✓ Nothing |
| CLI one-shot / --resume | Your Mac or remote server | Task text → Anthropic API |
| Remote Control API | Your Mac → Anthropic API | Task text → running session |
The voice capture, transcription, and local AI transform are all 100% local. Your spoken words never leave your Mac as audio. Only the final structured task text is sent — and with the Remote Control API, that goes directly into your already-authenticated running session. For teams working on proprietary code: your voice describing the code never touches a third-party server as raw audio.
Config as Code: Agents Configure Agents
SpeechButton's config.toml is a plain text file. Your Claude Code agent can modify it.
"Hey Claude, add a new SpeechButton channel on Right Command channel 3 that connects to the staging RC session with a deployment task format."
Claude reads your config.toml, adds the hotkey, saves the session ID, and sets up the integration exec. Your voice-routing setup evolves as your infrastructure grows.
This creates a recursive loop: you use voice to control agents, and agents configure how your voice controls them. The system gets better the more you use it.
What Competitors Can't Do
SuperWhisper and Wispr Flow are dictation tools. They transcribe speech and paste it where your cursor is. Full stop.
They can't:
- × Route different hotkeys to different Claude Code integration methods
- × Transform speech with a built-in local AI model — offline, free, instant
- × Post directly into a running Claude Code session via Remote Control API
- × Resume multi-turn agent conversations by voice
- × Let you orchestrate remote agents from a MacBook without opening a terminal
- × Be configured by the agents they control
SpeechButton isn't a dictation tool. It's a voice control layer for AI agent infrastructure. The combination of per-hotkey routing, local AI transforms, four integration methods, and Remote Control API support is unique in the market.
Get Started
Prerequisites
- macOS 15+ (Sequoia), Apple Silicon (M1/M2/M3/M4)
- Claude Code CLI installed (`npm install -g @anthropic-ai/claude-code`)
- For Remote Control API: Claude Code logged in (`~/.claude/.credentials.json` present)
Quick Start — CLI One-Shot (simplest)
1. Download SpeechButton — free 15 minutes/day, no account needed
2. Copy the config.toml from above into `~/.config/speechbutton/config.toml`
3. Save the exec script as `~/.config/speechbutton/integrations/send_claude_code.py` and `chmod +x` it
4. Hold Right Command, tap 1, speak a task, release. Your first voice-dispatched agent task will execute in under 10 seconds.
5. Upgrade to Remote Control API — start Claude Code with `claude --remote-control`, save `send_claude_remote.py` into `integrations/`, and Right Command channel 3 will auto-detect your running session
Start voice-controlling your agents today
Free 15 min/day · No account needed · macOS 15+ · Apple Silicon
Download for macOS — Free

Pro ($7.99/mo) removes the daily limit. Requires macOS 15+ and Apple Silicon.