SpeechButton + Claude Code: Voice Control Remote AI Agents

Your Agents Are Waiting for Instructions. Stop Typing Them.

You have Claude Code agents running on remote servers. One is refactoring the payment module. Another is fixing CI. A third is writing tests for the new API. Each agent sits in a terminal, ready for its next task.

The bottleneck is you. You type the task. Switch terminals. Type another task. Switch again. Copy-paste context between windows. By the time you've dispatched three tasks, five minutes are gone and your flow state with them.

What if you could just speak?

⌘1

"Refactor the payment webhook handler — extract the Stripe-specific logic into a separate adapter so we can add PayPal later."

Release. SpeechButton transcribes in 20ms, the built-in local AI structures it into a task, and dispatches it to your Claude Code agent. The agent starts working on your server. You never switched windows.

⌘2

"The CI is failing on the auth integration tests — looks like the test database isn't being seeded. Fix the setup script."

A different hotkey, a different integration method, a different agent. Two tasks dispatched in 15 seconds.

Zero typing. Zero context switching. Three integration methods to fit your workflow.

Four Ways to Reach Claude Code

SpeechButton supports four integration methods, each suited to different workflows. Pick the one that fits, or mix them across hotkeys.

Method	Best for	Streaming	Multi-turn	Remote
CLI one-shot	Quick, self-contained tasks	—	—	—
CLI --resume	Continuing a conversation	—	✓	—
Remote Control API recommended	Full control, running sessions	✓	✓	✓
Channels (MCP)	Bidirectional, webhooks	✓	✓	—

All four use the same local AI transform (Gemma 4): your voice goes through a built-in on-device model that structures your spoken task into a clean prompt before it's dispatched. No cloud, no API key, no cost for the transform step.

The Architecture: Voice → Local AI Transform (.md) → Claude Code

flowchart LR A["🎤 Your Voice"] -->|20ms| B["SpeechButton\nSTT Engine"] B -->|raw text| C["Local AI\nTransform (.md)"] C -->|structured task| D{"Integration\nExec"} D -->|"CLI -p"| E["Claude Code\nOne-Shot"] D -->|"--resume"| F["Claude Code\nSession"] D -->|"POST /events"| G["Claude Code\nRemote Control"] D -->|"Channel MCP"| H["Claude Code\nChannel"] style A fill:#fefce8,stroke:#eab308,color:#713f12 style B fill:#eff6ff,stroke:#2563eb,color:#1e3a5f style C fill:#f0fdf4,stroke:#16a34a,color:#14532d style D fill:#faf5ff,stroke:#9333ea,color:#581c87 style E fill:#f8fafc,stroke:#64748b,color:#334155 style F fill:#f8fafc,stroke:#64748b,color:#334155 style G fill:#eff6ff,stroke:#2563eb,color:#1e40af,stroke-width:3px

🎤

Voice Capture

20ms latency

🧠

On-Device STT

Apple Neural Engine

✨

Local AI Transform

Local AI Transform · free · offline

⚡

Claude Code

CLI / Resume / RC API

The Local AI Transform is the simplest path: no API calls, no latency, no cost. SpeechButton's built-in Gemma 4 model structures your voice into a clean task before the integration exec runs.

Setup: config.toml + Integration Scripts

One config file with hotkey channels. Exec scripts live in integrations/. Prompt files live in prompts/. The Local AI Transform is built in — just point transform at your .md file.

SpeechButton config.toml

Two hotkeys — one for CLI one-shot, one for Remote Control. Both use .md prompt files for offline task structuring.

toml — ~/.config/speechbutton/config.toml

[global]
model = "parakeet-tdt-0.6b-v3-int8"
language = "auto"
auto_punctuation = true

[[hotkey]]
key = "RightCommand"
name = "default"
paste = "accessibility"

[[hotkey]]
key = "RightCommand"
channel = "1"
name = "claude-code"
transform = "prompts/claude_code_task.md"
exec = "integrations/send_claude_code.py"

[[hotkey]]
key = "RightCommand"
channel = "3"
name = "claude-remote"
transform = "prompts/default.md"
exec = "integrations/send_claude_remote.py"

The Local AI Transform runs a built-in small language model on your Mac using a .md prompt file to structure your voice into a developer task prompt — completely offline, no API key, no per-request cost.

Method 1

CLI One-Shot

The simplest method. Starts a new Claude Code process, runs the task, exits. Best for self-contained tasks where you don't need context from a prior conversation.

python — integrations/send_claude_code.py

#!/usr/bin/env python3
"""Send text to Claude Code CLI as a one-shot prompt."""
import os, shutil, subprocess, sys

def find_claude_binary():
    found = shutil.which("claude")
    if found: return found
    for path in [os.path.expanduser("~/.local/bin/claude"), "/opt/homebrew/bin/claude"]:
        if os.path.exists(path): return path
    print("claude CLI not found", file=sys.stderr)
    sys.exit(1)

def main():
    task = sys.stdin.read().strip()
    if not task: sys.exit(0)
    claude = find_claude_binary()
    result = subprocess.run(
        [claude, "--print", "--bare", "-p", task],
        capture_output=True, text=True, timeout=120,
    )
    if result.returncode == 0:
        response = result.stdout.strip()
        print(f"Claude: {response.split(chr(10))[0][:100]}" if response else "Done")
    else:
        print(f"Error: {result.stderr[:100]}", file=sys.stderr)
        sys.exit(1)

if __name__ == "__main__":
    main()

toml — SpeechButton config.toml

# ⌘+1 → Claude Code one-shot
[[hotkey]]
key = "RightCommand"
channel = "1"
name = "claude-code"
transform = "prompts/claude_code_task.md"
exec = "integrations/send_claude_code.py"

You say:

"Add a missing index on the users table email column — write and run the migration."

Claude Code spins up, writes the migration, applies it, exits. No session to manage.

Method 2

CLI --resume

Resumes a previous Claude Code session by ID. The agent remembers the full prior context — what files it read, what it changed, what it was thinking. Use this when you want to continue a conversation that's already in progress.

bash — integrations/send_claude_resume.sh

#!/bin/bash
# integrations/send_claude_resume.sh SESSION_ID
# Resumes a Claude Code session and sends a follow-up task

SESSION_ID="$1"
TASK=$(cat)

claude --bare -p "$TASK" \
  --resume "$SESSION_ID" \
  --allowedTools "Read,Edit,Bash" \
  --output-format json \
  2>/dev/null \
  | jq -r '.result // "Done"'

toml — SpeechButton config.toml

# ⌘+2 → Resume Claude Code session
[[hotkey]]
key = "RightCommand"
channel = "2"
name = "claude-resume"
transform = "prompts/claude_code_task.md"
exec = "integrations/send_claude_resume.sh SESSION_ID_HERE"

You say (follow-up to an earlier task):

"Good. Now add a composite index on user_id and created_at as well — same migration pattern."

The agent picks up exactly where it left off. Set the session ID in your config and that hotkey becomes a voice channel into a persistent conversation.

Method 3

Remote Control API recommended

The most powerful method. Posts a message directly into a running Claude Code session via the Anthropic Remote Control API. The agent is already running, already has context, and responds immediately with streaming output. No startup latency, no lost state.

flowchart LR V["🎤 Voice"] -->|20ms| S["STT"] S --> T["Local AI\nTransform (.md)"] T --> P["POST /v1/sessions/\n{id}/events"] P -->|Anthropic API| R["Running\nClaude Code"] R --> F["Reads, edits,\nruns tests"] style V fill:#fefce8,stroke:#eab308,color:#713f12 style S fill:#eff6ff,stroke:#2563eb,color:#1e3a5f style T fill:#f0fdf4,stroke:#16a34a,color:#14532d style P fill:#faf5ff,stroke:#9333ea,color:#581c87 style R fill:#eff6ff,stroke:#2563eb,color:#1e40af,stroke-width:3px style F fill:#f8fafc,stroke:#64748b,color:#334155

How to get your session ID

Start Claude Code with claude --remote-control
The session ID appears in the startup output, or is auto-detected from bridge-pointer.json
Auth token is read automatically from ~/.claude/.credentials.json

python — integrations/send_claude_remote.py

#!/usr/bin/env python3
"""Send voice command to a running Claude Code Remote Control session.

Usage: echo "task text" | python3 send_claude_remote.py [session_id]
If session_id is omitted, auto-detects from bridge-pointer.json.
Auth token is read from ~/.claude/.credentials.json automatically.
"""
import json, os, pathlib, sys, urllib.request, uuid

def find_session_id():
    """Scan common locations for bridge-pointer.json."""
    candidates = [
        pathlib.Path.home() / ".claude" / "bridge-pointer.json",
        pathlib.Path("/tmp/claude-bridge-pointer.json"),
    ]
    for p in candidates:
        if p.exists():
            return json.loads(p.read_text())["sessionId"]
    raise RuntimeError(
        "No session ID provided and bridge-pointer.json not found. "
        "Start Claude Code with --remote-control or pass session ID as argument."
    )

def resolve_session_id(args):
    """Return session ID from CLI arg or auto-detect."""
    if len(args) > 1 and args[1] not in ("", "SESSION_ID_HERE"):
        return args[1]
    return find_session_id()

def get_oauth_token():
    """Read OAuth access token from Claude Code credentials file."""
    creds_path = pathlib.Path.home() / ".claude" / ".credentials.json"
    if not creds_path.exists():
        raise FileNotFoundError(
            f"Credentials not found at {creds_path}. Run 'claude login' first."
        )
    return json.loads(creds_path.read_text())["claudeAiOauth"]["accessToken"]

def get_org_uuid(token):
    """Fetch organization UUID from Anthropic OAuth profile endpoint."""
    req = urllib.request.Request(
        "https://api.anthropic.com/api/oauth/profile",
        headers={"Authorization": f"Bearer {token}"},
    )
    profile = json.loads(urllib.request.urlopen(req).read())
    return profile["organization"]["uuid"]

def send_message(session_id, text, token, org_uuid):
    """POST a user message event to the Remote Control API."""
    data = json.dumps({
        "events": [{
            "uuid": str(uuid.uuid4()),
            "session_id": session_id,
            "type": "user",
            "parent_tool_use_id": None,
            "message": {"role": "user", "content": text},
        }]
    }).encode()
    req = urllib.request.Request(
        f"https://api.anthropic.com/v1/sessions/{session_id}/events",
        data=data,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "anthropic-version": "2023-06-01",
            "anthropic-beta": "ccr-byoc-2025-07-29",
            "x-organization-uuid": org_uuid,
        },
    )
    urllib.request.urlopen(req)

def main():
    text = sys.stdin.read().strip()
    if not text:
        sys.exit(0)
    try:
        session_id = resolve_session_id(sys.argv)
        token = get_oauth_token()
        org_uuid = get_org_uuid(token)
        send_message(session_id, text, token, org_uuid)
        print(f"Sent to RC session {session_id[:8]}...: {text[:60]}")
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)

if __name__ == "__main__":
    main()

toml — SpeechButton config.toml

# ⌘+3 → Remote Control session (recommended)
[[hotkey]]
key = "RightCommand"
channel = "3"
name = "claude-remote"
transform = "prompts/default.md"
exec = "integrations/send_claude_remote.py"

You say (to a running agent):

"There's a race condition in the session middleware — the refresh token is being read before the write completes causing a 500 after about thirty minutes."

The running agent receives the message and starts investigating immediately — with full context of everything it's already read in this session. Response streams back in real time. No startup latency.

🔌

Channels (MCP)

Bidirectional communication via MCP server. Push events into a running Claude Code session through a local webhook. Two-way: Claude can reply back through the channel.

start claude code with channel

claude --dangerously-load-development-channels server:speechbutton

send a task via channel webhook

curl -X POST http://localhost:8788 \
  -H "Content-Type: application/json" \
  -d '{"message": "Fix the auth bug in login.rs"}'

MCP Channel Server (TypeScript)

The channel needs an MCP server that listens for HTTP requests and pushes them into the Claude Code session. Here's a minimal implementation:

typescript — speechbutton-channel.ts

// MCP channel server for SpeechButton voice commands
// Install: bun add @modelcontextprotocol/sdk

import { Server } from "@modelcontextprotocol/sdk/server/index.js"
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js"

const mcp = new Server(
  { name: "speechbutton", version: "0.0.1" },
  {
    capabilities: {
      experimental: { "claude/channel": {} }
    },
    instructions: "Voice commands from SpeechButton arrive as <channel> tags. Execute them.",
  },
)

await mcp.connect(new StdioServerTransport())

// HTTP webhook — SpeechButton POSTs voice tasks here
Bun.serve({
  port: 8788,
  hostname: "127.0.0.1",
  async fetch(req) {
    const body = await req.text()
    await mcp.notification({
      method: "notifications/claude/channel",
      params: {
        content: body,
        meta: {
          source: "speechbutton",
          type: "voice",
          timestamp: new Date().toISOString(),
        },
      },
    })
    return new Response("ok")
  },
})

Register in .mcp.json

Add the channel server to your project's MCP config so Claude Code loads it on startup:

json — .mcp.json

{
  "mcpServers": {
    "speechbutton": {
      "command": "bun",
      "args": ["./speechbutton-channel.ts"]
    }
  }
}

SpeechButton config.toml

Add a hotkey channel that sends voice to the MCP channel server:

toml — config.toml (add to existing hotkeys)

# Channel 4: Claude Code via MCP Channel (bidirectional)
# Hold RightCommand, tap 4, speak → task pushed into running session
[[hotkey]]
key = "RightCommand"
channel = "4"
name = "claude-channel"
transform = "prompts/claude_code_task.md"
exec = "curl -s -X POST http://localhost:8788 -d @-"

When SpeechButton sends a voice command via curl POST localhost:8788, it arrives in Claude Code as:

what claude code receives

<channel source="speechbutton" type="voice">
Fix the auth bug in login.rs
</channel>

Channels are ideal for automation pipelines: CI/CD triggers, file watchers, or any system that needs to push tasks into a running agent. The MCP server handles message routing and session management.

The Power of Local AI Transform

The transform pipeline is what makes this more than dictation. Setting transform to a .md prompt file activates SpeechButton's Local AI Transform using the built-in Gemma 4 model. It structures your voice before anything leaves your Mac.

Without transform (raw dictation)

"there's a race condition in the session middleware the refresh token is being read before the write completes causing a five hundred after about thirty minutes"

With Local AI Transform (.md prompt file)

## Problem

Race condition in session middleware: refresh token read before write completes → HTTP 500 after ~30 minutes.

## Expected Behavior

Token refresh and read should be atomic.

## Steps to Investigate

Check concurrent access in session middleware
Look for missing locks on token write
Verify 30-minute window matches token expiry

The structured version makes Claude Code 2–3x more effective. It doesn't waste turns asking clarifying questions. It doesn't misinterpret "five hundred" as a number instead of an HTTP status code.

The Local AI Transform model runs entirely on your Apple Silicon — no network call, no API key, no per-use cost. Each hotkey channel can use a different .md prompt file for specialized formatting.

And each channel can have a different transform:

⌘1 — prompts/claude_code_task.md general task structuring (Problem/Expected/Steps)
⌘2 — prompts/bug_report.md for bug report format (Reproduce/Expected/Actual)
⌘3 — prompts/test_spec.md for test specification (Given/When/Then)

Same voice. Different structure. Different agent. All from config.toml.

Real Workflow: Voice-Controlling 3 Remote Agents

You're leading a sprint. Three agents on your build server, each in its own worktree. You orchestrate them by voice from your MacBook.

9:00 — Morning kickoff (CLI one-shot)

⌘1

"Start implementing the GraphQL subscription for real-time order updates. Use the existing OrderEvent type and add a WebSocket transport layer. Check how the REST endpoint works in src/api/orders.rs and mirror the data model."

Agent #1 starts reading the codebase and implementing.

9:01 — Bug fix in parallel (Remote Control API)

⌘2

"Users are reporting that CSV exports are truncated at 10,000 rows. The export handler in src/export/csv.rs probably has a hardcoded limit. Find it, remove it, add a streaming writer so memory doesn't blow up on large exports."

Sent directly into the running RC session. Agent #2 responds within a second. You're drinking coffee.

9:02 — Test coverage (CLI --resume)

⌘3

"We're missing integration tests for the billing webhook. Write tests that cover successful payment, failed payment, subscription upgrade, and subscription cancellation. Use the existing test fixtures in tests/fixtures/billing."

Agent #3 resumes its previous session and continues from prior context. Three agents, three tasks, under two minutes.

9:30 — Check-in and course correction (Remote Control API)

⌘2

"How's the GraphQL subscription going? If you've got the basic query working, add cursor-based pagination before the mutation handler."

The agent continues with updated instructions. Full prior context. You corrected course in 5 seconds.

10:00 — Merge and ship

⌘2

"If the CSV fix is tested and passing, create a PR with the title 'fix: remove hardcoded row limit in CSV export, add streaming writer'."

The agent runs tests, commits, and opens a PR. From your voice to a pull request.

Why This Only Works with 20ms Capture

When you're orchestrating agents by voice, you're giving rapid-fire instructions. Hold hotkey, speak, release. Hold another hotkey, speak, release. The rhythm is fast — you're thinking out loud, dispatching as ideas form.

At 200ms (SuperWhisper, Wispr Flow)

The first word of every instruction gets clipped. "Start implementing the GraphQL..." becomes "implementing the GraphQL..." Your agent misses the verb. It misses the intent.

At 20ms (SpeechButton)

Every word lands. The first syllable of "Start" is captured. Your instruction is complete. Over a morning of 30+ voice dispatches, every instruction is complete — agents do what you meant, not what they guess.

Over a morning of 30+ voice dispatches, the difference between complete and clipped instructions is the difference between agents that do what you meant and agents that guess.

100% Offline Voice Capture

Here's exactly what stays on your Mac and what goes to the cloud:

Component	Where it runs	Data sent externally
Voice capture	Your Mac	✓ Nothing
Speech-to-text (Parakeet V3)	Apple Neural Engine	✓ Nothing
Local AI Transform — Gemma 4 (`.md prompt file`)	Apple Silicon (on-device)	✓ Nothing
CLI one-shot / --resume	Your Mac or remote server	Task text → Anthropic API
Remote Control API	Your Mac → Anthropic API	Task text → running session

The voice capture, transcription, and local AI transform are all 100% local. Your spoken words never leave your Mac as audio. Only the final structured task text is sent — and with the Remote Control API, that goes directly into your already-authenticated running session. For teams working on proprietary code: your voice describing the code never touches a third-party server as raw audio.

Config as Code: Agents Configure Agents

SpeechButton's config.toml is a plain text file. Your Claude Code agent can modify it.

"Hey Claude, add a new SpeechButton channel on Right Command channel 3 that connects to the staging RC session with a deployment task format."

Claude reads your config.toml, adds the hotkey, saves the session ID, and sets up the integration exec. Your voice-routing setup evolves as your infrastructure grows.

This creates a recursive loop: you use voice to control agents, and agents configure how your voice controls them. The system gets better the more you use it.

What Competitors Can't Do

SuperWhisper and Wispr Flow are dictation tools. They transcribe speech and paste it where your cursor is. Full stop.

They can't:

× Route different hotkeys to different Claude Code integration methods
× Transform speech with a built-in local AI model — offline, free, instant
× Post directly into a running Claude Code session via Remote Control API
× Resume multi-turn agent conversations by voice
× Let you orchestrate remote agents from a MacBook without opening a terminal
× Be configured by the agents they control

SpeechButton isn't a dictation tool. It's a voice control layer for AI agent infrastructure. The combination of per-hotkey routing, local AI transforms, four integration methods, and Remote Control API support is unique in the market.

Get Started

Prerequisites

macOS 15+ (Sequoia), Apple Silicon (M1/M2/M3/M4)
Claude Code CLI installed (npm install -g @anthropic-ai/claude-code)
For Remote Control API: Claude Code logged in (~/.claude/.credentials.json present)

Quick Start — CLI One-Shot (simplest)

1 Download SpeechButton — free 5 minutes/day, no account needed
2 Copy the config.toml from above into ~/.config/speechbutton/config.toml
3 Save the exec script as ~/.config/speechbutton/integrations/send_claude_code.py and chmod +x it
4 Hold Right Command, speak a task, release. Your first voice-dispatched agent task will execute in under 10 seconds.
5 Upgrade to Remote Control API — start Claude Code with claude --remote-control, save send_claude_remote.py into integrations/, and Right Command channel 3 will auto-detect your running session

Start voice-controlling your agents today

Free 5 min/day · No account needed · macOS 15+ · Apple Silicon

 Download for macOS — Free

Pro ($7.99/mo) removes the daily limit. Requires macOS 15+ and Apple Silicon.

Voice ControlRemote AI Agents

Your Agents Are Waiting for Instructions. Stop Typing Them.

Four Ways to Reach Claude Code

The Architecture: Voice → Local AI Transform (.md) → Claude Code

Setup: config.toml + Integration Scripts

SpeechButton config.toml

CLI One-Shot

CLI --resume

Remote Control API recommended

Channels (MCP)

MCP Channel Server (TypeScript)

Register in .mcp.json

SpeechButton config.toml

The Power of Local AI Transform

Real Workflow: Voice-Controlling 3 Remote Agents

Why This Only Works with 20ms Capture

100% Offline Voice Capture

Config as Code: Agents Configure Agents

What Competitors Can't Do

Get Started

Start voice-controlling your agents today

Coming to your platform soon!

You're on the list!

Voice Control
Remote AI Agents