OpenClaw Does Everything. You Still Have to Type.
OpenClaw is the open-source AI agent that connects to your entire digital life — WhatsApp, Slack, Telegram, email, calendar, GitHub, and dozens more. It can send messages, schedule meetings, create issues, and manage your workflows. All from a single AI assistant.
But to use it, you type. Or you tap. Or you use its built-in voice wake word. The problem: OpenClaw's voice features are designed for conversational interactions, not rapid-fire task dispatch. You wait for it to wake, speak, wait for confirmation, confirm.
SpeechButton is different. Hold. Speak. Release.
"Send Alex on WhatsApp that I'll be 10 minutes late to the meeting and ask him to start without me."
Release. SpeechButton transcribes your words, transforms them into an OpenClaw command, and sends it to the local Gateway. OpenClaw routes the message to Alex on WhatsApp. Total time: 3 seconds.
"Sounds good, let's do Thursday at 3. I'll book the room."
A short reply like this goes out as raw text: no transform, dispatched through OpenClaw to whoever messaged you last. Two seconds.
The task hits OpenClaw's Gateway in under a second. No wake word. No confirmation dialog. No waiting.
How It Works: Voice → STT → Transform → OpenClaw Gateway
    Your voice ──▶ SpeechButton STT ──▶ Transform ──▶ HTTP API ──▶ OpenClaw Agent
    (20ms)         (100% offline)       (Gemma 4)     (localhost)  (WhatsApp, Slack, Telegram, …)
OpenClaw exposes an OpenAI-compatible HTTP API on localhost. SpeechButton's exec field runs a small script that posts your structured task directly to the OpenClaw gateway.
Each piece is simple:
- SpeechButton captures your voice and transcribes it locally (Apple Neural Engine, 100% offline)
- Transform (Gemma 4) cleans and structures your spoken command locally
- Exec script sends the structured task to OpenClaw's HTTP API
- OpenClaw agent receives and executes it across WhatsApp, Slack, Telegram, and 30+ platforms
The API is OpenAI-compatible — standard JSON, Bearer token auth, streaming support. No special SDK needed.
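If you turn streaming on, the reply arrives as a series of chunks rather than one JSON body. The sketch below assumes the gateway follows OpenAI's streaming convention (server-sent `data:` lines, a `[DONE]` sentinel, text under `choices[].delta.content`). That is an assumption drawn from the "OpenAI-compatible" claim, not something verified against OpenClaw, and `iter_stream_text` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and anything that is not a data event.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # OpenAI's end-of-stream sentinel (assumed here)
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


if __name__ == "__main__":
    sample = [
        'data: {"choices": [{"delta": {"content": "Message "}}]}',
        "",
        'data: {"choices": [{"delta": {"content": "sent."}}]}',
        "data: [DONE]",
    ]
    print("".join(iter_stream_text(sample)))  # prints "Message sent."
```

The parser works on plain strings, so you can test it against captured output before wiring it to a live response.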
Setup
Check the prerequisites, then add a config entry, an exec script, and a transform prompt. Three files, five minutes.
Prerequisites
- macOS 14+ (Sonoma)
- OpenClaw installed and gateway running (openclaw status)
- Gateway auth token set (openclaw config set gateway.auth.token "your-token")
SpeechButton config.toml
Holding RightCommand sends your speech to the OpenClaw agent: the exec script posts the transformed text to OpenClaw's HTTP API.
    # ~/.config/speechbutton/config.toml
    [global]
    model = "parakeet-tdt-0.6b-v3-int8"
    language = "en"
    auto_punctuation = true

    [vad]
    enabled = true
    chunk_silence_sec = 1.0

    # Send to OpenClaw agent via HTTP API
    [[hotkey]]
    key = "RightCommand"
    channel = "9"
    name = "openclaw"
    transform = "__local__"
    exec = "integrations/send_openclaw.py"
Exec script
The exec script receives the transformed text on stdin and sends it to OpenClaw's HTTP API.
    #!/usr/bin/env python3
    """Send voice command to OpenClaw agent via HTTP API."""
    import json
    import os
    import sys
    import urllib.error
    import urllib.request

    OPENCLAW_URL = os.getenv("OPENCLAW_URL", "http://127.0.0.1:18789")
    OPENCLAW_TOKEN = os.getenv("OPENCLAW_GATEWAY_TOKEN", "")
    # Optional first argument selects the OpenClaw session.
    SESSION_KEY = sys.argv[1] if len(sys.argv) > 1 else "speechbutton"

    # SpeechButton pipes the transformed text to stdin.
    text = sys.stdin.read().strip()
    if not text:
        sys.exit(0)  # nothing to send

    payload = json.dumps({
        "model": "openclaw/default",
        "messages": [{"role": "user", "content": text}],
    }).encode()

    req = urllib.request.Request(
        f"{OPENCLAW_URL}/v1/chat/completions",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {OPENCLAW_TOKEN}",
            "x-openclaw-session-key": SESSION_KEY,
        },
    )

    try:
        # Close the response cleanly; the reply body is not used here.
        with urllib.request.urlopen(req) as resp:
            resp.read()
    except urllib.error.URLError as err:
        sys.exit(f"OpenClaw request failed: {err}")
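The exec script sends the task and ignores whatever comes back. For workflows where you want the agent's answer, the sketch below reads the reply text from the response. It assumes a non-streaming response shaped like OpenAI's chat completions (`choices[0].message.content`), which is inferred from the "OpenAI-compatible" description rather than confirmed. `extract_reply` and `send_and_read` are hypothetical helper names, and the 30-second timeout is an arbitrary choice.

```python
import json
import urllib.error
import urllib.request


def extract_reply(body: bytes) -> str:
    """Return the assistant text from an OpenAI-style chat completion body."""
    data = json.loads(body)
    choices = data.get("choices") or []
    if not choices:
        return ""
    message = choices[0].get("message") or {}
    return (message.get("content") or "").strip()


def send_and_read(req: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send a prepared request and return the reply text, or raise RuntimeError."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return extract_reply(resp.read())
    except urllib.error.HTTPError as err:
        # HTTPError must be caught first: it is a subclass of URLError.
        raise RuntimeError(f"gateway returned HTTP {err.code}") from err
    except urllib.error.URLError as err:
        raise RuntimeError(f"gateway unreachable: {err.reason}") from err
```

Replacing the final `urlopen` call in the script with `print(send_and_read(req))` would write the agent's reply to stdout. Whether SpeechButton does anything with an exec script's output is not covered here.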
Transform — spoken task → structured agent command
The transform is a prompt file that cleans up your speech before it reaches the agent. No script, no API key — SpeechButton applies it locally using the built-in Gemma 4 model.
You say:
"tell Sarah on Telegram that the design review is moved to 3pm and ask if that works for her"
    # prompts/default.md
    # Default transform: clean up spoken input for an AI agent
    Clean up this spoken command for an AI agent. Fix grammar, remove filler words,
    make the instruction clear and actionable. Output ONLY the cleaned command.
What the agent receives:
Tell Sarah on Telegram that the design review has been moved to 3pm and ask if that time works for her.
Your stream-of-consciousness became a clear, actionable instruction. The agent can execute it immediately.
Multiple agents — route different hotkeys to different sessions
Add one [[hotkey]] entry per agent. Each hotkey targets a different OpenClaw session via the x-openclaw-session-key header.
    # RightCommand+9 → PM agent (with AI transform)
    [[hotkey]]
    key = "RightCommand"
    channel = "9"
    name = "openclaw-pm"
    transform = "prompts/default.md"
    exec = "integrations/send_openclaw.py"

    # RightCommand+1 → Dev agent (with task-specific transform)
    [[hotkey]]
    key = "RightCommand"
    channel = "1"
    name = "openclaw-dev"
    transform = "prompts/claude_code_task.md"
    exec = "integrations/send_openclaw.py dev"
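The exec script uses its first argument as the session key and falls back to "speechbutton" when none is given, so the PM hotkey uses the default session and the dev hotkey uses "dev". Each session maintains its own conversation history in OpenClaw, so your PM agent and dev agent stay independent.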
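To check which session each hotkey will reach, you can mirror the script's argument handling. This sketch assumes SpeechButton splits the exec string shell-style, so that `dev` arrives as `sys.argv[1]`; the document does not spell that out. `session_for_exec` is a hypothetical helper name.

```python
import shlex

DEFAULT_SESSION = "speechbutton"  # same fallback the exec script uses


def session_for_exec(exec_value: str) -> str:
    """Return the session key an exec string would pass to send_openclaw.py."""
    parts = shlex.split(exec_value)
    # parts[0] is the script path; parts[1], if present, is the session key.
    return parts[1] if len(parts) > 1 else DEFAULT_SESSION


if __name__ == "__main__":
    hotkeys = {
        "openclaw-pm": "integrations/send_openclaw.py",
        "openclaw-dev": "integrations/send_openclaw.py dev",
    }
    for name, exec_value in hotkeys.items():
        print(f"{name} -> {session_for_exec(exec_value)}")
```

Two hotkeys that resolve to the same key share one conversation history, so give each agent a distinct argument if you want them kept apart.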
Real Workflows
You're deep in work. OpenClaw handles the communications layer. Your voice is the interface.
Cross-platform messaging without opening anything
You need to reach people on different platforms. Normally: open WhatsApp, type. Open Slack, type. Open Telegram, type.
"Tell Sarah on Telegram that the design review is moved to 3pm and ask if that works for her."
"Send a message to the engineering Slack channel that the deploy is on hold until we fix the flaky test."
"Reply to Alex's last WhatsApp message saying yes, I can meet at the coffee shop at noon."
Three platforms. Three messages. 20 seconds of speaking. Zero app switching.
Multi-step tasks
OpenClaw can chain actions. Your voice triggers complex workflows:
"Schedule a meeting with the product team for next Tuesday at 2pm, send calendar invites to Alex, Maria, and James, and post a reminder in the product Slack channel."
OpenClaw handles the calendar creation, the invites, and the Slack notification. You spoke one sentence.
Morning briefing
"Give me a summary of overnight messages — anything urgent from WhatsApp or Slack that I need to handle before standup."
OpenClaw scans your messaging platforms, summarizes, and responds. You're briefed while making coffee.
Quick replies while coding
Someone messages you. You don't want to context-switch.
"Sounds good, let's do Thursday at 3. I'll book the room."
Command+6 is your raw-message channel. No transform, no overhead. Dispatched through OpenClaw in two seconds.
Why SpeechButton + OpenClaw > OpenClaw Alone
OpenClaw already has voice features — Voice Wake and Talk Mode. Why add SpeechButton?
| | OpenClaw Voice | SpeechButton + OpenClaw |
|---|---|---|
| Activation | Wake word ("Hey Claw") | Push-to-talk hotkey (instant) |
| Latency | Wake word detection + confirmation | 20ms capture, no confirmation |
| Multiple targets | One conversation at a time | Per-hotkey routing to different tasks |
| Offline STT | Cloud-dependent (platform varies) | 100% local Apple Neural Engine |
| Transform | No pre-processing pipeline | Custom transforms per channel |
| Rapid-fire | Conversational back-and-forth | Hold-speak-release, next command |
SpeechButton is for rapid-fire task dispatch — you know what you want, you say it, it's done. OpenClaw's built-in voice is for conversational interactions where you need back-and-forth. They complement each other.
Per-hotkey routing is the key differentiator. For example, you might map Command+5 to agent tasks, Command+6 to quick messages, Command+2 to Slack (directly, bypassing OpenClaw), and Command+3 to Linear. You choose the best path per command: sometimes through OpenClaw, sometimes direct.
Privacy
Here's exactly what stays on your Mac and what goes to external services:
| Component | Where it runs | Data sent externally |
|---|---|---|
| Voice capture | Your Mac | ✓ Nothing |
| Speech-to-text (Parakeet V3) | Apple Neural Engine | ✓ Nothing |
| Transform (prompts/default.md) | Your Mac (Gemma 4, local) | ✓ Nothing |
| Exec → OpenClaw Gateway | localhost HTTP (127.0.0.1) | ✓ Nothing — never leaves your Mac |
| OpenClaw agent execution | Your Mac | Depends on the task (sends messages to WhatsApp, Slack, etc.) |
Voice capture, STT, and the AI transform are 100% local — no audio ever leaves your Mac. The task text is sent to OpenClaw's gateway on localhost (127.0.0.1) — it never touches the internet until OpenClaw executes the task on your chosen platform.
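The localhost guarantee holds only while `OPENCLAW_URL` points at the loopback interface. The exec script reads that value from the environment, so it could be overridden. A minimal guard is sketched below; `is_loopback_url` is a hypothetical helper, and treating the name `localhost` as loopback assumes your hosts file has not remapped it.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_url(url: str) -> bool:
    """Return True only if the URL's host is localhost or a loopback IP."""
    host = urlparse(url).hostname
    if host is None:
        return False  # no scheme or no host: refuse rather than guess
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # some other DNS name, not verifiably local


if __name__ == "__main__":
    for candidate in ("http://127.0.0.1:18789", "http://192.168.1.20:18789"):
        print(candidate, is_loopback_url(candidate))
```

Calling this at the top of the exec script and exiting when it returns False would keep a misconfigured environment variable from sending task text to another machine.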
Get Started
Prerequisites
- macOS 14+ (Sonoma), Apple Silicon
- OpenClaw installed and gateway running (openclaw status)
- At least one messaging channel connected (WhatsApp, Slack, Telegram, etc.)
Quick Start
1. Download SpeechButton: free 5 minutes/day, no account needed
2. Start the OpenClaw gateway: openclaw gateway starts the HTTP API on localhost:18789
3. Save the exec script as ~/.config/speechbutton/integrations/send_openclaw.py and chmod +x it
4. Add a [[hotkey]] entry to config.toml with exec = "integrations/send_openclaw.py"
5. Hold RightCommand, speak a task, release. Your voice routes to OpenClaw's gateway on localhost, and the agent executes it immediately.
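Before pointing the hotkey at a live agent, it can help to see exactly what the exec script posts. The sketch below is a stand-in gateway built on Python's standard `http.server`. It records the path, session header, and task text of each request, and answers with a minimal OpenAI-style body. It is a test stub, not OpenClaw's gateway; `StubGateway`, `start_stub`, and `received` are hypothetical names.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # requests captured by the stub, newest last


class StubGateway(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        messages = body.get("messages") or [{}]
        received.append({
            "path": self.path,
            "session": self.headers.get("x-openclaw-session-key"),
            "text": messages[-1].get("content"),
        })
        reply = json.dumps({"choices": [{"message": {"content": "ok"}}]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):
        pass  # keep the terminal quiet


def start_stub(port: int = 0) -> HTTPServer:
    """Start the stub on 127.0.0.1 in a background thread; port 0 picks a free one."""
    server = HTTPServer(("127.0.0.1", port), StubGateway)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

From a Python session, call `server = start_stub()` and read the port from `server.server_address[1]`. Then run the exec script with `OPENCLAW_URL` set to that address and inspect `received` to confirm the text and session key are what you expect.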
Your voice becomes the universal controller
Free 5 min/day · No account needed · macOS 14+ · Apple Silicon
Download for macOS - Free
Pro ($7.99/mo) removes the daily limit. Requires macOS 15+ and Apple Silicon.