SpeechButton + OpenClaw: Voice Control Your AI Agent Across Every App

OpenClaw Does Everything. You Still Have to Type.

OpenClaw is the open-source AI agent that connects to your entire digital life — WhatsApp, Slack, Telegram, email, calendar, GitHub, and dozens more. It can send messages, schedule meetings, create issues, and manage your workflows. All from a single AI assistant.

But to use it, you type. Or you tap. Or you use its built-in voice wake word. The problem: OpenClaw's voice features are designed for conversational interactions, not rapid-fire task dispatch. You wait for it to wake, speak, wait for confirmation, confirm.

SpeechButton is different. Hold. Speak. Release.

⌘5

"Send Alex on WhatsApp that I'll be 10 minutes late to the meeting and ask him to start without me."

Release. SpeechButton transcribes your words, transforms them into an OpenClaw command, and sends it to the local Gateway. OpenClaw routes the message to Alex on WhatsApp. Total time: 3 seconds.

⌘6

"Sounds good, let's do Thursday at 3. I'll book the room."

Raw text, no transform, dispatched through OpenClaw to whoever messaged you last. Two seconds.

The task hits OpenClaw's Gateway in under a second. No wake word. No confirmation dialog. No waiting.

How It Works: Voice → STT → Transform → OpenClaw Gateway

Your voice ──▶ SpeechButton STT ──▶ Transform ──▶ HTTP API ──▶ OpenClaw Agent
  (20ms)         (100% offline)      (Gemma 4)    (localhost)    (WhatsApp, Slack,
                                                                 Telegram, …)

OpenClaw exposes an OpenAI-compatible HTTP API on localhost. SpeechButton's exec field runs a small Python script that POSTs your structured task directly to the OpenClaw gateway.

Each piece is simple:

SpeechButton captures your voice and transcribes it locally (Apple Neural Engine, 100% offline)
Transform (Gemma 4) cleans and structures your spoken command locally
Exec script sends the structured task to OpenClaw's HTTP API as an HTTP POST (Python standard library only)
OpenClaw agent receives and executes it across WhatsApp, Slack, Telegram, and 30+ platforms

The API is OpenAI-compatible — standard JSON, Bearer token auth, streaming support. No special SDK needed.
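The exec script sends the request but ignores what comes back. If you want the agent's reply (to log it, for example), here is a minimal parsing sketch. It assumes OpenClaw's responses follow OpenAI's chat-completions shape: `choices[0].message.content` for a normal reply, and server-sent `data:` lines ending in `data: [DONE]` when streaming. That shape is inferred from "OpenAI-compatible", not confirmed against OpenClaw's docs, and the canned bodies below are made up for illustration.

```python
import json

def reply_text(body: bytes) -> str:
    """Return the assistant text from a non-streaming chat completion body."""
    data = json.loads(body)
    return data["choices"][0]["message"]["content"]

def stream_text(lines) -> str:
    """Join content deltas from server-sent events ("data: {...}" lines)."""
    parts = []
    for raw in lines:
        line = raw.decode() if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators and ": comment" keep-alives
        chunk = line[len("data:"):].strip()
        if chunk == "[DONE]":
            break  # end-of-stream marker in the OpenAI format
        delta = json.loads(chunk)["choices"][0].get("delta", {})
        parts.append(delta.get("content") or "")  # role-only deltas carry no text
    return "".join(parts)

# Canned bodies in the OpenAI format, for illustration only.
print(reply_text(b'{"choices":[{"message":{"role":"assistant","content":"Sent to Alex."}}]}'))
print(stream_text([
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"Sent "}}]}',
    'data: {"choices":[{"delta":{"content":"to Alex."}}]}',
    "data: [DONE]",
]))
```

To use it in the exec script, call `reply_text(resp.read())` inside the `urlopen` block. For streaming, add `"stream": True` to the payload and pass `resp` itself to `stream_text`, since iterating an HTTP response yields its lines.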

Setup

Prerequisites, config, and an exec script. Three files, five minutes.

Prerequisites

macOS 14+ (Sonoma)
OpenClaw installed and gateway running (openclaw status)
Gateway auth token set (openclaw config set gateway.auth.token "your-token"), with the same value available to the exec script as OPENCLAW_GATEWAY_TOKEN
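`openclaw status` is the primary check. As a complement, this sketch tests the two things the exec script itself depends on: something listening on the port from its default `OPENCLAW_URL` (127.0.0.1:18789), and a non-empty `OPENCLAW_GATEWAY_TOKEN` environment variable. It assumes SpeechButton passes your environment through to exec scripts, which the article does not state. A successful TCP connection only shows that some process is listening, not that it is OpenClaw.

```python
import os
import socket

DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = 18789  # port in the exec script's default OPENCLAW_URL

def gateway_reachable(host: str = DEFAULT_HOST, port: int = DEFAULT_PORT,
                      timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

def token_configured(env=os.environ) -> bool:
    """The exec script sends this variable as its Bearer token."""
    return bool(env.get("OPENCLAW_GATEWAY_TOKEN", "").strip())

if __name__ == "__main__":
    print("gateway reachable:", gateway_reachable())
    print("token configured: ", token_configured())
```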

SpeechButton config.toml

The RightCommand hotkey sends to your OpenClaw agent: its exec script posts the transformed text to OpenClaw's HTTP API.

toml — ~/.config/speechbutton/config.toml

# ~/.config/speechbutton/config.toml

[global]
model = "parakeet-tdt-0.6b-v3-int8"
language = "en"
auto_punctuation = true

[vad]
enabled = true
chunk_silence_sec = 1.0

# Send to OpenClaw agent via HTTP API
[[hotkey]]
key = "RightCommand"
channel = "9"
name = "openclaw"
transform = "__local__"
exec = "integrations/send_openclaw.py"

Exec script

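The exec script receives the transformed text on stdin and posts it to OpenClaw's HTTP API. It reads the gateway URL and token from the OPENCLAW_URL and OPENCLAW_GATEWAY_TOKEN environment variables and takes an optional session key as its first argument.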

python — integrations/send_openclaw.py

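#!/usr/bin/env python3
"""Send a voice command to an OpenClaw agent via its HTTP API."""
import json
import os
import sys
import urllib.request

# Gateway location and auth; the token must match gateway.auth.token.
OPENCLAW_URL = os.getenv("OPENCLAW_URL", "http://127.0.0.1:18789")
OPENCLAW_TOKEN = os.getenv("OPENCLAW_GATEWAY_TOKEN", "")
# Optional first argument selects the OpenClaw session.
SESSION_KEY = sys.argv[1] if len(sys.argv) > 1 else "speechbutton"

# SpeechButton passes the transformed text on stdin.
text = sys.stdin.read().strip()
if not text:
    sys.exit(0)  # nothing was said; send nothing

payload = json.dumps({
    "model": "openclaw/default",
    "messages": [{"role": "user", "content": text}],
}).encode()

req = urllib.request.Request(
    f"{OPENCLAW_URL}/v1/chat/completions",
    data=payload,
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {OPENCLAW_TOKEN}",
        "x-openclaw-session-key": SESSION_KEY,
    },
    method="POST",
)
try:
    # Generous timeout: the agent may finish the task before replying,
    # but a dead gateway should not hang the script forever.
    with urllib.request.urlopen(req, timeout=120) as resp:
        resp.read()
except OSError as exc:
    # URLError, HTTPError (e.g. 401 for a bad token) and timeouts all derive from OSError.
    sys.exit(f"send_openclaw: request to {OPENCLAW_URL} failed: {exc}")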
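To test the request without a live agent (and without sending real messages), point it at a throwaway local server. This sketch starts a fake gateway on a free port, sends the same request the exec script builds (mirrored in a hypothetical `send()` helper), and records what arrived. The canned `"ok"` reply is made up and only mimics the OpenAI response shape.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

captured = {}  # what the fake gateway received

class FakeGateway(BaseHTTPRequestHandler):
    """Records one POST and answers with a minimal chat-completion body."""
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        captured["path"] = self.path
        captured["auth"] = self.headers.get("Authorization")  # case-insensitive lookup
        captured["session"] = self.headers.get("x-openclaw-session-key")
        captured["body"] = json.loads(self.rfile.read(length))
        reply = json.dumps({"choices": [{"message": {"role": "assistant",
                                                      "content": "ok"}}]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):  # silence per-request logging
        pass

def send(text, url, token, session):
    """Hypothetical helper mirroring the request send_openclaw.py builds."""
    payload = json.dumps({"model": "openclaw/default",
                          "messages": [{"role": "user", "content": text}]}).encode()
    req = urllib.request.Request(f"{url}/v1/chat/completions", data=payload, headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
        "x-openclaw-session-key": session,
    })
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)

server = HTTPServer(("127.0.0.1", 0), FakeGateway)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}"

reply = send("Tell Sarah the review moved to 3pm.", url, "test-token", "dev")
server.shutdown()
server.server_close()
print(captured["session"], reply["choices"][0]["message"]["content"])  # → dev ok
```

Replacing `send()` with `subprocess.run(["integrations/send_openclaw.py", "dev"], input=text, text=True, env={**os.environ, "OPENCLAW_URL": url})` exercises the real script against the same fake server.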

Transform — spoken task → structured agent command

The transform is a prompt file that cleans up your speech before it reaches the agent. No script, no API key — SpeechButton applies it locally using the built-in Gemma 4 model.

You say:

"tell Sarah on Telegram that the design review is moved to 3pm and ask if that works for her"

markdown — prompts/default.md

# prompts/default.md
# Default transform — clean up spoken input for an AI agent

Clean up this spoken command for an AI agent.
Fix grammar, remove filler words, make the
instruction clear and actionable.
Output ONLY the cleaned command.

What the agent receives:

Tell Sarah on Telegram that the design review has been moved to 3pm and ask if that time works for her.

Your stream-of-consciousness became a clear, actionable instruction. The agent can execute it immediately.
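Part of that cleanup is mechanical, and part needs a model. For contrast, here is a crude rule-based stand-in (a swapped-in technique, not what SpeechButton does). It strips a hypothetical list of filler words, tidies spacing, and adds capitalization and a final period. Anything beyond that, such as rewriting "is moved" as "has been moved" or making a vague instruction actionable, is why the transform uses a language model.

```python
import re

# Hypothetical filler list for illustration only; \b keeps words like
# "umbrella" or "term" intact.
FILLER = re.compile(r"\b(?:um+|uh+|erm|you know)\b[,\s]*", re.IGNORECASE)

def crude_cleanup(spoken: str) -> str:
    """Remove fillers, collapse whitespace, capitalize, end with punctuation."""
    text = FILLER.sub("", spoken)
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        return ""
    text = text[0].upper() + text[1:]
    return text if text[-1] in ".?!" else text + "."

print(crude_cleanup("um tell Sarah on Telegram uh that the design review is moved to 3pm"))
# → Tell Sarah on Telegram that the design review is moved to 3pm.
```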

Multiple agents — route different hotkeys to different sessions

Add one [[hotkey]] entry per agent. Each hotkey targets a different OpenClaw session via the x-openclaw-session-key header.

toml — multiple agents example

# RightCommand+9 → PM agent (with AI transform)
[[hotkey]]
key = "RightCommand"
channel = "9"
name = "openclaw-pm"
transform = "prompts/default.md"
exec = "integrations/send_openclaw.py"

# RightCommand+1 → Dev agent (with task-specific transform)
[[hotkey]]
key = "RightCommand"
channel = "1"
name = "openclaw-dev"
transform = "prompts/claude_code_task.md"
exec = "integrations/send_openclaw.py dev"

The exec script uses its first argument as the session key; without one (as in the PM entry) it falls back to "speechbutton". Each session maintains its own conversation history in OpenClaw, so your PM agent and dev agent stay independent.
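Routing comes down to one rule in the exec script: the first argument, if any, becomes the `x-openclaw-session-key` header. This sketch restates that rule for an `exec` string. It assumes SpeechButton splits `exec` shell-style into a program and its arguments, which the hotkey examples suggest but the article does not spell out.

```python
import shlex

DEFAULT_SESSION = "speechbutton"  # fallback used by send_openclaw.py

def session_key_for(exec_field: str) -> str:
    """Mirror the exec script's rule: first argument, else the default.

    Assumes shell-style splitting of the exec field (not confirmed).
    """
    argv = shlex.split(exec_field)
    return argv[1] if len(argv) > 1 else DEFAULT_SESSION

for exec_field in ("integrations/send_openclaw.py",
                   "integrations/send_openclaw.py dev"):
    print(f"{exec_field!r:40} -> session {session_key_for(exec_field)!r}")
```

One consequence: the PM entry passes no argument, so it shares the default `speechbutton` session with any other argument-less hotkey. Giving it an explicit key (for example `integrations/send_openclaw.py pm`, a hypothetical name) keeps it separate.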

Real Workflows

You're deep in work. OpenClaw handles the communications layer. Your voice is the interface.

Cross-platform messaging without opening anything

You need to reach people on different platforms. Normally: open WhatsApp, type. Open Slack, type. Open Telegram, type.

⌘5

"Tell Sarah on Telegram that the design review is moved to 3pm and ask if that works for her."

⌘5

"Send a message to the engineering Slack channel that the deploy is on hold until we fix the flaky test."

⌘5

"Reply to Alex's last WhatsApp message saying yes, I can meet at the coffee shop at noon."

Three platforms. Three messages. 20 seconds of speaking. Zero app switching.

Multi-step tasks

OpenClaw can chain actions. Your voice triggers complex workflows:

⌘5

"Schedule a meeting with the product team for next Tuesday at 2pm, send calendar invites to Alex, Maria, and James, and post a reminder in the product Slack channel."

OpenClaw handles the calendar creation, the invites, and the Slack notification. You spoke one sentence.

Morning briefing

⌘5

"Give me a summary of overnight messages — anything urgent from WhatsApp or Slack that I need to handle before standup."

OpenClaw scans your messaging platforms, summarizes, and responds. You're briefed while making coffee.

Quick replies while coding

Someone messages you. You don't want to context-switch.

⌘6

"Sounds good, let's do Thursday at 3. I'll book the room."

Command+6 is your raw-message channel. No transform, no overhead. Dispatched through OpenClaw in two seconds.

Why SpeechButton + OpenClaw > OpenClaw Alone

OpenClaw already has voice features — Voice Wake and Talk Mode. Why add SpeechButton?

	OpenClaw Voice	SpeechButton + OpenClaw
Activation	Wake word ("Hey Claw")	Push-to-talk hotkey (instant)
Latency	Wake word detection + confirmation	20ms capture, no confirmation
Multiple targets	One conversation at a time	Per-hotkey routing to different tasks
Offline STT	Cloud-dependent (platform varies)	100% local Apple Neural Engine
Transform	No pre-processing pipeline	Custom transforms per channel
Rapid-fire	Conversational back-and-forth	Hold-speak-release, next command

SpeechButton is for rapid-fire task dispatch — you know what you want, you say it, it's done. OpenClaw's built-in voice is for conversational interactions where you need back-and-forth. They complement each other.

The per-hotkey routing is the key differentiator. Command+5 for agent tasks, Command+6 for quick messages, Command+2 for Slack (directly, bypassing OpenClaw), Command+3 for Linear. You choose the optimal path per command — sometimes through OpenClaw, sometimes direct.

Privacy

Here's exactly what stays on your Mac and what goes to external services:

Component	Where it runs	Data sent externally
Voice capture	Your Mac	✓ Nothing
Speech-to-text (Parakeet V3)	Apple Neural Engine	✓ Nothing
Transform (prompts/default.md)	Your Mac (Gemma 4, local)	✓ Nothing
Exec → OpenClaw Gateway	localhost HTTP (127.0.0.1)	✓ Nothing — never leaves your Mac
OpenClaw agent execution	Your Mac	Depends on the task (sends messages to WhatsApp, Slack, etc.)

Voice capture, STT, and the AI transform are 100% local — no audio ever leaves your Mac. The task text is sent to OpenClaw's gateway on localhost (127.0.0.1) — it never touches the internet until OpenClaw executes the task on your chosen platform.
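The "never leaves your Mac" row holds only while the exec script targets loopback, and the script accepts an `OPENCLAW_URL` override. A small guard can enforce that. This is a hypothetical addition, not part of the script above: it accepts `localhost` by name and any loopback IP (127.0.0.0/8 or `::1`), and rejects every other host.

```python
import ipaddress
from urllib.parse import urlsplit

def is_loopback_url(url: str) -> bool:
    """True if the URL's host is localhost or a loopback IP address."""
    host = urlsplit(url).hostname  # lowercased, IPv6 brackets stripped
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # any other hostname could resolve off-machine

def require_local(url: str) -> str:
    """Return url unchanged, or exit if it points off the machine."""
    if not is_loopback_url(url):
        raise SystemExit(f"refusing to send voice text to non-local gateway: {url}")
    return url
```

Wrapping the script's URL line as `OPENCLAW_URL = require_local(os.getenv("OPENCLAW_URL", "http://127.0.0.1:18789"))` makes a misconfigured override fail loudly instead of sending your words elsewhere.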

Get Started

Prerequisites

macOS 14+ (Sonoma), Apple Silicon
OpenClaw installed and gateway running (openclaw status)
At least one messaging channel connected (WhatsApp, Slack, Telegram, etc.)

Quick Start

1. Download SpeechButton: free 5 minutes/day, no account needed
2. Start the OpenClaw gateway: run openclaw gateway to start the HTTP API on localhost:18789
3. Save the exec script as ~/.config/speechbutton/integrations/send_openclaw.py and make it executable (chmod +x)
4. Add a [[hotkey]] entry to config.toml with exec = "integrations/send_openclaw.py"
5. Hold RightCommand, speak a task, release. Your voice routes to OpenClaw's gateway on localhost, and the agent executes it immediately.

Your voice becomes the universal controller

Free 5 min/day · No account needed · macOS 14+ · Apple Silicon


Pro ($7.99/mo) removes the daily limit. Requires macOS 15+ and Apple Silicon.
