OpenClaw Does Everything. You Still Have to Type.

OpenClaw is the open-source AI agent that connects to your entire digital life — WhatsApp, Slack, Telegram, email, calendar, GitHub, and dozens more. It can send messages, schedule meetings, create issues, and manage your workflows. All from a single AI assistant.

But to use it, you type. Or you tap. Or you use its built-in voice wake word. The problem: OpenClaw's voice features are designed for conversational interactions, not rapid-fire task dispatch. You wait for it to wake, speak, wait for confirmation, confirm.

SpeechButton is different. Hold. Speak. Release.

⌘5

"Send Alex on WhatsApp that I'll be 10 minutes late to the meeting and ask him to start without me."

Release. SpeechButton transcribes your words, transforms them into an OpenClaw command, and sends it to the local Gateway. OpenClaw routes the message to Alex on WhatsApp. Total time: 3 seconds.

⌘6

"Sounds good, let's do Thursday at 3. I'll book the room."

Raw text, no transform, dispatched through OpenClaw to whoever messaged you last. Two seconds.

The task hits OpenClaw's Gateway in under a second. No wake word. No confirmation dialog. No waiting.

How It Works: Voice → STT → Transform → OpenClaw Gateway

Your voice ──▶ SpeechButton STT ──▶ Transform ──▶ nerw P2P mesh ──▶ AI agent
  (7ms)         (100% offline)      (structure)    (QUIC/TLS 1.3)    (tolki-pm,
                                                                      claude-code, …)

OpenClaw uses nerw — a P2P mesh network over QUIC/TLS 1.3 — to route tasks to AI agents running on any connected machine. SpeechButton's exec field runs nerw send directly — no scripts, no API keys, no local server required.

Each piece is simple:

  1. SpeechButton captures your voice and transcribes it locally (Apple Neural Engine, 100% offline)
  2. Transform (optional) cleans and structures your spoken command — e.g. prompts/default.md
  3. nerw delivers the task text over the encrypted P2P mesh to the target agent
  4. Your AI agent (tolki-pm, claude-code, or any other) receives and executes it

The exec command is simply nerw send linux-server/agent-name. nerw handles routing automatically — no configuration, no tokens, no open ports.

Setup

Prerequisites, config, and a prompt file. That's the entire integration — no scripts, no API keys.

Prerequisites

  • macOS 15+ (Sequoia)
  • nerw installed and connected to your mesh (nerw status shows your peers)
  • At least one AI agent running on the mesh (e.g. linux-server/tolki-pm)

SpeechButton config.toml

RightCommand sends to your PM agent with a transform. Add more [[hotkey]] entries to route different channels to different agents.

toml — ~/.config/speechbutton/config.toml
# ~/.config/speechbutton/config.toml

[general]
model = "parakeet-v3"
language = "en"
auto_punctuation = true

[audio]
vad_enabled = true
vad_silence_threshold = 1.0

# Send to PM agent via nerw P2P mesh
[[hotkey]]
key = "RightCommand"
channel = "5"
name = "openclaw"
transform = "prompts/default.md"
exec = "nerw send linux-server/tolki-pm"

Transform — spoken task → structured agent command

The transform is a prompt file that cleans up your speech before it reaches the agent. No script, no API key — SpeechButton applies it locally using its built-in AI.

You say:

"tell Sarah on Telegram that the design review is moved to 3pm and ask if that works for her"

markdown — prompts/default.md
# prompts/default.md
# Default transform — clean up spoken input for an AI agent

Clean up this spoken command for an AI agent.
Fix grammar, remove filler words, make the
instruction clear and actionable.
Output ONLY the cleaned command.

What the agent receives:

Tell Sarah on Telegram that the design review has been moved to 3pm and ask if that time works for her.

Your stream-of-consciousness became a clear, actionable instruction. The agent can execute it immediately.

Multiple agents — route different hotkeys to different agents

Add one [[hotkey]] entry per agent. nerw routes each message to the right destination automatically.

toml — multiple agents example
# RightCommand+5 → PM agent (with AI transform)
[[hotkey]]
key = "RightCommand"
channel = "5"
name = "openclaw-pm"
transform = "prompts/default.md"
exec = "nerw send linux-server/tolki-pm"

# RightCommand+0 → Dev agent (with task-specific transform)
[[hotkey]]
key = "RightCommand"
channel = "0"
name = "openclaw-dev"
transform = "prompts/claude_code_task.md"
exec = "nerw send linux-server/claude-code"

No scripts to write. No API keys to configure. nerw handles the encrypted P2P delivery to each agent.
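
The config above references prompts/claude_code_task.md without showing its contents. Here is one plausible version, modeled on the default transform shown earlier — the filename comes from the config, but the wording below is illustrative, not the canonical file:

markdown — prompts/claude_code_task.md (illustrative sketch)
# prompts/claude_code_task.md
# Example transform for a coding agent — adapt to taste

Rewrite this spoken request as a concise task for a coding agent.
Fix grammar, remove filler words, and state the repository, file,
or feature explicitly when the speaker mentions one.
Output ONLY the cleaned task.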

Real Workflows

You're deep in work. OpenClaw handles the communications layer. Your voice is the interface.

Cross-platform messaging without opening anything

You need to reach people on different platforms. Normally: open WhatsApp, type. Open Slack, type. Open Telegram, type.

⌘5

"Tell Sarah on Telegram that the design review is moved to 3pm and ask if that works for her."

⌘5

"Send a message to the engineering Slack channel that the deploy is on hold until we fix the flaky test."

⌘5

"Reply to Alex's last WhatsApp message saying yes, I can meet at the coffee shop at noon."

Three platforms. Three messages. 20 seconds of speaking. Zero app switching.

Multi-step tasks

OpenClaw can chain actions. Your voice triggers complex workflows:

⌘5

"Schedule a meeting with the product team for next Tuesday at 2pm, send calendar invites to Alex, Maria, and James, and post a reminder in the product Slack channel."

OpenClaw handles the calendar creation, the invites, and the Slack notification. You spoke one sentence.

Morning briefing

⌘5

"Give me a summary of overnight messages — anything urgent from WhatsApp or Slack that I need to handle before standup."

OpenClaw scans your messaging platforms, summarizes, and responds. You're briefed while making coffee.

Quick replies while coding

Someone messages you. You don't want to context-switch.

⌘6

"Sounds good, let's do Thursday at 3. I'll book the room."

Command+6 is your raw-message channel. No transform, no overhead. Dispatched through OpenClaw in two seconds.
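
The raw channel isn't shown in the setup section. A minimal sketch of what it might look like, assuming channel "6" maps to that hotkey and that omitting the transform field dispatches the raw transcription unchanged (the target linux-server/tolki-pm is the example agent from earlier — swap in whichever agent handles your replies):

toml — raw-message hotkey (sketch)
# RightCommand+6 → raw text, no transform
[[hotkey]]
key = "RightCommand"
channel = "6"
name = "openclaw-raw"
exec = "nerw send linux-server/tolki-pm"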

Why SpeechButton + OpenClaw > OpenClaw Alone

OpenClaw already has voice features — Voice Wake and Talk Mode. Why add SpeechButton?

                   OpenClaw Voice                       SpeechButton + OpenClaw
Activation         Wake word ("Hey Claw")               Push-to-talk hotkey (instant)
Latency            Wake word detection + confirmation   7ms capture, no confirmation
Multiple targets   One conversation at a time           Per-hotkey routing to different tasks
Offline STT        Cloud-dependent (platform varies)    100% local Apple Neural Engine
Transform          No pre-processing pipeline           Custom transforms per channel
Rapid-fire         Conversational back-and-forth        Hold-speak-release, next command

SpeechButton is for rapid-fire task dispatch — you know what you want, you say it, it's done. OpenClaw's built-in voice is for conversational interactions where you need back-and-forth. They complement each other.

The per-hotkey routing is the key differentiator. Command+5 for agent tasks, Command+6 for quick messages, Command+2 for Slack (directly, bypassing OpenClaw), Command+3 for Linear. You choose the optimal path per command — sometimes through OpenClaw, sometimes direct.

Privacy

Here's exactly what stays on your Mac and what goes to external services:

Component                        Where it runs             Data sent externally
Voice capture                    Your Mac                  Nothing
Speech-to-text (Parakeet V3)     Apple Neural Engine       Nothing
Transform (prompts/default.md)   Your Mac (local AI)       Nothing
nerw P2P mesh                    QUIC/TLS 1.3, encrypted   Task text → your agent only
Agent execution                  Your server / machine     Depends on what the agent does

Voice capture and STT are 100% local — no audio ever leaves your Mac. The task text travels over the nerw P2P mesh using QUIC/TLS 1.3 encryption, direct to your agent. No third-party relay, no cloud gateway, no API keys in config.

Get Started

Prerequisites

  • macOS 15+ (Sequoia), Apple Silicon
  • nerw installed and at least one peer connected (nerw status)
  • At least one AI agent running on the mesh

Quick Start

  1. Download SpeechButton — free 15 minutes/day, no account needed
  2. Verify nerw is running — nerw status should show your peers and agents
  3. Create prompts/default.md in your SpeechButton config folder with your transform prompt
  4. Add a [[hotkey]] entry to config.toml with exec = "nerw send linux-server/your-agent"
  5. Hold RightCommand, speak a task, release. Your voice now routes directly to your AI agent over the encrypted P2P mesh.

Your voice becomes the universal controller

Free 15 min/day · No account needed · macOS 15+ · Apple Silicon

 Download for macOS — Free

Pro ($7.99/mo) removes the daily limit. Requires macOS 15+ and Apple Silicon.