OpenClaw Does Everything. You Still Have to Type.
OpenClaw is the open-source AI agent that connects to your entire digital life — WhatsApp, Slack, Telegram, email, calendar, GitHub, and dozens more. It can send messages, schedule meetings, create issues, and manage your workflows. All from a single AI assistant.
But to use it, you type. Or you tap. Or you use its built-in voice wake word. The problem: OpenClaw's voice features are designed for conversational interactions, not rapid-fire task dispatch. You wait for it to wake, speak, wait for confirmation, confirm.
SpeechButton is different. Hold. Speak. Release.
"Send Alex on WhatsApp that I'll be 10 minutes late to the meeting and ask him to start without me."
Release. SpeechButton transcribes your words, transforms them into an OpenClaw command, and sends it to the local Gateway. OpenClaw routes the message to Alex on WhatsApp. Total time: 3 seconds.
"Sounds good, let's do Thursday at 3. I'll book the room."
Raw text, no transform, dispatched through OpenClaw to whoever messaged you last. Two seconds.
The task hits OpenClaw's Gateway in under a second. No wake word. No confirmation dialog. No waiting.
How It Works: Voice → STT → Transform → OpenClaw Gateway
```
Your voice ──▶ SpeechButton STT ──▶ Transform ──▶ nerw P2P mesh ──▶ AI agent
    (7ms)       (100% offline)      (structure)   (QUIC/TLS 1.3)   (tolki-pm,
                                                                    claude-code, …)
```
OpenClaw uses nerw — a P2P mesh network over QUIC/TLS 1.3 — to route tasks to AI agents running on any connected machine. SpeechButton's `exec` field runs `nerw send` directly — no scripts, no API keys, no local server required.
Each piece is simple:
- SpeechButton captures your voice and transcribes it locally (Apple Neural Engine, 100% offline)
- Transform (optional) cleans and structures your spoken command — e.g. `prompts/default.md`
- nerw delivers the task text over the encrypted P2P mesh to the target agent
- Your AI agent (tolki-pm, claude-code, or any other) receives and executes it
The exec command is simply `nerw send linux-server/agent-name`. nerw handles routing automatically — no configuration, no tokens, no open ports.
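As a sketch of what that wiring amounts to: SpeechButton runs the exec command and pipes the transcribed text to it on stdin (the stdin contract is an assumption here, not documented behavior). The `nerw` function below is a stub standing in for the real CLI so the flow can be traced end to end:

```shell
# Sketch only: "nerw" is a stub standing in for the real CLI,
# which would route the text over the encrypted QUIC mesh instead.
nerw() {
  # $1 = subcommand ("send"), $2 = target peer/agent
  target="$2"
  while IFS= read -r line; do
    printf '[%s] %s\n' "$target" "$line"
  done
}

# What SpeechButton effectively does on key release:
echo "Tell Alex I will be 10 minutes late" | nerw send linux-server/tolki-pm
```

The stub prints `[linux-server/tolki-pm] Tell Alex I will be 10 minutes late`; the real CLI would deliver the same line to that agent over the mesh.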
Setup
Prerequisites, config, and a prompt file. That's the entire integration — no scripts, no API keys.
Prerequisites
- macOS 15+ (Sequoia)
- nerw installed and connected to your mesh (`nerw status` shows your peers)
- At least one AI agent running on the mesh (e.g. `linux-server/tolki-pm`)
SpeechButton config.toml
RightCommand sends to your PM agent with a transform. Add more [[hotkey]] entries to route different channels to different agents.
```toml
# ~/.config/speechbutton/config.toml

[general]
model = "parakeet-v3"
language = "en"
auto_punctuation = true

[audio]
vad_enabled = true
vad_silence_threshold = 1.0

# Send to PM agent via nerw P2P mesh
[[hotkey]]
key = "RightCommand"
channel = "9"
name = "openclaw"
transform = "prompts/default.md"
exec = "nerw send linux-server/tolki-pm"
```
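A channel can also omit `transform` entirely, so the raw transcript is dispatched verbatim (useful for the quick-reply workflow described below). A sketch, with the key and channel number chosen here as assumptions:

```toml
# Raw-message channel: no transform, transcript sent as-is
[[hotkey]]
key = "RightCommand"
channel = "6"    # assumed channel number
name = "openclaw-raw"
exec = "nerw send linux-server/tolki-pm"
```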
Transform — spoken task → structured agent command
The transform is a prompt file that cleans up your speech before it reaches the agent. No script, no API key — SpeechButton applies it locally using its built-in AI.
You say:
"tell Sarah on Telegram that the design review is moved to 3pm and ask if that works for her"
```markdown
# prompts/default.md
# Default transform — clean up spoken input for an AI agent

Clean up this spoken command for an AI agent. Fix grammar, remove filler
words, make the instruction clear and actionable. Output ONLY the cleaned
command.
```
What the agent receives:
Tell Sarah on Telegram that the design review has been moved to 3pm and ask if that time works for her.
Your stream-of-consciousness became a clear, actionable instruction. The agent can execute it immediately.
Multiple agents — route different hotkeys to different agents
Add one [[hotkey]] entry per agent. nerw routes each message to the right destination automatically.
```toml
# RightCommand+9 → PM agent (with AI transform)
[[hotkey]]
key = "RightCommand"
channel = "9"
name = "openclaw-pm"
transform = "prompts/default.md"
exec = "nerw send linux-server/tolki-pm"

# RightCommand+0 → Dev agent (with task-specific transform)
[[hotkey]]
key = "RightCommand"
channel = "0"
name = "openclaw-dev"
transform = "prompts/claude_code_task.md"
exec = "nerw send linux-server/claude-code"
```
No scripts to write. No API keys to configure. nerw handles the encrypted P2P delivery to each agent.
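The dev channel references `prompts/claude_code_task.md`, whose contents are not shown in this guide. As a sketch of what such a task-specific transform could contain (the wording below is an assumption, not the shipped file):

```markdown
# prompts/claude_code_task.md
# Sketch (assumed contents; the actual file is up to you)

Rewrite this spoken request as a concrete coding task for an AI coding
agent. State the goal, the relevant repository or file if mentioned, and
the acceptance criteria. Output ONLY the task description.
```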
Real Workflows
You're deep in work. OpenClaw handles the communications layer. Your voice is the interface.
Cross-platform messaging without opening anything
You need to reach people on different platforms. Normally: open WhatsApp, type. Open Slack, type. Open Telegram, type.
"Tell Sarah on Telegram that the design review is moved to 3pm and ask if that works for her."
"Send a message to the engineering Slack channel that the deploy is on hold until we fix the flaky test."
"Reply to Alex's last WhatsApp message saying yes, I can meet at the coffee shop at noon."
Three platforms. Three messages. 20 seconds of speaking. Zero app switching.
Multi-step tasks
OpenClaw can chain actions. Your voice triggers complex workflows:
"Schedule a meeting with the product team for next Tuesday at 2pm, send calendar invites to Alex, Maria, and James, and post a reminder in the product Slack channel."
OpenClaw handles the calendar creation, the invites, and the Slack notification. You spoke one sentence.
Morning briefing
"Give me a summary of overnight messages — anything urgent from WhatsApp or Slack that I need to handle before standup."
OpenClaw scans your messaging platforms, summarizes, and responds. You're briefed while making coffee.
Quick replies while coding
Someone messages you. You don't want to context-switch.
"Sounds good, let's do Thursday at 3. I'll book the room."
Command+6 is your raw-message channel. No transform, no overhead. Dispatched through OpenClaw in two seconds.
Why SpeechButton + OpenClaw > OpenClaw Alone
OpenClaw already has voice features — Voice Wake and Talk Mode. Why add SpeechButton?
| | OpenClaw Voice | SpeechButton + OpenClaw |
|---|---|---|
| Activation | Wake word ("Hey Claw") | Push-to-talk hotkey (instant) |
| Latency | Wake word detection + confirmation | 7ms capture, no confirmation |
| Multiple targets | One conversation at a time | Per-hotkey routing to different tasks |
| Offline STT | Cloud-dependent (platform varies) | 100% local Apple Neural Engine |
| Transform | No pre-processing pipeline | Custom transforms per channel |
| Rapid-fire | Conversational back-and-forth | Hold-speak-release, next command |
SpeechButton is for rapid-fire task dispatch — you know what you want, you say it, it's done. OpenClaw's built-in voice is for conversational interactions where you need back-and-forth. They complement each other.
The per-hotkey routing is the key differentiator. Command+5 for agent tasks, Command+6 for quick messages, Command+2 for Slack (directly, bypassing OpenClaw), Command+3 for Linear. You choose the optimal path per command — sometimes through OpenClaw, sometimes direct.
Privacy
Here's exactly what stays on your Mac and what goes to external services:
| Component | Where it runs | Data sent externally |
|---|---|---|
| Voice capture | Your Mac | ✓ Nothing |
| Speech-to-text (Parakeet V3) | Apple Neural Engine | ✓ Nothing |
| Transform (prompts/default.md) | Your Mac (local AI) | ✓ Nothing |
| nerw P2P mesh | QUIC/TLS 1.3, encrypted | Task text → your agent only |
| Agent execution | Your server / machine | Depends on what the agent does |
Voice capture and STT are 100% local — no audio ever leaves your Mac. The task text travels over the nerw P2P mesh using QUIC/TLS 1.3 encryption, direct to your agent. No third-party relay, no cloud gateway, no API keys in config.
Get Started
Prerequisites
- macOS 15+ (Sequoia), Apple Silicon
- nerw installed and at least one peer connected (`nerw status`)
- At least one AI agent running on the mesh
Quick Start
1. Download SpeechButton — free 15 minutes/day, no account needed
2. Verify nerw is running — `nerw status` should show your peers and agents
3. Create `prompts/default.md` in your SpeechButton config folder with your transform prompt
4. Add a `[[hotkey]]` entry to `config.toml` with `exec = "nerw send linux-server/your-agent"`
5. Hold RightCommand, speak a task, release. Your voice now routes directly to your AI agent over the encrypted P2P mesh.
Your voice becomes the universal controller
Free 15 min/day · No account needed · macOS 15+ · Apple Silicon
Download for macOS — Free. Pro ($7.99/mo) removes the daily limit. Requires macOS 15+ and Apple Silicon.