use-local-whisper (Claude Skill)
Use when the user wants local voice transcription instead of OpenAI Whisper API.
| name | use-local-whisper |
|---|---|
| description | Use when the user wants local voice transcription instead of OpenAI Whisper API. Switches to whisper.cpp running on Apple Silicon. WhatsApp only for now. Requires voice-transcription skill to be applied first. |
Use Local Whisper
Switches voice transcription from OpenAI's Whisper API to local whisper.cpp. Runs entirely on-device — no API key, no network, no cost.
Channel support: Currently WhatsApp only. The transcription module (src/transcription.ts) uses Baileys types for audio download. Other channels (Telegram, Discord, etc.) would need their own audio-download logic before this skill can serve them.
Note: The Homebrew package is whisper-cpp, but the CLI binary it installs is whisper-cli.
Prerequisites
- voice-transcription skill must be applied first (WhatsApp channel)
- macOS with Apple Silicon (M1+) recommended
- whisper-cpp installed: brew install whisper-cpp (provides the whisper-cli binary)
- ffmpeg installed: brew install ffmpeg
- A GGML model file downloaded to data/models/
Phase 1: Pre-flight
Check if already applied
Read .nanoclaw/state.yaml. If use-local-whisper is in applied_skills, skip to Phase 3 (Verify).
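As a sketch, this check can be scripted (assuming applied_skills entries appear one per line in state.yaml as YAML list items like `- use-local-whisper`; the exact layout of state.yaml is an assumption here):

```shell
# Hedged sketch: succeeds if the skill is already recorded in the given file.
# Assumes applied_skills entries are YAML list items, one per line.
already_applied() {
  grep -qE '^[[:space:]]*-[[:space:]]*use-local-whisper[[:space:]]*$' "$1"
}
```

Usage: `already_applied .nanoclaw/state.yaml && echo "skip to Phase 3"`.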
Check dependencies are installed
whisper-cli --help >/dev/null 2>&1 && echo "WHISPER_OK" || echo "WHISPER_MISSING"
ffmpeg -version >/dev/null 2>&1 && echo "FFMPEG_OK" || echo "FFMPEG_MISSING"
If missing, install via Homebrew:
brew install whisper-cpp ffmpeg
Check for model file
ls data/models/ggml-*.bin 2>/dev/null || echo "NO_MODEL"
If no model exists, download the base model (148MB, good balance of speed and accuracy):
mkdir -p data/models
curl -L -o data/models/ggml-base.bin "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin"
For better accuracy at the cost of speed, use ggml-small.bin (466MB) or ggml-medium.bin (1.5GB).
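If several models end up in data/models/, a small helper can select the most accurate one available. This is a sketch, not part of the skill; the filenames assume the standard ggml-*.bin naming used above:

```shell
# Hedged helper: print the path of the best model present in a directory,
# preferring accuracy (medium > small > base). Returns non-zero if none found.
pick_model() {
  for m in ggml-medium.bin ggml-small.bin ggml-base.bin; do
    if [ -f "$1/$m" ]; then
      echo "$1/$m"
      return 0
    fi
  done
  return 1
}
```

Usage: `WHISPER_MODEL=$(pick_model data/models) || echo "NO_MODEL"`.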
Phase 2: Apply Code Changes
npx tsx scripts/apply-skill.ts .claude/skills/use-local-whisper
This modifies src/transcription.ts to use the whisper-cli binary instead of the OpenAI API.
Validate
npm test
npm run build
Phase 3: Verify
Ensure launchd PATH includes Homebrew
The NanoClaw launchd service runs with a restricted PATH. whisper-cli and ffmpeg are in /opt/homebrew/bin/ (Apple Silicon) or /usr/local/bin/ (Intel), which may not be in the plist's PATH.
Check the current PATH:
grep -A1 'PATH' ~/Library/LaunchAgents/com.nanoclaw.plist
If /opt/homebrew/bin is missing, add it to the <string> value inside the PATH key in the plist. Then reload:
launchctl unload ~/Library/LaunchAgents/com.nanoclaw.plist
launchctl load ~/Library/LaunchAgents/com.nanoclaw.plist
Build and restart
npm run build
launchctl kickstart -k gui/$(id -u)/com.nanoclaw
Test
Send a voice note in any registered group. The agent should receive it as [Voice: <transcript>].
Check logs
tail -f logs/nanoclaw.log | grep -i -E "voice|transcri|whisper"
Look for:
- Transcribed voice message — successful transcription
- whisper.cpp transcription failed — check model path, ffmpeg, or PATH
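To illustrate what the filter above matches, here is a sketch run against fabricated sample lines (the log format shown is hypothetical, not NanoClaw's actual format):

```shell
# Fabricated sample log lines (format is illustrative only)
cat > /tmp/sample.log <<'EOF'
12:00:00 info Transcribed voice message (2.3s audio)
12:00:05 error whisper.cpp transcription failed: model not found
12:00:07 info message sent
EOF

# Count lines the troubleshooting filter would surface
grep -c -i -E "voice|transcri|whisper" /tmp/sample.log
# prints 2
```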
Configuration
Environment variables (optional, set in .env):
| Variable | Default | Description |
|---|---|---|
| WHISPER_BIN | whisper-cli | Path to whisper.cpp binary |
| WHISPER_MODEL | data/models/ggml-base.bin | Path to GGML model file |
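For example, a .env override might look like this (both paths are illustrative; adjust to your install and model choice):

```shell
# Illustrative .env overrides — not defaults
WHISPER_BIN=/opt/homebrew/bin/whisper-cli
WHISPER_MODEL=data/models/ggml-small.bin
```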
Troubleshooting
"whisper.cpp transcription failed": Ensure both whisper-cli and ffmpeg are in PATH. The launchd service uses a restricted PATH — see Phase 3 above. Test manually:
ffmpeg -f lavfi -i anullsrc=r=16000:cl=mono -t 1 -f wav /tmp/test.wav -y
whisper-cli -m data/models/ggml-base.bin -f /tmp/test.wav --no-timestamps
Transcription works in dev but not as service: The launchd plist PATH likely doesn't include /opt/homebrew/bin. See "Ensure launchd PATH includes Homebrew" in Phase 3.
Slow transcription: The base model processes ~30s of audio in <1s on M1+. If slower, check CPU usage — another process may be competing.
Wrong language: whisper.cpp auto-detects language. To force a language, you can set WHISPER_LANG and modify src/transcription.ts to pass -l $WHISPER_LANG.