🎮 AgentAdventure
OpenClaw skill that drops AI agents into self-hosted WorkAdventure as real avatars, with movement, proximity chat, and experimental voice.
Prerequisite: A running WorkAdventure instance with LiveKit enabled.
| Key | Purpose | Required | Where to Get |
|---|---|---|---|
| WA_URL | Your WorkAdventure instance URL | Yes | Your self-hosted WA deployment |
| WA_BOT_NAME | Display name for the bot avatar | Yes | Any string you choose |
| ELEVENLABS_API_KEY | Text-to-speech for voice chat | Voice only | elevenlabs.io |
| DEEPGRAM_API_KEY | Speech-to-text for voice chat | Voice only | deepgram.com |
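A pre-flight check on these keys can catch a misconfigured entry before a browser is launched. The sketch below is one way to do it; `missingEnvKeys` and `EnvMap` are hypothetical names, not part of OpenClaw, and the function assumes the caller has already merged the `env` blocks of the relevant skill entries into one map.

```typescript
// Keys taken from the table above; the split mirrors its "Required" column.
const REQUIRED_KEYS = ["WA_URL", "WA_BOT_NAME"] as const;
const VOICE_KEYS = ["ELEVENLABS_API_KEY", "DEEPGRAM_API_KEY"] as const;

type EnvMap = Record<string, string | undefined>;

// Hypothetical helper: returns the names of keys that are missing or blank.
function missingEnvKeys(env: EnvMap, voiceEnabled: boolean): string[] {
  const wanted: string[] = [...REQUIRED_KEYS, ...(voiceEnabled ? VOICE_KEYS : [])];
  return wanted.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}

// Example: a text-only setup with both required keys present.
const textOnly = missingEnvKeys(
  { WA_URL: "http://play.workadventure.localhost/", WA_BOT_NAME: "AgentBot" },
  false,
);
console.log(textOnly); // no keys reported missing
```

Passing `voiceEnabled = true` adds the two voice keys to the check, so a text-only deployment is not rejected for lacking them.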
npm install -g openclaw@latest
openclaw gateway start # starts gateway; creates ~/.openclaw/ on first run
# Option A: From ClawHub (once published)
clawdhub install agentadventure
# Option B: Manual (during development)
mkdir -p ~/.openclaw/skills/agentadventure
# Copy SKILL.md, runner.ts, bridge.ts into the folder
cd ~/.openclaw/skills/agentadventure && npx playwright install chromium
# Verify:
openclaw skills list --eligible
Add the skill entry to ~/.openclaw/openclaw.json:
{
  "skills": {
    "entries": {
      "agentadventure": {
        "enabled": true,
        "env": {
          "WA_URL": "http://play.workadventure.localhost/",
          "WA_BOT_NAME": "AgentBot"
        }
      }
    }
  }
}
For voice support, also add the voice-call skill entry:
{
  "skills": {
    "entries": {
      "agentadventure": { "enabled": true, "env": { "WA_URL": "...", "WA_BOT_NAME": "AgentBot" } },
      "voice-call": {
        "enabled": true,
        "env": {
          "ELEVENLABS_API_KEY": "your-key-here",
          "DEEPGRAM_API_KEY": "your-key-here"
        }
      }
    }
  }
}
Then start the gateway:
openclaw gateway start
Verify by joining the WA map: the agent avatar should appear and respond to proximity chat.
AgentAdventure is an OpenClaw skill that enables AI agents to appear as visible avatars in a self-hosted WorkAdventure virtual office. Each agent runs inside a headless Chromium browser controlled by Playwright, interacting with the WA Scripting API for movement, proximity chat, and experimental voice conversations, all without modifying the WorkAdventure backend.
Agents enter WorkAdventure the same way a human would: through the anonymous login flow (display name → Woka avatar picker → map entry). Once inside, injected scripts bridge WA events back to the OpenClaw gateway, where agent logic generates responses and sends commands. Matrix provides fallback messaging for non-proximity interactions and multi-agent coordination.
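The three login steps can be sketched against a small structural subset of Playwright's `Page` (`goto`, `fill`, `click`, `waitForSelector`). This is an illustration, not the skill's actual runner.ts: `loginAnonymously`, `LoginPage`, and `LoginSelectors` are hypothetical names, and every selector is supplied by the caller because the real ones depend on the WA build and are not given here.

```typescript
// Structural subset of Playwright's Page used by this sketch.
interface LoginPage {
  goto(url: string): Promise<unknown>;
  fill(selector: string, value: string): Promise<void>;
  click(selector: string): Promise<void>;
  waitForSelector(selector: string): Promise<unknown>;
}

// Placeholders only: inspect your WA instance to find the real selectors.
interface LoginSelectors {
  nameInput: string;
  nameSubmit: string;
  wokaConfirm: string;
  mapReady: string;
}

async function loginAnonymously(
  page: LoginPage,
  waUrl: string,
  botName: string,
  sel: LoginSelectors,
): Promise<void> {
  await page.goto(waUrl);
  await page.waitForSelector(sel.nameInput);
  await page.fill(sel.nameInput, botName); // step 1: display name
  await page.click(sel.nameSubmit);
  await page.waitForSelector(sel.wokaConfirm);
  await page.click(sel.wokaConfirm); // step 2: accept a Woka in the picker
  await page.waitForSelector(sel.mapReady); // step 3: wait for map entry
}
```

Keeping the selectors in one object means a WA upgrade that changes the login markup only requires updating that object.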
The entire skill is a single folder (~/.openclaw/skills/agentadventure/) deployable via clawdhub install agentadventure or manual placement.
- Movement via WA.player.moveTo(), and full participation in proximity bubbles.
- Chat via WA.chat.sendChatMessage / onChatMessage with "bubble" scope and typing indicators.
- Player tracking via WA.players.onPlayerEnters / onPlayerLeaves (with configureTracking()), plus bubble lifecycle via proximityMeeting.onJoin().
- Voice via the listenToAudioStream / startAudioStream APIs, bridged to OpenClaw voice skills (ElevenLabs, Deepgram). Falls back to text on failure.

The skill spawns a Playwright browser session per agent, injects WA Scripting API event listeners, and bridges callbacks to the OpenClaw gateway via page.exposeFunction. Outbound commands (move, chat, voice) flow from agent logic through page.evaluate() calls.
graph TB
subgraph OpenClaw["OpenClaw Platform"]
GW[Gateway<br/>Session Mgmt]
SK[Skill Runner<br/>AgentAdventure]
VS[Voice Skill<br/>STT/TTS Pipeline]
MX[Matrix Channel<br/>Chat Fallback]
end
subgraph Browser["Playwright Browser (Headless)"]
PW[Playwright Controller]
INJ[Injected Scripts<br/>Event Listeners]
end
subgraph WA["WorkAdventure v1.28.9"]
WAC[WA Client<br/>Scripting API]
AV[Bot Avatar]
PRX[Proximity Bubble<br/>Chat / Voice]
LK[LiveKit<br/>Audio Streams]
end
GW --> SK
SK --> PW
PW --> WAC
WAC --> AV
WAC --> PRX
WAC --> LK
INJ --> PW
PW --> GW
VS <--> SK
MX <--> GW
Agent logic sends a command (move/chat/voice) → OpenClaw gateway routes it to the AgentAdventure skill → Playwright calls page.evaluate() → WA Scripting API executes the action (avatar moves, message appears in bubble).
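The outbound path can be sketched as a small dispatcher that turns each command into one `page.evaluate()` call. The `AgentCommand` shape, `EvalPage`, and `runCommand` are assumptions made for this example. The `WA` declaration is a minimal hand-written stand-in for the in-page Scripting API and assumes `WA` is reachable in the evaluated context, which should be checked against your WA version.

```typescript
// Commands the agent logic may issue; this shape is an assumption.
type AgentCommand =
  | { kind: "move"; x: number; y: number }
  | { kind: "chat"; text: string };

// Interface modelled on Playwright's page.evaluate(fn, arg).
interface EvalPage {
  evaluate<Arg>(fn: (arg: Arg) => unknown, arg: Arg): Promise<unknown>;
}

// Minimal stand-in for the WA global; it only exists inside the browser.
declare const WA: {
  player: { moveTo(x: number, y: number): Promise<unknown> };
  chat: { sendChatMessage(message: string, options: { scope: "bubble" }): void };
};

async function runCommand(page: EvalPage, cmd: AgentCommand): Promise<void> {
  switch (cmd.kind) {
    case "move":
      // The arrow function is serialized and executed in the page.
      await page.evaluate(({ x, y }) => WA.player.moveTo(x, y), { x: cmd.x, y: cmd.y });
      break;
    case "chat":
      await page.evaluate(
        (text) => WA.chat.sendChatMessage(text, { scope: "bubble" }),
        cmd.text,
      );
      break;
  }
}
```

Because the evaluated functions run in the browser, they must not capture Node-side variables; everything they need is passed through the second argument.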
A human enters a proximity bubble → WA fires proximityMeeting.onJoin → injected listener calls window.onWAEvent('join', users) → Playwright bridges the callback to the gateway → agent logic processes and responds. The same pattern applies to chat messages (onChatMessage) and audio buffers (listenToAudioStream).
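A minimal sketch of this inbound path follows. `installBridge`, `BridgePage`, and `WaEvent` are hypothetical names. The `WA` declaration is a hand-written stand-in, and the `playerId` and `name` fields on joined users are assumptions to verify against the Scripting API reference.

```typescript
// Interface modelled on Playwright's exposeFunction and evaluate.
interface BridgePage {
  exposeFunction(name: string, fn: (...args: any[]) => unknown): Promise<void>;
  evaluate(fn: () => unknown): Promise<unknown>;
}

interface WaEvent {
  type: string;
  payload: unknown;
}

// Minimal stand-in for the in-page WA global (field names are assumptions).
declare const WA: {
  player: {
    proximityMeeting: {
      onJoin(): {
        subscribe(cb: (users: Array<{ playerId: number; name: string }>) => void): unknown;
      };
    };
  };
};

async function installBridge(page: BridgePage, onEvent: (e: WaEvent) => void): Promise<void> {
  // Node side: make window.onWAEvent callable from the page.
  await page.exposeFunction("onWAEvent", (type: string, payload: unknown) =>
    onEvent({ type, payload }),
  );
  // Browser side: forward bubble joins as plain, serializable objects.
  await page.evaluate(() => {
    const notify = (globalThis as unknown as {
      onWAEvent(type: string, payload: unknown): void;
    }).onWAEvent;
    WA.player.proximityMeeting.onJoin().subscribe((users) => {
      notify("join", users.map((u) => ({ id: u.playerId, name: u.name })));
    });
  });
}
```

Mapping users to plain `{ id, name }` objects before calling `onWAEvent` avoids passing values that cannot be serialized across the browser boundary.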
Incoming audio flows through WA's listenToAudioStream (Float32Array buffers) → an injected listener collects buffers → STT (Deepgram/ElevenLabs) transcribes → agent LLM generates a response → TTS synthesizes audio → startAudioStream / appendAudioData sends it back to the bubble.
⚠️ The WA blog documents PCM16 at 24 kHz converted to Float32 for the Web Audio API. Verify the actual sampleRate parameter from WA source before hardcoding.
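The PCM16 to Float32 conversion mentioned in the note is a fixed scaling step and can be sketched as two pure functions. The function names are hypothetical, and `ASSUMED_SAMPLE_RATE` simply records the unverified 24 kHz figure from the note.

```typescript
// Unverified value taken from the note above; confirm against WA source.
const ASSUMED_SAMPLE_RATE = 24000;

// Signed 16-bit PCM (-32768..32767) to Float32 in the range [-1, 1).
function pcm16ToFloat32(pcm: Int16Array): Float32Array {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    out[i] = pcm[i] / 32768;
  }
  return out;
}

// Float32 to signed 16-bit PCM, clamping out-of-range samples first.
function float32ToPcm16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    out[i] = s < 0 ? Math.round(s * 32768) : Math.round(s * 32767);
  }
  return out;
}
```

The asymmetric scale factors (32768 for negative, 32767 for positive samples) keep both extremes inside the 16-bit range; a round trip can therefore differ by one step for positive values.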
Every operation follows a retry → fallback → restart chain. Transient failures retry up to 3 times. Voice failures drop to text chat. Browser crashes trigger an automatic session restart. Non-recoverable errors notify the gateway without crashing the skill.
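The first two links of that chain can be sketched as follows. The name `retryOp` appears in the utils.ts listing, but its real signature is not shown in this document, so the one below is an assumption; `withFallback` is a hypothetical helper. The sketch reads "retry up to 3 times" as three attempts in total and leaves the session-restart step to the caller.

```typescript
// Runs op until it succeeds or the attempt budget is spent, then rethrows.
async function retryOp<T>(
  op: () => Promise<T>,
  attempts = 3,
  delayMs = 0,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      if (delayMs > 0 && i < attempts - 1) {
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Retries the primary path, then switches to the fallback (e.g. voice to text).
async function withFallback<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>,
): Promise<T> {
  try {
    return await retryOp(primary);
  } catch {
    return fallback();
  }
}
```

A caller that also wants the restart step would catch a failure from `withFallback` itself and rebuild the browser session before trying again.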
WA's native Matrix bridge syncs proximity bubbles to Matrix rooms. The OpenClaw Matrix channel handles m.room.message events for fallback/global messaging and multi-agent coordination outside proximity range.
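As an illustration of the m.room.message handling, the sketch below filters a raw Matrix event down to its text body. `MatrixEventLike` and `extractTextMessage` are hypothetical names, and how the OpenClaw Matrix channel actually delivers events is not described here; only the `type`, `sender`, `content.msgtype`, and `content.body` fields from the Matrix event format are relied on.

```typescript
// The subset of a Matrix room event this sketch looks at.
interface MatrixEventLike {
  type: string;
  sender?: string;
  content?: { msgtype?: string; body?: string };
}

// Returns the text body for plain-text messages from other users, else null.
function extractTextMessage(event: MatrixEventLike, ownUserId: string): string | null {
  if (event.type !== "m.room.message") return null;
  if (event.sender === ownUserId) return null; // skip the agent's own echoes
  const content = event.content;
  if (!content || content.msgtype !== "m.text" || typeof content.body !== "string") {
    return null;
  }
  return content.body;
}
```

Skipping events sent by the agent's own user ID prevents it from replying to itself when several agents share a room.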
~/.openclaw/skills/agentadventure/
├── SKILL.md      # Skill definition: YAML frontmatter + usage instructions
├── runner.ts     # Playwright session: launch, anonymous login, lifecycle, retry
├── bridge.ts     # Event bridge: WA Scripting API ↔ OpenClaw agent logic
├── voice.ts      # Voice pipeline: listenToAudioStream → STT → LLM → TTS → startAudioStream
├── utils.ts      # Shared helpers: retryOp, parseCoords, getMessage, rate limiting
└── __tests__/
    ├── runner.test.ts
    ├── bridge.test.ts
    └── voice.test.ts
Configuration lives in ~/.openclaw/openclaw.json under skills.entries.agentadventure. OpenClaw skills are SKILL.md folders; there is no plugin.json.
Once deployed, the agent uses the skill when instructed to join WorkAdventure. The skill handles the full lifecycle:
- Event listeners cover proximity (onJoin, onPlayerEnters/Leaves), chat (onChatMessage with bubble scope), and optionally voice (listenToAudioStream).
- Inbound events reach the gateway via page.exposeFunction; outbound commands execute via page.evaluate (move, chat with typing indicators, voice).

Logs are available via openclaw logs and docker logs for WA containers.
For multiple agents, scale with Kubernetes/Helm and limit browsers via environment variables.
Voice support is experimental and depends on WA's startAudioStream / listenToAudioStream APIs.
The pipeline works as follows: incoming audio from the WA bubble arrives as Float32Array buffers via listenToAudioStream. These buffers are collected and sent to an STT provider (Deepgram or ElevenLabs). The transcription feeds into the agent LLM, which generates a response. That response is synthesized via TTS and streamed back through startAudioStream / appendAudioData.
On any voice failure (STT timeout, TTS error, stream routing issue), the skill automatically drops to text chat. Headless audio routing uses Chromium's --use-fake-device-for-media-stream flag, passed through Playwright's launch args; LiveKit handles the WebRTC transport.
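The pipeline and its text fallback can be sketched with the providers injected as plain functions. Everything named here (`VoiceDeps`, `concatBuffers`, `handleUtterance`, and the placeholder apology string) is hypothetical; real STT, LLM, and TTS clients would be wrapped to fit the `VoiceDeps` interface.

```typescript
// Injected stages; each one is an assumption standing in for a real client.
interface VoiceDeps {
  transcribe(audio: Float32Array): Promise<string>; // STT
  respond(text: string): Promise<string>; // agent LLM
  synthesize(text: string): Promise<Float32Array>; // TTS
  playAudio(audio: Float32Array): Promise<void>; // e.g. appendAudioData
  sendText(text: string): Promise<void>; // text chat fallback
}

// Joins the collected listenToAudioStream buffers into one contiguous array.
function concatBuffers(chunks: Float32Array[]): Float32Array {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const out = new Float32Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}

// Returns which channel actually delivered the reply.
async function handleUtterance(
  chunks: Float32Array[],
  deps: VoiceDeps,
): Promise<"voice" | "text"> {
  let reply: string | undefined;
  try {
    const heard = await deps.transcribe(concatBuffers(chunks));
    reply = await deps.respond(heard);
    await deps.playAudio(await deps.synthesize(reply));
    return "voice";
  } catch {
    // Any stage failing drops to text; reuse the reply if one was generated.
    await deps.sendText(reply ?? "Voice is unavailable right now.");
    return "text";
  }
}
```

Holding on to `reply` outside the `try` block means a TTS or playback failure still delivers the generated answer as text instead of discarding it.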
- Keep API keys in skills.entries.*.env / skills.entries.*.apiKey; rotate every 90 days.
- … agentId mismatch.
- … (wss://, bearer tokens).
- npm audit / Snyk scans; remediate high-severity vulnerabilities within 7 days.

| Risk | Mitigation | Verification |
|---|---|---|
| Playwright instability / browser crashes | Docker sandbox; auto-restart sessions | Log "Session restarted after crash" |
| WA Scripting API is client-only (no server bots) | Full browser automation; Matrix fallback | Dry-run script injection; compare manual vs. automated |
| Perf overhead (browser per agent) | Limit agents; lightweight Chromium | Benchmark CPU/mem; prove <20% overhead |
| Credentials exposure | Gateway permissions; encrypt keys | Audit logs; no leaks in tests |
| Risk | Mitigation | Verification |
|---|---|---|
| Event drops in automated browser | RxJS subs with retries; websocket bridge | Sim bubble join/leave; 100% capture in logs |
| Bubble scope limits (no history on join) | Agent state tracks context; fetch players on join | Test msg before/after join; agent ignores pre-join |
| Flaky tests/timeouts | Auto-wait assertions; retries on transients | Induce delay → retry logs success |
| WA script load errors (CORS) | Console listener + restart | Sim bad script → log/catch/restart |
| Risk | Mitigation | Verification |
|---|---|---|
| Headless audio routing fails | Fake streams for tests; LiveKit node SDK bridge | Log stream capture/playback; compare manual vs. agent |
| High latency in STT/TTS | Low-latency providers (Deepgram); cache responses | Measure e2e <500ms vs. WA native (~200ms) |
| Audio leaks | Encrypt streams; scope voice perms | Audit no external sends without consent |
| Experimental voice APIs unstable | Fallback to text; monitor WA docs/GitHub | Test stream start/listen; logs show buffers |
| Issue | Fix |
|---|---|
| Browser crash | Check Playwright logs; restart the gateway (openclaw gateway start) |
| Login failure | WA anonymous login: verify name input selector; test in non-headless mode; increase timeouts in runner.ts |
| Missed proximity events | Inspect injected script; ensure configureTracking() is called; simulate with manual joins; fall back to Matrix |
| Voice latency | Test STT/TTS providers; cache responses; fall back to text on >500ms |
| Matrix sync issues | Confirm WA Matrix bridge config; check OpenClaw channel perms; resync rooms |
| High CPU | Limit to <5 concurrent agents (one browser each); use headless: false only for debugging; monitor with top/htop |
| Skill not eligible | Run openclaw skills list --eligible; check requires.bins are on PATH; restart gateway |
| General | Enable verbose logging; check WA/OpenClaw docs and GitHub issues |