Telegram TTS: audio is sent as a file instead of a voice bubble (or gets sent twice)

Fix Telegram TTS replies that arrive as normal audio files instead of voice bubbles, or that produce both an audio attachment and a second narrative message, by checking provider/output format and per-session TTS prefs.

By CoClaw Team • March 8, 2026

Symptoms

You enable TTS in Telegram and receive a normal audio file instead of a round voice bubble.
Or one agent reply turns into two visible outputs, for example:
- one audio attachment,
- plus a second narrative text reply.
Changes to ~/.openclaw/settings/tts.json or the main config seem to apply inconsistently, and it is not obvious which setting actually took effect.
This often shows up when using Edge TTS for Telegram.

Cause

This is usually a combination of output format mismatch and TTS mode confusion.

There are three important pieces here:

Provider / output format
- Telegram shows a voice bubble most reliably when the TTS result arrives as voice-note media (OGG/Opus-style audio) and not as a generic audio file.
- Edge TTS defaults to an MP3-style output format, which Telegram can treat as a normal audio file instead of a voice bubble.
Per-session TTS preferences
- /tts commands write local overrides to ~/.openclaw/settings/tts.json.
- Those local prefs can override messages.tts.* in the main config on that host.
Auto-TTS behavior
- If auto-TTS is always on, the agent may produce an audio result and still emit normal narrative text, which can look like a duplicate reply.

So the problem is not always “Telegram is broken.” It is often:

- the wrong output format for Telegram voice-note UX,
- plus a TTS mode that is too eager for the workflow.

Fix

1) Check which provider and mode are actually active

In Telegram, run:

/tts status

If the provider/mode shown there is not what you expected, remember that /tts prefs can override your main config.

Why this helps: it tells you whether you are debugging the main config or a per-session override.
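
The override relationship can be pictured as a shallow merge in which local prefs win. The following is a minimal Python sketch of that idea, not OpenClaw's actual resolution code. The key names (`provider`, `auto`) and the sample values are assumptions made for illustration, and the real files may be nested differently.

```python
def effective_tts(main_cfg: dict, local_prefs: dict) -> dict:
    """Return the settings that win if local prefs override the main config.

    Both arguments are flat dicts of hypothetical TTS keys; real key names
    and nesting may differ.
    """
    merged = dict(main_cfg)
    # Only keys that are actually set in the local prefs replace main-config values.
    merged.update({k: v for k, v in local_prefs.items() if v is not None})
    return merged


def explain_sources(main_cfg: dict, local_prefs: dict) -> dict:
    """Label each effective key with the place its value came from."""
    return {
        key: "local prefs" if local_prefs.get(key) is not None else "main config"
        for key in effective_tts(main_cfg, local_prefs)
    }


main_cfg = {"provider": "openai", "auto": "tagged"}  # hypothetical messages.tts.* values
local_prefs = {"provider": "edge"}                   # hypothetical override written by /tts
print(effective_tts(main_cfg, local_prefs))
print(explain_sources(main_cfg, local_prefs))
```

If the provider reported by /tts status matches the local value and not the one in the main config, the override is the thing to debug.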

2) If you want Telegram voice bubbles, avoid Edge’s default MP3-style output

For the most predictable voice-bubble behavior, prefer a provider or output format that produces OGG/Opus-style voice media.

If you are testing Edge specifically, set an OGG/Opus-style output format in your config and restart the gateway.

Why this helps: Edge’s default output format is commonly treated as a normal audio file, not a voice bubble.
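
To confirm what a provider actually produced, one option is to inspect the first bytes of the generated file. The sketch below is a heuristic based on well-known container signatures (`OggS` for Ogg pages, `OpusHead` for the Opus identification header, `ID3` or an MPEG frame sync for MP3). It is not part of OpenClaw, and the function name is hypothetical.

```python
def sniff_audio_container(data: bytes) -> str:
    """Classify audio bytes by their leading magic numbers.

    Returns "ogg-opus", "ogg-other", "mp3", or "unknown".
    This is a heuristic: it inspects only the start of the data.
    """
    if data[:4] == b"OggS":
        # An Ogg Opus stream carries an "OpusHead" packet in its first page.
        return "ogg-opus" if b"OpusHead" in data[:64] else "ogg-other"
    if data[:3] == b"ID3":
        return "mp3"  # MP3 that starts with an ID3v2 tag
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"  # bare MPEG audio frame sync (other MPEG audio can match too)
    return "unknown"


# Synthetic headers for demonstration; real files would be read from disk.
fake_opus = b"OggS" + bytes(24) + b"OpusHead" + bytes(11)
fake_mp3 = b"ID3\x04\x00\x00\x00\x00\x00\x00"
print(sniff_audio_container(fake_opus))
print(sniff_audio_container(fake_mp3))
```

For a real file, pass the first 64 bytes, for example `sniff_audio_container(open(path, "rb").read(64))`, where `path` points at whatever audio file your TTS provider wrote. A result of "mp3" would be consistent with Telegram showing a normal audio file.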

3) Reduce accidental double-send behavior

If you are seeing both audio and text, switch from always-on auto TTS to a more controlled mode such as tagged or one-off usage.

Examples:

/tts tagged

or use one-off generation only when you want audio:

/tts audio Hello, this is a test.

Why this helps: it prevents every reply from automatically trying to become audio while the agent also emits normal text narration.
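
The difference between the modes can be sketched as a small decision function. The semantics below are assumptions made for illustration and are not taken from OpenClaw's source: `always` converts every reply, `inbound` only answers an incoming voice message with audio, and `tagged` only converts replies that carry a marker. The `[[tts]]` marker and the function name are hypothetical.

```python
def should_send_audio(mode: str, reply_text: str, inbound_was_voice: bool,
                      tag: str = "[[tts]]") -> bool:
    """Decide whether a reply becomes audio under a given auto-TTS mode.

    Assumed semantics (illustrative only):
      always  -> every reply becomes audio
      inbound -> only replies to an inbound voice message
      tagged  -> only replies that contain the tag marker
      other   -> no automatic audio
    """
    if mode == "always":
        return True
    if mode == "inbound":
        return inbound_was_voice
    if mode == "tagged":
        return tag in reply_text
    return False


# An ordinary text reply to a text message, evaluated under each mode.
for mode in ("always", "inbound", "tagged"):
    print(mode, should_send_audio(mode, "Here is your summary.", inbound_was_voice=False))
```

Under these assumptions, only `always` turns an ordinary text reply into audio, which is the case most likely to look like a double send.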

4) If /tts preferences are stale, reset them intentionally

Because /tts commands write local prefs, it is easy to think a config edit failed when a local override is still active.

Re-run the exact /tts commands you want for the current session, then re-check:

/tts status

Verify

Telegram now shows a voice bubble when you expect one, instead of a generic audio file.
One assistant turn no longer produces both an audio send and a second unwanted narrative reply.
/tts status matches the provider/mode you intended.

If it still fails, collect these exact details:

/tts status
your messages.tts.* config block
whether the problem happens only with Edge or also with OpenAI/ElevenLabs
whether the reply was generated with /tts audio, always, inbound, or tagged

/guides/telegram-setup
/guides/openclaw-configuration
/troubleshooting/solutions/telegram-preview-streaming-duplicate-or-flash
OpenClaw docs: TTS