Phoenix TTS API contract (ElevenLabs-compatible)

Last Updated: 2026-02-10
Purpose: Let virtual-banker (and other apps) switch from ElevenLabs to a Phoenix-hosted TTS service by changing only the endpoint.


Required endpoints

The Phoenix TTS service must implement the same HTTP contract as ElevenLabs for these paths (the base path is the app's /tts or similar; the paths below use the prefix /v1).

1. Sync text-to-speech

  • Method: POST
  • Path: /v1/text-to-speech/:voice_id
  • Headers:
    • Content-Type: application/json
    • Accept: audio/mpeg
    • Auth: either xi-api-key: <key> or Authorization: Bearer <token> (configurable in client)
  • Body (JSON):
    {
      "text": "Hello world",
      "model_id": "eleven_multilingual_v2",
      "voice_settings": {
        "stability": 0.5,
        "similarity_boost": 0.75,
        "style": 0,
        "use_speaker_boost": true
      }
    }
    
  • Response: 200 OK, body = raw mp3 bytes (audio/mpeg).
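As a rough illustration of the sync contract above, the following sketch builds the request with Python's standard library. The function names `build_tts_request` and `synthesize` are hypothetical and are not part of the virtual-banker client; the base URL passed in is assumed to already include the /v1 prefix.

```python
import json
import urllib.request
from urllib.parse import quote


def build_tts_request(base_url, voice_id, text, auth_name, auth_value):
    """Build (but do not send) a sync text-to-speech request.

    Hypothetical helper: base_url is assumed to end in the /v1 prefix.
    """
    url = f"{base_url.rstrip('/')}/text-to-speech/{quote(voice_id, safe='')}"
    body = {
        "text": text,
        "model_id": "eleven_multilingual_v2",
        "voice_settings": {
            "stability": 0.5,
            "similarity_boost": 0.75,
            "style": 0,
            "use_speaker_boost": True,
        },
    }
    headers = {
        "Content-Type": "application/json",
        "Accept": "audio/mpeg",
        # Either "xi-api-key" or "Authorization", depending on client config.
        auth_name: auth_value,
    }
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )


def synthesize(request, timeout=30):
    """Send the request and return the raw mp3 bytes of the response body."""
    # urlopen raises urllib.error.HTTPError for non-2xx status codes.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Keeping request construction separate from sending makes the contract easy to check without a running service: the same request object can be inspected in a test or pointed at either ElevenLabs or Phoenix by changing only `base_url` and the auth header.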

2. Streaming text-to-speech

  • Method: POST
  • Path: /v1/text-to-speech/:voice_id/stream
  • Headers: Same as sync.
  • Body: Same JSON as sync.
  • Response: 200 OK, body = streaming mp3 (same format).

3. Health check

  • Method: GET
  • Path: /health (on the same origin as the TTS base URL, next to the version prefix, e.g. https://phoenix.example.com/tts/health if base is .../tts/v1)
  • Response: 200 OK (body optional; used for readiness).
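A minimal sketch of how a client might derive the streaming and health URLs from one base URL and consume the streamed body in chunks. The helper names (`stream_url`, `health_url`, `copy_stream`) are hypothetical, and `health_url` assumes the base URL ends in a version segment such as /v1, as in the example above. The demo uses an in-memory stand-in instead of a real HTTP response.

```python
import io
from urllib.parse import urlsplit, urlunsplit


def stream_url(base_url, voice_id):
    """Return the streaming text-to-speech URL for a voice."""
    return f"{base_url.rstrip('/')}/text-to-speech/{voice_id}/stream"


def health_url(base_url):
    """Replace the trailing version segment of the base URL with 'health'."""
    parts = urlsplit(base_url)
    segments = parts.path.rstrip("/").split("/")
    segments[-1] = "health"  # assumption: last segment is the version, e.g. v1
    return urlunsplit((parts.scheme, parts.netloc, "/".join(segments), "", ""))


def copy_stream(response, sink, chunk_size=4096):
    """Copy a streaming body to sink chunk by chunk; return bytes copied."""
    total = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:
            break
        sink.write(chunk)
        total += len(chunk)
    return total


# Demo with an in-memory stand-in for the HTTP response (no network needed).
fake_response = io.BytesIO(b"\xff\xfb" * 5000)
sink = io.BytesIO()
copied = copy_stream(fake_response, sink)
```

With a real service, `response` would be the object returned by `urllib.request.urlopen`, which exposes the same `read(n)` method, and `sink` could be a file or an audio player's input buffer.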

Optional

  • Auth: If Phoenix uses a different scheme (e.g. Bearer only), clients set TTS_AUTH_HEADER_NAME / TTS_AUTH_HEADER_VALUE; no API change.
  • Visemes: For better lip-sync, a future endpoint could return phoneme/viseme timings; client would call it when available.
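The configurable auth header could be resolved along these lines. This is a sketch under assumptions: it treats TTS_AUTH_HEADER_NAME and TTS_AUTH_HEADER_VALUE as environment variables and falls back to xi-api-key when no name is set; the actual virtual-banker client may read them differently. The function name `auth_header` is hypothetical.

```python
import os


def auth_header(env=None):
    """Return the auth header as a dict, or an empty dict if no value is set.

    Assumption: settings come from environment variables, and the header
    name defaults to the ElevenLabs-style "xi-api-key".
    """
    env = os.environ if env is None else env
    name = env.get("TTS_AUTH_HEADER_NAME", "xi-api-key")
    value = env.get("TTS_AUTH_HEADER_VALUE", "")
    if not value:
        return {}
    return {name: value}
```

For a Bearer-only Phoenix deployment, setting the name to `Authorization` and the value to `Bearer <token>` would produce the second auth form listed under the sync endpoint, with no change to request paths or bodies.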

Reference

  • Virtual-banker TTS client: virtual-banker/backend/tts (see backend/tts/README.md).
  • ElevenLabs TTS API: Text-to-speech, Stream.