# Phoenix TTS API contract (ElevenLabs-compatible)
**Last Updated:** 2026-02-10
**Purpose:** Let virtual-banker (and other apps) switch from ElevenLabs to a Phoenix-hosted TTS service by changing only the endpoint.
---
## Required endpoints
The Phoenix TTS service **must** implement the same HTTP contract as ElevenLabs for these paths (the base path is the app's `/tts` or similar; the paths below use the prefix `/v1`).
### 1. Sync text-to-speech
- **Method:** `POST`
- **Path:** `/v1/text-to-speech/:voice_id`
- **Headers:**
  - `Content-Type: application/json`
  - `Accept: audio/mpeg`
  - Auth: either `xi-api-key: <key>` or `Authorization: Bearer <token>` (configurable in client)
- **Body (JSON):**
```json
{
  "text": "Hello world",
  "model_id": "eleven_multilingual_v2",
  "voice_settings": {
    "stability": 0.5,
    "similarity_boost": 0.75,
    "style": 0,
    "use_speaker_boost": true
  }
}
```
- **Response:** `200 OK`, body = raw **mp3** bytes (`audio/mpeg`).
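As an illustration of the sync contract above, here is a minimal Python sketch that builds (but does not send) a matching request using only the standard library. The function name `build_tts_request`, the example base URL, and the choice of the `xi-api-key` header are assumptions for illustration, not part of the virtual-banker client.

```python
import json
import urllib.parse
import urllib.request


def build_tts_request(
    base_url: str, voice_id: str, text: str, api_key: str
) -> urllib.request.Request:
    """Build a POST request for the sync text-to-speech endpoint.

    base_url is assumed to already include the version prefix,
    e.g. "https://phoenix.example.com/tts/v1" (hypothetical host).
    """
    payload = {
        "text": text,
        "model_id": "eleven_multilingual_v2",
        "voice_settings": {
            "stability": 0.5,
            "similarity_boost": 0.75,
            "style": 0,
            "use_speaker_boost": True,
        },
    }
    # Percent-encode the voice ID so it is safe as a single path segment.
    path = f"/text-to-speech/{urllib.parse.quote(voice_id, safe='')}"
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
            "xi-api-key": api_key,  # assumption: key-style auth is in use
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` and calling `.read()` on the response would yield the raw mp3 bytes on a `200 OK`.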
### 2. Streaming text-to-speech
- **Method:** `POST`
- **Path:** `/v1/text-to-speech/:voice_id/stream`
- **Headers:** Same as sync.
- **Body:** Same JSON as sync.
- **Response:** `200 OK`, body = **streaming** mp3 (same format).
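A client consuming the streaming endpoint would typically read the response body in chunks rather than all at once. The sketch below shows one way to do that in Python; `iter_audio_chunks` and the 4096-byte default chunk size are hypothetical choices, and the function works on any binary file-like object, such as the response returned by `urllib.request.urlopen`.

```python
from typing import BinaryIO, Iterator


def iter_audio_chunks(response: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield mp3 bytes from a streaming response until the body ends."""
    while True:
        chunk = response.read(chunk_size)
        if not chunk:
            # An empty read means the server has finished the stream.
            break
        yield chunk
```

Each yielded chunk can be forwarded to an audio player or written to a file as it arrives, so playback can begin before synthesis has finished.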
### 3. Health (recommended)
- **Method:** `GET`
- **Path:** `/health` (at same origin as the TTS base URL, e.g. `https://phoenix.example.com/tts/health` if base is `.../tts/v1`)
- **Response:** `200 OK` (body optional; used for readiness).
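The relationship between the TTS base URL and the health URL in the example above can be sketched as follows. This assumes the base URL ends in the `/v1` prefix and that `/health` sits one level above it; the helper name `health_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def health_url(tts_base_url: str) -> str:
    """Derive the health URL from a TTS base URL such as ".../tts/v1"."""
    parts = urlsplit(tts_base_url)
    path = parts.path.rstrip("/")
    # Assumption: the version prefix is exactly "/v1"; drop it if present.
    if path.endswith("/v1"):
        path = path[: -len("/v1")]
    return urlunsplit((parts.scheme, parts.netloc, f"{path}/health", "", ""))
```

A readiness probe would then issue a `GET` to the returned URL and treat any `200 OK` as healthy, ignoring the body.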
---
## Optional
- **Auth:** If Phoenix uses a different scheme (e.g. Bearer only), clients set `TTS_AUTH_HEADER_NAME` / `TTS_AUTH_HEADER_VALUE`; no API change.
- **Visemes:** For better lip-sync, a future endpoint could return phoneme/viseme timings; client would call it when available.
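The auth override in the first bullet could be resolved on the client side roughly as shown below. This is a sketch under assumptions: it treats `TTS_AUTH_HEADER_NAME` and `TTS_AUTH_HEADER_VALUE` as environment variables, and falling back to `xi-api-key` when no header name is set is a guess, not documented behavior of the virtual-banker client.

```python
import os
from typing import Dict, Mapping


def auth_headers(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the auth header to attach to TTS requests, if one is configured."""
    # Assumption: default to the ElevenLabs-style header name.
    name = env.get("TTS_AUTH_HEADER_NAME", "xi-api-key")
    value = env.get("TTS_AUTH_HEADER_VALUE", "")
    # Send no auth header at all when no value is configured.
    return {name: value} if value else {}
```

For a Bearer-only Phoenix deployment, setting `TTS_AUTH_HEADER_NAME=Authorization` and `TTS_AUTH_HEADER_VALUE="Bearer <token>"` would produce the `Authorization: Bearer <token>` header without any change to the request paths or bodies.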
---
## Reference
- Virtual-banker TTS client: `virtual-banker/backend/tts` (see `backend/tts/README.md`).
- ElevenLabs TTS API: [Text-to-speech](https://elevenlabs.io/docs/api-reference/text-to-speech), [Stream](https://elevenlabs.io/docs/api-reference/text-to-speech/stream).