---
sidebar_position: 11
title: "TTS - OpenedAI-Speech using Docker"
---
# Integrating openedai-speech into Open WebUI using Docker
## What is openedai-speech?
:::info
`openedai-speech` is an OpenAI API-compatible text-to-speech server that uses Coqui AI's `xtts_v2` and/or Piper TTS as the backend. It's a free, private text-to-speech server that allows for custom voice cloning and is compatible with the OpenAI `audio/speech` API.
:::
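
Because the server implements the same `audio/speech` endpoint as OpenAI, any OpenAI-compatible client can talk to it. As a quick orientation, here is a sketch of a request, assuming the server from the steps below is already running on port 8000 (the input text and output file name are illustrative):

```bash
# Generate speech via the OpenAI-compatible endpoint and save it as an MP3 file.
curl http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello from openedai-speech!",
    "voice": "alloy"
  }' \
  -o speech.mp3
```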
## Prerequisites
- Docker installed on your system
- Open WebUI running in a Docker container
- A basic understanding of Docker and Docker Compose
## Option 1: Using Docker Compose
### Step 1: Create a new folder for the openedai-speech service

Create a new folder, for example `openedai-speech-service`, to store the `docker-compose.yml` and `.env` files.
### Step 2: Create a docker-compose.yml file

In the `openedai-speech-service` folder, create a new file named `docker-compose.yml` with the following contents:

```yaml
services:
  server:
    image: ghcr.io/matatonic/openedai-speech
    container_name: openedai-speech
    env_file: .env
    ports:
      - "8000:8000"
    volumes:
      - tts-voices:/app/voices
      - tts-config:/app/config
    # labels:
    #   - "com.centurylinklabs.watchtower.enable=true"
    restart: unless-stopped

volumes:
  tts-voices:
  tts-config:
```
### Step 3: Create an .env file (optional)

In the same `openedai-speech-service` folder, create a new file named `.env` with the following contents. The commented `PRELOAD_MODEL` lines are optional; uncommenting one preloads that model at startup instead of on the first request:

```
TTS_HOME=voices
HF_HOME=voices
#PRELOAD_MODEL=xtts
#PRELOAD_MODEL=xtts_v2.0.2
#PRELOAD_MODEL=parler-tts/parler_tts_mini_v0.1
```
### Step 4: Run docker compose to start the openedai-speech service

Run the following command in the `openedai-speech-service` folder to start the `openedai-speech` service in detached mode:

```bash
docker compose up -d
```

This will start the `openedai-speech` service in the background.
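
To confirm the container came up cleanly, you can optionally check its status and follow the logs (the service name matches the compose file above):

```bash
# Show the state of the service defined in docker-compose.yml.
docker compose ps

# Follow the server logs; model downloads happen on first start.
docker compose logs -f server
```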
## Option 2: Using Docker Run Commands

You can also use the following Docker run commands to start the `openedai-speech` service in detached mode:

**With GPU (Nvidia CUDA) support:**

```bash
docker run -d --gpus=all -p 8000:8000 \
  -v tts-voices:/app/voices -v tts-config:/app/config \
  --name openedai-speech ghcr.io/matatonic/openedai-speech:latest
```

**Alternative without GPU support:**

```bash
docker run -d -p 8000:8000 \
  -v tts-voices:/app/voices -v tts-config:/app/config \
  --name openedai-speech ghcr.io/matatonic/openedai-speech-min:latest
```
## Configuring Open WebUI
:::tip
For more information on configuring Open WebUI to use openedai-speech, including setting environment variables, see the Open WebUI documentation.
:::
### Step 5: Configure Open WebUI to use openedai-speech

Open the Open WebUI settings and navigate to the TTS settings under **Admin Panel > Settings > Audio**, then add the following configuration:

- **API Base URL**: `http://host.docker.internal:8000/v1`
- **API Key**: `sk-111111111` (note: this is a dummy API key, as `openedai-speech` doesn't require one; you can use whatever you like for this field)
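
To verify that the Open WebUI container can actually reach that base URL, one quick check is to issue a request from inside the container. This is a sketch: it assumes your container is named `open-webui`, that `curl` is available in the image, and that the server exposes the standard `/v1/models` listing:

```bash
# List the models the TTS server advertises, from inside the Open WebUI container.
docker exec open-webui curl -s http://host.docker.internal:8000/v1/models
```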
### Step 6: Choose a voice

Under **TTS Voice** within the same audio settings menu in the admin panel, you can set the TTS model to use from the choices below that `openedai-speech` supports. The voices of these models are optimized for the English language.

- `tts-1` or `tts-1-hd`: `alloy`, `echo`, `echo-alt`, `fable`, `onyx`, `nova`, and `shimmer` (`tts-1-hd` is configurable; uses OpenAI samples by default)
### Step 7 (optional): Adding new voices

The voice wave files are stored in the `tts-voices` volume and the configuration files are in the `tts-config` volume. Default voices are defined in `voice_to_speaker.default.yaml`.

In order to add an additional voice, you need to:

- Add an appropriate wave file (`*.wav`) to the `tts-voices` volume, for example `example.wav`.
- Reference the newly added wave file in the `voice_to_speaker.yaml` configuration file, under the appropriate model (either `tts-1` or `tts-1-hd`), e.g.:
```yaml
example:
  model: xtts
  speaker: voices/example.wav
```
To use this new voice, simply use the voice name (in this case `example`) in the Audio configuration settings for your user (or set this voice as the system default).
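
Because both locations are named Docker volumes, one convenient way to get files in and out is through the running container; a sketch, with illustrative file names:

```bash
# Copy the new reference voice into the voices volume via the running container.
docker cp example.wav openedai-speech:/app/voices/example.wav

# Pull the active config out, add the entry shown above, then push it back.
docker cp openedai-speech:/app/config/voice_to_speaker.yaml .
# ... edit voice_to_speaker.yaml locally ...
docker cp voice_to_speaker.yaml openedai-speech:/app/config/voice_to_speaker.yaml

# Restart the container so the server picks up the new mapping.
docker restart openedai-speech
```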
**Model Details:**

Two example parler-tts voices are included in the `voice_to_speaker.default.yaml` file. parler-tts is experimental software and is on the slower side. The exact voice will be slightly different each generation, but should be similar to the basic description.

- `tts-1` via Piper TTS (very fast, runs on CPU): you can map your own Piper voices via the `voice_to_speaker.yaml` configuration file, as per the instructions above.
- `tts-1-hd` via Coqui AI/TTS XTTS v2 voice cloning (fast, but requires around 4 GB of GPU VRAM and an Nvidia GPU with CUDA).
- Multilingual support with XTTS voices.
- Beta parler-tts support (you can describe very basic features of the speaker voice); see https://www.text-description-to-speech.com/ for some examples of how to describe voices.
### Step 8: Press Save to apply the changes and start enjoying natural-sounding voices

Press the **Save** button to apply the changes to your Open WebUI settings, and enjoy the `openedai-speech` integration generating natural-sounding voice responses within Open WebUI.
## Troubleshooting

If you encounter any issues, make sure that:

- The `openedai-speech` service is running and the port you set in the `docker-compose.yml` file is exposed.
- The `host.docker.internal` hostname is resolvable from within the Open WebUI container. `host.docker.internal` is required since `openedai-speech` is exposed via `localhost` on your PC, but `open-webui` cannot normally access it from within its container.
- The API key is set to a dummy value, as `openedai-speech` doesn't require an API key.
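
On Linux, `host.docker.internal` is not defined by default. If name resolution is the problem, one common fix is to start (or re-create) the Open WebUI container with Docker's host-gateway mapping; a sketch, with your other flags and volumes omitted for brevity:

```bash
# Map host.docker.internal to the host's gateway IP inside the container.
docker run -d --add-host=host.docker.internal:host-gateway \
  -p 3000:8080 --name open-webui ghcr.io/open-webui/open-webui:main
```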
## FAQ

### How can I control the emotional range of the generated audio?

There is no direct mechanism to control the emotional output of the generated audio. Certain factors, such as capitalization or grammar, may influence the output audio, but internal tests have yielded mixed results.
## Additional Resources

For more information on `openedai-speech`, please visit the [GitHub repository](https://github.com/matatonic/openedai-speech).
:::note
You can change the port number in the `docker-compose.yml` file to any open and usable port, but make sure to update the **API Base URL** in the Open WebUI Admin Audio settings accordingly.
:::
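
For example, to publish the service on host port 8123 instead (an arbitrary choice), using the docker run form:

```bash
# Same service, published on host port 8123 instead of 8000.
docker run -d -p 8123:8000 \
  -v tts-voices:/app/voices -v tts-config:/app/config \
  --name openedai-speech ghcr.io/matatonic/openedai-speech-min:latest
```

The API Base URL in Open WebUI would then be `http://host.docker.internal:8123/v1`.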