🎤 docs: add custom speech config, browser TTS/STT features, and dynamic speech tab settings (#61)
* initial commit
* chore: Update STT and TTS configuration documentation
* docs(stt_tts): small improvements
# Speech to Text (STT) and Text to Speech (TTS)

<Callout type="info" title="Upcoming STT/TTS Enhancements" collapsible>

The Google Cloud STT/TTS and Deepgram services are being planned for future integration.

</Callout>

## Speech Introduction

The Speech Configuration includes settings for both Speech-to-Text (STT) and Text-to-Speech (TTS) under a unified `speech:` section. Additionally, there is a new `speechTab` menu for user-specific settings.
## Speech Tab (optional)

The `speechTab` menu provides customizable options for conversation and advanced modes, as well as detailed settings for STT and TTS. This sets the default settings for users.
```yaml
speech:
  speechTab:
    conversationMode: true
    advancedMode: false
    speechToText:
      engineSTT: "OpenAI Whisper"
      languageSTT: "en"
      autoTranscribeAudio: true
      decibelValue: 30
      autoSendText: true
    textToSpeech:
      engineTTS: "OpenAI"
      voice: "alloy"
      languageTTS: "en"
      automaticPlayback: true
      playbackRate: 1.0
      cacheTTS: true
```
## STT (Speech-to-Text)

The Speech-to-Text (STT) feature converts spoken words into written text. To enable STT, click the STT button (near the send button) or use the key combination ++Ctrl+Alt+L++ to start the transcription.

### Available STT Services

- **Local STT**
  - Browser-based
  - Whisper (tested on LocalAI)
- **Cloud STT**
  - OpenAI Whisper
  - Azure Whisper
  - Other OpenAI-compatible STT services
### Configuring Local STT

- #### Browser-based

No setup required. Ensure the "Speech To Text" switch in the speech settings tab is enabled and "Browser" is selected in the engine dropdown. The first time you use it, the browser asks for permission to use the microphone; once permission is granted, your speech is transcribed into the chat window in real time. Click the button again to stop the transcription, or wait for the timeout to stop it automatically.

- #### Whisper Local

Requires a local Whisper instance. You can find more information on how to set one up with LocalAI in [LocalAI's documentation](https://localai.io/features/audio-to-text/). Once it is running, add this configuration to your `librechat.yaml`:

```yaml
speech:
  stt:
    openai:
      url: 'http://host.docker.internal:8080/v1/audio/transcriptions'
      model: 'whisper'
```

Here, `url` is the URL of your Whisper instance and `model` is the model to use for the transcription.
### Configuring Cloud STT

- #### OpenAI Whisper

Create an OpenAI API key at [OpenAI's website](https://platform.openai.com/account/api-keys), then add the following configuration to your `librechat.yaml` file:

```yaml
speech:
  stt:
    openai:
      apiKey: '${STT_API_KEY}'
      model: 'whisper-1'
```

- #### Azure Whisper (WIP)

Add the following configuration to your already existing Azure configuration in the `librechat.yaml` file:

<Callout type="info" title="Don't have an Azure configuration yet?" collapsible>

You can find more information on how to set one up in the [Azure OpenAI configuration documentation](https://docs.librechat.ai/install/configuration/azure_openai.html).

</Callout>

```yaml
speech:
  stt:
    azure:
      instanceName: 'instanceName'
      apiKey: '${STT_API_KEY}'
      deploymentName: 'deploymentName'
      apiVersion: 'apiVersion'
```
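The configurations above read the API key from an environment variable rather than hard-coding it. As a sketch (the values below are placeholders, not real keys), the matching entries in your `.env` file could look like:

```bash
# Hypothetical .env entries; the values are placeholders.
# The variable names must match the ${STT_API_KEY} / ${TTS_API_KEY}
# references used in librechat.yaml.
STT_API_KEY=your-stt-api-key
TTS_API_KEY=your-tts-api-key
```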
- #### OpenAI compatible

Refer to the [OpenAI Whisper](#openai-whisper) section, adjusting the `url` and `model` variables as needed. Example:

```yaml
speech:
  stt:
    openai:
      url: 'http://host.docker.internal:8080/v1/audio/transcriptions'
      model: 'whisper'
```
## TTS (Text-to-Speech)

The Text-to-Speech (TTS) feature converts written text into spoken words. Various TTS services are available:

### Available TTS Services

- **Local TTS**
  - Browser-based
  - Piper (tested on LocalAI)
  - Coqui (tested on LocalAI)
- **Cloud TTS**
  - OpenAI TTS
  - ElevenLabs
  - Other OpenAI/ElevenLabs-compatible TTS services
### Configuring Local TTS

- #### Browser-based

No setup required. Ensure the "Text To Speech" switch in the speech settings tab is enabled and "Browser" is selected in the engine dropdown. When you click the TTS button, the browser starts speaking; click the button again to stop the speech, or wait for it to finish.
- #### Piper

Requires a local Piper instance. You can find more information on how to set one up with LocalAI in [LocalAI's documentation](https://localai.io/features/text-to-audio/#piper). Once it is running, add this configuration to your `librechat.yaml`:
```yaml
speech:
  tts:
    localai:
      url: "http://host.docker.internal:8080/tts"
      apiKey: "EMPTY"
      voices: [
        "en-us-amy-low.onnx",
        "en-us-danny-low.onnx",
        "en-us-libritts-high.onnx",
        "en-us-ryan-high.onnx",
      ]
      backend: "piper"
```

These voices are just an example; you can find more information about the available voices in [LocalAI's documentation](https://localai.io/features/text-to-audio/#piper).
- #### Coqui

Requires a local Coqui instance. You can find more information on how to set one up with LocalAI in [LocalAI's documentation](https://localai.io/features/text-to-audio/#-coqui). Once it is running, add this configuration to your `librechat.yaml`:

```yaml
speech:
  tts:
    localai:
      url: 'http://localhost:8080/v1/audio/synthesize'
      voices: ['tts_models/en/ljspeech/glow-tts', 'tts_models/en/ljspeech/tacotron2', 'tts_models/en/ljspeech/waveglow']
      backend: 'coqui'
```

These voices are just an example; you can find more information about the available voices in [LocalAI's documentation](https://localai.io/features/text-to-audio/#-coqui).
### Configuring Cloud TTS

- #### OpenAI TTS

Create an OpenAI API key at [OpenAI's website](https://platform.openai.com/account/api-keys), then add the following configuration to your `librechat.yaml` file:
```yaml
speech:
  tts:
    openai:
      apiKey: '${TTS_API_KEY}'
      model: 'tts-1'
      voices: ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer']
```

You can choose between the `tts-1` and `tts-1-hd` models; more information about the models can be found in [OpenAI's documentation](https://platform.openai.com/docs/guides/text-to-speech/audio-quality).
The voices can be `alloy`, `echo`, `fable`, and so on; more information about the voice options can be found in [OpenAI's documentation](https://platform.openai.com/docs/guides/text-to-speech/voice-options).

- #### ElevenLabs

Create an ElevenLabs API key at [ElevenLabs's website](https://elevenlabs.io/).

Then, click on the "Voices" tab and copy the ID of each voice you want to use. If you haven't added any yet, open the "Voice library", where you can find many pre-made voices; add one and copy its ID by clicking the "ID" button.

Then add the following configuration to your `librechat.yaml` file:
```yaml
speech:
  tts:
    elevenlabs:
      apiKey: '${TTS_API_KEY}'
      model: 'eleven_multilingual_v2'
      voices: ['202898wioas09d2', 'addwqr324tesfsf', '3asdasr3qrq44w', 'adsadsa']
```
- **model:** the model to use for the synthesis (not the voice); more information about the models can be found in [ElevenLabs's documentation](https://elevenlabs.io/docs/api-reference/get-models)
- **voices:** the IDs of all the voices you want to use; add them to your ElevenLabs account first at https://elevenlabs.io/app/voice-lab

Additional ElevenLabs-specific parameters can be added as follows:

<Callout type="warning" title="Only for ElevenLabs" collapsible>

The parameters under `voice_settings`, as well as `pronunciation_dictionary_locators`, apply only to ElevenLabs.

</Callout>
```yaml
voice_settings:
  similarity_boost: '' # number
  stability: '' # number
  style: '' # number
  use_speaker_boost: # boolean
pronunciation_dictionary_locators: [''] # list of strings (array)
```
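The snippet above is shown out of context; judging from the surrounding examples, these parameters would nest under the `elevenlabs` entry. The placement and the numeric values below are assumptions for illustration, not confirmed configuration:

```yaml
speech:
  tts:
    elevenlabs:
      apiKey: '${TTS_API_KEY}'
      model: 'eleven_multilingual_v2'
      voices: ['202898wioas09d2']
      voice_settings:
        similarity_boost: 0.75 # hypothetical value
        stability: 0.5 # hypothetical value
      pronunciation_dictionary_locators: ['']
```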
- #### OpenAI compatible

Refer to the [OpenAI TTS](#openai-tts) section, adjusting the `url` variable as needed; it should be a complete URL. Example:

```yaml
speech:
  tts:
    openai:
      url: 'http://host.docker.internal:8080/v1/audio/synthesize'
      apiKey: '${TTS_API_KEY}'
      model: 'tts-1'
      voices: ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer']
```
- #### ElevenLabs compatible

Refer to the [ElevenLabs](#elevenlabs) section, adjusting the `url` variable as needed. Example:

```yaml
speech:
  tts:
    elevenlabs:
      url: 'http://host.docker.internal:8080/v1/audio/synthesize'
      apiKey: '${TTS_API_KEY}'
      model: 'eleven_multilingual_v2'
      voices: ['202898wioas09d2', 'addwqr324tesfsf', '3asdasr3qrq44w', 'adsadsa']
```
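Putting the pieces together: since STT, TTS, and the speech tab defaults all live under the unified `speech:` section, the examples above can be combined into a single block. This sketch uses the OpenAI variants shown earlier:

```yaml
speech:
  speechTab:
    conversationMode: true
    advancedMode: false
  stt:
    openai:
      apiKey: '${STT_API_KEY}'
      model: 'whisper-1'
  tts:
    openai:
      apiKey: '${TTS_API_KEY}'
      model: 'tts-1'
      voices: ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer']
```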