---
sidebar_position: 10
title: "Multi-Replica / High Availability / Concurrency"
---

# Multi-Replica, High Availability & Concurrency Troubleshooting

This guide addresses common issues encountered when deploying Open WebUI in **multi-replica** environments (e.g., Kubernetes, Docker Swarm) or when using **multiple workers** (`UVICORN_WORKERS > 1`) for increased concurrency.

## Core Requirements Checklist

Before troubleshooting specific errors, ensure your deployment meets these **absolute requirements** for a multi-replica setup. Missing any of them can cause instability, login loops, or data loss.

1. **Shared Secret Key:** [`WEBUI_SECRET_KEY`](/getting-started/env-configuration#webui_secret_key) **MUST** be identical on all replicas.
2. **External Database:** You **MUST** use an external PostgreSQL database (see [`DATABASE_URL`](/getting-started/env-configuration#database_server)). SQLite is **NOT** supported for multiple instances.
3. **Redis for WebSockets:** [`ENABLE_WEBSOCKET_SUPPORT=True`](/getting-started/env-configuration#enable_websocket_support) and [`WEBSOCKET_MANAGER=redis`](/getting-started/env-configuration#websocket_manager) with a valid [`WEBSOCKET_REDIS_URL`](/getting-started/env-configuration#websocket_redis_url) are required.
4. **Shared Storage:** All replicas must mount the same persistent storage for `data/`, ideally a ReadWriteMany (RWX) volume. This is critical for RAG (uploads/vectors) and generated images.
5. **External Vector Database (Recommended):** While embedded Chroma works with shared storage, using a dedicated external Vector DB (e.g., [PGVector](/getting-started/env-configuration#pgvector_db_url), Milvus, Qdrant) is **highly recommended** to avoid file locking issues and improve performance.

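The first three checklist items can be verified mechanically before a rollout. The following is a minimal sketch, not part of Open WebUI: `check_multi_replica_env` is a hypothetical helper that inspects a mapping of environment variables and reports what looks wrong. It treats an unset variable as a problem, which may be stricter than the application's own defaults, and it cannot confirm that `WEBUI_SECRET_KEY` is *identical* across replicas, only that it is set.

```python
from urllib.parse import urlparse


def check_multi_replica_env(env: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the basics look right.

    Hypothetical preflight helper, not an Open WebUI API.
    """
    problems = []
    if not env.get("WEBUI_SECRET_KEY"):
        problems.append("WEBUI_SECRET_KEY is not set (must be identical on all replicas)")
    # Accept any URL scheme starting with "postgres" (e.g. postgresql://).
    db_scheme = urlparse(env.get("DATABASE_URL", "")).scheme
    if not db_scheme.startswith("postgres"):
        problems.append("DATABASE_URL does not point to PostgreSQL")
    if env.get("ENABLE_WEBSOCKET_SUPPORT", "").lower() != "true":
        problems.append("ENABLE_WEBSOCKET_SUPPORT is not True")
    if env.get("WEBSOCKET_MANAGER") != "redis":
        problems.append("WEBSOCKET_MANAGER is not 'redis'")
    if not env.get("WEBSOCKET_REDIS_URL"):
        problems.append("WEBSOCKET_REDIS_URL is not set")
    return problems
```

Inside a container you could call it as `check_multi_replica_env(dict(os.environ))` and print the returned list.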
---

## Common Issues

### 1. Login Loops / 401 Unauthorized Errors

**Symptoms:**
- You log in successfully, but the next click logs you out.
- You see "Unauthorized" or "401" errors in the browser console immediately after login.
- "Error decrypting tokens" appears in logs.

**Cause:**
Each replica is using a different `WEBUI_SECRET_KEY`. When Replica A issues a session token (JWT), Replica B rejects it because it cannot verify the signature with its own different key.

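The signature mismatch can be reproduced in a few lines. This is a simplified stand-in for signed-token verification using an HMAC-SHA256 signature from the Python standard library; it is not Open WebUI's token code, and the key strings and payload are made up for illustration.

```python
import hashlib
import hmac


def sign(payload: bytes, secret_key: str) -> str:
    """Sign a payload with a shared secret (simplified stand-in for JWT signing)."""
    return hmac.new(secret_key.encode(), payload, hashlib.sha256).hexdigest()


def verify(payload: bytes, signature: str, secret_key: str) -> bool:
    """Recompute the signature with the local key and compare in constant time."""
    return hmac.compare_digest(sign(payload, secret_key), signature)


payload = b'{"id": "user-123"}'
# Replica A issues the token with its own key.
token_signature = sign(payload, "key-on-replica-a")

accepted_by_a = verify(payload, token_signature, "key-on-replica-a")  # same key
accepted_by_b = verify(payload, token_signature, "key-on-replica-b")  # different key
```

Replica A accepts the token it issued, while Replica B, holding a different key, computes a different signature and rejects it. From the user's side this looks like being logged out on the next request.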
**Solution:**
Set the `WEBUI_SECRET_KEY` environment variable to the **same** strong, random string on all backend replicas.

```yaml
# Example for a Kubernetes container spec
# (in Docker Compose, set the same variable under `environment:`)
env:
  - name: WEBUI_SECRET_KEY
    value: "your-super-secure-static-key-here"
```

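One way to produce a suitably strong, random value is Python's `secrets` module. This is only a sketch: `generate_secret_key` is a hypothetical helper name, and any cryptographically secure generator works equally well.

```python
import secrets


def generate_secret_key(num_bytes: int = 32) -> str:
    """Return a URL-safe random string built from num_bytes of OS-level randomness."""
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    print(generate_secret_key())
```

Generate the key once, store it in your secret manager (for example a Kubernetes Secret), and inject that same value into every replica. Generating a fresh key at each container start would recreate the login loop described above.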
### 2. WebSocket 403 Errors / Connection Failures

**Symptoms:**
- Chat stops responding or hangs.
- Browser console shows `WebSocket connection failed: 403 Forbidden` or `Connection closed`.
- Logs show `engineio.server: https://your-domain.com is not an accepted origin`.

**Cause:**
- **CORS:** The origin the browser sends (your public URL behind the load balancer or ingress) is not in the list of allowed origins.
- **Missing Redis:** WebSockets are defaulting to in-memory, so events on Replica A (e.g., LLM generation finish) are not broadcast to the user connected to Replica B.

**Solution:**
1. **Configure CORS:** Ensure [`CORS_ALLOW_ORIGIN`](/getting-started/env-configuration#cors_allow_origin) includes your public domain *and* its http/https variations.

    If you see logs like `engineio.base_server:_log_error_once:354 - https://yourdomain.com is not an accepted origin`, you must update this variable. It accepts a **semicolon-separated list** of allowed origins.

    **Example:**
    ```bash
    CORS_ALLOW_ORIGIN="https://chat.yourdomain.com;http://chat.yourdomain.com;https://yourhostname;http://localhost:3000"
    ```
    *Add all valid IPs, domains, and hostnames that users might use to access your Open WebUI.*
2. **Enable Redis for WebSockets:**
    Ensure these variables are set on **all** replicas:
    ```bash
    ENABLE_WEBSOCKET_SUPPORT=True
    WEBSOCKET_MANAGER=redis
    WEBSOCKET_REDIS_URL=redis://your-redis-host:6379/0
    ```

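To check whether a given origin from the error log is covered by your `CORS_ALLOW_ORIGIN` value, you can split the semicolon-separated list yourself. The helpers below are hypothetical and assume an exact string comparison per origin; the server's real matching logic may differ.

```python
def parse_allowed_origins(value: str) -> set[str]:
    """Split a semicolon-separated origin list, ignoring blanks and stray whitespace."""
    return {origin.strip() for origin in value.split(";") if origin.strip()}


def is_origin_allowed(origin: str, cors_allow_origin: str) -> bool:
    """Exact-match check of one origin (scheme + host + optional port) against the list."""
    return origin in parse_allowed_origins(cors_allow_origin)


value = (
    "https://chat.yourdomain.com;http://chat.yourdomain.com;"
    "https://yourhostname;http://localhost:3000"
)
listed = is_origin_allowed("https://chat.yourdomain.com", value)
unlisted = is_origin_allowed("https://other.example.com", value)
```

Paste the origin exactly as it appears in the `is not an accepted origin` log line. A different scheme, port, or trailing slash counts as a different origin in this check.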
### 3. "Model Not Found" or Configuration Mismatch

**Symptoms:**
- You enable a model or change a setting in the Admin UI, but other users (or you, after a refresh) don't see the change.
- Chats fail with "Model not found" intermittently.

**Cause:**
- **Configuration Sync:** Replicas are not synced. Open WebUI uses Redis Pub/Sub to broadcast configuration changes (like toggling a model) to all other instances.
- **Missing Redis:** If `REDIS_URL` is not set, configuration changes stay local to the instance where the change was made.

**Solution:**
Set `REDIS_URL` to point to your shared Redis instance. This enables the Pub/Sub mechanism for real-time config syncing.

```bash
REDIS_URL=redis://your-redis-host:6379/0
```

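The difference between synced and unsynced replicas can be pictured with a toy model. The classes below are purely illustrative: `Broker` is an in-memory stand-in for a shared Redis Pub/Sub channel, `Replica` is a hypothetical instance holding local configuration, and none of this reflects how Open WebUI actually implements config sync.

```python
from typing import Optional


class Broker:
    """In-memory stand-in for a shared Pub/Sub channel (not a real Redis client)."""

    def __init__(self) -> None:
        self.subscribers: list["Replica"] = []

    def publish(self, key: str, value: object) -> None:
        # Deliver the change to every subscribed replica.
        for replica in self.subscribers:
            replica.config[key] = value


class Replica:
    """Hypothetical application instance with its own local config."""

    def __init__(self, broker: Optional[Broker] = None) -> None:
        self.config: dict[str, object] = {}
        self.broker = broker
        if broker is not None:
            broker.subscribers.append(self)

    def set_config(self, key: str, value: object) -> None:
        self.config[key] = value  # the change always applies locally
        if self.broker is not None:
            self.broker.publish(key, value)  # and is broadcast only if a broker exists


# With a shared broker, a change on A reaches B.
broker = Broker()
synced_a, synced_b = Replica(broker), Replica(broker)
synced_a.set_config("model-x-enabled", True)

# Without one, the change stays on the replica that received it.
local_a, local_b = Replica(), Replica()
local_a.set_config("model-x-enabled", True)
```

In the second case, a request routed to `local_b` does not know the model was enabled, which matches the intermittent "Model not found" symptom.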
### 4. Database Corruption / "Locked" Errors

**Symptoms:**
- Logs show `database is locked` or severe SQL errors.
- Data saved on one instance disappears on another.

**Cause:**
The deployment is using **SQLite** with multiple replicas. SQLite is a file-based database and does not safely support concurrent writes from multiple containers, particularly over shared or network storage.

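The `database is locked` error is easy to reproduce on a single machine with two connections to the same file. This sketch uses only Python's built-in `sqlite3` module and a temporary database; the table and function names are made up, and it illustrates SQLite's single-writer behavior rather than anything specific to Open WebUI's schema.

```python
import os
import sqlite3
import tempfile


def demo_write_conflict() -> str:
    """Open two connections to one SQLite file and let both try to write."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # timeout=0: fail immediately instead of waiting for the lock.
        # isolation_level=None: we control transactions explicitly.
        a = sqlite3.connect(path, timeout=0, isolation_level=None)
        b = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            a.execute("CREATE TABLE t (x INTEGER)")
            a.execute("BEGIN IMMEDIATE")  # connection A takes the write lock
            a.execute("INSERT INTO t VALUES (1)")
            try:
                b.execute("BEGIN IMMEDIATE")  # connection B wants to write too
                return "no error"
            except sqlite3.OperationalError as exc:
                return str(exc)
        finally:
            a.close()
            b.close()


if __name__ == "__main__":
    print(demo_write_conflict())
```

Only one writer can hold the lock at a time. With several replicas writing to one SQLite file, every replica competes for that single lock, and on network filesystems the locking itself may be unreliable.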
**Solution:**
Migrate to **PostgreSQL**. Update your connection string:

```bash
DATABASE_URL=postgresql://user:password@postgres-host:5432/openwebui
```

### 5. Uploaded Files or RAG Knowledge Inaccessible

**Symptoms:**
- You upload a file (for RAG) on one instance, but the model cannot find it later.
- Generated images appear as broken links.

**Cause:**
The `/app/backend/data` directory is not shared or is not consistent across replicas. If a user uploads a file to Replica 1 and the next request hits Replica 2, Replica 2 does not have the file on disk.

**Solution:**
- **Kubernetes:** Use a `PersistentVolumeClaim` with `ReadWriteMany` (RWX) access mode if your storage provider supports it (e.g., NFS, CephFS, AWS EFS).
- **Docker Swarm/Compose:** Mount a shared volume (e.g., NFS mount) to `/app/backend/data` on all containers.

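A quick way to confirm that replicas really see the same directory is to have each one drop a marker file and then list the markers from any of them. The sketch below is a hypothetical debugging aid, not an Open WebUI feature: the `replica-markers` subdirectory and both function names are invented, and temporary directories stand in for `/app/backend/data`.

```python
import tempfile
from pathlib import Path

MARKER_DIR = "replica-markers"  # hypothetical scratch subdirectory


def write_marker(data_dir: str, replica_id: str) -> None:
    """Create an empty-ish marker file named after this replica."""
    marker_dir = Path(data_dir) / MARKER_DIR
    marker_dir.mkdir(parents=True, exist_ok=True)
    (marker_dir / replica_id).write_text("ok")


def visible_replicas(data_dir: str) -> set[str]:
    """Return the replica IDs whose markers are visible from this data directory."""
    marker_dir = Path(data_dir) / MARKER_DIR
    if not marker_dir.is_dir():
        return set()
    return {p.name for p in marker_dir.iterdir()}


# Shared storage: both replicas write into the same directory.
with tempfile.TemporaryDirectory() as shared:
    write_marker(shared, "replica-1")
    write_marker(shared, "replica-2")
    shared_view = visible_replicas(shared)

# Separate storage: each replica has its own directory.
with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b:
    write_marker(dir_a, "replica-1")
    write_marker(dir_b, "replica-2")
    isolated_view = visible_replicas(dir_a)
```

If a replica only ever sees its own marker, the volume is not actually shared. Remove the scratch directory once you have finished checking.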
---

## Deployment Best Practices

### Updates and Migrations

:::danger Critical: Avoid Concurrent Migrations
**Always ensure only one process is running database migrations when upgrading Open WebUI versions.**
:::

Database migrations run automatically on startup. If multiple replicas (or multiple workers within a single container) start simultaneously with a new version, they may try to run migrations concurrently, potentially leading to race conditions or database schema corruption.

**Safe Update Procedure:**

There are two ways to safely handle migrations in a multi-replica environment:

#### Option 1: Designate a Master Migration Pod (Recommended)
1. Identify one pod/replica as the "master" for migrations.
2. Set `ENABLE_DB_MIGRATIONS=True` (default) on the master pod.
3. Set `ENABLE_DB_MIGRATIONS=False` on all other pods.
4. When updating, the master pod will handle the database schema update while other pods skip the migration step.

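Before rolling out, it can help to double-check that exactly one replica in your plan will run migrations. The helper below is a hypothetical sketch: it takes a mapping of replica name to environment variables, treats an unset `ENABLE_DB_MIGRATIONS` as enabled (the default stated above), and assumes the value is compared case-insensitively against `true`, which may not match the application's exact parsing.

```python
def migrating_replicas(replica_envs: dict[str, dict[str, str]]) -> list[str]:
    """Return the names of replicas that would run migrations on startup."""
    return sorted(
        name
        for name, env in replica_envs.items()
        # Unset counts as enabled, mirroring the documented default.
        if env.get("ENABLE_DB_MIGRATIONS", "True").lower() == "true"
    )


# Replica names here are placeholders for your own pods or services.
safe_plan = {
    "master": {},  # default: migrations enabled
    "worker-1": {"ENABLE_DB_MIGRATIONS": "False"},
    "worker-2": {"ENABLE_DB_MIGRATIONS": "False"},
}
unsafe_plan = {"pod-a": {}, "pod-b": {}}  # both would migrate concurrently

safe_migrators = migrating_replicas(safe_plan)
unsafe_migrators = migrating_replicas(unsafe_plan)
```

A plan is safe under Option 1 when the returned list has exactly one entry.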
#### Option 2: Scale Down During Update
1. **Scale Down:** Set replicas to `1` (and ensure `UVICORN_WORKERS=1`).
2. **Update Image:** Update the image or version.
3. **Wait for Health Check:** Wait for the single instance to start fully and complete migrations.
4. **Scale Up:** Increase replicas back to your desired count.


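On Kubernetes, the four steps of Option 2 map roughly onto standard `kubectl` subcommands. The sketch below only *builds* the command lines in order so you can review them before running anything. The deployment name, container name, and image are placeholders, and it does not handle `UVICORN_WORKERS`, which you still need to set to `1` yourself.

```python
def scale_down_update_commands(
    deployment: str, container: str, image: str, replicas: int
) -> list[list[str]]:
    """Return kubectl invocations for a scale-down, update, wait, scale-up sequence."""
    target = f"deployment/{deployment}"
    return [
        ["kubectl", "scale", target, "--replicas=1"],  # 1. scale down
        ["kubectl", "set", "image", target, f"{container}={image}"],  # 2. update image
        ["kubectl", "rollout", "status", target],  # 3. wait for the rollout
        ["kubectl", "scale", target, f"--replicas={replicas}"],  # 4. scale up
    ]


# Placeholder names; substitute your own deployment, container, and image.
commands = scale_down_update_commands(
    deployment="open-webui",
    container="open-webui",
    image="registry.example.com/open-webui:new-tag",
    replicas=3,
)

if __name__ == "__main__":
    for cmd in commands:
        print(" ".join(cmd))
```

Note that `kubectl rollout status` waits for the pod to report ready. Whether that guarantees migrations have finished depends on how your readiness probe is configured, so check the pod logs before scaling back up.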
### Session Affinity (Sticky Sessions)
While Open WebUI is designed to be stateless with proper Redis configuration, enabling **Session Affinity** (Sticky Sessions) at your Load Balancer / Ingress level can improve performance and reduce occasional jitter in WebSocket connections.

- **Nginx Ingress:** `nginx.ingress.kubernetes.io/affinity: "cookie"`
- **AWS ALB:** Enable Target Group Stickiness.

---

## Related Documentation

- [Environment Variable Configuration](/getting-started/env-configuration)
- [Troubleshooting Connection Errors](/troubleshooting/connection-error)
- [Logging Configuration](/getting-started/advanced-topics/logging)