Add cross-links between scaling guide and troubleshooting docs

- scaling.md: Link each step to relevant troubleshooting sections
  (DB corruption, WebSocket errors, login loops, worker crashes,
  file access issues, logging, OpenTelemetry)
- multi-replica.mdx: Add scaling guide link in intro and Related Docs,
  plus links to Redis and RAG troubleshooting
- performance.md: Add scaling guide link in Scaling Infrastructure section
- redis.md: Add scaling guide link in "When is Redis Required?"
- connection-error.mdx: Add links to Redis tutorial, scaling guide,
  and multi-replica WebSocket troubleshooting
- Helm.md: Add scaling guide link alongside existing HA guide link

https://claude.ai/code/session_01TPoquFdHG6dZxRrZ4Jormh
Claude
2026-02-16 13:02:57 +00:00
parent e9e15395cb
commit 68c7feca10
6 changed files with 31 additions and 6 deletions

@@ -42,9 +42,10 @@ DATABASE_URL=postgresql://user:password@db-host:5432/openwebui
 **Key things to know:**
-- Open WebUI does **not** migrate data between databases — plan this before you have production data in SQLite.
+- Open WebUI does **not** migrate data between databases — plan this before you have production data in SQLite. If you need to migrate an existing database, see the [Database Migration guide](/troubleshooting/manual-database-migration).
-- For high-concurrency deployments, tune `DATABASE_POOL_SIZE` and `DATABASE_POOL_MAX_OVERFLOW` to match your usage patterns.
+- For high-concurrency deployments, tune `DATABASE_POOL_SIZE` and `DATABASE_POOL_MAX_OVERFLOW` to match your usage patterns. See [Database Optimization](/troubleshooting/performance#-database-optimization) for detailed guidance.
 - Remember that **each Open WebUI instance maintains its own connection pool**, so total connections = pool size × number of instances.
+- If you skip this step and run multiple instances with SQLite, you will see `database is locked` errors and data corruption. See [Database Corruption / "Locked" Errors](/troubleshooting/multi-replica#4-database-corruption--locked-errors) for details.
 :::tip
 A good starting point for tuning is `DATABASE_POOL_SIZE=15` and `DATABASE_POOL_MAX_OVERFLOW=20`. Keep the combined total per instance well below your PostgreSQL `max_connections` limit (default is 100).
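
The pool arithmetic in this hunk can be illustrated with a minimal Python sketch. It is not part of the commit: the function names and the 80% headroom factor are assumptions chosen for illustration, and the worst case is taken to be pool size plus overflow for every instance.

```python
def total_db_connections(pool_size: int, max_overflow: int, instances: int) -> int:
    """Worst-case number of PostgreSQL connections opened by all instances.

    Each instance keeps its own pool, so the per-instance ceiling
    (pool_size + max_overflow) is multiplied by the instance count.
    """
    return (pool_size + max_overflow) * instances


def fits_within_limit(
    pool_size: int,
    max_overflow: int,
    instances: int,
    max_connections: int = 100,  # PostgreSQL default mentioned in the tip
    headroom: float = 0.8,       # hypothetical safety margin, not from the docs
) -> bool:
    """Return True if the worst case stays below the chosen share of max_connections."""
    budget = int(max_connections * headroom)
    return total_db_connections(pool_size, max_overflow, instances) <= budget


if __name__ == "__main__":
    # The suggested starting values (15 and 20) with three instances.
    print(total_db_connections(15, 20, 3))
    print(fits_within_limit(15, 20, 3))
```

With the suggested starting values, three instances can open up to 105 connections in the worst case, which already exceeds the default limit of 100, so either the pool settings or `max_connections` would need adjusting at that replica count.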
@@ -74,6 +75,9 @@ ENABLE_WEBSOCKET_SUPPORT=true
 - If you're using Redis Sentinel for high availability, also set `REDIS_SENTINEL_HOSTS` and consider setting `REDIS_SOCKET_CONNECT_TIMEOUT=5` to prevent hangs during failover.
 - For AWS Elasticache or other managed Redis Cluster services, set `REDIS_CLUSTER=true`.
 - Make sure your Redis server has `timeout 1800` and a high enough `maxclients` (10000+) to prevent connection exhaustion over time.
+- Without Redis in a multi-instance setup, you will experience [WebSocket 403 errors](/troubleshooting/multi-replica#2-websocket-403-errors--connection-failures), [configuration sync issues](/troubleshooting/multi-replica#3-model-not-found-or-configuration-mismatch), and intermittent authentication failures.
+For a complete step-by-step Redis setup (Docker Compose, Sentinel, Cluster mode, verification), see the [Redis WebSocket Support](/tutorials/integrations/redis) tutorial. For WebSocket and CORS issues behind reverse proxies, see [Connection Errors](/troubleshooting/connection-error#-https-tls-cors--websocket-issues).
 ---
@@ -83,12 +87,16 @@ ENABLE_WEBSOCKET_SUPPORT=true
 Open WebUI is stateless, so you can run as many instances as needed behind a **load balancer**. Each instance is identical and interchangeable.
+:::warning
+Before running multiple instances, ensure you have completed **Steps 1 and 2** (PostgreSQL and Redis). You also need a shared `WEBUI_SECRET_KEY` across all replicas — without it, users will experience [login loops and 401 errors](/troubleshooting/multi-replica#1-login-loops--401-unauthorized-errors). For a full pre-flight checklist, see the [Core Requirements Checklist](/troubleshooting/multi-replica#core-requirements-checklist).
+:::
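
The prerequisites named in this warning could be checked before rollout with a small script along these lines. This is a hedged sketch, not something the commit adds: `preflight_issues` and `shared_secret_key` are hypothetical helpers, and the checks only inspect the environment variables that appear in this diff (`DATABASE_URL`, `WEBSOCKET_MANAGER`, `WEBSOCKET_REDIS_URL`, `WEBUI_SECRET_KEY`).

```python
from typing import List, Mapping, Sequence


def preflight_issues(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems found in one replica's environment."""
    issues = []
    # Step 1: the database should be PostgreSQL rather than the SQLite default.
    if not env.get("DATABASE_URL", "").startswith("postgres"):
        issues.append("DATABASE_URL does not point at PostgreSQL")
    # Step 2: WebSocket state should be managed through Redis.
    if env.get("WEBSOCKET_MANAGER") != "redis" or not env.get("WEBSOCKET_REDIS_URL"):
        issues.append("WEBSOCKET_MANAGER/WEBSOCKET_REDIS_URL are not set for Redis")
    # A secret key must be set explicitly so that it can be shared.
    if not env.get("WEBUI_SECRET_KEY"):
        issues.append("WEBUI_SECRET_KEY is not set")
    return issues


def shared_secret_key(envs: Sequence[Mapping[str, str]]) -> bool:
    """Return True if every replica has the same non-empty WEBUI_SECRET_KEY."""
    keys = {env.get("WEBUI_SECRET_KEY") for env in envs}
    return len(keys) == 1 and None not in keys and "" not in keys


if __name__ == "__main__":
    replica = {
        "DATABASE_URL": "postgresql://user:password@db-host:5432/openwebui",
        "WEBSOCKET_MANAGER": "redis",
        "WEBSOCKET_REDIS_URL": "redis://redis:6379/1",
        "WEBUI_SECRET_KEY": "example-secret",  # placeholder value
    }
    print(preflight_issues(replica))
    print(shared_secret_key([replica, replica]))
```

A check like this only covers configuration that is visible in the environment; the linked Core Requirements Checklist remains the authoritative list.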
 ### Option A: Container Orchestration (Recommended)
 Use Kubernetes, Docker Swarm, or similar platforms to manage multiple replicas:
 - Keep `UVICORN_WORKERS=1` per container (let the orchestrator handle scaling, not the app)
-- Set `ENABLE_DB_MIGRATIONS=false` on all replicas except one designated "primary" pod to prevent migration race conditions
+- Set `ENABLE_DB_MIGRATIONS=false` on all replicas except one designated "primary" pod to prevent migration race conditions — see [Updates and Migrations](/troubleshooting/multi-replica#updates-and-migrations) for the safe procedure
 - Scale up/down by adjusting your replica count
 ### Option B: Multiple Workers per Container
@@ -123,6 +131,8 @@ INFO: Child process [pid] died
 This is a [well-known SQLite limitation](https://www.sqlite.org/howtocorrupt.html#_carrying_an_open_database_connection_across_a_fork_), not a bug. It also affects multi-replica deployments where multiple containers access the same ChromaDB data directory.
+For the full crash sequence analysis, see [Worker Crashes During Document Upload](/troubleshooting/multi-replica#6-worker-crashes-during-document-upload-chromadb--multi-worker) or [RAG Troubleshooting: Worker Dies During Upload](/troubleshooting/rag#12-worker-dies-during-document-upload).
 :::
 **What to do:**
@@ -155,7 +165,7 @@ For maximum scalability in self-hosted environments, **Milvus** and **Qdrant** b
 **When:** You're running multiple instances that need to share uploaded files, generated images, and other user data.
-By default, Open WebUI stores uploaded files on the local filesystem under `DATA_DIR` (typically `/app/backend/data`). In a multi-instance setup, each instance needs access to the same files.
+By default, Open WebUI stores uploaded files on the local filesystem under `DATA_DIR` (typically `/app/backend/data`). In a multi-instance setup, each instance needs access to the same files. Without shared storage, you will see [uploaded files and RAG knowledge become inaccessible](/troubleshooting/multi-replica#5-uploaded-files-or-rag-knowledge-inaccessible) when requests hit different replicas.
 ### Do I need cloud storage (S3)?
@@ -229,6 +239,8 @@ OTEL_EXPORTER_OTLP_ENDPOINT=http://your-collector:4317
 This gives you visibility into request latency, database query performance, error rates, and more.
+For the full setup guide, see [OpenTelemetry Monitoring](/reference/monitoring/otel). For application-level log configuration (log levels, debug output), see [Logging Configuration](/getting-started/advanced-topics/logging).
 ---
 ## Putting It All Together
@@ -258,6 +270,8 @@ Here's what a production-ready scaled deployment typically looks like:
 └───────────────────────────────┘
 ```
+**Running into issues?** The [Scaling & HA Troubleshooting](/troubleshooting/multi-replica) guide covers common problems (login loops, WebSocket failures, database locks, worker crashes) and their solutions. For performance tuning at scale, see [Optimization, Performance & RAM Usage](/troubleshooting/performance).
 ### Minimum Environment Variables for Scaled Deployments
 ```bash

@@ -34,7 +34,9 @@ Helm helps you manage Kubernetes applications.
 If you intend to scale Open WebUI using multiple nodes/pods/workers in a clustered environment, you need to setup a NoSQL key-value database (Redis).
 There are some [environment variables](https://docs.openwebui.com/reference/env-configuration/) that need to be set to the same value for all service-instances, otherwise consistency problems, faulty sessions and other issues will occur!
-**Important:** The default vector database (ChromaDB) uses a local SQLite-backed client that is **not safe for multi-replica or multi-worker deployments**. SQLite connections are not fork-safe, and concurrent writes from multiple processes will crash workers instantly. You **must** switch to an external vector database (PGVector, Milvus, Qdrant) via [`VECTOR_DB`](https://docs.openwebui.com/reference/env-configuration#vector_db), or run ChromaDB as a separate HTTP server via [`CHROMA_HTTP_HOST`](https://docs.openwebui.com/reference/env-configuration#chroma_http_host). See the [Scaling & HA guide](https://docs.openwebui.com/troubleshooting/multi-replica) for full requirements.
+**Important:** The default vector database (ChromaDB) uses a local SQLite-backed client that is **not safe for multi-replica or multi-worker deployments**. SQLite connections are not fork-safe, and concurrent writes from multiple processes will crash workers instantly. You **must** switch to an external vector database (PGVector, Milvus, Qdrant) via [`VECTOR_DB`](https://docs.openwebui.com/reference/env-configuration#vector_db), or run ChromaDB as a separate HTTP server via [`CHROMA_HTTP_HOST`](https://docs.openwebui.com/reference/env-configuration#chroma_http_host).
+For the complete step-by-step scaling walkthrough, see [Scaling Open WebUI](https://docs.openwebui.com/getting-started/advanced-topics/scaling). For troubleshooting multi-replica issues, see the [Scaling & HA guide](https://docs.openwebui.com/troubleshooting/multi-replica).
 :::
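
The vector database rule stated in this note can be expressed as a short decision function. The sketch below is illustrative only: `vector_db_safe_for_scaling` is a hypothetical helper, and the assumption that an unset `VECTOR_DB` means the value `chroma` follows from the note calling ChromaDB the default rather than from anything in this diff.

```python
from typing import Mapping


def vector_db_safe_for_scaling(
    env: Mapping[str, str], replicas: int = 1, workers: int = 1
) -> bool:
    """Return True if the vector DB settings are safe for the given topology."""
    # A single replica with a single worker can keep the local ChromaDB client.
    if replicas <= 1 and workers <= 1:
        return True
    # Assumption: an unset VECTOR_DB falls back to the ChromaDB default ("chroma").
    vector_db = env.get("VECTOR_DB", "chroma").strip().lower()
    if vector_db != "chroma":
        # An external vector database (for example PGVector, Milvus, Qdrant).
        return True
    # ChromaDB is only acceptable here when run as a separate HTTP server.
    return bool(env.get("CHROMA_HTTP_HOST"))


if __name__ == "__main__":
    print(vector_db_safe_for_scaling({}, replicas=3))
    print(vector_db_safe_for_scaling({"CHROMA_HTTP_HOST": "chroma"}, replicas=3))
```

The first call reports an unsafe configuration (default ChromaDB with three replicas), while the second passes because ChromaDB is reached over HTTP.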

@@ -90,6 +90,8 @@ WEBSOCKET_MANAGER=redis
 WEBSOCKET_REDIS_URL=redis://redis:6379/1
 ```
+For detailed Redis setup instructions, see [Redis WebSocket Support](/tutorials/integrations/redis). For a complete multi-instance scaling walkthrough, see [Scaling Open WebUI](/getting-started/advanced-topics/scaling). If you're seeing WebSocket 403 errors specifically in a multi-replica setup, see [Scaling & HA Troubleshooting](/troubleshooting/multi-replica#2-websocket-403-errors--connection-failures).
 ### Testing Your Configuration
 To verify your setup is working:

@@ -7,6 +7,8 @@ title: "Scaling & HA"
 This guide addresses common issues encountered when deploying Open WebUI in **multi-replica** environments (e.g., Kubernetes, Docker Swarm) or when using **multiple workers** (`UVICORN_WORKERS > 1`) for increased concurrency.
+If you are setting up a scaled deployment for the first time, start with the [Scaling Open WebUI](/getting-started/advanced-topics/scaling) guide for a step-by-step walkthrough.
 ## Core Requirements Checklist
 Before troubleshooting specific errors, ensure your deployment meets these **absolute requirements** for a multi-replica setup. Missing any of these will cause instability, login loops, or data loss.
@@ -253,8 +255,11 @@ While Open WebUI is designed to be stateless with proper Redis configuration, en
 ## Related Documentation
+- [Scaling Open WebUI](/getting-started/advanced-topics/scaling) — Step-by-step guide to scaling from single instance to production
 - [Environment Variable Configuration](/reference/env-configuration)
 - [Optimization, Performance & RAM Usage](/troubleshooting/performance)
+- [Redis WebSocket Support](/tutorials/integrations/redis) — Detailed Redis setup tutorial
 - [Troubleshooting Connection Errors](/troubleshooting/connection-error)
+- [RAG Troubleshooting](/troubleshooting/rag) — Document upload and embedding issues
 - [Logging Configuration](/getting-started/advanced-topics/logging)

@@ -149,6 +149,8 @@ The way your documents are chunked directly impacts both storage efficiency and
 If you are deploying for **enterprise scale** (hundreds of users), simple Docker Compose setups may not suffice. You will need to move to a clustered environment.
+For a step-by-step walkthrough of the entire scaling journey (PostgreSQL, Redis, vector DB, storage, observability), see the **[Scaling Open WebUI](/getting-started/advanced-topics/scaling)** guide.
 * **Kubernetes / Helm**: For deploying on K8s with multiple replicas, see the **[Multi-Replica & High Availability Guide](/troubleshooting/multi-replica)**.
 * **Redis (Mandatory)**: When running multiple workers (`UVICORN_WORKERS > 1`) or multiple replicas, **Redis is required** to handle WebSocket connections and session syncing. See **[Redis Integration](/tutorials/integrations/redis)**.
 * **Load Balancing**: Ensure your Ingress controller supports **Session Affinity** (Sticky Sessions) for best performance.

@@ -17,7 +17,7 @@ This documentation page outlines the steps required to integrate Redis with Open
 ## When is Redis Required?
-Redis serves two distinct purposes in Open WebUI, and understanding when it's required is crucial for proper deployment:
+Redis serves two distinct purposes in Open WebUI, and understanding when it's required is crucial for proper deployment. For a high-level overview of all scaling requirements (PostgreSQL, Redis, vector DB, storage), see the [Scaling Open WebUI](/getting-started/advanced-topics/scaling) guide.
 ### Single Instance Deployments