diff --git a/docs/getting-started/advanced-topics/scaling.md b/docs/getting-started/advanced-topics/scaling.md index 2a7891ed..80c14339 100644 --- a/docs/getting-started/advanced-topics/scaling.md +++ b/docs/getting-started/advanced-topics/scaling.md @@ -42,9 +42,10 @@ DATABASE_URL=postgresql://user:password@db-host:5432/openwebui **Key things to know:** -- Open WebUI does **not** migrate data between databases — plan this before you have production data in SQLite. -- For high-concurrency deployments, tune `DATABASE_POOL_SIZE` and `DATABASE_POOL_MAX_OVERFLOW` to match your usage patterns. +- Open WebUI does **not** migrate data between databases — plan this before you have production data in SQLite. If you need to migrate an existing database, see the [Database Migration guide](/troubleshooting/manual-database-migration). +- For high-concurrency deployments, tune `DATABASE_POOL_SIZE` and `DATABASE_POOL_MAX_OVERFLOW` to match your usage patterns. See [Database Optimization](/troubleshooting/performance#-database-optimization) for detailed guidance. - Remember that **each Open WebUI instance maintains its own connection pool**, so total connections = pool size × number of instances. +- If you skip this step and run multiple instances with SQLite, you will see `database is locked` errors and data corruption. See [Database Corruption / "Locked" Errors](/troubleshooting/multi-replica#4-database-corruption--locked-errors) for details. :::tip A good starting point for tuning is `DATABASE_POOL_SIZE=15` and `DATABASE_POOL_MAX_OVERFLOW=20`. Keep the combined total per instance well below your PostgreSQL `max_connections` limit (default is 100). @@ -74,6 +75,9 @@ ENABLE_WEBSOCKET_SUPPORT=true - If you're using Redis Sentinel for high availability, also set `REDIS_SENTINEL_HOSTS` and consider setting `REDIS_SOCKET_CONNECT_TIMEOUT=5` to prevent hangs during failover. - For AWS Elasticache or other managed Redis Cluster services, set `REDIS_CLUSTER=true`. 
- Make sure your Redis server has `timeout 1800` and a high enough `maxclients` (10000+) to prevent connection exhaustion over time. +- Without Redis in a multi-instance setup, you will experience [WebSocket 403 errors](/troubleshooting/multi-replica#2-websocket-403-errors--connection-failures), [configuration sync issues](/troubleshooting/multi-replica#3-model-not-found-or-configuration-mismatch), and intermittent authentication failures. + +For a complete step-by-step Redis setup (Docker Compose, Sentinel, Cluster mode, verification), see the [Redis WebSocket Support](/tutorials/integrations/redis) tutorial. For WebSocket and CORS issues behind reverse proxies, see [Connection Errors](/troubleshooting/connection-error#-https-tls-cors--websocket-issues). --- @@ -83,12 +87,16 @@ ENABLE_WEBSOCKET_SUPPORT=true Open WebUI is stateless, so you can run as many instances as needed behind a **load balancer**. Each instance is identical and interchangeable. +:::warning +Before running multiple instances, ensure you have completed **Steps 1 and 2** (PostgreSQL and Redis). You also need a shared `WEBUI_SECRET_KEY` across all replicas — without it, users will experience [login loops and 401 errors](/troubleshooting/multi-replica#1-login-loops--401-unauthorized-errors). For a full pre-flight checklist, see the [Core Requirements Checklist](/troubleshooting/multi-replica#core-requirements-checklist). 
+::: + ### Option A: Container Orchestration (Recommended) Use Kubernetes, Docker Swarm, or similar platforms to manage multiple replicas: - Keep `UVICORN_WORKERS=1` per container (let the orchestrator handle scaling, not the app) -- Set `ENABLE_DB_MIGRATIONS=false` on all replicas except one designated "primary" pod to prevent migration race conditions +- Set `ENABLE_DB_MIGRATIONS=false` on all replicas except one designated "primary" pod to prevent migration race conditions — see [Updates and Migrations](/troubleshooting/multi-replica#updates-and-migrations) for the safe procedure - Scale up/down by adjusting your replica count ### Option B: Multiple Workers per Container @@ -123,6 +131,8 @@ INFO: Child process [pid] died This is a [well-known SQLite limitation](https://www.sqlite.org/howtocorrupt.html#_carrying_an_open_database_connection_across_a_fork_), not a bug. It also affects multi-replica deployments where multiple containers access the same ChromaDB data directory. +For the full crash sequence analysis, see [Worker Crashes During Document Upload](/troubleshooting/multi-replica#6-worker-crashes-during-document-upload-chromadb--multi-worker) or [RAG Troubleshooting: Worker Dies During Upload](/troubleshooting/rag#12-worker-dies-during-document-upload). + ::: **What to do:** @@ -155,7 +165,7 @@ For maximum scalability in self-hosted environments, **Milvus** and **Qdrant** b **When:** You're running multiple instances that need to share uploaded files, generated images, and other user data. -By default, Open WebUI stores uploaded files on the local filesystem under `DATA_DIR` (typically `/app/backend/data`). In a multi-instance setup, each instance needs access to the same files. +By default, Open WebUI stores uploaded files on the local filesystem under `DATA_DIR` (typically `/app/backend/data`). In a multi-instance setup, each instance needs access to the same files. 
Without shared storage, you will see [uploaded files and RAG knowledge become inaccessible](/troubleshooting/multi-replica#5-uploaded-files-or-rag-knowledge-inaccessible) when requests hit different replicas. ### Do I need cloud storage (S3)? @@ -229,6 +239,8 @@ OTEL_EXPORTER_OTLP_ENDPOINT=http://your-collector:4317 This gives you visibility into request latency, database query performance, error rates, and more. +For the full setup guide, see [OpenTelemetry Monitoring](/reference/monitoring/otel). For application-level log configuration (log levels, debug output), see [Logging Configuration](/getting-started/advanced-topics/logging). + --- ## Putting It All Together @@ -258,6 +270,8 @@ Here's what a production-ready scaled deployment typically looks like: └───────────────────────────────┘ ``` +**Running into issues?** The [Scaling & HA Troubleshooting](/troubleshooting/multi-replica) guide covers common problems (login loops, WebSocket failures, database locks, worker crashes) and their solutions. For performance tuning at scale, see [Optimization, Performance & RAM Usage](/troubleshooting/performance). + ### Minimum Environment Variables for Scaled Deployments ```bash diff --git a/docs/getting-started/quick-start/tab-kubernetes/Helm.md b/docs/getting-started/quick-start/tab-kubernetes/Helm.md index b32777d6..03e56e4b 100644 --- a/docs/getting-started/quick-start/tab-kubernetes/Helm.md +++ b/docs/getting-started/quick-start/tab-kubernetes/Helm.md @@ -34,7 +34,9 @@ Helm helps you manage Kubernetes applications. If you intend to scale Open WebUI using multiple nodes/pods/workers in a clustered environment, you need to setup a NoSQL key-value database (Redis). There are some [environment variables](https://docs.openwebui.com/reference/env-configuration/) that need to be set to the same value for all service-instances, otherwise consistency problems, faulty sessions and other issues will occur! 
-**Important:** The default vector database (ChromaDB) uses a local SQLite-backed client that is **not safe for multi-replica or multi-worker deployments**. SQLite connections are not fork-safe, and concurrent writes from multiple processes will crash workers instantly. You **must** switch to an external vector database (PGVector, Milvus, Qdrant) via [`VECTOR_DB`](https://docs.openwebui.com/reference/env-configuration#vector_db), or run ChromaDB as a separate HTTP server via [`CHROMA_HTTP_HOST`](https://docs.openwebui.com/reference/env-configuration#chroma_http_host). See the [Scaling & HA guide](https://docs.openwebui.com/troubleshooting/multi-replica) for full requirements. +**Important:** The default vector database (ChromaDB) uses a local SQLite-backed client that is **not safe for multi-replica or multi-worker deployments**. SQLite connections are not fork-safe, and concurrent writes from multiple processes will crash workers instantly. You **must** switch to an external vector database (PGVector, Milvus, Qdrant) via [`VECTOR_DB`](https://docs.openwebui.com/reference/env-configuration#vector_db), or run ChromaDB as a separate HTTP server via [`CHROMA_HTTP_HOST`](https://docs.openwebui.com/reference/env-configuration#chroma_http_host). + +For the complete step-by-step scaling walkthrough, see [Scaling Open WebUI](https://docs.openwebui.com/getting-started/advanced-topics/scaling). For troubleshooting multi-replica issues, see the [Scaling & HA guide](https://docs.openwebui.com/troubleshooting/multi-replica). ::: diff --git a/docs/troubleshooting/connection-error.mdx b/docs/troubleshooting/connection-error.mdx index e6a369ed..83d73b15 100644 --- a/docs/troubleshooting/connection-error.mdx +++ b/docs/troubleshooting/connection-error.mdx @@ -90,6 +90,8 @@ WEBSOCKET_MANAGER=redis WEBSOCKET_REDIS_URL=redis://redis:6379/1 ``` +For detailed Redis setup instructions, see [Redis WebSocket Support](/tutorials/integrations/redis). 
For a complete multi-instance scaling walkthrough, see [Scaling Open WebUI](/getting-started/advanced-topics/scaling). If you're seeing WebSocket 403 errors specifically in a multi-replica setup, see [Scaling & HA Troubleshooting](/troubleshooting/multi-replica#2-websocket-403-errors--connection-failures). + ### Testing Your Configuration To verify your setup is working: diff --git a/docs/troubleshooting/multi-replica.mdx b/docs/troubleshooting/multi-replica.mdx index 312bcb0f..19ae65f2 100644 --- a/docs/troubleshooting/multi-replica.mdx +++ b/docs/troubleshooting/multi-replica.mdx @@ -7,6 +7,8 @@ title: "Scaling & HA" This guide addresses common issues encountered when deploying Open WebUI in **multi-replica** environments (e.g., Kubernetes, Docker Swarm) or when using **multiple workers** (`UVICORN_WORKERS > 1`) for increased concurrency. +If you are setting up a scaled deployment for the first time, start with the [Scaling Open WebUI](/getting-started/advanced-topics/scaling) guide for a step-by-step walkthrough. + ## Core Requirements Checklist Before troubleshooting specific errors, ensure your deployment meets these **absolute requirements** for a multi-replica setup. Missing any of these will cause instability, login loops, or data loss. 
@@ -253,8 +255,11 @@ While Open WebUI is designed to be stateless with proper Redis configuration, en ## Related Documentation +- [Scaling Open WebUI](/getting-started/advanced-topics/scaling) — Step-by-step guide to scaling from single instance to production - [Environment Variable Configuration](/reference/env-configuration) - [Optimization, Performance & RAM Usage](/troubleshooting/performance) +- [Redis WebSocket Support](/tutorials/integrations/redis) — Detailed Redis setup tutorial - [Troubleshooting Connection Errors](/troubleshooting/connection-error) +- [RAG Troubleshooting](/troubleshooting/rag) — Document upload and embedding issues - [Logging Configuration](/getting-started/advanced-topics/logging) diff --git a/docs/troubleshooting/performance.md b/docs/troubleshooting/performance.md index 2d5b1058..62eecd00 100644 --- a/docs/troubleshooting/performance.md +++ b/docs/troubleshooting/performance.md @@ -149,6 +149,8 @@ The way your documents are chunked directly impacts both storage efficiency and If you are deploying for **enterprise scale** (hundreds of users), simple Docker Compose setups may not suffice. You will need to move to a clustered environment. +For a step-by-step walkthrough of the entire scaling journey (PostgreSQL, Redis, vector DB, storage, observability), see the **[Scaling Open WebUI](/getting-started/advanced-topics/scaling)** guide. + * **Kubernetes / Helm**: For deploying on K8s with multiple replicas, see the **[Multi-Replica & High Availability Guide](/troubleshooting/multi-replica)**. * **Redis (Mandatory)**: When running multiple workers (`UVICORN_WORKERS > 1`) or multiple replicas, **Redis is required** to handle WebSocket connections and session syncing. See **[Redis Integration](/tutorials/integrations/redis)**. * **Load Balancing**: Ensure your Ingress controller supports **Session Affinity** (Sticky Sessions) for best performance. 
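A rough sizing check for the pool guidance in the `scaling.md` hunk above (total connections = pool size × number of instances, kept well below PostgreSQL `max_connections`). This is only a sketch under stated assumptions: the function names and the `headroom` value are hypothetical and not part of Open WebUI, and it assumes each instance can open up to `DATABASE_POOL_SIZE + DATABASE_POOL_MAX_OVERFLOW` connections at peak.

```python
def peak_connections(pool_size: int, max_overflow: int, instances: int) -> int:
    """Worst-case number of database connections across all instances.

    Assumption: each instance keeps its own pool and may open up to
    pool_size + max_overflow connections under load.
    """
    return (pool_size + max_overflow) * instances


def fits_within_limit(
    pool_size: int,
    max_overflow: int,
    instances: int,
    max_connections: int = 100,  # PostgreSQL default mentioned in the tip
    headroom: int = 10,          # hypothetical reserve for admin/other clients
) -> bool:
    """Return True if the worst case stays below the limit minus headroom."""
    peak = peak_connections(pool_size, max_overflow, instances)
    return peak <= max_connections - headroom


if __name__ == "__main__":
    # Suggested starting values from the tip: pool size 15, max overflow 20.
    for replicas in (1, 2, 3):
        peak = peak_connections(15, 20, replicas)
        ok = fits_within_limit(15, 20, replicas)
        print(f"{replicas} instance(s): peak={peak}, within limit={ok}")
```

With the suggested starting values, two instances peak at 70 connections, while three peak at 105 and would exceed the default limit of 100. At that point either the pool settings or `max_connections` would need adjusting.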
diff --git a/docs/tutorials/integrations/redis.md b/docs/tutorials/integrations/redis.md index fa502963..241d4f01 100644 --- a/docs/tutorials/integrations/redis.md +++ b/docs/tutorials/integrations/redis.md @@ -17,7 +17,7 @@ This documentation page outlines the steps required to integrate Redis with Open ## When is Redis Required? -Redis serves two distinct purposes in Open WebUI, and understanding when it's required is crucial for proper deployment: +Redis serves two distinct purposes in Open WebUI, and understanding when it's required is crucial for proper deployment. For a high-level overview of all scaling requirements (PostgreSQL, Redis, vector DB, storage), see the [Scaling Open WebUI](/getting-started/advanced-topics/scaling) guide. ### Single Instance Deployments