From 0bb8971df22d78f8681b189249704d319ac683e9 Mon Sep 17 00:00:00 2001 From: James Westbrook <0xthresh@protonmail.com> Date: Wed, 4 Mar 2026 18:24:12 -0700 Subject: [PATCH] fix: split deployments into individual pages --- docs/enterprise/deployment.md | 460 ------------------ .../deployment/container-service.md | 85 ++++ docs/enterprise/deployment/index.md | 154 ++++++ docs/enterprise/deployment/kubernetes-helm.md | 177 +++++++ docs/enterprise/deployment/python-pip.md | 96 ++++ 5 files changed, 512 insertions(+), 460 deletions(-) delete mode 100644 docs/enterprise/deployment.md create mode 100644 docs/enterprise/deployment/container-service.md create mode 100644 docs/enterprise/deployment/index.md create mode 100644 docs/enterprise/deployment/kubernetes-helm.md create mode 100644 docs/enterprise/deployment/python-pip.md diff --git a/docs/enterprise/deployment.md b/docs/enterprise/deployment.md deleted file mode 100644 index ef9dff98..00000000 --- a/docs/enterprise/deployment.md +++ /dev/null @@ -1,460 +0,0 @@ ---- -sidebar_position: 4 -title: "Deployment Options" ---- - -# Enterprise Deployment Options - -Open WebUI's **stateless, container-first architecture** means the same application runs identically whether you deploy it as a Python process on a VM, a container in a managed service, or a pod in a Kubernetes cluster. The difference between deployment patterns is how you **orchestrate, scale, and operate** the application — not how the application itself behaves. - -This guide covers three production deployment patterns for enterprise environments. Each section includes an architecture overview, scaling strategy, and key considerations to help both technical decision-makers evaluating options and platform engineers planning implementation. - -:::tip Model Inference Is Independent -How you serve LLM models is separate from how you deploy Open WebUI. You can use **managed APIs** (OpenAI, Anthropic, Azure OpenAI, Google Gemini) or **self-hosted inference** (Ollama, vLLM) with any deployment pattern. See [Integration](/enterprise/integration) for details on connecting models. -::: - ---- - -## Shared Infrastructure Requirements - -Regardless of which deployment pattern you choose, every scaled Open WebUI deployment requires the same set of backing services. Configure these **before** scaling beyond a single instance. - -| Component | Why It's Required | Options | -| :--- | :--- | :--- | -| **PostgreSQL** | Multi-instance deployments require a real database. SQLite does not support concurrent writes from multiple processes. | Self-managed, Amazon RDS, Azure Database for PostgreSQL, Google Cloud SQL | -| **Redis** | Session management, WebSocket coordination, and configuration sync across instances. | Self-managed, Amazon ElastiCache, Azure Cache for Redis, Google Memorystore | -| **Vector Database** | The default ChromaDB uses a local SQLite backend that is not safe for multi-process access. | PGVector (shares PostgreSQL), Milvus, Qdrant, or ChromaDB in HTTP server mode | -| **Shared Storage** | Uploaded files must be accessible from every instance. | Shared filesystem (NFS, EFS, CephFS) or object storage (`S3`, `GCS`, `Azure Blob`) | -| **Content Extraction** | The default `pypdf` extractor leaks memory under sustained load. | Apache Tika or Docling as a sidecar service | -| **Embedding Engine** | The default SentenceTransformers model loads ~500 MB into RAM per worker process. | OpenAI Embeddings API, or Ollama running an embedding model | - -### Critical Configuration - -These environment variables **must** be set consistently across every instance: - -```bash -# Shared secret — MUST be identical on all instances -WEBUI_SECRET_KEY=your-secret-key-here - -# Database -DATABASE_URL=postgresql://user:password@db-host:5432/openwebui - -# Vector Database -VECTOR_DB=pgvector -PGVECTOR_DB_URL=postgresql://user:password@db-host:5432/openwebui - -# Redis -REDIS_URL=redis://redis-host:6379/0 -WEBSOCKET_MANAGER=redis -ENABLE_WEBSOCKET_SUPPORT=true - -# Content Extraction -CONTENT_EXTRACTION_ENGINE=tika -TIKA_SERVER_URL=http://tika:9998 - -# Embeddings -RAG_EMBEDDING_ENGINE=openai - -# Storage — choose ONE: -# Option A: shared filesystem (mount the same volume to all instances, no env var needed) -# Option B: object storage (see https://docs.openwebui.com/reference/env-configuration#cloud-storage for all required vars) -# STORAGE_PROVIDER=s3 - -# Workers — let the orchestrator handle scaling -UVICORN_WORKERS=1 - -# Migrations — only ONE instance should run migrations -ENABLE_DB_MIGRATIONS=false -``` - -:::warning Database Migrations -Set `ENABLE_DB_MIGRATIONS=false` on **all instances except one**. During updates, scale down to a single instance, allow migrations to complete, then scale back up. Concurrent migrations can corrupt your database. -::: - -For the complete step-by-step scaling walkthrough, see [Scaling Open WebUI](/getting-started/advanced-topics/scaling). For the full environment variable reference, see [Environment Variable Configuration](/reference/env-configuration). - ---- - -## Option 1: Python / Pip on Auto-Scaling VMs - -Deploy `open-webui serve` as a systemd-managed process on virtual machines in a cloud auto-scaling group (AWS ASG, Azure VMSS, GCP MIG). - -### When to Choose This Pattern - -- Your organization has established VM-based infrastructure and operational practices -- Regulatory or compliance requirements mandate direct OS-level control -- Your team has limited container expertise but strong Linux administration skills -- You want a straightforward deployment without container orchestration overhead - -### Architecture - -```mermaid -flowchart TB - LB["Load Balancer"] - - subgraph ASG["Auto-Scaling Group"] - VM1["VM 1"] - VM2["VM 2"] - VM3["VM N"] - end - - subgraph Backend["Backing Services"] - PG["PostgreSQL + PGVector"] - Redis["Redis"] - S3["Object Storage"] - Tika["Tika"] - end - - LB --> ASG - ASG --> Backend -``` - -### Installation - -Install on each VM using pip with the `[all]` extra (includes PostgreSQL drivers): - -```bash -pip install open-webui[all] -``` - -Create a systemd unit to manage the process: - -```ini -[Unit] -Description=Open WebUI -After=network.target - -[Service] -Type=simple -User=openwebui -EnvironmentFile=/etc/open-webui/env -ExecStart=/usr/local/bin/open-webui serve -Restart=always -RestartSec=5 - -[Install] -WantedBy=multi-user.target -``` - -Place your environment variables in `/etc/open-webui/env` (see [Critical Configuration](#critical-configuration) above). - -### Scaling Strategy - -- **Horizontal scaling**: Configure your auto-scaling group to add or remove VMs based on CPU utilization or request count. -- **Health checks**: Point your load balancer health check at the `/health` endpoint (HTTP 200 when healthy). -- **One process per VM**: Keep `UVICORN_WORKERS=1` and let the auto-scaler manage capacity. This simplifies memory accounting and avoids fork-safety issues with the default vector database. -- **Sticky sessions**: Configure your load balancer for cookie-based session affinity to ensure WebSocket connections remain routed to the same instance. - -### Key Considerations - -| Consideration | Detail | -| :--- | :--- | -| **OS patching** | You are responsible for OS updates, security patches, and Python runtime management. | -| **Python environment** | Pin your Python version (3.11 recommended) and use a virtual environment or system-level install. | -| **Storage** | Use object storage (such as S3) or a shared filesystem (such as NFS) since VMs in an auto-scaling group do not share a local filesystem. | -| **Tika sidecar** | Run a Tika server on each VM or as a shared service. A shared instance simplifies management. | -| **Updates** | Scale the group to 1 instance, update the package (`pip install --upgrade open-webui`), wait for database migrations to complete, then scale back up. | - -For pip installation basics, see the [Python Quick Start](/getting-started/quick-start#python). - ---- - -## Option 2: Container Service - -Run the official `ghcr.io/open-webui/open-webui` image on a managed container platform such as AWS ECS/Fargate, Azure Container Apps, or Google Cloud Run. - -### When to Choose This Pattern - -- You want container benefits (immutable images, versioned deployments, no OS management) without Kubernetes complexity -- Your organization already uses a managed container platform -- You need fast scaling with minimal operational overhead -- You prefer managed infrastructure with platform-native auto-scaling - -### Architecture - -```mermaid -flowchart TB - LB["Load Balancer"] - - subgraph CS["Container Service"] - T1["Container Task 1"] - T2["Container Task 2"] - T3["Container Task N"] - end - - subgraph Backend["Managed Backing Services"] - PG["PostgreSQL + PGVector"] - Redis["Redis"] - S3["Object Storage"] - Tika["Tika"] - end - - LB --> CS - CS --> Backend -``` - -### Image Selection - -Use **versioned tags** for production stability: - -``` -ghcr.io/open-webui/open-webui:v0.x.x -``` - -Avoid the `:main` tag in production — it tracks the latest development build and can introduce breaking changes without warning. Check the [Open WebUI releases](https://github.com/open-webui/open-webui/releases) for the latest stable version. - -### Scaling Strategy - -- **Platform-native auto-scaling**: Configure your container service to scale on CPU utilization, memory, or request count. -- **Health checks**: Use the `/health` endpoint for both liveness and readiness probes. -- **Task-level env vars**: Pass all shared infrastructure configuration as environment variables or secrets in your task definition. -- **Session affinity**: Enable sticky sessions on your load balancer for WebSocket stability. While Redis handles cross-instance coordination, session affinity reduces unnecessary session handoffs. - -### Key Considerations - -| Consideration | Detail | -| :--- | :--- | -| **Storage** | Use object storage (S3, GCS, Azure Blob) or a shared filesystem (such as EFS). Container-local storage is ephemeral and not shared across tasks. | -| **Tika sidecar** | Run Tika as a sidecar container in the same task definition, or as a separate service. Sidecar pattern keeps extraction traffic local. | -| **Secrets management** | Use your platform's secrets manager (AWS Secrets Manager, Azure Key Vault, GCP Secret Manager) for `DATABASE_URL`, `REDIS_URL`, and `WEBUI_SECRET_KEY`. | -| **Updates** | Perform a rolling deployment with a single task first — this task runs migrations (`ENABLE_DB_MIGRATIONS=true`). Once healthy, scale the remaining tasks with `ENABLE_DB_MIGRATIONS=false`. | - -### Anti-Patterns to Avoid - -| Anti-Pattern | Impact | Fix | -| :--- | :--- | :--- | -| Using local SQLite | Data loss on task restart, database locks with multiple tasks | Set `DATABASE_URL` to PostgreSQL | -| Default ChromaDB | SQLite-backed vector DB crashes under multi-process access | Set `VECTOR_DB=pgvector` (or Milvus/Qdrant) | -| Inconsistent `WEBUI_SECRET_KEY` | Login loops, 401 errors, sessions that don't persist across tasks | Set the same key on every task via secrets manager | -| No Redis | WebSocket failures, config not syncing, "Model Not Found" errors | Set `REDIS_URL` and `WEBSOCKET_MANAGER=redis` | - -For container basics, see the [Docker Quick Start](/getting-started/quick-start#docker). For a Docker Swarm example with external ChromaDB, see the [Docker Swarm guide](/getting-started/quick-start#docker-swarm). - ---- - -## Option 3: Kubernetes with Helm - -Deploy using the official Open WebUI Helm chart on any Kubernetes distribution (EKS, AKS, GKE, OpenShift, Rancher, self-managed). - -### When to Choose This Pattern - -- Your organization runs Kubernetes and has platform engineering expertise -- You need declarative infrastructure-as-code with GitOps workflows -- You require advanced scaling (HPA), rolling updates, and pod disruption budgets -- You are deploying for hundreds to thousands of users in a mission-critical environment - -### Architecture - -```mermaid -flowchart LR - subgraph K8s["Kubernetes Cluster"] - - subgraph mgmt["Cluster Management"] - Ingress["Ingress Controller"] - HPA["HPA"] - end - - subgraph genai["GenAI Namespace"] - subgraph Deploy["Open WebUI Deployment"] - P1["Pod 1"] - P2["Pod 2"] - P3["Pod N"] - end - end - - subgraph data["Data Namespace"] - PG["PostgreSQL + PGVector"] - Redis["Redis"] - Storage["Shared Storage"] - Tika["Tika"] - end - - end - - Ingress --> Deploy - HPA -.- Deploy - Deploy --> data -``` - -### Helm Chart Setup - -```bash -# Add the repository -helm repo add open-webui https://open-webui.github.io/helm-charts -helm repo update - -# Install with custom values -helm install openwebui open-webui/open-webui -f values.yaml -``` - -Your `values.yaml` should override the defaults to point at your shared infrastructure. The chart has dedicated values for many common settings — use these instead of raw environment variables where available: - -```yaml -# Example values.yaml overrides (refer to chart documentation for full schema) -replicaCount: 3 - -# -- Database: use an external PostgreSQL instance -databaseUrl: "postgresql://user:password@db-host:5432/openwebui" - -# -- WebSocket & Redis: the chart can auto-deploy Redis in-cluster, -# or you can point to an external Redis instance via websocket.url -websocket: - enabled: true - manager: redis - # url: "redis://my-external-redis:6379/0" # uncomment to use external Redis - redis: - enabled: true # set to false if using external Redis - -# -- Tika: the chart can auto-deploy Tika in-cluster -tika: - enabled: true - -# -- Ollama: disable if using external model APIs or a separate Ollama deployment -ollama: - enabled: false - -# -- Storage: use object storage instead of local PVC for multi-replica -persistence: - provider: s3 # or "gcs" / "azure" - s3: - bucket: "my-openwebui-bucket" - region: "us-east-1" - accessKeyExistingSecret: "openwebui-s3-creds" - accessKeyExistingAccessKey: "access-key" - secretKeyExistingSecret: "openwebui-s3-creds" - secretKeyExistingSecretKey: "secret-key" - # -- Alternatively, use a shared filesystem (RWX PVC) instead of object storage: - # provider: local - # accessModes: - # - ReadWriteMany - # storageClass: "efs-sc" - -# -- Ingress: configure if exposing via an ingress controller -ingress: - enabled: true - class: "nginx" - host: "ai.example.com" - tls: true - existingSecret: "openwebui-tls" - annotations: - nginx.ingress.kubernetes.io/affinity: "cookie" - nginx.ingress.kubernetes.io/session-cookie-name: "open-webui-session" - nginx.ingress.kubernetes.io/session-cookie-expires: "172800" - nginx.ingress.kubernetes.io/session-cookie-max-age: "172800" - -# -- Remaining settings that don't have dedicated chart values -extraEnvVars: - - name: WEBUI_SECRET_KEY - valueFrom: - secretKeyRef: - name: openwebui-secrets - key: secret-key - - name: VECTOR_DB - value: "pgvector" - - name: PGVECTOR_DB_URL - valueFrom: - secretKeyRef: - name: openwebui-secrets - key: database-url - - name: UVICORN_WORKERS - value: "1" - - name: ENABLE_DB_MIGRATIONS - value: "false" - - name: RAG_EMBEDDING_ENGINE - value: "openai" -``` - -### Scaling Strategy - -- **Horizontal Pod Autoscaler (HPA)**: Scale on CPU or memory utilization. Keep `UVICORN_WORKERS=1` per pod and let Kubernetes manage the replica count. -- **Resource requests and limits**: Set appropriate CPU and memory requests to ensure the scheduler places pods correctly and the HPA has accurate metrics. -- **Pod disruption budgets**: Configure a PDB to ensure a minimum number of pods remain available during voluntary disruptions (node drains, cluster upgrades). - -### Update Procedure - -:::danger Critical Update Process -When running multiple replicas, you **must** follow this process for every update: - -1. Scale the deployment to **1 replica** -2. Apply the new image version (with `ENABLE_DB_MIGRATIONS=true` on the single replica) -3. Wait for the pod to become **fully ready** (database migrations complete) -4. Scale back to your desired replica count (with `ENABLE_DB_MIGRATIONS=false`) - -Skipping this process risks database corruption from concurrent migrations. -::: - -### Key Considerations - -| Consideration | Detail | -| :--- | :--- | -| **Storage** | Use a **ReadWriteMany (RWX)** shared filesystem (EFS, CephFS, NFS) or object storage (S3, GCS, Azure Blob) for uploaded files. ReadWriteOnce volumes will not work with multiple pods. | -| **Secrets** | Store credentials in Kubernetes Secrets and reference via `secretKeyRef`. Integrate with external secrets operators (External Secrets, Sealed Secrets) for GitOps workflows. | -| **Database** | Use a managed PostgreSQL service (RDS, Cloud SQL, Azure DB) for production. In-cluster PostgreSQL operators (CloudNativePG, Zalando) are viable but add operational burden. | -| **Redis** | A single Redis instance with `timeout 1800` and `maxclients 10000` is sufficient for most deployments. Redis Sentinel or Cluster is only needed if Redis itself must be highly available. | -| **Networking** | Keep all services in the same availability zone. Target < 2 ms database latency. Audit network policies to ensure pods can reach PostgreSQL, Redis, and storage backends. | - -For the complete Helm setup guide, see [Kubernetes Quick Start](/getting-started/quick-start#kubernetes--helm). For troubleshooting multi-replica issues, see [Multi-Replica Troubleshooting](/troubleshooting/multi-replica). - ---- - -## Deployment Comparison - -| | **Python / Pip (VMs)** | **Container Service** | **Kubernetes (Helm)** | -| :--- | :--- | :--- | :--- | -| **Operational complexity** | Moderate — OS patching, Python management | Low — platform-managed containers | Higher — requires K8s expertise | -| **Auto-scaling** | Cloud ASG/VMSS with health checks | Platform-native, minimal configuration | HPA with fine-grained control | -| **Container isolation** | None — process runs directly on OS | Full container isolation | Full container + namespace isolation | -| **Rolling updates** | Manual (scale down, update, scale up) | Platform-managed rolling deployments | Declarative rolling updates with rollback | -| **Infrastructure-as-code** | Terraform/Pulumi for VMs + config mgmt | Task/service definitions (CloudFormation, Bicep, Terraform) | Helm charts + GitOps (Argo CD, Flux) | -| **Best suited for** | Teams with VM-centric operations, regulatory constraints | Teams wanting container benefits without K8s complexity | Large-scale, mission-critical deployments | -| **Minimum team expertise** | Linux administration, Python | Container fundamentals, cloud platform | Kubernetes, Helm, cloud-native patterns | - ---- - -## Observability - -Production deployments should include monitoring and observability regardless of deployment pattern. - -### Health Checks - -- **`/health`** — Basic liveness check. Returns HTTP 200 when the application is running. Use this for load balancer and auto-scaler health checks. -- **`/api/models`** — Verifies the application can connect to configured model backends. Requires an API key. - -### OpenTelemetry - -Open WebUI supports **OpenTelemetry** for distributed tracing and HTTP metrics. Enable it with: - -```bash -ENABLE_OTEL=true -OTEL_EXPORTER_OTLP_ENDPOINT=http://your-collector:4318 -OTEL_SERVICE_NAME=open-webui -``` - -This auto-instruments FastAPI, SQLAlchemy, Redis, and HTTP clients — giving visibility into request latency, database query performance, and cross-service traces. - -### Structured Logging - -Enable JSON-formatted logs for integration with log aggregation platforms (Datadog, Loki, CloudWatch, Splunk): - -```bash -LOG_FORMAT=json -GLOBAL_LOG_LEVEL=INFO -``` - -For full monitoring setup details, see [Monitoring](/reference/monitoring) and [OpenTelemetry](/reference/monitoring/otel). - ---- - -## Next Steps - -- **[Architecture & High Availability](/enterprise/architecture)** — Deeper dive into Open WebUI's stateless design and HA capabilities. -- **[Security](/enterprise/security)** — Compliance frameworks, SSO/LDAP integration, RBAC, and audit logging. -- **[Integration](/enterprise/integration)** — Connecting AI models, pipelines, and extending functionality. -- **[Scaling Open WebUI](/getting-started/advanced-topics/scaling)** — The complete step-by-step technical scaling guide. -- **[Multi-Replica Troubleshooting](/troubleshooting/multi-replica)** — Solutions for common issues in scaled deployments. - ---- - -**Need help planning your enterprise deployment?** Our team works with organizations worldwide to design and implement production Open WebUI environments. - -[**Contact Enterprise Sales → sales@openwebui.com**](mailto:sales@openwebui.com) diff --git a/docs/enterprise/deployment/container-service.md b/docs/enterprise/deployment/container-service.md new file mode 100644 index 00000000..9ca996f9 --- /dev/null +++ b/docs/enterprise/deployment/container-service.md @@ -0,0 +1,85 @@ +--- +sidebar_position: 2 +title: "Container Service" +--- + +# Container Service + +Run the official `ghcr.io/open-webui/open-webui` image on a managed container platform such as AWS ECS/Fargate, Azure Container Apps, or Google Cloud Run. + +:::info Prerequisites +Before proceeding, ensure you have configured the [shared infrastructure requirements](/enterprise/deployment#shared-infrastructure-requirements) — PostgreSQL, Redis, a vector database, shared storage, and content extraction. +::: + +## When to Choose This Pattern + +- You want container benefits (immutable images, versioned deployments, no OS management) without Kubernetes complexity +- Your organization already uses a managed container platform +- You need fast scaling with minimal operational overhead +- You prefer managed infrastructure with platform-native auto-scaling + +## Architecture + +```mermaid +flowchart TB + LB["Load Balancer"] + + subgraph CS["Container Service"] + T1["Container Task 1"] + T2["Container Task 2"] + T3["Container Task N"] + end + + subgraph Backend["Managed Backing Services"] + PG["PostgreSQL + PGVector"] + Redis["Redis"] + S3["Object Storage"] + Tika["Tika"] + end + + LB --> CS + CS --> Backend +``` + +## Image Selection + +Use **versioned tags** for production stability: + +``` +ghcr.io/open-webui/open-webui:v0.x.x +``` + +Avoid the `:main` tag in production — it tracks the latest development build and can introduce breaking changes without warning. Check the [Open WebUI releases](https://github.com/open-webui/open-webui/releases) for the latest stable version. + +## Scaling Strategy + +- **Platform-native auto-scaling**: Configure your container service to scale on CPU utilization, memory, or request count. +- **Health checks**: Use the `/health` endpoint for both liveness and readiness probes. +- **Task-level env vars**: Pass all shared infrastructure configuration as environment variables or secrets in your task definition. +- **Session affinity**: Enable sticky sessions on your load balancer for WebSocket stability. While Redis handles cross-instance coordination, session affinity reduces unnecessary session handoffs. + +## Key Considerations + +| Consideration | Detail | +| :--- | :--- | +| **Storage** | Use object storage (S3, GCS, Azure Blob) or a shared filesystem (such as EFS). Container-local storage is ephemeral and not shared across tasks. | +| **Tika sidecar** | Run Tika as a sidecar container in the same task definition, or as a separate service. Sidecar pattern keeps extraction traffic local. | +| **Secrets management** | Use your platform's secrets manager (AWS Secrets Manager, Azure Key Vault, GCP Secret Manager) for `DATABASE_URL`, `REDIS_URL`, and `WEBUI_SECRET_KEY`. | +| **Updates** | Perform a rolling deployment with a single task first — this task runs migrations (`ENABLE_DB_MIGRATIONS=true`). Once healthy, scale the remaining tasks with `ENABLE_DB_MIGRATIONS=false`. | + +## Anti-Patterns to Avoid + +| Anti-Pattern | Impact | Fix | +| :--- | :--- | :--- | +| Using local SQLite | Data loss on task restart, database locks with multiple tasks | Set `DATABASE_URL` to PostgreSQL | +| Default ChromaDB | SQLite-backed vector DB crashes under multi-process access | Set `VECTOR_DB=pgvector` (or Milvus/Qdrant) | +| Inconsistent `WEBUI_SECRET_KEY` | Login loops, 401 errors, sessions that don't persist across tasks | Set the same key on every task via secrets manager | +| No Redis | WebSocket failures, config not syncing, "Model Not Found" errors | Set `REDIS_URL` and `WEBSOCKET_MANAGER=redis` | + +For container basics, see the [Quick Start guide](/getting-started/quick-start). + +--- + +**Need help planning your enterprise deployment?** Our team works with organizations worldwide to design and implement production Open WebUI environments. + +[**Contact Enterprise Sales → sales@openwebui.com**](mailto:sales@openwebui.com) diff --git a/docs/enterprise/deployment/index.md b/docs/enterprise/deployment/index.md new file mode 100644 index 00000000..b70927de --- /dev/null +++ b/docs/enterprise/deployment/index.md @@ -0,0 +1,154 @@ +--- +sidebar_position: 4 +title: "Deployment Options" +--- + +# Enterprise Deployment Options + +Open WebUI's **stateless, container-first architecture** means the same application runs identically whether you deploy it as a Python process on a VM, a container in a managed service, or a pod in a Kubernetes cluster. The difference between deployment patterns is how you **orchestrate, scale, and operate** the application — not how the application itself behaves. + +:::tip Model Inference Is Independent +How you serve LLM models is separate from how you deploy Open WebUI. You can use **managed APIs** (OpenAI, Anthropic, Azure OpenAI, Google Gemini) or **self-hosted inference** (Ollama, vLLM) with any deployment pattern. See [Integration](/enterprise/integration) for details on connecting models. +::: + +--- + +## Shared Infrastructure Requirements + +Regardless of which deployment pattern you choose, every scaled Open WebUI deployment requires the same set of backing services. Configure these **before** scaling beyond a single instance. + +| Component | Why It's Required | Options | +| :--- | :--- | :--- | +| **PostgreSQL** | Multi-instance deployments require a real database. SQLite does not support concurrent writes from multiple processes. | Self-managed, Amazon RDS, Azure Database for PostgreSQL, Google Cloud SQL | +| **Redis** | Session management, WebSocket coordination, and configuration sync across instances. | Self-managed, Amazon ElastiCache, Azure Cache for Redis, Google Memorystore | +| **Vector Database** | The default ChromaDB uses a local SQLite backend that is not safe for multi-process access. | PGVector (shares PostgreSQL), Milvus, Qdrant, or ChromaDB in HTTP server mode | +| **Shared Storage** | Uploaded files must be accessible from every instance. | Shared filesystem (NFS, EFS, CephFS) or object storage (`S3`, `GCS`, `Azure Blob`) | +| **Content Extraction** | The default `pypdf` extractor leaks memory under sustained load. | Apache Tika or Docling as a sidecar service | +| **Embedding Engine** | The default SentenceTransformers model loads ~500 MB into RAM per worker process. | OpenAI Embeddings API, or Ollama running an embedding model | + +### Critical Configuration + +These environment variables **must** be set consistently across every instance: + +```bash +# Shared secret — MUST be identical on all instances +WEBUI_SECRET_KEY=your-secret-key-here + +# Database +DATABASE_URL=postgresql://user:password@db-host:5432/openwebui + +# Vector Database +VECTOR_DB=pgvector +PGVECTOR_DB_URL=postgresql://user:password@db-host:5432/openwebui + +# Redis +REDIS_URL=redis://redis-host:6379/0 +WEBSOCKET_MANAGER=redis +ENABLE_WEBSOCKET_SUPPORT=true + +# Content Extraction +CONTENT_EXTRACTION_ENGINE=tika +TIKA_SERVER_URL=http://tika:9998 + +# Embeddings +RAG_EMBEDDING_ENGINE=openai + +# Storage — choose ONE: +# Option A: shared filesystem (mount the same volume to all instances, no env var needed) +# Option B: object storage (see https://docs.openwebui.com/reference/env-configuration#cloud-storage for all required vars) +# STORAGE_PROVIDER=s3 + +# Workers — let the orchestrator handle scaling +UVICORN_WORKERS=1 + +# Migrations — only ONE instance should run migrations +ENABLE_DB_MIGRATIONS=false +``` + +:::warning Database Migrations +Set `ENABLE_DB_MIGRATIONS=false` on **all instances except one**. During updates, scale down to a single instance, allow migrations to complete, then scale back up. Concurrent migrations can corrupt your database. +::: + +For the complete step-by-step scaling walkthrough, see [Scaling Open WebUI](/getting-started/advanced-topics/scaling). For the full environment variable reference, see [Environment Variable Configuration](/reference/env-configuration). + +--- + +## Choose Your Deployment Pattern + +Open WebUI supports three production deployment patterns. Each guide covers architecture, scaling strategy, and key considerations specific to that approach. + +### [Python / Pip on Auto-Scaling VMs](./python-pip) + +Deploy `open-webui serve` as a systemd-managed process on virtual machines in a cloud auto-scaling group (AWS ASG, Azure VMSS, GCP MIG). Best for teams with established VM-based infrastructure and strong Linux administration skills, or when regulatory requirements mandate direct OS-level control. + +### [Container Service](./container-service) + +Run the official Open WebUI container image on a managed platform such as AWS ECS/Fargate, Azure Container Apps, or Google Cloud Run. Best for teams wanting container benefits — immutable images, versioned deployments, no OS management — without Kubernetes complexity. + +### [Kubernetes with Helm](./kubernetes-helm) + +Deploy using the official Open WebUI Helm chart on any Kubernetes distribution (EKS, AKS, GKE, OpenShift, Rancher, self-managed). Best for large-scale, mission-critical deployments requiring declarative infrastructure-as-code, advanced auto-scaling, and GitOps workflows. + +--- + +## Deployment Comparison + +| | **Python / Pip (VMs)** | **Container Service** | **Kubernetes (Helm)** | +| :--- | :--- | :--- | :--- | +| **Operational complexity** | Moderate — OS patching, Python management | Low — platform-managed containers | Higher — requires K8s expertise | +| **Auto-scaling** | Cloud ASG/VMSS with health checks | Platform-native, minimal configuration | HPA with fine-grained control | +| **Container isolation** | None — process runs directly on OS | Full container isolation | Full container + namespace isolation | +| **Rolling updates** | Manual (scale down, update, scale up) | Platform-managed rolling deployments | Declarative rolling updates with rollback | +| **Infrastructure-as-code** | Terraform/Pulumi for VMs + config mgmt | Task/service definitions (CloudFormation, Bicep, Terraform) | Helm charts + GitOps (Argo CD, Flux) | +| **Best suited for** | Teams with VM-centric operations, regulatory constraints | Teams wanting container benefits without K8s complexity | Large-scale, mission-critical deployments | +| **Minimum team expertise** | Linux administration, Python | Container fundamentals, cloud platform | Kubernetes, Helm, cloud-native patterns | + +--- + +## Observability + +Production deployments should include monitoring and observability regardless of deployment pattern. + +### Health Checks + +- **`/health`** — Basic liveness check. Returns HTTP 200 when the application is running. Use this for load balancer and auto-scaler health checks. +- **`/api/models`** — Verifies the application can connect to configured model backends. Requires an API key. + +### OpenTelemetry + +Open WebUI supports **OpenTelemetry** for distributed tracing and HTTP metrics. Enable it with: + +```bash +ENABLE_OTEL=true +OTEL_EXPORTER_OTLP_ENDPOINT=http://your-collector:4318 +OTEL_SERVICE_NAME=open-webui +``` + +This auto-instruments FastAPI, SQLAlchemy, Redis, and HTTP clients — giving visibility into request latency, database query performance, and cross-service traces. + +### Structured Logging + +Enable JSON-formatted logs for integration with log aggregation platforms (Datadog, Loki, CloudWatch, Splunk): + +```bash +LOG_FORMAT=json +GLOBAL_LOG_LEVEL=INFO +``` + +For full monitoring setup details, see [Monitoring](/reference/monitoring) and [OpenTelemetry](/reference/monitoring/otel). + +--- + +## Next Steps + +- **[Architecture & High Availability](/enterprise/architecture)** — Deeper dive into Open WebUI's stateless design and HA capabilities. +- **[Security](/enterprise/security)** — Compliance frameworks, SSO/LDAP integration, RBAC, and audit logging. +- **[Integration](/enterprise/integration)** — Connecting AI models, pipelines, and extending functionality. +- **[Scaling Open WebUI](/getting-started/advanced-topics/scaling)** — The complete step-by-step technical scaling guide. +- **[Multi-Replica Troubleshooting](/troubleshooting/multi-replica)** — Solutions for common issues in scaled deployments. + +--- + +**Need help planning your enterprise deployment?** Our team works with organizations worldwide to design and implement production Open WebUI environments. + +[**Contact Enterprise Sales → sales@openwebui.com**](mailto:sales@openwebui.com) diff --git a/docs/enterprise/deployment/kubernetes-helm.md b/docs/enterprise/deployment/kubernetes-helm.md new file mode 100644 index 00000000..f5d15dbe --- /dev/null +++ b/docs/enterprise/deployment/kubernetes-helm.md @@ -0,0 +1,177 @@ +--- +sidebar_position: 3 +title: "Kubernetes with Helm" +--- + +# Kubernetes with Helm + +Deploy using the official Open WebUI Helm chart on any Kubernetes distribution (EKS, AKS, GKE, OpenShift, Rancher, self-managed). + +:::info Prerequisites +Before proceeding, ensure you have configured the [shared infrastructure requirements](/enterprise/deployment#shared-infrastructure-requirements) — PostgreSQL, Redis, a vector database, shared storage, and content extraction. +::: + +## When to Choose This Pattern + +- Your organization runs Kubernetes and has platform engineering expertise +- You need declarative infrastructure-as-code with GitOps workflows +- You require advanced scaling (HPA), rolling updates, and pod disruption budgets +- You are deploying for hundreds to thousands of users in a mission-critical environment + +## Architecture + +```mermaid +flowchart LR + subgraph K8s["Kubernetes Cluster"] + + subgraph mgmt["Cluster Management"] + Ingress["Ingress Controller"] + HPA["HPA"] + end + + subgraph genai["GenAI Namespace"] + subgraph Deploy["Open WebUI Deployment"] + P1["Pod 1"] + P2["Pod 2"] + P3["Pod N"] + end + end + + subgraph data["Data Namespace"] + PG["PostgreSQL + PGVector"] + Redis["Redis"] + Storage["Shared Storage"] + Tika["Tika"] + end + + end + + Ingress --> Deploy + HPA -.- Deploy + Deploy --> data +``` + +## Helm Chart Setup + +```bash +# Add the repository +helm repo add open-webui https://open-webui.github.io/helm-charts +helm repo update + +# Install with custom values +helm install openwebui open-webui/open-webui -f values.yaml +``` + +Your `values.yaml` should override the defaults to point at your shared infrastructure. The chart has dedicated values for many common settings — use these instead of raw environment variables where available: + +```yaml +# Example values.yaml overrides (refer to chart documentation for full schema) +replicaCount: 3 + +# -- Database: use an external PostgreSQL instance +databaseUrl: "postgresql://user:password@db-host:5432/openwebui" + +# -- WebSocket & Redis: the chart can auto-deploy Redis in-cluster, +# or you can point to an external Redis instance via websocket.url +websocket: + enabled: true + manager: redis + # url: "redis://my-external-redis:6379/0" # uncomment to use external Redis + redis: + enabled: true # set to false if using external Redis + +# -- Tika: the chart can auto-deploy Tika in-cluster +tika: + enabled: true + +# -- Ollama: disable if using external model APIs or a separate Ollama deployment +ollama: + enabled: false + +# -- Storage: use object storage instead of local PVC for multi-replica +persistence: + provider: s3 # or "gcs" / "azure" + s3: + bucket: "my-openwebui-bucket" + region: "us-east-1" + accessKeyExistingSecret: "openwebui-s3-creds" + accessKeyExistingAccessKey: "access-key" + secretKeyExistingSecret: "openwebui-s3-creds" + secretKeyExistingSecretKey: "secret-key" + # -- Alternatively, use a shared filesystem (RWX PVC) instead of object storage: + # provider: local + # accessModes: + # - ReadWriteMany + # storageClass: "efs-sc" + +# -- Ingress: configure if exposing via an ingress controller +ingress: + enabled: true + class: "nginx" + host: "ai.example.com" + tls: true + existingSecret: "openwebui-tls" + annotations: + nginx.ingress.kubernetes.io/affinity: "cookie" + nginx.ingress.kubernetes.io/session-cookie-name: "open-webui-session" + nginx.ingress.kubernetes.io/session-cookie-expires: "172800" + nginx.ingress.kubernetes.io/session-cookie-max-age: "172800" + +# -- Remaining settings that don't have dedicated chart values +extraEnvVars: + - name: WEBUI_SECRET_KEY + valueFrom: + secretKeyRef: + name: openwebui-secrets + key: secret-key + - name: VECTOR_DB + value: "pgvector" + - name: PGVECTOR_DB_URL + valueFrom: + secretKeyRef: + name: openwebui-secrets + key: database-url + - name: UVICORN_WORKERS + value: "1" + - name: ENABLE_DB_MIGRATIONS + value: "false" + - name: RAG_EMBEDDING_ENGINE + value: "openai" +``` + +## Scaling Strategy + +- **Horizontal Pod Autoscaler (HPA)**: Scale on CPU or memory utilization. Keep `UVICORN_WORKERS=1` per pod and let Kubernetes manage the replica count. +- **Resource requests and limits**: Set appropriate CPU and memory requests to ensure the scheduler places pods correctly and the HPA has accurate metrics. +- **Pod disruption budgets**: Configure a PDB to ensure a minimum number of pods remain available during voluntary disruptions (node drains, cluster upgrades). + +## Update Procedure + +:::danger Critical Update Process +When running multiple replicas, you **must** follow this process for every update: + +1. Scale the deployment to **1 replica** +2. Apply the new image version (with `ENABLE_DB_MIGRATIONS=true` on the single replica) +3. Wait for the pod to become **fully ready** (database migrations complete) +4. Scale back to your desired replica count (with `ENABLE_DB_MIGRATIONS=false`) + +Skipping this process risks database corruption from concurrent migrations. +::: + +## Key Considerations + +| Consideration | Detail | +| :--- | :--- | +| **Storage** | Use a **ReadWriteMany (RWX)** shared filesystem (EFS, CephFS, NFS) or object storage (S3, GCS, Azure Blob) for uploaded files. ReadWriteOnce volumes will not work with multiple pods. | +| **Secrets** | Store credentials in Kubernetes Secrets and reference via `secretKeyRef`. Integrate with external secrets operators (External Secrets, Sealed Secrets) for GitOps workflows. | +| **Database** | Use a managed PostgreSQL service (RDS, Cloud SQL, Azure DB) for production. In-cluster PostgreSQL operators (CloudNativePG, Zalando) are viable but add operational burden. | +| **Redis** | A single Redis instance with `timeout 1800` and `maxclients 10000` is sufficient for most deployments. Redis Sentinel or Cluster is only needed if Redis itself must be highly available. | +| **Networking** | Keep all services in the same availability zone. Target < 2 ms database latency. Audit network policies to ensure pods can reach PostgreSQL, Redis, and storage backends. | + +For the complete Helm setup guide, see the [Quick Start guide](/getting-started/quick-start). For troubleshooting multi-replica issues, see [Multi-Replica Troubleshooting](/troubleshooting/multi-replica). + +--- + +**Need help planning your enterprise deployment?** Our team works with organizations worldwide to design and implement production Open WebUI environments. + +[**Contact Enterprise Sales → sales@openwebui.com**](mailto:sales@openwebui.com) diff --git a/docs/enterprise/deployment/python-pip.md b/docs/enterprise/deployment/python-pip.md new file mode 100644 index 00000000..cbbb34fe --- /dev/null +++ b/docs/enterprise/deployment/python-pip.md @@ -0,0 +1,96 @@ +--- +sidebar_position: 1 +title: "Python / Pip on VMs" +--- + +# Python / Pip on Auto-Scaling VMs + +Deploy `open-webui serve` as a systemd-managed process on virtual machines in a cloud auto-scaling group (AWS ASG, Azure VMSS, GCP MIG). + +:::info Prerequisites +Before proceeding, ensure you have configured the [shared infrastructure requirements](/enterprise/deployment#shared-infrastructure-requirements) — PostgreSQL, Redis, a vector database, shared storage, and content extraction. +::: + +## When to Choose This Pattern + +- Your organization has established VM-based infrastructure and operational practices +- Regulatory or compliance requirements mandate direct OS-level control +- Your team has limited container expertise but strong Linux administration skills +- You want a straightforward deployment without container orchestration overhead + +## Architecture + +```mermaid +flowchart TB + LB["Load Balancer"] + + subgraph ASG["Auto-Scaling Group"] + VM1["VM 1"] + VM2["VM 2"] + VM3["VM N"] + end + + subgraph Backend["Backing Services"] + PG["PostgreSQL + PGVector"] + Redis["Redis"] + S3["Object Storage"] + Tika["Tika"] + end + + LB --> ASG + ASG --> Backend +``` + +## Installation + +Install on each VM using pip with the `[all]` extra (includes PostgreSQL drivers): + +```bash +pip install open-webui[all] +``` + +Create a systemd unit to manage the process: + +```ini +[Unit] +Description=Open WebUI +After=network.target + +[Service] +Type=simple +User=openwebui +EnvironmentFile=/etc/open-webui/env +ExecStart=/usr/local/bin/open-webui serve +Restart=always +RestartSec=5 + +[Install] +WantedBy=multi-user.target +``` + +Place your environment variables in `/etc/open-webui/env` (see [Critical Configuration](/enterprise/deployment#critical-configuration)). + +## Scaling Strategy + +- **Horizontal scaling**: Configure your auto-scaling group to add or remove VMs based on CPU utilization or request count. +- **Health checks**: Point your load balancer health check at the `/health` endpoint (HTTP 200 when healthy). +- **One process per VM**: Keep `UVICORN_WORKERS=1` and let the auto-scaler manage capacity. This simplifies memory accounting and avoids fork-safety issues with the default vector database. +- **Sticky sessions**: Configure your load balancer for cookie-based session affinity to ensure WebSocket connections remain routed to the same instance. + +## Key Considerations + +| Consideration | Detail | +| :--- | :--- | +| **OS patching** | You are responsible for OS updates, security patches, and Python runtime management. | +| **Python environment** | Pin your Python version (3.11 recommended) and use a virtual environment or system-level install. | +| **Storage** | Use object storage (such as S3) or a shared filesystem (such as NFS) since VMs in an auto-scaling group do not share a local filesystem. | +| **Tika sidecar** | Run a Tika server on each VM or as a shared service. A shared instance simplifies management. | +| **Updates** | Scale the group to 1 instance, update the package (`pip install --upgrade open-webui`), wait for database migrations to complete, then scale back up. | + +For pip installation basics, see the [Quick Start guide](/getting-started/quick-start). + +--- + +**Need help planning your enterprise deployment?** Our team works with organizations worldwide to design and implement production Open WebUI environments. + +[**Contact Enterprise Sales → sales@openwebui.com**](mailto:sales@openwebui.com)