From 12c84822fe80eb3edd6bed003493209f8dbe8134 Mon Sep 17 00:00:00 2001 From: James Westbrook <0xthresh@protonmail.com> Date: Wed, 4 Mar 2026 17:47:37 -0700 Subject: [PATCH] docs: initial deployments doc --- docs/enterprise/deployment.md | 460 ++++++++++++++++++++++++++++++++++ docs/enterprise/support.md | 2 +- 2 files changed, 461 insertions(+), 1 deletion(-) create mode 100644 docs/enterprise/deployment.md diff --git a/docs/enterprise/deployment.md b/docs/enterprise/deployment.md new file mode 100644 index 00000000..ef9dff98 --- /dev/null +++ b/docs/enterprise/deployment.md @@ -0,0 +1,460 @@ +--- +sidebar_position: 4 +title: "Deployment Options" +--- + +# Enterprise Deployment Options + +Open WebUI's **stateless, container-first architecture** means the same application runs identically whether you deploy it as a Python process on a VM, a container in a managed service, or a pod in a Kubernetes cluster. The difference between deployment patterns is how you **orchestrate, scale, and operate** the application — not how the application itself behaves. + +This guide covers three production deployment patterns for enterprise environments. Each section includes an architecture overview, scaling strategy, and key considerations to help both technical decision-makers evaluating options and platform engineers planning implementation. + +:::tip Model Inference Is Independent +How you serve LLM models is separate from how you deploy Open WebUI. You can use **managed APIs** (OpenAI, Anthropic, Azure OpenAI, Google Gemini) or **self-hosted inference** (Ollama, vLLM) with any deployment pattern. See [Integration](/enterprise/integration) for details on connecting models. +::: + +--- + +## Shared Infrastructure Requirements + +Regardless of which deployment pattern you choose, every scaled Open WebUI deployment requires the same set of backing services. Configure these **before** scaling beyond a single instance. 
+ +| Component | Why It's Required | Options | +| :--- | :--- | :--- | +| **PostgreSQL** | Multi-instance deployments require a real database. SQLite does not support concurrent writes from multiple processes. | Self-managed, Amazon RDS, Azure Database for PostgreSQL, Google Cloud SQL | +| **Redis** | Session management, WebSocket coordination, and configuration sync across instances. | Self-managed, Amazon ElastiCache, Azure Cache for Redis, Google Memorystore | +| **Vector Database** | The default ChromaDB uses a local SQLite backend that is not safe for multi-process access. | PGVector (shares PostgreSQL), Milvus, Qdrant, or ChromaDB in HTTP server mode | +| **Shared Storage** | Uploaded files must be accessible from every instance. | Shared filesystem (NFS, EFS, CephFS) or object storage (`S3`, `GCS`, `Azure Blob`) | +| **Content Extraction** | The default `pypdf` extractor leaks memory under sustained load. | Apache Tika or Docling as a sidecar service | +| **Embedding Engine** | The default SentenceTransformers model loads ~500 MB into RAM per worker process. 
| OpenAI Embeddings API, or Ollama running an embedding model | + +### Critical Configuration + +These environment variables **must** be set consistently across every instance: + +```bash +# Shared secret — MUST be identical on all instances +WEBUI_SECRET_KEY=your-secret-key-here + +# Database +DATABASE_URL=postgresql://user:password@db-host:5432/openwebui + +# Vector Database +VECTOR_DB=pgvector +PGVECTOR_DB_URL=postgresql://user:password@db-host:5432/openwebui + +# Redis +REDIS_URL=redis://redis-host:6379/0 +WEBSOCKET_MANAGER=redis +ENABLE_WEBSOCKET_SUPPORT=true + +# Content Extraction +CONTENT_EXTRACTION_ENGINE=tika +TIKA_SERVER_URL=http://tika:9998 + +# Embeddings +RAG_EMBEDDING_ENGINE=openai + +# Storage — choose ONE: +# Option A: shared filesystem (mount the same volume to all instances, no env var needed) +# Option B: object storage (see https://docs.openwebui.com/reference/env-configuration#cloud-storage for all required vars) +# STORAGE_PROVIDER=s3 + +# Workers — let the orchestrator handle scaling +UVICORN_WORKERS=1 + +# Migrations — only ONE instance should run migrations +ENABLE_DB_MIGRATIONS=false +``` + +:::warning Database Migrations +Set `ENABLE_DB_MIGRATIONS=false` on **all instances except one**. During updates, scale down to a single instance, allow migrations to complete, then scale back up. Concurrent migrations can corrupt your database. +::: + +For the complete step-by-step scaling walkthrough, see [Scaling Open WebUI](/getting-started/advanced-topics/scaling). For the full environment variable reference, see [Environment Variable Configuration](/reference/env-configuration). + +--- + +## Option 1: Python / Pip on Auto-Scaling VMs + +Deploy `open-webui serve` as a systemd-managed process on virtual machines in a cloud auto-scaling group (AWS ASG, Azure VMSS, GCP MIG). 
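Because each VM in the group reads its own copy of the environment, a pre-flight check run before the service starts can catch an instance that has drifted from the Critical Configuration above. The following is a minimal sketch, not part of Open WebUI: the function names and the exact rule set are illustrative assumptions derived from the settings listed in this guide.

```python
import os
import sys

# Variables this guide's "Critical Configuration" block sets on every instance.
REQUIRED_VARS = ("WEBUI_SECRET_KEY", "DATABASE_URL", "REDIS_URL", "VECTOR_DB")


def find_config_problems(env):
    """Return a list of human-readable problems found in a mapping of env vars."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]

    # SQLite does not support concurrent writes from multiple processes.
    if env.get("DATABASE_URL", "").startswith("sqlite"):
        problems.append("DATABASE_URL points at SQLite; use PostgreSQL")

    # WebSocket coordination across instances goes through Redis.
    if env.get("WEBSOCKET_MANAGER") != "redis":
        problems.append("WEBSOCKET_MANAGER should be 'redis'")

    # Assumption: an unset UVICORN_WORKERS is treated as acceptable here.
    if env.get("UVICORN_WORKERS", "1") != "1":
        problems.append("UVICORN_WORKERS should be 1; let the auto-scaler add capacity")

    return problems


def main():
    """Print every problem to stderr and return a process exit code."""
    problems = find_config_problems(os.environ)
    for problem in problems:
        print(f"config problem: {problem}", file=sys.stderr)
    return 1 if problems else 0
```

Calling `sys.exit(main())` from a small wrapper script lets a provisioning step fail early when the check reports a problem. The check only inspects one instance's environment; it cannot confirm that `WEBUI_SECRET_KEY` is identical across instances.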
+ +### When to Choose This Pattern + +- Your organization has established VM-based infrastructure and operational practices +- Regulatory or compliance requirements mandate direct OS-level control +- Your team has limited container expertise but strong Linux administration skills +- You want a straightforward deployment without container orchestration overhead + +### Architecture + +```mermaid +flowchart TB + LB["Load Balancer"] + + subgraph ASG["Auto-Scaling Group"] + VM1["VM 1"] + VM2["VM 2"] + VM3["VM N"] + end + + subgraph Backend["Backing Services"] + PG["PostgreSQL + PGVector"] + Redis["Redis"] + S3["Object Storage"] + Tika["Tika"] + end + + LB --> ASG + ASG --> Backend +``` + +### Installation + +Install on each VM using pip with the `[all]` extra (includes PostgreSQL drivers). Quote the requirement so that shells such as zsh do not interpret the brackets as a glob pattern: + +```bash +pip install "open-webui[all]" +``` + +Create a systemd unit to manage the process: + +```ini +[Unit] +Description=Open WebUI +After=network.target + +[Service] +Type=simple +User=openwebui +EnvironmentFile=/etc/open-webui/env +ExecStart=/usr/local/bin/open-webui serve +Restart=always +RestartSec=5 + +[Install] +WantedBy=multi-user.target +``` + +Place your environment variables in `/etc/open-webui/env` (see [Critical Configuration](#critical-configuration) above). + +### Scaling Strategy + +- **Horizontal scaling**: Configure your auto-scaling group to add or remove VMs based on CPU utilization or request count. +- **Health checks**: Point your load balancer health check at the `/health` endpoint (HTTP 200 when healthy). +- **One process per VM**: Keep `UVICORN_WORKERS=1` and let the auto-scaler manage capacity. This simplifies memory accounting and avoids fork-safety issues with the default vector database. +- **Sticky sessions**: Configure your load balancer for cookie-based session affinity to ensure WebSocket connections remain routed to the same instance. 
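The health-check bullet above can be exercised from a script before an instance is registered with the load balancer. This is a minimal sketch using only the Python standard library; the helper names are hypothetical, and the only behavior taken from this guide is that `/health` returns HTTP 200 when the application is healthy.

```python
import http.client
import urllib.error
import urllib.request


def health_url(base_url):
    """Build the /health URL for an instance, tolerating a trailing slash."""
    return base_url.rstrip("/") + "/health"


def is_healthy(base_url, timeout=5.0):
    """Return True only when GET <base_url>/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers refused connections, DNS failures, timeouts, and
        # non-2xx responses (urlopen raises HTTPError for those).
        return False


def unhealthy_instances(base_urls, timeout=5.0):
    """Return the subset of instance base URLs that fail the health check."""
    return [url for url in base_urls if not is_healthy(url, timeout=timeout)]
```

For example, `unhealthy_instances(["http://10.0.1.10:8080", "http://10.0.1.11:8080"])` would list whichever of those instances did not answer with HTTP 200; the addresses and port are placeholders for your own VMs.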
+ +### Key Considerations + +| Consideration | Detail | +| :--- | :--- | +| **OS patching** | You are responsible for OS updates, security patches, and Python runtime management. | +| **Python environment** | Pin your Python version (3.11 recommended) and use a virtual environment or system-level install. | +| **Storage** | Use object storage (such as S3) or a shared filesystem (such as NFS) since VMs in an auto-scaling group do not share a local filesystem. | +| **Tika sidecar** | Run a Tika server on each VM or as a shared service. A shared instance simplifies management. | +| **Updates** | Scale the group to 1 instance, update the package (`pip install --upgrade open-webui`), wait for database migrations to complete, then scale back up. | + +For pip installation basics, see the [Python Quick Start](/getting-started/quick-start#python). + +--- + +## Option 2: Container Service + +Run the official `ghcr.io/open-webui/open-webui` image on a managed container platform such as AWS ECS/Fargate, Azure Container Apps, or Google Cloud Run. 
+ +### When to Choose This Pattern + +- You want container benefits (immutable images, versioned deployments, no OS management) without Kubernetes complexity +- Your organization already uses a managed container platform +- You need fast scaling with minimal operational overhead +- You prefer managed infrastructure with platform-native auto-scaling + +### Architecture + +```mermaid +flowchart TB + LB["Load Balancer"] + + subgraph CS["Container Service"] + T1["Container Task 1"] + T2["Container Task 2"] + T3["Container Task N"] + end + + subgraph Backend["Managed Backing Services"] + PG["PostgreSQL + PGVector"] + Redis["Redis"] + S3["Object Storage"] + Tika["Tika"] + end + + LB --> CS + CS --> Backend +``` + +### Image Selection + +Use **versioned tags** for production stability: + +``` +ghcr.io/open-webui/open-webui:v0.x.x +``` + +Avoid the `:main` tag in production — it tracks the latest development build and can introduce breaking changes without warning. Check the [Open WebUI releases](https://github.com/open-webui/open-webui/releases) for the latest stable version. + +### Scaling Strategy + +- **Platform-native auto-scaling**: Configure your container service to scale on CPU utilization, memory, or request count. +- **Health checks**: Use the `/health` endpoint for both liveness and readiness probes. +- **Task-level env vars**: Pass all shared infrastructure configuration as environment variables or secrets in your task definition. +- **Session affinity**: Enable sticky sessions on your load balancer for WebSocket stability. While Redis handles cross-instance coordination, session affinity reduces unnecessary session handoffs. + +### Key Considerations + +| Consideration | Detail | +| :--- | :--- | +| **Storage** | Use object storage (S3, GCS, Azure Blob) or a shared filesystem (such as EFS). Container-local storage is ephemeral and not shared across tasks. 
| +| **Tika sidecar** | Run Tika as a sidecar container in the same task definition, or as a separate service. Sidecar pattern keeps extraction traffic local. | +| **Secrets management** | Use your platform's secrets manager (AWS Secrets Manager, Azure Key Vault, GCP Secret Manager) for `DATABASE_URL`, `REDIS_URL`, and `WEBUI_SECRET_KEY`. | +| **Updates** | Perform a rolling deployment with a single task first — this task runs migrations (`ENABLE_DB_MIGRATIONS=true`). Once healthy, scale the remaining tasks with `ENABLE_DB_MIGRATIONS=false`. | + +### Anti-Patterns to Avoid + +| Anti-Pattern | Impact | Fix | +| :--- | :--- | :--- | +| Using local SQLite | Data loss on task restart, database locks with multiple tasks | Set `DATABASE_URL` to PostgreSQL | +| Default ChromaDB | SQLite-backed vector DB crashes under multi-process access | Set `VECTOR_DB=pgvector` (or Milvus/Qdrant) | +| Inconsistent `WEBUI_SECRET_KEY` | Login loops, 401 errors, sessions that don't persist across tasks | Set the same key on every task via secrets manager | +| No Redis | WebSocket failures, config not syncing, "Model Not Found" errors | Set `REDIS_URL` and `WEBSOCKET_MANAGER=redis` | + +For container basics, see the [Docker Quick Start](/getting-started/quick-start#docker). For a Docker Swarm example with external ChromaDB, see the [Docker Swarm guide](/getting-started/quick-start#docker-swarm). + +--- + +## Option 3: Kubernetes with Helm + +Deploy using the official Open WebUI Helm chart on any Kubernetes distribution (EKS, AKS, GKE, OpenShift, Rancher, self-managed). 
+ +### When to Choose This Pattern + +- Your organization runs Kubernetes and has platform engineering expertise +- You need declarative infrastructure-as-code with GitOps workflows +- You require advanced scaling (HPA), rolling updates, and pod disruption budgets +- You are deploying for hundreds to thousands of users in a mission-critical environment + +### Architecture + +```mermaid +flowchart LR + subgraph K8s["Kubernetes Cluster"] + + subgraph mgmt["Cluster Management"] + Ingress["Ingress Controller"] + HPA["HPA"] + end + + subgraph genai["GenAI Namespace"] + subgraph Deploy["Open WebUI Deployment"] + P1["Pod 1"] + P2["Pod 2"] + P3["Pod N"] + end + end + + subgraph data["Data Namespace"] + PG["PostgreSQL + PGVector"] + Redis["Redis"] + Storage["Shared Storage"] + Tika["Tika"] + end + + end + + Ingress --> Deploy + HPA -.- Deploy + Deploy --> data +``` + +### Helm Chart Setup + +```bash +# Add the repository +helm repo add open-webui https://open-webui.github.io/helm-charts +helm repo update + +# Install with custom values +helm install openwebui open-webui/open-webui -f values.yaml +``` + +Your `values.yaml` should override the defaults to point at your shared infrastructure. 
The chart has dedicated values for many common settings — use these instead of raw environment variables where available: + +```yaml +# Example values.yaml overrides (refer to chart documentation for full schema) +replicaCount: 3 + +# -- Database: use an external PostgreSQL instance +databaseUrl: "postgresql://user:password@db-host:5432/openwebui" + +# -- WebSocket & Redis: the chart can auto-deploy Redis in-cluster, +# or you can point to an external Redis instance via websocket.url +websocket: + enabled: true + manager: redis + # url: "redis://my-external-redis:6379/0" # uncomment to use external Redis + redis: + enabled: true # set to false if using external Redis + +# -- Tika: the chart can auto-deploy Tika in-cluster +tika: + enabled: true + +# -- Ollama: disable if using external model APIs or a separate Ollama deployment +ollama: + enabled: false + +# -- Storage: use object storage instead of local PVC for multi-replica +persistence: + provider: s3 # or "gcs" / "azure" + s3: + bucket: "my-openwebui-bucket" + region: "us-east-1" + accessKeyExistingSecret: "openwebui-s3-creds" + accessKeyExistingAccessKey: "access-key" + secretKeyExistingSecret: "openwebui-s3-creds" + secretKeyExistingSecretKey: "secret-key" + # -- Alternatively, use a shared filesystem (RWX PVC) instead of object storage: + # provider: local + # accessModes: + # - ReadWriteMany + # storageClass: "efs-sc" + +# -- Ingress: configure if exposing via an ingress controller +ingress: + enabled: true + class: "nginx" + host: "ai.example.com" + tls: true + existingSecret: "openwebui-tls" + annotations: + nginx.ingress.kubernetes.io/affinity: "cookie" + nginx.ingress.kubernetes.io/session-cookie-name: "open-webui-session" + nginx.ingress.kubernetes.io/session-cookie-expires: "172800" + nginx.ingress.kubernetes.io/session-cookie-max-age: "172800" + +# -- Remaining settings that don't have dedicated chart values +extraEnvVars: + - name: WEBUI_SECRET_KEY + valueFrom: + secretKeyRef: + name: 
openwebui-secrets + key: secret-key + - name: VECTOR_DB + value: "pgvector" + - name: PGVECTOR_DB_URL + valueFrom: + secretKeyRef: + name: openwebui-secrets + key: database-url + - name: UVICORN_WORKERS + value: "1" + - name: ENABLE_DB_MIGRATIONS + value: "false" + - name: RAG_EMBEDDING_ENGINE + value: "openai" +``` + +### Scaling Strategy + +- **Horizontal Pod Autoscaler (HPA)**: Scale on CPU or memory utilization. Keep `UVICORN_WORKERS=1` per pod and let Kubernetes manage the replica count. +- **Resource requests and limits**: Set appropriate CPU and memory requests to ensure the scheduler places pods correctly and the HPA has accurate metrics. +- **Pod disruption budgets**: Configure a PDB to ensure a minimum number of pods remain available during voluntary disruptions (node drains, cluster upgrades). + +### Update Procedure + +:::danger Critical Update Process +When running multiple replicas, you **must** follow this process for every update: + +1. Scale the deployment to **1 replica** +2. Apply the new image version (with `ENABLE_DB_MIGRATIONS=true` on the single replica) +3. Wait for the pod to become **fully ready** (database migrations complete) +4. Scale back to your desired replica count (with `ENABLE_DB_MIGRATIONS=false`) + +Skipping this process risks database corruption from concurrent migrations. +::: + +### Key Considerations + +| Consideration | Detail | +| :--- | :--- | +| **Storage** | Use a **ReadWriteMany (RWX)** shared filesystem (EFS, CephFS, NFS) or object storage (S3, GCS, Azure Blob) for uploaded files. ReadWriteOnce volumes will not work with multiple pods. | +| **Secrets** | Store credentials in Kubernetes Secrets and reference via `secretKeyRef`. Integrate with external secrets operators (External Secrets, Sealed Secrets) for GitOps workflows. | +| **Database** | Use a managed PostgreSQL service (RDS, Cloud SQL, Azure DB) for production. In-cluster PostgreSQL operators (CloudNativePG, Zalando) are viable but add operational burden. 
| +| **Redis** | A single Redis instance with `timeout 1800` and `maxclients 10000` is sufficient for most deployments. Redis Sentinel or Cluster is only needed if Redis itself must be highly available. | +| **Networking** | Keep all services in the same availability zone. Target < 2 ms database latency. Audit network policies to ensure pods can reach PostgreSQL, Redis, and storage backends. | + +For the complete Helm setup guide, see [Kubernetes Quick Start](/getting-started/quick-start#kubernetes--helm). For troubleshooting multi-replica issues, see [Multi-Replica Troubleshooting](/troubleshooting/multi-replica). + +--- + +## Deployment Comparison + +| | **Python / Pip (VMs)** | **Container Service** | **Kubernetes (Helm)** | +| :--- | :--- | :--- | :--- | +| **Operational complexity** | Moderate — OS patching, Python management | Low — platform-managed containers | Higher — requires K8s expertise | +| **Auto-scaling** | Cloud ASG/VMSS with health checks | Platform-native, minimal configuration | HPA with fine-grained control | +| **Container isolation** | None — process runs directly on OS | Full container isolation | Full container + namespace isolation | +| **Rolling updates** | Manual (scale down, update, scale up) | Platform-managed rolling deployments | Declarative rolling updates with rollback | +| **Infrastructure-as-code** | Terraform/Pulumi for VMs + config mgmt | Task/service definitions (CloudFormation, Bicep, Terraform) | Helm charts + GitOps (Argo CD, Flux) | +| **Best suited for** | Teams with VM-centric operations, regulatory constraints | Teams wanting container benefits without K8s complexity | Large-scale, mission-critical deployments | +| **Minimum team expertise** | Linux administration, Python | Container fundamentals, cloud platform | Kubernetes, Helm, cloud-native patterns | + +--- + +## Observability + +Production deployments should include monitoring and observability regardless of deployment pattern. 
+ +### Health Checks + +- **`/health`** — Basic liveness check. Returns HTTP 200 when the application is running. Use this for load balancer and auto-scaler health checks. +- **`/api/models`** — Verifies the application can connect to configured model backends. Requires an API key. + +### OpenTelemetry + +Open WebUI supports **OpenTelemetry** for distributed tracing and HTTP metrics. Enable it with: + +```bash +ENABLE_OTEL=true +OTEL_EXPORTER_OTLP_ENDPOINT=http://your-collector:4318 +OTEL_SERVICE_NAME=open-webui +``` + +This auto-instruments FastAPI, SQLAlchemy, Redis, and HTTP clients — giving visibility into request latency, database query performance, and cross-service traces. + +### Structured Logging + +Enable JSON-formatted logs for integration with log aggregation platforms (Datadog, Loki, CloudWatch, Splunk): + +```bash +LOG_FORMAT=json +GLOBAL_LOG_LEVEL=INFO +``` + +For full monitoring setup details, see [Monitoring](/reference/monitoring) and [OpenTelemetry](/reference/monitoring/otel). + +--- + +## Next Steps + +- **[Architecture & High Availability](/enterprise/architecture)** — Deeper dive into Open WebUI's stateless design and HA capabilities. +- **[Security](/enterprise/security)** — Compliance frameworks, SSO/LDAP integration, RBAC, and audit logging. +- **[Integration](/enterprise/integration)** — Connecting AI models, pipelines, and extending functionality. +- **[Scaling Open WebUI](/getting-started/advanced-topics/scaling)** — The complete step-by-step technical scaling guide. +- **[Multi-Replica Troubleshooting](/troubleshooting/multi-replica)** — Solutions for common issues in scaled deployments. + +--- + +**Need help planning your enterprise deployment?** Our team works with organizations worldwide to design and implement production Open WebUI environments. 
+ +[**Contact Enterprise Sales → sales@openwebui.com**](mailto:sales@openwebui.com) diff --git a/docs/enterprise/support.md b/docs/enterprise/support.md index a41f7aa1..84445fad 100644 --- a/docs/enterprise/support.md +++ b/docs/enterprise/support.md @@ -1,5 +1,5 @@ --- -sidebar_position: 4 +sidebar_position: 7 title: "Support & SLAs" ---