From e28b702f78778d4719f3c83d35e6893275db3b2f Mon Sep 17 00:00:00 2001
From: DrMelone <27028174+Classic298@users.noreply.github.com>
Date: Fri, 2 Jan 2026 16:29:14 +0100
Subject: [PATCH] cloud

---
 docs/troubleshooting/multi-replica.mdx | 13 ++++++
 docs/tutorials/tips/performance.md     | 59 ++++++++++++++++++++++++++
 2 files changed, 72 insertions(+)

diff --git a/docs/troubleshooting/multi-replica.mdx b/docs/troubleshooting/multi-replica.mdx
index 1b8ae8cc..9d7dc25d 100644
--- a/docs/troubleshooting/multi-replica.mdx
+++ b/docs/troubleshooting/multi-replica.mdx
@@ -116,6 +116,18 @@ The `/app/backend/data` directory is not shared or is not consistent across repl
 - **Kubernetes:** Use a `PersistentVolumeClaim` with `ReadWriteMany` (RWX) access mode if your storage provider supports it (e.g., NFS, CephFS, AWS EFS).
 - **Docker Swarm/Compose:** Mount a shared volume (e.g., NFS mount) to `/app/backend/data` on all containers.
 
+### 6. Slow Performance in Cloud vs. Local Kubernetes
+
+**Symptoms:**
+- Open WebUI performs well locally but experiences significant degradation or timeouts when deployed to cloud providers (AKS, EKS, GKE).
+- Performance drops sharply under concurrent load despite adequate resource allocation.
+
+**Cause:**
+This is typically caused by infrastructure latency (network latency to the database, or disk I/O latency for SQLite), which is inherently higher in cloud environments than on local NVMe/SSD storage and local networks.
+
+**Solution:**
+Refer to the **[Cloud Infrastructure Latency](/tutorials/tips/performance#%EF%B8%8F-cloud-infrastructure-latency)** section in the Performance Guide for a detailed breakdown of diagnosis and mitigation strategies.
+
 ---
 
 ## Deployment Best Practices
@@ -156,6 +168,7 @@ While Open WebUI is designed to be stateless with proper Redis configuration, en
 ## Related Documentation
 
 - [Environment Variable Configuration](/getting-started/env-configuration)
+- [Optimization, Performance & RAM Usage](/tutorials/tips/performance)
 - [Troubleshooting Connection Errors](/troubleshooting/connection-error)
 - [Logging Configuration](/getting-started/advanced-topics/logging)
 
diff --git a/docs/tutorials/tips/performance.md b/docs/tutorials/tips/performance.md
index 2f8d5187..c7d920ee 100644
--- a/docs/tutorials/tips/performance.md
+++ b/docs/tutorials/tips/performance.md
@@ -130,6 +130,65 @@ Defines the number of worker threads available for handling requests.
 
 ---
 
+## ☁️ Cloud Infrastructure Latency
+
+When deploying Open WebUI in cloud Kubernetes environments (AKS, EKS, GKE), you may notice significant performance degradation compared to local Kubernetes (Rancher Desktop, kind, Minikube) or bare-metal deployments, even with identical resource allocations. This is almost always caused by **latency** in the underlying infrastructure.
+
+### Network Latency (Database & Services)
+
+The most common cause of cloud performance issues is **network latency between Open WebUI and its database**.
+
+Many cloud deployments place the database on a separate node, in a separate availability zone, or on a managed database service. While this is architecturally sound, it adds latency to *every single database query*. Open WebUI makes multiple database calls per request, so even 10-20 ms of network latency per query can compound into multi-second response times under concurrent load.
+
+**Symptoms:**
+- Health check endpoints show high response times instead of being near-instant.
+- Simple API calls or normal chat completions become sluggish under concurrent load, even when CPU and memory usage appear low.
+- There is a significant performance gap between local development/testing and cloud production environments.
+
+**Diagnosis:**
+- Check network latency between your Open WebUI pod and your database. From within the pod:
+  ```bash
+  # For PostgreSQL: run a trivial query against the database
+  psql -h <db-host> -U <user> -c "SELECT 1" -d <database>
+
+  # Or time a raw TCP connection to the database port
+  time nc -zv <db-host> 5432
+  ```
+- If network latency to your database exceeds **5 ms**, the setup is not recommended for high-performance deployments, and that latency will likely be your primary bottleneck.
+
+**Solutions:**
+1. **Co-locate services:** Deploy Open WebUI and PostgreSQL in the same availability zone, or even on the same node pool if possible, to minimize network hops.
+2. **Managed DB consideration:** "One-click" managed database solutions in the cloud, while scalable, often introduce significant network latency compared to a self-hosted database on the same node. Weigh this tradeoff carefully.
+3. **Enable caching:** Use `ENABLE_BASE_MODELS_CACHE=True` and other caching options to reduce the frequency of database queries.
+4. **Reduce database writes:** Set `ENABLE_REALTIME_CHAT_SAVE=False` to batch database updates and reduce IOPS pressure.
+
+### Disk I/O Latency (SQLite & Storage)
+
+If you're using **SQLite** (the default) in a cloud environment, you may be trading network latency for **disk latency**.
+
+Cloud storage (Azure Disks, AWS EBS, GCP Persistent Disks) often has significantly higher latency and lower IOPS than local NVMe/SSD storage, especially on lower-tier storage classes. SQLite is particularly sensitive to disk performance because it performs synchronous writes.
+
+**Symptoms:**
+- Performance is acceptable with a single user but degrades rapidly as concurrency increases.
+- High "I/O Wait" on the server despite low CPU usage.
+
+**Solutions:**
+1. **Use high-performance storage classes:**
+   - Ensure you are using SSD-backed storage classes (e.g., `Premium_LRS` on Azure, `gp3` on AWS, `pd-ssd` on GCP).
+2. **Use PostgreSQL instead:** For any medium to large production deployment, **PostgreSQL is mandatory**. SQLite is generally not recommended at scale in cloud environments due to the inherent latency of network-attached storage.
+
+### Other Cloud-Specific Considerations
+
+| Factor | Impact | Mitigation |
+|--------|--------|------------|
+| **Burstable VMs** (e.g., Azure B-series, AWS T-series) | CPU throttling under sustained load, even at low reported usage | Use standard/compute-optimized node pools |
+| **DNS Resolution** | CoreDNS overhead on every external request | Ensure CoreDNS is properly scaled; consider node-local DNS cache |
+| **Service Mesh Sidecars** | Istio/Linkerd proxies add latency to every request | Check for unexpected sidecar containers in your pods |
+| **Network Policies** | CNI processing overhead | Audit and simplify network policies if possible |
+| **Cross-Zone Traffic** | Latency + egress costs when services span zones | Pin services to the same availability zone |
+
+---
+
 ## 📉 Resource Efficiency (Reducing RAM)
 
 If deploying on memory-constrained devices (Raspberry Pi, small VPS), use these strategies to prevent the application from crashing due to OOM (Out of Memory) errors.
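
The diagnosis guidance in the Network Latency section of this patch can also be scripted. The following is a minimal Python sketch, not part of the patch or of Open WebUI itself. The helper names `tcp_connect_latency_ms` and `estimated_db_overhead_ms` are hypothetical, the host and port are supplied by the caller, and the number of queries per request is an assumed input (the patch only says "multiple"). The sketch times TCP connects to the database port, compares the median against the 5 ms guideline from the patch, and multiplies it by an assumed sequential query count to illustrate how per-query latency compounds.

```python
import socket
import statistics
import time

# Guideline taken from the patch text: above 5 ms the database link is
# likely to be the primary bottleneck for high-performance deployments.
RECOMMENDED_MAX_LATENCY_MS = 5.0


def tcp_connect_latency_ms(host: str, port: int, samples: int = 10,
                           timeout: float = 2.0) -> float:
    """Return the median time (ms) to open a TCP connection to host:port.

    This measures name resolution plus the TCP handshake only. It is a
    lower bound on real query latency, not a measurement of the query.
    """
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection resolves the host and completes the handshake.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def estimated_db_overhead_ms(latency_ms: float, queries_per_request: int) -> float:
    """Estimate added latency per request, assuming queries run sequentially.

    queries_per_request is an assumption supplied by the caller; measure it
    for your own workload rather than relying on a guessed value.
    """
    return latency_ms * queries_per_request


def exceeds_guideline(latency_ms: float) -> bool:
    """Return True if latency is above the 5 ms guideline from the patch."""
    return latency_ms > RECOMMENDED_MAX_LATENCY_MS
```

For example, `estimated_db_overhead_ms(15.0, 20)` returns `300.0`: twenty sequential queries at 15 ms each add roughly 300 ms to a single request before any queuing under concurrent load. Calling `tcp_connect_latency_ms` from inside the Open WebUI pod with your database host and port (5432 for PostgreSQL by default) gives a rough number to feed into that estimate.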