cloud

2026-03-27 13:28:37 +07:00 · 2026-01-02 16:29:14 +01:00
parent 8d5a257c8a
commit e28b702f78
2 changed files with 72 additions and 0 deletions
--- a/docs/troubleshooting/multi-replica.mdx
+++ b/docs/troubleshooting/multi-replica.mdx
@@ -116,6 +116,18 @@ The `/app/backend/data` directory is not shared or is not consistent across repl
 - **Kubernetes:** Use a `PersistentVolumeClaim` with `ReadWriteMany` (RWX) access mode if your storage provider supports it (e.g., NFS, CephFS, AWS EFS).
 - **Docker Swarm/Compose:** Mount a shared volume (e.g., NFS mount) to `/app/backend/data` on all containers.

+### 6. Slow Performance in Cloud vs. Local Kubernetes
+
+**Symptoms:**
+- Open WebUI performs well locally but experiences significant degradation or timeouts when deployed to cloud providers (AKS, EKS, GKE).
+- Performance drops sharply under concurrent load despite adequate resource allocation.
+
+**Cause:**
+This is typically caused by infrastructure latency: either network latency to the database or disk I/O latency for SQLite. Both are usually higher in cloud environments than on local NVMe/SSD storage and local networks.
+
+**Solution:**
+Refer to the **[Cloud Infrastructure Latency](/tutorials/tips/performance#%EF%B8%8F-cloud-infrastructure-latency)** section in the Performance Guide for a detailed breakdown of diagnosis and mitigation strategies.
+
 ---

 ## Deployment Best Practices
@@ -156,6 +168,7 @@ While Open WebUI is designed to be stateless with proper Redis configuration, en
 ## Related Documentation

 - [Environment Variable Configuration](/getting-started/env-configuration)
+- [Optimization, Performance & RAM Usage](/tutorials/tips/performance)
 - [Troubleshooting Connection Errors](/troubleshooting/connection-error)
 - [Logging Configuration](/getting-started/advanced-topics/logging)

--- a/docs/tutorials/tips/performance.md
+++ b/docs/tutorials/tips/performance.md
@@ -130,6 +130,65 @@ Defines the number of worker threads available for handling requests.

 ---

+## ☁️ Cloud Infrastructure Latency
+
+When deploying Open WebUI in cloud Kubernetes environments (AKS, EKS, GKE), you may notice significant performance degradation compared to local Kubernetes (Rancher Desktop, kind, Minikube) or bare-metal deployments—even with identical resource allocations. This is almost always caused by **latency** in the underlying infrastructure.
+
+### Network Latency (Database & Services)
+
+The most common cause of cloud performance issues is **network latency between Open WebUI and its database**.
+
+Many cloud deployments place the database on a separate node, in a different availability zone, or in a managed database service. While this is architecturally sound, it adds latency to *every single database query*. Open WebUI makes multiple database calls per request, so even 10-20 ms of network latency per query can compound into multi-second response times under concurrent load.
+
+**Symptoms:**
+- Health check endpoints show high response times instead of being near-instant.
+- Simple API calls or normal chat completions become sluggish under concurrent load, even when CPU and memory usage appear low.
+- Significant performance gap between local development/testing and cloud production environments.
+
+**Diagnosis:**
+- Measure the latency between your Open WebUI pod and your database. From within the pod:
+  ```bash
+  # For PostgreSQL: time a trivial query (the result includes connection setup)
+  time psql -h <db-host> -U <user> -d <database> -c "SELECT 1"
+  
+  # Or time a bare TCP connection to the database port
+  time nc -zv <db-host> 5432
+  ```
+- If network latency to your database exceeds **5 ms**, the setup is not recommended for high-performance deployments, and that latency will likely be your primary bottleneck.
+
+**Solutions:**
+1. **Co-locate services:** Deploy Open WebUI and PostgreSQL in the same availability zone, or even on the same node pool if possible, to minimize network hops.
+2. **Weigh managed databases carefully:** "One-click" managed database services in the cloud scale well, but they often add significant network latency compared to a self-hosted database on the same node. Consider this tradeoff carefully before choosing one.
+3. **Enable caching:** Use `ENABLE_BASE_MODELS_CACHE=True` and other caching options to reduce the frequency of database queries.
+4. **Reduce database writes:** Set `ENABLE_REALTIME_CHAT_SAVE=False` to batch database updates and reduce IOPS pressure.
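+
+As a minimal sketch, the two settings named above could be applied as environment variables on the Open WebUI container. How they are injected (a Kubernetes `env` entry, a Compose `environment` block, or a shell export as shown here) depends on your deployment and is an assumption of this example:
+
+```bash
+# Cache the base model list to reduce repeated database queries
+export ENABLE_BASE_MODELS_CACHE=True
+
+# Batch chat updates instead of writing to the database in real time
+export ENABLE_REALTIME_CHAT_SAVE=False
+```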
+
+### Disk I/O Latency (SQLite & Storage)
+
+If you're using **SQLite** (the default) in a cloud environment, you may be trading network latency for **disk latency**.
+
+Cloud storage (Azure Disks, AWS EBS, GCP Persistent Disks) often has significantly higher latency and lower IOPS than local NVMe/SSD storage, especially on lower-tier storage classes. SQLite is particularly sensitive to disk performance because it performs synchronous writes, so each commit waits for the disk to confirm the write.
+
+**Symptoms:**
+- Performance is acceptable with a single user but degrades rapidly as concurrency increases.
+- High "I/O Wait" on the server despite low CPU usage.
+
+**Solutions:**
+1. **Use high-performance storage classes:**
+   - Ensure you are using SSD-backed storage classes (e.g., `Premium_LRS` on Azure, `gp3` on AWS, `pd-ssd` on GCP).
+2. **Use PostgreSQL instead:** For any medium-to-large production deployment, **PostgreSQL is mandatory**. SQLite is generally not recommended at scale in cloud environments due to the inherent latency of network-attached storage.
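+
+A hedged sketch of pointing Open WebUI at PostgreSQL follows. It assumes the connection string is read from a `DATABASE_URL` environment variable; that name is not stated in this section, so confirm it against the Environment Variable Configuration page. The host, user, password, and database values are placeholders:
+
+```bash
+# Assumption: DATABASE_URL is the variable Open WebUI reads for its database connection
+# Replace every <...> placeholder with your own values
+export DATABASE_URL="postgresql://<user>:<password>@<db-host>:5432/<database>"
+```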
+
+### Other Cloud-Specific Considerations
+
+| Factor | Impact | Mitigation |
+|--------|--------|------------|
+| **Burstable VMs** (e.g., Azure B-series, AWS T-series) | CPU throttling under sustained load, even at low reported usage | Use standard/compute-optimized node pools |
+| **DNS Resolution** | CoreDNS overhead on every external request | Ensure CoreDNS is properly scaled; consider node-local DNS cache |
+| **Service Mesh Sidecars** | Istio/Linkerd proxies add latency to every request | Check for unexpected sidecar containers in your pods |
+| **Network Policies** | CNI processing overhead | Audit and simplify network policies if possible |
+| **Cross-Zone Traffic** | Latency + egress costs when services span zones | Pin services to the same availability zone |
+
+---
+
 ## 📉 Resource Efficiency (Reducing RAM)

 If deploying on memory-constrained devices (Raspberry Pi, small VPS), use these strategies to prevent the application from crashing due to OOM (Out of Memory) errors.