From 1357ee1f7a6669914b94f78aa68c6cfa12bae5a0 Mon Sep 17 00:00:00 2001
From: DrMelone <27028174+Classic298@users.noreply.github.com>
Date: Sun, 11 Jan 2026 13:33:56 +0100
Subject: [PATCH] update

---
 docs/getting-started/env-configuration.mdx | 13 ++++++++++++-
 .../quick-start/starting-with-openai.mdx | 2 +-
 docs/troubleshooting/multi-replica.mdx | 6 ++++--
 .../tips => troubleshooting}/performance.md | 13 +++++++------
 4 files changed, 24 insertions(+), 10 deletions(-)
 rename docs/{tutorials/tips => troubleshooting}/performance.md (95%)

diff --git a/docs/getting-started/env-configuration.mdx b/docs/getting-started/env-configuration.mdx
index 72827e93..23003d89 100644
--- a/docs/getting-started/env-configuration.mdx
+++ b/docs/getting-started/env-configuration.mdx
@@ -402,7 +402,18 @@ This will run the Open WebUI on port `9999`. The `PORT` environment variable is
 
 - Type: `bool`
 - Default: `False`
-- Description: When enabled, the system saves each chunk of streamed chat data to the database in real time to ensure maximum data persistency. This feature provides robust data recovery and allows accurate session tracking. However, the tradeoff is increased latency, as saving to the database introduces a delay. Disabling this feature can improve performance and reduce delays, but it risks potential data loss in the event of a system failure or crash. Use based on your application's requirements and acceptable tradeoffs.
+- Description: When enabled, the system saves each individual chunk of streamed chat data to the database in real time.
+
+:::danger EXTREME PERFORMANCE RISK: DO NOT ENABLE IN PRODUCTION
+**It is strongly recommended to NEVER enable this setting in production or multi-user environments.**
+
+Enabling `ENABLE_REALTIME_CHAT_SAVE` causes every single chunk of streamed text received from the LLM to trigger a separate database write operation. In a multi-user environment, this will:
+1. **Exhaust Database Connection Pools**: Rapid-fire writes will quickly consume all available database connections, leading to "QueuePool limit reached" errors and application-wide freezes.
+2. **Severe Performance Impact**: The overhead of thousands of database transactions per minute will cause massive latency for all users.
+3. **Hardware Strain**: It creates immense I/O pressure on your storage system.
+
+**Keep this set to `False` (the default).** Chats are still saved automatically once generation is complete. This setting is only intended for extreme debugging scenarios or single-user environments where sub-second persistence of every chunk is more important than stability.
+:::
 
 #### `ENABLE_CHAT_RESPONSE_BASE64_IMAGE_URL_CONVERSION`
 
diff --git a/docs/getting-started/quick-start/starting-with-openai.mdx b/docs/getting-started/quick-start/starting-with-openai.mdx
index 7317380e..d8485c44 100644
--- a/docs/getting-started/quick-start/starting-with-openai.mdx
+++ b/docs/getting-started/quick-start/starting-with-openai.mdx
@@ -79,7 +79,7 @@ Once Open WebUI is running:
 :::tip OpenRouter Recommendation
 When using **OpenRouter**, we **highly recommend**:
   1. **Use an allowlist** (add specific Model IDs). OpenRouter exposes thousands of models, which can clutter your model selector and slow down the admin panel if not filtered.
-  2. **Enable Model Caching** (`Settings > Connections > Cache Base Model List` or `ENABLE_BASE_MODELS_CACHE=True`). Without caching, page loads can take 10-15+ seconds on first visit due to querying a large number of models. See the [Performance Guide](/tutorials/tips/performance) for more details.
+  2. **Enable Model Caching** (`Settings > Connections > Cache Base Model List` or `ENABLE_BASE_MODELS_CACHE=True`). Without caching, page loads can take 10-15+ seconds on first visit due to querying a large number of models. See the [Performance Guide](/troubleshooting/performance) for more details.
 :::
 
 :::caution MiniMax Whitelisting
diff --git a/docs/troubleshooting/multi-replica.mdx b/docs/troubleshooting/multi-replica.mdx
index 45f4d69c..2a0dc9a1 100644
--- a/docs/troubleshooting/multi-replica.mdx
+++ b/docs/troubleshooting/multi-replica.mdx
@@ -127,7 +127,9 @@ The `/app/backend/data` directory is not shared or is not consistent across repl
 This is typically caused by infrastructure latency (Network Latency to the database or Disk I/O latency for SQLite) that is inherently higher in cloud environments compared to local NVMe/SSD storage and local networks.
 
 **Solution:**
-Refer to the **[Cloud Infrastructure Latency](/tutorials/tips/performance#%EF%B8%8F-cloud-infrastructure-latency)** section in the Performance Guide for a detailed breakdown of diagnosis and mitigation strategies.
+Refer to the **[Cloud Infrastructure Latency](/troubleshooting/performance#%EF%B8%8F-cloud-infrastructure-latency)** section in the Performance Guide for a detailed breakdown of diagnosis and mitigation strategies.
+
+If you need more tips for performance improvements, check out the full [Optimization & Performance Guide](/troubleshooting/performance).
 
 ### 7. Optimizing Database Performance
 
@@ -196,7 +198,7 @@ While Open WebUI is designed to be stateless with proper Redis configuration, en
 ## Related Documentation
 
 - [Environment Variable Configuration](/getting-started/env-configuration)
-- [Optimization, Performance & RAM Usage](/tutorials/tips/performance)
+- [Optimization, Performance & RAM Usage](/troubleshooting/performance)
 - [Troubleshooting Connection Errors](/troubleshooting/connection-error)
 - [Logging Configuration](/getting-started/advanced-topics/logging)
 
diff --git a/docs/tutorials/tips/performance.md b/docs/troubleshooting/performance.md
similarity index 95%
rename from docs/tutorials/tips/performance.md
rename to docs/troubleshooting/performance.md
index dc5d4e48..2c966f93 100644
--- a/docs/tutorials/tips/performance.md
+++ b/docs/troubleshooting/performance.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 10
+sidebar_position: 15
 title: "Optimization, Performance & RAM Usage"
 ---
 
@@ -86,11 +86,12 @@ For any multi-user or high-concurrency setup, **PostgreSQL is mandatory**. SQLit
 - **Example**: `postgres://user:password@localhost:5432/webui`
 
 ### Chat Saving Strategy
-By default, Open WebUI saves chats in **real-time**. This ensures no data loss but creates massive database write pressure because *every single chunk* of text received from the LLM triggers a database update.
-- **Env Var**: `ENABLE_REALTIME_CHAT_SAVE=False`
+By default, Open WebUI saves chats **after generation is complete**. While saving in real-time (token by token) is possible, it creates massive database write pressure and is **strongly discouraged**.
+
+- **Env Var**: `ENABLE_REALTIME_CHAT_SAVE=False` (Default)
 - **Effect**: Chats are saved only when the generation is complete (or periodically).
-- **Recommendation**: **Highly Recommended** for any high-user setup to reduce DB load substantially.
+- **Recommendation**: **DO NOT ENABLE `ENABLE_REALTIME_CHAT_SAVE` in production.** Keep it `False` to prevent database connection exhaustion and severe performance degradation under concurrent load. See the [Environment Variable Configuration](/getting-started/env-configuration#enable_realtime_chat_save) for details.
 
 ### Database Session Sharing
 
 
@@ -159,7 +160,7 @@ If you are deploying for **enterprise scale** (hundreds of users), simple Docker
 For setups with many simultaneous users, these settings are crucial to prevent bottlenecks.
 
 #### Batch Streaming Tokens
-By default, Open WebUI streams *every single token* arriving from the LLM. High-frequency streaming increases network IO and CPU usage on the server. If real-time saving is enabled, it also destroys database performance (you can disable it with `ENABLE_REALTIME_CHAT_SAVE=False`).
+By default, Open WebUI streams *every single token* arriving from the LLM. High-frequency streaming increases network IO and CPU usage on the server. If real-time saving is enabled (which is strongly discouraged), it also destroys database performance.
 
 Increasing the chunk size buffers these updates, sending them to the client in larger groups. The only downside is a slightly choppier UI experience when streaming the response, but it can make a big difference in performance.
 
@@ -307,7 +308,7 @@ If resource usage is critical, disable automated features that constantly trigge
 1. **Embeddings**: `RAG_EMBEDDING_ENGINE=openai` (or `ollama` with `nomic-embed-text` on a fast server).
 2. **Task Model**: `gpt-5-nano` or `llama-3.1-8b-instant`.
 3. **Caching**: `MODELS_CACHE_TTL=300`.
-4. **Database**: `ENABLE_REALTIME_CHAT_SAVE=True` (Persistence is usually preferred over raw write speed here).
+4. **Database**: `ENABLE_REALTIME_CHAT_SAVE=False` (Keeping this disabled is recommended even for single users to ensure maximum stability).
 5. **Vector DB**: PGVector (recommended) or ChromaDB (either is fine unless dealing with massive data).
 
 ### Profile 3: High Scale / Enterprise