Document CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE

Added documentation for CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE including type, default value, and description.
Commit b10c5024ef by Classic298, committed via GitHub on 2025-11-10 08:54:33 +01:00 (parent 947afd17e9).


@@ -269,6 +269,12 @@ This will run the Open WebUI on port `9999`. The `PORT` environment variable is
- Default: `1`
- Description: Sets a system-wide minimum value for the number of tokens to batch together before sending them to the client during a streaming response. This allows an administrator to enforce a baseline level of performance and stability across the entire system by preventing excessively small chunk sizes that can cause high CPU load. The final chunk size used for a response will be the highest value set among this global variable, the model's advanced parameters, or the per-chat settings. The default is 1, which applies no minimum batching at the global level.
#### `CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE`
- Type: `int`
- Default: Empty string (`''`), which disables the limit (equivalent to `None`)
- Description: Sets the maximum buffer size in bytes for handling stream response chunks. When a single chunk exceeds this limit, the system returns an empty JSON object and skips subsequent oversized data until encountering normally-sized chunks. This prevents memory issues when dealing with extremely large responses from certain providers (e.g., models like gemini-2.5-flash-image, or services returning extensive web search data). Set to an empty string or a negative value to disable chunk size limitations entirely.
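The skipping behavior described above can be sketched roughly like this. The function name and structure are hypothetical and only illustrate the documented behavior (emit `{}` for the first oversized chunk, drop further oversized chunks, resume on a normal-sized one); they do not reflect Open WebUI's actual implementation.

```python
from typing import Iterable, Iterator, Optional


def guard_chunks(chunks: Iterable[str],
                 max_buffer_size: Optional[int]) -> Iterator[str]:
    """Hypothetical sketch: pass chunks through, replacing an oversized run
    with a single empty JSON object. None or a negative limit disables the
    check, matching the documented disable semantics."""
    disabled = max_buffer_size is None or max_buffer_size < 0
    skipping = False
    for chunk in chunks:
        oversized = not disabled and len(chunk.encode("utf-8")) > max_buffer_size
        if oversized:
            if not skipping:
                skipping = True
                yield "{}"  # first oversized chunk -> empty JSON object
            # subsequent oversized chunks are dropped silently
        else:
            skipping = False
            yield chunk


# Two consecutive oversized chunks collapse into one "{}" placeholder:
print(list(guard_chunks(["a", "x" * 20, "y" * 20, "b"], 10)))
```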
:::info
It is recommended to set this to a high single-digit or low double-digit value if you run Open WebUI with high concurrency, many users, and very fast streaming models.
:::