From b10c5024ef2602a94b1b60669cfaebad32dfd48b Mon Sep 17 00:00:00 2001
From: Classic298 <27028174+Classic298@users.noreply.github.com>
Date: Mon, 10 Nov 2025 08:54:33 +0100
Subject: [PATCH] Document CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE

Added documentation for CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE,
including type, default value, and description.
---
 docs/getting-started/env-configuration.mdx | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/docs/getting-started/env-configuration.mdx b/docs/getting-started/env-configuration.mdx
index c2989fb..75e96c9 100644
--- a/docs/getting-started/env-configuration.mdx
+++ b/docs/getting-started/env-configuration.mdx
@@ -269,6 +269,12 @@ This will run the Open WebUI on port `9999`. The `PORT` environment variable is
 - Default: `1`
 - Description: Sets a system-wide minimum value for the number of tokens to batch together before sending them to the client during a streaming response. This allows an administrator to enforce a baseline level of performance and stability across the entire system by preventing excessively small chunk sizes that can cause high CPU load. The final chunk size used for a response will be the highest value set among this global variable, the model's advanced parameters, or the per-chat settings. The default is 1, which applies no minimum batching at the global level.
 
+#### `CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE`
+
+- Type: `int`
+- Default: Empty string (`''`), which disables the limit (equivalent to `None`)
+- Description: Sets the maximum buffer size in bytes for handling stream response chunks. When a single chunk exceeds this limit, the system returns an empty JSON object and skips subsequent oversized data until normally sized chunks resume. This prevents memory issues with extremely large responses from certain providers (e.g., models like gemini-2.5-flash-image or services returning extensive web search data). Set to an empty string or a negative value to disable chunk size limitations entirely.
+
 :::info
 It is recommended to set this to a high single-digit or low double-digit value if you run Open WebUI with high concurrency, many users, and very fast streaming models.
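
Note: for illustration, the guard behavior documented above can be sketched in a few lines of Python. This is not Open WebUI's actual implementation; `guard_chunks` and the surrounding structure are hypothetical, and it assumes chunks arrive as strings (using `len` as a stand-in for byte size):

```python
import json
import os

# Parse the variable as documented above: an empty string or a negative
# value disables the limit (treated here as None).
_raw = os.environ.get("CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE", "").strip()
MAX_BUFFER_SIZE = int(_raw) if _raw else None
if MAX_BUFFER_SIZE is not None and MAX_BUFFER_SIZE < 0:
    MAX_BUFFER_SIZE = None

def guard_chunks(chunks, max_buffer_size=MAX_BUFFER_SIZE):
    """Yield stream chunks, guarding against oversized ones.

    When a chunk exceeds max_buffer_size, yield an empty JSON object once,
    then skip further oversized chunks until normally sized chunks resume.
    """
    skipping = False
    for chunk in chunks:
        if max_buffer_size is not None and len(chunk) > max_buffer_size:
            if not skipping:
                skipping = True
                yield json.dumps({})  # placeholder for the dropped data
            continue
        skipping = False
        yield chunk

# Example: with a limit of 10, two consecutive oversized chunks collapse
# into a single "{}" placeholder while normal chunks pass through.
print(list(guard_chunks(["hello", "x" * 50, "y" * 50, "world"], 10)))
# -> ['hello', '{}', 'world']
```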