From b10c5024ef2602a94b1b60669cfaebad32dfd48b Mon Sep 17 00:00:00 2001
From: Classic298 <27028174+Classic298@users.noreply.github.com>
Date: Mon, 10 Nov 2025 08:54:33 +0100
Subject: [PATCH] Document CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE

Added documentation for CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE,
including type, default value, and description.
---
 docs/getting-started/env-configuration.mdx | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/docs/getting-started/env-configuration.mdx b/docs/getting-started/env-configuration.mdx
index c2989fb..75e96c9 100644
--- a/docs/getting-started/env-configuration.mdx
+++ b/docs/getting-started/env-configuration.mdx
@@ -269,6 +269,12 @@ This will run the Open WebUI on port `9999`. The `PORT` environment variable is
 - Default: `1`
 - Description: Sets a system-wide minimum value for the number of tokens to batch together before sending them to the client during a streaming response. This allows an administrator to enforce a baseline level of performance and stability across the entire system by preventing excessively small chunk sizes that can cause high CPU load. The final chunk size used for a response will be the highest value set among this global variable, the model's advanced parameters, or the per-chat settings. The default is 1, which applies no minimum batching at the global level.
 
+#### `CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE`
+
+- Type: `int`
+- Default: Empty string (`''`), which disables the limit (equivalent to `None`)
+- Description: Sets the maximum buffer size in bytes for handling stream response chunks. When a single chunk exceeds this limit, the system returns an empty JSON object and skips subsequent oversized data until normally sized chunks resume. This prevents memory issues with extremely large responses from certain providers (e.g., models like gemini-2.5-flash-image or services returning extensive web search data). Set to an empty string or a negative value to disable chunk size limitations entirely.
+
 :::info
 It is recommended to set this to a high single-digit or low double-digit value if you run Open WebUI with high concurrency, many users, and very fast streaming models.
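
Note: for illustration, the guard behavior documented above can be sketched in a few lines of Python. This is not Open WebUI's actual implementation; `guard_chunks` and the surrounding structure are hypothetical, and it assumes chunks arrive as strings (using `len` as a stand-in for byte size):

```python
import json
import os

# Parse the variable as documented above: an empty string or a negative
# value disables the limit (treated here as None).
_raw = os.environ.get("CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE", "").strip()
MAX_BUFFER_SIZE = int(_raw) if _raw else None
if MAX_BUFFER_SIZE is not None and MAX_BUFFER_SIZE < 0:
    MAX_BUFFER_SIZE = None

def guard_chunks(chunks, max_buffer_size=MAX_BUFFER_SIZE):
    """Yield stream chunks, guarding against oversized ones.

    When a chunk exceeds max_buffer_size, yield an empty JSON object once,
    then skip further oversized chunks until normally sized chunks resume.
    """
    skipping = False
    for chunk in chunks:
        if max_buffer_size is not None and len(chunk) > max_buffer_size:
            if not skipping:
                skipping = True
                yield json.dumps({})  # placeholder for the dropped data
            continue
        skipping = False
        yield chunk

# Example: with a limit of 10, two consecutive oversized chunks collapse
# into a single "{}" placeholder while normal chunks pass through.
print(list(guard_chunks(["hello", "x" * 50, "y" * 50, "world"], 10)))
# -> ['hello', '{}', 'world']
```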