Document CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE

Added documentation for CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE including type, default value, and description.
Commit b10c5024ef by Classic298, committed via GitHub on 2025-11-10 08:54:33 +01:00 (parent 947afd17e9).


@@ -269,6 +269,12 @@ This will run the Open WebUI on port `9999`. The `PORT` environment variable is
- Default: `1`
- Description: Sets a system-wide minimum value for the number of tokens to batch together before sending them to the client during a streaming response. This allows an administrator to enforce a baseline level of performance and stability across the entire system by preventing excessively small chunk sizes that can cause high CPU load. The final chunk size used for a response will be the highest value set among this global variable, the model's advanced parameters, or the per-chat settings. The default is 1, which applies no minimum batching at the global level.
#### `CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE`
- Type: `int`
- Default: Empty string (`''`), which disables the limit (equivalent to `None`)
- Description: Sets the maximum buffer size in bytes for handling stream response chunks. When a single chunk exceeds this limit, the system returns an empty JSON object and skips subsequent oversized data until encountering normally-sized chunks. This prevents memory issues when dealing with extremely large responses from certain providers (e.g., models like gemini-2.5-flash-image, or services returning extensive web search data). Set to an empty string or a negative value to disable chunk size limitations entirely.
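The skipping behavior described above can be sketched roughly like this. The function name and structure are hypothetical and only illustrate the documented behavior (emit `{}` for the first oversized chunk, drop further oversized chunks, resume on a normal-sized one); they do not reflect Open WebUI's actual implementation.

```python
from typing import Iterable, Iterator, Optional


def guard_chunks(chunks: Iterable[str],
                 max_buffer_size: Optional[int]) -> Iterator[str]:
    """Hypothetical sketch: pass chunks through, replacing an oversized run
    with a single empty JSON object. None or a negative limit disables the
    check, matching the documented disable semantics."""
    disabled = max_buffer_size is None or max_buffer_size < 0
    skipping = False
    for chunk in chunks:
        oversized = not disabled and len(chunk.encode("utf-8")) > max_buffer_size
        if oversized:
            if not skipping:
                skipping = True
                yield "{}"  # first oversized chunk -> empty JSON object
            # subsequent oversized chunks are dropped silently
        else:
            skipping = False
            yield chunk


# Two consecutive oversized chunks collapse into one "{}" placeholder:
print(list(guard_chunks(["a", "x" * 20, "y" * 20, "b"], 10)))
```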
:::info
It is recommended to set this to a high single-digit or low double-digit value if you run Open WebUI with high concurrency, many users, and very fast streaming models.
:::