Document CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE
Added documentation for CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE including type, default value, and description.
@@ -269,6 +269,12 @@ This will run the Open WebUI on port `9999`. The `PORT` environment variable is
- Default: `1`
- Description: Sets a system-wide minimum for the number of tokens batched together before they are sent to the client during a streaming response. This lets an administrator enforce a baseline level of performance and stability across the entire system by preventing excessively small chunk sizes, which can cause high CPU load. The final chunk size used for a response is the highest value among this global variable, the model's advanced parameters, and the per-chat settings (see the sketch below). The default of `1` applies no minimum batching at the global level.
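
The resolution rule above is effectively a `max()` over the configured values. The following is a minimal illustrative sketch; the function and parameter names are hypothetical and do not reflect Open WebUI's actual internals.

```python
def resolve_chunk_size(global_min: int, model_param: int | None, chat_param: int | None) -> int:
    """Pick the effective token batch size for a streaming response.

    `global_min` is the system-wide value from the environment variable
    documented above; `model_param` and `chat_param` stand in for the
    model's advanced parameter and the per-chat setting, if configured.
    (Names here are hypothetical, for illustration only.)
    """
    candidates = [global_min]
    if model_param is not None:
        candidates.append(model_param)
    if chat_param is not None:
        candidates.append(chat_param)
    # The highest configured value wins, so an administrator-set floor
    # can never be undercut by a smaller model or per-chat setting.
    return max(candidates)

# Example: global floor 1 (the default), model asks for 8, chat asks for 4 -> 8.
print(resolve_chunk_size(global_min=1, model_param=8, chat_param=4))
```
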
#### `CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE`
- Type: `int`
- Default: Empty string (`''`), which disables the limit (equivalent to `None`)
- Description: Sets the maximum buffer size in bytes for handling stream response chunks. When a single chunk exceeds this limit, the system returns an empty JSON object and skips subsequent oversized data until normally sized chunks resume (see the sketch below). This prevents memory issues when dealing with extremely large responses from certain providers (e.g., models like gemini-2.5-flash-image, or services returning extensive web search data that exceeds the limit). Set to an empty string or a negative value to disable chunk size limitations entirely.
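
The guard described above can be pictured as a simple size check per chunk. This is a minimal sketch assuming the variable has already been parsed into an integer or `None`; the function name and exact substitution logic are hypothetical, not Open WebUI's implementation.

```python
def guard_chunk(raw_chunk: bytes, max_buffer_size: int | None) -> bytes:
    """Illustrative guard for oversized stream response chunks.

    `max_buffer_size` mirrors CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE:
    None (empty string) or a negative value disables the check entirely.
    """
    if max_buffer_size is None or max_buffer_size < 0:
        return raw_chunk  # limit disabled
    if len(raw_chunk) > max_buffer_size:
        # Oversized chunk: drop the payload and emit an empty JSON object
        # instead, so the client keeps receiving valid (if empty) events.
        return b"{}"
    return raw_chunk  # normally sized chunks pass through unchanged

# Example: a 1 MiB limit lets ordinary deltas through but flattens a huge chunk.
print(guard_chunk(b'{"delta": "hello"}', 1024 * 1024))
print(guard_chunk(b"x" * (2 * 1024 * 1024), 1024 * 1024))
```
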
:::info
It is recommended to set this to a high single-digit or low double-digit value if you run Open WebUI with high concurrency, many users, and very fast streaming models.
:::