docs: Update image generation and editing documentation for clarity on context and image ID usage

2026-03-27 10:48:32 +07:00 · 2025-04-26 03:48:17 -04:00
parent ca672b846b
commit 968ed4e11a
1 changed files with 21 additions and 18 deletions
--- a/pages/docs/features/image_gen.mdx
+++ b/pages/docs/features/image_gen.mdx
@@ -19,6 +19,7 @@ Each has its own look, price-point, and setup step (usually just an API key or U

 **Notes:**
 - Image Outputs are directly sent to the LLM as part of the immediate chat context following generation.
+  - The LLM only receives vision context from images attached to user messages; generated or edited images are visible to it only immediately after generation.
  - See [Image Storage and Handling](#image-storage-and-handling) for more details.
 - API keys can be omitted in favor of allowing the user to enter their own key from the UI.
 - Azure OpenAI does not yet support the latest OpenAI GPT-Image-1.
@@ -44,21 +45,25 @@ Each has its own look, price-point, and setup step (usually just an API key or U
 | Use-case | Invokes |
 |----------|---------------|
 | "Start from scratch" | **Image Generation** |
-| "Use my uploaded photo(s)" | **Image Editing** |
+| "Use existing image(s)" | **Image Editing** |

-The agent decides automatically, but the distinction is simple:
+The agent decides which tool to use based on the context:

- If the user’s last message includes image(s), the LLM can choose the "edit" tool.
- "Editing" uses all uploaded images as direct references for image generation
- Otherwise, only image generation is possible
-    - Image generation is always an option in either scenario.
+- **Image Generation** creates brand new images from text descriptions only
+- **Image Editing** modifies or remixes existing images using their image IDs
+  - These can be images from the current message or previously generated/referenced images
+  - The LLM keeps track of image IDs as long as they remain in the context window
+  - The tool output includes the referenced image IDs
+- Both tools are always available, but the LLM will choose the appropriate one based on the user's request
+- Both tools will include the generated image ID in the tool output

 ⚠️ **Important**
- Only the images attached to **the current user message** are sent to OpenAI for editing.  
- if the "Resend files" model parameter is toggled, previously uploaded images will stay in scope as part of the regular chat request 
-  - However, the "Resend files" model parameter does not affect files for "image editing."
-  - Motivation: this is to prevent the model from trying to edit images that are no longer relevant to the current context, which could lead to unexpected results.
-  - You can easily attach previously uploaded images from the side panel without needing to upload them again.
+- Image editing relies on image IDs, which are retained in the chat history.
+- When files are uploaded to the current request, their image IDs are added to the context of the LLM before any tokens are generated.
+- Previously referenced or generated image IDs can be used for editing, as long as they remain within the context window.
+- You can include any relevant image IDs in the `image_ids` array when calling the image editing tool.
+- You can also attach previously uploaded images from the side panel without needing to upload them again.
+  - Doing so also gives a vision model the image context, which can help inform the `prompt` for the image editing tool.

 ### Parameters

@@ -70,8 +75,7 @@ The agent decides automatically, but the distinction is simple:

 #### Image Editing

-Note: The image editing tool is only available if the user has uploaded images in the current message.
-
+• **image_ids** – array of image IDs to use as references for editing (required)  
 • **prompt** – your description of the changes (required)  
 • **size** – `auto` (default), `1024x1024`, `1536x1024`, `1024x1536`, `256x256`, or `512x512`  
 • **quality** – `auto` (default), `high`, `medium`, or `low`
@@ -94,8 +98,7 @@ You can customize the tool descriptions and prompt guidance by setting these env

 ```bash
 # Image Generation Tool Descriptions
-IMAGE_GEN_OAI_DESCRIPTION_WITH_FILES=...
-IMAGE_GEN_OAI_DESCRIPTION_NO_FILES=...
+IMAGE_GEN_OAI_DESCRIPTION=...
 IMAGE_GEN_OAI_PROMPT_DESCRIPTION=...

 # Image Editing Tool Descriptions
@@ -260,11 +263,12 @@ mcpServers:
 All generated images are:
 1. Saved according to the configured [**`fileStrategy`**](/docs/configuration/librechat_yaml/object_structure/config#filestrategy)
 2. Displayed directly in the chat interface
-3. Image Outputs are directly sent to the LLM as part of the immediate chat context following generation.
+3. Sent directly to the LLM as image tool outputs, as part of the immediate chat context following generation.
  - This may create issues if you are using an LLM that does not support image inputs.
  - There will be an option to disable this behavior on a per-agent-basis in the future.
-  - The outputs are only directly sent to the LLM upon generation, not on every message.
+  - These outputs are only directly sent to the LLM upon generation, not on every message.
  - To include the image in the chat, you can directly attach it to the message from the side panel.
+  - To summarize, the LLM only receives vision context from images attached to user messages; generated or edited images are visible to it only immediately after generation.

 ---

@@ -300,4 +304,3 @@ Though you can customize the prompts for [OpenAI Image Tools](#advanced-configur
 Example:  

 > A cinematic photo of an antique library bathed in warm afternoon sunlight. Tall wooden shelves overflow with leather-bound books, and dust particles shimmer in the light. A single green-shaded banker's lamp illuminates an open atlas on a polished mahogany desk in the foreground. 85 mm lens, shallow depth of field, rich amber tones, ultra-high detail.
-
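
The Image Editing parameters documented in this change (`image_ids`, `prompt`, `size`, `quality`) can be illustrated with a minimal sketch of how a caller might assemble and check the tool arguments. This is not LibreChat's implementation: the helper name `build_edit_args`, the rejection of an empty `image_ids` list, and the sample image ID are assumptions made purely for illustration. Only the parameter names, the allowed values, and the defaults come from the documentation above.

```python
# Hypothetical helper: assembles arguments for the image editing tool
# using the parameters listed in the documentation. Not part of LibreChat.

ALLOWED_SIZES = {"auto", "1024x1024", "1536x1024", "1024x1536", "256x256", "512x512"}
ALLOWED_QUALITIES = {"auto", "high", "medium", "low"}


def build_edit_args(image_ids, prompt, size="auto", quality="auto"):
    """Return a dict of image editing arguments, or raise ValueError."""
    # The docs mark image_ids as required; treating an empty list as
    # invalid is an assumption of this sketch.
    if not image_ids:
        raise ValueError("image_ids is required")
    # The docs mark prompt as required.
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size: {size}")
    if quality not in ALLOWED_QUALITIES:
        raise ValueError(f"unsupported quality: {quality}")
    return {
        "image_ids": list(image_ids),
        "prompt": prompt,
        "size": size,
        "quality": quality,
    }


# Example usage with a made-up image ID.
example_args = build_edit_args(["example-image-id"], "Replace the sky with a sunset")
print(example_args)
```

Because image IDs are plain strings carried in the chat history, an ID from an earlier generation can be passed in the same list as one from the current upload, provided it is still within the context window.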