docs: Update image generation and editing documentation for clarity on context and image ID usage

Danny Avila
2025-04-26 03:48:17 -04:00
parent ca672b846b
commit 968ed4e11a


@@ -19,6 +19,7 @@ Each has its own look, price-point, and setup step (usually just an API key or U
**Notes:**
- Image Outputs are directly sent to the LLM as part of the immediate chat context following generation.
- The LLM will only get vision context from images attached to user messages, and not from generations/edits, except for immediately after generation.
- See [Image Storage and Handling](#image-storage-and-handling) for more details.
- API keys can be omitted in favor of allowing the user to enter their own key from the UI.
- Azure OpenAI does not yet support the latest OpenAI GPT-Image-1.
@@ -44,21 +45,25 @@ Each has its own look, price-point, and setup step (usually just an API key or U
| Use-case | Invokes |
|----------|---------------|
| "Start from scratch" | **Image Generation** |
| "Use my uploaded photo(s)" | **Image Editing** |
| "Use existing image(s)" | **Image Editing** |
The agent decides automatically, but the distinction is simple:
The agent decides which tool to use based on the context:
- If the user's last message includes image(s), the LLM can choose the "edit" tool.
- "Editing" uses all uploaded images as direct references for image generation
- Otherwise, only image generation is possible
- Image generation is always an option in either scenario.
- **Image Generation** creates brand new images from text descriptions only
- **Image Editing** modifies or remixes existing images using their image IDs
- These can be images from the current message or previously generated/referenced images
- The LLM keeps track of image IDs as long as they remain in the context window
- Includes the referenced image IDs in the tool output
- Both tools are always available, but the LLM will choose the appropriate one based on the user's request
- Both tools will include the generated image ID in the tool output
⚠️ **Important**
- Only the images attached to **the current user message** are sent to OpenAI for editing.
- If the "Resend files" model parameter is toggled, previously uploaded images will stay in scope as part of the regular chat request
- However, the "Resend files" model parameter does not affect files for "image editing."
- Motivation: this is to prevent the model from trying to edit images that are no longer relevant to the current context, which could lead to unexpected results.
- You can easily attach previously uploaded images from the side panel without needing to upload them again.
- Image editing relies on image IDs, which are retained in the chat history.
- When files are uploaded to the current request, their image IDs are added to the context of the LLM before any tokens are generated.
- Previously referenced or generated image IDs can be used for editing, as long as they remain within the context window.
- You can include any relevant image IDs in the `image_ids` array when calling the image editing tool.
- You can also attach previously uploaded images from the side panel without needing to upload them again.
- This also has the added benefit of providing a vision model with the image context, which can be useful for informing the `prompt` for the image editing tool.
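As a rough illustration of the image ID behavior described above, the sketch below collects the image IDs that are still present in a context window and builds the arguments for an edit call. The `Message` class and the `collect_image_ids` and `build_edit_args` helpers are hypothetical names invented for this example; only the `image_ids` and `prompt` fields come from the documentation, and the real tool plumbing may look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical stand-in for one chat message in the context window."""
    role: str
    text: str
    image_ids: list = field(default_factory=list)


def collect_image_ids(context_window):
    """Return image IDs still in the context window, oldest first, without duplicates."""
    seen = []
    for message in context_window:
        for image_id in message.image_ids:
            if image_id not in seen:
                seen.append(image_id)
    return seen


def build_edit_args(prompt, image_ids, context_window):
    """Build edit-tool arguments, rejecting IDs that have left the context window."""
    available = collect_image_ids(context_window)
    missing = [image_id for image_id in image_ids if image_id not in available]
    if missing:
        raise ValueError(f"image IDs not in context: {missing}")
    return {"image_ids": list(image_ids), "prompt": prompt}


# Example: one uploaded image and one previously generated image (IDs are made up).
window = [
    Message("user", "Here is my photo", image_ids=["upload_1"]),
    Message("tool", "generated image", image_ids=["gen_1"]),
    Message("user", "Blend the two together"),
]
args = build_edit_args("Blend both images", ["upload_1", "gen_1"], window)
```

The check against `available` mirrors the point that an ID can only be used for editing while it remains within the context window.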
### Parameters
@@ -70,8 +75,7 @@ The agent decides automatically, but the distinction is simple:
#### Image Editing
Note: The image editing tool is only available if the user has uploaded images in the current message.
• **image_ids** array of image IDs to use as reference for editing (required)
• **prompt** your description of the changes (required)
• **size** `auto` (default), `1024x1024`, `1536x1024`, `1024x1536`, `256x256`, or `512x512`
• **quality** `auto` (default), `high`, `medium`, or `low`
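The parameter list above can be read as a small validation routine. The following is a minimal sketch, not the actual implementation: `normalize_edit_args` is a hypothetical helper, and treating an empty `image_ids` array as invalid is an assumption. The allowed `size` and `quality` values and their `auto` defaults are taken from the list above.

```python
# Allowed values copied from the parameter list; "auto" is the documented default.
ALLOWED_SIZES = {"auto", "1024x1024", "1536x1024", "1024x1536", "256x256", "512x512"}
ALLOWED_QUALITIES = {"auto", "high", "medium", "low"}


def normalize_edit_args(args):
    """Validate image-editing arguments and fill in defaults (hypothetical helper)."""
    image_ids = args.get("image_ids")
    # Assumption: an empty array is rejected, since the parameter is required.
    if not isinstance(image_ids, list) or not image_ids:
        raise ValueError("image_ids is required and must be a non-empty array")
    prompt = args.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt is required")
    size = args.get("size", "auto")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size: {size}")
    quality = args.get("quality", "auto")
    if quality not in ALLOWED_QUALITIES:
        raise ValueError(f"unsupported quality: {quality}")
    return {"image_ids": image_ids, "prompt": prompt, "size": size, "quality": quality}
```

Calling it with only the two required fields returns the same fields plus `size` and `quality` set to `auto`.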
@@ -94,8 +98,7 @@ You can customize the tool descriptions and prompt guidance by setting these env
```bash
# Image Generation Tool Descriptions
IMAGE_GEN_OAI_DESCRIPTION_WITH_FILES=...
IMAGE_GEN_OAI_DESCRIPTION_NO_FILES=...
IMAGE_GEN_OAI_DESCRIPTION=...
IMAGE_GEN_OAI_PROMPT_DESCRIPTION=...
# Image Editing Tool Descriptions
@@ -260,11 +263,12 @@ mcpServers:
All generated images are:
1. Saved according to the configured [**`fileStrategy`**](/docs/configuration/librechat_yaml/object_structure/config#filestrategy)
2. Displayed directly in the chat interface
3. Image Outputs are directly sent to the LLM as part of the immediate chat context following generation.
3. Image tool outputs are directly sent to the LLM as part of the immediate chat context following generation.
- This may create issues if you are using an LLM that does not support image inputs.
- There will be an option to disable this behavior on a per-agent basis in the future.
- The outputs are only directly sent to the LLM upon generation, not on every message.
- These outputs are only directly sent to the LLM upon generation, not on every message.
- To include the image in the chat, you can directly attach it to the message from the side panel.
- To summarize, the LLM will only get vision context from images attached to user messages, and not from generations/edits, except for immediately after generation.
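The vision-context rule summarized above can be expressed as a small predicate. This is a simplified sketch under stated assumptions: `ImageRef`, its `source` and `turn` fields, and the idea of numbering turns are all hypothetical and are not part of the documented behavior.

```python
from dataclasses import dataclass


@dataclass
class ImageRef:
    """Hypothetical record of an image that appeared in the chat."""
    image_id: str
    source: str  # "user_attachment" or "tool_output"
    turn: int    # assumed index of the turn in which the image appeared


def has_vision_context(image, current_turn):
    """Return True if the LLM would receive the image itself, not just its ID."""
    if image.source == "user_attachment":
        # Assumption: the attachment belongs to a user message sent with this request.
        return True
    # Generated or edited images are only sent immediately after generation.
    return image.source == "tool_output" and image.turn == current_turn


attached = ImageRef("upload_1", "user_attachment", turn=1)
generated = ImageRef("gen_1", "tool_output", turn=2)
```

Under these assumptions, `generated` carries vision context on turn 2 but not on turn 3, unless the user re-attaches it from the side panel, at which point it counts as a user attachment.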
---
@@ -300,4 +304,3 @@ Though you can customize the prompts for [OpenAI Image Tools](#advanced-configur
Example:
> A cinematic photo of an antique library bathed in warm afternoon sunlight. Tall wooden shelves overflow with leather-bound books, and dust particles shimmer in the light. A single green-shaded banker's lamp illuminates an open atlas on a polished mahogany desk in the foreground. 85 mm lens, shallow depth of field, rich amber tones, ultra-high detail.