mirror of
https://github.com/LibreChat-AI/librechat.ai.git
synced 2026-03-27 10:48:32 +07:00
docs: Update image generation and editing documentation for clarity on context and image ID usage
@@ -19,6 +19,7 @@ Each has its own look, price-point, and setup step (usually just an API key or U
**Notes:**
- Image Outputs are directly sent to the LLM as part of the immediate chat context following generation.
- The LLM will only get vision context from images attached to user messages, and not from generations/edits, except for immediately after generation.
- See [Image Storage and Handling](#image-storage-and-handling) for more details.
- API keys can be omitted in favor of allowing the user to enter their own key from the UI.
- Azure OpenAI does not yet support the latest OpenAI GPT-Image-1.
@@ -44,21 +45,25 @@ Each has its own look, price-point, and setup step (usually just an API key or U
| Use-case | Invokes |
|----------|---------------|
| "Start from scratch" | **Image Generation** |
| "Use my uploaded photo(s)" | **Image Editing** |
| "Use existing image(s)" | **Image Editing** |

The agent decides automatically, but the distinction is simple:
The agent decides which tool to use based on the context:

- If the user’s last message includes image(s), the LLM can choose the "edit" tool.
- "Editing" uses all uploaded images as direct references for image generation
- Otherwise, only image generation is possible
- Image generation is always an option in either scenario.
- **Image Generation** creates brand new images from text descriptions only
- **Image Editing** modifies or remixes existing images using their image IDs
- These can be images from the current message or previously generated/referenced images
- The LLM keeps track of image IDs as long as they remain in the context window
- Includes the referenced image IDs in the tool output
- Both tools are always available, but the LLM will choose the appropriate one based on the user's request
- Both tools will include the generated image ID in the tool output

⚠️ **Important**
- Only the images attached to **the current user message** are sent to OpenAI for editing.
- If the "Resend files" model parameter is toggled, previously uploaded images will stay in scope as part of the regular chat request
- However, the "Resend files" model parameter does not affect files for "image editing."
- Motivation: this is to prevent the model from trying to edit images that are no longer relevant to the current context, which could lead to unexpected results.
- You can easily attach previously uploaded images from the side panel without needing to upload them again.
- Image editing relies on image IDs, which are retained in the chat history.
- When files are uploaded to the current request, their image IDs are added to the context of the LLM before any tokens are generated.
- Previously referenced or generated image IDs can be used for editing, as long as they remain within the context window.
- You can include any relevant image IDs in the `image_ids` array when calling the image editing tool.
- You can also attach previously uploaded images from the side panel without needing to upload them again.
- This also has the added benefit of providing a vision model with the image context, which can be useful for informing the `prompt` for the image editing tool.
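
As a rough illustration of the ID-based flow in the bullets above, the sketch below assembles arguments for an image editing tool call. Only the `image_ids` and `prompt` field names come from the documentation; the helper name `build_edit_args`, the example IDs, and the step of filtering against IDs still in the context window are assumptions made for illustration, not LibreChat's actual implementation.

```python
from typing import Iterable


def build_edit_args(prompt: str,
                    requested_ids: Iterable[str],
                    ids_in_context: Iterable[str]) -> dict:
    """Hypothetical helper: build arguments for an image editing tool call.

    Keeps only the requested image IDs that are still present in the
    context window, preserving order and dropping duplicates.
    """
    available = set(ids_in_context)
    image_ids = []
    for image_id in requested_ids:
        if image_id in available and image_id not in image_ids:
            image_ids.append(image_id)
    if not image_ids:
        raise ValueError("image editing needs at least one image ID still in context")
    if not prompt.strip():
        raise ValueError("prompt is required")
    return {"image_ids": image_ids, "prompt": prompt}


# The IDs below are made up for this sketch.
args = build_edit_args(
    prompt="Replace the background with a sunset sky",
    requested_ids=["img_uploaded_1", "img_generated_7", "img_expired_3"],
    ids_in_context=["img_generated_7", "img_uploaded_1"],
)
print(args)
```

An ID that has dropped out of the context window (here `img_expired_3`) is simply left out, which mirrors the "as long as they remain within the context window" condition.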
### Parameters
@@ -70,8 +75,7 @@ The agent decides automatically, but the distinction is simple:
#### Image Editing

Note: The image editing tool is only available if the user has uploaded images in the current message.

• **image_ids** – array of image IDs to use as reference for editing (required)
• **prompt** – your description of the changes (required)
• **size** – `auto` (default), `1024x1024`, `1536x1024`, `1024x1536`, `256x256`, or `512x512`
• **quality** – `auto` (default), `high`, `medium`, or `low`
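
The parameter list above can be read as a small validation rule set. The following is a minimal sketch under stated assumptions: the allowed values and defaults are taken from the list, while the function name `validate_edit_params` and the requirement that `image_ids` be a non-empty list are hypothetical and not confirmed by the docs.

```python
ALLOWED_SIZES = {"auto", "1024x1024", "1536x1024", "1024x1536", "256x256", "512x512"}
ALLOWED_QUALITIES = {"auto", "high", "medium", "low"}


def validate_edit_params(params: dict) -> dict:
    """Hypothetical validator: check required fields and fill documented defaults."""
    image_ids = params.get("image_ids")
    # Assumption: "required" is treated here as "a non-empty list".
    if not isinstance(image_ids, list) or not image_ids:
        raise ValueError("image_ids is required and must be a non-empty list")

    prompt = params.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt is required")

    size = params.get("size", "auto")        # documented default: auto
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size: {size}")

    quality = params.get("quality", "auto")  # documented default: auto
    if quality not in ALLOWED_QUALITIES:
        raise ValueError(f"unsupported quality: {quality}")

    return {"image_ids": list(image_ids), "prompt": prompt,
            "size": size, "quality": quality}


checked = validate_edit_params({"image_ids": ["img_1"], "prompt": "Add a red scarf"})
print(checked)
```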
@@ -94,8 +98,7 @@ You can customize the tool descriptions and prompt guidance by setting these env
```bash
# Image Generation Tool Descriptions
IMAGE_GEN_OAI_DESCRIPTION_WITH_FILES=...
IMAGE_GEN_OAI_DESCRIPTION_NO_FILES=...
IMAGE_GEN_OAI_DESCRIPTION=...
IMAGE_GEN_OAI_PROMPT_DESCRIPTION=...

# Image Editing Tool Descriptions
@@ -260,11 +263,12 @@ mcpServers:
All generated images are:
1. Saved according to the configured [**`fileStrategy`**](/docs/configuration/librechat_yaml/object_structure/config#filestrategy)
2. Displayed directly in the chat interface
3. Image Outputs are directly sent to the LLM as part of the immediate chat context following generation.
3. Image tool outputs are directly sent to the LLM as part of the immediate chat context following generation.
- This may create issues if you are using an LLM that does not support image inputs.
- There will be an option to disable this behavior on a per-agent basis in the future.
- The outputs are only directly sent to the LLM upon generation, not on every message.
- These outputs are only directly sent to the LLM upon generation, not on every message.
- To include the image in the chat, you can directly attach it to the message from the side panel.
- To summarize, the LLM will only get vision context from images attached to user messages, and not from generations/edits, except for immediately after generation.
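
The summary rule above can be modeled in a few lines. This is a simplified sketch, not LibreChat code: the `Message` type and `visible_image_ids` function are hypothetical, "immediately after generation" is approximated as "the tool output is the most recent message", and the "Resend files" setting is ignored.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    role: str  # "user", "assistant", or "tool"
    image_ids: List[str] = field(default_factory=list)


def visible_image_ids(history: List[Message]) -> List[str]:
    """Return the image IDs the LLM receives as vision input on the next call.

    Simplified model: user attachments count, tool outputs count only
    when they are the most recent message (right after generation).
    """
    visible = []
    last_index = len(history) - 1
    for index, message in enumerate(history):
        if message.role == "user":
            visible.extend(message.image_ids)
        elif message.role == "tool" and index == last_index:
            visible.extend(message.image_ids)
    return visible


history = [
    Message("user", ["photo_a"]),
    Message("tool", ["gen_1"]),   # earlier generation: no longer sent as vision input
    Message("assistant"),
    Message("user"),
    Message("tool", ["gen_2"]),   # just generated: sent to the LLM once
]
print(visible_image_ids(history))  # ['photo_a', 'gen_2']
```

To bring `gen_1` back into vision context under this model, it would have to be attached to a user message, which matches the side-panel advice above.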
---
@@ -300,4 +304,3 @@ Though you can customize the prompts for [OpenAI Image Tools](#advanced-configur
Example:
> A cinematic photo of an antique library bathed in warm afternoon sunlight. Tall wooden shelves overflow with leather-bound books, and dust particles shimmer in the light. A single green-shaded banker's lamp illuminates an open atlas on a polished mahogany desk in the foreground. 85 mm lens, shallow depth of field, rich amber tones, ultra-high detail.