---
title: Image Generation & Editing
description: Comprehensive guide to LibreChat's built-in image generation and editing tools
---

## 🎨 Image Generation & Editing

LibreChat comes with **built-in image tools** that you can add to an **[Agent](/docs/features/agents)**.

Each tool has its own look, price point, and setup step (usually just an API key or URL).

| Tool | Best for | Needs |
|------|----------|-------|
| **OpenAI Image Tools** | Cutting-edge results (GPT-Image-1).<br/>Can also ***edit*** the images you upload. | OpenAI API |
| **Gemini Image Tools** | Google's latest image models with context-aware generation. | Gemini API or Vertex AI |
| **DALL·E (3 / 2)** | Legacy OpenAI image models. | OpenAI API |
| **Stable Diffusion** | Local or self-hosted generation, endless community models. | Automatic1111 API |
| **Flux** | Fast cloud renders, optional fine-tunes. | Flux API |
| **MCP** | Bring your own image generators. | MCP server with image output support |

**Notes:**
- API keys can be omitted in favor of allowing the user to enter their own key from the UI.
- Image outputs are sent directly to the LLM as part of the immediate chat context following generation.
- The LLM only receives vision context from images attached to user messages, not from generations/edits, except immediately after generation.
- See [Image Storage and Handling](#image-storage-and-handling) for more details.
- MCP server tools can also output images, which are handled similarly to LibreChat's built-in image tools.
- Note: MCP servers may or may not use the correct format when outputting images. See details in the [MCP section below](#6--model-context-protocol-mcp).

---

## 1 · OpenAI Image Tools (recommended)

### Features

"OpenAI Image Tools" is an agent toolkit made up of two separate tools.

- **Image Generation**:
  - **Create** brand-new images from text prompts (no upload required).
- **Image Editing**:
  - **Edit** or **remix** the images you just uploaded: change colours, add objects, extend the canvas, etc.
- Both use OpenAI's latest image generation model, **GPT-Image-1**, for superior instruction following, text rendering, detailed editing, and real-world knowledge.
- See OpenAI's [Image Generation documentation](https://platform.openai.com/docs/guides/image-generation?image-generation-model=gpt-image-1) for more details.

#### Generation vs. Editing

| Use-case | Invokes |
|----------|---------------|
| "Start from scratch" | **Image Generation** |
| "Use existing image(s)" | **Image Editing** |

The agent decides which tool to use based on the context:

- **Image Generation** creates brand new images from text descriptions only
- **Image Editing** modifies or remixes existing images using their image IDs
  - These can be images from the current message or previously generated/referenced images
  - The LLM keeps track of image IDs as long as they remain in the context window
  - Includes the referenced image IDs in the tool output
- Both tools are always available, but the LLM will choose the appropriate one based on the user's request
- Both tools will include the generated image ID in the tool output

⚠️ **Important**
- Image editing relies on image IDs, which are retained in the chat history.
  - When files are uploaded with the current request, their image IDs are added to the LLM's context before any tokens are generated.
  - Previously referenced or generated image IDs can be used for editing, as long as they remain within the context window.
  - The LLM can include any relevant image IDs in the `image_ids` array when calling the image editing tool.
- You can also attach previously uploaded images from the side panel without needing to upload them again.
  - This also has the added benefit of providing a vision model with the image context, which can be useful for informing the `prompt` for the image editing tool.

### Parameters

#### Image Generation
• **prompt** – text description (required)
• **size** – `auto` (default), `1024x1024` (square), `1536x1024` (landscape), or `1024x1536` (portrait)
• **quality** – `auto` (default), `high`, `medium`, or `low`
• **background** – `auto` (default), `transparent`, or `opaque` (transparent requires PNG or WebP format)

#### Image Editing

• **image_ids** – array of image IDs to use as reference for editing (required)
• **prompt** – text description of the changes (required)
• **size** – `auto` (default), `1024x1024`, `1536x1024`, `1024x1536`, `256x256`, or `512x512`
• **quality** – `auto` (default), `high`, `medium`, or `low`

### Setup

Create or reuse an OpenAI key and add it to `.env`:

```bash
IMAGE_GEN_OAI_API_KEY=sk-...
# optional extras
IMAGE_GEN_OAI_BASEURL=https://...
```
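
To sanity-check the key before adding the tool to an agent, a quick request against the standard OpenAI API works. This is a minimal sketch; it assumes the default OpenAI endpoint rather than an Azure deployment:

```bash
# Optional: confirm the key is accepted by listing available models.
# Assumes the key is exported in your shell (or substitute it directly).
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer $IMAGE_GEN_OAI_API_KEY"
```
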
For Azure OpenAI deployments, you will first need access: https://aka.ms/oai/gptimage1access

Then, add your corresponding credentials to your `.env` file:

```bash
IMAGE_GEN_OAI_API_KEY=your-api-key
# optional extras
IMAGE_GEN_OAI_BASEURL=https://deploymentname.openai.azure.com/openai/deployments/gpt-image-1/
IMAGE_GEN_OAI_AZURE_API_VERSION=2025-04-01-preview
```

Then add "OpenAI Image Tools" to your Agent's *Tools* list.

### Advanced Configuration

You can customize the tool descriptions and prompt guidance by setting these environment variables:

```bash
# Image Generation Tool Descriptions
IMAGE_GEN_OAI_DESCRIPTION=...
IMAGE_GEN_OAI_PROMPT_DESCRIPTION=...

# Image Editing Tool Descriptions
IMAGE_EDIT_OAI_DESCRIPTION=...
IMAGE_EDIT_OAI_PROMPT_DESCRIPTION=...
```

### Pricing
See the [GPT-Image-1 pricing page](https://platform.openai.com/docs/models/gpt-image-1) and [Image Generation Documentation](https://platform.openai.com/docs/guides/image-generation?image-generation-model=gpt-image-1#cost-and-latency) for details on costs associated with image generation.

---

## 2 · Gemini Image Tools

Gemini Image Tools integrate Google's latest image generation models, supporting both text-to-image generation and image context-aware editing.

### Features

- **Text-to-Image Generation**: Create high-quality images from detailed text descriptions
- **Image Context Support**: Use existing images as context or inspiration for new generations
- **Image Editing**: Generate new images based on modifications to existing ones (include the original image ID)
- **Multiple Models**: Choose between `gemini-2.5-flash-image` (default) or `gemini-3-pro-image-preview`
- **Dual API Support**: Works with both simple Gemini API keys and Google Cloud Vertex AI

### Parameters

• **prompt** – Detailed text description of the desired image (required, up to 32,000 characters)
• **image_ids** – Optional array of image IDs to use as visual context for generation

### Setup

#### Option 1: Gemini API (Recommended)

Get an API key from [Google AI Studio](https://aistudio.google.com/app/apikey):

```bash
GEMINI_API_KEY=your_api_key_here
```
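
As an optional sanity check, you can confirm the key works by listing the models it has access to. This assumes the standard Gemini API endpoint, not Vertex AI:

```bash
# Optional: verify the Gemini key by listing available models.
curl "https://generativelanguage.googleapis.com/v1beta/models?key=$GEMINI_API_KEY"
```
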
#### Option 2: Vertex AI (Enterprise)

For Google Cloud users with Vertex AI access:

```bash
GOOGLE_SERVICE_KEY_FILE=/path/to/service-account.json
GOOGLE_CLOUD_LOCATION=us-central1 # optional, default: global
```

### Model Selection

```bash
# Default model (fast and efficient)
GEMINI_IMAGE_MODEL=gemini-2.5-flash-image

# Higher quality model
GEMINI_IMAGE_MODEL=gemini-3-pro-image-preview
```

### Advanced Configuration

Customize tool descriptions via environment variables:

```bash
GEMINI_IMAGE_GEN_DESCRIPTION=...
GEMINI_IMAGE_GEN_PROMPT_DESCRIPTION=...
GEMINI_IMAGE_IDS_DESCRIPTION=...
```

More details can be found in the dedicated [Gemini Image Gen guide](/docs/configuration/tools/gemini_image_gen).

---

## 3 · DALL·E (legacy)

DALL·E provides high-quality image generation using OpenAI's legacy image models.

### Parameters
• **prompt** – Text description of the desired image (required, up to 4000 characters)
• **style** – `vivid` (default; hyper-real, dramatic) or `natural` (less hyper-real)
• **quality** – `standard` (default) or `hd`
• **size** – `1024x1024` (default/square), `1792x1024` (wide), or `1024x1792` (tall)

### Setup

```bash
# Required
DALLE_API_KEY=sk-... # or DALLE3_API_KEY=sk-...

# Optional
DALLE_REVERSE_PROXY=https://... # Alternative endpoint
DALLE3_BASEURL=https://... # For Azure or custom endpoints
DALLE3_AZURE_API_VERSION=2023-12-01-preview # For Azure deployments
DALLE3_SYSTEM_PROMPT=... # Custom system prompt for DALL·E
```

### Advanced Configuration
For Azure OpenAI deployments, configure both the base URL and API version:

```bash
DALLE3_BASEURL=https://your-resource-name.openai.azure.com/openai/deployments/your-deployment-name
DALLE3_AZURE_API_VERSION=2023-12-01-preview
DALLE3_API_KEY=your-azure-api-key
```

Enable the **DALL·E** tool for the Agent and start prompting.

### Pricing
See the [DALL·E pricing page](https://platform.openai.com/docs/models/dall-e-3) and [Image Generation Documentation](https://platform.openai.com/docs/guides/image-generation?image-generation-model=dall-e-3) for details on costs associated with image generation.

---

## 4 · Stable Diffusion (local)

Generate images entirely on your own machine or server.
Point LibreChat at any Automatic1111 (or compatible) endpoint and you're set.

### Parameters
• **prompt** – Detailed keywords describing desired elements in the image (required)
• **negative_prompt** – Keywords describing elements to exclude from the image (required)

The Stable Diffusion implementation uses these default parameters:
- cfg_scale: 4.5
- steps: 22
- width: 1024
- height: 1024

These values are currently fixed but provide good results for most use cases.

### Setup

```bash
SD_WEBUI_URL=http://127.0.0.1:7860 # URL to your Automatic1111 WebUI
```

No API key required, just a reachable URL.
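
One common gotcha: the Automatic1111 WebUI only exposes its HTTP API when started with the `--api` flag. A rough sketch of the launch and a quick reachability check (exact commands depend on how you installed Automatic1111):

```bash
# Start Automatic1111 with its API enabled
# (add --listen if LibreChat runs on a different host)
./webui.sh --api --listen

# Quick reachability check from the LibreChat host
curl http://127.0.0.1:7860/sdapi/v1/sd-models
```
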
More details on setting up Automatic1111 can be found in the dedicated [Stable Diffusion guide](/docs/configuration/tools/stable_diffusion).

---

## 5 · Flux

Cloud generator with an emphasis on speed and optional fine-tuned models.

### Features
- Fast cloud-based image generation
- Support for fine-tuned models
- Multiple quality levels and aspect ratios
- Raw mode for less processed, more natural-looking images

### Parameters
The Flux tool supports three main actions:

1. **generate** - Create a new image from a text prompt
2. **generate_finetuned** - Create an image using a fine-tuned model
3. **list_finetunes** - List available custom models for the user

More details can be found in the dedicated [Flux guide](/docs/configuration/tools/flux#parameters).

### Setup

```bash
FLUX_API_KEY=flux_live_...
FLUX_API_BASE_URL=https://api.us1.bfl.ai # default is fine for most users
```

Choose the **Flux** tool inside the Agent. Prompts are plain text; one call produces one image.

### Pricing
See the [Flux pricing page](https://docs.bfl.ml/pricing/) for details on costs associated with image generation.

---

## 6 · Model Context Protocol (MCP)

Image outputs are supported from MCP servers.

For example, the [Puppeteer MCP Server](https://github.com/modelcontextprotocol/servers/tree/main/src/puppeteer) can be used to generate screenshots of web pages; it outputs images in the expected format, so they are treated the same as images from LibreChat's built-in tools.

> The examples below assume LibreChat is running outside of Docker, directly using Node.js. The Model Context Protocol is a relatively new framework, and many developers are still learning how to properly serve their systems with uv/node for scalable distribution.

> As this technology is still emerging, there are currently few image-generating servers available, and many existing ones have yet to adopt the correct response format for images.

> While many MCP servers do function well within Docker, the following examples do not, or require more advanced configuration, which illustrates the current inconsistency between MCP servers.

```yaml
mcpServers:
  puppeteer:
    command: npx
    args:
      - -y
      - "@modelcontextprotocol/server-puppeteer"
```

The following is an example of an [Image Generation server](https://github.com/GongRzhe/Image-Generation-MCP-Server) that generates images using the [Replicate API](https://replicate.com/account/api-tokens) but returns image URLs, which does not conform to MCP's image response standard.

> Note: for this particular server, you need to install the `@gongrzhe/image-gen-server` package globally using npm, i.e. `npm install -g @gongrzhe/image-gen-server`, then point to the package's compiled files as shown below.
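
For convenience, the two commands referenced in the note above:

```bash
# Install the server package globally
npm install -g @gongrzhe/image-gen-server

# Print the global node_modules path to use in the args below
npm root -g
```
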
```yaml
mcpServers:
  image-gen:
    command: "node"
    # First, install the package globally using npm:
    # `npm install -g @gongrzhe/image-gen-server`
    # Then, point to the location of the installed package,
    # which you can find by running `npm root -g`
    args:
      - "{REPLACE_WITH_NODE_MODULES_LOCATION}/@gongrzhe/image-gen-server/build/index.js"
      # Example with output from `npm root -g`:
      # - "/home/danny/.nvm/versions/node/v20.19.0/lib/node_modules/@gongrzhe/image-gen-server/build/index.js"
    env:
      # Do not hardcode the API token here, use the environment variable instead
      # The following will pick up the token from your .env file or environment
      REPLICATE_API_TOKEN: "${REPLICATE_API_TOKEN}"
      MODEL: "google/imagen-3"
```

---

## Image Storage and Handling

All generated images are:
1. Saved according to the configured [**`fileStrategy`**](/docs/configuration/librechat_yaml/object_structure/config#filestrategy)
2. Displayed directly in the chat interface
3. Sent directly to the LLM as part of the immediate chat context following generation.
   - This may create issues if you are using an LLM that does not support image inputs.
   - There will be an option to disable this behavior on a per-agent basis in the future.
   - These outputs are only sent directly to the LLM upon generation, not on every message.
   - To include an image in the chat, you can attach it directly to the message from the side panel.
   - To summarize, the LLM only receives vision context from images attached to user messages, not from generations/edits, except immediately after generation.

---

## Proxy Support

All image generation tools support proxy configuration through the `PROXY` environment variable:

```bash
PROXY=http://proxy-url:port
```
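
If your proxy requires authentication, the usual URL form with embedded credentials should work; treat the example below as an assumption to verify against your deployment (host, port, and credentials are placeholders):

```bash
# Hypothetical example: proxy with basic-auth credentials embedded in the URL
PROXY=http://username:password@proxy.example.com:3128
```
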
## Error Handling
If any of the tools encounter an error, they will return an error message explaining what went wrong. Common issues include:
- Invalid API key
- API unavailability
- Content policy violations
- Proxy/network issues
- Invalid parameters
- Unsupported image payload (see [Image Storage and Handling](#image-storage-and-handling) above)

---

## Prompting

Though you can customize the prompts for [OpenAI Image Tools](#advanced-configuration) and [DALL·E](#advanced-configuration-2), the following tips inform the default prompts supplied by the tools and are helpful to know for your own prompting.

1. Start with the **subject** and **style** (photo, oil painting, etc.).
2. Add **composition** and **camera / medium** ("wide-angle shot of…", "watercolour…").
3. Mention **lighting & mood** ("golden hour", "dramatic shadows").
4. Finish with **detail keywords** (textures, colours, expressions).
5. Keep negatives positive: describe what should be included, not what to avoid.

Example:

> A cinematic photo of an antique library bathed in warm afternoon sunlight. Tall wooden shelves overflow with leather-bound books, and dust particles shimmer in the light. A single green-shaded banker's lamp illuminates an open atlas on a polished mahogany desk in the foreground. 85 mm lens, shallow depth of field, rich amber tones, ultra-high detail.