Files
librechat.ai/pages/docs/features/image_gen.mdx
Joseph Licata 2bab1c9cd2 docs: Add Gemini Image Generation tool documentation (#452)
* docs: Add Gemini Image Generation tool documentation

* docs: Add Gemini Image Tools to image_gen features page

* fix: Update default model to gemini-2.5-flash-image
2026-01-03 12:48:43 -05:00

377 lines
15 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
title: Image Generation & Editing
description: Comprehensive guide to LibreChat's built-in image generation and editing tools
---
## 🎨 Image Generation & Editing
LibreChat comes with **built-in image tools** that you can add to an **[Agent](/docs/features/agents).**
Each has its own look, price-point, and setup step (usually just an API key or URL).
| Tool | Best for | Needs |
|------|----------|-------|
| **OpenAI Image Tools** | Cutting-edge results (GPT-Image-1).<br/>Can also ***edit*** the images you upload. | OpenAI API |
| **Gemini Image Tools** | Google's latest image models with context-aware generation. | Gemini API or Vertex AI |
| **DALL·E (3 / 2)** | Legacy OpenAI Image models. | OpenAI API |
| **Stable Diffusion** | Local or self-hosted generation, endless community models. | Automatic1111 API |
| **Flux** | Fast cloud renders, optional fine-tunes. | Flux API |
| **MCP** | Bring-your-own-Image-Generators | MCP server with image output support |
**Notes:**
- API keys can be omitted in favor of allowing the user to enter their own key from the UI.
- Image Outputs are directly sent to the LLM as part of the immediate chat context following generation.
- The LLM will only get vision context from images attached to user messages, and not from generations/edits, except for immediately after generation.
- See [Image Storage and Handling](#image-storage-and-handling) for more details.
- MCP Server tool image outputs are supported, which may output images similarly to LC's built-in tools.
- Note: MCP servers may or may not use the correct format when outputting images. See details in the [MCP section below](#5--model-context-protocol-mcp).
---
## 1 · OpenAI Image Tools (recommended)
### Features
"OpenAI Image Tools" are an agent toolkit made up of 2 separate tools.
- **Image Generation**:
- **Create** brand-new images from text prompts (no upload required).
- **Image Editing**:
- **Edit** or **remix** the images you just uploaded—change colours, add objects, extend the canvas, etc.
- Both use OpenAI's latest image generation model, **GPT-Image-1**, for superior instruction following, text rendering, detailed editing, real-world knowledge
- See OpenAI's [Image Generation documentation](https://platform.openai.com/docs/guides/image-generation?image-generation-model=gpt-image-1) for more details.
#### Generation vs. Editing
| Use-case | Invokes |
|----------|---------------|
| "Start from scratch" | **Image Generation** |
| "Use existing image(s)" | **Image Editing** |
The agent decides which tool to use based on the context:
- **Image Generation** creates brand new images from text descriptions only
- **Image Editing** modifies or remixes existing images using their image IDs
- These can be images from the current message or previously generated/referenced images
- The LLM keeps track of image IDs as long as they remain in the context window
- Includes the referenced image IDs in the tool output
- Both tools are always available, but the LLM will choose the appropriate one based on the user's request
- Both tools will include the generated image ID in the tool output
⚠️ **Important**
- Image editing relies on image IDs, which are retained in the chat history.
- When files are uploaded to the current request, their image IDs are added to the context of the LLM before any tokens are generated.
- Previously referenced or generated image IDs can be used for editing, as long as they remain within the context window.
- The LLM can include any relevant image IDs in the `image_ids` array when calling the image editing tool.
- You can also attach previously uploaded images from the side panel without needing to upload them again.
- This also has the added benefit of providing a vision model with the image context, which can be useful for informing the `prompt` for the image editing tool.
### Parameters
#### Image Generation
• **prompt** text description (required)
• **size** `auto` (default), `1024x1024` (square), `1536x1024` (landscape), or `1024x1536` (portrait)
• **quality** `auto` (default), `high`, `medium`, or `low`
• **background** `auto` (default), `transparent`, or `opaque` (transparent requires PNG or WebP format)
#### Image Editing
• **image_ids** array of image IDs to use as reference for editing (required)
• **prompt** text description of the changes (required)
• **size** `auto` (default), `1024x1024`, `1536x1024`, `1024x1536`, `256x256`, or `512x512`
• **quality** `auto` (default), `high`, `medium`, or `low`
### Setup
Create or reuse an OpenAI key and add to `.env`:
```bash
IMAGE_GEN_OAI_API_KEY=sk-...
# optional extras
IMAGE_GEN_OAI_BASEURL=https://...
```
For Azure OpenAI deployments, you will first need access: https://aka.ms/oai/gptimage1access
Then, add your corresponding credentials to your `.env` file:
```bash
IMAGE_GEN_OAI_API_KEY=your-api-key
# optional extras
IMAGE_GEN_OAI_BASEURL=https://deploymentname.openai.azure.com/openai/deployments/gpt-image-1/
IMAGE_GEN_OAI_AZURE_API_VERSION=2025-04-01-preview
```
Then add "OpenAI Image Tools" to your Agent's *Tools* list.
### Advanced Configuration
You can customize the tool descriptions and prompt guidance by setting these environment variables:
```bash
# Image Generation Tool Descriptions
IMAGE_GEN_OAI_DESCRIPTION=...
IMAGE_GEN_OAI_PROMPT_DESCRIPTION=...
# Image Editing Tool Descriptions
IMAGE_EDIT_OAI_DESCRIPTION=...
IMAGE_EDIT_OAI_PROMPT_DESCRIPTION=...
```
### Pricing
See the [GPT-Image-1 pricing page](https://platform.openai.com/docs/models/gpt-image-1) and [Image Generation Documentation](https://platform.openai.com/docs/guides/image-generation?image-generation-model=gpt-image-1#cost-and-latency) for details on costs associated with image generation.
---
## 2 · Gemini Image Tools
Gemini Image Tools integrate Google's latest image generation models, supporting both text-to-image generation and image context-aware editing.
### Features
- **Text-to-Image Generation**: Create high-quality images from detailed text descriptions
- **Image Context Support**: Use existing images as context or inspiration for new generations
- **Image Editing**: Generate new images based on modifications to existing ones (include original image ID)
- **Multiple Models**: Choose between `gemini-2.5-flash-image` (default) or `gemini-3-pro-image-preview`
- **Dual API Support**: Works with both simple Gemini API keys and Google Cloud Vertex AI
### Parameters
• **prompt** Detailed text description of the desired image (required, up to 32,000 characters)
• **image_ids** Optional array of image IDs to use as visual context for generation
### Setup
#### Option 1: Gemini API (Recommended)
Get an API key from [Google AI Studio](https://aistudio.google.com/app/apikey):
```bash
GEMINI_API_KEY=your_api_key_here
```
#### Option 2: Vertex AI (Enterprise)
For Google Cloud users with Vertex AI access:
```bash
GOOGLE_SERVICE_KEY_FILE=/path/to/service-account.json
GOOGLE_CLOUD_LOCATION=us-central1 # optional, default: global
```
### Model Selection
```bash
# Default model (fast and efficient)
GEMINI_IMAGE_MODEL=gemini-2.5-flash-image
# Higher quality model
GEMINI_IMAGE_MODEL=gemini-3-pro-image-preview
```
### Advanced Configuration
Customize tool descriptions via environment variables:
```bash
GEMINI_IMAGE_GEN_DESCRIPTION=...
GEMINI_IMAGE_GEN_PROMPT_DESCRIPTION=...
GEMINI_IMAGE_IDS_DESCRIPTION=...
```
More details can be found in the dedicated [Gemini Image Gen guide](/docs/configuration/tools/gemini_image_gen).
---
## 3 · DALL·E (legacy)
DALL·E provides high-quality image generation using OpenAI's legacy image models.
### Parameters
• **prompt** Text description of the desired image (required, up to 4000 characters)
• **style** `vivid` (hyper-real, dramatic - default) or `natural` (less hyper-real)
• **quality** `standard` (default) or `hd`
• **size** `1024x1024` (default/square), `1792x1024` (wide), or `1024x1792` (tall)
### Setup
```bash
# Required
DALLE_API_KEY=sk-... # or DALLE3_API_KEY=sk-...
# Optional
DALLE_REVERSE_PROXY=https://... # Alternative endpoint
DALLE3_BASEURL=https://... # For Azure or custom endpoints
DALLE3_AZURE_API_VERSION=2023-12-01-preview # For Azure deployments
DALLE3_SYSTEM_PROMPT=... # Custom system prompt for DALL·E
```
### Advanced Configuration
For Azure OpenAI deployments, configure both the base URL and API version:
```bash
DALLE3_BASEURL=https://your-resource-name.openai.azure.com/openai/deployments/your-deployment-name
DALLE3_AZURE_API_VERSION=2023-12-01-preview
DALLE3_API_KEY=your-azure-api-key
```
Enable the **DALL·E** tool for the Agent and start prompting.
### Pricing
See the [DALL-E pricing page](https://platform.openai.com/docs/models/dall-e-3) and [Image Generation Documentation](https://platform.openai.com/docs/guides/image-generation?image-generation-model=dall-e-3) for details on costs associated with image generation.
---
## 4 · Stable Diffusion (local)
Run images entirely on your own machine or server.
Point LibreChat at any Automatic1111 (or compatible) endpoint and you're set.
### Parameters
• **prompt** Detailed keywords describing desired elements in the image (required)
• **negative_prompt** Keywords describing elements to exclude from the image (required)
The Stable Diffusion implementation uses these default parameters:
- cfg_scale: 4.5
- steps: 22
- width: 1024
- height: 1024
These values are currently fixed but provide good results for most use cases.
### Setup
```bash
SD_WEBUI_URL=http://127.0.0.1:7860 # URL to your Automatic1111 WebUI
```
No API key required—just the reachable URL.
More details on setting up Automatic1111 can be found in the dedicated [Stable Diffusion guide](/docs/configuration/tools/stable_diffusion).
---
## 5 · Flux
Cloud generator with an emphasis on speed and optional fine-tuned models.
### Features
- Fast cloud-based image generation
- Support for fine-tuned models
- Multiple quality levels and aspect ratios
- Raw mode for less processed, more natural-looking images
### Parameters
The Flux tool supports three main actions:
1. **generate** - Create a new image from a text prompt
2. **generate_finetuned** - Create an image using a fine-tuned model
3. **list_finetunes** - List available custom models for the user
More details can be found in the dedicated [Flux guide](/docs/configuration/tools/flux#parameters).
### Setup
```bash
FLUX_API_KEY=flux_live_...
FLUX_API_BASE_URL=https://api.us1.bfl.ai # default is fine for most users
```
Choose the **Flux** tool inside the Agent. Prompts are plain text; one call produces one image.
### Pricing
See the [Flux pricing page](https://docs.bfl.ml/pricing/) for details on costs associated with image generation.
---
## 6 · Model Context Protocol (MCP)
Image outputs are supported from MCP servers.
For example, the [Puppeteer MCP Server](https://github.com/modelcontextprotocol/servers/tree/main/src/puppeteer) can be used to generate screenshots of web pages, which correctly output the image in the expected format and is treated the same as LC's built-in image tools.
> The examples below assume LibreChat is running outside of Docker, directly using Node.js. The Model Context Protocol is a relatively new framework, and many developers are still learning how to properly serve their systems with uv/node for scalable distribution.
> As this technology is still emerging, there are currently few image-generating servers available, and many existing ones have yet to adopt the correct response format for images.
> While many MCP servers do function well within Docker, the following examples do not, or not without more advanced configurations, showcasing some of the current inconsistency between MCP servers.
```yaml
mcpServers:
puppeteer:
command: npx
args:
- -y
- "@modelcontextprotocol/server-puppeteer"
```
The following is an example of an [Image Generation server](https://github.com/GongRzhe/Image-Generation-MCP-Server) that outputs images using [Replicate API](https://replicate.com/account/api-tokens), but returns URLs of the images, which doesn't conform to MCP's image response standard.
> Note: for this particular server, you need to install the `@gongrzhe/image-gen-server` package globally using npm, i.e. `npm install -g @gongrzhe/image-gen-server`, then point to the package's compiled files as shown below.
```yaml
mcpServers:
image-gen:
command: "node"
# First, install the package globally using npm:
# `npm install -g @gongrzhe/image-gen-server`
# Then, point to the location of the installed package,
# which you can find by running `npm root -g`
args:
- "{REPLACE_WITH_NODE_MODULES_LOCATION}/@gongrzhe/image-gen-server/build/index.js"
# Example with output from `npm root -g`:
# - "/home/danny/.nvm/versions/node/v20.19.0/lib/node_modules/@gongrzhe/image-gen-server/build/index.js"
env:
# Do not hardcode the API token here, use the environment variable instead
# The following will pick up the token from your .env file or environment
REPLICATE_API_TOKEN: "${REPLICATE_API_TOKEN}"
MODEL: "google/imagen-3"
```
---
## Image Storage and Handling
All generated images are:
1. Saved according to the configured [**`fileStrategy`**](/docs/configuration/librechat_yaml/object_structure/config#filestrategy)
2. Displayed directly in the chat interface
3. Image tool outputs are directly sent to the LLM as part of the immediate chat context following generation.
- This may create issues if you are using an LLM that does not support image inputs.
- There will be an option to disable this behavior on a per-agent-basis in the future.
- These outputs are only directly sent to the LLM upon generation, not on every message.
- To include the image in the chat, you can directly attach it to the message from the side panel.
- To summarize, the LLM will only get vision context from images attached to user messages, and not from generations/edits, except for immediately after generation.
---
## Proxy Support
All image generation tools support proxy configuration through the `PROXY` environment variable:
```bash
PROXY=http://proxy-url:port
```
## Error Handling
If any of the tools encounter an error, they will return an error message explaining what went wrong. Common issues include:
- Invalid API key
- API unavailability
- Content policy violations
- Proxy/network issues
- Invalid parameters
- Unsupported image payload (see [Image Storage and Handling](#image-storage-and-handling) above)
---
## Prompting
Though you can customize the prompts for [OpenAI Image Tools](#advanced-configuration) and [DALL·E](#advanced-configuration-1), the following tips inform the default prompts supplied by the tools, which is helpful to know for your own writing/prompting.
1. Start with the **subject** and **style** (photo, oil painting, etc.).
2. Add **composition** and **camera / medium** ("wide-angle shot of…", "watercolour…").
3. Mention **lighting & mood** ("golden hour", "dramatic shadows").
4. Finish with **detail keywords** (textures, colours, expressions).
5. Keep negatives positive—describe what should be included, not what to avoid.
Example:
> A cinematic photo of an antique library bathed in warm afternoon sunlight. Tall wooden shelves overflow with leather-bound books, and dust particles shimmer in the light. A single green-shaded banker's lamp illuminates an open atlas on a polished mahogany desk in the foreground. 85 mm lens, shallow depth of field, rich amber tones, ultra-high detail.