---
title: Image Generation & Editing
icon: Image
description: Comprehensive guide to LibreChat's built-in image generation and editing tools
---
## Quick Start
Get image generation working in under 5 minutes with OpenAI Image Tools (recommended).
<Steps>
<Step>
### Create an Agent
Go to the **Agents** panel in LibreChat and create a new agent. Give it a name like "Image Creator".
</Step>
<Step>
### Add OpenAI Image Tools
In the agent's **Tools** list, select **OpenAI Image Tools**. This adds both image generation and image editing capabilities.
</Step>
<Step>
### Set Your API Key
Add the following to your `.env` file:
```bash filename=".env"
IMAGE_GEN_OAI_API_KEY=sk-your-openai-api-key
```
</Step>
<Step>
### Restart and Test
| Deployment | Command |
|------------|---------|
| Docker | `docker compose down && docker compose up -d` |
| Local | Stop (Ctrl+C) then `npm run backend` |
Send a message like "Generate an image of a sunset over mountains" to your agent.
</Step>
</Steps>
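Before restarting, you can optionally confirm the key actually landed in your `.env`. A quick check, assuming you run it from the LibreChat root directory:

```shell
# Prints the matching line if the key is present, or a warning if it is not
grep IMAGE_GEN_OAI_API_KEY .env || echo "IMAGE_GEN_OAI_API_KEY missing from .env"
```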
---
LibreChat comes with **built-in image tools** that you can add to an **[Agent](/docs/features/agents)**.
Each tool has its own output style, pricing, and setup requirements (usually just an API key or URL).
| Tool | Best for | Needs |
|------|----------|-------|
| **OpenAI Image Tools** | Cutting-edge results (GPT-Image-1).<br/>Can also ***edit*** the images you upload. | OpenAI API |
| **Gemini Image Tools** | Google's latest image models with context-aware generation. | Gemini API or Vertex AI |
| **DALL·E (3 / 2)** | Legacy OpenAI Image models. | OpenAI API |
| **Stable Diffusion** | Local or self-hosted generation, endless community models. | Automatic1111 API |
| **Flux** | Fast cloud renders, optional fine-tunes. | Flux API |
| **MCP** | Bring your own image generators. | MCP server with image output support |
**Notes:**
- API keys can be omitted in favor of allowing the user to enter their own key from the UI.
- Image outputs are sent directly to the LLM as part of the immediate chat context following generation.
- Outside of that immediate context, the LLM only gets vision context from images attached to user messages, not from earlier generations/edits.
- See [Image Storage and Handling](#image-storage-and-handling) for more details.
- MCP server tools can also return images, which are handled much like the outputs of LC's built-in tools.
- Note: MCP servers may or may not use the correct format when outputting images. See details in the [MCP section below](#5--model-context-protocol-mcp).
---
## 1 · OpenAI Image Tools (recommended)
### Features
"OpenAI Image Tools" is an agent toolkit made up of two separate tools.
- **Image Generation**:
- **Create** brand-new images from text prompts (no upload required).
- **Image Editing**:
- **Edit** or **remix** the images you just uploaded—change colours, add objects, extend the canvas, etc.
- Both use OpenAI's latest image generation model, **GPT-Image-1**, for superior instruction following, text rendering, detailed editing, and real-world knowledge.
- See OpenAI's [Image Generation documentation](https://platform.openai.com/docs/guides/image-generation?image-generation-model=gpt-image-1) for more details.
#### Generation vs. Editing
| Use-case | Invokes |
|----------|---------------|
| "Start from scratch" | **Image Generation** |
| "Use existing image(s)" | **Image Editing** |
The agent decides which tool to use based on the context:
- **Image Generation** creates brand-new images from text descriptions only
- **Image Editing** modifies or remixes existing images using their image IDs
  - These can be images from the current message or previously generated/referenced images
  - The LLM keeps track of image IDs as long as they remain in the context window
  - The referenced image IDs are included in the tool output
- Both tools are always available, but the LLM will choose the appropriate one based on the user's request
- Both tools include the generated image ID in their output
⚠️ **Important**
- Image editing relies on image IDs, which are retained in the chat history.
- When files are uploaded to the current request, their image IDs are added to the context of the LLM before any tokens are generated.
- Previously referenced or generated image IDs can be used for editing, as long as they remain within the context window.
- The LLM can include any relevant image IDs in the `image_ids` array when calling the image editing tool.
- You can also attach previously uploaded images from the side panel without needing to upload them again.
  - This also has the added benefit of providing a vision model with the image context, which can be useful for informing the `prompt` for the image editing tool.
### Parameters
#### Image Generation
• **prompt** text description (required)
• **size** `auto` (default), `1024x1024` (square), `1536x1024` (landscape), or `1024x1536` (portrait)
• **quality** `auto` (default), `high`, `medium`, or `low`
• **background** `auto` (default), `transparent`, or `opaque` (transparent requires PNG or WebP format)
#### Image Editing
• **image_ids** array of image IDs to use as reference for editing (required)
• **prompt** text description of the changes (required)
• **size** `auto` (default), `1024x1024`, `1536x1024`, `1024x1536`, `256x256`, or `512x512`
• **quality** `auto` (default), `high`, `medium`, or `low`
### Setup
Create or reuse an OpenAI key and add it to `.env`:
```bash
IMAGE_GEN_OAI_API_KEY=sk-...
# optional extras
IMAGE_GEN_OAI_BASEURL=https://...
```
For Azure OpenAI deployments, you will first need to request access: https://aka.ms/oai/gptimage1access
Then, add your corresponding credentials to your `.env` file:
```bash
IMAGE_GEN_OAI_API_KEY=your-api-key
# optional extras
IMAGE_GEN_OAI_BASEURL=https://deploymentname.openai.azure.com/openai/deployments/gpt-image-1/
IMAGE_GEN_OAI_AZURE_API_VERSION=2025-04-01-preview
```
Then add "OpenAI Image Tools" to your Agent's *Tools* list.
### Advanced Configuration
You can customize the tool descriptions and prompt guidance by setting these environment variables:
```bash
# Image Generation Tool Descriptions
IMAGE_GEN_OAI_DESCRIPTION=...
IMAGE_GEN_OAI_PROMPT_DESCRIPTION=...
# Image Editing Tool Descriptions
IMAGE_EDIT_OAI_DESCRIPTION=...
IMAGE_EDIT_OAI_PROMPT_DESCRIPTION=...
```
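For example, you might use these variables to steer generations toward a house style. The values below are purely illustrative, not defaults:

```shell
# Hypothetical values: adjust the wording to your own needs
IMAGE_GEN_OAI_DESCRIPTION="Generates images in our brand's flat, minimalist style."
IMAGE_GEN_OAI_PROMPT_DESCRIPTION="Prompt for the image; always favor clean compositions and muted colours."
```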
### Pricing
See the [GPT-Image-1 pricing page](https://platform.openai.com/docs/models/gpt-image-1) and [Image Generation Documentation](https://platform.openai.com/docs/guides/image-generation?image-generation-model=gpt-image-1#cost-and-latency) for details on costs associated with image generation.
---
## 2 · Gemini Image Tools
Gemini Image Tools integrate Google's latest image generation models, supporting both text-to-image generation and image context-aware editing.
### Features
- **Text-to-Image Generation**: Create high-quality images from detailed text descriptions
- **Image Context Support**: Use existing images as context or inspiration for new generations
- **Image Editing**: Generate new images based on modifications to existing ones (include original image ID)
- **Multiple Models**: Choose between `gemini-2.5-flash-image` (default) or `gemini-3-pro-image-preview`
- **Dual API Support**: Works with both simple Gemini API keys and Google Cloud Vertex AI
### Parameters
• **prompt** Detailed text description of the desired image (required, up to 32,000 characters)
• **image_ids** Optional array of image IDs to use as visual context for generation
### Setup
#### Option 1: Gemini API (Recommended)
Get an API key from [Google AI Studio](https://aistudio.google.com/app/apikey):
```bash
GEMINI_API_KEY=your_api_key_here
```
#### Option 2: Vertex AI (Enterprise)
For Google Cloud users with Vertex AI access:
```bash
GOOGLE_SERVICE_KEY_FILE=/path/to/service-account.json
GOOGLE_CLOUD_LOCATION=us-central1 # optional, default: global
```
### Model Selection
```bash
# Default model (fast and efficient)
GEMINI_IMAGE_MODEL=gemini-2.5-flash-image
# Higher quality model
GEMINI_IMAGE_MODEL=gemini-3-pro-image-preview
```
### Advanced Configuration
Customize tool descriptions via environment variables:
```bash
GEMINI_IMAGE_GEN_DESCRIPTION=...
GEMINI_IMAGE_GEN_PROMPT_DESCRIPTION=...
GEMINI_IMAGE_IDS_DESCRIPTION=...
```
More details can be found in the dedicated [Gemini Image Gen guide](/docs/configuration/tools/gemini_image_gen).
---
## 3 · DALL·E (legacy)
DALL·E provides high-quality image generation using OpenAI's legacy image models.
### Parameters
• **prompt** Text description of the desired image (required, up to 4000 characters)
• **style** `vivid` (default; hyper-real, dramatic) or `natural` (less hyper-real)
• **quality** `standard` (default) or `hd`
• **size** `1024x1024` (default/square), `1792x1024` (wide), or `1024x1792` (tall)
### Setup
```bash
# Required
DALLE_API_KEY=sk-... # or DALLE3_API_KEY=sk-...
# Optional
DALLE_REVERSE_PROXY=https://... # Alternative endpoint
DALLE3_BASEURL=https://... # For Azure or custom endpoints
DALLE3_AZURE_API_VERSION=2023-12-01-preview # For Azure deployments
DALLE3_SYSTEM_PROMPT=... # Custom system prompt for DALL·E
```
### Advanced Configuration
For Azure OpenAI deployments, configure both the base URL and API version:
```bash
DALLE3_BASEURL=https://your-resource-name.openai.azure.com/openai/deployments/your-deployment-name
DALLE3_AZURE_API_VERSION=2023-12-01-preview
DALLE3_API_KEY=your-azure-api-key
```
Enable the **DALL·E** tool for the Agent and start prompting.
### Pricing
See the [DALL·E 3 pricing page](https://platform.openai.com/docs/models/dall-e-3) and [Image Generation Documentation](https://platform.openai.com/docs/guides/image-generation?image-generation-model=dall-e-3) for details on costs associated with image generation.
---
## 4 · Stable Diffusion (local)
Generate images entirely on your own machine or server.
Point LibreChat at any Automatic1111 (or compatible) endpoint and you're set.
### Parameters
• **prompt** Detailed keywords describing desired elements in the image (required)
• **negative_prompt** Keywords describing elements to exclude from the image (required)
The Stable Diffusion implementation uses these default parameters:
- cfg_scale: 4.5
- steps: 22
- width: 1024
- height: 1024
These values are currently fixed but provide good results for most use cases.
### Setup
```bash
SD_WEBUI_URL=http://127.0.0.1:7860 # URL to your Automatic1111 WebUI
```
No API key required—just the reachable URL.
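If the tool can't reach your instance, a quick way to test the URL is to hit the WebUI's API directly. This assumes Automatic1111 was launched with the `--api` flag:

```shell
# Prints a JSON list of installed models when the API is reachable,
# or a hint when it is not
curl -s --max-time 5 http://127.0.0.1:7860/sdapi/v1/sd-models \
  || echo "WebUI not reachable (is it running with --api?)"
```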
More details on setting up Automatic1111 can be found in the dedicated [Stable Diffusion guide](/docs/configuration/tools/stable_diffusion).
---
## 5 · Flux
Cloud generator with an emphasis on speed and optional fine-tuned models.
### Features
- Fast cloud-based image generation
- Support for fine-tuned models
- Multiple quality levels and aspect ratios
- Raw mode for less processed, more natural-looking images
### Parameters
The Flux tool supports three main actions:
1. **generate** - Create a new image from a text prompt
2. **generate_finetuned** - Create an image using a fine-tuned model
3. **list_finetunes** - List available custom models for the user
More details can be found in the dedicated [Flux guide](/docs/configuration/tools/flux#parameters).
### Setup
```bash
FLUX_API_KEY=flux_live_...
FLUX_API_BASE_URL=https://api.us1.bfl.ai # default is fine for most users
```
Choose the **Flux** tool inside the Agent. Prompts are plain text; one call produces one image.
### Pricing
See the [Flux pricing page](https://docs.bfl.ml/pricing/) for details on costs associated with image generation.
---
## 6 · Model Context Protocol (MCP)
Image outputs are supported from MCP servers.
For example, the [Puppeteer MCP Server](https://github.com/modelcontextprotocol/servers/tree/main/src/puppeteer) can be used to generate screenshots of web pages; it outputs images in the expected format, so they are treated the same as LC's built-in image tools.
> The examples below assume LibreChat is running outside of Docker, directly using Node.js. The Model Context Protocol is a relatively new framework, and many developers are still learning how to properly serve their systems with uv/node for scalable distribution.
> As this technology is still emerging, there are currently few image-generating servers available, and many existing ones have yet to adopt the correct response format for images.
> While many MCP servers do function well within Docker, the following examples do not, or not without more advanced configurations, showcasing some of the current inconsistency between MCP servers.
```yaml
mcpServers:
  puppeteer:
    command: npx
    args:
      - -y
      - "@modelcontextprotocol/server-puppeteer"
```
The following is an example of an [Image Generation server](https://github.com/GongRzhe/Image-Generation-MCP-Server) that outputs images using [Replicate API](https://replicate.com/account/api-tokens), but returns URLs of the images, which doesn't conform to MCP's image response standard.
> Note: for this particular server, you need to install the `@gongrzhe/image-gen-server` package globally using npm, i.e. `npm install -g @gongrzhe/image-gen-server`, then point to the package's compiled files as shown below.
```yaml
mcpServers:
  image-gen:
    command: "node"
    # First, install the package globally using npm:
    # `npm install -g @gongrzhe/image-gen-server`
    # Then, point to the location of the installed package,
    # which you can find by running `npm root -g`
    args:
      - "{REPLACE_WITH_NODE_MODULES_LOCATION}/@gongrzhe/image-gen-server/build/index.js"
      # Example with output from `npm root -g`:
      # - "/home/danny/.nvm/versions/node/v20.19.0/lib/node_modules/@gongrzhe/image-gen-server/build/index.js"
    env:
      # Do not hardcode the API token here, use the environment variable instead
      # The following will pick up the token from your .env file or environment
      REPLICATE_API_TOKEN: "${REPLICATE_API_TOKEN}"
      MODEL: "google/imagen-3"
```
---
## Image Storage and Handling
All generated images are:
1. Saved according to the configured [**`fileStrategy`**](/docs/configuration/librechat_yaml/object_structure/config#filestrategy)
2. Displayed directly in the chat interface
3. Sent directly to the LLM as part of the immediate chat context following generation
   - This may create issues if you are using an LLM that does not support image inputs.
   - An option to disable this behavior on a per-agent basis is planned.
   - These outputs are only sent to the LLM upon generation, not on every message.
   - To include an image in a later message, attach it to the message from the side panel.
   - In short, the LLM only gets vision context from images attached to user messages, not from generations/edits, except immediately after generation.
---
## Proxy Support
All image generation tools support proxy configuration through the `PROXY` environment variable:
```bash
PROXY=http://proxy-url:port
```
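If your proxy requires credentials, a URL of the usual `user:pass@host` form will typically work as well. Whether authentication is honored depends on LibreChat's underlying proxy handling, so treat the example below as an assumption to verify against your deployment:

```shell
# Hypothetical authenticated proxy URL (credentials and host are placeholders)
PROXY=http://user:secret@proxy.internal:8080
```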
## Error Handling
If any of the tools encounter an error, they will return an error message explaining what went wrong. Common issues include:
- Invalid API key
- API unavailability
- Content policy violations
- Proxy/network issues
- Invalid parameters
- Unsupported image payload (see [Image Storage and Handling](#image-storage-and-handling) above)
---
## Prompting
Though you can customize the prompts for [OpenAI Image Tools](#advanced-configuration) and [DALL·E](#advanced-configuration-1), the following tips inform the tools' default prompts and are useful for your own prompt writing as well.
1. Start with the **subject** and **style** (photo, oil painting, etc.).
2. Add **composition** and **camera / medium** ("wide-angle shot of…", "watercolour…").
3. Mention **lighting & mood** ("golden hour", "dramatic shadows").
4. Finish with **detail keywords** (textures, colours, expressions).
5. Keep negatives positive—describe what should be included, not what to avoid.
Example:
> A cinematic photo of an antique library bathed in warm afternoon sunlight. Tall wooden shelves overflow with leather-bound books, and dust particles shimmer in the light. A single green-shaded banker's lamp illuminates an open atlas on a polished mahogany desk in the foreground. 85 mm lens, shallow depth of field, rich amber tones, ultra-high detail.
## Related Pages
<Cards num={3}>
<Cards.Card title="Agents" href="/docs/features/agents" arrow>
Create and configure AI agents with custom tools
</Cards.Card>
<Cards.Card title="MCP Servers" href="/docs/features/mcp" arrow>
Bring your own tools via Model Context Protocol
</Cards.Card>
<Cards.Card title="Gemini Image Tools" href="/docs/configuration/tools/gemini_image_gen" arrow>
Detailed setup guide for Google Gemini image generation
</Cards.Card>
</Cards>