mirror of
https://github.com/LibreChat-AI/librechat.ai.git
synced 2026-03-27 10:48:32 +07:00
---
title: Image Generation & Editing
icon: Image
description: Comprehensive guide to LibreChat's built-in image generation and editing tools
---

## Quick Start

Get image generation working in under 5 minutes with OpenAI Image Tools (recommended).

<Steps>
<Step>

### Create an Agent

Go to the **Agents** panel in LibreChat and create a new agent. Give it a name like "Image Creator".

</Step>
<Step>

### Add OpenAI Image Tools

In the agent's **Tools** list, select **OpenAI Image Tools**. This adds both image generation and image editing capabilities.

</Step>
<Step>

### Set Your API Key

Add the following to your `.env` file:

```bash filename=".env"
IMAGE_GEN_OAI_API_KEY=sk-your-openai-api-key
```

</Step>
<Step>

### Restart and Test

| Deployment | Command |
|------------|---------|
| Docker | `docker compose down && docker compose up -d` |
| Local | Stop (Ctrl+C) then `npm run backend` |

Send a message like "Generate an image of a sunset over mountains" to your agent.

</Step>
</Steps>

---

LibreChat comes with **built-in image tools** that you can add to an **[Agent](/docs/features/agents)**.

Each has its own look, price point, and setup step (usually just an API key or URL).

| Tool | Best for | Needs |
|------|----------|-------|
| **OpenAI Image Tools** | Cutting-edge results (GPT-Image-1).<br/>Can also ***edit*** the images you upload. | OpenAI API |
| **Gemini Image Tools** | Google's latest image models with context-aware generation. | Gemini API or Vertex AI |
| **DALL·E (3 / 2)** | Legacy OpenAI image models. | OpenAI API |
| **Stable Diffusion** | Local or self-hosted generation, endless community models. | Automatic1111 API |
| **Flux** | Fast cloud renders, optional fine-tunes. | Flux API |
| **MCP** | Bring-your-own image generators | MCP server with image output support |

**Notes:**
- API keys can be omitted in favor of allowing the user to enter their own key from the UI.
- Image outputs are sent directly to the LLM as part of the immediate chat context following generation.
- Otherwise, the LLM only gets vision context from images attached to user messages, not from past generations/edits.
- See [Image Storage and Handling](#image-storage-and-handling) for more details.
- MCP server tool image outputs are supported and are treated like LC's built-in image tools, but MCP servers may or may not use the correct format when outputting images. See details in the [MCP section below](#5--model-context-protocol-mcp).

---

## 1 · OpenAI Image Tools (recommended)

### Features

"OpenAI Image Tools" is an agent toolkit made up of two separate tools:

- **Image Generation**:
  - **Create** brand-new images from text prompts (no upload required).
- **Image Editing**:
  - **Edit** or **remix** the images you just uploaded—change colours, add objects, extend the canvas, etc.

Both use OpenAI's latest image generation model, **GPT-Image-1**, for superior instruction following, text rendering, detailed editing, and real-world knowledge. See OpenAI's [Image Generation documentation](https://platform.openai.com/docs/guides/image-generation?image-generation-model=gpt-image-1) for more details.

#### Generation vs. Editing

| Use-case | Invokes |
|----------|---------------|
| "Start from scratch" | **Image Generation** |
| "Use existing image(s)" | **Image Editing** |

The agent decides which tool to use based on the context:

- **Image Generation** creates brand-new images from text descriptions only.
- **Image Editing** modifies or remixes existing images using their image IDs.
  - These can be images from the current message or previously generated/referenced images.
  - The LLM keeps track of image IDs as long as they remain in the context window.
  - The editing tool's output includes the referenced image IDs.
- Both tools are always available; the LLM chooses the appropriate one based on the user's request.
- Both tools include the generated image ID in their output.

⚠️ **Important**
- Image editing relies on image IDs, which are retained in the chat history.
- When files are uploaded with the current request, their image IDs are added to the LLM's context before any tokens are generated.
- Previously referenced or generated image IDs can be used for editing, as long as they remain within the context window.
- The LLM can include any relevant image IDs in the `image_ids` array when calling the image editing tool.
- You can also attach previously uploaded images from the side panel without needing to upload them again.
  - This has the added benefit of giving a vision model the image context, which can be useful for informing the `prompt` for the image editing tool.

### Parameters

#### Image Generation

• **prompt** – text description (required)
• **size** – `auto` (default), `1024x1024` (square), `1536x1024` (landscape), or `1024x1536` (portrait)
• **quality** – `auto` (default), `high`, `medium`, or `low`
• **background** – `auto` (default), `transparent`, or `opaque` (transparent requires PNG or WebP format)

#### Image Editing

• **image_ids** – array of image IDs to use as reference for editing (required)
• **prompt** – text description of the changes (required)
• **size** – `auto` (default), `1024x1024`, `1536x1024`, `1024x1536`, `256x256`, or `512x512`
• **quality** – `auto` (default), `high`, `medium`, or `low`

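Putting the editing parameters together, an editing call emitted by the LLM might look like the following sketch (the argument values and image ID format here are illustrative, not exact):

```json
{
  "image_ids": ["img_abc123"],
  "prompt": "Make the sky a deep sunset orange and add a flock of birds",
  "size": "1024x1536",
  "quality": "high"
}
```
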
### Setup

Create or reuse an OpenAI key and add it to `.env`:

```bash
IMAGE_GEN_OAI_API_KEY=sk-...
# optional extras
IMAGE_GEN_OAI_BASEURL=https://...
```

For Azure OpenAI deployments, you will first need access: https://aka.ms/oai/gptimage1access

Then, add your corresponding credentials to your `.env` file:

```bash
IMAGE_GEN_OAI_API_KEY=your-api-key
# optional extras
IMAGE_GEN_OAI_BASEURL=https://deploymentname.openai.azure.com/openai/deployments/gpt-image-1/
IMAGE_GEN_OAI_AZURE_API_VERSION=2025-04-01-preview
```

Then add "OpenAI Image Tools" to your Agent's *Tools* list.

### Advanced Configuration

You can customize the tool descriptions and prompt guidance by setting these environment variables:

```bash
# Image Generation Tool Descriptions
IMAGE_GEN_OAI_DESCRIPTION=...
IMAGE_GEN_OAI_PROMPT_DESCRIPTION=...

# Image Editing Tool Descriptions
IMAGE_EDIT_OAI_DESCRIPTION=...
IMAGE_EDIT_OAI_PROMPT_DESCRIPTION=...
```

### Pricing

See the [GPT-Image-1 pricing page](https://platform.openai.com/docs/models/gpt-image-1) and [Image Generation documentation](https://platform.openai.com/docs/guides/image-generation?image-generation-model=gpt-image-1#cost-and-latency) for details on image generation costs.

---

## 2 · Gemini Image Tools

Gemini Image Tools integrate Google's latest image generation models, supporting both text-to-image generation and image context-aware editing.

### Features

- **Text-to-Image Generation**: Create high-quality images from detailed text descriptions
- **Image Context Support**: Use existing images as context or inspiration for new generations
- **Image Editing**: Generate new images based on modifications to existing ones (include the original image ID)
- **Multiple Models**: Choose between `gemini-2.5-flash-image` (default) or `gemini-3-pro-image-preview`
- **Dual API Support**: Works with both simple Gemini API keys and Google Cloud Vertex AI

### Parameters

• **prompt** – Detailed text description of the desired image (required, up to 32,000 characters)
• **image_ids** – Optional array of image IDs to use as visual context for generation

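For example, a context-aware generation call might carry both parameters, using a previously seen image as reference (the values and ID format below are illustrative only):

```json
{
  "prompt": "The same product shot as before, but on a marble countertop in soft morning light",
  "image_ids": ["img_abc123"]
}
```
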
### Setup

#### Option 1: Gemini API (Recommended)

Get an API key from [Google AI Studio](https://aistudio.google.com/app/apikey):

```bash
GEMINI_API_KEY=your_api_key_here
```

#### Option 2: Vertex AI (Enterprise)

For Google Cloud users with Vertex AI access:

```bash
GOOGLE_SERVICE_KEY_FILE=/path/to/service-account.json
GOOGLE_CLOUD_LOCATION=us-central1 # optional, default: global
```

### Model Selection

```bash
# Default model (fast and efficient)
GEMINI_IMAGE_MODEL=gemini-2.5-flash-image

# Higher quality model
GEMINI_IMAGE_MODEL=gemini-3-pro-image-preview
```

### Advanced Configuration

Customize tool descriptions via environment variables:

```bash
GEMINI_IMAGE_GEN_DESCRIPTION=...
GEMINI_IMAGE_GEN_PROMPT_DESCRIPTION=...
GEMINI_IMAGE_IDS_DESCRIPTION=...
```

More details can be found in the dedicated [Gemini Image Gen guide](/docs/configuration/tools/gemini_image_gen).

---

## 3 · DALL·E (legacy)

DALL·E provides high-quality image generation using OpenAI's legacy image models.

### Parameters

• **prompt** – Text description of the desired image (required, up to 4000 characters)
• **style** – `vivid` (hyper-real, dramatic; default) or `natural` (less hyper-real)
• **quality** – `standard` (default) or `hd`
• **size** – `1024x1024` (default, square), `1792x1024` (wide), or `1024x1792` (tall)

### Setup

```bash
# Required
DALLE_API_KEY=sk-... # or DALLE3_API_KEY=sk-...

# Optional
DALLE_REVERSE_PROXY=https://... # Alternative endpoint
DALLE3_BASEURL=https://... # For Azure or custom endpoints
DALLE3_AZURE_API_VERSION=2023-12-01-preview # For Azure deployments
DALLE3_SYSTEM_PROMPT=... # Custom system prompt for DALL·E
```

### Advanced Configuration

For Azure OpenAI deployments, configure both the base URL and API version:

```bash
DALLE3_BASEURL=https://your-resource-name.openai.azure.com/openai/deployments/your-deployment-name
DALLE3_AZURE_API_VERSION=2023-12-01-preview
DALLE3_API_KEY=your-azure-api-key
```

Enable the **DALL·E** tool for the Agent and start prompting.

### Pricing

See the [DALL·E pricing page](https://platform.openai.com/docs/models/dall-e-3) and [Image Generation documentation](https://platform.openai.com/docs/guides/image-generation?image-generation-model=dall-e-3) for details on image generation costs.

---

## 4 · Stable Diffusion (local)

Run image generation entirely on your own machine or server.
Point LibreChat at any Automatic1111 (or compatible) endpoint and you're set.

### Parameters

• **prompt** – Detailed keywords describing desired elements in the image (required)
• **negative_prompt** – Keywords describing elements to exclude from the image (required)

The Stable Diffusion implementation uses these default parameters:
- cfg_scale: 4.5
- steps: 22
- width: 1024
- height: 1024

These values are currently fixed but provide good results for most use cases.

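For reference, these parameters map onto Automatic1111's `txt2img` API. A rough sketch of the request body (prompt values are illustrative; the fixed defaults come from the list above):

```json
{
  "prompt": "a lighthouse at dawn, oil painting, warm tones",
  "negative_prompt": "blurry, low quality",
  "cfg_scale": 4.5,
  "steps": 22,
  "width": 1024,
  "height": 1024
}
```
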
### Setup

```bash
SD_WEBUI_URL=http://127.0.0.1:7860 # URL to your Automatic1111 WebUI
```

No API key required—just the reachable URL.

More details on setting up Automatic1111 can be found in the dedicated [Stable Diffusion guide](/docs/configuration/tools/stable_diffusion).

---

## 5 · Flux

Cloud generator with an emphasis on speed and optional fine-tuned models.

### Features

- Fast cloud-based image generation
- Support for fine-tuned models
- Multiple quality levels and aspect ratios
- Raw mode for less processed, more natural-looking images

### Parameters

The Flux tool supports three main actions:

1. **generate** – Create a new image from a text prompt
2. **generate_finetuned** – Create an image using a fine-tuned model
3. **list_finetunes** – List the user's available custom models

More details can be found in the dedicated [Flux guide](/docs/configuration/tools/flux#parameters).

### Setup

```bash
FLUX_API_KEY=flux_live_...
FLUX_API_BASE_URL=https://api.us1.bfl.ai # default is fine for most users
```

Choose the **Flux** tool inside the Agent. Prompts are plain text; one call produces one image.

### Pricing

See the [Flux pricing page](https://docs.bfl.ml/pricing/) for details on image generation costs.

---

## 6 · Model Context Protocol (MCP)

Image outputs from MCP servers are supported.

For example, the [Puppeteer MCP Server](https://github.com/modelcontextprotocol/servers/tree/main/src/puppeteer) can generate screenshots of web pages; it outputs images in the expected format and is treated the same as LC's built-in image tools.

> The examples below assume LibreChat is running outside of Docker, directly using Node.js. The Model Context Protocol is a relatively new framework, and many developers are still learning how to properly serve their systems with uv/node for scalable distribution.

> As this technology is still emerging, there are currently few image-generating servers available, and many existing ones have yet to adopt the correct response format for images.

> While many MCP servers do function well within Docker, the following examples do not (or not without more advanced configuration), showcasing some of the current inconsistency between MCP servers.

```yaml
mcpServers:
  puppeteer:
    command: npx
    args:
      - -y
      - "@modelcontextprotocol/server-puppeteer"
```

The following is an example of an [Image Generation server](https://github.com/GongRzhe/Image-Generation-MCP-Server) that generates images via the [Replicate API](https://replicate.com/account/api-tokens), but returns URLs of the images, which doesn't conform to MCP's image response standard.

> Note: for this particular server, you need to install the `@gongrzhe/image-gen-server` package globally using npm, i.e. `npm install -g @gongrzhe/image-gen-server`, then point to the package's compiled files as shown below.

```yaml
mcpServers:
  image-gen:
    command: "node"
    # First, install the package globally using npm:
    # `npm install -g @gongrzhe/image-gen-server`
    # Then, point to the location of the installed package,
    # which you can find by running `npm root -g`
    args:
      - "{REPLACE_WITH_NODE_MODULES_LOCATION}/@gongrzhe/image-gen-server/build/index.js"
      # Example with output from `npm root -g`:
      # - "/home/danny/.nvm/versions/node/v20.19.0/lib/node_modules/@gongrzhe/image-gen-server/build/index.js"
    env:
      # Do not hardcode the API token here; use the environment variable instead.
      # The following will pick up the token from your .env file or environment:
      REPLICATE_API_TOKEN: "${REPLICATE_API_TOKEN}"
      MODEL: "google/imagen-3"
```

---

## Image Storage and Handling

All generated images are:

1. Saved according to the configured [**`fileStrategy`**](/docs/configuration/librechat_yaml/object_structure/config#filestrategy)
2. Displayed directly in the chat interface
3. Sent directly to the LLM as part of the immediate chat context following generation
   - This may create issues if you are using an LLM that does not support image inputs.
   - There will be an option to disable this behavior on a per-agent basis in the future.
   - These outputs are only sent to the LLM upon generation, not on every message.
   - To include an image in the chat later, attach it to a message from the side panel.
   - To summarize: the LLM only gets vision context from images attached to user messages, not from generations/edits, except immediately after generation.

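As an illustration, switching where generated images are stored is a one-line change in `librechat.yaml` (this example assumes an S3 strategy is configured; see the `fileStrategy` reference linked above for supported values):

```yaml filename="librechat.yaml"
# Store generated images (and other uploads) in S3 instead of local disk
fileStrategy: "s3"
```
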
---

## Proxy Support

All image generation tools support proxy configuration through the `PROXY` environment variable:

```bash
PROXY=http://proxy-url:port
```

## Error Handling

If any of the tools encounter an error, they return an error message explaining what went wrong. Common issues include:

- Invalid API key
- API unavailability
- Content policy violations
- Proxy/network issues
- Invalid parameters
- Unsupported image payload (see [Image Storage and Handling](#image-storage-and-handling) above)

---

## Prompting

Though you can customize the prompts for [OpenAI Image Tools](#advanced-configuration) and [DALL·E](#advanced-configuration-1), the following tips inform the default prompts supplied by the tools, and are just as useful for your own prompt writing.

1. Start with the **subject** and **style** (photo, oil painting, etc.).
2. Add **composition** and **camera / medium** ("wide-angle shot of…", "watercolour…").
3. Mention **lighting & mood** ("golden hour", "dramatic shadows").
4. Finish with **detail keywords** (textures, colours, expressions).
5. Keep negatives positive: describe what should be included, not what to avoid.

Example:

> A cinematic photo of an antique library bathed in warm afternoon sunlight. Tall wooden shelves overflow with leather-bound books, and dust particles shimmer in the light. A single green-shaded banker's lamp illuminates an open atlas on a polished mahogany desk in the foreground. 85 mm lens, shallow depth of field, rich amber tones, ultra-high detail.

## Related Pages

<Cards num={3}>
  <Cards.Card title="Agents" href="/docs/features/agents" arrow>
    Create and configure AI agents with custom tools
  </Cards.Card>
  <Cards.Card title="MCP Servers" href="/docs/features/mcp" arrow>
    Bring your own tools via Model Context Protocol
  </Cards.Card>
  <Cards.Card title="Gemini Image Tools" href="/docs/configuration/tools/gemini_image_gen" arrow>
    Detailed setup guide for Google Gemini image generation
  </Cards.Card>
</Cards>