---
title: Image Generation & Editing
description: Comprehensive guide to LibreChat's built-in image generation and editing tools
---

## 🎨 Image Generation & Editing

LibreChat comes with **built-in image tools** that you can add to an **[Agent](/docs/features/agents)**.

Each has its own look, price point, and setup step (usually just an API key or URL).

| Tool | Best for | Needs |
|------|----------|-------|
| **OpenAI Image Tools** | Cutting-edge results (GPT-Image-1).<br/>Can also ***edit*** the images you upload. | OpenAI API |
| **Gemini Image Tools** | Google's latest image models with context-aware generation. | Gemini API or Vertex AI |
| **DALL·E (3 / 2)** | Legacy OpenAI Image models. | OpenAI API |
| **Stable Diffusion** | Local or self-hosted generation, endless community models. | Automatic1111 API |
| **Flux** | Fast cloud renders, optional fine-tunes. | Flux API |
| **MCP** | Bringing your own image generators. | MCP server with image output support |

**Notes:**
- API keys can be omitted to let users enter their own key from the UI.
- Image outputs are sent directly to the LLM as part of the immediate chat context following generation.
  - The LLM only gets vision context from images attached to user messages, not from generations/edits, except immediately after generation.
  - See [Image Storage and Handling](#image-storage-and-handling) for more details.
- Image outputs from MCP server tools are also supported and are handled similarly to the output of LibreChat's built-in tools.
  - Note: MCP servers may or may not use the correct format when outputting images. See details in the [MCP section below](#6--model-context-protocol-mcp).

---

## 1 · OpenAI Image Tools (recommended)

### Features

"OpenAI Image Tools" are an agent toolkit made up of 2 separate tools.

- **Image Generation**:
  - **Create** brand-new images from text prompts (no upload required).
- **Image Editing**:
  - **Edit** or **remix** the images you just uploaded—change colours, add objects, extend the canvas, etc.
- Both use OpenAI's latest image generation model, **GPT-Image-1**, for superior instruction following, text rendering, detailed editing, and real-world knowledge.
- See OpenAI's [Image Generation documentation](https://platform.openai.com/docs/guides/image-generation?image-generation-model=gpt-image-1) for more details.

#### Generation vs. Editing
| Use-case | Invokes |
|----------|---------------|
| "Start from scratch" | **Image Generation** |
| "Use existing image(s)" | **Image Editing** |

The agent decides which tool to use based on the context:

- **Image Generation** creates brand new images from text descriptions only
- **Image Editing** modifies or remixes existing images using their image IDs
  - These can be images from the current message or previously generated/referenced images
  - The LLM keeps track of image IDs as long as they remain in the context window
  - The tool output includes the referenced image IDs
- Both tools are always available, but the LLM will choose the appropriate one based on the user's request
- Both tools will include the generated image ID in the tool output

⚠️ **Important**
- Image editing relies on image IDs, which are retained in the chat history.
- When files are uploaded to the current request, their image IDs are added to the context of the LLM before any tokens are generated.
- Previously referenced or generated image IDs can be used for editing, as long as they remain within the context window.
- The LLM can include any relevant image IDs in the `image_ids` array when calling the image editing tool.
- You can also attach previously uploaded images from the side panel without needing to upload them again.
  - This also has the added benefit of providing a vision model with the image context, which can be useful for informing the `prompt` for the image editing tool.

### Parameters

#### Image Generation
• **prompt** – text description (required)
• **size** – `auto` (default), `1024x1024` (square), `1536x1024` (landscape), or `1024x1536` (portrait)
• **quality** – `auto` (default), `high`, `medium`, or `low`
• **background** – `auto` (default), `transparent`, or `opaque` (transparent requires PNG or WebP format)

#### Image Editing

• **image_ids** – array of image IDs to use as reference for editing (required)
• **prompt** – text description of the changes (required)
• **size** – `auto` (default), `1024x1024`, `1536x1024`, `1024x1536`, `256x256`, or `512x512`
• **quality** – `auto` (default), `high`, `medium`, or `low`
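
As a rough illustration of how the two parameter sets above differ, the sketch below checks tool-call arguments against the values listed in this section. The names `ALLOWED` and `validate_image_args` and the sample image ID are hypothetical. This is not LibreChat's internal validation, only a way to check arguments by hand.

```python
# Allowed values copied from the parameter lists above.
ALLOWED = {
    "generation": {
        "size": {"auto", "1024x1024", "1536x1024", "1024x1536"},
        "quality": {"auto", "high", "medium", "low"},
        "background": {"auto", "transparent", "opaque"},
    },
    "editing": {
        "size": {"auto", "1024x1024", "1536x1024", "1024x1536", "256x256", "512x512"},
        "quality": {"auto", "high", "medium", "low"},
    },
}


def validate_image_args(tool: str, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = []
    if not args.get("prompt"):
        problems.append("prompt is required")
    if tool == "editing" and not args.get("image_ids"):
        problems.append("image_ids is required for editing")
    for name, allowed in ALLOWED[tool].items():
        value = args.get(name, "auto")  # every optional parameter defaults to auto
        if value not in allowed:
            problems.append(f"{name}={value!r} is not one of {sorted(allowed)}")
    return problems


# Hypothetical edit call: the image ID is a placeholder, not a real ID format.
edit_args = {
    "image_ids": ["example-image-id"],
    "prompt": "Make the sky purple",
    "size": "1536x1024",
}
print(validate_image_args("editing", edit_args))  # []

# 512x512 is listed for editing only, and the prompt is missing here.
print(validate_image_args("generation", {"size": "512x512"}))
```

The second call reports two problems, which shows that the smaller sizes are only listed for the editing tool.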

### Setup
Create or reuse an OpenAI key and add it to `.env`:

```bash
IMAGE_GEN_OAI_API_KEY=sk-...
# optional extras
IMAGE_GEN_OAI_BASEURL=https://...
```

For Azure OpenAI deployments, you will first need access: https://aka.ms/oai/gptimage1access

Then, add your corresponding credentials to your `.env` file:

```bash
IMAGE_GEN_OAI_API_KEY=your-api-key
# optional extras
IMAGE_GEN_OAI_BASEURL=https://deploymentname.openai.azure.com/openai/deployments/gpt-image-1/
IMAGE_GEN_OAI_AZURE_API_VERSION=2025-04-01-preview
```

Then add "OpenAI Image Tools" to your Agent's *Tools* list.
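
For Azure, it can help to see how the values from the `.env` example above end up in a request URL. The sketch below follows Azure OpenAI's REST convention of appending the images path to the deployment URL and passing the version as an `api-version` query parameter. The helper name `azure_image_url` is hypothetical, and how LibreChat assembles the request internally is an assumption here, so treat this only as a way to sanity-check your base URL.

```python
from urllib.parse import urlencode


def azure_image_url(base_url: str, api_version: str, operation: str = "generations") -> str:
    """Join the deployment base URL, the images path, and the api-version query."""
    if operation not in ("generations", "edits"):
        raise ValueError("operation must be 'generations' or 'edits'")
    # Strip the trailing slash so the joined path contains no '//'.
    query = urlencode({"api-version": api_version})
    return f"{base_url.rstrip('/')}/images/{operation}?{query}"


print(azure_image_url(
    "https://deploymentname.openai.azure.com/openai/deployments/gpt-image-1/",
    "2025-04-01-preview",
))
# https://deploymentname.openai.azure.com/openai/deployments/gpt-image-1/images/generations?api-version=2025-04-01-preview
```

If the printed URL does not match the endpoint shown for your deployment in the Azure portal, the base URL or deployment name is the first thing to check.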

### Advanced Configuration

You can customize the tool descriptions and prompt guidance by setting these environment variables:

```bash
# Image Generation Tool Descriptions
IMAGE_GEN_OAI_DESCRIPTION=...
IMAGE_GEN_OAI_PROMPT_DESCRIPTION=...

# Image Editing Tool Descriptions
IMAGE_EDIT_OAI_DESCRIPTION=...
IMAGE_EDIT_OAI_PROMPT_DESCRIPTION=...
```

### Pricing
See the [GPT-Image-1 pricing page](https://platform.openai.com/docs/models/gpt-image-1) and [Image Generation Documentation](https://platform.openai.com/docs/guides/image-generation?image-generation-model=gpt-image-1#cost-and-latency) for details on costs associated with image generation.

---

## 2 · Gemini Image Tools

Gemini Image Tools integrate Google's latest image generation models, supporting both text-to-image generation and image context-aware editing.

### Features

- **Text-to-Image Generation**: Create high-quality images from detailed text descriptions
- **Image Context Support**: Use existing images as context or inspiration for new generations
- **Image Editing**: Generate new images based on modifications to existing ones (include original image ID)
- **Multiple Models**: Choose between `gemini-2.5-flash-image` (default) or `gemini-3-pro-image-preview`
- **Dual API Support**: Works with both simple Gemini API keys and Google Cloud Vertex AI

### Parameters

• **prompt** – Detailed text description of the desired image (required, up to 32,000 characters)
• **image_ids** – Optional array of image IDs to use as visual context for generation

### Setup

#### Option 1: Gemini API (Recommended)

Get an API key from [Google AI Studio](https://aistudio.google.com/app/apikey):

```bash
GEMINI_API_KEY=your_api_key_here
```

#### Option 2: Vertex AI (Enterprise)

For Google Cloud users with Vertex AI access:

```bash
GOOGLE_SERVICE_KEY_FILE=/path/to/service-account.json
GOOGLE_CLOUD_LOCATION=us-central1  # optional, default: global
```

### Model Selection

```bash
# Default model (fast and efficient)
GEMINI_IMAGE_MODEL=gemini-2.5-flash-image

# Higher quality model (use instead of the default; set only one)
# GEMINI_IMAGE_MODEL=gemini-3-pro-image-preview
```

### Advanced Configuration

Customize tool descriptions via environment variables:

```bash
GEMINI_IMAGE_GEN_DESCRIPTION=...
GEMINI_IMAGE_GEN_PROMPT_DESCRIPTION=...
GEMINI_IMAGE_IDS_DESCRIPTION=...
```

More details can be found in the dedicated [Gemini Image Gen guide](/docs/configuration/tools/gemini_image_gen).

---

## 3 · DALL·E (legacy)

DALL·E provides high-quality image generation using OpenAI's legacy image models.

### Parameters
• **prompt** – Text description of the desired image (required, up to 4000 characters)
• **style** – `vivid` (hyper-real, dramatic - default) or `natural` (less hyper-real)
• **quality** – `standard` (default) or `hd`
• **size** – `1024x1024` (default/square), `1792x1024` (wide), or `1024x1792` (tall)

### Setup

```bash
# Required
DALLE_API_KEY=sk-...  # or DALLE3_API_KEY=sk-...

# Optional
DALLE_REVERSE_PROXY=https://...  # Alternative endpoint
DALLE3_BASEURL=https://...  # For Azure or custom endpoints
DALLE3_AZURE_API_VERSION=2023-12-01-preview  # For Azure deployments
DALLE3_SYSTEM_PROMPT=...  # Custom system prompt for DALL·E
```

### Advanced Configuration
For Azure OpenAI deployments, configure both the base URL and API version:

```bash
DALLE3_BASEURL=https://your-resource-name.openai.azure.com/openai/deployments/your-deployment-name
DALLE3_AZURE_API_VERSION=2023-12-01-preview
DALLE3_API_KEY=your-azure-api-key
```

Enable the **DALL·E** tool for the Agent and start prompting.

### Pricing
See the [DALL-E pricing page](https://platform.openai.com/docs/models/dall-e-3) and [Image Generation Documentation](https://platform.openai.com/docs/guides/image-generation?image-generation-model=dall-e-3) for details on costs associated with image generation.

---

## 4 · Stable Diffusion (local)

Generate images entirely on your own machine or server.
Point LibreChat at any Automatic1111 (or compatible) endpoint and you're set.

### Parameters
• **prompt** – Detailed keywords describing desired elements in the image (required)
• **negative_prompt** – Keywords describing elements to exclude from the image (required)

The Stable Diffusion implementation uses these default parameters:
- cfg_scale: 4.5
- steps: 22
- width: 1024
- height: 1024

These values are currently fixed but provide good results for most use cases.

### Setup

```bash
SD_WEBUI_URL=http://127.0.0.1:7860  # URL to your Automatic1111 WebUI
```

No API key required—just the reachable URL.

More details on setting up Automatic1111 can be found in the dedicated [Stable Diffusion guide](/docs/configuration/tools/stable_diffusion).
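
To confirm that the endpoint is reachable before enabling the tool, a request like the one below can be sent by hand. It is a minimal sketch that uses Automatic1111's `/sdapi/v1/txt2img` route (available when the WebUI is started with `--api`) and the fixed defaults listed above. The helper names and output path are hypothetical, and whether LibreChat sends exactly this payload is an assumption.

```python
import base64
import json
import os
import urllib.request

# Fixed defaults listed above; field names follow the Automatic1111 txt2img API.
SD_DEFAULTS = {"cfg_scale": 4.5, "steps": 22, "width": 1024, "height": 1024}


def build_txt2img_payload(prompt: str, negative_prompt: str) -> dict:
    """Combine the two tool parameters with the fixed defaults."""
    return {"prompt": prompt, "negative_prompt": negative_prompt, **SD_DEFAULTS}


def txt2img(prompt: str, negative_prompt: str, out_path: str = "sd_test.png") -> str:
    """POST to the WebUI and save the first returned image (needs a running WebUI)."""
    base_url = os.environ.get("SD_WEBUI_URL", "http://127.0.0.1:7860").rstrip("/")
    payload = build_txt2img_payload(prompt, negative_prompt)
    request = urllib.request.Request(
        f"{base_url}/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),  # a request body makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        images = json.load(response)["images"]  # list of base64-encoded images
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(images[0]))
    return out_path


print(json.dumps(build_txt2img_payload("castle, sunset, oil painting", "blurry, text"), indent=2))
# Call txt2img(...) manually once the WebUI is running with its API enabled.
```

Only the payload is printed here, so the snippet runs without a WebUI; calling `txt2img` performs the actual request.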

---

## 5 · Flux

Cloud generator with an emphasis on speed and optional fine-tuned models.

### Features
- Fast cloud-based image generation
- Support for fine-tuned models
- Multiple quality levels and aspect ratios
- Raw mode for less processed, more natural-looking images

### Parameters
The Flux tool supports three main actions:

1. **generate** - Create a new image from a text prompt
2. **generate_finetuned** - Create an image using a fine-tuned model
3. **list_finetunes** - List available custom models for the user

More details can be found in the dedicated [Flux guide](/docs/configuration/tools/flux#parameters).

### Setup

```bash
FLUX_API_KEY=flux_live_...
FLUX_API_BASE_URL=https://api.us1.bfl.ai   # default is fine for most users
```

Choose the **Flux** tool inside the Agent. Prompts are plain text; one call produces one image.

### Pricing
See the [Flux pricing page](https://docs.bfl.ml/pricing/) for details on costs associated with image generation.

---

## 6 · Model Context Protocol (MCP)

Image outputs are supported from MCP servers.

For example, the [Puppeteer MCP Server](https://github.com/modelcontextprotocol/servers/tree/main/src/puppeteer) can take screenshots of web pages. It outputs images in the expected format, so they are treated the same as the output of LibreChat's built-in image tools.

> The examples below assume LibreChat is running outside of Docker, directly using Node.js. The Model Context Protocol is a relatively new framework, and many developers are still learning how to properly serve their systems with uv/node for scalable distribution.

> As this technology is still emerging, there are currently few image-generating servers available, and many existing ones have yet to adopt the correct response format for images.

> While many MCP servers do function well within Docker, the following examples do not, or not without more advanced configurations, showcasing some of the current inconsistency between MCP servers.

```yaml
mcpServers:
  puppeteer:
    command: npx
    args:
      - -y
      - "@modelcontextprotocol/server-puppeteer"
```

The following is an example of an [Image Generation server](https://github.com/GongRzhe/Image-Generation-MCP-Server) that generates images through the [Replicate API](https://replicate.com/account/api-tokens) but returns image URLs rather than image data, which doesn't conform to MCP's image response standard.

> Note: for this particular server, you need to install the `@gongrzhe/image-gen-server` package globally using npm, i.e. `npm install -g @gongrzhe/image-gen-server`, then point to the package's compiled files as shown below.

```yaml
mcpServers:
  image-gen:
    command: "node"
    # First, install the package globally using npm:
    # `npm install -g @gongrzhe/image-gen-server`
    # Then, point to the location of the installed package,
    # which you can find by running `npm root -g`
    args:
      - "{REPLACE_WITH_NODE_MODULES_LOCATION}/@gongrzhe/image-gen-server/build/index.js"
      # Example with output from `npm root -g`:
      # - "/home/danny/.nvm/versions/node/v20.19.0/lib/node_modules/@gongrzhe/image-gen-server/build/index.js"
    env:
      # Do not hardcode the API token here, use the environment variable instead
      # The following will pick up the token from your .env file or environment
      REPLICATE_API_TOKEN: "${REPLICATE_API_TOKEN}"
      MODEL: "google/imagen-3"
```
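
To make the difference between the two servers above concrete, the sketch below compares a tool-result content block in MCP's image shape (`type`, base64 `data`, and `mimeType`) with a block that only carries a URL as text. The function name `is_mcp_image_block`, the placeholder bytes, and the example URL are hypothetical, and the check is a simplification rather than LibreChat's actual handling.

```python
import base64


def is_mcp_image_block(block: dict) -> bool:
    """True if a tool-result content block matches MCP's image content shape."""
    if block.get("type") != "image" or not block.get("data"):
        return False
    if not str(block.get("mimeType", "")).startswith("image/"):
        return False
    try:
        # The image bytes must be valid base64.
        base64.b64decode(block["data"], validate=True)
    except (ValueError, TypeError):
        return False
    return True


# Conforming result: the image bytes travel inline as base64.
conforming = {
    "type": "image",
    "data": base64.b64encode(b"\x89PNG placeholder bytes").decode("ascii"),
    "mimeType": "image/png",
}
# Non-conforming result: only a URL (hypothetical) is returned as text.
url_only = {"type": "text", "text": "https://example.com/output.png"}

print(is_mcp_image_block(conforming))  # True
print(is_mcp_image_block(url_only))    # False
```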

---

## Image Storage and Handling

All generated images are:
1. Saved according to the configured [**`fileStrategy`**](/docs/configuration/librechat_yaml/object_structure/config#filestrategy)
2. Displayed directly in the chat interface
3. Sent directly to the LLM as part of the immediate chat context following generation
   - This may create issues if you are using an LLM that does not support image inputs.
   - There will be an option to disable this behavior on a per-agent basis in the future.
   - These outputs are only sent to the LLM upon generation, not on every message.
   - To give the LLM the image again in a later message, attach it to that message from the side panel.
   - To summarize, the LLM only gets vision context from images attached to user messages, not from generations/edits, except immediately after generation.

---

## Proxy Support

All image generation tools support proxy configuration through the `PROXY` environment variable:

```bash
PROXY=http://proxy-url:port
```

## Error Handling
If a tool encounters an error, it returns an error message explaining what went wrong. Common issues include:
- Invalid API key
- API unavailability
- Content policy violations
- Proxy/network issues
- Invalid parameters
- Unsupported image payload (see [Image Storage and Handling](#image-storage-and-handling) above)
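
Several of these errors come down to missing configuration, which can be checked before starting LibreChat. The sketch below looks for the variable names from the Setup sections above. The names `REQUIRED_ENV` and `missing_config` are hypothetical, and a missing key is not necessarily a problem, since users may be allowed to enter their own key from the UI.

```python
import os

# Variable names are taken from the Setup sections above.
# Each tool needs at least one variable from its tuple to be set.
REQUIRED_ENV = {
    "OpenAI Image Tools": ("IMAGE_GEN_OAI_API_KEY",),
    "Gemini Image Tools": ("GEMINI_API_KEY", "GOOGLE_SERVICE_KEY_FILE"),
    "DALL·E": ("DALLE_API_KEY", "DALLE3_API_KEY"),
    "Stable Diffusion": ("SD_WEBUI_URL",),
    "Flux": ("FLUX_API_KEY",),
}


def missing_config(env=None) -> dict:
    """Map each tool with no configured variable to the names it accepts."""
    env = os.environ if env is None else env
    return {
        tool: names
        for tool, names in REQUIRED_ENV.items()
        if not any(env.get(name, "").strip() for name in names)
    }


for tool, names in missing_config().items():
    print(f"{tool}: none of {', '.join(names)} is set")
```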

---

## Prompting

Though you can customize the prompts for [OpenAI Image Tools](#advanced-configuration) and [DALL·E](#advanced-configuration-2), the following tips inform the default prompts supplied by the tools and are useful to know when writing your own prompts.

1. Start with the **subject** and **style** (photo, oil painting, etc.).
2. Add **composition** and **camera / medium** ("wide-angle shot of…", "watercolour…").
3. Mention **lighting & mood** ("golden hour", "dramatic shadows").
4. Finish with **detail keywords** (textures, colours, expressions).
5. Keep negatives positive—describe what should be included, not what to avoid.

Example:

> A cinematic photo of an antique library bathed in warm afternoon sunlight. Tall wooden shelves overflow with leather-bound books, and dust particles shimmer in the light. A single green-shaded banker's lamp illuminates an open atlas on a polished mahogany desk in the foreground. 85 mm lens, shallow depth of field, rich amber tones, ultra-high detail.
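
The ordering in the tips above can be sketched as a small helper that assembles a prompt from its parts. The function `compose_prompt` and its argument names are hypothetical and only illustrate the ordering; the tools themselves accept plain text prompts.

```python
def compose_prompt(subject, style, composition="", lighting="", details=()):
    """Assemble a prompt in the order of the tips: subject and style first, details last."""
    opening = f"{style} of {subject}"
    parts = [opening, composition, lighting, ", ".join(details)]
    # Drop empty parts and end each remaining one with a single period.
    return " ".join(f"{part.strip().rstrip('.')}." for part in parts if part.strip())


print(compose_prompt(
    subject="an antique library",
    style="A cinematic photo",
    composition="85 mm lens, shallow depth of field",
    lighting="Warm afternoon sunlight",
    details=("Rich amber tones", "ultra-high detail"),
))
# A cinematic photo of an antique library. 85 mm lens, shallow depth of field. Warm afternoon sunlight. Rich amber tones, ultra-high detail.
```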