docs: Add Gemini Image Generation tool documentation (#452)

* docs: Add Gemini Image Generation tool documentation

* docs: Add Gemini Image Tools to image_gen features page

* fix: Update default model to gemini-2.5-flash-image
Joseph Licata
2026-01-03 12:48:43 -05:00
committed by GitHub
parent f2a90bb6ac
commit 2bab1c9cd2
3 changed files with 223 additions and 4 deletions


@@ -1,4 +1,5 @@
export default {
  index: 'Intro',
  flux: 'Flux',
  gemini_image_gen: 'Gemini Image Gen',
}


@@ -0,0 +1,157 @@
---
title: Gemini Image Generation
description: Setup and usage instructions for Google Gemini image generation
---
# Gemini Image Generation
Gemini Image Generation integrates Google's Gemini image models for text-to-image generation and context-aware image editing. It works with either the Gemini API (a single API key) or Google Cloud Vertex AI (a service account).
## Setup Instructions
You can use either the Gemini API (recommended for most users) or Vertex AI with a service account.
### Option 1: Gemini API (Recommended)
1. Get your API key from [Google AI Studio](https://aistudio.google.com/app/apikey)
2. Set the `GEMINI_API_KEY` environment variable in your `.env` file:
```bash
GEMINI_API_KEY=your_api_key_here
```
### Option 2: Vertex AI (For Enterprise/GCP Users)
1. Create a service account in Google Cloud Console with Vertex AI permissions
2. Download the service account JSON key file
3. Configure the environment variables:
```bash
# Path to your service account JSON file
GOOGLE_SERVICE_KEY_FILE=/path/to/service-account.json
# Optional: Set the location (default: global)
GOOGLE_CLOUD_LOCATION=us-central1
```
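The two setup options can be summarized as a small decision procedure. The following is an illustrative Python sketch, not LibreChat's actual implementation: the function name `resolve_gemini_auth` is hypothetical, and the assumption that the API key takes precedence when both options are configured is not confirmed by this guide. Only the variable names, the `global` default, and the error text come from this page.

```python
import os
from typing import Mapping, Optional


def resolve_gemini_auth(env: Optional[Mapping[str, str]] = None) -> dict:
    """Pick an auth mode from the documented environment variables.

    Assumption (not confirmed by the docs): the API key wins when both
    GEMINI_API_KEY and GOOGLE_SERVICE_KEY_FILE are set.
    """
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY", "").strip()
    key_file = env.get("GOOGLE_SERVICE_KEY_FILE", "").strip()

    if api_key:
        return {"mode": "gemini_api", "api_key": api_key}
    if key_file:
        # "global" is the documented default location.
        location = env.get("GOOGLE_CLOUD_LOCATION", "").strip() or "global"
        return {"mode": "vertex_ai", "key_file": key_file, "location": location}
    # Same wording as the error listed under Common Issues.
    raise ValueError("GEMINI_API_KEY or service account required")
```

Passing a plain dictionary instead of reading `os.environ` makes it easy to check a candidate `.env` configuration before restarting the server.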
## Configuration Options
### Model Selection
You can choose which Gemini image model to use via the `GEMINI_IMAGE_MODEL` environment variable:
```bash
# Default model
GEMINI_IMAGE_MODEL=gemini-2.5-flash-image
# Or use the newer Gemini 3 Pro Image model (set only one value)
# GEMINI_IMAGE_MODEL=gemini-3-pro-image-preview
```
### Available Models
| Model | Description |
|-------|-------------|
| `gemini-2.5-flash-image` | Default model, fast and efficient |
| `gemini-3-pro-image-preview` | Higher quality, more detailed generations |
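As a rough illustration of the fallback behavior described above, the Python sketch below resolves the model name from the environment and falls back to the documented default. The helper name `resolve_image_model` and the idea of flagging undocumented model IDs are assumptions made for this example; the guide does not say how unknown values are handled.

```python
import os
from typing import Mapping, Optional, Tuple

DEFAULT_MODEL = "gemini-2.5-flash-image"
# The two models listed in the table above.
DOCUMENTED_MODELS = {DEFAULT_MODEL, "gemini-3-pro-image-preview"}


def resolve_image_model(env: Optional[Mapping[str, str]] = None) -> Tuple[str, bool]:
    """Return (model_id, is_documented).

    An unset or empty GEMINI_IMAGE_MODEL falls back to the default.
    Hypothetical behavior: unknown IDs are passed through but flagged.
    """
    env = os.environ if env is None else env
    model = env.get("GEMINI_IMAGE_MODEL", "").strip() or DEFAULT_MODEL
    return model, model in DOCUMENTED_MODELS
```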
## Features
### Core Capabilities
- **Text-to-Image Generation**: Create images from detailed text descriptions
- **Image Context Support**: Use existing images as context/inspiration for new generations
- **Image Editing**: Generate new images based on modifications to existing ones
- **Safety Filtering**: Built-in content safety with user-friendly error messages
- **Multi-Storage Support**: Compatible with local, S3, Azure, and Firebase storage strategies
### Parameters
The Gemini Image Gen tool accepts the following parameters:
- **prompt** (required): A detailed text description of the desired image, up to 32,000 characters
- **image_ids** (optional): Array of image IDs to use as visual context for generation
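The parameter rules above can be expressed as a small validation step. This is a minimal Python sketch under stated assumptions: the function `build_tool_input` is hypothetical, image IDs are assumed to be non-empty strings, and the real tool may validate its input differently.

```python
from typing import Any, Dict, List, Optional

MAX_PROMPT_CHARS = 32_000  # documented upper bound for `prompt`


def build_tool_input(prompt: str, image_ids: Optional[List[str]] = None) -> Dict[str, Any]:
    """Validate and assemble the two documented parameters."""
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt is required")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")

    payload: Dict[str, Any] = {"prompt": prompt}
    if image_ids:
        # Assumption: IDs are non-empty strings.
        if not all(isinstance(i, str) and i for i in image_ids):
            raise ValueError("image_ids must be non-empty strings")
        payload["image_ids"] = list(image_ids)
    return payload
```

Omitting `image_ids` entirely, rather than sending an empty array, keeps a plain text-to-image request distinct from a context-aware one.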
## Best Practices
### Prompt Writing
1. **Be specific and detailed** in your descriptions
2. **Start with the image type**: photo, oil painting, watercolor, illustration, cartoon, drawing, vector, render, etc.
3. **Include key elements**:
- Subject matter and composition
- Style and artistic approach
- Lighting and atmosphere
- Color palette preferences
- Technical specifications
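One way to apply these guidelines consistently is to assemble the prompt from named parts, leading with the image type. The helper below is a hypothetical Python sketch written for this guide; the labels it emits ("Style", "Lighting", and so on) are an illustrative convention, not something the Gemini models require.

```python
from typing import Optional


def compose_prompt(
    image_type: str,
    subject: str,
    style: Optional[str] = None,
    lighting: Optional[str] = None,
    palette: Optional[str] = None,
    technical: Optional[str] = None,
) -> str:
    """Build a prompt that starts with the image type, then adds key elements."""
    parts = [f"{image_type.strip().capitalize()} of {subject.strip()}"]
    optional_parts = (
        ("Style", style),
        ("Lighting", lighting),
        ("Color palette", palette),
        ("Technical details", technical),
    )
    for label, value in optional_parts:
        # Skip elements that were not provided.
        if value and value.strip():
            parts.append(f"{label}: {value.strip()}")
    return ". ".join(parts) + "."
```

For example, `compose_prompt("photo", "a red bridge over a koi pond", lighting="golden hour")` yields a prompt that opens with the image type and then names the lighting.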
### Image Editing Tips
When editing existing images:
1. **Include the original image ID** in the `image_ids` array
2. **Use direct editing instructions**:
- "Remove the background from this image"
- "Add sunglasses to the person in this image"
- "Change the color of the car to red"
3. **Don't reconstruct the original prompt**: use simple, direct modification instructions instead
## Usage Examples
### Basic Image Generation
> A serene Japanese garden at golden hour, featuring a traditional red bridge over a koi pond. Cherry blossom trees frame the scene with soft pink petals falling. Photorealistic style with warm, diffused lighting and rich colors.
### Image with Context
When you have an existing image and want to create something inspired by it:
1. Reference the image ID in the `image_ids` parameter
2. Describe what you want: "Create a winter version of this landscape scene with snow-covered trees and a frozen lake"
### Image Editing
To modify an existing image:
1. Include the image ID in `image_ids`
2. Describe the change: "Remove the person from the background of this image"
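The three usage patterns above differ only in whether `image_ids` is present and in how the prompt is phrased. The Python sketch below prints one example payload per pattern. The image ID `img_abc123` is a made-up placeholder, and the JSON shape is an assumption based on the two documented parameters rather than a confirmed wire format.

```python
import json

# Hypothetical ID of a previously generated or uploaded image.
EXISTING_IMAGE_ID = "img_abc123"

examples = {
    # Basic generation: prompt only.
    "generate": {
        "prompt": (
            "A serene Japanese garden at golden hour, featuring a traditional "
            "red bridge over a koi pond."
        ),
    },
    # Context: reference an image and describe the new result.
    "with_context": {
        "prompt": (
            "Create a winter version of this landscape scene with "
            "snow-covered trees and a frozen lake"
        ),
        "image_ids": [EXISTING_IMAGE_ID],
    },
    # Editing: reference the image and give a direct instruction.
    "edit": {
        "prompt": "Remove the person from the background of this image",
        "image_ids": [EXISTING_IMAGE_ID],
    },
}

for name, payload in examples.items():
    print(name, json.dumps(payload, indent=2))
```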
## Error Handling
### Common Issues
| Error | Solution |
|-------|----------|
| "Image blocked by content safety filters" | Modify your prompt to avoid content that violates safety policies |
| "No image was generated" | Try a different prompt or simplify your request |
| "GEMINI_API_KEY or service account required" | Ensure you've configured either the API key or Vertex AI credentials |
### Safety Filtering
Gemini includes built-in safety filters. If your image is blocked:
- Review your prompt for potentially problematic content
- Try rephrasing to be more specific about artistic intent
- Avoid requests for harmful, violent, or explicit content
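If you surface these errors in your own tooling, the Common Issues table can be turned into a simple lookup. This is a hedged Python sketch: the function `hint_for` is hypothetical, and matching by case-insensitive substring is an assumption, since the exact error strings returned at runtime may carry extra text.

```python
from typing import Optional

# Error text and suggested fixes taken from the Common Issues table.
ERROR_HINTS = {
    "Image blocked by content safety filters": (
        "Modify your prompt to avoid content that violates safety policies."
    ),
    "No image was generated": "Try a different prompt or simplify your request.",
    "GEMINI_API_KEY or service account required": (
        "Configure either the API key or Vertex AI credentials."
    ),
}


def hint_for(error_message: str) -> Optional[str]:
    """Return the documented fix for a known error, or None if unrecognized."""
    lowered = error_message.lower()
    for needle, hint in ERROR_HINTS.items():
        if needle.lower() in lowered:
            return hint
    return None
```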
## Technical Details
### Storage Integration
Generated images are automatically saved using your configured file strategy:
- **Local**: Saved to `client/public/images/{userId}/`
- **S3/Azure/Firebase**: Uploaded to your configured cloud storage
### Image Format
- Output format: PNG
- Images include unique identifiers for reference in subsequent requests
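For the local strategy, the resulting file location can be pictured as follows. This Python sketch is illustrative only: the directory layout and the PNG extension come from this page, while the file-naming scheme (a random hex identifier) and the helper `local_image_path` are assumptions.

```python
import posixpath
import uuid
from typing import Optional

LOCAL_IMAGE_ROOT = "client/public/images"  # documented local directory


def local_image_path(user_id: str, image_id: Optional[str] = None) -> str:
    """Build a per-user PNG path under the documented local directory.

    Assumption: the file name is '<unique id>.png'; the real naming
    scheme is not specified in this guide.
    """
    image_id = image_id or uuid.uuid4().hex
    return posixpath.join(LOCAL_IMAGE_ROOT, user_id, f"{image_id}.png")
```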
## Rate Limits
Rate limits depend on your API tier:
- **Gemini API**: Check [Google AI Studio](https://aistudio.google.com/) for current limits
- **Vertex AI**: Based on your Google Cloud project quotas


@@ -12,6 +12,7 @@ Each has its own look, price-point, and setup step (usually just an API key or U
| Tool | Best for | Needs |
|------|----------|-------|
| **OpenAI Image Tools** | Cutting-edge results (GPT-Image-1).<br/>Can also ***edit*** the images you upload. | OpenAI API |
| **Gemini Image Tools** | Google's latest image models with context-aware generation. | Gemini API or Vertex AI |
| **DALL·E (3 / 2)** | Legacy OpenAI Image models. | OpenAI API |
| **Stable Diffusion** | Local or self-hosted generation, endless community models. | Automatic1111 API |
| **Flux** | Fast cloud renders, optional fine-tunes. | Flux API |
@@ -120,7 +121,67 @@ See the [GPT-Image-1 pricing page](https://platform.openai.com/docs/models/gpt-i
---
## 2 · Gemini Image Tools
Gemini Image Tools integrate Google's latest image generation models, supporting both text-to-image generation and context-aware image editing.
### Features
- **Text-to-Image Generation**: Create high-quality images from detailed text descriptions
- **Image Context Support**: Use existing images as context or inspiration for new generations
- **Image Editing**: Generate new images based on modifications to existing ones (include original image ID)
- **Multiple Models**: Choose between `gemini-2.5-flash-image` (default) or `gemini-3-pro-image-preview`
- **Dual API Support**: Works with both simple Gemini API keys and Google Cloud Vertex AI
### Parameters
• **prompt**: Detailed text description of the desired image (required, up to 32,000 characters)
• **image_ids**: Optional array of image IDs to use as visual context for generation
### Setup
#### Option 1: Gemini API (Recommended)
Get an API key from [Google AI Studio](https://aistudio.google.com/app/apikey):
```bash
GEMINI_API_KEY=your_api_key_here
```
#### Option 2: Vertex AI (Enterprise)
For Google Cloud users with Vertex AI access:
```bash
GOOGLE_SERVICE_KEY_FILE=/path/to/service-account.json
GOOGLE_CLOUD_LOCATION=us-central1 # optional, default: global
```
### Model Selection
```bash
# Default model (fast and efficient)
GEMINI_IMAGE_MODEL=gemini-2.5-flash-image
# Higher quality model (set only one value)
# GEMINI_IMAGE_MODEL=gemini-3-pro-image-preview
```
### Advanced Configuration
Customize tool descriptions via environment variables:
```bash
GEMINI_IMAGE_GEN_DESCRIPTION=...
GEMINI_IMAGE_GEN_PROMPT_DESCRIPTION=...
GEMINI_IMAGE_IDS_DESCRIPTION=...
```
More details can be found in the dedicated [Gemini Image Gen guide](/docs/configuration/tools/gemini_image_gen).
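To make the override behavior concrete, here is a minimal Python sketch of how a description override with a fallback might be resolved. Everything beyond the three environment variable names is an assumption: the default texts are placeholders, the mapping of each variable to a field is inferred from its name, and `resolve_descriptions` is a hypothetical helper rather than LibreChat code.

```python
import os
from typing import Dict, Mapping, Optional, Tuple

# field -> (environment variable, placeholder default text)
DESCRIPTION_FIELDS: Dict[str, Tuple[str, str]] = {
    "tool": ("GEMINI_IMAGE_GEN_DESCRIPTION", "Generate or edit images with Gemini."),
    "prompt": ("GEMINI_IMAGE_GEN_PROMPT_DESCRIPTION", "Detailed description of the image."),
    "image_ids": ("GEMINI_IMAGE_IDS_DESCRIPTION", "Image IDs to use as visual context."),
}


def resolve_descriptions(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Use the override when set and non-empty, else the placeholder default."""
    env = os.environ if env is None else env
    return {
        field: env.get(var, "").strip() or default
        for field, (var, default) in DESCRIPTION_FIELDS.items()
    }
```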
---
## 3 · DALL·E (legacy)
DALL·E provides high-quality image generation using OpenAI's legacy image models.
@@ -159,7 +220,7 @@ See the [DALL-E pricing page](https://platform.openai.com/docs/models/dall-e-3)
---
## 4 · Stable Diffusion (local)
Run images entirely on your own machine or server.
Point LibreChat at any Automatic1111 (or compatible) endpoint and you're set.
@@ -188,7 +249,7 @@ More details on setting up Automatic1111 can be found in the dedicated [Stable D
---
## 5 · Flux
Cloud generator with an emphasis on speed and optional fine-tuned models.
@@ -221,7 +282,7 @@ See the [Flux pricing page](https://docs.bfl.ml/pricing/) for details on costs a
---
## 6 · Model Context Protocol (MCP)
Image outputs are supported from MCP servers.