mirror of
https://github.com/LibreChat-AI/librechat.ai.git
synced 2026-03-27 10:48:32 +07:00
docs: Add Gemini Image Generation tool documentation (#452)
* docs: Add Gemini Image Generation tool documentation
* docs: Add Gemini Image Tools to image_gen features page
* fix: Update default model to gemini-2.5-flash-image
@@ -1,4 +1,5 @@
 export default {
   index: 'Intro',
   flux: 'Flux',
+  gemini_image_gen: 'Gemini Image Gen',
 }
157
pages/docs/configuration/tools/gemini_image_gen.mdx
Normal file
@@ -0,0 +1,157 @@
---
title: Gemini Image Generation
description: Setup and usage instructions for Google Gemini image generation
---

# Gemini Image Generation

Gemini Image Generation is a powerful tool that integrates Google's Gemini Image Models for high-quality text-to-image generation and image context-aware editing. It supports both the simple Gemini API and Google Cloud Vertex AI.

## Setup Instructions

You can use either the Gemini API (recommended for most users) or Vertex AI with a service account.

### Option 1: Gemini API (Recommended)

1. Get your API key from [Google AI Studio](https://aistudio.google.com/app/apikey)
2. Set the `GEMINI_API_KEY` environment variable in your `.env` file:

```bash
GEMINI_API_KEY=your_api_key_here
```

### Option 2: Vertex AI (For Enterprise/GCP Users)

1. Create a service account in Google Cloud Console with Vertex AI permissions
2. Download the service account JSON key file
3. Configure the environment variables:

```bash
# Path to your service account JSON file
GOOGLE_SERVICE_KEY_FILE=/path/to/service-account.json

# Optional: Set the location (default: global)
GOOGLE_CLOUD_LOCATION=us-central1
```

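
To check which of the two setup options a given environment would select, a small helper along these lines may be useful. This is a minimal sketch, not LibreChat's actual implementation: the function name `detectGeminiAuth` and the returned `mode` values are hypothetical, and giving the API key precedence when both options are configured is an assumption.

```javascript
// Hypothetical helper for illustration only; not part of LibreChat.
// Reports which credential option from the setup steps is configured.
function detectGeminiAuth(env) {
  const apiKey = (env.GEMINI_API_KEY || '').trim();
  if (apiKey !== '') {
    // Assumption: the API key wins when both options are set.
    return { mode: 'gemini_api' };
  }

  const keyFile = (env.GOOGLE_SERVICE_KEY_FILE || '').trim();
  if (keyFile !== '') {
    // The location is optional; the documented default is "global".
    const location = (env.GOOGLE_CLOUD_LOCATION || '').trim() || 'global';
    return { mode: 'vertex_ai', keyFile, location };
  }

  // Neither option is configured.
  return { mode: 'none' };
}

console.log(detectGeminiAuth({ GEMINI_API_KEY: 'your_api_key_here' }).mode);
```

In a Node.js process, passing `process.env` instead of the literal object would inspect the real environment.
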
## Configuration Options

### Model Selection

You can choose which Gemini image model to use via environment variable:

```bash
# Default model
GEMINI_IMAGE_MODEL=gemini-2.5-flash-image

# Or use the newer Gemini 3 Pro Image model
GEMINI_IMAGE_MODEL=gemini-3-pro-image-preview
```

### Available Models

| Model | Description |
|-------|-------------|
| `gemini-2.5-flash-image` | Default model, fast and efficient |
| `gemini-3-pro-image-preview` | Higher quality, more detailed generations |

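
The fallback to the default model can be pictured as follows. This is a sketch under stated assumptions rather than the tool's real code: `resolveImageModel` and the `listed` flag are hypothetical names, and the only facts taken from this page are the variable name `GEMINI_IMAGE_MODEL` and the two model IDs in the table.

```javascript
// Hypothetical sketch; names other than GEMINI_IMAGE_MODEL and the
// two model IDs are illustrative.
const DEFAULT_IMAGE_MODEL = 'gemini-2.5-flash-image';
const LISTED_IMAGE_MODELS = ['gemini-2.5-flash-image', 'gemini-3-pro-image-preview'];

function resolveImageModel(env) {
  const requested = (env.GEMINI_IMAGE_MODEL || '').trim();
  // Use the documented default when the variable is unset or blank.
  const model = requested === '' ? DEFAULT_IMAGE_MODEL : requested;
  // "listed" only says whether the ID appears in the table on this page;
  // other IDs are passed through unchanged.
  return { model, listed: LISTED_IMAGE_MODELS.includes(model) };
}

console.log(resolveImageModel({}).model);
```

Note that a `.env` file should contain only one `GEMINI_IMAGE_MODEL` line; the two assignments in the example block are alternatives.
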
## Features

### Core Capabilities

- **Text-to-Image Generation**: Create images from detailed text descriptions
- **Image Context Support**: Use existing images as context/inspiration for new generations
- **Image Editing**: Generate new images based on modifications to existing ones
- **Safety Filtering**: Built-in content safety with user-friendly error messages
- **Multi-Storage Support**: Compatible with local, S3, Azure, and Firebase storage strategies

### Parameters

The Gemini Image Gen tool accepts the following parameters:

- **prompt** (required) – A detailed text description of the desired image, up to 32,000 characters
- **image_ids** (optional) – Array of image IDs to use as visual context for generation

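
The two parameters above could be checked before a request is sent, roughly as sketched here. The function `validateImageGenArgs` is hypothetical, and two details are assumptions: that image IDs are strings, and that the 32,000-character limit is counted with JavaScript's `String.length`.

```javascript
// Hypothetical validator for illustration; not the tool's real schema.
const MAX_PROMPT_LENGTH = 32000;

function validateImageGenArgs(args) {
  const errors = [];

  // prompt is required and limited to 32,000 characters.
  if (typeof args.prompt !== 'string' || args.prompt.trim() === '') {
    errors.push('prompt is required');
  } else if (args.prompt.length > MAX_PROMPT_LENGTH) {
    errors.push('prompt exceeds 32,000 characters');
  }

  // image_ids is optional; when present it must be an array of IDs.
  // Assumption: IDs are non-empty strings.
  if (args.image_ids !== undefined) {
    const valid =
      Array.isArray(args.image_ids) &&
      args.image_ids.every((id) => typeof id === 'string' && id !== '');
    if (!valid) {
      errors.push('image_ids must be an array of non-empty strings');
    }
  }

  return errors;
}

// An edit-style request; "example-image-id" is a placeholder ID.
console.log(validateImageGenArgs({
  prompt: 'Change the color of the car to red',
  image_ids: ['example-image-id'],
}));
```

An empty array returned by the validator means the arguments satisfy both documented constraints.
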
## Best Practices

### Prompt Writing

1. **Be specific and detailed** in your descriptions
2. **Start with the image type**: photo, oil painting, watercolor, illustration, cartoon, drawing, vector, render, etc.
3. **Include key elements**:
   - Subject matter and composition
   - Style and artistic approach
   - Lighting and atmosphere
   - Color palette preferences
   - Technical specifications

### Image Editing Tips

When editing existing images:

1. **Include the original image ID** in the `image_ids` array
2. **Use direct editing instructions**:
   - "Remove the background from this image"
   - "Add sunglasses to the person in this image"
   - "Change the color of the car to red"
3. **Don't reconstruct the original prompt** – use simple, direct modification instructions

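
The prompt-writing advice above (image type first, then the key elements) can be turned into a small template. The sketch below is only one possible way to assemble a prompt; `composePrompt` and its field names are hypothetical and are not part of the tool.

```javascript
// Hypothetical prompt builder for illustration.
// Puts the image type first, then appends any optional details.
function composePrompt(parts) {
  const lead = `${parts.type} of ${parts.subject}`;
  const details = [parts.style, parts.lighting, parts.palette, parts.technical]
    .filter((value) => typeof value === 'string' && value.trim() !== '')
    .map((value) => value.trim());
  // Join the sentences with ". " and close with a final period.
  return [lead, ...details].join('. ') + '.';
}

console.log(composePrompt({
  type: 'Watercolor',
  subject: 'a lighthouse on a rocky coast',
  lighting: 'Soft morning light with light fog',
  palette: 'Muted blues and warm greys',
}));
```

For edits, this kind of template is not needed: a short, direct instruction plus the image ID is the recommended form.
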
## Usage Examples

### Basic Image Generation

> A serene Japanese garden at golden hour, featuring a traditional red bridge over a koi pond. Cherry blossom trees frame the scene with soft pink petals falling. Photorealistic style with warm, diffused lighting and rich colors.

### Image with Context

When you have an existing image and want to create something inspired by it:

1. Reference the image ID in the `image_ids` parameter
2. Describe what you want: "Create a winter version of this landscape scene with snow-covered trees and a frozen lake"

### Image Editing

To modify an existing image:

1. Include the image ID in `image_ids`
2. Describe the change: "Remove the person from the background of this image"

## Error Handling

### Common Issues

| Error | Solution |
|-------|----------|
| "Image blocked by content safety filters" | Modify your prompt to avoid content that violates safety policies |
| "No image was generated" | Try a different prompt or simplify your request |
| "GEMINI_API_KEY or service account required" | Ensure you've configured either the API key or Vertex AI credentials |

### Safety Filtering

Gemini includes built-in safety filters. If your image is blocked:

- Review your prompt for potentially problematic content
- Try rephrasing to be more specific about artistic intent
- Avoid requests for harmful, violent, or explicit content

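
A client that surfaces these errors could map each message to the suggested fix from the Common Issues table. The sketch below assumes the error text contains the quoted phrases; `suggestFix` and `ERROR_HINTS` are hypothetical names, and the exact wording the tool returns may differ.

```javascript
// Hypothetical lookup for illustration. The phrases come from the
// Common Issues table; matching on substrings is an assumption.
const ERROR_HINTS = [
  {
    match: 'blocked by content safety filters',
    hint: 'Modify your prompt to avoid content that violates safety policies.',
  },
  {
    match: 'no image was generated',
    hint: 'Try a different prompt or simplify your request.',
  },
  {
    match: 'gemini_api_key or service account required',
    hint: 'Configure either the API key or Vertex AI credentials.',
  },
];

function suggestFix(message) {
  const text = String(message).toLowerCase();
  const entry = ERROR_HINTS.find((candidate) => text.includes(candidate.match));
  // Return null for messages the table does not cover.
  return entry ? entry.hint : null;
}

console.log(suggestFix('No image was generated'));
```
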
## Technical Details

### Storage Integration

Generated images are automatically saved using your configured file strategy:

- **Local**: Saved to `client/public/images/{userId}/`
- **S3/Azure/Firebase**: Uploaded to your configured cloud storage

### Image Format

- Output format: PNG
- Images include unique identifiers for reference in subsequent requests

## Rate Limits

Rate limits depend on your API tier:

- **Gemini API**: Check [Google AI Studio](https://aistudio.google.com/) for current limits
- **Vertex AI**: Based on your Google Cloud project quotas

@@ -12,6 +12,7 @@ Each has its own look, price-point, and setup step (usually just an API key or U
 | Tool | Best for | Needs |
 |------|----------|-------|
 | **OpenAI Image Tools** | Cutting-edge results (GPT-Image-1).<br/>Can also ***edit*** the images you upload. | OpenAI API |
+| **Gemini Image Tools** | Google's latest image models with context-aware generation. | Gemini API or Vertex AI |
 | **DALL·E (3 / 2)** | Legacy OpenAI Image models. | OpenAI API |
 | **Stable Diffusion** | Local or self-hosted generation, endless community models. | Automatic1111 API |
 | **Flux** | Fast cloud renders, optional fine-tunes. | Flux API |
@@ -120,7 +121,67 @@ See the [GPT-Image-1 pricing page](https://platform.openai.com/docs/models/gpt-i
 
 ---
 
-## 2 · DALL·E (legacy)
+## 2 · Gemini Image Tools
+
+Gemini Image Tools integrate Google's latest image generation models, supporting both text-to-image generation and image context-aware editing.
+
+### Features
+
+- **Text-to-Image Generation**: Create high-quality images from detailed text descriptions
+- **Image Context Support**: Use existing images as context or inspiration for new generations
+- **Image Editing**: Generate new images based on modifications to existing ones (include original image ID)
+- **Multiple Models**: Choose between `gemini-2.5-flash-image` (default) or `gemini-3-pro-image-preview`
+- **Dual API Support**: Works with both simple Gemini API keys and Google Cloud Vertex AI
+
+### Parameters
+
+• **prompt** – Detailed text description of the desired image (required, up to 32,000 characters)
+• **image_ids** – Optional array of image IDs to use as visual context for generation
+
+### Setup
+
+#### Option 1: Gemini API (Recommended)
+
+Get an API key from [Google AI Studio](https://aistudio.google.com/app/apikey):
+
+```bash
+GEMINI_API_KEY=your_api_key_here
+```
+
+#### Option 2: Vertex AI (Enterprise)
+
+For Google Cloud users with Vertex AI access:
+
+```bash
+GOOGLE_SERVICE_KEY_FILE=/path/to/service-account.json
+GOOGLE_CLOUD_LOCATION=us-central1 # optional, default: global
+```
+
+### Model Selection
+
+```bash
+# Default model (fast and efficient)
+GEMINI_IMAGE_MODEL=gemini-2.5-flash-image
+
+# Higher quality model
+GEMINI_IMAGE_MODEL=gemini-3-pro-image-preview
+```
+
+### Advanced Configuration
+
+Customize tool descriptions via environment variables:
+
+```bash
+GEMINI_IMAGE_GEN_DESCRIPTION=...
+GEMINI_IMAGE_GEN_PROMPT_DESCRIPTION=...
+GEMINI_IMAGE_IDS_DESCRIPTION=...
+```
+
+More details can be found in the dedicated [Gemini Image Gen guide](/docs/configuration/tools/gemini_image_gen).
+
+---
+
+## 3 · DALL·E (legacy)
 
 DALL·E provides high-quality image generation using OpenAI's legacy image models.
 
@@ -159,7 +220,7 @@ See the [DALL-E pricing page](https://platform.openai.com/docs/models/dall-e-3)
 
 ---
 
-## 3 · Stable Diffusion (local)
+## 4 · Stable Diffusion (local)
 
 Run images entirely on your own machine or server.
 Point LibreChat at any Automatic1111 (or compatible) endpoint and you're set.
@@ -188,7 +249,7 @@ More details on setting up Automatic1111 can be found in the dedicated [Stable D
 
 ---
 
-## 4 · Flux
+## 5 · Flux
 
 Cloud generator with an emphasis on speed and optional fine-tuned models.
 
@@ -221,7 +282,7 @@ See the [Flux pricing page](https://docs.bfl.ml/pricing/) for details on costs a
 
 ---
 
-## 5 · Model Context Protocol (MCP)
+## 6 · Model Context Protocol (MCP)
 
 Image outputs are supported from MCP servers.
 