docs: Add Gemini Image Generation tool documentation (#452)

* docs: Add Gemini Image Generation tool documentation
* docs: Add Gemini Image Tools to image_gen features page
* fix: Update default model to gemini-2.5-flash-image
2026-03-27 02:38:32 +07:00 · 2026-01-03 12:48:43 -05:00
parent f2a90bb6ac
commit 2bab1c9cd2
3 changed files with 223 additions and 4 deletions
--- a/pages/docs/configuration/tools/_meta.ts
+++ b/pages/docs/configuration/tools/_meta.ts
@@ -1,4 +1,5 @@
 export default {
  index: 'Intro',
  flux: 'Flux',
+  gemini_image_gen: 'Gemini Image Gen',
 }
--- a/pages/docs/configuration/tools/gemini_image_gen.mdx
+++ b/pages/docs/configuration/tools/gemini_image_gen.mdx
@@ -0,0 +1,157 @@
+---
+title: Gemini Image Generation
+description: Setup and usage instructions for Google Gemini image generation
+---
+
+# Gemini Image Generation
+
+Gemini Image Generation is a tool that uses Google's Gemini image models for high-quality text-to-image generation and for editing that uses existing images as context. It supports both the Gemini API and Google Cloud Vertex AI.
+
+## Setup Instructions
+
+You can use either the Gemini API (recommended for most users) or Vertex AI with a service account.
+
+### Option 1: Gemini API (Recommended)
+
+1. Get your API key from [Google AI Studio](https://aistudio.google.com/app/apikey)
+2. Set the `GEMINI_API_KEY` environment variable in your `.env` file:
+
+```bash
+GEMINI_API_KEY=your_api_key_here
+```
+
+### Option 2: Vertex AI (For Enterprise/GCP Users)
+
+1. Create a service account in Google Cloud Console with Vertex AI permissions
+2. Download the service account JSON key file
+3. Configure the environment variables:
+
+```bash
+# Path to your service account JSON file
+GOOGLE_SERVICE_KEY_FILE=/path/to/service-account.json
+
+# Optional: Set the location (default: global)
+GOOGLE_CLOUD_LOCATION=us-central1
+```
+
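The two setup options reduce to a single credential check. As a minimal sketch, assuming credentials are read from the environment variables shown above, the TypeScript below shows one way to resolve them. The `resolveGeminiAuth` function and `GeminiAuth` type are hypothetical, and giving `GEMINI_API_KEY` precedence when both options are set is an assumption, not documented behavior. The error text reuses the message from the Error Handling table.

```typescript
// Hypothetical sketch: choose Gemini API or Vertex AI credentials from env vars.
type Env = Record<string, string | undefined>;

type GeminiAuth =
  | { kind: "gemini-api"; apiKey: string }
  | { kind: "vertex-ai"; serviceKeyFile: string; location: string };

function resolveGeminiAuth(env: Env): GeminiAuth {
  const apiKey = env.GEMINI_API_KEY?.trim();
  // Assumption: an API key wins when both options are configured.
  if (apiKey) {
    return { kind: "gemini-api", apiKey };
  }
  const keyFile = env.GOOGLE_SERVICE_KEY_FILE?.trim();
  if (keyFile) {
    // GOOGLE_CLOUD_LOCATION is optional and defaults to "global".
    const location = env.GOOGLE_CLOUD_LOCATION?.trim() || "global";
    return { kind: "vertex-ai", serviceKeyFile: keyFile, location };
  }
  throw new Error("GEMINI_API_KEY or service account required");
}

// Usage: a Vertex AI setup without an explicit location.
console.log(resolveGeminiAuth({ GOOGLE_SERVICE_KEY_FILE: "/path/to/service-account.json" }));
```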
+## Configuration Options
+
+### Model Selection
+
+You can choose which Gemini image model to use with the `GEMINI_IMAGE_MODEL` environment variable:
+
+```bash
+# Default model
+GEMINI_IMAGE_MODEL=gemini-2.5-flash-image
+
+# Or, to use the newer Gemini 3 Pro Image model, replace the line above with:
+# GEMINI_IMAGE_MODEL=gemini-3-pro-image-preview
+```
+
+### Available Models
+
+| Model | Description |
+|-------|-------------|
+| `gemini-2.5-flash-image` | Default model, fast and efficient |
+| `gemini-3-pro-image-preview` | Higher quality, more detailed generations |
+
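A minimal sketch of applying the documented default when `GEMINI_IMAGE_MODEL` is unset. `resolveImageModel` and `DOCUMENTED_MODELS` are hypothetical names. Because these docs don't say whether unlisted model names are rejected, the sketch passes them through and only flags them.

```typescript
// Hypothetical sketch: resolve GEMINI_IMAGE_MODEL with the documented default.
type Env = Record<string, string | undefined>;

const DEFAULT_GEMINI_IMAGE_MODEL = "gemini-2.5-flash-image";

// The two models listed in the table above.
const DOCUMENTED_MODELS: readonly string[] = [
  "gemini-2.5-flash-image",
  "gemini-3-pro-image-preview",
];

function resolveImageModel(env: Env): { model: string; documented: boolean } {
  // An unset or blank value falls back to the default model.
  const model = env.GEMINI_IMAGE_MODEL?.trim() || DEFAULT_GEMINI_IMAGE_MODEL;
  // Assumption: other model names are passed through rather than rejected.
  return { model, documented: DOCUMENTED_MODELS.indexOf(model) !== -1 };
}

console.log(resolveImageModel({}).model); // prints "gemini-2.5-flash-image"
console.log(resolveImageModel({ GEMINI_IMAGE_MODEL: "gemini-3-pro-image-preview" }).model); // prints "gemini-3-pro-image-preview"
```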
+## Features
+
+### Core Capabilities
+
+- **Text-to-Image Generation**: Create images from detailed text descriptions
+- **Image Context Support**: Use existing images as context/inspiration for new generations
+- **Image Editing**: Generate new images based on modifications to existing ones
+- **Safety Filtering**: Built-in content safety with user-friendly error messages
+- **Multi-Storage Support**: Compatible with local, S3, Azure, and Firebase storage strategies
+
+### Parameters
+
+The Gemini Image Gen tool accepts the following parameters:
+
+- **prompt** (required) – A detailed text description of the desired image, up to 32,000 characters
+- **image_ids** (optional) – Array of image IDs to use as visual context for generation
+
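The parameter contract above can be sketched as a TypeScript type plus a validator. This is an illustration under assumptions (the `validateArgs` name and its error strings are hypothetical), not the tool's real schema. JavaScript string `length` counts UTF-16 code units, so it only approximates a character count for text such as emoji.

```typescript
// Hypothetical sketch of the documented input contract (not LibreChat's actual schema).
interface GeminiImageGenArgs {
  prompt: string; // required, up to 32,000 characters
  image_ids?: string[]; // optional image IDs used as visual context
}

const MAX_PROMPT_LENGTH = 32_000;

// Returns a list of problems; an empty list means the arguments look valid.
function validateArgs(input: unknown): string[] {
  if (typeof input !== "object" || input === null) {
    return ["arguments must be an object"];
  }
  const args = input as Record<string, unknown>;
  const errors: string[] = [];

  const prompt = args.prompt;
  if (typeof prompt !== "string" || prompt.trim() === "") {
    errors.push("prompt is required");
  } else if (prompt.length > MAX_PROMPT_LENGTH) {
    // length counts UTF-16 code units, an approximation of "characters"
    errors.push(`prompt exceeds ${MAX_PROMPT_LENGTH} characters`);
  }

  const ids = args.image_ids;
  if (ids !== undefined && !(Array.isArray(ids) && ids.every((id) => typeof id === "string"))) {
    errors.push("image_ids must be an array of strings");
  }
  return errors;
}

const example: GeminiImageGenArgs = { prompt: "A watercolor of a lighthouse at dawn" };
console.log(validateArgs(example)); // prints []
```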
+## Best Practices
+
+### Prompt Writing
+
+1. **Be specific and detailed** in your descriptions
+2. **Start with the image type**: photo, oil painting, watercolor, illustration, cartoon, drawing, vector, render, etc.
+3. **Include key elements**:
+   - Subject matter and composition
+   - Style and artistic approach
+   - Lighting and atmosphere
+   - Color palette preferences
+   - Technical details such as camera angle, lens, or level of detail
+
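To make the ordering concrete, here is a hypothetical helper that assembles a prompt from the elements listed above, with the image type first. `PromptParts` and `buildPrompt` are illustrative names only; the tool itself just receives the final `prompt` string.

```typescript
// Hypothetical helper that assembles a prompt in the order recommended above:
// image type first, then subject, style, lighting, colors, and technical details.
interface PromptParts {
  imageType: string; // e.g. "photo", "oil painting", "vector illustration"
  subject: string; // subject matter and composition
  style?: string;
  lighting?: string;
  colors?: string;
  technical?: string;
}

// Trim, drop trailing periods, and capitalize so fragments join cleanly.
function tidy(fragment: string): string {
  const t = fragment.trim().replace(/\.+$/, "");
  return t.charAt(0).toUpperCase() + t.slice(1);
}

function buildPrompt(p: PromptParts): string {
  const fragments = [`${p.imageType} of ${p.subject}`, p.style, p.lighting, p.colors, p.technical]
    // Skip elements that were not provided or are blank.
    .filter((s): s is string => typeof s === "string" && s.trim() !== "")
    .map(tidy);
  return fragments.join(". ") + ".";
}

console.log(
  buildPrompt({
    imageType: "photo",
    subject: "a traditional red bridge over a koi pond in a Japanese garden",
    lighting: "warm, diffused golden-hour light",
    colors: "rich greens and soft pink cherry blossoms",
  }),
);
```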
+### Image Editing Tips
+
+When editing existing images:
+
+1. **Include the original image ID** in the `image_ids` array
+2. **Use direct editing instructions**:
+   - "Remove the background from this image"
+   - "Add sunglasses to the person in this image"
+   - "Change the color of the car to red"
+3. **Don't rewrite the original prompt** – give simple, direct modification instructions instead
+
+## Usage Examples
+
+### Basic Image Generation
+
+> A serene Japanese garden at golden hour, featuring a traditional red bridge over a koi pond. Cherry blossom trees frame the scene with soft pink petals falling. Photorealistic style with warm, diffused lighting and rich colors.
+
+### Image with Context
+
+When you have an existing image and want to create something inspired by it:
+
+1. Reference the image ID in the `image_ids` parameter
+2. Describe what you want: "Create a winter version of this landscape scene with snow-covered trees and a frozen lake"
+
+### Image Editing
+
+To modify an existing image:
+
+1. Include the image ID in `image_ids`
+2. Describe the change: "Remove the person from the background of this image"
+
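Expressed as tool arguments, the three examples above might look like the objects below. The `GeminiImageGenArgs` type mirrors the documented parameters, but the image IDs are hypothetical placeholders; real IDs come from existing images, such as ones generated earlier.

```typescript
// Hypothetical argument objects for the three usage examples above.
interface GeminiImageGenArgs {
  prompt: string;
  image_ids?: string[];
}

// Basic generation: prompt only, no image context.
const basic: GeminiImageGenArgs = {
  prompt:
    "A serene Japanese garden at golden hour, featuring a traditional red bridge over a koi pond. " +
    "Cherry blossom trees frame the scene with soft pink petals falling. " +
    "Photorealistic style with warm, diffused lighting and rich colors.",
};

// Image with context: reference an existing image and describe the new one.
const withContext: GeminiImageGenArgs = {
  prompt: "Create a winter version of this landscape scene with snow-covered trees and a frozen lake",
  image_ids: ["landscape-image-id"], // hypothetical ID
};

// Image editing: reference the original image and state the change directly.
const edit: GeminiImageGenArgs = {
  prompt: "Remove the person from the background of this image",
  image_ids: ["photo-image-id"], // hypothetical ID
};

// Context and editing requests differ only in how the prompt is phrased.
for (const args of [basic, withContext, edit]) {
  console.log(args.image_ids?.length ?? 0, args.prompt.slice(0, 40));
}
```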
+## Error Handling
+
+### Common Issues
+
+| Error | Solution |
+|-------|----------|
+| "Image blocked by content safety filters" | Modify your prompt to avoid content that violates safety policies |
+| "No image was generated" | Try a different prompt or simplify your request |
+| "GEMINI_API_KEY or service account required" | Set either `GEMINI_API_KEY` or the Vertex AI variable `GOOGLE_SERVICE_KEY_FILE` |
+
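As a rough illustration of where the first two messages could come from, the sketch below inspects a heavily simplified Gemini `generateContent` response. It models only `promptFeedback.blockReason`, `candidates[].finishReason`, and `inlineData` parts; the `interpretResponse` function and the exact mapping are assumptions, and LibreChat's own handling may differ.

```typescript
// Simplified subset of a Gemini generateContent response (assumption: only the
// fields needed for this illustration are modeled).
interface InlineData {
  mimeType: string;
  data: string; // base64-encoded image bytes
}
interface Part {
  text?: string;
  inlineData?: InlineData;
}
interface Candidate {
  finishReason?: string;
  content?: { parts?: Part[] };
}
interface GenerateResponse {
  promptFeedback?: { blockReason?: string };
  candidates?: Candidate[];
}

type Result = { ok: true; image: InlineData } | { ok: false; message: string };

function interpretResponse(res: GenerateResponse): Result {
  const candidate = res.candidates?.[0];
  // A prompt-level block or a SAFETY finish reason is reported as a safety block.
  // The real API has other block-related finish reasons a full implementation would also handle.
  if (res.promptFeedback?.blockReason || candidate?.finishReason === "SAFETY") {
    return { ok: false, message: "Image blocked by content safety filters" };
  }
  // Use the first part that carries inline image data; text parts are ignored.
  const image = candidate?.content?.parts?.find((p) => p.inlineData)?.inlineData;
  if (!image) {
    return { ok: false, message: "No image was generated" };
  }
  return { ok: true, image };
}
```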
+### Safety Filtering
+
+Gemini includes built-in safety filters. If your image is blocked:
+
+- Review your prompt for potentially problematic content
+- Try rephrasing to be more specific about artistic intent
+- Avoid requests for harmful, violent, or explicit content
+
+## Technical Details
+
+### Storage Integration
+
+Generated images are automatically saved using your configured file strategy:
+
+- **Local**: Saved to `client/public/images/{userId}/`
+- **S3/Azure/Firebase**: Uploaded to your configured cloud storage
+
+### Image Format
+
+- Output format: PNG
+- Each image is assigned a unique identifier that can be passed in `image_ids` in later requests
+
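For the local strategy, the documented directory combined with the PNG output format suggests a path like the one built below. Only `client/public/images/{userId}/` comes from these docs; the `localImagePath` name and the `{fileId}.png` file naming are assumptions for illustration.

```typescript
// Hypothetical sketch of where the local strategy would place a generated PNG.
// Only the directory comes from the docs; the `${fileId}.png` naming is assumed.
function localImagePath(userId: string, fileId: string): string {
  // Reject empty segments, path separators, and dot segments so an ID
  // cannot escape the user's image directory.
  for (const part of [userId, fileId]) {
    if (part === "" || part === "." || part === ".." || /[\\/]/.test(part)) {
      throw new Error(`invalid path segment: ${part}`);
    }
  }
  return `client/public/images/${userId}/${fileId}.png`;
}

console.log(localImagePath("user123", "img-0001")); // prints "client/public/images/user123/img-0001.png"
```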
+## Rate Limits
+
+Rate limits depend on your API tier:
+
+- **Gemini API**: Check [Google AI Studio](https://aistudio.google.com/) for current limits
+- **Vertex AI**: Based on your Google Cloud project quotas
+
--- a/pages/docs/features/image_gen.mdx
+++ b/pages/docs/features/image_gen.mdx
@@ -12,6 +12,7 @@ Each has its own look, price-point, and setup step (usually just an API key or U
 | Tool | Best for | Needs |
 |------|----------|-------|
 | **OpenAI Image Tools** | Cutting-edge results (GPT-Image-1).<br/>Can also ***edit*** the images you upload. | OpenAI API |
+| **Gemini Image Tools** | Google's Gemini image models.<br/>Can use existing images as context for new images and edits. | Gemini API or Vertex AI |
 | **DALL·E (3 / 2)** | Legacy OpenAI Image models. | OpenAI API |
 | **Stable Diffusion** | Local or self-hosted generation, endless community models. | Automatic1111 API |
 | **Flux** | Fast cloud renders, optional fine-tunes. | Flux API |
@@ -120,7 +121,67 @@ See the [GPT-Image-1 pricing page](https://platform.openai.com/docs/models/gpt-i

 ---

-## 2 · DALL·E (legacy)
+## 2 · Gemini Image Tools
+
+Gemini Image Tools integrate Google's Gemini image generation models and support both text-to-image generation and editing that uses existing images as context.
+
+### Features
+
+- **Text-to-Image Generation**: Create high-quality images from detailed text descriptions
+- **Image Context Support**: Use existing images as context or inspiration for new generations
+- **Image Editing**: Generate new images based on modifications to existing ones (include the original image ID in `image_ids`)
+- **Multiple Models**: Choose between `gemini-2.5-flash-image` (default) and `gemini-3-pro-image-preview`
+- **Dual API Support**: Works with either a Gemini API key or Google Cloud Vertex AI
+
+### Parameters
+
+• **prompt** – Detailed text description of the desired image (required, up to 32,000 characters)  
+• **image_ids** – Optional array of image IDs to use as visual context for generation
+
+### Setup
+
+#### Option 1: Gemini API (Recommended)
+
+Get an API key from [Google AI Studio](https://aistudio.google.com/app/apikey) and add it to your `.env` file:
+
+```bash
+GEMINI_API_KEY=your_api_key_here
+```
+
+#### Option 2: Vertex AI (Enterprise)
+
+For Google Cloud users with Vertex AI access:
+
+```bash
+GOOGLE_SERVICE_KEY_FILE=/path/to/service-account.json
+GOOGLE_CLOUD_LOCATION=us-central1  # optional, default: global
+```
+
+### Model Selection
+
+```bash
+# Default model (fast and efficient)
+GEMINI_IMAGE_MODEL=gemini-2.5-flash-image
+
+# Higher quality model (use this line instead of the one above)
+# GEMINI_IMAGE_MODEL=gemini-3-pro-image-preview
+```
+
+### Advanced Configuration
+
+Override the default tool and parameter descriptions with environment variables:
+
+```bash
+GEMINI_IMAGE_GEN_DESCRIPTION=...
+GEMINI_IMAGE_GEN_PROMPT_DESCRIPTION=...
+GEMINI_IMAGE_IDS_DESCRIPTION=...
+```
+
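These overrides behave like ordinary environment-variable fallbacks. In the minimal sketch below, `resolveDescriptions` and the placeholder defaults are hypothetical, the mapping of each variable is inferred from its name, and treating a blank value as unset is an assumption.

```typescript
// Hypothetical sketch: env vars override built-in descriptions when set.
type Env = Record<string, string | undefined>;

// Placeholder defaults; the real default texts are not reproduced here.
const DEFAULTS = {
  tool: "<default tool description>",
  prompt: "<default prompt parameter description>",
  imageIds: "<default image_ids parameter description>",
};

function resolveDescriptions(env: Env) {
  // Assumption: an empty or whitespace-only value falls back to the default.
  const pick = (value: string | undefined, fallback: string) =>
    value && value.trim() !== "" ? value : fallback;
  return {
    tool: pick(env.GEMINI_IMAGE_GEN_DESCRIPTION, DEFAULTS.tool),
    prompt: pick(env.GEMINI_IMAGE_GEN_PROMPT_DESCRIPTION, DEFAULTS.prompt),
    imageIds: pick(env.GEMINI_IMAGE_IDS_DESCRIPTION, DEFAULTS.imageIds),
  };
}

console.log(resolveDescriptions({ GEMINI_IMAGE_GEN_PROMPT_DESCRIPTION: "Describe the image in detail." }));
```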
+More details can be found in the dedicated [Gemini Image Gen guide](/docs/configuration/tools/gemini_image_gen).
+
+---
+
+## 3 · DALL·E (legacy)

 DALL·E provides high-quality image generation using OpenAI's legacy image models.

@@ -159,7 +220,7 @@ See the [DALL-E pricing page](https://platform.openai.com/docs/models/dall-e-3)

 ---

-## 3 · Stable Diffusion (local)
+## 4 · Stable Diffusion (local)

 Run images entirely on your own machine or server.  
 Point LibreChat at any Automatic1111 (or compatible) endpoint and you're set.
@@ -188,7 +249,7 @@ More details on setting up Automatic1111 can be found in the dedicated [Stable D

 ---

-## 4 · Flux
+## 5 · Flux

 Cloud generator with an emphasis on speed and optional fine-tuned models.

@@ -221,7 +282,7 @@ See the [Flux pricing page](https://docs.bfl.ml/pricing/) for details on costs a

 ---

-## 5 · Model Context Protocol (MCP)
+## 6 · Model Context Protocol (MCP)

 Image outputs are supported from MCP servers.