diff --git a/pages/docs/configuration/tools/_meta.ts b/pages/docs/configuration/tools/_meta.ts
index 74bfe67..a2d96bb 100644
--- a/pages/docs/configuration/tools/_meta.ts
+++ b/pages/docs/configuration/tools/_meta.ts
@@ -1,4 +1,5 @@
 export default {
   index: 'Intro',
   flux: 'Flux',
+  gemini_image_gen: 'Gemini Image Gen',
 }
diff --git a/pages/docs/configuration/tools/gemini_image_gen.mdx b/pages/docs/configuration/tools/gemini_image_gen.mdx
new file mode 100644
index 0000000..5e0d669
--- /dev/null
+++ b/pages/docs/configuration/tools/gemini_image_gen.mdx
@@ -0,0 +1,157 @@
+---
+title: Gemini Image Generation
+description: Setup and usage instructions for Google Gemini image generation
+---
+
+# Gemini Image Generation
+
+Gemini Image Generation is a powerful tool that integrates Google's Gemini Image Models for high-quality text-to-image generation and image context-aware editing. It supports both the simple Gemini API and Google Cloud Vertex AI.
+
+## Setup Instructions
+
+You can use either the Gemini API (recommended for most users) or Vertex AI with a service account.
+
+### Option 1: Gemini API (Recommended)
+
+1. Get your API key from [Google AI Studio](https://aistudio.google.com/app/apikey)
+2. Set the `GEMINI_API_KEY` environment variable in your `.env` file:
+
+```bash
+GEMINI_API_KEY=your_api_key_here
+```
+
+### Option 2: Vertex AI (For Enterprise/GCP Users)
+
+1. Create a service account in Google Cloud Console with Vertex AI permissions
+2. Download the service account JSON key file
+3. Configure the environment variables:
+
+```bash
+# Path to your service account JSON file
+GOOGLE_SERVICE_KEY_FILE=/path/to/service-account.json
+
+# Optional: Set the location (default: global)
+GOOGLE_CLOUD_LOCATION=us-central1
+```
+
+## Configuration Options
+
+### Model Selection
+
+You can choose which Gemini image model to use via environment variable:
+
+```bash
+# Default model
+GEMINI_IMAGE_MODEL=gemini-2.5-flash-image
+
+# Or use the newer Gemini 3 Pro Image model
+GEMINI_IMAGE_MODEL=gemini-3-pro-image-preview
+```
+
+### Available Models
+
+| Model | Description |
+|-------|-------------|
+| `gemini-2.5-flash-image` | Default model, fast and efficient |
+| `gemini-3-pro-image-preview` | Higher quality, more detailed generations |
+
+## Features
+
+### Core Capabilities
+
+- **Text-to-Image Generation**: Create images from detailed text descriptions
+- **Image Context Support**: Use existing images as context/inspiration for new generations
+- **Image Editing**: Generate new images based on modifications to existing ones
+- **Safety Filtering**: Built-in content safety with user-friendly error messages
+- **Multi-Storage Support**: Compatible with local, S3, Azure, and Firebase storage strategies
+
+### Parameters
+
+The Gemini Image Gen tool accepts the following parameters:
+
+- **prompt** (required) – A detailed text description of the desired image, up to 32,000 characters
+- **image_ids** (optional) – Array of image IDs to use as visual context for generation
+
+## Best Practices
+
+### Prompt Writing
+
+1. **Be specific and detailed** in your descriptions
+2. **Start with the image type**: photo, oil painting, watercolor, illustration, cartoon, drawing, vector, render, etc.
+3. **Include key elements**:
+   - Subject matter and composition
+   - Style and artistic approach
+   - Lighting and atmosphere
+   - Color palette preferences
+   - Technical specifications
+
+### Image Editing Tips
+
+When editing existing images:
+
+1. **Include the original image ID** in the `image_ids` array
+2. **Use direct editing instructions**:
+   - "Remove the background from this image"
+   - "Add sunglasses to the person in this image"
+   - "Change the color of the car to red"
+3. **Don't reconstruct the original prompt** – use simple, direct modification instructions
+
+## Usage Examples
+
+### Basic Image Generation
+
+> A serene Japanese garden at golden hour, featuring a traditional red bridge over a koi pond. Cherry blossom trees frame the scene with soft pink petals falling. Photorealistic style with warm, diffused lighting and rich colors.
+
+### Image with Context
+
+When you have an existing image and want to create something inspired by it:
+
+1. Reference the image ID in the `image_ids` parameter
+2. Describe what you want: "Create a winter version of this landscape scene with snow-covered trees and a frozen lake"
+
+### Image Editing
+
+To modify an existing image:
+
+1. Include the image ID in `image_ids`
+2. Describe the change: "Remove the person from the background of this image"
+
+## Error Handling
+
+### Common Issues
+
+| Error | Solution |
+|-------|----------|
+| "Image blocked by content safety filters" | Modify your prompt to avoid content that violates safety policies |
+| "No image was generated" | Try a different prompt or simplify your request |
+| "GEMINI_API_KEY or service account required" | Ensure you've configured either the API key or Vertex AI credentials |
+
+### Safety Filtering
+
+Gemini includes built-in safety filters. If your image is blocked:
+
+- Review your prompt for potentially problematic content
+- Try rephrasing to be more specific about artistic intent
+- Avoid requests for harmful, violent, or explicit content
+
+## Technical Details
+
+### Storage Integration
+
+Generated images are automatically saved using your configured file strategy:
+
+- **Local**: Saved to `client/public/images/{userId}/`
+- **S3/Azure/Firebase**: Uploaded to your configured cloud storage
+
+### Image Format
+
+- Output format: PNG
+- Images include unique identifiers for reference in subsequent requests
+
+## Rate Limits
+
+Rate limits depend on your API tier:
+
+- **Gemini API**: Check [Google AI Studio](https://aistudio.google.com/) for current limits
+- **Vertex AI**: Based on your Google Cloud project quotas
+
diff --git a/pages/docs/features/image_gen.mdx b/pages/docs/features/image_gen.mdx
index 169e7b8..11287cd 100644
--- a/pages/docs/features/image_gen.mdx
+++ b/pages/docs/features/image_gen.mdx
@@ -12,6 +12,7 @@ Each has its own look, price-point, and setup step (usually just an API key or U
 | Tool | Best for | Needs |
 |------|----------|-------|
 | **OpenAI Image Tools** | Cutting-edge results (GPT-Image-1). Can also ***edit*** the images you upload. | OpenAI API |
+| **Gemini Image Tools** | Google's latest image models with context-aware generation. | Gemini API or Vertex AI |
 | **DALL·E (3 / 2)** | Legacy OpenAI Image models. | OpenAI API |
 | **Stable Diffusion** | Local or self-hosted generation, endless community models. | Automatic1111 API |
 | **Flux** | Fast cloud renders, optional fine-tunes. | Flux API |
@@ -120,7 +121,67 @@ See the [GPT-Image-1 pricing page](https://platform.openai.com/docs/models/gpt-i
 
 ---
 
-## 2 · DALL·E (legacy)
+## 2 · Gemini Image Tools
+
+Gemini Image Tools integrate Google's latest image generation models, supporting both text-to-image generation and image context-aware editing.
+
+### Features
+
+- **Text-to-Image Generation**: Create high-quality images from detailed text descriptions
+- **Image Context Support**: Use existing images as context or inspiration for new generations
+- **Image Editing**: Generate new images based on modifications to existing ones (include original image ID)
+- **Multiple Models**: Choose between `gemini-2.5-flash-image` (default) or `gemini-3-pro-image-preview`
+- **Dual API Support**: Works with both simple Gemini API keys and Google Cloud Vertex AI
+
+### Parameters
+
+• **prompt** – Detailed text description of the desired image (required, up to 32,000 characters)
+• **image_ids** – Optional array of image IDs to use as visual context for generation
+
+### Setup
+
+#### Option 1: Gemini API (Recommended)
+
+Get an API key from [Google AI Studio](https://aistudio.google.com/app/apikey):
+
+```bash
+GEMINI_API_KEY=your_api_key_here
+```
+
+#### Option 2: Vertex AI (Enterprise)
+
+For Google Cloud users with Vertex AI access:
+
+```bash
+GOOGLE_SERVICE_KEY_FILE=/path/to/service-account.json
+GOOGLE_CLOUD_LOCATION=us-central1 # optional, default: global
+```
+
+### Model Selection
+
+```bash
+# Default model (fast and efficient)
+GEMINI_IMAGE_MODEL=gemini-2.5-flash-image
+
+# Higher quality model
+GEMINI_IMAGE_MODEL=gemini-3-pro-image-preview
+```
+
+### Advanced Configuration
+
+Customize tool descriptions via environment variables:
+
+```bash
+GEMINI_IMAGE_GEN_DESCRIPTION=...
+GEMINI_IMAGE_GEN_PROMPT_DESCRIPTION=...
+GEMINI_IMAGE_IDS_DESCRIPTION=...
+```
+
+More details can be found in the dedicated [Gemini Image Gen guide](/docs/configuration/tools/gemini_image_gen).
+
+---
+
+## 3 · DALL·E (legacy)
 
 DALL·E provides high-quality image generation using OpenAI's legacy image models.
 
@@ -159,7 +220,7 @@ See the [DALL-E pricing page](https://platform.openai.com/docs/models/dall-e-3)
 
 ---
 
-## 3 · Stable Diffusion (local)
+## 4 · Stable Diffusion (local)
 
 Run images entirely on your own machine or server. Point LibreChat at any Automatic1111 (or compatible) endpoint and you're set.
 
@@ -188,7 +249,7 @@ More details on setting up Automatic1111 can be found in the dedicated [Stable D
 
 ---
 
-## 4 · Flux
+## 5 · Flux
 
 Cloud generator with an emphasis on speed and optional fine-tuned models.
 
@@ -221,7 +282,7 @@ See the [Flux pricing page](https://docs.bfl.ml/pricing/) for details on costs a
 
 ---
 
-## 5 · Model Context Protocol (MCP)
+## 6 · Model Context Protocol (MCP)
 
 Image outputs are supported from MCP servers.
 
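
As a closing illustration of the Gemini setup documented in this diff, the sketch below shows one way the environment variables could be resolved into a single configuration. The function `resolveGeminiImageConfig` and the types `GeminiImageConfig` and `Env` are hypothetical names introduced for illustration and are not taken from LibreChat's source. Only the variable names, the defaults (`gemini-2.5-flash-image` and `global`), and the error text come from the docs above. Giving the API key precedence when both credentials are set is an assumption.

```typescript
// Hypothetical sketch, not LibreChat's actual implementation.
// Variable names, defaults, and the error message follow the docs in this diff.
type GeminiImageConfig =
  | { provider: 'gemini-api'; apiKey: string; model: string }
  | { provider: 'vertex-ai'; serviceKeyFile: string; location: string; model: string };

type Env = Record<string, string | undefined>;

function resolveGeminiImageConfig(env: Env): GeminiImageConfig {
  // Documented default model when GEMINI_IMAGE_MODEL is unset or blank.
  const model = env.GEMINI_IMAGE_MODEL?.trim() || 'gemini-2.5-flash-image';

  // Option 1: plain Gemini API key (assumed to win if both options are set).
  const apiKey = env.GEMINI_API_KEY?.trim();
  if (apiKey) {
    return { provider: 'gemini-api', apiKey, model };
  }

  // Option 2: Vertex AI service account; location falls back to "global".
  const serviceKeyFile = env.GOOGLE_SERVICE_KEY_FILE?.trim();
  if (serviceKeyFile) {
    const location = env.GOOGLE_CLOUD_LOCATION?.trim() || 'global';
    return { provider: 'vertex-ai', serviceKeyFile, location, model };
  }

  // Neither credential configured: mirror the documented error text.
  throw new Error('GEMINI_API_KEY or service account required');
}
```

In a Node.js process this could be called as `resolveGeminiImageConfig(process.env)`. Returning a discriminated union keeps the two setup options explicit, so calling code has to check `provider` before reading `apiKey` or `serviceKeyFile`.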