diff --git a/pages/blog/2025-04-25_optimizing-rag-performance-in-librechat.mdx b/pages/blog/2025-04-25_optimizing-rag-performance-in-librechat.mdx
new file mode 100644
index 0000000..5031d19
--- /dev/null
+++ b/pages/blog/2025-04-25_optimizing-rag-performance-in-librechat.mdx
@@ -0,0 +1,289 @@
+---
+title: Optimizing RAG Performance in LibreChat
+date: 2025/04/25
+description: 'Optimize Retrieval-Augmented Generation (RAG) performance for LibreChat'
+tags:
+  - rag
+  - performance
+  - optimization
+  - guide
+authorid: anon
+ogImage: https://firebasestorage.googleapis.com/v0/b/superb-reporter-407417.appspot.com/o/rag_blog_image_04_25_25.png?alt=media
+---
+
+import { BlogHeader } from '@/components/blog/BlogHeader'
+
+<BlogHeader />
+
+> **Note**: This is a guest post. The information provided may not be accurate if the underlying RAG API changes in future LibreChat versions. Always refer to the official documentation for the most up-to-date information.
+
+# Optimizing RAG Performance in LibreChat
+
+This guide walks you through optimizing Retrieval-Augmented Generation (RAG) performance in your LibreChat setup. Always change **only one major setting at a time** and test results carefully.
+
+---
+
+## 1. Optimize Database (vectordb - PostgreSQL/pgvector)
+
+Improving database performance is crucial for RAG speed, especially during indexing and retrieval.
+
+### 1.1. Verify/Create Metadata & Filter Indexes (**CRITICAL**)
+
+Missing indexes for filtering can drastically degrade performance.
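+
+Once connected (next step), you can confirm whether these filters actually hit an index by running `EXPLAIN ANALYZE` on a representative query. A quick sketch, assuming the default table used by LibreChat's RAG API and a `file_id` value taken from your own data:
+
+```sql
+-- Expect an "Index Scan" (or "Bitmap Index Scan") using idx_cmetadata_file_id_text;
+-- a "Seq Scan" over a large table means the index is missing or unused.
+EXPLAIN ANALYZE
+SELECT custom_id
+FROM langchain_pg_embedding
+WHERE cmetadata ->> 'file_id' = 'YOUR_FILE_ID';
+```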
+
+#### Connect to the Database:
+
+```bash
+docker exec -it vectordb psql -U myuser -d mydatabase
+# Enter 'mypassword' (or your secure password)
+```
+
+#### Check for existing indexes:
+
+```sql
+\di langchain_pg_embedding*
+```
+
+You should see:
+
+- `custom_id_idx`
+- `idx_cmetadata_file_id_text`
+- A vector index like `langchain_pg_embedding_embedding_idx`
+
+If missing, run:
+
+```sql
+-- Internal lookup by custom_id
+CREATE INDEX CONCURRENTLY IF NOT EXISTS custom_id_idx ON langchain_pg_embedding (custom_id);
+
+-- Filter by file_id inside metadata
+CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_cmetadata_file_id_text ON langchain_pg_embedding ((cmetadata ->> 'file_id'));
+```
+
+#### Exit:
+
+```sql
+\q
+```
+
+---
+
+### 1.2. Verify/Tune Vector Index
+
+The pgvector extension typically creates an index on the `embedding` column.
+
+Check with `\di` again. Look for a `hnsw` or `ivfflat` index type.
+
+> ⚙️ **Advanced**: You can tune index parameters like `lists`, `m`, `ef_search`, and `ef_construction` (see [pgvector README](https://github.com/pgvector/pgvector)).
+
+---
+
+### 1.3. Monitor/Adjust Server Resources
+
+```bash
+docker stats vectordb
+```
+
+Watch for memory and/or CPU saturation. PostgreSQL benefits from abundant RAM.
+
+#### Optional: Set resource limits in `docker-compose.override.yml`
+
+```yaml
+services:
+  vectordb:
+    deploy:
+      resources:
+        limits:
+          memory: 4G
+          cpus: '1.0'
+```
+
+---
+
+### 1.4. Perform Database Maintenance
+
+Run regularly:
+
+```sql
+VACUUM (VERBOSE, ANALYZE) langchain_pg_embedding;
+```
+
+---
+
+### 1.5. Advanced PostgreSQL Tuning
+
+Consider tuning:
+
+- `shared_buffers`
+- `work_mem`
+- `maintenance_work_mem`
+- `effective_cache_size`
+
+These live in `postgresql.conf` (inside the container). Only touch them if you know what you're doing.
+
+---
+
+## 2. Tune Chunking Strategy (`.env`)
+
+Chunking impacts both upload speed and retrieval precision.
+
+### 2.1. Open the main `.env` file:
+
+```bash
+nano ~/LibreChat/.env
+```
+
+### 2.2. Modify chunk settings:
+
+```env
+CHUNK_SIZE=1500
+CHUNK_OVERLAP=150
+```
+
+Try other combinations like:
+
+- `1000/100`
+- `500/50`
+
+Trade-offs:
+- **Larger chunks** = faster processing, lower precision
+- **Smaller chunks** = slower, more precise
+
+### 2.3. Save, exit, and restart:
+
+```bash
+docker-compose down
+docker-compose up -d --force-recreate
+```
+
+---
+
+### 2.4. Delete Old Embeddings
+
+- **Easiest**: Delete files via UI
+- **Advanced**: Delete from DB
+
+```sql
+DELETE FROM langchain_pg_embedding WHERE cmetadata ->> 'file_id' = 'YOUR_FILE_ID';
+```
+
+> 🔁 Safer method: Use a **new test file** for each config test
+
+---
+
+### 2.5. Re-upload & test performance
+
+---
+
+## 3. Optimize Embedding Process
+
+Set the embeddings provider/model in `.env`.
+
+### Examples:
+
+#### OpenAI:
+
+```env
+EMBEDDINGS_PROVIDER=openai
+EMBEDDINGS_MODEL=text-embedding-ada-002
+```
+
+#### Azure:
+
+```env
+EMBEDDINGS_PROVIDER=azure
+AZURE_OPENAI_API_KEY=...
+AZURE_OPENAI_ENDPOINT=...
+AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME=...
+```
+
+#### Ollama (local):
+
+```env
+EMBEDDINGS_PROVIDER=ollama
+OLLAMA_BASE_URL=http://ollama:11434
+EMBEDDINGS_MODEL=nomic-embed-text
+```
+
+### Restart and re-upload:
+
+```bash
+docker-compose down
+docker-compose up -d --force-recreate
+```
+
+---
+
+## 4. Tune Retrieval Strategy
+
+The number of chunks retrieved affects both relevance and the risk of hitting API token limits.
+
+### 4.1. In `.env`:
+
+```env
+RAG_API_TOP_K=3
+# RAG_USE_FULL_CONTEXT=True
+```
+
+- Lower `TOP_K` = safer, faster
+- Higher `TOP_K` = more context, risk of hitting token limits
+
+---
+
+## 5. Monitor LibreChat API Logs
+
+Check for truncation or token overflows.
+
+### 5.1. Run a large query, then:
+
+```bash
+docker logs --tail 300 LibreChat  # substitute your API container name if it differs
+```
+
+Search for:
+
+```text
+... [truncated]
+```
+
+If present, reduce `TOP_K` or `CHUNK_SIZE`.
+
+---
+
+## 6. Manage Server Load & Isolation
+
+### 6.1. Monitor:
+
+```bash
+htop
+docker stats
+```
+
+### 6.2. Reduce Load (temporarily):
+
+```bash
+# Stop unused APIs or admin panels (example service names; adjust to your setup)
+docker-compose stop api2 api3
+sudo systemctl stop lcadmin3.service
+```
+
+### 6.3. Upgrade Hardware
+
+- More RAM/CPU
+- Use SSD (preferably NVMe)
+- A GPU speeds up embedding (if using local models)
+
+### 6.4. Advanced: Separate Services
+
+You can host `vectordb` and `rag_api` on separate machines for heavy workloads.
+
+---
+
+## Summary
+
+Start with **index optimization**. Then move on to:
+
+1. Chunk tuning (`CHUNK_SIZE`, `CHUNK_OVERLAP`)
+2. Retrieval strategy (`RAG_API_TOP_K`)
+3. Embedding configuration
+4. API log monitoring
+
+Test each change independently. Always monitor API logs and resource usage. If issues persist, consider model/hardware upgrades.
diff --git a/pages/docs/features/image_gen.mdx b/pages/docs/features/image_gen.mdx
index daf16a8..6b659e5 100644
--- a/pages/docs/features/image_gen.mdx
+++ b/pages/docs/features/image_gen.mdx
@@ -18,10 +18,12 @@ Each has its own look, price-point, and setup step (usually just an API key or U
 | **MCP** | Bring-your-own-Image-Generators | MCP server with image output support |
 
 **Notes:**
+- Image outputs are sent directly to the LLM as part of the immediate chat context following generation.
+  - See [Image Storage and Handling](#image-storage-and-handling) for more details.
 - API keys can be omitted in favor of allowing the user to enter their own key from the UI.
 - Azure OpenAI does not yet support the latest OpenAI GPT-Image-1.
 - MCP Server tool image outputs are supported, which may output images similarly to LC's built-in tools.
-  - Note: MCP servers may or may not use the correct response format when outputting images. See details in the [MCP section below](#5--model-context-protocol-mcp).
+  - Note: MCP servers may or may not use the correct format when outputting images. See details in the [MCP section below](#5--model-context-protocol-mcp).
 
 ---
 
@@ -258,6 +260,11 @@ mcpServers:
 
 All generated images are:
 1. Saved according to the configured [**`fileStrategy`**](/docs/configuration/librechat_yaml/object_structure/config#filestrategy)
 2. Displayed directly in the chat interface
+3. Sent directly to the LLM as part of the immediate chat context following generation
+   - This may create issues if you are using an LLM that does not support image inputs.
+   - There will be an option to disable this behavior on a per-agent basis in the future.
+   - The outputs are only sent directly to the LLM upon generation, not on every message.
+   - To include the image in the chat, you can directly attach it to the message from the side panel.
 
 ---
 
@@ -276,6 +283,7 @@ If any of the tools encounter an error, they will return an error message explai
 
 - Content policy violations
 - Proxy/network issues
 - Invalid parameters
+- Unsupported image payload (see [Image Storage and Handling](#image-storage-and-handling) above)
 
 ---