Mirror of https://github.com/LibreChat-AI/librechat.ai.git (synced 2026-03-27 10:48:32 +07:00)
feat: RAG Blog post (#291)

* Create optimizing-rag-performance-in-librechat.md

  This PR adds a detailed blog post written by Henk van Ess that covers how to optimize Retrieval-Augmented Generation (RAG) performance in LibreChat. The guide walks through:

  - Improving vector database performance (PostgreSQL/pgvector)
  - Chunking strategies (CHUNK_SIZE / CHUNK_OVERLAP)
  - Embedding provider options (OpenAI, Azure, Ollama)
  - Retrieval settings (RAG_API_TOP_K)
  - Monitoring and server resource tips

  It's designed to help developers fine-tune their LibreChat instances for speed and quality. All content is based on hands-on testing and is Markdown-formatted for blog use.

* docs: Document AWS_ENDPOINT_URL for S3 configuration (#285)

  Updates the S3 storage configuration documentation to include the `AWS_ENDPOINT_URL` environment variable, introduced in PR [#6431](https://github.com/danny-avila/LibreChat/pull/6431) to let users specify a custom endpoint URL for S3 connections; the documentation had not been updated. Adds a description for `AWS_ENDPOINT_URL`, clarifying its purpose and indicating that it is optional.

* 🎨 feat: Adds Image Gen Docs, fix broken links, usage stats, missing .env vars, formatting issues, bump Next.js (#288)

  - docs: enhance API key setup instructions for clarity
  - docs: update section title for API key setup clarity
  - docs: add comprehensive guide for OpenAI image generation and editing tools
  - docs: clarify Stable Diffusion section and update link in Image Generation overview
  - docs: add Flux cloud generator configuration details and environment variables
  - fix: Firebase CDN configuration link
  - docs: enhance fileStrategy section with CDN options and notes
  - docs: enhance Image Generation section with improved structure and pricing details
  - docs: add Code Interpreter section with environment variable details and enterprise plan notes
  - fix: formatting
  - chore: bump next
  - fix: correct markdown formatting for artifact example in agents documentation
  - docs: add deprecation notices for tools, plugins, presets, and enhance image generation section
  - feat: implement GitHub stats API and update Usage component to fetch stars dynamically
  - fix: update Docker pulls value in Usage component

* 🫙 fix: Azure Blob Storage Documentation (#289)

  - fix: Update fileStrategy option to use "azure_blob" in configuration docs
  - fix: Update CDN documentation for Azure Blob Storage and improve navigation

* fix: Remove Hetzner reference from deployment comparison, closes #271
* fix: Update RAG API documentation for clarity on feature availability, closes #249
* feat: Update demo images (#290)
* docs: Clarify configuration file usage and API key settings in documentation, closes #238
* fix: Add instructions to remove existing Docker images for local and remote setups
* 🌊 docs: Flux (#203)

  - feat: Enhance documentation for Flux and related tools, including setup instructions and parameters
  - docs: Update image generation documentation with additional notes on output handling and MCP server format

* feat: Add blog post on optimizing RAG performance in LibreChat with accompanying image
* chore: Remove unused image for RAG performance blog post

Co-authored-by: Henk van Ess <31323314+voelspriet@users.noreply.github.com>
Co-authored-by: lensfa <80903255+lensfa-lzd@users.noreply.github.com>
Co-authored-by: Marco Beretta <81851188+berry-13@users.noreply.github.com>
Co-authored-by: heptapod <164861708+leondape@users.noreply.github.com>
---
title: Optimizing RAG Performance in LibreChat
date: 2025/04/25
description: 'Optimize Retrieval-Augmented Generation (RAG) performance for LibreChat'
tags:
  - rag
  - performance
  - optimization
  - guide
authorid: anon
ogImage: https://firebasestorage.googleapis.com/v0/b/superb-reporter-407417.appspot.com/o/rag_blog_image_04_25_25.png?alt=media
---

import { BlogHeader } from '@/components/blog/BlogHeader'

<BlogHeader />

> **Note**: This is a guest post. The information provided may not be accurate if the underlying RAG API changes in future LibreChat versions. Always refer to the official documentation for the most up-to-date information.

# Optimizing RAG Performance in LibreChat

This guide walks you through optimizing Retrieval-Augmented Generation (RAG) performance in your LibreChat setup. Always change **only one major setting at a time** and test results carefully.

---
## 1. Optimize Database (vectordb - PostgreSQL/pgvector)

Improving database performance is crucial for RAG speed, especially during indexing and retrieval.

### 1.1. Verify/Create Metadata & Filter Indexes (**CRITICAL**)

Missing indexes for filtering can drastically degrade performance.

#### Connect to the Database:

```bash
docker exec -it vectordb psql -U myuser -d mydatabase
# Enter 'mypassword' (or your secure password)
```

#### Check for existing indexes:

```sql
\di langchain_pg_embedding*
```

You should see:

- `custom_id_idx`
- `idx_cmetadata_file_id_text`
- A vector index like `langchain_pg_embedding_embedding_idx`

If missing, run:

```sql
-- Internal lookup by custom_id
CREATE INDEX CONCURRENTLY IF NOT EXISTS custom_id_idx ON langchain_pg_embedding (custom_id);

-- Filter by file_id inside metadata
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_cmetadata_file_id_text ON langchain_pg_embedding ((cmetadata ->> 'file_id'));
```
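Before moving on, it can be worth confirming that the filter index is actually picked up by the query planner. A quick check (a sketch; `YOUR_FILE_ID` stands in for a real file ID from your instance):

```sql
-- Expect an Index Scan or Bitmap Index Scan on idx_cmetadata_file_id_text;
-- a sequential scan suggests the index is missing or not being used
EXPLAIN ANALYZE
SELECT custom_id
FROM langchain_pg_embedding
WHERE cmetadata ->> 'file_id' = 'YOUR_FILE_ID';
```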
#### Exit:

```sql
\q
```

---
### 1.2. Verify/Tune Vector Index

The pgvector extension typically creates an index on the `embedding` column.

Check with `\di` again. Look for an `hnsw` or `ivfflat` index type.

> ⚙️ **Advanced**: You can tune index parameters like `lists`, `m`, `ef_search`, and `ef_construction` (see [pgvector README](https://github.com/pgvector/pgvector)).
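If `\di` shows no vector index at all, similarity searches fall back to exact scans. A hedged sketch of creating an HNSW index manually (assumes a pgvector version with HNSW support and cosine distance; verify the operator class matches how your embeddings are compared):

```sql
-- Build an HNSW index on the embedding column (pgvector >= 0.5)
CREATE INDEX CONCURRENTLY IF NOT EXISTS langchain_pg_embedding_embedding_idx
ON langchain_pg_embedding USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Per-session recall/speed trade-off at query time
SET hnsw.ef_search = 100;
```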
---
### 1.3. Monitor/Adjust Server Resources

```bash
docker stats vectordb
```

Watch for memory and/or CPU saturation. PostgreSQL benefits from abundant RAM.

#### Optional: Set resource limits in `docker-compose.override.yml`

```yaml
services:
  vectordb:
    deploy:
      resources:
        limits:
          memory: 4G
          cpus: '1.0'
```

---
### 1.4. Perform Database Maintenance

Run regularly:

```sql
VACUUM (VERBOSE, ANALYZE) langchain_pg_embedding;
```

---

### 1.5. Advanced PostgreSQL Tuning

Consider tuning:

- `shared_buffers`
- `work_mem`
- `maintenance_work_mem`
- `effective_cache_size`

These live in `postgresql.conf` (inside the container). Only touch them if you know what you're doing.
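If you do adjust them, illustrative starting points might look like the following (these values are assumptions for a host with roughly 8 GB of RAM available to the database, not recommendations from the LibreChat docs):

```ini
# postgresql.conf (illustrative values only; tune to your hardware)
shared_buffers = 2GB
work_mem = 64MB
maintenance_work_mem = 512MB
effective_cache_size = 6GB
```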
---
## 2. Tune Chunking Strategy (`.env`)

Chunk settings impact upload speed and retrieval precision.

### 2.1. Open the main `.env` file:

```bash
nano ~/LibreChat/.env
```

### 2.2. Modify chunk settings:

```env
CHUNK_SIZE=1500
CHUNK_OVERLAP=150
```

Try other combinations like:

- `1000/100`
- `500/50`

Trade-offs:

- **Larger chunks** = faster processing, lower precision
- **Smaller chunks** = slower, more precise
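The size/overlap mechanics can be sketched in a few lines of Python (a simplified character-based illustration, not LibreChat's actual text splitter):

```python
def chunk_text(text: str, chunk_size: int = 1500, chunk_overlap: int = 150) -> list[str]:
    """Split text into chunks of up to chunk_size characters,
    each sharing chunk_overlap characters with the previous chunk."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]

doc = "x" * 4000
print(len(chunk_text(doc, 1500, 150)))  # 3 larger chunks: faster to embed, coarser retrieval
print(len(chunk_text(doc, 500, 50)))    # 9 smaller chunks: slower, but more precise hits
```

Fewer, larger chunks mean fewer embedding calls, but each retrieved chunk carries more loosely related text; smaller chunks invert that trade-off.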
### 2.3. Save, exit, and restart:

```bash
docker-compose down
docker-compose up -d --force-recreate
```

---

### 2.4. Delete Old Embeddings

- **Easiest**: Delete files via UI
- **Advanced**: Delete from DB

```sql
DELETE FROM langchain_pg_embedding WHERE cmetadata ->> 'file_id' = 'YOUR_FILE_ID';
```

> 🔁 Safer method: Use a **new test file** for each config test

---

### 2.5. Re-upload & test performance

---
## 3. Optimize Embedding Process

Set provider/model in `.env`.

### Examples:

#### OpenAI:

```env
EMBEDDINGS_PROVIDER=openai
EMBEDDINGS_MODEL=text-embedding-ada-002
```

#### Azure:

```env
EMBEDDINGS_PROVIDER=azure
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=...
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME=...
```

#### Ollama (local):

```env
EMBEDDINGS_PROVIDER=ollama
OLLAMA_BASE_URL=http://ollama:11434
EMBEDDINGS_MODEL=nomic-embed-text
```

### Restart and re-upload:

```bash
docker-compose down
docker-compose up -d --force-recreate
```

---
## 4. Tune Retrieval Strategy

The number of chunks retrieved affects both answer relevance and token limits.

### 4.1. In `.env`:

```env
RAG_API_TOP_K=3
# RAG_USE_FULL_CONTEXT=True
```

- Lower `TOP_K` = safer, faster
- Higher `TOP_K` = more context, risk of hitting token limits
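The "top K" chunks are the ones whose stored embeddings are most similar to the query embedding. As a minimal sketch of the similarity measure typically used (cosine similarity; illustrative only, not LibreChat's actual retrieval code):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```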
---
## 5. Monitor LibreChat API Logs

Check for truncation or token overflows.

### 5.1. Run a large query, then:

```bash
docker logs <apiN_container_name> --tail 300
```

Search for:

```text
... [truncated]
```

If present, reduce `TOP_K` or `CHUNK_SIZE`.
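Truncation usually means the retrieved context no longer fits the model's window. A rough back-of-the-envelope estimate (assuming roughly 4 characters per token, a common English-text heuristic):

```python
def estimate_context_tokens(top_k: int, chunk_size_chars: int, chars_per_token: float = 4.0) -> int:
    """Rough upper bound on prompt tokens added by the retrieved chunks."""
    return int(top_k * chunk_size_chars / chars_per_token)

print(estimate_context_tokens(3, 1500))   # 1125
print(estimate_context_tokens(10, 1500))  # 3750
```

If that estimate plus your system prompt and conversation history approaches the model's context limit, lower `RAG_API_TOP_K` or `CHUNK_SIZE`.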
---
## 6. Manage Server Load & Isolation

### 6.1. Monitor:

```bash
htop
docker stats
```

### 6.2. Reduce Load (temporarily):

```bash
# Stop unused APIs or admin panels
docker-compose stop api2 api3
sudo systemctl stop lcadmin3.service
```

### 6.3. Upgrade Hardware

- More RAM/CPU
- Use SSD (preferably NVMe)
- GPU boosts embedding (if using local models)

### 6.4. Advanced: Separate Services

You can host `vectordb` and `rag_api` on separate machines for heavy workloads.

---
## Summary

Start with **index optimization**. Then move on to:

1. Chunk tuning (`CHUNK_SIZE`, `CHUNK_OVERLAP`)
2. Retrieval strategy (`RAG_API_TOP_K`)
3. Embedding configuration
4. API log monitoring

Test each change independently. Always monitor API logs and resource usage. If issues persist, consider model/hardware upgrades.
---

@@ -18,10 +18,12 @@ Each has its own look, price-point, and setup step (usually just an API key or U

| **MCP** | Bring-your-own-Image-Generators | MCP server with image output support |

**Notes:**

- Image Outputs are directly sent to the LLM as part of the immediate chat context following generation.
- See [Image Storage and Handling](#image-storage-and-handling) for more details.
- API keys can be omitted in favor of allowing the user to enter their own key from the UI.
- Azure OpenAI does not yet support the latest OpenAI GPT-Image-1.
- MCP Server tool image outputs are supported, which may output images similarly to LC's built-in tools.
- Note: MCP servers may or may not use the correct response format when outputting images. See details in the [MCP section below](#5--model-context-protocol-mcp).

---

@@ -258,6 +260,11 @@ mcpServers:

All generated images are:

1. Saved according to the configured [**`fileStrategy`**](/docs/configuration/librechat_yaml/object_structure/config#filestrategy)
2. Displayed directly in the chat interface
3. Sent directly to the LLM as part of the immediate chat context following generation.
   - This may create issues if you are using an LLM that does not support image inputs.
   - There will be an option to disable this behavior on a per-agent basis in the future.
   - The outputs are only sent directly to the LLM upon generation, not on every message.
   - To include the image in the chat, you can directly attach it to the message from the side panel.

---

@@ -276,6 +283,7 @@ If any of the tools encounter an error, they will return an error message explai

- Content policy violations
- Proxy/network issues
- Invalid parameters
- Unsupported image payload (see [Image Storage and Handling](#image-storage-and-handling) above)