feat: RAG Blog post (#291)

* Create optimizing-rag-performance-in-librechat.md

This PR adds a detailed blog post written by Henk van Ess that covers how to optimize Retrieval-Augmented Generation (RAG) performance in LibreChat.

The guide walks through:

- Improving vector database performance (PostgreSQL/pgvector)
- Chunking strategies (`CHUNK_SIZE` / `CHUNK_OVERLAP`)
- Embedding provider options (OpenAI, Azure, Ollama)
- Retrieval settings (`RAG_API_TOP_K`)
- Monitoring and server resource tips

It's designed to help developers fine-tune their LibreChat instances for speed and quality. All content is based on hands-on testing and is Markdown-formatted for blog use.

Looking forward to feedback — happy to revise if needed!

* Update optimizing-rag-performance-in-librechat.md

* Optimizing-rag-performance-in-librechat.md

* docs: Document AWS_ENDPOINT_URL for S3 configuration (#285)

This commit updates the documentation for S3 storage configuration to include the `AWS_ENDPOINT_URL` environment variable. This variable was introduced in PR [#6431](https://github.com/danny-avila/LibreChat/pull/6431) to allow users to specify a custom endpoint URL for S3 connections, but the documentation was not updated.

The changes include:
- Adding a description for `AWS_ENDPOINT_URL`, clarifying its purpose and indicating that it's optional.

* 🎨 feat: Adds Image Gen Docs, fix broken links, usage stats, missing .env vars, formatting issues, bump Next.js (#288)

* docs: enhance API key setup instructions for clarity

* docs: update section title for API key setup clarity

* docs: add comprehensive guide for OpenAI image generation and editing tools

* docs: clarify Stable Diffusion section and update link in Image Generation overview

* docs: add Flux cloud generator configuration details and environment variables

* fix: Firebase CDN configuration link

* docs: enhance fileStrategy section with CDN options and notes

* docs: enhance Image Generation section with improved structure and pricing details

* docs: add Code Interpreter section with environment variable details and enterprise plan notes

* fix: formatting

* chore: bump next

* fix: correct markdown formatting for artifact example in agents documentation

* docs: add deprecation notices for tools, plugins, presets, and enhance image generation section

* feat: implement GitHub stats API and update Usage component to fetch stars dynamically

* fix: update Docker pulls value in Usage component

* 🫙 fix: Azure Blob Storage Documentation (#289)

* fix: Update fileStrategy option to use "azure_blob" in configuration docs

* fix: Update CDN documentation for Azure Blob Storage and improve navigation

* fix: Remove Hetzner reference from deployment comparison, closes #271

* fix: Update RAG API documentation for clarity on feature availability, closes #249

* feat: Update demo images (#290)

* docs: Clarify configuration file usage and API key settings in documentation, closes #238

* fix: Add instructions to remove existing Docker images for local and remote setups

* 🌊 docs: Flux (#203)

* feat: Enhance documentation for Flux and related tools, including setup instructions and parameters

* docs: Update image generation documentation with additional notes on output handling and MCP server format

* feat: Add blog post on optimizing RAG performance in LibreChat with accompanying image

* chore: Remove unused image for RAG performance blog post

---------

Co-authored-by: Henk van Ess <31323314+voelspriet@users.noreply.github.com>
Co-authored-by: lensfa <80903255+lensfa-lzd@users.noreply.github.com>
Co-authored-by: Marco Beretta <81851188+berry-13@users.noreply.github.com>
Co-authored-by: heptapod <164861708+leondape@users.noreply.github.com>
Author: Danny Avila
Committed: 2025-04-25 15:16:17 -04:00 (by GitHub)
Parent: b83e42c13c
Commit: ca672b846b
2 changed files with 298 additions and 1 deletion


@@ -0,0 +1,289 @@
---
title: Optimizing RAG Performance in LibreChat
date: 2025/04/25
description: 'Optimize Retrieval-Augmented Generation (RAG) performance for LibreChat'
tags:
- rag
- performance
- optimization
- guide
authorid: anon
ogImage: https://firebasestorage.googleapis.com/v0/b/superb-reporter-407417.appspot.com/o/rag_blog_image_04_25_25.png?alt=media
---
import { BlogHeader } from '@/components/blog/BlogHeader'
<BlogHeader />
> **Note**: This is a guest post. The information provided may not be accurate if the underlying RAG API changes in future LibreChat versions. Always refer to the official documentation for the most up-to-date information.
# Optimizing RAG Performance in LibreChat
This guide walks you through optimizing Retrieval-Augmented Generation (RAG) performance in your LibreChat setup. Always change **only one major setting at a time** and test results carefully.
---
## 1. Optimize Database (vectordb - PostgreSQL/pgvector)
Improving database performance is crucial for RAG speed, especially during indexing and retrieval.
### 1.1. Verify/Create Metadata & Filter Indexes (**CRITICAL**)
Missing indexes for filtering can drastically degrade performance.
#### Connect to the Database:
```bash
docker exec -it vectordb psql -U myuser -d mydatabase
# Enter 'mypassword' (or your secure password)
```
#### Check for existing indexes:
```sql
\di langchain_pg_embedding*
```
You should see:
- `custom_id_idx`
- `idx_cmetadata_file_id_text`
- A vector index like `langchain_pg_embedding_embedding_idx`
If missing, run:
```sql
-- Internal lookup by custom_id
CREATE INDEX CONCURRENTLY IF NOT EXISTS custom_id_idx ON langchain_pg_embedding (custom_id);
-- Filter by file_id inside metadata
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_cmetadata_file_id_text ON langchain_pg_embedding ((cmetadata ->> 'file_id'));
```
#### Exit:
```sql
\q
```
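As a follow-up sanity check, you can confirm the indexes are actually being used once real queries have run. This reads the standard PostgreSQL statistics view; an `idx_scan` of zero after real traffic suggests the planner is ignoring that index:

```sql
-- Per-index scan counts for the embeddings table
SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'langchain_pg_embedding';
```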
---
### 1.2. Verify/Tune Vector Index
The pgvector extension typically creates an index on the `embedding` column.
Check with `\di` again. Look for a `hnsw` or `ivfflat` index type.
> ⚙️ **Advanced**: You can tune index parameters like `lists`, `m`, `ef_search`, and `ef_construction` (see [pgvector README](https://github.com/pgvector/pgvector)).
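If `\di` shows no vector index at all, one can be created manually. A hedged sketch, assuming pgvector 0.5.0+ and cosine distance: match the operator class (`vector_cosine_ops`, `vector_l2_ops`, etc.) to the distance metric your RAG API actually queries with, and treat `m` / `ef_construction` as example values, not recommendations:

```sql
-- Example HNSW index on the embedding column (requires pgvector >= 0.5.0)
CREATE INDEX CONCURRENTLY IF NOT EXISTS langchain_pg_embedding_embedding_idx
ON langchain_pg_embedding
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
```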
---
### 1.3. Monitor/Adjust Server Resources
```bash
docker stats vectordb
```
Watch for memory and/or CPU saturation. PostgreSQL benefits from abundant RAM.
#### Optional: Set resource limits in `docker-compose.override.yml`
```yaml
services:
  vectordb:
    deploy:
      resources:
        limits:
          memory: 4G
          cpus: '1.0'
```
> Note: `deploy.resources` limits are honored by Docker Compose v2; legacy `docker-compose` v1 may need the `--compatibility` flag.
---
### 1.4. Perform Database Maintenance
Run regularly:
```sql
VACUUM (VERBOSE, ANALYZE) langchain_pg_embedding;
```
---
### 1.5. Advanced PostgreSQL Tuning
Consider tuning:
- `shared_buffers`
- `work_mem`
- `maintenance_work_mem`
- `effective_cache_size`
These live in `postgresql.conf` (inside the container). Only touch them if you know what you're doing.
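For orientation, a sketch of what such settings might look like for a container capped at roughly 4 GB of RAM (illustrative values only; tune against your own workload):

```ini
# Illustrative starting points for ~4 GB of RAM, not recommendations
shared_buffers = 1GB           # ~25% of RAM is a common rule of thumb
work_mem = 32MB                # allocated per sort/hash operation, so it multiplies under load
maintenance_work_mem = 256MB   # speeds up index builds and VACUUM
effective_cache_size = 3GB     # planner hint only, not an allocation
```

Most of these can be reloaded with `SELECT pg_reload_conf();`, but `shared_buffers` requires a container restart.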
---
## 2. Tune Chunking Strategy (`.env`)
Impacts upload speed and retrieval precision.
### 2.1. Open the main `.env` file:
```bash
nano ~/LibreChat/.env
```
### 2.2. Modify chunk settings:
```env
CHUNK_SIZE=1500
CHUNK_OVERLAP=150
```
Try other combinations like:
- `1000/100`
- `500/50`
Trade-offs:
- **Larger chunks** = faster processing, lower precision
- **Smaller chunks** = slower, more precise
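To build intuition for the trade-off, here is a minimal character-based sketch of overlapping chunking (a hypothetical illustration; the actual RAG API may split on token or sentence boundaries instead):

```python
def chunk_text(text: str, chunk_size: int = 1500, overlap: int = 150) -> list[str]:
    """Split text into windows of `chunk_size` characters, where each window
    overlaps the previous one by `overlap` characters (simplified model)."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# A 4000-character document yields 3 chunks at 1500/150,
# but 5 chunks at 1000/100: more, smaller lookups per query.
print(len(chunk_text("x" * 4000, 1500, 150)))  # 3
print(len(chunk_text("x" * 4000, 1000, 100)))  # 5
```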
### 2.3. Save, exit, and restart:
```bash
docker-compose down
docker-compose up -d --force-recreate
```
---
### 2.4. Delete Old Embeddings
- **Easiest**: Delete files via UI
- **Advanced**: Delete from DB
```sql
DELETE FROM langchain_pg_embedding WHERE cmetadata ->> 'file_id' = 'YOUR_FILE_ID';
```
> 🔁 Safer method: Use a **new test file** for each config test
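Before deleting anything, it helps to see what is actually stored. This lists each file's ID and chunk count (same table and metadata layout as above):

```sql
-- Stored file_ids and how many chunks each one holds
SELECT cmetadata ->> 'file_id' AS file_id, COUNT(*) AS chunks
FROM langchain_pg_embedding
GROUP BY 1
ORDER BY 2 DESC;
```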
---
### 2.5. Re-upload & test performance
---
## 3. Optimize Embedding Process
Set provider/model in `.env`.
### Examples:
#### OpenAI:
```env
EMBEDDINGS_PROVIDER=openai
EMBEDDINGS_MODEL=text-embedding-ada-002
```
#### Azure:
```env
EMBEDDINGS_PROVIDER=azure
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=...
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME=...
```
#### Ollama (local):
```env
EMBEDDINGS_PROVIDER=ollama
OLLAMA_BASE_URL=http://ollama:11434
EMBEDDINGS_MODEL=nomic-embed-text
```
### Restart and re-upload:
```bash
docker-compose down
docker-compose up -d --force-recreate
```
---
## 4. Tune Retrieval Strategy
How many chunks are retrieved affects both relevance and API limits.
### 4.1. In `.env`:
```env
RAG_API_TOP_K=3
# RAG_USE_FULL_CONTEXT=True
```
- Lower `TOP_K` = safer, faster
- Higher `TOP_K` = more context, risk of hitting token limits
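As a rough back-of-the-envelope check (assuming character-based chunks and the common ~4 characters-per-token heuristic for English text):

```python
def approx_context_tokens(top_k: int, chunk_size: int, chars_per_token: float = 4.0) -> int:
    """Rough upper bound on the tokens that retrieved chunks add to the prompt.
    The 4 chars/token figure is a heuristic, not an exact tokenizer count."""
    return int(top_k * chunk_size / chars_per_token)

print(approx_context_tokens(3, 1500))  # 1125 tokens of retrieved context
print(approx_context_tokens(8, 1500))  # 3000, which may crowd a small context window
```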
---
## 5. Monitor LibreChat API Logs
Check for truncation or token overflows.
### 5.1. Run a large query, then:
```bash
docker logs <apiN_container_name> --tail 300
```
Search for:
```text
... [truncated]
```
If present, reduce `TOP_K` or `CHUNK_SIZE`.
---
## 6. Manage Server Load & Isolation
### 6.1. Monitor:
```bash
htop
docker stats
```
### 6.2. Reduce Load (temporarily):
```bash
# Stop unused APIs or admin panels
docker-compose stop api2 api3
sudo systemctl stop lcadmin3.service
```
### 6.3. Upgrade Hardware
- More RAM/CPU
- Use SSD (preferably NVMe)
- GPU boosts embedding (if using local models)
### 6.4. Advanced: Separate Services
You can host `vectordb` and `rag_api` on separate machines for heavy workloads.
---
## Summary
Start with **index optimization**. Then move on to:
1. Chunk tuning (`CHUNK_SIZE`, `CHUNK_OVERLAP`)
2. Retrieval strategy (`RAG_API_TOP_K`)
3. Embedding configuration
4. API log monitoring
Test each change independently. Always monitor API logs and resource usage. If issues persist, consider model/hardware upgrades.


@@ -18,10 +18,12 @@ Each has its own look, price-point, and setup step (usually just an API key or U
| **MCP** | Bring-your-own-Image-Generators | MCP server with image output support |
**Notes:**
- Image Outputs are directly sent to the LLM as part of the immediate chat context following generation.
- See [Image Storage and Handling](#image-storage-and-handling) for more details.
- API keys can be omitted in favor of allowing the user to enter their own key from the UI.
- Azure OpenAI does not yet support the latest OpenAI GPT-Image-1.
- MCP Server tool image outputs are supported, which may output images similarly to LC's built-in tools.
- Note: MCP servers may or may not use the correct format when outputting images. See details in the [MCP section below](#5--model-context-protocol-mcp).
---
@@ -258,6 +260,11 @@ mcpServers:
All generated images are:
1. Saved according to the configured [**`fileStrategy`**](/docs/configuration/librechat_yaml/object_structure/config#filestrategy)
2. Displayed directly in the chat interface
3. Image Outputs are directly sent to the LLM as part of the immediate chat context following generation.
- This may create issues if you are using an LLM that does not support image inputs.
- There will be an option to disable this behavior on a per-agent-basis in the future.
- The outputs are only directly sent to the LLM upon generation, not on every message.
- To include the image in the chat, you can directly attach it to the message from the side panel.
---
@@ -276,6 +283,7 @@ If any of the tools encounter an error, they will return an error message explai
- Content policy violations
- Proxy/network issues
- Invalid parameters
- Unsupported image payload (see [Image Storage and Handling](#image-storage-and-handling) above)
---