diff --git a/pages/docs/configuration/librechat_yaml/object_structure/agents.mdx b/pages/docs/configuration/librechat_yaml/object_structure/agents.mdx index 80c01b0..ec92f8b 100644 --- a/pages/docs/configuration/librechat_yaml/object_structure/agents.mdx +++ b/pages/docs/configuration/librechat_yaml/object_structure/agents.mdx @@ -11,7 +11,7 @@ endpoints: maxRecursionLimit: 100 disableBuilder: false # (optional) Agent Capabilities available to all users. Omit the ones you wish to exclude. Defaults to list below. - # capabilities: ["execute_code", "file_search", "actions", "tools", "artifacts", "ocr", "chain", "web_search"] + # capabilities: ["execute_code", "file_search", "actions", "tools", "artifacts", "context", "ocr", "chain", "web_search"] # (optional) File citation configuration for file_search capability maxCitations: 30 # Maximum total citations in responses (1-50) maxCitationsPerFile: 7 # Maximum citations from each file (1-10) @@ -96,7 +96,7 @@ allowedProviders: ]} /> -**Default:** `["execute_code", "file_search", "actions", "tools", "artifacts", "ocr", "chain", "web_search"]` +**Default:** `["execute_code", "file_search", "actions", "tools", "artifacts", "context", "ocr", "chain", "web_search"]` **Example:** ```yaml filename="endpoints / agents / capabilities" @@ -106,6 +106,7 @@ capabilities: - "actions" - "tools" - "artifacts" + - "context" - "ocr" - "chain" - "web_search" @@ -203,7 +204,10 @@ The `capabilities` field allows you to enable or disable specific functionalitie - **file_search**: Enables the agent to search and interact with files. When enabled, citation behavior is controlled by `maxCitations`, `maxCitationsPerFile`, and `minRelevanceScore` settings. - **actions**: Permits the agent to perform predefined actions. - **tools**: Grants the agent access to various tools. -- **ocr**: Enables uploading files as additional context, leveraging Optical Character Recognition for extracting text from images and documents. 
+- **artifacts**: Enables the agent to generate interactive artifacts (React components, HTML, Mermaid diagrams). +- **context**: Enables "Upload as Text" in chat and "File Context" for agents, allowing users to upload files whose extracted text is included directly in the conversation (Upload as Text) or in the agent's system instructions (File Context). +- **ocr**: Optionally enhances "Upload as Text" in chat and "File Context" for agents by processing matching files with OCR. **Requires an OCR service to be configured.** +- **chain**: Enables the beta agent chaining feature, also known as Mixture-of-Agents (MoA) workflows. - **web_search**: Enables web search functionality for agents, allowing them to search and retrieve information from the internet. By specifying the capabilities, you can control the features available to users when interacting with agents. @@ -226,6 +230,7 @@ endpoints: - "file_search" - "actions" - "artifacts" + - "context" - "ocr" - "web_search" ``` @@ -234,7 +239,7 @@ In this example: - The builder interface is enabled - File citations are limited to 20 total, with maximum 5 per file - Only sources with 60%+ relevance are included -- The agent has access to code execution, file search (with citations), actions, artifacts, OCR, and web search capabilities +- LibreChat Agents have access to code execution, file search (with citations), actions, artifacts, file context, OCR (if an OCR service is configured), and web search capabilities ## Notes @@ -242,3 +247,5 @@ In this example: - File citation configuration (`maxCitations`, `maxCitationsPerFile`, `minRelevanceScore`) only applies when the `file_search` capability is enabled. - The relevance score is calculated using vector similarity, where 1.0 represents a perfect match and 0.0 represents no similarity. - Citation limits help balance comprehensive information retrieval with response quality and performance. +- The `context` capability works without OCR configuration, using text parsing methods.
OCR enhances extraction quality when configured. +- The `ocr` capability requires an OCR service to be configured (see [OCR Configuration](/docs/configuration/librechat_yaml/object_structure/ocr)). diff --git a/pages/docs/configuration/librechat_yaml/object_structure/file_config.mdx b/pages/docs/configuration/librechat_yaml/object_structure/file_config.mdx index 0f7f822..6077824 100644 --- a/pages/docs/configuration/librechat_yaml/object_structure/file_config.mdx +++ b/pages/docs/configuration/librechat_yaml/object_structure/file_config.mdx @@ -18,6 +18,7 @@ There are 7 main fields under `fileConfig`: - At the time of writing, the Assistants endpoint [supports filetypes from this list](https://platform.openai.com/docs/assistants/tools/file-search#supported-files). - OpenAI, Azure OpenAI, Google, and Custom endpoints support files through the [RAG API.](../../rag_api.mdx) +- The `ocr`, `text`, and `stt` sections control file processing for features like [Upload as Text](/docs/features/upload_as_text) and [OCR](/docs/features/ocr) - Any other endpoints not mentioned, like Plugins, do not support file uploads (yet). - The Assistants endpoint has a defined endpoint value of `assistants`. All other endpoints use the defined value `default` - For non-assistants endpoints, you can adjust file settings for all of them under `default` @@ -155,6 +156,10 @@ fileConfig: **Description:** The `ocr` section configures which file types should be processed using OCR functionality for extracting text from visual documents. +**Note:** This section controls file type matching for OCR processing. 
To enable agent capabilities and configure OCR services, see: +- [Agents Configuration](/docs/configuration/librechat_yaml/object_structure/agents#capabilities) for the `ocr` and `context` capabilities +- [OCR Configuration](/docs/configuration/librechat_yaml/object_structure/ocr) for OCR service setup + ### supportedMimeTypes STT > text parsing -- Files not matching any pattern will not be processed +- **Processing precedence: OCR > STT > text parsing > fallback** +- Files not matching any pattern will fall back to text parsing + +## File Processing Priority + +LibreChat processes uploaded files based on MIME type matching with the following **priority order**: + +1. **OCR** - If file matches `ocr.supportedMimeTypes` AND OCR is configured +2. **STT** - If file matches `stt.supportedMimeTypes` AND STT is configured +3. **Text Parsing** - If file matches `text.supportedMimeTypes` +4. **Fallback** - Text parsing as last resort + +This processing order ensures optimal extraction quality while maintaining functionality even when specialized services (OCR/STT) are not configured. 
+ +### Processing Examples + +**PDF file with OCR configured:** +- File matches `ocr.supportedMimeTypes` +- **Uses OCR** to extract text +- Better quality for scanned PDFs and images + +**PDF file without OCR configured:** +- File matches `text.supportedMimeTypes` (or uses fallback) +- **Uses text parsing** library +- Works well for digital PDFs with selectable text + +**Python file:** +- File matches `text.supportedMimeTypes` +- **Uses text parsing** (no OCR needed) +- Direct text extraction + +**Audio file with STT configured:** +- File matches `stt.supportedMimeTypes` +- **Uses STT** to transcribe audio to text + +**Image file without OCR configured:** +- File matches `ocr.supportedMimeTypes`, but OCR is not configured +- **Falls back to text parsing** +- Limited extraction capability without OCR + +This priority system allows features like "Upload as Text" to work without requiring OCR configuration, while still leveraging OCR when available for improved extraction quality. ## endpoints diff --git a/pages/docs/features/_meta.ts b/pages/docs/features/_meta.ts index 010d8fd..3d50f7f 100644 --- a/pages/docs/features/_meta.ts +++ b/pages/docs/features/_meta.ts @@ -7,6 +7,8 @@ export default { web_search: 'Web Search', memory: 'Memory', image_gen: 'Image Generation', + upload_as_text: 'Upload as Text', + ocr: 'OCR for Documents', // local_setup: 'Local Setup', // custom_endpoints: 'Custom Endpoints', url_query: 'URL Query Parameters', diff --git a/pages/docs/features/agents.mdx b/pages/docs/features/agents.mdx index 4e85d01..13b0f21 100644 --- a/pages/docs/features/agents.mdx +++ b/pages/docs/features/agents.mdx @@ -60,16 +60,21 @@ The File Search capability enables: - Context-aware responses based on file contents - File attachment support at both agent and chat thread levels -### File Context (using Optical Character Recognition) +### File Context -The File Context (OCR) capability allows your agent to extract and process text from images 
and documents: +The File
Context capability allows your agent to store extracted text from files as part of its system instructions: - Extract text while maintaining document structure and formatting - Process complex layouts including multi-column text and mixed content - Handle tables, equations, and other specialized content - Work with multilingual content -- [More info about OCR](/docs/features/ocr) - - **Currently uses Mistral OCR API which may incur costs** +- Text is stored in the agent's instructions in the database +- **No OCR service required** - Uses text parsing by default with fallback methods +- **Enhanced by OCR** - If OCR is configured, extraction quality improves for images and scanned documents +- Uses the same processing logic as "Upload as Text": **OCR > STT > text parsing** +- [More info about OCR configuration](/docs/features/ocr) + +**Note:** File Context includes extracted text in the agent's system instructions. For temporary document questions in individual conversations, use [Upload as Text](/docs/features/upload_as_text) from the chat instead. ### Model Context Protocol (MCP) @@ -211,18 +216,47 @@ A single, non-tool response is 1 step. A singular round of tool usage is usually ## File Management -Agents support four distinct file upload categories: +Agents support multiple ways to work with files: -1. **Image Upload**: For visual content processing +### In Chat Interface + +When chatting with an agent, you have four upload options: + +1. **Upload Images** + - Uploads images for native vision model support + - Sends images directly to the model provider + +2. **Upload as Text** (requires `context` capability) + - Extracts and includes full document content in conversation + - Uses text parsing by default; enhanced by OCR if configured + - Content exists only in current conversation + - See [Upload as Text](/docs/features/upload_as_text) + +3. 
**Upload for File Search** (requires `file_search` capability, toggled ON) + - Uses semantic search (RAG) with vector stores + - Returns relevant chunks via tool use + - Optimal for large documents/multiple files + - Sub-optimal for structured data (CSV, Excel, JSON, etc.) + +4. **Upload for Code Interpreter** (requires `execute_code` capability, toggled ON) + - Adds files to code interpreter environment + - Optimal for structured data (CSV, Excel, JSON, etc.) + - More info about [Code Interpreter](/docs/features/code_interpreter) + +### In Agent Builder + +When configuring an agent, you can attach files in different categories: + +1. **Image Upload**: For visual content the agent can reference 2. **File Search Upload**: Documents for RAG capabilities 3. **Code Interpreter Upload**: Files for code processing -4. **File Context (OCR)**: Documents processed with OCR and added to the agent's instructions +4. **File Context**: Documents with extracted text to supplement agent instructions -Files can be attached directly to the agent configuration or within individual chat threads. +**File Context** uses the `context` capability and works just like ["Upload as Text"](/docs/features/upload_as_text) - it uses text parsing by default and is enhanced by OCR when configured. Text is extracted at upload time and stored in the agent's instructions. This is ideal for giving agents persistent knowledge from documents, PDFs, code files, or images with text. -![File Context using OCR for agents](/images/ocr/file_context_ocr.png) +**Processing priority:** OCR > STT > text parsing (same as Upload as Text) -Files uploaded as "File Context" are processed using OCR to extract text, which is then added to the Agent's instructions. This is ideal for documents, images with text, or PDFs where you need the full text content of a file to be available to the agent. Note, the OCR is performed at the time of upload and is not stored as a separate file, rather purely as text in the database. 
+**Note:** The extracted text is included as part of the agent's system instructions. ## Sharing and Permissions @@ -280,11 +314,16 @@ LibreChat allows admins to configure the use of agents via the [`librechat.yaml` 1. Select "Agents" from the endpoint dropdown menu 2. Open the Agent Builder panel 3. Fill out the required agent details -4. Configure desired capabilities (Code Interpreter, File Search, File Context or OCR) +4. Configure desired capabilities (Code Interpreter, File Search, File Context, etc.) 5. Add necessary tools and files 6. Set sharing permissions if desired 7. Create and start using your agent +When chatting with agents, you can: +- Use "Upload as Text" to include full document content in conversations (text parsing by default, enhanced by OCR) +- Use "Upload for File Search" for semantic search over documents (requires RAG API) +- Add files to the agent's "File Context" (in the Agent Builder) to include a file's full content as part of the agent's system instructions + ## Migration Required (v0.8.0-rc3+) diff --git a/pages/docs/features/ocr.mdx b/pages/docs/features/ocr.mdx index bbb01c1..5ebe535 100644 --- a/pages/docs/features/ocr.mdx +++ b/pages/docs/features/ocr.mdx @@ -1,15 +1,59 @@ --- -title: File Context (OCR) -description: Learn how to use LibreChat's OCR capability to extract text from images and documents for AI processing. +title: OCR for Documents +description: Learn how to configure Optical Character Recognition (OCR) to enhance text extraction in LibreChat's file upload features. --- -# File Context via Optical Character Recognition (OCR) +# OCR for Documents -LibreChat's OCR (Optical Character Recognition) feature enables AI agents to extract and process text from images and documents. This capability enhances the AI's ability to work with visual content, making it possible to analyze, understand, and respond to information contained in images. +OCR (Optical Character Recognition) in LibreChat is an optional enhancement for text extraction from files.
-## Overview +### Upload as Text -OCR functionality in LibreChat allows agents to: +The "Upload as Text" feature (from the chat) works the same way: + +- Files matching `fileConfig.ocr.supportedMimeTypes` use OCR if available +- Falls back to text parsing if OCR is not configured +- Especially useful for images with text, scanned documents, and complex PDFs +- Processing priority: **OCR > STT > text parsing** +- See the [Upload as Text](/docs/features/upload_as_text) documentation for details. + +### File Context (for agents) + +When you upload files through the Agent Builder's File Context section: + +1. Text is extracted using text parsing by default (OCR/STT if configured and file matches) +2. Extracted text is stored as part of the agent's system instructions +3. Agent can reference this context in all conversations +4. **OCR service is optional** - the feature works without it using text parsing + +Files uploaded as "File Context" are processed to extract text, which is then added to the Agent's system instructions. This is ideal for documents, code files, PDFs, or images with text where you need the full text content to be included in the agent's instructions. + +**Note:** The extracted text is included in the agent's system instructions. + +## Optional OCR Configuration + +Both Agent File Context and Upload as Text work out-of-the-box using text parsing. To enhance extraction quality for images and scanned documents, you can optionally configure an OCR service: + +```yaml +# librechat.yaml +endpoints: + agents: + capabilities: + - "context" # Enables both agent file context and upload as text + - "ocr" # Optionally enhances both with OCR + +ocr: + strategy: "mistral_ocr" + apiKey: "${OCR_API_KEY}" + baseURL: "https://api.mistral.ai/v1" + mistralModel: "mistral-ocr-latest" +``` + +**Note:** The `context` capability is enabled by default. 
You only need to configure OCR (the `ocr` capability) if you want enhanced extraction quality for images and scanned documents. + +## Overview of OCR Capabilities + +OCR functionality in LibreChat allows you to: - Extract text from images and documents - Maintain document structure and formatting @@ -17,10 +61,6 @@ OCR functionality in LibreChat allows agents to: - Handle tables, equations, and other specialized content - Work with multilingual content -## Availability - -Currently, OCR is **only available as an agent capability**. This means you must use an agent via the Agents endpoint to leverage OCR functionality. - ## OCR Strategies LibreChat supports multiple OCR strategies to meet different deployment needs and requirements. Choose the strategy that best fits your infrastructure and compliance requirements. @@ -132,49 +172,49 @@ Support for custom OCR providers and user-defined strategies is planned for futu For additional, detailed configuration options, see the [OCR Config Object Structure](/docs/configuration/librechat_yaml/object_structure/ocr). -## Using File Context (OCR) in LibreChat +## OCR Processing Configuration -LibreChat provides two main ways to use OCR functionality: +Control which file types are processed with OCR using `fileConfig`: -### 1. Upload as Text in Chat +```yaml +fileConfig: + ocr: + supportedMimeTypes: + - "^image/(jpeg|gif|png|webp|heic|heif)$" + - "^application/pdf$" + - "^application/vnd\\.openxmlformats-officedocument\\.(wordprocessingml\\.document|presentationml\\.presentation|spreadsheetml\\.sheet)$" + - "^application/vnd\\.ms-(word|powerpoint|excel)$" + - "^application/epub\\+zip$" +``` -In any chat conversation, you can use OCR to extract text from images or documents: +Files matching these patterns will use OCR when: +- Uploaded to agent File Context (if OCR is configured; otherwise falls back to text parsing) +- Uploaded as text in chat (if OCR is configured; otherwise falls back to text parsing) -1. Click the attachment icon in the chat input -2.
Select "Upload as Text" from the menu -3. Choose an image or document file -4. The OCR system will process the file and insert the extracted text into your message +For more details on file processing configuration, see [File Config Object Structure](/docs/configuration/librechat_yaml/object_structure/file_config). -![Upload as Text option in the attachment menu](/images/ocr/upload_as_text.png) +## Use Cases for Agent File Context -### 2. File Context for Agents +Agent File Context is ideal for: -When working with agents, you can add documents as context using OCR: +- **Persistent Agent Knowledge**: Add documentation, policies, or reference materials to an agent's system instructions +- **Specialized Agents**: Create agents with domain-specific knowledge from documents +- **Document-Based Assistants**: Build agents that always reference specific manuals or guides +- **Code Files**: Include code examples or libraries in agent instructions +- **Structured Data**: Add CSV, JSON, or other structured data for the agent to reference -1. Open the Agent Builder panel or edit an existing agent -2. In the File Context section, click "Upload File Context" -3. Select a document or image file -4. The OCR system will extract text from the file and add it to the agent's instructions +When OCR is configured, File Context also handles: +- **Scanned Document Processing**: Extract and store text from images or scanned PDFs +- **Image Text Extraction**: Extract text from screenshots or photos of documents -![File Context using OCR for agents](/images/ocr/file_context_ocr.png) - -Files uploaded as "Context" are processed using OCR to extract text, which is then added to the Agent's instructions. This is ideal for documents, images with text, or PDFs where you need the full text content of a file to be available to the agent. - -**Note,** the OCR is performed at the time of upload and is not stored as a separate file, rather purely as text in the database. 
- -## Example Use Cases - -- **Document Analysis**: Extract and analyze text from scanned documents, PDFs, or images -- **Data Extraction**: Pull specific information from forms, receipts, or invoices -- **Research Assistance**: Process academic papers, articles, or books -- **Language Translation**: Extract text from foreign language documents for translation -- **Content Digitization**: Convert printed materials into digital, searchable text +For temporary document questions in chat, see [Upload as Text](/docs/features/upload_as_text). ## Limitations -- OCR accuracy may vary depending on image quality, document complexity, and text clarity +- Text extraction accuracy may vary depending on file type, image quality, document complexity, and text clarity - Some specialized formatting or unusual layouts might not be perfectly preserved - Very large documents may be truncated due to token limitations of the underlying AI models +- For best results with images and scanned documents, configure an OCR service ## Future Enhancements diff --git a/pages/docs/features/upload_as_text.mdx b/pages/docs/features/upload_as_text.mdx new file mode 100644 index 0000000..476ecf7 --- /dev/null +++ b/pages/docs/features/upload_as_text.mdx @@ -0,0 +1,324 @@ +--- +title: Upload as Text +description: Upload files to include their full content in conversations without requiring OCR configuration. +--- + +# Upload as Text + +Upload as Text allows you to upload documents and have their full content included directly in your conversation with the AI. This feature works out-of-the-box using text parsing methods, with optional OCR enhancement for improved extraction quality. 
+ +## Overview + +- **No OCR required** - Uses text parsing with fallback methods by default +- **Enhanced by OCR** - If OCR is configured, extraction quality improves for images and scanned documents +- **Full document content** - Entire file content available to the model in the conversation +- **Works with all models** - No special tool capabilities needed +- **Token limit control** - Configurable via `fileTokenLimit` to manage context usage + +## The `context` Capability + +Upload as Text is controlled by the `context` capability in your LibreChat configuration. + +```yaml +# librechat.yaml +endpoints: + agents: + capabilities: + - "context" # Enables "Upload as Text" +``` + +**Default:** The `context` capability is included by default. You only need to explicitly add it if you've customized the capabilities list. + +## How It Works + +When you upload a file using "Upload as Text": + +1. LibreChat checks the file MIME type against `fileConfig` patterns +2. **Processing method determined by precedence: OCR > STT > text parsing** +3. If file matches `fileConfig.ocr.supportedMimeTypes` AND OCR is configured: **Use OCR** +4. If file matches `fileConfig.stt.supportedMimeTypes` AND STT is configured: **Use STT** +5. If file matches `fileConfig.text.supportedMimeTypes`: **Use text parsing** +6. Otherwise: **Fallback to text parsing** +7. Text is truncated to `fileConfig.fileTokenLimit` before prompt construction +8. 
Full extracted text included in conversation context + +### Text Processing Methods + +**Text Parsing (Default):** +- Uses a robust parsing library (same as the RAG API) +- Handles PDFs, Word docs, text files, code files, and more +- No external service required +- Works immediately without configuration +- Fallback method if no other match + +**OCR Enhancement (Optional):** +- Improves extraction from images, scanned documents, and complex PDFs +- Requires OCR service configuration +- Automatically used for files matching `fileConfig.ocr.supportedMimeTypes` when available +- See [OCR Configuration](/docs/features/ocr) + +**STT Processing (Optional):** +- Converts audio files to text +- Requires STT service configuration +- See [Speech-to-Text Configuration](/docs/features/speech-to-text) + +## Usage + +1. Click the attachment icon in the chat input +2. Select "Upload as Text" from the menu +3. Choose your file +4. File content is extracted and included in your message + +**Note:** If you don't see "Upload as Text", ensure the `context` capability is enabled in your [`endpoints.agents.capabilities` configuration](/docs/configuration/librechat_yaml/object_structure/agents#capabilities). + +## Configuration + +### Basic Configuration + +The `context` capability is enabled by default. No additional configuration is required for basic text parsing functionality. 
+ +### File Handling Configuration + +Control text processing behavior with `fileConfig`: + +```yaml +fileConfig: + # Maximum tokens from text files before truncation + fileTokenLimit: 100000 + + # Files matching these patterns use OCR (if configured) + ocr: + supportedMimeTypes: + - "^image/(jpeg|gif|png|webp|heic|heif)$" + - "^application/pdf$" + - "^application/vnd\\.openxmlformats-officedocument\\.(wordprocessingml\\.document|presentationml\\.presentation|spreadsheetml\\.sheet)$" + - "^application/vnd\\.ms-(word|powerpoint|excel)$" + - "^application/epub\\+zip$" + + # Files matching these patterns use text parsing + text: + supportedMimeTypes: + - "^text/(plain|markdown|csv|json|xml|html|css|javascript|typescript|x-python|x-java|x-csharp|x-php|x-ruby|x-go|x-rust|x-kotlin|x-swift|x-scala|x-perl|x-lua|x-shell|x-sql|x-yaml|x-toml)$" + + # Files matching these patterns use STT (if configured) + stt: + supportedMimeTypes: + - "^audio/(mp3|mpeg|mpeg3|wav|wave|x-wav|ogg|vorbis|mp4|x-m4a|flac|x-flac|webm)$" +``` + +**Processing Priority:** OCR > STT > text parsing > fallback + +For more details, see [File Config Object Structure](/docs/configuration/librechat_yaml/object_structure/file_config). + +### Optional: Configure OCR for Enhanced Extraction + +OCR is **not required** but enhances extraction quality when configured: + +```yaml +# librechat.yaml +ocr: + strategy: "mistral_ocr" + apiKey: "${OCR_API_KEY}" + baseURL: "https://api.mistral.ai/v1" + mistralModel: "mistral-ocr-latest" +``` + +See [OCR Configuration](/docs/features/ocr) for full details. 
+ +## When to Use Each Upload Option + +LibreChat offers three different ways to upload files, each suited for different use cases: + +### Use "Upload as Text" when: +- ✅ You want the AI to read the complete document content +- ✅ Working with smaller files that fit in context +- ✅ You need "chat with files" functionality +- ✅ Using models without tool capabilities +- ✅ You want direct content access without semantic search + +### Use "Upload for File Search" when: +- ✅ Working with large documents or multiple files +- ✅ You want to optimize token usage +- ✅ You need semantic search for relevant sections +- ✅ Building knowledge bases +- ✅ The `file_search` capability is enabled and toggled ON + +### Use standard "Upload Files" when: +- ✅ Using vision models to analyze images +- ✅ Using code interpreter to execute code +- ✅ Files don't need text extraction + +## Supported File Types + +### Text Files (text parsing) +- Plain text, Markdown, CSV, JSON, XML, HTML +- Programming languages (Python, JavaScript, Java, C++, etc.) +- Configuration files (YAML, TOML, INI, etc.) +- Shell scripts, SQL files + +### Documents (text parsing or OCR) +- PDF documents +- Word documents (.docx, .doc) +- PowerPoint presentations (.pptx, .ppt) +- Excel spreadsheets (.xlsx, .xls) +- EPUB books + +### Images (OCR if configured) +- JPEG, PNG, GIF, WebP +- HEIC, HEIF (Apple formats) +- Screenshots, photos of documents, scanned images + +### Audio (STT if configured) +- MP3, WAV, OGG, FLAC +- M4A, WebM +- Voice recordings, podcasts + +## File Processing Priority + +LibreChat processes files based on MIME type matching with the following **priority order**: + +1. **OCR** - If file matches `ocr.supportedMimeTypes` AND OCR is configured +2. **STT** - If file matches `stt.supportedMimeTypes` AND STT is configured +3. **Text Parsing** - If file matches `text.supportedMimeTypes` +4. 
**Fallback** - Text parsing as last resort + +### Processing Examples + +**PDF file with OCR configured:** +- Matches `ocr.supportedMimeTypes` +- **Uses OCR** to extract text +- Better quality for scanned PDFs + +**PDF file without OCR configured:** +- Matches `text.supportedMimeTypes` (or uses fallback) +- **Uses text parsing** library +- Works well for digital PDFs + +**Python file:** +- Matches `text.supportedMimeTypes` +- **Uses text parsing** (no OCR needed) +- Direct text extraction + +**Audio file with STT configured:** +- Matches `stt.supportedMimeTypes` +- **Uses STT** to transcribe audio to text + +## Token Limits + +Files are truncated to `fileTokenLimit` tokens to manage context window usage: + +```yaml +fileConfig: + fileTokenLimit: 100000 # Default: 100,000 tokens +``` + +- Truncation happens at runtime before prompt construction +- Helps prevent exceeding model context limits +- Configurable based on your needs and model capabilities +- Larger limits allow more content but use more tokens + +## Comparison with Other File Features + +| Feature | Capability | Requires Service | Persistence | Best For | +|---------|-----------|------------------|-------------|----------| +| **Upload as Text** | `context` | No (enhanced by OCR) | Single conversation | Temporary document questions | +| **Agent File Context** | `context` | No (enhanced by OCR) | Agent system instructions | Specialized agent knowledge | +| **File Search** | `file_search` | Yes (vector DB) | Stored in vector store | Large documents, semantic search | + +### Upload as Text vs Agent File Context + +**Upload as Text (`context`):** +- Available in any chat conversation +- Content included in current conversation only +- No OCR service required (text parsing by default) +- Best for one-off document questions + +**Agent File Context (`context`):** +- Only available in Agent Builder +- Content stored in agent's system instructions +- No OCR service required (text parsing by default) +- Best for creating specialized
agents with persistent knowledge +- See [OCR for Documents](/docs/features/ocr) + +### Upload as Text vs File Search + +**Upload as Text (`context`):** +- Full document content in conversation context +- Direct access to all text +- Token usage: entire file (up to limit) +- Works without RAG API configuration + +**File Search (`file_search`):** +- Semantic search over documents +- Returns relevant chunks via tool use +- Token usage: only relevant sections +- Requires RAG API and vector store configuration +- See [RAG API](/docs/features/rag_api) + +## Example Use Cases + +- **Document Analysis**: Upload contracts, reports, or articles for analysis +- **Code Review**: Upload source files for review and suggestions +- **Data Extraction**: Extract information from structured documents +- **Translation**: Translate document contents +- **Summarization**: Summarize articles, papers, or reports +- **Research**: Discuss academic papers or technical documentation +- **Troubleshooting**: Share log files for analysis +- **Content Editing**: Review and edit written content +- **Data Processing**: Work with CSV or JSON data files + +## Troubleshooting + +### "Upload as Text" option not appearing + +**Solution:** Ensure the `context` capability is enabled: + +```yaml +endpoints: + agents: + capabilities: + - "context" # Add this if missing +``` + +### File content not extracted properly + +**Solutions:** +1. Check if file type is supported (matches `fileConfig` patterns) +2. For images/scanned documents: Configure OCR for better extraction +3. For audio files: Configure STT service +4. 
Verify the file is not corrupted + +### Content seems truncated + +**Solution:** Increase the token limit: + +```yaml +fileConfig: + fileTokenLimit: 150000 # Increase as needed +``` + +### Poor extraction quality from images + +**Solution:** Configure OCR to enhance extraction: + +```yaml +ocr: + strategy: "mistral_ocr" + apiKey: "${OCR_API_KEY}" +``` + +See [OCR Configuration](/docs/features/ocr) for details. + +## Related Features + +- [OCR for Documents](/docs/features/ocr) - Optional OCR enhancement for Upload as Text and agent File Context +- [File Search](/docs/features/rag_api) - Semantic search over documents +- [Agents](/docs/features/agents) - Build custom AI assistants +- [File Configuration](/docs/configuration/librechat_yaml/object_structure/file_config) - Configure file handling +- [OCR Configuration](/docs/configuration/librechat_yaml/object_structure/ocr) - Configure OCR services + +--- + +Upload as Text provides a simple, powerful way to work with documents in LibreChat without requiring complex configuration or external services. + + diff --git a/public/images/ocr/file_context_ocr.png b/public/images/ocr/file_context_ocr.png deleted file mode 100644 index 1608b43..0000000 Binary files a/public/images/ocr/file_context_ocr.png and /dev/null differ diff --git a/public/images/ocr/upload_as_text.png b/public/images/ocr/upload_as_text.png deleted file mode 100644 index 58b934b..0000000 Binary files a/public/images/ocr/upload_as_text.png and /dev/null differ
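The file processing precedence these docs describe (OCR > STT > text parsing > fallback, where OCR and STT only apply when their service is configured) can be sketched as a small MIME-type resolver. This is a hypothetical TypeScript illustration, not LibreChat's actual implementation: `resolveProcessingMethod`, `FileProcessingConfig`, and the `ocrConfigured`/`sttConfigured` flags are invented names. The regex lists are a subset of the `fileConfig` patterns shown in the docs.

```typescript
// Hypothetical sketch of the documented precedence:
// OCR > STT > text parsing > fallback (text parsing).
// Names and shapes are illustrative assumptions, not LibreChat's API.

type ProcessingMethod = "ocr" | "stt" | "text" | "fallback_text";

interface FileProcessingConfig {
  ocrMimeTypes: RegExp[];
  sttMimeTypes: RegExp[];
  textMimeTypes: RegExp[];
  ocrConfigured: boolean; // assumption: true when an OCR service is set up
  sttConfigured: boolean; // assumption: true when an STT service is set up
}

const matchesAny = (mimeType: string, patterns: RegExp[]): boolean =>
  patterns.some((pattern) => pattern.test(mimeType));

function resolveProcessingMethod(
  mimeType: string,
  config: FileProcessingConfig,
): ProcessingMethod {
  // 1. OCR wins only when the type matches AND an OCR service is configured.
  if (config.ocrConfigured && matchesAny(mimeType, config.ocrMimeTypes)) {
    return "ocr";
  }
  // 2. STT follows the same rule for audio types.
  if (config.sttConfigured && matchesAny(mimeType, config.sttMimeTypes)) {
    return "stt";
  }
  // 3. Explicit match on the text parsing patterns.
  if (matchesAny(mimeType, config.textMimeTypes)) {
    return "text";
  }
  // 4. Everything else still gets text parsing as a last resort.
  return "fallback_text";
}

// A subset of the fileConfig patterns from the docs, with no services configured.
const exampleConfig: FileProcessingConfig = {
  ocrMimeTypes: [/^image\/(jpeg|gif|png|webp|heic|heif)$/, /^application\/pdf$/],
  sttMimeTypes: [/^audio\/(mp3|mpeg|mpeg3|wav|wave|x-wav|ogg|vorbis|mp4|x-m4a|flac|x-flac|webm)$/],
  textMimeTypes: [/^text\/(plain|markdown|csv|json|xml|html|css|javascript|typescript|x-python)$/],
  ocrConfigured: false,
  sttConfigured: false,
};

console.log(resolveProcessingMethod("application/pdf", exampleConfig));
```

With both services unconfigured, `application/pdf` resolves to `fallback_text`, because the example `text` patterns only cover `text/*` types. Passing `{ ...exampleConfig, ocrConfigured: true }` makes the same file resolve to `ocr`, which mirrors the "PDF file with/without OCR configured" examples.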