diff --git a/content/docs/features/upload_as_text.mdx b/content/docs/features/upload_as_text.mdx index 53e5505..df66eb9 100644 --- a/content/docs/features/upload_as_text.mdx +++ b/content/docs/features/upload_as_text.mdx @@ -1,321 +1,308 @@ --- title: Upload Files as Text icon: Upload -description: Upload files to include their full content in conversations without requiring OCR configuration. +description: Drop any file into your chat and let LibreChat read it — no setup needed. --- -Upload as Text allows you to upload documents and have their full content included directly in your conversation with the AI. This feature works out-of-the-box using text parsing methods, with optional OCR enhancement for improved extraction quality. +# Upload Files as Text -## Overview +Ever wanted to hand a PDF, a code file, or a spreadsheet to the AI and just say _"read this"_? That's exactly what **Upload as Text** does. -- **No OCR required** - Uses text parsing with fallback methods by default -- **Enhanced by OCR** - If OCR is configured, extraction quality improves for images and scanned documents -- **Full document content** - Entire file content available to the model in the conversation -- **Works with all models** - No special tool capabilities needed -- **Token limit control** - Configurable via `fileTokenLimit` to manage context usage +You attach a file, LibreChat extracts the text from it, and the full content gets pasted straight into your conversation. The AI can then read every word of it — no plugins, no vector databases, no extra services to configure. It works out of the box. -## The `context` Capability + + Upload as Text works immediately on any LibreChat instance. It uses built-in text parsing — you don't need OCR, a RAG pipeline, or any external service to get started. + -Upload as Text is controlled by the `context` capability in your LibreChat configuration. 
+--- -```yaml -# librechat.yaml +## How to use it + + + + ### Click the attachment icon + + In the chat input bar, click the **paperclip** (📎) icon. + + + ### Pick "Upload as Text" + + From the dropdown menu, select **Upload as Text**. This tells LibreChat to read the file contents rather than pass it as a raw attachment. + + + ### Choose your file + + Select the file from your device. LibreChat will extract the text and embed it directly into your message. + + + ### Ask your question + + Type your prompt as usual. The AI now has the full text of your file in context and can reference any part of it. + + + + + If "Upload as Text" doesn't appear, the `context` capability may have been disabled by your admin. It's on by default — but if the capabilities list was customized, `context` needs to be explicitly included. See the [configuration section](#the-context-capability) below. + + +--- + +## What happens under the hood + +When you upload a file this way, LibreChat doesn't just dump raw bytes into the prompt. It runs through a processing pipeline to extract clean, readable text: + +1. **MIME type detection** — LibreChat checks what kind of file you uploaded (PDF, image, audio, source code, etc.) by inspecting its MIME type. +2. **Method selection** — Based on the file type and what services are available, it picks the best extraction method using this priority: + + + + | Priority | Method | When it's used | + |----------|--------|---------------| + | 1st | **OCR** | File is an image or scanned document, _and_ OCR is configured | + | 2nd | **STT** (Speech-to-Text) | File is audio, _and_ STT is configured | + | 3rd | **Text parsing** | File matches a known text MIME type | + | 4th | **Fallback** | None of the above matched — tries text parsing anyway | + + + **A `.pdf` on an instance with OCR configured:** + → OCR kicks in. Great for scanned docs and complex layouts. + + **A `.pdf` on a default instance (no OCR):** + → Text parsing handles it. 
Works well for digitally-created PDFs. + + **A `.py` Python file:** + → Straight to text parsing. Source code is already text — no conversion needed. + + **An `.mp3` on an instance with STT configured:** + → Speech-to-Text transcribes it into text for the conversation. + + **A `.png` screenshot with no OCR configured:** + → Falls back to text parsing (limited results — consider setting up OCR for images). + + + +3. **Token truncation** — The extracted text is trimmed to the `fileTokenLimit` (default: 100,000 tokens) so it doesn't blow past the model's context window. +4. **Prompt injection** — The text gets included in the conversation context, right alongside your message. + +--- + +## Which files are supported + + + + These are parsed directly — they're already text, so no conversion is needed. + + - Plain text (`.txt`), Markdown (`.md`), CSV, JSON, XML, HTML, CSS + - Programming languages — Python, JavaScript, TypeScript, Java, C#, PHP, Ruby, Go, Rust, Kotlin, Swift, Scala, Perl, Lua + - Config files — YAML, TOML, INI + - Shell scripts, SQL files + + + Text parsing handles these out of the box. If OCR is configured, it takes over for better accuracy on complex layouts. + + - **PDF** — digital and scanned (scanned PDFs benefit from OCR) + - **Word** — `.docx`, `.doc` + - **PowerPoint** — `.pptx`, `.ppt` + - **Excel** — `.xlsx`, `.xls` + - **EPUB** books + + + Images **require OCR** to produce useful text. Without it, results will be poor. + + - JPEG, PNG, GIF, WebP + - HEIC, HEIF (Apple formats) + - Screenshots, photos of documents, scanned pages + + + Audio files **require STT** to be configured. There's no fallback — audio can't be "text parsed." + + - MP3, WAV, OGG, FLAC + - M4A, WebM + - Voice recordings, podcast clips + + + +--- + +## Upload as Text vs. other upload options + +LibreChat has three ways to upload files. Each one works differently and suits different situations: + + + + Extracts the full file content and drops it into the conversation. 
Best for smaller files where you want the AI to read everything — contracts, code files, articles. Works with all models, no extra services needed. + + + Indexes the file in a vector database and retrieves only the relevant chunks when you ask a question. Better for large files or collections of files where dumping everything into context would waste tokens. Requires the RAG API. + + + Passes the file directly to the model — used for vision models analyzing images, or code interpreter running scripts. No text extraction happens. + + + +**Quick decision guide:** + +| Situation | Best option | +|-----------|-------------| +| _"Read this 5-page contract and summarize it"_ | **Upload as Text** | +| _"I have 50 PDFs, find what mentions pricing"_ | **File Search (RAG)** | +| _"What's in this screenshot?"_ (vision model) | **Standard Upload** | +| _"Run this Python script"_ (code interpreter) | **Standard Upload** | +| _"Review this code file for bugs"_ | **Upload as Text** | +| _"Search through our company docs"_ | **File Search (RAG)** | + +--- + +## The `context` capability + +Under the hood, Upload as Text is powered by the **`context` capability**. This is what controls whether the feature appears in your chat UI. + + + The `context` capability is **enabled by default**. You only need to touch this if your admin has customized the capabilities list and accidentally left it out. + + +```yaml title="librechat.yaml" endpoints: agents: capabilities: - - "context" # Enables "Upload as Text" + - "context" # This is what enables "Upload as Text" ``` -**Default:** The `context` capability is included by default. You only need to explicitly add it if you've customized the capabilities list. +The same `context` capability also powers **Agent File Context** (uploading files through the Agent Builder to embed text into an agent's system instructions). 
The difference is _where_ the text ends up: -## How It Works +| | Upload as Text | Agent File Context | +|---|---|---| +| **Where** | Chat input (any conversation) | Agent Builder panel | +| **Scope** | Current conversation only | Persists in agent's instructions | +| **Use case** | One-off document questions | Building specialized agents with baked-in knowledge | -When you upload a file using "Upload as Text": +--- -1. LibreChat checks the file MIME type against `fileConfig` patterns -2. **Processing method determined by precedence: OCR > STT > text parsing** -3. If file matches `fileConfig.ocr.supportedMimeTypes` AND OCR is configured: **Use OCR** -4. If file matches `fileConfig.stt.supportedMimeTypes` AND STT is configured: **Use STT** -5. If file matches `fileConfig.text.supportedMimeTypes`: **Use text parsing** -6. Otherwise: **Fallback to text parsing** -7. Text is truncated to `fileConfig.fileTokenLimit` before prompt construction -8. Full extracted text included in conversation context +## Token limits and truncation -### Text Processing Methods +When a file is too long to fit in the model's context window, LibreChat truncates the extracted text to stay within bounds. This happens automatically — you don't need to worry about it, but it's good to know how it works. 
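The truncation step can be sketched in a few lines. This is a minimal illustration that assumes a crude four-characters-per-token estimate; LibreChat uses a real tokenizer internally, and the helper names here are hypothetical:

```typescript
// Hypothetical sketch of fileTokenLimit truncation (not LibreChat's actual code).
// Assumes roughly 4 characters per token; the real pipeline uses a proper tokenizer.

const DEFAULT_FILE_TOKEN_LIMIT = 100_000; // mirrors the fileConfig.fileTokenLimit default

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function truncateToTokenLimit(
  text: string,
  limit: number = DEFAULT_FILE_TOKEN_LIMIT,
): string {
  if (estimateTokens(text) <= limit) {
    return text; // fits within the budget: include the whole file
  }
  // Over budget: keep the first `limit` tokens' worth of text and drop the rest
  return text.slice(0, limit * 4);
}
```

The practical takeaway: truncation keeps the beginning of the file, so in a very large document it is the content near the end that gets dropped.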
-**Text Parsing (Default):** -- Uses a robust parsing library (same as the RAG API) -- Handles PDFs, Word docs, text files, code files, and more -- No external service required -- Works immediately without configuration -- Fallback method if no other match - -**OCR Enhancement (Optional):** -- Improves extraction from images, scanned documents, and complex PDFs -- Requires OCR service configuration -- Automatically used for files matching `fileConfig.ocr.supportedMimeTypes` when available -- See [OCR Configuration](/docs/features/ocr) - -**STT Processing (Optional):** -- Converts audio files to text -- Requires STT service configuration -- See [Speech-to-Text Configuration](/docs/configuration/librechat_yaml/object_structure/file_config#stt) - -## Usage - -1. Click the attachment icon in the chat input -2. Select "Upload as Text" from the menu -3. Choose your file -4. File content is extracted and included in your message - -**Note:** If you don't see "Upload as Text", ensure the `context` capability is enabled in your [`endpoints.agents.capabilities` configuration](/docs/configuration/librechat_yaml/object_structure/agents#capabilities). - -## Configuration - -### Basic Configuration - -The `context` capability is enabled by default. No additional configuration is required for basic text parsing functionality. 
- -### File Handling Configuration - -Control text processing behavior with `fileConfig`: - -```yaml -fileConfig: - # Maximum tokens from text files before truncation - fileTokenLimit: 100000 - - # Files matching these patterns use OCR (if configured) - ocr: - supportedMimeTypes: - - "^image/(jpeg|gif|png|webp|heic|heif)$" - - "^application/pdf$" - - "^application/vnd\\.openxmlformats-officedocument\\.(wordprocessingml\\.document|presentationml\\.presentation|spreadsheetml\\.sheet)$" - - "^application/vnd\\.ms-(word|powerpoint|excel)$" - - "^application/epub\\+zip$" - - # Files matching these patterns use text parsing - text: - supportedMimeTypes: - - "^text/(plain|markdown|csv|json|xml|html|css|javascript|typescript|x-python|x-java|x-csharp|x-php|x-ruby|x-go|x-rust|x-kotlin|x-swift|x-scala|x-perl|x-lua|x-shell|x-sql|x-yaml|x-toml)$" - - # Files matching these patterns use STT (if configured) - stt: - supportedMimeTypes: - - "^audio/(mp3|mpeg|mpeg3|wav|wave|x-wav|ogg|vorbis|mp4|x-m4a|flac|x-flac|webm)$" -``` - -**Processing Priority:** OCR > STT > text parsing > fallback - -For more details, see [File Config Object Structure](/docs/configuration/librechat_yaml/object_structure/file_config). - -### Optional: Configure OCR for Enhanced Extraction - -OCR is **not required** but enhances extraction quality when configured: - -```yaml -# librechat.yaml -ocr: - strategy: "mistral_ocr" - apiKey: "${OCR_API_KEY}" - baseURL: "https://api.mistral.ai/v1" - mistralModel: "mistral-ocr-latest" -``` - -See [OCR Configuration](/docs/features/ocr) for full details. 
- -## When to Use Each Upload Option - -LibreChat offers three different ways to upload files, each suited for different use cases: - -### Use "Upload as Text" when: -- ✅ You want the AI to read the complete document content -- ✅ Working with smaller files that fit in context -- ✅ You need "chat with files" functionality -- ✅ Using models without tool capabilities -- ✅ You want direct content access without semantic search - -### Use "Upload for File Search" when: -- ✅ Working with large documents or multiple files -- ✅ You want to optimize token usage -- ✅ You need semantic search for relevant sections -- ✅ Building knowledge bases -- ✅ The `file_search` capability is enabled and toggled ON - -### Use standard "Upload Files" when: -- ✅ Using vision models to analyze images -- ✅ Using code interpreter to execute code -- ✅ Files don't need text extraction - -## Supported File Types - -### Text Files (text parsing) -- Plain text, Markdown, CSV, JSON, XML, HTML -- Programming languages (Python, JavaScript, Java, C++, etc.) -- Configuration files (YAML, TOML, INI, etc.) -- Shell scripts, SQL files - -### Documents (text parsing or OCR) -- PDF documents -- Word documents (.docx, .doc) -- PowerPoint presentations (.pptx, .ppt) -- Excel spreadsheets (.xlsx, .xls) -- EPUB books - -### Images (OCR if configured) -- JPEG, PNG, GIF, WebP -- HEIC, HEIF (Apple formats) -- Screenshots, photos of documents, scanned images - -### Audio (STT if configured) -- MP3, WAV, OGG, FLAC -- M4A, WebM -- Voice recordings, podcasts - -## File Processing Priority - -LibreChat processes files based on MIME type matching with the following **priority order**: - -1. **OCR** - If file matches `ocr.supportedMimeTypes` AND OCR is configured -2. **STT** - If file matches `stt.supportedMimeTypes` AND STT is configured -3. **Text Parsing** - If file matches `text.supportedMimeTypes` -4. 
**Fallback** - Text parsing as last resort - -### Processing Examples - -**PDF file with OCR configured:** -- Matches `ocr.supportedMimeTypes` -- **Uses OCR** to extract text -- Better quality for scanned PDFs - -**PDF file without OCR configured:** -- Matches `text.supportedMimeTypes` (fallback) -- **Uses text parsing** library -- Works well for digital PDFs - -**Python file:** -- Matches `text.supportedMimeTypes` -- **Uses text parsing** (no OCR needed) -- Direct text extraction - -**Audio file with STT configured:** -- Matches `stt.supportedMimeTypes` -- **Uses STT** to transcribe - -## Token Limits - -Files are truncated to `fileTokenLimit` tokens to manage context window usage: - -```yaml +```yaml title="librechat.yaml" fileConfig: fileTokenLimit: 100000 # Default: 100,000 tokens ``` -- Truncation happens at runtime before prompt construction -- Helps prevent exceeding model context limits -- Configurable based on your needs and model capabilities -- Larger limits allow more content but use more tokens + + If your file exceeds the limit, the text is cut off at the end. If you're getting incomplete answers, this might be why. You can increase `fileTokenLimit`, but keep in mind that larger values use more tokens per message — which increases cost and may hit the model's own context limit. 
+ -## Comparison with Other File Features - -| Feature | Capability | Requires Service | Persistence | Best For | -|---------|-----------|------------------|-------------|----------| -| **Upload as Text** | `context` | No (enhanced by OCR) | Single conversation | Temporary document questions | -| **Agent File Context** | `context` | No (enhanced by OCR) | Agent system instructions | Specialized agent knowledge | -| **File Search** | `file_search` | Yes (vector DB) | Stored in vector store | Large documents, semantic search | - -### Upload as Text vs Agent File Context - -**Upload as Text (`context`):** -- Available in any chat conversation -- Content included in current conversation only -- No OCR service required (text parsing by default) -- Best for one-off document questions - -**Agent File Context (`context`):** -- Only available in Agent Builder -- Content stored in agent's system instructions -- No OCR service required (text parsing by default) -- Best for creating specialized agents with persistent knowledge -- See [OCR for Documents](/docs/features/ocr) - -### Upload as Text vs File Search - -**Upload as Text (`context`):** -- Full document content in conversation context -- Direct access to all text -- Token usage: entire file (up to limit) -- Works without RAG API configuration - -**File Search (`file_search`):** -- Semantic search over documents -- Returns relevant chunks via tool use -- Token usage: only relevant sections -- Requires RAG API and vector store configuration -- See [RAG API](/docs/features/rag_api) - -## Example Use Cases - -- **Document Analysis**: Upload contracts, reports, or articles for analysis -- **Code Review**: Upload source files for review and suggestions -- **Data Extraction**: Extract information from structured documents -- **Translation**: Translate document contents -- **Summarization**: Summarize articles, papers, or reports -- **Research**: Discuss academic papers or technical documentation -- **Troubleshooting**: Share 
log files for analysis -- **Content Editing**: Review and edit written content -- **Data Processing**: Work with CSV or JSON data files - -## Troubleshooting - -### "Upload as Text" option not appearing - -**Solution:** Ensure the `context` capability is enabled: - -```yaml -endpoints: - agents: - capabilities: - - "context" # Add this if missing -``` - -### File content not extracted properly - -**Solutions:** -1. Check if file type is supported (matches `fileConfig` patterns) -2. For images/scanned documents: Configure OCR for better extraction -3. For audio files: Configure STT service -4. Verify file is not corrupted - -### Content seems truncated - -**Solution:** Increase the token limit: - -```yaml -fileConfig: - fileTokenLimit: 150000 # Increase as needed -``` - -### Poor extraction quality from images - -**Solution:** Configure OCR to enhance extraction: - -```yaml -ocr: - strategy: "mistral_ocr" - apiKey: "${OCR_API_KEY}" -``` - -See [OCR Configuration](/docs/configuration/librechat_yaml/object_structure/ocr) for details. - -## Related Features - -- [File Context](/docs/features/agents#file-context) - Files used as Agent Context -- [OCR for Documents](/docs/features/ocr) - Learn about and configure OCR services -- [File Configuration](/docs/configuration/librechat_yaml/object_structure/file_config) - Configure file handling +**Rules of thumb:** +- 100k tokens ≈ a 300-page book (plenty for most use cases) +- If you're working with very large files, consider [File Search (RAG)](/docs/features/rag_api) instead — it only retrieves the relevant sections rather than stuffing everything into context --- -Upload as Text provides a simple, powerful way to work with documents in LibreChat without requiring complex configuration or external services. +## Optional: boosting extraction with OCR +Text parsing works fine for digitally-created documents (PDFs saved from Word, code files, plain text). 
But if you're uploading **scanned documents, photos of pages, or images with text**, the built-in parser won't get great results. +That's where OCR comes in. When configured, LibreChat automatically uses OCR for file types that benefit from it — you don't need to do anything differently as a user. + + + + Point them to the [OCR configuration docs](/docs/features/ocr). The short version: + + ```yaml title="librechat.yaml" + ocr: + strategy: "mistral_ocr" + apiKey: "${OCR_API_KEY}" + baseURL: "https://api.mistral.ai/v1" + mistralModel: "mistral-ocr-latest" + ``` + + Once configured, OCR automatically handles images and scanned PDFs — no changes needed on the user side. + + + For transcribing audio uploads, the admin needs to configure a Speech-to-Text service. See the [STT configuration reference](/docs/configuration/librechat_yaml/object_structure/file_config#stt). + + + +--- + +## File handling configuration reference + +This section is for admins who want to control which file types get processed by which method. The defaults work well — you only need to touch this if you want to fine-tune behavior. 
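To make the dispatch concrete, here is a hypothetical sketch of how the priority order resolves a file's MIME type to an extraction method. The pattern lists are trimmed-down examples in the spirit of the defaults shown in the reference, and the function and flag names are assumptions, not LibreChat's actual API:

```typescript
// Hypothetical sketch of the OCR > STT > text parsing > fallback priority.
// Pattern lists are abbreviated examples; the real defaults live in fileConfig.

type Method = "ocr" | "stt" | "text" | "fallback";

const ocrPatterns = [/^image\/(jpeg|gif|png|webp|heic|heif)$/, /^application\/pdf$/];
const sttPatterns = [/^audio\/(mp3|mpeg|wav|ogg|flac|webm)$/];
const textPatterns = [/^text\/(plain|markdown|csv|json|x-python)$/];

function pickMethod(
  mimeType: string,
  services: { ocr: boolean; stt: boolean },
): Method {
  // Each tier only applies when the matching service is actually configured
  if (services.ocr && ocrPatterns.some((p) => p.test(mimeType))) return "ocr";
  if (services.stt && sttPatterns.some((p) => p.test(mimeType))) return "stt";
  if (textPatterns.some((p) => p.test(mimeType))) return "text";
  return "fallback"; // still attempts a plain-text parse as a last resort
}
```

For example, `application/pdf` resolves to OCR when an OCR service is configured, but drops to the text-parsing fallback on a default instance, matching the behavior described earlier on this page.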
+ + + + ```yaml title="librechat.yaml" + fileConfig: + # Max tokens extracted from a single file before truncation + fileTokenLimit: 100000 + + # Files matching these MIME patterns use OCR (if configured) + ocr: + supportedMimeTypes: + - "^image/(jpeg|gif|png|webp|heic|heif)$" + - "^application/pdf$" + - "^application/vnd\\.openxmlformats-officedocument\\.(wordprocessingml\\.document|presentationml\\.presentation|spreadsheetml\\.sheet)$" + - "^application/vnd\\.ms-(word|powerpoint|excel)$" + - "^application/epub\\+zip$" + + # Files matching these MIME patterns use text parsing + text: + supportedMimeTypes: + - "^text/(plain|markdown|csv|json|xml|html|css|javascript|typescript|x-python|x-java|x-csharp|x-php|x-ruby|x-go|x-rust|x-kotlin|x-swift|x-scala|x-perl|x-lua|x-shell|x-sql|x-yaml|x-toml)$" + + # Files matching these MIME patterns use STT (if configured) + stt: + supportedMimeTypes: + - "^audio/(mp3|mpeg|mpeg3|wav|wave|x-wav|ogg|vorbis|mp4|x-m4a|flac|x-flac|webm)$" + ``` + + **Priority reminder:** OCR > STT > text parsing > fallback. + + For the full reference, see [File Config Object Structure](/docs/configuration/librechat_yaml/object_structure/file_config). + + + +--- + +## Troubleshooting + + + + The `context` capability was likely removed from your configuration. Ask your admin to add it back: + + ```yaml + endpoints: + agents: + capabilities: + - "context" + ``` + + + A few things to check: + - **Scanned PDFs / images** — These need OCR to extract text properly. Without it, the parser might return garbage or nothing. Ask your admin to [configure OCR](/docs/features/ocr). + - **Audio files** — These need STT. There's no text fallback for audio. + - **Corrupted files** — Try opening the file locally to make sure it's not damaged. + - **Unsupported format** — If the MIME type doesn't match any configured pattern, LibreChat attempts a text parse fallback, which may not work for binary formats. + + + Your file probably exceeded the `fileTokenLimit`. 
The text was truncated. + + **Options:** + - Ask your admin to increase `fileTokenLimit` in `librechat.yaml` + - Use [File Search (RAG)](/docs/features/rag_api) instead, which retrieves relevant chunks rather than loading the entire file + - Split the file into smaller parts and upload them separately + + + Without OCR, images are processed through text parsing, which can't actually "see" text in an image. You need OCR configured for this to work. See [OCR for Documents](/docs/features/ocr). + + Alternatively, use a **vision model** with standard file upload — the model itself can read text in images. + + + +--- + +## Related + +- [OCR for Documents](/docs/features/ocr) — Set up optical character recognition for images and scans +- [RAG API (Chat with Files)](/docs/features/rag_api) — Semantic search over large document collections +- [Agents — File Context](/docs/features/agents#file-context) — Embed file content into an agent's system instructions +- [File Config reference](/docs/configuration/librechat_yaml/object_structure/file_config) — Full YAML schema for file handling diff --git a/next.config.mjs b/next.config.mjs index d5e7f17..c6c1c54 100644 --- a/next.config.mjs +++ b/next.config.mjs @@ -58,6 +58,7 @@ const nonPermanentRedirects = [ ['/docs/user_guides/rag_api', '/docs/features/rag_api'], ['/docs/user_guides/plugins', '/docs/features/agents'], ['/docs/features/plugins', '/docs/features/agents'], + ['/docs/features/speech-to-text', '/docs/configuration/stt_tts'], ['/docs/configuration/librechat_yaml/setup', '/docs/configuration/librechat_yaml'], ['/toolkit/yaml_checker', '/toolkit/yaml-checker'], ['/toolkit/creds_generator', '/toolkit/creds-generator'],