mirror of
https://github.com/LibreChat-AI/librechat.ai.git
synced 2026-03-27 02:38:32 +07:00
fix: redirect broken /docs/features/speech-to-text link + rewrite upload_as_text docs (#539)
Closes #430: adds redirect from /docs/features/speech-to-text to /docs/configuration/stt_tts in next.config.mjs.

Also rewrites the upload_as_text.mdx page from scratch:
- Replaces AI-sounding boilerplate with direct, human language
- Uses fumadocs components: Steps, Tabs, Cards, Accordions, Callouts
- Adds practical decision guide table for choosing upload method
- Moves admin-only config into collapsible Accordions
- Restructures troubleshooting as expandable FAQs
- Keeps all technical accuracy (MIME matching, priority chain, etc.)
@@ -1,321 +1,308 @@
|
||||
---
|
||||
title: Upload Files as Text
|
||||
icon: Upload
|
||||
description: Upload files to include their full content in conversations without requiring OCR configuration.
|
||||
description: Drop any file into your chat and let LibreChat read it — no setup needed.
|
||||
---
|
||||
|
||||
Upload as Text allows you to upload documents and have their full content included directly in your conversation with the AI. This feature works out-of-the-box using text parsing methods, with optional OCR enhancement for improved extraction quality.
|
||||
# Upload Files as Text
|
||||
|
||||
## Overview
|
||||
Ever wanted to hand a PDF, a code file, or a spreadsheet to the AI and just say _"read this"_? That's exactly what **Upload as Text** does.
|
||||
|
||||
- **No OCR required** - Uses text parsing with fallback methods by default
|
||||
- **Enhanced by OCR** - If OCR is configured, extraction quality improves for images and scanned documents
|
||||
- **Full document content** - Entire file content available to the model in the conversation
|
||||
- **Works with all models** - No special tool capabilities needed
|
||||
- **Token limit control** - Configurable via `fileTokenLimit` to manage context usage
|
||||
You attach a file, LibreChat extracts the text from it, and the full content gets pasted straight into your conversation. The AI can then read every word of it — no plugins, no vector databases, no extra services to configure. It works out of the box.
|
||||
|
||||
## The `context` Capability
|
||||
<Callout type="info" title="Zero setup required">
|
||||
Upload as Text works immediately on any LibreChat instance. It uses built-in text parsing — you don't need OCR, a RAG pipeline, or any external service to get started.
|
||||
</Callout>
|
||||
|
||||
Upload as Text is controlled by the `context` capability in your LibreChat configuration.
|
||||
---
|
||||
|
||||
```yaml
|
||||
# librechat.yaml
|
||||
## How to use it
|
||||
|
||||
<Steps>
|
||||
<Step>
|
||||
### Click the attachment icon
|
||||
|
||||
In the chat input bar, click the **paperclip** (📎) icon.
|
||||
</Step>
|
||||
<Step>
|
||||
### Pick "Upload as Text"
|
||||
|
||||
From the dropdown menu, select **Upload as Text**. This tells LibreChat to read the file contents rather than pass it as a raw attachment.
|
||||
</Step>
|
||||
<Step>
|
||||
### Choose your file
|
||||
|
||||
Select the file from your device. LibreChat will extract the text and embed it directly into your message.
|
||||
</Step>
|
||||
<Step>
|
||||
### Ask your question
|
||||
|
||||
Type your prompt as usual. The AI now has the full text of your file in context and can reference any part of it.
|
||||
</Step>
|
||||
</Steps>
|
||||
|
||||
<Callout type="warn" title="Don't see the option?">
|
||||
If "Upload as Text" doesn't appear, the `context` capability may have been disabled by your admin. It's on by default — but if the capabilities list was customized, `context` needs to be explicitly included. See the [configuration section](#the-context-capability) below.
|
||||
</Callout>
|
||||
|
||||
---
|
||||
|
||||
## What happens under the hood
|
||||
|
||||
When you upload a file this way, LibreChat doesn't just dump raw bytes into the prompt. It runs through a processing pipeline to extract clean, readable text:
|
||||
|
||||
1. **MIME type detection** — LibreChat checks what kind of file you uploaded (PDF, image, audio, source code, etc.) by inspecting its MIME type.
|
||||
2. **Method selection** — Based on the file type and what services are available, it picks the best extraction method using this priority:
|
||||
|
||||
<Tabs items={["Priority order", "Decision examples"]}>
|
||||
<Tabs.Tab>
|
||||
| Priority | Method | When it's used |
|
||||
|----------|--------|---------------|
|
||||
| 1st | **OCR** | File is an image or scanned document, _and_ OCR is configured |
|
||||
| 2nd | **STT** (Speech-to-Text) | File is audio, _and_ STT is configured |
|
||||
| 3rd | **Text parsing** | File matches a known text MIME type |
|
||||
| 4th | **Fallback** | None of the above matched — tries text parsing anyway |
|
||||
</Tabs.Tab>
|
||||
<Tabs.Tab>
|
||||
**A `.pdf` on an instance with OCR configured:**
|
||||
→ OCR kicks in. Great for scanned docs and complex layouts.
|
||||
|
||||
**A `.pdf` on a default instance (no OCR):**
|
||||
→ Text parsing handles it. Works well for digitally created PDFs.
|
||||
|
||||
**A `.py` Python file:**
|
||||
→ Straight to text parsing. Source code is already text — no conversion needed.
|
||||
|
||||
**An `.mp3` on an instance with STT configured:**
|
||||
→ Speech-to-Text transcribes it into text for the conversation.
|
||||
|
||||
**A `.png` screenshot with no OCR configured:**
|
||||
→ Falls back to text parsing (limited results — consider setting up OCR for images).
|
||||
</Tabs.Tab>
|
||||
</Tabs>
|
||||
|
||||
3. **Token truncation** — The extracted text is trimmed to the `fileTokenLimit` (default: 100,000 tokens) so it doesn't blow past the model's context window.
|
||||
4. **Prompt injection** — The text gets included in the conversation context, right alongside your message.
|
||||
|
||||
---
|
||||
|
||||
## Which files are supported
|
||||
|
||||
<Tabs items={["Text & code", "Documents", "Images", "Audio"]}>
|
||||
<Tabs.Tab>
|
||||
These are parsed directly — they're already text, so no conversion is needed.
|
||||
|
||||
- Plain text (`.txt`), Markdown (`.md`), CSV, JSON, XML, HTML, CSS
|
||||
- Programming languages — Python, JavaScript, TypeScript, Java, C#, PHP, Ruby, Go, Rust, Kotlin, Swift, Scala, Perl, Lua
|
||||
- Config files — YAML, TOML, INI
|
||||
- Shell scripts, SQL files
|
||||
</Tabs.Tab>
|
||||
<Tabs.Tab>
|
||||
Text parsing handles these out of the box. If OCR is configured, it takes over for better accuracy on complex layouts.
|
||||
|
||||
- **PDF** — digital and scanned (scanned PDFs benefit from OCR)
|
||||
- **Word** — `.docx`, `.doc`
|
||||
- **PowerPoint** — `.pptx`, `.ppt`
|
||||
- **Excel** — `.xlsx`, `.xls`
|
||||
- **EPUB** books
|
||||
</Tabs.Tab>
|
||||
<Tabs.Tab>
|
||||
Images **require OCR** to produce useful text. Without it, results will be poor.
|
||||
|
||||
- JPEG, PNG, GIF, WebP
|
||||
- HEIC, HEIF (Apple formats)
|
||||
- Screenshots, photos of documents, scanned pages
|
||||
</Tabs.Tab>
|
||||
<Tabs.Tab>
|
||||
Audio files **require STT** to be configured. Without it, the text-parsing fallback cannot produce a transcript, because audio can't be "text parsed."
|
||||
|
||||
- MP3, WAV, OGG, FLAC
|
||||
- M4A, WebM
|
||||
- Voice recordings, podcast clips
|
||||
</Tabs.Tab>
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Upload as Text vs. other upload options
|
||||
|
||||
LibreChat has three ways to upload files. Each one works differently and suits different situations:
|
||||
|
||||
<Cards>
|
||||
<Card title="Upload as Text" href="#how-to-use-it">
|
||||
Extracts the full file content and drops it into the conversation. Best for smaller files where you want the AI to read everything — contracts, code files, articles. Works with all models, no extra services needed.
|
||||
</Card>
|
||||
<Card title="Upload for File Search (RAG)" href="/docs/features/rag_api">
|
||||
Indexes the file in a vector database and retrieves only the relevant chunks when you ask a question. Better for large files or collections of files where dumping everything into context would waste tokens. Requires the RAG API.
|
||||
</Card>
|
||||
<Card title="Standard Upload" href="/docs/features/agents">
|
||||
Passes the file directly to the model — used for vision models analyzing images, or code interpreter running scripts. No text extraction happens.
|
||||
</Card>
|
||||
</Cards>
|
||||
|
||||
**Quick decision guide:**
|
||||
|
||||
| Situation | Best option |
|
||||
|-----------|-------------|
|
||||
| _"Read this 5-page contract and summarize it"_ | **Upload as Text** |
|
||||
| _"I have 50 PDFs, find what mentions pricing"_ | **File Search (RAG)** |
|
||||
| _"What's in this screenshot?"_ (vision model) | **Standard Upload** |
|
||||
| _"Run this Python script"_ (code interpreter) | **Standard Upload** |
|
||||
| _"Review this code file for bugs"_ | **Upload as Text** |
|
||||
| _"Search through our company docs"_ | **File Search (RAG)** |
|
||||
|
||||
---
|
||||
|
||||
## The `context` capability
|
||||
|
||||
Under the hood, Upload as Text is powered by the **`context` capability**. This is what controls whether the feature appears in your chat UI.
|
||||
|
||||
<Callout type="info">
|
||||
The `context` capability is **enabled by default**. You only need to touch this if your admin has customized the capabilities list and accidentally left it out.
|
||||
</Callout>
|
||||
|
||||
```yaml title="librechat.yaml"
|
||||
endpoints:
|
||||
agents:
|
||||
capabilities:
|
||||
- "context" # Enables "Upload as Text"
|
||||
- "context" # This is what enables "Upload as Text"
|
||||
```
|
||||
|
||||
**Default:** The `context` capability is included by default. You only need to explicitly add it if you've customized the capabilities list.
|
||||
The same `context` capability also powers **Agent File Context** (uploading files through the Agent Builder to embed text into an agent's system instructions). The difference is _where_ the text ends up:
|
||||
|
||||
## How It Works
|
||||
| | Upload as Text | Agent File Context |
|
||||
|---|---|---|
|
||||
| **Where** | Chat input (any conversation) | Agent Builder panel |
|
||||
| **Scope** | Current conversation only | Persists in agent's instructions |
|
||||
| **Use case** | One-off document questions | Building specialized agents with baked-in knowledge |
|
||||
|
||||
When you upload a file using "Upload as Text":
|
||||
---
|
||||
|
||||
1. LibreChat checks the file MIME type against `fileConfig` patterns
|
||||
2. **Processing method determined by precedence: OCR > STT > text parsing**
|
||||
3. If file matches `fileConfig.ocr.supportedMimeTypes` AND OCR is configured: **Use OCR**
|
||||
4. If file matches `fileConfig.stt.supportedMimeTypes` AND STT is configured: **Use STT**
|
||||
5. If file matches `fileConfig.text.supportedMimeTypes`: **Use text parsing**
|
||||
6. Otherwise: **Fallback to text parsing**
|
||||
7. Text is truncated to `fileConfig.fileTokenLimit` before prompt construction
|
||||
8. Full extracted text included in conversation context
|
||||
## Token limits and truncation
|
||||
|
||||
### Text Processing Methods
|
||||
When a file is too long to fit in the model's context window, LibreChat truncates the extracted text to stay within bounds. This happens automatically — you don't need to worry about it, but it's good to know how it works.
|
||||
|
||||
**Text Parsing (Default):**
|
||||
- Uses a robust parsing library (same as the RAG API)
|
||||
- Handles PDFs, Word docs, text files, code files, and more
|
||||
- No external service required
|
||||
- Works immediately without configuration
|
||||
- Fallback method if no other match
|
||||
|
||||
**OCR Enhancement (Optional):**
|
||||
- Improves extraction from images, scanned documents, and complex PDFs
|
||||
- Requires OCR service configuration
|
||||
- Automatically used for files matching `fileConfig.ocr.supportedMimeTypes` when available
|
||||
- See [OCR Configuration](/docs/features/ocr)
|
||||
|
||||
**STT Processing (Optional):**
|
||||
- Converts audio files to text
|
||||
- Requires STT service configuration
|
||||
- See [Speech-to-Text Configuration](/docs/configuration/librechat_yaml/object_structure/file_config#stt)
|
||||
|
||||
## Usage
|
||||
|
||||
1. Click the attachment icon in the chat input
|
||||
2. Select "Upload as Text" from the menu
|
||||
3. Choose your file
|
||||
4. File content is extracted and included in your message
|
||||
|
||||
**Note:** If you don't see "Upload as Text", ensure the `context` capability is enabled in your [`endpoints.agents.capabilities` configuration](/docs/configuration/librechat_yaml/object_structure/agents#capabilities).
|
||||
|
||||
## Configuration
|
||||
|
||||
### Basic Configuration
|
||||
|
||||
The `context` capability is enabled by default. No additional configuration is required for basic text parsing functionality.
|
||||
|
||||
### File Handling Configuration
|
||||
|
||||
Control text processing behavior with `fileConfig`:
|
||||
|
||||
```yaml
|
||||
fileConfig:
|
||||
# Maximum tokens from text files before truncation
|
||||
fileTokenLimit: 100000
|
||||
|
||||
# Files matching these patterns use OCR (if configured)
|
||||
ocr:
|
||||
supportedMimeTypes:
|
||||
- "^image/(jpeg|gif|png|webp|heic|heif)$"
|
||||
- "^application/pdf$"
|
||||
- "^application/vnd\\.openxmlformats-officedocument\\.(wordprocessingml\\.document|presentationml\\.presentation|spreadsheetml\\.sheet)$"
|
||||
- "^application/vnd\\.ms-(word|powerpoint|excel)$"
|
||||
- "^application/epub\\+zip$"
|
||||
|
||||
# Files matching these patterns use text parsing
|
||||
text:
|
||||
supportedMimeTypes:
|
||||
- "^text/(plain|markdown|csv|json|xml|html|css|javascript|typescript|x-python|x-java|x-csharp|x-php|x-ruby|x-go|x-rust|x-kotlin|x-swift|x-scala|x-perl|x-lua|x-shell|x-sql|x-yaml|x-toml)$"
|
||||
|
||||
# Files matching these patterns use STT (if configured)
|
||||
stt:
|
||||
supportedMimeTypes:
|
||||
- "^audio/(mp3|mpeg|mpeg3|wav|wave|x-wav|ogg|vorbis|mp4|x-m4a|flac|x-flac|webm)$"
|
||||
```
|
||||
|
||||
**Processing Priority:** OCR > STT > text parsing > fallback
|
||||
|
||||
For more details, see [File Config Object Structure](/docs/configuration/librechat_yaml/object_structure/file_config).
|
||||
|
||||
### Optional: Configure OCR for Enhanced Extraction
|
||||
|
||||
OCR is **not required** but enhances extraction quality when configured:
|
||||
|
||||
```yaml
|
||||
# librechat.yaml
|
||||
ocr:
|
||||
strategy: "mistral_ocr"
|
||||
apiKey: "${OCR_API_KEY}"
|
||||
baseURL: "https://api.mistral.ai/v1"
|
||||
mistralModel: "mistral-ocr-latest"
|
||||
```
|
||||
|
||||
See [OCR Configuration](/docs/features/ocr) for full details.
|
||||
|
||||
## When to Use Each Upload Option
|
||||
|
||||
LibreChat offers three different ways to upload files, each suited for different use cases:
|
||||
|
||||
### Use "Upload as Text" when:
|
||||
- ✅ You want the AI to read the complete document content
|
||||
- ✅ Working with smaller files that fit in context
|
||||
- ✅ You need "chat with files" functionality
|
||||
- ✅ Using models without tool capabilities
|
||||
- ✅ You want direct content access without semantic search
|
||||
|
||||
### Use "Upload for File Search" when:
|
||||
- ✅ Working with large documents or multiple files
|
||||
- ✅ You want to optimize token usage
|
||||
- ✅ You need semantic search for relevant sections
|
||||
- ✅ Building knowledge bases
|
||||
- ✅ The `file_search` capability is enabled and toggled ON
|
||||
|
||||
### Use standard "Upload Files" when:
|
||||
- ✅ Using vision models to analyze images
|
||||
- ✅ Using code interpreter to execute code
|
||||
- ✅ Files don't need text extraction
|
||||
|
||||
## Supported File Types
|
||||
|
||||
### Text Files (text parsing)
|
||||
- Plain text, Markdown, CSV, JSON, XML, HTML
|
||||
- Programming languages (Python, JavaScript, Java, C++, etc.)
|
||||
- Configuration files (YAML, TOML, INI, etc.)
|
||||
- Shell scripts, SQL files
|
||||
|
||||
### Documents (text parsing or OCR)
|
||||
- PDF documents
|
||||
- Word documents (.docx, .doc)
|
||||
- PowerPoint presentations (.pptx, .ppt)
|
||||
- Excel spreadsheets (.xlsx, .xls)
|
||||
- EPUB books
|
||||
|
||||
### Images (OCR if configured)
|
||||
- JPEG, PNG, GIF, WebP
|
||||
- HEIC, HEIF (Apple formats)
|
||||
- Screenshots, photos of documents, scanned images
|
||||
|
||||
### Audio (STT if configured)
|
||||
- MP3, WAV, OGG, FLAC
|
||||
- M4A, WebM
|
||||
- Voice recordings, podcasts
|
||||
|
||||
## File Processing Priority
|
||||
|
||||
LibreChat processes files based on MIME type matching with the following **priority order**:
|
||||
|
||||
1. **OCR** - If file matches `ocr.supportedMimeTypes` AND OCR is configured
|
||||
2. **STT** - If file matches `stt.supportedMimeTypes` AND STT is configured
|
||||
3. **Text Parsing** - If file matches `text.supportedMimeTypes`
|
||||
4. **Fallback** - Text parsing as last resort
|
||||
|
||||
### Processing Examples
|
||||
|
||||
**PDF file with OCR configured:**
|
||||
- Matches `ocr.supportedMimeTypes`
|
||||
- **Uses OCR** to extract text
|
||||
- Better quality for scanned PDFs
|
||||
|
||||
**PDF file without OCR configured:**
|
||||
- Matches `text.supportedMimeTypes` (fallback)
|
||||
- **Uses text parsing** library
|
||||
- Works well for digital PDFs
|
||||
|
||||
**Python file:**
|
||||
- Matches `text.supportedMimeTypes`
|
||||
- **Uses text parsing** (no OCR needed)
|
||||
- Direct text extraction
|
||||
|
||||
**Audio file with STT configured:**
|
||||
- Matches `stt.supportedMimeTypes`
|
||||
- **Uses STT** to transcribe
|
||||
|
||||
## Token Limits
|
||||
|
||||
Files are truncated to `fileTokenLimit` tokens to manage context window usage:
|
||||
|
||||
```yaml
|
||||
```yaml title="librechat.yaml"
|
||||
fileConfig:
|
||||
fileTokenLimit: 100000 # Default: 100,000 tokens
|
||||
```
|
||||
|
||||
- Truncation happens at runtime before prompt construction
|
||||
- Helps prevent exceeding model context limits
|
||||
- Configurable based on your needs and model capabilities
|
||||
- Larger limits allow more content but use more tokens
|
||||
<Callout type="warn" title="Truncation means lost content">
|
||||
If your file exceeds the limit, the text is cut off at the end. If you're getting incomplete answers, this might be why. You can increase `fileTokenLimit`, but keep in mind that larger values use more tokens per message — which increases cost and may hit the model's own context limit.
|
||||
</Callout>
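The truncation step can be pictured with a minimal sketch. It is an illustration under stated assumptions, not LibreChat's implementation: `truncateToTokenLimit` is a hypothetical helper, and it approximates tokens as whitespace-separated words, whereas a real tokenizer would count differently.

```javascript
// Hypothetical helper, not part of LibreChat.
// Assumption: one whitespace-separated word counts as one token.
function truncateToTokenLimit(text, fileTokenLimit = 100000) {
  const words = text.split(/\s+/).filter(Boolean);

  // Short enough: return the text untouched.
  if (words.length <= fileTokenLimit) {
    return { text, truncated: false };
  }

  // Too long: keep the beginning and drop everything past the limit.
  return {
    text: words.slice(0, fileTokenLimit).join(' '),
    truncated: true,
  };
}

const result = truncateToTokenLimit('alpha beta gamma delta epsilon', 3);
console.log(result.text, result.truncated);
```

The point of the sketch is that content is cut from the end, so material near the end of a long file is the first to be lost.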
|
||||
|
||||
## Comparison with Other File Features
|
||||
|
||||
| Feature | Capability | Requires Service | Persistence | Best For |
|
||||
|---------|-----------|------------------|-------------|----------|
|
||||
| **Upload as Text** | `context` | No (enhanced by OCR) | Single conversation | Temporary document questions |
|
||||
| **Agent File Context** | `context` | No (enhanced by OCR) | Agent system instructions | Specialized agent knowledge |
|
||||
| **File Search** | `file_search` | Yes (vector DB) | Stored in vector store | Large documents, semantic search |
|
||||
|
||||
### Upload as Text vs Agent File Context
|
||||
|
||||
**Upload as Text (`context`):**
|
||||
- Available in any chat conversation
|
||||
- Content included in current conversation only
|
||||
- No OCR service required (text parsing by default)
|
||||
- Best for one-off document questions
|
||||
|
||||
**Agent File Context (`context`):**
|
||||
- Only available in Agent Builder
|
||||
- Content stored in agent's system instructions
|
||||
- No OCR service required (text parsing by default)
|
||||
- Best for creating specialized agents with persistent knowledge
|
||||
- See [OCR for Documents](/docs/features/ocr)
|
||||
|
||||
### Upload as Text vs File Search
|
||||
|
||||
**Upload as Text (`context`):**
|
||||
- Full document content in conversation context
|
||||
- Direct access to all text
|
||||
- Token usage: entire file (up to limit)
|
||||
- Works without RAG API configuration
|
||||
|
||||
**File Search (`file_search`):**
|
||||
- Semantic search over documents
|
||||
- Returns relevant chunks via tool use
|
||||
- Token usage: only relevant sections
|
||||
- Requires RAG API and vector store configuration
|
||||
- See [RAG API](/docs/features/rag_api)
|
||||
|
||||
## Example Use Cases
|
||||
|
||||
- **Document Analysis**: Upload contracts, reports, or articles for analysis
|
||||
- **Code Review**: Upload source files for review and suggestions
|
||||
- **Data Extraction**: Extract information from structured documents
|
||||
- **Translation**: Translate document contents
|
||||
- **Summarization**: Summarize articles, papers, or reports
|
||||
- **Research**: Discuss academic papers or technical documentation
|
||||
- **Troubleshooting**: Share log files for analysis
|
||||
- **Content Editing**: Review and edit written content
|
||||
- **Data Processing**: Work with CSV or JSON data files
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### "Upload as Text" option not appearing
|
||||
|
||||
**Solution:** Ensure the `context` capability is enabled:
|
||||
|
||||
```yaml
|
||||
endpoints:
|
||||
agents:
|
||||
capabilities:
|
||||
- "context" # Add this if missing
|
||||
```
|
||||
|
||||
### File content not extracted properly
|
||||
|
||||
**Solutions:**
|
||||
1. Check if file type is supported (matches `fileConfig` patterns)
|
||||
2. For images/scanned documents: Configure OCR for better extraction
|
||||
3. For audio files: Configure STT service
|
||||
4. Verify file is not corrupted
|
||||
|
||||
### Content seems truncated
|
||||
|
||||
**Solution:** Increase the token limit:
|
||||
|
||||
```yaml
|
||||
fileConfig:
|
||||
fileTokenLimit: 150000 # Increase as needed
|
||||
```
|
||||
|
||||
### Poor extraction quality from images
|
||||
|
||||
**Solution:** Configure OCR to enhance extraction:
|
||||
|
||||
```yaml
|
||||
ocr:
|
||||
strategy: "mistral_ocr"
|
||||
apiKey: "${OCR_API_KEY}"
|
||||
```
|
||||
|
||||
See [OCR Configuration](/docs/configuration/librechat_yaml/object_structure/ocr) for details.
|
||||
|
||||
## Related Features
|
||||
|
||||
- [File Context](/docs/features/agents#file-context) - Files used as Agent Context
|
||||
- [OCR for Documents](/docs/features/ocr) - Learn about and configure OCR services
|
||||
- [File Configuration](/docs/configuration/librechat_yaml/object_structure/file_config) - Configure file handling
|
||||
**Rules of thumb:**
|
||||
- 100k tokens ≈ a 300-page book (plenty for most use cases)
|
||||
- If you're working with very large files, consider [File Search (RAG)](/docs/features/rag_api) instead — it only retrieves the relevant sections rather than stuffing everything into context
|
||||
|
||||
---
|
||||
|
||||
Upload as Text provides a simple, powerful way to work with documents in LibreChat without requiring complex configuration or external services.
|
||||
## Optional: boosting extraction with OCR
|
||||
|
||||
Text parsing works fine for digitally created documents (PDFs saved from Word, code files, plain text). But if you're uploading **scanned documents, photos of pages, or images with text**, the built-in parser won't get great results.
|
||||
|
||||
That's where OCR comes in. When configured, LibreChat automatically uses OCR for file types that benefit from it — you don't need to do anything differently as a user.
|
||||
|
||||
<Accordions>
|
||||
<Accordion title="How to tell your admin to set up OCR">
|
||||
Point them to the [OCR configuration docs](/docs/features/ocr). The short version:
|
||||
|
||||
```yaml title="librechat.yaml"
|
||||
ocr:
|
||||
strategy: "mistral_ocr"
|
||||
apiKey: "${OCR_API_KEY}"
|
||||
baseURL: "https://api.mistral.ai/v1"
|
||||
mistralModel: "mistral-ocr-latest"
|
||||
```
|
||||
|
||||
Once configured, OCR automatically handles images and scanned PDFs — no changes needed on the user side.
|
||||
</Accordion>
|
||||
<Accordion title="How to tell your admin to set up STT (for audio files)">
|
||||
For transcribing audio uploads, the admin needs to configure a Speech-to-Text service. See the [STT configuration reference](/docs/configuration/librechat_yaml/object_structure/file_config#stt).
|
||||
</Accordion>
|
||||
</Accordions>
|
||||
|
||||
---
|
||||
|
||||
## File handling configuration reference
|
||||
|
||||
This section is for admins who want to control which file types get processed by which method. The defaults work well — you only need to touch this if you want to fine-tune behavior.
|
||||
|
||||
<Accordions>
|
||||
<Accordion title="Full fileConfig example">
|
||||
```yaml title="librechat.yaml"
|
||||
fileConfig:
|
||||
# Max tokens extracted from a single file before truncation
|
||||
fileTokenLimit: 100000
|
||||
|
||||
# Files matching these MIME patterns use OCR (if configured)
|
||||
ocr:
|
||||
supportedMimeTypes:
|
||||
- "^image/(jpeg|gif|png|webp|heic|heif)$"
|
||||
- "^application/pdf$"
|
||||
- "^application/vnd\\.openxmlformats-officedocument\\.(wordprocessingml\\.document|presentationml\\.presentation|spreadsheetml\\.sheet)$"
|
||||
- "^application/vnd\\.ms-(word|powerpoint|excel)$"
|
||||
- "^application/epub\\+zip$"
|
||||
|
||||
# Files matching these MIME patterns use text parsing
|
||||
text:
|
||||
supportedMimeTypes:
|
||||
- "^text/(plain|markdown|csv|json|xml|html|css|javascript|typescript|x-python|x-java|x-csharp|x-php|x-ruby|x-go|x-rust|x-kotlin|x-swift|x-scala|x-perl|x-lua|x-shell|x-sql|x-yaml|x-toml)$"
|
||||
|
||||
# Files matching these MIME patterns use STT (if configured)
|
||||
stt:
|
||||
supportedMimeTypes:
|
||||
- "^audio/(mp3|mpeg|mpeg3|wav|wave|x-wav|ogg|vorbis|mp4|x-m4a|flac|x-flac|webm)$"
|
||||
```
|
||||
|
||||
**Priority reminder:** OCR > STT > text parsing > fallback.
|
||||
|
||||
For the full reference, see [File Config Object Structure](/docs/configuration/librechat_yaml/object_structure/file_config).
|
||||
</Accordion>
|
||||
</Accordions>
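The priority chain (OCR > STT > text parsing > fallback) can be sketched as a small function. This is a hedged illustration, not LibreChat's code: `selectMethod`, `matchesAny`, and the `ocrConfigured` / `sttConfigured` flags are hypothetical names, and the patterns are a shortened copy of the `supportedMimeTypes` shown in the example config.

```javascript
// Shortened copies of the MIME patterns from the fileConfig example.
const mimePatterns = {
  ocr: [/^image\/(jpeg|gif|png|webp|heic|heif)$/, /^application\/pdf$/],
  stt: [/^audio\/(mp3|mpeg|wav|ogg|flac|webm)$/],
  text: [/^text\/(plain|markdown|csv|html|x-python)$/],
};

// True if any pattern in the list matches the MIME type.
function matchesAny(patterns, mimeType) {
  return patterns.some((pattern) => pattern.test(mimeType));
}

// Hypothetical selector: checks each method in priority order.
function selectMethod(mimeType, { ocrConfigured = false, sttConfigured = false } = {}) {
  if (ocrConfigured && matchesAny(mimePatterns.ocr, mimeType)) return 'ocr';
  if (sttConfigured && matchesAny(mimePatterns.stt, mimeType)) return 'stt';
  if (matchesAny(mimePatterns.text, mimeType)) return 'text';
  // Nothing matched: text parsing is still attempted as a last resort.
  return 'fallback';
}

console.log(selectMethod('application/pdf', { ocrConfigured: true }));
console.log(selectMethod('text/x-python'));
```

In this sketch a PDF on an instance without OCR lands in the fallback branch, which still means text parsing. An audio file without STT also lands there, but the result is unlikely to be useful.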
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
<Accordions>
|
||||
<Accordion title="Upload as Text option not appearing">
|
||||
The `context` capability was likely removed from your configuration. Ask your admin to add it back:
|
||||
|
||||
```yaml
|
||||
endpoints:
|
||||
agents:
|
||||
capabilities:
|
||||
- "context"
|
||||
```
|
||||
</Accordion>
|
||||
<Accordion title="File content looks wrong or is mostly empty">
|
||||
A few things to check:
|
||||
- **Scanned PDFs / images** — These need OCR to extract text properly. Without it, the parser might return garbage or nothing. Ask your admin to [configure OCR](/docs/features/ocr).
|
||||
- **Audio files** — These need STT. There's no text fallback for audio.
|
||||
- **Corrupted files** — Try opening the file locally to make sure it's not damaged.
|
||||
- **Unsupported format** — If the MIME type doesn't match any configured pattern, LibreChat attempts a text parse fallback, which may not work for binary formats.
|
||||
</Accordion>
|
||||
<Accordion title="The AI seems to be missing part of my file">
|
||||
Your file probably exceeded the `fileTokenLimit`. The text was truncated.
|
||||
|
||||
**Options:**
|
||||
- Ask your admin to increase `fileTokenLimit` in `librechat.yaml`
|
||||
- Use [File Search (RAG)](/docs/features/rag_api) instead, which retrieves relevant chunks rather than loading the entire file
|
||||
- Split the file into smaller parts and upload them separately
|
||||
</Accordion>
|
||||
<Accordion title="Images upload but the AI can't read the text in them">
|
||||
Without OCR, images are processed through text parsing, which can't actually "see" text in an image. You need OCR configured for this to work. See [OCR for Documents](/docs/features/ocr).
|
||||
|
||||
Alternatively, use a **vision model** with standard file upload — the model itself can read text in images.
|
||||
</Accordion>
|
||||
</Accordions>
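For the first troubleshooting case, the rule "on by default, but must be listed once the capabilities list is customized" can be expressed as a short check. This is only a sketch: `isUploadAsTextEnabled` is a hypothetical helper, and it assumes the `librechat.yaml` file has already been parsed into a plain object.

```javascript
// Hypothetical helper, not part of LibreChat.
// `config` is assumed to be an already-parsed librechat.yaml object.
function isUploadAsTextEnabled(config) {
  const capabilities = config?.endpoints?.agents?.capabilities;

  // No customized list: the defaults apply, and they include "context".
  if (!Array.isArray(capabilities)) {
    return true;
  }

  // Customized list: "context" must be listed explicitly.
  return capabilities.includes('context');
}

console.log(isUploadAsTextEnabled({}));
console.log(isUploadAsTextEnabled({ endpoints: { agents: { capabilities: ['file_search'] } } }));
```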
|
||||
|
||||
---
|
||||
|
||||
## Related
|
||||
|
||||
- [OCR for Documents](/docs/features/ocr) — Set up optical character recognition for images and scans
|
||||
- [RAG API (Chat with Files)](/docs/features/rag_api) — Semantic search over large document collections
|
||||
- [Agents — File Context](/docs/features/agents#file-context) — Embed file content into an agent's system instructions
|
||||
- [File Config reference](/docs/configuration/librechat_yaml/object_structure/file_config) — Full YAML schema for file handling
|
||||
|
||||
@@ -58,6 +58,7 @@ const nonPermanentRedirects = [
|
||||
['/docs/user_guides/rag_api', '/docs/features/rag_api'],
|
||||
['/docs/user_guides/plugins', '/docs/features/agents'],
|
||||
['/docs/features/plugins', '/docs/features/agents'],
|
||||
['/docs/features/speech-to-text', '/docs/configuration/stt_tts'],
|
||||
['/docs/configuration/librechat_yaml/setup', '/docs/configuration/librechat_yaml'],
|
||||
['/toolkit/yaml_checker', '/toolkit/yaml-checker'],
|
||||
['/toolkit/creds_generator', '/toolkit/creds-generator'],
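As a rough illustration of what the added pair does, the sketch below maps `[source, destination]` pairs to the objects that the Next.js `redirects()` config function returns. How `next.config.mjs` actually consumes `nonPermanentRedirects` is not shown in this diff, so `toRedirects` is a hypothetical helper and the mapping is an assumption.

```javascript
// A shortened copy of the pairs from the hunk above.
const nonPermanentRedirects = [
  ['/docs/features/plugins', '/docs/features/agents'],
  ['/docs/features/speech-to-text', '/docs/configuration/stt_tts'],
];

// Hypothetical helper: builds Next.js-style redirect entries.
// `permanent: false` marks each redirect as temporary.
function toRedirects(pairs, permanent = false) {
  return pairs.map(([source, destination]) => ({ source, destination, permanent }));
}

const redirects = toRedirects(nonPermanentRedirects);
console.log(redirects);
```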