fix: redirect broken /docs/features/speech-to-text link + rewrite upload_as_text docs (#539)

Closes #430 — adds redirect from /docs/features/speech-to-text to
/docs/configuration/stt_tts in next.config.mjs.

Also rewrites the upload_as_text.mdx page from scratch:
- Replaces AI-sounding boilerplate with direct, human language
- Uses fumadocs components: Steps, Tabs, Cards, Accordions, Callouts
- Adds practical decision guide table for choosing upload method
- Moves admin-only config into collapsible Accordions
- Restructures troubleshooting as expandable FAQs
- Keeps all technical accuracy (MIME matching, priority chain, etc.)
Marco Beretta
2026-03-21 17:14:27 +01:00
committed by GitHub
parent 4113721ae1
commit 00586da525
2 changed files with 279 additions and 291 deletions


@@ -1,321 +1,308 @@
---
title: Upload Files as Text
icon: Upload
description: Upload files to include their full content in conversations without requiring OCR configuration.
description: Drop any file into your chat and let LibreChat read it — no setup needed.
---
Upload as Text allows you to upload documents and have their full content included directly in your conversation with the AI. This feature works out-of-the-box using text parsing methods, with optional OCR enhancement for improved extraction quality.
# Upload Files as Text
## Overview
Ever wanted to hand a PDF, a code file, or a spreadsheet to the AI and just say _"read this"_? That's exactly what **Upload as Text** does.
- **No OCR required** - Uses text parsing with fallback methods by default
- **Enhanced by OCR** - If OCR is configured, extraction quality improves for images and scanned documents
- **Full document content** - Entire file content available to the model in the conversation
- **Works with all models** - No special tool capabilities needed
- **Token limit control** - Configurable via `fileTokenLimit` to manage context usage
You attach a file, LibreChat extracts the text from it, and the full content gets pasted straight into your conversation. The AI can then read every word of it — no plugins, no vector databases, no extra services to configure. It works out of the box.
## The `context` Capability
<Callout type="info" title="Zero setup required">
Upload as Text works immediately on any LibreChat instance. It uses built-in text parsing — you don't need OCR, a RAG pipeline, or any external service to get started.
</Callout>
Upload as Text is controlled by the `context` capability in your LibreChat configuration.
---
```yaml
# librechat.yaml
## How to use it
<Steps>
<Step>
### Click the attachment icon
In the chat input bar, click the **paperclip** (📎) icon.
</Step>
<Step>
### Pick "Upload as Text"
From the dropdown menu, select **Upload as Text**. This tells LibreChat to read the file contents rather than pass it as a raw attachment.
</Step>
<Step>
### Choose your file
Select the file from your device. LibreChat will extract the text and embed it directly into your message.
</Step>
<Step>
### Ask your question
Type your prompt as usual. The AI now has the full text of your file in context and can reference any part of it.
</Step>
</Steps>
<Callout type="warn" title="Don't see the option?">
If "Upload as Text" doesn't appear, the `context` capability may have been disabled by your admin. It's on by default — but if the capabilities list was customized, `context` needs to be explicitly included. See the [configuration section](#the-context-capability) below.
</Callout>
---
## What happens under the hood
When you upload a file this way, LibreChat doesn't just dump raw bytes into the prompt. It runs through a processing pipeline to extract clean, readable text:
1. **MIME type detection** — LibreChat checks what kind of file you uploaded (PDF, image, audio, source code, etc.) by inspecting its MIME type.
2. **Method selection** — Based on the file type and what services are available, it picks the best extraction method using this priority:
<Tabs items={["Priority order", "Decision examples"]}>
<Tabs.Tab>
| Priority | Method | When it's used |
|----------|--------|---------------|
| 1st | **OCR** | File is an image or scanned document, _and_ OCR is configured |
| 2nd | **STT** (Speech-to-Text) | File is audio, _and_ STT is configured |
| 3rd | **Text parsing** | File matches a known text MIME type |
| 4th | **Fallback** | None of the above matched — tries text parsing anyway |
</Tabs.Tab>
<Tabs.Tab>
**A `.pdf` on an instance with OCR configured:**
→ OCR kicks in. Great for scanned docs and complex layouts.
**A `.pdf` on a default instance (no OCR):**
→ Text parsing handles it. Works well for digitally-created PDFs.
**A `.py` Python file:**
→ Straight to text parsing. Source code is already text — no conversion needed.
**An `.mp3` on an instance with STT configured:**
→ Speech-to-Text transcribes it into text for the conversation.
**A `.png` screenshot with no OCR configured:**
→ Falls back to text parsing (limited results — consider setting up OCR for images).
</Tabs.Tab>
</Tabs>
3. **Token truncation** — The extracted text is trimmed to the `fileTokenLimit` (default: 100,000 tokens) so it doesn't blow past the model's context window.
4. **Context inclusion**: The extracted text is added to the conversation context, right alongside your message.
---
## Which files are supported
<Tabs items={["Text & code", "Documents", "Images", "Audio"]}>
<Tabs.Tab>
These are parsed directly — they're already text, so no conversion is needed.
- Plain text (`.txt`), Markdown (`.md`), CSV, JSON, XML, HTML, CSS
- Programming languages — Python, JavaScript, TypeScript, Java, C#, PHP, Ruby, Go, Rust, Kotlin, Swift, Scala, Perl, Lua
- Config files — YAML, TOML, INI
- Shell scripts, SQL files
</Tabs.Tab>
<Tabs.Tab>
Text parsing handles these out of the box. If OCR is configured, it takes over for better accuracy on complex layouts.
- **PDF** — digital and scanned (scanned PDFs benefit from OCR)
- **Word** — `.docx`, `.doc`
- **PowerPoint** — `.pptx`, `.ppt`
- **Excel** — `.xlsx`, `.xls`
- **EPUB** books
</Tabs.Tab>
<Tabs.Tab>
Images **require OCR** to produce useful text. Without it, results will be poor.
- JPEG, PNG, GIF, WebP
- HEIC, HEIF (Apple formats)
- Screenshots, photos of documents, scanned pages
</Tabs.Tab>
<Tabs.Tab>
Audio files **require STT** to be configured. There's no fallback — audio can't be "text parsed."
- MP3, WAV, OGG, FLAC
- M4A, WebM
- Voice recordings, podcast clips
</Tabs.Tab>
</Tabs>
---
## Upload as Text vs. other upload options
LibreChat has three ways to upload files. Each one works differently and suits different situations:
<Cards>
<Card title="Upload as Text" href="#how-to-use-it">
Extracts the full file content and drops it into the conversation. Best for smaller files where you want the AI to read everything — contracts, code files, articles. Works with all models, no extra services needed.
</Card>
<Card title="Upload for File Search (RAG)" href="/docs/features/rag_api">
Indexes the file in a vector database and retrieves only the relevant chunks when you ask a question. Better for large files or collections of files where dumping everything into context would waste tokens. Requires the RAG API.
</Card>
<Card title="Standard Upload" href="/docs/features/agents">
Passes the file directly to the model — used for vision models analyzing images, or code interpreter running scripts. No text extraction happens.
</Card>
</Cards>
**Quick decision guide:**
| Situation | Best option |
|-----------|-------------|
| _"Read this 5-page contract and summarize it"_ | **Upload as Text** |
| _"I have 50 PDFs, find what mentions pricing"_ | **File Search (RAG)** |
| _"What's in this screenshot?"_ (vision model) | **Standard Upload** |
| _"Run this Python script"_ (code interpreter) | **Standard Upload** |
| _"Review this code file for bugs"_ | **Upload as Text** |
| _"Search through our company docs"_ | **File Search (RAG)** |
---
## The `context` capability
Under the hood, Upload as Text is powered by the **`context` capability**. This is what controls whether the feature appears in your chat UI.
<Callout type="info">
The `context` capability is **enabled by default**. You only need to touch this if your admin has customized the capabilities list and accidentally left it out.
</Callout>
```yaml title="librechat.yaml"
endpoints:
agents:
capabilities:
- "context" # Enables "Upload as Text"
- "context" # This is what enables "Upload as Text"
```
**Default:** The `context` capability is included by default. You only need to explicitly add it if you've customized the capabilities list.
The same `context` capability also powers **Agent File Context** (uploading files through the Agent Builder to embed text into an agent's system instructions). The difference is _where_ the text ends up:
## How It Works
| | Upload as Text | Agent File Context |
|---|---|---|
| **Where** | Chat input (any conversation) | Agent Builder panel |
| **Scope** | Current conversation only | Persists in agent's instructions |
| **Use case** | One-off document questions | Building specialized agents with baked-in knowledge |
When you upload a file using "Upload as Text":
---
1. LibreChat checks the file MIME type against `fileConfig` patterns
2. **Processing method determined by precedence: OCR > STT > text parsing**
3. If file matches `fileConfig.ocr.supportedMimeTypes` AND OCR is configured: **Use OCR**
4. If file matches `fileConfig.stt.supportedMimeTypes` AND STT is configured: **Use STT**
5. If file matches `fileConfig.text.supportedMimeTypes`: **Use text parsing**
6. Otherwise: **Fallback to text parsing**
7. Text is truncated to `fileConfig.fileTokenLimit` before prompt construction
8. Full extracted text included in conversation context
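
The precedence in the steps above can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not LibreChat's actual implementation: `selectMethod`, `Services`, and `MimePatterns` are hypothetical names introduced here, and the pattern lists are shortened versions of the `fileConfig` defaults. Only the order (OCR, then STT, then text parsing, then fallback) comes from the documentation itself.

```typescript
// Hypothetical sketch of the OCR > STT > text parsing > fallback precedence.
// All names and shapes are illustrative; they are not LibreChat's real API.
type Method = "ocr" | "stt" | "text" | "fallback";

interface MimePatterns {
  ocr: RegExp[];
  stt: RegExp[];
  text: RegExp[];
}

interface Services {
  ocrConfigured: boolean;
  sttConfigured: boolean;
}

// Shortened pattern lists modeled on the fileConfig defaults.
const defaultPatterns: MimePatterns = {
  ocr: [/^image\/(jpeg|gif|png|webp|heic|heif)$/, /^application\/pdf$/],
  stt: [/^audio\/(mp3|mpeg|wav|ogg|flac|webm)$/],
  text: [/^text\/(plain|markdown|csv|x-python)$/],
};

const matchesAny = (mime: string, list: RegExp[]): boolean =>
  list.some((re) => re.test(mime));

function selectMethod(
  mime: string,
  services: Services,
  patterns: MimePatterns = defaultPatterns,
): Method {
  // OCR and STT only apply when the pattern matches AND the service exists.
  if (services.ocrConfigured && matchesAny(mime, patterns.ocr)) return "ocr";
  if (services.sttConfigured && matchesAny(mime, patterns.stt)) return "stt";
  // An explicit text match and the last-resort fallback both use text parsing;
  // they are kept distinct here only to mirror the documented steps.
  if (matchesAny(mime, patterns.text)) return "text";
  return "fallback";
}
```

With these assumed patterns, a PDF on an instance without OCR reaches the `"fallback"` branch, which is why it still gets text-parsed even though `application/pdf` is not listed under `text.supportedMimeTypes`.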
## Token limits and truncation
### Text Processing Methods
When the extracted text is longer than `fileTokenLimit`, LibreChat truncates it so the file doesn't crowd out the rest of the model's context window. This happens automatically, but it's good to know how it works.
**Text Parsing (Default):**
- Uses a robust parsing library (same as the RAG API)
- Handles PDFs, Word docs, text files, code files, and more
- No external service required
- Works immediately without configuration
- Fallback method if no other match
**OCR Enhancement (Optional):**
- Improves extraction from images, scanned documents, and complex PDFs
- Requires OCR service configuration
- Automatically used for files matching `fileConfig.ocr.supportedMimeTypes` when available
- See [OCR Configuration](/docs/features/ocr)
**STT Processing (Optional):**
- Converts audio files to text
- Requires STT service configuration
- See [Speech-to-Text Configuration](/docs/configuration/librechat_yaml/object_structure/file_config#stt)
## Usage
1. Click the attachment icon in the chat input
2. Select "Upload as Text" from the menu
3. Choose your file
4. File content is extracted and included in your message
**Note:** If you don't see "Upload as Text", ensure the `context` capability is enabled in your [`endpoints.agents.capabilities` configuration](/docs/configuration/librechat_yaml/object_structure/agents#capabilities).
## Configuration
### Basic Configuration
The `context` capability is enabled by default. No additional configuration is required for basic text parsing functionality.
### File Handling Configuration
Control text processing behavior with `fileConfig`:
```yaml
fileConfig:
# Maximum tokens from text files before truncation
fileTokenLimit: 100000
# Files matching these patterns use OCR (if configured)
ocr:
supportedMimeTypes:
- "^image/(jpeg|gif|png|webp|heic|heif)$"
- "^application/pdf$"
- "^application/vnd\\.openxmlformats-officedocument\\.(wordprocessingml\\.document|presentationml\\.presentation|spreadsheetml\\.sheet)$"
- "^application/vnd\\.ms-(word|powerpoint|excel)$"
- "^application/epub\\+zip$"
# Files matching these patterns use text parsing
text:
supportedMimeTypes:
- "^text/(plain|markdown|csv|json|xml|html|css|javascript|typescript|x-python|x-java|x-csharp|x-php|x-ruby|x-go|x-rust|x-kotlin|x-swift|x-scala|x-perl|x-lua|x-shell|x-sql|x-yaml|x-toml)$"
# Files matching these patterns use STT (if configured)
stt:
supportedMimeTypes:
- "^audio/(mp3|mpeg|mpeg3|wav|wave|x-wav|ogg|vorbis|mp4|x-m4a|flac|x-flac|webm)$"
```
**Processing Priority:** OCR > STT > text parsing > fallback
For more details, see [File Config Object Structure](/docs/configuration/librechat_yaml/object_structure/file_config).
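
To make the pattern syntax concrete, here is a small TypeScript sketch of how `supportedMimeTypes` strings could be compiled and tested. It assumes the strings are treated as ordinary regular expressions; `compilePatterns` and `isSupported` are hypothetical helpers, not LibreChat functions. Note that `\\.` inside a double-quoted YAML string (and inside a TypeScript string literal) yields the regex `\.`, a literal dot.

```typescript
// Hypothetical helpers: compile MIME pattern strings and test a MIME type.
// The pattern strings are copied from the ocr.supportedMimeTypes example.
const ocrPatternStrings: string[] = [
  "^image/(jpeg|gif|png|webp|heic|heif)$",
  "^application/pdf$",
  "^application/vnd\\.ms-(word|powerpoint|excel)$",
  "^application/epub\\+zip$",
];

function compilePatterns(patterns: string[]): RegExp[] {
  return patterns.map((p) => new RegExp(p));
}

function isSupported(mime: string, regexes: RegExp[]): boolean {
  return regexes.some((re) => re.test(mime));
}

const ocrRegexes = compilePatterns(ocrPatternStrings);

// The ^ and $ anchors require a full match, so near-misses are rejected.
console.log(isSupported("image/png", ocrRegexes));
console.log(isSupported("application/epub+zip", ocrRegexes));
console.log(isSupported("image/svg+xml", ocrRegexes));
```

Because every pattern is anchored, adding a new type means extending the alternation or adding another entry; a partial match such as `image/png; charset=binary` would not pass in this sketch.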
### Optional: Configure OCR for Enhanced Extraction
OCR is **not required** but enhances extraction quality when configured:
```yaml
# librechat.yaml
ocr:
strategy: "mistral_ocr"
apiKey: "${OCR_API_KEY}"
baseURL: "https://api.mistral.ai/v1"
mistralModel: "mistral-ocr-latest"
```
See [OCR Configuration](/docs/features/ocr) for full details.
## When to Use Each Upload Option
LibreChat offers three different ways to upload files, each suited for different use cases:
### Use "Upload as Text" when:
- ✅ You want the AI to read the complete document content
- ✅ Working with smaller files that fit in context
- ✅ You need "chat with files" functionality
- ✅ Using models without tool capabilities
- ✅ You want direct content access without semantic search
### Use "Upload for File Search" when:
- ✅ Working with large documents or multiple files
- ✅ You want to optimize token usage
- ✅ You need semantic search for relevant sections
- ✅ Building knowledge bases
- ✅ The `file_search` capability is enabled and toggled ON
### Use standard "Upload Files" when:
- ✅ Using vision models to analyze images
- ✅ Using code interpreter to execute code
- ✅ Files don't need text extraction
## Supported File Types
### Text Files (text parsing)
- Plain text, Markdown, CSV, JSON, XML, HTML
- Programming languages (Python, JavaScript, Java, C++, etc.)
- Configuration files (YAML, TOML, INI, etc.)
- Shell scripts, SQL files
### Documents (text parsing or OCR)
- PDF documents
- Word documents (.docx, .doc)
- PowerPoint presentations (.pptx, .ppt)
- Excel spreadsheets (.xlsx, .xls)
- EPUB books
### Images (OCR if configured)
- JPEG, PNG, GIF, WebP
- HEIC, HEIF (Apple formats)
- Screenshots, photos of documents, scanned images
### Audio (STT if configured)
- MP3, WAV, OGG, FLAC
- M4A, WebM
- Voice recordings, podcasts
## File Processing Priority
LibreChat processes files based on MIME type matching with the following **priority order**:
1. **OCR** - If file matches `ocr.supportedMimeTypes` AND OCR is configured
2. **STT** - If file matches `stt.supportedMimeTypes` AND STT is configured
3. **Text Parsing** - If file matches `text.supportedMimeTypes`
4. **Fallback** - Text parsing as last resort
### Processing Examples
**PDF file with OCR configured:**
- Matches `ocr.supportedMimeTypes`
- **Uses OCR** to extract text
- Better quality for scanned PDFs
**PDF file without OCR configured:**
- Does not match `text.supportedMimeTypes`, so the fallback applies
- **Uses text parsing** library
- Works well for digital PDFs
**Python file:**
- Matches `text.supportedMimeTypes`
- **Uses text parsing** (no OCR needed)
- Direct text extraction
**Audio file with STT configured:**
- Matches `stt.supportedMimeTypes`
- **Uses STT** to transcribe
## Token Limits
Files are truncated to `fileTokenLimit` tokens to manage context window usage:
```yaml
```yaml title="librechat.yaml"
fileConfig:
fileTokenLimit: 100000 # Default: 100,000 tokens
```
- Truncation happens at runtime before prompt construction
- Helps prevent exceeding model context limits
- Configurable based on your needs and model capabilities
- Larger limits allow more content but use more tokens
<Callout type="warn" title="Truncation means lost content">
If your file exceeds the limit, the text is cut off at the end. If you're getting incomplete answers, this might be why. You can increase `fileTokenLimit`, but keep in mind that larger values use more tokens per message — which increases cost and may hit the model's own context limit.
</Callout>
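
As a rough illustration of what truncation at the end means, the TypeScript sketch below keeps the first `limit` tokens and drops the rest. It is an assumption-heavy simplification: `truncateToTokenLimit` is a hypothetical function, and whitespace splitting stands in for a real model tokenizer, which counts tokens differently.

```typescript
// Hypothetical sketch: keep the first `limit` tokens and discard the tail.
// Whitespace splitting is a stand-in for a real tokenizer (an assumption).
interface TruncationResult {
  text: string;
  truncated: boolean;
}

function truncateToTokenLimit(text: string, limit: number): TruncationResult {
  const tokens = text.split(/\s+/).filter((t) => t.length > 0);
  if (tokens.length <= limit) {
    // Under the limit: return the original text untouched.
    return { text, truncated: false };
  }
  // Over the limit: everything after the first `limit` tokens is lost.
  return { text: tokens.slice(0, limit).join(" "), truncated: true };
}

const result = truncateToTokenLimit("alpha beta gamma delta", 2);
console.log(result.text, result.truncated);
```

The `truncated` flag is included only to show that content at the end of a long file never reaches the model, which is the usual cause of incomplete answers.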
## Comparison with Other File Features
| Feature | Capability | Requires Service | Persistence | Best For |
|---------|-----------|------------------|-------------|----------|
| **Upload as Text** | `context` | No (enhanced by OCR) | Single conversation | Temporary document questions |
| **Agent File Context** | `context` | No (enhanced by OCR) | Agent system instructions | Specialized agent knowledge |
| **File Search** | `file_search` | Yes (vector DB) | Stored in vector store | Large documents, semantic search |
### Upload as Text vs Agent File Context
**Upload as Text (`context`):**
- Available in any chat conversation
- Content included in current conversation only
- No OCR service required (text parsing by default)
- Best for one-off document questions
**Agent File Context (`context`):**
- Only available in Agent Builder
- Content stored in agent's system instructions
- No OCR service required (text parsing by default)
- Best for creating specialized agents with persistent knowledge
- See [OCR for Documents](/docs/features/ocr)
### Upload as Text vs File Search
**Upload as Text (`context`):**
- Full document content in conversation context
- Direct access to all text
- Token usage: entire file (up to limit)
- Works without RAG API configuration
**File Search (`file_search`):**
- Semantic search over documents
- Returns relevant chunks via tool use
- Token usage: only relevant sections
- Requires RAG API and vector store configuration
- See [RAG API](/docs/features/rag_api)
## Example Use Cases
- **Document Analysis**: Upload contracts, reports, or articles for analysis
- **Code Review**: Upload source files for review and suggestions
- **Data Extraction**: Extract information from structured documents
- **Translation**: Translate document contents
- **Summarization**: Summarize articles, papers, or reports
- **Research**: Discuss academic papers or technical documentation
- **Troubleshooting**: Share log files for analysis
- **Content Editing**: Review and edit written content
- **Data Processing**: Work with CSV or JSON data files
## Troubleshooting
### "Upload as Text" option not appearing
**Solution:** Ensure the `context` capability is enabled:
```yaml
endpoints:
agents:
capabilities:
- "context" # Add this if missing
```
### File content not extracted properly
**Solutions:**
1. Check if file type is supported (matches `fileConfig` patterns)
2. For images/scanned documents: Configure OCR for better extraction
3. For audio files: Configure STT service
4. Verify file is not corrupted
### Content seems truncated
**Solution:** Increase the token limit:
```yaml
fileConfig:
fileTokenLimit: 150000 # Increase as needed
```
### Poor extraction quality from images
**Solution:** Configure OCR to enhance extraction:
```yaml
ocr:
strategy: "mistral_ocr"
apiKey: "${OCR_API_KEY}"
```
See [OCR Configuration](/docs/configuration/librechat_yaml/object_structure/ocr) for details.
## Related Features
- [File Context](/docs/features/agents#file-context) - Files used as Agent Context
- [OCR for Documents](/docs/features/ocr) - Learn about and configure OCR services
- [File Configuration](/docs/configuration/librechat_yaml/object_structure/file_config) - Configure file handling
**Rules of thumb:**
- 100k tokens ≈ a 300-page book (plenty for most use cases)
- If you're working with very large files, consider [File Search (RAG)](/docs/features/rag_api) instead — it only retrieves the relevant sections rather than stuffing everything into context
---
Upload as Text provides a simple, powerful way to work with documents in LibreChat without requiring complex configuration or external services.
## Optional: boosting extraction with OCR
Text parsing works fine for digitally-created documents (PDFs saved from Word, code files, plain text). But if you're uploading **scanned documents, photos of pages, or images with text**, the built-in parser won't get great results.
That's where OCR comes in. When configured, LibreChat automatically uses OCR for file types that benefit from it — you don't need to do anything differently as a user.
<Accordions>
<Accordion title="How to tell your admin to set up OCR">
Point them to the [OCR configuration docs](/docs/features/ocr). The short version:
```yaml title="librechat.yaml"
ocr:
strategy: "mistral_ocr"
apiKey: "${OCR_API_KEY}"
baseURL: "https://api.mistral.ai/v1"
mistralModel: "mistral-ocr-latest"
```
Once configured, OCR automatically handles images and scanned PDFs — no changes needed on the user side.
</Accordion>
<Accordion title="How to tell your admin to set up STT (for audio files)">
For transcribing audio uploads, the admin needs to configure a Speech-to-Text service. See the [STT configuration reference](/docs/configuration/librechat_yaml/object_structure/file_config#stt).
</Accordion>
</Accordions>
---
## File handling configuration reference
This section is for admins who want to control which file types get processed by which method. The defaults work well — you only need to touch this if you want to fine-tune behavior.
<Accordions>
<Accordion title="Full fileConfig example">
```yaml title="librechat.yaml"
fileConfig:
# Max tokens extracted from a single file before truncation
fileTokenLimit: 100000
# Files matching these MIME patterns use OCR (if configured)
ocr:
supportedMimeTypes:
- "^image/(jpeg|gif|png|webp|heic|heif)$"
- "^application/pdf$"
- "^application/vnd\\.openxmlformats-officedocument\\.(wordprocessingml\\.document|presentationml\\.presentation|spreadsheetml\\.sheet)$"
- "^application/vnd\\.ms-(word|powerpoint|excel)$"
- "^application/epub\\+zip$"
# Files matching these MIME patterns use text parsing
text:
supportedMimeTypes:
- "^text/(plain|markdown|csv|json|xml|html|css|javascript|typescript|x-python|x-java|x-csharp|x-php|x-ruby|x-go|x-rust|x-kotlin|x-swift|x-scala|x-perl|x-lua|x-shell|x-sql|x-yaml|x-toml)$"
# Files matching these MIME patterns use STT (if configured)
stt:
supportedMimeTypes:
- "^audio/(mp3|mpeg|mpeg3|wav|wave|x-wav|ogg|vorbis|mp4|x-m4a|flac|x-flac|webm)$"
```
**Priority reminder:** OCR > STT > text parsing > fallback.
For the full reference, see [File Config Object Structure](/docs/configuration/librechat_yaml/object_structure/file_config).
</Accordion>
</Accordions>
---
## Troubleshooting
<Accordions>
<Accordion title="Upload as Text option not appearing">
The `context` capability was likely removed from your configuration. Ask your admin to add it back:
```yaml
endpoints:
agents:
capabilities:
- "context"
```
</Accordion>
<Accordion title="File content looks wrong or is mostly empty">
A few things to check:
- **Scanned PDFs / images** — These need OCR to extract text properly. Without it, the parser might return garbage or nothing. Ask your admin to [configure OCR](/docs/features/ocr).
- **Audio files** — These need STT. There's no text fallback for audio.
- **Corrupted files** — Try opening the file locally to make sure it's not damaged.
- **Unsupported format** — If the MIME type doesn't match any configured pattern, LibreChat attempts a text parse fallback, which may not work for binary formats.
</Accordion>
<Accordion title="The AI seems to be missing part of my file">
Your file probably exceeded the `fileTokenLimit`. The text was truncated.
**Options:**
- Ask your admin to increase `fileTokenLimit` in `librechat.yaml`
- Use [File Search (RAG)](/docs/features/rag_api) instead, which retrieves relevant chunks rather than loading the entire file
- Split the file into smaller parts and upload them separately
</Accordion>
<Accordion title="Images upload but the AI can't read the text in them">
Without OCR, images are processed through text parsing, which can't actually "see" text in an image. You need OCR configured for this to work. See [OCR for Documents](/docs/features/ocr).
Alternatively, use a **vision model** with standard file upload — the model itself can read text in images.
</Accordion>
</Accordions>
---
## Related
- [OCR for Documents](/docs/features/ocr) — Set up optical character recognition for images and scans
- [RAG API (Chat with Files)](/docs/features/rag_api) — Semantic search over large document collections
- [Agents — File Context](/docs/features/agents#file-context) — Embed file content into an agent's system instructions
- [File Config reference](/docs/configuration/librechat_yaml/object_structure/file_config) — Full YAML schema for file handling


@@ -58,6 +58,7 @@ const nonPermanentRedirects = [
['/docs/user_guides/rag_api', '/docs/features/rag_api'],
['/docs/user_guides/plugins', '/docs/features/agents'],
['/docs/features/plugins', '/docs/features/agents'],
['/docs/features/speech-to-text', '/docs/configuration/stt_tts'],
['/docs/configuration/librechat_yaml/setup', '/docs/configuration/librechat_yaml'],
['/toolkit/yaml_checker', '/toolkit/yaml-checker'],
['/toolkit/creds_generator', '/toolkit/creds-generator'],