📙 docs: Upload as Text (#381)

* 📚 docs: Add OCR, textParsing, and fileTokenLimit configuration documentation (#376)

- Added support for `ocr` and `textParsing` configurations in `fileConfig`, allowing users to specify file types for OCR processing and direct text extraction.
- Introduced `fileTokenLimit` parameter for all endpoints to manage maximum token limits for file processing.

* 📚 docs: Add STT configuration documentation (#380)

- Added `stt` configuration to `fileConfig` for Speech-to-Text audio file processing, including supported MIME types.
- Updated changelog to reflect the addition of STT alongside existing OCR and text parsing configurations.

* 📚 docs: finish fileTokenLimit documentation and update changelog (#382)

* refactor: change `textParsing` to `text`
Dustin Healy
2025-10-01 07:38:08 -07:00
committed by GitHub
parent 86476fe309
commit e2f771cea8
4 changed files with 150 additions and 2 deletions


@@ -86,4 +86,20 @@
  - See [Interface Object Structure - fileSearch](/docs/configuration/librechat_yaml/object_structure/interface#filesearch) for details
- Improved [Model Specs documentation](/docs/configuration/librechat_yaml/object_structure/model_specs) with parameter support updates:
  - Added support for `disableStreaming`, `thinking`, `thinkingBudget`, `web_search`, and other parameters
- Added OCR, text parsing, and STT separation to `fileConfig`:
  - Added `ocr` configuration to control which file types use OCR processing
  - Added `text` configuration to control which file types use direct text extraction
  - Added `stt` configuration to control which audio file types use Speech-to-Text transcription
  - Separate processing paths for visual documents (OCR), text files (native parsing), and audio files (STT)
  - Processing precedence: OCR > STT > text parsing
  - Default OCR support: images (JPEG, GIF, PNG, WebP, HEIC, HEIF), PDFs, Office documents, EPUB files
  - Default text parsing support: all text MIME types and common programming languages
  - Default STT support: audio formats (MP3, WAV, FLAC, OGG, M4A, WebM, etc.)
  - See [File Config Object Structure](/docs/configuration/librechat_yaml/object_structure/file_config) for details
- Added `fileTokenLimit` parameter support for all endpoints:
  - Allows setting default and on-the-fly maximum token limits for file processing to control costs and resource usage
  - Available as a URL query parameter and in endpoint configuration panels; can also be configured in the `fileConfig` field of `librechat.yaml`
  - Runtime behavior: text from attached files is truncated to this limit just before prompt construction (default: 100000)
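
The precedence rule and the truncation behavior described in these entries can be illustrated with a minimal Python sketch. This is not LibreChat's implementation. The function names (`choose_processor`, `truncate_to_limit`), the regex patterns, and the whitespace-based token count are all assumptions made for illustration; the real defaults and tokenizer may differ.

```python
import re

# Hypothetical MIME patterns, loosely modeled on the defaults listed above.
# They are illustrative only and are not copied from LibreChat.
DEFAULTS = {
    "ocr": [r"^image/(jpeg|gif|png|webp|heic|heif)$", r"^application/pdf$"],
    "stt": [r"^audio/(mpeg|wav|flac|ogg|mp4|webm)$"],
    "text": [r"^text/.+$"],
}

# Processing precedence stated in the changelog: OCR > STT > text parsing.
PRECEDENCE = ("ocr", "stt", "text")


def choose_processor(mime_type, config):
    """Return the first processor, in precedence order, whose patterns match."""
    for name in PRECEDENCE:
        patterns = config.get(name, [])
        if any(re.search(pattern, mime_type) for pattern in patterns):
            return name
    return None  # no configured processor handles this MIME type


def truncate_to_limit(text, file_token_limit=100_000):
    """Cut extracted file text down to at most `file_token_limit` tokens.

    Assumption: tokens are approximated by whitespace-separated words.
    A real system would count tokens with the model's tokenizer.
    """
    tokens = text.split()
    if len(tokens) <= file_token_limit:
        return text
    return " ".join(tokens[:file_token_limit])
```

Under these assumptions, a MIME type that matches more than one list resolves to the higher-precedence path: with `text/plain` listed under both `ocr` and `text`, `choose_processor` returns `"ocr"`.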