📚 docs: Add firecrawlOptions configuration details to web search documentation (#360)

Co-authored-by: Danny Avila <danny@librechat.ai>
Dustin Healy
2025-07-22 19:54:17 -07:00
committed by GitHub
parent b8e656306c
commit f0f07d24a7
2 changed files with 171 additions and 0 deletions

@@ -23,6 +23,9 @@
- Added user placeholder variables support to Custom Endpoint Headers:
  - Users can now use `{{LIBRECHAT_USER_ID}}`, `{{LIBRECHAT_USER_EMAIL}}`, and other user field placeholders in custom endpoint headers
  - See: [Custom Endpoint Object Structure - Headers](/docs/configuration/librechat_yaml/object_structure/custom_endpoint#headers) for details
- Enhanced `webSearch` configuration with comprehensive Firecrawl scraper options
  - Added detailed configuration options for the Firecrawl scraper, including `formats`, `includeTags`, `excludeTags`, `headers`, `waitFor`, `timeout`, `maxAge`, `mobile`, `skipTlsVerification`, `parsePDF`, `removeBase64Images`, `blockAds`, `storeInCache`, `zeroDataRetention`, `onlyMainContent`, `location`, and `changeTrackingOptions`
  - See [Web Search Configuration](/docs/configuration/librechat_yaml/object_structure/web_search) for details
- Improved [Model Specs documentation](/docs/configuration/librechat_yaml/object_structure/model_specs) with parameter support updates (`disableStreaming`, `thinking`, `thinkingBudget`, `web_search`, etc.)
- Enhanced MCP (Model Context Protocol) server management with connection status tracking and OAuth support
  - Added dynamic status icons showing server state (connected, disconnected, OAuth required, error, initializing)

@@ -105,6 +105,174 @@ webSearch:
]}
/>
### firecrawlOptions
<OptionTable
options={[
['firecrawlOptions', 'Object', 'Advanced configuration options for the Firecrawl scraper.', ''],
]}
/>
**Subkeys:**
#### formats
<OptionTable
options={[
['formats', 'Array of Strings', 'Formats to include in the output.', ''],
]}
/>
#### includeTags
<OptionTable
options={[
['includeTags', 'Array of Strings', 'Tags to include in the output.', ''],
]}
/>
#### excludeTags
<OptionTable
options={[
['excludeTags', 'Array of Strings', 'Tags to exclude from the output.', ''],
]}
/>
#### headers
<OptionTable
options={[
['headers', 'Object', 'Headers to send with the request. Can be used to send cookies, user-agent, etc.', ''],
]}
/>
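As an illustration, `headers` takes a map of header names to values. The following is a minimal sketch; the user-agent string and cookie value are hypothetical placeholders, not values taken from LibreChat or Firecrawl.

```yaml filename="webSearch"
webSearch:
  firecrawlOptions:
    headers:
      # Hypothetical values, for illustration only
      User-Agent: "Mozilla/5.0 (compatible; ExampleBot/1.0)"
      Cookie: "session=example-value"
```

As described under `storeInCache`, sending custom headers counts as sensitive scraping and forces `storeInCache` to false.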
#### waitFor
<OptionTable
options={[
['waitFor', 'Number', 'Specify a delay in milliseconds before fetching the content, allowing the page sufficient time to load.', ''],
]}
/>
#### timeout
<OptionTable
options={[
['timeout', 'Number', 'Timeout in milliseconds for the scraping request.', 'Default: 7500'],
]}
/>
#### maxAge
<OptionTable
options={[
['maxAge', 'Number', 'Returns a cached version of the page if it is younger than this age in milliseconds. If a cached version of the page is older than this value, the page will be scraped.', ''],
]}
/>
**Note:** If you do not need extremely fresh data, setting `maxAge` lets Firecrawl serve recently cached pages, which can speed up your scrapes by as much as 500%.
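Because `maxAge` is expressed in milliseconds, it helps to write out the conversion. A minimal sketch, assuming a one-hour freshness window is acceptable for your use case (the value itself is only an example):

```yaml filename="webSearch"
webSearch:
  firecrawlOptions:
    # 60 min * 60 s * 1000 ms = 3600000 ms (1 hour):
    # a cached copy younger than this is returned instead of re-scraping
    maxAge: 3600000
```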
#### mobile
<OptionTable
options={[
['mobile', 'Boolean', 'Emulate scraping from a mobile device.', ''],
]}
/>
#### skipTlsVerification
<OptionTable
options={[
['skipTlsVerification', 'Boolean', 'Skip TLS certificate verification when making requests.', ''],
]}
/>
#### blockAds
<OptionTable
options={[
['blockAds', 'Boolean', 'Enables ad-blocking and cookie popup blocking.', ''],
]}
/>
#### removeBase64Images
<OptionTable
options={[
['removeBase64Images', 'Boolean', 'Removes all base64 images from the output, since they can be overwhelmingly long. The image\'s alt text remains in the output, but the URL is replaced with a placeholder.', ''],
]}
/>
#### parsePDF
<OptionTable
options={[
['parsePDF', 'Boolean', 'Controls how PDF files are processed during scraping.', ''],
]}
/>
#### storeInCache
<OptionTable
options={[
['storeInCache', 'Boolean', 'If true, the page will be stored in the Firecrawl index and cache. Setting this to false is useful if your scraping activity may raise data protection concerns. Using parameters associated with sensitive scraping (such as headers) forces this parameter to false.', ''],
]}
/>
#### zeroDataRetention
<OptionTable
options={[
['zeroDataRetention', 'Boolean', 'If true, this will enable zero data retention for this scrape (requires prior setup on Firecrawl).', ''],
]}
/>
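For deployments with data protection requirements, the two options above could be combined. This is a sketch under the assumption that zero data retention has already been set up on the Firecrawl side, as the `zeroDataRetention` description requires:

```yaml filename="webSearch"
webSearch:
  firecrawlOptions:
    # Keep scraped pages out of the Firecrawl index and cache
    storeInCache: false
    # Assumes zero data retention was already set up on Firecrawl
    zeroDataRetention: true
```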
#### location
<OptionTable
options={[
['location', 'Object', 'Geographic location and language settings for scraping.', ''],
]}
/>
#### onlyMainContent
<OptionTable
options={[
['onlyMainContent', 'Boolean', 'Only return the main content of the page excluding headers, navs, footers, etc.', ''],
]}
/>
#### changeTrackingOptions
<OptionTable
options={[
['changeTrackingOptions', 'Object', 'Configuration for tracking changes in scraped content.', ''],
]}
/>
**Example:**
```yaml filename="webSearch"
webSearch:
  firecrawlApiKey: "${FIRECRAWL_API_KEY}"
  firecrawlOptions:
    formats: ["markdown", "rawHtml"]
    includeTags: ["main", "article", ".content"]
    excludeTags: ["nav", "footer", ".ads"]
    waitFor: 2000
    timeout: 10000
    mobile: false
    blockAds: true
    onlyMainContent: true
    location:
      country: "US"
      languages: ["en"]
```
**Note:** For detailed information about Firecrawl scraper options and defaults, see the [Firecrawl API Documentation](https://docs.firecrawl.dev/api-reference/endpoint/scrape).
## Rerankers
### jinaApiKey