📚 docs: Add firecrawlOptions configuration details to web search documentation (#360)

Co-authored-by: Danny Avila <danny@librechat.ai>
2026-03-27 10:48:32 +07:00 · 2025-07-22 19:54:17 -07:00
parent b8e656306c
commit f0f07d24a7
2 changed files with 171 additions and 0 deletions
--- a/components/changelog/content/config_v1.2.8.mdx
+++ b/components/changelog/content/config_v1.2.8.mdx
@@ -23,6 +23,9 @@
 - Added user placeholder variables support to Custom Endpoint Headers:
  - Users can now use `{{LIBRECHAT_USER_ID}}`, `{{LIBRECHAT_USER_EMAIL}}`, and other user field placeholders in custom endpoint headers
  - See: [Custom Endpoint Object Structure - Headers](/docs/configuration/librechat_yaml/object_structure/custom_endpoint#headers) for details
+- Enhanced `webSearch` configuration with comprehensive Firecrawl scraper options
+  - Added detailed configuration options for Firecrawl scraper including formats, includeTags, excludeTags, headers, waitFor, timeout, maxAge, mobile, skipTlsVerification, parsePDF, removeBase64Images, blockAds, storeInCache, zeroDataRetention, onlyMainContent, location, and changeTrackingOptions
+  - See [Web Search Configuration](/docs/configuration/librechat_yaml/object_structure/web_search) for details
 - Improved [Model Specs documentation](/docs/configuration/librechat_yaml/object_structure/model_specs) with parameter support updates (disableStreaming, thinking, thinkingBudget, web_search, etc...)
 - Enhanced MCP (Model Context Protocol) server management with connection status tracking and OAuth support
  - Added dynamic status icons showing server state (connected, disconnected, OAuth required, error, initializing)
--- a/pages/docs/configuration/librechat_yaml/object_structure/web_search.mdx
+++ b/pages/docs/configuration/librechat_yaml/object_structure/web_search.mdx
@@ -105,6 +105,174 @@ webSearch:
  ]}
 />

+### firecrawlOptions
+
+<OptionTable
+  options={[
+    ['firecrawlOptions', 'Object', 'Advanced configuration options for the Firecrawl scraper.', ''],
+  ]}
+/>
+
+**Subkeys:**
+
+#### formats
+
+<OptionTable
+  options={[
+    ['formats', 'Array of Strings', 'Output formats to request from Firecrawl, such as "markdown" or "rawHtml".', ''],
+  ]}
+/>
+
+#### includeTags
+
+<OptionTable
+  options={[
+    ['includeTags', 'Array of Strings', 'HTML tags or CSS selectors (for example, "main" or ".content") whose content should be included in the output.', ''],
+  ]}
+/>
+
+#### excludeTags
+
+<OptionTable
+  options={[
+    ['excludeTags', 'Array of Strings', 'HTML tags or CSS selectors (for example, "nav" or ".ads") whose content should be excluded from the output.', ''],
+  ]}
+/>
+
+#### headers
+
+<OptionTable
+  options={[
+    ['headers', 'Object', 'HTTP headers to send with the scrape request, for example cookies or a custom user agent.', ''],
+  ]}
+/>
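+
+The `headers` value is a map of header names to string values. A minimal sketch, assuming a site that expects a custom user agent and a session cookie; both header values below are hypothetical placeholders:
+
+```yaml filename="webSearch"
+webSearch:
+  firecrawlOptions:
+    headers:
+      # Hypothetical values, for illustration only
+      User-Agent: "Mozilla/5.0 (compatible; ExampleBot/1.0)"
+      Cookie: "session=example-session-token"
+```
+
+As described under `storeInCache`, sending `headers` forces `storeInCache` to `false`.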
+
+#### waitFor
+
+<OptionTable
+  options={[
+    ['waitFor', 'Number', 'Delay in milliseconds to wait before fetching the content, giving the page time to load.', ''],
+  ]}
+/>
+
+#### timeout
+
+<OptionTable
+  options={[
+    ['timeout', 'Number', 'Timeout in milliseconds for the scraping request.', 'Default: 7500'],
+  ]}
+/>
+
+#### maxAge
+
+<OptionTable
+  options={[
+    ['maxAge', 'Number', 'Maximum age, in milliseconds, of a cached page that may be returned. If a cached copy is younger than this value, it is returned; otherwise, the page is scraped again.', ''],
+  ]}
+/>
+
+**Note:** If you do not need extremely fresh data, setting `maxAge` can speed up your scrapes by up to 500%.
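+
+For example, to accept cached copies up to one hour old (`60 * 60 * 1000 = 3600000` ms). The one-hour window is an arbitrary illustrative choice, not a recommended value:
+
+```yaml filename="webSearch"
+webSearch:
+  firecrawlOptions:
+    # Accept cached pages up to 1 hour old; older pages are scraped again
+    maxAge: 3600000
+```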
+
+#### mobile
+
+<OptionTable
+  options={[
+    ['mobile', 'Boolean', 'Emulate scraping from a mobile device.', ''],
+  ]}
+/>
+
+#### skipTlsVerification
+
+<OptionTable
+  options={[
+    ['skipTlsVerification', 'Boolean', 'Skip TLS certificate verification when making requests.', ''],
+  ]}
+/>
+
+#### blockAds
+
+<OptionTable
+  options={[
+    ['blockAds', 'Boolean', 'Enables ad-blocking and cookie popup blocking.', ''],
+  ]}
+/>
+
+#### removeBase64Images
+
+<OptionTable
+  options={[
+    ['removeBase64Images', 'Boolean', 'Removes all base64-encoded images from the output, since they can be extremely long. Each image\'s alt text remains in the output, but its URL is replaced with a placeholder.', ''],
+  ]}
+/>
+
+#### parsePDF
+
+<OptionTable
+  options={[
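+    ['parsePDF', 'Boolean', 'Controls how PDF files are processed during scraping. When true, the PDF content is extracted and converted to markdown; when false, the PDF file is returned base64-encoded.', ''],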
+  ]}
+/>
+
+#### storeInCache
+
+<OptionTable
+  options={[
+    ['storeInCache', 'Boolean', 'If true, the page is stored in the Firecrawl index and cache. Set this to false if your scraping activity may have data protection concerns. Some parameters associated with sensitive scraping, such as headers, force this to false.', ''],
+  ]}
+/>
+
+#### zeroDataRetention
+
+<OptionTable
+  options={[
+    ['zeroDataRetention', 'Boolean', 'If true, this will enable zero data retention for this scrape (requires prior setup on Firecrawl).', ''],
+  ]}
+/>
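+
+For deployments with data protection concerns, `storeInCache` and `zeroDataRetention` can be combined. A hedged sketch; it assumes zero data retention has already been set up with Firecrawl, as the option requires:
+
+```yaml filename="webSearch"
+webSearch:
+  firecrawlOptions:
+    # Do not store scraped pages in the Firecrawl index and cache
+    storeInCache: false
+    # Request zero data retention (requires prior setup on Firecrawl)
+    zeroDataRetention: true
+```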
+
+#### location
+
+<OptionTable
+  options={[
+    ['location', 'Object', 'Geographic location and language settings for scraping, such as "country" (a country code like "US") and "languages" (for example, ["en"]).', ''],
+  ]}
+/>
+
+#### onlyMainContent
+
+<OptionTable
+  options={[
+    ['onlyMainContent', 'Boolean', 'Return only the main content of the page, excluding headers, navigation bars, footers, etc.', ''],
+  ]}
+/>
+
+#### changeTrackingOptions
+
+<OptionTable
+  options={[
+    ['changeTrackingOptions', 'Object', 'Configuration for tracking changes in scraped content.', ''],
+  ]}
+/>
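+
+A rough sketch of how change tracking might be configured. The `changeTracking` format and the `modes` subkey are assumptions based on Firecrawl's scrape API and are not confirmed by this page; verify them against the Firecrawl API documentation before use:
+
+```yaml filename="webSearch"
+webSearch:
+  firecrawlOptions:
+    # Assumption: Firecrawl expects "changeTracking" alongside "markdown"
+    formats: ["markdown", "changeTracking"]
+    changeTrackingOptions:
+      # Assumption: "git-diff" mode as described in Firecrawl's scrape API
+      modes: ["git-diff"]
+```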
+
+**Example:**
+```yaml filename="webSearch"
+webSearch:
+  firecrawlApiKey: "${FIRECRAWL_API_KEY}"
+  firecrawlOptions:
+    formats: ["markdown", "rawHtml"]
+    includeTags: ["main", "article", ".content"]
+    excludeTags: ["nav", "footer", ".ads"]
+    waitFor: 2000
+    timeout: 10000
+    mobile: false
+    blockAds: true
+    onlyMainContent: true
+    location:
+      country: "US"
+      languages: ["en"]
+```
+
+**Note:** For detailed information about Firecrawl scraper options and defaults, see the [Firecrawl API Documentation](https://docs.firecrawl.dev/api-reference/endpoint/scrape).
+
 ## Rerankers

 ### jinaApiKey